In this dissertation, we aim to develop a theoretical understanding of foundation models and reinforcement learning. We delve into a comprehensive analysis of specific aspects within these domains. The focal points of our study are as follows: • Generative Adversarial Imitation Learning (GAIL) with Neural Networks: GAIL is poised to...
Machine learning and deep learning have proven successful across various scientific fields, such as computer vision, natural language processing, and recommendation systems. As models become more complex, with more parameters and intricate architectures, they can achieve higher prediction accuracy when trained on larger datasets. However, despite the great prediction...
With the advancement of high-throughput sequencing technology, it has become much easier to extract gene expression data and to discover gene-disease associations more efficiently. Longitudinal gene expression data offer more insight into expression patterns for distinct patient groups compared to cross-sectional data. For instance, patients diagnosed with subclinical acute rejections...
In the Maximum-a-Posteriori (MAP) Inference problem, for any given probability distribution, the goal is to find the point in the support of that distribution with the highest probability. Potts models and Determinantal Point Processes (DPPs) are probabilistic models that were introduced in the context of statistical physics several decades ago....
Literature screening is the process of identifying all relevant records from a pool of candidate paper records in systematic reviews, meta-analyses, and other research synthesis tasks. This process is time-consuming, expensive, and prone to human error. Screening prioritization methods attempt to help reviewers identify the most relevant records while only...
Deduplication, also referred to as "entity resolution", is a common and crucial pre-processing step in the construction of social networks. Traditional deduplication methods compare the attributes (such as name and age) of potential matching pairs to estimate a match probability for a pair. Recent research has used clustering techniques for...
Seasonal malaria chemoprevention (SMC) was first recommended by the World Health Organization (WHO) in 2012 to prevent uncomplicated malaria in children, and its implementation in Burkina Faso began in 2014 under programmatic campaigns. Systematic assessment of the impact of national SMC campaigns requires data with weekly or monthly temporal resolution over...
This thesis develops novel methods for generating space-filling designs inside a design space and subsampling from a data set. It incorporates materials from two papers by the author: Shang and Apley 2021; Shang, Apley, and Mehrotra 2022a. Chapter 1 discusses space-filling designs of computer experiments, which is published as Shang and Apley...
Sequential change-point detection for time series enables us to sequentially check the hypothesis that the model still holds as more and more data are observed. It is widely used in practice for data monitoring. In this work, we propose two models, the Binomial AR(1) model and the Generalized Beta AR(p) model, for modeling binomial...