In the Maximum-a-Posteriori (MAP) Inference problem, for any given probability distribution, the goal is to find the point in the support of that distribution with the highest probability. Potts models and Determinantal Point Processes (DPPs) are probabilistic models that were introduced in the context of statistical physics several decades ago....
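Stated as an equation (a restatement of the definition above, not text from the abstract): for a distribution p with support \mathcal{X}, MAP inference seeks

\[
x^{\star} = \arg\max_{x \in \mathcal{X}} \, p(x),
\]

i.e., the single most probable configuration, rather than marginal probabilities or samples from p.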
Modeling human language is at the very frontier of machine learning and artificial intelligence. Statistical language models are probabilistic models that assign probabilities to sequences of words. For example, topic models are frequently used text-mining tools for organizing vast sets of unstructured documents by exploring their thematic structure. More...
For stochastic simulation optimization in a modern computing era, we introduce a new parallel framework for solving very large-scale problems using a ranking & selection (R&S) approach that simulates all systems or feasible solutions to provide a global statistical guarantee. We propose a parallel adaptive survivor selection (PASS) framework that...
With the advancement of high-throughput sequencing technology, it has become much easier to extract gene expression data and to discover gene-disease associations more efficiently. Longitudinal gene expression data offer more insight into expression patterns for distinct patient groups compared to cross-sectional data. For instance, patients diagnosed with subclinical acute rejections...
In this dissertation, we aim to develop a theoretical understanding of foundation models and reinforcement learning. We delve into a comprehensive analysis of specific aspects within these domains. The focal points of our study are as follows: • Generative Adversarial Imitation Learning (GAIL) with Neural Networks: GAIL is poised to...
The ever-growing desire for accurate estimation and efficient learning necessitates efforts to quantitatively characterize model uncertainties. In this thesis, four problems pertaining to uncertainty quantification are discussed: A sequential stopping framework for constructing fixed-precision confidence regions is proposed for a class of multivariate simulation problems where variance...
With the rapid growth of demand for data center services, the energy and water use of data centers has become a critical concern in the contexts of climate change and freshwater conservation. Therefore, understanding, quantifying, and optimizing the use of energy and water resources in data centers has...
Cells are often precisely organized into patterns within developing tissues. This precision must emerge from biochemical processes within and between cells that are inherently stochastic. I investigated the impact of stochastic gene expression on self-organized pattern formation, focusing on Senseless (Sens), a key target of Wnt and Notch signaling during...
This dissertation focuses on subgroup identification in longitudinal studies. There are two different but related topics. In Chapters Two and Three, several longitudinal-data-based methods for subgroup identification with enhanced treatment effect are proposed to correct the deficiency of measuring treatment effect with a single summary statistic. In...
Sequential change-point detection for time series enables us to sequentially check the hypothesis that the model still holds as more and more data are observed. It is widely used in data monitoring in practice. In this work, we propose two models, a Binomial AR(1) model and a Generalized Beta AR(p) model, for modeling binomial...
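The excerpt is cut off before the model details, but a common Binomial AR(1) formulation (one plausible reading, not necessarily this work's exact specification) keeps the count X_t in {0, ..., n} via binomial thinning:

\[
X_t = \alpha \circ X_{t-1} + \beta \circ (n - X_{t-1}), \qquad \alpha \circ X \mid X \sim \mathrm{Binomial}(X, \alpha),
\]

so each new count is the sum of "survivors" of the previous count and new "arrivals" among the remaining n - X_{t-1} trials, which keeps the process bounded and well suited to sequential monitoring of count data.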
This dissertation contributes to the theory of segregation and methodologies to measure it. The first two chapters focus on the traditional problem of quantifying segregation in survey data through segregation indices. Segregation indices describe the segregation of an environment with a single number, usually between 0 and 1. The...
The advent of next-generation sequencing technologies has greatly promoted the development of metagenomics, and the analysis of compositional datasets has a wide range of applications in this area. Because of the constraint that species' relative abundances sum to 1, many traditional and classical statistical methods cannot...
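To make the sum-to-one constraint concrete: compositional analyses often move the data off the simplex with a log-ratio transform before applying standard methods. Aitchison's centered log-ratio (clr) is one standard choice, shown here for illustration only; the truncated excerpt does not say which transform, if any, this dissertation uses:

\[
\mathrm{clr}(x) = \left( \log \frac{x_1}{g(x)}, \ldots, \log \frac{x_D}{g(x)} \right), \qquad g(x) = \Big( \prod_{j=1}^{D} x_j \Big)^{1/D},
\]

where x is a composition of D relative abundances with \sum_j x_j = 1.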
In the short amount of time that genetic manipulation has been possible through CRISPR technology, myriad applications have been developed. Results from one of the most promising applications of this technology, pooled screens, have shown that single guide RNAs (sgRNAs), RNA sequences used to target specific regions of the genome,...
Randomization is considered the gold standard when it comes to evaluating the effectiveness of interventions, primarily due to its ability to avoid bias. However, in recent years, randomization has been heavily criticized in circumstances where subject randomization may not be ethical. In a randomized controlled trial, patients who are extremely...
A replication crisis has enveloped several scientific fields since the early 2000s (see Baker, 2016). This has given rise to improved research and reporting practices (e.g., F. S. Collins & Tabak, 2014), as well as a cottage industry of research into issues of replication and reproducibility (e.g., R. A. Klein...
Commonsense inference is a critical capability of modern artificial intelligence (AI) systems. Machines need commonsense knowledge to perform tasks the way human beings do. Learning commonsense inference from text has been a long-standing challenge in the field of natural language processing due to reporting bias: people do...
The advent of sequencing technologies has generated a large amount of biological and medical data. These data, such as genetic sequencing data and lab experimental evidence, can help us understand critical biomedical problems. This dissertation makes contributions to three different but related applications in biomedical research. In Chapter 2, we...
Modern design practices rely more and more on computer simulations due to their low cost compared with physical experiments. However, it is still an elusive task to fully unleash the advantages of the simulation models while mitigating their disadvantages for designing complex engineering systems. In simulation-based design, computer simulation models...
The heart of computational materials science lies in providing fundamental insights and understanding of materials behavior and properties across different scales. The significance of this task is highlighted by the Materials Genome Initiative and the emergence of computational tools and frameworks such as materials by design, microstructure sensitive design, and...
Seasonal malaria chemoprevention (SMC) was first recommended by the World Health Organization (WHO) in 2012 to prevent uncomplicated malaria in children and began implementation in Burkina Faso in 2014 under programmatic campaigns. Systematic assessment of the impact of national SMC campaigns requires data with weekly or monthly temporal resolution over...
This thesis develops novel methods for generating space-filling designs inside a design space and subsampling from a data set. It incorporates materials from two papers by the author: Shang and Apley 2021; Shang, Apley, and Mehrotra 2022a. Chapter 1 discusses space-filling designs of computer experiments, which is published as Shang and Apley...
Sequential batches of time-evolving data for a set of persistent identifiable entities (e.g. online shopping behavior by month for a customer ID, or economic figures by year for a collection of countries) can exhibit temporal shifts in their underlying clustering structure. Methods for recovering this evolutionary clustering structure exploit natural...
Deduplication, also referred to as "entity resolution", is a common and crucial pre-processing step in the construction of social networks. Traditional deduplication methods compare the attributes (such as name and age) of potential matching pairs to estimate a match probability for a pair. Recently, research has used clustering techniques for...
Innovations are adopted by individuals and spread to other individuals. They are adopted at different rates; some are never adopted at all, some are abandoned, and some become the new norms. Diffusion of innovations is an extensive evidence-based research and practice paradigm that studies how innovations spread. This...
Machine learning and deep learning have been proven successful across various scientific fields, such as computer vision, natural language processing, and recommendation systems. As models become more complex, with more parameters and intricate architectures, they can achieve higher prediction accuracy when trained on larger datasets. However, despite the great prediction...
The supervised learning model is one of the most fundamental machine learning models. It provides powerful predictive capability by learning complex patterns hidden in many, sometimes thousands of, predictors. It can also be used as a building block for other machine learning tasks, like unsupervised learning and reinforcement learning. Such...
The Gaussian process provides a principled and flexible approach for modeling the response surface or the latent function in many areas, including machine learning, statistics, and computer experiments. In the literature, Gaussian process models have already demonstrated their effectiveness and usefulness in a variety of applications. In this dissertation, we mainly focus...
In recent years, the social sciences have been ensnared in a crisis in which many research findings cannot be replicated (Ioannidis, 2005; Open Science Collaboration, 2015; Camerer et al., 2016; Makel & Plucker, 2014). This crisis has been attributed to a variety of problems including lack of transparency about research...
Epigenetics, the study of heritable changes in organisms not caused by mutations to DNA, holds tremendous promise for future medical applications. Although the field is still in its infancy, feature selection in statistics plays an important role in correlating epigenetic changes with diseases and various health issues. Feature selection may also be used in...
The focus of this thesis is on evaluating, designing, and applying statistical methods that elucidate molecular mechanisms by seeking to understand the pathways that contribute to disease. Chapter 1 introduces the field and motivates the work in this thesis. Chapters 2, 3, and 4 describe original work. Chapter 5 recapitulates...
This dissertation is a collection of three papers on synthesizing and translating statistical evidence in education research. Chapter 1 serves as an introduction and executive summary, and Chapters 2 through 4 contain the three substantive papers, respectively. Chapter 2 presents methods for pooling sample variances across studies to improve properties...
In this thesis we present methods for estimating network metrics via random walk sampling. More specifically, we generalize the Hansen-Hurwitz estimator and the Horvitz-Thompson estimator to estimate the shortest path length distribution (SPLD), closeness centrality ranking, and clustering coefficients of a network. These are important metrics of a network, but...
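For reference, the classical Horvitz-Thompson estimator being generalized estimates a population total from a sample S with inclusion probabilities \pi_i as

\[
\hat{T}_{\mathrm{HT}} = \sum_{i \in S} \frac{y_i}{\pi_i},
\]

and under stationary random walk sampling of an undirected graph, the visit probability of node i is proportional to its degree d_i, which yields the familiar degree-weighted mean \hat{\mu} = \big(\sum_{i \in S} y_i / d_i\big) / \big(\sum_{i \in S} 1 / d_i\big). How the thesis extends these ideas to the SPLD and the other metrics is cut off in the excerpt.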
This dissertation proposes an oracle efficient estimator in the context of a sparse linear model. Chapter 1 introduces the penalty and the estimator that optimizes a penalized least squares objective. Unlike existing methods, the penalty is once-differentiable, and hence the estimator does not engage in model selection. This...
Language models are the foundation of many natural language tasks such as machine translation, speech recognition, and dialogue systems. Modeling the probability distributions of text accurately helps capture the structures of language and extract valuable information contained in various corpora. In recent years, many advanced models have achieved state-of-the-art performance...
The logistics of policy implementation can delay when the actual change in behavior occurs, producing a shift in a time series. Using change point analysis allows the data to determine where a change in mean, or in other parameters, occurred. But when policy is implemented...
Materials science has been central to human advancement since time immemorial. There has always been curiosity around studying the processes required to extract materials, examine their structure, and ultimately tailor their properties to meet human needs. Over the last few centuries, the ability to tailor material properties was driven by...
This dissertation consists of three papers on methods for meta-analysis with few studies. These papers are concerned with proper inference from meta-analysis models that combine data from a small number of studies using fixed and random-effects models. Chapter 1 provides an introduction to meta-analysis, the motivation for this work and...
Literature screening is the process of identifying all relevant records from a pool of candidate paper records in systematic reviews, meta-analyses, and other research synthesis tasks. This process is time-consuming, expensive, and prone to human error. Screening prioritization methods attempt to help reviewers identify the most relevant records while only...
Computer simulation experiments are commonly used as an inexpensive alternative to real-world experiments to form a metamodel that approximates the input-output relationship of the real-world experiment. The metamodel can be useful for decision making and for predicting at inputs that have not yet been evaluated, since it can be evaluated...
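The excerpt does not name a metamodel family, but Gaussian process regression (kriging) is a common choice in simulation metamodeling. The following is a minimal sketch under that assumption, with a toy simulator standing in for the real experiment:

    # A minimal metamodeling sketch, assuming a Gaussian process (kriging)
    # surrogate; the dissertation's actual metamodel family is not stated
    # in the excerpt, and the simulator below is a stand-in.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def expensive_simulation(x):
        # stand-in for an expensive simulation run
        return np.sin(3 * x) + 0.1 * x ** 2

    # a small design of simulation runs
    X_train = np.linspace(0.0, 2.0, 8).reshape(-1, 1)
    y_train = expensive_simulation(X_train).ravel()

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
    gp.fit(X_train, y_train)

    # the fitted metamodel predicts (with uncertainty) at unevaluated inputs
    X_new = np.array([[0.7], [1.3]])
    mean, std = gp.predict(X_new, return_std=True)

The cheap-to-evaluate surrogate gp then substitutes for expensive_simulation in downstream prediction and decision-making tasks.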
The spatial autoregressive model has been widely applied in science, in areas such as economics, public finance, political science, agricultural economics, environmental studies and transportation analyses. The classical spatial autoregressive model is a linear model for describing spatial correlation. In this work, we expand the classical model to include time...
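The classical (linear) spatial autoregressive model referenced here is typically written as

\[
y = \rho W y + X \beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I),
\]

where W is a known spatial weight matrix (often row-standardized) and \rho measures the strength of spatial correlation; the time-dependent extension this work develops is cut off in the excerpt.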
This dissertation studies the small dispersion asymptotics in highly stratified models. My goal is to show that accurate inferences are possible even if s, the number of strata, is large while m, the number of observations within each stratum, is small, provided that the model "fit well" in the term...
Many methods have been proposed for estimating the number, $m_0$ (or the proportion, $\pi_0$), of the true null hypotheses for adaptively controlling a type I error rate (e.g., the false discovery rate or FDR) using a multiple test procedure. Most of these methods eliminate ``significantly'' non-null $p$-values. Then $m_0$ is...
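One well-known member of this class (shown for illustration; the excerpt does not say which estimators the dissertation analyzes) is Storey's estimator, which discards the small, significant-looking $p$-values below a threshold $\lambda$ and scales up the count of the rest:

\[
\hat{\pi}_0(\lambda) = \frac{\#\{i : p_i > \lambda\}}{(1-\lambda)\, m}, \qquad \hat{m}_0 = m \, \hat{\pi}_0(\lambda),
\]

using the fact that truly null $p$-values are uniform on $(0,1)$, so roughly $(1-\lambda)\, m_0$ of them should exceed $\lambda$.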
Small area estimation (SAE) has been one of the most active areas in survey methodology research, due to the increasing demand for small area statistics from government agencies and the private sector. But in some areas of interest, sample sizes could be very small, or even zero, in which case,...
One of the most commonly used techniques for classification problems is logistic regression. For example, logistic regression for a binary response assumes that the odds Pr(y = 1|x)/Pr(y = 0|x) = exp(a + bx). However, in reality, the pattern of the data can be so complicated that the logistic regression model often fails,...
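A minimal sketch of this binary logistic model in Python (the simulated data and parameter values are illustrative, not from the dissertation): because the log-odds are linear in x, the fitted intercept and slope recover a and b.

    # Minimal logistic regression sketch; data are simulated for illustration.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 1))
    # simulate from the model with illustrative values a = 0.5, b = 2.0
    p = 1 / (1 + np.exp(-(0.5 + 2.0 * x[:, 0])))
    y = rng.binomial(1, p)

    model = LogisticRegression().fit(x, y)
    a_hat, b_hat = model.intercept_[0], model.coef_[0, 0]
    # fitted odds: Pr(y=1|x) / Pr(y=0|x) ~= exp(a_hat + b_hat * x)

When the true decision boundary is nonlinear in x, this linear-log-odds assumption is exactly what fails, motivating the more flexible models the abstract alludes to.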
The use of cluster randomized experiments to study the effects of treatments on groups of subjects has increased in recent years. Many of these experiments lack the necessary statistical power to detect practically meaningful effects of treatment. One method for improving power in cluster randomized experiments that has been advanced...
In recent years, research has been conducted to develop Sequential, Multiple Assignment, Randomized Trial (SMART) designs. These experimental designs were created to aid in the construction of adaptive treatment strategies for individuals, particularly in medical contexts. Simultaneously, research has been done on developing the use of randomized trials to evaluate...
High-dimensional data are becoming increasingly available in various fields as data collection technology advances. Not only are we interested in knowing which variables are relevant to the response and which are not, but a simpler model with fewer predictor variables is also easier to interpret and cheaper to compute. Furthermore, a...
The last two decades have seen a surge of interest in approaches that leverage network structure in machine learning models. For many networks, not only the connections of the network but also network attributes, such as node attributes and dyadic attributes, are observed. This heterogeneity in networks raises new challenges...
RNA-Sequencing (RNA-Seq) is a powerful high-throughput tool for profiling transcriptional activity in cells. The observed read counts can be biased by various factors such that they do not accurately represent the true relative abundance of mRNA transcripts. Normalization is a critical step to ensure unbiased comparison of gene expression...