Topics in Microbiome Data Analysis: Normalization and Differential Abundance Test and Large-Scale Human Microbe-Disease Association Prediction

The advent of sequencing technologies has generated large amounts of biological and medical data. These data, such as genetic sequencing data and laboratory experimental evidence, can help researchers understand critical biomedical problems. This dissertation makes contributions to three different but related applications in biomedical research. In Chapter 2, we propose a procedure that uses a variety of variable selection approaches to select informative genetic variants for identifying clinically meaningful subtypes of hypertensive patients. Two subgroups are identified for the African American and Caucasian participants enrolled in the Hypertension Genetic Epidemiology Network study. The identified genetically based subtypes are clinically meaningful: cardiac mechanics differ significantly between the two subtypes. In Chapter 3, we develop a novel framework for differential abundance analysis of sparse, high-dimensional marker-gene microbiome abundance data. The framework consists of a novel network-based normalization approach and a two-stage zero-inflated mixture count regression model. Large-scale simulation studies and a case analysis show the superior performance of our framework. In Chapter 4, we focus on large-scale recommendation of candidate microbes for diseases using experimentally verified microbe-disease association data. We formulate microbe-disease association prediction as a link prediction problem on a multiplex heterogeneous network and propose an end-to-end graph convolutional neural network model based on network node representation learning. Because a microbe can be either reduced or elevated under the impact of a disease, our model not only predicts whether an association exists between a microbe and a disease but also specifies the type of that association. Comprehensive cross-validation studies and case analyses demonstrate the effectiveness and superiority of our model.