Statistical methods for the network-based analysis of genomic data

The focus of this thesis is on evaluating, designing, and applying statistical methods that elucidate molecular mechanisms by seeking to understand the pathways that contribute to disease. Chapter 1 introduces the field and motivates the work in this thesis. Chapters 2, 3, and 4 describe original work. Chapter 5 recapitulates our findings in the context of the field. Chapter 2 outlines a novel evaluation framework for pathway analysis methods. The key idea is that analysis techniques that correctly identify disease-associated pathways should find them across different datasets that measure the same underlying conditions. Therefore, we apply eight network-based pathway analysis techniques to ten different ovarian cancer studies that have been curated to ensure comparability, and we evaluate the methods by their cross-study concordance. This approach allows us to evaluate the methods with real (rather than artificial) data. Chapter 3 presents a new analysis method that integrates expression data and network information in a novel procedure to detect genes that appear to influence nearby genes with disease-associated dysregulation. Applying our algorithm to real expression data, we show that our method can identify biologically relevant genes, integrate pathway and expression data, and yield more reproducible results across multiple studies of the same phenotype than competing methods. Chapter 4 concerns original mouse cell line time-series expression data and the statistical analysis of that data to study the development of acute myeloid leukemia (AML) from severe congenital neutropenia (SCN). From this data, we seek to identify the sources of dysregulation in a mutant GCSF background. The key idea of our approach is to combine our data with an interaction network specific to the context of hematopoiesis. To infer a network that is independent of our data, we apply the semi-supervised method iRafNet to data from GCSF-responsive cells available through Haemopedia. We then use our method, GeneSurrounder, to find the genes on the network that are the sources of differential time-course profiles. Together, these analyses establish a network-based approach to glean mechanistic insights from transcriptomic data by identifying dysregulated pathways and the sources of dysregulation on those pathways.
