Development and Application of Innovative Algorithms for Transcriptome Profiling

Transcriptome profiling, including whole-genome bulk and single-cell RNA sequencing (RNA-seq), enables the quantitative study of transcript changes at either the tissue or the single-cell level. With advances in next-generation sequencing technology, it is now part of routine lab practice. More than for other high-throughput technologies, computational algorithms are required to analyze transcriptomic signals in order to identify, without bias, genes or cell groups that are associated with complex diseases or biological processes. However, the computational pipelines that handle these raw data still lag behind our ability to generate them. Thus, there is a crucial need to develop and apply innovative, theory-grounded algorithms in the RNA-seq analysis pipeline. In the first and second parts of this thesis, we present applications of machine learning and network analysis algorithms to either bulk or single-cell RNA-seq analysis. These examples demonstrate the power of transcriptome profiling as a revolutionary tool to provide meaningful insights into systems that cannot be easily accessed by traditional biological approaches. Because algorithm development in transcriptome profiling, especially in single-cell RNA-seq (scRNA-seq) analysis, is still in its infancy, the reproducibility and accuracy of current analysis pipelines remain a challenge. For example, cellular classification algorithms continue to be evaluated on datasets whose cell type labels were generated by previous computational analyses, which introduces circularity. Therefore, there is an urgent need to develop automatic and reproducible tools for scRNA-seq analysis. In the last part of this thesis, we present an innovative framework for scRNA-seq analysis that includes constructing a gold-standard benchmarking dataset, applying information-theory-based feature selection, and developing “TopicMapping”, an unsupervised machine learning algorithm for cellular classification and trajectory prediction. All of our approaches require no prior biological knowledge, need no tissue- or organism-specific adjustments, and involve minimal parameter tuning. Our framework yields a dramatic improvement in accuracy compared with current scRNA-seq analysis workflows.
