Modern data sets are increasingly vast, not only in the number of samples, but also in the number of measurements, or features, that they contain. This high-dimensionality poses a unique set of problems for data analysis due to a set of phenomena known as ``the curse of dimensionality.'' This thesis...
Annual age-adjusted breast cancer incidence rates in the United States have been static for decades. More recently, the development of massively parallel, high throughput DNA sequencing has enabled the cataloging of somatic mutations in cancer. Mutations are non-random and occur within sequence motifs. These motifs provide us with evidence to...