Prediction of CRISPR-Cas9 Cleavage Efficiency Through Markov Feature Engineering and Boosting-Based Transfer Learning

In the short amount of time that genetic manipulation has been possible through CRISPR technology, myriad applications have been developed. Results from one of the most promising applications of this technology, pooled screens, have shown that single guide RNAs (sgRNAs), RNA sequences used to target specific regions of the genome, vary in their ability to produce mutations through DNA cleavage. A variety of features have been linked to sgRNA efficiency, including features derived from target site nucleotide compositions, thermodynamic properties of the sgRNAs, and epigenetic features. In this work, an optimized machine learning pipeline is proposed for the prediction of CRISPR-Cas9 cleavage efficiency. Feature engineering using Markov models, the gradient boosting learner LightGBM, and a boosting-based transfer learning framework are employed to produce a model that obtains state-of-the-art results for the prediction of wild-type CRISPR-Cas9 efficiency in U6 promoter-based systems, even compared to recent neural network architectures. We term this model BoostMEC (Boosting and Markov for Efficient CRISPR). We further validate BoostMEC by evaluating it against competing models on various external datasets. Before introducing the model, we review relevant research and provide background for the learners used in the model.
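To make the idea of Markov-based feature engineering concrete, the following is a minimal sketch of one plausible form such a feature could take: a log-odds score from two first-order Markov chains, one fit on efficient guides and one on inefficient guides. This is an illustration under stated assumptions, not the BoostMEC implementation. The function names (`fit_markov`, `markov_log_odds`), the chain order, the pseudocount, and the toy sequences are all hypothetical and do not come from the work itself.

```python
import math

ALPHABET = "ACGT"


def fit_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev).

    Additive smoothing (the pseudocount) keeps every probability non-zero.
    The order and the smoothing value are illustrative assumptions.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1.0
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs


def log_likelihood(seq, probs):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(probs[p][n]) for p, n in zip(seq, seq[1:]))


def markov_log_odds(seq, high_model, low_model):
    """Positive when seq resembles the 'efficient' training set more closely."""
    return log_likelihood(seq, high_model) - log_likelihood(seq, low_model)


# Toy, made-up sequences standing in for efficient / inefficient guides.
high_seqs = ["GGGGCCGG", "GGCGGGCC"]
low_seqs = ["ATATTTAA", "TTAATATA"]

high_model = fit_markov(high_seqs)
low_model = fit_markov(low_seqs)

gc_score = markov_log_odds("GGGC", high_model, low_model)
at_score = markov_log_odds("ATAT", high_model, low_model)
```

A score of this kind could be appended as one numeric column alongside other sequence-derived features before training a gradient boosting learner. How the features are actually constructed and combined in BoostMEC is described in the full text, not here.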
