Interpretable CRISPR/Cas9 off-target activities with mismatches and indels prediction using BERT.
Comput Biol Med
; 169: 107932, 2024 Feb.
Article
em En
| MEDLINE
| ID: mdl-38199209
ABSTRACT
Off-target effects of CRISPR/Cas9 can lead to suboptimal genome editing outcomes. Numerous deep learning-based approaches have achieved excellent performance for off-target prediction; however, few can predict the off-target activities with both mismatches and indels between single guide RNA (sgRNA) and target DNA sequence pair. In addition, data imbalance is a common pitfall for off-target prediction. Moreover, due to the complexity of genomic contexts, generating an interpretable model also remains challenged. To address these issues, firstly we developed a BERT-based model called CRISPR-BERT for enhancing the prediction of off-target activities with both mismatches and indels. Secondly, we proposed an adaptive batch-wise class balancing strategy to combat the noise exists in imbalanced off-target data. Finally, we applied a visualization approach for investigating the generalizable nucleotide position-dependent patterns of sgRNA-DNA pair for off-target activity. In our comprehensive comparison to existing methods on five mismatches-only datasets and two mismatches-and-indels datasets, CRISPR-BERT achieved the best performance in terms of AUROC and PRAUC. Besides, the visualization analysis demonstrated how implicit knowledge learned by CRISPR-BERT facilitates off-target prediction, which shows potential in model interpretability. Collectively, CRISPR-BERT provides an accurate and interpretable framework for off-target prediction, further contributes to sgRNA optimization in practical use for improved target specificity in CRISPR/Cas9 genome editing. The source code is available at https//github.com/BrokenStringx/CRISPR-BERT.
Palavras-chave
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Assunto principal:
Sistemas CRISPR-Cas
/
RNA Guia de Sistemas CRISPR-Cas
Tipo de estudo:
Prognostic_studies
/
Risk_factors_studies
Idioma:
En
Revista:
Comput Biol Med
Ano de publicação:
2024
Tipo de documento:
Article