LYRUS: a machine learning model for predicting the pathogenicity of missense variants.

Lai, Jiaying; Yang, Jordan; Gamsiz Uzun, Ece D; Rubenstein, Brenda M; Sarkar, Indra Neil

Affiliation

Lai J; Center for Biomedical Informatics, Brown University, Providence, RI 02903, USA.
Yang J; Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA.
Gamsiz Uzun ED; Department of Chemistry, Brown University, Providence, RI 02906, USA.
Rubenstein BM; Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA.
Sarkar IN; Department of Pathology and Laboratory Medicine, Brown University Alpert Medical School, Providence, RI 02903, USA.

Bioinform Adv; 2(1): vbab045, 2022.

Article in En | MEDLINE | ID: mdl-35036922

ABSTRACT

SUMMARY: Single amino acid variations (SAVs) are a primary contributor to variations in the human genome. Identifying pathogenic SAVs can provide insights into the genetic architecture of complex diseases. Most approaches for predicting the functional effects or pathogenicity of SAVs rely on either sequence or structural information. This study presents "Lai Yang Rubenstein Uzun Sarkar" (LYRUS), a machine learning method that uses an XGBoost classifier to predict the pathogenicity of SAVs. LYRUS incorporates five sequence-based, six structure-based and four dynamics-based features. Uniquely, LYRUS includes a newly proposed sequence co-evolution feature called the variation number. LYRUS was trained on a dataset containing 4363 protein structures corresponding to 22 639 SAVs from the ClinVar database, and tested on the VariBench testing dataset. Performance analysis showed that LYRUS performed comparably to current variant effect predictors. LYRUS's performance was also benchmarked against six Deep Mutational Scanning datasets for PTEN and TP53.

AVAILABILITY AND IMPLEMENTATION: LYRUS is freely available and the source code can be found at https://github.com/jiaying2508/LYRUS.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
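The feature layout stated in the summary (five sequence-based, six structure-based and four dynamics-based features per SAV) can be illustrated with a small sketch. This is not the LYRUS code: the class name `SavFeatures`, its field names and the zero-valued example are hypothetical, and the abstract names only one individual feature (the variation number), so the features are treated here as unnamed numeric slots.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Group sizes as stated in the abstract: 5 sequence, 6 structure, 4 dynamics.
N_SEQUENCE, N_STRUCTURE, N_DYNAMICS = 5, 6, 4


@dataclass(frozen=True)
class SavFeatures:
    """Hypothetical container for the features of one SAV."""

    sequence: Sequence[float]   # includes the variation number
    structure: Sequence[float]
    dynamics: Sequence[float]

    def __post_init__(self) -> None:
        # Reject rows whose group sizes do not match the abstract.
        expected = (
            ("sequence", N_SEQUENCE),
            ("structure", N_STRUCTURE),
            ("dynamics", N_DYNAMICS),
        )
        for name, size in expected:
            if len(getattr(self, name)) != size:
                raise ValueError(f"{name} must contain {size} values")

    def as_row(self) -> List[float]:
        """Flatten the three groups into one 15-value feature row."""
        return [*self.sequence, *self.structure, *self.dynamics]


# Placeholder values only; real features would be computed per variant.
example = SavFeatures(
    sequence=[0.0] * N_SEQUENCE,
    structure=[0.0] * N_STRUCTURE,
    dynamics=[0.0] * N_DYNAMICS,
)
row = example.as_row()
```

Rows built this way could be stacked into a matrix and passed to a gradient-boosted classifier such as `XGBClassifier` from the `xgboost` Python package, the kind of model the summary names. The hyperparameters and exact feature definitions used by LYRUS are not given in this record.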

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Journal: Bioinform Adv | Year: 2022 | Document type: Article | Affiliation country: United States | Country of publication: United Kingdom
