StrVCTVRE: A supervised learning method to predict the pathogenicity of human genome structural variants.
Sharo, Andrew G; Hu, Zhiqiang; Sunyaev, Shamil R; Brenner, Steven E.
Affiliation
  • Sharo AG; Biophysics Graduate Group, University of California, Berkeley, Berkeley, CA 94720, USA; Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA. Electronic address: sharo@compbio.berkeley.edu.
  • Hu Z; Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA.
  • Sunyaev SR; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA.
  • Brenner SE; Biophysics Graduate Group, University of California, Berkeley, Berkeley, CA 94720, USA; Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA. Electronic ad
Am J Hum Genet; 109(2): 195-209, 2022 Feb 03.
Article in English | MEDLINE | ID: mdl-35032432
Whole-genome sequencing resolves many clinical cases where standard diagnostic methods have failed. However, at least half of these cases remain unresolved after whole-genome sequencing. Structural variants (SVs; genomic variants larger than 50 base pairs) of uncertain significance are the genetic cause of a portion of these unresolved cases. As sequencing methods using long or linked reads become more accessible and SV detection algorithms improve, clinicians and researchers are gaining access to thousands of reliable SVs of unknown disease relevance. Methods to predict the pathogenicity of these SVs are required to realize the full diagnostic potential of long-read sequencing. To address this emerging need, we developed StrVCTVRE to distinguish pathogenic SVs from benign SVs that overlap exons. In a random forest classifier, we integrated features that capture gene importance, coding region, conservation, expression, and exon structure. We found that features such as expression and conservation are important but are absent from SV classification guidelines. We leveraged multiple resources to construct a size-matched training set of rare, putatively benign and pathogenic SVs. StrVCTVRE performs accurately across a wide SV size range on independent test sets, which will allow clinicians and researchers to eliminate about half of SVs from consideration while retaining a 90% sensitivity. We anticipate clinicians and researchers will use StrVCTVRE to prioritize SVs in probands where no SV is immediately compelling, empowering deeper investigation into novel SVs to resolve cases and understand new mechanisms of disease. StrVCTVRE runs rapidly and is publicly available.
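The trade-off stated in the abstract (discarding about half of SVs while keeping 90% sensitivity) amounts to choosing a score cutoff on a classifier's output. The sketch below illustrates that calculation only. It is not StrVCTVRE's code: the function names `cutoff_for_sensitivity`, `sensitivity_at`, and `fraction_eliminated` are hypothetical, and the scores and labels are made-up toy values, not results from the paper.

```python
import math


def cutoff_for_sensitivity(scores, labels, target=0.90):
    """Return the highest score cutoff that keeps at least `target` of the
    pathogenic variants (label 1) at or above the cutoff."""
    pathogenic = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pathogenic:
        raise ValueError("need at least one pathogenic example")
    # Number of pathogenic variants that must score at or above the cutoff.
    # The small epsilon guards against float error when target * n is whole.
    keep = max(1, math.ceil(target * len(pathogenic) - 1e-9))
    return pathogenic[keep - 1]


def sensitivity_at(scores, labels, cutoff):
    """Fraction of pathogenic variants scoring at or above the cutoff."""
    pathogenic = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= cutoff for s in pathogenic) / len(pathogenic)


def fraction_eliminated(scores, cutoff):
    """Fraction of all variants scoring below the cutoff (dropped from review)."""
    return sum(s < cutoff for s in scores) / len(scores)


# Toy data (hypothetical): 10 pathogenic (1) and 10 benign (0) variants with
# made-up classifier scores between 0 and 1.
toy_scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.20,
              0.70, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.15, 0.10, 0.05]
toy_labels = [1] * 10 + [0] * 10

toy_cutoff = cutoff_for_sensitivity(toy_scores, toy_labels, target=0.90)
print("cutoff:", toy_cutoff)
print("sensitivity:", sensitivity_at(toy_scores, toy_labels, toy_cutoff))
print("fraction eliminated:", fraction_eliminated(toy_scores, toy_cutoff))
```

In practice the cutoff would be chosen on held-out data rather than on the variants being prioritized, so that the reported sensitivity is not optimistic.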

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Algorithms / Software / Genome, Human / Genomic Structural Variation / Supervised Machine Learning | Type of study: Guideline / Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: En | Journal: Am J Hum Genet | Publication year: 2022 | Document type: Article