AVADA: toward automated pathogenic variant evidence retrieval directly from the full-text literature.

Birgmeier, Johannes; Deisseroth, Cole A; Hayward, Laura E; Galhardo, Luisa M T; Tierno, Andrew P; Jagadeesh, Karthik A; Stenson, Peter D; Cooper, David N; Bernstein, Jonathan A; Haeussler, Maximilian; Bejerano, Gill

Affiliation

Birgmeier J; Department of Computer Science, Stanford University, Stanford, CA, USA.
Deisseroth CA; Department of Computer Science, Stanford University, Stanford, CA, USA.
Hayward LE; Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.
Galhardo LMT; Department of Computer Science, Stanford University, Stanford, CA, USA.
Tierno AP; Department of Computer Science, Stanford University, Stanford, CA, USA.
Jagadeesh KA; Department of Computer Science, Stanford University, Stanford, CA, USA.
Stenson PD; Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK.
Cooper DN; Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK.
Bernstein JA; Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA.
Haeussler M; Santa Cruz Genomics Institute, MS CBSE, University of California Santa Cruz, Santa Cruz, CA, USA.
Bejerano G; Department of Computer Science, Stanford University, Stanford, CA, USA. bejerano@stanford.edu.

Genet Med; 22(2): 362-370, 2020 Feb.

Article in English | MEDLINE | ID: mdl-31467448

ABSTRACT

PURPOSE: Both monogenic pathogenic variant cataloging and clinical patient diagnosis start with variant-level evidence retrieval, followed by expert evidence integration in search of diagnostic variants and genes. Here, we aim to accelerate pathogenic variant evidence retrieval with an automatic approach.

METHODS: Automatic VAriant evidence DAtabase (AVADA) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic genetic variant evidence in full-text primary literature about monogenic disease and convert it to genomic coordinates.

RESULTS: AVADA automatically retrieved almost 60% of likely disease-causing variants deposited in the Human Gene Mutation Database (HGMD), a 4.4-fold improvement over the current best open source automated variant extractor. AVADA contains over 60,000 likely disease-causing variants that are in HGMD but not in ClinVar. AVADA also highlights the challenges of automated variant mapping and pathogenicity curation. However, when combined with manual validation on 245 diagnosed patients, AVADA provides valuable evidence for an additional 18 diagnostic variants on top of ClinVar's 21, versus only 2 using the best current automated approach.

CONCLUSION: AVADA advances automated retrieval of pathogenic monogenic variant evidence from full-text literature. Although far from perfect, it is much faster than PubMed/Google Scholar search, and careful curation of AVADA-retrieved evidence can aid both database curation and patient diagnosis.
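
To make the retrieval step described under METHODS more concrete, the following is a minimal sketch of spotting HGVS-style variant mentions in free text. It is not AVADA's method: the abstract describes a machine learning tool, whereas this sketch uses two hand-written regular expressions. The names `CDNA_SUB`, `PROTEIN_SUB`, and `find_variant_mentions` are hypothetical, the patterns cover only simple cDNA substitutions and three-letter protein substitutions, and the conversion to genomic coordinates (which would need a gene and transcript model) is not attempted.

```python
import re

# Hypothetical, deliberately narrow patterns (not taken from AVADA):
# cDNA substitutions such as "c.1138G>A" and three-letter protein
# substitutions such as "p.Gly380Arg".
CDNA_SUB = re.compile(r"\bc\.(\d+)([ACGT])>([ACGT])\b")
PROTEIN_SUB = re.compile(r"\bp\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\b")


def find_variant_mentions(text):
    """Return a list of dicts, one per variant mention found in text.

    cDNA mentions are listed first, then protein mentions, each group
    in order of appearance.
    """
    mentions = []
    for m in CDNA_SUB.finditer(text):
        mentions.append({
            "kind": "cdna",
            "pos": int(m.group(1)),
            "ref": m.group(2),
            "alt": m.group(3),
            "text": m.group(0),
        })
    for m in PROTEIN_SUB.finditer(text):
        mentions.append({
            "kind": "protein",
            "pos": int(m.group(2)),
            "ref": m.group(1),
            "alt": m.group(3),
            "text": m.group(0),
        })
    return mentions


if __name__ == "__main__":
    # Illustrative sentence, not drawn from the AVADA corpus.
    sentence = "We identified c.1138G>A (p.Gly380Arg) in FGFR3."
    for mention in find_variant_mentions(sentence):
        print(mention)
```

A pattern-based extractor like this misses the many non-standard ways variants are written in full-text articles, and it cannot tell which gene or transcript a mention refers to. That gap is one reason a learned approach and careful manual curation, as the abstract stresses, are needed.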

Subjects

Electronic Data Processing/methods; Genomics/methods; Information Storage and Retrieval/methods; Data Management/methods; Databases, Factual; Databases, Genetic; Humans; Natural Language Processing; PubMed; Publications

Keywords

automatic variant retrieval; full-text extraction; machine learning; natural language processing; variants database

Full text: 1 | Database: MEDLINE | Main subject: Electronic Data Processing / Information Storage and Retrieval / Genomics | Type of study: Guideline | Limits: Humans | Language: En | Year of publication: 2020 | Document type: Article
