Search | Biblioteca Virtual em Saúde

tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine.

Wei, Chih-Hsuan; Phan, Lon; Feltz, Juliana; Maiti, Rama; Hefferon, Tim; Lu, Zhiyong.

Bioinformatics; 34(1): 80-87, 2018 Jan 01.

Article in English | MEDLINE | ID: mdl-28968638

ABSTRACT

Motivation: Despite significant efforts in expert curation, the clinical relevance of most of the 154 million dbSNP reference variants (RS) remains unknown. However, a wealth of knowledge about the variant biological function/disease impact is buried in unstructured literature data. Previous studies have attempted to harvest and unlock such information with text-mining techniques but are of limited use because their mutation extraction results are not standardized or integrated with curated data. Results: We propose an automatic method to extract and normalize variant mentions to unique identifiers (dbSNP RSIDs). Our method, in benchmarking results, demonstrates a high F-measure of ~90% and compares favorably to the state of the art. Next, we applied our approach to the entire PubMed and validated the results by verifying that each extracted variant-gene pair matched the dbSNP annotation based on mapped genomic position, and by analyzing variants curated in ClinVar. We then determined which text-mined variants and genes constituted novel discoveries. Our analysis reveals 41 889 RS numbers (associated with 9151 genes) not found in ClinVar. Moreover, we obtained a rich set worth further review: 12 462 rare variants (MAF ≤ 0.01) in 3849 genes which are presumed to be deleterious and not frequently found in the general population. To our knowledge, this is the first large-scale study to analyze and integrate text-mined variant data with curated knowledge in existing databases. Our results suggest that databases can be significantly enriched by text mining and that the combined information can greatly assist human efforts in evaluating/prioritizing variants in genomic research. Availability and implementation: The tmVar 2.0 source code and corpus are freely available at https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmvar/. Contact: zhiyong.lu@nih.gov.

Subjects

Data Mining/methods, Mutation, Genetic Polymorphism, Precision Medicine/methods, Software, Data Curation, Factual Databases, Genetic Predisposition to Disease, Genomics/methods, Humans, Phenotype, PubMed, Publications

PhenCode: connecting ENCODE data with mutations and phenotype.

Giardine, Belinda; Riemer, Cathy; Hefferon, Tim; Thomas, Daryl; Hsu, Fan; Zielenski, Julian; Sang, Yunhua; Elnitski, Laura; Cutting, Garry; Trumbower, Heather; Kern, Andrew; Kuhn, Robert; Patrinos, George P; Hughes, Jim; Higgs, Doug; Chui, David; Scriver, Charles; Phommarinh, Manyphong; Patnaik, Santosh K; Blumenfeld, Olga; Gottlieb, Bruce; Vihinen, Mauno; Väliaho, Jouni; Kent, Jim; Miller, Webb; Hardison, Ross C.

Hum Mutat; 28(6): 554-62, 2007 Jun.

Article in English | MEDLINE | ID: mdl-17326095

ABSTRACT

PhenCode (Phenotypes for ENCODE; http://www.bx.psu.edu/phencode) is a collaborative, exploratory project to help understand phenotypes of human mutations in the context of sequence and functional data from genome projects. Currently, it connects human phenotype and clinical data in various locus-specific databases (LSDBs) with data on genome sequences, evolutionary history, and function from the ENCODE project and other resources in the UCSC Genome Browser. Initially, we focused on a few selected LSDBs covering genes encoding alpha- and beta-globins (HBA, HBB), phenylalanine hydroxylase (PAH), blood group antigens (various genes), androgen receptor (AR), cystic fibrosis transmembrane conductance regulator (CFTR), and Bruton's tyrosine kinase (BTK), but we plan to include additional loci of clinical importance, ultimately genomewide. We have also imported variant data and associated OMIM links from Swiss-Prot. Users can find interesting mutations in the UCSC Genome Browser (in a new Locus Variants track) and follow links back to the LSDBs for more detailed information. Alternatively, they can start with queries on mutations or phenotypes at an LSDB and then display the results at the Genome Browser to view complementary information such as functional data (e.g., chromatin modifications and protein binding from the ENCODE consortium), evolutionary constraint, regulatory potential, and/or any other tracks they choose. We present several examples illustrating the power of these connections for exploring phenotypes associated with functional elements, and for identifying genomic data that could help to explain clinical phenotypes.

Subjects

Genetic Databases, Mutation, Phenotype, Agammaglobulinaemia Tyrosine Kinase, Blood Group Antigens/genetics, Cooperative Behavior, Cystic Fibrosis Transmembrane Conductance Regulator/genetics, Genetic Databases/standards, Genotype, Globins/genetics, Humans, Internet, Phenylalanine Hydroxylase/genetics, Protein-Tyrosine Kinases/genetics, Androgen Receptors/genetics, Software Design, Systems Integration
