1.
Bioinformatics ; 34(1): 80-87, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28968638

ABSTRACT

Motivation: Despite significant efforts in expert curation, clinical relevance about most of the 154 million dbSNP reference variants (RS) remains unknown. However, a wealth of knowledge about the variant biological function/disease impact is buried in unstructured literature data. Previous studies have attempted to harvest and unlock such information with text-mining techniques but are of limited use because their mutation extraction results are not standardized or integrated with curated data.

Results: We propose an automatic method to extract and normalize variant mentions to unique identifiers (dbSNP RSIDs). Our method, in benchmarking results, demonstrates a high F-measure of ∼90% and compared favorably to the state of the art. Next, we applied our approach to the entire PubMed and validated the results by verifying that each extracted variant-gene pair matched the dbSNP annotation based on mapped genomic position, and by analyzing variants curated in ClinVar. We then determined which text-mined variants and genes constituted novel discoveries. Our analysis reveals 41 889 RS numbers (associated with 9151 genes) not found in ClinVar. Moreover, we obtained a rich set worth further review: 12 462 rare variants (MAF ≤ 0.01) in 3849 genes which are presumed to be deleterious and not frequently found in the general population. To our knowledge, this is the first large-scale study to analyze and integrate text-mined variant data with curated knowledge in existing databases. Our results suggest that databases can be significantly enriched by text mining and that the combined information can greatly assist human efforts in evaluating/prioritizing variants in genomic research.

Availability and implementation: The tmVar 2.0 source code and corpus are freely available at https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmvar/.

Contact: zhiyong.lu@nih.gov.
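
The filtering described in the abstract (keep text-mined variants that are absent from ClinVar and have MAF ≤ 0.01) can be illustrated with a minimal Python sketch. This is not the tmVar 2.0 implementation: the regular expression only catches literal rsID mentions, whereas the paper also normalizes other mention styles, and the `Variant` record, the function names, the gene labels, and the rsIDs below are all hypothetical placeholders invented for the example.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Assumption: only literal "rs<digits>" mentions are handled here.
RSID_PATTERN = re.compile(r"\brs(\d+)\b", re.IGNORECASE)


def extract_rsids(text: str) -> List[str]:
    """Return unique rsIDs in order of first appearance, lowercased."""
    seen = {}
    for match in RSID_PATTERN.finditer(text):
        seen.setdefault("rs" + match.group(1), None)
    return list(seen)


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one text-mined variant."""
    rsid: str
    gene: str
    maf: Optional[float]  # minor allele frequency; None if unknown


def novel_rare_variants(
    mined: Iterable[Variant],
    clinvar_rsids: Set[str],
    maf_cutoff: float = 0.01,
) -> List[Variant]:
    """Keep variants absent from ClinVar whose MAF is known and <= cutoff."""
    return [
        v for v in mined
        if v.rsid not in clinvar_rsids
        and v.maf is not None
        and v.maf <= maf_cutoff
    ]


# Placeholder data, not real annotations.
mined = [
    Variant("rs0000001", "GENE_A", 0.005),
    Variant("rs0000002", "GENE_B", 0.200),
    Variant("rs0000003", "GENE_C", 0.001),
    Variant("rs0000004", "GENE_D", None),
]
clinvar = {"rs0000003"}
selected = novel_rare_variants(mined, clinvar)
```

With the placeholder data above, only the first record passes both filters: the second is too common, the third is already in the ClinVar set, and the fourth has no frequency to compare.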


Subjects
Data Mining/methods , Mutation , Genetic Polymorphism , Precision Medicine/methods , Software , Data Curation , Factual Databases , Genetic Predisposition to Disease , Genomics/methods , Humans , Phenotype , PubMed , Publications
2.
Hum Mutat ; 28(6): 554-62, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17326095

ABSTRACT

PhenCode (Phenotypes for ENCODE; http://www.bx.psu.edu/phencode) is a collaborative, exploratory project to help understand phenotypes of human mutations in the context of sequence and functional data from genome projects. Currently, it connects human phenotype and clinical data in various locus-specific databases (LSDBs) with data on genome sequences, evolutionary history, and function from the ENCODE project and other resources in the UCSC Genome Browser. Initially, we focused on a few selected LSDBs covering genes encoding alpha- and beta-globins (HBA, HBB), phenylalanine hydroxylase (PAH), blood group antigens (various genes), androgen receptor (AR), cystic fibrosis transmembrane conductance regulator (CFTR), and Bruton's tyrosine kinase (BTK), but we plan to include additional loci of clinical importance, ultimately genomewide. We have also imported variant data and associated OMIM links from Swiss-Prot. Users can find interesting mutations in the UCSC Genome Browser (in a new Locus Variants track) and follow links back to the LSDBs for more detailed information. Alternatively, they can start with queries on mutations or phenotypes at an LSDB and then display the results at the Genome Browser to view complementary information such as functional data (e.g., chromatin modifications and protein binding from the ENCODE consortium), evolutionary constraint, regulatory potential, and/or any other tracks they choose. We present several examples illustrating the power of these connections for exploring phenotypes associated with functional elements, and for identifying genomic data that could help to explain clinical phenotypes.


Subjects
Genetic Databases , Mutation , Phenotype , Agammaglobulinaemia Tyrosine Kinase , Blood Group Antigens/genetics , Cooperative Behavior , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Genetic Databases/standards , Genotype , Globins/genetics , Humans , Internet , Phenylalanine Hydroxylase/genetics , Protein-Tyrosine Kinases/genetics , Androgen Receptors/genetics , Software Design , Systems Integration