Results 1 - 6 of 6
1.
PLoS Comput Biol ; 14(8): e1006390, 2018 08.
Article in English | MEDLINE | ID: mdl-30102703

ABSTRACT

Manually curating biomedical knowledge from publications is necessary to build a knowledge-based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, which is also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often obtains unsatisfactory precision and recall on the retrieved results, and it is difficult to manually generate optimal queries. To address this, we propose a machine-learning assisted triage method. We collect previously curated publications from two databases, UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and use them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient and is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases.


Subject(s)
Data Curation/methods ; Information Storage and Retrieval/methods ; Data Curation/statistics & numerical data ; Databases, Genetic ; Databases, Protein ; Deep Learning ; Genomics ; Knowledge Bases ; Machine Learning ; Publications
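
The abstract of entry 1 compares triage methods by precision and recall and reports precision as a ratio between the machine-assisted and query-based methods. As a minimal sketch of how such a comparison could be computed, the Python below uses hypothetical document IDs and classifier scores; none of the names or numbers come from the paper, and the top-4 cutoff is an arbitrary assumption for illustration.

```python
from typing import Dict, List, Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a set of retrieved document IDs."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def rank_by_score(scores: Dict[str, float]) -> List[str]:
    """Order document IDs from highest to lowest classifier score."""
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Toy data: hypothetical IDs and scores, not taken from the paper.
relevant = {"d1", "d2", "d3"}
query_hits = {"d1", "d2", "d3", "d4", "d5", "d6"}
model_scores = {"d1": 0.95, "d2": 0.90, "d3": 0.80,
                "d4": 0.30, "d5": 0.20, "d6": 0.10}

# Precision and recall of the query-based result set.
query_p, query_r = precision_recall(query_hits, relevant)

# Keep the four highest-scoring documents (assumed cutoff) and re-evaluate.
top_k = set(rank_by_score(model_scores)[:4])
model_p, model_r = precision_recall(top_k, relevant)

# Ratio of the two precisions, analogous in form to the reported "times higher".
precision_gain = model_p / query_p
```

In this toy case the ranked list keeps every relevant document while discarding two irrelevant ones, so precision rises and recall is unchanged.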
2.
Bioinformatics ; 33(21): 3454-3460, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-29036270

ABSTRACT

MOTIVATION: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches. RESULTS: With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture as demonstrated by the fact that 8000-10 000 papers are curated in UniProt each year while curators evaluate 50 000-70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2-3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable. AVAILABILITY AND IMPLEMENTATION: UniProt is freely available at http://www.uniprot.org/. CONTACT: sylvain.poux@sib.swiss. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Data Curation ; Databases, Protein ; Data Curation/statistics & numerical data ; Data Mining ; Databases, Protein/statistics & numerical data ; Humans ; Knowledge Bases ; PubMed/statistics & numerical data ; Review Literature as Topic ; Statistics as Topic
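
The abstract of entry 2 reports 8000-10 000 curated papers against 50 000-70 000 evaluated papers per year. A rough sketch of the implied share of evaluated papers that end up curated is given below. It assumes the two ranges can be combined freely (lowest curated count with highest evaluated count, and vice versa), which the abstract does not state; the function name is hypothetical.

```python
from typing import Tuple


def fraction_bounds(curated: Tuple[int, int], evaluated: Tuple[int, int]) -> Tuple[float, float]:
    """Bound curated/evaluated given (min, max) ranges for each quantity."""
    lowest = curated[0] / evaluated[1]   # fewest curated over most evaluated
    highest = curated[1] / evaluated[0]  # most curated over fewest evaluated
    return lowest, highest


# Yearly figures quoted in the abstract.
low, high = fraction_bounds(curated=(8_000, 10_000), evaluated=(50_000, 70_000))
print(f"Curated share of evaluated papers: {low:.1%} to {high:.1%}")
```

Under that assumption, roughly one in nine to one in five evaluated papers is curated, which is consistent with the abstract's point that curators read many more papers than they curate.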
3.
Hum Mutat ; 35(8): 927-35, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24848695

ABSTRACT

During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.


Subject(s)
Databases, Protein/statistics & numerical data ; Genetic Association Studies ; Genetics, Medical ; Knowledge Bases ; Proteome ; Software ; Amino Acid Sequence ; Genetic Variation ; Genome, Human ; High-Throughput Nucleotide Sequencing ; Humans ; Internet ; Molecular Sequence Annotation ; Molecular Sequence Data ; Terminology as Topic
4.
J Pers Med ; 14(6)2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38929869

ABSTRACT

Large-scale next-generation sequencing (NGS) germline testing is technically feasible today, but variant interpretation represents a major bottleneck in analysis workflows. This includes extensive variant prioritization, annotation, and time-consuming evidence curation. The scale of the interpretation problem is massive, and variants of uncertain significance (VUSs) are a challenge to personalized medicine. This challenge is further compounded by the complexity and heterogeneity of the standards used to describe genetic variants and the associated phenotypes when searching for relevant information to support clinical decision making. To address this, all five Swiss academic institutions for Medical Genetics joined forces with the Swiss Institute of Bioinformatics (SIB) to create SwissGenVar as a user-friendly nationwide repository and sharing platform for genetic variant data generated during routine diagnostic procedures and research sequencing projects. Its aim is to provide a protected environment for expert evidence sharing about individual variants to harmonize and upscale their significance interpretation at the clinical grade according to international standards. To corroborate the clinical assessment, the variant-related data will be combined with consented high-quality clinical information. Broader visibility will be achieved by interfacing with international databases, thus supporting global initiatives in personalized healthcare.

5.
J Alzheimers Dis ; 77(1): 257-273, 2020.
Article in English | MEDLINE | ID: mdl-32716361

ABSTRACT

BACKGROUND: The analysis and interpretation of data generated from patient-derived clinical samples relies on access to high-quality bioinformatics resources. These are maintained and updated by expert curators extracting knowledge from unstructured biological data described in free-text journal articles and converting this into more structured, computationally-accessible forms. This enables analyses such as functional enrichment of sets of genes/proteins using the Gene Ontology, and makes the searching of data more productive by managing issues such as gene/protein name synonyms, identifier mapping, and data quality. OBJECTIVE: To undertake a coordinated annotation update of key public-domain resources to better support Alzheimer's disease research. METHODS: We have systematically identified target proteins critical to disease process, in part by accessing informed input from the clinical research community. RESULTS: Data from 954 papers have been added to the UniProtKB, Gene Ontology, and the International Molecular Exchange Consortium (IMEx) databases, with 299 human proteins and 279 orthologs updated in UniProtKB. 745 binary interactions were added to the IMEx human molecular interaction dataset. CONCLUSION: This represents a significant enhancement in the expert curated data pertinent to Alzheimer's disease available in a number of biomedical databases. Relevant protein entries have been updated in UniProtKB and concomitantly in the Gene Ontology. Molecular interaction networks have been significantly extended in the IMEx Consortium dataset and a set of reference protein complexes created. All the resources described are open-source and freely available to the research community and we provide examples of how these data could be exploited by researchers.


Subject(s)
Alzheimer Disease/genetics ; Computational Biology/methods ; Databases, Protein ; Expert Systems ; Protein Interaction Maps/genetics ; Public Sector ; Alzheimer Disease/diagnosis ; Humans
6.
Article in English | MEDLINE | ID: mdl-26896845

ABSTRACT

Advances in high-throughput and advanced technologies allow researchers to routinely perform whole genome and proteome analysis. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we will describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We will show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We will also illustrate how the complexity of the human proteome is captured and structured in UniProtKB. Database URL: www.uniprot.org.


Subject(s)
Databases, Protein ; Proteome/genetics ; Proteomics/methods ; Automation ; Genome ; Humans ; Knowledge Bases ; Phenotype ; Protein Processing, Post-Translational ; Proteins/chemistry ; RNA Editing ; Software