Search | VHL Regional Portal (BVS)

A Coordinated Approach by Public Domain Bioinformatics Resources to Aid the Fight Against Alzheimer's Disease Through Expert Curation of Key Protein Targets.

Breuza, Lionel; Arighi, Cecilia N; Argoud-Puy, Ghislaine; Casals-Casas, Cristina; Estreicher, Anne; Famiglietti, Maria Livia; Georghiou, George; Gos, Arnaud; Gruaz-Gumowski, Nadine; Hinz, Ursula; Hyka-Nouspikel, Nevila; Kramarz, Barbara; Lovering, Ruth C; Lussi, Yvonne; Magrane, Michele; Masson, Patrick; Perfetto, Livia; Poux, Sylvain; Rodriguez-Lopez, Milagros; Stoeckert, Christian; Sundaram, Shyamala; Wang, Li-San; Wu, Elizabeth; Orchard, Sandra.

J Alzheimers Dis ; 77(1): 257-273, 2020.

Article in English | MEDLINE | ID: mdl-32716361

ABSTRACT

BACKGROUND: The analysis and interpretation of data generated from patient-derived clinical samples relies on access to high-quality bioinformatics resources. These are maintained and updated by expert curators extracting knowledge from unstructured biological data described in free-text journal articles and converting this into more structured, computationally-accessible forms. This enables analyses such as functional enrichment of sets of genes/proteins using the Gene Ontology, and makes the searching of data more productive by managing issues such as gene/protein name synonyms, identifier mapping, and data quality. OBJECTIVE: To undertake a coordinated annotation update of key public-domain resources to better support Alzheimer's disease research. METHODS: We have systematically identified target proteins critical to disease process, in part by accessing informed input from the clinical research community. RESULTS: Data from 954 papers have been added to the UniProtKB, Gene Ontology, and the International Molecular Exchange Consortium (IMEx) databases, with 299 human proteins and 279 orthologs updated in UniProtKB. 745 binary interactions were added to the IMEx human molecular interaction dataset. CONCLUSION: This represents a significant enhancement in the expert curated data pertinent to Alzheimer's disease available in a number of biomedical databases. Relevant protein entries have been updated in UniProtKB and concomitantly in the Gene Ontology. Molecular interaction networks have been significantly extended in the IMEx Consortium dataset and a set of reference protein complexes created. All the resources described are open-source and freely available to the research community and we provide examples of how these data could be exploited by researchers.
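The functional enrichment analysis mentioned in this abstract is commonly performed with a one-sided hypergeometric test. The sketch below is a minimal illustration of that general technique, not the procedure used by the authors or by any Gene Ontology tool; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """Return P(X >= hits) for a hypergeometric draw.

    population: total number of genes/proteins in the background set
    annotated:  background genes carrying the GO term of interest
    sample:     size of the gene set being analysed
    hits:       genes in the sample that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20000 background genes, 300 annotated to a term,
# 100 genes in the study set, 8 of which carry the term.
p_value = hypergeom_enrichment_p(20000, 300, 100, 8)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice the resulting p-values would also need correction for multiple testing across GO terms, which this sketch omits.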

Subjects

Alzheimer Disease/genetics; Computational Biology/methods; Databases, Protein; Expert Systems; Protein Interaction Maps/genetics; Public Sector; Alzheimer Disease/diagnosis; Humans

Scaling up data curation using deep learning: An application to literature triage in genomic variation resources.

Lee, Kyubum; Famiglietti, Maria Livia; McMahon, Aoife; Wei, Chih-Hsuan; MacArthur, Jacqueline Ann Langdon; Poux, Sylvain; Breuza, Lionel; Bridge, Alan; Cunningham, Fiona; Xenarios, Ioannis; Lu, Zhiyong.

PLoS Comput Biol ; 14(8): e1006390, 2018 Aug.

Article in English | MEDLINE | ID: mdl-30102703

ABSTRACT

Manually curating biomedical knowledge from publications is necessary to build a knowledge-based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, which is also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often obtains unsatisfactory precision and recall on the retrieved results, and it is difficult to manually generate optimal queries. To address this, we propose a machine-learning-assisted triage method. We collect previously curated publications from two databases, UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and use them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient and is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases.
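The precision and recall figures in this abstract can be read against the standard definitions of the two metrics. The sketch below illustrates those definitions on invented data; the identifiers and counts are hypothetical and do not reproduce the paper's evaluation.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute precision and recall of a retrieved set against a gold standard."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical gold standard: publications a curator judged relevant.
relevant = {"p1", "p2", "p3", "p4"}

# Hypothetical triage outputs: a broad query versus a model-ranked shortlist.
query_hits = {"p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"}
model_hits = {"p1", "p2", "p3", "p4", "p5"}

query_p, query_r = precision_recall(query_hits, relevant)
model_p, model_r = precision_recall(model_hits, relevant)

# Relative precision gain at equal recall, analogous in form to the
# "times higher" figures quoted in the abstract.
print(f"query: precision={query_p:.2f} recall={query_r:.2f}")
print(f"model: precision={model_p:.2f} recall={model_r:.2f}")
print(f"precision ratio: {model_p / query_p:.2f}")
```

In this toy case both methods find every relevant paper, so the gain comes entirely from the shorter list a curator has to read.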

Subjects

Data Curation/methods; Information Storage and Retrieval/methods; Data Curation/statistics & numerical data; Databases, Genetic; Databases, Protein; Deep Learning; Genomics; Knowledge Bases; Machine Learning; Publications

On expert curation and scalability: UniProtKB/Swiss-Prot as a case study.

Poux, Sylvain; Arighi, Cecilia N; Magrane, Michele; Bateman, Alex; Wei, Chih-Hsuan; Lu, Zhiyong; Boutet, Emmanuel; Bye-A-Jee, Hema; Famiglietti, Maria Livia; Roechert, Bernd; UniProt Consortium, The.

Bioinformatics ; 33(21): 3454-3460, 2017 Nov 01.

Article in English | MEDLINE | ID: mdl-29036270

ABSTRACT

MOTIVATION: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches. RESULTS: With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture as demonstrated by the fact that 8000-10 000 papers are curated in UniProt each year while curators evaluate 50 000-70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2-3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable. AVAILABILITY AND IMPLEMENTATION: UniProt is freely available at http://www.uniprot.org/. CONTACT: sylvain.poux@sib.swiss. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Data Curation; Databases, Protein; Data Curation/statistics & numerical data; Data Mining; Databases, Protein/statistics & numerical data; Humans; Knowledge Bases; PubMed/statistics & numerical data; Review Literature as Topic; Statistics as Topic

The UniProtKB guide to the human proteome.

Breuza, Lionel; Poux, Sylvain; Estreicher, Anne; Famiglietti, Maria Livia; Magrane, Michele; Tognolli, Michael; Bridge, Alan; Baratin, Delphine; Redaschi, Nicole.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-26896845

ABSTRACT

Advances in high-throughput and advanced technologies allow researchers to routinely perform whole genome and proteome analysis. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we will describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We will show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We will also illustrate how the complexity of the human proteome is captured and structured in UniProtKB. Database URL: www.uniprot.org.

Subjects

Databases, Protein; Proteome/genetics; Proteomics/methods; Automation; Genome; Humans; Knowledge Bases; Phenotype; Protein Processing, Post-Translational; Proteins/chemistry; RNA Editing; Software

Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation.

Famiglietti, Maria Livia; Estreicher, Anne; Gos, Arnaud; Bolleman, Jerven; Géhant, Sébastien; Breuza, Lionel; Bridge, Alan; Poux, Sylvain; Redaschi, Nicole; Bougueleret, Lydie; Xenarios, Ioannis.

Hum Mutat ; 35(8): 927-35, 2014 Aug.

Article in English | MEDLINE | ID: mdl-24848695

ABSTRACT

During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.

Subjects

Databases, Protein/statistics & numerical data; Genetic Association Studies; Genetics, Medical; Knowledge Bases; Proteome; Software; Amino Acid Sequence; Genetic Variation; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Internet; Molecular Sequence Annotation; Molecular Sequence Data; Terminology as Topic
