Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 4 de 4
Filtrar
Más filtros










Base de datos
Intervalo de año de publicación
1.
Elife ; 102021 12 03.
Artículo en Inglés | MEDLINE | ID: mdl-34860157

RESUMEN

Making the knowledge contained in scientific papers machine-readable and formally computable would allow researchers to take full advantage of this information by enabling integration with other knowledge sources to support data analysis and interpretation. Here we describe Biofactoid, a web-based platform that allows scientists to specify networks of interactions between genes, their products, and chemical compounds, and then translates this information into a representation suitable for computational analysis, search and discovery. We also report the results of a pilot study to encourage the wide adoption of Biofactoid by the scientific community.


Asunto(s)
Biología Computacional/métodos , Genómica/métodos , Biología Computacional/instrumentación , Bases de Datos Factuales , Genómica/instrumentación , Proyectos Piloto
2.
Bioinformatics ; 36(1): 280-286, 2020 01 01.
Artículo en Inglés | MEDLINE | ID: mdl-31218364

RESUMEN

MOTIVATION: Automatic biomedical named entity recognition (BioNER) is a key task in biomedical information extraction. For some time, state-of-the-art BioNER has been dominated by machine learning methods, particularly conditional random fields (CRFs), with a recent focus on deep learning. However, recent work has suggested that the high performance of CRFs for BioNER may not generalize to corpora other than the one it was trained on. In our analysis, we find that a popular deep learning-based approach to BioNER, known as bidirectional long short-term memory network-conditional random field (BiLSTM-CRF), is correspondingly poor at generalizing. To address this, we evaluate three modifications of BiLSTM-CRF for BioNER to improve generalization: improved regularization via variational dropout, transfer learning and multi-task learning. RESULTS: We measure the effect that each strategy has when training/testing on the same corpus ('in-corpus' performance) and when training on one corpus and evaluating on another ('out-of-corpus' performance), our measure of the model's ability to generalize. We found that variational dropout improves out-of-corpus performance by an average of 4.62%, transfer learning by 6.48% and multi-task learning by 8.42%. The maximal increase we identified combines multi-task learning and variational dropout, which boosts out-of-corpus performance by 10.75%. Furthermore, we make available a new open-source tool, called Saber that implements our best BioNER models. AVAILABILITY AND IMPLEMENTATION: Source code for our biomedical IE tool is available at https://github.com/BaderLab/saber. Corpora and other resources used in this study are available at https://github.com/BaderLab/Towards-reliable-BioNER. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Biología Computacional , Aprendizaje Profundo , Almacenamiento y Recuperación de la Información , Aprendizaje Automático , Biología Computacional/métodos , Almacenamiento y Recuperación de la Información/métodos , Modelos Genéticos , Programas Informáticos
3.
Bioinformatics ; 34(23): 4087-4094, 2018 12 01.
Artículo en Inglés | MEDLINE | ID: mdl-29868832

RESUMEN

Motivation: The explosive increase of biomedical literature has made information extraction an increasingly important tool for biomedical research. A fundamental task is the recognition of biomedical named entities in text (BNER) such as genes/proteins, diseases and species. Recently, a domain-independent method based on deep learning and statistical word embeddings, called long short-term memory network-conditional random field (LSTM-CRF), has been shown to outperform state-of-the-art entity-specific BNER tools. However, this method is dependent on gold-standard corpora (GSCs) consisting of hand-labeled entities, which tend to be small but highly reliable. An alternative to GSCs are silver-standard corpora (SSCs), which are generated by harmonizing the annotations made by several automatic annotation systems. SSCs typically contain more noise than GSCs but have the advantage of containing many more training examples. Ideally, these corpora could be combined to achieve the benefits of both, which is an opportunity for transfer learning. In this work, we analyze to what extent transfer learning improves upon state-of-the-art results for BNER. Results: We demonstrate that transferring a deep neural network (DNN) trained on a large, noisy SSC to a smaller, but more reliable GSC significantly improves upon state-of-the-art results for BNER. Compared to a state-of-the-art baseline evaluated on 23 GSCs covering four different entity classes, transfer learning results in an average reduction in error of approximately 11%. We found transfer learning to be especially beneficial for target datasets with a small number of labels (approximately 6000 or less). Availability and implementation: Source code for the LSTM-CRF is available at https://github.com/Franck-Dernoncourt/NeuroNER/ and links to the corpora are available at https://github.com/BaderLab/Transfer-Learning-BNER-Bioinformatics-2018/. Supplementary information: Supplementary data are available at Bioinformatics online.


Asunto(s)
Aprendizaje Profundo , Almacenamiento y Recuperación de la Información , Redes Neurales de la Computación , Programas Informáticos , Biología Computacional
4.
New Phytol ; 220(4): 1161-1171, 2018 12.
Artículo en Inglés | MEDLINE | ID: mdl-29355972

RESUMEN

Arbuscular mycorrhizal fungi (AMF) are known to improve plant fitness through the establishment of mycorrhizal symbioses. Genetic and phenotypic variations among closely related AMF isolates can significantly affect plant growth, but the genomic changes underlying this variability are unclear. To address this issue, we improved the genome assembly and gene annotation of the model strain Rhizophagus irregularis DAOM197198, and compared its gene content with five isolates of R. irregularis sampled in the same field. All isolates harbor striking genome variations, with large numbers of isolate-specific genes, gene family expansions, and evidence of interisolate genetic exchange. The observed variability affects all gene ontology terms and PFAM protein domains, as well as putative mycorrhiza-induced small secreted effector-like proteins and other symbiosis differentially expressed genes. High variability is also found in active transposable elements. Overall, these findings indicate a substantial divergence in the functioning capacity of isolates harvested from the same field, and thus their genetic potential for adaptation to biotic and abiotic changes. Our data also provide a first glimpse into the genome diversity that resides within natural populations of these symbionts, and open avenues for future analyses of plant-AMF interactions that link AMF genome variation with plant phenotype and fitness.


Asunto(s)
Variación Genética , Genoma Fúngico , Glomeromycota/genética , Modelos Biológicos , Micorrizas/genética , Simbiosis/genética , Adaptación Fisiológica/genética , Elementos Transponibles de ADN/genética , Proteínas Fúngicas/química , Genes Fúngicos , Glomeromycota/aislamiento & purificación , Anotación de Secuencia Molecular , Filogenia , Dominios Proteicos , Especificidad de la Especie
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA
...