Results 1 - 9 of 9
1.
Commun Biol ; 6(1): 876, 2023 Aug 25.
Article in English | MEDLINE | ID: mdl-37626165

ABSTRACT

Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained by the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequences have shown burgeoning capabilities with scale in a broad range of applications. Several preceding studies consider combining these different protein modalities to promote the representation power of geometric neural networks but fail to present a comprehensive understanding of their benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate them on a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin and can be generalized to complex tasks.


Subjects
Deep Learning; Benchmarking; Language; Neural Networks, Computer
2.
NPJ Digit Med ; 4(1): 68, 2021 Apr 12.
Article in English | MEDLINE | ID: mdl-33846532

ABSTRACT

The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. Throughout 2020, over 400,000 coronavirus-related publications were collected through the COVID-19 Open Research Dataset. Here, we present CO-Search, a semantic, multi-stage search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers and avoiding misinformation during a time of crisis. CO-Search is built from two sequential parts: a hybrid semantic-keyword retriever, which takes an input query and returns a sorted list of the 1000 most relevant documents, and a re-ranker, which further orders them by relevance. The retriever is composed of a deep learning model (Siamese-BERT) that encodes query-level meaning, along with two keyword-based models (BM25, TF-IDF) that emphasize the most important words of a query. The re-ranker assigns a relevance score to each document, computed from the outputs of (1) a question-answering module, which gauges how much each document answers the query, and (2) an abstractive summarization module, which determines how well a query matches a generated summary of the document. To account for the relatively limited dataset, we develop a text augmentation technique that splits the documents into pairs of paragraphs and the citations contained in them, creating millions of (citation title, paragraph) tuples for training the retriever. We evaluate our system ( http://einstein.ai/covid ) on the data of the TREC-COVID information retrieval challenge, obtaining strong performance across multiple key information retrieval metrics.
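
Illustrative note: the keyword side of the retriever described above relies on BM25. The following is a minimal sketch of standard Okapi BM25 ranking, not the CO-Search implementation. The function name `bm25_rank`, the parameter values `k1=1.5` and `b=0.75`, the whitespace tokenizer, and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score."""
    # Naive lowercase whitespace tokenization (an assumption for this sketch).
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for idx, toks in enumerate(tokenized):
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            tf = term_freq[term]
            # Length-normalized term-frequency saturation.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy corpus (hypothetical titles, not taken from the dataset).
docs = [
    "coronavirus transmission in hospital settings",
    "vaccine trial results for influenza",
    "coronavirus vaccine candidates and trial design",
]
ranking = bm25_rank("coronavirus vaccine", docs)
```

In this toy corpus the third document matches both query terms, so it ranks first. How CO-Search combines BM25 and TF-IDF scores with the Siamese-BERT scores is not stated in the abstract and is not modeled here.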

3.
Eur Radiol ; 30(7): 4125-4133, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32103365

ABSTRACT

PURPOSE: The highly structured nature of medical reports makes them feasible for automated large-scale patient identification. This study aimed to develop a natural language processing (NLP) model to retrospectively retrieve patients with presence and history of carotid stenosis (CS) using their ultrasound reports. METHODS: Ultrasound reports from our institution between January 2016 and December 2017 were selected. To process the texts, we developed a parser to divide the raw text into fields. For the baseline method, we used bag-of-n-grams and term frequency-inverse document frequency as the features and used linear classifiers. Logistic regression was performed as the baseline model. Convolutional and recurrent neural networks (CNN; RNN) with attention mechanism were applied to the dataset to improve the classification accuracy. RESULTS: We had 1220 ultrasound reports for training and 307 for testing, totaling 1527 reports. For predicting history of CS, both CNN and RNN-attention models had a significantly higher specificity than logistic regression. In addition, RNN-attention also had a significantly higher F1 score and accuracy. For predicting presence of carotid stenosis, all models achieved above 93% accuracy. RNN-attention achieved a 95.4% accuracy, although the difference with logistic regression was not statistically significant. RNN-attention had a statistically significantly higher specificity than logistic regression. CONCLUSIONS: We developed linear, CNN, and RNN models to predict history and presence of CS from ultrasound reports. We have demonstrated NLP to be an efficient, accurate approach for large-scale retrospective patient identification, with applications in long-term follow-up of patients and clinical research studies.
KEY POINTS:
• Natural language processing models using both linear classifiers and neural networks can achieve good performance, with an overall accuracy above 90% in predicting history and presence of carotid stenosis.
• Convolutional and recurrent neural networks, especially with additional features including field awareness and attention mechanism, outperform traditional linear classifiers.
• NLP is shown to be an efficient approach for large-scale retrospective patient identification, with applications in long-term follow-up of patients and further clinical research studies.
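
Illustrative note: the abstract above compares models by accuracy, specificity, and F1 score. The sketch below shows how these three metrics follow from a binary confusion matrix. The function name `classification_metrics` and the label vectors are hypothetical and are not data from the study.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, specificity, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Specificity: share of true negatives among all actual negatives.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "specificity": specificity, "f1": f1}


# Hypothetical labels for eight reports (1 = carotid stenosis present).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

With these toy labels there are three true positives, three true negatives, one false positive, and one false negative, so all three metrics equal 0.75.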


Subjects
Carotid Stenosis/diagnosis; Natural Language Processing; Carotid Stenosis/diagnostic imaging; Humans; Neural Networks, Computer; Retrospective Studies; Sensitivity and Specificity; Ultrasonography
4.
BMC Bioinformatics ; 12: 481, 2011 Dec 18.
Article in English | MEDLINE | ID: mdl-22177292

ABSTRACT

BACKGROUND: Bio-molecular event extraction from literature is recognized as an important task of bio text mining and, as such, many relevant systems have been developed and made available during the last decade. While such systems provide useful services individually, there is a need for a meta-service to enable comparison and ensemble of such services, offering optimal solutions for various purposes. RESULTS: We have integrated nine event extraction systems in the U-Compare framework, making them intercompatible and interoperable with other U-Compare components. The U-Compare event meta-service provides various meta-level features for comparison and ensemble of multiple event extraction systems. Experimental results show that the performance improvements achieved by the ensemble are significant. CONCLUSIONS: While individual event extraction systems themselves provide useful features for bio text mining, the U-Compare meta-service is expected to improve the accessibility to the individual systems, and to enable meta-level uses over multiple event extraction systems such as comparison and ensemble.
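
Illustrative note: the abstract above does not say how the U-Compare ensemble combines system outputs. One common option, assumed here purely for illustration, is vote counting: keep an event if at least a chosen number of systems report it. The function name `ensemble_events` and the (event type, gene) tuples are hypothetical.

```python
from collections import Counter


def ensemble_events(system_outputs, min_votes):
    """Keep events reported by at least `min_votes` of the systems."""
    # Each system contributes at most one vote per distinct event.
    votes = Counter(event for events in system_outputs for event in set(events))
    return {event for event, count in votes.items() if count >= min_votes}


# Hypothetical outputs of three extraction systems for one abstract.
outputs = [
    {("phosphorylation", "geneA"), ("binding", "geneB")},
    {("phosphorylation", "geneA")},
    {("phosphorylation", "geneA"), ("expression", "geneC")},
]
majority = ensemble_events(outputs, min_votes=2)  # favors precision
union = ensemble_events(outputs, min_votes=1)     # favors recall
```

Raising `min_votes` trades recall for precision, which is one way a meta-service could offer different solutions for different purposes.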


Subjects
Data Mining; Computer Systems; Periodicals as Topic; Software
5.
J Biomed Semantics ; 2 Suppl 2: S8, 2011 May 17.
Article in English | MEDLINE | ID: mdl-21624163

ABSTRACT

BACKGROUND: Interferon-gamma (IFN-γ) is vital in vaccine-induced immune defense against bacterial and viral infections and tumors. Our recent study demonstrated the power of a literature-based discovery method in extraction and comparison of the IFN-γ and vaccine-mediated gene interaction networks. The Vaccine Ontology (VO) contains a hierarchy of vaccine names. It is hypothesized that the application of VO will enhance the prediction of IFN-γ and vaccine-mediated gene interaction network. RESULTS: In this study, 186 specific vaccine names listed in the Vaccine Ontology (VO) and their semantic relations were used for possible improved retrieval of the IFN-γ and vaccine associated gene interactions. The application of VO allows discovery of 38 more genes and 60 more interactions. Comparison of different layers of IFN-γ networks and the example BCG vaccine-induced subnetwork led to generation of new hypotheses. By analyzing all discovered genes using centrality metrics, 32 genes were ranked high in the VO-based IFN-γ vaccine network using four centrality scores. Furthermore, 28 specific vaccines were found to be associated with these top 32 genes. These specific vaccine-gene associations were further used to generate a network of vaccine-vaccine associations. The BCG and LVS vaccines are found to be the most central vaccines in the vaccine-vaccine association network. CONCLUSION: Our results demonstrate that the combined use of biomedical ontologies and centrality-based literature mining can significantly facilitate discovery of gene interaction networks and gene-concept associations. AVAILABILITY: VO is available at: http://www.violinet.org/vaccineontology; and the SVM edit kernel for gene interaction extraction is available at: http://www.violinet.org/ifngvonet/int_ext_svm.zip.

6.
J Biomed Biotechnol ; 2010: 426479, 2010.
Article in English | MEDLINE | ID: mdl-20625487

ABSTRACT

Interferon-gamma (IFN-gamma) regulates various immune responses that are often critical for vaccine-induced protection. In order to annotate the IFN-gamma-related gene interaction network from a large amount of IFN-gamma research reported in the literature, a literature-based discovery approach was applied with a combination of natural language processing (NLP) and network centrality analysis. The interaction network of human IFN-gamma (Gene symbol: IFNG) and its vaccine-specific subnetwork were automatically extracted using abstracts from all articles in PubMed. Four network centrality metrics were further calculated to rank the genes in the constructed networks. The resulting generic IFNG network contains 1060 genes and 26313 interactions among these genes. The vaccine-specific subnetwork contains 102 genes and 154 interactions. Fifty-six genes such as TNF, NFKB1, IL2, IL6, and MAPK8 were ranked among the top 25 by at least one of the centrality methods in one or both networks. Gene enrichment analysis indicated that these genes were classified in various immune mechanisms such as response to extracellular stimulus, lymphocyte activation, and regulation of apoptosis. Literature evidence was manually curated for the IFN-gamma relatedness of 56 genes and vaccine development relatedness for 52 genes. This study also generated many new hypotheses worth further experimental studies.


Subjects
Gene Regulatory Networks/immunology; Interferon-gamma/immunology; Vaccines/genetics; Vaccines/immunology; Humans; Immunity/genetics; Mitogen-Activated Protein Kinase 8/genetics; Mitogen-Activated Protein Kinase 8/metabolism
7.
Nucleic Acids Res ; 37(Database issue): D642-6, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18978014

ABSTRACT

Molecular interaction data exists in a number of repositories, each with its own data format, molecule identifier and information coverage. Michigan molecular interactions (MiMI) assists scientists searching through this profusion of molecular interaction data. The original release of MiMI gathered data from well-known protein interaction databases, and deep-merged this information while keeping track of provenance. Based on the feedback received from users, MiMI has been completely redesigned. This article describes the resulting MiMI Release 2 (MiMIr2). New functionality includes extension from proteins to genes and to pathways; identification of highlighted sentences in source publications; seamless two-way linkage with Cytoscape; query facilities based on MeSH/GO terms and other concepts; approximate graph matching to find relevant pathways; support for querying in bulk; and a user focus-group driven interface design. MiMI is part of the NIH's National Center for Integrative Biomedical Informatics (NCIBI) and is publicly available at: http://mimi.ncibi.org.


Subjects
Databases, Protein; Protein Interaction Mapping; Proteins/metabolism; Computer Graphics; Proteins/genetics; User-Computer Interface
8.
Genome Biol ; 9 Suppl 2: S6, 2008.
Article in English | MEDLINE | ID: mdl-18834497

ABSTRACT

We introduce the first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS; http://bcms.bioinfo.cnio.es/). This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts. Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The annotations are distributed by the meta-server in both human and machine readable formats (HTML/XML). This service is intended to be used by biomedical researchers and database annotators, and in biomedical language processing. The platform allows direct comparison, unified access, and result aggregation of the annotations.


Subjects
Biomedical Research/methods; Computational Biology/methods; Information Storage and Retrieval; Internet; Humans
9.
Bioinformatics ; 24(13): i277-85, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18586725

ABSTRACT

MOTIVATION: Understanding the role of genetics in diseases is one of the most important aims of the biological sciences. The completion of the Human Genome Project has led to a rapid increase in the number of publications in this area. However, the coverage of curated databases that provide information manually extracted from the literature is limited. Another challenge is that determining disease-related genes requires laborious experiments. Therefore, predicting good candidate genes before experimental analysis will save time and effort. We introduce an automatic approach based on text mining and network analysis to predict gene-disease associations. We collected an initial set of known disease-related genes and built an interaction network by automatic literature mining based on dependency parsing and support vector machines. Our hypothesis is that the central genes in this disease-specific network are likely to be related to the disease. We used the degree, eigenvector, betweenness and closeness centrality metrics to rank the genes in the network. RESULTS: The proposed approach can be used to extract known and to infer unknown gene-disease associations. We evaluated the approach for prostate cancer. Eigenvector and degree centrality achieved high accuracy. A total of 95% of the top 20 genes ranked by these methods are confirmed to be related to prostate cancer. On the other hand, betweenness and closeness centrality predicted more genes whose relation to the disease is currently unknown and are candidates for experimental study. AVAILABILITY: A web-based system for browsing the disease-specific gene-interaction networks is available at: http://gin.ncibi.org.
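
Illustrative note: two of the four centrality metrics named above, degree and closeness, can be sketched in a few lines for an undirected, unweighted network. This is a minimal sketch under stated assumptions, not the authors' pipeline. The function names, the adjacency-dictionary representation, and the five placeholder genes are hypothetical.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """Reachable nodes divided by the sum of shortest-path distances to them."""
    result = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Toy interaction network with placeholder gene names (not real data).
graph = {
    "geneA": {"geneB", "geneC", "geneD"},
    "geneB": {"geneA"},
    "geneC": {"geneA"},
    "geneD": {"geneA", "geneE"},
    "geneE": {"geneD"},
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
ranked = sorted(graph, key=lambda gene: closeness[gene], reverse=True)
```

In this toy network `geneA` ranks first under both metrics. Eigenvector and betweenness centrality, also used in the study, need more machinery and are omitted here.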


Subjects
Database Management Systems; Genetic Predisposition to Disease/genetics; Natural Language Processing; Periodicals as Topic; Protein Interaction Mapping/methods; Proteome/metabolism; Signal Transduction; Humans