Search | BVS Search Portal

1.

Literature-based predictions of Mendelian disease therapies.

Deisseroth, Cole A; Lee, Won-Seok; Kim, Jiyoen; Jeong, Hyun-Hwan; Dhindsa, Ryan S; Wang, Julia; Zoghbi, Huda Y; Liu, Zhandong.

Am J Hum Genet ; 110(10): 1661-1672, 2023 10 05.

Article in English | MEDLINE | ID: mdl-37741276

ABSTRACT

In the effort to treat Mendelian disorders, correcting the underlying molecular imbalance may be more effective than symptomatic treatment. Identifying treatments that might accomplish this goal requires extensive and up-to-date knowledge of molecular pathways, including drug-gene and gene-gene relationships. To address this challenge, we present "parsing modifiers via article annotations" (PARMESAN), a computational tool that searches PubMed and PubMed Central for information to assemble these relationships into a central knowledge base. PARMESAN then predicts putatively novel drug-gene relationships, assigning an evidence-based score to each prediction. We compare PARMESAN's drug-gene predictions to all of the drug-gene relationships displayed by the Drug-Gene Interaction Database (DGIdb) and show that higher-scoring relationship predictions are more likely to match the directionality (up- versus down-regulation) indicated by this database. PARMESAN had more than 200,000 drug predictions scoring above 8 (as one example cutoff), for more than 3,700 genes. Among these predicted relationships, 210 were registered in DGIdb and 201 (96%) had matching directionality. This publicly available tool provides an automated way to prioritize drug screens to target the most-promising drugs to test, thereby saving time and resources in the development of therapeutics for genetic disorders.
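The directionality comparison described in this abstract can be illustrated with a minimal sketch. The function name, the dictionary layout, and the toy drug and gene labels below are hypothetical and are not taken from PARMESAN itself; only the example cutoff of 8 and the 201-of-210 figure come from the abstract.

```python
def directionality_agreement(predictions, reference, cutoff):
    """Count predictions above `cutoff` whose direction matches the reference.

    predictions: {(drug, gene): (score, direction)}  # hypothetical layout
    reference:   {(drug, gene): direction}           # curated "up" / "down"
    Returns (matches, compared, fraction); fraction is None if nothing compares.
    """
    compared = 0
    matches = 0
    for pair, (score, direction) in predictions.items():
        # Only high-scoring predictions that the reference also lists are compared.
        if score > cutoff and pair in reference:
            compared += 1
            if reference[pair] == direction:
                matches += 1
    fraction = matches / compared if compared else None
    return matches, compared, fraction


# Toy data, invented for illustration only.
predictions = {
    ("drugA", "GENE1"): (9.5, "down"),
    ("drugB", "GENE1"): (8.2, "up"),
    ("drugC", "GENE2"): (3.0, "up"),     # below the cutoff, ignored
    ("drugD", "GENE3"): (12.0, "down"),  # absent from the reference, ignored
}
reference = {
    ("drugA", "GENE1"): "down",
    ("drugB", "GENE1"): "down",
    ("drugC", "GENE2"): "down",
}

matches, compared, fraction = directionality_agreement(predictions, reference, cutoff=8)

# The figure quoted in the abstract: 201 of 210 shared relationships agree.
reported_fraction = 201 / 210
```

In the toy data, two predictions pass the cutoff and appear in the reference, and one of them agrees in direction. The abstract's own counts give 201/210, which rounds to the reported 96%.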

Subjects

PubMed , Humans , Databases, Factual

2.

IDPpub: Illuminating the Dark Phosphoproteome Through PubMed Mining.

Savage, Sara R; Zhang, Yaoyun; Jaehnig, Eric J; Liao, Yuxing; Shi, Zhiao; Pham, Huy Anh; Xu, Hua; Zhang, Bing.

Mol Cell Proteomics ; 23(1): 100682, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37993103

ABSTRACT

Global phosphoproteomics experiments quantify tens of thousands of phosphorylation sites. However, data interpretation is hampered by our limited knowledge on functions, biological contexts, or precipitating enzymes of the phosphosites. This study establishes a repository of phosphosites with associated evidence in biomedical abstracts, using deep learning-based natural language processing techniques. Our model for illuminating the dark phosphoproteome through PubMed mining (IDPpub) was generated by fine-tuning BioBERT, a deep learning tool for biomedical text mining. Trained using sentences containing protein substrates and phosphorylation site positions from 3000 abstracts, the IDPpub model was then used to extract phosphorylation sites from all MEDLINE abstracts. The extracted proteins were normalized to gene symbols using the National Center for Biotechnology Information gene query, and sites were mapped to human UniProt sequences using ProtMapper and mouse UniProt sequences by direct match. Precision and recall were calculated using 150 curated abstracts, and utility was assessed by analyzing the CPTAC (Clinical Proteomics Tumor Analysis Consortium) pan-cancer phosphoproteomics datasets and the PhosphoSitePlus database. Using 10-fold cross validation, pairs of correct substrates and phosphosite positions were extracted with an average precision of 0.93 and recall of 0.94. After entity normalization and site mapping to human reference sequences, an independent validation achieved a precision of 0.91 and recall of 0.77. The IDPpub repository contains 18,458 unique human phosphorylation sites with evidence sentences from 58,227 abstracts and 5918 mouse sites in 14,610 abstracts. This included evidence sentences for 1803 sites identified in CPTAC studies that are not covered by manually curated functional information in PhosphoSitePlus. Evaluation results demonstrate the potential of IDPpub as an effective biomedical text mining tool for collecting phosphosites. 
Moreover, the repository (http://idppub.ptmax.org), which can be automatically updated, can serve as a powerful complement to existing resources.
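The precision and recall figures quoted in this abstract can be made concrete with a short sketch that scores extracted (substrate, site) pairs against a curated set. The function name and the example pairs are hypothetical and do not come from the IDPpub code base; the F1 value at the end is derived here from the reported precision (0.91) and recall (0.77) and is not a number stated in the abstract.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted (substrate, site) pairs against a curated gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example, invented for illustration only.
gold = {("AKT1", "S473"), ("AKT1", "T308"), ("TP53", "S15"), ("MAPK1", "Y187")}
predicted = {("AKT1", "S473"), ("TP53", "S15"), ("TP53", "S20")}

p, r, f1 = precision_recall_f1(predicted, gold)

# Harmonic mean of the precision and recall reported for the independent validation.
derived_f1 = 2 * 0.91 * 0.77 / (0.91 + 0.77)
```

Matching on whole pairs means a correct protein with a wrong site position counts as an error, which is the stricter reading of "pairs of correct substrates and phosphosite positions".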

Subjects

Data Mining , Natural Language Processing , Humans , Data Mining/methods , Databases, Factual , PubMed

3.

PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge.

Wei, Chih-Hsuan; Allot, Alexis; Lai, Po-Ting; Leaman, Robert; Tian, Shubo; Luo, Ling; Jin, Qiao; Wang, Zhizheng; Chen, Qingyu; Lu, Zhiyong.

Nucleic Acids Res ; 52(W1): W540-W546, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38572754

ABSTRACT

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

Subjects

PubMed , Artificial Intelligence , Humans , Software , Data Mining/methods , Semantics , Internet

4.

NetMe 2.0: a web-based platform for extracting and modeling knowledge from biomedical literature as a labeled graph.

Di Maria, Antonio; Bellomo, Lorenzo; Billeci, Fabrizio; Cardillo, Alfio; Alaimo, Salvatore; Ferragina, Paolo; Ferro, Alfredo; Pulvirenti, Alfredo.

Bioinformatics ; 40(5), 2024 May 02.

Article in English | MEDLINE | ID: mdl-38597890

ABSTRACT

MOTIVATION: The rapid increase of biomedical literature makes it harder and harder for scientists to keep pace with the discoveries on which they build their studies. Therefore, computational tools have become more widespread, among which network analysis plays a crucial role in several life-science contexts. Nevertheless, building correct and complete networks about some user-defined biomedical topics on top of the available literature is still challenging. RESULTS: We introduce NetMe 2.0, a web-based platform that automatically extracts relevant biomedical entities and their relations from a set of input texts (i.e. full texts or abstracts of PubMed Central's papers, free texts, or PDFs uploaded by users) and models them as a BioMedical Knowledge Graph (BKG). NetMe 2.0 also implements an innovative Retrieval Augmented Generation module (Graph-RAG) that works on top of the relationships modeled by the BKG and allows the distilling of well-formed sentences that explain their content. The experimental results show that NetMe 2.0 can infer comprehensive and reliable biological networks with significant Precision-Recall metrics when compared to state-of-the-art approaches. AVAILABILITY AND IMPLEMENTATION: https://netme.click/.

Subjects

Internet , Software , Data Mining/methods , Computational Biology/methods , PubMed

5.

Text mining for contexts and relationships in cancer genomics literature.

Collins, Charlotte; Baker, Simon; Brown, Jason; Zheng, Huiyuan; Chan, Adelyne; Stenius, Ulla; Narita, Masashi; Korhonen, Anna.

Bioinformatics ; 40(1), 2024 01 02.

Article in English | MEDLINE | ID: mdl-38258418

ABSTRACT

MOTIVATION: Scientific advances build on the findings of existing research. The 2001 publication of the human genome has led to the production of huge volumes of literature exploring the context-specific functions and interactions of genes. Technology is needed to perform large-scale text mining of research papers to extract the reported actions of genes in specific experimental contexts and cell states, such as cancer, thereby facilitating the design of new therapeutic strategies. RESULTS: We present a new corpus and Text Mining methodology that can accurately identify and extract the most important details of cancer genomics experiments from biomedical texts. We build a Named Entity Recognition model that accurately extracts relevant experiment details from PubMed abstract text, and a second model that identifies the relationships between them. This system outperforms earlier models and enables the analysis of gene function in diverse and dynamically evolving experimental contexts. AVAILABILITY AND IMPLEMENTATION: Code and data are available here: https://github.com/cambridgeltl/functional-genomics-ie.

Subjects

Genomics , Neoplasms , Humans , Neoplasms/genetics , Data Mining/methods , PubMed , Phenotype

6.

Analysis of 567,758 randomized controlled trials published over 30 years reveals trends in phrases used to discuss results that do not reach statistical significance.

Otte, Willem M; Vinkers, Christiaan H; Habets, Philippe C; van IJzendoorn, David G P; Tijdink, Joeri K.

PLoS Biol ; 20(2): e3001562, 2022 02.

Article in English | MEDLINE | ID: mdl-35180228

ABSTRACT

The power of language to modify the reader's perception of interpreting biomedical results cannot be underestimated. Misreporting and misinterpretation are pressing problems in randomized controlled trials (RCT) output. This may be partially related to the statistical significance paradigm used in clinical trials centered around a P value below 0.05 cutoff. Strict use of this P value may lead to strategies of clinical researchers to describe their clinical results with P values approaching but not reaching the threshold to be "almost significant." The question is how phrases expressing nonsignificant results have been reported in RCTs over the past 30 years. To this end, we conducted a quantitative analysis of English full texts containing 567,758 RCTs recorded in PubMed between 1990 and 2020 (81.5% of all published RCTs in PubMed). We determined the exact presence of 505 predefined phrases denoting results that approach but do not cross the line of formal statistical significance (P < 0.05). We modeled temporal trends in phrase data with Bayesian linear regression. Evidence for temporal change was obtained through Bayes factor (BF) analysis. In a randomly sampled subset, the associated P values were manually extracted. We identified 61,741 phrases in 49,134 RCTs indicating almost significant results (8.65%; 95% confidence interval (CI): 8.58% to 8.73%). The overall prevalence of these phrases remained stable over time, with the most prevalent phrases being "marginally significant" (in 7,735 RCTs), "all but significant" (7,015), "a nonsignificant trend" (3,442), "failed to reach statistical significance" (2,578), and "a strong trend" (1,700). The strongest evidence for an increased temporal prevalence was found for "a numerical trend," "a positive trend," "an increasing trend," and "nominally significant." 
In contrast, the phrases "all but significant," "approaches statistical significance," "did not quite reach statistical significance," "difference was apparent," "failed to reach statistical significance," and "not quite significant" decreased over time. In a randomly sampled subset of 29,000 phrases, 11,926 corresponding P values were manually identified, of which 68.1% ranged between 0.05 and 0.15 (CI: 67. to 69.0; median 0.06). Our results show that RCT reports regularly contain specific phrases describing marginally nonsignificant results to report P values close to but above the dominant 0.05 cutoff. The fact that the prevalence of the phrases remained stable over time indicates that this practice of broadly interpreting P values close to a predefined threshold remains prevalent. To enhance responsible and transparent interpretation of RCT results, researchers, clinicians, reviewers, and editors may reduce the focus on formal statistical significance thresholds, stimulate reporting of P values with corresponding effect sizes and CIs, and focus on the clinical relevance of the statistical difference found in RCTs.
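The prevalence estimate in this abstract (49,134 of 567,758 RCTs, 8.65%, 95% CI 8.58% to 8.73%) can be checked with a few lines of arithmetic. The abstract does not say how the interval was computed; the sketch below assumes a normal-approximation (Wald) interval for a proportion, which is an assumption made here for illustration. The function name is hypothetical.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


# Counts taken from the abstract: RCTs containing at least one phrase, and all RCTs analyzed.
p, lower, upper = wald_interval(49_134, 567_758)

# Express as percentages rounded to two decimals for comparison with the abstract.
summary = tuple(round(100 * x, 2) for x in (p, lower, upper))
```

Under this assumption the rounded values agree with the percentages given in the abstract, although the authors may have used a different interval method.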

Subjects

PubMed/standards , Publications/standards , Randomized Controlled Trials as Topic/standards , Research Design/standards , Research Report/standards , Bayes Theorem , Bias , Humans , Linear Models , Outcome Assessment, Health Care/methods , Outcome Assessment, Health Care/standards , Outcome Assessment, Health Care/statistics & numerical data , PubMed/statistics & numerical data , Publications/statistics & numerical data , Randomized Controlled Trials as Topic/statistics & numerical data , Reproducibility of Results

7.

MetaTron: advancing biomedical annotation empowering relation annotation and collaboration.

Irrera, Ornella; Marchesin, Stefano; Silvello, Gianmaria.

BMC Bioinformatics ; 25(1): 112, 2024 Mar 14.

Article in English | MEDLINE | ID: mdl-38486137

ABSTRACT

BACKGROUND: The constant growth of biomedical data is accompanied by the need for new methodologies to effectively and efficiently extract machine-readable knowledge for training and testing purposes. A crucial aspect in this regard is creating large annotated corpora, often built manually or semi-manually, which are vital for developing effective and efficient methods for tasks like relation extraction, topic recognition, and entity linking. However, manual annotation is expensive and time-consuming, especially if not assisted by interactive, intuitive, and collaborative computer-aided tools. To support healthcare experts in the annotation process and foster annotated corpora creation, we present MetaTron. MetaTron is an open-source and free-to-use web-based annotation tool to annotate biomedical data interactively and collaboratively; it supports both mention-level and document-level annotations and also integrates automatic built-in predictions. Moreover, MetaTron enables relation annotation with the support of ontologies, functionalities often overlooked by off-the-shelf annotation tools. RESULTS: We conducted a qualitative analysis to compare MetaTron with a set of manual annotation tools including TeamTat, INCEpTION, LightTag, MedTAG, and brat, on three sets of criteria: technical, data, and functional. A quantitative evaluation allowed us to assess MetaTron's performance in terms of time and number of clicks to annotate a set of documents. The results indicated that MetaTron fulfills almost all the selected criteria and achieves the best performance. CONCLUSIONS: MetaTron stands out as one of the few annotation tools targeting the biomedical domain that support the annotation of relations, and it is fully customizable with documents in several formats (PDF included), as well as abstracts retrieved from PubMed, Semantic Scholar, and OpenAIRE. To meet any user need, we released MetaTron both as an online instance and as a locally deployable Docker image.

Subjects

Power, Psychological , Semantics , PubMed

8.

GPDminer: a tool for extracting named entities and analyzing relations in biological literature.

Park, Yeon-Ji; Yang, Geun-Je; Sohn, Chae-Bong; Park, Soo Jun.

BMC Bioinformatics ; 25(1): 101, 2024 Mar 06.

Article in English | MEDLINE | ID: mdl-38448845

ABSTRACT

PURPOSE: The expansion of research across various disciplines has led to a substantial increase in published papers and journals, highlighting the necessity for reliable text mining platforms for database construction and knowledge acquisition. This abstract introduces GPDMiner (Gene, Protein, and Disease Miner), a platform designed for the biomedical domain, addressing the challenges posed by the growing volume of academic papers. METHODS: GPDMiner is a text mining platform that utilizes advanced information retrieval techniques. It operates by searching PubMed for specific queries, extracting and analyzing information relevant to the biomedical field. This system is designed to discern and illustrate relationships between biomedical entities obtained from automated information extraction. RESULTS: The implementation of GPDMiner demonstrates its efficacy in navigating the extensive corpus of biomedical literature. It efficiently retrieves, extracts, and analyzes information, highlighting significant connections between genes, proteins, and diseases. The platform also allows users to save their analytical outcomes in various formats, including Excel and images. CONCLUSION: GPDMiner offers a notable additional functionality among the array of text mining tools available for the biomedical field. This tool presents an effective solution for researchers to navigate and extract relevant information from the vast unstructured texts found in biomedical literature, thereby providing distinctive capabilities that set it apart from existing methodologies. Its application is expected to greatly benefit researchers in this domain, enhancing their capacity for knowledge discovery and data management.

Subjects

Data Management , Data Mining , Databases, Factual , Knowledge Discovery , PubMed

9.

Fast searches of large collections of single-cell data using scfind.

Lee, Jimmy Tsz Hang; Patikas, Nikolaos; Kiselev, Vladimir Yu; Hemberg, Martin.

Nat Methods ; 18(3): 262-271, 2021 03.

Article in English | MEDLINE | ID: mdl-33649586

ABSTRACT

Single-cell technologies have made it possible to profile millions of cells, but for these resources to be useful they must be easy to query and access. To facilitate interactive and intuitive access to single-cell data we have developed scfind, a single-cell analysis tool that facilitates fast search of biologically or clinically relevant marker genes in cell atlases. Using transcriptome data from six mouse cell atlases, we show how scfind can be used to evaluate marker genes, perform in silico gating, and identify both cell-type-specific and housekeeping genes. Moreover, we have developed a subquery optimization routine to ensure that long and complex queries return meaningful results. To make scfind more user friendly, we use indices of PubMed abstracts and techniques from natural language processing to allow for arbitrary queries. Finally, we show how scfind can be used for multi-omics analyses by combining single-cell ATAC-seq data with transcriptome data.

Subjects

Data Management/methods , Information Storage and Retrieval/methods , Single-Cell Analysis/methods , Transcriptome/genetics , Algorithms , Animals , Data Analysis , Databases, Genetic , Gene Expression Regulation , Mice , Natural Language Processing , PubMed , User-Computer Interface

10.

BioRED: a rich biomedical relation extraction dataset.

Luo, Ling; Lai, Po-Ting; Wei, Chih-Hsuan; Arighi, Cecilia N; Lu, Zhiyong.

Brief Bioinform ; 23(5), 2022 09 20.

Article in English | MEDLINE | ID: mdl-35849818

ABSTRACT

Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE only focus on relations of a single type (e.g. protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then, we present a first-of-its-kind biomedical relation extraction dataset (BioRED) with multiple entity types (e.g. gene/protein, disease, chemical) and relation pairs (e.g. gene-disease; chemical-chemical) at the document level, on a set of 600 PubMed abstracts. Furthermore, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including Bidirectional Encoder Representations from Transformers (BERT)-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement for the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a rich dataset can successfully facilitate the development of more accurate, efficient and robust RE systems for biomedicine. Availability: The BioRED dataset and annotation guidelines are freely available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/.

Subjects

Algorithms , Data Mining , Proteins , PubMed

11.

OncoPubMiner: a platform for mining oncology publications.

Xu, Quan; Liu, Yueyue; Hu, Jifang; Duan, Xiaohong; Song, Niuben; Zhou, Jiale; Zhai, Jincheng; Su, Junyan; Liu, Siyao; Chen, Fan; Zheng, Wei; Guo, Zhongjia; Li, Hexiang; Zhou, Qiming; Niu, Beifang.

Brief Bioinform ; 23(5), 2022 09 20.

Article in English | MEDLINE | ID: mdl-36058206

ABSTRACT

Updated and expert-quality knowledge bases are fundamental to biomedical research. A knowledge base established with human participation and subject to multiple inspections is needed to support clinical decision making, especially in the growing field of precision oncology. The number of original publications in this field has risen dramatically with the advances in technology and the evolution of in-depth research. Consequently, the issue of how to gather and mine these articles accurately and efficiently now requires close consideration. In this study, we present OncoPubMiner (https://oncopubminer.chosenmedinfo.com), a free and powerful system that combines text mining, data structure customisation, publication search with online reading and project-centred and team-based data collection to form a one-stop 'keyword in-knowledge out' oncology publication mining platform. The platform was constructed by integrating all open-access abstracts from PubMed and full-text articles from PubMed Central, and it is updated daily. OncoPubMiner makes obtaining precision oncology knowledge from scientific articles straightforward and will assist researchers in efficiently developing structured knowledge base systems and bring us closer to achieving precision oncology goals.

Subjects

Neoplasms , Data Mining , Humans , Medical Oncology , Precision Medicine , PubMed , Publications

12.

Updated review of advances in microRNAs and complex diseases: experimental results, databases, webservers and data fusion.

Huang, Li; Zhang, Li; Chen, Xing.

Brief Bioinform ; 23(6), 2022 11 19.

Article in English | MEDLINE | ID: mdl-36094095

ABSTRACT

MicroRNAs (miRNAs) are gene regulators involved in the pathogenesis of complex diseases such as cancers, and thus serve as potential diagnostic markers and therapeutic targets. The prerequisite for designing effective miRNA therapies is accurate discovery of miRNA-disease associations (MDAs), which has attracted substantial research interests during the last 15 years, as reflected by more than 55 000 related entries available on PubMed. Abundant experimental data gathered from the wealth of literature could effectively support the development of computational models for predicting novel associations. In 2017, Chen et al. published the first-ever comprehensive review on MDA prediction, presenting various relevant databases, 20 representative computational models, and suggestions for building more powerful ones. In the current review, as the continuation of the previous study, we revisit miRNA biogenesis, detection techniques and functions; summarize recent experimental findings related to common miRNA-associated diseases; introduce recent updates of miRNA-relevant databases and novel database releases since 2017, present mainstream webservers and new webserver releases since 2017 and finally elaborate on how fusion of diverse data sources has contributed to accurate MDA prediction.

Subjects

MicroRNAs , Neoplasms , Humans , MicroRNAs/genetics , Databases, Genetic , Neoplasms/genetics , PubMed , Computational Biology/methods , Genetic Predisposition to Disease , Algorithms

13.

On the effectiveness of compact biomedical transformers.

Rohanian, Omid; Nouriborji, Mohammadmahdi; Kouchaki, Samaneh; Clifton, David A.

Bioinformatics ; 39(3), 2023 03 01.

Article in English | MEDLINE | ID: mdl-36825820

ABSTRACT

MOTIVATION: Language models pre-trained on biomedical corpora, such as BioBERT, have recently shown promising results on downstream biomedical tasks. Many existing pre-trained models, on the other hand, are resource-intensive and computationally heavy owing to factors such as embedding size, hidden dimension and number of layers. The natural language processing community has developed numerous strategies to compress these models utilizing techniques such as pruning, quantization and knowledge distillation, resulting in models that are considerably faster, smaller and subsequently easier to use in practice. By the same token, in this article, we introduce six lightweight models, namely, BioDistilBERT, BioTinyBERT, BioMobileBERT, DistilBioBERT, TinyBioBERT and CompactBioBERT, which are obtained either by knowledge distillation from a biomedical teacher or continual learning on the PubMed dataset. We evaluate all of our models on three biomedical tasks and compare them with BioBERT-v1.1 to create the best efficient lightweight models that perform on par with their larger counterparts. RESULTS: We trained six different models in total, with the largest model having 65 million parameters and the smallest having 15 million, a far lower range of parameters compared with BioBERT's 110M. Based on our experiments on three different biomedical tasks, we found that models distilled from a biomedical teacher and models that have been additionally pre-trained on the PubMed dataset can retain up to 98.8% and 98.6% of the performance of the BioBERT-v1.1, respectively. Overall, our best model below 30 M parameters is BioMobileBERT, while our best models over 30 M parameters are DistilBioBERT and CompactBioBERT, which can keep up to 98.2% and 98.8% of the performance of the BioBERT-v1.1, respectively. AVAILABILITY AND IMPLEMENTATION: Codes are available at: https://github.com/nlpie-research/Compact-Biomedical-Transformers. Trained models can be accessed at: https://huggingface.co/nlpie.

Subjects

Natural Language Processing , PubMed , Datasets as Topic

14.

NEREL-BIO: a dataset of biomedical abstracts annotated with nested named entities.

Loukachevitch, Natalia; Manandhar, Suresh; Baral, Elina; Rozhkov, Igor; Braslavski, Pavel; Ivanov, Vladimir; Batura, Tatiana; Tutubalina, Elena.

Bioinformatics ; 39(4), 2023 04 03.

Article in English | MEDLINE | ID: mdl-37004189

ABSTRACT

MOTIVATION: This article describes NEREL-BIO, an annotation scheme and corpus of PubMed abstracts in Russian and a smaller number of abstracts in English. NEREL-BIO extends the general domain dataset NEREL by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both general and biomedical domains, making it suitable for domain transfer experiments. NEREL-BIO provides annotation for nested named entities as an extension of the scheme employed for NEREL. Nested named entities may cross entity boundaries to connect to shorter entities nested within longer entities, making them harder to detect. RESULTS: NEREL-BIO contains annotations for 700+ Russian and 100+ English abstracts. All English PubMed annotations have corresponding Russian counterparts. Thus, NEREL-BIO comprises the following specific features: annotation of nested named entities, and usability as a benchmark for cross-domain (NEREL → NEREL-BIO) and cross-language (English → Russian) transfer. We experiment with both transformer-based sequence models and machine reading comprehension models and report their results. AVAILABILITY AND IMPLEMENTATION: The dataset and annotation guidelines are freely available at https://github.com/nerel-ds/NEREL-BIO.

Subjects

Natural Language Processing , Semantics , PubMed , Language

15.

PEDL+: protein-centered relation extraction from PubMed at your fingertip.

Weber, Leon; Barth, Fabio; Lorenz, Leonie; Konrath, Fabian; Huska, Kirsten; Wolf, Jana; Leser, Ulf.

Bioinformatics ; 39(11), 2023 11 01.

Article in English | MEDLINE | ID: mdl-37950510

ABSTRACT

SUMMARY: Relation extraction (RE) from large text collections is an important tool for database curation, pathway reconstruction, or functional omics data analysis. In practice, RE often is part of a complex data analysis pipeline requiring specific adaptations like restricting the types of relations or the set of proteins to be considered. However, current systems are either non-programmable web sites or research code with fixed functionality. We present PEDL+, a user-friendly tool for extracting protein-protein and protein-chemical associations from PubMed articles. PEDL+ combines state-of-the-art NLP technology with adaptable ranking and filtering options and can easily be integrated into analysis pipelines. We evaluated PEDL+ in two pathway curation projects and found that 59% to 80% of its extractions were helpful. AVAILABILITY AND IMPLEMENTATION: PEDL+ is freely available at https://github.com/leonweber/pedl.

Subjects

Software , PubMed , Databases, Factual

16.

AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning.

Luo, Ling; Wei, Chih-Hsuan; Lai, Po-Ting; Leaman, Robert; Chen, Qingyu; Lu, Zhiyong.

Bioinformatics ; 39(5), 2023 05 04.

Article in English | MEDLINE | ID: mdl-37171899

ABSTRACT

MOTIVATION: Biomedical named entity recognition (BioNER) seeks to automatically recognize biomedical entities in natural language text, serving as a necessary foundation for downstream text mining tasks and applications such as information extraction and question answering. Manually labeling training data for the BioNER task is costly, however, due to the significant domain expertise required for accurate annotation. The resulting data scarcity causes current BioNER approaches to be prone to overfitting, to suffer from limited generalizability, and to address a single entity type at a time (e.g. gene or disease). RESULTS: We therefore propose a novel all-in-one (AIO) scheme that uses external data from existing annotated resources to enhance the accuracy and stability of BioNER models. We further present AIONER, a general-purpose BioNER tool based on cutting-edge deep learning and our AIO schema. We evaluate AIONER on 14 BioNER benchmark tasks and show that AIONER is effective, robust, and compares favorably to other state-of-the-art approaches such as multi-task learning. We further demonstrate the practical utility of AIONER in three independent tasks to recognize entity types not previously seen in training data, as well as the advantages of AIONER over existing methods for processing biomedical text at a large scale (e.g. the entire PubMed data). AVAILABILITY AND IMPLEMENTATION: The source code, trained models and data for AIONER are freely available at https://github.com/ncbi/AIONER.

Subjects

Deep Learning , Data Mining/methods , Software , Language , PubMed

17.

MedCPT: Contrastive Pre-trained Transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval.

Jin, Qiao; Kim, Won; Chen, Qingyu; Comeau, Donald C; Yeganova, Lana; Wilbur, W John; Lu, Zhiyong.

Bioinformatics ; 39(11), 2023 11 01.

Article in English | MEDLINE | ID: mdl-37930897

ABSTRACT

MOTIVATION: Information retrieval (IR) is essential in biomedical knowledge acquisition and clinical decision support. While recent progress has shown that language model encoders perform better semantic retrieval, training such models requires abundant query-article annotations that are difficult to obtain in biomedicine. As a result, most biomedical IR systems only conduct lexical matching. In response, we introduce MedCPT, a first-of-its-kind Contrastively Pre-trained Transformer model for zero-shot semantic IR in biomedicine. RESULTS: To train MedCPT, we collected an unprecedented scale of 255 million user click logs from PubMed. With such data, we use contrastive learning to train a pair of closely integrated retriever and re-ranker. Experimental results show that MedCPT sets new state-of-the-art performance on six biomedical IR tasks, outperforming various baselines including much larger models, such as GPT-3-sized cpt-text-XL. In addition, MedCPT also generates better biomedical article and sentence representations for semantic evaluations. As such, MedCPT can be readily applied to various real-world biomedical IR tasks. AVAILABILITY AND IMPLEMENTATION: The MedCPT code and model are available at https://github.com/ncbi/MedCPT.

Subjects

Information Storage and Retrieval, Semantics, Language, Natural Language Processing, PubMed, Review Literature as Topic

18.

NeuroML-DB: Sharing and characterizing data-driven neuroscience models described in NeuroML.

Birgiolas, Justas; Haynes, Vergil; Gleeson, Padraig; Gerkin, Richard C; Dietrich, Suzanne W; Crook, Sharon.

PLoS Comput Biol ; 19(3): e1010941, 2023 03.

Article in English | MEDLINE | ID: mdl-36867658

ABSTRACT

As researchers develop computational models of neural systems with increasing sophistication and scale, it is often the case that fully de novo model development is impractical and inefficient. Thus arises a critical need to quickly find, evaluate, re-use, and build upon models and model components developed by other researchers. We introduce the NeuroML Database (NeuroML-DB.org), which has been developed to address this need and to complement other model sharing resources. NeuroML-DB stores over 1,500 previously published models of ion channels, cells, and networks that have been translated to the modular NeuroML model description language. The database also provides reciprocal links to other neuroscience model databases (ModelDB, Open Source Brain) as well as access to the original model publications (PubMed). These links along with Neuroscience Information Framework (NIF) search functionality provide deep integration with other neuroscience community modeling resources and greatly facilitate the task of finding suitable models for reuse. Serving as an intermediate language, NeuroML and its tooling ecosystem enable efficient translation of models to other popular simulator formats. The modular nature also enables efficient analysis of a large number of models and inspection of their properties. Search capabilities of the database, together with web-based, programmable online interfaces, allow the community of researchers to rapidly assess stored model electrophysiology, morphology, and computational complexity properties. We use these capabilities to perform a database-scale analysis of neuron and ion channel models and describe a novel tetrahedral structure formed by cell model clusters in the space of model properties and features. This analysis provides further information about model similarity to enrich database search.

Subjects

Neurosciences, Software, Ecosystem, PubMed, Neurons/physiology

19.

Antipsychotic agents in anxiety disorders: An umbrella review.

Garakani, Amir; Buono, Frank D; Salehi, Mona; Funaro, Melissa C; Klimowicz, Anna; Sharma, Harshit; Faria, Clara G F; Larkin, Kaitlyn; Freire, Rafael C.

Acta Psychiatr Scand ; 149(4): 295-312, 2024 04.

Article in English | MEDLINE | ID: mdl-38382649

ABSTRACT

BACKGROUND: Although not approved for the treatment of anxiety disorders (except trifluoperazine) there is ongoing off-label, unapproved use of first-generation antipsychotics (FGAs) and second-generation antipsychotics (SGAs) for anxiety disorders. There have been systematic reviews and meta-analyses on the use of antipsychotics in anxiety disorders, most of which focused on SGAs. OBJECTIVE: The specific aims of this umbrella review are to: (1) Evaluate the evidence of efficacy of FGAs and SGAs in anxiety disorders as an adjunctive treatment to traditional antidepressant treatments and other nonantipsychotic medications; (2) Compare monotherapy with antipsychotics to first-line treatments for anxiety disorders in terms of effectiveness, risks, and side effects. The review protocol is registered on PROSPERO (CRD42021237436). METHODS: An initial search was undertaken to identify systematic reviews and meta-analyses from inception until 2020, with an updated search completed August 2021 and January 2023. The searches were conducted in PubMed, MEDLINE (Ovid), EMBASE (Ovid), APA PsycInfo (Ovid), CINAHL Complete (EBSCOhost), and the Cochrane Library through hand searches of references of included articles. Review quality was measured using the AMSTAR-2 (A MeaSurement Tool to Assess Systematic Reviews) scale. RESULTS: The original and updated searches yielded 1796 and 3744 articles respectively, of which 45 were eligible. After final review, 25 systematic reviews and meta-analyses were included in the analysis. Most of the systematic reviews and meta-analyses were deemed low-quality through AMSTAR-2 with only one review being deemed high-quality. In evaluating the monotherapies with antipsychotics compared with first-line treatments for anxiety disorder there was insufficient evidence due to flawed study designs (such as problems with randomization) and small sample sizes within studies. There was limited evidence suggesting efficacy of antipsychotic agents in anxiety disorders other than quetiapine in generalized anxiety disorder (GAD). CONCLUSIONS: This umbrella review indicates a lack of high-quality studies of antipsychotics in anxiety disorders outside of the use of quetiapine in GAD. Although potentially effective for anxiety disorders, FGAs and SGAs may have risks and side effects that outweigh their efficacy, although there were limited data. Further long-term and larger-scale studies of antipsychotics in anxiety disorders are needed.

Subjects

Antipsychotic Agents, Anxiety Disorders, Humans, Antipsychotic Agents/adverse effects, Anxiety Disorders/drug therapy, PubMed, Quetiapine Fumarate, Trifluoperazine, Systematic Reviews as Topic, Meta-Analysis as Topic

20.

Utilization of EHRs for clinical trials: a systematic review.

Kalankesh, Leila R; Monaghesh, Elham.

BMC Med Res Methodol ; 24(1): 70, 2024 Mar 18.

Article in English | MEDLINE | ID: mdl-38494497

ABSTRACT

BACKGROUND AND OBJECTIVE: Clinical trials are of high importance for medical progress. This study conducted a systematic review to identify the applications of EHRs in supporting and enhancing clinical trials. MATERIALS AND METHODS: A systematic search of PubMed was conducted on 12/3/2023 to identify relevant studies on the use of EHRs in clinical trials. Studies were included if they (1) were full-text journal articles, (2) were written in English, and (3) examined applications of EHR data to support clinical trial processes (e.g. recruitment, screening, data collection). A standardized form was used by two reviewers to extract data on study design, EHR-enabled process(es), related outcomes, and limitations. RESULTS: Following full-text review, 19 studies met the predefined eligibility criteria and were included. Overall, included studies consistently demonstrated that EHR data integration improves clinical trial feasibility and efficiency in recruitment, screening, data collection, and trial design. CONCLUSIONS: According to the results of the present study, the use of Electronic Health Records in conducting clinical trials is very helpful. Therefore, it is better for researchers to use EHRs in their studies for easy access to more accurate and comprehensive data. EHRs collect all individual data, including demographic, clinical, diagnostic, and therapeutic data. Moreover, all data are available seamlessly in the EHR. In future studies, it is better to consider the cost-effectiveness of using EHRs in clinical trials.

Subjects

Electronic Health Records, Research Design, Humans, Data Collection, PubMed, Clinical Trials as Topic
