1.
Biomedicines ; 12(7)2024 Jul 10.
Article in English | MEDLINE | ID: mdl-39062108

ABSTRACT

MicroRNA (miRNA)-messenger RNA (mRNA or gene) interactions are pivotal in various biological processes, including the regulation of gene expression, cellular differentiation, proliferation, apoptosis, and development, as well as the maintenance of cellular homeostasis and the pathogenesis of numerous diseases, such as cancer, cardiovascular diseases, neurological disorders, and metabolic conditions. Understanding the mechanisms of miRNA-mRNA interactions can provide insights into disease mechanisms and potential therapeutic targets. However, extracting these interactions efficiently from the huge collection of published articles in PubMed is challenging. In the current study, we annotated a miRNA-mRNA Interaction Corpus (MMIC) and used it to evaluate the performance of a variety of machine learning (ML) models, deep learning-based transformer (DLT) models, and large language models (LLMs) in extracting the miRNA-mRNA interactions mentioned in PubMed. We used genomics approaches to validate the extracted miRNA-mRNA interactions. Among the ML, DLT, and LLM models, PubMedBERT showed the highest precision, recall, and F-score, all equal to 0.783. Among the LLMs, Llama 2 performed best, achieving 0.56 precision, 0.86 recall, and 0.68 F-score in a zero-shot experiment and 0.56 precision, 0.87 recall, and 0.68 F-score in a three-shot experiment. Our study shows that Llama 2 achieves better recall than the ML and DLT models but leaves room for improvement in precision and F-score.
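
The reported figures can be checked for internal consistency. The sketch below assumes that "F-score" means the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is ours.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1; no beta weighting is stated.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Llama 2 precision and recall as quoted in the abstract.
zero_shot = f_score(0.56, 0.86)
three_shot = f_score(0.56, 0.87)

# When precision equals recall, F1 equals both (the PubMedBERT case).
pubmedbert = f_score(0.783, 0.783)

print(round(zero_shot, 2), round(three_shot, 2), round(pubmedbert, 3))
```

Under that assumption both Llama 2 settings round to the quoted 0.68, and a model with precision and recall of 0.783 necessarily has an F1 of 0.783.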

2.
AMIA Jt Summits Transl Sci Proc ; 2024: 391-400, 2024.
Article in English | MEDLINE | ID: mdl-38827097

ABSTRACT

Relation Extraction (RE) is a natural language processing (NLP) task for extracting semantic relations between biomedical entities. Recent developments in pre-trained large language models (LLMs) have motivated NLP researchers to use them for various NLP tasks. We investigated GPT-3.5-turbo and GPT-4 for extracting relations from three standard datasets: EU-ADR, Gene Associations Database (GAD), and ChemProt. Unlike existing approaches that use datasets with masked entities, we used three versions of each dataset in our experiments: a version with masked entities, a second version with the original (unmasked) entities, and a third version with abbreviations replaced by the original terms. We developed prompts for the various versions and used the chat completion model from the GPT API. Our approach achieved an F1-score of 0.498 to 0.809 for GPT-3.5-turbo, and a highest F1-score of 0.84 for GPT-4. For certain experiments, the performance of GPT, BioBERT, and PubMedBERT is almost the same.
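
The three dataset versions described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the example sentence, the abbreviation table, the helper names, and the `@GENE$` / `@DISEASE$` placeholder style (a convention seen in some preprocessed relation-extraction datasets, not confirmed by the abstract).

```python
import re


def mask_entities(sentence, entities):
    """Replace each whole-word entity mention with a typed placeholder.

    `entities` is a list of (mention, entity_type) pairs. The "@TYPE$"
    placeholder format is an assumption, not taken from the paper.
    """
    for mention, entity_type in entities:
        pattern = rf"\b{re.escape(mention)}\b"
        # A function replacement avoids any escape processing of the placeholder.
        sentence = re.sub(pattern, lambda _m, t=entity_type: f"@{t}$", sentence)
    return sentence


def expand_abbreviations(sentence, abbreviations):
    """Replace whole-word abbreviations with their full terms."""
    for short, full in abbreviations.items():
        pattern = rf"\b{re.escape(short)}\b"
        sentence = re.sub(pattern, lambda _m, f=full: f, sentence)
    return sentence


# Hypothetical sentence; not drawn from EU-ADR, GAD, or ChemProt.
unmasked = "Variants in APOE are associated with AD."
masked = mask_entities(unmasked, [("APOE", "GENE"), ("AD", "DISEASE")])
expanded = expand_abbreviations(unmasked, {"AD": "Alzheimer's disease"})

print(masked)
print(unmasked)
print(expanded)
```

Each version would then be placed into its own prompt; the prompt wording used in the study is not given in the abstract, so none is reproduced here.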

3.
Genes (Basel) ; 15(5)2024 May 11.
Article in English | MEDLINE | ID: mdl-38790243

ABSTRACT

Alzheimer's disease (AD), a multifactorial neurodegenerative disorder, is prevalent among the elderly population. It is a complex trait with mutations in multiple genes. Although the US Food and Drug Administration (FDA) has approved a few drugs for AD treatment, a definitive cure remains elusive. Research efforts persist in seeking improved treatment options for AD. Here, a hybrid pipeline is proposed that applies text mining to identify comorbid diseases for AD and an omics approach to identify the genes common to AD and five comorbid diseases: dementia, type 2 diabetes, hypertension, Parkinson's disease, and Down syndrome. We further identified the pathways and drugs for the common genes. The rationale behind this approach is that elderly individuals often receive multiple medications for various comorbid diseases, and insight into the genes that are common to comorbid diseases may enhance treatment strategies. We identified seven genes common to AD and the five comorbid diseases: PSEN1, PSEN2, MAPT, APP, APOE, NOTCH, and HFE. We investigated the drugs interacting with these common genes using LINCS gene-drug perturbation. Our analysis unveiled several promising candidates, including MG-132 and Masitinib, which exhibit potential efficacy for both AD and its comorbid diseases. The pipeline can be extended to other diseases.


Subject(s)
Alzheimer Disease, Comorbidity, Data Mining, Alzheimer Disease/genetics, Alzheimer Disease/drug therapy, Humans, Data Mining/methods, Parkinson Disease/genetics, Parkinson Disease/drug therapy, Diabetes Mellitus, Type 2/genetics, Diabetes Mellitus, Type 2/drug therapy, Down Syndrome/genetics, Down Syndrome/drug therapy, Hypertension/genetics, Hypertension/drug therapy

4.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38514400

ABSTRACT

MOTIVATION: Large Language Models (LLMs) have the potential to revolutionize the field of Natural Language Processing, excelling not only in text generation and reasoning tasks but also in zero/few-shot learning, swiftly adapting to new tasks with minimal fine-tuning. LLMs have also demonstrated great promise in biomedical and healthcare applications. However, when it comes to Named Entity Recognition (NER), particularly within the biomedical domain, LLMs fall short of the effectiveness exhibited by fine-tuned domain-specific models. One key reason is that NER is typically conceptualized as a sequence labeling task, whereas LLMs are optimized for text generation and reasoning tasks. RESULTS: We developed an instruction-based learning paradigm that transforms biomedical NER from a sequence labeling task into a generation task. This paradigm is end-to-end and streamlines the training and evaluation process by automatically repurposing pre-existing biomedical NER datasets. We further developed BioNER-LLaMA using the proposed paradigm with LLaMA-7B as the foundational LLM. We conducted extensive testing on BioNER-LLaMA across three widely recognized biomedical NER datasets, consisting of entities related to diseases, chemicals, and genes. The results revealed that BioNER-LLaMA consistently achieved F1-scores 5% to 30% higher than the few-shot learning performance of GPT-4 on datasets with different biomedical entities. We show that a general-domain LLM can match the performance of rigorously fine-tuned PubMedBERT models and PMC-LLaMA, a biomedical-specific language model. Our findings underscore the potential of our proposed paradigm in developing general-domain LLMs that can rival state-of-the-art (SOTA) performance in multi-task, multi-domain scenarios in biomedical and health applications. AVAILABILITY AND IMPLEMENTATION: Datasets and other resources are available at https://github.com/BIDS-Xu-Lab/BioNER-LLaMA.
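
Recasting NER from sequence labeling into generation, as the abstract describes, can be sketched by converting a BIO-tagged sentence into an instruction, input, and expected-output record. This is an illustration only: the instruction wording, the record keys, the example sentence, and both function names are hypothetical and are not taken from BioNER-LLaMA.

```python
def spans_from_bio(tokens, tags):
    """Collect (entity_text, entity_type) pairs from BIO tags."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity still open.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close the open entity.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities


def to_instruction_example(tokens, tags, entity_type):
    """Build one generation-style training record for a single entity type."""
    mentions = [text for text, etype in spans_from_bio(tokens, tags)
                if etype == entity_type]
    return {
        # Hypothetical prompt wording.
        "instruction": (f"List every {entity_type} mention in the text, "
                        "separated by '; '. Answer 'none' if there are none."),
        "input": " ".join(tokens),
        "output": "; ".join(mentions) if mentions else "none",
    }


# Hypothetical tagged sentence.
tokens = ["Masitinib", "was", "tested", "in", "Parkinson", "disease", "patients", "."]
tags = ["B-Chemical", "O", "O", "O", "B-Disease", "I-Disease", "O", "O"]

disease_record = to_instruction_example(tokens, tags, "Disease")
chemical_record = to_instruction_example(tokens, tags, "Chemical")
gene_record = to_instruction_example(tokens, tags, "Gene")
print(disease_record["output"], "|", chemical_record["output"], "|", gene_record["output"])
```

Because the target is plain text rather than a tag per token, an existing labeled dataset can be repurposed automatically, and model output can be scored by parsing the generated list back into mentions.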


Subject(s)
Camelids, New World, Deep Learning, Animals, Language, Natural Language Processing

5.
Mol Biol Evol ; 41(3)2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38376487

ABSTRACT

The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and utilized Illumina/10×, optical maps, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long-read RNA-seq data to facilitate the annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with those of other cetaceans and artiodactyls, including the vaquita (Phocoena sinus), the world's smallest cetacean, to investigate the blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst in segmental duplications, though the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results indicate how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research.


Subject(s)
Balaenoptera, Neoplasms, Animals, Balaenoptera/genetics, Segmental Duplications, Genomic, Genome, Demography, Neoplasms/genetics