Búsqueda | Portal de Búsqueda de la BVS España

1.

PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge.

Wei, Chih-Hsuan; Allot, Alexis; Lai, Po-Ting; Leaman, Robert; Tian, Shubo; Luo, Ling; Jin, Qiao; Wang, Zhizheng; Chen, Qingyu; Lu, Zhiyong.

Nucleic Acids Res ; 52(W1): W540-W546, 2024 Jul 05.

Artículo en Inglés | MEDLINE | ID: mdl-38572754

RESUMEN

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

Asunto(s)

PubMed , Inteligencia Artificial , Humanos , Programas Informáticos , Minería de Datos/métodos , Semántica , Internet

2.

Opportunities and challenges for ChatGPT and large language models in biomedicine and health.

Tian, Shubo; Jin, Qiao; Yeganova, Lana; Lai, Po-Ting; Zhu, Qingqing; Chen, Xiuying; Yang, Yifan; Chen, Qingyu; Kim, Won; Comeau, Donald C; Islamaj, Rezarta; Kapoor, Aadit; Gao, Xin; Lu, Zhiyong.

Brief Bioinform ; 25(1)2023 11 22.

Artículo en Inglés | MEDLINE | ID: mdl-38168838

RESUMEN

ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities. This has subsequently led to the emergence of diverse applications in the field of biomedicine and health. In this work, we examine the diverse applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically, we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction and medical education and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of biomedical domain presents unique challenges. Following an extensive literature survey, we find that significant advances have been made in the field of text generation tasks, surpassing the previous state-of-the-art methods. For other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, like ChatGPT, in the fields of biomedicine and health entails various risks and challenges, including fabricated information in its generated responses, as well as legal and privacy concerns associated with sensitive patient data. We believe this survey can provide a comprehensive and timely overview to biomedical researchers and healthcare practitioners on the opportunities and challenges associated with using ChatGPT and other LLMs for transforming biomedicine and health.

Asunto(s)

Almacenamiento y Recuperación de la Información , Lenguaje , Humanos , Privacidad , Investigadores

3.

BioRED: a rich biomedical relation extraction dataset.

Luo, Ling; Lai, Po-Ting; Wei, Chih-Hsuan; Arighi, Cecilia N; Lu, Zhiyong.

Brief Bioinform ; 23(5)2022 09 20.

Artículo en Inglés | MEDLINE | ID: mdl-35849818

RESUMEN

Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE only focus on relations of a single type (e.g. protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then, we present a first-of-its-kind biomedical relation extraction dataset (BioRED) with multiple entity types (e.g. gene/protein, disease, chemical) and relation pairs (e.g. gene-disease; chemical-chemical) at the document level, on a set of 600 PubMed abstracts. Furthermore, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including Bidirectional Encoder Representations from Transformers (BERT)-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement for the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a rich dataset can successfully facilitate the development of more accurate, efficient and robust RE systems for biomedicine. Availability: The BioRED dataset and annotation guidelines are freely available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/.

Asunto(s)

Algoritmos , Minería de Datos , Proteínas , PubMed

4.

AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning.

Luo, Ling; Wei, Chih-Hsuan; Lai, Po-Ting; Leaman, Robert; Chen, Qingyu; Lu, Zhiyong.

Bioinformatics ; 39(5)2023 05 04.

Artículo en Inglés | MEDLINE | ID: mdl-37171899

RESUMEN

MOTIVATION: Biomedical named entity recognition (BioNER) seeks to automatically recognize biomedical entities in natural language text, serving as a necessary foundation for downstream text mining tasks and applications such as information extraction and question answering. Manually labeling training data for the BioNER task is costly, however, due to the significant domain expertise required for accurate annotation. The resulting data scarcity causes current BioNER approaches to be prone to overfitting, to suffer from limited generalizability, and to address a single entity type at a time (e.g. gene or disease). RESULTS: We therefore propose a novel all-in-one (AIO) scheme that uses external data from existing annotated resources to enhance the accuracy and stability of BioNER models. We further present AIONER, a general-purpose BioNER tool based on cutting-edge deep learning and our AIO schema. We evaluate AIONER on 14 BioNER benchmark tasks and show that AIONER is effective, robust, and compares favorably to other state-of-the-art approaches such as multi-task learning. We further demonstrate the practical utility of AIONER in three independent tasks to recognize entity types not previously seen in training data, as well as the advantages of AIONER over existing methods for processing biomedical text at a large scale (e.g. the entire PubMed data). AVAILABILITY AND IMPLEMENTATION: The source code, trained models and data for AIONER are freely available at https://github.com/ncbi/AIONER.

Asunto(s)

Aprendizaje Profundo , Minería de Datos/métodos , Programas Informáticos , Lenguaje , PubMed

5.

GNorm2: an improved gene name recognition and normalization system.

Wei, Chih-Hsuan; Luo, Ling; Islamaj, Rezarta; Lai, Po-Ting; Lu, Zhiyong.

Bioinformatics ; 39(10)2023 10 03.

Artículo en Inglés | MEDLINE | ID: mdl-37878810

RESUMEN

MOTIVATION: Gene name normalization is an important yet highly complex task in biomedical text mining research, as gene names can be highly ambiguous and may refer to different genes in different species or share similar names with other bioconcepts. This poses a challenge for accurately identifying and linking gene mentions to their corresponding entries in databases such as NCBI Gene or UniProt. While there has been a body of literature on the gene normalization task, few have addressed all of these challenges or make their solutions publicly available to the scientific community. RESULTS: Building on the success of GNormPlus, we have created GNorm2: a more advanced tool with optimized functions and improved performance. GNorm2 integrates a range of advanced deep learning-based methods, resulting in the highest levels of accuracy and efficiency for gene recognition and normalization to date. Our tool is freely available for download. AVAILABILITY AND IMPLEMENTATION: https://github.com/ncbi/GNorm2.

Asunto(s)

Minería de Datos , Minería de Datos/métodos , Bases de Datos Factuales

6.

BioREx: Improving biomedical relation extraction by leveraging heterogeneous datasets.

Lai, Po-Ting; Wei, Chih-Hsuan; Luo, Ling; Chen, Qingyu; Lu, Zhiyong.

J Biomed Inform ; 146: 104487, 2023 Oct.

Artículo en Inglés | MEDLINE | ID: mdl-37673376

RESUMEN

Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts from free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods were used primarily to train machine learning models on individual RE datasets, such as protein-protein interaction and chemical-induced disease relation. Manual dataset annotation, however, is highly expensive and time-consuming, as it requires domain knowledge. Existing RE datasets are usually domain-specific or small, which limits the development of generalized and high-performing RE models. In this work, we present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset. Based on the framework and dataset, we report on BioREx, a data-centric approach for extracting relations. Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, setting a new SOTA from 74.4% to 79.6% in F-1 measure on the recently released BioRED corpus. We further demonstrate that the combined dataset can improve performance for five different RE tasks. In addition, we show that on average BioREx compares favorably to current best-performing methods such as transfer learning and multi-task learning. Finally, we demonstrate BioREx's robustness and generalizability in two independent RE tasks not previously seen in training data: drug-drug N-ary combination and document-level gene-disease RE. The integrated dataset and optimized method have been packaged as a stand-alone tool available at https://github.com/ncbi/BioREx.

7.

Biomedical named entity recognition and linking datasets: survey and our recent development.

Huang, Ming-Siang; Lai, Po-Ting; Lin, Pei-Yen; You, Yu-Ting; Tsai, Richard Tzong-Han; Hsu, Wen-Lian.

Brief Bioinform ; 21(6): 2219-2238, 2020 12 01.

Artículo en Inglés | MEDLINE | ID: mdl-32602538

RESUMEN

Natural language processing (NLP) is widely applied in biological domains to retrieve information from publications. Systems to address numerous applications exist, such as biomedical named entity recognition (BNER), named entity normalization (NEN) and protein-protein interaction extraction (PPIE). High-quality datasets can assist the development of robust and reliable systems; however, due to the endless applications and evolving techniques, the annotations of benchmark datasets may become outdated and inappropriate. In this study, we first review commonlyused BNER datasets and their potential annotation problems such as inconsistency and low portability. Then, we introduce a revised version of the JNLPBA dataset that solves potential problems in the original and use state-of-the-art named entity recognition systems to evaluate its portability to different kinds of biomedical literature, including protein-protein interaction and biology events. Lastly, we introduce an ensembled biomedical entity dataset (EBED) by extending the revised JNLPBA dataset with PubMed Central full-text paragraphs, figure captions and patent abstracts. This EBED is a multi-task dataset that covers annotations including gene, disease and chemical entities. In total, it contains 85000 entity mentions, 25000 entity mentions with database identifiers and 5000 attribute tags. To demonstrate the usage of the EBED, we review the BNER track from the AI CUP Biomedical Paper Analysis challenge. Availability: The revised JNLPBA dataset is available at https://iasl-btm.iis.sinica.edu.tw/BNER/Content/Re vised_JNLPBA.zip. The EBED dataset is available at https://iasl-btm.iis.sinica.edu.tw/BNER/Content/AICUP _EBED_dataset.rar. Contact: Email: thtsai@g.ncu.edu.tw, Tel. 886-3-4227151 ext. 35203, Fax: 886-3-422-2681 Email: hsu@iis.sinica.edu.tw, Tel. 886-2-2788-3799 ext. 2211, Fax: 886-2-2782-4814 Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.

Asunto(s)

Minería de Datos , Almacenamiento y Recuperación de la Información , Procesamiento de Lenguaje Natural , Benchmarking , Biología Computacional/métodos , Minería de Datos/métodos , Bases de Datos Factuales , Redes Neurales de la Computación , PubMed , Programas Informáticos , Encuestas y Cuestionarios

8.

BERT-GT: cross-sentence n-ary relation extraction with BERT and Graph Transformer.

Lai, Po-Ting; Lu, Zhiyong.

Bioinformatics ; 36(24): 5678-5685, 2021 Apr 05.

Artículo en Inglés | MEDLINE | ID: mdl-33416851

RESUMEN

MOTIVATION: A biomedical relation statement is commonly expressed in multiple sentences and consists of many concepts, including gene, disease, chemical and mutation. To automatically extract information from biomedical literature, existing biomedical text-mining approaches typically formulate the problem as a cross-sentence n-ary relation-extraction task that detects relations among n entities across multiple sentences, and use either a graph neural network (GNN) with long short-term memory (LSTM) or an attention mechanism. Recently, Transformer has been shown to outperform LSTM on many natural language processing (NLP) tasks. RESULTS: In this work, we propose a novel architecture that combines Bidirectional Encoder Representations from Transformers with Graph Transformer (BERT-GT), through integrating a neighbor-attention mechanism into the BERT architecture. Unlike the original Transformer architecture, which utilizes the whole sentence(s) to calculate the attention of the current token, the neighbor-attention mechanism in our method calculates its attention utilizing only its neighbor tokens. Thus, each token can pay attention to its neighbor information with little noise. We show that this is critically important when the text is very long, as in cross-sentence or abstract-level relation-extraction tasks. Our benchmarking results show improvements of 5.44% and 3.89% in accuracy and F1-measure over the state-of-the-art on n-ary and chemical-protein relation datasets, suggesting BERT-GT is a robust approach that is applicable to other biomedical relation extraction tasks or datasets. AVAILABILITY AND IMPLEMENTATION: the source code of BERT-GT will be made freely available at https://github.com/ncbi/bert_gt upon publication. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

9.

PhenoTagger: a hybrid method for phenotype concept recognition using human phenotype ontology.

Luo, Ling; Yan, Shankai; Lai, Po-Ting; Veltri, Daniel; Oler, Andrew; Xirasagar, Sandhya; Ghosh, Rajarshi; Similuk, Morgan; Robinson, Peter N; Lu, Zhiyong.

Bioinformatics ; 37(13): 1884-1890, 2021 Jul 27.

Artículo en Inglés | MEDLINE | ID: mdl-33471061

RESUMEN

MOTIVATION: Automatic phenotype concept recognition from unstructured text remains a challenging task in biomedical text mining research. Previous works that address the task typically use dictionary-based matching methods, which can achieve high precision but suffer from lower recall. Recently, machine learning-based methods have been proposed to identify biomedical concepts, which can recognize more unseen concept synonyms by automatic feature learning. However, most methods require large corpora of manually annotated data for model training, which is difficult to obtain due to the high cost of human annotation. RESULTS: In this article, we propose PhenoTagger, a hybrid method that combines both dictionary and machine learning-based methods to recognize Human Phenotype Ontology (HPO) concepts in unstructured biomedical text. We first use all concepts and synonyms in HPO to construct a dictionary, which is then used to automatically build a distantly supervised training dataset for machine learning. Next, a cutting-edge deep learning model is trained to classify each candidate phrase (n-gram from input sentence) into a corresponding concept label. Finally, the dictionary and machine learning-based prediction results are combined for improved performance. Our method is validated with two HPO corpora, and the results show that PhenoTagger compares favorably to previous methods. In addition, to demonstrate the generalizability of our method, we retrained PhenoTagger using the disease ontology MEDIC for disease concept recognition to investigate the effect of training on different ontologies. Experimental results on the NCBI disease corpus show that PhenoTagger without requiring manually annotated training data achieves competitive performance as compared with state-of-the-art supervised methods. AVAILABILITYAND IMPLEMENTATION: The source code, API information and data for PhenoTagger are freely available at https://github.com/ncbi-nlp/PhenoTagger. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

10.

PhenoRerank: A re-ranking model for phenotypic concept recognition pre-trained on human phenotype ontology.

Yan, Shankai; Luo, Ling; Lai, Po-Ting; Veltri, Daniel; Oler, Andrew J; Xirasagar, Sandhya; Ghosh, Rajarshi; Similuk, Morgan; Robinson, Peter N; Lu, Zhiyong.

J Biomed Inform ; 129: 104059, 2022 05.

Artículo en Inglés | MEDLINE | ID: mdl-35351638

RESUMEN

The study aims at developing a neural network model to improve the performance of Human Phenotype Ontology (HPO) concept recognition tools. We used the terms, definitions, and comments about the phenotypic concepts in the HPO database to train our model. The document to be analyzed is first split into sentences and annotated with a base method to generate candidate concepts. The sentences, along with the candidate concepts, are then fed into the pre-trained model for re-ranking. Our model comprises the pre-trained BlueBERT and a feature selection module, followed by a contrastive loss. We re-ranked the results generated by three robust HPO annotation tools and compared the performance against most of the existing approaches. The experimental results show that our model can improve the performance of the existing methods. Significantly, it boosted 3.0% and 5.6% in F1 score on the two evaluated datasets compared with the base methods. It removed more than 80% of the false positives predicted by the base methods, resulting in up to 18% improvement in precision. Our model utilizes the descriptive data in the ontology and the contextual information in the sentences for re-ranking. The results indicate that the additional information and the re-ranking model can significantly enhance the precision of HPO concept recognition compared with the base method.

Asunto(s)

Lenguaje , Redes Neurales de la Computación , Bases de Datos Factuales , Humanos , Fenotipo

11.

A resource-saving collective approach to biomedical semantic role labeling.

Tsai, Richard Tzong-Han; Lai, Po-Ting.

BMC Bioinformatics ; 15: 160, 2014 May 27.

Artículo en Inglés | MEDLINE | ID: mdl-24884358

RESUMEN

BACKGROUND: Biomedical semantic role labeling (BioSRL) is a natural language processing technique that identifies the semantic roles of the words or phrases in sentences describing biological processes and expresses them as predicate-argument structures (PAS's). Currently, a major problem of BioSRL is that most systems label every node in a full parse tree independently; however, some nodes always exhibit dependency. In general SRL, collective approaches based on the Markov logic network (MLN) model have been successful in dealing with this problem. However, in BioSRL such an approach has not been attempted because it would require more training data to recognize the more specialized and diverse terms found in biomedical literature, increasing training time and computational complexity. RESULTS: We first constructed a collective BioSRL system based on MLN. This system, called collective BIOSMILE (CBIOSMILE), is trained on the BioProp corpus. To reduce the resources used in BioSRL training, we employ a tree-pruning filter to remove unlikely nodes from the parse tree and four argument candidate identifiers to retain candidate nodes in the tree. Nodes not recognized by any candidate identifier are discarded. The pruned annotated parse trees are used to train a resource-saving MLN-based system, which is referred to as resource-saving collective BIOSMILE (RCBIOSMILE). Our experimental results show that our proposed CBIOSMILE system outperforms BIOSMILE, which is the top BioSRL system. Furthermore, our proposed RCBIOSMILE maintains the same level of accuracy as CBIOSMILE using 92% less memory and 57% less training time. CONCLUSIONS: This greatly improved efficiency makes RCBIOSMILE potentially suitable for training on much larger BioSRL corpora over more biomedical domains. Compared to real-world biomedical corpora, BioProp is relatively small, containing only 445 MEDLINE abstracts and 30 event triggers. It is not large enough for practical applications, such as pathway construction. We consider it of primary importance to pursue SRL training on large corpora in the future.

Asunto(s)

Semántica , Investigación Biomédica , Minería de Datos , Bases de Datos Factuales , Cadenas de Markov

12.

Targeting the Warburg effect with a novel glucose transporter inhibitor to overcome gemcitabine resistance in pancreatic cancer cells.

Lai, I-Lu; Chou, Chih-Chien; Lai, Po-Ting; Fang, Chun-Sheng; Shirley, Lawrence A; Yan, Ribai; Mo, Xiaokui; Bloomston, Mark; Kulp, Samuel K; Bekaii-Saab, Tanios; Chen, Ching-Shih.

Carcinogenesis ; 35(10): 2203-13, 2014 Oct.

Artículo en Inglés | MEDLINE | ID: mdl-24879635

RESUMEN

Gemcitabine resistance remains a significant clinical challenge. Here, we used a novel glucose transporter (Glut) inhibitor, CG-5, as a proof-of-concept compound to investigate the therapeutic utility of targeting the Warburg effect to overcome gemcitabine resistance in pancreatic cancer. The effects of gemcitabine and/or CG-5 on viability, survival, glucose uptake and DNA damage were evaluated in gemcitabine-sensitive and gemcitabine-resistant pancreatic cancer cell lines. Mechanistic studies were conducted to determine the molecular basis of gemcitabine resistance and the mechanism of CG-5-induced sensitization to gemcitabine. The effects of CG-5 on gemcitabine sensitivity were investigated in a xenograft tumor model of gemcitabine-resistant pancreatic cancer. In contrast to gemcitabine-sensitive pancreatic cancer cells, the resistant Panc-1 and Panc-1(GemR) cells responded to gemcitabine by increasing the expression of ribonucleotide reductase M2 catalytic subunit (RRM2) through E2F1-mediated transcriptional activation. Acting as a pan-Glut inhibitor, CG-5 abrogated this gemcitabine-induced upregulation of RRM2 through decreased E2F1 expression, thereby enhancing gemcitabine-induced DNA damage and inhibition of cell survival. This CG-5-induced inhibition of E2F1 expression was mediated by the induction of a previously unreported E2F1-targeted microRNA, miR-520f. The addition of oral CG-5 to gemcitabine therapy caused greater suppression of Panc-1(GemR) xenograft tumor growth in vivo than either drug alone. Glut inhibition may be an effective strategy to enhance gemcitabine activity for the treatment of pancreatic cancer.

Asunto(s)

Desoxicitidina/análogos & derivados , Resistencia a Antineoplásicos/efectos de los fármacos , Proteínas Facilitadoras del Transporte de la Glucosa/antagonistas & inhibidores , Neoplasias Pancreáticas/tratamiento farmacológico , Tiazolidinedionas/farmacología , Animales , Antimetabolitos Antineoplásicos/farmacología , Línea Celular Tumoral/efectos de los fármacos , Supervivencia Celular/efectos de los fármacos , Desoxicitidina/farmacología , Factor de Transcripción E2F1 , Femenino , Glucosa/metabolismo , Humanos , Ratones , Ratones Desnudos , MicroARNs/genética , Ribonucleósido Difosfato Reductasa/genética , Ribonucleósido Difosfato Reductasa/metabolismo , Ensayos Antitumor por Modelo de Xenoinjerto , Gemcitabina , Neoplasias Pancreáticas

13.

Retraction: Targeting the Warburg effect with a novel glucose transporter inhibitor to overcome gemcitabine resistance in pancreatic cancer cells.

Lai, I-Lu; Chou, Chih-Chien; Lai, Po-Ting; Fang, Chun-Sheng; Shirley, Lawrence A; Yan, Ribai; Mo, Xiaokui; Bloomston, Mark; K Kulp, Samuel; Bekaii-Saab, Tanios; Chen, Ching-Shih.

Carcinogenesis ; 40(2): e16, 2019 Apr 29.

Artículo en Inglés | MEDLINE | ID: mdl-31034565

14.

PubTator 3.0: an AI-powered Literature Resource for Unlocking Biomedical Knowledge.

Wei, Chih-Hsuan; Allot, Alexis; Lai, Po-Ting; Leaman, Robert; Tian, Shubo; Luo, Ling; Jin, Qiao; Wang, Zhizheng; Chen, Qingyu; Lu, Zhiyong.

ArXiv ; 2024 Jan 19.

Artículo en Inglés | MEDLINE | ID: mdl-38410657

RESUMEN

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases, and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

15.

GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases.

Wang, Zhizheng; Jin, Qiao; Wei, Chih-Hsuan; Tian, Shubo; Lai, Po-Ting; Zhu, Qingqing; Day, Chi-Ping; Ross, Christina; Lu, Zhiyong.

ArXiv ; 2024 May 25.

Artículo en Inglés | MEDLINE | ID: mdl-38903746

RESUMEN

Gene set knowledge discovery is essential for advancing human functional genomics. Recent studies have shown promising performance by harnessing the power of Large Language Models (LLMs) on this task. Nonetheless, their results are subject to several limitations common in LLMs such as hallucinations. In response, we present GeneAgent, a first-of-its-kind language agent featuring self-verification capability. It autonomously interacts with various biological databases and leverages relevant domain knowledge to improve accuracy and reduce hallucination occurrences. Benchmarking on 1,106 gene sets from different sources, GeneAgent consistently outperforms standard GPT-4 by a significant margin. Moreover, a detailed manual review confirms the effectiveness of the self-verification module in minimizing hallucinations and generating more reliable analytical narratives. To demonstrate its practical utility, we apply GeneAgent to seven novel gene sets derived from mouse B2905 melanoma cell lines, with expert evaluations showing that GeneAgent offers novel insights into gene functions and subsequently expedites knowledge discovery.

16.

EnzChemRED, a rich enzyme chemistry relation extraction dataset.

Lai, Po-Ting; Coudert, Elisabeth; Aimo, Lucila; Axelsen, Kristian; Breuza, Lionel; de Castro, Edouard; Feuermann, Marc; Morgat, Anne; Pourcel, Lucille; Pedruzzi, Ivo; Poux, Sylvain; Redaschi, Nicole; Rivoire, Catherine; Sveshnikova, Anastasia; Wei, Chih-Hsuan; Leaman, Robert; Luo, Ling; Lu, Zhiyong; Bridge, Alan.

ArXiv ; 2024 Apr 22.

Artículo en Inglés | MEDLINE | ID: mdl-38903736

RESUMEN

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text (Named Entity Recognition, or NER) and to extract the chemical conversions in which they participate (Relation Extraction, or RE), with average F1 score of 86.30% for NER, 86.66% for RE for chemical conversion pairs, and 83.79% for RE for chemical conversion pairs and linked enzymes. We combine the best performing methods after fine-tuning using EnzChemRED to create an end-to-end pipeline for knowledge extraction from text and apply this to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea. The EnzChemRED corpus is freely available at https://ftp.expasy.org/databases/rhea/nlp/.

17.

BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets.

Lai, Po-Ting; Wei, Chih-Hsuan; Luo, Ling; Chen, Qingyu; Lu, Zhiyong.

ArXiv ; 2023 Jun 19.

Artículo en Inglés | MEDLINE | ID: mdl-37502629

RESUMEN

Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts from free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods were used primarily to train machine learning models on individual RE datasets, such as protein-protein interaction and chemical-induced disease relation. Manual dataset annotation, however, is highly expensive and time-consuming, as it requires domain knowledge. Existing RE datasets are usually domain-specific or small, which limits the development of generalized and high-performing RE models. In this work, we present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset. Based on the framework and dataset, we report on BioREx, a data-centric approach for extracting relations. Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, setting a new SOTA from 74.4% to 79.6% in F-1 measure on the recently released BioRED corpus. We further demonstrate that the combined dataset can improve performance for five different RE tasks. In addition, we show that on average BioREx compares favorably to current best-performing methods such as transfer learning and multi-task learning. Finally, we demonstrate BioREx's robustness and generalizability in two independent RE tasks not previously seen in training data: drug-drug N-ary combination and document-level gene-disease RE. The integrated dataset and optimized method have been packaged as a stand-alone tool available at https://github.com/ncbi/BioREx.

18.

Tuning Electrochemical Stability of 5,10-Ditolylphenazine-Based Antiaromatic Materials for Unipolar Memristor toward Artificial Synapses Application.

Kuo, Kai-Hua; Chiu, Yi-Jhen; Hou, Yu-Che; Lai, Po-Ting; Chen, Cheng-Yueh; Tan, Guang-Hsun; Lin, Hao-Wu; Wong, Ken-Tsung.

ACS Appl Mater Interfaces ; 15(37): 44033-44042, 2023 Sep 20.

Artículo en Inglés | MEDLINE | ID: mdl-37694918

RESUMEN

Three organic conjugated small molecules, DTA-DTPZ, Cz-DTPZ, and DTA-me-DTPZ comprising an antiaromatic 5,10-ditolylphenazine (DTPZ) core and electron-donating peripheral substituents with high HOMOs (-4.2 to -4.7 eV) and multiple reversible oxidative potentials are reported. The corresponding films sandwiched between two electrodes show unipolar and switchable hysteresis current-voltage (I-V) characteristics upon voltage sweeping, revealing the prominent features of nonvolatile memristor behaviors. The numerical simulation of the I-V curves suggests that the carriers generated by the oxidized molecules lead to the increment of conductance. However, the accumulated carriers tend to deteriorate the device endurance. The electroactive sites are fully blocked in the dimethylated molecule DTA-me-DTPZ, preventing the irreversible electrochemical reaction, thereby boosting the endurance of the memristor device over 300 cycles. Despite the considerable improvement in endurance, the decrement of on/off ratio from 105 to 101 after 250 cycles suggests that the excessive charge carriers (radical cations) remains a problem. Thus, a new strategy of doping an electron-deficient material, CN-T2T, into the unipolar active layer was introduced to further improve the device stability. The device containing DTA-me-DTPZ:CNT2T (1:1) blend as the active layer retained the endurance and on/off ratio (â¼104) upon sweeping 300 cycles. The molecular designs and doping strategy demonstrate effective approaches toward more stable metal-free organic conjugated small-molecule memristors.

19.

Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health.

Tian, Shubo; Jin, Qiao; Yeganova, Lana; Lai, Po-Ting; Zhu, Qingqing; Chen, Xiuying; Yang, Yifan; Chen, Qingyu; Kim, Won; Comeau, Donald C; Islamaj, Rezarta; Kapoor, Aadit; Gao, Xin; Lu, Zhiyong.

ArXiv ; 2023 Oct 17.

Artículo en Inglés | MEDLINE | ID: mdl-37904734

RESUMEN

ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities. This has subsequently led to the emergence of diverse applications in the field of biomedicine and health. In this work, we examine the diverse applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction, and medical education, and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of biomedical domain presents unique challenges. Following an extensive literature survey, we find that significant advances have been made in the field of text generation tasks, surpassing the previous state-of-the-art methods. For other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, like ChatGPT, in the fields of biomedicine and health entails various risks and challenges, including fabricated information in its generated responses, as well as legal and privacy concerns associated with sensitive patient data. We believe this survey can provide a comprehensive and timely overview to biomedical researchers and healthcare practitioners on the opportunities and challenges associated with using ChatGPT and other LLMs for transforming biomedicine and health.

20.

Vacuum-Deposited Inorganic Perovskite Light-Emitting Diodes with External Quantum Efficiency Exceeding 10% via Composition and Crystallinity Manipulation of Emission Layer under High Vacuum.

Hsieh, Chung-An; Tan, Guang-Hsun; Chuang, Yung-Tang; Lin, Hao-Cheng; Lai, Po-Ting; Jan, Pei-En; Chen, Bo-Han; Lu, Chih-Hsuan; Yang, Shang-Da; Hsiao, Kai-Yuan; Lu, Ming-Yen; Chen, Li-Yin; Lin, Hao-Wu.

Adv Sci (Weinh) ; 10(10): e2206076, 2023 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-36748267

RESUMEN

Although vacuum-deposited metal halide perovskite light-emitting diodes (PeLEDs) have great promise for use in large-area high-color-gamut displays, the efficiency of vacuum-sublimed PeLEDs currently lags that of solution-processed counterparts. In this study, highly efficient vacuum-deposited PeLEDs are prepared through a process of optimizing the stoichiometric ratio of the sublimed precursors under high vacuum and incorporating ultrathin under- and upper-layers for the perovskite emission layer (EML). In contrast to the situation in most vacuum-deposited organic light-emitting devices, the properties of these perovskite EMLs are highly influenced by the presence and nature of the upper- and presublimed materials, thereby allowing us to enhance the performance of the resulting devices. By eliminating Pb° formation and passivating defects in the perovskite EMLs, the PeLEDs achieve an outstanding external quantum efficiency (EQE) of 10.9% when applying a very smooth and flat geometry; it reaches an extraordinarily high value of 21.1% when integrating a light out-coupling structure, breaking through the 10% EQE milestone of vacuum-deposited PeLEDs.

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA