Search | BVS Integralidade em Saúde

1.

PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge.

Wei, Chih-Hsuan; Allot, Alexis; Lai, Po-Ting; Leaman, Robert; Tian, Shubo; Luo, Ling; Jin, Qiao; Wang, Zhizheng; Chen, Qingyu; Lu, Zhiyong.

Nucleic Acids Res ; 2024 Apr 04.

Article in English | MEDLINE | ID: mdl-38572754

ABSTRACT

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

2.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Beck, Jeff; Bolton, Evan E; Brister, J Rodney; Chan, Jessica; Comeau, Donald C; Connor, Ryan; DiCuccio, Michael; Farrell, Catherine M; Feldgarden, Michael; Fine, Anna M; Funk, Kathryn; Hatcher, Eneida; Hoeppner, Marilu; Kane, Megan; Kannan, Sivakumar; Katz, Kenneth S; Kelly, Christopher; Klimke, William; Kim, Sunghwan; Kimchi, Avi; Landrum, Melissa; Lathrop, Stacy; Lu, Zhiyong; Malheiro, Adriana; Marchler-Bauer, Aron; Murphy, Terence D; Phan, Lon; Prasad, Arjun B; Pujar, Shashikant; Sawyer, Amanda; Schmieder, Erin; Schneider, Valerie A; Schoch, Conrad L; Sharma, Shobha; Thibaud-Nissen, Françoise; Trawick, Barton W; Venkatapathi, Thilakam; Wang, Jiyao; Pruitt, Kim D; Sherry, Stephen T.

Nucleic Acids Res ; 52(D1): D33-D43, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37994677

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.

Subjects

Databases, Genetic ; National Library of Medicine (U.S.) ; Biotechnology/instrumentation ; Databases, Nucleic Acid ; Internet ; United States
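The abstract above notes that the E-utilities serve as the programming interface for most NCBI databases. As a minimal sketch of what that looks like in practice, the snippet below builds (but does not send) an ESearch request URL. The helper name `build_esearch_url` and the default parameter values are illustrative assumptions, not part of the cited article.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="pubmed", retmax=20):
    """Return an ESearch URL for `term` in database `db`.

    Hypothetical helper for illustration: it only assembles the URL,
    so no network access is needed to try it out.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Example: a PubMed author query similar to the one behind this listing.
    print(build_esearch_url("Lu Z[Author]"))
```

The resulting URL can be fetched with any HTTP client; heavy use should follow NCBI's published usage guidelines (for example, supplying an API key).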

3.

Opportunities and challenges for ChatGPT and large language models in biomedicine and health.

Tian, Shubo; Jin, Qiao; Yeganova, Lana; Lai, Po-Ting; Zhu, Qingqing; Chen, Xiuying; Yang, Yifan; Chen, Qingyu; Kim, Won; Comeau, Donald C; Islamaj, Rezarta; Kapoor, Aadit; Gao, Xin; Lu, Zhiyong.

Brief Bioinform ; 25(1), 2023 Nov 22.

Article in English | MEDLINE | ID: mdl-38168838

ABSTRACT

ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities. This has subsequently led to the emergence of diverse applications in the field of biomedicine and health. In this work, we examine the diverse applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically, we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction and medical education and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of biomedical domain presents unique challenges. Following an extensive literature survey, we find that significant advances have been made in the field of text generation tasks, surpassing the previous state-of-the-art methods. For other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, like ChatGPT, in the fields of biomedicine and health entails various risks and challenges, including fabricated information in its generated responses, as well as legal and privacy concerns associated with sensitive patient data. We believe this survey can provide a comprehensive and timely overview to biomedical researchers and healthcare practitioners on the opportunities and challenges associated with using ChatGPT and other LLMs for transforming biomedicine and health.

Subjects

Information Storage and Retrieval ; Language ; Humans ; Privacy ; Research Personnel

4.

GeneGPT: augmenting large language models with domain tools for improved access to biomedical information.

Jin, Qiao; Yang, Yifan; Chen, Qingyu; Lu, Zhiyong.

Bioinformatics ; 40(2)2024 Feb 01.

Article in English | MEDLINE | ID: mdl-38341654

ABSTRACT

MOTIVATION: While large language models (LLMs) have been successfully applied to various tasks, they still face challenges with hallucinations. Augmenting LLMs with domain-specific tools such as database utilities can facilitate easier and more precise access to specialized knowledge. In this article, we present GeneGPT, a novel method for teaching LLMs to use the Web APIs of the National Center for Biotechnology Information (NCBI) for answering genomics questions. Specifically, we prompt Codex to solve the GeneTuring tests with NCBI Web APIs by in-context learning and an augmented decoding algorithm that can detect and execute API calls. RESULTS: Experimental results show that GeneGPT achieves state-of-the-art performance on eight tasks in the GeneTuring benchmark with an average score of 0.83, largely surpassing retrieval-augmented LLMs such as the new Bing (0.44), biomedical LLMs such as BioMedLM (0.08) and BioGPT (0.04), as well as GPT-3 (0.16) and ChatGPT (0.12). Our further analyses suggest that: First, API demonstrations have good cross-task generalizability and are more useful than documentations for in-context learning; second, GeneGPT can generalize to longer chains of API calls and answer multi-hop questions in GeneHop, a novel dataset introduced in this work; finally, different types of errors are enriched in different tasks, providing valuable insights for future improvements. AVAILABILITY AND IMPLEMENTATION: The GeneGPT code and data are publicly available at https://github.com/ncbi/GeneGPT.

Subjects

Algorithms ; Benchmarking ; Databases, Factual ; Documentation ; Language

5.

Advancing entity recognition in biomedicine via instruction tuning of large language models.

Keloth, Vipina K; Hu, Yan; Xie, Qianqian; Peng, Xueqing; Wang, Yan; Zheng, Andrew; Selek, Melih; Raja, Kalpana; Wei, Chih Hsuan; Jin, Qiao; Lu, Zhiyong; Chen, Qingyu; Xu, Hua.

Bioinformatics ; 40(4)2024 Mar 29.

Article in English | MEDLINE | ID: mdl-38514400

ABSTRACT

MOTIVATION: Large Language Models (LLMs) have the potential to revolutionize the field of Natural Language Processing, excelling not only in text generation and reasoning tasks but also in their ability for zero/few-shot learning, swiftly adapting to new tasks with minimal fine-tuning. LLMs have also demonstrated great promise in biomedical and healthcare applications. However, when it comes to Named Entity Recognition (NER), particularly within the biomedical domain, LLMs fall short of the effectiveness exhibited by fine-tuned domain-specific models. One key reason is that NER is typically conceptualized as a sequence labeling task, whereas LLMs are optimized for text generation and reasoning tasks. RESULTS: We developed an instruction-based learning paradigm that transforms biomedical NER from a sequence labeling task into a generation task. This paradigm is end-to-end and streamlines the training and evaluation process by automatically repurposing pre-existing biomedical NER datasets. We further developed BioNER-LLaMA using the proposed paradigm with LLaMA-7B as the foundational LLM. We conducted extensive testing on BioNER-LLaMA across three widely recognized biomedical NER datasets, consisting of entities related to diseases, chemicals, and genes. The results revealed that BioNER-LLaMA consistently achieved higher F1-scores ranging from 5% to 30% compared to the few-shot learning capabilities of GPT-4 on datasets with different biomedical entities. We show that a general-domain LLM can match the performance of rigorously fine-tuned PubMedBERT models and PMC-LLaMA, biomedical-specific language model. Our findings underscore the potential of our proposed paradigm in developing general-domain LLMs that can rival SOTA performances in multi-task, multi-domain scenarios in biomedical and health applications. AVAILABILITY AND IMPLEMENTATION: Datasets and other resources are available at https://github.com/BIDS-Xu-Lab/BioNER-LLaMA.

Subjects

Camelids, New World ; Deep Learning ; Animals ; Language ; Natural Language Processing

6.

LitCovid in 2022: an information resource for the COVID-19 literature.

Chen, Qingyu; Allot, Alexis; Leaman, Robert; Wei, Chih-Hsuan; Aghaarabi, Elaheh; Guerrerio, John J; Xu, Lilly; Lu, Zhiyong.

Nucleic Acids Res ; 51(D1): D1512-D1518, 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36350613

ABSTRACT

LitCovid (https://www.ncbi.nlm.nih.gov/research/coronavirus/), first launched in February 2020, is a first-of-its-kind literature hub for tracking up-to-date published research on COVID-19. The number of articles in LitCovid has increased from 55 000 to ~300 000 over the past 2.5 years, with a consistent growth rate of ~10 000 articles per month. In addition to the rapid literature growth, the COVID-19 pandemic has evolved dramatically. For instance, the Omicron variant has now accounted for over 98% of new infections in the United States. In response to the continuing evolution of the COVID-19 pandemic, this article describes significant updates to LitCovid over the last 2 years. First, we introduced the long Covid collection consisting of the articles on COVID-19 survivors experiencing ongoing multisystemic symptoms, including respiratory issues, cardiovascular disease, cognitive impairment, and profound fatigue. Second, we provided new annotations on the latest COVID-19 strains and vaccines mentioned in the literature. Third, we improved several existing features with more accurate machine learning algorithms for annotating topics and classifying articles relevant to COVID-19. LitCovid has been widely used with millions of accesses by users worldwide on various information needs and continues to play a critical role in collecting, curating and standardizing the latest knowledge on the COVID-19 literature.

Subjects

COVID-19 ; Databases, Bibliographic ; Humans ; COVID-19/epidemiology ; Pandemics ; Post-Acute COVID-19 Syndrome ; SARS-CoV-2 ; United States

7.

Database resources of the National Center for Biotechnology Information in 2023.

Sayers, Eric W; Bolton, Evan E; Brister, J Rodney; Canese, Kathi; Chan, Jessica; Comeau, Donald C; Farrell, Catherine M; Feldgarden, Michael; Fine, Anna M; Funk, Kathryn; Hatcher, Eneida; Kannan, Sivakumar; Kelly, Christopher; Kim, Sunghwan; Klimke, William; Landrum, Melissa J; Lathrop, Stacy; Lu, Zhiyong; Madden, Thomas L; Malheiro, Adriana; Marchler-Bauer, Aron; Murphy, Terence D; Phan, Lon; Pujar, Shashikant; Rangwala, Sanjida H; Schneider, Valerie A; Tse, Tony; Wang, Jiyao; Ye, Jian; Trawick, Barton W; Pruitt, Kim D; Sherry, Stephen T.

Nucleic Acids Res ; 51(D1): D29-D38, 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36370100

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. New resources include the Comparative Genome Resource (CGR) and the BLAST ClusteredNR database. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, IgBLAST, GDV, RefSeq, NCBI Virus, GenBank type assemblies, iCn3D, ClinVar, GTR, dbGaP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.

Subjects

Databases, Genetic ; Databases, Nucleic Acid ; United States ; National Library of Medicine (U.S.) ; Sequence Alignment ; Biotechnology ; Internet

8.

Atomically Precise Single-Site Catalysts via Exsolution in a Polyoxometalate-Metal-Organic-Framework Architecture.

Chen, Zhihengyu; Gulam Rabbani, S M; Liu, Qin; Bi, Wentuan; Duan, Jiaxin; Lu, Zhiyong; Schweitzer, Neil M; Getman, Rachel B; Hupp, Joseph T; Chapman, Karena W.

J Am Chem Soc ; 146(12): 7950-7955, 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38483267

ABSTRACT

Single-site catalysts (SSCs) achieve a high catalytic performance through atomically dispersed active sites. A challenge facing the development of SSCs is aggregation of active catalytic species. Reducing the loading of these sites to very low levels is a common strategy to mitigate aggregation and sintering; however, this limits the tools that can be used to characterize the SSCs. Here we report a sintering-resistant SSC with high loading that is achieved by incorporating Anderson-Evans polyoxometalate clusters (POMs, MMo6O24, M = Rh/Pt) within NU-1000, a Zr-based metal-organic framework (MOF). The dual confinement provided by isolating the active site within the POM, then isolating the POMs within the MOF, facilitates the formation of isolated noble metal sites with low coordination numbers via exsolution from the POM during activation. The high loading (up to 3.2 wt %) that can be achieved without sintering allowed the local structure transformation in the POM cluster and the surrounding MOF to be evaluated using in situ X-ray scattering with pair distribution function (PDF) analysis. Notably, the Rh/Pt···Mo distance in the active catalyst is shorter than the M···M bond lengths in the respective bulk metals. Models of the active cluster structure were identified based on the PDF data with complementary computation and X-ray absorption spectroscopy analysis.

9.

Evolving use of ancestry, ethnicity, and race in genetics research-A survey spanning seven decades.

Byeon, Yen Ji Julia; Islamaj, Rezarta; Yeganova, Lana; Wilbur, W John; Lu, Zhiyong; Brody, Lawrence C; Bonham, Vence L.

Am J Hum Genet ; 108(12): 2215-2223, 2021 Dec 02.

Article in English | MEDLINE | ID: mdl-34861173

ABSTRACT

To inform continuous and rigorous reflection about the description of human populations in genomics research, this study investigates the historical and contemporary use of the terms "ancestry," "ethnicity," "race," and other population labels in The American Journal of Human Genetics from 1949 to 2018. We characterize these terms' frequency of use and assess their odds of co-occurrence with a set of social and genetic topical terms. Throughout The Journal's 70-year history, "ancestry" and "ethnicity" have increased in use, appearing in 33% and 26% of articles in 2009-2018, while the use of "race" has decreased, occurring in 4% of articles in 2009-2018. Although its overall use has declined, the odds of "race" appearing in the presence of "ethnicity" has increased relative to the odds of occurring in its absence. Forms of population descriptors "Caucasian" and "Negro" have largely disappeared from The Journal (<1% of articles in 2009-2018). Conversely, the continental labels "African," "Asian," and "European" have increased in use and appear in 18%, 14%, and 42% of articles from 2009-2018, respectively. Decreasing uses of the terms "race," "Caucasian," and "Negro" are indicative of a transition away from the field's history of explicitly biological race science; at the same time, the increasing use of "ancestry," "ethnicity," and continental labels should serve to motivate ongoing reflection as the terminology used to describe genetic variation continues to evolve.

Subjects

Genetic Research ; Human Genetics/trends ; Ethnicity ; Genetic Research/history ; History, 20th Century ; History, 21st Century ; Human Genetics/history ; Humans ; Publishing/history ; Racial Groups

10.

One-Pot Synthesis of MOF@MOF: Structural Incompatibility Leads to Core-Shell Structure and Adaptability Control Makes the Sequence.

Tan, Hao; Zhao, Xiang; Du, Liting; Wang, Bufeng; Huang, Yongliang; Gu, Yupeng; Lu, Zhiyong.

Small ; 20(3): e2305881, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37670528

ABSTRACT

Core-shell metal-organic frameworks (MOF@MOF) are promising materials with sophisticated structures that can not only enhance the properties of MOFs but also endow them with new functions. The growth of isotropic core-shell MOFs is mostly limited to inconvenient stepwise seeding strategies with strict requirements, and by far one-pot synthesis is still of great challenge due to the interference of different components. Through two pairs of isoreticular MOFs, it reveals that the structural incompatibility is a prerequisite for the formation of MOFs@MOFs by one-pot synthesis, as illustrated by PMOF-3@HHU-9. It further unveils that the adaptability of the shell-MOF is a more key factor for nucleation kinetic control. MOFs with flexible linkers have comparably slower nucleation than MOFs with rigid linkers (forming PMOF-3@NJU-Bai21), and structural-flexible MOFs built by flexible linkers show the lowest nucleation and the most adaptability (affording NJU-Bai21@HHU-9). This degree of adaptability variation controls the sequence and further facilitates the synthesis of a first triple-layered core-shell MOF (PMOF-3@NJU-Bai21@HHU-9) by one-pot synthesis. The insight gained from this study will aid in the rational design and synthesis of other multi-shelled structures by one-pot synthesis and the further expansion of their applications.

11.

BioRED: a rich biomedical relation extraction dataset.

Luo, Ling; Lai, Po-Ting; Wei, Chih-Hsuan; Arighi, Cecilia N; Lu, Zhiyong.

Brief Bioinform ; 23(5), 2022 Sep 20.

Article in English | MEDLINE | ID: mdl-35849818

ABSTRACT

Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE only focus on relations of a single type (e.g. protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then, we present a first-of-its-kind biomedical relation extraction dataset (BioRED) with multiple entity types (e.g. gene/protein, disease, chemical) and relation pairs (e.g. gene-disease; chemical-chemical) at the document level, on a set of 600 PubMed abstracts. Furthermore, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including Bidirectional Encoder Representations from Transformers (BERT)-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement for the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a rich dataset can successfully facilitate the development of more accurate, efficient and robust RE systems for biomedicine. Availability: The BioRED dataset and annotation guidelines are freely available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/.

Subjects

Algorithms ; Data Mining ; Proteins ; PubMed

12.

AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning.

Luo, Ling; Wei, Chih-Hsuan; Lai, Po-Ting; Leaman, Robert; Chen, Qingyu; Lu, Zhiyong.

Bioinformatics ; 39(5), 2023 May 04.

Article in English | MEDLINE | ID: mdl-37171899

ABSTRACT

MOTIVATION: Biomedical named entity recognition (BioNER) seeks to automatically recognize biomedical entities in natural language text, serving as a necessary foundation for downstream text mining tasks and applications such as information extraction and question answering. Manually labeling training data for the BioNER task is costly, however, due to the significant domain expertise required for accurate annotation. The resulting data scarcity causes current BioNER approaches to be prone to overfitting, to suffer from limited generalizability, and to address a single entity type at a time (e.g. gene or disease). RESULTS: We therefore propose a novel all-in-one (AIO) scheme that uses external data from existing annotated resources to enhance the accuracy and stability of BioNER models. We further present AIONER, a general-purpose BioNER tool based on cutting-edge deep learning and our AIO schema. We evaluate AIONER on 14 BioNER benchmark tasks and show that AIONER is effective, robust, and compares favorably to other state-of-the-art approaches such as multi-task learning. We further demonstrate the practical utility of AIONER in three independent tasks to recognize entity types not previously seen in training data, as well as the advantages of AIONER over existing methods for processing biomedical text at a large scale (e.g. the entire PubMed data). AVAILABILITY AND IMPLEMENTATION: The source code, trained models and data for AIONER are freely available at https://github.com/ncbi/AIONER.

Subjects

Deep Learning ; Data Mining/methods ; Software ; Language ; PubMed

13.

GNorm2: an improved gene name recognition and normalization system.

Wei, Chih-Hsuan; Luo, Ling; Islamaj, Rezarta; Lai, Po-Ting; Lu, Zhiyong.

Bioinformatics ; 39(10), 2023 Oct 03.

Article in English | MEDLINE | ID: mdl-37878810

ABSTRACT

MOTIVATION: Gene name normalization is an important yet highly complex task in biomedical text mining research, as gene names can be highly ambiguous and may refer to different genes in different species or share similar names with other bioconcepts. This poses a challenge for accurately identifying and linking gene mentions to their corresponding entries in databases such as NCBI Gene or UniProt. While there has been a body of literature on the gene normalization task, few have addressed all of these challenges or make their solutions publicly available to the scientific community. RESULTS: Building on the success of GNormPlus, we have created GNorm2: a more advanced tool with optimized functions and improved performance. GNorm2 integrates a range of advanced deep learning-based methods, resulting in the highest levels of accuracy and efficiency for gene recognition and normalization to date. Our tool is freely available for download. AVAILABILITY AND IMPLEMENTATION: https://github.com/ncbi/GNorm2.

Subjects

Data Mining ; Data Mining/methods ; Databases, Factual

14.

MedCPT: Contrastive Pre-trained Transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval.

Jin, Qiao; Kim, Won; Chen, Qingyu; Comeau, Donald C; Yeganova, Lana; Wilbur, W John; Lu, Zhiyong.

Bioinformatics ; 39(11), 2023 Nov 01.

Article in English | MEDLINE | ID: mdl-37930897

ABSTRACT

MOTIVATION: Information retrieval (IR) is essential in biomedical knowledge acquisition and clinical decision support. While recent progress has shown that language model encoders perform better semantic retrieval, training such models requires abundant query-article annotations that are difficult to obtain in biomedicine. As a result, most biomedical IR systems only conduct lexical matching. In response, we introduce MedCPT, a first-of-its-kind Contrastively Pre-trained Transformer model for zero-shot semantic IR in biomedicine. RESULTS: To train MedCPT, we collected an unprecedented scale of 255 million user click logs from PubMed. With such data, we use contrastive learning to train a pair of closely integrated retriever and re-ranker. Experimental results show that MedCPT sets new state-of-the-art performance on six biomedical IR tasks, outperforming various baselines including much larger models, such as GPT-3-sized cpt-text-XL. In addition, MedCPT also generates better biomedical article and sentence representations for semantic evaluations. As such, MedCPT can be readily applied to various real-world biomedical IR tasks. AVAILABILITY AND IMPLEMENTATION: The MedCPT code and model are available at https://github.com/ncbi/MedCPT.

Subjects

Information Storage and Retrieval ; Semantics ; Language ; Natural Language Processing ; PubMed ; Review Literature as Topic

15.

Term-BLAST-like alignment tool for concept recognition in noisy clinical texts.

Groza, Tudor; Wu, Honghan; Dinger, Marcel E; Danis, Daniel; Hilton, Coleman; Bagley, Anita; Davids, Jon R; Luo, Ling; Lu, Zhiyong; Robinson, Peter N.

Bioinformatics ; 39(12), 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-38001031

ABSTRACT

MOTIVATION: Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts. RESULTS: Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT) leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show an increase of 10% in recall on scientific publications and 20% increase in recall on EHR records (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches. AVAILABILITY AND IMPLEMENTATION: Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.

Subjects

Algorithms ; Language ; Humans ; Sequence Alignment ; Electronic Health Records ; Publications
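The abstract above describes screening texts for candidate matches on the basis of matching k-mer counts. The toy sketch below illustrates that general idea only; it is not the TBLAT or Fenominal implementation, and the function names, the choice of k = 3, and the `min_shared` threshold are all assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(text, k=3):
    """Count overlapping character k-mers in lowercased text."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def shared_kmers(a, b, k=3):
    """Number of k-mers two strings share (multiset intersection)."""
    return sum((kmer_counts(a, k) & kmer_counts(b, k)).values())


def screen(query, terms, k=3, min_shared=2):
    """Keep vocabulary terms sharing at least `min_shared` k-mers with the query."""
    return [t for t in terms if shared_kmers(query, t, k) >= min_shared]


if __name__ == "__main__":
    # A misspelled clinical term still shares k-mers with the correct one.
    print(screen("siezure", ["seizure", "fever"]))
```

A screen of this kind only shortlists candidates cheaply; a separate scoring step, such as the spelling-error pattern scoring the abstract mentions, would still be needed to rank them.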

16.

An Updated Simplified Severity Scale for Age-Related Macular Degeneration Incorporating Reticular Pseudodrusen: Age-Related Eye Disease Study Report Number 42.

Agrón, Elvira; Domalpally, Amitha; Chen, Qingyu; Lu, Zhiyong; Chew, Emily Y; Keenan, Tiarnan D L.

Ophthalmology ; 2024 Apr 23.

Article in English | MEDLINE | ID: mdl-38657840

ABSTRACT

PURPOSE: To update the Age-Related Eye Disease Study (AREDS) simplified severity scale for risk of late age-related macular degeneration (AMD), including incorporation of reticular pseudodrusen (RPD), and to perform external validation on the Age-Related Eye Disease Study 2 (AREDS2). DESIGN: Post hoc analysis of 2 clinical trial cohorts: AREDS and AREDS2. PARTICIPANTS: Participants with no late AMD in either eye at baseline in AREDS (n = 2719) and AREDS2 (n = 1472). METHODS: Five-year rates of progression to late AMD were calculated according to levels 0 to 4 on the simplified severity scale after 2 updates: (1) noncentral geographic atrophy (GA) considered part of the outcome, rather than a risk feature, and (2) scale separation according to RPD status (determined by validated deep learning grading of color fundus photographs). MAIN OUTCOME MEASURES: Five-year rate of progression to late AMD (defined as neovascular AMD or any GA). RESULTS: In the AREDS, after the first scale update, the 5-year rates of progression to late AMD for levels 0 to 4 were 0.3%, 4.5%, 12.9%, 32.2%, and 55.6%, respectively. As the final simplified severity scale, the 5-year progression rates for levels 0 to 4 were 0.3%, 4.3%, 11.6%, 26.7%, and 50.0%, respectively, for participants without RPD at baseline and 2.8%, 8.0%, 29.0%, 58.7%, and 72.2%, respectively, for participants with RPD at baseline. In external validation on the AREDS2, for levels 2 to 4, the progression rates were similar: 15.0%, 27.7%, and 45.7% (RPD absent) and 26.2%, 46.0%, and 73.0% (RPD present), respectively. CONCLUSIONS: The AREDS AMD simplified severity scale has been modernized with 2 important updates. The new scale for individuals without RPD has 5-year progression rates of approximately 0.5%, 4%, 12%, 25%, and 50%, such that the rates on the original scale remain accurate. 
The new scale for individuals with RPD has 5-year progression rates of approximately 3%, 8%, 30%, 60%, and 70%, that is, approximately double for most levels. This scale fits updated definitions of late AMD, has increased prognostic accuracy, seems generalizable to similar populations, but remains simple for broad risk categorization. FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

17.

A survey of recent methods for addressing AI fairness and bias in biomedicine.

Yang, Yifan; Lin, Mingquan; Zhao, Han; Peng, Yifan; Huang, Furong; Lu, Zhiyong.

J Biomed Inform ; 154: 104646, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38677633

ABSTRACT

OBJECTIVES: Artificial intelligence (AI) systems have the potential to revolutionize clinical practices, including improving diagnostic accuracy and surgical decision-making, while also reducing costs and manpower. However, it is important to recognize that these systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender. Such biases can occur before, during, or after the development of AI models, making it critical to understand and address potential biases to enable the accurate and reliable application of AI models in clinical settings. To mitigate bias concerns during model development, we surveyed recent publications on different debiasing methods in the fields of biomedical natural language processing (NLP) or computer vision (CV). Then we discussed the methods, such as data perturbation and adversarial learning, that have been applied in the biomedical domain to address bias. METHODS: We performed our literature search on PubMed, ACM digital library, and IEEE Xplore of relevant articles published between January 2018 and December 2023 using multiple combinations of keywords. We then filtered the result of 10,041 articles automatically with loose constraints, and manually inspected the abstracts of the remaining 890 articles to identify the 55 articles included in this review. Additional articles in the references are also included in this review. We discuss each method and compare its strengths and weaknesses. Finally, we review other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness. RESULTS: The bias of AIs in biomedicine can originate from multiple sources such as insufficient data, sampling bias and the use of health-irrelevant features or race-adjusted algorithms. Existing debiasing methods that focus on algorithms can be categorized into distributional or algorithmic. Distributional methods include data augmentation, data perturbation, data reweighting methods, and federated learning. Algorithmic approaches include unsupervised representation learning, adversarial learning, disentangled representation learning, loss-based methods and causality-based methods.
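
As an illustration of one distributional method named in the abstract above (data reweighting), the following is a minimal sketch and not a method taken from the reviewed articles. It assumes each training sample carries a single group label and assigns inverse-frequency weights so that every group contributes the same total weight. The function name `inverse_frequency_weights` and the toy group labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(groups):
    """Return one weight per sample so that every group has equal total weight.

    Assumption: `groups` is a list with one (hashable) group label per sample.
    """
    counts = Counter(groups)
    n_groups = len(counts)
    n_samples = len(groups)
    # weight = N / (K * count_g): all weights sum to N, each group sums to N / K
    return [n_samples / (n_groups * counts[g]) for g in groups]


# Toy example: group "a" is over-represented relative to group "b".
groups = ["a", "a", "a", "b"]
weights = inverse_frequency_weights(groups)
for g, w in zip(groups, weights):
    print(g, round(w, 3))
```

Weights of this kind would typically be passed to a loss function or sampler as per-sample weights; whether that reduces bias in a given clinical task is an empirical question.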

Subjects

Artificial Intelligence , Bias , Natural Language Processing , Humans , Surveys and Questionnaires , Machine Learning , Algorithms

18.

Leveraging generative AI for clinical evidence synthesis needs to ensure trustworthiness.

Zhang, Gongbo; Jin, Qiao; Jered McInerney, Denis; Chen, Yong; Wang, Fei; Cole, Curtis L; Yang, Qian; Wang, Yanshan; Malin, Bradley A; Peleg, Mor; Wallace, Byron C; Lu, Zhiyong; Weng, Chunhua; Peng, Yifan.

J Biomed Inform ; 153: 104640, 2024 May.

Article in English | MEDLINE | ID: mdl-38608915

ABSTRACT

Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.

Subjects

Artificial Intelligence , Evidence-Based Medicine , Humans , Trust , Natural Language Processing

19.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Bolton, Evan E; Brister, J Rodney; Canese, Kathi; Chan, Jessica; Comeau, Donald C; Connor, Ryan; Funk, Kathryn; Kelly, Chris; Kim, Sunghwan; Madej, Tom; Marchler-Bauer, Aron; Lanczycki, Christopher; Lathrop, Stacy; Lu, Zhiyong; Thibaud-Nissen, Francoise; Murphy, Terence; Phan, Lon; Skripchenko, Yuri; Tse, Tony; Wang, Jiyao; Williams, Rebecca; Trawick, Barton W; Pruitt, Kim D; Sherry, Stephen T.

Nucleic Acids Res ; 50(D1): D20-D26, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34850941

ABSTRACT

The National Center for Biotechnology Information (NCBI) produces a variety of online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, RefSeq, SRA, Virus, dbSNP, dbVar, ClinicalTrials.gov, MMDB, iCn3D and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
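
To make the role of the E-utilities as a programming interface concrete, here is a minimal sketch that only builds an ESearch request URL and does not send it. The helper name `build_esearch_url` and the example query are illustrative assumptions; the parameters shown (`db`, `term`, `retmax`, `retmode`) are standard ESearch parameters, but the official E-utilities documentation should be consulted for usage limits and API keys.

```python
from urllib.parse import urlencode

# Public base address of the NCBI E-utilities ESearch endpoint.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Build an ESearch URL for the given database and query term.

    Hypothetical helper for illustration; it performs no network access.
    """
    params = {
        "db": db,           # target database, e.g. "pubmed"
        "term": term,       # search query
        "retmax": retmax,   # maximum number of IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return ESEARCH_BASE + "?" + urlencode(params)


url = build_esearch_url("pubmed", "PubTator 3.0", retmax=5)
print(url)
```

The resulting URL could then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`.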

Subjects

Biotechnology/trends , Databases, Genetic/trends , Databases, Chemical , Databases, Nucleic Acid , Databases, Protein , Humans , Internet , National Library of Medicine (U.S.) , PubMed , United States

20.

Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study.

He, Zhe; Bhasuran, Balu; Jin, Qiao; Tian, Shubo; Hanna, Karim; Shavor, Cindy; Arguello, Lisbeth Garcia; Murray, Patrick; Lu, Zhiyong.

J Med Internet Res ; 26: e56655, 2024 Apr 17.

Article in English | MEDLINE | ID: mdl-38630520

ABSTRACT

BACKGROUND: Although patients have easy access to their electronic health records and laboratory test result data through patient portals, laboratory test results are often confusing and hard to understand. Many patients turn to web-based forums or question-and-answer (Q&A) sites to seek advice from their peers. The quality of answers from social Q&A sites on health-related questions varies significantly, and not all responses are accurate or reliable. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to have their questions answered. OBJECTIVE: We aimed to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to laboratory test-related questions asked by patients and identify potential issues that can be mitigated using augmentation approaches. METHODS: We collected laboratory test result-related Q&A data from Yahoo! Answers and selected 53 Q&A pairs for this study. Using the LangChain framework and ChatGPT web portal, we generated responses to the 53 questions from 5 LLMs: GPT-4, GPT-3.5, LLaMA 2, MedAlpaca, and ORCA_mini. We assessed the similarity of their answers using standard Q&A similarity-based evaluation metrics, including Recall-Oriented Understudy for Gisting Evaluation, Bilingual Evaluation Understudy, Metric for Evaluation of Translation With Explicit Ordering, and Bidirectional Encoder Representations from Transformers Score. We used an LLM-based evaluator to judge whether a target model had higher quality in terms of relevance, correctness, helpfulness, and safety than the baseline model. We performed a manual evaluation with medical experts for all the responses to 7 selected questions on the same 4 aspects. RESULTS: Regarding the similarity of the responses from the other 4 LLMs, with the GPT-4 output used as the reference answer, the responses from GPT-3.5 were the most similar, followed by those from LLaMA 2, ORCA_mini, and MedAlpaca. Human answers from Yahoo data were scored the lowest and were thus the least similar to GPT-4-generated answers. The results of the win rate and medical expert evaluation both showed that GPT-4's responses achieved better scores than all the other LLM responses and human responses on all 4 aspects (relevance, correctness, helpfulness, and safety). LLM responses occasionally also suffered from lack of interpretation in one's medical context, incorrect statements, and lack of references. CONCLUSIONS: By evaluating LLMs in generating responses to patients' laboratory test result-related questions, we found that, compared to the other 4 LLMs and human answers from a Q&A website, GPT-4's responses were more accurate, helpful, relevant, and safer. There were cases in which GPT-4 responses were inaccurate and not individualized. We identified a number of ways to improve the quality of LLM responses, including prompt engineering, prompt augmentation, retrieval-augmented generation, and response evaluation.
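
The similarity metrics listed in the abstract above compare a candidate answer with a reference answer. As a rough sketch of the idea behind one of them (unigram overlap, as in ROUGE-1), and not the evaluation code used in the study, the following assumes lowercased whitespace tokenization, which is a simplification of what standard ROUGE implementations do. The function name `rouge1` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Return (precision, recall, f1) based on clipped unigram overlap.

    Simplified sketch: tokens are lowercased and split on whitespace only.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Here the reference plays the role that the GPT-4 answer plays in the study.
p, r, f = rouge1("the test is normal", "the test result is normal")
print(round(p, 3), round(r, 3), round(f, 3))
```

Overlap scores of this kind measure lexical similarity to the reference only; they say nothing about correctness or safety, which is why the study also reports win-rate and expert evaluations.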

Subjects

Artificial Intelligence , Electronic Health Records , Humans , Language
