Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 6 de 6
Filtrar
Mais filtros

Base de dados
Ano de publicação
Tipo de documento
Intervalo de ano de publicação
1.
Bioinformatics ; 40(3)2024 Mar 04.
Artigo em Inglês | MEDLINE | ID: mdl-38383067

RESUMO

MOTIVATION: Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas. RESULTS: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. AVAILABILITY AND IMPLEMENTATION: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.


Assuntos
Bases de Conhecimento , Semântica , Bases de Dados Factuais
2.
BMC Med Inform Decis Mak ; 24(1): 30, 2024 Jan 31.
Artigo em Inglês | MEDLINE | ID: mdl-38297371

RESUMO

OBJECTIVE: Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (supported usually by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift in the use of large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. MATERIALS AND METHODS: The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other clinical observations. RESULTS: The best run, using in-context learning, achieved 0.58 document-level F1 score on publication abstracts and 0.75 document-level F1 score on clinical observations, as well as a mention-level F1 score of 0.7, which surpasses the current best in class tool. Without in-context learning, however, performance is significantly below the existing approaches. CONCLUSION: Our experiments show that gpt-4.0 surpasses the state of the art performance if the task is constrained to a subset of the target ontology where there is prior knowledge of the terms that are expected to be matched. While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.


Assuntos
Conhecimento , Idioma , Humanos , Aprendizado de Máquina , Fenótipo , Doenças Raras
3.
bioRxiv ; 2024 Jul 04.
Artigo em Inglês | MEDLINE | ID: mdl-39005436

RESUMO

Objectives: Concept embeddings are low-dimensional vector representations of concepts such as MeSH:D009203 (Myocardial Infarction), whose similarity in the embedded vector space reflects their semantic similarity. Here, we test the hypothesis that non-biomedical concept synonym replacement can improve the quality of biomedical concepts embeddings. Materials and methods: We developed an approach that leverages WordNet to replace sets of synonyms with the most common representative of the synonym set. Results: We tested our approach on 1055 concept sets and found that, on average, the mean intra-cluster distance was reduced by 8% in the vector-space. Assuming that homophily of related concepts in the vector space is desirable, our approach tends to improve the quality of embeddings. Discussion and Conclusion: This pilot study shows that non-biomedical synonym replacement tends to improve the quality of embeddings of biomedical concepts using the Word2Vec algorithm. We have implemented our approach in a freely available Python package available at https://github.com/TheJacksonLaboratory/wn2vec.

4.
medRxiv ; 2024 Jul 22.
Artigo em Inglês | MEDLINE | ID: mdl-39108510

RESUMO

Large language models (LLM) have shown great promise in supporting differential diagnosis, but 23 available published studies on the diagnostic accuracy evaluated small cohorts (number of cases, 30-422, mean 104) and have evaluated LLM responses subjectively by manual curation (23/23 studies). The performance of LLMs for rare disease diagnosis has not been evaluated systematically. Here, we perform a rigorous and large-scale analysis of the performance of a GPT-4 in prioritizing candidate diagnoses, using the largest-ever cohort of rare disease patients. Our computational study used 5267 computational case reports from previously published data. Each case was formatted as a Global Alliance for Genomics and Health (GA4GH) phenopacket, in which clinical anomalies were represented as Human Phenotype Ontology (HPO) terms. We developed software to generate prompts from each phenopacket. Prompts were sent to Generative Pre-trained Transformer 4 (GPT-4), and the rank of the correct diagnosis, if present in the response, was recorded. The mean reciprocal rank of the correct diagnosis was 0.24 (with the reciprocal of the MRR corresponding to a rank of 4.2), and the correct diagnosis was placed in rank 1 in 19.2% of the cases, in the first 3 ranks in 28.6%, and in the first 10 ranks in 32.5%. Our study is the largest to be reported to date and provides a realistic estimate of the performance of GPT-4 in rare disease medicine.

5.
Transl Psychiatry ; 14(1): 246, 2024 Jun 08.
Artigo em Inglês | MEDLINE | ID: mdl-38851761

RESUMO

Acute COVID-19 infection can be followed by diverse clinical manifestations referred to as Post Acute Sequelae of SARS-CoV2 Infection (PASC). Studies have shown an increased risk of being diagnosed with new-onset psychiatric disease following a diagnosis of acute COVID-19. However, it was unclear whether non-psychiatric PASC-associated manifestations (PASC-AMs) are associated with an increased risk of new-onset psychiatric disease following COVID-19. A retrospective electronic health record (EHR) cohort study of 2,391,006 individuals with acute COVID-19 was performed to evaluate whether non-psychiatric PASC-AMs are associated with new-onset psychiatric disease. Data were obtained from the National COVID Cohort Collaborative (N3C), which has EHR data from 76 clinical organizations. EHR codes were mapped to 151 non-psychiatric PASC-AMs recorded 28-120 days following SARS-CoV-2 diagnosis and before diagnosis of new-onset psychiatric disease. Association of newly diagnosed psychiatric disease with age, sex, race, pre-existing comorbidities, and PASC-AMs in seven categories was assessed by logistic regression. There were significant associations between a diagnosis of any psychiatric disease and five categories of PASC-AMs with odds ratios highest for neurological, cardiovascular, and constitutional PASC-AMs with odds ratios of 1.31, 1.29, and 1.23 respectively. Secondary analysis revealed that the proportions of 50 individual clinical features significantly differed between patients diagnosed with different psychiatric diseases. Our study provides evidence for association between non-psychiatric PASC-AMs and the incidence of newly diagnosed psychiatric disease. Significant associations were found for features related to multiple organ systems. This information could prove useful in understanding risk stratification for new-onset psychiatric disease following COVID-19. Prospective studies are needed to corroborate these findings.


Assuntos
COVID-19 , Transtornos Mentais , SARS-CoV-2 , Humanos , COVID-19/psicologia , COVID-19/complicações , COVID-19/epidemiologia , Masculino , Feminino , Transtornos Mentais/epidemiologia , Pessoa de Meia-Idade , Adulto , Estudos Retrospectivos , Idoso , Fenótipo , Síndrome de COVID-19 Pós-Aguda , Comorbidade , Registros Eletrônicos de Saúde , Adulto Jovem , Fatores de Risco , Adolescente
6.
medRxiv ; 2024 May 29.
Artigo em Inglês | MEDLINE | ID: mdl-38854034

RESUMO

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA