Search | VHL Regional Portal (Portal Regional da BVS)

Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning.

Caufield, J Harry; Hegde, Harshad; Emonet, Vincent; Harris, Nomi L; Joachimiak, Marcin P; Matentzoglu, Nicolas; Kim, HyeongSik; Moxon, Sierra; Reese, Justin T; Haendel, Melissa A; Robinson, Peter N; Mungall, Christopher J.

Bioinformatics ; 40(3), 2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38383067

ABSTRACT

MOTIVATION: Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and cannot populate arbitrarily complex nested knowledge schemas. RESULTS: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical-to-disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language-interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly available databases and ontologies external to the LLM. AVAILABILITY AND IMPLEMENTATION: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
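
The recursive, schema-driven extraction described in this abstract can be illustrated with a minimal sketch. It is not the OntoGPT implementation or its API: the `extract` function, the prompt wording, the `fake_llm` stand-in for a real model, and the `EX:` identifiers are all hypothetical and exist only to show the idea of prompting per schema level, recursing into nested fields, and grounding leaf values against a vocabulary.

```python
from typing import Any, Callable, Dict

Schema = Dict[str, Any]  # field name -> "text" (leaf) or a nested Schema


def extract(schema: Schema, text: str, ask: Callable[[str], str],
            vocab: Dict[str, str]) -> Dict[str, Any]:
    """Fill `schema` from `text` by prompting `ask`, recursing into nested fields."""
    # Build one prompt that lists every field at this schema level.
    prompt = ("Fill in these fields from the text.\n"
              + "".join(f"{field}:\n" for field in schema)
              + f"Text: {text}")
    # Parse "field: value" lines from the (assumed) response format.
    answers: Dict[str, str] = {}
    for line in ask(prompt).splitlines():
        field, sep, value = line.partition(":")
        if sep and field.strip() in schema:
            answers[field.strip()] = value.strip()
    result: Dict[str, Any] = {}
    for field, spec in schema.items():
        value = answers.get(field, "")
        if isinstance(spec, dict):
            # Nested field: interrogate again, using the answer as the new input text.
            result[field] = extract(spec, value, ask, vocab)
        else:
            # Leaf field: ground to an identifier if the vocabulary knows the label.
            result[field] = vocab.get(value.lower(), value)
    return result


def fake_llm(prompt: str) -> str:
    """Hypothetical canned responses standing in for a real LLM call."""
    if "treatment:" in prompt:
        return "disease: asthma\ntreatment: salbutamol given by inhaler"
    return "drug: salbutamol\nroute: inhaler"


schema = {"disease": "text", "treatment": {"drug": "text", "route": "text"}}
vocab = {"asthma": "EX:0001", "salbutamol": "EX:0002"}  # made-up identifiers
record = extract(schema, "Asthma is treated with salbutamol given by inhaler.",
                 fake_llm, vocab)
print(record)
```

In this sketch the grounding step is a plain dictionary lookup; ungrounded values such as `inhaler` are kept as text rather than assigned an identifier.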

Subjects

Knowledge Bases ; Semantics ; Databases, Factual

Gene Set Summarization using Large Language Models.

Joachimiak, Marcin P; Caufield, J Harry; Harris, Nomi L; Kim, Hyeongsik; Mungall, Christopher J.

ArXiv ; 2023 May 25.

Article in English | MEDLINE | ID: mdl-37292480

ABSTRACT

Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling the use of Large Language Models (LLMs), potentially utilizing scientific texts directly and avoiding reliance on a KB. We developed SPINDOCTOR (Structured Prompt Interpolation of Natural Language Descriptions of Controlled Terms for Ontology Reporting), a method that uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct model retrieval. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for gene sets. However, GPT-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, these methods were rarely able to recapitulate the most precise and informative term from standard enrichment, likely due to an inability to generalize and reason using an ontology. Results are highly nondeterministic, with minor variations in prompt resulting in radically different term lists. Our results show that at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis and that manual curation of ontological assertions remains necessary.
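
For context, the over-representation side of the standard enrichment analysis mentioned in this abstract is commonly computed with a one-sided hypergeometric test. The sketch below assumes that test (the abstract does not name a specific statistic); the function name and the example counts are illustrative only.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total


# Illustrative counts: 20000 background genes, 200 annotated to a term,
# a gene list of 50 in which 5 carry the term.
p_value = hypergeom_enrichment_p(20000, 200, 50, 5)
print(f"p = {p_value:.3g}")
```

A small p-value here indicates that the term appears in the gene list more often than expected by chance; in practice such values are also corrected for testing many terms.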

Ontology Development Kit: a toolkit for building, maintaining and standardizing biomedical ontologies.

Matentzoglu, Nicolas; Goutte-Gattat, Damien; Tan, Shawn Zheng Kai; Balhoff, James P; Carbon, Seth; Caron, Anita R; Duncan, William D; Flack, Joe E; Haendel, Melissa; Harris, Nomi L; Hogan, William R; Hoyt, Charles Tapley; Jackson, Rebecca C; Kim, HyeongSik; Kir, Huseyin; Larralde, Martin; McMurry, Julie A; Overton, James A; Peters, Bjoern; Pilgrim, Clare; Stefancsik, Ray; Robb, Sofia Mc; Toro, Sabrina; Vasilevsky, Nicole A; Walls, Ramona; Mungall, Christopher J; Osumi-Sutherland, David.

Database (Oxford) ; 2022, 2022 Oct 08.

Article in English | MEDLINE | ID: mdl-36208225

ABSTRACT

Similar to managing software packages, managing the ontology life cycle involves multiple complex workflows such as preparing releases, continuous quality control checking and dependency management. To manage these processes, a diverse set of tools is required, from command-line utilities to powerful ontology-engineering environments. Particularly in the biomedical domain, which has developed a set of highly diverse yet inter-dependent ontologies, standardizing release practices and metadata and establishing shared quality standards are crucial to enable interoperability. The Ontology Development Kit (ODK) provides a set of standardized, customizable and automatically executable workflows, and packages all required tooling in a single Docker image. In this paper, we provide an overview of how the ODK works, show how it is used in practice and describe how we envision it driving standardization efforts in our community. Database URL: https://github.com/INCATools/ontology-development-kit.

Subjects

Biological Ontologies ; Databases, Factual ; Metadata ; Quality Control ; Software ; Workflow

A Simple Standard for Sharing Ontological Mappings (SSSOM).

Matentzoglu, Nicolas; Balhoff, James P; Bello, Susan M; Bizon, Chris; Brush, Matthew; Callahan, Tiffany J; Chute, Christopher G; Duncan, William D; Evelo, Chris T; Gabriel, Davera; Graybeal, John; Gray, Alasdair; Gyori, Benjamin M; Haendel, Melissa; Harmse, Henriette; Harris, Nomi L; Harrow, Ian; Hegde, Harshad B; Hoyt, Amelia L; Hoyt, Charles T; Jiao, Dazhi; Jiménez-Ruiz, Ernesto; Jupp, Simon; Kim, Hyeongsik; Koehler, Sebastian; Liener, Thomas; Long, Qinqin; Malone, James; McLaughlin, James A; McMurry, Julie A; Moxon, Sierra; Munoz-Torres, Monica C; Osumi-Sutherland, David; Overton, James A; Peters, Bjoern; Putman, Tim; Queralt-Rosinach, Núria; Shefchek, Kent; Solbrig, Harold; Thessen, Anne; Tudorache, Tania; Vasilevsky, Nicole; Wagner, Alex H; Mungall, Christopher J.

Database (Oxford) ; 2022, 2022 May 25.

Article in English | MEDLINE | ID: mdl-35616100

ABSTRACT

Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec.
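
The table-based format described in this abstract can be read with ordinary tabular tooling, without parsing an ontology. The sketch below assumes a small SSSOM-style TSV with `#`-prefixed metadata lines and the columns `subject_id`, `predicate_id`, `object_id` and `mapping_justification`; the prefixes `A:` and `B:` and all rows are made up for illustration, and the helper names are hypothetical.

```python
import csv
import io

# Illustrative SSSOM-style TSV; identifiers and prefixes are invented.
SSSOM_TSV = (
    "# curie_map:\n"
    "#   A: http://example.org/a/\n"
    "#   B: http://example.org/b/\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "A:001\tskos:exactMatch\tB:101\tsemapv:ManualMappingCuration\n"
    "A:002\tskos:broadMatch\tB:102\tsemapv:LexicalMatching\n"
    "A:003\tskos:relatedMatch\tB:103\tsemapv:LexicalMatching\n"
)


def read_mappings(text: str) -> list:
    """Parse the TSV body, skipping the commented metadata header."""
    rows = (line for line in io.StringIO(text) if not line.startswith("#"))
    return list(csv.DictReader(rows, delimiter="\t"))


def exact_matches(mappings: list) -> list:
    """Keep only pairs whose predicate states that the two terms are equivalent."""
    return [(m["subject_id"], m["object_id"])
            for m in mappings if m["predicate_id"] == "skos:exactMatch"]


mappings = read_mappings(SSSOM_TSV)
print(exact_matches(mappings))
```

Because each row records its predicate and justification, a consumer that needs high precision can filter to exact, manually curated mappings instead of treating every row as an equivalence.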

Subjects

Metadata ; Semantic Web ; Data Management ; Databases, Factual ; Workflow
