Results 1 - 20 of 2,172
1.
Phys Med ; 121: 103364, 2024 May.
Article in English | MEDLINE | ID: mdl-38701626

ABSTRACT

PURPOSE: To test whether a well-grounded KBP model trained on moderately hypo-fractionated prostate treatments can be used to satisfactorily drive the optimization of SBRT prostate treatments. MATERIALS AND METHODS: A KBP model (SBRT-model) was developed, trained and validated using the first forty-seven clinically treated VMAT SBRT prostate plans (42.7 Gy/7fx or 36.25 Gy/5fx). The performance and robustness of this model were compared against a high-quality KBP model (ST-model) that was already clinically adopted for hypo-fractionated (70 Gy/28fx and 60 Gy/20fx) prostate treatments. The two models were compared in terms of the robustness of their predictions, and the quality of their outcomes was evaluated against a set of reference clinical SBRT plans. Plan quality was assessed using DVH metrics, blinded clinical ranking, and a dedicated Plan Quality Metric algorithm. RESULTS: The plan libraries of the two models were found to share a high degree of anatomical similarity. The overall quality (APQM%) of the plans obtained with both the ST- and SBRT-models was compatible with that of the original clinical plans, namely (93.7 ± 4.1)% and (91.6 ± 3.9)% vs (92.8.9 ± 3.6)%. Plans obtained with the ST-model showed significantly higher target coverage (PTV V95%): (97.9 ± 0.8)% vs (97.1 ± 0.9)% (p < 0.05). Conversely, plans optimized following the SBRT-model showed a small but not clinically relevant increase in OAR sparing. The ST-model generally provided more reliable predictions than the SBRT-model. Two radiation oncologists judged the plans based on the KBP predictions to be equivalent, and also judged them better than the reference clinical plans. CONCLUSION: A KBP model trained on moderately fractionated prostate treatment plans provided optimal SBRT prostate plans, with similar or higher plan quality than an embryonic SBRT-model based on a limited number of cases.


Subjects
Prostatic Neoplasms , Radiosurgery , Radiotherapy Planning, Computer-Assisted , Humans , Radiotherapy Planning, Computer-Assisted/methods , Radiosurgery/methods , Male , Prostatic Neoplasms/radiotherapy , Knowledge Bases , Radiotherapy, Intensity-Modulated/methods , Radiotherapy Dosage
2.
Lung Cancer ; 191: 107787, 2024 May.
Article in English | MEDLINE | ID: mdl-38593479

ABSTRACT

AIMS: To date, precision medicine has revolutionized the clinical management of Non-Small Cell Lung Cancer (NSCLC). International societies approved a rapidly updated panel of mandatory testing biomarkers for the clinical stratification of NSCLC patients, but harmonized procedures are required to optimize the diagnostic workflow. In this context a knowledge-based database (Biomarkers ATLAS, https://biomarkersatlas.com/) was developed by a supervising group of expert pathologists and thoracic oncologists collecting updated clinical and molecular records from about 80 referral Italian institutions. Here, we audit molecular and clinical data from n = 1100 NSCLC patients collected from January 2019 to December 2020. METHODS: Clinical and molecular records from NSCLC patients were retrospectively collected from the two coordinating institutions (University of Turin and University of Naples). Molecular biomarkers (KRAS, EGFR, BRAF, ROS1, ALK, RET, NTRK, MET) and clinical data (sex, age, histological type, smoker status, PD-L1 expression, therapy) were collected and harmonized. RESULTS: Clinical and molecular data from 1100 (n = 552 mutated and n = 548 wild-type) NSCLC patients were systematized and annotated in the ATLAS knowledge-database. Molecular records from biomarker testing were matched with the patients' main clinical variables. CONCLUSIONS: Biomarkers ATLAS (https://biomarkersatlas.com/) represents a unique, easy-to-manage, and reliable diagnostic tool aiming to integrate clinical records with molecular alterations of NSCLC patients in the real-world Italian scenario.


Subjects
Biomarkers, Tumor , Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Carcinoma, Non-Small-Cell Lung/diagnosis , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Carcinoma, Non-Small-Cell Lung/metabolism , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Italy , Male , Female , Aged , Middle Aged , Retrospective Studies , Databases, Factual , Knowledge Bases , Adult , Aged, 80 and over
3.
Med Phys ; 51(5): 3207-3219, 2024 May.
Article in English | MEDLINE | ID: mdl-38598107

ABSTRACT

BACKGROUND: Current methods for Gamma Knife (GK) treatment planning utilize either manual forward planning, where planners manually place shots in a tumor to achieve a desired dose distribution, or inverse planning, whereby the dose delivered to a tumor is optimized for multiple objectives based on established metrics. For other treatment modalities like IMRT and VMAT, there has been a recent push to develop knowledge-based planning (KBP) pipelines to address the limitations presented by forward and inverse planning. However, no complete KBP pipeline has been created for GK. PURPOSE: To develop a novel KBP pipeline, using inverse optimization (IO) with 3D dose predictions for GK. METHODS: Data were obtained for 349 patients from Sunnybrook Health Sciences Centre. A 3D dose prediction model was trained using 322 patients, based on a previously published deep learning methodology, and dose predictions were generated for the remaining 27 out-of-sample patients. A generalized IO model was developed to learn objective function weights from dose predictions. These weights were then used in an inverse planning model to generate deliverable treatment plans. A dose mimicking (DM) model was also implemented for comparison. The quality of the resulting plans was compared to their clinical counterparts using standard GK quality metrics. The performance of the models was also characterized with respect to the dose predictions. RESULTS: Across all quality metrics, plans generated using the IO pipeline performed at least as well as or better than the respective clinical plans. The average conformity and gradient indices of IO plans were 0.737 ± 0.158 and 3.356 ± 1.030 respectively, compared to 0.713 ± 0.124 and 3.452 ± 1.123 for the clinical plans. IO plans also performed better than DM plans for five of the six quality metrics. Plans generated using IO also have average treatment times comparable to those of clinical plans. With regard to the dose predictions, predictions with higher conformity tend to result in higher quality KBP plans. CONCLUSIONS: Plans resulting from an IO KBP pipeline are, on average, of equal or superior quality compared to those obtained through manual planning. The results demonstrate the potential for the use of KBP to generate GK treatment with minimal human intervention.


Subjects
Radiosurgery , Radiotherapy Dosage , Radiotherapy Planning, Computer-Assisted , Radiotherapy Planning, Computer-Assisted/methods , Radiosurgery/methods , Humans , Knowledge Bases , Radiation Dosage
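
The abstract above reports conformity and gradient indices without stating their formulas. As a minimal sketch, assuming the commonly used Paddick definitions (an assumption; the abstract does not confirm which definitions the authors used), the two indices could be computed from plan volumes as follows. All function and parameter names are illustrative.

```python
def paddick_conformity_index(target_volume, prescription_isodose_volume, overlap_volume):
    """Paddick conformity index (assumed definition): overlap^2 / (TV * PIV).

    target_volume: volume of the target (TV)
    prescription_isodose_volume: volume enclosed by the prescription isodose (PIV)
    overlap_volume: target volume covered by the prescription isodose
    A value of 1.0 means perfect conformity; lower values are worse.
    """
    if target_volume <= 0 or prescription_isodose_volume <= 0:
        raise ValueError("volumes must be positive")
    if overlap_volume > min(target_volume, prescription_isodose_volume):
        raise ValueError("overlap cannot exceed either volume")
    return overlap_volume ** 2 / (target_volume * prescription_isodose_volume)


def gradient_index(half_prescription_isodose_volume, prescription_isodose_volume):
    """Gradient index (assumed definition): volume of the half-prescription
    isodose divided by the prescription isodose volume. Lower values indicate
    a steeper dose fall-off outside the target."""
    if prescription_isodose_volume <= 0:
        raise ValueError("prescription isodose volume must be positive")
    return half_prescription_isodose_volume / prescription_isodose_volume


# Made-up volumes in cm^3, for illustration only (not data from the study).
ci = paddick_conformity_index(target_volume=2.0,
                              prescription_isodose_volume=2.5,
                              overlap_volume=2.0)
gi = gradient_index(half_prescription_isodose_volume=7.5,
                    prescription_isodose_volume=2.5)
print(f"conformity index = {ci:.3f}, gradient index = {gi:.3f}")
```

Under these definitions a higher conformity index and a lower gradient index are both better, which is consistent with the direction of the IO-versus-clinical comparison reported in the abstract.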
4.
Sci Data ; 11(1): 363, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605048

ABSTRACT

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.


Subjects
Biological Science Disciplines , Knowledge Bases , Pattern Recognition, Automated , Algorithms , Translational Research, Biomedical
5.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38678388

ABSTRACT

Cyclic peptides offer a range of notable advantages, including potent antibacterial properties, high binding affinity and specificity to target molecules, and minimal toxicity, making them highly promising candidates for drug development. However, a comprehensive database that consolidates both synthetically derived and naturally occurring cyclic peptides is conspicuously absent. To address this void, we introduce CyclicPepedia (https://www.biosino.org/iMAC/cyclicpepedia/), a pioneering database that encompasses 8744 known cyclic peptides. This repository, structured as a composite knowledge network, offers a wealth of information encompassing various aspects of cyclic peptides, such as cyclic peptides' sources, categorizations, structural characteristics, pharmacokinetic profiles, physicochemical properties, patented drug applications, and a collection of crucial publications. Supported by a user-friendly knowledge retrieval system and calculation tools specifically designed for cyclic peptides, CyclicPepedia will be able to facilitate advancements in cyclic peptide drug development.


Subjects
Knowledge Bases , Peptides, Cyclic , Peptides, Cyclic/chemistry , Databases, Protein
6.
J Med Internet Res ; 26: e46777, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38635981

ABSTRACT

BACKGROUND: As global populations age and become susceptible to neurodegenerative illnesses, new therapies for Alzheimer disease (AD) are urgently needed. Existing data resources for drug discovery and repurposing fail to capture relationships central to the disease's etiology and response to drugs. OBJECTIVE: We designed the Alzheimer's Knowledge Base (AlzKB) to alleviate this need by providing a comprehensive knowledge representation of AD etiology and candidate therapeutics. METHODS: We designed the AlzKB as a large, heterogeneous graph knowledge base assembled using 22 diverse external data sources describing biological and pharmaceutical entities at different levels of organization (eg, chemicals, genes, anatomy, and diseases). AlzKB uses a Web Ontology Language 2 ontology to enforce semantic consistency and allow for ontological inference. We provide a public version of AlzKB and allow users to run and modify local versions of the knowledge base. RESULTS: AlzKB is freely available on the web and currently contains 118,902 entities with 1,309,527 relationships between those entities. To demonstrate its value, we used graph data science and machine learning to (1) propose new therapeutic targets based on similarities of AD to Parkinson disease and (2) repurpose existing drugs that may treat AD. For each use case, AlzKB recovers known therapeutic associations while proposing biologically plausible new ones. CONCLUSIONS: AlzKB is a new, publicly available knowledge resource that enables researchers to discover complex translational associations for AD drug discovery. Through 2 use cases, we show that it is a valuable tool for proposing novel therapeutic hypotheses based on public biomedical knowledge.


Subjects
Alzheimer Disease , Humans , Alzheimer Disease/drug therapy , Alzheimer Disease/genetics , Pattern Recognition, Automated , Knowledge Bases , Machine Learning , Knowledge
7.
J Am Med Inform Assoc ; 31(5): 1126-1134, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38481028

ABSTRACT

OBJECTIVE: Development of clinical phenotypes from electronic health records (EHRs) can be resource intensive. Several phenotype libraries have been created to facilitate reuse of definitions. However, these platforms vary in target audience and utility. We describe the development of the Centralized Interactive Phenomics Resource (CIPHER) knowledgebase, a comprehensive public-facing phenotype library, which aims to facilitate clinical and health services research. MATERIALS AND METHODS: The platform was designed to collect and catalog EHR-based computable phenotype algorithms from any healthcare system, scale metadata management, facilitate phenotype discovery, and allow for integration of tools and user workflows. Phenomics experts were engaged in the development and testing of the site. RESULTS: The knowledgebase stores phenotype metadata using the CIPHER standard, and definitions are accessible through complex searching. Phenotypes are contributed to the knowledgebase via webform, allowing metadata validation. Data visualization tools linking to the knowledgebase enhance user interaction with content and accelerate phenotype development. DISCUSSION: The CIPHER knowledgebase was developed in the largest healthcare system in the United States and piloted with external partners. The design of the CIPHER website supports a variety of front-end tools and features to facilitate phenotype development and reuse. Health data users are encouraged to contribute their algorithms to the knowledgebase for wider dissemination to the research community, and to use the platform as a springboard for phenotyping. CONCLUSION: CIPHER is a public resource for all health data users available at https://phenomics.va.ornl.gov/ which facilitates phenotype reuse, development, and dissemination of phenotyping knowledge.


Subjects
Electronic Health Records , Phenomics , Phenotype , Knowledge Bases , Algorithms
8.
J Chem Inf Model ; 64(6): 1868-1881, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38483449

ABSTRACT

The lengthy and expensive process of developing new drugs from scratch, coupled with a high failure rate, has prompted the emergence of drug repurposing/repositioning as a more efficient and cost-effective approach. This approach involves identifying new therapeutic applications for existing approved drugs, leveraging the extensive drug-related data already gathered. However, the diversity and heterogeneity of data, along with the limited availability of known drug-disease interactions, pose significant challenges to computational drug design. To address these challenges, this study introduces EKGDR, an end-to-end knowledge graph-based approach for computational drug repurposing. EKGDR utilizes the power of a drug knowledge graph, a comprehensive repository of drug-related information that encompasses known drug interactions and various categorization information, as well as structural molecular descriptors of drugs. EKGDR employs graph neural networks, a cutting-edge graph representation learning technique, to embed the drug knowledge graph (nodes and relations) in an end-to-end manner. By doing so, EKGDR can effectively learn the underlying causes (intents) behind drug-disease interactions and recursively aggregate and combine relational messages between nodes along different multihop neighborhood paths (relational paths). This process generates representations of disease and drug nodes, enabling EKGDR to predict the interaction probability for each drug-disease pair in an end-to-end manner. The obtained results demonstrate that EKGDR outperforms previous models in all three evaluation metrics: area under the receiver operating characteristic curve (AUROC = 0.9475), area under the precision-recall curve (AUPRC = 0.9490), and recall at the top-200 recommendations (Recall@200 = 0.8315). To further validate EKGDR's effectiveness, we evaluated the top-20 candidate drugs suggested for each of Alzheimer's and Parkinson's diseases.


Subjects
Drug Repositioning , Pattern Recognition, Automated , Drug Repositioning/methods , Neural Networks, Computer , Knowledge Bases , Drug Interactions
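
The abstract above reports a Recall@200 score among its ranking metrics. As a minimal sketch of how a recall-at-k value can be computed for a single disease, assuming a dictionary of predicted interaction scores and a set of known positive drugs (the names, scores, and the per-disease formulation are illustrative assumptions, not the authors' evaluation code):

```python
def recall_at_k(scores, positives, k):
    """Fraction of known positive items found among the k highest-scored items.

    scores: dict mapping candidate id -> predicted interaction score
    positives: set of candidate ids known to be true interactions
    k: number of top-ranked candidates to inspect
    """
    if not positives:
        raise ValueError("positives must be non-empty")
    # Rank candidates from highest to lowest predicted score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & positives) / len(positives)


# Hypothetical scores for four candidate drugs against one disease.
scores = {"drug_a": 0.91, "drug_b": 0.15, "drug_c": 0.78, "drug_d": 0.40}
known = {"drug_a", "drug_d"}

print(recall_at_k(scores, known, k=2))  # 0.5: only drug_a is in the top 2
```

In a full evaluation this value would typically be averaged over all diseases in the test set; the abstract does not say how the authors aggregate it.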
9.
PLoS One ; 19(3): e0296864, 2024.
Article in English | MEDLINE | ID: mdl-38536833

ABSTRACT

The modeling of uncertain information is an open problem in ontology research and is a theoretical obstacle to creating a truly semantic web. Currently, ontologies often do not model uncertainty, so stochastic subject matter must either be normalized or rejected entirely. Because uncertainty is omnipresent in the real world, knowledge engineers are often faced with the dilemma of performing prohibitively labor-intensive research or running the risk of rejecting correct information and accepting incorrect information. It would be preferable if ontologies could explicitly model real-world uncertainty and incorporate it into reasoning. We present an ontology framework which is based on a seamless synthesis of description logic and probabilistic semantics. This synthesis is powered by a link between ontology assertions and random variables that allows for automated construction of a probability distribution suitable for inferencing. Furthermore, our approach defines how to represent stochastic, uncertain, or incomplete subject matter. Additionally, this paper describes how to fuse multiple conflicting ontologies into a single knowledge base that can be reasoned with using the methods of both description logic and probabilistic inferencing. This is accomplished by using probabilistic semantics to resolve conflicts between assertions, eliminating the need to delete potentially valid knowledge and perform consistency checks. In our framework, emergent inferences can be made from a fused ontology that were not present in any of the individual ontologies, producing novel insights in a given domain.


Subjects
Biological Ontologies , Semantics , Uncertainty , Bayes Theorem , Knowledge Bases , Logic
10.
J Biomed Semantics ; 15(1): 1, 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38438913

ABSTRACT

The increasing number of articles on adverse interactions that may occur when specific foods are consumed with certain drugs makes it difficult to keep up with the latest findings. Conflicting information is available in the scientific literature and specialized knowledge bases because interactions are described in an unstructured or semi-structured format. The FIDEO ontology aims to integrate and represent information about food-drug interactions in a structured way. This article reports on the new version of this ontology in which more than 1700 interactions are integrated from two online resources: DrugBank and Hedrine. These food-drug interactions have been represented in FIDEO in the form of precompiled concepts, each of which specifies both the food and the drug involved. Additionally, competency questions that can be answered are reviewed, and avenues for further enrichment are discussed.


Subjects
Food-Drug Interactions , Knowledge Bases
11.
Artif Intell Med ; 149: 102812, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38462270

ABSTRACT

Mental and physical disorders (MPD) are inextricably linked in many medical cases; psychosomatic diseases can be induced by mental concerns and psychological discomfort can ensue from physiological diseases. However, existing medical informatics studies focus on identifying mental or physical disorders from a unilateral perspective. Consequently, no existing domain knowledge base, corpus, or detection modeling approach considers mental as well as physical aspects concurrently. This paper proposes a joint modeling approach to detect MPD. First, we crawl through online medical consultation records of patients from websites and build an MPD knowledge ontology by extracting the core conceptual features of the text. Based on the ontology, an MPD knowledge graph containing 12,673 nodes and 82,195 relations is obtained using term matching with a domain thesaurus of each concept. Subsequently, an MPD corpus with fine-grained severities (None, Mild, Moderate, Severe, Dangerous) and 8909 records is constructed by formulating MPD classification criteria and a data annotation process under the guidance of domain experts. Taking the knowledge graph and corpus as the dataset, we design a multi-task learning model to detect the MPD severity, in which a knowledge graph attention network (KGAT) is embedded to better extract knowledge features. Experiments are performed to demonstrate the effectiveness of our model. Furthermore, we employ ontology-based and centrality-based methods to discover additional potential inferred knowledge, which can be captured by KGAT so as to improve the prediction performance and interpretability of our model. Our dataset has been made publicly available, so it can be further used as a medical informatics reference in the fields of psychosomatic medicine, psychiatrics, physical co-morbidity, and so on.


Subjects
Mental Disorders , Psychiatry , Humans , Pattern Recognition, Automated , Learning , Mental Disorders/diagnosis , Knowledge Bases
12.
PLoS One ; 19(3): e0297044, 2024.
Article in English | MEDLINE | ID: mdl-38478525

ABSTRACT

This study examines the relationship between CEO career variety, digital knowledge base extension, and digital transformation in a digital M&A context. An empirical test was conducted using regression analysis with the digital M&A events of the new generation of information technology firms in China as the research sample. The results reveal that CEO career variety has a positive effect on digital transformation in the digital M&A context and that digital knowledge-base extension plays a mediating role. Moreover, the heterogeneity impact analysis indicated that the moderating effects of geographical distance, knowledge disparity, and cultural difference between target and acquirer firms on the above relationships vary greatly: geographical distance has a negative moderating effect, cultural difference has a positive moderating effect, and the moderating effects of both geographical distance and cultural difference are realized through mediating effects, but none of the moderating effects of knowledge disparity are significant.


Subjects
Cultural Evolution , Information Technology , Information Science , China , Knowledge Bases
13.
Genetics ; 227(1), 2024 May 07.
Article in English | MEDLINE | ID: mdl-38531069

ABSTRACT

Mouse Genome Informatics (MGI) is a federation of expertly curated information resources designed to support experimental and computational investigations into genetic and genomic aspects of human biology and disease using the laboratory mouse as a model system. The Mouse Genome Database (MGD) and the Gene Expression Database (GXD) are core MGI databases that share data and system architecture. MGI serves as the central community resource of integrated information about mouse genome features, variation, expression, gene function, phenotype, and human disease models acquired from peer-reviewed publications, author submissions, and major bioinformatics resources. To facilitate integration and standardization of data, biocuration scientists annotate using terms from controlled metadata vocabularies and biological ontologies (e.g. Mammalian Phenotype Ontology, Mouse Developmental Anatomy, Disease Ontology, Gene Ontology, etc.), and by applying international community standards for gene, allele, and mouse strain nomenclature. MGI serves basic scientists, translational researchers, and data scientists by providing access to FAIR-compliant data in both human-readable and compute-ready formats. The MGI resource is accessible at https://informatics.jax.org. Here, we present an overview of the core data types represented in MGI and highlight recent enhancements to the resource with a focus on new data and functionality for MGD and GXD.


Subjects
Databases, Genetic , Genome , Animals , Mice , Knowledge Bases , Genomics/methods , Computational Biology/methods , Humans
14.
Elife ; 12, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38345923

ABSTRACT

Hippocampome.org is a mature open-access knowledge base of the rodent hippocampal formation focusing on neuron types and their properties. Previously, Hippocampome.org v1.0 established a foundational classification system identifying 122 hippocampal neuron types based on their axonal and dendritic morphologies, main neurotransmitter, membrane biophysics, and molecular expression (Wheeler et al., 2015). Releases v1.1 through v1.12 furthered the aggregation of literature-mined data, including among others neuron counts, spiking patterns, synaptic physiology, in vivo firing phases, and connection probabilities. Those additional properties increased the online information content of this public resource over 100-fold, enabling numerous independent discoveries by the scientific community. Hippocampome.org v2.0, introduced here, besides incorporating over 50 new neuron types, now recenters its focus on extending the functionality to build real-scale, biologically detailed, data-driven computational simulations. In all cases, the freely downloadable model parameters are directly linked to the specific peer-reviewed empirical evidence from which they were derived. Possible research applications include quantitative, multiscale analyses of circuit connectivity and spiking neural network simulations of activity dynamics. These advances can help generate precise, experimentally testable hypotheses and shed light on the neural mechanisms underlying associative memory and spatial navigation.


Subjects
Hippocampus , Rodentia , Animals , Hippocampus/physiology , Neurons/physiology , Neural Networks, Computer , Knowledge Bases
15.
Artif Intell Med ; 148: 102748, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38325935

ABSTRACT

Medical automatic diagnosis aims to organize real-world diagnostic processes similar to those from human doctors and to achieve accurate diagnoses by interacting with patients. The task is formulated as a sequential decision-making problem with a series of information inquiry steps (asking about symptoms and ordering examinations) and the final diagnosis. Recent research has studied incorporating reinforcement learning for information inquiry and classification techniques for disease diagnosis, respectively. However, studies on efficiently and effectively combining the two procedures are still lacking. To address this issue, we devised an adaptive mechanism to align reinforcement learning and classification methods using distribution entropy as the medium. Additionally, we created a new dataset for patient simulation to address the lack of large-scale evaluation benchmarks. The dataset is extracted from the MedlinePlus knowledge base and contains significantly more diseases and more comprehensive symptom and examination information than existing datasets. Experimental evaluation shows that our method outperforms three current state-of-the-art methods on different datasets by achieving higher medical diagnostic accuracy with fewer inquiring turns.


Subjects
Learning , Physicians , Humans , Reinforcement, Psychology , Entropy , Knowledge Bases
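
The abstract above describes using distribution entropy as the medium that links symptom inquiry and final classification, without giving details. One simple way to picture the idea, offered here only as an illustrative sketch and not as the authors' mechanism, is to compute the Shannon entropy of the current disease probability distribution and keep asking questions while it stays above a threshold. The threshold value and all names below are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def next_action(disease_probs, threshold_bits=1.0):
    """Return 'diagnose' when the classifier is confident enough, else 'inquire'.

    disease_probs: dict mapping disease name -> predicted probability
    threshold_bits: hypothetical entropy cut-off; lower means stricter
    """
    entropy = shannon_entropy(disease_probs.values())
    return "diagnose" if entropy <= threshold_bits else "inquire"


# Early in a dialogue the distribution is flat, so more questions are needed.
early = {"flu": 0.25, "cold": 0.25, "covid": 0.25, "allergy": 0.25}
# After several answers the distribution is peaked, so a diagnosis can be made.
late = {"flu": 0.90, "cold": 0.05, "covid": 0.05, "allergy": 0.0}

print(next_action(early))  # inquire (entropy is 2 bits)
print(next_action(late))   # diagnose (entropy is about 0.57 bits)
```

A fixed threshold is the simplest possible choice; the abstract describes the mechanism as adaptive, which suggests the actual criterion varies during the dialogue.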
16.
Comput Methods Programs Biomed ; 246: 108051, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38301394

ABSTRACT

BACKGROUND AND OBJECTIVE: Symptom descriptions by ordinary people are often inaccurate or vague when seeking medical advice, which often leads to inaccurate preliminary clinical diagnoses. To address this issue, we propose a deep learning model named the knowledgeable diagnostic transformer (KDT) for natural language processing (NLP)-based preliminary clinical diagnoses. METHODS: The KDT extracts symptom-disease relation triples (h,r,t) from patient symptom descriptions by using a proposed bipartite medical knowledge graph (bMKG). To prevent an excess of relation triples from introducing knowledge noise, we propose a knowledge inclusion-exclusion approach (KIA) to eliminate undesirable triples (a knowledge filtering layer). Next, we combine token embedding techniques with the transformer model to predict the diseases that patients may encounter. RESULTS: To train the KDT, a medical diagnosis question-answering dataset (named MDQA dataset) containing large-scale, high-quality question (patient syndrome description) and answer (diagnosis) corpora with 2.6M entries (1.07GB in size) in Mandarin was built. We also train the KDT with the National Institutes of Health (NIH) English dataset (MedQuAD). The KDT marks a transformative approach by achieving a remarkable accuracy of 99% for different evaluation metrics when compared with the baseline transformers used for NLP-based preliminary clinical diagnosis approaches. CONCLUSIONS: In essence, our study not only demonstrates the effectiveness of the KDT in enhancing diagnostic precision but also underscores its potential to revolutionize the field of preliminary clinical diagnoses. By harnessing the power of knowledge-based approaches and advanced NLP techniques, we have paved the way for more accurate and reliable diagnoses, ultimately benefiting both healthcare providers and patients. The KDT has the potential to significantly reduce misdiagnoses and improve patient outcomes, marking a pivotal advancement in the realm of medical diagnostics.


Subjects
Benchmarking , Natural Language Processing , Humans , Knowledge Bases , Language , Referral and Consultation , United States
17.
Comput Biol Med ; 170: 108105, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38330823

ABSTRACT

Infertility affects ∼15% of couples globally and half of cases are related to genetic disorders. Despite growing data and unprecedented improvements in high-throughput sequencing technologies, accumulated fertility-related issues concerning genetic diagnosis and potential treatment urgently need to be solved. However, there is a lack of comprehensive platforms that characterise various infertility-related records to provide research applications for exploring infertility in depth and for the genetic counselling of infertile couples. To solve this problem, we provide IDDB Xtra by further integrating phenotypic manifestations, genomic datasets, epigenetics, and modulators, together with numerous interactive tools, into our previous infertility database, IDDB. IDDB Xtra houses 2369 manually curated genes of human and nine model organisms, 273 chromosomal abnormalities, 884 phenotypes, 60 genomic datasets, 464 epigenetic records, and 1144 modulators relevant to infertility diagnosis and treatment. Additionally, IDDB Xtra incorporates customized graphical applications for researchers and clinicians to decipher in-depth disease mechanisms from the perspectives of developmental atlas, mutation effects, and clinical manifestations. Users can browse genes across developmental stages of human and mouse, filter candidate genes, mine potential variants and retrieve the infertility biomedical network in an intuitive web interface. In summary, IDDB Xtra not only captures valuable research and data, but also provides useful applications to facilitate the genetic counselling and drug discovery of infertility. IDDB Xtra is freely available at https://mdl.shsmu.edu.cn/IDDB/ and http://www.allostery.net/IDDB.


Subjects
Infertility , Humans , Mice , Animals , Databases, Factual , Mutation , Infertility/genetics , Phenotype , Knowledge Bases
18.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38383067

ABSTRACT

MOTIVATION: Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and are not able to populate arbitrarily complex nested knowledge schemas. RESULTS: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical-to-disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language-interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly available databases and ontologies external to the LLM. AVAILABILITY AND IMPLEMENTATION: SPIRES is available as part of the open-source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
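
The recursive, schema-driven extraction described in this abstract can be illustrated with a minimal Python sketch. It is not the OntoGPT implementation or its API: the schema encoding, the prompt wording, the stand-in `fake_llm` function, and the `EX:` identifiers in `TOY_VOCAB` are all assumptions made up for illustration. A real setup would call an actual LLM and ground labels against real ontologies.

```python
from typing import Callable, Dict

# Hypothetical schema encoding: a field maps to "str" (leaf value),
# a dict (nested object), or a one-element list (repeated nested object).
TREATMENT = {"drug": "str", "disease": "str"}
DOCUMENT = {"title": "str", "treatments": [TREATMENT]}

# Made-up identifiers standing in for a real ontology lookup.
TOY_VOCAB = {"aspirin": "EX:0001", "ibuprofen": "EX:0002",
             "headache": "EX:0101", "fever": "EX:0102"}


def ground(label: str) -> str:
    """Map a label to an identifier; keep the raw label if unknown."""
    return TOY_VOCAB.get(label.lower(), label)


def build_prompt(schema: dict, text: str) -> str:
    """Ask for one 'field: value' line per field of the schema."""
    fields = "\n".join(f"{name}: <value>" for name in schema)
    return ("Extract these fields, one per line, separating multiple "
            f"values with ';'.\n{fields}\n\nText:\n{text}")


def parse_response(response: str) -> Dict[str, str]:
    """Turn 'field: value' lines into a dict of raw strings."""
    parsed = {}
    for line in response.splitlines():
        if ":" in line:
            name, value = line.split(":", 1)
            parsed[name.strip()] = value.strip()
    return parsed


def extract(schema: dict, text: str, llm: Callable[[str], str]) -> dict:
    """Query the LLM for this schema level, then recurse into nested fields."""
    raw = parse_response(llm(build_prompt(schema, text)))
    result = {}
    for name, spec in schema.items():
        value = raw.get(name, "")
        if isinstance(spec, dict):      # nested object: interrogate again
            result[name] = extract(spec, value, llm)
        elif isinstance(spec, list):    # repeated nested objects
            parts = [p.strip() for p in value.split(";") if p.strip()]
            result[name] = [extract(spec[0], p, llm) for p in parts]
        else:                           # leaf: ground to an identifier
            result[name] = ground(value)
    return result


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for an LLM so the sketch runs offline."""
    text = prompt.split("Text:\n", 1)[1]
    if "drug: <value>" in prompt:
        drug, disease = text.split(" treats ", 1)
        return f"drug: {drug}\ndisease: {disease}"
    return ("title: Pain relief notes\n"
            "treatments: aspirin treats headache; ibuprofen treats fever")


result = extract(DOCUMENT, "Aspirin treats headache; ibuprofen treats fever.",
                 fake_llm)
```

Each nested field triggers a further prompt on the text returned for that field, which is the recursive step. Grounding is applied only to leaf values, and unknown labels are passed through unchanged.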


Subjects
Knowledge Bases , Semantics , Databases, Factual
19.
BMC Bioinformatics ; 25(1): 62, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38326757

ABSTRACT

BACKGROUND: Recent developments in the domain of biomedical knowledge bases (KBs) open up new ways to exploit biomedical knowledge that is available in the form of KBs. Significant work has been done in the direction of biomedical KB creation and KB completion, specifically for KBs containing gene-disease associations and other related entities. However, the use of such biomedical KBs in combination with patients' temporal clinical data remains largely unexplored, although it has the potential to immensely benefit medical diagnostic decision support systems. RESULTS: We propose two new algorithms, LOADDx and SCADDx, to combine a patient's gene expression data with gene-disease association and other related information available in the form of a KB, to assist personalized disease diagnosis. We have tested both algorithms on two KBs and on four real-world gene expression datasets of respiratory viral infection caused by Influenza-like viruses of 19 subtypes. We also compare the performance of the proposed algorithms with that of five existing state-of-the-art machine learning algorithms (k-NN, Random Forest, XGBoost, Linear SVM, and SVM with RBF Kernel) using two validation approaches: LOOCV and a single internal validation set. Both SCADDx and LOADDx outperform the existing algorithms when evaluated with both validation approaches. SCADDx is able to detect infections with up to 100% accuracy in the cases of Datasets 2 and 3. Overall, SCADDx and LOADDx are able to detect an infection within 72 h of infection with 91.38% and 92.66% average accuracy, respectively, considering all four datasets, whereas XGBoost, which performed best among the existing machine learning algorithms, can detect the infection with only 86.43% accuracy on average.
CONCLUSIONS: We demonstrate how our novel idea of using the most and least differentially expressed genes in combination with a KB can enable identification of the diseases that a patient is most likely to have at a particular time, from a KB with thousands of diseases. Moreover, the proposed algorithms can provide a short ranked list of the most likely diseases for each patient along with their most affected genes, and other entities linked with them in the KB, which can support health care professionals in their decision-making.
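
The general idea of ranking KB diseases using a patient's most and least differentially expressed genes can be illustrated with a toy Python sketch. This is not LOADDx or SCADDx, since the abstract does not state how those algorithms compute their scores. The scoring rule (reward overlap with the most-changed genes, penalize overlap with the least-changed genes), the gene and disease names, and the log fold-change values are all assumptions made up for illustration.

```python
from typing import Dict, List, Set, Tuple


def rank_diseases(expression_change: Dict[str, float],
                  kb: Dict[str, Set[str]],
                  k: int = 2) -> List[Tuple[str, int]]:
    """Rank diseases by overlap with the k most and k least changed genes.

    expression_change: gene -> change versus a baseline (assumed to be
        a log fold change); only its magnitude is used here.
    kb: disease -> set of associated genes (toy gene-disease KB).
    """
    # Order genes from largest to smallest absolute change.
    by_change = sorted(expression_change,
                       key=lambda gene: abs(expression_change[gene]),
                       reverse=True)
    most, least = set(by_change[:k]), set(by_change[-k:])

    scores = []
    for disease, genes in kb.items():
        # Assumed rule: +1 per strongly changed gene linked to the disease,
        # -1 per essentially unchanged gene linked to the disease.
        scores.append((disease, len(genes & most) - len(genes & least)))

    # Highest score first; ties broken alphabetically for determinism.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores


# Hypothetical patient profile and KB.
patient = {"G1": 3.0, "G2": -2.5, "G3": 0.1, "G4": 0.05, "G5": 1.0}
toy_kb = {
    "disease_A": {"G1", "G2"},
    "disease_B": {"G1", "G3"},
    "disease_C": {"G3", "G4"},
}
ranking = rank_diseases(patient, toy_kb, k=2)
```

With this toy data, `disease_A` ranks first because both of its associated genes are among the most changed. The ranked list, together with the overlapping genes, is the kind of short output the abstract describes as decision support.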


Subjects
Knowledge Bases , Transcriptome , Humans , Algorithms , Machine Learning
20.
Nucleic Acids Res ; 52(D1): D1210-D1217, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38183204

ABSTRACT

The Catalogue Of Somatic Mutations In Cancer (COSMIC), https://cancer.sanger.ac.uk/cosmic, is an expert-curated knowledgebase providing data on somatic variants in cancer, supported by a comprehensive suite of tools for interpreting genomic data, discerning the impact of somatic alterations on disease, and facilitating translational research. The catalogue is accessed and used by thousands of cancer researchers and clinicians daily, allowing them to quickly access information from an immense pool of data curated from over 29 thousand scientific publications and large studies. Within the last 4 years, COSMIC has substantially expanded its utility by adding new resources: the Mutational Signatures catalogue, the Cancer Mutation Census, and Actionability. To improve data accessibility and interoperability, somatic variants have received stable genomic identifiers that are associated with their genomic coordinates in GRCh37 and GRCh38, and new export files with reduced data redundancy have been made available for download.


Subjects
Databases, Genetic , Genomics , Neoplasms , Humans , Databases, Factual , Knowledge Bases , Mutation , Neoplasms/genetics , Databases, Genetic/trends , Internet
...