Results 1 - 20 of 56
1.
medRxiv ; 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39228707

ABSTRACT

BACKGROUND: Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for the clinical management of patients. Currently, manual biocuration is used to assign MAxO terms to rare diseases, enabling clinical management of rare diseases to be described computationally for use in clinical decision support and mechanism discovery. However, it is challenging to scale manual curation to comprehensively capture information about medical actions for the more than 10,000 rare diseases. METHODS: We present AutoMAxO, a semi-automated workflow that leverages Large Language Models (LLMs) to streamline MAxO biocuration for rare diseases. AutoMAxO first uses LLMs to retrieve candidate curations from abstracts of relevant publications. Next, the candidate curations are matched to ontology terms from MAxO, Human Phenotype Ontology (HPO), and MONDO disease ontology via a combination of LLMs and post-processing techniques. Finally, the matched terms are presented in a structured form to a human curator for approval. RESULTS: We used this approach to process 4,918 unique medical abstracts and identify annotations for 21 rare genetic diseases. We extracted 18,631 candidate disease-treatment curations, 538 of which were confirmed and transferred to the MAxO annotation dataset. CONCLUSION: The results of this project underscore the potential of generative AI to accelerate precision medicine by enabling robust and comprehensive curation of the primary literature to represent information about diseases and procedures in a structured fashion. Although we focused on MAxO in this project, similar approaches could be taken for other biomedical curation tasks.

2.
Bioinformatics ; 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39271156

ABSTRACT

MOTIVATION: Molecular representation learning (MRL) models molecules with low-dimensional vectors to support biological and chemical applications. Current methods primarily rely on intrinsic molecular information to learn molecular representations, but they often overlook effectively integrating domain knowledge into MRL. RESULTS: In this paper, we develop a reaction-enhanced graph learning (RXGL) framework for MRL, utilizing chemical reactions as domain knowledge. RXGL introduces dual graph learning modules to model molecule representation. One module employs graph convolutions on molecular graphs to capture molecule structures. The other module constructs a reaction-aware graph from chemical reactions and designs a novel graph attention network on this graph to integrate reaction-level relations into molecular modeling. To refine molecule representations, we design a reaction-based relation learning task, which considers the relations between the reactant and product sides in reactions. In addition, we introduce a cross-view contrastive task to strengthen the cooperative associations between molecular and reaction-aware graph learning. Experiment results show that our RXGL achieves strong performance in various downstream tasks, including product prediction, reaction classification, and molecular property prediction. AVAILABILITY AND IMPLEMENTATION: The code is publicly available at https://github.com/coder-ACAC/RLM. SUPPLEMENTARY INFORMATION: Supplementary data is available at Bioinformatics online.

3.
Sci Data ; 11(1): 906, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39174566

ABSTRACT

The "RNA world" represents a novel frontier for the study of fundamental biological processes and human diseases and is paving the way for the development of new drugs tailored to each patient's biomolecular characteristics. Although scientific data about coding and non-coding RNA molecules are constantly produced and available from public repositories, they are scattered across different databases and a centralized, uniform, and semantically consistent representation of the "RNA world" is still lacking. We propose RNA-KG, a knowledge graph (KG) encompassing biological knowledge about RNAs gathered from more than 60 public databases, integrating functional relationships with genes, proteins, and chemicals and ontologically grounded biomedical concepts. To develop RNA-KG, we first identified, pre-processed, and characterized each data source; next, we built a meta-graph that provides an ontological description of the KG by representing all the bio-molecular entities and medical concepts of interest in this domain, as well as the types of interactions connecting them. Finally, we leveraged an instance-based semantically abstracted knowledge model to specify the ontological alignment according to which RNA-KG was generated. RNA-KG can be downloaded in different formats and also queried by a SPARQL endpoint. A thorough topological analysis of the resulting heterogeneous graph provides further insights into the characteristics of the "RNA world". RNA-KG can be both directly explored and visualized, and/or analyzed by applying computational methods to infer bio-medical knowledge from its heterogeneous nodes and edges. The resource can be easily updated with new experimental data, and specific views of the overall KG can be extracted according to the bio-medical problem to be studied.


Subject(s)
RNA , RNA/genetics , Humans , Biological Ontologies
4.
medRxiv ; 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39108510

ABSTRACT

Large language models (LLMs) have shown great promise in supporting differential diagnosis, but the 23 available published studies on diagnostic accuracy evaluated small cohorts (30-422 cases, mean 104) and evaluated LLM responses subjectively by manual curation (23/23 studies). The performance of LLMs for rare disease diagnosis has not been evaluated systematically. Here, we perform a rigorous and large-scale analysis of the performance of GPT-4 in prioritizing candidate diagnoses, using the largest-ever cohort of rare disease patients. Our computational study used 5267 computational case reports from previously published data. Each case was formatted as a Global Alliance for Genomics and Health (GA4GH) phenopacket, in which clinical anomalies were represented as Human Phenotype Ontology (HPO) terms. We developed software to generate prompts from each phenopacket. Prompts were sent to Generative Pre-trained Transformer 4 (GPT-4), and the rank of the correct diagnosis, if present in the response, was recorded. The mean reciprocal rank (MRR) of the correct diagnosis was 0.24 (the reciprocal of the MRR corresponds to a rank of 4.2), and the correct diagnosis was placed in rank 1 in 19.2% of the cases, in the first 3 ranks in 28.6%, and in the first 10 ranks in 32.5%. Our study is the largest to be reported to date and provides a realistic estimate of the performance of GPT-4 in rare disease medicine.
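
The ranking metrics quoted in this abstract can be illustrated with a minimal sketch. The ranks below are invented toy values, not data from the study, and the function names are hypothetical. The sketch also assumes that a case whose correct diagnosis is missing from the response contributes 0 to the mean; the abstract does not state how such cases were scored.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all cases.

    Assumption: a case whose correct diagnosis is absent (None) contributes 0.
    """
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def top_k_rate(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose correct diagnosis appears within the first k ranks."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


# Toy data: rank of the correct diagnosis per case, None if it was not returned.
ranks = [1, 3, None, 2, None, 10, 1, None]

mrr = mean_reciprocal_rank(ranks)
top1 = top_k_rate(ranks, 1)
top3 = top_k_rate(ranks, 3)
top10 = top_k_rate(ranks, 10)
print(f"MRR={mrr:.3f} top1={top1:.3f} top3={top3:.3f} top10={top10:.3f}")
```

Taking 1/MRR gives the rank that the MRR corresponds to, which is how the abstract arrives at 4.2 from an MRR of 0.24.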

5.
bioRxiv ; 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-39005436

ABSTRACT

Objectives: Concept embeddings are low-dimensional vector representations of concepts such as MeSH:D009203 (Myocardial Infarction), whose similarity in the embedded vector space reflects their semantic similarity. Here, we test the hypothesis that non-biomedical concept synonym replacement can improve the quality of biomedical concept embeddings. Materials and methods: We developed an approach that leverages WordNet to replace sets of synonyms with the most common representative of the synonym set. Results: We tested our approach on 1055 concept sets and found that, on average, the mean intra-cluster distance was reduced by 8% in the vector space. Assuming that homophily of related concepts in the vector space is desirable, our approach tends to improve the quality of embeddings. Discussion and Conclusion: This pilot study shows that non-biomedical synonym replacement tends to improve the quality of embeddings of biomedical concepts using the Word2Vec algorithm. We have implemented our approach in a freely available Python package at https://github.com/TheJacksonLaboratory/wn2vec.
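
The replacement step described in this abstract can be sketched roughly as follows. This is an illustration only: it does not call WordNet or the wn2vec package, the synonym set is hard-coded, and every name in it is hypothetical. It assumes that "most common representative" means the synonym with the highest corpus frequency.

```python
from collections import Counter


def build_replacement_map(synsets, counts):
    """Map every word in a synonym set to the set's most frequent member."""
    mapping = {}
    for synset in synsets:
        # Sorting first makes tie-breaking deterministic (alphabetical).
        representative = max(sorted(synset), key=lambda w: counts[w])
        for word in synset:
            mapping[word] = representative
    return mapping


def replace_synonyms(tokens, mapping):
    """Replace each token by its representative; unmapped tokens are kept."""
    return [mapping.get(t, t) for t in tokens]


# Toy corpus; in the described approach the synonym sets would come from WordNet.
corpus = "the big tumor was large and the large mass was huge".split()
counts = Counter(corpus)
synsets = [{"big", "large", "huge"}]  # hypothetical non-biomedical synonym set

mapping = build_replacement_map(synsets, counts)
normalized = replace_synonyms(corpus, mapping)
print(" ".join(normalized))
```

Collapsing the general-language synonyms reduces vocabulary size before embedding, while biomedical tokens such as "tumor" are left untouched.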

6.
EBioMedicine ; 106: 105220, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39018755

ABSTRACT

BACKGROUND: Anthracycline-based neoadjuvant chemotherapy (NAC) may modify tumour immune infiltrate. This study characterized immune infiltrate spatial distribution after NAC in primary high-risk soft tissue sarcomas (STS) and investigated its association with prognosis. METHODS: The ISG-STS 1001 trial randomized STS patients to anthracycline plus ifosfamide (AI) or a histology-tailored (HT) NAC. Four areas of tumour specimens were sampled: the area showing the highest lymphocyte infiltrate (HI) at H&E; the area with lack of post-treatment changes (highest grade, HG); the area with post-treatment changes (lowest grade, LG); and the tumour edge (TE). CD3, CD8, PD-1, CD20, FOXP3, and CD163 were analyzed at immunohistochemistry and digital pathology. A machine learning method was used to generate sarcoma immune index scores (SIS) that predict patient disease-free and overall survival (DFS and OS). FINDINGS: Tumour infiltrating lymphocytes and PD-1+ cells together with CD163+ cells were more represented in STS histologies with complex compared to simple karyotype, while CD20+ B-cells were detected in both these histology groups. PD-1+ cells exerted a negative prognostic value irrespective of their spatial distribution. Enrichment in CD20+ B-cells at HI and TE areas was associated with better patient outcomes. We generated a prognostic SIS for each tumour area, with the HI-SIS showing the best performance. Such prognostic value was driven by treatment with AI. INTERPRETATION: The different spatial distribution of immune populations and their different association with prognosis support NAC as a modifier of tumour immune infiltrate in STS. FUNDING: Pharmamar; Italian Ministry of Health [RF-2019-12370923; GR-2016-02362609]; 5 × 1000 Funds-2016, Italian Ministry of Health; AIRC Grant [ID#28546].


Subject(s)
Tumor-Infiltrating Lymphocytes , Neoadjuvant Therapy , Sarcoma , Humans , Sarcoma/drug therapy , Sarcoma/mortality , Sarcoma/immunology , Sarcoma/pathology , Female , Male , Tumor-Infiltrating Lymphocytes/immunology , Tumor-Infiltrating Lymphocytes/metabolism , Prognosis , Middle Aged , Adult , Aged , Treatment Outcome , Tumor Microenvironment/immunology , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Tumor Biomarkers , Immunohistochemistry
7.
Transl Psychiatry ; 14(1): 246, 2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38851761

ABSTRACT

Acute COVID-19 infection can be followed by diverse clinical manifestations referred to as Post Acute Sequelae of SARS-CoV-2 Infection (PASC). Studies have shown an increased risk of being diagnosed with new-onset psychiatric disease following a diagnosis of acute COVID-19. However, it was unclear whether non-psychiatric PASC-associated manifestations (PASC-AMs) are associated with an increased risk of new-onset psychiatric disease following COVID-19. A retrospective electronic health record (EHR) cohort study of 2,391,006 individuals with acute COVID-19 was performed to evaluate whether non-psychiatric PASC-AMs are associated with new-onset psychiatric disease. Data were obtained from the National COVID Cohort Collaborative (N3C), which has EHR data from 76 clinical organizations. EHR codes were mapped to 151 non-psychiatric PASC-AMs recorded 28-120 days following SARS-CoV-2 diagnosis and before diagnosis of new-onset psychiatric disease. Association of newly diagnosed psychiatric disease with age, sex, race, pre-existing comorbidities, and PASC-AMs in seven categories was assessed by logistic regression. There were significant associations between a diagnosis of any psychiatric disease and five categories of PASC-AMs; odds ratios were highest for neurological, cardiovascular, and constitutional PASC-AMs (1.31, 1.29, and 1.23, respectively). Secondary analysis revealed that the proportions of 50 individual clinical features significantly differed between patients diagnosed with different psychiatric diseases. Our study provides evidence for association between non-psychiatric PASC-AMs and the incidence of newly diagnosed psychiatric disease. Significant associations were found for features related to multiple organ systems. This information could prove useful in understanding risk stratification for new-onset psychiatric disease following COVID-19. Prospective studies are needed to corroborate these findings.


Subject(s)
COVID-19 , Mental Disorders , SARS-CoV-2 , Humans , COVID-19/psychology , COVID-19/complications , COVID-19/epidemiology , Male , Female , Mental Disorders/epidemiology , Middle Aged , Adult , Retrospective Studies , Aged , Phenotype , Post-Acute COVID-19 Syndrome , Comorbidity , Electronic Health Records , Young Adult , Risk Factors , Adolescent
8.
Bioinform Adv ; 4(1): vbae036, 2024.
Article in English | MEDLINE | ID: mdl-38577542

ABSTRACT

Motivation: Graph representation learning is a family of related approaches that learn low-dimensional vector representations of nodes and other graph elements called embeddings. Embeddings approximate characteristics of the graph and can be used for a variety of machine-learning tasks such as novel edge prediction. For many biomedical applications, partial knowledge exists about positive edges that represent relationships between pairs of entities, but little to no knowledge is available about negative edges that represent the explicit lack of a relationship between two nodes. For this reason, classification procedures are forced to assume that the vast majority of unlabeled edges are negative. Existing approaches to sampling negative edges for training and evaluating classifiers do so by uniformly sampling pairs of nodes. Results: We show here that this sampling strategy typically leads to sets of positive and negative examples with imbalanced node degree distributions. Using a representative heterogeneous biomedical knowledge graph and random walk-based graph machine learning, we show that this strategy substantially impacts classification performance. If users of graph machine-learning models apply the models to prioritize examples that are drawn from approximately the same distribution as the positive examples, then performance of models as estimated in the validation phase may be artificially inflated. We present a degree-aware node sampling approach that mitigates this effect and is simple to implement. Availability and implementation: Our code and data are publicly available at https://github.com/monarch-initiative/negativeExampleSelection.
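
One way to picture degree-aware negative sampling is the sketch below. It is not the authors' implementation: it simply assumes that negative-edge endpoints are drawn with probability proportional to node degree, so that the endpoints of sampled negatives have a degree distribution closer to that of the positive edges than uniform sampling would give. The function name and the toy graph are hypothetical.

```python
import random
from collections import Counter


def sample_negatives_degree_aware(edges, n_samples, seed=0):
    """Sample node pairs that are not known edges, weighting nodes by degree.

    Assumption: drawing endpoints in proportion to their degree approximates
    the endpoint degree distribution of the positive edges.
    """
    rng = random.Random(seed)
    positives = {frozenset(e) for e in edges}
    degree = Counter(node for e in edges for node in e)
    nodes = sorted(degree)
    weights = [degree[n] for n in nodes]

    negatives = set()
    attempts = 0
    # The attempt cap avoids an endless loop when too few non-edges exist.
    while len(negatives) < n_samples and attempts < 100 * n_samples:
        attempts += 1
        u, v = rng.choices(nodes, weights=weights, k=2)
        pair = frozenset((u, v))
        if u != v and pair not in positives:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)


# Toy graph: hub "A" is linked to every other node, so no negative can contain it.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]
negatives = sample_negatives_degree_aware(edges, n_samples=3)
print(negatives)
```

Uniform sampling would instead pass `weights=None` to `rng.choices`, which over-represents low-degree nodes among the negatives relative to the positives.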

9.
Int J Med Inform ; 187: 105461, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38643701

ABSTRACT

OBJECTIVE: Female reproductive disorders (FRDs) are common health conditions that may present with significant symptoms. Diet and environment are potential areas for FRD interventions. We utilized a knowledge graph (KG) method to predict factors associated with common FRDs (for example, endometriosis, ovarian cyst, and uterine fibroids). MATERIALS AND METHODS: We harmonized survey data from the Personalized Environment and Genes Study (PEGS) on internal and external environmental exposures and health conditions with biomedical ontology content. We merged the harmonized data and ontologies with supplemental nutrient and agricultural chemical data to create a KG. We analyzed the KG by embedding edges and applying a random forest for edge prediction to identify variables potentially associated with FRDs. We also conducted logistic regression analysis for comparison. RESULTS: Across 9765 PEGS respondents, the KG analysis resulted in 8535 significant or suggestive predicted links between FRDs and chemicals, phenotypes, and diseases. Amongst these links, 32 were exact matches when compared with the logistic regression results, including comorbidities, medications, foods, and occupational exposures. DISCUSSION: Mechanistic underpinnings of predicted links documented in the literature may support some of our findings. Our KG methods are useful for predicting possible associations in large, survey-based datasets with added information on directionality and magnitude of effect from logistic regression. These results should not be construed as causal but can support hypothesis generation. CONCLUSION: This investigation enabled the generation of hypotheses on a variety of potential links between FRDs and exposures. Future investigations should prospectively evaluate the variables hypothesized to impact FRDs.


Subject(s)
Environmental Exposure , Humans , Female , Environmental Exposure/adverse effects , Female Genital Diseases , Logistic Models , Nutritional Status , Diet , Adult , Random Forest
10.
J Pers Med ; 14(4)2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38672968

ABSTRACT

Artificial intelligence (AI) approaches have been introduced in various disciplines but remain rather unused in head and neck (H&N) cancers. This survey aimed to infer the current applications of and attitudes toward AI in the multidisciplinary care of H&N cancers. From November 2020 to June 2022, a web-based questionnaire examining the relationship between AI usage and professionals' demographics and attitudes was delivered to different professionals involved in H&N cancers through social media and mailing lists. A total of 139 professionals completed the questionnaire. Only 49.7% of the respondents reported having experience with AI. The most frequent AI users were radiologists (66.2%). Significant predictors of AI use were primary specialty (V = 0.455; p < 0.001), academic qualification and age. AI's potential was seen in the improvement of diagnostic accuracy (72%), surgical planning (64.7%), treatment selection (57.6%), risk assessment (50.4%) and the prediction of complications (45.3%). Among participants, 42.7% had significant concerns over AI use, with the most frequent being the 'loss of control' (27.6%) and 'diagnostic errors' (57.0%). This survey reveals limited engagement with AI in multidisciplinary H&N cancer care, highlighting the need for broader implementation and further studies to explore its acceptance and benefits.

11.
Sci Data ; 11(1): 363, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605048

ABSTRACT

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.


Subject(s)
Biological Science Disciplines , Knowledge Bases , Automated Pattern Recognition , Algorithms , Translational Biomedical Research
12.
medRxiv ; 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-37503093

ABSTRACT

Objective: Large Language Models such as GPT-4 previously have been applied to differential diagnostic challenges based on published case reports. Published case reports have a sophisticated narrative style that is not readily available from typical electronic health records (EHR). Furthermore, even if such a narrative were available in EHRs, privacy requirements would preclude sending it outside the hospital firewall. We therefore tested a method for parsing clinical texts to extract ontology terms and programmatically generating prompts that by design are free of protected health information. Materials and Methods: We investigated different methods to prepare prompts from 75 recently published case reports. We transformed the original narratives by extracting structured terms representing phenotypic abnormalities, comorbidities, treatments, and laboratory tests and creating prompts programmatically. Results: Performance of all of these approaches was modest, with the correct diagnosis ranked first in only 5.3-17.6% of cases. The performance of the prompts created from structured data was substantially worse than that of the original narrative texts, even if additional information was added following manual review of term extraction. Moreover, different versions of GPT-4 demonstrated substantially different performance on this task. Discussion: The sensitivity of the performance to the form of the prompt and the instability of results over two GPT-4 versions represent important current limitations to the use of GPT-4 to support diagnosis in real-life clinical settings. Conclusion: Research is needed to identify the best methods for creating prompts from typically available clinical data to support differential diagnostics.

13.
Front Bioinform ; 3: 1304099, 2023.
Article in English | MEDLINE | ID: mdl-38076030

ABSTRACT

The recent breakthroughs of Large Language Models (LLMs) in the context of natural language processing have opened the way to significant advances in protein research. Indeed, the relationships between human natural language and the "language of proteins" invite the application and adaptation of LLMs to protein modelling and design. Considering the impressive results of GPT-4 and other recently developed LLMs in processing, generating and translating human languages, we anticipate analogous results with the language of proteins. Indeed, protein language models have already been trained to accurately predict protein properties and to generate novel functionally characterized proteins, achieving state-of-the-art results. In this paper we discuss the promises and the open challenges raised by this novel and exciting research area, and we propose our perspective on how LLMs will affect protein modelling and design.

14.
J Clin Med ; 12(24)2023 Dec 09.
Article in English | MEDLINE | ID: mdl-38137667

ABSTRACT

PURPOSE: To evaluate the clinical impact of a protocol for the image-guided percutaneous microwave ablation (MWA) of hepatocellular carcinoma (HCC) that includes cone-beam computed tomography (CBCT), fusion imaging and ablation volume prediction in patients with HCC unsuitable for standard ultrasound (US) guidance. MATERIALS AND METHODS: This study included all patients with HCC treated with MWA between January 2021 and June 2022 in a tertiary institution. Patients were divided into two groups: Group A, treated following the protocol, and Group B, treated with standard US guidance. Follow-up images were reviewed to assess residual disease (RD), local tumor progression (LTP) and intrahepatic distant recurrence (IDR). Ablation response at 1 month was also evaluated according to mRECIST. Baseline variables and outcomes were compared between the groups. For 1-month RD, propensity score weighting (PSW) was performed. RESULTS: A total of 80 consecutive patients with 101 HCCs treated with MWA were divided into two groups. Group A had 41 HCCs in 37 patients, and Group B had 60 HCCs in 43 patients. Among all baseline variables, the groups differed regarding their age (mean of 72 years in Group A and 64 years in Group B), new vs. residual tumor rates (48% Group A vs. 25% Group B, p < 0.05) and number of subcapsular tumors (56.7% Group B vs. 31.7% Group A, p < 0.05) and perivascular tumors (51.7% Group B vs. 17.1% Group A, p < 0.05). The protocol led to repositioning the antenna in 49% of cases. There was a significant difference in 1-month local response between the groups measured as the RD rate and mRECIST outcomes. LTP rates at 3 and 6 months, and IDR rates at 1, 3 and 6 months, showed no significant differences. Among all variables, logistic regression after PSW demonstrated a protective effect of the protocol against 1-month RD. CONCLUSIONS: The use of CBCT, fusion imaging and ablation volume prediction during percutaneous MWA of HCCs provided better 1-month local tumor control. Further studies with a larger population and longer follow-up are needed.

15.
EBioMedicine ; 96: 104777, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37672869

ABSTRACT

BACKGROUND: The cause and symptoms of long COVID are poorly understood. It is challenging to predict whether a given COVID-19 patient will develop long COVID in the future. METHODS: We used electronic health record (EHR) data from the National COVID Cohort Collaborative to predict the incidence of long COVID. We trained two machine learning (ML) models - logistic regression (LR) and random forest (RF). Features used to train predictors included symptoms and drugs ordered during acute infection, measures of COVID-19 treatment, pre-COVID comorbidities, and demographic information. We assigned the 'long COVID' label to patients diagnosed with the U09.9 ICD10-CM code. The cohorts included patients with (a) EHRs reported from data partners using U09.9 ICD10-CM code and (b) at least one EHR in each feature category. We analysed three cohorts: all patients (n = 2,190,579; diagnosed with long COVID = 17,036), inpatients (149,319; 3,295), and outpatients (2,041,260; 13,741). FINDINGS: LR and RF models yielded median AUROC of 0.76 and 0.75, respectively. Ablation study revealed that drugs had the highest influence on the prediction task. The SHAP method identified age, gender, cough, fatigue, albuterol, obesity, diabetes, and chronic lung disease as explanatory features. Models trained on data from one N3C partner and tested on data from the other partners had average AUROC of 0.75. INTERPRETATION: ML-based classification using EHR information from the acute infection period is effective in predicting long COVID. SHAP methods identified important features for prediction. Cross-site analysis demonstrated the generalizability of the proposed methodology. FUNDING: NCATS U24 TR002306, NCATS UL1 TR003015, Axle Informatics Subcontract: NCATS-P00438-B, NIH/NIDDK/OD, PSR2015-1720GVALE_01, G43C22001320007, and Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231.
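
For readers unfamiliar with the AUROC values reported here, the following minimal sketch computes the metric directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are invented toy values, not study data, and the function name is hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = labelled long COVID, 0 = not; scores are model outputs.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
value = auroc(labels, scores)
print(f"AUROC = {value:.3f}")
```

This pairwise form is quadratic in the number of cases, so it suits small illustrations; cohort-scale evaluation would normally use a sorting-based implementation.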


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , Humans , COVID-19 Drug Treatment , Machine Learning , Obesity
16.
medRxiv ; 2023 Jul 16.
Article in English | MEDLINE | ID: mdl-37502882

ABSTRACT

Objective: Female reproductive disorders (FRDs) are common health conditions that may present with significant symptoms. Diet and environment are potential areas for FRD interventions. We utilized a knowledge graph (KG) method to predict factors associated with common FRDs (e.g., endometriosis, ovarian cyst, and uterine fibroids). Materials and Methods: We harmonized survey data from the Personalized Environment and Genes Study on internal and external environmental exposures and health conditions with biomedical ontology content. We merged the harmonized data and ontologies with supplemental nutrient and agricultural chemical data to create a KG. We analyzed the KG by embedding edges and applying a random forest for edge prediction to identify variables potentially associated with FRDs. We also conducted logistic regression analysis for comparison. Results: Across 9765 PEGS respondents, the KG analysis resulted in 8535 significant predicted links between FRDs and chemicals, phenotypes, and diseases. Amongst these links, 32 were exact matches when compared with the logistic regression results, including comorbidities, medications, foods, and occupational exposures. Discussion: Mechanistic underpinnings of predicted links documented in the literature may support some of our findings. Our KG methods are useful for predicting possible associations in large, survey-based datasets with added information on directionality and magnitude of effect from logistic regression. These results should not be construed as causal, but can support hypothesis generation. Conclusion: This investigation enabled the generation of hypotheses on a variety of potential links between FRDs and exposures. Future investigations should prospectively evaluate the variables hypothesized to impact FRDs.

17.
Diagnostics (Basel) ; 13(11)2023 May 25.
Article in English | MEDLINE | ID: mdl-37296701

ABSTRACT

(1) Background: The assessment of resection margins during surgery of oral cavity squamous cell cancer (OCSCC) dramatically impacts the prognosis of the patient as well as the need for adjuvant treatment in the future. Currently there is an unmet need to improve OCSCC surgical margins, which appear to be involved in around 45% of cases. Intraoperative imaging techniques, magnetic resonance imaging (MRI) and intraoral ultrasound (ioUS), have emerged as promising tools in guiding surgical resection, although the number of studies available on this subject is still low. The aim of this diagnostic test accuracy (DTA) review is to investigate the accuracy of intraoperative imaging in the assessment of OCSCC margins. (2) Methods: By using the Cochrane-supported platform Review Manager version 5.4, a systematic search was performed on the online databases MEDLINE-EMBASE-CENTRAL using the keywords "oral cavity cancer, squamous cell carcinoma, tongue cancer, surgical margins, magnetic resonance imaging, intraoperative, intra-oral ultrasound". (3) Results: Ten papers were identified for full-text analysis. The negative predictive value (cutoff < 5 mm) for ioUS ranged from 0.55 to 0.91, and that of MRI ranged from 0.5 to 0.91; accuracy analysis performed on four selected studies showed a sensitivity ranging from 0.07 to 0.75 and specificity ranging from 0.81 to 1. Image guidance allowed for a mean improvement in free margin resection of 35%. (4) Conclusions: IoUS shows comparable accuracy to that of ex vivo MRI for the assessment of close and involved surgical margins, and should be preferred as the more affordable and reproducible technique. Both techniques showed higher diagnostic yield if applied to early OCSCC (T1-T2 stages), and when histology is favorable.
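
The accuracy measures named in this abstract all derive from a 2x2 confusion matrix. The sketch below shows the standard definitions; the counts are invented for illustration and are not taken from the review, and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic test accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # involved margins correctly flagged
        "specificity": tn / (tn + fp),  # free margins correctly cleared
        "npv": tn / (tn + fn),          # negative predictive value
        "ppv": tp / (tp + fp),          # positive predictive value
    }


# Toy counts: imaging call versus histopathology as the reference standard.
metrics = diagnostic_metrics(tp=6, fp=2, tn=18, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Unlike sensitivity and specificity, NPV and PPV depend on how common involved margins are in the sample, which is one reason the ranges reported across studies can be wide.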

18.
NPJ Digit Med ; 6(1): 89, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37208468

ABSTRACT

Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies our algorithm presents new opportunities to advance EHR-based deep phenotyping.

19.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-36929917

ABSTRACT

MOTIVATION: Advances in RNA sequencing technologies have achieved an unprecedented accuracy in the quantification of mRNA isoforms, but our knowledge of isoform-specific functions has lagged behind. There is a need to understand the functional consequences of differential splicing, which could be supported by the generation of accurate and comprehensive isoform-specific gene ontology annotations. RESULTS: We present isoform interpretation, a method that uses expectation-maximization to infer isoform-specific functions based on the relationship between sequence and functional isoform similarity. We predicted isoform-specific functional annotations for 85 617 isoforms of 17 900 protein-coding human genes spanning a range of 17 430 distinct gene ontology terms. Comparison with a gold-standard corpus of manually annotated human isoform functions showed that isoform interpretation significantly outperforms state-of-the-art competing methods. We provide experimental evidence that functionally related isoforms predicted by isoform interpretation show a higher degree of domain sharing and expression correlation than functionally related genes. We also show that isoform sequence similarity correlates better with inferred isoform function than with gene-level function. AVAILABILITY AND IMPLEMENTATION: Source code, documentation, and resource files are freely available under a GNU3 license at https://github.com/TheJacksonLaboratory/isopretEM and https://zenodo.org/record/7594321.


Subject(s)
Motivation, Software, Humans, Protein Isoforms/genetics, Alternative Splicing, RNA Sequence Analysis
20.
J Biomed Inform ; 139: 104295, 2023 03.
Article in English | MEDLINE | ID: mdl-36716983

ABSTRACT

Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful for assessing associations between patients' predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases, whose removal may introduce severe bias. Several multiple imputation algorithms have been proposed to attempt to recover the missing information under an assumed missingness mechanism. Each algorithm presents strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, the selection of each algorithm's parameters and data-related modeling choices are also both crucial and challenging. In this paper we propose a novel framework to numerically evaluate strategies for handling missing data in the context of statistical analysis, with a particular focus on multiple imputation techniques. We demonstrate the feasibility of our approach on a large cohort of type-2 diabetes patients provided by the National COVID Cohort Collaborative (N3C) Enclave, where we explored the influence of various patient characteristics on outcomes related to COVID-19. Our analysis included classic multiple imputation techniques as well as simple complete-case Inverse Probability Weighted models. Extensive experiments show that our approach can effectively highlight the most promising and performant missing-data handling strategy for our case study. Moreover, our methodology allowed a better understanding of the behavior of the different models and of how it changed as we modified their parameters. Our method is general and can be applied to different research fields and on datasets containing heterogeneous types.


Subject(s)
COVID-19, Humans, Algorithms, Research Design, Bias, Probability