Results 1-20 of 485
1.
BioData Min ; 17(1): 37, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39354639

ABSTRACT

BACKGROUND: Epistasis, the interaction between genetic loci where the effect of one locus is influenced by one or more other loci, plays a crucial role in the genetic architecture of complex traits. However, as the number of loci considered increases, the investigation of epistasis becomes exponentially more complex, making the selection of key features vital for effective downstream analyses. Relief-Based Algorithms (RBAs) are often employed for this purpose due to their reputation as "interaction-sensitive" algorithms and uniquely non-exhaustive approach. However, the limitations of RBAs in detecting interactions, particularly those involving multiple loci, have not been thoroughly defined. This study seeks to address this gap by evaluating the efficiency of RBAs in detecting higher-order epistatic interactions. Motivated by previous findings that suggest some RBAs may rank predictive features involved in higher-order epistasis negatively, we explore the potential of absolute value ranking of RBA feature weights as an alternative approach for capturing complex interactions. In this study, we assess the performance of ReliefF, MultiSURF, and MultiSURFstar on simulated genetic datasets that model various patterns of genotype-phenotype associations, including 2-way to 5-way genetic interactions, and compare their performance to two control methods: a random shuffle and mutual information. RESULTS: Our findings indicate that while RBAs effectively identify lower-order (2 to 3-way) interactions, their capability to detect higher-order interactions is significantly limited, primarily by large feature count but also by signal noise. Specifically, we observe that RBAs are successful in detecting fully penetrant 4-way XOR interactions using an absolute value ranking approach, but this is restricted to datasets with only 20 total features. 
CONCLUSIONS: These results highlight the inherent limitations of current RBAs and underscore the need for the development of Relief-based approaches with enhanced detection capabilities for the investigation of epistasis, particularly in datasets with large feature counts and complex higher-order interactions.
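The core Relief idea evaluated in this study, plus the absolute value ranking it proposes, can be sketched in a few lines. This is a simplified, SURF-style illustration written for this listing; it is not ReliefF, MultiSURF, or MultiSURFstar as benchmarked by the authors, and not the scikit-rebate API. Every instance acts as a target, its neighbors are all instances closer than the mean pairwise Hamming distance, and a feature gains weight when it differs from an opposite-class neighbor and loses weight when it differs from a same-class neighbor. The function names and the toy XOR data are hypothetical.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of loci at which two genotype vectors differ."""
    return sum(ai != bi for ai, bi in zip(a, b))


def surf_style_weights(X, y):
    """Simplified SURF-style Relief weights (illustrative sketch only).

    Every instance is used as a target; its neighbors are all other instances
    closer than the mean pairwise Hamming distance. A feature gains weight when
    it differs from an opposite-class neighbor (a "miss") and loses weight when
    it differs from a same-class neighbor (a "hit").
    """
    n, p = len(X), len(X[0])
    pairs = list(combinations(range(n), 2))
    threshold = sum(hamming(X[i], X[j]) for i, j in pairs) / len(pairs)
    w = [0.0] * p
    for i in range(n):
        for j in range(n):
            if i == j or hamming(X[i], X[j]) >= threshold:
                continue
            sign = 1 if y[i] != y[j] else -1  # miss: +1, hit: -1
            for f in range(p):
                if X[i][f] != X[j][f]:
                    w[f] += sign
    return [wf / n for wf in w]


def rank_by_abs(weights):
    """Feature indices sorted by |weight|, largest first (ties keep index order)."""
    return sorted(range(len(weights)), key=lambda f: abs(weights[f]), reverse=True)


# Toy 2-way XOR: loci 0 and 1 determine the class, loci 2 and 3 are noise.
X = [list(row) for row in product([0, 1], repeat=4)]
y = [row[0] ^ row[1] for row in X]
weights = surf_style_weights(X, y)
print(weights)               # → [2.0, 2.0, 0.0, 0.0]
print(rank_by_abs(weights))  # → [0, 1, 2, 3]
```

On this fully crossed toy set the two predictive loci receive positive weights and the noise loci zero, so signed and absolute rankings agree. The absolute value ranking only changes the result when predictive loci receive large negative weights, which the previous findings cited in the abstract suggest can happen for higher-order interactions.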

2.
BioData Min ; 17(1): 41, 2024 Oct 11.
Article in English | MEDLINE | ID: mdl-39394173

ABSTRACT

BACKGROUND: The additive model of inheritance assumes that heterozygotes (Aa) are exactly intermediate with respect to homozygotes (AA and aa). While this model is commonly used in single-locus genetic association studies, significant deviations from additivity are well-documented and contribute to phenotypic variance across many traits and systems. This assumption can introduce type I and type II errors by overestimating or underestimating the effects of variants that deviate from additivity. Alternative genotype encoding strategies have been explored to account for different inheritance patterns, but they often incur significant computational or methodological costs. To address these challenges, we introduce PAGER (Phenotype Adjusted Genotype Encoding and Ranking), an efficient pre-processing method that encodes each genetic variant based on normalized mean phenotypic differences between diallelic genotype classes (AA, Aa, and aa). This approach more accurately reflects each variant's true inheritance model, improving model precision while minimizing the costs associated with alternative encoding strategies. RESULTS: Through extensive benchmarking on SNPs simulated with both binary and continuous phenotypes, we demonstrate that PAGER accurately represents various inheritance patterns (including additive, dominant, recessive, and heterosis), achieves levels of statistical power that meet or exceed other encoding strategies, and attains computation speeds up to 55 times faster than a similar method, EDGE. We also apply PAGER to publicly available real-world data and identify a novel, relevant putative QTL associated with body mass index in rats (Rattus norvegicus) that is not detected with the additive model. CONCLUSIONS: Overall, we show that PAGER is an efficient genotype encoding approach that can uncover sources of missing heritability and reveal novel insights in the study of complex traits while incurring minimal costs.
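The abstract describes encoding each variant from normalized mean phenotypic differences between genotype classes but does not state the formula. The sketch below assumes one plausible normalization, min-max scaling of the class mean phenotypes to [0, 1]. It is not PAGER's published implementation, and `class_mean_encoding` is a hypothetical name.

```python
from statistics import mean


def class_mean_encoding(genotypes, phenotype):
    """Hypothetical PAGER-like encoding (min-max scaling is an assumption).

    genotypes: per-sample genotype labels, e.g. "AA", "Aa", "aa"
    phenotype: per-sample numeric phenotype values (0/1 for binary traits)
    Returns {genotype class: encoded value in [0, 1]}.
    """
    classes = sorted(set(genotypes))
    class_means = {
        g: mean(p for gt, p in zip(genotypes, phenotype) if gt == g) for g in classes
    }
    lo, hi = min(class_means.values()), max(class_means.values())
    if hi == lo:  # no phenotypic difference between classes
        return {g: 0.0 for g in classes}
    return {g: (m - lo) / (hi - lo) for g, m in class_means.items()}


# Toy dominant-like SNP: Aa and aa have the same mean phenotype.
genotypes = ["AA", "AA", "Aa", "Aa", "aa", "aa"]
phenotype = [1.0, 3.0, 9.0, 11.0, 10.0, 10.0]
encoding = class_mean_encoding(genotypes, phenotype)
print(encoding)  # → {'AA': 0.0, 'Aa': 1.0, 'aa': 1.0}
encoded_column = [encoding[g] for g in genotypes]  # replaces the additive 0/1/2 codes
```

Here Aa and aa both map to 1.0, whereas the additive model would code them as 1 and 2. Under the same assumed scaling, an overdominant (heterosis) SNP with class means 2, 10, and 4 would map to 0.0, 1.0, and 0.25.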

3.
Res Sq ; 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39281873

ABSTRACT

Background: The investigation of epistasis becomes increasingly complex as more loci are considered due to the exponential expansion of possible interactions. Consequently, selecting key features that influence epistatic interactions is crucial for effective downstream analyses. Recognizing this challenge, this study investigates the efficiency of Relief-Based Algorithms (RBAs) in detecting higher-order epistatic interactions, which may be critical for understanding the genetic architecture of complex traits. RBAs are uniquely non-exhaustive, eliminating the need to construct features for every possible interaction and thus improving computational tractability. Motivated by previous research indicating that some RBAs rank predictive features involved in higher-order epistasis as highly negative, we explore the utility of absolute value ranking of RBA feature weights as an alternative method to capture complex interactions. We evaluate ReliefF, MultiSURF, and MultiSURFstar on simulated genetic datasets that model various patterns of genotype-phenotype associations, including 2-way to 5-way genetic interactions, and compare their performance to two control methods: a random shuffle and mutual information. Results: Our findings indicate that while RBAs effectively identify lower-order (2 to 3-way) interactions, their capability to detect higher-order interactions is significantly limited, primarily by large feature count but also by signal noise. Specifically, we observe that RBAs are successful in detecting fully penetrant 4-way XOR interactions using an absolute value ranking approach, but this is restricted to datasets with a minimal number of total features. Conclusions: These results highlight the inherent limitations of current RBAs and underscore the need for enhanced detection capabilities for the investigation of epistasis, particularly in datasets with large feature counts and complex higher-order interactions.

4.
Cell ; 187(17): 4449-4457, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39178828

ABSTRACT

Computational data-centric research techniques play a prevalent and multi-disciplinary role in life science research. In the past, scientists in wet labs generated the data, and computational researchers focused on creating tools for the analysis of those data. Computational researchers are now becoming more independent and taking leadership roles within biomedical projects, leveraging the increased availability of public data. We are now able to generate vast amounts of data, and the challenge has shifted from data generation to data analysis. Here we discuss the pitfalls, challenges, and opportunities facing the field of data-centric research in biology. We discuss the evolving perception of computational data-driven research and its rise as an independent domain in biomedical research while also addressing the significant collaborative opportunities that arise from integrating computational research with experimental and translational biology. Additionally, we discuss the future of data-centric research and its applications across various areas of the biomedical field.


Subject(s)
Biomedical Research , Computational Biology , Computational Biology/methods , Humans

5.
medRxiv ; 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39132476

ABSTRACT

Objective: A multitude of factors affect a hospitalized individual's blood glucose (BG), making BG difficult to predict and manage. Beyond medications well established to alter BG, such as beta-blockers, there are likely many medications with undiscovered effects on BG variability. Identification of these medications and the strength and timing of these relationships has potential to improve glycemic management and patient safety. Materials and Methods: EHR data from 103,871 inpatient encounters over 8 years within a large, urban health system were used to extract over 500 medications, laboratory measurements, and clinical predictors of BG. Feature selection was performed using an optimized Lasso model with repeated 5-fold cross-validation on the 80% training set, followed by a linear mixed regression model to evaluate statistical significance. Significant medication predictors were then evaluated for novelty against a comprehensive adverse drug event database. Results: We found 29 statistically significant features associated with BG; 24 were medications including 10 medications not previously documented to alter BG. The remaining five factors were Black/African American race, history of type 2 diabetes mellitus, prior BG (mean and last) and creatinine. Discussion: The unexpected medications, including several agents involved in gastrointestinal motility, found to affect BG were supported by available studies. This study may bring to light medications to use with caution in individuals with hyper- or hypoglycemia. Further investigation of these potential candidates is needed to enhance clinical utility of these findings. Conclusion: This study uniquely identifies medications involved in gastrointestinal transit as predictors of BG that may not be well established or recognized in clinical practice.

6.
Patterns (N Y) ; 5(6): 101010, 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-39005486

ABSTRACT

The authors emphasize diversity, equity, and inclusion in STEM education and artificial intelligence (AI) research, focusing on LGBTQ+ representation. They discuss the challenges faced by queer scientists, educational resources, the implementation of National AI Campus, and the notion of intersectionality. The authors hope to ensure supportive and respectful engagement across all communities.

7.
Bioinformatics ; 40(6)2024 06 03.
Article in English | MEDLINE | ID: mdl-38830083

ABSTRACT

MOTIVATION: Answering and solving complex problems using a large language model (LLM) in a specific domain such as biomedicine is a challenging task that requires both factual consistency and logic. LLMs also often suffer from major limitations, such as hallucinating false or irrelevant information or being influenced by noisy data. These issues can compromise the trustworthiness, accuracy, and compliance of LLM-generated text and insights. RESULTS: Knowledge Retrieval Augmented Generation ENgine (KRAGEN) is a new tool that combines knowledge graphs, Retrieval Augmented Generation (RAG), and advanced prompting techniques to solve complex problems with natural language. KRAGEN converts knowledge graphs into a vector database and uses RAG to retrieve relevant facts from it. KRAGEN uses advanced prompting techniques, namely graph-of-thoughts (GoT), to dynamically break down a complex problem into smaller subproblems; it then solves each subproblem using the relevant knowledge retrieved through the RAG framework, which limits hallucinations, and finally consolidates the subproblem solutions into an overall solution. KRAGEN's graph visualization allows the user to interact with and evaluate the quality of the solution's GoT structure and logic. AVAILABILITY AND IMPLEMENTATION: KRAGEN is deployed by running its custom Docker containers. KRAGEN is available as open-source from GitHub at: https://github.com/EpistasisLab/KRAGEN.
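The retrieval step described above (knowledge-graph facts turned into searchable vectors, then the closest facts fetched for a question or subproblem) can be illustrated with the standard library alone. This is a toy sketch, not KRAGEN's code: a bag-of-words cosine similarity stands in for the learned text embeddings and vector database a real RAG system would use, and the triples, function names, and `k` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def triple_to_text(triple):
    """Flatten a (subject, relation, object) triple into a short sentence."""
    subject, relation, obj = triple
    return f"{subject} {relation.replace('_', ' ')} {obj}"


def bag_of_words(text):
    """Toy stand-in for an embedding: counts of lower-cased word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(triples, question, k=2):
    """Return the k fact sentences most similar to the question."""
    index = [(text, bag_of_words(text)) for text in map(triple_to_text, triples)]
    query = bag_of_words(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical toy facts, not taken from KRAGEN's knowledge graph.
facts = [
    ("APOE", "associated_with", "Alzheimer disease"),
    ("donepezil", "treats", "Alzheimer disease"),
    ("insulin", "regulates", "blood glucose"),
]
print(retrieve(facts, "Which drug treats Alzheimer disease?", k=1))
# → ['donepezil treats Alzheimer disease']
```

In a full RAG pipeline the retrieved sentences would be added to the prompt for each subproblem; that step and the graph-of-thoughts decomposition are omitted here.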


Subject(s)
Software , Natural Language Processing , Problem Solving , Algorithms , Information Storage and Retrieval/methods , Humans , Computational Biology/methods , Databases, Factual

8.
Med Image Anal ; 97: 103231, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38941858

ABSTRACT

Alzheimer's disease (AD) is a complex neurodegenerative disorder that has impacted millions of people worldwide. The neuroanatomical heterogeneity of AD has made it challenging to fully understand the disease mechanism. Identifying AD subtypes during the prodromal stage and determining their genetic basis would be immensely valuable for drug discovery and subsequent clinical treatment. Previous studies that clustered subgroups typically used unsupervised learning techniques, neglecting the survival information and potentially limiting the insights gained. To address this problem, we propose an interpretable survival analysis method called Deep Clustering Survival Machines (DCSM), which combines both discriminative and generative mechanisms. Similar to mixture models, we assume that the timing information of survival data can be generatively described by a mixture of parametric distributions, referred to as expert distributions. We learn the weights of these expert distributions for individual instances in a discriminative manner by leveraging their features. This allows us to characterize the survival information of each instance through a weighted combination of the learned expert distributions. We demonstrate the superiority of the DCSM method by applying this approach to cluster patients with mild cognitive impairment (MCI) into subgroups with different risks of converting to AD. Conventional clustering measurements for survival analysis along with genetic association studies successfully validate the effectiveness of the proposed method and characterize our clustering findings.
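The mixture idea described above, in which instance-specific weights combine a shared set of parametric expert distributions, amounts to S(t | x) = sum over k of pi_k(x) * S_k(t). The sketch below evaluates that formula under stated assumptions: Weibull experts with fixed (shape, scale) parameters and a linear softmax gate for pi_k(x), with each instance assigned to its highest-weight expert. In DCSM these quantities are learned from data, so every number and function name here is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax turning gate scores into mixture weights."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def mixture_weights(features, gate):
    """Instance-specific weights pi_k(x) from a linear gate (one row per expert)."""
    return softmax([sum(w * x for w, x in zip(row, features)) for row in gate])


def mixture_survival(t, features, gate, experts):
    """S(t | x) = sum_k pi_k(x) * S_k(t); experts are (shape, scale) pairs."""
    pis = mixture_weights(features, gate)
    return sum(pi * weibull_survival(t, shape, scale)
               for pi, (shape, scale) in zip(pis, experts))


def assign_cluster(features, gate):
    """Cluster label = index of the expert with the largest weight."""
    pis = mixture_weights(features, gate)
    return max(range(len(pis)), key=pis.__getitem__)


experts = [(1.5, 10.0), (1.5, 3.0)]  # hypothetical slow- and fast-converting experts
gate = [[1.0, -1.0], [-1.0, 1.0]]    # hypothetical gate weights
patient = [0.2, 1.5]                 # hypothetical feature vector
print(assign_cluster(patient, gate))  # → 1
print(mixture_survival(3.0, patient, gate, experts))
```

Because the experts are shared and only the weights vary per patient, patients with similar weight profiles fall into the same risk subgroup, which is what makes the mixture usable for clustering.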


Subject(s)
Alzheimer Disease , Cognitive Dysfunction , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/genetics , Humans , Cluster Analysis , Aged , Female , Survival Analysis , Male , Algorithms

9.
BioData Min ; 17(1): 16, 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38890715

ABSTRACT

GPT-4, as the most advanced version of OpenAI's large language models, has attracted widespread attention, rapidly becoming an indispensable AI tool across various areas. This includes its exploration by scientists for diverse applications. Our study focused on assessing GPT-4's capabilities in generating text, tables, and diagrams for biomedical review papers. We also assessed the consistency in text generation by GPT-4, along with potential plagiarism issues when employing this model for the composition of scientific review papers. Based on the results, we suggest the development of enhanced functionalities in ChatGPT, aiming to meet the needs of the scientific community more effectively. This includes enhancements in uploaded document processing for reference materials, a deeper grasp of intricate biomedical concepts, more precise and efficient information distillation for table generation, and a further refined model specifically tailored for scientific diagram creation.

10.
Sci Rep ; 14(1): 13707, 2024 06 14.
Article in English | MEDLINE | ID: mdl-38877045

ABSTRACT

Determining the fundamental characteristics that define a face as "feminine" or "masculine" has long fascinated anatomists and plastic surgeons, particularly those involved in aesthetic and gender-affirming surgery. Previous studies in this area have relied on manual measurements, comparative anatomy, and heuristic landmark-based feature extraction. In this study, we retrospectively collected a dataset of 98 skull samples at Cedars Sinai Medical Center (CSMC), the first dataset of its kind in 3D medical imaging. We then evaluated the accuracy of multiple deep learning neural network architectures on sex classification with this dataset. Specifically, we evaluated methods representing three different 3D data modeling approaches: Resnet3D, PointNet++, and MeshNet. Despite the limited number of imaging samples, our testing results show that all three approaches achieve AUC scores above 0.9 after convergence. PointNet++ exhibits the highest accuracy, while MeshNet has the lowest. Our findings suggest that accuracy is not solely dependent on the sparsity of data representation but also on the architecture design, with MeshNet's lower accuracy likely due to the lack of a hierarchical structure for progressive data abstraction. Furthermore, we studied a problem related to sex determination, which is the analysis of the various morphological features that affect sex classification. We proposed and developed a new method based on morphological gradients to visualize features that influence model decision making. The method based on morphological gradients is an alternative to the standard saliency map, and the new method provides better visualization of feature importance. Our study is the first to develop and evaluate deep learning models for analyzing 3D facial skull images to identify imaging feature differences between individuals assigned male or female at birth. 
These findings may be useful for planning and evaluating craniofacial surgery, particularly gender-affirming procedures, such as facial feminization surgery.


Subject(s)
Deep Learning , Imaging, Three-Dimensional , Neural Networks, Computer , Skull , Humans , Skull/anatomy & histology , Skull/diagnostic imaging , Imaging, Three-Dimensional/methods , Female , Male , Retrospective Studies , Sex Characteristics , Adult , Image Processing, Computer-Assisted/methods

11.
Res Sq ; 2024 May 23.
Article in English | MEDLINE | ID: mdl-38826481

ABSTRACT

Background: Epistasis, the phenomenon where the effect of one gene (or variant) is masked or modified by one or more other genes, can significantly contribute to the observed phenotypic variance of complex traits. To date, it has been generally assumed that genetic interactions can be detected using a Cartesian, or multiplicative, interaction model commonly utilized in standard regression approaches. However, a recent study investigating epistasis in obesity-related traits in rats and mice has identified potential limitations of the Cartesian model, revealing that it only detects some of the genetic interactions occurring in these systems. By applying an alternative approach, the exclusive-or (XOR) model, the researchers detected a greater number of epistatic interactions and identified more biologically relevant ontological terms associated with the interacting loci. This suggests that the XOR model may provide a more comprehensive understanding of epistasis in these species and phenotypes. To further explore these findings and determine if different interaction models also make up distinct epistatic networks, we leverage network science to provide a more comprehensive view into the genetic interactions underlying BMI in this system. Results: Our comparative analysis of networks derived from Cartesian and XOR interaction models in rats (Rattus norvegicus) uncovers distinct topological characteristics for each model-derived network. Notably, we discover that networks based on the XOR model exhibit an enhanced sensitivity to epistatic interactions. This sensitivity enables the identification of network communities, revealing novel trait-related biological functions through enrichment analysis. Furthermore, we identify triangle network motifs in the XOR epistatic network, suggestive of higher-order epistasis, based on the topology of lower-order epistasis. 
Conclusions: These findings highlight the XOR model's ability to uncover meaningful biological associations as well as higher-order epistasis from lower-order epistatic networks. Additionally, our results demonstrate that network approaches not only enhance epistasis detection capabilities but also provide more nuanced understandings of genetic architectures underlying complex traits. The identification of community structures and motifs within these distinct networks, especially in XOR, points to the potential for network science to aid in the discovery of novel genetic pathways and regulatory networks. Such insights are important for advancing our understanding of phenotype-genotype relationships.
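Finding triangle motifs amounts to listing every trio of loci whose three pairwise interactions are all present in the epistatic network. A minimal sketch, assuming the significant pairwise interactions are already available as an edge list; the locus IDs are hypothetical, and the study's network construction, significance testing, and community detection are not reproduced.

```python
from collections import defaultdict
from itertools import combinations


def triangles(edges):
    """List every trio of loci whose three pairwise interactions are all edges."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    found = set()
    for u, neighbors in adjacency.items():
        for v, w in combinations(sorted(neighbors), 2):
            if w in adjacency[v]:  # v and w also interact, closing the triangle
                found.add(tuple(sorted((u, v, w))))
    return sorted(found)


# Hypothetical significant pairwise XOR interactions between loci.
edges = [("rs1", "rs2"), ("rs2", "rs3"), ("rs1", "rs3"), ("rs3", "rs4")]
print(triangles(edges))  # → [('rs1', 'rs2', 'rs3')]
```

Each returned trio is only a candidate for higher-order epistasis inferred from lower-order topology; it would still need a direct 3-way test.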

12.
AMIA Jt Summits Transl Sci Proc ; 2024: 211-220, 2024.
Article in English | MEDLINE | ID: mdl-38827072

ABSTRACT

Fairness is crucial in machine learning to prevent bias based on sensitive attributes in classifier predictions. However, the pursuit of strict fairness often sacrifices accuracy, particularly when significant prevalence disparities exist among groups, making classifiers less practical. For example, Alzheimer's disease (AD) is more prevalent in women than men, making equal treatment inequitable for females. Accounting for prevalence ratios among groups is essential for fair decision-making. In this paper, we introduce prior knowledge for fairness, which incorporates prevalence ratio information into the fairness constraint within the Empirical Risk Minimization (ERM) framework. We develop the Prior-knowledge-guided Fair ERM (PFERM) framework, aiming to minimize expected risk within a specified function class while adhering to a prior-knowledge-guided fairness constraint. This approach strikes a flexible balance between accuracy and fairness. Empirical results confirm its effectiveness in preserving fairness without compromising accuracy.

13.
Annu Rev Biomed Data Sci ; 7(1): 179-199, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38723657

ABSTRACT

The progress of precision medicine research hinges on the gathering and analysis of extensive and diverse clinical datasets. With the continued expansion of modalities, scales, and sources of clinical datasets, it becomes imperative to devise methods for aggregating information from these varied sources to achieve a comprehensive understanding of diseases. In this review, we describe two important approaches for the analysis of diverse clinical datasets, namely the centralized model and federated model. We compare and contrast the strengths and weaknesses inherent in each model and present recent progress in methodologies and their associated challenges. Finally, we present an outlook on the opportunities that both models hold for the future analysis of clinical data.


Subject(s)
Precision Medicine , Humans , Precision Medicine/methods , Datasets as Topic

14.
J Med Internet Res ; 26: e46777, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38635981

ABSTRACT

BACKGROUND: As global populations age and become susceptible to neurodegenerative illnesses, new therapies for Alzheimer disease (AD) are urgently needed. Existing data resources for drug discovery and repurposing fail to capture relationships central to the disease's etiology and response to drugs. OBJECTIVE: We designed the Alzheimer's Knowledge Base (AlzKB) to alleviate this need by providing a comprehensive knowledge representation of AD etiology and candidate therapeutics. METHODS: We designed the AlzKB as a large, heterogeneous graph knowledge base assembled using 22 diverse external data sources describing biological and pharmaceutical entities at different levels of organization (eg, chemicals, genes, anatomy, and diseases). AlzKB uses a Web Ontology Language 2 ontology to enforce semantic consistency and allow for ontological inference. We provide a public version of AlzKB and allow users to run and modify local versions of the knowledge base. RESULTS: AlzKB is freely available on the web and currently contains 118,902 entities with 1,309,527 relationships between those entities. To demonstrate its value, we used graph data science and machine learning to (1) propose new therapeutic targets based on similarities of AD to Parkinson disease and (2) repurpose existing drugs that may treat AD. For each use case, AlzKB recovers known therapeutic associations while proposing biologically plausible new ones. CONCLUSIONS: AlzKB is a new, publicly available knowledge resource that enables researchers to discover complex translational associations for AD drug discovery. Through 2 use cases, we show that it is a valuable tool for proposing novel therapeutic hypotheses based on public biomedical knowledge.


Subject(s)
Alzheimer Disease , Humans , Alzheimer Disease/drug therapy , Alzheimer Disease/genetics , Pattern Recognition, Automated , Knowledge Bases , Machine Learning , Knowledge

15.
Inflamm Bowel Dis ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38452040

ABSTRACT

Endoscopy, histology, and cross-sectional imaging serve as fundamental pillars in the detection, monitoring, and prognostication of inflammatory bowel disease (IBD). However, interpretation of these studies often relies on subjective human judgment, which can lead to delays, intra- and interobserver variability, and potential diagnostic discrepancies. With the rising incidence of IBD globally coupled with the exponential digitization of these data, there is a growing demand for innovative approaches to streamline diagnosis and elevate clinical decision-making. In this context, artificial intelligence (AI) technologies emerge as a timely solution to address the evolving challenges in IBD. Early studies using deep learning and radiomics approaches for endoscopy, histology, and imaging in IBD have demonstrated promising results for using AI to detect, diagnose, characterize, phenotype, and prognosticate IBD. Nonetheless, the available literature has inherent limitations and knowledge gaps that need to be addressed before AI can transition into a mainstream clinical tool for IBD. To better understand the potential value of integrating AI in IBD, we review the available literature to summarize our current understanding and identify gaps in knowledge to inform future investigations.

16.
Alzheimers Dement ; 20(4): 3074-3079, 2024 04.
Article in English | MEDLINE | ID: mdl-38324244

ABSTRACT

This perspective outlines the Artificial Intelligence and Technology Collaboratories (AITC) at Johns Hopkins University, University of Pennsylvania, and University of Massachusetts, highlighting their roles in developing AI-based technologies for older adult care, particularly targeting Alzheimer's disease (AD). These National Institute on Aging (NIA) centers foster collaboration among clinicians, gerontologists, ethicists, business professionals, and engineers to create AI solutions. Key activities include identifying technology needs, stakeholder engagement, training, mentoring, data integration, and navigating ethical challenges. The objective is to apply these innovations effectively in real-world scenarios, including in rural settings. In addition, the AITC focuses on developing best practices for AI application in the care of older adults, facilitating pilot studies, and addressing ethical concerns related to technology development for older adults with cognitive impairment, with the ultimate aim of improving the lives of older adults and their caregivers. HIGHLIGHTS: Addressing the complex needs of older adults with Alzheimer's disease (AD) requires a comprehensive approach, integrating medical and social support. Current gaps in training, techniques, tools, and expertise hinder uniform access across communities and health care settings. Artificial intelligence (AI) and digital technologies hold promise in transforming care for this demographic. Yet, transitioning these innovations from concept to marketable products presents significant challenges, often stalling promising advancements in the developmental phase. The Artificial Intelligence and Technology Collaboratories (AITC) program, funded by the National Institute on Aging (NIA), presents a viable model. 
These Collaboratories foster the development and implementation of AI methods and technologies through projects aimed at improving care for older Americans, particularly those with AD, and promote the sharing of best practices in AI and technology integration. Why Does This Matter? The National Institute on Aging (NIA) Artificial Intelligence and Technology Collaboratories (AITC) program's mission is to accelerate the adoption of artificial intelligence (AI) and new technologies for the betterment of older adults, especially those with dementia. By bridging scientific and technological expertise, fostering clinical and industry partnerships, and enhancing the sharing of best practices, this program can significantly improve the health and quality of life for older adults with Alzheimer's disease (AD).


Subject(s)
Alzheimer Disease , Isothiocyanates , United States , Humans , Aged , Alzheimer Disease/therapy , Artificial Intelligence , Geroscience , Quality of Life , Technology

18.
BioData Min ; 17(1): 7, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38419006

ABSTRACT

PURPOSE: Epistasis, the interaction between two or more genes, is integral to the study of genetics and is present throughout nature. Yet, it is seldom fully explored as most approaches primarily focus on single-locus effects, partly because analyzing all pairwise and higher-order interactions requires significant computational resources. Furthermore, existing methods for epistasis detection only consider a Cartesian (multiplicative) model for interaction terms. This is likely limiting as epistatic interactions can evolve to produce varied relationships between genetic loci, some complex and not linearly separable. METHODS: We present new algorithms for the interaction coefficients for standard regression models for epistasis that permit many varied models for the interaction terms for loci and efficient memory usage. The algorithms are given for two-way and three-way epistasis and may be generalized to higher order epistasis. Statistical tests for the interaction coefficients are also provided. We also present an efficient matrix based algorithm for permutation testing for two-way epistasis. We offer a proof and experimental evidence that methods that look for epistasis only at loci that have main effects may not be justified. Given the computational efficiency of the algorithm, we applied the method to a rat data set and mouse data set, with at least 10,000 loci and 1,000 samples each, using the standard Cartesian model and the XOR model to explore body mass index. RESULTS: This study reveals that although many of the loci found to exhibit significant statistical epistasis overlap between models in rats, the pairs are mostly distinct. Further, the XOR model found greater evidence for statistical epistasis in many more pairs of loci in both data sets with almost all significant epistasis in mice identified using XOR. In the rat data set, loci involved in epistasis under the XOR model are enriched for biologically relevant pathways. 
CONCLUSION: Our results in both species show that many biologically relevant epistatic relationships would have been undetected if only one interaction model was applied, providing evidence that varied interaction models should be implemented to explore epistatic interactions that occur in living systems.
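In a regression model for two-way epistasis, y = b0 + b1*g1 + b2*g2 + b3*I(g1, g2), the interaction models compared here differ in how the interaction column I(g1, g2) is built. The Cartesian term is the product of the additive genotype codes (0, 1, 2). The abstract does not define the XOR coding for three-level genotypes, so `xor_term` below uses an assumed, illustrative definition (exactly one of the two loci carries at least one minor allele); all function names are hypothetical.

```python
from itertools import product


def cartesian_term(g1, g2):
    """Cartesian (multiplicative) interaction of additive codes 0/1/2."""
    return g1 * g2


def xor_term(g1, g2):
    """Assumed XOR interaction: 1 if exactly one locus carries a minor allele."""
    return int((g1 > 0) != (g2 > 0))


def design_row(g1, g2, interaction):
    """Regression design row [intercept, g1, g2, interaction(g1, g2)]."""
    return [1, g1, g2, interaction(g1, g2)]


# Interaction column under each model for all nine two-locus genotype pairs.
for g1, g2 in product(range(3), repeat=2):
    print(g1, g2, cartesian_term(g1, g2), xor_term(g1, g2))
```

The Cartesian column grows monotonically with both genotypes, while the assumed XOR column is highest when the loci disagree, which is why the two models can flag different locus pairs. Estimating and testing b3 (including the efficient algorithms and permutation tests described in the abstract) is outside this sketch.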

19.
Nat Cancer ; 5(2): 299-314, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38253803

ABSTRACT

Contemporary analyses focused on a limited number of clinical and molecular biomarkers have been unable to accurately predict clinical outcomes in pancreatic ductal adenocarcinoma. Here we describe a precision medicine platform known as the Molecular Twin consisting of advanced machine-learning models and use it to analyze a dataset of 6,363 clinical and multi-omic molecular features from patients with resected pancreatic ductal adenocarcinoma to accurately predict disease survival (DS). We show that a full multi-omic model predicts DS with the highest accuracy and that plasma protein is the top single-omic predictor of DS. A parsimonious model learning only 589 multi-omic features demonstrated similar predictive performance to the full multi-omic model. Our platform enables discovery of parsimonious biomarker panels and performance assessment of outcome prediction models learning from resource-intensive panels. This approach has considerable potential to impact clinical care and democratize precision cancer medicine worldwide.


Subject(s)
Adenocarcinoma , Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Adenocarcinoma/genetics , Adenocarcinoma/surgery , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/surgery , Multiomics , Artificial Intelligence , Carcinoma, Pancreatic Ductal/genetics , Carcinoma, Pancreatic Ductal/surgery , Intelligence

20.
medRxiv ; 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38260403

ABSTRACT

Genome-wide association studies (GWAS) have been instrumental in identifying genetic associations for various diseases and traits. However, uncovering genetic underpinnings among traits beyond univariate phenotype associations remains a challenge. Multi-phenotype associations (MPA), or genetic pleiotropy, offer important insights into shared genes and pathways among traits, enhancing our understanding of genetic architectures of complex diseases. GWAS of biobank-linked electronic health record (EHR) data are increasingly being utilized to identify MPA among various traits and diseases. However, methodologies that can efficiently take advantage of distributed EHR to detect MPA are still lacking. Here, we introduce mixWAS, a novel algorithm that efficiently and losslessly integrates multiple EHRs via summary statistics, allowing the detection of MPA among mixed phenotypes while accounting for heterogeneities across EHRs. Simulations demonstrate that mixWAS outperforms the widely used MPA detection method, Phenome-wide association study (PheWAS), across diverse scenarios. Applying mixWAS to data from seven EHRs in the US, we identified 4,534 MPA among blood lipids, BMI, and circulatory diseases. Validation in independent EHR data from the UK confirmed 97.7% of the associations. mixWAS fundamentally improves the detection of MPA and is available as free, open-source software.
