RESUMEN
Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant compared to existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease, additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.
Asunto(s)
Epistasis Genética , Polimorfismo de Nucleótido Simple , Humanos , Teoría Cuántica , Herencia Multifactorial/genética , Enfermedad/genética , Biología Computacional/métodos , Algoritmos , Predisposición Genética a la EnfermedadRESUMEN
BACKGROUND: Recent advances in biotechnology enable the acquisition of high-dimensional data on individuals, posing challenges for prediction models which traditionally use covariates such as clinical patient characteristics. Alternative forms of covariate representations for the features derived from these modern data modalities should be considered that can utilize their intrinsic interconnection. The connectivity information between these features can be represented as an individual-specific network defined by a set of nodes and edges, the strength of which can vary from individual to individual. Global or local graph-theoretical features describing the network may constitute potential prognostic biomarkers instead of or in addition to traditional covariates and may replace the often unsuccessful search for individual biomarkers in a high-dimensional predictor space. METHODS: We conducted a scoping review to identify, collate and critically appraise the state-of-art in the use of individual-specific networks for prediction modelling in medicine and applied health research, published during 2000-2020 in the electronic databases PubMed, Scopus and Embase. RESULTS: Our scoping review revealed the main application areas namely neurology and pathopsychology, followed by cancer research, cardiology and pathology (N = 148). Network construction was mainly based on Pearson correlation coefficients of repeated measurements, but also alternative approaches (e.g. partial correlation, visibility graphs) were found. For covariates measured only once per individual, network construction was mostly based on quantifying an individual's contribution to the overall group-level structure. Despite the multitude of identified methodological approaches for individual-specific network inference, the number of studies that were intended to enable the prediction of clinical outcomes for future individuals was quite limited, and most of the models served as proof of concept that network characteristics can in principle be useful for prediction. CONCLUSION: The current body of research clearly demonstrates the value of individual-specific network analysis for prediction modelling, but it has not yet been considered as a general tool outside the current areas of application. More methodological research is still needed on well-founded strategies for network inference, especially on adequate network sparsification and outcome-guided graph-theoretical feature extraction and selection, and on how networks can be exploited efficiently for prediction modelling.
RESUMEN
Individual Specific Networks (ISNs) are a tool used in computational biology to infer Individual Specific relationships between biological entities from omics data. ISNs provide insights into how the interactions among these entities affect their respective functions. To address the scarcity of solutions for efficiently computing ISNs on large biological datasets, we present ISN-tractor, a data-agnostic, highly optimized Python library to build and analyse ISNs. ISN-tractor demonstrates superior scalability and efficiency in generating Individual Specific Networks (ISNs) when compared to existing methods such as LionessR, both in terms of time and memory usage, allowing ISNs to be used on large datasets. We show how ISN-tractor can be applied to real-life datasets, including The Cancer Genome Atlas (TCGA) and HapMap, showcasing its versatility. ISN-tractor can be used to build ISNs from various -omics data types, including transcriptomics, proteomics, and genotype arrays, and can detect distinct patterns of gene interactions within and across cancer types. We also show how Filtration Curves provided valuable insights into ISN characteristics, revealing topological distinctions among individuals with different clinical outcomes. Additionally, ISN-tractor can effectively cluster populations based on genetic relationships, as demonstrated with Principal Component Analysis on HapMap data.
Asunto(s)
Biología Computacional , Humanos , Biología Computacional/métodos , Redes Reguladoras de Genes , Neoplasias/genética , Programas Informáticos , Proteómica/métodos , AlgoritmosRESUMEN
Genetic ancestry plays a major role in pharmacogenomics, and a deeper understanding of the genetic diversity among individuals holds immerse promise for reshaping personalized medicine. In this pivotal study, we have conducted a large-scale genomic analysis of 1,136 pharmacogenomic variants employing machine learning algorithms on 3,714 individuals from publicly available datasets to assess the risk proximity of experiencing drug-related adverse events. Our findings indicate that Admixed Americans and Europeans have demonstrated a higher risk of experiencing drug toxicity, whereas individuals with East Asian ancestry and, to a lesser extent, Oceanians displayed a lower risk proximity. Polygenic risk scores for drug-gene interactions did not necessarily follow similar assumptions, reflecting distinct genetic patterns and population-specific differences that vary depending on the drug class. Overall, our results provide evidence that genetic ancestry is a pivotal factor in population pharmacogenomics and should be further exploited to strengthen even more personalized drug therapy.
RESUMEN
Molecular profiling technologies, such as RNA sequencing, offer new opportunities to better discover and understand the molecular networks involved in complex biological processes. Clinically important variations of diseases, or responses to treatment, are often reflected, or even caused, by the dysregulation of molecular interaction networks specific to particular network regions. In this work, we propose the R package PLEX.I, that allows quantifying and testing variation in the direct neighborhood of a given node between networks corresponding to different conditions or states. We illustrate PLEX.I in two applications in which we discover variation that is associated with different responses to tamoxifen treatment and to sex-specific responses to bacterial stimuli. In the first case, PLEX.I analysis identifies two known pathways i) that have already been implicated in the same context as the tamoxifen mechanism of action, and ii) that would have not have been identified using classical differential gene expression analysis.
RESUMEN
Longitudinal analysis of multivariate individual-specific microbiome profiles over time or across conditions remains dauntin. Most statistical tools and methods that are available to study microbiomes are based on cross-sectional data. Over the past few years, several attempts have been made to model the dynamics of bacterial species over time or across conditions. However, the field needs novel views on handling microbial interactions in temporal analyses. This study proposes a novel data analysis framework, MNDA, that combines representation learning and individual-specific microbial co-occurrence networks to uncover taxon neighborhood dynamics. As a use case, we consider a cohort of newborns with microbiomes available at 6 and 9 months after birth, and extraneous data available on the mode of delivery and diet changes between the considered time points. Our results show that prediction models for these extraneous outcomes based on an MNDA measure of local neighborhood dynamics for each taxon outperform traditional prediction models solely based on individual-specific microbial abundances. Furthermore, our results show that unsupervised similarity analysis of newborns in the study, again using the notion of a taxon's dynamic neighborhood derived from time-matched individual-specific microbial networks, can reveal different subpopulations of individuals, compared to standard microbiome-based clustering, with potential relevance to clinical practice. This study highlights the complementarity of microbial interactions and abundances in downstream analyses and opens new avenues to personalized prediction or stratified medicine with temporal microbiome data.
RESUMEN
Individual-specific networks, defined as networks of nodes and connecting edges that are specific to an individual, are promising tools for precision medicine. When such networks are biological, interpretation of functional modules at an individual level becomes possible. An under-investigated problem is relevance or "significance" assessment of each individual-specific network. This paper proposes novel edge and module significance assessment procedures for weighted and unweighted individual-specific networks. Specifically, we propose a modular Cook's distance using a method that involves iterative modeling of one edge versus all the others within a module. Two procedures assessing changes between using all individuals and using all individuals but leaving one individual out (LOO) are proposed as well (LOO-ISN, MultiLOO-ISN), relying on empirically derived edges. We compare our proposals to competitors, including adaptions of OPTICS, kNN, and Spoutlier methods, by an extensive simulation study, templated on real-life scenarios for gene co-expression and microbial interaction networks. Results show the advantages of performing modular versus edge-wise significance assessments for individual-specific networks. Furthermore, modular Cook's distance is among the top performers across all considered simulation settings. Finally, the identification of outlying individuals regarding their individual-specific networks, is meaningful for precision medicine purposes, as confirmed by network analysis of microbiome abundance profiles.
Asunto(s)
Algoritmos , Redes Reguladoras de Genes , Humanos , Simulación por ComputadorRESUMEN
Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these classes. NetMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.
RESUMEN
Introduction: Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Methods: Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. Results: We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. The clustering derived from netMUG achieved an adjusted Rand index of 1 with respect to the synthesized true labels. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these subgroups. Discussion: netMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.
RESUMEN
Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs)1-3. Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant compared to existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease, additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL is the first application that demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.