ABSTRACT
Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWASs). Standard GWASs are well-powered to interrogate additive models; however, new approaches are required for investigating other modes of inheritance such as dominance and epistasis. Epistasis, or non-additive interaction between genes, exists across the genome but often goes undetected because of a lack of statistical power. Furthermore, the adoption of LD pruning as customary in standard GWASs excludes detection of sites that are in LD but might underlie the genetic architecture of complex traits. We hypothesize that uncovering long-range interactions between loci with strong LD due to epistatic selection can elucidate genetic mechanisms underlying common diseases. To investigate this hypothesis, we tested for associations between 23 common diseases and 5,625,845 epistatic SNP-SNP pairs (determined by Ohta's D statistics) in long-range LD (>0.25 cM). Across five disease phenotypes, we identified one significant and four near-significant associations that replicated in two large genotype-phenotype datasets (UK Biobank and eMERGE). The genes that were most likely involved in the replicated associations were (1) members of highly conserved gene families with complex roles in multiple pathways, (2) essential genes, and/or (3) genes that were associated in the literature with complex traits that display variable expressivity. These results support the highly pleiotropic and conserved nature of variants in long-range LD under epistatic selection. Our work supports the hypothesis that epistatic interactions regulate diverse clinical mechanisms and might especially be driving factors in conditions with a wide range of phenotypic outcomes.
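As a reminder of what a pairwise LD measure quantifies, here is a minimal sketch of the classic coefficients D and r² computed from phased two-locus haplotypes. It is not Ohta's D statistics, which the study used to select SNP-SNP pairs; the function name `pairwise_ld` and the toy haplotypes are illustrative assumptions.

```python
from collections import Counter


def pairwise_ld(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of (a, b) tuples, one per phased haplotype,
    with each allele coded 0 or 1 (hypothetical toy encoding).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at locus B
    p_ab = counts[(1, 1)] / n                # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                     # classic LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Toy data: 10 haplotypes, mostly 1-1 or 0-0, with two recombinant types.
haps = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0)] + [(0, 1)]
d, r2 = pairwise_ld(haps)
print(f"D = {d:.3f}, r2 = {r2:.3f}")
```

A value of r² near 1 means the two loci are almost always inherited together; the study's interest is in pairs that keep such correlation over long genetic distances.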
Subjects
Epistasis, Genetic , Genome-Wide Association Study , Linkage Disequilibrium/genetics , Genotype , Biological Specimen Banks , United Kingdom , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Unsupervised learning, particularly clustering, plays a pivotal role in disease subtyping and patient stratification, especially with the abundance of large-scale multi-omics data. Deep learning models, such as variational autoencoders (VAEs), can enhance clustering algorithms by leveraging inter-individual heterogeneity. However, the impact of confounders (external factors unrelated to the condition, e.g., batch effect or age) on clustering is often overlooked, introducing bias and spurious biological conclusions. In this work, we introduce four novel VAE-based deconfounding frameworks tailored for clustering multi-omics data. These frameworks effectively mitigate confounding effects while preserving genuine biological patterns. The deconfounding strategies employed include (i) removal of latent features correlated with confounders, (ii) a conditional VAE, (iii) adversarial training, and (iv) adding a regularization term to the loss function. Using real-life multi-omics data from The Cancer Genome Atlas, we simulated various confounding effects (linear, nonlinear, categorical, mixed) and assessed model performance across 50 repetitions based on reconstruction error, clustering stability, and deconfounding efficacy. Our results demonstrate that our novel models, particularly the conditional multi-omics VAE (cXVAE), successfully handle simulated confounding effects and recover biologically driven clustering structures. cXVAE accurately identifies patient labels and unveils meaningful pathological associations among cancer types, validating deconfounded representations. Furthermore, our study suggests that some of the proposed strategies, such as adversarial training, prove insufficient in confounder removal. In summary, our study contributes by proposing innovative frameworks for simultaneous multi-omics data integration, dimensionality reduction, and deconfounding in clustering. 
Benchmarking on open-access data offers guidance to end-users, facilitating meaningful patient stratification for optimized precision medicine.
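Strategy (i), removal of latent features correlated with confounders, can be illustrated with a minimal sketch. It assumes the latent embedding has already been produced by some trained VAE and is given as a plain list of rows. The Pearson criterion, the threshold of 0.3, and the names `pearson` and `drop_confounded_dims` are assumptions made for illustration, not details taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_confounded_dims(latent, confounder, threshold=0.3):
    """Keep only latent dimensions whose |r| with the confounder is below threshold.

    latent: list of samples, each a list of latent feature values.
    Returns (filtered_latent, kept_dimension_indices).
    """
    n_dims = len(latent[0])
    keep = [
        j for j in range(n_dims)
        if abs(pearson([row[j] for row in latent], confounder)) < threshold
    ]
    return [[row[j] for j in keep] for row in latent], keep


# Toy example: dimension 0 is a linear function of the confounder, dimension 1 is not.
age = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pattern = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
latent = [[2 * a + 1, p] for a, p in zip(age, pattern)]
filtered, kept = drop_confounded_dims(latent, age)
print(kept)
```

Clustering would then run on `filtered` rather than on the full embedding. A linear correlation filter of this kind cannot detect nonlinear confounding, which is one reason the abstract evaluates several alternative strategies.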
Subjects
Algorithms , Humans , Cluster Analysis , Neoplasms/genetics , Neoplasms/classification , Deep Learning , Genomics/methods , Computational Biology/methods , Unsupervised Machine Learning , Multiomics
ABSTRACT
Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant compared to existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease; additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.
Subjects
Epistasis, Genetic , Polymorphism, Single Nucleotide , Humans , Quantum Theory , Multifactorial Inheritance/genetics , Disease/genetics , Computational Biology/methods , Algorithms , Genetic Predisposition to Disease
ABSTRACT
Many problems in the life sciences can be reduced to a comparison of graphs. Although a multitude of such techniques exist, they often assume prior knowledge about the partitioning or the number of clusters and fail to provide statistical significance of observed between-network heterogeneity. Addressing these issues, we developed an unsupervised workflow to identify groups of graphs from reliable network-based statistics. In particular, we first compute the similarity between networks via appropriate distance measures between graphs and use them in an unsupervised hierarchical algorithm to identify classes of similar networks. Then, to determine the optimal number of clusters, we recursively test for distances between two groups of networks. The test itself finds its inspiration in distance-wise ANOVA algorithms. Finally, we assess significance via the permutation of between-object distance matrices. Notably, the approach, which we will call netANOVA, is flexible since users can choose multiple options to adapt to specific contexts and network types. We demonstrate the benefits and pitfalls of our approach via extensive simulations and an application to two real-life datasets. NetANOVA achieved high performance in many simulation scenarios while controlling type I error. On non-synthetic data, comparison against state-of-the-art methods showed that netANOVA is often among the top performers. There are many application fields, including precision medicine, for which identifying disease subtypes via individual-level biological networks improves prevention programs, diagnosis and disease monitoring.
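The testing step can be sketched as a distance-based ANOVA with a permutation p-value. The sketch below uses a PERMANOVA-style pseudo-F statistic as a stand-in; the exact statistic and permutation scheme used by netANOVA are not given in the abstract, so the names `pseudo_f` and `permutation_pvalue`, and the toy one-dimensional "networks", are assumptions.

```python
import random


def pseudo_f(dist, labels):
    """PERMANOVA-style pseudo-F computed from a symmetric distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(
            dist[i][j] ** 2 for pos, i in enumerate(idx) for j in idx[pos + 1:]
        ) / len(idx)
    ss_between = ss_total - ss_within
    a = len(groups)
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permutation_pvalue(dist, labels, n_perm=999, seed=0):
    """Shuffle group labels to obtain a null distribution for the pseudo-F."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy stand-in: each "network" is summarized by one number, so the
# between-network distance is simply an absolute difference.
positions = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
dist = [[abs(p - q) for q in positions] for p in positions]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f_obs, p_value = permutation_pvalue(dist, labels)
print(f"pseudo-F = {f_obs:.1f}, p = {p_value:.3f}")
```

In a recursive workflow of the kind described, a split of the hierarchical tree would be kept only if this p-value falls below the chosen significance level, and each resulting group would then be tested for further splits.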
Subjects
Algorithms , Cluster Analysis , Computer Simulation , Workflow , Analysis of Variance
ABSTRACT
Debates about the prospective clinical use of polygenic risk scores (PRS) have grown considerably in recent years. The potential benefits of PRS to improve patient care at individual and population levels have been extensively underlined. Nonetheless, the use of PRS in clinical contexts presents a number of unresolved ethical challenges and consequent normative gaps that hinder their optimal implementation. Here, we conducted a systematic review of reasons of the normative literature discussing ethical issues and moral arguments related to the use of PRS for the prevention and treatment of common complex diseases. In total, we have included and analyzed 34 records, spanning from 2013 to 2023. The findings have been organized in three major themes: in the first theme, we consider the potential harms of PRS to individuals and their kin. In the theme "Threats to health equity," we consider ethical concerns of social relevance, with a focus on justice issues. Finally, the theme "Towards best practices" collects a series of research priorities and provisional recommendations to be considered for an optimal clinical translation of PRS. We conclude that the use of PRS in clinical care reinvigorates old debates in matters of health justice; however, open questions regarding best practices in clinical counseling suggest that the ethical considerations applicable in monogenic settings will not be sufficient to face the emerging challenges of PRS.
Subjects
Genetic Predisposition to Disease , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Morals , Genetic Testing/ethics , Risk Assessment , Genetic Counseling/ethics , Risk Factors , Genetic Risk Score
ABSTRACT
Assumptions are made about the genetic model of single nucleotide polymorphisms (SNPs) when choosing a traditional genetic encoding: additive, dominant, and recessive. Furthermore, SNPs across the genome are unlikely to demonstrate identical genetic models. However, running SNP-SNP interaction analyses with every combination of encodings raises the multiple testing burden. Here, we present a novel and flexible encoding for genetic interactions, the elastic data-driven genetic encoding (EDGE), in which SNPs are assigned a heterozygous value based on the genetic model they demonstrate in a dataset prior to interaction testing. We assessed the power of EDGE to detect genetic interactions using 29 combinations of simulated genetic models and found it outperformed the traditional encoding methods across 10%, 30%, and 50% minor allele frequencies (MAFs). Further, EDGE maintained a low false-positive rate, while additive and dominant encodings demonstrated inflation. We evaluated EDGE and the traditional encodings with genetic data from the Electronic Medical Records and Genomics (eMERGE) Network for five phenotypes: age-related macular degeneration (AMD), age-related cataract, glaucoma, type 2 diabetes (T2D), and resistant hypertension. A multi-encoding genome-wide association study (GWAS) for each phenotype was performed using the traditional encodings, and the top results of the multi-encoding GWAS were considered for SNP-SNP interaction using the traditional encodings and EDGE. EDGE identified a novel SNP-SNP interaction for age-related cataract that no other method identified: rs7787286 (MAF: 0.041; intergenic region of chromosome 7)-rs4695885 (MAF: 0.34; intergenic region of chromosome 4) with a Bonferroni LRT p of 0.018. A SNP-SNP interaction was found in data from the UK Biobank within 25 kb of these SNPs using the recessive encoding: rs60374751 (MAF: 0.030) and rs6843594 (MAF: 0.34) (Bonferroni LRT p: 0.026). 
We recommend using EDGE to flexibly detect interactions between SNPs exhibiting diverse action.
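To make the idea of a data-driven heterozygous value concrete, here is a minimal sketch. It assumes, purely for illustration, a quantitative trait with no covariates, so that a regression with separate heterozygous and homozygous-alternate indicators reduces to differences in genotype-group means. Defining the heterozygous value as the ratio of those two effects is an assumption not stated in the abstract, and the names `edge_alpha` and `edge_encode` are hypothetical.

```python
def edge_alpha(genotypes, phenotype):
    """Estimate the heterozygous encoding value as beta_het / beta_hom.

    genotypes: minor-allele counts (0, 1, 2); phenotype: quantitative trait.
    Without covariates, the two regression coefficients equal the
    differences between genotype-group means and the reference-group mean.
    """
    means = {}
    for g in (0, 1, 2):
        values = [y for x, y in zip(genotypes, phenotype) if x == g]
        if not values:
            raise ValueError(f"no samples with genotype {g}")
        means[g] = sum(values) / len(values)
    beta_het = means[1] - means[0]
    beta_hom = means[2] - means[0]
    if beta_hom == 0:
        raise ValueError("homozygous effect is zero; alpha is undefined")
    return beta_het / beta_hom


def edge_encode(genotypes, alpha):
    """Recode genotypes as 0, alpha, 1 for use in later interaction tests."""
    mapping = {0: 0.0, 1: alpha, 2: 1.0}
    return [mapping[g] for g in genotypes]


genotypes = [0, 0, 1, 1, 2, 2]
additive = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]   # heterozygote halfway between homozygotes
dominant = [1.0, 1.0, 3.0, 3.0, 3.0, 3.0]   # heterozygote matches alternate homozygote
recessive = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0]  # heterozygote matches reference homozygote
for name, trait in [("additive", additive), ("dominant", dominant), ("recessive", recessive)]:
    print(name, edge_alpha(genotypes, trait))
```

Under this sketch, the three traditional encodings are special cases (0.5, 1 and 0), while intermediate values would represent sub- or super-additive action.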
Subjects
Models, Genetic , Cataract/genetics , Datasets as Topic , Diabetes Mellitus, Type 2/genetics , Gene Frequency , Genome-Wide Association Study , Glaucoma/genetics , Humans , Hypertension/genetics , Macular Degeneration/genetics , Phenotype , Polymorphism, Single Nucleotide
ABSTRACT
Genes and gene products do not function in isolation but as components of complex networks of macromolecules through physical or biochemical interactions. Dependencies of gene mutations on genetic background (i.e., epistasis) are believed to play a role in understanding molecular underpinnings of complex diseases such as inflammatory bowel disease (IBD). However, the process of identifying such interactions is complex due to, for instance, the curse of high dimensionality, dependencies in the data and non-linearity. Here, we propose a novel approach for robust and computationally efficient epistasis detection. We do so by first reducing dimensionality per gene via diffusion kernel principal components (kpc). Subsequently, kpc gene summaries are used for downstream analysis, including the construction of a gene-based epistasis network. We show that our approach is not only able to recover known IBD-associated genes but also additional genes of interest linked to this difficult gastrointestinal disease.
Subjects
Epistasis, Genetic , Genome-Wide Association Study , Diffusion , Gene Regulatory Networks , Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: Recent advances in biotechnology enable the acquisition of high-dimensional data on individuals, posing challenges for prediction models which traditionally use covariates such as clinical patient characteristics. Alternative forms of covariate representations for the features derived from these modern data modalities should be considered that can utilize their intrinsic interconnection. The connectivity information between these features can be represented as an individual-specific network defined by a set of nodes and edges, the strength of which can vary from individual to individual. Global or local graph-theoretical features describing the network may constitute potential prognostic biomarkers instead of or in addition to traditional covariates and may replace the often unsuccessful search for individual biomarkers in a high-dimensional predictor space. METHODS: We conducted a scoping review to identify, collate and critically appraise the state of the art in the use of individual-specific networks for prediction modelling in medicine and applied health research, published during 2000-2020 in the electronic databases PubMed, Scopus and Embase. RESULTS: Our scoping review revealed the main application areas, namely neurology and pathopsychology, followed by cancer research, cardiology and pathology (N = 148). Network construction was mainly based on Pearson correlation coefficients of repeated measurements, but alternative approaches (e.g. partial correlation, visibility graphs) were also found. For covariates measured only once per individual, network construction was mostly based on quantifying an individual's contribution to the overall group-level structure. 
Despite the multitude of identified methodological approaches for individual-specific network inference, the number of studies that were intended to enable the prediction of clinical outcomes for future individuals was quite limited, and most of the models served as proof of concept that network characteristics can in principle be useful for prediction. CONCLUSION: The current body of research clearly demonstrates the value of individual-specific network analysis for prediction modelling, but it has not yet been considered as a general tool outside the current areas of application. More methodological research is still needed on well-founded strategies for network inference, especially on adequate network sparsification and outcome-guided graph-theoretical feature extraction and selection, and on how networks can be exploited efficiently for prediction modelling.
ABSTRACT
A recurring issue in neuroepigenomic studies, especially in the context of neurodegenerative disease, is the use of (heterogeneous) bulk tissue, which generates noise during epigenetic profiling. A workable solution to this issue is to quantify epigenetic patterns in individually isolated neuronal cells using laser capture microdissection (LCM). For this purpose, we established a novel approach for targeted DNA methylation profiling of individual genes that relies on a combination of LCM and limiting dilution bisulfite pyrosequencing (LDBSP). Using this approach, we determined cytosine-phosphate-guanine (CpG) methylation rates of single alleles derived from 50 neurons that were isolated from unfixed post-mortem brain tissue. In the present manuscript, we describe the general workflow and, as a showcase, demonstrate how targeted methylation analysis of various genes, in this case, RHBDF2, OXT, TNXB, DNAJB13, PGLYRP1, C3, and LMX1B, can be performed simultaneously. By doing so, we describe an adapted data analysis pipeline for LDBSP, allowing one to include and correct CpG methylation rates derived from multi-allele reactions. In addition, we show that the efficiency of LDBSP on DNA derived from LCM neurons is similar to the efficiency obtained in previously published studies using this technique on other cell types. Overall, the method described here provides the user with a more accurate estimation of the DNA methylation status of each target gene in the analyzed cell pools, thereby adding further validity to this approach.
Subjects
Neurodegenerative Diseases , Humans , Sequence Analysis, DNA/methods , DNA Methylation , Brain , High-Throughput Nucleotide Sequencing , Lasers , Molecular Chaperones , Apoptosis Regulatory Proteins
ABSTRACT
Principal components (PCs) are widely used in statistics and refer to a relatively small number of uncorrelated variables derived from an initial pool of variables, while explaining as much of the total variance as possible. Principal component analysis (PCA) is also a popular technique in statistical genetics. To achieve optimal results, a thorough understanding of the different implementations of PCA and their impact on study results, compared to alternative approaches, is required. In this review, we focus on the possibilities, limitations and role of PCs in ancestry prediction, genome-wide association studies, rare variants analyses, imputation strategies, meta-analysis and epistasis detection. We also describe several variations of classic PCA that deserve increased attention in statistical genetics applications.
Subjects
Models, Statistical , Principal Component Analysis , Animals , Humans
ABSTRACT
Genome-wide association studies (GWAS) detect common genetic variants associated with complex disorders. With their comprehensive coverage of common single nucleotide polymorphisms and comparatively low cost, GWAS are an attractive tool in clinical and commercial genetic testing. This review introduces the pipeline of statistical methods used in GWAS analysis, from data quality control, association tests, population structure control, interaction effects and results visualization, through to post-GWAS validation methods and related issues.
Subjects
Genetic Testing/statistics & numerical data , Genome-Wide Association Study/statistics & numerical data , Polymorphism, Single Nucleotide/genetics , Genotype , Humans , Phenotype
ABSTRACT
Due to their long genetic evolutionary history, Africans exhibit more genetic variation than any other population in the world. Their genetic diversity further lends itself to subdivisions of Africans into groups of individuals with a genetic similarity of varying degrees of granularity. It remains challenging to detect fine-scale structure in a computationally efficient and meaningful way. In this paper, we present a proof-of-concept of a novel fine-scale population structure detection tool with Western African samples. These samples consist of 1396 individuals from 25 ethnic groups (two groups are African American descendants). The strategy is based on a recently developed tool called IPCAPS. IPCAPS, or Iterative Pruning to CApture Population Structure, is a genetic divisive clustering strategy that enhances iterative pruning PCA, is robust to outliers and does not require a priori computation of haplotypes. Our strategy identified 12 groups in total, of which 6 were revealed as fine-scale structure detected in the samples from Cameroon, Gambia, Mali, Southwest USA, and Barbados. Our findings help to explain evolutionary processes in the analyzed West African samples and raise awareness of fine-scale structure resolution when conducting genome-wide association and interaction studies.
Subjects
Black People/genetics , Ethnicity/genetics , Genetic Variation , Genetics, Population , Genome-Wide Association Study , Haplotypes , Software , Africa, Western/ethnology , Humans
ABSTRACT
BACKGROUND AND GOALS: Active inflammatory bowel diseases (IBD) represent an independent risk factor for venous thromboembolism. The authors investigated the hemostatic profile of IBD patients before and after induction treatment with infliximab, vedolizumab, and methylprednisolone. STUDY: This prospective study included 62 patients with active IBD starting infliximab, vedolizumab, and/or methylprednisolone, and 22 healthy controls (HC). Plasma was collected before (w0) and after induction therapy (w14). Using a clot lysis assay, amplitude (marker for clot intensity), time to peak (Tmax; marker for clot formation rate), area under the curve (AUC; global marker for coagulation/fibrinolysis), and 50% clot lysis time (50%CLT; marker for fibrinolytic capacity) were determined. Plasminogen activator inhibitor-1 (PAI-1) and fibronectin were measured by ELISA. Clinical remission was evaluated at w14. RESULTS: At baseline, AUC, amplitude, and 50%CLT were significantly higher in IBD patients as compared with HC. In 34 remitters, AUC [165 (103-229)% vs. 97 (78-147)%, P=0.001], amplitude [119 (99-163)% vs. 95 (82-117)%, P=0.002], and 50%CLT [122 (94-146)% vs. 100 (87-129)%, P=0.001] decreased significantly and even normalized to the HC level. Vedolizumab trough concentration correlated inversely to fibronectin concentration (r, -0.732; P=0.002). The increase in Tmax for infliximab-treated remitters was significantly different from the decrease in Tmax for vedolizumab-treated remitters (P=0.028). The 50%CLT increased (P=0.038) when remitters were concomitantly treated with methylprednisolone. CONCLUSIONS: Control of inflammation using infliximab most strongly reduced those parameters that are associated with a higher risk of venous thromboembolism.
Subjects
Inflammatory Bowel Diseases , Thrombosis , Fibrinolysis , Humans , Inflammatory Bowel Diseases/drug therapy , Infliximab/adverse effects , Prospective Studies
ABSTRACT
Rationale: Analysis of exhaled breath for asthma phenotyping using endogenously generated volatile organic compounds (VOCs) offers the possibility of noninvasive diagnosis and therapeutic monitoring. Induced sputum is indeed not widely available and markers of neutrophilic asthma are still lacking. Objectives: To determine whether analysis of exhaled breath using endogenously generated VOCs can be a surrogate marker for recognition of sputum inflammatory phenotypes. Methods: We conducted a prospective study on 521 patients with asthma recruited from the University Asthma Clinic of Liege. Patients underwent VOC measurement, fraction of exhaled nitric oxide (FeNO), spirometry, sputum induction, and gave a blood sample. Subjects with asthma were classified into three inflammatory phenotypes according to their sputum granulocytic cell count. Measurements and Main Results: In the discovery study, seven potential biomarkers were highlighted by gas chromatography-mass spectrometry in a training cohort of 276 patients with asthma. In the replication study (n = 245), we confirmed four VOCs of interest to discriminate among asthma inflammatory phenotypes using comprehensive two-dimensional gas chromatography coupled to high-resolution time-of-flight mass spectrometry. Hexane and 2-hexanone were identified as compounds with the highest classification performance in eosinophilic asthma with accuracy comparable to that of blood eosinophils and FeNO. Moreover, the combination of FeNO, blood eosinophils, and VOCs gave a very good prediction of eosinophilic asthma (area under the receiver operating characteristic curve, 0.9). For neutrophilic asthma, the combination of nonanal, 1-propanol, and hexane had a classification performance similar to FeNO or blood eosinophils in eosinophilic asthma. 
Those compounds were found in higher levels in neutrophilic asthma. Conclusions: Our study is the first attempt to characterize VOCs according to sputum granulocytic profile in a large population of patients with asthma and provide surrogate markers for neutrophilic asthma.
Subjects
Asthma/immunology , Eosinophilia/immunology , Eosinophils , Neutrophils , Sputum/cytology , Adult , Aged , Asthma/classification , Asthma/diagnosis , Asthma/metabolism , Breath Tests , Eosinophilia/metabolism , Female , Humans , Male , Middle Aged , Nitric Oxide/metabolism , Prospective Studies , Spirometry , Volatile Organic Compounds
ABSTRACT
PURPOSE: The inclusion of patient-reported outcome (PRO) questionnaires in prognostic factor analyses in oncology has substantially increased in recent years. We performed a simulation study to compare the performances of four different modeling strategies in estimating the prognostic impact of multiple collinear scales from PRO questionnaires. METHODS: We generated multiple scenarios describing survival data with different sample sizes, event rates and degrees of multicollinearity among five PRO scales. We used the Cox proportional hazards (PH) model to estimate the hazard ratios (HR) using automatic selection procedures, which were based on either the likelihood ratio test (Cox-PV) or the Akaike Information Criterion (Cox-AIC). We also used Cox PH models which included all variables and were either penalized using Ridge regression (Cox-R) or estimated as usual (Cox-Full). For each scenario, we simulated 1000 independent datasets and compared the average outcomes of all methods. RESULTS: The Cox-R showed similar or better performances with respect to the other methods, particularly in scenarios with medium-high multicollinearity (ρ = 0.4 to ρ = 0.8) and small sample sizes (n = 100). Overall, the Cox-PV and Cox-AIC performed worse; for example, they did not select one or more prognostic collinear PRO scales in some scenarios. Compared with the Cox-Full, the Cox-R provided HR estimates with similar bias patterns but smaller root-mean-squared errors, particularly in higher multicollinearity scenarios. CONCLUSIONS: Our findings suggest that the Cox-R is the best approach when performing prognostic factor analyses with multiple and collinear PRO scales, particularly in situations of high multicollinearity, small sample sizes and low event rates.
Subjects
Neoplasms/psychology , Neoplasms/therapy , Patient Reported Outcome Measures , Quality of Life/psychology , Adult , Aged , Aged, 80 and over , Factor Analysis, Statistical , Female , Humans , Male , Middle Aged , Prognosis , Proportional Hazards Models , Sample Size , Young Adult
ABSTRACT
Integrative analyses of several omics data are emerging. The data are usually generated from the same source material (i.e., tumor sample) representing one level of regulation. However, integrating different regulatory levels (i.e., blood) with those from tumor may also reveal important knowledge about the human genetic architecture. To model this multilevel structure, an integrative-expression quantitative trait loci (eQTL) analysis applying two-stage regression (2SR) was proposed. This approach first regressed tumor gene expression levels with tumor markers and the adjusted residuals from the previous model were then regressed with the germline genotypes measured in blood. Previously, we demonstrated that penalized regression methods in combination with a permutation-based MaxT method (Global-LASSO) are a promising tool to address some of the challenges that high-throughput omics data analysis imposes. Here, we assessed whether Global-LASSO can also be applied when tumor and blood omics data are integrated. We further compared our strategy with two 2SR approaches, one using multiple linear regression (2SR-MLR) and the other using LASSO (2SR-LASSO). We applied the three models to integrate genomic, epigenomic, and transcriptomic data from tumor tissue with blood germline genotypes from 181 individuals with bladder cancer included in the TCGA Consortium. Global-LASSO provided a larger list of eQTLs than the 2SR methods, identified a previously reported eQTL in prostate stem cell antigen (PSCA), and provided further clues on the complexity of APBEC3B loci, with a minimal false-positive rate not achieved by 2SR-MLR. It also represents an important contribution for omics integrative analysis because it is easy to apply and adaptable to any type of data.
Subjects
Genomics , Quantitative Trait Loci/genetics , Urinary Bladder Neoplasms/genetics , Chromosomes, Human/genetics , Computer Simulation , Humans , Linear Models , Models, Genetic , Multivariate Analysis , Polymorphism, Single Nucleotide/genetics
ABSTRACT
The vast amount of heterogeneous omics data, encompassing a broad range of biomolecular information, requires novel methods of analysis, including those that integrate the available levels of information. In this work, we describe Regression2Net, a computational approach that is able to integrate gene expression and genomic or methylation data in two steps. First, penalized regressions are used to build Expression-Expression (EEnet) and Expression-Genomic or Expression-Methylation (EMnet) networks. Second, network theory is used to highlight important communities of genes. When applying our approach, Regression2Net, to gene expression and methylation profiles for individuals with glioblastoma multiforme, we identified, respectively, 284 and 447 potentially interesting genes in relation to glioblastoma pathology. These genes showed at least one connection in the integrated networks ANDnet and XORnet derived from the aforementioned EEnet and EMnet networks. Whereas the edges in ANDnet occur in both EEnet and EMnet, the edges in XORnet occur in EMnet but not in EEnet. In-depth biological analysis of connected genes in ANDnet and XORnet revealed genes that are related to energy metabolism, cell cycle control (AATF), immune system response, and several cancer types. Importantly, we observed significant overrepresentation of cancer-related pathways including glioma, especially in the XORnet network, suggesting a nonignorable role of methylation in glioblastoma multiforme. In the ANDnet, we furthermore identified potential glioma suppressor genes ACCN3 and ACCN4 linked to the NBPF1 neuroblastoma breakpoint family, as well as numerous ABC transporter genes (ABCA1, ABCB1) suggesting drug resistance of glioblastoma tumors.
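The definitions of ANDnet and XORnet given in the abstract map directly onto set operations on undirected edge lists. A minimal sketch follows, with hypothetical gene labels (`G1` to `G4`) and function names (`normalize`, `and_net`, `xor_net`) chosen for illustration only.

```python
def normalize(edges):
    """Treat edges as undirected: (a, b) and (b, a) become the same edge."""
    return {frozenset(edge) for edge in edges}


def and_net(ee_edges, em_edges):
    """Edges present in both EEnet and EMnet."""
    return normalize(ee_edges) & normalize(em_edges)


def xor_net(ee_edges, em_edges):
    """Edges present in EMnet but not in EEnet, as XORnet is described in the abstract."""
    return normalize(em_edges) - normalize(ee_edges)


# Hypothetical toy networks.
ee_net = [("G1", "G2"), ("G2", "G3")]
em_net = [("G2", "G1"), ("G3", "G4")]

shared = and_net(ee_net, em_net)
methylation_only = xor_net(ee_net, em_net)
print(sorted(sorted(e) for e in shared))            # [['G1', 'G2']]
print(sorted(sorted(e) for e in methylation_only))  # [['G3', 'G4']]
```

Note that, despite its name, XORnet as described is a one-sided difference rather than a symmetric one: edges found only in EEnet are excluded.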
Subjects
DNA Methylation , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Genomics/methods , Glioblastoma/genetics , Neoplasm Proteins/genetics , Computational Biology/methods , Glioblastoma/pathology , Humans
ABSTRACT
Complex diseases are, by definition, determined by multiple genetic and environmental factors, acting alone as well as in interaction. To analyze interactions in genetic data, many statistical methods have been suggested, with most of them relying on statistical regression models. Given the known limitations of classical methods, approaches from the machine-learning community have also become attractive. From this latter family, a fast-growing collection of methods emerged that are based on the Multifactor Dimensionality Reduction (MDR) approach. Since its first introduction, MDR has enjoyed great popularity in applications and has been extended and modified multiple times. Based on a literature search, we here provide a systematic and comprehensive overview of these suggested methods. The methods are described in detail, and the availability of implementations is listed. The most recent approaches can handle large-scale data sets and rare variants, which is why we expect these methods to gain further in popularity.
Subjects
Algorithms , Models, Statistical , Multifactor Dimensionality Reduction/methods , Pattern Recognition, Automated/methods , Protein Interaction Mapping/methods , Computer Simulation
ABSTRACT
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise such as data heterogeneity, the smaller number of individuals in comparison to the number of parameters, multicollinearity, and interpretation and validation of results due to their complexity and lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct for multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected for each gene probe within a 1 Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CPG, and Global models, the latter integrating SNPs and CpGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CpGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA) and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose reduces computational time and is flexible and easy to implement when analyzing several types of omics data. 
Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease conditions.
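The permutation-based MaxT idea in step (3) can be sketched as a single-step procedure: in each permutation the outcome is shuffled, the maximum statistic over all tests is recorded, and each observed statistic is compared against that distribution of maxima. The sketch below uses the absolute Pearson correlation as a simple stand-in statistic; the study applied MaxT to LASSO and ENET models, and the names `abs_corr` and `maxt_adjusted_pvalues` and the toy data are assumptions.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))


def maxt_adjusted_pvalues(features, outcome, n_perm=999, seed=0):
    """Single-step maxT: family-wise adjusted p-value for every feature."""
    rng = random.Random(seed)
    observed = [abs_corr(f, outcome) for f in features]
    exceed = [0] * len(features)
    permuted = list(outcome)
    for _ in range(n_perm):
        rng.shuffle(permuted)
        # The maximum over all features is what controls the family-wise error.
        max_stat = max(abs_corr(f, permuted) for f in features)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    return [(count + 1) / (n_perm + 1) for count in exceed]


# Toy data: feature 0 tracks the outcome exactly, feature 1 is an unrelated pattern.
outcome = [float(i) for i in range(12)]
features = [
    [2.0 * v for v in outcome],
    [1.0 if i % 2 == 0 else -1.0 for i in range(12)],
]
adjusted = maxt_adjusted_pvalues(features, outcome)
print(adjusted)
```

Because every test is compared with the same distribution of permutation maxima, significance assessment and multiple-testing correction happen in one pass, which is the property the abstract highlights.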