Results 1 - 20 of 30
1.
Biostatistics ; 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38850151

ABSTRACT

DNA methylation is an important epigenetic mark that modulates gene expression by inhibiting the binding of transcriptional proteins to DNA. As in many other omics experiments, missing values are an important issue, and appropriate imputation techniques help avoid unnecessary sample size reduction and optimally leverage the information collected. We consider the case where relatively few samples are processed via an expensive high-density whole genome bisulfite sequencing (WGBS) strategy and a larger number of samples is processed using more affordable low-density, array-based technologies. In such cases, one can impute the low-coverage (array-based) methylation data using the high-density information provided by the WGBS samples. In this paper, we propose an efficient Linear Model of Coregionalisation with informative Covariates (LMCC) to predict missing values from observed values and covariates. Our model assumes that at each site, the methylation vector of all samples is linked to a set of fixed factors (covariates) and a set of latent factors. Furthermore, we exploit the functional nature of the data and the spatial correlation across sites by placing Gaussian processes on the fixed and latent coefficient vectors, respectively. Our simulations show that the use of covariates can significantly improve the accuracy of imputed values, especially when the missing data contain relevant information about the explanatory variable. We also show that our proposed model is particularly efficient when the number of columns is much greater than the number of rows, which is usually the case in methylation data analysis. Finally, we apply and compare our proposed method with alternative approaches on two real methylation datasets, showing how covariates such as cell type, tissue type, or age can enhance the accuracy of imputed values.
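The cross-platform setting described above can be illustrated with a deliberately simplified sketch: borrow the site-to-site relationships learned on the high-density samples to fill in sites the arrays never measured. All data here are synthetic and the ridge-regression stand-in is hypothetical; the actual LMCC model uses latent factors and Gaussian processes rather than a per-site regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all names hypothetical): 10 "WGBS" samples observed at 50 sites,
# 40 "array" samples observed at only the first 10 sites. Impute the other
# 40 sites for the array samples via ridge regression on the shared sites,
# trained across the high-density WGBS samples.
n_sites, n_shared = 50, 10
wgbs = rng.uniform(0, 1, size=(10, n_sites))        # full methylation profiles
array_obs = rng.uniform(0, 1, size=(40, n_shared))  # low-density profiles

X = wgbs[:, :n_shared]           # shared (observed) sites
Y = wgbs[:, n_shared:]           # sites missing on the array platform
lam = 0.1                        # small ridge penalty for numerical stability
B = np.linalg.solve(X.T @ X + lam * np.eye(n_shared), X.T @ Y)
imputed = np.clip(array_obs @ B, 0.0, 1.0)  # methylation levels lie in [0, 1]
```

In the real model, covariates such as cell type or age would enter as fixed factors alongside the site-level structure; this sketch uses only the observed sites as predictors.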

2.
Stat Med ; 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38932470

ABSTRACT

Motivated by a DNA methylation application, this article addresses the problem of fitting and inferring a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra-parametric variations, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non-constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi-binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace-approximated quasi-likelihood of our model, we further develop a specialized two-stage expectation-maximization (EM) algorithm, where a plug-in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non-zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti-citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA-related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R Bioconductor package called "SOMNiBUS."

3.
Stat Methods Med Res ; 32(11): 2096-2122, 2023 11.
Article in English | MEDLINE | ID: mdl-37832140

ABSTRACT

With cost-effective whole-genome sequencing technology, more sophisticated statistical methods for testing genetic association with both rare and common variants are being investigated to identify the genetic variation between individuals. Several methods that group variants, also called gene-based approaches, have been developed. For instance, advanced extensions of the sequence kernel association test, a widely used variant-set test, have been proposed for unrelated samples and extended for family data. Family data have been shown to be powerful when analyzing rare variants. However, most such methods capture familial relatedness using a random effect component within the generalized linear mixed model framework. Therefore, there is a need for unified and flexible methods to study the association between a set of genetic variants and a trait, especially for a binary outcome. Copulas are multivariate distribution functions with uniform margins on the [0,1] interval, and they provide suitable models to capture familial dependence structure. In this work, we propose a flexible family-based association test for both rare and common variants in the presence of binary traits. The method, termed novel rare variant association test (NRVAT), uses a marginal logistic model and a Gaussian copula. The latter is employed to model the dependence between relatives. An analytic score-type test is derived. Through simulations, we show that our method can achieve greater power than existing approaches. The proposed model is applied to investigate the association between schizophrenia and bipolar disorder in a family-based cohort consisting of 17 extended families from Eastern Quebec.


Subjects
Genetic Variation, Models, Genetic, Humans, Computer Simulation, Genetic Association Studies, Phenotype, Linear Models
4.
J Appl Stat ; 50(7): 1496-1514, 2023.
Article in English | MEDLINE | ID: mdl-37197752

ABSTRACT

Accounting for important interaction effects can improve the prediction of many statistical learning models. Identifying relevant interactions, however, is challenging owing to their ultrahigh-dimensional nature. Interaction screening strategies can alleviate such issues, but due to the heavier-tailed distributions and complex dependence structure of interaction effects, innovative robust and/or model-free methods for screening interactions are required to better scale the analysis of complex, high-throughput data. In this work, we develop a new model-free interaction screening method, termed Kendall Interaction Filter (KIF), for classification in high-dimensional settings. The KIF method uses a weighted-sum measure, which compares the overall Kendall's τ of pairs of predictors to the within-class Kendall's τ, to select interacting pairs of features. The proposed KIF measure captures interactions relevant to the class response variable; handles continuous, categorical, or mixed continuous-categorical features; and is invariant under monotonic transformations. The KIF measure enjoys the sure screening property in the high-dimensional setting under mild conditions, without imposing sub-exponential moment assumptions on the features' distribution. We illustrate the favorable behavior of the proposed methodology compared to methods in the same category using simulation studies, and we conduct real data analyses to demonstrate its utility.
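The overall-versus-within-class contrast can be sketched in a few lines. The `kif_score` below is a hypothetical simplification of the published measure (a class-weighted sum of absolute differences between within-class and overall Kendall's τ), applied to synthetic data where one feature pair interacts with the class label:

```python
import numpy as np
from scipy.stats import kendalltau

def kif_score(xj, xk, y):
    # Class-weighted sum of |within-class tau - overall tau| for the pair
    # (xj, xk); a sketch of the KIF contrast, not the exact published measure.
    tau_all, _ = kendalltau(xj, xk)
    score, n = 0.0, len(y)
    for c in np.unique(y):
        m = (y == c)
        tau_c, _ = kendalltau(xj[m], xk[m])
        score += (m.sum() / n) * abs(tau_c - tau_all)
    return score

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
x1 = rng.normal(size=200)
x2 = np.where(y == 1, x1, -x1) + 0.1 * rng.normal(size=200)  # interacts with y
x3 = rng.normal(size=200)                                    # pure noise
```

The pair (x1, x2) has near-zero overall τ but strong (opposite-signed) τ within each class, so its score is large, while the noise pair scores near zero; rank-based τ also makes the score invariant to monotone transformations of the features.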

5.
Bioinformatics ; 39(2)2023 02 03.
Article in English | MEDLINE | ID: mdl-36708013

ABSTRACT

MOTIVATION: Sparse regularized regression methods are now widely used in genome-wide association studies (GWAS) to address the multiple testing burden that limits discovery of potentially important predictors. Linear mixed models (LMMs) have become an attractive alternative to principal components (PCs) adjustment to account for population structure and relatedness in high-dimensional penalized models. However, their use in binary trait GWAS relies on the invalid assumption that the residual variance does not depend on the estimated regression coefficients. Moreover, LMMs use a single spectral decomposition of the covariance matrix of the responses, which is no longer possible in generalized linear mixed models (GLMMs). RESULTS: We introduce a new method called pglmm, a penalized GLMM that simultaneously selects genetic markers and estimates their effects, accounting for between-individual correlations and the binary nature of the trait. We develop a computationally efficient algorithm based on penalized quasi-likelihood estimation that allows regularized mixed models to scale to high-dimensional binary trait GWAS. We show through simulations that when the dimensionality of the relatedness matrix is high, penalized LMM and logistic regression with PC adjustment fail to select important predictors and have inferior prediction accuracy compared to pglmm. Further, we demonstrate through the analysis of two polygenic binary traits in a subset of 6731 related individuals from the UK Biobank data with 320K SNPs that our method achieves higher predictive performance, while also selecting fewer predictors than a sparse regularized logistic lasso with PC adjustment. AVAILABILITY AND IMPLEMENTATION: Our Julia package PenalizedGLMM.jl is publicly available on GitHub: https://github.com/julstpierre/PenalizedGLMM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
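The baseline that pglmm is compared against, a logistic lasso with unpenalized PC covariates, can be sketched directly. This toy version (synthetic genotypes, plain ISTA proximal gradient, hypothetical tuning values) illustrates the mechanics only; it does not implement the penalized quasi-likelihood GLMM of the abstract:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 50
G = rng.normal(size=(n, p))                  # toy genotype-like matrix
beta_true = np.zeros(p)
beta_true[:3] = 2.0                          # three causal markers
y = (G @ beta_true + rng.normal(size=n) > 0).astype(float)

# Top principal components enter as unpenalized covariates, the classical
# alternative to the mixed-model adjustment discussed in the abstract.
U, _, _ = np.linalg.svd(G - G.mean(axis=0), full_matrices=False)
X = np.hstack([np.ones((n, 1)), U[:, :5], G])   # intercept + 5 PCs + markers
pen = np.r_[np.zeros(6), np.ones(p)]            # L1 penalty on markers only

def ista_logistic_lasso(X, y, pen, lam=0.1, lr=0.01, steps=3000):
    b = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (1.0 / (1.0 + np.exp(-X @ b)) - y) / len(y)
        b -= lr * grad                                         # gradient step
        b = np.sign(b) * np.maximum(np.abs(b) - lr * lam * pen, 0.0)  # soft-threshold
    return b

b = ista_logistic_lasso(X, y, pen)
coef = b[6:]                                     # marker coefficients
```

With independent rows, as here, PC adjustment is harmless; the abstract's point is that when relatedness is strong, this baseline under-selects causal markers relative to the penalized GLMM.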


Subjects
Algorithms, Genome-Wide Association Study, Humans, Genome-Wide Association Study/methods, Phenotype, Linear Models, Polymorphism, Single Nucleotide, Models, Genetic
6.
Int J Biostat ; 19(2): 369-387, 2023 11 01.
Article in English | MEDLINE | ID: mdl-36279152

ABSTRACT

In genome-wide association studies (GWAS), researchers often deal with dichotomous and non-normally distributed traits, or a mixture of discrete and continuous traits. However, most current region-based methods rely on multivariate linear mixed models (mvLMMs) and assume a multivariate normal distribution for the phenotypes of interest; hence, they are not applicable to disease traits or other non-normally distributed traits. There is therefore a need for unified and flexible methods to study association between a set of (possibly rare) genetic variants and non-normal multivariate phenotypes. Copulas are multivariate distribution functions with uniform margins on the [0, 1] interval, and they provide suitable models to deal with non-normality of errors in multivariate association studies. We propose a novel unified and flexible copula-based multivariate association test (CBMAT) for discovering association between a genetic region and a bivariate continuous, binary, or mixed phenotype. We also derive a data-driven analytic p-value procedure for the proposed region-based score-type test. Through simulation studies, we demonstrate that CBMAT has well-controlled type I error rates and higher power to detect associations compared with other existing methods for discrete and non-normally distributed traits. Finally, we apply CBMAT to detect the association between two genes located on chromosome 11 and several lipid levels measured on 1477 subjects from the ALSPAC study.
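The core copula idea, separating each margin from the dependence structure, can be shown with a toy example. This sketch (synthetic data, Gaussian copula only) illustrates why rank-based normal scores recover latent-scale dependence that raw correlation understates for a skewed margin; it is not the CBMAT test itself:

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(3)
n = 1000
# Latent bivariate normal with correlation 0.6; the first coordinate is
# pushed through a skewed (log-normal) margin, as a non-normal trait.
latent = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n)
x = np.exp(latent[:, 0])       # observed skewed trait
z2 = latent[:, 1]              # second latent variable

# Rank-based normal scores map the skewed margin back to the latent
# Gaussian scale, where linear correlation is meaningful again.
x_scores = norm.ppf(rankdata(x) / (n + 1))

r_raw = np.corrcoef(x, z2)[0, 1]            # attenuated by the skewed margin
r_copula = np.corrcoef(x_scores, z2)[0, 1]  # close to the latent 0.6
```

Because the normal-score transform depends on x only through its ranks, the recovered dependence is invariant to any monotone distortion of the trait's margin.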


Subjects
Genome-Wide Association Study, Models, Genetic, Humans, Genome-Wide Association Study/methods, Phenotype, Computer Simulation, Linear Models
7.
Front Immunol ; 13: 1067075, 2022.
Article in English | MEDLINE | ID: mdl-36505483

ABSTRACT

Introduction: Kidney transplantation is the optimal treatment in end-stage kidney disease, but de novo donor-specific antibody development continues to negatively impact patients undergoing kidney transplantation. One of the recent advances in solid organ transplantation has been the definition of molecular mismatching between donor and recipient Human Leukocyte Antigens (HLA). While not fully integrated in standard clinical care, cumulative molecular mismatch at the level of eplets (EMM) as well as the PIRCHE-II score have shown promise in predicting transplant outcomes. In this manuscript, we sought to study whether certain T-cell molecular mismatches (TcEMM) were highly predictive of death-censored graft failure (DCGF). Methods: We studied a retrospective cohort of kidney donor:recipient pairs from the Scientific Registry of Transplant Recipients (2000-2015). Allele-level HLA-A, B, C, DRB1 and DQB1 types were imputed from serologic types using the NMDP algorithm. TcEMMs were then estimated using the PIRCHE-II algorithm. Multivariable Accelerated Failure Time (AFT) models assessed the association between each TcEMM and DCGF. To discriminate between the TcEMMs most predictive of DCGF, we fit multivariable Lasso penalized regression models. We identified co-expressed TcEMMs using weighted correlation network analysis (WGCNA). Finally, we conducted sensitivity analyses to address PIRCHE and IMGT/HLA version updates. Results: A total of 118,309 donor:recipient pairs meeting the eligibility criteria were studied. When applying the PIRCHE-II algorithm, we identified 1,935 distinct TcEMMs at the population level. A total of 218 of the observed TcEMMs were independently associated with DCGF by AFT models. The Lasso penalized regression model with post-selection inference identified a smaller subset of 86 TcEMMs (56 and 30 TcEMMs derived from HLA Class I and II, respectively) to be highly predictive of DCGF. Of the observed TcEMMs, 38.14% appeared as profiles of highly co-expressed TcEMMs. In addition, sensitivity analyses showed that the selected TcEMMs were congruent across IMGT/HLA versions. Conclusion: In this study, we identified subsets of TcEMMs highly predictive of DCGF and profiles of co-expressed mismatches. Experimental verification of these TcEMMs determining immune responses, and of how they may interact with EMM as predictors of transplant outcomes, would justify their consideration in organ allocation schemes and for modifying immunosuppression regimens.


Subjects
Kidney Transplantation, Humans, Kidney Transplantation/adverse effects, Retrospective Studies, T-Lymphocytes, HLA Antigens/genetics, Postoperative Complications
8.
J Appl Stat ; 49(14): 3564-3590, 2022.
Article in English | MEDLINE | ID: mdl-36246864

ABSTRACT

Generalized estimating equations (GEE) are widely used to analyze longitudinal data; however, they are not appropriate for heteroscedastic data, because they only estimate regressor effects on the mean response and therefore do not account for data heterogeneity. Here, we combine the GEE with asymmetric least squares (expectile) regression to derive a new class of estimators, which we call generalized expectile estimating equations (GEEE). The GEEE model estimates regressor effects on the expectiles of the response distribution, which provides a detailed view of regressor effects on the entire response distribution. In addition to capturing data heteroscedasticity, the GEEE extends the various working correlation structures to account for within-subject dependence. We derive the asymptotic properties of the GEEE estimators and propose a robust estimator of its covariance matrix for inference (see our R package, github.com/AmBarry/expectgee). Our simulations show that the GEEE estimator is unbiased and efficient, and our real data analysis shows that it captures heteroscedasticity.
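The expectile (asymmetric least squares) building block can be sketched for independent data. This toy fit (synthetic heteroscedastic data, plain iteratively reweighted least squares, no working correlation structure) shows how upper and lower expectile slopes diverge when the noise scale depends on the regressor, which is exactly the information a mean-only GEE discards:

```python
import numpy as np

def fit_expectile(X, y, tau=0.5, iters=100):
    # Asymmetric least squares: residuals above the fit get weight tau,
    # those below get 1 - tau; solved by iteratively reweighted LS.
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        w = np.where(y - X @ b >= 0, tau, 1.0 - tau)
        b = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return b

rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(0, 1, size=n)
y = 1.0 + 2.0 * x + (0.2 + x) * rng.normal(size=n)  # noise scale grows with x
X = np.column_stack([np.ones(n), x])

slope_lo = fit_expectile(X, y, tau=0.1)[1]
slope_mid = fit_expectile(X, y, tau=0.5)[1]   # tau = 0.5 recovers ordinary LS
slope_hi = fit_expectile(X, y, tau=0.9)[1]
```

Under homoscedastic noise the three slopes would coincide; here the fanning-out of the 0.1 and 0.9 expectile fits is the heteroscedasticity signal.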

9.
Genet Epidemiol ; 45(8): 874-890, 2021 12.
Article in English | MEDLINE | ID: mdl-34468045

ABSTRACT

Medical research increasingly includes high-dimensional regression modeling with a need for error-in-variables methods. The Convex Conditioned Lasso (CoCoLasso) utilizes a reformulated Lasso objective function and an error-corrected cross-validation to enable error-in-variables regression, but requires heavy computations. Here, we develop a Block coordinate Descent Convex Conditioned Lasso (BDCoCoLasso) algorithm for modeling high-dimensional data that are only partially corrupted by measurement error. This algorithm separately optimizes the estimation of the uncorrupted and corrupted features in an iterative manner to reduce computational cost, with a specially calibrated formulation of cross-validation error. Through simulations, we show that the BDCoCoLasso algorithm successfully copes with much larger feature sets than CoCoLasso, and as expected, outperforms the naïve Lasso with enhanced estimation accuracy and consistency, as the intensity and complexity of measurement errors increase. Also, a new smoothly clipped absolute deviation penalization option is added that may be appropriate for some data sets. We apply the BDCoCoLasso algorithm to data selected from the UK Biobank. We develop and showcase the utility of covariate-adjusted genetic risk scores for body mass index, bone mineral density, and lifespan. We demonstrate that by leveraging more information than the naïve Lasso in partially corrupted data, the BDCoCoLasso may achieve higher prediction accuracy. These innovations, together with an R package, BDCoCoLasso, make error-in-variables adjustments more accessible for high-dimensional data sets. We posit the BDCoCoLasso algorithm has the potential to be widely applied in various fields, including genomics-facilitated personalized medicine research.
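The error-in-variables correction underlying the CoCoLasso family can be shown in its simplest form: with additive measurement error, the observed Gram matrix is inflated by the error covariance, and subtracting it removes the attenuation bias. This sketch uses synthetic data, a known error variance, and plain least squares rather than the lasso; CoCoLasso additionally projects the corrected matrix to the nearest positive semidefinite one (needed in high dimensions) and adds the L1 penalty:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 5000, 5
beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
X = rng.normal(size=(n, p))                 # true (unobserved) covariates
y = X @ beta + rng.normal(size=n)
sigma_e = 0.5
W = X + sigma_e * rng.normal(size=(n, p))   # covariates observed with error

G_naive = W.T @ W / n                       # E[W'W/n] = X'X/n + sigma_e^2 I
G_corr = G_naive - sigma_e**2 * np.eye(p)   # unbiased surrogate of X'X/n
b_naive = np.linalg.solve(G_naive, W.T @ y / n)   # attenuated toward zero
b_corr = np.linalg.solve(G_corr, W.T @ y / n)     # bias-corrected
```

With p small and n large, G_corr stays positive definite and the corrected estimate lands near the true coefficients, while the naive fit is shrunk by roughly 1/(1 + sigma_e^2) per independent covariate.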


Subjects
Algorithms, Models, Genetic, Humans, Research Design
10.
Kidney Int Rep ; 6(6): 1567-1579, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34169197

ABSTRACT

INTRODUCTION: To mitigate risks related to human leukocyte antigen (HLA) incompatibility, we assessed whether certain structurally defined HLA targets present in donors but absent from recipients, known as eplet mismatches (EMM), are associated with death-censored graft failure (DCGF). METHODS: We studied a cohort of 118,313 American first kidney transplant recipients (2000 to 2015) with 0% panel reactive antibodies (PRA) from the Scientific Registry of Transplant Recipients. Imputed allele-level donor and recipient HLA-A, -B, -C, -DRB1, and -DQB1 genotypes were converted to the repertoire of EMM. We fit survival models for each EMM with significance thresholds corrected for false discovery rate and validated those in an independent PRA > 0% cohort. We conducted network-based analyses to model relationships among EMM and developed models to select the subset of EMM most predictive of DCGF. RESULTS: Of 412 EMM observed, 119 class I and 118 class II EMM were associated with DCGF. Network analysis showed that although 210 eplets formed profiles of 2 to 12 simultaneously occurring EMMs, 202 were singleton EMMs not involved in any profile. A variable selection procedure identified 55 single HLA class I and II EMMs in 70% of the dataset; of those, 15 EMMs (9 singletons and 6 involved in profiles) were predictive of DCGF in the remaining dataset. CONCLUSION: Our analysis distinguished increasingly smaller subsets of EMMs associated with increased risk of DCGF. Validation of these EMMs as important predictors of transplant outcomes (in contrast to acceptable EMMs) in datasets with measured allele-level genotypes will support their role as immunodominant EMMs worthy of consideration in organ allocation schemes.

11.
Biometrics ; 77(2): 424-438, 2021 06.
Article in English | MEDLINE | ID: mdl-32438470

ABSTRACT

Identifying disease-associated changes in DNA methylation can help us gain a better understanding of disease etiology. Bisulfite sequencing allows the generation of high-throughput methylation profiles at single-base resolution of DNA. However, optimally modeling and analyzing these sparse and discrete sequencing data is still very challenging due to variable read depth, missing data patterns, long-range correlations, data errors, and confounding from cell type mixtures. We propose a regression-based hierarchical model that allows covariate effects to vary smoothly along genomic positions and we have built a specialized EM algorithm, which explicitly allows for experimental errors and cell type mixtures, to make inference about smooth covariate effects in the model. Simulations show that the proposed method provides accurate estimates of covariate effects and captures the major underlying methylation patterns with excellent power. We also apply our method to analyze data from rheumatoid arthritis patients and controls. The method has been implemented in R package SOMNiBUS.
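The "covariate effects varying smoothly along genomic positions" idea can be illustrated with a stripped-down stand-in. This sketch fits a smooth effect to synthetic bisulfite-style count data with variable read depth, using a polynomial basis and binomial iteratively reweighted least squares in place of the spline basis and EM machinery of the actual model (no error or cell-mixture components here):

```python
import numpy as np

rng = np.random.default_rng(5)
m = 300                                          # CpG sites in the region
pos = np.linspace(-1.0, 1.0, m)                  # rescaled genomic positions
depth = rng.integers(5, 30, size=m)              # variable read depth per site
f_true = np.sin(np.pi * pos)                     # smooth effect, logit scale
p_true = 1.0 / (1.0 + np.exp(-f_true))
meth = rng.binomial(depth, p_true)               # methylated read counts

# Polynomial basis + binomial IRLS as a stand-in for smooth-effect fitting.
Xb = np.vander(pos, 9, increasing=True)
b = np.zeros(Xb.shape[1])
for _ in range(25):
    eta = Xb @ b
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = depth * mu * (1.0 - mu) + 1e-8           # IRLS working weights
    z = eta + (meth - depth * mu) / w            # working response
    b = np.linalg.solve(Xb.T @ (w[:, None] * Xb), Xb.T @ (w * z))

f_hat = Xb @ b
```

Note how read depth enters the working weights directly, so deeply sequenced sites contribute more to the smooth estimate, one reason count-aware models outperform treating methylation proportions as plain responses.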


Subjects
DNA Methylation, High-Throughput Nucleotide Sequencing, DNA Methylation/genetics, Humans, Sequence Analysis, DNA, Sulfites
12.
PLoS Genet ; 16(5): e1008766, 2020 05.
Article in English | MEDLINE | ID: mdl-32365090

ABSTRACT

Complex traits are known to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed effects models (LMM) can account for correlations due to relatedness but have not been applicable in high-dimensional (HD) settings where the number of fixed effect predictors greatly exceeds the number of samples. False positives or false negatives can result from two-stage approaches, where the residuals estimated from a null model adjusted for the subjects' relationship structure are subsequently used as the response in a standard penalized regression model. To overcome these challenges, we develop a general penalized LMM with a single random effect, called ggmix, for simultaneous SNP selection and adjustment for population structure in high-dimensional prediction models. We develop a blockwise coordinate descent algorithm with automatic tuning parameter selection that is highly scalable, computationally efficient, and has theoretical guarantees of convergence. Through simulations and three real data examples, we show that ggmix leads to more parsimonious models compared to the two-stage approach or principal component adjustment, with better prediction accuracy. Our method performs well even in the presence of highly correlated markers, and when the causal SNPs are included in the kinship matrix. ggmix can be used to construct polygenic risk scores and select instrumental variables in Mendelian randomization studies. Our algorithms are available in the ggmix R package on CRAN (https://cran.r-project.org/package=ggmix).


Subjects
Algorithms, Genome-Wide Association Study/methods, Models, Genetic, Polymorphism, Single Nucleotide, Animals, Computer Simulation, Crosses, Genetic, Genetics, Population/methods, Genome-Wide Association Study/statistics & numerical data, Humans, Leishmania tropica/genetics, Leishmaniasis, Cutaneous/genetics, Linear Models, Mice, Mice, Inbred Strains, Multifactorial Inheritance/genetics, Mycobacterium bovis, Population Dynamics, Sample Size, Software, Tuberculosis/genetics, Tuberculosis/pathology
13.
Stat Med ; 39(5): 517-543, 2020 02 28.
Article in English | MEDLINE | ID: mdl-31868965

ABSTRACT

Data collected for a genome-wide association study of a primary phenotype are often used for additional genome-wide association analyses of secondary phenotypes. However, when the primary and secondary traits are dependent, naïve analyses of secondary phenotypes may induce spurious associations in non-randomly ascertained samples. Previously, retrospective likelihood-based methods have been proposed to correct for sampling biases arising in secondary trait association analyses. However, most methods have been introduced to handle studies featuring a case-control design based on a binary primary phenotype. As such, these methods are not directly applicable to more complicated study designs such as multiple-trait studies, where the sampling mechanism also depends on the secondary phenotype, or extreme-trait studies, where individuals with extreme primary phenotype values are selected. To accommodate these more complicated sampling mechanisms, only a few prospective likelihood approaches have been proposed. These approaches assume a normal distribution for the secondary phenotype (or the latent secondary phenotype) and a bivariate normal distribution for the primary-secondary phenotype dependence. In this paper, we propose a unified copula-based approach to appropriately detect genetic variant/secondary phenotype association in the presence of selected samples. The primary phenotype is either binary or continuous, and the secondary phenotype is continuous, although not necessarily normal. We use both prospective and retrospective likelihoods to account for the sampling mechanism and use a copula model to allow for potentially different dependence structures between the primary and secondary phenotypes. We demonstrate the effectiveness of our approach through simulation studies and by analyzing data from the Avon Longitudinal Study of Parents and Children cohort.


Subjects
Genome-Wide Association Study, Models, Genetic, Child, Humans, Likelihood Functions, Longitudinal Studies, Phenotype, Polymorphism, Single Nucleotide, Prospective Studies, Retrospective Studies
14.
Genet Epidemiol ; 43(4): 373-401, 2019 06.
Article in English | MEDLINE | ID: mdl-30635941

ABSTRACT

In Mendelian randomization (MR), inference about the causal relationship between a phenotype of interest and a response or disease outcome can be obtained by constructing instrumental variables from genetic variants. However, MR inference requires three assumptions, one of which is that the genetic variants only influence the outcome through the phenotype of interest. Pleiotropy, that is, the situation in which some genetic variants affect more than one phenotype, can invalidate these genetic variants for use as instrumental variables; thus a naive analysis will give biased estimates of the causal relation. Here, we present new methods (constrained instrumental variable [CIV] methods) to construct valid instrumental variables and perform adjusted causal effect estimation when pleiotropy exists and when the pleiotropic phenotypes are available. We demonstrate that a smoothed version of CIV performs approximate selection of genetic variants that are valid instruments, and provides unbiased estimates of the causal effects. We provide details on a number of existing methods, together with a comparison of their performance in a large series of simulations. CIV performs robustly across different pleiotropic violations of the MR assumptions. We also analyzed data from the Alzheimer's Disease Neuroimaging Initiative (ADNI; Mueller et al., 2005, Alzheimer's Dementia, 11(1), 55-66) to disentangle causal relationships of several biomarkers with AD progression.
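The basic instrumental-variable construction that MR builds on (and that pleiotropy can break) is standard two-stage least squares. This sketch uses entirely synthetic data with a single valid instrument and no pleiotropy, so the Wald/2SLS estimate recovers the true effect while ordinary regression is confounded; it is the textbook baseline, not the CIV method of the abstract:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
g = rng.binomial(2, 0.3, size=n).astype(float)  # SNP instrument (0/1/2)
u = rng.normal(size=n)                          # unmeasured confounder
x = 0.5 * g + u + rng.normal(size=n)            # exposure (phenotype)
y = 0.7 * x + u + rng.normal(size=n)            # outcome; true effect = 0.7

b_ols = np.cov(x, y)[0, 1] / np.var(x)          # confounded estimate

# Two-stage least squares: regress x on g, then y on the fitted values.
stage1 = np.polyfit(g, x, 1)
x_hat = np.polyval(stage1, g)
b_2sls = np.cov(x_hat, y)[0, 1] / np.var(x_hat)
```

If g also affected y directly (pleiotropy), b_2sls would be biased too; handling that case, given measured pleiotropic phenotypes, is what the CIV construction addresses.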


Subjects
Genetic Pleiotropy/physiology, Mendelian Randomization Analysis/methods, Algorithms, Confounding Factors, Epidemiologic, Genetic Association Studies, Genetic Variation, Humans, Models, Genetic, Phenotype
15.
BMC Proc ; 12(Suppl 9): 20, 2018.
Article in English | MEDLINE | ID: mdl-30275877

ABSTRACT

Using data on 680 patients from the GAW20 real data set, we conducted Mendelian randomization (MR) studies to explore the causal relationships between methylation levels at selected probes (cytosine-phosphate-guanine sites [CpGs]) and high-density lipoprotein (HDL) changes (ΔHDL), using single-nucleotide polymorphisms (SNPs) as instrumental variables. Several methods were used to estimate the causal effects of CpGs of interest on ΔHDL, including a newly developed method that we call constrained instrumental variables (CIV). CIV performs automatic SNP selection while providing estimates of causal effects adjusted for possible pleiotropy, when the potentially pleiotropic phenotypes are measured. For CpGs in or near the 10 genes identified as associated with ΔHDL using a family-based VC-score test, we compared CIV to Egger regression and the two-stage least squares (TSLS) method. All three approaches selected at least one CpG in two genes (RNMT;C18orf19 and C6orf141) as showing a causal relationship with ΔHDL.

16.
BMC Proc ; 12(Suppl 9): 30, 2018.
Article in English | MEDLINE | ID: mdl-30263044

ABSTRACT

Epigenome association studies that test a large number of methylation sites suffer from stringent multiple-testing corrections. This study's goals were to investigate region-based associations between DNA methylation sites and lipid-level changes in response to treatment with fenofibrate in the GAW20 data, and to investigate whether improvements in power could be obtained by taking into account correlations between DNA methylation at neighboring cytosine-phosphate-guanine (CpG) sites. To this end, we applied both a recently developed block-based data-dimension-reduction approach and a region-based variance-component (VC) linear mixed model to the GAW20 data. We compared analyses of unrelated individuals with familial data. The region-based VC approach using unrelated (independent) individuals identified the gene LGALS9C as significantly associated with changes in triglycerides. However, univariate tests of individual CpG sites yielded no statistically significant results.

17.
BMC Genet ; 19(Suppl 1): 74, 2018 09 17.
Article in English | MEDLINE | ID: mdl-30255779

ABSTRACT

BACKGROUND: Increasingly available multilayered omics data on large populations has opened exciting analytic opportunities and posed unique challenges to robust estimation of causal effects in the setting of complex disease phenotypes. The GAW20 Causal Modeling Working Group has applied complementary approaches (eg, Mendelian randomization, structural equations modeling, Bayesian networks) to discover novel causal effects of genomic and epigenomic variation on lipid phenotypes, as well as to validate prior findings from observational studies. RESULTS: Two Mendelian randomization studies have applied novel approaches to instrumental variable selection in methylation data, identifying bidirectional causal effects of CPT1A and triglycerides, as well as of RNMT and C6orf42, on high-density lipoprotein cholesterol response to fenofibrate. The CPT1A finding also emerged in a Bayesian network study. The Mendelian randomization studies have implemented both existing and novel steps to account for pleiotropic effects, which were independently detected in the GAW20 data via a structural equation modeling approach. Two studies estimated indirect effects of genomic variation (via DNA methylation and/or correlated phenotypes) on lipid outcomes of interest. Finally, a novel weighted R2 measure was proposed to complement other causal inference efforts by controlling for the influence of outlying observations. CONCLUSIONS: The GAW20 contributions illustrate the diversity of possible approaches to causal inference in the multi-omic context, highlighting the promises and assumptions of each method and the benefits of integrating both across methods and across omics layers for the most robust and comprehensive insights into disease processes.


Subjects
Genome-Wide Association Study, Models, Genetic, Bayes Theorem, Carnitine O-Palmitoyltransferase/genetics, Cholesterol, HDL/blood, DNA Methylation, Fenofibrate/therapeutic use, Genetic Variation, Humans, Hypertriglyceridemia/drug therapy, Hypertriglyceridemia/genetics, Hypoglycemic Agents/therapeutic use, Methyltransferases/genetics, Triglycerides/blood
18.
Sci Rep ; 8(1): 220, 2018 01 09.
Article in English | MEDLINE | ID: mdl-29317680

ABSTRACT

Performance of a recently developed test for association between multivariate phenotypes and sets of genetic variants (MURAT) is demonstrated using measures of bone mineral density (BMD). By combining individual-level whole genome sequenced data from the UK10K study with imputed genome-wide genetic data on individuals from the Study of Osteoporotic Fractures (SOF) and the Osteoporotic Fractures in Men Study (MrOS), a data set of 8810 individuals was assembled; tests of association were performed between autosomal gene-sets of genetic variants and BMD measured at lumbar spine and femoral neck. Distributions of p-values obtained from analyses of a single BMD phenotype are compared to those from the multivariate tests, across several region definitions and variant weightings. There is evidence of increased power with the multivariate test, although no new loci for BMD were identified. Among 17 genes highlighted either because of significant p-values in region-based association tests or because they are well-known BMD genes, 4 windows in 2 genes, as well as 6 single SNPs in one of these genes, showed association at genome-wide significance thresholds with the multivariate phenotype test but not with the single-phenotype Sequence Kernel Association Test (SKAT).


Subjects
Bone Density/genetics , Genome-Wide Association Study/standards , Osteoporotic Fractures/genetics , Polymorphism, Single Nucleotide , Aged , Exome , Female , Femur Neck/pathology , Genome-Wide Association Study/methods , Humans , Lumbar Vertebrae/pathology , Male , Osteoporotic Fractures/pathology , Phenotype
19.
Stat Methods Med Res ; 27(5): 1331-1350, 2018 05.
Article in English | MEDLINE | ID: mdl-27460538

ABSTRACT

The genomics era has led to an increase in the dimensionality of data collected in the investigation of biological questions. In this context, dimension-reduction techniques can be used to summarise high-dimensional signals into low-dimensional ones, to further test for association with one or more covariates of interest. This paper revisits one such approach, previously known as principal component of heritability and renamed here as principal component of explained variance (PCEV). As its name suggests, the PCEV seeks a linear combination of outcomes in an optimal manner, by maximising the proportion of variance explained by one or several covariates of interest. By construction, this method optimises power; however, due to its computational complexity, it has unfortunately received little attention in the past. Here, we propose a general analytical PCEV framework that retains the assets of the original method: it is conceptually simple and free of tuning parameters. Moreover, our framework extends the range of applications of the original procedure by providing a computationally simple strategy for high-dimensional outcomes, along with exact and asymptotic testing procedures that drastically reduce its computational cost. We investigate the merits of the PCEV using an extensive set of simulations. Furthermore, the use of the PCEV approach is illustrated using three examples taken from the fields of epigenetics and brain imaging.
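The optimisation the abstract describes reduces to a generalised eigenproblem: decompose the outcome variance into model and residual parts, then find the loading vector maximising their ratio. A minimal sketch on simulated data follows; the dimensions, the single covariate, and the effect on the first three outcomes are assumptions for illustration, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 10   # samples, outcome dimension (e.g. CpG sites); illustrative

# One covariate of interest affecting the first three of p outcomes.
x = rng.normal(size=(n, 1))
beta_true = np.zeros((1, p))
beta_true[0, :3] = 0.6
y = x @ beta_true + rng.normal(size=(n, p))

xc = x - x.mean(axis=0)
yc = y - y.mean(axis=0)

# Split the outcome variance into model and residual components via OLS.
beta = np.linalg.lstsq(xc, yc, rcond=None)[0]
fitted = xc @ beta
v_model = fitted.T @ fitted
v_resid = (yc - fitted).T @ (yc - fitted)

# PCEV maximises w' V_model w / w' V_resid w: a generalised eigenproblem,
# solved here by taking the leading eigenpair of V_resid^{-1} V_model.
vals, vecs = np.linalg.eig(np.linalg.solve(v_resid, v_model))
top = int(np.argmax(vals.real))
w_pcev = vecs[:, top].real                 # optimal linear combination
h2 = vals[top].real / (1.0 + vals[top].real)  # variance explained by x
```

With the signal confined to the first three outcomes, the leading loadings concentrate there, which is the sense in which PCEV "optimises power" relative to an unsupervised principal component.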


Subjects
Analysis of Variance , Principal Component Analysis/methods , Computer Simulation , DNA Methylation , Data Interpretation, Statistical , Genes/genetics , Humans , Models, Statistical , Multivariate Analysis , Neuroimaging/statistics & numerical data
20.
Stat Appl Genet Mol Biol ; 16(5-6): 333-347, 2017 11 27.
Article in English | MEDLINE | ID: mdl-29055941

ABSTRACT

We consider the assessment of DNA methylation profiles for sequencing-derived data from a single cell type or from cell lines. We derive a kernel-smoothed EM algorithm capable of analyzing an entire chromosome at once, of simultaneously correcting for experimental errors arising either from the pre-treatment steps or from the sequencing stage, and of taking into account spatial correlations between DNA methylation profiles at neighbouring CpG sites. The outcomes of our algorithm are then used to (i) call the true methylation status at each CpG site, (ii) provide accurate smoothed estimates of DNA methylation levels, and (iii) detect differentially methylated regions. Simulations show that the proposed methodology outperforms existing analysis methods that either ignore the correlation between DNA methylation profiles at neighbouring CpG sites or do not correct for errors. The use of the proposed inference procedure is illustrated through the analysis of a publicly available data set from a cell line of induced pluripotent H9 human embryonic stem cells and also a data set where methylation measures were obtained for a small genomic region in three different immune cell types separated from whole blood.
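The error-correction and status-calling components of such an algorithm can be sketched as a two-component binomial-mixture EM on simulated read counts. The published method additionally kernel-smooths across neighbouring CpG sites, which this sketch omits; the depths, error rate, and methylated fraction below are assumed for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
sites = 2000

# Simulated bisulfite data: latent methylation status per CpG, read depth,
# and an unknown per-read error rate to be recovered by EM.
true_z = (rng.random(sites) < 0.6).astype(int)   # 1 = truly methylated
depth = rng.poisson(20, sites) + 1               # reads covering each CpG
eps_true = 0.05                                  # per-read error rate
p_read = np.where(true_z == 1, 1 - eps_true, eps_true)
meth = rng.binomial(depth, p_read)               # observed methylated counts

def loglik(k, n_reads, prob):
    # Binomial log-likelihood up to the constant binomial coefficient.
    return k * np.log(prob) + (n_reads - k) * np.log(1 - prob)

pi, eps = 0.5, 0.2                               # initial guesses
for _ in range(100):
    # E-step: posterior probability that each site is truly methylated.
    l1 = loglik(meth, depth, 1 - eps) + np.log(pi)
    l0 = loglik(meth, depth, eps) + np.log(1 - pi)
    post = 1.0 / (1.0 + np.exp(l0 - l1))
    # M-step: update the methylated fraction and the error rate, counting
    # discordant reads under each site's posterior status.
    pi = float(post.mean())
    errors = post * (depth - meth) + (1 - post) * meth
    eps = float(errors.sum() / depth.sum())

z_hat = (post > 0.5).astype(int)                 # called methylation status
```

At moderate read depth the posteriors are sharp, so the calls are nearly perfect and both the error rate and the methylated fraction are recovered accurately; kernel smoothing of the posteriors across neighbouring CpG sites, as in the paper, would mainly help at low-coverage sites.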


Subjects
Algorithms , DNA Methylation , Epigenesis, Genetic , Epigenomics/methods , Cell Line , Computer Simulation , CpG Islands , High-Throughput Nucleotide Sequencing , Single-Cell Analysis