ABSTRACT
MOTIVATION: DNA methylation is a key epigenetic modification that can modulate gene expression. Over the past decade, many studies have focused on profiling DNA methylation and investigating its alterations in complex diseases such as cancer. While early studies were mostly restricted to CpG islands or promoter regions, recent findings indicate that many important DNA methylation changes occur in other regions, so DNA methylation needs to be examined on a genome-wide scale. In this article, we apply the wavelet-based functional mixed model methodology to analyze high-throughput methylation data and identify differentially methylated loci across the genome. Unlike many commonly used methods that model probes independently, this framework accommodates spatial correlations across the genome through basis function modeling, as well as correlations between samples through functional random effects. This allows it to be applied in many different settings and potentially gives more power to detect differential methylation. RESULTS: We applied this framework to three different high-dimensional methylation data sets (CpG Shore data, THREE data and NIH Roadmap Epigenomics data) that were studied previously in other works. A simulation study based on the CpG Shore data suggested that, in terms of detecting differentially methylated loci, this wavelet-based modeling approach outperforms analogous approaches that model the loci as independent. For the THREE data, the method suggests newly detected regions of differential methylation that were not reported in the original study. AVAILABILITY AND IMPLEMENTATION: Automated software called WFMM is available at https://biostatistics.mdanderson.org/SoftwareDownload. CpG Shore data are available at http://rafalab.dfci.harvard.edu. NIH Roadmap Epigenomics data are available at http://compbio.mit.edu/roadmap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
CONTACT: jefmorris@mdanderson.org.
Subjects
DNA Methylation , CpG Islands , Genetic Epigenesis , Epigenomics , Software
ABSTRACT
PURPOSE: Consensus molecular subtyping (CMS) of colorectal cancer has the potential to reshape the colorectal cancer landscape. We developed and validated an assay that is applicable to formalin-fixed, paraffin-embedded (FFPE) samples of colorectal cancer and implemented the assay in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory. EXPERIMENTAL DESIGN: We performed an in silico experiment to build an optimal CMS classifier using a training set of 1,329 samples from 12 studies and a validation set of 1,329 samples from 14 studies. We constructed an assay on the basis of NanoString CodeSets for the top 472 genes and performed analyses on paired flash-frozen (FF)/FFPE samples from 175 colorectal cancers to adapt the classifier to FFPE samples, using a subset of genes found to be concordant between FF and FFPE. We then tested the classifier's reproducibility and repeatability and validated it in a CLIA-certified laboratory. We assessed the prognostic significance of CMS in 345 patients pooled across three clinical trials. RESULTS: The best classifier was a weighted support vector machine with high accuracy across platforms and gene lists (>0.95), and the 472-gene model outperformed existing classifiers. We constructed subsets of 99 and 200 genes with high FF/FFPE concordance and adapted an FFPE-based classifier that had strong classification accuracy (>80%) relative to "gold standard" CMS. The classifier was reproducible across sample type and RNA quality, and demonstrated poor prognosis for CMS1-3 and good prognosis for CMS2 in metastatic colorectal cancer (P < 0.001). CONCLUSIONS: We developed and validated a colorectal cancer CMS assay that is ready for use in clinical trials, to assess prognosis in standard-of-care settings and to explore as a predictor of therapy response.
Subjects
Antineoplastic Agents/therapeutic use , Tumor Biomarkers/genetics , Colorectal Neoplasms/diagnosis , Neoplastic Gene Expression Regulation , Support Vector Machine , Antineoplastic Agents/pharmacology , Colorectal Neoplasms/drug therapy , Colorectal Neoplasms/genetics , Colorectal Neoplasms/mortality , Antineoplastic Drug Resistance/genetics , Female , Gene Expression Profiling , Humans , Male , Middle Aged , Prognosis , Reproducibility of Results , Risk Assessment/methods , Transcriptome
ABSTRACT
Glaucoma, a leading cause of blindness, is characterized by optic nerve damage related to intraocular pressure (IOP), but its full etiology is unknown. Researchers at UAB have devised a custom device to measure scleral strain continuously around the eye under fixed levels of IOP, which is used here to assess how strain varies around the posterior pole, with IOP, and across glaucoma risk factors such as age. The hypothesis is that scleral strain decreases with age, which could alter the biomechanics of the optic nerve head and cause damage that could eventually lead to glaucoma. To evaluate this hypothesis, we adapted Bayesian Functional Mixed Models to model these complex data, which consist of correlated functions on the spherical scleral surface. The model includes nonparametric age effects allowed to vary in magnitude and smoothness across the scleral surface, multi-level random effect functions to capture within-subject correlation, and functional growth curve terms to capture serial correlation across IOPs that can vary around the scleral surface. Our method yields fully Bayesian inference on the scleral surface or any aggregation or transformation thereof, and reveals interesting insights into the biomechanical etiology of glaucoma. The general modeling framework described is very flexible and applicable to many complex, high-dimensional functional data.
ABSTRACT
Purpose: To determine the relationship between peripapillary scleral strain change and cumulative differential IOP exposure in nonhuman primates (NHPs) with unilateral chronic ocular hypertension. Methods: Posterior scleral shells from 6 bilaterally normal and 10 unilateral chronic ocular hypertension NHPs were pressurized from 5 to 45 mm Hg, and the resulting full-field, three-dimensional, scleral surface deformations were acquired using laser speckle interferometry. Scleral tensile strain (local tissue deformation) was calculated by analytical differentiation of the displacement field; zero strain was assumed at 5 mm Hg. Maximum principal strain was used to represent the scleral strain, and strains were averaged over a 15°-wide (~3.6-mm) circumpapillary region adjacent to the ONH. The relative difference in mean strain was calculated between fellow eyes and compared with the differential cumulative IOP exposure within NHPs during the study period. The relationship between the relative difference in scleral strain and the differential cumulative IOP exposure in fellow eyes was assessed using an F test and quadratic regression model. Results: Relative differential scleral tensile strain was significantly associated with differential cumulative IOP exposure in contralateral eyes in the chronic ocular hypertension NHPs, with the bilaterally normal NHPs showing no significant strain difference between fellow eyes. The sclera in the chronic ocular hypertension eyes was more compliant than in their fellow eyes at low levels of differential cumulative IOP exposure, but stiffer at larger differential IOPs (P < 0.0001). Conclusions: These cross-sectional findings suggest that longitudinal IOP-induced changes in scleral mechanical behavior are dependent on the magnitude of differential cumulative IOP exposure.
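As an illustration of the strain summary described in the Methods, the following is a minimal sketch, not the authors' pipeline. It assumes the maximum principal strain is taken as the largest eigenvalue of a symmetric strain tensor, and that the relative difference between fellow eyes is normalized by the control eye. The function names and the normalization are hypothetical choices made for this example.

```python
import numpy as np

def max_principal_strain(strain_tensor):
    """Largest eigenvalue of a symmetric (2x2 or 3x3) strain tensor."""
    e = np.asarray(strain_tensor, dtype=float)
    e = 0.5 * (e + e.T)  # enforce symmetry against small numerical asymmetry
    # eigvalsh returns eigenvalues in ascending order, so the last is the maximum
    return float(np.linalg.eigvalsh(e)[-1])

def relative_difference(mean_treated, mean_control):
    """Relative difference in mean strain between fellow eyes.
    Assumption: normalized by the control (fellow) eye."""
    return (mean_treated - mean_control) / mean_control

# Hypothetical strain tensor at one location on the scleral surface
example_tensor = [[0.02, 0.01], [0.01, 0.02]]
example_strain = max_principal_strain(example_tensor)
```

In practice such values would be averaged over the circumpapillary region before the relative difference is computed for each animal.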
Subjects
Intraocular Pressure/physiology , Ocular Hypertension/physiopathology , Sclera/physiology , Animals , Cross-Sectional Studies , Animal Disease Models , Glaucoma/physiopathology , Primates , Ocular Tonometry
ABSTRACT
Estimation of inverse covariance matrices, known as precision matrices, is important in various areas of statistical analysis. In this article, we consider estimation of multiple precision matrices sharing some common structures. In this setting, estimating each precision matrix separately can be suboptimal as it ignores potential common structures. This article proposes a new approach to parameterize each precision matrix as a sum of common and unique components and estimate multiple precision matrices in a constrained ℓ1 minimization framework. We establish both estimation and selection consistency of the proposed estimator in the high dimensional setting. The proposed estimator achieves a faster convergence rate for the common structure in certain cases. Our numerical examples demonstrate that our new estimator can perform better than several existing methods in terms of the entropy loss and Frobenius loss. An application to a glioblastoma cancer data set reveals some interesting gene networks across multiple cancer subtypes.
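The common-plus-unique parameterization and the two loss functions named above can be sketched as follows. This is an illustrative sketch only and does not implement the constrained ℓ1 estimator itself. It assumes the common component is the elementwise mean of the group precision matrices (so the unique components sum to zero), the standard form of the entropy loss, and an unsquared Frobenius norm. All function names are hypothetical.

```python
import numpy as np

def split_common_unique(omegas):
    """Split group precision matrices into one common and several unique parts.
    Assumption: the common part is the elementwise mean across groups."""
    common = np.mean(omegas, axis=0)
    unique = [om - common for om in omegas]
    return common, unique

def entropy_loss(sigma_true, omega_hat):
    """tr(Sigma * Omega_hat) - log det(Sigma * Omega_hat) - p."""
    p = sigma_true.shape[0]
    prod = sigma_true @ omega_hat
    _, logdet = np.linalg.slogdet(prod)
    return float(np.trace(prod) - logdet - p)

def frobenius_loss(omega_true, omega_hat):
    """Frobenius norm of the estimation error (unsquared, by assumption)."""
    return float(np.linalg.norm(omega_true - omega_hat, ord="fro"))

# Two hypothetical groups sharing one edge, with one extra edge in group 2
omega_1 = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
omega_2 = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -0.5], [0.0, -0.5, 1.0]])
common, unique = split_common_unique([omega_1, omega_2])
```

Both losses are zero when the estimate equals the true precision matrix, which makes them convenient for comparing estimators in simulations.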
ABSTRACT
Multivariate regression is a common statistical tool for practical problems. Many multivariate regression techniques are designed for the univariate response case. For problems with multiple response variables, one common approach is to apply the univariate response regression technique separately to each response variable. Although it is simple and popular, the univariate response approach ignores the joint information among response variables. In this paper, we propose three new methods for utilizing joint information among response variables. All methods are in a penalized likelihood framework with weighted L1 regularization. The proposed methods provide sparse estimators of the conditional inverse covariance matrix of the response vector given the explanatory variables, as well as sparse estimators of the regression parameters. Our first approach is to estimate the regression coefficients with plug-in estimated inverse covariance matrices, and our second approach is to estimate the inverse covariance matrix with plug-in estimated regression parameters. Our third approach is to estimate both simultaneously. Asymptotic properties of these methods are explored. Our numerical examples demonstrate that the proposed methods perform competitively in terms of prediction, variable selection, and inverse covariance matrix estimation.
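A penalized Gaussian likelihood with weighted L1 regularization on both the regression coefficients and the inverse covariance matrix can be written as a single objective. The sketch below only evaluates such an objective and is not the authors' estimation algorithm. The function name, the scaling of the likelihood term, and the choice to penalize only off-diagonal entries of the inverse covariance matrix are assumptions made for illustration.

```python
import numpy as np

def penalized_neg_loglik(Y, X, B, Omega, lam_omega, lam_b, W_omega=None, W_b=None):
    """Evaluate tr(S(B) Omega) - log det(Omega) plus weighted L1 penalties.

    Y: n x q responses, X: n x p predictors, B: p x q coefficients,
    Omega: q x q candidate inverse covariance matrix.
    Assumption: only off-diagonal entries of Omega are penalized.
    """
    n = Y.shape[0]
    resid = Y - X @ B
    S = resid.T @ resid / n  # residual covariance given B
    sign, logdet = np.linalg.slogdet(Omega)
    if sign <= 0:
        return float("inf")  # Omega must be positive definite
    fit = np.trace(S @ Omega) - logdet
    W_omega = np.ones_like(Omega, dtype=float) if W_omega is None else W_omega
    W_b = np.ones_like(B, dtype=float) if W_b is None else W_b
    off_diag = ~np.eye(Omega.shape[0], dtype=bool)
    pen_omega = lam_omega * np.sum(W_omega[off_diag] * np.abs(Omega[off_diag]))
    pen_b = lam_b * np.sum(W_b * np.abs(B))
    return float(fit + pen_omega + pen_b)
```

Under this reading, the first approach would minimize the objective over B with Omega fixed at a plug-in estimate, the second would minimize over Omega with B fixed, and the third would minimize over both.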
ABSTRACT
Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes.