ABSTRACT
BACKGROUND: Identifying gene interactions is a topic of great importance in genomics, and approaches based on network models provide a powerful tool for studying them. Assuming a Gaussian graphical model, a gene association network may be estimated from multiomic data based on the non-zero entries of the inverse covariance matrix. Inferring such biological networks is challenging because of the high dimensionality of the problem, which makes traditional estimators unsuitable. The graphical lasso is designed for the estimation of sparse inverse covariance matrices in such situations, using ℓ1-penalization on the matrix entries. The weighted graphical lasso is an extension in which prior biological information from other sources is integrated into the model. There are, however, issues with this approach, as it naïvely forces the prior information into the network estimation, even if it is misleading or does not agree with the data at hand. Further, if an associated network based on other data is used as the prior, the method often fails to utilize the information effectively. RESULTS: We propose a novel graphical lasso approach, the tailored graphical lasso, that aims to handle prior information of unknown accuracy more effectively. We provide an R package implementing the method, tailoredGlasso. Applying the method to both simulated and real multiomic data sets, we find that it outperforms the unweighted and weighted graphical lasso in terms of all performance measures we consider. In fact, the graphical lasso and weighted graphical lasso can be considered special cases of the tailored graphical lasso, and a parameter determined by the data measures the usefulness of the prior information. We also find that among a larger set of methods, the tailored graphical lasso is the most suitable for network inference from high-dimensional data with prior information of unknown accuracy.
With our method, mRNA data are demonstrated to provide highly useful prior information for protein-protein interaction networks. CONCLUSIONS: The method we introduce utilizes useful prior information more effectively without involving any risk of loss of accuracy should the prior information be misleading.
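The network is read off the non-zero entries of the estimated inverse covariance (precision) matrix Θ. The graphical lasso obtains Θ by minimizing −log det Θ + tr(SΘ) + λ Σ|θ_ij|, and the weighted variant replaces λ with entry-specific penalties λ_ij. The minimal sketch below covers only the last step. It assumes Θ has already been estimated and uses a made-up 3×3 matrix. It lists the network edges and the implied partial correlations ρ_ij = −θ_ij / √(θ_ii θ_jj). The function names are hypothetical and are not part of the tailoredGlasso package.

```python
import math


def partial_correlations(theta):
    """Partial correlations from a precision (inverse covariance) matrix.

    In a Gaussian graphical model, rho_ij = -theta_ij / sqrt(theta_ii * theta_jj);
    rho_ij is zero exactly when variables i and j are conditionally independent
    given all the other variables.
    """
    p = len(theta)
    rho = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                rho[i][j] = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
    return rho


def network_edges(theta, names, tol=1e-8):
    """Edges of the association network: non-zero off-diagonal entries of theta."""
    p = len(theta)
    return [
        (names[i], names[j])
        for i in range(p)
        for j in range(i + 1, p)
        if abs(theta[i][j]) > tol
    ]


if __name__ == "__main__":
    # Hypothetical sparse precision matrix, e.g. the output of a graphical lasso fit.
    theta = [
        [2.0, -0.8, 0.0],
        [-0.8, 2.0, 0.5],
        [0.0, 0.5, 1.0],
    ]
    genes = ["geneA", "geneB", "geneC"]
    print(network_edges(theta, genes))
    print(partial_correlations(theta)[0][1])
```

The tolerance `tol` matters because numerical solvers rarely return exact zeros. Its value is an assumption.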
Subjects
Algorithms , Gene Regulatory Networks , Genomics , Normal Distribution , Protein Interaction Maps
ABSTRACT
BACKGROUND: The aim of this study was to investigate the prognostic value of the PAM50 intrinsic subtypes and risk of recurrence (ROR) score in patients with early breast cancer and long-term follow-up. A special focus was placed on hormone receptor-positive/human epidermal growth factor receptor 2-negative (HR+/HER2-) pN0 patients not treated with chemotherapy. METHODS: Patients with early breast cancer (n = 653) enrolled in the observational Oslo1 study (1995-1998) were followed for distant recurrence and breast cancer death. Clinicopathological parameters were collected from hospital records. The primary tumors were analyzed using the Prosigna® PAM50 assay to determine the prognostic value of the intrinsic subtypes and ROR score in comparison with pathological characteristics. The primary endpoints were distant disease-free survival (DDFS) and breast cancer-specific survival (BCSS). RESULTS: Of 653 tumors, 52.2% were classified as luminal A, 26.5% as luminal B, 10.6% as HER2-enriched, and 10.7% as basal-like. Among the HR+/HER2- patients (n = 476), 37.8% were categorized as low risk by ROR score, 22.7% as intermediate risk, and 39.5% as high risk. Median follow-up durations for BCSS and DDFS were 16.6 and 7.1 years, respectively. Multivariate analysis showed that intrinsic subtypes (all patients) and ROR risk classification (HR+/HER2- patients) yielded strong prognostic information. Among the HR+/HER2- pN0 patients with no adjuvant treatment (n = 231), 53.7% of patients had a low ROR, and their prognosis at 15 years was excellent (15-year BCSS 96.3%). Patients with intermediate risk had reduced survival compared with those with low risk (p = 0.005). In contrast, no difference in survival between the low- and intermediate-risk groups was seen for HR+/HER2- pN0 patients who received tamoxifen only. Ki-67 protein, grade, and ROR score were analyzed in the unselected, untreated pT1pN0 HR+/HER2- population (n = 171). 
In multivariate analysis, ROR score outperformed both Ki-67 and grade. Furthermore, 55% of patients who, according to the PREDICT tool (http://www.predict.nhs.uk/), would be considered chemotherapy candidates were classified as ROR low risk (33%) or luminal A ROR intermediate risk (22%). CONCLUSIONS: The PAM50 intrinsic subtype classification and ROR score improve classification of patients with breast cancer into prognostic groups, allowing for a more precise identification of future recurrence risk and providing an improved basis for adjuvant treatment decisions. Node-negative patients with low ROR scores had an excellent outcome at 15 years even in the absence of adjuvant therapy.
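Survival figures such as the 15-year BCSS are typically Kaplan-Meier estimates. The subject headings list "Kaplan-Meier Estimate", but the abstract does not spell out the estimator, so treating it as Kaplan-Meier is an assumption. At each event time t, the estimator multiplies the running survival by (1 − d_t / n_t), where d_t is the number of events and n_t is the number still at risk. The minimal sketch below uses toy data, not study data, and the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time per patient (e.g. years)
    events : 1 if the event (e.g. breast cancer death) was observed, 0 if censored
    Returns (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing this follow-up time; censored patients at t
        # are still counted as at risk when the events at t are processed.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def survival_at(curve, t):
    """Value of the Kaplan-Meier step function at time t."""
    s = 1.0
    for time, value in curve:
        if time <= t:
            s = value
        else:
            break
    return s


if __name__ == "__main__":
    # Toy follow-up times (years) and event indicators.
    curve = kaplan_meier([2, 3, 3, 5, 8, 10], [1, 0, 1, 1, 0, 0])
    print(curve)
    print(survival_at(curve, 15))
```

Censored patients leave the risk set without lowering the curve. This is why the estimate stays meaningful when follow-up lengths differ, as they do between the BCSS and DDFS endpoints.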
Subjects
Biomarkers, Tumor , Breast Neoplasms/diagnosis , Breast Neoplasms/mortality , Adult , Aged , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Female , Follow-Up Studies , Humans , Kaplan-Meier Estimate , Middle Aged , Neoplasm Grading , Neoplasm Metastasis , Neoplasm Recurrence, Local , Neoplasm Staging/methods , Patient Outcome Assessment , Prognosis , Risk Assessment
ABSTRACT
For many high-dimensional studies, additional information on the variables, like (genomic) annotation or external p-values, is available. In the context of binary and continuous prediction, we develop a method for adaptive group-regularized (logistic) ridge regression, which makes structural use of such 'co-data'. Here, 'groups' refer to a partition of the variables according to the co-data. We derive empirical Bayes estimates of group-specific penalties, which have several desirable properties: (i) they are analytical; (ii) they adapt to the informativeness of the co-data for the data at hand; (iii) only one global penalty parameter requires tuning by cross-validation. In addition, the method allows use of multiple types of co-data at little extra computational effort. We show that the group-specific penalties may lead to a larger distinction between 'near-zero' and relatively large regression parameters, which facilitates post hoc variable selection. The method, termed GRridge, is implemented in an easy-to-use R package. It is demonstrated on two cancer genomics studies, both of which concern the discrimination of precancerous cervical lesions from normal cervix tissue using methylation microarray data. In both examples, GRridge clearly improves predictive performance compared with ordinary logistic ridge regression and the group lasso. In addition, we show that for the second study, the relatively good predictive performance is maintained when only 42 variables are selected.
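Group-specific penalties enter the estimate in a simple way. In linear ridge regression, every variable j in co-data group g receives penalty λ·m_g, so β = (XᵀX + Λ)⁻¹Xᵀy, where Λ is diagonal. The minimal sketch below covers only this step. It takes the group multipliers m_g as fixed, hypothetical inputs. It does not implement GRridge's empirical Bayes estimation of those multipliers or its logistic model, and the function names are illustrative rather than the GRridge R API.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def group_ridge(X, y, groups, global_lambda, group_multipliers):
    """Linear ridge regression with group-specific penalties (hypothetical helper).

    Variable j in group g gets penalty global_lambda * group_multipliers[g], so
    beta = (X^T X + diag(penalties))^{-1} X^T y. Only global_lambda would be
    tuned by cross-validation; the multipliers encode the co-data groups.
    """
    n, p = len(X), len(X[0])
    penalties = [global_lambda * group_multipliers[groups[j]] for j in range(p)]
    xtx = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)] for a in range(p)]
    for j in range(p):
        xtx[j][j] += penalties[j]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Orthonormal toy design: the group with the larger multiplier is shrunk more.
    X = [[1.0, 0.0], [0.0, 1.0]]
    print(group_ridge(X, [2.0, 3.0], ["informative", "other"], 1.0,
                      {"informative": 1.0, "other": 2.0}))
```

A group with a small multiplier is shrunk less. This is how informative co-data can separate "near-zero" coefficients from relatively large ones.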
Subjects
Genetic Testing/statistics & numerical data , Precancerous Conditions/diagnosis , Research Design/statistics & numerical data , Uterine Cervical Neoplasms/diagnosis , Bayes Theorem , Computer Simulation , DNA Methylation/genetics , Female , Genetic Testing/methods , Humans , Logistic Models , Precancerous Conditions/genetics , Predictive Value of Tests , Reproducibility of Results , Research Design/standards , Uterine Cervical Neoplasms/genetics
ABSTRACT
UNLABELLED: Recently developed methods that couple next-generation sequencing with chromosome conformation capture-based techniques, such as Hi-C and ChIA-PET, allow for characterization of genome-wide chromatin 3D structure. Understanding the organization of chromatin in three dimensions is a crucial next step in unraveling global gene regulation, and methods for analyzing such data are needed. We have developed HiBrowse, a user-friendly web tool offering a range of hypothesis-based and descriptive statistics, using realistic assumptions in null models. AVAILABILITY AND IMPLEMENTATION: HiBrowse is supported by all major browsers, and is freely available at http://hyperbrowser.uio.no/3d. The software is implemented in Python, and source code is available for download by following the instructions on the main site.
Subjects
Chromatin/chemistry , Software , Data Interpretation, Statistical , Genome , Genomics/methods , High-Throughput Nucleotide Sequencing
ABSTRACT
The study of chromatin 3D structure has recently received much attention owing to novel techniques for detecting genome-wide chromatin contacts using next-generation sequencing. A deeper understanding of the architecture of DNA inside the nucleus is crucial for gaining insight into fundamental processes such as transcriptional regulation, genome dynamics and genome stability. Chromosome conformation capture-based methods, such as Hi-C and ChIA-PET, are now paving the way for routine genome-wide studies of chromatin 3D structure in a range of organisms and tissues. However, appropriate methods for analyzing such data are lacking. Here, we propose a hypothesis test and an enrichment score for 3D co-localization of genomic elements that handle intra- or interchromosomal interactions, both separately and jointly, and that adjust for biases caused by structural dependencies in the 3D data. We show that maintaining structural properties during resampling is essential for obtaining valid P-value estimates. We apply the method to chromatin states and a set of mutated regions in leukemia cells, and find significant co-localization of these elements, with varying enrichment scores, supporting the role of chromatin 3D structure in shaping the landscape of somatic mutations in cancer.
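The key point is that the resampling scheme must preserve structure for the P-values to be valid. This can be illustrated with a toy resampling test, which is not the authors' procedure. The statistic is the sum of contacts among bins that carry an element. The null set is every circular shift of the element positions along the bins. Shifting all elements by the same offset keeps their relative spacing, so clustered elements stay clustered under the null. The contact matrix, the wrap-around at the ends and the function names are all simplifying assumptions.

```python
def colocalization_statistic(contacts, bins):
    """Sum of contact counts over all pairs of bins that carry an element."""
    b = sorted(bins)
    return sum(contacts[b[i]][b[j]] for i in range(len(b)) for j in range(i + 1, len(b)))


def circular_shift_pvalue(contacts, bins):
    """Exact p-value over all circular shifts of the element positions.

    Every shift k moves each element from bin x to bin (x + k) mod n, which keeps
    the spacing between elements intact. The observed configuration (k = 0) is
    part of the null set, so the p-value is never zero.
    """
    n = len(contacts)
    observed = colocalization_statistic(contacts, bins)
    null = [
        colocalization_statistic(contacts, {(x + k) % n for x in bins})
        for k in range(n)
    ]
    p_value = sum(1 for t in null if t >= observed) / n
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical symmetric contact matrix for 4 bins; elements sit in bins 0 and 1.
    contacts = [
        [0, 10, 0, 1],
        [10, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0],
    ]
    print(circular_shift_pvalue(contacts, {0, 1}))
```

Permuting element positions independently would instead break up their clustering and could produce overly small P-values. This is the kind of bias the abstract warns against.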
Subjects
Chromatin/chemistry , Cell Line, Tumor , Chromosomes, Human/chemistry , Data Interpretation, Statistical , Genome , Humans , Leukemia/genetics , Mutation , Nucleic Acid Conformation , Sequence Analysis, DNA
ABSTRACT
The PAM50 gene expression subtypes and the associated risk of recurrence (ROR) score are used to predict the risk of recurrence and the benefit of adjuvant therapy in early-stage breast cancer. The Prosigna assay includes the PAM50 subtypes along with clinicopathological features, and is approved for treatment recommendations regarding adjuvant hormonal therapy and chemotherapy in hormone-receptor-positive early breast cancer. The Prosigna test uses RNA extracted from macrodissected tumor cells obtained from formalin-fixed, paraffin-embedded (FFPE) tissue sections. However, RNA extracted from fresh-frozen (FF) bulk tissue without macrodissection is widely used for research purposes and yields high-quality RNA for downstream analyses. To investigate the impact of the sample preparation approach on ROR scores, we analyzed 94 breast carcinomas from an observational study that had gene expression data available from both macrodissected FFPE tissue and FF bulk tumor tissue, along with the clinically approved Prosigna scores for the node-negative, hormone-receptor-positive, HER2-negative cases (n = 54). ROR scores were calculated in R; the resulting two sets of scores from FFPE and FF samples were compared, and treatment recommendations were evaluated. Overall, ROR scores calculated from the macrodissected FFPE tissue were consistent with the Prosigna scores. However, analyses of bulk tissue yielded a higher proportion of cases classified as normal-like; these were samples with relatively low tumor cellularity, leading to lower ROR scores. When comparing ROR risk categories (low, intermediate, and high), discordant cases between the two preparation approaches were found among the luminal tumors; the recommended treatment would have changed in a minority of cases.
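The comparison step can be sketched as follows: map each ROR score to a risk category, then flag samples whose category differs between FFPE and FF preparations. The study computed its scores in R. This Python sketch is illustrative only. The abstract does not state the cut-offs. The values used here (0-40 low, 41-60 intermediate, 61-100 high on a 0-100 scale) are an assumption based on the Prosigna thresholds for node-negative disease and should be checked against the assay documentation. The function names and scores are hypothetical.

```python
# ASSUMED cut-offs (Prosigna, node-negative, ROR on a 0-100 scale); not taken from
# the abstract. Verify against the assay documentation before any real use.
LOW_MAX = 40
INTERMEDIATE_MAX = 60


def ror_category(score):
    """Map a ROR score to a risk category under the assumed cut-offs."""
    if score <= LOW_MAX:
        return "low"
    if score <= INTERMEDIATE_MAX:
        return "intermediate"
    return "high"


def discordant_cases(ffpe_scores, ff_scores):
    """Indices of samples whose risk category differs between the two preparations."""
    return [
        i
        for i, (a, b) in enumerate(zip(ffpe_scores, ff_scores))
        if ror_category(a) != ror_category(b)
    ]


if __name__ == "__main__":
    # Made-up paired scores; lower FF scores mimic reduced tumor cellularity.
    ffpe = [35, 55, 70, 45]
    ff = [30, 38, 72, 62]
    print(discordant_cases(ffpe, ff))
```

Because treatment recommendations follow the category rather than the raw score, only samples near a cut-off can change recommendation. A modest shift in score from bulk tissue therefore affects only a minority of cases.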
ABSTRACT
Ductal carcinoma in situ (DCIS) is a non-invasive type of breast cancer with a highly variable potential for becoming invasive and affecting mortality. Currently, many patients with DCIS are overtreated due to the lack of specific biomarkers that distinguish low-risk lesions from those with a higher risk of progression. In this study, we analyzed 57 pure DCIS and 313 invasive breast cancers (IBC) from different patients. Three levels of genomic data were obtained: gene expression, DNA methylation, and DNA copy number. We performed subtype-stratified analyses and identified key differences between DCIS and IBC that suggest subtype-specific progression. Prominent differences were found in tumors of the basal-like subtype: basal-like DCIS were less proliferative and showed a higher degree of differentiation than basal-like IBC. Also, core basal tumors (characterized by high correlation to the basal-like centroid) were identified among IBC but not among DCIS. At the copy number level, basal-like DCIS exhibited fewer copy number aberrations than basal-like IBC. An intriguing finding from analysis of the methylome was hypermethylation of multiple protocadherin genes in basal-like IBC compared with basal-like DCIS and normal tissue, possibly caused by long-range epigenetic silencing. This points to silencing of cell adhesion-related genes specifically in IBC of the basal-like subtype. Our work confirms that subtype stratification is essential when studying progression from DCIS to IBC; we provide evidence that basal-like DCIS show less aggressive characteristics, and we question the assumption that basal-like DCIS is a direct precursor of basal-like invasive breast cancer.
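The "core basal" label relies on correlation to a subtype centroid. PAM50-style classifiers are commonly described as assigning each sample to the centroid with the highest Spearman correlation. The minimal sketch below shows that nearest-centroid step on made-up 4-gene profiles; real centroids use 50 genes and normalized data. The threshold that makes a tumor "core" is a hypothetical parameter here, since the abstract does not give its value.

```python
import math


def ranks(values):
    """Ranks 1..n, using average ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def nearest_centroid(sample, centroids):
    """Return (best subtype, correlation to every centroid)."""
    cors = {name: spearman(sample, c) for name, c in centroids.items()}
    best = max(cors, key=cors.get)
    return best, cors


def is_core(sample, centroids, subtype, threshold):
    """Hypothetical 'core' rule: assigned to subtype AND correlation >= threshold."""
    best, cors = nearest_centroid(sample, centroids)
    return best == subtype and cors[subtype] >= threshold


if __name__ == "__main__":
    centroids = {"Basal": [3.0, 2.0, 1.0, 0.0], "LumA": [0.0, 1.0, 2.0, 3.0]}
    print(nearest_centroid([2.9, 2.1, 0.8, 0.1], centroids))
```

Under such a rule, a tumor can be called basal-like without being "core basal". This matches the abstract's observation that basal-like DCIS lacked the core basal tumors seen among IBC.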
ABSTRACT
BACKGROUND: Using high-dimensional penalized regression, we studied genome-wide DNA methylation in bone biopsies of 80 postmenopausal women in relation to their bone mineral density (BMD). The women's BMD ranged from severely osteoporotic to normal. Global gene expression data from the same individuals were available, and since DNA methylation often affects gene expression, the overall aim of this paper was to include both of these omics data sets in an integrated analysis. METHODS: Classical penalized regression uses a single penalty, but we incorporated individual penalties for each of the DNA-methylation sites. These individual penalties were guided by the strength of association between DNA methylation and gene transcript levels. DNA-methylation sites that were highly associated with one or more transcripts received lower penalties and were therefore favored over sites showing less association with expression. Because of the complex pathways and interactions among genes, we investigated both the association between DNA-methylation sites and their corresponding cis genes and the association between DNA-methylation sites and trans-located genes. Two integrative penalized methods were used: first, an adaptive group-regularized ridge regression, and second, a modified version of the weighted lasso for variable selection. RESULTS: When information from gene expression was integrated, predictive performance, measured by predictive mean squared error, was considerably improved compared with classical penalized regression without data integration. We found a 14.7% improvement in the ridge regression case and a 17% improvement in the lasso case. Our version of the weighted lasso with data integration found a list of 22 interesting methylation sites. Several corresponded to genes that are known to be important in bone formation.
Using BMD as the response and these 22 methylation sites as covariates, least-squares regression analysis resulted in R² = 0.726, compared with an average R² = 0.438 for 10,000 randomly selected groups of 22 DNA-methylation sites. CONCLUSIONS: Two recent types of penalized regression methods were adapted to integrate DNA-methylation data and their association with gene expression in the analysis of bone mineral density. In both cases, predictions clearly benefited from including the additional information on gene expression.
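The weighted lasso replaces the single penalty λ with λ·w_j for each methylation site j. It minimizes (1/2n)‖y − Xβ‖² + λ Σ w_j |β_j|, and coordinate descent updates each β_j by soft-thresholding at λ·w_j. The minimal sketch below implements that generic weighted lasso, not the authors' modified version. The mapping from expression association to weights (w_j = 1 − max |r|) is a hypothetical choice that only reproduces the stated idea: sites strongly associated with transcripts are penalized less.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def penalty_weights(max_abs_corr):
    """HYPOTHETICAL mapping: stronger association with any transcript -> smaller weight."""
    return [1.0 - r for r in max_abs_corr]


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - X beta||^2 + lam * sum_j weights[j] * |beta_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm = sum(v * v for v in col) / n
            # Correlation of column j with the partial residual (j's contribution added back).
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * weights[j]) / norm
            if new != beta[j]:
                for i in range(n):
                    residual[i] -= col[i] * (new - beta[j])
                beta[j] = new
    return beta


if __name__ == "__main__":
    # Toy design with two orthogonal "methylation sites".
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    y = [2.0, 1.0, 2.0, 1.0]
    print(weighted_lasso(X, y, 0.4, [1.0, 2.0]))  # heavily penalized site 2 drops out
    print(weighted_lasso(X, y, 0.4, [1.0, 0.5]))  # lightly penalized site 2 is kept
```

With the same data and the same λ, changing only a site's weight decides whether it is selected. This is the mechanism by which expression information steers variable selection.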
Subjects
Bone Density/genetics , DNA Methylation , Data Analysis , Gene Expression Profiling , Postmenopause/genetics , Postmenopause/physiology , Cohort Studies , Female , Genomics , Humans , Multivariate Analysis , Regression Analysis
ABSTRACT
DNA methylation affects expression of associated genes and may contribute to the missing genetic effects from genome-wide association studies of osteoporosis. To improve insight into the mechanisms of postmenopausal osteoporosis, we combined transcript profiling with DNA methylation analyses in bone. RNA and DNA were isolated from 84 bone biopsies of postmenopausal donors varying markedly in bone mineral density (BMD). In all, 2529 CpGs in the top 100 genes most significantly associated with BMD were analyzed. The methylation levels at 63 CpGs differed significantly between healthy and osteoporotic women at a 10% false discovery rate (FDR). Five of these CpGs, significant at 5% FDR, could explain 14% of BMD variation. To test whether blood DNA methylation reflects the situation in bone (as shown for other tissues), an independent cohort was selected, and BMD association was demonstrated in blood for 13 of the 63 CpGs. Four transcripts representing inhibitors of bone metabolism (MEPE, SOST, WIF1, and DKK1) showed correlation with a high number of methylated CpGs at 5% FDR. Our results link DNA methylation to the genetic influence modifying the skeleton, and the data suggest a complex interaction between CpG methylation and gene regulation. This study, conducted in the largest number of postmenopausal women examined to date, is the first to demonstrate a strong association among bone CpG methylation, transcript levels, and BMD/fracture. This new insight may have implications for evaluating osteoporosis stage and susceptibility.
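The abstract reports CpGs selected at 10% and 5% FDR without naming the procedure. The Benjamini-Hochberg step-up procedure is the standard choice, but assuming it was used here is our assumption. The procedure sorts the m p-values, finds the largest rank k with p_(k) ≤ (k/m)·q, and rejects the k smallest. The minimal sketch below uses made-up p-values and shows how the selected set shrinks as the FDR level q is tightened.

```python
def benjamini_hochberg(p_values, fdr):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level fdr."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    # Step-up rule: keep the LARGEST rank whose p-value clears its threshold,
    # even if some smaller ranks do not.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            k_max = rank
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Made-up p-values for five CpGs.
    p = [0.01, 0.045, 0.025, 0.20, 0.002]
    print(benjamini_hochberg(p, 0.10))
    print(benjamini_hochberg(p, 0.05))
```

The set selected at the stricter level is always a subset of the one at the looser level. This matches the abstract's structure, where five CpGs at 5% FDR come from the 63 found at 10% FDR.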
Subjects
DNA Methylation , Osteoporosis, Postmenopausal/genetics , Adaptor Proteins, Signal Transducing/genetics , Adaptor Proteins, Signal Transducing/metabolism , Aged , Aged, 80 and over , Blood Cells/metabolism , Bone Density/genetics , Bone Morphogenetic Proteins/genetics , Bone Morphogenetic Proteins/metabolism , Bone and Bones/metabolism , Case-Control Studies , CpG Islands , Extracellular Matrix Proteins/genetics , Extracellular Matrix Proteins/metabolism , Female , Genetic Markers/genetics , Glycoproteins/genetics , Glycoproteins/metabolism , Humans , Intercellular Signaling Peptides and Proteins/genetics , Intercellular Signaling Peptides and Proteins/metabolism , Middle Aged , Phosphoproteins/genetics , Phosphoproteins/metabolism , Repressor Proteins/genetics , Repressor Proteins/metabolism
ABSTRACT
Combining genome-wide structural models with phenomenological data is at the forefront of efforts to understand the organizational principles regulating the human genome. Here, we use chromosome-chromosome contact data as knowledge-based constraints for large-scale three-dimensional models of the human diploid genome. The resulting models remain minimally entangled and acquire several functional features that are observed in vivo and that were never used as input for the model. We find, for instance, that gene-rich, active regions are drawn towards the nuclear center, while gene-poor and lamina-associated domains are pushed to the periphery. These and other properties persist upon adding local contact constraints, suggesting their compatibility with non-local constraints for the genome organization. The results show that suitable combinations of data analysis and physical modelling can expose the unexpectedly rich, functionally related properties implicit in chromosome-chromosome contact data. Specific directions are suggested for further developments based on combining experimental data analysis and genomic structural modelling.
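Contact data are used as constraints on a 3D model. A toy version of that idea, far simpler than the authors' large-scale genome models, treats each constraint as a harmonic restraint. Each restraint (i, j, d) adds (|x_i − x_j| − d)² to an energy, and bead coordinates are relaxed by gradient descent. The beads, target distances, step size and function names below are all hypothetical and do not represent the authors' modelling protocol.

```python
import math


def restraint_energy(coords, restraints):
    """Sum over restraints (i, j, d) of (|x_i - x_j| - d)^2."""
    return sum((math.dist(coords[i], coords[j]) - d) ** 2 for i, j, d in restraints)


def relax(coords, restraints, step=0.05, n_steps=2000):
    """Plain gradient descent on restraint_energy; returns new coordinates."""
    x = [list(c) for c in coords]
    for _ in range(n_steps):
        grad = [[0.0, 0.0, 0.0] for _ in x]
        for i, j, d in restraints:
            diff = [a - b for a, b in zip(x[i], x[j])]
            r = math.sqrt(sum(v * v for v in diff))
            if r == 0.0:
                continue  # gradient undefined for coincident beads; skip this restraint
            # d/dx_i (r - d)^2 = 2 (r - d) (x_i - x_j) / r, and the opposite sign for x_j.
            coef = 2.0 * (r - d) / r
            for k in range(3):
                grad[i][k] += coef * diff[k]
                grad[j][k] -= coef * diff[k]
        for p in range(len(x)):
            for k in range(3):
                x[p][k] -= step * grad[p][k]
    return x


if __name__ == "__main__":
    # Three hypothetical beads: two "backbone" restraints and one "contact" restraint.
    beads = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
    restraints = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.5)]
    print(restraint_energy(beads, restraints))
    print(restraint_energy(relax(beads, restraints), restraints))
```

Real genome models add excluded volume, chain connectivity and thousands of beads. The point of the toy is only that properties not encoded in the restraints, such as the final positions of beads, emerge from satisfying them.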