ABSTRACT
Fingerprints are of long-standing practical and cultural interest, but little is known about the mechanisms that underlie their variation. Using genome-wide scans in Han Chinese cohorts, we identified 18 loci associated with fingerprint type across the digits, including a genetic basis for the long-recognized "pattern-block" correlations among the middle three digits. In particular, we identified a variant near EVI1 that alters regulatory activity and established a role for EVI1 in dermatoglyph patterning in mice. Dynamic EVI1 expression during human development supports its role in shaping the limbs and digits, rather than influencing skin patterning directly. Trans-ethnic meta-analysis identified 43 fingerprint-associated loci, with nearby genes being strongly enriched for general limb development pathways. We also found that fingerprint patterns were genetically correlated with hand proportions. Taken together, these findings support the key role of limb development genes in influencing the outcome of fingerprint patterning.
Subjects
Dermatoglyphics, Fingers/growth & development, Organogenesis/genetics, Single Nucleotide Polymorphism, Toes/growth & development, Adolescent, Adult, Aged, Aged 80 and over, Animals, Asian People/genetics, Body Patterning/genetics, Child, Cohort Studies, Female, Forelimb/growth & development, Genetic Loci, Genome-Wide Association Study, Humans, MDS1 and EVI1 Complex Locus Protein/genetics, Male, Mice, Middle Aged, Young Adult
ABSTRACT
An outstanding challenge of epigenome-wide association studies (EWASs) performed in complex tissues is the identification of the specific cell type(s) responsible for the observed differential DNA methylation. Here we present a statistical algorithm called CellDMC ( https://github.com/sjczheng/EpiDISH ), which can identify differentially methylated positions and the specific cell type(s) driving the differential methylation. We validated CellDMC on in silico mixtures of DNA methylation data generated with different technologies, as well as on real mixtures from epigenome-wide association and cancer epigenome studies. CellDMC achieved over 90% sensitivity and specificity in scenarios where current state-of-the-art methods did not identify differential methylation. By applying CellDMC to an EWAS performed in buccal swabs, we identified smoking-associated differentially methylated positions occurring in the epithelial compartment, which we validated in smoking-related lung cancer. CellDMC may be useful in the identification of causal DNA-methylation alterations in disease.
Subjects
DNA Methylation, DNA/analysis, Genetic Epigenesis, Epigenomics/methods, Genetic Markers, Genome-Wide Association Study, DNA Sequence Analysis/methods, Algorithms, Rheumatoid Arthritis/genetics, Rheumatoid Arthritis/pathology, Breast Neoplasms/genetics, Breast Neoplasms/pathology, CpG Islands, Endometrial Neoplasms/genetics, Endometrial Neoplasms/pathology, Female, Humans, Lung Neoplasms/genetics, Lung Neoplasms/pathology, Smoking/adverse effects, Smoking/genetics
ABSTRACT
MOTIVATION: The biological interpretation of differentially methylated sites derived from Epigenome-Wide-Association Studies (EWAS) remains a significant challenge. Gene Set Enrichment Analysis (GSEA) is a general tool to aid biological interpretation, yet its correct and unbiased implementation in the EWAS context is difficult due to the differential probe representation of Illumina Infinium DNA methylation beadchips. RESULTS: We present a novel GSEA method, called ebGSEA, which ranks genes, not CpGs, according to the overall level of differential methylation, as assessed using all the probes mapping to the given gene. Applied on simulated and real EWAS data, we show how ebGSEA may exhibit higher sensitivity and specificity than the current state-of-the-art, whilst also avoiding differential probe representation bias. Thus, ebGSEA will be a useful additional tool to aid the interpretation of EWAS data. AVAILABILITY AND IMPLEMENTATION: ebGSEA is available from https://github.com/aet21/ebGSEA, and has been incorporated into the ChAMP Bioconductor package (https://www.bioconductor.org). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
DNA Methylation, Epigenome, Probability
ABSTRACT
SUMMARY: It is well recognized that cell-type heterogeneity hampers the interpretation of Epigenome-Wide Association Studies (EWAS). Many tools have emerged to address this issue, including several R/Bioconductor packages that infer cell-type composition. Here we present a web application for cell-type deconvolution, which offers the functionality of our EpiDISH Bioconductor/R package in a user-friendly GUI environment. Users can upload their data to infer cell-type composition and differentially methylated cytosines in individual cell-types (DMCTs) for a range of different tissues. AVAILABILITY AND IMPLEMENTATION: EpiDISH web server is implemented with Shiny in R, and is freely available at https://www.biosino.org/EpiDISH/.
ABSTRACT
BACKGROUND: Intra-sample cellular heterogeneity presents numerous challenges to the identification of biomarkers in large Epigenome-Wide Association Studies (EWAS). While a number of reference-based deconvolution algorithms have emerged, their potential remains underexplored and a comparative evaluation of these algorithms beyond tissues such as blood is still lacking. RESULTS: Here we present a novel framework for reference-based inference, which leverages cell-type specific DNAse Hypersensitive Site (DHS) information from the NIH Epigenomics Roadmap to construct an improved reference DNA methylation database. We show that this leads to a marginal but statistically significant improvement of cell-count estimates in whole blood as well as in mixtures involving epithelial cell-types. Using this framework we compare a widely used state-of-the-art reference-based algorithm (called constrained projection) to two non-constrained approaches including CIBERSORT and a method based on robust partial correlations. We conclude that the widely-used constrained projection technique may not always be optimal. Instead, we find that the method based on robust partial correlations is generally more robust across a range of different tissue types and for realistic noise levels. We call the combined algorithm which uses DHS data and robust partial correlations for inference, EpiDISH (Epigenetic Dissection of Intra-Sample Heterogeneity). Finally, we demonstrate the added value of EpiDISH in an EWAS of smoking. CONCLUSIONS: Estimating cell-type fractions and subsequent inference in EWAS may benefit from the use of non-constrained reference-based cell-type deconvolution methods.
Subjects
Algorithms, Epigenomics/methods, Genetic Markers, Genome-Wide Association Study, Humans
ABSTRACT
BACKGROUND: RNA velocity analysis of single cells offers the potential to predict temporal dynamics from gene expression. In many systems, RNA velocity has been observed to produce a vector field that qualitatively reflects known features of the system. However, the limitations of RNA velocity estimates are still not well understood. RESULTS: We analyze the impact of different steps in the RNA velocity workflow on direction and speed. We consider both high-dimensional velocity estimates and low-dimensional velocity vector fields mapped onto an embedding. We conclude the transition probability method for mapping velocity estimates onto an embedding is effectively interpolating in the embedding space. Our findings reveal a significant dependence of the RNA velocity workflow on smoothing via the k-nearest-neighbors (k-NN) graph of the observed data. This reliance results in considerable estimation errors for both direction and speed in both high- and low-dimensional settings when the k-NN graph fails to accurately represent the true data structure; this is an unknown feature of real data. RNA velocity performs poorly at estimating speed in both low- and high-dimensional spaces, except in very low noise settings. We introduce a novel quality measure that can identify when RNA velocity should not be used. CONCLUSIONS: Our findings emphasize the importance of choices in the RNA velocity workflow and highlight critical limitations of data analysis. We advise against over-interpreting expression dynamics using RNA velocity, particularly in terms of speed. Finally, we emphasize that the use of RNA velocity in assessing the correctness of a low-dimensional embedding is circular.
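The dependence on k-NN smoothing described in this abstract can be illustrated with a toy sketch. The code below is not the RNA velocity workflow itself; it only shows the smoothing step, assuming Euclidean distance and a plain mean over each cell and its k nearest neighbours. The function name `knn_smooth` and the example data are hypothetical.

```python
import math


def knn_smooth(cells, k):
    """Replace each cell's profile with the mean over itself and its k nearest neighbours."""
    smoothed = []
    for cell in cells:
        # Rank all cells by Euclidean distance to the current cell.
        # The cell itself has distance 0, so it always comes first.
        order = sorted(range(len(cells)), key=lambda j: math.dist(cell, cells[j]))
        neighbours = order[: k + 1]
        smoothed.append([
            sum(cells[j][g] for j in neighbours) / len(neighbours)
            for g in range(len(cell))
        ])
    return smoothed


# Two well-separated groups of two cells each (toy data, two "genes" per cell).
cells = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]]

within_group = knn_smooth(cells, k=1)   # neighbours stay inside each group
across_groups = knn_smooth(cells, k=2)  # neighbours now include the other group
```

With `k=1` each cell is averaged only with its true neighbour, whereas with `k=2` the first cell is pulled toward the other group. This is the kind of distortion the abstract attributes to a k-NN graph that does not represent the true data structure.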
Subjects
Probability, Cluster Analysis
ABSTRACT
BACKGROUND: Changes in cell-type composition of tissues are associated with a wide range of diseases and environmental risk factors and may be causally implicated in disease development and progression. However, these shifts in cell-type fractions are often of a low magnitude, or involve similar cell subtypes, making their reliable identification challenging. DNA methylation profiling in a tissue like blood is a promising approach to discover shifts in cell-type abundance, yet studies have only been performed at a relatively low cellular resolution and in isolation, limiting their power to detect shifts in tissue composition. METHODS: Here we derive a DNA methylation reference matrix for 12 immune-cell types in human blood and extensively validate it with flow-cytometric count data and in whole-genome bisulfite sequencing data of sorted cells. Using this reference matrix, we perform a directional Stouffer and fixed effects meta-analysis comprising 23,053 blood samples from 22 different cohorts, to comprehensively map associations between the 12 immune-cell fractions and common phenotypes. In a separate cohort of 4386 blood samples, we assess associations between immune-cell fractions and health outcomes. RESULTS: Our meta-analysis reveals many associations of cell-type fractions with age, sex, smoking and obesity, many of which we validate with single-cell RNA sequencing. We discover that naïve and regulatory T-cell subsets are higher in women compared to men, while the reverse is true for monocyte, natural killer, basophil, and eosinophil fractions. Decreased natural killer counts were associated with smoking, obesity, and stress levels, while an increased count correlates with exercise and sleep. Analysis of health outcomes revealed that increased naïve CD4+ T-cell and N-cell fractions were associated with a reduced risk of all-cause mortality independently of all major epidemiological risk factors and baseline co-morbidity. A machine learning predictor built only with immune-cell fractions achieved a C-index value for all-cause mortality of 0.69 (95%CI 0.67-0.72), which increased to 0.83 (0.80-0.86) upon inclusion of epidemiological risk factors and baseline co-morbidity. CONCLUSIONS: This work contributes an extensively validated high-resolution DNAm reference matrix for blood, which is made freely available, and uses it to generate a comprehensive map of associations between immune-cell fractions and common phenotypes, including health outcomes.
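As a rough illustration of the directional Stouffer meta-analysis named in the METHODS, the sketch below combines per-cohort two-sided p-values into one signed z-score. It is a minimal sketch, not the authors' implementation: the square-root-of-sample-size weighting is an assumption, the function name `directional_stouffer` is hypothetical, and every p-value must lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def directional_stouffer(effects, p_values, sample_sizes):
    """Combine two-sided p-values from several cohorts into one signed z-score."""
    norm = NormalDist()
    numerator = 0.0
    weight_sq_sum = 0.0
    for effect, p, n in zip(effects, p_values, sample_sizes):
        # Turn the two-sided p-value into a z-score magnitude,
        # then attach the sign of the cohort's effect estimate.
        z = math.copysign(norm.inv_cdf(1.0 - p / 2.0), effect)
        w = math.sqrt(n)  # assumed weighting: square root of cohort size
        numerator += w * z
        weight_sq_sum += w * w
    z_meta = numerator / math.sqrt(weight_sq_sum)
    p_meta = 2.0 * (1.0 - norm.cdf(abs(z_meta)))
    return z_meta, p_meta


# Toy inputs: three cohorts, two with positive effects and one with a negative effect.
z_meta, p_meta = directional_stouffer(
    effects=[0.2, 0.3, -0.1],
    p_values=[0.01, 0.04, 0.5],
    sample_sizes=[100, 400, 100],
)
```

Because each z-score carries the sign of its effect, cohorts that disagree in direction partly cancel, which is what distinguishes the directional variant from combining unsigned p-values.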
Subjects
DNA Methylation, T-Lymphocytes, Male, Humans, Female, T-Lymphocytes/metabolism, Phenotype, Obesity/metabolism, Health Care Outcome Assessment
ABSTRACT
BACKGROUND: The cell cycle is a highly conserved, continuous process which controls faithful replication and division of cells. Single-cell technologies have enabled increasingly precise measurements of the cell cycle both as a biological process of interest and as a possible confounding factor. Despite its importance and conservation, there is no universally applicable approach to infer position in the cell cycle with high-resolution from single-cell RNA-seq data. RESULTS: Here, we present tricycle, an R/Bioconductor package, to address this challenge by leveraging key features of the biology of the cell cycle, the mathematical properties of principal component analysis of periodic functions, and the use of transfer learning. We estimate a cell-cycle embedding using a fixed reference dataset and project new data into this reference embedding, an approach that overcomes key limitations of learning a dataset-dependent embedding. Tricycle then predicts a cell-specific position in the cell cycle based on the data projection. The accuracy of tricycle compares favorably to gold-standard experimental assays, which generally require specialized measurements in specifically constructed in vitro systems. Using internal controls which are available for any dataset, we show that tricycle predictions generalize to datasets with multiple cell types, across tissues, species, and even sequencing assays. CONCLUSIONS: Tricycle generalizes across datasets and is highly scalable and applicable to atlas-level single-cell RNA-seq data.
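The projection step described in the RESULTS can be sketched as follows. This is an illustration only and not the tricycle package's API: it assumes that the reference embedding is given as two fixed loading vectors, that the new cell's expression vector is already centered, and that position in the cycle is read off as the polar angle of the projected point. All names and numbers are hypothetical.

```python
import math


def project(expression, loadings):
    """Project one centered expression vector onto fixed reference loading vectors."""
    return [sum(x * w for x, w in zip(expression, axis)) for axis in loadings]


def cell_cycle_angle(expression, loadings):
    """Return an angle in [0, 2*pi) for a cell projected into a 2-D reference embedding."""
    pc1, pc2 = project(expression, loadings)
    return math.atan2(pc2, pc1) % (2.0 * math.pi)


# Hypothetical reference loadings for three genes (two embedding axes).
reference_loadings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

# The third gene has zero loading on both axes, so it does not affect the angle.
theta = cell_cycle_angle([0.0, 1.0, 5.0], reference_loadings)
```

Because the loadings are fixed in advance, the angle for a new cell does not depend on which other cells are in the dataset, which is the property the abstract contrasts with a dataset-dependent embedding.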
Subjects
Machine Learning, Single-Cell Analysis, Cell Cycle/genetics, Principal Component Analysis, RNA Sequence Analysis, Exome Sequencing
ABSTRACT
We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Monorail can be used to process local and/or private data, allowing results to be directly compared to any study in recount3. Taken together, our tools help biologists maximize the utility of publicly available RNA-seq data, especially to improve their understanding of newly collected data. recount3 is available from http://rna.recount.bio .
Subjects
RNA Splicing, RNA-Seq/methods, RNA/genetics, Animals, Base Sequence, Computational Biology/methods, Exons, Gene Expression Regulation, High-Throughput Nucleotide Sequencing, Humans, Mice, RNA Sequence Analysis/methods, Software
ABSTRACT
Highly reproducible smoking-associated DNA methylation changes in whole blood have been reported by many Epigenome-Wide-Association Studies (EWAS). These epigenetic alterations could have important implications for understanding and predicting the risk of smoking-related diseases. To this end, it is important to establish if these DNA methylation changes happen in all blood cell subtypes or if they are cell-type specific. Here, we apply a cell-type deconvolution algorithm to identify cell-type specific DNA methylation signals in seven large EWAS. We find that most of the highly reproducible smoking-associated hypomethylation signatures are more prominent in the myeloid lineage. A meta-analysis further identifies a myeloid-specific smoking-associated hypermethylation signature enriched for DNase Hypersensitive Sites in acute myeloid leukemia. These results may guide the design of future smoking EWAS and have important implications for our understanding of how smoking affects immune-cell subtypes and how this may influence the risk of smoking related diseases.
Subjects
DNA Methylation/drug effects, Epigenome, Smoking/adverse effects, Algorithms, Asian People, Blood, CpG Islands, Epigenomics/methods, Ethnicity, Female, Humans, Lymphocytes, Male, Middle Aged, Statistical Models, Myeloid Cells
ABSTRACT
Age-associated DNA methylation changes have been widely reported across many different tissue and cell types. Epigenetic 'clocks' that can predict chronological age with a surprisingly high degree of accuracy appear to do so independently of tissue and cell-type, suggesting that a component of epigenetic drift is cell-type independent. However, the relative amount of age-associated DNAm changes that are specific to a cell or tissue type versus the amount that occurs independently of cell or tissue type is unclear and a matter of debate, with a recent study concluding that most epigenetic drift is tissue-specific. Here, we perform a novel comprehensive statistical analysis, including matched multi cell-type and multi-tissue DNA methylation profiles from the same individuals and adjusting for cell-type heterogeneity, demonstrating that a substantial amount of epigenetic drift, possibly over 70%, is shared between significant numbers of different tissue/cell types. We further show that ELOVL2 is not unique and that many other CpG sites, some mapping to genes in the Wnt and glutamate receptor signaling pathways, are altered with age across at least 10 different cell/tissue types. We propose that, while most age-associated DNAm changes are shared between cell types, the putative functional effect is likely to be tissue-specific.
Subjects
Aging/physiology, CD4-Positive T-Lymphocytes/metabolism, CD8-Positive T-Lymphocytes/metabolism, DNA Methylation, Genetic Epigenesis, Fibroblasts/metabolism, Brain, Cervix Uteri/cytology, Cheek, Female, Humans, Liver, Male
ABSTRACT
AIM: An outstanding challenge in epigenome studies is the estimation of cell-type proportions in complex epithelial tissues. MATERIALS & METHODS: Here, we construct and validate a DNA methylation reference and algorithm for complex tissues that contain epithelial, immune and nonimmune stromal cells. RESULTS: Using this reference, we show that easily accessible tissues such as saliva, buccal and cervix exhibit substantial variation in immune cell (IC) contamination. We further validate our reference in the context of oral cancer, where it correctly predicts increased IC infiltration in cancer but suppressed infiltration in patients with the highest smoking exposure. Finally, our method can improve the specificity of differentially methylated CpG calls in epithelial cancer. CONCLUSION: The degree and variation of IC contamination in complex epithelial tissues are substantial. We provide a valuable resource and tool for assessing the epithelial purity and IC contamination of samples and for identifying differential methylation in such complex tissues.
Subjects
Algorithms, Cervix Uteri/immunology, DNA Methylation, Genetic Epigenesis, Mouth Mucosa/immunology, Saliva/immunology, Cervix Uteri/cytology, CpG Islands/genetics, Female, Genome-Wide Association Study, Humans, Mouth Mucosa/cytology, Saliva/cytology
ABSTRACT
A major challenge faced by epigenome-wide association studies (EWAS) is cell-type heterogeneity. As many EWAS have already demonstrated, adjusting for changes in cell-type composition can be critical when analyzing and interpreting findings from such studies. Because of their importance, a great number of different statistical algorithms, which adjust for cell-type composition, have been proposed. Some of the methods are 'reference based' in that they require a priori defined reference DNA methylation profiles of cell types that are present in the tissue of interest, while other algorithms are 'reference free.' At present, however, it is unclear how best to adjust for cell-type heterogeneity, as this may also largely depend on the type of tissue and phenotype being considered. Here, we provide a critical review of the major existing algorithms for correcting cell-type composition in the context of Illumina Infinium Methylation Beadarrays, with the aim of providing useful recommendations to the EWAS community.
Subjects
DNA Methylation, Genetic Epigenesis, Epigenomics/methods, Genetic Variation, Genome-Wide Association Study/methods, Animals, Epigenomics/standards, Genome-Wide Association Study/standards, Humans, Phenotype
ABSTRACT
It is well established that the DNA methylation landscape of normal cells undergoes a gradual modification with age, termed 'epigenetic drift'. Here, we review the current state of knowledge of epigenetic drift and its potential role in cancer etiology. We propose a new terminology to help distinguish the different components of epigenetic drift, with the aim of clarifying the role of the epigenetic clock, mitotic clocks and active changes, which accumulate in response to environmental disease risk factors. We further highlight the growing evidence that epigenetic changes associated with cancer risk factors may play an important causal role in cancer development, and that monitoring these molecular changes in normal cells may offer novel risk prediction and disease prevention strategies.
Subjects
Genetic Epigenesis, Neoplasms/genetics, Animals, DNA Methylation, Neoplastic Gene Expression Regulation, Humans, Risk Factors
ABSTRACT
BACKGROUND: Hypermethylation of transcription factor promoters bivalently marked in stem cells is a cancer hallmark. However, the biological significance of this observation for carcinogenesis is unclear given that most of these transcription factors are not expressed in any given normal tissue. METHODS: We analysed the dynamics of gene expression between human embryonic stem cells, fetal and adult normal tissue, as well as six different matching cancer types. In addition, we performed an integrative multi-omic analysis of matched DNA methylation, copy number, mutational and transcriptomic data for these six cancer types. RESULTS: We here demonstrate that bivalently and PRC2 marked transcription factors highly expressed in a normal tissue are more likely to be silenced in the corresponding tumour type compared with non-housekeeping genes that are also highly expressed in the same normal tissue. Integrative multi-omic analysis of matched DNA methylation, copy number, mutational and transcriptomic data for six different matching cancer types reveals that in-cis promoter hypermethylation, and not in-cis genomic loss or genetic mutation, emerges as the predominant mechanism associated with silencing of these transcription factors in cancer. However, we also observe that some silenced bivalently/PRC2 marked transcription factors are more prone to copy number loss than promoter hypermethylation, pointing towards distinct, mutually exclusive inactivation patterns. CONCLUSIONS: These data provide statistical evidence that inactivation of cell fate-specifying transcription factors in cancer is an important step in carcinogenesis and that it occurs predominantly through a mechanism associated with promoter hypermethylation.
Subjects
Carcinogenesis/genetics, Neoplastic Gene Expression Regulation, Gene Silencing, Neoplasm Proteins/genetics, Neoplasms/genetics, Transcription Factors/genetics, Adult, Carcinogenesis/metabolism, Carcinogenesis/pathology, Computational Biology, CpG Islands, DNA Copy Number Variations, DNA Methylation, Fetus, Gene Expression Profiling, Human Embryonic Stem Cells/cytology, Human Embryonic Stem Cells/metabolism, Humans, Neoplasm Proteins/metabolism, Neoplasms/classification, Neoplasms/metabolism, Neoplasms/pathology, Polycomb Repressive Complex 2/genetics, Polycomb Repressive Complex 2/metabolism, Genetic Promoter Regions, Transcription Factors/metabolism
ABSTRACT
BACKGROUND: Variation in cancer risk among somatic tissues has been attributed to variations in the underlying rate of stem cell division. For a given tissue type, variable cancer risk between individuals is thought to be influenced by extrinsic factors which modulate this rate of stem cell division. To date, no molecular mitotic clock has been developed that approximates the number of stem cell divisions in a tissue of an individual and is correlated with cancer risk. RESULTS: Here, we integrate mathematical modeling with prior biological knowledge to construct a DNA methylation-based age-correlative model which approximates a mitotic clock in both normal and cancer tissue. By focusing on promoter CpG sites that localize to Polycomb group target genes that are unmethylated in 11 different fetal tissue types, we show that increases in DNA methylation at these sites define a tick rate which correlates with the estimated rate of stem cell division in normal tissues. Using matched DNA methylation and RNA-seq data, we further show that it correlates with an expression-based mitotic index in cancer tissue. We demonstrate that this mitotic-like clock is universally accelerated in cancer, including pre-cancerous lesions, and that it is also accelerated in normal epithelial cells exposed to a major carcinogen. CONCLUSIONS: Unlike other epigenetic and mutational clocks or the telomere clock, the epigenetic clock proposed here provides a concrete example of a mitotic-like clock which is universally accelerated in cancer and precancerous lesions.
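One simple way to summarise "increases in DNA methylation at these sites" is to average the methylation level over the selected promoter CpG sites. The sketch below assumes that summary; the abstract does not spell out the model, so this should not be read as the authors' exact method. The site identifiers and the function name `mitotic_score` are hypothetical.

```python
def mitotic_score(beta_values, clock_sites):
    """Average methylation (beta values between 0 and 1) over the selected CpG sites."""
    # Use only clock sites that were actually measured in this sample.
    values = [beta_values[site] for site in clock_sites if site in beta_values]
    if not values:
        raise ValueError("none of the clock sites are present in this sample")
    return sum(values) / len(values)


# Hypothetical clock sites: unmethylated in fetal tissue, so any gain reflects accumulated change.
clock_sites = ["site_a", "site_b"]

normal_sample = {"site_a": 0.05, "site_b": 0.15, "other_site": 0.90}
tumour_sample = {"site_a": 0.30, "site_b": 0.50, "other_site": 0.90}

normal_score = mitotic_score(normal_sample, clock_sites)
tumour_score = mitotic_score(tumour_sample, clock_sites)
```

Under this assumption, a higher score in the tumour sample than in the matched normal sample would correspond to the acceleration the abstract reports.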