ABSTRACT
BACKGROUND: Transcription factors bind DNA in specific sequence contexts. In addition to distinguishing one nucleobase from another, some transcription factors can distinguish between unmodified and modified bases. Current models of transcription factor binding tend not to take DNA modifications into account, while the recent few that do often have limitations. This makes a comprehensive and accurate profiling of transcription factor affinities difficult. RESULTS: Here, we develop methods to identify transcription factor binding sites in modified DNA. Our models expand the standard A/C/G/T DNA alphabet to include cytosine modifications. We develop Cytomod to create modified genomic sequences and we also enhance the MEME Suite, adding the capacity to handle custom alphabets. We adapt the well-established position weight matrix (PWM) model of transcription factor binding affinity to this expanded DNA alphabet. Using these methods, we identify modification-sensitive transcription factor binding motifs. We confirm established binding preferences, such as the preference of ZFP57 and C/EBPβ for methylated motifs and the preference of c-Myc for unmethylated E-box motifs. CONCLUSIONS: Using known binding preferences to tune model parameters, we discover novel modified motifs for a wide array of transcription factors. Finally, we validate our binding preference predictions for OCT4 using cleavage under targets and release using nuclease (CUT&RUN) experiments across conventional, methylation-, and hydroxymethylation-enriched sequences. Our approach readily extends to other DNA modifications. As more genome-wide single-base resolution modification data becomes available, we expect that our method will yield insights into altered transcription factor binding affinities across many different modifications.
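As a rough illustration of what adapting a PWM to an expanded alphabet can look like, the sketch below builds a log-odds matrix over six symbols and scores sequences against it. It is not the authors' implementation and does not use Cytomod or the MEME Suite: the symbols `m` (5-methylcytosine) and `h` (5-hydroxymethylcytosine), the uniform background, the pseudocount, and the toy binding sites are all assumptions chosen for the example.

```python
import math

# Illustrative expanded alphabet: the four standard bases plus
# 'm' (5-methylcytosine) and 'h' (5-hydroxymethylcytosine).
# These symbol choices are assumptions made for this sketch.
ALPHABET = "ACGTmh"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log-odds PWM (one dict per position) from aligned sites."""
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    if background is None:
        # Assumed uniform background over the expanded alphabet.
        background = {sym: 1.0 / len(ALPHABET) for sym in ALPHABET}
    pwm = []
    for pos in range(width):
        counts = {sym: pseudocount for sym in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1.0
        total = sum(counts.values())
        pwm.append(
            {sym: math.log2((counts[sym] / total) / background[sym])
             for sym in ALPHABET}
        )
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds for a window as wide as the PWM."""
    if len(window) != len(pwm):
        raise ValueError("window width must match PWM width")
    return sum(column[sym] for column, sym in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (offset, score) of the highest-scoring window, or None."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        s = score(pwm, sequence[start:start + width])
        if best is None or s > best[1]:
            best = (start, s)
    return best


# Hypothetical toy sites with a methylated cytosine at position 3.
toy_sites = ["TGmCGC", "TGmCGC", "TGmCGA"]
toy_pwm = build_pwm(toy_sites)
```

With these toy sites, `score(toy_pwm, "TGmCGC")` exceeds `score(toy_pwm, "TGCCGC")`, which is the kind of modification-sensitive preference such a matrix can express. Treating each modified base as its own symbol keeps the standard PWM arithmetic unchanged; only the alphabet and the background frequencies grow.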
Subjects
Gene Expression Regulation; Transcription Factors; Epigenomics; DNA; Epigenesis, Genetic
ABSTRACT
DNA comprises molecular information stored in genetic and epigenetic bases, both of which are vital to our understanding of biology. Most DNA sequencing approaches address either genetics or epigenetics and thus capture incomplete information. Methods widely used to detect epigenetic DNA bases fail to capture common C-to-T mutations or distinguish 5-methylcytosine from 5-hydroxymethylcytosine. We present a single base-resolution sequencing methodology that sequences complete genetics and the two most common cytosine modifications in a single workflow. DNA is copied and bases are enzymatically converted. Coupled decoding of bases across the original and copy strand provides a phased digital readout. Methods are demonstrated on human genomic DNA and cell-free DNA from a blood sample of a patient with cancer. The approach is accurate, requires low DNA input and has a simple workflow and analysis pipeline. Simultaneous, phased reading of genetic and epigenetic bases provides a more complete picture of the information stored in genomes and has applications throughout biomedicine.
ABSTRACT
Genome sequencing has revealed over 300 million genetic variations in human populations. Over 90% of variants are single nucleotide polymorphisms (SNPs); the remainder include short deletions or insertions and small numbers of structural variants. Hundreds of thousands of these variants have been associated with specific phenotypic traits and diseases through genome-wide association studies, which link significant differences in variant frequencies with specific phenotypes among large groups of individuals. Only 5% of disease-associated SNPs are located in gene coding sequences, with the potential to disrupt gene expression or alter the function of encoded proteins. The remaining 95% of disease-associated SNPs are located in non-coding DNA sequences, which make up 98% of the genome. The role of non-coding, disease-associated SNPs, many of which are located at considerable distances from any gene, remained a mystery until the discovery that gene promoters regularly interact with distal regulatory elements to control gene expression. Disease-associated SNPs are enriched at the millions of gene regulatory elements that are dispersed throughout the non-coding sequences of the genome, suggesting they function as gene regulation variants. Assigning specific regulatory elements to the genes they control is not straightforward since they can be millions of base pairs apart. In this review, we describe how understanding 3D genome organization can identify specific interactions between gene promoters and distal regulatory elements and how 3D genomics can link disease-associated SNPs to their target genes. Understanding which gene or genes contribute to a specific disease is the first step in designing rational therapeutic interventions.
ABSTRACT
Early detection of cancer will improve survival rates. The blood biomarker 5-hydroxymethylcytosine has been shown to discriminate cancer. In a large covariate-controlled study of over two thousand individual blood samples, we created, tested and explored the properties of a 5-hydroxymethylcytosine-based classifier to detect colorectal cancer (CRC). In an independent validation sample set, the classifier discriminated CRC samples from controls with an area under the receiver operating characteristic curve (AUC) of 90% (95% CI [87, 93]). Sensitivity was 55% at 95% specificity. Performance was similar for early stage 1 (AUC 89%; 95% CI [83, 94]) and late stage 4 CRC (AUC 94%; 95% CI [89, 98]). The classifier could detect CRC even when the proportion of tumor DNA in blood was undetectable by other methods. Expanding the classifier to include information about cell-free DNA fragment size and abundance across the genome led to gains in sensitivity (63% at 95% specificity), with similar overall performance (AUC 91%; 95% CI [89, 94]). We confirm that 5-hydroxymethylcytosine can be used to detect CRC, even in early-stage disease. Therefore, the inclusion of 5-hydroxymethylcytosine in multianalyte testing could improve sensitivity for the detection of early-stage cancer.
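The two performance measures quoted above, AUC and sensitivity at a fixed specificity, can be computed from classifier scores roughly as follows. This is a minimal sketch and not the study's analysis pipeline: the function names, the rule for choosing the threshold, and the integer scores in the example are assumptions made for illustration only.

```python
import math


def auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_at_specificity(case_scores, control_scores, specificity=0.95):
    """Return (sensitivity, threshold) at a target specificity.

    Assumed rule: the threshold is the lowest control score such that at
    least `specificity` of controls score at or below it; a sample is
    called positive only if its score is strictly above the threshold.
    """
    ranked = sorted(control_scores)
    # Number of controls that must be called negative; the small epsilon
    # guards against floating-point noise in the product.
    n_negative = max(1, math.ceil(specificity * len(ranked) - 1e-9))
    threshold = ranked[n_negative - 1]
    positives = sum(1 for case in case_scores if case > threshold)
    return positives / len(case_scores), threshold


# Hypothetical scores, not data from the study.
example_controls = list(range(1, 21))
example_cases = [5, 15, 25, 30]
example_auc = auc(example_cases, example_controls)
example_sens, example_threshold = sensitivity_at_specificity(
    example_cases, example_controls, specificity=0.95
)
```

Fixing specificity first and then reading off sensitivity reflects how a screening test is usually tuned: the false-positive rate among healthy individuals is capped, and the detection rate is whatever the classifier achieves at that operating point.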
Subjects
Cell-Free Nucleic Acids; Colorectal Neoplasms; Biomarkers, Tumor/genetics; Cell-Free Nucleic Acids/genetics; Colorectal Neoplasms/diagnosis; Colorectal Neoplasms/genetics; Colorectal Neoplasms/pathology; DNA/genetics; Early Detection of Cancer/methods; Humans; Sensitivity and Specificity
ABSTRACT
Imprinting is the preferential expression of one parental allele over the other. It is controlled primarily through differential methylation of cytosine at CpG dinucleotides. Here we combine 285 methylomes and 11,617 transcriptomes from peripheral blood samples with parent-of-origin phased haplotypes, to produce a new map of imprinted methylation and gene expression patterns across the human genome. We demonstrate how imprinted methylation is a continuous rather than a binary characteristic. We describe at high resolution the parent-of-origin methylation pattern at the 15q11.2 Prader-Willi/Angelman syndrome locus, with nearly confluent stochastic paternal methylation punctuated by 'spikes' of maternal methylation. We find examples of polymorphic imprinted methylation unrelated (at VTRNA2-1 and PARD6G) or related (at CHRNE) to nearby SNP genotypes. We observe RNA isoform-specific imprinted expression patterns suggestive of a methylation-sensitive transcriptional elongation block. Finally, we gain new insights into parent-of-origin-specific effects on phenotypes at the DLK1/MEG3 and GNAS loci.
Subjects
DNA Methylation/genetics; Genome, Human; Genomic Imprinting/physiology; Inheritance Patterns/genetics; Parents; Transcriptome/genetics; Angelman Syndrome/genetics; Case-Control Studies; Chromosomes, Human, Pair 15; Cohort Studies; CpG Islands/genetics; Female; Genetic Loci; Humans; Iceland; Male; Polymorphism, Single Nucleotide; Prader-Willi Syndrome/genetics; Quantitative Trait Loci/genetics
ABSTRACT
Homozygosity for Slc25a21(tm1a(KOMP)Wtsi) results in mice exhibiting orofacial abnormalities, alterations in carpal and rugae structures, hearing impairment and inflammation in the middle ear. In humans it has been hypothesised that the 2-oxoadipate mitochondrial carrier coded by SLC25A21 may be involved in the disease 2-oxoadipate acidaemia. Unexpectedly, no 2-oxoadipate acidaemia-like symptoms were observed in animals homozygous for Slc25a21(tm1a(KOMP)Wtsi) despite confirmation that this allele reduces Slc25a21 expression by 71.3%. To study the complete knockout, an allelic series was generated using the loxP and FRT sites typical of a Knockout Mouse Project allele. After removal of the critical exon and neomycin selection cassette, Slc25a21 knockout mice homozygous for the Slc25a21(tm1b(KOMP)Wtsi) and Slc25a21(tm1d(KOMP)Wtsi) alleles were phenotypically indistinguishable from wild-type. This led us to explore the genomic environment of Slc25a21 and to discover that expression of Pax9, located 3' of the target gene, was reduced in homozygous Slc25a21(tm1a(KOMP)Wtsi) mice. We hypothesise that the presence of the selection cassette is the cause of the observed downregulation of Pax9. The phenotypes we observed in homozygous Slc25a21(tm1a(KOMP)Wtsi) mice were broadly consistent with a hypomorphic Pax9 allele, with the exception of otitis media and hearing impairment, which may be a novel consequence of Pax9 downregulation. We explore the ramifications associated with this particular targeted mutation and emphasise the need to interpret phenotypes taking into consideration all potential underlying genetic mechanisms.