RESUMO
BACKGROUND: Transcription factors bind DNA in specific sequence contexts. In addition to distinguishing one nucleobase from another, some transcription factors can distinguish between unmodified and modified bases. Current models of transcription factor binding tend not to take DNA modifications into account, while the recent few that do often have limitations. This makes a comprehensive and accurate profiling of transcription factor affinities difficult. RESULTS: Here, we develop methods to identify transcription factor binding sites in modified DNA. Our models expand the standard A/C/G/T DNA alphabet to include cytosine modifications. We develop Cytomod to create modified genomic sequences and we also enhance the MEME Suite, adding the capacity to handle custom alphabets. We adapt the well-established position weight matrix (PWM) model of transcription factor binding affinity to this expanded DNA alphabet. Using these methods, we identify modification-sensitive transcription factor binding motifs. We confirm established binding preferences, such as the preference of ZFP57 and C/EBPß for methylated motifs and the preference of c-Myc for unmethylated E-box motifs. CONCLUSIONS: Using known binding preferences to tune model parameters, we discover novel modified motifs for a wide array of transcription factors. Finally, we validate our binding preference predictions for OCT4 using cleavage under targets and release using nuclease (CUT&RUN) experiments across conventional, methylation-, and hydroxymethylation-enriched sequences. Our approach readily extends to other DNA modifications. As more genome-wide single-base resolution modification data becomes available, we expect that our method will yield insights into altered transcription factor binding affinities across many different modifications.
Assuntos
Regulação da Expressão Gênica , Fatores de Transcrição , Epigenômica , DNA , Epigênese GenéticaAssuntos
Metilação de DNA , Epigenômica/métodos , Histonas/metabolismo , Motivos de Nucleotídeos , Software , HumanosRESUMO
Summary: Chromatin immunoprecipitation-sequencing is widely used to find transcription factor binding sites, but suffers from various sources of noise. Knocking out the target factor mitigates noise by acting as a negative control. Paired wild-type and knockout (KO) experiments can generate improved motifs but require optimal differential analysis. We introduce peaKO-a computational method to automatically optimize motif analyses with KO controls, which we compare to two other methods. PeaKO often improves elucidation of the target factor and highlights the benefits of KO controls, which far outperform input controls. Availability and implementation: PeaKO is freely available at https://peako.hoffmanlab.org. Contact: michael.hoffman@utoronto.ca.
RESUMO
Covalent DNA modifications, such as 5-methylcytosine (5mC), are increasingly the focus of numerous research programs. In eukaryotes, both 5mC and 5-hydroxymethylcytosine (5hmC) are now recognized as stable epigenetic marks, with diverse functions. Bacteria, archaea, and viruses contain various other modified DNA nucleobases. Numerous databases describe RNA and histone modifications, but no database specifically catalogues DNA modifications, despite their broad importance in epigenetic regulation. To address this need, we have developed DNAmod: the DNA modification database. DNAmod is an open-source database ( https://dnamod.hoffmanlab.org ) that catalogues DNA modifications and provides a single source to learn about their properties. DNAmod provides a web interface to easily browse and search through these modifications. The database annotates the chemical properties and structures of all curated modified DNA bases, and a much larger list of candidate chemical entities. DNAmod includes manual annotations of available sequencing methods, descriptions of their occurrence in nature, and provides existing and suggested nomenclature. DNAmod enables researchers to rapidly review previous work, select mapping techniques, and track recent developments concerning modified bases of interest.
RESUMO
Somatic mutations reported in large-scale breast cancer (BC) sequencing studies primarily consist of protein coding mutations. mRNA splicing mutation analyses have been limited in scope, despite their prevalence in Mendelian genetic disorders. We predicted splicing mutations in 442 BC tumour and matched normal exomes from The Cancer Genome Atlas Consortium (TCGA). These splicing defects were validated by abnormal expression changes in these tumours. Of the 5,206 putative mutations identified, exon skipping, leaky or cryptic splicing was confirmed for 988 variants. Pathway enrichment analysis of the mutated genes revealed mutations in 9 NCAM1-related pathways, which were significantly increased in samples with evidence of lymph node metastasis, but not in lymph node-negative tumours. We suggest that comprehensive reporting of DNA sequencing data should include non-trivial splicing analyses to avoid missing clinically-significant deleterious splicing mutations, which may reveal novel mutated pathways present in genetic disorders.
Assuntos
Neoplasias da Mama/genética , Neoplasias da Mama/metabolismo , Mutação , Splicing de RNA , Transdução de Sinais , Processamento Alternativo , Neoplasias da Mama/diagnóstico , Neoplasias da Mama/patologia , Antígeno CD56/metabolismo , Biologia Computacional , Variações do Número de Cópias de DNA , Análise Mutacional de DNA , Bases de Dados de Ácidos Nucleicos , Feminino , Ordem dos Genes , Loci Gênicos , Humanos , Linfonodos/patologia , Metástase Linfática , Polimorfismo de Nucleotídeo Único , Reprodutibilidade dos TestesRESUMO
Interpretation of variants present in complete genomes or exomes reveals numerous sequence changes, only a fraction of which are likely to be pathogenic. Mutations have been traditionally inferred from allele frequencies and inheritance patterns in such data. Variants predicted to alter mRNA splicing can be validated by manual inspection of transcriptome sequencing data, however this approach is intractable for large datasets. These abnormal mRNA splicing patterns are characterized by reads demonstrating either exon skipping, cryptic splice site use, and high levels of intron inclusion, or combinations of these properties. We present, Veridical, an in silico method for the automatic validation of DNA sequencing variants that alter mRNA splicing. Veridical performs statistically valid comparisons of the normalized read counts of abnormal RNA species in mutant versus non-mutant tissues. This leverages large numbers of control samples to corroborate the consequences of predicted splicing variants in complete genomes and exomes.