ABSTRACT
BACKGROUND: Autism spectrum disorder (ASD) is a common and etiologically heterogeneous neurodevelopmental disorder. Although many genetic causes have been identified (> 200 ASD-risk genes), no single gene variant accounts for > 1% of all ASD cases. A role for epigenetic mechanisms in ASD etiology is supported by the fact that many ASD-risk genes function as epigenetic regulators and evidence that epigenetic dysregulation can interrupt normal brain development. Gene-specific DNAm profiles have been shown to assist in the interpretation of variants of unknown significance. Therefore, we investigated the epigenome in patients with ASD or two of the most common genomic variants conferring increased risk for ASD. Genome-wide DNA methylation (DNAm) was assessed using the Illumina Infinium HumanMethylation450 and MethylationEPIC arrays in blood from individuals with ASD of heterogeneous, undefined etiology (n = 52), and individuals with 16p11.2 deletions (16p11.2del, n = 9) or pathogenic variants in the chromatin modifier CHD8 (CHD8+/-, n = 7). RESULTS: DNAm patterns did not clearly distinguish heterogeneous ASD cases from controls. However, the homogeneous genetically-defined 16p11.2del and CHD8+/- subgroups each exhibited unique DNAm signatures that distinguished 16p11.2del or CHD8+/- individuals from each other and from heterogeneous ASD and control groups with high sensitivity and specificity. These signatures also classified additional 16p11.2del (n = 9) and CHD8 (n = 13) variants as pathogenic or benign. Our findings that DNAm alterations in each signature target unique genes in relevant biological pathways including neural development support their functional relevance. Furthermore, genes identified in our CHD8+/- DNAm signature in blood overlapped differentially expressed genes in CHD8+/- human-induced pluripotent cell-derived neurons and cerebral organoids from independent studies. 
CONCLUSIONS: DNAm signatures can provide clinical utility complementary to next-generation sequencing in the interpretation of variants of unknown significance. Our study constitutes a novel approach for ASD risk-associated molecular classification that elucidates the vital cross-talk between genetics and epigenetics in the etiology of ASD.
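The sensitivity and specificity mentioned above can be illustrated with a minimal sketch. It assumes each sample has a known label and a label predicted by some signature-based classifier; the function name, the label strings, and the toy data are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="case"):
    """Return (sensitivity, specificity) for a two-class prediction.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    The label strings are illustrative assumptions, not the study's coding.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Return NaN when a class is absent, since the ratio is undefined.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy data (hypothetical): 3 cases and 4 controls.
truth = ["case", "case", "case", "control", "control", "control", "control"]
pred = ["case", "case", "control", "control", "control", "control", "case"]
sens, spec = sensitivity_specificity(truth, pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```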
Subject(s)
Autism Spectrum Disorder/genetics , Autistic Disorder/genetics , Chromosome Disorders/genetics , DNA Methylation , DNA-Binding Proteins/genetics , Genome-Wide Association Study/methods , Intellectual Disability/genetics , Transcription Factors/genetics , Adolescent , Case-Control Studies , Child , Child, Preschool , Chromosome Deletion , Chromosomes, Human, Pair 16/genetics , Epigenesis, Genetic , Female , Gene Regulatory Networks , Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , Infant , Male , Sensitivity and Specificity , Sequence Analysis, DNA
ABSTRACT
Alternative pre-mRNA splicing is a major cellular process by which functionally diverse proteins can be generated from the primary transcript of a single gene, often in tissue-specific patterns. The current study investigates the hypothesis that splicing of tissue-specific alternative exons is regulated in part by control sequences in adjacent introns and that such elements may be recognized via computational analysis of exons sharing a highly specific expression pattern. We have identified 25 brain-specific alternative cassette exons, compiled a dataset of genomic sequences encompassing these exons and their adjacent introns, and used word contrast algorithms to analyze key features of these nucleotide sequences. By comparison to a control group of constitutive exons, brain-specific exons were often found to possess the following: divergent 5' splice sites; highly pyrimidine-rich upstream introns; a paucity of GGG motifs in the downstream intron; and a highly statistically significant over-representation of the hexanucleotide UGCAUG in the proximal downstream intron. UGCAUG was also found at a high frequency downstream of a smaller group of muscle-specific exons. Intriguingly, UGCAUG has been identified previously in a few intron splicing enhancers. Our results indicate that this element plays a much wider role than previously appreciated in the regulated tissue-specific splicing of many alternative exons.
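As a rough illustration of the kind of motif comparison described above, the sketch below counts UGCAUG in the first part of each downstream intron and reports the fraction of introns that contain it. This is not the word contrast algorithm used in the study; the 100-nt definition of "proximal", the function names, and the toy sequences are all assumptions made for the example.

```python
def count_motif(seq, motif="UGCAUG"):
    """Count occurrences of motif in seq, allowing overlapping matches."""
    seq = seq.upper().replace("T", "U")      # accept DNA or RNA spelling
    motif = motif.upper().replace("T", "U")
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)   # step by 1 to catch overlaps
    return count


def fraction_with_motif(downstream_introns, motif="UGCAUG", window=100):
    """Fraction of introns with >= 1 motif in their first `window` nt.

    `window` is a hypothetical stand-in for the "proximal" region.
    """
    if not downstream_introns:
        return 0.0
    hits = sum(
        1 for intron in downstream_introns
        if count_motif(intron[:window], motif) > 0
    )
    return hits / len(downstream_introns)


# Toy sequences (invented for illustration, not real introns).
tissue_specific = ["GUAAGUUGCAUGCCUUGCAUGA", "GUGAGUACCUUGCAUGUU", "GUAAGACCCUUUCC"]
constitutive = ["GUAAGUGGGACCGGGUU", "GUGAGUUGCAUGAA", "GUAAGAGGGCCC", "GUAAGUCCGGGA"]
print(fraction_with_motif(tissue_specific), fraction_with_motif(constitutive))
```

A real analysis would also need a statistical test of the difference between the two groups, which this sketch leaves out.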
Subject(s)
Alternative Splicing , Brain/metabolism , Introns/genetics , RNA Precursors/genetics , Regulatory Sequences, Nucleic Acid , Algorithms , Base Sequence , DNA/genetics , Exons/genetics , Genes/genetics , Humans
ABSTRACT
Sotos syndrome (SS) represents an important human model system for the study of epigenetic regulation; it is an overgrowth/intellectual disability syndrome caused by mutations in a histone methyltransferase, NSD1. As layered epigenetic modifications are often interdependent, we propose that pathogenic NSD1 mutations have a genome-wide impact on the most stable epigenetic mark, DNA methylation (DNAm). By interrogating DNAm in SS patients, we identify a genome-wide, highly significant NSD1(+/-)-specific signature that differentiates pathogenic NSD1 mutations from controls, benign NSD1 variants and the clinically overlapping Weaver syndrome. Validation studies of independent cohorts of SS and controls assigned 100% of these samples correctly. This highly specific and sensitive NSD1(+/-) signature encompasses genes that function in cellular morphogenesis and neuronal differentiation, reflecting cardinal features of the SS phenotype. The identification of SS-specific genome-wide DNAm alterations will facilitate both the elucidation of the molecular pathophysiology of SS and the development of improved diagnostic testing.
Subject(s)
DNA Methylation/genetics , Genome, Human , Intracellular Signaling Peptides and Proteins/metabolism , Nuclear Proteins/metabolism , Sotos Syndrome/genetics , Gene Expression Regulation , Histone Methyltransferases , Histone-Lysine N-Methyltransferase , Humans , Intracellular Signaling Peptides and Proteins/genetics , Mutation , Nuclear Proteins/genetics
ABSTRACT
While the popular DNA sequence alignment tools incorporate powerful heuristics to allow for fast and accurate alignment of DNA, most of them still optimize the classical Needleman-Wunsch scoring scheme. The development of novel scoring schemes is often hampered by the difficulty of finding an optimizing algorithm for each non-trivial scheme. In this paper we define the broad class of rectangle scoring schemes, and describe an algorithm and tool that can align two sequences with an arbitrary rectangle scoring scheme in polynomial time. Rectangle scoring schemes encompass some of the popular alignment scoring metrics currently in use, as well as many other functions. We investigate a novel scoring function based on minimizing the expected number of random diagonals observed with the given scores and show that it rivals the LAGAN and Clustal-W aligners, without using any biological or evolutionary parameters. The FRESCO program, freely available at http://compbio.cs.toronto.edu/fresco, gives bioinformatics researchers the ability to quickly compare the performance of other complex scoring formulas without having to implement new algorithms to optimize them.
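For readers unfamiliar with the classical scheme mentioned above, here is a minimal sketch of Needleman-Wunsch global alignment with a linear gap penalty. It is not the FRESCO algorithm and does not implement rectangle scoring schemes; the match, mismatch, and gap values are illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

The table fill takes time proportional to the product of the two sequence lengths, which is why practical aligners layer heuristics on top of this scheme.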
Subject(s)
Sequence Alignment/statistics & numerical data , Software , Algorithms , Computational Biology , DNA/genetics
ABSTRACT
Version 2.1 of ASDB (Alternative Splicing Data Base) contains 1922 protein and 2486 DNA sequences. The protein entries from SWISS-PROT are joined into clusters corresponding to alternatively spliced variants of one gene. The DNA division consists of complete genes with alternative splicing mentioned or annotated in GenBank. The search engine allows one to search over SWISS-PROT and GenBank fields and then follow the links to all variants. The database can be accessed at the URL http://cbcg.nersc.gov/asdb
Subject(s)
Alternative Splicing/genetics , DNA/genetics , Databases, Factual , Proteins/chemistry , Internet , Proteins/genetics
ABSTRACT
Human and mouse genomic sequence comparisons are being increasingly used to search for evolutionarily conserved gene regulatory elements. Large-scale human-mouse DNA comparison studies have discovered numerous conserved noncoding sequences, of which only a fraction has been functionally investigated. A question therefore remains as to whether most of these noncoding sequences are conserved because of functional constraints or are the result of a lack of divergence time.
Subject(s)
Conserved Sequence/genetics , Sequence Alignment , Untranslated Regions/genetics , Animals , Dogs , Humans , Mice , Molecular Sequence Data , Species Specificity , Untranslated Regions/isolation & purification
ABSTRACT
SUMMARY: VISTA is a program for visualizing global DNA sequence alignments of arbitrary length. It has a clean output, allowing for easy identification of similarity, and is easily configurable, enabling the visualization of alignments of various lengths at different levels of resolution. It is currently available on the web, thus allowing for easy access by all researchers. AVAILABILITY: VISTA server is available on the web at http://www-gsd.lbl.gov/vista. The source code is available upon request. CONTACT: vista@lbl.gov
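One common way to summarize similarity along a global alignment is percent identity in a sliding window. The sketch below illustrates that idea only; it is not VISTA's source code, and the function name, window size, and step are assumptions chosen for the example.

```python
def window_identity(aligned_a, aligned_b, window=5, step=1):
    """Percent identity per window over two gapped, equal-length sequences.

    Returns a list of (window_start, percent_identity) pairs.
    Columns containing a gap ("-") never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    profile = []
    for start in range(0, len(aligned_a) - window + 1, step):
        columns = zip(aligned_a[start:start + window],
                      aligned_b[start:start + window])
        matches = sum(1 for x, y in columns if x == y and x != "-")
        profile.append((start, 100.0 * matches / window))
    return profile


# Toy alignment (invented): one mismatch in the second half.
for start, pct in window_identity("ACGTACGT", "ACGTTCGT", window=4, step=4):
    print(start, pct)
```

Changing `window` trades resolution for smoothness, which mirrors the abstract's point about viewing alignments at different levels of resolution.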