ABSTRACT
Tandem DNA repeats vary in the size and sequence of each unit (motif). When expanded, tandem DNA repeats have been associated with more than 40 monogenic disorders[1]. Their involvement in disorders with complex genetics is largely unknown, as is the extent of their heterogeneity. Here we investigated the genome-wide characteristics of tandem repeats with motifs of 2-20 base pairs in 17,231 genomes from families containing individuals with autism spectrum disorder (ASD)[2,3] and from population control individuals[4]. We found extensive polymorphism in the size and sequence of motifs. Many of the tandem repeat loci that we detected correlated with cytogenetic fragile sites. At 2,588 loci, gene-associated tandem repeat expansions that were rare among population control individuals were significantly more prevalent among individuals with ASD than among their siblings without ASD, particularly in exons and near splice junctions, and in genes related to the development of the nervous system and the cardiovascular system or muscle. Rare tandem repeat expansions had a prevalence of 23.3% in children with ASD compared with 20.7% in children without ASD, which suggests that tandem repeat expansions collectively contribute 2.6% to the risk of ASD. These rare tandem repeat expansions included previously undescribed ASD-linked expansions in DMPK and FXN, which are associated with neuromuscular conditions, and in previously unknown loci such as FGF14 and CACNB1. Rare tandem repeat expansions were associated with lower IQ and adaptive ability. Our results show that tandem DNA repeat expansions contribute strongly to the genetic aetiology and phenotypic complexity of ASD.
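Reading the 2.6% figure as the absolute difference between the two reported prevalences (an assumption about how the collective contribution was derived), the arithmetic is:

$$
\Delta = P_{\text{ASD}} - P_{\text{no ASD}} = 23.3\% - 20.7\% = 2.6 \text{ percentage points}
$$

Under this reading, the figure is an excess prevalence among children with ASD, not a ratio of the two prevalences.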
Subject(s)
Autism Spectrum Disorder/genetics, DNA Repeat Expansion/genetics, Genome, Human/genetics, Genomics, Tandem Repeat Sequences/genetics, Female, Fibroblast Growth Factors/genetics, Genetic Predisposition to Disease, Humans, Intelligence/genetics, Iron-Binding Proteins/genetics, Male, Myotonin-Protein Kinase/genetics, Nucleotide Motifs, Polymorphism, Genetic, Frataxin
ABSTRACT
Huntington disease (HD) is caused by a CAG repeat expansion in the huntingtin (HTT) gene. Although the length of this repeat is inversely correlated with age of onset (AOO), it does not fully explain the variability in AOO. We assessed the sequence downstream of the CAG repeat in HTT [reference: (CAG)n-CAA-CAG], because variants within this region had been described previously but had not been studied in relation to AOO. These analyses identified a variant that results in complete loss of the interrupting (LOI) adenine nucleotides in this region [(CAG)n-CAG-CAG]. Analysis of multiple HD pedigrees showed that this LOI variant is associated with dramatically earlier AOO (by an average of 25 years) despite the same polyglutamine length as in individuals with the interrupting penultimate CAA codon. This LOI allele is particularly frequent among persons with reduced-penetrance alleles who manifest HD, and it increases the likelihood of presenting clinically with HD with 36-39 CAG repeats. Further, we show that the LOI variant is associated with increased somatic repeat instability, suggesting that this instability is a significant driver of the earlier onset. These findings indicate that the number of uninterrupted CAG repeats, which is lengthened by the LOI, is the most significant contributor to AOO of HD and is more significant than polyglutamine length, which is not altered in these individuals. In addition, we identified another variant in this region, in which the CAA-CAG sequence is duplicated, that was associated with later AOO. Identification of these cis-acting modifiers has potentially important implications for genetic counselling in HD-affected families.
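The distinction between polyglutamine length and uninterrupted CAG length can be made concrete. CAA and CAG both encode glutamine, so the two quantities are counted differently: one counts every glutamine codon in the tract, the other counts only the longest run of pure CAG codons. The sketch below is a minimal illustration under stated assumptions: the function names and the toy alleles (40 pure CAG codons followed by the variant endings described above) are hypothetical, and the sequences are assumed to start in frame at the first codon of the tract.

```python
def longest_pure_cag_run(seq: str) -> int:
    """Length, in codons, of the longest uninterrupted run of CAG codons.

    Assumes seq is read in frame starting at position 0.
    """
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    best = current = 0
    for codon in codons:
        if codon == "CAG":
            current += 1
            best = max(best, current)
        else:
            current = 0  # any other codon (e.g. CAA) interrupts the run
    return best


def polyglutamine_length(seq: str) -> int:
    """Number of consecutive glutamine codons (CAG or CAA) from the start of seq."""
    count = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in ("CAG", "CAA"):
            count += 1
        else:
            break  # first non-glutamine codon ends the tract
    return count


# Hypothetical toy alleles with n = 40 pure CAG codons; not real HTT sequences.
N = 40
ALLELES = {
    "reference (CAG)n-CAA-CAG": "CAG" * N + "CAA" + "CAG",
    "LOI (CAG)n-CAG-CAG": "CAG" * N + "CAG" + "CAG",
    "duplication (CAG)n-(CAA-CAG)x2": "CAG" * N + "CAACAG" * 2,
}

if __name__ == "__main__":
    for name, seq in ALLELES.items():
        print(f"{name}: polyQ={polyglutamine_length(seq)}, "
              f"uninterrupted CAG={longest_pure_cag_run(seq)}")
```

For these toy alleles, the reference and LOI alleles both encode 42 glutamines, but the LOI allele has 42 uninterrupted CAG codons instead of 40; the duplication allele encodes 44 glutamines while its uninterrupted run stays at 40.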
Subject(s)
Codon/genetics, Huntington Disease/genetics, Huntington Disease/pathology, Peptides/genetics, Trinucleotide Repeat Expansion/genetics, Adolescent, Adult, Age of Onset, Child, Female, Humans, Male, Middle Aged, Pedigree
ABSTRACT
We report an inborn error of metabolism caused by an expansion of a GCA-repeat tract in the 5' untranslated region of the gene encoding glutaminase (GLS) that was identified through detailed clinical and biochemical phenotyping, combined with whole-genome sequencing. The expansion was observed in three unrelated patients who presented with an early-onset delay in overall development, progressive ataxia, and elevated levels of glutamine. One patient also showed cerebellar atrophy. The expansion was associated with a relative deficiency of GLS messenger RNA transcribed from the expanded allele, which probably resulted from repeat-mediated chromatin changes upstream of the GLS repeat. Our discovery underscores the importance of careful examination of regions of the genome that are typically excluded from or poorly captured by exome sequencing.
Subject(s)
Amino Acid Metabolism, Inborn Errors/genetics, Ataxia/genetics, Developmental Disabilities/genetics, Glutaminase/deficiency, Glutaminase/genetics, Glutamine/metabolism, Microsatellite Repeats, Mutation, Atrophy/genetics, Cerebellum/pathology, Child, Preschool, Female, Genotype, Glutamine/analysis, Humans, Male, Phenotype, Polymerase Chain Reaction, Whole Genome Sequencing
ABSTRACT
BACKGROUND: Whole blood is currently the most common DNA source for whole-genome sequencing (WGS), but for studies requiring non-invasive collection, self-collection, greater sample stability or additional tissue references, saliva or buccal samples may be preferred. However, the relative quality of sequencing data and accuracy of genetic variant detection from blood-derived, saliva-derived and buccal-derived DNA need to be thoroughly investigated. METHODS: Matched blood, saliva and buccal samples from four unrelated individuals were used to compare sequencing metrics and variant-detection accuracy among these DNA sources. RESULTS: We observed significant differences among DNA sources for sequencing quality metrics such as percentage of reads aligned and mean read depth (p<0.05). Differences were negligible in the accuracy of detecting short insertions and deletions; however, the false positive rate for single nucleotide variation detection was slightly higher in some saliva and buccal samples. The sensitivity of copy number variant (CNV) detection was up to 25% higher in blood samples, depending on CNV size and type, and appeared to be worse in saliva and buccal samples with high bacterial concentration. We also show that methylation-based enrichment for eukaryotic DNA in saliva and buccal samples increased alignment rates but also reduced read-depth uniformity, hampering CNV detection. CONCLUSION: For WGS, we recommend using DNA extracted from blood rather than saliva or buccal swabs; if saliva or buccal samples are used, we recommend against using methylation-based eukaryotic DNA enrichment. All data used in this study are available for further open-science investigation.
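The abstract compares DNA sources by sensitivity, false positive rate and read-depth uniformity without defining these measures. The sketch below shows one common way to compute such metrics from a comparison of a sample's calls against a reference call set and from per-window read depths. The class and function names, the use of the false discovery proportion FP/(TP+FP) as the "false positive" measure, and the coefficient of variation as the uniformity measure are all assumptions, not necessarily the definitions the authors used; the numbers in the usage block are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class CallComparison:
    """Counts from comparing one sample's variant calls with a reference call set."""
    true_positives: int   # calls that are also in the reference set
    false_positives: int  # calls absent from the reference set
    false_negatives: int  # reference variants the sample missed

    def sensitivity(self) -> float:
        # Fraction of reference variants that were recovered.
        return self.true_positives / (self.true_positives + self.false_negatives)

    def false_discovery_proportion(self) -> float:
        # Fraction of the sample's calls that are not in the reference set.
        return self.false_positives / (self.true_positives + self.false_positives)


def depth_cv(window_depths: list[float]) -> float:
    """Coefficient of variation of per-window read depth; higher means less uniform."""
    return pstdev(window_depths) / mean(window_depths)


if __name__ == "__main__":
    # Hypothetical counts and depths for illustration only; not data from the study.
    blood = CallComparison(true_positives=990, false_positives=5, false_negatives=10)
    saliva = CallComparison(true_positives=980, false_positives=20, false_negatives=20)
    for name, c in (("blood", blood), ("saliva", saliva)):
        print(f"{name}: sensitivity={c.sensitivity():.3f}, "
              f"FDP={c.false_discovery_proportion():.3f}")
    print(f"uniform depth CV: {depth_cv([30, 31, 29, 30]):.3f}")
    print(f"uneven depth CV: {depth_cv([15, 45, 20, 40]):.3f}")
```

A depth-based CNV caller looks for windows whose depth departs from the expected level, so a higher coefficient of variation across windows makes genuine copy-number changes harder to separate from coverage noise.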
Subject(s)
DNA Copy Number Variations/genetics, DNA/genetics, Whole Genome Sequencing/standards, Adult, DNA/blood, DNA/chemistry, DNA/standards, DNA Methylation/genetics, Female, Genotype, Humans, Male, Middle Aged, Mouth Mucosa/chemistry, Polymorphism, Single Nucleotide/genetics, Saliva/chemistry, Sequence Analysis, DNA/standards
ABSTRACT
Epilepsies are a group of common neurological disorders with a substantial genetic basis. Despite this, the molecular diagnosis of epilepsies remains challenging because of their heterogeneity. Studies using whole-genome sequencing may provide additional insights into the genetic causes of epilepsies of unknown aetiology. Whole-genome sequencing was used to evaluate a cohort of adults with unexplained developmental and epileptic encephalopathies (n = 30), for whom prior genetic tests, including whole-exome sequencing in some cases, were negative or inconclusive. Rare single nucleotide variants, insertions/deletions, copy number variants and tandem repeat expansions were analysed. Seven pathogenic or likely pathogenic single nucleotide variants and two pathogenic, deleterious copy number variants were identified in nine patients (32.1% of the cohort). One of the copy number variants, identified in a patient with Lennox-Gastaut syndrome, was too small to be detected by chromosomal microarray techniques. We also identified two tandem repeat expansions with clinical implications in two other patients with Lennox-Gastaut syndrome: a CGG repeat expansion in the 5' untranslated region of DIP2B, and a CTG expansion in ATXN8OS (previously implicated in spinocerebellar ataxia type 8). Three patients had pathogenic KCNA2 variants. One of them died of sudden unexpected death in epilepsy. The other two patients had, in addition to a KCNA2 variant, a second de novo variant affecting potentially epilepsy-relevant genes (KCNIP4 and UBR5). Overall, whole-genome sequencing provided a genetic explanation in 32.1% of the total cohort. This is also the first report of coding and non-coding tandem repeat expansions identified in patients with Lennox-Gastaut syndrome. This study demonstrates that whole-genome sequencing, by enabling examination of multiple types of rare genetic variation, including variants in non-coding regions of the genome, can help resolve unexplained epilepsies.
ABSTRACT
Repeat expansions are responsible for over 40 monogenic disorders, and undoubtedly more pathogenic repeat expansions remain to be discovered. Existing methods for detecting repeat expansions in short-read sequencing data require predefined repeat catalogs. Recent discoveries emphasize the need for methods that do not require pre-specified candidate repeats. To address this need, we introduce ExpansionHunter Denovo, an efficient catalog-free method for genome-wide repeat expansion detection. Analysis of real and simulated data shows that our method can identify large expansions of 41 out of 44 pathogenic repeats, including nine recently reported non-reference repeat expansions not discoverable via existing methods.
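The abstract does not describe how ExpansionHunter Denovo finds expansions without a catalog, so the following is only a toy illustration of the general idea, not the tool's algorithm. A short read that lies entirely inside a long repeat is itself almost perfectly periodic, so its motif can be inferred from the read alone rather than looked up in a list of known loci. The function names, the 10% mismatch tolerance, the 2-20 bp motif range and the rotation/reverse-complement normalization are assumptions chosen for this sketch.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_primitive(motif: str) -> bool:
    # A string is a repetition of a shorter unit exactly when it occurs
    # inside its own doubled form with the first and last characters removed.
    return motif not in (motif + motif)[1:-1]


def canonical_motif(motif: str) -> str:
    # Hypothetical normalization: smallest rotation of the motif or of its
    # reverse complement, so CAG, AGC, GCA, CTG, TGC and GCT share one label.
    candidates = []
    for unit in (motif, reverse_complement(motif)):
        candidates.extend(unit[i:] + unit[:i] for i in range(len(unit)))
    return min(candidates)


def find_repeat_motif(read: str, min_motif: int = 2, max_motif: int = 20,
                      max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Return the canonical motif if the read is (nearly) one tandem repeat, else None.

    Tries motif lengths from shortest to longest, requiring at least two copies
    in the read and at most max_mismatch_frac mismatching bases.
    """
    for k in range(min_motif, min(max_motif, len(read) // 2) + 1):
        motif = read[:k]
        if not is_primitive(motif):
            continue  # e.g. "AA" or "CAGCAG" would duplicate a shorter period
        mismatches = sum(1 for i, base in enumerate(read) if base != motif[i % k])
        if mismatches <= max_mismatch_frac * len(read):
            return canonical_motif(motif)
    return None


if __name__ == "__main__":
    print(find_repeat_motif("CAG" * 50))  # AGC
    print(find_repeat_motif("CTG" * 50))  # AGC (same repeat, opposite strand)
    print(find_repeat_motif("A" * 10 + "C" * 10 + "G" * 10 + "T" * 10))  # None
```

Normalizing motifs by rotation and strand lets reads from the same repeat be pooled regardless of which strand they came from or where in the repeat unit they start.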