ABSTRACT
Copy number variants (CNVs) are significant contributors to the pathogenicity of rare genetic diseases and, with new methods, can now reliably be identified from exome sequencing. Challenges remain in the accurate classification of CNV pathogenicity. CNV calling using GATK-gCNV was performed on exomes from a cohort of 6,633 families (15,759 individuals) with heterogeneous phenotypes and variable prior genetic testing collected at the Broad Institute Center for Mendelian Genomics of the Genomics Research to Elucidate the Genetics of Rare Diseases consortium and analyzed using the seqr platform. The addition of CNV detection to exome analysis identified causal CNVs for 171 families (2.6%). The estimated sizes of CNVs ranged from 293 bp to 80 Mb. The causal CNVs consisted of 140 deletions, 15 duplications, 3 suspected complex structural variants (SVs), 3 insertions, and 10 complex SVs, the latter two groups being identified by orthogonal confirmation methods. To classify CNV variant pathogenicity, we used the 2020 American College of Medical Genetics and Genomics/ClinGen CNV interpretation standards and developed additional criteria to evaluate allelic and functional data as well as variants on the X chromosome to further advance the framework. We interpreted 151 CNVs as likely pathogenic/pathogenic and 20 CNVs as high-interest variants of uncertain significance. Calling CNVs from existing exome data increases the diagnostic yield for individuals undiagnosed after standard testing approaches, providing a higher-resolution alternative to arrays at a fraction of the cost of genome sequencing. Our improvements to the classification approach advance the systematic framework to assess the pathogenicity of CNVs.
Subject(s)
DNA Copy Number Variations , Exome Sequencing , Exome , Rare Diseases , Humans , DNA Copy Number Variations/genetics , Rare Diseases/genetics , Rare Diseases/diagnosis , Exome/genetics , Male , Female , Cohort Studies , Genetic Testing/methods
ABSTRACT
BACKGROUND: Genetic variants that cause rare disorders may remain elusive even after expansive testing, such as exome sequencing. The diagnostic yield of genome sequencing, particularly after a negative evaluation, remains poorly defined. METHODS: We sequenced and analyzed the genomes of families with diverse phenotypes who were suspected to have a rare monogenic disease and for whom genetic testing had not revealed a diagnosis, as well as the genomes of a replication cohort at an independent clinical center. RESULTS: We sequenced the genomes of 822 families (744 in the initial cohort and 78 in the replication cohort) and made a molecular diagnosis in 218 of 744 families (29.3%). Of the 218 families, 61 (28.0%) - 8.2% of families in the initial cohort - had variants that required genome sequencing for identification, including coding variants, intronic variants, small structural variants, copy-neutral inversions, complex rearrangements, and tandem repeat expansions. Most families in which a molecular diagnosis was made after previous nondiagnostic exome sequencing (63.5%) had variants that could be detected by reanalysis of the exome-sequence data (53.4%) or by additional analytic methods, such as copy-number variant calling, to exome-sequence data (10.8%). We obtained similar results in the replication cohort: in 33% of the families in which a molecular diagnosis was made, or 8% of the cohort, genome sequencing was required, which showed the applicability of these findings to both research and clinical environments. CONCLUSIONS: The diagnostic yield of genome sequencing in a large, diverse research cohort and in a small clinical cohort of persons who had previously undergone genetic testing was approximately 8% and included several types of pathogenic variation that had not previously been detected by means of exome sequencing or other techniques. (Funded by the National Human Genome Research Institute and others.).
Subject(s)
Genetic Variation , Rare Diseases , Whole Genome Sequencing , Female , Humans , Male , Cohort Studies , Exome , Exome Sequencing , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/ethnology , Genetic Diseases, Inborn/genetics , Genetic Testing , Genome, Human , Phenotype , Rare Diseases/diagnosis , Rare Diseases/ethnology , Rare Diseases/genetics , Sequence Analysis, DNA , Child , Adolescent , Young Adult , Adult
ABSTRACT
BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Knowing which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. 
Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria are needed.
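The maximum F-measure named as a challenge metric can be sketched as follows. This is a minimal illustration, not the assessors' scoring code: the function name `max_f_measure`, the dictionary input format, the use of the balanced F1 score, and the choice of every distinct EPCR value as a candidate threshold are all assumptions made for the example.

```python
def max_f_measure(predictions: dict[str, float], causal: set[str]) -> float:
    """Return the highest F1 score reached over all EPCR thresholds.

    predictions maps a (hypothetical) variant identifier to its EPCR value;
    causal is the set of known causal variant identifiers.
    """
    best = 0.0
    # Treat every distinct EPCR value as a candidate decision threshold.
    for threshold in sorted(set(predictions.values())):
        called = {v for v, epcr in predictions.items() if epcr >= threshold}
        true_positives = len(called & causal)
        if true_positives == 0:
            continue  # F1 is zero when no causal variant is called
        precision = true_positives / len(called)
        recall = true_positives / len(causal)
        f1 = 2 * precision * recall / (precision + recall)
        best = max(best, f1)
    return best


# Toy example with made-up variant identifiers and EPCR values.
example_predictions = {"v1": 0.9, "v2": 0.8, "v3": 0.4, "v4": 0.1}
example_causal = {"v1", "v3"}
example_score = max_f_measure(example_predictions, example_causal)
```

In the toy example the best threshold is 0.4, where both causal variants are called alongside one non-causal variant, so precision is 2/3 and recall is 1.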
Subject(s)
Rare Diseases , Humans , Rare Diseases/genetics , Rare Diseases/diagnosis , Genome, Human/genetics , Genetic Variation/genetics , Computational Biology/methods , Phenotype
ABSTRACT
PURPOSE: Genome sequencing (GS)-specific diagnostic rates in prospective tightly ascertained exome sequencing (ES)-negative intellectual disability (ID) cohorts have not been reported extensively. METHODS: ES, GS, epigenetic signatures, and long-read sequencing diagnoses were assessed in 74 trios with at least moderate ID. RESULTS: The ES diagnostic yield was 42 of 74 (57%). GS diagnoses were made in 9 of 32 (28%) ES-unresolved families. Repeated ES with a contemporary pipeline on the GS-diagnosed families identified 8 of 9 single-nucleotide variations/copy-number variations undetected in older ES, confirming a GS-unique diagnostic rate of 1 in 32 (3%). Episignatures contributed diagnostic information in 9% with GS corroboration in 1 of 32 (3%) and diagnostic clues in 2 of 32 (6%). A genetic etiology for ID was detected in 51 of 74 (69%) families. Twelve candidate disease genes were identified. Contemporary ES followed by GS cost US$4976 (95% CI: $3704; $6969) per diagnosis, and first-line GS cost $7062 (95% CI: $6210; $8475) per diagnosis. CONCLUSION: Performing GS only in ID trios would be cost equivalent to ES if GS were available at $2435, about a 60% reduction from current prices. This study demonstrates that first-line GS achieves a higher diagnostic rate than contemporary ES but at a higher cost.
Subject(s)
Exome Sequencing , Exome , Intellectual Disability , Humans , Intellectual Disability/genetics , Intellectual Disability/diagnosis , Male , Female , Exome/genetics , Exome Sequencing/economics , Cohort Studies , Genetic Testing/economics , Genetic Testing/methods , Whole Genome Sequencing/economics , Child , Genome, Human/genetics , DNA Copy Number Variations/genetics , Polymorphism, Single Nucleotide/genetics , Child, Preschool
ABSTRACT
Recessive variants in WASHC4 are linked to intellectual disability complicated by poor language skills, short stature, and dysmorphic features. The protein encoded by WASHC4 is part of the Wiskott-Aldrich syndrome protein and SCAR homolog family, co-localizes with actin in cells, and promotes Arp2/3-dependent actin polymerization in vitro. Functional studies in a zebrafish model suggested that WASHC4 knockdown may also affect skeletal muscles by perturbing protein clearance. However, skeletal muscle involvement has not been reported so far in patients, and precise biochemical studies allowing a deeper understanding of the molecular etiology of the disease are still lacking. Here, we report two siblings with a homozygous WASHC4 variant expanding the clinical spectrum of the disease and provide a phenotypical comparison with cases reported in the literature. Proteomic profiling of fibroblasts of the WASHC4-deficient patient revealed dysregulation of proteins relevant for the maintenance of the neuromuscular axis. Immunostaining on a muscle biopsy derived from the same patient confirmed dysregulation of proteins relevant for proper muscle function, thus highlighting an affliction of muscle cells upon loss of functional WASHC4. The results of histological and coherent anti-Stokes Raman scattering microscopic studies support the concept of a functional role of the WASHC4 protein in humans by altering protein processing and clearance. The proteomic analysis confirmed key molecular players in vitro and highlighted, for the first time, the involvement of skeletal muscle in patients. © 2021 The Authors. The Journal of Pathology published by John Wiley & Sons, Ltd. on behalf of The Pathological Society of Great Britain and Ireland.
Subject(s)
Developmental Disabilities/genetics , Intellectual Disability/genetics , Muscle, Skeletal/pathology , Mutation/genetics , Child , Developmental Disabilities/complications , Developmental Disabilities/diagnosis , Humans , Intellectual Disability/diagnosis , Muscle, Skeletal/metabolism , Pedigree , Phenotype , Proteomics/methods , Siblings , Exome Sequencing/methods
ABSTRACT
Consanguineous marriages have a prevalence rate of 24% in Turkey. These carry an increased risk of autosomal recessive genetic conditions, leading to severe disability or premature death, with a significant health and economic burden. A definitive molecular diagnosis could not be achieved in these children previously, as infrastructures and access to sophisticated diagnostic options were limited. We studied the cause of neurogenetic disease in 246 children from 190 consanguineous families recruited in three Turkish hospitals between 2016 and 2020. All patients underwent deep phenotyping and trio whole exome sequencing, and data were integrated in advanced international bioinformatics platforms. We detected causative variants in 119 known disease genes in 72% of families. Due to overlapping phenotypes 52% of the confirmed genetic diagnoses would have been missed on targeted diagnostic gene panels. Likely pathogenic variants in 27 novel genes in 14% of the families increased the diagnostic yield to 86%. Eighty-two per cent of causative variants (141/172) were homozygous, 11 of which were detected in genes previously only associated with autosomal dominant inheritance. Eight families carried two pathogenic variants in different disease genes. De novo (9.3%), X-linked recessive (5.2%) and compound heterozygous (3.5%) variants were less frequent compared to non-consanguineous populations. This cohort provided a unique opportunity to better understand the genetic characteristics of neurogenetic diseases in a consanguineous population. Contrary to what may be expected, causative variants were often not on the longest run of homozygosity and the diagnostic yield was lower in families with the highest degree of consanguinity, due to the high number of homozygous variants in these patients. Pathway analysis highlighted that protein synthesis/degradation defects and metabolic diseases are the most common pathways underlying paediatric neurogenetic disease. 
In our cohort 164 families (86%) received a diagnosis, enabling prevention of transmission and targeted treatments in 24 patients (10%). We generated an important body of genomic data with lasting impacts on the health and wellbeing of consanguineous families and economic benefit for the healthcare system in Turkey and elsewhere. We demonstrate that an untargeted next generation sequencing approach is far superior to a more targeted gene panel approach, and can be performed without specialized bioinformatics knowledge by clinicians using established pipelines in populations with high rates of consanguinity.
Subject(s)
Exome , Consanguinity , Exome/genetics , Homozygote , Humans , Mutation , Pedigree , Phenotype , Exome Sequencing
ABSTRACT
BACKGROUND: High-impact pathogenic variants in more than a thousand genes are involved in Mendelian forms of neurodevelopmental disorders (NDD). METHODS: This study describes the molecular and clinical characterisation of 28 probands with NDD harbouring heterozygous AGO1 coding variants, occurring de novo for all those whose transmission could have been verified (26/28). RESULTS: A total of 15 unique variants leading to amino acid changes or deletions were identified: 12 missense variants, two in-frame deletions of one codon, and one canonical splice variant leading to a deletion of two amino acid residues. Recurrently identified variants were present in several unrelated individuals: p.(Phe180del), p.(Leu190Pro), p.(Leu190Arg), p.(Gly199Ser), p.(Val254Ile) and p.(Glu376del). AGO1 encodes the Argonaute 1 protein, which functions in gene-silencing pathways mediated by small non-coding RNAs. Three-dimensional protein structure predictions suggest that these variants might alter the flexibility of the AGO1 linker domains, which likely would impair its function in mRNA processing. Affected individuals present with intellectual disability of varying severity, as well as speech and motor delay, autistic behaviour and additional behavioural manifestations. CONCLUSION: Our study establishes that de novo coding variants in AGO1 are involved in a novel monogenic form of NDD, highly similar to the recently reported AGO2-related NDD.
Subject(s)
Argonaute Proteins , Intellectual Disability , Neurodevelopmental Disorders , Humans , Amino Acids/genetics , Heterozygote , Intellectual Disability/genetics , Intellectual Disability/pathology , Neurodevelopmental Disorders/genetics , Neurodevelopmental Disorders/pathology , RNA, Messenger , Argonaute Proteins/genetics
ABSTRACT
Exome and genome sequencing have become the tools of choice for rare disease diagnosis, leading to large amounts of data available for analyses. To identify causal variants in these datasets, powerful filtering and decision support tools that can be efficiently used by clinicians and researchers are required. To address this need, we developed seqr - an open-source, web-based tool for family-based monogenic disease analysis that allows researchers to work collaboratively to search and annotate genomic callsets. To date, seqr is being used in several research pipelines and one clinical diagnostic lab. In our own experience through the Broad Institute Center for Mendelian Genomics, seqr has enabled analyses of over 10,000 families, supporting the diagnosis of more than 3,800 individuals with rare disease and discovery of over 300 novel disease genes. Here, we describe a framework for genomic analysis in rare disease that leverages seqr's capabilities for variant filtration, annotation, and causal variant identification, as well as support for research collaboration and data sharing. The seqr platform is available as open source software, allowing low-cost participation in rare disease research, and a community effort to support diagnosis and gene discovery in rare disease.
Subject(s)
Genomics , Rare Diseases , Exome , Humans , Internet , Rare Diseases/diagnosis , Rare Diseases/genetics , Software
ABSTRACT
PURPOSE: Nonmuscle myosin II complexes are master regulators of actin dynamics that play essential roles during embryogenesis with vertebrates possessing 3 nonmuscle myosin II heavy chain genes, MYH9, MYH10, and MYH14. As opposed to MYH9 and MYH14, no recognizable disorder has been associated with MYH10. We sought to define the clinical characteristics and molecular mechanism of a novel autosomal dominant disorder related to MYH10. METHODS: An international collaboration identified the patient cohort. CAS9-mediated knockout cell models were used to explore the mechanism of disease pathogenesis. RESULTS: We identified a cohort of 16 individuals with heterozygous MYH10 variants presenting with a broad spectrum of neurodevelopmental disorders and variable congenital anomalies that affect most organ systems and were recapitulated in animal models of altered MYH10 activity. Variants were typically de novo missense changes with clustering observed in the motor domain. MYH10 knockout cells showed defects in primary ciliogenesis and reduced ciliary length with impaired Hedgehog signaling. MYH10 variant overexpression produced a dominant-negative effect on ciliary length. CONCLUSION: These data presented a novel genetic cause of isolated and syndromic neurodevelopmental disorders related to heterozygous variants in the MYH10 gene with implications for disrupted primary cilia length control and altered Hedgehog signaling in disease pathogenesis.
Subject(s)
Neurodevelopmental Disorders , Nonmuscle Myosin Type IIB , Actins , Cilia/genetics , Hedgehog Proteins/genetics , Humans , Myosin Heavy Chains/genetics , Neurodevelopmental Disorders/genetics , Nonmuscle Myosin Type IIB/genetics
ABSTRACT
Autosomal dominant and recessive mutations in COL12A1 cause the Ehlers-Danlos/myopathy overlap syndrome. Here, we describe a boy with fetal hypokinesia, severe neonatal weakness, striking hyperlaxity, high arched palate, retrognathia, club feet, and pectus excavatum. His motor development was initially delayed, but muscle strength improved with time, while hyperlaxity remained very severe, causing recurrent joint dislocations. Using trio exome sequencing and a copy number variation (CNV) analysis tool, we identified an in-frame de novo heterozygous deletion of exons 45 to 54 in the COL12A1 gene. Collagen XII immunostaining on cultured skin fibroblasts demonstrated intracellular retention of collagen XII, supporting the pathogenicity of the deletion. The phenotype of our patient is slightly more severe than other cases with dominantly acting mutations, notably with the presence of fetal hypokinesia. This case highlights the importance of CNV analysis in the COL12A1 gene in patients with a phenotype suggesting Ehlers-Danlos/myopathy overlap syndrome.
Subject(s)
Ehlers-Danlos Syndrome , Muscular Diseases , Collagen Type XII/genetics , DNA Copy Number Variations , Ehlers-Danlos Syndrome/diagnosis , Ehlers-Danlos Syndrome/genetics , Exons , Humans , Hypokinesia/genetics , Male , Muscular Diseases/genetics , Mutation
ABSTRACT
CSDE1 encodes the cytoplasmic cold shock domain-containing protein E1 (CSDE1), which is highly conserved across species and functions as an RNA-binding protein involved in translationally coupled mRNA turnover. CSDE1 displays a bidirectional role: promoting and repressing the translation of RNAs but also increasing and decreasing the abundance of RNAs. Preclinical studies highlighted an involvement of CSDE1 in different forms of cancer. Moreover, CSDE1 is highly expressed in human embryonic stem cells and plays a role in neuronal migration and differentiation. A genome-wide association study suggested CSDE1 as a potential autism-spectrum disorder risk gene. A multicenter next generation sequencing approach unraveled likely causative heterozygous variants in CSDE1 in 18 patients, identifying a new autism spectrum disorder-related syndrome consisting of autism, intellectual disability, and neurodevelopmental delay. Since then, no further patients with CSDE1 variants have been reported in the literature. Here, we report a 9.5-year-old girl from a consanguineous family of Turkish origin suffering from profound delayed speech and motor development, moderate intellectual disability, neurologic and psychiatric symptoms as well as hypoplasia of corpus callosum and mildly reduced brain volume on brain magnetic resonance imaging associated with a recurrent de novo mutation in CSDE1 (c.367C > T; p.R123*) expanding the phenotypical spectrum associated with pathogenic CSDE1 variants.
Subject(s)
Autism Spectrum Disorder , Intellectual Disability , Autism Spectrum Disorder/diagnosis , Autism Spectrum Disorder/genetics , Child , Consanguinity , DNA-Binding Proteins/genetics , Female , Genome-Wide Association Study , Humans , Intellectual Disability/diagnosis , Intellectual Disability/genetics , Intellectual Disability/pathology , Mutation , Parents , RNA-Binding Proteins/genetics
ABSTRACT
Cyclin-dependent kinase-like 5 (CDKL5) deficiency disorder (CDD) is caused by heterozygous or hemizygous variants in CDKL5 and is characterized by refractory epilepsy, cognitive and motor impairments, and cerebral visual impairment. CDKL5 has multiple transcripts, of which the longest transcripts, NM_003159 and NM_001037343, have been used historically in clinical laboratory testing. However, the transcript NM_001323289 is the most highly expressed in brain and contains 170 nucleotides at the 3' end of its last exon that are noncoding in other transcripts. Two truncating variants in this region have been reported in association with a CDD phenotype. To clarify the significance and range of phenotypes associated with late truncating variants in this region of the predominant transcript in the brain, we report detailed information on two individuals, updated clinical information on a third individual, and a summary of published and unpublished individuals reported in ClinVar. The two new individuals (one male and one female) each had a relatively mild clinical presentation including periods of pharmaco-responsive epilepsy, independent walking and limited purposeful communication skills. A previously reported male continued to have a severe phenotype. Overall, variants in this region demonstrate a range of clinical severity consistent with reports in CDD but with the potential for milder presentation.
Subject(s)
Epileptic Syndromes , Spasms, Infantile , Male , Female , Humans , Spasms, Infantile/diagnosis , Spasms, Infantile/genetics , Spasms, Infantile/complications , Epileptic Syndromes/genetics , Phenotype , Brain , Protein Serine-Threonine Kinases/genetics
ABSTRACT
Phosphoinositides are lipids that play a critical role in processes such as cellular signalling, ion channel activity and membrane trafficking. When mutated, several genes that encode proteins that participate in the metabolism of these lipids give rise to neurological or developmental phenotypes. PI4KA is a phosphoinositide kinase that is highly expressed in the brain and is essential for life. Here we used whole exome or genome sequencing to identify 10 unrelated patients harbouring biallelic variants in PI4KA that caused a spectrum of conditions ranging from severe global neurodevelopmental delay with hypomyelination and developmental brain abnormalities to pure spastic paraplegia. Some patients presented immunological deficits or genito-urinary abnormalities. Functional analyses by western blotting and immunofluorescence showed decreased PI4KA levels in the patients' fibroblasts. Immunofluorescence and targeted lipidomics indicated that PI4KA activity was diminished in fibroblasts and peripheral blood mononuclear cells. In conclusion, we report a novel severe metabolic disorder caused by PI4KA malfunction, highlighting the importance of phosphoinositide signalling in human brain development and the myelin sheath.
Subject(s)
Alleles , Genetic Variation/genetics , Hereditary Central Nervous System Demyelinating Diseases/genetics , Minor Histocompatibility Antigens/genetics , Neurodevelopmental Disorders/genetics , Phosphotransferases (Alcohol Group Acceptor)/genetics , Adolescent , Adult , Child , Child, Preschool , Female , Hereditary Central Nervous System Demyelinating Diseases/diagnostic imaging , Humans , Infant , Infant, Newborn , Leukocytes, Mononuclear/physiology , Male , Neurodevelopmental Disorders/diagnostic imaging , Pedigree
ABSTRACT
PURPOSE: CACNA1C encodes the alpha-1-subunit of a voltage-dependent L-type calcium channel expressed in human heart and brain. Heterozygous variants in CACNA1C have previously been reported in association with Timothy syndrome and long QT syndrome. Several case reports have suggested that CACNA1C variation may also be associated with a primarily neurological phenotype. METHODS: We describe 25 individuals from 22 families with heterozygous variants in CACNA1C, who present with predominantly neurological manifestations. RESULTS: Fourteen individuals have de novo, nontruncating variants and present variably with developmental delays, intellectual disability, autism, hypotonia, ataxia, and epilepsy. Functional studies of a subgroup of missense variants via patch clamp experiments demonstrated differential effects on channel function in vitro, including loss of function (p.Leu1408Val), neutral effect (p.Leu614Arg), and gain of function (p.Leu657Phe, p.Leu614Pro). The remaining 11 individuals from eight families have truncating variants in CACNA1C. The majority of these individuals have expressive language deficits, and half have autism. CONCLUSION: We expand the phenotype associated with CACNA1C variants to include neurodevelopmental abnormalities and epilepsy, in the absence of classic features of Timothy syndrome or long QT syndrome.
Subject(s)
Autistic Disorder , Calcium Channels, L-Type , Long QT Syndrome , Syndactyly , Autistic Disorder/genetics , Calcium Channels, L-Type/genetics , Humans , Phenotype
ABSTRACT
Spinal muscular atrophy (SMA) is a genetic disorder that causes progressive degeneration of lower motor neurons and the subsequent loss of muscle function throughout the body. It is the second most common recessive disorder in individuals of European descent and is present in all populations. Accurate tools exist for diagnosing SMA from genome sequencing data. However, there are no publicly available tools for GRCh38-aligned data from panel or exome sequencing assays which continue to be used as first line tests for neuromuscular disorders. This deficiency creates a critical gap in our ability to diagnose SMA in large existing rare disease cohorts, as well as newly sequenced exome and panel datasets. We therefore developed and extensively validated a new tool - SMA Finder - that can diagnose SMA not only in genome, but also exome and panel sequencing samples aligned to GRCh37, GRCh38, or T2T-CHM13. It works by evaluating aligned reads that overlap the c.840 position of SMN1 and SMN2 in order to detect the most common molecular causes of SMA. We applied SMA Finder to 16,626 exomes and 3,911 genomes from heterogeneous rare disease cohorts sequenced at the Broad Institute Center for Mendelian Genomics as well as 1,157 exomes and 8,762 panel sequencing samples from Tartu University Hospital. SMA Finder correctly identified all 16 known SMA cases and reported nine novel diagnoses which have since been confirmed by clinical testing, with another four novel diagnoses undergoing validation. Notably, out of the 29 total SMA positive cases, 23 had an initial clinical diagnosis of muscular dystrophy, congenital myasthenic syndrome, or myopathy. This underscored the frequency with which SMA can be misdiagnosed as other neuromuscular disorders and confirmed the utility of using SMA Finder to reanalyze phenotypically diverse neuromuscular disease cohorts. 
Finally, we evaluated SMA Finder on 198,868 individuals that had both exome and genome sequencing data within the UK Biobank (UKBB) and found that SMA Finder's overall false positive rate was less than 1 / 200,000 exome samples, and its positive predictive value (PPV) was 97%. We also observed 100% concordance between UKBB exome and genome calls. This analysis showed that, even though it is located within a segmental duplication, the most common causal variant for SMA can be detected with comparable accuracy to monogenic disease variants in non-repetitive regions. Additionally, the high PPV demonstrated by SMA Finder, the existence of treatment options for SMA in which early diagnosis is imperative for therapeutic benefit, as well as widespread availability of clinical confirmatory testing for SMA, warrant the addition of SMN1 to the ACMG list of genes with reportable secondary findings after genome and exome sequencing.
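As a rough illustration of the read-based idea described in this abstract, the sketch below classifies a sample from counts of reads overlapping the c.840 position, where SMN1 carries a C and SMN2 carries a T. It is not SMA Finder's implementation: the function name `classify_c840`, the input format, the output labels, and both thresholds are hypothetical and chosen only for the example.

```python
def classify_c840(
    c_reads: int,
    t_reads: int,
    min_total_reads: int = 14,     # hypothetical coverage cutoff
    max_c_fraction: float = 0.02,  # hypothetical tolerance for stray C reads
) -> str:
    """Classify a sample from pooled SMN1/SMN2 reads at the c.840 position.

    c_reads counts reads showing C (the SMN1 base) and t_reads counts reads
    showing T (the SMN2 base). A sample with adequate coverage and
    essentially no C reads is consistent with zero functional SMN1 copies.
    """
    total = c_reads + t_reads
    if total < min_total_reads:
        return "insufficient coverage"
    if c_reads / total <= max_c_fraction:
        return "consistent with zero SMN1 copies"
    return "SMN1 present"


# Toy read counts, invented for illustration.
call_affected = classify_c840(c_reads=0, t_reads=40)
call_unaffected = classify_c840(c_reads=20, t_reads=20)
call_low_cov = classify_c840(c_reads=1, t_reads=5)
```

A call of this kind would be a screening result only; the abstract itself notes that novel diagnoses were confirmed by clinical testing.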
ABSTRACT
Genetic association studies have made significant contributions to our understanding of the etiology of neurodevelopmental disorders (NDDs). However, these studies rarely focused on the African continent. The NeuroDev Project aims to address this diversity gap through detailed phenotypic and genetic characterization of children with NDDs from Kenya and South Africa. We present results from NeuroDev's first year of data collection, including phenotype data from 206 cases and clinical genetic analyses of 99 parent-child trios. Most cases met criteria for global developmental delay/intellectual disability (GDD/ID, 80.3%). Approximately half of the children with GDD/ID also met criteria for autism. Analysis of exome-sequencing data identified a pathogenic or likely pathogenic variant in 13 (17%) of the 75 cases from South Africa and 9 (38%) of the 24 cases from Kenya. Data from the trio pilot are publicly available, and the NeuroDev Project will continue to develop resources for the global genetics community.
Subject(s)
Autistic Disorder , Intellectual Disability , Neurodevelopmental Disorders , Humans , Child , Neurodevelopmental Disorders/genetics , Phenotype , Intellectual Disability/genetics , Autistic Disorder/genetics , Exome , Developmental Disabilities/genetics
ABSTRACT
Pathogenic variants in ATP-dependent chromatin remodeling proteins are a recurrent cause of neurodevelopmental disorders (NDDs). The NURF complex consists of BPTF and either the SNF2H (SMARCA5) or SNF2L (SMARCA1) ISWI-chromatin remodeling enzyme. Pathogenic variants in BPTF and SMARCA5 were previously implicated in NDDs. Here, we describe 40 individuals from 30 families with de novo or maternally inherited pathogenic variants in SMARCA1. This novel NDD was associated with mild to severe ID/DD, delayed or regressive speech development, and some recurrent facial dysmorphisms. Individuals carrying SMARCA1 loss-of-function variants exhibited a mild genome-wide DNA methylation profile and a high penetrance of macrocephaly. Genetic dissection of the NURF complex using Smarca1, Smarca5, and Bptf single and double mouse knockouts revealed the importance of NURF composition and dosage for proper forebrain development. Finally, we propose that genetic alterations affecting different NURF components result in an NDD with a broad clinical spectrum.
ABSTRACT
Copy number variants (CNVs) are significant contributors to the pathogenicity of rare genetic diseases and, with new innovative methods, can now reliably be identified from exome sequencing. Challenges remain in the accurate classification of CNV pathogenicity. CNV calling using GATK-gCNV was performed on exomes from a cohort of 6,633 families (15,759 individuals) with heterogeneous phenotypes and variable prior genetic testing collected at the Broad Institute Center for Mendelian Genomics of the GREGoR consortium. Each family's CNV data were analyzed using the seqr platform, and candidate CNVs were classified using the 2020 ACMG/ClinGen CNV interpretation standards. We developed additional evidence criteria to address situations not covered by the current standards. The addition of CNV calling to exome analysis identified causal CNVs for 173 families (2.6%). The estimated sizes of CNVs ranged from 293 bp to 80 Mb, and an estimated 44% would not have been detected by standard chromosomal microarrays. The causal CNVs consisted of 141 deletions, 15 duplications, 4 suspected complex structural variants (SVs), 3 insertions, and 10 complex SVs, the latter two groups being identified by orthogonal validation methods. We interpreted 153 CNVs as likely pathogenic/pathogenic and 20 CNVs as high-interest variants of uncertain significance. Calling CNVs from existing exome data increases the diagnostic yield for individuals undiagnosed after standard testing approaches, providing a higher-resolution alternative to arrays at a fraction of the cost of genome sequencing. Our improvements to the classification approach advance the systematic framework to assess the pathogenicity of CNVs.
ABSTRACT
Background: A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50%. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery. Families are consented for sharing of sequence and phenotype data with researchers, allowing development of a Critical Assessment of Genome Interpretation (CAGI) community challenge, placing variant prioritization models head-to-head in a real-life clinical diagnostic setting. Methods: Predictors were provided with a dataset of phenotype terms and variant calls from GS of 175 RGP individuals (65 families), including 35 solved training set families, with causal variants specified, and 30 test set families (14 solved, 16 unsolved). The challenge tasked teams with identifying the causal variants in as many test set families as possible. Ranked variant predictions were submitted with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of true positive causal variants, and the maximum F-measure, based on precision and recall of causal variants across EPCR thresholds. Results: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top-performing teams recalled the causal variants in up to 13 of 14 solved families by prioritizing high-quality variant calls that were rare, predicted deleterious, segregating correctly, and consistent with the reported phenotype. In unsolved families, newly discovered diagnostic variants were returned to two families following confirmatory RNA sequencing, and two prioritized novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant, in an unsolved proband with phenotype overlap with asparagine synthetase deficiency. Conclusions: By objective assessment of variant predictions, we provide insights into current state-of-the-art algorithms and platforms for genome sequencing analysis for rare disease diagnosis and explore areas for future optimization. Identification of diagnostic variants in unsolved families promotes synergy between researchers with clinical and computational expertise as a means of advancing the field of clinical genome interpretation.
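The maximum F-measure named in the Methods of the preceding abstract can be illustrated with a small sketch. It assumes the metric is the largest F1 value (the harmonic mean of precision and recall) obtained by sweeping a cutoff over the submitted EPCR values. The abstract does not give the challenge's exact scoring procedure, so the function name, input layout, and example values below are hypothetical.

```python
from typing import List, Tuple


def max_f_measure(predictions: List[Tuple[float, bool]], n_causal: int) -> float:
    """Return the best F1 over all EPCR cutoffs (illustrative assumption only).

    predictions: (epcr, is_causal) pairs for the submitted variants.
    n_causal:    total number of true causal variants in the solved families.
    """
    best = 0.0
    # Each distinct EPCR value is tried as a cutoff; variants at or above it are "called".
    for cutoff in sorted({epcr for epcr, _ in predictions}):
        called = [is_causal for epcr, is_causal in predictions if epcr >= cutoff]
        true_positives = sum(called)
        if true_positives == 0:
            continue  # F1 is zero when no causal variant is called
        precision = true_positives / len(called)
        recall = true_positives / n_causal
        f1 = 2 * precision * recall / (precision + recall)
        best = max(best, f1)
    return best


# Hypothetical submission: four ranked variants, two of which are truly causal.
example = [(0.9, True), (0.8, False), (0.6, True), (0.2, False)]
score = max_f_measure(example, n_causal=2)
```

In this toy example the best cutoff is 0.6, where both causal variants are recalled at a precision of 2/3, giving an F1 of 0.8.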