ABSTRACT
Despite significant advancements in rare genetic disease diagnostics, many patients with rare genetic disease remain without a molecular diagnosis. Novel tools and methods are needed to improve the detection of disease-associated variants and understand the genetic basis of many rare diseases. Long-read genome sequencing provides improved sequencing in highly repetitive, homologous, and low-complexity regions, and improved assessment of structural variation and complex genomic rearrangements compared to short-read genome sequencing. As such, it is a promising method to explore overlooked genetic variants in rare diseases with a high suspicion of a genetic basis. We therefore applied PacBio HiFi sequencing in a large multi-generational family presenting with autosomal dominant 46,XY differences of sexual development (DSD), for whom extensive molecular testing over multiple decades had failed to identify a molecular diagnosis. This revealed a rare SINE-VNTR-Alu retroelement insertion in intron 4 of NR5A1, a gene in which loss-of-function variants are an established cause of 46,XY DSD. The insertion segregated among affected family members and was associated with loss-of-expression of alleles in cis, demonstrating a functional impact on NR5A1. This case highlights the power of long-read genome sequencing to detect genomic variants that have previously been intractable to detection by standard short-read genomic testing.
Subjects
46,XY Disorders of Sex Development, Retroelements, Humans, Mutation, Introns/genetics, Retroelements/genetics, 46,XY Disorders of Sex Development/genetics, Rare Diseases/genetics, Sexual Development, Steroidogenic Factor 1/genetics
ABSTRACT
Using five complementary short- and long-read sequencing technologies, we phased and assembled >95% of each diploid human genome in a four-generation, 28-member family (CEPH 1463), allowing us to systematically assess de novo mutations (DNMs) and recombination. From this family, we estimate an average of 192 DNMs per generation, including 75.5 de novo single-nucleotide variants (SNVs), 7.4 non-tandem repeat indels, 79.6 de novo indels or structural variants (SVs) originating from tandem repeats, 7.7 centromeric de novo SVs and SNVs, and 12.4 de novo Y chromosome events. Short tandem repeats (STRs) and variable-number tandem repeats (VNTRs) are the most mutable, with 32 loci exhibiting recurrent mutation through the generations. We accurately assemble 288 centromeres and six Y chromosomes across the generations, documenting de novo SVs, and demonstrate that the DNM rate varies by an order of magnitude depending on repeat content, length, and sequence identity. We show a strong paternal bias (75-81%) for all forms of germline DNM, yet we estimate that 17% of de novo SNVs are postzygotic in origin, with no paternal bias. We place all this variation in the context of a high-resolution recombination map (~3.5 kbp breakpoint resolution). We observe a strong maternal recombination bias (1.36 maternal:paternal ratio), with a consistent reduction in the number of crossovers with increasing paternal (r=0.85) and maternal (r=0.65) age. However, we observe no correlation between meiotic crossover locations and de novo SVs, arguing against non-allelic homologous recombination as a predominant mechanism. The use of multiple orthogonal technologies, near-telomere-to-telomere phased genome assemblies, and a multi-generation family to assess transmission creates the most comprehensive, publicly available "truth set" of all classes of genomic variants. The resource can be used to test and benchmark new algorithms and technologies and to understand the most fundamental processes underlying human genetic variation.