ABSTRACT
SINE-VNTR-Alu (SVA) retrotransposons are evolutionarily young and still-active transposable elements (TEs) in the human genome. Several pathogenic SVA insertions have been identified that directly mutate host genes to cause neurodegenerative and other types of diseases. However, due to their sequence heterogeneity and complex structures as well as limitations in sequencing techniques and analysis, SVA insertions have been less well studied than other mobile element insertions. Here, we identified polymorphic SVA insertions from 3646 whole-genome sequencing (WGS) samples of >150 diverse populations and constructed a polymorphic SVA insertion reference catalog. Using 20 long-read samples, we also assembled reference and polymorphic SVA sequences and characterized the internal hexamer/variable-number-tandem-repeat (VNTR) expansions as well as differing SVA activity for SVA subfamilies and human populations. In addition, we developed a module to annotate both reference and polymorphic SVA copies. By characterizing the landscape of both reference and polymorphic SVA retrotransposons, our study enables more accurate genotyping of these elements and facilitates the discovery of pathogenic SVA insertions.
Subjects
Human Genome, Retroelements, Humans, Alu Elements, Human Genome/genetics, Minisatellite Repeats/genetics, Retroelements/genetics, Short Interspersed Nucleotide Elements
ABSTRACT
Improvements in single-cell whole-genome sequencing (scWGS) assays have enabled detailed characterization of somatic copy number alterations (CNAs) at the single-cell level. Yet, current computational methods are mostly designed for detecting chromosome-scale changes in cancer samples with low sequencing coverage. Here, we introduce HiScanner (High-resolution Single-Cell Allelic copy Number callER), which combines read depth, B-allele frequency, and haplotype phasing to identify CNAs with high resolution. In simulated data, HiScanner consistently outperforms state-of-the-art methods across various CNA types and sizes. When applied to high-coverage scWGS data from human brain cells, HiScanner shows a superior ability to detect smaller CNAs, uncovering distinct CNA patterns between neurons and oligodendrocytes. For 179 cells we sequenced from longitudinal meningioma samples, integration of CNAs with point mutations revealed evolutionary trajectories of tumor cells. These findings show that HiScanner enables accurate characterization of frequency, clonality, and distribution of CNAs at the single-cell level in both non-neoplastic and neoplastic cells.
ABSTRACT
Genomics for rare disease diagnosis has advanced at a rapid pace due to our ability to perform "N-of-1" analyses on individual patients with ultra-rare diseases. The increasing size of ultra-rare disease cohorts internationally now enables cohort-wide analyses for new discoveries, but well-calibrated statistical genetics approaches for jointly analyzing these patients are still under development.1,2 The Undiagnosed Diseases Network (UDN) brings multiple clinical, research, and experimental centers under the same umbrella across the United States to facilitate and scale N-of-1 analyses. Here, we present the first joint analysis of whole-genome sequencing data of UDN patients across the network. We introduce new, well-calibrated statistical methods for prioritizing disease genes with de novo recurrence and compound heterozygosity. We also detect pathways enriched with candidate and known diagnostic genes. Our computational analysis, coupled with a systematic clinical review, recapitulated known diagnoses and revealed new disease associations. We further release a software package, RaMeDiES, enabling automated cross-analysis of deidentified sequenced cohorts for new diagnostic and research discoveries. Gene-level findings and variant-level information across the cohort are available in a public-facing browser (https://dbmi-bgm.github.io/udn-browser/). These results show that N-of-1 efforts should be supplemented by a joint genomic analysis across cohorts.
ABSTRACT
The 4D Nucleome (4DN) Network aims to elucidate the complex structure and organization of chromosomes in the nucleus and the impact of their disruption in disease biology. We present the 4DN Data Portal (https://data.4dnucleome.org/), a repository for datasets generated in the 4DN Network and relevant external datasets. Datasets were generated with a wide range of experiments, including chromosome conformation capture assays such as Hi-C and other innovative sequencing and microscopy-based assays probing chromosome architecture. Altogether, the 4DN Data Portal hosts more than 1800 experiment sets and 36000 files. Results of sequencing-based assays from different laboratories are uniformly processed and quality-controlled. The portal interface allows easy browsing, filtering, and bulk downloads, and the integrated HiGlass genome browser allows interactive visualization and comparison of multiple datasets. The 4DN Data Portal represents a primary resource for chromosome contact and other nuclear architecture data for the scientific community.