ABSTRACT
We deployed the Blended Genome Exome (BGE), a DNA library blending approach that generates low-pass whole genome (1-4× mean depth) and deep whole exome (30-40× mean depth) data in a single sequencing run. This technology is cost-effective, empowers most genomic discoveries possible with deep whole genome sequencing, and provides an unbiased method to capture the diversity of common SNP variation across the globe. To evaluate this new technology at scale, we applied BGE to sequence >53,000 samples from the Populations Underrepresented in Mental Illness Associations Studies (PUMAS) Project, which included participants across African, African American, and Latin American populations. We evaluated the accuracy of BGE imputed genotypes against raw genotype calls from the Illumina Global Screening Array. All PUMAS cohorts had R² concordance ≥95% among SNPs with MAF≥1%, and concordance never fell below 90% R² for SNPs with MAF<1%. Furthermore, concordance rates among local ancestries within two recently admixed cohorts were consistent among SNPs with MAF≥1%, with only minor deviations in SNPs with MAF<1%. We also benchmarked the discovery capacity of BGE to access protein-coding copy number variants (CNVs) against deep whole genome data, finding that deletions and duplications spanning at least 3 exons had a positive predictive value of ~90%. Our results demonstrate BGE scalability and efficacy in capturing SNPs, indels, and CNVs in the human genome at 28% of the cost of deep whole-genome sequencing. BGE is poised to enhance access to genomic testing and empower genomic discoveries, particularly in underrepresented populations.
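As an illustration of the concordance metric mentioned above, the following is a minimal sketch that assumes R² is the squared Pearson correlation between imputed dosages and array genotypes at a single SNP. The abstract does not specify how R² was computed, so this definition, the function names `dosage_r2` and `minor_allele_frequency`, and the toy values are assumptions for illustration only.

```python
def dosage_r2(imputed, truth):
    """Squared Pearson correlation between imputed dosages and
    reference genotypes (0, 1, or 2 alternate alleles) at one SNP.
    Assumed definition; not taken from the abstract."""
    n = len(imputed)
    if n != len(truth) or n < 2:
        raise ValueError("need two equal-length vectors with >= 2 samples")
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)
    if sxx == 0 or syy == 0:
        # A monomorphic site has no variance, so R² is undefined.
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


def minor_allele_frequency(genotypes):
    """MAF from diploid genotype counts (0, 1, or 2 alternate alleles)."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


# Hypothetical toy data: four samples at one SNP.
array_calls = [0, 1, 2, 1]
imputed_dosages = [0.1, 0.9, 1.8, 1.2]
r2 = dosage_r2(imputed_dosages, array_calls)
maf = minor_allele_frequency(array_calls)
# Route the SNP to the common or rare bin used in the text (MAF >= 1% vs < 1%).
bin_label = "MAF>=1%" if maf >= 0.01 else "MAF<1%"
```

In practice, per-SNP values like these would be aggregated within each MAF bin and cohort before comparing against thresholds such as 95% or 90%.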
ABSTRACT
Single-cell RNA-seq (scRNA-seq) is emerging as a powerful tool for understanding gene function across diverse cells. Recently, this has included the use of allele-specific expression (ASE) analysis to better understand how variation in the human genome affects RNA expression at the single-cell level. We reasoned that because intronic reads are more prevalent in single-nucleus RNA-seq (snRNA-seq), and introns are under weaker purifying selection and thus enriched for genetic variants, snRNA-seq should facilitate single-cell analysis of ASE. Here we demonstrate how experimental and computational choices can improve the results of allelic imbalance analysis. We explore how experimental choices, such as RNA source, read length, sequencing depth, and genotyping, impact the power of ASE-based methods. We developed a new suite of computational tools to process and analyze scRNA-seq and snRNA-seq data for ASE. As hypothesized, we extracted more ASE information from reads in intronic regions than from those in exonic regions, and we show how read length can be set to increase power. Additionally, hybrid selection improved our power to detect allelic imbalance in genes of interest. We also explored methods to recover allele-specific isoform expression levels from both long- and short-read snRNA-seq. To further investigate ASE in the context of human disease, we applied our methods to a Parkinson's disease cohort of 94 individuals and show that ASE analysis had more power than eQTL analysis to identify significant SNP/gene pairs in a direct comparison of the two methods. Overall, we provide an end-to-end experimental and computational approach for future studies.
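To make the idea of allelic imbalance concrete, here is a minimal sketch of a generic two-sided exact binomial test on reference versus alternate read counts at a heterozygous site. The abstract does not describe the statistical model used by the authors' tools, so the test, the function name `allelic_imbalance_pvalue`, and the example counts are assumptions for illustration.

```python
from math import comb


def allelic_imbalance_pvalue(ref_count, alt_count):
    """Two-sided exact binomial p-value for the null hypothesis that
    both alleles are expressed equally (p = 0.5).
    Generic illustration; not the authors' method."""
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # no informative reads, no evidence of imbalance
    k = min(ref_count, alt_count)
    # Under p = 0.5 the distribution is symmetric, so the two-sided
    # p-value is twice the lower tail, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical counts: 10 reads carry the reference allele, none the alternate.
p_skewed = allelic_imbalance_pvalue(10, 0)
# Balanced counts give no evidence of imbalance.
p_balanced = allelic_imbalance_pvalue(5, 5)
```

Because the p-value shrinks as more reads overlap a heterozygous site, any factor that adds informative reads (for example, intronic coverage or longer reads, as discussed above) would be expected to increase power under this kind of test.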
ABSTRACT
Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.