ABSTRACT
Complex structural variations (cxSVs) are often overlooked in genome analyses due to detection challenges. We developed ARC-SV, a probabilistic and machine-learning-based method that enables accurate detection and reconstruction of cxSVs from standard datasets. By applying ARC-SV across 4,262 genomes representing all continental populations, we identified cxSVs as a significant source of natural human genetic variation. Rare cxSVs have a propensity to occur in neural genes and loci that underwent rapid human-specific evolution, including those regulating corticogenesis. By performing single-nucleus multiomics in postmortem brains, we discovered cxSVs associated with differential gene expression and chromatin accessibility across various brain regions and cell types. Additionally, cxSVs detected in brains of psychiatric cases are enriched for linkage with psychiatric GWAS risk alleles detected in the same brains. Furthermore, our analysis revealed significantly decreased brain-region- and cell-type-specific expression of cxSV genes, specifically for psychiatric cases, implicating cxSVs in the molecular etiology of major neuropsychiatric disorders.
ABSTRACT
Duplication is a foundation of molecular evolution and a driver of genomic and complex diseases. Here, we develop a genome editing tool named Amplification Editing (AE) that enables programmable DNA duplication with precision at chromosomal scale. AE can duplicate human genomic segments ranging from 20 bp to 100 Mb, a size comparable to that of human chromosomes. AE is active across various cell types, including diploid, haploid, and primary cells, and reaches efficiencies of up to 73.0% for 1 Mb duplications and 3.4% for 100 Mb duplications. Whole-genome sequencing and deep sequencing of the junctions of edited sequences confirm the precision of duplication. AE can create chromosomal microduplications within disease-relevant regions in embryonic stem cells, indicating its potential for generating cellular and animal models. AE is a precise and efficient tool for chromosomal engineering and DNA duplication, broadening the landscape of precision genome editing from an individual genetic locus to the chromosomal scale.
Subject(s)
Gene Duplication , Gene Editing , Genome, Human , Humans , Gene Editing/methods , CRISPR-Cas Systems/genetics , DNA/genetics , Animals , Embryonic Stem Cells/metabolism , Chromosomes, Human/genetics
ABSTRACT
In aging, physiologic networks decline in function at rates that differ between individuals, producing a wide distribution of lifespan. Though 70% of human lifespan variance remains unexplained by heritable factors, little is known about the intrinsic sources of physiologic heterogeneity in aging. To understand how complex physiologic networks generate lifespan variation, new methods are needed. Here, we present Asynch-seq, an approach that uses gene-expression heterogeneity within isogenic populations to study the processes generating lifespan variation. By collecting thousands of single-individual transcriptomes, we capture the Caenorhabditis elegans "pan-transcriptome", a highly resolved atlas of non-genetic variation. We use our atlas to guide a large-scale perturbation screen that identifies the decoupling of total mRNA content between germline and soma as the largest source of physiologic heterogeneity in aging, driven by pleiotropic genes whose knockdown dramatically reduces lifespan variance. Our work demonstrates how systematic mapping of physiologic heterogeneity can be applied to reduce inter-individual disparities in aging.
Subject(s)
Aging , Caenorhabditis elegans , Gene Regulatory Networks , Longevity , Transcriptome , Caenorhabditis elegans/genetics , Caenorhabditis elegans/physiology , Animals , Aging/genetics , Transcriptome/genetics , Longevity/genetics , Caenorhabditis elegans Proteins/metabolism , Caenorhabditis elegans Proteins/genetics , RNA, Messenger/metabolism , RNA, Messenger/genetics
ABSTRACT
This perspective focuses on advances in genome technology over the last 25 years and their impact on germline variant discovery within the field of human genetics. The field has witnessed tremendous technological advances from microarrays to short-read sequencing and now long-read sequencing. Each technology has provided genome-wide access to different classes of human genetic variation. We are now on the verge of comprehensive variant detection of all forms of variation for the first time with a single assay. We predict that this transition will further transform our understanding of human health and biology and, more importantly, provide novel insights into the dynamic mutational processes shaping our genomes.
Subject(s)
Genomic Structural Variation , Genomics , Humans , Genomics/methods , Germ-Line Mutation , Mutation , Technology
ABSTRACT
Precise control of gene expression levels is essential for normal cell functions, yet how they are defined and tightly maintained, particularly at intermediate levels, remains elusive. Here, using a series of newly developed sequencing, imaging, and functional assays, we uncover a class of transcription factors with dual roles as activators and repressors, referred to as condensate-forming level-regulating dual-action transcription factors (TFs). They reduce high expression but increase low expression to achieve stable intermediate levels. Dual-action TFs directly exert activating and repressing functions via condensate-forming domains that selectively compartmentalize core transcriptional units. Clinically relevant mutations in these domains, which are linked to a range of developmental disorders, impair condensate selectivity and dual-action TF activity. These results collectively address a fundamental question in expression regulation and demonstrate the potential of level-regulating dual-action TFs as powerful effectors for engineering controlled expression levels.
Subject(s)
Transcription Factors , Animals , Humans , Mice , Gene Expression Regulation , Mutation , Repressor Proteins/metabolism , Repressor Proteins/genetics , Transcription Factors/metabolism , Transcription Factors/genetics , Cell Line
ABSTRACT
Certain cancer types afflict female and male patients disproportionately. The reasons include differences in male/female physiology, effects of sex hormones, risk behavior, environmental exposures, and genetics of the sex chromosomes X and Y. Loss of Y (LOY) is common in peripheral blood cells in aging men, and this phenomenon is associated with several diseases. However, the frequency and role of LOY in tumors are poorly understood. Here, we present a comprehensive catalog of LOY in >5,000 primary tumors from male patients in the TCGA. We show that LOY rates vary by tumor type and provide evidence for LOY being either a passenger or driver event depending on context. LOY in uveal melanoma specifically is associated with age and survival and is an independent predictor of poor outcome. LOY creates common dependencies on DDX3X and EIF1AX in male cell lines, suggesting that LOY generates unique vulnerabilities that could be therapeutically exploited.
ABSTRACT
Many regions in the human genome vary in length among individuals due to variable numbers of tandem repeats (VNTRs). To assess the phenotypic impact of VNTRs genome-wide, we applied a statistical imputation approach to estimate the lengths of 9,561 autosomal VNTR loci in 418,136 unrelated UK Biobank participants and 838 GTEx participants. Association and statistical fine-mapping analyses identified 58 VNTRs that appeared to influence a complex trait in UK Biobank, 18 of which also appeared to modulate expression or splicing of a nearby gene. Non-coding VNTRs at TMCO1 and EIF3H appeared to generate the largest known contributions of common human genetic variation to risk of glaucoma and colorectal cancer, respectively. Each of these two VNTRs associated with a >2-fold range of risk across individuals. These results reveal a substantial and previously unappreciated role of non-coding VNTRs in human health and gene regulation.
Subject(s)
Calcium Channels , Colorectal Neoplasms , Eukaryotic Initiation Factor-3 , Glaucoma , Minisatellite Repeats , Humans , Calcium Channels/genetics , Colorectal Neoplasms/genetics , Genome, Human , Glaucoma/genetics , Polymorphism, Genetic , Eukaryotic Initiation Factor-3/genetics
ABSTRACT
Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage sensitive segments associated with at least one disorder. These segments were typically gene dense and often harbored dominant dosage sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo & pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.
Subject(s)
DNA Copy Number Variations , Genome, Human , DNA Copy Number Variations/genetics , Gene Dosage , Haploinsufficiency/genetics , Humans
ABSTRACT
Fully understanding autism spectrum disorder (ASD) genetics requires whole-genome sequencing (WGS). We present the latest release of the Autism Speaks MSSNG resource, which includes WGS data from 5,100 individuals with ASD and 6,212 non-ASD parents and siblings (total n = 11,312). Examining a wide variety of genetic variants in MSSNG and the Simons Simplex Collection (SSC; n = 9,205), we identified ASD-associated rare variants in 718/5,100 individuals with ASD from MSSNG (14.1%) and 350/2,419 from SSC (14.5%). Considering genomic architecture, 52% were nuclear sequence-level variants, 46% were nuclear structural variants (including copy-number variants, inversions, large insertions, uniparental isodisomies, and tandem repeat expansions), and 2% were mitochondrial variants. Our study provides a guidebook for exploring genotype-phenotype correlations in families who carry ASD-associated rare variants and serves as an entry point to the expanded studies required to dissect the etiology in the ~85% of the ASD population that remain idiopathic.
Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Humans , Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease , DNA Copy Number Variations/genetics , Genomics
ABSTRACT
Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.
Subject(s)
Chromosome Inversion , Segmental Duplications, Genomic , Chromosome Inversion/genetics , DNA Copy Number Variations/genetics , Genome, Human , Genomics , Humans
ABSTRACT
The human genome contains hundreds of thousands of regions harboring copy-number variants (CNVs). However, the phenotypic effects of most such polymorphisms are unknown because only larger CNVs have been ascertainable from SNP-array data generated by large biobanks. We developed a computational approach leveraging haplotype sharing in biobank cohorts to more sensitively detect CNVs. Applied to UK Biobank, this approach accounted for approximately half of all rare gene inactivation events produced by genomic structural variation. This CNV call set enabled a detailed analysis of associations between CNVs and 56 quantitative traits, identifying 269 independent associations (p < 5 × 10⁻⁸) likely to be causally driven by CNVs. Putative target genes were identifiable for nearly half of the loci, enabling insights into dosage sensitivity of these genes and uncovering several gene-trait relationships. These results demonstrate the ability of haplotype-informed analysis to provide insights into the genetic basis of human complex traits.
Subject(s)
Multifactorial Inheritance , Quantitative Trait Loci , Humans , DNA Copy Number Variations , Phenotype , Genome, Human , Polymorphism, Single Nucleotide/genetics
ABSTRACT
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.
Subject(s)
Genome, Human , Whole Genome Sequencing , Female , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation , Male , Polymorphism, Single Nucleotide
ABSTRACT
Structural variations (SVs) and gene copy number variations (gCNVs) have contributed to crop evolution, domestication, and improvement. Here, we assembled 31 high-quality genomes of genetically diverse rice accessions. Coupling with two existing assemblies, we developed pan-genome-scale genomic resources including a graph-based genome, providing access to rice genomic variations. Specifically, we discovered 171,072 SVs and 25,549 gCNVs and used an Oryza glaberrima assembly to infer the derived states of SVs in the Oryza sativa population. Our analyses of SV formation mechanisms, impacts on gene expression, and distributions among subpopulations illustrate the utility of these resources for understanding how SVs and gCNVs shaped rice environmental adaptation and domestication. Our graph-based genome enabled genome-wide association study (GWAS)-based identification of phenotype-associated genetic variations undetectable when using only SNPs and a single reference assembly. Our work provides rich population-scale resources paired with easy-to-access tools to facilitate rice breeding as well as plant functional genomics and evolutionary biology research.
Subject(s)
Ecotype , Genetic Variation , Genome, Plant , Oryza/genetics , Adaptation, Physiological/genetics , Agriculture , Domestication , Gene Expression Profiling , Gene Expression Regulation, Plant , Genes, Plant , Genomic Structural Variation , Molecular Sequence Annotation , Phenotype
ABSTRACT
The recently enriched genomic history of Indigenous groups in the Americas is still meager concerning continental Central America. Here, we report ten pre-Hispanic (plus two early colonial) genomes and 84 genome-wide profiles from seven groups presently living in Panama. Our analyses reveal that pre-Hispanic demographic events contributed to the extensive genetic structure currently seen in the area, which is also characterized by a distinctive Isthmo-Colombian Indigenous component. This component drives these populations on a specific variability axis and derives from the local admixture of different ancestries of northern North American origin(s). Two of these ancestries were differentially associated with Pleistocene Indigenous groups that also moved into South America, leaving heterogeneous genetic footprints. An additional Pleistocene ancestry was brought by a still unsampled population of the Isthmus (UPopI) that remained restricted to the Isthmian area, expanded locally during the early Holocene, and left genomic traces up to the present day.
Subject(s)
American Indian or Alaska Native/genetics , Archaeology , Genomics/methods , American Indian or Alaska Native/classification , DNA, Mitochondrial/genetics , Genetic Variation , Genome, Human , Haplotypes , Humans , Phylogeny
ABSTRACT
Cultivated rice varieties are all diploid, and polyploidization of rice has long been desired because of its advantages in genome buffering, vigor, and environmental robustness. However, a workable route remains elusive. Here, we describe a practical strategy, namely de novo domestication of wild allotetraploid rice. By screening an inventory of allotetraploid wild rice, we identified one genotype of Oryza alta (CCDD), polyploid rice 1 (PPR1), and established two important resources for its de novo domestication: (1) an efficient tissue culture, transformation, and genome editing system and (2) a high-quality genome assembly discriminated into two subgenomes of 12 chromosomes apiece. With these resources, we show that six agronomically important traits could be rapidly improved by editing O. alta homologs of the genes controlling these traits in diploid rice. Our results demonstrate the possibility that de novo domesticated allotetraploid rice can be developed into a new staple cereal to strengthen world food security.
Subject(s)
Crops, Agricultural/genetics , Domestication , Oryza/genetics , CRISPR-Cas Systems , Food Security , Gene Editing , Genetic Variation , Genome, Plant , Oryza/classification , Polyploidy
ABSTRACT
Plant immunity is activated upon pathogen perception and often affects growth and yield when it is constitutively active. How plants fine-tune immune homeostasis in their natural habitats remains elusive. Here, we discover a conserved immune suppression network in cereals that orchestrates immune homeostasis, centering on a Ca²⁺-sensor, RESISTANCE OF RICE TO DISEASES1 (ROD1). ROD1 promotes reactive oxygen species (ROS) scavenging by stimulating catalase activity, and its protein stability is regulated by ubiquitination. ROD1 disruption confers resistance to multiple pathogens, whereas a natural ROD1 allele prevalent in indica rice with agroecology-specific distribution enhances resistance without yield penalty. The fungal effector AvrPiz-t structurally mimics ROD1 and activates the same ROS-scavenging cascade to suppress host immunity and promote virulence. We thus reveal a molecular framework adopted by both host and pathogen that integrates Ca²⁺ sensing and ROS homeostasis to suppress plant immunity, suggesting a principle for breeding disease-resistant, high-yield crops.
Subject(s)
Calcium/metabolism , Free Radical Scavengers/metabolism , Fungal Proteins/metabolism , Oryza/immunology , Plant Immunity , Plant Proteins/metabolism , Reactive Oxygen Species/metabolism , CRISPR-Cas Systems/genetics , Cell Membrane/metabolism , Disease Resistance/genetics , Models, Biological , Oryza/genetics , Plant Diseases/immunology , Plant Proteins/genetics , Protein Binding , Protein Stability , Reproduction , Species Specificity , Ubiquitin-Protein Ligases/metabolism , Ubiquitination , Zea mays/immunology
ABSTRACT
Structural variants (SVs) underlie important crop improvement and domestication traits. However, resolving the extent, diversity, and quantitative impact of SVs has been challenging. We used long-read nanopore sequencing to capture 238,490 SVs in 100 diverse tomato lines. This panSV genome, along with 14 new reference assemblies, revealed large-scale intermixing of diverse genotypes, as well as thousands of SVs intersecting genes and cis-regulatory regions. Hundreds of SV-gene pairs exhibit subtle and significant expression changes, which could broadly influence quantitative trait variation. By combining quantitative genetics with genome editing, we show how multiple SVs that changed gene dosage and expression levels modified fruit flavor, size, and production. In the last example, higher order epistasis among four SVs affecting three related transcription factors allowed introduction of an important harvesting trait in modern tomato. Our findings highlight the underexplored role of SVs in genotype-to-phenotype relationships and their widespread importance and utility in crop improvement.
Subject(s)
Crops, Agricultural/genetics , Gene Expression Regulation, Plant , Genomic Structural Variation , Solanum lycopersicum/genetics , Alleles , Cytochrome P-450 Enzyme System/genetics , Ecotype , Epistasis, Genetic , Fruit/genetics , Gene Duplication , Genome, Plant , Genotype , Inbreeding , Molecular Sequence Annotation , Phenotype , Plant Breeding , Quantitative Trait Loci/genetics
ABSTRACT
The gut microbiome is the resident microbial community of the gastrointestinal tract. This community is highly diverse, but how microbial diversity confers resistance or susceptibility to intestinal pathogens is poorly understood. Using transplantation of human microbiomes into several animal models of infection, we show that key microbiome species shape the chemical environment of the gut through the activity of the enzyme bile salt hydrolase. The activity of this enzyme reduced colonization by the major human diarrheal pathogen Vibrio cholerae by degrading the bile salt taurocholate that activates the expression of virulence genes. The absence of these functions and species permits increased infection loads on a personal microbiome-specific basis. These findings suggest new targets for individualized preventative strategies of V. cholerae infection through modulating the structure and function of the gut microbiome.
Subject(s)
Cholera/metabolism , Disease Susceptibility/microbiology , Gastrointestinal Microbiome/physiology , Adult , Animals , Bile Acids and Salts , Cholera/microbiology , Disease Models, Animal , Fecal Microbiota Transplantation/methods , Female , Host-Pathogen Interactions/physiology , Humans , Hydrolases/analysis , Male , Mice , Mice, Inbred C57BL , Microbiota , Taurocholic Acid/metabolism , Vibrio cholerae/pathogenicity , Vibrio cholerae/physiology , Virulence
ABSTRACT
Metagenomic inferences of bacterial strain diversity and infectious disease transmission studies largely assume a dominant, within-individual haplotype. We hypothesize that within-individual bacterial population diversity is critical for homeostasis of a healthy microbiome and infection risk. We characterized the evolutionary trajectory and functional distribution of Staphylococcus epidermidis, a keystone skin microbe and opportunistic pathogen. Analyzing 1,482 S. epidermidis genomes from 5 healthy individuals, we found that skin S. epidermidis isolates coalesce into multiple founder lineages rather than a single colonizer. Transmission events, natural selection, and pervasive horizontal gene transfer result in population admixture within skin sites and dissemination of antibiotic resistance genes within individuals. We provide experimental evidence for how admixture can modulate virulence and metabolism. Leveraging data on the contextual microbiome, we assess how interspecies interactions can shape genetic diversity and mobile gene elements. Our study provides insights into how within-individual evolution of human skin microbes shapes their functional diversification.
Subject(s)
Evolution, Molecular , Gene Transfer, Horizontal , Host Microbial Interactions/genetics , Microbiota/genetics , Polymorphism, Single Nucleotide , Skin/microbiology , Staphylococcus epidermidis/genetics , Adult , DNA, Bacterial/genetics , Drug Resistance, Bacterial/genetics , Female , Healthy Volunteers , Humans , Male , Middle Aged , Phylogeny , Staphylococcus epidermidis/isolation & purification , Staphylococcus epidermidis/pathogenicity , Virulence/genetics , Young Adult
ABSTRACT
Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.