ABSTRACT
The Genome Aggregation Database (gnomAD), widely recognized as the gold-standard reference map of human genetic variation, has largely overlooked tandem repeat (TR) expansions, even though TRs constitute ~6% of the human genome and are linked to more than 50 human diseases. Here, we introduce TR-gnomAD (https://wlcb.oit.uci.edu/TRgnomAD), a biobank-scale reference of 0.86 million TRs derived from 338,963 whole-genome sequencing (WGS) samples of diverse ancestries (39.5% non-European). TR-gnomAD offers critical insights into ancestry-specific disease prevalence by leveraging differences in TR unit-number frequencies among ancestries. Moreover, TR-gnomAD can distinguish common, presumably benign TR expansions, which are prevalent in TR-gnomAD, from potentially pathogenic TR expansions, which are found more frequently in disease groups than in TR-gnomAD. Together, these features make TR-gnomAD an invaluable resource for researchers and physicians interpreting TR expansions in individuals with genetic diseases.
Subject(s)
Genome, Human , Tandem Repeat Sequences , Humans , Tandem Repeat Sequences/genetics , Whole Genome Sequencing , Databases, Genetic , DNA Repeat Expansion/genetics , Genome-Wide Association Study
ABSTRACT
Characterizing somatic mutations in the brain is important for disentangling the complex mechanisms of aging, yet little is known about mutational patterns in different brain cell types. Here, we performed whole-genome sequencing (WGS) of 86 single oligodendrocytes, 20 mixed glia, and 56 single neurons from neurotypical individuals spanning 0.4-104 years of age and identified >92,000 somatic single-nucleotide variants (sSNVs) and small insertions/deletions (indels). Although both cell types accumulate somatic mutations linearly with age, oligodendrocytes accumulated sSNVs 81% faster than neurons and indels 28% slower than neurons. Correlation of mutations with single-nucleus RNA profiles and chromatin accessibility from the same brains revealed that oligodendrocyte mutations are enriched in inactive genomic regions and are distributed across the genome similarly to mutations in brain cancers. In contrast, neuronal mutations are enriched in open, transcriptionally active chromatin. These stark differences suggest that distinct mutagenic processes are active in oligodendrocytes and neurons.
Subject(s)
Aging , Brain , Neurons , Oligodendroglia , Humans , Aging/genetics , Aging/pathology , Chromatin/genetics , Chromatin/metabolism , Mutation , Neurons/metabolism , Neurons/pathology , Oligodendroglia/metabolism , Oligodendroglia/pathology , Single-Cell Gene Expression Analysis , Whole Genome Sequencing , Brain/metabolism , Brain/pathology , Polymorphism, Single Nucleotide , INDEL Mutation , Biological Specimen Banks , Oligodendrocyte Precursor Cells/metabolism , Oligodendrocyte Precursor Cells/pathology
ABSTRACT
We conduct high-coverage (>30×) whole-genome sequencing of 180 individuals from 12 indigenous African populations. We identify millions of unreported variants, many predicted to be functionally important. We observe that the ancestors of southern African San and central African rainforest hunter-gatherers (RHG) diverged from other populations >200 kya and maintained a large effective population size. We observe evidence for ancient population structure in Africa and for multiple introgression events from "ghost" populations with highly diverged genetic lineages. Although eastern and southern Khoesan-speaking hunter-gatherer populations are currently geographically isolated, we observe evidence for gene flow between them lasting until ~12 kya. We identify signatures of local adaptation for traits related to skin color, immune response, height, and metabolic processes. We identify a positively selected variant in the lightly pigmented San that influences pigmentation in vitro by regulating the enhancer activity and gene expression of PDPK1.
Subject(s)
Acclimatization , Skin Pigmentation , Humans , Whole Genome Sequencing , Population Density , Africa , 3-Phosphoinositide-Dependent Protein Kinases
ABSTRACT
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to 30X depth on the Illumina platform. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs and among INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making the variants discovered here accessible for association studies.
Subject(s)
Genome, Human , Whole Genome Sequencing , Female , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation , Male , Polymorphism, Single Nucleotide
ABSTRACT
An outbreak of over 1,000 COVID-19 cases in Provincetown, Massachusetts (MA), in July 2021 (the first large outbreak in the US mostly among vaccinated individuals) prompted a comprehensive public health response, motivating changes to national masking recommendations and raising questions about infection and transmission among vaccinated individuals. To address these questions, we combined viral genomic and epidemiological data from 467 individuals, including 40% of outbreak-associated cases. The Delta variant accounted for 99% of cases in this dataset; it was introduced from at least 40 sources, but 83% of cases derived from a single source, likely through transmission across multiple settings over a short time rather than at a single event. Genomic and epidemiological data supported multiple transmissions of Delta from and between fully vaccinated individuals. However, despite its magnitude, the outbreak had limited onward impact in MA and the US overall, likely due to high vaccination rates and a robust public health response.
Subject(s)
COVID-19/epidemiology , COVID-19/immunology , COVID-19/transmission , SARS-CoV-2/genetics , SARS-CoV-2/immunology , Adolescent , Adult , Aged , Aged, 80 and over , COVID-19/virology , Child , Child, Preschool , Contact Tracing/methods , Disease Outbreaks , Female , Genome, Viral , Humans , Infant , Infant, Newborn , Male , Massachusetts/epidemiology , Middle Aged , Molecular Epidemiology , Phylogeny , SARS-CoV-2/classification , Vaccination , Whole Genome Sequencing , Young Adult
ABSTRACT
Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin, and drivers of ITH across cancer types are poorly understood. To address this, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution and provide a pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.
Subject(s)
Genetic Heterogeneity , Neoplasms/genetics , DNA Copy Number Variations , DNA, Neoplasm/chemistry , DNA, Neoplasm/metabolism , Databases, Genetic , Drug Resistance, Neoplasm/genetics , Humans , Neoplasms/pathology , Polymorphism, Single Nucleotide , Whole Genome Sequencing
ABSTRACT
Lungfishes are the closest extant relatives of tetrapods and preserve ancestral traits linked with the water-to-land transition. However, their huge genome sizes have hindered understanding of this key evolutionary transition. Here, we report a 40-Gb chromosome-level assembly of the African lungfish (Protopterus annectens) genome, the largest genome assembly reported to date, with contig and chromosome N50 values of 1.60 Mb and 2.81 Gb, respectively. The large size of the lungfish genome is due mainly to retrotransposons. Ultra-long genes show expression levels similar to those of other genes, indicating that lungfishes have evolved high transcription efficacy to keep gene expression balanced. Combining the genome with transcriptome and experimental data, we identified potential genes and regulatory elements related to terrestrial adaptation traits such as pulmonary surfactant, anxiolytic ability, pentadactyl limbs, and pharyngeal remodeling. Our results provide insights and key resources for understanding the evolutionary pathway leading from fishes to humans.
Subject(s)
Adaptation, Biological , Biological Evolution , Fishes/genetics , Whole Genome Sequencing , Animal Fins/anatomy & histology , Animal Fins/physiology , Animals , Extremities/anatomy & histology , Extremities/physiology , Fishes/anatomy & histology , Fishes/classification , Fishes/physiology , Phylogeny , Respiratory Physiological Phenomena , Respiratory System/anatomy & histology , Vertebrates/genetics
ABSTRACT
Industrialization has impacted the human gut ecosystem, resulting in altered microbiome composition and diversity. Whether bacterial genomes may also adapt to the industrialization of their host populations remains largely unexplored. Here, we investigate the extent to which the rates and targets of horizontal gene transfer (HGT) vary across thousands of bacterial strains from 15 human populations spanning a range of industrialization. We show that HGTs have accumulated in the microbiome over recent host generations and that HGT occurs at high frequency within individuals. Comparison across human populations reveals that industrialized lifestyles are associated with higher HGT rates and that the functions of HGTs are related to the level of host industrialization. Our results suggest that gut bacteria continuously acquire new functionality based on host lifestyle and that high rates of HGT may be a recent development in human history linked to industrialization.
Subject(s)
Bacteria/genetics , Gastrointestinal Microbiome , Gene Transfer, Horizontal , Bacteria/classification , Bacteria/isolation & purification , DNA, Bacterial/chemistry , DNA, Bacterial/isolation & purification , DNA, Bacterial/metabolism , Feces/microbiology , Genome, Bacterial , Humans , Phylogeny , Rural Population , Sequence Analysis, DNA , Urban Population , Whole Genome Sequencing
ABSTRACT
Genetic studies have revealed many variant loci that are associated with immune-mediated diseases. To elucidate the disease pathogenesis, it is essential to understand the function of these variants, especially under disease-associated conditions. Here, we performed a large-scale immune cell gene-expression analysis, together with whole-genome sequence analysis. Our dataset consists of 28 distinct immune cell subsets from 337 patients diagnosed with 10 categories of immune-mediated diseases and 79 healthy volunteers. Our dataset captured distinctive gene-expression profiles across immune cell types and diseases. Expression quantitative trait loci (eQTL) analysis revealed dynamic variations of eQTL effects in the context of immunological conditions, as well as cell types. These cell-type-specific and context-dependent eQTLs showed significant enrichment in immune disease-associated genetic variants, and they implicated the disease-relevant cell types, genes, and environment. This atlas deepens our understanding of the immunogenetic functions of disease-associated variants under in vivo disease conditions.
Subject(s)
Gene Expression Regulation/genetics , Gene Expression/immunology , Immune System Diseases/genetics , Adult , Female , Gene Expression/genetics , Gene Expression Regulation/immunology , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Humans , Immune System/cytology , Immune System/metabolism , Immune System Diseases/metabolism , Immune System Diseases/physiopathology , Male , Middle Aged , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Quantitative Trait Loci/immunology , Transcriptome/genetics , Whole Genome Sequencing/methods
ABSTRACT
We identified an emerging severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variant by viral whole-genome sequencing of 2,172 nasal/nasopharyngeal swab samples from 44 counties in California, a state in the western United States. Named B.1.427/B.1.429 to denote its two lineages, the variant emerged in May 2020 and increased from 0% to >50% of sequenced cases from September 2020 to January 2021, showing 18.6%-24% increased transmissibility relative to wild-type circulating strains. The variant carries three mutations in the spike protein, including an L452R substitution. We found 2-fold increased B.1.427/B.1.429 viral shedding in vivo and increased L452R pseudovirus infection of cell cultures and lung organoids, albeit decreased relative to pseudoviruses carrying the N501Y mutation common to variants B.1.1.7, B.1.351, and P.1. Antibody neutralization assays revealed 4.0- to 6.7-fold and 2.0-fold decreases in neutralizing titers from convalescent patients and vaccine recipients, respectively. The increased prevalence of a more transmissible variant in California exhibiting decreased antibody neutralization warrants further investigation.
Subject(s)
Antibodies, Neutralizing/immunology , COVID-19/immunology , COVID-19/transmission , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/immunology , Antibodies, Monoclonal/immunology , Antibodies, Viral/immunology , Humans , Mutation/genetics , Whole Genome Sequencing/methods
ABSTRACT
We present evidence for multiple independent origins of recombinant SARS-CoV-2 viruses sampled from late 2020 and early 2021 in the United Kingdom. Their genomes carry single-nucleotide polymorphisms and deletions that are characteristic of the B.1.1.7 variant of concern but lack the full complement of lineage-defining mutations. Instead, the remainder of their genomes share contiguous genetic variation with non-B.1.1.7 viruses circulating in the same geographic area at the same time as the recombinants. In four instances, there was evidence for onward transmission of a recombinant-origin virus, including one transmission cluster of 45 sequenced cases over the course of 2 months. The inferred genomic locations of recombination breakpoints suggest that every community-transmitted recombinant virus inherited its spike region from a B.1.1.7 parental virus, consistent with a transmission advantage for B.1.1.7's set of mutations.
Subject(s)
COVID-19/epidemiology , COVID-19/transmission , Pandemics , Recombination, Genetic , SARS-CoV-2/genetics , Base Sequence/genetics , COVID-19/virology , Computational Biology/methods , Gene Frequency , Genome, Viral , Genotype , Humans , Mutation , Phylogeny , Polymorphism, Single Nucleotide , United Kingdom/epidemiology , Whole Genome Sequencing/methods
ABSTRACT
Global dispersal and increasing frequency of the SARS-CoV-2 spike protein variant D614G are suggestive of a selective advantage but may also be due to a random founder effect. We investigate the hypothesis of positive selection of spike D614G in the United Kingdom using more than 25,000 whole-genome SARS-CoV-2 sequences. Despite the availability of a large dataset, well represented by both spike 614 variants, not all approaches showed a conclusive signal of positive selection. Population genetic analysis indicates that 614G increases in frequency relative to 614D in a manner consistent with a selective advantage. We do not find any indication that patients infected with the spike 614G variant have higher COVID-19 mortality or clinical severity, but 614G is associated with higher viral load and younger age of patients. Significant differences in growth and size of 614G phylogenetic clusters indicate a need for continued study of this variant.
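One textbook way to reason about a variant that "increases in frequency in a manner consistent with a selective advantage" is the logistic model. In that model, the log-odds of the favored variant's frequency grow linearly at a rate s per generation. The Python sketch below rests on assumptions: it is not the population genetic analysis used in the study, and `frequency_after` and `estimate_s` are hypothetical helpers.

```python
import math


def logit(p):
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def frequency_after(p0, s, t):
    """Frequency of the favored variant after t generations under the logistic model."""
    x = logit(p0) + s * t  # log-odds increase linearly with time
    return 1.0 / (1.0 + math.exp(-x))


def estimate_s(p0, pt, t):
    """Per-generation log-odds growth rate implied by a change from p0 to pt over t generations."""
    return (logit(pt) - logit(p0)) / t
```

With s = 0 the frequency stays constant, and any positive s drives the log-odds up linearly. The discrete haploid recursion p' = p(1 + s)/(1 + ps) multiplies the odds by (1 + s) each generation, so for small s it closely follows this model. A founder effect, by contrast, provides no systematic reason for the log-odds to keep rising.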
Subject(s)
Amino Acid Substitution , COVID-19/transmission , COVID-19/virology , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , Spike Glycoprotein, Coronavirus/genetics , Aspartic Acid/analysis , Aspartic Acid/genetics , COVID-19/epidemiology , Genome, Viral , Glycine/analysis , Glycine/genetics , Humans , Mutation , SARS-CoV-2/growth & development , United Kingdom/epidemiology , Virulence , Whole Genome Sequencing
ABSTRACT
The dichotomous model of "drivers" and "passengers" in cancer posits that only a few mutations in a tumor strongly affect its progression, with the remaining ones being inconsequential. Here, we leveraged the comprehensive variant dataset from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project to demonstrate that, in addition to the dichotomy of high- and low-impact variants, there is a third group of medium-impact putative passengers. Moreover, we found that molecular impact correlates with subclonal architecture (i.e., early versus late mutations) and that different mutational signatures encode mutations with divergent impact. Furthermore, we adapted an additive-effects model from complex-trait studies to show that the aggregated effect of putative passengers, including undetected weak drivers, provides significant additional power (~12% additive variance) for predicting cancerous phenotypes beyond PCAWG-identified driver mutations. Finally, this framework allowed us to estimate the frequency of potential weak-driver mutations in PCAWG samples lacking any well-characterized driver alterations.
Subject(s)
Genome, Human/genetics , Genomics/methods , Mutation/genetics , Neoplasms/genetics , DNA Mutational Analysis/methods , Disease Progression , Humans , Neoplasms/pathology , Whole Genome Sequencing
ABSTRACT
Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.
Subject(s)
Genomic Structural Variation/genetics , Genomics/methods , Neoplasms/genetics , Chromosome Inversion/genetics , Chromothripsis , DNA Copy Number Variations/genetics , Gene Rearrangement/genetics , Genome, Human/genetics , Humans , Mutation/genetics , Whole Genome Sequencing/methods
ABSTRACT
Gastroenteropancreatic (GEP) neuroendocrine neoplasm (NEN), which comprises neuroendocrine tumor and neuroendocrine carcinoma (NEC), is a lethal but under-investigated disease owing to its rarity. To address the scarcity of clinically relevant models of GEP-NEN, here we established 25 lines of NEN organoids and performed their comprehensive molecular characterization. GEP-NEN organoids recapitulated the pathohistological and functional phenotypes of the original tumors. Whole-genome sequencing revealed frequent genetic alterations in TP53 and RB1 in GEP-NECs, and characteristic chromosome-wide loss of heterozygosity in GEP-NENs. Transcriptome analysis identified molecular subtypes distinguished by the expression of distinct transcription factors. GEP-NEN organoids gained independence from the stem cell niche irrespective of genetic mutations. Compound knockout of TP53 and RB1, together with overexpression of key transcription factors, conferred on normal colonic epithelium phenotypes compatible with GEP-NEN biology. Altogether, our study not only provides a genetic understanding of GEP-NEN but also connects its genetics to its biological phenotypes.
Subject(s)
Biological Specimen Banks , Neuroendocrine Tumors/pathology , Organoids/pathology , Animals , Chromosomes, Human/genetics , Genotype , Humans , Intercellular Signaling Peptides and Proteins/metabolism , Intestinal Neoplasms/genetics , Intestinal Neoplasms/pathology , Male , Mice , Models, Genetic , Mutation/genetics , Neuroendocrine Tumors/genetics , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/pathology , Phenotype , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology , Transcriptome/genetics , Whole Genome Sequencing
ABSTRACT
Inflammatory bowel disease (IBD) is a chronic inflammatory disease associated with increased risk of gastrointestinal cancers. We whole-genome sequenced 446 colonic crypts from 46 IBD patients and compared these to 412 crypts from 41 non-IBD controls from our previous publication on the mutation landscape of the normal colon. The average mutation rate of affected colonic epithelial cells is 2.4-fold that of healthy colon, and this increase is mostly driven by acceleration of mutational processes ubiquitously observed in normal colon. In contrast to the normal colon, where clonal expansions outside the confines of the crypt are rare, we observed widespread millimeter-scale clonal expansions. We found that non-synonymous mutations in ARID1A, FBXW7, PIGR, ZC3H12A, and genes in the interleukin 17 and Toll-like receptor pathways are under positive selection in IBD. These results suggest that distinct selection mechanisms operate in the colitis-affected colon and that somatic mutations may play a causal role in IBD pathogenesis.
Subject(s)
Clonal Evolution/genetics , Colitis/genetics , Inflammatory Bowel Diseases/genetics , Mutation Rate , Adult , Aged , Aged, 80 and over , Aging/genetics , Clonal Evolution/immunology , Colitis/metabolism , Colitis, Ulcerative/genetics , Colitis, Ulcerative/metabolism , Crohn Disease/genetics , Crohn Disease/metabolism , DNA-Binding Proteins/genetics , Epithelial Cells/metabolism , Epithelial Cells/pathology , F-Box-WD Repeat-Containing Protein 7/genetics , Female , Humans , INDEL Mutation , Inflammatory Bowel Diseases/immunology , Inflammatory Bowel Diseases/metabolism , Inflammatory Bowel Diseases/pathology , Interleukin-17/metabolism , Intestinal Mucosa/metabolism , Intestinal Mucosa/pathology , Male , Middle Aged , Phylogeny , Point Mutation , Receptors, Cell Surface/genetics , Ribonucleases/genetics , Toll-Like Receptors/genetics , Transcription Factors/genetics , Whole Genome Sequencing
ABSTRACT
Affordable genome sequencing technologies promise to revolutionize the field of human genetics by enabling comprehensive studies that interrogate all classes of genome variation, genome-wide, across the entire allele frequency spectrum. Ongoing projects worldwide are sequencing many thousands, and soon millions, of human genomes as part of various gene mapping studies, biobanking efforts, and clinical programs. However, while genome sequencing data production has become routine, genome analysis and interpretation remain challenging endeavors with many limitations and caveats. Here, we review the current state of technologies for genetic variant discovery, genotyping, and functional interpretation and discuss the prospects for future advances. We focus on germline variants discovered by whole-genome sequencing, genome-wide functional genomic approaches for predicting and measuring variant functional effects, and implications for studies of common and rare human disease.
Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Sequence Analysis, DNA/trends , Biological Specimen Banks , Chromosome Mapping/methods , Genetic Predisposition to Disease/genetics , Genetic Testing/trends , Genome-Wide Association Study , Genomics/methods , Genomics/trends , High-Throughput Nucleotide Sequencing/methods , Human Genome Project , Humans , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods , Whole Genome Sequencing/methods , Whole Genome Sequencing/trends
ABSTRACT
Genomic studies in African populations provide unique opportunities to understand disease etiology, human diversity, and population history. In the largest study of its kind, comprising genome-wide data from 6,400 individuals and whole-genome sequences from 1,978 individuals from rural Uganda, we find evidence of geographically correlated fine-scale population substructure. Historically, the ancestry of modern Ugandans was best represented by a mixture of ancient East African pastoralists. We demonstrate the value of the largest sequence panel from Africa to date as an imputation resource. Examining 34 cardiometabolic traits, we show systematic differences in trait heritability between European and African populations, probably reflecting the differential impact of genes and environment. In a multi-trait pan-African GWAS of up to 14,126 individuals, we identify novel loci associated with anthropometric, hematological, lipid, and glycemic traits. We find that several functionally important signals are driven by Africa-specific variants, highlighting the value of studying diverse populations across the region.
Subject(s)
Black People/genetics , Genetic Predisposition to Disease , Genome, Human/genetics , Genomics , Female , Gene Frequency/genetics , Genome-Wide Association Study , Humans , Male , Polymorphism, Single Nucleotide/genetics , Uganda/epidemiology , Whole Genome Sequencing
ABSTRACT
Underrepresentation of Asian genomes has hindered population and medical genetics research on Asians, leading to population disparities in precision medicine. By whole-genome sequencing of 4,810 Singapore Chinese, Malays, and Indians, we found 98.3 million SNPs and small insertions or deletions, over half of which are novel. Population structure analysis showed that the three ethnic groups in Singapore represent Asian genetic diversity well and revealed a novel Malay-related ancestry component. Furthermore, demographic inference suggested that Malays split from Chinese ~24,800 years ago and experienced significant admixture with East Asians ~1,700 years ago, coinciding with the Austronesian expansion. Additionally, we identified 20 candidate loci for natural selection, 14 of which harbored robust associations with complex traits and diseases. Finally, we show that our data can substantially improve genotype imputation in diverse Asian and Oceanian populations. These results highlight the value of our data as a resource to empower human genetics discovery across broad geographic regions.
Subject(s)
Genetics, Population , Genome, Human/genetics , Selection, Genetic , Whole Genome Sequencing , Asian People/genetics , Female , Genotype , Humans , Malaysia/epidemiology , Male , Polymorphism, Single Nucleotide/genetics , Singapore/epidemiology
ABSTRACT
Whole-genome sequencing (WGS) of human tumors has revealed distinct mutation patterns that hint at the causative origins of cancer. We examined mutational signatures in whole-genome sequences of 324 human induced pluripotent stem cell samples exposed to 79 known or suspected environmental carcinogens. Forty-one agents yielded characteristic substitution mutational signatures, some of which were similar to signatures found in human tumors. Additionally, six agents produced double-substitution signatures and eight produced indel signatures. Investigating mutation asymmetries across genome topography revealed fully functional mismatch and transcription-coupled repair pathways. DNA damage induced by environmental mutagens can be resolved by disparate repair and/or replicative pathways, resulting in an assortment of signature outcomes even for a single agent. This compendium of experimentally induced mutational signatures permits further exploration of the roles of environmental agents in cancer etiology and underscores how human stem cell DNA is directly vulnerable to environmental agents.