ABSTRACT
The adoption of agriculture triggered a rapid shift towards starch-rich diets in human populations [1]. Amylase genes facilitate starch digestion, and increased amylase copy number has been observed in some modern human populations with high-starch intake [2], although evidence of recent selection is lacking [3,4]. Here, using 94 long-read haplotype-resolved assemblies and short-read data from approximately 5,600 contemporary and ancient humans, we resolve the diversity and evolutionary history of structural variation at the amylase locus. We find that amylase genes have higher copy numbers in agricultural populations than in fishing, hunting and pastoral populations. We identify 28 distinct amylase structural architectures and demonstrate that nearly identical structures have arisen recurrently on different haplotype backgrounds throughout recent human history. AMY1 and AMY2A genes each underwent multiple duplication/deletion events with mutation rates up to more than 10,000-fold the single-nucleotide polymorphism mutation rate, whereas AMY2B gene duplications share a single origin. Using a pangenome-based approach, we infer structural haplotypes across thousands of humans identifying extensively duplicated haplotypes at higher frequency in modern agricultural populations. Leveraging 533 ancient human genomes, we find that duplication-containing haplotypes (with more gene copies than the ancestral haplotype) have rapidly increased in frequency over the past 12,000 years in West Eurasians, suggestive of positive selection. Together, our study highlights the potential effects of the agricultural revolution on human genomes and the importance of structural variation in human adaptation.
Subject(s)
Agriculture , Amylases , Evolution, Molecular , Gene Dosage , Genome, Human , Haplotypes , Selection, Genetic , Humans , Agriculture/history , Agriculture/statistics & numerical data , Amylases/genetics , Amylases/chemistry , Gene Dosage/genetics , Gene Duplication/genetics , Genetic Loci/genetics , Genome, Human/genetics , Haplotypes/genetics , History, Ancient , Mutation Rate , Polymorphism, Single Nucleotide/genetics , Hunting/statistics & numerical data , Gene Deletion , DNA, Ancient/analysis
ABSTRACT
The Holocene (beginning around 12,000 years ago) encompassed some of the most significant changes in human evolution, with far-reaching consequences for the dietary, physical and mental health of present-day populations. Using a dataset of more than 1,600 imputed ancient genomes [1], we modelled the selection landscape during the transition from hunting and gathering, to farming and pastoralism across West Eurasia. We identify key selection signals related to metabolism, including that selection at the FADS cluster began earlier than previously reported and that selection near the LCT locus predates the emergence of the lactase persistence allele by thousands of years. We also find strong selection in the HLA region, possibly due to increased exposure to pathogens during the Bronze Age. Using ancient individuals to infer local ancestry tracts in over 400,000 samples from the UK Biobank, we identify widespread differences in the distribution of Mesolithic, Neolithic and Bronze Age ancestries across Eurasia. By calculating ancestry-specific polygenic risk scores, we show that height differences between Northern and Southern Europe are associated with differential Steppe ancestry, rather than selection, and that risk alleles for mood-related phenotypes are enriched for Neolithic farmer ancestry, whereas risk alleles for diabetes and Alzheimer's disease are enriched for Western hunter-gatherer ancestry. Our results indicate that ancient selection and migration were large contributors to the distribution of phenotypic diversity in present-day Europeans.
Subject(s)
Asian , European People , Genome, Human , Selection, Genetic , Humans , Affect , Agriculture/history , Alleles , Alzheimer Disease/genetics , Asia/ethnology , Asian/genetics , Diabetes Mellitus/genetics , Europe/ethnology , European People/genetics , Farmers/history , Genetic Loci/genetics , Genetic Predisposition to Disease , Genome, Human/genetics , History, Ancient , Human Migration , Hunting/history , Multigene Family/genetics , Phenotype , UK Biobank , Multifactorial Inheritance/genetics
ABSTRACT
Pangenome graphs can represent all variation between multiple reference genomes, but current approaches to build them exclude complex sequences or are based upon a single reference. In response, we developed the PanGenome Graph Builder, a pipeline for constructing pangenome graphs without bias or exclusion. The PanGenome Graph Builder uses all-to-all alignments to build a variation graph in which we can identify variation, measure conservation, detect recombination events and infer phylogenetic relationships.
ABSTRACT
Gene duplication is an important source of phenotypic change and adaptive evolution. We leverage a haploid hydatidiform mole to identify highly identical sequences missing from the reference genome, confirming that the cortical development gene Slit-Robo Rho GTPase-activating protein 2 (SRGAP2) duplicated three times exclusively in humans. We show that the promoter and first nine exons of SRGAP2 duplicated from 1q32.1 (SRGAP2A) to 1q21.1 (SRGAP2B) ~3.4 million years ago (mya). Two larger duplications later copied SRGAP2B to chromosome 1p12 (SRGAP2C) and to proximal 1q21.1 (SRGAP2D) ~2.4 and ~1 mya, respectively. Sequence and expression analyses show that SRGAP2C is the most likely duplicate to encode a functional protein and is among the most fixed human-specific duplicate genes. Our data suggest a mechanism where incomplete duplication created a novel gene function (antagonizing parental SRGAP2 function) immediately "at birth" 2-3 mya, which is a time corresponding to the transition from Australopithecus to Homo and the beginning of neocortex expansion.
Subject(s)
Evolution, Molecular , GTPase-Activating Proteins/genetics , Primates/genetics , Segmental Duplications, Genomic , Animals , DNA Copy Number Variations , Female , Genetics, Medical , Humans , Hydatidiform Mole/genetics , In Situ Hybridization, Fluorescence , Mammals/genetics , Molecular Sequence Data , Pregnancy
ABSTRACT
Aging is a nearly inescapable trait among organisms yet lifespan varies tremendously across different species and spans several orders of magnitude in vertebrates alone. This vast phenotypic diversity is driven by distinct evolutionary trajectories and tradeoffs that are reflected in patterns of diversification and constraint in organismal genomes. Age-specific impacts of selection also shape allele frequencies in populations, thus impacting disease susceptibility and environment-specific mortality risk. Further, the mutational processes that spawn this genetic diversity in both germline and somatic cells are strongly influenced by age and life history. We discuss recent advances in our understanding of the evolution of aging and lifespan at organismal, population, and cellular scales, and highlight outstanding questions that remain unanswered.
ABSTRACT
Advancements in sequencing technologies and assembly methods enable the regular production of high-quality genome assemblies characterizing complex regions. However, challenges remain in efficiently interpreting variation at various scales, from smaller tandem repeats to megabase rearrangements, across many human genomes. We present a PanGenome Research Tool Kit (PGR-TK) enabling analyses of complex pangenome structural and haplotype variation at multiple scales. We apply the graph decomposition methods in PGR-TK to the class II major histocompatibility complex demonstrating the importance of the human pangenome for analyzing complicated regions. Moreover, we investigate the Y-chromosome genes, DAZ1/DAZ2/DAZ3/DAZ4, of which structural variants have been linked to male infertility, and X-chromosome genes OPN1LW and OPN1MW linked to eye disorders. We further showcase PGR-TK across 395 complex repetitive medically important genes. This highlights the power of PGR-TK to resolve complex variation in regions of the genome that were previously too complex to analyze.
Subject(s)
Genome, Human , Genomics , Male , Humans , Major Histocompatibility Complex
ABSTRACT
BACKGROUND: Elephant seals exhibit extreme hypoxemic tolerance derived from repetitive hypoxia/reoxygenation episodes they experience during diving bouts. Real-time assessment of the molecular changes underlying protection against hypoxic injury in seals remains restricted by their at-sea inaccessibility. Hence, we developed a proliferative arterial endothelial cell culture model from elephant seals and used RNA-seq, functional assays, and confocal microscopy to assess the molecular response to prolonged hypoxia. RESULTS: Seal and human endothelial cells exposed to 1% O2 for up to 6 h respond differently to acute and prolonged hypoxia. Seal cells decouple stabilization of the hypoxia-sensitive transcriptional regulator HIF-1α from angiogenic signaling. Rapid upregulation of genes involved in glutathione (GSH) metabolism supports the maintenance of GSH pools, and intracellular succinate increases in seal but not human cells. High maximal and spare respiratory capacity in seal cells after hypoxia exposure occurs in concert with increasing mitochondrial branch length and independent from major changes in extracellular acidification rate, suggesting that seal cells recover oxidative metabolism without significant glycolytic dependency after hypoxia exposure. CONCLUSIONS: We found that the glutathione antioxidant system is upregulated in seal endothelial cells during hypoxia, while this system remains static in comparable human cells. Furthermore, we found that in contrast to human cells, hypoxia exposure rapidly activates HIF-1 in seal cells, but this response is decoupled from the canonical angiogenesis pathway. These results highlight the unique mechanisms that confer extraordinary tolerance to limited oxygen availability in a champion diving mammal.
Subject(s)
Antioxidants , Endothelial Cells , Seals, Earless , Signal Transduction , Up-Regulation , Animals , Seals, Earless/physiology , Seals, Earless/metabolism , Endothelial Cells/metabolism , Endothelial Cells/drug effects , Antioxidants/metabolism , Humans , Hypoxia/metabolism , Cell Hypoxia , Neovascularization, Physiologic/drug effects , Neovascularization, Physiologic/physiology , Cells, Cultured , Glutathione/metabolism , Hypoxia-Inducible Factor 1, alpha Subunit/metabolism , Hypoxia-Inducible Factor 1, alpha Subunit/genetics
ABSTRACT
The Yuma myotis bat (Myotis yumanensis) is a small vespertilionid bat and one of 52 species of new world Myotis bats in the subgenus Pizonyx. While M. yumanensis populations currently appear relatively stable, it is one of 12 bat species known or suspected to be susceptible to white-nose syndrome, the fungal disease causing declines in bat populations across North America. Only two of these 12 species have genome resources available, which limits the ability of resource managers to use genomic techniques to track the responses of bat populations to white-nose syndrome generally. Here we present the first de novo genome assembly for Yuma myotis, generated as a part of the California Conservation Genomics Project. The M. yumanensis genome was generated using a combination of PacBio HiFi long reads and Omni-C chromatin-proximity sequencing technology. This high-quality genome is one of the most complete bat assemblies available, with a contig N50 of 28.03 Mb, scaffold N50 of 99.14 Mb, and BUSCO completeness score of 93.7%. The Yuma myotis genome provides a high-quality resource that will aid in comparative genomic and evolutionary studies, as well as inform conservation management related to white-nose syndrome.
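The contig and scaffold N50 values quoted in this abstract follow the usual assembly-statistics definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that calculation is given here for orientation only; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration: sorts lengths from longest to
    shortest and returns the length at which the running total first
    reaches at least half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example (made-up contig lengths, not real assembly data).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_contigs)
```

In practice the input would be the contig or scaffold lengths read from an assembly FASTA index; a larger N50 indicates a more contiguous assembly but says nothing about its correctness.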
Subject(s)
Chiroptera , Animals , Chiroptera/genetics , North America , Genome , Genomics , Biological Evolution
ABSTRACT
Townsend's big-eared bat, Corynorhinus townsendii, is a cave- and mine-roosting species found largely in western North America. Considered a species of conservation concern throughout much of its range, protection efforts would greatly benefit from understanding patterns of population structure, genetic diversity, and local adaptation. To facilitate such research, we present the first de novo genome assembly of C. townsendii as part of the California Conservation Genomics Project (CCGP). Pacific Biosciences HiFi long reads and Omni-C chromatin-proximity sequencing technologies were used to produce a de novo genome assembly, consistent with the standard CCGP reference genome protocol. This assembly comprises 391 scaffolds spanning 2.1 Gb, represented by a scaffold N50 of 174.6 Mb, a contig N50 of 23.4 Mb, and a benchmarking universal single-copy ortholog (BUSCO) completeness score of 96.6%. This high-quality genome will be a key tool for informed conservation and management of this vulnerable species in California and across its range.
Subject(s)
Chiroptera , Animals , Chiroptera/genetics , Genome , Genomics/methods , North America
ABSTRACT
Many RNA binding proteins (RBPs) bind specific RNA sequence motifs, but only a small fraction (~15%-40%) of RBP motif occurrences are occupied in vivo. To determine which contextual features discriminate between bound and unbound motifs, we performed an in vitro binding assay using 12,000 mouse RNA sequences with the RBPs MBNL1 and RBFOX2. Surprisingly, the strength of binding to motif occurrences in vitro was significantly correlated with in vivo binding, developmental regulation, and evolutionary age of alternative splicing. Multiple lines of evidence indicate that the primary context effect that affects binding in vitro and in vivo is RNA secondary structure. Large-scale combinatorial mutagenesis of unfavorable sequence contexts revealed a consistent pattern whereby mutations that increased motif accessibility improved protein binding and regulatory activity. Our results indicate widespread inhibition of motif binding by local RNA secondary structure and suggest that mutations that alter sequence context commonly affect RBP binding and regulation.
Subject(s)
Algorithms , DNA-Binding Proteins/chemistry , RNA Splicing Factors/chemistry , RNA-Binding Proteins/chemistry , RNA/chemistry , Alternative Splicing , Animals , Binding Sites , Cattle , Cell Differentiation , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Fibroblasts/cytology , Fibroblasts/metabolism , Gene Expression , Macaca , Mice , Mice, Knockout , Mouse Embryonic Stem Cells/cytology , Mouse Embryonic Stem Cells/metabolism , Mutation , Neurons/cytology , Neurons/metabolism , Nucleic Acid Conformation , Nucleotide Motifs , Protein Binding , Protein Interaction Domains and Motifs , RNA/genetics , RNA/metabolism , RNA Splicing Factors/genetics , RNA Splicing Factors/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Rats , Software
ABSTRACT
BACKGROUND: Mitochondrial genome sequences have become critical to the study of biodiversity. Genome skimming and other short-read based methods are the most common approaches, but they are not well-suited to scale up to multiplexing hundreds of samples. Here, we report on a new approach to sequence hundreds to thousands of complete mitochondrial genomes in parallel using long-amplicon sequencing. We amplified the mitochondrial genome of 677 specimens in two partially overlapping amplicons and implemented an asymmetric PCR-based indexing approach to multiplex 1,159 long amplicons together on a single PacBio SMRT Sequel II cell. We also tested this method on Oxford Nanopore Technologies (ONT) MinION R9.4 to assess if this method could be applied to other long-read technologies. We implemented several optimizations that make this method significantly more efficient than alternative mitochondrial genome sequencing methods. RESULTS: With the PacBio sequencing data we recovered at least one of the two fragments for 96% of samples (~ 80-90%) with mean coverage ~ 1,500x. The ONT data recovered less than 50% of input fragments likely due to low throughput and the design of the Barcoded Universal Primers which were optimized for PacBio sequencing. We compared a single mitochondrial gene alignment to half and full mitochondrial genomes and found, as expected, increased tree support with longer alignments, though whole mitochondrial genomes were not significantly better than half mitochondrial genomes. CONCLUSIONS: This method can effectively capture thousands of long amplicons in a single run and be used to build more robust phylogenies quickly and effectively. We provide several recommendations for future users depending on the evolutionary scale of their system. A natural extension of this method is to collect multi-locus datasets consisting of mitochondrial genomes and several long nuclear loci at once.
Subject(s)
Genome, Mitochondrial , Nanopore Sequencing , Nanopores , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Biodiversity
ABSTRACT
Nature has evolved a wealth of sex determination (SD) mechanisms, driven by both genetic and environmental factors. Recent studies of SD in fishes have shown that not all taxa fit the classic paradigm of sex chromosome evolution and diverse SD methods can be found even among closely related species. Here, we apply a suite of genomic approaches to investigate sex-biased genomic variation in eight species of Sebastes rockfish found in the northeast Pacific Ocean. Using recently assembled chromosome-level rockfish genomes, we leverage published sequence data to identify disparate sex chromosomes and sex-biased loci in five species. We identify two putative male sex chromosomes in S. diaconus, a single putative sex chromosome in the sibling species S. carnatus and S. chrysomelas, and an unplaced sex determining contig in the sibling species S. miniatus and S. crocotulus. Our study provides evidence for disparate means of sex determination within a recently diverged set of species and sheds light on the diverse origins of sex determination mechanisms present in the animal kingdom.
Subject(s)
Bass , Perciformes , Animals , Male , Perciformes/genetics , Sex Chromosomes/genetics , Y Chromosome , Genomics/methods , Bass/genetics , Evolution, Molecular
ABSTRACT
Genetic differences that specify unique aspects of human evolution have typically been identified by comparative analyses between the genomes of humans and closely related primates, including more recently the genomes of archaic hominins. Not all regions of the genome, however, are equally amenable to such study. Recurrent copy number variation (CNV) at chromosome 16p11.2 accounts for approximately 1% of cases of autism and is mediated by a complex set of segmental duplications, many of which arose recently during human evolution. Here we reconstruct the evolutionary history of the locus and identify bolA family member 2 (BOLA2) as a gene duplicated exclusively in Homo sapiens. We estimate that a 95-kilobase-pair segment containing BOLA2 duplicated across the critical region approximately 282 thousand years ago (ka), one of the latest among a series of genomic changes that dramatically restructured the locus during hominid evolution. All humans examined carried one or more copies of the duplication, which nearly fixed early in the human lineage--a pattern unlikely to have arisen so rapidly in the absence of selection (P < 0.0097). We show that the duplication of BOLA2 led to a novel, human-specific in-frame fusion transcript and that BOLA2 copy number correlates with both RNA expression (r = 0.36) and protein level (r = 0.65), with the greatest expression difference between human and chimpanzee in experimentally derived stem cells. Analyses of 152 patients carrying a chromosome 16p11. rearrangement show that more than 96% of breakpoints occur within the H. sapiens-specific duplication. In summary, the duplicative transposition of BOLA2 at the root of the H. sapiens lineage about 282 ka simultaneously increased copy number of a gene associated with iron homeostasis and predisposed our species to recurrent rearrangements associated with disease.
Subject(s)
Chromosomes, Human, Pair 16/genetics , DNA Copy Number Variations/genetics , Evolution, Molecular , Genetic Predisposition to Disease , Proteins/genetics , Animals , Autistic Disorder/genetics , Chromosome Breakage , Gene Duplication , Homeostasis/genetics , Humans , Iron/metabolism , Pan troglodytes/genetics , Pongo/genetics , Proteins/analysis , Recombination, Genetic , Species Specificity , Time Factors
ABSTRACT
A progressive loss of protein homeostasis is characteristic of aging and a driver of neurodegeneration. To investigate this process quantitatively, we characterized proteome dynamics during brain aging in the short-lived vertebrate Nothobranchius furzeri combining transcriptomics and proteomics. We detected a progressive reduction in the correlation between protein and mRNA, mainly due to post-transcriptional mechanisms that account for over 40% of the age-regulated proteins. These changes cause a progressive loss of stoichiometry in several protein complexes, including ribosomes, which show impaired assembly/disassembly and are enriched in protein aggregates in old brains. Mechanistically, we show that reduction of proteasome activity is an early event during brain aging and is sufficient to induce proteomic signatures of aging and loss of stoichiometry in vivo. Using longitudinal transcriptomic data, we show that the magnitude of early life decline in proteasome levels is a major risk factor for mortality. Our work defines causative events in the aging process that can be targeted to prevent loss of protein homeostasis and delay the onset of age-related neurodegeneration.
Subject(s)
Aging/metabolism , Brain/metabolism , Proteasome Endopeptidase Complex/metabolism , Protein Aggregates , Ribosomes/metabolism , Aging/genetics , Animals , Biophysical Phenomena , Cyprinodontiformes/genetics , Mice, Inbred C57BL , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reproducibility of Results , Ribosomal Proteins/genetics , Ribosomal Proteins/metabolism , Risk Factors , Transcriptome/genetics
ABSTRACT
Elephant seals experience natural periods of prolonged food deprivation while breeding, molting, and undergoing postnatal development. Prolonged food deprivation in elephant seals increases circulating glucocorticoids without inducing muscle atrophy, but the cellular mechanisms that allow elephant seals to cope with such conditions remain elusive. We generated a cellular model and conducted transcriptomic, metabolic, and morphological analyses to study how seal cells adapt to sustained glucocorticoid exposure. Seal muscle progenitor cells differentiate into contractile myotubes with a distinctive morphology, gene expression profile, and metabolic phenotype. Exposure to dexamethasone at three ascending concentrations for 48 h modulated the expression of six clusters of genes related to structural constituents of muscle and pathways associated with energy metabolism and cell survival. Knockdown of the glucocorticoid receptor (GR) and downstream expression analyses corroborated that GR mediates the observed effects. Dexamethasone also decreased cellular respiration, shifted the metabolic phenotype toward glycolysis, and induced mitochondrial fission and dissociation of mitochondria-endoplasmic reticulum (ER) interactions without decreasing cell viability. Knockdown of DNA damage-inducible transcript 4 (DDIT4), a GR target involved in the dissociation of mitochondria-ER membranes, recovered respiration and modulated antioxidant gene expression in myotubes treated with dexamethasone. These results show that adaptation to sustained glucocorticoid exposure in elephant seal myotubes involves a metabolic shift toward glycolysis, which is supported by alterations in mitochondrial morphology and a reduction in mitochondria-ER interactions, resulting in decreased respiration without compromising cell survival.
Subject(s)
Energy Metabolism/physiology , Glucocorticoids/metabolism , Muscle, Skeletal/metabolism , Muscular Atrophy/metabolism , Adaptation, Physiological , Animals , Antioxidants/metabolism , Fasting/metabolism , Food Deprivation/physiology , Phenotype , Receptors, Glucocorticoid/genetics , Seals, Earless/metabolism , Transcriptome/physiology
ABSTRACT
The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome--78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.
Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Genomics , Sequence Analysis, DNA/methods , Chromosome Inversion/genetics , Chromosomes, Human, Pair 10/genetics , Cloning, Molecular , GC Rich Sequence/genetics , Haploidy , Humans , Mutagenesis, Insertional/genetics , Reference Standards , Tandem Repeat Sequences/genetics
ABSTRACT
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.
Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Physical Chromosome Mapping , Amino Acid Sequence , Genetic Predisposition to Disease , Genetics, Medical , Genetics, Population , Genome-Wide Association Study , Genomics , Genotype , Haplotypes/genetics , Homozygote , Humans , Molecular Sequence Data , Mutation Rate , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, DNA , Sequence Deletion/genetics
ABSTRACT
We present a high-quality genome sequence of a Neanderthal woman from Siberia. We show that her parents were related at the level of half-siblings and that mating among close relatives was common among her recent ancestors. We also sequenced the genome of a Neanderthal from the Caucasus to low coverage. An analysis of the relationships and population history of available archaic genomes and 25 present-day human genomes shows that several gene flow events occurred among Neanderthals, Denisovans and early modern humans, possibly including gene flow into Denisovans from an unknown archaic group. Thus, interbreeding, albeit of low magnitude, occurred among many hominin groups in the Late Pleistocene. In addition, the high-quality Neanderthal genome allows us to establish a definitive list of substitutions that became fixed in modern humans after their separation from the ancestors of Neanderthals and Denisovans.
Subject(s)
Fossils , Genome/genetics , Neanderthals/genetics , Africa , Animals , Caves , DNA Copy Number Variations/genetics , Female , Gene Flow/genetics , Gene Frequency , Heterozygote , Humans , Inbreeding , Models, Genetic , Neanderthals/classification , Phylogeny , Population Density , Siberia/ethnology , Toe Phalanges/anatomy & histology
ABSTRACT
We sequenced the genomes of a ~7,000-year-old farmer from Germany and eight ~8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analysed these and other ancient genomes with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians, who contributed to both Europeans and Near Easterners; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer related ancestry. We model these populations' deep relationships and show that early European farmers had ~44% ancestry from a 'basal Eurasian' population that split before the diversification of other non-African lineages.
Subject(s)
Genome, Human/genetics , White People/classification , White People/genetics , Agriculture/history , Asia/ethnology , Europe , History, Ancient , Humans , Population Dynamics , Principal Component Analysis , Workforce
ABSTRACT
Most great ape genetic variation remains uncharacterized; however, its study is critical for understanding population history, recombination, selection and susceptibility to disease. Here we sequence to high coverage a total of 79 wild- and captive-born individuals representing all six great ape species and seven subspecies and report 88.8 million single nucleotide polymorphisms. Our analysis provides support for genetically distinct populations within each species, signals of gene flow, and the split of common chimpanzees into two distinct groups: Nigeria-Cameroon/western and central/eastern populations. We find extensive inbreeding in almost all wild populations, with eastern gorillas being the most extreme. Inferred effective population sizes have varied radically over time in different lineages and this appears to have a profound effect on the genetic diversity at, or close to, genes in almost all species. We discover and assign 1,982 loss-of-function variants throughout the human and great ape lineages, determining that the rate of gene loss has not been different in the human branch compared to other internal branches in the great ape phylogeny. This comprehensive catalogue of great ape genome diversity provides a framework for understanding evolution and a resource for more effective management of wild and captive great ape populations.