ABSTRACT
Establishing the genetic and geographic structure of populations is fundamental, both to understand their evolutionary past and preserve their future. Nevertheless, the patterns of genetic population structure are unknown for most endangered species. This is the case for bonobos (Pan paniscus), which, together with chimpanzees (Pan troglodytes), are humans' closest living relatives. Chimpanzees live across equatorial Africa and are classified into four subspecies,1 with some genetic population substructure even within subspecies. Conversely, bonobos live exclusively in the Democratic Republic of Congo and are considered a homogeneous group with low genetic diversity,2 despite some population structure inferred from mtDNA. Nevertheless, mtDNA aside, their genetic structure remains unknown, hampering our understanding of the species and conservation efforts. Mapping bonobo genetic diversity in space is, however, challenging because, being endangered, only non-invasive sampling is possible for wild individuals. Here, we jointly analyze the exomes and mtDNA from 20 wild-born bonobos, the whole genomes of 10 captive bonobos, and the mtDNA of 136 wild individuals. We identify three genetically distinct bonobo groups of inferred Central, Western, and Far-Western geographic origin within the bonobo range. We estimate the split time between the central and western populations to be ~145,000 years ago and genetic differentiation to be in the order of that of the closest chimpanzee subspecies. Furthermore, our estimated long-term Ne for Far-West (~3,000) is among the lowest estimated for any great ape lineage. Our results highlight the need to attend to the bonobo substructure, both in terms of research and conservation.
ABSTRACT
We used pathogen genomics to test orangutan specimens from a museum in Bonn, Germany, to identify the origin of the animals and the circumstances of their death. We found monkeypox virus genomes in the samples and determined that they represent cases from a 1965 outbreak at Rotterdam Zoo in Rotterdam, the Netherlands.
Subject(s)
Monkeypox virus, Museums, Animals, Genomics, Disease Outbreaks, Germany/epidemiology
ABSTRACT
In population genetics, the emergence of large-scale genomic data for various species and populations has provided new opportunities to understand the evolutionary forces that drive genetic diversity using statistical inference. However, the era of population genomics presents new challenges in analysing the massive amounts of genomes and variants. Deep learning has demonstrated state-of-the-art performance for numerous applications involving large-scale data. Recently, deep learning approaches have gained popularity in population genetics; facilitated by the advent of massive genomic data sets, powerful computational hardware and complex deep learning architectures, they have been used to identify population structure, infer demographic history and investigate natural selection. Here, we introduce common deep learning architectures and provide comprehensive guidelines for implementing deep learning models for population genetic inference. We also discuss current challenges and future directions for applying deep learning in population genetics, focusing on efficiency, robustness and interpretability.
Subject(s)
Deep Learning, Genomics, Population Genetics, Genome, Biological Evolution
ABSTRACT
Noncoding DNA is central to our understanding of human gene regulation and complex diseases1,2, and measuring the evolutionary sequence constraint can establish the functional relevance of putative regulatory elements in the human genome3-9. Identifying the genomic elements that have become constrained specifically in primates has been hampered by the faster evolution of noncoding DNA compared to protein-coding DNA10, the relatively short timescales separating primate species11, and the previously limited availability of whole-genome sequences12. Here we construct a whole-genome alignment of 239 species, representing nearly half of all extant species in the primate order. Using this resource, we identified human regulatory elements that are under selective constraint across primates and other mammals at a 5% false discovery rate. We detected 111,318 DNase I hypersensitivity sites and 267,410 transcription factor binding sites that are constrained specifically in primates but not across other placental mammals and validate their cis-regulatory effects on gene expression. These regulatory elements are enriched for human genetic variants that affect gene expression and complex traits and diseases. Our results highlight the important role of recent evolution in regulatory sequence elements differentiating primates, including humans, from other placental mammals.
Subject(s)
Conserved Sequence, Molecular Evolution, Genome, Primates, Animals, Female, Humans, Pregnancy, Conserved Sequence/genetics, Deoxyribonuclease I/metabolism, DNA/genetics, DNA/metabolism, Genome/genetics, Mammals/classification, Mammals/genetics, Placenta, Primates/classification, Primates/genetics, Nucleic Acid Regulatory Sequences/genetics, Reproducibility of Results, Transcription Factors/metabolism, Proteins/genetics, Gene Expression Regulation/genetics
ABSTRACT
Archaic admixture has had a substantial impact on human evolution with multiple events across different clades, including from extinct hominins such as Neanderthals and Denisovans into modern humans. In great apes, archaic admixture has been identified in chimpanzees and bonobos but the possibility of such events has not been explored in other species. Here, we address this question using high-coverage whole-genome sequences from all four extant gorilla subspecies, including six newly sequenced eastern gorillas from previously unsampled geographic regions. Using approximate Bayesian computation with neural networks to model the demographic history of gorillas, we find a signature of admixture from an archaic 'ghost' lineage into the common ancestor of eastern gorillas but not western gorillas. We infer that up to 3% of the genome of these individuals is introgressed from an archaic lineage that diverged more than 3 million years ago from the common ancestor of all extant gorillas. This introgression event took place before the split of mountain and eastern lowland gorillas, probably more than 40 thousand years ago and may have influenced perception of bitter taste in eastern gorillas. When comparing the introgression landscapes of gorillas, humans and bonobos, we find a consistent depletion of introgressed fragments on the X chromosome across these species. However, depletion in protein-coding content is not detectable in eastern gorillas, possibly as a consequence of stronger genetic drift in this species.
Subject(s)
Hominidae, Neanderthals, Animals, Humans, Gorilla gorilla/genetics, Pan paniscus/genetics, Bayes Theorem, Hominidae/genetics, Pan troglodytes, Neanderthals/genetics
ABSTRACT
Baboons (genus Papio) are a morphologically and behaviorally diverse clade of catarrhine monkeys that have experienced hybridization between phenotypically and genetically distinct phylogenetic species. We used high-coverage whole-genome sequences from 225 wild baboons representing 19 geographic localities to investigate population genomics and interspecies gene flow. Our analyses provide an expanded picture of evolutionary reticulation among species and reveal patterns of population structure within and among species, including differential admixture among conspecific populations. We describe the first example of a baboon population with a genetic composition that is derived from three distinct lineages. The results reveal processes, both ancient and recent, that produced the observed mismatch between phylogenetic relationships based on matrilineal, patrilineal, and biparental inheritance. We also identified several candidate genes that may contribute to species-specific phenotypes.
Subject(s)
Biological Evolution, Gene Flow, Papio, Animals, Male, Papio/anatomy & histology, Papio/genetics, Phenotype, Phylogeny, Species Specificity, Sex Characteristics
ABSTRACT
The rich diversity of morphology and behavior displayed across primate species provides an informative context in which to study the impact of genomic diversity on fundamental biological processes. Analysis of that diversity provides insight into long-standing questions in evolutionary and conservation biology and is urgent given severe threats these species are facing. Here, we present high-coverage whole-genome data from 233 primate species representing 86% of genera and all 16 families. This dataset was used, together with fossil calibration, to create a nuclear DNA phylogeny and to reassess evolutionary divergence times among primate clades. We found within-species genetic diversity across families and geographic regions to be associated with climate and sociality, but not with extinction risk. Furthermore, mutation rates differ across species, potentially influenced by effective population sizes. Lastly, we identified extensive recurrence of missense mutations previously thought to be human specific. This study will open a wide range of research avenues for future primate genomic research.
Subject(s)
Biological Evolution, Genetic Variation, Primates, Animals, Humans, Genome, Mutation Rate, Phylogeny, Primates/genetics, Population Density
ABSTRACT
Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole-genome sequencing data for 809 individuals from 233 primate species and identified 4.3 million common protein-altering variants with orthologs in humans. We show that these variants can be inferred to have nondeleterious effects in humans based on their presence at high allele frequencies in other primate populations. We use this resource to classify 6% of all possible human protein-altering variants as likely benign and impute the pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy for diagnosing pathogenic variants in patients with genetic diseases.
Subject(s)
Genetic Variation, Primates, Animals, Humans, Base Sequence, Gene Frequency, Primates/genetics, Whole Genome Sequencing
ABSTRACT
Baboons (genus Papio) are a morphologically and behaviorally diverse clade of catarrhine monkeys that have experienced hybridization between phenotypically and genetically distinct phylogenetic species. We used high coverage whole genome sequences from 225 wild baboons representing 19 geographic localities to investigate population genomics and inter-species gene flow. Our analyses provide an expanded picture of evolutionary reticulation among species and reveal novel patterns of population structure within and among species, including differential admixture among conspecific populations. We describe the first example of a baboon population with a genetic composition that is derived from three distinct lineages. The results reveal processes, both ancient and recent, that produced the observed mismatch between phylogenetic relationships based on matrilineal, patrilineal, and biparental inheritance. We also identified several candidate genes that may contribute to species-specific phenotypes. One-Sentence Summary: Genomic data for 225 baboons reveal novel sites of inter-species gene flow and local effects due to differences in admixture.
ABSTRACT
Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole genome sequencing data for 809 individuals from 233 primate species, and identified 4.3 million common protein-altering variants with orthologs in human. We show that these variants can be inferred to have non-deleterious effects in human based on their presence at high allele frequencies in other primate populations. We use this resource to classify 6% of all possible human protein-altering variants as likely benign and impute the pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy for diagnosing pathogenic variants in patients with genetic diseases. One Sentence Summary: Deep learning classifier trained on 4.3 million common primate missense variants predicts variant pathogenicity in humans.
ABSTRACT
Sea turtles represent an ancient lineage of marine vertebrates that evolved from terrestrial ancestors over 100 Mya. The genomic basis of the unique physiological and ecological traits enabling these species to thrive in diverse marine habitats remains largely unknown. Additionally, many populations have drastically declined due to anthropogenic activities over the past two centuries, and their recovery is a high global conservation priority. We generated and analyzed high-quality reference genomes for the leatherback (Dermochelys coriacea) and green (Chelonia mydas) turtles, representing the two extant sea turtle families. These genomes are highly syntenic and homologous, but localized regions of noncollinearity were associated with higher copy numbers of immune, zinc-finger, and olfactory receptor (OR) genes in green turtles, with ORs related to waterborne odorants greatly expanded in green turtles. Our findings suggest that divergent evolution of these key gene families may underlie immunological and sensory adaptations assisting navigation, occupancy of neritic versus pelagic environments, and diet specialization. Reduced collinearity was especially prevalent in microchromosomes, with greater gene content, heterozygosity, and genetic distances between species, supporting their critical role in vertebrate evolutionary adaptation. Finally, diversity and demographic histories starkly contrasted between species, indicating that leatherback turtles have had a low yet stable effective population size, exhibit extremely low diversity compared with other reptiles, and harbor a higher genetic load compared with green turtles, reinforcing concern over their persistence under future climate scenarios. These genomes provide invaluable resources for advancing our understanding of evolution and conservation best practices in an imperiled vertebrate lineage.
Subject(s)
Turtles, Animals, Ecosystem, Population Dynamics
ABSTRACT
S* is a widely used statistic for detecting archaic admixture from population genetic data. Previous studies used freezing-archer to apply S*, which is only directly applicable to the specific case of Neanderthal and Denisovan introgression in Papuans. Here, we implemented sstar for a more general purpose. In simulation-based comparisons with several tools for detecting introgressed fragments, including SPrime, SkovHMM, and ArchaicSeeker2.0, our results suggest that sstar is robust to differences in demographic models, including ghost introgression and two-source introgression. We believe sstar will be a useful tool for detecting introgressed fragments in various scenarios and in non-human species.
Subject(s)
Human Genome, Neanderthals, Humans, Animals, Neanderthals/genetics, Population Genetics
ABSTRACT
Large-scale estimations of the time of emergence of variants are essential to examine hypotheses concerning human evolution with precision. Using an open repository of genetic variant age estimations, we offer here a temporal evaluation of various evolutionarily relevant datasets, such as Homo sapiens-specific variants, high-frequency variants found in genetic windows under positive selection, introgressed variants from extinct human species, as well as putative regulatory variants specific to various brain regions. We find a recurrent bimodal distribution of high-frequency variants, but also evidence for specific enrichments of gene categories in distinct time windows, pointing to different periods of phenotypic changes, resulting in a mosaic. With a temporal classification of genetic mutations in hand, we then applied a machine learning tool to predict what genes have changed more in certain time windows, and which tissues these genes may have impacted more. Overall, we provide a fine-grained temporal mapping of derived variants in Homo sapiens that helps to illuminate the intricate evolutionary history of our species.
Subject(s)
Biological Evolution, Brain, Humans, Mutation
ABSTRACT
Knowledge on the population history of endangered species is critical for conservation, but whole-genome data on chimpanzees (Pan troglodytes) is geographically sparse. Here, we produced the first non-invasive geolocalized catalog of genomic diversity by capturing chromosome 21 from 828 non-invasive samples collected at 48 sampling sites across Africa. The four recognized subspecies show clear genetic differentiation correlating with known barriers, while previously undescribed genetic exchange suggests that these have been permeable on a local scale. We obtained a detailed reconstruction of population stratification and fine-scale patterns of isolation, migration, and connectivity, including a comprehensive picture of admixture with bonobos (Pan paniscus). Unlike humans, chimpanzees did not experience extended episodes of long-distance migrations, which might have limited cultural transmission. Finally, based on local rare variation, we implement a fine-grained geolocalization approach demonstrating improved precision in determining the origin of confiscated chimpanzees.
ABSTRACT
BACKGROUND: Numerous Ebola virus outbreaks have occurred in Equatorial Africa over the past decades. Besides human fatalities, gorillas and chimpanzees have also succumbed to the fatal virus. The 2004 outbreak at the Odzala-Kokoua National Park (Republic of Congo) alone caused a severe decline in the resident western lowland gorilla (Gorilla gorilla gorilla) population, with a 95% mortality rate. Here, we explore the immediate genetic impact of the Ebola outbreak in the western lowland gorilla population. RESULTS: Associations with survivorship were evaluated by utilizing DNA obtained from fecal samples from 16 gorilla individuals declared missing after the outbreak (non-survivors) and 15 individuals observed before and after the epidemic (survivors). We used a target enrichment approach to capture the sequences of 123 genes previously associated with immunology and Ebola virus resistance and additionally analyzed the gut microbiome, which could influence survival after an infection. Our results indicate no changes in the population genetic diversity before and after the Ebola outbreak, and no significant differences in microbial community composition between survivors and non-survivors. However, and despite the low power for an association analysis, we do detect six nominally significant missense mutations in four genes that might be candidate variants associated with an increased chance of survival. CONCLUSION: This study offers the first insight into the genetics of a wild great ape population before and after an Ebola outbreak using target capture experiments from fecal samples, and presents a list of candidate loci that may have facilitated their survival.
Subject(s)
Gastrointestinal Microbiome, Ebola Hemorrhagic Fever, Animals, Disease Outbreaks, Gorilla gorilla/genetics, Ebola Hemorrhagic Fever/epidemiology, Ebola Hemorrhagic Fever/veterinary, Humans, Pan troglodytes
ABSTRACT
Changes in the epigenetic regulation of gene expression have a central role in evolution. Here, we extensively profiled a panel of human, chimpanzee, gorilla, orangutan, and macaque lymphoblastoid cell lines (LCLs), using ChIP-seq for five histone marks, ATAC-seq and RNA-seq, further complemented with whole genome sequencing (WGS) and whole genome bisulfite sequencing (WGBS). We annotated regulatory elements (RE) and integrated chromatin contact maps to define gene regulatory architectures, creating the largest catalog of RE in primates to date. We report that epigenetic conservation and its correlation with sequence conservation in primates depends on the activity state of the regulatory element. Our gene regulatory architectures reveal the coordination of different types of components and highlight the role of promoters and intragenic enhancers (gE) in the regulation of gene expression. We observe that most regulatory changes occur in weakly active gE. Remarkably, novel human-specific gE with weak activities are enriched in human-specific nucleotide changes. These elements appear in genes with signals of positive selection and human acceleration, tissue-specific expression, and particular functional enrichments, suggesting that the regulatory evolution of these genes may have contributed to human adaptation.
Subject(s)
Genetic Enhancer Elements/genetics, Genetic Epigenesis/genetics, Epigenomics/methods, Lymphocytes/metabolism, Nucleic Acid Regulatory Sequences/genetics, Animals, Cell Line, Chromatin Immunoprecipitation Sequencing/methods, Molecular Evolution, Gene Expression Regulation, Humans, Lymphocytes/cytology, Primates, RNA-Seq/methods
ABSTRACT
Modern human contamination is a common problem in ancient DNA studies. We provide evidence that this issue is also present in studies in great apes, which are our closest living relatives, for example in noninvasive samples. Here, we present a simple method to detect human contamination in short-read sequencing data from different species: HuConTest. We demonstrate its feasibility using blood and tissue samples from these species. This test is particularly useful for more complex samples (such as museum and noninvasive samples) which have smaller amounts of endogenous DNA, as we show here.
Subject(s)
DNA Contamination, Hominidae/genetics, Animals, Humans
ABSTRACT
Noninvasive samples as a source of DNA are gaining interest in genomic studies of endangered species. However, their complex nature and low endogenous DNA content hamper the recovery of good quality data. Target capture has become a productive method to enrich the endogenous fraction of noninvasive samples, such as faeces, but its sensitivity has not yet been extensively studied. Coping with faecal samples with an endogenous DNA content below 1% is a common problem when prior selection of samples from a large collection is not possible. However, samples classified as unfavourable for target capture sequencing might be the only representatives of unique specific geographical locations, or to answer the question of interest. To explore how library complexity may be increased without repeating DNA extractions and generating new libraries, in this study we captured the exome of 60 chimpanzees (Pan troglodytes) using faecal samples with very low proportions of endogenous content (<1%). Our results indicate that by performing additional hybridizations of the same libraries, the molecular complexity can be maintained to achieve higher coverage. Also, whenever possible, the starting DNA material for capture should be increased. Finally, we specifically calculated the sequencing effort needed to avoid exhausting the library complexity of enriched faecal samples with low endogenous DNA content. This study provides guidelines, schemes and tools for laboratories facing the challenges of working with noninvasive samples containing extremely low amounts of endogenous DNA.
Subject(s)
Exome, Genomics, Nucleic Acid Hybridization, Animals, Feces, Gene Library, High-Throughput Nucleotide Sequencing, Pan troglodytes/genetics, DNA Sequence Analysis
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.