ABSTRACT
A composite likelihood method is introduced for jointly estimating the intensity of selection and the rate of mutation, both scaled by the effective population size, when there is balancing selection at a single multi-allelic locus in an isolated population at demographic equilibrium. The performance of the method is tested using simulated data. Average estimated mutation rates and selection intensities are close to the true values but there is considerable variation about the averages. Allowing for both population growth and population subdivision does not result in qualitative differences but the estimated mutation rates and selection intensities do not in general reflect the current effective population size. The method is applied to 3 class I (HLA-A, HLA-B, and HLA-C) and 2 class II loci (HLA-DRB1 and HLA-DQA1) in the 1000 Genomes populations. Allowing for asymmetric balancing selection has only a slight effect on the results from the symmetric model. Mutations that restore symmetry of the selection model are preferentially retained because of the tendency of natural selection to maximize average fitness. However, slight differences in selective effects result in much longer persistence time of some alleles. Trans-species polymorphism, which is characteristic of major-histocompatibility loci in vertebrates, is more likely when there are small differences in allelic fitness than when complete symmetry is assumed. Therefore, variation in allelic fitness expands the range of parameter values consistent with observations of trans-species polymorphism.
Subjects
Genetic Variation , Mutation Rate , Alleles , Animals , Gene Frequency , Haplotypes , Selection, Genetic
ABSTRACT
We present a method called the G(A|B) method for estimating coalescence probabilities within population lineages from genome sequences when one individual is sampled from each population. Population divergence times can be estimated from these coalescence probabilities if additional assumptions about the history of population sizes are made. Our method is based on a method presented by Rasmussen et al. (2014) to test whether an archaic genome is from a population directly ancestral to a present-day population. The G(A|B) method does not require distinguishing ancestral from derived alleles or assumptions about demographic history before population divergence. We discuss the relationship of our method to two similar methods, one introduced by Green et al. (2010) and called the F(A|B) method and the other introduced by Schlebusch et al. (2017) and called the TT method. When our method is applied to individuals from three or more populations, it provides a test of whether the population history is treelike because coalescence probabilities are additive on a tree. We illustrate the use of our method by applying it to three high-coverage archaic genomes, two Neanderthals (Vindija and Altai) and a Denisovan.
Subjects
Neanderthals , Alleles , Animals , Humans , Neanderthals/genetics , Population Density , Probability
ABSTRACT
The mosquito Anopheles gambiae s.s. is distributed across most of sub-Saharan Africa and is of major scientific and public health interest for being an African malaria vector. Here we present population genomic analyses of 111 specimens sampled from west to east Africa, including the first whole genome sequences from oceanic islands, the Comoros. Genetic distances between populations of A. gambiae are discordant with geographic distances but are consistent with a stepwise migration scenario in which the species increases its range from west to east Africa through consecutive founder events over the last ~200,000 years. Geological barriers like the Congo River basin and the East African rift seem to play an important role in shaping this process. Moreover, we find a high degree of genetic isolation of populations on the Comoros, confirming the potential of these islands as candidate sites for potential field trials of genetically engineered mosquitoes for malaria control.
Subjects
Anopheles/genetics , Founder Effect , Genetics, Population , Mosquito Vectors/genetics , Africa, Eastern , Africa, Western , Animals , Geography , Malaria/epidemiology , Malaria/parasitology , Malaria/transmission , Population Density , Population Dynamics
ABSTRACT
BACKGROUND: In the summer of 2013, Aedes aegypti Linnaeus was first detected in three cities in central California (Clovis, Madera and Menlo Park). It has now been detected in multiple locations in central and southern CA as far south as San Diego and Imperial Counties. A number of published reports suggest that CA populations have been established from multiple independent introductions. RESULTS: Here we report the first population genomics analyses of Ae. aegypti based on individual, field-collected whole genome sequences. We analyzed 46 Ae. aegypti genomes to establish genetic relationships among populations from sites in California, Florida and South Africa. Based on 4.65 million high-quality biallelic SNPs, we identified 3 major genetic clusters within California: one that includes all sample sites in the southern part of the state (south of the Tehachapi mountain range) plus the town of Exeter in central California, and two additional clusters in central California. CONCLUSIONS: A lack of concordance between mitochondrial and nuclear genealogies suggests that the three founding populations were polymorphic for two main mitochondrial haplotypes prior to being introduced to California. One of these has been lost in the Clovis populations, possibly by a founder effect. Genome-wide comparisons indicate extensive differentiation between genetic clusters. Our observations support recent introductions of Ae. aegypti into California from multiple, genetically diverged source populations. Our data reveal signs of hybridization among diverged populations within CA. Genetic markers identified in this study will be of great value in pursuing classical population genetic studies which require larger sample sizes.
Subjects
Aedes/classification , Genome, Insect , Whole Genome Sequencing/veterinary , Aedes/genetics , Animals , California , Evolution, Molecular , Genetic Variation , Genetics, Population , Genome Size , Introduced Species , Metagenomics , Mosquito Vectors/classification , Mosquito Vectors/genetics , Phylogeny , Phylogeography
ABSTRACT
The increasing abundance of DNA sequences obtained from fossils calls for new population genetics theory that takes account of both the temporal and spatial separation of samples. Here, we exploit the relationship between Wright's FST and average coalescence times to develop an analytic theory describing how FST depends on both the distance and time separating pairs of sampled genomes. We apply this theory to several simple models of population history. If there is a time series of samples, partial population replacement creates a discontinuity in pairwise FST values. The magnitude of the discontinuity depends on the extent of replacement. In stepping-stone models, pairwise FST values between archaic and present-day samples reflect both the spatial and temporal separation. At long distances, an isolation by distance pattern dominates. At short distances, the time separation dominates. Analytic predictions fit patterns generated by simulations. We illustrate our results with applications to archaic samples from European human populations. We compare present-day samples with a pair of archaic samples taken before and after a replacement event.
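The link between FST and average coalescence times that this abstract builds on can be illustrated with a minimal sketch. It assumes the standard low-mutation approximation for a pair of samples, FST ≈ (t_between − t_within) / (t_between + t_within). The function name and the example values are hypothetical, and the paper's own expressions for samples separated in both space and time may differ.

```python
def pairwise_fst(t_within: float, t_between: float) -> float:
    """Approximate pairwise FST from mean coalescence times.

    t_within  : mean coalescence time of two lineages from the same sample group
    t_between : mean coalescence time of two lineages from different groups
    Both times must be in the same units (e.g. generations).
    This is the textbook low-mutation approximation, used here as an assumption.
    """
    total = t_within + t_between
    if t_within < 0 or t_between < 0 or total <= 0:
        raise ValueError("coalescence times must be non-negative and not both zero")
    return (t_between - t_within) / total


# Hypothetical illustration: extra time separating an archaic and a present-day
# sample lengthens the between-group coalescence time and so raises FST.
print(pairwise_fst(t_within=1.0, t_between=1.5))
print(pairwise_fst(t_within=1.0, t_between=3.0))
```

When the within-group and between-group times are equal the function returns zero, which corresponds to no detectable structure between the two samples.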
Subjects
DNA, Ancient/analysis , Genetics, Population/history , Genome , Fossils/history , History, Ancient , Models, Genetic
ABSTRACT
Although many large mammal species went extinct at the end of the Pleistocene epoch, their DNA may persist due to past episodes of interspecies admixture. However, direct empirical evidence of the persistence of ancient alleles remains scarce. Here, we present multifold coverage genomic data from four Late Pleistocene cave bears (Ursus spelaeus complex) and show that cave bears hybridized with brown bears (Ursus arctos) during the Pleistocene. We develop an approach to assess both the directionality and relative timing of gene flow. We find that segments of cave bear DNA still persist in the genomes of living brown bears, with cave bears contributing 0.9 to 2.4% of the genomes of all brown bears investigated. Our results show that even though extinction is typically considered as absolute, following admixture, fragments of the gene pool of extinct species can survive for tens of thousands of years in the genomes of extant recipient species.
Subjects
Extinction, Biological , Gene Flow , Hybridization, Genetic , Ursidae/genetics , Animals , Genomics
ABSTRACT
Although it has previously been shown that Neanderthals contributed DNA to modern humans, not much is known about the genetic diversity of Neanderthals or the relationship between late Neanderthal populations at the time at which their last interactions with early modern humans occurred and before they eventually disappeared. Our ability to retrieve DNA from a larger number of Neanderthal individuals has been limited by poor preservation of endogenous DNA and contamination of Neanderthal skeletal remains by large amounts of microbial and present-day human DNA. Here we use hypochlorite treatment of as little as 9 mg of bone or tooth powder to generate between 1- and 2.7-fold genomic coverage of five Neanderthals who lived around 39,000 to 47,000 years ago (that is, late Neanderthals), thereby doubling the number of Neanderthals for which genome sequences are available. Genetic similarity among late Neanderthals is well predicted by their geographical location, and comparison to the genome of an older Neanderthal from the Caucasus indicates that a population turnover is likely to have occurred, either in the Caucasus or throughout Europe, towards the end of Neanderthal history. We find that the bulk of Neanderthal gene flow into early modern humans originated from one or more source populations that diverged from the Neanderthals that were studied here at least 70,000 years ago, but after they split from a previously sequenced Neanderthal from Siberia around 150,000 years ago. Although four of the Neanderthals studied here post-date the putative arrival of early modern humans into Europe, we do not detect any recent gene flow from early modern humans in their ancestry.
Subjects
Genome/genetics , Neanderthals/classification , Neanderthals/genetics , Phylogeny , Africa/ethnology , Animals , Bone and Bones , DNA, Ancient/analysis , Europe/ethnology , Female , Gene Flow , Genetics, Population , Genomics , Humans , Hypochlorous Acid , Male , Siberia/ethnology , Tooth
ABSTRACT
By at least 45,000 years before present, anatomically modern humans had spread across Eurasia [1-3], but it is not well known how diverse these early populations were and whether they contributed substantially to later people or represent early modern human expansions into Eurasia that left no surviving descendants today. Analyses of genome-wide data from several ancient individuals from Western Eurasia and Siberia have shown that some of these individuals have relationships to present-day Europeans [4, 5] while others did not contribute to present-day Eurasian populations [3, 6]. As contributions from Upper Paleolithic populations in Eastern Eurasia to present-day humans and their relationship to other early Eurasians are not clear, we generated genome-wide data from a 40,000-year-old individual from Tianyuan Cave, China, [1, 7] to study his relationship to ancient and present-day humans. We find that he is more related to present-day and ancient Asians than he is to Europeans, but he shares more alleles with a 35,000-year-old European individual than he shares with other ancient Europeans, indicating that the separation between early Europeans and early Asians was not a single population split. We also find that the Tianyuan individual shares more alleles with some Native American groups in South America than with Native Americans elsewhere, providing further support for population substructure in Asia [8] and suggesting that this persisted from 40,000 years ago until the colonization of the Americas. Our study of the Tianyuan individual highlights the complex migration and subdivision of early human populations in Eurasia.
Subjects
DNA, Ancient/analysis , Genome, Human , Human Migration , Archaeology , Biological Variation, Population , China , Humans , Male , Phylogeny
ABSTRACT
To date, the only Neandertal genome that has been sequenced to high quality is from an individual found in Southern Siberia. We sequenced the genome of a female Neandertal from ~50,000 years ago from Vindija Cave, Croatia, to ~30-fold genomic coverage. She carried 1.6 differences per 10,000 base pairs between the two copies of her genome, fewer than present-day humans, suggesting that Neandertal populations were of small size. Our analyses indicate that she was more closely related to the Neandertals that mixed with the ancestors of present-day humans living outside of sub-Saharan Africa than the previously sequenced Neandertal from Siberia, allowing 10 to 20% more Neandertal DNA to be identified in present-day humans, including variants involved in low-density lipoprotein cholesterol concentrations, schizophrenia, and other diseases.
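The step from low heterozygosity to small population size can be made concrete with a back-of-the-envelope sketch. It assumes the neutral equilibrium relation θ = 4·Ne·μ and an illustrative per-generation mutation rate of 1.25e-8 per base pair. That rate is an assumption made here, not a value from the abstract, and the function name is hypothetical.

```python
def effective_size_from_heterozygosity(differences: float,
                                       sites: float,
                                       mu_per_generation: float) -> float:
    """Rough long-term effective size from per-site heterozygosity.

    Uses theta = 4 * Ne * mu for a diploid population at neutral equilibrium,
    so Ne = heterozygosity / (4 * mu). This ignores demographic history.
    """
    if sites <= 0 or mu_per_generation <= 0:
        raise ValueError("sites and mutation rate must be positive")
    heterozygosity = differences / sites
    return heterozygosity / (4.0 * mu_per_generation)


# 1.6 differences per 10,000 bp is the figure given in the abstract.
# The mutation rate below is an ASSUMED illustrative value.
ASSUMED_MU = 1.25e-8
ne = effective_size_from_heterozygosity(1.6, 10_000, ASSUMED_MU)
print(f"Implied effective size: about {ne:,.0f}")
```

Under these assumptions the implied effective size is on the order of a few thousand individuals. The estimate scales inversely with the assumed mutation rate, so it is only an order-of-magnitude guide.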
Subjects
Biological Evolution , Neanderthals/genetics , Alleles , Animals , Caves , Croatia , DNA, Ancient , Genome , Humans
ABSTRACT
We develop and evaluate methods for inferring relatedness among individuals from low-coverage DNA sequences of their genomes, with particular emphasis on sequences obtained from fossil remains. We suggest the major factors complicating the determination of relatedness among ancient individuals are sequencing depth, the number of overlapping sites, the sequencing error rate and the presence of contamination from present-day genetic sources. We develop a theoretical model that facilitates the exploration of these factors and their relative effects, via measurement of pairwise genetic distances, without calling genotypes, and determine the power to infer relatedness under various scenarios of varying sequencing depth, present-day contamination and sequencing error. The model is validated by a simulation study as well as the analysis of aligned sequences from present-day human genomes. We then apply the method to the recently published genome sequences of ancient Europeans, developing a statistical treatment to determine confidence in assigned relatedness that is, in some cases, more precise than previously reported. As the majority of ancient specimens are from animals, this method would be applicable to investigate kinship in nonhuman remains. The developed software grups (Genetic Relatedness Using Pedigree Simulations) is implemented in Python and freely available.
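Measuring pairwise genetic distance without calling genotypes can be sketched as follows. This is a minimal illustration, assuming each individual is represented as a mapping from site identifier to the list of bases observed in reads at that site. The function name and data layout are hypothetical, and this is not the grups implementation.

```python
from itertools import product


def pairwise_mismatch_rate(reads_a: dict, reads_b: dict) -> float:
    """Mean probability that one read base drawn from each individual differs.

    Only sites covered in both individuals (overlapping sites) contribute.
    At each such site, every read from A is compared with every read from B,
    which is the expectation of sampling one random read per individual.
    """
    shared_sites = reads_a.keys() & reads_b.keys()
    total = 0.0
    used = 0
    for site in shared_sites:
        bases_a, bases_b = reads_a[site], reads_b[site]
        if not bases_a or not bases_b:
            continue  # skip sites with no usable reads
        mismatches = sum(x != y for x, y in product(bases_a, bases_b))
        total += mismatches / (len(bases_a) * len(bases_b))
        used += 1
    if used == 0:
        raise ValueError("no overlapping sites with reads in both individuals")
    return total / used


# Hypothetical low-coverage data: sites 1 and 2 overlap, sites 3 and 4 do not.
ind1 = {1: ["A"], 2: ["C", "T"], 3: ["G"]}
ind2 = {1: ["A"], 2: ["C"], 4: ["T"]}
print(pairwise_mismatch_rate(ind1, ind2))
```

In a relatedness analysis the observed rate would be compared with the rates expected for different relationships. Sequencing error and contamination shift those expectations, which is why the abstract models them explicitly.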
Subjects
Computer Simulation , Genome, Human , Models, Genetic , Pedigree , Genotype , Humans , Polymorphism, Single Nucleotide , Software
ABSTRACT
Here, we develop and test a method to address whether DNA samples sequenced from a group of fossil hominin bone or tooth fragments originate from the same individual or from closely related individuals. Our method assumes low amounts of retrievable DNA, significant levels of sequencing error, and contamination from one or more present-day humans. We develop and implement a maximum likelihood method that estimates levels of contamination, sequencing error rates, and pairwise relatedness coefficients in a set of individuals. We assume that there is no reference panel for the ancient population to provide allele and haplotype frequencies. Our approach makes use of single nucleotide polymorphisms (SNPs) and does not make assumptions about the underlying demographic model. By artificially mating genomes from the 1000 Genomes Project, we determine the numbers of individuals at a given genomic coverage that are required to detect different levels of genetic relatedness with confidence.
Subjects
Fossils , Gene Frequency/genetics , Genome, Human/genetics , Sequence Analysis, DNA/methods , Bone and Bones , Humans , Likelihood Functions , Polymorphism, Single Nucleotide
ABSTRACT
Woolly mammoths (Mammuthus primigenius) populated Siberia, Beringia, and North America during the Pleistocene and early Holocene. Recent breakthroughs in ancient DNA sequencing have allowed for complete genome sequencing for two specimens of woolly mammoths (Palkopoulou et al. 2015). One mammoth specimen is from a mainland population 45,000 years ago when mammoths were plentiful. The second, a 4,300-year-old specimen, is derived from an isolated population on Wrangel Island where mammoths subsisted with small effective population size more than 43-fold lower than previous populations. These extreme differences in effective population size offer a rare opportunity to test nearly neutral models of genome architecture evolution within a single species. Using these previously published mammoth sequences, we identify deletions, retrogenes, and non-functionalizing point mutations. In the Wrangel Island mammoth, we identify a greater number of deletions, a larger proportion of deletions affecting gene sequences, a greater number of candidate retrogenes, and an increased number of premature stop codons. This accumulation of detrimental mutations is consistent with genomic meltdown in response to low effective population sizes in the dwindling mammoth population on Wrangel Island. In addition, we observe high rates of loss of olfactory receptors and urinary proteins, either because these loci are non-essential or because they were favored by divergent selective pressures in island environments. Finally, at the locus of FOXQ1 we observe two independent loss-of-function mutations, which would confer a satin coat phenotype in this island woolly mammoth.
Subjects
Fossils , Genome , Genomics/methods , Mammoths/genetics , Animals , DNA, Ancient/analysis , Evolution, Molecular , Islands , Mutation , Russia , Sequence Analysis, DNA , Time Factors
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pgen.1005972.].
ABSTRACT
In the past few years, the number of autosomal DNA sequences from human fossils has grown explosively and numerous partial or complete sequences are available from our closest relatives, Neanderthals and Denisovans. I review commonly used statistical methods applied to these sequences. These methods fall into three broad classes: methods for estimating levels of contamination, descriptive methods, and methods based on population genetic models. The latter two classes are largely methods developed for the analysis of present-day genomic data. When they are applied to ancient DNA (aDNA), they usually ignore the time dimension. A few methods, particularly those concerned with inferring something about selection or ancestor-descendant relationships, take explicit account of the ages of aDNA samples.
Subjects
DNA, Ancient/analysis , DNA, Mitochondrial/genetics , Genetics, Population/statistics & numerical data , Hominidae/genetics , Animals , Fossils , Genome/genetics , Humans , Neanderthals/genetics
ABSTRACT
We review studies of genomic data obtained by sequencing hominin fossils with particular emphasis on the unique information that ancient DNA (aDNA) can provide about the demographic history of humans and our closest relatives. We concentrate on nuclear genomic sequences that have been published in the past few years. In many cases, particularly in the Arctic, the Americas, and Europe, aDNA has revealed historical demographic patterns in a way that could not be resolved by analyzing present-day genomes alone. Ancient DNA from archaic hominins has revealed a rich history of admixture between early modern humans, Neanderthals, and Denisovans, and has allowed us to disentangle complex selective processes. Information from aDNA studies is nowhere near saturation, and we believe that future aDNA sequences will continue to change our understanding of hominin history.
Subjects
DNA, Ancient , Hominidae/genetics , Animals , DNA Contamination , Fossils , Genome , Humans
ABSTRACT
When sequencing an ancient DNA sample from a hominin fossil, DNA from present-day humans involved in excavation and extraction will be sequenced along with the endogenous material. This type of contamination is problematic for downstream analyses as it will introduce a bias towards the population of the contaminating individual(s). Quantifying the extent of contamination is a crucial step as it allows researchers to account for possible biases that may arise in downstream genetic analyses. Here, we present an MCMC algorithm to co-estimate the contamination rate, sequencing error rate and demographic parameters (including drift times and admixture rates) for an ancient nuclear genome obtained from human remains, when the putative contaminating DNA comes from present-day humans. We assume we have a large panel representing the putative contaminant population (e.g. European, East Asian or African). The method is implemented in a C++ program called 'Demographic Inference with Contamination and Error' (DICE). We applied it to simulations and genome data from ancient Neanderthals and modern humans. With reasonable levels of genome sequence coverage (>3X), we find we can recover accurate estimates of all these parameters, even when the contamination rate is as high as 50%.
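How contamination and sequencing error jointly shape the reads at a site can be sketched with a simple two-allele mixture. This is a simplified illustration under stated assumptions (one biallelic site, a symmetric error rate, and a contaminant allele frequency taken from a panel). The function and parameter names are hypothetical, and this is not the likelihood implemented in DICE.

```python
def prob_derived_read(endogenous_freq: float,
                      contaminant_freq: float,
                      contamination_rate: float,
                      error_rate: float) -> float:
    """Probability that a single read shows the derived allele at one site.

    endogenous_freq    : derived-allele fraction in the ancient individual's
                         genotype (0.0, 0.5 or 1.0 for a diploid)
    contaminant_freq   : derived-allele frequency in the contaminant panel
    contamination_rate : fraction of reads that come from the contaminant
    error_rate         : probability that a read base is flipped (assumed
                         symmetric between the two alleles)
    """
    for value in (endogenous_freq, contaminant_freq,
                  contamination_rate, error_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie in [0, 1]")
    # Mixture of endogenous and contaminant sources before sequencing error.
    p_true = ((1.0 - contamination_rate) * endogenous_freq
              + contamination_rate * contaminant_freq)
    # A derived read arises from an unflipped derived base or a flipped ancestral one.
    return p_true * (1.0 - error_rate) + (1.0 - p_true) * error_rate


# Hypothetical site: the ancient individual is homozygous ancestral, but the
# allele is fixed in the contaminant panel; 10% contamination, 1% error.
print(prob_derived_read(0.0, 1.0, 0.10, 0.01))
```

Read counts at many sites could then be treated as binomial draws with this probability, which is one way a sampler could separate contamination from error. The abstract's method additionally estimates drift times and admixture rates, which this sketch does not attempt.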
Subjects
DNA Contamination , DNA/genetics , Genetic Drift , Neanderthals/genetics , Algorithms , Animals , Base Sequence , Computer Simulation , DNA, Mitochondrial/genetics , Fossils , Genetics, Population , Humans , Markov Chains , Monte Carlo Method , Sequence Analysis, DNA , Software
ABSTRACT
The advent of accessible ancient DNA technology now allows the direct ascertainment of allele frequencies in ancestral populations, thereby enabling the use of allele frequency time series to detect and estimate natural selection. Such direct observations of allele frequency dynamics are expected to be more powerful than inferences made using patterns of linked neutral variation obtained from modern individuals. We developed a Bayesian method to make use of allele frequency time series data and infer the parameters of general diploid selection, along with allele age, in nonequilibrium populations. We introduce a novel path augmentation approach, in which we use Markov chain Monte Carlo to integrate over the space of allele frequency trajectories consistent with the observed data. Using simulations, we show that this approach has good power to estimate selection coefficients and allele age. Moreover, when applying our approach to data on horse coat color, we find that ignoring a relevant demographic history can significantly bias the results of inference. Our approach is made available in a C++ software package.
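The parameters of general diploid selection can be illustrated with the deterministic expected change in allele frequency per generation. This sketch assumes genotype fitnesses 1 + s, 1 + h·s and 1 for AA, Aa and aa. The function names are hypothetical, and it covers only the expected trajectory, not the stochastic paths that the abstract's MCMC approach integrates over.

```python
def next_allele_frequency(p: float, s: float, h: float) -> float:
    """Expected frequency of allele A after one generation of selection.

    Genotype fitnesses (an assumed parameterisation):
        AA: 1 + s,  Aa: 1 + h * s,  aa: 1
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 + s, 1.0 + h * s, 1.0
    mean_fitness = p * p * w_AA + 2.0 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_fitness


def trajectory(p0: float, s: float, h: float, generations: int) -> list:
    """Deterministic allele frequency time series of length generations + 1."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_allele_frequency(freqs[-1], s, h))
    return freqs


# Hypothetical example: an additive (h = 0.5) beneficial allele starting at 10%.
for generation, freq in enumerate(trajectory(0.10, 0.05, 0.5, 5)):
    print(generation, round(freq, 4))
```

An observed time series of ancient allele counts would scatter around such a curve because of genetic drift and finite sampling, which is why inference requires averaging over many possible trajectories.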
Subjects
Gene Frequency , Models, Genetic , Selection, Genetic , Software , Animals , Bayes Theorem , Diploidy , Horses/genetics , Skin Pigmentation/genetics
ABSTRACT
With the great advances in ancient DNA extraction, genetic data are now obtained from geographically separated individuals from both the present and the past. However, population genetics theory about the joint effect of space and time has not been thoroughly studied. Based on the classical stepping-stone model, we develop the theory of isolation by distance and time. We derive the correlation of allele frequencies between demes in the case where ancient samples are present, and investigate the impact of edge effects with forward-in-time simulations. We also derive results about coalescence times in circular and toroidal models. As one of the most common ways to investigate population structure is principal components analysis (PCA), we evaluate the impact of our theory on PCA plots. Our results demonstrate that time between samples is an important factor. Ancient samples tend to be drawn to the center of a PCA plot.
Subjects
Genetics, Population , Models, Genetic , Gene Flow , Gene Frequency , Humans , Principal Component Analysis
ABSTRACT
Yakutia, Sakha Republic, in the Siberian Far East, represents one of the coldest places on Earth, with winter record temperatures dropping below -70 °C. Nevertheless, Yakutian horses survive all year round in the open air due to striking phenotypic adaptations, including compact body conformations, extremely hairy winter coats, and acute seasonal differences in metabolic activities. The evolutionary origins of Yakutian horses and the genetic basis of their adaptations remain, however, contentious. Here, we present the complete genomes of nine present-day Yakutian horses and two ancient specimens dating from the early 19th century and ~5,200 y ago. By comparing these genomes with the genomes of two Late Pleistocene, 27 domesticated, and three wild Przewalski's horses, we find that contemporary Yakutian horses do not descend from the native horses that populated the region until the mid-Holocene, but were most likely introduced following the migration of the Yakut people a few centuries ago. Thus, they represent one of the fastest cases of adaptation to the extreme temperatures of the Arctic. We find cis-regulatory mutations to have contributed more than nonsynonymous changes to their adaptation, likely due to the comparatively limited standing variation within gene bodies at the time the population was founded. Genes involved in hair development, body size, and metabolic and hormone signaling pathways represent an essential part of the Yakutian horse adaptive genetic toolkit. Finally, we find evidence for convergent evolution with native human populations and woolly mammoths, suggesting that only a few evolutionary strategies are compatible with survival in extremely cold environments.