Search | Portal Regional da BVS

Model-based detection and analysis of introgressed Neanderthal ancestry in modern humans.

Steinrücken, Matthias; Spence, Jeffrey P; Kamm, John A; Wieczorek, Emilia; Song, Yun S.

Mol Ecol ; 27(19): 3873-3888, 2018 Oct.

Article in English | MEDLINE | ID: mdl-29603507

ABSTRACT

Genetic evidence has revealed that the ancestors of modern human populations outside Africa and their hominin sister groups, notably Neanderthals, exchanged genetic material in the past. The distribution of these introgressed sequence tracts along modern-day human genomes provides insight into the selective forces acting on them and the role of introgression in the evolutionary history of hominins. Studying introgression patterns on the X-chromosome is of particular interest, as sex chromosomes are thought to play a special role in speciation. Recent studies have developed methods to localize introgressed ancestries, reporting long regions that are depleted of Neanderthal introgression and enriched in genes, suggesting negative selection against the Neanderthal variants. On the other hand, enriched Neanderthal ancestry in hair- and skin-related genes suggests that some introgressed variants facilitated adaptation to new environments. Here, we present a model-based introgression detection method called dical-admix. We demonstrate its efficiency and accuracy through extensive simulations and apply it to detect tracts of Neanderthal introgression in modern human individuals from the 1000 Genomes Project. Our findings are largely concordant with previous studies, consistent with weak selection against Neanderthal ancestry. We find evidence that selection against Neanderthal ancestry was due to higher genetic load in Neanderthals resulting from small effective population size, rather than widespread Dobzhansky-Müller incompatibilities (DMIs) that could contribute to reproductive isolation. Moreover, we confirm the previously reported low level of introgression on the X-chromosome, but find little evidence that DMIs contributed to this pattern.

Subjects

Population Genetics, Human Genome, Genetic Models, Neanderthals/genetics, Animals, Human X Chromosomes/genetics, Computer Simulation, Genetic Load, Humans, Genetic Hybridization, Markov Chains, Population Density, Genetic Selection

Terminal Pleistocene Alaskan genome reveals first founding population of Native Americans.

Moreno-Mayar, J Víctor; Potter, Ben A; Vinner, Lasse; Steinrücken, Matthias; Rasmussen, Simon; Terhorst, Jonathan; Kamm, John A; Albrechtsen, Anders; Malaspinas, Anna-Sapfo; Sikora, Martin; Reuther, Joshua D; Irish, Joel D; Malhi, Ripan S; Orlando, Ludovic; Song, Yun S; Nielsen, Rasmus; Meltzer, David J; Willerslev, Eske.

Nature ; 553(7687): 203-207, 2018 Jan 11.

Article in English | MEDLINE | ID: mdl-29323294

ABSTRACT

Despite broad agreement that the Americas were initially populated via Beringia, the land bridge that connected far northeast Asia with northwestern North America during the Pleistocene epoch, when and how the peopling of the Americas occurred remains unresolved. Analyses of human remains from Late Pleistocene Alaska are important to resolving the timing and dispersal of these populations. The remains of two infants were recovered at Upward Sun River (USR), and have been dated to around 11.5 thousand years ago (ka). Here, by sequencing the USR1 genome to an average coverage of approximately 17 times, we show that USR1 is most closely related to Native Americans, but falls basal to all previously sequenced contemporary and ancient Native Americans. As such, USR1 represents a distinct Ancient Beringian population. Using demographic modelling, we infer that the Ancient Beringian population and ancestors of other Native Americans descended from a single founding population that initially split from East Asians around 36 ± 1.5 ka, with gene flow persisting until around 25 ± 1.1 ka. Gene flow from ancient north Eurasians into all Native Americans took place 25-20 ka, with Ancient Beringians branching off around 22-18.1 ka. Our findings support a long-term genetic structure in ancestral Native Americans, consistent with the Beringian 'standstill model'. We show that the basal northern and southern Native American branches, to which all other Native Americans belong, diverged around 17.5-14.6 ka, and that this probably occurred south of the North American ice sheets. We also show that after 11.5 ka, some of the northern Native American populations received gene flow from a Siberian population most closely related to Koryaks, but not Palaeo-Eskimos, Inuits or Kets, and that Native American gene flow into Inuits was through northern and not southern Native American groups. Our findings further suggest that the far-northern North American presence of northern Native Americans is from a back migration that replaced or absorbed the initial founding population of Ancient Beringians.

Subjects

Founder Effect, Human Genome/genetics, North American Indians/genetics, Genetic Models, Phylogeny, Alaska, East Asia/ethnology, Gene Flow, Population Genetics, Ancient History, Human Migration, Humans, Infant, Rivers, Siberia/ethnology, Time Factors

Efficient computation of the joint sample frequency spectra for multiple populations.

Kamm, John A; Terhorst, Jonathan; Song, Yun S.

J Comput Graph Stat ; 26(1): 182-194, 2017.

Article in English | MEDLINE | ID: mdl-28239248

ABSTRACT

A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.
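As an illustration of the summary statistic described in this abstract, the following minimal sketch tabulates an empirical joint SFS for two populations from 0/1 haplotype matrices. It is not the momi software or its API; the function name `joint_sfs` and the toy input arrays are hypothetical, and the sketch only counts observed data rather than computing the expected SFS under a demographic model.

```python
import numpy as np

def joint_sfs(hap_a, hap_b):
    """Tabulate the empirical joint SFS of two populations.

    hap_a, hap_b: 0/1 arrays of shape (n_sites, n_a) and (n_sites, n_b),
    where 1 marks the derived (mutant) allele. Entry [i, j] of the result
    is the number of sites with i derived copies in A and j in B.
    """
    hap_a = np.asarray(hap_a)
    hap_b = np.asarray(hap_b)
    n_a, n_b = hap_a.shape[1], hap_b.shape[1]
    counts_a = hap_a.sum(axis=1)  # derived-allele count per site in A
    counts_b = hap_b.sum(axis=1)  # derived-allele count per site in B
    sfs = np.zeros((n_a + 1, n_b + 1), dtype=int)
    # np.add.at accumulates correctly when the same (i, j) cell repeats.
    np.add.at(sfs, (counts_a, counts_b), 1)
    return sfs

# Toy data (hypothetical): 4 sites, 3 haplotypes in A, 2 haplotypes in B.
hap_a = np.array([[0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 1]])
hap_b = np.array([[1, 0], [0, 0], [0, 1], [1, 1]])
sfs = joint_sfs(hap_a, hap_b)
print(sfs)
```

The corner cells [0, 0] and [n_a, n_b] hold sites that are monomorphic in the combined sample; analyses of polymorphic sites typically exclude them.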

Robust and scalable inference of population history from hundreds of unphased whole genomes.

Terhorst, Jonathan; Kamm, John A; Song, Yun S.

Nat Genet ; 49(2): 303-309, 2017 Feb.

Article in English | MEDLINE | ID: mdl-28024154

ABSTRACT

It has recently been demonstrated that inference methods based on genealogical processes with recombination can uncover past population history in unprecedented detail. However, these methods scale poorly with sample size, limiting resolution in the recent past, and they require phased genomes, which contain switch errors that can catastrophically distort the inferred history. Here we present SMC++, a new statistical tool capable of analyzing orders of magnitude more samples than existing methods while requiring only unphased genomes (its results are independent of phasing). SMC++ can jointly infer population size histories and split times in diverged populations, and it employs a novel spline regularization scheme that greatly reduces estimation error. We apply SMC++ to analyze sequence data from over a thousand human genomes in Africa and Eurasia, hundreds of genomes from a Drosophila melanogaster population in Africa, and tens of genomes from zebra finch and long-tailed finch populations in Australia.

Subjects

Genome/genetics, Africa, Animals, Australia, Computer Simulation, Drosophila melanogaster/genetics, Equidae/genetics, Population Genetics/methods, Humans, Genetic Models, Population Density

Two-Locus Likelihoods Under Variable Population Size and Fine-Scale Recombination Rate Estimation.

Kamm, John A; Spence, Jeffrey P; Chan, Jeffrey; Song, Yun S.

Genetics ; 203(3): 1381-99, 2016 Jul.

Article in English | MEDLINE | ID: mdl-27182948

ABSTRACT

Two-locus sampling probabilities have played a central role in devising an efficient composite-likelihood method for estimating fine-scale recombination rates. Due to mathematical and computational challenges, these sampling probabilities are typically computed under the unrealistic assumption of a constant population size, and simulation studies have shown that resulting recombination rate estimates can be severely biased in certain cases of historical population size changes. To alleviate this problem, we develop here new methods to compute the sampling probability for variable population size functions that are piecewise constant. Our main theoretical result, implemented in a new software package called LDpop, is a novel formula for the sampling probability that can be evaluated by numerically exponentiating a large but sparse matrix. This formula can handle moderate sample sizes ([Formula: see text]) and demographic size histories with a large number of epochs ([Formula: see text]). In addition, LDpop implements an approximate formula for the sampling probability that is reasonably accurate and scales to hundreds in sample size ([Formula: see text]). Finally, LDpop includes an importance sampler for the posterior distribution of two-locus genealogies, based on a new result for the optimal proposal distribution in the variable-size setting. Using our methods, we study how a sharp population bottleneck followed by rapid growth affects the correlation between partially linked sites. Then, through an extensive simulation study, we show that accounting for population size changes under such a demographic model leads to substantial improvements in fine-scale recombination rate estimation.
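The "correlation between partially linked sites" mentioned in this abstract is commonly summarized by the r² statistic of linkage disequilibrium. The sketch below computes the empirical r² between two biallelic loci from phased 0/1 haplotypes. It is only an illustration of that statistic, not LDpop or its sampling-probability formulas; the function name `r_squared` and the example data are hypothetical.

```python
import numpy as np

def r_squared(locus_a, locus_b):
    """Empirical r^2 between two biallelic loci.

    locus_a, locus_b: 0/1 sequences of equal length, one entry per
    haplotype, where 1 marks the derived allele at that locus.
    """
    a = np.asarray(locus_a, dtype=float)
    b = np.asarray(locus_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both loci must cover the same haplotypes")
    p_a, p_b = a.mean(), b.mean()   # derived-allele frequencies
    p_ab = (a * b).mean()           # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b            # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom

# Hypothetical haplotypes: perfectly coupled loci vs. independent loci.
linked = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
unlinked = r_squared([0, 1, 0, 1], [0, 0, 1, 1])
print(linked, unlinked)
```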

Subjects

Population Genetics, Genetic Models, Genetic Recombination, Algorithms, Computer Simulation, Humans, Likelihood Functions, Population Density

The Site Frequency Spectrum for General Coalescents.

Spence, Jeffrey P; Kamm, John A; Song, Yun S.

Genetics ; 202(4): 1549-61, 2016 Apr.

Article in English | MEDLINE | ID: mdl-26883445

ABSTRACT

General genealogical processes such as Λ- and Ξ-coalescents, which respectively model multiple and simultaneous mergers, have important applications in studying marine species, strong positive selection, recurrent selective sweeps, strong bottlenecks, large sample sizes, and so on. Recently, there has been significant progress in developing useful inference tools for such general models. In particular, inference methods based on the site frequency spectrum (SFS) have received noticeable attention. Here, we derive a new formula for the expected SFS for general Λ- and Ξ-coalescents, which leads to an efficient algorithm. For time-homogeneous coalescents, the runtime of our algorithm for computing the expected SFS is O(n^2), where n is the sample size. This is a factor of [Formula: see text] faster than the state-of-the-art method. Furthermore, in contrast to existing methods, our method generalizes to time-inhomogeneous Λ- and Ξ-coalescents with measures that factorize as [Formula: see text] and [Formula: see text], respectively, where ζ denotes a strictly positive function of time. The runtime of our algorithm in this setting is [Formula: see text]. We also obtain general theoretical results for the identifiability of the Λ measure when ζ is a constant function, as well as for the identifiability of the function ζ under a fixed Ξ measure.
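For orientation, the simplest special case of the quantity discussed in this abstract is the standard Kingman coalescent with constant population size, where the expected number of sites with i derived copies in a sample of size n is θ/i for i = 1, ..., n-1. The sketch below evaluates only that classical case. It is not the formula or algorithm derived in the paper for general Λ- and Ξ-coalescents, and the function name `kingman_expected_sfs` is hypothetical.

```python
def kingman_expected_sfs(n, theta=1.0):
    """Expected SFS under the constant-size Kingman coalescent.

    n: sample size (number of sequences), at least 2.
    theta: population-scaled mutation rate for the locus.
    Returns a list whose entry i-1 is the expected number of sites
    carrying i derived copies, for i = 1, ..., n-1.
    """
    if n < 2:
        raise ValueError("sample size must be at least 2")
    return [theta / i for i in range(1, n)]

def normalized(sfs):
    """Rescale an SFS so that its entries sum to 1."""
    total = sum(sfs)
    return [x / total for x in sfs]

expected = kingman_expected_sfs(4, theta=2.0)
print(expected)
print(normalized(expected))
```

Multiple-merger coalescents generally shift this spectrum away from the 1/i shape, which is one reason SFS-based inference is informative about such models.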

Subjects

Algorithms, Theoretical Models, Genetic Models

Decoding coalescent hidden Markov models in linear time.

Harris, Kelley; Sheehan, Sara; Kamm, John A; Song, Yun S.

Res Comput Mol Biol ; 8394: 100-114, 2014.

Article in English | MEDLINE | ID: mdl-25340178

ABSTRACT

In many areas of computational biology, hidden Markov models (HMMs) have been used to model local genomic features. In particular, coalescent HMMs have been used to infer ancient population sizes, migration rates, divergence times, and other parameters such as mutation and recombination rates. As more loci, sequences, and hidden states are added to the model, however, the runtime of coalescent HMMs can quickly become prohibitive. Here we present a new algorithm for reducing the runtime of coalescent HMMs from quadratic in the number of hidden time states to linear, without making any additional approximations. Our algorithm can be incorporated into various coalescent HMMs, including the popular method PSMC for inferring variable effective population sizes. Here we implement this algorithm to speed up our demographic inference method diCal, which is equivalent to PSMC when applied to a sample of two haplotypes. We demonstrate that the linear-time method can reconstruct a population size change history more accurately than the quadratic-time method, given similar computation resources. We also apply the method to data from the 1000 Genomes project, inferring a high-resolution history of size changes in the European population.
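To make the quadratic cost mentioned in this abstract concrete, here is a minimal sketch of the standard scaled forward algorithm for a generic HMM. Each step multiplies the forward vector by a dense S × S transition matrix, so the work per locus grows quadratically with the number of hidden states S. This is the textbook baseline only; it is not the linear-time algorithm of the paper, nor diCal or PSMC, and the function name and toy parameters are hypothetical.

```python
import numpy as np

def forward_log_likelihood(init, trans, emit, obs):
    """Log-likelihood of an observation sequence under a discrete HMM.

    init:  (S,) initial state probabilities
    trans: (S, S) transition matrix, rows sum to 1
    emit:  (S, M) emission probabilities, rows sum to 1
    obs:   sequence of observed symbols in range(M)
    """
    init = np.asarray(init, dtype=float)
    trans = np.asarray(trans, dtype=float)
    emit = np.asarray(emit, dtype=float)
    alpha = init * emit[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Dense vector-matrix product: O(S^2) work per locus.
            alpha = (alpha @ trans) * emit[:, obs[t]]
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return log_lik

# Toy two-state model (hypothetical parameters).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
log_lik = forward_log_likelihood(init, trans, emit, [0, 1])
print(log_lik)
```

A linear-time method has to avoid forming the dense product `alpha @ trans` explicitly, which is only possible when the transition matrix has exploitable structure.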

Approximate sampling formulas for general finite-alleles models of mutation.

Bhaskar, Anand; Kamm, John A; Song, Yun S.

Adv Appl Probab ; 44(2): 408-428, 2012 Jun.

Article in English | MEDLINE | ID: mdl-24634516

ABSTRACT

Many applications in genetic analyses utilize sampling distributions, which describe the probability of observing a sample of DNA sequences randomly drawn from a population. In the one-locus case with special models of mutation such as the infinite-alleles model or the finite-alleles parent-independent mutation model, closed-form sampling distributions under the coalescent have been known for many decades. However, no exact formula is currently known for more general models of mutation that are of biological interest. In this paper, models with finitely-many alleles are considered, and an urn construction related to the coalescent is used to derive approximate closed-form sampling formulas for an arbitrary irreducible recurrent mutation model or for a reversible recurrent mutation model, depending on whether the number of distinct observed allele types is at most three or four, respectively. It is demonstrated empirically that the formulas derived here are highly accurate when the per-base mutation rate is low, which holds for many biological organisms.
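The closed-form case that this abstract cites as long known, the finite-alleles parent-independent mutation (PIM) model, can be sketched as follows. Under one common parameterization (an assumption here: total scaled mutation rate θ, with each mutation producing allele i with probability π_i regardless of the parent allele), the probability of an ordered sample with allele counts n_1, ..., n_K is the product of rising factorials (θπ_i)^(n_i) divided by θ^(n). The code below evaluates that classical formula only; it does not implement the approximate formulas derived in the paper, and the function names are hypothetical.

```python
def rising_factorial(x, m):
    """Compute x * (x + 1) * ... * (x + m - 1); equals 1 when m == 0."""
    result = 1.0
    for k in range(m):
        result *= x + k
    return result

def pim_sample_probability(counts, theta, pi):
    """Probability of an ordered sample under the PIM model.

    counts: allele counts (n_1, ..., n_K) observed in the sample
    theta:  total scaled mutation rate (assumed parameterization)
    pi:     probabilities (pi_1, ..., pi_K) of the mutant allele type
    """
    if len(counts) != len(pi):
        raise ValueError("counts and pi must have the same length")
    n = sum(counts)
    numerator = 1.0
    for n_i, pi_i in zip(counts, pi):
        numerator *= rising_factorial(theta * pi_i, n_i)
    return numerator / rising_factorial(theta, n)

# Hypothetical two-allele example with a sample of size 2.
theta, pi = 1.0, (0.3, 0.7)
p20 = pim_sample_probability((2, 0), theta, pi)
p11 = pim_sample_probability((1, 1), theta, pi)
p02 = pim_sample_probability((0, 2), theta, pi)
print(p20, p11, p02)
```

Because the formula is for ordered samples, the unordered configuration (1, 1) is counted twice when checking that the probabilities sum to one.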
