Results 1 - 20 of 86
1.
Cell ; 185(24): 4604-4620.e32, 2022 11 23.
Article in English | MEDLINE | ID: mdl-36423582

ABSTRACT

Natural and induced somatic mutations that accumulate in the genome during development record the phylogenetic relationships of cells; whether these lineage barcodes capture the complex dynamics of progenitor states remains unclear. We introduce quantitative fate mapping, an approach to reconstruct the hierarchy, commitment times, population sizes, and commitment biases of intermediate progenitor states during development based on a time-scaled phylogeny of their descendants. To reconstruct time-scaled phylogenies from lineage barcodes, we introduce Phylotime, a scalable maximum likelihood clustering approach based on a general barcoding mutagenesis model. We validate these approaches using realistic in silico and in vitro barcoding experiments. We further establish criteria for the number of cells that must be analyzed for robust quantitative fate mapping and a progenitor state coverage statistic to assess the robustness. This work demonstrates how lineage barcodes, natural or synthetic, enable analyzing progenitor fate and dynamics long after embryonic development in any organism.


Subjects
Embryonic Development , Cell Lineage/genetics , Retrospective Studies , Phylogeny , Mutagenesis
2.
Theor Popul Biol ; 158: 150-169, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38880430

ABSTRACT

The coalescent is a stochastic process representing ancestral lineages in a population undergoing neutral genetic drift. Originally defined for a well-mixed population, the coalescent has been adapted in various ways to accommodate spatial, age, and class structure, along with other features of real-world populations. To further extend the range of population structures to which coalescent theory applies, we formulate a coalescent process for a broad class of neutral drift models with arbitrary (but fixed) spatial, age, sex, and class structure, haploid or diploid genetics, and any fixed mating pattern. Here, the coalescent is represented as a random sequence of mappings C_t (t = 0, 1, 2, …) from a finite set G to itself. The set G represents the "sites" (in individuals, in particular locations and/or classes) at which these alleles can live. The state of the coalescent, C_t: G → G, maps each site g ∈ G to the site containing g's ancestor, t time-steps into the past. Using this representation, we define and analyze coalescence time, coalescence branch length, mutations prior to coalescence, and stationary probabilities of identity-by-descent and identity-by-state. For low mutation, we provide a recipe for computing identity-by-descent and identity-by-state probabilities via the coalescent. Applying our results to a diploid population with arbitrary sex ratio r, we find that measures of genetic dissimilarity, among any set of sites, are scaled by 4r(1-r) relative to the even sex ratio case.
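
As a small illustration of the final result of this abstract, the scaling factor 4r(1-r) can be evaluated directly. This is only a sketch: the function name is hypothetical, and reading r as the proportion of one sex (a value between 0 and 1) is an assumption, since the abstract does not define r beyond calling it the sex ratio.

```python
def sex_ratio_scaling(r: float) -> float:
    """Return 4r(1-r), the scaling factor quoted in the abstract.

    Assumption: r is the proportion of one sex, so 0 <= r <= 1.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return 4.0 * r * (1.0 - r)


# The factor equals 1 at an even sex ratio (r = 0.5) and is smaller otherwise.
for r in (0.5, 0.25, 0.1):
    print(f"r = {r}: scaling factor = {sex_ratio_scaling(r):.2f}")
```

The factor is symmetric in r and 1-r, so under this reading it does not matter which sex r refers to.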


Subjects
Genetic Drift , Genetics, Population , Models, Genetic , Mutation , Stochastic Processes , Humans , Diploidy
3.
Proc Natl Acad Sci U S A ; 118(21)2021 05 25.
Article in English | MEDLINE | ID: mdl-34016747

ABSTRACT

As populations boom and bust, the accumulation of genetic diversity is modulated, encoding histories of living populations in present-day variation. Many methods exist to decode these histories, and all must make strong model assumptions. It is typical to assume that mutations accumulate uniformly across the genome at a constant rate that does not vary between closely related populations. However, recent work shows that mutational processes in human and great ape populations vary across genomic regions and evolve over time. This perturbs the mutation spectrum (relative mutation rates in different local nucleotide contexts). Here, we develop theoretical tools in the framework of Kingman's coalescent to accommodate mutation spectrum dynamics. We present mutation spectrum history inference (mushi), a method to perform nonparametric inference of demographic and mutation spectrum histories from allele frequency data. We use mushi to reconstruct trajectories of effective population size and mutation spectrum divergence between human populations, identify mutation signatures and their dynamics in different human populations, and calibrate the timing of a previously reported mutational pulse in the ancestors of Europeans. We show that mutation spectrum histories can be placed in a well-studied theoretical setting and rigorously inferred from genomic variation data, like other features of evolutionary history.


Subjects
Gene Frequency/genetics , Genetics, Population/statistics & numerical data , Models, Genetic , Mutation/genetics , Animals , Genetic Variation/genetics , Genomics , Hominidae/genetics , Humans , Mutation Rate , Population Density
4.
Theor Popul Biol ; 154: 94-101, 2023 12.
Article in English | MEDLINE | ID: mdl-37742787

ABSTRACT

Multiple-merger coalescents, also known as Λ-coalescents, have been used to describe the genealogy of populations that have a skewed offspring distribution or that undergo strong selection. Inferring the characteristic measure Λ, which describes the rates of the multiple-merger events, is key to understanding these processes. So far, most inference methods only work for some particular families of Λ-coalescents that are described by only one parameter, but not for more general models. This article is devoted to the construction of a non-parametric estimator of the density of Λ that is based on the observation at a single time of the so-called Site Frequency Spectrum (SFS), which describes the allelic frequencies in a present population sample. First, we produce estimates of the multiple-merger rates by solving a linear system, whose coefficients are obtained by appropriately subsampling the SFS. Then, we use a technique that aggregates the information extracted from the previous step through a kernel type of reconstruction to give a non-parametric estimation of the measure Λ. We give a consistency result of this estimator under mild conditions on the behavior of Λ around 0. We also show some numerical examples of how our method performs.
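
To make "multiple-merger rates" concrete, the sketch below evaluates them for one of the one-parameter families the abstract contrasts itself with, the Beta(2-α, α)-coalescent. It is not the non-parametric estimator described above. It relies on the standard formula λ(b,k) = ∫ x^(k-2) (1-x)^(b-k) Λ(dx), which for this family reduces to a ratio of Beta functions. The function names are hypothetical.

```python
from math import gamma


def beta_fn(a: float, b: float) -> float:
    """Euler Beta function B(a, b) for positive a and b."""
    return gamma(a) * gamma(b) / gamma(a + b)


def beta_coalescent_rate(b: int, k: int, alpha: float) -> float:
    """Rate at which a given group of k out of b lineages merges.

    Assumes Lambda is the Beta(2 - alpha, alpha) distribution with
    0 < alpha < 2, so that
        lambda(b, k) = B(k - alpha, b - k + alpha) / B(2 - alpha, alpha).
    """
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    if not 0.0 < alpha < 2.0:
        raise ValueError("need 0 < alpha < 2")
    return beta_fn(k - alpha, b - k + alpha) / beta_fn(2 - alpha, alpha)


# Rates for a sample of b = 5 lineages with alpha = 1.5.
for k in range(2, 6):
    print(k, beta_coalescent_rate(5, k, 1.5))
```

Any valid family of rates satisfies the consistency relation λ(b,k) = λ(b+1,k) + λ(b+1,k+1), which is a convenient sanity check on an implementation.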


Subjects
Genetics, Population , Models, Genetic , Gene Frequency , Population Density
5.
Theor Popul Biol ; 147: 16-27, 2022 10.
Article in English | MEDLINE | ID: mdl-36007782

ABSTRACT

A number of powerful demographic inference methods have been developed in recent years, with the goal of fitting rich evolutionary models to genetic data obtained from many populations. In this paper we investigate the statistical performance of these methods in the specific case where there is continuous migration between populations. Compared with earlier work, migration significantly complicates the theoretical analysis and requires new techniques. We employ the theories of phase-type distributions and concentration of measure in order to study the two-island and isolation-with-migration models, resulting in both upper and lower bounds on rates of convergence for parametric estimators in migration models. For the upper bounds, we consider inferring rates of coalescence and migration on the basis of directly observing pairwise coalescent times, and, more realistically, when (conditionally) Poisson-distributed mutations dropped on latent trees are observed. We complement these upper bounds with information-theoretic lower bounds which establish a limit, in terms of sample size, below which inference is effectively impossible.


Subjects
Genetics, Population , Models, Genetic , Biological Evolution
6.
Theor Popul Biol ; 147: 1-15, 2022 10.
Article in English | MEDLINE | ID: mdl-35973448

ABSTRACT

By providing additional opportunities for coalescence within families, the presence of consanguineous unions in a population reduces coalescence times relative to non-consanguineous populations. First-cousin consanguinity can take one of six forms differing in the configuration of sexes in the pedigree of the male and female cousins who join in a consanguineous union: patrilateral parallel, patrilateral cross, matrilateral parallel, matrilateral cross, bilateral parallel, and bilateral cross. Considering populations with each of the six types of first-cousin consanguinity individually and a population with a mixture of the four unilateral types, we examine coalescent models of consanguinity. We previously computed, for first-cousin consanguinity models, the mean coalescence time for X-chromosomal loci and the limiting distribution of coalescence times for autosomal loci. Here, we use the separation-of-time-scales approach to obtain the limiting distribution of coalescence times for X-chromosomal loci. This limiting distribution has an instantaneous coalescence probability that depends on the probability that a union is consanguineous; lineages that do not coalesce instantaneously coalesce according to an exponential distribution. We study the effects on the coalescence time distribution of the type of first-cousin consanguinity, showing that patrilateral-parallel and patrilateral-cross consanguinity have no effect on X-chromosomal coalescence time distributions and that matrilateral-parallel consanguinity decreases coalescence times to a greater extent than does matrilateral-cross consanguinity.


Subjects
Family , Marriage , Consanguinity , Female , Humans , Male , Pedigree
7.
Theor Popul Biol ; 148: 11-21, 2022 12.
Article in English | MEDLINE | ID: mdl-36122755

ABSTRACT

Principal component analysis (PCA) is one of the most frequently used approaches to describe population structure from multilocus genotype data. Regarding geographic range expansions of modern humans, interpretations of PCA have, however, been questioned, as there is uncertainty about the wave-like patterns that have been observed in principal components. It has indeed been argued that wave-like patterns are mathematical artifacts that arise generally when PCA is applied to data in which genetic differentiation increases with geographic distance. Here, we present an alternative theory for the observation of wave-like patterns in PCA. We study a coalescent model, the umbrella model, for the diffusion of genetic variants. The model is based on genetic drift without any particular geographical structure. In the umbrella model, splits from an ancestral population occur almost continuously in time, giving birth to small daughter populations at a regular pace. Our results provide detailed mathematical descriptions of eigenvalues and eigenvectors for the PCA of sampled genomic sequences under the model. When variants uniquely represented in the sample are removed, the PCA eigenvectors are defined as cosine functions of increasing periodicity, reproducing wave-like patterns observed in equilibrium isolation-by-distance models. Including singleton variants in the analysis, the eigenvectors corresponding to the largest eigenvalues exhibit complex wave shapes. The accuracy of our predictions is further investigated with coalescent simulations. Our analysis supports the hypothesis that highly structured wave-like patterns could arise from genetic drift only, and may not always be artificial outcomes of spatially structured data. Genomic data related to the peopling of the Americas are reanalyzed in the light of our new theory.


Subjects
Principal Component Analysis , Pregnancy , Humans , Female
8.
Am J Bot ; 109(5): 706-726, 2022 05.
Article in English | MEDLINE | ID: mdl-35526278

ABSTRACT

PREMISE: Accurate species delimitation is essential for evolutionary biology, conservation, and biodiversity management. We studied species delimitation in North American pinyon pines, Pinus subsection Cembroides, a natural group with high levels of incomplete lineage sorting (ILS). METHODS: We used coalescent-based methods and multivariate analyses of low-copy number nuclear genes and nearly complete high-copy number plastomes generated with the Hyb-Seq method. The three coalescent-based species delimitation methods evaluated were the Generalized Mixed Yule Coalescent (GMYC), Poisson Tree Process (PTP), and Trinomial Distribution of Triplets (Tr2). We also measured admixture in populations with possible introgression. RESULTS: Our results show inconsistencies among GMYC, PTP, and Tr2. The single-locus based GMYC analysis of plastid DNA recovered a higher number of species (up to 24 entities, including singleton lineages and clusters) than PTP and the multi-locus coalescent approach. The PTP analysis identified 10 species whereas Tr2 recovered 13, which agreed closely with taxonomic treatments. CONCLUSIONS: We found that PTP and GMYC identified species with low levels of ILS and high morphological divergence (P. maximartinezii, P. pinceana, and P. rzedowskii). However, the GMYC method oversplit species by identifying more divergent samples as singletons. Moreover, both PTP and GMYC were incapable of identifying some species that are readily identified morphologically. We suggest that the divergence times between lineages within North American pinyon pines are so disparate that GMYC results are unreliable. Results of the Tr2 method coincided well with previous delimitations based on morphology, DNA, geography, and secondary chemistry.


Subjects
Cell Nucleus , Pinus , Cell Nucleus/genetics , DNA , North America , Phylogeny , Pinus/genetics
9.
Theor Popul Biol ; 140: 32-43, 2021 08.
Article in English | MEDLINE | ID: mdl-33901539

ABSTRACT

Consanguineous unions increase the frequency at which identical genomic segments are inherited along separate paths of descent, decreasing coalescence times for pairs of alleles drawn from an individual who is the offspring of a consanguineous pair. For an autosomal locus, it has recently been shown that the mean time to the most recent common ancestor (TMRCA) for two alleles in the same individual and the mean TMRCA for two alleles in two separate individuals both decrease with increasing consanguinity in a population. Here, we extend this analysis to the X chromosome, considering X-chromosomal coalescence times under a coalescent model with diploid, male-female mating pairs. We examine four possible first-cousin mating schemes that are equivalent in their effects on autosomes, but that have differing effects on the X chromosome: patrilateral-parallel, patrilateral-cross, matrilateral-parallel, and matrilateral-cross. In each mating model, we calculate mean TMRCA for X-chromosomal alleles sampled either within or between individuals. We describe a consanguinity effect on X-chromosomal TMRCA that differs from the autosomal pattern under matrilateral but not under patrilateral first-cousin mating. For matrilateral first cousins, the effect of consanguinity in reducing TMRCA is stronger on the X chromosome than on the autosomes, with an increased effect of parallel-cousin mating compared to cross-cousin mating. The theoretical computations support the utility of the model in understanding patterns of genomic sharing on the X chromosome.


Subjects
Diploidy , Family , Alleles , Consanguinity , Female , Humans , Male , X Chromosome
10.
J Hered ; 112(1): 145-154, 2021 03 12.
Article in English | MEDLINE | ID: mdl-33511984

ABSTRACT

Genome studies of facultative sexual species, which can reproduce either sexually or asexually, are providing insight into the evolutionary consequences of mixed reproductive modes. It is currently unclear to what extent the evolutionary history of facultative sexuals' genomes can be approximated by the standard coalescent, and whether a coalescent effective population size Ne exists. Here, I determine if and when these approximations can be made. When sex is frequent (occurring at a frequency much greater than 1/N per reproduction per generation, for N the actual population size), the underlying genealogy can be approximated by the standard coalescent, with a coalescent Ne ≈ N. When sex is very rare (at frequency much lower than 1/N), approximations for the pairwise coalescent time can be obtained, which is strongly influenced by the frequencies of sex and mitotic gene conversion, rather than N. However, these terms do not translate into a coalescent Ne. These results are used to discuss the best sampling strategies for investigating the evolutionary history of facultative sexual species.


Subjects
Biological Evolution , Models, Genetic , Reproduction , Sex , Computer Simulation , Mitosis , Population Density
11.
Phytopathology ; 111(1): 68-77, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33021879

ABSTRACT

Phylogeography combines geographic information with phylogenetic and population genomic approaches to infer the evolutionary history of a species or population in a geographic context. This approach has been instrumental in understanding the emergence, spread, and evolution of a range of plant pathogens. In particular, phylogeography can address questions about where a pathogen originated, whether it is native or introduced, and when and how often introductions occurred. We review the theory, methods, and approaches underpinning phylogeographic inference and highlight applications providing novel insights into the emergence and spread of select pathogens. We hope that this review will be useful in assessing the power, pitfalls, and opportunities presented by various phylogeographic approaches.


Subjects
Models, Genetic , Plant Diseases , Phylogeny , Phylogeography
12.
J Math Biol ; 83(6-7): 63, 2021 11 16.
Article in English | MEDLINE | ID: mdl-34783900

ABSTRACT

Linear functions of the site frequency spectrum (SFS) play a major role in understanding and investigating genetic diversity. Estimators of the mutation rate (e.g. based on the total number of segregating sites or average of the pairwise differences) and tests for neutrality (e.g. Tajima's D) are perhaps the most well-known examples. The distribution of linear functions of the SFS is important for constructing confidence intervals for the estimators, and for determining significance thresholds for neutrality tests. These distributions are often approximated using simulation procedures. In this paper we use multivariate phase-type theory to specify, characterize and calculate the distribution of linear functions of the site frequency spectrum. In particular, we show that many of the classical estimators of the mutation rate are distributed according to a discrete phase-type distribution. Neutrality tests, however, are generally not discrete phase-type distributed. For neutrality tests we derive the probability generating function using continuous multivariate phase-type theory, and numerically invert the function to obtain the distribution. A main result is an analytically tractable formula for the probability generating function of the SFS. Software implementation of the phase-type methodology is available in the R package PhaseTypeR, and R code for the reproduction of our results is available as an accompanying vignette.
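
The two estimators named in this abstract can be written out explicitly as linear functions of the SFS. The sketch below uses the textbook definitions of Watterson's estimator and the pairwise-difference estimator for an unfolded SFS. It does not use PhaseTypeR, and the function names are hypothetical.

```python
from math import comb


def watterson_theta(sfs):
    """Watterson's estimator: segregating sites divided by a_n = sum_{i<n} 1/i.

    sfs[i - 1] is the number of sites whose derived allele appears in
    exactly i of the n sampled sequences, so len(sfs) == n - 1.
    """
    n = len(sfs) + 1
    a_n = sum(1.0 / i for i in range(1, n))
    return sum(sfs) / a_n


def pairwise_theta(sfs):
    """Average number of pairwise differences, written as a weighted SFS sum."""
    n = len(sfs) + 1
    weighted = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1))
    return weighted / comb(n, 2)


def tajima_d_numerator(sfs):
    """Difference of the two estimators (the unnormalised Tajima's D)."""
    return pairwise_theta(sfs) - watterson_theta(sfs)


# Expected neutral SFS for n = 5 and theta = 2: entry i is theta / i.
neutral_sfs = [2.0 / i for i in range(1, 5)]
print(watterson_theta(neutral_sfs), pairwise_theta(neutral_sfs))
```

Under the standard neutral model the expected SFS entry is θ/i, so both estimators recover θ and the numerator of Tajima's D has expectation zero. The paper's contribution concerns the full distribution of such statistics, which this sketch does not attempt.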


Subjects
Models, Genetic , Mutation Rate , Genetics, Population , Likelihood Functions , Mutation
13.
BMC Bioinformatics ; 21(1): 441, 2020 Oct 07.
Article in English | MEDLINE | ID: mdl-33028201

ABSTRACT

BACKGROUND: Inferring phylogenetic relationships of polyploid species and their diploid ancestors (leading to reticulate phylogenies in the case of an allopolyploid origin) based on multi-locus sequence data is complicated by the unknown assignment of alleles found in polyploids to diploid subgenomes. A parsimony-based approach to this problem has been proposed by Oberprieler et al. (Methods Ecol Evol 8:835-849, 2017); however, its implementation is of limited practical value. In addition to previously identified shortcomings, it has been found that in some cases, the obtained results barely satisfy the applied criterion. To be of better use to other researchers, a reimplementation with methodological refinement appears to be indispensable. RESULTS: We present the AllCoPol package, which provides a heuristic method for assigning alleles from polyploids to diploid subgenomes based on the Minimizing Deep Coalescences (MDC) criterion in multi-locus sequence datasets. An additional consensus approach makes it possible to assess the confidence of phylogenetic reconstructions. Simulations of tetra- and hexaploids show that under simplifying assumptions such as completely disomic inheritance, the topological errors of reconstructed phylogenies are similar to those of MDC species trees based on the true allele partition. CONCLUSIONS: AllCoPol is a Python package for phylogenetic reconstructions of polyploids offering enhanced functionality as well as improved usability. The included methods are supplied as command line tools without the need for prior programming knowledge.


Subjects
User-Computer Interface , Alleles , Databases, Genetic , Leucanthemum/classification , Leucanthemum/genetics , Multilocus Sequence Typing , Phylogeny , Polyploidy
14.
Mol Biol Evol ; 36(10): 2358-2374, 2019 10 01.
Article in English | MEDLINE | ID: mdl-31165149

ABSTRACT

Natural populations display a variety of spatial arrangements, each potentially with a distinctive impact on genetic diversity and genetic differentiation among subpopulations. Although the spatial arrangement of populations can lead to intricate migration networks, theoretical developments have focused mainly on a small subset of such networks, emphasizing the island-migration and stepping-stone models. In this study, we investigate all small network motifs: the set of all possible migration networks among populations subdivided into at most four subpopulations. For each motif, we use coalescent theory to derive expectations for three quantities that describe genetic variation: nucleotide diversity, F_ST, and half-time to equilibrium diversity. We describe the impact of network properties on these quantities, finding that motifs with a high mean node degree have the largest nucleotide diversity and the longest time to equilibrium, whereas motifs with low density have the largest F_ST. In addition, we show that the motifs whose pattern of variation is most strongly influenced by loss of a connection or a subpopulation are those that can be split easily into disconnected components. We illustrate our results using two example data sets (sky island birds of genus Sholicola and Indian tigers), identifying disturbance scenarios that produce the greatest reduction in genetic diversity; for tigers, we also compare the benefits of two assisted gene flow scenarios. Our results have consequences for understanding the effect of geography on genetic diversity, and they can assist in designing strategies to alter population migration networks toward maximizing genetic variation in the context of conservation of endangered species.


Subjects
Animal Migration , Genetics, Population/methods , Animals , Birds/genetics , Genetic Variation , Tigers/genetics
15.
Theor Popul Biol ; 134: 171-181, 2020 08.
Article in English | MEDLINE | ID: mdl-32278682

ABSTRACT

Only 6% of known species have a conservation status. Methods that assess conservation statuses are often based on individual counts and are thus too laborious to be generalized to all species. Population genomics methods that infer past variations in population size are easy to use but limited to the relatively distant past. Here we propose a population genomics approach that tests for recent population decline and may be used to assess species conservation statuses. More specifically, we study Maximal Recombination Free (MRF) blocks, which are segments of a sequence alignment inherited from a common ancestor without recombination. MRF blocks are relatively longer in small than in large populations. We use the distribution of MRF block lengths rescaled by their mean to test for recent population decline. However, because MRF blocks are difficult to detect, we also consider Maximal Linkage Disequilibrium (MLD) blocks, which are runs of single nucleotide polymorphisms compatible with a single tree. We develop a new method capable of inferring a very recent decline (e.g. with a detection power of 50% for populations whose size was halved to N, 0.05 × N generations ago) from rescaled MLD block lengths. Our framework could serve as a basis for quantitative tools to assess conservation status in a wide range of species.


Subjects
Polymorphism, Single Nucleotide , Animals , Linkage Disequilibrium , Population Density
16.
J Evol Biol ; 33(10): 1387-1404, 2020 10.
Article in English | MEDLINE | ID: mdl-32654283

ABSTRACT

The process of species diversification is traditionally summarized by a single tree, the species tree, whose reconstruction from molecular data is hindered by frequent conflicts between gene genealogies. Here, we argue that instead of seeing these conflicts as nuisances, we can exploit them to inform the diversification process itself. We adopt a gene-based view of diversification to model the ubiquitous presence of gene flow between diverging lineages, one of the most important processes explaining disagreements among gene trees. We propose a new framework for modelling the joint evolution of gene and species lineages relaxing the hierarchy between the species tree and gene trees inherent to the standard view, as embodied in a popular model known as the multispecies coalescent (MSC). We implement this framework in two alternative models called the gene-based diversification models (GBD): (a) GBD-forward following all evolving genomes through time and (b) GBD-backward based on coalescent theory. They feature four parameters tuning colonization, gene flow, genetic drift and genetic differentiation. We propose an inference method based on differences between gene trees. Applied to two empirical data sets prone to gene flow, we find better support for the GBD-backward model than for the MSC model. Along with the increasing awareness of the extent of gene flow, this work shows the importance of considering the richer signal contained in genomic histories, rather than in the mere species tree, to better apprehend the complex evolutionary history of species.


Subjects
Gene Flow , Genetic Speciation , Genome , Models, Genetic , Phylogeny
17.
Syst Biol ; 68(5): 730-743, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30726979

ABSTRACT

The coalescent process describes how changes in the size or structure of a population influence the genealogical patterns of sequences sampled from that population. The estimation of (effective) population size changes from genealogies that are reconstructed from these sampled sequences is an important problem in many biological fields. Often, population size is characterized by a piecewise-constant function, with each piece serving as a population size parameter to be estimated. Estimation quality depends on both the statistical coalescent inference method employed, and on the experimental protocol, which controls variables such as the sampling of sequences through time and space, or the transformation of model parameters. While there is an extensive literature on coalescent inference methodology, there is comparatively little work on experimental design. The research that does exist is largely simulation-based, precluding the development of provable or general design theorems. We examine three key design problems: temporal sampling of sequences under the skyline demographic coalescent model, spatio-temporal sampling under the structured coalescent model, and time discretization for sequentially Markovian coalescent models. In all cases, we prove that 1) working in the logarithm of the parameters to be inferred (e.g., population size) and 2) distributing informative coalescent events uniformly among these log-parameters, is uniquely robust. "Robust" means that the total and maximum uncertainty of our parameter estimates are minimized, and made insensitive to their unknown (true) values. This robust design theorem provides rigorous justification for several existing coalescent experimental design decisions and leads to usable guidelines for future empirical or simulation-based investigations. Given its persistence among models, this theorem may form the basis of an experimental design paradigm for coalescent inference.
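
The benefit of working with log-parameters can be seen in the simplest possible case, sketched below under stated assumptions: a single constant population size N estimated from m coalescent events. Standard maximum likelihood theory then gives Fisher information m/N² for N but m for log N, so only the latter is free of the unknown true value. This is an illustrative calculation rather than the paper's proof, and all function names are hypothetical.

```python
def fisher_info_n(m: int, n_true: float) -> float:
    """Fisher information about N from m coalescent events: m / N^2."""
    return m / n_true ** 2


def fisher_info_log_n(m: int, n_true: float) -> float:
    """Fisher information about log N: (m / N^2) * (dN / dlogN)^2 = m.

    n_true is accepted only to show that the result does not depend on it.
    """
    return fisher_info_n(m, n_true) * n_true ** 2


def total_and_max_variance(events_per_segment):
    """Asymptotic variances 1/m_j of log-size estimates, one per segment."""
    variances = [1.0 / m for m in events_per_segment]
    return sum(variances), max(variances)


# Same 16 events, spread evenly or unevenly over four log-size parameters.
print(total_and_max_variance([4, 4, 4, 4]))
print(total_and_max_variance([1, 1, 1, 13]))
```

With a fixed total number of events, the even allocation gives the smaller total and the smaller maximum variance, which matches the abstract's statement about distributing coalescent events uniformly among the log-parameters.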


Subjects
Classification/methods , Models, Biological , Computer Simulation , Population Density
18.
Proc Natl Acad Sci U S A ; 114(7): 1607-1612, 2017 02 14.
Article in English | MEDLINE | ID: mdl-28137871

ABSTRACT

The multispecies coalescent model underlies many approaches used for species delimitation. In previous work assessing the performance of species delimitation under this model, speciation was treated as an instantaneous event rather than as an extended process involving distinct phases of speciation initiation (structuring) and completion. Here, we use data from simulations that explicitly model speciation as an extended process rather than an instantaneous event and carry out species delimitation inference on these data under the multispecies coalescent. We show that the multispecies coalescent diagnoses genetic structure, not species, and that it does not statistically distinguish structure associated with population isolation vs. species boundaries. Because of the misidentification of population structure as putative species, our work raises questions about the practice of genome-based species discovery, with cascading consequences in other fields. Specifically, all fields that rely on species as units of analysis, from conservation biology to studies of macroevolutionary dynamics, will be impacted by inflated estimates of the number of species, especially as genomic resources provide unprecedented power for detecting increasingly finer-scaled genetic structure under the multispecies coalescent. As such, our work also represents a general call for systematic study to reconsider a reliance on genomic data alone. Until new methods are developed that can discriminate between structure due to population-level processes and that due to species boundaries, genomic-based results should only be considered a hypothesis that requires validation of delimited species with multiple data types, such as phenotypic and ecological information.


Subjects
Gene Flow , Genetic Speciation , Genome/genetics , Models, Genetic , Animals , Computer Simulation , Evolution, Molecular , Humans , Phenotype , Phylogeny , Species Specificity
19.
BMC Bioinformatics ; 20(1): 526, 2019 Oct 28.
Article in English | MEDLINE | ID: mdl-31660852

ABSTRACT

BACKGROUND: Our current understanding of archaic admixture in humans relies on statistical methods with large biases, whose magnitudes depend on the sizes and separation times of ancestral populations. To avoid these biases, it is necessary to estimate these parameters simultaneously with those describing admixture. Genetic estimates of population histories also confront problems of statistical identifiability: different models or different combinations of parameter values may fit the data equally well. To deal with this problem, we need methods of model selection and model averaging, which are lacking from most existing software. RESULTS: The Legofit software package allows simultaneous estimation of parameters describing admixture, and the sizes and separation times of ancestral populations. It includes facilities for data manipulation, estimation, analysis of residuals, model selection, and model averaging. CONCLUSIONS: Legofit uses genetic data to study the history of a subdivided population. It is unaffected by recent history and can therefore focus on the deep history of population size, subdivision, and admixture. It outperforms several statistical methods that have been widely used to study population history and should be useful in any species for which DNA sequence data is available from several populations.


Subjects
Models, Genetic , Biometry , Humans , Population Density , Software
20.
Theor Popul Biol ; 127: 16-32, 2019 06.
Article in English | MEDLINE | ID: mdl-30822431

ABSTRACT

Probability modelling for DNA sequence evolution is well established and provides a rich framework for understanding genetic variation between samples of individuals from one or more populations. We show that both classical and more recent models for coalescence (with or without recombination) can be described in terms of the so-called phase-type theory, where complicated and tedious calculations are circumvented by the use of matrix manipulations. The application of phase-type theory in population genetics consists of describing the biological system as a Markov model by appropriately setting up a state space and calculating the corresponding intensity and reward matrices. Formulae of interest are then expressed in terms of these aforementioned matrices. We illustrate this procedure by a number of examples: (a) Calculating the mean, (co)variance and even higher order moments of the site frequency spectrum in multiple merger coalescent models, (b) Analysing a sample of DNA sequences from the Atlantic Cod using the Beta-coalescent, and (c) Determining the correlation of the number of segregating sites for multiple samples in the two-locus ancestral recombination graph. We believe that phase-type theory has great potential as a tool for analysing probability models in population genetics. The compact matrix notation is useful for clarification of current models, and in particular their formal manipulation and calculations, but also for further development or extensions.
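
The matrix recipe described in this abstract can be sketched for the simplest case, Kingman's coalescent without recombination. The transient states are the numbers of lineages n, n-1, ..., 2, the sub-intensity matrix S holds the pairwise coalescence rates, and the expected accumulated reward is α(-S)⁻¹r with α the initial distribution. The code below is an illustration in plain Python with hypothetical function names, not the authors' implementation. Time is measured in units where each pair of lineages coalesces at rate 1.

```python
from math import comb


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def kingman_subintensity(n):
    """Sub-intensity matrix S over the transient states n, n-1, ..., 2."""
    size = n - 1
    s = [[0.0] * size for _ in range(size)]
    for idx in range(size):
        k = n - idx                      # number of lineages in this state
        s[idx][idx] = -float(comb(k, 2))
        if idx + 1 < size:               # move to k - 1 lineages
            s[idx][idx + 1] = float(comb(k, 2))
    return s


def expected_reward(n, reward):
    """alpha (-S)^{-1} r, with alpha concentrated on the state with n lineages."""
    neg_s = [[-v for v in row] for row in kingman_subintensity(n)]
    r = [float(reward(n - idx)) for idx in range(n - 1)]
    return solve(neg_s, r)[0]


# Reward 1 per unit time gives the mean tree height; reward k gives the
# mean total branch length.
print(expected_reward(4, lambda k: 1), expected_reward(4, lambda k: k))
```

For this simple model the results can be checked against the classical closed forms 2(1 - 1/n) for the mean time to the most recent common ancestor and 2(1 + 1/2 + ... + 1/(n-1)) for the mean total branch length. The appeal of the matrix formulation is that the same computation applies unchanged once S and r are replaced by those of a richer model.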


Subjects
Genetics, Population , Models, Genetic , Algorithms , Humans , Markov Chains , Population Density , Recombination, Genetic