Search | BVS IEC

1.

Quantitative fate mapping: A general framework for analyzing progenitor state dynamics via retrospective lineage barcoding.

Fang, Weixiang; Bell, Claire M; Sapirstein, Abel; Asami, Soichiro; Leeper, Kathleen; Zack, Donald J; Ji, Hongkai; Kalhor, Reza.

Cell ; 185(24): 4604-4620.e32, 2022 11 23.

Article in English | MEDLINE | ID: mdl-36423582

ABSTRACT

Natural and induced somatic mutations that accumulate in the genome during development record the phylogenetic relationships of cells; whether these lineage barcodes capture the complex dynamics of progenitor states remains unclear. We introduce quantitative fate mapping, an approach to reconstruct the hierarchy, commitment times, population sizes, and commitment biases of intermediate progenitor states during development based on a time-scaled phylogeny of their descendants. To reconstruct time-scaled phylogenies from lineage barcodes, we introduce Phylotime, a scalable maximum likelihood clustering approach based on a general barcoding mutagenesis model. We validate these approaches using realistic in silico and in vitro barcoding experiments. We further establish criteria for the number of cells that must be analyzed for robust quantitative fate mapping and a progenitor state coverage statistic to assess the robustness. This work demonstrates how lineage barcodes, natural or synthetic, enable analyzing progenitor fate and dynamics long after embryonic development in any organism.

Subjects

Embryonic Development; Cell Lineage/genetics; Retrospective Studies; Phylogeny; Mutagenesis

2.

Evolutionary Genomics of High Fecundity.

Eldon, Bjarki.

Annu Rev Genet ; 54: 213-236, 2020 11 23.

Article in English | MEDLINE | ID: mdl-32870729

ABSTRACT

Natural highly fecund populations abound. These range from viruses to gadids. Many highly fecund populations are economically important. Highly fecund populations provide an important contrast to the low-fecundity organisms that have traditionally been applied in evolutionary studies. A key question regarding high fecundity is whether large numbers of offspring are produced on a regular basis, by few individuals each time, in a sweepstakes mode of reproduction. Such reproduction characteristics are not incorporated into the classical Wright-Fisher model, the standard reference model of population genetics, or similar types of models, in which each individual can produce only small numbers of offspring relative to the population size. The expected genomic footprints of population genetic models of sweepstakes reproduction are very different from those of the Wright-Fisher model. A key, immediate issue involves identifying the footprints of sweepstakes reproduction in genomic data. Whole-genome sequencing data can be used to distinguish the patterns made by sweepstakes reproduction from the patterns made by population growth in a population evolving according to the Wright-Fisher model (or similar models). If the hypothesis of sweepstakes reproduction cannot be rejected, then models of sweepstakes reproduction and associated multiple-merger coalescents will become at least as relevant as the Wright-Fisher model (or similar models) and the Kingman coalescent, the cornerstones of mathematical population genetics, in further discussions of evolutionary genomics of highly fecund populations.

Subjects

Fertility/genetics; Biological Evolution; Genetics, Population/methods; Genomics/methods; Humans; Models, Genetic; Population Density; Population Growth; Reproduction/genetics

3.

Tree-based QTL mapping with expected local genetic relatedness matrices.

Link, Vivian; Schraiber, Joshua G; Fan, Caoqi; Dinh, Bryan; Mancuso, Nicholas; Chiang, Charleston W K; Edge, Michael D.

Am J Hum Genet ; 110(12): 2077-2091, 2023 Dec 07.

Article in English | MEDLINE | ID: mdl-38065072

ABSTRACT

Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.

Subjects

Genetics, Population; Genome-Wide Association Study; Quantitative Trait Loci; Humans; Chromosome Mapping/methods; Models, Genetic; Phenotype; Quantitative Trait Loci/genetics; Native Hawaiian or Other Pacific Islander/genetics

4.

Efficient Bayesian inference under the multispecies coalescent with migration.

Flouri, Tomás; Jiao, Xiyun; Huang, Jun; Rannala, Bruce; Yang, Ziheng.

Proc Natl Acad Sci U S A ; 120(44): e2310708120, 2023 Oct 31.

Article in English | MEDLINE | ID: mdl-37871206

ABSTRACT

Analyses of genome sequence data have revealed pervasive interspecific gene flow and enriched our understanding of the role of gene flow in speciation and adaptation. Inference of gene flow using genomic data requires powerful statistical methods. Yet current likelihood-based methods involve heavy computation and are feasible for small datasets only. Here, we implement the multispecies-coalescent-with-migration model in the Bayesian program bpp, which can be used to test for gene flow and estimate migration rates, as well as species divergence times and population sizes. We develop Markov chain Monte Carlo algorithms for efficient sampling from the posterior, enabling the analysis of genome-scale datasets with thousands of loci. Implementation of both introgression and migration models in the same program allows us to test whether gene flow occurred continuously over time or in pulses. Analyses of genomic data from Anopheles mosquitoes demonstrate rich information in typical genomic datasets about the mode and rate of gene flow.

Subjects

Algorithms; Gene Flow; Animals; Phylogeny; Computer Simulation; Bayes Theorem; Likelihood Functions; Models, Genetic

5.

Comparison of Bayesian Coalescent Skyline Plot Models for Inferring Demographic Histories.

Billenstein, Ronja J; Höhna, Sebastian.

Mol Biol Evol ; 41(5)2024 May 03.

Article in English | MEDLINE | ID: mdl-38630635

ABSTRACT

Bayesian coalescent skyline plot models are widely used to infer demographic histories. The first (non-Bayesian) coalescent skyline plot model assumed a known genealogy as data, while subsequent models and implementations jointly inferred the genealogy and demographic history from sequence data, including heterochronous samples. Overall, there exist multiple different Bayesian coalescent skyline plot models which mainly differ in two key aspects: (i) how changes in population size are modeled through independent or autocorrelated prior distributions, and (ii) how many change-points in the demographic history are used, where they occur, and whether the number is pre-specified or inferred. The specific impact of each of these choices on the inferred demographic history is not known, for two reasons: first, not all models are implemented in the same software, and second, each model implementation makes specific choices that the biologist cannot influence. To facilitate a detailed evaluation of Bayesian coalescent skyline plot models, we implemented all currently described models with a flexible design in the software RevBayes. Furthermore, we evaluated models and choices on an empirical dataset of horses supplemented by a small simulation study. We find that estimated demographic histories can be broadly divided into two groups depending on how change-points in the demographic history are specified (either independent of or at coalescent events). Our simulations suggest that models using change-points at coalescent events produce spurious variation near the present, while most models using independent change-points tend to over-smooth the inferred demographic history.

Subjects

Bayes Theorem; Genetics, Population; Models, Genetic; Animals; Genetics, Population/methods; Horses; Population Density; Computer Simulation; Software; Demography
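The abstract notes that the first, non-Bayesian skyline plot took the genealogy as known data. A minimal sketch of that classic estimator, under stated assumptions: all tips are sampled at time 0, the input is the list of coalescent times before present, and an interval in which i lineages persist for time γ_i yields the estimate γ_i·i(i-1)/2 (in units of effective population size times generation time). The function name `classic_skyline` is hypothetical; this is not RevBayes code.

```python
from typing import List, Tuple

def classic_skyline(coalescent_times: List[float]) -> List[Tuple[float, float, float]]:
    """Classic (non-Bayesian) skyline estimates from a known, isochronous genealogy.

    Assumed input: times before present of the n - 1 coalescent events of a
    genealogy whose n tips are all sampled at time 0.
    Returns (interval_start, interval_end, estimate) for each inter-coalescent interval.
    """
    times = sorted(coalescent_times)
    lineages = len(times) + 1              # n sampled tips at the present
    start = 0.0
    intervals = []
    for t in times:
        gamma = t - start                  # waiting time while `lineages` lineages exist
        estimate = gamma * lineages * (lineages - 1) / 2
        intervals.append((start, t, estimate))
        start = t
        lineages -= 1                      # each coalescence removes one lineage
    return intervals
```

For example, `classic_skyline([0.5, 1.0, 2.0])` describes a four-tip tree and gives one estimate per interval. Because each estimate rests on a single waiting time, the classic plot is noisy, which is what the priors on population-size changes and change-point placement in the Bayesian models are meant to regularize.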

6.

Benefits and Limits of Phasing Alleles for Network Inference of Allopolyploid Complexes.

Tiley, George P; Crowl, Andrew A; Manos, Paul S; Sessa, Emily B; Solís-Lemus, Claudia; Yoder, Anne D; Burleigh, J Gordon.

Syst Biol ; 2024 May 11.

Article in English | MEDLINE | ID: mdl-38733563

ABSTRACT

Accurately reconstructing the reticulate histories of polyploids remains a central challenge for understanding plant evolution. Although phylogenetic networks can provide insights into relationships among polyploid lineages, inferring networks may be hindered by the complexities of homology determination in polyploid taxa. We use simulations to show that phasing alleles from allopolyploid individuals can improve phylogenetic network inference under the multispecies coalescent by obtaining the true network with fewer loci compared to haplotype consensus sequences or sequences with heterozygous bases represented as ambiguity codes. Phased allelic data can also improve divergence time estimates for networks, which is helpful for evaluating allopolyploid speciation hypotheses and proposing mechanisms of speciation. To achieve these outcomes in empirical data, we present a novel pipeline that leverages a recently developed phasing algorithm to reliably phase alleles from polyploids. This pipeline is especially appropriate for target enrichment data, where depth of coverage is typically high enough to phase entire loci. We provide an empirical example in the North American Dryopteris fern complex that demonstrates insights from phased data as well as the challenges of network inference. We establish that our pipeline (PATÉ: Phased Alleles from Target Enrichment data) is capable of recovering a high proportion of phased loci from both diploids and polyploids. These data may improve network estimates compared to using haplotype consensus assemblies by accurately inferring the direction of gene flow, but statistical non-identifiability of phylogenetic networks poses a barrier to inferring the evolutionary history of reticulate complexes.

7.

Likelihood-Based Tests of Species Tree Hypotheses.

Adams, Richard; DeGiorgio, Michael.

Mol Biol Evol ; 40(7)2023 07 05.

Article in English | MEDLINE | ID: mdl-37440530

ABSTRACT

Likelihood-based tests of phylogenetic trees are a foundation of modern systematics. Over the past decade, an enormous wealth and diversity of model-based approaches have been developed for phylogenetic inference of both gene trees and species trees. However, while many techniques exist for conducting formal likelihood-based tests of gene trees, such frameworks are comparatively underdeveloped and underutilized for testing species tree hypotheses. To date, widely used tests of tree topology are designed to assess the fit of classical models of molecular sequence data and individual gene trees and thus are not readily applicable to the problem of species tree inference. To address this issue, we derive several analogous likelihood-based approaches for testing topologies using modern species tree models and heuristic algorithms that use gene tree topologies as input for maximum likelihood estimation under the multispecies coalescent. For the purpose of comparing support for species trees, these tests leverage the statistical procedures of their original gene tree-based counterparts that have an extended history for testing phylogenetic hypotheses at a single locus. We discuss and demonstrate a number of applications, limitations, and important considerations of these tests using simulated and empirical phylogenomic data sets that include both bifurcating topologies and reticulate network models of species relationships. Finally, we introduce the open-source R package SpeciesTopoTestR (SpeciesTopology Tests in R) that includes a suite of functions for conducting formal likelihood-based tests of species topologies given a set of input gene tree topologies.

Subjects

Algorithms; Models, Genetic; Phylogeny; Likelihood Functions

8.

Inferring the Direction of Introgression Using Genomic Sequence Data.

Thawornwattana, Yuttapong; Huang, Jun; Flouri, Tomás; Mallet, James; Yang, Ziheng.

Mol Biol Evol ; 40(8)2023 08 03.

Article in English | MEDLINE | ID: mdl-37552932

ABSTRACT

Genomic data are informative about the history of species divergence and interspecific gene flow, including the direction, timing, and strength of gene flow. However, gene flow in opposite directions generates similar patterns in multilocus sequence data, such as reduced sequence divergence between the hybridizing species. As a result, inference of the direction of gene flow is challenging. Here, we investigate the information about the direction of gene flow present in genomic sequence data using likelihood-based methods under the multispecies-coalescent-with-introgression model. We analyze the case of two species, and use simulation to examine cases with three or four species. We find that it is easier to infer gene flow from a small population to a large one than in the opposite direction, and easier to infer inflow (gene flow from outgroup species to an ingroup species) than outflow (gene flow from an ingroup species to an outgroup species). It is also easier to infer gene flow if there is a longer time of separate evolution between the initial divergence and subsequent introgression. When introgression is assumed to occur in the wrong direction, the time of introgression tends to be correctly estimated and the Bayesian test of gene flow is often significant, while estimates of introgression probability can be even greater than the true probability. We analyze genomic sequences from Heliconius butterflies to demonstrate that typical genomic datasets are informative about the direction of interspecific gene flow, as well as its timing and strength.

Subjects

Butterflies; Animals; Likelihood Functions; Bayes Theorem; Butterflies/genetics; Genome; Genomics; Gene Flow; Phylogeny; Hybridization, Genetic

9.

Whole genome phylogenomics helps to resolve the phylogenetic position of the Zygothrica genus group (Diptera, Drosophilidae) and the causes of previous incongruences.

Hartwig Bessa, Maiara; Silva Gottschalk, Marco; Jaqueline Robe, Lizandra.

Mol Phylogenet Evol ; : 108158, 2024 Jul 16.

Article in English | MEDLINE | ID: mdl-39025321

ABSTRACT

Incomplete Lineage Sorting (ILS) and introgression are two of the main factors causing incongruence between gene and species trees. Advances in phylogenomic studies have allowed us to overcome most of these issues, providing reliable phylogenetic hypotheses while revealing the underlying evolutionary scenario. Across the last century, many incongruent phylogenetic reconstructions were recovered for Drosophilidae, employing a limited sampling of genetic markers or species. In these studies, the monophyly and the phylogenetic positioning of the Zygothrica genus group stood out as one of the most controversial questions. Thus, here, we addressed these issues using a phylogenomic approach, while assessing the influence of ILS and introgressions on the diversification of these species and addressing the spatio-temporal scenario associated with their evolution. For this task, the genomes of nine specimens from six Neotropical species belonging to the Zygothrica genus group were sequenced and evaluated in a phylogenetic framework encompassing 39 other species of Drosophilidae. Nucleotide and amino acid sequences recovered for a set of 2,534 single-copy genes by BUSCO were employed to reconstruct maximum likelihood (ML) concatenated and multi-species coalescent (MSC) trees. Likelihood mapping, quartet sampling, and reticulation tests were employed to infer the level and causes of incongruence. Lastly, a penalized-likelihood molecular clock strategy with fossil calibrations was performed to infer divergence times. Taken together, our results recovered the subdivision of Drosophila into six different lineages, one of which clusters species of the Zygothrica genus group (except for H. duncani). The divergence of this lineage was dated to the Oligocene (~31 Mya) and seems to have occurred in the same timeframe as other key diversifications within Drosophila.
According to the concatenated and MSC strategies, this lineage is sister to the clade joining Drosophila (Siphlodora) with the Hawaiian Drosophila and Scaptomyza. Likelihood mapping, quartet sampling, reticulation reconstructions as well as introgression tests revealed that this lineage was the target of several hybridization events involving the ancestors of different Drosophila lineages. Thus, our results generally show introgression as a major source of previous incongruence. Nevertheless, the similar diversification times recovered for several of the Neotropical Drosophila lineages also support the scenario of multiple and simultaneous diversifications taking place at the base of Drosophilidae phylogeny, at least in the Neotropics.

10.

Coalescence and sampling distributions for Feller diffusions.

Burden, Conrad J; Griffiths, Robert C.

Theor Popul Biol ; 155: 67-76, 2024 02.

Article in English | MEDLINE | ID: mdl-38092137

ABSTRACT

Consider the diffusion process defined by the forward equation $u_t(t,x) = \tfrac{1}{2}\{x\,u(t,x)\}_{xx} - \alpha\{x\,u(t,x)\}_x$ for $t, x \ge 0$ and $-\infty < \alpha < \infty$, with an initial condition $u(0,x) = \delta(x - x_0)$. This equation was introduced and solved by Feller to model the growth of a population of independently reproducing individuals. We explore important coalescent processes related to Feller's solution. For any $\alpha$ and $x_0 > 0$ we calculate the distribution of the random variable $A_n(s;t)$, defined as the finite number of ancestors at a time $s$ in the past of a sample of size $n$ taken from the infinite population of a Feller diffusion at a time $t$ since its initiation. In a subcritical diffusion we find the distribution of population and sample coalescent trees from time $t$ back, conditional on non-extinction as $t \to \infty$. In a supercritical diffusion we construct a coalescent tree which has a single founder and derive the distribution of coalescent times.
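Feller's forward equation $u_t = \tfrac{1}{2}(xu)_{xx} - \alpha(xu)_x$ is the Fokker-Planck equation of the stochastic differential equation $dX_t = \alpha X_t\,dt + \sqrt{X_t}\,dW_t$ (drift $\alpha x$, diffusion coefficient $x$), so $E[X_t] = x_0 e^{\alpha t}$. A minimal Euler-Maruyama sketch of the population-size process only, assuming a fixed step size and mapping any negative overshoot onto the absorbing state 0; the function names are hypothetical and the coalescent calculations of the paper are not attempted.

```python
import math
import random

def simulate_feller(x0: float, alpha: float, t_end: float, dt: float,
                    rng: random.Random) -> float:
    """One Euler-Maruyama path of dX = alpha*X dt + sqrt(X) dW; returns X(t_end)."""
    x = x0
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(round(t_end / dt))):
        if x <= 0.0:
            return 0.0                      # extinct: 0 is an absorbing state
        x += alpha * x * dt + math.sqrt(x) * sqrt_dt * rng.gauss(0.0, 1.0)
        x = max(x, 0.0)                     # map discretization overshoot onto 0
    return x

def mean_feller(x0: float, alpha: float, t_end: float, dt: float = 0.01,
                n_paths: int = 5000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[X(t_end)]; theory gives x0 * exp(alpha * t_end)."""
    rng = random.Random(seed)
    return sum(simulate_feller(x0, alpha, t_end, dt, rng) for _ in range(n_paths)) / n_paths
```

With $x_0 = 1$, $\alpha = 0.5$ and $t = 1$, `mean_feller(1.0, 0.5, 1.0)` should land close to $e^{0.5} \approx 1.65$; a negative $\alpha$ gives the subcritical case, in which most paths are absorbed at 0.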

11.

The coalescent in finite populations with arbitrary, fixed structure.

Allen, Benjamin; McAvoy, Alex.

Theor Popul Biol ; 158: 150-169, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38880430

ABSTRACT

The coalescent is a stochastic process representing ancestral lineages in a population undergoing neutral genetic drift. Originally defined for a well-mixed population, the coalescent has been adapted in various ways to accommodate spatial, age, and class structure, along with other features of real-world populations. To further extend the range of population structures to which coalescent theory applies, we formulate a coalescent process for a broad class of neutral drift models with arbitrary, but fixed, spatial, age, sex, and class structure, haploid or diploid genetics, and any fixed mating pattern. Here, the coalescent is represented as a random sequence of mappings [Formula: see text] from a finite set $G$ to itself. The set $G$ represents the "sites" (in individuals, in particular locations and/or classes) at which these alleles can live. The state of the coalescent, $C_t: G \to G$, maps each site $g \in G$ to the site containing $g$'s ancestor, $t$ time-steps into the past. Using this representation, we define and analyze coalescence time, coalescence branch length, mutations prior to coalescence, and stationary probabilities of identity-by-descent and identity-by-state. For low mutation, we provide a recipe for computing identity-by-descent and identity-by-state probabilities via the coalescent. Applying our results to a diploid population with arbitrary sex ratio $r$, we find that measures of genetic dissimilarity, among any set of sites, are scaled by $4r(1-r)$ relative to the even sex ratio case.

Subjects

Genetic Drift; Genetics, Population; Models, Genetic; Mutation; Stochastic Processes; Humans; Diploidy
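The random-mapping representation of the coalescent in this abstract can be illustrated in the simplest case it covers: a well-mixed haploid Wright-Fisher population where G is just N sites and, each generation, every site's parent site is drawn uniformly at random. The sketch below (hypothetical function names, not the authors' code) composes one random parent map per time step into $C_t$ and records the first $t$ at which two sites share an ancestor; in this model the pairwise coalescence time is geometric with mean N.

```python
import random

def parent_map(n_sites: int, rng: random.Random) -> list:
    """One backward step A: the allele at site s descends from the allele at site A[s]."""
    return [rng.randrange(n_sites) for _ in range(n_sites)]

def pairwise_coalescence_time(n_sites: int, g: int, h: int, rng: random.Random) -> int:
    """Smallest t with C_t(g) == C_t(h), where C_0 is the identity and C_{t+1} = A_{t+1} o C_t."""
    C = list(range(n_sites))                # C_0: every site is its own ancestor
    t = 0
    while C[g] != C[h]:
        A = parent_map(n_sites, rng)
        C = [A[c] for c in C]               # push every ancestral site one step further back
        t += 1
    return t

def mean_coalescence_time(n_sites: int = 10, reps: int = 4000, seed: int = 2) -> float:
    """Average pairwise coalescence time for sites 0 and 1 over many replicates."""
    rng = random.Random(seed)
    return sum(pairwise_coalescence_time(n_sites, 0, 1, rng) for _ in range(reps)) / reps
```

With `n_sites=10` the average should be close to 10 generations. Sex, diploidy, or spatial structure would change how each parent map is drawn (for example, which sites may serve as parents of which), which this sketch does not attempt.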

12.

Latent mutations in the ancestries of alleles under selection.

Fan, Wai-Tong Louis; Wakeley, John.

Theor Popul Biol ; 158: 1-20, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38697365

ABSTRACT

We consider a single genetic locus with two alleles $A_1$ and $A_2$ in a large haploid population. The locus is subject to selection and two-way, or recurrent, mutation. Assuming the allele frequencies follow a Wright-Fisher diffusion and have reached stationarity, we describe the asymptotic behaviors of the conditional gene genealogy and the latent mutations of a sample with known allele counts, when the count $n_1$ of allele $A_1$ is fixed, and when either or both the sample size $n$ and the selection strength $|\alpha|$ tend to infinity. Our study extends previous work under neutrality to the case of non-neutral rare alleles, asserting that when selection is not too strong relative to the sample size, even if it is strongly positive or strongly negative in the usual sense ($\alpha \to -\infty$ or $\alpha \to +\infty$), the number of latent mutations of the $n_1$ copies of allele $A_1$ follows the same distribution as the number of alleles in the Ewens sampling formula. On the other hand, very strong positive selection relative to the sample size leads to neutral gene genealogies with a single ancient latent mutation. We also demonstrate robustness of our asymptotic results against changing population sizes, when one of $|\alpha|$ or $n$ is large.

Subjects

Alleles; Gene Frequency; Models, Genetic; Mutation; Selection, Genetic; Humans; Genetics, Population
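The reference distribution in this abstract is the number of alleles under the Ewens sampling formula. With mutation parameter θ, the number of distinct alleles $K_n$ in a sample of size $n$ is a sum of independent Bernoulli($\theta/(\theta+i)$) variables for $i = 0, 1, \dots, n-1$ (the Hoppe urn construction). The abstract does not say which θ governs the latent mutations of the $n_1$ copies, so θ is a free input in this sketch, which computes the exact distribution of $K_n$ by convolution; the function name is hypothetical.

```python
from fractions import Fraction

def ewens_num_alleles_pmf(n: int, theta) -> list:
    """Exact P(K_n = k) for k = 0..n under the Ewens sampling formula.

    K_n is a sum of independent Bernoulli(theta / (theta + i)) for i = 0..n-1;
    the i = 0 term equals 1, so P(K_n = 0) = 0 for n >= 1.
    """
    theta = Fraction(theta)                 # exact arithmetic; accepts int or Fraction
    pmf = [Fraction(1)]                     # empty sum: K = 0 with probability 1
    for i in range(n):
        p = theta / (theta + i)             # chance that draw i + 1 starts a new allele
        new = [Fraction(0)] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)           # draw copies an existing allele
            new[k + 1] += q * p             # draw introduces a new allele
        pmf = new
    return pmf
```

For θ = 1 and n = 3 the Bernoulli probabilities are 1, 1/2 and 1/3, so `ewens_num_alleles_pmf(3, 1)` assigns probabilities 1/3, 1/2 and 1/6 to one, two and three alleles, and the mean is $\sum_i \theta/(\theta+i) = 11/6$.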

13.

On multi-type Cannings models and multi-type exchangeable coalescents.

Möhle, Martin.

Theor Popul Biol ; 156: 103-116, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38367871

ABSTRACT

A multi-type neutral Cannings population model with migration and fixed subpopulation sizes is analyzed. Under appropriate conditions, as all subpopulation sizes tend to infinity, the ancestral process, properly time-scaled, converges to a multi-type coalescent sharing the exchangeability and consistency property. The proof draws on coalescent theory for single-type Cannings models and on decompositions of transition probabilities into parts concerning reproduction and migration, respectively. The following section deals with a different but closely related multi-type Cannings model with mutation and fixed total population size but stochastically varying subpopulation sizes. The latter model is analyzed forward and backward in time with an emphasis on its behavior as the total population size tends to infinity. Forward in time, multi-type limiting branching processes arise for large population size. The model's backward structure and related open problems are briefly discussed.

Subjects

Genetics, Population; Models, Genetic; Reproduction/genetics; Population Density; Mutation

14.

The ancestral selection graph for a Λ-asymmetric Moran model.

González Casanova, Adrián; Kurt, Noemi; Pérez, José Luis.

Theor Popul Biol ; : 1-17, 2024 Mar 13.

Article in English | MEDLINE | ID: mdl-38490495

ABSTRACT

Motivated by the question of the impact of selective advantage in populations with skewed reproduction mechanisms, we study a Moran model with selection. We assume that there are two types of individuals, where the reproductive success of one type is larger than the other. The higher reproductive success may stem from either more frequent reproduction, or from larger numbers of offspring, and is encoded in a measure Λ for each of the two types. Λ-reproduction here means that a whole fraction of the population is replaced at a reproductive event. Our approach consists of constructing a Λ-asymmetric Moran model in which individuals of the two populations compete, rather than considering a Moran model for each population. Provided the measures are ordered stochastically, we can couple them. This allows us to construct the central object of this paper, the Λ-asymmetric ancestral selection graph, leading to a pathwise duality of the forward in time Λ-asymmetric Moran model with its ancestral process. We apply the ancestral selection graph in order to obtain scaling limits of the forward and backward processes, and note that the frequency process converges to the solution of an SDE with discontinuous paths. Finally, we derive a Griffiths representation for the generator of the SDE and use it to find a semi-explicit formula for the probability of fixation of the less beneficial of the two types.

15.

Phase-type distributions in mathematical population genetics: An emerging framework.

Hobolth, Asger; Rivas-González, Iker; Bladt, Mogens; Futschik, Andreas.

Theor Popul Biol ; 157: 14-32, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38460602

ABSTRACT

A phase-type distribution is the time to absorption in a continuous- or discrete-time Markov chain. Phase-type distributions can be used as a general framework to calculate key properties of the standard coalescent model and many of its extensions. Here, the 'phases' in the phase-type distribution correspond to states in the ancestral process. For example, the time to the most recent common ancestor and the total branch length are phase-type distributed. Furthermore, the site frequency spectrum follows a multivariate discrete phase-type distribution and the joint distribution of total branch lengths in the two-locus coalescent-with-recombination model is multivariate phase-type distributed. In general, phase-type distributions provide a powerful mathematical framework for coalescent theory because they are analytically tractable using matrix manipulations. The purpose of this review is to explain the phase-type theory and demonstrate how the theory can be applied to derive basic properties of coalescent models. These properties can then be used to obtain insight into the ancestral process, or they can be applied for statistical inference. In particular, we show the relation between classical first-step analysis of coalescent models and phase-type calculations. We also show how reward transformations in phase-type theory lead to easy calculation of covariances and correlation coefficients between e.g. tree height, tree length, external branch length, and internal branch length. Furthermore, we discuss how these quantities can be used for statistical inference based on estimating equations. Providing an alternative to previous work based on the Laplace transform, we derive likelihoods for small-size coalescent trees based on phase-type theory. Overall, our main aim is to demonstrate that phase-type distributions provide a convenient general set of tools to understand aspects of coalescent models that are otherwise difficult to derive. 
Throughout the review, we emphasize the versatility of the phase-type framework, which is also illustrated by our accompanying R-code. All our analyses and figures can be reproduced from code available on GitHub.

Subjects

Genetics, Population; Markov Chains; Models, Genetic; Humans
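As a small illustration of the matrix calculations this review describes, take the standard Kingman coalescent for a sample of size n, with transient states k = n, n-1, ..., 2 lineages and rate k(k-1)/2 of moving from k to k-1. With initial distribution α = (1, 0, ..., 0) and sub-intensity matrix S, a reward vector r gives the expected accumulated reward α(-S)⁻¹r: r = 1 yields E[T_MRCA] = 2(1 - 1/n), and r_k = k yields the expected total branch length 2·Σ_{i=1}^{n-1} 1/i. This is a Python sketch with exact fractions and hypothetical function names, not the authors' accompanying R code; because S is upper bidiagonal here, (-S)⁻¹r is obtained by back-substitution rather than a general matrix inverse.

```python
from fractions import Fraction
from math import comb

def kingman_subintensity(n: int) -> list:
    """Sub-intensity matrix S over transient states k = n, n-1, ..., 2 (row i <-> k = n - i)."""
    m = n - 1
    S = [[Fraction(0)] * m for _ in range(m)]
    for i in range(m):
        rate = Fraction(comb(n - i, 2))     # any of the C(k, 2) pairs may coalesce
        S[i][i] = -rate
        if i + 1 < m:
            S[i][i + 1] = rate              # k -> k - 1; leaving k = 2 is absorption (MRCA)
    return S

def expected_reward(S: list, rewards: list) -> Fraction:
    """alpha (-S)^{-1} r with alpha = (1, 0, ..., 0), solved by back-substitution."""
    m = len(S)
    x = [Fraction(0)] * m
    for i in range(m - 1, -1, -1):
        acc = Fraction(rewards[i])
        for j in range(i + 1, m):
            acc += S[i][j] * x[j]           # move the (-S)[i][j] * x[j] term to the right-hand side
        x[i] = acc / (-S[i][i])
    return x[0]
```

For n = 4, `expected_reward(kingman_subintensity(4), [1, 1, 1])` gives 3/2 (the expected tree height), and the reward vector `[4, 3, 2]` gives 11/3 (the expected tree length), matching the closed forms above.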

16.

Demographic inference for spatially heterogeneous populations using long shared haplotypes.

Forien, Raphaël; Ringbauer, Harald; Coop, Graham.

Theor Popul Biol ; 2024 Mar 16.

Article in English | MEDLINE | ID: mdl-38492811

ABSTRACT

We introduce a modified spatial Λ-Fleming-Viot process to model the ancestry of individuals in a population occupying a continuous spatial habitat divided into two areas by a sharp discontinuity of the dispersal rate and effective population density. We derive an analytical formula for the expected number of shared haplotype segments between two individuals depending on their sampling locations. This formula involves the transition density of a skew diffusion which appears as a scaling limit of the ancestral lineages of individuals in this model. We then show that this formula can be used to infer the dispersal parameters and the effective population density of both regions, using a composite likelihood approach, and we demonstrate the efficiency of this method on a range of simulated data sets.

17.

Tropical Logistic Regression Model on Space of Phylogenetic Trees.

Aliatimis, Georgios; Yoshida, Ruriko; Boyaci, Burak; Grant, James A.

Bull Math Biol ; 86(8): 99, 2024 Jul 02.

Article in English | MEDLINE | ID: mdl-38954147

ABSTRACT

Classification of gene trees is an important task both in the analysis of multi-locus phylogenetic data, and assessment of the convergence of Markov Chain Monte Carlo (MCMC) analyses used in Bayesian phylogenetic tree reconstruction. The logistic regression model is one of the most popular classification models in statistical learning, thanks to its computational speed and interpretability. However, it is not appropriate to directly apply the standard logistic regression model to a set of phylogenetic trees, as the space of phylogenetic trees is non-Euclidean and thus contradicts the standard assumptions on covariates. It is well-known in tropical geometry and phylogenetics that the space of phylogenetic trees is a tropical linear space in terms of the max-plus algebra. Therefore, in this paper, we propose an analogue of the logistic regression model in the setting of tropical geometry. Our proposed method outperforms classical logistic regression in terms of Area under the ROC Curve in numerical examples, including with data generated by the multi-species coalescent model. Theoretical properties such as statistical consistency have been proved and generalization error rates have been derived. Finally, our classification algorithm is proposed as an MCMC convergence criterion for MrBayes. Unlike the convergence metric used by MrBayes, which is only dependent on tree topologies, our method is sensitive to branch lengths and therefore provides a more robust metric for convergence. In a test case, it is illustrated that the tropical logistic regression can differentiate between two independently run MCMC chains, even when the standard metric cannot.

Subjects

Algorithms; Bayes Theorem; Markov Chains; Mathematical Concepts; Models, Genetic; Monte Carlo Method; Phylogeny; Logistic Models; ROC Curve; Computer Simulation
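Tropical approaches to tree space commonly compare trees, encoded as vectors such as the pairwise leaf-to-leaf distances of an equidistant tree, with the tropical projective metric d_tr(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i). The abstract does not give the exact form of the proposed model, so this sketch shows only this metric, not the regression itself. It assumes a pairwise-distance encoding, and the helper names are hypothetical. The metric is unchanged when a constant is added to every coordinate, which is what working in the tropical projective torus means.

```python
from itertools import combinations

def tree_vector(dist: dict, leaves: list) -> list:
    """Flatten a pairwise distance map {(a, b): d} into a vector in a fixed pair order."""
    return [dist[(a, b)] if (a, b) in dist else dist[(b, a)]
            for a, b in combinations(leaves, 2)]

def tropical_distance(u: list, v: list) -> float:
    """Tropical projective metric: range of the coordinate-wise differences u - v."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)
```

For three leaves, the equidistant trees ((A,B),C) and ((A,C),B) with cherry height 1 and root height 2 have distance vectors (2, 4, 4) and (4, 2, 4) over the pairs (A,B), (A,C), (B,C), and their tropical distance is 4.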

18.

Genealogical structure changes as range expansions transition from pushed to pulled.

Birzu, Gabriel; Hallatschek, Oskar; Korolev, Kirill S.

Proc Natl Acad Sci U S A ; 118(34)2021 08 24.

Article in English | MEDLINE | ID: mdl-34413189

ABSTRACT

Range expansions accelerate evolution through multiple mechanisms, including gene surfing and genetic drift. The inference and control of these evolutionary processes ultimately rely on the information contained in genealogical trees. Currently, there are two opposing views on how range expansions shape genealogies. In invasion biology, expansions are typically approximated by a series of population bottlenecks producing genealogies with only pairwise mergers between lineages, a process known as the Kingman coalescent. Conversely, traveling wave models predict a coalescent with multiple mergers, known as the Bolthausen-Sznitman coalescent. Here, we unify these two approaches and show that expansions can generate an entire spectrum of coalescent topologies. Specifically, we show that tree topology is controlled by growth dynamics at the front and exhibits large differences between pulled and pushed expansions. These differences are explained by the fluctuations in the total number of descendants left by the early founders. High growth cooperativity leads to a narrow distribution of reproductive values and the Kingman coalescent. Conversely, low growth cooperativity results in a broad distribution, whose exponent controls the merger sizes in the genealogies. These broad distributions and non-Kingman tree topologies emerge due to the fluctuations in the front shape and position and do not occur in quasi-deterministic simulations. Overall, our results show that range expansions provide a robust mechanism for generating different types of multiple mergers, which could be similar to those observed in populations with strong selection or high fecundity. Thus, caution should be exercised in making inferences about the origin of non-Kingman genealogies.

Subjects

Ecosystem , Models, Genetic , Phylogeny , Animal Distribution , Animals , Genetic Drift , Genetics, Population , Pedigree , Population Dynamics

19.

Nonparametric coalescent inference of mutation spectrum history and demography.

DeWitt, William S; Harris, Kameron Decker; Ragsdale, Aaron P; Harris, Kelley.

Proc Natl Acad Sci U S A ; 118(21), 2021 05 25.

Article in English | MEDLINE | ID: mdl-34016747

ABSTRACT

As populations boom and bust, the accumulation of genetic diversity is modulated, encoding histories of living populations in present-day variation. Many methods exist to decode these histories, and all must make strong model assumptions. It is typical to assume that mutations accumulate uniformly across the genome at a constant rate that does not vary between closely related populations. However, recent work shows that mutational processes in human and great ape populations vary across genomic regions and evolve over time. This perturbs the mutation spectrum (relative mutation rates in different local nucleotide contexts). Here, we develop theoretical tools in the framework of Kingman's coalescent to accommodate mutation spectrum dynamics. We present mutation spectrum history inference (mushi), a method to perform nonparametric inference of demographic and mutation spectrum histories from allele frequency data. We use mushi to reconstruct trajectories of effective population size and mutation spectrum divergence between human populations, identify mutation signatures and their dynamics in different human populations, and calibrate the timing of a previously reported mutational pulse in the ancestors of Europeans. We show that mutation spectrum histories can be placed in a well-studied theoretical setting and rigorously inferred from genomic variation data, like other features of evolutionary history.

Subjects

Gene Frequency/genetics , Genetics, Population/statistics & numerical data , Models, Genetic , Mutation/genetics , Animals , Genetic Variation/genetics , Genomics , Hominidae/genetics , Humans , Mutation Rate , Population Density

20.

Structure, Evolution, and Mitochondrial Genome Analysis of Mussel Species (Bivalvia, Mytilidae).

Kartavtsev, Yuri Phedorovich; Masalkova, Natalia A.

Int J Mol Sci ; 25(13), 2024 Jun 24.

Article in English | MEDLINE | ID: mdl-39000014

ABSTRACT

Based on the nucleotide sequences of the mitochondrial genome (mitogenome) of specimens from two mussel species (Arcuatula senhousia and Mytilus coruscus), we carried out an investigation combining approaches from genomics, molecular phylogenetics, and evolutionary genetics. As in many other invertebrates, the mitogenome structure of the studied mussels appears to be much more variable than in vertebrates, including changes in gene order, duplications, and deletions, which were most frequent in tRNA genes; the mitogenomes of these species also vary in size. The results highlight important properties of the encoded polypeptides, such as hydrophobicity and its dependence on the ratio of purine to pyrimidine nucleotides. This may indirectly indicate that purifying natural selection is needed to maintain polypeptide function. However, for the widely accepted concept of natural cutoff selection in wild populations, which acts against deleterious nucleotide substitutions in nonsynonymous codons and thereby maintains active (effective) polypeptide macromolecules in a population, we were unable to obtain unambiguous supporting evidence in the current paper. Here, we study the phylogeny and systematics of mussel species from the family Mytilidae, one of the largest taxa of bivalve mollusks. The phylogeny of Mytilidae (order Mytilida), whose systematics currently lack consensus, is reconstructed using a data matrix of 26-27 mitogenomes. Initially, a set of 100 sequences was downloaded from GenBank and checked for origin, i.e., whether each mitogenome was of female (F) or male (M) origin. Our analysis of the new data confirms the known drastic differences between the F and M mitogenome lines in mussels. Phylogenies of the F-lines were reconstructed with three marker sets: protein-coding genes (PCGs) only, rRNA + tRNA genes only, and all genes combined. The analysis also uses nucleotide sequences from other data matrices comprising 20-68 mitogenome sequences. The divergence time from the most recent common ancestor (MRCA) of Mytilidae, estimated with BEAST2, is close to 293 Mya, suggesting that they originate in the Silurian Period. Based on all these data, we suggest a consensus phylogeny and systematics for the subfamily Mytilinae. In particular, we resolved a long-debated question in mussel systematics: whether Mytilidae and its subfamily Mytilinae are monophyletic. The topological signal, which was strongly supported both in this paper and in the literature, refutes the monophyly of Mytilinae.

Subjects

Evolution, Molecular , Genome, Mitochondrial , Phylogeny , Animals , Genome, Mitochondrial/genetics , Mytilidae/genetics , Mytilidae/classification , RNA, Transfer/genetics , Bivalvia/genetics , Bivalvia/classification , Mytilus/genetics , Mytilus/classification
