Search | VHL Regional Portal

1.

Benefits and Limits of Phasing Alleles for Network Inference of Allopolyploid Complexes.

Tiley, George P; Crowl, Andrew A; Manos, Paul S; Sessa, Emily B; Solís-Lemus, Claudia; Yoder, Anne D; Burleigh, J Gordon.

Syst Biol ; 2024 May 11.

Article in English | MEDLINE | ID: mdl-38733563

ABSTRACT

Accurately reconstructing the reticulate histories of polyploids remains a central challenge for understanding plant evolution. Although phylogenetic networks can provide insights into relationships among polyploid lineages, inferring networks may be hindered by the complexities of homology determination in polyploid taxa. We use simulations to show that phasing alleles from allopolyploid individuals can improve phylogenetic network inference under the multispecies coalescent by obtaining the true network with fewer loci compared to haplotype consensus sequences or sequences with heterozygous bases represented as ambiguity codes. Phased allelic data can also improve divergence time estimates for networks, which is helpful for evaluating allopolyploid speciation hypotheses and proposing mechanisms of speciation. To achieve these outcomes in empirical data, we present a novel pipeline that leverages a recently developed phasing algorithm to reliably phase alleles from polyploids. This pipeline is especially appropriate for target enrichment data, where depth of coverage is typically high enough to phase entire loci. We provide an empirical example in the North American Dryopteris fern complex that demonstrates insights from phased data as well as the challenges of network inference. We establish that our pipeline (PATÉ: Phased Alleles from Target Enrichment data) is capable of recovering a high proportion of phased loci from both diploids and polyploids. These data may improve network estimates compared to using haplotype consensus assemblies by accurately inferring the direction of gene flow, but statistical non-identifiability of phylogenetic networks poses a barrier to inferring the evolutionary history of reticulate complexes.

2.

Identifying microbial drivers in biological phenotypes with a Bayesian network regression model.

Ozminkowski, Samuel; Solís-Lemus, Claudia.

Ecol Evol ; 14(5): e11039, 2024 May.

Article in English | MEDLINE | ID: mdl-38774136

ABSTRACT

In Bayesian Network Regression models, networks are considered the predictors of continuous responses. These models have been successfully used in brain research to identify regions in the brain that are associated with specific human traits, yet their potential to elucidate microbial drivers in biological phenotypes for microbiome research remains unknown. In particular, microbial networks are challenging due to their high dimension and high sparsity compared to brain networks. Furthermore, unlike in brain connectome research, in microbiome research, it is usually expected that the presence of microbes has an effect on the response (main effects), not just the interactions. Here, we develop the first thorough investigation of whether Bayesian Network Regression models are suitable for microbial datasets on a variety of synthetic and real data under diverse biological scenarios. We test whether the Bayesian Network Regression model that accounts only for interaction effects (edges in the network) is able to identify key drivers (microbes) in phenotypic variability. We show that this model is indeed able to identify influential nodes and edges in the microbial networks that drive changes in the phenotype for most biological settings, but we also identify scenarios where this method performs poorly, which allows us to provide practical advice for domain scientists aiming to apply these tools to their datasets. BNR models provide a framework for microbiome researchers to identify connections between microbes and measured phenotypes. We make this statistical model accessible through an easy-to-use implementation, publicly available as a Julia package at https://github.com/solislemuslab/BayesianNetworkRegression.jl.

3.

Novel symmetry-preserving neural network model for phylogenetic inference.

Tang, Xudong; Zepeda-Nuñez, Leonardo; Yang, Shengwen; Zhao, Zelin; Solís-Lemus, Claudia.

Bioinform Adv ; 4(1): vbae022, 2024.

Article in English | MEDLINE | ID: mdl-38638281

ABSTRACT

Motivation: Scientists worldwide are making massive efforts to understand how the biodiversity that we see on Earth evolved from single-cell organisms at the origin of life, and this diversification process is represented through the Tree of Life. Low sampling rates and high heterogeneity in the rate of evolution across sites and lineages produce a phenomenon denoted "long branch attraction" (LBA) in which long nonsister lineages are estimated to be sisters regardless of their true evolutionary relationship. LBA has been a pervasive problem in phylogenetic inference affecting different types of methodologies from distance-based to likelihood-based. Results: Here, we present a novel neural network model that outperforms standard phylogenetic methods and other neural network implementations under LBA settings. Furthermore, unlike existing neural network models in phylogenetics, our model naturally accounts for the tree isomorphisms via permutation invariant functions, which ultimately results in lower memory use and allows the seamless extension to larger trees. Availability and implementation: We implement our novel theory in an open-source, publicly available GitHub repository: https://github.com/crsl4/nn-phylogenetics.

4.

Ultrafast learning of four-node hybridization cycles in phylogenetic networks using algebraic invariants.

Wu, Zhaoxing; Solís-Lemus, Claudia.

Bioinform Adv ; 4(1): vbae014, 2024.

Article in English | MEDLINE | ID: mdl-38384862

ABSTRACT

Motivation: The abundance of gene flow in the Tree of Life challenges the notion that evolution can be represented with a fully bifurcating process which cannot capture important biological realities like hybridization, introgression, or horizontal gene transfer. Coalescent-based network methods are increasingly popular, yet not scalable for big data, because they need to perform a heuristic search in the space of networks as well as numerical optimization that can be NP-hard. Here, we introduce a novel method to reconstruct phylogenetic networks based on algebraic invariants. While there is a long tradition of using algebraic invariants in phylogenetics, our work is the first to define phylogenetic invariants on concordance factors (frequencies of four-taxon splits in the input gene trees) to identify level-1 phylogenetic networks under the multispecies coalescent model. Results: Our novel hybrid detection methodology is optimization-free as it only requires the evaluation of polynomial equations, and as such, it bypasses the traversal of network space, yielding a computational speed at least 10 times faster than the fastest-to-date network methods. We illustrate our method's performance on simulated and real data from the genus Canis. Availability and implementation: We present an open-source publicly available Julia package PhyloDiamond.jl available at https://github.com/solislemuslab/PhyloDiamond.jl with broad applicability within the evolutionary community.
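
The concordance factors named in this abstract are the frequencies of the three possible four-taxon splits among the input gene trees. The following is a minimal Python sketch of that tallying step only. It is not the PhyloDiamond.jl implementation and does not evaluate any invariants; the function name and the encoding of each split as a string such as "AB|CD" are assumptions made for illustration.

```python
from collections import Counter


def concordance_factors(splits, taxa=("A", "B", "C", "D")):
    """Return the frequency of each of the three splits of a four-taxon set.

    `splits` holds one observed split per gene tree, written like "AB|CD"
    (a hypothetical encoding chosen for this sketch).
    """
    a, b, c, d = taxa
    # The three unrooted resolutions of a quartet.
    possible = [f"{a}{b}|{c}{d}", f"{a}{c}|{b}{d}", f"{a}{d}|{b}{c}"]
    counts = Counter(splits)
    unknown = set(counts) - set(possible)
    if unknown:
        raise ValueError(f"unexpected splits: {sorted(unknown)}")
    total = len(splits)
    if total == 0:
        raise ValueError("at least one gene tree is required")
    # Frequencies sum to 1 across the three resolutions.
    return {s: counts[s] / total for s in possible}
```

For example, four gene trees of which three display AB|CD and one displays AC|BD give concordance factors of 0.75, 0.25 and 0 for the three splits.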

5.

Machine learning identification of Pseudomonas aeruginosa strains from colony image data.

Rattray, Jennifer B; Lowhorn, Ryan J; Walden, Ryan; Márquez-Zacarías, Pedro; Molotkova, Evgeniya; Perron, Gabriel; Solis-Lemus, Claudia; Pimentel Alarcon, Daniel; Brown, Sam P.

PLoS Comput Biol ; 19(12): e1011699, 2023 Dec.

Article in English | MEDLINE | ID: mdl-38091365

ABSTRACT

When grown on agar surfaces, microbes can produce distinct multicellular spatial structures called colonies, which contain characteristic sizes, shapes, edges, textures, and degrees of opacity and color. For over one hundred years, researchers have used these morphology cues to classify bacteria and guide more targeted treatment of pathogens. Advances in genome sequencing technology have revolutionized our ability to classify bacterial isolates and while genomic methods are in the ascendancy, morphological characterization of bacterial species has made a resurgence due to increased computing capacities and widespread application of machine learning tools. In this paper, we revisit the topic of colony morphotype on the within-species scale and apply concepts from image processing, computer vision, and deep learning to a dataset of 69 environmental and clinical Pseudomonas aeruginosa strains. We find that colony morphology and complexity under common laboratory conditions is a robust, repeatable phenotype on the level of individual strains, and therefore forms a potential basis for strain classification. We then use a deep convolutional neural network approach with a combination of data augmentation and transfer learning to overcome the typical data starvation problem in biological applications of deep learning. Using a train/validation/test split, our results achieve an average validation accuracy of 92.9% and an average test accuracy of 90.7% for the classification of individual strains. These results indicate that bacterial strains have characteristic visual 'fingerprints' that can serve as the basis of classification on a sub-species level. Our work illustrates the potential of image-based classification of bacterial pathogens and highlights the potential to use similar approaches to predict medically relevant strain characteristics like antibiotic resistance and virulence from colony data.

Subjects

Machine Learning; Pseudomonas aeruginosa; Pseudomonas aeruginosa/genetics; Neural Networks, Computer; Image Processing, Computer-Assisted/methods; Bacteria

6.

Towards a robust out-of-the-box neural network model for genomic data.

Zhang, Zhaoyi; Cheng, Songyang; Solis-Lemus, Claudia.

BMC Bioinformatics ; 23(1): 125, 2022 Apr 09.

Article in English | MEDLINE | ID: mdl-35397517

ABSTRACT

BACKGROUND: The accurate prediction of biological features from genomic data is paramount for precision medicine and sustainable agriculture. For decades, neural network models have been widely popular in fields like computer vision, astrophysics and targeted marketing given their prediction accuracy and their robust performance under big data settings. Yet neural network models have not made a successful transition into the medical and biological world due to the ubiquitous characteristics of biological data such as modest sample sizes, sparsity, and extreme heterogeneity. RESULTS: Here, we investigate the robustness, generalization potential and prediction accuracy of widely used convolutional neural network and natural language processing models with a variety of heterogeneous genomic datasets. Mainly, recurrent neural network models outperform convolutional neural network models in terms of prediction accuracy, overfitting and transferability across the datasets under study. CONCLUSIONS: While the perspective of a robust out-of-the-box neural network model is out of reach, we identify certain model characteristics that translate well across datasets and could serve as a baseline model for translational researchers.

Subjects

Big Data; Neural Networks, Computer; Genomics; Natural Language Processing

7.

Efficient estimation of indirect effects in case-control studies using a unified likelihood framework.

Satten, Glen A; Curtis, Sarah W; Solis-Lemus, Claudia; Leslie, Elizabeth J; Epstein, Michael P.

Stat Med ; 41(15): 2879-2893, 2022 07 10.

Article in English | MEDLINE | ID: mdl-35352841

ABSTRACT

Mediation models are a set of statistical techniques that investigate the mechanisms that produce an observed relationship between an exposure variable and an outcome variable in order to deduce the extent to which the relationship is influenced by intermediate mediator variables. For a case-control study, the most common mediation analysis strategy employs a counterfactual framework that permits estimation of indirect and direct effects on the odds ratio scale for dichotomous outcomes, assuming either binary or continuous mediators. While this framework has become an important tool for mediation analysis, we demonstrate that we can embed this approach in a unified likelihood framework for mediation analysis in case-control studies that leverages more features of the data (in particular, the relationship between exposure and mediator) to improve efficiency of indirect effect estimates. One important feature of our likelihood approach is that it naturally incorporates cases within the exposure-mediator model to improve efficiency. Our approach does not require knowledge of disease prevalence and can model confounders and exposure-mediator interactions, and is straightforward to implement in standard statistical software. We illustrate our approach using both simulated data and real data from a case-control genetic study of lung cancer.

Subjects

Models, Statistical; Case-Control Studies; Confounding Factors, Epidemiologic; Humans; Likelihood Functions; Odds Ratio

8.

Effect of genetic background on the evolution of Vancomycin-Intermediate Staphylococcus aureus (VISA).

Su, Michelle; Davis, Michelle H; Peterson, Jessica; Solis-Lemus, Claudia; Satola, Sarah W; Read, Timothy D.

PeerJ ; 9: e11764, 2021.

Article in English | MEDLINE | ID: mdl-34306830

ABSTRACT

Vancomycin-intermediate Staphylococcus aureus (VISA) typically arises through accumulation of chromosomal mutations that alter cell-wall thickness and global regulatory pathways. Genome-based prediction of VISA requires understanding whether strain background influences patterns of mutation that lead to resistance. We used an iterative method to experimentally evolve three important methicillin-resistant S. aureus (MRSA) strain backgrounds (CC1, CC5 and CC8 (USA300)) to generate a library of 120 laboratory selected VISA isolates. At the endpoint, isolates had vancomycin MICs ranging from 4 to 10 µg/mL. We detected mutations in more than 150 genes, but only six genes (already known to be associated with VISA from prior studies) were mutated in all three background strains (walK, prs, rpoB, rpoC, vraS, yvqF). We found evidence of interactions between loci (e.g., vraS and yvqF mutants were significantly negatively correlated) and rpoB, rpoC, vraS and yvqF were more frequently mutated in one of the backgrounds. Increasing vancomycin resistance was correlated with lower maximal growth rates (a proxy for fitness) regardless of background. However, CC5 VISA isolates had higher MICs with fewer rounds of selection and had lower fitness costs than the CC8 VISA isolates. Using multivariable regression, we found that genes differed in their contribution to overall MIC depending on the background. Overall, these results demonstrated that VISA evolved through mutations in a similar set of loci in all backgrounds, but the effect of mutation in common genes differed with regard to fitness and contribution to resistance in different strains.

9.

Genes Influencing Phage Host Range in Staphylococcus aureus on a Species-Wide Scale.

Moller, Abraham G; Winston, Kyle; Ji, Shiyu; Wang, Junting; Hargita Davis, Michelle N; Solís-Lemus, Claudia R; Read, Timothy D.

mSphere ; 6(1), 2021 01 13.

Article in English | MEDLINE | ID: mdl-33441407

ABSTRACT

Staphylococcus aureus is a human pathogen that causes serious diseases, ranging from skin infections to septic shock. Bacteriophages (phages) are both natural killers of S. aureus, offering therapeutic possibilities, and important vectors of horizontal gene transfer (HGT) in the species. Here, we used high-throughput approaches to understand the genetic basis of strain-to-strain variation in sensitivity to phages, which defines the host range. We screened 259 diverse S. aureus strains covering more than 40 sequence types for sensitivity to eight phages, which were representatives of the three phage classes that infect the species. The phages were variable in host range, each infecting between 73 and 257 strains. Using genome-wide association approaches, we identified putative loci that affect host range and validated their function using USA300 transposon knockouts. In addition to rediscovering known host range determinants, we found several previously unreported genes affecting bacterial growth during phage infection, including trpA, phoR, isdB, sodM, fmtC, and relA. We used the data from our host range matrix to develop predictive models that achieved between 40% and 95% accuracy. This work illustrates the complexity of the genetic basis for phage susceptibility in S. aureus but also shows that with more data, we may be able to understand much of the variation. With a knowledge of host range determination, we can rationally design phage therapy cocktails that target the broadest host range of S. aureus strains and address basic questions regarding phage-host interactions, such as the impact of phage on S. aureus evolution. IMPORTANCE: Staphylococcus aureus is a widespread, hospital- and community-acquired pathogen, many strains of which are antibiotic resistant. It causes diverse diseases, ranging from local to systemic infection, and affects both the skin and many internal organs, including the heart, lungs, bones, and brain. Its ubiquity, antibiotic resistance, and disease burden make new therapies urgent. One alternative therapy to antibiotics is phage therapy, in which viruses specific to infecting bacteria clear infection. In this work, we identified and validated S. aureus genes that influence phage host range (the number of strains a phage can infect and kill) by testing strains representative of the diversity of the S. aureus species for phage host range and associating the genome sequences of strains with host range. These findings together improved our understanding of how phage therapy works in the bacterium and improve prediction of phage therapy efficacy based on the predicted host range of the infecting strain.

Subjects

Host Specificity/genetics; Staphylococcus Phages/physiology; Staphylococcus aureus/genetics; Staphylococcus aureus/virology; Genome, Bacterial; Genome-Wide Association Study/methods; Humans; Phenotype; Staphylococcal Infections/microbiology

10.

Genomic analysis of variability in Delta-toxin levels between Staphylococcus aureus strains.

Su, Michelle; Lyles, James T; Petit Iii, Robert A; Peterson, Jessica; Hargita, Michelle; Tang, Huaqiao; Solis-Lemus, Claudia; Quave, Cassandra L; Read, Timothy D.

PeerJ ; 8: e8717, 2020.

Article in English | MEDLINE | ID: mdl-32231873

ABSTRACT

BACKGROUND: The delta-toxin (δ-toxin) of Staphylococcus aureus is the only hemolysin shown to cause mast cell degranulation and is linked to atopic dermatitis, a chronic inflammatory skin disease. We sought to characterize variation in δ-toxin production across S. aureus strains and identify genetic loci potentially associated with differences between strains. METHODS: A set of 124 S. aureus strains was genome-sequenced and δ-toxin levels in stationary phase supernatants determined by high performance liquid chromatography (HPLC). SNPs and kmers were associated with differences in toxin production using four genome-wide association study (GWAS) methods. Transposon mutations in candidate genes were tested for their δ-toxin levels. We constructed XGBoost models to predict toxin production based on genetic loci discovered to be potentially associated with the phenotype. RESULTS: The S. aureus strain set encompassed 40 sequence types (STs) in 23 clonal complexes (CCs). δ-toxin production ranged from barely detectable levels to >90,000 units, with a median of >8,000 units. CC30 had significantly lower levels of toxin production than average while CC45 and CC121 were higher. MSSA (methicillin sensitive) strains had higher δ-toxin production than MRSA (methicillin resistant) strains. Through multiple GWAS approaches, 45 genes were found to be potentially associated with toxicity. Machine learning models using loci discovered through GWAS as features were able to predict δ-toxin production (as a high/low binary phenotype) with a precision of .875 and specificity of .990 but recall of .333. We discovered that mutants in the carA gene, encoding the small chain of carbamoyl phosphate synthase, completely abolished toxin production and toxicity in Caenorhabditis elegans. CONCLUSIONS: The amount of stationary phase production of the toxin is a strain-specific phenotype likely affected by a complex interaction of a number of genes with different levels of effect. We discovered new candidate genes that potentially play a role in modulating production. We report for the first time that the product of the carA gene is necessary for δ-toxin production in USA300. This work lays a foundation for future work on understanding toxin regulation in S. aureus and prediction of phenotypes from genomic sequences.
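
The abstract reports precision, specificity and recall for a binary high/low classifier. As a reminder of how these three quantities relate to a confusion matrix, here is a small Python sketch. The counts in the usage note are hypothetical and are not taken from the paper; they were chosen only because they reproduce the rounded values quoted above, which shows how high precision and specificity can coexist with low recall.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (precision, recall, specificity) from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives for the "high" class.
    """
    precision = tp / (tp + fp)    # of predicted "high", fraction truly high
    recall = tp / (tp + fn)       # of truly "high", fraction detected
    specificity = tn / (tn + fp)  # of truly "low", fraction correctly rejected
    return precision, recall, specificity


# Hypothetical counts, NOT from the paper.
precision, recall, specificity = binary_metrics(tp=7, fp=1, fn=14, tn=99)
```

With these illustrative counts the classifier rarely raises a false alarm (one false positive) but misses two thirds of the truly high producers, which is the pattern the reported metrics describe.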

11.

Leveraging Family History in Case-Control Analyses of Rare Variation.

Solis-Lemus, Claudia R; Fischer, S Taylor; Todor, Andrei; Liu, Cuining; Leslie, Elizabeth J; Cutler, David J; Ghosh, Debashis; Epstein, Michael P.

Genetics ; 214(2): 295-303, 2020 02.

Article in English | MEDLINE | ID: mdl-31843756

ABSTRACT

Standard methods for case-control association studies of rare variation often treat disease outcome as a dichotomous phenotype. However, both theoretical and experimental studies have demonstrated that subjects with a family history of disease can be enriched for risk variation relative to subjects without such history. Assuming family history information is available, this observation motivates the idea of replacing the standard dichotomous outcome variable used in case-control studies with a more informative ordinal outcome variable that distinguishes controls (0), sporadic cases (1), and cases with a family history (2), with the expectation that we should observe increasing number of risk variants with increasing category of the ordinal variable. To leverage this expectation, we propose a novel rare-variant association test that incorporates family history information based on our previous GAMuT framework for rare-variant association testing of multivariate phenotypes. We use simulated data to show that, when family history information is available, our new method outperforms standard rare-variant association methods, like burden and SKAT tests, that ignore family history. We further illustrate our method using a rare-variant study of cleft lip and palate.
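
The ordinal outcome described in this abstract can be written down directly. The sketch below, in Python, only encodes the three categories named in the text; it is not the GAMuT-based test itself. The function name is hypothetical, and coding controls as 0 regardless of their family history is an assumption, since the abstract does not say how such controls are handled.

```python
def ordinal_outcome(is_case, has_family_history):
    """Encode disease status as 0 (control), 1 (sporadic case),
    or 2 (case with a family history of disease)."""
    if not is_case:
        # Assumption: every control is coded 0, with or without family history.
        return 0
    return 2 if has_family_history else 1


# Example cohort of (is_case, has_family_history) pairs.
cohort = [(False, False), (True, False), (True, True)]
outcomes = [ordinal_outcome(c, f) for c, f in cohort]
```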

Subjects

Disease/genetics; Genetic Association Studies/methods; Genetic Variation/genetics; Computer Simulation; Family; Genotype; Humans; Models, Genetic; Models, Statistical; Pedigree; Phenotype

12.

Phylogenetic Comparative Methods on Phylogenetic Networks with Reticulations.

Bastide, Paul; Solís-Lemus, Claudia; Kriebel, Ricardo; William Sparks, K; Ané, Cécile.

Syst Biol ; 67(5): 800-820, 2018 09 01.

Article in English | MEDLINE | ID: mdl-29701821

ABSTRACT

The goal of phylogenetic comparative methods (PCMs) is to study the distribution of quantitative traits among related species. The observed traits are often seen as the result of a Brownian Motion (BM) along the branches of a phylogenetic tree. Reticulation events such as hybridization, gene flow or horizontal gene transfer, can substantially affect a species' traits, but are not modeled by a tree. Phylogenetic networks have been designed to represent reticulate evolution. As they become available for downstream analyses, new models of trait evolution are needed, applicable to networks. We develop here an efficient recursive algorithm to compute the phylogenetic variance matrix of a trait on a network, in only one preorder traversal of the network. We then extend the standard PCM tools to this new framework, including phylogenetic regression with covariates (or phylogenetic ANOVA), ancestral trait reconstruction, and Pagel's $\lambda$ test of phylogenetic signal. The trait of a hybrid is sometimes outside of the range of its two parents, for instance because of hybrid vigor or hybrid depression. These two phenomena are rather commonly observed in present-day hybrids. Transgressive evolution can be modeled as a shift in the trait value following a reticulation point. We develop a general framework to handle such shifts and take advantage of the phylogenetic regression view of the problem to design statistical tests for ancestral transgressive evolution in the evolutionary history of a group of species. We study the power of these tests in several scenarios and show that recent events have indeed the strongest impact on the trait distribution of present-day taxa. We apply those methods to a data set of Xiphophorus fishes, to confirm and complete previous analysis in this group. All the methods developed here are available in the Julia package PhyloNetworks.
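
The single-traversal computation of the variance matrix described in this abstract can be sketched as follows. This is a minimal Python illustration, not the PhyloNetworks code. It assumes a Brownian motion in which a hybrid node's trait is the inheritance-weighted average of the values arriving along its parent edges; the input format (a preorder list of nodes, each with `(parent, edge_length, gamma)` tuples) and all names are assumptions made for this sketch.

```python
def network_variance(nodes, sigma2=1.0):
    """Variance matrix of a BM trait over all nodes of a network.

    `nodes` is a list in preorder (every parent appears before its children).
    Each entry is (name, parents), where parents is a list of
    (parent_name, edge_length, gamma) tuples: empty for the root, one tuple
    with gamma = 1.0 for a tree node, two or more for a hybrid node.
    Returns (V, names) with V[i][j] the covariance of nodes i and j.
    """
    n = len(nodes)
    index = {}
    V = [[0.0] * n for _ in range(n)]
    for i, (name, parents) in enumerate(nodes):
        if parents and abs(sum(g for _, _, g in parents) - 1.0) > 1e-9:
            raise ValueError(f"inheritance weights of {name} must sum to 1")
        # Covariance with every earlier node: weighted average over parents.
        for j in range(i):
            cov = sum(g * V[index[p]][j] for p, _, g in parents)
            V[i][j] = V[j][i] = cov
        # Variance: each parent's variance plus its edge length, weighted by
        # gamma squared, plus cross terms between distinct parents.
        var = sum(g * g * (V[index[p]][index[p]] + l) for p, l, g in parents)
        for x in range(len(parents)):
            for y in range(x + 1, len(parents)):
                px, _, gx = parents[x]
                py, _, gy = parents[y]
                var += 2.0 * gx * gy * V[index[px]][index[py]]
        V[i][i] = var
        index[name] = i
    names = [name for name, _ in nodes]
    return [[sigma2 * v for v in row] for row in V], names


# Toy network: root -> A, root -> B (length 1 each); H is a hybrid of A and B
# with equal inheritance and zero-length hybrid edges.
toy = [
    ("root", []),
    ("A", [("root", 1.0, 1.0)]),
    ("B", [("root", 1.0, 1.0)]),
    ("H", [("A", 0.0, 0.5), ("B", 0.0, 0.5)]),
]
V, names = network_variance(toy)
```

In the toy network, A and B each have variance 1 and are uncorrelated, while the hybrid H has variance 0.5 and covariance 0.5 with each parent. Because each node only reads entries of nodes visited earlier, one pass over the preorder list suffices.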

Subjects

Cyprinodontiformes/genetics; Evolution, Molecular; Gene Flow; Gene Transfer, Horizontal; Hybridization, Genetic; Phylogeny; Algorithms; Animals; Cyprinodontiformes/classification; Models, Genetic; Phenotype

13.

Greatly reduced phylogenetic structure in the cultivated potato clade (Solanum section Petota pro parte).

Spooner, David M; Ruess, Holly; Arbizu, Carlos I; Rodríguez, Flor; Solís-Lemus, Claudia.

Am J Bot ; 105(1): 60-70, 2018 01.

Article in English | MEDLINE | ID: mdl-29532930

ABSTRACT

PREMISE OF THE STUDY: The species boundaries of wild and cultivated potatoes are controversial, with most of the taxonomic problems in the cultivated potato clade. We here provide the first in-depth phylogenetic study of the cultivated potato clade to explore possible causes of these problems. METHODS: We examined 131 diploid accessions, using 12 nuclear orthologs, producing an aligned data set of 14,072 DNA characters, 2171 of which are parsimony-informative. We analyzed the data to produce phylogenies and perform concordance analysis and goodness-of-fit tests. KEY RESULTS: There is good phylogenetic structure in clades traditionally referred to as clade 1+2 (North and Central American diploid potatoes exclusive of Solanum verrucosum), clade 3, and a newly discovered basal clade, but drastically reduced phylogenetic structure in clade 4, the cultivated potato clade. The results highlight a clade of species in South America not shown before, 'neocardenasii', sister to clade 1+2, that possesses key morphological traits typical of diploids in Mexico and Central America. Goodness-of-fit tests suggest potential hybridization between some species of the cultivated potato clade. However, we do not have enough phylogenetic signal with the data at hand to explicitly estimate such hybridization events with species networks methods. CONCLUSIONS: We document the close relationships of many of the species in the cultivated potato clade, provide insight into the cause of their taxonomic problems, and support the recent reduction of species in this clade. The discovery of the neocardenasii clade forces a reevaluation of a hypothesis that section Petota originated in Mexico and Central America.

Subjects

Evolution, Molecular; Phylogeny; Solanum/genetics; Sequence Analysis, DNA; Solanum/classification

14.

PhyloNetworks: A Package for Phylogenetic Networks.

Solís-Lemus, Claudia; Bastide, Paul; Ané, Cécile.

Mol Biol Evol ; 34(12): 3292-3298, 2017 Dec 01.

Article in English | MEDLINE | ID: mdl-28961984

ABSTRACT

PhyloNetworks is a Julia package for the inference, manipulation, visualization, and use of phylogenetic networks in an interactive environment. Inference of phylogenetic networks is done with maximum pseudolikelihood from gene trees or multi-locus sequences (SNaQ), with possible bootstrap analysis. PhyloNetworks is the first software providing tools to summarize a set of networks (from a bootstrap or posterior sample) with measures of tree edge support, hybrid edge support, and hybrid node support. Networks can be used for phylogenetic comparative analysis of continuous traits, to estimate ancestral states or do a phylogenetic regression. The software is available in open source and with documentation at https://github.com/crsl4/PhyloNetworks.jl.

Subjects

Computational Biology/methods; Phylogeny; Algorithms; Evolution, Molecular; Software

15.

Statistical evidence for common ancestry: Application to primates.

Baum, David A; Ané, Cécile; Larget, Bret; Solís-Lemus, Claudia; Ho, Lam Si Tung; Boone, Peggy; Drummond, Chloe P; Bontrager, Martin; Hunter, Steven J; Saucier, William.

Evolution ; 70(6): 1354-63, 2016 06.

Article in English | MEDLINE | ID: mdl-27139421

ABSTRACT

Since Darwin, biologists have come to recognize that the theory of descent from common ancestry (CA) is very well supported by diverse lines of evidence. However, while the qualitative evidence is overwhelming, we also need formal methods for quantifying the evidential support for CA over the alternative hypothesis of separate ancestry (SA). In this article, we explore a diversity of statistical methods using data from the primates. We focus on two alternatives to CA, species SA (the separate origin of each named species) and family SA (the separate origin of each family). We implemented statistical tests based on morphological, molecular, and biogeographic data and developed two new methods: one that tests for phylogenetic autocorrelation while correcting for variation due to confounding ecological traits and a method for examining whether fossil taxa have fewer derived differences than living taxa. We overwhelmingly rejected both species and family SA with infinitesimal P values. We compare these results with those from two companion papers, which also found tremendously strong support for the CA of all primates, and discuss future directions and general philosophical issues that pertain to statistical testing of historical hypotheses such as CA.

Subjects

Biological Evolution; Classification/methods; Models, Genetic; Primates/classification; Animal Distribution; Animals; Fossils/anatomy & histology; Models, Statistical; Phylogeny; Primates/anatomy & histology; Primates/genetics; Primates/physiology

16.

Inconsistency of Species Tree Methods under Gene Flow.

Solís-Lemus, Claudia; Yang, Mengyao; Ané, Cécile.

Syst Biol ; 65(5): 843-51, 2016 09.

Article in English | MEDLINE | ID: mdl-27151419

ABSTRACT

Coalescent-based methods are now broadly used to infer evolutionary relationships between groups of organisms under the assumption that incomplete lineage sorting (ILS) is the only source of gene tree discordance. Many of these methods are known to consistently estimate the species tree when all their assumptions are met. Nonetheless, little work has been done to test the robustness of such methods to violations of their assumptions. Here, we study the performance of two of the most efficient coalescent-based methods, ASTRAL and NJst, in the presence of gene flow. Gene flow violates the assumption that ILS is the sole source of gene tree conflict. We find anomalous gene trees on three-taxon rooted trees and on four-taxon unrooted trees. These anomalous trees do not exist under ILS only, but appear because of gene flow. Our simulations show that species tree methods (and concatenation) may reconstruct the wrong evolutionary history, even from a very large number of well-reconstructed gene trees. In other words, species tree methods can be inconsistent under gene flow. Our results underline the need for methods like PhyloNet, to account simultaneously for ILS and gene flow in a unified framework. Although much slower, PhyloNet had better accuracy and remained consistent at high levels of gene flow.

Subjects

Classification/methods; Gene Flow; Phylogeny; Biological Evolution; Computer Simulation; Genetic Speciation; Models, Genetic

17.

Inferring Phylogenetic Networks with Maximum Pseudolikelihood under Incomplete Lineage Sorting.

Solís-Lemus, Claudia; Ané, Cécile.

PLoS Genet ; 12(3): e1005896, 2016 Mar.

Article in English | MEDLINE | ID: mdl-26950302

ABSTRACT

Phylogenetic networks are necessary to represent the tree of life expanded by edges to represent events such as horizontal gene transfers, hybridizations or gene flow. Not all species follow the paradigm of vertical inheritance of their genetic material. While a great deal of research has flourished into the inference of phylogenetic trees, statistical methods to infer phylogenetic networks are still limited and under development. The main disadvantage of existing methods is a lack of scalability. Here, we present a statistical method to infer phylogenetic networks from multi-locus genetic data in a pseudolikelihood framework. Our model accounts for incomplete lineage sorting through the coalescent model, and for horizontal inheritance of genes through reticulation nodes in the network. Computation of the pseudolikelihood is fast and simple, and it avoids the burdensome calculation of the full likelihood which can be intractable with many species. Moreover, estimation at the quartet-level has the added computational benefit that it is easily parallelizable. Simulation studies comparing our method to a full likelihood approach show that our pseudolikelihood approach is much faster without compromising accuracy. We applied our method to reconstruct the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), which is characterized by widespread hybridizations.

Subjects

Evolution, Molecular; Gene Transfer, Horizontal; Phylogeny; Computer Simulation; Likelihood Functions; Models, Genetic

18.

Bayesian species delimitation combining multiple genes and traits in a unified framework.

Solís-Lemus, Claudia; Knowles, L Lacey; Ané, Cécile.

Evolution ; 69(2): 492-507, 2015 Feb.

Article in English | MEDLINE | ID: mdl-25495061

ABSTRACT

Delimitation of species based exclusively on genetic data has been advocated despite a critical knowledge gap: how might such approaches fail because they rely on genetic data alone, and would their accuracy be improved by using multiple data types? We provide here the requisite framework for addressing these key questions. Because both phenotypic and molecular data can be analyzed in a common Bayesian framework with our program iBPP, we can compare the accuracy of delimited taxa based on genetic data alone versus when integrated with phenotypic data. We can also evaluate how the integration of phenotypic data might improve species delimitation when divergence occurs with gene flow and/or is selectively driven. These two realities of the speciation process are ignored by currently available genetic approaches. Our model accommodates phenotypic characters that exhibit different degrees of divergence, allowing for both neutral traits and traits under selection. We found a greater accuracy of estimated species boundaries with the integration of phenotypic and genetic data, with a strong beneficial influence of phenotypic data from traits under selection when the speciation process involves gene flow. Our results highlight the benefits of multiple data types, but also draw into question the rationale of species delimitation based exclusively on genetic data.

Subjects

Genetic Speciation; Models, Genetic; Species Specificity; Animals; Bayes Theorem; Gene Flow; Lizards/genetics; Phenotype; Phylogeny
