1.
Mol Biol Evol ; 40(5)2023 05 02.
Article in English | MEDLINE | ID: mdl-37140129

ABSTRACT

The data available for reconstructing molecular phylogenies have become wildly disparate. Phylogenomic studies can generate data for thousands of genetic markers for dozens of species, but for hundreds of other taxa, data may be available from only a few genes. Can these two types of data be integrated to combine the advantages of both, addressing the relationships of hundreds of species with thousands of genes? Here, we show that this is possible, using data from frogs. We generated a phylogenomic data set for 138 ingroup species and 3,784 nuclear markers (ultraconserved elements [UCEs]), including new UCE data from 70 species. We also assembled a supermatrix data set, including data from 97% of frog genera (441 total), with 1-307 genes per taxon. We then produced a combined phylogenomic-supermatrix data set (a "gigamatrix") containing 441 ingroup taxa and 4,091 markers but with 86% missing data overall. Likelihood analysis of the gigamatrix yielded a generally well-supported tree among families, largely consistent with trees from the phylogenomic data alone. All terminal taxa were placed in the expected families, even though 42.5% of these taxa each had >99.5% missing data and 70.2% had >90% missing data. Our results show that missing data need not be an impediment to successfully combining very large phylogenomic and supermatrix data sets, and they open the door to new studies that simultaneously maximize sampling of genes and taxa.
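The missing-data figures above are per-taxon and whole-matrix summaries of the combined matrix. The following is a minimal Python sketch of how such summaries can be computed. It assumes the matrix is held as a dict that maps each taxon to an aligned string, and that gaps ('-'), '?' and 'N' all count as missing. The study's exact coding is not stated here.

```python
# Characters treated as missing data; this coding is an assumption.
MISSING = set("-?Nn")


def missing_fraction_per_taxon(alignment):
    """Map each taxon to the fraction of its aligned characters that are missing."""
    return {
        taxon: sum(ch in MISSING for ch in seq) / len(seq)
        for taxon, seq in alignment.items()
    }


def share_of_taxa_above(fractions, threshold):
    """Fraction of taxa whose missing-data fraction is strictly above `threshold`."""
    return sum(f > threshold for f in fractions.values()) / len(fractions)


def overall_missing(alignment):
    """Missing characters divided by all cells of the taxon-by-site matrix."""
    cells = sum(len(seq) for seq in alignment.values())
    missing = sum(ch in MISSING for seq in alignment.values() for ch in seq)
    return missing / cells


# Toy three-taxon, ten-site matrix (hypothetical data).
toy = {"A": "ACGTACGTAC", "B": "ACGT??????", "C": "A?????????"}
per_taxon = missing_fraction_per_taxon(toy)
```

With this representation, `share_of_taxa_above(per_taxon, 0.995)` computes the same kind of ">99.5% missing data" share reported for the gigamatrix.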


Subjects
Anura , Animals , Phylogeny , Sequence Analysis, DNA , Anura/genetics , Probability
2.
Mol Biol Evol ; 40(11)2023 Nov 03.
Article in English | MEDLINE | ID: mdl-37879113

ABSTRACT

In phylogenomics, incongruences between gene trees, resulting from both artifactual and biological reasons, can decrease the signal-to-noise ratio and complicate species tree inference. The amount of data handled today in classical phylogenomic analyses precludes manual error detection and removal. However, a simple and efficient way to automate the identification of outliers from a collection of gene trees is still missing. Here, we present PhylteR, a method that allows rapid and accurate detection of outlier sequences in phylogenomic datasets, i.e. species from individual gene trees that do not follow the general trend. PhylteR relies on DISTATIS, an extension of multidimensional scaling to 3 dimensions to compare multiple distance matrices at once. In PhylteR, these distance matrices extracted from individual gene phylogenies represent evolutionary distances between species according to each gene. On simulated datasets, we show that PhylteR identifies outliers with more sensitivity and precision than a comparable existing method. We also show that PhylteR is not sensitive to ILS-induced incongruences, which is a desirable feature. On a biological dataset of 14,463 genes for 53 species previously assembled for Carnivora phylogenomics, we show (i) that PhylteR identifies as outliers sequences that can be considered as such by other means, and (ii) that the removal of these sequences improves the concordance between the gene trees and the species tree. Thanks to the generation of numerous graphical outputs, PhylteR also allows for the rapid and easy visual characterization of the dataset at hand, thus aiding in the precise identification of errors. PhylteR is distributed as an R package on CRAN and as containerized versions (docker and singularity).
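DISTATIS can be sketched in a few NumPy steps:

1. Double-centre each gene's distance matrix into a cross-product matrix.
2. Weight the genes by how well they agree, using RV coefficients.
3. Average them into a compromise.
4. Project each gene back onto the compromise axes.

This minimal sketch assumes squared distances and equal species masses. The per-gene, per-species deviation it returns is a hypothetical outlier score for illustration only. It is not PhylteR's actual detection criterion.

```python
import numpy as np


def normalised_cross_product(d):
    """Double-centre a matrix of squared distances, S = -1/2 * C D C with
    C = I - 11'/n (equal species masses), then scale S by its largest eigenvalue."""
    n = d.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n
    s = -0.5 * c @ d @ c
    return s / np.linalg.eigvalsh(s)[-1]


def rv(a, b):
    """RV coefficient: a matrix analogue of a squared correlation."""
    return np.trace(a @ b) / np.sqrt(np.trace(a @ a) * np.trace(b @ b))


def distatis(sq_distance_matrices, n_dims=2):
    """Return gene weights, compromise factor scores and a gene x species deviation matrix.

    n_dims must not exceed the rank of the compromise, or the square roots below
    are taken of (near-)zero eigenvalues.
    """
    s = [normalised_cross_product(d) for d in sq_distance_matrices]
    k = len(s)
    rv_mat = np.array([[rv(s[i], s[j]) for j in range(k)] for i in range(k)])
    # First eigenvector of the RV matrix; abs() fixes the arbitrary sign
    # (its entries share one sign when all RV values are non-negative).
    p = np.abs(np.linalg.eigh(rv_mat)[1][:, -1])
    weights = p / p.sum()
    compromise = sum(w * si for w, si in zip(weights, s))
    vals, vecs = np.linalg.eigh(compromise)
    top = np.argsort(vals)[::-1][:n_dims]
    vals, vecs = vals[top], vecs[:, top]
    scores = vecs * np.sqrt(vals)  # species x dims positions in the compromise
    # Project every gene onto the compromise axes and measure how far each species
    # lands from its compromise position (illustrative score, not PhylteR's own).
    projections = [si @ vecs / np.sqrt(vals) for si in s]
    deviation = np.array([np.linalg.norm(pk - scores, axis=1) for pk in projections])
    return weights, scores, deviation
```

When every gene agrees, each gene projects exactly onto the compromise and all deviations are zero. A gene that disagrees receives a smaller weight than the others.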


Subjects
Biological Evolution , Phylogeny
3.
Antonie Van Leeuwenhoek ; 117(1): 103, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39042225

ABSTRACT

The genus Thermus is a major research focus among thermophiles. Members of this genus inhabit both natural and artificial thermal environments. We performed phylogenomic and comparative genomic analyses to unravel the genomic diversity among Thermus strains from geographically distinct thermal springs. Sixteen Thermus strains were isolated from the Qucai hot springs in Tibet and the Tengchong hot springs in Yunnan, China, and sequenced. Phylogeny based on the 16S rRNA gene and phylogenomic analyses based on a concatenated set of 971 orthologous protein families (supermatrix and gene-content methods) revealed a mixed distribution of the Thermus strains. Whole-genome-based phylogenetic analysis showed that the 16 Thermus strains belong to five species: Thermus oshimai (YIM QC-2-109, YIM 1640, YIM 1627, 77359, 77923, 77838), Thermus antranikianii (YIM 73052, 77412, 77311, 71206), Thermus brokianus (YIM 73518, 71318, 72351), Thermus hydrothermalis (YIM 730264 and 77927), and one potential novel species, 77420, which forms a clade with Thermus thalpophilus SYSU G00506T. Although the genomes of different strains of the same Thermus species were highly similar in their metabolic pathways, subtle differences were found. Genome-wide screening detected CRISPR loci. These loci indicate that Thermus isolates from the two thermal locations have well-developed defense systems against viruses and have adopted similar survival strategies. Comparative genome analysis also identified competence loci in all Thermus genomes, which could help these bacteria acquire DNA from the environment. We further found that Thermus isolates use two modes of incomplete denitrification. Some strains produce nitric oxide, while others produce nitrous oxide (dinitrogen oxide), which reflects the heterotrophic lifestyle of the genus.
All isolates encoded complete pathways for glycolysis, the tricarboxylic acid cycle and the pentose phosphate pathway. Calvin-Benson-Bassham cycle genes were identified in the genomes of T. oshimai and T. antranikianii strains. They were absent from the genomes of all T. brokianus strains and of strain 77420. Arsenic, cadmium and cobalt-zinc-cadmium resistance genes were detected in the genomes of all sequenced Thermus strains. The genomes of strains 77420, 77311, 73518, 77412 and 72351 harbored genes for siderophore production. Sox gene clusters were identified in all sequenced genomes except that of strain YIM 730264, suggesting a mode of chemolithotrophy. Through comparative genomic analysis, we also identified 77420 as the genome type species, and whole-genome sequence comparison confirmed its validity as a novel organism. Isolate 77420 shared 99.0% 16S rRNA gene sequence similarity with T. thalpophilus SYSU G00506T. However, its ANI (95.86%, JSpecies) and digital DDH (68.80%, GGDC) values distinguish it as a potential novel species. Consistently, in the phylogenomic tree, the novel isolate 77420 formed a separate branch with its closest reference type strain, T. thalpophilus SYSU G00506T.
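The ANI and dDDH values above are usually interpreted against fixed species-delineation cut-offs. The following minimal sketch assumes the commonly cited boundaries: an ANI zone of roughly 95-96% and a dDDH threshold of 70%. These are not necessarily the study's exact decision rule. The function name `species_call` is hypothetical.

```python
def species_call(ani, dddh, ani_boundary=(95.0, 96.0), dddh_cutoff=70.0):
    """Coarse species call from ANI (%) and digital DDH (%) against assumed cut-offs."""
    low, high = ani_boundary
    if ani < low:
        ani_verdict = "different"
    elif ani < high:
        ani_verdict = "boundary"  # the grey zone where ANI alone is inconclusive
    else:
        ani_verdict = "same"
    ddh_verdict = "same" if dddh >= dddh_cutoff else "different"

    if ddh_verdict == "different" and ani_verdict != "same":
        return "potential novel species"
    if ddh_verdict == "same" and ani_verdict != "different":
        return "same species"
    return "conflicting evidence"
```

For isolate 77420 (ANI 95.86%, dDDH 68.80%), the ANI falls in the boundary zone and the dDDH lies below 70%. The sketch therefore returns "potential novel species".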


Subjects
Genome, Bacterial , Genomics , Hot Springs , Phylogeny , RNA, Ribosomal, 16S , Thermus , Thermus/genetics , Thermus/classification , Thermus/isolation & purification , Hot Springs/microbiology , RNA, Ribosomal, 16S/genetics , Tibet , China , DNA, Bacterial/genetics , Sequence Analysis, DNA
4.
BMC Bioinformatics ; 24(1): 390, 2023 Oct 14.
Article in English | MEDLINE | ID: mdl-37838689

ABSTRACT

BACKGROUND: Genome-scale phylogenetic analysis based on core gene sets is routinely used in microbiological research. However, these techniques remain hard to use for researchers with little bioinformatics experience. Here, we present EasyCGTree, a user-friendly, cross-platform pipeline that reconstructs genome-scale maximum-likelihood (ML) phylogenetic trees using supermatrix (SM) and supertree (ST) approaches. RESULTS: EasyCGTree was implemented in the Perl programming language and built from a collection of published, reputable programs. All the programs were precompiled as standalone executable files and are included in the EasyCGTree package, so it runs once a Perl environment is installed. Several profile hidden Markov models (HMMs) of core gene sets were prepared in advance to construct a profile HMM database (PHD). The PHD is included in the package and available for homolog searching. Customized gene sets can also be used to build profile HMMs and added to the PHD via EasyCGTree. Using 43 genomes of the genus Paracoccus as the test data set, consensus (a variant of the typical SM), SM, and ST trees were successfully inferred via EasyCGTree. The SM trees were compared with those inferred via the pipelines UBCG and bcgTree, using cophenetic correlation coefficients (CCC) and Robinson-Foulds distance (topological distance). The results suggested that EasyCGTree can infer SM trees with nearly identical topology (distance < 0.1) and accuracy (CCC > 0.99) to those inferred with the two pipelines. CONCLUSIONS: EasyCGTree is an all-in-one automatic pipeline from input data to phylogenomic tree with guaranteed accuracy, and it is much easier to install and use than the reference pipelines. In addition, EasyCGTree provides a convenient ST implementation that can be used to explore prokaryotic evolutionary signals from a different perspective.
The EasyCGTree version 4 is freely available for Linux and Windows users at Github ( https://github.com/zdf1987/EasyCGTree4 ).
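The Robinson-Foulds (topological) distance used above counts the bipartitions found in one tree but not the other. The following is a minimal pure-Python sketch. It assumes unrooted binary trees written as nested tuples of taxon names; a real pipeline would parse Newick files instead. The abstract does not say whether EasyCGTree reports the raw or the normalised distance, so both are offered.

```python
def splits(tree):
    """Return (non-trivial splits, taxon set) of an unrooted tree written as nested tuples.

    Each split is stored as the side that does not contain a fixed reference taxon,
    so the same bipartition always gets the same representation.
    """
    def leaves(node):
        return frozenset([node]) if isinstance(node, str) else frozenset().union(*map(leaves, node))

    taxa = leaves(tree)
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = clade if ref not in clade else taxa - clade
        if 1 < len(side) < len(taxa) - 1:  # skip trivial (single-leaf) splits
            found.add(side)
        return clade

    walk(tree)
    return found, taxa


def rf_distance(t1, t2, normalise=True):
    """Symmetric difference of split sets, optionally divided by its maximum 2(n - 3)."""
    s1, taxa1 = splits(t1)
    s2, taxa2 = splits(t2)
    if taxa1 != taxa2:
        raise ValueError("trees must have the same taxon set")
    d = len(s1 ^ s2)
    return d / (2 * (len(taxa1) - 3)) if normalise else d
```

Normalising by 2(n - 3), the maximum for two unrooted binary trees on n taxa, puts the distance on a 0-1 scale.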


Subjects
Computational Biology , Prokaryotic Cells , Humans , Phylogeny , Computational Biology/methods , Biological Evolution , Programming Languages
5.
Mol Biol Evol ; 39(2)2022 02 03.
Article in English | MEDLINE | ID: mdl-35137183

ABSTRACT

Deciphering the evolutionary relationships of Chelicerata (arachnids, horseshoe crabs, and allied taxa) has proven notoriously difficult, due to their ancient rapid radiation and the incidence of elevated evolutionary rates in several lineages. Although conflicting hypotheses prevail in morphological and molecular data sets alike, the monophyly of Arachnida is nearly universally accepted, despite historical lack of support in molecular data sets. Some phylotranscriptomic analyses have recovered arachnid monophyly, but these did not sample all living orders, whereas analyses including all orders have failed to recover Arachnida. To understand this conflict, we assembled a data set of 506 high-quality genomes and transcriptomes, sampling all living orders of Chelicerata with high occupancy and rigorous approaches to orthology inference. Our analyses consistently recovered the nested placement of horseshoe crabs within a paraphyletic Arachnida. This result was insensitive to variation in evolutionary rates of genes, complexity of the substitution models, and alternative algorithmic approaches to species tree inference. Investigation of sources of systematic bias showed that genes and sites that recover arachnid monophyly are enriched in noise and exhibit low information content. To test the impact of morphological data, we generated a 514-taxon morphological data matrix of extant and fossil Chelicerata, analyzed in tandem with the molecular matrix. Combined analyses recovered the clade Merostomata (the marine orders Xiphosura, Eurypterida, and Chasmataspidida), but merostomates appeared nested within Arachnida. Our results suggest that morphological convergence resulting from adaptations to life in terrestrial habitats has driven the historical perception of arachnid monophyly, paralleling the history of numerous other invertebrate terrestrial groups.


Subjects
Arachnida , Animals , Arachnida/genetics , Biological Evolution , Fossils , Genome , Phylogeny
6.
Mol Phylogenet Evol ; 188: 107907, 2023 11.
Article in English | MEDLINE | ID: mdl-37633542

ABSTRACT

Large-scale, time-calibrated phylogenies from supermatrix studies have become crucial for evolutionary and ecological studies in many groups of organisms. However, in frogs (anuran amphibians), there is a serious problem with existing supermatrix estimates. Specifically, these trees are based on a limited number of loci (15 or fewer), and the higher-level relationships estimated are discordant with recent phylogenomic estimates based on much larger numbers of loci. Here, we attempted to rectify this problem by generating an expanded supermatrix and combining this with data from phylogenomic studies. To assist in aligning ribosomal sequences for this supermatrix, we developed a new program (TaxonomyAlign) to help perform taxonomy-guided alignments. The new combined matrix contained 5,242 anuran species with data from 307 markers, but with 95% missing data overall. This dataset represented a 71% increase in species sampled relative to the previous largest supermatrix analysis of anurans (adding 2,175 species). Maximum-likelihood analyses generated a tree in which higher-level relationships (and estimated clade ages) were generally concordant with those from phylogenomic analyses but were more discordant with the previous largest supermatrix analysis. We found few obvious problems arising from the extensive missing data in most species. We also generated a set of 100 time-calibrated trees for use in comparative analyses. Overall, we provide an improved estimate of anuran phylogeny based on the largest number of combined taxa and markers to date. More broadly, we demonstrate the potential to combine phylogenomic and supermatrix analyses in other groups of organisms.
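Building a supermatrix means concatenating per-locus alignments. Taxa that lack a locus are padded with missing-data symbols, and this padding is where the high overall percentage of missing data comes from. The following minimal sketch assumes each locus is already aligned and stored as a dict of taxon to sequence. It does not reproduce TaxonomyAlign, and any locus names passed to it are hypothetical.

```python
def concatenate(loci, missing="?"):
    """Build a supermatrix from {locus: {taxon: aligned_seq}}.

    Taxa absent from a locus are padded with `missing` over that locus's length.
    Returns (matrix, partitions); partitions maps locus -> (start, end), 1-based inclusive.
    """
    taxa = sorted({t for aln in loci.values() for t in aln})
    rows = {t: [] for t in taxa}
    partitions = {}
    pos = 1
    for name, aln in loci.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {name} is empty or not aligned")
        length = lengths.pop()
        for t in taxa:
            rows[t].append(aln.get(t, missing * length))
        partitions[name] = (pos, pos + length - 1)
        pos += length
    return {t: "".join(parts) for t, parts in rows.items()}, partitions
```

The returned partition coordinates keep track of where each locus sits in the concatenated matrix. Partitioned analyses need these coordinates.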


Subjects
Anura , Biological Evolution , Animals , Phylogeny , Anura/genetics , Ribosomes
7.
Mol Phylogenet Evol ; 178: 107646, 2023 01.
Article in English | MEDLINE | ID: mdl-36265831

ABSTRACT

The Old World flycatchers, robins and chats (Aves, Muscicapidae) are a diverse songbird family with over three hundred species. Despite continuous efforts over the past two decades, there is still no comprehensive and well-resolved species-level phylogeny for Muscicapidae. Here we present a supermatrix phylogeny that includes all 50 currently recognized genera and ca. 92% of all species, built using data from up to 15 mitochondrial and 13 nuclear loci. In addition to assembling nucleotide sequences available in public databases, we extracted sequences from genome assemblies and raw sequencing reads from GenBank and included a few unpublished sequences. Our analyses resolved the phylogenetic positions of several previously unsampled taxa, including the Grand Comoro Flycatcher Humblotia flavirostris, the Collared Palm Thrush Cichladusa arquata, and the Taiwan Whistling-Thrush Myophonus insularis. We also provide taxonomic recommendations for genera that exhibit paraphyly or polyphyly. Our results suggest that Muscicapidae diverged from Turdidae (thrushes and allies) in the early Miocene. They also suggest that the most recent common ancestors of the four subfamilies (Muscicapinae, Niltavinae, Cossyphinae and Saxicolinae) all arose around the middle Miocene.
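Supermatrices built from separate mitochondrial and nuclear loci are commonly analysed with one partition per locus. The following minimal sketch writes RAxML-style partition lines (`DNA, name = start-end`) from locus lengths in concatenation order. The locus names and lengths in the usage are hypothetical, and the abstract does not state the study's actual partitioning scheme.

```python
def raxml_partitions(locus_lengths, datatype="DNA"):
    """Return RAxML-style partition lines for loci concatenated in the given order.

    locus_lengths: list of (locus_name, aligned_length) pairs.
    """
    lines, start = [], 1
    for name, length in locus_lengths:
        end = start + length - 1
        lines.append(f"{datatype}, {name} = {start}-{end}")
        start = end + 1
    return "\n".join(lines)
```

For example, `raxml_partitions([("cytb", 1143), ("ND2", 1041)])` yields the lines `DNA, cytb = 1-1143` and `DNA, ND2 = 1144-2184`.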


Subjects
Gadiformes , Passeriformes , Songbirds , Animals , Songbirds/genetics , Phylogeny , Passeriformes/genetics , Gadiformes/genetics , Cell Nucleus/genetics , DNA, Mitochondrial/genetics
8.
Am J Bot ; 110(10): e16226, 2023 10.
Article in English | MEDLINE | ID: mdl-37561651

ABSTRACT

PREMISE: Although Boechera (Boechereae, Brassicaceae) has become a plant model system for both ecological genomics and evolutionary biology, all previous phylogenetic studies have had limited success in resolving species relationships within the genus. Target enrichment sequence data have recently been applied effectively to resolve the evolutionary relationships of several other challenging plant groups. This success prompted us to investigate their usefulness in Boechera and Boechereae. METHODS: To resolve the phylogeny of Boechera and closely related genera, we used the HybPiper pipeline to analyze two combined bait sets: Angiosperms353, with broad applicability across flowering plants, and a Brassicaceae-specific bait set designed for use in the mustard family. Relationships for 101 samples representing 81 currently recognized species were inferred from a total of 1114 low-copy nuclear genes using both supermatrix and species coalescence methods. RESULTS: Our analyses resulted in a well-resolved and highly supported phylogeny of the tribe Boechereae. Boechereae is divided into two major clades. One comprises all western North American species of Boechera, and the other encompasses the eight other genera of the tribe. Our understanding of relationships within Boechera is enhanced by the recognition of three core clades that are further subdivided into robust regional species complexes. CONCLUSIONS: This study presents the first broadly sampled, well-resolved phylogeny for most known sexual diploid Boechera. This effort provides the foundation for a new phylogenetically informed taxonomy of Boechera that is crucial for its continued use as a model system.


Subjects
Brassicaceae , Phylogeny , Brassicaceae/genetics , Biological Evolution , Genomics
9.
Mol Biol Evol ; 38(6): 2446-2467, 2021 05 19.
Article in English | MEDLINE | ID: mdl-33565584

ABSTRACT

Long-branch attraction is a systematic artifact that results in erroneous groupings of fast-evolving taxa. The combination of short, deep internodes in tandem with long-branch attraction artifacts has produced empirically intractable parts of the Tree of Life. One such group is the arthropod subphylum Chelicerata, whose backbone phylogeny has remained unstable despite improvements in phylogenetic methods and genome-scale data sets. Pseudoscorpion placement is particularly variable across data sets and analytical frameworks, with this group either clustering with other long-branch orders or with Arachnopulmonata (scorpions and tetrapulmonates). To surmount long-branch attraction, we investigated the effect of taxonomic sampling via sequential deletion of basally branching pseudoscorpion superfamilies, as well as varying gene occupancy thresholds in supermatrices. We show that concatenated supermatrices and coalescent-based summary species tree approaches support a sister group relationship of pseudoscorpions and scorpions, when more of the basally branching taxa are sampled. Matrix completeness had demonstrably less influence on tree topology. As an external arbiter of phylogenetic placement, we leveraged the recent discovery of an ancient genome duplication in the common ancestor of Arachnopulmonata as a litmus test for competing hypotheses of pseudoscorpion relationships. We generated a high-quality developmental transcriptome and the first genome for pseudoscorpions to assess the incidence of arachnopulmonate-specific duplications (e.g., homeobox genes and miRNAs). Our results support the inclusion of pseudoscorpions in Arachnopulmonata (new definition), as the sister group of scorpions. Panscorpiones (new name) is proposed for the clade uniting Scorpiones and Pseudoscorpiones.
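Varying gene occupancy thresholds means keeping only the loci sampled for at least a given fraction of taxa. The following minimal sketch assumes loci are stored as dicts of taxon to sequence. The function name and the toy data in the usage are hypothetical.

```python
def filter_by_occupancy(loci, all_taxa, threshold):
    """Keep loci sampled for at least `threshold` (0-1) of the taxa in `all_taxa`.

    loci: {locus_name: {taxon: sequence}}.
    """
    taxa = set(all_taxa)
    return {
        name: aln
        for name, aln in loci.items()
        if len(taxa & set(aln)) / len(taxa) >= threshold
    }
```

Raising the threshold yields a more complete but smaller supermatrix. Lowering it keeps more loci at the cost of more missing data.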


Subjects
Phylogeny , Scorpions/classification , Animals , Female , Gene Duplication , Genes, Homeobox , Male , Scorpions/genetics
10.
Genomics ; 113(2): 681-692, 2021 03.
Article in English | MEDLINE | ID: mdl-33508445

ABSTRACT

Acer (Sapindaceae) is an exceptional study system for understanding the evolutionary history, divergence, and assembly of broad-leaved deciduous forests at higher latitudes. Maples stand out due to their high diversity, their disjunct distribution pattern across the northern continents, and a rich fossil record dating back to the Paleocene. Using a genome-wide supermatrix combining plastomes and nuclear sequences (~585 kb) for 110 Acer taxa, we built a robust time-calibrated hypothesis of maple evolution. With it, we inferred ancestral ranges, reconstructed diversification rates over time, and explored the impact of mass extinction on lineage accumulation. Contrary to fossil evidence, our results indicate that Acer first originated in the (north)eastern Palearctic region, which acted as a source for recurring outward migration. Warm conditions favored rapid divergence from the Eocene onward, but ranges and diversity declined extensively as a result of the Plio-Pleistocene glacial cycles. These signals in genome-wide sequence data corroborate paleobotanical evidence for other major woody north-temperate groups. Together, they highlight the significant (disparate) impact of climatic changes on the evolution, composition, and distribution of the vegetation in the northern hemisphere.
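Lineage accumulation through time is commonly summarised as a lineage-through-time (LTT) curve. The abstract does not say which method was used, so the following is only a generic sketch. It assumes a rooted, bifurcating, ultrametric tree of extant taxa described by its internal-node ages in Ma before present.

```python
def lineages_at(branching_times, t):
    """Number of lineages alive at time t (Ma before present).

    Starts from the single stem lineage and adds one lineage for every
    branching event (internal node, root included) older than t.
    """
    return 1 + sum(bt > t for bt in branching_times)


def ltt(branching_times, times):
    """(time, lineage count) pairs for the requested time points."""
    return [(t, lineages_at(branching_times, t)) for t in times]
```

At the present (t = 0), a tree with n tips has n - 1 internal nodes, so the count returns n.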


Subjects
Acer/genetics , Evolution, Molecular , Genetic Speciation , Phylogeny , Polymorphism, Genetic , Acer/classification , Biomass , Climate Change , Endangered Species/trends , Fossils , Genome, Plant , Phylogeography
11.
Syst Biol ; 69(1): 38-60, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31062850

ABSTRACT

Evolutionary relationships have remained unresolved in many well-studied groups, even though advances in next-generation sequencing and analysis have brought systematics to the brink of whole-genome phylogenomics. These advances include approaches such as transcriptomics, anchored hybrid enrichment, and ultraconserved elements. Recently, it has become possible to sequence the entire genomes of numerous non-model organisms in parallel at reasonable cost, particularly with shotgun sequencing. Here, we identify orthologous coding sequences from whole-genome shotgun sequences. We then use them to investigate the relevance and power of phylogenomic relationship inference and time-calibrated tree estimation. We study an iconic group of butterflies, the swallowtails of the family Papilionidae, which has remained phylogenetically unresolved, with continued debate about the timing of their diversification. Low-coverage whole genomes were obtained using Illumina shotgun sequencing for all genera. Genome assembly coupled to BLAST-based orthology searches allowed extraction of 6621 orthologous protein-coding genes for 45 Papilionidae species and 16 outgroup species (with 32% missing data after cleaning phases). Supermatrix phylogenomic analyses of amino acid sequences were performed with both maximum-likelihood (IQ-TREE) and Bayesian mixture models (PhyloBayes). These analyses produced a fully resolved phylogeny providing new insights into controversial relationships. Species tree reconstruction from gene trees was performed with ASTRAL and SuperTriplets and recovered the same phylogeny. We estimated gene and site concordance factors to complement traditional node-support measures, which strengthens confidence in the inferred phylogenies.
Bayesian estimates of divergence times based on a reduced data set (760 orthologs and 12% missing data) indicate a mid-Cretaceous origin of Papilionoidea around 99.2 Ma (95% credibility interval: 68.6-142.7 Ma) and Papilionidae around 71.4 Ma (49.8-103.6 Ma), with subsequent diversification of modern lineages well after the Cretaceous-Paleogene event. These results show that shotgun sequencing of whole genomes, even when highly fragmented, represents a powerful approach to phylogenomics and molecular dating in a group that has previously been refractory to resolution.
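A gene concordance factor (gCF) for a species-tree branch is the percentage of gene trees that also contain that branch. The following simplified sketch assumes every gene tree contains all taxa and is supplied as a list of its splits. IQ-TREE's implementation additionally restricts the count to decisive gene trees when taxa are missing. Site concordance factors are not shown.

```python
def canonical(side, taxa):
    """Orient a bipartition by the side that excludes the alphabetically first taxon."""
    ref = min(taxa)
    side = frozenset(side)
    return side if ref not in side else frozenset(taxa) - side


def gene_concordance_factor(branch, gene_trees, taxa):
    """Percentage of gene trees (each a list of splits) that contain `branch`."""
    target = canonical(branch, taxa)
    hits = sum(target in {canonical(s, taxa) for s in gt} for gt in gene_trees)
    return 100.0 * hits / len(gene_trees)
```

Orienting every split the same way ensures that AB|CDE written as {A, B} or as {C, D, E} is recognised as the same branch.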


Subjects
Biological Evolution , Butterflies/classification , Butterflies/genetics , Genome, Insect/genetics , Phylogeny , Animals , Time
12.
Mol Phylogenet Evol ; 151: 106862, 2020 10.
Article in English | MEDLINE | ID: mdl-32473335

ABSTRACT

Gobies, sleepers, and cardinalfishes represent major clades of a species-rich radiation of small-bodied, ecologically diverse percomorphs (Gobiaria). Molecular phylogenetics has been crucial to resolving broad relationships of sleepers and gobies (Gobioidei). However, the phylogenetic placements of cardinalfishes and nurseryfishes, as reciprocal or sequential sister clades to Gobioidei, are uncertain. To evaluate relationships among and within families, we used a phylogenetic data-mining approach to generate densely sampled trees inclusive of all higher taxa. We used conspecific amino acid homology to improve alignment accuracy, included ambiguously identified taxa to increase taxon sampling density, and resampled individual gene alignments to filter rogue sequences before concatenation. This approach yielded the most comprehensive tree yet of Gobiaria. It was inferred from a sparse (17% complete) supermatrix of one ribosomal and 22 protein-coding loci (18,065 characters), comprising 50 outgroup and 777 ingroup taxa and representing 32% of species and 68% of genera. Our analyses confirmed the lineage-based classification of gobies with strong support. They also identified sleeper clades with unforeseen levels of systematic uncertainty and quantified competing phylogenetic signals that confound resolution of the root topology. We also found that multilocus data completeness was related to maximum-likelihood branch support. Finally, we verified that supermatrix sparseness could largely explain the phylogenetic uncertainty of shallow relationships observed within goby lineages. These results demonstrate the potential and limits of publicly available sequence data for producing densely sampled phylogenetic trees of exceptionally biodiverse groups.


Subjects
Fishes/classification , Phylogeny , Animals , Biodiversity , Fishes/genetics , Genetic Loci , Perciformes/classification , Sequence Analysis, DNA , Species Specificity
13.
Int J Syst Evol Microbiol ; 69(7): 2028-2036, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31066660

ABSTRACT

The family Thermoactinomycetaceae comprises 43 validly published species. These species were identified by polyphasic taxonomic studies based on molecular phylogenetics and physiological and biochemical characteristics. However, phylogenetic analysis based solely on 16S rRNA gene sequences cannot infer a robust and reliable phylogeny. To disentangle the phylogenetic relationships among members of this family, we re-examined their taxonomy using a large collection of genome data and a phylogenomic approach. The topologies of the phylogenomic trees differ from those of the 16S rRNA gene trees. We based the following proposals on average nucleotide identity, digital DNA-DNA hybridization, and phenotypic and biochemical characteristics. We found that Laceyella sediminis should be reclassified as a later heterotypic synonym of Laceyella tengchongensis. We reclassify Thermoactinomyces guangxiensis as Paenactinomyces guangxiensis gen. nov., comb. nov. We also establish Novibacillaceae fam. nov., with the genus Novibacillus as its type genus. In addition, the genomic DNA G+C contents reported in some species descriptions are imprecise compared with values calculated directly from genome sequences. The corrected G+C content values fit the phylogeny significantly better, so corresponding emendations of the species descriptions are also proposed. In this paper, phylogenomics has been used to resolve the classification of the family Thermoactinomycetaceae.
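Computing G+C content directly from genome sequences is a simple count over the assembly. The following minimal sketch assumes the genome is supplied as one string or as a list of contig strings. It also assumes that ambiguous bases (e.g. N) are excluded from the denominator.

```python
def gc_content(contigs):
    """Genomic G+C content (mol%) over unambiguous A/C/G/T bases only."""
    if isinstance(contigs, str):
        contigs = [contigs]
    gc = at = 0
    for contig in contigs:
        s = contig.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at)
```

Pooling counts across contigs, rather than averaging per-contig percentages, weights each base equally regardless of contig length.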


Subjects
Bacillales/classification , Phylogeny , Bacterial Typing Techniques , Base Composition , DNA, Bacterial/genetics , Nucleic Acid Hybridization , Peptidoglycan/chemistry , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA , Vitamin K 2/analogs & derivatives , Vitamin K 2/chemistry
14.
BMC Evol Biol ; 18(1): 46, 2018 04 04.
Article in English | MEDLINE | ID: mdl-29618314

ABSTRACT

BACKGROUND: The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature. Under a common set of inference assumptions, we investigated the terraces to which these trees would belong. We examined terrace size as a function of the sampling properties of the data sets. These properties included taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling "sufficiency". We evaluated each data set against the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree. We also explored the impact of terraces found in replicate trees in bootstrap methods. RESULTS: Terraces were identified in nearly all data sets with taxon coverage densities < 0.90. They were not found, however, in high-coverage-density (i.e., ≥ 0.94) transcriptomic and genomic data sets. The terraces could be very large, and size varied inversely with taxon coverage density and with gene sampling sufficiency. Few data sets achieved the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree. Terraces found during bootstrap resampling reduced overall support. CONCLUSIONS: If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates with data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced.
Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), unlinked models have been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.
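Taxon coverage density, as defined above, is the fraction of filled cells in the taxon-by-gene presence matrix. The following minimal sketch assumes the data are summarised as a dict mapping each gene to the set of taxa with data for it. The toy values in the usage are hypothetical.

```python
def taxon_coverage_density(presence):
    """Filled taxon-by-gene cells divided by (number of taxa x number of genes).

    presence: {gene: set of taxa that have data for that gene}.
    Taxa are counted only if they appear in at least one gene.
    """
    taxa = set().union(*presence.values())
    filled = sum(len(present) for present in presence.values())
    return filled / (len(taxa) * len(presence))
```

Four taxa and three genes with seven filled cells give a density of 7/12 ≈ 0.58. That value is well below 0.90, the range where the survey found terraces in nearly all data sets.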


Subjects
Databases, Genetic , Phylogeny , Genes , Models, Genetic
15.
Mol Biol Evol ; 34(9): 2408-2421, 2017 09 01.
Article in English | MEDLINE | ID: mdl-28873954

ABSTRACT

Supertree methods merge a set of overlapping phylogenetic trees into a supertree containing all taxa of the input trees. The main challenge in supertree reconstruction is how to deal with conflicting information in the input trees. Many algorithms with different objective functions have been suggested to resolve these conflicts. In particular, some methods encode the source trees in a matrix and construct the supertree by applying a local search heuristic to optimize the respective objective function. We present a novel heuristic supertree algorithm called Bad Clade Deletion (BCD) supertrees. It uses minimum cuts to delete a locally minimal number of columns from such a matrix representation so that the matrix becomes compatible. This is the complementary problem to Matrix Representation with Compatibility (Maximum Split Fit). Our algorithm has guaranteed polynomial worst-case running time and performs swiftly in practice. Unlike local search heuristics, it is guaranteed to return the directed perfect phylogeny for the input matrix, if one exists; this phylogeny corresponds to the parent tree of the input trees. We compared supertrees to model trees on simulated data. BCD showed better accuracy (F1 score) than the state-of-the-art algorithms SuperFine (up to 3%) and Matrix Representation with Parsimony (up to 7%). At the same time, BCD was up to 7 times faster than SuperFine and up to 600 times faster than Matrix Representation with Parsimony. Finally, using the BCD supertree as a starting tree for a combined Maximum Likelihood analysis with RAxML, we achieve significantly improved accuracy (1% higher F1 score) and running time (1.7-fold speedup).
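Matrix Representation with Parsimony and BCD both start from the same kind of matrix, in which each clade of each rooted source tree becomes a binary column. The following minimal sketch shows that encoding. It assumes each source tree is given as its taxon set plus its non-trivial clades. BCD's minimum-cut column deletion itself is not implemented here.

```python
def mrp_matrix(source_trees):
    """Matrix representation of rooted source trees.

    source_trees: list of (taxa, clades) pairs, where `taxa` is the tree's taxon set
    and `clades` its non-trivial clades. Each clade becomes one column:
    '1' = taxon in the clade, '0' = taxon in that tree but not in the clade,
    '?' = taxon absent from that source tree.
    """
    all_taxa = sorted(set().union(*(taxa for taxa, _ in source_trees)))
    columns = []
    for taxa, clades in source_trees:
        for clade in clades:
            columns.append(
                {t: "1" if t in clade else ("0" if t in taxa else "?") for t in all_taxa}
            )
    return {t: "".join(col[t] for col in columns) for t in all_taxa}
```

Taxa missing from a source tree receive '?' rather than '0'. As a result, their absence does not count as evidence against that tree's clades.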


Subjects
Computational Biology/methods , Algorithms , Computer Simulation , Phylogeny , Software
16.
Mol Phylogenet Evol ; 126: 85-91, 2018 09.
Article in English | MEDLINE | ID: mdl-29649581

ABSTRACT

The phylogenetic relationships of Microhylidae, the third largest family of extant frogs, have been difficult to resolve. In the past decade, large amounts of sequence data have been deposited for almost every microhylid genus, but no study has attempted to combine these data to reconstruct a comprehensive phylogeny for this family. In this study, we sequenced 20 near-complete or partial microhylid mitochondrial genomes and integrated them with all available sequences of Microhylidae from GenBank to construct a supermatrix containing 121 genes (14 mitochondrial and 107 nuclear protein-coding genes). The combined dataset is 112,328 characters long (average sequence data length per species = 7829 bp), includes 427 microhylid taxa, and covers all but three genera of the entire family. This dataset provides strong support for the traditional classification of 11 nominal subfamilies and improves the phylogenetic resolution of the relationships among subfamilies. The African subfamily Phrynomerinae is the sister group of all the other microhylids, and the African subfamily Hoplophryninae is the sister taxon to a clade comprising the remaining 9 subfamilies. At the genus level, our analyses confirm the monophyly of most but not all microhylid genera. In summary, we present a new large-scale phylogeny of microhylid frogs that should be valuable for addressing their classification and for comparative evolutionary studies.


Subjects
Anura/classification; Anura/genetics; Genes; Phylogeny; Animals; Cell Nucleus/genetics; DNA, Mitochondrial/genetics; Geography; Likelihood Functions
17.
Mol Phylogenet Evol ; 122: 59-79, 2018 05.
Article in English | MEDLINE | ID: mdl-29410353

Abstract

Inferring interfamilial relationships within the eudicot order Ericales has remained one of the more recalcitrant problems in angiosperm phylogenetics, likely because of a rapid, ancient radiation. As a result, no comprehensive time-calibrated tree or biogeographical analysis of the order has been published. Here, we elucidate phylogenetic relationships within the order and then conduct time-dependent biogeographical and diversification analyses using a taxon- and locus-rich supermatrix that samples one-third of extant species diversity, calibrated with 23 macrofossils and two secondary calibration points. Our results corroborate previous studies and also suggest several new but poorly supported relationships: (1) holoparasitic Mitrastemonaceae is sister to Lecythidaceae; (2) the clade formed by Mitrastemonaceae + Lecythidaceae is sister to Ericales excluding balsaminoids; (3) Theaceae is sister to the styracoids + sarracenioids + ericoids; and (4) subfamilial relationships within Ericaceae suggest that Arbutoideae is sister to Monotropoideae and that Pyroloideae is sister to all subfamilies excluding Arbutoideae, Enkianthoideae, and Monotropoideae. Our results indicate that Ericales began to diversify 110 Mya, within Indo-Malaysia and the Neotropics, with exchange between the two areas and expansion out of Indo-Malaysia playing an important role in shaping the extant diversity of many families. Rapid cladogenesis occurred along the backbone of the order between 104 and 106 Mya. Jump dispersal has been important within the order over the last 30 My, but vicariance is the most important cladogenetic driver of disjunctions at deeper levels of the phylogeny. We detect between 69 and 81 shifts in speciation rate throughout the order, the vast majority of which occurred within the last 30 My. We propose that range shifting may be responsible for older shifts in speciation rate, whereas more recent shifts may be better explained by morphological innovation.


Subjects
Biodiversity; Magnoliopsida/classification; Phylogeny; Animals; Chloroplasts/genetics; Asia, Eastern; Fossils/history; Genetic Speciation; History, Ancient; Magnoliopsida/genetics; Mitochondria/genetics; Phylogeography/history; Ribosomes/genetics
18.
J Hum Evol ; 123: 35-51, 2018 10.
Article in English | MEDLINE | ID: mdl-30057325

Abstract

African papionins are a highly successful subtribe of Old World monkeys with an extensive fossil record. On the basis of both molecular and morphological data, crown African papionins are divided into two clades: Cercocebus/Mandrillus and Papio/Lophocebus/Rungwecebus/Theropithecus (P/L/R/T), though phylogenetic relationships in the latter clade, among both fossil and extant taxa, remain difficult to resolve. While previous phylogenetic studies have focused on either molecular or morphological data, here African papionin molecular and morphological data were combined using both supermatrix and molecular backbone approaches. Theropithecus is supported as the sister taxon to Papio/Lophocebus/Rungwecebus; supermatrix analyses using Bayesian methods are largely unresolved, whereas analyses using parsimony are broadly similar to earlier studies. Thus, the position of Rungwecebus relative to Papio and Lophocebus remains equivocal, possibly due to complex patterns of reticulation. Parapapio is likely a paraphyletic grouping of primitive African papionins or possibly a collection of stem P/L/R/T taxa, and a similar phylogenetic position is also hypothesized for Pliopapio. ?Papio izodi is either a stem or crown P/L/R/T taxon, but does not group with other Papio taxa. Dinopithecus and Gorgopithecus are also stem or crown P/L/R/T taxa, but their phylogenetic positions remain unstable. Finally, T. baringensis is likely the most basal Theropithecus taxon, with T. gelada and T. oswaldi as sister taxa to the exclusion of T. brumpti. By integrating large amounts of molecular and morphological data with updated parsimony and Bayesian methods, this study represents the most comprehensive analysis of African papionin phylogenetic history to date.


Subjects
Cercopithecinae/classification; Phylogeny; Africa; Animals; Cercopithecinae/anatomy & histology; Cercopithecinae/genetics; Sequence Analysis, DNA
19.
Am J Bot ; 105(3): 446-462, 2018 03.
Article in English | MEDLINE | ID: mdl-29738076

Abstract

PREMISE OF THE STUDY: The Caryophyllales contain ~12,500 species and are known for their cosmopolitan distribution, convergent trait evolution, and extreme adaptations. Some relationships within the Caryophyllales, like those of many large plant clades, remain unclear, and phylogenetic studies often recover alternative hypotheses. We explore the utility of broad and dense transcriptome sampling across the order for resolving evolutionary relationships in Caryophyllales.
METHODS: We generated 84 transcriptomes and combined these with 224 publicly available transcriptomes to perform a phylogenomic analysis of Caryophyllales. To overcome the computational challenge of ortholog detection in such a large data set, we developed an approach for clustering gene families that allowed us to analyze >300 transcriptomes and genomes. We then inferred the species relationships using multiple methods and performed gene-tree conflict analyses.
KEY RESULTS: Our phylogenetic analyses resolved many clades with strong support but also revealed significant gene-tree discordance. This discordance is not only a common feature of phylogenomic studies but also an opportunity to understand the processes that have structured phylogenies. We also found that taxon sampling influences species-tree inference, highlighting the importance of more focused studies with additional taxon sampling.
CONCLUSIONS: Transcriptomes are useful both for species-tree inference and for uncovering evolutionary complexity within lineages. Through analyses of gene-tree conflict and multiple methods of species-tree inference, we demonstrate that phylogenomic data can provide unparalleled insight into the evolutionary history of Caryophyllales. We also discuss a method for overcoming computational challenges associated with homolog clustering in large data sets.
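In its simplest form, a gene-tree conflict analysis asks, for each clade of the species tree, how many gene trees contain it, how many contain an incompatible clade, and how many are uninformative (for example, unresolved at that node). The sketch below assumes rooted gene trees sampled on the same taxon set and represented as sets of clusters; it illustrates the general idea only and is not the study's pipeline, which the abstract does not detail. The function names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Cluster = FrozenSet[str]  # taxa below one inner node of a rooted tree


def compatible(a: Cluster, b: Cluster) -> bool:
    """Two rooted clusters can coexist in one tree iff they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def concordance(species_clade: Cluster,
                gene_trees: List[Set[Cluster]]) -> Tuple[int, int, int]:
    """Count (concordant, conflicting, uninformative) gene trees for one clade."""
    concordant = conflicting = uninformative = 0
    for clusters in gene_trees:
        if species_clade in clusters:
            concordant += 1  # gene tree contains exactly this clade
        elif any(not compatible(species_clade, c) for c in clusters):
            conflicting += 1  # some gene-tree clade overlaps it without nesting
        else:
            uninformative += 1  # e.g. a polytomy where the clade would sit
    return concordant, conflicting, uninformative
```

For the species-tree clade {A, B}, a gene tree containing (A,B) counts as concordant, one containing (B,C) counts as conflicting, and one that leaves A, B and C unresolved counts as uninformative. Unrooted analyses would compare bipartitions instead of clusters.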


Subjects
Biological Evolution; Caryophyllales/genetics; Genes, Plant; Genomics/methods; Models, Genetic; Phylogeny; Transcriptome; Cactaceae/genetics; Carnivory; Cluster Analysis; Evolution, Molecular; Genome, Plant; Sequence Analysis, DNA; Sequence Homology; Species Specificity
20.
Syst Biol ; 65(3): 381-96, 2016 May.
Article in English | MEDLINE | ID: mdl-26821913

Abstract

Under the multispecies coalescent model of molecular evolution, gene trees have independent evolutionary histories within a shared species tree. In comparison, supermatrix concatenation methods assume that gene trees share a single common genealogical history, thereby equating gene coalescence with species divergence. The multispecies coalescent is supported by previous studies, which found that its predicted distributions fit empirical data and that concatenation is not a consistent estimator of the species tree. *BEAST, a fully Bayesian implementation of the multispecies coalescent, is popular but computationally intensive, so the increasing size of phylogenetic data sets is both a computational challenge and an opportunity for better systematics. Using simulation studies, we characterize the scaling behavior of *BEAST and enable quantitative prediction of the impact that increasing the number of loci has on both computational performance and statistical accuracy. Follow-up simulations over a wide range of parameters show that the statistical performance of *BEAST relative to concatenation improves both as branch length is reduced and as the number of loci is increased. Finally, using simulations based on parameters estimated from two phylogenomic data sets, we compare the performance of a range of species-tree and concatenation methods and show that using *BEAST with tens of loci can be preferable to using concatenation with thousands of loci. Our results provide insight into the practicalities of Bayesian species-tree estimation, the number of loci required to obtain a given level of accuracy, and the situations in which supermatrix or summary methods will be outperformed by the fully Bayesian multispecies coalescent.
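Why shorter branches favour coalescent methods can be seen from the well-known three-taxon result: for a species tree ((A,B),C) whose internal branch has length t in coalescent units, a single gene tree matches the species tree with probability 1 - (2/3)exp(-t). The sketch below, assuming one sampled lineage per species and no source of discordance other than incomplete lineage sorting, computes this probability and checks it by simulation. It is illustrative only and unrelated to how *BEAST is implemented; the function names are hypothetical.

```python
import math
import random


def p_concordant(t: float) -> float:
    """P(gene tree == species tree ((A,B),C)) for internal branch length t (coalescent units)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def simulate_concordance(t: float, n_genes: int, seed: int = 1) -> float:
    """Monte Carlo estimate of p_concordant under the multispecies coalescent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        # Two lineages (A and B) coalesce at rate 1 per coalescent unit.
        if rng.expovariate(1.0) < t:
            hits += 1  # A and B coalesced in their ancestral branch: gene tree matches
        elif rng.randrange(3) == 0:
            hits += 1  # three lineages reach the root population; (A,B) joins first with prob. 1/3
    return hits / n_genes
```

For t = 0.1, `p_concordant(0.1)` is about 0.397: most individual gene trees disagree with the species tree, even though the matching topology remains the single most frequent one (each discordant topology has probability of about 0.302).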


Subjects
Classification/methods; Phylogeny; Software; Bayes Theorem; Biological Evolution; Data Interpretation, Statistical; Evolution, Molecular; Models, Genetic