Results 1 - 12 of 12
1.
Mol Phylogenet Evol ; 197: 108091, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38719080

ABSTRACT

Cryptic diversity poses a great obstacle in our attempts to assess the current biodiversity crisis and may hamper conservation efforts. The gekkonid genus Mediodactylus, a well-known case of hidden species and genetic diversity, has been taxonomically reclassified several times during the last decade. Focusing on the Mediterranean populations, a recent study within the M. kotschyi species complex using classic mtDNA/nuDNA markers suggested the existence of five distinct species, some being endemic and some possibly threatened, yet their relationships have not been fully resolved. Here, we generated genome-wide SNPs (using ddRADseq) and applied molecular species delimitation approaches and population genomic analyses to further disentangle these relationships. The most extensive nuclear dataset so far, encompassing 2,360 loci and ∼699,000 bp from across the genome of Mediodactylus gecko, enabled us to resolve previously obscure phylogenetic relationships among the five recently elevated Mediodactylus species and to support the hypothesis that the taxon includes several new, undescribed species. Population genomic analyses within each of the proposed species showed strong genetic structure and high levels of genetic differentiation among populations.
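Genetic differentiation among populations, as reported in the abstract above, is commonly summarised with F_ST computed from SNP allele frequencies. The sketch below uses one common estimator (Hudson's, combined across SNPs as a ratio of averages); it is an illustration under that assumption, not the estimator used by the authors, and the function name `hudson_fst` is hypothetical.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Ratio-of-averages Hudson F_ST over biallelic SNPs (illustrative sketch).

    p1, p2: allele frequency of the same allele at each SNP in populations 1 and 2.
    n1, n2: number of sampled chromosomes (haplotypes) in each population.
    """
    if len(p1) != len(p2):
        raise ValueError("frequency vectors must have equal length")
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for within-sample sampling variance
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that one allele drawn from each population differs
        den += a * (1 - b) + b * (1 - a)
    # Summing numerators and denominators separately (ratio of averages)
    # weights SNPs by their between-population diversity.
    return num / den
```

Fixed differences give F_ST = 1, while identical frequencies give a value at or slightly below 0 because of the sampling-variance correction.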


Subject(s)
Lizards , Phylogeny , Phylogeography , Animals , Mediterranean Region , Lizards/genetics , Lizards/classification , Polymorphism, Single Nucleotide , Genetic Variation , Genetics, Population , DNA, Mitochondrial/genetics , Sequence Analysis, DNA
2.
Mol Biol Evol ; 37(1): 291-294, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31432070

ABSTRACT

ModelTest-NG is a reimplementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest but equally accurate and introduces several new features, such as ascertainment bias correction, mixture, and free-rate models, or the automatic processing of single partitions. ModelTest-NG is available under a GNU GPL3 license at https://github.com/ddarriba/modeltest , last accessed September 2, 2019.
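Ascertainment bias correction, one of the new ModelTest-NG features listed above, matters when an alignment contains only variable sites (for example, SNP matrices from ddRADseq). The following is a minimal sketch of one such correction (Lewis-style conditioning on each site being variable), assuming the per-site log-likelihoods have already been computed on some tree and model. It does not reproduce ModelTest-NG's code, and `lewis_corrected_loglik` is a hypothetical name.

```python
import math
from typing import Sequence


def lewis_corrected_loglik(site_logliks: Sequence[float],
                           invariant_logliks: Sequence[float]) -> float:
    """Log-likelihood of variable-only data, conditioned on every site being variable.

    site_logliks: log P(pattern) for each observed (variable) site.
    invariant_logliks: log P(constant pattern) for each character state
                       (e.g. all-A, all-C, all-G, all-T for DNA).
    """
    p_invariant = sum(math.exp(l) for l in invariant_logliks)
    if not 0.0 <= p_invariant < 1.0:
        raise ValueError("probability of an invariant site must lie in [0, 1)")
    n_sites = len(site_logliks)
    # Divide each site's likelihood by P(site is variable) = 1 - P(invariant);
    # log1p keeps precision when p_invariant is small.
    return sum(site_logliks) - n_sites * math.log1p(-p_invariant)
```

Without the correction, a model fitted to SNP-only data tends to overestimate how much change occurred, because the invariant sites that would normally anchor the rates are missing.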


Subject(s)
Amino Acid Substitution , Evolution, Molecular , Genetic Techniques , Models, Genetic , Software
3.
Mol Phylogenet Evol ; 159: 107121, 2021 06.
Article in English | MEDLINE | ID: mdl-33609707

ABSTRACT

Wall lizards of the genus Podarcis (Sauria, Lacertidae) are the predominant reptile group in southern Europe, including 24 recognized species. Mitochondrial DNA data have shown that, with the exception of P. muralis, the Podarcis species distributed in the Balkan peninsula form a species group that is further sub-divided into two subgroups: the "P. tauricus" subgroup, consisting of P. tauricus, P. milensis, P. gaigeae, and P. melisellensis, and the "P. erhardii" subgroup, comprising P. erhardii, P. levendis, P. cretensis, and P. peloponnesiacus. In an attempt to explore the phylogenomic relationships of Balkan Podarcis, assess the levels of genetic structure, and re-evaluate the number of extant species, we employed phylogenomic and admixture approaches on ddRADseq (double digested Restriction site Associated DNA sequencing) genomic data. With this efficient Next Generation Sequencing approach, we were able to obtain a large number of genomic loci randomly distributed throughout the genome and use them to resolve the previously obscure phylogenetic relationships among the different Podarcis species distributed in the Balkans. The obtained phylogenomic relationships support the monophyly of both aforementioned subgroups and reveal several divergent lineages within each subgroup, stressing the need for a taxonomic re-evaluation of Podarcis species in the Balkans. The phylogenomic trees and the species delimitation analyses confirmed all recently recognized species (P. levendis, P. cretensis, and P. ionicus) and showed the presence of at least two more species, one in P. erhardii and the other in P. peloponnesiacus.
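ddRADseq, described in the abstract above, cuts genomic DNA with two restriction enzymes and sequences fragments that carry one cut site of each kind and fall within a size window. A minimal in silico sketch of that selection step on a single sequence follows. The enzyme pair (EcoRI and MspI), the size window, and the function names are illustrative assumptions, not the protocol used in these studies; real protocols also ligate adapters and size-select physically.

```python
import re
from typing import List, Tuple

# Illustrative enzyme pair: (recognition site, cut offset on the top strand).
# Both sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "MspI": ("CCGG", 1),     # C^CGG
}


def cut_sites(seq: str, site: str, offset: int) -> List[int]:
    # Zero-width lookahead so overlapping recognition sites are all reported.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq: str, enz_a: str, enz_b: str,
                  min_len: int, max_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) of fragments flanked by one cut of each enzyme and within the size window."""
    seq = seq.upper()
    cuts = [(pos, enz_a) for pos in cut_sites(seq, *ENZYMES[enz_a])]
    cuts += [(pos, enz_b) for pos in cut_sites(seq, *ENZYMES[enz_b])]
    cuts.sort()
    fragments = []
    # Consecutive cuts delimit the fragments produced by the digest.
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left != right and min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments
```

Fragments flanked by the same enzyme on both ends are discarded, mirroring the requirement for two different adapter-compatible overhangs.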


Subject(s)
Genetic Speciation , Genetics, Population , Lizards/classification , Phylogeny , Animals , Balkan Peninsula , DNA, Mitochondrial/genetics , Genomics , Metagenomics , Sequence Analysis, DNA
4.
Bioinformatics ; 35(21): 4453-4455, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31070718

ABSTRACT

MOTIVATION: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets. RESULTS: We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQ-Tree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric. AVAILABILITY AND IMPLEMENTATION: The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
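The abstract above motivates heuristic search by noting that finding the ML tree is NP-hard. A complementary, purely arithmetic illustration is how fast the number of candidate topologies grows: for n labelled taxa there are (2n-5)!! unrooted binary trees. The sketch below computes that count; the function name is hypothetical, and the count illustrates why exhaustive enumeration is infeasible rather than proving NP-hardness.

```python
def num_unrooted_binary_trees(n_taxa: int) -> int:
    """Number of unrooted, fully resolved tree topologies for n labelled taxa: (2n-5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Double factorial (2n-5)!! = 1 * 3 * 5 * ... * (2n-5)
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, num_unrooted_binary_trees(n))
```

Already at 10 taxa there are 2,027,025 topologies, so tools such as RAxML-NG rely on greedy search rather than enumeration.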


Subject(s)
Algorithms , Phylogeny , Software , Likelihood Functions
5.
Syst Biol ; 68(2): 365-369, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30165689

ABSTRACT

Next generation sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the evolutionary placement algorithm (EPA) included in RAxML, or PPLACER, are being increasingly used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Herein, we present EPA-NG, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both RAxML-EPA and PPLACER. EPA-NG can be executed on standard shared memory, as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA-NG, we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3748 taxa in just under 7 h, using 2048 cores. Our performance assessment shows that EPA-NG outperforms RAxML-EPA and PPLACER by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA-NG scales well up to 2048 cores. EPA-NG is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng.
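Phylogenetic placement, as described above, evaluates a query sequence on many branches of a fixed reference tree. EPA-style tools summarise the result per query as likelihood weight ratios (LWR): each branch's likelihood divided by the sum over all evaluated branches. The sketch below shows only that normalisation step, assuming the per-branch log-likelihoods are already known; the branch labels and function name are hypothetical.

```python
import math
from typing import Dict


def likelihood_weight_ratios(branch_logliks: Dict[str, float]) -> Dict[str, float]:
    """Normalise one query's per-branch placement log-likelihoods into weights summing to 1."""
    best = max(branch_logliks.values())
    # Log-sum-exp trick: subtract the maximum before exponentiating so that
    # very negative log-likelihoods (e.g. -1000) do not underflow to zero.
    total = sum(math.exp(l - best) for l in branch_logliks.values())
    return {branch: math.exp(l - best) / total for branch, l in branch_logliks.items()}
```

Working in log space matters in practice: raw placement likelihoods of long reads are far smaller than the smallest representable double.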


Subject(s)
Algorithms , Classification/methods , Phylogeny , Sequence Analysis, DNA , Software
6.
Mol Biol Evol ; 35(5): 1037-1046, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29385525

ABSTRACT

With Next Generation Sequencing data being routinely used, evolutionary biology is transforming into a computational science. Thus, researchers have to rely on a growing number of increasingly complex software tools. All widely used core tools in the field have grown considerably, in terms of the number of features as well as lines of code, and consequently also with respect to software complexity. A topic that has received little attention is the software engineering quality of widely used core analysis tools. Software developers appear to rarely assess the quality of their code, and this can have potential negative consequences for end-users. To this end, we assessed the code quality of 16 highly cited and compute-intensive tools from the broader area of evolutionary biology, mainly written in C/C++ (e.g., MrBayes, MAFFT, SweepFinder) and Java (BEAST), that are routinely used in current data analysis pipelines. Because the software engineering quality of the tools we analyzed is rather unsatisfying, we provide a list of best practices for improving the quality of existing tools and list techniques that can be deployed for developing reliable, high-quality scientific software from scratch. Finally, we also discuss journal as well as science policy and, more importantly, funding issues that need to be addressed for improving software engineering quality as well as ensuring support for developing new and maintaining existing software. Our intention is to raise the awareness of the community regarding software engineering quality issues and to emphasize the substantial lack of funding for scientific software development.


Subject(s)
Biological Evolution , Computational Biology , Software/standards
7.
Mol Phylogenet Evol ; 125: 100-115, 2018 08.
Article in English | MEDLINE | ID: mdl-29574273

ABSTRACT

The Balkan Peninsula constitutes a biodiversity hotspot with high levels of species richness and endemism. The complex geological history of the Balkans, in conjunction with its climatic evolution, is hypothesized to be the main driver generating this biodiversity. We investigated the phylogeography, historical demography, and population structure of closely related wall-lizard species from the Balkan Peninsula and southeastern Europe to better understand diversification processes of species with limited dispersal ability, from the Late Miocene to the Holocene. We used several analytical methods integrating genome-wide SNPs (ddRADseq), microsatellites, mitochondrial and nuclear DNA data, as well as species distribution modelling. Phylogenomic analysis resulted in a completely resolved species-level phylogeny; population-level analyses confirmed the existence of at least two cryptic evolutionary lineages and extensive within-species genetic structuring. Divergence time estimations indicated that the Messinian Salinity Crisis played a key role in shaping patterns of species divergence, whereas intraspecific genetic structuring was mainly driven by Pliocene tectonic events and Quaternary climatic oscillations. The present work highlights the effectiveness of utilizing multiple methods and data types coupled with extensive geographic sampling to uncover the evolutionary processes that shaped the species over space and time.
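The divergence time estimates mentioned above rest on relating genetic distance to time. Studies like this one use calibrated Bayesian relaxed-clock analyses, which are far richer than what follows; the sketch shows only the underlying strict-clock arithmetic, T = d / (2r), where d is the pairwise distance and r the substitution rate per lineage (the factor 2 because both lineages accumulate changes). The function name and the numbers in the usage line are hypothetical.

```python
def divergence_time_myr(pairwise_distance: float, rate_per_lineage_per_myr: float) -> float:
    """Strict-clock divergence time in Myr: T = d / (2r).

    pairwise_distance: substitutions per site separating two lineages.
    rate_per_lineage_per_myr: substitutions per site per Myr along each lineage.
    """
    if rate_per_lineage_per_myr <= 0:
        raise ValueError("rate must be positive")
    # Distance accrues independently along both descending lineages.
    return pairwise_distance / (2.0 * rate_per_lineage_per_myr)


if __name__ == "__main__":
    # Hypothetical inputs: d = 0.12 subst/site, r = 0.01 subst/site/Myr per lineage
    print(divergence_time_myr(0.12, 0.01))
```

With these made-up inputs the result is about 6 Myr; real estimates depend heavily on the calibration and clock model.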


Subject(s)
Lizards/classification , Models, Biological , Phylogeography , Animals , Balkan Peninsula , Bayes Theorem , Biodiversity , Calibration , DNA, Mitochondrial/genetics , Genetic Variation , Genetics, Population , Genomics , Haplotypes/genetics , Lizards/genetics , Microsatellite Repeats/genetics , Phylogeny , Species Specificity
8.
Bioinformatics ; 32(9): 1331-7, 2016 05 01.
Article in English | MEDLINE | ID: mdl-26733454

ABSTRACT

MOTIVATION: The presence of missing data in large-scale phylogenomic datasets has negative effects on the phylogenetic inference process. One effect of alignments with missing per-gene or per-partition sequences is that the inferred phylogenies may exhibit extremely long branch lengths. We investigate whether statistically predicting missing sequences for organisms, using information from genes/partitions that have data for these organisms, alleviates the problem and improves phylogenetic accuracy. RESULTS: We present several algorithms for correcting excessively long branch lengths induced by missing data. We also present methods for predicting/imputing missing sequence data. We evaluate our algorithms by systematically removing sequence data from three empirical and 100 simulated alignments. We then compare the Maximum Likelihood trees inferred from the gappy alignments and from the alignments with predicted sequence data to the trees inferred from the original, complete datasets. The datasets with predicted sequences yielded branch lengths one to two orders of magnitude more accurate than those of the trees inferred from the alignments with missing data. However, prediction did not affect the Robinson-Foulds (RF) distances between the trees. AVAILABILITY AND IMPLEMENTATION: https://github.com/ddarriba/ForeSeqs CONTACT: diego.darriba@h-its.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
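The abstract above compares trees by their Robinson-Foulds (RF) distance, which counts the bipartitions (splits) present in one tree but not the other. A minimal sketch follows, assuming trees are written as nested Python tuples of leaf names (a hypothetical stand-in for a Newick parser). It returns the unnormalised symmetric-difference count; some tools report half this value or a normalised fraction.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees


def leaves(tree: Tree) -> FrozenSet[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions, each stored as the side that excludes a fixed reference taxon."""
    taxa = leaves(tree)
    ref = min(taxa)  # canonical orientation: always store the side without `ref`
    result: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = clade if ref not in clade else taxa - clade
        # Skip trivial splits (one leaf vs. the rest) and the empty/full split.
        if 1 < len(side) < len(taxa) - 1:
            result.add(side)
        return clade

    visit(tree)
    return result


def rf_distance(t1: Tree, t2: Tree) -> int:
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same taxon set")
    return len(splits(t1) ^ splits(t2))
```

Because splits are stored in a canonical orientation, the tuple nesting (where the tree is "rooted" for printing) does not affect the distance.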


Subject(s)
Phylogeny , Sequence Alignment , Algorithms , Probability
9.
BMC Bioinformatics ; 17: 143, 2016 Mar 24.
Article in English | MEDLINE | ID: mdl-27009141

ABSTRACT

BACKGROUND: In the context of a master's-level programming practical at the computer science department of the Karlsruhe Institute of Technology, we developed and made available open-source code for testing all 203 possible nucleotide substitution models in the Maximum Likelihood (ML) setting under the common Akaike, corrected Akaike, and Bayesian information criteria. We address the question of whether model selection matters topologically, that is, whether conducting ML inferences under the optimal model, instead of a standard General Time Reversible (GTR) model, yields different tree topologies. We also assess to what degree the models selected and trees inferred under the three standard criteria (AIC, AICc, BIC) differ. Finally, we assess whether the definition of the sample size (#sites versus #sites × #taxa) yields different models and, as a consequence, different tree topologies. RESULTS: We find that all three factors (by order of impact: nucleotide model selection, information criterion used, sample size definition) can yield substantially different final tree topologies (topological difference exceeding 10 %) for approximately 5 % of the tree inferences conducted on the 39 empirical datasets used in our study. CONCLUSIONS: We find that using the best-fit nucleotide substitution model may change the final ML tree topology compared to an inference under a default GTR model. The effect is less pronounced when comparing distinct information criteria. Nonetheless, in some cases we did obtain substantial topological differences.
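The figure of 203 models in the abstract above corresponds to the number of ways the six GTR exchangeability rates can be grouped into shared parameters (the Bell number B6 = 203). The sketch below enumerates these groupings as restricted growth strings and computes AIC, AICc and BIC for a given fit, so the effect of the two sample-size definitions (#sites versus #sites × #taxa) can be seen directly. Function names are hypothetical, and the parameter count k is whatever the caller supplies (in practice it includes branch lengths and base frequencies, which this sketch does not model).

```python
import math
from typing import Dict, Iterator, List

RATES = ["AC", "AG", "AT", "CG", "CT", "GT"]  # the six GTR exchangeabilities


def rate_groupings(n: int = 6) -> Iterator[List[int]]:
    """Yield restricted growth strings: each one groups n rates into shared parameters.

    Position i holds the group label of RATES[i]; labels start at 0 and each new
    label is at most one more than the largest label used so far.
    """
    def extend(prefix: List[int], max_label: int) -> Iterator[List[int]]:
        if len(prefix) == n:
            yield prefix
            return
        for label in range(max_label + 2):  # reuse a group or open a new one
            yield from extend(prefix + [label], max(max_label, label))

    yield from extend([0], 0)


def information_criteria(lnL: float, k: int, n: int) -> Dict[str, float]:
    """AIC, AICc and BIC for log-likelihood lnL, k free parameters, sample size n."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * lnL
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * lnL
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


if __name__ == "__main__":
    models = list(rate_groupings())
    print(len(models))  # JC-like [0]*6 up to GTR [0, 1, 2, 3, 4, 5]
    # Hypothetical fit: the same lnL and k scored under both sample-size definitions
    sites, taxa = 1000, 20
    print(information_criteria(-5000.0, 10, sites))
    print(information_criteria(-5000.0, 10, sites * taxa))
```

For example, [0, 1, 0, 0, 1, 0] gives the two transitions (AG, CT) one shared rate and the four transversions another, the K80/HKY-style grouping. Only the penalty terms of AICc and BIC depend on n, which is why the sample-size definition can change the selected model.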


Subject(s)
Models, Genetic , Algorithms , Bayes Theorem , DNA/chemistry , DNA/metabolism , Likelihood Functions
10.
Bioinformatics ; 30(9): 1310-1, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24451621

ABSTRACT

The selection of models of nucleotide substitution is one of the major steps of modern phylogenetic analysis. Different tools exist to accomplish this task, among which jModelTest 2 (jMT2) is one of the most popular. Still, to deal with large DNA alignments with hundreds or thousands of loci, users of jMT2 need access to High Performance Computing clusters, including installation and configuration capabilities, conditions that are not always met. Here we present jmodeltest.org, a novel web server for the transparent execution of jMT2 across different platforms and for a wide range of users. Its main benefit is straightforward execution, which avoids any configuration/execution issues and, in most cases, significantly reduces the time required to complete the analysis.


Subject(s)
Nucleotides/genetics , Cluster Analysis , Internet , Models, Genetic , Phylogeny , Software
11.
Bioinformatics ; 27(8): 1164-5, 2011 Apr 15.
Article in English | MEDLINE | ID: mdl-21335321

ABSTRACT

UNLABELLED: We have implemented a high-performance computing (HPC) version of ProtTest that can be executed in parallel on multicore desktops and clusters. This version, called ProtTest 3, includes new features and extended capabilities. AVAILABILITY: ProtTest 3 source code and binaries are freely available under GNU license for download from http://darwin.uvigo.es/software/prottest3, linked to a Mercurial repository at Bitbucket (https://bitbucket.org/). CONTACT: dposada@uvigo.es SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Evolution, Molecular , Sequence Alignment/methods , Sequence Analysis, Protein , Software , Models, Statistical , Phylogeny