Results 1 - 20 of 33
1.
Nature ; 629(8013): 851-860, 2024 May.
Article in English | MEDLINE | ID: mdl-38560995

ABSTRACT

Despite tremendous efforts in the past decades, relationships among main avian lineages remain heavily debated without a clear resolution. Discrepancies have been attributed to diversity of species sampled, phylogenetic method and the choice of genomic regions [1-3]. Here we address these issues by analysing the genomes of 363 bird species [4] (218 taxonomic families, 92% of total). Using intergenic regions and coalescent methods, we present a well-supported tree but also a marked degree of discordance. The tree confirms that Neoaves experienced rapid radiation at or near the Cretaceous-Palaeogene boundary. Sufficient loci rather than extensive taxon sampling were more effective in resolving difficult nodes. Remaining recalcitrant nodes involve species that are a challenge to model due to either extreme DNA composition, variable substitution rates, incomplete lineage sorting or complex evolutionary events such as ancient hybridization. Assessment of the effects of different genomic partitions showed high heterogeneity across the genome. We discovered sharp increases in effective population size, substitution rates and relative brain size following the Cretaceous-Palaeogene extinction event, supporting the hypothesis that emerging ecological opportunities catalysed the diversification of modern birds. The resulting phylogenetic estimate offers fresh insights into the rapid radiation of modern birds and provides a taxon-rich backbone tree for future comparative studies.


Subject(s)
Birds; Evolution, Molecular; Genome; Phylogeny; Animals; Birds/genetics; Birds/classification; Birds/anatomy & histology; Brain/anatomy & histology; Extinction, Biological; Genome/genetics; Genomics; Population Density; Male; Female
2.
Bioinformatics ; 38(15): 3725-3733, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35713506

ABSTRACT

MOTIVATION: Phylogenetic networks can represent non-treelike evolutionary scenarios. Current, actively developed approaches for phylogenetic network inference jointly account for non-treelike evolution and incomplete lineage sorting (ILS). Unfortunately, this induces a very high computational complexity and current tools can only analyze small datasets. RESULTS: We present NetRAX, a tool for maximum likelihood (ML) inference of phylogenetic networks in the absence of ILS. Our tool leverages state-of-the-art methods for efficiently computing the phylogenetic likelihood function on trees, and extends them to phylogenetic networks via the notion of 'displayed trees'. NetRAX can infer ML phylogenetic networks from partitioned multiple sequence alignments and returns the inferred networks in Extended Newick format. On simulated data, our results show a very low relative difference in Bayesian Information Criterion (BIC) score and a near-zero unrooted softwired cluster distance to the true, simulated networks. With NetRAX, a network inference on a partitioned alignment with 8000 sites, 30 taxa and 3 reticulations completes within a few minutes on a standard laptop. AVAILABILITY AND IMPLEMENTATION: Our implementation is available under the GNU General Public License v3.0 at https://github.com/lutteropp/NetRAX. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
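
The BIC comparison mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard definition BIC = k ln(n) - 2 ln(L). How NetRAX counts free parameters (k) and the sample size (n) is not stated in the abstract, so the function names, the parameter counts, and the log-likelihood values below are all hypothetical.

```python
import math


def bic(log_likelihood: float, num_free_params: int, sample_size: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return num_free_params * math.log(sample_size) - 2.0 * log_likelihood


def relative_bic_difference(bic_inferred: float, bic_true: float) -> float:
    """Signed difference of the inferred score relative to the reference score."""
    return (bic_inferred - bic_true) / abs(bic_true)


# Hypothetical numbers for an alignment with 8000 sites (assumed sample size).
true_bic = bic(log_likelihood=-52000.0, num_free_params=70, sample_size=8000)
inferred_bic = bic(log_likelihood=-52010.0, num_free_params=68, sample_size=8000)
rel = relative_bic_difference(inferred_bic, true_bic)
print(f"relative BIC difference: {rel:.2e}")
```

A value of `rel` close to zero means the inferred network scores almost as well as the simulated one under this criterion.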


Subject(s)
Algorithms; Phylogeny; Bayes Theorem; Sequence Alignment; Likelihood Functions
3.
Mol Biol Evol ; 38(5): 1777-1791, 2021 05 04.
Article in English | MEDLINE | ID: mdl-33316067

ABSTRACT

Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.


Subject(s)
COVID-19/genetics; Evolution, Molecular; Genome, Viral; Mutation; Phylogeny; SARS-CoV-2/genetics; Humans
4.
Small ; 18(47): e2203555, 2022 11.
Article in English | MEDLINE | ID: mdl-36192153

ABSTRACT

Metallic barcode nanowires (BNWs) composed of repeating heterogeneous segments fabricated by template-assisted electrodeposition can offer extended functionality in magnetic, electrical, mechanical, and biomedical applications. The authors consider such nanostructures as a 3D system of magnetically interacting elements with magnetic behavior strongly affected by complex magnetostatic interactions. This study discusses the influence of geometrical parameters of segments on the character of their interactions and the overall magnetic behavior of the array of BNWs having alternating magnetization, because the Fe and Au segments are made of Fe-Au alloys with high and low magnetizations. By controlling the applied current densities and the elapsed time in the electrodeposition, the dimension of the Fe-Au BNWs can be regulated. Using the first-order reversal curve (FORC) diagram method, this study reveals that the influence of the length of magnetically weak Au segments on the interaction field between nanowires is different for samples with magnetically strong 100 and 200 nm long Fe segments. With the help of micromagnetic simulations, three types of magnetostatic interactions in the BNW arrays are discovered and analyzed. This study demonstrates that the dominating type of interaction depends on the geometric parameters of the Fe and Au segments and the interwire and intrawire distances.


Subject(s)
Nanostructures; Nanowires; Nanowires/chemistry; Nanostructures/chemistry; Electroplating/methods; Magnetics
5.
Bioinformatics ; 37(22): 4056-4063, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34037680

ABSTRACT

MOTIVATION: Phylogenetic trees are now routinely inferred on large scale high performance computing systems with thousands of cores as the parallel scalability of phylogenetic inference tools has improved over the past years to cope with the molecular data avalanche. Thus, the parallel fault tolerance of phylogenetic inference tools has become a relevant challenge. To this end, we explore parallel fault tolerance mechanisms and algorithms, the software modifications required and the performance penalties induced via enabling parallel fault tolerance by example of RAxML-NG, the successor of the widely used RAxML tool for maximum likelihood-based phylogenetic tree inference. RESULTS: We find that the slowdown induced by the necessary additional recovery mechanisms in RAxML-NG is on average 1.00 ± 0.04. The overall slowdown by using these recovery mechanisms in conjunction with a fault-tolerant Message Passing Interface implementation amounts to on average 1.7 ± 0.6 for large empirical datasets. Via failure simulations, we show that RAxML-NG can successfully recover from multiple simultaneous failures, subsequent failures, failures during recovery and failures during checkpointing. Recoveries are automatic and transparent to the user. AVAILABILITY AND IMPLEMENTATION: The modified fault-tolerant RAxML-NG code is available under GNU GPL at https://github.com/lukashuebner/ft-raxml-ng. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Phylogeny; User-Computer Interface; Algorithms; Software
6.
Mol Biol Evol ; 37(9): 2763-2774, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32502238

ABSTRACT

Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
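
The relative Robinson-Foulds distance used as the accuracy measure in this abstract can be sketched as follows. This is an illustrative implementation rather than GeneRax code: trees are represented as nested Python tuples of leaf names (an assumption made to keep the example short), both trees are assumed to be binary and to share the same taxa, and the distance is normalized by 2(n - 3), the maximum for unrooted binary trees on n taxa.

```python
def bipartitions(tree):
    """Return the non-trivial splits of a tree given as nested tuples of leaf names.

    Each split is stored as the side that does NOT contain the
    alphabetically smallest taxon, so rooted and unrooted encodings
    of the same topology yield the same set.
    """
    def leaves_of(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves_of(child) for child in node))

    all_leaves = leaves_of(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - leaves if anchor in leaves else leaves
        # Keep only internal branches: both sides need at least two taxa.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return leaves

    visit(tree)
    return all_leaves, splits


def relative_rf(tree_a, tree_b):
    """Robinson-Foulds distance divided by 2 * (n - 3); assumes binary trees."""
    taxa_a, splits_a = bipartitions(tree_a)
    taxa_b, splits_b = bipartitions(tree_b)
    if taxa_a != taxa_b:
        raise ValueError("trees must contain the same taxa")
    if len(taxa_a) < 4:
        raise ValueError("need at least four taxa")
    return len(splits_a ^ splits_b) / (2 * (len(taxa_a) - 3))


# Toy five-taxon example: the trees share the (D,E) split but disagree on the other.
t1 = (("A", "B"), "C", ("D", "E"))
t2 = (("A", "C"), "B", ("D", "E"))
print(relative_rf(t1, t2))
```

A value of 0 means identical topologies and 1 means the two binary trees share no internal branch.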


Subject(s)
Gene Duplication; Genetic Techniques; Phylogeny; Software; Cyanobacteria/genetics; Gene Deletion; Gene Transfer, Horizontal
7.
Mol Biol Evol ; 37(1): 291-294, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31432070

ABSTRACT

ModelTest-NG is a reimplementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest but equally accurate and introduces several new features, such as ascertainment bias correction, mixture, and free-rate models, or the automatic processing of single partitions. ModelTest-NG is available under a GNU GPL3 license at https://github.com/ddarriba/modeltest, last accessed September 2, 2019.


Subject(s)
Amino Acid Substitution; Evolution, Molecular; Genetic Techniques; Models, Genetic; Software
8.
Bioinformatics ; 36(7): 2280-2281, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31755898

ABSTRACT

MOTIVATION: Recently, Lemoine et al. suggested the transfer bootstrap expectation (TBE) branch support metric as an alternative to classical phylogenetic bootstrap support for taxon-rich datasets. However, the original TBE implementation in the booster tool is compute- and memory-intensive. RESULTS: We developed a fast and memory-efficient TBE implementation. We improve upon the original algorithm by Lemoine et al. via several algorithmic and technical optimizations. On empirical as well as on random tree sets with varying taxon counts, our implementation is up to 480 times faster than booster. Furthermore, it only requires memory that is linear in the number of taxa, which leads to 10× to 40× memory savings compared with booster. AVAILABILITY AND IMPLEMENTATION: Our implementation has been partially integrated into pll-modules and RAxML-NG and is available under the GNU Affero General Public License v3.0 at https://github.com/ddarriba/pll-modules and https://github.com/amkozlov/raxml-ng. The parallel version that also computes additional TBE-related statistics is available at: https://github.com/lutteropp/raxml-ng/tree/tbe. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
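
For orientation, the TBE metric discussed in this abstract can be written down naively. The sketch below is not the optimized algorithm the abstract describes: it is a straightforward quadratic-time version, and it assumes the usual definition from Lemoine et al., in which the support of a reference branch is 1 minus the mean transfer distance to the closest branch of each bootstrap tree, divided by p - 1, where p is the size of the smaller side of the reference split. Trees are nested tuples of leaf names, a representation chosen here only for illustration.

```python
def branch_sides(tree):
    """Return (all taxa, list of leaf sets below every node), terminal branches included."""
    sides = []

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        sides.append(leaves)
        return leaves

    taxa = visit(tree)
    return taxa, sides


def transfer_distance(side_a, side_b, num_taxa):
    """Minimum number of taxa to move so the two splits coincide (either orientation)."""
    hamming = len(side_a ^ side_b)
    return min(hamming, num_taxa - hamming)


def tbe(ref_side, bootstrap_trees):
    """Transfer bootstrap expectation of one reference split (naive version)."""
    if not bootstrap_trees:
        raise ValueError("need at least one bootstrap tree")
    total = 0
    for tree in bootstrap_trees:
        taxa, sides = branch_sides(tree)
        total += min(transfer_distance(ref_side, s, len(taxa)) for s in sides)
    p = min(len(ref_side), len(taxa) - len(ref_side))
    if p < 2:
        raise ValueError("TBE is defined for internal branches only")
    return 1.0 - (total / len(bootstrap_trees)) / (p - 1)


# Toy six-taxon example: the split ABC|DEF is present in the first replicate
# and one taxon transfer away from a branch in the second replicate.
replicates = [
    ((("A", "B"), "C"), (("D", "E"), "F")),
    ((("A", "B"), "D"), (("C", "E"), "F")),
]
print(tbe(frozenset("ABC"), replicates))
```

In this toy case the classical bootstrap support would be 0.5 (the split appears in one of two replicates), while the transfer-based value is higher because the second replicate is only one taxon away from containing the split.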


Subject(s)
Algorithms; Software; Phylogeny
9.
Proc Natl Acad Sci U S A ; 115(50): 12775-12780, 2018 12 11.
Article in English | MEDLINE | ID: mdl-30478043

ABSTRACT

Hemipteroid insects (Paraneoptera), with over 10% of all known insect diversity, are a major component of terrestrial and aquatic ecosystems. Previous phylogenetic analyses have not consistently resolved the relationships among major hemipteroid lineages. We provide maximum likelihood-based phylogenomic analyses of a taxonomically comprehensive dataset comprising sequences of 2,395 single-copy, protein-coding genes for 193 samples of hemipteroid insects and outgroups. These analyses yield a well-supported phylogeny for hemipteroid insects. Monophyly of each of the three hemipteroid orders (Psocodea, Thysanoptera, and Hemiptera) is strongly supported, as are most relationships among suborders and families. Thysanoptera (thrips) is strongly supported as sister to Hemiptera. However, as in a recent large-scale analysis sampling all insect orders, trees from our data matrices support Psocodea (bark lice and parasitic lice) as the sister group to the holometabolous insects (those with complete metamorphosis). In contrast, four-cluster likelihood mapping of these data does not support this result. A molecular dating analysis using 23 fossil calibration points suggests hemipteroid insects began diversifying before the Carboniferous, over 365 million years ago. We also explore implications for understanding the timing of diversification, the evolution of morphological traits, and the evolution of mitochondrial genome organization. These results provide a phylogenetic framework for future studies of the group.


Subject(s)
Insecta/genetics; Animals; Calibration; Ecosystem; Fossils; Genome, Mitochondrial/genetics; Phylogeny
10.
Mol Biol Evol ; 36(9): 2086-2103, 2019 09 01.
Article in English | MEDLINE | ID: mdl-31114882

ABSTRACT

Few models of sequence evolution incorporate parameters describing protein structure, despite its high conservation, essential functional role and increasing availability. We present a structurally aware empirical substitution model for amino acid sequence evolution in which proteins are expressed using an expanded alphabet that relays both amino acid identity and structural information. Each character specifies an amino acid as well as information about the rotamer configuration of its side-chain: the discrete geometric pattern of permitted side-chain atomic positions, as defined by the dihedral angles between covalently linked atoms. By assigning rotamer states in 251,194 protein structures and identifying 4,508,390 substitutions between closely related sequences, we generate a 55-state "Dayhoff-like" model that shows that the evolutionary properties of amino acids depend strongly upon side-chain geometry. The model performs as well as or better than traditional 20-state models for divergence time estimation, tree inference, and ancestral state reconstruction. We conclude that not only is rotamer configuration a valuable source of information for phylogenetic studies, but that modeling the concomitant evolution of sequence and structure may have important implications for understanding protein folding and function.


Subject(s)
Evolution, Molecular; Models, Biological; Protein Conformation; Amino Acid Substitution; Markov Chains
11.
Bioinformatics ; 35(10): 1771-1773, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30321303

ABSTRACT

MOTIVATION: Coalescent- and reconciliation-based methods are now widely used to infer species phylogenies from genomic data. They typically use per-gene phylogenies as input, which requires conducting multiple individual tree inferences on a large set of multiple sequence alignments (MSAs). At present, no easy-to-use parallel tool for this task exists. Ad hoc scripts for this purpose do not only induce additional implementation overhead, but can also lead to poor resource utilization and long times-to-solution. We present ParGenes, a tool for simultaneously determining the best-fit model and inferring maximum likelihood (ML) phylogenies on thousands of independent MSAs using supercomputers. RESULTS: ParGenes executes common phylogenetic pipeline steps such as model-testing, ML inference(s), bootstrapping and computation of branch support values via a single parallel program invocation. We evaluated ParGenes by inferring > 20 000 phylogenetic gene trees with bootstrap support values from Ensembl Compara and VectorBase alignments in 28 h on a cluster with 1024 nodes. AVAILABILITY AND IMPLEMENTATION: GNU GPL at https://github.com/BenoitMorel/ParGenes. SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.


Subject(s)
Phylogeny; Genomics; Probability; Sequence Alignment
12.
Bioinformatics ; 35(21): 4453-4455, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31070718

ABSTRACT

MOTIVATION: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets. RESULTS: We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQ-Tree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric. AVAILABILITY AND IMPLEMENTATION: The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Phylogeny; Software; Likelihood Functions
13.
Syst Biol ; 68(2): 365-369, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30165689

ABSTRACT

Next generation sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the evolutionary placement algorithm (EPA) included in RAxML, or PPLACER, are being increasingly used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Herein, we present EPA-NG, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both RAxML-EPA and PPLACER. EPA-NG can be executed on standard shared memory, as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA-NG, we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3748 taxa in just under 7 h, using 2048 cores. Our performance assessment shows that EPA-NG outperforms RAxML-EPA and PPLACER by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA-NG scales well up to 2048 cores. EPA-NG is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng.


Subject(s)
Algorithms; Classification/methods; Phylogeny; Sequence Analysis, DNA; Software
14.
BMC Evol Biol ; 18(1): 71, 2018 05 18.
Article in English | MEDLINE | ID: mdl-29776336

ABSTRACT

BACKGROUND: Apoid wasps and bees (Apoidea) are an ecologically and morphologically diverse group of Hymenoptera, with some species of bees having evolved eusocial societies. Major problems for our understanding of the evolutionary history of Apoidea have been the difficulty to trace the phylogenetic origin and to reliably estimate the geological age of bees. To address these issues, we compiled a comprehensive phylogenomic dataset by simultaneously analyzing target DNA enrichment and transcriptomic sequence data, comprising 195 single-copy protein-coding genes and covering all major lineages of apoid wasps and bee families. RESULTS: Our compiled data matrix comprised 284,607 nucleotide sites that we phylogenetically analyzed by applying a combination of domain- and codon-based partitioning schemes. The inferred results confirm the polyphyletic status of the former family "Crabronidae", which comprises nine major monophyletic lineages. We found the former subfamily Pemphredoninae to be polyphyletic, comprising three distantly related clades. One of them, Ammoplanina, constituted the sister group of bees in all our analyses. We estimate the origin of bees to be in the Early Cretaceous (ca. 128 million years ago), a time period during which angiosperms rapidly radiated. Finally, our phylogenetic analyses revealed that within the Apoidea, (eu)social societies evolved exclusively in a single clade that comprises pemphredonine and philanthine wasps as well as bees. CONCLUSION: By combining transcriptomic sequences with those obtained via target DNA enrichment, we were able to include an unprecedented large number of apoid wasps in a phylogenetic study for tracing the phylogenetic origin of bees. Our results confirm the polyphyletic nature of the former wasp family Crabronidae, which we here suggest splitting into eight families. Of these, the family Ammoplanidae possibly represents the extant sister lineage of bees. Species of Ammoplanidae are known to hunt thrips, of which some aggregate on flowers and feed on pollen. The specific biology of Ammoplanidae as predators indicates how the transition from a predatory to pollen-collecting life style could have taken place in the evolution of bees. This insight plus the finding that (eu)social societies evolved exclusively in a single subordinated lineage of apoid wasps provides new perspectives for future comparative studies.


Subject(s)
Bees/classification; Bees/genetics; Genomics; Phylogeny; Animals; Likelihood Functions; Sequence Analysis, DNA; Social Behavior; Transcriptome/genetics; Wasps/genetics
15.
Mol Phylogenet Evol ; 125: 100-115, 2018 08.
Article in English | MEDLINE | ID: mdl-29574273

ABSTRACT

The Balkan Peninsula constitutes a biodiversity hotspot with high levels of species richness and endemism. The complex geological history of the Balkans in conjunction with the climate evolution are hypothesized as the main drivers generating this biodiversity. We investigated the phylogeography, historical demography, and population structure of closely related wall-lizard species from the Balkan Peninsula and southeastern Europe to better understand diversification processes of species with limited dispersal ability, from Late Miocene to the Holocene. We used several analytical methods integrating genome-wide SNPs (ddRADseq), microsatellites, mitochondrial and nuclear DNA data, as well as species distribution modelling. Phylogenomic analysis resulted in a completely resolved species-level phylogeny; population-level analyses confirmed the existence of at least two cryptic evolutionary lineages and extensive within-species genetic structuring. Divergence time estimations indicated that the Messinian Salinity Crisis played a key role in shaping patterns of species divergence, whereas intraspecific genetic structuring was mainly driven by Pliocene tectonic events and Quaternary climatic oscillations. The present work highlights the effectiveness of utilizing multiple methods and data types coupled with extensive geographic sampling to uncover the evolutionary processes that shaped the species over space and time.


Subject(s)
Lizards/classification; Models, Biological; Phylogeography; Animals; Balkan Peninsula; Bayes Theorem; Biodiversity; Calibration; DNA, Mitochondrial/genetics; Genetic Variation; Genetics, Population; Genomics; Haplotypes/genetics; Lizards/genetics; Microsatellite Repeats/genetics; Phylogeny; Species Specificity
16.
Mol Phylogenet Evol ; 120: 286-296, 2018 03.
Article in English | MEDLINE | ID: mdl-29247847

ABSTRACT

Chalcidoidea are a megadiverse group of mostly parasitoid wasps of major ecological and economic importance that are omnipresent in almost all extant terrestrial habitats. The timing and pattern of chalcidoid diversification is so far poorly understood and has left many important questions on the evolutionary history of Chalcidoidea unanswered. In this study, we infer the early divergence events within Chalcidoidea and address the question of whether or not ancestral chalcidoids were small egg parasitoids. We also trace the evolution of some key traits: jumping ability, development of enlarged hind femora, and associations with figs. Our phylogenetic inference is based on the analysis of 3,239 single-copy genes across 48 chalcidoid wasps and outgroup representatives. We applied an innovative a posteriori evaluation approach to molecular clock-dating based on nine carefully validated fossils, resulting in the first molecular clock-based estimation of deep Chalcidoidea divergence times. Our results suggest a late Jurassic origin of Chalcidoidea, with a first divergence of morphologically and biologically distinct groups in the early to mid Cretaceous, between 129 and 81 million years ago (mya). Diversification of most extant lineages happened rapidly after the Cretaceous in the early Paleogene, between 75 and 53 mya. The inferred Chalcidoidea tree suggests a transition from ancestral minute egg parasitoids to larger-bodied parasitoids of other host stages during the early history of chalcidoid evolution. The ability to jump evolved independently at least three times, namely in Eupelmidae, Encyrtidae, and Tanaostigmatidae. Furthermore, the large-bodied strongly sclerotized species with enlarged hind femora in Chalcididae and Leucospidae are not closely related. Finally, the close association of some chalcidoid wasps with figs, either as pollinators, or as inquilines/gallers or as parasitoids, likely evolved at least twice independently: in the Eocene, giving rise to fig pollinators, and in the Oligocene or Miocene, resulting in non-pollinating fig-wasps, including gallers and parasitoids. The origins of very speciose lineages (e.g., Mymaridae, Eulophidae, Pteromalinae) are evenly spread across the period of chalcidoid evolution from early Cretaceous to the late Eocene. Several shifts in biology and morphology (e.g., in host exploitation, body shape and size, life history), each followed by rapid radiations, have likely enabled the evolutionary success of Chalcidoidea.


Subject(s)
Phylogeny; Transcriptome; Wasps/classification; Animals; Evolution, Molecular; Fossils; High-Throughput Nucleotide Sequencing; Ovum/metabolism; RNA/chemistry; RNA/isolation & purification; RNA/metabolism; Sequence Analysis, RNA; Wasps/genetics
17.
Am J Bot ; 105(3): 614-622, 2018 03.
Article in English | MEDLINE | ID: mdl-29603138

ABSTRACT

Providing science and society with an integrated, up-to-date, high quality, open, reproducible and sustainable plant tree of life would be a huge service that is now coming within reach. However, synthesizing the growing body of DNA sequence data in the public domain and disseminating the trees to a diverse audience are often not straightforward due to numerous informatics barriers. While big synthetic plant phylogenies are being built, they remain static and become quickly outdated as new data are published and tree-building methods improve. Moreover, the body of existing phylogenetic evidence is hard to navigate and access for non-experts. We propose that our community of botanists, tree builders, and informaticians should converge on a modular framework for data integration and phylogenetic analysis, allowing easy collaboration, updating, data sourcing and flexible analyses. With support from major institutions, this pipeline should be re-run at regular intervals, storing trees and their metadata long-term. Providing the trees to a diverse global audience through user-friendly front ends and application development interfaces should also be a priority. Interactive interfaces could be used to solicit user feedback and thus improve data quality and to coordinate the generation of new data. We conclude by outlining a number of steps that we suggest the scientific community should take to achieve global phylogenetic synthesis.


Subject(s)
Information Dissemination; Information Management; Phylogeny; Plants/genetics; DNA, Plant; Humans; Information Technology; Sequence Analysis, DNA
18.
Nucleic Acids Res ; 44(11): 5022-33, 2016 06 20.
Article in English | MEDLINE | ID: mdl-27166378

ABSTRACT

Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labor-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences ('mislabels') using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa.
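
The sensitivity and precision figures quoted in this abstract follow the usual definitions: sensitivity is the fraction of true mislabels that are flagged, and precision is the fraction of flagged sequences that are true mislabels. The sketch below shows that arithmetic on a toy example; the sequence identifiers and the function name are hypothetical, and nothing here reflects SATIVA's internals.

```python
def sensitivity_precision(true_mislabels: set, flagged: set) -> tuple:
    """Return (sensitivity, precision) for a set of flagged sequence IDs."""
    true_positives = len(true_mislabels & flagged)
    false_negatives = len(true_mislabels - flagged)
    false_positives = len(flagged - true_mislabels)
    detected = true_positives + false_negatives
    reported = true_positives + false_positives
    sensitivity = true_positives / detected if detected else 0.0
    precision = true_positives / reported if reported else 0.0
    return sensitivity, precision


# Toy example: four simulated mislabels, four sequences flagged, three in common.
simulated = {"seq1", "seq2", "seq3", "seq4"}
reported_ids = {"seq1", "seq2", "seq3", "seq9"}
print(sensitivity_precision(simulated, reported_ids))
```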


Subject(s)
Computational Biology/methods; DNA Barcoding, Taxonomic/standards; Genomics/methods; Molecular Sequence Annotation/standards; Phylogeny; Bacteria/genetics; Databases, Nucleic Acid; RNA, Ribosomal, 16S; Reproducibility of Results; Sequence Analysis, DNA; Software; Web Browser
19.
Mol Phylogenet Evol ; 116: 213-226, 2017 11.
Article in English | MEDLINE | ID: mdl-28887149

ABSTRACT

The wasp family Vespidae comprises more than 5000 described species which represent life history strategies ranging from solitary and presocial to eusocial and socially parasitic. The phylogenetic relationships of the major vespid wasp lineages (i.e., subfamilies and tribes) have been investigated repeatedly by analyzing behavioral and morphological traits as well as nucleotide sequences of few selected genes with largely incongruent results. Here we reconstruct their phylogenetic relationships using a phylogenomic approach. We sequenced the transcriptomes of 24 vespid wasp and eight outgroup species and exploited the transcript sequences for design of probes for enriching 913 single-copy protein-coding genes to complement the transcriptome data with nucleotide sequence data from additional 25 ethanol-preserved vespid species. Results from phylogenetic analyses of the combined sequence data revealed the eusocial subfamily Stenogastrinae to be the sister group of all remaining Vespidae, while the subfamily Eumeninae turned out to be paraphyletic. Of the three currently recognized eumenine tribes, Odynerini is paraphyletic with respect to Eumenini, and Zethini is paraphyletic with respect to Polistinae and Vespinae. Our results are in conflict with the current tribal subdivision of Eumeninae and thus, we suggest granting subfamily rank to the two major clades of "Zethini": Raphiglossinae and Zethinae. Overall, our findings corroborate the hypothesis of two independent origins of eusociality in vespid wasps and suggest a single origin of using masticated and salivated plant material for building nests by Raphiglossinae, Zethinae, Polistinae, and Vespinae. The inferred phylogenetic relationships and the open access vespid wasp target DNA enrichment probes will provide a valuable tool for future comparative studies on species of the family Vespidae, including their genomes, life styles, evolution of sociality, and co-evolution with other organisms.


Subject(s)
DNA/genetics; Phylogeny; Transcriptome/genetics; Wasps/classification; Wasps/genetics; Animals; Base Sequence; Open Reading Frames/genetics; RNA, Messenger/genetics; RNA, Messenger/metabolism; Sequence Analysis, RNA
20.
Bioinformatics ; 31(15): 2577-9, 2015 Aug 01.
Article in English | MEDLINE | ID: mdl-25819675

ABSTRACT

MOTIVATION: Phylogenies are increasingly used in all fields of medical and biological research. Because of the next generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. We present ExaML version 3, a dedicated production-level code for inferring phylogenies on whole-transcriptome and whole-genome alignments using supercomputers. RESULTS: We introduce several improvements and extensions to ExaML: Extensions of substitution models and supported data types, the integration of a novel load balance algorithm as well as a parallel I/O optimization that significantly improve parallel efficiency, and a production-level implementation for Intel MIC-based hardware platforms.


Subject(s)
Algorithms; Computers; Genome, Human; High-Throughput Nucleotide Sequencing/methods; Phylogeny; Sequence Analysis, DNA/methods; Software; Computer Simulation; Humans; User-Computer Interface