ABSTRACT
High-throughput sequencing projects generate genome-scale sequence data for species-level phylogenies1-3. However, state-of-the-art Bayesian methods for inferring timetrees are computationally limited to small datasets and cannot exploit the growing number of available genomes4. In the case of mammals, molecular-clock analyses of limited datasets have produced conflicting estimates of clade ages with large uncertainties5,6, and thus the timescale of placental mammal evolution remains contentious7-10. Here we develop a Bayesian molecular-clock dating approach to estimate a timetree of 4,705 mammal species integrating information from 72 mammal genomes. We show that increasingly larger phylogenomic datasets produce diversification time estimates with progressively smaller uncertainties, facilitating precise tests of macroevolutionary hypotheses. For example, we confidently reject an explosive model of placental mammal origination in the Palaeogene8 and show that crown Placentalia originated in the Late Cretaceous with unambiguous ordinal diversification in the Palaeocene/Eocene. Our Bayesian methodology facilitates analysis of complete genomes and thousands of species within an integrated framework, making it possible to address hitherto intractable research questions on species diversifications. This approach can be used to address other contentious cases of animal and plant diversifications that require analysis of species-level phylogenomic datasets.
Subject(s)
Molecular Evolution, Mammals, Phylogeny, Animals, Bayes Theorem, Eutheria/classification, Eutheria/genetics, Female, Mammals/classification, Mammals/genetics, Placenta, Pregnancy, Species Specificity
ABSTRACT
Dire wolves are considered to be one of the most common and widespread large carnivores in Pleistocene America1, yet relatively little is known about their evolution or extinction. Here, to reconstruct the evolutionary history of dire wolves, we sequenced five genomes from sub-fossil remains dating from 13,000 to more than 50,000 years ago. Our results indicate that although they were similar morphologically to the extant grey wolf, dire wolves were a highly divergent lineage that split from living canids around 5.7 million years ago. In contrast to numerous examples of hybridization across Canidae2,3, there is no evidence for gene flow between dire wolves and either North American grey wolves or coyotes. This suggests that dire wolves evolved in isolation from the Pleistocene ancestors of these species. Our results also support an early New World origin of dire wolves, while the ancestors of grey wolves, coyotes and dholes evolved in Eurasia and colonized North America only relatively recently.
Subject(s)
Biological Extinction, Phylogeny, Wolves/classification, Animals, Fossils, Gene Flow, Genome/genetics, Genomics, Geographic Mapping, North America, Paleontology, Phenotype, Wolves/genetics
ABSTRACT
The CODEML program in the PAML package has been widely used to analyze protein-coding gene sequences to estimate the synonymous and nonsynonymous rates (dS and dN) and to detect positive Darwinian selection driving protein evolution. For users not familiar with molecular evolutionary analysis, the program is known to have a steep learning curve. Here, we provide a step-by-step protocol to illustrate the commonly used tests available in the program, including the branch models, the site models, and the branch-site models, which can be used to detect positive selection driving adaptive protein evolution affecting particular lineages of the species phylogeny, affecting a subset of amino acid residues in the protein, and affecting a subset of sites along prespecified lineages, respectively. A data set of the myxovirus (Mx) genes from ten mammal and two bird species is used as an example. We discuss a new feature in CODEML that allows users to perform positive selection tests for multiple genes for the same set of taxa, as is common in modern genome-sequencing projects. The PAML package is distributed at https://github.com/abacus-gene/paml under the GNU license, with support provided at its discussion site (https://groups.google.com/g/pamlsoftware). Data files used in this protocol are available at https://github.com/abacus-gene/paml-tutorial.
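Tests of the kind listed above are usually carried out as likelihood-ratio tests between a pair of nested models. The following is a minimal Python sketch of that calculation, not code from PAML. The function names and the log-likelihood values are hypothetical, and the chi-square tail probability is written in closed form for 1 or 2 degrees of freedom only. Real analyses should take the log-likelihoods from the CODEML output and follow the protocol's guidance on degrees of freedom and null distributions.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        # A chi-square with 1 df is the square of a standard normal.
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # A chi-square with 2 df is an exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports df = 1 or 2 only")


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (test statistic, p-value) for two nested models."""
    # The alternative model cannot fit worse than the null in theory;
    # clamp small negative values caused by imperfect optimisation.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a null and an alternative model.
    stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
    print(f"2*delta(lnL) = {stat:.2f}, p = {p:.4f}")
```

With these made-up values the statistic is 10 and the p-value is exp(-5), about 0.0067, so the null model would be rejected at the 1% level.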
Subject(s)
Molecular Evolution, Software, Animals, Codon, Base Sequence, Genetic Selection, Phylogeny, Mammals/genetics
ABSTRACT
Virtual screening of large chemical libraries is essential to support computer-aided drug development, providing a rapid and low-cost way to select compounds for further experimental validation. However, existing computational packages are often aimed at specialised users or limited to particular platforms. Previously, we developed VSpipe, an open-source semi-automated pipeline for structure-based virtual screening. We have now improved and expanded the initial command-line version into an interactive graphical user interface: VSpipe-GUI, a cross-platform open-source Python toolkit that runs on various operating systems (e.g., Linux distributions, Windows, and Mac OS X). The new implementation is more user-friendly and accessible, and considerably faster than the previous version when AutoDock Vina is used for docking. Importantly, we have introduced a new compound selection module (i.e., spatial filtering) that allows filtering of docked compounds based on specified features at the target binding site. We have tested the new VSpipe-GUI on the Hepatitis C Virus NS3 (HCV NS3) protease as the target protein. The pocket-based and interaction-based modes of the spatial filtering module showed efficient and specific selection of ligands from the virtual screening that interact with the HCV NS3 catalytic serine 139.
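The idea of selecting docked compounds by their contact with a chosen binding-site feature can be illustrated with a short Python sketch. This is not the VSpipe-GUI implementation: the function names, the 4.0 Å cutoff, the coordinates, and the choice of a single reference atom (standing in for something like a catalytic serine side-chain oxygen) are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def min_distance(atoms: List[Coord], target: Coord) -> float:
    """Smallest Euclidean distance from any ligand atom to the target atom."""
    return min(math.dist(atom, target) for atom in atoms)


def filter_by_contact(
    poses: Dict[str, List[Coord]], target: Coord, cutoff: float = 4.0
) -> List[str]:
    """Names of ligands with at least one atom within `cutoff` of the target."""
    return sorted(
        name
        for name, atoms in poses.items()
        if atoms and min_distance(atoms, target) <= cutoff
    )


if __name__ == "__main__":
    # Hypothetical docked poses (ligand name -> atom coordinates in angstroms).
    poses = {
        "lig_a": [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
        "lig_b": [(10.0, 0.0, 0.0)],
    }
    reference_atom = (1.0, 0.0, 0.0)  # hypothetical binding-site atom
    print(filter_by_contact(poses, reference_atom))
```

A pocket-based variant could apply the same test against the centre of a user-defined box or sphere instead of a single atom.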
Subject(s)
Hepatitis C, Software, Humans, Proteins/chemistry, Binding Sites, Hepacivirus, Ligands, User-Computer Interface, Molecular Docking Simulation
ABSTRACT
The evolution of cetaceans, from their early transition to an aquatic lifestyle to their subsequent diversification, has been the subject of numerous studies. However, although the higher-level relationships among cetacean families have been largely settled, several aspects of the systematics within these groups remain unresolved. Problematic clades include the oceanic dolphins (37 spp.), which have experienced a recent rapid radiation, and the beaked whales (22 spp.), which have not been investigated in detail using nuclear loci. The combined application of high-throughput sequencing with techniques that target specific genomic sequences provides a powerful means of rapidly generating large volumes of orthologous sequence data for use in phylogenomic studies. To elucidate the phylogenetic relationships within the Cetacea, we combined sequence capture with Illumina sequencing to generate data for ~3200 protein-coding genes for 68 cetacean species and their close relatives including the pygmy hippopotamus. By combining data from >38,000 exons with existing sequences from 11 cetaceans and seven outgroup taxa, we produced the first comprehensive comparative genomic data set for cetaceans, spanning 6,527,596 aligned base pairs (bp) and 89 taxa. Phylogenetic trees reconstructed with maximum likelihood and Bayesian inference of concatenated loci, as well as with coalescence analyses of individual gene trees, produced mostly concordant and well-supported trees. Our results completely resolve the relationships among beaked whales as well as the contentious relationships among oceanic dolphins, especially the problematic subfamily Delphininae. We carried out Bayesian estimation of species divergence times using MCMCTree and compared our complete data set to a subset of clocklike genes. Analyses using the complete data set consistently showed less variance in divergence times than the reduced data set. In addition, integration of new fossils (e.g., Mystacodon selenensis) indicates that the diversification of Crown Cetacea began before the Late Eocene and the divergence of Crown Delphinidae as early as the Middle Miocene. [Cetaceans; phylogenomics; Delphinidae; Ziphiidae; dolphins; whales.]
Subject(s)
Cetacea/classification, Cetacea/genetics, Phylogeny, Animals, Biodiversity, Classification, High-Throughput Nucleotide Sequencing, Species Specificity
ABSTRACT
Discrete morphological data have been widely used to study species evolution, but the use of quantitative (or continuous) morphological characters is less common. Here, we implement a Bayesian method to estimate species divergence times using quantitative characters. Quantitative character evolution is modeled using Brownian diffusion with character correlation and character variation within populations. Through simulations, we demonstrate that ignoring the population variation (or population "noise") and the correlation among characters leads to biased estimates of divergence times and rate, especially if the correlation and population noise are high. We apply our new method to the analysis of quantitative characters (cranium landmarks) and molecular data from carnivoran mammals. Our results show that time estimates are affected by whether the correlations and population noise are accounted for or ignored in the analysis. The estimates are also affected by the type of data analyzed: analyses of morphological characters only, molecular data only, or a combination of both show noticeable differences among the time estimates. Rate variation of morphological characters among the carnivoran species appears to be very high, with Bayesian model selection indicating that the independent-rates model fits the morphological data better than the autocorrelated-rates model. We suggest that using morphological continuous characters, together with molecular data, can bring a new perspective to the study of species evolution. Our new model is implemented in the MCMCtree computer program for Bayesian inference of divergence times.
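To make the roles of character correlation and population noise concrete, the following Python sketch simulates two correlated characters evolving by Brownian diffusion along a single branch and then adds within-population variation to the sampled values. It is an illustration under stated assumptions, not the model as implemented in MCMCtree: the function names and parameter values are hypothetical, only two characters are considered, and the noise is assumed to be independent across characters, which may differ from the published model.

```python
import math
import random
from typing import List, Tuple


def tip_covariance(
    rate: float, time: float, rho: float, noise_var: float
) -> List[List[float]]:
    """Covariance of two sampled characters after `time` under these assumptions."""
    drift = rate * time  # Brownian variance accumulated along the branch
    return [
        [drift + noise_var, rho * drift],
        [rho * drift, drift + noise_var],
    ]


def simulate_tip(
    ancestor: Tuple[float, float],
    rate: float,
    time: float,
    rho: float,
    noise_var: float,
    rng: random.Random,
) -> Tuple[float, float]:
    """Draw one sampled specimen at the tip of a branch of length `time`."""
    scale = math.sqrt(rate * time)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # Cholesky factor of [[1, rho], [rho, 1]] induces the correlation.
    d1 = scale * z1
    d2 = scale * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    # Within-population noise, assumed independent for each character.
    sd = math.sqrt(noise_var)
    e1, e2 = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
    return ancestor[0] + d1 + e1, ancestor[1] + d2 + e2


if __name__ == "__main__":
    rng = random.Random(1)
    print(tip_covariance(rate=0.5, time=4.0, rho=0.8, noise_var=0.3))
    print(simulate_tip((0.0, 0.0), 0.5, 4.0, 0.8, 0.3, rng))
```

In this sketch the noise inflates the variances but not the covariance, so a method that ignores it would attribute the extra variance to a higher rate or a longer branch, and would underestimate the correlation.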
Subject(s)
Biodiversity, Carnivora/classification, Classification/methods, Phylogeny, Animals, Bayes Theorem, Carnivora/anatomy & histology, Biological Models
ABSTRACT
The explosive growth of molecular sequence data has made it possible to estimate species divergence times under relaxed-clock models using genome-scale data sets with many gene loci. To improve model realism and to best extract the information about relative divergence times in the sequence data, it is important to account for the heterogeneity in the evolutionary process across genes or genomic regions. Partitioning is a commonly used approach to achieve those goals. We group sites that have similar evolutionary characteristics into the same partition and those with different characteristics into different partitions, and then use different models or different values of model parameters for different partitions to account for the among-partition heterogeneity. However, how to partition data in practical phylogenetic analysis, and in particular in relaxed-clock dating analysis, is more art than science. Here, we use computer simulation and real data analysis to study the impact of the partition scheme on divergence time estimation. The partition schemes had relatively minor effects on the accuracy of posterior time estimates when the prior assumptions were correct and the clock was not seriously violated, but showed large differences when the clock was seriously violated, when the fossil calibrations were in conflict or incorrect, or when the rate prior was mis-specified. Concatenation produced the widest posterior intervals with the least precision. Use of many partitions increased the precision, as predicted by the infinite-sites theory, but the posterior intervals might fail to include the true ages because of the conflicting fossil calibrations or mis-specified rate priors. We analyzed a data set of 78 plastid genes from 15 plant species with serious clock violation and showed that time estimates differed significantly among partition schemes, irrespective of the rate drift model used. Multiple and precise fossil calibrations reduced the differences among partition schemes and were important to improving the precision of divergence time estimates. While the use of many partitions is an important approach to reducing the uncertainty in posterior time estimates, we do not recommend its general use for the present, given the limitations of current models of rate drift for partitioned data and the challenges of interpreting the fossil evidence to construct accurate and informative calibrations.
Subject(s)
Classification/methods, Genetic Speciation, Fossils, Plants/classification, Plants/genetics, Plastids/genetics, Reproducibility of Results, Time
ABSTRACT
Fungal diseases are a serious health burden worldwide, with drug resistance compromising the efficacy of the limited arsenal of antifungals available. New drugs with novel mechanisms of action are desperately needed to overcome current challenges. The screening of the Aspergillus fumigatus genome identified 35 phosphatases, four of which were previously reported as essential for viability. In addition, we validated another three essential phosphatases. Phosphatases control critical events in fungi, from cell wall integrity to cell cycle, and are therefore attractive targets for drug development. We used VSpipe v1.0, a virtual screening pipeline, to evaluate the druggability of the seven essential phosphatases and identify starting points for drug discovery. Targeted virtual screening and evaluation of the ligand efficiency plots created by VSpipe enabled us to define the most favourable chemical space for drug development and suggested different modes of inhibition for each phosphatase. Interestingly, the identified ligand binding sites match functional sites (active site and protein interaction sites) reported for other yeast and human homologues. Thus, the VSpipe virtual screening approach identified both druggable and functional sites in these essential phosphatases for further experimental validation and antifungal drug development.
Subject(s)
Aspergillus fumigatus/enzymology, Fungal Proteins/genetics, Fungal Genome, Phosphoric Monoester Hydrolases/genetics, DNA Sequence Analysis, Software, Aspergillus fumigatus/genetics, Cell Cycle/genetics
ABSTRACT
The use of computational tools for virtual screening provides a cost-efficient approach to select starting points for drug development. We have developed VSpipe, a user-friendly semi-automated pipeline for structure-based virtual screening. VSpipe uses the existing tools AutoDock and OpenBabel, together with software developed in-house, to create an end-to-end virtual screening workflow ranging from the preparation of receptor and ligands to the visualisation of results. VSpipe is efficient and flexible, allowing users to make choices at different steps, and it can be run in both local and cluster mode. We have validated VSpipe using the human protein tyrosine phosphatase PTP1B as a case study. Using a combination of blind and targeted docking, VSpipe identified both new and known functional ligand binding sites. Assessment of different binding clusters using the ligand efficiency plots created by VSpipe defined a drug-like chemical space for development of PTP1B inhibitors, with potential applications to other PTPs. In this study, we show that VSpipe can be deployed to identify and compare different modes of inhibition, thus guiding the selection of initial hits for drug discovery.
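Ligand efficiency is commonly defined as the binding free energy per heavy (non-hydrogen) atom, with the sign flipped so that larger values are better. The Python sketch below applies that common definition to docking scores. It is an assumption that this matches the quantity plotted by VSpipe, and the function names, compound names, scores, and atom counts are hypothetical.

```python
from typing import Dict, List, Tuple


def ligand_efficiency(score_kcal_mol: float, heavy_atoms: int) -> float:
    """Ligand efficiency: minus the docking score divided by the heavy-atom count."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be a positive integer")
    return -score_kcal_mol / heavy_atoms


def rank_by_efficiency(
    hits: Dict[str, Tuple[float, int]]
) -> List[Tuple[str, float]]:
    """Sort compounds from highest to lowest ligand efficiency."""
    scored = [
        (name, ligand_efficiency(score, n_heavy))
        for name, (score, n_heavy) in hits.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical hits: name -> (docking score in kcal/mol, heavy-atom count).
    hits = {
        "cmpd_1": (-9.0, 30),
        "cmpd_2": (-7.5, 15),
        "cmpd_3": (-10.0, 50),
    }
    for name, le in rank_by_efficiency(hits):
        print(f"{name}: LE = {le:.2f} kcal/mol per heavy atom")
```

In this made-up set, the compound with the best raw score (cmpd_3) has the lowest efficiency, which is the kind of trade-off that a ligand efficiency plot is meant to expose.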
Subject(s)
Computational Biology/methods, Drug Discovery/methods, Software, Allosteric Regulation, Binding Sites, Catalytic Domain, Humans, Ligands, Molecular Models, Molecular Conformation, Protein Binding, Protein Tyrosine Phosphatases/antagonists & inhibitors, Protein Tyrosine Phosphatases/chemistry, Quantitative Structure-Activity Relationship
ABSTRACT
The nature of the last universal common ancestor (LUCA), its age and its impact on the Earth system have been the subject of vigorous debate across diverse disciplines, often based on disparate data and methods. Age estimates for LUCA are usually based on the fossil record, varying with every reinterpretation. The nature of LUCA's metabolism has proven equally contentious, with some attributing all core metabolisms to LUCA, whereas others reconstruct a simpler life form dependent on geochemistry. Here we infer that LUCA lived ~4.2 Ga (4.09-4.33 Ga) through divergence time analysis of pre-LUCA gene duplicates, calibrated using microbial fossils and isotope records under a new cross-bracing implementation. Phylogenetic reconciliation suggests that LUCA had a genome of at least 2.5 Mb (2.49-2.99 Mb), encoding around 2,600 proteins, comparable to modern prokaryotes. Our results suggest LUCA was a prokaryote-grade anaerobic acetogen that possessed an early immune system. Although LUCA is sometimes perceived as living in isolation, we infer LUCA to have been part of an established ecological system. The metabolism of LUCA would have provided a niche for other microbial community members and hydrogen recycling by atmospheric photochemistry could have supported a modestly productive early ecosystem.
Subject(s)
Archaea, Bacteria, Earth (Planet), Bacteria/genetics, Bacteria/classification, Bacteria/metabolism, Archaea/genetics, Archaea/classification, Phylogeny, Fossils, Biological Evolution
ABSTRACT
Pathogenic bacteria use specific host factors to modulate virulence and stress responses during infection. We found previously that the host factor bile and the bile component glyco-conjugated cholate (NaGCH, sodium glycocholate) upregulate the colonization factor CS5 in enterotoxigenic Escherichia coli (ETEC). To further understand the global regulatory effects of bile and NaGCH, we performed Illumina RNA-Seq and found that crude bile and NaGCH altered the expression of 61 genes in CS5 + CS6 ETEC isolates. The most striking finding was high induction of the CS5 operon (csfA-F), its putative transcription factor csvR, and the putative ETEC virulence factor cexE. iTRAQ-coupled LC-MS/MS proteomic analyses verified induction of the plasmid-borne virulence proteins CS5 and CexE and also showed that NaGCH affected the expression of bacterial membrane proteins. Furthermore, NaGCH induced bacteria to aggregate, increased their adherence to epithelial cells, and reduced their motility. Our results indicate that CS5 + CS6 ETEC use NaGCH present in the small intestine as a signal to initiate colonization of the epithelium.