ABSTRACT
[This corrects the article DOI: 10.1371/journal.ppat.1009786.]
ABSTRACT
The long-term evolutionary impacts of whole-genome duplication (WGD) are strongly influenced by the ensuing rediploidization process. Following autopolyploidization, rediploidization involves a transition from tetraploid to diploid meiotic pairing, allowing duplicated genes (ohnologs) to diverge genetically and functionally. Our understanding of autopolyploid rediploidization has been informed by a WGD event ancestral to salmonid fishes, where large genomic regions are characterized by temporally delayed rediploidization, allowing lineage-specific ohnolog sequence divergence in the major salmonid clades. Here, we investigate the long-term outcomes of autopolyploid rediploidization at genome-wide resolution, exploiting a recent "explosion" of salmonid genome assemblies, including a new genome sequence for the huchen (Hucho hucho). We developed a genome alignment approach to capture duplicated regions across multiple species, allowing us to create 121,864 phylogenetic trees describing genome-wide ohnolog divergence across salmonid evolution. Using molecular clock analysis, we show that 61% of the ancestral salmonid genome experienced an initial "wave" of rediploidization in the late Cretaceous (85-106 Ma). This was followed by a period of relative genomic stasis lasting 17-39 My, where much of the genome remained tetraploid. A second rediploidization wave began in the early Eocene and proceeded alongside species diversification, generating predictable patterns of lineage-specific ohnolog divergence, scaling in complexity with the number of speciation events. Using gene set enrichment, gene expression, and codon-based selection analyses, we provide insights into potential functional outcomes of delayed rediploidization. This study enhances our understanding of delayed autopolyploid rediploidization and has broad implications for future studies of WGD events.
Subjects
Salmonidae , Animals , Molecular Evolution , Gene Duplication , Genome , Phylogeny , Salmonidae/genetics
ABSTRACT
CRF19 is a recombinant form of HIV-1 subtypes D, A1 and G, which was first sampled in Cuba in 1999, but was already present there in the 1980s. CRF19 has been reported almost exclusively in Cuba, where it accounts for ~25% of new HIV-positive patients and causes rapid progression to AIDS (~3 years). We analyzed a large data set comprising ~350 pol and env sequences sampled in Cuba over the last 15 years and ~350 from the Los Alamos database. This data set contained both CRF19 (~315) and A1, D and G sequences. We performed and combined analyses for the three A1, G and D regions, using fast maximum likelihood approaches, including: (1) phylogeny reconstruction, (2) spatio-temporal analysis of the virus spread, and ancestral character reconstruction for (3) transmission mode and (4) drug resistance mutations (DRMs). We verified these results with a Bayesian approach. This allowed us to gain new insights into the origin and transmission patterns of CRF19. We showed that CRF19 recombined between 1966 and 1977, most likely in the Cuban community stationed in the Congo region. We further investigated the spread of CRF19 at the Cuban province level, and discovered that the epidemic started in the 1970s, most probably in Villa Clara, that it was at first carried by heterosexual transmissions, and that it then quickly spread in the 1980s within the "men having sex with men" (MSM) community, with multiple transmissions back to heterosexuals. The analysis of the transmission patterns of common DRMs found very few resistance transmission clusters. Our results show a very early introduction of CRF19 in Cuba, which could explain its local epidemiological success. Ignited by a major founder event, the epidemic then followed a pattern similar to that of other subtypes and CRFs in Cuba. The reason for the short time to AIDS remains to be understood and requires specific surveillance, in Cuba and elsewhere.
Subjects
Infectious Disease Transmission/statistics & numerical data , Genetic Variation , HIV Infections/epidemiology , HIV-1/classification , Phylogeny , Bayes Theorem , Cuba/epidemiology , Female , HIV Infections/transmission , HIV Infections/virology , HIV-1/genetics , HIV-1/physiology , Humans , Male
ABSTRACT
BACKGROUND: The gaur (Bos gaurus) is the largest extant wild bovine species, native to South and Southeast Asia, with unique traits, and is listed as vulnerable by the International Union for Conservation of Nature (IUCN). RESULTS: We report the first gaur reference genome and identify three biological pathways including lysozyme activity, proton transmembrane transporter activity, and oxygen transport with significant changes in gene copy number in gaur compared to other mammals. These may reflect adaptation to challenges related to climate and nutrition. Comparative analyses with domesticated indicine (Bos indicus) and taurine (Bos taurus) cattle revealed genomic signatures of artificial selection, including the expansion of sperm odorant receptor genes in domesticated cattle, which may have important implications for understanding selection for male fertility. CONCLUSIONS: Apart from aiding dissection of economically important traits, the gaur genome will also provide the foundation to conserve the species.
Subjects
Odorant Receptors , Animals , Cattle/genetics , Genome , Genomics , Male , Mammals , Odorant Receptors/genetics , Spermatozoa , Zona Pellucida Glycoproteins
ABSTRACT
Cellular factors have important roles in all facets of the flavivirus replication cycle. Deciphering viral-host protein interactions is essential for understanding the flavivirus life cycle as well as for the development of effective antiviral strategies. To uncover novel host factors that are co-opted by multiple flaviviruses, a CRISPR/Cas9 genome-wide knockout (KO) screen was employed to identify genes required for replication of Zika virus (ZIKV). Receptor for Activated Protein C Kinase 1 (RACK1) was identified as a novel host factor required for ZIKV replication, which was confirmed via complementary experiments. Depletion of RACK1 via siRNA demonstrated that RACK1 is important for replication of a wide range of mosquito- and tick-borne flaviviruses, including West Nile Virus (WNV), Dengue Virus (DENV), Powassan Virus (POWV) and Langat Virus (LGTV) as well as the coronavirus SARS-CoV-2, but not for YFV, EBOV, VSV or HSV. Notably, flavivirus replication was only abrogated when RACK1 expression was dampened prior to infection. Utilising a non-replicative flavivirus model, we show altered morphology of viral replication factories and reduced formation of vesicle packets (VPs) in cells lacking RACK1 expression. In addition, RACK1 interacted with NS1 protein from multiple flaviviruses, a key protein for replication complex formation. Overall, these findings reveal RACK1's crucial role in the biogenesis of pan-flavivirus replication organelles. IMPORTANCE: Cellular factors are critical in all facets of viral lifecycles, where overlapping interactions between the virus and host can be exploited as possible avenues for the development of antiviral therapeutics. Using a genome-wide CRISPR knockout screening approach to identify novel cellular factors important for flavivirus replication, we identified RACK1 as a pro-viral host factor for both mosquito- and tick-borne flaviviruses in addition to SARS-CoV-2.
Using an innovative flavivirus protein expression system, we demonstrate for the first time the impact of the loss of RACK1 on the formation of viral replication factories known as 'vesicle packets' (VPs). In addition, we show that RACK1 can interact with numerous flavivirus NS1 proteins as a potential mechanism by which VP formation can be induced by the former.
Subjects
CRISPR-Cas Systems , Flavivirus/genetics , Neoplasm Proteins/genetics , Receptors for Activated C Kinase/genetics , Virus Replication , A549 Cells , Aedes , Animals , COVID-19 , Chlorocebus aethiops , Culicidae , Dengue Virus/genetics , Genome-Wide Association Study , HEK293 Cells , Host-Pathogen Interactions/genetics , Humans , Small Interfering RNA/metabolism , Viral RNA/metabolism , SARS-CoV-2 , Vero Cells , West Nile Virus/genetics , Zika Virus/genetics , Zika Virus Infection/virology
ABSTRACT
River buffalo is an agriculturally important species with many traits, such as disease tolerance, which promote its use worldwide. Highly contiguous genome assemblies of the river buffalo, goat, pig, human and two cattle subspecies were aligned to study gene gains and losses and signs of positive selection. The gene families that have changed significantly in river buffalo since divergence from cattle play important roles in protein degradation, the olfactory receptor system, detoxification and the immune system. We used the branch site model in PAML to analyse single-copy orthologs to identify positively selected genes that may be involved in skin differentiation, mammary development and bone formation in the river buffalo branch. The high contiguity of the genomes enabled evaluation of differences among species in the major histocompatibility complex. We identified a Babesia-like L1 LINE insertion in the DRB1-like gene in the river buffalo and discuss the implication of this finding.
Subjects
Buffaloes , Genome , Animals , Buffaloes/genetics , Cattle/genetics , Major Histocompatibility Complex/genetics , Phenotype , Swine
ABSTRACT
MOTIVATION: High-throughput next-generation sequencing (NGS) has become exceedingly cheap, enabling studies with large sample numbers. Quality control (QC) is an essential stage in analytic pipelines, and the outputs of popular bioinformatics tools such as FastQC and Picard can provide information on individual samples. Although these tools provide considerable power when carrying out QC, large sample numbers can make inspection of all samples and identification of systemic bias a challenge. RESULTS: We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these can be displayed in an interactive shiny app or HTML report for ease of analysis. AVAILABILITY AND IMPLEMENTATION: The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
High-Throughput Nucleotide Sequencing , Software , Bias , Quality Control
ABSTRACT
Elephantids are the world's most iconic megafaunal family, yet there is no comprehensive genomic assessment of their relationships. We report a total of 14 genomes, including 2 from the American mastodon, which is an extinct elephantid relative, and 12 spanning all three extant and three extinct elephantid species including an ~120,000-y-old straight-tusked elephant, a Columbian mammoth, and woolly mammoths. Earlier genetic studies modeled elephantid evolution via simple bifurcating trees, but here we show that interspecies hybridization has been a recurrent feature of elephantid evolution. We found that the genetic makeup of the straight-tusked elephant, previously placed as a sister group to African forest elephants based on lower coverage data, in fact comprises three major components. Most of the straight-tusked elephant's ancestry derives from a lineage related to the ancestor of African elephants while its remaining ancestry consists of a large contribution from a lineage related to forest elephants and another related to mammoths. Columbian and woolly mammoths also showed evidence of interbreeding, likely following a latitudinal cline across North America. While hybridization events have shaped elephantid history in profound ways, isolation also appears to have played an important role. Our data reveal nearly complete isolation between the ancestors of the African forest and savanna elephants for ~500,000 y, providing compelling justification for the conservation of forest and savanna elephants as separate species.
Subjects
Elephants/genetics , Mammoths/genetics , Mastodons/genetics , Animals , Elephants/classification , Molecular Evolution , Biological Extinction , Fossils , Gene Flow , Genome , Genomics/history , Ancient History , Mammoths/classification , Mastodons/classification , Phylogeny
ABSTRACT
BACKGROUND: Recently developed genome resources in salmonid fish provide tools for studying the genomics underlying a wide range of properties including life history trait variation in the wild, economically important traits in aquaculture and the evolutionary consequences of whole genome duplications. Although genome assemblies now exist for a number of salmonid species, the lack of regulatory annotations is holding back our mechanistic understanding of how genetic variation in non-coding regulatory regions affects gene expression and the downstream phenotypic effects. RESULTS: We present SalMotifDB, a database and associated web and R interface for the analysis of transcription factors (TFs) and their cis-regulatory binding sites in five salmonid genomes. SalMotifDB integrates TF-binding site information for 3072 non-redundant DNA patterns (motifs) assembled from a large number of metazoan motif databases. Through motif matching and TF prediction, we have used these multi-species databases to construct putative regulatory networks in salmonid species. The utility of SalMotifDB is demonstrated by showing that key lipid metabolism regulators are predicted to regulate a set of genes affected by different lipid and fatty acid content in the feed, and by showing that our motif database explains a significant proportion of gene expression divergence in gene duplicates originating from the salmonid-specific whole genome duplication. CONCLUSIONS: SalMotifDB is an effective tool for analyzing transcription factors, their binding sites and the resulting gene regulatory networks in salmonid species, and will be an important tool for gaining a better mechanistic understanding of gene regulation and the associated phenotypes in salmonids. SalMotifDB is available at https://salmobase.org/apps/SalMotifDB .
Subjects
Genetic Databases , Genomics/methods , Salmonidae/genetics , Transcription Factors/metabolism , Animals , DNA/chemistry , Gene Duplication/genetics , Gene Regulatory Networks , Lipid Metabolism/genetics , Nucleotide Motifs , Protein Binding
ABSTRACT
PCDH19-Girls Clustering Epilepsy (PCDH19-GCE) is a childhood epileptic encephalopathy characterised by a spectrum of neurodevelopmental problems. PCDH19-GCE is caused by heterozygous loss-of-function mutations in the X-chromosome gene Protocadherin 19 (PCDH19), which encodes a cell-cell adhesion molecule. Intriguingly, hemizygous males are generally unaffected. As PCDH19 is subjected to random X-inactivation, heterozygous females comprise a mosaic of cells expressing either the normal or mutant allele, which is thought to drive pathology. Despite being the second most prevalent monogenic cause of epilepsy, little is known about the role of PCDH19 in brain development. In this study we show that PCDH19 is highly expressed in human neural stem and progenitor cells (NSPCs) and investigate its function in vitro in these cells of both mouse and human origin. Transcriptomic analysis of mouse NSPCs lacking Pcdh19 revealed changes to genes involved in regulation of neuronal differentiation, and we subsequently show that loss of Pcdh19 causes increased NSPC neurogenesis. We reprogrammed human fibroblast cells harbouring a pathogenic PCDH19 mutation into human induced pluripotent stem cells (hiPSC) and employed neural differentiation of these to extend our studies into human NSPCs. As in mouse, loss of PCDH19 function caused increased neurogenesis, and furthermore, we show this is associated with a loss of human NSPC polarity. Overall, our data suggest a conserved role for PCDH19 in regulating mammalian cortical neurogenesis and have implications for the pathogenesis of PCDH19-GCE. We propose that the difference in timing or "heterochrony" of neuronal cell production originating from PCDH19 wildtype and mutant NSPCs within the same individual may lead to downstream asynchronies and abnormalities in neuronal network formation, which in part predispose the individual to network dysfunction and epileptic activity.
Subjects
Cadherins/biosynthesis , Epilepsy/metabolism , Induced Pluripotent Stem Cells/metabolism , Neural Stem Cells/metabolism , Neurogenesis/physiology , Animals , Cadherins/genetics , Cultured Cells , Cluster Analysis , Epilepsy/pathology , Female , Humans , Induced Pluripotent Stem Cells/pathology , Male , Mice , Knockout Mice , Neural Stem Cells/pathology , Protocadherins
ABSTRACT
Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley-Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through time. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older than its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley-Fitch method, and BEAST).
Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much shorter. We apply these algorithms to a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again, the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3.
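As a rough illustration of the simplest comparison method named above (root-to-tip regression), the following Python sketch fits a straight line through tip sampling dates and root-to-tip distances by ordinary least squares. It is not the LSD algorithm itself, which works on all branches of the tree; the function name and the example numbers are hypothetical.

```python
def root_to_tip_regression(dates, distances):
    """Fit distance = rate * (date - root_date) by ordinary least squares.

    dates:     sampling dates of the tips (e.g. decimal years)
    distances: root-to-tip distances of the same tips (substitutions/site)
    Returns (rate, root_date). Hypothetical helper, for illustration only.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two tips with matching inputs")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        raise ValueError("all tips share one date; the rate is not identifiable")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx                      # slope = substitution rate
    if rate == 0:
        raise ValueError("zero slope; the root date is undefined")
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate         # date at which the distance is zero
    return rate, root_date


# Made-up serial samples lying exactly on a clock with rate 0.002 per year.
rate, root_date = root_to_tip_regression([2000, 2005, 2010], [0.02, 0.03, 0.04])
```

With serially sampled tips, the slope estimates the substitution rate and the x-intercept estimates the root date; real data would scatter around the line rather than lie on it.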
Subjects
Algorithms , Influenza A Virus H1N1 Subtype/classification , Phylogeny , Computer Simulation , Molecular Evolution , Influenza A Virus H1N1 Subtype/genetics , Least-Squares Analysis , Genetic Models , Software
ABSTRACT
BACKGROUND: Given a gene and a species tree, reconciliation methods attempt to retrieve the macro-evolutionary events that best explain the discrepancies between the two tree topologies. The DTL parsimonious approach searches for a most parsimonious reconciliation between a gene tree and a (dated) species tree, considering four possible macro-evolutionary events (speciation, duplication, transfer, and loss) with specific costs. Unfortunately, many events are erroneously predicted due to errors in the input trees, inappropriate input cost values or because of the existence of several equally parsimonious scenarios. It is thus crucial to provide a measure of the reliability for predicted events. It has been recently proposed that the reliability of an event can be estimated via its frequency in the set of most parsimonious reconciliations obtained using a variety of reasonable input cost vectors. To compute such a support, a straightforward but time-consuming approach is to generate the costs slightly departing from the original ones, independently compute the set of all most parsimonious reconciliations for each vector, and combine these sets a posteriori. Another proposed approach uses Pareto-optimality to partition cost values into regions which induce reconciliations with the same number of DTL events. The support of an event is then defined as its frequency in the set of regions. However, often, the number of regions is not large enough to provide reliable supports. RESULTS: We present here a method to compute efficiently event supports via a polynomial-sized graph, which can represent all reconciliations for several different costs. Moreover, two methods are proposed to take into account alternative input costs: either explicitly providing an input cost range or allowing a tolerance for the over cost of a reconciliation. 
Our methods are faster than the region based method, substantially faster than the sampling-costs approach, and have a higher event-prediction accuracy on simulated data. CONCLUSIONS: We propose a new approach to improve the accuracy of event supports for parsimonious reconciliation methods to account for uncertainty in the input costs. Furthermore, because of their speed, our methods can be used on large gene families. Our algorithms are implemented in the ecceTERA program, freely available from http://mbb.univ-montp2.fr/MBB/.
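The notion of event support used above (the frequency of an event in a set of most parsimonious reconciliations) can be sketched naively in Python as follows. This sketch assumes the reconciliations have already been enumerated, which is exactly the costly step the graph-based method is designed to avoid; the event encoding as tuples and the function name are hypothetical.

```python
from collections import Counter


def event_supports(reconciliations):
    """Return {event: fraction of reconciliations that contain it}.

    reconciliations: non-empty list of iterables of hashable events, e.g.
    ("D", "g3", "speciesA") for a duplication of gene node g3 in speciesA.
    The encoding is an assumption made for this illustration.
    """
    if not reconciliations:
        raise ValueError("at least one reconciliation is required")
    counts = Counter()
    for rec in reconciliations:
        counts.update(set(rec))           # count each event once per scenario
    total = len(reconciliations)
    return {event: c / total for event, c in counts.items()}


# Three made-up equally parsimonious scenarios for the same gene family.
recs = [
    {("D", "g1", "A"), ("T", "g2", "B")},
    {("D", "g1", "A"), ("L", "g2", "C")},
    {("D", "g1", "A"), ("T", "g2", "B")},
]
supports = event_supports(recs)
```

Here the duplication appears in every scenario and gets support 1.0, while the transfer appears in two of three scenarios.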
Subjects
Molecular Evolution , Phylogeny , Proteobacteria/genetics , Algorithms , Computer Simulation , Bacterial Genes , Reproducibility of Results
ABSTRACT
Reconciliation methods explain topology differences between a species tree and a gene tree by evolutionary events other than speciations. However, not all phylogenies are trees: hybridization can occur and create new species, which results in reticulate phylogenies. Here, we consider the problem of reconciling a gene tree with a species network via duplication and loss events. Two variants are proposed and solved with efficient algorithms: the first one finds the best tree in the network with which to reconcile the gene tree, and the second one finds the best reconciliation between the gene tree and the whole network.
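For orientation, the classical tree-only case can be sketched in Python with the standard LCA mapping, counting duplications and losses for a binary gene tree against a binary species tree. This is not one of the network algorithms described above; the nested-tuple tree encoding, the convention that gene leaves are labelled by their species, and the function name are assumptions made for this sketch.

```python
def reconcile_lca(gene_tree, species_tree):
    """Count (duplications, losses) under the LCA mapping.

    Trees are nested 2-tuples; leaves are species names (strings).
    Gene-tree leaves carry the name of the species they were sampled from.
    """
    parent, depth = {}, {}

    def index(node, par, d):
        parent[node], depth[node] = par, d
        if isinstance(node, tuple):
            for child in node:
                index(child, node, d + 1)

    index(species_tree, None, 0)

    def lca(a, b):
        while depth[a] > depth[b]:
            a = parent[a]
        while depth[b] > depth[a]:
            b = parent[b]
        while a != b:
            a, b = parent[a], parent[b]
        return a

    dups = losses = 0

    def visit(g):
        nonlocal dups, losses
        if not isinstance(g, tuple):
            return g                      # a leaf maps to its own species
        left, right = (visit(child) for child in g)
        m = lca(left, right)
        if m == left or m == right:       # duplication node
            dups += 1
            losses += (depth[left] - depth[m]) + (depth[right] - depth[m])
        else:                             # speciation node
            losses += (depth[left] - depth[m] - 1) + (depth[right] - depth[m] - 1)
        return m

    visit(gene_tree)
    return dups, losses


species = (("A", "B"), "C")
print(reconcile_lca((("A", "C"), "B"), species))  # prints (1, 3)
```

A gene tree with the same topology as the species tree needs no duplication or loss, whereas the conflicting topology in the example is explained by one duplication at the root followed by three losses.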
Subjects
Molecular Evolution , Gene Duplication/genetics , Genetic Speciation , Phylogeny , Algorithms , Genomics
ABSTRACT
Proteins are under selection to maintain central functions and to accommodate needs that arise in ever-changing environments. The positive selection and neutral drift that preserve functions result in a diversity of protein variants. The amount of diversity differs between proteins: multifunctional or disease-related proteins tend to have fewer variants than proteins involved in some aspects of immunity. Our work focuses on the extensively studied protein Vitellogenin (Vg), which in honey bees (Apis mellifera) is multifunctional and highly expressed and plays roles in immunity. Yet, almost nothing is known about the natural variation in the coding sequences of this protein or how amino acid-altering variants might impact structure-function relationships. Here, we map out allelic variation in honey bee Vg using biological samples from 15 countries. The successful barcoded amplicon Nanopore sequencing of 543 bees revealed 121 protein variants, indicating a high level of diversity in Vg. We find that the distribution of non-synonymous single nucleotide polymorphisms (nsSNPs) differs between protein regions with different functions; domains involved in DNA and protein-protein interactions contain fewer nsSNPs than the protein's lipid binding cavities. We outline how the central functions of the protein can be maintained in different variants and how the variation pattern may inform about selection from pathogens and nutrition.
Subjects
Vitellogenins , Amino Acid Sequence , Animals , Bees/genetics , Vitellogenins/genetics , Vitellogenins/metabolism
ABSTRACT
For a given set L of species and a set T of triplets on L, we seek to construct a phylogenetic network which is consistent with T, i.e., which represents all triplets of T. The level of a network is defined as the maximum number of hybrid vertices in its biconnected components. When T is dense, there exist polynomial time algorithms to construct level-0, level-1 and level-2 networks (Aho et al., 1981; Jansson, Nguyen and Sung, 2006; Jansson and Sung, 2006; Iersel et al., 2009). For higher levels, partial answers were obtained in the paper by Iersel and Kelk (2008), with a polynomial time algorithm for simple networks. In this paper, we detail the first complete answer for the general case, solving a problem proposed in Jansson and Sung (2006) and Iersel et al. (2009). For any fixed k, it is possible to construct a level-k network having the minimum number of hybrid vertices and consistent with T, if there is any, in time $O(|T|^{k+1} n^{\lfloor 4k/3 \rfloor + 1})$.
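To make the notion of consistency concrete, the following Python sketch checks it in the simplest setting, a level-0 network, that is, a rooted tree. A triplet ab|c is taken to be represented when some subtree contains a and b but not c. It does not construct networks and does not handle hybrid vertices; the nested-tuple encoding and the function names are assumptions made for this sketch.

```python
def clusters(tree):
    """Return (all leaves, list of leaf sets of the internal nodes)."""
    found = []

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaf_set = frozenset().union(*(leaves(child) for child in node))
        found.append(leaf_set)
        return leaf_set

    return leaves(tree), found


def is_consistent(tree, triplets):
    """True if the rooted tree represents every triplet (a, b, c), read ab|c."""
    all_leaves, leaf_sets = clusters(tree)
    return all(
        {a, b, c} <= all_leaves
        and any(a in s and b in s and c not in s for s in leaf_sets)
        for a, b, c in triplets
    )


tree = (("a", "b"), ("c", "d"))
print(is_consistent(tree, [("a", "b", "c"), ("c", "d", "a")]))  # prints True
print(is_consistent(tree, [("a", "c", "b")]))                   # prints False
```

The second call fails because the only subtree containing both a and c is the whole tree, which also contains b.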