ABSTRACT
The 2009 H1N1 pandemic (pdm09) lineage of influenza A virus (IAV) crosses interspecies barriers with frequent human-to-swine spillovers each year. These spillovers reassort and drift within swine populations, leading to genetically and antigenically novel IAV that represent a zoonotic threat. We quantified interspecies transmission of the pdm09 lineage and its persistence in swine, and identified how evolution in swine impacted zoonotic risk. Human and swine pdm09 case counts between 2010 and 2020 were correlated, and human pdm09 burden and circulation directly impacted the detection of pdm09 in pigs. However, there was a relative absence of pdm09 circulation in humans during the 2020-21 season that was not reflected in swine. During the 2020-21 season, most swine pdm09 detections originated from human-to-swine spillovers from the 2018-19 and 2019-20 seasons that persisted in swine. We identified contemporary swine pdm09 representatives of each persistent spillover and quantified cross-reactivity between human seasonal H1 vaccine strains and the swine strains using a panel of monovalent ferret antisera in hemagglutination inhibition (HI) assays. The swine pdm09s had variable antigenic reactivity to vaccine antisera, but each swine pdm09 clade exhibited significant reduction in cross-reactivity to one or more of the human seasonal vaccine strains. Further supporting zoonotic risk, we showed phylogenetic evidence for 17 swine-to-human transmission events of pdm09 from 2010 to 2021, 11 of which were not previously classified as variants, with each of the zoonotic cases associated with persistent circulation of pdm09 in pigs. These data demonstrate that reverse-zoonoses and evolution of pdm09 in swine result in viruses that are capable of zoonotic transmission and represent a potential pandemic threat.
Subject(s)
Influenza A Virus, H1N1 Subtype , Influenza A virus , Influenza, Human , Orthomyxoviridae Infections , Swine Diseases , Animals , United States/epidemiology , Humans , Swine , Influenza A Virus, H1N1 Subtype/genetics , Orthomyxoviridae Infections/epidemiology , Orthomyxoviridae Infections/veterinary , Phylogeny , Ferrets , Zoonoses/epidemiology , Immune Sera , Influenza, Human/epidemiology
ABSTRACT
Proteins encoded by newly-emerged genes ('orphan genes') share no sequence similarity with proteins in any other species. They provide organisms with a reservoir of genetic elements to quickly respond to changing selection pressures. Here, we systematically assess the ability of five gene prediction pipelines to accurately predict genes in genomes according to phylostratal origin. BRAKER and MAKER are existing, popular ab initio tools that infer gene structures by machine learning. Direct Inference is an evidence-based pipeline we developed to predict gene structures from alignments of RNA-Seq data. The BIND pipeline integrates ab initio predictions of BRAKER and Direct Inference; MIND combines Direct Inference and MAKER predictions. We use highly-curated Arabidopsis and yeast annotations as gold-standard benchmarks, and cross-validate in rice. Each pipeline under-predicts orphan genes (as few as 11 percent, under one prediction scenario). Increasing RNA-Seq diversity greatly improves prediction efficacy. The combined methods (BIND and MIND) yield the best predictions overall, with BIND identifying 68% of annotated orphan genes and 99% of ancient genes, and give the highest sensitivity score regardless of dataset in Arabidopsis. We provide a lightweight, flexible, reproducible, and well-documented solution to improve gene prediction.
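The percentages quoted above (for example, the share of annotated orphan genes a pipeline recovers) are sensitivity values computed separately for each phylostratum. A minimal sketch of that calculation follows. The function name, the gene IDs, and the simplification that a gene counts as recovered when its ID appears in the prediction set are all assumptions made for illustration; a real benchmark would compare predicted and annotated gene structures.

```python
from collections import Counter
from typing import Dict, Set


def sensitivity_by_stratum(
    annotated: Dict[str, str], predicted: Set[str]
) -> Dict[str, float]:
    """Fraction of annotated genes recovered, per phylostratum.

    annotated maps gene ID -> phylostratum label from the reference annotation.
    predicted is the set of gene IDs the pipeline recovered (assumption: a
    real benchmark would match gene structures, not IDs).
    """
    totals = Counter(annotated.values())
    found = Counter(
        stratum for gene, stratum in annotated.items() if gene in predicted
    )
    # Predictions absent from the annotation do not affect sensitivity.
    return {stratum: found[stratum] / totals[stratum] for stratum in totals}


if __name__ == "__main__":
    # Toy labels and IDs for illustration only; not data from the study.
    reference = {"g1": "orphan", "g2": "orphan", "g3": "ancient", "g4": "ancient"}
    print(sensitivity_by_stratum(reference, {"g1", "g3", "g4", "extra"}))
```

Reporting sensitivity per stratum rather than as one genome-wide number is what exposes the under-prediction of orphan genes, since ancient genes dominate most annotations.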
Subject(s)
Arabidopsis , Oryza , Arabidopsis/genetics , Genome , Oryza/genetics , RNA-Seq , Software
ABSTRACT
MOTIVATION: The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. This is done by searching for homologs within increasingly broad clades. The deepest clade that contains a homolog of the protein(s) encoded by a gene is that gene's phylostratum. RESULTS: We have created a general R-based framework, phylostratr, to estimate the phylostratum of every gene in a species. The program fully automates analysis: selecting species for balanced representation, retrieving sequences, building databases, inferring phylostrata and returning diagnostics. Key diagnostics include: detection of genes with inferred homologs in old clades, but not intermediate ones; proteome quality assessments; false-positive diagnostics, and checks for missing organellar genomes. phylostratr allows extensive customization and systematic comparisons of the influence of analysis parameters or genomes on phylostrata inference. A user may: modify the automatically generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae. AVAILABILITY AND IMPLEMENTATION: Source code available at https://github.com/arendsee/phylostratr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
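The core inference described above can be sketched in a few lines: walk the focal species' lineage from the narrowest clade outward and report the first clade that contains every species in which a homolog was detected. That clade is the broadest level at which a homolog is found. This is a simplified illustration in Python, not the phylostratr API (which is R-based); the function name, the toy clade tree, and the species sampling are assumptions.

```python
from typing import Dict, List, Set

# Toy lineage for a focal species, ordered from narrowest to broadest clade.
LINEAGE: List[str] = [
    "Arabidopsis thaliana",
    "Brassicaceae",
    "Viridiplantae",
    "Eukaryota",
]

# Species sampled in each clade; clades are nested, so sets grow outward.
CLADE_SPECIES: Dict[str, Set[str]] = {
    "Arabidopsis thaliana": {"Arabidopsis thaliana"},
    "Brassicaceae": {"Arabidopsis thaliana", "Brassica rapa"},
    "Viridiplantae": {"Arabidopsis thaliana", "Brassica rapa", "Oryza sativa"},
    "Eukaryota": {
        "Arabidopsis thaliana",
        "Brassica rapa",
        "Oryza sativa",
        "Saccharomyces cerevisiae",
    },
}


def phylostratum(
    hit_species: Set[str], lineage: List[str], clade_species: Dict[str, Set[str]]
) -> str:
    """Return the narrowest clade containing every species with a homolog.

    A gene with no hits outside the focal species falls in the first
    (species-specific) stratum, i.e. it is an orphan.
    """
    for clade in lineage:
        if hit_species <= clade_species[clade]:
            return clade
    raise ValueError("a hit lies in a species outside the clade tree")


if __name__ == "__main__":
    # Hypothetical gene with homologs detected in rice as well as A. thaliana.
    print(phylostratum({"Arabidopsis thaliana", "Oryza sativa"}, LINEAGE, CLADE_SPECIES))
```

Because the result depends only on which species produce hits, swapping the homology search (BLAST or an alternative) or the significance classifier changes `hit_species` but not this assignment step.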
Subject(s)
Algorithms , Phylogeny , Software , Genome , Saccharomyces cerevisiae
ABSTRACT
BACKGROUND: With every new genome that is sequenced, thousands of species-specific genes (orphans) are found, some originating from ultra-rapid mutations of existing genes, many others originating de novo from non-genic regions of the genome. If some of these genes survive across speciations, then extant organisms will contain a patchwork of genes whose ancestors first appeared at different times. Standard phylostratigraphy, the technique of partitioning genes by their age, is based solely on protein similarity algorithms. However, this approach relies on negative evidence: a failure to detect a homolog of a query gene. An alternative approach is to limit the search for homologs to syntenic regions. Then, genes can be positively identified as de novo orphans by tracing them to non-coding sequences in related species. RESULTS: We have developed a synteny-based pipeline in the R framework. Fagin determines the genomic context of each query gene in a focal species compared to homologous sequences in target species. We tested the fagin pipeline on two focal species, Arabidopsis thaliana (plus four target species in Brassicaceae) and Saccharomyces cerevisiae (plus six target species in Saccharomyces). Using microsynteny maps, fagin classified the homology relationship of each query gene against each target genome into three main classes, and further subclasses: AAic (has a coding syntenic homolog), NTic (has a non-coding syntenic homolog), and Unknown (has no detected syntenic homolog). fagin inferred over half the "Unknown" A. thaliana query genes, and about 20% for S. cerevisiae, as lacking a syntenic homolog because of local indels or scrambled synteny. CONCLUSIONS: fagin augments standard phylostratigraphy, and extends synteny-based phylostratigraphy with an automated, customizable, and detailed contextual analysis. 
By comparing synteny-based phylostrata to standard phylostrata, fagin systematically identifies those orphans and lineage-specific genes that are well-supported to have originated de novo. Analyzing within-species genomes should distinguish orphan genes that may have originated through rapid divergence from de novo orphans. Fagin also delineates whether a gene has no syntenic homolog because of technical or biological reasons. These analyses indicate that some orphans may be associated with regions of high genomic perturbation.
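The three main classes named above can be illustrated with a small decision function. This is a hedged sketch in Python, not fagin's implementation (fagin is an R pipeline with further subclasses). The data class, the rule that a coding match takes precedence over a non-coding one, and the de novo criterion (at least one NTic target and no AAic target) are assumptions chosen to mirror the abstract's description.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SyntenicEvidence:
    """Evidence for one query gene within the syntenic region of one target."""

    coding_match: bool      # a protein-level (coding) homolog was found
    noncoding_match: bool   # a nucleotide-level match to non-coding sequence


def classify(evidence: SyntenicEvidence) -> str:
    """Assign one of the three main classes; coding evidence wins (assumption)."""
    if evidence.coding_match:
        return "AAic"
    if evidence.noncoding_match:
        return "NTic"
    return "Unknown"


def is_de_novo_candidate(per_target: Dict[str, SyntenicEvidence]) -> bool:
    """True if the gene traces to non-coding sequence and never to a coding homolog."""
    labels = {classify(evidence) for evidence in per_target.values()}
    return "NTic" in labels and "AAic" not in labels


if __name__ == "__main__":
    # Hypothetical query gene compared against two placeholder target genomes.
    query = {
        "target_1": SyntenicEvidence(coding_match=False, noncoding_match=True),
        "target_2": SyntenicEvidence(coding_match=False, noncoding_match=False),
    }
    print({name: classify(ev) for name, ev in query.items()})
    print(is_de_novo_candidate(query))
```

Keeping "Unknown" as its own class, rather than folding it into the de novo call, preserves the distinction the abstract draws between positive evidence and a mere failure to detect a homolog.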
Subject(s)
Arabidopsis/genetics , Genes , Phylogeny , Saccharomyces cerevisiae/genetics , Software , Synteny/genetics , Base Sequence , Genome , Sequence Homology
ABSTRACT
The allocation of carbon and nitrogen resources to the synthesis of plant proteins, carbohydrates, and lipids is complex and under the control of many genes; much remains to be understood about this process. QQS (Qua-Quine Starch; At3g30720), an orphan gene unique to Arabidopsis thaliana, regulates metabolic processes affecting carbon and nitrogen partitioning among proteins and carbohydrates, modulating leaf and seed composition in Arabidopsis and soybean. Here the universality of QQS function in modulating carbon and nitrogen allocation is exemplified by a series of transgenic experiments. We show that ectopic expression of QQS increases soybean protein independent of the genetic background and original protein content of the cultivar. Furthermore, transgenic QQS expression increases the protein content of maize, a C4 species (a species that uses 4-carbon photosynthesis), and rice, a protein-poor agronomic crop, both highly divergent from Arabidopsis. We determine that QQS protein binds to the transcriptional regulator AtNF-YC4 (Arabidopsis nuclear factor Y, subunit C4). Overexpression of AtNF-YC4 in Arabidopsis mimics the QQS-overexpression phenotype, increasing protein and decreasing starch levels. NF-YC, a component of the NF-Y complex, is conserved across eukaryotes. The NF-YC4 homologs of soybean, rice, and maize also bind to QQS, which provides an explanation of how QQS can act in species where it does not occur endogenously. These findings are, to our knowledge, the first insight into the mechanism of action of QQS in modulating carbon and nitrogen allocation across species. They have major implications for the emergence and function of orphan genes, and identify a nontransgenic strategy for modulating protein levels in crop species, a trait of great agronomic significance.
Subject(s)
Arabidopsis Proteins/metabolism , Carbon/metabolism , Genes, Plant , Nitrogen/metabolism , Transcription Factors/metabolism , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Gene Expression Regulation, Plant , Models, Biological , Mutation , Oryza/genetics , Phenotype , Photosynthesis , Phylogeny , Plant Leaves/physiology , Plants, Genetically Modified , Protein Binding , Protein Structure, Tertiary , Glycine max/genetics , Glycine max/growth & development , Species Specificity
ABSTRACT
BACKGROUND: The molecular, biochemical, and genetic mechanisms that regulate the complex metabolic network of soybean seed development determine the ultimate balance of protein, lipid, and carbohydrate stored in the mature seed. Many of the genes and metabolites that participate in seed metabolism are unknown or poorly defined; even more remains to be understood about the regulation of their metabolic networks. A global omics analysis can provide insights into the regulation of seed metabolism, even without a priori assumptions about the structure of these networks. RESULTS: With the future goal of predictive biology in mind, we have combined metabolomics, transcriptomics, and metabolic flux technologies to reveal the global developmental and metabolic networks that determine the structure and composition of the mature soybean seed. We have coupled this global approach with interactive bioinformatics and statistical analyses to gain insights into the biochemical programs that determine soybean seed composition. For this purpose, we used the Plant/Eukaryotic and Microbial Metabolomics Systems Resource (PMR, http://www.metnetdb.org/pmr), a platform that incorporates metabolomics data to develop hypotheses concerning the organization and regulation of metabolic networks, and the MetNet systems biology tools (http://www.metnetdb.org) for plant omics data, a framework to enable interactive visualization of metabolic and regulatory networks. CONCLUSIONS: This combination of high-throughput experimental data and bioinformatics analyses has revealed sets of specific genes, genetic perturbations and mechanisms, and metabolic changes that are associated with the developmental variation in soybean seed composition. Researchers can explore these metabolomics and transcriptomics data interactively at PMR.
Subject(s)
Glycine max/metabolism , Metabolomics , Seeds/growth & development , Software , Systems Biology , Transcriptome , Gene Regulatory Networks , Metabolic Networks and Pathways , Metabolomics/statistics & numerical data , Seeds/chemistry , Seeds/embryology , Glycine max/chemistry , Glycine max/genetics , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
DIVERGE is a software system for phylogeny-based analyses of protein family evolution and functional divergence. It provides a suite of statistical tools for selection and prioritization of the amino acid sites that are responsible for the functional divergence of a gene family. The synergistic efforts of DIVERGE and other methods have convincingly demonstrated that the pattern of rate change at a particular amino acid site may contain insightful information about the underlying functional divergence following gene duplication. These predicted sites may be used as candidates for further experiments. We are now releasing an updated version of DIVERGE with the following improvements: 1) a feasible approach to examining functional divergence in nearly complete sequences by including deletions and insertions (indels); 2) the calculation of the false discovery rate of functionally diverging sites; 3) estimation of the effective number of functional divergence-related sites that is reliable and insensitive to cutoffs; 4) a statistical test for asymmetric functional divergence; and 5) a new method to infer functional divergence specific to a given duplicate cluster. In addition, we have made efforts to improve software design and produce a well-written software manual for the general user.
Subject(s)
Amino Acids/genetics , Gene Duplication , Phylogeny , Proteins/genetics , Amino Acid Sequence , Animals , Evolution, Molecular , INDEL Mutation , Models, Genetic , Multigene Family , Proteins/classification , Software , Vertebrates/genetics
ABSTRACT
Influenza A viruses (IAVs) of the H1N1 classical swine lineage became endemic in North American swine following the 1918 pandemic. Additional human-to-swine transmission events after 1918, and a spillover of H1 viruses from wild birds in Europe, potentiated a rapid increase in genomic diversity via reassortment between introductions and the endemic classical swine lineage. To determine mechanisms affecting reassortment and evolution, we conducted a phylogenetic analysis of N1 and paired HA swine IAV genes in North America between 1930 and 2020. We described fourteen N1 clades within the N1 Eurasian avian lineage (including the N1 pandemic clade), the N1 classical swine lineage, and the N1 human seasonal lineage. Seven N1 genetic clades had evidence for contemporary circulation. To assess antigenic drift associated with N1 genetic diversity, we generated a panel of representative swine N1 antisera and quantified the antigenic distance between wild-type viruses using enzyme-linked lectin assays and antigenic cartography. Within the N1 genes, the antigenic similarity was variable and reflected shared evolutionary history. Sustained circulation and evolution of N1 genes in swine had resulted in a significant antigenic distance between the N1 pandemic clade and the classical swine lineage. Between 2010 and 2020, N1 clades and N1-HA pairings fluctuated in detection frequency across North America, with hotspots of diversity generally appearing and disappearing within 2 years. We also identified frequent N1-HA reassortment events (n = 36), which were rarely sustained (n = 6) and sometimes also concomitant with the emergence of new N1 genetic clades (n = 3). These data form a baseline from which we can identify N1 clades that expand in range or genetic diversity that may impact viral phenotypes or vaccine immunity and subsequently the health of North American swine.
ABSTRACT
Human-to-swine transmission of influenza A (H3N2) virus occurs repeatedly and plays a critical role in swine influenza A virus (IAV) evolution and diversity. Human seasonal H3 IAVs were introduced from humans to swine in the 1990s in the United States and classified as 1990.1 and 1990.4 lineages; the 1990.4 lineage diversified into 1990.4.A-F clades. Additional introductions occurred in the 2010s, establishing the 2010.1 and 2010.2 lineages. Human zoonotic cases with swine IAV, known as variant viruses, have occurred from the 1990.4 and 2010.1 lineages, highlighting a public health concern. If a variant virus is antigenically drifted from current human seasonal vaccine (HuVac) strains, it may be chosen as a candidate vaccine virus (CVV) for pandemic preparedness purposes. We assessed the zoonotic risk of US swine H3N2 strains by performing phylogenetic analyses of recent swine H3 strains to identify the major contemporary circulating genetic clades. Representatives were tested in hemagglutination inhibition assays with ferret post-infection antisera raised against existing CVVs or HuVac viruses. The 1990.1, 1990.4.A, and 1990.4.B.2 clade viruses displayed significant loss in cross-reactivity to CVV and HuVac antisera, and interspecies transmission potential was subsequently investigated in a pig-to-ferret transmission study. Strains from the three lineages were transmitted from pigs to ferrets via respiratory droplets, but there were differential shedding profiles. These data suggest that existing CVVs may offer limited protection against swine H3N2 infection, and that contemporary 1990.4.A viruses represent a specific concern given their widespread circulation among swine in the United States and association with multiple zoonotic cases.
Subject(s)
Influenza A virus , Influenza, Human , Viral Vaccines , Humans , Animals , Swine , Ferrets , Influenza A Virus, H3N2 Subtype/genetics , Phylogeny , Immune Sera , Influenza, Human/epidemiology
ABSTRACT
During the last decade, endemic swine H1 influenza A viruses (IAV) from six different genetic clades of the hemagglutinin gene caused zoonotic infections in humans. The majority of zoonotic events with swine IAV were restricted to a single case with no subsequent transmission. However, repeated introduction of human-seasonal H1N1, continual reassortment between endemic swine IAV, and subsequent drift in the swine host resulted in highly diverse swine IAV with human-origin genes that may become a risk to the human population. To prepare for the potential of a future swine-origin IAV pandemic in humans, public health laboratories selected candidate vaccine viruses (CVV) for use as vaccine seed strains. To assess the pandemic risk of contemporary US swine H1N1 or H1N2 strains, we quantified the genetic diversity of swine H1 HA genes, and identified representative strains from each circulating clade. We then characterized the representative swine IAV against human seasonal vaccine and CVV strains using ferret antisera in hemagglutination inhibition (HI) assays. HI assays revealed that 1A.3.3.2 (pdm09) and 1B.2.1 (delta-2) demonstrated strong cross-reactivity to human seasonal vaccines or CVVs. However, swine IAV from three clades that represent more than 50% of the detected swine IAVs in the USA showed significant reduction in cross-reactivity compared to the closest CVV virus: 1A.1.1.3 (alpha-deletion), 1A.3.3.3-clade 3 (gamma), and 1B.2.2.1 (delta-1a). Representative viruses from these three clades were further characterized in a pig-to-ferret transmission model and shown to exhibit variable transmission efficiency. Our data prioritize specific genotypes of swine H1N1 and H1N2 for further investigation of the risk they pose to the human population.
Subject(s)
Influenza A Virus, H1N1 Subtype , Influenza A virus , Orthomyxoviridae Infections , Swine Diseases , Animals , Swine , Humans , Ferrets , Influenza A Virus, H1N1 Subtype/genetics , Orthomyxoviridae Infections/epidemiology , Cowpox virus , Immune Sera , Swine Diseases/epidemiology
ABSTRACT
The "dark transcriptome" can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins ("orphan-ORFs"); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
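The clustering step mentioned above can be illustrated with a bare-bones version of the Markov clustering idea: alternate expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power and renormalizing) until the matrix stops changing, then read clusters off the surviving rows. This is a didactic sketch in pure Python under stated assumptions, not the software or parameters used in the study; the function names, the inflation value of 2.0, the added self-loops, and the use of a plain 0/1 adjacency matrix in place of a co-expression matrix are all illustrative choices.

```python
from typing import FrozenSet, List, Set

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> None:
    """Scale each column in place so that it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def markov_cluster(
    adjacency: Matrix, inflation: float = 2.0, max_iter: int = 100, tol: float = 1e-9
) -> Set[FrozenSet[int]]:
    """Cluster nodes of a weighted graph by expansion/inflation iterations."""
    n = len(adjacency)
    # Add self-loops, a common convention that stabilizes the iteration.
    m = [
        [float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)                                   # expansion
        inflated = [[x ** inflation for x in row] for row in expanded]  # inflation
        _normalize_columns(inflated)
        change = max(
            abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n)
        )
        m = inflated
        if change < tol:
            break
    # Each row with non-zero entries lists the members of one cluster.
    clusters: Set[FrozenSet[int]] = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


if __name__ == "__main__":
    # Toy graph: two disconnected triangles (nodes 0-2 and 3-5).
    graph = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(markov_cluster(graph))
```

For a genome-scale co-expression matrix, a sparse-matrix implementation would be needed; the dense triple loop here is only meant to make the two alternating operations visible.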
ABSTRACT
Influenza A virus (IAV) is passively surveilled in swine in the United States through a U.S. Department of Agriculture administered surveillance system. We present an interactive Web tool to visualize and explore trends in the genetic and geographic diversity of IAV derived from the surveillance system.
ABSTRACT
The antigenic diversity of influenza A viruses (IAV) circulating in swine challenges the development of effective vaccines, increasing zoonotic threat and pandemic potential. High-throughput sequencing technologies can quantify IAV genetic diversity, but there are no accurate approaches to adequately describe antigenic phenotypes. This study evaluated an ensemble of nonlinear regression models to estimate virus phenotype from genotype. Regression models were trained with a phenotypic data set of pairwise hemagglutination inhibition (HI) assays, using genetic sequence identity and pairwise amino acid mutations as predictor features. The model identified amino acid identity, ranked the relative importance of mutations in the hemagglutinin (HA) protein, and demonstrated good prediction accuracy. Four previously untested IAV strains were selected to experimentally validate model predictions by HI assays. Errors between predicted and measured distances of uncharacterized strains were 0.35, 0.61, 1.69, and 0.13 antigenic units. These empirically trained regression models can be used to estimate antigenic distances between different strains of IAV in swine by using sequence data. By ranking the importance of mutations in the HA, we provide criteria for identifying antigenically advanced IAV strains that may not be controlled by existing vaccines and can inform strain updates to vaccines to better control this pathogen. IMPORTANCE: Influenza A viruses (IAV) in swine constitute a major economic burden to an important global agricultural sector, impact food security, and are a public health threat. Despite significant improvement in surveillance for IAV in swine over the past 10 years, sequence data have not been integrated into a systematic vaccine strain selection process for predicting antigenic phenotype and identifying determinants of antigenic drift. 
To overcome this, we developed nonlinear regression models that predict antigenic phenotype from genetic sequence data by training the model on hemagglutination inhibition assay results. We used these models to predict antigenic phenotype for previously uncharacterized IAV, ranked the importance of genetic features for antigenic phenotype, and experimentally validated our predictions. Our model predicted virus antigenic characteristics from genetic sequence data and provides a rapid and accurate method linking genetic sequence data to antigenic characteristics. This approach also provides support for public health by identifying viruses that are antigenically advanced from strains used as pandemic preparedness candidate vaccine viruses.
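To make the "antigenic units" in the reported errors concrete: by the usual convention in HI-based antigenic analysis, one antigenic unit corresponds to a twofold difference in HI titer. The sketch below computes a distance from a pair of titers under that convention and the absolute error between a predicted and a measured distance. It is an illustration only; the function names are hypothetical, and the study's distances may instead be derived from antigenic maps rather than directly from a single titer pair.

```python
import math


def antigenic_distance(homologous_titer: float, heterologous_titer: float) -> float:
    """Distance in antigenic units: log2 fold-drop from the homologous titer.

    Assumes the convention that one unit equals a twofold titer difference.
    """
    if homologous_titer <= 0 or heterologous_titer <= 0:
        raise ValueError("titers must be positive")
    return math.log2(homologous_titer / heterologous_titer)


def absolute_error(predicted: float, measured: float) -> float:
    """Absolute difference between predicted and measured distances."""
    return abs(predicted - measured)


if __name__ == "__main__":
    # Illustrative titers, not data from the study: 1280 vs 160 is an
    # eightfold drop, i.e. three twofold steps.
    measured = antigenic_distance(1280, 160)
    print(measured, absolute_error(2.5, measured))
```

Expressing model error on this log2 scale makes it directly comparable to the resolution of the assay, where a single twofold dilution step is typically within normal run-to-run variation.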
Subject(s)
Antigenic Variation/genetics , Genotype , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Influenza A Virus, H3N2 Subtype/genetics , Machine Learning , Orthomyxoviridae Infections/veterinary , Orthomyxoviridae Infections/virology , Phenotype , Amino Acid Substitution , Animals , Antigenic Variation/immunology , Hemagglutinin Glycoproteins, Influenza Virus/classification , Hemagglutinin Glycoproteins, Influenza Virus/immunology , Influenza A Virus, H3N2 Subtype/classification , Influenza A Virus, H3N2 Subtype/immunology , Orthomyxoviridae Infections/immunology , Regression Analysis , Swine , Swine Diseases/virology
ABSTRACT
Influenza A viruses (IAVs) are the causative agents of one of the most important viral respiratory diseases in pigs and humans. Human and swine IAV are prone to interspecies transmission, leading to regular incursions from human to pig and vice versa. This bidirectional transmission of IAV has heavily influenced the evolutionary history of IAV in both species. Transmission of distinct human seasonal lineages to pigs, followed by sustained within-host transmission and rapid adaptation and evolution, represent a considerable challenge for pig health and production. Consequently, although only subtypes of H1N1, H1N2, and H3N2 are endemic in swine around the world, extensive diversity can be found in the hemagglutinin (HA) and neuraminidase (NA) genes, as well as the remaining six genes. We review the complicated global epidemiology of IAV in swine and the inextricably entangled implications for public health and influenza pandemic planning.
Subject(s)
Influenza A virus/genetics , Influenza, Human/epidemiology , Orthomyxoviridae Infections/epidemiology , Swine/virology , Animals , Hemagglutination Inhibition Tests , Humans , Influenza A Virus, H1N1 Subtype/genetics , Influenza A Virus, H1N2 Subtype/genetics , Influenza A Virus, H3N2 Subtype/genetics , Orthomyxoviridae Infections/transmission , Orthomyxoviridae Infections/virology , Phylogeny
ABSTRACT
More than 15 petabases of raw RNAseq data is now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data-reuse provides tremendous opportunity to extract new knowledge from existing experiments, and offers a unique opportunity for robust, multi-'omics analyses by merging metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical to infer orphan function because their coding sequences provide very few clues. The metadata in public databases is often confusing; a test case with Zea mays mRNA seq data reveals a high proportion of missing, misleading or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality by more use of controlled vocabulary and by metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system.
Subject(s)
Base Sequence , Databases, Factual/statistics & numerical data , Metadata/statistics & numerical data , Plant Proteins , RNA, Messenger , Zea mays , Plant Proteins/genetics , RNA, Messenger/genetics , Zea mays/genetics
ABSTRACT
Mitochondrial function has long been hypothesized to be intimately involved in aging processes, either directly through declining efficiency of mitochondrial respiration and ATP production with advancing age, or indirectly, e.g., through increased mitochondrial production of damaging free radicals with age. Yet we lack a comprehensive understanding of the evolution of mitochondrial genotypes and phenotypes across diverse animal models, particularly in species that have extremely labile physiology. Here, we measure mitochondrial genome-types and transcription in ecotypes of garter snakes (Thamnophis elegans) that are adapted to disparate habitats and have diverged in aging rates and lifespans despite residing in close proximity. Using two RNA-seq datasets, we (1) reconstruct the garter snake mitochondrial genome sequence and bioinformatically identify regulatory elements, (2) test for divergence of mitochondrial gene expression between the ecotypes and in response to heat stress, and (3) test for sequence divergence in mitochondrial protein-coding regions in these slow-aging (SA) and fast-aging (FA) naturally occurring ecotypes. At the nucleotide sequence level, we confirmed two (duplicated) mitochondrial control regions, one of which contains a glucocorticoid response element (GRE). Gene expression of protein-coding genes was higher in FA snakes relative to SA snakes for most genes, but was neither affected by heat stress nor an interaction between heat stress and ecotype. SA and FA ecotypes had unique mitochondrial haplotypes with amino acid substitutions in both CYTB and ND5. The CYTB amino acid change (Isoleucine → Threonine) was highly segregated between ecotypes. This divergence of mitochondrial haplotypes between SA and FA snakes contrasts with nuclear gene-flow estimates, but correlates with previously reported divergence in mitochondrial function (mitochondrial oxygen consumption, ATP production, and reactive oxygen species consequences).
Subject(s)
Aging/physiology , Colubridae/physiology , Mitochondria/physiology , Aging/genetics , Animals , Base Sequence , Colubridae/genetics , Ecotype , Female , Gene Expression Regulation/physiology , Gene Regulatory Networks/physiology , Genome, Mitochondrial , Haplotypes , Heat-Shock Response/genetics , Longevity/genetics , Longevity/physiology , Phenotype , Sequence Alignment , Species Specificity
ABSTRACT
Sizable minorities of protein-coding genes from every sequenced eukaryotic and prokaryotic genome are unique to the species. These so-called 'orphan genes' may evolve de novo from non-coding sequence or be derived from older coding material. They are often associated with environmental stress responses and species-specific traits or regulatory patterns. However, difficulties in studying genes where comparative analysis is impossible, and a bias towards broadly conserved genes, have resulted in underappreciation of their importance. We review here the identification, possible origins, evolutionary trends, and functions of orphans with an emphasis on their role in plant biology. We exemplify several evolutionary trends with an analysis of Arabidopsis thaliana and present QQS as a model orphan gene.
Subject(s)
Genes, Plant , Animals , Genetic Phenomena , Reproduction , Species Specificity , Stress, Physiological
ABSTRACT
Thanks to microarray technology, our understanding of transcriptome evolution at the genome level has advanced considerably in the past decade. Yet further investigation has been hampered by several technical limitations of this technology. Recent innovations in next-generation sequencing, particularly the invention of RNA-seq technology, offer a way to overcome these limitations. Though a number of statistical and computational methods have been developed to analyze RNA-seq data, an analytical framework specifically designed for evolutionary genomics remains an open question. In this article we develop a new method for estimating the genome expression distance from RNA-seq data, which has explicit interpretations under the model of gene expression evolution. Moreover, this distance measure takes data overdispersion, gene length variation, and sequencing depth variation into account so that it can be applied to multiple genomes from different species. Using mammalian RNA-seq data as an example, we demonstrate that this expression distance is useful in phylogenomic analysis.