ABSTRACT
The tuatara (Sphenodon punctatus), the only living member of the reptilian order Rhynchocephalia (Sphenodontia), which was once widespread across Gondwana [1,2], is an iconic species that is endemic to New Zealand [2,3]. A key link to the now-extinct stem reptiles (from which dinosaurs, modern reptiles, birds and mammals evolved), the tuatara provides key insights into the ancestral amniotes [2,4]. Here we analyse the genome of the tuatara, which, at approximately 5 Gb, is among the largest of the vertebrate genomes yet assembled. Our analyses of this genome, along with comparisons with other vertebrate genomes, reinforce the uniqueness of the tuatara. Phylogenetic analyses indicate that the tuatara lineage diverged from that of snakes and lizards around 250 million years ago. This lineage also shows moderate rates of molecular evolution, with instances of punctuated evolution. Our genome sequence analysis identifies expansions of proteins, non-protein-coding RNA families and repeat elements, the latter of which show an amalgam of reptilian and mammalian features. The sequencing of the tuatara genome provides a valuable resource for deep comparative analyses of tetrapods, as well as for tuatara biology and conservation. Our study also provides important insights into both the technical challenges and the cultural obligations that are associated with genome sequencing.
Subjects
Evolution, Molecular; Genome/genetics; Phylogeny; Reptiles/genetics; Animals; Conservation of Natural Resources/trends; Female; Genetics, Population; Lizards/genetics; Male; Molecular Sequence Annotation; New Zealand; Sex Characteristics; Snakes/genetics; Synteny
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Experiments that are planned using accurate prediction algorithms will mitigate failures in recombinant protein production. We have developed TISIGNER (https://tisigner.com) with the aim of addressing technical challenges to recombinant protein production. We offer three web services, TIsigner (Translation Initiation coding region designer), SoDoPE (Soluble Domain for Protein Expression) and Razor, which are specialised in synonymous optimisation of recombinant protein expression, solubility and signal peptide analysis, respectively. Importantly, TIsigner, SoDoPE and Razor are linked, which allows users to switch between the tools when optimising genes of interest.
Subjects
Recombinant Proteins/biosynthesis; Software; Internet; Peptide Chain Initiation, Translational; Protein Sorting Signals; Recombinant Proteins/chemistry; Recombinant Proteins/genetics; Solubility
ABSTRACT
Recombinant protein production is a key process in generating proteins of interest in the pharmaceutical industry and biomedical research. However, about 50% of recombinant proteins fail to be expressed in a variety of host cells. Here we show that the accessibility of translation initiation sites, modelled using mRNA base-unpairing across the Boltzmann ensemble, significantly outperforms alternative features. This approach accurately predicts the successes or failures of expression experiments, which utilised Escherichia coli cells to express 11,430 recombinant proteins from over 189 diverse species. On this basis, we develop TIsigner, which uses simulated annealing to modify up to the first nine codons of mRNAs with synonymous substitutions. We show that accessibility captures the key propensity beyond the target region (initiation sites in this case), as a modest number of synonymous changes is sufficient to tune the recombinant protein expression levels. We build a stochastic simulation model and show that higher accessibility leads to higher protein production and slower cell growth, supporting the idea of protein cost, where cell growth is constrained by protein circuits during overexpression.
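The simulated-annealing step described above can be sketched roughly as follows. This is an illustration only, not the TIsigner implementation: the function names, the cooling schedule and, in particular, `placeholder_cost` are assumptions made for this sketch. The cost used here is simply the GC fraction of the first 27 nucleotides, a crude stand-in; a real workflow would replace it with an accessibility estimate from an RNA folding tool. The sequence length is assumed to be a multiple of three.

```python
import math
import random
from itertools import product

# Standard genetic code, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(seq):
    """Translate a coding sequence codon by codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, usable, 3))


def placeholder_cost(seq, window=27):
    """Hypothetical stand-in cost: GC fraction of the first `window` nt.

    This is NOT the accessibility measure from the abstract.
    """
    head = seq[:window]
    if not head:
        return 0.0
    return sum(base in "GC" for base in head) / len(head)


def anneal(seq, n_codons=9, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search synonymous variants of the first `n_codons` codons."""
    rng = random.Random(seed)
    current = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    editable = min(n_codons, len(current))
    current_cost = placeholder_cost("".join(current))
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(editable)
        candidate = list(current)
        candidate[pos] = rng.choice(SYNONYMS[CODON_TO_AA[current[pos]]])
        cand_cost = placeholder_cost("".join(candidate))
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return "".join(best)


example = "ATGGCCGGCAGCCTGCGCGGCACCCCGGGCAAATAA"
optimised = anneal(example)
```

Only synonymous codons are ever proposed, so the encoded protein is unchanged, and codons beyond the ninth are never touched.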
Subjects
Codon, Initiator/genetics; Codon, Terminator/genetics; Recombinant Proteins/chemistry; Recombinant Proteins/genetics; Silent Mutation/genetics; Computational Biology
ABSTRACT
Some Serratia entomophila isolates have been successfully exploited in biopesticides due to their ability to cause amber disease in larvae of the Aotearoa (New Zealand) endemic pasture pest, Costelytra giveni. Anti-feeding prophage and ABC toxin complex virulence determinants are encoded by a 153-kb single-copy conjugative plasmid (pADAP; amber disease-associated plasmid). Despite growing understanding of the S. entomophila pADAP model plasmid, little is known about the wider plasmid family. Here, we sequence and analyse mega-plasmids from 50 Serratia isolates that induce variable disease phenotypes in the C. giveni insect host. Mega-plasmids are highly conserved within S. entomophila, but show considerable divergence in Serratia proteamaculans, with other variants in S. liquefaciens and S. marcescens, likely reflecting niche adaptation. In this study, which reconstructs ancestral relationships for a complex mega-plasmid system, strong co-evolution between Serratia species and their plasmids was found. We identify 12 distinct mega-plasmid genotypes, all sharing a conserved gene backbone, but encoding highly variable accessory regions including virulence factors, secondary metabolite biosynthesis, nitrogen fixation genes and toxin-antitoxin systems. We show that the variable pathogenicity of Serratia isolates is largely caused by the presence or absence of virulence clusters on the mega-plasmids but, notably, is augmented by external chromosomally encoded factors.
Subjects
Coleoptera; Animals; Larva; Plasmids/genetics; Prophages/genetics; Virulence/genetics
ABSTRACT
MOTIVATION: Recombinant protein production is a widely used technique in the biotechnology and biomedical industries, yet only a quarter of target proteins are soluble and can therefore be purified. RESULTS: We have discovered that global structural flexibility, which can be modeled by normalized B-factors, accurately predicts the solubility of 12 216 recombinant proteins expressed in Escherichia coli. We have optimized these B-factors, and derived a new set of values for solubility scoring that further improves prediction accuracy. We call this new predictor the 'Solubility-Weighted Index' (SWI). Importantly, SWI outperforms many existing protein solubility prediction tools. Furthermore, we have developed 'SoDoPE' (Soluble Domain for Protein Expression), a web interface that allows users to choose a protein region of interest for predicting and maximizing both protein expression and solubility. AVAILABILITY AND IMPLEMENTATION: The SoDoPE web server and source code are freely available at https://tisigner.com/sodope and https://github.com/Gardner-BinfLab/TISIGNER-ReactJS, respectively. The code and data for reproducing our analysis can be found at https://github.com/Gardner-BinfLab/SoDoPE_paper_2020. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Proteins; Software; Computers; Escherichia coli/genetics; Solubility
ABSTRACT
Emerging pathogens are a major threat to public health; however, understanding how pathogens adapt to new niches remains a challenge. New methods are urgently required to provide functional insights into pathogens from the massive genomic data sets now being generated from routine pathogen surveillance for epidemiological purposes. Here, we measure the burden of atypical mutations in protein coding genes across independently evolved Salmonella enterica lineages, and use these as input to train a random forest classifier to identify strains associated with extraintestinal disease. Members of the species fall along a continuum, from pathovars that cause gastrointestinal infection and low mortality, associated with a broad host range, to those that cause invasive infection and high mortality, associated with a narrowed host range. Our random forest classifier learned to perfectly discriminate long-established gastrointestinal and invasive serovars of Salmonella. Additionally, it was able to discriminate recently emerged Salmonella Enteritidis and Typhimurium lineages associated with invasive disease in immunocompromised populations in sub-Saharan Africa, and within-host adaptation to invasive infection. We dissect the architecture of the model to identify the genes that were most informative of phenotype, revealing a common theme of degradation of metabolic pathways in extraintestinal lineages. This approach accurately identifies patterns of gene degradation and diversifying selection specific to invasive serovars that have been captured by more labour-intensive investigations, but can be readily scaled to larger analyses.
Subjects
Adaptation, Physiological/genetics; Bacterial Proteins/genetics; Machine Learning; Salmonella enterica/genetics; Animals; Host Specificity; Humans; Mutation; Phylogeny; Salmonella Infections/microbiology; Salmonella Infections, Animal/microbiology; Salmonella enterica/classification; Salmonella enterica/pathogenicity; Virulence/genetics
ABSTRACT
Mammalian diversification has coincided with a rapid proliferation of various types of noncoding RNAs, including members of both snRNAs and snoRNAs. The significance of this expansion, however, remains obscure. While some ncRNA copy-number expansions have been linked to functionally tractable effects, such events may equally be neutral, perhaps as a result of random retrotransposition. Hindering progress in our understanding of such observations is the difficulty in establishing function for the diverse features that have been identified in our own genome. Projects such as ENCODE and FANTOM have revealed a hidden world of genomic expression patterns, as well as a host of other potential indicators of biological function. However, such projects have been criticized, particularly by practitioners in the field of molecular evolution, where many suspect these data provide limited insight into biological function. Because the molecular evolution community has largely taken a skeptical view, it is important to establish tests of function. We use a range of data, including data drawn from ENCODE and FANTOM, to examine the case for function for the recent copy number expansion in mammals of six evolutionarily ancient RNA families involved in splicing and rRNA maturation. We use several criteria to assess evidence for function: conservation of sequence and structure, genomic synteny, evidence for transposition, and evidence for species-specific expression. Applying these criteria, we find that only a minority of loci show strong evidence for function and that, for the majority, we cannot reject the null hypothesis of no function.
Subjects
DNA Transposable Elements; Gene Dosage; Gene Expression; Mammals/genetics; RNA, Small Nuclear; Animals; Databases, Genetic; Evolution, Molecular; Genomics; Multigene Family; RNA Splicing
ABSTRACT
Understanding how new genes originate and integrate into cellular networks is key to understanding evolution. Bacteria present unique opportunities for both the natural history and experimental study of gene origins, due to their large effective population sizes, rapid generation times, and ease of genetic manipulation. Bacterial small non-coding RNAs (sRNAs), in particular, many of which operate through a simple antisense regulatory logic, may serve as tractable models for exploring processes of gene origin and adaptation. Understanding how and on what timescales these regulatory molecules arise has important implications for understanding the evolution of bacterial regulatory networks, in particular, for the design of comparative studies of sRNA function. Here, we introduce relevant concepts from evolutionary biology and review recent work that has begun to shed light on the timescales and processes through which non-functional transcriptional noise is co-opted to provide regulatory functions. We explore possible scenarios for sRNA origin, focusing on the co-option, or exaptation, of existing genomic structures which may provide protected spaces for sRNA evolution.
Subjects
Bacteria/genetics; RNA, Bacterial/genetics; RNA, Small Untranslated/genetics; Evolution, Molecular; Gene Expression Regulation, Bacterial/genetics; Genome, Bacterial/genetics; Phylogeny
ABSTRACT
Motivation: The aim of this study is to assess the performance of RNA-RNA interaction prediction tools for all domains of life. Results: Minimum free energy (MFE) and alignment methods constitute most of the current RNA interaction prediction algorithms. The MFE tools that include accessibility in the final predicted binding energy (i.e. RNAup, IntaRNA and RNAplex) have better true positive rates (TPRs) and high positive predictive values (PPVs) in all datasets compared with other methods. They can also differentiate almost half of the native interactions from background. The algorithms that include effects of internal binding energies in their models, and the alignment methods, seem to have high TPRs but relatively low associated PPVs compared to accessibility-based methods. Availability and Implementation: We shared our wrapper scripts and datasets at GitHub (github.com/UCanCompBio/RNA_Interactions_Benchmark). All parameters are documented for personal use. Contact: sinan.umu@pg.canterbury.ac.nz. Supplementary information: Supplementary data are available at Bioinformatics online.
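The two metrics named above follow the usual definitions, TPR = TP / (TP + FN) and PPV = TP / (TP + FP). A minimal sketch is given below, assuming predictions and the reference are both represented as sets of intermolecular base pairs; that representation, the function name and the toy data are assumptions of this example, and the benchmark itself may score interactions differently.

```python
def tpr_ppv(predicted, truth):
    """Return (TPR, PPV) for predicted versus reference base pairs."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # pairs predicted and present in the reference
    fp = len(predicted - truth)   # pairs predicted but absent from the reference
    fn = len(truth - predicted)   # reference pairs that were missed
    tpr = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return tpr, ppv


# Toy data: (position in RNA 1, position in RNA 2)
reference = {(1, 10), (2, 9), (3, 8), (4, 7)}
prediction = {(1, 10), (2, 9), (5, 6)}
tpr, ppv = tpr_ppv(prediction, reference)
```

Here two of the four reference pairs are recovered (TPR of one half) and two of the three predicted pairs are correct (PPV of two thirds).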
Subjects
Algorithms; Benchmarking; RNA/metabolism; Bacteria/genetics; Databases, Genetic; Models, Theoretical; RNA/chemistry; Sequence Analysis, RNA
ABSTRACT
BACKGROUND: The New Zealand collembolan genus Holacanthella contains the largest species of springtails (Collembola) in the world. Using Illumina technology we have sequenced and assembled a draft genome and transcriptome from Holacanthella duospinosa (Salmon). We have used this annotated assembly to investigate the genetic basis of a range of traits critical to the evolution of the Hexapoda, the phylogenetic position of H. duospinosa and potential horizontal gene transfer events. RESULTS: Our genome assembly was ~375 Mbp in size with a scaffold N50 of ~230 Kbp and sequencing coverage of ~180×. DNA elements, LTRs, simple repeats and LINEs formed the largest components and SINEs were very rare. Phylogenomics (370,877 amino acids) placed H. duospinosa within the Neanuridae. We recovered orthologs of the conserved genes thought to play a role in sex determination. Analysis of CpG content suggested the absence of DNA methylation, and consistent with this we were unable to detect orthologs of the DNA methyltransferase enzymes. The small subunit rRNA gene contained a possible retrotransposon. The Hox gene complex was broken over two scaffolds. For chemosensory ability, at least 15 and 18 ionotropic glutamate and gustatory receptors were identified, respectively. However, we were unable to identify any odorant receptors or their obligate co-receptor Orco. Twenty-three chitinase-like genes were identified from the assembly. Members of this multigene family may play roles in the digestion of fungal cell walls, a common food source for these saproxylic organisms. We also detected 59 and 96 genes with BLAST hits to bacteria and fungi, respectively, that were located on scaffolds that otherwise contained arthropod genes. CONCLUSIONS: The genome of H. duospinosa contains some unusual features including a Hox complex broken over two scaffolds, in a different manner to other arthropod species, a lack of odorant receptor genes and an apparent lack of environmentally responsive DNA methylation, unlike many other arthropods. Our detection of candidate horizontal gene transfer events confirms that this phenomenon is occurring across Collembola. These findings allow us to narrow down the regions of the arthropod phylogeny where key innovations have occurred that have facilitated the evolutionary success of Hexapoda.
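For readers unfamiliar with the scaffold N50 statistic quoted above, the sketch below shows the conventional definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions and are unrelated to the H. duospinosa assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings the running sum to >= 50% of the total.
        if running * 2 >= total:
            return length


toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]   # total = 300
toy_n50 = n50(toy_scaffolds)                   # 80 + 70 = 150 reaches half, so N50 is 70
```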
Subjects
Arthropods/genetics; Evolution, Molecular; Genomics; Animals; Arthropods/growth & development; Arthropods/metabolism; Chitinases/genetics; DNA Methylation; Gene Expression Profiling; Gene Transfer, Horizontal; Molecular Sequence Annotation; Phylogeny; Sex Determination Processes/genetics
ABSTRACT
MOTIVATION: Next generation sequencing technologies have provided us with a wealth of information on genetic variation, but predicting the functional significance of this variation is a difficult task. While many comparative genomics studies have focused on gene flux and large scale changes, relatively little attention has been paid to quantifying the effects of single nucleotide polymorphisms and indels on protein function, particularly in bacterial genomics. RESULTS: We present a hidden Markov model based approach we call delta-bitscore (DBS) for identifying orthologous proteins that have diverged at the amino acid sequence level in a way that is likely to impact biological function. We benchmark this approach with several widely used datasets and apply it to a proof-of-concept study of orthologous proteomes in an investigation of host adaptation in Salmonella enterica. We highlight the value of the method in identifying functional divergence of genes, and suggest that this tool may be a better approach than the commonly used dN/dS metric for identifying functionally significant genetic changes occurring in recently diverged organisms. AVAILABILITY AND IMPLEMENTATION: A program implementing DBS for pairwise genome comparisons is freely available at: https://github.com/UCanCompBio/deltaBS. CONTACT: nicole.wheeler@pg.canterbury.ac.nz or lars.barquist@uni-wuerzburg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
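The abstract does not spell out the formula, so the sketch below rests on an assumption: that the delta-bitscore for a pair of orthologs is the difference between the bitscores the two proteins obtain against the same profile hidden Markov model. The function names, gene labels and scores are hypothetical; in practice the bitscores would come from an HMM search tool rather than being typed in by hand.

```python
def delta_bitscores(ref_scores, query_scores):
    """Return reference-minus-query bitscore for each gene scored in both genomes."""
    shared = ref_scores.keys() & query_scores.keys()
    return {gene: ref_scores[gene] - query_scores[gene] for gene in shared}


def rank_by_divergence(deltas):
    """Order genes from largest to smallest absolute delta-bitscore."""
    return sorted(deltas, key=lambda gene: abs(deltas[gene]), reverse=True)


# Hypothetical bitscores of orthologs scored against the same profile HMMs.
reference = {"geneA": 250.0, "geneB": 180.5, "geneC": 95.0}
query = {"geneA": 249.0, "geneB": 120.5, "geneD": 60.0}

deltas = delta_bitscores(reference, query)
ranked = rank_by_divergence(deltas)
```

Genes present in only one genome (geneC and geneD here) are skipped, because a pairwise difference is undefined for them. A large positive value flags a query protein that fits the family model much worse than its reference ortholog.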
Subjects
Genome, Bacterial; Genomics/methods; Algorithms; Bacterial Proteins/genetics; Markov Chains; Models, Theoretical; Proteome; Salmonella enterica/genetics; Software
ABSTRACT
Toxin-antitoxin (TA) systems are gene modules that appear to be horizontally mobile across a wide range of prokaryotes. It has been proposed that type I TA systems, with an antisense RNA antitoxin, are less mobile than other TAs that rely on direct toxin-antitoxin binding, but no direct comparisons have been made. We searched for type I, II and III toxin families using iterative searches with profile hidden Markov models across phyla and replicons. The distribution of type I toxin families was comparatively narrow, but these patterns weakened with recently discovered families. We discuss how the function and phenotypes of TA systems, as well as biases in our search methods, may account for differences in their distribution.
Subjects
Antitoxins/genetics; Bacterial Toxins/genetics; Gene Expression Regulation, Bacterial; RNA, Antisense/genetics; Databases, Genetic; Gene Transfer, Horizontal; Interspersed Repetitive Sequences; Multigene Family; Operon; Phylogeny
ABSTRACT
RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment, and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of an RNA and determining its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure-function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs, RMfam, and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam.
Subjects
Databases, Nucleic Acid; Molecular Sequence Annotation; RNA/chemistry; Evolution, Molecular; Models, Statistical; Nucleotide Motifs; Sequence Alignment; Sequence Analysis, RNA
ABSTRACT
The Rfam database (available at http://rfam.xfam.org) is a collection of non-coding RNA families represented by manually curated sequence alignments, consensus secondary structures and annotation gathered from corresponding Wikipedia, taxonomy and ontology resources. In this article, we detail updates and improvements to the Rfam data and website for the Rfam 12.0 release. We describe the upgrade of our search pipeline to use Infernal 1.1 and demonstrate its improved homology detection ability by comparison with the previous version. The new pipeline is easier for users to apply to their own data sets, and we illustrate its ability to annotate RNAs in genomic and metagenomic data sets of various sizes. Rfam has been expanded to include 260 new families, including the well-studied large subunit ribosomal RNA family, and for the first time includes information on short sequence- and structure-based RNA motifs present within families.
Subjects
Databases, Nucleic Acid; RNA, Untranslated/chemistry; Genomics; Internet; Molecular Sequence Annotation; Nucleic Acid Conformation; Nucleotide Motifs; RNA, Long Noncoding/chemistry; RNA, Untranslated/classification; Software
ABSTRACT
Noncoding RNAs are integral to a wide range of biological processes, including translation, gene regulation, host-pathogen interactions and environmental sensing. While genomics is now a mature field, our capacity to identify noncoding RNA elements in bacterial and archaeal genomes is hampered by the difficulty of de novo identification. The emergence of new technologies for characterizing transcriptome outputs, notably RNA-seq, is improving noncoding RNA identification and expression quantification. However, a major challenge is to robustly distinguish functional outputs from transcriptional noise. To establish whether annotation of existing transcriptome data has effectively captured all functional outputs, we analysed over 400 publicly available RNA-seq datasets spanning 37 different Archaea and Bacteria. Using comparative tools, we identify close to a thousand highly expressed candidate noncoding RNAs. However, our analyses reveal that the capacity to identify noncoding RNA outputs is strongly dependent on phylogenetic sampling. Surprisingly, and in stark contrast to protein-coding genes, the phylogenetic window for effective use of comparative methods is perversely narrow: aggregating public datasets produced only one phylogenetic cluster where these tools could be used to robustly separate unannotated noncoding RNAs from a null hypothesis of transcriptional noise. Our results show that for the full potential of transcriptomics data to be realized, a change in experimental design is paramount: effective transcriptomics requires phylogeny-aware sampling.
Subjects
Gene Expression Profiling/methods; RNA, Untranslated/classification; RNA, Untranslated/genetics; Transcriptome/genetics; Archaea/genetics; Bacteria/genetics; Cluster Analysis; Computational Biology; Databases, Genetic; Phylogeny; RNA, Archaeal/chemistry; RNA, Archaeal/classification; RNA, Archaeal/genetics; RNA, Bacterial/chemistry; RNA, Bacterial/classification; RNA, Bacterial/genetics; RNA, Untranslated/chemistry
ABSTRACT
Salmonella Typhi and Typhimurium diverged only ∼50 000 years ago, yet have very different host ranges and pathogenicity. Despite the availability of multiple whole-genome sequences, the genetic differences that have driven these changes in phenotype are only beginning to be understood. In this study, we use transposon-directed insertion-site sequencing to probe differences in gene requirements for competitive growth in rich media between these two closely related serovars. We identify a conserved core of 281 genes that are required for growth in both serovars, 228 of which are essential in Escherichia coli. We are able to identify active prophage elements through the requirement for their repressors. We also find distinct differences in requirements for genes involved in cell surface structure biogenesis and iron utilization. Finally, we demonstrate that transposon-directed insertion-site sequencing is not only applicable to the protein-coding content of the cell but also has sufficient resolution to generate hypotheses regarding the functions of non-coding RNAs (ncRNAs) as well. We are able to assign probable functions to a number of cis-regulatory ncRNA elements, as well as to infer likely differences in trans-acting ncRNA regulatory networks.
Subjects
DNA Transposable Elements; Mutagenesis, Insertional; Salmonella typhi/genetics; Salmonella typhimurium/genetics; Bacterial Proteins/genetics; Gene Library; Genes, Bacterial; RNA, Small Untranslated/genetics; RNA, Untranslated/genetics; Salmonella typhi/growth & development; Salmonella typhimurium/growth & development
ABSTRACT
The Rfam database (available via the website at http://rfam.sanger.ac.uk and through our mirror at http://rfam.janelia.org) is a collection of non-coding RNA families, primarily RNAs with a conserved RNA secondary structure, including both RNA genes and mRNA cis-regulatory elements. Each family is represented by a multiple sequence alignment, predicted secondary structure and covariance model. Here we discuss updates to the database in the latest release, Rfam 11.0, including the introduction of genome-based alignments for large families, the introduction of the Rfam Biomart as well as other user interface improvements. Rfam is available under the Creative Commons Zero license.
Subjects
Databases, Nucleic Acid; RNA, Untranslated/chemistry; RNA, Untranslated/classification; Base Sequence; Genomics; Internet; Molecular Sequence Annotation; Nucleic Acid Conformation; RNA, Untranslated/genetics; Sequence Alignment; User-Computer Interface
ABSTRACT
Wikipedia, the online encyclopedia, is the most famous wiki in use today. It contains over 3.7 million pages of content, many of them on scientific subjects; these pages include peer-reviewed citations, yet are written in an accessible manner and generally reflect the consensus opinion of the community. In this, the 19th Annual Database Issue of Nucleic Acids Research, there are 11 articles that describe the use of a wiki in relation to a biological database. In this commentary, we discuss how biological databases can be integrated with Wikipedia, thereby utilising the pre-existing infrastructure, tools and, above all, large community of authors (or Wikipedians). The limitations to the content that can be included in Wikipedia are highlighted, with examples drawn from articles found in this issue and other wiki-based resources, indicating why other wiki solutions are necessary. We discuss the merits of using open wikis, like Wikipedia, versus other models, with particular reference to potential vandalism. Finally, we raise the question of the future role of dedicated database biocurators in the context of the thousands of crowdsourced, community annotations that are now being stored in wikis.
Subjects
Databases, Factual; Encyclopedias as Topic; Internet; Systems Integration
ABSTRACT
The Enterobacteriaceae are a scientifically and medically important clade of bacteria, containing the model organism Escherichia coli, as well as major human pathogens including Salmonella enterica and Klebsiella pneumoniae. Essential gene sets have been determined for several members of the Enterobacteriaceae, with the Keio E. coli single-gene deletion library often regarded as a gold standard. However, it remains unclear how gene essentiality varies between related strains and species. To investigate this, we have assembled a collection of 13 sequenced high-density transposon mutant libraries from five genera within the Enterobacteriaceae. We first assess several gene essentiality prediction approaches, investigate the effects of transposon density on essentiality prediction, and identify biases in transposon insertion sequencing data. Based on these investigations, we develop a new classifier for gene essentiality. Using this new classifier, we define a core essential genome in the Enterobacteriaceae of 201 universally essential genes. Despite the presence of a large cohort of variably essential genes, we find an absence of evidence for genus-specific essential genes. A clear example of this sporadic essentiality is given by the set of genes regulating the σE extracytoplasmic stress response, which appears to have independently acquired essentiality multiple times in the Enterobacteriaceae. Finally, we compare our essential gene sets to the natural experiment of gene loss in obligate insect endosymbionts that have emerged from within the Enterobacteriaceae. This isolates a remarkably small set of genes absolutely required for survival and identifies several instances of essential stress responses masked by redundancy in free-living bacteria.
IMPORTANCE: The essential genome, that is, the set of genes absolutely required to sustain life, is a core concept in genetics. Essential genes in bacteria serve as drug targets, put constraints on the engineering of biological chassis for technological or industrial purposes, and are key to constructing synthetic life. Despite decades of study, relatively little is known about how gene essentiality varies across related bacteria. In this study, we have collected gene essentiality data for 13 bacteria related to the model organism Escherichia coli, including several human pathogens, and investigated the conservation of essentiality. We find that approximately a third of the genes essential in any particular strain are non-essential in another related strain. Surprisingly, we do not find evidence for essential genes unique to specific genera; rather it appears a substantial fraction of the essential genome rapidly gains or loses essentiality during evolution. This suggests that essentiality is not an immutable characteristic but depends crucially on the genomic context. We illustrate this through a comparison of our essential genes in free-living bacteria to genes conserved in 34 insect endosymbionts with naturally reduced genomes, finding several cases where genes generally regarded as being important for specific stress responses appear to have become essential in endosymbionts due to a loss of functional redundancy in the genome.
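The distinction between universally and variably essential genes can be illustrated with plain set operations. The sketch below assumes that per-strain essentiality calls are already available as sets of ortholog identifiers; the strain names, gene labels and function name are placeholders, and the classifier that produces such calls is not reproduced here.

```python
def partition_essential(calls):
    """Split essentiality calls into (core, variable) gene sets.

    `calls` maps each strain to the set of genes called essential in it.
    Core genes are essential in every strain; variable genes are
    essential in at least one strain but not in all of them.
    """
    gene_sets = list(calls.values())
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)
    variable = set.union(*gene_sets) - core
    return core, variable


# Placeholder essentiality calls for three hypothetical strains.
toy_calls = {
    "strain_1": {"gene1", "gene2", "gene3"},
    "strain_2": {"gene1", "gene2", "gene4"},
    "strain_3": {"gene1", "gene3", "gene4"},
}
core, variable = partition_essential(toy_calls)
```

In this toy input only gene1 is essential everywhere, while gene2, gene3 and gene4 are each dispensable in one strain, mirroring the sporadic pattern the abstract describes.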