Results 1 - 20 of 33
1.
Nucleic Acids Res ; 51(D1): D678-D689, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350631

ABSTRACT

The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Center (BRC) program to assist researchers with analyzing the growing body of genome sequence and other omics-related data. In this report, we describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) https://www.bv-brc.org/. The combined BV-BRC leverages the functionality of the bacterial and viral resources to provide a unified data model, enhanced web-based visualization and analysis tools, bioinformatics services, and a powerful suite of command line tools that benefit the bacterial and viral research communities.


Subject(s)
Genomics , Software , Viruses , Humans , Bacteria/genetics , Computational Biology , Databases, Genetic , Influenza, Human , Viruses/genetics
2.
Nucleic Acids Res ; 48(D1): D606-D612, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31667520

ABSTRACT

The PathoSystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center funded by the National Institute of Allergy and Infectious Diseases (https://www.patricbrc.org). PATRIC supports bioinformatic analyses of all bacteria with a special emphasis on pathogens, offering a rich comparative analysis environment that provides users with access to over 250 000 uniformly annotated and publicly available genomes with curated metadata. PATRIC offers web-based visualization and comparative analysis tools, a private workspace in which users can analyze their own data in the context of the public collections, services that streamline complex bioinformatic workflows and command-line tools for bulk data analysis. Over the past several years, as genomic and other omics-related experiments have become more cost-effective and widespread, we have observed considerable growth in the usage of and demand for easy-to-use, publicly available bioinformatic tools and services. Here we report the recent updates to the PATRIC resource, including new web-based comparative analysis tools, eight new services and the release of a command-line interface to access, query and analyze data.


Subject(s)
Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Algorithms , Animals , Caenorhabditis elegans/genetics , Chickens/genetics , Drosophila melanogaster/genetics , Host-Pathogen Interactions/genetics , Humans , Internet , Macaca mulatta/genetics , Metagenomics , Mice , National Institute of Allergy and Infectious Diseases (U.S.) , Phenotype , Phylogeny , Rats , Swine/genetics , United States , Zebrafish/genetics
3.
Brief Bioinform ; 20(4): 1094-1102, 2019 07 19.
Article in English | MEDLINE | ID: mdl-28968762

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) is designed to provide researchers with the tools and services that they need to perform genomic and other 'omic' data analyses. In response to mounting concern over antimicrobial resistance (AMR), the PATRIC team has been developing new tools that help researchers understand AMR and its genetic determinants. To support comparative analyses, we have added AMR phenotype data to over 15 000 genomes in the PATRIC database, often assembling genomes from reads in public archives and collecting their associated AMR panel data from the literature to augment the collection. We have also been using this collection of AMR metadata to build machine learning-based classifiers that can predict the AMR phenotypes and the genomic regions associated with resistance for genomes being submitted to the annotation service. Likewise, we have undertaken a large AMR protein annotation effort by manually curating data from the literature and public repositories. This collection of 7370 AMR reference proteins, which contains many protein annotations (functional roles) that are unique to PATRIC and RAST, has been manually curated so that it projects stably across genomes. The collection currently projects to 1 610 744 proteins in the PATRIC database. Finally, the PATRIC Web site has been expanded to enable AMR-based custom page views so that researchers can easily explore AMR data and design experiments based on whole genomes or individual genes.


Subject(s)
Computational Biology/methods , Databases, Genetic , Drug Resistance, Microbial/genetics , Systems Integration , Computational Biology/trends , Databases, Genetic/statistics & numerical data , Genome, Microbial , Humans , Internet , Molecular Sequence Annotation
4.
Nat Commun ; 9(1): 4908, 2018 11 21.
Article in English | MEDLINE | ID: mdl-30464174

ABSTRACT

Sulfolobus islandicus is a model microorganism in the TACK superphylum of the Archaea, a key lineage in the evolutionary history of cells. Here we report a genome-wide identification of the repertoire of genes essential to S. islandicus growth in culture. We confirm previous targeted gene knockouts, uncover the non-essentiality of functions assumed to be essential to the Sulfolobus cell, including the proteinaceous S-layer, and highlight essential genes whose functions are yet to be determined. Phyletic distributions illustrate the potential transitions that may have occurred during the evolution of this archaeal microorganism, and highlight sets of genes that may have been associated with each transition. We use this comparative context as a lens to focus future research on archaea-specific uncharacterized essential genes that may provide valuable insights into the evolutionary history of cells.


Subject(s)
Genes, Essential , Genome, Archaeal , Sulfolobus/genetics , Biological Evolution , DNA Topoisomerases, Type I/genetics , Genetic Complementation Test , Membrane Glycoproteins/genetics , Sulfolobus/ultrastructure
5.
Nucleic Acids Res ; 45(D1): D535-D542, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899627

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by 'virtual integration' to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.


Subject(s)
Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Genome, Bacterial , Genomics/methods , Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Bacteria/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Drug Resistance, Bacterial , Molecular Sequence Annotation , Proteome , Proteomics/methods , Software , Web Browser
6.
Front Microbiol ; 7: 118, 2016.
Article in English | MEDLINE | ID: mdl-26903996

ABSTRACT

The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.
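The clustering step named in this abstract, the Markov Cluster algorithm (MCL), alternates "expansion" (squaring a column-stochastic matrix) with "inflation" (raising entries to a power and renormalizing). The following is only a minimal pure-Python sketch of generic MCL on a toy similarity matrix. It is not the PATtyFams pipeline: the function names, the self-loop weight, the inflation value and the toy data are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: multiply the matrix by itself."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: element-wise power, then column renormalization."""
    return normalize_columns([[x ** power for x in row] for row in m])


def mcl(similarity, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster nodes of a symmetric similarity matrix (hypothetical helper)."""
    n = len(similarity)
    # Self-loops (weight 1.0 here, an assumption) are commonly added first.
    m = normalize_columns(
        [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(iterations):
        m = inflate(expand(m), inflation)

    # Group nodes that remain linked by a non-negligible entry (union-find).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > eps:
                parent[find(i)] = find(j)

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Toy input: two tight triangles (0-1-2 and 3-4-5) joined by one weak edge.
toy = [[0.0] * 6 for _ in range(6)]
for a, b, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    toy[a][b] = toy[b][a] = float(w)

clusters = mcl(toy)
```

In this sketch a higher inflation value tends to give smaller, tighter clusters. In the approach the abstract describes, a step of this kind would run separately inside each function-based group rather than on all proteins at once.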

7.
Genome Biol Evol ; 7(12): 3337-57, 2015 Nov 19.
Article in English | MEDLINE | ID: mdl-26590210

ABSTRACT

The large repABC plasmids of the order Rhizobiales with Class I quorum-regulated conjugative transfer systems often define the nature of the bacterium that harbors them. These otherwise diverse plasmids contain a core of highly conserved genes for replication and conjugation raising the question of their evolutionary relationships. In an analysis of 18 such plasmids these elements fall into two organizational classes, Group I and Group II, based on the sites at which cargo DNA is located. Cladograms constructed from proteins of the transfer and quorum-sensing components indicated that those of the Group I plasmids, while coevolving, have diverged from those coevolving proteins of the Group II plasmids. Moreover, within these groups the phylogenies of the proteins usually occupy similar, if not identical, tree topologies. Remarkably, such relationships were not seen among proteins of the replication system; although RepA and RepB coevolve, RepC does not. Nor do the replication proteins coevolve with the proteins of the transfer and quorum-sensing systems. Functional analysis was mostly consistent with phylogenies. TraR activated promoters from plasmids within its group, but not between groups and dimerized with TraR proteins from within but not between groups. However, oriT sequences, which are highly conserved, were processed by the transfer system of plasmids regardless of group. We conclude that these plasmids diverged into two classes based on the locations at which cargo DNA is inserted, that the quorum-sensing and transfer functions are coevolving within but not between the two groups, and that this divergent evolution extends to function.


Subject(s)
Bacterial Proteins/genetics , DNA Helicases/genetics , Evolution, Molecular , Gene Transfer, Horizontal , Quorum Sensing/genetics , Rhizobiaceae/genetics , Trans-Activators/genetics , Plasmids/genetics
8.
PLoS One ; 10(6): e0126883, 2015.
Article in English | MEDLINE | ID: mdl-26039056

ABSTRACT

The Salmonella enterica serovars Enteritidis, Dublin, and Gallinarum are closely related but differ in virulence and host range. To identify the genetic elements responsible for these differences and to better understand how these serovars are evolving, we sequenced the genomes of Enteritidis strain LK5 and Dublin strain SARB12 and compared these genomes to the publicly available Enteritidis P125109, Dublin CT 02021853 and Dublin SD3246 genome sequences. We also compared the publicly available Gallinarum genome sequences from biotype Gallinarum 287/91 and Pullorum RKS5078. Using bioinformatic approaches, we identified single nucleotide polymorphisms, insertions, deletions, and differences in prophage and pseudogene content between strains belonging to the same serovar. Through our analysis we also identified several prophage cargo genes and pseudogenes that affect virulence and may contribute to a host-specific, systemic lifestyle. These results strongly argue that the Enteritidis, Dublin and Gallinarum serovars of Salmonella enterica evolve by acquiring new genes through horizontal gene transfer, followed by the formation of pseudogenes. The loss of genes necessary for a gastrointestinal lifestyle ultimately leads to a systemic lifestyle and niche exclusion in the host-specific serovars.


Subject(s)
Genome, Bacterial , Mutation , Polymorphism, Single Nucleotide , Salmonella enteritidis/genetics , Salmonella enteritidis/pathogenicity , Serogroup
9.
Sci Rep ; 5: 8365, 2015 Feb 10.
Article in English | MEDLINE | ID: mdl-25666585

ABSTRACT

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.


Subject(s)
Molecular Sequence Annotation/methods , Software
10.
Nucleic Acids Res ; 42(Database issue): D206-14, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24293654

ABSTRACT

In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.


Subject(s)
Databases, Genetic , Genome, Archaeal , Genome, Bacterial , Molecular Sequence Annotation , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/physiology , Genomics , Internet , Software
11.
3 Biotech ; 4(3): 331-335, 2014 Jun.
Article in English | MEDLINE | ID: mdl-28324432

ABSTRACT

Maintaining consistency in genome annotations is important for supporting many computational tasks, particularly metabolic modeling. The SEED project has implemented a process that improves annotation consistencies across microbial genomes for proteins with conserved sequences and genomic context. In this research report, we describe this process and show how this effort has resulted in improvements to microbial genome annotations in the SEED. We also compare SEED annotation consistencies with other commonly used resources such as IMG (the Joint Genome Institute's Integrated Microbial Genomes system), RefSeq (the National Center for Biotechnology Information's Reference Sequence Database), Swiss-Prot (the annotated protein sequence database of the Swiss Institute of Bioinformatics, European Molecular Biology Laboratory and the European Bioinformatics Institute) and TrEMBL (Translated European Molecular Biology Laboratory nucleotide sequence data Library). Our analysis indicates that manual and computational efforts are paying off for the databases where consistency is a major goal.

12.
J Bacteriol ; 196(5): 1031-44, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24363349

ABSTRACT

The Ti plasmid in Agrobacterium tumefaciens strain 15955 carries two alleles of traR that regulate conjugative transfer. The first is a functional allele, called traR, that is transcriptionally induced by the opine octopine. The second, trlR, is a nonfunctional, dominant-negative mutant located in an operon that is inducible by the opine mannopine (MOP). Based on these findings, we predicted that there exist wild-type agrobacterial strains harboring plasmids in which MOP induces a functional traR and, hence, conjugation. We analyzed 11 MOP-utilizing field isolates and found five where MOP induced transfer of the MOP-catabolic element and increased production of the acyl-homoserine lactone (acyl-HSL) quormone. The transmissible elements in these five strains represent a set of highly related plasmids. Sequence analysis of one such plasmid, pAoF64/95, revealed that the 176-kb element is not a Ti plasmid but carries genes for catabolism of MOP, mannopinic acid (MOA), agropinic acid (AGA), and the agrocinopines. The plasmid additionally carries all of the genes required for conjugative transfer, including the regulatory genes traR, traI, and traM. The traR gene, however, is not located in the MOP catabolism region. The gene, instead, is monocistronic and located within the tra-trb-rep gene cluster. A traR mutant failed to transfer the plasmid and produced little to no quormone even when grown with MOP, indicating that TraRpAoF64/95 is the activator of the tra regulon. A traM mutant was constitutive for transfer and acyl-HSL production, indicating that the anti-activator function of TraM is conserved.


Subject(s)
Agrobacterium tumefaciens/metabolism , Conjugation, Genetic/physiology , Mannitol/analogs & derivatives , Plasmids/metabolism , Quorum Sensing , Acyl-Butyrolactones/metabolism , Agrobacterium tumefaciens/genetics , Bacterial Proteins/genetics , Chromosome Mapping , Chromosomes, Bacterial/genetics , Mannitol/pharmacology , Molecular Sequence Data , Plasmids/genetics , Transcription Factors/genetics
13.
Int J Syst Evol Microbiol ; 63(Pt 7): 2727-2741, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23606477

ABSTRACT

The tree of life is paramount for achieving an integrated understanding of microbial evolution and the relationships between physiology, genealogy and genomics. It provides the framework for interpreting environmental sequence data, whether applied to microbial ecology or to human health. However, there remain many instances where there is ambiguity in our understanding of the phylogeny of major lineages, and/or confounding nomenclature. Here we apply recent genomic sequence data to examine the evolutionary history of members of the classes Mollicutes (phylum Tenericutes) and Erysipelotrichia (phylum Firmicutes). Consistent with previous analyses, we find evidence of a specific relationship between them in molecular phylogenies and signatures of the 16S rRNA, 23S rRNA, ribosomal proteins and aminoacyl-tRNA synthetase proteins. Furthermore, by mapping functions over the phylogenetic tree we find that the erysipelotrichia lineages are involved in various stages of genomic reduction, having lost (often repeatedly) a variety of metabolic functions and the ability to form endospores. Although molecular phylogeny has driven numerous taxonomic revisions, we find it puzzling that the most recent taxonomic revision of the phyla Firmicutes and Tenericutes has further separated them into distinct phyla, rather than reflecting their common roots.


Subject(s)
Genome, Bacterial , Phylogeny , Tenericutes/classification , Amino Acyl-tRNA Synthetases/genetics , Bacterial Proteins/genetics , DNA, Bacterial/genetics , Nucleic Acid Conformation , RNA, Ribosomal, 16S/genetics , RNA, Ribosomal, 23S/genetics , Ribosomal Proteins/genetics , Sequence Alignment , Tenericutes/genetics
14.
PLoS One ; 7(10): e48053, 2012.
Article in English | MEDLINE | ID: mdl-23110173

ABSTRACT

The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts. 
Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and programmatic access to the SEED data will provide significant utility to a broad collection of potential users.


Subject(s)
Computational Biology/methods , Databases, Factual/statistics & numerical data , Information Storage and Retrieval/methods , Software , Escherichia coli/genetics , Escherichia coli/metabolism , Genomics/methods , Genomics/statistics & numerical data , Internet , Metabolomics/methods , Metabolomics/statistics & numerical data , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/statistics & numerical data , Reproducibility of Results
15.
Proc Natl Acad Sci U S A ; 108(50): 20154-9, 2011 Dec 13.
Article in English | MEDLINE | ID: mdl-22128332

ABSTRACT

Most bacterial and archaeal genomes contain many genes with little or no similarity to other genes, a property that impedes identification of gene origins. By comparing the codon usage of genes shared among strains (primarily vertically inherited genes) and genes unique to one strain (primarily recently horizontally acquired genes), we found that the plurality of unique genes in Escherichia coli and Salmonella enterica are much more similar to each other than are their vertically inherited genes. We conclude that E. coli and S. enterica derive these unique genes from a common source, a supraspecies phylogenetic group that includes the organisms themselves. The phylogenetic range of the sharing appears to include other (but not all) members of the Enterobacteriaceae. We found evidence of similar gene sharing in other bacterial and archaeal taxa. Thus, we conclude that frequent gene exchange, particularly that of genetic novelties, extends well beyond accepted species boundaries.


Subject(s)
Escherichia coli/genetics , Gene Transfer, Horizontal/genetics , Genes, Bacterial/genetics , Salmonella enterica/genetics , Sequence Homology, Nucleic Acid , Codon/genetics , Phylogeny , Species Specificity
16.
Am J Primatol ; 73(2): 119-26, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20853395

ABSTRACT

Humans and baboons (Papio spp.) share considerable anatomical and physiological similarities in their reproductive tracts. Given the similarities, it is reasonable to expect that the normal vaginal microbial composition (microbiota) of baboons would be similar to that of humans. We have used a 16S rRNA phylogenetic approach to assess the composition of the baboon vaginal microbiota in a set of nine animals from a captive facility and six from the wild. Results show that although Gram-positive bacteria dominate in baboons as they do in humans, there are major differences between the vaginal microbiota of baboons and that of humans. In contrast to humans, the species of Gram-positive bacteria (Firmicutes) were taxa other than Lactobacillus species. In addition, some groups of Gram-negative bacteria that are not normally abundant in humans were found in the baboon samples. A further level of difference was also seen even within the same bacterial phylogenetic group, as baboon strains tended to be more phylogenetically distinct from human strains than human strains were from each other. Finally, the results of our analysis suggest that co-evolution of microbes and their hosts cannot account for the major differences between the microbiota of baboons and that of humans because divergences between the major bacterial genera were too ancient to have occurred since primates evolved. Instead, the primate vaginal tracts appear to have acquired discrete subsets of bacteria from the vast diversity of bacteria available in the environment and established a community responsive to and compatible with host species physiology.


Subject(s)
Gram-Negative Bacteria/classification , Gram-Positive Bacteria/classification , Metagenome , Papio hamadryas/microbiology , Vagina/microbiology , Animals , Biological Evolution , DNA, Bacterial/genetics , Female , Gram-Negative Bacteria/genetics , Gram-Negative Bacteria/physiology , Gram-Positive Bacteria/genetics , Gram-Positive Bacteria/physiology , Humans , Kenya , Papio hamadryas/physiology , Phylogeny , RNA, Ribosomal, 16S/genetics , Texas
17.
Mol Biol Evol ; 28(1): 211-21, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20679093

ABSTRACT

Codon usage can provide insights into the nature of the genes in a genome. Genes that are "native" to a genome (have not been recently acquired by horizontal transfer) range in codon usage from a low-bias "typical" usage to a more biased "high-expression" usage characteristic of genes encoding abundant proteins. Genes that differ from these native codon usages are candidates for foreign genes that have been recently acquired by horizontal gene transfer. In this study, we present a method for characterizing the codon usages of native genes--both typical and highly expressed--within a genome. Each gene is evaluated relative to a half line (or axis) in a 59D space of codon usage. The axis begins at the modal codon usage, the usage that matches the largest number of genes in the genome, and it passes through a point representing the codon usage of a set of genes with expression-related bias. A gene whose codon usage matches (does not significantly differ from) a point on this axis is a candidate native gene, and the location of its projection onto the axis provides a general estimate of its expression level. A gene that differs significantly from all points on the axis is a candidate foreign gene. This automated approach offers significant improvements over existing methods. We illustrate this by analyzing the genomes of Pseudomonas aeruginosa PAO1 and Bacillus anthracis A0248, which can be difficult to analyze with commonly used methods due to their biased base compositions. Finally, we use this approach to measure the proportion of candidate foreign genes in 923 bacterial and archaeal genomes. The organisms with the most homogeneous genomes (containing the fewest candidate foreign genes) are mostly endosymbionts and parasites, though with exceptions that include Pelagibacter ubique and Beutenbergia cavernae. 
The organisms with the most heterogeneous genomes (containing the most candidate foreign genes) include members of the genera Bacteroides, Corynebacterium, Desulfotalea, Neisseria, Xylella, and Thermobaculum.


Subject(s)
Codon , Genes, Bacterial , Genome, Bacterial , Algorithms , Bacillus anthracis/genetics , Base Composition/genetics , Escherichia coli/genetics , Gene Expression Regulation, Bacterial , Gene Transfer, Horizontal , Genes, Archaeal , Pseudomonas aeruginosa/genetics
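The "59D space of codon usage" in entry 17 presumably refers to the 59 sense codons of the standard genetic code that have at least one synonymous alternative (64 codons minus 3 stops, minus ATG and TGG). The sketch below rebuilds that set and computes one gene's relative usage over it. The function name, the toy sequence and the choice to normalize within each amino acid are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Keep sense codons whose amino acid is encoded by more than one codon.
_codons_per_aa = Counter(CODON_TABLE.values())
SYNONYMOUS_CODONS = sorted(
    codon for codon, aa in CODON_TABLE.items()
    if aa != "*" and _codons_per_aa[aa] > 1)


def relative_codon_usage(cds):
    """Frequency of each codon among the synonymous codons seen in `cds`.

    `cds` is an in-frame DNA coding sequence; trailing partial codons are
    ignored. Amino acids absent from the gene get 0.0 for all their codons.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    aa_totals = Counter()
    for codon in SYNONYMOUS_CODONS:
        aa_totals[CODON_TABLE[codon]] += counts[codon]
    return {codon: (counts[codon] / aa_totals[CODON_TABLE[codon]]
                    if aa_totals[CODON_TABLE[codon]] else 0.0)
            for codon in SYNONYMOUS_CODONS}


# Toy gene: Met, Lys (AAA), Lys (AAA), Lys (AAG), stop.
usage = relative_codon_usage("ATGAAAAAAAAGTAA")
```

Each gene then becomes one point with 59 coordinates, which is the kind of representation on which an axis running from typical to high-expression usage could be defined.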
18.
PLoS One ; 5(6): e10866, 2010 Jun 02.
Article in English | MEDLINE | ID: mdl-20532250

ABSTRACT

BACKGROUND: The replication of DNA in Archaea and eukaryotes requires several ancillary complexes, including proliferating cell nuclear antigen (PCNA), replication factor C (RFC), and the minichromosome maintenance (MCM) complex. Bacterial DNA replication utilizes comparable proteins, but these are distantly related phylogenetically to their archaeal and eukaryotic counterparts at best. METHODOLOGY/PRINCIPAL FINDINGS: While the structures of each of the complexes do not differ significantly between the archaeal and eukaryotic versions thereof, the evolutionary dynamic in the two cases does. The number of subunits in each complex is constant across all taxa. However, they vary subtly with regard to composition. In some taxa the subunits are all identical in sequence, while in others some are homologous rather than identical. In the case of eukaryotes, there is no phylogenetic variation in the makeup of each complex; all appear to derive from a common eukaryotic ancestor. This is not the case in Archaea, where the relationship between the subunits within each complex varies taxon-to-taxon. We have performed a detailed phylogenetic analysis of these relationships in order to better understand the gene duplications and divergences that gave rise to the homologous subunits in Archaea. CONCLUSION/SIGNIFICANCE: This domain level difference in evolution suggests that different forces have driven the evolution of DNA replication proteins in each of these two domains. In addition, the phylogenies of all three gene families support the distinctiveness of the proposed archaeal phylum Thaumarchaeota.


Subject(s)
Archaea/genetics , DNA Replication , DNA, Archaeal/genetics , Evolution, Molecular , Eukaryotic Cells , Phylogeny , Proliferating Cell Nuclear Antigen/genetics , Replication Protein C/genetics
19.
Mol Biol Evol ; 27(4): 800-10, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20018979

ABSTRACT

Most genomes are heterogeneous in codon usage, so a codon usage study should start by defining the codon usage that is typical to the genome. Although this is commonly taken to be the genomewide average, we propose that the mode (the codon usage that matches the most genes) provides a more useful approximation of the typical codon usage of a genome. We provide a method for estimating the modal codon usage, which utilizes a continuous approximation to the number of matching genes and a simplex optimization. In a survey of bacterial and archaeal genomes, as many as 20% more of the genes in a given genome match the modal codon usage than the average codon usage. We use the mode to examine the evolution of the multireplicon genomes of Agrobacterium tumefaciens C58 and Borrelia burgdorferi B31. In A. tumefaciens, the circular and linear chromosomes are characterized by a common "chromosome-like" codon usage, whereas both plasmids share a distinct "plasmid-like" codon usage. In B. burgdorferi, in addition to different codon-usage biases on the leading and lagging strands of DNA replication found by McInerney (McInerney JO. 1998. Replicational and transcriptional selection on codon usage in Borrelia burgdorferi. Proc Natl Acad Sci USA. 95:10698-10703), we also detect a codon-usage similarity between linear plasmid lp38 and the leading strand of the chromosome and a high similarity among the cp32 family of plasmids.


Subject(s)
Agrobacterium tumefaciens/genetics , Borrelia burgdorferi/genetics , Codon , Genome, Bacterial , Chromosomes, Bacterial/genetics , Phylogeny , Plasmids/genetics , Replicon
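Entry 19 contrasts the genome-wide average codon usage with the mode, the usage that matches the most genes. A toy sketch can show why the two differ in a heterogeneous genome. It deliberately swaps in simpler machinery than the abstract describes. A gene "matches" a candidate usage when their Euclidean distance is within a threshold (an assumption standing in for a statistical test). The mode is approximated by trying each gene's own usage as the candidate, rather than by the continuous approximation and simplex optimization the authors mention. All names and numbers are hypothetical.

```python
import math


def count_matches(candidate, genes, threshold):
    """Number of genes whose usage vector lies within `threshold` of candidate."""
    return sum(1 for gene in genes if math.dist(candidate, gene) <= threshold)


def mean_usage(genes):
    """Component-wise average usage over all genes."""
    n = len(genes)
    return tuple(sum(gene[i] for gene in genes) / n
                 for i in range(len(genes[0])))


def approximate_mode(genes, threshold):
    """Crude stand-in for the mode: the observed gene matching the most genes."""
    return max(genes, key=lambda gene: count_matches(gene, genes, threshold))


# Two-codon toy genome: five "chromosome-like" genes, three "plasmid-like".
genes = [(0.80, 0.20), (0.82, 0.18), (0.78, 0.22), (0.81, 0.19), (0.79, 0.21),
         (0.20, 0.80), (0.22, 0.78), (0.18, 0.82)]
THRESHOLD = 0.1  # arbitrary illustrative cutoff

mode = approximate_mode(genes, THRESHOLD)
average = mean_usage(genes)
mode_matches = count_matches(mode, genes, THRESHOLD)
average_matches = count_matches(average, genes, THRESHOLD)
```

In this toy genome the average falls between the two gene populations and matches none of them, while the mode sits inside the larger population. The same effect is what would let a modal usage match more genes than the average in a real genome.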
20.
RNA ; 15(10): 1909-16, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19717546

ABSTRACT

Messenger RNA (mRNA) processing plays important roles in gene expression in all domains of life. A number of cases of mRNA cleavage have been documented in Archaea, but available data are fragmentary. We have examined RNAs present in Methanocaldococcus (Methanococcus) jannaschii for evidence of RNA processing upstream of protein-coding genes. Of 123 regions covered by the data, 31 were found to be processed, with 30 including a cleavage site 12-16 nucleotides upstream of the corresponding translation start site. Analyses with 3'-RACE (rapid amplification of cDNA ends) and 5'-RACE indicate that the processing is endonucleolytic. Analyses of the sequences surrounding the processing sites for functional sites, sequence motifs, or potential RNA secondary structure elements did not reveal any recurring features except for an AUG translation start codon and (in most cases) a ribosome binding site. These properties differ from those of all previously described mRNA processing systems. Our data suggest that the processing alters the representation of various genes in the RNA pool and therefore, may play a significant role in defining the balance of proteins in the cell.


Subject(s)
Methanococcus/genetics , RNA Processing, Post-Transcriptional , RNA, Archaeal/genetics , RNA, Messenger/genetics , Base Sequence , Nucleic Acid Conformation , RNA, Archaeal/chemistry , RNA, Messenger/chemistry