Results 1 - 20 of 24
1.
Viruses ; 14(11)2022 11 16.
Article in English | MEDLINE | ID: mdl-36423143

ABSTRACT

The determination of core genes in viral and bacterial genomes is crucial for a better understanding of their relatedness and for their classification. CoreGenes5.0 is an updated user-friendly web-based software tool for the identification of core genes in and data mining of viral and bacterial genomes. This tool has been useful in the resolution of several issues arising in the taxonomic analysis of bacteriophages and has incorporated many suggestions from researchers in that community. The webserver displays results in a format that is easy to understand and allows for automated batch processing, without the need for any user-installed bioinformatics software. CoreGenes5.0 uses group protein clustering of genomes with one of three algorithm options to output a table of core genes from the input genomes. Previously annotated "unknown genes" may be identified with homologues in the output. The updated version of CoreGenes is able to handle more genomes, is faster, and is more robust, providing easier analysis of custom or proprietary datasets. CoreGenes5.0 is accessible at coregenes.org, migrating from a previous site.


Subject(s)
Genome, Bacterial , Software , Computational Biology , Algorithms , Data Mining
2.
J Gen Virol ; 103(4)2022 04.
Article in English | MEDLINE | ID: mdl-35417319

ABSTRACT

Members of the family Chaseviridae are lytic bacterial viruses infecting representatives of the bacterial class Gammaproteobacteria. Chaseviruses have a global distribution. Virions of members of this family have a myovirus morphology (icosahedral head with contractile tail). Genomes are dsDNA of 52-56 kbp with G+C content ranging from 39.3-52.5 %. Chaseviruses, like members of the family Autographiviridae, encode a large single subunit RNA polymerase, but unlike those viruses their promoter sequences have not yet been identified. This is a summary of the International Committee on Taxonomy of Viruses (ICTV) Report on the family Chaseviridae, which is available at ictv.global/report/chaseviridae.


Subject(s)
Bacteriophages , Viruses , Bacteriophages/genetics , Genome, Viral , Virion/genetics , Virus Replication , Viruses/genetics
3.
PLoS One ; 16(10): e0257436, 2021.
Article in English | MEDLINE | ID: mdl-34653198

ABSTRACT

In mammals, the photopigment melanopsin (Opn4) is found in a subset of retinal ganglion cells that serve light detection for circadian photoentrainment and pupil constriction (i.e., mydriasis). For a given species, the efficiency of photoentrainment and length of time that mydriasis occurs are determined by the spectral sensitivity and deactivation kinetics of melanopsin, respectively, and to date, neither of these properties has been described in marine mammals. Previous work has indicated that the absorbance maxima (λmax) of marine mammal rhodopsins (Rh1) have diversified to match the available light spectra at foraging depths. However, similar to the melanopsin λmax of terrestrial mammals (~480 nm), the melanopsins of marine mammals may be conserved, with λmax values tuned to the spectrum of solar irradiance at the water's surface. Here, we investigated the Opn4 pigments of 17 marine mammal species inhabiting diverse photic environments including the Infraorder Cetacea, as well as the Orders Sirenia and Carnivora. Both genomic and cDNA sequences were used to deduce amino acid sequences to identify substitutions most likely involved in spectral tuning and deactivation kinetics of the Opn4 pigments. Our results show that there appear to be no amino acid substitutions in marine mammal Opn4 opsins that would result in any significant change in λmax values relative to their terrestrial counterparts. We also found some marine mammal species to lack several phosphorylation sites in the carboxyl terminal domain of their Opn4 pigments that result in significantly slower deactivation kinetics, and thus longer mydriasis, compared to terrestrial controls. This finding was restricted to cetacean species previously found to lack cone photoreceptor opsins, a condition known as rod monochromacy. These results suggest that the rod monochromat whales rely on extended pupillary constriction to prevent photobleaching of the highly photosensitive all-rod retina when moving between photopic and scotopic conditions.


Subject(s)
Carnivora/metabolism , Cetacea/metabolism , Rod Opsins/metabolism , Sirenia/metabolism , Amino Acid Sequence , Animals , Aquatic Organisms/genetics , Aquatic Organisms/metabolism , Caniformia/genetics , Caniformia/metabolism , Carnivora/genetics , Cetacea/genetics , Kinetics , Models, Molecular , Phylogeny , Rod Opsins/chemistry , Rod Opsins/genetics , Sequence Alignment , Sirenia/genetics
4.
Antibiotics (Basel) ; 9(10)2020 Sep 30.
Article in English | MEDLINE | ID: mdl-33008130

ABSTRACT

Escherichia phage N4 was isolated in 1966 in Italy and has remained a genomic orphan for a long time. It encodes an extremely large virion-associated RNA polymerase unique for bacterial viruses that became characteristic for this group. In recent years, due to new and relatively inexpensive sequencing techniques the number of publicly available phage genome sequences expanded rapidly. This revealed new members of the N4-like phage group, from 33 members in 2015 to 115 N4-like viruses in 2020. Using new technologies and methods for classification, the Bacterial and Archaeal Viruses Subcommittee of the International Committee on Taxonomy of Viruses (ICTV) has moved the classification and taxonomy of bacterial viruses from mere morphological approaches to genomic and proteomic methods. The analysis of 115 N4-like genomes resulted in a huge reassessment of this group and the proposal of a new family "Schitoviridae", including eight subfamilies and numerous new genera.

5.
Microbiologyopen ; 9(9): e1098, 2020 09.
Article in English | MEDLINE | ID: mdl-32602643

ABSTRACT

Few studies have examined the bacterial communities associated with photosynthetic sacoglossan sea slugs. In this study, we determined the bacterial diversity in the clarki ecotype, Elysia crispata using 16S rRNA sequencing. Computational analysis using QIIME2 revealed variability between individual samples, with the Spirochaetes and Bacteroidetes phyla dominating most samples. Tenericutes and Proteobacteria were also found, among other phyla. Computational metabolic profiling of the bacteria revealed a variety of metabolic pathways involving carbohydrate metabolism, lipid metabolism, nucleotide metabolism, and amino acid metabolism. Although associated bacteria may be involved in mutually beneficial metabolic pathways, there was a high degree of variation in the bacterial community of individual slugs. This suggests that many of these relationships are likely opportunistic rather than obligate and that many of these bacteria may live commensally providing no major benefit to the slugs.


Subject(s)
Bacteria/classification , Bacteria/metabolism , Gastropoda/microbiology , Microbiota , Amino Acids/metabolism , Animals , Bacteria/genetics , Bacteria/isolation & purification , Carbohydrate Metabolism , Ecotype , Gastropoda/classification , Gastropoda/metabolism , Lipid Metabolism , Metabolic Networks and Pathways , Metabolome , Nucleotides/metabolism , Photosynthesis , Phylogeny , Symbiosis
6.
PeerJ ; 7: e6821, 2019.
Article in English | MEDLINE | ID: mdl-31360620

ABSTRACT

The aim of this study was the characterization of fatty acids, antioxidant activity, some physical properties, nutrient content, sugars, and minerals in the pulp and seeds of the date cultivar 'Medjool' (Phoenix dactylifera L.) grown in Mexico. The samples were obtained at maturity (Tamar) in the 2017 harvest season in the valleys of San Luis Rio Colorado and Mexicali, Mexico. The following average values were obtained on a % dry weight basis for pulp and seeds, respectively: protein, 3.14% and 4.84%; lipids, 0.75% and 9.94%; fiber, 6.34% and 66.79%; total sugars, 75.32% and 5.88%; reducing sugars, 70.26% and 4.40%; and sucrose, 5.06% and 1.46%. Analysis of the minerals revealed that the most abundant elements for the pulp were: potassium, 851.98 mg/100 g; magnesium, 142.97 mg/100 g; and phosphorus, 139.40 mg/100 g, whereas for the seeds, they were potassium, 413.36 mg/100 g; sulfur, 151.36 mg/100 g; and phosphorus, 92.42 mg/100 g. Gas chromatography-mass spectrometry analysis revealed that the major unsaturated fatty acid was oleic acid, at 52.34% and 45.92%, respectively, for pulp and seeds. The main saturated fatty acids were palmitic acid (6.75%) and lauric acid (17.24%) in pulp and seeds, respectively. The total phenolic content was 1.16 and 13.73 mg GAE/100 g for pulp and seeds, respectively. Finally, the antioxidant activities were: b-carotene, 65.50% and 47.75%; DPPH, 0.079 IC50 g/L and 0.0046 IC50 g/L; and ABTS, 13.72 IC50 g/L and 0.238 IC50 g/L, respectively. The results obtained in this study confirm that the 'Medjool' cultivar grown in Mexico has the same quality of nutrients and antioxidants as those grown in the other main date-producing countries.

7.
Sensors (Basel) ; 17(8)2017 Aug 21.
Article in English | MEDLINE | ID: mdl-28825658

ABSTRACT

Hypoplasia and ovarian cysts are the most common ovarian pathologies in cattle. In this genome-wide study we analyzed the signal intensity of 648,315 Single Nucleotide Polymorphisms (SNPs) and identified 1338 genes differentiating cows with ovarian pathologies from healthy cows. The sample consisted of six cows presenting an ovarian pathology and six healthy cows. SNP signal intensities were measured with a genotyping process using the Axiom Genome-Wide BOS 1 SNPchip. Statistical tests for equality of variance and mean were applied to SNP intensities, and significance p-values were obtained. A Benjamini-Hochberg multiple testing correction revealed significant SNPs. Corresponding genes were identified using the Bovine Genome UMD 3.1 annotation. Principal Components Analysis (PCA) confirmed differentiation. An analysis of Copy Number Variations (CNVs), obtained from signal intensities, revealed no evidence of association between ovarian pathologies and CNVs. In addition, a haplotype frequency analysis showed no association with ovarian pathologies. Results show that SNP signal intensity, which captures not only information for base-pair genotype elucidation but also the amount of fluorescence nucleotide synthetization produced in an enzymatic reaction, is a rich source of information that, by itself or in combination with base-pair genotypes, might be used to implement differentiation, prediction and diagnostic procedures, increasing the scope of applications for Genotyping Microarrays.


Subject(s)
Polymorphism, Single Nucleotide , Animals , Cattle , DNA Copy Number Variations , Female , Genome , Genome-Wide Association Study , Genotype , Ovarian Diseases
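
The Benjamini-Hochberg correction named in the abstract above can be illustrated with a minimal sketch. The abstract does not state the false discovery rate used or give any p-values, so the `alpha` default and the example inputs below are assumptions, and the function name is hypothetical rather than taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha.

    alpha=0.05 is an assumed default; the abstract does not state the level used.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Made-up p-values for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(example))
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected if a larger p-value further down the ranking meets its threshold.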
8.
Biol Bull ; 231(3): 236-244, 2016 12.
Article in English | MEDLINE | ID: mdl-28048954

ABSTRACT

An endogenous retrovirus that is present in the sea slug Elysia chlorotica is expressed in all individuals at the end of the annual life cycle. But the precise role of the virus, if any, in slug senescence or death is unknown. We have determined the genomic sequence of the virus and performed a phylogenetic analysis of the data. The 6060-base pair genome of the virus possesses a reverse transcriptase-domain-containing protein that shows similarity to retrotransposon sequences found in Aplysia californica and Strongylocentrotus purpuratus. However, nucleotide BLAST analysis of the whole genome resulted in hits to only a few portions of the genome, indicating that the Elysia chlorotica retrovirus is novel, has not been previously sequenced, and does not have great genetic similarity to other known viral species. When more invertebrate retroviral genomes are examined, a more precise phylogenetic placement of the Elysia chlorotica retrovirus can be determined.


Subject(s)
Endogenous Retroviruses/classification , Endogenous Retroviruses/genetics , Gastropoda/virology , Phylogeny , Animals , Base Sequence , Genome, Viral/genetics , Genomics
9.
Bioinformation ; 12(6): 301-310, 2016.
Article in English | MEDLINE | ID: mdl-28293072

ABSTRACT

The evolution of sequencing technology has led to an enormous increase in the number of genomes that have been sequenced. This is especially true in the field of virus genomics. In order to extract meaningful biological information from these genomes, whole genome data mining software tools must be utilized. Hundreds of tools have been developed to analyze biological sequence data. However, only some of these tools are user-friendly to biologists. Several of these tools that have been successfully used to analyze adenovirus genomes are described here. These include Artemis, EMBOSS, pDRAW, zPicture, CoreGenes, GeneOrder, and PipMaker. These tools provide functionalities such as visualization, restriction enzyme analysis, alignment, and proteome comparisons that are extremely useful in the bioinformatics analysis of adenovirus genomes.

10.
J Neurovirol ; 22(3): 336-48, 2016 06.
Article in English | MEDLINE | ID: mdl-26631080

ABSTRACT

Theiler's murine encephalomyelitis virus (TMEV) infects the central nervous system of mice and causes a demyelinating disease that is a model for multiple sclerosis. During the chronic phase of the disease, TMEV persists in oligodendrocytes and macrophages. Lack of remyelination has been attributed to insufficient proliferation and differentiation of oligodendrocyte progenitor cells (OPCs), but the molecular mechanisms remain unknown. Here, we employed pluripotent stem cell technologies to generate pure populations of mouse OPCs to study the temporal and molecular effects of TMEV infection. Global transcriptome analysis of RNA sequencing data revealed that TMEV infection of OPCs caused significant up-regulation of 1926 genes, whereas 1853 genes were significantly down-regulated compared to uninfected cells. Pathway analysis revealed that TMEV disrupted many genes required for OPC growth and maturation. Down-regulation of Olig2, a transcription factor necessary for OPC proliferation, was confirmed by real-time PCR, immunofluorescence microscopy, and western blot analysis. Depletion of Olig2 was not found to be specific to viral strain and did not require expression of the leader (L) protein, which is a multifunctional protein important for persistence, modulation of gene expression, and cell death. These data suggest that direct infection of OPCs by TMEV may inhibit remyelination during the chronic phase of TMEV-induced demyelinating disease.


Subject(s)
Demyelinating Diseases/virology , Host-Pathogen Interactions , Oligodendrocyte Precursor Cells/virology , Oligodendrocyte Transcription Factor 2/genetics , Pluripotent Stem Cells/virology , Theilovirus/genetics , Animals , Cell Differentiation , Cell Line , Cricetinae , Demyelinating Diseases/pathology , Epithelial Cells/virology , Gene Expression Profiling , Gene Expression Regulation , Mice , Molecular Sequence Annotation , Oligodendrocyte Precursor Cells/metabolism , Oligodendrocyte Transcription Factor 2/deficiency , Pluripotent Stem Cells/metabolism , Primary Cell Culture , Theilovirus/metabolism , Transcriptome
11.
Bioinformation ; 11(10): 466-73, 2015.
Article in English | MEDLINE | ID: mdl-26664031

ABSTRACT

Assigning functional information to hypothetical proteins in virus genomes is crucial for gaining insight into their proteomes. Human adenoviruses are medium sized viruses that cause a range of diseases. Their genomes possess proteins with uncharacterized function known as hypothetical proteins. Using a wide range of protein function prediction servers, functional information was obtained about these hypothetical proteins. A comparison of functional information obtained from these servers revealed that some of them produced functional information, while others provided little functional information about these human adenovirus hypothetical proteins. The PFP, ESG, PSIPRED, 3d2GO, and ProtFun servers produced the most functional information regarding these hypothetical proteins.

12.
Virology ; 477: 144-154, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25466308

ABSTRACT

Using a variety of genomic (BLASTN, ClustalW) and proteomic (Phage Proteomic Tree, CoreGenes) tools we have tackled the taxonomic status of members of the largest bacteriophage family, the Siphoviridae. In all, over 400 phages were examined and we were able to propose 39 new genera, comprising 216 phage species, and add 62 species to two previously defined genera (Phic3unalikevirus; L5likevirus) grouping, in total, 390 fully sequenced phage isolates. Many of the remainder are orphans to which the Bacterial and Archaeal Viruses Subcommittee of the International Committee on Taxonomy of Viruses (ICTV) chooses not to ascribe genus status for the time being.


Subject(s)
Bacteriophages/classification , Genome, Viral , Proteome/analysis , Siphoviridae/classification , Viral Proteins/analysis , Bacteriophages/chemistry , Bacteriophages/genetics , Genomics/methods , Proteomics/methods , Siphoviridae/chemistry , Siphoviridae/genetics , Virology/methods
13.
PLoS One ; 8(10): e78280, 2013.
Article in English | MEDLINE | ID: mdl-24167615

ABSTRACT

Leopard complex spotting is a group of white spotting patterns in horses caused by an incompletely dominant gene (LP) where homozygotes (LP/LP) are also affected with congenital stationary night blindness (CSNB). Previous studies implicated Transient Receptor Potential Cation Channel, Subfamily M, Member 1 (TRPM1) as the best candidate gene for both CSNB and LP. RNA-Seq data pinpointed a 1378 bp insertion in intron 1 of TRPM1 as the potential cause. This insertion, a long terminal repeat (LTR) of an endogenous retrovirus, was completely associated with LP, testing 511 horses (χ²=1022.00, p<<0.0005), and CSNB, testing 43 horses (χ²=43, p<<0.0005). The LTR was shown to disrupt TRPM1 transcription by premature poly-adenylation. Furthermore, while deleterious transposable element insertions should be quickly selected against, the identification of this insertion in three ancient DNA samples suggests it has been maintained in the horse gene pool for at least 17,000 years. This study represents the first description of an LTR insertion being associated with both a pigmentation phenotype and an eye disorder.


Subject(s)
Horse Diseases/genetics , Mutagenesis, Insertional , Night Blindness/genetics , Night Blindness/veterinary , Retroviridae/genetics , Skin Pigmentation/genetics , TRPM Cation Channels/genetics , Animals , Female , Horses , Male , Night Blindness/metabolism , Retroelements , TRPM Cation Channels/metabolism
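
The two chi-square values reported above fit the general identity that a perfectly associated k x k contingency table on N observations gives a Pearson statistic of N(k - 1): 2 x 511 = 1022 for a 3 x 3 table and 1 x 43 = 43 for a 2 x 2 table. The sketch below checks this numerically. The table shapes and all cell counts are assumptions made for illustration; the abstract reports only the totals and the statistics, not the layout of the tests.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: only the totals (511 and 43) come from the abstract.
# Rows: insertion genotype; columns: phenotype class; perfect association.
lp_table = [[200, 0, 0],
            [0, 211, 0],
            [0, 0, 100]]
csnb_table = [[20, 0],
              [0, 23]]

print(round(chi_square(lp_table), 2))
print(round(chi_square(csnb_table), 2))
```

Any other split of the diagonal counts with the same totals yields the same statistics, which is why the reported values by themselves do not reveal the genotype frequencies.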
14.
BMC Res Notes ; 6: 140, 2013 Apr 08.
Article in English | MEDLINE | ID: mdl-23566564

ABSTRACT

BACKGROUND: CoreGenes3.5 is a webserver that determines sets of core genes from viral and small bacterial genomes as an automated batch process. Previous versions of CoreGenes have been used to classify bacteriophage genomes and mine data from pathogen genomes. FINDINGS: CoreGenes3.5 accepts as input GenBank accession numbers of genomes and performs iterative BLASTP analyses to output a set of core genes. After completion of the program run, the results can be either displayed in a new window for one pair of reference and query genomes or emailed to the user for multiple pairs of small genomes in tabular format. CONCLUSIONS: With the number of genomes sequenced increasing daily and interest in determining phylogenetic relationships, CoreGenes3.5 provides a user-friendly web interface for wet-bench biologists to process multiple small genomes for core gene determinations. CoreGenes3.5 is available at http://binf.gmu.edu:8080/CoreGenes3.5.


Subject(s)
Computational Biology/methods , Genome, Bacterial , Genome, Viral , Algorithms , Data Mining , Databases, Genetic , Genomics/methods , Internet , Phylogeny , Software
15.
Adv Exp Med Biol ; 680: 379-85, 2010.
Article in English | MEDLINE | ID: mdl-20865522

ABSTRACT

A combined genomics and in situ proteomics approach can be used to determine and classify the relatedness of organisms. The common set of proteins shared within a group of genomes is encoded by the "core" set of genes, which is increasingly recognized as a metric for parsing viral and bacterial species. These can be described by the concept of a "pan-genome", which consists of this "core" set and a "dispensable" set, i.e., genes found in one or more but not all organisms in the grouping. "CoreGenesUniqueGenes" (CGUG) is a web-based tool that determines this core set of proteins in a set of genomes as well as parses the dispensable set of unique proteins in a pair of viral or small bacterial genomes. This proteome-based methodology is validated using bacteriophages, aiding the reevaluation of current classifications of bacteriophages. The utility of CGUG in the analysis of small bacterial genomes and the annotation of hypothetical proteins is also presented.


Subject(s)
Algorithms , Bacteriophages/classification , Bacteriophages/genetics , Genomics/statistics & numerical data , Bacteriophage P22/classification , Bacteriophage P22/genetics , Bacteriophage T7/classification , Bacteriophage T7/genetics , Bacteriophage lambda/classification , Bacteriophage lambda/genetics , Burkholderia cenocepacia/classification , Burkholderia cenocepacia/genetics , Computational Biology , Genes, Viral , Genome, Bacterial , Internet , Podoviridae/classification , Podoviridae/genetics , Proteome , Proteomics/statistics & numerical data
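
The pan-genome vocabulary in the abstract above (a "core" set shared by all genomes, a "dispensable" set found in some but not all, and unique proteins) maps directly onto set operations. The following is a minimal sketch, assuming each genome has already been reduced to a set of protein family labels. The function name, genome names, and labels are hypothetical, and this is not the CGUG implementation.

```python
def partition_pan_genome(genomes):
    """Split protein families into core, dispensable, and per-genome unique sets.

    genomes: dict mapping a genome name to a set of protein family labels.
    """
    family_sets = list(genomes.values())
    pan = set().union(*family_sets)  # every family seen in any genome
    core = set.intersection(*family_sets) if family_sets else set()
    dispensable = pan - core  # present in at least one genome, but not all

    unique = {}
    for name, families in genomes.items():
        # Families found in this genome and in no other genome.
        others = set().union(*(f for n, f in genomes.items() if n != name))
        unique[name] = families - others
    return core, dispensable, unique


# Hypothetical input for illustration.
genomes = {
    "phageA": {"polymerase", "capsid", "tail", "lysin"},
    "phageB": {"polymerase", "capsid", "tail", "integrase"},
    "phageC": {"polymerase", "capsid", "holin"},
}
core, dispensable, unique = partition_pan_genome(genomes)
print(sorted(core))
print(sorted(dispensable))
print({name: sorted(fams) for name, fams in unique.items()})
```

In practice the hard part is the step this sketch assumes away: deciding by sequence similarity which proteins belong to the same family.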
16.
BMC Res Notes ; 3: 41, 2010 Feb 23.
Article in English | MEDLINE | ID: mdl-20178631

ABSTRACT

BACKGROUND: The growing whole genome sequence databases necessitate the development of user-friendly software tools to mine these data. Web-based tools are particularly useful to wet-bench biologists as they enable platform-independent analysis of sequence data, without having to perform complex programming tasks and software compiling. FINDINGS: GeneOrder4.0 is a web-based "on-the-fly" synteny and gene order analysis tool for comparative bacterial genomics (ca. 8 Mb). It enables the visualization of synteny by plotting protein similarity scores between two genomes and it also provides visual annotation of "hypothetical" proteins from older archived genomes based on more recent annotations. CONCLUSIONS: The web-based software tool GeneOrder4.0 is a user-friendly application that has been updated to allow the rapid analysis of synteny and gene order in large bacterial genomes. It is developed with the wet-bench researcher in mind.

17.
Viruses ; 2(1): 1-26, 2010 Jan.
Article in English | MEDLINE | ID: mdl-21994597

ABSTRACT

Technological advances and increasingly cost-effective methodologies in DNA sequencing and computational analysis are providing genome and proteome data for human adenovirus research. Applying these tools, data and derived knowledge to the development of vaccines against these pathogens will provide effective prophylactics. The same data and approaches can be applied to vector development for gene delivery in gene therapy and vaccine delivery protocols. Examination of several field strain genomes and their analyses provide examples of data that are available using these approaches. An example of the development of HAdV-B3 both as a vaccine and also as a vector is presented.

18.
Virology ; 397(1): 113-8, 2010 Feb 05.
Article in English | MEDLINE | ID: mdl-19932910

ABSTRACT

Human adenovirus type 3 (HAdV-B3) has an apparently stable genome yet remains a major circulating and problematic respiratory pathogen. Comparisons of the prototype genome to genomes from three current field strains, including two isolated from epidemics, and a laboratory strain, yielded small-scale nucleotide variations across 50 years of time and space (U.S. and China). This is in contrast to the recombination events that have been reported recently for HAdV genomes. Recombinant genomes have been identified in emergent HAdV pathogens, and recombination is a pathway for the molecular evolution of types. These two contrasting views of HAdV genome stability have repercussions in the development and use of vaccines for countering HAdV-B3, as well as in the continued effectiveness of vaccines developed against earlier and current circulating types of HAdV.


Subject(s)
Adenoviruses, Human/genetics , DNA, Viral/genetics , Genetic Variation , Genomic Instability , Adenoviridae Infections/virology , Adenoviruses, Human/chemistry , Adenoviruses, Human/isolation & purification , Amino Acid Sequence , China , Cluster Analysis , Geography , Humans , Molecular Sequence Data , Phylogeny , Proteome/analysis , Sequence Analysis, DNA , Sequence Homology, Amino Acid , Time Factors , United States , Viral Proteins/analysis
19.
BMC Microbiol ; 9: 224, 2009 Oct 26.
Article in English | MEDLINE | ID: mdl-19857251

ABSTRACT

BACKGROUND: We advocate unifying classical and genomic classification of bacteriophages by integration of proteomic data and physicochemical parameters. Our previous application of this approach to the entirely sequenced members of the Podoviridae fully supported the current phage classification of the International Committee on Taxonomy of Viruses (ICTV). It appears that horizontal gene transfer generally does not totally obliterate evolutionary relationships between phages. RESULTS: CoreGenes/CoreExtractor proteome comparison techniques applied to 102 Myoviridae suggest the establishment of three subfamilies (Peduovirinae, Teequatrovirinae, and Spounavirinae) and eight new independent genera (Bcep781, BcepMu, FelixO1, HAP1, Bzx1, PB1, phiCD119, and phiKZ-like viruses). The Peduovirinae subfamily, derived from the P2-related phages, is composed of two distinct genera: the "P2-like viruses" and the "HP1-like viruses". At present, the more complex Teequatrovirinae subfamily has two genera, the "T4-like" and "KVP40-like viruses". In the genus "T4-like viruses" proper, four groups sharing >70% proteins are distinguished: T4-type, 44RR-type, RB43-type, and RB49-type viruses. The Spounavirinae contain the "SPO1-like" and "Twort-like viruses". CONCLUSION: The hierarchical clustering of these groupings provides biologically significant subdivisions, which are consistent with our previous analysis of the Podoviridae.


Subject(s)
Genome, Viral , Myoviridae/classification , Proteomics/methods , Cluster Analysis , Computational Biology , Myoviridae/genetics , Phylogeny , Sequence Analysis, Protein , Sequence Homology, Amino Acid , Viral Proteins/genetics
20.
BMC Res Notes ; 2: 168, 2009 Aug 25.
Article in English | MEDLINE | ID: mdl-19706165

ABSTRACT

BACKGROUND: Viruses and small-genome bacteria (~2 megabases and smaller) comprise a considerable population in the biosphere and are of interest to many researchers. These genomes are now sequenced at an unprecedented rate and require complementary computational tools to analyze. "CoreGenesUniqueGenes" (CGUG) is an in silico genome data mining tool that determines a "core" set of genes from two to five organisms with genomes in this size range. Core and unique genes may reflect similar niches and needs, and may be used in classifying organisms. FINDINGS: CGUG is available at http://binf.gmu.edu/geneorder.html as a web-based on-the-fly tool that performs iterative BLASTP analyses using a reference genome and up to four query genomes to provide a table of genes common to these genomes. The result is an in silico display of genomes and their proteomes, allowing for further analysis. CGUG can be used for "genome annotation by homology", as demonstrated with Chlamydophila and Francisella genomes. CONCLUSION: CGUG is used to reanalyze the ICTV-based classifications of bacteriophages, to reconfirm long-standing relationships and to explore new classifications. These genomes have been problematic in the past, due largely to horizontal gene transfers. CGUG is validated as a tool for reannotating small genome bacteria using more up-to-date annotations by similarity or homology. These serve as an entry point for wet-bench experiments to confirm the functions of these "hypothetical" and "unknown" proteins.
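
The iterative scheme described above (a reference genome compared against up to four query genomes, producing a table of common genes) can be sketched as successive filtering of the reference protein list. This sketch assumes the similarity search has already been run and reduced to one dictionary of above-threshold hits per query genome. The function name, identifiers, and data layout are hypothetical; no BLASTP call is made and this is not the CGUG code.

```python
def core_gene_table(reference_proteins, query_hits):
    """Keep reference proteins that have a hit in every query genome.

    reference_proteins: list of reference protein IDs.
    query_hits: one dict per query genome, mapping a reference protein ID
        to the ID of its best matching query protein (hits above an
        assumed score threshold only).
    Returns a dict: reference protein ID -> list of matches, one per query.
    """
    surviving = list(reference_proteins)
    matches = {protein: [] for protein in surviving}
    for hits in query_hits:
        # Each round narrows the candidate list to proteins matched in this query.
        surviving = [protein for protein in surviving if protein in hits]
        for protein in surviving:
            matches[protein].append(hits[protein])
    return {protein: matches[protein] for protein in surviving}


# Hypothetical identifiers for illustration.
reference = ["r1", "r2", "r3", "r4"]
query_a = {"r1": "a1", "r2": "a2", "r4": "a4"}
query_b = {"r1": "b1", "r3": "b3", "r4": "b9"}
print(core_gene_table(reference, [query_a, query_b]))
```

Filtering round by round means each later comparison only has to consider proteins that are still candidates, which is one plausible reason to run the analysis iteratively rather than all-against-all.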
