1.
PLoS Genet ; 16(10): e1009042, 2020 10.
Article in English | MEDLINE | ID: mdl-33022009

ABSTRACT

A ~10-11 bp periodicity in dinucleotides imparting DNA bending, with shorter periods found in organisms with positively-supercoiled DNA and longer periods found in organisms with negatively-supercoiled DNA, was previously suggested to assist in DNA compaction. However, when measured with more robust methods, variation in the observed periods between organisms with different growth temperatures is not consistent with that hypothesis. We demonstrate that dinucleotide periodicity does not arise solely by mutational biases but is under selection. We found variation between genomes in both the period and the suite of dinucleotides that are periodic. Whereas organisms with similar growth temperatures have highly variable periods, differences in periods increase with phylogenetic distance between organisms. In addition, while the suites of dinucleotides under selection for periodicity become more dissimilar among more distantly-related organisms, there is a core set of dinucleotides that are strongly periodic among genomes in all domains of life. Notably, this core set of periodic motifs is not involved in DNA bending. These data indicate that dinucleotide periodicity is an ancient genomic architecture which may play a role in shaping the evolution of genes and genomes.


Subject(s)
DNA/genetics , Dinucleotide Repeats/genetics , Evolution, Molecular , Nucleotide Motifs/genetics , Archaea/genetics , Genome, Bacterial/genetics , Genomics , Mutation/genetics , Nucleosomes/genetics , Paleontology , Phylogeny , Selection, Genetic/genetics
2.
PLoS Genet ; 14(5): e1007421, 2018 05.
Article in English | MEDLINE | ID: mdl-29813058

ABSTRACT

Despite significant frequencies of lateral gene transfer between species, higher taxonomic groups of bacteria show ecological and phenotypic cohesion. This suggests that barriers prevent panmictic dissemination of genes via lateral gene transfer. We have proposed that most bacterial genomes have a functional architecture imposed by Architecture IMparting Sequences (AIMS). AIMS are defined as 8 base pair sequences preferentially abundant on leading strands, whose abundance and strand-bias are positively correlated with proximity to the replication terminus. We determined that inversions whose endpoints lie within a single chromosome arm, which would reverse the polarity of AIMS in the inverted region, are both shorter and less frequent near the replication terminus. This distribution is consistent with the increased selection on AIMS function in this region, thus constraining DNA rearrangement. To test the hypothesis that AIMS also constrain DNA transfer between genomes, AIMS were identified in genomes while ignoring atypical, potentially laterally-transferred genes. The strand-bias of AIMS within recently acquired genes was negatively correlated with the distance of those genes from their genome's replication terminus. This suggests that selection for AIMS function prevents the acquisition of genes whose AIMS are not found predominantly in the permissive orientation. This constraint has led to the loss of at least 18% of genes acquired by transfer in the terminus-proximal region. We used completely sequenced genomes to produce a predictive road map of paths of expected horizontal gene transfer between species based on AIMS compatibility between donor and recipient genomes. These results support a model whereby organisms retain introgressed genes only if the benefits conferred by their encoded functions outweigh the detriments incurred by the presence of foreign DNA lacking genome-wide architectural information.


Subject(s)
Bacteria/genetics , Chromosomes/genetics , Gene Rearrangement/genetics , Gene Transfer, Horizontal , Selection, Genetic , Chromosome Inversion , DNA Replication , Genome, Bacterial , Phylogeny
3.
Nucleic Acids Res ; 46(5): 2265-2278, 2018 03 16.
Article in English | MEDLINE | ID: mdl-29432573

ABSTRACT

Highly Iterated Palindrome 1 (HIP1, GCGATCGC) is hyper-abundant in most cyanobacterial genomes. In some cyanobacteria, average HIP1 abundance exceeds one motif per gene. Such high abundance suggests a significant role in cyanobacterial biology. However, 20 years of study have not revealed whether HIP1 has a function, much less what that function might be. We show that HIP1 is 15- to 300-fold over-represented in genomes analyzed. More importantly, HIP1 sites are conserved both within and between open reading frames, suggesting that their overabundance is maintained by selection rather than by continual replenishment by neutral processes, such as biased DNA repair. This evidence for selection suggests a functional role for HIP1. No evidence was found to support a functional role as a peptide or RNA motif or a role in the regulation of gene expression. Rather, we demonstrate that the distribution of HIP1 along cyanobacterial chromosomes is significantly periodic, with periods ranging from 10 to 90 kb, consistent in scale with periodicities reported for co-regulated, co-expressed and evolutionarily correlated genes. The periodicity we observe is also comparable in scale to chromosomal interaction domains previously described in other bacteria. In this context, our findings imply HIP1 functions associated with chromosome and nucleoid structure.


Subject(s)
Bacterial Proteins/genetics , Cyanobacteria/genetics , Genome, Bacterial/genetics , Selection, Genetic , Bacterial Proteins/metabolism , Base Sequence , Chromosomes, Bacterial/genetics , Cyanobacteria/classification , Cyanobacteria/metabolism , DNA, Bacterial/genetics , Gene Expression Regulation, Bacterial , Periodicity , Phylogeny
4.
Genome Biol Evol ; 8(6): 2065-75, 2016 07 03.
Article in English | MEDLINE | ID: mdl-27289093

ABSTRACT

Neisseria meningitidis is an important cause of meningococcal disease globally. Sequence type (ST)-11 clonal complex (cc11) is a hypervirulent meningococcal lineage historically associated with serogroup C capsule and is believed to have acquired the W capsule through a C to W capsular switching event. We studied the sequence of the capsule gene cluster (cps) and adjoining genomic regions of 524 invasive W cc11 strains isolated globally. We identified recombination breakpoints corresponding to two distinct recombination events within W cc11. An 8.4-kb recombinant region, likely acquired from W cc22 and including the sialic acid/glycosyl-transferase gene csw, resulted in a C→W change in capsular phenotype. A 13.7-kb recombinant segment, likely acquired from the Y cc23 lineage, includes 4.5 kb of cps genes and 8.2 kb downstream of the cps cluster, resulting in allelic changes in capsule translocation genes. A vast majority of W cc11 strains (497/524, 94.8%) retain both recombination events as evidenced by sharing identical or very closely related capsular allelic profiles. These data suggest that the W cc11 capsular switch involved two separate recombination events and that current global W cc11 meningococcal disease is caused by strains bearing this mosaic capsular switch.


Subject(s)
Meningococcal Infections/genetics , Neisseria meningitidis/genetics , Phylogeny , Recombination, Genetic , Genome, Bacterial , Genomics , Humans , Meningococcal Infections/microbiology , Multigene Family , Neisseria meningitidis/pathogenicity , Serogroup
5.
Proc Biol Sci ; 283(1829)2016 04 27.
Article in English | MEDLINE | ID: mdl-27097926

ABSTRACT

Despite the importance of host attributes for the likelihood of associated microbial transmission, individual variation is seldom considered in studies of wildlife disease. Here, we test the influence of host phenotypes on social network structure and the likelihood of cuticular bacterial transmission from exposed individuals to susceptible group-mates using female social spiders (Stegodyphus dumicola). Based on the interactions of resting individuals of known behavioural types, we assessed whether individuals assorted according to their behavioural traits. We found that individuals preferentially interacted with individuals of unlike behavioural phenotypes. We next applied a green fluorescent protein-transformed cuticular bacterium, Pantoea sp., to individuals and allowed them to interact with an unexposed colony-mate for 24 h. We found evidence for transmission of bacteria in 55% of cases. The likelihood of transmission was influenced jointly by the behavioural phenotypes of both the exposed and susceptible individuals: transmission was more likely when exposed spiders exhibited higher 'boldness' relative to their colony-mate, and when unexposed individuals were in better body condition. Indirect transmission via shared silk took place in only 15% of cases. Thus, bodily contact appears key to transmission in this system. These data represent a fundamental step towards understanding how individual traits influence larger-scale social and epidemiological dynamics.


Subject(s)
Spiders/microbiology , Spiders/physiology , Animals , Female , Pantoea/isolation & purification , Phenotype , Silk , Social Behavior
6.
Microbiology (Reading) ; 162(4): 610-621, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26888189

ABSTRACT

Antigenically distinct members of bacterial species can be differentially distributed in the environment. Predators known to consume antigenically distinct prey with different efficiencies are also differentially distributed. Here we show that antigenically distinct, but otherwise isogenic and physiologically indistinct, strains of Salmonella enterica show differential survival in natural soil, sediment and intestinal environments, where they would face a community of predators. Decline in overall cell numbers is attenuated by factors that inhibit the action of predators, including heat and antiprotozoal and antihelminthic drugs. Moreover, the fitness of strains facing these predators - calculated by comparing survival with and without treatments attenuating predator activity - varies between environments. These results suggest that relative survival in natural environments is arbitrated by communities of natural predators whose feeding preferences, if not species composition, vary between environments. These data support the hypothesis that survival against natural predators may drive the differential distribution of bacteria among microenvironments.

7.
EBioMedicine ; 2(10): 1447-55, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26629539

ABSTRACT

Neisseria meningitidis is a leading bacterial cause of sepsis and meningitis globally with dynamic strain distribution over time. Beginning with an epidemic among Hajj pilgrims in 2000, serogroup W (W) sequence type (ST) 11 emerged as a leading cause of epidemic meningitis in the African 'meningitis belt' and endemic cases in South America, Europe, Middle East and China. Previous genotyping studies were unable to reliably discriminate sporadic W ST-11 strains in circulation since 1970 from the Hajj outbreak strain (Hajj clone). It is also unclear what proportion of more recent W ST-11 disease clusters are caused by direct descendants of the Hajj clone. Whole genome sequences of 270 meningococcal strains isolated from patients with invasive meningococcal disease globally from 1970 to 2013 were compared using whole genome phylogenetic and major antigen-encoding gene sequence analyses. We found that all W ST-11 strains were descendants of an ancestral strain that had undergone unique capsular switching events. The Hajj clone and its descendants were distinct from other W ST-11 strains in that they shared a common antigen gene profile and had undergone recombination involving virulence genes encoding factor H binding protein, nitric oxide reductase, and nitrite reductase. These data demonstrate that recent acquisition of a distinct antigen-encoding gene profile and variations in meningococcal virulence genes was associated with the emergence of the Hajj clone. Importantly, W ST-11 strains unrelated to the Hajj outbreak contribute a significant proportion of W ST-11 cases globally. This study helps illuminate genomic factors associated with meningococcal strain emergence and evolution.


Subject(s)
Genome, Viral , Genomics , Meningitis, Meningococcal/epidemiology , Meningitis, Meningococcal/microbiology , Neisseria meningitidis/genetics , Neisseria meningitidis/pathogenicity , Antigens, Bacterial/genetics , Computational Biology/methods , Disease Outbreaks , Genes, Bacterial , Genotype , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Neisseria meningitidis/classification , Neisseria meningitidis/isolation & purification , Open Reading Frames , Phylogeny , Polymorphism, Single Nucleotide , Serogroup , Virulence/genetics
8.
Elife ; 4: e06416, 2015 Apr 28.
Article in English | MEDLINE | ID: mdl-25919952

ABSTRACT

The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery.


Subject(s)
DNA, Viral/genetics , Genetic Variation , Genome, Viral , Mycobacteriophages/genetics , Phylogeny , Biomedical Research/ethics , Cooperative Behavior , Gene Flow , High-Throughput Nucleotide Sequencing , Information Dissemination , Mosaicism , Mycobacteriophages/classification , Mycobacterium smegmatis/virology , Phylogeography , Workforce
9.
mBio ; 5(6): e02145, 2014 Dec 02.
Article in English | MEDLINE | ID: mdl-25467442

ABSTRACT

Newly emerging human viruses such as Ebola virus, severe acute respiratory syndrome (SARS) virus, and HIV likely originate within an extant population of viruses in nonhuman hosts and acquire the ability to infect and cause disease in humans. Although several mechanisms preventing viral infection of particular hosts have been described, the mechanisms and constraints on viral host expansion are ill defined. We describe here mycobacteriophage Patience, a newly isolated phage recovered using Mycobacterium smegmatis mc²155 as a host. Patience has genomic features distinct from its M. smegmatis host, including a much lower GC content (50.3% versus 67.4%) and an abundance of codons that are rarely used in M. smegmatis. Nonetheless, it propagates well in M. smegmatis, and we demonstrate the use of mass spectrometry to show expression of over 75% of the predicted proteins, to identify new genes, to refine the genome annotation, and to estimate protein abundance. We propose that Patience evolved primarily among lower-GC hosts and that the disparities between its genomic profile and that of M. smegmatis presented only a minimal barrier to host expansion. Rapid adaptations to its new host include recent acquisition of higher-GC genes, expression of out-of-frame proteins within predicted genes, and codon selection among highly expressed genes toward the translational apparatus of its new host.
IMPORTANCE: The mycobacteriophage Patience genome has a notably lower GC content (50.3%) than its Mycobacterium smegmatis host (67.4%) and has markedly different codon usage biases. The viral genome has an abundance of codons that are rare in the host and are decoded by wobble tRNA pairing, although the phage grows well and expression of most of the genes is detected by mass spectrometry. Patience thus has the genomic profile of a virus that evolved primarily in one type of host genetic landscape (moderate-GC bacteria) but has found its way into a distinctly different high-GC environment. Although Patience genes are ill matched to the host expression apparatus, this is of little functional consequence and has not evidently imposed a barrier to migration across the microbial landscape. Interestingly, comparison of expression levels and codon usage profiles reveals evidence of codon selection as the genome evolves and adapts to its new environment.


Subject(s)
Genome, Viral , Mycobacteriophages/chemistry , Mycobacteriophages/genetics , Mycobacterium smegmatis/virology , Proteome/analysis , Viral Proteins/analysis , Viral Proteins/genetics , Base Composition , Codon , Mass Spectrometry , Mycobacteriophages/isolation & purification , Mycobacteriophages/physiology , Virus Replication
10.
J Microbiol Methods ; 94(1): 1-12, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23588324

ABSTRACT

Flow cytometry is an effective tool for enumerating fluorescently-labeled microbes recovered from natural environments. However, low signal strength and the presence of fluorescent, non-cellular particles complicate the separation of cellular events from noise. Existing classification methods rely on the arbitrary placement of noise thresholds, resulting in potentially high rates of misclassification of fluorescent cells, thus precluding the robust estimation of the proportions of classes of fluorescent cells. Here we present a method for objectively separating signal from noise. Rather than setting an arbitrary noise threshold, the Z-scoring approach uses the Gaussian distribution of signal strength (a) to locate noise threshold for individual fluorophores, (b) to predict the likelihood of different fluorescent genotypes in producing the signal observed, and (c) to normalize the fraction of cellular events count for each fluorescent cell class. The likelihood framework allows rejection of alternative genotypes, leading to robust and reliable classification of fluorescent cells. Use of Z-scoring in classification of cells expressing multiple fluorophores, use of spillover in actively scoring events, and the successful classification of multiple fluorophores using a single detector within a flow cytometer are discussed. A software package that performs Z-scoring for cells labeled with one or more fluorophores is described.
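
As a rough illustration of step (a) in the abstract above, the sketch below z-scores events against a Gaussian fitted to a noise-only sample. It is not the software package the abstract describes: the function names, the use of a separate noise-only control, and the cutoff of 3 standard deviations are all assumptions made for this example.

```python
import statistics
from typing import List


def z_scores(events: List[float], noise: List[float]) -> List[float]:
    """Express each event's signal in standard deviations from the noise mean.

    `noise` is a hypothetical noise-only control sample (at least two values,
    not all identical); `events` are the signals to be scored.
    """
    mu = statistics.fmean(noise)
    sd = statistics.stdev(noise)  # sample standard deviation of the noise
    if sd == 0:
        raise ValueError("noise sample has zero variance")
    return [(x - mu) / sd for x in events]


def is_signal(events: List[float], noise: List[float], cutoff: float = 3.0) -> List[bool]:
    """Flag events whose z-score exceeds the (assumed) cutoff."""
    return [z > cutoff for z in z_scores(events, noise)]
```

For example, `is_signal([3.0, 10.0], [1.0, 2.0, 3.0, 4.0, 5.0])` flags only the second event, because only it lies more than three noise standard deviations above the noise mean.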


Subject(s)
Flow Cytometry/methods , Fluorescent Dyes/analysis , Staining and Labeling/methods , Models, Statistical , Signal-To-Noise Ratio , Software
11.
J Microbiol Methods ; 91(3): 477-82, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23041268

ABSTRACT

The traditional genetic tools used in Salmonella enterica serovar Typhimurium rely heavily on a high-transducing mutant of bacteriophage P22. P22 recognizes its hosts by the structure of their O-antigens, which vary among serovars of Salmonella; therefore, it cannot be used in most non-Typhimurium Salmonella, including the majority of those causing food-borne illnesses in both humans and livestock. Bacteriophage P1 infects a variety of enteric bacteria, including galE mutants of serovar Typhimurium; however, the degree to which the presence of coimmune prophages, the lack of required attachment sites or the lack of host factors act as barriers to using phage P1 in natural isolates of Salmonella is unknown. Here, we show that recombineering can be used to make virtually any serovar of Salmonella susceptible to P1 infection; as a result, P1 can be utilized for facile genetic manipulation of non-Typhimurium Salmonella, including movement of very large pathogenicity islands. A toolkit for easy manipulation of non-Typhimurium serovars of Salmonella is described.


Subject(s)
Bacteriophages/genetics , Genomic Islands , Salmonella enterica/genetics , Transduction, Genetic/methods , Bacteriophages/physiology , Humans , Salmonella Infections/microbiology , Salmonella enterica/pathogenicity , Salmonella enterica/virology , Salmonella typhimurium/genetics , Salmonella typhimurium/pathogenicity
12.
Mol Biol Evol ; 29(12): 3669-83, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22740635

ABSTRACT

In bacteria, physiological change may be effected by a single gene acquisition, producing ecological differentiation without genetic isolation. Natural selection acting on such differences can reduce the frequency of genotypes that arise from recombination at these loci. However, gene acquisition can only account for recombination interference in the fraction of the genome that is tightly linked to the integration site. To identify additional loci that contribute to adaptive differences, we examined orthologous genes in species of Enterobacteriaceae to identify significant differences in the degree of codon selection. Significance was assessed using the Adaptive Codon Enrichment metric, which accounts for the variation in codon usage bias that is expected to arise from mutation and drift; large differences in codon usage bias were identified in more genes than would be expected to arise from stochastic processes alone. Genes in the same operon showed parallel differences in codon usage bias, suggesting that changes in the overall levels of gene expression led to changes in the degree of adaptive codon usage. Most significant differences between orthologous operons were found among those involved with specific environmental adaptations, whereas "housekeeping" genes rarely showed significant changes. When considered together, the loci experiencing significant changes in codon selection outnumber potentially adaptive gene acquisition events. The identity of genes under strong codon selection seems to be influenced by the habitat from which the bacteria were isolated. We propose a two-stage model for how adaptation to different selective regimes can drive bacterial speciation. Initially, gene acquisitions catalyze rapid ecological differentiation, which modifies the utilization of genes, thereby changing the strength of codon selection on them. Alleles develop fitness variation by substitution, producing recombination interference at these loci in addition to those flanking acquired genes, allowing sequences to diverge across the entire genome and establishing genetic isolation (i.e., protection from frequent homologous recombination).


Subject(s)
Adaptation, Biological/genetics , Bacteria/genetics , Codon/genetics , Environment , Gene Transfer, Horizontal/genetics , Phylogeny , Selection, Genetic/genetics , Analysis of Variance , Likelihood Functions , Models, Genetic , Operon/genetics , Principal Component Analysis , Species Specificity
13.
Methods Mol Biol ; 855: 281-308, 2012.
Article in English | MEDLINE | ID: mdl-22407713

ABSTRACT

Methods for identifying alien genes in genomes fall into two general classes. Phylogenetic methods examine the distribution of a gene's homologues among genomes to find those with relationships not consistent with vertical inheritance. These approaches include identifying orphan genes which lack homologues in closely related genomes and genes with unduly high levels of similarity to genes in otherwise unrelated genomes. Rigorous statistical tests are available to place confidence intervals for predicted alien genes. Parametric methods examine the compositional properties of genes within a genome to find those with atypical properties, likely indicating the directional mutational pressures of a donor genome. These methods may compare the properties of genes to genomic averages, properties of genes to each other, or properties of large, multigene regions of the chromosome. Here, we discuss the strengths and weaknesses of each approach.


Subject(s)
Gene Transfer, Horizontal/genetics , Genomics/methods , Animals , Genome/genetics , Humans , Phylogeny
14.
BMC Genomics ; 12: 374, 2011 Jul 25.
Article in English | MEDLINE | ID: mdl-21787402

ABSTRACT

BACKGROUND: Statistics measuring codon selection seek to compare genes by their sensitivity to selection for translational efficiency, but existing statistics lack a model for testing the significance of differences between genes. Here, we introduce a new statistic for measuring codon selection, the Adaptive Codon Enrichment (ACE). RESULTS: This statistic represents codon usage bias in terms of a probabilistic distribution, quantifying the extent that preferred codons are over-represented in the gene of interest relative to the mean and variance that would result from stochastic sampling of codons. Expected codon frequencies are derived from the observed codon usage frequencies of a broad set of genes, such that they are likely to reflect nonselective, genome-wide influences on codon usage (e.g. mutational biases). The relative adaptiveness of synonymous codons is deduced from the frequency of codon usage in a pre-selected set of genes relative to the expected frequency. The ACE can predict both transcript abundance during rapid growth and the rate of synonymous substitutions, with accuracy comparable to or greater than existing metrics. We further examine how the composition of reference gene sets affects the accuracy of the statistic, and suggest methods for selecting appropriate reference sets for any genome, including bacteriophages. Finally, we demonstrate that the ACE may naturally be extended to quantify the genome-wide influence of codon selection in a manner that is sensitive to a large fraction of codons in the genome. This reveals substantial variation among genomes, correlated with the tRNA gene number, even among groups of bacteria where previously proposed whole-genome measures show little variation. CONCLUSIONS: The statistical framework of the ACE allows rigorous comparison of the level of codon selection acting on genes, both within a genome and between genomes.
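
The general idea in the abstract above, scoring over-representation of preferred codons against the mean and variance expected from random codon sampling, can be sketched as a simple standardized score. This is not the published ACE formula, which the abstract does not give: the one-preferred-codon-per-amino-acid simplification, the independent Bernoulli model per codon position, and all names below are assumptions for illustration only.

```python
import math
from typing import Dict, List, Tuple


def preferred_codon_z(
    gene: List[Tuple[str, str]],
    preferred: Dict[str, str],
    expected: Dict[str, float],
) -> float:
    """Standardized excess of preferred codons in one gene (illustrative only).

    gene:      (amino_acid, codon) pairs in reading order.
    preferred: hypothetical map from amino acid to its single preferred codon.
    expected:  hypothetical background probability that the preferred codon is
               used for that amino acid when codons are sampled at random.
    """
    observed = 0
    mean = 0.0
    variance = 0.0
    for amino_acid, codon in gene:
        if amino_acid not in preferred:
            continue  # amino acids without a defined preference are skipped
        p = expected[amino_acid]
        observed += codon == preferred[amino_acid]
        mean += p                 # expected count under random sampling
        variance += p * (1 - p)   # Bernoulli variance for this position
    if variance == 0:
        raise ValueError("no informative codon positions in this gene")
    return (observed - mean) / math.sqrt(variance)
```

With four lysine positions, a background probability of 0.5 for the preferred codon, and three preferred codons observed, the expected count is 2 with variance 1, so the score is 1.0.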


Subject(s)
Bacteria/genetics , Codon/genetics , Genomics , Algorithms , Amino Acid Substitution , Gene Expression Profiling , Genes, Bacterial/genetics , Genome, Bacterial/genetics , Models, Genetic , Open Reading Frames/genetics , Probability , Stochastic Processes
15.
J Bacteriol ; 193(12): 2941-7, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21515774

ABSTRACT

Haemophilus ducreyi, the etiologic agent of chancroid, expresses variants of several key virulence factors. While previous reports suggested that H. ducreyi strains formed two clonal populations, the differences between, and diversity within, these populations were unclear. To assess their variability, we examined sequence diversity at 11 H. ducreyi loci, including virulence and housekeeping genes, augmenting published data sets with PCR-amplified genes to acquire data for at least 10 strains at each locus. While sequences from all 11 loci place strains into two distinct groups, there was very little variation within each group. The difference between alleles of the two groups was variable and large at 3 loci encoding surface-exposed proteins (0.4 < K(S) < 1.3, where K(S) is divergence at synonymous sites) but consistently small at genes encoding cytoplasmic or periplasmic proteins (K(S) < 0.09). The data suggest that the two classes have recently diverged, that recombination has introduced variant alleles into at least 3 distinct loci, and that these alleles have been confined to one of the two classes. In addition, recombination is evident among alleles within, but not between, classes. Rather than clones of the same species, these properties indicate that the two classes may form distinct species.


Subject(s)
Genetic Variation , Haemophilus ducreyi/classification , Haemophilus ducreyi/genetics , Antigens, Bacterial/genetics , Antigens, Bacterial/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Base Sequence , Gene Expression Regulation, Bacterial , Haemophilus Infections/prevention & control , Haemophilus Vaccines/immunology , Haemophilus ducreyi/immunology , Humans , Molecular Sequence Data , Phylogeny , Time Factors
16.
Nucleic Acids Res ; 39(9): e56, 2011 May.
Article in English | MEDLINE | ID: mdl-21297116

ABSTRACT

Because the properties of horizontally-transferred genes will reflect the mutational proclivities of their donor genomes, they often show atypical compositional properties relative to native genes. Parametric methods use these discrepancies to identify bacterial genes recently acquired by horizontal transfer. However, compositional patterns of native genes vary stochastically, leaving no clear boundary between typical and atypical genes. As a result, while strongly atypical genes are readily identified as alien, genes of ambiguous character are poorly classified when a single threshold separates typical and atypical genes. This limitation affects all parametric methods that examine genes independently, and escaping it requires the use of additional genomic information. We propose that the performance of all parametric methods can be improved by using a multiple-threshold approach. First, strongly atypical alien genes and strongly typical native genes would be identified using conservative thresholds. Genes with ambiguous compositional features would then be classified by examining gene context, including the class (native or alien) of flanking genes. By including additional genomic information in a multiple-threshold framework, we observed a remarkable improvement in the performance of several popular, but algorithmically distinct, methods for alien gene detection.
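
The two-pass scheme outlined in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-gene atypicality score, the two threshold values, and the rule for resolving ambiguous genes from their nearest confidently labelled neighbours are all assumptions.

```python
from typing import List, Optional


def classify_genes(scores: List[float], low: float, high: float) -> List[str]:
    """Label genes (in chromosomal order) as 'native' or 'alien'.

    scores: hypothetical atypicality score per gene (higher = more atypical).
    low, high: hypothetical conservative thresholds, with low < high.
    """
    # Pass 1: make only the confident calls; everything in between is ambiguous.
    first_pass = [
        "alien" if s >= high else "native" if s <= low else "ambiguous"
        for s in scores
    ]

    def nearest_confident(start: int, step: int) -> Optional[str]:
        """Walk left (step=-1) or right (step=+1) to the next confident label."""
        j = start + step
        while 0 <= j < len(first_pass):
            if first_pass[j] != "ambiguous":
                return first_pass[j]
            j += step
        return None

    # Pass 2: resolve ambiguous genes from gene context.
    resolved = list(first_pass)
    for i, label in enumerate(first_pass):
        if label != "ambiguous":
            continue
        neighbours = [
            x for x in (nearest_confident(i, -1), nearest_confident(i, +1))
            if x is not None
        ]
        # Assumed rule: call a gene alien only if every confident neighbour is alien.
        if neighbours and all(x == "alien" for x in neighbours):
            resolved[i] = "alien"
        else:
            resolved[i] = "native"
    return resolved
```

Under this assumed rule, an ambiguous gene sandwiched between two confidently alien genes is labelled alien, while one bordering a confidently native gene stays native.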


Subject(s)
Genes, Bacterial , Genomics/methods , Algorithms , Cluster Analysis , Gene Transfer, Horizontal , Genes, Archaeal , Genome, Bacterial
17.
Proc Natl Acad Sci U S A ; 107(25): 11453-8, 2010 Jun 22.
Article in English | MEDLINE | ID: mdl-20534528

ABSTRACT

Evolutionary relationships among species are often assumed to be fundamentally unambiguous, where genes within a genome are thought to evolve in concert and phylogenetic incongruence between individual orthologs is attributed to idiosyncrasies in their evolution. We have identified substantial incongruence between the phylogenies of orthologous genes in Escherichia, Salmonella, and Citrobacter, or E. coli, E. fergusonii, and E. albertii. The source of incongruence was inferred to be recombination, because individual genes support conflicting topology more robustly than expected from stochastic sequence homoplasies. Clustering of phylogenetically informative sites on the genome indicated that the regions of recombination extended over several kilobases. Analysis of phylogenetically distant taxa resulted in consensus among individual gene phylogenies, suggesting that recombination is not ongoing; instead, conflicting relationships among genes in descendent taxa reflect recombination among their ancestors. Incongruence could have resulted from random assortment of ancestral polymorphisms if species were instantly created from the division of a recombining population. However, the estimated branch lengths in alternative phylogenies would require ancestral populations with far more diversity than is found in extant populations. Rather, these and previous data collectively suggest that genome-wide recombination rates decreased gradually, with variation in rate among loci, leading to pluralistic relationships among their descendent taxa.


Subject(s)
Bacteria/genetics , Cell Lineage , Codon , DNA, Bacterial/genetics , Enterobacteriaceae/genetics , Escherichia coli/genetics , Evolution, Molecular , Genetic Variation , Genome , Models, Genetic , Multigene Family , Phylogeny , Polymorphism, Genetic , Recombination, Genetic , Salmonella enterica/genetics
18.
J Mol Biol ; 397(1): 119-43, 2010 Mar 19.
Article in English | MEDLINE | ID: mdl-20064525

ABSTRACT

Mycobacteriophages are viruses that infect mycobacterial hosts. Expansion of a collection of sequenced phage genomes to a total of 60, all infecting a common bacterial host, provides further insight into their diversity and evolution. Of the 60 phage genomes, 55 can be grouped into nine clusters according to their nucleotide sequence similarities, 5 of which can be further divided into subclusters; 5 genomes do not cluster with other phages. The sequence diversity between genomes within a cluster varies greatly; for example, the 6 genomes in Cluster D share more than 97.5% average nucleotide similarity with one another. In contrast, similarity between the 2 genomes in Cluster I is barely detectable by diagonal plot analysis. In total, 6858 predicted open-reading frames have been grouped into 1523 phamilies (phams) of related sequences, 46% of which possess only a single member. Only 18.8% of the phams have sequence similarity to non-mycobacteriophage database entries, and fewer than 10% of all phams can be assigned functions based on database searching or synteny. Genome clustering facilitates the identification of genes that are in greatest genetic flux and are more likely to have been exchanged horizontally in relatively recent evolutionary time. Although mycobacteriophage genes exhibit a smaller average size than genes of their host (205 residues compared with 315), phage genes in higher flux average only 100 amino acids, suggesting that the primary units of genetic exchange correspond to single protein domains.


Subject(s)
Genes, Viral/genetics , Mycobacteriophages/genetics , Base Sequence , Cluster Analysis , Genetic Variation , Molecular Sequence Data , Multigene Family/genetics , Mycobacteriophages/isolation & purification , Nucleotides/genetics , Open Reading Frames/genetics , Phylogeny , Sequence Alignment , Sequence Analysis, DNA , Virion/genetics
19.
Curr Biol ; 19(20): R943-5, 2009 Nov 03.
Article in English | MEDLINE | ID: mdl-19889369

ABSTRACT

How do bacterial cells mediate effective cooperation? A new paper suggests two routes: converting the uninitiated to their cause by lateral gene transfer, and enforcing cooperative behavior by killing revertants.


Subject(s)
Biological Evolution , Escherichia coli/physiology , Microbial Interactions , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/metabolism , Escherichia coli Proteins/physiology , Gene Transfer, Horizontal
20.
Nucleic Acids Res ; 37(16): 5255-66, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19589805

ABSTRACT

While the recognition of genomic islands can be a powerful mechanism for identifying genes that distinguish related bacteria, few methods have been developed to identify them specifically. Rather, identification of islands often begins with cataloging individual genes likely to have been recently introduced into the genome; regions with many putative alien genes are then examined for other features suggestive of recent acquisition of a large genomic region. When few phylogenetic relatives are available, the identification of alien genes relies on their atypical features relative to the bulk of the genes in the genome. The weakness of these 'bottom-up' approaches lies in the difficulty in identifying robustly those genes which are atypical, or phylogenetically restricted, due to recent foreign ancestry. Herein, we apply an alternative 'top-down' approach where bacterial genomes are recursively divided into progressively smaller regions, each with uniform composition. In this way, large chromosomal regions with atypical features are identified with high confidence due to the simultaneous analysis of multiple genes. This approach is based on a generalized divergence measure to quantify the compositional difference between segments in a hypothesis-testing framework. We tested the proposed genome island prediction algorithm on both artificial chimeric genomes and genuine bacterial genomes.
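
The 'top-down' idea in this abstract, recursively dividing a sequence into compositionally uniform regions, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a plain order-0 Jensen-Shannon divergence on single-nucleotide composition in place of the generalized divergence measure, and a fixed cutoff in place of the hypothesis test. The names `best_split`, `segment`, `min_len`, and `threshold` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (bits) of a composition given as symbol counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def best_split(seq, min_len):
    """Return (position, divergence) of the split that best separates
    the left and right compositions, or (None, 0.0) if none is positive."""
    n = len(seq)
    total = Counter(seq)
    h_all = entropy(total, n)
    left = Counter(seq[:min_len])  # composition of seq[:i], updated incrementally
    best_pos, best_div = None, 0.0
    for i in range(min_len, n - min_len + 1):
        right = total - left  # composition of seq[i:]
        # Length-weighted Jensen-Shannon divergence between the two halves.
        div = (h_all
               - (i / n) * entropy(left, i)
               - ((n - i) / n) * entropy(right, n - i))
        if div > best_div:
            best_pos, best_div = i, div
        left[seq[i]] += 1
    return best_pos, best_div


def segment(seq, start=0, min_len=50, threshold=0.05):
    """Recursively split seq; return a list of (start, end) segments.

    ASSUMPTION: a fixed divergence threshold stands in for the
    significance test described in the abstract.
    """
    end = start + len(seq)
    if len(seq) < 2 * min_len:
        return [(start, end)]
    pos, div = best_split(seq, min_len)
    if pos is None or div < threshold:
        return [(start, end)]
    return (segment(seq[:pos], start, min_len, threshold)
            + segment(seq[pos:], start + pos, min_len, threshold))


# Artificial chimera: an AT-only region joined to a GC-only region.
chimera = "AT" * 200 + "GC" * 200
print(segment(chimera))  # [(0, 400), (400, 800)]
```

On this toy chimera the single boundary at position 400 is recovered and neither half is split further, because the remaining splits fall below the cutoff. A real analysis would need a statistically justified stopping rule and a richer composition model than single-nucleotide frequencies.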


Subject(s)
Algorithms , Genome, Bacterial , Genomic Islands , Genomics/methods , Genes, Bacterial , Genetic Heterogeneity , Salmonella typhi/genetics