ABSTRACT
There is extensive genomic diversity among Streptococcus pneumoniae isolates. Approximately half of the comprehensive set of genes in the species (the supragenome or pangenome) is present in all isolates (the core set), and the remainder is unevenly distributed among strains (the distributed set). The Streptococcus pneumoniae Supragenome Hybridization (SpSGH) array provides coverage for an extensive set of genes and polymorphisms encountered within this species, capturing this genomic diversity. Further, the capture is quantitative. In this manner, the SpSGH array allows for both genomic and transcriptomic analyses of diverse S. pneumoniae isolates on a single platform. In this unit, we present the SpSGH array and describe in detail its design and implementation for both genomic and transcriptomic analyses. The methodology can be applied to the construction and modification of SpSGH array platforms, as well as to other bacterial species, as long as multiple whole-genome sequences are available that collectively capture the vast majority of the species supragenome.
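The core/distributed split described above can be illustrated with a minimal sketch. It assumes gene content is already available as a mapping from strain name to a set of gene-cluster identifiers; the function name `partition_supragenome` and the toy strain and gene names are hypothetical and are not part of the SpSGH protocol.

```python
from typing import Dict, Set, Tuple


def partition_supragenome(
    gene_content: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str]]:
    """Split the supragenome into core and distributed gene sets.

    gene_content maps a strain name to the set of gene clusters it carries
    (hypothetical input format, assumed for illustration only).
    """
    strains = list(gene_content.values())
    if not strains:
        return set(), set()
    # Supragenome: every gene cluster seen in at least one strain.
    supragenome = set().union(*strains)
    # Core set: gene clusters present in every strain.
    core = set.intersection(*strains)
    # Distributed set: everything in the supragenome that is not core.
    distributed = supragenome - core
    return core, distributed


# Toy data, invented for illustration.
toy = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g2", "g3", "g5"},
}
core, distributed = partition_supragenome(toy)
```

With real data, the input sets would come from gene clustering of whole-genome sequences or from thresholded array signals, neither of which is modeled here.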
Subjects
Genetic Variation , Microbial Genetics/methods , Molecular Biology/methods , Nucleic Acid Hybridization/methods , Streptococcus pneumoniae/classification , Streptococcus pneumoniae/genetics , Gene Expression Profiling/methods , Genomics/methods
ABSTRACT
We previously carried out the design and testing of a custom-built Haemophilus influenzae supragenome hybridization (SGH) array that contains probe sequences to 2,890 gene clusters identified by whole genome sequencing of 24 strains of H. influenzae. The array was originally designed as a tool to interrogate the gene content of large numbers of clinical isolates without the need for sequencing; however, the data obtained are quantitative and thus suitable for transcriptomic analyses. In the current study, RNA was extracted from H. influenzae strain CZ4126/02 (which was not included in the design of the array), converted to cDNA, labelled, and hybridized to the SGH arrays to assess the quality and reproducibility of data obtained from these custom-designed chips as a tool for transcriptomics. Three types of experimental replicates were analyzed, all showing very high degrees of correlation, thus validating both the array and the methods used for RNA profiling. A custom filtering pipeline for two-condition unpaired data using five metrics was developed to minimize variability within replicates and to maximize the identification of the most significant true transcriptional differences between two samples. These methods can be extended to transcriptional analysis of other bacterial species utilizing supragenome-based arrays.
Subjects
Bacterial Genome , Haemophilus influenzae/genetics , Nucleic Acid Hybridization , Transcriptome , Gene Expression Profiling , Genomics/methods , Humans , Reproducibility of Results
ABSTRACT
BACKGROUND: Haemophilus influenzae colonizes the human nasopharynx as a commensal, and is etiologically associated with numerous opportunistic infections of the airway; it is also less commonly associated with invasive disease. Clinical isolates of H. influenzae display extensive genomic diversity and plasticity. The development of strategies to successfully prevent, diagnose and treat H. influenzae infections depends on tools to ascertain the gene content of individual isolates. RESULTS: We describe and validate a Haemophilus influenzae supragenome hybridization (SGH) array that can be used to characterize the full genic complement of any strain within the species, as well as strains from several highly related species. The array contains 31,307 probes that collectively cover essentially all alleles of the 2,890 gene clusters identified from the whole genome sequencing of 24 clinical H. influenzae strains. The finite supragenome model predicts that these data include greater than 85% of all non-rare genes (where rare genes are defined as those present in less than 10% of sequenced strains). The veracity of the array was tested by comparing the whole genome sequences of eight strains with their hybridization data obtained using the supragenome array. The array predictions were correct and reproducible for ~98% of the gene content of all of the sequenced strains. This technology was then applied to an investigation of the gene content of 193 geographically and clinically diverse H. influenzae clinical strains. These strains came from multiple locations on five different continents and Papua New Guinea and include isolates from: the middle ears of persons with otitis media and otorrhea; lung aspirates and sputum samples from pneumonia and COPD patients; blood specimens from patients with sepsis; cerebrospinal fluid from patients with meningitis; and pharyngeal specimens from healthy persons.
CONCLUSIONS: These analyses provided the most comprehensive and detailed genomic/phylogenetic look at this species to date, and identified a subset of highly divergent strains that form a separate lineage within the species. This array provides a cost-effective and high-throughput tool to determine the gene content of any H. influenzae isolate or lineage. Furthermore, the method for probe selection can be applied to any species, given a group of available whole genome sequences.
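The validation step described above (comparing array-based calls with gene content derived from whole genome sequences) can be sketched as a simple concordance calculation. This is an illustrative sketch only: the function name `call_concordance`, the dictionary input format, and the toy values are assumptions, and the study's actual scoring procedure is not specified in the abstract.

```python
from typing import Dict


def call_concordance(
    array_calls: Dict[str, bool], sequence_calls: Dict[str, bool]
) -> float:
    """Fraction of gene clusters where the array call matches the sequence.

    Both inputs map a gene-cluster identifier to True (present) or
    False (absent); only clusters found in both inputs are compared.
    """
    shared = array_calls.keys() & sequence_calls.keys()
    if not shared:
        raise ValueError("no gene clusters in common")
    agree = sum(array_calls[c] == sequence_calls[c] for c in shared)
    return agree / len(shared)


# Toy calls for one strain, invented for illustration.
array_calls = {"c1": True, "c2": False, "c3": True, "c4": True}
sequence_calls = {"c1": True, "c2": False, "c3": False, "c4": True}
concordance = call_concordance(array_calls, sequence_calls)
```

In this toy case the calls agree for three of the four clusters, so `concordance` is 0.75.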
Subjects
Genomics/methods , Haemophilus influenzae/genetics , Nucleic Acid Hybridization/methods , Oligonucleotide Array Sequence Analysis/methods , Bacterial Genes/genetics , Haemophilus influenzae/pathogenicity , Molecular Sequence Annotation , Sequence Analysis
ABSTRACT
Gardnerella vaginalis is associated with a spectrum of clinical conditions, suggesting high degrees of genetic heterogeneity among strains. Seventeen G. vaginalis isolates were subjected to a battery of comparative genomic analyses to determine their level of relatedness. For each measure, the degree of difference among the G. vaginalis strains was the highest observed among 23 pathogenic bacterial species for which at least eight genomes are available. Genome sizes ranged from 1.491 to 1.716 Mb; GC contents ranged from 41.18% to 43.40%; and the core genome, consisting of only 746 genes, makes up only 51.6% of each strain's genome on average and accounts for only 27% of the species supragenome. Neighbor-grouping analyses, using both distributed gene possession data and core gene allelic data, each identified two major sets of strains, each of which is composed of two groups. Each of the four groups has its own characteristic genome size, GC ratio, and greatly expanded core gene content, making the genomic diversity of each group within the range for other bacterial species. To test whether these four groups corresponded to genetically isolated clades, we inferred the phylogeny of each distributed gene that was present in at least two strains and absent in at least two strains; this analysis identified frequent homologous recombination within groups but not between groups or sets. G. vaginalis appears to include four nonrecombining groups/clades of organisms with distinct gene pools and genomic properties, which may confer distinct ecological properties. Consequently, it may be appropriate to treat these four groups as separate species.
Subjects
Bacterial Infections/microbiology , Bacterial DNA/genetics , Gardnerella vaginalis/classification , Gardnerella vaginalis/genetics , Bacterial Genome , Genetic Polymorphism , Base Composition , Cluster Analysis , Bacterial DNA/chemistry , Gardnerella vaginalis/isolation & purification , Bacterial Genes , Genotype , Humans , Molecular Sequence Data , Phylogeny , DNA Sequence Analysis
ABSTRACT
We report on the comparative genomics and characterization of the virulence phenotypes of four S. pneumoniae strains that belong to the multidrug-resistant clone PMEN1 (Spain(23F) ST81). Strains SV35-T23 and SV36-T3 were recovered in 1996 from the nasopharynx of patients at an AIDS hospice in New York. Strain SV36-T3 expressed capsule type 3, which is unusual for this clone and represents the product of an in vivo capsular switch event. A third PMEN1 isolate - PN4595-T23 - was recovered in 1996 from the nasopharynx of a child attending day care in Portugal, and a fourth strain - ATCC700669 - was originally isolated from a patient with pneumococcal disease in Spain in 1984. We compared the genomes of the four PMEN1 strains and 47 previously sequenced pneumococcal isolates for gene possession differences and allelic variations within core genes. In contrast to the 47 strains, which represent a variety of clonal types, the four PMEN1 strains grouped closely together, demonstrating high genomic conservation within this lineage relative to the rest of the species. In the four PMEN1 strains, allelic and gene possession differences were clustered into 18 genomic regions, including the capsule, the blp bacteriocins, erythromycin resistance, the MM1-2008 prophage and multiple cell wall anchored proteins. In spite of their genomic similarity, the high-resolution chinchilla model was able to detect variations in the virulence properties of the PMEN1 strains, highlighting how small genic or allelic variation can lead to significant changes in pathogenicity and making this set of strains ideal for the identification of novel virulence determinants.