ABSTRACT
Recent technological and computational advances have made metagenomic assembly a viable approach to achieving high-resolution views of complex microbial communities. In previous benchmarking, short-read (SR) metagenomic assemblers had the highest accuracy, long-read (LR) assemblers generated the most contiguous sequences and hybrid (HY) assemblers balanced length and accuracy. However, no assessments have specifically compared the performance of these assemblers on low-abundance species, which include clinically relevant organisms in the gut. We generated semi-synthetic LR and SR datasets by spiking small and increasing amounts of Escherichia coli isolate reads into fecal metagenomes and, using different assemblers, examined E. coli contigs and the presence of antibiotic resistance genes (ARGs). For ARG assembly, although SR assemblers recovered more ARGs with high accuracy, even at low coverages, LR assemblies allowed for the placement of ARGs within longer, E. coli-specific contigs, thus pinpointing their taxonomic origin. HY assemblies identified resistance genes with high accuracy and had lower contiguity than LR assemblies. Each assembler type's strengths were maintained even when our isolate was spiked in with a competing strain, which fragmented and reduced the accuracy of all assemblies. For strain characterization and determining gene context, LR assembly is optimal, while for base-accurate gene identification, SR assemblers outperform other options. HY assembly offers contiguity and base accuracy, but requires generating data on multiple platforms, and may suffer high misassembly rates when strain diversity exists. Our results highlight the trade-offs associated with each approach for recovering low-abundance taxa, and that the optimal approach is goal-dependent.
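The spiking design described above rests on simple coverage arithmetic: expected depth is roughly the number of reads times read length divided by genome size. The sketch below is only an illustration of that arithmetic. The 5 Mb genome size, the 150 bp read length, and the function names are assumptions made for the example, not values or tools reported in the study.

```python
import math


def reads_for_coverage(target_coverage, genome_size_bp, read_length_bp):
    """Return the number of reads needed to reach an expected depth of coverage."""
    if genome_size_bp <= 0 or read_length_bp <= 0:
        raise ValueError("genome size and read length must be positive")
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


def expected_coverage(n_reads, genome_size_bp, read_length_bp):
    """Return the expected depth of coverage for a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp


if __name__ == "__main__":
    GENOME_SIZE = 5_000_000  # assumed E. coli genome size (illustrative)
    READ_LENGTH = 150        # assumed short-read length (illustrative)
    for coverage in (0.5, 1, 5, 10):
        n = reads_for_coverage(coverage, GENOME_SIZE, READ_LENGTH)
        print(f"{coverage}x -> {n} reads to spike in")
```

For long reads, the same calculation applies with the mean read length substituted, although read-length variability makes the realized depth less predictable.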
Subjects
Metagenome; Microbiota; Sequence Analysis, DNA/methods; Escherichia coli/genetics; Microbiota/genetics; Metagenomics/methods; High-Throughput Nucleotide Sequencing/methods
ABSTRACT
BACKGROUND: Urinary tract infections (UTIs) affect 15 million women each year in the United States, with >20% experiencing frequent recurrent UTIs. A recent placebo-controlled clinical trial found a 39% reduction in UTI symptoms among recurrent UTI sufferers who consumed a daily cranberry beverage for 24 weeks. Using metagenomic sequencing of stool from a subset of these trial participants, we assessed the impact of cranberry consumption on the gut microbiota, a reservoir for UTI-causing pathogens such as Escherichia coli, which causes >80% of UTIs. RESULTS: The overall taxonomic composition, community diversity, carriage of functional pathways and gene families, and relative abundances of the vast majority of observed bacterial taxa, including E. coli, were not changed significantly by cranberry consumption. However, one unnamed Flavonifractor species (OTU41), which represented ≤1% of the overall metagenome, was significantly less abundant in cranberry consumers compared to placebo at trial completion. Given Flavonifractor's association with negative human health effects, we sought to determine OTU41 characteristic genes that may explain its differential abundance and/or relationship to key host functions. Using comparative genomic and metagenomic techniques, we identified genes in OTU41 related to transport and metabolism of various compounds, including tryptophan and cobalamin, which have been shown to play roles in host-microbe interactions. CONCLUSION: While our results indicated that cranberry juice consumption had little impact on global measures of the microbiome, we found one unnamed Flavonifractor species differed significantly between study arms. This suggests further studies are needed to assess the role of cranberry consumption and Flavonifractor in health and wellbeing in the context of recurrent UTI. TRIAL REGISTRATION: Clinical trial registration number: ClinicalTrials.gov NCT01776021.
Subjects
Bacteria/drug effects; Gastrointestinal Microbiome/drug effects; Gastrointestinal Microbiome/genetics; Plant Extracts/administration & dosage; Vaccinium macrocarpon/chemistry; Adult; Bacteria/classification; Bacteria/genetics; Beverages; Double-Blind Method; Feces/microbiology; Female; Gastrointestinal Microbiome/physiology; Humans; Metagenome; Metagenomics/methods; Middle Aged; Reinfection/microbiology; Reinfection/prevention & control; Urinary Tract Infections/microbiology; Urinary Tract Infections/prevention & control
ABSTRACT
BACKGROUND: Mixed infections of Mycobacterium tuberculosis and antibiotic heteroresistance continue to complicate tuberculosis (TB) diagnosis and treatment. Detection of mixed infections has been limited to molecular genotyping techniques, which lack the sensitivity and resolution to accurately estimate the multiplicity of TB infections. In contrast, whole genome sequencing offers sensitive views of the genetic differences between strains of M. tuberculosis within a sample. Although metagenomic tools exist to classify strains in a metagenomic sample, most tools have been developed for more divergent species, and therefore cannot provide the sensitivity required to disentangle strains within closely related bacterial species such as M. tuberculosis. Here we present QuantTB, a method to identify and quantify individual M. tuberculosis strains in whole genome sequencing data. QuantTB uses SNP markers to determine the combination of strains that best explain the allelic variation observed in a sample. QuantTB outputs a list of identified strains, their corresponding relative abundances, and a list of drugs for which resistance-conferring mutations (or heteroresistance) have been predicted within the sample. RESULTS: We show that QuantTB has a high degree of resolution and is capable of differentiating communities differing by less than 25 SNPs and identifying strains down to 1× coverage. Using simulated data, we found QuantTB outperformed other metagenomic strain identification tools at detecting strains and quantifying strain multiplicity. In a real-world scenario, using a dataset of 50 paired clinical isolates from a study of patients with either reinfections or relapses, we found that QuantTB could detect mixed infections and reinfections at rates concordant with a manually curated approach. 
CONCLUSION: QuantTB can determine infection multiplicity, identify heteroresistance patterns, enable differentiation between relapse and reinfection, and clarify transmission events across seemingly unrelated patients, even in low-coverage (1×) samples. QuantTB outperforms existing tools and promises to serve as a valuable resource for both clinicians and researchers working with clinical TB samples.
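The idea of using SNP markers to find the combination of strains that best explains a sample can be illustrated with a toy greedy selection. This is not QuantTB's actual algorithm or interface; the function name, the `min_support` threshold, and the example strains are all hypothetical, and the sketch ignores read depth and relative abundance entirely.

```python
def identify_strains(observed, references, min_support=0.8):
    """Greedily pick reference strains whose marker SNPs are seen in a sample.

    observed:    set of (position, allele) pairs called in the sample
    references:  dict mapping strain name -> set of (position, allele) markers
    min_support: hypothetical cutoff on the fraction of a strain's markers
                 that must be present among the still-unexplained alleles
    """
    remaining = set(observed)
    candidates = dict(references)
    chosen = []
    while candidates:
        best, best_score = None, 0.0
        for name, markers in candidates.items():
            # Fraction of this strain's markers found among unexplained alleles.
            score = len(markers & remaining) / len(markers) if markers else 0.0
            if score > best_score:
                best, best_score = name, score
        if best is None or best_score < min_support:
            break
        chosen.append((best, best_score))
        # Alleles explained by the chosen strain no longer count for the others.
        remaining -= candidates.pop(best)
    return chosen


if __name__ == "__main__":
    references = {
        "strainA": {(100, "A"), (200, "T"), (300, "G")},
        "strainB": {(150, "C"), (250, "G"), (350, "T")},
        "strainC": {(400, "A"), (500, "C")},
    }
    # A simulated mixed sample carrying the markers of strainA and strainB.
    observed = references["strainA"] | references["strainB"]
    print(identify_strains(observed, references))
```

A real method would also have to weigh sequencing errors, markers shared between closely related strains, and allele frequencies, none of which this sketch attempts.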
Subjects
Computational Biology/methods; Genome, Bacterial; Genomics; Mycobacterium tuberculosis/genetics; Tuberculosis/microbiology; Whole Genome Sequencing; Algorithms; Antitubercular Agents/pharmacology; Databases, Genetic; Drug Resistance, Bacterial; Genomics/methods; Mycobacterium tuberculosis/classification; Mycobacterium tuberculosis/drug effects; Phylogeny; Polymorphism, Single Nucleotide; Tuberculosis/drug therapy
ABSTRACT
BACKGROUND: Immunosuppression is associated with a variety of idiopathic clinical syndromes that may have infectious causes. It has been hypothesized that the cord colitis syndrome, a complication of umbilical-cord hematopoietic stem-cell transplantation, is infectious in origin. METHODS: We performed shotgun DNA sequencing on four archived, paraffin-embedded endoscopic colon-biopsy specimens obtained from two patients with cord colitis. Computational subtraction of human and known microbial sequences and assembly of residual sequences into a bacterial draft genome were performed. We used polymerase-chain-reaction (PCR) assays and fluorescence in situ hybridization to determine whether the corresponding bacterium was present in additional patients and controls. RESULTS: DNA sequencing of the biopsy specimens revealed more than 2.5 million sequencing reads that did not match known organisms. These sequences were computationally assembled into a 7.65-Mb draft genome showing a high degree of homology with genomes of bacteria in the bradyrhizobium genus. The corresponding newly discovered bacterium was provisionally named Bradyrhizobium enterica. PCR identified B. enterica nucleotide sequences in biopsy specimens from all three additional patients with cord colitis whose samples were tested, whereas B. enterica sequences were absent in samples obtained from healthy controls and patients with colon cancer or graft-versus-host disease. CONCLUSIONS: We assembled a novel bacterial draft genome from the direct sequencing of tissue specimens from patients with cord colitis. Association of these sequences with cord colitis suggests that B. enterica may be an opportunistic human pathogen. (Funded by the National Cancer Institute and others.)
Subjects
Bradyrhizobium/genetics; Colitis/microbiology; Colon/microbiology; Fetal Blood; Hematopoietic Stem Cell Transplantation/adverse effects; Opportunistic Infections/microbiology; Biopsy; Bradyrhizobium/classification; Bradyrhizobium/isolation & purification; Colitis/immunology; Colonic Neoplasms/microbiology; DNA, Bacterial/analysis; Diarrhea/microbiology; Female; Genome, Bacterial; Graft vs Host Disease/microbiology; Humans; Immunocompromised Host; Male; Paraffin Embedding; Phylogeny; Polymerase Chain Reaction; Sequence Analysis, DNA
ABSTRACT
BACKGROUND: The continued advance of antibiotic resistance threatens the treatment and control of many infectious diseases. This is exemplified by the largest global outbreak of extensively drug-resistant (XDR) tuberculosis (TB) identified in Tugela Ferry, KwaZulu-Natal, South Africa, in 2005 that continues today. It is unclear whether the emergence of XDR-TB in KwaZulu-Natal was due to recent inadequacies in TB control in conjunction with HIV or other factors. Understanding the origins of drug resistance in this fatal outbreak of XDR will inform the control and prevention of drug-resistant TB in other settings. In this study, we used whole genome sequencing and dating analysis to determine if XDR-TB had emerged recently or had ancient antecedents. METHODS AND FINDINGS: We performed whole genome sequencing and drug susceptibility testing on 337 clinical isolates of Mycobacterium tuberculosis collected in KwaZulu-Natal from 2008 to 2013, in addition to three historical isolates, collected from patients in the same province and including an isolate from the 2005 Tugela Ferry XDR outbreak, a multidrug-resistant (MDR) isolate from 1994, and a pansusceptible isolate from 1995. We utilized an array of whole genome comparative techniques to assess the relatedness among strains, to establish the order of acquisition of drug resistance mutations, including the timing of acquisitions leading to XDR-TB in the LAM4 spoligotype, and to calculate the number of independent evolutionary emergences of MDR and XDR. Our sequencing and analysis revealed a 50-member clone of XDR M. tuberculosis that was highly related to the Tugela Ferry XDR outbreak strain. 
We estimated that mutations conferring isoniazid and streptomycin resistance in this clone were acquired 50 y prior to the Tugela Ferry outbreak (katG S315T [isoniazid]; gidB 130 bp deletion [streptomycin]; 1957 [95% highest posterior density (HPD): 1937-1971]), with the subsequent emergence of MDR and XDR occurring 20 y (rpoB L452P [rifampicin]; pncA 1 bp insertion [pyrazinamide]; 1984 [95% HPD: 1974-1992]) and 10 y (rpoB D435G [rifampicin]; rrs 1400 [kanamycin]; gyrA A90V [ofloxacin]; 1995 [95% HPD: 1988-1999]) prior to the outbreak, respectively. We observed frequent de novo evolution of MDR and XDR, with 56 and nine independent evolutionary events, respectively. Isoniazid resistance evolved before rifampicin resistance 46 times, whereas rifampicin resistance evolved prior to isoniazid only twice. We identified additional putative compensatory mutations to rifampicin in this dataset. One major limitation of this study is that the conclusions with respect to ordering and timing of acquisition of mutations may not represent universal patterns of drug resistance emergence in other areas of the globe. CONCLUSIONS: In the first whole genome-based analysis of the emergence of drug resistance among clinical isolates of M. tuberculosis, we show that the ancestral precursor of the LAM4 XDR outbreak strain in Tugela Ferry gained mutations to first-line drugs at the beginning of the antibiotic era. Subsequent accumulation of stepwise resistance mutations, occurring over decades and prior to the explosion of HIV in this region, yielded MDR and XDR, permitting the emergence of compensatory mutations. Our results suggest that drug-resistant strains circulating today reflect not only vulnerabilities of current TB control efforts but also those that date back 50 y. 
In drug-resistant TB, isoniazid resistance was overwhelmingly the initial resistance mutation to be acquired, which would not be detected by current rapid molecular diagnostics employed in South Africa that assess only rifampicin resistance.
Subjects
Antitubercular Agents/pharmacology; Extensively Drug-Resistant Tuberculosis/genetics; Genome, Bacterial; Mycobacterium tuberculosis/genetics; Adult; Disease Outbreaks; Extensively Drug-Resistant Tuberculosis/drug therapy; Extensively Drug-Resistant Tuberculosis/epidemiology; Female; Humans; Male; Microbial Sensitivity Tests; Mutation; Mycobacterium tuberculosis/drug effects; Mycobacterium tuberculosis/isolation & purification; Sequence Analysis, DNA; South Africa/epidemiology
ABSTRACT
Exceptionally accurate genome reference sequences have proven to be of great value to microbial researchers. Thus, to date, about 1800 bacterial genome assemblies have been "finished" at great expense with the aid of manual laboratory and computational processes that typically iterate over a period of months or even years. By applying a new laboratory design and new assembly algorithm to 16 samples, we demonstrate that assemblies exceeding finished quality can be obtained from whole-genome shotgun data and automated computation. Cost and time requirements are thus dramatically reduced.
Subjects
Bacteria/genetics; Genome, Bacterial; Genomic Library; Sequence Analysis, DNA/methods; Algorithms
ABSTRACT
The degree to which molecular epidemiology reveals information about the sources and transmission patterns of an outbreak depends on the resolution of the technology used and the samples studied. Isolates of Escherichia coli O104:H4 from the outbreak centered in Germany in May-July 2011, and the much smaller outbreak in southwest France in June 2011, were indistinguishable by standard tests. We report a molecular epidemiological analysis using multiplatform whole-genome sequencing and analysis of multiple isolates from the German and French outbreaks. Isolates from the German outbreak showed remarkably little diversity, with only two single nucleotide polymorphisms (SNPs) found in isolates from four individuals. Surprisingly, we found much greater diversity (19 SNPs) in isolates from seven individuals infected in the French outbreak. The German isolates form a clade within the more diverse French outbreak strains. Moreover, five isolates derived from a single infected individual from the French outbreak had extremely limited diversity. The striking difference in diversity between the German and French outbreak samples is consistent with several hypotheses, including a bottleneck that purged diversity in the German isolates, variation in mutation rates in the two E. coli outbreak populations, or uneven distribution of diversity in the seed populations that led to each outbreak.
Subjects
Disease Outbreaks/statistics & numerical data; Escherichia coli Infections/epidemiology; Escherichia coli Infections/microbiology; Escherichia coli/genetics; Escherichia coli/isolation & purification; Escherichia coli Infections/genetics; Europe/epidemiology; Humans; Models, Genetic; Phylogeny; Polymorphism, Single Nucleotide/genetics
ABSTRACT
Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.
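The N50 statistic quoted above is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. A minimal sketch of that standard definition follows; the function name and the example lengths are illustrative and are not taken from ALLPATHS-LG.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    scaffold_lengths = [2, 2, 2, 3, 3, 4, 8, 8]  # toy values, total = 32
    print(n50(scaffold_lengths))  # the two longest (8 + 8) reach half of 32
```

Because N50 depends only on the length distribution, it says nothing about correctness: a misassembled scaffold raises N50 just as a correct one does, which is why it is reported alongside accuracy measures.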
Subjects
Algorithms; Genomics/methods; Sequence Analysis, DNA/methods; Software; Animals; Genome/genetics; Humans; Internet; Mice; Reproducibility of Results
ABSTRACT
Human-associated microbial communities comprise not only complex mixtures of bacterial species, but also mixtures of conspecific strains, the implications of which are mostly unknown since strain level dynamics are underexplored due to the difficulties of studying them. We introduce the Strain Genome Explorer (StrainGE) toolkit, which deconvolves strain mixtures and characterizes component strains at the nucleotide level from short-read metagenomic sequencing with higher sensitivity and resolution than other tools. StrainGE is able to identify strains at 0.1x coverage and detect variants for multiple conspecific strains within a sample from coverages as low as 0.5x.
Subjects
Microbiota; Bacteria/genetics; Humans; Metagenome; Metagenomics; Microbiota/genetics
ABSTRACT
BACKGROUND: Carbapenem-resistant Enterobacterales (CRE) are an urgent global health threat. Inferring the dynamics of local CRE dissemination is currently limited by our inability to confidently trace the spread of resistance determinants to unrelated bacterial hosts. Whole-genome sequence comparison is useful for identifying CRE clonal transmission and outbreaks, but high-frequency horizontal gene transfer (HGT) of carbapenem resistance genes and subsequent genome rearrangement complicate tracing the local persistence and mobilization of these genes across organisms. METHODS: To overcome this limitation, we developed a new approach to identify recent HGT of large, near-identical plasmid segments across species boundaries, which also allowed us to overcome technical challenges with genome assembly. We applied this to complete and near-complete genome assemblies to examine the local spread of CRE in a systematic, prospective collection of all CRE, as well as time- and species-matched carbapenem-susceptible Enterobacterales, isolated from patients from four US hospitals over nearly 5 years. RESULTS: Our CRE collection comprised a diverse range of species, lineages, and carbapenem resistance mechanisms, many of which were encoded on a variety of promiscuous plasmid types. We found and quantified rearrangement, persistence, and repeated transfer of plasmid segments, including those harboring carbapenemases, between organisms over multiple years. Some plasmid segments were found to be strongly associated with specific locales, thus representing geographic signatures that make it possible to trace recent and localized HGT events. Functional analysis of these signatures revealed genes commonly found in plasmids of nosocomial pathogens, such as functions required for plasmid retention and spread, as well as survival against a variety of antibiotics and antiseptics common to the hospital environment.
CONCLUSIONS: Collectively, the framework we developed provides a clearer, high-resolution picture of the epidemiology of antibiotic resistance importation, spread, and persistence in patients and healthcare networks.
Subjects
Carbapenems; Gene Transfer, Horizontal; Anti-Bacterial Agents/pharmacology; Carbapenems/pharmacology; Humans; Plasmids/genetics; Prospective Studies
ABSTRACT
Recurrent urinary tract infections (rUTIs) are a major health burden worldwide, with history of infection being a significant risk factor. While the gut is a known reservoir for uropathogenic bacteria, the role of the microbiota in rUTI remains unclear. We conducted a year-long study of women with (n = 15) and without (n = 16) history of rUTI, from whom we collected urine, blood and monthly faecal samples for metagenomic and transcriptomic interrogation. During the study 24 UTIs were reported, with additional samples collected during and after infection. The gut microbiome of individuals with a history of rUTI was significantly depleted in microbial richness and butyrate-producing bacteria compared with controls, reminiscent of other inflammatory conditions. However, Escherichia coli gut and bladder populations were comparable between cohorts in both relative abundance and phylogroup. Transcriptional analysis of peripheral blood mononuclear cells revealed expression profiles indicative of differential systemic immunity between cohorts. Altogether, these results suggest that rUTI susceptibility is in part mediated through the gut-bladder axis, comprising gut dysbiosis and differential immune response to bacterial bladder colonization, manifesting in symptoms.
Subjects
Escherichia coli Infections; Gastrointestinal Microbiome; Urinary Tract Infections; Dysbiosis; Escherichia coli; Escherichia coli Infections/microbiology; Female; Humans; Leukocytes, Mononuclear; Male; Urinary Tract Infections/microbiology
ABSTRACT
A more complete understanding of the genetic basis of drug resistance in Mycobacterium tuberculosis is critical for prompt diagnosis and optimal treatment, particularly for toxic second-line drugs such as D-cycloserine. Here we used the whole-genome sequences from 498 strains of M. tuberculosis to identify new resistance-conferring genotypes. By combining association and correlated evolution tests with strategies for amplifying signal from rare variants, we found that loss-of-function mutations in ald (Rv2780), encoding L-alanine dehydrogenase, were associated with unexplained drug resistance. Convergent evolution of this loss of function was observed exclusively among multidrug-resistant strains. Drug susceptibility testing established that ald loss of function conferred resistance to D-cycloserine, and susceptibility to the drug was partially restored by complementation of ald. Clinical strains with mutations in ald and alr exhibited increased resistance to D-cycloserine when cultured in vitro. Incorporation of D-cycloserine resistance in novel molecular diagnostics could allow for targeted use of this toxic drug among patients with susceptible infections.
Subjects
Antibiotics, Antitubercular/pharmacology; Cycloserine/pharmacology; Mycobacterium tuberculosis/drug effects; Mycobacterium tuberculosis/genetics; Alanine Dehydrogenase/genetics; Alanine Dehydrogenase/metabolism; Alanine Racemase/genetics; Antitubercular Agents; Drug Resistance, Bacterial/genetics; Gene Knockout Techniques; Genome, Bacterial; Microbial Sensitivity Tests; Mutation; Mycobacterium tuberculosis/enzymology
ABSTRACT
Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired-end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3-5 kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants including duplications and resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open-source software.