Results 1 - 18 of 18
1.
Nat Commun ; 12(1): 2879, 2021 05 17.
Article in English | MEDLINE | ID: mdl-34001879

ABSTRACT

As whole-genome sequencing capacity becomes increasingly decentralized, there is a growing opportunity for collaboration and the sharing of surveillance data within and between countries to inform typhoid control policies. This vision requires free, community-driven tools that facilitate access to genomic data for public health on a global scale. Here we present the Pathogenwatch scheme for Salmonella enterica serovar Typhi (S. Typhi), a web application enabling the rapid identification of genomic markers of antimicrobial resistance (AMR) and contextualization with public genomic data. We show that the clustering of S. Typhi genomes in Pathogenwatch is comparable to established bioinformatics methods, and that genomic predictions of AMR are highly concordant with phenotypic susceptibility data. We demonstrate the public health utility of Pathogenwatch with examples selected from >4,300 public genomes available in the application. Pathogenwatch provides an intuitive entry point to monitor the emergence and spread of S. Typhi high-risk clones.


Subject(s)
Anti-Bacterial Agents/pharmacology , Drug Resistance, Multiple, Bacterial/genetics , Salmonella typhi/drug effects , Typhoid Fever/prevention & control , Bacterial Proteins/genetics , Genome, Bacterial/genetics , Genomics/methods , Genotype , Geography , Humans , Malawi , Membrane Transport Proteins/genetics , Microbial Sensitivity Tests/methods , Mutation , Salmonella typhi/genetics , Salmonella typhi/physiology , Tanzania , Typhoid Fever/microbiology
3.
PeerJ ; 6: e5233, 2018.
Article in English | MEDLINE | ID: mdl-30083440

ABSTRACT

Genome sequencing is rapidly being adopted in reference labs and hospitals for bacterial outbreak investigation and diagnostics where time is critical. Seven-gene multi-locus sequence typing is a standard tool for broadly classifying samples into sequence types (STs), in many cases allowing a sample to be ruled out of an outbreak or general characteristics of a bacterial strain to be inferred. Long-read sequencing technologies, such as those from Oxford Nanopore, can produce read data within minutes of an experiment starting, unlike short-read sequencing technologies, which require many hours or days. However, the error rates of raw uncorrected long-read data are very high. We present Krocus, which can predict an ST directly from uncorrected long reads and which was designed to consume read data as it is produced, providing results in minutes. It is the only tool which can do this from uncorrected long reads. We tested Krocus on over 700 isolates sequenced using long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore. It provides STs for isolates on average within 90 s, with a sensitivity of 94% and specificity of 97% on real sample data, directly from uncorrected raw sequence reads. The software is written in Python and is available under the open source license GNU GPL version 3.
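
To make the idea of a sequence type concrete, the following is a minimal sketch of the final lookup step in seven-gene MLST: a profile of seven allele numbers is mapped to an ST. The locus names and the profile table are hypothetical placeholders, not a real typing scheme, and the sketch does not show how Krocus calls alleles from uncorrected long reads.

```python
from typing import Dict, Optional, Tuple

# Hypothetical locus names and profile table, for illustration only.
LOCI: Tuple[str, ...] = (
    "geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "geneG",
)
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (1, 2, 1, 1, 3, 1, 1): 2,
}


def sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for a complete allele profile, or None if the
    combination is not in the table (a novel or unassigned profile)."""
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)


known = dict(zip(LOCI, (1, 2, 1, 1, 3, 1, 1)))
novel = dict(zip(LOCI, (9, 9, 9, 9, 9, 9, 9)))
known_st = sequence_type(known)
novel_st = sequence_type(novel)
```

Two isolates with different STs differ in at least one of the seven alleles, which is the basis for ruling a sample out of an outbreak.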

4.
Microb Genom ; 4(7)2018 07.
Article in English | MEDLINE | ID: mdl-29870330

ABSTRACT

Streptococcus pneumoniae is responsible for 240 000-460 000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have been made to infer serotypes directly from genomic data but current software approaches are limited and do not scale well. Here, we introduce a novel method, SeroBA, which uses a k-mer approach. We compare SeroBA against real and simulated data and present results on the concordance and computational performance against a validation dataset, the robustness and scalability when analysing a large dataset, and the impact of varying the depth of coverage on sequence-based serotyping. SeroBA can predict serotypes, by identifying the cps locus, directly from raw whole genome sequencing read data with 98 % concordance using a k-mer-based method, can process 10 000 samples in just over 1 day using a standard server and can call serotypes at a coverage as low as 15-21×. SeroBA is implemented in Python3 and is freely available under an open source GPLv3 licence from: https://github.com/sanger-pathogens/seroba.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Pneumococcal Infections/microbiology , Serotyping/methods , Software , Streptococcus mitis/genetics , Streptococcus pneumoniae/classification , Whole Genome Sequencing , Alleles , Child, Preschool , Databases, Genetic , Genes, Bacterial , Humans , Polymorphism, Single Nucleotide , Sensitivity and Specificity , Serogroup , Streptococcus pneumoniae/genetics , Streptococcus pneumoniae/isolation & purification
5.
Microb Genom ; 4(3)2018 03.
Article in English | MEDLINE | ID: mdl-29533742

ABSTRACT

Increasingly rich metadata are now being linked to samples that have been whole-genome sequenced. However, much of this information is ignored. This is because linking this metadata to genes, or regions of the genome, usually relies on knowing the gene sequence(s) responsible for the particular trait being measured and looking for its presence or absence in that genome. Examples of this would be the spread of antimicrobial resistance genes carried on mobile genetic elements (MGEs). However, although it is possible to routinely identify the resistance gene, identifying the unknown MGE upon which it is carried can be much more difficult if the starting point is short-read whole-genome sequence data. The reason for this is that MGEs are often full of repeats and so assemble poorly, leading to fragmented consensus sequences. Since mobile DNA, which can carry many clinically and ecologically important genes, has a different evolutionary history from the host, its distribution across the host population will, by definition, be independent of the host phylogeny. It is possible to use this phenomenon in a genome-wide association study to identify both the genes associated with the specific trait and also the DNA linked to that gene, for example the flanking sequence of the plasmid vector on which it is encoded, which follows the same patterns of distribution as the marker gene/sequence itself. We present PlasmidTron, which utilizes the phenotypic data normally available in bacterial population studies, such as antibiograms, virulence factors, or geographical information, to identify traits that are likely to be present on DNA that can randomly reassort across defined bacterial populations. It is also possible to use this methodology to associate unknown genes/sequences (e.g. plasmid backbones) with a specific molecular signature or marker (e.g. resistance gene presence or absence) using PlasmidTron. 
PlasmidTron uses a k-mer-based approach to identify reads associated with a phylogenetically unlinked phenotype. These reads are then assembled de novo to produce contigs quickly and in a way that scales to large datasets. PlasmidTron is written in Python 3 and is available under the open source licence GNU GPL3 from https://github.com/sanger-pathogens/plasmidtron.
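
The k-mer idea in this abstract can be sketched roughly as follows: collect k-mers seen in samples that show the trait, subtract k-mers seen in samples that do not, and keep the reads that are mostly made of the remaining trait-specific k-mers. This is an illustrative simplification under stated assumptions, not PlasmidTron's implementation; the function names, the `min_fraction` cut-off and the toy reads are all hypothetical, and the de novo assembly step is not shown.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def trait_kmers(trait_reads: Iterable[str],
                nontrait_reads: Iterable[str], k: int) -> Set[str]:
    """k-mers present in trait samples and absent from non-trait samples."""
    trait: Set[str] = set()
    for read in trait_reads:
        trait |= kmers(read, k)
    background: Set[str] = set()
    for read in nontrait_reads:
        background |= kmers(read, k)
    return trait - background


def select_reads(reads: Iterable[str], marker: Set[str], k: int,
                 min_fraction: float = 0.5) -> List[str]:
    """Keep reads in which at least min_fraction of the k-mers are
    trait-specific (the cut-off is an assumed, tunable parameter)."""
    selected = []
    for read in reads:
        read_kmers = kmers(read, k)
        if read_kmers and len(read_kmers & marker) / len(read_kmers) >= min_fraction:
            selected.append(read)
    return selected


# Toy data: the second trait read has no k-mers in common with the background.
trait_reads = ["ACGTACGTTT", "GGGGCCCCAA"]
nontrait_reads = ["ACGTACGTAA"]
marker = trait_kmers(trait_reads, nontrait_reads, k=4)
selected = select_reads(trait_reads, marker, k=4)
```

The selected reads would then be passed to an assembler; here only the read carrying sequence absent from the background survives the filter.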


Subject(s)
Genetic Association Studies , DNA Copy Number Variations , Genome, Bacterial , Genotype , High-Throughput Nucleotide Sequencing , Klebsiella pneumoniae/genetics , Klebsiella pneumoniae/isolation & purification , Microbial Sensitivity Tests , Phenotype , Phylogeny , Plasmids/genetics , Plasmids/isolation & purification , Salmonella enterica/genetics , Salmonella enterica/isolation & purification , Sequence Analysis, DNA
6.
Microb Genom ; 3(10): e000131, 2017 10.
Article in English | MEDLINE | ID: mdl-29177089

ABSTRACT

Antimicrobial resistance (AMR) is one of the major threats to human and animal health worldwide, yet few high-throughput tools exist to analyse and predict the resistance of a bacterial isolate from sequencing data. Here we present a new tool, ARIBA, that identifies AMR-associated genes and single nucleotide polymorphisms directly from short reads, and generates detailed and customizable output. The accuracy and advantages of ARIBA over other tools are demonstrated on three datasets from Gram-positive and Gram-negative bacteria, with ARIBA outperforming existing methods.


Subject(s)
Drug Resistance, Microbial/genetics , Enterococcus faecium/genetics , Genomics , Infections/microbiology , Neisseria gonorrhoeae/genetics , Shigella sonnei/genetics , Software , Animals , Humans
7.
Emerg Infect Dis ; 23(11): 1872-1875, 2017 11.
Article in English | MEDLINE | ID: mdl-29048298

ABSTRACT

Klebsiella pneumoniae shows increasing emergence of multidrug-resistant lineages, including strains resistant to all available antimicrobial drugs. We conducted whole-genome sequencing of 178 highly drug-resistant isolates from a tertiary hospital in Lahore, Pakistan. Phylogenetic analyses to place these isolates into global context demonstrate the expansion of multiple independent lineages, including K. quasipneumoniae.


Subject(s)
Drug Resistance, Multiple, Bacterial/genetics , Genome, Bacterial/genetics , Klebsiella Infections/microbiology , Klebsiella pneumoniae/genetics , Adolescent , Anti-Bacterial Agents/pharmacology , Child , Child, Hospitalized , Child, Preschool , Humans , Infant , Infant, Newborn , Klebsiella Infections/epidemiology , Klebsiella pneumoniae/isolation & purification , Pakistan/epidemiology , Phylogeny , Sequence Analysis, DNA
8.
Microb Genom ; 3(8): e000124, 2017 08.
Article in English | MEDLINE | ID: mdl-29026660

ABSTRACT

Multi-locus sequence typing (MLST) is a widely used method for categorizing bacteria. Increasingly, MLST is being performed using next-generation sequencing (NGS) data by reference laboratories and for clinical diagnostics. Many software applications have been developed to calculate sequence types from NGS data; however, there has been no comprehensive review to date on these methods. We have compared eight of these applications against real and simulated data, and present results on: (1) the accuracy of each method against traditional typing methods, (2) the performance on real outbreak datasets, (3) the impact of contamination and varying depth of coverage, and (4) the computational resource requirements.


Subject(s)
Bacteria/genetics , Bacterial Typing Techniques/methods , Multilocus Sequence Typing/methods , Databases, Factual , Genome, Bacterial , Software
9.
Bioinformatics ; 32(7): 1109-11, 2016 04 01.
Article in English | MEDLINE | ID: mdl-26794317

ABSTRACT

Transposon insertion sequencing is a high-throughput technique for assaying large libraries of otherwise isogenic transposon mutants providing insight into gene essentiality, gene function and genetic interactions. We previously developed the Transposon Directed Insertion Sequencing (TraDIS) protocol for this purpose, which utilizes shearing of genomic DNA followed by specific PCR amplification of transposon-containing fragments and Illumina sequencing. Here we describe an optimized high-yield library preparation and sequencing protocol for TraDIS experiments and a novel software pipeline for analysis of the resulting data. The Bio-Tradis analysis pipeline is implemented as an extensible Perl library which can either be used as is, or as a basis for the development of more advanced analysis tools. This article can serve as a general reference for the application of the TraDIS methodology. AVAILABILITY AND IMPLEMENTATION: The optimized sequencing protocol is included as supplementary information. The Bio-Tradis analysis pipeline is available under a GPL license at https://github.com/sanger-pathogens/Bio-Tradis CONTACT: parkhill@sanger.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA Transposable Elements , Gene Library , Software , High-Throughput Nucleotide Sequencing
10.
Microb Genom ; 2(4): e000056, 2016 04.
Article in English | MEDLINE | ID: mdl-28348851

ABSTRACT

Rapidly decreasing genome sequencing costs have led to a proportionate increase in the number of samples used in prokaryotic population studies. Extracting single nucleotide polymorphisms (SNPs) from a large whole genome alignment is now a routine task, but existing tools have failed to scale efficiently with the increased size of studies. These tools are slow, memory inefficient and are installed through non-standard procedures. We present SNP-sites, which can rapidly extract SNPs from a multi-FASTA alignment using modest resources and can output results in multiple formats for downstream analysis. SNPs can be extracted from an 8.3 GB alignment file (1842 taxa, 22 618 sites) in 267 seconds using 59 MB of RAM and 1 CPU core, making it feasible to run on modest computers. It is easy to install through the Debian and Homebrew package managers, and has been successfully tested on more than 20 operating systems. SNP-sites is implemented in C and is available under the open source license GNU GPL version 3.
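
As a rough illustration of the task described here, the sketch below finds the variable columns of a small multi-FASTA alignment and returns a reduced alignment containing only those columns. It is a naive in-memory version written for clarity, not the SNP-sites algorithm (which is implemented in C), and the choice to ignore gaps and ambiguous bases when deciding whether a column varies is an assumption made for this example.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse multi-FASTA text into {name: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def variable_sites(alignment: Dict[str, str]) -> Tuple[List[int], Dict[str, str]]:
    """Return the 1-based positions of variable columns and the alignment
    reduced to those columns. Only A/C/G/T are compared; gaps and other
    symbols are ignored when testing a column (an assumption)."""
    if not alignment:
        return [], {}
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    positions = []
    for i in range(length):
        bases = {s[i] for s in seqs if s[i] in "ACGT"}
        if len(bases) > 1:
            positions.append(i + 1)
    reduced = {
        name: "".join(seq[p - 1] for p in positions)
        for name, seq in alignment.items()
    }
    return positions, reduced


EXAMPLE = """>s1
ACGTAC
>s2
ACGTTC
>s3
AGGT-C
"""
positions, reduced = variable_sites(parse_fasta(EXAMPLE))
```

In the toy alignment, columns 2 and 5 vary, so each sequence is reduced to two characters; a production tool would stream the file rather than hold every sequence in memory.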


Subject(s)
Algorithms , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods , Software , Base Sequence , Genome/genetics , High-Throughput Nucleotide Sequencing , Sequence Alignment
11.
Microb Genom ; 2(8): e000083, 2016 08.
Article in English | MEDLINE | ID: mdl-28348874

ABSTRACT

The rapidly reducing cost of bacterial genome sequencing has led to its routine use in large-scale microbial analysis. Though mapping approaches can be used to find differences relative to the reference, many bacteria are subject to constant evolutionary pressures resulting in events such as the loss and gain of mobile genetic elements, horizontal gene transfer through recombination and genomic rearrangements. De novo assembly is the reconstruction of the underlying genome sequence, an essential step to understanding bacterial genome diversity. Here we present a high-throughput bacterial assembly and improvement pipeline that has been used to generate nearly 20 000 annotated draft genome assemblies in public databases. We demonstrate its performance on a public data set of 9404 genomes. We find all the genes used in multi-locus sequence typing schema present in 99.6 % of assembled genomes. When tested on low-, neutral- and high-GC organisms, more than 94 % of genes were present and completely intact. The pipeline has been proven to be scalable and robust with a wide variety of datasets without requiring human intervention. All of the software is available on GitHub under the GNU GPL open source license.


Subject(s)
Genomics/methods , Sequence Analysis, DNA/methods , Software , Genome, Bacterial/genetics , Genomics/economics , High-Throughput Nucleotide Sequencing , Multilocus Sequence Typing , Prokaryotic Cells
12.
Genome Biol ; 16: 294, 2015 Dec 29.
Article in English | MEDLINE | ID: mdl-26714481

ABSTRACT

The assembly of DNA sequence data is undergoing a renaissance thanks to emerging technologies capable of producing reads tens of kilobases long. Assembling complete bacterial and small eukaryotic genomes is now possible, but the final step of circularizing sequences remains unsolved. Here we present Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences. Using Pacific Biosciences and Oxford Nanopore data, Circlator correctly circularized 26 of 27 circularizable sequences, comprising 11 chromosomes and 12 plasmids from bacteria, the apicoplast and mitochondrion of Plasmodium falciparum and a human mitochondrion. Circlator is available at http://sanger-pathogens.github.io/circlator/ .


Subject(s)
Contig Mapping/methods , DNA, Circular/genetics , Software , Genome, Bacterial , Genome, Mitochondrial , Genome, Protozoan , Humans , Plasmodium falciparum/genetics
13.
Bioinformatics ; 31(22): 3691-3, 2015 Nov 15.
Article in English | MEDLINE | ID: mdl-26198102

ABSTRACT

A typical prokaryote population sequencing study can now consist of hundreds or thousands of isolates. Interrogating these datasets can provide detailed insights into the genetic structure of prokaryotic genomes. We introduce Roary, a tool that rapidly builds large-scale pan genomes, identifying the core and accessory genes. Roary makes construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results. Using a single CPU, Roary can produce a pan genome consisting of 1000 isolates in 4.5 hours using 13 GB of RAM, with further speedups possible using multiple processors. AVAILABILITY AND IMPLEMENTATION: Roary is implemented in Perl and is freely available under an open source GPLv3 license from http://sanger-pathogens.github.io/Roary CONTACT: roary@sanger.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
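
The distinction between core and accessory genes can be illustrated with a short sketch. It assumes the hard part, clustering genes across isolates, has already been done, so each isolate is just a set of gene-cluster names. The function name, the `core_fraction` parameter and the toy gene sets are hypothetical, and this is not Roary's code.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def split_pan_genome(isolates: Dict[str, Set[str]],
                     core_fraction: float = 1.0) -> Tuple[Set[str], Set[str]]:
    """Split the pan genome into core genes (present in at least
    core_fraction of isolates) and accessory genes (everything else)."""
    if not isolates:
        return set(), set()
    counts = Counter(gene for genes in isolates.values() for gene in genes)
    n = len(isolates)
    core = {gene for gene, c in counts.items() if c / n >= core_fraction}
    accessory = set(counts) - core
    return core, accessory


# Toy presence/absence data for three isolates.
isolates = {
    "iso1": {"gyrA", "recA", "blaTEM"},
    "iso2": {"gyrA", "recA"},
    "iso3": {"gyrA", "recA", "tetA"},
}
core, accessory = split_pan_genome(isolates)
```

Lowering `core_fraction` slightly below 1.0 is a common way to tolerate genes missed in a few draft assemblies.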


Subject(s)
Genome, Bacterial , Prokaryotic Cells/metabolism , Software , Computer Simulation , Databases, Genetic , Salmonella typhi/genetics
14.
Nat Genet ; 47(6): 632-9, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25961941

ABSTRACT

The emergence of multidrug-resistant (MDR) typhoid is a major global health threat affecting many countries where the disease is endemic. Here whole-genome sequence analysis of 1,832 Salmonella enterica serovar Typhi (S. Typhi) identifies a single dominant MDR lineage, H58, that has emerged and spread throughout Asia and Africa over the last 30 years. Our analysis identifies numerous transmissions of H58, including multiple transfers from Asia to Africa and an ongoing, unrecognized MDR epidemic within Africa itself. Notably, our analysis indicates that H58 lineages are displacing antibiotic-sensitive isolates, transforming the global population structure of this pathogen. H58 isolates can harbor a complex MDR element residing either on transmissible IncHI1 plasmids or within multiple chromosomal integration sites. We also identify new mutations that define the H58 lineage. This phylogeographical analysis provides a framework to facilitate global management of MDR typhoid and is applicable to similar MDR lineages emerging in other bacterial species.


Subject(s)
Salmonella typhi/genetics , Typhoid Fever/microbiology , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/therapeutic use , Drug Resistance, Multiple, Bacterial , Genome, Bacterial , Humans , Molecular Sequence Data , Phylogeny , Phylogeography , Quinolines/pharmacology , Quinolines/therapeutic use , Sequence Analysis, DNA , Typhoid Fever/drug therapy , Typhoid Fever/transmission
15.
Bioinformatics ; 31(14): 2374-6, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-25725497

ABSTRACT

MOTIVATION: An accurate genome assembly from short read sequencing data is critical for downstream analysis, for example allowing investigation of variants within a sequenced population. However, assembling sequencing data from virus samples, especially RNA viruses, into a genome sequence is challenging due to the combination of viral population diversity and extremely uneven read depth caused by amplification bias in the inevitable reverse transcription and polymerase chain reaction amplification process of current methods. RESULTS: We developed a new de novo assembler called IVA (Iterative Virus Assembler) designed specifically for read pairs sequenced at highly variable depth from RNA virus samples. We tested IVA on datasets from 140 sequenced samples from human immunodeficiency virus-1 or influenza-virus-infected people and demonstrated that IVA outperforms all other virus de novo assemblers. AVAILABILITY AND IMPLEMENTATION: The software runs under Linux, has the GPLv3 licence and is freely available from http://sanger-pathogens.github.io/iva


Subject(s)
Genome, Viral , HIV-1/genetics , Influenza A virus/genetics , Influenza B virus/genetics , RNA Viruses/genetics , Sequence Analysis, DNA/methods , Software , HIV Infections/genetics , HIV Infections/virology , HIV-1/isolation & purification , High-Throughput Nucleotide Sequencing/methods , Humans , Influenza A virus/isolation & purification , Influenza B virus/isolation & purification , Influenza, Human/genetics , Influenza, Human/virology , Polymerase Chain Reaction/methods
16.
Nucleic Acids Res ; 43(3): e15, 2015 Feb 18.
Article in English | MEDLINE | ID: mdl-25414349

ABSTRACT

The emergence of new sequencing technologies has facilitated the use of bacterial whole genome alignments for evolutionary studies and outbreak analyses. These datasets, of increasing size, often include examples of multiple different mechanisms of horizontal sequence transfer resulting in substantial alterations to prokaryotic chromosomes. The impact of these processes demands rapid and flexible approaches able to account for recombination when reconstructing isolates' recent diversification. Gubbins is an iterative algorithm that uses spatial scanning statistics to identify loci containing elevated densities of base substitutions suggestive of horizontal sequence transfer while concurrently constructing a maximum likelihood phylogeny based on the putative point mutations outside these regions of high sequence diversity. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistically parameterized models of bacterial evolution, and achieves convergence in only a few hours on alignments of hundreds of bacterial genome sequences. Gubbins is appropriate for reconstructing the recent evolutionary history of a variety of haploid genotype alignments, as it makes no assumptions about the underlying mechanism of recombination. The software is freely available for download at github.com/sanger-pathogens/Gubbins, implemented in Python and C and supported on Linux and Mac OS X.
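
The notion of "elevated densities of base substitutions" can be illustrated with a very simple window scan. The sketch below flags fixed, non-overlapping windows whose substitution count is well above the genome-wide average. It is not the Gubbins algorithm, which uses spatial scanning statistics and iterates together with phylogeny reconstruction; the window size, the `fold` threshold and the positions are assumptions chosen for the example.

```python
from typing import List, Tuple


def dense_windows(snp_positions: List[int], genome_length: int,
                  window: int, fold: float = 3.0) -> List[Tuple[int, int, int]]:
    """Return (start, end, count) for non-overlapping windows whose
    substitution count is at least `fold` times the count expected if
    substitutions were spread evenly along the genome."""
    if genome_length <= 0 or window <= 0:
        raise ValueError("genome_length and window must be positive")
    expected = len(snp_positions) * window / genome_length
    flagged = []
    for start in range(1, genome_length + 1, window):
        end = min(start + window - 1, genome_length)
        count = sum(start <= p <= end for p in snp_positions)
        if count > 0 and count >= fold * expected:
            flagged.append((start, end, count))
    return flagged


# Toy data: five of eight substitutions fall between positions 401 and 500.
snps = [5, 120, 405, 410, 415, 420, 430, 900]
flagged = dense_windows(snps, genome_length=1000, window=100)
```

Substitutions inside flagged regions would be treated as candidates for horizontal transfer, leaving the remaining sites as putative point mutations for tree building.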


Subject(s)
Algorithms , Bacteria/classification , Genome, Bacterial , Phylogeny , Recombination, Genetic , Bacteria/genetics , Sequence Analysis
17.
Cell Host Microbe ; 16(4): 504-16, 2014 Oct 08.
Article in English | MEDLINE | ID: mdl-25263220

ABSTRACT

Our intestinal microbiota harbors a diverse microbial community, often containing opportunistic bacteria with virulence potential. However, mutualistic host-microbial interactions prevent disease by opportunistic pathogens through poorly understood mechanisms. We show that the epithelial interleukin-22 receptor IL-22RA1 protects against lethal Citrobacter rodentium infection and chemical-induced colitis by promoting colonization resistance against an intestinal opportunistic bacterium, Enterococcus faecalis. Susceptibility of Il22ra1(-/-) mice to C. rodentium was associated with preferential expansion and epithelial translocation of pathogenic E. faecalis during severe microbial dysbiosis and was ameliorated with antibiotics active against E. faecalis. RNA sequencing analyses of primary colonic organoids showed that IL-22RA1 signaling promotes intestinal fucosylation via induction of the fucosyltransferase Fut2. Additionally, administration of fucosylated oligosaccharides to C. rodentium-challenged Il22ra1(-/-) mice attenuated infection and promoted E. faecalis colonization resistance by restoring the diversity of anaerobic commensal symbionts. These results support a model whereby IL-22RA1 enhances host-microbiota mutualism to limit detrimental overcolonization by opportunistic pathogens.


Subject(s)
Citrobacter rodentium/immunology , Colitis/prevention & control , Enterococcus faecalis/immunology , Intestinal Mucosa/immunology , Intestinal Mucosa/metabolism , Microbial Interactions , Receptors, Interleukin/metabolism , Animals , Bacterial Translocation , Citrobacter rodentium/physiology , Colitis/chemically induced , Disease Susceptibility , Dysbiosis , Enterococcus faecalis/physiology , Fucosyltransferases/metabolism , Mice , Mice, Knockout , Receptors, Interleukin/genetics , Signal Transduction , Galactoside 2-alpha-L-fucosyltransferase
18.
Nature ; 496(7443): 57-63, 2013 Apr 04.
Article in English | MEDLINE | ID: mdl-23485966

ABSTRACT

Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.


Subject(s)
Adaptation, Physiological/genetics , Cestoda/genetics , Genome, Helminth/genetics , Parasites/genetics , Animals , Biological Evolution , Cestoda/drug effects , Cestoda/physiology , Cestode Infections/drug therapy , Cestode Infections/metabolism , Conserved Sequence/genetics , Echinococcus granulosus/genetics , Echinococcus multilocularis/drug effects , Echinococcus multilocularis/genetics , Echinococcus multilocularis/metabolism , Genes, Helminth/genetics , Genes, Homeobox/genetics , HSP70 Heat-Shock Proteins/genetics , Humans , Hymenolepis/genetics , Metabolic Networks and Pathways/genetics , Molecular Targeted Therapy , Parasites/drug effects , Parasites/physiology , Proteome/genetics , Stem Cells/cytology , Stem Cells/metabolism , Taenia solium/genetics