Results 1 - 7 of 7
1.
PLoS Biol ; 19(11): e3001421, 2021 11.
Article in English | MEDLINE | ID: mdl-34752446

ABSTRACT

The open sharing of genomic data provides an incredibly rich resource for the study of bacterial evolution and function and even anthropogenic activities such as the widespread use of antimicrobials. However, these data consist of genomes assembled with different tools and levels of quality checking, and of large volumes of completely unprocessed raw sequence data. In both cases, considerable computational effort is required before biological questions can be addressed. Here, we assembled and characterised 661,405 bacterial genomes retrieved from the European Nucleotide Archive (ENA) in November of 2018 using a uniform standardised approach. Of these, 311,006 did not previously have an assembly. We produced a searchable COmpact Bit-sliced Signature (COBS) index, facilitating the easy interrogation of the entire dataset for a specific sequence (e.g., gene, mutation, or plasmid). Additional MinHash and pp-sketch indices support genome-wide comparisons and estimations of genomic distance. Combined, this resource will allow data to be easily subset and searched, phylogenetic relationships between genomes to be quickly elucidated, and hypotheses rapidly generated and tested. We believe that this combination of uniform processing and variety of search/filter functionalities will make this a resource of very wide utility. In terms of diversity within the data, a breakdown of the 639,981 high-quality genomes emphasised the uneven species composition of the ENA/public databases, with just 20 of the total 2,336 species making up 90% of the genomes. The overrepresented species tend to be acute/common human pathogens, aligning with research priorities at different levels from individual interests to funding bodies and national and global public health agencies.


Subject(s)
Bacteria/genetics , Biodiversity , DNA, Bacterial/genetics , Data Curation , Base Sequence , Drug Resistance, Bacterial/genetics , Species Specificity
2.
Nucleic Acids Res ; 48(8): 4357-4370, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32232417

ABSTRACT

The Klebsiella pneumoniae species complex includes important opportunistic pathogens which have become public health priorities linked to major hospital outbreaks and the recent emergence of multidrug-resistant hypervirulent strains. Bacterial virulence and the spread of multidrug resistance have previously been linked to toxin-antitoxin (TA) systems. TA systems encode a toxin that disrupts essential cellular processes, and a cognate antitoxin which counteracts this activity. Whilst associated with the maintenance of plasmids, they also act in bacterial immunity and antibiotic tolerance. However, the evolutionary dynamics and distribution of TA systems in clinical pathogens are not well understood. Here, we present a comprehensive survey and description of the diversity of TA systems in 259 clinically relevant genomes of K. pneumoniae. We show that TA systems are highly prevalent with a median of 20 loci per strain. Importantly, these toxins differ substantially in their distribution patterns and in their range of cognate antitoxins. Classification along these properties suggests different roles of TA systems and highlights the association and co-evolution of toxins and antitoxins.


Subject(s)
Evolution, Molecular , Klebsiella pneumoniae/genetics , Toxin-Antitoxin Systems/genetics , Computer Simulation , Drug Resistance, Bacterial/genetics , Genome, Bacterial , Klebsiella pneumoniae/drug effects , Klebsiella pneumoniae/pathogenicity , Phenotype , Virulence Factors/genetics
3.
Nucleic Acids Res ; 46(21): e128, 2018 11 30.
Article in English | MEDLINE | ID: mdl-30124998

ABSTRACT

Gene arrays and operons that encode functionally linked proteins form the most basic unit of transcriptional regulation in bacteria. Rules that govern the order and orientation of genes in these systems have been defined; however, these were based on a small set of genomes that may not be representative. The growing availability of large genomic datasets presents an opportunity to test these rules, to define the full range and diversity of these systems, and to understand their evolution. Here we present SLING, a tool to Search for LINked Genes by searching for a single functionally essential gene, along with its neighbours in a rule-defined proximity (https://github.com/ghoresh11/sling/wiki). Examining this subset of genes enables us to understand the basic diversity of these genetic systems in large datasets. We demonstrate the utility of SLING on a clinical collection of enteropathogenic Escherichia coli for two relevant operons: toxin antitoxin (TA) systems and RND efflux pumps. By examining the diversity of these systems, we gain insight on distinct classes of operons which present variable levels of prevalence and ability to be lost or gained. The importance of this analysis is not limited to TA systems and RND pumps, and can be expanded to understand the diversity of many other relevant gene arrays.


Subject(s)
Bacterial Proteins/genetics , Computational Biology/methods , Genes, Bacterial/genetics , Information Storage and Retrieval/methods , Operon/genetics , Antitoxins/genetics , Bacterial Toxins/genetics , Databases, Genetic , Genome, Bacterial/genetics , Genomics/methods , Internet , Reproducibility of Results
4.
Microb Genom ; 7(2)2021 02.
Article in English | MEDLINE | ID: mdl-33417534

ABSTRACT

Escherichia coli is a highly diverse organism that includes a range of commensal and pathogenic variants found across a range of niches and worldwide. In addition to causing severe intestinal and extraintestinal disease, E. coli is considered a priority pathogen due to high levels of observed drug resistance. The diversity in the E. coli population is driven by high genome plasticity and a very large gene pool. All these have made E. coli one of the most well-studied organisms, as well as a commonly used laboratory strain. Today, there are thousands of sequenced E. coli genomes stored in public databases. While data is widely available, accessing the information in order to perform analyses can still be a challenge. Collecting relevant available data requires accessing different sources, where data may be stored in a range of formats, and often requires further manipulation and processing to apply various analyses and extract useful information. In this study, we collated and intensely curated a collection of over 10 000 E. coli and Shigella genomes to provide a single, uniform, high-quality dataset. Shigella were included as they are considered specialized pathovars of E. coli. We provide these data in a number of easily accessible formats that can be used as the foundation for future studies addressing the biological differences between E. coli lineages and the distribution and flow of genes in the E. coli population at a high resolution. The analysis we present emphasizes our lack of understanding of the true diversity of the E. coli species, and the biased nature of our current understanding of the genetic diversity of such a key pathogen.


Subject(s)
Databases, Genetic , Escherichia coli Proteins/genetics , Escherichia coli/genetics , Shigella/genetics , Access to Information , Computational Biology/methods , Data Curation , Escherichia coli/classification , Gene Flow , Genome, Bacterial , Shigella/classification
5.
Microb Genom ; 7(9)2021 09.
Article in English | MEDLINE | ID: mdl-34559043

ABSTRACT

The pan-genome is defined as the combined set of all genes in the gene pool of a species. Pan-genome analyses have been very useful in helping to understand different evolutionary dynamics of bacterial species: an open pan-genome often indicates a free-living lifestyle with metabolic versatility, while closed pan-genomes are linked to host-restricted, ecologically specialized bacteria. A detailed understanding of the species pan-genome has also been instrumental in tracking the phylodynamics of emerging drug resistance mechanisms and drug-resistant pathogens. However, current approaches to analyse a species' pan-genome do not take the species population structure into account, nor do they account for the uneven sampling of different lineages, as is commonplace due to over-sampling of clinically relevant representatives. Here we present the application of a population structure-aware approach for classifying genes in a pan-genome based on within-species distribution. We demonstrate our approach on a collection of 7500 Escherichia coli genomes, one of the most-studied bacterial species and used as a model for an open pan-genome. We reveal clearly distinct groups of genes, clustered by different underlying evolutionary dynamics, and provide a more biologically informed and accurate description of the species' pan-genome.


Subject(s)
Bacteria/genetics , Evolution, Molecular , Genome, Bacterial , Escherichia coli/genetics , Gene Transfer, Horizontal , Genomics , Multigene Family , Phylogeny
6.
Microb Genom ; 7(9)2021 09.
Article in English | MEDLINE | ID: mdl-34550065

ABSTRACT

The Salmonella enterica serotype Paratyphi B complex causes a wide range of diseases, from gastroenteritis to paratyphoid fever, depending on the biotypes Java and sensu stricto. The burden of Paratyphi B biotypes in Bangladesh is still unknown, as these are indistinguishable by Salmonella serotyping. Here, we conducted the first whole-genome sequencing (WGS) study on 79 Salmonella isolates serotyped as Paratyphi B that were collected from 10 nationwide enteric disease surveillance sites in Bangladesh. Placing these in a global genetic context revealed that these are biotype Java, and the addition of these genomes expanded the previously described PG4 clade containing Bangladeshi and UK isolates. Importantly, antimicrobial resistance (AMR) genes were scarce amongst Bangladeshi S. Java isolates, somewhat surprisingly given the widespread availability of antibiotics without prescription. This genomic information provides important insights into the significance of S. Paratyphi B biotypes in enteric disease and their implications for public health.


Subject(s)
Salmonella Infections/microbiology , Salmonella/classification , Salmonella/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Bangladesh/epidemiology , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Middle Aged , Paratyphoid Fever/epidemiology , Salmonella/isolation & purification , Salmonella Infections/epidemiology , Serogroup , Serotyping , United Kingdom/epidemiology , Whole Genome Sequencing , Young Adult
7.
Genome Biol ; 21(1): 180, 2020 07 22.
Article in English | MEDLINE | ID: mdl-32698896

ABSTRACT

Population-level comparisons of prokaryotic genomes must take into account the substantial differences in gene content resulting from horizontal gene transfer, gene duplication and gene loss. However, the automated annotation of prokaryotic genomes is imperfect, and errors due to fragmented assemblies, contamination, diverse gene families and mis-assemblies accumulate over the population, leading to profound consequences when analysing the set of all genes found in a species. Here, we introduce Panaroo, a graph-based pangenome clustering tool that is able to account for many of the sources of error introduced during the annotation of prokaryotic genome assemblies. Panaroo is available at https://github.com/gtonkinhill/panaroo.


Subject(s)
Algorithms , Genome, Bacterial , Genomics/methods , Software , Biological Evolution , Drug Resistance, Bacterial/genetics , Klebsiella pneumoniae/genetics , Mycobacterium tuberculosis/genetics