1.
Genome Res ; 33(1): 129-140, 2023 01.
Article in English | MEDLINE | ID: mdl-36669850

ABSTRACT

Horizontal gene transfer (HGT) plays a critical role in the evolution and diversification of many microbial species. The resulting dynamics of gene gain and loss can have important implications for the development of antibiotic resistance and the design of vaccine and drug interventions. Methods for the analysis of gene presence/absence patterns typically do not account for errors introduced in the automated annotation and clustering of gene sequences. In particular, methods adapted from ecological studies, including the pangenome gene accumulation curve, can be misleading as they may reflect the underlying diversity in the temporal sampling of genomes rather than a difference in the dynamics of HGT. Here, we introduce Panstripe, a method based on generalized linear regression that is robust to population structure, sampling bias, and errors in the predicted presence/absence of genes. We show using simulations that Panstripe can effectively identify differences in the rate and number of genes involved in HGT events, and illustrate its capability by analyzing several diverse bacterial genome data sets representing major human pathogens.
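The abstract describes Panstripe only as "based on generalized linear regression". As a minimal sketch of that general idea (not Panstripe's actual model), the Python below fits a log-link Poisson GLM that relates a hypothetical per-branch count of gene gain/loss events to branch length, so the slope acts as a rate of gene turnover per unit branch length. The Poisson family, the single covariate and the function name `fit_poisson_glm` are assumptions for illustration; Panstripe's own error model and covariates are not specified here.

```python
import math
from typing import List, Tuple

def fit_poisson_glm(x: List[float], y: List[float],
                    iters: int = 100, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood.

    x: branch lengths; y: gene gain/loss event counts per branch (hypothetical data).
    """
    def loglik(b0: float, b1: float) -> float:
        # Poisson log-likelihood, dropping the constant -log(y!) term
        return sum(yi * (b0 + b1 * xi) - math.exp(b0 + b1 * xi) for xi, yi in zip(x, y))

    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian) for the canonical log link
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system I * step = score
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        # Halve the step until the log-likelihood does not decrease
        t, base = 1.0, loglik(b0, b1)
        while loglik(b0 + t * s0, b1 + t * s1) < base and t > 1e-8:
            t /= 2
        b0, b1 = b0 + t * s0, b1 + t * s1
        if abs(t * s0) + abs(t * s1) < tol:
            break
    return b0, b1

# Usage with noise-free toy data generated from b0 = 0.5, b1 = 1.2
xs = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.5 + 1.2 * xi) for xi in xs]
b0, b1 = fit_poisson_glm(xs, ys)
print(round(b0, 3), round(b1, 3))  # 0.5 1.2
```

With real data the counts would be integers and noisy, and a model that also accounts for annotation errors (as the abstract emphasizes) would need extra terms beyond this single-covariate sketch.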


Subject(s)
Molecular Evolution, Prokaryotic Cells, Humans, Phylogeny, Bacterial Genome, Horizontal Gene Transfer
2.
Microbiology (Reading) ; 167(9)2021 09.
Article in English | MEDLINE | ID: mdl-34491894

ABSTRACT

Enterococcus faecium is a nosocomial, multidrug-resistant pathogen. Whole-genome sequencing studies have revealed that hospital-associated E. faecium isolates cluster in a separate clade, A1. Here, we investigated the distribution, integration site and function of a putative iol gene cluster that encodes myo-inositol (MI) catabolism. This iol gene cluster was found as part of an ~20 kbp genetic element (iol element), integrated in ICEEfm1 close to its integrase gene in E. faecium isolate E1679. Among 1644 E. faecium isolates, ICEEfm1 was found in 789/1227 (64.3 %) clade A1 and 3/417 (0.7 %) non-clade A1 isolates. The iol element was present at a similar integration site in 180/792 (22.7 %) ICEEfm1-containing isolates. Examination of the phylogenetic tree revealed genetically closely related isolates that differed in the presence/absence of ICEEfm1 and/or the iol element, suggesting either independent acquisition or loss of both elements. The iol gene cluster-containing E. faecium isolates E1679 and E1504 were able to grow in minimal medium with myo-inositol as the sole carbon source, whereas the iolD-deficient mutant of E1504 (E1504∆iolD) lost this ability, and an iol gene cluster-negative recipient strain gained it after acquiring ICEEfm1 by conjugation from donor strain E1679. Gene expression profiling revealed that the iol gene cluster is only expressed in the absence of other carbon sources. In an intestinal colonization mouse model, the colonization ability of the E1504∆iolD mutant was not affected relative to the wild-type E1504 strain. In conclusion, we describe and functionally characterise a gene cluster involved in MI catabolism that is associated with the ICEEfm1 island in hospital-associated E. faecium isolates. We were unable to show that this gene cluster provides a competitive advantage during gut colonisation in a mouse model. Therefore, the extent to which this gene cluster contributes to the spread and ecological specialisation of ICEEfm1-carrying hospital-associated isolates remains to be investigated.


Subject(s)
Enterococcus faecium, Gram-Positive Bacterial Infections, Animals, Anti-Bacterial Agents, Enterococcus faecium/genetics, Bacterial Genome, Hospitals, Inositol, Mice, Multigene Family, Phylogeny
3.
Bioinformatics ; 36(12): 3874-3876, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32271863

ABSTRACT

SUMMARY: Plasmids can horizontally transmit genetic traits, enabling rapid bacterial adaptation to new environments and hosts. Short-read whole-genome sequencing data are often used in large-scale bacterial comparative genomics projects, but the reconstruction of plasmids from these data faces severe limitations, such as the inability to distinguish plasmids from each other in a bacterial genome. We developed gplas, a new approach to reliably separate plasmid contigs into discrete components using sequence composition, coverage, assembly graph information and network partitioning based on a pruned network of plasmid unitigs. Gplas facilitates the analysis of large numbers of bacterial isolates and allows a detailed analysis of plasmid epidemiology based solely on short-read sequence data. AVAILABILITY AND IMPLEMENTATION: Gplas is written in R and Bash and uses a Snakemake pipeline as its workflow management system. Gplas is available under the GNU General Public License v3.0 at https://gitlab.com/sirarredondo/gplas.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
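The abstract mentions "network partitioning based on a pruned network of plasmid unitigs". A simplified sketch of that idea, assuming contigs are already linked by a hypothetical edge weight (for example, how often two contigs co-occur in assembly-graph walks), is to drop weak edges and read each connected component as one plasmid bin. Connected components are a deliberately simple stand-in; gplas's actual partitioning algorithm and weights may differ, and `partition_pruned_network` is a hypothetical name.

```python
from typing import Dict, List, Tuple

def partition_pruned_network(nodes: List[str],
                             edges: List[Tuple[str, str, float]],
                             min_weight: float) -> List[List[str]]:
    """Drop edges below min_weight, then return connected components as bins."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(n: str) -> str:
        # Union-find lookup with path halving
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, w in edges:
        if w >= min_weight:  # pruning step: weak links are ignored
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    bins: Dict[str, List[str]] = {}
    for n in nodes:
        bins.setdefault(find(n), []).append(n)
    # Sort within and across bins for a deterministic output order
    return sorted(sorted(b) for b in bins.values())

# Usage with hypothetical contigs and edge weights
contigs = ["c1", "c2", "c3", "c4", "c5"]
links = [("c1", "c2", 0.9), ("c2", "c3", 0.8), ("c3", "c4", 0.1), ("c4", "c5", 0.7)]
print(partition_pruned_network(contigs, links, 0.5))  # [['c1', 'c2', 'c3'], ['c4', 'c5']]
```

The pruning threshold controls the trade-off the gplas abstract alludes to: too low and distinct plasmids merge into one bin, too high and one plasmid fragments into several.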


Subject(s)
Bacterial Genome, Software, Genomics, High-Throughput Nucleotide Sequencing, Plasmids/genetics, DNA Sequence Analysis, Whole Genome Sequencing
4.
BMC Genomics ; 21(1): 568, 2020 Aug 18.
Article in English | MEDLINE | ID: mdl-32811437

ABSTRACT

BACKGROUND: The nosocomial pathogen Enterococcus faecium can survive for prolonged periods of time on surfaces in the absence of nutrients. This trait is thought to contribute to the ability of E. faecium to spread among patients in hospitals. There is currently a lack of data on the mechanisms responsible for the ability of E. faecium to survive in the absence of nutrients. RESULTS: We performed a high-throughput transposon mutant library screen (Tn-seq) to identify genes that have a role in long-term survival during incubation in phosphate-buffered saline (PBS) at 20 °C. A total of 24 genes were identified by Tn-seq to contribute to survival in PBS, with functions associated with the general stress response, DNA repair, metabolism, and membrane homeostasis. The gene that was quantitatively most important for survival in PBS was usp (locus tag: EfmE745_02439), which is predicted to encode a 17.4 kDa universal stress protein. After generating a targeted deletion mutant in usp, we were able to confirm that usp significantly contributes to survival in PBS, and the survival defect was rescued by in trans complementation. The usp gene is present in 99% of a set of 1644 E. faecium genomes that collectively span the diversity of the species. CONCLUSIONS: We postulate that usp is a key determinant of the remarkable environmental robustness of E. faecium. Further mechanistic studies into usp and other genes identified in this study may shed further light on the mechanisms by which E. faecium can survive in the absence of nutrients for prolonged periods of time.


Subject(s)
Enterococcus faecium, Gram-Positive Bacterial Infections, Enterococcus faecium/genetics, Essential Genes, Humans
5.
Microb Genom ; 10(2)2024 Feb.
Article in English | MEDLINE | ID: mdl-38376388

ABSTRACT

Accurate reconstruction of Escherichia coli antibiotic resistance gene (ARG) plasmids from Illumina sequencing data has proven to be a challenge with current bioinformatic tools. In this work, we present an improved method to reconstruct E. coli plasmids using short reads. We developed plasmidEC, an ensemble classifier that identifies plasmid-derived contigs by combining the output of three different binary classification tools. We showed that plasmidEC is especially suited to classify contigs derived from ARG plasmids with a high recall of 0.941. Additionally, we optimized gplas, a graph-based tool that bins plasmid-predicted contigs into distinct plasmid predictions. Gplas2 is more effective at recovering plasmids with large sequencing coverage variations and can be combined with the output of any binary classifier. The combination of plasmidEC with gplas2 showed a high completeness (median=0.818) and F1-Score (median=0.812) when reconstructing ARG plasmids and exceeded the binning capacity of the reference-based method MOB-suite. In the absence of long-read data, our method offers an excellent alternative to reconstruct ARG plasmids in E. coli.
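plasmidEC is described as combining the output of three binary classification tools. One common way to do this, assumed here for illustration, is a per-contig majority vote; the F1-score reported above is the harmonic mean of precision and recall. The tool names, label strings and function names below are hypothetical placeholders, not plasmidEC's interface.

```python
from typing import Dict, List

def majority_vote(predictions: Dict[str, List[str]]) -> List[str]:
    """Combine per-contig labels ('plasmid'/'chromosome') from several tools.

    A contig is called 'plasmid' when more than half of the tools say so.
    """
    tools = list(predictions.values())
    n_tools = len(tools)
    combined = []
    for i in range(len(tools[0])):
        votes = sum(1 for labels in tools if labels[i] == "plasmid")
        combined.append("plasmid" if votes * 2 > n_tools else "chromosome")
    return combined

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Usage with three hypothetical classifiers and three contigs
preds = {
    "tool_a": ["plasmid", "plasmid", "chromosome"],
    "tool_b": ["plasmid", "chromosome", "chromosome"],
    "tool_c": ["chromosome", "plasmid", "plasmid"],
}
print(majority_vote(preds))  # ['plasmid', 'plasmid', 'chromosome']
```

With an odd number of tools a strict majority always exists, which is one reason three classifiers make a convenient ensemble size.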


Subject(s)
Escherichia coli, High-Throughput Nucleotide Sequencing, Escherichia coli/genetics, Anti-Bacterial Agents/pharmacology, Microbial Drug Resistance, Plasmids/genetics
6.
NAR Genom Bioinform ; 6(2): lqae061, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38846349

ABSTRACT

Population genomics has revolutionized our ability to study bacterial evolution by enabling data-driven discovery of the genetic architecture of trait variation. Genome-wide association studies (GWAS) have more recently been accompanied by genome-wide epistasis and co-selection (GWES) analysis, which offers a phenotype-free approach to generating hypotheses about selective processes that simultaneously impact multiple loci across the genome. However, existing GWES methods only consider associations between distant pairs of loci within the genome because of the strong impact of linkage disequilibrium (LD) over short distances. Based on the general functional organisation of genomes, it is nevertheless expected that the majority of co-selection and epistasis will act over relatively short genomic distances, on co-variation occurring within genes and their promoter regions, and within operons. Here, we introduce LDWeaver, which enables an exhaustive GWES across both short- and long-range LD to disentangle likely neutral co-variation from selection. We demonstrate the ability of LDWeaver to efficiently generate hypotheses about co-selection using large genomic surveys of multiple major human bacterial pathogen species and validate several findings using functional annotation and phenotypic measurements. Our approach will facilitate the study of bacterial evolution in the light of rapidly expanding population genomic data.
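LDWeaver's specific statistics are not described in this abstract. For orientation, the standard pairwise measure of linkage disequilibrium between two biallelic loci is r^2 = D^2 / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB. The sketch below computes it from 0/1 allele vectors over the same set of genomes; it is a textbook illustration, not LDWeaver's implementation, and `ld_r2` is a hypothetical name.

```python
from typing import Sequence

def ld_r2(locus_a: Sequence[int], locus_b: Sequence[int]) -> float:
    """r^2 between two biallelic loci coded 0/1 across the same genomes."""
    n = len(locus_a)
    p_a = sum(locus_a) / n  # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n  # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # A monomorphic locus carries no LD information; report 0.0
    return 0.0 if denom == 0 else d * d / denom

# Perfectly linked loci vs. independent loci across four genomes
print(ld_r2([1, 1, 0, 0], [1, 1, 0, 0]), ld_r2([1, 1, 0, 0], [1, 0, 1, 0]))  # 1.0 0.0
```

In clonal bacterial populations, r^2 between nearby loci is often high for purely neutral reasons, which is the confounding the abstract says short-range GWES must disentangle from selection.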

7.
NAR Genom Bioinform ; 5(3): lqad066, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37435357

ABSTRACT

Extrachromosomal elements of bacterial cells, such as plasmids, are notorious for their importance in evolution and adaptation to changing ecology. However, high-resolution population-wide analysis of plasmids has only recently become accessible with the advent of scalable long-read sequencing technology. Current typing methods for the classification of plasmids remain limited in scope, which motivated us to develop a computationally efficient approach to simultaneously recognize novel types and classify plasmids into previously identified groups. Here, we introduce mge-cluster, which can easily handle thousands of input sequences that are compressed using a unitig representation in a de Bruijn graph. Our approach offers a faster runtime than existing algorithms, with moderate memory usage, and enables an intuitive visualization, classification and clustering scheme that users can explore interactively within a single framework. The mge-cluster platform for plasmid analysis can be easily distributed and replicated, enabling consistent labelling of plasmids across past, present, and future sequence collections. We underscore the advantages of our approach by analysing a population-wide plasmid data set obtained from the opportunistic pathogen Escherichia coli, studying the prevalence of the colistin resistance gene mcr-1.1 within the plasmid population, and describing an instance of resistance plasmid transmission within a hospital environment.
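mge-cluster compresses input sequences into a unitig representation in a de Bruijn graph. As a rough sketch of why such a representation supports clustering, the code below profiles each sequence by its set of k-mers (a simplified stand-in for unitigs, which are maximal non-branching paths and are not built here) and computes pairwise Jaccard distances between the presence/absence profiles. The k-mer features, the Jaccard distance and the function names are assumptions for illustration only.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

def kmer_set(seq: str, k: int) -> Set[str]:
    """All k-mers of a sequence (single strand; no reverse-complement handling)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_distances(seqs: Dict[str, str], k: int) -> Dict[Tuple[str, str], float]:
    """Pairwise Jaccard distance between k-mer presence/absence profiles."""
    profiles = {name: kmer_set(s, k) for name, s in seqs.items()}
    out: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(profiles), 2):
        union = profiles[a] | profiles[b]
        inter = profiles[a] & profiles[b]
        # Two empty profiles are treated as identical (distance 0.0)
        out[(a, b)] = 1.0 - (len(inter) / len(union) if union else 1.0)
    return out

# Usage with three toy "plasmid" sequences
toy = {"p1": "ACGTAC", "p2": "ACGTAC", "p3": "TTTTTT"}
print(jaccard_distances(toy, 3))  # {('p1', 'p2'): 0.0, ('p1', 'p3'): 1.0, ('p2', 'p3'): 1.0}
```

A distance matrix like this could then be embedded and clustered; collapsing shared k-mers into unitigs mainly reduces the number of features to track across thousands of sequences.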

8.
Microbiol Spectr ; 11(6): e0020123, 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-37811975

ABSTRACT

IMPORTANCE: Enterococcus faecalis causes life-threatening invasive hospital- and community-associated infections that are usually associated with multidrug resistance globally. Although E. faecalis causes opportunistic infections typically associated with antibiotic use, an immunocompromised status, and other factors, it also possesses an arsenal of virulence factors crucial for its pathogenicity. Despite this, the relative contribution of these virulence factors and other genetic changes to the pathogenicity of E. faecalis strains remains poorly understood. Here, we investigated whether specific changes in the genome of E. faecalis isolates influence their pathogenicity: infection of hospitalized and nonhospitalized individuals, and the propensity to cause extraintestinal infection and intestinal colonization. Our findings indicate that E. faecalis genetics partially influence the infection of hospitalized and nonhospitalized individuals and the propensity to cause extraintestinal infection, possibly due to gut-to-bloodstream translocation, highlighting the potentially substantial role of host and environmental factors, including the gut microbiota, in the opportunistic pathogenic lifestyle of this bacterium.


Subject(s)
Enterococcus faecalis, Gram-Positive Bacterial Infections, Humans, Virulence Factors/genetics, Virulence/genetics, Anti-Bacterial Agents, Gram-Positive Bacterial Infections/microbiology
9.
Nat Commun ; 14(1): 3294, 2023 06 15.
Article in English | MEDLINE | ID: mdl-37322051

ABSTRACT

Escherichia coli is a leading cause of invasive bacterial infections in humans. Capsule polysaccharide has an important role in bacterial pathogenesis, and the K1 capsule has been firmly established as one of the most potent capsule types in E. coli through its association with severe infections. However, little is known about its distribution, evolution and functions across the E. coli phylogeny, which is fundamental to elucidating its role in the expansion of successful lineages. Using systematic surveys of invasive E. coli isolates, we show that the K1-cps locus is present in a quarter of bloodstream infection isolates and has emerged in at least four different extraintestinal pathogenic E. coli (ExPEC) phylogroups independently in the last 500 years. Phenotypic assessment demonstrates that K1 capsule synthesis enhances E. coli survival in human serum independent of genetic background, and that therapeutic targeting of the K1 capsule re-sensitizes E. coli from distinct genetic backgrounds to human serum. Our study highlights that assessing the evolutionary and functional properties of bacterial virulence factors at population levels is important to better monitor and predict the emergence of virulent clones, and to also inform therapies and preventive medicine to effectively control bacterial infections whilst significantly lowering antibiotic usage.


Subject(s)
Escherichia coli Infections, Escherichia coli Proteins, Humans, Escherichia coli, Escherichia coli Infections/microbiology, Virulence/genetics, Virulence Factors/genetics, Escherichia coli Proteins/genetics, Phylogeny
10.
Genome Med ; 13(1): 9, 2021 01 20.
Article in English | MEDLINE | ID: mdl-33472670

ABSTRACT

BACKGROUND: Enterococcus faecium is a commensal of the gastrointestinal tract of animals and humans but also a causative agent of hospital-acquired infections. Resistance to glycopeptides, in particular vancomycin, has motivated the inclusion of E. faecium in the WHO global priority list. Vancomycin resistance can be conferred by the vanA gene cluster on the transposon Tn1546, which is frequently present in plasmids. The vanA gene cluster can be disseminated clonally but also horizontally, either by plasmid dissemination or by Tn1546 transposition between different genomic locations. METHODS: We performed a retrospective study of the genomic epidemiology of 309 vancomycin-resistant E. faecium (VRE) isolates across 32 Dutch hospitals (2012-2015). Genomic information regarding clonality and Tn1546 characterization was extracted using hierBAPS sequence clusters (SC) and TETyper, respectively. Plasmids were predicted using gplas in combination with a network approach based on shared k-mer content. Next, we conducted a pairwise comparison between isolates sharing a potential epidemiological link to elucidate whether clonal, plasmid, or Tn1546 spread accounted for vanA-type resistance dissemination. RESULTS: On average, we estimated that 59% of VRE cases with a potential epidemiological link were unrelated, defined as VRE pairs carrying distinct Tn1546 variants. Clonal dissemination accounted for 32% of cases, in which the same SC and Tn1546 variant were identified. Horizontal plasmid dissemination accounted for 7% of VRE cases, in which we observed VRE pairs belonging to distinct SCs but carrying an identical plasmid and Tn1546 variant. In 2% of cases, we observed the same Tn1546 variant in distinct SC and plasmid types, which could be explained by mixed and consecutive events of clonal and plasmid dissemination. CONCLUSIONS: In related VRE cases, the dissemination of the vanA gene cluster in Dutch hospitals between 2012 and 2015 was dominated by clonal spread.
However, we also identified outbreak settings with high frequencies of plasmid dissemination in which the spread of resistance was mainly driven by horizontal gene transfer (HGT). This study demonstrates the feasibility of distinguishing between modes of dissemination with short-read data and provides a novel assessment to estimate the relative contribution of nested genomic elements in the dissemination of vanA-type resistance.
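The METHODS and RESULTS above define the dissemination categories by comparing three attributes of an isolate pair: Tn1546 variant, sequence cluster and plasmid type. Those rules can be restated as a small decision function. The field names, label strings and function name are hypothetical; only the decision logic follows the definitions given in the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Isolate:
    sequence_cluster: str  # hierBAPS sequence cluster (SC)
    plasmid_type: str      # predicted plasmid type (hypothetical label)
    tn1546_variant: str    # Tn1546 variant (hypothetical label)

def dissemination_mode(a: Isolate, b: Isolate) -> str:
    """Classify a pair of epidemiologically linked VRE isolates."""
    if a.tn1546_variant != b.tn1546_variant:
        return "unrelated"  # distinct Tn1546 variants
    if a.sequence_cluster == b.sequence_cluster:
        return "clonal"     # same SC and same Tn1546 variant
    if a.plasmid_type == b.plasmid_type:
        return "plasmid"    # distinct SC, identical plasmid and Tn1546 variant
    return "mixed"          # same Tn1546 in distinct SC and plasmid types

# Usage with hypothetical labels
ref = Isolate("SC1", "P1", "T1")
pairs = [Isolate("SC1", "P2", "T2"), Isolate("SC1", "P2", "T1"),
         Isolate("SC2", "P1", "T1"), Isolate("SC2", "P2", "T1")]
print([dissemination_mode(ref, other) for other in pairs])
# ['unrelated', 'clonal', 'plasmid', 'mixed']
```

Checking the Tn1546 variant first mirrors the study's definition of "unrelated" pairs, so clonal and plasmid calls are only made for pairs that share the resistance transposon.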


Subject(s)
Bacterial Proteins/metabolism, Gram-Positive Bacterial Infections/microbiology, Gram-Positive Bacterial Infections/transmission, Hospitals, Vancomycin Resistance, Base Sequence, Enterococcus faecium/genetics, Enterococcus faecium/physiology, Humans, Netherlands/epidemiology, Plasmids/genetics, Reproducibility of Results, Time Factors, Vancomycin-Resistant Enterococci/genetics, Vancomycin-Resistant Enterococci/isolation & purification
11.
Gigascience ; 10(12)2021 12 09.
Article in English | MEDLINE | ID: mdl-34891160

ABSTRACT

BACKGROUND: Bacterial whole-genome sequencing based on short-read technologies often results in a draft assembly formed by contiguous sequences. The introduction of long-read sequencing technologies permits those contiguous sequences to be unambiguously bridged into complete genomes. However, the elevated costs associated with long-read sequencing frequently limit the number of bacterial isolates that can be long-read sequenced. Here we evaluated the recently released 96-sample barcoding kit from Oxford Nanopore Technologies (ONT) to generate complete genomes on a high-throughput basis. In addition, we propose an isolate selection strategy that takes large-scale bacterial collections as input and optimizes a representative selection of isolates for long-read sequencing. RESULTS: Despite an uneven distribution of long reads per barcode, near-complete chromosomal sequences (assembly contiguity = 0.89) were generated for 96 Escherichia coli isolates with associated short-read sequencing data. The assembly contiguity of the plasmid replicons was even higher (0.98), indicating the suitability of the multiplexing strategy for studies focused on resolving plasmid sequences. We benchmarked hybrid and ONT-only assemblies and showed that combining ONT sequencing data with short-read sequencing data is still highly desirable (i) to perform an unbiased selection of isolates for long-read sequencing, (ii) to achieve optimal genome accuracy and completeness, and (iii) to include small plasmids underrepresented in the ONT library. CONCLUSIONS: The proposed long-read isolate selection ensures the completion of bacterial genomes that span the genome diversity inherent in large collections of bacterial isolates. We show the potential of using this multiplexing approach to close bacterial genomes on a high-throughput basis.


Subject(s)
Bacterial Genome, Nanopores, Gene Library, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods
12.
Microorganisms ; 9(8)2021 Jul 28.
Article in English | MEDLINE | ID: mdl-34442692

ABSTRACT

The incidence of infections caused by multidrug-resistant E. coli strains has risen in recent years. Antibiotic resistance in E. coli is often mediated by the acquisition and maintenance of plasmids. The study of E. coli plasmid epidemiology and genomics often requires long-read sequencing information, but a number of tools that allow plasmid prediction from short-read data have recently been developed. Here, we reviewed 25 available plasmid prediction tools and categorized them into binary plasmid/chromosome classification tools and plasmid reconstruction tools. We benchmarked six tools (MOB-suite, plasmidSPAdes, gplas, FishingForPlasmids, HyAsP and SCAPP) that aim to reliably reconstruct distinct plasmids, with a special focus on plasmids carrying antibiotic resistance genes (ARGs) such as extended-spectrum beta-lactamase genes. We found that two thirds (n = 425, 66.3%) of all plasmids were correctly reconstructed by at least one of the six tools, with a range of 92 (14.58%) to 317 (50.23%) correctly predicted plasmids. However, the majority of plasmids that carried antibiotic resistance genes (n = 85, 57.8%) could not be completely recovered as distinct plasmids by any of the tools. MOB-suite was the only tool that was able to correctly reconstruct the majority of plasmids (n = 317, 50.23%), and it performed best at reconstructing large plasmids (n = 166, 46.37%) and ARG-plasmids (n = 41, 27.9%), but its predictions frequently contained chromosome contamination (40%). In contrast, plasmidSPAdes reconstructed the highest fraction of plasmids smaller than 18 kbp (n = 168, 61.54%). Large ARG-plasmids, however, were frequently merged with sequences derived from distinct replicons. Available bioinformatic tools can provide valuable insight into E. coli plasmids, but also have important limitations. This work will serve as a guideline for selecting the most appropriate plasmid reconstruction tool for studies focusing on E. coli plasmids in the absence of long-read sequencing data.

13.
Nat Commun ; 12(1): 1523, 2021 03 09.
Article in English | MEDLINE | ID: mdl-33750782

ABSTRACT

Enterococcus faecalis is a commensal and nosocomial pathogen, which is also ubiquitous in animals and insects, representing a classical generalist microorganism. Here, we study E. faecalis isolates ranging from the pre-antibiotic era in 1936 up to 2018, covering a large set of host species including wild birds, mammals, healthy humans, and hospitalised patients. We sequence the bacterial genomes using short- and long-read techniques, and identify multiple extant hospital-associated lineages, with last common ancestors dating back as far as the 19th century. We find a population cohesively connected through homologous recombination, a metabolic flexibility despite a small genome size, and a stable large core genome. Our findings indicate that the apparent hospital adaptations found in hospital-associated E. faecalis lineages likely predate the "modern hospital" era, suggesting selection in another niche, and underlining the generalist nature of this nosocomial pathogen.


Subject(s)
Cross Infection/microbiology, Enterococcus faecalis/genetics, Animals, Anti-Bacterial Agents, Birds, Bacterial Drug Resistance/genetics, Enterococcus faecalis/drug effects, Enterococcus faecalis/isolation & purification, MDR Genes/genetics, Bacterial Genome, Gram-Positive Bacterial Infections/microbiology, Hospitals, Host Specificity, Humans, Phylogeny, Virulence Factors, Whole Genome Sequencing
14.
Microb Genom ; 6(12)2020 12.
Article in English | MEDLINE | ID: mdl-33253085

ABSTRACT

Enterococcus faecium is a commensal of the gastrointestinal tract, but is also known as a nosocomial pathogen among hospitalized patients. Population genetics based on whole-genome sequencing has revealed that E. faecium strains from hospitalized patients form a distinct clade, designated clade A1, and that plasmids are major contributors to the emergence of nosocomial E. faecium. Here we further explored the adaptive evolution of E. faecium using a genome-wide co-evolution study (GWES) to identify co-evolving single-nucleotide polymorphisms (SNPs). We identified three genomic regions harbouring large numbers of SNPs in tight linkage that are not proximal to each other on the completely assembled chromosome of the clade A1 reference hospital isolate AUS0004. Close examination of these regions revealed that they are located at the borders of four different types of large-scale genomic rearrangements, at the insertion sites of two different genomic islands, and at an IS30-like transposon. In non-clade A1 isolates, these regions are adjacent to each other and lack the insertions of the genomic islands and the IS30-like transposon. Additionally, among the clade A1 isolates there is one group of pet isolates lacking the genomic rearrangement and the insertion of the genomic islands, suggesting a distinct evolutionary trajectory. In silico analysis of the biological functions of the genes encoded in the three regions revealed a common link to stress response. This suggests that these rearrangements may reflect adaptation to the stringent conditions of the hospital environment, such as antibiotics and detergents, to which bacteria are exposed. In conclusion, to our knowledge, this is the first study to use GWES to identify genomic rearrangements, suggesting that there is considerable untapped potential to unravel hidden evolutionary signals from population genomic data.


Subject(s)
Enterococcus faecium/classification, Gram-Positive Bacterial Infections/microbiology, Single Nucleotide Polymorphism, Whole Genome Sequencing/methods, Cross Infection/microbiology, DNA Transposable Elements, Enterococcus faecium/genetics, Molecular Evolution, Genomic Islands, High-Throughput Nucleotide Sequencing, Humans, Phylogeny, Plasmids/genetics
15.
Microb Genom ; 4(11)2018 11.
Article in English | MEDLINE | ID: mdl-30383524

ABSTRACT

Assembly of bacterial short-read whole-genome sequencing data frequently results in hundreds of contigs for which the origin, plasmid or chromosome, is unclear. Complete genomes resolved by long-read sequencing can be used to generate and label short-read contigs. These labelled contigs were used to train several popular machine learning methods to classify the origin of contigs from Enterococcus faecium, Klebsiella pneumoniae and Escherichia coli using pentamer frequencies. We selected support-vector machine (SVM) models as the best classifier for all three bacterial species (F1-score E. faecium=0.92, F1-score K. pneumoniae=0.90, F1-score E. coli=0.76), which outperformed other existing plasmid prediction tools on a benchmarking set of isolates. We demonstrated the scalability of our models by accurately predicting the plasmidome of a large collection of 1644 E. faecium isolates and illustrated their applicability by predicting the location of antibiotic-resistance genes in all three species. The SVM classifiers are publicly available as an R package and graphical user interface called 'mlplasmids'. We anticipate that this tool may significantly facilitate research on the dissemination of plasmids encoding antibiotic resistance and/or contributing to host adaptation.
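mlplasmids classifies contigs from their pentamer frequencies. A minimal sketch of that feature extraction, assuming simple relative counts over all 4^5 = 1024 pentamers on one strand, is shown below. The exact preprocessing used by mlplasmids (for example, handling of reverse complements or feature scaling) is not described in the abstract, and the resulting vectors would then be passed to an SVM, which is not shown; `pentamer_frequencies` is a hypothetical name.

```python
from itertools import product
from typing import List

# All 1024 pentamers in a fixed lexicographic order (AAAAA first)
PENTAMERS = ["".join(p) for p in product("ACGT", repeat=5)]
INDEX = {kmer: i for i, kmer in enumerate(PENTAMERS)}

def pentamer_frequencies(contig: str) -> List[float]:
    """Relative pentamer frequencies of a contig, one feature per pentamer."""
    counts = [0] * len(PENTAMERS)
    seq = contig.upper()
    for i in range(len(seq) - 4):
        kmer = seq[i:i + 5]
        if kmer in INDEX:  # skip windows containing N or other ambiguity codes
            counts[INDEX[kmer]] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(PENTAMERS)
    return [c / total for c in counts]

# Usage: "ACGTAC" has two pentamer windows, ACGTA and CGTAC
f = pentamer_frequencies("ACGTAC")
print(f[INDEX["ACGTA"]], f[INDEX["CGTAC"]])  # 0.5 0.5
```

Normalizing counts to frequencies keeps short and long contigs on a comparable scale, which matters when one classifier must score contigs of very different lengths.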


Subject(s)
Bacterial Chromosomes, Bacterial Genome, Plasmids/genetics, Software, Bacterial Drug Resistance/genetics, Enterococcus faecium/genetics, Escherichia coli/genetics, Horizontal Gene Transfer, Klebsiella pneumoniae/genetics, Machine Learning, Support Vector Machine, Whole Genome Sequencing
16.
Microb Genom ; 3(10): e000128, 2017 10.
Article in English | MEDLINE | ID: mdl-29177087

ABSTRACT

To benchmark algorithms for automated plasmid sequence reconstruction from short-read sequencing data, we selected 42 publicly available complete bacterial genome sequences spanning 12 genera, containing 148 plasmids. We predicted plasmids from short-read data with four programs (PlasmidSPAdes, Recycler, cBar and PlasmidFinder) and compared the outcome to the reference sequences. PlasmidSPAdes reconstructs plasmids based on coverage differences in the assembly graph. It reconstructed most of the reference plasmids (recall=0.82), but approximately a quarter of the predicted plasmid contigs were false positives (precision=0.75). PlasmidSPAdes merged 84 % of the predictions from genomes with multiple plasmids into a single bin. Recycler searches the assembly graph for sub-graphs corresponding to circular sequences and correctly predicted small plasmids, but failed with long plasmids (recall=0.12, precision=0.30). cBar, which applies pentamer frequency analysis to detect plasmid-derived contigs, showed a recall and precision of 0.76 and 0.62, respectively. However, cBar categorizes contigs as plasmid-derived and does not bin the different plasmids. PlasmidFinder, which searches for replicons, had the highest precision (1.0), but was restricted by the contents of its database and the contig length obtained from de novo assembly (recall=0.36). PlasmidSPAdes and Recycler detected putative small plasmids (<10 kbp), which were also predicted as plasmids by cBar, but were absent in the original assembly. This study shows that it is possible to automatically predict small plasmids. Prediction of large plasmids (>50 kbp) containing repeated sequences remains challenging and limits the high-throughput analysis of plasmids from short-read whole-genome sequencing data.
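The precision and recall values above compare predicted plasmid contigs with those in the reference genomes. A sketch of the calculation follows, assuming contigs are counted by identifier; the study may instead weight contigs by sequence length, which the abstract does not state, and the contig names are hypothetical. The usage example shows a precision of 0.75, where one in four predicted plasmid contigs is a false positive.

```python
from typing import Set, Tuple

def precision_recall(predicted: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Precision and recall of predicted plasmid contigs against a reference set."""
    tp = len(predicted & truth)  # plasmid contigs correctly predicted
    fp = len(predicted - truth)  # chromosomal contigs wrongly called plasmid
    fn = len(truth - predicted)  # reference plasmid contigs that were missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return precision, recall

# Usage: 4 predictions, 3 correct; 5 true plasmid contigs in the reference
p, r = precision_recall({"c1", "c2", "c3", "c4"}, {"c1", "c2", "c3", "c5", "c6"})
print(p, r)  # 0.75 0.6
```

The two metrics pull in opposite directions: a tool such as PlasmidFinder can reach a precision of 1.0 by calling only contigs with a known replicon, at the cost of low recall.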


Subject(s)
Bacterial Genome, Plasmids, Computational Biology, Database Management Systems