Results 1 - 17 of 17
1.
Genome Res ; 33(1): 129-140, 2023 01.
Article in English | MEDLINE | ID: mdl-36669850

ABSTRACT

Horizontal gene transfer (HGT) plays a critical role in the evolution and diversification of many microbial species. The resulting dynamics of gene gain and loss can have important implications for the development of antibiotic resistance and the design of vaccine and drug interventions. Methods for the analysis of gene presence/absence patterns typically do not account for errors introduced in the automated annotation and clustering of gene sequences. In particular, methods adapted from ecological studies, including the pangenome gene accumulation curve, can be misleading as they may reflect the underlying diversity in the temporal sampling of genomes rather than a difference in the dynamics of HGT. Here, we introduce Panstripe, a method based on generalized linear regression that is robust to population structure, sampling bias, and errors in the predicted presence/absence of genes. We show using simulations that Panstripe can effectively identify differences in the rate and number of genes involved in HGT events, and illustrate its capability by analyzing several diverse bacterial genome data sets representing major human pathogens.


Subjects
Molecular Evolution, Prokaryotic Cells, Humans, Phylogeny, Bacterial Genome, Horizontal Gene Transfer
2.
Microbiology (Reading) ; 167(9)2021 09.
Article in English | MEDLINE | ID: mdl-34491894

ABSTRACT

Enterococcus faecium is a nosocomial, multidrug-resistant pathogen. Whole genome sequence studies revealed that hospital-associated E. faecium isolates are clustered in a separate clade A1. Here, we investigated the distribution, integration site and function of a putative iol gene cluster that encodes for myo-inositol (MI) catabolism. This iol gene cluster was found as part of an ~20 kbp genetic element (iol element), integrated in ICEEfm1 close to its integrase gene in E. faecium isolate E1679. Among 1644 E. faecium isolates, ICEEfm1 was found in 789/1227 (64.3 %) clade A1 and 3/417 (0.7 %) non-clade A1 isolates. The iol element was present at a similar integration site in 180/792 (22.7 %) ICEEfm1-containing isolates. Examination of the phylogenetic tree revealed genetically closely related isolates that differed in presence/absence of ICEEfm1 and/or iol element, suggesting either independent acquisition or loss of both elements. E. faecium iol gene cluster containing isolates E1679 and E1504 were able to grow in minimal medium with only myo-inositol as carbon source, while the iolD-deficient mutant in E1504 (E1504∆iolD) lost this ability and an iol gene cluster negative recipient strain gained this ability after acquisition of ICEEfm1 by conjugation from donor strain E1679. Gene expression profiling revealed that the iol gene cluster is only expressed in the absence of other carbon sources. In an intestinal colonization mouse model the colonization ability of E1504∆iolD mutant was not affected relative to the wild-type E1504 strain. In conclusion, we describe and functionally characterise a gene cluster involved in MI catabolism that is associated with the ICEEfm1 island in hospital-associated E. faecium isolates. We were unable to show that this gene cluster provides a competitive advantage during gut colonisation in a mouse model. Therefore, to what extent this gene cluster contributes to the spread and ecological specialisation of ICEEfm1-carrying hospital-associated isolates remains to be investigated.


Subjects
Enterococcus faecium, Gram-Positive Bacterial Infections, Animals, Anti-Bacterial Agents, Enterococcus faecium/genetics, Bacterial Genome, Hospitals, Inositol, Mice, Multigene Family, Phylogeny
3.
Bioinformatics ; 36(12): 3874-3876, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32271863

ABSTRACT

SUMMARY: Plasmids can horizontally transmit genetic traits, enabling rapid bacterial adaptation to new environments and hosts. Short-read whole-genome sequencing data are often applied to large-scale bacterial comparative genomics projects but the reconstruction of plasmids from these data is facing severe limitations, such as the inability to distinguish plasmids from each other in a bacterial genome. We developed gplas, a new approach to reliably separate plasmid contigs into discrete components using sequence composition, coverage, assembly graph information and network partitioning based on a pruned network of plasmid unitigs. Gplas facilitates the analysis of large numbers of bacterial isolates and allows a detailed analysis of plasmid epidemiology based solely on short-read sequence data. AVAILABILITY AND IMPLEMENTATION: Gplas is written in R, Bash and uses a Snakemake pipeline as a workflow management system. Gplas is available under the GNU General Public License v3.0 at https://gitlab.com/sirarredondo/gplas.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Bacterial Genome, Software, Genomics, High-Throughput Nucleotide Sequencing, Plasmids/genetics, DNA Sequence Analysis, Whole Genome Sequencing
4.
BMC Genomics ; 21(1): 568, 2020 Aug 18.
Article in English | MEDLINE | ID: mdl-32811437

ABSTRACT

BACKGROUND: The nosocomial pathogen Enterococcus faecium can survive for prolonged periods of time on surfaces in the absence of nutrients. This trait is thought to contribute to the ability of E. faecium to spread among patients in hospitals. There is currently a lack of data on the mechanisms that are responsible for the ability of E. faecium to survive in the absence of nutrients. RESULTS: We performed a high-throughput transposon mutant library screening (Tn-seq) to identify genes that have a role in long-term survival during incubation in phosphate-buffered saline (PBS) at 20 °C. A total of 24 genes were identified by Tn-seq to contribute to survival in PBS, with functions associated with the general stress response, DNA repair, metabolism, and membrane homeostasis. The gene which was quantitatively most important for survival in PBS was usp (locus tag: EfmE745_02439), which is predicted to encode a 17.4 kDa universal stress protein. After generating a targeted deletion mutant in usp, we were able to confirm that usp significantly contributes to survival in PBS and this defect was restored by in trans complementation. The usp gene is present in 99% of a set of 1644 E. faecium genomes that collectively span the diversity of the species. CONCLUSIONS: We postulate that usp is a key determinant for the remarkable environmental robustness of E. faecium. Further mechanistic studies into usp and other genes identified in this study may shed further light on the mechanisms by which E. faecium can survive in the absence of nutrients for prolonged periods of time.


Subjects
Enterococcus faecium, Gram-Positive Bacterial Infections, Enterococcus faecium/genetics, Essential Genes, Humans
5.
Microb Genom ; 10(2)2024 Feb.
Article in English | MEDLINE | ID: mdl-38376388

ABSTRACT

Accurate reconstruction of Escherichia coli antibiotic resistance gene (ARG) plasmids from Illumina sequencing data has proven to be a challenge with current bioinformatic tools. In this work, we present an improved method to reconstruct E. coli plasmids using short reads. We developed plasmidEC, an ensemble classifier that identifies plasmid-derived contigs by combining the output of three different binary classification tools. We showed that plasmidEC is especially suited to classify contigs derived from ARG plasmids with a high recall of 0.941. Additionally, we optimized gplas, a graph-based tool that bins plasmid-predicted contigs into distinct plasmid predictions. Gplas2 is more effective at recovering plasmids with large sequencing coverage variations and can be combined with the output of any binary classifier. The combination of plasmidEC with gplas2 showed a high completeness (median=0.818) and F1-Score (median=0.812) when reconstructing ARG plasmids and exceeded the binning capacity of the reference-based method MOB-suite. In the absence of long-read data, our method offers an excellent alternative to reconstruct ARG plasmids in E. coli.
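The idea of combining three binary classifiers into one ensemble call can be sketched as follows. The abstract does not state the combination rule, so the simple majority vote used here is an assumption, and the function name and label strings are hypothetical rather than taken from plasmidEC.

```python
def majority_vote(predictions: list[str]) -> str:
    """Combine per-tool labels for one contig into a single call.

    Assumption: a contig is called 'plasmid' when more than half of the
    tools label it 'plasmid'; otherwise it is called 'chromosome'.
    """
    plasmid_votes = sum(1 for label in predictions if label == "plasmid")
    return "plasmid" if plasmid_votes * 2 > len(predictions) else "chromosome"


# Hypothetical outputs of three binary classifiers for two contigs.
contig_calls = {
    "contig_1": ["plasmid", "plasmid", "chromosome"],
    "contig_2": ["chromosome", "plasmid", "chromosome"],
}
ensemble = {name: majority_vote(votes) for name, votes in contig_calls.items()}
```

With an odd number of tools a majority vote can never tie, which is one practical reason to combine exactly three classifiers.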


Subjects
Escherichia coli, High-Throughput Nucleotide Sequencing, Escherichia coli/genetics, Anti-Bacterial Agents/pharmacology, Microbial Drug Resistance, Plasmids/genetics
6.
NAR Genom Bioinform ; 6(2): lqae061, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38846349

ABSTRACT

Population genomics has revolutionized our ability to study bacterial evolution by enabling data-driven discovery of the genetic architecture of trait variation. Genome-wide association studies (GWAS) have more recently become accompanied by genome-wide epistasis and co-selection (GWES) analysis, which offers a phenotype-free approach to generating hypotheses about selective processes that simultaneously impact multiple loci across the genome. However, existing GWES methods only consider associations between distant pairs of loci within the genome due to the strong impact of linkage disequilibrium (LD) over short distances. Based on the general functional organisation of genomes, it is nevertheless expected that the majority of co-selection and epistasis will act within relatively short genomic proximity, on co-variation occurring within genes and their promoter regions, and within operons. Here, we introduce LDWeaver, which enables an exhaustive GWES across both short- and long-range LD, to disentangle likely neutral co-variation from selection. We demonstrate the ability of LDWeaver to efficiently generate hypotheses about co-selection using large genomic surveys of multiple major human bacterial pathogen species and validate several findings using functional annotation and phenotypic measurements. Our approach will facilitate the study of bacterial evolution in the light of rapidly expanding population genomic data.

7.
NAR Genom Bioinform ; 5(3): lqad066, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37435357

ABSTRACT

Extrachromosomal elements of bacterial cells such as plasmids are notorious for their importance in evolution and adaptation to changing ecology. However, high-resolution population-wide analysis of plasmids has only become accessible recently with the advent of scalable long-read sequencing technology. Current typing methods for the classification of plasmids remain limited in their scope, which motivated us to develop a computationally efficient approach to simultaneously recognize novel types and classify plasmids into previously identified groups. Here, we introduce mge-cluster, which can easily handle thousands of input sequences compressed using a unitig representation in a de Bruijn graph. Our approach offers a faster runtime than existing algorithms, with moderate memory usage, and enables an intuitive visualization, classification and clustering scheme that users can explore interactively within a single framework. The mge-cluster platform for plasmid analysis can be easily distributed and replicated, enabling a consistent labelling of plasmids across past, present, and future sequence collections. We underscore the advantages of our approach by analysing a population-wide plasmid data set obtained from the opportunistic pathogen Escherichia coli, studying the prevalence of the colistin resistance gene mcr-1.1 within the plasmid population, and describing an instance of resistance plasmid transmission within a hospital environment.

8.
Microbiol Spectr ; 11(6): e0020123, 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-37811975

ABSTRACT

IMPORTANCE: Enterococcus faecalis causes life-threatening invasive hospital- and community-associated infections that are usually associated with multidrug resistance globally. Although E. faecalis causes opportunistic infections typically associated with antibiotic use, immunocompromised status, and other factors, it also possesses an arsenal of virulence factors crucial for its pathogenicity. Despite this, the relative contribution of these virulence factors and other genetic changes to the pathogenicity of E. faecalis strains remains poorly understood. Here, we investigated whether specific genomic changes in E. faecalis isolates influence pathogenicity, namely the infection of hospitalized and nonhospitalized individuals and the propensity to cause extraintestinal infection and intestinal colonization. Our findings indicate that E. faecalis genetics partially influence the infection of hospitalized and nonhospitalized individuals and the propensity to cause extraintestinal infection, possibly due to gut-to-bloodstream translocation, highlighting the potential substantial role of host and environmental factors, including gut microbiota, on the opportunistic pathogenic lifestyle of this bacterium.


Subjects
Enterococcus faecalis, Gram-Positive Bacterial Infections, Humans, Virulence Factors/genetics, Virulence/genetics, Anti-Bacterial Agents, Gram-Positive Bacterial Infections/microbiology
9.
Nat Commun ; 14(1): 3294, 2023 06 15.
Article in English | MEDLINE | ID: mdl-37322051

ABSTRACT

Escherichia coli is a leading cause of invasive bacterial infections in humans. Capsule polysaccharide has an important role in bacterial pathogenesis, and the K1 capsule has been firmly established as one of the most potent capsule types in E. coli through its association with severe infections. However, little is known about its distribution, evolution and functions across the E. coli phylogeny, which is fundamental to elucidating its role in the expansion of successful lineages. Using systematic surveys of invasive E. coli isolates, we show that the K1-cps locus is present in a quarter of bloodstream infection isolates and has emerged in at least four different extraintestinal pathogenic E. coli (ExPEC) phylogroups independently in the last 500 years. Phenotypic assessment demonstrates that K1 capsule synthesis enhances E. coli survival in human serum independent of genetic background, and that therapeutic targeting of the K1 capsule re-sensitizes E. coli from distinct genetic backgrounds to human serum. Our study highlights that assessing the evolutionary and functional properties of bacterial virulence factors at population levels is important to better monitor and predict the emergence of virulent clones, and to also inform therapies and preventive medicine to effectively control bacterial infections whilst significantly lowering antibiotic usage.


Subjects
Escherichia coli Infections, Escherichia coli Proteins, Humans, Escherichia coli, Escherichia coli Infections/microbiology, Virulence/genetics, Virulence Factors/genetics, Escherichia coli Proteins/genetics, Phylogeny
10.
Genome Med ; 13(1): 9, 2021 01 20.
Article in English | MEDLINE | ID: mdl-33472670

ABSTRACT

BACKGROUND: Enterococcus faecium is a commensal of the gastrointestinal tract of animals and humans but also a causative agent of hospital-acquired infections. Resistance against glycopeptides and to vancomycin has motivated the inclusion of E. faecium in the WHO global priority list. Vancomycin resistance can be conferred by the vanA gene cluster on the transposon Tn1546, which is frequently present in plasmids. The vanA gene cluster can be disseminated clonally but also horizontally either by plasmid dissemination or by Tn1546 transposition between different genomic locations. METHODS: We performed a retrospective study of the genomic epidemiology of 309 vancomycin-resistant E. faecium (VRE) isolates across 32 Dutch hospitals (2012-2015). Genomic information regarding clonality and Tn1546 characterization was extracted using hierBAPS sequence clusters (SC) and TETyper, respectively. Plasmids were predicted using gplas in combination with a network approach based on shared k-mer content. Next, we conducted a pairwise comparison between isolates sharing a potential epidemiological link to elucidate whether clonal, plasmid, or Tn1546 spread accounted for vanA-type resistance dissemination. RESULTS: On average, we estimated that 59% of VRE cases with a potential epidemiological link were unrelated, which was defined as VRE pairs with a distinct Tn1546 variant. Clonal dissemination accounted for 32% of cases, in which the same SC and Tn1546 variants were identified. Horizontal plasmid dissemination accounted for 7% of VRE cases, in which we observed VRE pairs belonging to a distinct SC but carrying an identical plasmid and Tn1546 variant. In 2% of cases, we observed the same Tn1546 variant in distinct SC and plasmid types, which could be explained by mixed and consecutive events of clonal and plasmid dissemination. CONCLUSIONS: In related VRE cases, the dissemination of the vanA gene cluster in Dutch hospitals between 2012 and 2015 was dominated by clonal spread. However, we also identified outbreak settings with high frequencies of plasmid dissemination in which the spread of resistance was mainly driven by horizontal gene transfer (HGT). This study demonstrates the feasibility of distinguishing between modes of dissemination with short-read data and provides a novel assessment to estimate the relative contribution of nested genomic elements in the dissemination of vanA-type resistance.
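The four pair categories defined in the RESULTS section above can be read as a small decision rule. The following is a minimal sketch of that rule only; the `Isolate` class, its field names, and the category labels are hypothetical and do not come from the study's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Isolate:
    """Hypothetical per-isolate summary; field names are illustrative."""
    sequence_cluster: str  # hierBAPS sequence cluster (SC)
    plasmid_type: str      # predicted plasmid type carrying Tn1546
    tn1546_variant: str    # Tn1546 variant, e.g. as reported by TETyper


def classify_pair(a: Isolate, b: Isolate) -> str:
    """Assign an epidemiologically linked VRE pair to a dissemination mode."""
    if a.tn1546_variant != b.tn1546_variant:
        return "unrelated"  # distinct Tn1546 variant
    if a.sequence_cluster == b.sequence_cluster:
        return "clonal"     # same SC and same Tn1546 variant
    if a.plasmid_type == b.plasmid_type:
        return "plasmid"    # distinct SC, identical plasmid and Tn1546 variant
    return "mixed"          # same Tn1546 variant in distinct SC and plasmid types


# Illustrative pair: different sequence clusters sharing a plasmid and transposon.
pair_mode = classify_pair(
    Isolate("SC1", "plasmid_A", "Tn1546_v1"),
    Isolate("SC2", "plasmid_A", "Tn1546_v1"),
)
```

The order of the checks mirrors the nesting described in the abstract: the transposon sits on a plasmid, which sits in a clonal background, so the innermost element is compared first.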


Subjects
Bacterial Proteins/metabolism, Gram-Positive Bacterial Infections/microbiology, Gram-Positive Bacterial Infections/transmission, Hospitals, Vancomycin Resistance, Base Sequence, Enterococcus faecium/genetics, Enterococcus faecium/physiology, Humans, Netherlands/epidemiology, Plasmids/genetics, Reproducibility of Results, Time Factors, Vancomycin-Resistant Enterococci/genetics, Vancomycin-Resistant Enterococci/isolation & purification
11.
Gigascience ; 10(12)2021 12 09.
Article in English | MEDLINE | ID: mdl-34891160

ABSTRACT

BACKGROUND: Bacterial whole-genome sequencing based on short-read technologies often results in a draft assembly formed by contiguous sequences. The introduction of long-read sequencing technologies permits those contiguous sequences to be unambiguously bridged into complete genomes. However, the elevated costs associated with long-read sequencing frequently limit the number of bacterial isolates that can be long-read sequenced. Here we evaluated the recently released 96 barcoding kit from Oxford Nanopore Technologies (ONT) to generate complete genomes on a high-throughput basis. In addition, we propose an isolate selection strategy that optimizes a representative selection of isolates for long-read sequencing considering as input large-scale bacterial collections. RESULTS: Despite an uneven distribution of long reads per barcode, near-complete chromosomal sequences (assembly contiguity = 0.89) were generated for 96 Escherichia coli isolates with associated short-read sequencing data. The assembly contiguity of the plasmid replicons was even higher (0.98), which indicated the suitability of the multiplexing strategy for studies focused on resolving plasmid sequences. We benchmarked hybrid and ONT-only assemblies and showed that the combination of ONT sequencing data with short-read sequencing data is still highly desirable (i) to perform an unbiased selection of isolates for long-read sequencing, (ii) to achieve an optimal genome accuracy and completeness, and (iii) to include small plasmids underrepresented in the ONT library. CONCLUSIONS: The proposed long-read isolate selection ensures the completion of bacterial genomes that span the genome diversity inherent in large collections of bacterial isolates. We show the potential of using this multiplexing approach to close bacterial genomes on a high-throughput basis.


Subjects
Bacterial Genome, Nanopores, Gene Library, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods
12.
Microorganisms ; 9(8)2021 Jul 28.
Article in English | MEDLINE | ID: mdl-34442692

ABSTRACT

The incidence of infections caused by multidrug-resistant E. coli strains has risen in the past years. Antibiotic resistance in E. coli is often mediated by acquisition and maintenance of plasmids. The study of E. coli plasmid epidemiology and genomics often requires long-read sequencing information, but recently a number of tools that allow plasmid prediction from short-read data have been developed. Here, we reviewed 25 available plasmid prediction tools and categorized them into binary plasmid/chromosome classification tools and plasmid reconstruction tools. We benchmarked six tools (MOB-suite, plasmidSPAdes, gplas, FishingForPlasmids, HyAsP and SCAPP) that aim to reliably reconstruct distinct plasmids, with a special focus on plasmids carrying antibiotic resistance genes (ARGs) such as extended-spectrum beta-lactamase genes. We found that two thirds (n = 425, 66.3%) of all plasmids were correctly reconstructed by at least one of the six tools, with a range of 92 (14.58%) to 317 (50.23%) correctly predicted plasmids. However, the majority of plasmids that carried antibiotic resistance genes (n = 85, 57.8%) could not be completely recovered as distinct plasmids by any of the tools. MOB-suite was the only tool that was able to correctly reconstruct the majority of plasmids (n = 317, 50.23%), and performed best at reconstructing large plasmids (n = 166, 46.37%) and ARG-plasmids (n = 41, 27.9%), but predictions frequently contained chromosome contamination (40%). In contrast, plasmidSPAdes reconstructed the highest fraction of plasmids smaller than 18 kbp (n = 168, 61.54%). Large ARG-plasmids, however, were frequently merged with sequences derived from distinct replicons. Available bioinformatic tools can provide valuable insight into E. coli plasmids, but also have important limitations. This work will serve as a guideline for selecting the most appropriate plasmid reconstruction tool for studies focusing on E. coli plasmids in the absence of long-read sequencing data.

13.
Nat Commun ; 12(1): 1523, 2021 03 09.
Article in English | MEDLINE | ID: mdl-33750782

ABSTRACT

Enterococcus faecalis is a commensal and nosocomial pathogen, which is also ubiquitous in animals and insects, representing a classical generalist microorganism. Here, we study E. faecalis isolates ranging from the pre-antibiotic era in 1936 up to 2018, covering a large set of host species including wild birds, mammals, healthy humans, and hospitalised patients. We sequence the bacterial genomes using short- and long-read techniques, and identify multiple extant hospital-associated lineages, with last common ancestors dating back as far as the 19th century. We find a population cohesively connected through homologous recombination, a metabolic flexibility despite a small genome size, and a stable large core genome. Our findings indicate that the apparent hospital adaptations found in hospital-associated E. faecalis lineages likely predate the "modern hospital" era, suggesting selection in another niche, and underlining the generalist nature of this nosocomial pathogen.


Subjects
Cross Infection/microbiology, Enterococcus faecalis/genetics, Animals, Anti-Bacterial Agents, Birds, Bacterial Drug Resistance/genetics, Enterococcus faecalis/drug effects, Enterococcus faecalis/isolation & purification, MDR Genes/genetics, Bacterial Genome, Gram-Positive Bacterial Infections/microbiology, Hospitals, Host Specificity, Humans, Phylogeny, Virulence Factors, Whole Genome Sequencing
14.
Microb Genom ; 6(12)2020 12.
Article in English | MEDLINE | ID: mdl-33253085

ABSTRACT

Enterococcus faecium is a commensal of the gastrointestinal tract, but is also known as a nosocomial pathogen among hospitalized patients. Population genetics based on whole-genome sequencing has revealed that E. faecium strains from hospitalized patients form a distinct clade, designated clade A1, and that plasmids are major contributors to the emergence of nosocomial E. faecium. Here we further explored the adaptive evolution of E. faecium using a genome-wide co-evolution study (GWES) to identify co-evolving single-nucleotide polymorphisms (SNPs). We identified three genomic regions harbouring large numbers of SNPs in tight linkage that are not proximal to each other based on the completely assembled chromosome of the clade A1 reference hospital isolate AUS0004. Close examination of these regions revealed that they are located at the borders of four different types of large-scale genomic rearrangements, insertion sites of two different genomic islands and an IS30-like transposon. In non-clade A1 isolates, these regions are adjacent to each other and they lack the insertions of the genomic islands and IS30-like transposon. Additionally, among the clade A1 isolates there is one group of pet isolates lacking the genomic rearrangement and insertion of the genomic islands, suggesting a distinct evolutionary trajectory. In silico analysis of the biological functions of the genes encoded in the three regions revealed a common link to a stress response. This suggests that these rearrangements may reflect adaptation to the stringent conditions in the hospital environment, such as antibiotics and detergents, to which bacteria are exposed. In conclusion, to our knowledge, this is the first study using GWES to identify genomic rearrangements, suggesting that there is considerable untapped potential to unravel hidden evolutionary signals from population genomic data.


Subjects
Enterococcus faecium/classification, Gram-Positive Bacterial Infections/microbiology, Single Nucleotide Polymorphism, Whole Genome Sequencing/methods, Cross Infection/microbiology, DNA Transposable Elements, Enterococcus faecium/genetics, Molecular Evolution, Genomic Islands, High-Throughput Nucleotide Sequencing, Humans, Phylogeny, Plasmids/genetics
15.
Microb Genom ; 4(11)2018 11.
Article in English | MEDLINE | ID: mdl-30383524

ABSTRACT

Assembly of bacterial short-read whole-genome sequencing data frequently results in hundreds of contigs for which the origin, plasmid or chromosome, is unclear. Complete genomes resolved by long-read sequencing can be used to generate and label short-read contigs. These were used to train several popular machine learning methods to classify the origin of contigs from Enterococcus faecium, Klebsiella pneumoniae and Escherichia coli using pentamer frequencies. We selected support-vector machine (SVM) models as the best classifier for all three bacterial species (F1-score E. faecium=0.92, F1-score K. pneumoniae=0.90, F1-score E. coli=0.76), which outperformed other existing plasmid prediction tools using a benchmarking set of isolates. We demonstrated the scalability of our models by accurately predicting the plasmidome of a large collection of 1644 E. faecium isolates and illustrated their applicability by predicting the location of antibiotic-resistance genes in all three species. The SVM classifiers are publicly available as an R package and graphical-user interface called 'mlplasmids'. We anticipate that this tool may significantly facilitate research on the dissemination of plasmids encoding antibiotic resistance and/or contributing to host adaptation.
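Pentamer frequencies, the features mentioned above, are the relative counts of every 5-base word in a contig. The sketch below shows one way to compute such a feature vector; it is an illustration only, the function name is hypothetical, and details such as skipping windows with ambiguous bases are assumptions rather than the mlplasmids implementation.

```python
from collections import Counter
from itertools import product


def pentamer_frequencies(sequence: str, k: int = 5) -> dict[str, float]:
    """Return the relative frequency of every k-mer over A, C, G, T.

    Windows containing other characters (e.g. N) are ignored, which is an
    assumption made for this sketch.
    """
    seq = sequence.upper()
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in all_kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in all_kmers}
    return {kmer: counts[kmer] / total for kmer in all_kmers}


# A toy contig; real contigs are thousands of bases long.
features = pentamer_frequencies("ACGTACGTAC")
```

For k = 5 the vector has 4^5 = 1024 entries, so every contig maps to a fixed-length input regardless of its length, which is what a classifier such as an SVM requires.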


Subjects
Bacterial Chromosomes, Bacterial Genome, Plasmids/genetics, Software, Bacterial Drug Resistance/genetics, Enterococcus faecium/genetics, Escherichia coli/genetics, Horizontal Gene Transfer, Klebsiella pneumoniae/genetics, Machine Learning, Support Vector Machine, Whole Genome Sequencing
16.
Microb Genom ; 3(10): e000128, 2017 10.
Article in English | MEDLINE | ID: mdl-29177087

ABSTRACT

To benchmark algorithms for automated plasmid sequence reconstruction from short-read sequencing data, we selected 42 publicly available complete bacterial genome sequences spanning 12 genera, containing 148 plasmids. We predicted plasmids from short-read data with four programs (PlasmidSPAdes, Recycler, cBar and PlasmidFinder) and compared the outcome to the reference sequences. PlasmidSPAdes reconstructs plasmids based on coverage differences in the assembly graph. It reconstructed most of the reference plasmids (recall=0.82), but approximately a quarter of the predicted plasmid contigs were false positives (precision=0.75). PlasmidSPAdes merged 84 % of the predictions from genomes with multiple plasmids into a single bin. Recycler searches the assembly graph for sub-graphs corresponding to circular sequences and correctly predicted small plasmids, but failed with long plasmids (recall=0.12, precision=0.30). cBar, which applies pentamer frequency analysis to detect plasmid-derived contigs, showed a recall and precision of 0.76 and 0.62, respectively. However, cBar categorizes contigs as plasmid-derived and does not bin the different plasmids. PlasmidFinder, which searches for replicons, had the highest precision (1.0), but was restricted by the contents of its database and the contig length obtained from de novo assembly (recall=0.36). PlasmidSPAdes and Recycler detected putative small plasmids (<10 kbp), which were also predicted as plasmids by cBar, but were absent in the original assembly. This study shows that it is possible to automatically predict small plasmids. Prediction of large plasmids (>50 kbp) containing repeated sequences remains challenging and limits the high-throughput analysis of plasmids from short-read whole-genome sequencing data.
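The abstract reports precision and recall separately for each program. As a reading aid, the two can be combined into an F1-score (their harmonic mean); the sketch below does this with the values quoted above. The F1 values are derived here for illustration and are not reported in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract.
reported = {
    "PlasmidSPAdes": (0.75, 0.82),
    "Recycler": (0.30, 0.12),
    "cBar": (0.62, 0.76),
    "PlasmidFinder": (1.0, 0.36),
}

derived_f1 = {tool: round(f1_score(p, r), 2) for tool, (p, r) in reported.items()}
```

The harmonic mean penalises imbalance, so PlasmidFinder's perfect precision does not compensate for its low recall.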


Subjects
Bacterial Genome, Plasmids, Computational Biology, Database Management Systems