Search | BVS (Virtual Health Library) Infectious and Parasitic Diseases

Hybracter: enabling scalable, automated, complete and accurate bacterial genome assemblies.

Bouras, George; Houtak, Ghais; Wick, Ryan R; Mallawaarachchi, Vijini; Roach, Michael J; Papudeshi, Bhavya; Judd, Lousie M; Sheppard, Anna E; Edwards, Robert A; Vreugde, Sarah.

Microb Genom; 10(5), 2024 May.

Article in English | MEDLINE | ID: mdl-38717808

ABSTRACT

Improvements in the accuracy and availability of long-read sequencing mean that complete bacterial genomes are now routinely reconstructed using hybrid (i.e. short- and long-reads) assembly approaches. Complete genomes allow a deeper understanding of bacterial evolution and genomic variation beyond single nucleotide variants. They are also crucial for identifying plasmids, which often carry medically significant antimicrobial resistance genes. However, small plasmids are often missed or misassembled by long-read assembly algorithms. Here, we present Hybracter which allows for the fast, automatic and scalable recovery of near-perfect complete bacterial genomes using a long-read first assembly approach. Hybracter can be run either as a hybrid assembler or as a long-read only assembler. We compared Hybracter to existing automated hybrid and long-read only assembly tools using a diverse panel of samples of varying levels of long-read accuracy with manually curated ground truth reference genomes. We demonstrate that Hybracter as a hybrid assembler is more accurate and faster than the existing gold standard automated hybrid assembler Unicycler. We also show that Hybracter with long-reads only is the most accurate long-read only assembler and is comparable to hybrid methods in accurately recovering small plasmids.

Subjects

Algorithms; Genome, Bacterial; Software; Plasmids/genetics; Sequence Analysis, DNA/methods; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Bacteria/genetics; Bacteria/classification

Hybracter: Enabling Scalable, Automated, Complete and Accurate Bacterial Genome Assemblies.

Bouras, George; Houtak, Ghais; Wick, Ryan R; Mallawaarachchi, Vijini; Roach, Michael J; Papudeshi, Bhavya; Judd, Lousie M; Sheppard, Anna E; Edwards, Robert A; Vreugde, Sarah.

bioRxiv; 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38168369

ABSTRACT

Improvements in the accuracy and availability of long-read sequencing mean that complete bacterial genomes are now routinely reconstructed using hybrid (i.e. short- and long-reads) assembly approaches. Complete genomes allow a deeper understanding of bacterial evolution and genomic variation beyond single nucleotide variants (SNVs). They are also crucial for identifying plasmids, which often carry medically significant antimicrobial resistance (AMR) genes. However, small plasmids are often missed or misassembled by long-read assembly algorithms. Here, we present Hybracter which allows for the fast, automatic, and scalable recovery of near-perfect complete bacterial genomes using a long-read first assembly approach. Hybracter can be run either as a hybrid assembler or as a long-read only assembler. We compared Hybracter to existing automated hybrid and long-read only assembly tools using a diverse panel of samples of varying levels of long-read accuracy with manually curated ground truth reference genomes. We demonstrate that Hybracter as a hybrid assembler is more accurate and faster than the existing gold standard automated hybrid assembler Unicycler. We also show that Hybracter with long-reads only is the most accurate long-read only assembler and is comparable to hybrid methods in accurately recovering small plasmids.

Discordance between different bioinformatic methods for identifying resistance genes from short-read genomic data, with a focus on Escherichia coli.

Davies, Timothy J; Swann, Jeremy; Sheppard, Anna E; Pickford, Hayleigh; Lipworth, Samuel; AbuOun, Manal; Ellington, Matthew J; Fowler, Philip W; Hopkins, Susan; Hopkins, Katie L; Crook, Derrick W; Peto, Timothy E A; Anjum, Muna F; Walker, A Sarah; Stoesser, Nicole.

Microb Genom; 9(12), 2023 Dec.

Article in English | MEDLINE | ID: mdl-38100178

ABSTRACT

Several bioinformatics genotyping algorithms are now commonly used to characterize antimicrobial resistance (AMR) gene profiles in whole-genome sequencing (WGS) data, with a view to understanding AMR epidemiology and developing resistance prediction workflows using WGS in clinical settings. Accurately evaluating AMR in Enterobacterales, particularly Escherichia coli, is of major importance, because this is a common pathogen. However, robust comparisons of different genotyping approaches on relevant simulated and large real-life WGS datasets are lacking. Here, we used both simulated datasets and a large set of real E. coli WGS data (n=1818 isolates) to systematically investigate genotyping methods in greater detail. Simulated constructs and real sequences were processed using four different bioinformatic programs (ABRicate, ARIBA, KmerResistance and SRST2, run with the ResFinder database) and their outputs compared. For simulation tests where 3079 AMR gene variants were inserted into random sequence constructs, KmerResistance was correct for 3076 (99.9%) simulations, ABRicate for 3054 (99.2%), ARIBA for 2783 (90.4%) and SRST2 for 2108 (68.5%). For simulation tests where two closely related gene variants were inserted into random sequence constructs, KmerResistance identified the correct alleles in 35,338/46,318 (76.3%) simulations, ABRicate identified them in 11,842/46,318 (25.6%) simulations, ARIBA identified them in 1679/46,318 (3.6%) simulations and SRST2 identified them in 2000/46,318 (4.3%) simulations. In real data, across all methods, 1392/1818 (76%) isolates had discrepant allele calls for at least 1 gene. In addition to highlighting areas for improvement in challenging scenarios (e.g. identification of AMR genes at <10× coverage, identifying multiple closely related AMR genes present in the same sample), our evaluations identified some more systematic errors that could be readily soluble, such as repeated misclassification (i.e. naming) of genes as shorter variants of the same gene present within the reference resistance gene database. Such naming errors accounted for at least 2530/4321 (59%) of the discrepancies seen in real data. Moreover, many of the remaining discrepancies were likely 'artefactual', with reporting of cut-off differences accounting for at least 1430/4321 (33%) discrepants. Whilst we found that comparing outputs generated by running multiple algorithms on the same dataset could identify and resolve these algorithmic artefacts, the results of our evaluations emphasize the need for developing new and more robust genotyping algorithms to further improve accuracy and performance.

Subjects

Escherichia coli; Genomics; Escherichia coli/genetics; Computational Biology; Alleles; Algorithms
