Search | BVS IEC

Pangenome-spanning epistasis and coselection analysis via de Bruijn graphs.

Kuronen, Juri; Horsfield, Samuel T; Pöntinen, Anna K; Mallawaarachchi, Sudaraka; Arredondo-Alonso, Sergio; Thorpe, Harry; Gladstone, Rebecca A; Willems, Rob J L; Bentley, Stephen D; Croucher, Nicholas J; Pensar, Johan; Lees, John A; Tonkin-Hill, Gerry; Corander, Jukka.

Genome Res ; 34(7): 1081-1088, 2024 Aug 20.

Article in English | MEDLINE | ID: mdl-39134411

ABSTRACT

Studies of bacterial adaptation and evolution are hampered by the difficulty of measuring traits such as virulence, drug resistance, and transmissibility in large populations. In contrast, it is now feasible to obtain high-quality complete assemblies of many bacterial genomes thanks to scalable high-accuracy long-read sequencing technologies. To exploit this opportunity, we introduce a phenotype- and alignment-free method for discovering coselected and epistatically interacting genomic variation from genome assemblies covering both core and accessory parts of genomes. Our approach uses a compact colored de Bruijn graph to approximate the intragenome distances between pairs of loci for a collection of bacterial genomes to account for the impacts of linkage disequilibrium (LD). We demonstrate the versatility of our approach to efficiently identify associations between loci linked with drug resistance and adaptation to the hospital niche in the major human bacterial pathogens Streptococcus pneumoniae and Enterococcus faecalis.

Subjects

Enterococcus faecalis, Genetic epistasis, Bacterial genome, Streptococcus pneumoniae, Streptococcus pneumoniae/genetics, Enterococcus faecalis/genetics, Linkage disequilibrium, Humans, Genomics/methods
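
The abstract above describes using a de Bruijn graph to approximate distances between pairs of loci. A minimal sketch of that idea follows, under simplifying assumptions that are not taken from the paper: the graph is node-centric and undirected, reverse complements and genome colors are ignored, and distance is counted as the number of edges on a shortest path. The function names `build_de_bruijn` and `graph_distance` are hypothetical and do not come from the authors' software.

```python
from collections import defaultdict, deque


def build_de_bruijn(sequences, k):
    """Build an undirected, node-centric de Bruijn graph.

    Nodes are k-mers; an edge joins two k-mers that are adjacent
    (overlap by k-1 bases) in at least one input sequence.
    Simplification: reverse complements and colors are ignored.
    """
    graph = defaultdict(set)
    for seq in sequences:
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for a, b in zip(kmers, kmers[1:]):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def graph_distance(graph, source, target):
    """Breadth-first search: edges on a shortest path, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Two toy "genomes" that differ by a single base in the middle.
genomes = ["ACGTTGCA", "ACGTAGCA"]
g = build_de_bruijn(genomes, k=3)
print(graph_distance(g, "ACG", "GCA"))  # 5
```

In this toy graph the two genomes share their first and last k-mers, so the graph distance between them equals their distance along either genome. In a real analysis such distances would be used to separate short-range associations explained by linkage disequilibrium from long-range ones.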

Accurate and fast graph-based pangenome annotation and clustering with ggCaller.

Horsfield, Samuel T; Tonkin-Hill, Gerry; Croucher, Nicholas J; Lees, John A.

Genome Res ; 33(9): 1622-1637, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37620118

ABSTRACT

Bacterial genomes differ in both gene content and sequence mutations, which underlie extensive phenotypic diversity, including variation in susceptibility to antimicrobials or vaccine-induced immunity. To identify and quantify important variants, all genes within a population must be predicted, functionally annotated, and clustered, representing the "pangenome." Despite the volume of genome data available, gene prediction and annotation are currently conducted in isolation on individual genomes, which is computationally inefficient and frequently inconsistent across genomes. Here, we introduce the open-source software graph-gene-caller (ggCaller). ggCaller combines gene prediction, functional annotation, and clustering into a single workflow using population-wide de Bruijn graphs, removing redundancy in gene annotation and resulting in more accurate gene predictions and orthologue clustering. We applied ggCaller to simulated and real-world bacterial data sets containing hundreds or thousands of genomes, comparing it to current state-of-the-art tools. ggCaller has considerable speed-ups with equivalent or greater accuracy, particularly with data sets containing complex sources of error, such as assembly contamination or fragmentation. ggCaller is also an important extension to bacterial genome-wide association studies, enabling querying of annotated graphs for functional analyses. We highlight this application by functionally annotating DNA sequences with significant associations to tetracycline and macrolide resistance in Streptococcus pneumoniae, identifying key resistance determinants that were missed when using only a single reference genome. ggCaller is a novel bacterial genome analysis tool with applications in bacterial evolution and epidemiology.

Subjects

Antibacterial agents, Genome-wide association study, Bacterial drug resistance, Macrolides, Software, Molecular sequence annotation, Bacterial genome, Cluster analysis, Algorithms

CELEBRIMBOR: Core and accessory genes from metagenomes.

Hellewell, Joel; Horsfield, Samuel T; von Wachsmann, Johanna; Gurbich, Tatiana A; Finn, Robert D; Iqbal, Zamin; Roberts, Leah W; Lees, John A.

Bioinformatics ; 2024 Sep 19.

Article in English | MEDLINE | ID: mdl-39298479

ABSTRACT

MOTIVATION: Metagenome-Assembled Genomes (MAGs) or Single-cell Amplified Genomes (SAGs) are often incomplete, with sequences missing due to errors in assembly or low coverage. This presents a particular challenge for the identification of true gene frequencies within a microbial population, as core genes missing in only a few assemblies will be mischaracterized by current pangenome approaches. RESULTS: Here, we present CELEBRIMBOR, a Snakemake pangenome analysis pipeline which uses a measure of genome completeness to automatically adjust the frequency threshold at which core genes are identified, enabling accurate core gene identification in MAGs and SAGs. AVAILABILITY: CELEBRIMBOR is published under open source Apache 2.0 licence at https://github.com/bacpop/CELEBRIMBOR and is available as a Docker container from this repository. Supplementary material is available in the online version of the article. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
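
The abstract above says the core-gene frequency threshold is adjusted using a measure of genome completeness. The sketch below illustrates one simple way such an adjustment could work. It is an assumption for illustration only and is not CELEBRIMBOR's actual rule: a gene truly present in every genome is expected to be observed in roughly a fraction equal to the mean completeness of the assemblies, so a nominal threshold is scaled by that mean. The names `adjusted_core_threshold` and `classify_genes` are hypothetical.

```python
def adjusted_core_threshold(nominal_threshold, completeness):
    """Scale a nominal core-gene threshold by mean assembly completeness.

    Assumption (not from the paper): missing sequence removes genes
    uniformly at random, so observed frequency ~ true frequency * mean
    completeness.
    """
    if not completeness:
        raise ValueError("need at least one completeness value")
    mean_completeness = sum(completeness) / len(completeness)
    return nominal_threshold * mean_completeness


def classify_genes(presence, completeness, nominal_threshold=0.95):
    """Label each gene 'core' or 'accessory'.

    presence maps gene name -> list of 0/1 flags, one per assembly.
    """
    cutoff = adjusted_core_threshold(nominal_threshold, completeness)
    labels = {}
    for gene, flags in presence.items():
        freq = sum(flags) / len(flags)
        labels[gene] = "core" if freq >= cutoff else "accessory"
    return labels


# Five incomplete assemblies (hypothetical completeness estimates).
completeness = [0.85, 0.80, 0.75, 0.80, 0.80]
presence = {
    "geneA": [1, 1, 1, 1, 0],  # missing from one assembly
    "geneB": [1, 0, 1, 0, 0],
}
labels = classify_genes(presence, completeness)
print(labels)
```

With the unadjusted 0.95 threshold, `geneA` (observed in 4 of 5 assemblies) would be called accessory; with the completeness-scaled cutoff of about 0.76 it is called core, which is the kind of mischaracterization the abstract describes.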

Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions.

Lees, John A; Mai, T Tien; Galardini, Marco; Wheeler, Nicole E; Horsfield, Samuel T; Parkhill, Julian; Corander, Jukka.

mBio ; 11(4), 2020 Jul 07.

Article in English | MEDLINE | ID: mdl-32636251

ABSTRACT

Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations create challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. Compared to those of previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold, the variants selected by our joint modeling approach overlap substantially. IMPORTANCE: Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis. These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.

Subjects

Bacteria/genetics, Genetic association studies/methods, Genomics/methods, Metagenome, Computer simulation, Genetic variation, Genotype, Theoretical models, Phenotype, Regression analysis
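
The abstract above describes fitting an elastic net model to unitig presence/absence data so that associated variants are selected jointly. The following is a minimal pure-Python sketch of elastic net regression by coordinate descent on a toy presence/absence matrix. It is not the authors' implementation: the objective (squared error divided by 2n, plus `lam` times a mix of L1 and L2 penalties weighted by `alpha`), the parameter names, and the toy data are all assumptions chosen for illustration, and no correction for population structure is included.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the L1 proximal operator)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - b0 - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2).
    Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual with all coefficients at 0
    beta = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the residual that excludes j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha)
            new /= col_sq[j] + lam * (1 - alpha)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(m * b for m, b in zip(col_means, beta))
    return intercept, beta


# Rows: 8 isolates. Columns: 3 hypothetical unitigs (1 = present).
# Unitig 0 tracks the phenotype, unitig 1 is unrelated, unitig 2 is constant.
X = [[1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1],
     [0, 0, 1], [0, 1, 1], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # e.g. resistant = 1, susceptible = 0
intercept, beta = elastic_net(X, y)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected, beta)
```

Only the unitig that tracks the phenotype keeps a nonzero coefficient; the L1 part of the penalty sets the others to exactly zero, which is what makes the fitted model interpretable as a set of selected variants.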
