Results 1 - 20 of 338
1.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38706320

ABSTRACT

The advent of rapid whole-genome sequencing has created new opportunities for computational prediction of antimicrobial resistance (AMR) phenotypes from genomic data. Both rule-based and machine learning (ML) approaches have been explored for this task, but systematic benchmarking is still needed. Here, we evaluated four state-of-the-art ML methods (Kover, PhenotypeSeeker, Seq2Geno2Pheno and Aytan-Aktug), an ML baseline and the rule-based ResFinder by training and testing each of them across 78 species-antibiotic datasets, using a rigorous benchmarking workflow that integrates three evaluation approaches, each paired with three distinct sample splitting methods. Our analysis revealed considerable variation in performance across techniques and datasets. Whereas ML methods generally excelled for closely related strains, ResFinder excelled at handling divergent genomes. Overall, Kover most frequently ranked top among the ML approaches, followed by PhenotypeSeeker and Seq2Geno2Pheno. AMR phenotypes for antibiotic classes such as macrolides and sulfonamides were predicted with the highest accuracies. The quality of predictions varied substantially across species-antibiotic combinations, particularly for beta-lactams; across species, resistance phenotyping of the beta-lactam compounds aztreonam, amoxicillin/clavulanic acid, cefoxitin, ceftazidime and piperacillin/tazobactam, alongside tetracyclines, demonstrated more variable performance than the other benchmarked antibiotics. By organism, Campylobacter jejuni and Enterococcus faecium phenotypes were more robustly predicted than those of Escherichia coli, Staphylococcus aureus, Salmonella enterica, Neisseria gonorrhoeae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Acinetobacter baumannii, Streptococcus pneumoniae and Mycobacterium tuberculosis. In addition, our study provides software recommendations for each species-antibiotic combination. It furthermore highlights the need for optimization for robust clinical applications, particularly for strains that diverge substantially from those used for training.


Subjects
Anti-Bacterial Agents; Phenotype; Anti-Bacterial Agents/pharmacology; Machine Learning; Drug Resistance, Bacterial/genetics; Computational Biology/methods; Genome, Bacterial; Genome, Microbial; Humans; Bacteria/genetics; Bacteria/drug effects
2.
Nat Commun ; 15(1): 4631, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38821971

ABSTRACT

Although long-read sequencing enables the generation of complete genomes for unculturable microbes, its high cost limits its widespread adoption in large-scale metagenomic studies. An alternative is to assemble short reads with long-range connectivity, which can be a cost-effective way to generate high-quality microbial genomes. Here, we develop Pangaea, a bioinformatic approach designed to enhance metagenome assembly using short reads with long-range connectivity. Pangaea leverages connectivity derived from physical barcodes of linked reads or from virtual barcodes obtained by aligning short reads to long reads. Pangaea utilizes a deep learning-based read binning algorithm to assemble co-barcoded reads exhibiting similar sequence contexts and abundances, thereby improving the assembly of high- and medium-abundance microbial genomes. Pangaea also leverages a multi-thresholding strategy to refine assembly for low-abundance microbes. We benchmark Pangaea on linked reads and on a combination of short and long reads from simulation data, mock communities and human gut metagenomes. Pangaea achieves significantly higher contig continuity as well as more near-complete metagenome-assembled genomes (NCMAGs) than the existing assemblers. Pangaea also generates three complete and circular NCMAGs from the human gut microbiomes.


Subjects
Algorithms; Gastrointestinal Microbiome; Genome, Microbial; Metagenome; Metagenomics; Humans; Metagenome/genetics; Metagenomics/methods; Gastrointestinal Microbiome/genetics; High-Throughput Nucleotide Sequencing/methods; Deep Learning; Computational Biology/methods; Sequence Analysis, DNA/methods; Genome, Bacterial
3.
Sci Data ; 11(1): 484, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730026

ABSTRACT

Barley (Hordeum vulgare) is essential to global food systems and the brewing industry. Its physiological traits and microbial communities determine malt quality. Although microbes influence barley from seed health to fermentation, there is a gap in metagenomic insights during seed storage. Elucidating the changes in microbial composition associated with barley seeds is needed to understand how these fluctuations can affect seed health and, ultimately, both agricultural yield and the quality of barley-derived products. Whole metagenomes were sequenced from eight barley seed samples obtained at different storage time points from harvest to nine months. After binning, 82 metagenome-assembled genomes (MAGs) belonging to 26 distinct bacterial genera were assembled, with a substantial proportion of potential novel species. Most of our MAG dataset (61%) showed over 90% genome completeness. This pioneering barley seed microbial genome retrieval provides insights into species diversity and structure, laying the groundwork for understanding barley seed microbiome interactions at the genome level.


Subjects
Hordeum; Seeds; Hordeum/microbiology; Hordeum/genetics; Seeds/microbiology; Metagenome; Microbiota; Metagenomics; Genome, Microbial; Genome, Bacterial; Bacteria/genetics; Bacteria/classification
4.
Genome Biol ; 25(1): 106, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38664753

ABSTRACT

Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries. Combining this compression method with other strategies for compacting the Ferragina-Manzini (FM) index, Centrifuger reduces the memory footprint by half compared to other FM-index-based approaches. Furthermore, the lossless compression and the unconstrained match length help Centrifuger achieve greater accuracy than competing methods at lower taxonomic levels.
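
As background for the abstract above, the following is a minimal sketch of the two textbook ideas it builds on: the Burrows-Wheeler transform (BWT) and run-length encoding of the transformed text. It is not Centrifuger's run-block compression, whose details are not given in the abstract; the function names and the naive rotation-sorting construction are assumptions made for illustration only.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform built from sorted rotations.

    Illustrative only: real indexes use suffix arrays, not O(n^2) rotations.
    The sentinel must sort before every character of the input.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s):
    """Collapse consecutive repeats into (character, run length) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def run_length_decode(runs):
    """Invert run_length_encode, showing that the encoding is lossless."""
    return "".join(ch * n for ch, n in runs)


transformed = bwt("banana")
runs = run_length_encode(transformed)
assert run_length_decode(runs) == transformed
```

The BWT tends to group identical characters into runs, which is why run-based encodings of the transformed text can be compact for repetitive genome collections.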


Subjects
Data Compression; Metagenomics; Data Compression/methods; Metagenomics/methods; Software; Genome, Microbial; Genome, Bacterial; Sequence Analysis, DNA/methods
5.
BMC Genomics ; 25(1): 365, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622536

ABSTRACT

BACKGROUND: Microbial genomes are composed largely of protein coding sequences, yet some genomes contain many pseudogenes caused by frameshifts or internal stop codons. These pseudogenes are believed to result from gene degradation during evolution but could also be technical artifacts of genome sequencing or assembly. RESULTS: Using a combination of observational and experimental data, we show that many putative pseudogenes are attributable to errors that are incorporated into genomes during assembly. Within 126,564 publicly available genomes, we observed that nearly identical genomes often differed substantially in pseudogene counts. Causal inference implicated assembler, sequencing platform, and coverage as likely causative factors. Reassembly of genomes from raw reads confirmed that each variable affects the number of putative pseudogenes in an assembly. Furthermore, simulated sequencing reads corroborated our observations that the quality and quantity of raw data can significantly impact the number of pseudogenes in an assembler-dependent fashion. The number of unexpected pseudogenes due to internal stops was highly correlated (R² = 0.96) with average nucleotide identity to the ground truth genome, implying that relative pseudogene counts can be used as a proxy for overall assembly correctness. Applying our method to assemblies in RefSeq resulted in rejection of 3.6% of assemblies due to significantly elevated pseudogene counts. Reassembly from real reads obtained from high coverage genomes showed considerable variability in spurious pseudogenes beyond that observed with simulated reads, reinforcing the finding that high coverage is necessary to mitigate assembly errors. CONCLUSIONS: Collectively, these results demonstrate that many pseudogenes in microbial genome assemblies are actually genes. Our results suggest that high read coverage is required for correct assembly and indicate that an inflated number of pseudogenes due to internal stops is indicative of poor overall assembly quality.
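
To make the notion of a pseudogene "due to internal stops" concrete, here is a minimal sketch that scans a coding sequence for in-frame stop codons before its final codon. It is not the authors' pipeline; the function name is hypothetical, and the sketch assumes the three stop codons of the standard and bacterial genetic codes (TAA, TAG, TGA) and a sequence that starts in frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard/bacterial code


def internal_stop_positions(cds):
    """Return 0-based codon indices of in-frame stops before the last codon.

    Trailing bases that do not form a complete codon are ignored, and the
    last complete codon is treated as the expected terminal stop position.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3
    return [
        i
        for i in range(n_codons - 1)
        if seq[3 * i : 3 * i + 3] in STOP_CODONS
    ]


# An intact gene has no internal stops; a single substitution can add one.
intact = "ATGAAACCCTAA"
broken = "ATGTAACCCTAA"
print(internal_stop_positions(intact), internal_stop_positions(broken))
```

Counting genes with a non-empty result across an assembly gives one crude tally of putative pseudogenes of this kind; frameshift detection would need an alignment against a reference protein and is not covered here.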


Subjects
Genome, Bacterial; Pseudogenes; Pseudogenes/genetics; Chromosome Mapping; Base Sequence; Genome, Microbial; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods
6.
Methods Mol Biol ; 2760: 147-155, 2024.
Article in English | MEDLINE | ID: mdl-38468087

ABSTRACT

Microbial genome editing can be achieved by donor DNA-directed mutagenesis and CRISPR-Cas12a-mediated negative selection. Single-nucleotide-level genome editing enables the manipulation of microbial cells exactly as designed. Here, we describe single-nucleotide substitutions/indels in the target DNA of the E. coli genome using a mutagenic DNA oligonucleotide donor and a truncated crRNA/Cas12a system. The maximal truncation of nucleotides at the 3'-end of the crRNA enables Cas12a-mediated single-nucleotide-level precise editing at galK targets in the E. coli genome.


Subjects
CRISPR-Cas Systems; Gene Editing; CRISPR-Cas Systems/genetics; RNA, Guide, CRISPR-Cas Systems; Nucleotides; Escherichia coli/genetics; Genome, Microbial; DNA
7.
mSystems ; 9(3): e0003624, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38364094

ABSTRACT

Analyzing microbial genomes has become an essential part of microbiology research, giving valuable insights into the functions and evolution of microbial species. Identifying genes of interest and assigning putative annotations to those genes is a central task in genome analysis, and a plethora of tools and approaches have been developed for this task. The ProkFunFind tool was developed to bridge the gap between these various annotation approaches, providing a flexible and customizable search approach to annotate microbial functions. ProkFunFind is designed around hierarchical definitions of biological functions, where individual genes can be identified using heterogeneous search terms consisting of sequences, profile hidden Markov models, protein domains, and orthology groups. This approach allows searches to be tailored to specific biological functions, and the search results are output in multiple formats to facilitate downstream analyses. The utility of the ProkFunFind search tool was demonstrated through its application in searching for bacterial flagella, which are complex organelles encoded by multiple genes. Overall, ProkFunFind provides an accessible and flexible way to integrate multiple types of annotation and sequence data while annotating biological functions in microbial genomes. IMPORTANCE: Genome sequencing and analysis are increasingly important parts of microbiology, providing a way to predict metabolic functions, identify virulence factors, and understand the evolution of microbes. The expanded use of genome sequencing has also brought an abundance of search and annotation methods, but integrating the information from these different methods can be challenging and is often done through ad hoc approaches. To bridge the gap between different types of annotations, we developed ProkFunFind, a flexible and customizable search tool incorporating multiple search approaches and annotation types to annotate microbial functions. We demonstrated the utility of ProkFunFind by searching for gene clusters encoding flagellar genes using a combination of different annotation types and searches. Overall, ProkFunFind provides a reproducible and flexible way to identify gene clusters of interest, facilitating the meaningful analysis of new and existing microbial genomes.


Subjects
Genome, Microbial; Software; Search Engine
8.
Nucleic Acids Res ; 52(D1): D690-D700, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37897361

ABSTRACT

The Animal Meta-omics landscape database (AnimalMetaOmics, https://yanglab.hzau.edu.cn/animalmetaomics#/) is a comprehensive and freely available resource that includes metagenomic, metatranscriptomic, and metaproteomic data from various non-human animal species and provides abundant information on animal microbiomes, including cluster analysis of microbial cognate genes, functional gene annotations, active microbiota composition, gene expression abundance, and microbial protein identification. In this work, 55 898 microbial genomes were annotated from 581 animal species, including 42 924 bacterial genomes, 12 336 virus genomes, 496 archaea genomes and 142 fungi genomes. Moreover, 321 metatranscriptomic datasets were analyzed from 31 animal species and 326 metaproteomic datasets from four animal species, as well as the pan-genomic dynamics and compositional characteristics of 679 bacterial species and 13 archaea species from animal hosts. Researchers can efficiently access and acquire the information of cross-host microbiota through a user-friendly interface, such as species, genomes, activity levels, expressed protein sequences and functions, and pan-genome composition. These valuable resources provide an important reference for better exploring the classification, functional diversity, biological process diversity and functional genes of animal microbiota.


Subjects
Databases, Genetic; Microbiota; Multiomics; Animals; Bacteria/genetics; Genome, Microbial; Metagenome/genetics; Microbiota/genetics
9.
Nucleic Acids Res ; 52(D1): D586-D589, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37904617

ABSTRACT

Many microorganisms produce natural products that are frequently used in the development of medicines and crop protection agents. Genome mining has evolved into a prominent method to access this potential. antiSMASH is the most popular tool for this task. Here we present version 4 of the antiSMASH database, providing biosynthetic gene clusters detected by antiSMASH 7.1 in publicly available, dereplicated, high-quality microbial genomes via an interactive graphical user interface. In version 4, the database contains 231 534 high quality BGC regions from 592 archaeal, 35 726 bacterial and 236 fungal genomes and is available at https://antismash-db.secondarymetabolites.org/.


Subjects
Biological Products; Biosynthetic Pathways; Databases, Genetic; Genome, Microbial; Biosynthetic Pathways/genetics; Multigene Family; Software
10.
mSphere ; 9(1): e0060823, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38112433

ABSTRACT

Transposons, plasmids, bacteriophages, and other mobile genetic elements facilitate horizontal gene transfer in the gut microbiota, allowing some pathogenic bacteria to acquire antibiotic resistance genes (ARGs). Currently, the relationship between specific ARGs and specific transposons in the comprehensive infant gut microbiome has not been elucidated. In this study, ARGs and transposons were annotated from the Unified Human Gastrointestinal Genome (UHGG) and the Early-Life Gut Genomes (ELGG). Association rule mining was used to explore the association between specific ARGs and specific transposons in UHGG, and the robustness of the association rules was validated using ELGG as an external database. Our results suggested that ARGs and transposons were more likely to be associated in infant gut microbiota than in adult gut microbiota, and nine robust association rules were identified, among which Klebsiella pneumoniae, Enterobacter hormaechei_A, and Escherichia coli_D played important roles. This study focuses on the synergistic transfer of specific ARGs and specific transposons in the infant gut microbiota, which can contribute to the study of microbial pathogenesis and ARG dissemination dynamics. IMPORTANCE: The transfer of transposons carrying antibiotic resistance genes (ARGs) among microorganisms accelerates antibiotic resistance dissemination among infant gut microbiota. Nonetheless, the relationship between specific ARGs and specific transposons within the infant gut microbiota remains unclear. K. pneumoniae, E. hormaechei_A, and E. coli_D were identified as key players in the nine robust association rules we discovered. Meanwhile, we found that infant gut microorganisms were more susceptible than adult gut microorganisms to horizontal gene transfer events involving specific ARGs and specific transposons. These discoveries could enhance the understanding of microbial pathogenesis and ARG dissemination dynamics within the infant gut microbiota.
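
The association rule mining mentioned above is usually scored with three standard quantities: support, confidence and lift. The sketch below computes them for a single rule over a toy table in which each genome is a set of annotated features. The feature labels (`ARG_x`, `Tn_y`, `ARG_z`) and the data are hypothetical placeholders, not results from the study, and the abstract does not state which thresholds or software the authors used.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent.

    transactions: list of sets, one set of annotated features per genome.
    Returns a (support, confidence, lift) tuple of floats.
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    if n == 0:
        return 0.0, 0.0, 0.0
    n_a = sum(1 for t in transactions if a <= t)        # genomes with antecedent
    n_c = sum(1 for t in transactions if c <= t)        # genomes with consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # genomes with both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Toy data: five genomes, hypothetical feature labels.
genomes = [
    {"ARG_x", "Tn_y"},
    {"ARG_x", "Tn_y"},
    {"ARG_x"},
    {"Tn_y"},
    {"ARG_z"},
]
support, confidence, lift = rule_metrics(genomes, {"Tn_y"}, {"ARG_x"})
```

A lift above 1 means the transposon and the ARG co-occur more often than expected if they were independent; validating a rule on an external set, as the abstract describes, amounts to recomputing these metrics on the second collection of genomes.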


Subjects
Anti-Bacterial Agents; Escherichia coli; Infant; Humans; Anti-Bacterial Agents/pharmacology; Escherichia coli/genetics; Drug Resistance, Microbial/genetics; Bacteria/genetics; Genome, Microbial
11.
Commun Biol ; 6(1): 1073, 2023 Oct 21.
Article in English | MEDLINE | ID: mdl-37865678

ABSTRACT

Assembly of reads from metagenomic samples is a hard problem, often resulting in highly fragmented genome assemblies. Metagenomic binning allows us to reconstruct genomes by re-grouping the sequences by their organism of origin, thus representing a crucial processing step when exploring the biological diversity of metagenomic samples. Here we present Adversarial Autoencoders for Metagenomics Binning (AAMB), an ensemble deep learning approach that integrates sequence co-abundances and tetranucleotide frequencies into a common denoised space that enables precise clustering of sequences into microbial genomes. When benchmarked, AAMB achieved similar or better results compared with the state-of-the-art reference-free binner VAMB, reconstructing ~7% more near-complete (NC) genomes across simulated and real data. In addition, genomes reconstructed using AAMB had higher completeness and greater taxonomic diversity compared with VAMB. Finally, we implemented a pipeline integrating VAMB and AAMB that enabled improved binning, recovering 20% and 29% more simulated and real NC genomes, respectively, compared to VAMB, with moderate additional runtime.
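
One of the two input features named above, tetranucleotide frequency, can be sketched in a few lines. This is a simplified illustration, not AAMB's or VAMB's preprocessing: it counts all 256 4-mers on the given strand only, skips windows containing ambiguous bases, and does not merge reverse complements. The function name is an assumption.

```python
from itertools import product

# All 256 possible 4-mers over the DNA alphabet, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return a dict mapping each 4-mer to its relative frequency.

    Windows containing characters other than A, C, G, T are skipped.
    If no valid window exists, every frequency is 0.0.
    """
    counts = dict.fromkeys(TETRAMERS, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i : i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in TETRAMERS}
    return {kmer: count / total for kmer, count in counts.items()}


profile = tetranucleotide_frequencies("ACGTACGT")
```

In a binner, a vector like this for each contig would be combined with per-sample abundance values before clustering; how AAMB normalises and encodes the two feature types is described in the paper, not here.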


Subjects
Genome, Microbial; Metagenome; Metagenomics/methods; Cluster Analysis; Benchmarking
12.
Environ Monit Assess ; 195(9): 1027, 2023 Aug 09.
Article in English | MEDLINE | ID: mdl-37553528

ABSTRACT

The clarification of drinking water leads to the production of large quantities of water treatment residuals (WTRs). DNA was extracted from six WTR samples collected from water treatment plants within the UK to compare their bacterial communities and examine whether factors such as coagulant usage (aluminium versus iron salt), the type of water source (reservoir or river), or leachable chemical composition influence these communities. Bacterial 16S variable region 4 (V4) was amplified and sequenced using Illumina MiSeq sequencing. The most abundant phyla in WTR samples were Proteobacteria, Actinobacteria, Bacteroidetes, Acidobacteria, and Firmicutes, collectively representing 92.77-97.8% of the total bacterial sequences. Statistical analysis of microbial profiles indicated that water source played a significant role in microbial community structure, diversity, and richness, whereas coagulant type did not. PERMANOVA analysis showed that no single chemical variable (pH, organic matter, or extractable element concentration) influenced microbial composition significantly; however, canonical correspondence analysis of WTR microbiomes yielded a model using all these variables that could explain variations in the microbial community structures of WTRs (p < 0.05). No common, potentially toxic cyanobacteria or related pathogens of concern were found. Analysis with PICRUSt showed that all WTRs had similar predicted microbial functional profiles. Overall, the results indicate that the WTRs analysed in this study are unlikely to pose any threat to soil microbial community structure when applied to land as a soil conditioner or enhancer and may help to enhance the soil microbial community.


Subjects
Cyanobacteria; Drinking Water; Water Purification; Environmental Monitoring; Soil; Soil Microbiology; Genome, Microbial; RNA, Ribosomal, 16S
13.
Sci Data ; 10(1): 536, 2023 Aug 10.
Article in English | MEDLINE | ID: mdl-37563185

ABSTRACT

The great threat of microbes carried by ballast water calls for determining the species composition of the ballast-tank microbial community, where the dark, cold, and anoxic tank environment might select special taxa. In this study, we reconstructed 103 metagenome-assembled genomes (MAGs), comprising 102 bacteria and one archaeon, from four vessels on international voyages. Of these MAGs, 60 were 'near complete' (completeness >90%), 34 were >80% complete, and nine were >75% complete. Phylogenomic analysis revealed that over 70% (n = 74) of these MAGs represented new taxa at different taxonomical levels, including one order, three families, 12 genera, and 58 species. The species composition of these MAGs was largely consistent with previous reports, with the most abundant phyla being Proteobacteria (n = 69), Bacteroidota (n = 17), and Actinobacteriota (n = 7). These draft genomes provide novel data on species diversity and function in the ballast-tank microbial community, which will facilitate the management of ballast water and sediments.


Subjects
Metagenome; Microbiota; Archaea/genetics; Bacteria/genetics; Genome, Microbial; Metagenomics
14.
Nat Methods ; 20(8): 1203-1212, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37500759

ABSTRACT

Advances in sequencing technologies and bioinformatics tools have dramatically increased the recovery rate of microbial genomes from metagenomic data. Assessing the quality of metagenome-assembled genomes (MAGs) is a critical step before downstream analysis. Here, we present CheckM2, an improved method of predicting genome quality of MAGs using machine learning. Using synthetic and experimental data, we demonstrate that CheckM2 outperforms existing tools in both accuracy and computational speed. In addition, CheckM2's database can be rapidly updated with new high-quality reference genomes, including taxa represented only by a single genome. We also show that CheckM2 accurately predicts genome quality for MAGs from novel lineages, even for those with reduced genome size (for example, Patescibacteria and the DPANN superphylum). CheckM2 provides accurate genome quality predictions across bacterial and archaeal lineages, giving increased confidence when inferring biological conclusions from MAGs.


Subjects
Bacteria; Genome, Microbial; Bacteria/genetics; Metagenome; Metagenomics/methods; Machine Learning
15.
Bioinformatics ; 39(Suppl 1): i40-i46, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37387149

ABSTRACT

Microbial natural products represent a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class that includes antibiotics, immunosuppressants, anticancer agents, toxins, siderophores, pigments, and cytostatics. The discovery of novel NRPs remains a laborious process because many NRPs consist of nonstandard amino acids that are assembled by nonribosomal peptide synthetases (NRPSs). Adenylation domains (A-domains) in NRPSs are responsible for selection and activation of monomers appearing in NRPs. During the past decade, several support vector machine-based algorithms have been developed for predicting the specificity of the monomers present in NRPs. These algorithms utilize physicochemical features of the amino acids present in the A-domains of NRPSs. In this article, we benchmarked the performance of various machine learning algorithms and features for predicting specificities of NRPSs and we showed that the extra trees model paired with one-hot encoding features outperforms the existing approaches. Moreover, we show that unsupervised clustering of 453 560 A-domains reveals many clusters that correspond to potentially novel amino acids. While it is challenging to predict the chemical structure of these amino acids, we developed novel techniques to predict their various properties, including polarity, hydrophobicity, charge, and presence of aromatic rings, carboxyl, and hydroxyl groups.
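
The "one-hot encoding features" mentioned above can be illustrated with a short sketch that turns a string of residues into a flat binary vector. The 20-letter alphabet, the treatment of gaps and unknown residues as all-zero blocks, and the function name are assumptions for illustration; the abstract does not specify which residue positions of the A-domain the authors encode.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(residues):
    """Encode a residue string as a flat list of 0/1 integers.

    Each position contributes a block of 20 values with a single 1 at the
    index of its residue. Gaps and unknown characters yield an all-zero
    block (an assumption of this sketch).
    """
    vector = []
    for aa in residues.upper():
        block = [0] * len(AMINO_ACIDS)
        if aa in _INDEX:
            block[_INDEX[aa]] = 1
        vector.extend(block)
    return vector


features = one_hot_encode("DAWT")  # four residues -> 80 features
```

Fixed-length vectors of this kind, one per A-domain, are the sort of input a tree ensemble such as scikit-learn's `ExtraTreesClassifier` accepts, with the substrate label as the prediction target.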


Subjects
Amino Acids; Genome, Microbial; Algorithms; Multigene Family; Peptides
16.
ACS Synth Biol ; 12(7): 2203-2207, 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37368988

ABSTRACT

Multiplex genome editing with CRISPR-Cas9 offers a cost-effective way to save time and labor. However, achieving high accuracy remains a challenge. In an Escherichia coli model system, we achieved highly efficient single-nucleotide level simultaneous editing of the galK and xylB genes using the 5'-end-truncated single-molecular guide RNA (sgRNA) method. Furthermore, we successfully demonstrated the simultaneous editing of three genes (galK, xylB, and srlD) at single-nucleotide resolution. To showcase practical application, we targeted the cI857 and ilvG genes in the genome of E. coli. While untruncated sgRNAs failed to produce any edited cells, the use of truncated sgRNAs allowed us to achieve simultaneous and accurate editing of these two genes with an efficiency of 30%. This enabled the edited cells to retain their lysogenic state at 42 °C and effectively alleviated L-valine toxicity. These results suggest that our truncated sgRNA method holds significant potential for widespread and practical use in synthetic biology.


Subjects
Gene Editing; RNA, Guide, CRISPR-Cas Systems; Gene Editing/methods; CRISPR-Cas Systems/genetics; Nucleotides; Escherichia coli/genetics; Genome, Microbial
17.
PLoS Comput Biol ; 19(6): e1011129, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37347768

ABSTRACT

The increasing availability of high-throughput sequencing (frequently termed next-generation sequencing (NGS)) data has created opportunities to gain deeper insights into the mechanisms of a number of diseases and is already impacting many areas of medicine and public health. The area of infectious diseases stands somewhat apart from other human diseases insofar as the relevant genomic data comes from the microbes rather than their human hosts. A particular concern about the threat of antimicrobial resistance (AMR) has driven the collection and reporting of large-scale datasets containing information from microbial genomes together with antimicrobial susceptibility test (AST) results. Unfortunately, the lack of clear standards or guiding principles for the reporting of such data is hampering the field's advancement. We therefore present our recommendations for the publication and sharing of genotype and phenotype data on AMR, in the form of 10 simple rules. The adoption of these recommendations will enhance AMR data interoperability and help enable its large-scale analyses using computational biology tools, including mathematical modelling and machine learning. We hope that these rules can shed light on often overlooked but nonetheless very necessary aspects of AMR data sharing and enhance the field's ability to address the problems of understanding AMR mechanisms, tracking their emergence and spread in populations, and predicting microbial susceptibility to antimicrobials for diagnostic purposes.


Subjects
Anti-Bacterial Agents; Anti-Infective Agents; Humans; Anti-Bacterial Agents/pharmacology; Drug Resistance, Bacterial/genetics; Bacteria/genetics; Genome, Microbial; Genotype; Phenotype
18.
PeerJ ; 11: e15339, 2023.
Article in English | MEDLINE | ID: mdl-37250706

ABSTRACT

Here, we present the R package minSNPs. This is a re-development of a previously described Java application named Minimum SNPs. MinSNPs assembles resolution-optimised sets of single nucleotide polymorphisms (SNPs) from sequence alignments such as genome-wide orthologous SNP matrices. MinSNPs can derive sets of SNPs optimised for discriminating any user-defined combination of sequences from all others. Alternatively, SNP sets may be optimised to discriminate all sequences from all other sequences, i.e., to maximise diversity. MinSNPs encompasses functions that facilitate rapid and flexible SNP mining, and clear and comprehensive presentation of the results. The running time of minSNPs scales linearly with input data volume and with the numbers of SNPs and SNP sets specified in the output. MinSNPs was tested using a previously reported orthologous SNP matrix of Staphylococcus aureus and an orthologous SNP matrix of 3,279 genomes with 164,335 SNPs assembled from four S. aureus short read genomic data sets. MinSNPs was shown to be effective for deriving discriminatory SNP sets for potential surveillance targets and in identifying SNP sets optimised to discriminate isolates from different clonal complexes. MinSNPs was also tested with a large Plasmodium vivax orthologous SNP matrix. A set of five SNPs was derived that reliably indicated the country of origin within three south-east Asian countries. In summary, we report the capacity to assemble comprehensive SNP matrices that effectively capture microbial genomic diversity, and to rapidly and flexibly mine these entities for optimised marker sets.
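
The idea of a "resolution-optimised" SNP set can be sketched with a simple greedy search: at each step, add the alignment column that splits the sequences into the most distinct allele profiles. This is an illustrative assumption, written in Python rather than R, and is not the minSNPs implementation or its scoring function; the function name and the tie-breaking rule (earliest column wins) are hypothetical.

```python
def greedy_snp_set(sequences, n_snps):
    """Greedily pick alignment columns that maximise distinct profiles.

    sequences: equal-length aligned strings, one per isolate.
    Returns the chosen 0-based column indices in selection order.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")

    chosen = []
    for _ in range(min(n_snps, length)):
        best_pos, best_score = None, -1
        for pos in range(length):
            if pos in chosen:
                continue
            # Number of distinct allele profiles if this column is added.
            profiles = {tuple(s[p] for p in chosen + [pos]) for s in sequences}
            if len(profiles) > best_score:
                best_pos, best_score = pos, len(profiles)
        chosen.append(best_pos)
    return chosen


alignment = ["AAA", "AAT", "ATA", "ATT"]
print(greedy_snp_set(alignment, 2))  # columns 1 and 2 separate all four
```

Each added SNP costs one pass over all columns and sequences, so the work in this sketch grows linearly with the alignment size and with the number of SNPs requested.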


Subjects
Polymorphism, Single Nucleotide; Staphylococcus aureus; Polymorphism, Single Nucleotide/genetics; Staphylococcus aureus/genetics; Sequence Alignment; Genome, Microbial; Genomics
19.
Nucleic Acids Res ; 51(W1): W46-W50, 2023 Jul 05.
Article in English | MEDLINE | ID: mdl-37140036

ABSTRACT

Microorganisms produce small bioactive compounds as part of their secondary or specialised metabolism. Often, such metabolites have antimicrobial, anticancer, antifungal, antiviral or other bio-activities and thus play an important role for applications in medicine and agriculture. In the past decade, genome mining has become a widely-used method to explore, access, and analyse the available biodiversity of these compounds. Since 2011, the 'antibiotics and secondary metabolite analysis shell-antiSMASH' (https://antismash.secondarymetabolites.org/) has supported researchers in their microbial genome mining tasks, both as a free to use web server and as a standalone tool under an OSI-approved open source licence. It is currently the most widely used tool for detecting and characterising biosynthetic gene clusters (BGCs) in archaea, bacteria, and fungi. Here, we present the updated version 7 of antiSMASH. antiSMASH 7 increases the number of supported cluster types from 71 to 81, as well as containing improvements in the areas of chemical structure prediction, enzymatic assembly-line visualisation and gene cluster regulation.


Subjects
Computers; Software; Bacteria/genetics; Bacteria/metabolism; Archaea/genetics; Genome, Microbial; Multigene Family; Secondary Metabolism/genetics
20.
mSystems ; 8(2): e0117822, 2023 Apr 27.
Article in English | MEDLINE | ID: mdl-37010293

ABSTRACT

Comprehensive protein function annotation is essential for understanding microbiome-related disease mechanisms in the host organisms. However, a large portion of human gut microbial proteins lack functional annotation. Here, we have developed a new metagenome analysis workflow integrating de novo genome reconstruction, taxonomic profiling, and deep learning-based functional annotations from DeepFRI. This is the first approach to apply deep learning-based functional annotations in metagenomics. We validate DeepFRI functional annotations by comparing them to orthology-based annotations from eggNOG on a set of 1,070 infant metagenomes from the DIABIMMUNE cohort. Using this workflow, we generated a sequence catalogue of 1.9 million nonredundant microbial genes. The functional annotations revealed 70% concordance between Gene Ontology annotations predicted by DeepFRI and eggNOG. DeepFRI improved the annotation coverage, with 99% of the gene catalogue obtaining Gene Ontology molecular function annotations, although they are less specific than those from eggNOG. Additionally, we constructed pangenomes in a reference-free manner using high-quality metagenome-assembled genomes (MAGs) and analyzed the associated annotations. eggNOG annotated more genes on well-studied organisms, such as Escherichia coli, while DeepFRI was less sensitive to taxa. Further, we show that DeepFRI provides additional annotations in comparison to the previous DIABIMMUNE studies. This workflow will contribute to novel understanding of the functional signature of the human gut microbiome in health and disease as well as guiding future metagenomics studies. IMPORTANCE: The past decade has seen advancement in high-throughput sequencing technologies resulting in rapid accumulation of genomic data from microbial communities. While this growth in sequence data and gene discovery is impressive, the majority of microbial gene functions remain uncharacterized. The coverage of functional information coming from either experimental sources or inferences is low. To solve these challenges, we have developed a new workflow to computationally assemble microbial genomes and annotate the genes using a deep learning-based model DeepFRI. This improved microbial gene annotation coverage to 1.9 million metagenome-assembled genes, representing 99% of the assembled genes, which is a significant improvement compared to 12% Gene Ontology term annotation coverage by commonly used orthology-based approaches. Importantly, the workflow supports pangenome reconstruction in a reference-free manner, allowing us to analyze the functional potential of individual bacterial species. We therefore propose this alternative approach combining deep-learning functional predictions with the commonly used orthology-based annotations as one that could help us uncover novel functions observed in metagenomic microbiome studies.


Subjects
Deep Learning; Microbiota; Humans; Metagenome/genetics; Molecular Sequence Annotation; Microbiota/genetics; Genome, Microbial