Results 1 - 20 of 337
1.
Cell ; 185(15): 2725-2738, 2022 07 21.
Article in English | MEDLINE | ID: mdl-35868276

ABSTRACT

Microbial culturing and meta-omic profiling technologies have significantly advanced our understanding of the taxonomic and functional variation of the human microbiome and its impact on host processes. The next increase in resolution will come by understanding the role of low-abundant and less-prevalent bacteria and the study of individual cell behaviors that underlie the complexity of microbial ecosystems. To this aim, single-cell techniques are being rapidly developed to isolate, culture, and characterize the genomes and transcriptomes of individual microbes in complex communities. Here, we discuss how these single-cell technologies are providing unique insights into the biology and behavior of human microbiomes.


Subjects
Microbiota , Bacteria/genetics , Microbial Genome , Host Microbial Interactions , Humans , RNA Sequence Analysis , Single-Cell Analysis
2.
Nat Methods ; 20(8): 1203-1212, 2023 08.
Article in English | MEDLINE | ID: mdl-37500759

ABSTRACT

Advances in sequencing technologies and bioinformatics tools have dramatically increased the recovery rate of microbial genomes from metagenomic data. Assessing the quality of metagenome-assembled genomes (MAGs) is a critical step before downstream analysis. Here, we present CheckM2, an improved method of predicting genome quality of MAGs using machine learning. Using synthetic and experimental data, we demonstrate that CheckM2 outperforms existing tools in both accuracy and computational speed. In addition, CheckM2's database can be rapidly updated with new high-quality reference genomes, including taxa represented only by a single genome. We also show that CheckM2 accurately predicts genome quality for MAGs from novel lineages, even for those with reduced genome size (for example, Patescibacteria and the DPANN superphylum). CheckM2 provides accurate genome quality predictions across bacterial and archaeal lineages, giving increased confidence when inferring biological conclusions from MAGs.
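
As a small illustration of how genome quality predictions are typically used downstream, the sketch below sorts MAGs into quality tiers from completeness and contamination percentages. The thresholds follow the commonly cited MIMAG-style cutoffs (over 90% complete and under 5% contaminated for high quality; at least 50% complete and under 10% contaminated for medium quality). They are an assumption, not something stated in the abstract, and the full MIMAG high-quality definition also requires rRNA and tRNA genes, which this sketch ignores. The function name and the example values are hypothetical, and nothing here calls CheckM2 itself.

```python
def quality_tier(completeness: float, contamination: float) -> str:
    """Assign a quality tier from completeness/contamination percentages (0-100).

    Thresholds are assumed MIMAG-style cutoffs, not values from the abstract.
    """
    if completeness > 90.0 and contamination < 5.0:
        return "high"
    if completeness >= 50.0 and contamination < 10.0:
        return "medium"
    # Everything else, including heavily contaminated bins, is treated as low.
    return "low"


# Hypothetical (completeness, contamination) estimates for four bins.
mags = {
    "bin_1": (97.2, 1.1),
    "bin_2": (71.5, 4.0),
    "bin_3": (42.0, 0.5),
    "bin_4": (95.0, 12.3),
}
tiers = {name: quality_tier(comp, cont) for name, (comp, cont) in mags.items()}
```

In practice the two percentages would come from a quality-estimation tool's report; only the tiering logic is shown here.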


Subjects
Bacteria , Microbial Genome , Bacteria/genetics , Metagenome , Metagenomics/methods , Machine Learning
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38706320

ABSTRACT

The advent of rapid whole-genome sequencing has created new opportunities for computational prediction of antimicrobial resistance (AMR) phenotypes from genomic data. Both rule-based and machine learning (ML) approaches have been explored for this task, but systematic benchmarking is still needed. Here, we evaluated four state-of-the-art ML methods (Kover, PhenotypeSeeker, Seq2Geno2Pheno and Aytan-Aktug), an ML baseline and the rule-based ResFinder by training and testing each of them across 78 species-antibiotic datasets, using a rigorous benchmarking workflow that integrates three evaluation approaches, each paired with three distinct sample splitting methods. Our analysis revealed considerable variation in the performance across techniques and datasets. Whereas ML methods generally excelled for closely related strains, ResFinder excelled at handling divergent genomes. Overall, Kover most frequently ranked top among the ML approaches, followed by PhenotypeSeeker and Seq2Geno2Pheno. AMR phenotypes for antibiotic classes such as macrolides and sulfonamides were predicted with the highest accuracies. The quality of predictions varied substantially across species-antibiotic combinations, particularly for beta-lactams; across species, resistance phenotyping of the beta-lactam compounds aztreonam, amoxicillin/clavulanic acid, cefoxitin, ceftazidime and piperacillin/tazobactam, alongside tetracyclines, demonstrated more variable performance than the other benchmarked antibiotics. By organism, Campylobacter jejuni and Enterococcus faecium phenotypes were more robustly predicted than those of Escherichia coli, Staphylococcus aureus, Salmonella enterica, Neisseria gonorrhoeae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Acinetobacter baumannii, Streptococcus pneumoniae and Mycobacterium tuberculosis. In addition, our study provides software recommendations for each species-antibiotic combination. It furthermore highlights the need for optimization for robust clinical applications, particularly for strains that diverge substantially from those used for training.


Subjects
Anti-Bacterial Agents , Phenotype , Anti-Bacterial Agents/pharmacology , Machine Learning , Bacterial Drug Resistance/genetics , Computational Biology/methods , Bacterial Genome , Microbial Genome , Humans , Bacteria/genetics , Bacteria/drug effects
4.
Nucleic Acids Res ; 52(D1): D586-D589, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37904617

ABSTRACT

Many microorganisms produce natural products that are frequently used in the development of medicines and crop protection agents. Genome mining has evolved into a prominent method to access this potential. antiSMASH is the most popular tool for this task. Here we present version 4 of the antiSMASH database, providing biosynthetic gene clusters detected by antiSMASH 7.1 in publicly available, dereplicated, high-quality microbial genomes via an interactive graphical user interface. In version 4, the database contains 231 534 high quality BGC regions from 592 archaeal, 35 726 bacterial and 236 fungal genomes and is available at https://antismash-db.secondarymetabolites.org/.


Subjects
Biological Products , Biosynthetic Pathways , Genetic Databases , Microbial Genome , Biosynthetic Pathways/genetics , Multigene Family , Software
5.
Nucleic Acids Res ; 52(D1): D690-D700, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37897361

ABSTRACT

The Animal Meta-omics landscape database (AnimalMetaOmics, https://yanglab.hzau.edu.cn/animalmetaomics#/) is a comprehensive and freely available resource that includes metagenomic, metatranscriptomic, and metaproteomic data from various non-human animal species and provides abundant information on animal microbiomes, including cluster analysis of microbial cognate genes, functional gene annotations, active microbiota composition, gene expression abundance, and microbial protein identification. In this work, 55 898 microbial genomes were annotated from 581 animal species, including 42 924 bacterial genomes, 12 336 virus genomes, 496 archaea genomes and 142 fungi genomes. Moreover, 321 metatranscriptomic datasets were analyzed from 31 animal species and 326 metaproteomic datasets from four animal species, as well as the pan-genomic dynamics and compositional characteristics of 679 bacterial species and 13 archaea species from animal hosts. Researchers can efficiently access and acquire the information of cross-host microbiota through a user-friendly interface, such as species, genomes, activity levels, expressed protein sequences and functions, and pan-genome composition. These valuable resources provide an important reference for better exploring the classification, functional diversity, biological process diversity and functional genes of animal microbiota.


Subjects
Genetic Databases , Microbiota , Multiomics , Animals , Bacteria/genetics , Microbial Genome , Metagenome/genetics , Microbiota/genetics
6.
Bioinformatics ; 40(7)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38905502

ABSTRACT

SUMMARY: The design of two overlapping genes in a microbial genome is an emerging technique for adding more reliable control mechanisms in engineered organisms for increased stability. The design of functional overlapping gene pairs is a challenging procedure, and computational design tools are used to improve the efficiency to deploy successful designs in genetically engineered systems. GENTANGLE (Gene Tuples ArraNGed in overLapping Elements) is a high-performance containerized pipeline for the computational design of two overlapping genes translated in different reading frames of the genome. This new software package can be used to design and test gene entanglements for microbial engineering projects using arbitrary sets of user-specified gene pairs. AVAILABILITY AND IMPLEMENTATION: The GENTANGLE source code and its submodules are freely available on GitHub at https://github.com/BiosecSFA/gentangle. The DATANGLE (DATA for genTANGLE) repository contains related data and results and is freely available on GitHub at https://github.com/BiosecSFA/datangle. The GENTANGLE container is freely available on Singularity Cloud Library at https://cloud.sylabs.io/library/khyox/gentangle/gentangle.sif. The GENTANGLE repository wiki (https://github.com/BiosecSFA/gentangle/wiki), website (https://biosecsfa.github.io/gentangle/), and user manual contain detailed instructions on how to use the different components of software and data, including examples and reproducing the results. The code is licensed under the GNU Affero General Public License version 3 (https://www.gnu.org/licenses/agpl.html).
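
To make the idea of two genes translated in different reading frames concrete, the following sketch translates one nucleotide string in two frames. It is not GENTANGLE's design algorithm. It only shows that the same bases, read with a one-nucleotide shift, yield different peptides. The standard genetic code is assumed, and the example sequence and function name are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate every complete codon of `dna`, starting at offset `frame` (0, 1 or 2)."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


sequence = "ATGGCATGA"                # hypothetical toy sequence
frame0 = translate(sequence, 0)       # codons ATG GCA TGA
frame1 = translate(sequence, 1)       # codons TGG CAT (trailing GA is ignored)
```

Designing a functional overlapping pair is much harder than reading two frames, because every nucleotide change alters both encoded proteins at once.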


Subjects
Software , Computational Biology/methods , Microbial Genome , Genetic Engineering/methods
7.
Nucleic Acids Res ; 51(W1): W46-W50, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37140036

ABSTRACT

Microorganisms produce small bioactive compounds as part of their secondary or specialised metabolism. Often, such metabolites have antimicrobial, anticancer, antifungal, antiviral or other bio-activities and thus play an important role for applications in medicine and agriculture. In the past decade, genome mining has become a widely-used method to explore, access, and analyse the available biodiversity of these compounds. Since 2011, the 'antibiotics and secondary metabolite analysis shell-antiSMASH' (https://antismash.secondarymetabolites.org/) has supported researchers in their microbial genome mining tasks, both as a free to use web server and as a standalone tool under an OSI-approved open source licence. It is currently the most widely used tool for detecting and characterising biosynthetic gene clusters (BGCs) in archaea, bacteria, and fungi. Here, we present the updated version 7 of antiSMASH. antiSMASH 7 increases the number of supported cluster types from 71 to 81, as well as containing improvements in the areas of chemical structure prediction, enzymatic assembly-line visualisation and gene cluster regulation.


Subjects
Computers , Software , Bacteria/genetics , Bacteria/metabolism , Archaea/genetics , Microbial Genome , Multigene Family , Secondary Metabolism/genetics
8.
BMC Genomics ; 25(1): 786, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39138557

ABSTRACT

Biological networks serve a crucial role in elucidating intricate biological processes. While interspecies environmental interactions have been extensively studied, the exploration of gene interactions within species, particularly among individual microorganisms, is less developed. The increasing amount of microbiome genomic data necessitates a more nuanced analysis of microbial genome structures and functions. In this context, we introduce a complex structure using higher-order network theory, "Solid Motif Structures (SMS)", via a hierarchical biological network analysis of genomes within the same genus, effectively linking microbial genome structure with its function. Leveraging 162 high-quality genomes of Microcystis, a key freshwater cyanobacterium within microbial ecosystems, we established a genome structure network. Employing deep learning techniques, such as adaptive graph encoder, we uncovered 27 critical functional subnetworks and their associated SMSs. Incorporating metagenomic data from seven geographically distinct lakes, we conducted an investigation into Microcystis' functional stability under varying environmental conditions, unveiling unique functional interaction models for each lake. Our work compiles these insights into an extensive resource repository, providing novel perspectives on the functional dynamics within Microcystis. This research offers a hierarchical network analysis framework for understanding interactions between microbial genome structures and functions within the same genus.


Subjects
Bacterial Genome , Microcystis , Microcystis/genetics , Lakes/microbiology , Gene Regulatory Networks , Metagenomics/methods , Metagenome , Microbial Genome , Genomics/methods , Deep Learning
9.
BMC Genomics ; 25(1): 709, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39039439

ABSTRACT

Whole genome analysis for microbial genomics is critical to studying and monitoring antimicrobial resistance strains. The exponential growth of microbial sequencing data necessitates a fast and scalable computational pipeline to generate the desired outputs in a timely and cost-effective manner. Recent methods have been implemented to integrate individual genomes into large collections of specific bacterial populations and are widely employed for systematic genomic surveillance. However, they do not scale well when the population expands and turnaround time remains the main issue for this type of analysis. Here, we introduce AMRomics, an optimized microbial genomics pipeline that can work efficiently with big datasets. We use different bacterial data collections to compare AMRomics against competitive tools and show that our pipeline can generate similar results of interest but with better performance. The software is open source and is publicly available at https://github.com/amromics/amromics under an MIT license.


Subjects
Bacterial Genome , Genomics , Software , Workflow , Genomics/methods , Computational Biology/methods , Bacteria/genetics , Microbial Genome , Bacterial Drug Resistance/genetics
10.
BMC Genomics ; 25(1): 365, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622536

ABSTRACT

BACKGROUND: Microbial genomes are largely composed of protein coding sequences, yet some genomes contain many pseudogenes caused by frameshifts or internal stop codons. These pseudogenes are believed to result from gene degradation during evolution but could also be technical artifacts of genome sequencing or assembly. RESULTS: Using a combination of observational and experimental data, we show that many putative pseudogenes are attributable to errors that are incorporated into genomes during assembly. Within 126,564 publicly available genomes, we observed that nearly identical genomes often substantially differed in pseudogene counts. Causal inference implicated assembler, sequencing platform, and coverage as likely causative factors. Reassembly of genomes from raw reads confirmed that each variable affects the number of putative pseudogenes in an assembly. Furthermore, simulated sequencing reads corroborated our observations that the quality and quantity of raw data can significantly impact the number of pseudogenes in an assembler-dependent fashion. The number of unexpected pseudogenes due to internal stops was highly correlated (R² = 0.96) with average nucleotide identity to the ground truth genome, implying relative pseudogene counts can be used as a proxy for overall assembly correctness. Applying our method to assemblies in RefSeq resulted in rejection of 3.6% of assemblies due to significantly elevated pseudogene counts. Reassembly from real reads obtained from high coverage genomes showed considerable variability in spurious pseudogenes beyond that observed with simulated reads, reinforcing the finding that high coverage is necessary to mitigate assembly errors. CONCLUSIONS: Collectively, these results demonstrate that many pseudogenes in microbial genome assemblies are actually genes. Our results suggest that high read coverage is required for correct assembly and indicate that an inflated number of pseudogenes due to internal stops reflects poor overall assembly quality.
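
The notion of a putative pseudogene caused by an internal stop can be sketched as a simple codon scan. This is a minimal illustration, not the authors' method: it assumes an in-frame coding sequence on the forward strand, the three stop codons of the standard and bacterial genetic codes, and toy sequences invented for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed: no stop-codon reassignment


def internal_stop_count(cds: str) -> int:
    """Count in-frame stop codons in a coding sequence, excluding its final codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # The last codon is expected to be the legitimate terminating stop.
    return sum(codon in STOP_CODONS for codon in codons[:-1])


intact = "ATGGCTAAATGA"        # ATG GCT AAA TGA
disrupted = "ATGGCTTAAAAATGA"  # ATG GCT TAA AAA TGA, with a premature TAA
intact_stops = internal_stop_count(intact)
disrupted_stops = internal_stop_count(disrupted)
```

A single miscalled base during assembly is enough to turn a sense codon into one of these stops, which is why such counts can track assembly errors rather than real gene decay.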


Subjects
Bacterial Genome , Pseudogenes , Pseudogenes/genetics , Chromosome Mapping , Base Sequence , Microbial Genome , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods
11.
Bioinformatics ; 39(Suppl 1): i40-i46, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387149

ABSTRACT

Microbial natural products represent a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class that includes antibiotics, immunosuppressants, anticancer agents, toxins, siderophores, pigments, and cytostatics. The discovery of novel NRPs remains a laborious process because many NRPs consist of nonstandard amino acids that are assembled by nonribosomal peptide synthetases (NRPSs). Adenylation domains (A-domains) in NRPSs are responsible for selection and activation of monomers appearing in NRPs. During the past decade, several support vector machine-based algorithms have been developed for predicting the specificity of the monomers present in NRPs. These algorithms utilize physicochemical features of the amino acids present in the A-domains of NRPSs. In this article, we benchmarked the performance of various machine learning algorithms and features for predicting specificities of NRPSs and we showed that the extra trees model paired with one-hot encoding features outperforms the existing approaches. Moreover, we show that unsupervised clustering of 453 560 A-domains reveals many clusters that correspond to potentially novel amino acids. While it is challenging to predict the chemical structure of these amino acids, we developed novel techniques to predict their various properties, including polarity, hydrophobicity, charge, and presence of aromatic rings, carboxyl, and hydroxyl groups.
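
One-hot encoding, the feature representation named above, can be sketched in a few lines. The example assumes a short string of selected A-domain residues as input. The residue string, the alphabet ordering, and the function name are hypothetical, and the abstract does not say which positions the authors encode. The resulting vector is the kind of input a tree ensemble (for instance scikit-learn's `ExtraTreesClassifier`) could be trained on; that training step is not shown.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residues: str) -> List[int]:
    """Encode each residue as a 20-element indicator vector and concatenate them."""
    vector: List[int] = []
    for aa in residues.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:  # gaps or unknown symbols stay all-zero
            row[INDEX[aa]] = 1
        vector.extend(row)
    return vector


features = one_hot("DAW-")  # hypothetical 4-position signature with one gap
```

Unlike physicochemical descriptors, this encoding carries no prior about residue similarity, so any such relationships have to be learned by the model from the training data.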


Subjects
Amino Acids , Microbial Genome , Algorithms , Multigene Family , Peptides
12.
PLoS Comput Biol ; 19(4): e1010998, 2023 04.
Article in English | MEDLINE | ID: mdl-37014908

ABSTRACT

The increase in microbial sequenced genomes from pure cultures and metagenomic samples reflects the current attainability of whole-genome and shotgun sequencing methods. However, software for genome visualization still lacks automation, integration of different analyses, and customizable options for non-experienced users. In this study, we introduce GenoVi, a Python command-line tool able to create custom circular genome representations for the analysis and visualization of microbial genomes and sequence elements. It is designed to work with complete or draft genomes, featuring customizable options including 25 different built-in color palettes (including 5 color-blind safe palettes), text formatting options, and automatic scaling for complete genomes or sequence elements with more than one replicon/sequence. Using a GenBank format file as the input file or multiple files within a directory, GenoVi (i) visualizes genomic features from the GenBank annotation file, (ii) integrates a Clusters of Orthologous Groups (COG) categories analysis using DeepNOG, (iii) automatically scales the visualization of each replicon of complete genomes or multiple sequence elements, and (iv) generates COG histograms, COG frequency heatmaps and output tables including general stats of each replicon or contig processed. GenoVi's potential was assessed by analyzing single and multiple genomes of Bacteria and Archaea. Paraburkholderia genomes were analyzed to obtain a fast classification of replicons in large multipartite genomes. GenoVi works as an easy-to-use command-line tool and provides customizable options to automatically generate genomic maps for scientific publications, educational resources, and outreach activities. GenoVi is freely available and can be downloaded from https://github.com/robotoD/GenoVi.


Subjects
Archaea , Bacteria , Archaea/genetics , Bacteria/genetics , Genomics/methods , Software , Microbial Genome
13.
PLoS Comput Biol ; 19(6): e1011129, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37347768

ABSTRACT

The increasing availability of high-throughput sequencing (frequently termed next-generation sequencing (NGS)) data has created opportunities to gain deeper insights into the mechanisms of a number of diseases and is already impacting many areas of medicine and public health. The area of infectious diseases stands somewhat apart from other human diseases insofar as the relevant genomic data comes from the microbes rather than their human hosts. A particular concern about the threat of antimicrobial resistance (AMR) has driven the collection and reporting of large-scale datasets containing information from microbial genomes together with antimicrobial susceptibility test (AST) results. Unfortunately, the lack of clear standards or guiding principles for the reporting of such data is hampering the field's advancement. We therefore present our recommendations for the publication and sharing of genotype and phenotype data on AMR, in the form of 10 simple rules. The adoption of these recommendations will enhance AMR data interoperability and help enable its large-scale analyses using computational biology tools, including mathematical modelling and machine learning. We hope that these rules can shed light on often overlooked but nonetheless very necessary aspects of AMR data sharing and enhance the field's ability to address the problems of understanding AMR mechanisms, tracking their emergence and spread in populations, and predicting microbial susceptibility to antimicrobials for diagnostic purposes.


Subjects
Anti-Bacterial Agents , Anti-Infective Agents , Humans , Anti-Bacterial Agents/pharmacology , Bacterial Drug Resistance/genetics , Bacteria/genetics , Microbial Genome , Genotype , Phenotype
14.
Nucleic Acids Res ; 50(D1): D102-D105, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34751405

ABSTRACT

The Bioinformation and DDBJ (DNA Data Bank of Japan) Center (DDBJ Center; https://www.ddbj.nig.ac.jp) operates archival databases that collect nucleotide sequences, study and sample information, and distribute them without access restriction to progress life science research as a member of the International Nucleotide Sequence Database Collaboration (INSDC), in collaboration with the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute. Besides the INSDC databases, the DDBJ Center also provides the Genomic Expression Archive for functional genomics data and the Japanese Genotype-phenotype Archive for human data requiring controlled access. Additionally, the DDBJ Center started a new public repository, MetaboBank, for experimental raw data and metadata from metabolomics research in October 2020. In response to the COVID-19 pandemic, the DDBJ Center openly shares SARS-CoV-2 genome sequences in collaboration with Shizuoka Prefecture and Keio University. The operation of DDBJ is based on the National Institute of Genetics (NIG) supercomputer, which is open for large-scale sequence data analysis for life science researchers. This paper reports recent updates on the archival databases and the services of DDBJ.


Subjects
Genetic Databases , Nucleic Acid Databases , Microbial Genome , Japan , Metabolomics , SARS-CoV-2/genetics , Transcriptome
15.
Nucleic Acids Res ; 50(W1): W541-W550, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35639517

ABSTRACT

Most bacteria and archaea possess multiple antiviral defence systems that protect against infection by phages, archaeal viruses and mobile genetic elements. Our understanding of the diversity of defence systems has increased greatly in the last few years, and many more systems likely await discovery. To identify defence-related genes, we recently developed the Prokaryotic Antiviral Defence LOCator (PADLOC) bioinformatics tool. To increase the accessibility of PADLOC, we describe here the PADLOC web server (freely available at https://padloc.otago.ac.nz), allowing users to analyse whole genomes, metagenomic contigs, plasmids, phages and archaeal viruses. The web server includes a more than 5-fold increase in defence system types detected (since the first release) and expanded functionality enabling detection of CRISPR arrays and retron ncRNAs. Here, we provide user information such as input options, description of the multiple outputs, limitations and considerations for interpretation of the results, and guidance for subsequent analyses. The PADLOC web server also houses a precomputed database of the defence systems in > 230,000 RefSeq genomes. These data reveal two taxa, Campylobacterota and Spirochaetota, with unusual defence system diversity and abundance. Overall, the PADLOC web server provides a convenient and accessible resource for the detection of antiviral defence systems.


Subjects
Archaea , Bacteria , Microbial Genome , Genomics , Internet , Software , Archaea/genetics , Archaea/virology , Bacteria/genetics , Bacteria/virology , Bacteriophages/immunology , Microbial Genome/genetics , Plasmids/genetics , Prokaryotic Cells/metabolism , Prokaryotic Cells/virology , Computers , Genomics/methods
16.
Nucleic Acids Res ; 50(D1): D1-D10, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34986604

ABSTRACT

The 2022 Nucleic Acids Research Database Issue contains 185 papers, including 87 papers reporting on new databases and 85 updates from resources previously published in the Issue. Thirteen additional manuscripts provide updates on databases most recently published elsewhere. Seven new databases focus specifically on COVID-19 and SARS-CoV-2, including SCoV2-MD, the first of the Issue's Breakthrough Articles. Major nucleic acid databases reporting updates include MODOMICS, JASPAR and miRTarBase. The AlphaFold Protein Structure Database, described in the second Breakthrough Article, is the stand-out in the protein section, where the Human Proteoform Atlas and GproteinDb are other notable new arrivals. Updates from DisProt, FuzDB and ELM comprehensively cover disordered proteins. Under the metabolism and signalling section Reactome, ConsensusPathDB, HMDB and CAZy are major returning resources. In microbial and viral genomes taxonomy and systematics are well covered by LPSN, TYGS and GTDB. Genomics resources include Ensembl, Ensembl Genomes and UCSC Genome Browser. Major returning pharmacology resource names include the IUPHAR/BPS guide and the Therapeutic Target Database. New plant databases include PlantGSAD for gene lists and qPTMplants for post-translational modifications. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). Our latest update to the NAR online Molecular Biology Database Collection brings the total number of entries to 1645. Following last year's major cleanup, we have updated 317 entries, listing 89 new resources and trimming 80 discontinued URLs. The current release is available at http://www.oxfordjournals.org/nar/database/c/.


Subjects
Factual Databases , Molecular Biology , Animals , COVID-19 , Nucleic Acid Databases , Protein Databases , Microbial Genome , Viral Genome , Humans , Mice , Plants/genetics , Post-Translational Protein Processing , Proteome , SARS-CoV-2/genetics , Signal Transduction
17.
Proc Natl Acad Sci U S A ; 118(14)2021 04 06.
Article in English | MEDLINE | ID: mdl-33737447

ABSTRACT

When addressing a genomic question, having a reliable and adequate reference genome is of utmost importance. This drives the necessity to refine and customize reference genomes (RGs). Our laboratory has recently developed a strategy, the Perfect Match Genomic Landscape (PMGL), to detect variation between genomes [K. Palacios-Flores et al., Genetics 208, 1631-1641 (2018)]. The PMGL is precise and sensitive and, in contrast to most currently used algorithms, is nonstatistical in nature. Here we demonstrate the power of PMGL to refine and customize RGs. As a proof-of-concept, we refined different versions of the Saccharomyces cerevisiae RG. We applied the automatic PMGL pipeline to refine the genomes of microorganisms belonging to the three domains of life: the archaea Methanococcus maripaludis and Pyrococcus furiosus; the bacteria Escherichia coli, Staphylococcus aureus, and Bacillus subtilis; and the eukarya Schizosaccharomyces pombe, Aspergillus oryzae, and several strains of Saccharomyces paradoxus. We analyzed the reference genome of the virus SARS-CoV-2 and previously published viral genomes from patients' samples with COVID-19. We performed a mutation-accumulation experiment in E. coli and show that the PMGL strategy can detect specific mutations generated at any desired step of the whole procedure. We propose that PMGL can be used as a final step for the refinement and customization of any haploid genome, independently of the strategies and algorithms used in its assembly.


Subjects
Genetic Variation , Microbial Genome , Genomics/methods , SARS-CoV-2/genetics , Algorithms , Mutation Accumulation , Proof of Concept Study , Saccharomyces cerevisiae/genetics
18.
BMC Bioinformatics ; 24(1): 128, 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37016282

ABSTRACT

BACKGROUND: Concentrations of DNA from pathogenic microorganisms in biological samples are typically low. Therefore, DNA diagnostics of common infections are costly, rarely accurate, and challenging. Limited by failing to cover updated epidemic testing samples, computational services are difficult to implement in clinical applications without complex customized settings. Furthermore, the combined biomarkers used to maintain high conservation may not be cost effective and could cause several experimental errors in many clinical settings. Given the limitations of recently developed technology, 16S rRNA is too conserved to distinguish closely related species, and mosaic plasmids are not effective either because of their uneven distribution across prokaryotic taxa. RESULTS: Here, we provide a computational strategy, Shine, that allows extraction of specific, sensitive and well-conserved biomarkers from massive microbial genomic datasets. Unlike simple concatenation with BLAST-based filtering, our method involves a de novo genome alignment-based pipeline to explore the original and specific repetitive biomarkers in the defined population. It can cover all members to detect newly discovered multicopy conserved species-specific or even subspecies-specific target probes and primer sets. The method has been successfully applied to a number of clinical projects and has the overwhelming advantages of automated detection of all pathogenic microorganisms without the limitations of genome annotation and incompletely assembled motifs. Using our pipeline, users may select different configuration parameters depending on the purpose of the project for routine clinical detection practices on the website https://bioinfo.liferiver.com.cn with easy registration. CONCLUSIONS: The proposed strategy is suitable for identifying shared phylogenetic markers while featuring low false-positive and false-negative rates. This technology is suitable for the automatic design of minimal and efficient PCR primers and other types of detection probes.


Subjects
DNA , Microbial Genome , Phylogeny , 16S Ribosomal RNA , Genomics , Biomarkers
19.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33758906

ABSTRACT

Recent advances in high-throughput sequencing technologies and computational methods have added a new dimension to metagenomic data analysis, i.e., genome-resolved metagenomics. In general terms, it refers to the recovery of draft or high-quality microbial genomes and their taxonomic classification and functional annotation. In recent years, several studies have utilized the genome-resolved metagenome analysis approach and identified previously unknown microbial species from human and environmental metagenomes. In this review, we describe genome-resolved metagenome analysis as a series of four necessary steps: (i) preprocessing of the sequencing reads, (ii) de novo metagenome assembly, (iii) genome binning and (iv) taxonomic and functional analysis of the recovered genomes. For each of these four steps, we discuss the most commonly used tools and the currently available pipelines to guide the scientific community in the recovery and subsequent analyses of genomes from any metagenome sample. Furthermore, we also discuss the tools required for validation of assembly quality as well as for improving quality of the recovered genomes. We also highlight the currently available pipelines that can be used to automate the whole analysis without requiring advanced bioinformatics knowledge. Finally, we will highlight the most widely adopted and actively maintained tools and pipelines that can be helpful to the scientific community in decision making before they commence the analysis.
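
Assembly quality validation, mentioned above, often starts from simple contiguity statistics. The sketch below computes N50, the contig length at which the longest contigs together reach at least half of the total assembly length. The abstract does not name this metric, so its choice here is an assumption, and the contig lengths are made up for illustration.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the total defines N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


toy_assembly = [2, 2, 2, 3, 3, 4, 8, 8]  # hypothetical contig lengths, total 32
toy_n50 = n50(toy_assembly)
```

N50 describes contiguity only. A highly contiguous assembly can still be chimeric or incomplete, which is why it is usually read alongside completeness and contamination estimates.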


Subjects
Taxonomic DNA Barcoding/methods , Microbial Genome , Metagenome , Metagenomics/methods , Microbiota/genetics , Feces/microbiology , Genitalia/microbiology , High-Throughput Nucleotide Sequencing , Humans , Mouth/microbiology , DNA Sequence Analysis , Skin/microbiology , Soil Microbiology , Water Microbiology
20.
Bioinformatics ; 38(19): 4481-4487, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35972375

ABSTRACT

MOTIVATION: Despite recent advancements in sequencing technologies and assembly methods, obtaining high-quality microbial genomes from metagenomic samples is still not a trivial task. Current metagenomic binners do not take full advantage of assembly graphs and are not optimized for long-read assemblies. Deep graph learning algorithms have been proposed in other fields to deal with complex graph data structures. The graph structure generated during the assembly process could be integrated with contig features to obtain better bins with deep learning. RESULTS: We propose GraphMB, which uses graph neural networks to incorporate the assembly graph into the binning process. We test GraphMB on long-read datasets of different complexities, and compare the performance with other binners in terms of the number of High Quality (HQ) genome bins obtained. With our approach, we were able to obtain unique bins on all real datasets, and obtain more bins on most datasets. In particular, we obtained on average 17.5% more HQ bins when compared with state-of-the-art binners and 13.7% when aggregating the results of our binner with the others. These results indicate that a deep learning model can integrate contig-specific and graph-structure information to improve metagenomic binning. AVAILABILITY AND IMPLEMENTATION: GraphMB is available from https://github.com/MicrobialDarkMatter/GraphMB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Metagenome , Metagenomics , DNA Sequence Analysis/methods , Metagenomics/methods , Microbial Genome , Algorithms