1.
Nat Commun ; 15(1): 5356, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38918378

ABSTRACT

Type 1 polyketides are a major class of natural products used as antiviral, antibiotic, antifungal, antiparasitic, immunosuppressive, and antitumor drugs. Analysis of public microbial genomes leads to the discovery of over sixty thousand type 1 polyketide gene clusters. However, the molecular products of only about a hundred of these clusters are characterized, leaving most metabolites unknown. Characterizing polyketides relies on bioactivity-guided purification, which is expensive and time-consuming. To address this, we present Seq2PKS, a machine learning algorithm that predicts chemical structures derived from Type 1 polyketide synthases. Seq2PKS predicts numerous putative structures for each gene cluster to enhance accuracy. The correct structure is identified using a variable mass spectral database search. Benchmarks show that Seq2PKS outperforms existing methods. Applying Seq2PKS to Actinobacteria datasets, we discover biosynthetic gene clusters for monazomycin, oasomycin A, and 2-aminobenzamide-actiphenol.


Subject(s)
Mass Spectrometry , Multigene Family , Polyketide Synthases , Polyketides , Polyketides/metabolism , Polyketides/chemistry , Polyketide Synthases/genetics , Polyketide Synthases/metabolism , Mass Spectrometry/methods , Data Mining/methods , Machine Learning , Actinobacteria/genetics , Actinobacteria/metabolism , Genome, Bacterial , Algorithms , Biological Products/chemistry , Biological Products/metabolism
2.
Nat Biotechnol ; 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38168990

ABSTRACT

The throughput of mass spectrometers and the amount of publicly available metabolomics data are growing rapidly, but analysis tools such as molecular networking and Mass Spectrometry Search Tool do not scale to searching and clustering billions of mass spectral data in metabolomics repositories. To address this limitation, we designed MASST+ and Networking+, which can process datasets that are up to three orders of magnitude larger than those processed by state-of-the-art tools.

3.
Nat Commun ; 14(1): 4219, 2023 07 14.
Article in English | MEDLINE | ID: mdl-37452020

ABSTRACT

Recent analyses of public microbial genomes have found over a million biosynthetic gene clusters, the natural products of the majority of which remain unknown. Additionally, GNPS harbors billions of mass spectra of natural products without known structures and biosynthetic genes. We bridge the gap between large-scale genome mining and mass spectral datasets for natural product discovery by developing HypoRiPPAtlas, an Atlas of hypothetical natural product structures, which is ready-to-use for in silico database search of tandem mass spectra. HypoRiPPAtlas is constructed by mining genomes using seq2ripp, a machine-learning tool for the prediction of ribosomally synthesized and post-translationally modified peptides (RiPPs). In HypoRiPPAtlas, we identify RiPPs in microbes and plants. HypoRiPPAtlas could be extended to other natural product classes in the future by implementing corresponding biosynthetic logic. This study paves the way for large-scale explorations of biosynthetic pathways and chemical structures of microbial and plant RiPP classes.


Subject(s)
Biological Products , Ribosomes , Ribosomes/metabolism , Biological Products/chemistry , Peptides/chemistry , Databases, Factual , Tandem Mass Spectrometry , Protein Processing, Post-Translational
4.
Bioinformatics ; 39(39 Suppl 1): i40-i46, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387149

ABSTRACT

Microbial natural products represent a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class that includes antibiotics, immunosuppressants, anticancer agents, toxins, siderophores, pigments, and cytostatics. The discovery of novel NRPs remains a laborious process because many NRPs consist of nonstandard amino acids that are assembled by nonribosomal peptide synthetases (NRPSs). Adenylation domains (A-domains) in NRPSs are responsible for selection and activation of monomers appearing in NRPs. During the past decade, several support vector machine-based algorithms have been developed for predicting the specificity of the monomers present in NRPs. These algorithms utilize physicochemical features of the amino acids present in the A-domains of NRPSs. In this article, we benchmarked the performance of various machine learning algorithms and features for predicting specificities of NRPSs and we showed that the extra trees model paired with one-hot encoding features outperforms the existing approaches. Moreover, we show that unsupervised clustering of 453,560 A-domains reveals many clusters that correspond to potentially novel amino acids. While it is challenging to predict the chemical structure of these amino acids, we developed novel techniques to predict their various properties, including polarity, hydrophobicity, charge, and presence of aromatic rings, carboxyl, and hydroxyl groups.


Subject(s)
Amino Acids , Genome, Microbial , Algorithms , Multigene Family , Peptides
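
The one-hot encoding named in the abstract above can be illustrated with a short sketch. This is not the authors' code: the 20-letter alphabet constant, the function name `one_hot_encode`, and the decision to leave gaps or unknown symbols as all-zero columns are assumptions made only for this example. The flat 0/1 vector it returns is the kind of input a tree-based classifier such as extra trees could be trained on.

```python
# Hypothetical illustration of one-hot encoding for an A-domain residue signature.
# The alphabet and the handling of gaps are assumptions, not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(signature):
    """Return a flat 0/1 list with 20 entries per residue in the signature."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    features = []
    for residue in signature:
        column = [0] * len(AMINO_ACIDS)
        if residue in index:  # gaps and unknown symbols stay all-zero
            column[index[residue]] = 1
        features.extend(column)
    return features


if __name__ == "__main__":
    vector = one_hot_encode("AC-")
    print(len(vector), sum(vector))
```

A signature of n residues always yields 20n features, so signatures must share one length before they can be stacked into a training matrix.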
5.
Sci Rep ; 13(1): 7285, 2023 05 04.
Article in English | MEDLINE | ID: mdl-37142645

ABSTRACT

Finding alignments between millions of reads and genome sequences is crucial in computational biology. Since the standard alignment algorithm has a large computational cost, heuristics have been developed to speed up this task. Though orders of magnitude faster, these methods lack theoretical guarantees and often have low sensitivity, especially when reads have many insertions, deletions, and mismatches relative to the genome. Here we develop a theoretically principled and efficient algorithm that has high sensitivity across a wide range of insertion, deletion, and mutation rates. We frame sequence alignment as an inference problem in a probabilistic model. Given a reference database of reads and a query read, we find the match that maximizes a log-likelihood ratio of a reference read and query read being generated jointly from a probabilistic model versus independent models. The brute force solution to this problem computes joint and independent probabilities between each query and reference pair, and its complexity grows linearly with database size. We introduce a bucketing strategy where reads with higher log-likelihood ratio are mapped to the same bucket with high probability. Experimental results show that our method is more accurate than the state-of-the-art approaches in aligning long reads from Pacific Biosciences sequencers to genome sequences.


Subject(s)
Algorithms , Genome , Sequence Alignment , Computational Biology/methods , Probability , Sequence Analysis, DNA/methods , Software , High-Throughput Nucleotide Sequencing
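
The brute-force search described in the abstract above (score every reference against the query by a log-likelihood ratio and keep the maximum) can be sketched under a deliberately simplified model. Everything specific here is an assumption for illustration: equal-length sequences, substitutions only (no insertions or deletions), a uniform background over `ACGT`, and a hypothetical `match_prob` of 0.9. The paper's actual probabilistic model and its bucketing strategy are not reproduced.

```python
import math

ALPHABET = "ACGT"  # assumed alphabet for this toy example


def log_likelihood_ratio(query, reference, match_prob=0.9):
    """log P(query, reference | joint) - log P(query) P(reference).

    Toy model: each reference base is uniform; under the joint model the query
    base copies it with probability match_prob, otherwise it is one of the
    other three bases uniformly. Under the independent model both are uniform.
    """
    if len(query) != len(reference):
        raise ValueError("this sketch only handles equal-length sequences")
    background = 1.0 / len(ALPHABET)
    mismatch_prob = (1.0 - match_prob) / (len(ALPHABET) - 1)
    score = 0.0
    for q, r in zip(query, reference):
        joint = background * (match_prob if q == r else mismatch_prob)
        independent = background * background
        score += math.log(joint / independent)
    return score


def best_match(query, references):
    """Brute force: one score per reference, so cost grows linearly with the database."""
    return max(references, key=lambda ref: log_likelihood_ratio(query, ref))


if __name__ == "__main__":
    print(best_match("ACGT", ["ACGA", "TTTT", "ACGT"]))
```

The linear scan in `best_match` is the cost that a bucketing scheme would aim to avoid by comparing the query only against references that land in the same bucket.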
6.
Sci Rep ; 12(1): 10342, 2022 06 20.
Article in English | MEDLINE | ID: mdl-35725893

ABSTRACT

As antibiotic resistance is becoming a major public health problem worldwide, one of the approaches for novel antibiotic discovery is re-purposing drugs available on the market for treating antibiotic resistant bacteria. The main economic advantage of this approach is that since these drugs have already passed all the safety tests, it vastly reduces the overall cost of clinical trials. Recently, several machine learning approaches have been developed for predicting promising antibiotics by training on bioactivity data collected on a set of small molecules. However, these methods report hundreds to thousands of bioactive molecules, and it remains unclear which of these molecules possess a novel mechanism of action. While the cost of high-throughput bioactivity testing has dropped dramatically in recent years, determining the mechanism of action of small molecules remains a costly and time-consuming step, and therefore computational methods for prioritizing molecules with novel mechanisms of action are needed. The existing approaches for predicting bioactivity of small molecules are based on uninterpretable machine learning, and therefore are not capable of determining known mechanisms of action of small molecules and prioritizing novel mechanisms. We introduce InterPred, an interpretable technique for predicting bioactivity of small molecules and their mechanism of action. InterPred has the same accuracy as the state of the art in bioactivity prediction, and it enables assigning chemical moieties that are responsible for bioactivity. After analyzing bioactivity data of several thousand molecules against bacterial and fungal pathogens available from the Community for Open Antimicrobial Drug Discovery and a US Food and Drug Administration-approved drug library, InterPred identified five known links between moieties and mechanism of action.


Subject(s)
Anti-Bacterial Agents , Anti-Infective Agents , Anti-Bacterial Agents/chemistry , Anti-Bacterial Agents/pharmacology , Bacteria , Drug Discovery/methods , Machine Learning
7.
Metabolites ; 12(2), 2022 Jan 26.
Article in English | MEDLINE | ID: mdl-35208194

ABSTRACT

The human microbiome is a complex community of microorganisms, their enzymes, and the molecules they produce or modify. Recent studies show that imbalances in human microbial ecosystems can cause disease. Our microbiome affects our health through the products of biochemical reactions catalyzed by microbial enzymes (microbial biotransformations). Despite their significance, currently, there are no systematic strategies for identifying these chemical reactions, their substrates and molecular products, and their effects on health and disease. We present TransDiscovery, a computational algorithm that integrates molecular networks (connecting related molecules with similar mass spectra), association networks (connecting co-occurring molecules and microbes) and knowledge bases of microbial enzymes to discover microbial biotransformations, their substrates, and their products. After searching the metabolomics and metagenomics data from the American Gut Project and the Global Foodomics Project, TransDiscovery identified 17 potentially novel biotransformations from the human gut microbiome, along with the corresponding microbial species, substrates, and products.

8.
Metabolites ; 11(10), 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-34677408

ABSTRACT

Microbial natural products are a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class of natural products that include antibiotics, immunosuppressants, and anticancer agents. Recent breakthroughs in natural product discovery have revealed the chemical structure of several thousand NRPs. However, biosynthetic gene clusters (BGCs) encoding them are known only for a few hundred compounds. Here, we developed Nerpa, a computational method for the high-throughput discovery of novel BGCs responsible for producing known NRPs. After searching 13,399 representative bacterial genomes from the RefSeq repository against 8368 known NRPs, Nerpa linked 117 BGCs to their products. We further experimentally validated the predicted BGC of ngercheumicin from Photobacterium galatheae via mass spectrometry. Nerpa supports searching new genomes against thousands of known NRP structures, and novel molecular structures against tens of thousands of bacterial genomes. The availability of these tools can enhance our understanding of NRP synthesis and the function of their biosynthetic enzymes.

10.
Bioinformatics ; 37(Suppl_1): i231-i236, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252948

ABSTRACT

MOTIVATION: Untargeted mass spectrometry experiments enable the profiling of metabolites in complex biological samples. The collected fragmentation spectra are the metabolite's fingerprints that are used for molecule identification and discovery. Two main mass spectrometry strategies exist for the collection of fragmentation spectra: data-dependent acquisition (DDA) and data-independent acquisition (DIA). In the DIA strategy, all the metabolite ions in predefined mass-to-charge ratio ranges are co-isolated and co-fragmented, resulting in multiplexed fragmentation spectra that are challenging to annotate. In contrast, in the DDA strategy, fragmentation spectra are dynamically and specifically collected for the most abundant ions observed, causing redundancy and sub-optimal fragmentation spectra collection. Yet, DDA results in less multiplexed fragmentation spectra that can be readily annotated. RESULTS: We introduce the MS2Planner workflow, an Iterative Optimized Data Acquisition strategy that optimizes the number of high-quality fragmentation spectra over multiple experimental acquisitions using topological sorting. Our results showed that MS2Planner increases the annotation rate by 38.6% and is 62.5% more sensitive and 9.4% more specific compared to DDA. AVAILABILITY AND IMPLEMENTATION: MS2Planner code is available at https://github.com/mohimanilab/MS2Planner. The generation of the inclusion list from MS2Planner was performed with Python scripts available at https://github.com/lfnothias/IODA_MS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Mass Spectrometry , Ions , Workflow
11.
Nat Commun ; 12(1): 3718, 2021 06 17.
Article in English | MEDLINE | ID: mdl-34140479

ABSTRACT

Identification of small molecules is a critical task in various areas of life science. Recent advances in mass spectrometry have enabled the collection of tandem mass spectra of small molecules from hundreds of thousands of environments. To identify which molecules are present in a sample, one can search mass spectra collected from the sample against millions of molecular structures in small molecule databases. The existing approaches are based on chemistry domain knowledge, and they fail to explain many of the peaks in mass spectra of small molecules. Here, we present molDiscovery, a mass spectral database search method that improves both efficiency and accuracy of small molecule identification by learning a probabilistic model to match small molecules with their mass spectra. A search of over 8 million spectra from the Global Natural Products Social molecular networking infrastructure shows that molDiscovery correctly identifies six times more unique small molecules than previous methods.


Subject(s)
High-Throughput Screening Assays/methods , Metabolomics/methods , Small Molecule Libraries/analysis , Tandem Mass Spectrometry/methods , Algorithms , Bacteria/isolation & purification , Bacteria/metabolism , Benchmarking , Computer Simulation , Databases, Chemical , Humans , Lipids/isolation & purification , Models, Statistical , Plants/metabolism , Secondary Metabolism , Software
12.
Nat Commun ; 12(1): 3225, 2021 05 28.
Article in English | MEDLINE | ID: mdl-34050176

ABSTRACT

Non-Ribosomal Peptides (NRPs) represent a biomedically important class of natural products that include a multitude of antibiotics and other clinically used drugs. NRPs are not directly encoded in the genome but are instead produced by metabolic pathways encoded by biosynthetic gene clusters (BGCs). Since the existing genome mining tools predict many putative NRPs synthesized by a given BGC, it remains unclear which of these putative NRPs are correct and how to identify post-assembly modifications of amino acids in these NRPs in a blind mode, without knowing which modifications exist in the sample. To address this challenge, here we report NRPminer, a modification-tolerant tool for NRP discovery from large (meta)genomic and mass spectrometry datasets. We show that NRPminer is able to identify many NRPs from different environments, including four previously unreported NRP families from soil-associated microbes and NRPs from human microbiota. Furthermore, in this work we demonstrate the anti-parasitic activities and the structure of two of these NRP families using direct bioactivity screening and nuclear magnetic resonance spectrometry, illustrating the power of NRPminer for discovering bioactive NRPs.


Subject(s)
Anti-Bacterial Agents/isolation & purification , Biological Products/isolation & purification , Computational Biology/methods , Drug Discovery/methods , Peptides/isolation & purification , Algorithms , Amino Acid Sequence/genetics , Anti-Bacterial Agents/biosynthesis , Biological Products/metabolism , Datasets as Topic , Humans , Mass Spectrometry , Metabolic Networks and Pathways/genetics , Metabolomics/methods , Metagenomics/methods , Microbiota/genetics , Multigene Family , Peptide Biosynthesis , Peptide Synthases/genetics , Peptide Synthases/metabolism , Peptides/genetics , Peptides/metabolism , Soil Microbiology
13.
Sci Rep ; 11(1): 8314, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33859284

ABSTRACT

Various studies have shown associations between molecular features and phenotypes of biological samples. These studies, however, focus on a single phenotype per study and are not applicable to repository scale metabolomics data. Here we report MetSummarizer, a method for predicting (i) the biological phenotypes of environmental and host-oriented samples, and (ii) the raw ingredient composition of complex mixtures. We show that the aggregation of various metabolomic datasets can improve the accuracy of predictions. Since these datasets have been collected using different standards at various laboratories, in order to get unbiased results it is crucial to detect and discard standard-specific features during the classification step. We further report high accuracy in prediction of the raw ingredient composition of complex foods from the Global Foodomics Project.


Subject(s)
Datasets as Topic , Food Analysis , Metabolomics , Tandem Mass Spectrometry , Forecasting , Sensitivity and Specificity
14.
Nat Methods ; 17(9): 905-908, 2020 09.
Article in English | MEDLINE | ID: mdl-32839597

ABSTRACT

Molecular networking has become a key method to visualize and annotate the chemical space in non-targeted mass spectrometry data. We present feature-based molecular networking (FBMN) as an analysis method in the Global Natural Products Social Molecular Networking (GNPS) infrastructure that builds on chromatographic feature detection and alignment tools. FBMN enables quantitative analysis and resolution of isomers, including from ion mobility spectrometry.


Subject(s)
Biological Products/chemistry , Mass Spectrometry , Computational Biology/methods , Databases, Factual , Metabolomics/methods , Software
15.
Chem Soc Rev ; 49(11): 3297-3314, 2020 Jun 07.
Article in English | MEDLINE | ID: mdl-32393943

ABSTRACT

Microbial and plant specialized metabolites constitute an immense chemical diversity, and play key roles in mediating ecological interactions between organisms. Also referred to as natural products, they have been widely applied in medicine, agriculture, cosmetic and food industries. Traditionally, the main discovery strategies have centered around the use of activity-guided fractionation of metabolite extracts. Increasingly, omics data is being used to complement this, as it has the potential to reduce rediscovery rates, guide experimental work towards the most promising metabolites, and identify enzymatic pathways that enable their biosynthetic production. In recent years, genomic and metabolomic analyses of specialized metabolic diversity have been scaled up to study thousands of samples simultaneously. Here, we survey data analysis technologies that facilitate the effective exploration of large genomic and metabolomic datasets, and discuss various emerging strategies to integrate these two types of omics data in order to further accelerate discovery.


Subject(s)
Bacteria/metabolism , Biological Products/chemistry , Fungi/metabolism , Genomics/methods , Metabolomics/methods , Plants/metabolism , Biosynthetic Pathways , Computational Biology , Computer Simulation , Data Mining , Databases, Genetic , Drug Discovery , High-Throughput Screening Assays , Humans , Secondary Metabolism
16.
Cell Syst ; 10(1): 99-108.e5, 2020 01 22.
Article in English | MEDLINE | ID: mdl-31864964

ABSTRACT

Cyclic and branch-cyclic peptides (cyclopeptides) represent a class of bioactive natural products that include many antibiotics and anti-tumor compounds. Despite the recent advances in metabolomics analysis, still little is known about the cyclopeptides in the human gut and their possible interactions due to a lack of computational analysis pipelines that are applicable to such compounds. Here, we introduce CycloNovo, an algorithm for automated de novo cyclopeptide analysis and sequencing that employs de Bruijn graphs, the workhorse of DNA sequencing algorithms, to identify cyclopeptides in spectral datasets. CycloNovo reconstructed 32 previously unreported cyclopeptides (to the best of our knowledge) in the human gut and reported over a hundred cyclopeptides in other environments represented by various spectra on Global Natural Products Social Molecular Network (GNPS). The software is available at https://github.com/bbehsaz/cyclonovo.


Subject(s)
Amino Acid Sequence/genetics , Gastrointestinal Microbiome/genetics , Peptides, Cyclic/chemistry , Humans , Mass Spectrometry
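
For readers unfamiliar with the de Bruijn graphs mentioned in the abstract above, a generic construction is sketched here. It assumes the k-mers are already available as plain strings; how CycloNovo derives amino acid k-mers from spectra is not described in the abstract and is not attempted. The function name `de_bruijn_graph` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(kmers):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer is one edge.

    Every k-mer contributes an edge from its prefix (all but the last symbol)
    to its suffix (all but the first symbol).
    """
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # The four 3-mers of the cyclic string ABCD form a single closed cycle.
    for node, successors in de_bruijn_graph(["ABC", "BCD", "CDA", "DAB"]).items():
        print(node, "->", successors)
```

For a cyclic sequence the k-mers wrap around, so the graph closes into a cycle, and reading the sequence back amounts to walking that cycle.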
17.
Cell Syst ; 9(6): 600-608.e4, 2019 12 18.
Article in English | MEDLINE | ID: mdl-31629686

ABSTRACT

Ribosomally synthesized and post-translationally modified peptides (RiPPs) are an important class of natural products that contain antibiotics and a variety of other bioactive compounds. The existing methods for discovery of RiPPs by combining genome mining and computational mass spectrometry are limited to discovering specific classes of RiPPs from small datasets, and these methods fail to handle unknown post-translational modifications. Here, we present MetaMiner, a software tool for addressing these challenges that is compatible with large-scale screening platforms for natural product discovery. After searching millions of spectra in the Global Natural Products Social (GNPS) molecular networking infrastructure against just eight genomic and metagenomic datasets, MetaMiner discovered 31 known and seven unknown RiPPs from diverse microbial communities, including human microbiome and lichen microbiome, and microorganisms isolated from the International Space Station.


Subject(s)
Computational Biology/methods , Microbiota/genetics , Protein Processing, Post-Translational/genetics , Genomics/methods , Humans , Peptides/chemistry , Ribosomes/genetics , Software
18.
mSystems ; 4(4), 2019 Aug 27.
Article in English | MEDLINE | ID: mdl-31455639

ABSTRACT

The human microbiome consists of thousands of different microbial species, and tens of thousands of bioactive small molecules are associated with them. These associated molecules include the biosynthetic products of microbiota and the products of microbial transformation of host molecules, dietary components, and pharmaceuticals. The existing methods for characterization of these small molecules are currently time consuming and expensive, and they are limited to the cultivable bacteria. Here, we propose a method for detecting microbiota-associated small molecules based on the patterns of cooccurrence of molecular and microbial features across multiple microbiomes. We further map each molecule to the clade in a phylogenetic tree that is responsible for its production/transformation. We applied our proposed method to the tandem mass spectrometry and metagenomics data sets collected by the American Gut Project and to microbiome isolates from cystic fibrosis patients and discovered the genes in the human microbiome responsible for the production of corynomycolenic acid, which serves as a ligand for human T cells and induces a specific immune response against infection. Moreover, our method correctly associated pseudomonas quinolone signals, tyrvalin, and phevalin with their known biosynthetic gene clusters. IMPORTANCE: Experimental advances have enabled the acquisition of tandem mass spectrometry and metagenomics sequencing data from tens of thousands of environmental/host-oriented microbial communities. Each of these communities contains hundreds of microbial features (corresponding to microbial species) and thousands of molecular features (corresponding to microbial natural products). However, with the current technology, it is very difficult to identify the microbial species responsible for the production/biotransformation of each molecular feature. Here, we develop association networks, a new approach for identifying the microbial producer/biotransformer of natural products through cooccurrence analysis of metagenomics and mass spectrometry data collected on multiple microbiomes.
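
The cooccurrence idea in the abstract above can be made concrete with a small sketch. The abstract does not say which association statistic the authors use, so the Jaccard index below is an assumption chosen for simplicity, and the names `jaccard` and `rank_associations` as well as the sample identifiers are hypothetical.

```python
def jaccard(samples_a, samples_b):
    """Jaccard index of two sets of sample identifiers (0.0 if both are empty)."""
    a, b = set(samples_a), set(samples_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_associations(molecule_samples, microbe_samples):
    """Score every molecule/microbe pair by how often they occur in the same samples.

    Both arguments map a feature name to the set of samples where it was detected.
    Returns (score, molecule, microbe) tuples, strongest association first.
    """
    pairs = [
        (jaccard(mol_set, mic_set), molecule, microbe)
        for molecule, mol_set in molecule_samples.items()
        for microbe, mic_set in microbe_samples.items()
    ]
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    molecules = {"mol1": {"s1", "s2", "s3"}}
    microbes = {"micA": {"s1", "s2", "s3"}, "micB": {"s4"}}
    for score, molecule, microbe in rank_associations(molecules, microbes):
        print(molecule, microbe, score)
```

A pair detected in exactly the same samples scores 1.0 and a pair that never shares a sample scores 0.0; a real analysis would also need a significance test, which this sketch omits.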

19.
Genome Res ; 29(8): 1352-1362, 2019 08.
Article in English | MEDLINE | ID: mdl-31160374

ABSTRACT

Predicting biosynthetic gene clusters (BGCs) is critically important for discovery of antibiotics and other natural products. While BGC prediction from complete genomes is a well-studied problem, predicting BGCs in fragmented genomic assemblies remains challenging. The existing BGC prediction tools often assume that each BGC is encoded within a single contig in the genome assembly, a condition that is violated for most sequenced microbial genomes where BGCs are often scattered through several contigs, making it difficult to reconstruct them. The situation is even more severe in shotgun metagenomics, where the contigs are often short, and the existing tools fail to predict a large fraction of long BGCs. While it is difficult to assemble BGCs in a single contig, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding long BGCs. We describe biosyntheticSPAdes, a tool for predicting BGCs in assembly graphs and demonstrate that it greatly improves the reconstruction of BGCs from genomic and metagenomics data sets.


Subject(s)
Genes, Bacterial , Metagenome , Metagenomics/methods , Multigene Family , Software , Contig Mapping , Datasets as Topic , Dental Plaque/microbiology , Gingiva/microbiology , Humans , Internet , Mouth Mucosa/microbiology , Pharynx/microbiology , Protein Biosynthesis , Tongue/microbiology
20.
Nat Commun ; 9(1): 4035, 2018 10 02.
Article in English | MEDLINE | ID: mdl-30279420

ABSTRACT

Natural products have traditionally been rich sources for drug discovery. In order to clear the road toward the discovery of unknown natural products, biologists need dereplication strategies that identify known ones. Here we report DEREPLICATOR+, an algorithm that improves on the previous approaches for identifying peptidic natural products, and extends them for identification of polyketides, terpenes, benzenoids, alkaloids, flavonoids, and other classes of natural products. We show that DEREPLICATOR+ can search all spectra in the recently launched Global Natural Products Social molecular network and identify an order of magnitude more natural products than previous dereplication efforts. We further demonstrate that DEREPLICATOR+ enables cross-validation of genome-mining and peptidogenomics/glycogenomics results.


Subject(s)
Biological Products/analysis , Drug Discovery/methods , Mass Spectrometry , Actinomyces/chemistry , Algorithms , Cyanobacteria/chemistry , Genomics , Macrolides/analysis , Software