Search | BVS Regional Portal

MS2Query: reliable and scalable MS² mass spectra-based analogue search.

de Jonge, Niek F; Louwen, Joris J R; Chekmeneva, Elena; Camuzeaux, Stephane; Vermeir, Femke J; Jansen, Robert S; Huber, Florian; van der Hooft, Justin J J.

Nat Commun ; 14(1): 1752, 2023 03 29.

Article in English | MEDLINE | ID: mdl-36990978

ABSTRACT

Metabolomics-driven discoveries of biological samples remain hampered by the grand challenge of metabolite annotation and identification. Only a few metabolites have an annotated spectrum in spectral libraries; hence, searching only for exact library matches generally returns few hits. An attractive alternative is searching for so-called analogues as a starting point for structural annotations; analogues are library molecules which are not exact matches but display a high chemical similarity. However, current analogue search implementations are not yet very reliable and are relatively slow. Here, we present MS2Query, a machine learning-based tool that integrates mass spectral embedding-based chemical similarity predictors (Spec2Vec and MS2Deepscore) as well as detected precursor masses to rank potential analogues and exact matches. Benchmarking MS2Query on reference mass spectra and experimental case studies demonstrates improved reliability and scalability. Thereby, MS2Query offers exciting opportunities to further increase the annotation rate of metabolomics profiles of complex metabolite mixtures and to discover new biology.
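As a rough illustration of the ranking idea in this abstract, the Python sketch below orders library candidates by the cosine similarity of their spectral embeddings, down-weighted by the precursor m/z difference. It is not MS2Query's implementation: the abstract describes a machine learning model, whereas the hand-set combination here, the function names, the `mass_scale` parameter, and the toy embeddings are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query_embedding, query_mz, library, mass_scale=100.0):
    """Rank library entries given as (name, embedding, precursor_mz).

    The score is a hypothetical hand-set combination: embedding similarity
    multiplied by a term that shrinks as the precursor m/z difference grows.
    """
    scored = []
    for name, embedding, mz in library:
        similarity = cosine(query_embedding, embedding)
        mass_term = 1.0 / (1.0 + abs(query_mz - mz) / mass_scale)
        scored.append((similarity * mass_term, name))
    scored.sort(reverse=True)  # best score first
    return [(name, round(score, 3)) for score, name in scored]


# Toy library with made-up embeddings and precursor masses.
library = [
    ("exact_like", [1.0, 0.0, 1.0], 300.0),
    ("analogue", [0.9, 0.1, 1.0], 314.0),
    ("unrelated", [0.0, 1.0, 0.0], 300.0),
]
ranking = rank_candidates([1.0, 0.0, 1.0], 300.0, library)
print(ranking)
```

In this toy setup the identical entry ranks first, the close analogue with a shifted precursor mass second, and the dissimilar entry last.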

Subject(s)

Machine Learning, Metabolomics, Reproducibility of Results, Mass Spectrometry, Complex Mixtures

iPRESTO: Automated discovery of biosynthetic sub-clusters linked to specific natural product substructures.

Louwen, Joris J R; Kautsar, Satria A; van der Burg, Sven; Medema, Marnix H; van der Hooft, Justin J J.

PLoS Comput Biol ; 19(2): e1010462, 2023 02.

Article in English | MEDLINE | ID: mdl-36758069

ABSTRACT

Microbial specialised metabolism is full of valuable natural products that are applied clinically, agriculturally, and industrially. The genes that encode their biosynthesis are often physically clustered on the genome in biosynthetic gene clusters (BGCs). Many BGCs consist of multiple groups of co-evolving genes called sub-clusters that are responsible for the biosynthesis of a specific chemical moiety in a natural product. Sub-clusters therefore provide an important link between the structures of a natural product and its BGC, which can be leveraged for predicting natural product structures from sequence, as well as for linking chemical structures and metabolomics-derived mass features to BGCs. While some initial computational methodologies have been devised for sub-cluster detection, current approaches are not scalable, have only been run on small and outdated datasets, or produce an impractically large number of possible sub-clusters to mine through. Here, we constructed a scalable method for unsupervised sub-cluster detection, called iPRESTO, based on topic modelling and statistical analysis of co-occurrence patterns of enzyme-coding protein families. iPRESTO was used to mine sub-clusters across 150,000 prokaryotic BGCs from antiSMASH-DB. After annotating a fraction of the resulting sub-cluster families, we could predict a substructure for 16% of the antiSMASH-DB BGCs. Additionally, our method was able to confirm 83% of the experimentally characterised sub-clusters in MIBiG reference BGCs. Based on iPRESTO-detected sub-clusters, we could correctly identify the BGCs for xenorhabdin and salbostatin biosynthesis (which had not yet been annotated in BGC databases), as well as propose a candidate BGC for akashin biosynthesis. Additionally, we show for a collection of 145 actinobacteria how substructures can aid in linking BGCs to molecules by correlating iPRESTO-detected sub-clusters to MS/MS-derived Mass2Motifs substructure patterns. 
This work paves the way for deeper functional and structural annotation of microbial BGCs by improved linking of orphan molecules to their cognate gene clusters, thus facilitating accelerated natural product discovery.

Subject(s)

Biological Products, Tandem Mass Spectrometry, Metabolomics, Bacteria/genetics, Multigene Family

Enhanced correlation-based linking of biosynthetic gene clusters to their metabolic products through chemical class matching.

Louwen, Joris J R; Medema, Marnix H; van der Hooft, Justin J J.

Microbiome ; 11(1): 13, 2023 01 23.

Article in English | MEDLINE | ID: mdl-36691088

ABSTRACT

BACKGROUND: It is well known that the microbiome produces a myriad of specialised metabolites with diverse functions. To better characterise their structures and identify their producers in complex samples, integrative genome and metabolome mining is becoming increasingly popular. Metabologenomic co-occurrence-based correlation scoring methods facilitate the linking of metabolite mass fragmentation spectra (MS/MS) to their cognate biosynthetic gene clusters (BGCs) based on shared absence/presence patterns of metabolites and BGCs in paired omics datasets of multiple strains. Recently, these methods have been made more readily accessible through the NPLinker platform. However, co-occurrence-based approaches usually result in too many candidate links to manually validate. To address this issue, we introduce a generic feature-based correlation method that matches chemical compound classes between BGCs and MS/MS spectra. RESULTS: To automatically reduce the long lists of potential BGC-MS/MS spectrum links, we matched natural product (NP) ontologies that had previously been developed independently for genomics and metabolomics, and developed NPClassScore: an empirical class matching score that we also implemented in the NPLinker platform. By applying NPClassScore to three paired omics datasets totalling 189 bacterial strains, we show that the number of links is reduced by 63% on average compared to using a co-occurrence-based strategy alone. We further demonstrate that 96% of experimentally validated links in these datasets are retained and prioritised when using NPClassScore. CONCLUSION: The matching genome-metabolome class ontologies provide a starting point for selecting plausible candidates for BGCs and MS/MS spectra based on matching chemical compound class ontologies. NPClassScore expedites genome/metabolome data integration, as relevant BGC-metabolite links are prioritised, and researchers are faced with substantially fewer proposed BGC-MS/MS links to manually inspect. 
We anticipate that our addition to the NPLinker platform will aid integrative omics mining workflows in discovering novel NPs and understanding complex metabolic interactions in the microbiome.
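The two-step idea in this abstract (score BGC-spectrum pairs by shared presence/absence across strains, then keep only pairs whose chemical classes agree) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the NPLinker or NPClassScore implementation: the agreement-fraction score, the exact-equality class check, the `min_score` threshold, and all identifiers and toy data are hypothetical.

```python
def cooccurrence_score(bgc_strains, spectrum_strains, all_strains):
    """Fraction of strains in which the BGC and the spectrum are both
    present or both absent (a deliberately simple stand-in score)."""
    agree = sum(
        (strain in bgc_strains) == (strain in spectrum_strains)
        for strain in all_strains
    )
    return agree / len(all_strains)


def candidate_links(bgcs, spectra, all_strains, min_score=0.75):
    """Return (bgc_id, spectrum_id, score, classes_match) for every pair
    whose co-occurrence score reaches the hypothetical threshold."""
    links = []
    for bgc_id, (bgc_strains, bgc_class) in bgcs.items():
        for spec_id, (spec_strains, spec_class) in spectra.items():
            score = cooccurrence_score(bgc_strains, spec_strains, all_strains)
            if score >= min_score:
                links.append((bgc_id, spec_id, score, bgc_class == spec_class))
    return links


# Toy paired dataset: strains carrying each BGC or spectrum, plus a class label.
strains = {"s1", "s2", "s3", "s4"}
bgcs = {
    "bgc_A": ({"s1", "s2"}, "polyketide"),
    "bgc_B": ({"s1", "s2"}, "terpene"),
}
spectra = {"spec_1": ({"s1", "s2"}, "polyketide")}

links = candidate_links(bgcs, spectra, strains)
filtered = [link for link in links if link[3]]  # keep class-matched links only
print(len(links), filtered)
```

Both BGCs co-occur perfectly with the spectrum, so co-occurrence alone cannot separate them; the class check removes the terpene BGC and leaves a single candidate link.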

Subject(s)

Biosynthetic Pathways, Tandem Mass Spectrometry, Biosynthetic Pathways/genetics, Genomics, Metabolomics/methods, Multigene Family

MIBiG 3.0: a community-driven effort to annotate experimentally validated biosynthetic gene clusters.

Terlouw, Barbara R; Blin, Kai; Navarro-Muñoz, Jorge C; Avalon, Nicole E; Chevrette, Marc G; Egbert, Susan; Lee, Sanghoon; Meijer, David; Recchia, Michael J J; Reitz, Zachary L; van Santen, Jeffrey A; Selem-Mojica, Nelly; Tørring, Thomas; Zaroubi, Liana; Alanjary, Mohammad; Aleti, Gajender; Aguilar, César; Al-Salihi, Suhad A A; Augustijn, Hannah E; Avelar-Rivas, J Abraham; Avitia-Domínguez, Luis A; Barona-Gómez, Francisco; Bernaldo-Agüero, Jordan; Bielinski, Vincent A; Biermann, Friederike; Booth, Thomas J; Carrion Bravo, Victor J; Castelo-Branco, Raquel; Chagas, Fernanda O; Cruz-Morales, Pablo; Du, Chao; Duncan, Katherine R; Gavriilidou, Athina; Gayrard, Damien; Gutiérrez-García, Karina; Haslinger, Kristina; Helfrich, Eric J N; van der Hooft, Justin J J; Jati, Afif P; Kalkreuter, Edward; Kalyvas, Nikolaos; Kang, Kyo Bin; Kautsar, Satria; Kim, Wonyong; Kunjapur, Aditya M; Li, Yong-Xin; Lin, Geng-Min; Loureiro, Catarina; Louwen, Joris J R; Louwen, Nico L L.

Nucleic Acids Res ; 51(D1): D603-D610, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36399496

ABSTRACT

With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.

Subject(s)

Genome, Genomics, Multigene Family, Biosynthetic Pathways/genetics

Good practices and recommendations for using and benchmarking computational metabolomics metabolite annotation tools.

de Jonge, Niek F; Mildau, Kevin; Meijer, David; Louwen, Joris J R; Bueschl, Christoph; Huber, Florian; van der Hooft, Justin J J.

Metabolomics ; 18(12): 103, 2022 12 05.

Article in English | MEDLINE | ID: mdl-36469190

ABSTRACT

BACKGROUND: Untargeted metabolomics approaches based on mass spectrometry obtain comprehensive profiles of complex biological samples. However, on average only 10% of the molecules can be annotated. This low annotation rate hampers biochemical interpretation and effective comparison of metabolomics studies. Furthermore, de novo structural characterization of mass spectral data remains a complicated and time-intensive process. Recently, the field of computational metabolomics has gained traction and novel methods have started to enable large-scale and reliable metabolite annotation. Molecular networking and machine learning-based in-silico annotation tools have been shown to greatly assist metabolite characterization in diverse fields such as clinical metabolomics and natural product discovery. AIM OF REVIEW: We highlight recent advances in computational metabolite annotation workflows with a special focus on their evaluation and comparison with other tools. Whilst the progress is substantial and promising, we also argue that inconsistencies in benchmarking different tools hamper users from selecting the most appropriate and promising method for their research. We summarize benchmarking strategies of the different tools and outline several recommendations for benchmarking and comparing novel tools. KEY SCIENTIFIC CONCEPTS OF REVIEW: This review focuses on recent advances in mass spectral library-based and machine learning-supported metabolite annotation workflows. We discuss large-scale library matching and analogue search, the current bloom of mass spectral similarity scores, and how molecular networking has changed the field. In addition, the potentials and challenges of machine learning-supported metabolite annotation workflows are highlighted. 
Overall, recent developments in computational metabolomics have started to fundamentally change metabolomics workflows, and we expect that as a community we will be able to overcome current method performance ambiguities and annotation bottlenecks.

Subject(s)

Benchmarking, Metabolomics, Metabolomics/methods, Mass Spectrometry, Machine Learning

NPOmix: A machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters.

Leão, Tiago F; Wang, Mingxun; da Silva, Ricardo; Gurevich, Alexey; Bauermeister, Anelize; Gomes, Paulo Wender P; Brejnrod, Asker; Glukhov, Evgenia; Aron, Allegra T; Louwen, Joris J R; Kim, Hyun Woo; Reher, Raphael; Fiore, Marli F; van der Hooft, Justin J J; Gerwick, Lena; Gerwick, William H; Bandeira, Nuno; Dorrestein, Pieter C.

PNAS Nexus ; 1(5): pgac257, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36712343

ABSTRACT

Microbial specialized metabolites are an important source of and inspiration for many pharmaceuticals and biotechnological products, and they play key roles in ecological processes. Untargeted metabolomics using liquid chromatography coupled with tandem mass spectrometry is an efficient technique to access metabolites from fractions and even environmental crude extracts. Nevertheless, metabolomics is limited in predicting structures or bioactivities for cryptic metabolites. Efficiently linking the biosynthetic potential inferred from (meta)genomics to the specialized metabolome would accelerate drug discovery programs by allowing metabolomics to make use of genetic predictions. Here, we present a k-nearest neighbor classifier to systematically connect mass spectrometry fragmentation spectra to their corresponding biosynthetic gene clusters (independent of their chemical class). Our new pattern-based genome mining pipeline links biosynthetic genes to the metabolites they encode, as detected via mass spectrometry from bacterial cultures or environmental microbiomes. Using paired datasets that include validated gene-mass spectral links from the Paired Omics Data Platform, we demonstrate this approach by automatically linking 18 previously known mass spectra (17 for which the biosynthetic gene clusters can be found in the MIBiG database, plus palmyramide A) to their corresponding biosynthetic genes, which had previously been experimentally validated (e.g., via nuclear magnetic resonance or genetic engineering). We illustrate, with a computational example that users can reproduce, how to use our Natural Products Mixed Omics (NPOmix) tool for siderophore mining. We conclude that NPOmix minimizes the need for culturing (it worked well on microbiomes) and facilitates specialized metabolite prioritization based on integrative omics mining.
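For readers unfamiliar with the k-nearest neighbor classification mentioned in this abstract, a minimal Python sketch follows. It assumes each item is represented by the set of samples in which it occurs and uses Jaccard similarity with majority voting; the fingerprint design, the labels, and the function names are assumptions for illustration and do not reproduce the NPOmix pipeline.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(query, training, k=3):
    """Predict a label for `query` by majority vote among the k most
    similar training items, each given as (fingerprint_set, label)."""
    ranked = sorted(
        training, key=lambda item: jaccard(query, item[0]), reverse=True
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training data: sample-occurrence fingerprints labelled with
# hypothetical gene cluster family identifiers.
training = [
    ({"s1", "s2", "s3"}, "gcf_1"),
    ({"s1", "s2"}, "gcf_1"),
    ({"s4", "s5"}, "gcf_2"),
    ({"s5"}, "gcf_2"),
]

# A query spectrum observed in samples s1, s2 and s3.
prediction = knn_predict({"s1", "s2", "s3"}, training, k=3)
print(prediction)
```

The query shares its occurrence pattern with the two `gcf_1` entries, so two of its three nearest neighbours vote for that label.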

Comprehensive Large-Scale Integrative Analysis of Omics Data To Accelerate Specialized Metabolite Discovery.

Louwen, Joris J R; van der Hooft, Justin J J.

mSystems ; 6(4): e0072621, 2021 Aug 31.

Article in English | MEDLINE | ID: mdl-34427506

ABSTRACT

Microbial specialized metabolites are key mediators in host-microbiome interactions. Most of the chemical space produced by the microbiome currently remains unexplored and uncharacterized. This situation calls for new and improved methods to exploit the growing publicly available genomic and metabolomic data sets and connect the outcomes to structural and functional knowledge inferred from transcriptomics and proteomics experiments. Here, we first describe currently available approaches that support the comprehensive mining of metabolomics and genomics data. Next, we provide our vision on how to move forward toward the automated linking of omics data of specialized metabolites to their structures, biosynthesis pathways, producers, and functions.
