Search | VHL Regional Portal

1.

CineMol: a programmatically accessible direct-to-SVG 3D small molecule drawer.

Meijer, David; Medema, Marnix H; van der Hooft, Justin J J.

J Cheminform ; 16(1): 58, 2024 May 23.

Article in English | MEDLINE | ID: mdl-38783386

ABSTRACT

Effective visualization of small molecules is paramount in conveying concepts and results in cheminformatics. Scalable vector graphics (SVG) are preferred for creating such visualizations, as SVGs can be easily altered in post-production and exported to other formats. A wide spectrum of software applications already exists that can visualize molecules, and customize these visualizations, in many ways. However, software packages that can output projected 3D models onto a 2D canvas directly as SVG, while being programmatically accessible from Python, are lacking. Here, we introduce CineMol, which can draw vectorized approximations of three-dimensional small molecule models in seconds, without triangulation or ray tracing, resulting in files of around 50-300 kilobytes per molecule model for compounds with up to 45 heavy atoms. The SVGs outputted by CineMol can be readily modified in popular vector graphics editing software applications. CineMol is written in Python and can be incorporated into any existing Python cheminformatics workflow, as it only depends on native Python libraries. CineMol also provides programmatic access to all its internal states, allowing for per-atom and per-bond-based customization. CineMol's capacity to programmatically create molecular visualizations suitable for post-production offers researchers and scientists a powerful tool for enhancing the clarity and visual impact of their scientific presentations and publications in cheminformatics, metabolomics, and related scientific disciplines. Scientific contribution: We introduce CineMol, a Python-based tool that provides a valuable solution for cheminformatics researchers by enabling the direct generation of high-quality approximations of two-dimensional SVG visualizations from three-dimensional small molecule models, all within a programmable Python framework.
CineMol offers a unique combination of speed, efficiency, and accessibility, making it an indispensable tool for researchers in cheminformatics, especially when working with SVG visualizations.
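
The idea in the abstract above, projecting a 3D model onto a 2D canvas and writing it directly as SVG with only the Python standard library, can be illustrated with a minimal sketch. This is not CineMol's API or algorithm: the `Atom` class, the `atoms_to_svg` function and its parameters are hypothetical names, and the sketch uses a plain orthographic projection with depth sorting (painter's algorithm) rather than the vectorized approximations the paper describes.

```python
from dataclasses import dataclass
from xml.sax.saxutils import quoteattr


@dataclass
class Atom:
    # Hypothetical container for this sketch; coordinates in angstrom-like units.
    x: float
    y: float
    z: float
    radius: float
    color: str


def atoms_to_svg(atoms, scale=40.0, margin=10.0):
    """Project atoms orthographically along z and return an SVG string."""
    # Painter's algorithm: draw atoms with the smallest z first so that
    # atoms nearer to the viewer (larger z) are painted over them.
    ordered = sorted(atoms, key=lambda a: a.z)

    # Bounding box of the projected spheres, used to size the canvas.
    min_x = min(a.x - a.radius for a in atoms)
    max_x = max(a.x + a.radius for a in atoms)
    min_y = min(a.y - a.radius for a in atoms)
    max_y = max(a.y + a.radius for a in atoms)
    width = (max_x - min_x) * scale + 2 * margin
    height = (max_y - min_y) * scale + 2 * margin

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width:.1f}" height="{height:.1f}">'
    ]
    for a in ordered:
        cx = (a.x - min_x) * scale + margin
        cy = (max_y - a.y) * scale + margin  # SVG y axis points downwards
        parts.append(
            f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="{a.radius * scale:.1f}" '
            f'fill={quoteattr(a.color)} stroke="black"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Two overlapping atoms; the red one is nearer to the viewer.
svg_text = atoms_to_svg([Atom(0, 0, 1, 0.5, "red"), Atom(1, 0, 0, 0.5, "gray")])
```

Because every atom becomes one ordinary `<circle>` element, the resulting file stays small and each shape remains individually editable in a vector graphics editor, which is the property the abstract emphasizes.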

2.

Reproducible mass spectrometry data processing and compound annotation in MZmine 3.

Heuckeroth, Steffen; Damiani, Tito; Smirnov, Aleksandr; Mokshyna, Olena; Brungs, Corinna; Korf, Ansgar; Smith, Joshua David; Stincone, Paolo; Dreolin, Nicola; Nothias, Louis-Félix; Hyötyläinen, Tuulia; Oresic, Matej; Karst, Uwe; Dorrestein, Pieter C; Petras, Daniel; Du, Xiuxia; van der Hooft, Justin J J; Schmid, Robin; Pluskal, Tomás.

Nat Protoc ; 2024 May 20.

Article in English | MEDLINE | ID: mdl-38769143

ABSTRACT

Untargeted mass spectrometry (MS) experiments produce complex, multidimensional data that are practically impossible to investigate manually. For this reason, computational pipelines are needed to extract relevant information from raw spectral data and convert it into a more comprehensible format. Depending on the sample type and/or goal of the study, a variety of MS platforms can be used for such analysis. MZmine is an open-source software for the processing of raw spectral data generated by different MS platforms. Examples include liquid chromatography-MS, gas chromatography-MS and MS-imaging. These data might typically be associated with various applications including metabolomics and lipidomics. Moreover, the third version of the software, described herein, supports the processing of ion mobility spectrometry (IMS) data. The present protocol provides three distinct procedures to perform feature detection and annotation of untargeted MS data produced by different instrumental setups: liquid chromatography-(IMS-)MS, gas chromatography-MS and (IMS-)MS imaging. For training purposes, example datasets are provided together with configuration batch files (i.e., list of processing steps and parameters) to allow new users to easily replicate the described workflows. Depending on the number of data files and available computing resources, we anticipate this to take between 2 and 24 h for new MZmine users and nonexperts. Within each procedure, we provide a detailed description for all processing parameters together with instructions/recommendations for their optimization. The main generated outputs are represented by aligned feature tables and fragmentation spectra lists that can be used by other third-party tools for further downstream analysis.

3.

Charting the Cannabis plant chemical space with computational metabolomics.

Myoli, Akhona; Choene, Mpho; Kappo, Abidemi Paul; Madala, Ntakadzeni Edwin; van der Hooft, Justin J J; Tugizimana, Fidele.

Metabolomics ; 20(3): 62, 2024 May 25.

Article in English | MEDLINE | ID: mdl-38796627

ABSTRACT

INTRODUCTION: The chemical classification of Cannabis is typically confined to the cannabinoid content, whilst Cannabis encompasses diverse chemical classes that vary in abundance among all its varieties. Hence, neglecting other chemical classes within Cannabis strains results in a restricted and biased comprehension of elements that may contribute to chemical intricacy and the resultant medicinal qualities of the plant. OBJECTIVES: Thus, herein, we report a computational metabolomics study to elucidate the Cannabis metabolic map beyond the cannabinoids. METHODS: Mass spectrometry-based computational tools were used to mine and evaluate the methanolic leaf and flower extracts of two Cannabis cultivars: Amnesia haze (AMNH) and Royal dutch cheese (RDC). RESULTS: The results revealed the presence of different chemical compound classes, including cannabinoids but also extending to flavonoids and phospholipids, at varying distributions across the cultivar plant tissues, where the phenylpropanoid superclass was more abundant in the leaves than in the flowers. Therefore, the two cultivars were differentiated based on the overall chemical content of their plant tissues where AMNH was observed to be more dominant in the flavonoid content while RDC was more dominant in the lipid-like molecules. Additionally, in silico molecular docking studies in combination with biological assay studies indicated the potentially differing anti-cancer properties of the two cultivars resulting from the elucidated chemical profiles. CONCLUSION: These findings highlight distinctive chemical profiles beyond cannabinoids in Cannabis strains. This novel mapping of the metabolomic landscape of Cannabis provides actionable insights into plant biochemistry and justifies selecting certain varieties for medicinal use.

Subject(s)

Cannabis , Metabolomics , Plant Leaves , Cannabis/chemistry , Cannabis/metabolism , Metabolomics/methods , Plant Leaves/metabolism , Plant Leaves/chemistry , Flowers/metabolism , Flowers/chemistry , Plant Extracts/metabolism , Plant Extracts/chemistry , Plant Extracts/pharmacology , Cannabinoids/metabolism , Cannabinoids/analysis , Molecular Docking Simulation , Flavonoids/metabolism , Flavonoids/analysis , Mass Spectrometry/methods

4.

Tailored Mass Spectral Data Exploration Using the SpecXplore Interactive Dashboard.

Mildau, Kevin; Ehlers, Henry; Oesterle, Ian; Pristner, Manuel; Warth, Benedikt; Doppler, Maria; Bueschl, Christoph; Zanghellini, Jürgen; van der Hooft, Justin J J.

Anal Chem ; 96(15): 5798-5806, 2024 Apr 16.

Article in English | MEDLINE | ID: mdl-38564584

ABSTRACT

Untargeted metabolomics promises comprehensive characterization of small molecules in biological samples. However, the field is hampered by low annotation rates and abstract spectral data. Despite recent advances in computational metabolomics, manual annotations and manual confirmation of in-silico annotations remain important in the field. Here, exploratory data analysis methods for mass spectral data provide overviews, prioritization, and structural hypothesis starting points to researchers facing large quantities of spectral data. In this research, we propose a fluid means of dealing with mass spectral data using specXplore, an interactive Python dashboard providing interactive and complementary visualizations facilitating mass spectral similarity matrix exploration. Specifically, specXplore provides a two-dimensional t-distributed stochastic neighbor embedding (t-SNE) as a jumping board for local connectivity exploration using complementary interactive visualizations in the form of partial network drawings, similarity heatmaps, and fragmentation overview maps. SpecXplore makes use of state-of-the-art ms2deepscore pairwise spectral similarities as a quantitative backbone while allowing fast changes of threshold and connectivity limitation settings, providing flexibility in adjusting settings to suit the localized node environment being explored. We believe that specXplore can become an integral part of mass spectral data exploration efforts and assist users in the generation of structural hypotheses for compounds of interest.
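
The "threshold and connectivity limitation settings" mentioned in the abstract above can be pictured with a small sketch that extracts the local edges of one node from a pairwise similarity matrix. This is only an illustration under stated assumptions: `local_edges`, its parameters, and the toy matrix are hypothetical and are not taken from specXplore, and the scores are made-up numbers rather than ms2deepscore output.

```python
def local_edges(similarity, node, threshold=0.7, max_neighbors=3):
    """Return up to max_neighbors edges (node, other, score) for one node.

    Only neighbours with score >= threshold are kept (threshold setting),
    and only the highest-scoring ones survive (connectivity limitation).
    """
    candidates = [
        (other, score)
        for other, score in enumerate(similarity[node])
        if other != node and score >= threshold
    ]
    # Highest similarity first, then truncate to the connectivity limit.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [(node, other, score) for other, score in candidates[:max_neighbors]]


# Toy symmetric similarity matrix for four spectra (illustrative values).
similarity = [
    [1.00, 0.91, 0.40, 0.75],
    [0.91, 1.00, 0.55, 0.80],
    [0.40, 0.55, 1.00, 0.30],
    [0.75, 0.80, 0.30, 1.00],
]

edges_default = local_edges(similarity, 0)
edges_strict = local_edges(similarity, 0, threshold=0.8)
```

Raising the threshold or lowering `max_neighbors` thins out the partial network around the selected node, which is the kind of quick adjustment the abstract describes for exploring a local neighbourhood.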

5.

Studying Plant Specialized Metabolites Using Computational Metabolomics Strategies.

Mutabdzija, Lana; Myoli, Akhona; de Jonge, Niek F; Damiani, Tito; Schmid, Robin; van der Hooft, Justin J J; Tugizimana, Fidele; Pluskal, Tomás.

Methods Mol Biol ; 2788: 97-136, 2024.

Article in English | MEDLINE | ID: mdl-38656511

ABSTRACT

Plant specialized metabolites have diversified vastly over the course of plant evolution, and they are considered key players in complex interactions between plants and their environment. The chemical diversity of these metabolites has been widely explored and utilized in agriculture and crop enhancement, the food industry, and drug development, among other areas. However, the immensity of the plant metabolome can make its exploration challenging. Here we describe a protocol for exploring plant specialized metabolites that combines high-resolution mass spectrometry and computational metabolomics strategies, including molecular networking, identification of structural motifs, as well as prediction of chemical structures and metabolite classes.

Subject(s)

Mass Spectrometry , Metabolome , Metabolomics , Plants , Metabolomics/methods , Plants/metabolism , Mass Spectrometry/methods , Computational Biology/methods

6.

Open access repository-scale propagated nearest neighbor suspect spectral library for untargeted metabolomics.

Bittremieux, Wout; Avalon, Nicole E; Thomas, Sydney P; Kakhkhorov, Sarvar A; Aksenov, Alexander A; Gomes, Paulo Wender P; Aceves, Christine M; Caraballo-Rodríguez, Andrés Mauricio; Gauglitz, Julia M; Gerwick, William H; Huan, Tao; Jarmusch, Alan K; Kaddurah-Daouk, Rima F; Kang, Kyo Bin; Kim, Hyun Woo; Kondic, Todor; Mannochio-Russo, Helena; Meehan, Michael J; Melnik, Alexey V; Nothias, Louis-Felix; O'Donovan, Claire; Panitchpakdi, Morgan; Petras, Daniel; Schmid, Robin; Schymanski, Emma L; van der Hooft, Justin J J; Weldon, Kelly C; Yang, Heejung; Xing, Shipei; Zemlin, Jasmine; Wang, Mingxun; Dorrestein, Pieter C.

Nat Commun ; 14(1): 8488, 2023 Dec 20.

Article in English | MEDLINE | ID: mdl-38123557

ABSTRACT

Despite the increasing availability of tandem mass spectrometry (MS/MS) community spectral libraries for untargeted metabolomics over the past decade, the majority of acquired MS/MS spectra remain uninterpreted. To further aid in interpreting unannotated spectra, we created a nearest neighbor suspect spectral library, consisting of 87,916 annotated MS/MS spectra derived from hundreds of millions of MS/MS spectra originating from published untargeted metabolomics experiments. Entries in this library, or "suspects," were derived from unannotated spectra that could be linked in a molecular network to an annotated spectrum. Annotations were propagated to unknowns based on structural relationships to reference molecules using MS/MS-based spectrum alignment. We demonstrate the broad relevance of the nearest neighbor suspect spectral library through representative examples of propagation-based annotation of acylcarnitines, bacterial and plant natural products, and drug metabolism. Our results also highlight how the library can help to better understand an Alzheimer's brain phenotype. The nearest neighbor suspect spectral library is openly available for download or for data analysis through the GNPS platform to help investigators hypothesize candidate structures for unknown MS/MS spectra in untargeted metabolomics data.

Subject(s)

Access to Information , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Metabolomics/methods , Gene Library , Cluster Analysis

7.

Artificial intelligence for natural product drug discovery.

Mullowney, Michael W; Duncan, Katherine R; Elsayed, Somayah S; Garg, Neha; van der Hooft, Justin J J; Martin, Nathaniel I; Meijer, David; Terlouw, Barbara R; Biermann, Friederike; Blin, Kai; Durairaj, Janani; Gorostiola González, Marina; Helfrich, Eric J N; Huber, Florian; Leopold-Messer, Stefan; Rajan, Kohulan; de Rond, Tristan; van Santen, Jeffrey A; Sorokina, Maria; Balunas, Marcy J; Beniddir, Mehdi A; van Bergeijk, Doris A; Carroll, Laura M; Clark, Chase M; Clevert, Djork-Arné; Dejong, Chris A; Du, Chao; Ferrinho, Scarlet; Grisoni, Francesca; Hofstetter, Albert; Jespers, Willem; Kalinina, Olga V; Kautsar, Satria A; Kim, Hyunwoo; Leao, Tiago F; Masschelein, Joleen; Rees, Evan R; Reher, Raphael; Reker, Daniel; Schwaller, Philippe; Segler, Marwin; Skinnider, Michael A; Walker, Allison S; Willighagen, Egon L; Zdrazil, Barbara; Ziemert, Nadine; Goss, Rebecca J M; Guyomard, Pierre; Volkamer, Andrea; Gerwick, William H.

Nat Rev Drug Discov ; 22(11): 895-916, 2023 11.

Article in English | MEDLINE | ID: mdl-37697042

ABSTRACT

Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.

Subject(s)

Artificial Intelligence , Biological Products , Humans , Algorithms , Machine Learning , Drug Discovery , Drug Design , Biological Products/pharmacology

8.

Overexpression of Brassica napus COMT1 in Arabidopsis heightens UV-B-mediated resistance to Plutella xylostella herbivory.

McInnes, Kirsty J; van der Hooft, Justin J J; Sharma, Ashutosh; Herzyk, Pawel; Hundleby, Penny A C; Schoonbeek, Henk-Jan; Amtmann, Anna; Ridout, Christopher; Jenkins, Gareth I.

Photochem Photobiol Sci ; 22(10): 2341-2356, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37505444

ABSTRACT

UV-B radiation regulates numerous morphogenic, biochemical and physiological responses in plants, and can stimulate some responses typically associated with other abiotic and biotic stimuli, including invertebrate herbivory. Removal of UV-B from the growing environment of various plant species has been found to increase their susceptibility to consumption by invertebrate pests; however, to date, little research has been conducted to investigate the effects of UV-B on crop susceptibility to field pests. Here, we report findings from a multi-omic and genetic-based study investigating the mechanisms of UV-B-stimulated resistance of the crop, Brassica napus (oilseed rape), to herbivory from an economically important lepidopteran specialist of the Brassicaceae, Plutella xylostella (diamondback moth). The UV-B photoreceptor, UV RESISTANCE LOCUS 8 (UVR8), was not found to mediate resistance to this pest. RNA-Seq and untargeted metabolomics identified components of the sinapate/lignin biosynthetic pathway that were similarly regulated by UV-B and herbivory. Arabidopsis mutants in genes encoding two enzymes in the sinapate/lignin biosynthetic pathway, CAFFEATE O-METHYLTRANSFERASE 1 (COMT1) and ELICITOR-ACTIVATED GENE 3-2 (ELI3-2), retained UV-B-mediated resistance to P. xylostella herbivory. However, the overexpression of B. napus COMT1 in Arabidopsis further reduced plant susceptibility to P. xylostella herbivory in a UV-B-dependent manner. These findings demonstrate that overexpression of a component of the sinapate/lignin biosynthetic pathway in a member of the Brassicaceae can enhance UV-B-stimulated resistance to herbivory from P. xylostella.

Subject(s)

Arabidopsis , Brassica napus , Moths , Animals , Arabidopsis/genetics , Arabidopsis/radiation effects , Brassica napus/genetics , Herbivory , Lignin , Moths/physiology , Plants

9.

Metabolome-guided genome mining of RiPP natural products.

Zdouc, Mitja M; van der Hooft, Justin J J; Medema, Marnix H.

Trends Pharmacol Sci ; 44(8): 532-541, 2023 08.

Article in English | MEDLINE | ID: mdl-37391295

ABSTRACT

Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a chemically diverse class of metabolites. Many RiPPs show potent biological activities that make them attractive starting points for drug development. A promising approach for the discovery of new classes of RiPPs is genome mining. However, the accuracy of genome mining is hampered by the lack of signature genes shared across different RiPP classes. One way to reduce false-positive predictions is by complementing genomic information with metabolomics data. In recent years, several new approaches addressing such integrative genomics and metabolomics analyses have been developed. In this review, we provide a detailed discussion of RiPP-compatible software tools that integrate paired genomics and metabolomics data. We highlight current challenges in data integration and identify opportunities for further developments targeting new classes of bioactive RiPPs.

Subject(s)

Biological Products , Humans , Ribosomes/genetics , Ribosomes/metabolism , Peptides , Genomics , Metabolome , Protein Processing, Post-Translational

10.

Trachymyrmex septentrionalis ants promote fungus garden hygiene using Trichoderma-derived metabolite cues.

Kyle, Kathleen E; Puckett, Sara P; Caraballo-Rodríguez, Andrés Mauricio; Rivera-Chávez, José; Samples, Robert M; Earp, Cody E; Raja, Huzefa A; Pearce, Cedric J; Ernst, Madeleine; van der Hooft, Justin J J; Adams, Madison E; Oberlies, Nicholas H; Dorrestein, Pieter C; Klassen, Jonathan L; Balunas, Marcy J.

Proc Natl Acad Sci U S A ; 120(25): e2219373120, 2023 Jun 20.

Article in English | MEDLINE | ID: mdl-37319116

ABSTRACT

Fungus-growing ants depend on a fungal mutualist that can fall prey to fungal pathogens. This mutualist is cultivated by these ants in structures called fungus gardens. Ants exhibit weeding behaviors that keep their fungus gardens healthy by physically removing compromised pieces. However, how ants detect diseases of their fungus gardens is unknown. Here, we applied the logic of Koch's postulates using environmental fungal community gene sequencing, fungal isolation, and laboratory infection experiments to establish that Trichoderma spp. can act as previously unrecognized pathogens of Trachymyrmex septentrionalis fungus gardens. Our environmental data showed that Trichoderma are the most abundant noncultivar fungi in wild T. septentrionalis fungus gardens. We further determined that metabolites produced by Trichoderma induce an ant weeding response that mirrors their response to live Trichoderma. Combining ant behavioral experiments with bioactivity-guided fractionation and statistical prioritization of metabolites in Trichoderma extracts demonstrated that T. septentrionalis ants weed in response to peptaibols, a specific class of secondary metabolites known to be produced by Trichoderma fungi. Similar assays conducted using purified peptaibols, including the two previously undescribed peptaibols trichokindins VIII and IX, suggested that weeding is likely induced by peptaibols as a class rather than by a single peptaibol metabolite. In addition to their presence in laboratory experiments, we detected peptaibols in wild fungus gardens. Our combination of environmental data and laboratory infection experiments strongly supports that peptaibols act as chemical cues of Trichoderma pathogenesis in T. septentrionalis fungus gardens.

Subject(s)

Ants , Laboratory Infection , Trichoderma , Animals , Ants/physiology , Gardens , Cues , Symbiosis , Peptaibols

11.

MS2Query: reliable and scalable MS² mass spectra-based analogue search.

de Jonge, Niek F; Louwen, Joris J R; Chekmeneva, Elena; Camuzeaux, Stephane; Vermeir, Femke J; Jansen, Robert S; Huber, Florian; van der Hooft, Justin J J.

Nat Commun ; 14(1): 1752, 2023 03 29.

Article in English | MEDLINE | ID: mdl-36990978

ABSTRACT

Metabolomics-driven discoveries of biological samples remain hampered by the grand challenge of metabolite annotation and identification. Only a few metabolites have an annotated spectrum in spectral libraries; hence, searching only for exact library matches generally returns few hits. An attractive alternative is searching for so-called analogues as a starting point for structural annotations; analogues are library molecules which are not exact matches but display a high chemical similarity. However, current analogue search implementations are not yet very reliable and are relatively slow. Here, we present MS2Query, a machine learning-based tool that integrates mass spectral embedding-based chemical similarity predictors (Spec2Vec and MS2Deepscore) as well as detected precursor masses to rank potential analogues and exact matches. Benchmarking MS2Query on reference mass spectra and experimental case studies demonstrates improved reliability and scalability. Thereby, MS2Query offers exciting opportunities to further increase the annotation rate of metabolomics profiles of complex metabolite mixtures and to discover new biology.

Subject(s)

Machine Learning , Metabolomics , Reproducibility of Results , Mass Spectrometry , Complex Mixtures

12.

Simulated-to-real benchmarking of acquisition methods in untargeted metabolomics.

Wandy, Joe; McBride, Ross; Rogers, Simon; Terzis, Nikolaos; Weidt, Stefan; van der Hooft, Justin J J; Bryson, Kevin; Daly, Rónán; Davies, Vinny.

Front Mol Biosci ; 10: 1130781, 2023.

Article in English | MEDLINE | ID: mdl-36959982

ABSTRACT

Data-Dependent and Data-Independent Acquisition modes (DDA and DIA, respectively) are both widely used to acquire MS2 spectra in untargeted liquid chromatography tandem mass spectrometry (LC-MS/MS) metabolomics analyses. Despite their wide use, little work has been attempted to systematically compare their MS/MS spectral annotation performance in untargeted settings due to the lack of ground truth and the costs involved in running a large number of acquisitions. Here, we present a systematic in silico comparison of these two acquisition methods in untargeted metabolomics by extending our Virtual Metabolomics Mass Spectrometer (ViMMS) framework with a DIA module. Our results show that the performance of these methods varies with the average number of co-eluting ions as the most important factor. At low numbers, DIA outperforms DDA, but at higher numbers, DDA has an advantage as DIA can no longer deal with the large amount of overlapping ion chromatograms. Results from simulation were further validated on an actual mass spectrometer, demonstrating that using ViMMS we can draw conclusions from simulation that translate well into the real world. The versatility of the Virtual Metabolomics Mass Spectrometer (ViMMS) framework in simulating different parameters of both Data-Dependent and Data-Independent Acquisition (DDA and DIA) modes is a key advantage of this work. Researchers can easily explore and compare the performance of different acquisition methods within the ViMMS framework, without the need for expensive and time-consuming experiments with real experimental data. By identifying the strengths and limitations of each acquisition method, researchers can optimize their choice and obtain more accurate and robust results. Furthermore, the ability to simulate and validate results using the ViMMS framework can save significant time and resources, as it eliminates the need for numerous experiments. 
This work not only provides valuable insights into the performance of DDA and DIA, but it also opens the door for further advancements in LC-MS/MS data acquisition methods.
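
The contrast between the two acquisition modes compared in the abstract above can be sketched in a few lines. This is a deliberately simplified illustration, not the ViMMS framework or its API: `dda_top_n`, `dia_windows`, their parameters and the peak list are all hypothetical. DDA is modelled as picking the N most intense precursors of an MS1 scan, and DIA as fragmenting everything inside fixed isolation windows.

```python
def dda_top_n(ms1_peaks, n=2, min_intensity=0.0):
    """Data-dependent sketch: pick the n most intense precursor m/z values."""
    eligible = [peak for peak in ms1_peaks if peak[1] >= min_intensity]
    eligible.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in eligible[:n]]


def dia_windows(ms1_peaks, start, stop, width):
    """Data-independent sketch: group precursors by isolation window.

    Each window covers [lo, lo + width); every precursor inside a window
    is fragmented together, whatever its intensity.
    """
    windows = {}
    lo = start
    while lo < stop:
        members = [mz for mz, _ in ms1_peaks if lo <= mz < lo + width]
        windows[(lo, lo + width)] = members
        lo += width
    return windows


# One MS1 scan as (m/z, intensity) pairs; values are illustrative only.
peaks = [(150.1, 5e4), (152.2, 9e5), (310.4, 2e5), (311.0, 1e3)]

selected = dda_top_n(peaks, n=2)
grouped = dia_windows(peaks, start=100, stop=400, width=100)
```

In this toy scan DDA fragments only two of the four ions, while each non-empty DIA window contains two co-isolated precursors whose fragments would overlap. With more co-eluting ions per window that overlap grows, which is consistent with the abstract's observation that DIA loses its advantage as the number of co-eluting ions increases.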

13.

Recent advances in mass spectrometry-based computational metabolomics.

Ebbels, Timothy M D; van der Hooft, Justin J J; Chatelaine, Haley; Broeckling, Corey; Zamboni, Nicola; Hassoun, Soha; Mathé, Ewy A.

Curr Opin Chem Biol ; 74: 102288, 2023 06.

Article in English | MEDLINE | ID: mdl-36966702

ABSTRACT

The computational metabolomics field brings together computer scientists, bioinformaticians, chemists, clinicians, and biologists to maximize the impact of metabolomics across a wide array of scientific and medical disciplines. The field continues to expand as modern instrumentation produces datasets with increasing complexity, resolution, and sensitivity. These datasets must be processed, annotated, modeled, and interpreted to enable biological insight. Techniques for visualization, integration (within or between omics), and interpretation of metabolomics data have evolved along with innovation in the databases and knowledge resources required to aid understanding. In this review, we highlight recent advances in the field and reflect on opportunities and innovations in response to the most pressing challenges. This review was compiled from discussions from the 2022 Dagstuhl seminar entitled "Computational Metabolomics: From Spectra to Knowledge".

Subject(s)

Computational Biology , Metabolomics , Metabolomics/methods , Mass Spectrometry/methods , Databases, Factual , Computational Biology/methods

14.

iPRESTO: Automated discovery of biosynthetic sub-clusters linked to specific natural product substructures.

Louwen, Joris J R; Kautsar, Satria A; van der Burg, Sven; Medema, Marnix H; van der Hooft, Justin J J.

PLoS Comput Biol ; 19(2): e1010462, 2023 02.

Article in English | MEDLINE | ID: mdl-36758069

ABSTRACT

Microbial specialised metabolism is full of valuable natural products that are applied clinically, agriculturally, and industrially. The genes that encode their biosynthesis are often physically clustered on the genome in biosynthetic gene clusters (BGCs). Many BGCs consist of multiple groups of co-evolving genes called sub-clusters that are responsible for the biosynthesis of a specific chemical moiety in a natural product. Sub-clusters therefore provide an important link between the structures of a natural product and its BGC, which can be leveraged for predicting natural product structures from sequence, as well as for linking chemical structures and metabolomics-derived mass features to BGCs. While some initial computational methodologies have been devised for sub-cluster detection, current approaches are not scalable, have only been run on small and outdated datasets, or produce an impractically large number of possible sub-clusters to mine through. Here, we constructed a scalable method for unsupervised sub-cluster detection, called iPRESTO, based on topic modelling and statistical analysis of co-occurrence patterns of enzyme-coding protein families. iPRESTO was used to mine sub-clusters across 150,000 prokaryotic BGCs from antiSMASH-DB. After annotating a fraction of the resulting sub-cluster families, we could predict a substructure for 16% of the antiSMASH-DB BGCs. Additionally, our method was able to confirm 83% of the experimentally characterised sub-clusters in MIBiG reference BGCs. Based on iPRESTO-detected sub-clusters, we could correctly identify the BGCs for xenorhabdin and salbostatin biosynthesis (which had not yet been annotated in BGC databases), as well as propose a candidate BGC for akashin biosynthesis. Additionally, we show for a collection of 145 actinobacteria how substructures can aid in linking BGCs to molecules by correlating iPRESTO-detected sub-clusters to MS/MS-derived Mass2Motifs substructure patterns. 
This work paves the way for deeper functional and structural annotation of microbial BGCs by improved linking of orphan molecules to their cognate gene clusters, thus facilitating accelerated natural product discovery.

Subject(s)

Biological Products , Tandem Mass Spectrometry , Metabolomics , Bacteria/genetics , Multigene Family

15.

Enhanced correlation-based linking of biosynthetic gene clusters to their metabolic products through chemical class matching.

Louwen, Joris J R; Medema, Marnix H; van der Hooft, Justin J J.

Microbiome ; 11(1): 13, 2023 01 23.

Article in English | MEDLINE | ID: mdl-36691088

ABSTRACT

BACKGROUND: It is well-known that the microbiome produces a myriad of specialised metabolites with diverse functions. To better characterise their structures and identify their producers in complex samples, integrative genome and metabolome mining is becoming increasingly popular. Metabologenomic co-occurrence-based correlation scoring methods facilitate the linking of metabolite mass fragmentation spectra (MS/MS) to their cognate biosynthetic gene clusters (BGCs) based on shared absence/presence patterns of metabolites and BGCs in paired omics datasets of multiple strains. Recently, these methods have been made more readily accessible through the NPLinker platform. However, co-occurrence-based approaches usually result in too many candidate links to manually validate. To address this issue, we introduce a generic feature-based correlation method that matches chemical compound classes between BGCs and MS/MS spectra. RESULTS: To automatically reduce the long lists of potential BGC-MS/MS spectrum links, we match natural product (NP) ontologies previously independently developed for genomics and metabolomics and developed NPClassScore: an empirical class matching score that we also implemented in the NPLinker platform. By applying NPClassScore on three paired omics datasets totalling 189 bacterial strains, we show that the number of links is reduced by on average 63% as compared to using a co-occurrence-based strategy alone. We further demonstrate that 96% of experimentally validated links in these datasets are retained and prioritised when using NPClassScore. CONCLUSION: The matching genome-metabolome class ontologies provide a starting point for selecting plausible candidates for BGCs and MS/MS spectra based on matching chemical compound class ontologies. NPClassScore expedites genome/metabolome data integration, as relevant BGC-metabolite links are prioritised, and researchers are faced with substantially fewer proposed BGC-MS/MS links to manually inspect. 
We anticipate that our addition to the NPLinker platform will aid integrative omics mining workflows in discovering novel NPs and understanding complex metabolic interactions in the microbiome.

Subject(s)

Biosynthetic Pathways , Tandem Mass Spectrometry , Biosynthetic Pathways/genetics , Genomics , Metabolomics/methods , Multigene Family
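
The two-step logic described in the entry above, scoring BGC-spectrum links by shared presence/absence across strains and then pruning them by chemical class agreement, can be sketched as follows. The functions, the simple agreement fraction, and the exact-match class filter are illustrative assumptions only; they are not the correlation score used in NPLinker nor the NPClassScore formula, and all identifiers and class labels are invented for the example.

```python
def cooccurrence(bgc_strains, spectrum_strains, all_strains):
    """Fraction of strains where BGC and spectrum are both present or both absent."""
    agree = sum(
        (strain in bgc_strains) == (strain in spectrum_strains)
        for strain in all_strains
    )
    return agree / len(all_strains)


def filter_links(links, bgc_class, spectrum_class):
    """Keep only candidate links whose predicted chemical classes match."""
    return [
        (bgc, spectrum)
        for bgc, spectrum in links
        if bgc_class[bgc] == spectrum_class[spectrum]
    ]


# Hypothetical paired-omics toy data for four strains.
all_strains = {"A", "B", "C", "D"}
score_perfect = cooccurrence({"A", "B"}, {"A", "B"}, all_strains)
score_partial = cooccurrence({"A", "B"}, {"A", "C"}, all_strains)

links = [("bgc1", "spec1"), ("bgc1", "spec2"), ("bgc2", "spec2")]
bgc_class = {"bgc1": "polyketide", "bgc2": "peptide"}
spectrum_class = {"spec1": "polyketide", "spec2": "peptide"}
kept = filter_links(links, bgc_class, spectrum_class)
```

The class filter removes the `("bgc1", "spec2")` link even though co-occurrence alone could not rule it out, which mirrors the abstract's point that class matching shortens the list of links left for manual inspection.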

16.

MIBiG 3.0: a community-driven effort to annotate experimentally validated biosynthetic gene clusters.

Terlouw, Barbara R; Blin, Kai; Navarro-Muñoz, Jorge C; Avalon, Nicole E; Chevrette, Marc G; Egbert, Susan; Lee, Sanghoon; Meijer, David; Recchia, Michael J J; Reitz, Zachary L; van Santen, Jeffrey A; Selem-Mojica, Nelly; Tørring, Thomas; Zaroubi, Liana; Alanjary, Mohammad; Aleti, Gajender; Aguilar, César; Al-Salihi, Suhad A A; Augustijn, Hannah E; Avelar-Rivas, J Abraham; Avitia-Domínguez, Luis A; Barona-Gómez, Francisco; Bernaldo-Agüero, Jordan; Bielinski, Vincent A; Biermann, Friederike; Booth, Thomas J; Carrion Bravo, Victor J; Castelo-Branco, Raquel; Chagas, Fernanda O; Cruz-Morales, Pablo; Du, Chao; Duncan, Katherine R; Gavriilidou, Athina; Gayrard, Damien; Gutiérrez-García, Karina; Haslinger, Kristina; Helfrich, Eric J N; van der Hooft, Justin J J; Jati, Afif P; Kalkreuter, Edward; Kalyvas, Nikolaos; Kang, Kyo Bin; Kautsar, Satria; Kim, Wonyong; Kunjapur, Aditya M; Li, Yong-Xin; Lin, Geng-Min; Loureiro, Catarina; Louwen, Joris J R; Louwen, Nico L L.

Nucleic Acids Res ; 51(D1): D603-D610, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36399496

ABSTRACT

With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.

Subject(s)

Genome , Genomics , Multigene Family , Biosynthetic Pathways/genetics

17.

Editorial: NMR-based metabolomics.

Junot, Christophe; Pinu, Farhana R; van der Hooft, Justin J J; Moco, Sofia.

Front Mol Biosci ; 10: 1337566, 2023.

Article in English | MEDLINE | ID: mdl-38223239

18.

Good practices and recommendations for using and benchmarking computational metabolomics metabolite annotation tools.

de Jonge, Niek F; Mildau, Kevin; Meijer, David; Louwen, Joris J R; Bueschl, Christoph; Huber, Florian; van der Hooft, Justin J J.

Metabolomics ; 18(12): 103, 2022 12 05.

Article in English | MEDLINE | ID: mdl-36469190

ABSTRACT

BACKGROUND: Untargeted metabolomics approaches based on mass spectrometry obtain comprehensive profiles of complex biological samples. However, on average only 10% of the molecules can be annotated. This low annotation rate hampers biochemical interpretation and effective comparison of metabolomics studies. Furthermore, de novo structural characterization of mass spectral data remains a complicated and time-intensive process. Recently, the field of computational metabolomics has gained traction and novel methods have started to enable large-scale and reliable metabolite annotation. Molecular networking and machine learning-based in-silico annotation tools have been shown to greatly assist metabolite characterization in diverse fields such as clinical metabolomics and natural product discovery. AIM OF REVIEW: We highlight recent advances in computational metabolite annotation workflows with a special focus on their evaluation and comparison with other tools. Whilst the progress is substantial and promising, we also argue that inconsistencies in benchmarking different tools hamper users from selecting the most appropriate and promising method for their research. We summarize benchmarking strategies of the different tools and outline several recommendations for benchmarking and comparing novel tools. KEY SCIENTIFIC CONCEPTS OF REVIEW: This review focuses on recent advances in mass spectral library-based and machine learning-supported metabolite annotation workflows. We discuss large-scale library matching and analogue search, the current bloom of mass spectral similarity scores, and how molecular networking has changed the field. In addition, the potentials and challenges of machine learning-supported metabolite annotation workflows are highlighted. 
Overall, recent developments in computational metabolomics have started to fundamentally change metabolomics workflows, and we expect that as a community we will be able to overcome current method performance ambiguities and annotation bottlenecks.

Subject(s)

Benchmarking , Metabolomics , Metabolomics/methods , Mass Spectrometry , Machine Learning

19.

Combining Feature-Based Molecular Networking and Contextual Mass Spectral Libraries to Decipher Nutrimetabolomics Profiles.

Renai, Lapo; Ulaszewska, Marynka; Mattivi, Fulvio; Bartoletti, Riccardo; Del Bubba, Massimo; van der Hooft, Justin J J.

Metabolites ; 12(10), 2022 Oct 21.

Article in English | MEDLINE | ID: mdl-36295906

ABSTRACT

Untargeted metabolomics approaches deal with complex data hindering structural information for the comprehensive analysis of unknown metabolite features. We investigated the metabolite discovery capacity and the possible extension of the annotation coverage of the Feature-Based Molecular Networking (FBMN) approach by adding two novel nutritionally-relevant (contextual) mass spectral libraries to the existing public ones, as compared to widely-used open-source annotation protocols. Two contextual mass spectral libraries in positive and negative ionization mode of ~300 reference molecules relevant for plant-based nutrikinetic studies were created and made publicly available through the GNPS platform. The postprandial urinary metabolome analysis within the intervention of Vaccinium supplements was selected as a case study. Following the FBMN approach in combination with the added contextual mass spectral libraries, 67 berry-related and human endogenous metabolites were annotated, achieving a structural annotation coverage comparable to or higher than existing non-commercial annotation workflows. To further exploit the quantitative data obtained within the FBMN environment, the postprandial behavior of the annotated metabolites was analyzed with Pearson product-moment correlation. This simple chemometric tool linked several molecular families with phase II and phase I metabolism. The proposed approach is a powerful strategy to employ in longitudinal studies since it reduces the unknown chemical space by boosting the annotation power to characterize biochemically relevant metabolites in human biofluids.

20.

Homologue series detection and management in LC-MS data with homologueDiscoverer.

Mildau, Kevin; van der Hooft, Justin J J; Flasch, Mira; Warth, Benedikt; El Abiead, Yasin; Koellensperger, Gunda; Zanghellini, Jürgen; Büschl, Christoph.

Bioinformatics ; 38(22): 5139-5140, 2022 11 15.

Article in English | MEDLINE | ID: mdl-36165687

ABSTRACT

SUMMARY: Untargeted metabolomics data analysis is highly labour intensive and can be severely frustrated by both experimental noise and redundant features. Homologous polymer series is a particular case of features that can either represent large numbers of noise features or alternatively represent features of interest with large peak redundancy. Here, we present homologueDiscoverer, an R package that allows for the targeted and untargeted detection of homologue series as well as their evaluation and management using interactive plots and simple local database functionalities. AVAILABILITY AND IMPLEMENTATION: homologueDiscoverer is freely available at GitHub https://github.com/kevinmildau/homologueDiscoverer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Software , Tandem Mass Spectrometry , Chromatography, Liquid , Metabolomics , Data Analysis
