Results 1 - 20 of 97
1.
Bioinformatics ; 40(10), 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39348165

ABSTRACT

SUMMARY: Computational metabolomics workflows have revolutionized the untargeted metabolomics field. However, the organization and prioritization of metabolite features remains a laborious process. Organizing metabolomics data is often done through mass fragmentation-based spectral similarity grouping, resulting in feature sets that also represent an intuitive and scientifically meaningful first stage of analysis in untargeted metabolomics. Exploiting such feature sets, feature-set testing has emerged as an approach that is widely used in genomics and targeted metabolomics pathway enrichment analyses. It allows for formally combining groupings with statistical testing into more meaningful pathway enrichment conclusions. Here, we present msFeaST (mass spectral Feature Set Testing), a feature-set testing and visualization workflow for LC-MS/MS untargeted metabolomics data. Feature-set testing involves statistically assessing differential abundance patterns for groups of features across experimental conditions. We developed msFeaST to make use of spectral similarity-based feature groupings generated using k-medoids clustering, where the resulting clusters serve as a proxy for grouping structurally similar features with potential biosynthesis pathway relationships. Spectral clustering done in this way allows for feature group-wise statistical testing using the globaltest package, which provides high power to detect small concordant effects via joint modeling and reduced multiplicity adjustment penalties. Hence, msFeaST provides interactive integration of the semi-quantitative experimental information with mass-spectral structural similarity information, enhancing the prioritization of features and feature sets during exploratory data analysis. AVAILABILITY AND IMPLEMENTATION: The msFeaST workflow is freely available through https://github.com/kevinmildau/msFeaST and built to work on MacOS and Linux systems.


Subjects
Metabolomics , Tandem Mass Spectrometry , Metabolomics/methods , Tandem Mass Spectrometry/methods , Chromatography, Liquid/methods , Software , Cluster Analysis , Liquid Chromatography-Mass Spectrometry
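The grouping step described in the abstract above (k-medoids clustering on spectral similarities) can be illustrated with a minimal sketch. This is not the msFeaST implementation: the function name `k_medoids`, the farthest-first initialisation, the toy similarity values, and the conversion `distance = 1 - similarity` are all assumptions made for illustration.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster items with alternating k-medoids on a precomputed distance matrix.

    dist is a symmetric list of lists; returns {medoid_index: [member indices]}.
    Hypothetical helper for illustration; assumes k <= len(dist).
    """
    n = len(dist)
    # Deterministic initialisation: start from the most central item, then
    # repeatedly add the item farthest from the medoids chosen so far.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(candidates, key=lambda i: min(dist[i][m] for m in medoids)))

    clusters = {}
    for _ in range(max_iter):
        # Assignment step: every item joins its nearest medoid
        # (a medoid always stays in its own cluster).
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid of a cluster is the member with the
        # smallest summed distance to the other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters


# Toy pairwise spectral similarities for four features (made-up values).
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
distance = [[1.0 - s for s in row] for row in similarity]
clusters = k_medoids(distance, k=2)
groups = sorted(sorted(members) for members in clusters.values())
print(groups)
```

Each resulting group would then serve as one feature set for set-wise testing; the statistical test itself (the globaltest package named in the abstract) is not sketched here.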
2.
Nat Protoc ; 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39304763

ABSTRACT

Feature-based molecular networking (FBMN) is a popular analysis approach for liquid chromatography-tandem mass spectrometry-based non-targeted metabolomics data. While processing liquid chromatography-tandem mass spectrometry data through FBMN is fairly streamlined, downstream data handling and statistical interrogation are often a key bottleneck. Users new to statistical analysis, in particular, struggle to effectively handle and analyze complex data matrices. Here we provide a comprehensive guide for the statistical analysis of FBMN results, focusing on the downstream analysis of the FBMN output table. We explain the data structure and principles of data cleanup and normalization, as well as uni- and multivariate statistical analysis of FBMN results. We provide explanations and code in two scripting languages (R and Python) as well as the QIIME2 framework for all protocol steps, from data cleanup to statistical analysis. All code is shared in the form of Jupyter Notebooks (https://github.com/Functional-Metabolomics-Lab/FBMN-STATS). Additionally, the protocol is accompanied by a web application with a graphical user interface (https://fbmn-statsguide.gnps2.org/) to lower the barrier of entry for new users and for educational purposes. Finally, we also show users how to integrate their statistical results into the molecular network using the Cytoscape visualization tool. Throughout the protocol, we use a previously published environmental metabolomics dataset for demonstration purposes. Together, the protocol, code and web application provide a complete guide and toolbox for FBMN data integration, cleanup and advanced statistical analysis, enabling new users to uncover molecular insights from their non-targeted metabolomics data. Our protocol is tailored for the seamless analysis of FBMN results from Global Natural Products Social Molecular Networking and can be easily adapted to other mass spectrometry feature detection, annotation and networking tools.
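As a rough illustration of what cleanup and normalization of a feature table can look like, the sketch below removes features that are mostly explained by a blank and then scales each sample by its total intensity. The abstract does not spell out these steps, so the two functions, the 0.3 cutoff, the column names, and the toy intensities are assumptions and are not taken from the FBMN-STATS notebooks.

```python
def remove_blank_features(table, sample_cols, blank_cols, cutoff=0.3):
    """Keep features whose mean blank intensity is below cutoff * mean sample intensity.

    The cutoff value is an illustrative assumption, not a recommended setting.
    """
    kept = {}
    for feature, row in table.items():
        sample_mean = sum(row[c] for c in sample_cols) / len(sample_cols)
        blank_mean = sum(row[c] for c in blank_cols) / len(blank_cols)
        if blank_mean < cutoff * sample_mean:
            kept[feature] = row
    return kept


def tic_normalize(table, sample_cols):
    """Divide each intensity by the total intensity of its sample (column sum)."""
    totals = {c: sum(row[c] for row in table.values()) for c in sample_cols}
    return {
        feature: {c: row[c] / totals[c] for c in sample_cols}
        for feature, row in table.items()
    }


# Toy feature table: rows are features, columns are two samples and one blank.
feature_table = {
    "F1": {"S1": 100.0, "S2": 300.0, "Blank": 5.0},
    "F2": {"S1": 300.0, "S2": 100.0, "Blank": 0.0},
    "F3": {"S1": 20.0, "S2": 10.0, "Blank": 40.0},  # dominated by the blank
}
cleaned = remove_blank_features(feature_table, ["S1", "S2"], ["Blank"])
normalized = tic_normalize(cleaned, ["S1", "S2"])
print(sorted(cleaned), normalized["F1"])
```

After normalization every sample column sums to 1, which makes relative abundances comparable across samples before uni- or multivariate testing.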

3.
Nat Prod Rep ; 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39148455

ABSTRACT

Artificial intelligence (AI) is accelerating how we conduct science, from folding proteins with AlphaFold and summarizing literature findings with large language models, to annotating genomes and prioritizing newly generated molecules for screening using specialized software. However, the application of AI to emulate human cognition in natural product research and its subsequent impact has so far been limited. One reason for this limited impact is that available natural product data is multimodal, unbalanced, unstandardized, and scattered across many data repositories. This makes natural product data challenging to use with existing deep learning architectures that consume fairly standardized, often non-relational, data. It also prevents models from learning overarching patterns in natural product science. In this Viewpoint, we address this challenge and support ongoing initiatives aimed at democratizing natural product data by collating our collective knowledge into a knowledge graph. By doing so, we believe there will be an opportunity to use such a knowledge graph to develop AI models that can truly mimic natural product scientists' decision-making.

4.
Front Toxicol ; 6: 1401036, 2024.
Article in English | MEDLINE | ID: mdl-39086553

ABSTRACT

The cell painting (CP) assay has emerged as a potent imaging-based high-throughput phenotypic profiling (HTPP) tool that provides comprehensive input data for in silico prediction of compound activities and potential hazards in drug discovery and toxicology. CP enables the rapid, multiplexed investigation of various molecular mechanisms for thousands of compounds at the single-cell level. The resulting large volumes of image data provide great opportunities but also pose challenges to image and data analysis routines as well as property prediction models. This review addresses the integration of CP-based phenotypic data, together with or in place of structural information from compounds, into machine learning (ML) and deep learning (DL) models to predict compound activities for various human-relevant disease endpoints and to identify the underlying modes-of-action (MoA) while avoiding unnecessary animal testing. The successful application of CP in combination with powerful ML/DL models promises further advances in understanding the compound responses of cells, guiding therapeutic development and risk assessment. Therefore, this review highlights the importance of unlocking the potential of CP assays when combined with molecular fingerprints for compound evaluation and discusses the current challenges that are associated with this approach.

5.
J Cheminform ; 16(1): 88, 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39075613

ABSTRACT

Mass spectral libraries have proven to be essential for mass spectrum annotation, both for library matching and training new machine learning algorithms. A key requirement for training machine learning models is the availability of high-quality training data. Public libraries of mass spectrometry data that are open to user submission often suffer from limited metadata curation and harmonization. The resulting variability in data quality makes training of machine learning models challenging. Here we present a library cleaning pipeline designed for cleaning tandem mass spectrometry library data. The pipeline is designed with ease of use, flexibility, and reproducibility as leading principles. SCIENTIFIC CONTRIBUTION: This pipeline will result in cleaner public mass spectral libraries that will improve library searching and the quality of machine-learning training datasets in mass spectrometry. This pipeline builds on previous work by adding new functionality for curating and correcting annotated libraries, by validating structure annotations. Due to the high quality of our software, the reproducibility, and improved logging, we think our new pipeline has the potential to become the standard in the field for cleaning tandem mass spectrometry libraries.

6.
J Cheminform ; 16(1): 58, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38783386

ABSTRACT

Effective visualization of small molecules is paramount in conveying concepts and results in cheminformatics. Scalable vector graphics (SVG) are preferred for creating such visualizations, as SVGs can be easily altered in post-production and exported to other formats. A wide spectrum of software applications already exist that can visualize molecules, and customize these visualizations, in many ways. However, software packages that can output projected 3D models onto a 2D canvas directly as SVG, while being programmatically accessible from Python, are lacking. Here, we introduce CineMol, which can draw vectorized approximations of three-dimensional small molecule models in seconds, without triangulation or ray tracing, resulting in files of around 50-300 kilobytes per molecule model for compounds with up to 45 heavy atoms. The SVGs outputted by CineMol can be readily modified in popular vector graphics editing software applications. CineMol is written in Python and can be incorporated into any existing Python cheminformatics workflow, as it only depends on native Python libraries. CineMol also provides programmatic access to all its internal states, allowing for per-atom and per-bond-based customization. CineMol's capacity to programmatically create molecular visualizations suitable for post-production offers researchers and scientists a powerful tool for enhancing the clarity and visual impact of their scientific presentations and publications in cheminformatics, metabolomics, and related scientific disciplines. SCIENTIFIC CONTRIBUTION: We introduce CineMol, a Python-based tool that provides a valuable solution for cheminformatics researchers by enabling the direct generation of high-quality approximations of two-dimensional SVG visualizations from three-dimensional small molecule models, all within a programmable Python framework.
CineMol offers a unique combination of speed, efficiency, and accessibility, making it an indispensable tool for researchers in cheminformatics, especially when working with SVG visualizations.

7.
Nat Protoc ; 19(9): 2597-2641, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38769143

ABSTRACT

Untargeted mass spectrometry (MS) experiments produce complex, multidimensional data that are practically impossible to investigate manually. For this reason, computational pipelines are needed to extract relevant information from raw spectral data and convert it into a more comprehensible format. Depending on the sample type and/or goal of the study, a variety of MS platforms can be used for such analysis. MZmine is an open-source software for the processing of raw spectral data generated by different MS platforms. Examples include liquid chromatography-MS, gas chromatography-MS and MS-imaging. These data might typically be associated with various applications including metabolomics and lipidomics. Moreover, the third version of the software, described herein, supports the processing of ion mobility spectrometry (IMS) data. The present protocol provides three distinct procedures to perform feature detection and annotation of untargeted MS data produced by different instrumental setups: liquid chromatography-(IMS-)MS, gas chromatography-MS and (IMS-)MS imaging. For training purposes, example datasets are provided together with configuration batch files (i.e., list of processing steps and parameters) to allow new users to easily replicate the described workflows. Depending on the number of data files and available computing resources, we anticipate this to take between 2 and 24 h for new MZmine users and nonexperts. Within each procedure, we provide a detailed description for all processing parameters together with instructions/recommendations for their optimization. The main generated outputs are represented by aligned feature tables and fragmentation spectra lists that can be used by other third-party tools for further downstream analysis.


Subjects
Mass Spectrometry , Software , Mass Spectrometry/methods , Chromatography, Liquid/methods , Metabolomics/methods , Reproducibility of Results , Ion Mobility Spectrometry/methods , Gas Chromatography-Mass Spectrometry/methods
8.
Metabolomics ; 20(3): 62, 2024 May 25.
Article in English | MEDLINE | ID: mdl-38796627

ABSTRACT

INTRODUCTION: The chemical classification of Cannabis is typically confined to the cannabinoid content, whilst Cannabis encompasses diverse chemical classes that vary in abundance among all its varieties. Hence, neglecting other chemical classes within Cannabis strains results in a restricted and biased comprehension of elements that may contribute to chemical intricacy and the resultant medicinal qualities of the plant. OBJECTIVES: Thus, herein, we report a computational metabolomics study to elucidate the Cannabis metabolic map beyond the cannabinoids. METHODS: Mass spectrometry-based computational tools were used to mine and evaluate the methanolic leaf and flower extracts of two Cannabis cultivars: Amnesia haze (AMNH) and Royal dutch cheese (RDC). RESULTS: The results revealed the presence of different chemical compound classes, including cannabinoids but also extending to flavonoids and phospholipids, at varying distributions across the cultivar plant tissues, where the phenylpropanoid superclass was more abundant in the leaves than in the flowers. Therefore, the two cultivars were differentiated based on the overall chemical content of their plant tissues, where AMNH was observed to be more dominant in the flavonoid content while RDC was more dominant in the lipid-like molecules. Additionally, in silico molecular docking studies in combination with biological assay studies indicated the potentially differing anti-cancer properties of the two cultivars resulting from the elucidated chemical profiles. CONCLUSION: These findings highlight distinctive chemical profiles beyond cannabinoids in Cannabis strains. This novel mapping of the metabolomic landscape of Cannabis provides actionable insights into plant biochemistry and justifies selecting certain varieties for medicinal use.


Subjects
Cannabis , Metabolomics , Plant Leaves , Cannabis/chemistry , Cannabis/metabolism , Metabolomics/methods , Plant Leaves/metabolism , Plant Leaves/chemistry , Flowers/metabolism , Flowers/chemistry , Plant Extracts/metabolism , Plant Extracts/chemistry , Plant Extracts/pharmacology , Cannabinoids/metabolism , Cannabinoids/analysis , Molecular Docking Simulation , Flavonoids/metabolism , Flavonoids/analysis , Mass Spectrometry/methods
9.
Methods Mol Biol ; 2788: 97-136, 2024.
Article in English | MEDLINE | ID: mdl-38656511

ABSTRACT

Plant specialized metabolites have diversified vastly over the course of plant evolution, and they are considered key players in complex interactions between plants and their environment. The chemical diversity of these metabolites has been widely explored and utilized in agriculture and crop enhancement, the food industry, and drug development, among other areas. However, the immensity of the plant metabolome can make its exploration challenging. Here we describe a protocol for exploring plant specialized metabolites that combines high-resolution mass spectrometry and computational metabolomics strategies, including molecular networking, identification of structural motifs, as well as prediction of chemical structures and metabolite classes.


Subjects
Mass Spectrometry , Metabolome , Metabolomics , Plants , Metabolomics/methods , Plants/metabolism , Mass Spectrometry/methods , Computational Biology/methods
10.
Anal Chem ; 96(15): 5798-5806, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38564584

ABSTRACT

Untargeted metabolomics promises comprehensive characterization of small molecules in biological samples. However, the field is hampered by low annotation rates and abstract spectral data. Despite recent advances in computational metabolomics, manual annotations and manual confirmation of in-silico annotations remain important in the field. Here, exploratory data analysis methods for mass spectral data provide overviews, prioritization, and structural hypothesis starting points to researchers facing large quantities of spectral data. In this research, we propose a fluid means of dealing with mass spectral data using specXplore, an interactive Python dashboard providing interactive and complementary visualizations facilitating mass spectral similarity matrix exploration. Specifically, specXplore provides a two-dimensional t-distributed stochastic neighbor embedding as a jumping board for local connectivity exploration using complementary interactive visualizations in the form of partial network drawings, similarity heatmaps, and fragmentation overview maps. SpecXplore makes use of state-of-the-art ms2deepscore pairwise spectral similarities as a quantitative backbone while allowing fast changes of threshold and connectivity limitation settings, providing flexibility in adjusting settings to suit the localized node environment being explored. We believe that specXplore can become an integral part of mass spectral data exploration efforts and assist users in the generation of structural hypotheses for compounds of interest.
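The threshold and connectivity-limitation settings mentioned in the abstract above can be pictured with a small sketch: for one selected node, keep only neighbours whose similarity passes a threshold, and cap how many are drawn. The function `local_edges`, its parameter names, and the similarity values are hypothetical and do not reflect the specXplore API.

```python
def local_edges(similarity, node, threshold=0.7, max_neighbors=2):
    """Return up to max_neighbors (neighbor, score) pairs with score >= threshold.

    Pairs are ordered from most to least similar. Hypothetical helper that works
    on a precomputed pairwise similarity matrix (list of lists).
    """
    scores = [
        (other, score)
        for other, score in enumerate(similarity[node])
        if other != node and score >= threshold
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:max_neighbors]


# Toy pairwise spectral similarity matrix for five spectra (made-up values).
similarity = [
    [1.0, 0.9, 0.75, 0.8, 0.2],
    [0.9, 1.0, 0.6, 0.3, 0.1],
    [0.75, 0.6, 1.0, 0.4, 0.5],
    [0.8, 0.3, 0.4, 1.0, 0.65],
    [0.2, 0.1, 0.5, 0.65, 1.0],
]
strict = local_edges(similarity, node=0)
relaxed = local_edges(similarity, node=4, threshold=0.6, max_neighbors=5)
print(strict, relaxed)
```

Lowering the threshold for a sparsely connected node (node 4 here) while keeping a strict cap for a densely connected one is the kind of local adjustment the abstract refers to.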

11.
Nat Commun ; 14(1): 8488, 2023 Dec 20.
Article in English | MEDLINE | ID: mdl-38123557

ABSTRACT

Despite the increasing availability of tandem mass spectrometry (MS/MS) community spectral libraries for untargeted metabolomics over the past decade, the majority of acquired MS/MS spectra remain uninterpreted. To further aid in interpreting unannotated spectra, we created a nearest neighbor suspect spectral library, consisting of 87,916 annotated MS/MS spectra derived from hundreds of millions of MS/MS spectra originating from published untargeted metabolomics experiments. Entries in this library, or "suspects," were derived from unannotated spectra that could be linked in a molecular network to an annotated spectrum. Annotations were propagated to unknowns based on structural relationships to reference molecules using MS/MS-based spectrum alignment. We demonstrate the broad relevance of the nearest neighbor suspect spectral library through representative examples of propagation-based annotation of acylcarnitines, bacterial and plant natural products, and drug metabolism. Our results also highlight how the library can help to better understand an Alzheimer's brain phenotype. The nearest neighbor suspect spectral library is openly available for download or for data analysis through the GNPS platform to help investigators hypothesize candidate structures for unknown MS/MS spectra in untargeted metabolomics data.


Subjects
Access to Information , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Metabolomics/methods , Gene Library , Cluster Analysis
12.
Nat Rev Drug Discov ; 22(11): 895-916, 2023 11.
Article in English | MEDLINE | ID: mdl-37697042

ABSTRACT

Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.


Subjects
Artificial Intelligence , Biological Products , Humans , Algorithms , Machine Learning , Drug Discovery , Drug Design , Biological Products/pharmacology
13.
Photochem Photobiol Sci ; 22(10): 2341-2356, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37505444

ABSTRACT

UV-B radiation regulates numerous morphogenic, biochemical and physiological responses in plants, and can stimulate some responses typically associated with other abiotic and biotic stimuli, including invertebrate herbivory. Removal of UV-B from the growing environment of various plant species has been found to increase their susceptibility to consumption by invertebrate pests; however, to date, little research has been conducted to investigate the effects of UV-B on crop susceptibility to field pests. Here, we report findings from a multi-omic and genetic-based study investigating the mechanisms of UV-B-stimulated resistance of the crop, Brassica napus (oilseed rape), to herbivory from an economically important lepidopteran specialist of the Brassicaceae, Plutella xylostella (diamondback moth). The UV-B photoreceptor, UV RESISTANCE LOCUS 8 (UVR8), was not found to mediate resistance to this pest. RNA-Seq and untargeted metabolomics identified components of the sinapate/lignin biosynthetic pathway that were similarly regulated by UV-B and herbivory. Arabidopsis mutants in genes encoding two enzymes in the sinapate/lignin biosynthetic pathway, CAFFEATE O-METHYLTRANSFERASE 1 (COMT1) and ELICITOR-ACTIVATED GENE 3-2 (ELI3-2), retained UV-B-mediated resistance to P. xylostella herbivory. However, the overexpression of B. napus COMT1 in Arabidopsis further reduced plant susceptibility to P. xylostella herbivory in a UV-B-dependent manner. These findings demonstrate that overexpression of a component of the sinapate/lignin biosynthetic pathway in a member of the Brassicaceae can enhance UV-B-stimulated resistance to herbivory from P. xylostella.


Subjects
Arabidopsis , Brassica napus , Moths , Animals , Arabidopsis/genetics , Arabidopsis/radiation effects , Brassica napus/genetics , Herbivory , Lignin , Moths/physiology , Plants
14.
Proc Natl Acad Sci U S A ; 120(25): e2219373120, 2023 Jun 20.
Article in English | MEDLINE | ID: mdl-37319116

ABSTRACT

Fungus-growing ants depend on a fungal mutualist that can fall prey to fungal pathogens. This mutualist is cultivated by these ants in structures called fungus gardens. Ants exhibit weeding behaviors that keep their fungus gardens healthy by physically removing compromised pieces. However, how ants detect diseases of their fungus gardens is unknown. Here, we applied the logic of Koch's postulates using environmental fungal community gene sequencing, fungal isolation, and laboratory infection experiments to establish that Trichoderma spp. can act as previously unrecognized pathogens of Trachymyrmex septentrionalis fungus gardens. Our environmental data showed that Trichoderma are the most abundant noncultivar fungi in wild T. septentrionalis fungus gardens. We further determined that metabolites produced by Trichoderma induce an ant weeding response that mirrors their response to live Trichoderma. Combining ant behavioral experiments with bioactivity-guided fractionation and statistical prioritization of metabolites in Trichoderma extracts demonstrated that T. septentrionalis ants weed in response to peptaibols, a specific class of secondary metabolites known to be produced by Trichoderma fungi. Similar assays conducted using purified peptaibols, including the two previously undescribed peptaibols trichokindins VIII and IX, suggested that weeding is likely induced by peptaibols as a class rather than by a single peptaibol metabolite. In addition to their presence in laboratory experiments, we detected peptaibols in wild fungus gardens. Our combination of environmental data and laboratory infection experiments strongly supports that peptaibols act as chemical cues of Trichoderma pathogenesis in T. septentrionalis fungus gardens.


Subjects
Ants , Laboratory Infection , Trichoderma , Animals , Ants/physiology , Gardens , Cues , Symbiosis , Peptaibols
15.
Trends Pharmacol Sci ; 44(8): 532-541, 2023 08.
Article in English | MEDLINE | ID: mdl-37391295

ABSTRACT

Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a chemically diverse class of metabolites. Many RiPPs show potent biological activities that make them attractive starting points for drug development. A promising approach for the discovery of new classes of RiPPs is genome mining. However, the accuracy of genome mining is hampered by the lack of signature genes shared across different RiPP classes. One way to reduce false-positive predictions is by complementing genomic information with metabolomics data. In recent years, several new approaches addressing such integrative genomics and metabolomics analyses have been developed. In this review, we provide a detailed discussion of RiPP-compatible software tools that integrate paired genomics and metabolomics data. We highlight current challenges in data integration and identify opportunities for further developments targeting new classes of bioactive RiPPs.


Subjects
Biological Products , Humans , Ribosomes/genetics , Ribosomes/metabolism , Peptides , Genomics , Metabolome , Protein Processing, Post-Translational
16.
Curr Opin Chem Biol ; 74: 102288, 2023 06.
Article in English | MEDLINE | ID: mdl-36966702

ABSTRACT

The computational metabolomics field brings together computer scientists, bioinformaticians, chemists, clinicians, and biologists to maximize the impact of metabolomics across a wide array of scientific and medical disciplines. The field continues to expand as modern instrumentation produces datasets with increasing complexity, resolution, and sensitivity. These datasets must be processed, annotated, modeled, and interpreted to enable biological insight. Techniques for visualization, integration (within or between omics), and interpretation of metabolomics data have evolved along with innovation in the databases and knowledge resources required to aid understanding. In this review, we highlight recent advances in the field and reflect on opportunities and innovations in response to the most pressing challenges. This review was compiled from discussions from the 2022 Dagstuhl seminar entitled "Computational Metabolomics: From Spectra to Knowledge".


Subjects
Computational Biology , Metabolomics , Metabolomics/methods , Mass Spectrometry/methods , Databases, Factual , Computational Biology/methods
17.
Front Mol Biosci ; 10: 1130781, 2023.
Article in English | MEDLINE | ID: mdl-36959982

ABSTRACT

Data-Dependent and Data-Independent Acquisition modes (DDA and DIA, respectively) are both widely used to acquire MS2 spectra in untargeted liquid chromatography tandem mass spectrometry (LC-MS/MS) metabolomics analyses. Despite their wide use, little work has been attempted to systematically compare their MS/MS spectral annotation performance in untargeted settings due to the lack of ground truth and the costs involved in running a large number of acquisitions. Here, we present a systematic in silico comparison of these two acquisition methods in untargeted metabolomics by extending our Virtual Metabolomics Mass Spectrometer (ViMMS) framework with a DIA module. Our results show that the performance of these methods varies with the average number of co-eluting ions as the most important factor. At low numbers, DIA outperforms DDA, but at higher numbers, DDA has an advantage as DIA can no longer deal with the large amount of overlapping ion chromatograms. Results from simulation were further validated on an actual mass spectrometer, demonstrating that using ViMMS we can draw conclusions from simulation that translate well into the real world. The versatility of the Virtual Metabolomics Mass Spectrometer (ViMMS) framework in simulating different parameters of both Data-Dependent and Data-Independent Acquisition (DDA and DIA) modes is a key advantage of this work. Researchers can easily explore and compare the performance of different acquisition methods within the ViMMS framework, without the need for expensive and time-consuming experiments with real experimental data. By identifying the strengths and limitations of each acquisition method, researchers can optimize their choice and obtain more accurate and robust results. Furthermore, the ability to simulate and validate results using the ViMMS framework can save significant time and resources, as it eliminates the need for numerous experiments. 
This work not only provides valuable insights into the performance of DDA and DIA, but it also opens the door for further advancements in LC-MS/MS data acquisition methods.
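To make the DDA/DIA distinction in the abstract above concrete, here is a minimal sketch of how each mode decides what to fragment from one MS1 scan: DDA picks the most intense precursors, whereas DIA fragments everything inside fixed isolation windows. The functions `dda_top_n` and `dia_windows`, the window layout, and the peak values are illustrative assumptions and are not part of the ViMMS framework.

```python
def dda_top_n(ms1_peaks, n=2, min_intensity=0.0):
    """Pick the n most intense precursor m/z values from one MS1 scan (DDA-style)."""
    eligible = [(mz, inten) for mz, inten in ms1_peaks if inten >= min_intensity]
    eligible.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in eligible[:n]]


def dia_windows(ms1_peaks, start, end, width):
    """Group all precursors into fixed-width isolation windows [lo, lo + width) (DIA-style)."""
    windows = {}
    lo = start
    while lo < end:
        windows[(lo, lo + width)] = [mz for mz, _ in ms1_peaks if lo <= mz < lo + width]
        lo += width
    return windows


# One toy MS1 scan as (m/z, intensity) pairs; values are made up.
ms1_peaks = [(150.1, 5e4), (152.2, 2e5), (310.3, 8e4), (455.0, 1e3)]
selected = dda_top_n(ms1_peaks, n=2)
windows = dia_windows(ms1_peaks, start=100, end=500, width=200)
print(selected, windows)
```

In this toy scan DDA skips the two weaker ions, while each DIA window contains two co-eluting precursors whose fragments would overlap in one spectrum, which is the trade-off the abstract ties to the number of co-eluting ions.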

18.
Nat Commun ; 14(1): 1752, 2023 03 29.
Article in English | MEDLINE | ID: mdl-36990978

ABSTRACT

Metabolomics-driven discoveries of biological samples remain hampered by the grand challenge of metabolite annotation and identification. Only a few metabolites have an annotated spectrum in spectral libraries; hence, searching only for exact library matches generally returns few hits. An attractive alternative is searching for so-called analogues as a starting point for structural annotations; analogues are library molecules which are not exact matches but display a high chemical similarity. However, current analogue search implementations are not yet very reliable and are relatively slow. Here, we present MS2Query, a machine learning-based tool that integrates mass spectral embedding-based chemical similarity predictors (Spec2Vec and MS2Deepscore) as well as detected precursor masses to rank potential analogues and exact matches. Benchmarking MS2Query on reference mass spectra and experimental case studies demonstrates improved reliability and scalability. Thereby, MS2Query offers exciting opportunities to further increase the annotation rate of metabolomics profiles of complex metabolite mixtures and to discover new biology.


Subjects
Machine Learning , Metabolomics , Reproducibility of Results , Mass Spectrometry , Complex Mixtures
19.
PLoS Comput Biol ; 19(2): e1010462, 2023 02.
Article in English | MEDLINE | ID: mdl-36758069

ABSTRACT

Microbial specialised metabolism is full of valuable natural products that are applied clinically, agriculturally, and industrially. The genes that encode their biosynthesis are often physically clustered on the genome in biosynthetic gene clusters (BGCs). Many BGCs consist of multiple groups of co-evolving genes called sub-clusters that are responsible for the biosynthesis of a specific chemical moiety in a natural product. Sub-clusters therefore provide an important link between the structures of a natural product and its BGC, which can be leveraged for predicting natural product structures from sequence, as well as for linking chemical structures and metabolomics-derived mass features to BGCs. While some initial computational methodologies have been devised for sub-cluster detection, current approaches are not scalable, have only been run on small and outdated datasets, or produce an impractically large number of possible sub-clusters to mine through. Here, we constructed a scalable method for unsupervised sub-cluster detection, called iPRESTO, based on topic modelling and statistical analysis of co-occurrence patterns of enzyme-coding protein families. iPRESTO was used to mine sub-clusters across 150,000 prokaryotic BGCs from antiSMASH-DB. After annotating a fraction of the resulting sub-cluster families, we could predict a substructure for 16% of the antiSMASH-DB BGCs. Additionally, our method was able to confirm 83% of the experimentally characterised sub-clusters in MIBiG reference BGCs. Based on iPRESTO-detected sub-clusters, we could correctly identify the BGCs for xenorhabdin and salbostatin biosynthesis (which had not yet been annotated in BGC databases), as well as propose a candidate BGC for akashin biosynthesis. Additionally, we show for a collection of 145 actinobacteria how substructures can aid in linking BGCs to molecules by correlating iPRESTO-detected sub-clusters to MS/MS-derived Mass2Motifs substructure patterns. 
This work paves the way for deeper functional and structural annotation of microbial BGCs by improved linking of orphan molecules to their cognate gene clusters, thus facilitating accelerated natural product discovery.


Subjects
Biological Products , Tandem Mass Spectrometry , Metabolomics , Bacteria/genetics , Multigene Family
20.
Microbiome ; 11(1): 13, 2023 01 23.
Article in English | MEDLINE | ID: mdl-36691088

ABSTRACT

BACKGROUND: It is well-known that the microbiome produces a myriad of specialised metabolites with diverse functions. To better characterise their structures and identify their producers in complex samples, integrative genome and metabolome mining is becoming increasingly popular. Metabologenomic co-occurrence-based correlation scoring methods facilitate the linking of metabolite mass fragmentation spectra (MS/MS) to their cognate biosynthetic gene clusters (BGCs) based on shared absence/presence patterns of metabolites and BGCs in paired omics datasets of multiple strains. Recently, these methods have been made more readily accessible through the NPLinker platform. However, co-occurrence-based approaches usually result in too many candidate links to manually validate. To address this issue, we introduce a generic feature-based correlation method that matches chemical compound classes between BGCs and MS/MS spectra. RESULTS: To automatically reduce the long lists of potential BGC-MS/MS spectrum links, we match natural product (NP) ontologies previously independently developed for genomics and metabolomics and developed NPClassScore: an empirical class matching score that we also implemented in the NPLinker platform. By applying NPClassScore on three paired omics datasets totalling 189 bacterial strains, we show that the number of links is reduced by on average 63% as compared to using a co-occurrence-based strategy alone. We further demonstrate that 96% of experimentally validated links in these datasets are retained and prioritised when using NPClassScore. CONCLUSION: The matching genome-metabolome class ontologies provide a starting point for selecting plausible candidates for BGCs and MS/MS spectra based on matching chemical compound class ontologies. NPClassScore expedites genome/metabolome data integration, as relevant BGC-metabolite links are prioritised, and researchers are faced with substantially fewer proposed BGC-MS/MS links to manually inspect. 
We anticipate that our addition to the NPLinker platform will aid integrative omics mining workflows in discovering novel NPs and understanding complex metabolic interactions in the microbiome.


Subjects
Biosynthetic Pathways , Tandem Mass Spectrometry , Biosynthetic Pathways/genetics , Genomics , Metabolomics/methods , Multigene Family
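The idea in the abstract above, that co-occurrence across strains yields many candidate BGC-spectrum links and that matching compound classes prunes them, can be sketched as follows. This is a simplified stand-in: it counts shared strains rather than using the correlation scoring implemented in NPLinker, and it replaces the empirical NPClassScore with a plain equality check on class labels. All identifiers and data are hypothetical.

```python
def cooccurrence(bgc_strains, spectrum_strains):
    """Count strains in which a BGC and a spectrum are both present."""
    return len(set(bgc_strains) & set(spectrum_strains))


def candidate_links(bgcs, spectra, min_shared=2, require_class_match=False):
    """List (bgc, spectrum, shared_strains) links, optionally keeping class-matched pairs only."""
    links = []
    for bgc_id, bgc in bgcs.items():
        for spec_id, spec in spectra.items():
            shared = cooccurrence(bgc["strains"], spec["strains"])
            if shared < min_shared:
                continue
            # Simplified class filter: exact label equality (an assumption).
            if require_class_match and bgc["cls"] != spec["cls"]:
                continue
            links.append((bgc_id, spec_id, shared))
    return links


# Toy paired-omics data: strain presence sets and predicted compound classes.
bgcs = {
    "BGC1": {"strains": {"A", "B", "C"}, "cls": "polyketide"},
    "BGC2": {"strains": {"A", "B"}, "cls": "terpene"},
}
spectra = {
    "MS1": {"strains": {"A", "B", "C"}, "cls": "polyketide"},
    "MS2": {"strains": {"A", "B", "D"}, "cls": "peptide"},
}
all_links = candidate_links(bgcs, spectra)
class_matched = candidate_links(bgcs, spectra, require_class_match=True)
print(len(all_links), class_matched)
```

In this toy case the class filter reduces four co-occurrence links to one, mirroring in miniature the reduction in candidate links that the abstract reports for real datasets.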