ABSTRACT
With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.
Subjects
Genome; Genomics; Multigene Family; Biosynthetic Pathways/genetics
ABSTRACT
Major advances in genome sequencing and large-scale biosynthetic gene cluster (BGC) analysis have prompted an age of natural product discovery driven by genome mining. Still, connecting molecules to their cognate BGCs is a substantial bottleneck for this approach. We have developed a mass-spectrometry-based parallel stable isotope labeling platform, termed IsoAnalyst, which assists in associating metabolite stable isotope labeling patterns with BGC structure prediction to connect natural products to their corresponding BGCs. Here we show that IsoAnalyst can quickly associate both known metabolites and unknown analytes with BGCs to elucidate the complex chemical phenotypes of these biosynthetic systems. We validate this approach for a range of compound classes, using both the type strain Saccharopolyspora erythraea and an environmentally isolated Micromonospora sp. We further demonstrate the utility of this tool with the discovery of lobosamide D, a new and structurally unique member of the family of lobosamide macrolactams.
Subjects
Biological Products; Micromonospora; Biosynthetic Pathways/genetics; Isotope Labeling; Multigene Family
ABSTRACT
Within the natural products field there is an increasing emphasis on the study of compounds from microbial sources. This has been fuelled by interest in the central role that microorganisms play in mediating both interspecies interactions and host-microbe relationships. To support the study of natural products chemistry produced by microorganisms we released the Natural Products Atlas, a database of known microbial natural products structures, in 2019. This paper reports the release of a new version of the database which includes a full RESTful application programming interface (API), a new website framework, and an expanded database that includes 8128 new compounds, bringing the total to 32 552. In addition to these structural and content changes we have added full taxonomic descriptions for all microbial taxa and have added chemical ontology terms from both NP Classifier and ClassyFire. We have also performed manual curation to review all entries with incomplete configurational assignments and have integrated data from external resources, including CyanoMetDB. Finally, we have improved the user experience by updating the Overview dashboard and creating a dashboard for taxonomic origin. The database can be accessed via the new interactive website at https://www.npatlas.org.
Subjects
Biological Products/classification; Databases, Factual; Host Microbial Interactions/genetics; Software; Bacteria/classification; Classification; Fungi/classification; Humans; User-Computer Interface
ABSTRACT
The Natural Products Magnetic Resonance Database (NP-MRD) is a comprehensive, freely available electronic resource for the deposition, distribution, searching and retrieval of nuclear magnetic resonance (NMR) data on natural products, metabolites and other biologically derived chemicals. NMR spectroscopy has long been viewed as the 'gold standard' for the structure determination of novel natural products and novel metabolites. NMR is also widely used in natural product dereplication and the characterization of biofluid mixtures (metabolomics). All of these NMR applications require large collections of high quality, well-annotated, referential NMR spectra of pure compounds. Unfortunately, referential NMR spectral collections for natural products are quite limited. It is because of the critical need for dedicated, open access natural product NMR resources that the NP-MRD was funded by the National Institutes of Health (NIH). Since its launch in 2020, the NP-MRD has grown quickly to become the world's largest repository for NMR data on natural products and other biological substances. It currently contains both structural and NMR data for nearly 41,000 natural product compounds from >7400 different living species. All structural, spectroscopic and descriptive data in the NP-MRD is interactively viewable, searchable and fully downloadable in multiple formats. Extensive hyperlinks to other databases of relevance are also provided. The NP-MRD also supports community deposition of NMR assignments and NMR spectra (1D and 2D) of natural products and related meta-data. The deposition system performs extensive data enrichment, automated data format conversion and spectral/assignment evaluation. Details of these database features, how they are implemented and plans for future upgrades are also provided. The NP-MRD is available at https://np-mrd.org.
Subjects
Biological Products/chemistry; Databases, Factual; Magnetic Resonance Spectroscopy; Software; Biological Products/classification; Internet
ABSTRACT
Nuclear magnetic resonance (NMR) data are rarely deposited in open databases, leading to loss of critical scientific knowledge. Existing data reporting methods (images, tables, lists of values) contain less information than raw data and are poorly standardized. Together, these issues limit FAIR (findable, accessible, interoperable, reusable) access to these data, which in turn creates barriers for compound dereplication and the development of new data-driven discovery tools. Existing NMR databases either are not designed for natural products data or employ complex deposition interfaces that disincentivize deposition. Journals, including the Journal of Natural Products (JNP), are now requiring data submission as part of the publication process, creating the need for a streamlined, user-friendly mechanism to deposit and distribute NMR data.
Subjects
Biological Products; Databases, Factual; Magnetic Resonance Spectroscopy
ABSTRACT
Fueled by the explosion of (meta)genomic data, genome mining of specialized metabolites has become a major technology for drug discovery and studying microbiome ecology. In these efforts, computational tools like antiSMASH have played a central role through the analysis of Biosynthetic Gene Clusters (BGCs). Thousands of candidate BGCs from microbial genomes have been identified and stored in public databases. Interpreting the function and novelty of these predicted BGCs requires comparison with a well-documented set of BGCs of known function. The MIBiG (Minimum Information about a Biosynthetic Gene Cluster) Data Standard and Repository was established in 2015 to enable curation and storage of known BGCs. Here, we present MIBiG 2.0, which encompasses major updates to the schema, the data, and the online repository itself. Over the past five years, 851 new BGCs have been added. Additionally, we performed extensive manual data curation of all entries to improve the annotation quality of our repository. We also redesigned the data schema to ensure the compliance of future annotations. Finally, we improved the user experience by adding new features such as query searches and a statistics page, and enabled direct link-outs to chemical structure databases. The repository is accessible online at https://mibig.secondarymetabolites.org/.
Subjects
Databases, Genetic; Genome, Bacterial; Genomics/methods; Multigene Family; Software; Biosynthetic Pathways/genetics; Molecular Sequence Annotation
ABSTRACT
The applicability of the Evans-Polanyi (EP) relationship to HAT reactions from C(sp³)-H bonds to the cumyloxyl radical (CumO•) has been investigated. A consistent set of rate constants, kH, for HAT from the C-H bonds of 56 substrates to CumO•, spanning a range of more than 4 orders of magnitude, has been measured under identical experimental conditions. A corresponding set of consistent gas-phase C-H bond dissociation enthalpies (BDEs) spanning 27 kcal mol⁻¹ has been calculated using the (RO)CBS-QB3 method. The log kH' vs C-H BDE plot shows two distinct EP relationships, one for substrates bearing benzylic and allylic C-H bonds (unsaturated group) and the other one, with a steeper slope, for saturated hydrocarbons, alcohols, ethers, diols, amines, and carbamates (saturated group), in line with the bimodal behavior observed previously in theoretical studies of reactions promoted by other HAT reagents. The parallel use of BDFEs instead of BDEs allows the transformation of this correlation into a linear free energy relationship, analyzed within the framework of the Marcus theory. The ΔG⧧HAT vs ΔG°HAT plot shows again distinct behaviors for the two groups. A good fit to the Marcus equation is observed only for the saturated group, with λ = 58 kcal mol⁻¹, indicating that with the unsaturated group λ must increase with increasing driving force. Taken together these results provide a qualitative connection between Bernasconi's principle of nonperfect synchronization and Marcus theory and suggest that the observed bimodal behavior is a general feature in the reactions of oxygen-based HAT reagents with C(sp³)-H donors.
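As a rough numerical illustration of the Marcus analysis mentioned in this abstract, the sketch below evaluates the standard Marcus expression, ΔG‡ = (λ/4)(1 + ΔG°/λ)², with work terms neglected. The abstract does not state which form of the equation was fitted, so this form, the function name and the sample driving forces are assumptions; only λ = 58 kcal/mol is taken from the text.

```python
def marcus_barrier(dg0: float, lam: float = 58.0) -> float:
    """Activation free energy (kcal/mol) from the standard Marcus equation.

    dg0: reaction free energy (kcal/mol); negative means exergonic.
    lam: reorganization energy (kcal/mol); 58 is the value quoted in the abstract.
    Work terms are neglected, which is an assumption of this sketch.
    """
    return (lam / 4.0) * (1.0 + dg0 / lam) ** 2


if __name__ == "__main__":
    # Hypothetical driving forces, chosen only for illustration.
    for dg0 in (0.0, -5.0, -10.0, -15.0):
        barrier = marcus_barrier(dg0)
        print(f"dG0 = {dg0:6.1f} kcal/mol -> dG_act = {barrier:5.2f} kcal/mol")
```

At zero driving force the expression reduces to the intrinsic barrier λ/4, and the barrier falls as the reaction becomes more exergonic (for |ΔG°| < λ).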
Subjects
Density Functional Theory; Hydrogen/chemistry; Free Radicals/chemistry; Kinetics; Molecular Structure; Time Factors
ABSTRACT
Covering: 2010-2020
The digital revolution is driving significant changes in how people store, distribute, and use information. With the advent of new technologies around linked data, machine learning and large-scale network inference, the natural products research field is beginning to embrace real-time sharing and large-scale analysis of digitized experimental data. Databases play a key role in this, as they allow systematic annotation and storage of data for both basic and advanced applications. The quality of the content, structure, and accessibility of these databases all contribute to their usefulness for the scientific community in practice. This review covers the development of databases relevant for microbial natural product discovery during the past decade (2010-2020), including repositories of chemical structures/properties, metabolomics, and genomic data (biosynthetic gene clusters). It provides an overview of the most important databases and their functionalities, highlights some early meta-analyses using such databases, and discusses basic principles to enable widespread interoperability between databases. Furthermore, it points out conceptual and practical challenges in the curation and usage of natural products databases. Finally, the review closes with a discussion of key action points required for the field moving forward, not only for database developers but for any scientist active in the field.
Subjects
Biological Products; Databases, Factual; Microbiology; Anti-Bacterial Agents; Biosynthetic Pathways/genetics; Databases, Chemical; Databases, Pharmaceutical; Information Storage and Retrieval; Metabolomics; Multigene Family
ABSTRACT
The development of new "omics" platforms is having a significant impact on the landscape of natural products discovery. However, despite the advantages that such platforms bring to the field, there remains no straightforward method for characterizing the chemical landscape of natural products libraries using two-dimensional nuclear magnetic resonance (2D-NMR) experiments. NMR analysis provides a powerful complement to mass spectrometric approaches, given the universal coverage of NMR experiments. However, the high degree of signal overlap, particularly in one-dimensional NMR spectra, has limited applications of this approach. To address this issue, we have developed a new data analysis platform for complex mixture analysis, termed MADByTE (Metabolomics and Dereplication by Two-Dimensional Experiments). This platform employs a combination of TOCSY and HSQC spectra to identify spin system features within complex mixtures and then matches spin system features between samples to create a chemical similarity network for a given sample set. In this report we describe the design and construction of the MADByTE platform and demonstrate the application of chemical similarity networks for both the dereplication of known compound scaffolds and the prioritization of bioactive metabolites from a bacterial prefractionated extract library.
Subjects
Biological Products/chemistry; Complex Mixtures/chemistry; Magnetic Resonance Spectroscopy/methods; Metabolomics; Software; User-Computer Interface
ABSTRACT
The use of pairwise dispersion corrections together with dispersion-correcting potentials (DCPs) offers a computationally low-cost approach to improving the performance of a density-functional theory based method with respect to the prediction of important chemical properties. In this work, we develop DCPs for the C, H, N, and O atoms for use with the BLYP generalized gradient approximation functional coupled with "D3" pairwise dispersion corrections and 6-31+G(2d,2p) basis sets. The combined approach, referred to as BLYP-D3-DCP, offers generally improved performance over both unadorned BLYP and BLYP with D3 corrections with respect to the prediction of noncovalent binding energies (BEs) and covalent bond dissociation enthalpies (BDEs). Predicted barrier heights for a set of pericyclic and Diels-Alder reactions are improved in some instances, as are organic bond separation reaction energies and radical stabilization energies. It is also shown that the BLYP-D3-DCP approach outperforms B3LYP-D3 in the prediction of many chemical properties, in particular noncovalent BEs and BDEs, suggesting that the addition of D3 and DCP corrections, which have negligible computational cost, to simple density functionals like BLYP may elevate their performance to that of more complex functionals such as B3LYP.
ABSTRACT
Spectral matching of MS2 fragmentation spectra has become a popular method for characterizing natural products libraries but identification remains challenging due to differences in MS2 fragmentation properties between instruments and the low coverage of current spectral reference libraries. To address this bottleneck we present Structural similarity Network Annotation Platform for Mass Spectrometry (SNAP-MS) which matches chemical similarity grouping in the Natural Products Atlas to grouping of mass spectrometry features from molecular networking. This approach assigns compound families to molecular networking subnetworks without the need for experimental or calculated reference spectra. We demonstrate SNAP-MS can accurately annotate subnetworks built from both reference spectra and an in-house microbial extract library, and correctly predict compound families from published molecular networks acquired on a range of MS instrumentation. Compound family annotations for the microbial extract library are validated by co-injection of standards or isolation and spectroscopic analysis. SNAP-MS is freely available at www.npatlas.org/discover/snapms.
Subjects
Biological Products; Humans; Mass Spectrometry
ABSTRACT
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Subjects
Artificial Intelligence; Biological Products; Humans; Algorithms; Machine Learning; Drug Discovery; Drug Design; Biological Products/pharmacology
ABSTRACT
Few tools exist in natural products discovery to integrate biological screening and untargeted mass spectrometry data at the library scale. Previously, we reported Compound Activity Mapping as a strategy for predicting compound bioactivity profiles directly from primary screening results on extract libraries. We now present NP Analyst, an open online platform for Compound Activity Mapping that accepts bioassay data of almost any type, and is compatible with mass spectrometry data from major instrument manufacturers via the mzML format. In addition, NP Analyst will accept processed mass spectrometry data from the MZmine 2 and GNPS open-source platforms, making it a versatile tool for integration with existing discovery workflows. We demonstrate the utility of this new tool for both the dereplication of known compounds and the discovery of novel bioactive natural products using a challenging low-resolution antimicrobial bioassay data set. This new platform is available at www.npanalyst.org.
ABSTRACT
Despite rapid evolution in the area of microbial natural products chemistry, there is currently no open access database containing all microbially produced natural product structures. Lack of availability of these data is preventing the implementation of new technologies in natural products science. Specifically, development of new computational strategies for compound characterization and identification is being hampered by the lack of a comprehensive database of known compounds against which to compare experimental data. The creation of an open access, community-maintained database of microbial natural product structures would enable the development of new technologies in natural products discovery and improve the interoperability of existing natural products data resources. However, these data are spread unevenly throughout the historical scientific literature, including both journal articles and international patents. These documents have no standard format, are often not digitized as machine readable text, and are not publicly available. Further, none of these documents have associated structure files (e.g., MOL, InChI, or SMILES), instead containing images of structures. This makes extraction and formatting of relevant natural products data a formidable challenge. Using a combination of manual curation and automated data mining approaches we have created a database of microbial natural products (The Natural Products Atlas, www.npatlas.org) that includes 24 594 compounds and contains referenced data for structure, compound names, source organisms, isolation references, total syntheses, and instances of structural reassignment. This database is accompanied by an interactive web portal that permits searching by structure, substructure, and physical properties. The Web site also provides mechanisms for visualizing natural products chemical space and dashboards for displaying author and discovery timeline data. These interactive tools offer a powerful knowledge base for natural products discovery with a central interface for structure and property-based searching and present new viewpoints on structural diversity in natural products. The Natural Products Atlas has been developed under FAIR principles (Findable, Accessible, Interoperable, and Reusable) and is integrated with other emerging natural product databases, including the Minimum Information About a Biosynthetic Gene Cluster (MIBiG) repository, and the Global Natural Products Social Molecular Networking (GNPS) platform. It is designed as a community-supported resource to provide a central repository for known natural product structures from microorganisms and is the first comprehensive, open access resource of this type. It is expected that the Natural Products Atlas will enable the development of new natural products discovery modalities and accelerate the process of structural characterization for complex natural products libraries.