ABSTRACT
Microorganisms produce small bioactive compounds as part of their secondary or specialised metabolism. Often, such metabolites have antimicrobial, anticancer, antifungal, antiviral or other bioactivities and thus play an important role in medical and agricultural applications. In the past decade, genome mining has become a widely used method to explore, access, and analyse the available biodiversity of these compounds. Since 2011, the 'antibiotics and secondary metabolite analysis shell' (antiSMASH; https://antismash.secondarymetabolites.org/) has supported researchers in their microbial genome mining tasks, both as a free-to-use web server and as a standalone tool under an OSI-approved open source licence. It is currently the most widely used tool for detecting and characterising biosynthetic gene clusters (BGCs) in archaea, bacteria, and fungi. Here, we present the updated version 7 of antiSMASH. antiSMASH 7 increases the number of supported cluster types from 71 to 81 and includes improvements in the areas of chemical structure prediction, enzymatic assembly-line visualisation and gene cluster regulation.
Subjects
Computers, Software, Bacteria/genetics, Bacteria/metabolism, Archaea/genetics, Microbial Genome, Multigene Family, Secondary Metabolism/genetics
ABSTRACT
With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.
Subjects
Genome, Genomics, Multigene Family, Biosynthetic Pathways/genetics
ABSTRACT
Biomacromolecules are known to feature complex three-dimensional shapes that are essential for their function. Among natural products, ambiguous molecular shapes are a rare phenomenon. The hexapeptide tryptorubin A can adopt one of two unusual atropisomeric configurations. Although tryptorubin A was initially hypothesized to be a non-ribosomal peptide, we show that it is the first characterized member of a new family of ribosomally synthesized and posttranslationally modified peptides (RiPPs) that we named atropopeptides. The sole modifying enzyme encoded in the gene cluster, a cytochrome P450 monooxygenase, is responsible for the atropospecific formation of one carbon-carbon and two carbon-nitrogen bonds. The characterization of two additional atropopeptide biosynthetic pathways revealed a two-step maturation process. Atropopeptides promote pro-angiogenic cell functions as indicated by an increase in endothelial cell proliferation and undirected migration. Our study expands the biochemical space of RiPP-modifying enzymes and paves the way towards the chemoenzymatic utilization of atropopeptide-modifying P450s.
Subjects
Biological Products, Ribosomes, Biological Products/chemistry, Carbon/metabolism, Mixed Function Oxygenases/metabolism, Multigene Family, Nitrogen/metabolism, Peptides/chemistry, Post-Translational Protein Processing, Ribosomes/metabolism
ABSTRACT
Natural products are structurally highly diverse and exhibit a wide array of biological activities. As a result, they serve as an important source of new drug leads. Traditionally, natural products have been discovered by bioactivity-guided fractionation. The advent of genome sequencing technology has resulted in the introduction of an alternative approach towards novel natural product scaffolds: genome mining. Genome mining is an in silico natural product discovery strategy in which sequenced genomes are analyzed for the potential of the associated organism to produce natural products. Seemingly universal biosynthetic principles have been deciphered for most natural product classes; these principles are used to detect natural product biosynthetic gene clusters with pathway-encoded conserved key enzymes, domains, or motifs as bait. Several generations of highly sophisticated tools have been developed for the biosynthetic rule-based identification of natural product gene clusters. Apart from these hard-coded algorithms, multiple tools that use machine learning-based approaches have been designed to complement the existing genome mining tool set and focus on natural product gene clusters that lack genes with conserved signature sequences. In this perspective, we take a closer look at state-of-the-art genome mining tools that are based on either hard-coded rules or machine learning algorithms, with an emphasis on the confidence of their predictions and potential to identify non-canonical natural product biosynthetic gene clusters. We highlight the genome mining pipelines' current strengths and limitations by contrasting their advantages and disadvantages. Moreover, we introduce two indirect biosynthetic gene cluster identification strategies that complement current workflows. The combination of all genome mining approaches will pave the way towards a more comprehensive understanding of the full biosynthetic repertoire encoded in microbial genome sequences.
ABSTRACT
Natural products are molecules that fulfil a range of important ecological functions. Many natural products have been exploited for pharmaceutical and agricultural applications. In contrast to many other specialised metabolites, the products of modular nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) systems can often (partially) be predicted from the DNA sequence of the biosynthetic gene clusters. This is because the biosynthetic pathways of NRPS and PKS systems adhere to consistent rulesets. These universal biosynthetic rules can be leveraged to generate biosynthetic models of biosynthetic pathways. While these principles have been largely deciphered, software that leverages these rules to automatically generate visualisations of biosynthetic models has not yet been developed. To enable high-quality automated visualisations of natural product biosynthetic pathways, we developed RAIChU (Reaction Analysis through Illustrating Chemical Units), which produces depictions of biosynthetic transformations of PKS, NRPS, and hybrid PKS/NRPS systems from predicted or experimentally verified module architectures and domain substrate specificities. RAIChU also boasts a library of functions to perform and visualise reactions and pathways whose specifics (e.g., regioselectivity, stereoselectivity) are still difficult to predict, including terpenes, ribosomally synthesised and posttranslationally modified peptides and alkaloids. Additionally, RAIChU includes 34 prevalent tailoring reactions to enable the visualisation of biosynthetic pathways of fully maturated natural products. RAIChU can be integrated into Python pipelines, allowing users to upload and edit results from antiSMASH, a widely used BGC detection and annotation tool, or to build biosynthetic PKS/NRPS systems from scratch. RAIChU's cluster drawing correctness (100%) and drawing readability (97.66%) were validated on 5000 randomly generated PKS/NRPS systems, and on the MIBiG database. 
The automated visualisation of these pathways accelerates the generation of biosynthetic models, facilitates the analysis of large (meta)genomic datasets and reduces human error. RAIChU is available at https://github.com/BTheDragonMaster/RAIChU and https://pypi.org/project/raichu.
Scientific contribution
RAIChU is the first software package capable of automating high-quality visualisations of natural product biosynthetic pathways. By leveraging universal biosynthetic rules, RAIChU enables the depiction of complex biosynthetic transformations for PKS, NRPS, ribosomally synthesised and posttranslationally modified peptide (RiPP), terpene and alkaloid systems, enhancing predictive and analytical capabilities. This innovation not only streamlines the creation of biosynthetic models, making the analysis of large genomic datasets more efficient and accurate, but also bridges a crucial gap in predicting and visualising the complexities of natural product biosynthesis.
ABSTRACT
Ribosomally synthesized and posttranslationally modified peptides (RiPPs) constitute a diverse class of natural products. Atropopeptides are a recent addition to the class. Here we developed AtropoFinder, a genome mining algorithm to chart the biosynthetic landscape of the atropopeptides. AtropoFinder identified more than 650 atropopeptide biosynthetic gene clusters (BGCs). We pinpointed crucial motifs and residues in leader and core peptide sequences, prompting a refined definition of the atropopeptide RiPP family. Our study revealed that a substantial subset of atropopeptide BGCs harbors multiple tailoring genes, thus suggesting a broader structural diversity than previously anticipated. To verify AtropoFinder, we heterologously expressed four atropopeptide BGCs, which resulted in the identification of novel atropopeptides with varying peptide lengths, number and types of modifications. Atropopeptides serve as a proof-of-principle for the versatile genome mining approach developed in this study that can be repurposed for the identification of RiPP and other BGCs that currently evade detection.
ABSTRACT
Many complex terpenoids, predominantly isolated from plants and fungi, show drug-like physicochemical properties. Recent advances in genome mining revealed actinobacteria as an almost untouched treasure trove of terpene biosynthetic gene clusters (BGCs). In this study, we characterized a terpene BGC with an unusual architecture. The selected BGC includes, among others, genes encoding a terpene cyclase fused to a truncated reductase domain and a cytochrome P450 monooxygenase (P450) that is split over three gene fragments. Functional characterization of the BGC in a heterologous host led to the identification of several new members of the trans-eunicellane family of diterpenoids, the euthailols, that feature unique oxidation patterns. A combination of bioinformatic analyses, structural modeling studies, and heterologous expression revealed a dual function of the pathway-encoded hypothetical protein that acts as an isomerase and an oxygenase. Moreover, in the absence of other tailoring enzymes, a P450 hydroxylates the eunicellane scaffold at a position that is not modified in other eunicellanes. Surprisingly, both the modifications installed by the hypothetical protein and one of the P450s exhibit partial redundancy. Bioactivity assays revealed that some of the euthailols show growth inhibitory properties against Gram-negative ESKAPE pathogens. The characterization of the euthailol BGC in this study provides unprecedented insights into the partial functional redundancy of tailoring enzymes in complex diterpenoid biosynthesis and highlights hypothetical proteins as an important and largely overlooked family of tailoring enzymes involved in the maturation of complex terpenoids.
ABSTRACT
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Subjects
Artificial Intelligence, Biological Products, Humans, Algorithms, Machine Learning, Drug Discovery, Drug Design, Biological Products/pharmacology
ABSTRACT
Microbes produce structurally diverse natural products to interact with their environment. Many of the biosynthetic products involved in this "metabolic small talk" have been exploited for the treatment of various diseases. As an alternative to the traditional bioactivity-guided workflow, genome mining has been introduced for targeted natural product discovery based on genome sequence information. In this commentary, we will discuss the evolution of genome mining, as well as its current limitations. The Helfrich laboratory aims to play a leading role in overcoming these limitations with the development of computational strategies to identify noncanonical biosynthetic pathways and to decipher the principles that govern the production of the associated metabolites. We will use these insights to develop algorithms for the prediction of natural product scaffolds. These studies will pave the way toward a more comprehensive understanding of the full biosynthetic repertoire encoded in microbial genomes and provide access to novel metabolites.