ABSTRACT
Fungus-growing ants depend on a fungal mutualist that can fall prey to fungal pathogens. This mutualist is cultivated by these ants in structures called fungus gardens. Ants exhibit weeding behaviors that keep their fungus gardens healthy by physically removing compromised pieces. However, how ants detect diseases of their fungus gardens is unknown. Here, we applied the logic of Koch's postulates using environmental fungal community gene sequencing, fungal isolation, and laboratory infection experiments to establish that Trichoderma spp. can act as previously unrecognized pathogens of Trachymyrmex septentrionalis fungus gardens. Our environmental data showed that Trichoderma are the most abundant noncultivar fungi in wild T. septentrionalis fungus gardens. We further determined that metabolites produced by Trichoderma induce an ant weeding response that mirrors their response to live Trichoderma. Combining ant behavioral experiments with bioactivity-guided fractionation and statistical prioritization of metabolites in Trichoderma extracts demonstrated that T. septentrionalis ants weed in response to peptaibols, a specific class of secondary metabolites known to be produced by Trichoderma fungi. Similar assays conducted using purified peptaibols, including the two previously undescribed peptaibols trichokindins VIII and IX, suggested that weeding is likely induced by peptaibols as a class rather than by a single peptaibol metabolite. In addition to their presence in laboratory experiments, we detected peptaibols in wild fungus gardens. Our combination of environmental data and laboratory infection experiments strongly supports that peptaibols act as chemical cues of Trichoderma pathogenesis in T. septentrionalis fungus gardens.
Subjects
Ants , Laboratory Infection , Trichoderma , Animals , Ants/physiology , Gardens , Cues , Symbiosis , Peptaibols
ABSTRACT
SUMMARY: Computational metabolomics workflows have revolutionized the untargeted metabolomics field. However, the organization and prioritization of metabolite features remains a laborious process. Organizing metabolomics data is often done through mass fragmentation-based spectral similarity grouping, resulting in feature sets that also represent an intuitive and scientifically meaningful first stage of analysis in untargeted metabolomics. Exploiting such feature sets, feature-set testing has emerged as an approach that is widely used in genomics and targeted metabolomics pathway enrichment analyses. It allows for formally combining groupings with statistical testing into more meaningful pathway enrichment conclusions. Here, we present msFeaST (mass spectral Feature Set Testing), a feature-set testing and visualization workflow for LC-MS/MS untargeted metabolomics data. Feature-set testing involves statistically assessing differential abundance patterns for groups of features across experimental conditions. We developed msFeaST to make use of spectral similarity-based feature groupings generated using k-medoids clustering, where the resulting clusters serve as a proxy for grouping structurally similar features with potential biosynthesis pathway relationships. Spectral clustering done in this way allows for feature group-wise statistical testing using the globaltest package, which provides high power to detect small concordant effects via joint modeling and reduced multiplicity adjustment penalties. Hence, msFeaST provides interactive integration of the semi-quantitative experimental information with mass-spectral structural similarity information, enhancing the prioritization of features and feature sets during exploratory data analysis. AVAILABILITY AND IMPLEMENTATION: The msFeaST workflow is freely available through https://github.com/kevinmildau/msFeaST and built to work on MacOS and Linux systems.
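The k-medoids grouping step mentioned in this abstract can be illustrated with a minimal sketch. It is not the msFeaST implementation: the function name `k_medoids`, the naive "first k items" initialisation, and the toy one-dimensional distances are all assumptions made for illustration. In practice the distance matrix would be derived from pairwise spectral similarities (for example, 1 minus the similarity).

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a symmetric distance matrix.

    Returns (sorted medoid indices, label per item), where each label is
    the index of the medoid the item is assigned to.
    Assumes items are distinct (no zero distances between different items).
    """
    n = len(dist)
    medoids = list(range(k))  # assumption: naive deterministic start
    for _ in range(max_iter):
        # Assignment step: attach every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster emptied
                new_medoids.append(m)
                continue
            best = min(members, key=lambda c: sum(dist[c][j] for j in members))
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return sorted(medoids), labels


# Toy stand-in for spectral dissimilarities: two well-separated groups.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(dist, k=2)
```

Each resulting cluster would then serve as one feature set for group-wise testing; the statistical test itself (the globaltest package named above) is not reproduced here.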
Subjects
Metabolomics , Tandem Mass Spectrometry , Metabolomics/methods , Tandem Mass Spectrometry/methods , Liquid Chromatography/methods , Software , Cluster Analysis , Liquid Chromatography-Mass Spectrometry
ABSTRACT
With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.
Subjects
Genome , Genomics , Multigene Family , Biosynthetic Pathways/genetics
ABSTRACT
Artificial intelligence (AI) is accelerating how we conduct science, from folding proteins with AlphaFold and summarizing literature findings with large language models, to annotating genomes and prioritizing newly generated molecules for screening using specialized software. However, the application of AI to emulate human cognition in natural product research and its subsequent impact has so far been limited. One reason for this limited impact is that available natural product data is multimodal, unbalanced, unstandardized, and scattered across many data repositories. This makes natural product data challenging to use with existing deep learning architectures that consume fairly standardized, often non-relational, data. It also prevents models from learning overarching patterns in natural product science. In this Viewpoint, we address this challenge and support ongoing initiatives aimed at democratizing natural product data by collating our collective knowledge into a knowledge graph. By doing so, we believe there will be an opportunity to use such a knowledge graph to develop AI models that can truly mimic natural product scientists' decision-making.
ABSTRACT
Untargeted metabolomics promises comprehensive characterization of small molecules in biological samples. However, the field is hampered by low annotation rates and abstract spectral data. Despite recent advances in computational metabolomics, manual annotations and manual confirmation of in-silico annotations remain important in the field. Here, exploratory data analysis methods for mass spectral data provide overviews, prioritization, and structural hypothesis starting points to researchers facing large quantities of spectral data. In this research, we propose a fluid means of dealing with mass spectral data using specXplore, an interactive Python dashboard providing interactive and complementary visualizations facilitating mass spectral similarity matrix exploration. Specifically, specXplore provides a two-dimensional t-distributed stochastic neighbor embedding as a jumping board for local connectivity exploration using complementary interactive visualizations in the form of partial network drawings, similarity heatmaps, and fragmentation overview maps. SpecXplore makes use of state-of-the-art ms2deepscore pairwise spectral similarities as a quantitative backbone while allowing fast changes of threshold and connectivity limitation settings, providing flexibility in adjusting settings to suit the localized node environment being explored. We believe that specXplore can become an integral part of mass spectral data exploration efforts and assist users in the generation of structural hypotheses for compounds of interest.
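The idea of exploring a node's local environment under a similarity threshold and a connectivity limit can be sketched as follows. This is an illustration only, not specXplore code: the function `local_edges`, its parameter names, and the toy similarity matrix are hypothetical, and the scores merely stand in for pairwise spectral similarities such as those the abstract attributes to ms2deepscore.

```python
def local_edges(sim, node, threshold=0.7, max_neighbors=2):
    """Return up to `max_neighbors` edges from `node` to its most similar
    neighbours whose similarity is at least `threshold`."""
    candidates = [
        (sim[node][j], j)
        for j in range(len(sim))
        if j != node and sim[node][j] >= threshold
    ]
    candidates.sort(reverse=True)  # highest similarity first
    return [(node, j, score) for score, j in candidates[:max_neighbors]]


# Toy symmetric similarity matrix for five spectra (assumed values).
sim = [
    [1.0, 0.9, 0.8, 0.75, 0.1],
    [0.9, 1.0, 0.3, 0.2, 0.1],
    [0.8, 0.3, 1.0, 0.6, 0.2],
    [0.75, 0.2, 0.6, 1.0, 0.4],
    [0.1, 0.1, 0.2, 0.4, 1.0],
]
edges = local_edges(sim, node=0)
```

Raising `threshold` or lowering `max_neighbors` prunes the partial network around the selected node, which is the kind of quick setting change the abstract describes.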
ABSTRACT
INTRODUCTION: The chemical classification of Cannabis is typically confined to the cannabinoid content, whilst Cannabis encompasses diverse chemical classes that vary in abundance among all its varieties. Hence, neglecting other chemical classes within Cannabis strains results in a restricted and biased comprehension of elements that may contribute to chemical intricacy and the resultant medicinal qualities of the plant. OBJECTIVES: Thus, herein, we report a computational metabolomics study to elucidate the Cannabis metabolic map beyond the cannabinoids. METHODS: Mass spectrometry-based computational tools were used to mine and evaluate the methanolic leaf and flower extracts of two Cannabis cultivars: Amnesia haze (AMNH) and Royal dutch cheese (RDC). RESULTS: The results revealed the presence of different chemical compound classes, including cannabinoids but also extending to flavonoids and phospholipids, at varying distributions across the cultivar plant tissues, where the phenylpropanoid superclass was more abundant in the leaves than in the flowers. Therefore, the two cultivars were differentiated based on the overall chemical content of their plant tissues, where AMNH was observed to be more dominant in the flavonoid content while RDC was more dominant in the lipid-like molecules. Additionally, in silico molecular docking studies in combination with biological assay studies indicated the potentially differing anti-cancer properties of the two cultivars resulting from the elucidated chemical profiles. CONCLUSION: These findings highlight distinctive chemical profiles beyond cannabinoids in Cannabis strains. This novel mapping of the metabolomic landscape of Cannabis provides actionable insights into plant biochemistry and justifies selecting certain varieties for medicinal use.
Subjects
Cannabis , Metabolomics , Plant Leaves , Cannabis/chemistry , Cannabis/metabolism , Metabolomics/methods , Plant Leaves/metabolism , Plant Leaves/chemistry , Flowers/metabolism , Flowers/chemistry , Plant Extracts/metabolism , Plant Extracts/chemistry , Plant Extracts/pharmacology , Cannabinoids/metabolism , Cannabinoids/analysis , Molecular Docking Simulation , Flavonoids/metabolism , Flavonoids/analysis , Mass Spectrometry/methods
ABSTRACT
Major advances in genome sequencing and large-scale biosynthetic gene cluster (BGC) analysis have prompted an age of natural product discovery driven by genome mining. Still, connecting molecules to their cognate BGCs is a substantial bottleneck for this approach. We have developed a mass-spectrometry-based parallel stable isotope labeling platform, termed IsoAnalyst, which assists in associating metabolite stable isotope labeling patterns with BGC structure prediction to connect natural products to their corresponding BGCs. Here we show that IsoAnalyst can quickly associate both known metabolites and unknown analytes with BGCs to elucidate the complex chemical phenotypes of these biosynthetic systems. We validate this approach for a range of compound classes, using both the type strain Saccharopolyspora erythraea and an environmentally isolated Micromonospora sp. We further demonstrate the utility of this tool with the discovery of lobosamide D, a new and structurally unique member of the family of lobosamide macrolactams.
Subjects
Biological Products , Micromonospora , Biosynthetic Pathways/genetics , Isotope Labeling , Multigene Family
ABSTRACT
Microbial specialised metabolism is full of valuable natural products that are applied clinically, agriculturally, and industrially. The genes that encode their biosynthesis are often physically clustered on the genome in biosynthetic gene clusters (BGCs). Many BGCs consist of multiple groups of co-evolving genes called sub-clusters that are responsible for the biosynthesis of a specific chemical moiety in a natural product. Sub-clusters therefore provide an important link between the structures of a natural product and its BGC, which can be leveraged for predicting natural product structures from sequence, as well as for linking chemical structures and metabolomics-derived mass features to BGCs. While some initial computational methodologies have been devised for sub-cluster detection, current approaches are not scalable, have only been run on small and outdated datasets, or produce an impractically large number of possible sub-clusters to mine through. Here, we constructed a scalable method for unsupervised sub-cluster detection, called iPRESTO, based on topic modelling and statistical analysis of co-occurrence patterns of enzyme-coding protein families. iPRESTO was used to mine sub-clusters across 150,000 prokaryotic BGCs from antiSMASH-DB. After annotating a fraction of the resulting sub-cluster families, we could predict a substructure for 16% of the antiSMASH-DB BGCs. Additionally, our method was able to confirm 83% of the experimentally characterised sub-clusters in MIBiG reference BGCs. Based on iPRESTO-detected sub-clusters, we could correctly identify the BGCs for xenorhabdin and salbostatin biosynthesis (which had not yet been annotated in BGC databases), as well as propose a candidate BGC for akashin biosynthesis. Additionally, we show for a collection of 145 actinobacteria how substructures can aid in linking BGCs to molecules by correlating iPRESTO-detected sub-clusters to MS/MS-derived Mass2Motifs substructure patterns. 
This work paves the way for deeper functional and structural annotation of microbial BGCs by improved linking of orphan molecules to their cognate gene clusters, thus facilitating accelerated natural product discovery.
Subjects
Biological Products , Tandem Mass Spectrometry , Metabolomics , Bacteria/genetics , Multigene Family
ABSTRACT
Natural products are a sustainable resource for drug discovery, but their identification in complex mixtures remains a daunting task. We present an automated pipeline that compares, harmonizes and ranks the annotations of LC-HRMS data by different tools. When applied to 7,400 extracts derived from 6,566 strains belonging to 86 actinomycete genera, it yielded 150,000 molecules after processing over 50 million MS features. The web-based Molecules Gateway provides highly interactive access to experimental and calculated data for these molecules, along with the metadata related to extracts and producer strains. We show how the Molecules Gateway can be used to rapidly identify known but hard-to-find microbial products, unreported analogs of known families and not yet described metabolites. The Molecules Gateway, which complements available repositories, contains annotated MS data, both acquired and computationally processed under an identical workflow, making it suitable for global analyses, which reveal a large and untapped chemical diversity afforded by actinomycetes.
ABSTRACT
Molecular networking has become a key method to visualize and annotate the chemical space in non-targeted mass spectrometry data. We present feature-based molecular networking (FBMN) as an analysis method in the Global Natural Products Social Molecular Networking (GNPS) infrastructure that builds on chromatographic feature detection and alignment tools. FBMN enables quantitative analysis and resolution of isomers, including from ion mobility spectrometry.
Subjects
Biological Products/chemistry , Mass Spectrometry , Computational Biology/methods , Factual Databases , Metabolomics/methods , Software
ABSTRACT
We present ReDU ( https://redu.ucsd.edu/ ), a system for metadata capture of public mass spectrometry-based metabolomics data, with validated controlled vocabularies. Systematic capture of knowledge enables the reanalysis of public data and/or co-analysis of one's own data. ReDU enables multiple types of analyses, including finding chemicals and associated metadata, comparing the shared and different chemicals between groups of samples, and metadata-filtered, repository-scale molecular networking.
Subjects
Chemical Databases , Mass Spectrometry , Metabolomics/methods , Software , Metadata , Chemical Models
ABSTRACT
SUMMARY: Untargeted metabolomics data analysis is highly labour intensive and can be severely frustrated by both experimental noise and redundant features. Homologous polymer series is a particular case of features that can either represent large numbers of noise features or alternatively represent features of interest with large peak redundancy. Here, we present homologueDiscoverer, an R package that allows for the targeted and untargeted detection of homologue series as well as their evaluation and management using interactive plots and simple local database functionalities. AVAILABILITY AND IMPLEMENTATION: homologueDiscoverer is freely available at GitHub https://github.com/kevinmildau/homologueDiscoverer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Software , Tandem Mass Spectrometry , Liquid Chromatography , Metabolomics , Data Analysis
ABSTRACT
Untargeted mass spectrometry is employed to detect small molecules in complex biospecimens, generating data that are difficult to interpret. We developed Qemistree, a data exploration strategy based on the hierarchical organization of molecular fingerprints predicted from fragmentation spectra. Qemistree allows mass spectrometry data to be represented in the context of sample metadata and chemical ontologies. By expressing molecular relationships as a tree, we can apply ecological tools that are designed to analyze and visualize the relatedness of DNA sequences to metabolomics data. Here we demonstrate the use of tree-guided data exploration tools to compare metabolomics samples across different experimental conditions such as chromatographic shifts. Additionally, we leverage a tree representation to visualize chemical diversity in a heterogeneous collection of samples. The Qemistree software pipeline is freely available to the microbiome and metabolomics communities in the form of a QIIME2 plugin and a Global Natural Products Social Molecular Networking (GNPS) workflow.
Subjects
Mass Spectrometry/methods , Metabolomics , Algorithms , Cluster Analysis , DNA/chemistry , DNA Fingerprinting , Factual Databases , Ecology , Food Analysis , Microbiota , Multivariate Analysis , Software , Tandem Mass Spectrometry , Workflow
ABSTRACT
UV-B radiation regulates numerous morphogenic, biochemical and physiological responses in plants, and can stimulate some responses typically associated with other abiotic and biotic stimuli, including invertebrate herbivory. Removal of UV-B from the growing environment of various plant species has been found to increase their susceptibility to consumption by invertebrate pests; however, to date, little research has been conducted to investigate the effects of UV-B on crop susceptibility to field pests. Here, we report findings from a multi-omic and genetic-based study investigating the mechanisms of UV-B-stimulated resistance of the crop Brassica napus (oilseed rape) to herbivory from an economically important lepidopteran specialist of the Brassicaceae, Plutella xylostella (diamondback moth). The UV-B photoreceptor, UV RESISTANCE LOCUS 8 (UVR8), was not found to mediate resistance to this pest. RNA-Seq and untargeted metabolomics identified components of the sinapate/lignin biosynthetic pathway that were similarly regulated by UV-B and herbivory. Arabidopsis mutants in genes encoding two enzymes in the sinapate/lignin biosynthetic pathway, CAFFEATE O-METHYLTRANSFERASE 1 (COMT1) and ELICITOR-ACTIVATED GENE 3-2 (ELI3-2), retained UV-B-mediated resistance to P. xylostella herbivory. However, the overexpression of B. napus COMT1 in Arabidopsis further reduced plant susceptibility to P. xylostella herbivory in a UV-B-dependent manner. These findings demonstrate that overexpression of a component of the sinapate/lignin biosynthetic pathway in a member of the Brassicaceae can enhance UV-B-stimulated resistance to herbivory from P. xylostella.
Subjects
Arabidopsis , Brassica napus , Moths , Animals , Arabidopsis/genetics , Arabidopsis/radiation effects , Brassica napus/genetics , Herbivory , Lignin , Moths/physiology , Plants
ABSTRACT
Covering: up to 2022
With the emergence of large amounts of omics data, computational approaches for the identification of plant natural product biosynthetic pathways and their genetic regulation have become increasingly important. While genomes provide clues regarding functional associations between genes based on gene clustering, metabolome mining provides a foundational technology to chart natural product structural diversity in plants, and transcriptomics has been successfully used to identify new members of their biosynthetic pathways based on coexpression. Thus far, most approaches utilizing transcriptomics and metabolomics have been targeted towards specific pathways and use one type of omics data at a time. Recent technological advances now provide new opportunities for integration of multiple omics types and untargeted pathway discovery. Here, we review advances in plant biosynthetic pathway discovery using genomics, transcriptomics, and metabolomics, as well as recent efforts towards omics integration. We highlight how transcriptomics and metabolomics provide complementary information to link genes to metabolites, by associating temporal and spatial gene expression levels with metabolite abundance levels across samples, and by matching mass-spectral features to enzyme families. Furthermore, we suggest that elucidation of gene regulatory networks using time-series data may prove useful for efforts to unwire the complexities of biosynthetic pathway components based on regulatory interactions and events.
Subjects
Biological Products , Biosynthetic Pathways , Biological Products/metabolism , Biosynthetic Pathways/genetics , Genomics , Metabolome , Metabolomics , Plants/genetics , Plants/metabolism
ABSTRACT
Livestock diseases caused by Trypanosoma congolense, T. vivax and T. brucei, collectively known as nagana, are responsible for billions of dollars in lost food production annually. There is an urgent need for novel therapeutics. Encouragingly, promising antitrypanosomal benzoxaboroles are under veterinary development. Here, we show that the most efficacious subclass of these compounds are prodrugs activated by trypanosome serine carboxypeptidases (CBPs). Drug-resistance to a development candidate, AN11736, emerged readily in T. brucei, due to partial deletion within the locus containing three tandem copies of the CBP genes. T. congolense parasites, which possess a larger array of related CBPs, also developed resistance to AN11736 through deletion within the locus. A genome-scale screen in T. brucei confirmed CBP loss-of-function as the primary mechanism of resistance and CRISPR-Cas9 editing proved that partial deletion within the locus was sufficient to confer resistance. CBP re-expression in either T. brucei or T. congolense AN11736-resistant lines restored drug-susceptibility. CBPs act by cleaving the benzoxaborole AN11736 to a carboxylic acid derivative, revealing a prodrug activation mechanism. Loss of CBP activity results in massive reduction in net uptake of AN11736, indicating that entry is facilitated by the concentration gradient created by prodrug metabolism.
Subjects
Boron Compounds/metabolism , Carboxypeptidases/metabolism , Trypanocidal Agents/metabolism , Trypanosoma brucei brucei/enzymology , Trypanosoma congolense/enzymology , Trypanosoma vivax/enzymology , African Trypanosomiasis/veterinary , Valine/analogs & derivatives , Animals , Carboxylic Acids/metabolism , Drug Resistance , Female , Livestock , Mice , Parasitemia/veterinary , Prodrugs/metabolism , Protozoan Proteins/metabolism , Trypanosoma brucei brucei/drug effects , Trypanosoma congolense/drug effects , Trypanosoma vivax/drug effects , African Trypanosomiasis/drug therapy , African Trypanosomiasis/parasitology , Valine/metabolism
ABSTRACT
BACKGROUND: Untargeted metabolomics approaches based on mass spectrometry obtain comprehensive profiles of complex biological samples. However, on average only 10% of the molecules can be annotated. This low annotation rate hampers biochemical interpretation and effective comparison of metabolomics studies. Furthermore, de novo structural characterization of mass spectral data remains a complicated and time-intensive process. Recently, the field of computational metabolomics has gained traction and novel methods have started to enable large-scale and reliable metabolite annotation. Molecular networking and machine learning-based in-silico annotation tools have been shown to greatly assist metabolite characterization in diverse fields such as clinical metabolomics and natural product discovery. AIM OF REVIEW: We highlight recent advances in computational metabolite annotation workflows with a special focus on their evaluation and comparison with other tools. Whilst the progress is substantial and promising, we also argue that inconsistencies in benchmarking different tools hamper users from selecting the most appropriate and promising method for their research. We summarize benchmarking strategies of the different tools and outline several recommendations for benchmarking and comparing novel tools. KEY SCIENTIFIC CONCEPTS OF REVIEW: This review focuses on recent advances in mass spectral library-based and machine learning-supported metabolite annotation workflows. We discuss large-scale library matching and analogue search, the current bloom of mass spectral similarity scores, and how molecular networking has changed the field. In addition, the potentials and challenges of machine learning-supported metabolite annotation workflows are highlighted. 
Overall, recent developments in computational metabolomics have started to fundamentally change metabolomics workflows, and we expect that as a community we will be able to overcome current method performance ambiguities and annotation bottlenecks.
Subjects
Benchmarking , Metabolomics , Metabolomics/methods , Mass Spectrometry , Machine Learning
ABSTRACT
Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS/MS) based metabolomics analyses such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities have been described, little development of alternative scores has been undertaken. Here, we introduce Spec2Vec, a novel spectral similarity score inspired by a natural language processing algorithm, Word2Vec. Spec2Vec learns fragmental relationships within a large set of spectral data to derive abstract spectral embeddings that can be used to assess spectral similarities. Using data derived from GNPS MS/MS libraries including spectra for nearly 13,000 unique molecules, we show how Spec2Vec scores correlate better with structural similarity than cosine-based scores. We demonstrate the advantages of Spec2Vec in library matching and molecular networking. Spec2Vec is computationally more scalable, allowing structural analogue searches in large databases within seconds.
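For readers unfamiliar with the cosine-based scores that this abstract uses as a baseline, a minimal sketch is given here. It is a plain binned cosine score written for illustration, not Spec2Vec and not the exact scoring used by GNPS; the function names, the 1 Da bin width, and the toy peak lists are assumptions. Spec2Vec, as described above, would instead compare learned spectral embeddings.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_score(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy spectra as (m/z, intensity) pairs; values are made up for illustration.
spectrum_a = [(100.1, 50.0), (150.2, 100.0)]
spectrum_b = [(100.3, 40.0), (150.0, 90.0), (200.5, 10.0)]
spectrum_c = [(300.0, 1.0)]

score_ab = cosine_score(spectrum_a, spectrum_b)  # shares two bins, so high
score_ac = cosine_score(spectrum_a, spectrum_c)  # no shared bins
```

A score of this kind only rewards peaks that fall in the same bin, which is one reason it can miss structurally related molecules whose fragments are shifted in mass.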
Subjects
Algorithms , Computational Biology/methods , Gene Library , Metabolomics/methods , Tandem Mass Spectrometry/methods , Computer Simulation , Factual Databases , False Positive Reactions , Machine Learning , Natural Language Processing , Reproducibility of Results
ABSTRACT
Specialised metabolites from microbial sources are well-known for their wide range of biomedical applications, particularly as antibiotics. When mining paired genomic and metabolomic data sets for novel specialised metabolites, establishing links between Biosynthetic Gene Clusters (BGCs) and metabolites represents a promising way of finding such novel chemistry. However, due to the lack of detailed biosynthetic knowledge for the majority of predicted BGCs, and the large number of possible combinations, this is not a simple task. This problem is becoming ever more pressing with the increased availability of paired omics data sets. Current tools are not effective at identifying valid links automatically, and manual verification is a considerable bottleneck in natural product research. We demonstrate that using multiple link-scoring functions together makes it easier to prioritise true links relative to others. Based on standardising a commonly used score, we introduce a new, more effective score, and introduce a novel score using an Input-Output Kernel Regression approach. Finally, we present NPLinker, a software framework to link genomic and metabolomic data. Results are verified using publicly available data sets that include validated links.
Subjects
Microbial Genetics/statistics & numerical data , Genomics/statistics & numerical data , Metabolomics/statistics & numerical data , Software , Biosynthetic Pathways/genetics , Computational Biology , Data Mining , Factual Databases , Genetic Databases , Microbial Genome , Microbiological Phenomena , Multigene Family , Regression Analysis
ABSTRACT
Fueled by the explosion of (meta)genomic data, genome mining of specialized metabolites has become a major technology for drug discovery and studying microbiome ecology. In these efforts, computational tools like antiSMASH have played a central role through the analysis of Biosynthetic Gene Clusters (BGCs). Thousands of candidate BGCs from microbial genomes have been identified and stored in public databases. Interpreting the function and novelty of these predicted BGCs requires comparison with a well-documented set of BGCs of known function. The MIBiG (Minimum Information about a Biosynthetic Gene Cluster) Data Standard and Repository was established in 2015 to enable curation and storage of known BGCs. Here, we present MIBiG 2.0, which encompasses major updates to the schema, the data, and the online repository itself. Over the past five years, 851 new BGCs have been added. Additionally, we performed extensive manual data curation of all entries to improve the annotation quality of our repository. We also redesigned the data schema to ensure the compliance of future annotations. Finally, we improved the user experience by adding new features such as query searches and a statistics page, and enabled direct link-outs to chemical structure databases. The repository is accessible online at https://mibig.secondarymetabolites.org/.