Results 1 - 12 of 12
1.
Nat Biotechnol ; 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38168990

ABSTRACT

The throughput of mass spectrometers and the amount of publicly available metabolomics data are growing rapidly, but analysis tools such as molecular networking and Mass Spectrometry Search Tool do not scale to searching and clustering billions of mass spectral data in metabolomics repositories. To address this limitation, we designed MASST+ and Networking+, which can process datasets that are up to three orders of magnitude larger than those processed by state-of-the-art tools.

2.
Nat Commun ; 14(1): 4219, 2023 07 14.
Article in English | MEDLINE | ID: mdl-37452020

ABSTRACT

Recent analyses of public microbial genomes have found over a million biosynthetic gene clusters, the natural products of the majority of which remain unknown. Additionally, GNPS harbors billions of mass spectra of natural products without known structures and biosynthetic genes. We bridge the gap between large-scale genome mining and mass spectral datasets for natural product discovery by developing HypoRiPPAtlas, an Atlas of hypothetical natural product structures, which is ready-to-use for in silico database search of tandem mass spectra. HypoRiPPAtlas is constructed by mining genomes using seq2ripp, a machine-learning tool for the prediction of ribosomally synthesized and post-translationally modified peptides (RiPPs). In HypoRiPPAtlas, we identify RiPPs in microbes and plants. HypoRiPPAtlas could be extended to other natural product classes in the future by implementing corresponding biosynthetic logic. This study paves the way for large-scale explorations of biosynthetic pathways and chemical structures of microbial and plant RiPP classes.


Subject(s)
Biological Products , Ribosomes , Ribosomes/metabolism , Biological Products/chemistry , Peptides/chemistry , Databases, Factual , Tandem Mass Spectrometry , Protein Processing, Post-Translational
3.
Bioinformatics ; 39(39 Suppl 1): i40-i46, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387149

ABSTRACT

Microbial natural products represent a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class that includes antibiotics, immunosuppressants, anticancer agents, toxins, siderophores, pigments, and cytostatics. The discovery of novel NRPs remains a laborious process because many NRPs consist of nonstandard amino acids that are assembled by nonribosomal peptide synthetases (NRPSs). Adenylation domains (A-domains) in NRPSs are responsible for selection and activation of monomers appearing in NRPs. During the past decade, several support vector machine-based algorithms have been developed for predicting the specificity of the monomers present in NRPs. These algorithms utilize physicochemical features of the amino acids present in the A-domains of NRPSs. In this article, we benchmarked the performance of various machine learning algorithms and features for predicting specificities of NRPSs and we showed that the extra trees model paired with one-hot encoding features outperforms the existing approaches. Moreover, we show that unsupervised clustering of 453 560 A-domains reveals many clusters that correspond to potentially novel amino acids. While it is challenging to predict the chemical structure of these amino acids, we developed novel techniques to predict their various properties, including polarity, hydrophobicity, charge, and presence of aromatic rings, carboxyl, and hydroxyl groups.


Subject(s)
Amino Acids , Genome, Microbial , Algorithms , Multigene Family , Peptides
5.
Nat Commun ; 12(1): 3225, 2021 05 28.
Article in English | MEDLINE | ID: mdl-34050176

ABSTRACT

Non-Ribosomal Peptides (NRPs) represent a biomedically important class of natural products that include a multitude of antibiotics and other clinically used drugs. NRPs are not directly encoded in the genome but are instead produced by metabolic pathways encoded by biosynthetic gene clusters (BGCs). Since the existing genome mining tools predict many putative NRPs synthesized by a given BGC, it remains unclear which of these putative NRPs are correct and how to identify post-assembly modifications of amino acids in these NRPs in a blind mode, without knowing which modifications exist in the sample. To address this challenge, here we report NRPminer, a modification-tolerant tool for NRP discovery from large (meta)genomic and mass spectrometry datasets. We show that NRPminer is able to identify many NRPs from different environments, including four previously unreported NRP families from soil-associated microbes and NRPs from human microbiota. Furthermore, in this work we demonstrate the anti-parasitic activities and the structure of two of these NRP families using direct bioactivity screening and nuclear magnetic resonance spectrometry, illustrating the power of NRPminer for discovering bioactive NRPs.


Subject(s)
Anti-Bacterial Agents/isolation & purification , Biological Products/isolation & purification , Computational Biology/methods , Drug Discovery/methods , Peptides/isolation & purification , Algorithms , Amino Acid Sequence/genetics , Anti-Bacterial Agents/biosynthesis , Biological Products/metabolism , Datasets as Topic , Humans , Mass Spectrometry , Metabolic Networks and Pathways/genetics , Metabolomics/methods , Metagenomics/methods , Microbiota/genetics , Multigene Family , Peptide Biosynthesis , Peptide Synthases/genetics , Peptide Synthases/metabolism , Peptides/genetics , Peptides/metabolism , Soil Microbiology
6.
Nat Methods ; 17(11): 1103-1110, 2020 11.
Article in English | MEDLINE | ID: mdl-33020656

ABSTRACT

Long-read sequencing technologies have substantially improved the assemblies of many isolate bacterial genomes as compared to fragmented short-read assemblies. However, assembling complex metagenomic datasets remains difficult even for state-of-the-art long-read assemblers. Here we present metaFlye, which addresses important long-read metagenomic assembly challenges, such as uneven bacterial composition and intra-species heterogeneity. First, we benchmarked metaFlye using simulated and mock bacterial communities and show that it consistently produces assemblies with better completeness and contiguity than state-of-the-art long-read assemblers. Second, we performed long-read sequencing of the sheep microbiome and applied metaFlye to reconstruct 63 complete or nearly complete bacterial genomes within single contigs. Finally, we show that long-read assembly of human microbiomes enables the discovery of full-length biosynthetic gene clusters that encode biomedically important natural products.


Subject(s)
Genome, Bacterial/genetics , Genome, Human/genetics , Metagenome/genetics , Metagenomics/methods , Microbiota/genetics , Algorithms , Animals , Benchmarking , Gastrointestinal Microbiome/genetics , Humans , Sequence Analysis, DNA/methods , Sheep , Software , Species Specificity
7.
Cell Syst ; 10(1): 99-108.e5, 2020 01 22.
Article in English | MEDLINE | ID: mdl-31864964

ABSTRACT

Cyclic and branch cyclic peptides (cyclopeptides) represent a class of bioactive natural products that include many antibiotics and anti-tumor compounds. Despite the recent advances in metabolomics analysis, still little is known about the cyclopeptides in the human gut and their possible interactions due to a lack of computational analysis pipelines that are applicable to such compounds. Here, we introduce CycloNovo, an algorithm for automated de novo cyclopeptide analysis and sequencing that employs de Bruijn graphs, the workhorse of DNA sequencing algorithms, to identify cyclopeptides in spectral datasets. CycloNovo reconstructed 32 previously unreported cyclopeptides (to the best of our knowledge) in the human gut and reported over a hundred cyclopeptides in other environments represented by various spectra on Global Natural Products Social Molecular Network (GNPS). https://github.com/bbehsaz/cyclonovo.


Subject(s)
Amino Acid Sequence/genetics , Gastrointestinal Microbiome/genetics , Peptides, Cyclic/chemistry , Humans , Mass Spectrometry
8.
Plant Direct ; 2(2)2018 Feb.
Article in English | MEDLINE | ID: mdl-30417166

ABSTRACT

Orbitides are cyclic ribosomally-synthesized and post-translationally modified peptides (RiPPs) from plants; they consist of standard amino acids arranged in an unbroken chain of peptide bonds. These cyclic peptides are stable and range in size and topologies making them potential scaffolds for peptide drugs; some display valuable biological activities. Recently two orbitides whose sequences were buried in those of seed storage albumin precursors were said to represent the first observable step in the evolution of larger and hydrophilic bicyclic peptides. Here, guided by transcriptome data, we investigated peptide extracts of 40 species specifically for the more hydrophobic orbitides and confirmed 44 peptides by tandem mass spectrometry, as well as obtaining solution structures for four of them by NMR. Acquiring transcriptomes from the phylogenetically important Corymbioideae subfamily confirmed the precursor genes for the peptides (called PawS1-Like or PawL1) are confined to the Asteroideae, a subfamily of the huge plant family Asteraceae. To be confined to the Asteroideae indicates these peptides arose during the Eocene epoch around 45 Mya. Unlike other orbitides, all PawL-derived Peptides contain an Asp residue, needed for processing by asparaginyl endopeptidase. This study has revealed what is likely to be a very large new family of orbitides, uniquely buried alongside albumin and processed by asparaginyl endopeptidase.

9.
mSystems ; 3(3)2018.
Article in English | MEDLINE | ID: mdl-29795809

ABSTRACT

Although much work has linked the human microbiome to specific phenotypes and lifestyle variables, data from different projects have been challenging to integrate and the extent of microbial and molecular diversity in human stool remains unknown. Using standardized protocols from the Earth Microbiome Project and sample contributions from over 10,000 citizen-scientists, together with an open research network, we compare human microbiome specimens primarily from the United States, United Kingdom, and Australia to one another and to environmental samples. Our results show an unexpected range of beta-diversity in human stool microbiomes compared to environmental samples; demonstrate the utility of procedures for removing the effects of overgrowth during room-temperature shipping for revealing phenotype correlations; uncover new molecules and kinds of molecular communities in the human stool metabolome; and examine emergent associations among the microbiome, metabolome, and the diversity of plants that are consumed (rather than relying on reductive categorical variables such as veganism, which have little or no explanatory power). We also demonstrate the utility of the living data resource and cross-cohort comparison to confirm existing associations between the microbiome and psychiatric illness and to reveal the extent of microbiome change within one individual during surgery, providing a paradigm for open microbiome research and education. IMPORTANCE: We show that a citizen science, self-selected cohort shipping samples through the mail at room temperature recaptures many known microbiome results from clinically collected cohorts and reveals new ones. Of particular interest is integrating n = 1 study data with the population data, showing that the extent of microbiome change after events such as surgery can exceed differences between distinct environmental biomes, and the effect of diverse plants in the diet, which we confirm with untargeted metabolomics on hundreds of samples.

10.
Aquat Toxicol ; 185: 48-57, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28187360

ABSTRACT

The ringed seal, Pusa hispida, is a keystone species in the Arctic marine ecosystem, and is proving a useful marine mammal for linking polychlorinated biphenyl (PCB) exposure to toxic injury. We report here the first de novo assembled transcriptome for the ringed seal (342,863 transcripts, of which 53% were annotated), which we then applied to a population of ringed seals exposed to a local PCB source in Arctic Labrador, Canada. We found an indication of energy metabolism imbalance in local ringed seals (n=4), and identified five significant gene transcript targets: plasminogen receptor (Plg-R(KT)), solute carrier family 25 member 43 receptor (Slc25a43), ankyrin repeat domain-containing protein 26-like receptor (Ankrd26), HIS30 (not yet annotated) and HIS16 (not yet annotated) that may represent indicators of PCB exposure and effects in marine mammals. The abundance profiles of these five gene targets were validated in blubber samples collected from 43 ringed seals using a qPCR assay. The mRNA transcript levels for all five gene targets, (Plg-R(KT), r²=0.43), (Slc25a43, r²=0.51), (Ankrd26, r²=0.43), (HIS30, r²=0.39) and (HIS16, r²=0.31) correlated with increasing levels of blubber PCBs. Results from the present study contribute to our understanding of PCB associated effects in marine mammals, and provide new tools for future molecular and toxicology work in pinnipeds.


Subject(s)
Animal Structures/metabolism , Environmental Exposure/analysis , Health Status Indicators , Polychlorinated Biphenyls/toxicity , Seals, Earless/genetics , Transcriptome/genetics , Animals , Gene Expression Profiling , Gene Ontology , Molecular Sequence Annotation , Polymerase Chain Reaction , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reproducibility of Results , Sequence Analysis, RNA , Water Pollutants, Chemical/toxicity
11.
Gigascience ; 4: 35, 2015.
Article in English | MEDLINE | ID: mdl-26244089

ABSTRACT

BACKGROUND: Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problematic regions. In this regard, established and emerging long read technologies show great promise, but their current associated higher error rates typically require computational base correction and/or additional bioinformatics pre-processing before they can be of value. RESULTS: We present LINKS, the Long Interval Nucleotide K-mer Scaffolder algorithm, a method that makes use of the sequence properties of nanopore sequence data and other error-containing sequence data, to scaffold high-quality genome assemblies, without the need for read alignment or base correction. Here, we show how the contiguity of an ABySS Escherichia coli K-12 genome assembly can be increased greater than five-fold by the use of beta-released Oxford Nanopore Technologies Ltd. long reads and how LINKS leverages long-range information in Saccharomyces cerevisiae W303 nanopore reads to yield assemblies whose resulting contiguity and correctness are on par with or better than that of competing applications. We also present the re-scaffolding of the colossal white spruce (Picea glauca) draft assembly (PG29, 20 Gbp) and demonstrate how LINKS scales to larger genomes. CONCLUSIONS: This study highlights the present utility of nanopore reads for genome scaffolding in spite of their current limitations, which are expected to diminish as the nanopore sequencing technology advances. We expect LINKS to have broad utility in harnessing the potential of long reads in connecting high-quality sequences of small and large genome assembly drafts.


Subject(s)
Genome , Sequence Alignment
12.
PLoS One ; 10(6): e0130720, 2015.
Article in English | MEDLINE | ID: mdl-26121473

ABSTRACT

In this work we studied the liver transcriptomes of two frog species, the American bullfrog (Rana (Lithobates) catesbeiana) and the African clawed frog (Xenopus laevis). We used high throughput RNA sequencing (RNA-seq) data to assemble and annotate these transcriptomes, and compared how their baseline expression profiles change when tadpoles of the two species are exposed to thyroid hormone. We generated more than 1.5 billion RNA-seq reads in total for the two species under two conditions as treatment/control pairs. We de novo assembled these reads using Trans-ABySS to reconstruct reference transcriptomes, obtaining over 350,000 and 130,000 putative transcripts for R. catesbeiana and X. laevis, respectively. Using available genomics resources for X. laevis, we annotated over 97% of our X. laevis transcriptome contigs, demonstrating the utility and efficacy of our methodology. Leveraging this validated analysis pipeline, we also annotated the assembled R. catesbeiana transcriptome. We used the expression profiles of the annotated genes of the two species to examine the similarities and differences between the tadpole liver transcriptomes. We also compared the gene ontology terms of expressed genes to measure how the animals react to a challenge by thyroid hormone. Our study reports three main conclusions. First, de novo assembly of RNA-seq data is a powerful method for annotating and establishing transcriptomes of non-model organisms. Second, the liver transcriptomes of the two frog species, R. catesbeiana and X. laevis, show many common features, and the distributions of their gene ontology profiles are statistically indistinguishable. Third, although they broadly respond the same way to the presence of thyroid hormone in their environment, their receptor/signal transduction pathways display marked differences.


Subject(s)
Genome , Genomics , Liver/metabolism , Rana catesbeiana/genetics , Transcriptome/genetics , Xenopus laevis/genetics , Animals , Gene Expression Profiling , Gene Expression Regulation, Developmental , Gene Ontology , High-Throughput Nucleotide Sequencing , Larva/genetics , Molecular Sequence Annotation , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reference Standards , Signal Transduction/genetics