Results 1 - 13 of 13
1.
Bioinformatics ; 39(39 Suppl 1): i40-i46, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387149

ABSTRACT

Microbial natural products represent a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class that includes antibiotics, immunosuppressants, anticancer agents, toxins, siderophores, pigments, and cytostatics. The discovery of novel NRPs remains a laborious process because many NRPs consist of nonstandard amino acids that are assembled by nonribosomal peptide synthetases (NRPSs). Adenylation domains (A-domains) in NRPSs are responsible for selection and activation of monomers appearing in NRPs. During the past decade, several support vector machine-based algorithms have been developed for predicting the specificity of the monomers present in NRPs. These algorithms utilize physicochemical features of the amino acids present in the A-domains of NRPSs. In this article, we benchmarked the performance of various machine learning algorithms and features for predicting specificities of NRPSs and we showed that the extra trees model paired with one-hot encoding features outperforms the existing approaches. Moreover, we show that unsupervised clustering of 453 560 A-domains reveals many clusters that correspond to potentially novel amino acids. While it is challenging to predict the chemical structure of these amino acids, we developed novel techniques to predict their various properties, including polarity, hydrophobicity, charge, and presence of aromatic rings, carboxyl, and hydroxyl groups.
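
As a rough illustration of the one-hot encoding features mentioned in this abstract, the sketch below turns a short amino-acid signature into a flat 0/1 vector. The 20-letter alphabet, the function name `one_hot_encode`, and the example signature are assumptions made for illustration; the abstract does not specify how the authors built their features.

```python
# Hypothetical sketch: one-hot encoding of an amino-acid signature.
# The alphabet and the example signature are assumptions, not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter standard alphabet
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(signature: str) -> list[int]:
    """Return a flat 0/1 vector with one block of 20 positions per residue."""
    vector = []
    for residue in signature:
        block = [0] * len(AMINO_ACIDS)
        # Symbols outside the alphabet (e.g. gaps) stay as an all-zero block.
        if residue in INDEX:
            block[INDEX[residue]] = 1
        vector.extend(block)
    return vector


features = one_hot_encode("DAW-")  # made-up four-character signature
```

Vectors of this kind could then be passed to a tree-ensemble classifier, for example scikit-learn's `ExtraTreesClassifier`; whether the authors used that particular library is not stated here.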


Subject(s)
Amino Acids , Genome, Microbial , Algorithms , Multigene Family , Peptides
2.
Nat Methods ; 17(11): 1103-1110, 2020 11.
Article in English | MEDLINE | ID: mdl-33020656

ABSTRACT

Long-read sequencing technologies have substantially improved the assemblies of many isolate bacterial genomes as compared to fragmented short-read assemblies. However, assembling complex metagenomic datasets remains difficult even for state-of-the-art long-read assemblers. Here we present metaFlye, which addresses important long-read metagenomic assembly challenges, such as uneven bacterial composition and intra-species heterogeneity. First, we benchmarked metaFlye using simulated and mock bacterial communities and show that it consistently produces assemblies with better completeness and contiguity than state-of-the-art long-read assemblers. Second, we performed long-read sequencing of the sheep microbiome and applied metaFlye to reconstruct 63 complete or nearly complete bacterial genomes within single contigs. Finally, we show that long-read assembly of human microbiomes enables the discovery of full-length biosynthetic gene clusters that encode biomedically important natural products.


Subject(s)
Genome, Bacterial/genetics , Genome, Human/genetics , Metagenome/genetics , Metagenomics/methods , Microbiota/genetics , Algorithms , Animals , Benchmarking , Gastrointestinal Microbiome/genetics , Humans , Sequence Analysis, DNA/methods , Sheep , Software , Species Specificity
3.
Nat Biotechnol ; 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38168990

ABSTRACT

The throughput of mass spectrometers and the amount of publicly available metabolomics data are growing rapidly, but analysis tools such as molecular networking and Mass Spectrometry Search Tool do not scale to searching and clustering billions of mass spectral data in metabolomics repositories. To address this limitation, we designed MASST+ and Networking+, which can process datasets that are up to three orders of magnitude larger than those processed by state-of-the-art tools.

4.
Nat Commun ; 15(1): 5356, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38918378

ABSTRACT

Type 1 polyketides are a major class of natural products used as antiviral, antibiotic, antifungal, antiparasitic, immunosuppressive, and antitumor drugs. Analysis of public microbial genomes leads to the discovery of over sixty thousand type 1 polyketide gene clusters. However, the molecular products of only about a hundred of these clusters are characterized, leaving most metabolites unknown. Characterizing polyketides relies on bioactivity-guided purification, which is expensive and time-consuming. To address this, we present Seq2PKS, a machine learning algorithm that predicts chemical structures derived from Type 1 polyketide synthases. Seq2PKS predicts numerous putative structures for each gene cluster to enhance accuracy. The correct structure is identified using a variable mass spectral database search. Benchmarks show that Seq2PKS outperforms existing methods. Applying Seq2PKS to Actinobacteria datasets, we discover biosynthetic gene clusters for monazomycin, oasomycin A, and 2-aminobenzamide-actiphenol.


Subject(s)
Mass Spectrometry , Multigene Family , Polyketide Synthases , Polyketides , Polyketides/metabolism , Polyketides/chemistry , Polyketide Synthases/genetics , Polyketide Synthases/metabolism , Mass Spectrometry/methods , Data Mining/methods , Machine Learning , Actinobacteria/genetics , Actinobacteria/metabolism , Genome, Bacterial , Algorithms , Biological Products/chemistry , Biological Products/metabolism
5.
Nat Commun ; 14(1): 4219, 2023 07 14.
Article in English | MEDLINE | ID: mdl-37452020

ABSTRACT

Recent analyses of public microbial genomes have found over a million biosynthetic gene clusters, the natural products of the majority of which remain unknown. Additionally, GNPS harbors billions of mass spectra of natural products without known structures and biosynthetic genes. We bridge the gap between large-scale genome mining and mass spectral datasets for natural product discovery by developing HypoRiPPAtlas, an Atlas of hypothetical natural product structures, which is ready-to-use for in silico database search of tandem mass spectra. HypoRiPPAtlas is constructed by mining genomes using seq2ripp, a machine-learning tool for the prediction of ribosomally synthesized and post-translationally modified peptides (RiPPs). In HypoRiPPAtlas, we identify RiPPs in microbes and plants. HypoRiPPAtlas could be extended to other natural product classes in the future by implementing corresponding biosynthetic logic. This study paves the way for large-scale explorations of biosynthetic pathways and chemical structures of microbial and plant RiPP classes.


Subject(s)
Biological Products , Ribosomes , Ribosomes/metabolism , Biological Products/chemistry , Peptides/chemistry , Databases, Factual , Tandem Mass Spectrometry , Protein Processing, Post-Translational
6.
Nat Commun ; 12(1): 3225, 2021 05 28.
Article in English | MEDLINE | ID: mdl-34050176

ABSTRACT

Non-Ribosomal Peptides (NRPs) represent a biomedically important class of natural products that include a multitude of antibiotics and other clinically used drugs. NRPs are not directly encoded in the genome but are instead produced by metabolic pathways encoded by biosynthetic gene clusters (BGCs). Since the existing genome mining tools predict many putative NRPs synthesized by a given BGC, it remains unclear which of these putative NRPs are correct and how to identify post-assembly modifications of amino acids in these NRPs in a blind mode, without knowing which modifications exist in the sample. To address this challenge, here we report NRPminer, a modification-tolerant tool for NRP discovery from large (meta)genomic and mass spectrometry datasets. We show that NRPminer is able to identify many NRPs from different environments, including four previously unreported NRP families from soil-associated microbes and NRPs from human microbiota. Furthermore, in this work we demonstrate the anti-parasitic activities and the structure of two of these NRP families using direct bioactivity screening and nuclear magnetic resonance spectrometry, illustrating the power of NRPminer for discovering bioactive NRPs.


Subject(s)
Anti-Bacterial Agents/isolation & purification , Biological Products/isolation & purification , Computational Biology/methods , Drug Discovery/methods , Peptides/isolation & purification , Algorithms , Amino Acid Sequence/genetics , Anti-Bacterial Agents/biosynthesis , Biological Products/metabolism , Datasets as Topic , Humans , Mass Spectrometry , Metabolic Networks and Pathways/genetics , Metabolomics/methods , Metagenomics/methods , Microbiota/genetics , Multigene Family , Peptide Biosynthesis , Peptide Synthases/genetics , Peptide Synthases/metabolism , Peptides/genetics , Peptides/metabolism , Soil Microbiology
7.
Cell Syst ; 10(1): 99-108.e5, 2020 01 22.
Article in English | MEDLINE | ID: mdl-31864964

ABSTRACT

Cyclic and branch cyclic peptides (cyclopeptides) represent a class of bioactive natural products that include many antibiotics and anti-tumor compounds. Despite the recent advances in metabolomics analysis, still little is known about the cyclopeptides in the human gut and their possible interactions due to a lack of computational analysis pipelines that are applicable to such compounds. Here, we introduce CycloNovo, an algorithm for automated de novo cyclopeptide analysis and sequencing that employs de Bruijn graphs, the workhorse of DNA sequencing algorithms, to identify cyclopeptides in spectral datasets. CycloNovo reconstructed 32 previously unreported cyclopeptides (to the best of our knowledge) in the human gut and reported over a hundred cyclopeptides in other environments represented by various spectra on Global Natural Products Social Molecular Network (GNPS). https://github.com/bbehsaz/cyclonovo.
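
For readers unfamiliar with de Bruijn graphs, the sketch below builds the classic DNA-style version, in which every k-mer becomes an edge from its (k-1)-length prefix to its (k-1)-length suffix. It is a generic textbook construction under assumed names (`kmers_of`, `de_bruijn_graph`) and a toy sequence; it is not CycloNovo's adaptation to mass spectra, which the abstract does not describe in detail.

```python
from collections import defaultdict


def kmers_of(sequence, k):
    """Return all overlapping substrings of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def de_bruijn_graph(kmers):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it leads to."""
    graph = defaultdict(list)
    for kmer in kmers:
        # Each k-mer contributes one edge: prefix -> suffix.
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy example on a made-up DNA string.
graph = de_bruijn_graph(kmers_of("ACGTAC", 3))
```

Walking the edges of such a graph spells out candidate sequences, which is why the structure is widely used in assembly algorithms.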


Subject(s)
Amino Acid Sequence/genetics , Gastrointestinal Microbiome/genetics , Peptides, Cyclic/chemistry , Humans , Mass Spectrometry
8.
Plant Direct ; 2(2)2018 Feb.
Article in English | MEDLINE | ID: mdl-30417166

ABSTRACT

Orbitides are cyclic ribosomally-synthesized and post-translationally modified peptides (RiPPs) from plants; they consist of standard amino acids arranged in an unbroken chain of peptide bonds. These cyclic peptides are stable and range in size and topologies making them potential scaffolds for peptide drugs; some display valuable biological activities. Recently two orbitides whose sequences were buried in those of seed storage albumin precursors were said to represent the first observable step in the evolution of larger and hydrophilic bicyclic peptides. Here, guided by transcriptome data, we investigated peptide extracts of 40 species specifically for the more hydrophobic orbitides and confirmed 44 peptides by tandem mass spectrometry, as well as obtaining solution structures for four of them by NMR. Acquiring transcriptomes from the phylogenetically important Corymboideae family confirmed the precursor genes for the peptides (called PawS1-Like or PawL1) are confined to the Asteroideae, a subfamily of the huge plant family Asteraceae. To be confined to the Asteroideae indicates these peptides arose during the Eocene epoch around 45 Mya. Unlike other orbitides, all PawL-derived Peptides contain an Asp residue, needed for processing by asparaginyl endopeptidase. This study has revealed what is likely to be a very large new family of orbitides, uniquely buried alongside albumin and processed by asparaginyl endopeptidase.

9.
mSystems ; 3(3)2018.
Article in English | MEDLINE | ID: mdl-29795809

ABSTRACT

Although much work has linked the human microbiome to specific phenotypes and lifestyle variables, data from different projects have been challenging to integrate and the extent of microbial and molecular diversity in human stool remains unknown. Using standardized protocols from the Earth Microbiome Project and sample contributions from over 10,000 citizen-scientists, together with an open research network, we compare human microbiome specimens primarily from the United States, United Kingdom, and Australia to one another and to environmental samples. Our results show an unexpected range of beta-diversity in human stool microbiomes compared to environmental samples; demonstrate the utility of procedures for removing the effects of overgrowth during room-temperature shipping for revealing phenotype correlations; uncover new molecules and kinds of molecular communities in the human stool metabolome; and examine emergent associations among the microbiome, metabolome, and the diversity of plants that are consumed (rather than relying on reductive categorical variables such as veganism, which have little or no explanatory power). We also demonstrate the utility of the living data resource and cross-cohort comparison to confirm existing associations between the microbiome and psychiatric illness and to reveal the extent of microbiome change within one individual during surgery, providing a paradigm for open microbiome research and education. IMPORTANCE We show that a citizen science, self-selected cohort shipping samples through the mail at room temperature recaptures many known microbiome results from clinically collected cohorts and reveals new ones. Of particular interest is integrating n = 1 study data with the population data, showing that the extent of microbiome change after events such as surgery can exceed differences between distinct environmental biomes, and the effect of diverse plants in the diet, which we confirm with untargeted metabolomics on hundreds of samples.

10.
Aquat Toxicol ; 185: 48-57, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28187360

ABSTRACT

The ringed seal, Pusa hispida, is a keystone species in the Arctic marine ecosystem, and is proving a useful marine mammal for linking polychlorinated biphenyl (PCB) exposure to toxic injury. We report here the first de novo assembled transcriptome for the ringed seal (342,863 transcripts, of which 53% were annotated), which we then applied to a population of ringed seals exposed to a local PCB source in Arctic Labrador, Canada. We found an indication of energy metabolism imbalance in local ringed seals (n=4), and identified five significant gene transcript targets: plasminogen receptor (Plg-R(KT)), solute carrier family 25 member 43 receptor (Slc25a43), ankyrin repeat domain-containing protein 26-like receptor (Ankrd26), HIS30 (not yet annotated) and HIS16 (not yet annotated) that may represent indicators of PCB exposure and effects in marine mammals. The abundance profiles of these five gene targets were validated in blubber samples collected from 43 ringed seals using a qPCR assay. The mRNA transcript levels for all five gene targets (Plg-R(KT), r²=0.43; Slc25a43, r²=0.51; Ankrd26, r²=0.43; HIS30, r²=0.39; and HIS16, r²=0.31) correlated with increasing levels of blubber PCBs. Results from the present study contribute to our understanding of PCB-associated effects in marine mammals, and provide new tools for future molecular and toxicology work in pinnipeds.
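
The r² values reported above are squared correlation coefficients. As a minimal sketch of how such a value is computed for two paired measurement series, the function below squares the Pearson correlation. The function name `r_squared` and the toy numbers are assumptions chosen for illustration; they are not data from the study, and the abstract does not say which regression procedure the authors used.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)


# Made-up toy series (not study data): a perfect linear relation and a noisy one.
perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
noisy = r_squared([1, 2, 3], [1, 3, 2])
```

A value near 1 means one series is almost a linear function of the other; a value near 0 means little linear association.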


Subject(s)
Animal Structures/metabolism , Environmental Exposure/analysis , Health Status Indicators , Polychlorinated Biphenyls/toxicity , Seals, Earless/genetics , Transcriptome/genetics , Animals , Gene Expression Profiling , Gene Ontology , Molecular Sequence Annotation , Polymerase Chain Reaction , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reproducibility of Results , Sequence Analysis, RNA , Water Pollutants, Chemical/toxicity
12.
PLoS One ; 10(6): e0130720, 2015.
Article in English | MEDLINE | ID: mdl-26121473

ABSTRACT

In this work we studied the liver transcriptomes of two frog species, the American bullfrog (Rana (Lithobates) catesbeiana) and the African clawed frog (Xenopus laevis). We used high throughput RNA sequencing (RNA-seq) data to assemble and annotate these transcriptomes, and compared how their baseline expression profiles change when tadpoles of the two species are exposed to thyroid hormone. We generated more than 1.5 billion RNA-seq reads in total for the two species under two conditions as treatment/control pairs. We de novo assembled these reads using Trans-ABySS to reconstruct reference transcriptomes, obtaining over 350,000 and 130,000 putative transcripts for R. catesbeiana and X. laevis, respectively. Using available genomics resources for X. laevis, we annotated over 97% of our X. laevis transcriptome contigs, demonstrating the utility and efficacy of our methodology. Leveraging this validated analysis pipeline, we also annotated the assembled R. catesbeiana transcriptome. We used the expression profiles of the annotated genes of the two species to examine the similarities and differences between the tadpole liver transcriptomes. We also compared the gene ontology terms of expressed genes to measure how the animals react to a challenge by thyroid hormone. Our study reports three main conclusions. First, de novo assembly of RNA-seq data is a powerful method for annotating and establishing transcriptomes of non-model organisms. Second, the liver transcriptomes of the two frog species, R. catesbeiana and X. laevis, show many common features, and the distributions of their gene ontology profiles are statistically indistinguishable. Third, although they broadly respond the same way to the presence of thyroid hormone in their environment, their receptor/signal transduction pathways display marked differences.


Subject(s)
Genome , Genomics , Liver/metabolism , Rana catesbeiana/genetics , Transcriptome/genetics , Xenopus laevis/genetics , Animals , Gene Expression Profiling , Gene Expression Regulation, Developmental , Gene Ontology , High-Throughput Nucleotide Sequencing , Larva/genetics , Molecular Sequence Annotation , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reference Standards , Signal Transduction/genetics
13.
Gigascience ; 4: 35, 2015.
Article in English | MEDLINE | ID: mdl-26244089

ABSTRACT

BACKGROUND: Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problematic regions. In this regard, established and emerging long read technologies show great promise, but their current associated higher error rates typically require computational base correction and/or additional bioinformatics pre-processing before they can be of value. RESULTS: We present LINKS, the Long Interval Nucleotide K-mer Scaffolder algorithm, a method that makes use of the sequence properties of nanopore sequence data and other error-containing sequence data, to scaffold high-quality genome assemblies, without the need for read alignment or base correction. Here, we show how the contiguity of an ABySS Escherichia coli K-12 genome assembly can be increased more than five-fold by the use of beta-released Oxford Nanopore Technologies Ltd. long reads and how LINKS leverages long-range information in Saccharomyces cerevisiae W303 nanopore reads to yield assemblies whose resulting contiguity and correctness are on par with or better than those of competing applications. We also present the re-scaffolding of the colossal white spruce (Picea glauca) draft assembly (PG29, 20 Gbp) and demonstrate how LINKS scales to larger genomes. CONCLUSIONS: This study highlights the present utility of nanopore reads for genome scaffolding in spite of their current limitations, which are expected to diminish as the nanopore sequencing technology advances. We expect LINKS to have broad utility in harnessing the potential of long reads in connecting high-quality sequences of small and large genome assembly drafts.
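
The name "Long Interval Nucleotide K-mer Scaffolder" suggests pairs of k-mers taken a fixed interval apart along a long read. The sketch below shows one way such pairs could be extracted without aligning the read. The function name `paired_kmers`, the definition of `distance` as the outer span covering both k-mers, the `step` parameter, and the toy read are all assumptions for illustration; the abstract does not give the tool's actual parameters or pairing rules.

```python
def paired_kmers(read, k, distance, step=1):
    """Return (first_kmer, second_kmer) pairs whose outer span is `distance` bases.

    `distance` is assumed here to cover both k-mers; `step` is the offset
    between consecutive window starts along the read.
    """
    pairs = []
    for start in range(0, len(read) - distance + 1, step):
        first = read[start:start + k]
        second = read[start + distance - k:start + distance]
        pairs.append((first, second))
    return pairs


# Toy example on a made-up 10-base read.
pairs = paired_kmers("ACGTACGTAC", k=3, distance=8, step=1)
```

In a scaffolder, pairs like these could be looked up in the draft contigs: when the two k-mers of a pair fall in different contigs, the pair is evidence that those contigs lie roughly `distance` bases apart.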


Subject(s)
Genome , Sequence Alignment