Results 1 - 20 of 33
1.
Bioinformatics ; 39(39 Suppl 1): i40-i46, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387149

ABSTRACT

Microbial natural products represent a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class that includes antibiotics, immunosuppressants, anticancer agents, toxins, siderophores, pigments, and cytostatics. The discovery of novel NRPs remains a laborious process because many NRPs consist of nonstandard amino acids that are assembled by nonribosomal peptide synthetases (NRPSs). Adenylation domains (A-domains) in NRPSs are responsible for selection and activation of monomers appearing in NRPs. During the past decade, several support vector machine-based algorithms have been developed for predicting the specificity of the monomers present in NRPs. These algorithms utilize physicochemical features of the amino acids present in the A-domains of NRPSs. In this article, we benchmarked the performance of various machine learning algorithms and features for predicting specificities of NRPSs and we showed that the extra trees model paired with one-hot encoding features outperforms the existing approaches. Moreover, we show that unsupervised clustering of 453 560 A-domains reveals many clusters that correspond to potentially novel amino acids. While it is challenging to predict the chemical structure of these amino acids, we developed novel techniques to predict their various properties, including polarity, hydrophobicity, charge, and presence of aromatic rings, carboxyl, and hydroxyl groups.


Subjects
Amino Acids, Microbial Genome, Algorithms, Multigene Family, Peptides
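
The one-hot encoding features named in the abstract above can be illustrated with a minimal sketch. The alphabet, the gap handling, and the example signature string are assumptions made for illustration and are not taken from the article. Vectors of this kind could then be passed to a tree ensemble such as scikit-learn's `ExtraTreesClassifier`, which is left out here to keep the example dependency-free.

```python
# 20 standard amino acids plus a gap symbol; this alphabet is an assumption.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot_encode(signature):
    """Flatten a residue string into a 0/1 vector.

    Each position contributes one block of len(ALPHABET) entries with a
    single 1 marking the residue observed at that position.
    """
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    vector = []
    for residue in signature.upper():
        block = [0] * len(ALPHABET)
        # Unknown symbols fall back to the gap column (a simplification).
        block[index.get(residue, index["-"])] = 1
        vector.extend(block)
    return vector


# Hypothetical 10-residue A-domain signature, for illustration only.
features = one_hot_encode("DAWTIAAVCK")
```

Every position yields exactly one non-zero entry, so the encoding carries no assumption about the physicochemical similarity of residues; any such structure has to be learned by the downstream model.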
2.
Nat Methods ; 17(9): 905-908, 2020 09.
Article in English | MEDLINE | ID: mdl-32839597

ABSTRACT

Molecular networking has become a key method to visualize and annotate the chemical space in non-targeted mass spectrometry data. We present feature-based molecular networking (FBMN) as an analysis method in the Global Natural Products Social Molecular Networking (GNPS) infrastructure that builds on chromatographic feature detection and alignment tools. FBMN enables quantitative analysis and resolution of isomers, including from ion mobility spectrometry.


Subjects
Biological Products/chemistry, Mass Spectrometry, Computational Biology/methods, Factual Databases, Metabolomics/methods, Software
3.
Genome Res ; 29(8): 1352-1362, 2019 08.
Article in English | MEDLINE | ID: mdl-31160374

ABSTRACT

Predicting biosynthetic gene clusters (BGCs) is critically important for discovery of antibiotics and other natural products. While BGC prediction from complete genomes is a well-studied problem, predicting BGCs in fragmented genomic assemblies remains challenging. The existing BGC prediction tools often assume that each BGC is encoded within a single contig in the genome assembly, a condition that is violated for most sequenced microbial genomes where BGCs are often scattered through several contigs, making it difficult to reconstruct them. The situation is even more severe in shotgun metagenomics, where the contigs are often short, and the existing tools fail to predict a large fraction of long BGCs. While it is difficult to assemble BGCs in a single contig, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding long BGCs. We describe biosyntheticSPAdes, a tool for predicting BGCs in assembly graphs and demonstrate that it greatly improves the reconstruction of BGCs from genomic and metagenomics data sets.


Subjects
Bacterial Genes, Metagenome, Metagenomics/methods, Multigene Family, Software, Contig Mapping, Datasets as Topic, Dental Plaque/microbiology, Gingiva/microbiology, Humans, Internet, Mouth Mucosa/microbiology, Pharynx/microbiology, Protein Biosynthesis, Tongue/microbiology
4.
Bioinformatics ; 37(Suppl_1): i231-i236, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252948

ABSTRACT

MOTIVATION: Untargeted mass spectrometry experiments enable the profiling of metabolites in complex biological samples. The collected fragmentation spectra are the metabolite's fingerprints that are used for molecule identification and discovery. Two main mass spectrometry strategies exist for the collection of fragmentation spectra: data-dependent acquisition (DDA) and data-independent acquisition (DIA). In the DIA strategy, all the metabolite ions in predefined mass-to-charge ratio ranges are co-isolated and co-fragmented, resulting in multiplexed fragmentation spectra that are challenging to annotate. In contrast, in the DDA strategy, fragmentation spectra are dynamically and specifically collected for the most abundant ions observed, causing redundancy and sub-optimal fragmentation spectra collection. Yet, DDA results in less multiplexed fragmentation spectra that can be readily annotated. RESULTS: We introduce the MS2Planner workflow, an Iterative Optimized Data Acquisition strategy that optimizes the number of high-quality fragmentation spectra over multiple experimental acquisitions using topological sorting. Our results showed that MS2Planner increases the annotation rate by 38.6% and is 62.5% more sensitive and 9.4% more specific compared to DDA. AVAILABILITY AND IMPLEMENTATION: MS2Planner code is available at https://github.com/mohimanilab/MS2Planner. The generation of the inclusion list from MS2Planner was performed with python scripts available at https://github.com/lfnothias/IODA_MS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Mass Spectrometry, Ions, Workflow
5.
Chem Soc Rev ; 49(11): 3297-3314, 2020 Jun 07.
Article in English | MEDLINE | ID: mdl-32393943

ABSTRACT

Microbial and plant specialized metabolites constitute an immense chemical diversity, and play key roles in mediating ecological interactions between organisms. Also referred to as natural products, they have been widely applied in medicine, agriculture, cosmetic and food industries. Traditionally, the main discovery strategies have centered around the use of activity-guided fractionation of metabolite extracts. Increasingly, omics data is being used to complement this, as it has the potential to reduce rediscovery rates, guide experimental work towards the most promising metabolites, and identify enzymatic pathways that enable their biosynthetic production. In recent years, genomic and metabolomic analyses of specialized metabolic diversity have been scaled up to study thousands of samples simultaneously. Here, we survey data analysis technologies that facilitate the effective exploration of large genomic and metabolomic datasets, and discuss various emerging strategies to integrate these two types of omics data in order to further accelerate discovery.


Subjects
Bacteria/metabolism, Biological Products/chemistry, Fungi/metabolism, Genomics/methods, Metabolomics/methods, Plants/metabolism, Biosynthetic Pathways, Computational Biology, Computer Simulation, Data Mining, Genetic Databases, Drug Discovery, High-Throughput Screening Assays, Humans, Secondary Metabolism
6.
Nat Chem Biol ; 13(1): 30-37, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27820803

ABSTRACT

Peptidic natural products (PNPs) are widely used compounds that include many antibiotics and a variety of other bioactive peptides. Although recent breakthroughs in PNP discovery raised the challenge of developing new algorithms for their analysis, identification of PNPs via database search of tandem mass spectra remains an open problem. To address this problem, natural product researchers use dereplication strategies that identify known PNPs and lead to the discovery of new ones, even in cases when the reference spectra are not present in existing spectral libraries. DEREPLICATOR is a new dereplication algorithm that enables high-throughput PNP identification and that is compatible with large-scale mass-spectrometry-based screening platforms for natural product discovery. After searching nearly one hundred million tandem mass spectra in the Global Natural Products Social (GNPS) molecular networking infrastructure, DEREPLICATOR identified an order of magnitude more PNPs (and their new variants) than any previous dereplication efforts.


Subjects
Algorithms, Biological Products/analysis, Chemical Databases, Drug Discovery/methods, Peptides/analysis, Tandem Mass Spectrometry
7.
Nat Prod Rep ; 33(1): 73-86, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26497201

ABSTRACT

Covering: 2000 to 2015. While recent breakthroughs in the discovery of peptide antibiotics and other Peptidic Natural Products (PNPs) raise a challenge for developing new algorithms for their analyses, the computational technologies for high-throughput PNP discovery are still lacking. We discuss the computational bottlenecks in analyzing PNPs and review recent advances in genome mining, peptidogenomics, and spectral networks that are now enabling the discovery of new PNPs via mass spectrometry. We further describe the connections between these advances and the new generation of software tools for PNP dereplication, de novo sequencing, and identification.


Subjects
Anti-Bacterial Agents, Biological Products, Peptides, Amino Acid Sequence, Anti-Bacterial Agents/chemistry, Anti-Bacterial Agents/metabolism, Biological Products/chemistry, Biological Products/metabolism, Genomics/methods, Molecular Sequence Data, Molecular Structure, Peptides/chemistry, Peptides/genetics
8.
J Nat Prod ; 77(8): 1902-9, 2014 Aug 22.
Article in English | MEDLINE | ID: mdl-25116163

ABSTRACT

Nonribosomal peptides (NRPs) such as vancomycin and daptomycin are among the most effective antibiotics. While NRPs are biomedically important, the computational techniques for sequencing these peptides are still in their infancy. The recent emergence of mass spectrometry techniques for NRP analysis (capable of sequencing an NRP from small amounts of nonpurified material) revealed an enormous diversity of NRPs. However, as many NRPs have nonlinear structure (e.g., cyclic or branched-cyclic peptides), the standard de novo sequencing tools (developed for linear peptides) are not applicable to NRP analysis. Here, we introduce the first NRP identification algorithm, NRPquest, that performs mutation-tolerant and modification-tolerant searches of spectral data sets against a database of putative NRPs. In contrast to previous studies aimed at NRP discovery (that usually report very few NRPs), NRPquest revealed nearly a hundred NRPs (including unknown variants of previously known peptides) in a single study. This result indicates that NRPquest can potentially make MS-based NRP identification as robust as the identification of linear peptides in traditional proteomics.


Subjects
Anti-Bacterial Agents/pharmacology, Biological Products/pharmacology, Peptides/pharmacology, Algorithms, Anti-Bacterial Agents/chemistry, Bacillus/genetics, Bacillus/metabolism, Biological Products/chemistry, Daptomycin/pharmacology, Mass Spectrometry, Molecular Structure, Peptide Synthases/metabolism, Peptides/chemistry, Proteomics, Streptomyces/genetics, Streptomyces/metabolism, Vancomycin/pharmacology
9.
Nat Biotechnol ; 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38168990

ABSTRACT

The throughput of mass spectrometers and the amount of publicly available metabolomics data are growing rapidly, but analysis tools such as molecular networking and Mass Spectrometry Search Tool do not scale to searching and clustering billions of mass spectral data in metabolomics repositories. To address this limitation, we designed MASST+ and Networking+, which can process datasets that are up to three orders of magnitude larger than those processed by state-of-the-art tools.

10.
Nat Commun ; 15(1): 5356, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38918378

ABSTRACT

Type 1 polyketides are a major class of natural products used as antiviral, antibiotic, antifungal, antiparasitic, immunosuppressive, and antitumor drugs. Analysis of public microbial genomes leads to the discovery of over sixty thousand type 1 polyketide gene clusters. However, the molecular products of only about a hundred of these clusters are characterized, leaving most metabolites unknown. Characterizing polyketides relies on bioactivity-guided purification, which is expensive and time-consuming. To address this, we present Seq2PKS, a machine learning algorithm that predicts chemical structures derived from Type 1 polyketide synthases. Seq2PKS predicts numerous putative structures for each gene cluster to enhance accuracy. The correct structure is identified using a variable mass spectral database search. Benchmarks show that Seq2PKS outperforms existing methods. Applying Seq2PKS to Actinobacteria datasets, we discover biosynthetic gene clusters for monazomycin, oasomycin A, and 2-aminobenzamide-actiphenol.


Subjects
Mass Spectrometry, Multigene Family, Polyketide Synthases, Polyketides, Polyketides/metabolism, Polyketides/chemistry, Polyketide Synthases/genetics, Polyketide Synthases/metabolism, Mass Spectrometry/methods, Data Mining/methods, Machine Learning, Actinobacteria/genetics, Actinobacteria/metabolism, Bacterial Genome, Algorithms, Biological Products/chemistry, Biological Products/metabolism
11.
J Proteome Res ; 12(4): 1560-8, 2013 Apr 05.
Article in English | MEDLINE | ID: mdl-23343606

ABSTRACT

While nonlinear peptide natural products such as Vancomycin and Daptomycin are among the most effective antibiotics, the computational techniques for sequencing such peptides are still in their infancy. Previous methods for sequencing peptide natural products are based on Nuclear Magnetic Resonance spectroscopy and require large amounts (milligrams) of purified materials. Recently, development of mass spectrometry-based methods has enabled accurate sequencing of nonlinear peptide natural products using picograms of material, but the question of evaluating statistical significance of Peptide Spectrum Matches (PSM) for these peptides remains open. Moreover, it is unclear how to decide whether a given spectrum is produced by a linear, cyclic, or branch-cyclic peptide. Surprisingly, all previous mass spectrometry studies overlooked the fact that a very similar problem has been successfully addressed in particle physics in 1951. In this work, we develop a method for estimating statistical significance of PSMs defined by any peptide (including linear and nonlinear). This method enables us to identify whether a peptide is linear, cyclic, or branch-cyclic, an important step toward identification of peptide natural products.


Subjects
Statistical Data Interpretation, Protein Databases, Peptides/analysis, Peptides/chemistry, Amino Acid Sequence, Bacterial Proteins/chemistry, Haemophilus influenzae/chemistry, Markov Chains, Mass Spectrometry/methods, Molecular Sequence Data, Cyclic Peptides/analysis, Cyclic Peptides/chemistry, Probability, Reproducibility of Results
12.
Sci Rep ; 13(1): 7285, 2023 05 04.
Article in English | MEDLINE | ID: mdl-37142645

ABSTRACT

Finding alignments between millions of reads and genome sequences is crucial in computational biology. Since the standard alignment algorithm has a large computational cost, heuristics have been developed to speed up this task. Though orders of magnitude faster, these methods lack theoretical guarantees and often have low sensitivity, especially when reads have many insertions, deletions, and mismatches relative to the genome. Here we develop a theoretically principled and efficient algorithm that has high sensitivity across a wide range of insertion, deletion, and mutation rates. We frame sequence alignment as an inference problem in a probabilistic model. Given a reference database of reads and a query read, we find the match that maximizes a log-likelihood ratio of a reference read and query read being generated jointly from a probabilistic model versus independent models. The brute force solution to this problem computes joint and independent probabilities between each query and reference pair, and its complexity grows linearly with database size. We introduce a bucketing strategy where reads with higher log-likelihood ratio are mapped to the same bucket with high probability. Experimental results show that our method is more accurate than the state-of-the-art approaches in aligning long reads from Pacific Biosciences sequencers to genome sequences.


Subjects
Algorithms, Genome, Sequence Alignment, Computational Biology/methods, Probability, DNA Sequence Analysis/methods, Software, High-Throughput Nucleotide Sequencing
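
The brute-force search described in the abstract above (score every reference against the query by a log-likelihood ratio, then take the maximum) can be sketched under a deliberately simplified model. The assumptions here are not from the article: sequences are already aligned and of equal length, bases are uniform, and the joint model allows substitutions only, with a hypothetical `error_rate`. The article's model also covers insertions and deletions, and its bucketing strategy is not reproduced.

```python
import math


def log_likelihood_ratio(query, reference, error_rate=0.1):
    """Log of P(query, reference | joint) / P(query) P(reference).

    Joint model: a uniform base is copied with probability 1 - error_rate,
    otherwise replaced by one of the 3 other bases. Independent model: both
    bases are uniform over 4 letters.
    """
    if len(query) != len(reference):
        raise ValueError("this sketch only handles equal-length sequences")
    match_term = math.log(4 * (1 - error_rate))   # (1/4)(1 - e) / (1/16)
    mismatch_term = math.log(4 * error_rate / 3)  # (1/4)(e / 3) / (1/16)
    return sum(
        match_term if q == r else mismatch_term
        for q, r in zip(query, reference)
    )


def best_match(query, references, error_rate=0.1):
    """Linear scan over the database, as in the brute-force solution."""
    return max(
        references,
        key=lambda ref: log_likelihood_ratio(query, ref, error_rate),
    )


hit = best_match("ACGT", ["ACGA", "ACGT", "TTTT"])
```

The scan costs one score per reference, which is the linear growth with database size that the abstract says the bucketing strategy is meant to avoid.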
13.
Nat Commun ; 14(1): 4219, 2023 07 14.
Article in English | MEDLINE | ID: mdl-37452020

ABSTRACT

Recent analyses of public microbial genomes have found over a million biosynthetic gene clusters, the natural products of the majority of which remain unknown. Additionally, GNPS harbors billions of mass spectra of natural products without known structures and biosynthetic genes. We bridge the gap between large-scale genome mining and mass spectral datasets for natural product discovery by developing HypoRiPPAtlas, an Atlas of hypothetical natural product structures, which is ready-to-use for in silico database search of tandem mass spectra. HypoRiPPAtlas is constructed by mining genomes using seq2ripp, a machine-learning tool for the prediction of ribosomally synthesized and post-translationally modified peptides (RiPPs). In HypoRiPPAtlas, we identify RiPPs in microbes and plants. HypoRiPPAtlas could be extended to other natural product classes in the future by implementing corresponding biosynthetic logic. This study paves the way for large-scale explorations of biosynthetic pathways and chemical structures of microbial and plant RiPP classes.


Subjects
Biological Products, Ribosomes, Ribosomes/metabolism, Biological Products/chemistry, Peptides/chemistry, Factual Databases, Tandem Mass Spectrometry, Post-Translational Protein Processing
14.
Sci Rep ; 12(1): 10342, 2022 06 20.
Article in English | MEDLINE | ID: mdl-35725893

ABSTRACT

As antibiotic resistance is becoming a major public health problem worldwide, one of the approaches for novel antibiotic discovery is re-purposing drugs available on the market for treating antibiotic-resistant bacteria. The main economic advantage of this approach is that since these drugs have already passed all the safety tests, it vastly reduces the overall cost of clinical trials. Recently, several machine learning approaches have been developed for predicting promising antibiotics by training on bioactivity data collected on a set of small molecules. However, these methods report hundreds or thousands of bioactive molecules, and it remains unclear which of these molecules possess a novel mechanism of action. While the cost of high-throughput bioactivity testing has dropped dramatically in recent years, determining the mechanism of action of small molecules remains a costly and time-consuming step, and therefore computational methods for prioritizing molecules with novel mechanisms of action are needed. The existing approaches for predicting bioactivity of small molecules are based on uninterpretable machine learning, and therefore are not capable of determining the known mechanisms of action of small molecules and prioritizing novel mechanisms. We introduce InterPred, an interpretable technique for predicting bioactivity of small molecules and their mechanism of action. InterPred has the same accuracy as the state of the art in bioactivity prediction, and it enables assigning chemical moieties that are responsible for bioactivity. After analyzing bioactivity data of several thousand molecules against bacterial and fungal pathogens available from Community for Open Antimicrobial Drug Discovery and a US Food and Drug Administration-approved drug library, InterPred identified five known links between moieties and mechanism of action.


Subjects
Anti-Bacterial Agents, Anti-Infective Agents, Anti-Bacterial Agents/chemistry, Anti-Bacterial Agents/pharmacology, Bacteria, Drug Discovery/methods, Machine Learning
15.
Metabolites ; 12(2)2022 Jan 26.
Article in English | MEDLINE | ID: mdl-35208194

ABSTRACT

The human microbiome is a complex community of microorganisms, their enzymes, and the molecules they produce or modify. Recent studies show that imbalances in human microbial ecosystems can cause disease. Our microbiome affects our health through the products of biochemical reactions catalyzed by microbial enzymes (microbial biotransformations). Despite their significance, currently, there are no systematic strategies for identifying these chemical reactions, their substrates and molecular products, and their effects on health and disease. We present TransDiscovery, a computational algorithm that integrates molecular networks (connecting related molecules with similar mass spectra), association networks (connecting co-occurring molecules and microbes) and knowledge bases of microbial enzymes to discover microbial biotransformations, their substrates, and their products. After searching the metabolomics and metagenomics data from the American Gut Project and the Global Foodomics Project, TransDiscovery identified 17 potentially novel biotransformations from the human gut microbiome, along with the corresponding microbial species, substrates, and products.
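
The abstract above describes molecular networks as connecting molecules with similar mass spectra. Spectral similarity in molecular networking is commonly scored with a cosine measure, and the sketch below shows a plain version of it. The peak-list format, the `tolerance` value, and the greedy peak matching are assumptions for illustration; the sketch does not implement the precursor-mass-shifted matching of modified cosine scores, and it is not taken from the TransDiscovery code.

```python
import math


def cosine_similarity(spec_a, spec_b, tolerance=0.02):
    """Cosine score between two peak lists [(mz, intensity), ...].

    Peaks are paired greedily, highest intensity product first, and each
    peak is used at most once.
    """
    # Candidate pairs whose m/z values agree within the tolerance
    pairs = sorted(
        (
            (int_a * int_b, i, j)
            for i, (mz_a, int_a) in enumerate(spec_a)
            for j, (mz_b, int_b) in enumerate(spec_b)
            if abs(mz_a - mz_b) <= tolerance
        ),
        reverse=True,
    )
    used_a, used_b, dot = set(), set(), 0.0
    for product, i, j in pairs:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product
    norm_a = math.sqrt(sum(intensity ** 2 for _, intensity in spec_a))
    norm_b = math.sqrt(sum(intensity ** 2 for _, intensity in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical spectra, for illustration only.
spectrum_1 = [(100.0, 1.0), (200.0, 2.0)]
spectrum_2 = [(100.01, 1.0), (200.0, 2.0)]
score = cosine_similarity(spectrum_1, spectrum_2)
```

In a network built this way, two spectra would be joined by an edge when their score exceeds a chosen threshold; the threshold is a tuning choice and is not specified in the abstract.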

16.
Proteomics ; 11(18): 3642-50, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21751357

ABSTRACT

Some of the most effective antibiotics (e.g. Vancomycin and Daptomycin) are cyclic peptides produced by non-ribosomal biosynthetic pathways. While hundreds of biomedically important cyclic peptides have been sequenced, the computational techniques for sequencing cyclic peptides are still in their infancy. Previous methods for sequencing peptide antibiotics and other cyclic peptides are based on Nuclear Magnetic Resonance spectroscopy, and require large amounts (milligrams) of purified materials that, for most compounds, are not possible to obtain. Recently, development of MS-based methods has provided some hope for accurate sequencing of cyclic peptides using picograms of materials. In this paper we develop a method for sequencing of cyclic peptides by multistage MS, and show its advantages over single-stage MS. The method is tested on known and new cyclic peptides from Bacillus brevis, Dianthus superbus and Streptomyces griseus, as well as a new family of cyclic peptides produced by marine bacteria.


Subjects
Mass Spectrometry/methods, Cyclic Peptides/chemistry, Protein Sequence Analysis/methods, Bacillus/chemistry, Bacterial Proteins/analysis, Bacterial Proteins/chemistry, Biological Products/chemistry, Dianthus/chemistry, Macrolides/chemistry, Magnetic Resonance Spectroscopy, Markov Chains, Streptomyces/chemistry, Tyrocidine/chemistry
17.
J Proteome Res ; 10(10): 4505-12, 2011 Oct 07.
Article in English | MEDLINE | ID: mdl-21851130

ABSTRACT

Hundreds of ribosomally synthesized cyclopeptides have been isolated from all domains of life, the vast majority having been reported in the last 15 years. Studies of cyclic peptides have highlighted their exceptional potential both as stable drug scaffolds and as biomedicines in their own right. Despite this, computational techniques for cyclopeptide identification are still in their infancy, with many such peptides remaining uncharacterized. Tandem mass spectrometry has occupied a niche role in cyclopeptide identification, taking over from traditional techniques such as nuclear magnetic resonance spectroscopy (NMR). MS/MS studies require only picogram quantities of peptide (compared to milligrams for NMR studies) and are applicable to complex samples, abolishing the requirement for time-consuming chromatographic purification. While database search tools such as Sequest and Mascot have become standard tools for the MS/MS identification of linear peptides, they are not applicable to cyclopeptides, due to the parent mass shift resulting from cyclization and different fragmentation patterns of cyclic peptides. In this paper, we describe the development of a novel database search methodology to aid in the identification of cyclopeptides by mass spectrometry and evaluate its utility in identifying two peptide rings from Helianthus annuus, a bacterial cannibalism factor from Bacillus subtilis, and a θ-defensin from Rhesus macaque.


Subjects
Mass Spectrometry/methods, Peptides/chemistry, Animals, Bacillus subtilis/metabolism, Borohydrides/chemistry, Genetic Databases, Genome, Helianthus/metabolism, Macaca mulatta, Proteomics/methods, Ribosomes/chemistry, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry, Tandem Mass Spectrometry/methods, Trypsin/chemistry
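
The parent mass shift mentioned in the abstract above can be made concrete with a small calculation. The sketch assumes head-to-tail cyclization of an unmodified peptide, in which forming the extra amide bond removes one water molecule; other cyclization chemistries shift the mass differently. The example sequence is hypothetical, and the function names are introduced here for illustration only.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic mass of H2O


def linear_mass(sequence):
    """Neutral mass of a linear peptide: residues plus the terminal H and OH."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def cyclic_mass(sequence):
    """Neutral mass after head-to-tail cyclization: residues only."""
    return sum(RESIDUE_MASS[aa] for aa in sequence)


peptide = "GIPCAESCVW"  # hypothetical sequence, for illustration only
shift = linear_mass(peptide) - cyclic_mass(peptide)
```

A search tool that assumes linear peptides would therefore expect a parent mass about 18.01 Da heavier than the one observed for the cyclic form, which is one reason such tools miss cyclopeptides.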
18.
J Nat Prod ; 74(5): 928-36, 2011 May 27.
Article in English | MEDLINE | ID: mdl-21488639

ABSTRACT

A family of cancer cell cytotoxic cyclodepsipeptides, veraguamides A-C (1-3) and H-L (4-8), were isolated from a collection of cf. Oscillatoria margaritifera obtained from the Coiba National Park, Panama, as part of the Panama International Cooperative Biodiversity Group program. The planar structure of veraguamide A (1) was deduced by 2D NMR spectroscopy and mass spectrometry, whereas the structures of 2-8 were mainly determined by a combination of 1H NMR and MS2/MS3 techniques. These new compounds are analogous to the mollusk-derived kulomo'opunalide natural products, with two of the veraguamides (C and H) containing the same terminal alkyne moiety. However, four veraguamides, A, B, K, and L, also feature an alkynyl bromide, a functionality that has been previously observed in only one other marine natural product, jamaicamide A. Veraguamide A showed potent cytotoxicity to the H-460 human lung cancer cell line (LD50=141 nM).


Subjects
Depsipeptides/isolation & purification, Depsipeptides/pharmacology, Oscillatoria/chemistry, Amides/chemistry, Amides/isolation & purification, Depsipeptides/chemistry, Antitumor Drug Screening Assays, Humans, Lipopeptides/chemistry, Lipopeptides/isolation & purification, Marine Biology, Molecular Structure, Biomolecular Nuclear Magnetic Resonance, Panama, Pyrrolidinones/chemistry, Pyrrolidinones/isolation & purification
19.
Sci Rep ; 11(1): 8314, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33859284

ABSTRACT

Various studies have shown associations between molecular features and phenotypes of biological samples. These studies, however, focus on a single phenotype per study and are not applicable to repository scale metabolomics data. Here we report MetSummarizer, a method for predicting (i) the biological phenotypes of environmental and host-oriented samples, and (ii) the raw ingredient composition of complex mixtures. We show that the aggregation of various metabolomic datasets can improve the accuracy of predictions. Since these datasets have been collected using different standards at various laboratories, in order to get unbiased results it is crucial to detect and discard standard-specific features during the classification step. We further report high accuracy in prediction of the raw ingredient composition of complex foods from the Global Foodomics Project.


Subjects
Datasets as Topic, Food Analysis, Metabolomics, Tandem Mass Spectrometry, Forecasting, Sensitivity and Specificity
20.
Nat Commun ; 12(1): 3718, 2021 06 17.
Article in English | MEDLINE | ID: mdl-34140479

ABSTRACT

Identification of small molecules is a critical task in various areas of life science. Recent advances in mass spectrometry have enabled the collection of tandem mass spectra of small molecules from hundreds of thousands of environments. To identify which molecules are present in a sample, one can search mass spectra collected from the sample against millions of molecular structures in small molecule databases. The existing approaches are based on chemistry domain knowledge, and they fail to explain many of the peaks in mass spectra of small molecules. Here, we present molDiscovery, a mass spectral database search method that improves both efficiency and accuracy of small molecule identification by learning a probabilistic model to match small molecules with their mass spectra. A search of over 8 million spectra from the Global Natural Products Social molecular networking infrastructure shows that molDiscovery correctly identifies six times more unique small molecules than previous methods.


Subjects
High-Throughput Screening Assays/methods, Metabolomics/methods, Small Molecule Libraries/analysis, Tandem Mass Spectrometry/methods, Algorithms, Bacteria/isolation & purification, Bacteria/metabolism, Benchmarking, Computer Simulation, Chemical Databases, Humans, Lipids/isolation & purification, Statistical Models, Plants/metabolism, Secondary Metabolism, Software