Search | Virtual Health Library

AdenPredictor: accurate prediction of the adenylation domain specificity of nonribosomal peptide biosynthetic gene clusters in microbial genomes.

Mongia, Mihir; Baral, Romel; Adduri, Abhinav; Yan, Donghui; Liu, Yudong; Bian, Yuying; Kim, Paul; Behsaz, Bahar; Mohimani, Hosein.

Bioinformatics ; 39(Suppl 1): i40-i46, 2023 06 30.

Article in English | MEDLINE | ID: mdl-37387149

ABSTRACT

Microbial natural products represent a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class that includes antibiotics, immunosuppressants, anticancer agents, toxins, siderophores, pigments, and cytostatics. The discovery of novel NRPs remains a laborious process because many NRPs consist of nonstandard amino acids that are assembled by nonribosomal peptide synthetases (NRPSs). Adenylation domains (A-domains) in NRPSs are responsible for selection and activation of monomers appearing in NRPs. During the past decade, several support vector machine-based algorithms have been developed for predicting the specificity of the monomers present in NRPs. These algorithms utilize physicochemical features of the amino acids present in the A-domains of NRPSs. In this article, we benchmarked the performance of various machine learning algorithms and features for predicting specificities of NRPSs, and we showed that the extra trees model paired with one-hot encoding features outperforms the existing approaches. Moreover, we show that unsupervised clustering of 453 560 A-domains reveals many clusters that correspond to potentially novel amino acids. While it is challenging to predict the chemical structure of these amino acids, we developed novel techniques to predict their various properties, including polarity, hydrophobicity, charge, and presence of aromatic rings, carboxyl, and hydroxyl groups.

Subjects

Amino Acids, Microbial Genome, Algorithms, Multigene Family, Peptides
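
The AdenPredictor abstract above names one-hot encoding as the feature representation paired with the extra trees model. The following is a minimal sketch of that encoding only, under stated assumptions: the alphabet, the example signature string `"DAWT"`, and the function name `one_hot_encode` are hypothetical and are not taken from the paper or its software.

```python
# Hypothetical illustration: the alphabet, the example signature, and the
# function name are assumptions, not taken from AdenPredictor.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"  # 20 standard residues plus a gap symbol


def one_hot_encode(signature: str) -> list:
    """Return a flat 0/1 vector with one block of len(AMINO_ACIDS) per residue."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vector = []
    for residue in signature:
        block = [0] * len(AMINO_ACIDS)
        block[index[residue]] = 1  # raises KeyError on an unknown symbol
        vector.extend(block)
    return vector


encoded = one_hot_encode("DAWT")
print(len(encoded), sum(encoded))
```

Each residue contributes exactly one nonzero entry, so the vector length is the signature length times the alphabet size. Vectors of this kind could be passed to a tree ensemble such as scikit-learn's `ExtraTreesClassifier`; whether the authors used that particular library is not stated in the abstract.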

Fast mass spectrometry search and clustering of untargeted metabolomics data.

Mongia, Mihir; Yasaka, Tyler M; Liu, Yudong; Guler, Mustafa; Lu, Liang; Bhagwat, Aditya; Behsaz, Bahar; Wang, Mingxun; Dorrestein, Pieter C; Mohimani, Hosein.

Nat Biotechnol ; 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38168990

ABSTRACT

The throughput of mass spectrometers and the amount of publicly available metabolomics data are growing rapidly, but analysis tools such as molecular networking and Mass Spectrometry Search Tool do not scale to searching and clustering the billions of mass spectra in metabolomics repositories. To address this limitation, we designed MASST+ and Networking+, which can process datasets that are up to three orders of magnitude larger than those processed by state-of-the-art tools.

Large scale sequence alignment via efficient inference in generative models.

Mongia, Mihir; Shen, Chengze; Davoodi, Arash Gholami; Marçais, Guillaume; Mohimani, Hosein.

Sci Rep ; 13(1): 7285, 2023 05 04.

Article in English | MEDLINE | ID: mdl-37142645

ABSTRACT

Finding alignments between millions of reads and genome sequences is crucial in computational biology. Since the standard alignment algorithm has a large computational cost, heuristics have been developed to speed up this task. Though orders of magnitude faster, these methods lack theoretical guarantees and often have low sensitivity, especially when reads have many insertions, deletions, and mismatches relative to the genome. Here we develop a theoretically principled and efficient algorithm that has high sensitivity across a wide range of insertion, deletion, and mutation rates. We frame sequence alignment as an inference problem in a probabilistic model. Given a reference database of reads and a query read, we find the match that maximizes a log-likelihood ratio of a reference read and query read being generated jointly from a probabilistic model versus independent models. The brute force solution to this problem computes joint and independent probabilities between each query and reference pair, and its complexity grows linearly with database size. We introduce a bucketing strategy where reads with higher log-likelihood ratio are mapped to the same bucket with high probability. Experimental results show that our method is more accurate than the state-of-the-art approaches in aligning long reads from Pacific Biosciences sequencers to genome sequences.

Subjects

Algorithms, Genome, Sequence Alignment, Computational Biology/methods, Probability, Sequence Analysis, DNA/methods, Software, High-Throughput Nucleotide Sequencing
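
The sequence alignment abstract above describes a brute force baseline that scores every query and reference pair by a log-likelihood ratio of a joint model versus independent models. The following is a toy sketch of that baseline only, under loud assumptions: it uses a per-position substitution model over `ACGT` with a hypothetical `mismatch_rate` parameter, compares equal-length sequences, and ignores insertions and deletions. It does not implement the paper's probabilistic model or its bucketing strategy, and all names are illustrative.

```python
import math


def log_likelihood_ratio(query, reference, mismatch_rate=0.1):
    """Sum over positions of log[ p_joint(q, r) / (p(q) * p(r)) ].

    Assumed toy model: uniform base frequencies (1/4 each) and a joint model
    in which the two bases agree with probability 1 - mismatch_rate.
    """
    if len(query) != len(reference):
        raise ValueError("this toy model only compares equal-length sequences")
    p_match = (1.0 - mismatch_rate) / 4.0   # joint probability of one matching pair
    p_mismatch = mismatch_rate / 12.0       # joint probability of one mismatching pair
    p_independent = (1.0 / 4.0) ** 2        # product of the two marginals
    score = 0.0
    for q, r in zip(query, reference):
        p_joint = p_match if q == r else p_mismatch
        score += math.log(p_joint / p_independent)
    return score


def best_match(query, references):
    """Brute force search: score every reference, so cost grows linearly."""
    return max(references, key=lambda ref: log_likelihood_ratio(query, ref))


references = ["ACGTACGT", "ACGTTCGT", "TTTTGGGG"]
print(best_match("ACGTTCGA", references))
```

Because `best_match` evaluates the ratio once per reference, its running time is proportional to the database size, which is the cost the abstract's bucketing strategy is meant to avoid.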

An interpretable machine learning approach to identify mechanism of action of antibiotics.

Mongia, Mihir; Guler, Mustafa; Mohimani, Hosein.

Sci Rep ; 12(1): 10342, 2022 06 20.

Article in English | MEDLINE | ID: mdl-35725893

ABSTRACT

As antibiotic resistance is becoming a major public health problem worldwide, one approach to novel antibiotic discovery is repurposing drugs already on the market to treat antibiotic-resistant bacteria. The main economic advantage of this approach is that, since these drugs have already passed all the safety tests, it vastly reduces the overall cost of clinical trials. Recently, several machine learning approaches have been developed for predicting promising antibiotics by training on bioactivity data collected on a set of small molecules. However, these methods report hundreds or thousands of bioactive molecules, and it remains unclear which of these molecules possess a novel mechanism of action. While the cost of high-throughput bioactivity testing has dropped dramatically in recent years, determining the mechanism of action of small molecules remains a costly and time-consuming step, and therefore computational methods for prioritizing molecules with novel mechanisms of action are needed. The existing approaches for predicting bioactivity of small molecules are based on uninterpretable machine learning, and therefore are not capable of determining known mechanisms of action of small molecules or prioritizing novel mechanisms. We introduce InterPred, an interpretable technique for predicting bioactivity of small molecules and their mechanism of action. InterPred has the same accuracy as the state of the art in bioactivity prediction, and it enables assigning chemical moieties that are responsible for bioactivity. After analyzing bioactivity data of several thousand molecules against bacterial and fungal pathogens available from Community for Open Antimicrobial Drug Discovery and a US Food and Drug Administration-approved drug library, InterPred identified five known links between moieties and mechanism of action.

Subjects

Anti-Bacterial Agents, Anti-Infective Agents, Anti-Bacterial Agents/chemistry, Anti-Bacterial Agents/pharmacology, Bacteria, Drug Discovery/methods, Machine Learning

Repository scale classification and decomposition of tandem mass spectral data.

Mongia, Mihir; Mohimani, Hosein.

Sci Rep ; 11(1): 8314, 2021 04 15.

Article in English | MEDLINE | ID: mdl-33859284

ABSTRACT

Various studies have shown associations between molecular features and phenotypes of biological samples. These studies, however, focus on a single phenotype per study and are not applicable to repository-scale metabolomics data. Here we report MetSummarizer, a method for predicting (i) the biological phenotypes of environmental and host-oriented samples, and (ii) the raw ingredient composition of complex mixtures. We show that the aggregation of various metabolomic datasets can improve the accuracy of predictions. Since these datasets have been collected using different standards at various laboratories, it is crucial to detect and discard standard-specific features during the classification step in order to get unbiased results. We further report high accuracy in prediction of the raw ingredient composition of complex foods from the Global Foodomics Project.

Subjects

Datasets as Topic, Food Analysis, Metabolomics, Tandem Mass Spectrometry, Forecasting, Sensitivity and Specificity
