Results 1 - 20 of 50
1.
Bioinformatics ; 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39271156

ABSTRACT

MOTIVATION: Molecular representation learning (MRL) models molecules with low-dimensional vectors to support biological and chemical applications. Current methods primarily rely on intrinsic molecular information to learn molecular representations, but they often overlook effectively integrating domain knowledge into MRL. RESULTS: In this paper, we develop a reaction-enhanced graph learning (RXGL) framework for MRL, utilizing chemical reactions as domain knowledge. RXGL introduces dual graph learning modules to model molecule representation. One module employs graph convolutions on molecular graphs to capture molecule structures. The other module constructs a reaction-aware graph from chemical reactions and designs a novel graph attention network on this graph to integrate reaction-level relations into molecular modeling. To refine molecule representations, we design a reaction-based relation learning task, which considers the relations between the reactant and product sides in reactions. In addition, we introduce a cross-view contrastive task to strengthen the cooperative associations between molecular and reaction-aware graph learning. Experiment results show that our RXGL achieves strong performance in various downstream tasks, including product prediction, reaction classification, and molecular property prediction. AVAILABILITY AND IMPLEMENTATION: The code is publicly available at https://github.com/coder-ACAC/RLM. SUPPLEMENTARY INFORMATION: Supplementary data is available at Bioinformatics online.

2.
PLoS One ; 19(9): e0309921, 2024.
Article in English | MEDLINE | ID: mdl-39250478

ABSTRACT

Multi-omics analysis offers a promising avenue to a better understanding of complex biological phenomena. In particular, untangling the pathophysiology of multifactorial health conditions such as inflammatory bowel disease (IBD) could benefit from simultaneous consideration of several omics levels. However, taking full advantage of multi-omics data requires the adoption of suitable new tools. Multi-view learning, a machine learning technique that natively joins together heterogeneous data, is a natural source for such methods. Here we present a new approach to variable selection in unsupervised multi-view learning by applying stability selection to canonical correlation analysis (CCA). We apply our method, StabilityCCA, to simulated and real multi-omics data, and demonstrate its ability to find relevant variables and improve the stability of variable selection. In a case study on an IBD microbiome data set, we link together metagenomics and metabolomics, revealing a connection between their joint structure and the disease, and identifying potential biomarkers. Our results showcase the usefulness of multi-view learning in multi-omics analysis and demonstrate StabilityCCA as a powerful tool for biomarker discovery.


Subject(s)
Biomarkers , Inflammatory Bowel Diseases , Metabolomics , Humans , Biomarkers/metabolism , Inflammatory Bowel Diseases/metabolism , Metabolomics/methods , Metagenomics/methods , Machine Learning , Gastrointestinal Microbiome , Multiomics
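
The stability selection idea named in the abstract above can be sketched generically: refit a sparse selector on many random subsamples and keep only the variables that are chosen in most of them. This is a minimal sketch, not the StabilityCCA implementation. The function name, the `select` callback (which in practice would wrap a sparse CCA fit), and the default subsample fraction and threshold are all assumptions made for illustration.

```python
import random


def stability_selection(n_samples, select, n_rounds=100, frac=0.5,
                        threshold=0.8, seed=0):
    """Return (stable_variables, selection_frequencies).

    `select` is a hypothetical callback: it receives a list of sample
    indices and returns the set of variable indices chosen on that subsample.
    """
    rng = random.Random(seed)
    size = max(1, int(frac * n_samples))
    counts = {}
    for _ in range(n_rounds):
        # Draw a random subsample without replacement.
        subset = rng.sample(range(n_samples), size)
        for var in select(subset):
            counts[var] = counts.get(var, 0) + 1
    # Fraction of rounds in which each variable was selected.
    freq = {var: c / n_rounds for var, c in counts.items()}
    stable = sorted(var for var, f in freq.items() if f >= threshold)
    return stable, freq
```

A variable that is selected only when particular samples happen to be in the subsample ends up with a low frequency and is filtered out, which is the stabilising effect the abstract refers to.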
3.
Bioinformatics ; 40(8)2024 08 02.
Article in English | MEDLINE | ID: mdl-39115379

ABSTRACT

MOTIVATION: Drug-target interactions (DTIs) hold a pivotal role in drug repurposing and elucidation of drug mechanisms of action. While single-targeted drugs have demonstrated clinical success, they often exhibit limited efficacy against complex diseases, such as cancers, whose development and treatment are dependent on several biological processes. Therefore, a comprehensive understanding of primary, secondary and even inactive targets becomes essential in the quest for effective and safe treatments for cancer and other indications. The human proteome offers over a thousand druggable targets, yet most FDA-approved drugs bind to only a small fraction of these targets. RESULTS: This study introduces an attention-based method (called MMAtt-DTA) to predict drug-target bioactivities across human proteins within seven superfamilies. We meticulously examined nine different descriptor sets to identify optimal signature descriptors for predicting novel DTIs. Our testing results demonstrated Spearman correlations exceeding 0.72 (P < 0.001) for six out of seven superfamilies. The proposed method outperformed fourteen state-of-the-art machine learning, deep learning and graph-based methods and maintained relatively high performance for most target superfamilies when tested with independent bioactivity data sources. We computationally validated 185 676 drug-target pairs from ChEMBL-V33 that were not available during model training, achieving a reasonable performance with Spearman correlation >0.57 (P < 0.001) for most superfamilies. This underscores the robustness of the proposed method for predicting novel DTIs. Finally, we applied our method to predict missing bioactivities among 3492 approved molecules in ChEMBL-V33, offering a valuable tool for advancing drug mechanism discovery and repurposing existing drugs for new indications. AVAILABILITY AND IMPLEMENTATION: https://github.com/AronSchulman/MMAtt-DTA.


Subject(s)
Drug Repositioning , Humans , Drug Repositioning/methods , Proteins/metabolism , Proteins/chemistry , Machine Learning , Computational Biology/methods , Drug Discovery/methods
4.
Curr Opin Struct Biol ; 86: 102827, 2024 06.
Article in English | MEDLINE | ID: mdl-38705070

ABSTRACT

In this mini-review, we explore the new prediction methods for drug combination synergy relying on high-throughput combinatorial screens. The fast progress of the field is witnessed in the more than thirty original machine learning methods published since 2021, a clear majority of them based on deep learning techniques. We aim to put these articles under a unifying lens by highlighting the core technologies, the data sources, the input data types and synergy scores used in the methods, as well as the prediction scenarios and evaluation protocols that the articles deal with. Our finding is that the best methods accurately solve the synergy prediction scenarios involving known drugs or cell lines while the scenarios involving new drugs or cell lines still fall short of an accurate prediction level.


Subject(s)
Drug Synergism , Humans , Machine Learning , High-Throughput Screening Assays/methods , Deep Learning
5.
BMC Bioinformatics ; 25(1): 174, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698340

ABSTRACT

BACKGROUND: In the last two decades, the use of high-throughput sequencing technologies has accelerated the pace of discovery of proteins. However, due to the time and resource limitations of rigorous experimental functional characterization, the functions of a vast majority of them remain unknown. As a result, computational methods offering accurate, fast and large-scale assignment of functions to new and previously unannotated proteins are sought after. Leveraging the underlying associations between the multiplicity of features that describe proteins could reveal functional insights into the diverse roles of proteins and improve performance on the automatic function prediction task. RESULTS: We present GO-LTR, a multi-view multi-label prediction model that relies on a high-order tensor approximation of model weights combined with non-linear activation functions. The model is capable of learning high-order relationships between multiple input views representing the proteins and predicting high-dimensional multi-label output consisting of protein functional categories. We demonstrate the competitiveness of our method on various performance measures. Experiments show that GO-LTR learns polynomial combinations between different protein features, resulting in improved performance. Additional investigations establish GO-LTR's practical potential in assigning functions to proteins under diverse challenging scenarios: very low sequence similarity to previously observed sequences, rarely observed and highly specific terms in the gene ontology. IMPLEMENTATION: The code and data used for training GO-LTR are available at https://github.com/aalto-ics-kepaco/GO-LTR-prediction .


Subject(s)
Computational Biology , Proteins , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Databases, Protein , Algorithms
6.
J Cheminform ; 16(1): 46, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38650016

ABSTRACT

Accurate atom mapping, which establishes correspondences between atoms in reactants and products, is a crucial step in analyzing chemical reactions. In this paper, we present a novel end-to-end approach that formulates the atom mapping problem as a deep graph matching task. Our proposed model, AMNet (Atom Matching Network), utilizes molecular graph representations and employs various atom and bond features using graph neural networks to capture the intricate structural characteristics of molecules, ensuring precise atom correspondence predictions. Notably, AMNet incorporates the consideration of molecule symmetry, enhancing accuracy while simultaneously reducing computational complexity. The integration of the Weisfeiler-Lehman isomorphism test for symmetry identification refines the model's predictions. Furthermore, our model maps the entire atom set in a chemical reaction, offering a comprehensive approach beyond focusing solely on the main molecules in reactions. We evaluated AMNet's performance on a subset of USPTO reaction datasets, addressing various tasks, including assessing the impact of molecular symmetry identification, understanding the influence of feature selection on AMNet performance, and comparing its performance with the state-of-the-art method. The result reveals an average accuracy of 97.3% on mapped atoms, with 99.7% of reactions correctly mapped when the correct mapped atom is within the top 10 predicted atoms. SCIENTIFIC CONTRIBUTION: The paper introduces a novel end-to-end deep graph matching model for atom mapping, utilizing molecular graph representations to capture structural characteristics effectively. It enhances accuracy by integrating symmetry detection through the Weisfeiler-Lehman test, reducing the number of possible mappings and improving efficiency. Unlike previous methods, it maps the entire reaction, not just main components, providing a comprehensive view. Additionally, by integrating efficient graph matching techniques, it reduces computational complexity, making atom mapping more feasible.
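
To illustrate how a Weisfeiler-Lehman style test can identify symmetric atoms, the following minimal sketch runs plain colour refinement on a molecular graph and returns one class label per atom. It is not AMNet's code. The function name and the input encoding (element symbols plus an undirected bond list, ignoring bond types) are assumptions made for this example.

```python
def wl_symmetry_classes(atom_labels, bonds, max_rounds=None):
    """Colour refinement: atoms sharing a final colour are treated as
    topologically equivalent candidates. `bonds` is a list of (i, j) pairs."""
    n = len(atom_labels)
    neighbours = [[] for _ in range(n)]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Initial colours come from the element symbols.
    palette = {lab: i for i, lab in enumerate(sorted(set(atom_labels)))}
    colours = [palette[lab] for lab in atom_labels]

    for _ in range(max_rounds or n):
        # Signature = own colour plus the sorted multiset of neighbour colours.
        signatures = [
            (colours[i], tuple(sorted(colours[j] for j in neighbours[i])))
            for i in range(n)
        ]
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
        new_colours = [palette[sig] for sig in signatures]
        # Refinement can only split classes, so an unchanged class count
        # means the partition is stable.
        if len(set(new_colours)) == len(set(colours)):
            break
        colours = new_colours
    return colours
```

For a three-carbon chain, `wl_symmetry_classes(["C", "C", "C"], [(0, 1), (1, 2)])` puts the two terminal carbons in one class and the central carbon in another, so a mapper only needs to distinguish two kinds of atom instead of three.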

7.
Adv Sci (Weinh) ; 11(8): e2306235, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38095508

ABSTRACT

Aerosol particles found in the atmosphere affect the climate and worsen air quality. To mitigate these adverse impacts, aerosol particle formation and aerosol chemistry in the atmosphere need to be better mapped out and understood. Currently, mass spectrometry is the single most important analytical technique in atmospheric chemistry and is used to track and identify compounds and processes. Large amounts of data are collected in each measurement of current time-of-flight and orbitrap mass spectrometers using modern rapid data acquisition practices. However, compound identification remains a major bottleneck during data analysis due to lacking reference libraries and analysis tools. Data-driven compound identification approaches could alleviate the problem, yet remain rare to non-existent in atmospheric science. In this perspective, the authors review the current state of data-driven compound identification with mass spectrometry in atmospheric science and discuss current challenges and possible future steps toward a digital era for atmospheric mass spectrometry.

8.
PLoS Comput Biol ; 18(6): e1010177, 2022 06.
Article in English | MEDLINE | ID: mdl-35658018

ABSTRACT

Engineered microbial cells present a sustainable alternative to fossil-based synthesis of chemicals and fuels. Cellular synthesis routes are readily assembled and introduced into microbial strains using state-of-the-art synthetic biology tools. However, the optimization of the strains required to reach industrially feasible production levels is far less efficient. It typically relies on trial-and-error leading into high uncertainty in total duration and cost. New techniques that can cope with the complexity and limited mechanistic knowledge of the cellular regulation are called for guiding the strain optimization. In this paper, we put forward a multi-agent reinforcement learning (MARL) approach that learns from experiments to tune the metabolic enzyme levels so that the production is improved. Our method is model-free and does not assume prior knowledge of the microbe's metabolic network or its regulation. The multi-agent approach is well-suited to make use of parallel experiments such as multi-well plates commonly used for screening microbial strains. We demonstrate the method's capabilities using the genome-scale kinetic model of Escherichia coli, k-ecoli457, as a surrogate for an in vivo cell behaviour in cultivation experiments. We investigate the method's performance relevant for practical applicability in strain engineering i.e. the speed of convergence towards the optimum response, noise tolerance, and the statistical stability of the solutions found. We further evaluate the proposed MARL approach in improving L-tryptophan production by yeast Saccharomyces cerevisiae, using publicly available experimental data on the performance of a combinatorial strain library. Overall, our results show that multi-agent reinforcement learning is a promising approach for guiding the strain optimization beyond mechanistic knowledge, with the goal of faster and more reliably obtaining industrially attractive production levels.


Subject(s)
Metabolic Engineering , Saccharomyces cerevisiae , Escherichia coli/genetics , Escherichia coli/metabolism , Metabolic Engineering/methods , Metabolic Networks and Pathways , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Synthetic Biology
9.
Comput Struct Biotechnol J ; 20: 2807-2814, 2022.
Article in English | MEDLINE | ID: mdl-35685365

ABSTRACT

Synergistic effects between drugs are rare and highly context-dependent and patient-specific. Hence, there is a need to develop novel approaches to stratify patients for optimal therapy regimens, especially in the context of personalized design of combinatorial treatments. Computational methods enable systematic in-silico screening of combination effects, and can thereby prioritize most potent combinations for further testing, among the massive number of potential combinations. To help researchers to choose a prediction method that best fits for various real-world applications, we carried out a systematic literature review of 117 computational methods developed to date for drug combination prediction, and classified the methods in terms of their combination prediction tasks and input data requirements. Most current methods focus on prediction or classification of combination synergy, and only a few methods consider the efficacy and potential toxicity of the combinations, which are the key determinants of therapeutic success of drug treatments. Furthermore, there is a need to further develop methods that enable dose-specific predictions of combination effects across multiple doses, which is important for clinical translation of the predictions, as well as model-based identification of biomarkers predictive of heterogeneous drug combination responses. Even if most of the computational methods reviewed focus on anticancer applications, many of the modelling approaches are also applicable to antiviral and other diseases or indications.

10.
Bioinformatics ; 37(Suppl_1): i93-i101, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252952

ABSTRACT

MOTIVATION: Combination therapies have emerged as a powerful treatment modality to overcome drug resistance and improve treatment efficacy. However, the number of possible drug combinations increases very rapidly with the number of individual drugs in consideration, which makes the comprehensive experimental screening infeasible in practice. Machine-learning models offer time- and cost-efficient means to aid this process by prioritizing the most effective drug combinations for further pre-clinical and clinical validation. However, the complexity of the underlying interaction patterns across multiple drug doses and in different cellular contexts poses challenges to the predictive modeling of drug combination effects. RESULTS: We introduce comboLTR, a highly time-efficient method for learning complex, non-linear target functions for describing the responses of therapeutic agent combinations in various doses and cancer cell contexts. The method is based on a polynomial regression via powerful latent tensor reconstruction. It uses a combination of recommender system-style features indexing the data tensor of response values in different contexts, and chemical and multi-omics features as inputs. We demonstrate that comboLTR outperforms state-of-the-art methods in terms of predictive performance and running time, and produces highly accurate results even in the challenging and practical inference scenario where full dose-response matrices are predicted for completely new drug combinations with no available combination and monotherapy response measurements in any training cell line. AVAILABILITY AND IMPLEMENTATION: comboLTR code is available at https://github.com/aalto-ics-kepaco/ComboLTR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Neoplasms , Algorithms , Cell Line , Drug Combinations , Humans
11.
PLoS Comput Biol ; 17(5): e1008920, 2021 05.
Article in English | MEDLINE | ID: mdl-33945539

ABSTRACT

Specialised metabolites from microbial sources are well-known for their wide range of biomedical applications, particularly as antibiotics. When mining paired genomic and metabolomic data sets for novel specialised metabolites, establishing links between Biosynthetic Gene Clusters (BGCs) and metabolites represents a promising way of finding such novel chemistry. However, due to the lack of detailed biosynthetic knowledge for the majority of predicted BGCs, and the large number of possible combinations, this is not a simple task. This problem is becoming ever more pressing with the increased availability of paired omics data sets. Current tools are not effective at identifying valid links automatically, and manual verification is a considerable bottleneck in natural product research. We demonstrate that using multiple link-scoring functions together makes it easier to prioritise true links relative to others. Based on standardising a commonly used score, we introduce a new, more effective score, and introduce a novel score using an Input-Output Kernel Regression approach. Finally, we present NPLinker, a software framework to link genomic and metabolomic data. Results are verified using publicly available data sets that include validated links.


Subject(s)
Genetics, Microbial/statistics & numerical data , Genomics/statistics & numerical data , Metabolomics/statistics & numerical data , Software , Biosynthetic Pathways/genetics , Computational Biology , Data Mining , Databases, Factual , Databases, Genetic , Genome, Microbial , Microbiological Phenomena , Multigene Family , Regression Analysis
12.
Bioinformatics ; 37(12): 1724-1731, 2021 07 19.
Article in English | MEDLINE | ID: mdl-33244585

ABSTRACT

MOTIVATION: Identification of small molecules in a biological sample remains a major bottleneck in molecular biology, despite a decade of rapid development of computational approaches for predicting molecular structures using mass spectrometry (MS) data. Recently, there has been increasing interest in utilizing other information sources, such as liquid chromatography (LC) retention time (RT), to improve identifications solely based on MS information, such as precursor mass-per-charge and tandem mass spectrometry (MS2). RESULTS: We put forward a probabilistic modelling framework to integrate MS and RT data of multiple features in an LC-MS experiment. We model the MS measurements and all pairwise retention order information as a Markov random field and use efficient approximate inference for scoring and ranking potential molecular structures. Our experiments show improved identification accuracy by combining MS2 data and retention orders using our approach, thereby outperforming state-of-the-art methods. Furthermore, we demonstrate the benefit of our model when only a subset of LC-MS features has MS2 measurements available besides MS1. AVAILABILITY AND IMPLEMENTATION: Software and data are freely available at https://github.com/aalto-ics-kepaco/msms_rt_score_integration. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Tandem Mass Spectrometry , Chromatography, Liquid , Models, Statistical
13.
Nat Biotechnol ; 39(4): 462-471, 2021 04.
Article in English | MEDLINE | ID: mdl-33230292

ABSTRACT

Metabolomics using nontargeted tandem mass spectrometry can detect thousands of molecules in a biological sample. However, structural molecule annotation is limited to structures present in libraries or databases, restricting analysis and interpretation of experimental data. Here we describe CANOPUS (class assignment and ontology prediction using mass spectrometry), a computational tool for systematic compound class annotation. CANOPUS uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes. CANOPUS explicitly targets compounds for which neither spectral nor structural reference data are available and predicts classes lacking tandem mass spectrometry training data. In evaluation using reference data, CANOPUS reached very high prediction performance (average accuracy of 99.7% in cross-validation) and outperformed four baseline methods. We demonstrate the broad utility of CANOPUS by investigating the effect of microbial colonization in the mouse digestive system, through analysis of the chemodiversity of different Euphorbia plants and regarding the discovery of a marine natural product, revealing biological insights at the compound class level.


Subject(s)
Aquatic Organisms/chemistry , Biological Products/analysis , Computational Biology/methods , Euphorbia/chemistry , Metabolomics/methods , Animals , Chromatography, Liquid , Gastrointestinal Microbiome , Mice , Neural Networks, Computer , Tandem Mass Spectrometry
14.
Nat Commun ; 11(1): 6136, 2020 12 01.
Article in English | MEDLINE | ID: mdl-33262326

ABSTRACT

We present comboFM, a machine learning framework for predicting the responses of drug combinations in pre-clinical studies, such as those based on cell lines or patient-derived cells. comboFM models the cell context-specific drug interactions through higher-order tensors, and efficiently learns latent factors of the tensor using powerful factorization machines. The approach enables comboFM to leverage information from previous experiments performed on similar drugs and cells when predicting responses of new combinations in so far untested cells; thereby, it achieves highly accurate predictions despite sparsely populated data tensors. We demonstrate high predictive performance of comboFM in various prediction scenarios using data from cancer cell line pharmacogenomic screens. Subsequent experimental validation of a set of previously untested drug combinations further supports the practical and robust applicability of comboFM. For instance, we confirm a novel synergy between anaplastic lymphoma kinase (ALK) inhibitor crizotinib and proteasome inhibitor bortezomib in lymphoma cells. Overall, our results demonstrate that comboFM provides an effective means for systematic pre-screening of drug combinations to support precision oncology applications.


Subject(s)
Antineoplastic Agents/pharmacology , Machine Learning , Bortezomib/pharmacology , Cell Line, Tumor , Crizotinib/pharmacology , Drug Interactions , Humans , Lymphoma/drug therapy , Precision Medicine
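
The factorization machine idea behind the comboFM abstract above can be illustrated with a second-order model over a one-hot encoding of (drug A, drug B, cell line). This is a simplified sketch: the abstract describes higher-order tensors, whereas the code below stops at pairwise interactions. The function names, the encoding layout, and all parameter names are assumptions for illustration and are not taken from the comboFM code.

```python
def one_hot_triplet(drug_a, drug_b, cell, n_drugs, n_cells):
    """Hypothetical encoding: [drug A block | drug B block | cell-line block]."""
    x = [0.0] * (2 * n_drugs + n_cells)
    x[drug_a] = 1.0
    x[n_drugs + drug_b] = 1.0
    x[2 * n_drugs + cell] = 1.0
    return x


def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction.

    V[i] is the latent vector of feature i. The pairwise term uses the
    standard identity
        sum_{i<j} <V_i, V_j> x_i x_j
          = 0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i (V_if x_i)^2 ],
    which avoids the explicit double loop over feature pairs.
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise
```

Because every drug and cell line has its own latent vector, a combination that was never measured still receives a prediction from the inner products of vectors learned on other experiments, which is how such a model can cope with sparsely populated response tensors.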
15.
Comput Struct Biotechnol J ; 18: 3819-3832, 2020.
Article in English | MEDLINE | ID: mdl-33335681

ABSTRACT

While high-throughput drug screening offers possibilities to profile phenotypic responses of hundreds of compounds, elucidation of the cell context-specific mechanisms of drug action requires additional analyses. To that end, we developed a computational target deconvolution pipeline that identifies the key target dependencies based on collective drug response patterns in each cell line separately. The pipeline combines quantitative drug-cell line responses with drug-target interaction networks among both intended on- and potent off-targets to identify pharmaceutically actionable and selective therapeutic targets. To demonstrate its performance, the target deconvolution pipeline was applied to 310 small molecules tested on 20 genetically and phenotypically heterogeneous triple-negative breast cancer (TNBC) cell lines to identify cell line-specific target mechanisms in terms of cytotoxic and cytostatic drug target vulnerabilities. The functional essentiality of each protein target was quantified with a target addiction score (TAS), as a measure of dependency of the cell line on the therapeutic target. The target dependency profiling was shown to capture inhibitory information that is complementary to that obtained from the structure or sensitivity of the drugs. Comparison of the TAS profiles and gene essentiality scores from CRISPR-Cas9 knockout screens revealed that certain proteins with low gene essentiality showed high target addictions, suggesting that they might be functioning as protein groups, and therefore be resistant to single gene knock-out. The comparative analysis discovered protein groups of potential multi-target synthetic lethal interactions, for instance, among histone deacetylases (HDACs). Our integrated approach also recovered a number of well-established TNBC cell line-specific drivers and known TNBC therapeutic targets, such as HDACs and cyclin-dependent kinases (CDKs). The present work provides novel insights into druggable vulnerabilities for TNBC, and opportunities to identify multi-target synthetic lethal interactions for further studies.

16.
Appl Microbiol Biotechnol ; 104(24): 10515-10529, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33147349

ABSTRACT

In this work, deoxyribose-5-phosphate aldolase (Ec DERA, EC 4.1.2.4) from Escherichia coli was chosen as the protein engineering target for improving the substrate preference towards smaller, non-phosphorylated aldehyde donor substrates, in particular towards acetaldehyde. The initial broad set of mutations was directed to 24 amino acid positions in the active site or in the close vicinity, based on the 3D complex structure of the E. coli DERA wild-type aldolase. The specific activity of the DERA variants containing one to three amino acid mutations was characterised using three different substrates. A novel machine learning (ML) model utilising Gaussian processes and feature learning was applied for the 3rd mutagenesis round to predict new beneficial mutant combinations. This led to the most clear-cut (two- to threefold) improvement in acetaldehyde (C2) addition capability with the concomitant abolishment of the activity towards the natural donor molecule glyceraldehyde-3-phosphate (C3P) as well as the non-phosphorylated equivalent (C3). The Ec DERA variants were also tested on aldol reaction utilising formaldehyde (C1) as the donor. Ec DERA wild-type was shown to be able to carry out this reaction, and furthermore, some of the improved variants on acetaldehyde addition reaction turned out to have also improved activity on formaldehyde. KEY POINTS: • DERA aldolases are promiscuous enzymes. • Synthetic utility of DERA aldolase was improved by protein engineering approaches. • Machine learning methods aid the protein engineering of DERA.


Subject(s)
Escherichia coli , Fructose-Bisphosphate Aldolase , Aldehyde-Lyases/genetics , Aldehyde-Lyases/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Fructose-Bisphosphate Aldolase/genetics , Machine Learning , Protein Engineering , Substrate Specificity
17.
Bioinformatics ; 35(14): i548-i557, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510676

ABSTRACT

MOTIVATION: Metabolic flux balance analysis (FBA) is a standard tool in analyzing metabolic reaction rates compatible with measurements, steady-state and the metabolic reaction network stoichiometry. Flux analysis methods commonly place model assumptions on fluxes due to the convenience of formulating the problem as a linear programing model, while many methods do not consider the inherent uncertainty in flux estimates. RESULTS: We introduce a novel paradigm of Bayesian metabolic flux analysis that models the reactions of the whole genome-scale cellular system in probabilistic terms, and can infer the full flux vector distribution of genome-scale metabolic systems based on exchange and intracellular (e.g. 13C) flux measurements, steady-state assumptions, and objective function assumptions. The Bayesian model couples all fluxes jointly together in a simple truncated multivariate posterior distribution, which reveals informative flux couplings. Our model is a plug-in replacement to conventional metabolic balance methods, such as FBA. Our experiments indicate that we can characterize the genome-scale flux covariances, reveal flux couplings, and determine more intracellular unobserved fluxes in Clostridium acetobutylicum from 13C data than flux variability analysis. AVAILABILITY AND IMPLEMENTATION: The COBRA compatible software is available at github.com/markusheinonen/bamfa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Clostridium acetobutylicum , Metabolic Flux Analysis , Bayes Theorem , Metabolic Networks and Pathways , Models, Biological
18.
Metabolites ; 9(8)2019 Aug 01.
Article in English | MEDLINE | ID: mdl-31374904

ABSTRACT

In small molecule identification from tandem mass (MS/MS) spectra, input-output kernel regression (IOKR) currently provides the state-of-the-art combination of fast training and prediction and high identification rates. The IOKR approach can be simply understood as predicting a fingerprint vector from the MS/MS spectrum of the unknown molecule, and solving a pre-image problem to find the molecule with the most similar fingerprint. In this paper, we bring forward the following improvements to the IOKR framework: firstly, we formulate the IOKRreverse model that can be understood as mapping molecular structures into the MS/MS feature space and solving a pre-image problem to find the molecule whose predicted spectrum is the closest to the input MS/MS spectrum. Secondly, we introduce an approach to combine several IOKR and IOKRreverse models computed from different input and output kernels, called IOKRfusion. The method is based on minimizing structured Hinge loss of the combined model using a mini-batch stochastic subgradient optimization. Our experiments show a consistent improvement of top-k accuracy both in positive and negative ionization mode data.
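
The basic IOKR idea summarised in the abstract above (map the spectrum into the output feature space, then solve a pre-image problem over candidate molecules) can be sketched with kernel ridge regression. This is a minimal sketch under stated assumptions, not the IOKRreverse or IOKRfusion method: spectra are represented by hypothetical fixed-length feature vectors, the input kernel is an RBF kernel, the output kernel is a Tanimoto kernel on binary fingerprints, and the names `iokr_rank`, `lam` and `gamma` are chosen for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def tanimoto_kernel(A, B):
    """Tanimoto similarity between binary fingerprint rows of A and B."""
    inter = A @ B.T
    union = A.sum(1)[:, None] + B.sum(1)[None, :] - inter
    return inter / np.maximum(union, 1)  # guard against all-zero fingerprints


def iokr_rank(X_train, Y_train, x_query, Y_cand, lam=1e-3, gamma=1.0):
    """Rank candidate fingerprints for one query spectrum.

    score(y) = k_y(y, Y_train) @ (K_x + lam * I)^{-1} @ k_x(X_train, x_query)
    Because the Tanimoto kernel is normalised (k(y, y) = 1), ranking by this
    score matches ranking by distance to the predicted output feature vector.
    """
    K_x = rbf_kernel(X_train, X_train, gamma)
    k_q = rbf_kernel(X_train, x_query[None, :], gamma)[:, 0]
    alpha = np.linalg.solve(K_x + lam * np.eye(len(X_train)), k_q)
    scores = tanimoto_kernel(Y_cand, Y_train) @ alpha
    return np.argsort(-scores), scores
```

Training reduces to a single linear solve per query (or one matrix factorisation shared by all queries), which is consistent with the fast training and prediction that the abstract attributes to the approach.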

19.
Nat Methods ; 16(4): 299-302, 2019 04.
Article in English | MEDLINE | ID: mdl-30886413

ABSTRACT

Mass spectrometry is a predominant experimental technique in metabolomics and related fields, but metabolite structural elucidation remains highly challenging. We report SIRIUS 4 (https://bio.informatik.uni-jena.de/sirius/), which provides a fast computational approach for molecular structure identification. SIRIUS 4 integrates CSI:FingerID for searching in molecular structure databases. Using SIRIUS 4, we achieved identification rates of more than 70% on challenging metabolomics datasets.


Subject(s)
Metabolomics/methods , Molecular Structure , Signal Processing, Computer-Assisted , Tandem Mass Spectrometry/methods , Algorithms , Bayes Theorem , Biomarkers , Cluster Analysis , Computational Biology/methods , Computer Graphics , Databases, Factual , Electronic Data Processing , Internet , Isotopes , Likelihood Functions , Metabolome , Neural Networks, Computer , Programming Languages , User-Computer Interface
20.
Bioinformatics ; 34(17): i875-i883, 2018 09 01.
Article in English | MEDLINE | ID: mdl-30423079

ABSTRACT

Motivation: Liquid Chromatography (LC) followed by tandem Mass Spectrometry (MS/MS) is one of the predominant methods for metabolite identification. In recent years, machine learning has started to transform the analysis of tandem mass spectra and the identification of small molecules. In contrast, LC data is rarely used to improve metabolite identification, despite numerous published methods for retention time prediction using machine learning. Results: We present a machine learning method for predicting the retention order of molecules; that is, the order in which molecules elute from the LC column. Our method has important advantages over previous approaches: We show that retention order is much better conserved between instruments than retention time. To this end, our method can be trained using retention time measurements from different LC systems and configurations without tedious pre-processing, significantly increasing the amount of available training data. Our experiments demonstrate that retention order prediction is an effective way to learn retention behaviour of molecules from heterogeneous retention time data. Finally, we demonstrate how retention order prediction and MS/MS-based scores can be combined for more accurate metabolite identifications when analyzing a complete LC-MS/MS run. Availability and implementation: Implementation of the method is available at https://version.aalto.fi/gitlab/bache1/retention_order_prediction.git.


Subject(s)
Chromatography, Liquid/methods , Tandem Mass Spectrometry/methods
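
The point made in the abstract above, that retention order transfers between LC systems better than raw retention time, suggests turning heterogeneous retention-time tables into pairwise order constraints formed only within each system. The sketch below shows that preprocessing step alone; it is not the published implementation, and the function name and the (system, molecule, retention time) tuple layout are assumptions made for illustration.

```python
from itertools import combinations


def retention_order_pairs(measurements):
    """Build (system, earlier_molecule, later_molecule) training pairs.

    `measurements` is an iterable of (system_id, molecule_id, retention_time).
    Retention times are compared only within one LC system, so no alignment
    of time scales across systems is needed.
    """
    by_system = {}
    for system, molecule, rt in measurements:
        by_system.setdefault(system, []).append((molecule, rt))

    pairs = []
    for system, rows in by_system.items():
        for (m1, t1), (m2, t2) in combinations(rows, 2):
            if t1 == t2:
                continue  # ties carry no order information
            earlier, later = (m1, m2) if t1 < t2 else (m2, m1)
            pairs.append((system, earlier, later))
    return pairs
```

A pairwise ranking model can then be trained on the pooled pairs from all systems, since each pair states only which molecule elutes first and not when.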