Results 1 - 10 of 10
1.
BMC Bioinformatics ; 25(1): 174, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698340

ABSTRACT

BACKGROUND: In the last two decades, the use of high-throughput sequencing technologies has accelerated the pace of protein discovery. However, because rigorous experimental functional characterization is limited by time and resources, the functions of the vast majority of these proteins remain unknown. As a result, computational methods offering accurate, fast and large-scale assignment of functions to new and previously unannotated proteins are sought after. Leveraging the underlying associations between the multiplicity of features that describe proteins could reveal functional insights into the diverse roles of proteins and improve performance on the automatic function prediction task. RESULTS: We present GO-LTR, a multi-view multi-label prediction model that relies on a high-order tensor approximation of model weights combined with non-linear activation functions. The model is capable of learning high-order relationships between multiple input views representing the proteins and predicting high-dimensional multi-label output consisting of protein functional categories. We demonstrate the competitiveness of our method on various performance measures. Experiments show that GO-LTR learns polynomial combinations between different protein features, resulting in improved performance. Additional investigations establish GO-LTR's practical potential in assigning functions to proteins under diverse challenging scenarios: very low sequence similarity to previously observed sequences, and rarely observed and highly specific terms in the gene ontology. IMPLEMENTATION: The code and data used for training GO-LTR are available at https://github.com/aalto-ics-kepaco/GO-LTR-prediction.


Subject(s)
Computational Biology; Proteins; Proteins/chemistry; Proteins/metabolism; Computational Biology/methods; Databases, Protein; Algorithms
2.
PLoS Comput Biol ; 18(6): e1010177, 2022 06.
Article in English | MEDLINE | ID: mdl-35658018

ABSTRACT

Engineered microbial cells present a sustainable alternative to fossil-based synthesis of chemicals and fuels. Cellular synthesis routes are readily assembled and introduced into microbial strains using state-of-the-art synthetic biology tools. However, the optimization of the strains required to reach industrially feasible production levels is far less efficient. It typically relies on trial and error, leading to high uncertainty in total duration and cost. New techniques that can cope with the complexity and limited mechanistic knowledge of cellular regulation are called for to guide strain optimization. In this paper, we put forward a multi-agent reinforcement learning (MARL) approach that learns from experiments to tune the metabolic enzyme levels so that production is improved. Our method is model-free and does not assume prior knowledge of the microbe's metabolic network or its regulation. The multi-agent approach is well suited to making use of parallel experiments, such as the multi-well plates commonly used for screening microbial strains. We demonstrate the method's capabilities using the genome-scale kinetic model of Escherichia coli, k-ecoli457, as a surrogate for in vivo cell behaviour in cultivation experiments. We investigate aspects of the method's performance relevant to practical applicability in strain engineering, i.e., the speed of convergence towards the optimum response, noise tolerance, and the statistical stability of the solutions found. We further evaluate the proposed MARL approach in improving L-tryptophan production by the yeast Saccharomyces cerevisiae, using publicly available experimental data on the performance of a combinatorial strain library. Overall, our results show that multi-agent reinforcement learning is a promising approach for guiding strain optimization beyond mechanistic knowledge, with the goal of obtaining industrially attractive production levels faster and more reliably.


Subject(s)
Metabolic Engineering; Saccharomyces cerevisiae; Escherichia coli/genetics; Escherichia coli/metabolism; Metabolic Engineering/methods; Metabolic Networks and Pathways; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Synthetic Biology
3.
Bioinformatics ; 37(Suppl_1): i93-i101, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252952

ABSTRACT

MOTIVATION: Combination therapies have emerged as a powerful treatment modality to overcome drug resistance and improve treatment efficacy. However, the number of possible drug combinations increases very rapidly with the number of individual drugs under consideration, which makes comprehensive experimental screening infeasible in practice. Machine-learning models offer time- and cost-efficient means to aid this process by prioritizing the most effective drug combinations for further pre-clinical and clinical validation. However, the complexity of the underlying interaction patterns across multiple drug doses and in different cellular contexts poses challenges to the predictive modeling of drug combination effects. RESULTS: We introduce comboLTR, a highly time-efficient method for learning complex, non-linear target functions for describing the responses of therapeutic agent combinations in various doses and cancer cell contexts. The method is based on polynomial regression via powerful latent tensor reconstruction. It uses a combination of recommender system-style features indexing the data tensor of response values in different contexts, and chemical and multi-omics features as inputs. We demonstrate that comboLTR outperforms state-of-the-art methods in terms of predictive performance and running time, and produces highly accurate results even in the challenging and practical inference scenario where full dose-response matrices are predicted for completely new drug combinations with no available combination and monotherapy response measurements in any training cell line. AVAILABILITY AND IMPLEMENTATION: comboLTR code is available at https://github.com/aalto-ics-kepaco/ComboLTR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology; Neoplasms; Algorithms; Cell Line; Drug Combinations; Humans
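
The abstract of record 3 names polynomial regression via latent tensor reconstruction but does not spell out the functional form. The following is a minimal sketch, assuming the polynomial is written as a sum of rank-one terms, each a product of linear forms of the input. The names `ltr_predict`, `dense_predict` and `factors`, and all numbers, are illustrative and are not taken from the comboLTR code. It shows why the full weight tensor never has to be stored: the factored evaluation gives the same value as contracting the dense tensor with the input.

```python
from itertools import product


def ltr_predict(x, factors):
    """Evaluate sum_t prod_d <factors[t][d], x> without building the weight tensor.

    factors[t][d] is a weight vector with the same length as x
    (t indexes rank-one terms, d indexes the polynomial degree).
    """
    total = 0.0
    for term in factors:
        term_value = 1.0
        for p in term:
            term_value *= sum(pi * xi for pi, xi in zip(p, x))
        total += term_value
    return total


def dense_predict(x, factors):
    """Reference: compute every entry of the full weight tensor and contract it with x."""
    n, degree = len(x), len(factors[0])
    total = 0.0
    for idx in product(range(n), repeat=degree):
        weight = 0.0
        for term in factors:
            w = 1.0
            for d, i in enumerate(idx):
                w *= term[d][i]
            weight += w
        monomial = 1.0
        for i in idx:
            monomial *= x[i]
        total += weight * monomial
    return total


# Hypothetical input with 3 features and a rank-2, degree-2 model.
x = [1.0, 2.0, -1.0]
factors = [
    [[0.5, 0.0, 1.0], [1.0, 1.0, 0.0]],
    [[0.0, 2.0, 1.0], [1.0, -1.0, 0.5]],
]

fast_value = ltr_predict(x, factors)
dense_value = dense_predict(x, factors)
```

The factored form costs on the order of rank × degree × features operations, whereas the dense form grows as features raised to the degree, which is the practical motivation for a low-rank parameterization.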
4.
Bioinformatics ; 34(17): i875-i883, 2018 09 01.
Article in English | MEDLINE | ID: mdl-30423079

ABSTRACT

Motivation: Liquid Chromatography (LC) followed by tandem Mass Spectrometry (MS/MS) is one of the predominant methods for metabolite identification. In recent years, machine learning has started to transform the analysis of tandem mass spectra and the identification of small molecules. In contrast, LC data is rarely used to improve metabolite identification, despite numerous published methods for retention time prediction using machine learning. Results: We present a machine learning method for predicting the retention order of molecules; that is, the order in which molecules elute from the LC column. Our method has important advantages over previous approaches: We show that retention order is much better conserved between instruments than retention time. Consequently, our method can be trained using retention time measurements from different LC systems and configurations without tedious pre-processing, significantly increasing the amount of available training data. Our experiments demonstrate that retention order prediction is an effective way to learn the retention behaviour of molecules from heterogeneous retention time data. Finally, we demonstrate how retention order prediction and MS/MS-based scores can be combined for more accurate metabolite identifications when analyzing a complete LC-MS/MS run. Availability and implementation: Implementation of the method is available at https://version.aalto.fi/gitlab/bache1/retention_order_prediction.git.


Subject(s)
Chromatography, Liquid/methods; Tandem Mass Spectrometry/methods
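
The claim in record 4 that retention order is better conserved between instruments than retention time can be illustrated with a small sketch. The function name `pairwise_order_agreement`, the molecule names and the retention times below are all hypothetical and are not taken from the paper. The code only measures how often two LC systems agree on which molecule of a pair elutes first; it is not the authors' prediction method.

```python
from itertools import combinations


def pairwise_order_agreement(rt_a, rt_b):
    """Fraction of molecule pairs that elute in the same order in both runs.

    rt_a and rt_b map molecule name -> retention time; only shared molecules are compared.
    """
    shared = sorted(set(rt_a) & set(rt_b))
    agree = total = 0
    for m1, m2 in combinations(shared, 2):
        diff_a = rt_a[m1] - rt_a[m2]
        diff_b = rt_b[m1] - rt_b[m2]
        if diff_a == 0 or diff_b == 0:
            continue  # ties carry no order information
        total += 1
        if (diff_a > 0) == (diff_b > 0):
            agree += 1
    return agree / total if total else float("nan")


# Hypothetical retention times (minutes) on two LC systems with different gradients.
system_a = {"mol1": 1.2, "mol2": 3.4, "mol3": 5.0, "mol4": 7.9}
system_b = {"mol1": 4.0, "mol2": 9.5, "mol3": 12.1, "mol4": 20.3}

agreement = pairwise_order_agreement(system_a, system_b)
```

In this toy data the absolute times differ widely between the two systems, yet every pair elutes in the same order, which is the kind of invariance that lets order labels from heterogeneous datasets be pooled for training.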
5.
Bioinformatics ; 34(13): i509-i518, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949975

ABSTRACT

Motivation: Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind, and multiple kernel learning (MKL) in particular offers promising benefits, as it enables integrating various types of complex biomedical information sources in the form of kernels, along with learning their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making the existing MKL algorithms computationally infeasible even for a small number of input pairs. Results: We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, therefore making the method applicable to solving large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces. We show that pairwiseMKL provides accurate predictions using sparse solutions in terms of selected kernels, and therefore it also automatically identifies data sources relevant for the prediction problem. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Antineoplastic Agents/pharmacology; Computational Biology/methods; Drug Discovery/methods; Neoplasms/drug therapy; Support Vector Machine; Antineoplastic Agents/therapeutic use; Cell Line, Tumor; Humans; Neoplasms/enzymology; Neoplasms/metabolism; Protein Kinases/drug effects; Protein Kinases/metabolism; Signal Transduction; Software; Treatment Outcome
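
Record 5 states that pairwiseMKL avoids explicit computation of the massive pairwise matrices, but the abstract does not say how. One standard way to achieve this for Kronecker-product pairwise kernels is the identity (K_drug ⊗ K_cell) vec(A) = vec(K_drug A K_cellᵀ). The sketch below illustrates that identity only; it is an assumption that this is representative of the paper's approach, and the function names and toy matrices are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def kron_matvec(k_drug, k_cell, alpha):
    """Compute (k_drug ⊗ k_cell) · vec(alpha) without forming the Kronecker product.

    alpha is an n_drugs x n_cells matrix; the result is flattened row by row.
    """
    out = matmul(matmul(k_drug, alpha), transpose(k_cell))
    return [value for row in out for value in row]


def kron_matvec_explicit(k_drug, k_cell, alpha):
    """Reference: sum over every entry of the full pairwise kernel matrix."""
    n, m = len(k_drug), len(k_cell)
    flat = [value for row in alpha for value in row]
    result = []
    for i in range(n):
        for j in range(m):
            result.append(sum(k_drug[i][k] * k_cell[j][l] * flat[k * m + l]
                              for k in range(n) for l in range(m)))
    return result


# Hypothetical kernels: 2 drugs, 3 cell lines, so the pairwise kernel would be 6 x 6.
k_drug = [[2.0, 0.5], [0.5, 1.0]]
k_cell = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.3], [0.0, 0.3, 1.0]]
alpha = [[1.0, -1.0, 0.5], [0.0, 2.0, 1.0]]

fast = kron_matvec(k_drug, k_cell, alpha)
slow = kron_matvec_explicit(k_drug, k_cell, alpha)
```

With n drugs and m cell lines, the explicit route touches (nm)² kernel entries, while the factored route needs only two small matrix products, which is what makes large pairwise problems tractable in memory.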
6.
Nat Commun ; 11(1): 6136, 2020 12 01.
Article in English | MEDLINE | ID: mdl-33262326

ABSTRACT

We present comboFM, a machine learning framework for predicting the responses of drug combinations in pre-clinical studies, such as those based on cell lines or patient-derived cells. comboFM models the cell context-specific drug interactions through higher-order tensors, and efficiently learns latent factors of the tensor using powerful factorization machines. The approach enables comboFM to leverage information from previous experiments performed on similar drugs and cells when predicting responses of new combinations in so-far untested cells; thereby, it achieves highly accurate predictions despite sparsely populated data tensors. We demonstrate high predictive performance of comboFM in various prediction scenarios using data from cancer cell line pharmacogenomic screens. Subsequent experimental validation of a set of previously untested drug combinations further supports the practical and robust applicability of comboFM. For instance, we confirm a novel synergy between the anaplastic lymphoma kinase (ALK) inhibitor crizotinib and the proteasome inhibitor bortezomib in lymphoma cells. Overall, our results demonstrate that comboFM provides an effective means for systematic pre-screening of drug combinations to support precision oncology applications.


Subject(s)
Antineoplastic Agents/pharmacology; Machine Learning; Bortezomib/pharmacology; Cell Line, Tumor; Crizotinib/pharmacology; Drug Interactions; Humans; Lymphoma/drug therapy; Precision Medicine
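
Record 6 relies on factorization machines to learn latent factors. As a minimal sketch of the general technique, the code below evaluates a standard second-order factorization machine using the well-known linear-time reformulation of the pairwise interaction term. The abstract refers to higher-order tensors, so this second-order version is a deliberate simplification, and the feature layout, parameter values and function names are hypothetical rather than taken from comboFM.

```python
def fm_predict(x, w0, w, v):
    """Second-order factorization machine: w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j.

    Uses the identity
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    n_factors = len(v[0])
    interaction = 0.0
    for f in range(n_factors):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        interaction += 0.5 * (s * s - s_sq)
    return linear + interaction


def fm_predict_naive(x, w0, w, v):
    """Reference: explicit double loop over all feature pairs i < j."""
    n = len(x)
    total = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total


# Hypothetical one-hot features: [drugA, drugB, drugC, cell1, cell2].
# This row encodes the combination drugA + drugB tested in cell2.
x = [1.0, 1.0, 0.0, 0.0, 1.0]
w0 = 0.1
w = [0.2, -0.1, 0.0, 0.3, 0.05]
v = [[0.5, -0.2], [0.1, 0.4], [0.3, 0.3], [-0.6, 0.2], [0.2, 0.7]]

fast = fm_predict(x, w0, w, v)
naive = fm_predict_naive(x, w0, w, v)
```

Because each feature carries its own latent vector, an interaction weight exists even for a drug pair and cell line never observed together, which is how such models can generalize across sparsely populated data tensors.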
7.
Front Comput Neurosci ; 9: 104, 2015.
Article in English | MEDLINE | ID: mdl-26321941

ABSTRACT

This paper investigates how utilizing diversity priors can discover early visual features that resemble their biological counterparts. The study is mainly motivated by the sparsity and selectivity of activations of visual neurons in area V1. Most previous work on computational modeling emphasizes selectivity or sparsity independently. However, we argue that selectivity and sparsity are just two epiphenomena of the diversity of receptive fields, which has rarely been exploited in learning. In this paper, to verify our hypothesis, restricted Boltzmann machines (RBMs) are employed to learn early visual features by modeling the statistics of natural images. Considering RBMs as neural networks, the receptive fields of neurons are formed by the inter-weights between hidden and visible nodes. Due to the conditional independence in RBMs, there is no mechanism to coordinate the activations of individual neurons or the whole population. A diversity prior is introduced in this paper for training RBMs. We find that the diversity prior can indeed ensure sparsity and selectivity of neuron activations simultaneously. The learned receptive fields show a high degree of biological similarity in comparison to physiological data. In addition, the corresponding visual features display a good generative capability in image reconstruction.

8.
J Psychosom Res ; 54(6): 549-57, 2003 Jun.
Article in English | MEDLINE | ID: mdl-12781309

ABSTRACT

OBJECTIVE: To evaluate the prevalence of depressive symptoms in patients with different kinds of allergic diseases and the connection of depressive symptoms with the severity, type and seasonality of allergic complaints. METHODS: Data were obtained via a cross-sectional multicenter questionnaire survey of 528 patients aged 16-60 years attending six regional in- and outpatient allergy clinics in Hungary in June to July 1998. Consecutive patients completed a structured, self-administered questionnaire containing questions about their current and past allergic complaints. Depressive symptoms were measured by the Shortened Beck Depression Inventory (BDI). RESULTS: 32.2% of patients scored above the normal level (≥10) and 12.5% had clinically significant depressive symptomatology (≥19) by the BDI. These rates were significantly higher than those found in the control group from a national representative population sample (22.4% and 8.3%). Patients with asthma and with perennial symptoms had significantly higher depression scores than patients with other types of allergic diseases. There was a significant association between the severity of depressive symptoms and the severity of allergic complaints, independent of age, sex, type and seasonality of the allergic disease, and other current physical illnesses and symptoms, as tested by the General Linear Model (GLM). CONCLUSIONS: Our results draw attention to the fact that, in any type of allergic disease, patients with even mild depressive symptoms have significantly more severe allergic complaints and assess their general health state as much worse than those without depressive symptoms. Diagnosis and treatment of depressive symptoms in allergic patients is of great concern from both a clinical and an economic point of view.


Subject(s)
Depression/psychology; Hypersensitivity/psychology; Adult; Affect; Female; Health Status; Humans; Male; Middle Aged; Psychiatric Status Rating Scales; Seasons; Severity of Illness Index
9.
BMC Proc ; 2 Suppl 4: S2, 2008 Dec 17.
Article in English | MEDLINE | ID: mdl-19091049

ABSTRACT

BACKGROUND: In this paper we describe work in progress in developing kernel methods for enzyme function prediction. Our focus is on developing so-called structured output prediction methods, where the enzymatic reaction is the combinatorial target object for prediction. We compared two structured output prediction methods, the Hierarchical Max-Margin Markov algorithm (HM3) and the Maximum Margin Regression algorithm (MMR), in hierarchical classification of enzyme function. As sequence features we use various string kernels and the GTG feature set derived from the global alignment trace graph of protein sequences. RESULTS: In our experiments, in predicting enzyme EC classification we obtain over 85% accuracy (predicting the four-digit EC code) and over 91% microlabel F1 score (predicting individual EC digits). In predicting the Gold Standard enzyme families, we obtain over 79% accuracy (predicting family correctly) and over 89% microlabel F1 score (predicting superfamilies and families). In the latter case, structured output methods are significantly more accurate than a nearest neighbor classifier. A polynomial kernel over the GTG feature set turned out to be a prerequisite for accurate function prediction. Combining GTG with string kernels boosted accuracy slightly in the case of EC class prediction. CONCLUSION: Structured output prediction with GTG features is shown to be computationally feasible and to have accuracy on par with state-of-the-art approaches in enzyme function prediction.
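
Record 9 reports two kinds of score: accuracy on the full four-digit EC code and a microlabel score on individual EC digits. The sketch below shows why the second is always at least as high as the first, using position-wise digit agreement as a simplified stand-in for the microlabel F1 score, whose exact definition is not given in the abstract. The function names, EC codes and predictions are hypothetical examples, not data from the paper.

```python
def exact_match_accuracy(true_codes, pred_codes):
    """Fraction of enzymes whose complete EC code is predicted correctly."""
    hits = sum(t == p for t, p in zip(true_codes, pred_codes))
    return hits / len(true_codes)


def per_digit_agreement(true_codes, pred_codes):
    """Fraction of individual EC digits predicted correctly, compared position by position."""
    correct = total = 0
    for t, p in zip(true_codes, pred_codes):
        for t_digit, p_digit in zip(t.split("."), p.split(".")):
            total += 1
            correct += t_digit == p_digit
    return correct / total


# Hypothetical true and predicted four-digit EC codes for three enzymes.
true_codes = ["1.1.1.1", "2.7.11.1", "3.4.21.4"]
pred_codes = ["1.1.1.2", "2.7.11.1", "3.4.24.4"]

code_accuracy = exact_match_accuracy(true_codes, pred_codes)
digit_agreement = per_digit_agreement(true_codes, pred_codes)
```

A prediction that misses only one level of the hierarchy counts as a complete failure for code-level accuracy but still earns partial credit at the digit level, which is consistent with the digit-level scores in the abstract being higher than the code-level ones.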

10.
Neural Comput ; 16(12): 2639-64, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15516276

ABSTRACT

We present a general method using kernel canonical correlation analysis to learn a semantic representation of web images and their associated text. The semantic space provides a common representation and enables a comparison between the text and images. In the experiments, we look at two approaches to retrieving images based only on their content from a text query. We compare orthogonalization approaches against a standard cross-representation retrieval technique known as the generalized vector space model.
