Search | VHL Regional Portal

Non-linearity of Metabolic Pathways Critically Influences the Choice of Machine Learning Model.

Lo-Thong-Viramoutou, Ophélie; Charton, Philippe; Cadet, Xavier F; Grondin-Perez, Brigitte; Saavedra, Emma; Damour, Cédric; Cadet, Frédéric.

Front Artif Intell ; 5: 744755, 2022.

Article in English | MEDLINE | ID: mdl-35757298

ABSTRACT

The use of machine learning (ML) in life sciences has gained wide interest over the past years, as it speeds up the development of high-performing models. Important modeling tools in biology, such as mechanistic models and metabolic networks, have proven their worth for pathway design, as they allow better understanding of the mechanisms involved in the functioning of organisms. However, little has been done on the use of ML to model metabolic pathways, and the degree of non-linearity associated with them is not clear. Here, we report the construction of different metabolic pathways with several linear and non-linear ML models. Different types of data are used; they lead to the prediction of important biological data, such as pathway flux and final product concentration. A comparison reveals that the data features impact model performance and highlights the effectiveness of non-linear models (e.g., QRF: RMSE = 0.021 nmol·min⁻¹ and R² = 1 vs. Bayesian GLM: RMSE = 1.379 nmol·min⁻¹ and R² = 0.823). It turns out that the greater the degree of non-linearity of the pathway, the better suited a non-linear model will be. Therefore, decision-making support for pathway modeling is established. These findings generally support the hypothesis that non-linear aspects predominate within the metabolic pathways. This must be taken into account when devising possible applications of these pathways for the identification of biomarkers of diseases (e.g., infections, cancer, neurodegenerative diseases) or the optimization of industrial production processes.
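As a rough illustration of the kind of comparison reported in this abstract, the sketch below fits a linear model and a simple non-linear model to a saturating rate law and scores both with RMSE and R². Everything in it is an assumption made for illustration: the Michaelis-Menten curve, its parameters, the sampling grid, and the k-nearest-neighbour regressor standing in for a non-linear learner. It does not reproduce the QRF or Bayesian GLM models of the study.

```python
import math

def michaelis_menten(s, vmax=10.0, km=1.0):
    """Hypothetical saturating rate law used as a non-linear target."""
    return vmax * s / (km + s)

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def knn_predict(xs, ys, x, k=2):
    """Average of the k training targets whose inputs are closest to x."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Training grid and interleaved held-out points (arbitrary units, synthetic data).
train_s = [0.25 * i for i in range(41)]
train_v = [michaelis_menten(s) for s in train_s]
test_s = [0.25 * i + 0.125 for i in range(40)]
test_v = [michaelis_menten(s) for s in test_s]

slope, intercept = fit_line(train_s, train_v)
lin_pred = [slope * s + intercept for s in test_s]
knn_pred = [knn_predict(train_s, train_v, s) for s in test_s]

print(f"linear: RMSE={rmse(test_v, lin_pred):.3f}, R2={r2(test_v, lin_pred):.3f}")
print(f"k-NN:   RMSE={rmse(test_v, knn_pred):.3f}, R2={r2(test_v, knn_pred):.3f}")
```

On this synthetic curve the non-linear stand-in scores a lower RMSE and a higher R² than the straight line, which is the qualitative pattern the abstract describes for strongly non-linear pathways.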

Learning Strategies in Protein Directed Evolution.

Cadet, Xavier F; Gelly, Jean Christophe; van Noord, Aster; Cadet, Frédéric; Acevedo-Rocha, Carlos G.

Methods Mol Biol ; 2461: 225-275, 2022.

Article in English | MEDLINE | ID: mdl-35727454

ABSTRACT

Synthetic biology is a fast-evolving research field that combines biology and engineering principles to develop new biological systems for medical, pharmacological, and industrial applications. Synthetic biologists use iterative "design, build, test, and learn" cycles to efficiently engineer genetic systems that are reliable, reproducible, and predictable. Protein engineering by directed evolution can benefit from such a systematic engineering approach for various reasons. Learning can be carried out before starting, throughout, or after finalizing a directed evolution project. Computational tools, bioinformatics, and scanning mutagenesis methods can be excellent starting points, while molecular dynamics simulations and other strategies can guide engineering efforts. Similarly, studying protein intermediates along evolutionary pathways offers fascinating insights into the molecular mechanisms shaped by evolution. The learning step of the cycle is not only crucial for proteins or enzymes that are not suitable for high-throughput screening or selection systems, but it is also valuable for any platform that can generate a large amount of data that can be aided by machine learning algorithms. The main challenge in protein engineering is to predict the effect of a single mutation on one functional parameter, to say nothing of several mutations on multiple parameters. This is largely due to nonadditive mutational interactions, known as epistatic effects: mutations that are beneficial in one genetic background may not be beneficial in another. In this work, we provide an overview of experimental and computational strategies that can guide the user to learn protein function at different stages in a directed evolution project. We also discuss how epistatic effects can influence the success of directed evolution projects. Since machine learning is gaining momentum in protein engineering and the field is becoming more interdisciplinary thanks to collaboration between mathematicians, computational scientists, engineers, molecular biologists, and chemists, we provide a general workflow that familiarizes nonexperts with the basic concepts, dataset requirements, learning approaches, model capabilities and performance metrics of this intriguing area. Finally, we also provide some practical recommendations on how machine learning can harness epistatic effects for engineering proteins in an "outside-the-box" way.
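The nonadditivity mentioned in this abstract can be made concrete with a small sketch. It assumes fitness is measured on an additive scale (for example a log-transformed activity), and the function name and the numbers are hypothetical values chosen for illustration, not data from the chapter.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation.

    Under pure additivity the double mutant would score
    f_wt + (f_a - f_wt) + (f_b - f_wt); any difference is epistasis.
    """
    expected = f_wt + (f_a - f_wt) + (f_b - f_wt)
    return f_ab - expected

# Hypothetical fitness values on an additive scale (assumed, not measured).
wild_type, mutant_a, mutant_b, double_mutant = 1.0, 1.5, 1.25, 1.25

epsilon = pairwise_epistasis(wild_type, mutant_a, mutant_b, double_mutant)
print(epsilon)  # negative: the two beneficial mutations combine worse than additively
```

A value of zero means the two mutations act independently. A non-zero value means the effect of one mutation depends on whether the other is present, which is why single-mutation data alone can mislead a purely additive model.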

Subject(s)

Directed Molecular Evolution; Protein Engineering; Directed Molecular Evolution/methods; Protein Engineering/methods; Proteins/genetics; Synthetic Biology

Correction to: Learning Strategies in Protein Directed Evolution.

Cadet, Xavier F; Gelly, Jean Christophe; van Noord, Aster; Cadet, Frédéric; Acevedo-Rocha, Carlos G.

Methods Mol Biol ; 2461: C1, 2022.

Article in English | MEDLINE | ID: mdl-37062797

Machine Learning Enables Selection of Epistatic Enzyme Mutants for Stability Against Unfolding and Detrimental Aggregation.

Li, Guangyue; Qin, Youcai; Fontaine, Nicolas T; Ng Fuk Chong, Matthieu; Maria-Solano, Miguel A; Feixas, Ferran; Cadet, Xavier F; Pandjaitan, Rudy; Garcia-Borràs, Marc; Cadet, Frederic; Reetz, Manfred T.

Chembiochem ; 22(5): 904-914, 2021 03 02.

Article in English | MEDLINE | ID: mdl-33094545

ABSTRACT

Machine learning (ML) has pervaded most areas of protein engineering, including stability and stereoselectivity. Using limonene epoxide hydrolase as the model enzyme and innov'SAR as the ML platform, which comprises digital signal processing, we achieved high protein robustness that can resist unfolding with concomitant detrimental aggregation. The Fourier transform (FT) allows us to take into account the order of the protein sequence and the nonlinear interactions between positions, and thus to grasp epistatic phenomena. The innov'SAR approach is interpolative and extrapolative, and it makes outside-the-box predictions not found in other state-of-the-art ML or deep learning approaches. Equally significant is the finding that our approach to ML in the present context, flanked by advanced molecular dynamics simulations, uncovers the connection between epistatic mutational interactions and protein robustness.

Subject(s)

Epoxide Hydrolases/chemistry; Epoxide Hydrolases/metabolism; Machine Learning; Mutation; Protein Folding; Protein Multimerization; Rhodococcus/enzymology; Epoxide Hydrolases/genetics; Limonene/chemistry; Limonene/metabolism; Molecular Dynamics Simulation; Protein Engineering

Identification of flux checkpoints in a metabolic pathway through white-box, grey-box and black-box modeling approaches.

Lo-Thong, Ophélie; Charton, Philippe; Cadet, Xavier F; Grondin-Perez, Brigitte; Saavedra, Emma; Damour, Cédric; Cadet, Frédéric.

Sci Rep ; 10(1): 13446, 2020 08 10.

Article in English | MEDLINE | ID: mdl-32778715

ABSTRACT

Metabolic pathway modeling plays an increasing role in drug design by allowing better understanding of the underlying regulation and control networks in the metabolism of living organisms. However, despite rapid progress in this area, pathway modeling can become a real nightmare for researchers, notably when few experimental data are available or when the pathway is highly complex. Here, three different approaches were developed to model the second part of glycolysis of E. histolytica as an application example, and all succeeded in predicting the final pathway flux: one including detailed kinetic information (white-box), another with an added adjustment term (grey-box) and the last one using an artificial neural network method (black-box). Afterwards, each model was used for metabolic control analysis and flux control coefficient determination. The first two enzymes of this pathway are identified as the key enzymes playing a role in flux control. This study revealed the significance of the three methods for building suitable models adjusted to the available data in the field of metabolic pathway modeling, and could be useful to biologists and modelers.
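For readers unfamiliar with flux control coefficients, the sketch below estimates them numerically as C = (dJ/dE)·(E/J) by perturbing each enzyme level in a model and observing the change in steady-state flux. The two-step pathway, its rate constants, and the function names are assumptions chosen so the steady state has a closed form. It is a toy model, not the E. histolytica glycolysis model of the study.

```python
def steady_state_flux(e1, e2, s=1.0, k1=2.0, k1r=1.0, k2=3.0):
    """Steady-state flux of a toy pathway S <-> X -> P.

    Assumed rate laws: v1 = e1 * (k1*s - k1r*x) and v2 = e2 * k2 * x.
    Setting v1 == v2 gives the intermediate concentration x.
    """
    x = e1 * k1 * s / (e1 * k1r + e2 * k2)
    return e2 * k2 * x

def flux_control_coefficient(flux, enzymes, index, rel_step=1e-4):
    """Central-difference estimate of C = (dJ/dE_i) * (E_i / J)."""
    up, down = list(enzymes), list(enzymes)
    up[index] *= 1.0 + rel_step
    down[index] *= 1.0 - rel_step
    j = flux(*enzymes)
    return (flux(*up) - flux(*down)) / (2.0 * rel_step * j)

enzymes = [1.0, 1.0]  # hypothetical relative enzyme levels
c1 = flux_control_coefficient(steady_state_flux, enzymes, 0)
c2 = flux_control_coefficient(steady_state_flux, enzymes, 1)
print(f"C1={c1:.3f}, C2={c2:.3f}, sum={c1 + c2:.3f}")
```

In this toy pathway the coefficients sum to one, as the summation theorem of metabolic control analysis requires. The enzyme with the larger coefficient is the one whose level most strongly limits the flux. The same perturbation procedure can be applied to any model that returns a flux, whether white-box, grey-box, or black-box.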

Subject(s)

Glycolysis/physiology; Metabolic Networks and Pathways/physiology; Computer Simulation; Entamoeba histolytica/metabolism; Kinetics; Models, Biological; Models, Theoretical; Physical Phenomena

Use of Machine Learning and Infrared Spectra for Rheological Characterization and Application to the Apricot.

Cadet, Xavier F; Lo-Thong, Ophélie; Bureau, Sylvie; Dehak, Reda; Bessafi, Miloud.

Sci Rep ; 9(1): 19197, 2019 12 16.

Article in English | MEDLINE | ID: mdl-31844151

ABSTRACT

Fast advancement of machine learning methods and constant growth of the areas of application open up new horizons for large data management and processing. Among the various types of data available for analysis, Fourier Transform InfraRed (FTIR) spectroscopy spectra are very challenging datasets to consider. In this study, machine learning is used to analyze and predict a rheological parameter: firmness. Various statistics have been gathered, including both chemistry (such as ethylene, titratable acidity or sugars) and spectra values, to visualize and analyze a dataset of 731 biological samples. Two-dimensional (2D) and three-dimensional (3D) principal component analyses (PCA) are used to evaluate their ability to discriminate for one parameter: firmness. Partial least squares regression (PLSR) modeling has been carried out to predict the rheological parameter using either sixteen physicochemical parameters or only the infrared spectra. We show that (i) the spectra alone allow good discrimination of the samples based on rheology, (ii) 3D-PCA allows comprehensive and informative visualization of the data, and (iii) the rheological parameters are predicted accurately using a regression method such as PLSR; instead of using chemical parameters, which are laborious to obtain, Mid-FTIR spectra gathering all physicochemical information could be used for efficient prediction of firmness. In conclusion, rheological and chemical parameters allow good discrimination of the samples according to their firmness. However, using only the IR spectra leads to better results. A good predictive model was built for the prediction of the firmness of the fruit, and we reached a coefficient of determination (R²) of 0.90. This method outperforms a model based on physicochemical descriptors only. Such an approach could be very helpful to technologists and farmers.

Subject(s)

Fruit/chemistry; Prunus armeniaca/chemistry; Fourier Analysis; Least-Squares Analysis; Machine Learning; Principal Component Analysis/methods; Rheology/methods; Spectroscopy, Fourier Transform Infrared/methods

Novel Descriptors and Digital Signal Processing-Based Method for Protein Sequence Activity Relationship Study.

Fontaine, Nicolas T; Cadet, Xavier F; Vetrivel, Iyanar.

Int J Mol Sci ; 20(22), 2019 Nov 11.

Article in English | MEDLINE | ID: mdl-31718061

ABSTRACT

Work aiming to unravel the correlation between protein sequence and function in the absence of structural information can be highly rewarding. We present a new way of considering descriptors from the amino acid index database for modeling and predicting the fitness value of a polypeptide chain. This approach includes the following steps: (i) calculating Q elementary numerical sequences (Ele_SEQ) depending on the encoding of the amino acid residues, (ii) determining an extended numerical sequence (Ext_SEQ) by concatenating the Q elementary numerical sequences, wherein at least one elementary numerical sequence is a protein spectrum obtained by applying fast Fourier transformation (FFT), and (iii) predicting a value of fitness for polypeptide variants (train and/or validation set). These new descriptors were tested on four sets of proteins of different lengths (GLP-2, TNF alpha, cytochrome P450, and epoxide hydrolase) and activities (cAMP activation, binding affinity, thermostability and enantioselectivity). We show that the use of multiple physicochemical descriptors coupled with the implementation of the FFT, taking into account the interactions between amino acid residues within the protein sequence, could lead to very significant improvement in the quality of models and predictions. The choice of the descriptor or of the combination of descriptors and/or FFT depends on the protein/fitness pair. This approach can provide potential users with value added to existing mutant libraries where screening efforts have so far been unsuccessful in finding improved polypeptide mutants for useful applications.
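Steps (i) and (ii) of this abstract can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it uses a single index (Kyte-Doolittle hydropathy) as the encoding, takes Q = 2 elementary sequences (the raw encoding and its Fourier magnitude spectrum), and computes the spectrum with a naive discrete Fourier transform, which yields the same values an FFT would. The function names and the peptide are hypothetical, and any padding or normalization used in the paper is not reproduced.

```python
import cmath

# Kyte-Doolittle hydropathy values, used here as one possible physicochemical index.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def encode(sequence, index=HYDROPATHY):
    """Elementary numerical sequence: one index value per residue."""
    return [index[residue] for residue in sequence]

def magnitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) DFT)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]

def extended_sequence(sequence):
    """Concatenate the raw encoding and its spectrum (Q = 2 in this sketch)."""
    ele_raw = encode(sequence)
    ele_spectrum = magnitude_spectrum(ele_raw)
    return ele_raw + ele_spectrum

variant = "ACDEFGHIKL"  # hypothetical peptide, for illustration only
ext_seq = extended_sequence(variant)
print(len(ext_seq))  # twice the peptide length
```

The resulting vector would then serve as the feature input to a regression model for step (iii). Because each spectral component sums contributions from every position, a substitution at one residue changes many features at once, which is one way such an encoding can reflect interactions between positions.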

Subject(s)

Machine Learning; Quantitative Structure-Activity Relationship; Sequence Analysis, Protein/methods; Animals; Catalytic Domain; Cytochrome P-450 Enzyme System/chemistry; Cytochrome P-450 Enzyme System/metabolism; Epoxide Hydrolases/chemistry; Epoxide Hydrolases/metabolism; Glucagon-Like Peptide-2 Receptor/chemistry; Glucagon-Like Peptide-2 Receptor/metabolism; Humans; Tumor Necrosis Factor-alpha/chemistry; Tumor Necrosis Factor-alpha/metabolism
