Pesquisa | Biblioteca Virtual em Saúde

PERISCOPE-Opt: Machine learning-based prediction of optimal fermentation conditions and yields of recombinant periplasmic protein expressed in Escherichia coli.

Packiam, Kulandai Arockia Rajesh; Ooi, Chien Wei; Li, Fuyi; Mei, Shutao; Tey, Beng Ti; Ong, Huey Fang; Song, Jiangning; Ramanan, Ramakrishnan Nagasundara.

Comput Struct Biotechnol J ; 20: 2909-2920, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35765650

RESUMO

Optimization of the fermentation process for recombinant protein production (RPP) is often resource-intensive. Machine learning (ML) approaches are helpful in minimizing the experimentations and find vast applications in RPP. However, these ML-based tools primarily focus on features with respect to amino-acid-sequence, ruling out the influence of fermentation process conditions. The present study combines the features derived from fermentation process conditions with that from amino acid-sequence to construct an ML-based model that predicts the maximal protein yields and the corresponding fermentation conditions for the expression of target recombinant protein in the Escherichia coli periplasm. Two sets of XGBoost classifiers were employed in the first stage to classify the expression levels of the target protein as high (>50 mg/L), medium (between 0.5 and 50 mg/L), or low (<0.5 mg/L). The second-stage framework consisted of three regression models involving support vector machines and random forest to predict the expression yields corresponding to each expression-level-class. Independent tests showed that the predictor achieved an overall average accuracy of 75% and a Pearson coefficient correlation of 0.91 for the correctly classified instances. Therefore, our model offers a reliable substitution of numerous trial-and-error experiments to identify the optimal fermentation conditions and yield for RPP. It is also implemented as an open-access webserver, PERISCOPE-Opt (http://periscope-opt.erc.monash.edu).

Methodology for the identification of relevant loci for milk traits in dairy cattle, using machine learning algorithms.

Raschia, María Agustina; Ríos, Pablo Javier; Maizon, Daniel Omar; Demitrio, Daniel; Poli, Mario Andrés.

MethodsX ; 9: 101733, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35637693

RESUMO

Machine learning methods were considered efficient in identifying single nucleotide polymorphisms (SNP) underlying a trait of interest. This study aimed to construct predictive models using machine learning algorithms, to identify loci that best explain the variance in milk traits of dairy cattle. Further objectives involved validating the results by comparison with reported relevant regions and retrieving the pathways overrepresented by the genes flanking relevant SNPs. Regression models using XGBoost (XGB), LightGBM (LGB), and Random Forest (RF) algorithms were trained using estimated breeding values for milk production (EBVM), milk fat content (EBVF) and milk protein content (EBVP) as phenotypes and genotypes on 40417 SNPs as predictor variables. To evaluate their efficiency, metrics for actual vs. predicted values were determined in validation folds (XGB and LGB) and out-of-bag data (RF). Less than 4500 relevant SNPs were retrieved for each trait. Among the genes flanking them, signaling and transmembrane transporter activities were overrepresented. The models trained:â¢Predicted breeding values for animals not included in the dataset.â¢Were efficient in identifying a subset of SNPs explaining phenotypic variation. The results obtained using XGB and LGB algorithms agreed with previous results. Therefore, the method proposed could be applied for future association studies on milk traits.

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA