Sequence-based prediction model of protein crystallization propensity using machine learning and two-level feature selection.

Le, Nguyen Quoc Khanh; Li, Wanru; Cao, Yanshuang

Affiliation

Le NQK; Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan.
Li W; AIBioMed Research Group, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan.
Cao Y; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan.

Brief Bioinform; 24(5), 2023 Sep 20.

Article in English | MEDLINE | ID: mdl-37649385

ABSTRACT

Protein crystallization is crucial for biology, but the steps involved are complex and demanding in terms of external factors and internal structure. To save experimental cost and time, the tendency of a protein to crystallize can first be estimated and screened by computational modeling. This study therefore created a new pipeline that uses the protein sequence to predict crystallization propensity at three stages: protein material production, purification, and crystal production. The pipeline proposes a new feature selection method that combines Chi-square ($\chi^2$) and recursive feature elimination together with the 12 selected features, followed by linear discriminant analysis for dimensionality reduction; finally, a support vector machine with hyperparameter tuning and 10-fold cross-validation is used to train the model and test the results. The pipeline was tested on three different datasets, and its accuracy was higher than that of existing pipelines. In conclusion, our model provides a new solution for predicting multistage protein crystallization propensity, which is a major challenge in computational biology.
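
The pipeline described in the abstract (Chi-square filtering, recursive feature elimination, linear discriminant analysis, then a tuned support vector machine scored by 10-fold cross-validation) can be sketched with scikit-learn as follows. This is only an illustrative sketch, not the authors' implementation. The synthetic data standing in for sequence-derived features, the intermediate cut-off of 30 features, the logistic-regression ranker inside RFE, the reading of "12 selected features" as the number kept after RFE, and the hyperparameter grid are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumption: synthetic data in place of real sequence-derived descriptors
# (rows = proteins, columns = features, y = crystallizable or not).
X, y = make_classification(
    n_samples=300, n_features=60, n_informative=10, random_state=0
)

pipeline = Pipeline([
    # chi2 requires non-negative inputs, so features are scaled to [0, 1].
    ("scale", MinMaxScaler()),
    # Level 1: keep the 30 features with the highest chi-square score
    # (30 is an arbitrary cut-off chosen for this sketch).
    ("chi2", SelectKBest(score_func=chi2, k=30)),
    # Level 2: recursive feature elimination down to 12 features,
    # ranked by a logistic-regression model (the ranker is an assumption).
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=12)),
    # LDA as dimensionality reduction; a binary task allows one component.
    ("lda", LinearDiscriminantAnalysis(n_components=1)),
    ("svm", SVC(kernel="rbf")),
])

# Hypothetical grid; the abstract does not state which values were searched.
param_grid = {"svm__C": [0.1, 1, 10], "svm__gamma": ["scale", 0.1]}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("mean 10-fold accuracy:", round(search.best_score_, 3))
```

Placing both selection steps inside the `Pipeline` means they are refitted on the training part of every fold, so the cross-validated accuracy is not inflated by features chosen with knowledge of the held-out proteins.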

Subject(s)

Algorithms; Machine Learning; Crystallization; Amino Acid Sequence; Computational Biology

Keywords

crystallization; feature selection; machine learning; prediction model; protein sequence; support vector machine

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Algorithms / Machine Learning | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Year: 2023 | Document type: Article | Country of affiliation: Taiwan
