MLysPRED: graph-based multi-view clustering and multi-dimensional normal distribution resampling techniques to predict multiple lysine sites.

Zuo, Yun; Hong, Yue; Zeng, Xiangxiang; Zhang, Qiang; Liu, Xiangrong

Affiliation

Zuo Y; Department of Computer Science, Xiamen University, Xiamen 361005, China.
Hong Y; Department of Computer Science, Xiamen University, Xiamen 361005, China.
Zeng X; School of Information Science and Engineering, Hunan University, Changsha, China.
Zhang Q; School of Computer Science and Technology, Dalian University of Technology (DLUT), China.
Liu X; Department of Computer Science, Xiamen University, Xiamen 361005, China.

Brief Bioinform; 23(5), 2022 Sep 20.

Article in English | MEDLINE | ID: mdl-35953081

ABSTRACT

Post-translational modification of lysine residues (K-PTM) is one of the most common types of PTM. Some lysine residues in proteins can be covalently modified in succession or in cascades, for example by acetylation, crotonylation, methylation and succinylation. These covalent lysine modifications may have special functions of interest to both basic research and drug development. Although many computational methods have been developed to predict lysine PTMs, to date every K-PTM prediction method has been built to model and learn a single class of K-PTM. This study aims to fill that gap by building a multi-label computational model that can directly predict multiple K-PTMs in proteins. The proposed multi-label prediction model, MLysPRED, identifies multiple lysine modification sites using features generated from human protein sequences. In MLysPRED, three multi-label sequence encoding algorithms (MLDBPB, MLPSDAAP and MLPSTAAP) are proposed and combined with three encoding strategies (CHHAA, DR and Kmer) to convert the preprocessed lysine sequences into effective numerical features. A multidimensional normal distribution oversampling technique and a graph-based multi-view clustering undersampling algorithm, both proposed here for the first time, are incorporated to adjust the class proportions of the original training samples, and a multi-label nearest neighbor algorithm is used for classification. On the independent datasets, MLysPRED achieved an Aiming of 92.21%, a Coverage of 94.98%, an Accuracy of 89.63%, an Absolute-True of 81.46% and an Absolute-False of 0.0682. A comparison with five existing predictors further indicates that MLysPRED is promising for predicting multiple K-PTMs in proteins. For the convenience of experimental scientists, MLysPRED has been deployed as a user-friendly web server at http://47.100.136.41:8181.
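Kmer is one of the three encoding strategies named above. As a hedged illustration only, the sketch below shows a generic k-mer frequency encoding of a lysine-centered peptide window. The function name `kmer_features`, the default k = 2 and the choice to skip non-standard or padding residues are assumptions for illustration, not details taken from MLysPRED.

```python
from itertools import product

# The 20 standard amino acids. Any k-mer containing another character
# (e.g. an 'X' padding residue) is skipped: an assumption, not from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_features(sequence, k=2):
    """Return the normalized frequency of every possible k-mer in `sequence`.

    The result has 20**k entries, ordered lexicographically over AMINO_ACIDS.
    Hypothetical helper for illustration only.
    """
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocabulary)}
    counts = [0] * len(vocabulary)
    total = 0
    # Slide a window of length k over the sequence and count each k-mer.
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:
            counts[index[kmer]] += 1
            total += 1
    # Normalize counts to frequencies; an all-unknown window gives all zeros.
    return [c / total if total else 0.0 for c in counts]
```

For example, with the hypothetical window `"AKGKAK"` and k = 1, lysine (K) gets a frequency of 0.5 (3 of 6 residues). Larger k captures more local context but the vector grows as 20**k, which is why small k values are typical.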
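The abstract does not specify how the multidimensional normal distribution oversampling is carried out. One plausible reading, sketched below under stated assumptions, fits a mean vector and a covariance matrix to the feature vectors of an under-represented group of samples and draws synthetic samples from that multivariate normal distribution using NumPy's `Generator.multivariate_normal`. The helper names `mnd_oversample` and `oversample_group`, the ridge term and the grouping of samples by a shared label vector are all hypothetical; the graph-based multi-view clustering undersampling step is not sketched.

```python
import numpy as np


def mnd_oversample(X_group, n_new, ridge=1e-6, seed=0):
    """Draw n_new synthetic feature vectors from a multivariate normal
    distribution fitted to the rows of X_group (n_samples x n_features).
    Hypothetical helper for illustration only."""
    X_group = np.asarray(X_group, dtype=float)
    if X_group.shape[0] < 2:
        raise ValueError("need at least two samples to estimate a covariance")
    mean = X_group.mean(axis=0)
    # Feature-by-feature sample covariance. The small ridge (an assumption)
    # keeps it positive definite when features outnumber samples.
    cov = np.atleast_2d(np.cov(X_group, rowvar=False))
    cov = cov + ridge * np.eye(X_group.shape[1])
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_new)


def oversample_group(X, Y, mask, n_new, seed=0):
    """Append n_new synthetic samples for the group selected by a boolean mask.

    Hypothetical grouping: all masked rows share one label vector, and every
    synthetic sample inherits that label vector.
    """
    X_new = mnd_oversample(X[mask], n_new, seed=seed)
    Y_new = np.repeat(Y[mask][:1], n_new, axis=0)
    return np.vstack([X, X_new]), np.vstack([Y, Y_new])
```

For instance, if rows with the label vector `[1, 0, 1, 0]` are rare, `oversample_group(X, Y, (Y == [1, 0, 1, 0]).all(axis=1), 50)` would append 50 synthetic rows carrying that label vector. Sampling from a fitted Gaussian, rather than duplicating rows, produces new points that follow the group's feature correlations.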
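The five evaluation measures quoted above are usually defined as follows for multi-label predictors: Aiming and Coverage average the per-sample precision and recall of the predicted label set, Accuracy averages the intersection over union of the true and predicted sets, Absolute-True is the fraction of samples whose predicted set matches exactly, and Absolute-False is the average fraction of wrongly assigned or missed labels (Hamming loss). A minimal sketch, assuming each sample's labels are given as a Python set of label indices and that a zero denominator contributes 0 (an assumption for illustration); the function names are hypothetical.

```python
def _ratio(numerator, denominator):
    # Assumption: an empty set in the denominator contributes 0 to the average.
    return numerator / denominator if denominator else 0.0


def multilabel_metrics(true_sets, pred_sets, n_labels):
    """Aiming, Coverage, Accuracy, Absolute-True and Absolute-False over N samples.

    true_sets / pred_sets: sequences of sets of label indices, one set per sample.
    n_labels: total number of labels M (e.g. 4 K-PTM types).
    """
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need equal-length, non-empty label-set sequences")
    totals = {"aiming": 0.0, "coverage": 0.0, "accuracy": 0.0,
              "absolute_true": 0.0, "absolute_false": 0.0}
    for true, pred in zip(true_sets, pred_sets):
        inter = len(true & pred)
        union = len(true | pred)
        totals["aiming"] += _ratio(inter, len(pred))             # per-sample precision
        totals["coverage"] += _ratio(inter, len(true))           # per-sample recall
        totals["accuracy"] += _ratio(inter, union)               # intersection over union
        totals["absolute_true"] += 1.0 if true == pred else 0.0  # exact match
        totals["absolute_false"] += (union - inter) / n_labels   # Hamming loss
    n = len(true_sets)
    return {name: value / n for name, value in totals.items()}
```

With four labels (hypothetically 0 = acetylation, 1 = crotonylation, 2 = methylation, 3 = succinylation), true sets `[{0}, {0, 3}, {1, 2}]` and predicted sets `[{0}, {0}, {1, 2, 3}]` give Aiming 8/9, Coverage 5/6, Accuracy 13/18, Absolute-True 1/3 and Absolute-False 1/6. Higher is better for the first four measures; lower is better for Absolute-False.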

Subject(s)

Lysine; Proteins; Algorithms; Cluster Analysis; Computational Biology/methods; Humans; Lysine/metabolism; Normal Distribution; Protein Processing, Post-Translational; Proteins/chemistry

Keywords

multi-label prediction model; multiple K-PTMs; resampling techniques; sequence encoding

Collection: 01-internacional | Database: MEDLINE | Main subject: Proteins / Lysine | Study type: Prognostic_studies / Risk_factors_studies | Limit: Humans | Language: English | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Year: 2022 | Document type: Article | Country of affiliation: China
