Identifying N7-methylguanosine sites by integrating multiple features.
Biopolymers; 113(2): e23480, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34709657
Recent studies have reported that N7-methylguanosine (m7G) plays a vital role in the regulation of gene expression. Determining the distribution of m7G is therefore a crucial step towards understanding its biological functions. Although experimental approaches can locate m7G sites accurately, they are labor-intensive, costly, and time-consuming. It is therefore necessary to develop more effective and robust computational methods to replace, or at least complement, current experimental methods. In this study, we developed a novel sequence-based computational tool to identify RNA m7G sites. In this model, 22 kinds of dinucleotide physicochemical (PC) properties were employed to encode the RNA sequence. Three types of descriptors (auto-covariance, cross-covariance, and discrete wavelet transform) were adopted to extract effective features from the PC matrix. The least absolute shrinkage and selection operator (LASSO) algorithm was used to reduce the influence of irrelevant or redundant features. Finally, the selected features were fed into a support vector machine (SVM) to distinguish m7G sites from non-m7G sites. The proposed method significantly outperforms existing predictors across all evaluation metrics, which indicates that the approach is effective in identifying RNA m7G sites.
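The encoding and covariance steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The two property tables (`gc_fraction`, `purine_fraction`) are placeholders standing in for the 22 published dinucleotide PC properties, which this record does not list. The function names and the `max_lag` parameter are likewise hypothetical. The auto-covariance and cross-covariance formulas follow one commonly used definition (mean-centered products at a given lag, averaged over the number of pairs); the paper may normalize differently. The discrete wavelet transform, LASSO, and SVM stages are omitted.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]

# Placeholder property tables (assumption): NOT the 22 properties used in the paper.
TOY_PROPERTIES = {
    "gc_fraction": {d: sum(c in "GC" for c in d) / 2 for d in DINUCLEOTIDES},
    "purine_fraction": {d: sum(c in "AG" for c in d) / 2 for d in DINUCLEOTIDES},
}


def encode(seq, properties=TOY_PROPERTIES):
    """Build the PC matrix: one row per property, one column per overlapping dinucleotide."""
    seq = seq.upper().replace("T", "U")
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [[table[d] for d in dinucs] for table in properties.values()]


def cross_covariance(row_a, row_b, lag):
    """Average of mean-centered products between row_a[i] and row_b[i + lag]."""
    n_pairs = len(row_a) - lag
    if n_pairs <= 0:
        raise ValueError("lag must be smaller than the number of dinucleotides")
    mean_a = sum(row_a) / len(row_a)
    mean_b = sum(row_b) / len(row_b)
    total = sum((row_a[i] - mean_a) * (row_b[i + lag] - mean_b) for i in range(n_pairs))
    return total / n_pairs


def auto_covariance(row, lag):
    """Auto-covariance is the cross-covariance of a property with itself."""
    return cross_covariance(row, row, lag)


def ac_cc_features(seq, max_lag=2):
    """Concatenate auto- and cross-covariance values for lags 1..max_lag."""
    matrix = encode(seq)
    features = []
    for lag in range(1, max_lag + 1):
        # Auto-covariance: one value per property.
        features.extend(auto_covariance(row, lag) for row in matrix)
        # Cross-covariance: one value per ordered pair of distinct properties.
        for i, row_a in enumerate(matrix):
            for j, row_b in enumerate(matrix):
                if i != j:
                    features.append(cross_covariance(row_a, row_b, lag))
    return features
```

With P properties and a maximum lag of G, this sketch yields P × P × G values per sequence (P auto-covariance plus P × (P − 1) cross-covariance values at each lag). A feature vector of this kind would then go through a selection step such as LASSO before classification.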
Collection: 01-international
Database: MEDLINE
Main subject: Support Vector Machine / Guanosine
Type of study: Prognostic studies
Language: English
Journal: Biopolymers
Year: 2022
Document type: Article
Affiliation country: China
Country of publication: United States