Predicting Post-Translational Modifications from Local Sequence Fragments Using Machine Learning Algorithms: Overview and Best Practices.
Tatjewski, Marcin; Kierczak, Marcin; Plewczynski, Dariusz.
Affiliation
  • Tatjewski M; Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland.
  • Kierczak M; Centre of New Technologies, University of Warsaw, S. Banacha 2c, 02-097, Warsaw, Poland.
  • Plewczynski D; Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
Methods Mol Biol ; 1484: 275-300, 2017.
Article in English | MEDLINE | ID: mdl-27787833
Here, we present two perspectives on the task of predicting post-translational modifications (PTMs) from local sequence fragments using machine learning algorithms. The first describes the fundamental steps required to construct a PTM predictor from scratch. These steps include data gathering, feature extraction, and machine-learning classifier selection. The second discusses in detail more advanced problems encountered in the PTM prediction task. Probably the most challenging issues covered here are: (1) how to address class imbalance in the training data (we also present statistics describing the problem); (2) how to set up cross-validation folds properly, taking into account the homology of protein data records, for which we present our folds-over-clusters algorithm; and (3) how to efficiently find new sources of learning features. The presented techniques and notes result from intensive studies in the field, performed by our group and others, and can be useful both for researchers new to PTM prediction and for those who want to extend their repertoire of research techniques.
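To illustrate point (2), the following is a minimal sketch of cross-validation folds that respect homology clusters. It is not the authors' folds-over-clusters algorithm, whose details are not given in this abstract. It only shows the general idea, assuming each record already has a precomputed cluster label (for example, from a sequence-clustering tool). The function name `cluster_aware_folds` and the greedy balancing strategy are assumptions made for illustration.

```python
from collections import defaultdict


def cluster_aware_folds(cluster_ids, n_folds=5):
    """Assign records to folds so that no homology cluster spans two folds.

    cluster_ids[i] is the (hypothetical, precomputed) cluster label of
    record i. Returns a list with one fold index per record.
    """
    # Group record indices by their cluster label.
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    fold_sizes = [0] * n_folds
    fold_of = [None] * len(cluster_ids)

    # Greedy balancing (an assumption, not the published method):
    # place the largest clusters first, each into the smallest fold so far.
    ordered = sorted(members, key=lambda c: (-len(members[c]), str(c)))
    for cid in ordered:
        target = min(range(n_folds), key=lambda f: fold_sizes[f])
        for idx in members[cid]:
            fold_of[idx] = target
        fold_sizes[target] += len(members[cid])
    return fold_of


# Toy example: eight sequence fragments grouped into five clusters.
clusters = ["c1", "c1", "c2", "c3", "c3", "c3", "c4", "c5"]
folds = cluster_aware_folds(clusters, n_folds=3)
for cid, fold in zip(clusters, folds):
    print(cid, "-> fold", fold)
```

Because whole clusters are kept together, homologous records never appear in both the training and the test part of a split, which is the property a homology-aware scheme is meant to guarantee. Fold sizes are only approximately equal when cluster sizes vary.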

Full text: 1 Databases: MEDLINE Main subject: Software / Proteins / Protein Processing, Post-Translational / Computational Biology Type of study: Guideline / Prognostic_studies / Risk_factors_studies Language: En Journal: Methods Mol Biol Journal subject: MOLECULAR BIOLOGY Year: 2017 Document type: Article Affiliation country: Poland