Amino Acid Encoding Methods for Protein Sequences: A Comprehensive Review and Assessment.
IEEE/ACM Trans Comput Biol Bioinform ; 17(6): 1918-1931, 2020.
Article in En | MEDLINE | ID: mdl-30998480
ABSTRACT
As the first step of machine-learning-based protein structure and function prediction, amino acid encoding plays a fundamental role in the final success of those methods. Unlike protein sequence encoding, amino acid encoding can be used in both residue-level and sequence-level prediction of protein properties by combining it with different algorithms. However, it has not attracted enough attention in the past decades, and there have been no comprehensive reviews or assessments of encoding methods so far. In this article, we systematically classify amino acid encoding methods and present a comprehensive review and assessment of them. The methods are grouped into five categories according to their information sources and information extraction methodologies: binary encoding, physicochemical properties encoding, evolution-based encoding, structure-based encoding, and machine-learning encoding. Then, 16 representative methods from the five categories are selected and compared on protein secondary structure prediction and protein fold recognition tasks using large-scale benchmark datasets. The results show that the evolution-based, position-dependent encoding method PSSM achieved the best performance. The structure-based and machine-learning encoding methods also show some potential for further application; in particular, the neural-network-based distributed representation of amino acids may shed new light on this area. We hope that this review and assessment will be useful for future studies in amino acid encoding.
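To make the first category concrete, the following is a minimal sketch of binary (one-hot) encoding, in which each residue becomes a 20-dimensional 0/1 vector. It is an illustration only, not the implementation assessed in the article: the function name `one_hot_encode`, the alphabetical ordering of the 20 standard residues, and the choice to map unknown symbols such as `X` to an all-zero vector are all assumptions made here.

```python
# Hypothetical illustration of binary (one-hot) amino acid encoding.
# The residue ordering below is an arbitrary assumption; any fixed order works.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-dimensional 0/1 vector per residue of `sequence`."""
    encoded = []
    for residue in sequence.upper():
        vector = [0] * len(AMINO_ACIDS)
        position = INDEX.get(residue)
        # Assumption: non-standard symbols (e.g. 'X') stay all-zero.
        if position is not None:
            vector[position] = 1
        encoded.append(vector)
    return encoded


if __name__ == "__main__":
    for residue, vector in zip("ACX", one_hot_encode("ACX")):
        print(residue, vector)
```

Because the output has one vector per residue, it can feed a residue-level predictor directly, or be pooled or flattened for a sequence-level one.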
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Proteins / Amino Acid Sequence / Computational Biology / Sequence Analysis, Protein / Amino Acids Language: En Journal: ACM Trans Comput Biol Bioinform Journal subject: BIOLOGY / MEDICAL INFORMATICS Year of publication: 2020 Document type: Article
