Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function.
Villegas-Morcillo, Amelia; Makrodimitris, Stavros; van Ham, Roeland C H J; Gomez, Angel M; Sanchez, Victoria; Reinders, Marcel J T.
Affiliation
  • Villegas-Morcillo A; Department of Signal Theory, Telematics and Communications, University of Granada, 18071 Granada, Spain.
  • Makrodimitris S; Delft Bioinformatics Lab, Delft University of Technology, 2628XE Delft, The Netherlands.
  • van Ham RCHJ; Keygene N.V., 6708PW Wageningen, The Netherlands.
  • Gomez AM; Delft Bioinformatics Lab, Delft University of Technology, 2628XE Delft, The Netherlands.
  • Sanchez V; Keygene N.V., 6708PW Wageningen, The Netherlands.
  • Reinders MJT; Department of Signal Theory, Telematics and Communications, University of Granada, 18071 Granada, Spain.
Bioinformatics; 37(2): 162-170, 2021 04 19.
Article in English | MEDLINE | ID: mdl-32797179
ABSTRACT
MOTIVATION:

Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from them. Deep supervised models require large amounts of labeled training data, which are not available for this task. However, a very large number of protein sequences without functional labels is available.

RESULTS:

We applied an existing deep sequence model, pretrained in an unsupervised setting, to the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. It also partly removes the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance on the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not improve performance, hinting that 3D structure is potentially also learned during the unsupervised pretraining.

AVAILABILITY AND IMPLEMENTATION:

Implementations of all models used can be found at https://github.com/stamakro/GCN-for-Structure-and-Function.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
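
For readers unfamiliar with the hand-crafted sequence features named in the results, the following is a minimal sketch of two of them: one-hot encoding of amino acids and k-mer counts. It is an illustration only and is not taken from the authors' repository. The function names `one_hot` and `kmer_counts`, the choice of k = 2, and the all-zero treatment of non-standard residues are assumptions made for this example.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-dimensional 0/1 rows.

    Assumption for this sketch: residues outside the 20 standard
    amino acids are encoded as all-zero rows.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        rows.append(row)
    return rows


def kmer_counts(sequence, k=2):
    """Count overlapping k-mers and return a fixed-length vector.

    The vector has 20**k entries, ordered like
    itertools.product(AMINO_ACIDS, repeat=k). k-mers containing
    non-standard residues are ignored (an assumption of this sketch).
    """
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vocabulary = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return [counts.get(kmer, 0) for kmer in vocabulary]


if __name__ == "__main__":
    seq = "ACAC"
    print(len(one_hot(seq)), "residues x", len(one_hot(seq)[0]), "channels")
    print(sum(kmer_counts(seq)), "overlapping 2-mers counted")
```

The one-hot matrix grows with sequence length, whereas the k-mer vector has a fixed size regardless of length, which is one reason the two are typically fed to different kinds of downstream models.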

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Software / Proteins Type of study: Prognostic_studies / Risk_factors_studies Language: En Journal: Bioinformatics Journal subject: MEDICAL INFORMATICS Publication year: 2021 Document type: Article Affiliation country: Spain
