Learning Proteome Domain Folding Using LSTMs in an Empirical Kernel Space.

Kuang, Da; Issakova, Dina; Kim, Junhyong

Affiliation

Kuang D; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA. Electronic address: kuangda@seas.upenn.edu.
Issakova D; Department of Biology, University of Pennsylvania, Philadelphia, USA. Electronic address: dissakov@sas.upenn.edu.
Kim J; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA; Department of Biology, University of Pennsylvania, Philadelphia, USA. Electronic address: junhyong@sas.upenn.edu.

J Mol Biol; 434(15): 167686, 2022 Aug 15.

Article in English | MEDLINE | ID: mdl-35716781

ABSTRACT

The recognition of protein structural folds is the starting point for protein function inference and for many structural prediction tools. We previously introduced the idea of using empirical comparisons to create a data-augmented feature space called PESS (Protein Empirical Structure Space) as a novel approach for protein structure prediction. Here, we extend that approach by generating the PESS feature space over fixed-length subsequences of query peptides and applying a sequential neural network model consisting of one long short-term memory (LSTM) cell layer followed by a fully connected layer. Using this approach, we show that only a small set of domains is needed as a training set to achieve near state-of-the-art accuracy on fold recognition. Our method improves on the previous approach by reducing the required training set and improving the model's ability to generalize across species, which will help fold prediction for newly discovered proteins.
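
The pipeline described in the abstract (fixed-length subsequences, one feature vector per subsequence, one LSTM layer, one fully connected layer) can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation. The sliding-window stride, the `toy_features` residue-group fractions (a stand-in for the PESS comparison features, which the abstract does not specify), the layer sizes, the random weights, and all function names are assumptions made for illustration only.

```python
import math
import random

# Hypothetical residue groups; a stand-in for PESS features, NOT the paper's method.
GROUPS = ("AVLIMFWC", "DEKRH", "STNQYGP")

def windows(seq, length, stride):
    """Fixed-length subsequences (assumed: sliding windows with a given stride).
    Sequences shorter than `length` yield no windows."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, stride)]

def toy_features(window):
    """Fraction of residues of the window falling in each hypothetical group."""
    return [sum(ch in g for ch in window) / len(window) for g in GROUPS]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Compute W @ v + b for a list-of-lists matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def init_params(n_in, n_hidden, n_out, seed=0):
    """Random, untrained weights for four LSTM gates plus the output layer."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    gates = {g: (mat(n_hidden, n_in + n_hidden), [0.0] * n_hidden) for g in "ifog"}
    return {"gates": gates, "fc": (mat(n_out, n_hidden), [0.0] * n_out)}

def lstm_fc_forward(params, xs):
    """Run one LSTM layer over the vectors in xs, then a fully connected
    layer with softmax on the final hidden state."""
    gates = params["gates"]
    n_hidden = len(gates["i"][1])
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    for x in xs:
        z = list(x) + h  # concatenate input and previous hidden state
        i = [sigmoid(a) for a in affine(*gates["i"], z)]    # input gate
        f = [sigmoid(a) for a in affine(*gates["f"], z)]    # forget gate
        o = [sigmoid(a) for a in affine(*gates["o"], z)]    # output gate
        g = [math.tanh(a) for a in affine(*gates["g"], z)]  # candidate cell state
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    logits = affine(*params["fc"], h)
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]  # one score per (hypothetical) fold class

if __name__ == "__main__":
    peptide = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up example sequence
    xs = [toy_features(w) for w in windows(peptide, length=8, stride=4)]
    params = init_params(n_in=len(GROUPS), n_hidden=4, n_out=5)
    print(lstm_fc_forward(params, xs))
```

Because the weights are random, the printed scores carry no biological meaning; the sketch only shows how a variable-length peptide becomes a sequence of fixed-size vectors that a recurrent layer can consume before a final classification layer.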

Subject(s)

Neural Networks, Computer; Protein Folding; Proteome; Algorithms; Protein Domains

Keywords

SCOPe; long short-term memory networks; protein empirical structure space; proteins fold

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Neural Networks, Computer / Protein Folding / Proteome | Study type: Prognostic studies | Language: En | Journal: J Mol Biol | Year: 2022 | Document type: Article | Country of publication: Netherlands
