Search | VHL Regional Portal

DeepLoc 2.1: multi-label membrane protein type prediction using protein language models.

Ødum, Marius Thrane; Teufel, Felix; Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Winther, Ole; Nielsen, Henrik.

Nucleic Acids Res ; 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-38587188

ABSTRACT

DeepLoc 2.0 is a popular web server for the prediction of protein subcellular localization and sorting signals. Here, we introduce DeepLoc 2.1, which additionally classifies the input proteins into the membrane protein types Transmembrane, Peripheral, Lipid-anchored and Soluble. Leveraging pre-trained transformer-based protein language models, the server utilizes a three-stage architecture for sequence-based, multi-label predictions. Comparative evaluations with other established tools on a test set of 4933 eukaryotic protein sequences, constructed following stringent homology partitioning, demonstrate state-of-the-art performance. Notably, DeepLoc 2.1 outperforms existing models, with the larger ProtT5 model exhibiting a marginal advantage over the ESM-1B model. The web server is available at https://services.healthtech.dtu.dk/services/DeepLoc-2.1.

DeepLoc 2.0: multi-label subcellular localization prediction using protein language models.

Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Nielsen, Henrik; Winther, Ole.

Nucleic Acids Res ; 50(W1): W228-W234, 2022 07 05.

Article in English | MEDLINE | ID: mdl-35489069

ABSTRACT

The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.

Subject(s)

Protein Sorting Signals , Proteins , Humans , Proteins/metabolism , Eukaryota/metabolism , Protein Transport , Language , Databases, Protein , Computational Biology , Subcellular Fractions/metabolism

NetSolP: predicting protein solubility in Escherichia coli using language models.

Thumuluri, Vineet; Martiny, Hannah-Marie; Almagro Armenteros, Jose J; Salomon, Jesper; Nielsen, Henrik; Johansen, Alexander Rosenberg.

Bioinformatics ; 38(4): 941-946, 2022 01 27.

Article in English | MEDLINE | ID: mdl-35088833

ABSTRACT

MOTIVATION: Solubility and expression levels of proteins can be a limiting factor for large-scale studies and industrial production. By determining the solubility and expression directly from the protein sequence, the success rate of wet-lab experiments can be increased. RESULTS: In this study, we focus on predicting the solubility and usability for purification of proteins expressed in Escherichia coli directly from the sequence. Our model NetSolP is based on deep learning protein language models called transformers and we show that it achieves state-of-the-art performance and improves extrapolation across datasets. As we find current methods are built on biased datasets, we curate existing datasets by using strict sequence-identity partitioning and ensure that there is minimal bias in the sequences. AVAILABILITY AND IMPLEMENTATION: The predictor and data are available at https://services.healthtech.dtu.dk/service.php?NetSolP and the open-sourced code is available at https://github.com/tvinet/NetSolP-1.0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Escherichia coli , Language , Proteins , Software , Solubility
