Results 1 - 3 of 3
1.
Bioinformatics ; 40(2), 2024 02 01.
Article in English | MEDLINE | ID: mdl-38244571

ABSTRACT

MOTIVATION: Phosphorylation, a post-translational modification regulated by protein kinase enzymes, plays an essential role in almost all cellular processes. Understanding how each of the nearly 500 human protein kinases selectively phosphorylates its substrates is a foundational challenge in bioinformatics and cell signaling. Although deep learning models have been a popular means to predict kinase-substrate relationships, existing models often lack interpretability and are trained on datasets skewed toward a subset of well-studied kinases. RESULTS: Here we leverage recent peptide library datasets generated to determine substrate specificity profiles of 300 serine/threonine kinases to develop an explainable Transformer model for kinase-peptide interaction prediction. The model, trained solely on primary sequences, achieved state-of-the-art performance. A multitask learning paradigm built into the model enables predictions on virtually any kinase-peptide pair, including predictions on 139 kinases not used in peptide library screens. Furthermore, we employed explainable machine learning methods to elucidate the model's inner workings. Through analysis of learned embeddings at different training stages, we demonstrate that the model employs a unique strategy of substrate prediction that considers both substrate motif patterns and kinase evolutionary features. SHapley Additive exPlanation (SHAP) analysis reveals key specificity-determining residues in the peptide sequence. Finally, we provide a web interface for predicting kinase-substrate associations for user-defined sequences and a resource for visualizing the learned kinase-substrate associations. AVAILABILITY AND IMPLEMENTATION: All code and data are available at https://github.com/esbgkannan/Phosformer-ST. The web server is available at https://phosformer.netlify.app.


Subjects
Peptide Library , Protein Kinases , Humans , Protein Kinases/metabolism , Phosphorylation , Peptides/chemistry , Machine Learning
2.
Brief Bioinform ; 24(1), 2023 01 19.
Article in English | MEDLINE | ID: mdl-36642409

ABSTRACT

Protein language models, trained on millions of biologically observed sequences, generate feature-rich numerical representations of protein sequences. These representations, called sequence embeddings, can infer structure-function properties, despite protein language models being trained on primary sequence alone. While sequence embeddings have been applied toward tasks such as structure and function prediction, applications toward alignment-free sequence classification have been hindered by the lack of studies to derive, quantify and evaluate relationships between protein sequence embeddings. Here, we develop workflows and visualization methods for the classification of protein families using sequence embeddings derived from protein language models. A benchmark of manifold visualization methods reveals that Neighbor Joining (NJ) embedding trees are highly effective in capturing global structure while achieving similar performance in capturing local structure compared with popular dimensionality reduction techniques such as t-SNE and UMAP. The statistical significance of hierarchical clusters on a tree is evaluated by resampling embeddings using a variational autoencoder (VAE). We demonstrate the application of our methods in the classification of two well-studied enzyme superfamilies, phosphatases and protein kinases. Our embedding-based classifications remain consistent with, and extend, previously published sequence alignment-based classifications. We also propose a new hierarchical classification for the S-Adenosyl-L-Methionine (SAM) enzyme superfamily, which has been difficult to classify using traditional alignment-based approaches. Beyond applications in sequence classification, our results further suggest NJ trees are a promising general method for visualizing high-dimensional data sets.


Subjects
Amino Acid Sequence , Proteins , Cluster Analysis , Proteins/chemistry , Sequence Alignment
3.
Cell Immunol ; 325: 1-13, 2018 03.
Article in English | MEDLINE | ID: mdl-29329637

ABSTRACT

Idiopathic pulmonary fibrosis (IPF) is a fatal lung disease manifested by overtly scarred peripheral and basilar regions and more normal-appearing central lung areas. Lung tissues from macroscopically normal-appearing (IPFn) and scarred (IPFs) areas of explanted IPF lungs were analyzed by RNASeq and compared with healthy control (HC) lung tissues. There were profound transcriptomic changes in IPFn compared with HC tissues, which included elevated expression of numerous immune-, inflammation-, and extracellular matrix-related mRNAs, and these changes were similar to those observed with IPFs compared to HC. Comparing IPFn directly to IPFs, elevated expression of epithelial mucociliary mRNAs was observed in the IPFs tissues. Thus, despite the known geographic tissue heterogeneity in IPF, the entire lung is actively involved in the disease process, and demonstrates pronounced elevated expression of numerous immune-related genes. Differences between normal-appearing and scarred tissues may thus be driven by deranged epithelial homeostasis or possibly non-transcriptomic factors.


Subjects
Idiopathic Pulmonary Fibrosis/genetics , Idiopathic Pulmonary Fibrosis/immunology , Lung/immunology , Extracellular Matrix/metabolism , Fibroblasts/metabolism , Gene Ontology , Humans , Lung/metabolism , Macrophage Activation/immunology , Primary Cell Culture , RNA, Messenger/metabolism , Respiratory Mucosa/immunology , Respiratory Mucosa/metabolism , Sequence Analysis, RNA/methods , Transcriptome/genetics