1.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38244571

ABSTRACT

MOTIVATION: Phosphorylation, a post-translational modification regulated by protein kinase enzymes, plays an essential role in almost all cellular processes. Understanding how each of the nearly 500 human protein kinases selectively phosphorylates its substrates is a foundational challenge in bioinformatics and cell signaling. Although deep learning models have been a popular means to predict kinase-substrate relationships, existing models often lack interpretability and are trained on datasets skewed toward a subset of well-studied kinases. RESULTS: Here we leverage recent peptide library datasets generated to determine substrate specificity profiles of 300 serine/threonine kinases to develop an explainable Transformer model for kinase-peptide interaction prediction. The model, trained solely on primary sequences, achieved state-of-the-art performance. Its unique multitask learning paradigm enables predictions on virtually any kinase-peptide pair, including predictions on 139 kinases not used in peptide library screens. Furthermore, we employed explainable machine learning methods to elucidate the model's inner workings. Through analysis of learned embeddings at different training stages, we demonstrate that the model employs a unique strategy of substrate prediction considering both substrate motif patterns and kinase evolutionary features. SHapley Additive exPlanation (SHAP) analysis reveals key specificity determining residues in the peptide sequence. Finally, we provide a web interface for predicting kinase-substrate associations for user-defined sequences and a resource for visualizing the learned kinase-substrate associations. AVAILABILITY AND IMPLEMENTATION: All code and data are available at https://github.com/esbgkannan/Phosformer-ST. Web server is available at https://phosformer.netlify.app.


Subjects
Peptide Library; Protein Kinases; Humans; Protein Kinases/metabolism; Phosphorylation; Peptides/chemistry; Machine Learning
2.
PeerJ ; 11: e15815, 2023.
Article in English | MEDLINE | ID: mdl-37868056

ABSTRACT

The 534 protein kinases encoded in the human genome constitute a large druggable class of proteins that include both well-studied and understudied "dark" members. Accurate prediction of dark kinase functions is a major bioinformatics challenge. Here, we employ a graph mining approach that uses the evolutionary and functional context encoded in knowledge graphs (KGs) to predict protein and pathway associations for understudied kinases. We propose a new scalable graph embedding approach, RegPattern2Vec, which employs regular pattern constrained random walks to sample diverse aspects of node context within a KG flexibly. RegPattern2Vec learns functional representations of kinases, interacting partners, post-translational modifications, pathways, cellular localization, and chemical interactions from a kinase-centric KG that integrates and conceptualizes data from curated heterogeneous data resources. By contextualizing information relevant to prediction, RegPattern2Vec improves accuracy and efficiency in comparison to other random walk-based graph embedding approaches. We show that the predictions produced by our model overlap with pathway enrichment data produced using experimentally validated Protein-Protein Interaction (PPI) data from both publicly available databases and experimental datasets not used in training. Our model also has the advantage of using the collected random walks as biological context to interpret the predicted protein-pathway associations. We provide high-confidence pathway predictions for 34 dark kinases and present three case studies in which analysis of meta-paths associated with the prediction enables biological interpretation. Overall, RegPattern2Vec efficiently samples multiple node types for link prediction on biological knowledge graphs and the predicted associations between understudied kinases, pseudokinases, and known pathways serve as a conceptual starting point for hypothesis generation and testing.


Subjects
Pattern Recognition, Automated; Proteins; Humans; Proteins/genetics; Computational Biology; Learning; Knowledge
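
The pattern-constrained random walk described in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the RegPattern2Vec implementation: the toy graph, the node names (`KinaseA`, `ProteinX`, `Pathway1`), the function `constrained_walk`, and the use of a fixed sequence of node types in place of a full regular pattern are all assumptions made for the example.

```python
import random

# Hypothetical toy knowledge graph: every name below is invented for illustration.
NODE_TYPE = {
    "KinaseA": "kinase", "KinaseB": "kinase",
    "ProteinX": "protein", "ProteinY": "protein",
    "Pathway1": "pathway",
}
EDGES = {
    "KinaseA": ["ProteinX", "ProteinY", "KinaseB"],
    "KinaseB": ["ProteinY", "KinaseA"],
    "ProteinX": ["KinaseA", "Pathway1"],
    "ProteinY": ["KinaseA", "KinaseB", "Pathway1"],
    "Pathway1": ["ProteinX", "ProteinY"],
}


def constrained_walk(start, type_pattern, rng):
    """Random walk from `start` whose node types follow `type_pattern` in order.

    Returns the list of visited nodes, or None if the start node has the
    wrong type or the walk reaches a node with no neighbor of the next type.
    """
    if NODE_TYPE[start] != type_pattern[0]:
        return None
    walk = [start]
    for wanted in type_pattern[1:]:
        # Only neighbors whose type matches the next step of the pattern qualify.
        candidates = [n for n in EDGES[walk[-1]] if NODE_TYPE[n] == wanted]
        if not candidates:
            return None
        walk.append(rng.choice(candidates))
    return walk


rng = random.Random(0)  # fixed seed so the sampled walks are reproducible
pattern = ["kinase", "protein", "pathway"]
walks = [constrained_walk("KinaseA", pattern, rng) for _ in range(5)]
for w in walks:
    print(" -> ".join(w))
```

Each sampled walk links a kinase to a pathway through an interacting protein, which is the kind of typed context a downstream embedding model could consume. A real regular pattern would also allow repetition and alternation of node or edge types, which this fixed sequence does not capture.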
3.
Bioinformatics ; 39(2)2023 02 03.
Article in English | MEDLINE | ID: mdl-36692152

ABSTRACT

MOTIVATION: The human genome encodes over 500 distinct protein kinases which regulate nearly all cellular processes by the specific phosphorylation of protein substrates. While advances in mass spectrometry and proteomics studies have identified thousands of phosphorylation sites across species, information on the specific kinases that phosphorylate these sites is currently lacking for the vast majority of phosphosites. Recently, there has been a major focus on the development of computational models for predicting kinase-substrate associations. However, most current models only allow predictions on a subset of well-studied kinases. Furthermore, the utilization of hand-curated features and imbalances in training and testing datasets pose unique challenges in the development of accurate predictive models for kinase-specific phosphorylation prediction. Motivated by the recent development of universal protein language models which automatically generate context-aware features from primary sequence information, we sought to develop a unified framework for kinase-specific phosphosite prediction, allowing for greater investigative utility and enabling substrate predictions at the whole kinome level. RESULTS: We present a deep learning model for kinase-specific phosphosite prediction, termed Phosformer, which predicts the probability of phosphorylation given an arbitrary pair of unaligned kinase and substrate peptide sequences. We demonstrate that Phosformer implicitly learns evolutionary and functional features during training, removing the need for feature curation and engineering. Further analyses reveal that Phosformer also learns substrate specificity motifs and is able to distinguish between functionally distinct kinase families. Benchmarks indicate that Phosformer exhibits significant improvements compared to the state-of-the-art models, while also presenting a more generalized, unified, and interpretable predictive framework. 
AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/esbgkannan/phosformer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Protein Kinases; Protein Processing, Post-Translational; Humans; Phosphorylation; Protein Kinases/metabolism; Proteins/metabolism
4.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36642409

ABSTRACT

Protein language models, trained on millions of biologically observed sequences, generate feature-rich numerical representations of protein sequences. These representations, called sequence embeddings, can infer structure-functional properties, despite protein language models being trained on primary sequence alone. While sequence embeddings have been applied toward tasks such as structure and function prediction, applications toward alignment-free sequence classification have been hindered by the lack of studies to derive, quantify and evaluate relationships between protein sequence embeddings. Here, we develop workflows and visualization methods for the classification of protein families using sequence embeddings derived from protein language models. A benchmark of manifold visualization methods reveals that Neighbor Joining (NJ) embedding trees are highly effective in capturing global structure while achieving similar performance in capturing local structure compared with popular dimensionality reduction techniques such as t-SNE and UMAP. The statistical significance of hierarchical clusters on a tree is evaluated by resampling embeddings using a variational autoencoder (VAE). We demonstrate the application of our methods in the classification of two well-studied enzyme superfamilies, phosphatases and protein kinases. Our embedding-based classifications remain consistent with and extend upon previously published sequence alignment-based classifications. We also propose a new hierarchical classification for the S-Adenosyl-L-Methionine (SAM) enzyme superfamily which has been difficult to classify using traditional alignment-based approaches. Beyond applications in sequence classification, our results further suggest NJ trees are a promising general method for visualizing high-dimensional data sets.


Subjects
Amino Acid Sequence; Proteins; Cluster Analysis; Proteins/chemistry; Sequence Alignment
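
As a rough illustration of how a Neighbor Joining tree can be built from sequence embeddings, as the abstract above describes, the sketch below computes pairwise distances between embedding vectors and applies the standard NJ procedure. It is not the authors' workflow: the two-dimensional vectors, the labels `seqA` to `seqD`, the choice of Euclidean distance, and the function names are assumptions for the example, and real protein language model embeddings have hundreds of dimensions or more.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def neighbor_joining(names, d):
    """Standard Neighbor Joining on a symmetric distance table.

    names: list of leaf labels; d: dict of dicts with d[a][b] == d[b][a].
    Returns the unrooted tree as a list of (node, node, branch_length) edges.
    """
    d = {a: dict(d[a]) for a in names}  # copy so the caller's table is untouched
    active = list(names)
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Join the pair that minimizes the Q criterion.
        i, j = min(
            combinations(active, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        u = f"node{next_id}"  # label for the new internal node
        next_id += 1
        len_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        edges += [(i, u, len_i), (j, u, len_j)]
        # Distances from the new internal node to every remaining node.
        d[u] = {}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges


# Hypothetical toy embeddings standing in for protein language model output.
emb = {
    "seqA": [0.0, 0.0], "seqB": [1.0, 0.0],
    "seqC": [5.0, 4.0], "seqD": [6.0, 4.0],
}
dist = {a: {b: euclidean(emb[a], emb[b]) for b in emb if b != a} for a in emb}
tree = neighbor_joining(list(emb), dist)
for left, right, length in tree:
    print(f"{left} -- {right}: {length:.3f}")
```

Because the tree is built directly from embedding distances, no sequence alignment is required. With real embeddings, other distance measures such as cosine distance may be more appropriate than the Euclidean distance assumed here.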
5.
Cell Immunol ; 325: 1-13, 2018 03.
Article in English | MEDLINE | ID: mdl-29329637

ABSTRACT

Idiopathic pulmonary fibrosis (IPF) is a fatal lung disease manifested by overtly scarred peripheral and basilar regions and more normal-appearing central lung areas. Lung tissues from macroscopically normal-appearing (IPFn) and scarred (IPFs) areas of explanted IPF lungs were analyzed by RNASeq and compared with healthy control (HC) lung tissues. There were profound transcriptomic changes in IPFn compared with HC tissues, which included elevated expression of numerous immune-, inflammation-, and extracellular matrix-related mRNAs, and these changes were similar to those observed with IPFs compared to HC. Comparing IPFn directly to IPFs, elevated expression of epithelial mucociliary mRNAs was observed in the IPFs tissues. Thus, despite the known geographic tissue heterogeneity in IPF, the entire lung is actively involved in the disease process, and demonstrates pronounced elevated expression of numerous immune-related genes. Differences between normal-appearing and scarred tissues may thus be driven by deranged epithelial homeostasis or possibly non-transcriptomic factors.


Subjects
Idiopathic Pulmonary Fibrosis/genetics; Idiopathic Pulmonary Fibrosis/immunology; Lung/immunology; Extracellular Matrix/metabolism; Fibroblasts/metabolism; Gene Ontology; Humans; Lung/metabolism; Macrophage Activation/immunology; Primary Cell Culture; RNA, Messenger/metabolism; Respiratory Mucosa/immunology; Respiratory Mucosa/metabolism; Sequence Analysis, RNA/methods; Transcriptome/genetics
6.
J Biol Chem ; 292(52): 21653-21661, 2017 12 29.
Article in English | MEDLINE | ID: mdl-29127199

ABSTRACT

Human mature IL-33 is a member of the IL-1 family and a potent regulator of immunity through its pro-T helper cell 2 activity. Its precursor form, full-length interleukin-33 (FLIL33), is an intranuclear protein in many cell types, including fibroblasts, and its intracellular levels can change in response to stimuli. However, the mechanisms controlling the nuclear localization of FLIL33 or its stability in cells are not understood. Here, we identified importin-5 (IPO5), a member of the importin family of nuclear transport proteins, as an intracellular binding partner of FLIL33. By overexpressing various FLIL33 protein segments and variants in primary human lung fibroblasts and HEK293T cells, we show that FLIL33, but not mature interleukin-33, physically interacts with IPO5 and that this interaction localizes to a cluster of charged amino acids (positions 46-56) but not to an adjacent segment (positions 61-67) in the FLIL33 N-terminal region. siRNA-mediated IPO5 knockdown in cell culture did not affect nuclear localization of FLIL33. However, the IPO5 knockdown significantly decreased the intracellular levels of overexpressed FLIL33, reversed by treatment with the 20S proteasome inhibitor bortezomib. Furthermore, FLIL33 variants deficient in IPO5 binding remained intranuclear and exhibited decreased levels, which were also restored by the bortezomib treatment. These results indicate that the interaction between FLIL33 and IPO5 is localized to a specific segment of the FLIL33 protein, is not required for nuclear localization of FLIL33, and protects FLIL33 from proteasome-dependent degradation.


Subjects
Interleukin-33/metabolism; beta Karyopherins/metabolism; Amino Acid Sequence; Cell Nucleus/metabolism; Cytoplasm/metabolism; HEK293 Cells; HeLa Cells; Humans; Interleukin-33/genetics; Nuclear Localization Signals/metabolism; Nuclear Proteins/metabolism; Protein Binding; Protein Interaction Domains and Motifs; Protein Transport; Proteolysis; beta Karyopherins/genetics