Results 1 - 13 of 13
1.
Bioinformatics ; 40(5), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38662579

ABSTRACT

MOTIVATION: Recent advancements in natural language processing have highlighted the effectiveness of global contextualized representations from protein language models (pLMs) in numerous downstream tasks. Nonetheless, strategies to encode the site-of-interest leveraging pLMs for per-residue prediction tasks, such as crotonylation (Kcr) prediction, remain largely uncharted. RESULTS: Herein, we adopt a range of approaches for utilizing pLMs by experimenting with different input sequence types (full-length protein sequence versus window sequence), assessing the implications of utilizing the per-residue embedding of the site-of-interest as well as embeddings of window residues centered around it. Building upon these insights, we developed a novel residual ConvBiLSTM network designed to process window-level embeddings of the site-of-interest generated by the ProtT5-XL-UniRef50 pLM using full-length sequences as input. This model, termed T5ResConvBiLSTM, surpasses existing state-of-the-art Kcr predictors in performance across three diverse datasets. To validate our approach of utilizing full sequence-based window-level embeddings, we also delved into the interpretability of ProtT5-derived embedding tensors in two ways: firstly, by scrutinizing the attention weights obtained from the transformer's encoder block; and secondly, by computing SHAP values for these tensors, providing a model-agnostic interpretation of the prediction results. Additionally, we enhance the latent representation of ProtT5 by incorporating two additional local representations, one derived from amino acid properties and the other from a supervised embedding layer, through an intermediate fusion stacked generalization approach, using an n-mer window sequence (or peptide/fragment). The resultant stacked model, dubbed LMCrot, exhibits a more pronounced improvement in predictive performance across the tested datasets.
AVAILABILITY AND IMPLEMENTATION: LMCrot is publicly available at https://github.com/KCLabMTU/LMCrot.
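
The idea of a window-level embedding taken from a full-length-sequence embedding can be illustrated with a minimal sketch. It assumes the pLM has already produced one vector per residue; the function name `window_embedding`, the zero-padding at the sequence termini, and the toy array are illustrative assumptions, not details taken from the LMCrot code.

```python
import numpy as np

def window_embedding(per_residue: np.ndarray, site: int, flank: int) -> np.ndarray:
    """Return a (2*flank+1, dim) window centred on `site`.

    `per_residue` is assumed to be a (length, dim) array of embeddings computed
    from the full-length sequence. Positions beyond the termini are zero-padded
    (an assumption made for this sketch).
    """
    length, dim = per_residue.shape
    if not 0 <= site < length:
        raise IndexError("site is outside the sequence")
    window = np.zeros((2 * flank + 1, dim), dtype=per_residue.dtype)
    start, stop = max(0, site - flank), min(length, site + flank + 1)
    offset = start - (site - flank)  # rows left as zeros at the N-terminal side
    window[offset:offset + (stop - start)] = per_residue[start:stop]
    return window

# Toy stand-in for pLM output: 10 residues, 4-dimensional embeddings.
embeddings = np.arange(40, dtype=np.float32).reshape(10, 4)
centre_window = window_embedding(embeddings, site=1, flank=2)
```

Because the window is sliced from embeddings of the whole protein, each row still carries context from residues outside the window, which is the contrast the abstract draws with embedding the window sequence alone.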


Subject(s)
Proteins; Proteins/chemistry; Proteins/metabolism; Natural Language Processing; Computational Biology/methods; Databases, Protein; Software; Protein Processing, Post-Translational; Amino Acid Sequence
2.
BMC Bioinformatics ; 24(1): 41, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36755242

ABSTRACT

BACKGROUND: Protein S-nitrosylation (SNO) plays a key role in transferring nitric oxide-mediated signals in both animals and plants and has emerged as an important mechanism for regulating protein functions and cell signaling of all main classes of protein. It is involved in several biological processes including immune response, protein stability, transcription regulation, post-translational regulation, DNA damage repair, and redox regulation, and is an emerging paradigm of redox signaling for protection against oxidative stress. The development of robust computational tools to predict protein SNO sites would contribute to further interpretation of the pathological and physiological mechanisms of SNO. RESULTS: Using an intermediate fusion-based stacked generalization approach, we integrated embeddings from a supervised embedding layer and a contextualized protein language model (ProtT5) and developed a tool called pLMSNOSite (protein language model-based SNO site predictor). On an independent test set of experimentally identified SNO sites, pLMSNOSite achieved values of 0.340, 0.735 and 0.773 for MCC, sensitivity and specificity, respectively. These results show that pLMSNOSite performs better than the compared approaches for the prediction of S-nitrosylation sites. CONCLUSION: Together, the experimental results suggest that pLMSNOSite achieves significant improvement in the prediction performance of S-nitrosylation sites and represents a robust computational approach for predicting protein S-nitrosylation sites. pLMSNOSite could be a useful resource for further elucidation of SNO and is publicly available at https://github.com/KCLabMTU/pLMSNOSite.
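
For readers less familiar with the reported metrics, the following sketch shows how MCC, sensitivity and specificity are derived from a binary confusion matrix. The function name and the counts in the example are made up for illustration and are not data from the study.

```python
import math

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and Matthews correlation coefficient (MCC)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=8, fp=2, tn=18, fn=2)
```

Unlike sensitivity and specificity, MCC ranges from -1 to 1 and accounts for all four cells of the confusion matrix, which is why it is commonly reported for imbalanced site-prediction test sets.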


Subject(s)
Nitric Oxide; Proteins; Animals; Proteins/metabolism; Nitric Oxide/metabolism; Oxidation-Reduction; Protein Processing, Post-Translational; Signal Transduction
3.
J Proteome Res ; 22(8): 2548-2557, 2023 08 04.
Article in English | MEDLINE | ID: mdl-37459437

ABSTRACT

Phosphorylation is one of the most important post-translational modifications and plays a pivotal role in various cellular processes. Although there exist several computational tools to predict phosphorylation sites, existing tools have not yet harnessed the knowledge distilled by pretrained protein language models. Herein, we present a novel deep learning-based approach called LMPhosSite for general phosphorylation site prediction that integrates embeddings from the local window sequence and the contextualized embedding obtained using the global (overall) protein sequence from a pretrained protein language model to improve the prediction performance. Thus, LMPhosSite consists of two base models: one for capturing an effective local representation and the other for capturing the global per-residue contextualized embedding from a pretrained protein language model. The outputs of these base models are integrated using a score-level fusion approach. LMPhosSite achieves a precision, recall, Matthews correlation coefficient, and F1-score of 38.78%, 67.12%, 0.390, and 49.15%, respectively, for the combined serine and threonine independent test data set and 34.90%, 62.03%, 0.298, and 44.67%, respectively, for the tyrosine independent test data set, which is better than the compared approaches. These results demonstrate that LMPhosSite is a robust computational tool for the prediction of general phosphorylation sites in proteins.
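
Score-level fusion can be sketched as combining the probabilities emitted by the two base models before thresholding. The abstract does not state how the scores are combined, so the weighted mean, the 0.5 threshold, and the function names below are assumptions made purely for illustration.

```python
def fuse_scores(local_score: float, global_score: float, weight: float = 0.5) -> float:
    """Weighted mean of two base-model probabilities (an assumed fusion rule)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * local_score + (1.0 - weight) * global_score

def predict_site(local_score: float, global_score: float,
                 threshold: float = 0.5, weight: float = 0.5) -> bool:
    """Call a site positive when the fused score reaches the threshold."""
    return fuse_scores(local_score, global_score, weight) >= threshold

# Hypothetical scores from the local-window model and the pLM-based model.
fused = fuse_scores(0.25, 0.75)
```

The point of fusing at the score level is that each base model is trained independently and only their output probabilities interact, in contrast to intermediate fusion, where learned feature vectors are concatenated before a final classifier.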


Subject(s)
Deep Learning; Phosphorylation; Proteins/metabolism; Protein Processing, Post-Translational; Amino Acid Sequence
4.
Glycobiology ; 33(5): 411-422, 2023 06 03.
Article in English | MEDLINE | ID: mdl-37067908

ABSTRACT

Protein N-linked glycosylation is an important post-translational mechanism in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In this regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem that has not been extensively addressed by the existing methods, especially in regard to the creation of negative sets and leveraging the distilled information from protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach, to predict N-linked glycosylated sites in human proteins using embeddings from a pre-trained pLM. LMNglyPred produces sensitivity, specificity, Matthews correlation coefficient, precision, and accuracy of 76.50%, 75.36%, 0.49, 60.99%, and 75.74%, respectively, on a benchmark-independent test set. These results demonstrate that LMNglyPred is a robust computational tool to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon.
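
The N-X-[S/T] sequon rule described above (X being any residue except proline) can be expressed as a short regular-expression scan. This is only a sketch of candidate-site enumeration; the function name and the example sequence are invented, and it is not the candidate-selection code used by LMNglyPred.

```python
import re

# Asparagine followed by any residue except proline, then serine or threonine.
# The lookahead keeps matches zero-width after N, so overlapping sequons are found.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequon_sites(sequence: str) -> list[int]:
    """Return 1-based positions of asparagines that sit in an N-X-[S/T] sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]

# Invented toy sequence: N2 and N9/N10 are in sequons, N6 is followed by proline.
sites = find_sequon_sites("MNGSANPTNNTS")
```

Such a scan only yields candidates. As the abstract notes, the sequon is necessary but not sufficient, so a predictor is still needed to separate glycosylated from non-glycosylated sequons.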


Subject(s)
Amino Acids; Glycoproteins; Humans; Glycosylation; Glycoproteins/metabolism; Amino Acids/chemistry; Protein Processing, Post-Translational; Amino Acid Sequence
5.
Sensors (Basel) ; 23(8), 2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37112326

ABSTRACT

Older adults are more vulnerable to falling because of normal age-related changes, and their falls are a serious medical risk with high healthcare and societal costs. However, there is a lack of automatic fall detection systems for older adults. This paper reports (1) a wireless, flexible, skin-wearable electronic device for both accurate motion sensing and user comfort, and (2) a deep learning-based classification algorithm for reliable fall detection of older adults. The cost-effective skin-wearable motion monitoring device is designed and fabricated using thin copper films. It includes a six-axis motion sensor and is directly laminated on the skin without adhesives for the collection of accurate motion data. To study accurate fall detection using the proposed device, different deep learning models, body locations for the device placement, and input datasets are investigated using motion data based on various human activities. Our results indicate that the optimal location to place the device is the chest, achieving an accuracy of more than 98% for falls with motion data from older adults. Moreover, our results suggest that a large motion dataset directly collected from older adults is essential to improve the accuracy of fall detection for the older adult population.


Subject(s)
Deep Learning; Wearable Electronic Devices; Humans; Aged; Algorithms; Motion
6.
Int J Mol Sci ; 24(21), 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37958983

ABSTRACT

O-linked β-N-acetylglucosamine (O-GlcNAc) is a distinct monosaccharide modification of serine (S) or threonine (T) residues of nucleocytoplasmic and mitochondrial proteins. O-GlcNAc modification (i.e., O-GlcNAcylation) is involved in the regulation of diverse cellular processes, including transcription, epigenetic modifications, and cell signaling. Despite the great progress in experimentally mapping O-GlcNAc sites, there is an unmet need to develop robust prediction tools that can effectively locate the presence of O-GlcNAc sites in protein sequences of interest. In this work, we performed a comprehensive evaluation of a framework for prediction of protein O-GlcNAc sites using embeddings from pre-trained protein language models. In particular, we compared the performance of three protein sequence-based large protein language models (pLMs), Ankh, ESM-2, and ProtT5, for prediction of O-GlcNAc sites and also evaluated various ensemble strategies to integrate embeddings from these protein language models. Upon investigation, the decision-level fusion approach that integrates the decisions of the three embedding models, which we call LM-OGlcNAc-Site, outperformed the models trained on these individual language models as well as other fusion approaches and other existing predictors in almost all of the parameters evaluated. The precise prediction of O-GlcNAc sites will facilitate the probing of O-GlcNAc site-specific functions of proteins in physiology and diseases. Moreover, these findings also indicate the effectiveness of combined uses of multiple protein language models in post-translational modification prediction and open exciting avenues for further research and exploration in other protein downstream tasks. LM-OGlcNAc-Site's web server and source code are publicly available to the community.
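
Decision-level fusion across the three pLM-based models can be sketched as a vote over their binary calls. The abstract does not specify the voting rule, so the simple majority vote, the function name, and the example decisions below are assumptions for illustration only.

```python
from typing import Iterable

def majority_vote(decisions: Iterable[bool]) -> bool:
    """Return True when more than half of the base models predict a site."""
    votes = list(decisions)
    if not votes:
        raise ValueError("at least one decision is required")
    return sum(votes) * 2 > len(votes)

# Hypothetical per-model calls for one candidate S/T residue.
per_model = {"Ankh": True, "ESM-2": False, "ProtT5": True}
fused = majority_vote(per_model.values())
```

With three base models a majority vote can never tie, which is one practical reason an odd number of classifiers is convenient for this kind of ensemble.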


Subject(s)
Protein Processing, Post-Translational; Proteins; Proteins/chemistry; Amino Acid Sequence; Acetylglucosamine/metabolism; N-Acetylglucosaminyltransferases/metabolism
7.
BMC Med Inform Decis Mak ; 21(Suppl 11): 369, 2022 11 23.
Article in English | MEDLINE | ID: mdl-36419042

ABSTRACT

BACKGROUND: Colorectal cancer (CRC) is a heterogeneous disease with different responses to targeted therapies due to various factors, and the treatment effect differs significantly between individuals. Personalized medical treatment (PMT) is a method that takes individual patient characteristics into consideration, making it the most effective way to deal with this issue. Patient similarity and clustering analysis is an important aspect of PMT. This paper describes how to build a knowledge base using formal concept analysis (FCA), which clusters patients based on their similarity and preserves the relations between clusters in a hierarchical structural form. METHODS: Prognostic factors (attributes) of 2442 CRC patients, including patient age, cancer cell differentiation, lymphatic invasion and metastasis stages, were used to build a formal context in FCA. A concept was defined as a set of patients with their shared attributes. The formal context was formed based on the similarity scores between each concept identified from the dataset, which can be used as a knowledge base. RESULTS: A hierarchical knowledge base was constructed along with the clinical records of the diagnosed CRC patients. For each new patient, a similarity score to each existing concept in the knowledge base can be retrieved with different similarity calculations. The ranked similarity scores that are associated with the concepts can offer references for treatment plans. CONCLUSIONS: Patients who share the same concept may respond similarly to the same clinical procedures or treatments. In conjunction with a clinician's ability to undertake flexible analyses and apply appropriate judgement, the knowledge base allows faster and more effective decisions to be made for patient treatment and care.
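
The notion of a formal concept (a set of patients together with the attributes they all share) can be sketched with the two standard FCA derivation operators. The patient identifiers, attribute labels, and function names below are hypothetical and do not come from the study's dataset or code.

```python
def extent(context: dict, attributes: set) -> set:
    """All objects (patients) that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}

def intent(context: dict, objects: set) -> set:
    """All attributes shared by every object in `objects`."""
    if not objects:
        # By convention the empty object set shares every attribute.
        return set().union(*context.values())
    return set.intersection(*(set(context[obj]) for obj in objects))

def concept_from_attributes(context: dict, attributes: set) -> tuple:
    """Close an attribute set into a formal concept (extent, intent)."""
    ext = extent(context, attributes)
    return ext, intent(context, ext)

# Hypothetical formal context: patient -> prognostic attributes.
toy_context = {
    "p1": {"age>=65", "lymphatic invasion"},
    "p2": {"age>=65", "lymphatic invasion", "metastasis"},
    "p3": {"metastasis"},
}
patients, shared = concept_from_attributes(toy_context, {"lymphatic invasion"})
```

In this toy context, starting from a single attribute yields a concept whose intent is larger than the query, because both matching patients also share the age attribute. Ordering such concepts by inclusion of their extents gives the hierarchical structure the abstract refers to.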


Subject(s)
Colorectal Neoplasms; Patient Care; Humans; Knowledge Bases; Cluster Analysis; Judgment; Colorectal Neoplasms/diagnosis; Colorectal Neoplasms/therapy
8.
Sci Rep ; 13(1): 3277, 2023 Feb 25.
Article in English | MEDLINE | ID: mdl-36841922

ABSTRACT

With the technological advancement in recent years and the widespread use of magnetism in every sector of current technology, the search for a low-cost magnetic material has become more important than ever. The discovery of magnetism in alternative materials such as metal chalcogenides with abundant atomic constituents would be a milestone in such a scenario. However, considering the multitude of possible chalcogenide configurations, predictive computational modeling or experimental synthesis is an open challenge. Here, we resort to a stacked generalization machine learning model to predict the magnetic moment (in µB) of hexagonal Fe-based bimetallic chalcogenides, Fe_xA_yB, where A represents Ni, Co, Cr, or Mn, B represents S, Se, or Te, and x and y represent the concentrations of the respective atoms. The stacked generalization model is trained on a dataset obtained using first-principles density functional theory. The model achieves MSE, MAE, and R² values of 1.655 µB², 0.546 µB, and 0.922, respectively, on an independent test set, indicating that our model predicts the composition-dependent magnetism in bimetallic chalcogenides with a high degree of accuracy. A generalized algorithm is also developed to test the universality of our proposed model for any concentration of Ni, Co, Cr, or Mn up to 62.5% in bimetallic chalcogenides.
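
The three regression metrics quoted above can be computed as in the following sketch. The function name and the example values are made up for illustration and are not the study's predictions.

```python
def regression_metrics(y_true: list, y_pred: list) -> dict:
    """Mean squared error, mean absolute error and coefficient of determination."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return {
        "mse": ss_res / n,
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }

# Invented magnetic moments (in Bohr magnetons) and predictions.
scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

Note that MSE carries squared units (here µB²) while MAE keeps the units of the target, which is why the abstract reports the two with different units.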

9.
Methods Mol Biol ; 2499: 285-322, 2022.
Article in English | MEDLINE | ID: mdl-35696087

ABSTRACT

Posttranslational modification (PTM) is a ubiquitous phenomenon in both eukaryotes and prokaryotes which gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification of the polypeptide chain and proteolytic cleavage. Understanding and characterization of PTM is a fundamental step toward understanding the underpinnings of biology. Recent advances in experimental approaches, mainly mass-spectrometry-based approaches, have immensely helped in obtaining and characterizing PTMs. However, experimental approaches are not enough to understand and characterize more than 450 different types of PTMs, and complementary computational approaches are becoming popular. Recently, due to the various advancements in the field of deep learning (DL), along with the explosion of applications of DL to various fields, the field of computational prediction of PTM has also witnessed the development of a plethora of DL-based approaches. In this book chapter, we first review some recent DL-based approaches in the field of PTM site prediction. In addition, we also review the recent advances in the less-studied PTM, that is, proteolytic cleavage prediction. We describe advances in PTM prediction by highlighting the deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools/approaches. Finally, we provide an outlook and possible future research directions for DL-based approaches for PTM prediction.


Subject(s)
Deep Learning; Proteomics; Mass Spectrometry; Protein Processing, Post-Translational; Proteins/chemistry
10.
Sci Rep ; 12(1): 16933, 2022 10 08.
Article in English | MEDLINE | ID: mdl-36209286

ABSTRACT

Protein succinylation is an important post-translational modification (PTM) responsible for many vital metabolic activities in cells, including cellular respiration, regulation, and repair. Here, we present a novel approach that combines features from supervised word embedding with embedding from a protein language model called ProtT5-XL-UniRef50 (hereafter termed, ProtT5) in a deep learning framework to predict protein succinylation sites. To our knowledge, this is one of the first attempts to employ embedding from a pre-trained protein language model to predict protein succinylation sites. The proposed model, dubbed LMSuccSite, achieves state-of-the-art results compared to existing methods, with performance scores of 0.36, 0.79, 0.79 for MCC, sensitivity, and specificity, respectively. LMSuccSite is likely to serve as a valuable resource for exploration of succinylation and its role in cellular physiology and disease.


Subject(s)
Computational Biology; Lysine; Computational Biology/methods; Language; Lysine/metabolism; Protein Processing, Post-Translational; Proteins/metabolism
11.
Comput Biol Med ; 131: 104249, 2021 04.
Article in English | MEDLINE | ID: mdl-33561673

ABSTRACT

BACKGROUND: The COVID-19 pandemic is a significant public health crisis that is hitting hard on people's health, well-being, and freedom of movement, and affecting the global economy. Scientists worldwide are competing to develop therapeutics and vaccines; currently, three drugs and two vaccine candidates have been given emergency use authorization. However, there are still questions of efficacy with regard to specific subgroups of patients and the vaccine's scalability to the general public. Under such circumstances, understanding COVID-19 symptoms is vital in initial triage; it is crucial to distinguish the severity of cases for effective management and treatment. This study aimed to discover symptom patterns and overall symptom rules, including rules disaggregated by age, sex, chronic condition, and mortality status, among COVID-19 patients. METHODS: This study was a retrospective analysis of COVID-19 patient data made available online by the Wolfram Data Repository through May 27, 2020. We applied a widely used rule-based machine learning technique called association rule mining to identify frequent symptoms and define patterns in the rules discovered. RESULTS: In total, 1,560 patients with COVID-19 were included in the study, with a median age of 52 years. The most frequently occurring symptom was fever (67%), followed by cough (37%), malaise/body soreness (11%), pneumonia (11%), and sore throat (8%). Myocardial infarction, heart failure, and renal disease were present in less than 1% of patients. The top ten significant symptom rules (out of 71 generated) showed cough, septic shock, and respiratory distress syndrome as frequent consequents. If a patient had a breathing problem and sputum production, then there was higher confidence that the patient had a cough; if cardiac disease, renal disease, or pneumonia was present, then there was a higher confidence of septic shock or respiratory distress syndrome. Symptom rules differed between younger and older patients and between male and female patients. Patients who had chronic conditions or died of COVID-19 had more severe symptom rules than those patients who did not have chronic conditions or who survived COVID-19. Concerning chronic condition rules among 147 patients, if a patient had diabetes, prerenal azotemia, and coronary bypass surgery, there was a certainty of hypertension. CONCLUSION: The most frequently reported symptoms in patients with COVID-19 were fever, cough, pneumonia, and sore throat, while 1% had severe symptoms, such as septic shock, respiratory distress syndrome, and respiratory failure. Symptom rules differed by age and sex. Patients with chronic disease and patients who died of COVID-19 had severe symptom rules, more specifically cardiovascular-related symptoms accompanied by pneumonia, fever, and cough as consequents.
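
The "confidence" of a symptom rule mentioned above comes from the standard association-rule definitions: support is the fraction of patients showing an itemset, and confidence is the support of antecedent and consequent together divided by the support of the antecedent. The sketch below uses an invented four-patient toy dataset and hypothetical function names; it does not reproduce the study's data or tooling.

```python
def support(transactions: list, items: set) -> float:
    """Fraction of transactions (patients) that contain every item in `items`."""
    items = frozenset(items)
    return sum(items <= record for record in transactions) / len(transactions)

def confidence(transactions: list, antecedent: set, consequent: set) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the data")
    return support(transactions, set(antecedent) | set(consequent)) / base

# Invented toy records; each frozenset lists one patient's symptoms.
toy_patients = [
    frozenset({"breathing problem", "sputum", "cough"}),
    frozenset({"breathing problem", "sputum", "cough"}),
    frozenset({"breathing problem", "sputum"}),
    frozenset({"fever"}),
]
rule_confidence = confidence(toy_patients, {"breathing problem", "sputum"}, {"cough"})
```

A confidence of 1.0 corresponds to the "certainty" wording in the abstract: every patient matching the antecedent also showed the consequent.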


Subject(s)
COVID-19; Data Mining; Databases, Factual; Diagnosis, Computer-Assisted; Pandemics; SARS-CoV-2/metabolism; Biomarkers/metabolism; COVID-19/diagnosis; COVID-19/epidemiology; COVID-19/metabolism; Female; Humans; Male; Middle Aged; Retrospective Studies
12.
Artif Intell Med ; 108: 101900, 2020 08.
Article in English | MEDLINE | ID: mdl-32972652

ABSTRACT

OBJECTIVE: The aim of this study is to compute similarities between patient records in an electronic health record (EHR). This is an important problem because the availability of effective methods for the computation of patient similarity would allow for assistance with and automation of tasks such as patient stratification, medical prognosis and cohort selection, and for unlocking the potential of medical analytics methods for healthcare intelligence. However, health data in EHRs present many challenges that make the automatic computation of patient similarity difficult; these include temporal aspects; multivariate, heterogeneous and irregular data; and data sparsity. MATERIALS AND METHODS: We propose a new method for EHR data representation called Temporal Tree: a temporal hierarchical representation which, based on temporal co-occurrence, preserves the compound information found at different levels in health data. In addition, this representation is augmented using the doc2vec embedding technique, which here is exploited for patient similarity computation. We empirically investigate our proposed method, along with several state-of-the-art benchmarks, on a dataset of real-world Intensive Care Unit (ICU) EHRs, for the task of identifying patients with a specific target diagnosis. RESULTS: Our empirical results show that the Temporal Tree representation is significantly better than other traditional and state-of-the-art methods for representing patients and computing their similarities. CONCLUSION: Temporal Trees capture the temporal relationships between medical, hierarchical data; this makes it possible to effectively model the rich information provided within EHRs and thus to identify similar patients.


Subject(s)
Electronic Health Records; Trees; Cohort Studies; Humans; Prognosis
13.
Prehosp Disaster Med ; 34(6): 644-652, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31599218

ABSTRACT

In the present world, International Consensus Frameworks, commonly called global frameworks or global agendas, guide international development policies and practices. They guide the development of all countries and influence the development initiatives by their respective governments. Recent global frameworks, adopted mostly post-2015, include both a group of over-arching frameworks (eg, the Sendai Framework for Disaster Risk Reduction [SFDRR]) and a group of frameworks addressing specific issues (eg, the Dhaka Declaration on Disability and Disaster Risk Management). These global frameworks serve twin purposes: first, to set a global development standard, and second, to set policies and approaches to achieve these standards. A companion group of professional standards, guidelines, and tools (ie, Sphere's Humanitarian Charter and Minimum Standards) guide the implementation and operationalization of these frameworks on the ground. This paper gathers these global frameworks and core professional guidelines in one place, presents an analytical review of their essential features, and highlights the commonalities and differences between and among these frameworks. The aim of this paper is to facilitate understanding of these frameworks and to help in designing development and resilience policy, planning, and implementation, at international and national levels, where these frameworks complement and contribute to each other. This Special Report describes an important and evolving aspect of the discipline and provides core information necessary to progress the science. Additionally, the report will help governments and policy makers to define their priorities and to design policies/strategies/programs to reflect the global commitments. Development practitioners can pre-empt the focus of the international community and the assistance coming from donors to the priority sectors, as identified in the global agenda. This would then help governments and stakeholders to develop and design a realistic plan and program and prepare the instruments and mechanisms to deliver the goals.


Subject(s)
Disaster Planning/organization & administration; Models, Organizational; Public Policy; Consensus Development Conferences as Topic; Global Health; Humans; Practice Guidelines as Topic