Results 1 - 20 of 21
1.
Heliyon ; 10(10): e31380, 2024 May 30.
Article En | MEDLINE | ID: mdl-38803927

Objective: Our aim was to develop and validate a nomogram for predicting the in-hospital 14-day (14 d) and 28-day (28 d) survival rates of patients with coronavirus disease 2019 (COVID-19). Methods: Clinical data of patients with COVID-19 admitted to the Renmin Hospital of Wuhan University from December 2022 to February 2023 and the north campus of Shanghai Ninth People's Hospital from April 2022 to June 2022 were collected. A total of 408 patients from Renmin Hospital of Wuhan University were selected as the training cohort, and 151 patients from Shanghai Ninth People's Hospital were selected as the verification cohort. Independent variables were screened using Cox regression analysis, and a nomogram was constructed using R software. The prediction accuracy of the nomogram was evaluated using the receiver operating characteristic (ROC) curve, C-index, and calibration curve. Decision curve analysis was used to evaluate the clinical application value of the model. The nomogram was externally validated using a validation cohort. Results: In total, 559 patients with severe/critical COVID-19 were included in this study, of whom 179 (32.02 %) died. Multivariate Cox regression analysis showed that age >80 years [hazard ratio (HR) = 1.539, 95 % confidence interval (CI): 1.027-2.306, P = 0.037], history of diabetes (HR = 1.741, 95 % CI: 1.253-2.420, P = 0.001), high APACHE II score (HR = 1.083, 95 % CI: 1.042-1.126, P < 0.001), sepsis (HR = 2.387, 95 % CI: 1.707-3.338, P < 0.001), high neutrophil-to-lymphocyte ratio (NLR) (HR = 1.010, 95 % CI: 1.003-1.017, P = 0.007), and high D-dimer level (HR = 1.005, 95 % CI: 1.001-1.009, P = 0.028) were independent risk factors for 14 d and 28 d survival rates, whereas COVID-19 vaccination (HR = 0.625, 95 % CI: 0.440-0.886, P = 0.008) was a protective factor affecting prognosis. 
ROC curve analysis showed that the area under the curve (AUC) of the 14 d and 28 d hospital survival rates in the training cohort was 0.765 (95 % CI: 0.641-0.923) and 0.814 (95 % CI: 0.702-0.938), respectively, and the AUC of the 14 d and 28 d hospital survival rates in the verification cohort was 0.898 (95 % CI: 0.765-0.962) and 0.875 (95 % CI: 0.741-0.945), respectively. The calibration curves of 14 d and 28 d hospital survival showed that the predicted probability of the model agreed well with the actual probability. Decision curve analysis (DCA) showed that the nomogram has high clinical application value. Conclusion: In-hospital survival rates of patients with COVID-19 were predicted using a nomogram, which will help clinicians make appropriate clinical decisions.
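The C-index used to assess this nomogram measures how well predicted risk orders the observed survival times. Below is a minimal Python sketch of Harrell's concordance index, assuming right-censored data and one risk score per patient (for example, a Cox linear predictor). Pairs with tied follow-up times are simply skipped; the function name and toy data are hypothetical, and this is not the study's R implementation.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times
    events: 1 if death was observed, 0 if censored
    risk_scores: higher value = higher predicted risk of death
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that i has the shorter follow-up time.
        if times[j] < times[i]:
            i, j = j, i
        # Comparable only if the shorter time ends in an observed event;
        # pairs with tied times are skipped in this simplified sketch.
        if events[i] != 1 or times[i] == times[j]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0      # earlier death had the higher predicted risk
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5      # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of deaths by predicted risk.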

2.
Front Immunol ; 14: 1326018, 2023.
Article En | MEDLINE | ID: mdl-38143770

Background: Ovarian cancer (OC) is a highly heterogeneous and malignant gynecological cancer with poor clinical outcomes. The study aims to identify and characterize clinically relevant subtypes in OC and develop a diagnostic model that can precisely stratify OC patients, providing more diagnostic clues for OC patients to access focused therapeutic and preventative strategies. Methods: Gene expression datasets of OC were retrieved from TCGA and GEO databases. To evaluate immune cell infiltration, the ESTIMATE algorithm was applied. A univariate Cox analysis and the two-sided log-rank test were used to screen OC risk factors. We adopted the ConsensusClusterPlus algorithm to determine OC subtypes. Enrichment analysis based on KEGG and GO was performed to determine enriched pathways of signature genes for each subtype. The machine learning algorithm support vector machine (SVM) was used to select feature genes and develop a diagnostic model. An ROC curve was plotted to evaluate the model performance. Results: A total of 1,273 survival-related genes (SRGs) were first identified and used to classify OC samples into different subtypes based on their molecular patterns. Based on the SRGs, OC patients were successfully stratified into three robust subtypes, designated S-I (Immunoreactive and DNA Damage repair), S-II (Mixed), and S-III (Proliferative and Invasive). S-I had more favorable OS and DFS, whereas S-III had the worst prognosis and was enriched with OC patients at advanced stages. 
Meanwhile, comprehensive functional analysis highlighted differences in biological pathways: genes associated with immune function and DNA damage repair, including CXCL9, CXCL10, CXCL11, APEX, APEX2, and RBX1, were enriched in S-I; S-II combined multiple gene signatures, including genes associated with metabolism and transcription; and the gene signature of S-III was extensively involved in pathways reflecting malignancies, including many core kinases and transcription factors involved in cancer such as CDK6, ERBB2, JAK1, DAPK1, FOXO1, and RXRA. The SVM model showed superior diagnostic performance, with AUC values of 0.922 and 0.901. Furthermore, a new dataset from an independent cohort could be automatically analyzed by this pipeline and yielded similar results. Conclusion: This study used an innovative approach to construct previously unexplored robust OC subtypes significantly related to different clinical and molecular features, and an SVM-based diagnostic model to aid in clinical diagnosis and treatment. This investigation also illustrated the importance of targeting innate immune suppression together with DNA damage in OC, offering novel insights for further experimental exploration and clinical trials.


Genes, cdc , Ovarian Neoplasms , Humans , Female , Prognosis , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/genetics , Algorithms
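The AUC values reported for the SVM model have a simple rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal Python sketch of that definition follows; the function name and toy scores are illustrative assumptions, not part of the study's pipeline.

```python
def roc_auc(labels, scores):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg).

    labels: 1 for positive cases, 0 for negative cases
    scores: classifier outputs (e.g. SVM decision values)
    Ties count as 0.5. O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

Library implementations sort the scores once instead of comparing every pair, but they compute the same quantity.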
3.
Nat Commun ; 14(1): 2813, 2023 05 17.
Article En | MEDLINE | ID: mdl-37198164

Proteostasis is fundamental for maintaining organismal health. However, the mechanisms underlying its dynamic regulation and how its disruptions lead to diseases are largely unclear. Here, we conduct in-depth propionylomic profiling in Drosophila and develop a small-sample learning framework that prioritizes propionylation at lysine 17 of H2B (H2BK17pr) as functionally important. Mutating H2BK17, which eliminates propionylation, leads to an elevated total protein level in vivo. Further analyses reveal that H2BK17pr modulates the expression of 14.7-16.3% of genes in the proteostasis network, and determines global protein level by regulating the expression of genes involved in the ubiquitin-proteasome system. In addition, H2BK17pr exhibits daily oscillation, mediating the influences of feeding/fasting cycles to drive rhythmic expression of proteasomal genes. Our study not only reveals a role of lysine propionylation in regulating proteostasis, but also implements a generally applicable method that can be extended to other problems for which little prior knowledge is available.


Lysine , Proteostasis , Animals , Lysine/metabolism , Ubiquitin/metabolism , Drosophila/metabolism , Proteasome Endopeptidase Complex/metabolism
4.
Nucleic Acids Res ; 51(W1): W243-W250, 2023 07 05.
Article En | MEDLINE | ID: mdl-37158278

Protein phosphorylation, catalyzed by protein kinases (PKs), is one of the most important post-translational modifications (PTMs) and is involved in regulating almost all biological processes. Here, we report an updated server, Group-based Prediction System (GPS) 6.0, for prediction of PK-specific phosphorylation sites (p-sites) in eukaryotes. First, we pre-trained a general model using penalized logistic regression (PLR), deep neural network (DNN), and Light Gradient Boosting Machine (LightGBM) on 490 762 non-redundant p-sites in 71 407 proteins. Then, transfer learning was conducted to obtain 577 PK-specific predictors at the group, family and single PK levels, using a well-curated data set of 30 043 known site-specific kinase-substrate relations in 7041 proteins. Together with evolutionary information, GPS 6.0 can hierarchically predict PK-specific p-sites for 44 046 PKs in 185 species. Besides basic statistics, we also integrated knowledge from 22 public resources to annotate the prediction results, including experimental evidence, physical interactions, sequence logos, and p-sites in sequences and 3D structures. The GPS 6.0 server is freely available at https://gps.biocuckoo.cn. We believe that GPS 6.0 could be a highly useful service for further analysis of phosphorylation.


Computational Biology , Proteins , Software , Phosphorylation , Protein Kinases/chemistry , Protein Kinases/metabolism , Protein Processing, Post-Translational , Proteins/chemistry , Proteins/metabolism , Computational Biology/instrumentation , Computational Biology/methods , Internet
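Penalized logistic regression (PLR), one of the three learners named in the GPS 6.0 record, can be illustrated as L2-penalized logistic regression trained by batch gradient descent. This is a generic sketch under assumed settings (an L2 penalty, the learning rate, the epoch count, and the hypothetical function names); the record does not describe the penalty type, features, or optimizer used by GPS 6.0.

```python
import math


def fit_penalized_logistic(X, y, lam=0.1, lr=0.1, epochs=2000):
    """L2-penalized logistic regression fitted by batch gradient descent.

    X: list of feature vectors; y: list of 0/1 labels.
    Minimizes mean log-loss + (lam / 2) * ||w||^2; the intercept b is not penalized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            err = p - yi                      # gradient of log-loss w.r.t. z
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        for j in range(d):
            # The L2 penalty adds lam * w[j] to the gradient, shrinking weights.
            w[j] -= lr * (grad_w[j] + lam * w[j])
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))
```

Increasing `lam` shrinks the weights toward zero, which is the regularizing effect that makes penalized models less prone to overfitting on small kinase-specific datasets.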
5.
Diagnostics (Basel) ; 12(10)2022 Oct 21.
Article En | MEDLINE | ID: mdl-36292251

Objective: A nomogram model of mortality risk for patients with coronavirus disease 2019 (COVID-19) was established and validated. Methods: We collected the clinical medical records of patients with severe/critical COVID-19 admitted to the eastern campus of Renmin Hospital of Wuhan University from January 2020 to May 2020 and to the north campus of Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, from April 2022 to June 2022. We assigned 254 patients from the former hospital to the training set and 113 patients from the latter to the validation set. The least absolute shrinkage and selection operator (LASSO) and multivariable logistic regression were used to select the variables and build the mortality risk prediction model. Results: The nomogram model was constructed with four risk factors for patient mortality following severe/critical COVID-19 (≥3 basic diseases, APACHE II score, urea nitrogen (Urea), and lactic acid (Lac)) and two protective factors (percentage of lymphocytes (L%) and neutrophil-to-platelet ratio (NPR)). The area under the curve (AUC) of the training set was 0.880 (95% confidence interval (95%CI), 0.837~0.923) and the AUC of the validation set was 0.814 (95%CI, 0.705~0.923). The decision curve analysis (DCA) showed that the nomogram model had high clinical value. Conclusion: The nomogram model for predicting the death risk of patients with severe/critical COVID-19 showed good prediction performance, and may be helpful in making appropriate clinical decisions for high-risk patients.
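LASSO acts as a variable selector because its L1 penalty drives some coefficients exactly to zero. The sketch below shows this with cyclic coordinate descent and soft-thresholding for a squared-error LASSO. The study combined LASSO with logistic regression, so this linear-model version, its function names, and the toy data are simplifying assumptions.

```python
def soft_threshold(rho, lam):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1 / (2n)) * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardized and y is centered (no intercept).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Exact one-dimensional minimizer for coordinate j.
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w


# Toy data: column 0 drives y, column 1 is pure noise.
X_toy = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y_toy = [2, -2, 2, -2]
w_toy = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
```

On the toy data the informative coefficient is shrunk from 2.0 to 1.5 and the noise coefficient is set exactly to 0; with a large enough `lam`, every coefficient becomes 0.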

6.
Nucleic Acids Res ; 50(W1): W405-W411, 2022 07 05.
Article En | MEDLINE | ID: mdl-35670661

Recent high-throughput omics techniques have produced a large amount of biological data. Visualization of big omics data is essential to answer a wide range of biological problems. As a concise but comprehensive strategy, a heatmap can analyze and visualize high-dimensional and heterogeneous biomolecular expression data in an attractive graphic. In 2014, we developed a stand-alone software package, Heat map Illustrator (HemI 1.0), which implemented three clustering methods and seven distance metrics for heatmap illustration. Here, we significantly improved HemI 1.0 and released the online service of HemI 2.0, in which 7 clustering methods and 22 types of distance metrics are implemented. In HemI 2.0, the clustering results and publication-quality heatmaps can be exported directly. For an in-depth analysis of the data, we further added an option of enrichment analysis for 12 model organisms, with 15 types of functional annotations. The enrichment results can be visualized in five idioms, including bubble chart, bar graph, coxcomb chart, pie chart and word cloud. We anticipate that HemI 2.0 can be a helpful web server for visualization of biomolecular expression data, as well as for additional enrichment analysis. HemI 2.0 is freely available for all users at: https://hemi.biocuckoo.org/.


Cluster Analysis , Data Analysis , Data Visualization , Internet , Software , Big Data , Animals , Models, Animal , Gene Expression Profiling/methods
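To make "clustering methods and distance metrics" concrete, here is a minimal sketch of two common metrics (Euclidean distance and Pearson correlation distance) plugged into naive average-linkage agglomerative clustering. The choice of metrics and linkage and all function names are assumptions for illustration; they are not taken from the HemI code base.

```python
import math
from itertools import combinations


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for co-varying profiles, 2 for anti-correlated ones."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / (sa * sb)


def average_linkage(rows, dist):
    """Naive agglomerative clustering of expression rows.

    Returns merges as (cluster_a, cluster_b, distance); new clusters get
    ids len(rows), len(rows) + 1, ... in merge order.
    """
    clusters = {i: [i] for i in range(len(rows))}
    merges = []
    next_id = len(rows)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average of all pairwise distances between members of a and b.
            d = sum(dist(rows[i], rows[j]) for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


# Rows 0/1 rise together, rows 2/3 fall together.
profiles = [[1, 2, 3], [2, 4, 6], [3, 2, 1], [6, 4, 2]]
merges = average_linkage(profiles, pearson_distance)
```

With the Pearson distance, rows 0 and 1 cluster together despite different magnitudes because only the shape of the profile matters; Euclidean distance would group rows by absolute level instead.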
7.
ACS Chem Biol ; 17(1): 252-262, 2022 01 21.
Article En | MEDLINE | ID: mdl-34989232

Although thermal proteome profiling (TPP) is a popular modification-free approach for drug target deconvolution, several key problems still limit its screening sensitivity. In the prevailing TPP workflow, only the soluble fractions are analyzed after thermal treatment, while the precipitate fractions, which also contain abundant information on drug-induced stability shifts, are discarded; moreover, the sigmoid melting curve fitting strategy used for data processing performs poorly for the part of the human proteome that shows multiple melting transitions. In this study, a precipitate-supported TPP (PSTPP) assay was presented for unbiased and comprehensive analysis of protein-drug interactions at the proteome level. In PSTPP, only those temperatures at which significant precipitation is observed were applied to induce protein denaturation, and the complementary information contained in both supernatant and precipitate fractions was used to improve screening specificity and sensitivity. In addition, a novel image recognition algorithm based on deep learning was developed to recognize target proteins, circumventing the problems of the sigmoid curve fitting strategy. The PSTPP assay was validated by identifying the known targets of methotrexate, raltitrexed, and SNS-032 with good performance. Using the promiscuous kinase inhibitor staurosporine, we delineated 99 kinase targets with a specificity of up to 83% in K562 cell lysates, a significant improvement over existing thermal shift methods. Furthermore, the PSTPP strategy was successfully applied to analyze the binding targets of rapamycin, identifying the well-known target FKBP1A and revealing a few other potential targets.


Chemical Precipitation , Deep Learning , Drug Delivery Systems , Proteins/drug effects , Proteome , Proteomics/methods , Algorithms , Hot Temperature , Humans , K562 Cells
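The conventional TPP readout that PSTPP aims to improve on is a per-protein melting temperature (Tm) and its drug-induced shift. As a simplified stand-in for sigmoid curve fitting, the sketch below estimates Tm by linear interpolation at the temperature where the soluble fraction first drops to 0.5. The function name and the vehicle/drug values are hypothetical.

```python
def melting_temperature(temps, soluble_fraction, level=0.5):
    """Temperature at which the soluble fraction first drops to `level`,
    found by linear interpolation between the bracketing measurements."""
    points = list(zip(temps, soluble_fraction))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= level >= f1 and f0 != f1:
            return t0 + (f0 - level) * (t1 - t0) / (f0 - f1)
    return None  # the curve never drops to `level` within the measured range


# Hypothetical soluble fractions (relative to the lowest temperature).
temps = [37, 41, 45, 49, 53, 57, 61]
vehicle = [1.00, 0.95, 0.80, 0.50, 0.20, 0.05, 0.01]
drug = [1.00, 0.98, 0.92, 0.75, 0.45, 0.15, 0.03]

# A positive shift suggests the drug thermally stabilizes the protein.
delta_tm = melting_temperature(temps, drug) - melting_temperature(temps, vehicle)
```

Because this summary reports only the first crossing, it can misrepresent proteins with multiple melting transitions, which is the limitation of single-curve summaries that the abstract points out.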
8.
Brief Bioinform ; 23(2)2022 03 10.
Article En | MEDLINE | ID: mdl-35037020

As an important post-translational modification, lysine ubiquitination participates in numerous biological processes and is involved in human diseases, whereas the site specificity of ubiquitination is mainly decided by ubiquitin-protein ligases (E3s). Although numerous ubiquitination predictors have been developed, computational prediction of E3-specific ubiquitination sites is still a great challenge. Here, we carefully reviewed the existing tools for the prediction of general ubiquitination sites. Also, we developed a tool named GPS-Uber for the prediction of general and E3-specific ubiquitination sites. From the literature, we manually collected 1311 experimentally identified site-specific E3-substrate relations, which were classified into different clusters based on corresponding E3s at different levels. To predict general ubiquitination sites, we integrated 10 types of sequence and structure features, as well as three types of algorithms including penalized logistic regression, deep neural network and convolutional neural network. Compared with other existing tools, the general model in GPS-Uber exhibited a highly competitive accuracy, with an area under the curve (AUC) value of 0.7649. Then, transfer learning was adopted for each E3 cluster to construct E3-specific models, and in total 112 individual E3-specific predictors were implemented. Using GPS-Uber, we conducted a systematic prediction of human cancer-associated ubiquitination events, which could be helpful for further experimental consideration. GPS-Uber will be regularly updated, and its online service is free for academic research at http://gpsuber.biocuckoo.cn/.


Lysine , Ubiquitin-Protein Ligases , Algorithms , Humans , Lysine/metabolism , Protein Processing, Post-Translational , Ubiquitin-Protein Ligases/chemistry , Ubiquitin-Protein Ligases/genetics , Ubiquitin-Protein Ligases/metabolism , Ubiquitination
9.
Nucleic Acids Res ; 50(D1): D451-D459, 2022 01 07.
Article En | MEDLINE | ID: mdl-34581824

Here, we report the compendium of protein lysine modifications (CPLM 4.0, http://cplm.biocuckoo.cn/), a data resource for various post-translational modifications (PTMs) that specifically occur at the side-chain amino group of lysine residues in proteins. From the literature and public databases, we collected 450 378 protein lysine modification (PLM) events and combined them with the existing data of our previously developed protein lysine modification database (PLMD 3.0). In total, CPLM 4.0 contains 592 606 experimentally identified modification events on 463 156 unique lysine residues of 105 673 proteins for up to 29 types of PLMs across 219 species. Furthermore, we carefully annotated the data using the knowledge from 102 additional resources that covered 13 aspects, including variation and mutation, disease-associated information, protein-protein interaction, protein functional annotation, DNA & RNA element, protein structure, chemical-target relation, mRNA expression, protein expression/proteomics, subcellular localization, biological pathway annotation, functional domain annotation, and physicochemical property. Compared to PLMD 3.0 and other existing resources, CPLM 4.0 achieved a >2-fold increase in collection of PLM events, with a data volume of ∼45 GB. We anticipate that CPLM 4.0 can serve as a more useful database for further study of PLMs.


Databases, Protein , Lysine/metabolism , Protein Processing, Post-Translational , Proteins/metabolism , Software , Acetylation , Animals , Bacteria/genetics , Bacteria/metabolism , Biotinylation , Humans , Hydroxylation , Internet , Lysine/chemistry , Methylation , Models, Molecular , Molecular Sequence Annotation , Mutation , Phosphorylation , Plants/genetics , Plants/metabolism , Protein Binding , Protein Conformation , Protein Interaction Mapping , Proteins/chemistry , Proteins/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Ubiquitination
11.
Comput Struct Biotechnol J ; 19: 4497-4509, 2021.
Article En | MEDLINE | ID: mdl-34471495

As a novel lactate-derived post-translational modification (PTM), lysine lactylation (Kla) is involved in diverse biological processes and participates in human tumorigenesis. Identification of Kla substrates with their exact sites is crucial for revealing the molecular mechanisms of lactylation. In contrast with labor-intensive and time-consuming experimental approaches, computational prediction of Kla could offer convenience and speed, but such tools are still lacking. In this work, although currently identified Kla sites are limited, we constructed the first Kla benchmark dataset and developed a few-shot learning-based architecture to leverage the power of small datasets and reduce the impact of imbalance and overfitting. A maximum 11.7% (0.745 versus 0.667) increase in area under the curve (AUC) value was achieved compared with conventional machine learning methods. We conducted a comprehensive survey of performance by combining 8 sequence-based features and 3 structure-based features and tailored a multi-feature hybrid system for their synergistic combination. This system achieved a >16.2% improvement in AUC value (0.889 versus 0.765) compared with single feature-based models for the prediction of Kla sites in silico. Combining few-shot learning and the hybrid system, we present our newly designed predictor named FSL-Kla, which is not only a cutting-edge tool for Kla site profiling but also can generate candidates for further experimental approaches. The webserver of FSL-Kla is freely accessible for academic research at http://kla.zbiolab.cn/.
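The percentage gains quoted for FSL-Kla are relative improvements in AUC, computed as (new - old) / old. A short Python check reproduces them; the same formula also gives the >31.3% figure reported for GPS-Palm (0.855 versus 0.651) in record 14. The function name is illustrative.

```python
def relative_improvement(new, old):
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


few_shot_gain = relative_improvement(0.745, 0.667)  # about 11.7%
hybrid_gain = relative_improvement(0.889, 0.765)    # about 16.2%
palm_gain = relative_improvement(0.855, 0.651)      # about 31.3%
```

Note that these are relative, not absolute, differences: the few-shot gain of 11.7% corresponds to an absolute AUC increase of 0.078.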

12.
Theranostics ; 11(16): 8008-8026, 2021.
Article En | MEDLINE | ID: mdl-34335977

Rationale: Children usually develop less severe symptoms in response to Coronavirus Disease 2019 (COVID-19) than adults. However, little is known about the molecular alterations and pathogenesis of COVID-19 in children. Methods: We conducted plasma proteomic and metabolomic profiling of blood samples from a cohort of 18 COVID-19-children with mild symptoms and 12 healthy children, who were enrolled from hospital admissions and outpatients, respectively. Statistical analyses were performed to identify molecules specifically altered in COVID-19-children. We also developed a machine learning-based pipeline named inference of biomolecular combinations with minimal bias (iBM) to prioritize proteins and metabolites strongly altered in COVID-19-children, and experimentally validated the predictions. Results: By comparison with multi-omic data from adults, we identified 44 proteins and 249 metabolites differentially altered in COVID-19-children against healthy children or COVID-19-adults. Further analyses demonstrated that both deteriorative immune response/inflammation processes and protective antioxidant or anti-inflammatory processes were markedly induced in COVID-19-children. Using iBM, we prioritized two combinations that contained 5 proteins and 5 metabolites, respectively, each exhibiting an area under the curve (AUC) value of 100% for accurately distinguishing COVID-19-children from healthy children or COVID-19-adults. Further experiments validated that all 5 proteins were up-regulated upon coronavirus infection. Interestingly, we found that the prioritized metabolites inhibited the expression of pro-inflammatory factors, and two of them, methylmalonic acid (MMA) and mannitol, also suppressed coronaviral replication, implying a protective role of these metabolites in COVID-19-children. 
Conclusion: The finding of a strong antagonism between deteriorative and protective effects provides new insights into the mechanism and pathogenesis of COVID-19 in children, who mostly experience mild symptoms. The identified metabolites strongly altered in COVID-19-children could serve as potential therapeutic agents for COVID-19.


COVID-19/blood , COVID-19/virology , Adult , COVID-19/epidemiology , COVID-19/immunology , Child , Child, Preschool , China/epidemiology , Female , Hospitalization , Humans , Male , Metabolomics/methods , Middle Aged , Proteomics/methods , SARS-CoV-2/isolation & purification
13.
Nat Commun ; 12(1): 3258, 2021 05 31.
Article En | MEDLINE | ID: mdl-34059679

Autophagy can selectively target protein aggregates, pathogens, and dysfunctional organelles for lysosomal degradation. Aberrant regulation of autophagy promotes tumorigenesis, but it is far less clear whether and how tumor-specific alterations result in autophagic aberrance. To form a link between aberrant autophagy selectivity and human cancer, we establish a computational pipeline and prioritize 222 potential LIR (LC3-interacting region) motif-associated mutations (LAMs) in 148 proteins. We validate LAMs in multiple proteins, including ATG4B, STBD1, EHMT2 and BRAF, that impair the proteins' interactions with LC3 and their autophagy activities. Using a combination of transcriptomic, metabolomic and additional experimental assays, we show that STBD1, a poorly characterized protein, inhibits tumor growth via modulating glycogen autophagy, while a patient-derived W203C mutation on LIR abolishes its cancer inhibitory function. This work suggests that altered autophagy selectivity is a frequently used mechanism by which cancer cells survive various stresses, and provides a framework to discover additional autophagy-related pathways that influence carcinogenesis.


Carcinogenesis/genetics , Macroautophagy/genetics , Membrane Proteins/genetics , Models, Genetic , Muscle Proteins/genetics , Neoplasms/genetics , Algorithms , Animals , Carcinogenesis/pathology , Cell Line, Tumor , Computer Simulation , DNA Mutational Analysis , Datasets as Topic , Gene Knockdown Techniques , Glycogen/metabolism , Humans , Kaplan-Meier Estimate , Membrane Proteins/metabolism , Mice , Microtubule-Associated Proteins/metabolism , Muscle Proteins/metabolism , Mutation , Neoplasms/mortality , Neoplasms/pathology , Pentose Phosphate Pathway/genetics , Protein Interaction Domains and Motifs/genetics , Proteome/genetics , RNA-Seq , Tissue Array Analysis , Warburg Effect, Oncologic , Xenograft Model Antitumor Assays
14.
Brief Bioinform ; 22(2): 1836-1847, 2021 03 22.
Article En | MEDLINE | ID: mdl-32248222

As an important reversible lipid modification, S-palmitoylation mainly occurs at specific cysteine residues in proteins, participates in regulating various biological processes and is associated with human diseases. Besides experimental assays, computational prediction of S-palmitoylation sites can efficiently generate helpful candidates for further experimental consideration. Here, we reviewed the current progress in the development of S-palmitoylation site predictors, as well as training data sets, informative features and algorithms used in these tools. Then, we compiled a benchmark data set containing 3098 known S-palmitoylation sites identified from small- or large-scale experiments, and developed a new method named data quality discrimination (DQD) to distinguish data quality weights (DQWs) between the two types of sites. Besides DQD and our previous methods, we encoded sequence similarity values into images, constructed a deep learning framework of convolutional neural networks (CNNs) and developed a novel algorithm of graphic presentation system (GPS) 6.0. We further integrated nine additional types of sequence-based and structural features, implemented parallel CNNs (pCNNs) and designed a new predictor called GPS-Palm. Compared with other existing tools, GPS-Palm showed a >31.3% improvement of the area under the curve (AUC) value (0.855 versus 0.651) for general prediction of S-palmitoylation sites. We also produced two species-specific predictors, with corresponding AUC values of 0.900 and 0.897 for predicting human- and mouse-specific sites, respectively. GPS-Palm is free for academic research at http://gpspalm.biocuckoo.cn/.


Computer Graphics , Deep Learning , Lipoylation , Proteins/chemistry , Algorithms , Animals , Computational Biology/methods , Humans , Mice , Software
15.
Nat Biomed Eng ; 4(12): 1197-1207, 2020 12.
Article En | MEDLINE | ID: mdl-33208927

Data from patients with coronavirus disease 2019 (COVID-19) are essential for guiding clinical decision making, for furthering the understanding of this viral disease, and for diagnostic modelling. Here, we describe an open resource containing data from 1,521 patients with pneumonia (including COVID-19 pneumonia) consisting of chest computed tomography (CT) images, 130 clinical features (from a range of biochemical and cellular analyses of blood and urine samples) and laboratory-confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) clinical status. We show the utility of the database for prediction of COVID-19 morbidity and mortality outcomes using a deep learning algorithm trained with data from 1,170 patients and 19,685 manually labelled CT slices. In an independent validation cohort of 351 patients, the algorithm discriminated between negative, mild and severe cases with areas under the receiver operating characteristic curve of 0.944, 0.860 and 0.884, respectively. The open database may have further uses in the diagnosis and management of patients with COVID-19.


COVID-19/pathology , COVID-19/virology , Pneumonia, Viral/pathology , Pneumonia, Viral/virology , Algorithms , Deep Learning , Female , Humans , Male , Pandemics , ROC Curve , SARS-CoV-2/pathogenicity , Tomography, X-Ray Computed/methods
16.
Immunity ; 53(5): 1108-1122.e5, 2020 11 17.
Article En | MEDLINE | ID: mdl-33128875

The coronavirus disease 2019 (COVID-19) pandemic is a global public health crisis. However, little is known about the pathogenesis and biomarkers of COVID-19. Here, we profiled host responses to COVID-19 by performing plasma proteomics of a cohort of COVID-19 patients, including non-survivors and survivors recovered from mild or severe symptoms, and uncovered numerous COVID-19-associated alterations of plasma proteins. We developed a machine-learning-based pipeline to identify 11 proteins as biomarkers and a set of biomarker combinations, which were validated by an independent cohort and accurately distinguished and predicted COVID-19 outcomes. Some of the biomarkers were further validated by enzyme-linked immunosorbent assay (ELISA) using a larger cohort. These markedly altered proteins, including the biomarkers, mediate pathophysiological pathways, such as immune or inflammatory responses, platelet degranulation and coagulation, and metabolism, that likely contribute to the pathogenesis. Our findings provide valuable knowledge about COVID-19 biomarkers and shed light on the pathogenesis and potential therapeutic targets of COVID-19.


Coronavirus Infections/blood , Coronavirus Infections/pathology , Plasma/metabolism , Pneumonia, Viral/blood , Pneumonia, Viral/pathology , Adult , Aged , Aged, 80 and over , Betacoronavirus , Biomarkers/blood , Blood Proteins/metabolism , COVID-19 , Coronavirus Infections/classification , Coronavirus Infections/metabolism , Female , Humans , Machine Learning , Male , Middle Aged , Pandemics/classification , Pneumonia, Viral/classification , Pneumonia, Viral/metabolism , Proteomics , Reproducibility of Results , SARS-CoV-2
17.
Genomics Proteomics Bioinformatics ; 18(2): 194-207, 2020 04.
Article En | MEDLINE | ID: mdl-32861878

As an important protein acylation modification, lysine succinylation (Ksucc) is involved in diverse biological processes, and participates in human tumorigenesis. Here, we collected 26,243 non-redundant known Ksucc sites from 13 species as the benchmark data set, combined 10 types of informative features, and implemented a hybrid-learning architecture by integrating deep-learning and conventional machine-learning algorithms into a single framework. We constructed a new tool named HybridSucc, which achieved area under curve (AUC) values of 0.885 and 0.952 for general and human-specific prediction of Ksucc sites, respectively. In comparison, the accuracy of HybridSucc was 17.84%-50.62% better than that of other existing tools. Using HybridSucc, we conducted a proteome-wide prediction and prioritized 370 cancer mutations that change Ksucc states of 218 important proteins, including PKM2, SHMT2, and IDH2. We not only developed a high-profile tool for predicting Ksucc sites, but also generated useful candidates for further experimental consideration. The online service of HybridSucc can be freely accessed for academic research at http://hybridsucc.biocuckoo.org/.


Algorithms , Machine Learning , Proteins/metabolism , Succinic Acid/metabolism , Acylation , Amino Acid Sequence , Area Under Curve , Humans , Lysine/metabolism , Neoplasms/metabolism , Proteome/metabolism , ROC Curve , Species Specificity
18.
Cells ; 9(5)2020 05 20.
Article En | MEDLINE | ID: mdl-32443803

Protein phosphorylation is essential for regulating cellular activities by modifying substrates at specific residues, which frequently interact with proteins containing phosphoprotein-binding domains (PPBDs) to propagate the phosphorylation signaling into downstream pathways. Although a massive number of phosphorylation sites (p-sites) have been reported, the PPBDs that interact with most of them are unknown. Here, we collected 4458 known PPBD-specific binding p-sites (PBSs), considerably improved our previously developed group-based prediction system (GPS) algorithm, and implemented a deep learning plus transfer learning strategy for model training. Then, we developed a new online service named GPS-PBS, which can hierarchically predict PBSs of 122 single PPBD clusters belonging to two groups and 16 families. By comparison, GPS-PBS achieved highly competitive accuracy against other existing tools. Using GPS-PBS, we predicted 371,018 mammalian p-sites that potentially interact with at least one PPBD, and revealed that various PPBD-containing proteins (PPCPs) and protein kinases (PKs) can simultaneously regulate the same p-sites to orchestrate important pathways, such as the PI3K-Akt signaling pathway. Taken together, we anticipate GPS-PBS can be of great help for further dissecting phosphorylation signaling networks.


Algorithms , Deep Learning , Phosphoproteins/chemistry , Phosphoproteins/metabolism , Animals , Binding Sites , Databases, Protein , Humans , Phosphorylation , Protein Binding , Protein Domains , Proteome/metabolism , Signal Transduction , Statistics as Topic
19.
Nucleic Acids Res ; 48(D1): D288-D295, 2020 01 08.
Article En | MEDLINE | ID: mdl-31691822

Here, we present an integrative database named DrLLPS (http://llps.biocuckoo.cn/) for proteins involved in liquid-liquid phase separation (LLPS), a ubiquitous and crucial mechanism that spatiotemporally organizes various biochemical reactions by creating membraneless organelles (MLOs) in eukaryotic cells. From the literature, we manually collected 150 scaffold proteins that drive LLPS, 987 regulators that contribute to modulating LLPS, and 8148 potential client proteins that might be dispensable for the formation of MLOs, which were then categorized into 40 biomolecular condensates. We searched for potential orthologs of these known proteins, and in total, DrLLPS contains 437 887 known and potential LLPS-associated proteins in 164 eukaryotes. Furthermore, we carefully annotated LLPS-associated proteins in eight model organisms using knowledge integrated from 110 widely used resources covering 16 aspects, including protein disordered regions, domain annotations, post-translational modifications (PTMs), genetic variations, cancer mutations, molecular interactions, disease-associated information, drug-target relations, physicochemical properties, protein functional annotations, protein expressions/proteomics, protein 3D structures, subcellular localizations, mRNA expressions, DNA & RNA elements, and DNA methylations. We anticipate that DrLLPS will serve as a helpful resource for further analysis of LLPS.


Databases, Factual , Eukaryota , Proteins/chemistry , Proteins/metabolism , Genome , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/metabolism , Organelles , Protein Processing, Post-Translational , User-Computer Interface
20.
Nucleic Acids Res ; 47(D1): D344-D350, 2019 01 08.
Article En | MEDLINE | ID: mdl-30380109

Here, we describe the updated database iEKPD 2.0 (http://iekpd.biocuckoo.org) for eukaryotic protein kinases (PKs), protein phosphatases (PPs) and proteins containing phosphoprotein-binding domains (PPBDs), which are key molecules in phosphorylation-dependent signalling networks and participate in the regulation of almost all biological processes and pathways. In total, iEKPD 2.0 contains 197 348 phosphorylation regulators, including 109 912 PKs, 23 294 PPs and 68 748 PPBD-containing proteins in 164 eukaryotic species. In particular, we provide rich annotations for the regulators of eight model organisms, especially humans, by compiling and integrating knowledge from 100 widely used public databases that cover 13 aspects, including cancer mutations, genetic variations, disease-associated information, mRNA expression, DNA & RNA elements, DNA methylation, molecular interactions, drug-target relations, protein 3D structures, post-translational modifications, protein expressions/proteomics, subcellular localizations and protein functional annotations. Compared with our previously developed EKPD 1.0 (∼0.5 GB), iEKPD 2.0 contains ∼99.8 GB of data, an ∼200-fold increase in data volume. We anticipate that iEKPD 2.0 will be a more useful resource for further study of phosphorylation regulators.


Databases, Protein , Eukaryota/genetics , Molecular Sequence Annotation , Phosphoprotein Phosphatases/genetics , Protein Kinases/genetics , Animals , Data Collection , Humans , Phosphoproteins/metabolism , Phosphorylation , Protein Domains/genetics , Protein Processing, Post-Translational , User-Computer Interface
...