Results 1 - 20 of 114
1.
Methods ; 230: 108-115, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39111721

ABSTRACT

Cervical cancer (CC) is one of the most common gynecological malignancies. Cytological screening, while being the most common and accurate method for detecting cervical cancer, is both time-consuming and costly. Predicting CC based on bioinformatics can assist in the rapid early screening of CC in clinical practice. Most recent CC prediction methods require a large amount of detection data or sequencing data and are not ideal for CC detection in complex disease samples. We developed the Disease trend analysis platform (Dtap), which can quickly predict the occurrence of diseases using only blood routine data. Blood routine data was collected from 1,292 cervical cancer patients, 4,860 patients with complex diseases, and 4,980 healthy individuals from various sources. The results show that the Dtap-based trend model maintained good and stable performance in the prediction task of multiple datasets as well as complex disease samples. Finally, we built DTAPCC (http://bioinfor.imu.edu.cn/dtapcc), a Dtap-based CC disease prediction platform, to help users quickly predict CC and visualize trend features.

2.
Int J Biol Macromol ; : 134798, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-39153678

ABSTRACT

Histone lysine demethylase (KDM), AlkB homolog (ALKBH), and Ten-Eleven Translocation (TET) proteins are members of the 2-Oxoglutarate (2OG) and ferrous iron-dependent oxygenases, each of which harbors a catalytic domain centered on a double-stranded β-helix whose topology restricts the regions directly involved in substrate binding. However, they have different catalytic functions, and the underlying structural reasons are not yet clear. In this review, the catalytic domain features of the three protein families are summarized from both sequence and structural perspectives. The construction of the phylogenetic tree and comparison of the structures show ten relatively conserved β-sheets and three key regions with substantial structural differences. We summarize the relationship between these three key regions and the substrate compatibility of the three protein families. This review facilitates research into substrate-selective inhibition and bioengineering by providing new insights into the catalytic domains of KDM, ALKBH, and TET proteins.

3.
Methods ; 229: 156-162, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39019099

ABSTRACT

Diabetes stands as one of the most prevalent chronic diseases globally. The conventional methods for diagnosing diabetes are frequently overlooked until individuals manifest noticeable symptoms of the condition. This study aimed to address this gap by collecting comprehensive datasets, including 1000 instances of blood routine data from diabetes patients and an equivalent dataset from healthy individuals. To differentiate diabetes patients from their healthy counterparts, a computational framework was established, encompassing eXtreme Gradient Boosting (XGBoost), random forest, support vector machine, and elastic net algorithms. Notably, the XGBoost model emerged as the most effective, exhibiting superior predictive results with an area under the receiver operating characteristic curve (AUC) of 99.90% in the training set and 98.51% in the testing set. Moreover, the model showcased commendable performance during external validation, achieving an overall accuracy of 81.54%. The probability generated by the model serves as a risk score for diabetes susceptibility. Further interpretability was achieved through the utilization of the Shapley additive explanations (SHAP) algorithm, identifying pivotal indicators such as mean corpuscular hemoglobin concentration (MCHC), lymphocyte ratio (LY%), standard deviation of red blood cell distribution width (RDW-SD), and mean corpuscular hemoglobin (MCH). This enhances our understanding of the predictive mechanisms underlying diabetes. To facilitate the application in clinical and real-life settings, a nomogram was created based on the logistic regression algorithm, which can provide a preliminary assessment of the likelihood of an individual having diabetes. Overall, this research contributes valuable insights into the predictive modeling of diabetes, offering potential applications in clinical practice for more effective and timely diagnoses.
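The AUC values reported above summarize how well a risk score separates patients from healthy individuals. As a minimal sketch of what that metric measures (not the authors' pipeline), the snippet below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = patient, 0 = healthy; scores are model probabilities.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(toy_labels, toy_scores)
```

In this toy case eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9. The pairwise loop is quadratic and only meant to make the definition explicit.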


Subject(s)
Diabetes Mellitus , Machine Learning , Humans , Diabetes Mellitus/blood , Diabetes Mellitus/diagnosis , Female , Male , Support Vector Machine , Algorithms , ROC Curve , Middle Aged , Erythrocyte Indices , Adult
4.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38811360

ABSTRACT

The advancement of spatial transcriptomics (ST) technology contributes to a more profound comprehension of the spatial properties of gene expression within tissues. However, because of the high dimensionality, pronounced noise and dynamic limitations of ST data, integrating gene expression and spatial information to accurately identify spatial domains remains challenging. This paper proposes the SpaNCMG algorithm, which achieves precise spatial domain description and localization based on a neighborhood-complementary mixed-view graph convolutional network. The algorithm enables better adaptation to ST data at different resolutions by integrating the local information from KNN and the global structure from r-radius into a complementary neighborhood graph. It also introduces an attention mechanism to achieve adaptive fusion of different reconstructed expressions, and uses the KPCA method for dimensionality reduction. The application of SpaNCMG on five datasets from four sequencing platforms demonstrates superior performance to eight existing advanced methods. Specifically, the algorithm achieved the highest ARI values of 0.63 and 0.52 on the datasets of the human dorsolateral prefrontal cortex and mouse somatosensory cortex, respectively. It accurately identified the spatial locations of marker genes in the mouse olfactory bulb tissue and inferred the biological functions of different regions. When handling larger datasets such as mouse embryos, SpaNCMG not only identified the main tissue structures but also explored unlabeled domains. Overall, the good generalization ability and scalability of SpaNCMG make it an outstanding tool for understanding tissue structure and disease mechanisms. Our codes are available at https://github.com/ZhihaoSi/SpaNCMG.
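The ARI figures quoted above compare predicted spatial domains with reference annotations. The following is a minimal sketch of the standard adjusted Rand index, written with the Python standard library only; it is not taken from the SpaNCMG code, and the function name and toy label lists are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs of items that share a cluster in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs that share a cluster within each labeling separately.
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy labelings: cluster names differ, partitions are identical.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A labeling that cuts across the reference clusters scores below zero.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

ARI is invariant to how clusters are named, which is why the first toy comparison scores 1.0 even though the labels are swapped.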


Subject(s)
Algorithms , Transcriptome , Humans , Animals , Mice , Gene Expression Profiling/methods , Neural Networks, Computer , Computational Biology/methods , Prefrontal Cortex/metabolism
5.
Biochem Genet ; 2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38658494

ABSTRACT

Long non-coding RNAs (lncRNAs), as promising novel biomarkers for cancer treatment and prognosis, can function as tumor suppressors and oncogenes in the occurrence and development of many types of cancer, including gastric cancer (GC). However, little is known about the complex regulatory system of lncRNAs in GC. In this study, we systematically analyzed lncRNA and miRNA transcriptomic profiles of GC based on bioinformatics methods and experimental validation. An lncRNA-miRNA interaction network related to GC was constructed, and nine crucial lncRNAs were identified. These nine lncRNAs were found to be associated with the prognosis of GC patients by Cox proportional hazards regression analysis. Among them, the expression of lncRNA SNHG14 can affect the survival of GC patients, making it a potential prognostic marker. Moreover, it was shown that SNHG14 was involved in immune-related pathways and significantly correlated with immune cell infiltration in GC. Meanwhile, we found that SNHG14 affected immune function in many cancers, such as breast cancer and esophageal carcinoma. Such information revealed that SNHG14 may serve as a potential target for cancer immunotherapy. Our study may also provide practical and theoretical guidance for the clinical application of non-coding RNAs.

6.
Animals (Basel) ; 14(5)2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38473062

ABSTRACT

The number of vertebrae is a crucial economic trait that can significantly impact the carcass length and meat production in animals. However, our understanding of the quantitative trait loci (QTLs) and candidate genes associated with the vertebral number in sheep (Ovis aries) remains limited. To identify these candidate genes and QTLs, we collected 73 Ujimqin sheep with increased numbers of vertebrae (T13L7, T14L6, and T14L7) and 23 sheep with normal numbers of vertebrae (T13L6). Through high-throughput genome resequencing, we obtained a total of 24,130,801 effective single-nucleotide polymorphisms (SNPs). By conducting a selective-sweep analysis, we discovered that the most significantly selective region was located on chromosome 7. Within this region, we identified several genes, including VRTN, SYNDIG1L, LTBP2, and ABCD4, known to regulate the spinal development and morphology. Further, a genome-wide association study (GWAS) performed on sheep with increased and normal vertebral numbers confirmed that ABCD4 is a candidate gene for determining the number of vertebrae in sheep. Additionally, the most significant SNP on chromosome 7 was identified as a candidate QTL. Moreover, we detected two missense mutations in the ABCD4 gene; one of these mutations (Chr7: 89393414, C > T) at position 22 leads to the conversion of arginine (Arg) to glutamine (Gln), which is expected to negatively affect the protein's function. Notably, a transcriptome expression profile in mouse embryonic development revealed that ABCD4 is highly expressed during the critical period of vertebral formation (4.5-7.5 days). Our study highlights ABCD4 as a potential major gene influencing the number of vertebrae in Ujimqin sheep, with promising prospects for future genome-assisted breeding improvements in sheep.

7.
Arch Biochem Biophys ; 754: 109942, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38387828

ABSTRACT

Several simple secondary structures could form complex and diverse functional proteins, meaning that secondary structures may contain a lot of hidden information and are arranged according to certain principles, to carry enough information of functional specificity and diversity. However, this hidden information and these principles have not been understood systematically. In our study, we designed a structure-function alphabet of helices based on reduced amino acid clusters to describe the typical features of helices and explore this information. First, we selected 480 typical helices from membrane proteins, zymoproteins, transcription factors, and other proteins to define and calculate the interval ranges, and the helices were classified in terms of hydrophilicity, charge and length: (1) hydrophobic helix (≤43%), amphiphilic helix (43%∼71%), and hydrophilic helix (≥71%); (2) positive helix, negative helix, electrically neutral helix and uncharged helix; (3) short helix (≤8 aa), medium-length helix (9-28 aa), and long helix (≥29 aa). Then, we designed an alphabet containing 36 triplet codes according to the above classification, so that the main features of each helix can be represented by only three letters. This alphabet not only preliminarily defined the helix characteristics, but also greatly reduced the informational dimension of protein structure. Finally, we present an application example to demonstrate the value of the structure-function alphabet in protein functional determination and differentiation.
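The three-way classification above (3 hydrophilicity classes × 4 charge classes × 3 length classes = 36 triplet codes) can be sketched as follows. Only the percentage and length thresholds come from the abstract. The set of residues counted as hydrophilic, the rule separating "electrically neutral" from "uncharged", and the letters used for each class are assumptions made for illustration, not the authors' alphabet.

```python
from itertools import product

# Assumption: residues counted as hydrophilic (the abstract does not list them).
HYDROPHILIC = set("RKDENQHSTY")
POSITIVE = set("KR")  # assumption: histidine is not counted as charged
NEGATIVE = set("DE")

# Hypothetical letters for each class.
HYDRO_LETTERS = "BAW"   # B = hydrophobic, A = amphiphilic, W = hydrophilic
CHARGE_LETTERS = "PNEU"  # P = positive, N = negative, E = neutral, U = uncharged
LENGTH_LETTERS = "SML"  # S = short, M = medium, L = long
ALL_CODES = ["".join(t) for t in product(HYDRO_LETTERS, CHARGE_LETTERS, LENGTH_LETTERS)]


def helix_code(seq):
    """Return a three-letter code: hydrophilicity, charge, length."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty helix sequence")
    pct = 100.0 * sum(aa in HYDROPHILIC for aa in seq) / len(seq)
    if pct <= 43:
        hydro = "B"
    elif pct >= 71:
        hydro = "W"
    else:
        hydro = "A"
    pos = sum(aa in POSITIVE for aa in seq)
    neg = sum(aa in NEGATIVE for aa in seq)
    if pos == 0 and neg == 0:
        charge = "U"  # no charged residues at all
    elif pos == neg:
        charge = "E"  # charged residues present but balanced (assumed definition)
    elif pos > neg:
        charge = "P"
    else:
        charge = "N"
    if len(seq) <= 8:
        length = "S"
    elif len(seq) >= 29:
        length = "L"
    else:
        length = "M"
    return hydro + charge + length


code_short = helix_code("LLLLLLLL")    # hydrophobic, uncharged, 8 residues
code_basic = helix_code("KKKKEEKKKK")  # hydrophilic, net positive, 10 residues
```

Enumerating `ALL_CODES` confirms that the three class sets combine into exactly 36 codes, which matches the alphabet size stated in the abstract.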


Subject(s)
Membrane Proteins , Transcription Factors , Membrane Proteins/chemistry , Protein Structure, Secondary , Hydrophobic and Hydrophilic Interactions , Amino Acids/chemistry
8.
Comput Biol Med ; 170: 108049, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38290319

ABSTRACT

Mammalian embryonic development is a complex process, characterized by intricate spatiotemporal dynamics and distinct chromatin preferences. However, the quick diversification in early embryogenesis leads to significant cellular diversity and the sparsity of scRNA-seq data, posing challenges in accurately determining cell fate decisions. In this study, we introduce a chromatin region binning method, scChrBin, designed to identify chromatin regions that elucidate the dynamics of embryonic development and lineage differentiation. This method transforms scRNA-seq data into a chromatin-based matrix, leveraging genomic annotations. Our results showed that the scChrBin method achieves high accuracy, with 98.0% and 89.2% on two single-cell embryonic datasets, demonstrating its effectiveness in analyzing complex developmental processes. We also systematically and comprehensively analyzed these key chromatin binning regions and their associated genes, focusing on their roles in lineage and stage development. The chromatin region binning perspective enables a comprehensive analysis of transcriptome data at the chromatin level, allowing us to unveil the dynamic expression of chromatin regions across temporal and spatial development. The tool is available as an application at https://github.com/liameihao/scChrBin.
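The idea of turning a gene-level expression matrix into a chromatin-region matrix can be illustrated with a simple fixed-width binning scheme. This is only a sketch of the general approach under stated assumptions: the 1 Mb bin width, the use of gene start coordinates, summation as the aggregation rule, and all gene names and coordinates below are hypothetical and are not taken from the scChrBin tool.

```python
from collections import defaultdict


def bin_expression(expr, gene_pos, bin_size=1_000_000):
    """Aggregate per-gene counts into (chromosome, bin index) regions.

    expr:     {cell: {gene: count}}
    gene_pos: {gene: (chromosome, start coordinate)} from a genome annotation
    """
    binned = {}
    for cell, counts in expr.items():
        acc = defaultdict(float)
        for gene, value in counts.items():
            if gene not in gene_pos:
                continue  # skip genes without an annotated position
            chrom, start = gene_pos[gene]
            acc[(chrom, start // bin_size)] += value
        binned[cell] = dict(acc)
    return binned


# Hypothetical annotation and counts.
toy_positions = {
    "geneA": ("chr1", 100_000),
    "geneB": ("chr1", 900_000),
    "geneC": ("chr1", 1_500_000),
    "geneD": ("chr2", 50_000),
}
toy_expr = {
    "cell1": {"geneA": 2, "geneB": 3, "geneC": 1, "geneX": 9},
    "cell2": {"geneD": 4},
}
toy_bins = bin_expression(toy_expr, toy_positions)
```

Because neighbouring genes are pooled, the binned matrix is denser than the gene-level matrix, which is one way binning can ease the sparsity problem mentioned in the abstract.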


Subject(s)
Chromatin , Embryonic Development , Animals , Female , Pregnancy , Chromatin/genetics , Embryonic Development/genetics , Cell Differentiation/genetics , Transcriptome , Genome , Gene Expression Profiling , Sequence Analysis, RNA , Mammals/genetics
9.
Brief Bioinform ; 25(1)2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38040491

ABSTRACT

Pancreatic cancer is a globally recognized highly aggressive malignancy, posing a significant threat to human health and characterized by pronounced heterogeneity. In recent years, researchers have uncovered that the development and progression of cancer are often attributed to the accumulation of somatic mutations within cells. However, cancer somatic mutation data exhibit characteristics such as high dimensionality and sparsity, which pose new challenges in utilizing these data effectively. In this study, we propagated the discrete somatic mutation data of pancreatic cancer through a network propagation model based on protein-protein interaction networks. This resulted in smoothed somatic mutation profile data that incorporate protein network information. Based on this smoothed mutation profile data, we obtained the activity levels of different metabolic pathways in pancreatic cancer patients. Subsequently, using the activity levels of various metabolic pathways in cancer patients, we employed a deep clustering algorithm to establish biologically and clinically relevant metabolic subtypes of pancreatic cancer. Our study holds scientific significance in classifying pancreatic cancer based on somatic mutation data and may provide a crucial theoretical basis for the diagnosis and immunotherapy of pancreatic cancer patients.
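Network propagation of a sparse mutation profile is commonly implemented as a random walk with restart over the interaction network. The sketch below uses the widely used update F(t+1) = α · F(t) · A' + (1 − α) · F(0), where A' is the degree-normalized adjacency matrix. The abstract does not state which propagation rule or parameter values the authors used, so the update rule, α = 0.7, and the three-protein toy network are assumptions for illustration.

```python
def propagate(adj, f0, alpha=0.7, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected network.

    adj: {node: list of neighbours}; every edge must be listed in both directions
    f0:  {node: initial score}, e.g. 1.0 for mutated genes and 0.0 otherwise
    """
    nodes = sorted(adj)
    start = {v: float(f0.get(v, 0.0)) for v in nodes}
    f = dict(start)
    for _ in range(max_iter):
        new = {}
        for v in nodes:
            # Each neighbour u passes on an equal share of its current score.
            incoming = sum(f[u] / len(adj[u]) for u in adj[v])
            new[v] = alpha * incoming + (1 - alpha) * start[v]
        delta = max(abs(new[v] - f[v]) for v in nodes)
        f = new
        if delta < tol:
            break
    return f


# Hypothetical path network P1 - P2 - P3 with a single mutation in P1.
toy_network = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2"]}
smoothed = propagate(toy_network, {"P1": 1.0})
```

After propagation the unmutated neighbours carry non-zero scores that decay with network distance from the mutated protein, which is what makes the smoothed profile less sparse than the binary input.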


Subject(s)
Genomics , Pancreatic Neoplasms , Humans , Prognosis , Genomics/methods , Pancreatic Neoplasms/genetics , Mutation , Cluster Analysis
10.
Mol Ther Nucleic Acids ; 34: 102044, 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-37869261

ABSTRACT

Single-cell studies have demonstrated that somatic cell reprogramming is a continuous process of cell fate transition. Only some reprogramming intermediates can overcome the molecular bottlenecks to acquire pluripotency. To decipher the underlying decisive factors driving cell fate, we identified induced pluripotent stem cells or stromal-like cells (iPSCs/SLCs) and iPSCs or trophoblast-like cells (iPSCs/TLCs) fate bifurcations by reconstructing cellular trajectory. The mesenchymal-epithelial transition and the activation of pluripotency networks are the main molecular events in successful reprogramming. Correspondingly, intermediates diverge into SLCs accompanied by the inhibition of cell cycle genes and the activation of extracellular matrix genes, whereas the TLC fate is characterized by the up-regulation of placenta development genes. By combining putative gene regulatory networks, we identified seven (Taf7, Ezh2, Klf2, etc.) and three (Cdc5l, Klf4, and Nanog) key factors as drivers of successful reprogramming that trigger downstream pluripotent networks during iPSCs/SLCs and iPSCs/TLCs fate bifurcation, respectively. Conversely, 11 factors (Cebpb, Sox4, Junb, etc.) and four factors (Gata2, Jund, Ctnnb1, etc.) drive the SLC fate and the TLC fate, respectively. Our study sheds new light on the understanding of decisive factors driving cell fate, which is helpful for improving reprogramming efficiency through manipulating cell fates to avoid alternative fates.

11.
BMC Genomics ; 24(1): 523, 2023 Sep 04.
Article in English | MEDLINE | ID: mdl-37667177

ABSTRACT

BACKGROUND: Ubiquitination controls almost all cellular processes. The dysregulation of ubiquitination signals is closely associated with the initiation and progression of multiple diseases. However, there is little comprehensive research on the interaction and potential function of ubiquitination regulators (UBRs) in spermatogenesis and cancer. METHODS: We systematically characterized the mRNA and protein expression of UBRs across tissues and further evaluated their roles in testicular development and spermatogenesis. Subsequently, we explored the genetic alterations, expression perturbations, cancer hallmark-related pathways, and clinical relevance of UBRs in pan-cancer. RESULTS: This work reveals heterogeneity in the expression patterns of UBRs across tissues, and the expression pattern in testis is the most distinct. UBRs are dynamically expressed during testis development, which are critical for normal spermatogenesis. Furthermore, UBRs have widespread genetic alterations and expression perturbations in pan-cancer. The expression of 79 UBRs was identified to be closely correlated with the activity of 32 cancer hallmark-related pathways, and ten hub genes were screened for further clinical relevance analysis by a network-based method. More than 90% of UBRs can affect the survival of cancer patients, and hub genes have an excellent prognostic classification for specific cancer types. CONCLUSIONS: Our study provides a comprehensive analysis of UBRs in spermatogenesis and pan-cancer, which can build a foundation for understanding male infertility and developing cancer drugs in the aspect of ubiquitination.


Subject(s)
Infertility, Male , Neoplasms , Humans , Male , Neoplasms/genetics , Ubiquitination , Clinical Relevance , Cognition
12.
Comput Methods Programs Biomed ; 242: 107808, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37716222

ABSTRACT

BACKGROUND AND OBJECTIVE: Breast cancer is among the most malignant tumors occurring in women and is one of the leading causes of death from gynecologic malignancy worldwide. The high degree of heterogeneity that characterizes breast cancer makes it challenging to devise effective therapeutic strategies. Accumulating evidence highlights the crucial role of stratifying breast cancer patients into clinically significant subtypes to achieve better prognoses and treatments. The structural deep clustering network is a graph convolutional network-based clustering algorithm that integrates structural information and has achieved state-of-the-art performance in various applications. METHODS: In this study, we employed a structural deep clustering network to integrate somatic mutation profiles for stratifying 2526 breast cancer patients from the Memorial Sloan Kettering Cancer Center into two clinically differentiable subtypes. RESULTS: Breast cancer patients in cluster 1 exhibited a better prognosis than those in cluster 2, and the difference was statistically significant. The immunogenomic landscape further demonstrated that cluster 1 was associated with remarkable infiltration of tumor-infiltrating lymphocytes. The clustering subtype could be used to evaluate the therapeutic benefit of immunotherapy and chemotherapy in breast cancer patients. Furthermore, our approach effectively classified patients from eight different cancer types, demonstrating its generalizability. CONCLUSIONS: Our study represents a step towards a generic methodology for classifying cancer patients using only somatic mutation data and structural deep clustering network approaches. Employing a structural deep clustering network to identify breast cancer subtypes is promising and can inform the development of more accurate and personalized therapies.


Subject(s)
Breast Neoplasms , Humans , Female , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Algorithms , Prognosis , Cluster Analysis , Mutation
13.
Plants (Basel) ; 12(15)2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37570928

ABSTRACT

Flax is an economic crop with a long history. It is grown worldwide and is mainly used for edible oil, industry, and textiles. Here, we report a high-quality genome assembly for "Neiya No. 9", a popular variety widely grown in China. Combining PacBio long reads, Hi-C sequencing, and a genetic map reported previously, a genome assembly of 473.55 Mb was constructed, which covers ~94.7% of the flax genome. These sequences were anchored onto 15 chromosomes. The N50 lengths of the contig and scaffold were 0.91 Mb and 31.72 Mb, respectively. A total of 32,786 protein-coding genes were annotated, and 95.9% of complete BUSCOs were found. Through morphological and cytological observation, the male sterility of flax was considered dominant nuclear sterility. Through GWAS analysis, the gene LUSG00017705 (cysteine synthase gene) was found to be closest to the most significant SNP, and the expression level of this gene was significantly lower in male sterile plants than in fertile plants. Among the significant SNPs identified in the GWAS analysis, only two were located in the coding region, and these two SNPs caused changes in the protein encoded by LUSG00017565 (cysteine protease gene). It was speculated that these two genes may be related to male sterility in flax. This is the first time the molecular mechanism of male sterility in flax has been reported. The high-quality genome assembly and the male sterility genes revealed here provide a solid foundation for flax breeding.
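The contig and scaffold N50 values quoted above follow the usual definition: the length L such that sequences of length at least L together cover at least half of the assembly. A minimal sketch of that calculation is given below; the function name and the toy contig lengths are illustrative and unrelated to the flax assembly.

```python
def n50(lengths):
    """Length of the sequence at which the running total, taken from the
    longest sequence downwards, first reaches half of the assembly size."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy contig lengths (arbitrary units), not from the flax assembly.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

For the toy lengths the total is 54, and the three longest contigs (10 + 9 + 8 = 27) reach exactly half of it, so the N50 is 8.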

15.
Int J Biol Macromol ; 244: 124993, 2023 Jul 31.
Article in English | MEDLINE | ID: mdl-37307968

ABSTRACT

Copper ion-binding proteins play an essential role in metabolic processes and are critical factors in many diseases, such as breast cancer, lung cancer, and Menkes disease. Many algorithms have been developed for predicting metal ion classification and binding sites, but none have been applied to copper ion-binding proteins. In this study, we developed a copper ion-bound protein classifier, RPCIBP, which integrates the reduced amino acid composition into a position-specific scoring matrix (PSSM). The reduced amino acid composition filters out a large number of useless evolutionary features, improving the operational efficiency and predictive ability of the model (feature dimension from 2900 to 200, ACC from 83% to 85.1%). Compared with the basic model using only three sequence feature extraction methods (ACC in training set between 73.8% and 86.2%, ACC in test set between 69.3% and 87.5%), the model integrating the evolutionary features of the reduced amino acid composition showed higher accuracy and robustness (ACC in training set between 83.1% and 90.8%, ACC in test set between 79.1% and 91.9%). The best copper ion-binding protein classifiers, selected through the feature selection process, were deployed in a user-friendly web server (http://bioinfor.imu.edu.cn/RPCIBP). RPCIBP can accurately predict copper ion-binding proteins, which is convenient for further structural and functional studies, and conducive to mechanism exploration and target drug development.
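A reduced amino acid composition replaces the 20-letter alphabet with a handful of residue clusters and describes a sequence by the fraction of residues in each cluster. The sketch below shows only that reduction step on a raw sequence; it does not reproduce the PSSM integration used by RPCIBP. The six-cluster grouping and the toy sequence are assumptions chosen for illustration, since the abstract does not specify the clustering scheme.

```python
# Assumption: a hypothetical six-cluster grouping by physicochemical property.
REDUCED_GROUPS = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}
AA_TO_GROUP = {aa: group for group, aas in REDUCED_GROUPS.items() for aa in aas}


def reduced_composition(seq):
    """Fraction of residues falling in each reduced cluster."""
    counts = {group: 0 for group in REDUCED_GROUPS}
    valid = 0
    for aa in seq.upper():
        group = AA_TO_GROUP.get(aa)
        if group is None:
            continue  # ignore non-standard symbols such as X
        counts[group] += 1
        valid += 1
    if valid == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {group: count / valid for group, count in counts.items()}


# Hypothetical six-residue peptide.
toy_composition = reduced_composition("MKHDEG")
```

With this grouping a 20-dimensional composition vector shrinks to six values, which illustrates on a small scale how a reduced alphabet can cut feature dimensionality.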


Subject(s)
Copper , Proteins , Position-Specific Scoring Matrices , Proteins/chemistry , Algorithms , Amino Acids/chemistry , Databases, Protein , Computational Biology/methods
16.
Heliyon ; 9(5): e16147, 2023 May.
Article in English | MEDLINE | ID: mdl-37215759

ABSTRACT

Transcription factors are protein molecules that act as regulators of gene expression. Aberrant protein activity of transcription factors can have a significant impact on tumor progression and metastasis in tumor patients. In this study, 868 immune-related transcription factors were identified from the transcription factor activity profile of 1823 ovarian cancer patients. The prognosis-related transcription factors were identified through univariate Cox analysis and random survival tree analysis, and two distinct clustering subtypes were subsequently derived based on these transcription factors. We assessed the clinical significance and genomics landscape of the two clustering subtypes and found statistically significant differences in prognosis, response to immunotherapy, and chemotherapy among ovarian cancer patients with different subtypes. Multi-scale Embedded Gene Co-expression Network Analysis was used to identify differential gene modules between the two clustering subtypes, which allowed us to conduct further analysis of biological pathways that exhibited significant differences between them. Finally, a ceRNA network was constructed to analyze lncRNA-miRNA-mRNA regulatory pairs with differential expression levels between the two clustering subtypes. We expect that our study may provide some useful references for stratifying and treating patients with ovarian cancer.

17.
Research (Wash D C) ; 6: 0118, 2023.
Article in English | MEDLINE | ID: mdl-37223479

ABSTRACT

The precise characterization of cellular differentiation potency remains an open question, which is fundamentally important for deciphering the dynamic mechanisms related to cell fate transition. We quantitatively evaluated the differentiation potency of different stem cells based on the Hopfield neural network (HNN). The results emphasized that cellular differentiation potency can be approximated by Hopfield energy values. We then profiled the Waddington energy landscape of embryogenesis and cell reprogramming processes. The energy landscape at single-cell resolution further confirmed that cell fate decision is progressively specified in a continuous process. Moreover, the transition of cells from one steady state to another in embryogenesis and cell reprogramming processes was dynamically simulated on the energy ladder. These two processes can be likened to descending and ascending a ladder, respectively. We further deciphered the dynamics of the gene regulatory network (GRN) for driving cell fate transition. Our study proposes a new energy indicator to quantitatively characterize cellular differentiation potency without prior knowledge, facilitating the further exploration of the potential mechanism of cellular plasticity.
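The Hopfield energy referred to above is, in its textbook form, E = −½ Σ w_ij s_i s_j for a state vector s and a symmetric weight matrix W. The sketch below builds W from stored patterns with the Hebbian rule and shows that a stored pattern sits at lower energy than a perturbed state. How the study maps expression profiles to states and weights is not given in the abstract, so the ±1 encoding and the four-unit toy pattern are assumptions for illustration.

```python
def hebbian_weights(patterns):
    """Symmetric Hopfield weights from +1/-1 patterns, with zero diagonal."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j] / len(patterns)
    return weights


def hopfield_energy(weights, state):
    """E = -1/2 * sum over i, j of w_ij * s_i * s_j."""
    n = len(state)
    return -0.5 * sum(
        weights[i][j] * state[i] * state[j] for i in range(n) for j in range(n)
    )


# Hypothetical four-unit attractor (e.g. four genes encoded as on = +1, off = -1).
stored = [1, -1, 1, -1]
perturbed = [1, 1, 1, -1]  # same state with the second unit flipped
toy_weights = hebbian_weights([stored])
energy_stored = hopfield_energy(toy_weights, stored)
energy_perturbed = hopfield_energy(toy_weights, perturbed)
```

In landscape terms the stored pattern is the bottom of a valley and the perturbed state lies higher up the slope, which is the intuition behind using energy as a potency indicator.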

18.
Brief Funct Genomics ; 22(4): 351-365, 2023 Jul 17.
Article in English | MEDLINE | ID: mdl-37103222

ABSTRACT

The expression and activity of transcription factors, which directly mediate gene transcription, are strictly regulated to control numerous normal cellular processes. In cancer, transcription factor activity is often dysregulated, resulting in abnormal expression of genes related to tumorigenesis and development. The carcinogenicity of transcription factors can be reduced through targeted therapy. However, most studies on the pathogenic and drug-resistant mechanisms of ovarian cancer have focused on the expression and signaling pathways of individual transcription factors. To improve the prognosis and treatment of patients with ovarian cancer, multiple transcription factors should be evaluated simultaneously to determine the effects of their protein activity on drug therapies. In this study, the transcription factor activity of ovarian cancer samples was inferred from virtual inference of protein activity by enriched regulon algorithm using mRNA expression data. Patients were clustered according to their transcription factor protein activities to investigate the association of transcription factor activities of different subtypes with prognosis and drug sensitivity for filtering subtype-specific drugs. Meanwhile, master regulator analysis was utilized to identify master regulators of differential protein activity between clustering subtypes, thereby identifying transcription factors associated with prognosis and assessing their potential as therapeutic targets. Master regulator risk scores were then constructed for guiding patients' clinical treatment, providing new insights into the treatment of ovarian cancer at the level of transcriptional regulation.


Subject(s)
Gene Expression Regulation , Ovarian Neoplasms , Humans , Female , Prognosis , Ovarian Neoplasms/drug therapy , Ovarian Neoplasms/genetics , Ovarian Neoplasms/pathology , Transcription Factors/genetics , Transcription Factors/metabolism , Genomics , Gene Expression Regulation, Neoplastic
19.
J Mol Biol ; 435(14): 168117, 2023 Jul 15.
Article in English | MEDLINE | ID: mdl-37086947

ABSTRACT

Metal-binding proteins are essential for vital activities and perform their roles by acting in concert with metal cations. MbPA (the Metal-binding Protein Atlas) is the most comprehensive resource to date dedicated to curating metal-binding proteins. Currently, it contains 106,373 entries and 440,187 sites related to 54 metals and 8169 species. Users can view all metal-binding proteins and species-specific proteins in MbPA. There are also metal-proteomics data that quantitatively describe protein expression in different tissues and organs. By analyzing the amino acid residues at metal-binding sites, we found that about 80% of metal ions tend to bind to cysteine, aspartic acid, glutamic acid, and histidine. Moreover, we use a Diversity Measure to confirm that the diversity of metal binding is specific to different areas of the periodic table, and further elucidate the binding modes of 19 transition metals on 20 amino acids. In addition, MbPA also includes 6855 potential pathogenic mutations related to metalloproteins. The resource is freely available at http://bioinfor.imu.edu.cn/mbpa.


Subject(s)
Metalloproteins , Amino Acids/chemistry , Binding Sites , Cations/chemistry , Metalloproteins/chemistry , Metalloproteins/genetics , Metals/chemistry
20.
Brief Bioinform ; 24(2)2023 Mar 19.
Article in English | MEDLINE | ID: mdl-36772998

ABSTRACT

Chronic diseases, because of their insidious onset and long latent period, have become the major global disease burden. However, current chronic disease diagnosis methods based on genetic markers or imaging analysis are difficult to deploy widely because of their high costs. This study analyzed massive data from routine blood and biochemical tests of 32 448 patients and developed a novel framework for cost-effective chronic disease prediction with high accuracy (AUC 87.32%). Based on the best-performing XGBoost algorithm, 20 classification models were further constructed for 17 types of chronic diseases, including 9 types of cancers, 5 types of cardiovascular diseases and 3 types of mental illness. The highest accuracy of the model was 90.13% for cardia cancer, and the lowest was 76.38% for rectal cancer. The model interpretation with the SHAP algorithm showed that CREA, R-CV, GLU and NEUT% might be important indices for identifying most chronic diseases. PDW and R-CV were also found to be crucial indices in classifying the three types of chronic diseases (cardiovascular disease, cancer and mental illness). In addition, R-CV has a higher specificity for cancer, ALP for cardiovascular disease and GLU for mental illness. The association between chronic diseases was further revealed. Finally, we built a user-friendly explainable machine-learning-based clinical decision support system (DisPioneer: http://bioinfor.imu.edu.cn/dispioneer) to assist in predicting, classifying and treating chronic diseases. This cost-effective approach based on simple blood tests will benefit more people and motivate clinical implementation and further investigation of chronic disease prevention and surveillance programs.


Subject(s)
Cardiovascular Diseases , Mental Disorders , Humans , Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/genetics , Cost-Benefit Analysis , Chronic Disease , Algorithms