
Risk prediction of diabetes and pre-diabetes based on physical examination data.

Han, Yu-Mei; Yang, Hui; Huang, Qin-Lai; Sun, Zi-Jie; Li, Ming-Liang; Zhang, Jing-Bo; Deng, Ke-Jun; Chen, Shuo; Lin, Hao.

Math Biosci Eng ; 19(4): 3597-3608, 2022 02 07.

Article in English | MEDLINE | ID: mdl-35341266

ABSTRACT

Diabetes is a metabolic disorder caused by insufficient insulin secretion and insulin secretion disorders. From health to diabetes, there are generally three stages: health, pre-diabetes and type 2 diabetes. Early diagnosis of diabetes is the most effective way to prevent and control diabetes and its complications. In this work, we collected physical examination data from the Beijing Physical Examination Center from January 2006 to December 2017, and divided the population into three groups according to the WHO (1999) Diabetes Diagnostic Standards: normal fasting plasma glucose (NFG) (FPG < 6.1 mmol/L), mildly impaired fasting plasma glucose (IFG) (6.1 mmol/L ≤ FPG < 7.0 mmol/L) and type 2 diabetes (T2DM) (FPG ≥ 7.0 mmol/L). Finally, we obtained 1,221,598 NFG samples, 285,965 IFG samples and 387,076 T2DM samples, with a total of 15 physical examination indexes. Furthermore, taking eXtreme Gradient Boosting (XGBoost), random forest (RF), logistic regression (LR), and a fully connected neural network (FCN) as classifiers, four models were constructed to distinguish NFG, IFG and T2DM. The comparison results show that XGBoost has the best performance, with an AUC (macro) of 0.7874 and an AUC (micro) of 0.8633. In addition, based on the XGBoost classifier, three binary classification models were also established to discriminate NFG from IFG, NFG from T2DM, and IFG from T2DM. On the independent dataset, the AUCs were 0.7808, 0.8687 and 0.7067, respectively. Finally, we analyzed the importance of the features and identified the risk factors associated with diabetes.
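The three-group split described in this abstract can be sketched as a simple threshold rule on fasting plasma glucose. This is a minimal illustration, not the authors' code: the function name `classify_fpg` is hypothetical, and assigning a reading of exactly 7.0 mmol/L to T2DM is an assumption that follows the WHO (1999) fasting cut-off of ≥ 7.0 mmol/L.

```python
def classify_fpg(fpg_mmol_per_l: float) -> str:
    """Return the group label for a fasting plasma glucose value in mmol/L.

    Thresholds are the ones quoted in the abstract (6.1 and 7.0 mmol/L).
    Assumption: a value of exactly 7.0 is labelled T2DM.
    """
    if fpg_mmol_per_l < 0:
        raise ValueError("FPG cannot be negative")
    if fpg_mmol_per_l < 6.1:
        return "NFG"   # normal fasting plasma glucose
    if fpg_mmol_per_l < 7.0:
        return "IFG"   # mildly impaired fasting plasma glucose
    return "T2DM"      # type 2 diabetes


# Label a few illustrative (made-up) readings.
labels = [classify_fpg(v) for v in (5.2, 6.1, 6.9, 7.0, 9.4)]
```

In the study these labels serve as the classification targets; the 15 physical examination indexes are the model inputs and are not reproduced here.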

Subject(s)

Diabetes Mellitus, Type 2 , Prediabetic State , Blood Glucose/metabolism , Diabetes Mellitus, Type 2/diagnosis , Diabetes Mellitus, Type 2/epidemiology , Fasting , Humans , Physical Examination , Prediabetic State/diagnosis , Prediabetic State/epidemiology

Deep-4mCGP: A Deep Learning Approach to Predict 4mC Sites in Geobacter pickeringii by Using Correlation-Based Feature Selection Technique.

Zulfiqar, Hasan; Huang, Qin-Lai; Lv, Hao; Sun, Zi-Jie; Dao, Fu-Ying; Lin, Hao.

Int J Mol Sci ; 23(3), 2022 Jan 23.

Article in English | MEDLINE | ID: mdl-35163174

ABSTRACT

4mC is a type of DNA modification that can regulate multiple biological processes, for example, DNA replication, gene expression, and transcriptional regulation. Accurate prediction of 4mC sites can provide precise information about their genetic functions. The purpose of this study was to establish a robust deep learning model to recognize 4mC sites in Geobacter pickeringii. In the proposed model, two kinds of feature descriptors, namely binary and k-mer composition, were used to encode the DNA sequences of Geobacter pickeringii. The features obtained from their fusion were optimized by using a correlation and gradient-boosting decision tree (GBDT)-based algorithm with the incremental feature selection (IFS) method. Then, these optimized features were fed into a 1D convolutional neural network (CNN) to classify 4mC sites from non-4mC sites in Geobacter pickeringii. The performance of the proposed model on independent data exhibited an accuracy of 0.868, which was 4.2% higher than the existing model.
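The two descriptors named in this abstract, binary encoding and k-mer composition, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the alphabet is restricted to A, C, G and T, and k = 2 is an arbitrary choice because the abstract does not state the value used. Feature selection and the CNN are not shown.

```python
from itertools import product

NUCLEOTIDES = "ACGT"  # assumed alphabet; ambiguous bases are rejected


def binary_encode(seq):
    """One-hot encode a DNA sequence: each base becomes 4 bits, concatenated."""
    bits = []
    for base in seq.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unsupported base: {base!r}")
        bits.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return bits


def kmer_composition(seq, k=2):
    """Return the frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    windows = len(seq) - k + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(windows):
        counts[seq[i:i + k]] += 1  # KeyError if the window holds a non-ACGT base
    return [counts[km] / windows for km in kmers]


# Fused feature vector for one toy sequence: 4*len(seq) bits plus 4**k frequencies.
toy = "ACGT"
fused = binary_encode(toy) + kmer_composition(toy, k=2)
```

Concatenating the two vectors is one plain reading of "fusion"; the abstract does not describe how the descriptors were combined.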

Subject(s)

Computational Biology/methods , Epigenesis, Genetic/genetics , Geobacter/genetics , Algorithms , Cytosine/metabolism , DNA/genetics , DNA Methylation/genetics , Deep Learning , Machine Learning , Mutation/genetics , Neural Networks, Computer , Software

Deep-4mCW2V: A sequence-based predictor to identify N4-methylcytosine sites in Escherichia coli.

Zulfiqar, Hasan; Sun, Zi-Jie; Huang, Qin-Lai; Yuan, Shi-Shi; Lv, Hao; Dao, Fu-Ying; Lin, Hao; Li, Yan-Wen.

Methods ; 203: 558-563, 2022 07.

Article in English | MEDLINE | ID: mdl-34352373

ABSTRACT

N4-methylcytosine (4mC) is a type of DNA modification which could regulate several biological processes such as transcription regulation, replication and gene expression. Precisely recognizing 4mC sites in genomic sequences can provide specific knowledge about their genetic roles. This study aimed to develop a deep learning-based model to predict 4mC sites in Escherichia coli. In the model, DNA sequences were encoded by the word embedding technique 'word2vec'. The obtained features were fed into a 1-D convolutional neural network (CNN) to discriminate 4mC sites from non-4mC sites in the Escherichia coli genome. The examination on an independent dataset showed that our model could yield an overall accuracy of 0.861, which was about 4.3% higher than the existing model. For the convenience of other researchers, we provide the data and source code of the model, which can be freely downloaded from https://github.com/linDing-groups/Deep-4mCW2V.
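A word-embedding model needs each sequence to be split into "words" first. A common way to do this for DNA is to slide a window over the sequence and treat each k-mer as a word. The sketch below shows only that tokenization step; the function name `kmer_tokens`, the choice k = 3 and the stride are assumptions, since the abstract does not say how the sequences were segmented. Training the embedding itself is not shown.

```python
def kmer_tokens(seq, k=3, stride=1):
    """Split a DNA sequence into k-mer "words".

    stride=1 gives overlapping k-mers; stride=k gives non-overlapping ones.
    Both k and stride are illustrative defaults, not values from the paper.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


# Each sequence becomes one "sentence" of k-mer words.
sentence = kmer_tokens("ACGTAC", k=3)
```

A corpus of such sentences is what a word2vec trainer would consume; each k-mer is then mapped to a dense vector, and the vectors for one sequence form the input to the downstream classifier.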

Subject(s)

DNA , Escherichia coli , DNA/genetics , Escherichia coli/genetics , Genome , Genomics , Software

Identification of cyclin protein using gradient boost decision tree algorithm.

Zulfiqar, Hasan; Yuan, Shi-Shi; Huang, Qin-Lai; Sun, Zi-Jie; Dao, Fu-Ying; Yu, Xiao-Long; Lin, Hao.

Comput Struct Biotechnol J ; 19: 4123-4131, 2021.

Article in English | MEDLINE | ID: mdl-34527186

ABSTRACT

Cyclin proteins regulate the cell cycle by forming a complex with cyclin-dependent kinases, which activates the cell cycle. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor prediction by sequence similarity-based methods. Thus, a machine learning model for identifying cyclin proteins is urgently needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation and composition/transition/distribution. Afterward, these features were optimized by using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validated results showed that our model identified cyclins with an accuracy of 93.06% and an AUC value of 0.971, which are higher than those of two recent studies on the same data.
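Incremental feature selection, as it is generally described, starts from a ranked feature list (here the ranking would come from ANOVA or mRMR), adds features one at a time in rank order, scores each growing subset, and keeps the best-scoring one. The sketch below shows that loop only. The function name and the `evaluate` callback are hypothetical; in practice `evaluate` would run something like cross-validation of a classifier on the given subset, which is not implemented here.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Return the top-ranked prefix of features with the highest score.

    ranked_features: feature names ordered from most to least relevant.
    evaluate: user-supplied scorer for a feature subset (higher is better).
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked_features:
        current.append(feature)
        score = evaluate(list(current))
        if score > best_score:  # ties keep the smaller, earlier subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy scorer (made up): informative features add 1, every feature costs 0.1.
informative = {"f1", "f2"}


def toy_score(subset):
    return sum(1 for f in subset if f in informative) - 0.1 * len(subset)


subset, score = incremental_feature_selection(["f1", "f2", "f3", "f4"], toy_score)
```

With the toy scorer, the loop stops improving once the uninformative features start to be added, so the first two features are kept.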

iDHS-Deep: an integrated tool for predicting DNase I hypersensitive sites by deep neural network.

Dao, Fu-Ying; Lv, Hao; Su, Wei; Sun, Zi-Jie; Huang, Qin-Lai; Lin, Hao.

Brief Bioinform ; 22(5), 2021 09 02.

Article in English | MEDLINE | ID: mdl-33751027

ABSTRACT

A DNase I hypersensitive site (DHS) is a region of chromatin that is hypersensitive to the DNase I enzyme. It is an important part of the noncoding region and contains a variety of regulatory elements, such as promoters, enhancers, and transcription factor-binding sites. Moreover, disease- (or trait-) related loci are usually enriched in DHS regions. Therefore, the detection of DHS regions is of great significance. In this study, we develop a deep learning-based algorithm to identify whether an unknown sequence region is a potential DHS. The proposed method showed high prediction performance on both training datasets and independent datasets in different cell types and developmental stages, demonstrating that the method performs well in the identification of DHSs. Furthermore, for the convenience of wet-lab researchers, the user-friendly web-server iDHS-Deep was established at http://lin-group.cn/server/iDHS-Deep/, by which users can easily distinguish DHS from non-DHS and obtain the corresponding developmental stage of a DHS.

Subject(s)

Arabidopsis/genetics , DNA/genetics , Deep Learning , Deoxyribonuclease I/genetics , Oryza/genetics , Software , Arabidopsis/metabolism , Chromatin/metabolism , Chromatin/ultrastructure , DNA/chemistry , DNA/metabolism , Datasets as Topic , Deoxyribonuclease I/metabolism , Enhancer Elements, Genetic , Genetic Loci , Humans , Internet , Oryza/metabolism , Promoter Regions, Genetic , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription, Genetic

iBLP: An XGBoost-Based Predictor for Identifying Bioluminescent Proteins.

Zhang, Dan; Chen, Hua-Dong; Zulfiqar, Hasan; Yuan, Shi-Shi; Huang, Qin-Lai; Zhang, Zhao-Yue; Deng, Ke-Jun.

Comput Math Methods Med ; 2021: 6664362, 2021.

Article in English | MEDLINE | ID: mdl-33505515

ABSTRACT

Bioluminescent proteins (BLPs) are a class of proteins that are widely distributed in many living organisms, with various mechanisms of light emission including bioluminescence and chemiluminescence from luminous organisms. Bioluminescence has been commonly used in various analytical research methods of cellular processes, such as gene expression analysis, drug discovery, cellular imaging, and toxicity determination. However, the identification of bioluminescent proteins is challenging as they share poor sequence similarity. In this paper, we briefly reviewed the development of the computational identification of BLPs and subsequently proposed a novel prediction framework for identifying BLPs based on the eXtreme gradient boosting algorithm (XGBoost) and sequence-derived features. To train the models, we collected BLP data from bacteria, eukaryotes, and archaea. Then, to obtain more effective prediction models, we examined the performance of different feature extraction methods and their combinations as well as classification algorithms. Finally, based on the optimal model, a novel predictor named iBLP was constructed to identify BLPs. The robustness of iBLP has been demonstrated by experiments on training and independent datasets. Comparison with other published methods further demonstrated that the proposed method is powerful and could provide good performance for BLP identification. The webserver and software package for BLP identification are freely available at http://lin-group.cn/server/iBLP.

Subject(s)

Algorithms , Luminescent Proteins , Amino Acid Sequence , Chemical Phenomena , Computational Biology , Databases, Protein , Drug Discovery , Luminescence , Luminescent Proteins/chemistry , Luminescent Proteins/genetics , Luminescent Proteins/metabolism , Machine Learning , Software

Identification of 2'-O-methylation Site by Investigating Multi-feature Extracting Techniques.

Huang, Qin-Lai; Wang, Lida; Han, Shu-Guang; Tang, Hua.

Comb Chem High Throughput Screen ; 23(6): 527-535, 2020.

Article in English | MEDLINE | ID: mdl-32334499

ABSTRACT

BACKGROUND: RNA methylation is a reversible post-transcriptional modification involving numerous biological processes. Ribose 2'-O-methylation is part of RNA methylation. It has been shown that ribose 2'-O-methylation plays an important role in immune recognition and other pathogenesis. OBJECTIVE: We aim to design a computational method to identify 2'-O-methylation. METHODS: Different from the experimental method, we propose a computational workflow to identify the methylation site based on a multi-feature extracting algorithm. RESULTS: With a voting procedure based on the 7 best feature-classifier combinations, we achieved an accuracy of 76.5% in 10-fold cross-validation. Furthermore, we optimized the features and input the optimized features into an SVM. As a result, the AUC reached 0.813. CONCLUSION: The RNA samples, especially the negative samples, used in this study are more objective and strict, so we obtained more representative results than state-of-the-art studies.
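A voting procedure over several models can be sketched as a simple majority vote across their per-sample predictions. This is an assumption about the general idea only: the abstract does not say whether the vote was unweighted or weighted, and the function name `majority_vote` and the toy predictions below are made up for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    predictions: one list of labels per model, all of the same length.
    With an odd number of binary voters (for example 7) no ties can occur.
    """
    if not predictions:
        raise ValueError("at least one model is required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    # zip(*predictions) regroups the labels sample by sample.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Seven hypothetical feature-classifier combinations voting on three samples.
model_outputs = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 0],
]
consensus = majority_vote(model_outputs)
```

The third sample shows why an odd number of voters is convenient: three models predict 1 and four predict 0, so the consensus is 0 without a tie-breaking rule.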

Subject(s)

Computational Biology , Machine Learning , RNA/metabolism , Methylation , RNA/chemistry
