1.
J Chem Inf Model ; 63(15): 4960-4969, 2023 08 14.
Article in English | MEDLINE | ID: mdl-37499224

ABSTRACT

Diabetes mellitus is a chronic metabolic disease that disrupts blood glucose homeostasis and can lead to severe complications. With the growing number of people with diabetes, there is an urgent need to develop new antidiabetic drugs. The development of artificial intelligence provides a powerful tool for accelerating the discovery of antidiabetic drugs. This work aims to establish a predictor called iPADD for discovering potential antidiabetic drugs. In the predictor, we used four kinds of molecular fingerprints and their combinations to encode the drugs and then adopted minimum-redundancy-maximum-relevance (mRMR) combined with an incremental feature selection strategy to screen optimal features. Based on the optimal feature subset, eight machine learning algorithms were applied to train models using 5-fold cross-validation. The best model achieved an accuracy (Acc) of 0.983 and an area under the receiver operating characteristic curve (auROC) of 0.989 on an independent test set. To further validate the performance of iPADD, we selected 65 natural products for case analysis, including 13 natural products in clinical trials as positive samples and 52 natural products as negative samples. The model predicted all of them correctly except abscisic acid. Molecular docking illustrated that quercetin and resveratrol bound stably to the diabetes target NR1I2. These results are consistent with the predictions of iPADD, indicating that the machine learning model has strong generalization ability. The source code of iPADD is available at https://github.com/llllxw/iPADD.
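
The incremental feature selection step named in this abstract can be sketched as follows. This is a minimal illustration and not the iPADD code: it assumes the features are already ranked (for example by mRMR), and `evaluate` is a hypothetical callback standing in for a cross-validated accuracy estimate. The feature names and scores in the usage example are invented.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Score the top-1, top-2, ... top-n ranked features and keep the best prefix.

    ranked_features: feature names, most relevant first (assumed pre-ranked).
    evaluate: hypothetical callback mapping a feature subset to a score.
    """
    best_score = float("-inf")
    best_subset = []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        # Keep the smallest prefix that reaches the highest score.
        if score > best_score:
            best_score = score
            best_subset = list(subset)
    return best_subset, best_score


# Toy usage: invented scores indexed by subset size, not real results.
toy_scores = {1: 0.80, 2: 0.90, 3: 0.88, 4: 0.85}
ranked = ["fp_12", "fp_7", "fp_33", "fp_2"]
subset, score = incremental_feature_selection(ranked, lambda s: toy_scores[len(s)])
print(subset, score)
```

In practice the callback would train and cross-validate a classifier on each candidate subset, which is the expensive part of the procedure.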


Subjects
Artificial Intelligence , Hypoglycemic Agents , Hypoglycemic Agents/pharmacology , Molecular Docking Simulation , Algorithms , Machine Learning
2.
Brief Funct Genomics ; 2024 Feb 19.
Article in English | MEDLINE | ID: mdl-38376798

ABSTRACT

Gut microbes are a crucial factor in the pathogenesis of type 1 diabetes (T1D). However, it is still unclear which gut microbiota are the key factors affecting T1D and how they influence the development and progression of the disease. To fill these knowledge gaps, we constructed a model to find biomarkers from the gut microbiota of patients with T1D. We first identified microbial markers using Linear discriminant analysis Effect Size (LEfSe) and random forest (RF) methods. Furthermore, by constructing co-occurrence networks for gut microbes in T1D, we aimed to reveal all gut microbial interactions as well as the major beneficial and pathogenic bacteria in healthy populations and type 1 diabetic patients. Finally, PICRUST2 was used to predict Kyoto Encyclopedia of Genes and Genomes (KEGG) functional pathways and KO gene levels of the microbial markers to investigate their biological role. Our study revealed that 21 identified microbial genera are important biomarkers for T1D. Their AUC values are 0.962 and 0.745 on the discovery set and the validation set, respectively. Functional analysis showed that 10 microbial genera were significantly positively associated with D-arginine and D-ornithine metabolism, spliceosome in transcription, steroid hormone biosynthesis and glycosaminoglycan degradation. These genera were significantly negatively correlated with steroid biosynthesis, cyanoamino acid metabolism and drug metabolism. The other 11 genera displayed the inverse correlations. In summary, our research identified a comprehensive set of T1D gut biomarkers with universal applicability and revealed the biological consequences of alterations in gut microbiota and their interplay. These findings offer significant prospects for individualized management and treatment of T1D.

3.
Comput Biol Med ; 169: 107952, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38194779

ABSTRACT

Diabetes, a common chronic disease worldwide, can induce vascular complications such as coronary heart disease (CHD), which is itself one of the main causes of human death. Studying the factors associated with CHD in diabetic patients is of great significance for understanding the occurrence of diabetes/CHD comorbidity. In this study, by analyzing the risk of CHD in more than 300,000 diabetes patients in southwest China, an artificial intelligence (AI) model was proposed to predict the risk of diabetes/CHD comorbidity. First, we statistically analyzed the distribution of four types of features (basic demographic information, laboratory indicators, medical examination, and questionnaire) in comorbidities, and evaluated the predictive performance of three traditional machine learning methods (eXtreme Gradient Boosting, Random Forest, and Logistic regression). In addition, we identified nine important features, including age, WHtR, BMI, stroke, smoking, chronic lung disease, drinking and MSP. Finally, the model produced an area under the receiver operating characteristic curve (AUC) of 0.701 on the test samples. These findings can provide personalized guidance for early CHD warning in diabetic populations.
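
The AUC reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison, counting ties as half a win. It is a generic illustration with invented labels and scores, not the authors' evaluation code.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5).

    labels: 1 for positive cases, 0 for negative cases.
    scores: model outputs, higher meaning more likely positive.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy usage with invented values: 3 of the 4 positive/negative pairs are ordered correctly.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of samples, so for a cohort of the size described here a rank-based implementation would be the practical choice.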


Subjects
Coronary Disease , Diabetes Mellitus , Humans , Artificial Intelligence , Diabetes Mellitus/diagnosis , Coronary Disease/epidemiology , Coronary Disease/etiology , China/epidemiology , Machine Learning
4.
Int J Biol Macromol ; 239: 124247, 2023 Jun 01.
Article in English | MEDLINE | ID: mdl-37003392

ABSTRACT

2'-O-methylation (2OM) is an omnipresent post-transcriptional modification in RNAs. It is important for the regulation of RNA stability, mRNA splicing and translation, as well as innate immunity. With the increase in publicly available 2OM data, several computational tools have been developed for the identification of 2OM sites in human RNA. Unfortunately, these tools suffer from the low discriminative power of redundant features, unreasonable dataset construction or overfitting. To address these issues, based on four types of 2OM data (2OM-adenine (A), cytosine (C), guanine (G), and uracil (U)), we developed a two-step feature selection model to identify 2OM. For each type, one-way analysis of variance (ANOVA) combined with mutual information (MI) was proposed to rank sequence features and obtain the optimal feature subset. Subsequently, four predictors based on eXtreme Gradient Boosting (XGBoost) or support vector machine (SVM) were presented to identify the four types of 2OM sites. Finally, the proposed model achieved an overall accuracy of 84.3% on the independent set. For the convenience of users, an online tool called i2OM was constructed and can be freely accessed at i2om.lin-group.cn. The predictor may provide a reference for the study of 2OM.
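
The ANOVA part of the feature ranking mentioned in this abstract can be sketched with a plain one-way F statistic: features whose values differ strongly between classes relative to their within-class spread rank first. This is a simplified illustration only; it omits the mutual information step, and the function names and toy data are assumptions rather than part of i2OM.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of per-class value lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(X, y):
    """Return feature indices sorted by decreasing F statistic."""
    classes = sorted(set(y))
    f_values = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, label in zip(X, y) if label == c] for c in classes]
        f_values.append(anova_f(groups))
    return sorted(range(len(f_values)), key=lambda j: f_values[j], reverse=True)


# Toy usage with invented data: feature 0 separates the classes, feature 1 does not.
X = [[1, 5.0], [2, 5.1], [3, 4.9], [4, 5.0], [5, 5.2], [6, 4.8]]
y = [0, 0, 0, 1, 1, 1]
print(rank_features(X, y))  # [0, 1]
```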


Subjects
Computational Biology , RNA , Humans , RNA/genetics , Methylation , Support Vector Machine , Cytosine
5.
NPJ Digit Med ; 6(1): 136, 2023 Jul 31.
Article in English | MEDLINE | ID: mdl-37524859

ABSTRACT

Large-scale screening for the risk of coronary heart disease (CHD) is crucial for its prevention and management. Physical examination data have the advantages of wide coverage, large capacity, and easy collection. Therefore, here we report a gender-specific cascading system for risk assessment of CHD based on physical examination data. The dataset consists of 39,538 CHD patients and 640,465 healthy individuals from the Luzhou Health Commission in Sichuan, China. Fifty physical examination characteristics were considered, and after feature screening, ten risk factors were identified. To facilitate large-scale CHD risk screening, a CHD risk model was developed using a fully connected network (FCN). For males, the model achieves AUCs of 0.8671 and 0.8659 on the independent test set and the external validation set, respectively. For females, the AUCs of the model are 0.8991 and 0.9006 on the independent test set and the external validation set, respectively. Furthermore, to enhance the convenience and flexibility of the model in clinical and real-life scenarios, we established a CHD risk scorecard based on logistic regression (LR). The results show that, for both males and females, the AUCs of the scorecard on the independent test set and the external validation set are only slightly lower (<0.05) than those of the corresponding prediction model, indicating that the scorecard construction does not result in a significant loss of information. To promote personal CHD lifestyle management, an online CHD risk assessment system has been established, which can be freely accessed at http://lin-group.cn/server/CHD/index.html .
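
The abstract does not describe how the scorecard is derived from the logistic regression model. One common convention, shown here purely as an assumption, is to divide each coefficient by the smallest non-zero coefficient and round to integer points, so that a person's score is the sum of points for the risk factors present. The factor names and coefficient values below are invented for illustration.

```python
def build_scorecard(coefficients):
    """Convert logistic regression coefficients to integer points.

    Assumed convention: one point equals the smallest non-zero |coefficient|.
    """
    unit = min(abs(c) for c in coefficients.values() if c != 0)
    return {name: round(c / unit) for name, c in coefficients.items()}


def total_points(scorecard, present_factors):
    """Sum the points of the binary risk factors present for one person."""
    return sum(scorecard[name] for name in present_factors)


# Hypothetical coefficients for three binary risk factors (not from the paper).
coefficients = {"age_over_60": 0.9, "hypertension": 0.6, "current_smoker": 0.3}
card = build_scorecard(coefficients)
print(card)
print(total_points(card, ["age_over_60", "current_smoker"]))
```

Rounding to integers is what makes a scorecard usable by hand, and it is also the step where a small amount of discriminative power can be lost relative to the underlying model.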

6.
Int J Biol Macromol ; 227: 1174-1181, 2023 Feb 01.
Article in English | MEDLINE | ID: mdl-36470433

ABSTRACT

RNA N4-acetylcytidine (ac4C) is the acetylation of cytidine at the nitrogen-4 position. It is a highly conserved RNA modification and is involved in a variety of biological processes. Hence, accurate identification of genome-wide ac4C sites is vital for understanding the regulatory mechanisms of gene expression. In this work, a novel predictor, named iRNA-ac4C, was established to identify ac4C sites in human mRNA based on three feature extraction methods: nucleotide composition, nucleotide chemical property, and accumulated nucleotide frequency. Subsequently, minimum-Redundancy-Maximum-Relevance combined with an incremental feature selection strategy was used to select the optimal feature subset. Using the optimal feature subset, the best ac4C classification model was trained by a gradient boosting decision tree with 10-fold cross-validation. The results on the independent testing set indicated that the proposed method has encouraging generalization capability. For the convenience of other researchers, we established a user-friendly web server, which is freely available at http://lin-group.cn/server/iRNA-ac4C/. We hope that the tool can provide guidance for experimental researchers.
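
Two of the encodings named in this abstract, nucleotide chemical property (NCP) and accumulated nucleotide frequency (ANF), are often combined into a four-value vector per position. The sketch below uses the widely used NCP convention (ring structure, functional group, hydrogen bonding) and defines ANF as the fraction of the prefix occupied by the current nucleotide. Whether iRNA-ac4C uses exactly these definitions is an assumption, and the function name is hypothetical.

```python
# NCP triplets: (purine ring, amino group, weak hydrogen bonding).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_ncp_anf(sequence):
    """Encode an RNA sequence as one (ncp1, ncp2, ncp3, anf) tuple per position."""
    counts = {}
    encoded = []
    for position, nucleotide in enumerate(sequence, start=1):
        counts[nucleotide] = counts.get(nucleotide, 0) + 1
        # ANF: occurrences of this nucleotide so far divided by the prefix length.
        anf = counts[nucleotide] / position
        encoded.append(NCP[nucleotide] + (anf,))
    return encoded


# Toy usage on an invented four-nucleotide sequence.
for vector in encode_ncp_anf("ACCA"):
    print(vector)
```

Because ANF depends on position, the same nucleotide yields different vectors at different sites, which lets a downstream classifier see sequence context that plain composition discards.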


Subjects
Cytidine , RNA , Humans , RNA, Messenger/metabolism , Cytidine/genetics , Cytidine/metabolism , RNA/chemistry , Nucleotides
7.
Comput Math Methods Med ; 2022: 7493834, 2022.
Article in English | MEDLINE | ID: mdl-35069791

ABSTRACT

Helicobacter pylori (H. pylori) is the most common risk factor for gastric cancer worldwide. The membrane proteins of H. pylori are involved in bacterial adherence and play a vital role in drug discovery. Thus, an accurate and cost-effective computational model is needed to predict the uncharacterized membrane proteins of H. pylori. In this study, a reliable benchmark dataset consisting of 114 membrane and 219 nonmembrane proteins was constructed based on UniProt. A support vector machine (SVM)-based model was developed for discriminating H. pylori membrane proteins from nonmembrane proteins using sequence information. Cross-validation showed that our method achieved good performance, with an accuracy of 91.29%. It is anticipated that the proposed model will be useful for the annotation of H. pylori membrane proteins and the development of new anti-H. pylori agents.


Subjects
Bacterial Proteins/genetics , Helicobacter pylori/genetics , Membrane Proteins/genetics , Amino Acid Sequence , Amino Acids/analysis , Bacterial Proteins/chemistry , Computational Biology , Databases, Protein/statistics & numerical data , Helicobacter pylori/chemistry , Helicobacter pylori/pathogenicity , Host Microbial Interactions , Humans , Membrane Proteins/chemistry , Support Vector Machine
8.
Front Biosci (Landmark Ed) ; 27(3): 84, 2022 03 05.
Article in English | MEDLINE | ID: mdl-35345316

ABSTRACT

BACKGROUND: Lipocalins belong to the calycin family, and their sequence length is generally between 165 and 200 residues. They are mainly stable and multifunctional extracellular proteins. Lipocalins play an important role in several stress responses and allergic inflammations. Because the accurate identification of lipocalins could provide significant evidence for the study of their function, it is necessary to develop a machine learning-based model to recognize lipocalins. METHODS: In this study, we constructed a prediction model to identify lipocalins. Their sequences were encoded by six types of features, namely amino acid composition (AAC), composition of k-spaced amino acid pairs (CKSAAP), pseudo amino acid composition (PseAAC), Geary correlation (GD), normalized Moreau-Broto autocorrelation (NMBroto) and composition/transition/distribution (CTD). Subsequently, these features were optimized using feature selection techniques. A classifier based on random forest was trained on the optimal features. RESULTS: The results of 10-fold cross-validation showed that our computational model could classify lipocalins with an accuracy of 95.03% and an area under the curve of 0.987. On the independent dataset, our computational model achieved an accuracy of 89.90%, which was 4.17% higher than that of the existing model. CONCLUSIONS: In this work, we developed an advanced computational model to discriminate lipocalin proteins from non-lipocalin proteins. In the proposed model, protein sequences were encoded by six descriptors. Then, feature selection was performed to pick out the best features, which produced the maximum accuracy. On the basis of the best feature subset, the RF-based classifier obtained the best prediction results.
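
Two of the six descriptors listed in this abstract, AAC and CKSAAP, can be sketched in a few lines. AAC is the frequency of each of the 20 standard amino acids; CKSAAP is the frequency of each ordered residue pair separated by exactly k intervening positions. The function names and the toy peptide are assumptions for illustration, and the paper's exact normalization and choice of k are not stated in the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(sequence):
    """Amino acid composition: a 20-value frequency vector."""
    return [sequence.count(a) / len(sequence) for a in AMINO_ACIDS]


def cksaap(sequence, k):
    """Composition of k-spaced amino acid pairs: a 400-value frequency vector."""
    window_count = len(sequence) - k - 1
    if window_count <= 0:
        raise ValueError("sequence is too short for this gap size")
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(window_count):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [counts[p] / window_count for p in PAIRS]


# Toy usage on an invented peptide: with k = 1 the pairs are A.D and C.A.
features = cksaap("ACDA", k=1)
print(features[PAIRS.index("AD")], features[PAIRS.index("CA")])  # 0.5 0.5
```

Setting k = 0 gives the ordinary dipeptide composition, so a descriptor set built from several values of k grows by 400 features per gap size, which is one reason a feature selection step follows.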


Subjects
Artificial Intelligence , Lipocalins , Amino Acids , Computational Biology , Lipocalins/chemistry , Machine Learning , Proteins/chemistry
9.
Comput Struct Biotechnol J ; 20: 4942-4951, 2022.
Article in English | MEDLINE | ID: mdl-36147670

ABSTRACT

Ion binding proteins (IBPs) can selectively and non-covalently interact with ions. IBPs in phages also play an important role in biological processes. Therefore, accurate identification of IBPs is necessary for understanding their biological functions and the molecular mechanisms that involve binding to ions. Since experimental methods in molecular biology are still labor-intensive and costly for identifying IBPs, it is helpful to develop computational methods that identify IBPs quickly and efficiently. In this work, a random forest (RF)-based model was constructed to quickly identify IBPs. Based on protein sequence information and the physicochemical properties of residues, the dipeptide composition combined with the physicochemical correlation between two residues was proposed for feature extraction. A feature selection technique called analysis of variance (ANOVA) was used to exclude redundant information. By comparison with other classification methods, we demonstrated that our method can identify IBPs accurately. Based on the model, a Python package named IBPred was built, and its source code can be accessed at https://github.com/ShishiYuan/IBPred.
