Results 1 - 20 of 27
1.
Int J Environ Health Res ; 32(8): 1716-1732, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33769141

ABSTRACT

This study investigated the influence of climate factors on malaria incidence in the Sundargarh district, Odisha, India. The WEKA machine learning tool was used with two classifier techniques, Multi-Layer Perceptron (MLP) and J48, and three test options: 10-fold cross-validation, percentage split, and supplied test set. A comparative analysis was carried out to identify the more accurate malaria prediction model under varying climate contexts. The results suggested that J48 exhibited better skill than MLP, with the 10-fold cross-validation method outperforming the percentage split and supplied test set options. J48 demonstrated lower error (RMSE = 0.6), a better kappa (0.63), and higher accuracy (0.71), suggesting that it is the most suitable model. Seasonal variation of temperature and humidity had a stronger association with malaria incidence than rainfall, and performance was better during the monsoon and post-monsoon seasons, when incidence peaks.


Subjects
Machine Learning , Malaria , Climate , Humans , Malaria/epidemiology , Computer Neural Networks , Seasons
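
Note on the reported metrics: accuracy and Cohen's kappa, as quoted in the abstract above, can both be derived from a confusion matrix. The following is a minimal sketch in Python, not the WEKA code used in the study; the function name `accuracy_and_kappa` and the 2x2 matrix are hypothetical and chosen only for illustration.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical confusion matrix (NOT data from the study).
toy = [[40, 10],
       [5, 45]]
acc, kappa = accuracy_and_kappa(toy)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

Kappa discounts the agreement expected by chance, which is why it can be noticeably lower than raw accuracy when classes are imbalanced.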
2.
BMC Nephrol ; 22(1): 273, 2021 08 09.
Article in English | MEDLINE | ID: mdl-34372817

ABSTRACT

BACKGROUND: Chronic Kidney Disease (CKD), i.e., a gradual decrease in renal function spanning several months to years without any major symptoms, is a life-threatening disease. It progresses in six stages according to the severity level. It is categorized into various stages based on the Glomerular Filtration Rate (GFR), which in turn utilizes several attributes, such as age, sex, race and serum creatinine. Among the multiple available models for estimating the GFR value, Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI), which is a linear model, has been found to be quite efficient because it allows detecting all CKD stages. METHODS: Early detection and cure of CKD is extremely desirable as it can lead to the prevention of unwanted consequences. Machine learning methods have recently been extensively advocated for early detection of symptoms and diagnosis of several diseases. With the same motivation, the aim of this study is to predict the various stages of CKD using machine learning classification algorithms on a dataset obtained from the medical records of affected people. Specifically, we have used the Random Forest and J48 algorithms to obtain a sustainable and practicable model to detect various stages of CKD with comprehensive medical accuracy. RESULTS: Comparative analysis of the results revealed that J48 predicted CKD in all stages better than Random Forest, with an accuracy of 85.5%. The study also showed improved performance of J48 over Random Forest. CONCLUSIONS: The study concluded that the J48 model may be used to build an automated system for detecting the severity of CKD.


Subjects
Decision Trees , Disease Progression , Glomerular Filtration Rate , Machine Learning , Chronic Renal Insufficiency , Algorithms , Early Diagnosis , Female , Humans , Kidney Function Tests/methods , Male , Medical Records/statistics & numerical data , Middle Aged , Patient Acuity , Prognosis , Chronic Renal Insufficiency/diagnosis , Chronic Renal Insufficiency/physiopathology , Reproducibility of Results , Severity of Illness Index
3.
BMC Bioinformatics ; 21(1): 278, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32615980

ABSTRACT

BACKGROUND: Heart disease (HD) is one of the most common diseases nowadays, and early diagnosis is a crucial task for many health care providers seeking to protect their patients from the disease and to save lives. In this paper, a comparative analysis of different classifiers was performed for the classification of the Heart Disease dataset in order to correctly classify and/or predict HD cases with minimal attributes. The set contains 76 attributes, including the class attribute, for 1025 patients collected from Cleveland, Hungary, Switzerland, and Long Beach, but in this paper only a subset of 14 attributes is used, and each attribute has a given set value. The algorithms used were the K-Nearest Neighbor (K-NN), Naive Bayes, Decision tree J48, JRip, SVM, Adaboost, Stochastic Gradient Descent (SGD) and Decision Table (DT) classifiers, to show the performance of the selected classification algorithms in classifying and/or predicting the HD cases. RESULTS: It was shown that using different classification algorithms for the classification of the HD dataset gives very promising results in terms of classification accuracy for the K-NN (K = 1), Decision tree J48 and JRip classifiers, with accuracies of 99.7073, 98.0488 and 97.2683% respectively. A feature selection method was performed using the Classifier Subset Evaluator on the HD dataset, and results show enhanced classification accuracy for the K-NN (K = 1) and Decision Table classifiers, to 100 and 93.8537% respectively, when using only a combination of up to 4 selected attributes instead of 13 attributes for the prediction of the HD cases. CONCLUSION: Different classifiers were used and compared to classify the HD dataset, and we concluded that a reliable feature selection method is beneficial for HD prediction, allowing a minimal number of attributes to be used instead of all available ones.


Subjects
Algorithms , Heart Diseases/diagnosis , Bayes Theorem , Chest Pain/diagnosis , Databases as Topic , Humans , ROC Curve , Support Vector Machine
4.
Environ Monit Assess ; 192(3): 172, 2020 Feb 10.
Article in English | MEDLINE | ID: mdl-32040638

ABSTRACT

The microalga Dunaliella salina has been broadly studied for different purposes such as beta-carotene production, toxicity assessment and salinity tolerance, yet research on the habitat suitability of this alga has rarely been reported. The present research aims to apply suitable monitoring and modelling methods (two critical steps in ecological research) to predict the abundance of D. salina. The abundance of D. salina was predicted by a decision tree model (J48 algorithm) at 10 different monitoring sites during a 1-year study period (2016-2017) in the Meighan wetland, one of the valuable hypersaline wetlands in Iran. The abundance of the alga (as the output of the model) together with various water quality and physical-habitat wetland characteristics (as inputs of the model) were monitored monthly and repeatedly at two different depths (one in the surface layer and another at a maximum depth of 50 cm), which in total resulted in 240 instances (120 instances for each depth). Based on trial and error, a sevenfold cross-validation resulted in the highest predictive performance of the model (CCI > 75% and Cohen's Kappa > 0.65). According to the model's prediction, the number of sunny hours might be one of the most important driving parameters for meeting the habitat requirements of the alga in the hypersaline wetland. The model also predicted that an increase in dissolved oxygen and sodium concentrations might increase the abundance of D. salina in the salt wetland. In contrast, an increase in total suspended solids concentration and monthly precipitation might lead to a decrease in the abundance of the alga. A chi-square test of independence showed a significant association between the abundance of D. salina and spatio-temporal patterns in the wetland (Pearson chi-square statistic = 221.7, p = 0.001), with warm seasons (spring and summer) contributing more to the sampling of the species than cold seasons (autumn and winter). The difference in the abundance of the species at different sampling sites can be attributed to various anthropogenic activities.


Subjects
Decision Trees , Environmental Monitoring , Microalgae , Wetlands , Ecosystem , Iran
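
Note on sevenfold cross-validation: with 240 instances, k = 7 does not divide the data evenly, so fold sizes differ by one. The sketch below illustrates only the partitioning step in plain Python. It is an assumption-laden illustration (the helper `k_fold_indices` is hypothetical, and no shuffling or stratification is performed), not the procedure the authors ran.

```python
def k_fold_indices(n_instances, k):
    """Split indices 0..n_instances-1 into k contiguous folds.

    Fold sizes differ by at most one; the first (n_instances % k)
    folds receive the extra instance.
    """
    base, extra = divmod(n_instances, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


folds = k_fold_indices(240, 7)
fold_sizes = [len(f) for f in folds]
print(fold_sizes)  # [35, 35, 34, 34, 34, 34, 34]

# Each fold serves once as the test set; the rest form the training set.
for test_fold in folds:
    held_out = set(test_fold)
    train = [i for f in folds for i in f if i not in held_out]
    assert len(train) + len(test_fold) == 240
```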
5.
J Biomed Inform ; 87: 79-87, 2018 11.
Article in English | MEDLINE | ID: mdl-30296491

ABSTRACT

This paper proposes an effective and robust approach for Chemical-Induced Disease (CID) relation extraction from PubMed articles. The study was performed on the Chemical Disease Relation (CDR) task of BioCreative V track-3 corpus. The proposed system, named relSCAN, is an efficient CID relation extraction system with two phases to classify relation instances from the Co-occurrence and Non-Co-occurrence mention levels. We describe the case of chemical and disease mentions that occur in the same sentence as 'Co-occurrence', or as 'Non-Co-occurrence' otherwise. In the first phase, the relation instances are constructed on both mention levels. In the second phase, we employ a hybrid feature set to classify the relation instances at both of these mention levels using the combination of two Machine Learning (ML) classifiers (Support Vector Machine (SVM) and J48 Decision tree). This system is entirely corpus dependent and does not rely on information from external resources in order to boost its performance. We achieved good results, which are comparable with the other state-of-the-art CID relation extraction systems on the BioCreative V corpus. Furthermore, our system achieves the best performance on the Non-Co-occurrence mention level.


Subjects
Chemically-Induced Disorders/diagnosis , Data Mining/methods , Medical Informatics/methods , Support Vector Machine , Algorithms , Decision Making , Disease , Humans , Machine Learning , Publications , Random Allocation , Regression Analysis
6.
Sensors (Basel) ; 16(7), 2016 Jul 12.
Article in English | MEDLINE | ID: mdl-27420067

ABSTRACT

Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of a New Operational Land Imager (OLI) sensor on Landsat 8 with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size.

7.
Orthop J Sports Med ; 12(3): 23259671241231609, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38449692

ABSTRACT

Background: Although evidence indicates that extracorporeal shockwave therapy (ESWT) is effective in treating calcifying shoulder tendinitis, incomplete resorption and dissatisfactory results are still reported in many cases. Data mining techniques have been applied in health care in the past decade to predict outcomes of disease and treatment. Purpose: To identify the ideal data mining technique for the prediction of ESWT-induced shoulder calcification resorption and the most accurate algorithm for use in the clinical setting. Study Design: Case-control study. Methods: Patients with painful calcified shoulder tendinitis treated by ESWT were enrolled. Seven clinical factors related to shoulder calcification were adopted as the input attributes: sex, age, side affected, symptom duration, pretreatment Constant-Murley score, and calcification size and type. The 5 data mining techniques assessed were multilayer perceptron (neural network), naïve Bayes, sequential minimal optimization, logistic regression, and the J48 decision tree classifier. Results: A total of 248 patients with calcified shoulder tendinitis were enrolled in this study. Shorter symptom duration yielded the highest gain ratio (0.374), followed by smaller calcification size (0.336) and calcification type (0.253). With the J48 decision tree method, the accuracy of 3 input attributes was 89.5% by 10-fold cross-validation, indicating satisfactory accuracy. A treatment algorithm using the J48 decision tree indicated that a symptom duration of ≤10 months was the most positive indicator of calcification resorption, followed by a calcification size of ≤10.82 mm. Conclusion: The J48 decision tree method demonstrated the highest precision and accuracy in the prediction of shoulder calcification resorption by ESWT. A symptom duration of ≤10 months or calcification size of ≤10.82 mm represented the clinical scenarios most likely to show resorption after ESWT.
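
Note on gain ratio: the attribute ranking quoted above relies on gain ratio, the split criterion of C4.5 (which J48 implements). It divides an attribute's information gain by the entropy of the attribute's own value distribution. The sketch below is a minimal plain-Python illustration for nominal attributes; the function names and the six-row toy data are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a nominal attribute with respect to class labels."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Class entropy remaining after splitting on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)  # entropy of the attribute itself
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data (NOT from the study): symptom duration vs. resorption.
duration = ["short", "short", "short", "long", "long", "long"]
resorbed = ["yes", "yes", "yes", "no", "no", "yes"]
toy_ratio = gain_ratio(duration, resorbed)
print(round(toy_ratio, 4))
```

Dividing by the split information penalizes attributes with many distinct values, which would otherwise look artificially informative.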

8.
Environ Technol ; 44(13): 1973-1984, 2023 May.
Article in English | MEDLINE | ID: mdl-34919033

ABSTRACT

Due to industrialization, human activities and urbanization, the environment is becoming polluted. Air pollution has become a major issue in the metropolitan areas of the world. Monitoring air quality plays an important role in protecting people from disease. Air pollutants may lead to many health issues, such as respiratory and cardiac problems. The major air pollutants are NO, C6H6, CO, etc. Much research has been done on predicting air pollution-related health issues, predicting air pollution levels, and monitoring and controlling pollution levels. However, existing approaches are inefficient, costly to maintain, and lack adequate monitoring tools. To overcome these issues, this paper implements a hybrid algorithm of Decision Tree J48 and Grey Wolf Optimizer (DT-GWO). DT-GWO is a better model for predicting the Air Quality Index (AQI): it minimizes the error rate and predicts air quality accurately and effectively. The AQI values are categorised as good, moderate, unhealthy, very unhealthy and hazardous. The dataset used in this work was collected from the Kaggle website and contains air pollutant details with air quality index values. The accuracy obtained is 93.72% for Decision Tree J48, 96.83% for Grey Wolf Optimizer, and 99.78% for the proposed DT-GWO.


Subjects
Air Pollutants , Air Pollution , Humans , Air Pollution/prevention & control , Air Pollutants/analysis , Algorithms , Machine Learning , Decision Trees , Environmental Monitoring
9.
Med Biol Eng Comput ; 60(9): 2589-2600, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35781590

ABSTRACT

This paper presents a comparative evaluation of classification algorithms using Waikato Environment for Knowledge Analysis (WEKA) software. The main goal of the paper is to conduct a comprehensive comparison and determine which predictive modelling technique is best for the problem of classifying breast cancer recurrence. The dataset for this study consists of 286 instances (201 instances belong to recurrence class and 85 instances belong to non-recurrence class) and 10 attributes. Comparison analysis is conducted for Naïve Bayes, J48, K*, Random Forest, Multilayer Perceptron (MLP) and Support Vector Machine (SVM) models using different parameters. The performance of the developed models is calculated using the following evaluation metrics: accuracy, precision, sensitivity, specificity, mean absolute error, ROC curves and AUC values. Contribution of the attributes to the classification models is assessed by measuring information gain. Results show that J48 model and the SVM algorithm give the highest accuracy, which is 75.5% and 79.6%, respectively. Implementation of SVM algorithm also shows the highest sensitivity of 99%, while the highest precision is obtained by MLP algorithm which is 79%. In addition, SVM algorithm possesses the lowest mean absolute error. Furthermore, by measuring information gain, it is revealed that a degree of malignant tumour contributes more than other attributes to recurrence of breast cancer.


Subjects
Breast Neoplasms , Algorithms , Bayes Theorem , Breast Neoplasms/diagnosis , Female , Humans , Machine Learning , Local Neoplasm Recurrence , Support Vector Machine
10.
Results Eng ; 13: 100363, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35317385

ABSTRACT

The outbreak of Covid-19 pandemic has been declared a global health crisis by the World Health Organization since its emergence. Several researchers have proposed a number of techniques to understand how the pandemic affects the populations. Reported among these techniques are data mining models which have been successfully applied in a wide range of situations before the advent of Covid-19 pandemic. In this work, the researchers have applied a number of existing data mining methods (classifiers) available in the Waikato Environment for Knowledge Analysis (WEKA) machine learning library. WEKA was used to gain a better understanding on how the epidemic spread within Zambia. The classifiers used are J48 decision tree, Multilayer Perceptron and Naïve Bayes among others. The predictions of these techniques are compared against simpler classifiers and those reported in related works.

11.
J Biomol Struct Dyn ; 40(13): 5836-5847, 2022 08.
Article in English | MEDLINE | ID: mdl-33475019

ABSTRACT

In the hospital, because of the rise in cases daily, there are a small number of COVID-19 test kits available. For this purpose, a rapid alternative diagnostic choice to prevent COVID-19 spread among individuals must be implemented as an automatic detection method. In this article, the multi-objective optimization and deep learning-based technique for identifying infected patients with coronavirus using X-rays is proposed. J48 decision tree approach classifies the deep feature of corona affected X-ray images for the efficient detection of infected patients. In this study, 11 different convolutional neural network-based (CNN) models (AlexNet, VGG16, VGG19, GoogleNet, ResNet18, ResNet50, ResNet101, InceptionV3, InceptionResNetV2, DenseNet201 and XceptionNet) are developed for detection of infected patients with coronavirus pneumonia using X-ray images. The efficiency of the proposed model is tested using k-fold cross-validation method. Moreover, the parameters of CNN deep learning model are tuned using multi-objective spotted hyena optimizer (MOSHO). Extensive analysis shows that the proposed model can classify the X-ray images at a good accuracy, precision, recall, specificity and F1-score rates. Extensive experimental results reveal that the proposed model outperforms competitive models in terms of well-known performance metrics. Hence, the proposed model is useful for real-time COVID-19 disease classification from X-ray chest images. Communicated by Ramaswamy H. Sarma.


Subjects
COVID-19 , Deep Learning , COVID-19/diagnostic imaging , Humans , Computer Neural Networks , SARS-CoV-2 , X-Rays
12.
Toxicol In Vitro ; 81: 105347, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35318113

ABSTRACT

A 3D-QSAR study based on DFT descriptors and machine learning calculations is presented in this work. Our goal has been to build predictive models for classifying the carcinogenic activity of a set of aromatic amines (AA) and nitroaromatic (NA) compounds. As the main result, we stress that calculations must consider both the activated metabolites (derived from AA and NA species) and the water solvent to obtain reliable predictive classification models. We have obtained eight decision tree models that presented an accuracy of over 90% by using either Gázquez-Vela chemical potential (µ+) or the chemical hardness (η) of the activated metabolites in aqueous solvent.


Subjects
Amines , Carcinogens , Amines/chemistry , Amines/toxicity , Carcinogens/chemistry , Carcinogens/toxicity , Machine Learning , Quantitative Structure-Activity Relationship , Solvents
13.
Diabetes Metab Syndr Obes ; 14: 3437-3445, 2021.
Article in English | MEDLINE | ID: mdl-34349537

ABSTRACT

BACKGROUND: Nonalcoholic fatty liver disease (NAFLD) is the commonest form of chronic liver disease worldwide and its prevalence is rapidly increasing. Screening and early diagnosis of high-risk groups are important for the prevention and treatment of NAFLD; however, traditional imaging examinations are expensive and difficult to perform on a large scale. This study aimed to develop a simple and reliable predictive model based on the risk factors for NAFLD using a decision tree algorithm for the diagnosis of NAFLD and reduction of healthcare costs. METHODS: This retrospective cross-sectional study included 22,819 participants who underwent annual health examinations between January 2019 and December 2019 at Physical Examination Center in Shengjing Hospital of China Medical University. After rigorous data screening, data of 9190 participants were retained in the final dataset for use in the J48 decision tree algorithm for the construction of predictive models. Approximately 66% of these patients (n=6065) were randomly assigned to the training dataset for the construction of the decision tree, while 34% of the patients (n=3125) were assigned to the test dataset to evaluate the performance of the decision tree. RESULTS: The results showed that the J48 decision tree classifier exhibited good performance (accuracy=0.830, precision=0.837, recall=0.830, F-measure=0.830, and area under the curve=0.905). The decision tree structure revealed waist circumference as the most significant attribute, followed by triglyceride levels, systolic blood pressure, sex, age, and total cholesterol level. CONCLUSION: Our study suggests that a decision tree analysis can be used to screen high-risk individuals for NAFLD. The key attributes in the tree structure can further contribute to the prevention of NAFLD by suggesting implementable targeted community interventions, which can help improve the outcome of NAFLD and reduce the burden on the healthcare system.
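
Note on the 66%/34% split: the training and test sizes reported above (6065 and 3125) follow directly from applying a 66% holdout to 9190 participants. The sketch below reproduces only that arithmetic with a shuffled index split in plain Python; the helper `holdout_split` and the seed are hypothetical, and the rounding rule is an assumption rather than a description of the authors' tooling.

```python
import random


def holdout_split(n_instances, train_fraction, seed=0):
    """Shuffle instance indices and split them into train and test lists."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(n_instances * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = holdout_split(9190, 0.66)
print(len(train_idx), len(test_idx))  # 6065 3125
```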

14.
Interdiscip Sci ; 13(2): 260-272, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33587262

ABSTRACT

In the hospital, a limited number of COVID-19 test kits are available due to the spike in cases every day. For this reason, a rapid alternative diagnostic option should be introduced as an automated detection method to prevent COVID-19 spreading among individuals. This article proposes multi-objective optimization and a deep-learning methodology for the detection of coronavirus-infected patients from X-rays. The J48 decision tree method classifies the deep features of affected X-ray images to detect infected patients effectively. Eleven different convolutional neural network-based (CNN) models were developed in this study to detect patients infected with coronavirus pneumonia using X-ray images (AlexNet, VGG16, VGG19, GoogleNet, ResNet18, ResNet500, ResNet101, InceptionV3, InceptionResNetV2, DenseNet201 and XceptionNet). In addition, the parameters of the CNN deep learning model are described using a multi-objective emperor penguin optimizer (MOEPO). A broad review reveals that the proposed model can categorise the X-ray images at good rates of precision, accuracy, recall, specificity and F1-score. Extensive test results show that the proposed model outperforms competitive models on well-known performance metrics. The proposed model is, therefore, useful for the real-time classification of X-ray chest images of COVID-19 disease.


Subjects
COVID-19/diagnostic imaging , Deep Learning , Computer-Assisted Diagnosis , Lung/diagnostic imaging , Computer-Assisted Radiographic Image Interpretation , Thoracic Radiography , COVID-19/virology , Decision Trees , Humans , Lung/virology , Predictive Value of Tests , Reproducibility of Results
15.
Transl Oncol ; 14(9): 101157, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34247136

ABSTRACT

INTRODUCTION: An efficient, readily employable risk prognostication method is desirable for MM in settings where genomics tests cannot be performed owing to geographical/economical constraints. In this work, a new Modified Risk Staging (MRS) has been proposed for newly diagnosed Multiple Myeloma (NDMM) that exploits six easy-to-acquire clinical parameters, i.e. age, albumin, β2-microglobulin (β2M), calcium, estimated glomerular filtration rate (eGFR) and hemoglobin. MATERIALS AND METHODS: MRS was designed using a training cohort of 716 NDMM patients of our in-house MM Indian (MMIn) cohort and validated on the MMIn (n=354) cohort and the MMRF (n=900) cohort. K-adaptive partitioning (KAP) was used to find new thresholds for the parameters. Risk staging rules, obtained by training a J48 classifier, were used to build MRS. RESULTS: New thresholds were identified for albumin (3.6 g/dL), β2M (4.8 mg/L), calcium (11.13 mg/dL), eGFR (48.1 mL/min), and hemoglobin (12.3 g/dL) using KAP on the MMIn dataset. On the MMIn dataset, MRS outperformed ISS for OS prediction in terms of C-index, hazard ratios, and their corresponding p-values, but performed comparably in prediction of PFS. On both the MMIn and MMRF datasets, MRS performed better than RISS in terms of C-index and p-values. A simple online tool was also designed to allow automated calculation of MRS based on the values of the parameters. DISCUSSION: Our proposed ML-derived yet simple staging system, MRS, although it does not employ genetic features, outperforms RISS, as confirmed by better separability in KM survival curves and higher values of C-index on both the MMIn and MMRF datasets. FUNDING: Grant: BT/MED/30/SP11006/2015 (Department of Biotechnology, Govt. of India), Grant: DST/ICPS/CPS-Individual/2018/279(G) (Department of Science and Technology, Govt. of India), UGC-Senior Research Fellowship.

16.
Bioinformation ; 17(2): 348-355, 2021.
Article in English | MEDLINE | ID: mdl-34234395

ABSTRACT

Alzheimer's Disease (AD) is one of the most common causes of dementia, mostly affecting the elderly population. Currently, there is no proper diagnostic tool or method available for the detection of AD. The present study used two distinct data sets of AD genes, which could be potential biomarkers in the diagnosis. The differentially expressed genes (DEGs) curated from both datasets were used for machine learning classification, tissue expression annotation and co-expression analysis. Further, CNPY3, GPR84, HIST1H2AB, HIST1H2AE, IFNAR1, LMO3, MYO18A, N4BP2L1, PML, SLC4A4, ST8SIA4, TLE1 and N4BP2L1 were identified as highly significant DEGs and exhibited co-expression with other query genes. Moreover, a tissue expression study found that these genes are also expressed in brain tissue. In addition to the earlier studies for marker gene identification, we have considered a different set of machine learning classifiers to improve the accuracy rate of the analysis. Among the six classification algorithms, J48 emerged as the best classifier, which could be used for differentiating healthy and diseased samples. SMO/SVM and LogitBoost followed J48 in classification accuracy.

17.
Diabetes Metab Syndr Obes ; 13: 4621-4630, 2020.
Article in English | MEDLINE | ID: mdl-33273837

ABSTRACT

BACKGROUND: Predicting and making an early diagnosis of diabetes, one of the most devastating diseases globally, is a critical approach in a population at high risk of diabetes. Traditional and conventional blood tests are recommended for screening suspected patients; however, applying these tests could have health side effects and a high cost. The goal of this study was to establish a simple and reliable predictive model based on the risk factors associated with diabetes using a decision tree algorithm. METHODS: A retrospective cross-sectional design was used in this study. A total of 10,436 participants who had a health check-up from January 2017 to July 2017 were recruited. With appropriate data mining approaches, 3454 participants remained in the final dataset for further analysis. These participants were then randomly allocated to either the training dataset (70%, 2420 cases) for the construction of the decision tree or the testing dataset (30%, 1034 cases) for evaluation of the performance of the decision tree. For this purpose, the cost-sensitive J48 algorithm was used to develop the decision tree model. RESULTS: Utilizing all the key features of the dataset, consisting of 14 input variables and two output variables, the constructed decision tree model identified several key factors that are closely linked to the development of diabetes and are also modifiable. Furthermore, our model achieved a classification accuracy of 90.3%, with a precision of 89.7% and a recall of 90.3%. CONCLUSION: By applying simple and cost-effective classification rules, our decision tree model estimates the development of diabetes in a high-risk adult Chinese population, with strong potential for implementation in diabetes management.

18.
Ethiop J Health Sci ; 30(1): 115-124, 2020 Jan.
Article in English | MEDLINE | ID: mdl-32116440

ABSTRACT

BACKGROUND: Diabetes is a disease that affects the body's ability to produce or use insulin. A total of 425 million people worldwide are suffering from diabetes. Of these, more than 16 million live in the Africa Region, a figure estimated to reach around 41 million by 2045. The main objective of this study was to design and develop a prototype knowledge-based system using data mining techniques for the diagnosis and treatment of diabetes. METHODS: For this study, an experimental research design was employed, and the researchers used domain expert knowledge as a supplement to data mining techniques. Three classification algorithms in WEKA, namely J48, PART and JRip, were used, and the researchers finally decided to use the results of the J48 classification algorithm. Visual Studio Ultimate 2013 (VB.NET) was used to store knowledge and as the front end of the prototype. Common lisp prolog (Clisp) was used for back-end coding of the obtained knowledge. RESULTS: Using the J48 decision tree algorithm, 2512 (95.1515%) of the instances were classified correctly, and 128 (4.8485%) were classified incorrectly. The second-best performing model was generated by the JRip classifier. This model scored 94.7348% accuracy on the general data when classifying the status of the diabetic patient datasets, classifying 2501 instances of the records correctly. CONCLUSION: The J48 model was the best performing model, with the highest accuracy.


Subjects
Data Mining/methods , Decision Trees , Diabetes Mellitus/diagnosis , Diabetes Mellitus/therapy , Knowledge Bases , Africa , Data Accuracy , Humans , Proof of Concept Study
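
Note on the reported percentages: the accuracy figures in the abstract above are consistent with a dataset of 2512 + 128 = 2640 instances. The short Python check below reproduces that arithmetic; the helper name `percent` is hypothetical, and the only inputs are the counts quoted in the abstract.

```python
def percent(part, total):
    """Percentage of `part` in `total`, rounded to four decimal places."""
    return round(100 * part / total, 4)


total_instances = 2512 + 128                 # correct + incorrect for J48
j48_correct = percent(2512, total_instances)
j48_incorrect = percent(128, total_instances)
jrip_correct = percent(2501, total_instances)

print(total_instances)   # 2640
print(j48_correct)       # 95.1515
print(j48_incorrect)     # 4.8485
print(jrip_correct)      # 94.7348
```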
19.
Future Med Chem ; 12(2): 147-159, 2020 01.
Article in English | MEDLINE | ID: mdl-32031024

ABSTRACT

Aim: We applied genetic programming approaches to understand the impact of descriptors on the inhibitory effects of serine protease inhibitors of Mycobacterium tuberculosis (Mtb) and to discover new inhibitors as drug candidates. Materials & methods: The descriptors of the experimental dataset of Mtb serine protease inhibitors were optimized by a genetic algorithm (GA) together with correlation-based feature selection (CFS) in order to develop predictive models using machine-learning algorithms. The best model was deployed on a library of 918 phytochemical compounds to screen for potential serine protease inhibitors of Mtb. The quality and performance of the predictive models were evaluated using various standard statistical parameters. Result: The best random forest model with CFS-GA screened 126 anti-tubercular agents out of 918 phytochemical compounds. Also, the genetic programming symbolic classification method optimized the descriptors and produced an equation for the mathematical models. Conclusion: The use of CFS-GA with random forest enhanced classification accuracy and predicted new serine protease inhibitors of Mtb, which can be used for better drug development against tuberculosis.


Subjects
Mycobacterium tuberculosis/enzymology , Serine Proteases/metabolism , Serine Proteinase Inhibitors/pharmacology , Machine Learning , Molecular Models , Serine Proteases/genetics , Serine Proteinase Inhibitors/chemistry
20.
Curr Med Imaging ; 16(4): 340-354, 2020.
Article in English | MEDLINE | ID: mdl-32410537

ABSTRACT

BACKGROUND: In this era of cutting-edge research, although state-of-the-art medical care is one of the ubiquitous facilities accessible to modern man, diabetes has emerged as one of the major ailments afflicting the present generation. The prime necessity of this age has therefore become providing cheap and sustainable medical care against major diseases such as diabetes. In layman's terms, diabetes may be defined as a physiological condition wherein the blood glucose level is regularly higher than the prescribed level. OBJECTIVES: The prime objective of this work is to provide a novel classification technique for the detection of diabetes in a timely and effective manner. METHODS: The proposed work comprises four phases. In the first phase, a "Localized Diabetes Dataset" was compiled and collected from Bombay Medical Hall, Mahabir Chowk, Pyada Toli, Upper Bazar, Jharkhand, Ranchi, India. In the second phase, various classification techniques, namely RBF NN, MLP NN, NBs, and J48graft DT, were applied to the Localized Diabetes Dataset. In the third phase, a genetic algorithm (GA) was utilized as an attribute selection technique, through which six of the twelve attributes were selected. Lastly, in the fourth phase, RBF NN, MLP NN, NBs and J48graft DT were utilized for classification on the relevant attributes obtained by GA. RESULTS: In this study, a comparative analysis of the outcomes obtained with and without the use of GA for the same set of classification techniques was carried out with respect to performance assessment. It has been conclusively inferred that GA is helpful in removing insignificant attributes, reducing the cost and computation time while enhancing ROC and accuracy. CONCLUSION: The utilized strategy may likewise be applied to other medical issues.


Subjects
Algorithms , Diabetes Mellitus/classification , Diabetes Mellitus/diagnosis , Adolescent , Adult , Aged , Aged 80 and over , Child , Preschool Child , Datasets as Topic , Female , Humans , India , Male , Middle Aged , Young Adult