1.
J Chem Phys ; 161(1), 2024 Jul 07.
Article in English | MEDLINE | ID: mdl-38958158

ABSTRACT

Sequential optimization is a promising approach to identifying the optimal candidate(s) (molecules, reactants, drugs, etc.) with desired properties (reaction yield, selectivity, efficacy, etc.) from a large set of potential candidates while minimizing the number of experiments required. However, the high dimensionality of the feature space (e.g., molecular descriptors) often makes it difficult to exploit the relevant features when updating the set of candidates to be examined. In this article, we developed a new sequential optimization algorithm for molecular problems based on reinforcement learning, a multi-armed linear bandit framework, and online, dynamic feature selection in which the relevant molecular descriptors are updated as the experiments proceed. We also designed a stopping condition intended to guarantee the reliability of the candidate chosen from the dataset pool. The algorithm was compared with Bayesian optimization (BO) on two synthetic datasets and two real datasets, one containing hydration free energies of molecules and the other containing free energy differences between enantiomer products in a chemical reaction. We found that, as an overall trend, dynamic feature selection during the experiments gives better performance (e.g., in the time required to find the best candidate and stop the experiment), and that our multi-armed linear bandit approach with dynamic feature selection outperforms standard BO with fixed feature variables. A comparison of our algorithm with BO using dynamic feature selection is also addressed.
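
To make the idea of a linear bandit over a fixed candidate pool concrete, the following is a minimal sketch of a generic LinUCB-style selection loop. It is not the algorithm developed in the article: the dynamic feature selection and the stopping condition described in the abstract are omitted, and the function names, the `alpha` and `ridge` parameters, and the toy pool are all assumptions made for illustration.

```python
import math

def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linucb_search(features, measure, budget, alpha=1.0, ridge=1.0):
    """Measure up to `budget` candidates, one at a time, LinUCB-style.

    features: list of feature vectors, one per candidate (hypothetical descriptors).
    measure:  callable(index) -> observed property, standing in for an experiment.
    Returns the (index, value) pairs in the order they were measured.
    """
    d = len(features[0])
    # A_inv is the inverse of (ridge * I + sum of x x^T); b accumulates value * x.
    A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [0.0] * d
    untested = set(range(len(features)))
    history = []
    for _ in range(min(budget, len(features))):
        theta = mat_vec(A_inv, b)  # ridge-regression estimate of the weights

        def ucb(i):
            # Predicted value plus an exploration bonus for uncertain directions.
            x = features[i]
            width = math.sqrt(max(0.0, dot(x, mat_vec(A_inv, x))))
            return dot(theta, x) + alpha * width

        pick = max(sorted(untested), key=ucb)
        value = measure(pick)
        untested.remove(pick)
        history.append((pick, value))
        # Sherman-Morrison rank-one update of the inverse after observing x.
        x = features[pick]
        Ax = mat_vec(A_inv, x)
        denom = 1.0 + dot(x, Ax)
        A_inv = [[A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(d)]
                 for i in range(d)]
        b = [bi + value * xi for bi, xi in zip(b, x)]
    return history

# Toy pool with a noiseless linear property (made-up numbers).
pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2], [0.2, 0.9]]
true_w = [2.0, -1.0]
history = linucb_search(pool, lambda i: dot(true_w, pool[i]), budget=3)
```

Each candidate is measured at most once, which mirrors the setting of choosing which experiment to run next from a finite pool. A feature-selection step would re-choose the columns of `features` between rounds.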

2.
PLoS One ; 17(5): e0267190, 2022.
Article in English | MEDLINE | ID: mdl-35617201

ABSTRACT

BACKGROUND AND OBJECTIVE: Low birth weight is one of the primary causes of child mortality and of several diseases later in life in developing countries, especially in Southern Asia. The main objective of this study is to determine the risk factors of low birth weight and to predict low birth weight babies using machine learning algorithms. MATERIALS AND METHODS: Low birth weight data were taken from the Bangladesh Demographic and Health Survey, 2017-18, which had 2351 respondents. The risk factors associated with low birth weight were investigated using binary logistic regression. Two machine learning-based classifiers (logistic regression and decision tree) were adopted to characterize and predict low birth weight. Model performance was evaluated by accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve. RESULTS: The average percentage of low birth weight in Bangladesh was 16.2%. The respondent's region, education, wealth index, height, twin child, and alive child were statistically significant risk factors for low birth weight babies. The logistic regression-based classifier achieved 87.6% accuracy and an area under the curve of 0.59 for holdout (90:10) cross-validation, whereas the decision tree achieved 85.4% accuracy and an area under the curve of 0.55. CONCLUSIONS: The logistic regression-based classifier provided the most accurate classification of low birth weight babies. This study's findings indicate the need for an efficient, cost-effective, and integrated complementary approach to reduce and correctly predict low birth weight babies in Bangladesh.


Subject(s)
Infant, Low Birth Weight , Machine Learning , Bangladesh/epidemiology , Birth Weight , Child , Humans , Infant , Infant, Newborn , Logistic Models , Risk Factors
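
The threshold-based metrics named in the abstract above (accuracy, sensitivity, specificity, positive and negative predictive value) all follow from the four counts of a 2×2 confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical, chosen only so that 16.2% of a 1000-case test set is positive, and are not taken from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return threshold-based metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts: 162 positives and 838 negatives (16.2% prevalence).
metrics = confusion_metrics(tp=40, fp=20, tn=818, fn=122)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.858 while the sensitivity is only about 0.247. At a prevalence of 16.2%, a rule that always predicts the majority class would already be about 83.8% accurate on such a sample, which is one reason to read accuracy alongside sensitivity and area under the curve.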
3.
PLoS One ; 16(6): e0253172, 2021.
Article in English | MEDLINE | ID: mdl-34138925

ABSTRACT

AIMS: Malnutrition is a major health issue among Bangladeshi under-five (U5) children. Children are malnourished if the calories and proteins they take in through their diet are not sufficient for their growth and maintenance. The goal of this research was to use machine learning (ML) algorithms to detect the risk factors of malnutrition (stunting, wasting, and underweight) and to predict malnutrition. METHODS: This work used malnutrition data derived from the Bangladesh Demographic and Health Survey conducted in 2014. The selected dataset consisted of 7079 children with 13 factors. The potential risk factors of malnutrition were identified by logistic regression (LR). Moreover, 3 ML classifiers (support vector machine (SVM), random forest (RF), and LR) were implemented for predicting malnutrition, and the performance of these ML algorithms was assessed on the basis of accuracy. RESULTS: The average prevalence of stunting, wasting, and underweight was 35.4%, 15.4%, and 32.8%, respectively. LR identified five risk factors each for stunting and underweight and four for wasting. RF classified stunted, wasted, and underweight children accurately and obtained the highest accuracy: 88.3% for stunted, 87.7% for wasted, and 85.7% for underweight. CONCLUSION: This research focused on identifying major risk factors for stunting, wasting, and underweight and on predicting these conditions using ML algorithms, which will aid policymakers in reducing malnutrition among Bangladesh's U5 children.


Subject(s)
Growth Disorders/etiology , Malnutrition/etiology , Thinness/etiology , Wasting Syndrome/etiology , Age Factors , Algorithms , Bangladesh , Child, Preschool , Diet , Female , Growth Disorders/epidemiology , Humans , Infant , Machine Learning , Male , Malnutrition/epidemiology , Prevalence , Risk Factors , Socioeconomic Factors , Thinness/epidemiology , Wasting Syndrome/epidemiology
4.
J Affect Disord ; 264: 157-162, 2020 03 01.
Article in English | MEDLINE | ID: mdl-32056745

ABSTRACT

BACKGROUND: Depressive symptoms are common among older people and are associated with disability, morbidity and mortality. The aim of this study was to determine the risk factors associated with depressive symptoms among older people in Bangladesh. METHODS: A cross-sectional survey was conducted among 400 people aged ≥65 years from the Meherpur district in Bangladesh. Depressive symptoms were measured by the 15-item Geriatric Depression Scale and categorized into: no depressive symptoms, mild, moderate and severe depressive symptoms. Information was also collected on socio-economic and demographic characteristics, health problems, feeling of loneliness, history of falls and concern about falling. Chi-square tests of association and multinomial logistic regression were performed to reveal the determinants of depressive symptoms. RESULTS: Just over half of the sample were female, aged 70+ years, and lived in rural areas. The prevalence of depressive symptoms was 55.5%, with 23.0% having mild, 19.0% moderate, and 13.5% severe levels of depressive symptoms. Older age, sex, residence, marital status, presence of co-morbidities, visual impairment, previous falls, loneliness, and fear of falling were the significant determinants of developing depressive symptoms. LIMITATIONS: A convenience sampling method was used for data collection among older people from selected communities in a district of Bangladesh. The results do not represent the entire population of Bangladesh. In addition, this was a cross-sectional study, so causality cannot be determined. CONCLUSION: Depressive symptoms among older people in Bangladesh are prevalent and need to be addressed. Public health programs and strategies are needed to reduce depressive symptoms among older adults in Bangladesh.


Subject(s)
Accidental Falls , Depression , Aged , Aged, 80 and over , Bangladesh/epidemiology , Cross-Sectional Studies , Depression/epidemiology , Fear , Female , Humans
5.
Health Inf Sci Syst ; 8(1): 7, 2020 Dec.
Article in English | MEDLINE | ID: mdl-31949894

ABSTRACT

BACKGROUND AND OBJECTIVES: Diabetes is a chronic disease characterized by high blood sugar. It may cause many complications such as stroke, kidney failure, and heart attack. About 422 million people worldwide were affected by diabetes in 2014, and the figure is projected to reach 642 million in 2040. The main objective of this study is to develop a machine learning (ML)-based system for predicting diabetic patients. MATERIALS AND METHODS: Logistic regression (LR) is used to identify the risk factors for diabetes based on p value and odds ratio (OR). We adopted four classifiers, naïve Bayes (NB), decision tree (DT), Adaboost (AB), and random forest (RF), to predict diabetic patients. Three partition protocols (K2, K5, and K10) were also adopted, and each protocol was repeated for 20 trials. The performance of these classifiers is evaluated using accuracy (ACC) and area under the curve (AUC). RESULTS: We used the diabetes dataset derived from the National Health and Nutrition Examination Survey conducted in 2009-2012. The dataset consists of 6561 respondents, with 657 diabetic patients and 5904 controls. The LR model shows that 7 of 14 factors (age, education, BMI, systolic BP, diastolic BP, direct cholesterol, and total cholesterol) are risk factors for diabetes. The overall ACC of the ML-based system is 90.62%. The combination of LR-based feature selection and an RF-based classifier gives 94.25% ACC and 0.95 AUC for the K10 protocol. CONCLUSION: The combination of LR-based feature selection and an RF-based classifier gave the best performance. This combination will be very helpful for predicting diabetic patients.
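
The partition protocols in the abstract above can be illustrated with a repeated K-fold split. The sketch below assumes that K2, K5, and K10 denote 2-, 5-, and 10-fold cross-validation and that each of the 20 trials reshuffles the data. The abstract does not spell out either point, so both are assumptions, as are the function name and the seed. The split is not stratified by class.

```python
import random

def repeated_kfold(n_samples, k, n_trials, seed=0):
    """Yield (trial, fold, train_idx, test_idx) for repeated K-fold splits."""
    rng = random.Random(seed)
    for trial in range(n_trials):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Fold f takes every k-th shuffled index, so fold sizes differ by at most one.
        folds = [indices[f::k] for f in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield trial, f, train_idx, test_idx

# 6561 respondents as in the abstract; 10 folds repeated over 20 trials.
splits = list(repeated_kfold(n_samples=6561, k=10, n_trials=20))
print(len(splits))  # number of train/test pairs to fit and score
```

A classifier would be fitted on `train_idx` and scored on `test_idx` for every pair, with ACC and AUC averaged over all folds and trials. With 657 positives among 6561 respondents, a stratified split that preserves the class ratio in each fold may be preferable in practice.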

6.
Tob Control ; 29(6): 692-694, 2020 11.
Article in English | MEDLINE | ID: mdl-31776264

ABSTRACT

BACKGROUND: Tobacco production continues to increase in low-income and middle-income countries, including Bangladesh. It has spread to different parts of Bangladesh and is now threatening food cultivation, the environment and health. The aim of this study is to determine the factors that influence farmers' decisions to grow tobacco. METHODS: We surveyed 371 tobacco farmers selected by simple random sampling in the Meherpur district of Bangladesh. Binary logistic regression was used to examine the variables affecting farmers' decision to cultivate tobacco. RESULTS: Approximately 87.0% of the respondents were contract farmers with different tobacco companies. Almost 83.3% of the farmers intended to continue tobacco farming. Binary logistic regression results suggest that companies' incentives to farmers, farmers' profitability, a guaranteed market for the tobacco crop and economic viability were the variables most affecting the decision to cultivate tobacco. CONCLUSIONS: Governments seeking to shift farmers away from tobacco will need to consider how to address the dynamics revealed in this research.


Subject(s)
Farmers , Nicotiana , Agriculture , Bangladesh , Humans , Income
7.
Comput Methods Programs Biomed ; 176: 173-193, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31200905

ABSTRACT

OBJECTIVE: Colon microarray data form a repository of thousands of gene expressions with different strengths for each cancer cell. It is necessary to detect which genes are responsible for cancer growth. This study presents an exhaustive comparison of different machine learning (ML) systems that serves two major purposes: (a) identification of high-risk differential genes using statistical tests and (b) development of an ML strategy for predicting cancer genes. METHODS: Four statistical tests, namely Wilcoxon sign rank sum (WCSRS), t test, Kruskal-Wallis (KW), and F-test, were applied for cancerous gene identification using their p-values. The extracted gene set was used to classify cancer patients using ten classifiers, namely linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), naïve Bayes (NB), Gaussian process classification (GPC), support vector machine (SVM), artificial neural network (ANN), logistic regression (LR), decision tree (DT), Adaboost (AB), and random forest (RF). Performance was then evaluated using cross-validation protocols and standardized metrics, viz. accuracy (ACC) and area under the curve (AUC). RESULTS: The colon cancer dataset consists of 2000 genes from 62 patients (40 cancer vs. 22 control). The overall mean ACC of our ML system using all four statistical tests and all ten classifiers was 90.50%. The ML system showed an ACC of 99.81% using a combination of the WCSRS test and an RF-based classifier. This is an improvement of 8% over previously published values in the literature. CONCLUSIONS: An RF-based model with statistical tests for detection of high-risk genes showed the best performance for accurate cancer classification in multi-center clinical trials.


Subject(s)
Colon/metabolism , Colonic Neoplasms/metabolism , Machine Learning , Tissue Array Analysis/methods , Area Under Curve , Bayes Theorem , Decision Trees , Discriminant Analysis , Gene Expression Profiling , Humans , Logistic Models , Models, Statistical , Neural Networks, Computer , Normal Distribution , Oncogenes , Regression Analysis , Risk , Sensitivity and Specificity , Support Vector Machine
8.
J Glob Health ; 8(1): 010417, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29740501

ABSTRACT

BACKGROUND: Child and neonatal mortality is a serious problem in Bangladesh. The main objective of this study was to determine the most significant socio-economic factors (covariates) between the years 2011 and 2014 that influence neonatal and child mortality and to suggest plausible policy proposals. METHODS: We modeled neonatal and child mortality as a categorical dependent variable (child alive vs. dead), with 16 covariates as independent variables, using the χ² statistic and multiple logistic regression (MLR) based on maximum likelihood estimation. FINDINGS: Using MLR, for neonatal mortality, diarrhea showed the highest positive coefficient (β = 1.130; P < 0.010), making it the most significant covariate for both 2011 and 2014. The corresponding odds ratio was 0.323 for both years. The second most significant covariate in 2011 was birth order between 2-6 years (β = 0.744; P < 0.001), while father's education showed a negative correlation (β = -0.910; P < 0.050). In general, 10 covariates in 2011 and 5 covariates in 2014 were significant, so there was an improvement in socio-economic conditions for neonatal mortality. For child mortality, birth order between 2-6 years and 7 and above years showed the highest positive coefficients (β = 1.042; P < 0.010) and (β = 1.285; P < 0.050) for 2011. The corresponding odds ratios were 2.835 and 3.614, respectively. Father's education showed the highest coefficient (β = 0.770; P < 0.050), making it the most significant covariate for 2014, and the corresponding odds ratio was 2.160. In general, 6 covariates in 2011 and 4 covariates in 2014 were significant, so there was also an improvement in socio-economic conditions for child mortality. CONCLUSIONS: In 2014, mother's age and father's education were still significant covariates for child mortality. This study allows policy makers to make appropriate decisions to reduce neonatal and child mortality in Bangladesh.


Subject(s)
Child Mortality/trends , Infant Mortality/trends , Adolescent , Adult , Bangladesh/epidemiology , Child, Preschool , Educational Status , Fathers/statistics & numerical data , Female , Humans , Infant , Infant, Newborn , Male , Maternal Age , Middle Aged , Risk Factors , Socioeconomic Factors , Young Adult
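
In logistic regression, the odds ratio for a one-unit change in a covariate is the exponential of its coefficient, OR = exp(β). The short check below applies that standard identity to the coefficient and odds-ratio pairs quoted in the abstract above. The dictionary labels are shorthand introduced here, not the study's variable names.

```python
import math

def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)

# (coefficient, odds ratio) pairs as quoted in the abstract.
reported = {
    "child mortality, birth order 2-6 (2011)": (1.042, 2.835),
    "child mortality, birth order 7+ (2011)": (1.285, 3.614),
    "child mortality, father's education (2014)": (0.770, 2.160),
    "neonatal mortality, diarrhea": (1.130, 0.323),
}

for label, (beta, quoted) in reported.items():
    print(f"{label}: exp(beta) = {odds_ratio(beta):.3f}, "
          f"exp(-beta) = {odds_ratio(-beta):.3f}, quoted = {quoted}")
```

The three child-mortality pairs agree with exp(β) to within rounding. For diarrhea, exp(1.130) is about 3.096, whereas the quoted 0.323 matches exp(-1.130). The abstract does not say whether a different reference category or sign convention was used for that covariate.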
9.
J Med Syst ; 42(5): 92, 2018 Apr 10.
Article in English | MEDLINE | ID: mdl-29637403

ABSTRACT

Diabetes mellitus is a group of metabolic diseases in which blood sugar levels are too high. About 8.8% of the world's population was diabetic in 2017, and this is projected to reach nearly 10% by 2045. The major challenge is that machine learning-based classifiers applied to data sets with missing values or outliers for risk stratification show lower performance. Thus, our objective is to develop an optimized and robust machine learning (ML) system under the assumption that missing values or outliers, if replaced by a median configuration, will yield higher risk stratification accuracy. This ML-based risk stratification is designed, optimized and evaluated as follows: features are extracted and optimized using six feature selection techniques (random forest, logistic regression, mutual information, principal component analysis, analysis of variance, and Fisher discriminant ratio) and combined with ten different types of classifiers (linear discriminant analysis, quadratic discriminant analysis, naïve Bayes, Gaussian process classification, support vector machine, artificial neural network, Adaboost, logistic regression, decision tree, and random forest) under the hypothesis that replacing both missing values and outliers by computed medians will improve the risk stratification accuracy. The Pima Indian diabetic dataset (768 patients: 268 diabetic and 500 controls) was used. Our results demonstrate that replacing the missing values and outliers by group median and median values, respectively, and then combining random forest feature selection with random forest classification yields an accuracy, sensitivity, specificity, positive predictive value, negative predictive value and area under the curve of 92.26%, 95.96%, 79.72%, 91.14%, 91.20%, and 0.93, respectively. This is an improvement of 10% over previously developed techniques published in the literature. The system was validated for its stability and reliability. The RF-based model showed the best performance when outliers were replaced by median values.


Subject(s)
Diabetes Mellitus/classification , Diabetes Mellitus/epidemiology , Machine Learning , Adult , Age Distribution , Artificial Intelligence , Bayes Theorem , Blood Glucose , Blood Pressure , Body Weights and Measures , Data Interpretation, Statistical , Decision Support Techniques , Female , Humans , Indians, North American , Male , Middle Aged , Neural Networks, Computer , Reproducibility of Results , Sex Distribution , United States
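
The preprocessing described in the abstract above, replacing missing values by the group median and outliers by the median, can be sketched as follows. The abstract does not state how outliers were defined, so the 1.5 × IQR rule used here is an assumption, as are the function names, the use of `None` to mark missing entries, and the sample readings, which are invented.

```python
from statistics import median, quantiles

def impute_group_median(values, labels):
    """Replace None entries with the median of observed values in the same class."""
    group_medians = {}
    for label in set(labels):
        observed = [v for v, l in zip(values, labels) if l == label and v is not None]
        group_medians[label] = median(observed)
    return [group_medians[l] if v is None else v for v, l in zip(values, labels)]

def replace_outliers_with_median(values, whisker=1.5):
    """Replace values outside [Q1 - w*IQR, Q3 + w*IQR] with the overall median."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    med = median(values)
    return [med if (v < low or v > high) else v for v in values]

# Invented readings for one feature; 1 = diabetic, 0 = control.
glucose = [148, None, 183, 89, 137, 116, 78, None, 197, 900]
labels = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0]

filled = impute_group_median(glucose, labels)
cleaned = replace_outliers_with_median(filled)
print(filled)
print(cleaned)
```

Here the two missing readings are filled with their class medians (102.5 for controls, 165.5 for diabetic patients), and the implausible value 900 is replaced by the overall median of the filled column, 142.5. In a full pipeline the same two steps would be applied to each feature column before feature selection and classification.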