Results 1 - 19 of 19
1.
Talanta ; 280: 126793, 2024 Dec 01.
Article in English | MEDLINE | ID: mdl-39222596

ABSTRACT

Dry matter content (DMC), firmness and soluble solid content (SSC) are important indicators for assessing the quality attributes and determining the maturity of kiwifruit. However, traditional measurement methods are time-consuming, labor-intensive, and destructive to the kiwifruit, leading to resource wastage. To solve this problem, this study tracked the flowering, fruiting, maturing and collecting processes of Ya'an red-heart kiwifruit and proposed a non-destructive method for kiwifruit quality attribute assessment and maturity identification that combines fluorescence hyperspectral imaging (FHSI) technology and chemometrics. Specifically, three different spectral data preprocessing methods were first adopted, and PLSR was used to evaluate the quality attributes (DMC, firmness, and SSC) of kiwifruit. Next, the differences in accuracy of different models in discriminating kiwifruit maturity were compared, and an ensemble learning model based on LightGBM and GBDT models was constructed. The results indicate that the ensemble learning model outperforms single machine learning models. In addition, the application effects of the 'Convolutional Neural Network'-'Multilayer Perceptron' (CNN-MLP) model under different optimization algorithms were compared. To improve the robustness of the model, an improved whale optimization algorithm (IWOA) was introduced by modifying the acceleration factor. Overall, the IWOA-CNN-MLP model performs the best in discriminating the maturity of kiwifruit, with a test accuracy of 0.916 and a loss of 0.23. In addition, compared with the basic model, the accuracy of the integrated learning model SG-MSC-SEL was improved by about 12%-20%. The research findings will provide new perspectives for the evaluation of kiwifruit quality and maturity discrimination using FHSI and chemometric methods, thereby promoting further research and applications in this field.


Subject(s)
Actinidia , Fruit , Hyperspectral Imaging , Actinidia/chemistry , Actinidia/growth & development , Hyperspectral Imaging/methods , Fruit/chemistry , Fruit/growth & development , Chemometrics , Neural Networks, Computer , Food Quality , Fluorescence , Quality Control
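
The abstract does not state how the LightGBM and GBDT outputs are combined in the ensemble. One common option is soft voting, where the per-class probabilities of the base models are averaged. The sketch below illustrates that idea only; the function name `soft_vote`, the three maturity classes, and the probability values are hypothetical and are not taken from the paper.

```python
def soft_vote(prob_sets, weights=None):
    """Average per-class probabilities from several models and pick the argmax.

    prob_sets: one probability list per model, all over the same classes.
    weights:   optional per-model weights (assumed to sum to 1).
    """
    n_models = len(prob_sets)
    weights = weights or [1.0 / n_models] * n_models
    n_classes = len(prob_sets[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets))
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: combined[c])
    return label, combined


# Hypothetical probabilities for one fruit over three made-up maturity classes.
lightgbm_probs = [0.20, 0.50, 0.30]
gbdt_probs = [0.10, 0.30, 0.60]
label, combined = soft_vote([lightgbm_probs, gbdt_probs])
```

Here the two models disagree on the most likely class, and the averaged probabilities decide the final label. A stacking scheme, which trains a second model on the base predictions, would be an alternative design.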
2.
BMC Med Inform Decis Mak ; 24(1): 198, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39039464

ABSTRACT

Genes, expressed as sequences of nucleotides, are susceptible to mutations, some of which can lead to cancer. Machine learning and deep learning methods have emerged as vital tools in identifying mutations associated with cancer. Thyroid cancer ranks as the 5th most prevalent cancer in the USA, with thousands diagnosed annually. This paper presents an ensemble learning model leveraging deep learning techniques such as Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs), and Bi-directional LSTM (Bi-LSTM) to detect thyroid cancer mutations early. The model is trained on a dataset sourced from asia.ensembl.org and IntOGen.org, consisting of 633 samples with 969 mutations across 41 genes, collected from individuals of various demographics. Feature extraction encompasses techniques including Hahn moments, central moments, raw moments, and various matrix-based methods. Evaluation employs three testing methods: self-consistency test (SCT), independent set test (IST), and 10-fold cross-validation test (10-FCVT). The proposed ensemble learning model demonstrates promising performance, achieving 96% accuracy in the independent set test (IST). Statistical measures such as training accuracy, testing accuracy, recall, sensitivity, specificity, Matthews Correlation Coefficient (MCC), loss, F1 Score, and Cohen's kappa are utilized for comprehensive evaluation.


Subject(s)
Deep Learning , Mutation , Thyroid Neoplasms , Humans , Thyroid Neoplasms/genetics , Thyroid Neoplasms/diagnosis , Disease Progression
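
Two of the measures named in the abstract, the Matthews Correlation Coefficient and Cohen's kappa, can both be computed from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function names and the example counts are illustrative and do not come from the paper.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the predicted and actual marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0
```

With balanced made-up counts such as 45 true positives, 5 false positives, 5 false negatives and 45 true negatives, both measures equal 0.8 even though plain accuracy is 0.9, which shows how they discount agreement expected by chance.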
3.
Foods ; 13(9)2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38731691

ABSTRACT

Sunflower is an important crop, and the vitality and moisture content of sunflower seeds have an important influence on the sunflower's planting and yield. By employing hyperspectral technology, the spectral characteristics of sunflower seeds within the wavelength range of 384-1034 nm were carefully analyzed with the aim of achieving effective prediction of seed vitality and moisture content. Firstly, the original hyperspectral data were subjected to preprocessing techniques such as Savitzky-Golay smoothing, standard normal variate correction (SNV), and multiplicative scatter correction (MSC) to effectively reduce noise interference, ensuring the accuracy and reliability of the data. Subsequently, principal component analysis (PCA), extreme gradient boosting (XGBoost), and stacked autoencoders (SAE) were utilized to extract key feature bands, enhancing the interpretability and predictive performance of the data. During the modeling phase, random forests (RFs) and LightGBM algorithms were separately employed to construct classification models for seed vitality and prediction models for moisture content. The experimental results demonstrated that the SG-SAE-LightGBM model exhibited outstanding performance in the classification task of sunflower seed vitality, achieving an accuracy rate of 98.65%. Meanwhile, the SNV-XGBoost-LightGBM model showed remarkable achievement in moisture content prediction, with a coefficient of determination (R2) of 0.9715 and root mean square error (RMSE) of 0.8349. In conclusion, this study confirms that the fusion of hyperspectral technology and multivariate data analysis algorithms enables the accurate and rapid assessment of sunflower seed vitality and moisture content, providing robust tools and theoretical support for seed quality evaluation and agricultural production practices.
Furthermore, this research not only expands the application of hyperspectral technology in unraveling the intrinsic vitality characteristics of sunflower seeds but also possesses significant theoretical and practical value.
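
Of the preprocessing steps listed in this abstract, SNV is the simplest to illustrate: each spectrum is centered on its own mean and scaled by its own standard deviation. The sketch below shows the usual formulation on a made-up three-band spectrum; it is not the authors' pipeline, and the use of the sample standard deviation is an assumption.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample standard deviation (assumed convention)
    return [(x - mean) / std for x in spectrum]


# Hypothetical reflectance values at three wavelengths.
corrected = snv([1.0, 2.0, 3.0])
```

Because the correction is applied per spectrum rather than per wavelength, it removes additive and multiplicative offsets between samples without needing a reference spectrum, which is the main practical difference from MSC.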

4.
J Hazard Mater ; 474: 134666, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-38815389

ABSTRACT

The Hartman Park community in Houston, Texas, USA, is in a highly polluted area, which poses significant risks to its predominantly Hispanic and lower-income residents. Being surrounded by a dense cluster of industrial facilities compounds health and safety hazards, exacerbating environmental and social inequalities. Such conditions emphasize the urgent need for environmental measures that focus on investigating ambient air quality. This study estimated benzene, one of the most reported pollutants in Hartman Park, using machine learning-based approaches. Benzene data was collected in residential areas in the neighborhood and analyzed using a combination of five machine-learning algorithms (i.e., XGBR, GBR, LGBMR, CBR, RFR) through a newly developed ensemble learning model. Evaluations of model robustness, overfitting tests, 10-fold cross-validation, and internal and stratified validation were performed. We found that the ensemble model explained about 98.7% of the spatial variability of benzene (Adj. R2 = 0.987). Through rigorous validations, stability of model performance was confirmed. Several predictors that contribute to benzene were identified, including temperature, developed intensity areas, leaking petroleum storage tanks, and traffic-related factors. Analyzing spatial patterns, we found high benzene spread over areas near industrial zones as well as in residential areas. Overall, our study area was exposed to high benzene levels and requires extra attention from relevant authorities.
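
The "Adj. R2" quoted in this abstract is presumably the adjusted coefficient of determination, which penalizes R2 for the number of predictors. A minimal sketch of the standard formula follows; the sample size and predictor count in the usage note are invented, since the abstract gives neither.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)
```

For example, `adjusted_r2(0.9, 101, 10)` gives roughly 0.889, slightly below the raw 0.9, and the gap widens as more predictors are added relative to the number of samples.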

5.
J Comput Chem ; 45(13): 953-968, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38174739

ABSTRACT

In the pursuit of novel antiretroviral therapies for human immunodeficiency virus type-1 (HIV-1) proteases (PRs), recent improvements in drug discovery have embraced machine learning (ML) techniques to guide the design process. This study employs ensemble learning models to identify crucial substructures as significant features for drug development. A collection of 160 darunavir (DRV) analogs was designed based on these key substructures and subsequently screened using molecular docking techniques. Chemical structures with high fitness scores were selected and combined, and one-dimensional (1D) screening based on beyond Lipinski's rule of five (bRo5) and ADME (absorption, distribution, metabolism, and excretion) prediction was performed, as implemented in the Combined Analog generator Tool (CAT) program. A total of 473 screened analogs were subjected to docking analysis through a convolutional neural network scoring function against both the wild-type (WT) and 12 major mutated PRs. DRV analogs with negative changes in binding free energy (ΔΔG_bind) compared to DRV could be categorized into four attractive groups based on their interactions with the majority of vital PRs. The analysis of interaction profiles revealed that potent designed analogs, targeting both WT and mutant PRs, exhibited interactions with common key amino acid residues. This observation further confirms that the ML model-guided approach effectively identified the substructures that play a crucial role in potent analogs. It is expected to function as a powerful computational tool, offering valuable guidance in the identification of chemical substructures for synthesis and subsequent experimental testing.


Subject(s)
HIV Infections , HIV Protease Inhibitors , HIV-1 , Humans , Darunavir/pharmacology , HIV Protease Inhibitors/pharmacology , HIV Protease Inhibitors/chemistry , Peptide Hydrolases/pharmacology , Molecular Docking Simulation , HIV Protease/chemistry , Drug Discovery
6.
Sensors (Basel) ; 23(15)2023 Jul 25.
Article in English | MEDLINE | ID: mdl-37571446

ABSTRACT

Efficient detection and evaluation of soybean seedling emergence is an important measure for making field management decisions. However, there are many indicators related to emergence, and using multiple models to detect them separately makes data processing too slow to aid timely field management. In this study, we aimed to integrate several deep learning and image processing methods to build a model to evaluate multiple soybean seedling emergence information. An unmanned aerial vehicle (UAV) was used to acquire soybean seedling RGB images at emergence (VE), cotyledon (VC), and first node (V1) stages. The number of soybean seedlings that emerged was obtained by the seedling emergence detection module, and image datasets were constructed using the seedling automatic cutting module. The improved AlexNet was used as the backbone network of the growth stage discrimination module. The above modules were combined to calculate the emergence proportion in each stage and determine soybean seedlings emergence uniformity. The results show that the seedling emergence detection module was able to identify the number of soybean seedlings with an average accuracy of 99.92%, a R2 of 0.9784, a RMSE of 6.07, and a MAE of 5.60. The improved AlexNet was more lightweight, training time was reduced, the average accuracy was 99.07%, and the average loss was 0.0355. The model was validated in the field, and the error between predicted and real emergence proportions was up to 0.0775 and down to 0.0060. It provides an effective ensemble learning model for the detection and evaluation of soybean seedling emergence, which can provide a theoretical basis for making decisions on soybean field management and precision operations and has the potential to evaluate other crops emergence information.


Subject(s)
Glycine max , Seedlings , Unmanned Aerial Devices , Crops, Agricultural , Machine Learning
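
The counting accuracy in this abstract is summarized with R2, RMSE and MAE. The sketch below computes the three measures from paired actual and predicted counts using the usual definitions. The plot counts are invented, and the paper may have derived R2 from a fitted regression line rather than from the residuals as done here.

```python
import math


def regression_metrics(actual, predicted):
    """Return (RMSE, MAE, R2) for paired actual and predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical seedling counts for three plots.
rmse, mae, r2 = regression_metrics([100, 200, 300], [110, 190, 300])
```

RMSE is never smaller than MAE, and the gap between them grows when a few plots have large errors, which is why the two are usually reported together.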
7.
J Transl Med ; 21(1): 485, 2023 07 20.
Article in English | MEDLINE | ID: mdl-37475016

ABSTRACT

BACKGROUND: The nuclear factor kappa B (NFκB) regulatory pathways downstream of tumor necrosis factor (TNF) play a critical role in carcinogenesis. However, the widespread influence of NFκB in cells can result in off-target effects, making it a challenging therapeutic target. Ensemble learning is a machine learning technique where multiple models are combined to improve the performance and robustness of the prediction. Accordingly, an ensemble learning model could uncover more precise targets within the NFκB/TNF signaling pathway for cancer therapy. METHODS: In this study, we trained an ensemble learning model on the transcriptome profiles from 16 cancer types in the TCGA database to identify a robust set of genes that are consistently associated with the NFκB/TNF pathway in cancer. Our model uses cancer patients as features to predict the genes involved in the NFκB/TNF signaling pathway and can be adapted to predict the genes for different cancer types by switching the cancer type of patients. We also performed functional analysis, survival analysis, and a case study of triple-negative breast cancer to demonstrate our model's potential in translational cancer medicine. RESULTS: Our model accurately identified genes regulated by NFκB in response to TNF in cancer patients. The downstream analysis showed that the identified genes are typically involved in the canonical NFκB-regulated pathways, particularly in adaptive immunity, anti-apoptosis, and cellular response to cytokine stimuli. These genes were found to have oncogenic properties and detrimental effects on patient survival. Our model also could distinguish patients with a specific cancer subtype, triple-negative breast cancer (TNBC), which is known to be influenced by NFκB-regulated pathways downstream of TNF. 
Furthermore, a functional module known as mononuclear cell differentiation was identified that accurately predicts TNBC patients and poor short-term survival in non-TNBC patients, providing a potential avenue for developing precision medicine for cancer subtypes. CONCLUSIONS: In conclusion, our approach enables the discovery of genes in NFκB-regulated pathways in response to TNF and their relevance to carcinogenesis. We successfully categorized these genes into functional groups, providing valuable insights for discovering more precise and targeted cancer therapeutics.


Subject(s)
NF-kappa B , Triple Negative Breast Neoplasms , Humans , NF-kappa B/genetics , NF-kappa B/metabolism , Triple Negative Breast Neoplasms/drug therapy , Tumor Necrosis Factor-alpha/genetics , Tumor Necrosis Factor-alpha/therapeutic use , Signal Transduction/genetics , Carcinogenesis , Machine Learning
8.
Front Plant Sci ; 13: 1047479, 2022.
Article in English | MEDLINE | ID: mdl-36438117

ABSTRACT

Moldy peanut seeds are damaged by mold, which seriously affects the germination rate of peanut seeds. At the same time, the quality and variety purity of peanut seeds profoundly affect the final yield of peanuts and the economic benefits of farmers. In this study, hyperspectral imaging technology was used to achieve variety classification and mold detection of peanut seeds. In addition, this paper proposed using median filtering (MF) to preprocess hyperspectral data, four variable selection methods to obtain characteristic wavelengths, and ensemble learning models (SEL) as a stable classification model. This paper compared the model performance of SEL with that of the extreme gradient boosting algorithm (XGBoost), the light gradient boosting algorithm (LightGBM), and the categorical boosting algorithm (CatBoost). The results showed that the MF-LightGBM-SEL model based on hyperspectral data achieved the best performance. Its prediction accuracy on the training and testing data reached 98.63% and 98.03%, respectively, and the modeling time was only 0.37 s, which demonstrated the potential of the model to be used in practice. The approach of SEL combined with hyperspectral imaging techniques facilitates the development of a real-time detection system. It could perform fast and non-destructive high-precision classification of peanut seed varieties and moldy peanuts, which is of great significance for improving crop yields.

9.
Alzheimers Dement (N Y) ; 8(1): e12351, 2022.
Article in English | MEDLINE | ID: mdl-36204350

ABSTRACT

Introduction: Geriatric patients with dementia incur higher healthcare costs and longer hospital stays than other geriatric patients. We aimed to identify risk factors for hospitalization outcomes that could be mitigated early to improve outcomes and impact overall quality of life. Methods: We identified risk factors, that is, demographics, hospital complications, pre-admission, and post-admission risk factors including medical history and comorbidities, affecting hospitalization outcomes determined by hospital stays and discharge dispositions. Over 150 clinical and demographic factors of 15,678 encounters (8407 patients) were retrieved from our institution's data warehouse. We further narrowed them down to twenty factors through feature selection engineering by using analysis of variance (ANOVA) and Glmnet. We developed an explainable machine-learning model to predict hospitalization outcomes among geriatric patients with dementia. Results: Our model is based on stacking ensemble learning and achieved accuracy of 95.6% and area under the curve (AUC) of 0.757. It outperformed prevalent methods of risk assessment for encounters of patients with Alzheimer's disease dementia (ADD) (4993), vascular dementia (VD) (4173), Parkinson's disease with dementia (PDD) (3735), and other unspecified dementias (OUD) (2777). Top identified hospitalization outcome risk factors, mostly from medical history, include encephalopathy, number of medical problems at admission, pressure ulcers, urinary tract infections, falls, admission source, age, race, anemia, etc., with several overlaps in multi-dementia groups. Discussion: Our model identified several predictive factors that can be modified or intervened so that efforts can be made to prevent recurrence or mitigate their adverse effects. 
Knowledge of the modifiable risk factors would help guide early interventions for patients at high risk for poor hospitalization outcome as defined by hospital stays longer than seven days, undesirable discharge disposition, or both. The interventions include starting specific protocols on modifiable risk factors like encephalopathy, falls, and infections, where non-existent or not routine, to improve hospitalization outcomes of geriatric patients with dementia. Highlights: A total of 15,678 encounters of geriatric patients with dementia, with a final 20 risk factors. Developed a predictive model for hospitalization outcomes for multi-dementia types. Risk factors for each type were identified, including those amenable to interventions. Top factors are encephalopathy, pressure ulcers, urinary tract infection (UTI), falls, and admission source. With accuracy of 95.6%, our ensemble predictive model outperforms other models.
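
This abstract reports both an accuracy (95.6%) and an AUC (0.757), which measure different things: accuracy depends on a decision threshold, while AUC is the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The sketch below computes AUC directly from that pairwise definition; the labels and scores are toy values, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of cases, so it is only suitable for illustration; rank-based implementations give the same value more efficiently.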

10.
Sensors (Basel) ; 22(13)2022 Jun 29.
Article in English | MEDLINE | ID: mdl-35808388

ABSTRACT

With the improvement of intelligence and interconnection, Internet of Things (IoT) devices tend to become more vulnerable and exposed to many threats. Device identification is the foundation of many cybersecurity operations, such as asset management, vulnerability reaction, and situational awareness, which are important for enhancing the security of IoT devices. The more information sources and the more angles of view we have, the more precise identification results we obtain. This study proposes a novel and alternative method for IoT device identification, which introduces commonly available WebUI login pages with distinctive characteristics specific to vendors as the data source and uses an ensemble learning model based on a combination of Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN) for device vendor identification and develops an Optical Character Recognition (OCR) based method for device type and model identification. The experimental results show that the ensemble learning model can achieve 99.1% accuracy and 99.5% F1-Score in the determination of whether a device is from a vendor that appeared in the training dataset, and if the answer is positive, 98% accuracy and 98.3% F1-Score in identifying which vendor it is from. The OCR-based method can identify fine-grained attributes of the device and achieve an accuracy of 99.46% in device model identification, which is higher than the results of the Shodan cyber search engine by a considerable margin of 11.39%.

11.
J Diabetes Metab Disord ; 21(1): 339-352, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35673418

ABSTRACT

Objective: Diabetes is a chronic fatal disease that has affected millions of people all over the globe. Type 2 Diabetes Mellitus (T2DM) accounts for 90% of the affected population among all types of diabetes. Millions of T2DM patients remain undiagnosed due to lack of awareness and under resourced healthcare system. So, there is a dire need for a diagnostic and prognostic tool that shall help the healthcare providers, clinicians and practitioners with early prediction and hence can recommend the lifestyle changes required to stop the progression of diabetes. The main objective of this research is to develop a framework based on machine learning techniques using only lifestyle indicators for prediction of T2DM disease. Moreover, prediction model can be used without visiting clinical labs and hospital readmissions. Method: A proposed framework is presented and implemented based on machine learning paradigms using lifestyle indicators for better prediction of T2DM disease. The current research has involved different experts like Diabetologists, Endocrinologists, Dieticians, Nutritionists, etc. for selecting the contributing 1552 instances and 11 attributes lifestyle biological features to promote health and manage complications towards T2DM disease. The dataset has been collected through survey and google forms from different geographical regions. Results: Seven machine learning classifiers were employed namely K-Nearest Neighbour (KNN), Linear Regression (LR), Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), Random Forest (RF) and Gradient Boosting (GB). Gradient Boosting classifier outperformed best with an accuracy rate of 97.24% for training and 96.90% for testing separately followed by RF, DT, NB, SVM, LR, and KNN as 95.36%, 92.52%, 90.72%, 90.20%, 90.20% and 77.06% respectively. However, in terms of precision, RF achieved high performance (0.980%) and KNN performed the lowest (0.793%). 
As far as recall is being concerned, GB achieved the highest rate of 0.975% and KNN showed the worst rate of 0.774%. Also, GB is top performed in terms of f1-score. According to the ROCs, GB and NB had a better area under the curve compared to the others. Conclusion: The research developed a realistic health management system for T2DM disease based on machine learning techniques using only lifestyle data for prediction of T2DM. To extend the current study, these models shall be used for different, large and real-time datasets which share the commonality of data with T2DM disease to establish the efficacy of the proposed system.
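
The classifiers in this abstract are ranked by precision, recall and F1-score. For reference, the sketch below computes the three from confusion-matrix counts using the standard definitions, all of which lie between 0 and 1; the counts in the usage note are made up.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```

For instance, `precision_recall_f1(8, 2, 2)` returns 0.8 for all three measures, while `precision_recall_f1(9, 1, 9)` gives a precision of 0.9 but a recall of 0.5, with F1 pulled toward the lower of the two.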

12.
Comput Biol Med ; 146: 105645, 2022 07.
Article in English | MEDLINE | ID: mdl-35751183

ABSTRACT

Deep learning is a machine learning technique that has revolutionized the research community due to its impressive results on various real-life problems. Recently, ensembles of Convolutional Neural Networks (CNN) have proven to achieve high robustness and accuracy in numerous computer vision challenges. As expected, the more models we add to the ensemble, the better performance we can obtain, but, in contrast, more computer resources are needed. Hence, deciding how many models to use and which models to select from a pool of trained models is of great importance. For the latter, a common strategy in deep learning is to select the models randomly or according to the results on the validation set. However, in this way models are chosen based on individual performance, ignoring how they are expected to work together. Alternatively, to ensure a better complement between models, an exhaustive search can be used by evaluating the performance of several ensemble models based on different numbers and combinations of trained models. Nevertheless, this may be highly computationally expensive. Considering that epistemic uncertainty analysis has recently been successfully employed to understand model learning, we aim to analyze whether an uncertainty-aware epistemic method can help us decide which groups of CNN models may work best. The method was validated on several food datasets and with different CNN architectures. In most cases, our proposal outperforms the baseline techniques by a statistically significant margin and is much less computationally expensive than the brute-force search.


Subject(s)
Machine Learning , Neural Networks, Computer , Uncertainty
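
The cost of the exhaustive search mentioned in this abstract is easy to quantify: with a pool of n trained models, the number of candidate ensembles grows exponentially in n. The sketch below counts the subsets of a given minimum size; the pool sizes and the minimum of two members are illustrative assumptions, not figures from the paper.

```python
from math import comb


def count_candidate_ensembles(pool_size, min_size=2):
    """Number of distinct model subsets with at least `min_size` members."""
    return sum(comb(pool_size, k) for k in range(min_size, pool_size + 1))


small_pool = count_candidate_ensembles(5)
large_pool = count_candidate_ensembles(20)
```

A pool of 5 models yields 26 candidate ensembles, while a pool of 20 yields over a million, each of which would need its own validation pass. This is the motivation for selection heuristics that avoid enumerating every combination.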
13.
Article in English | MEDLINE | ID: mdl-36613022

ABSTRACT

The cavity length, which is a vital index in aeration and corrosion reduction engineering, is affected by many factors and is challenging to calculate. In this study, 10-fold cross-validation was performed to select the optimal input configuration. Additionally, the hyperparameters of three ensemble learning models-random forest (RF), gradient boosting decision tree (GBDT), and extreme gradient boosting tree (XGBOOST)-were fine-tuned by the Bayesian optimization (BO) algorithm to improve the prediction accuracy and compare the five empirical methods. The XGBOOST method was observed to present the highest prediction accuracy. Further interpretability analysis carried out using the Sobol method demonstrated its ability to reasonably capture the varying relative significance of different input features under different flow conditions. The Sobol sensitivity analysis also observed two patterns of extracting information from the input features in ML models: (1) the main effect of individual features in ensemble learning and (2) the interactive effect between each feature in SVR. From the results, the models obtaining individual information both predict the cavity length more accurately than that using interactive information. Subsequently, the XGBOOST captures more correct information from features, which leads to the varied Sobol index in accordance with outside phenomena; meanwhile, the predicted results fit the experimental points best.


Subject(s)
Algorithms , Learning , Bayes Theorem , Corrosion , Machine Learning
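
Several abstracts in this list, including this one, rely on 10-fold cross-validation. The sketch below shows how sample indices can be partitioned into k folds so that each sample is held out exactly once. It is a plain illustration without shuffling or stratification, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds; return (train, test) index lists."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(25, k=10)
```

In practice the indices are usually shuffled before splitting, since contiguous folds can be biased when the data are ordered, for example by flow condition or acquisition date.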
14.
Article in English | MEDLINE | ID: mdl-34873580

ABSTRACT

Predicting students at risk of academic failure is valuable for higher education institutions to improve student performance. During the pandemic, with the transition to compulsory distance learning in higher education, it has become even more important to identify these students and make instructional interventions to avoid leaving them behind. This goal can be achieved by new data mining techniques and machine learning methods. This study took both the synchronous and asynchronous activity characteristics of students into account to identify students at risk of academic failure during the pandemic. Additionally, this study proposes an optimal ensemble model predicting students at risk using a combination of relevant machine learning algorithms. Performances of over two thousand university students were predicted with an ensemble model in terms of gender, degree, number of downloaded lecture notes and course materials, total time spent in online sessions, number of attendances, and quiz score. Asynchronous learning activities were found more determinant than synchronous ones. The proposed ensemble model made a good prediction with a specificity of 90.34%. Thus, practitioners are suggested to monitor and organize training activities accordingly.

15.
BMC Nephrol ; 22(1): 372, 2021 11 09.
Article in English | MEDLINE | ID: mdl-34753430

ABSTRACT

OBJECTIVE: To assess the clinical practicability of the ensemble learning model established by Liu et al. in estimating glomerular filtration rate (GFR) and validate whether it is a better model than the Asian modified Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation in a cohort of Chinese chronic kidney disease (CKD) patients in an external validation study. METHODS: According to the ensemble learning model and the Asian modified CKD-EPI equation, we calculated estimated GFRensemble and GFRCKD-EPI, separately. Diagnostic performance of the two models was assessed and compared by correlation coefficient, regression equation, Bland-Altman analysis, bias, precision and P30 under the premise of 99mTc-diethylenetriaminepentaacetic acid (99mTc-DTPA) dual plasma sample clearance method as reference method for GFR measurement (mGFR). RESULTS: A total of 158 Chinese CKD patients were included in our external validation study. The GFRensemble was highly related with mGFR, with the correlation coefficient of 0.94. However, regression equation of GFRensemble = 0.66*mGFR + 23.05, the regression coefficient was far away from one, and the intercept was wide. Compared with the Asian modified CKD-EPI equation, the diagnostic performance of the ensemble learning model also demonstrated a wider 95% limit of agreement in Bland-Altman analysis (52.6 vs 42.4 ml/min/1.73 m2), a poorer bias (8.0 vs 1.0 ml/min/1.73 m2, P = 0.02), an inferior precision (18.4 vs 12.7 ml/min/1.73 m2, P < 0.001) and a lower P30 (58.9% vs 74.1%, P < 0.001). CONCLUSIONS: Our study showed that the ensemble learning model cannot replace the Asian modified CKD-EPI equation for the first choice for GFR estimation in overall Chinese CKD patients.


Subject(s)
Glomerular Filtration Rate , Machine Learning , Renal Insufficiency, Chronic/diagnosis , Age Factors , Asian People , China , Female , Humans , Male , Middle Aged , Radionuclide Imaging , Radiopharmaceuticals , Regression Analysis , Renal Insufficiency, Chronic/diagnostic imaging , Sex Factors , Technetium Tc 99m Pentetate
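
The agreement measures used in this abstract can be sketched from paired estimated and measured GFR values. The code below assumes common conventions: bias as the mean difference, Bland-Altman limits of agreement as bias ± 1.96 standard deviations of the differences, and P30 as the percentage of estimates within 30% of the measured value. The paper may define bias or precision differently (for example with the median and interquartile range), and the GFR values are invented.

```python
import statistics


def agreement_stats(estimated, measured):
    """Return (bias, limits_of_agreement, P30) for paired GFR values."""
    diffs = [e - m for e, m in zip(estimated, measured)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    # P30: share of estimates deviating from the measured value by at most 30%.
    within_30 = sum(abs(e - m) <= 0.30 * m for e, m in zip(estimated, measured))
    p30 = 100.0 * within_30 / len(measured)
    return bias, limits, p30


# Hypothetical values in ml/min/1.73 m2.
bias, limits, p30 = agreement_stats([60, 90, 50, 130], [50, 100, 50, 90])
```

A high correlation coefficient does not guarantee good agreement: an estimator can track the reference closely while being systematically shifted or compressed, which is what the regression slope of 0.66 in the abstract suggests.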
16.
Environ Sci Technol ; 55(10): 7157-7166, 2021 05 18.
Article in English | MEDLINE | ID: mdl-33939421

ABSTRACT

Inhaling radon and its progeny is associated with adverse health outcomes. However, previous studies of the health effects of residential exposure to radon in the United States were commonly based on a county-level temporally invariant radon model that was developed using measurements collected in the mid- to late 1980s. We developed a machine learning model to predict monthly radon concentrations for each ZIP Code Tabulation Area (ZCTA) in the Greater Boston area based on 363,783 short-term measurements by Spruce Environmental Technologies, Inc., during the period 2005-2018. A two-stage ensemble-based model was developed to predict radon concentrations for all ZCTAs and months. Stage one included 12 base statistical models that independently predicted ZCTA-level radon concentrations based on geological, architectural, socioeconomic, and meteorological factors for each ZCTA. Stage two aggregated the predictions of these 12 base models using an ensemble learning method. The results of a 10-fold cross-validation showed that the stage-two model has a good prediction accuracy with a weighted R2 of 0.63 and root mean square error of 22.6 Bq/m3. The community-level time-varying predictions from our model have good predictive precision and accuracy and can be used in future prospective epidemiological studies in the Greater Boston area.


Subject(s)
Air Pollutants, Radioactive , Air Pollution, Indoor , Radon , Air Pollutants, Radioactive/analysis , Air Pollution, Indoor/analysis , Boston , Housing , Machine Learning , Models, Statistical , Radon/analysis , United States
17.
J Magn Reson Imaging ; 51(4): 1223-1234, 2020 04.
Article in English | MEDLINE | ID: mdl-31456317

ABSTRACT

BACKGROUND: Accurate detection and localization of prostate cancer (PCa) in men undergoing prostate MRI is a fundamental step for future targeted prostate biopsies and treatment planning. Fully automated localization of peripheral zone (PZ) PCa using the apparent diffusion coefficient (ADC) map might be clinically useful. PURPOSE/HYPOTHESIS: To describe automated localization of PCa in the PZ on ADC map MR images using an ensemble U-Net-based model. STUDY TYPE: Retrospective, case-control. POPULATION: In all, 226 patients (154 and 72 patients with and without clinically significant PZ PCa, respectively); training and testing were performed using dataset images of 146 and 80 patients, respectively. FIELD STRENGTH: 3T, ADC maps. SEQUENCE: ADC map. ASSESSMENT: The ground truth was established by manual delineation of the prostate and prostate PZ tumors on ADC maps by dedicated radiologists using MRI-radical prostatectomy maps as a reference standard. STATISTICAL TESTS: Performance of the ensemble model was evaluated using Dice similarity coefficient (DSC), sensitivity, and specificity metrics on a per-slice basis. Receiver operating characteristic (ROC) curve and area under the curve (AUC) were employed as well. The paired t-test was used to test the differences between the performances of constituent networks of the ensemble model. RESULTS: Our developed algorithm yielded DSC, sensitivity, and specificity of 86.72% ± 9.93%, 85.76% ± 23.33%, and 76.44% ± 23.70%, respectively (mean ± standard deviation) on 80 test cases consisting of 41 and 39 instances from patients with and without clinically significant tumors, including 660 extracted 2D slices. AUC was reported as 0.779. DATA CONCLUSION: An ensemble U-Net-based approach can accurately detect and segment PCa in the PZ from ADC map MR prostate images. LEVEL OF EVIDENCE: 4. TECHNICAL EFFICACY: Stage 1. J. Magn. Reson. Imaging 2020;51:1223-1234.
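The metrics named in this abstract have standard definitions that a short sketch can make concrete. The example below computes the Dice similarity coefficient, sensitivity, and specificity for one flattened binary mask. The abstract does not say whether sensitivity and specificity were computed at the pixel or slice level, so the pixel-level treatment here is an assumption, and the masks are made-up toy data.

```python
def confusion_counts(truth, pred):
    """Count true/false positives and negatives for two binary masks."""
    tp = fp = fn = tn = 0
    for t, p in zip(truth, pred):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def dice(tp, fp, fn):
    """Dice similarity coefficient: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    # Two empty masks agree perfectly, so return 1.0 by convention.
    return 2 * tp / denom if denom else 1.0

def sensitivity(tp, fn):
    """Fraction of true tumor pixels that were detected."""
    return tp / (tp + fn) if (tp + fn) else 1.0

def specificity(tn, fp):
    """Fraction of non-tumor pixels correctly left unlabeled."""
    return tn / (tn + fp) if (tn + fp) else 1.0

# Toy flattened slice: 1 marks tumor pixels (hypothetical data).
truth = [0, 1, 1, 1, 0, 0, 0, 0]
pred = [0, 1, 1, 0, 1, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(truth, pred)
slice_dsc = dice(tp, fp, fn)
slice_sens = sensitivity(tp, fn)
slice_spec = specificity(tn, fp)
```

Averaging such per-slice values over all test slices would give mean and standard deviation figures of the kind quoted in the RESULTS sentence.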


Subject(s)
Diffusion Magnetic Resonance Imaging , Prostatic Neoplasms , Humans , Machine Learning , Magnetic Resonance Imaging , Male , Prostatic Neoplasms/diagnostic imaging , Retrospective Studies
18.
Sensors (Basel) ; 19(18)2019 Sep 05.
Article in English | MEDLINE | ID: mdl-31492034

ABSTRACT

Drift is an important issue that impairs the reliability of sensors, especially gas sensors. The conventional method usually adopts a reference gas to compensate for the drift. However, its classification accuracy is not high. In this paper, we propose a supervised learning algorithm based on multi-classifier integration for drift compensation, which incorporates drift compensation into the classification process, motivated by the fact that the goal of drift compensation is to improve the classification performance. In our method, with the obtained characteristics of sensors and the advantage of Support Vector Machine (SVM) in few-shot classification, the improved Long Short-Term Memory (LSTM) is integrated to build the multi-class classifier model. We tested the proposed approach on a publicly available time series dataset that was collected over three years by metal-oxide gas sensors. The results clearly indicate the superiority of the multiple-classifier approach, which, using an ensemble of classifiers, achieves higher classification accuracy than the other approaches during the testing period in the presence of sensor drift over time.

19.
Sensors (Basel) ; 19(10)2019 May 19.
Article in English | MEDLINE | ID: mdl-31109126

ABSTRACT

Human activity recognition (HAR) has gained considerable attention in recent years due to its high demand in different domains. In this paper, a novel HAR system based on a cascade ensemble learning (CELearning) model is proposed. Each layer of the proposed model comprises Extreme Gradient Boosting Trees (XGBoost), Random Forest, Extremely Randomized Trees (ExtraTrees) and Softmax Regression, and the model goes deeper layer by layer. The initial input vectors sampled from smartphone accelerometer and gyroscope sensors are trained separately by four different classifiers in the first layer, and the probability vectors representing the different classes to which each sample belongs are obtained. The initial input data and the probability vectors are concatenated and used as input to the next layer's classifiers, and the final prediction is obtained from the classifiers of the last layer. This system achieved satisfactory classification accuracy on two public HAR datasets based on smartphone accelerometer and gyroscope sensors. The experimental results show that the proposed approach achieves better classification accuracy for HAR than existing state-of-the-art methods, and the training process of the model is simple and efficient.
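The cascade data flow in this abstract (each layer's class-probability vectors are concatenated with the initial input and passed to the next layer) can be sketched in plain Python. Everything named below is hypothetical: `CentroidClassifier` is a toy stand-in for the four real learners, two classifiers per layer are used instead of four, and averaging the last layer's probabilities is an assumed way of forming the final prediction.

```python
from math import exp

class CentroidClassifier:
    """Toy base learner: softmax over negative squared distances to centroids."""

    def __init__(self, temperature=1.0):
        self.temperature = temperature
        self.centroids = {}

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        # One centroid per class, stored in sorted label order.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in sorted(groups.items())
        }
        return self

    def predict_proba(self, X):
        out = []
        for row in X:
            scores = [
                -sum((a - b) ** 2 for a, b in zip(row, c)) / self.temperature
                for c in self.centroids.values()
            ]
            top = max(scores)  # subtract the max for numerical stability
            exps = [exp(s - top) for s in scores]
            total = sum(exps)
            out.append([e / total for e in exps])
        return out

def augment(X, proba_sets):
    """Concatenate each initial input row with every classifier's probabilities."""
    return [
        list(row) + [p for probs in per_clf for p in probs]
        for row, per_clf in zip(X, zip(*proba_sets))
    ]

def fit_cascade(X, y, n_layers=2, temperatures=(0.5, 2.0)):
    """Train layer by layer; each layer sees the input plus the previous probabilities."""
    layers, features = [], X
    for _ in range(n_layers):
        clfs = [CentroidClassifier(t).fit(features, y) for t in temperatures]
        layers.append(clfs)
        features = augment(X, [c.predict_proba(features) for c in clfs])
    return layers

def predict_cascade(layers, X):
    """Run the cascade, then average the last layer's probabilities (assumed rule)."""
    features = X
    for clfs in layers:
        probas = [c.predict_proba(features) for c in clfs]
        features = augment(X, probas)
    labels = list(layers[-1][0].centroids)
    preds = []
    for per_clf in zip(*probas):
        mean = [sum(col) / len(per_clf) for col in zip(*per_clf)]
        preds.append(labels[mean.index(max(mean))])
    return preds

# Hypothetical two-feature samples with two activity classes.
X_train = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y_train = [0, 0, 1, 1]
cascade = fit_cascade(X_train, y_train)
predictions = predict_cascade(cascade, [[0.1, 0.0], [1.0, 1.0]])
```

This sketch feeds in-sample probabilities to the next layer for brevity; cascade implementations commonly use cross-validated probabilities instead to limit overfitting, and the abstract does not say which the authors used.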


Subject(s)
Biosensing Techniques/methods , Human Activities , Monitoring, Physiologic , Algorithms , Humans , Smartphone