Results 1 - 18 of 18
1.
Neuroimage ; 285: 120495, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38092156

ABSTRACT

This study presents a comprehensive examination of sex-related differences in resting-state electroencephalogram (EEG) data, leveraging two different types of machine learning models to predict an individual's sex. We used data from the Two Decades-Brainclinics Research Archive for Insights in Neurophysiology (TDBRAIN) EEG study, confirming that sex prediction can be achieved with noteworthy accuracy. The best-performing model achieved an accuracy of 85% and an ROC AUC of 89%, surpassing all prior benchmarks set using EEG data and rivaling the top results derived from fMRI studies. A comparative analysis of LightGBM and Deep Convolutional Neural Network (DCNN) models revealed the DCNN's superior performance, attributed to its ability to learn complex spatio-temporal patterns in the EEG data and to handle large volumes of data effectively. Despite this, interpretability remained a challenge for the DCNN model. The LightGBM interpretability analysis revealed that the most important EEG features for accurate sex prediction were related to left fronto-central and parietal EEG connectivity. We also showed the role of both low-frequency (delta and theta) and high-frequency (beta and gamma) activity in accurate sex prediction. These results, however, must be approached with caution, because they were obtained from a dataset composed largely of participants with various mental health conditions, which limits their generalizability and necessitates further validation in future studies. Overall, the study illuminates the potential of interpretable machine learning for sex prediction, while highlighting the importance of considering individual differences when predicting sex from brain activity.


Subjects
Brain; Neural Networks, Computer; Humans; Brain/physiology; Machine Learning; Magnetic Resonance Imaging; Electroencephalography/methods
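The abstract reports both accuracy and ROC AUC, and the two metrics answer different questions. Accuracy depends on a decision threshold. ROC AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A minimal sketch of both, using hypothetical labels (1 = one sex, 0 = the other) and hypothetical predicted probabilities, follows. It is not the authors' evaluation code, and the 0.5 threshold is an assumption.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


labels = [1, 1, 0, 1, 0, 0]               # hypothetical true classes
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]   # hypothetical model probabilities
print(roc_auc(labels, scores), accuracy(labels, scores))
```

The same scores can give a high AUC but a modest accuracy. A few well-ranked cases may still fall on the wrong side of a fixed threshold.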
2.
J Comput Chem ; 45(7): 368-376, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-37909259

ABSTRACT

The concept of chemical bonding is a crucial aspect of chemistry that aids in understanding the complexity and reactivity of molecules and materials. However, the interpretation of chemical bonds can be hindered by the choice of the theoretical approach and the specific method utilized. This study aims to investigate the effect of choosing different density functionals on the interpretation of bonding achieved through energy decomposition analysis (EDA). To achieve this goal, a data set was created, representing four bonding groups and various combinations of functionals and dispersion correction schemes. The calculations showed significant variation among the different functionals for the EDA terms, with the dispersion correction terms exhibiting the highest variability. More information was extracted by using machine learning in combination with dimensionality reduction on the data set. Results indicate that, despite the differences in the EDA terms obtained from different functionals, the functional has the least significant impact, suggesting minimal influence on the bonding interpretation.

3.
Environ Sci Technol ; 58(26): 11492-11503, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38904357

ABSTRACT

Soil organic carbon (SOC) plays a vital role in global carbon cycling and sequestration, underpinning the need for a comprehensive understanding of its distribution and controls. This study explores the importance of various covariates for SOC spatial distribution at both local (up to 1.25 km) and continental (USA) scales using a deep learning approach. Our findings highlight the significant role of terrain attributes in predicting SOC concentration distribution, with terrain contributing approximately one-third of the overall prediction at the local scale. At the continental scale, climate is only 1.2 times more important than terrain in predicting SOC distribution, whereas at the local scale, the structural pattern of terrain is 14 and 2 times more important than climate and vegetation, respectively. We underscore that terrain attributes, while integral to SOC distribution at all scales, are stronger predictors at the local scale when explicit spatial arrangement information is included. While this observational study does not assess causal mechanisms, our analysis nonetheless offers a nuanced perspective on SOC spatial distribution, suggesting disparate predictors of SOC at local and continental scales. The insights gained from this study have implications for improved SOC mapping, decision support tools, and land management strategies, aiding the development of effective carbon sequestration initiatives and enhancing climate mitigation efforts.


Subjects
Carbon; Climate; Soil; Soil/chemistry; Carbon Cycle; Carbon Sequestration
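The abstract expresses covariate importance as ratios, for example climate being only 1.2 times more important than terrain at the continental scale. The abstract does not say how importance was computed. Permutation importance is one common model-agnostic option, sketched below as an assumption rather than as the authors' method. The fitted model and the "terrain" and "climate" covariates are hypothetical stand-ins on synthetic data.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when one column is shuffled, breaking its link to y."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j with its shuffled copy, leaving other columns intact.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += mse(y, [predict(row) for row in X_perm]) - baseline
        importances.append(total / n_repeats)
    return importances


def predict(row):
    """Hypothetical fitted model: SOC responds far more to terrain than to climate."""
    terrain, climate = row
    return 3.0 * terrain + 0.5 * climate


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [predict(row) for row in X]  # synthetic held-out targets
imp = permutation_importance(predict, X, y)
print(imp, imp[0] / imp[1])  # ratio is the "times more important" figure
```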
4.
Children (Basel) ; 11(7)2024 Jun 24.
Article in English | MEDLINE | ID: mdl-39062212

ABSTRACT

Artificial intelligence has been applied to medical diagnosis and decision-making, but it has not been used to classify Class III malocclusions in children. OBJECTIVE: This study aims to propose an innovative machine learning (ML)-based diagnostic model that automatically classifies dental, skeletal and functional Class III malocclusions. METHODS: The collected data comprised 46 cephalometric feature measurements from children aged 4-14 years (n = 666). The data set was divided into a training set and a test set in a 7:3 ratio. First, we employed the Recursive Feature Elimination (RFE) algorithm to filter the 46 input parameters, selecting 14 significant features. Next, we constructed 10 ML models, trained them on the 14 significant features from the training set using ten-fold cross-validation, and evaluated the models' average accuracy on the test set. Finally, we conducted an interpretability analysis of the optimal model using the ML model interpretability tool SHapley Additive exPlanations (SHAP). RESULTS: The top five models ranked by their area under the curve (AUC) values were: GPR (0.879), RBF SVM (0.876), QDA (0.876), Linear SVM (0.875) and L2 logistic (0.869). The DeLong test showed no statistically significant difference between GPR and the other models (p > 0.05). GPR, which had the highest AUC, was therefore selected as the optimal model. The SHAP feature importance plot revealed that the top five features were SN-GoMe (the ratio of the length of the anterior skull base SN to that of the mandibular base GoMe), U1-NA (maxillary incisor angulation to the NA plane), Overjet (the distance between two lines perpendicular to the functional occlusal plane from U1 and L), ANB (the difference between angles SNA and SNB), and AB-NPo (the angle between the AB and N-Pog lines). CONCLUSIONS: Our findings suggest that ML models based on cephalometric data could effectively assist dentists in classifying dental, functional and skeletal Class III malocclusions in children. In addition, features such as SN-GoMe, U1-NA and Overjet can serve as important indicators for predicting the severity of Class III malocclusions.
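The METHODS describe a 7:3 train/test split, with ten-fold cross-validation run only inside the training portion. That design can be sketched with index bookkeeping alone. This is a minimal illustration that assumes a simple random, unstratified split with a fixed seed. The study's actual procedure may differ; for example, it may have stratified by malocclusion type.

```python
import random


def train_test_split(n, test_fraction=0.3, seed=0):
    """Shuffle indices 0..n-1 and hold out a test fraction (7:3 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists; fold sizes differ by at most one."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


train_idx, test_idx = train_test_split(666)  # n = 666 children, as in the abstract
for fold_train, fold_val in k_fold(train_idx, k=10):
    pass  # fit each candidate model on fold_train and score it on fold_val
# The held-out test indices are touched only once, for the final evaluation.
print(len(train_idx), len(test_idx))
```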

5.
J Hazard Mater ; 471: 134426, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38688220

ABSTRACT

Aggregation of nanoplastics (NPs), which is driven by multiple environmental and polymer factors, determines their bioavailability and risks in natural aquatic environments. A back propagation artificial neural network (BP-ANN) machine learning model (R² = 0.814) could fit the complex NP aggregation behavior, and feature importance followed the order surface charge of NPs > dissolved organic matter (DOM) > functional group of NPs > ionic strength and pH > concentration of NPs. Meta-analysis results showed that the following contribute to NP aggregation, consistent with the machine learning predictions: low surface charge of NPs (0 ≤ |ζ| < 10 mV); low concentration (< 1 mg/L) and low molecular weight (< 10 kg/mol) of DOM; NPs with amino groups; high ionic strength (IS > 700 mM); acidic solutions; and high concentrations (≥ 20 mg/L) of smaller NPs (< 100 nm). Feature interactions changed NP aggregation either synergistically (e.g., DOM and pH) or antagonistically (e.g., DOM and cation potential). On this basis, NPs were predicted to aggregate during the dry period and in the estuary of Poyang Lake. Future research should strengthen work on the aggregation of NPs with different particle sizes, shapes, and functional groups, on the heteroaggregation of NPs with coexisting particles, and on aging effects. This study supports better assessments of the fate and risks of NPs in the environment.

6.
Digit Biomark ; 8(1): 59-74, 2024.
Article in English | MEDLINE | ID: mdl-38650695

ABSTRACT

Introduction: Alzheimer's disease (AD) is a progressive neurological disorder characterized by mild memory loss and ranks as a leading cause of mortality in the USA, accounting for approximately 120,000 deaths per year. It is also the primary form of dementia. Early detection is critical for timely intervention, as the neurodegenerative process often starts 15-20 years before cognitive symptoms manifest. This study focuses on determining feature importance in AD classification. The features are texture features from the hippocampus and entorhinal cortex on 3D magnetic resonance imaging (MRI), fused with the standardized uptake value ratio (SUVR) derived from positron emission tomography (PET) images. Methods: To achieve this objective, we employed four distinct classifiers (Linear Support Vector Classification, Linear Discriminant Analysis, Logistic Regression, and Logistic Regression Classifier with Stochastic Gradient Descent Learning). These classifiers were used to derive both average and top-ranked importance scores for each feature based on their outputs. Our framework is designed to distinguish between two classes, AD-negative (stable mild cognitive impairment [MCIs]) and AD-positive (MCI conversion [MCIc]), using a probabilistic neural network classifier and the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Results: The feature importance findings highlight the crucial role of the gray-level co-occurrence matrix (GLCM) texture features extracted from the hippocampus and entorhinal cortex, which outperformed the volume and SUVR features. When fused with other features, GLCM texture-based AD classification achieved approximately 90% sensitivity in identifying MCIc cases while keeping false positives low (below 30%). Moreover, the receiver operating characteristic curves confirm the GLCMs' superior performance in distinguishing between MCIs and MCIc. Additionally, fusing different types of features improved classification performance compared with relying on any single feature category. Conclusion: Our study emphasizes the pivotal role of GLCM texture features in early Alzheimer's detection.
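A gray-level co-occurrence matrix (GLCM) summarizes how often pairs of gray levels occur at a fixed spatial offset. The sketch below is a minimal 2D version with a single horizontal offset and two common Haralick-style statistics (contrast and homogeneity), run on a tiny hypothetical image. The study worked with 3D hippocampal and entorhinal regions and probably used several offsets and quantization settings, none of which are reproduced here.

```python
def glcm(image, levels, dr=0, dc=1):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Large when neighboring pixels differ strongly in gray level."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p):
    """Large when co-occurrences concentrate near the diagonal (similar neighbors)."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))


# Hypothetical 4x4 patch already quantized to 4 gray levels (0-3).
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
p = glcm(img, levels=4)
print(contrast(p), homogeneity(p))
```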

7.
J Bioinform Comput Biol ; 21(1): 2350003, 2023 02.
Article in English | MEDLINE | ID: mdl-36891974

ABSTRACT

N4-methylcytosine (4mC) methylation is an essential epigenetic modification of deoxyribonucleic acid (DNA) that plays a key role in many biological processes such as gene expression, gene replication and transcriptional regulation. Genome-wide identification and analysis of 4mC sites can better reveal the epigenetic mechanisms that regulate various biological processes. Although some high-throughput experimental methods can effectively identify these sites on a genome-wide scale, they are still too expensive and laborious for routine use. Computational methods can compensate for these disadvantages, but they still leave much room for performance improvement. In this study, we develop a deep learning approach that does not use neural networks (non-NN) for accurately predicting 4mC sites from genomic DNA sequence. We generate various informative features representing the sequence fragments around 4mC sites and feed them into a deep forest (DF) model. After training the deep model using 10-fold cross-validation, overall accuracies of 85.0%, 90.0%, and 87.8% were achieved for three representative model organisms, A. thaliana, C. elegans, and D. melanogaster, respectively. In addition, extensive experimental results show that our proposed approach outperforms other existing state-of-the-art predictors in 4mC identification. Our approach is the first DF-based algorithm for the prediction of 4mC sites, providing a novel idea in this field.


Subjects
Caenorhabditis elegans; DNA; Animals; DNA/genetics; Caenorhabditis elegans/genetics; Drosophila melanogaster/genetics; Algorithms
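The abstract says that features representing the sequence fragments around candidate 4mC sites were fed into a deep forest, but it does not list them. Normalized k-mer composition is one common sequence encoding. The sketch below assumes it purely for illustration and does not reproduce the study's feature set or the deep forest itself.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Normalized frequency of every possible DNA k-mer (4**k entries, fixed A/C/G/T order)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for very short input
    return [counts[w] / total for w in kmers]


def feature_vector(fragment, ks=(1, 2, 3)):
    """Concatenate 1-, 2- and 3-mer compositions into one fixed-length vector."""
    vec = []
    for k in ks:
        vec.extend(kmer_frequencies(fragment, k))
    return vec


# Hypothetical fragment around a candidate cytosine; real windows are usually longer.
print(len(feature_vector("ACGTACGTCA")))  # 4 + 16 + 64 entries
```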
8.
Front Genet ; 14: 1230579, 2023.
Article in English | MEDLINE | ID: mdl-37842648

ABSTRACT

Background: Despite the recent success of genome-wide association studies (GWAS) in identifying 90 independent risk loci for Parkinson's disease (PD), the genomic underpinnings of PD are still largely unknown. At the same time, clinicians need accurate and reliable predictive models that use genomic or demographic features to predict the risk of PD. Methods: To identify influential demographic and genomic factors associated with PD and to develop predictive models, we used two datasets from the Fox Insight online study. The demographic data comprised 200 variables across 33,473 participants. The genomic data comprised 447,089 SNPs across 8,840 samples. We first applied correlation and GWAS analyses to find the top demographic and genomic factors associated with PD, respectively. We then developed and compared a variety of machine learning (ML) models for predicting PD. Using the developed ML models, we performed feature importance analysis to reveal the predictive value of each demographic or genomic input feature for PD. Finally, we performed gene set enrichment analysis on our GWAS results to identify PD-associated pathways. Results: We identified both novel and well-known demographic and genetic factors (along with the enriched pathways) related to PD. In addition, we developed predictive models that performed robustly, with AUC = 0.89 for demographic data and AUC = 0.74 for genomic data. Our GWAS analysis identified several novel and significant variants and gene loci, including three intron variants in LMNA (p-values smaller than 4.0e-21) and one missense variant in SEMA4A (p-value = 1.11e-26). Our feature importance analysis of the PD-predictive ML models highlighted some significant and novel variants from our GWAS analysis (e.g., the intron variant rs1749409 in the RIT1 gene). It also helped identify potentially causative variants that were missed by GWAS, such as rs11264300, a missense variant in the gene DCST1, and rs11584630, an intron variant in the gene KCNN3. Conclusion: By combining GWAS with advanced machine learning models, we identified both known and novel demographic and genomic factors and built well-performing ML models for predicting Parkinson's disease.
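GWAS results such as the LMNA and SEMA4A p-values come from per-variant association tests. The simplest such test is a Pearson chi-square on a 2x2 table of allele counts in cases versus controls, sketched below with hypothetical counts. Production GWAS pipelines (for example PLINK) typically use regression models with covariates such as ancestry components. Treat this as an illustration of the idea, not as the study's analysis.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts, cases vs controls."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = sum(map(sum, table))
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 of 100 case alleles vs 10 of 100 control alleles carry the variant.
print(allelic_chi_square(30, 70, 10, 90))
```

Because hundreds of thousands of SNPs are tested, genome-wide significance is conventionally declared only at p < 5e-8.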

9.
Front Cardiovasc Med ; 10: 1140670, 2023.
Article in English | MEDLINE | ID: mdl-37034340

ABSTRACT

Objectives: To evaluate the efficacy of the Cox-Maze IV procedure (CMP-IV) combined with valve surgery in patients with both atrial fibrillation (AF) and valvular disease, and to use machine learning algorithms to identify potential risk factors for AF recurrence. Methods: A total of 1,026 patients with AF and valvular disease from two hospitals were included in the study. Of these, 555 patients received the CMP-IV procedure in addition to valve surgery and left atrial appendage ligation (CMP-IV group), while 471 patients received only valve surgery and left atrial appendage ligation (Non-CMP-IV group). Kaplan-Meier analysis was used to calculate the sinus rhythm maintenance rate. For each group, 58 variables were selected and 10 machine learning models were developed. Model performance was evaluated using five-fold cross-validation and metrics including F1 score, accuracy, precision, and recall. The four best-performing models for each group were selected for further analysis, including feature importance evaluation and SHAP analysis. Results: The 5-year sinus rhythm maintenance rate was 82.13% (95% CI: 78.51%, 85.93%) in the CMP-IV group and 13.40% (95% CI: 10.44%, 17.20%) in the Non-CMP-IV group. In the CMP-IV group, the eXtreme Gradient Boosting (XGBoost), LightGBM, Category Boosting (CatBoost) and Random Forest (RF) models performed best, with area under the curve (AUC) values of 0.768 (95% CI: 0.742, 0.786), 0.766 (95% CI: 0.744, 0.792), 0.762 (95% CI: 0.723, 0.801), and 0.732 (95% CI: 0.701, 0.763), respectively. In the Non-CMP-IV group, the LightGBM, XGBoost, CatBoost and RF models performed best, with AUC values of 0.738 (95% CI: 0.699, 0.777), 0.732 (95% CI: 0.694, 0.770), 0.724 (95% CI: 0.668, 0.789), and 0.716 (95% CI: 0.656, 0.774), respectively. Feature importance and SHAP analyses revealed that duration of AF, preoperative left ventricular ejection fraction, postoperative heart rhythm, preoperative neutrophil-lymphocyte ratio, preoperative left atrial diameter and heart rate were significant factors in AF recurrence. Conclusion: CMP-IV is effective in treating AF. Multiple machine learning models were successfully developed and several risk factors for AF recurrence were identified, which may aid clinical decision-making and optimize the individual surgical management of AF.
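The 5-year sinus rhythm maintenance rates were estimated with Kaplan-Meier analysis. In that method, each patient either has the event (here, AF recurrence) or is censored at last follow-up. A minimal product-limit estimator on hypothetical follow-up times (in years) follows. Confidence intervals such as the 95% CIs in the abstract would need an additional variance estimate (for example, Greenwood's formula), which is omitted here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if AF recurred at times[i], 0 if the patient was censored then.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                recurrences += 1
            else:
                censored += 1
            i += 1
        if recurrences:
            # Patients censored at t are still counted as at risk at t.
            survival *= 1 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= recurrences + censored
    return curve


# Hypothetical follow-up of six patients (years) and recurrence flags.
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```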

10.
Health Policy ; 129: 104709, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36725380

ABSTRACT

OBJECTIVE: The purpose of this study was to use a deep learning model and a traditional statistical regression model to predict the long-term care insurance decisions of registered nurses. METHODS: We prospectively surveyed 1,373 registered nurses with at least 2 years of full-time working experience at a large medical center in Taiwan: 615 who already owned long-term care insurance (LTCI), 332 who had no intention to purchase LTCI (group 1), and 426 who intended to purchase LTCI (group 2). RESULTS: After inverse probability of treatment weighting (IPTW), no statistically significant differences in study characteristics were identified between the two groups. All performance indices of the deep neural network (DNN) model were significantly higher than those of the multiple logistic regression (MLR) model (P<0.001). The strongest predictor of an individual's long-term care insurance decision was their risk propensity score, followed by their caregiving responsibilities, whether they live with older adult relatives, their experiences of catastrophic illness, and their openness to experience. CONCLUSIONS: The DNN model is useful for predicting long-term care insurance decisions. Its prediction accuracy can be increased through training with temporal data collected from registered nurses. Future research can explore two-level or multilevel model designs that explain the contextual effects of the risk factors on long-term care insurance decisions.


Subjects
Deep Learning; Insurance, Long-Term Care; Humans; Aged; Logistic Models; Models, Statistical; Surveys and Questionnaires; Long-Term Care
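IPTW reweights each respondent by the inverse probability of belonging to the group they are actually in, so that measured characteristics become balanced between the groups. A minimal sketch follows. It assumes propensity scores (the estimated probability of being in group 2) have already been obtained, typically from a logistic regression, and the scores are hypothetical. This statistical propensity score is distinct from the "risk propensity score" that the abstract reports as a predictor.

```python
def iptw_weights(in_group2, propensity, stabilized=False):
    """Inverse probability of treatment weights; propensity = estimated P(group 2 | covariates)."""
    share_group2 = sum(in_group2) / len(in_group2)
    weights = []
    for g, e in zip(in_group2, propensity):
        if not 0 < e < 1:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        w = 1 / e if g else 1 / (1 - e)
        if stabilized:
            # Scaling by the marginal group share keeps weights on the sample's scale.
            w *= share_group2 if g else 1 - share_group2
        weights.append(w)
    return weights


def weighted_mean(values, weights):
    """Covariate mean under the weights, used to check balance between groups."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


group = [1, 1, 0, 0]          # hypothetical: 1 = intends to purchase LTCI (group 2)
score = [0.8, 0.5, 0.5, 0.2]  # hypothetical estimated propensity scores
print(iptw_weights(group, score), iptw_weights(group, score, stabilized=True))
```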
11.
Materials (Basel) ; 15(15)2022 Jul 30.
Article in English | MEDLINE | ID: mdl-35955210

ABSTRACT

Insufficient skid resistance is a crucial factor leading to road traffic accidents. Pavement surface friction is an essential indicator for measuring skid resistance. The surface texture structure significantly affects the friction between the tire and the pavement, and thus determines the pavement's skid resistance. To study the relationship between surface texture structure and pavement skid resistance in depth, two types of asphalt mixture specimens were prepared for skid resistance testing: asphalt concrete (AC) and open-graded friction course (OGFC). First, a high-precision 3D smart sensor (Gocator 3110) was used to collect 3D point cloud data of the asphalt mixture surface texture, and a British pendulum tester was used to measure friction. Second, ten feature parameters were extracted to describe the 3D macrotexture characteristics, and a data set containing 10 features and 200 groups of texture and friction data was constructed. Meanwhile, the influence of macrotexture features on skid resistance was discussed. Finally, an optimized Bayesian-LightGBM model was trained on the constructed dataset. Compared with the LightGBM, XGBoost, RF, and SVR algorithms, the Bayesian-LightGBM model evaluates skid resistance more accurately, with an R² of 92.83%. The results show that all ten macrotexture features contribute to the evaluation of skid resistance to varying degrees. Furthermore, compared with the AC mixture specimens, the textured surface of the OGFC mixture specimens has more pronounced height characteristics and higher roughness. The skid resistance of the OGFC mixture specimens is better than that of the AC specimens.
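The reported R² of 92.83% corresponds to R² = 0.9283, that is, 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch follows, using hypothetical measured and predicted friction values rather than the study's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


measured = [50, 60, 70, 80]   # hypothetical British Pendulum Numbers
predicted = [52, 58, 71, 79]  # hypothetical model outputs
print(r_squared(measured, predicted))  # 1 - 10/500
```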

12.
Front Neurol ; 13: 1027557, 2022.
Article in English | MEDLINE | ID: mdl-36313499

ABSTRACT

Background: The management of unruptured intracranial aneurysm (UIA) remains controversial. Recently, machine learning has been widely applied in medicine. This study developed predictive models using machine learning to investigate periprocedural complications associated with endovascular procedures for UIA. Methods: We enrolled patients with solitary UIA who underwent endovascular procedures. Periprocedural complications were defined as neurological adverse events resulting from endovascular procedures. We incorporated three machine learning algorithms into our prediction models: artificial neural networks (ANN), random forest (RF), and logistic regression (LR). The Shapley Additive Explanations (SHAP) approach and feature importance analysis were used to identify and prioritize significant features associated with periprocedural complications. Results: In total, 443 patients were included, and 48 (10.83%) procedure-related complications occurred. In the testing set, the ANN model achieved the highest area under the curve (AUC), 0.761. The RF model also achieved an acceptable AUC of 0.735, while the AUC of the LR model was 0.668. SHAP and feature importance analyses identified distal aneurysm, aneurysm size and treatment modality as the most significant features for predicting periprocedural complications after endovascular treatment for UIA. Conclusion: Periprocedural complications after endovascular treatment for UIA are not negligible. Prediction of periprocedural complications via machine learning is feasible and effective. Machine learning can serve as a promising tool in the decision-making process for UIA treatment.
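SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory. For a handful of features, these values can be computed exactly by enumerating every coalition, as in the sketch below. The three-feature risk function and its coefficients are hypothetical stand-ins for distal aneurysm, aneurysm size and treatment modality. "Absent" features are set to a single baseline value. The SHAP library instead uses approximations such as KernelSHAP or TreeSHAP with a background dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score: distal flag, size (mm), modality flag, with one interaction."""
    distal, size, modality = z
    return 0.5 * distal + 2.0 * size + distal * modality


x = [1.0, 3.0, 1.0]
base = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, x, base)
print(phi, sum(phi), toy_risk(x) - toy_risk(base))  # attributions sum to the difference
```

The interaction term is split equally between the two features involved. This is why the first and third attributions each receive half of it.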

13.
Sci Total Environ ; 849: 157823, 2022 Nov 25.
Article in English | MEDLINE | ID: mdl-35931171

ABSTRACT

Reference evapotranspiration (ETo) is a variable that helps determine the atmospheric pressure on living (reference) grass to release water into the atmosphere. For this purpose, four main driving forces (air temperature, air humidity, solar radiation, and wind speed) need to be measured over well-watered reference grass. The relative influence of these driving forces is region- and climate-specific, with daily and seasonal variations. A clear understanding of the dynamic interactions of ETo's driving factors can illuminate the Earth's water and energy cycles and help modelers predict ETo more accurately. In this study, Pearson correlation, mutual information, and random forest feature importance analyses were used to evaluate the relative importance of the meteorological driving forces of ETo in California. To better understand the interrelations of these variables, 1,365,823 daily data samples from 237 standardized weather stations spanning 36 years were clustered into homogeneous climatic zones and analyzed. To compensate for the effects of seasonality, feature importance analysis was also conducted on seasonally and monthly clustered data. Moreover, seasonal and annual trends of ETo and its driving factors were investigated for California and its homogeneous zones using the Mann-Kendall test. Our findings reveal that for annually clustered data, solar radiation is the most influential driving factor of ETo in California. However, analysis of seasonally and monthly clustered data shows that vapor pressure deficit is the most informative factor during summer and spring, while solar radiation is more important during the colder seasons. The trend analysis does not suggest a consistent monotonic trend for ETo and the other variables across seasons and zones. However, agricultural regions with heavy irrigation dependence, such as the Central Valley, are shown to be getting warmer and drier, especially during the irrigation season. This can adversely affect California's water resources, agriculture industry, and food production, and modeling efforts like this can be very informative for future water resources management.


Subjects
Weather; Wind; Poaceae; Seasons; Temperature; Water
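The Mann-Kendall test used for the ETo trend analysis counts concordant minus discordant pairs over time and compares that statistic with its variance under the no-trend null hypothesis. A minimal sketch without tie or autocorrelation corrections follows. The study may have used a corrected variant, and the series in the usage line is hypothetical.

```python
import math


def mann_kendall(series):
    """Mann-Kendall trend test (no tie correction): returns S, Z and a two-sided p-value."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # +1 for an increase, -1 for a decrease
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return s, z, p_value


# Hypothetical annual mean ETo values (mm/day) for one zone.
print(mann_kendall([3.1, 3.4, 3.3, 3.8, 4.0, 4.2]))
```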
14.
Front Neurol ; 13: 875491, 2022.
Article in English | MEDLINE | ID: mdl-35860493

ABSTRACT

Background: Machine learning algorithms for predicting 30-day stroke readmission are rarely discussed. The aims of this study were to identify significant predictors of 30-day readmission after stroke and to compare prediction accuracy and area under the receiver operating characteristic curve (AUROC) across six models: artificial neural network (ANN), K nearest neighbor (KNN), random forest (RF), support vector machine (SVM), naive Bayes classifier (NBC), and Cox regression (COX). Methods: The subjects of this prospective cohort study were 1,476 patients with a history of admission for stroke to one of six hospitals between March 2014 and September 2019. A training dataset (n = 1,033) was used for model development, and a testing dataset (n = 443) was used for internal validation. Another 167 patients with stroke, recruited from October to December 2019, were enrolled in the dataset for external validation. A feature importance analysis was also performed to identify the significance of the selected input variables. Results: For predicting 30-day readmission after stroke, the ANN model had significantly (P < 0.001) higher performance indices than the other models. According to the ANN model results, the best predictor of 30-day readmission was post-acute care (PAC), followed by nasogastric tube insertion and stroke type (P < 0.05). Conclusion: Using a machine-learning ANN model to obtain an accurate estimate of 30-day readmission for stroke and to identify risk factors may improve the precision and efficacy of management for these patients. For stroke patients who are candidates for PAC rehabilitation, these predictors have practical applications in educating patients about the expected course of recovery and health outcomes.
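The study compares models on accuracy and AUROC. When positive outcomes such as 30-day readmission are relatively uncommon, accuracy alone can look favorable for a model that rarely predicts readmission. Threshold-dependent metrics are therefore often reported alongside it. A minimal sketch follows, using hypothetical labels and predictions (1 = readmitted).

```python
def confusion_metrics(y_true, y_pred):
    """Threshold-dependent metrics for a binary classifier (1 = readmitted within 30 days)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(a, b):
        return a / b if b else 0.0  # report 0 instead of dividing by zero

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
print(confusion_metrics(y_true, [1, 1, 0, 0, 0, 0, 1, 0]))
print(confusion_metrics(y_true, [0] * 8))  # never predicts readmission
```

The second call scores 62.5% accuracy while detecting no readmissions at all.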

15.
Food Res Int ; 156: 111132, 2022 06.
Article in English | MEDLINE | ID: mdl-35651007

ABSTRACT

The importance of single-cell variability is increasingly prominent as foodborne pathogen modeling develops. Traditional predictive microbiology models cannot accurately describe the growth behavior of small numbers of cells because of individual cell heterogeneity. The objective of the present study was to develop predictive models for single-cell lag times of Salmonella Enteritidis after heat and chlorine treatment. A time-lapse microscopy method was employed to evaluate the single-cell lag time by monitoring cell divisions. Four supervised machine learning algorithms were applied and compared: gradient boosting regression tree (GBRT), artificial neural network (ANN), random forest (RF), and support vector regression (SVR). Results show that all four machine learning models have good predictive capabilities without overfitting the data. The ANN approach demonstrated better prediction performance than the other machine learning models (RMSE: 0.209, MAE: 0.135 and R²: 0.989). Furthermore, SHapley Additive exPlanation (SHAP) values were used to capture the influence of each feature on the model output. The results revealed that population lag times and sublethal injury rate have dominant impacts on single-cell lag time. Consequently, the findings of this study may be useful in managing the potential food safety risk caused by single cells of foodborne pathogens.


Subjects
Chlorine; Salmonella enteritidis; Hot Temperature; Machine Learning; Neural Networks, Computer
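The ANN's reported RMSE and MAE summarize prediction error on the lag-time scale. A minimal sketch of both follows, using hypothetical observed and predicted lag times. RMSE is always at least as large as MAE because squaring gives large errors more weight.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


observed = [1.0, 2.0, 3.0, 4.0]   # hypothetical single-cell lag times (h)
predicted = [1.5, 2.0, 2.0, 4.0]  # hypothetical model predictions (h)
print(rmse(observed, predicted), mae(observed, predicted))
```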
16.
Sustain Cities Soc ; 75: 103254, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34414067

ABSTRACT

To inform data-driven decisions in fighting the global COVID-19 pandemic, this research develops a spatiotemporal analysis framework that combines an ensemble model (random forest regression) with a multi-objective optimization algorithm (NSGA-II). The framework was validated for four Asian countries: Japan, South Korea, Pakistan, and Nepal. It can help to better understand the evolution of the disease and forecast its prevalence, providing evidence to guide further intervention and management. Random forest with a proper rolling time window can learn the combined effects of environmental and social factors to accurately predict the daily growth of confirmed cases and the daily death rate on a national scale. NSGA-II then finds a range of Pareto-optimal solutions that minimize the infection rate and mortality simultaneously. Experimental results demonstrate that the predictive model can alert local governments in advance, giving them time to put forward relevant measures. Temperature (an environmental factor) and the stringency index (a social factor) are identified as the two most important features affecting virus transmission. Moreover, the optimal solutions provide references for designing control strategies for pandemic containment and prevention that accommodate country-specific circumstances, and they could reduce both objectives by more than 95%. In particular, adjusting social factors should take priority over adjusting environmental factors, since it yields an average improvement in the two objectives at least 1.47% greater.
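NSGA-II ranks candidate solutions by Pareto dominance. One solution dominates another if it is no worse on every objective and strictly better on at least one. A minimal sketch of that non-dominated sorting step follows, for two minimized objectives (infection rate, mortality) on hypothetical values. The full algorithm also uses crowding distance, selection, crossover and mutation, which are omitted here and are available in libraries such as pymoo.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points):
    """Split point indices into successive Pareto fronts (simple O(n^2)-per-front version)."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# Hypothetical (infection rate, mortality) outcomes of five candidate policies.
policies = [(0.10, 0.02), (0.05, 0.03), (0.08, 0.01), (0.12, 0.04), (0.06, 0.05)]
print(non_dominated_sort(policies))  # first list = Pareto-optimal policies
```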

17.
ACS Appl Mater Interfaces ; 12(13): 15837-15843, 2020 Apr 01.
Article in English | MEDLINE | ID: mdl-32191023

ABSTRACT

The physical chemistry mechanisms behind oil-brine interface phenomena are not yet fully clarified. Knowing how brine composition and concentration relate to interfacial behavior for a given oil may enable ionic tuning of the injected solution in geochemical and enhanced oil recovery processes. It is therefore worth examining the parameters that influence interfacial properties. In this context, we combined machine learning (ML) techniques with classical molecular dynamics (MD) simulations to predict oil/brine interfacial tensions (IFT) effectively, and compared this process with a linear regression (LR) method. To diversify our data set, we introduced a new atomistic crude oil model (medium) with 36 different types of hydrocarbon molecules. The MD simulations were performed for mono- and multicomponent oil systems (toluene, heptane, Heptol, light, and medium). These were interfaced with sulfate and chloride brines of varying cations (Na+, K+, Ca2+, and Mg2+) and salinity. A consistent IFT data set was thus built for ML training and LR fitting at room temperature and pressure. The feature space comprised oil density, oil composition, salinity, and ionic concentrations. Using gradient boosted (GB) algorithms, we observed that the dominant quantities affecting the IFT relate to oil attributes and salinity, and that no specific ion dominates the IFT changes. When the obtained LR model was validated against MD data and experimental data from the literature, the error was up to 2% and 9%, respectively, showing robust and consistent transferability. Combining MD simulations and ML techniques may provide fast and cost-effective IFT determination for multiple complex fluid-fluid and fluid-solid interfaces.
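The LR model was validated by comparing its IFT predictions with MD and experimental values and reporting relative errors. A minimal one-predictor least-squares fit with a relative-error check follows. The study's LR model used several features (oil density, composition, salinity, ionic concentrations). The salinity and IFT values here are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def relative_error(predicted, reference):
    """|prediction - reference| / |reference|, e.g. against MD or experimental IFT."""
    return abs(predicted - reference) / abs(reference)


salinity = [0.0, 1.0, 2.0, 3.0]  # hypothetical salt concentration (mol/L)
ift = [52.0, 53.0, 54.0, 55.0]   # hypothetical IFT values (mN/m)
b0, b1 = fit_line(salinity, ift)
print(b0, b1, relative_error(b0 + b1 * 2.5, 55.0))
```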

18.
Sci Total Environ ; 635: 644-658, 2018 Sep 01.
Article in English | MEDLINE | ID: mdl-29679837

ABSTRACT

A stacked ensemble model is developed for forecasting and analyzing the daily average concentrations of fine particulate matter (PM2.5) in Beijing, China. Before modeling, feature extraction procedures (simplification, polynomial, transformation and combination) are conducted to identify potentially significant features based on an exploratory data analysis. Stability feature selection and tree-based feature selection methods are applied to select important variables and evaluate the degree of feature importance. Single models, including LASSO, AdaBoost, XGBoost and a multi-layer perceptron optimized by a genetic algorithm (GA-MLP), are established in the level 0 space. They are then integrated by support vector regression (SVR) in the level 1 space via stacked generalization. A feature importance analysis reveals that nitrogen dioxide (NO2) and carbon monoxide (CO) concentrations measured in the city of Zhangjiakou are the most important pollution factors for forecasting PM2.5 concentrations. Among meteorological factors, local extreme wind speeds and maximum wind speeds have the greatest effect, through the cross-regional transport of contaminants. Pollutants from the cities of Zhangjiakou and Chengde have a stronger impact on air quality in Beijing than other surrounding factors. Our model evaluation shows that, when applied to new data, the ensemble model generally performs better than a single nonlinear forecasting model, with a coefficient of determination (R²) of 0.90 and a root mean squared error (RMSE) of 23.69 µg/m³. For single pollutant grade recognition, the proposed model performs better on days with good air quality than on days with high levels of pollution. The overall classification accuracy is 73.93%, with most misclassifications occurring between adjacent categories. The results demonstrate the interpretability and generalizability of the stacked ensemble model.


Subjects
Air Pollutants/analysis; Air Pollution/statistics & numerical data; Environmental Monitoring/methods; Particulate Matter/analysis; Beijing; Carbon Monoxide/analysis; Meteorological Concepts; Models, Chemical; Nitrogen Dioxide/analysis
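In stacked generalization, the level-1 learner is trained on out-of-fold predictions from the level-0 models, so it never sees predictions a base model made on its own training data. The sketch below shows that mechanism on synthetic data. The two level-0 models are hypothetical single-feature regressions. As a deliberate simplification, a least-squares blending weight replaces the SVR meta-learner used in the study.

```python
import random


def fit_univariate(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def make_base(feature):
    """Hypothetical level-0 learner: a regression on one feature column."""
    def fit(X, y):
        model = fit_univariate([row[feature] for row in X], y)
        return lambda row: model(row[feature])
    return fit


def out_of_fold(fit, X, y, k=5):
    """Predict each row with a model trained on the other k-1 folds."""
    preds = [0.0] * len(X)
    for f in range(k):
        val = list(range(f, len(X), k))
        val_set = set(val)
        train = [i for i in range(len(X)) if i not in val_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in val:
            preds[i] = model(X[i])
    return preds


def blend_weight(p_a, p_b, y):
    """Simplified level-1 learner: least-squares w in y ~ w * p_a + (1 - w) * p_b."""
    d = [a - b for a, b in zip(p_a, p_b)]
    num = sum(di * (yi - b) for di, yi, b in zip(d, y, p_b))
    return num / sum(di * di for di in d)


def mse(y, p):
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)


rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(100)]      # synthetic predictors
y = [2.0 * a + 1.0 * b + rng.gauss(0, 0.1) for a, b in X]   # synthetic target

oof_a = out_of_fold(make_base(0), X, y)
oof_b = out_of_fold(make_base(1), X, y)
w = blend_weight(oof_a, oof_b, y)
stacked = [w * a + (1 - w) * b for a, b in zip(oof_a, oof_b)]
print(round(w, 3), mse(y, oof_a), mse(y, oof_b), mse(y, stacked))
```

In a full pipeline, the level-0 models would then be refitted on all training data before forecasting new days.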