ABSTRACT
Predicting therapeutic responses in cancer patients is a major challenge in the field of precision medicine due to high inter- and intra-tumor heterogeneity. Most drug response models need to be improved in terms of accuracy, and there is limited research to assess therapeutic responses of particular tumor types. Here, we developed a novel method DROEG (Drug Response based on Omics and Essential Genes) for prediction of drug response in tumor cell lines by integrating genomic, transcriptomic and methylomic data along with CRISPR essential genes, and revealed that the incorporation of tumor proliferation essential genes can improve drug sensitivity prediction. Concisely, DROEG integrates literature-based and statistics-based methods to select features and uses Support Vector Regression for model construction. We demonstrate that DROEG outperforms most state-of-the-art algorithms by both qualitative (prediction accuracy for drug-sensitive/resistant) and quantitative (Pearson correlation coefficient between the predicted and actual IC50) evaluation in Genomics of Drug Sensitivity in Cancer and Cancer Cell Line Encyclopedia datasets. In addition, DROEG is further applied to the pan-gastrointestinal tumor with high prevalence and mortality as a case study at both cell line and clinical levels to evaluate the model efficacy and discover potential prognostic biomarkers in Cisplatin and Epirubicin treatment. Interestingly, the CRISPR essential gene information is found to be the most important contributor to enhance the accuracy of the DROEG model. To our knowledge, this is the first study to integrate essential genes with multi-omics data to improve cancer drug response prediction and provide insights into personalized precision treatment.
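The quantitative evaluation described above, the Pearson correlation coefficient between predicted and actual IC50, can be sketched as follows. This is an illustrative computation only, not the DROEG code; the function name and the sample values are hypothetical.

```python
from math import sqrt


def pearson_r(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Covariance and variances, each without the common 1/n factor.
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_p * var_a)


# Hypothetical IC50 values (arbitrary units), for illustration only.
predicted_ic50 = [1.2, 0.4, 2.9, 3.1, 0.8]
actual_ic50 = [1.0, 0.5, 3.2, 2.8, 1.1]
r = pearson_r(predicted_ic50, actual_ic50)
```

A value of r close to 1 would indicate that the predicted values rank and scale the cell lines much as the measured values do.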
Subjects
Antineoplastic Agents, Neoplasms, Humans, Essential Genes, Antineoplastic Agents/pharmacology, Antineoplastic Agents/therapeutic use, Neoplasms/drug therapy, Neoplasms/genetics, Genomics/methods, Precision Medicine/methods
ABSTRACT
Rationale: Distinguishing connective tissue disease-associated interstitial lung disease (CTD-ILD) from idiopathic pulmonary fibrosis (IPF) can be clinically challenging. Objectives: To identify proteins that separate and classify patients with CTD-ILD and those with IPF. Methods: Four registries with 1,247 patients with IPF and 352 patients with CTD-ILD were included in analyses. Plasma samples were subjected to high-throughput proteomics assays. Protein features were prioritized using recursive feature elimination to construct a proteomic classifier. Multiple machine learning models, including support vector machine, LASSO (least absolute shrinkage and selection operator) regression, random forest, and imbalanced Random Forest, were trained and tested in independent cohorts. The validated models were used to classify each case iteratively in external datasets. Measurements and Main Results: A classifier with 37 proteins (proteomic classifier 37 [PC37]) was enriched in the biological process of bronchiole development and smooth muscle proliferation and immune responses. Four machine learning models used PC37 with sex and age score to generate continuous classification values. Receiver operating characteristic curve analyses of these scores demonstrated consistent areas under the curve of 0.85-0.90 in the test cohort and 0.94-0.96 in the single-sample dataset. Binary classification demonstrated 78.6-80.4% sensitivity and 76-84.4% specificity in the test cohort and 93.5-96.1% sensitivity and 69.5-77.6% specificity in the single-sample classification dataset. Composite analysis of all machine learning models confirmed 78.2% (194 of 248) accuracy in the test cohort and 82.9% (208 of 251) in the single-sample classification dataset. Conclusions: Multiple machine learning models trained with large cohort proteomic datasets consistently distinguished CTD-ILD from IPF. Many of the identified proteins are involved in immune pathways. 
We further developed a novel approach for single-sample classification, which could facilitate honing the differential diagnosis of ILD in challenging cases and improve clinical decision making.
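The reported rates follow from simple confusion-matrix arithmetic. The sketch below recomputes the two composite accuracies quoted in the abstract (194 of 248 and 208 of 251) and shows how sensitivity and specificity would be derived. The function names are hypothetical, and the counts in the last call are invented for illustration because the abstract does not report a confusion matrix.

```python
def percent(correct, total):
    """Share of correct classifications, as a percentage rounded to one decimal."""
    return round(100 * correct / total, 1)


def classification_rates(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positives among all positives
        "specificity": tn / (tn + fp),   # true negatives among all negatives
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


test_cohort_accuracy = percent(194, 248)      # counts quoted in the abstract
single_sample_accuracy = percent(208, 251)    # counts quoted in the abstract

# Hypothetical counts, not taken from the study.
example = classification_rates(tp=80, fn=20, tn=75, fp=25)
```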
Subjects
Interstitial Lung Diseases, Machine Learning, Proteomics, Humans, Interstitial Lung Diseases/blood, Interstitial Lung Diseases/diagnosis, Female, Male, Proteomics/methods, Middle Aged, Aged, Idiopathic Pulmonary Fibrosis/blood, Idiopathic Pulmonary Fibrosis/diagnosis, Differential Diagnosis, Connective Tissue Diseases/blood, Connective Tissue Diseases/diagnosis, Biomarkers/blood
ABSTRACT
DNA methylation comprises a cumulative record of lifetime exposures superimposed on genetically determined markers. Little is known about methylation dynamics in humans following an acute perturbation, such as infection. We characterized the temporal trajectory of blood epigenetic remodeling in 133 participants in a prospective study of young adults before, during, and after asymptomatic and mildly symptomatic SARS-CoV-2 infection. The differential methylation caused by asymptomatic or mildly symptomatic infections was indistinguishable. While differential gene expression largely returned to baseline levels after the virus became undetectable, some differentially methylated sites persisted for months of follow-up, with a pattern resembling autoimmune or inflammatory disease. We leveraged these responses to construct methylation-based machine learning models that distinguished samples from pre-, during-, and postinfection time periods, and quantitatively predicted the time since infection. The clinical trajectory in the young adults and in a diverse cohort with more severe outcomes was predicted by the similarity of methylation before or early after SARS-CoV-2 infection to the model-defined postinfection state. Unlike the phenomenon of trained immunity, the postacute SARS-CoV-2 epigenetic landscape we identify is antiprotective.
Subjects
COVID-19, Young Adult, Humans, COVID-19/genetics, SARS-CoV-2/genetics, Prospective Studies, DNA Methylation/genetics, Post-Translational Protein Processing
ABSTRACT
OBJECTIVE: To explore the value of six machine learning models based on PET/CT radiomics combined with EGFR status in predicting brain metastases of lung adenocarcinoma. METHODS: We retrospectively collected data from 204 patients with lung adenocarcinoma who underwent PET/CT examination and EGFR gene testing before treatment at the Cancer Hospital Affiliated to Shandong First Medical University in 2020. Univariate analysis and multivariate logistic regression analysis were used to identify independent risk factors for brain metastasis. Based on PET/CT imaging combined with EGFR status and PET metabolic indices, six machine learning models were established to predict brain metastases of lung adenocarcinoma. Finally, ten-fold cross-validation was used to evaluate predictive performance. RESULTS: In univariate analysis, patients with N2-3 disease, EGFR mutation-positive status, LYM% ≤ 20, and elevated tumor markers (P < 0.05) were more likely to develop brain metastases. In multivariate logistic regression analysis, the PET metabolic indices SUVmax, SUVpeak, Volume, and TLG were risk factors for lung adenocarcinoma brain metastasis (P < 0.05). The SVM model was the most effective predictor of brain metastasis, with AUCs of 0.82 (PET/CT group), 0.70 (CT group), and 0.76 (PET group). CONCLUSIONS: Machine learning models that combine radiomics with EGFR status, as a new method, achieve higher accuracy than EGFR mutation status alone. The SVM model is the most effective method for predicting brain metastases of lung adenocarcinoma, and the PET/CT group predicts better than the PET group and the CT group.
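Ten-fold cross-validation, as used above, partitions the cohort into ten folds and holds each one out in turn. A minimal sketch of the index splitting for a cohort of 204 patients follows. It is not the authors' implementation; the function name, the shuffling and the seed are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once with a fixed seed (an assumption made here so
    the split is reproducible) and dealt into k folds of near-equal size.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 204 patients, as in the abstract: four folds of 21 and six folds of 20.
splits = list(k_fold_indices(204, k=10))
```

Each patient appears in exactly one test fold, so every prediction used for evaluation comes from a model that never saw that patient during training.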
Subjects
Adenocarcinoma of Lung, Brain Neoplasms, ErbB Receptors, Lung Neoplasms, Machine Learning, Humans, Adenocarcinoma of Lung/diagnostic imaging, Adenocarcinoma of Lung/pathology, Brain Neoplasms/diagnostic imaging, ErbB Receptors/genetics, Lung/pathology, Lung Neoplasms/genetics, Positron Emission Tomography Computed Tomography, Retrospective Studies
ABSTRACT
BACKGROUND: The 5´ untranslated region (5´ UTR) plays a key role in regulating translation efficiency and mRNA stability, making it a favored target in genetic engineering and synthetic biology. A common feature found in the 5´ UTR is the poly-adenine (poly(A)) tract. However, the effect of 5´ UTR poly(A) on protein production remains controversial. Machine-learning models are powerful tools for explaining the complex contributions of features, but models incorporating features of 5´ UTR poly(A) are currently lacking. Thus, our goal is to construct such a model, using natural 5´ UTRs from Kluyveromyces marxianus, a promising cell factory for producing heterologous proteins. RESULTS: We constructed a mini-library consisting of 207 5´ UTRs harboring poly(A) and 34 5´ UTRs without poly(A) from K. marxianus. The effects of each 5´ UTR on the production of a GFP reporter were evaluated individually in vivo, and the resulting protein abundance spanned an approximately 450-fold range throughout. The data were used to train a multi-layer perceptron neural network (MLP-NN) model that incorporated the length and position of poly(A) as features. The model exhibited good performance in predicting protein abundance (average R2 = 0.7290). The model suggests that the length of poly(A) is negatively correlated with protein production, whereas poly(A) located between 10 and 30 nt upstream of the start codon (AUG) exhibits a weak positive effect on protein abundance. Using the model as guidance, the deletion or reduction of poly(A) upstream of 30 nt preceding AUG tended to improve the production of GFP and a feruloyl esterase. Deletions of poly(A) showed inconsistent effects on mRNA levels, suggesting that poly(A) represses protein production either with or without reducing mRNA levels. CONCLUSION: The effects of poly(A) on protein production depend on its length and position. Integrating poly(A) features into machine-learning models improves simulation accuracy. 
Deleting or reducing poly(A) upstream of 30 nt preceding AUG tends to enhance protein production. This optimization strategy can be applied to enhance the yield of K. marxianus and other microbial cell factories.
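The two poly(A) features the model relies on, length and position relative to the start codon, can be extracted from a 5´ UTR sequence along the following lines. This is a sketch under stated assumptions: the abstract does not say how a poly(A) tract was defined, so the minimum run length of 5 adenines, the function name and the example sequence are all hypothetical.

```python
import re


def poly_a_tracts(utr, min_run=5):
    """Return (length, distance_to_aug) for each run of >= min_run adenines.

    `utr` is the 5' UTR written 5'->3' and ending immediately before the AUG.
    distance_to_aug counts the nucleotides between the 3' end of the tract
    and the start codon. min_run=5 is an assumed threshold.
    """
    sequence = utr.upper()
    pattern = re.compile("A{%d,}" % min_run)
    return [(m.end() - m.start(), len(sequence) - m.end())
            for m in pattern.finditer(sequence)]


# Hypothetical 31-nt UTR containing two tracts (6 nt and 5 nt long).
example_utr = "GC" + "AAAAAA" + "GCUUCG" + "AAAAA" + "CGCGCGCGCGCG"
tracts = poly_a_tracts(example_utr)
```

Each tuple could then be fed to a regression model as features, for instance the longest tract length and whether any tract lies within a chosen window upstream of AUG.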
Subjects
Kluyveromyces, 5' Untranslated Regions, Base Sequence, Kluyveromyces/genetics, Kluyveromyces/metabolism, Messenger RNA/genetics
ABSTRACT
Pollen collected by pollinators can be used as a marker of foraging behavior and can indicate the botanical species present in each environment. Pollen intake is essential for pollinators' health and survival. During foraging, some pollinators, such as honeybees, manipulate the collected pollen by mixing it with salivary secretions and nectar (corbicular pollen), which changes the chemical profile of the pollen. Different tools have been developed for identifying the botanical origin of pollen, based on microscopy, spectrometry, or molecular markers. However, to date, corbicular pollen has never been investigated. In our work, corbicular pollen from 5 regions with different climate conditions was collected during spring. Pollens were identified with microscopy-based techniques and then analyzed by MALDI-MS. Four different chemical extraction solutions and two physical disruption methods were tested to establish an effective MALDI-MS protocol. The best performance was obtained using sonication as the disruption method after extraction with acetic acid or trifluoroacetic acid. We therefore propose a new, rapid, and reliable methodology for identifying the botanical origin of corbicular pollen using MALDI-MS. This new approach opens up a wide range of environmental studies, spanning from plant biodiversity to ecosystem trophic interactions.
Subjects
Pollen, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods, Pollen/chemistry, Bees/physiology, Animals
ABSTRACT
BACKGROUND: Timely detection of modifiable risk factors for postoperative pulmonary complications (PPCs) could inform ventilation strategies that attenuate lung injury. We sought to develop, validate, and internally test machine learning models that use intraoperative respiratory features to predict PPCs. METHODS: We analysed perioperative data from a cohort comprising patients aged 65 yr and older at an academic medical centre from 2019 to 2023. Two linear and four nonlinear learning models were developed and compared with the current gold-standard risk assessment tool ARISCAT (Assess Respiratory Risk in Surgical Patients in Catalonia Tool). The Shapley additive explanation of artificial intelligence was utilised to interpret feature importance and interactions. RESULTS: Perioperative data were obtained from 10 284 patients who underwent 10 484 operations (mean age [range] 71 [65-98] yr; 42% female). An optimised XGBoost model that used preoperative variables and intraoperative respiratory variables had area under the receiver operating characteristic curves (AUROCs) of 0.878 (0.866-0.891) and 0.881 (0.879-0.883) in the validation and prospective cohorts, respectively. These models outperformed ARISCAT (AUROC: 0.496-0.533). The intraoperative dynamic features of respiratory dynamic system compliance, mechanical power, and driving pressure were identified as key modifiable contributors to PPCs. A simplified model based on XGBoost including 20 variables generated an AUROC of 0.864 (0.852-0.875) in an internal testing cohort. This has been developed into a web-based tool for further external validation (https://aorm.wchscu.cn/). CONCLUSIONS: These findings suggest that real-time identification of surgical patients' risk of postoperative pulmonary complications could help personalise intraoperative ventilatory strategies and reduce postoperative pulmonary complications.
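The AUROC values used above to compare the models with ARISCAT can be read as the probability that a randomly chosen patient with a complication receives a higher risk score than a randomly chosen patient without one. The sketch below computes it directly from that definition; the function name and the toy data are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for patients with the outcome, 0 for patients without it.
    Counts the share of (positive, negative) pairs in which the positive
    has the higher score; ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example with four patients (hypothetical scores).
toy_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

On this reading, an AUROC near 0.5, as reported for ARISCAT in this cohort, is no better than ranking patients at random.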
Subjects
Machine Learning, Postoperative Complications, Humans, Aged, Female, Postoperative Complications/prevention & control, Male, Aged 80 and Over, Lung Diseases/etiology, Lung Diseases/prevention & control, Risk Assessment/methods, Prospective Studies, Cohort Studies, Risk Factors, Intraoperative Monitoring/methods
ABSTRACT
PURPOSE: Ovarian stimulation with gonadotropins is crucial for obtaining mature oocytes for in vitro fertilization (IVF). Determining the optimal gonadotropin dosage is essential for maximizing its effectiveness. Our study aimed to develop a machine learning (ML) model to predict oocyte counts in IVF patients and retrospectively analyze whether higher gonadotropin doses improve ovarian stimulation outcomes. METHODS: We analyzed the data from 9598 ovarian stimulations. An ML model was employed to predict the number of mature metaphase II (MII) oocytes based on clinical parameters. These predictions were compared with the actual counts of retrieved MII oocytes at different gonadotropin dosages. RESULTS: The ML model provided precise predictions of MII counts, with the AMH and AFC being the most important, and the previous stimulation outcome and age, the less important features for the prediction. Our findings revealed that increasing gonadotropin dosage did not result in a higher number of retrieved MII oocytes. Specifically, for patients predicted to produce 4-8 MII oocytes, a decline in oocyte count was observed as gonadotropin dosage increased. Patients with low (1-3) and high (9-12) MII predictions achieved the best results when administered a daily dose of 225 IU; lower and higher doses proved to be less effective. CONCLUSIONS: Our study suggests that high gonadotropin doses do not enhance MII oocyte retrieval. Our ML model can offer clinicians a novel tool for the precise prediction of MII to guide gonadotropin dosing.
Subjects
In Vitro Fertilization, Gonadotropins, Oocyte Retrieval, Oocytes, Ovulation Induction, Humans, Female, Ovulation Induction/methods, Oocyte Retrieval/methods, Adult, Oocytes/drug effects, Oocytes/growth & development, Gonadotropins/administration & dosage, Gonadotropins/therapeutic use, In Vitro Fertilization/methods, Pregnancy, Pregnancy Rate, Retrospective Studies, Metaphase/drug effects
ABSTRACT
Alzheimer's disease is a neurodegenerative disorder characterized by the progressive degeneration of brain cells, leading to cognitive decline and memory loss. It is the most common cause of dementia and affects millions of people worldwide. While there is currently no cure for Alzheimer's disease, early detection and treatment can help to slow the progression of symptoms and improve quality of life. This research presents a diagnostic tool for classifying mild cognitive impairment and Alzheimer's disease using feature-based machine learning applied to optical coherence tomography angiography (OCT-A) images. Several features are extracted from the OCT-A image, including vessel density in five sectors, the area of the foveal avascular zone, retinal thickness, and novel features based on the histogram of the range-filtered OCT-A image. To ensure effectiveness for a diverse population, a large local database was collected for our study. The promising results of our study, with a best accuracy of 92.17%, will provide an efficient diagnostic tool for early detection of Alzheimer's disease.
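One way to read the "histogram of the range-filtered image" feature is sketched below: a range filter replaces each pixel with the difference between the local maximum and minimum, and the normalized histogram of the result becomes a feature vector. The window size, bin count, border handling and function names are assumptions; the abstract does not give these details.

```python
def range_filter(image, radius=1):
    """Local max minus local min in a (2*radius+1)-wide square window.

    `image` is a list of equal-length rows of integer gray levels. Windows
    are clipped at the image border (an assumption).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out[r][c] = max(window) - min(window)
    return out


def normalized_histogram(image, n_bins, max_value):
    """Share of pixels per bin, for integer values in 0..max_value."""
    counts = [0] * n_bins
    for row in image:
        for value in row:
            counts[min(int(value * n_bins / (max_value + 1)), n_bins - 1)] += 1
    total = sum(counts)
    return [count / total for count in counts]


# Hypothetical 3x3 patch with one bright pixel.
patch = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
features = normalized_histogram(range_filter(patch), n_bins=2, max_value=9)
```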
Subjects
Alzheimer Disease, Cognitive Dysfunction, Optical Coherence Tomography, Humans, Alzheimer Disease/diagnostic imaging, Alzheimer Disease/classification, Alzheimer Disease/pathology, Cognitive Dysfunction/diagnostic imaging, Cognitive Dysfunction/diagnosis, Optical Coherence Tomography/methods, Angiography/methods, Machine Learning, Male, Aged, Female
ABSTRACT
This paper focuses on the emissions of the three most sold categories of light vehicles: sedans, SUVs, and pickups. The research is carried out through an innovative methodology based on GPS and machine learning in real driving conditions. For this purpose, driving data from the three best-selling vehicles in Ecuador are acquired using a data logger with GPS included, and emissions are measured using a PEMS in six RDE tests with two standardized routes for each vehicle. The data obtained on Route 1 are used to estimate the gears used during driving using the K-means algorithm and classification trees. Then, the relative importance of driving variables is estimated using random forest techniques, followed by the training of ANNs to estimate CO2, CO, NOX, and HC. The data generated on Route 2 are used to validate the obtained ANNs. These models are fed with a dataset generated from 324, 300, and 316 km of random driving for each type of vehicle. The results of the model were compared with the IVE model and an OBD-based model, showing similar results without the need to mount the PEMS on the vehicles for long test drives. The generated model is robust to different traffic conditions as a result of its training and validation using a large amount of data obtained under completely random driving conditions.
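The gear-estimation step can be illustrated with a one-dimensional K-means. The sketch assumes that the clustered quantity is the ratio of engine speed to vehicle speed, which is roughly constant within a gear; the abstract does not state which variables were clustered, so that choice, the function names and the sample ratios are hypothetical.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on scalars; returns the sorted cluster centroids.

    Centroids start at evenly spaced quantiles of the data, which keeps the
    result deterministic (an illustrative choice).
    """
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def nearest_cluster(value, centroids):
    """Index of the closest centroid (0 = smallest ratio, i.e. highest gear)."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


# Hypothetical rpm-per-(km/h) ratios logged while driving in three gears.
ratios = [118, 121, 122, 119, 69, 71, 70, 72, 44, 46, 45, 47]
gear_centroids = kmeans_1d(ratios, k=3)
```

In a pipeline like the one described, the cluster labels could then serve as training targets for a classification tree.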
ABSTRACT
Background: Bone cancer is a severe condition often leading to patient mortality. Diagnosis relies on X-rays, MRIs, or CT scans, which require time-consuming manual review by experts. Thus, developing an automated system is crucial for accurate classification of malignant and healthy bone. Methods: Differentiating between them is challenging because they may exhibit similar physical characteristics. The initial step is selecting the optimal edge detection method. Two feature sets are then generated: one with the histogram of oriented gradients (HOG) and one without. Performance evaluation involves two machine learning models: Support Vector Machine (SVM) and Random Forest. Results: Including HOG consistently yields superior results. The SVM model with HOG achieves an F1 score of 0.92, outperforming the Random Forest model's 0.77. This study aims to develop reliable methods for bone cancer classification. The proposed automated method assists surgeons in accurately detecting malignant bone regions using modern image analysis techniques and machine learning models. Incorporating HOG significantly enhances performance, improving differentiation between malignant and healthy bone. Conclusion: Ultimately, this approach supports precise diagnoses and informed treatment decisions for bone cancer patients.
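The F1 score used to compare the two models is the harmonic mean of precision and recall. A minimal sketch follows; the function name is hypothetical and the counts are invented to show how a score of 0.92 could arise, since the abstract does not report a confusion matrix.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0  # no positives predicted or present; defined as 0 here
    return 2 * tp / denominator


# Hypothetical counts: 46 malignant images found, 4 false alarms, 4 missed.
example_f1 = f1_score(tp=46, fp=4, fn=4)
```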
Subjects
Bone Neoplasms, Machine Learning, Humans, Magnetic Resonance Imaging, X-Ray Computed Tomography/methods, Bone Neoplasms/diagnostic imaging
ABSTRACT
The implementation of green financial reform and innovation pilot zones is a pivotal initiative aimed at directing financial resources more effectively towards green transformation and national sustainable development strategy. To this end, this study adopts a dual machine learning model to examine the effect of this pilot policy on energy intensity and the underlying mechanisms, drawing upon data from 254 cities in China spanning from 2006 to 2019. The conclusions obtained confirm that the establishment of these pilot zones has exerted a substantial impact on mitigating energy intensity. This inhibitory effect is particularly evident in cities with lower administrative levels, cities in western regions, smaller and medium-sized cities, and cities dominated by the secondary industry. It should be emphasized that the reduction in energy intensity is achieved through fostering green technology innovation and enhancing green financial development. The results not only provide empirical evidence for the effectiveness of green finance pilot policies in reducing energy intensity, thereby enriching the inclusive impact of financial innovation, but also offer practical insights for strengthening the green financial system and replicating and expanding the pilot zones.
ABSTRACT
Mitigation of nitrous oxide (N2O) emissions in full-scale wastewater treatment plants (WWTPs) has become an irreversible trend in adapting to climate change. Monitoring of N2O emissions plays a fundamental role in understanding and mitigating them. This paper provides a comprehensive review of direct and indirect N2O monitoring methods. The techniques, strengths, limitations, and applicable scenarios of the various methods are discussed. We conclude that the floating chamber technique is suitable for capturing and interpreting the spatiotemporal variability of real-time N2O emissions, owing to its long-term in-situ monitoring capability and high data acquisition frequency. The monitoring duration, location, and frequency should be emphasized to guarantee the accuracy and comparability of the acquired data. Calculation by default emission factors (EFs) is efficient when there is a need for ambiguous historical N2O emission accounts of national-scale or regional-scale WWTPs. Using process-specific EFs is beneficial in promoting mitigation pathways that are primarily focused on low-emission process upgrades. Machine learning models exhibit exemplary performance in predicting N2O emissions. Integrating mechanistic models with machine learning models can improve their explanatory power and sharpen their predictive precision. Implementing strategies that combine nutrient removal with N2O mitigation requires the calibration and validation of multi-path mechanistic models, supported by long-term continuous direct monitoring campaigns.
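An emission-factor calculation of the kind mentioned above can be sketched as follows. It assumes the EF is expressed as kg N2O-N emitted per kg of nitrogen entering the plant, which is one common convention but is not stated in the abstract; the function name and the numbers are hypothetical.

```python
# Mass of N2O per unit mass of nitrogen contained in it (44 g/mol : 28 g/mol).
N2O_PER_N2O_N = 44.0 / 28.0


def n2o_from_emission_factor(tn_load_kg_n, ef_kg_n2o_n_per_kg_n):
    """Estimated N2O emission in kg for a given total-nitrogen load.

    tn_load_kg_n: nitrogen entering the plant over the accounting period, kg N.
    ef_kg_n2o_n_per_kg_n: emission factor, kg N2O-N per kg N (assumed units).
    """
    return tn_load_kg_n * ef_kg_n2o_n_per_kg_n * N2O_PER_N2O_N


# Hypothetical plant: 1000 kg N load and an illustrative EF of 1%.
estimated_n2o_kg = n2o_from_emission_factor(1000.0, 0.01)
```

A process-specific EF would be swapped in through the same parameter, which is why the choice of EF dominates the uncertainty of such accounts.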
Subjects
Environmental Monitoring, Nitrous Oxide, Wastewater, Nitrous Oxide/analysis, Wastewater/analysis, Wastewater/chemistry, Environmental Monitoring/methods, Liquid Waste Disposal/methods
ABSTRACT
As an indicator of the optical characteristics of perovskite materials, the band gap is a crucial parameter that affects the functionality of a wide range of optoelectronic devices. The band gap of a material can be obtained through high-throughput first-principles calculations, but these are labor-intensive, time-consuming, and inefficient, and they do not yield the most accurate results. Machine learning techniques have emerged as a viable and effective substitute for conventional approaches to band gap prediction. This paper collected 201 data records from the literature and open-source databases. By separating the features related to the A, B, and X sites, a dataset of 1208 records containing 30 feature descriptors was established. The dataset was preprocessed, and the Pearson correlation coefficient method was used to eliminate non-essential features, yielding a feature subset. The band gap was then predicted using, in order, the GBR, random forest, LightGBM, and XGBoost algorithms to construct a prediction model for organic-inorganic hybrid perovskites. The results show that the XGBoost algorithm yielded an MAE of 0.0901, an MSE of 0.0173, and an R2 of 0.991310. These values indicate that the XGBoost model has the lowest prediction error among the models compared, suggesting that the input features fit this model best. Finally, analysis of the XGBoost-based model's predictions using the SHAP interpretation method reveals that the occupancy rate of the A-site ion has the greatest impact on the predicted band gap and is negatively correlated with it. The findings provide valuable insights into the relationship between band gap prediction and the significant characteristics of organic-inorganic hybrid perovskites.
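The three metrics reported above (MAE, MSE and R2) can be computed from predicted and reference band gaps as in the sketch below. The function name and the sample values are hypothetical; the data are not from the study.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2


# Hypothetical band gaps in eV, for illustration only.
mae, mse, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

Because MSE squares each error, a model can have a low MAE yet a comparatively high MSE if a few predictions are far off, which is one reason to report both.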
ABSTRACT
Wolves have returned to Germany since 2000. Numbers have grown to 209 territorial pairs in 2021. XGBoost machine learning, combined with SHAP analysis is applied to predict German wolf pair presence in 2022 for 10 × 10 km grid cells. Model input consisted of 38 variables from open sources, covering the period 2000 to 2021. The XGBoost model predicted well, with 0.91 as the AUC. SHAP analysis ranked the variables: distance to the closest neighboring wolf pair was the main driver for a grid cell to become occupied by a wolf pair. The clustering tendency of related wolves seems to be an important explanatory factor here. Second was the percentage of wooded area. The next eight variables related to wolf presence in the preceding year, except at fifth, eighth and tenth position in the total order: human density (square root) in the grid, percentage arable land and road density respectively. Other variables including the occurrence of wild prey were the weakest predictors. The SHAP analysis also provided crucial added value in identifying a variable that had threshold values where its contribution to the prediction changed from positive to negative or vice versa. For instance, low density of people increased the probability of wolf pair presence, whereas a high density decreased this probability. Cumulative lift techniques showed that the model performed almost four times better than random prediction. The combination of XGBoost, SHAP and cumulative lift techniques is new in wolf management and conservation, allowing for the focusing of educational and financial resources.
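Cumulative lift compares the hit rate among the highest-scoring grid cells with the hit rate expected from random selection. A minimal sketch follows, using a hypothetical function name and toy data; a lift of about 4 would correspond to the "almost four times better than random" statement above.

```python
def cumulative_lift(labels, scores, fraction):
    """Lift in the top `fraction` of cases ranked by predicted score.

    labels: 1 if the grid cell became occupied, 0 otherwise.
    Returns (occupancy rate in the top fraction) / (overall occupancy rate).
    """
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    top_n = max(1, round(fraction * len(ranked)))
    top_rate = sum(ranked[:top_n]) / top_n
    base_rate = sum(ranked) / len(ranked)
    return top_rate / base_rate


# Toy data: 10 cells, 2 occupied, and the model ranks both of them first.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
lift_top_20 = cumulative_lift(toy_labels, toy_scores, 0.2)
```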
Subjects
Wolves, Animals, Humans, Probability, Germany
ABSTRACT
Air pollution and climate change are two complementary forces that directly or indirectly affect the environment's physical, chemical, and biological processes. The air quality index (AQI) is a parameter defined to quantify the effect of air pollution. This study examines the prediction of the AQI using multiple machine learning-based models. The pollutants considered are particulate matter (PM10, PM2.5), SO2, and NO2. The study also presents a comparative analysis of two machine learning (ML) models, namely XGBoost and Lasso regression. Conducted in the city of Gorakhpur, Uttar Pradesh, India, the study shows continually changing pollutant concentrations. The prediction accuracy of the models was validated using several statistical metrics. The R2 value for XGBoost (0.9985) is higher than that for Lasso regression (0.9218), indicating less unexplained variance and higher accuracy of XGBoost in predicting the AQI. The statistical measures considered include mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), t-tests and p-values, and confidence intervals (CI). XGBoost's MAE, MSE, and RMSE values are significantly lower than Lasso's, which suggests higher model accuracy. The t-statistics and p-values for MAE, MSE, RMSE, and R2 demonstrate statistically significant performance differences between the XGBoost and Lasso regression models.
Subjects
Air Pollutants, Air Pollution, Cities, Environmental Monitoring, Machine Learning, Particulate Matter, India, Air Pollution/statistics & numerical data, Air Pollutants/analysis, Environmental Monitoring/methods, Particulate Matter/analysis, Sulfur Dioxide/analysis, Nitrogen Dioxide/analysis
ABSTRACT
Electricity consumption and sludge yield (SY) are important indirect greenhouse gas (GHG) emission sources in wastewater treatment plants (WWTPs). Predicting these byproducts is crucial for tailoring technology-related policy decisions. However, this is difficult with mass balance models and mechanistic models, which respectively have limited representation of the nexus between variables and excessive requirements for operational parameters. Herein, we propose integrating two machine learning models, namely gradient boosting tree (GBT) and deep learning (DL), to precisely model electricity consumption intensity (ECI) and SY pointwise for WWTPs in China. Results indicate that GBT and DL are capable of mining massive data to compensate for the lack of available parameters, providing comprehensive modeling that focuses on operating conditions and design parameters, respectively. The proposed model reveals that lower ECI and SY were associated with higher treated wastewater volumes, more lenient effluent standards, and newer equipment. Moreover, ECI and SY showed different patterns depending on whether influent biochemical oxygen demand was above or below 100 mg/L in the anaerobic-anoxic-oxic process. Therefore, managing ECI and SY requires quantifying the coupling relationships between biochemical reactions instead of isolating each variable. Furthermore, the proposed models demonstrate potential economic-related inequalities resulting from synergizing water pollution and GHG emissions management.
Subjects
Greenhouse Gases, Water Purification, Liquid Waste Disposal, Wastewater, Sewage, Water Purification/methods, Greenhouse Effect
ABSTRACT
The lack of objective diagnostic methods for mental disorders challenges the reliability of diagnosis. This study aimed to develop an easily accessible and usable objective method for diagnosing major depressive disorder (MDD), schizophrenia (SZ), bipolar disorder (BPD), and panic disorder (PD) using multiple serum proteins. Serum levels of brain-derived neurotrophic factor (BDNF), VGF (non-acronymic), bicaudal C homolog 1 (BICC1), C-reactive protein (CRP), and cortisol, which are generally recognized to be involved in the different pathogeneses of various mental disorders, were measured in patients with MDD (n = 50), SZ (n = 50), BPD (n = 55), and PD, along with 50 healthy controls (HC). Linear discriminant analysis (LDA) was employed to construct a multi-classification model to classify these mental disorders. Both leave-one-out cross-validation (LOOCV) and fivefold cross-validation were applied to validate the accuracy and stability of the LDA model. All five serum proteins were included in the LDA model, which displayed a high overall accuracy of 96.9% when classifying the MDD, SZ, BPD, PD, and HC groups. Multi-classification accuracy of the LDA model under LOOCV and fivefold cross-validation (within-study replication) reached 96.9% and 96.5%, respectively, demonstrating the feasibility of the blood-based multi-protein LDA model for classifying common mental disorders in a mixed cohort. The results suggest that combining multiple proteins associated with different pathogeneses of mental disorders using LDA may be a novel and relatively objective method for classifying mental disorders. Clinicians should consider combining multiple serum proteins to diagnose mental disorders objectively.
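Leave-one-out cross-validation trains on all samples but one and tests on the held-out sample, repeating this for every sample. The sketch below shows the loop. To stay self-contained it uses a nearest-centroid classifier as a stand-in for LDA, which is a deliberate simplification and not the model used in the study; the function names and the data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class mean is closest to `sample`.

    Stand-in classifier only: the study used LDA, which additionally
    accounts for the pooled covariance of the features.
    """
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        totals = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            totals[i] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((sample[i] - sums[label][i] / counts[label]) ** 2
                   for i in range(len(sample)))

    return min(sums, key=squared_distance)


def loocv_accuracy(features, labels):
    """Share of samples classified correctly when each is held out in turn."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical two-protein measurements for two well-separated groups.
toy_x = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
toy_y = ["HC", "HC", "HC", "MDD", "MDD", "MDD"]
toy_accuracy = loocv_accuracy(toy_x, toy_y)
```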
Subjects
Major Depressive Disorder , Mental Disorders , Humans , Major Depressive Disorder/diagnosis , Reproducibility of Results , Mental Disorders/diagnosis , Blood Proteins , Machine Learning
ABSTRACT
OBJECTIVE: Anemia is a common adverse reaction after hip fracture surgery. The traditional treatment for anemia is allogeneic transfusion. However, transfusion may lead to complications such as septicemia and fever. So far, few studies have reported on the role of machine learning in predicting whether blood transfusion is needed after hip fracture surgery. Therefore, the purpose of this study is to develop machine learning models to predict the likelihood of postoperative blood transfusion in patients undergoing hip fracture surgery. METHODS: This study enrolled 1355 patients who underwent hip fracture surgery at the Affiliated Hospital of Qingdao University from January 2016 to December 2021. Among all patients, 210 received postoperative blood transfusion. All patients were randomly divided into a training group and a testing group at a ratio of 7:3. In the training group, univariate and multivariate logistic regression analyses were used to determine independent risk factors for postoperative transfusion. Then, based on these independent risk factors, a tenfold cross-validation method was used to develop five machine learning models: logistic, multilayer perceptron (MLP), extreme gradient boosting (XGBoost), random forest (RF), and support vector machine (SVM). The receiver operating characteristic (ROC) curve, area under the ROC curve (AUC), and Matthews correlation coefficient (MCC) were generated to evaluate the performance of the models. Calibration plots and decision curve analysis (DCA) were used to test the performance, stability, and clinical applicability of the models. The models were validated using the testing group, and the ROC curve, MCC, calibration plot, and DCA curves were also generated to validate the performance, stability, and clinical applicability of the models.
To further verify the robustness of the models, we randomly sampled 70% of the samples in the testing set, performed 1000 iterations, and calculated the AUC and confidence interval of the five models. Finally, we used SHapley Additive exPlanations (SHAP) to explain these models. RESULTS: Multivariate logistic regression analysis identified eight independent risk factors: age, blood transfusion history, albumin (ALB), globulin (GLO), total bilirubin (TBIL), indirect bilirubin (IBIL), hemoglobin (HB), and blood loss > 200 ml. We finally selected five independent risk factors: HB, GLO, age, IBIL, and blood loss > 200 ml. Based on these five independent risk factors, we generated six characteristic variables, namely HB, HB × HB, HB × blood loss, GLO × HB, age, and age × IBIL, and established five machine learning models using a tenfold cross-validation method. In the training group, the AUC values of logistic, RF, MLP, SVM, and XGB were 0.9320, 0.8911, 0.9327, 0.9225, and 0.8825, respectively, and the average AUC was 0.9122 ± 0.0212. The MCC values were 0.65, 0.77, 0.65, 0.66, and 0.68, respectively, and the calibration plot and DCA performed well. In the testing group, the AUC values of logistic, RF, MLP, SVM, and XGB were 0.8483, 0.7978, 0.8576, 0.8598, and 0.8216, respectively. The average AUC was 0.8370 ± 0.0238, and the MCC values were 0.41, 0.35, 0.40, 0.41, and 0.41, respectively. The calibration plot and DCA in the testing group also showed good performance. The AUC values and confidence intervals (CI) over the 1000 iterations were: logistic (AUC, minCI-maxCI 0.848, 0.804-0.903), RF (AUC, minCI-maxCI 0.797, 0.734-0.857), MLP (AUC, minCI-maxCI 0.858, 0.812-0.902), SVM (AUC, minCI-maxCI 0.859, 0.819-0.910), and XGB (AUC, minCI-maxCI 0.821, 0.764-0.894). The models performed well.
Finally, according to SHAP, HB played the most important role in prediction and interpretation across all five models. CONCLUSION: The five models we developed all performed well in predicting the likelihood of blood transfusion after hip fracture surgery. Therefore, we believe that machine learning-based prediction models have strong application prospects in clinical practice and can help clinicians better predict the risk of blood transfusion after hip fracture surgery.
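The robustness check described in this abstract (repeatedly drawing 70% of the testing set and recomputing the AUC) can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the scores and labels are synthetic, the function names are hypothetical, and the interval is taken as the 2.5th to 97.5th percentile of the resampled AUCs, since the abstract does not state how its interval bounds were derived.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subsample_auc_interval(scores, labels, frac=0.7, iters=1000, seed=0):
    """Draw `frac` of the samples without replacement `iters` times and
    return (mean AUC, lower percentile bound, upper percentile bound)."""
    rng = random.Random(seed)
    n = len(scores)
    k = int(frac * n)
    aucs = []
    while len(aucs) < iters:
        idx = rng.sample(range(n), k)
        sub_labels = [labels[i] for i in idx]
        if 0 < sum(sub_labels) < k:  # AUC needs both classes present
            aucs.append(auc([scores[i] for i in idx], sub_labels))
    aucs.sort()
    low = aucs[int(0.025 * iters)]
    high = aucs[max(0, int(0.975 * iters) - 1)]
    return sum(aucs) / iters, low, high

# Synthetic stand-in for model outputs on a testing set (assumption):
# about 15% positives, with positives scoring higher on average.
rng = random.Random(42)
labels = [1 if rng.random() < 0.15 else 0 for _ in range(200)]
scores = [rng.gauss(1.0 if lab else 0.0, 1.0) for lab in labels]

mean_auc, low, high = subsample_auc_interval(scores, labels)
print(f"AUC {mean_auc:.3f} ({low:.3f}-{high:.3f})")
```

A narrow interval across resamples indicates that the AUC does not hinge on a few particular test cases; the width shrinks as the testing set grows.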
Subjects
Anemia , Hip Fractures , Humans , Bilirubin , Blood Transfusion , Machine Learning
ABSTRACT
BACKGROUND: Given the increasing number of dementia patients worldwide, a new method was developed for machine learning models to identify the 'latent needs' of patients and caregivers to facilitate patient/public involvement in societal decision making. METHODS: Transcribed Japanese-language interviews with 53 dementia patients and caregivers were used. A new morpheme selection method using Z-scores was developed to identify trends in how latent needs are described. F-measures with and without the new method were compared using three machine learning models. RESULTS: The F-measures with the new method were higher for the support vector machine (SVM) (0.81 with the new method versus 0.79 without it for patients) and Naive Bayes (0.69 with the new method versus 0.67 without it for caregivers, and 0.75 versus 0.73 for patients). CONCLUSION: A new scheme based on Z-score adaptation for machine learning models was developed to predict the latent needs of dementia patients and their caregivers by extracting data from interviews in Japanese. However, this study alone cannot establish the significance of adopting the new method because the sample dataset was not large enough. Such Z-score-based pre-selection from text data for machine learning models should be examined further with suitably refined methods in the near future.
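The abstract does not specify how the Z-scores are computed, so the following is only one plausible reading, offered as a hedged sketch: each token is scored with a two-proportion z statistic comparing how often it appears in utterances labeled as latent needs versus all other utterances, and only tokens beyond a threshold are kept as features. The function names, the 1.96 threshold, and the English placeholder tokens are all assumptions for illustration.

```python
import math
from collections import Counter

def two_proportion_z(k1, n1, k2, n2):
    """z statistic for the difference between proportions k1/n1 and k2/n2."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (k1 / n1 - k2 / n2) / se

def select_tokens(need_docs, other_docs, threshold=1.96):
    """Keep tokens whose document frequency differs markedly between the two sets."""
    need_df = Counter(tok for doc in need_docs for tok in set(doc))
    other_df = Counter(tok for doc in other_docs for tok in set(doc))
    n1, n2 = len(need_docs), len(other_docs)
    vocab = set(need_df) | set(other_df)
    scores = {tok: two_proportion_z(need_df[tok], n1, other_df[tok], n2) for tok in vocab}
    return {tok: z for tok, z in scores.items() if abs(z) >= threshold}

# Toy tokenized utterances (hypothetical): "worry" is frequent only in the
# latent-need set, while "the" appears everywhere and carries no signal.
need_docs = [["worry", "the"]] * 8 + [["the"]] * 2
other_docs = [["worry", "the"]] * 1 + [["the"]] * 9

selected = select_tokens(need_docs, other_docs)
print(selected)
```

In practice the tokens would come from a Japanese morphological analyzer, and the retained tokens would feed the downstream classifier in place of the full vocabulary.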