Search | Biblioteca Virtual em Saúde (Virtual Health Library)

1.

DROEG: a method for cancer drug response prediction based on omics and essential genes integration.

Wu, Peike; Sun, Renliang; Fahira, Aamir; Chen, Yongzhou; Jiangzhou, Huiting; Wang, Ke; Yang, Qiangzhen; Dai, Yang; Pan, Dun; Shi, Yongyong; Wang, Zhuo.

Brief Bioinform ; 24(2)2023 03 19.

Article in English | MEDLINE | ID: mdl-36715269

ABSTRACT

Predicting therapeutic responses in cancer patients is a major challenge in the field of precision medicine due to high inter- and intra-tumor heterogeneity. Most drug response models need to be improved in terms of accuracy, and there is limited research to assess therapeutic responses of particular tumor types. Here, we developed a novel method DROEG (Drug Response based on Omics and Essential Genes) for prediction of drug response in tumor cell lines by integrating genomic, transcriptomic and methylomic data along with CRISPR essential genes, and revealed that the incorporation of tumor proliferation essential genes can improve drug sensitivity prediction. Concisely, DROEG integrates literature-based and statistics-based methods to select features and uses Support Vector Regression for model construction. We demonstrate that DROEG outperforms most state-of-the-art algorithms by both qualitative (prediction accuracy for drug-sensitive/resistant) and quantitative (Pearson correlation coefficient between the predicted and actual IC50) evaluation in Genomics of Drug Sensitivity in Cancer and Cancer Cell Line Encyclopedia datasets. In addition, DROEG is further applied to the pan-gastrointestinal tumor with high prevalence and mortality as a case study at both cell line and clinical levels to evaluate the model efficacy and discover potential prognostic biomarkers in Cisplatin and Epirubicin treatment. Interestingly, the CRISPR essential gene information is found to be the most important contributor to enhance the accuracy of the DROEG model. To our knowledge, this is the first study to integrate essential genes with multi-omics data to improve cancer drug response prediction and provide insights into personalized precision treatment.

Subjects

Antineoplastic Agents, Neoplasms, Humans, Essential Genes, Antineoplastic Agents/pharmacology, Antineoplastic Agents/therapeutic use, Neoplasms/drug therapy, Neoplasms/genetics, Genomics/methods, Precision Medicine/methods
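
The quantitative evaluation named in the abstract above, the Pearson correlation coefficient between predicted and actual IC50, can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `pearson_r` and the sample values are assumptions made for the example and are not taken from the DROEG paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical IC50 values for illustration only (not data from the paper).
actual_ic50 = [1.0, 2.0, 3.0, 4.0]
predicted_ic50 = [1.5, 2.5, 3.5, 4.5]
r = pearson_r(actual_ic50, predicted_ic50)
```

A value of `r` close to 1 means the predictions track the measured responses linearly; it says nothing about a constant offset, which is why the shifted predictions in the example still correlate perfectly.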

2.

Machine Learning of Plasma Proteomics Classifies Diagnosis of Interstitial Lung Disease.

Huang, Yong; Ma, Shwu-Fan; Oldham, Justin M; Adegunsoye, Ayodeji; Zhu, Daisy; Murray, Susan; Kim, John S; Bonham, Catherine; Strickland, Emma; Linderholm, Angela L; Lee, Cathryn T; Paul, Tessy; Mannem, Hannah; Maher, Toby M; Molyneaux, Philip L; Strek, Mary E; Martinez, Fernando J; Noth, Imre.

Am J Respir Crit Care Med ; 2024 Feb 29.

Article in English | MEDLINE | ID: mdl-38422478

ABSTRACT

RATIONALE: Distinguishing connective tissue disease-associated interstitial lung disease (CTD-ILD) from idiopathic pulmonary fibrosis (IPF) can be clinically challenging. OBJECTIVES: To identify proteins that separate and classify CTD-ILD from IPF patients. METHODS: Four registries with 1247 IPF and 352 CTD-ILD patients were included in analyses. Plasma samples were subjected to high-throughput proteomics assays. Protein features were prioritized using Recursive Feature Elimination (RFE) to construct a proteomic classifier. Multiple machine learning models, including Support Vector Machine, LASSO regression, Random Forest (RF), and imbalanced-RF, were trained and tested in independent cohorts. The validated models were used to classify each case iteratively in external datasets. MEASUREMENT AND MAIN RESULTS: A classifier with 37 proteins (PC37) was enriched in the biological processes of bronchiole development, smooth muscle proliferation, and immune responses. Four machine learning models used PC37 with sex and age score to generate continuous classification values. Receiver-operating-characteristic curve analyses of these scores demonstrated a consistent area under the curve of 0.85-0.90 in the test cohort and 0.94-0.96 in the single-sample dataset. Binary classification demonstrated 78.6%-80.4% sensitivity and 76%-84.4% specificity in the test cohort, and 93.5%-96.1% sensitivity and 69.5%-77.6% specificity in the single-sample classification dataset. Composite analysis of all machine learning models confirmed 78.2% (194/248) accuracy in the test cohort and 82.9% (208/251) in the single-sample classification dataset. CONCLUSIONS: Multiple machine learning models trained with large cohort proteomic datasets consistently distinguished CTD-ILD from IPF. The identified proteins are involved in immune pathways. We further developed a novel approach for single sample classification, which could facilitate honing the differential diagnosis of ILD in challenging cases and improve clinical decision-making.
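
The sensitivity, specificity, and accuracy figures quoted in the abstract above all derive from confusion-matrix counts. The sketch below shows the arithmetic, assuming a binary classifier; `binary_metrics` and the example counts are hypothetical, while the two accuracy fractions (194/248 and 208/251) are the ones reported in the abstract.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true cases that were detected
    specificity = tn / (tn + fp)  # share of non-cases correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not from the study).
sens, spec, acc = binary_metrics(tp=80, fn=20, tn=90, fp=10)

# Accuracy fractions as reported in the abstract, converted to percentages.
test_cohort_accuracy = round(100 * 194 / 248, 1)
single_sample_accuracy = round(100 * 208 / 251, 1)
```

The last two lines reproduce the reported 78.2% and 82.9% from their numerators and denominators, which is a quick consistency check on the abstract's figures.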

3.

A methylation clock model of mild SARS-CoV-2 infection provides insight into immune dysregulation.

Mao, Weiguang; Miller, Clare M; Nair, Venugopalan D; Ge, Yongchao; Amper, Mary Anne S; Cappuccio, Antonio; George, Mary-Catherine; Goforth, Carl W; Guevara, Kristy; Marjanovic, Nada; Nudelman, German; Pincas, Hanna; Ramos, Irene; Sealfon, Rachel S G; Soares-Schanoski, Alessandra; Vangeti, Sindhu; Vasoya, Mital; Weir, Dawn L; Zaslavsky, Elena; Kim-Schulze, Seunghee; Gnjatic, Sacha; Merad, Miriam; Letizia, Andrew G; Troyanskaya, Olga G; Sealfon, Stuart C; Chikina, Maria.

Mol Syst Biol ; 19(5): e11361, 2023 05 09.

Article in English | MEDLINE | ID: mdl-36919946

ABSTRACT

DNA methylation comprises a cumulative record of lifetime exposures superimposed on genetically determined markers. Little is known about methylation dynamics in humans following an acute perturbation, such as infection. We characterized the temporal trajectory of blood epigenetic remodeling in 133 participants in a prospective study of young adults before, during, and after asymptomatic and mildly symptomatic SARS-CoV-2 infection. The differential methylation caused by asymptomatic or mildly symptomatic infections was indistinguishable. While differential gene expression largely returned to baseline levels after the virus became undetectable, some differentially methylated sites persisted for months of follow-up, with a pattern resembling autoimmune or inflammatory disease. We leveraged these responses to construct methylation-based machine learning models that distinguished samples from pre-, during-, and postinfection time periods, and quantitatively predicted the time since infection. The clinical trajectory in the young adults and in a diverse cohort with more severe outcomes was predicted by the similarity of methylation before or early after SARS-CoV-2 infection to the model-defined postinfection state. Unlike the phenomenon of trained immunity, the postacute SARS-CoV-2 epigenetic landscape we identify is antiprotective.

Subjects

COVID-19, Young Adult, Humans, COVID-19/genetics, SARS-CoV-2/genetics, Prospective Studies, DNA Methylation/genetics, Post-Translational Protein Processing

4.

The application of different machine learning models based on PET/CT images and EGFR in predicting brain metastasis of adenocarcinoma of the lung.

Kong, Chao; Yin, Xiaoyan; Zou, Jingmin; Ma, Changsheng; Liu, Kai.

BMC Cancer ; 24(1): 454, 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38605303

ABSTRACT

OBJECTIVE: To explore the value of six machine learning models based on PET/CT radiomics combined with EGFR in predicting brain metastases of lung adenocarcinoma. METHODS: We retrospectively collected 204 patients with lung adenocarcinoma who underwent PET/CT examination and EGFR gene detection before treatment at the Cancer Hospital Affiliated to Shandong First Medical University in 2020. Univariate analysis and multivariate logistic regression analysis were used to find the independent risk factors for brain metastasis. Based on PET/CT imaging combined with EGFR and PET metabolic indexes, six machine learning models were established to predict brain metastases of lung adenocarcinoma. Finally, ten-fold cross-validation was used to evaluate predictive effectiveness. RESULTS: In univariate analysis, patients with N2-3, EGFR mutation-positive status, LYM% ≤ 20, and elevated tumor markers (P < 0.05) were more likely to develop brain metastases. In multivariate logistic regression analysis, PET metabolic indices revealed that SUVmax, SUVpeak, Volume, and TLG were risk factors for lung adenocarcinoma brain metastasis (P < 0.05). The SVM model was the most efficient predictor of brain metastasis, with an AUC of 0.82 (PET/CT group), 0.70 (CT group), and 0.76 (PET group). CONCLUSIONS: The machine learning model combining radiomics with EGFR, as a new method, has higher accuracy than EGFR mutation alone. The SVM model is the most effective method for predicting brain metastases of lung adenocarcinoma, and the prediction efficiency of the PET/CT group is better than that of the PET group and the CT group.

Subjects

Lung Adenocarcinoma, Brain Neoplasms, ErbB Receptors, Lung Neoplasms, Machine Learning, Humans, Lung Adenocarcinoma/diagnostic imaging, Lung Adenocarcinoma/pathology, Brain Neoplasms/diagnostic imaging, ErbB Receptors/genetics, Lung/pathology, Lung Neoplasms/genetics, Positron Emission Tomography Computed Tomography, Retrospective Studies
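
Ten-fold cross-validation, the evaluation scheme mentioned in the abstract above, partitions the samples so that each one is held out exactly once. The sketch below generates such splits with the standard library; the function name, the shuffling seed, and the round-robin fold assignment are assumptions for illustration, and only the sample count of 204 comes from the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps splits reproducible
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out, test_idx in enumerate(folds):
        train_idx = [
            idx
            for fold_no, fold in enumerate(folds)
            if fold_no != held_out
            for idx in fold
        ]
        yield train_idx, test_idx


# 204 is the cohort size stated in the abstract; everything else is illustrative.
splits = list(k_fold_indices(204, k=10))
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`, with the ten scores averaged. With class imbalance, a stratified variant that preserves the outcome ratio in every fold is often preferable.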

5.

Characterization and optimization of 5´ untranslated region containing poly-adenine tracts in Kluyveromyces marxianus using machine-learning model.

Zeng, Junyuan; Song, Kunfeng; Wang, Jingqi; Wen, Haimei; Zhou, Jungang; Ni, Ting; Lu, Hong; Yu, Yao.

Microb Cell Fact ; 23(1): 7, 2024 Jan 03.

Article in English | MEDLINE | ID: mdl-38172836

ABSTRACT

BACKGROUND: The 5´ untranslated region (5´ UTR) plays a key role in regulating translation efficiency and mRNA stability, making it a favored target in genetic engineering and synthetic biology. A common feature found in the 5´ UTR is the poly-adenine (poly(A)) tract. However, the effect of 5´ UTR poly(A) on protein production remains controversial. Machine-learning models are powerful tools for explaining the complex contributions of features, but models incorporating features of 5´ UTR poly(A) are currently lacking. Thus, our goal is to construct such a model, using natural 5´ UTRs from Kluyveromyces marxianus, a promising cell factory for producing heterologous proteins. RESULTS: We constructed a mini-library consisting of 207 5´ UTRs harboring poly(A) and 34 5´ UTRs without poly(A) from K. marxianus. The effects of each 5´ UTR on the production of a GFP reporter were evaluated individually in vivo, and the resulting protein abundance spanned an approximately 450-fold range throughout. The data were used to train a multi-layer perceptron neural network (MLP-NN) model that incorporated the length and position of poly(A) as features. The model exhibited good performance in predicting protein abundance (average R² = 0.7290). The model suggests that the length of poly(A) is negatively correlated with protein production, whereas poly(A) located between 10 and 30 nt upstream of the start codon (AUG) exhibits a weak positive effect on protein abundance. Using the model as guidance, the deletion or reduction of poly(A) upstream of 30 nt preceding AUG tended to improve the production of GFP and a feruloyl esterase. Deletions of poly(A) showed inconsistent effects on mRNA levels, suggesting that poly(A) represses protein production either with or without reducing mRNA levels. CONCLUSION: The effects of poly(A) on protein production depend on its length and position. Integrating poly(A) features into machine-learning models improves simulation accuracy. Deleting or reducing poly(A) upstream of 30 nt preceding AUG tends to enhance protein production. This optimization strategy can be applied to enhance the yield of K. marxianus and other microbial cell factories.

Subjects

Kluyveromyces, 5' Untranslated Regions, Base Sequence, Kluyveromyces/genetics, Kluyveromyces/metabolism, Messenger RNA/genetics
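
The abstract above describes a model that uses the length and position of poly(A) tracts as features. One plausible way to extract such features from a 5´ UTR sequence is sketched below. The minimum tract length of 5 adenines, the function name, and the example sequence are all assumptions made for illustration; the abstract does not state how the authors defined or encoded a tract.

```python
import re


def poly_a_features(utr, min_len=5):
    """Return a list of (length, distance_to_aug) for each poly(A) tract.

    `utr` is the 5' UTR sequence ending immediately before the start codon.
    `distance_to_aug` counts the nucleotides between the 3' end of the tract
    and the AUG. `min_len` is an assumed threshold, not the paper's definition.
    """
    utr = utr.upper()
    pattern = re.compile("A{%d,}" % min_len)
    features = []
    for match in pattern.finditer(utr):
        length = match.end() - match.start()
        distance_to_aug = len(utr) - match.end()
        features.append((length, distance_to_aug))
    return features


# Hypothetical sequence: 2 nt, a 7-nt poly(A) tract, then 11 nt before the AUG.
example_utr = "GC" + "A" * 7 + "GCUCGGCUACG"
tracts = poly_a_features(example_utr)
```

Each tuple could then be binned, for example by whether the tract lies within 10 to 30 nt of the AUG, before being passed to a regression model.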

6.

Application of a robust MALDI mass spectrometry approach for bee pollen investigation.

Braglia, Chiara; Alberoni, Daniele; Di Gioia, Diana; Giacomelli, Alessandra; Bocquet, Michel; Bulet, Philippe.

Anal Bioanal Chem ; 416(19): 4315-4324, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38879687

ABSTRACT

Pollen collected by pollinators can be used as a marker of foraging behavior and can indicate the botanical species present in each environment. Pollen intake is essential for pollinators' health and survival. During foraging, some pollinators, such as honeybees, manipulate the collected pollen by mixing it with salivary secretions and nectar (corbicular pollen), which changes the pollen's chemical profile. Different tools have been developed for the identification of the botanical origin of pollen, based on microscopy, spectrometry, or molecular markers. However, to date, corbicular pollen has never been investigated. In our work, corbicular pollen from 5 regions with different climate conditions was collected during spring. Pollens were identified with microscopy-based techniques and then analyzed by MALDI-MS. Four different chemical extraction solutions and two physical disruption methods were tested to achieve an effective MALDI-MS protocol. The best performance was obtained using a sonication disruption method after extraction with acetic acid or trifluoroacetic acid. Therefore, we propose a new rapid and reliable methodology for the identification of the botanical origin of corbicular pollens using MALDI-MS. This new approach opens up a wide range of environmental studies, spanning from plant biodiversity to ecosystem trophic interactions.

Subjects

Pollen, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods, Pollen/chemistry, Bees/physiology, Animals

7.

Utilising intraoperative respiratory dynamic features for developing and validating an explainable machine learning model for postoperative pulmonary complications.

Li, Peiyi; Gao, Shuanliang; Wang, Yaqiang; Zhou, RuiHao; Chen, Guo; Li, Weimin; Hao, Xuechao; Zhu, Tao.

Br J Anaesth ; 132(6): 1315-1326, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38637267

ABSTRACT

BACKGROUND: Timely detection of modifiable risk factors for postoperative pulmonary complications (PPCs) could inform ventilation strategies that attenuate lung injury. We sought to develop, validate, and internally test machine learning models that use intraoperative respiratory features to predict PPCs. METHODS: We analysed perioperative data from a cohort comprising patients aged 65 yr and older at an academic medical centre from 2019 to 2023. Two linear and four nonlinear learning models were developed and compared with the current gold-standard risk assessment tool ARISCAT (Assess Respiratory Risk in Surgical Patients in Catalonia Tool). The Shapley additive explanation of artificial intelligence was utilised to interpret feature importance and interactions. RESULTS: Perioperative data were obtained from 10 284 patients who underwent 10 484 operations (mean age [range] 71 [65-98] yr; 42% female). An optimised XGBoost model that used preoperative variables and intraoperative respiratory variables had area under the receiver operating characteristic curves (AUROCs) of 0.878 (0.866-0.891) and 0.881 (0.879-0.883) in the validation and prospective cohorts, respectively. These models outperformed ARISCAT (AUROC: 0.496-0.533). The intraoperative dynamic features of respiratory dynamic system compliance, mechanical power, and driving pressure were identified as key modifiable contributors to PPCs. A simplified model based on XGBoost including 20 variables generated an AUROC of 0.864 (0.852-0.875) in an internal testing cohort. This has been developed into a web-based tool for further external validation (https://aorm.wchscu.cn/). CONCLUSIONS: These findings suggest that real-time identification of surgical patients' risk of postoperative pulmonary complications could help personalise intraoperative ventilatory strategies and reduce postoperative pulmonary complications.

Subjects

Machine Learning, Postoperative Complications, Humans, Aged, Female, Postoperative Complications/prevention & control, Male, Aged 80 and over, Lung Diseases/etiology, Lung Diseases/prevention & control, Risk Assessment/methods, Prospective Studies, Cohort Studies, Risk Factors, Intraoperative Monitoring/methods
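
The AUROC values reported in the abstract above have a simple probabilistic reading: the chance that a randomly chosen patient with a complication receives a higher risk score than a randomly chosen patient without one. The sketch below computes it directly from that definition. The function name and the labels and scores are hypothetical; the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcome labels and model risk scores, for illustration only.
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.4]
example_auc = auroc(example_labels, example_scores)
```

Under this reading, an AUROC near 0.5, as reported for ARISCAT in this cohort, means the score ranks patients no better than chance.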

8.

Exploring gonadotropin dosing effects on MII oocyte retrieval in ovarian stimulation.

Zielinski, Krystian; Kloska, Anna; Wygocki, Piotr; Zielen, Marcin; Kunicki, Michal.

J Assist Reprod Genet ; 41(6): 1557-1567, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38573535

ABSTRACT

PURPOSE: Ovarian stimulation with gonadotropins is crucial for obtaining mature oocytes for in vitro fertilization (IVF). Determining the optimal gonadotropin dosage is essential for maximizing its effectiveness. Our study aimed to develop a machine learning (ML) model to predict oocyte counts in IVF patients and retrospectively analyze whether higher gonadotropin doses improve ovarian stimulation outcomes. METHODS: We analyzed the data from 9598 ovarian stimulations. An ML model was employed to predict the number of mature metaphase II (MII) oocytes based on clinical parameters. These predictions were compared with the actual counts of retrieved MII oocytes at different gonadotropin dosages. RESULTS: The ML model provided precise predictions of MII counts, with the AMH and AFC being the most important, and the previous stimulation outcome and age, the less important features for the prediction. Our findings revealed that increasing gonadotropin dosage did not result in a higher number of retrieved MII oocytes. Specifically, for patients predicted to produce 4-8 MII oocytes, a decline in oocyte count was observed as gonadotropin dosage increased. Patients with low (1-3) and high (9-12) MII predictions achieved the best results when administered a daily dose of 225 IU; lower and higher doses proved to be less effective. CONCLUSIONS: Our study suggests that high gonadotropin doses do not enhance MII oocyte retrieval. Our ML model can offer clinicians a novel tool for the precise prediction of MII to guide gonadotropin dosing.

Subjects

Fertilization in Vitro, Gonadotropins, Oocyte Retrieval, Oocytes, Ovulation Induction, Humans, Female, Ovulation Induction/methods, Oocyte Retrieval/methods, Adult, Oocytes/drug effects, Oocytes/growth & development, Gonadotropins/administration & dosage, Gonadotropins/therapeutic use, Fertilization in Vitro/methods, Pregnancy, Pregnancy Rate, Retrospective Studies, Metaphase/drug effects

9.

GPS Data and Machine Learning Tools, a Practical and Cost-Effective Combination for Estimating Light Vehicle Emissions.

Rivera-Campoverde, Néstor Diego; Arenas-Ramírez, Blanca; Muñoz Sanz, José Luis; Jiménez, Edisson.

Sensors (Basel) ; 24(7)2024 Apr 05.

Article in English | MEDLINE | ID: mdl-38610515

ABSTRACT

This paper focuses on the emissions of the three most sold categories of light vehicles: sedans, SUVs, and pickups. The research is carried out through an innovative methodology based on GPS and machine learning in real driving conditions. For this purpose, driving data from the three best-selling vehicles in Ecuador are acquired using a data logger with GPS included, and emissions are measured using a PEMS in six RDE tests with two standardized routes for each vehicle. The data obtained on Route 1 are used to estimate the gears used during driving using the K-means algorithm and classification trees. Then, the relative importance of driving variables is estimated using random forest techniques, followed by the training of ANNs to estimate CO2, CO, NOX, and HC. The data generated on Route 2 are used to validate the obtained ANNs. These models are fed with a dataset generated from 324, 300, and 316 km of random driving for each type of vehicle. The results of the model were compared with the IVE model and an OBD-based model, showing similar results without the need to mount the PEMS on the vehicles for long test drives. The generated model is robust to different traffic conditions as a result of its training and validation using a large amount of data obtained under completely random driving conditions.

10.

Enhancing Bone Cancer Diagnosis Through Image Extraction and Machine Learning: A State-of-the-Art Approach.

Shrivastava, Abhishek; Nag, Mukesh Kumar.

Surg Innov ; 31(1): 58-70, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38059371

ABSTRACT

Background: Bone cancer is a severe condition often leading to patient mortality. Diagnosis relies on X-rays, MRIs, or CT scans, which require time-consuming manual review by experts. Thus, developing an automated system is crucial for accurate classification of malignant and healthy bone. Methods: Differentiating between them poses a challenge as they may exhibit similar physical characteristics. The initial step is selecting the optimal edge detection method. Two feature sets are then generated: one with the histogram of oriented gradients (HOG) and one without. Performance evaluation involves two machine learning models: Support Vector Machine (SVM) and Random Forest. Results: Including HOG consistently yields superior results. The SVM model with HOG achieves an F1 score of 0.92, outperforming the Random Forest model's 0.77. This study aims to develop reliable methods for bone cancer classification. The proposed automated method assists surgeons in accurately detecting malignant bone regions using modern image analysis techniques and machine learning models. Incorporating HOG significantly enhances performance, improving differentiation between malignant and healthy bone. Conclusion: Ultimately, this approach supports precise diagnoses and informed treatment decisions for bone cancer patients.

Subjects

Bone Neoplasms, Machine Learning, Humans, Magnetic Resonance Imaging, X-Ray Computed Tomography/methods, Bone Neoplasms/diagnostic imaging
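
The F1 score used in the abstract above to compare the two classifiers is the harmonic mean of precision and recall. A minimal sketch follows. The function name is an assumption, and the counts are hypothetical values picked so that the result lands on 0.92; they are not the study's confusion matrix.

```python
def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)  # how many flagged regions were malignant
    recall = tp / (tp + fn)     # how many malignant regions were flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not from the paper).
example_f1 = f1_score(tp=46, fp=4, fn=4)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high F1 by maximising recall at the expense of precision or the reverse.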

11.

Direct and indirect monitoring methods for nitrous oxide emissions in full-scale wastewater treatment plants: A critical review.

Shang, Zhenxin; Cai, Chen; Guo, Yanli; Huang, Xiangfeng; Peng, Kaiming; Guo, Ru; Wei, Zhongqing; Wu, Chenyuan; Cheng, Shunjian; Liao, Youxiang; Hung, Chih-Yu; Liu, Jia.

J Environ Manage ; 358: 120842, 2024 May.

Article in English | MEDLINE | ID: mdl-38599092

ABSTRACT

Mitigation of nitrous oxide (N2O) emissions in full-scale wastewater treatment plants (WWTPs) has become an irreversible trend in adapting to climate change. Monitoring of N2O emissions plays a fundamental role in understanding and mitigating N2O emissions. This paper provides a comprehensive review of direct and indirect N2O monitoring methods. The techniques, strengths, limitations, and applicable scenarios of various methods are discussed. We conclude that the floating chamber technique is suitable for capturing and interpreting the spatiotemporal variability of real-time N2O emissions, due to its long-term in-situ monitoring capability and high data acquisition frequency. The monitoring duration, location, and frequency should be emphasized to guarantee the accuracy and comparability of acquired data. Calculation by default emission factors (EFs) is efficient when there is a need for ambiguous historical N2O emission accounts of national-scale or regional-scale WWTPs. Using process-specific EFs is beneficial in promoting mitigation pathways that are primarily focused on low-emission process upgrades. Machine learning models exhibit exemplary performance in the prediction of N2O emissions. Integrating mechanistic models with machine learning models can improve their explanatory power and sharpen their predictive precision. The implementation of the synergy of nutrient removal and N2O mitigation strategies necessitates the calibration and validation of multi-path mechanistic models, supported by long-term continuous direct monitoring campaigns.

Subjects

Environmental Monitoring, Nitrous Oxide, Wastewater, Nitrous Oxide/analysis, Wastewater/analysis, Wastewater/chemistry, Environmental Monitoring/methods, Liquid Waste Disposal/methods

12.

Prediction of Organic-Inorganic Hybrid Perovskite Band Gap by Multiple Machine Learning Algorithms.

Feng, Shun; Wang, Juan.

Molecules ; 29(2)2024 Jan 19.

Article in English | MEDLINE | ID: mdl-38276577

ABSTRACT

As an indicator of the optical characteristics of perovskite materials, the band gap is a crucial parameter that impacts the functionality of a wide range of optoelectronic devices. Obtaining the band gap of a material via a labor-intensive, time-consuming, and inefficient high-throughput calculation based on first principles is possible. However, it does not yield the most accurate results. Machine learning techniques emerge as a viable and effective substitute for conventional approaches in band gap prediction. This paper collected 201 pieces of data through the literature and open-source databases. By separating the features related to positions A, B, and X, a dataset of 1208 pieces of data containing 30 feature descriptors was established. The dataset underwent preprocessing, and the Pearson correlation coefficient method was employed to eliminate non-essential features as a subset of features. The band gap was predicted using the GBR algorithm, the random forest algorithm, the LightGBM algorithm, and the XGBoost algorithm, in that order, to construct a prediction model for organic-inorganic hybrid perovskites. The outcomes demonstrate that the XGBoost algorithm yielded an MAE value of 0.0901, an MSE value of 0.0173, and an R² value of 0.991310. These values suggest that, compared to the other two models, the XGBoost model exhibits the lowest prediction error, suggesting that the input features may better fit the prediction model. Finally, analysis of the XGBoost-based prediction model's prediction results using the SHAP model interpretation method reveals that the occupancy rate of the A-position ion has the greatest impact on the prediction of the band gap and has a negative correlation with the prediction results of the band gap. The findings provide valuable insights into the relationship between the prediction of band gaps and significant characteristics of organic-inorganic hybrid perovskites.
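
The three error measures quoted in the abstract above (MAE, MSE, and the coefficient of determination) can be computed from paired true and predicted values as sketched below. The function name and the sample band-gap values are assumptions for illustration and are not data from the paper.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)      # residual sum of squares
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2


# Hypothetical band gaps in eV, for illustration only.
true_gap = [1.5, 2.0, 2.5, 3.0]
predicted_gap = [1.6, 1.9, 2.5, 3.2]
mae, mse, r2 = regression_metrics(true_gap, predicted_gap)
```

MAE and MSE carry the units of the target (eV and eV squared here), whereas R² is unitless and depends on how widely the true values are spread, so the three should be read together.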

13.

Applying XGBoost and SHAP to Open Source Data to Identify Key Drivers and Predict Likelihood of Wolf Pair Presence.

Schoonemann, Jeanine; Nagelkerke, Jurriaan; Seuntjens, Terri G; Osinga, Nynke; van Liere, Diederik.

Environ Manage ; 73(5): 1072-1087, 2024 May.

Article in English | MEDLINE | ID: mdl-38372749

ABSTRACT

Wolves have returned to Germany since 2000. Numbers have grown to 209 territorial pairs in 2021. XGBoost machine learning, combined with SHAP analysis is applied to predict German wolf pair presence in 2022 for 10 × 10 km grid cells. Model input consisted of 38 variables from open sources, covering the period 2000 to 2021. The XGBoost model predicted well, with 0.91 as the AUC. SHAP analysis ranked the variables: distance to the closest neighboring wolf pair was the main driver for a grid cell to become occupied by a wolf pair. The clustering tendency of related wolves seems to be an important explanatory factor here. Second was the percentage of wooded area. The next eight variables related to wolf presence in the preceding year, except at fifth, eighth and tenth position in the total order: human density (square root) in the grid, percentage arable land and road density respectively. Other variables including the occurrence of wild prey were the weakest predictors. The SHAP analysis also provided crucial added value in identifying a variable that had threshold values where its contribution to the prediction changed from positive to negative or vice versa. For instance, low density of people increased the probability of wolf pair presence, whereas a high density decreased this probability. Cumulative lift techniques showed that the model performed almost four times better than random prediction. The combination of XGBoost, SHAP and cumulative lift techniques is new in wolf management and conservation, allowing for the focusing of educational and financial resources.

Subjects

Wolves, Animals, Humans, Probability, Germany
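
The abstract above reports that cumulative lift showed the model performing almost four times better than random prediction. Lift compares the hit rate among the highest-scored cases with the overall base rate. The sketch below is one common formulation; the function name, the 20% cutoff, and the grid-cell data are hypothetical, and the abstract does not specify which cutoff the authors used.

```python
def cumulative_lift(labels, scores, fraction):
    """Hit rate in the top `fraction` of cases by score, divided by the base rate."""
    ranked = [
        label
        for _, label in sorted(
            zip(scores, labels), key=lambda pair: pair[0], reverse=True
        )
    ]
    base_rate = sum(ranked) / len(ranked)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(ranked[:n_top]) / n_top
    return top_rate / base_rate


# Hypothetical grid cells: 1 = wolf pair present, with model scores.
presence = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
model_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
lift_top_20 = cumulative_lift(presence, model_scores, 0.2)
```

A lift of 1 corresponds to random selection, so a value near 4 would mean that the top-ranked cells contain roughly four times as many occupied cells as an equally sized random sample.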

14.

Modeling Indirect Greenhouse Gas Emissions Sources from Urban Wastewater Treatment Plants: Integrating Machine Learning Models to Compensate for Sparse Parameters with Abundant Observations.

Huang, Yujun; Xie, Yifan; Wu, Yipeng; Meng, Fanlin; He, Chengyu; Zou, Hao; Wang, Xiaoting; Shui, Ailun; Liu, Shuming.

Environ Sci Technol ; 57(48): 19860-19870, 2023 Dec 05.

Article in English | MEDLINE | ID: mdl-37976424

ABSTRACT

Electricity consumption and sludge yield (SY) are important indirect greenhouse gas (GHG) emission sources in wastewater treatment plants (WWTPs). Predicting these byproducts is crucial for tailoring technology-related policy decisions. However, it is challenging to balance mass balance models and mechanistic models, which respectively have limited intervariable nexus representation and excessive requirements on operational parameters. Herein, we propose integrating two machine learning models, namely, gradient boosting tree (GBT) and deep learning (DL), to precisely pointwise model electricity consumption intensity (ECI) and SY for WWTPs in China. Results indicate that GBT and DL are capable of mining massive data to compensate for the lack of available parameters, providing comprehensive modeling focused on operation conditions and designed parameters, respectively. The proposed model reveals that lower ECI and SY were associated with higher treated wastewater volumes, more lenient effluent standards, and newer equipment. Moreover, ECI and SY showed different patterns when influent biochemical oxygen demand is above or below 100 mg/L in the anaerobic-anoxic-oxic process. Therefore, managing ECI and SY requires quantifying the coupling relationships between biochemical reactions instead of isolating each variable. Furthermore, the proposed models demonstrate potential economic-related inequalities resulting from synergizing water pollution and GHG emissions management.

Subjects

Greenhouse Gases, Water Purification, Liquid Waste Disposal, Wastewater, Sewage, Water Purification/methods, Greenhouse Effect

15.

Predicting the diagnosis of various mental disorders in a mixed cohort using blood-based multi-protein model: a machine learning approach.

Chen, Suzhen; Chen, Gang; Li, Yinghui; Yue, Yingying; Zhu, Zixin; Li, Lei; Jiang, Wenhao; Shen, Zhongxia; Wang, Tianyu; Hou, Zhenghua; Xu, Zhi; Shen, Xinhua; Yuan, Yonggui.

Eur Arch Psychiatry Clin Neurosci ; 273(6): 1267-1277, 2023 Sep.

Article in English | MEDLINE | ID: mdl-36567366

ABSTRACT

The lack of objective diagnostic methods for mental disorders challenges the reliability of diagnosis. The study aimed to develop an easily accessible and useable objective method for diagnosing major depressive disorder (MDD), schizophrenia (SZ), bipolar disorder (BPD), and panic disorder (PD) using serum multi-protein. Serum levels of brain-derived neurotrophic factor (BDNF), VGF (non-acronymic), bicaudal C homolog 1 (BICC1), C-reactive protein (CRP), and cortisol, which are generally recognized to be involved in different pathogenesis of various mental disorders, were measured in patients with MDD (n = 50), SZ (n = 50), BPD (n = 55), and PD along with 50 healthy controls (HC). Linear discriminant analysis (LDA) was employed to construct a multi-classification model to classify these mental disorders. Both leave-one-out cross-validation (LOOCV) and fivefold cross-validation were applied to validate the accuracy and stability of the LDA model. All five serum proteins were included in the LDA model, and it was found to display a high overall accuracy of 96.9% when classifying MDD, SZ, BPD, PD, and HC groups. Multi-classification accuracy of the LDA model for LOOCV and fivefold cross-validation (within-study replication) reached 96.9 and 96.5%, respectively, demonstrating the feasibility of the blood-based multi-protein LDA model for classifying common mental disorders in a mixed cohort. The results suggest that combining multiple proteins associated with different pathogeneses of mental disorders using LDA may be a novel and relatively objective method for classifying mental disorders. Clinicians should consider combining multiple serum proteins to diagnose mental disorders objectively.

Subjects

Major Depressive Disorder , Mental Disorders , Humans , Major Depressive Disorder/diagnosis , Reproducibility of Results , Mental Disorders/diagnosis , Blood Proteins , Machine Learning

16.

Application of machine learning model in predicting the likelihood of blood transfusion after hip fracture surgery.

Chen, Xiao; Pan, Junpeng; Li, Yi; Tang, Ruixin.

Aging Clin Exp Res ; 35(11): 2643-2656, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37733228

ABSTRACT

OBJECTIVE: Anemia is one of the common adverse reactions after hip fracture surgery. The traditional method to solve anemia is allogeneic transfusion. However, the transfusion may lead to some complications such as septicemia and fever. So far, few studies have reported roles of machine learning in predicting whether blood transfusion is needed or not after hip fracture surgery. Therefore, the purpose of this study is to develop machine learning models to predict the likelihood of postoperative blood transfusion in patients undergoing hip fracture surgery. METHODS: This study enrolled 1355 patients who underwent hip fracture surgery at the Affiliated Hospital of Qingdao University from January 2016 to December 2021. Among all patients, 210 cases received postoperative blood transfusion. All patients were randomly divided into a training group and a testing group at a ratio of 7:3. In the training group, univariate and multivariate logistic regression analyses were used to determine independent risk factors for the postoperative transfusion. Then, based on these independent risk factors, tenfold cross-validation method was utilized to develop five machine learning models, including logistic, multilayer perceptron (MLP), extreme gradient boosting (XGBoost), random forest (RF), and support vector machine (SVM). The receiver operating characteristic (ROC) curve, area under ROC curve (AUC), and Matthews correlation coefficient (MCC) were generated to evaluate the performance of the models. Calibration plot and decision curve analysis (DCA) were used to test the performance, stability, and clinical applicability of the models. The models were validated using the testing group; and the ROC curve, MCC, calibration plot, and DCA curves were also generated to validate the performance, stability, and clinical applicability of the models. 
To further verify the robustness of the model, we randomly grabbed 70% of the samples in the testing set, performed 1000 iterations, and calculated the AUC and confidence interval of the five models. Finally, we used SHapley Additive exPlanations (SHAP) to explain these models. RESULTS: Multivariate logistic regression analysis showed that there were 8 independent risk factors, including age, blood transfusion history, albumin (ALB), globulin (GLO), total bilirubin (TBIL), indirect bilirubin (IBIL), hemoglobin (HB), and blood loss > 200 ml. We finally selected five independent risk factors including HB, GLO, age, IBIL, and blood loss > 200 ml. Based on these five independent risk factors, we generated six characteristic variables, namely HB, HB × HB, HB × blood loss, GLO × HB, age, age × IBIL, and established five machine learning models using a tenfold cross-validation method. In the training group, the AUC values of logistic, RF, MLP, SVM, and XGB were 0.9320, 0.8911, 0.9327, 0.9225, and 0.8825, respectively, and the average AUC was 0.9122 ± 0.0212. The MCC values were 0.65, 0.77, 0.65, 0.66, and 0.68, respectively, and the calibration plot and DCA performed well. In the testing group the AUC values of logistic, RF, MLP, SVM, and XGB were 0.8483, 0.7978, 0.8576, 0.8598, and 0.8216, respectively. The average AUC was 0.8370 ± 0.0238, and the MCC values were 0.41, 0.35, 0.40, 0.41, and 0.41, respectively. The calibration plot and DCA in the testing group also showed good performance. The AUC values and confidence intervals of the 1000-iteration model were: logistic (AUC, min confidence interval [CI]-max confidence interval [CI] 0.848, 0.804-0.903), RF (AUC, minCI-maxCI 0.797, 0.734-0.857), MLP (AUC, minCI-maxCI 0.858, 0.812-0.902), SVM (AUC, minCI-maxCI 0.859, 0.819-0.910), and XGB (AUC, minCI-maxCI 0.821, 0.764-0.894). The model performed well. 
Finally, according to SHAP, among all five models, HB played the most important role in model prediction and interpretation. CONCLUSION: The five models we developed all performed well in predicting the likelihood of blood transfusion after hip fracture surgery. Therefore, we believed that the prediction model based on machine learning had great application prospects in clinical practice, which could help clinicians better predict the risk of blood transfusion after hip fracture surgery.
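
The robustness check described in the abstract above (repeatedly drawing 70% of the testing samples and recomputing the AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the toy labels and scores, sampling without replacement, and the percentile-based interval are all hypothetical choices, since the abstract does not specify them.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subsample_auc_interval(labels, scores, fraction=0.7, iterations=1000, seed=0):
    """Recompute the AUC on random subsets; return (mean, lower, upper).

    Assumption: subsets are drawn without replacement, and the interval is
    taken from the 2.5th and 97.5th percentiles of the resampled AUC values.
    """
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    k = max(2, int(round(fraction * n)))
    values = []
    while len(values) < iterations:
        idx = rng.sample(range(n), k)
        sub_labels = [labels[i] for i in idx]
        if len(set(sub_labels)) < 2:
            continue  # skip draws that contain only one class
        values.append(auc(sub_labels, [scores[i] for i in idx]))
    values.sort()
    lower = values[int(0.025 * (len(values) - 1))]
    upper = values[int(0.975 * (len(values) - 1))]
    return sum(values) / len(values), lower, upper


# Toy data (hypothetical): 1 = transfused, 0 = not transfused.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.5, 0.9, 0.3, 0.6]
mean_auc, ci_low, ci_high = subsample_auc_interval(toy_labels, toy_scores)
print(f"AUC on all samples: {auc(toy_labels, toy_scores):.3f}")
print(f"Subsampled AUC: {mean_auc:.3f} ({ci_low:.3f}-{ci_high:.3f})")
```

A fixed seed keeps the interval reproducible; with real data the same routine would be run once per model on that model's predicted probabilities.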

Subjects

Anemia , Hip Fractures , Humans , Bilirubin , Blood Transfusion , Machine Learning

17.

Extracting the latent needs of dementia patients and caregivers from transcribed interviews in Japanese: an initial assessment of the availability of morpheme selection as input data with Z-scores in machine learning.

Tanemura, Nanae; Sasaki, Tsuyoshi; Miyamoto, Ryotaro; Watanabe, Jin; Araki, Michihiro; Sato, Junko; Chiba, Tsuyoshi.

BMC Med Inform Decis Mak ; 23(1): 203, 2023 10 05.

Article in English | MEDLINE | ID: mdl-37798639

ABSTRACT

BACKGROUND: Given the increasing number of dementia patients worldwide, a new method was developed for machine learning models to identify the 'latent needs' of patients and caregivers to facilitate patient/public involvement in societal decision making. METHODS: Japanese transcribed interviews with 53 dementia patients and caregivers were used. A new morpheme selection method using Z-scores was developed to identify trends in describing the latent needs. F-measures with and without the new method were compared using three machine learning models. RESULTS: The F-measures with the new method were higher for the support vector machine (SVM) (F-measure of 0.81 with the new method and F-measure of 0.79 without the new method for patients) and Naive Bayes (F-measure of 0.69 with the new method and F-measure of 0.67 without the new method for caregivers and F-measure of 0.75 with the new method and F-measure of 0.73 without the new method for patients). CONCLUSION: A new scheme based on Z-score adaptation for machine learning models was developed to predict the latent needs of dementia patients and their caregivers by extracting data from interviews in Japanese. However, this study alone cannot be used to assign significance to the adaptation of the new method because the sample dataset was not large enough. Such pre-selection with Z-score adaptation from text data in machine learning models should be examined further with refined, better-suited methods in the near future.

Subjects

Caregivers , Dementia , Needs Assessment , Humans , Bayes Theorem , East Asian People , Machine Learning , Health Services Needs and Demand

18.

Impacts of meteorological variables and machine learning algorithms on rice yield prediction in Korea.

Ha, Subin; Kim, Yong-Tak; Im, Eun-Soon; Hur, Jina; Jo, Sera; Kim, Yong-Seok; Shim, Kyo-Moon.

Int J Biometeorol ; 67(11): 1825-1838, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37667047

ABSTRACT

As crop productivity is greatly influenced by weather conditions, many attempts have been made to estimate crop yields using meteorological data and have achieved great progress with the development of machine learning. However, most yield prediction models are developed based on observational data, and the utilization of climate model output in yield prediction has been addressed in very few studies. In this study, we estimate rice yields in South Korea using the meteorological variables provided by ERA5 reanalysis data (ERA-O) and its dynamically downscaled data (ERA-DS). After ERA-O and ERA-DS are validated against observations (OBS), two different machine learning models, Support Vector Machine (SVM) and Long Short-Term Memory (LSTM), are trained with different combinations of eight meteorological variables (mean temperature, maximum temperature, minimum temperature, precipitation, diurnal temperature range, solar irradiance, mean wind speed, and relative humidity) obtained from OBS, ERA-O, and ERA-DS at weekly and monthly timescales from May to September. Regardless of the model type and the source of the input data, training a model with weekly datasets leads to better yield estimates compared to monthly datasets. LSTM generally outperforms SVM, especially when the model is trained with ERA-DS data at a weekly timescale. The best yield estimates are produced by the LSTM model trained with all eight variables at a weekly timescale. Altogether this study shows the significance of high spatial and temporal resolution of input meteorological data in yield prediction, which can also serve to substantiate the added value of dynamical downscaling.

19.

Internal and External Validation of the Generalizability of Machine Learning Algorithms in Predicting Non-home Discharge Disposition Following Primary Total Knee Joint Arthroplasty.

Chen, Tony Lin-Wei; Buddhiraju, Anirudh; Seo, Henry Hojoon; Subih, Murad Abdullah; Tuchinda, Pete; Kwon, Young-Min.

J Arthroplasty ; 38(10): 1973-1981, 2023 10.

Article in English | MEDLINE | ID: mdl-36764409

ABSTRACT

BACKGROUND: Nonhome discharge disposition following primary total knee arthroplasty (TKA) is associated with a higher rate of complications and constitutes a socioeconomic burden on the health care system. While existing algorithms predicting nonhome discharge disposition varied in degrees of mathematical complexity and prediction power, their capacity to generalize predictions beyond the development dataset remains limited. Therefore, this study aimed to establish the machine learning model generalizability by performing internal and external validations using nation-scale and institutional cohorts, respectively. METHODS: Four machine learning models were trained using the national cohort. Recursive feature elimination and hyper-parameter tuning were applied. Internal validation was achieved through five-fold cross-validation during model training. The trained models' performance was externally validated using the institutional cohort and assessed by discrimination, calibration, and clinical utility. RESULTS: The national (424,354 patients) and institutional (10,196 patients) cohorts had non-home discharge rates of 19.4 and 36.4%, respectively. The areas under the receiver operating curve of the model predictions were 0.83 to 0.84 during internal validation and increased to 0.88 to 0.89 during external validation. Artificial neural network and histogram-based gradient boosting elicited the best performance with a mean area under the receiver operating curve of 0.89, calibration slope of 1.39, and Brier score of 0.14, which indicated that the two models were robust in distinguishing non-home discharge and well-calibrated with accurate predictions of the probabilities. The low inter-dataset similarity indicated reliable external validation. Length of stay, age, body mass index, and sex were the strongest predictors of discharge destination after primary TKA. 
CONCLUSION: The machine learning models demonstrated excellent predictive performance during both internal and external validations, supporting their generalizability across different patient cohorts and potential applicability in the clinical workflow.

Subjects

Knee Replacement Arthroplasty , Humans , Patient Discharge , Algorithms , Machine Learning , Knee Joint , Retrospective Studies

20.

Machine Learning Models Based on a National-Scale Cohort Identify Patients at High Risk for Prolonged Lengths of Stay Following Primary Total Hip Arthroplasty.

Chen, Tony Lin-Wei; Buddhiraju, Anirudh; Costales, Timothy G; Subih, Murad Abdullah; Seo, Henry Hojoon; Kwon, Young-Min.

J Arthroplasty ; 38(10): 1967-1972, 2023 10.

Article in English | MEDLINE | ID: mdl-37315634

ABSTRACT

BACKGROUND: Existing machine learning models that predicted prolonged lengths of stay (LOS) following primary total hip arthroplasty (THA) were limited by the small training volume and exclusion of important patient factors. This study aimed to develop machine learning models using a national-scale data set and examine their performance in predicting prolonged LOS following THA. METHODS: A total of 246,265 THAs were analyzed from a large database. Prolonged LOS was defined as exceeding the 75th percentile of all LOSs in the cohort. Candidate predictors of prolonged LOS were selected by recursive feature elimination and used to construct four machine learning models: artificial neural network, random forest, histogram-based gradient boosting, and k-nearest neighbor. The model performance was assessed by discrimination, calibration, and utility. RESULTS: All models exhibited excellent performance in discrimination (area under the receiver operating characteristic curve [AUC] = 0.72 to 0.74) and calibration (slope: 0.83 to 1.18, intercept: -0.01 to 0.11, Brier score: 0.185 to 0.192) during both training and testing sessions. The artificial neural network was the best performer with an AUC of 0.73, calibration slope of 0.99, calibration intercept of -0.01, and Brier score of 0.185. All models showed great utility by producing higher net benefits than the default treatment strategies in the decision curve analyses. Age, laboratory tests, and surgical variables were the strongest predictors of prolonged LOS. CONCLUSION: The excellent prediction performance of machine learning models demonstrated their capacity to identify patients prone to prolonged LOS. Many factors contributing to prolonged LOS can be optimized to minimize hospital stay for high-risk patients.
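
Two quantities named in the abstract above, the outcome definition (LOS exceeding the cohort's 75th percentile) and the Brier score, can be illustrated with a short sketch. It is not the authors' pipeline: the function names, the toy lengths of stay, and the "inclusive" quantile method are assumptions, because the abstract does not say how the percentile was computed.

```python
import statistics


def prolonged_los_labels(los_days):
    """Return (q3, labels): labels are 1 where LOS exceeds the 75th percentile."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q3 = statistics.quantiles(los_days, n=4, method="inclusive")[2]
    return q3, [1 if d > q3 else 0 for d in los_days]


def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)


# Toy cohort (hypothetical lengths of stay, in days).
toy_los = [3, 1, 2, 5, 3, 10, 2, 4, 7, 3]
threshold, toy_labels = prolonged_los_labels(toy_los)
print(f"75th percentile: {threshold} days; prolonged stays: {sum(toy_labels)}")
```

A lower Brier score means predicted probabilities sit closer to the observed outcomes; a model that always predicts 0.5 scores 0.25.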

Subjects

Hip Replacement Arthroplasty , Humans , Machine Learning , Neural Networks (Computer) , Patients , ROC Curve
