ABSTRACT
The clinical manifestation of Parkinson's disease exhibits significant heterogeneity in the prevalence of non-motor symptoms and the rate of progression of motor symptoms, suggesting that Parkinson's disease can be classified into distinct subtypes. In this study, we aimed to explore this heterogeneity by identifying a set of subtypes with distinct patterns of spatiotemporal trajectories of neurodegeneration. We applied Subtype and Stage Inference (SuStaIn), an unsupervised machine learning algorithm that combines disease progression modelling with clustering methods, to cortical and subcortical neurodegeneration visible on 3 T structural MRI of a large cross-sectional sample of 504 patients and 279 healthy controls. Serial longitudinal data were available for a subset of 178 patients at the 2-year follow-up and for 140 patients at the 4-year follow-up. In a subset of 210 patients, concomitant Alzheimer's disease pathology was assessed by evaluating amyloid-β concentrations in the CSF or via the amyloid-specific radiotracer 18F-flutemetamol with PET. The SuStaIn analysis revealed three distinct subtypes, each characterized by a unique pattern of spatiotemporal evolution of brain atrophy: neocortical, limbic and brainstem. In the neocortical subtype, volume loss began in the frontal and parietal cortices at the earliest disease stage and spread across the entire neocortex early in the disease course, with relative sparing of the striatum, pallidum, accumbens area and brainstem. The limbic subtype showed a distinct pattern of regional vulnerability, characterized by early volume loss in the amygdala, accumbens area, striatum and temporal cortex, subsequently spreading to the parietal and frontal cortices across disease stages. The brainstem subtype showed gradual rostral progression from the brainstem extending to the amygdala and hippocampus, followed by the temporal and other cortices.
Longitudinal MRI data confirmed that 77.8% of participants at the 2-year follow-up and 84.0% at the 4-year follow-up were assigned to subtypes consistent with estimates from the cross-sectional data. This three-subtype model aligned with empirically proposed subtypes based on age at onset: the neocortical subtype demonstrated characteristics similar to those of the old-onset phenotype, including older age at onset and symptoms of cognitive decline (P < 0.05). Moreover, the subtypes corresponded to the three categories of the neuropathological consensus criteria for symptomatic patients with Lewy pathology, which propose neocortex-, limbic- and brainstem-predominant patterns as different subgroups of α-synuclein distribution. The prevalence of biomarker evidence of amyloid-β pathology was comparable among the subtypes. Upon validation, the subtype model might be applied to individual cases, potentially serving as a biomarker to track disease progression and predict temporal evolution.
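The reported longitudinal stability (77.8% at 2 years, 84.0% at 4 years) is simply the agreement rate between baseline and follow-up subtype assignments. A minimal sketch with invented labels (0 = neocortical, 1 = limbic, 2 = brainstem; the data below are hypothetical, not the study's):

```python
import numpy as np

def subtype_consistency(baseline, followup):
    """Fraction of participants whose follow-up subtype assignment
    matches the baseline (cross-sectional) assignment."""
    baseline = np.asarray(baseline)
    followup = np.asarray(followup)
    return float(np.mean(baseline == followup))

# Hypothetical labels: 0 = neocortical, 1 = limbic, 2 = brainstem
base = np.array([0, 1, 2, 1, 0, 2, 1, 0, 2, 1])
fu2y = np.array([0, 1, 2, 1, 0, 2, 0, 0, 2, 1])  # one participant switched
print(subtype_consistency(base, fu2y))  # → 0.9
```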
Subject(s)
Atrophy , Brain , Disease Progression , Magnetic Resonance Imaging , Parkinson Disease , Humans , Parkinson Disease/pathology , Parkinson Disease/diagnostic imaging , Male , Female , Atrophy/pathology , Aged , Middle Aged , Brain/pathology , Brain/diagnostic imaging , Cross-Sectional Studies , Longitudinal Studies , Positron-Emission Tomography , Alzheimer Disease/pathology , Alzheimer Disease/diagnostic imaging
ABSTRACT
Foodborne illnesses, particularly those caused by Salmonella enterica with its extensive array of over 2600 serovars, present a significant public health challenge. Prompt and precise identification of S. enterica serovars is therefore clinically essential, as it facilitates the understanding of S. enterica transmission routes and the determination of outbreak sources. Classical serotyping methods based on molecular subtyping and genomic markers currently suffer from various limitations, such as labour intensiveness and time consumption, so there is a pressing need to develop new diagnostic techniques. Surface-enhanced Raman spectroscopy (SERS) is a non-invasive diagnostic technique that generates Raman spectra from which rapid and accurate discrimination of bacterial pathogens can be achieved. To generate SERS spectra, a Raman spectrometer is needed to detect and collect signals; spectrometers fall into two types, the expensive benchtop instrument and the inexpensive handheld instrument. In this study, we compared the performance of two Raman spectrometers in discriminating four closely related S. enterica serovars, that is, S. enterica subsp. enterica serovars Dublin, Enteritidis, Typhi and Typhimurium. Six machine learning algorithms were applied to analyse the SERS spectra. The support vector machine (SVM) model showed the highest accuracy for both the handheld (99.97%) and benchtop (99.38%) Raman spectrometers. This study demonstrated that handheld Raman spectrometers achieve prediction accuracy similar to that of benchtop spectrometers when combined with machine learning models, providing an effective solution for rapid, accurate and cost-effective identification of closely related S. enterica serovars.
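As an illustrative sketch only (synthetic "spectra", not the study's SERS data or its exact model settings), an SVM classifier for four spectral classes can be assembled with scikit-learn; every parameter here is an assumption:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_class, n_wavenumbers = 50, 200
X, y = [], []
for label in range(4):  # four simulated serovar classes
    centre = rng.normal(size=n_wavenumbers)          # class-mean "spectrum"
    X.append(centre + 0.3 * rng.normal(size=(n_per_class, n_wavenumbers)))
    y.append(np.full(n_per_class, label))
X, y = np.vstack(X), np.concatenate(y)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
print(f"test accuracy: {accuracy:.2f}")
```

Scaling before the RBF kernel matters because SVMs are sensitive to feature magnitudes, which vary widely across Raman wavenumbers.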
Subject(s)
Salmonella enterica , Serogroup , Spectrum Analysis, Raman , Support Vector Machine , Spectrum Analysis, Raman/methods , Salmonella enterica/isolation & purification , Humans , Algorithms
ABSTRACT
A multimodal sensor array combining pressure and proximity sensing has attracted considerable interest due to its importance in the ubiquitous monitoring of cardiopulmonary health- and sleep-related biometrics. However, the sensitivity and dynamic range of prevalent sensors are often insufficient to detect subtle body signals. This study introduces a novel capacitive nanocomposite proximity-pressure sensor (NPPS) for detecting multiple human biometrics. The NPPS consists of a carbon nanotube-paper composite (CPC) electrode and a percolating multiwalled carbon nanotube (MWCNT) foam enclosed in an MWCNT-coated auxetic frame. The fractured fibers in the CPC electrode intensify the electric field, enabling highly sensitive detection of proximity and pressure. When pressure is applied to the sensor, the synergistic effect of the MWCNT foam and auxetic deformation amplifies the sensitivity. The simple and mass-producible fabrication protocol allows an array of highly sensitive sensors to be built to monitor human presence, sleep posture, and vital signs, including ballistocardiography (BCG). With the aid of a machine learning algorithm, the sensor array accurately estimates blood pressure (BP) without intervention. This advancement holds promise for unrestricted vital sign monitoring during sleep or driving.
ABSTRACT
Infection after hematopoietic stem cell transplantation (HSCT) is one of the main causes of patient mortality, and fever is the most important clinical symptom indicating infection. However, current microbial detection methods are limited; timely diagnosis of infectious fever and administration of antimicrobial drugs can effectively reduce patient mortality. In this study, serum samples were collected from 181 patients with HSCT with or without infection, together with their clinical information. More than 80 infection-related serum microRNAs were selected on the basis of bulk RNA-seq results and measured by qPCR in 345 serum samples collected at defined time points. Unsupervised clustering indicated a close association between the expression of these microRNAs and the occurrence of infection. Compared with the uninfected cohort, more than 10 serum microRNAs were identified and combined into a single diagnostic formula constructed with the random forest (RF) algorithm, achieving a diagnostic accuracy above 0.90. Furthermore, the correlations of serum microRNAs with immune cells, inflammatory factors, pathogens, infected tissues, and prognosis were analyzed in the infection cohort. Overall, this study demonstrates that the combination of serum microRNA detection and machine learning algorithms holds promising potential for diagnosing infectious fever after HSCT.
Subject(s)
Fever , Hematopoietic Stem Cell Transplantation , Machine Learning , Humans , Hematopoietic Stem Cell Transplantation/adverse effects , Female , Male , Adult , Middle Aged , Fever/etiology , Fever/diagnosis , Fever/blood , Algorithms , MicroRNAs/blood , Biomarkers/blood , Adolescent , Young Adult
ABSTRACT
BACKGROUND: Microsatellite instability (MSI) caused by DNA mismatch repair (MMR) deficiency is of great significance in the occurrence, diagnosis and treatment of colorectal cancer (CRC). AIM: This study aimed to analyze the relationship between mismatch repair status and the clinical characteristics of CRC. METHODS: The histopathological results and clinical characteristics of 2029 patients with CRC who underwent surgery at two centers from 2018 to 2020 were reviewed. After screening the importance of clinical characteristics through machine learning algorithms, the patients were divided into deficient mismatch repair (dMMR) and proficient mismatch repair (pMMR) groups based on the immunohistochemistry results, and the clinical features of the two groups were compared with statistical methods. RESULTS: The dMMR and pMMR groups differed significantly in histologic type, TNM stage, maximum tumor diameter, lymph node metastasis, differentiation grade, gross appearance, and vascular invasion. The MLH1 groups differed significantly in age, histologic type, TNM stage, lymph node metastasis, tumor location, and depth of invasion. The MSH2 groups differed significantly in age. The MSH6 groups differed significantly in age, histologic type, and TNM stage. The PMS2 groups differed significantly in lymph node metastasis and tumor location. Combined loss of MLH1 and PMS2 expression was the dominant pattern in dMMR CRC (41.77%). There were positive correlations between MLH1 and MSH2 expression and between MSH6 and PMS2 expression as well. CONCLUSIONS: The proportions of mucinous adenocarcinoma, protruding type, and poor differentiation are relatively high in dMMR CRCs, but lymph node metastasis is rare. Notably, the expression of MMR proteins has different prognostic significance at different stages of CRC.
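Group comparisons of categorical clinical features (e.g. histologic type in dMMR vs. pMMR) are typically chi-square tests on contingency tables. A sketch with invented counts (not the study's data), just to show the mechanics:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = dMMR / pMMR, columns = mucinous / non-mucinous
table = np.array([[40, 160],
                  [60, 1769]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}, dof = {dof}")
```

A p-value below 0.05 here would indicate that the histologic-type distribution differs between the MMR groups.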
Subject(s)
Colorectal Neoplasms , DNA Mismatch Repair , Humans , Colorectal Neoplasms/pathology , Colorectal Neoplasms/genetics , Male , Female , Retrospective Studies , Middle Aged , Aged , Neoplasm Staging , Microsatellite Instability , Lymphatic Metastasis , Adult
ABSTRACT
Non-targeted analysis (NTA) has great potential for screening emerging contaminants in the environment, and some studies have conducted in-depth investigations of environmental samples. Here, we used an NTA workflow to identify emerging contaminants in used tire particle (TP) leachates, followed by quantitative prediction and toxicity assessment based on hazard scores. Tire particles were obtained from four different types of automobiles, representing the most common tires in daily transportation. Instrumental analysis of the TP leachates extracted a total of 244 positive and 104 negative molecular features from the mass data. After filtering against a specialized list of emerging contaminants and matching against spectral databases, a total of 51 molecular features were tentatively identified as contaminants, including benzothiazole, hexaethylene glycol and 2-hydroxybenzaldehyde. Because these contaminants have different responses in mass spectrometry, models predicting contaminant response were constructed with machine learning algorithms, in this case random forest and artificial neural networks. After five-fold cross-validation, the random forest model had the better prediction performance (MAECV = 0.12, Q2 = 0.90) and was therefore chosen to predict the contaminant concentrations. The prediction results showed that the contaminant at the highest concentration was benzothiazole, at 4,875 µg/L in the winter tire sample. In addition, a joint toxicity assessment of the four types of tires was conducted. Hazard scores increasing by factors of 10 were assigned to the different hazard levels, and the hazard scores of all contaminants identified in each TP leachate were summed to obtain a total hazard score. All four tires were calculated to have relatively high risks, with winter tires having the highest total hazard score of 40,751.
This study extended the application of NTA research and pointed the way toward subsequent targeted studies of highly concentrated and toxic contaminants.
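The Q2 metric above is the cross-validated coefficient of determination. A minimal sketch of a random-forest regressor under 5-fold cross-validation, on synthetic descriptor data rather than the study's mass-spectral responses:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Simulated data: responses predicted from 8 hypothetical molecular descriptors
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)

rf = RandomForestRegressor(n_estimators=300, random_state=0)
q2_scores = cross_val_score(rf, X, y, cv=5, scoring="r2")  # one R2 per fold
print(f"mean cross-validated R2 (Q2-style): {q2_scores.mean():.2f}")
```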
Subject(s)
Automobiles , Rubber , Rubber/chemistry , Rubber/toxicity , Transportation , Benzothiazoles/toxicity
ABSTRACT
BACKGROUND: Adverse birth outcomes, including preterm birth, low birth weight, and stillbirth, remain a major global health challenge, particularly in developing regions. Understanding the possible risk factors is crucial for designing effective interventions to improve birth outcomes. Accordingly, this study aimed to develop a predictive model for adverse birth outcomes among childbearing women in Sub-Saharan Africa using advanced machine learning techniques. Additionally, this study aimed to employ novel data science interpretability techniques to identify the key risk factors and quantify the impact of each feature on the model prediction. METHODS: The study population involved women of childbearing age from 26 Sub-Saharan African countries who had given birth within five years before the data collection, totaling 139,659 participants. Our data source was a recent Demographic and Health Survey (DHS). We utilized various data balancing techniques. Ten advanced machine learning algorithms were employed, with the dataset split into 80% training and 20% testing sets. Model evaluation was conducted using various performance metrics, along with hyperparameter optimization. Association rule mining and SHAP analysis were employed to enhance model interpretability. RESULTS: Based on our findings, about 28.59% (95% CI: 28.36, 28.83) of childbearing women in Sub-Saharan Africa experienced adverse birth outcomes. After repeated experimentation and evaluation, the random forest model emerged as the top-performing machine learning algorithm, with an AUC of 0.95 and an accuracy of 88.0%. The key risk factors identified were home deliveries, lack of prenatal iron supplementation, fewer than four antenatal care (ANC) visits, short and long delivery intervals, unwanted pregnancy, primiparous mothers, and geographic location in the West African region. CONCLUSION: The region continues to face persistent adverse birth outcomes, emphasizing the urgent need for increased attention and action.
Encouragingly, advanced machine learning methods, particularly the random forest algorithm, have uncovered crucial insights that can guide targeted actions. Specifically, the analysis identifies risky groups, including first-time mothers, women with short or long birth intervals, and those with unwanted pregnancies. To address the needs of these high-risk women, the researchers recommend immediately providing iron supplements, scheduling comprehensive prenatal care, and strongly encouraging facility-based deliveries or skilled birth attendance.
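The pipeline described (imbalance handling, random forest, interpretability) can be caricatured with scikit-learn alone. In this sketch, class weighting stands in for the study's data-balancing techniques and permutation importance stands in for SHAP; both substitutions, and the data, are ours:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced toy cohort: ~29% adverse outcomes, echoing the study's prevalence
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           weights=[0.71, 0.29], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

rf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                            random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

# Permutation importance as a model-agnostic stand-in for SHAP rankings
imp = permutation_importance(rf, X_te, y_te, n_repeats=5, random_state=0)
top = np.argsort(imp.importances_mean)[::-1][:3]
print(f"AUC: {auc:.2f}; top feature indices: {top.tolist()}")
```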
Subject(s)
Machine Learning , Pregnancy Outcome , Humans , Female , Pregnancy , Africa South of the Sahara/epidemiology , Adult , Young Adult , Pregnancy Outcome/epidemiology , Premature Birth/epidemiology , Risk Factors , Adolescent , Infant, Newborn , Stillbirth/epidemiology , Infant, Low Birth Weight
ABSTRACT
INTRODUCTION: The aim of this study was to compare various machine learning algorithms for constructing a diabetic retinopathy (DR) prediction model among type 2 diabetes mellitus (DM) patients and to develop a nomogram based on the best model. METHODS: This cross-sectional study included DM patients receiving routine DR screening. Patients were randomly divided into training (244) and validation (105) sets. Least absolute shrinkage and selection operator regression was used for the selection of clinical characteristics. Six machine learning algorithms were compared: decision tree (DT), k-nearest neighbours (KNN), logistic regression model (LM), random forest (RF), support vector machine (SVM), and XGBoost (XGB). Model performance was assessed via receiver-operating characteristic (ROC), calibration, and decision curve analyses (DCAs). A nomogram was then developed on the basis of the best model. RESULTS: Compared with the five other machine learning algorithms (DT, KNN, RF, SVM, and XGB), the LM demonstrated the highest area under the ROC curve (AUC, 0.894) and recall (0.92) in the validation set. Additionally, the calibration curves and DCA results were relatively favourable. Disease duration, DPN, insulin dosage, urinary protein, and ALB were included in the LM. The nomogram exhibited robust discrimination (AUC: 0.856 in the training set and 0.868 in the validation set), calibration, and clinical applicability across the two datasets after 1,000 bootstraps. CONCLUSION: Among the six different machine learning algorithms, the LM algorithm demonstrated the best performance. A logistic regression-based nomogram for predicting DR in type 2 DM patients was established. This nomogram may serve as a valuable tool for DR detection, facilitating timely treatment.
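The LASSO-then-logistic-model workflow can be sketched as L1-penalised selection followed by a plain logistic refit on the surviving features. Everything below (data, penalty strength, split sizes) is an illustrative assumption, not the study's configuration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data: 349 patients, 20 candidate clinical characteristics
X, y = make_classification(n_samples=349, n_features=20,
                           n_informative=5, random_state=0)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=244, random_state=0, stratify=y)

# LASSO-style step: L1 penalty zeroes out weak predictors
lasso = LogisticRegression(penalty="l1", solver="liblinear",
                           C=0.5).fit(X_tr, y_tr)
selected = np.flatnonzero(lasso.coef_[0])

# Final logistic model refit on the selected characteristics only
lm = LogisticRegression(max_iter=1000).fit(X_tr[:, selected], y_tr)
auc = roc_auc_score(y_te, lm.predict_proba(X_te[:, selected])[:, 1])
print(f"{selected.size} features kept; validation AUC: {auc:.2f}")
```

The refit matters because L1 shrinkage biases coefficients toward zero; the unpenalised refit gives the coefficients a nomogram would be drawn from.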
Subject(s)
Algorithms , Diabetes Mellitus, Type 2 , Diabetic Retinopathy , Machine Learning , Nomograms , ROC Curve , Humans , Diabetes Mellitus, Type 2/complications , Diabetes Mellitus, Type 2/diagnosis , Diabetic Retinopathy/diagnosis , Male , Cross-Sectional Studies , Female , Middle Aged , Aged
ABSTRACT
PURPOSE: The aim of this study was to develop and train a machine learning (ML) algorithm to create a clinical decision support tool (i.e., an ML-driven probability calculator) to be used in clinical practice to estimate recurrence rates following arthroscopic Bankart repair (ABR). METHODS: Data from 14 previously published studies were collected. Inclusion criteria were (1) patients treated with ABR without remplissage for traumatic anterior shoulder instability and (2) a minimum of 2 years of follow-up. Risk factors associated with recurrence were identified using bivariate logistic regression analysis. Subsequently, four ML algorithms were developed and internally validated. Predictive performance was assessed using discrimination, calibration and the Brier score. RESULTS: In total, 5591 patients underwent ABR, with a recurrence rate of 15.4% (n = 862). Age <35 years, participation in contact and collision sports, bony Bankart lesions and full-thickness rotator cuff tears increased the risk of recurrence (all p < 0.05). A single shoulder dislocation (compared with multiple dislocations) lowered the risk of recurrence (p < 0.05). Because certain variables were unavailable for some patients, a portion of the patient data had to be excluded before pooling the data set to create the algorithm; a total of 797 patients with complete information on the risk factors associated with recurrence were included. The discrimination (area under the receiver operating characteristic curve) ranged between 0.54 and 0.57 for prediction of recurrence. CONCLUSION: ML was not able to predict recurrence following ABR with the currently available predictors. Despite a globally coordinated effort, the heterogeneity of clinical data limited the predictive capabilities of the algorithm, emphasizing the need for standardized data collection methods in future studies. LEVEL OF EVIDENCE: Level IV, retrospective cohort study.
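The two headline metrics used here, discrimination (AUROC) and the Brier score, are cheap to compute from predicted probabilities. A sketch with hypothetical predictions (not the study's model output):

```python
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

# Hypothetical predicted recurrence probabilities and observed outcomes
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])
y_prob = np.array([0.10, 0.20, 0.35, 0.15, 0.60, 0.25, 0.30, 0.55, 0.05, 0.40])

auroc = roc_auc_score(y_true, y_prob)      # discrimination (0.5 = chance)
brier = brier_score_loss(y_true, y_prob)   # mean squared error of probabilities
print(f"AUROC = {auroc:.3f}, Brier = {brier:.3f}")
```

An AUROC near 0.55, as reported, means the model barely ranks recurrences above non-recurrences, whatever its Brier score.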
ABSTRACT
Electroencephalography (EEG) is a non-invasive method used to track human brain activity over time. The EEG response time-locked to an external event is known as an event-related potential (ERP). ERPs can serve as biomarkers of human perception and other cognitive processes. The success of ERP research depends on the laboratory conditions and the attentiveness of the test subjects; in particular, the inability to control experimental variables has limited ERP research in the real world. This study collected EEG data under various experimental circumstances within an auditory oddball paradigm to enable the use of ERPs as an active biomarker under normal laboratory conditions. ERP epochs were then analyzed to identify unfocused epochs affected by typical artifacts and external distortion. For the initial comparison, the ability of four unsupervised machine learning algorithms (MLAs) to identify unfocused epochs was evaluated, and their accuracy was compared with human inspection and a current EEG analysis tool (EEGLab). All four MLAs were typically 95-100% accurate. In summary, our analysis finds that humans might miss subtle differences in regular ERP patterns, whereas MLAs can identify them efficiently. Thus, our analysis suggests that unsupervised MLAs outperform the other two standard methods for detecting unfocused ERP epochs.
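The abstract does not name the four MLAs, so as one plausible unsupervised choice, here is an isolation-forest sketch that flags artifact-dominated epochs in simulated data (all amplitudes and shapes invented):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# 95 "clean" simulated ERP epochs (64 samples each) plus 5 high-amplitude ones
clean = rng.normal(0.0, 1.0, size=(95, 64))
artifact = rng.normal(0.0, 8.0, size=(5, 64))   # artifact-contaminated epochs
epochs = np.vstack([clean, artifact])

iso = IsolationForest(contamination=0.05, random_state=0).fit(epochs)
flags = iso.predict(epochs)            # -1 marks epochs to discard
n_flagged = int(np.sum(flags == -1))
print(f"flagged {n_flagged} of {len(epochs)} epochs")
```

No labels are needed, which is the appeal of unsupervised rejection: the model learns what "typical" epochs look like and isolates the rest.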
Subject(s)
Algorithms , Electroencephalography , Evoked Potentials , Machine Learning , Humans , Electroencephalography/methods , Male , Female , Evoked Potentials/physiology , Adult , Brain/physiology , Signal Processing, Computer-Assisted , Young Adult
ABSTRACT
Phosphorus (P) pollution in aquatic environments poses significant environmental challenges, necessitating the development of effective remediation strategies; biochar has emerged as a promising adsorbent for P removal and has attracted extensive research worldwide. In this study, a machine learning approach was proposed to simulate and predict the performance of biochar in removing P from water. A dataset covering 190 types of biochar was compiled from the literature, encompassing variables including biochar characteristics, water quality parameters, and operating conditions. Subsequently, the random forest and CatBoost algorithms were fine-tuned to establish a predictive model for P adsorption capacity. The results demonstrated that the optimized CatBoost model exhibited high prediction accuracy, with an R2 value of 0.9573, and biochar dosage, initial P concentration in water, and C content in biochar were identified as the predominant factors. Furthermore, partial dependence analysis was employed to examine the impact of individual variables and the interactions between pairs of features, providing valuable insights for adsorbent design and operating condition optimization. This work presented a comprehensive framework for applying a machine learning approach to environmental issues and provided a valuable tool for advancing the design and implementation of biochar-based water treatment systems.
Subject(s)
Charcoal , Machine Learning , Phosphorus , Phosphorus/chemistry , Charcoal/chemistry , Adsorption , Water Purification/methods , Water Pollutants, Chemical/chemistry , Algorithms
ABSTRACT
This study investigates the impact of green finance (GF) and green innovation (GI) on corporate credit rating (CR) performance in Chinese A-share listed firms from 2018 to 2021. The least absolute shrinkage and selection operator (LASSO) machine learning algorithm is first used to select the critical drivers of corporate credit performance. Then, we applied partialing-out LASSO linear regression (POLR) and double-selection LASSO linear regression (DSLR) machine learning techniques to examine the impact of GF and GI on CR. The main results reveal that a 1% increase in GF diminishes CR by 0.26%, whereas GI promotes CR performance by 0.15%. Moreover, the heterogeneity analysis reveals a more significant negative effect of GF on the CR performance of heavily polluting firms, non-state-owned enterprises, and firms in the Western region. The findings suggest policies for managing green finance and encouraging green innovation, as well as for addressing firm heterogeneity to support sustainability.
Subject(s)
Machine Learning , Algorithms , China
ABSTRACT
INTRODUCTION: Knee osteoarthritis is a prevalent condition frequently necessitating knee replacement surgery, with demand projected to rise substantially. Partial knee arthroplasty (PKA) offers advantages over total knee arthroplasty (TKA), yet its utilisation remains low despite guidance recommending consideration alongside TKA in shared decision making. Radiographic decision aids exist but are underutilised due to clinician time constraints. MATERIALS AND METHODS: This research develops a novel radiographic artificial intelligence (AI) tool using a dataset of knee radiographs and a panel of expert orthopaedic surgeons' assessments. Six AI models were trained to identify PKA candidacy. RESULTS: 1241 labelled four-view radiograph series were included. Models achieved statistically significant accuracies above random assignment, with EfficientNet-ES demonstrating the highest performance (AUC 95%, F1 score 83% and accuracy 80%). CONCLUSIONS: The AI decision tool shows promise in identifying PKA candidates, potentially addressing underutilisation of this procedure. Its integration into clinical practice could enhance shared decision making and improve patient outcomes. Further validation and implementation studies are warranted to assess real-world utility and impact.
ABSTRACT
Land use and land cover (LULC) are critical factors that influence the environment and human societies. The dynamics of LULC have been constantly changing over the years, and these changes can be analyzed at different spatial and temporal scales to evaluate their impact on the natural environment. This study employs multitemporal satellite data to investigate the spatial and temporal transformations that occurred in Sidi Bel Abbes province, situated in the northwestern region of Algeria, spanning from the early 1990s to 2020. Notably, this province is marked by semi-arid and arid climates and hosts a wide range of areas susceptible to gravitational hazards, especially concerning alterations in land use and forest fires. The interactive supervised classification tool utilized multiple machine learning algorithms, including Random Forest, Support Vector Machine, Classification and Regression Tree, and Naïve Bayes, to produce land cover maps with six main classes: forest, shrub, agricultural, pasture, water, and built-up. The findings showed that the LULC in the research area is undergoing continuous change, particularly in the forest and agricultural lands. The forest area has decreased significantly from 10.80% in 1990 to 5.25% in 2020, mainly due to repeated fires. Agricultural land has also undergone fluctuations, with a decrease between 1990 and 2000, followed by a fast increase and near stabilization in 2020. At the same time, pasture lands and built-up areas grew steadily, increasing by 11% and 13%, respectively. This research highlights the significant impact of anthropogenic activities on LULC changes in the study area and can provide valuable insights for promoting sustainable land use policies.
Subject(s)
Anthropogenic Effects , Environmental Monitoring , Humans , Algeria , Bayes Theorem , Desert Climate , Conservation of Natural Resources
ABSTRACT
Frequent floods are a severe threat to the well-being of people the world over. This is particularly so in developing countries like India, where a tropical monsoon climate prevails. Recently, flood hazard susceptibility mapping has become a popular tool to mitigate the effects of this threat. The present study therefore utilized four distinctive machine learning algorithms, i.e., K-Nearest Neighbor, Decision Tree, Naive Bayes, and Random Forest, to estimate flood susceptibility zones in the Agartala Urban Watershed (AUW) of Tripura, India, which experiences debilitating floods during the monsoon season. A multicollinearity test was conducted to examine the collinearity of the chosen flood conditioning factors, and none of the factors were compromised by multicollinearity. Results showed that around three-fourths of the AUW area was classified as moderate to very high flood-prone zones, while over 20 percent fell between low and very low flood-prone zones. The models performed well, with ROC-AUC scores greater than 70 percent and MAE, MSE, and RMSE scores less than 30 percent. The DT and RF algorithms were suggested for places with similar physical characteristics based on their outstanding performance on the training datasets. The study provides valuable insights to policymakers, administrative authorities, and local stakeholders to cope with floods and enhance flood prevention measures as a climate change adaptation strategy in the AUW.
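A multicollinearity check of conditioning factors is usually a variance inflation factor (VIF) screen: regress each factor on the others and report 1/(1 - R²). A self-contained sketch on synthetic factors (the near-duplicate column is deliberate, to show what a multicollinearity hit looks like):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2) when the column
    is regressed on all the others. VIF > 10 is a common red flag."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))                    # independent factors
dup = base[:, [0]] + 0.05 * rng.normal(size=(500, 1))  # near-copy of factor 0
vifs = vif(np.hstack([base, dup]))
print(np.round(vifs, 1))   # factors 0 and 3 should show inflated VIFs
```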
Subject(s)
Environmental Monitoring , Floods , Humans , Bayes Theorem , Environmental Monitoring/methods , Algorithms , Machine Learning , India
ABSTRACT
Many countries curate national registries of liver transplant (LT) data. These registries are often used to generate predictive models; however, potential performance and transferability of these models remain unclear. We used data from 3 national registries and developed machine learning algorithm (MLA)-based models to predict 90-day post-LT mortality within and across countries. Predictive performance and external validity of each model were assessed. Prospectively collected data of adult patients (aged ≥18 years) who underwent primary LTs between January 2008 and December 2018 from the Canadian Organ Replacement Registry (Canada), National Health Service Blood and Transplantation (United Kingdom), and United Network for Organ Sharing (United States) were used to develop MLA models to predict 90-day post-LT mortality. Models were developed using each registry individually (based on variables inherent to the individual databases) and using all 3 registries combined (variables in common between the registries [harmonized]). The model performance was evaluated using area under the receiver operating characteristic (AUROC) curve. The number of patients included was as follows: Canada, n = 1214; the United Kingdom, n = 5287; and the United States, n = 59,558. The best performing MLA-based model was ridge regression across both individual registries and harmonized data sets. Model performance diminished from individualized to the harmonized registries, especially in Canada (individualized ridge: AUROC, 0.74; range, 0.73-0.74; harmonized: AUROC, 0.68; range, 0.50-0.73) and US (individualized ridge: AUROC, 0.71; range, 0.70-0.71; harmonized: AUROC, 0.66; range, 0.66-0.66) data sets. External model performance across countries was poor overall. MLA-based models yield a fair discriminatory potential when used within individual databases. However, the external validity of these models is poor when applied across countries. 
Standardization of registry-based variables could facilitate the added value of MLA-based models in informing decision making in future LTs.
Subject(s)
Liver Transplantation , Adult , Humans , Adolescent , State Medicine , Canada/epidemiology , Machine Learning , Registries , Retrospective Studies
ABSTRACT
BACKGROUND: Oxaliplatin-based chemotherapy is the first-line treatment for colorectal cancer (CRC). Long noncoding RNAs (lncRNAs) have been implicated in chemotherapy sensitivity. This study aimed to identify lncRNAs related to oxaliplatin sensitivity and to predict the prognosis of CRC patients who underwent oxaliplatin-based chemotherapy. METHODS: Data from the Genomics of Drug Sensitivity in Cancer (GDSC) were used to screen for lncRNAs related to oxaliplatin sensitivity. Four machine learning algorithms (LASSO, decision tree, random forest, and support vector machine) were applied to identify the key lncRNAs. A predictive model for oxaliplatin sensitivity and a prognostic model based on the key lncRNAs were established. Published datasets and cell experiments were used to verify the predictive value. RESULTS: A total of 805 tumor cell lines from GDSC were divided into oxaliplatin-sensitive (top 1/3) and oxaliplatin-resistant (bottom 1/3) groups based on their IC50 values; 113 lncRNAs that were differentially expressed between the two groups were selected and incorporated into the four machine learning algorithms, and seven key lncRNAs were identified. The predictive model exhibited good predictions for oxaliplatin sensitivity. The prognostic model exhibited high performance in patients with CRC who underwent oxaliplatin-based chemotherapies. Four lncRNAs, including C20orf197, UCA1, MIR17HG, and MIR22HG, displayed consistent responses to oxaliplatin treatment in the validation analysis. CONCLUSION: Certain lncRNAs were associated with oxaliplatin sensitivity and predicted the response to oxaliplatin treatment. The prognostic models established on the basis of the key lncRNAs could predict the prognosis of patients given oxaliplatin-based chemotherapy.
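The tertile split of cell lines by IC50 is a one-liner with quantiles. A sketch on simulated IC50 values (log-normal draws, not GDSC data; lower IC50 is taken as more sensitive):

```python
import numpy as np

rng = np.random.default_rng(0)
ic50 = rng.lognormal(mean=1.0, sigma=0.8, size=805)  # simulated oxaliplatin IC50s

lo, hi = np.quantile(ic50, [1 / 3, 2 / 3])
sensitive = ic50 <= lo    # bottom-third IC50 = most sensitive cell lines
resistant = ic50 >= hi    # top-third IC50 = resistant cell lines
print(int(sensitive.sum()), int(resistant.sum()))   # roughly 805/3 each
```

Differential expression between the two groups would then be tested lncRNA by lncRNA before feeding the candidates to the four algorithms.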
ABSTRACT
Water-soluble organic matter (WSOM) formed through aqueous processes contributes substantially to total atmospheric aerosol; however, the impact of water evaporation on particle concentrations is highly uncertain. Herein, we present a novel approach to predict the amount of evaporated organic mass induced by sample drying using multivariate polynomial regression and random forest (RF) machine learning models. The impact of particle drying on fine WSOM was monitored during three consecutive summers in Baltimore, MD (2015, 2016, and 2017). The amount of evaporated organic mass depended on relative humidity (RH), WSOM concentrations, isoprene concentrations, and NOx/isoprene ratios. Models of each class were fitted (trained and tested) to data from the summers of 2015 and 2016, while model validation was performed using summer 2017 data. Based on the coefficient of determination (R2) and the root-mean-square error (RMSE), an RF model with 100 decision trees had the best performance (R2 of 0.81) and the lowest normalized mean error (NME < 1%), leading to low model uncertainties. The relative feature importances for the RF model were 0.55, 0.2, 0.15, and 0.1 for WSOM concentrations, RH levels, isoprene concentrations, and NOx/isoprene ratios, respectively. The machine learning model was then used to predict summertime concentrations of evaporated organics in Yorkville, Georgia (2016) and Centerville, Alabama (2013). The results presented herein have implications for measurements that rely on sample drying and demonstrate a machine learning approach for analyzing and interpreting atmospheric data sets to elucidate their complex behavior.
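The RF regression step above (100 trees, four predictors, R2 scoring, relative feature importances) can be sketched with scikit-learn. The data below are synthetic placeholders with an assumed linear signal, not the Baltimore measurements, so the printed scores and importances are illustrative only.

```python
# Minimal sketch of a 100-tree random-forest regressor on the four
# predictors named in the abstract (WSOM, RH, isoprene, NOx/isoprene).
# Synthetic data under assumed relationships; not the study's dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(size=(n, 4))          # columns: WSOM, RH, isoprene, NOx/isoprene
y = (2.0 * X[:, 0] + 0.8 * X[:, 1] + 0.4 * X[:, 2] + 0.2 * X[:, 3]
     + rng.normal(scale=0.05, size=n))  # evaporated organic mass (arbitrary units)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

r2 = r2_score(y_te, rf.predict(X_te))
print("R2:", round(r2, 3))
print("relative importances:", rf.feature_importances_)  # sum to 1
```

`feature_importances_` gives the normalized (summing to 1) impurity-based importances, the same kind of relative ranking the abstract reports for WSOM, RH, isoprene, and NOx/isoprene.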
Subject(s)
Butadienes, Water, Baltimore, Aerosols/analysis
ABSTRACT
BACKGROUND: Tuberculosis is a chronic infectious disease caused by Mycobacterium tuberculosis (MTB) and is the ninth leading cause of death worldwide. Distinguishing active TB from latent TB remains difficult, yet this distinction is essential for individualized management and treatment. METHODS: A total of 220 subjects, including active TB patients (ATB, n = 97) and latent TB patients (LTB, n = 113), were recruited in this study. Forty-six features covering blood routine indicators and the VCS parameters (volume, conductivity, light scatter) of neutrophils (NE), monocytes (MO), and lymphocytes (LY) were collected, and classification models were constructed with four machine learning algorithms: logistic regression (LR), random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN). The area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC) were used to estimate the models' predictive performance in identifying active and latent tuberculosis infection. RESULTS: Among the four classifiers, LR and RF had the best performance in the training set (AUROC = 1, AUPRC = 1), followed by SVM (AUROC = 0.967, AUPRC = 0.971) and KNN (AUROC = 0.943, AUPRC = 0.959). In the testing set, LR had the best performance (AUROC = 0.977, AUPRC = 0.957), followed by SVM (AUROC = 0.962, AUPRC = 0.949), RF (AUROC = 0.903, AUPRC = 0.922), and KNN (AUROC = 0.883, AUPRC = 0.901). CONCLUSIONS: A machine learning classifier based on leukocyte VCS parameters is of great value in identifying active and latent tuberculosis infection.
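The four-classifier comparison by AUROC and AUPRC described above can be sketched as follows. The data are a synthetic stand-in for the 220 subjects and 46 blood-routine/VCS features; model settings are assumptions, not the study's configuration.

```python
# Sketch: train LR, RF, SVM, and KNN classifiers and score each by
# AUROC and AUPRC on a held-out test set. Synthetic data standing in
# for the 46 blood-routine/VCS features; hyperparameters are assumed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score, average_precision_score

X, y = make_classification(n_samples=220, n_features=46, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "KNN": KNeighborsClassifier(),
}
results = {}
for name, model in models.items():
    prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    results[name] = (roc_auc_score(y_te, prob),            # AUROC
                     average_precision_score(y_te, prob))  # AUPRC
print(results)
```

`average_precision_score` summarizes the precision-recall curve and is the usual scikit-learn proxy for AUPRC, which is more informative than AUROC when the two classes are imbalanced.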
Subject(s)
Latent Tuberculosis, Mycobacterium tuberculosis, Tuberculosis, Humans, Latent Tuberculosis/diagnosis, Algorithms, Machine Learning
ABSTRACT
The determination of sudden cardiac death (SCD) is one of the difficult tasks in forensic practice, especially in the absence of specific morphological changes in autopsies and histological investigations. In this study, we combined the metabolic characteristics of corpse specimens of cardiac blood and cardiac muscle to predict SCD. First, ultra-high performance liquid chromatography coupled with high-resolution mass spectrometry (UPLC-HRMS)-based untargeted metabolomics was applied to obtain the metabolomic profiles of the specimens, and 18 and 16 differential metabolites were identified in the cardiac blood and cardiac muscle, respectively, from the corpses of those who died of SCD. Several possible metabolic pathways were proposed to explain these metabolic alterations, including the metabolism of energy, amino acids, and lipids. Then, we validated the capability of these combinations of differential metabolites to distinguish between SCD and non-SCD using multiple machine learning algorithms. The results showed that a stacking model integrating the differential metabolites from both specimen types achieved the best performance, with 92.31% accuracy, 93.08% precision, 92.31% recall, a 91.96% F1 score, and an AUC of 0.92. Our results reveal that the SCD metabolic signature identified by metabolomics and ensemble learning in cardiac blood and cardiac muscle has potential for SCD post-mortem diagnosis and metabolic mechanism investigations.
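The stacking-ensemble step above can be sketched with scikit-learn's `StackingClassifier`, which trains base learners and combines their out-of-fold predictions with a meta-model. The base-learner choices, meta-model, and data below are assumptions for illustration; the study's actual model composition and the UPLC-HRMS features are not reproduced here.

```python
# Sketch of a stacking ensemble scored by accuracy, F1, and AUC.
# Base learners (RF, SVM) and the logistic-regression meta-model are
# assumed; features simulate the 18 + 16 differential metabolites.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# 34 features standing in for the combined blood + muscle metabolites
X, y = make_classification(n_samples=400, n_features=34, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),  # meta-model over base predictions
)
stack.fit(X_tr, y_tr)

pred = stack.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred),
      "F1:", f1_score(y_te, pred),
      "AUC:", roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1]))
```

By default `StackingClassifier` uses 5-fold cross-validated predictions of the base learners to train the meta-model, which limits the leakage that would occur if the meta-model saw in-sample base predictions.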