Results 1 - 20 of 93
1.
Med Image Anal ; 97: 103257, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38981282

ABSTRACT

The alignment of tissue between histopathological whole-slide-images (WSI) is crucial for research and clinical applications. Advances in computing, deep learning, and availability of large WSI datasets have revolutionised WSI analysis. Therefore, the current state-of-the-art in WSI registration is unclear. To address this, we conducted the ACROBAT challenge, based on the largest WSI registration dataset to date, including 4,212 WSIs from 1,152 breast cancer patients. The challenge objective was to align WSIs of tissue that was stained with routine diagnostic immunohistochemistry to its H&E-stained counterpart. We compare the performance of eight WSI registration algorithms, including an investigation of the impact of different WSI properties and clinical covariates. We find that conceptually distinct WSI registration methods can lead to highly accurate registration performances and identify covariates that impact performances across methods. These results provide a comparison of the performance of current WSI registration methods and guide researchers in selecting and developing methods.

2.
J Affect Disord ; 357: 148-155, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-38670463

ABSTRACT

BACKGROUND: Anxiety disorders are among the most common mental health disorders in the middle-aged and older population. Because older individuals are more likely to have multiple comorbidities or increased frailty, the impact of anxiety disorders on their overall well-being is exacerbated. Early identification of anxiety disorders using machine learning (ML) can potentially mitigate the adverse consequences associated with these disorders. METHODS: We applied ML to the data from the Canadian Longitudinal Study on Aging (CLSA) to predict the onset of anxiety disorders approximately three years in the future. We used Shapley value-based methods to determine the top factors for prediction. We also investigated whether anxiety onset can be predicted by baseline depression-related predictors alone. RESULTS: Our model was able to predict anxiety onset accurately (Area under the Receiver Operating Characteristic Curve or AUC = 0.814 ± 0.016 (mean ± standard deviation), balanced accuracy = 0.741 ± 0.016, sensitivity = 0.743 ± 0.033, and specificity = 0.738 ± 0.010). The top predictive factors included prior depression or mood disorder diagnosis, high frailty, anxious personality, and low emotional stability. Depression and mood disorders are well-known comorbidities of anxiety; however, a prior depression or mood disorder diagnosis could not predict anxiety onset without other factors. LIMITATION: While our findings underscore the importance of a prior depression diagnosis in predicting anxiety, they also highlight that it alone is inadequate, signifying the necessity to incorporate additional predictors for improved prediction accuracy. CONCLUSION: Our study showcases promising prospects for using machine learning to develop personalized prediction models for anxiety onset in middle-aged and older adults using easy-to-access survey data.
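
As an aid to reading the headline metric: the AUC can be interpreted as the probability that a randomly chosen participant who later developed anxiety receives a higher predicted risk than a randomly chosen participant who did not. The sketch below computes it by direct pairwise comparison. It is an illustration only, not the authors' code; the function name `auc_pairwise` is hypothetical, and the labels and scores are invented toy values, not CLSA data.

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie in score counts as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 1 = later anxiety onset, 0 = no onset.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
toy_auc = auc_pairwise(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

The quadratic pairwise loop is chosen for readability; a rank-based computation gives the same value more efficiently on large cohorts.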


Subject(s)
Anxiety Disorders , Machine Learning , Humans , Female , Male , Canada/epidemiology , Longitudinal Studies , Aged , Anxiety Disorders/epidemiology , Anxiety Disorders/diagnosis , Anxiety Disorders/psychology , Middle Aged , Aging/psychology , Aged, 80 and over , Depression/epidemiology , Depression/diagnosis , Depression/psychology , Comorbidity , Frailty/diagnosis , Frailty/epidemiology , Prospective Studies , Anxiety/epidemiology , Anxiety/diagnosis , Anxiety/psychology
3.
Cancers (Basel) ; 16(4)2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38398176

ABSTRACT

Recent advances in our understanding of gastric cancer biology have prompted a shift towards more personalized therapy. However, results are based on population-based survival analyses, which evaluate the average survival effects of entire treatment groups or single prognostic variables. This study uses a personalized survival modelling approach called individual survival distributions (ISDs) with the multi-task logistic regression (MTLR) model to provide novel insight into personalized survival in gastric adenocarcinoma. We performed a pooled analysis using 1043 patients from a previously characterized database annotated with molecular subtypes from the Cancer Genome Atlas, Asian Cancer Research Group, and tumour microenvironment (TME) score. The MTLR model achieved a 5-fold cross-validated concordance index of 72.1 ± 3.3%. This model found that the TME score and chemotherapy had similar survival effects over the entire study time. The TME score provided the greatest survival benefit beyond a 5-year follow-up. Stage III and Stage IV disease contributed the greatest negative effect on survival. The MTLR model weights were significantly correlated with the Cox model coefficients (Pearson coefficient = 0.86, p < 0.0001). We illustrate how ISDs can accurately predict the survival time for each patient, which is especially relevant in cases of molecular subtype heterogeneity. This study provides evidence that the TME score is principally associated with long-term survival in gastric adenocarcinoma. Additional external validation and investigation into the clinical utility of this ISD model in gastric cancer is an area of future research.
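
The concordance index reported above measures how often a model orders pairs of patients correctly by survival time. A minimal sketch of Harrell's C for right-censored data follows, assuming the model outputs a predicted survival time per patient. The function name and the toy data are hypothetical and are not taken from the study; tied observed times are simply treated as non-comparable here.

```python
def concordance_index(times, events, predicted_times):
    """Harrell's C for right-censored data.

    A pair (i, j) is comparable when subject i had an event (events[i] == 1)
    and a strictly shorter observed time than subject j. The pair is
    concordant when the model also predicts a shorter survival time for i.
    Ties in the predictions count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented): observed months, event flag (0 = censored), predictions.
toy_times = [5, 10, 12, 20]
toy_events = [1, 0, 1, 1]
toy_predicted = [6, 15, 19, 18]
toy_c = concordance_index(toy_times, toy_events, toy_predicted)  # 3 of 4 pairs
```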

4.
Anal Chem ; 95(50): 18326-18334, 2023 12 19.
Article in English | MEDLINE | ID: mdl-38048435

ABSTRACT

The market for illicit drugs has been reshaped by the emergence of more than 1100 new psychoactive substances (NPS) over the past decade, posing a major challenge to the forensic and toxicological laboratories tasked with detecting and identifying them. Tandem mass spectrometry (MS/MS) is the primary method used to screen for NPS within seized materials or biological samples. The most contemporary workflows necessitate labor-intensive and expensive MS/MS reference standards, which may not be available for recently emerged NPS on the illicit market. Here, we present NPS-MS, a deep learning method capable of accurately predicting the MS/MS spectra of known and hypothesized NPS from their chemical structures alone. NPS-MS is trained by transfer learning from a generic MS/MS prediction model on a large data set of MS/MS spectra. We show that this approach enables a more accurate identification of NPS from experimentally acquired MS/MS spectra than any existing method. We demonstrate the application of NPS-MS to identify a novel derivative of phencyclidine (PCP) within an unknown powder seized in Denmark without the use of any reference standards. We anticipate that NPS-MS will allow forensic laboratories to identify more rapidly both known and newly emerging NPS. NPS-MS is available as a web server at https://nps-ms.ca/, which provides MS/MS spectra prediction capabilities for given NPS compounds. Additionally, it offers MS/MS spectra identification against a vast database comprising approximately 8.7 million predicted NPS compounds from DarkNPS and 24.5 million predicted ESI-QToF-MS/MS spectra for these compounds.


Subject(s)
Deep Learning , Illicit Drugs , Tandem Mass Spectrometry/methods , Psychotropic Drugs/analysis , Illicit Drugs/analysis , Spectrometry, Mass, Electrospray Ionization
5.
Sci Rep ; 13(1): 20713, 2023 11 24.
Article in English | MEDLINE | ID: mdl-38001260

ABSTRACT

Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease. Accurately predicting the survival time for ALS patients can help patients and clinicians to plan for future treatment and care. We describe the application of a machine-learned tool that incorporates clinical features and cortical thickness from brain magnetic resonance (MR) images to estimate the time until a composite respiratory failure event for ALS patients, and presents the prediction as individual survival distributions (ISDs). These ISDs provide the probability of survival (none of the respiratory failures) at multiple future time points, for each individual patient. Our learner considers several survival prediction models, and selects the best model to provide predictions. We evaluate our learned model using the mean absolute error margin (MAE-margin), a modified version of mean absolute error that handles data with censored outcomes. We show that our tool can provide helpful information for patients and clinicians in planning future treatment.


Subject(s)
Amyotrophic Lateral Sclerosis , Neurodegenerative Diseases , Humans , Amyotrophic Lateral Sclerosis/diagnosis , Probability , Brain , Learning , Disease Progression
7.
Gerontology ; 69(12): 1394-1403, 2023.
Article in English | MEDLINE | ID: mdl-37725932

ABSTRACT

INTRODUCTION: An aging population will bring a pressing challenge for the healthcare system. Insights into promoting healthy longevity can be gained by quantifying the biological aging process and understanding the roles of modifiable lifestyle and environmental factors, and chronic disease conditions. METHODS: We developed a biological age (BioAge) index by applying multiple state-of-the-art machine learning models based on easily accessible blood test data from the Canadian Longitudinal Study on Aging (CLSA). The BioAge gap, the difference between the BioAge index and chronological age, was used to quantify the differential aging of the CLSA participants. We further investigated the associations between the BioAge gap and lifestyle, environmental factors, and current and future health conditions. RESULTS: BioAge gap had strong associations with existing adverse health conditions (e.g., cancers, cardiovascular diseases, diabetes, and kidney diseases) and future disease onset (e.g., Parkinson's disease, diabetes, and kidney diseases). We identified that frequent consumption of processed meat, pork, beef, and chicken, poor outcomes in nutritional risk screening, cigarette smoking, and exposure to passive smoking are associated with a positive BioAge gap ("older" BioAge than expected). We also identified several modifiable factors, including eating fruits, legumes, and vegetables, related to a negative BioAge gap ("younger" BioAge than expected). CONCLUSIONS: Our study shows that a BioAge index based on easily accessible blood tests has the potential to quantify the differential biological aging process that can be associated with current and future adverse health events. The identified risk and protective factors for differential aging indicated by BioAge gap are informative for future research and guidelines to promote healthy longevity.


Subject(s)
Diabetes Mellitus , Kidney Diseases , Animals , Cattle , Humans , Aged , Longitudinal Studies , Canada/epidemiology , Aging , Life Style
8.
Clin Cancer Res ; 29(19): 3924-3936, 2023 10 02.
Article in English | MEDLINE | ID: mdl-37463063

ABSTRACT

PURPOSE: Personalized medicine attempts to predict survival time for each patient, based on their individual tumor molecular profile. We investigate whether our survival learner in combination with a dimension reduction method can produce useful survival estimates for a variety of patients with cancer. EXPERIMENTAL DESIGN: This article provides a method that learns a model for predicting the survival time for individual patients with cancer from the PanCancer Atlas: given the (16,335 dimensional) gene expression profiles from 10,173 patients, each having one of 33 cancers, this method uses unsupervised nonnegative matrix factorization (NMF) to reexpress the gene expression data for each patient in terms of 100 learned NMF factors. It then feeds these 100 factors into the Multi-Task Logistic Regression (MTLR) learner to produce cancer-specific models for each of 20 cancers (with >50 uncensored instances); this produces "individual survival distributions" (ISD), which provide survival probabilities at each future time for each individual patient, which provides a patient's risk score and estimated survival time. RESULTS: Our NMF-MTLR concordance indices outperformed the VAECox benchmark by 14.9% overall. We achieved optimal survival prediction using pan-cancer NMF in combination with cancer-specific MTLR models. We provide biological interpretation of the NMF model and clinical implications of ISDs for prognosis and therapeutic response prediction. CONCLUSIONS: NMF-MTLR provides many benefits over other models: superior model discrimination, superior calibration, meaningful survival time estimates, and accurate probabilistic estimates of survival over time for each individual patient. We advocate for the adoption of these cancer survival models in clinical and research settings.


Subject(s)
Neoplasms , Transcriptome , Humans , Algorithms , Neoplasms/genetics
9.
J Affect Disord ; 339: 52-57, 2023 10 15.
Article in English | MEDLINE | ID: mdl-37380110

ABSTRACT

BACKGROUND: Early identification of middle-aged and elderly people at high risk of developing depression in the future, and the full characterization of the associated risk factors, are crucial for early interventions to prevent depression among the aging population. METHODS: The Canadian Longitudinal Study on Aging (CLSA) has collected comprehensive information, including psychological scales and other non-psychological measures, i.e., socioeconomic, environmental, health, lifestyle, cognitive function, personality, about its participants (30,097 subjects aged from 45 to 85) at the baseline phase in 2012-2015. We applied machine learning models for the prediction of these participants' risk of depression onset approximately three years later using information collected at the baseline phase. RESULTS: Individual-level risk for future depression onset among CLSA participants can be accurately predicted, with an area under receiver operating characteristic curve (AUC) 0.791 ± 0.016, using all baseline information. We also found the 10-item Center for Epidemiological Studies Depression Scale coupled with age and sex information could achieve similar performance (AUC 0.764 ± 0.016). Furthermore, we identified existing subthreshold depression symptoms, emotional instability, low levels of life satisfaction, perceived health, and social support, and nutrition risk as the most important predictors for depression onset independent of psychological scales. LIMITATIONS: Depression was based on self-reported doctor diagnosis and depression screening tool. CONCLUSIONS: The identified risk factors will further improve our understanding of depression onset among the middle-aged and elderly population, and the early identification of high-risk subjects is the first step for successful early interventions.


Subject(s)
Aging , Depression , Middle Aged , Humans , Adult , Aged , Longitudinal Studies , Depression/diagnosis , Depression/epidemiology , Depression/psychology , Canada/epidemiology , Aging/psychology , Machine Learning
10.
IEEE Trans Biomed Eng ; 70(12): 3389-3400, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37339045

ABSTRACT

An Individual Survival Distribution (ISD) models a patient's personalized survival probability at all future time points. Previously, ISD models have been shown to produce accurate and personalized survival estimates (for example, time to relapse or to death) in several clinical applications. However, off-the-shelf neural-network-based ISD models are usually opaque models due to their limited support for meaningful feature selection and uncertainty estimation, which hinders their wide clinical adoption. Here, we introduce a Bayesian-neural-network-based ISD (BNN-ISD) model that not only produces accurate survival estimates but also quantifies the uncertainty in the model's parameter estimation, which can be used to (1) rank the importance of the input features to support feature selection and (2) compute credible intervals around ISDs for clinicians to assess the model's confidence in its prediction. Our BNN-ISD model utilized sparsity-inducing priors to learn a sparse set of weights to enable feature selection. We provide empirical evidence, on 2 synthetic and 3 real-world clinical datasets, that the BNN-ISD system can effectively select meaningful features and compute trustworthy credible intervals of the survival distribution for each patient. We observed that our approach accurately recovers feature importance in the synthetic datasets and selects meaningful features for the real-world clinical data as well, while also achieving state-of-the-art survival prediction performance. We also show that these credible regions can aid in clinical decision-making by providing a gauge of the uncertainty of the estimated ISD curves.
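
To make the notion of an ISD concrete, the sketch below represents one patient's survival curve as probabilities on a grid of time points and reads two quantities from it: the survival probability at an arbitrary time and the median survival time. This is a simplified illustration under stated assumptions (a monotone non-increasing curve, linear interpolation between grid points); it is not the BNN-ISD model, the function names are hypothetical, and the numbers are invented.

```python
def survival_probability(time_grid, surv_probs, t):
    """Linearly interpolate S(t) from an ISD given on a time grid."""
    if t <= time_grid[0]:
        return surv_probs[0]
    if t >= time_grid[-1]:
        return surv_probs[-1]  # no extrapolation beyond the last grid point
    for k in range(1, len(time_grid)):
        if t <= time_grid[k]:
            t0, t1 = time_grid[k - 1], time_grid[k]
            s0, s1 = surv_probs[k - 1], surv_probs[k]
            return s0 + (s1 - s0) * (t - t0) / (t1 - t0)


def median_survival_time(time_grid, surv_probs):
    """First time at which the interpolated curve reaches 0.5, else None."""
    for k in range(1, len(time_grid)):
        s0, s1 = surv_probs[k - 1], surv_probs[k]
        if s0 >= 0.5 >= s1 and s0 != s1:
            t0, t1 = time_grid[k - 1], time_grid[k]
            return t0 + (t1 - t0) * (s0 - 0.5) / (s0 - s1)
    return None  # the curve never drops to 0.5 within the grid


# Toy ISD for one hypothetical patient (months, survival probability).
toy_grid = [0, 12, 24, 36, 48]
toy_probs = [1.0, 0.8, 0.6, 0.4, 0.3]
s_at_18 = survival_probability(toy_grid, toy_probs, 18)  # about 0.7
median_t = median_survival_time(toy_grid, toy_probs)     # about 30 months
```

A credible interval around such a curve would require repeating this for many sampled curves per patient, which is outside the scope of this sketch.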


Subject(s)
Neural Networks, Computer , Humans , Bayes Theorem , Uncertainty
11.
NPJ Digit Med ; 6(1): 21, 2023 Feb 06.
Article in English | MEDLINE | ID: mdl-36747065

ABSTRACT

The feasibility and value of linking electrocardiogram (ECG) data to longitudinal population-level administrative health data to facilitate the development of a learning healthcare system have not been fully explored. We developed ECG-based machine learning models to predict risk of mortality among patients presenting to an emergency department or hospital for any reason. Using the 12-lead ECG traces and measurements from 1,605,268 ECGs from 748,773 healthcare episodes of 244,077 patients (2007-2020) in Alberta, Canada, we developed and validated ResNet-based Deep Learning (DL) and gradient boosting-based XGBoost (XGB) models to predict 30-day, 1-year, and 5-year mortality. The models for 30-day, 1-year, and 5-year mortality were trained on 146,173, 141,072, and 111,020 patients and evaluated on 97,144, 89,379, and 55,650 patients, respectively. In the evaluation cohort, 7.6%, 17.3%, and 32.9% of patients died by 30 days, 1 year, and 5 years, respectively. ResNet models based on ECG traces alone had good-to-excellent performance with area under receiver operating characteristic curve (AUROC) of 0.843 (95% CI: 0.838-0.848), 0.812 (0.808-0.816), and 0.798 (0.792-0.803) for 30-day, 1-year and 5-year prediction, respectively; and were superior to XGB models based on ECG measurements with AUROC of 0.782 (0.776-0.789), 0.784 (0.780-0.788), and 0.746 (0.740-0.751). This study demonstrates the validity of ECG-based DL mortality prediction models at the population-level that can be leveraged for prognostication at point of care.

12.
Article in English | MEDLINE | ID: mdl-35195049

ABSTRACT

The absence of disease modifying treatments for amyotrophic lateral sclerosis (ALS) is in large part a consequence of its complexity and heterogeneity. Deep clinical and biological phenotyping of people living with ALS would assist in the development of effective treatments and target specific biomarkers to monitor disease progression and inform on treatment efficacy. The objective of this paper is to present the Comprehensive Analysis Platform To Understand Remedy and Eliminate ALS (CAPTURE ALS), an open and translational platform for the scientific community currently in development. CAPTURE ALS is a Canadian-based platform designed to include participants' voices in its development and through execution. Standardized methods will be used to longitudinally characterize ALS patients and healthy controls through deep clinical phenotyping, neuroimaging, neurocognitive and speech assessments, genotyping and multisource biospecimen collection. This effort plugs into complementary Canadian and international initiatives to share common resources. Here, we describe in detail the infrastructure, operating procedures, and long-term vision of CAPTURE ALS to facilitate and accelerate translational ALS research in Canada and beyond.


Subject(s)
Amyotrophic Lateral Sclerosis , Humans , Amyotrophic Lateral Sclerosis/drug therapy , Canada , Biomarkers , Disease Progression , Neuroimaging
13.
Can J Psychiatry ; 68(1): 54-63, 2023 01.
Article in English | MEDLINE | ID: mdl-35892186

ABSTRACT

OBJECTIVE: Opioid use disorder (OUD) is a chronic relapsing disorder with a problematic pattern of opioid use, affecting nearly 27 million people worldwide. Machine learning (ML)-based prediction of OUD may lead to early detection and intervention. However, most ML prediction studies were not based on representative data sources and prospective validations, limiting their potential to predict future new cases. In the current study, we aimed to develop and prospectively validate an ML model that could predict individual OUD cases based on representative large-scale health data. METHOD: We present an ensemble machine-learning model trained on a cross-linked Canadian administrative health data set from 2014 to 2018 (n = 699,164), with validation of model-predicted OUD cases on a hold-out sample from 2014 to 2018 (n = 174,791) and prospective prediction of OUD cases on a non-overlapping sample from 2019 (n = 316,039). We used administrative records of OUD diagnosis for each subject based on International Classification of Diseases (ICD) codes. RESULTS: With 6409 OUD cases in 2019 (mean [SD] age, 45.34 [14.28]; 3400 males), our model prospectively predicted OUD cases with high accuracy (balanced accuracy, 86%; sensitivity, 93%; specificity, 79%). In accord with prior findings, the top risk factors for OUD in this model were opioid use indicators and a history of other substance use disorders. CONCLUSION: Our study presents an individualized prospective prediction of OUD cases by applying ML to large administrative health datasets. Such prospective predictions based on ML would be essential for potential future clinical applications in the early detection of OUD.
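
The reported figures are internally consistent: balanced accuracy is the mean of sensitivity and specificity, and (93% + 79%) / 2 = 86%. The short sketch below shows that arithmetic, and how the same quantity follows from confusion-matrix counts. The function names are hypothetical and the counts are invented for illustration; they are not the study's data.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of the true-positive rate and the true-negative rate."""
    return (sensitivity + specificity) / 2.0


def balanced_accuracy_from_counts(tp, fn, tn, fp):
    """Same metric, starting from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true cases that are detected
    specificity = tn / (tn + fp)  # fraction of non-cases correctly cleared
    return balanced_accuracy(sensitivity, specificity)


# Values quoted in the abstract: sensitivity 93%, specificity 79%.
reported = balanced_accuracy(0.93, 0.79)  # approximately 0.86

# Invented counts that reproduce the same rates.
from_counts = balanced_accuracy_from_counts(tp=93, fn=7, tn=79, fp=21)
```

Balanced accuracy is the natural summary here because OUD cases are a small fraction of the sample, so plain accuracy would be dominated by the non-cases.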


Subject(s)
Analgesics, Opioid , Opioid-Related Disorders , Male , Humans , Analgesics, Opioid/therapeutic use , Canada/epidemiology , Opioid-Related Disorders/diagnosis , Opioid-Related Disorders/epidemiology , Opioid-Related Disorders/drug therapy , Risk Factors
14.
PLoS One ; 17(12): e0279174, 2022.
Article in English | MEDLINE | ID: mdl-36534670

ABSTRACT

We propose a method to predict when a woman will develop breast cancer (BCa) from her lifestyle and health history features. To address this objective, we use data on 18,288 women from Alberta's Tomorrow Project to train Individual Survival Distribution (ISD) models to predict an individual's Breast-Cancer-Onset (BCaO) probability curve. We show that our three-step approach, (1) filling missing data with multiple imputations by chained equations, followed by (2) feature selection with the multivariate Cox method, and finally, (3) using MTLR to learn an ISD model, produced the model with the smallest L1-Hinge loss among all calibrated models with comparable C-index. We also identified 7 actionable lifestyle features that a woman can modify and illustrate how this model can predict the quantitative effects of those changes, suggesting how much each will potentially extend her BCa-free time. We anticipate this approach could be used to identify appropriate interventions for individuals with a higher likelihood of developing BCa in their lifetime.


Subject(s)
Breast Neoplasms , Humans , Female , Life Style , Probability , Surveys and Questionnaires
15.
BMC Health Serv Res ; 22(1): 1415, 2022 Nov 24.
Article in English | MEDLINE | ID: mdl-36434628

ABSTRACT

BACKGROUND: Hospital readmissions are one of the costliest challenges facing healthcare systems, but conventional models fail to predict readmissions well. Many existing models use exclusively manually-engineered features, which are labor intensive and dataset-specific. Our objective was to develop and evaluate models to predict hospital readmissions using derived features that are automatically generated from longitudinal data using machine learning techniques. METHODS: We studied patients discharged from acute care facilities in 2015 and 2016 in Alberta, Canada, excluding those who were hospitalized to give birth or for a psychiatric condition. We used population-level linked administrative hospital data from 2011 to 2017 to train prediction models using both manually derived features and features generated automatically from observational data. The target value of interest was 30-day all-cause hospital readmissions, with the success of prediction measured using the area under the curve (AUC) statistic. RESULTS: Data from 428,669 patients (62% female, 38% male, 27% 65 years or older) were used for training and evaluating models: 24,974 (5.83%) were readmitted within 30 days of discharge for any reason. Patients were more likely to be readmitted if they utilized hospital care more, had more physician office visits, had more prescriptions, had a chronic condition, or were 65 years old or older. The LACE readmission prediction model had an AUC of 0.66 ± 0.0064 while the machine learning model's test set AUC was 0.83 ± 0.0045, based on learning a gradient boosting machine on a combination of machine-learned and manually-derived features. CONCLUSION: Applying a machine learning model to the computer-generated and manual features improved prediction accuracy over the LACE model and a model that used only manually-derived features. Our model can be used to identify high-risk patients, for whom targeted interventions may potentially prevent readmissions.


Subject(s)
Patient Discharge , Patient Readmission , Humans , Male , Female , Aged , Hospitalization , Machine Learning , Alberta/epidemiology
17.
Front Psychiatry ; 13: 923938, 2022.
Article in English | MEDLINE | ID: mdl-35990061

ABSTRACT

Transcranial direct current stimulation (tDCS) is a promising adjuvant treatment for persistent auditory verbal hallucinations (AVH) in Schizophrenia (SZ). Nonetheless, there is considerable inter-patient variability in the treatment response of AVH to tDCS in SZ. Machine-learned models have the potential to predict clinical response to tDCS in SZ. This study aims to examine the feasibility of identifying SZ patients with persistent AVH (SZ-AVH) who will respond to tDCS based on resting-state functional connectivity (rs-FC). Thirty-four SZ-AVH patients underwent resting-state functional MRI at baseline followed by add-on, twice-daily, 20-min sessions with tDCS (conventional/high-definition) for 5 days. A machine learning model was developed to identify tDCS treatment responders based on the rs-FC pattern, using the left superior temporal gyrus (LSTG) as the seed region. Functional connectivity between LSTG and brain regions involved in auditory and sensorimotor processing emerged as the important predictors of the tDCS treatment response. An L1-regularized logistic regression model had an overall accuracy of 72.5% in classifying responders vs. non-responders. This model outperformed a state-of-the-art convolutional neural network (CNN) model, both without (59.41%) and with pre-training (68.82%). It also outperformed the L1-logistic regression model trained with baseline demographic features and clinical scores of SZ patients. This study reports the first evidence that rs-fMRI-derived brain connectivity patterns can predict the clinical response of persistent AVH to add-on tDCS in SZ patients with 72.5% accuracy.

18.
PLoS One ; 17(7): e0252697, 2022.
Article in English | MEDLINE | ID: mdl-35901020

ABSTRACT

Many researchers try to understand a biological condition by identifying biomarkers. This is typically done using univariate hypothesis testing over a labeled dataset, declaring a feature to be a biomarker if there is a significant statistical difference between its values for the subjects with different outcomes. However, such sets of proposed biomarkers are often not reproducible - subsequent studies often fail to identify the same sets. Indeed, there is often only a very small overlap between the biomarkers proposed in pairs of related studies that explore the same phenotypes over the same distribution of subjects. This paper first defines the Reproducibility Score for a labeled dataset as a measure (taking values between 0 and 1) of the reproducibility of the results produced by a specified fixed biomarker discovery process for a given distribution of subjects. We then provide ways to reliably estimate this score by defining algorithms that produce an over-bound and an under-bound for this score for a given dataset and biomarker discovery process, for the case of univariate hypothesis testing on dichotomous groups. We confirm that these approximations are meaningful by providing empirical results on a large number of datasets and show that these predictions match known reproducibility results. To encourage others to apply this technique to analyze their biomarker sets, we have also created a publicly available website, https://biomarker.shinyapps.io/BiomarkerReprod/, that produces these Reproducibility Score approximations for any given dataset (with continuous or discrete features and binary class labels).
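
The abstract does not state the formula behind the Reproducibility Score. As a loose illustration of the underlying idea (how much two proposed biomarker sets overlap), the sketch below uses the Jaccard index, a common set-overlap measure that also takes values between 0 and 1. Treating overlap as Jaccard similarity is an assumption made here for illustration, not a description of the paper's estimator; the function name and the feature names are hypothetical.

```python
def jaccard_overlap(biomarkers_a, biomarkers_b):
    """Size of the intersection divided by size of the union (0 to 1)."""
    set_a, set_b = set(biomarkers_a), set(biomarkers_b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty sets agree perfectly.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical biomarker sets proposed by two studies of the same phenotype.
study_1 = {"GENE_A", "GENE_B", "GENE_C"}
study_2 = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}
overlap = jaccard_overlap(study_1, study_2)  # 2 shared out of 5 distinct
```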


Subject(s)
Algorithms , Biomarkers , Humans , Reproducibility of Results
19.
Neuroimage Clin ; 35: 103120, 2022.
Article in English | MEDLINE | ID: mdl-35908308

ABSTRACT

Many previous intervention studies have used functional magnetic resonance imaging (fMRI) data to predict the antidepressant response of patients with major depressive disorder (MDD); however, practical constraints have limited many of those attempts to small, single-centre studies which may not adequately reflect how these models will generalize when used in clinical practice. Not only does the act of collecting data at multiple sites generally increase sample sizes (a critical point in machine learning development), it also generates a more heterogeneous dataset due to systematic differences in scanners at different sites, and geographical differences in patient populations. As part of the Canadian Biomarker Integration Network in Depression (CAN-BIND-1) study, 144 MDD patients from six sites underwent resting state fMRI prior to starting escitalopram treatment, and again two weeks after the start. Here, we consider ways to use machine learning techniques to produce models that can predict response (measured at eight weeks after initiation), based on various parcellations, functional connectivity (FC) metrics, dimensionality reduction algorithms, and base learners, and also whether to use scans from one or both time points. Models that use only baseline (pre-treatment) or only week 2 (early-response) whole-brain FC features consistently failed to perform significantly better than default models. Utilizing the change in FC between these two time points, however, yielded significant results, with the best performing analytical pipeline achieving 69.6% (SD 10.8) accuracy. These results appear contrary to findings from many smaller single-site studies, which report substantially higher predictive accuracies from models trained on only baseline resting state FC features, suggesting these models may not generalize well beyond data used for development. Further, these results indicate the potential value of collecting data both before and shortly after treatment initiation.


Subject(s)
Depressive Disorder, Major , Magnetic Resonance Imaging , Biomarkers , Brain/diagnostic imaging , Canada , Depressive Disorder, Major/diagnostic imaging , Depressive Disorder, Major/drug therapy , Escitalopram , Humans , Magnetic Resonance Imaging/methods
20.
Chem Sci ; 13(22): 6669-6686, 2022 Jun 07.
Article in English | MEDLINE | ID: mdl-35756507

ABSTRACT

Advances in diagnostics, therapeutics, vaccines, transfusion, and organ transplantation build on a fundamental understanding of glycan-protein interactions. To aid this, we developed GlyNet, a model that accurately predicts interactions (relative binding strengths) between mammalian glycans and 352 glycan-binding proteins, many at multiple concentrations. For each glycan input, our model produces 1257 outputs, each representing the relative interaction strength between the input glycan and a particular protein sample. GlyNet learns these continuous values using relative fluorescence units (RFUs) measured on 599 glycans in the Consortium for Functional Glycomics glycan arrays and extrapolates these to RFUs from additional, untested glycans. GlyNet's output of continuous values provides more detailed results than the standard binary classification models. After incorporating a simple threshold to transform such continuous outputs, the resulting GlyNet classifier outperforms those standard classifiers. GlyNet is the first multi-output regression model for predicting protein-glycan interactions and serves as an important benchmark, facilitating development of quantitative computational glycobiology.
