ABSTRACT
Over the past two decades, advances in computational power and data availability, combined with increased accessibility of pre-trained models, have led to an exponential rise in machine learning (ML) publications. While ML may have the potential to transform healthcare, this sharp increase in ML research output, without a focus on methodological rigor and standard reporting guidelines, has fueled a reproducibility crisis. In addition, the rapidly growing complexity of these models compromises their interpretability, which currently impedes their successful and widespread clinical adoption. In medicine, where failure of such models may have severe implications for patients' health, the high requirements for accuracy, robustness, and interpretability confront ML researchers with a unique set of challenges. In this review, we discuss the semantics of reproducibility and interpretability, as well as related issues and challenges, and outline possible solutions to counteract the "black box". To foster reproducibility, standard reporting guidelines need to be further developed, and data or code sharing encouraged. Editors and reviewers can equally play a critical role by establishing high methodological standards, thus preventing the dissemination of low-quality ML publications. To foster interpretable learning, the use of simpler models more suitable for medical data can inform the clinician how results are generated from input data. Model-agnostic explanation tools, sensitivity analysis, and hidden layer representations constitute further promising approaches to increase interpretability. Balancing model performance and interpretability is important to ensure clinical applicability. We have now reached a critical moment for ML in medicine, where addressing these issues and implementing appropriate solutions will be vital for the future evolution of the field.
Subjects
Medicine, Humans, Reproducibility of Results, Machine Learning, Semantics
ABSTRACT
We provide explanations of the general principles of machine learning, as well as the analytical steps required for successful machine learning-based predictive modeling, which is the focus of this series. In particular, we define the terms machine learning, artificial intelligence, as well as supervised and unsupervised learning, continuing by introducing optimization, that is, the minimization of an objective error function, as the central dogma of machine learning. In addition, we discuss why it is important to separate predictive and explanatory modeling, and most importantly state that a prediction model should not be used to make inferences. Lastly, we broadly describe a classical workflow for training a machine learning model, starting with data pre-processing and feature engineering and selection, continuing with a training structure consisting of a resampling method, hyperparameter tuning, and model selection, and ending with evaluation of model discrimination and calibration as well as robust internal or external validation of the fully developed model. Methodological rigor and clarity, as well as an understanding of the reasoning behind the internal workings of a machine learning approach, are required; otherwise, predictive applications, despite being strong analytical tools, will not be well accepted into the clinical routine.
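The central dogma named above, minimizing an objective error function, can be made concrete with a small sketch. The following is purely illustrative and not drawn from the series itself (which works in R): gradient descent fitting a one-feature linear model to invented data by repeatedly stepping against the gradient of the mean-squared-error objective.

```python
# Minimizing a mean-squared-error objective by gradient descent for a
# one-feature linear model y ≈ w*x + b (illustrative sketch only).

def fit_linear(xs, ys, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the MSE objective with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]          # generated from y = 2x + 1
w, b = fit_linear(xs, ys)      # w and b approach 2.0 and 1.0
```

Because the objective here is convex, the iterates converge to the unique minimum; more complex models share the same principle but not this guarantee.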
Subjects
Artificial Intelligence, Machine Learning
ABSTRACT
We review the concept of overfitting, a well-known concern within the machine learning community that is less established in the clinical community. Overfitted models may lead to inadequate conclusions that may wrongly or even harmfully shape clinical decision-making. Overfitting can be defined as the difference between discriminatory training and testing performance. While it is normal for out-of-sample performance to be equal to or ever so slightly worse than training performance for any adequately fitted model, a massively worse out-of-sample performance suggests relevant overfitting. We delve into resampling methods, specifically recommending k-fold cross-validation and bootstrapping to arrive at realistic estimates of out-of-sample error during training. We also encourage the use of regularization techniques such as L1 or L2 regularization, and choosing a level of algorithm complexity appropriate for the type of dataset used. Data leakage is addressed, as is the importance of external validation to assess true out-of-sample performance and, upon successful external validation, to release the model into clinical practice. Finally, for high-dimensional datasets, the concepts of feature reduction using principal component analysis (PCA) as well as feature elimination using recursive feature elimination (RFE) are elucidated.
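The k-fold cross-validation recommended above can be sketched in a few dependency-free lines. This is an illustrative outline, not the authors' implementation; the "model" here is simply the mean of the training outcomes, and all data are invented.

```python
import random

def k_fold_indices(n, k, seed=42):
    """Partition indices 0..n-1 into k shuffled, roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, score, k=5):
    """Mean out-of-sample score across k folds."""
    folds = k_fold_indices(len(xs), k)
    results = []
    for test_idx in folds:
        # Train on all folds except the held-out one
        train_idx = [j for f in folds if f is not test_idx for j in f]
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        results.append(score(model,
                             [xs[j] for j in test_idx],
                             [ys[j] for j in test_idx]))
    return sum(results) / k

# Toy example: "model" = mean outcome; score = negative MSE on held-out data.
fit = lambda X, y: sum(y) / len(y)
score = lambda m, X, y: -sum((yy - m) ** 2 for yy in y) / len(y)
xs = list(range(20))
ys = [0.5 * x for x in xs]
cv_score = cross_validate(xs, ys, fit, score, k=5)
```

The key property is that every observation is scored exactly once while held out of training, giving a less optimistic estimate of out-of-sample error than the training error itself.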
Subjects
Algorithms, Machine Learning
ABSTRACT
Various available metrics to describe model performance in terms of discrimination (area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score) and calibration (slope, intercept, Brier score, expected/observed ratio, Estimated Calibration Index, Hosmer-Lemeshow goodness-of-fit) are presented. Recalibration is introduced, with Platt scaling and isotonic regression as proposed methods. We also discuss considerations regarding the sample size required for optimal training of clinical prediction models, explaining why low sample sizes lead to unstable models, and offering the common rule of thumb of at least ten patients per class per input feature, as well as some more nuanced approaches. Treatment of missing data, and model-based imputation instead of mean, mode, or median imputation, is also discussed. We explain how data standardization is important in pre-processing and how it can be achieved using, e.g., centering and scaling. One-hot encoding is discussed: categorical features with more than two levels must be encoded as multiple features to avoid wrong assumptions. Regarding binary classification models, we discuss how to select a sensible predicted probability cutoff using the closest-to-(0,1) criterion based on the AUC, or based on the clinical question (rule-in or rule-out). Extrapolation is also discussed.
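Several of the metrics listed above can be computed directly from labels and predicted probabilities. Below is an illustrative, dependency-free sketch (not the chapter's own code) of the rank-based AUC, the Brier score, and the closest-to-(0,1) cutoff criterion; the toy labels and probabilities are invented for demonstration.

```python
def auc(y_true, y_prob):
    """Rank-based AUC: probability a random positive outranks a random negative."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)

def closest_to_01_cutoff(y_true, y_prob):
    """Cutoff minimizing the distance to (0,1) in ROC space: (1-sens)^2 + (1-spec)^2."""
    best, best_d = 0.5, float("inf")
    for c in sorted(set(y_prob)):
        tp = sum(1 for p, y in zip(y_prob, y_true) if p >= c and y == 1)
        fn = sum(1 for p, y in zip(y_prob, y_true) if p < c and y == 1)
        tn = sum(1 for p, y in zip(y_prob, y_true) if p < c and y == 0)
        fp = sum(1 for p, y in zip(y_prob, y_true) if p >= c and y == 0)
        sens, spec = tp / (tp + fn), tn / (tn + fp)
        d = (1 - sens) ** 2 + (1 - spec) ** 2
        if d < best_d:
            best, best_d = c, d
    return best

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]   # invented predictions
```

On this toy example, the AUC is 0.75 (three of the four positive-negative pairs are correctly ordered), and the closest-to-(0,1) cutoff is 0.35.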
Subjects
Machine Learning, Area Under the Curve, Calibration, Humans, Predictive Value of Tests
ABSTRACT
We illustrate the steps required to train and validate a simple, machine learning-based clinical prediction model for any binary outcome, such as the occurrence of a complication, in the statistical programming language R. To illustrate the methods applied, we supply a simulated database of 10,000 glioblastoma patients who underwent microsurgery, and predict the occurrence of 12-month survival. We walk the reader through each step, including import, checking, and splitting of datasets. In terms of pre-processing, we focus on how to practically implement imputation using a k-nearest neighbor algorithm, and how to perform feature selection using recursive feature elimination. When it comes to training models, we apply the theory discussed in Parts I-III. We show how to implement bootstrapping and how to evaluate and select models based on out-of-sample error. Specifically for classification, we discuss how to counteract class imbalance by using upsampling techniques. We discuss why reporting at a minimum accuracy, area under the curve (AUC), sensitivity, and specificity for discrimination, as well as slope and intercept for calibration, if possible alongside a calibration plot, is paramount. Finally, we explain how to arrive at a measure of variable importance using a universal, AUC-based method. We provide the full, structured code, as well as the complete glioblastoma survival database, for the readers to download and execute in parallel to this section.
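The simplest upsampling technique mentioned above is random oversampling: duplicating minority-class rows (with replacement) until the classes balance. The chapter itself works in R; the sketch below is a purely illustrative Python stand-in with invented data.

```python
import random

def upsample_minority(X, y, seed=42):
    """Randomly duplicate minority-class rows until all classes match the majority count."""
    rng = random.Random(seed)
    classes = {c: [i for i, yy in enumerate(y) if yy == c] for c in set(y)}
    majority = max(len(idx) for idx in classes.values())
    Xb, yb = list(X), list(y)
    for c, idx in classes.items():
        for _ in range(majority - len(idx)):
            j = rng.choice(idx)          # sample a minority row with replacement
            Xb.append(X[j])
            yb.append(y[j])
    return Xb, yb

# Toy imbalanced dataset: four controls, one case.
X = [[0], [1], [2], [3], [4]]
y = [0, 0, 0, 0, 1]
Xb, yb = upsample_minority(X, y)         # classes now balanced 4:4
```

Crucially, upsampling must be applied only to the training folds, never to the test data, or the out-of-sample estimates become biased.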
Subjects
Machine Learning, Statistical Models, Algorithms, Humans, Logistic Models, Prognosis
ABSTRACT
This chapter goes through the steps required to train and validate a simple, machine learning-based clinical prediction model for any continuous outcome. We supply fully structured code for the readers to download and execute in parallel to this section, as well as a simulated database of 10,000 glioblastoma patients who underwent microsurgery, and predict survival from diagnosis in months. We walk the reader through each step, including import, checking, and splitting of data. In terms of pre-processing, we focus on how to practically implement imputation using a k-nearest neighbor algorithm. We also illustrate how to select features based on recursive feature elimination and how to use k-fold cross-validation. We demonstrate a generalized linear model, a generalized additive model, a random forest, a ridge regressor, and a Least Absolute Shrinkage and Selection Operator (LASSO) regressor. Specifically for regression, we discuss how to evaluate root mean square error (RMSE), mean absolute error (MAE), and the R2 statistic, as well as how a quantile-quantile plot can be used to assess the performance of the regressor along the spectrum of the outcome variable, similarly to calibration for binary outcomes. Finally, we explain how to arrive at a measure of variable importance using a universal, nonparametric method.
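The three regression metrics named above follow directly from their definitions. A minimal sketch with invented numbers (in Python rather than the R used in this series):

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large errors quadratically."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error: average error magnitude, in outcome units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Proportion of outcome variance explained by the predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3, 5, 7, 9]                # invented observed outcomes
y_pred = [2.8, 5.2, 7.1, 8.9]        # invented model predictions
```

Here MAE is 0.15, RMSE is about 0.158, and R2 is 0.995; RMSE always equals or exceeds MAE, and the gap widens when a few predictions miss badly.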
Subjects
Machine Learning, Statistical Models, Algorithms, Humans, Linear Models, Prognosis
ABSTRACT
Advancements in population neuroscience are spurred by the availability of large-scale, open datasets such as the Human Connectome Project or the recently introduced UK Biobank. With this increasing data availability, analyses of brain imaging data employ ever more sophisticated machine learning algorithms. However, all machine learning algorithms must balance generalization and complexity. As the detail of neuroimaging data leads to high-dimensional data spaces, model complexity, and hence the chance of overfitting, increases. Different methodological approaches can be applied to alleviate the problems that arise in high-dimensional settings by reducing the original information into meaningful and concise features. One popular approach is dimensionality reduction, which summarizes high-dimensional data into low-dimensional representations while retaining relevant trends and patterns. In this paper, principal component analysis (PCA) is discussed as a widely used dimensionality reduction method, based on current examples of population-based neuroimaging analyses.
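To make the idea concrete, here is a small, dependency-free sketch of PCA's core step: extracting the first principal component, the direction of maximal variance, via power iteration on the covariance matrix. The data points are invented, and a real analysis would of course use an established library.

```python
def first_principal_component(data, iters=200):
    """First PC of a list of d-dimensional points via power iteration."""
    n, d = len(data), len(data[0])
    # Center the data
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication converges to the dominant eigenvector
    v = [1.0] * d
    for _ in range(iters):
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v

# Invented 2-D points lying roughly along the diagonal y ≈ x
data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8)]
pc1 = first_principal_component(data)    # close to the 45° direction
```

Projecting each centered point onto this unit vector yields its first PC score, i.e. a one-dimensional summary that retains most of the variance.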
Subjects
Algorithms, Neuroimaging, Brain/diagnostic imaging, Humans, Machine Learning, Principal Component Analysis
ABSTRACT
Advancements in neuroimaging and the availability of large-scale datasets enable the use of more sophisticated machine learning algorithms. In this chapter, we non-exhaustively discuss relevant analytical steps for the analysis of neuroimaging data using machine learning (ML), while the field of radiomics is addressed separately (cf. Chap. 18, Radiomics). With ML approaches broadly classified into supervised and unsupervised, we discuss the encoding/decoding framework, which is often applied in cognitive neuroscience, and the use of ML for the analysis of unlabeled data using clustering.
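Clustering of unlabeled data, mentioned above, can be illustrated with a plain k-means sketch. This is not the chapter's code: the points are invented 2-D toys, and a deterministic farthest-point initialization is used here for reproducibility instead of the usual random seeding.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iters=50):
    # Deterministic farthest-point initialization (illustrative choice)
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        assign = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assign

# Two clearly separated toy clusters
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assign = k_means(points, 2)
```

The alternation of assignment and update steps monotonically decreases the within-cluster sum of squares until the partition stabilizes.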
Subjects
Machine Learning, Neuroimaging, Algorithms, Cluster Analysis
ABSTRACT
In recent decades, modern medicine has evolved into a data-centered discipline, generating massive amounts of granular, high-dimensional data exceeding human comprehension. With improved computational methods, machine learning and artificial intelligence (AI) are becoming increasingly important as tools for data processing and analysis. At the forefront of neuro-oncology and AI research, the field of radiomics has emerged. Non-invasive assessments of quantitative radiological biomarkers, mined from complex imaging characteristics across various applications, are used to predict survival, discriminate between primary and secondary tumors, and distinguish progression from pseudo-progression. In particular, the application of molecular phenotyping, envisioned in the field of radiogenomics, has gained popularity for both primary and secondary brain tumors. Although promising results have been obtained thus far, the lack of workflow standardization and of available multicenter data remains challenging. The objective of this review is to provide an overview of novel applications of machine learning- and deep learning-based radiomics in primary and secondary brain tumors and their implications for future research in the field.
Subjects
Artificial Intelligence, Brain Neoplasms, Brain, Brain Neoplasms/diagnostic imaging, Humans, Machine Learning, Multicenter Studies as Topic
ABSTRACT
For almost a century, classical statistical methods, including exponential smoothing and the autoregressive integrated moving average (ARIMA), have been predominant in the analysis of time series (TS) and in the pursuit of forecasting future events from historical data. TS are chronological sequences of observations, and TS data are therefore prevalent in many aspects of clinical medicine and academic neuroscience. With the rise of highly complex and nonlinear datasets, machine learning (ML) methods have become increasingly popular for prediction and pattern detection within the neurosciences, including neurosurgery. ML methods regularly outperform classical methods and have been successfully applied, inter alia, to predict physiological responses in intracranial pressure monitoring or to identify seizures in EEGs. Implementing nonparametric methods for TS analysis in clinical practice can benefit clinical decision-making and sharpen our diagnostic armory.
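Simple exponential smoothing, the most basic of the classical methods named above, fits in a few lines: the forecast for the next time step is an exponentially weighted average of all past observations. The series below is invented for illustration.

```python
def exp_smooth_forecast(series, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing.

    The level is updated as level_t = alpha * x_t + (1 - alpha) * level_{t-1};
    the forecast for t+1 is the final level. Larger alpha reacts faster to
    recent observations, smaller alpha smooths more aggressively.
    """
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

readings = [10, 10, 12, 11, 13, 12]      # invented hourly measurements
forecast = exp_smooth_forecast(readings, alpha=0.3)
```

A constant series is forecast exactly; for the toy series above, the forecast lies between the series minimum and maximum, pulled toward recent values.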
Subjects
Machine Learning, Statistical Models, Forecasting, Time Factors
ABSTRACT
The applications of artificial intelligence (AI) and machine learning (ML) in modern medicine are growing exponentially, and new developments are fast-paced. However, a lack of trust and of appropriate legislation hinders their clinical implementation. Recently, there has been a clear increase in directives and considerations on ethical AI. However, most literature deals broadly with ethical tensions on a meta-level without offering hands-on advice for practice. In this article, we non-exhaustively cover basic practical guidelines regarding AI-specific ethical aspects, including transparency and explicability, equity and the mitigation of biases, and, lastly, liability.
Subjects
Artificial Intelligence, Machine Learning
ABSTRACT
Selecting a set of features to include in a clinical prediction model is not always a simple task. The goal of creating parsimonious models with low complexity must be balanced against that of upholding predictive performance by explaining a large proportion of the variance in the dependent variable. With this aim, one must consider the clinical setting and what data are readily available to clinicians at specific timepoints, as well as more obvious aspects such as the availability of computational power and the size of the training dataset. This chapter elucidates the importance of and pitfalls in feature selection, focusing on applications in clinical prediction modeling. We demonstrate simple methods such as correlation-, significance-, and variable importance-based filtering, as well as intrinsic feature selection methods such as LASSO and tree- or rule-based methods. Finally, we focus on two algorithmic wrapper methods for feature selection that are commonly used in machine learning: Recursive Feature Elimination (RFE), which can be applied regardless of data and model type, and Purposeful Variable Selection as described by Hosmer and Lemeshow, specifically for generalized linear models.
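The correlation-based filtering mentioned above is the simplest of these methods: compute each feature's correlation with the outcome and keep only those above a threshold. A dependency-free sketch with invented feature names and data (the chapter's own examples are in R):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def filter_features(X_cols, y, threshold=0.3):
    """Keep features whose absolute correlation with the outcome exceeds the threshold."""
    return [name for name, col in X_cols.items() if abs(pearson(col, y)) > threshold]

# Invented data: one informative feature, one uncorrelated noise feature
X_cols = {"age": [1, 2, 3, 4], "noise": [1, -1, -1, 1]}
y = [2, 4, 6, 8]
kept = filter_features(X_cols, y, threshold=0.3)
```

Note that such univariate filters ignore interactions between features, which is precisely the gap the wrapper methods (such as RFE) discussed in this chapter address.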
Subjects
Algorithms, Support Vector Machine, Machine Learning, Statistical Models, Prognosis
ABSTRACT
Machine learning (ML) and artificial intelligence (AI) applications in the field of neuroimaging have been on the rise in recent years, and their clinical adoption is increasing worldwide. Deep learning (DL) is a field of ML that can be defined as a set of algorithms enabling a computer to be fed with raw data and progressively discover-through multiple layers of representation-more complex and abstract patterns in large data sets. The combination of ML and radiomics, namely the extraction of features from medical images, has proven valuable, too: Radiomic information can be used for enhanced image characterization and prognosis or outcome prediction. This chapter summarizes the basic concepts underlying ML application for neuroimaging and discusses technical aspects of the most promising algorithms, with a specific focus on Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs), in order to provide the readership with the fundamental theoretical tools to better understand ML in neuroimaging. Applications are highlighted from a practical standpoint in the last section of the chapter, including: image reconstruction and restoration, image synthesis and super-resolution, registration, segmentation, classification, and outcome prediction.
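The core operation of the CNNs discussed above, sliding a small kernel over an image, can be shown in a few lines. This illustrative sketch implements a "valid" 2-D convolution (strictly speaking cross-correlation, as in most deep learning frameworks) and applies an invented edge-detecting kernel to a toy image.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a kernel over a 2-D image (lists of lists)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# Toy image with a vertical intensity step, and a kernel responding to it
image = [[0, 0, 1],
         [0, 0, 1],
         [0, 0, 1]]
kernel = [[-1, 1]]                 # fires on left-to-right brightness increases
edges = conv2d(image, kernel)      # nonzero exactly where the step occurs
```

In a real CNN, many such kernels are learned from data rather than hand-crafted, and their stacked outputs form the progressively more abstract representations described above.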
Subjects
Artificial Intelligence, Machine Learning, Algorithms, Computer-Assisted Image Processing, Neural Networks (Computer)
ABSTRACT
PURPOSE: PET using the radiolabeled amino acid [18F]-fluoro-ethyl-L-tyrosine (FET-PET) is a well-established imaging modality for glioma diagnostics. The biological tumor volume (BTV) as depicted by FET-PET often differs in volume and location from the tumor volume of contrast enhancement (CE) on MRI. Our aim was to investigate whether a gross total resection of BTVs, defined as < 1 cm3 of residual BTV (PET GTR), correlates with better oncological outcome. METHODS: We retrospectively analyzed imaging and survival data from patients with primary and recurrent WHO grade III or IV gliomas who underwent FET-PET before surgical resection. Tumor overlap between FET-PET and CE was evaluated. Completeness of FET-PET resection (PET GTR) was calculated after superimposition and semi-automated segmentation of preoperative FET-PET and postoperative MRI. Survival analysis was performed using the Kaplan-Meier method and the log-rank test. RESULTS: Of 30 included patients, PET GTR was achieved in 20. Patients with PET GTR showed an improved median OS of 19.3 months compared to 13.7 months for patients with residual FET uptake (p = 0.007; HR 0.3; 95% CI 0.12-0.76). This finding remained an independent prognostic factor in multivariate analysis (HR 0.19, 95% CI 0.06-0.62, p = 0.006). Other survival-influencing factors such as age, IDH mutation, MGMT promoter status, and adjuvant treatment modalities were equally distributed between both groups. CONCLUSION: Our results suggest that PET GTR improves OS in patients with WHO grade III or IV gliomas. A multimodal imaging approach including FET-PET for surgical planning in newly diagnosed and recurrent tumors may improve the oncological outcome of glioma patients.
Subjects
Brain Neoplasms, Glioma, Brain Neoplasms/diagnostic imaging, Brain Neoplasms/genetics, Brain Neoplasms/surgery, Glioblastoma, Glioma/diagnostic imaging, Glioma/genetics, Glioma/surgery, Humans, Magnetic Resonance Imaging, Multimodal Imaging, Positron Emission Tomography/methods, Retrospective Studies, Tyrosine, World Health Organization
ABSTRACT
The human default mode network (DMN) is implicated in several unique mental capacities. In this study, we tested whether brain-wide interregional communication in the DMN can be derived from population variability in intrinsic activity fluctuations, gray-matter morphology, and fiber tract anatomy. In a sample of 10,000 UK Biobank participants, pattern-learning algorithms revealed functional coupling states in the DMN that are linked to connectivity profiles between other macroscopic brain networks. In addition, DMN gray matter volume covaried with the white matter microstructure of the fornix. Collectively, functional and structural patterns unmasked a possible division of labor within major DMN nodes: subregions most critical for cortical network interplay were adjacent to subregions most predictive of fornix fibers from the hippocampus, which processes memories and places.
Subjects
Brain/diagnostic imaging, Adult, Aged, Algorithms, Biological Specimen Banks, Brain/physiology, Brain Mapping, Female, Gray Matter/diagnostic imaging, Gray Matter/physiology, Humans, Learning, Magnetic Resonance Imaging, Male, Middle Aged, United Kingdom, White Matter/diagnostic imaging, White Matter/physiology
ABSTRACT
OBJECTIVE: Patient-reported outcome measures following elective lumbar fusion surgery demonstrate major heterogeneity. Individualized prediction tools can provide valuable insights for shared decision-making. We externally validated the Spine Surgical Care and Outcomes Assessment Programme/Comparative Effectiveness Translational Network (SCOAP-CERTAIN) model for prediction of the 12-month minimum clinically important difference in the Oswestry Disability Index (ODI) and in numeric rating scales for back (NRS-BP) and leg pain (NRS-LP) after elective lumbar fusion. METHODS: Data from a prospective registry were obtained. We calculated the area under the curve (AUC), calibration slope and intercept, and Hosmer-Lemeshow values to estimate the discrimination and calibration of the models. RESULTS: We included 100 patients with an average age of 50.4 ± 11.4 years. For 12-month ODI, the AUC was 0.71, while the calibration intercept and slope were 1.08 and 0.95, respectively. For NRS-BP, the AUC was 0.72, with a calibration intercept of 1.02 and slope of 0.74. For NRS-LP, the AUC was 0.83, with a calibration intercept of 1.08 and slope of 0.95. Sensitivity ranged from 0.64 to 1.00, while specificity ranged from 0.38 to 0.65. A lack of fit was found for all three models based on Hosmer-Lemeshow testing. CONCLUSIONS: The SCOAP-CERTAIN tool can accurately predict which patients will achieve favourable outcomes. However, the predicted probabilities reported by the tool, which are the most valuable in clinical practice, do not correspond well to the true probability of a favourable outcome. We suggest that any prediction tool should first be externally validated before it is applied in routine clinical practice. These slides can be retrieved under Electronic Supplementary Material.
Subjects
Spinal Fusion, Adult, Aged, Female, Humans, Lumbar Vertebrae/surgery, Lumbosacral Region, Male, Middle Aged, Pain, Treatment Outcome
ABSTRACT
BACKGROUND: Recent technological advances have led to the development and implementation of machine learning (ML) in various disciplines, including neurosurgery. Our goal was to conduct a comprehensive survey of neurosurgeons to assess the acceptance of and attitudes toward ML in neurosurgical practice, and to identify factors associated with its use. METHODS: The online survey consisted of nine or ten mandatory questions and was distributed in February and March 2019 through the European Association of Neurosurgical Societies (EANS) and the Congress of Neurological Surgeons (CNS). RESULTS: Of 7280 neurosurgeons who received the survey, 362 responded (response rate 5%), mainly from Europe and North America. In total, 103 neurosurgeons (28.5%) reported using ML in their clinical practice, and 31.1% in research. Adoption rates of ML were relatively evenly distributed: 25.6% for North America, 30.9% for Europe, 33.3% for Latin America and the Middle East, 44.4% for Asia and the Pacific, and 100% for Africa, albeit with only two responses. No predictors of clinical ML use were identified, although an academic setting and the subspecialties of neuro-oncology, functional neurosurgery, trauma, and epilepsy predicted the use of ML in research. The most common applications were the prediction of outcomes and complications, as well as the interpretation of imaging. CONCLUSIONS: This report provides a global overview of the neurosurgical applications of ML. A relevant proportion of the surveyed neurosurgeons reported clinical experience with ML algorithms. Future studies should aim to clarify the role and potential benefits of ML in neurosurgery and to reconcile these potential advantages with bioethical considerations.
Subjects
Attitude of Health Personnel, Machine Learning, Neurosurgeons/statistics & numerical data, Neurosurgical Procedures, Europe, Health Care Surveys, Humans, Surveys and Questionnaires
ABSTRACT
Ischemic cerebrovascular events often lead to aphasia. Previous work has provided hints that such strokes may affect women and men in distinct ways. Women tend to suffer strokes with more disabling language impairment, even if the lesion size is comparable to that in men. In 1401 patients, we isolate data-led representations of anatomical lesion patterns and hand-tailor a Bayesian analytical solution to carefully model the degree of sex divergence in predicting language outcomes ~3 months after stroke. We locate lesion-outcome effects in the left-dominant language network that highlight the ventral pathway as a core lesion focus across different tests of language performance. We provide detailed evidence for sex-specific brain-behavior associations in the domain-general networks associated with cortico-subcortical pathways, with unique contributions of the fornix in women and cingular fiber bundles in men. Our collective findings suggest diverging white matter substrates in how stroke causes language deficits in women and men. Clinically acknowledging such sex disparities has the potential to improve personalized treatment for stroke patients worldwide.
Subjects
Aphasia, Stroke, White Matter, Male, Humans, Female, White Matter/diagnostic imaging, White Matter/pathology, Bayes Theorem, Aphasia/complications, Aphasia/pathology, Bias
ABSTRACT
The current World Health Organization classification integrates histological and molecular features of brain tumours. The aim of this study was to identify generalizable topological patterns with the potential to add an anatomical dimension to the classification of brain tumours. We applied non-negative matrix factorization as an unsupervised pattern discovery strategy to the fine-grained topographic tumour profiles of 936 patients with neuroepithelial tumours and brain metastases. From the anatomical features alone, this machine learning algorithm enabled the extraction of latent topological tumour patterns, termed meta-topologies. The optimal part-based representation was automatically determined in 10,000 split-half iterations. We further characterized each meta-topology's unique histopathologic profile and survival probability, thus linking important biological and clinical information to the underlying anatomical patterns. In neuroepithelial tumours, six meta-topologies were extracted, each detailing a transpallial pattern with distinct parenchymal and ventricular compositions. We identified one infratentorial, one allopallial, three neopallial (parieto-occipital, frontal, temporal) and one unisegmental meta-topology. Each meta-topology mapped to distinct histopathologic and molecular profiles. The unisegmental meta-topology showed the strongest anatomical-clinical link, demonstrating a survival advantage in histologically identical tumours. Brain metastases separated into an infra- and a supratentorial meta-topology, with anatomical patterns highlighting their affinity to the cortico-subcortical boundary of arterial watershed areas. Using a novel data-driven approach, we identified generalizable topological patterns in both neuroepithelial tumours and brain metastases.
Differences in the histopathologic profiles and prognosis of these anatomical tumour classes provide insights into the heterogeneity of tumour biology and might add to personalized clinical decision-making.
ABSTRACT
Socioeconomic status (SES) anchors individuals in their social network layers. Our embedding in the societal fabric resonates with habitus, world view, opportunity, and health disparity. It remains obscure how distinct facets of SES are reflected in the architecture of the central nervous system. Here, we capitalized on multivariate multi-output learning algorithms to explore possible imprints of SES in gray and white matter structure in the wider population (n ≈ 10,000 UK Biobank participants). Individuals with higher SES, compared with those with lower SES, showed a pattern of increased region volumes in the left brain and decreased region volumes in the right brain. An analogous lateralization pattern emerged for the fiber structure of anatomical white matter tracts. Our multimodal findings suggest hemispheric asymmetry as an SES-related brain signature, which was consistent across six different indicators of SES: degree, education, income, job, neighborhood, and vehicle count. Hence, hemispheric specialization may have evolved in human primates in a way that reveals crucial links to SES.