ABSTRACT
Machine learning (ML) is increasingly used in cognitive, computational and clinical neuroscience. The reliable and efficient application of ML requires a sound understanding of its subtleties and limitations. Training ML models on datasets with imbalanced classes is a particularly common problem, and it can have severe consequences if not adequately addressed. With the neuroscience ML user in mind, this paper provides a didactic assessment of the class imbalance problem and illustrates its impact through systematic manipulation of data imbalance ratios in (i) simulated data and (ii) brain data recorded with electroencephalography (EEG), magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI). Our results illustrate how the widely used Accuracy (Acc) metric, which measures the overall proportion of successful predictions, yields misleadingly high performance as class imbalance increases. Because Acc weights the per-class ratios of correct predictions proportionally to class size, it largely disregards the performance on the minority class. A binary classification model that learns to systematically vote for the majority class will yield an artificially high decoding accuracy that directly reflects the imbalance between the two classes, rather than any genuine generalizable ability to discriminate between them. We show that other evaluation metrics, such as the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the less common Balanced Accuracy (BAcc) metric, defined as the arithmetic mean of sensitivity and specificity, provide more reliable performance evaluations for imbalanced data. Our findings also highlight the robustness of Random Forest (RF), and the benefits of using stratified cross-validation and hyperparameter optimization to tackle data imbalance.
Critically, for neuroscience ML applications that seek to minimize overall classification error, we recommend the routine use of BAcc, which in the specific case of balanced data is equivalent to using standard Acc, and readily extends to multi-class settings. Importantly, we present a list of recommendations for dealing with imbalanced data, as well as open-source code to allow the neuroscience community to replicate and extend our observations and explore alternative approaches to coping with imbalanced data.
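The contrast between Acc and BAcc described in this abstract can be illustrated with a minimal sketch. The function names and the 90/10 toy labels below are illustrative assumptions, not the authors' released code; the example only shows that a classifier always voting for the majority class reaches a high Acc while its BAcc stays at chance, and that the two metrics agree on balanced data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Overall proportion of correct predictions (Acc)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of sensitivity and specificity (BAcc)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced dataset: 90 majority (0) and 10 minority (1) samples.
y_true = [0] * 90 + [1] * 10
# A degenerate classifier that always votes for the majority class.
y_majority = [0] * 100

acc_imbalanced = accuracy(y_true, y_majority)            # reflects the 90/10 ratio
bacc_imbalanced = balanced_accuracy(y_true, y_majority)  # sensitivity 0, specificity 1

# Hypothetical balanced dataset with 10 errors on class 0 and 5 on class 1.
y_true_bal = [0] * 50 + [1] * 50
y_pred_bal = [0] * 40 + [1] * 10 + [1] * 45 + [0] * 5

acc_balanced = accuracy(y_true_bal, y_pred_bal)
bacc_balanced = balanced_accuracy(y_true_bal, y_pred_bal)

print(acc_imbalanced, bacc_imbalanced, acc_balanced, bacc_balanced)
```

In practice an established implementation would normally be preferred; the pure-Python version is used here only to keep the definitions visible.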
Subjects
Benchmarking, Brain, Humans, Magnetoencephalography, Machine Learning, Electroencephalography, Algorithms
ABSTRACT
How do we choose a particular action among equally valid alternatives? Nonhuman primate findings have shown that decision-making implicates modulations in unit firing rates and local field potentials (LFPs) across frontal and parietal cortices. Yet the electrophysiological brain mechanisms that underlie free choice in humans remain ill defined. Here, we address this question using rare intracerebral electroencephalography (EEG) recordings in surgical epilepsy patients performing a delayed oculomotor decision task. We find that the temporal dynamics of high-gamma (HG, 60-140 Hz) neural activity in distinct frontal and parietal brain areas robustly discriminate free choice from instructed saccade planning at the level of single trials. Classification analysis was applied to the LFP signals to isolate decision-related activity from sensory and motor planning processes. Compared with instructed saccades, free-choice trials exhibited delayed and longer-lasting HG activity during the delay period. The temporal dynamics of the decision-specific sustained HG activity indexed the unfolding of a deliberation process, rather than memory maintenance. Taken together, these findings provide the first direct electrophysiological evidence in humans for the role of sustained high-frequency neural activation in frontoparietal cortex in mediating the intrinsically driven process of freely choosing among competing behavioral alternatives.
Subjects
Choice Behavior/physiology, Decision Making/physiology, Electroencephalography/methods, Adult, Brain/physiology, Brain Mapping/methods, Cerebral Cortex/physiology, Female, Frontal Lobe/physiology, Gamma Rhythm/physiology, Humans, Male, Neurons/physiology, Parietal Lobe/physiology, Personal Autonomy, Photic Stimulation, Psychomotor Performance/physiology, Saccades/physiology
ABSTRACT
Recent years have witnessed a massive push towards reproducible research in neuroscience. Unfortunately, this endeavor is often challenged by the large diversity of tools used, project-specific custom code and the difficulty of tracking all user-defined parameters. NeuroPycon is an open-source multi-modal brain data analysis toolkit which provides Python-based template pipelines for advanced multi-processing of MEG, EEG, functional and anatomical MRI data, with a focus on connectivity and graph theoretical analyses. Importantly, it provides shareable parameter files to facilitate replication of all analysis steps. NeuroPycon is based on the NiPype framework, which facilitates data analyses by wrapping many commonly used neuroimaging software tools into a common Python environment. In other words, rather than being a brain imaging software with its own implementation of standard algorithms for brain signal processing, NeuroPycon seamlessly integrates existing packages (coded in Python, Matlab or other languages) into a unified Python framework. Importantly, thanks to the multi-threaded processing and computational efficiency afforded by NiPype, NeuroPycon provides an easy option for fast parallel processing, which is critical when handling large sets of multi-dimensional brain data. Moreover, its flexible design allows users to easily configure analysis pipelines by connecting distinct nodes to each other. Each node can be a Python-wrapped module, a user-defined function or a well-established tool (e.g. MNE-Python for MEG analysis, Radatools for graph theoretical metrics, etc.). Last but not least, the ability to use NeuroPycon parameter files to fully describe any pipeline is an important feature for reproducibility, as they can be shared and used for easy replication by others.
The current implementation of NeuroPycon contains two complementary packages. The first, called ephypype, includes pipelines for electrophysiology analysis and a command-line interface for on-the-fly pipeline creation. Current implementations allow for MEG/EEG data import, pre-processing and cleaning by automatic removal of ocular and cardiac artefacts, in addition to sensor- or source-level connectivity analyses. The second package, called graphpype, is designed to investigate functional connectivity via a wide range of graph-theoretical metrics, including modular partitions. The present article describes the philosophy, architecture, and functionalities of the toolkit and provides illustrative examples through interactive notebooks. NeuroPycon is available for download via GitHub (https://github.com/neuropycon) and the two principal packages are documented online (https://neuropycon.github.io/ephypype/index.html, and https://neuropycon.github.io/graphpype/index.html). Future developments include fusion of multi-modal data (e.g. MEG and fMRI or intracranial EEG and fMRI). We hope that the release of NeuroPycon will attract many users and new contributors, and facilitate the efforts of our community towards open source tool sharing and development, as well as scientific reproducibility.
Subjects
Brain/diagnostic imaging, Nerve Net/diagnostic imaging, Neuroimaging/methods, Software, Algorithms, Electroencephalography, Humans, Magnetic Resonance Imaging, Magnetoencephalography, Reproducibility of Results
ABSTRACT
Rhythmic neuronal synchronization across large-scale networks is thought to play a key role in the regulation of conscious states. Changes in neuronal oscillation amplitude across states of consciousness have been widely reported, but little is known about possible changes in the temporal dynamics of these oscillations. The temporal structure of brain oscillations may provide novel insights into the neural mechanisms underlying consciousness. To address this question, we examined long-range temporal correlations (LRTC) of EEG oscillation amplitudes recorded during both wakefulness and anesthetic-induced unconsciousness. Importantly, the time-varying EEG oscillation envelopes were assessed over the course of a sevoflurane sedation protocol during which the participants alternated between states of consciousness and unconsciousness. Both spectral power and LRTC in oscillation amplitude were computed across multiple frequency bands. State-dependent differences in these features were assessed using non-parametric tests and supervised machine learning. We found that periods of unconsciousness were associated with increases in LRTC in beta (15-30 Hz) amplitude over frontocentral channels and with a suppression of alpha (8-13 Hz) amplitude over occipitoparietal electrodes. Moreover, classifiers trained to predict states of consciousness on single epochs demonstrated that the combination of beta LRTC with alpha amplitude provided the highest classification accuracy (above 80%). These results suggest that loss of consciousness is accompanied by an augmentation of temporal persistence in neuronal oscillation amplitude, which may reflect an increase in regularity and a decrease in network repertoire compared to the brain's activity during resting-state consciousness.
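The abstract does not state how LRTC were estimated. A common estimator is detrended fluctuation analysis (DFA), and the sketch below assumes that choice purely for illustration. The function names, window sizes and the synthetic white-noise input are all hypothetical; in a study like this one, the input would instead be the amplitude envelope of a band-filtered EEG signal. A scaling exponent near 0.5 indicates no temporal correlations, while values between 0.5 and 1 indicate persistent long-range correlations.

```python
import math
import random


def detrended_rms(segment):
    """RMS of the residuals after removing a least-squares line from a segment."""
    n = len(segment)
    x_mean = (n - 1) / 2.0
    y_mean = sum(segment) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(segment))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residual_sq = sum((y - (slope * i + intercept)) ** 2
                      for i, y in enumerate(segment))
    return math.sqrt(residual_sq / n)


def dfa_exponent(signal, window_sizes):
    """Estimate the DFA scaling exponent of a 1-D signal (illustrative sketch)."""
    # Step 1: integrate the mean-centered signal to obtain the profile.
    mean = sum(signal) / len(signal)
    profile, running = [], 0.0
    for value in signal:
        running += value - mean
        profile.append(running)

    # Step 2: for each window size, average the detrended fluctuation
    # over non-overlapping windows.
    log_w, log_f = [], []
    for w in window_sizes:
        n_windows = len(profile) // w
        rms = [detrended_rms(profile[k * w:(k + 1) * w])
               for k in range(n_windows)]
        fluctuation = math.sqrt(sum(r * r for r in rms) / n_windows)
        log_w.append(math.log(w))
        log_f.append(math.log(fluctuation))

    # Step 3: the exponent is the slope of log F(w) against log w.
    m = len(log_w)
    w_mean = sum(log_w) / m
    f_mean = sum(log_f) / m
    num = sum((x - w_mean) * (y - f_mean) for x, y in zip(log_w, log_f))
    den = sum((x - w_mean) ** 2 for x in log_w)
    return num / den


# Synthetic white noise stands in for an EEG amplitude envelope (assumption).
random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
alpha = dfa_exponent(white_noise, [16, 32, 64, 128, 256])
print(round(alpha, 2))
```

For uncorrelated noise the estimate should fall close to 0.5; an increase in LRTC, as reported for beta amplitude during unconsciousness, would correspond to a larger exponent under this estimator.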