An extended clinical EEG dataset with 15,300 automatically labelled recordings for pathology decoding.

Kiessner, Ann-Kathrin; Schirrmeister, Robin T; Gemein, Lukas A W; Boedecker, Joschka; Ball, Tonio

Affiliation

Kiessner AK; Neuromedical AI Lab, Department of Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Engelbergerstr. 21, 79106 Freiburg, Germany; BrainLinks-BrainTools, IMBIT (Institute for Machine-Brain Interfacing Technology), University of Freiburg, Georges-Köhle
Schirrmeister RT; Neuromedical AI Lab, Department of Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Engelbergerstr. 21, 79106 Freiburg, Germany; BrainLinks-BrainTools, IMBIT (Institute for Machine-Brain Interfacing Technology), University of Freiburg, Georges-Köhle
Gemein LAW; Neuromedical AI Lab, Department of Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Engelbergerstr. 21, 79106 Freiburg, Germany; Neurorobotics Lab, Computer Science Department - University of Freiburg, Faculty of Engineering, University of Freiburg,
Boedecker J; BrainLinks-BrainTools, IMBIT (Institute for Machine-Brain Interfacing Technology), University of Freiburg, Georges-Köhler-Allee 201, 79110 Freiburg, Germany; Neurorobotics Lab, Computer Science Department - University of Freiburg, Faculty of Engineering, University of Freiburg, Georges-Köhler-Allee
Ball T; Neuromedical AI Lab, Department of Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Engelbergerstr. 21, 79106 Freiburg, Germany; BrainLinks-BrainTools, IMBIT (Institute for Machine-Brain Interfacing Technology), University of Freiburg, Georges-Köhle

Neuroimage Clin ; 39: 103482, 2023.

Article in En | MEDLINE | ID: mdl-37544168

ABSTRACT

Automated clinical EEG analysis using machine learning (ML) methods is a growing EEG research area. Previous studies on binary EEG pathology decoding have mainly used the Temple University Hospital (TUH) Abnormal EEG Corpus (TUAB), which contains approximately 3,000 manually labelled EEG recordings. To evaluate and eventually improve the generalisation performance of machine learning methods for EEG pathology decoding, larger, publicly available datasets are required. A number of studies have addressed the automatic labelling of large open-source datasets as an approach to creating new datasets for EEG pathology decoding, but little is known about the extent to which training on a larger, automatically labelled dataset affects the decoding performance of established deep neural networks. In this study, we automatically created additional pathology labels for the TUH EEG Corpus (TUEG) based on the medical reports, using a rule-based text classifier. We generated a dataset of 15,300 newly labelled recordings, which we call the TUH Abnormal Expansion EEG Corpus (TUABEX), and which is five times larger than the TUAB. Since the TUABEX contains more pathological (75%) than non-pathological (25%) recordings, we then selected a balanced subset of 8,879 recordings, the TUH Abnormal Expansion Balanced EEG Corpus (TUABEXB). To investigate how training on a larger, automatically labelled dataset affects the decoding performance of deep neural networks, we applied four established deep convolutional neural networks (ConvNets) to the task of pathological versus non-pathological classification and compared the performance of each architecture after training on different datasets. The results show that training on the automatically labelled TUABEXB dataset rather than on the manually labelled TUAB dataset increases accuracy on TUABEXB and, for some architectures, even on TUAB itself.
We argue that automatic labelling of large open-source datasets can be used to efficiently utilise the massive amount of EEG data stored in clinical archives. We make the proposed TUABEXB available open source and thus offer a new dataset for EEG machine learning research.
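
The abstract mentions a rule-based text classifier applied to medical reports but does not state its rules. The following is only a minimal sketch of what such a classifier could look like, not the authors' method: the regular expressions, the function name `label_report`, and the example report texts are all hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword rules. The actual rules used to build TUABEX
# are not given in the abstract; these patterns are assumptions.
ABNORMAL_PATTERN = re.compile(r"\babnormal\s+(eeg|record|study)\b", re.IGNORECASE)
NORMAL_PATTERN = re.compile(r"\bnormal\s+(eeg|record|study)\b", re.IGNORECASE)


def label_report(report_text):
    """Label one report as 'pathological' or 'non-pathological'.

    Returns None when no rule fires or when the rules contradict each
    other, so that ambiguous recordings can be left unlabelled.
    """
    is_abnormal = bool(ABNORMAL_PATTERN.search(report_text))
    is_normal = bool(NORMAL_PATTERN.search(report_text))
    if is_abnormal and not is_normal:
        return "pathological"
    if is_normal and not is_abnormal:
        return "non-pathological"
    return None


# Made-up report snippets, for illustration only.
reports = [
    "IMPRESSION: Abnormal EEG due to diffuse slowing.",
    "IMPRESSION: Normal EEG in wakefulness and sleep.",
    "EEG recorded with standard 10-20 placement.",
]

labels = [label_report(text) for text in reports]

# Count only the recordings that received a label.
class_counts = Counter(label for label in labels if label is not None)
print(labels)
print(class_counts)
```

Returning `None` for unmatched or contradictory reports is a design choice of this sketch: it trades coverage for label reliability, which matters when the resulting labels are used as training targets.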

Subject(s)

Machine Learning; Neural Networks, Computer; Humans; Electroencephalography/methods; Algorithms

Key words

Automatic labeling; Convolutional neural networks; Deep learning; Diagnostics; Electroencephalography; Pathology

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Neural Networks, Computer / Machine Learning | Limits: Humans | Language: En | Journal: Neuroimage Clin | Year: 2023 | Document type: Article | Country of publication: Netherlands
