In this project, we utilize the dataset from a previous study [1], which employed a Rapid Serial Visual Presentation (RSVP) paradigm to present 16,740 images to subjects while recording their EEG signals with a 64-channel EASYCAP system at 1,000 Hz. The dataset comprises approximately 80,000 samples per subject from 10 subjects in total. The task is to classify the subjects' EEG signals into one of 27 high-level classes based on the images they were viewing. For this project, we focus on data from the first subject, and the classes are consolidated into 14 to simplify the task. The source of the dataset is linked in the Data preparation section.
Quality evaluation
The table below shows how many independent components were assigned to each category under different preprocessing combinations (v = step applied, x = not applied):

EEG (63 Channels) | Channel selection | Bandpass filter (1-50 Hz) | ASR | Brain | Muscle | Eye | Heart | Line Noise | Channel Noise | Other |
---|---|---|---|---|---|---|---|---|---|---|
v | x | x | x | 32 | 3 | 1 | 0 | 0 | 6 | 21 |
v | v | x | x | 15 | 0 | 0 | 1 | 0 | 0 | 1 |
v | v | v | x | 16 | 0 | 0 | 0 | 0 | 0 | 1 |
v | v | v | v | 16 | 0 | 0 | 0 | 0 | 0 | 1 |




EEG data presents challenges due to its low signal-to-noise ratio (SNR). Effective preprocessing is essential for models to accurately interpret EEG data, making artifact removal methods critical for EEG-based BCI.
This project aims to evaluate the effectiveness of different artifact removal methods. Utilizing an RSVP EEG visual dataset, we tested various methods such as ASR, ICA, and Autoreject. The results are evaluated based on the performance of EEGNet, with the evaluation metric being the macro F1 score.

As illustrated in the image, preprocessing starts by selecting 17 channels over the occipital and parietal cortex, similar to the method used in study [1]. A bandpass filter is then applied to retain signals within the 1-50 Hz range. After filtering, one of the artifact removal methods is applied; note that IC-U-Net is not yet implemented and that ICA refers to extended Infomax ICA. The signal is then epoched from -0.2 s to 0.8 s relative to stimulus onset, followed by baseline correction by subtracting the mean of the -0.2 s to 0 s window. Finally, the mean ERP is calculated for epochs sharing the same label, which helps to reduce noise. The data is then split into training, validation, and test sets in a 70:15:15 ratio.
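The filtering, epoching, and baseline-correction steps above can be sketched with NumPy/SciPy. This is an illustrative toy example, not the project's actual pipeline (which uses EEGLab and the scripts in this repo); the function names and the toy data are ours:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 1000  # sampling rate in Hz, per the dataset description

def bandpass(data, low=1.0, high=50.0, fs=FS, order=4):
    """Zero-phase Butterworth band-pass filter over (channels, samples)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, data, axis=-1)

def epoch(data, onsets, fs=FS, tmin=-0.2, tmax=0.8):
    """Cut epochs from tmin to tmax (seconds) around each onset (in samples)."""
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = np.stack([data[:, o - pre:o + post] for o in onsets])
    # Baseline correction: subtract the mean of the -0.2..0 s window.
    baseline = epochs[:, :, :pre].mean(axis=-1, keepdims=True)
    return epochs - baseline

# Toy demo: 17 channels, 10 s of noise, 5 fake stimulus onsets.
rng = np.random.default_rng(0)
raw = rng.standard_normal((17, 10 * FS))
onsets = [1000, 2500, 4000, 5500, 7000]
epochs = epoch(bandpass(raw), onsets)
print(epochs.shape)  # (5, 17, 1000): 5 epochs, 17 channels, 1 s each
```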
We utilize EEGNet [2], a compact CNN specifically designed for efficient EEG classification. Its primary advantage lies in its versatility, handling various EEG-based tasks such as motor imagery, ERP classification, and abnormal EEG detection. The lightweight architecture of EEGNet makes it ideal for real-time applications and deployment on devices with limited computational resources. Furthermore, its robustness against noise makes it an excellent candidate for our investigation into the necessity of data preprocessing for deep learning methods.
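For orientation, EEGNet's three-stage design (temporal convolution, depthwise spatial convolution, separable convolution) can be sketched in PyTorch as below. This is a simplified sketch following the EEGNet-8,2 defaults from [2], not the exact implementation used in this project, and the input shape assumes our 17 channels and 1 s epochs:

```python
import torch
import torch.nn as nn

class EEGNet(nn.Module):
    """Sketch of EEGNet (Lawhern et al., 2018), EEGNet-8,2 defaults."""
    def __init__(self, n_classes=14, chans=17, samples=1000,
                 F1=8, D=2, F2=16, kern=64, drop=0.5):
        super().__init__()
        self.block1 = nn.Sequential(
            # Temporal convolution: learns frequency filters.
            nn.Conv2d(1, F1, (1, kern), padding=(0, kern // 2), bias=False),
            nn.BatchNorm2d(F1),
            # Depthwise spatial convolution: per-filter spatial patterns.
            nn.Conv2d(F1, F1 * D, (chans, 1), groups=F1, bias=False),
            nn.BatchNorm2d(F1 * D),
            nn.ELU(),
            nn.AvgPool2d((1, 4)),
            nn.Dropout(drop),
        )
        self.block2 = nn.Sequential(
            # Separable convolution = depthwise conv + pointwise conv.
            nn.Conv2d(F1 * D, F1 * D, (1, 16), padding=(0, 8),
                      groups=F1 * D, bias=False),
            nn.Conv2d(F1 * D, F2, 1, bias=False),
            nn.BatchNorm2d(F2),
            nn.ELU(),
            nn.AvgPool2d((1, 8)),
            nn.Dropout(drop),
        )
        self.classify = nn.Linear(F2 * (samples // 32), n_classes)

    def forward(self, x):  # x: (batch, 1, chans, samples)
        x = self.block2(self.block1(x))
        return self.classify(x.flatten(1))

x = torch.randn(4, 1, 17, 1000)   # (batch, 1, channels, samples)
print(EEGNet()(x).shape)          # torch.Size([4, 14])
```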
To assess the effectiveness of the methods introduced in the lecture, we designed two experiments. In the first experiment, we evaluated steps 1, 2, and 4 using an ablation approach. The second experiment tested the best combination from the first experiment alongside one of four artifact removal methods. The results of these experiments are detailed in the Results section, and the experiments are evaluated on an independent test set after model tuning to ensure reliability.
Environment
Run `pip install -r requirements.txt`
Data preparation
1. Download the Raw EEG data of sub1 from here and unzip it.
2. Download the `image_metadata.npy` from here.
3. Download the `category_mat_manual.mat` from here.
4. Put them under `data/LargeEEG/raw` and execute `python preprocess.py`. This produces `0000.set` and `1000.set`. The data files are named `XXXX.set`:
   - First digit: 0 means using all channels without channel selection; 1 means otherwise.
   - Second digit: 0 means not using bandpass filtering; 1 means otherwise.
   - Third digit: the artifact removal method:
     - 0: no artifact removal method used
     - 1: ICA
     - 2: ASR
     - 3: autoreject
     - 5: ASR+ICA
   - Fourth digit: 0 means not taking the ERP mean; 1 means otherwise.
5. Use EEGLab for preprocessing to obtain `0100.set`, `1100.set`, `1110.set`, `1120.set`, and `1150.set`.
6. Execute `python preprocess2.py` and input one of the following 4-digit IDs to generate the respective training files:
   - 0000
   - 0020
   - 0100
   - 1000
   - 1100
   - 1110
   - 1120
   - 1130
   - 1150
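The naming scheme above can be summarized with a small helper. This decoder is a hypothetical illustration (it is not part of the repo's scripts):

```python
# Map the third digit of a dataset ID to its artifact removal method,
# following the naming scheme described above.
ARTIFACT = {"0": "none", "1": "ICA", "2": "ASR", "3": "autoreject", "5": "ASR+ICA"}

def decode_id(dataset_id: str) -> dict:
    """Decode a 4-digit dataset ID such as '1121' into its settings."""
    c1, c2, c3, c4 = dataset_id
    return {
        "channel_selection": c1 == "1",
        "bandpass_filter": c2 == "1",
        "artifact_removal": ARTIFACT[c3],
        "erp_mean": c4 == "1",
    }

print(decode_id("1121"))
# {'channel_selection': True, 'bandpass_filter': True,
#  'artifact_removal': 'ASR', 'erp_mean': True}
```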
Run
To reproduce experiment 1, execute `python exp1.py` and then `python test_exp1.py`.
To reproduce experiment 2, execute `python exp2.py` and then `python test_exp2.py`.
In the `experiments` folder, each subfolder contains the training results for a specific dataset. For a summary of the performance, refer to `results.txt`.
Experiment 1:

Dataset ID | Test Accuracy | Test Macro F1 | Test Micro F1 |
---|---|---|---|
0000 | 0.1176 | 0.0860 | 0.1176 |
1100 | 0.1224 | 0.0887 | 0.1224 |
1001 | 0.3471 | 0.2775 | 0.3471 |
0101 | 0.2703 | 0.2543 | 0.2703 |
1101 | 0.3137 | 0.2782 | 0.3137 |
Experiment 2:

Dataset ID | Test Accuracy | Test Macro F1 | Test Micro F1 |
---|---|---|---|
1111 | 0.3200 | 0.2982 | 0.3200 |
1121 | 0.3484 | 0.3257 | 0.3484 |
1131 | 0.4197 | 0.3102 | 0.4197 |
1151 | 0.2906 | 0.2689 | 0.2906 |
Category | Label |
---|---|
animal | 0 |
human body | 1 |
clothing and accessories | 2 |
food | 3 |
home and furniture | 4 |
kitchen | 5 |
electronics | 6 |
medical equipment | 7 |
office supply | 8 |
musical instrument | 9 |
vehicle | 10 |
toy | 11 |
plant | 12 |
other | 13 |




Given the imbalanced nature of the dataset, we use macro F1 as our main metric. From Experiment 2, we observe that the combination labeled `1121` (ASR) performs the best. Adding ICA appears to degrade performance, possibly because the dataset was recorded while the subject was sitting, making ICA overkill that might remove important components. Interestingly, autoreject achieves the highest performance in terms of accuracy and micro F1, suggesting that combining it with other methods might yield even better results.
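Macro F1 averages the per-class F1 scores with equal weight, so rare classes count as much as frequent ones. A minimal pure-Python sketch of the metric (illustrative only; the function name is ours):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)

# On an imbalanced set, a majority-class guesser looks fine on
# accuracy (0.8 here) but poor on macro F1:
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
y_pred = [0] * 10
print(macro_f1(y_true, y_pred))  # ≈ 0.296
```

This is why macro F1 separates the artifact removal methods more honestly than accuracy or micro F1 on a 14-class imbalanced dataset.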
Regarding the performance of each label, labels `1`, `2`, `3`, and `12` (human body, clothing and accessories, food, and plant) generally perform better, except in the `1151` combination, which shows lower performance for the plant label. These categories are commonly seen objects, which likely contributes to their higher performance. Surprisingly, the animal label (`0`) consistently shows low performance across all artifact removal methods. This is unexpected, as one would assume that animals, being more distinct to human perception, would yield better performance.
https://drive.google.com/file/d/1CXrHnbwSs3LDOLASfUE4Sm5fZbmLAnou/view?usp=sharing
[1] Gifford, A. T., Dwivedi, K., Roig, G., & Cichy, R. M. (2022). A large and rich EEG dataset for modeling human visual object recognition. NeuroImage, 264, 119754.
[2] Lawhern, V. J., Solon, A. J., Waytowich, N. R., Gordon, S. M., Hung, C. P., & Lance, B. J. (2018). EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces. Journal of neural engineering, 15(5), 056013.
Code structure modified from https://github.com/eric12345566