EEG Seizure Detection
Multi-Architecture Benchmark on Pediatric EEG

Problem
Most published seizure-detection models report numbers for a single architecture or a single subject, leaving a critical question unanswered: which model class actually generalizes across patients under realistic conditions?
Approach
Built a unified preprocessing pipeline on the CHB-MIT corpus (24 patients, 916 hours of recording) using MNE for filtering, artifact handling, and windowing, so every architecture trains on byte-identical inputs.
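A minimal sketch of what that stage looks like with standard MNE calls. The band edges, 60 Hz notch, and 4 s window length here are placeholder values for illustration, not the project's actual settings:

```python
import mne

def preprocess(edf_path, l_freq=0.5, h_freq=40.0, win_sec=4.0):
    """Load one CHB-MIT recording, band-pass filter, and cut fixed-length windows."""
    raw = mne.io.read_raw_edf(edf_path, preload=True, verbose="error")
    raw.filter(l_freq=l_freq, h_freq=h_freq)  # band-pass to the EEG band of interest
    raw.notch_filter(60.0)                    # mains interference (CHB-MIT is US data)
    epochs = mne.make_fixed_length_epochs(
        raw, duration=win_sec, overlap=0.0, preload=True
    )
    return epochs.get_data()  # shape: (n_windows, n_channels, n_samples)
```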
Benchmarked 15+ architectures across four families: LSTM/GRU recurrent, Transformer, Mamba/state-space, and Mixture-of-Experts, using patient-disjoint splits to measure cross-subject generalization rather than memorization.
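Patient-disjoint splitting reduces to grouped cross-validation with the patient ID as the group key. A hedged sketch using scikit-learn's GroupKFold; `windows`, `labels`, and `patient_ids` are assumed per-window arrays from the preprocessing step:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

def patient_disjoint_folds(windows, labels, patient_ids, n_splits=5):
    """Yield train/test index pairs in which no patient appears on both sides."""
    patient_ids = np.asarray(patient_ids)
    gkf = GroupKFold(n_splits=n_splits)
    for train_idx, test_idx in gkf.split(windows, labels, groups=patient_ids):
        # Sanity check: the train and test patient sets must be disjoint.
        assert not set(patient_ids[train_idx]) & set(patient_ids[test_idx])
        yield train_idx, test_idx
```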
Containerized training with version-pinned environments and tracked every run. Swapping an architecture is a single config change, and a full re-evaluation is reproducible end-to-end.
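One way the "one config change" idea can be wired up is a registry that maps architecture names to constructors, so a run is fully specified by its config. The registry contents, class names, and config keys below are hypothetical stand-ins for the real training code:

```python
MODEL_REGISTRY = {}

def register(name):
    """Decorator that adds a model class to the registry under `name`."""
    def deco(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return deco

@register("lstm")
class LSTMDetector:
    def __init__(self, **kwargs):
        self.kwargs = kwargs  # placeholder; real model construction goes here

@register("transformer")
class TransformerDetector:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

def build_model(config):
    """Instantiate whichever architecture the config names."""
    return MODEL_REGISTRY[config["arch"]](**config.get("arch_kwargs", {}))

# Swapping architectures is then a one-line config edit:
# config = {"arch": "transformer", "arch_kwargs": {"n_layers": 4}}
```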
Results
AUROC 0.740, best of 15+ architectures
- AUROC 0.740 (best family, patient-disjoint evaluation)
- 15+ architectures benchmarked under identical preprocessing
- 916 hours of CHB-MIT EEG processed across 24 pediatric subjects
- Reproducible: any run can be re-executed from the locked environment
Stack
Python · MNE · containerized, version-pinned environments · per-run experiment tracking
What I learned
Architecture choice matters less than preprocessing rigor and patient-disjoint evaluation. Several models that look state-of-the-art on shuffled splits collapse on held-out patients. The infrastructure to run a fair comparison is the actual scientific contribution; the model rankings are downstream of that.
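A hedged sketch of the protocol gap described above: the same training routine scored under a shuffled window split (which leaks every patient into both sides) versus a patient-disjoint split. `fit_and_auroc` and the data arrays are assumed stand-ins for the real training loop:

```python
import numpy as np
from sklearn.model_selection import train_test_split, GroupShuffleSplit

def shuffled_auroc(fit_and_auroc, X, y, seed=0):
    # Window-level shuffle: the same patient contributes to train and test,
    # so a model can score well by memorizing per-patient signal statistics.
    tr, te = train_test_split(np.arange(len(y)), test_size=0.2, random_state=seed)
    return fit_and_auroc(X[tr], y[tr], X[te], y[te])

def disjoint_auroc(fit_and_auroc, X, y, patient_ids, seed=0):
    # Patient-disjoint: test patients are entirely unseen during training,
    # which is what deployment on a new patient actually looks like.
    gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=seed)
    tr, te = next(gss.split(X, y, groups=patient_ids))
    return fit_and_auroc(X[tr], y[tr], X[te], y[te])
```

The gap between these two numbers for the same model is exactly the "collapse" described above.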