

Learning optimal EEG features across time, frequency and space.


Jason Farquhar, Jeremy Hill, Bernhard Schölkopf
Max Planck Institute for Biological Cybernetics, Tübingen, Germany

NIPS 2006 Workshop on Trends in BCI




Outline

Motivation
    Source types in EEG-based BCI
Automatic Feature Selection
    Learning Spatial Features
    Feature selection as Model Selection
    Spectral/Temporal Filtering




The current approach to learning in BCIs


(Pipeline: Feature Extraction → Classification)

Current BCIs use learning in two distinct phases:
1. Feature Extraction: extract features which lead to good classifier performance, using
   prior knowledge, e.g. the 7-30 Hz band for ERDs
   maximising r-scores
   maximising independence (ICA)
   maximising the ratio of the class variances (CSP)
2. Classification: usually a simple linear classifier (SVM, LDA, Gaussian), because "once we have good features the classifier doesn't really matter".



The current approach to learning in BCIs

This seems wrong!


Note: the objectives used in feature extraction are not good predictors of generalisation performance.
Question: why use an objective for the (important) feature extraction which is a poor predictor of generalisation performance, when we have provably good predictors (margin, evidence) available in the (unimportant) classifier?



A Better approach

1. Combine the feature extraction and classifier learning.
2. Choose features which optimise the classifier's objective.
We show how to learn spatio-spectro-temporal feature extractors for classifying ERDs using the max-margin criterion¹.

¹ We have also successfully applied this approach to LR and GP classifiers, and to MRP/P300 temporal signals.




Data-visualisation: the ROC-ogram

The raw data is a d × T time-series for each of N trials. The ROC-ogram plots time vs. frequency vs. ROC score for each channel, and lets us identify where the discriminative information lies.
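As a rough illustration, a ROC-ogram of this kind could be computed along the following lines. This is a sketch under my own assumptions (sampling rate, spectrogram windowing, scoring via sklearn's roc_auc_score), not the authors' code.

```python
# Sketch of a ROC-ogram: per channel and time-frequency bin, score how well the
# log band-power separates the two classes.  Assumed input: X of shape
# (N, d, T) -- N trials, d channels, T samples -- binary labels y, sample rate fs_hz.
import numpy as np
from scipy.signal import spectrogram
from sklearn.metrics import roc_auc_score

def rocogram(X, y, fs_hz=250, nperseg=128):
    """Return (freqs, times, roc) with roc of shape (d, n_freqs, n_times)."""
    freqs, times, S = spectrogram(X, fs=fs_hz, nperseg=nperseg, axis=-1)  # (N, d, F, Tt)
    S = np.log(S + 1e-12)                       # log band-power per bin
    N, d, F, Tt = S.shape
    roc = np.zeros((d, F, Tt))
    for c in range(d):
        for i in range(F):
            for j in range(Tt):
                roc[c, i, j] = roc_auc_score(y, S[:, c, i, j])
    return freqs, times, roc
```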



Example raw ROC-ograms: the good, the bad, and the ugly

(Figures: three example raw ROC-ograms, one per subject.)

Spatial, spectral, and temporal discriminative features are subject-specific.


Spatio-Spectro-Temporal feature selection


We would like to automatically perform feature selection: spatially, spectrally, and temporally.




Learning Feature Extractors

We start by showing how to learn spatial filters with the max-margin criterion, then extend to learning spatial + spectral + temporal filters.



Spatial Filtering
Volume conduction: the electrodes detect a superposition of signals from all over the brain, X = AS.
Spatial filtering undoes this superposition to re-focus on the discriminative signals: y = f_s^T X.
This is a Blind Source Separation (BSS) problem; many algorithms are available to solve it.
In BCI we commonly use a fast, supervised method called Common Spatial Patterns (CSP) [Koles 1990], sketched below.
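Since CSP is the spatial-filtering baseline used later in the talk, here is a minimal sketch of it; this is my own illustration, assuming trials X of shape (N, d, T) and binary labels, not the authors' implementation.

```python
# Common Spatial Patterns: find spatial filters whose projected variance is
# maximal for one class and minimal for the other, via a generalised
# eigenvalue problem on the per-class covariance matrices.
import numpy as np
from scipy.linalg import eigh

def csp_filters(X, y, n_filters=2):
    """Return a (d, n_filters) matrix whose columns are spatial filters."""
    covs = []
    for cls in (0, 1):
        trials = X[y == cls]
        # Average of trace-normalised spatial covariance matrices.
        covs.append(np.mean([xi @ xi.T / np.trace(xi @ xi.T) for xi in trials], axis=0))
    # Generalised eigenvalue problem:  C0 w = lambda (C0 + C1) w.
    evals, evecs = eigh(covs[0], covs[0] + covs[1])
    order = np.argsort(evals)
    # Take the extreme eigenvectors: smallest favour one class, largest the other.
    pick = np.concatenate([order[:n_filters // 2], order[::-1][:n_filters - n_filters // 2]])
    return evecs[:, pick]
```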



The Max-margin Objective


The max-margin objective is related to an upper bound on generalisation performance and is the basis of the SVM: it finds w such that the minimal distance between the classes is maximised. In the linear case it can be expressed as the primal objective

    \min_{w,b}\; w^\top w + \sum_i \max(0,\, 1 - y_i (x_i^\top w + b))

For non-linear classification we can simply replace x_i with an explicit feature mapping \phi(x_i); this is how we include the feature extraction into the classification objective.
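As a concrete, simplified illustration of the primal objective above, here is how it could be written as a plain function; this is my own sketch, not code from the talk, and the trade-off constant on the hinge term is left at 1 as on the slide.

```python
# Linear max-margin (SVM primal) objective: squared norm of w plus the hinge
# loss over all training points.  X: (N, D) features, y: (N,) labels in {-1, +1}.
import numpy as np

def max_margin_objective(params, X, y):
    w, b = params[:-1], params[-1]
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins).sum()   # sum_i max(0, 1 - y_i(x_i.w + b))
    return w @ w + hinge
```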


Max-margin optimised spatial filters


1. Define the feature-space mapping \phi from the time series X_i to spatially filtered log band-powers:

    \phi(X_i; F_s) = \ln(\mathrm{diag}(F_s^\top X_i X_i^\top F_s))

where F_s = [f_{s1}, f_{s2}, \ldots] is the set of spatial filters.

2. Include the dependence on \phi explicitly in the classifier's objective, e.g. the linear max-margin objective:

    J_{mm}(X, w, b, F_s) = w^\top w + \sum_i \max(0,\, 1 - y_i(\phi(X_i; F_s)^\top w + b))

3. Optimise this objective, treating \phi's parameters as additional optimisation variables.

Note: this is an unconstrained optimisation; solve for w, b and F_s directly using conjugate gradients (CG).
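To make steps 1 and 2 concrete, a minimal numpy sketch of the feature map \phi and the joint objective J_mm might look like this; it is an illustration under my own conventions (F_s holds the spatial filters as columns), not the authors' code.

```python
# phi(X_i; Fs) = ln(diag(Fs' X_i X_i' Fs)): log variance (band-power) of each
# spatially filtered signal.  Xi: (d, T) trial, Fs: (d, k) spatial filters.
import numpy as np

def phi_spatial(Xi, Fs):
    S = Fs.T @ Xi                                   # (k, T) filtered time series
    return np.log(np.einsum('kt,kt->k', S, S))      # log power per filter

def J_mm(X, y, w, b, Fs):
    """Max-margin objective with the feature map's parameters Fs included."""
    feats = np.stack([phi_spatial(Xi, Fs) for Xi in X])          # (N, k)
    hinge = np.maximum(0.0, 1.0 - y * (feats @ w + b)).sum()
    return w @ w + hinge
```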




Adding Spectral/Temporal filters


It is very simple to include spectral/temporal filtering: we just modify the feature mapping to include them. Let f_f be a spectral filter and f_t a temporal filter. Then the spatially + spectrally + temporally filtered band-power is

    \phi(X; f_s, f_f, f_t) = \big(\mathcal{F}^{-1}(\mathcal{F}(f_s^\top X D_t) D_f)\big)\,\big(\mathcal{F}^{-1}(\mathcal{F}(f_s^\top X D_t) D_f)\big)^\top
                           = f_s^\top \mathcal{F}(X D_t)\, D_f^2\, \mathcal{F}(X D_t)^\top f_s / T

where \mathcal{F} is the Fourier transform and D_{(\cdot)} = \mathrm{diag}(f_{(\cdot)}).
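A minimal numpy sketch of this feature, under my own conventions for the filter shapes (f_s over channels, f_t over samples, f_f over FFT bins); the second line of the equation, via Parseval's theorem, lets us stay in the frequency domain.

```python
# Spatio-spectro-temporally filtered band-power.
# X: (d, T) trial, fs: (d,) spatial filter, ft: (T,) temporal window,
# ff: (T,) spectral filter defined over FFT bins.
import numpy as np

def sst_bandpower(X, fs, ff, ft):
    xt = (fs @ X) * ft                    # spatially filtered, temporally windowed
    Xf = np.fft.fft(xt)                   # Fourier transform
    # f_s' F(X D_t) D_f^2 F(X D_t)' f_s / T: power of the spectrally filtered signal
    return np.sum((np.abs(Xf) ** 2) * (ff ** 2)) / len(xt)
```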



Filter regularisation
The filters F_s, F_f, F_t are unconstrained, so they may overfit. We have prior knowledge about the filters' shape, e.g.:
    spatial filters tend to be over the motor regions
    temporal and spectral filters should be smooth

We include this prior knowledge with a quadratic regularisation on the filters:

    J_{mm} = w^\top w + \sum_i \max(0,\, 1 - y_i(\ln\phi(X_i; F_s, F_f, F_t)^\top w + b))
             + \lambda_s \mathrm{Tr}(F_s^\top R_s F_s) + \lambda_f \mathrm{Tr}(F_f^\top R_f F_f) + \lambda_t \mathrm{Tr}(F_t^\top R_t F_t)

where each R_{(\cdot)} is a positive definite matrix encoding the prior knowledge.
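For the smoothness prior on spectral and temporal filters, one standard choice (an assumption on my part; the talk does not specify R) is a squared second-difference penalty:

```python
# Quadratic filter regularisation: lambda * Tr(F' R F), with R = D2' D2 a
# second-difference (curvature) penalty that favours smooth filters.
import numpy as np

def smoothness_prior(n):
    D2 = np.diff(np.eye(n), n=2, axis=0)        # (n-2, n) second-difference operator
    return D2.T @ D2 + 1e-6 * np.eye(n)         # small ridge keeps R positive definite

def filter_penalty(F, R, lam):
    """lam * Tr(F' R F) for a bank of filters F of shape (n, k)."""
    return lam * np.trace(F.T @ R @ F)
```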


Implementation issues
Optimising J_{mm} for all the filters directly results in a stiff problem and very slow convergence. Further, evaluating \phi(X; f_s, f_f, f_t) requires a costly FFT. Coordinate descent over the filter types solves both of these problems (see the sketch below):
1. Spatial optimisation, where \phi_s(X; f_s) = f_s^\top X_{f,t} X_{f,t}^\top f_s
2. Spectral optimisation, where \phi_f(X; f_f) = X_{s,t} D_f^2 X_{s,t}^\top
3. Temporal optimisation, where \phi_t(X; f_t) = X_{s,f} D_t^2 X_{s,f}^\top
4. Repeat until convergence

The problem is non-convex, so we seed it with a good solution found by another method, e.g. CSP, or from prior knowledge.
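The alternation itself is simple. Below is a structural sketch of my own, with scipy's general-purpose minimiser standing in for the CG solver and the classifier parameters assumed to be handled inside the objective J; it is not the talk's implementation.

```python
# Coordinate descent over the three filter types: optimise one block of filter
# parameters at a time, holding the other two fixed, until the objective stops
# improving.  J(fs, ff, ft) is any callable returning the regularised objective.
import numpy as np
from scipy.optimize import minimize

def coordinate_descent(J, fs, ff, ft, n_iter=10, tol=1e-4):
    prev = np.inf
    for _ in range(n_iter):
        fs = minimize(lambda v: J(v, ff, ft), fs, method='CG').x   # spatial step
        ff = minimize(lambda v: J(fs, v, ft), ff, method='CG').x   # spectral step
        res = minimize(lambda v: J(fs, ff, v), ft, method='CG')    # temporal step
        ft = res.x
        if prev - res.fun < tol:        # converged: objective no longer improves
            break
        prev = res.fun
    return fs, ff, ft
```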


Example: optimisation trajectory for CompIII, Vc

Seeded with a noisy CSP solution. The optimisation finds the (foot) motor region, a 15 Hz band, and a >0.5 s temporal band.

(Figures: the spatial, spectral and temporal filters at each iteration.)

Classification error over the iterations:
Iteration 0: 45%   Iteration 1: 16%   Iteration 2: 3.5%   Iteration 3: 3.5%   Iteration 4: 2.6%   Iteration 5: 2.6%   Iteration 6: 2.6%


Experimental analysis
We show binary classification error for 15 imagined-movement subjects: 9 from BCI competitions (Comp 2: IIa; Comp 3: IVa, IVc) and 6 from an internal MPI dataset.

The data are pre-processed by band-pass filtering to 0.5-45 Hz.
Baseline performance is from CSP with 2 filters, computed on the signal filtered to 7-27 Hz.
The CSP solution is used as the spatial-filter seed; flat seeds are used for the spectral and temporal filters.


Results: Spatial Optimization

(Scatter plots: 100 training points and 200 training points, spatial optimisation vs. baseline.)
General improvement in performance, particularly for low numbers of data points (when overfitting is an issue); a huge improvement in a few cases.


Results: Spatial + Spectral Optimization

(Scatter plots: 100 training points and 200 training points, spatial + spectral optimisation vs. baseline.)
A large benefit in a few cases, and further improvement for the subjects helped before.


Results: Spatial + Spectral + Temporal Optimization

(Scatter plots: 100 training points and 200 training points, spatial + spectral + temporal optimisation vs. baseline.)
Further improvements for some subjects, and a slight decrease for others.


Summary
EEG BCI performance depends mainly on learning subject-specific feature extractors.
These can be learnt by direct optimisation of the classification objective (max-margin).
Results show significant improvement over independent feature-extractor/classifier learning (better in 12/15 cases).

Future work
    Alternative objective functions: SVM, LR and Gaussian Process objectives already implemented.
    Better priors, particularly for the spatial filters; perhaps found by cross-subject learning?
    Other feature/signal types: wavelets, MRPs, P300, etc.
    On-line feature learning


Results: Temporal Signal Extraction

(Scatter plots: 100 training points and 200 training points.)
Here we learn a rank-1 (i.e. 1 spatial + 1 temporal filter) approximation to the full SVM weight vector. This regularisation significantly improves classification performance and produces readily interpretable results.



Results: Example solutions

(Figures: two example rank-1 solutions.)
Spatially, a differential filter between the left/right motor regions; temporally: ?
Spatially, a differential filter between the foot and motor regions; temporally: ?
