
Applying Weightless Neural Networks to a P300-based

Brain-Computer Interface

Marco Simões1,2,3, Carlos Amaral1,2, Felipe França4, Paulo Carvalho3 and Miguel Castelo-Branco1,2
1 CIBIT, Coimbra Institute for Biomedical Imaging and Translational Research, ICNAS, Uni-
versity of Coimbra, Portugal
2 Faculty of Medicine, University of Coimbra, Portugal
3 CISUC, Center for Informatics and Systems, University of Coimbra, Portugal
4 PESC-COPPE, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil

msimoes@dei.uc.pt

Abstract. P300-based Brain-Computer Interfaces (BCI) are among the most used types of BCI in the literature that make use of the electroencephalogram (EEG) signal to convey commands to the computer. The efficiency of such systems depends drastically on the ability to correctly identify the P300 wave in the EEG signal. Due to high inter-subject and inter-session variability, single-subject classifiers must be trained every session. In order to achieve fast setup times of the system, only a few trials are available each session for training the classifier. In this scenario, the capacity to learn from few examples is crucial for the performance of the BCI and, therefore, the use of weightless neural networks (WNN) is promising. Despite their possible added value, there are no studies, to our knowledge, applying WNNs to P300 classification. Here we compare the performance of a WNN against state-of-the-art algorithms when applied to a P300-based BCI for joint-attention training in autism. Our results show that the WNN performs as well as its competitors, outperforming them several times. We also analyze the WNN hyperparameters, showing that smaller memories achieve better results most of the time. This study demonstrates that the adoption of this type of classifier might help increase the prediction accuracy of P300-based BCI systems, and should be a valid option for future studies to consider.

Keywords: P300, Brain-Computer Interfaces (BCI), Weightless Neural Networks (WNN).

1 Introduction

P300-based Brain-Computer Interfaces (BCI) represent a widely used type of electroencephalographic (EEG) BCI in the literature. These BCIs work by identifying the P300 wave in the EEG signal, elicited by paying attention to an infrequent stimulus. Although the most common application is the P300-speller [1], an interface for selecting letters from a matrix based on flashes, more complex interfaces have been proposed [2]. Although they base their functioning on the same neurological process related to attention, different waveforms are elicited by different interfaces [3]. In this sense, it is important to have strong classification algorithms to correctly identify the target stimulus.
Due to high inter-subject and inter-session variability, single-subject classifiers must be trained every session. In order to achieve fast setup times of the system, only a few trials are available each session for training the classifier.
In this scenario, being able to learn from few cases is crucial and, therefore, the use of weightless neural networks (WNN) is promising. WNNs are a family of classifiers that do not require weight optimization, being trained only with a forward step [4]. This way, WNNs are usually faster to train than their usual competitors [5]. Despite their foreseeable merits, the applications of WNNs remain underexplored [6], this being, to our knowledge, the first work applying a WNN to BCI.
In this paper, we compare the performance of a WNN against state-of-the-art algorithms when applied to a P300-based BCI clinical trial for joint-attention training in autism. We further study how the WNN hyperparameters influence the classifier in this task.

2 Methods

In this section we describe the methods used in the paper, starting with the dataset, followed by the feature extraction procedure and the classifiers used for performance comparison.

2.1 Dataset
We used the EEG data from 13 subjects recorded with the g.tec g.Nautilus system while performing an innovative P300-based BCI task for joint-attention training in autism, published in [2]. This innovative system uses the BCI in a virtual reality setting, where the participants train to follow the gaze of a virtual avatar. The 13 subjects each performed one BCI session with 3 systems. That paper concluded that the g.Nautilus was the best-performing EEG system in terms of use for a clinical trial in autism; here, we therefore use the g.Nautilus data for the comparison of classifiers. For a detailed explanation of the task, please refer to [2].

2.2 Signal Processing and Classification Pipeline


The signal processing procedure follows the traditional feature extraction and selection approach, using two independent datasets (one for training, one for testing). This process is repeated for each session of each participant.

Feature Extraction
The feature extraction procedure follows the algorithm of [7], where a two-step spatial filter is applied to the signal: one using the Fisher Criterion to maximize the differences between target/non-target signals and another to maximize the Signal-to-Noise Ratio (SNR) (see Fig. 1). The filters are applied in cascade. For a deeper explanation of the procedure, please refer to the original paper [7].

Fig. 1. Left – Event-related potentials (ERPs) from the 8 original electrodes, showing the grand averages of all target stimuli (orange) and all non-target stimuli (blue), for a randomly selected subject. Right – Signal processed for feature extraction, after application of the Fisher Criterion filter (top) and the max-SNR filter (bottom). The most discriminant features are selected and highlighted in the figures.

From the final signal generated, we select the features that maximize the difference between the target and non-target signals (measured by the Pearson correlation coefficient). Only features whose p-value is below 0.01 are selected. A minimum of 50 features is enforced, meaning that if not enough features have a p-value below the threshold, the 50 features with the lowest p-values are selected.
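As an illustration, the selection rule just described can be sketched as follows. This is a hypothetical sketch, not the authors' code: the function name, the feature-matrix layout (samples × features) and the toy data are our assumptions.

```python
import numpy as np
from scipy.stats import pearsonr

def select_features(X, y, p_threshold=0.01, min_features=50):
    """Keep features whose Pearson correlation with the labels is
    significant (p < p_threshold); if too few pass, fall back to the
    min_features features with the lowest p-values."""
    pvals = np.array([pearsonr(X[:, j], y)[1] for j in range(X.shape[1])])
    selected = np.flatnonzero(pvals < p_threshold)
    if selected.size < min_features:
        selected = np.argsort(pvals)[:min_features]
    return np.sort(selected)

# toy data: 200 samples, 120 features, only the first 5 informative
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 120))
X[:, :5] += 2.0 * y[:, None]  # shift informative features on target trials
idx = select_features(X, y)   # the 5 informative features rank first
```

With so few features passing p < 0.01 in the toy data, the fallback guarantees exactly 50 selected features, mirroring the minimum enforced in the paper.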

Classification Procedure
In order to select the best parameters for each classifier, the train dataset is split in
train (70% of the samples) and validation (30% of the samples) sets. Then, we itera-
tively select hyperparameter combinations to test. For each hyperparameter combina-
tion we train a classifier with the train set and evaluate its performance with the test set.
The classifier with the best accuracy in the validation set is selected for evaluation with
the test set.
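The selection loop above can be sketched generically. This is an illustrative sketch under our own naming assumptions (`tune`, `train_fn`, `score_fn` are hypothetical helpers, not the authors' implementation); the toy usage tunes a simple decision threshold.

```python
import numpy as np

def tune(train_fn, score_fn, X, y, grid, val_frac=0.30, seed=0):
    """Split the training data 70/30, train one model per hyperparameter
    combination, and return the combination with the best validation accuracy."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_val = int(round(val_frac * len(y)))
    val, tr = order[:n_val], order[n_val:]
    best, best_acc = None, -1.0
    for params in grid:
        model = train_fn(X[tr], y[tr], **params)
        acc = score_fn(model, X[val], y[val])
        if acc > best_acc:
            best, best_acc = params, acc
    return best, best_acc

# toy usage: choose a decision threshold for a single noisy feature
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 2000)
X = (y + rng.normal(0, 0.5, 2000)).reshape(-1, 1)

def train_fn(Xt, yt, thr):          # "training" just stores the threshold
    return thr

def score_fn(thr, Xv, yv):          # validation accuracy of that threshold
    return float(np.mean((Xv[:, 0] > thr).astype(int) == yv))

best, acc = tune(train_fn, score_fn, X, y, [{"thr": t} for t in (0.0, 0.5, 1.0)])
```

The winning model is then retrained or reused as-is and scored once on the held-out test set, which never participates in the selection.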
The metric selected for evaluating classifier performance was the object detection accuracy. In the virtual scenario, the avatar directs the attention of the participant to one out of eight objects. For each blink of the eight objects, the classifier performs a binary classification (target vs non-target), generating a target score for each object. Of the eight objects, the one with the highest target score is selected as the target, while the others are labeled as non-target. Chance level is, therefore, fixed at 12.5% (1 out of 8).
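The 1-of-8 decision reduces to an argmax over the per-object target scores, as in this minimal sketch (function name and scores are illustrative):

```python
import numpy as np

def detect_object(target_scores):
    """1-of-8 decision: the object whose blink got the highest binary
    target score is chosen as the attended object."""
    return int(np.argmax(target_scores))

scores = [0.12, 0.80, 0.33, 0.10, 0.05, 0.41, 0.22, 0.17]
chosen = detect_object(scores)  # → 1 (the second object)
```

Because exactly one of the eight objects is picked per decision, guessing at random yields the 12.5% chance level stated above.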

2.3 Classifiers

In this paper, we compared the most used classifiers for this purpose [8] (Fisher Linear Discriminant Analysis (fisher), Support-Vector Machine (svm) and Naïve Bayes Classifier (nbc)) with a Weightless Neural Network (Wilkes, Stonham and Aleksander Recognition Device, the WiSARD variant). The WiSARD is a network of discriminants, each composed of RAM memories addressed by n-tuples. The input feature vector is transformed into a binary representation, which is randomly mapped to the memories. For training, each sample goes to the discriminant of its class, incrementing the counter of each addressed memory location. For evaluation, the feature vector is evaluated in both discriminants (see Fig. 2), selecting the discriminant with the higher sum of counts, after a bleaching procedure (for a more in-depth description, see [9]). We adapted the WiSARD algorithm to consider prior class frequencies in its prediction, to deal with the unbalanced characteristics of this problem: after training, the counts of each memory of each discriminant are divided by the total number of training samples of its class.
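A minimal sketch of the scheme just described (binary encoding, random input mapping, counting RAMs, and the class-frequency normalization). This is an illustrative reimplementation under our own assumptions, not the authors' Matlab code: we assume a thermometer code for the binary representation, and bleaching is omitted for brevity.

```python
import numpy as np

class WiSARD:
    """Minimal two-class counting WiSARD: thermometer-encode features to
    bits, randomly partition the bits into n-bit tuples, and use each
    tuple as an address into a per-class RAM of counters."""

    def __init__(self, n_features, bits_per_feature=8, tuple_size=2, seed=0):
        self.bpf = bits_per_feature
        self.n = tuple_size
        total_bits = n_features * bits_per_feature
        rng = np.random.default_rng(seed)
        self.mapping = rng.permutation(total_bits)  # random input mapping
        self.n_rams = total_bits // tuple_size
        # one bank of RAM counters per class (one discriminant per class)
        self.rams = {c: [dict() for _ in range(self.n_rams)] for c in (0, 1)}
        self.counts = {0: 0, 1: 0}                  # training samples per class

    def _encode(self, x):
        # thermometer code: a value in [0, 1] becomes its leading ones
        levels = np.clip((x * self.bpf).astype(int), 0, self.bpf)
        bits = np.zeros(len(x) * self.bpf, dtype=int)
        for i, lv in enumerate(levels):
            bits[i * self.bpf:i * self.bpf + lv] = 1
        return bits[self.mapping]

    def _addresses(self, x):
        bits = self._encode(x)
        for r in range(self.n_rams):
            yield r, tuple(bits[r * self.n:(r + 1) * self.n])

    def train(self, x, label):
        self.counts[label] += 1
        for r, addr in self._addresses(x):
            ram = self.rams[label][r]
            ram[addr] = ram.get(addr, 0) + 1        # increment counter

    def predict(self, x):
        # sum of counters per discriminant, divided by class frequency
        scores = {c: sum(self.rams[c][r].get(addr, 0)
                         for r, addr in self._addresses(x)) / max(self.counts[c], 1)
                  for c in (0, 1)}
        return max(scores, key=scores.get), scores

# toy usage: two well-separated class prototypes in [0, 1]^10
rng = np.random.default_rng(1)
net = WiSARD(n_features=10)
for _ in range(200):
    label = int(rng.integers(0, 2))
    x = np.clip((0.25 if label == 0 else 0.75) + rng.normal(0, 0.1, 10), 0, 1)
    net.train(x, label)

pred0, _ = net.predict(np.full(10, 0.3))  # near the class-0 prototype
pred1, _ = net.predict(np.full(10, 0.7))  # near the class-1 prototype
```

Note how training is a single forward pass of counter increments, which is why WNN training is so fast compared with weight-optimization methods.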

Fig. 2. WiSARD schematic, explaining the process from the signal to the discriminants. During training, each sample is mapped to the corresponding discriminant. For prediction, the sample is submitted to both discriminants and the resulting scores compared.

The hyperparameters optimized for the WiSARD classifier were the number of bits per memory (memory size) and the number of bits used to represent each feature (input size). For the SVM, we optimized the BoxConstraint C value. The NBC and Fisher classifiers do not need hyperparameter optimization.
We also registered the training time needed for each classifier and compare the times in addition to the accuracy metric. It should be noted that while the SVM, Fisher and Naïve Bayes classifiers make use of precompiled C routines, the WNN uses in-house software fully scripted in Matlab, which biases the training-time comparison against the WNN.

3 Results

Due to the intrinsic noise of event-related potential signals, the SNR increases when several responses are averaged together. In this sense, we present in Fig. 3 the performance of the classifiers across the number of averaged trials.

Fig. 3. Object detection accuracy for the four classifiers tested, across the number of averaged trials. On the left, the results for the validation set and, on the right, the results for the test set.

As expected, we see a monotonic increase in accuracy across trials for almost every classifier. For the validation set, the WiSARD presents the best results at every SNR level. On the test set, its performance is at least as good as the best of the commonly used classifiers, outperforming it several times. In several cases it outperforms the SVM, possibly the most used classifier in the literature for this type of problem.

Fig. 4. Histogram of hyperparameter selection in the validation set, by SNR. On the left, the memory size and, on the right, the number of bits used to represent a feature.

Regarding the hyperparameters (Fig. 4), we see that most of the time smaller memories are selected (2 bits). Looking at the histogram of the number of bits used to represent each feature, 30 bits is the most common value, but without a clear superiority, showing that the memory size has a greater influence on the performance of the classifier than the number of bits per feature.

Regarding training duration, Fig. 5 shows the time needed to train each classifier. Because some training-time distributions were not normal, we compared the training times with a non-parametric Friedman test, which showed statistically significant differences between the classifiers, χ²(3) = 39, p < 0.001. Post-hoc tests (pair-wise Wilcoxon signed-rank tests, corrected for multiple comparisons with the Bonferroni method) showed strong statistical differences between all methods.

Fig. 5. Boxplot showing the training times of each classifier. *** represents statistically significant differences with p < 0.001 after correction for multiple comparisons with the Bonferroni method.

4 Discussion

In this paper we compared the most used classifiers in the P300 detection field with a weightless neural network, which, to our knowledge, had never been tested for this purpose. We used an innovative BCI application that aims to train joint-attention skills in people with autism spectrum disorder.
The results show that the presented WNN is able to perform at least at the level of the best classifier, outperforming it several times. Additionally, the WNN shows the fastest training time of the classifiers sampled, even without the use of precompiled routines. The WNN's faster training times and ability to generalize using small memory sizes worked especially well for the characteristics of this problem, where inter-trial variability presents a big challenge for any learning algorithm. Further exploration is needed to assess whether other WNN configurations can further improve the accuracy results achieved by the WiSARD algorithm tested here.
This paper demonstrates that the adoption of this type of classifier might help increase the prediction accuracy of P300-based BCI systems. Therefore, future studies should consider its adoption when choosing the classifier to include in P300 detection systems.

Acknowledgments

This work was supported by FCT – the Portuguese national funding agency for science, research and technology [Grant PAC – MEDPERSYST, POCI-01-0145-FEDER-016428], fellowship SFRH/BD/77044/2011, and the BRAINTRAIN Project FP7-HEALTH-2013-INNOVATION-1–602186 20, 2013.
The authors declare no conflict of interests.

References

1. Pires G, Nunes U, Castelo-Branco M (2012) Comparison of a row-column speller vs. a novel lateral single-character speller: Assessment of BCI for severe motor disabled patients. Clin Neurophysiol 123:1168–1181. doi: 10.1016/j.clinph.2011.10.040
2. Amaral CP, Simões MA, Mouga S, et al (2017) A novel Brain Computer Interface for classification of social joint attention in autism and comparison of 3 experimental setups: A feasibility study. J Neurosci Methods 290:105–115. doi: 10.1016/j.jneumeth.2017.07.029
3. Amaral CP, Simões MA, Castelo-Branco MS (2015) Neural Signals Evoked by Stimuli of Increasing Social Scene Complexity Are Detectable at the Single-Trial Level and Right Lateralized. PLoS One 10:e0121970. doi: 10.1371/journal.pone.0121970
4. Aleksander I, de Gregorio M, França FMG, Lima PMV, Morton H (2009) A brief introduction to Weightless Neural Systems. ESANN'2009 proceedings, Eur Symp Artif Neural Networks – Adv Comput Intell Learn 22–24
5. Cardoso DO, Carvalho DS, Alves DSF, et al (2016) Financial credit analysis via a clustering weightless neural classifier. Neurocomputing 183:70–78. doi: 10.1016/j.neucom.2015.06.105
6. França FMG, de Gregorio M, Lima PMV, de Oliveira WR (2014) Advances in Weightless Neural Systems. 22nd Eur Symp Artif Neural Networks 497–504. doi: 10.13140/2.1.2688.6403
7. Pires G, Nunes U, Castelo-Branco M (2011) Statistical spatial filtering for a P300-based BCI: tests in able-bodied, and patients with cerebral palsy and amyotrophic lateral sclerosis. J Neurosci Methods 195:270–281. doi: 10.1016/j.jneumeth.2010.11.016
8. Manyakov NV, Chumerin N, Combaz A, Van Hulle MM (2011) Comparison of Classification Methods for P300 Brain-Computer Interface on Disabled Subjects. Comput Intell Neurosci 2011:1–12. doi: 10.1155/2011/519868
9. Grieco BPA, Lima PMV, De Gregorio M, França FMG (2010) Producing pattern examples from "mental" images. Neurocomputing 73:1057–1064. doi: 10.1016/j.neucom.2009.11.015
