
The role of affective computing for ensuring safety in at-risk educational environments

The development of VoisJet and VoisEye for forensic phonetical analysis

Jonathan Bishop
Centre for Research into Online Communities and E-Learning Systems
Swansea, Wales
jonathan@jonathanbishop.com

Darren Bellenger
Global Learning Design and Creation, Culture and Experience CoE, CGI
Manchester, England
darren.bellenger@cgi.com

Abstract: This paper presents an introduction to the use of affective computing in at-risk educational environments, such as schools located in areas of armed conflict, and to its use for the safeguarding of children and at-risk adults more generally. The paper discusses the improvement of EigenFace-based facial emotion recognition by continually streamlining the facial dataset used, and its application in at-risk educational environments. One commercially available library mimics the authors' EigenFaces library and appears to perform better in poor lighting and with poor cameras, making it possibly better suited to drone use. It is therefore paramount that, as the authors develop the system further, they keep each component separate, in case it is decided to utilise commercial libraries (e.g. Microsoft Project Oxford) for certain aspects. A structure such as VoisOver, which allows third-party technologies to be plugged in, would mean the authors' code can remain separate from that of third-party plug-ins, namely specific algorithms for identifying the core 12 emotion sets the authors have devised in contexts that might not even have been considered yet. Such algorithms could work with the system described in this paper to make its operation in at-risk educational environments even more possible.

Keywords: forensic phonetics, affective computing, innovation

I. INTRODUCTION

Forensic phonetics involves the comparison of two voice sources in order to assess their differences or similarities [1]. An important part of forensic phonetics is monitoring given voices over a period of time [2]. Whilst forensic phonetics is usually used to distinguish one person from another [3], this paper extends it towards distinguishing each person's affective states through continuous analysis.

An important consideration in affect recognition where forensic phonetics is concerned is the sincerity of the voices sampled [4]. The uncertainty that comes from not knowing whether a voice sample is genuine can cause problems in affective computing. Therefore, Table I shows twelve emotion sets that can be derived not simply from a person's tone of voice but also from other aspects such as facial expressions. Contextualising these inputs results in context-specific emotion identification.

TABLE I. THE 12 EMOTIONAL INPUTS AND GENERATED OUTPUTS

Emoticon   Emotion outputted from algorithm
:-#        Afraid, lonely, discouraged, fearful, nervous, reserved, scared, terrified
:-|        Bored, detached, fatigued, rusty, sleepy, tired, wise
;-)        Bashful, devoted, eager, erotic, hopeful, merry, obscene, romantic, sexy, tense
:-(        Depressed, despairing, distressed, embarrassed, gloomy, guilty, helpless, horror, misery, regretful, sad, stress, suicide, unhappy, upset, pity
:-D        Amused, enjoyment, happy, joyful, mischief, silly, tease, wit
|-)        Critical, moral, rigid, rude, selfish, serious, sceptical, thinking, thoughtful, snob
:-)        Calm, delighted, friendly, impressed, liked, natural, nice, pleased, relaxed, satisfied, serene
8o|        Disdainful, disgusted, hostile, jealousy, menace, insecure, nasty, obnoxious, rejected, resent
8-|        Cheerful, elated, excitement, lively, optimism, triumphant
^o)        Antagonistic, disbelieving, disgusted, suspicious, ridicule
:@         Angry, contemptuous, enraged, displeased, hate, hatred, hurt, outrage, rage, scornful, scorn, violent
:-o        Grateful, kind, kindness, repentant, reverent, startled, subdued, thoughtful, timid, warmth, sissy
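To make the use of Table I concrete, the sketch below represents the twelve emotion sets as a simple lookup structure and shows how a set could be narrowed by a context-specific vocabulary. It is a minimal illustrative sketch in Python; the names EMOTION_SETS and emotions_for are assumptions for illustration and are not taken from the VoisEye or VoisJet code base.

# Illustrative only: Table I as a lookup from each emoticon input to its
# generated emotion set. Only a few rows are shown; the rest follow Table I.
EMOTION_SETS = {
    ":-#": {"afraid", "lonely", "discouraged", "fearful", "nervous",
            "reserved", "scared", "terrified"},
    ":-|": {"bored", "detached", "fatigued", "rusty", "sleepy", "tired", "wise"},
    ":-D": {"amused", "enjoyment", "happy", "joyful", "mischief", "silly",
            "tease", "wit"},
    ":@":  {"angry", "contemptuous", "enraged", "displeased", "hate", "hatred",
            "hurt", "outrage", "rage", "scornful", "scorn", "violent"},
    # ... remaining emoticon rows are filled in the same way from Table I.
}

def emotions_for(emoticon, context_vocabulary=None):
    """Return the emotion words generated for an emoticon input, optionally
    narrowed to a context-specific vocabulary (e.g. a classroom context)."""
    candidates = EMOTION_SETS.get(emoticon, set())
    return candidates & context_vocabulary if context_vocabulary else candidates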
II. VOISEYE AND VOISJET

In life there are people who simply do not understand where someone else is coming from. This is an everyday reality for people with recognised conditions such as autism, but it is not only people with disabilities who need help with understanding emotion in others. VoisEye [5] and VoisJet [6] are systems that make use of emotion-aware technology to ensure that affective information is communicated in a way that helps rather than hinders human interaction. They are based on an innovative design that will assist autistic people in recognising the facial moods of the people they are talking to and suggest appropriate responses. Given that VoisJet and VoisEye can work irrespective of the language being spoken, there are clear cross-over opportunities to use them in areas such as those in Table II.

TABLE II. BENEFITS OF VOISEYE AND VOISJET

Purpose       Example
Defence       Soldiers who have regular contact with, say, a tribal elder could use it to see whether the elder is being evasive, as well as how his mood changes over time.
Security      During interrogation of suspected terrorists, alongside standard questioning, it could pick up evasiveness and suggest further questions in certain areas.
Immigration   It could likewise assist in the questioning of asylum seekers.

A. Considering VoisEye

The aim of VoisEye [5] is to protect children and at-risk adults from harm. Fig. 1 presents the architecture for VoisEye, and Fig. 2 presents an embodiment of VoisEye on a smartphone. As can be seen, it is capable of detecting a person's affect from their face and their body language, so as to enable a responsible person to intervene if the at-risk person is likely to be harmed.

Fig. 1. The architecture of the VoisEye component

Fig. 2. An embodiment of VoisEye on a smartphone

B. Considering VoisJet

The purpose of VoisJet [6] is to identify those likely to harm children and at-risk adults, and to enable them to be eliminated within the scope of international treaties such as the Geneva Conventions. In armed conflict it is important to differentiate civilians from combatants, and forensic phonetics has an important part to play in aiding this process. The role of VoisJet, which can take the form of an unmanned aerial vehicle, or drone, is to ensure that those responsible for armed conflict operations have full access to information on whether an intended target is a civilian or a combatant, so that targets are eliminated only if they are an actual threat. Fig. 3 sets out how VoisJet operates in practice. An example of where VoisJet would have been effective is the Wech Baghtu wedding party: on 7 November 2008 a drone strike in Afghanistan killed 63 people, of whom 37 were Afghan civilians and 26 were combatants, and 23 of the civilians were children. With VoisJet programmed into the drone, it would have been possible to identify that those at the party were children and adults not engaged in terrorist activities. This would mean that those at the centre of command and control (C2) military and security operations had far more upfront information in similar situations, leading to less risk of non-combatant loss of life.

Fig. 3. The architecture of the VoisJet component
III. DEVELOPING VOISEYE AND VOISJET USING COMPUTING OF FACIAL AND AUDITORY DATA TO ASSIST IN FORENSIC PHONETICAL ANALYSIS

Both VoisEye [5] and VoisJet [6] share a common architecture, which is developed during the rest of this paper. This architecture is presented in Fig. 4.

Fig. 4. The shared architecture between VoisEye and VoisJet

A. Visual data capture component

Fig. 5 presents a possible data capture component for implementing the structures required for VoisJet and VoisEye. The approach, which is still under development, provides many advantages for at-risk educational environments through the use of EigenFaces, which are the basis vectors of the eigenface decomposition [7]. These eigenvectors are widely used in the computer vision problem of human face recognition [8].

The concept of EigenFaces has commonly been used for visual face detection, but its use in visual facial emotion detection is not widespread. The advantage is that it is quicker and requires less computation; conversely, its accuracy in poor lighting and poor camera situations is something that needs to be improved upon.

Fig. 5. Data capture component (auditory/facial)
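As an indication of what the eigenface decomposition involves [7], the following minimal NumPy sketch computes the mean face and the leading basis vectors from a matrix of flattened, aligned grey-scale face images, and projects a new face onto that basis. It is a sketch of the general technique under the stated assumptions about input shapes, not the authors' C# data capture component.

import numpy as np

def compute_eigenfaces(faces, k=16):
    """Eigenface decomposition: `faces` is an (n_images, h*w) float array of
    aligned grey-scale face images flattened to row vectors."""
    mean_face = faces.mean(axis=0)
    centred = faces - mean_face                 # subtract the average face
    # Economy-size SVD: rows of vt are the eigenvectors ("eigenfaces") of the
    # covariance matrix of the centred training images.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean_face, vt[:k]                    # keep the k leading eigenfaces

def project(face, mean_face, eigenfaces):
    """Project a flattened face onto the eigenface basis; the resulting weight
    vector is what gets compared for recognition or emotion matching."""
    return eigenfaces @ (face - mean_face)

In practice the weight vectors from project() would be compared (for example by Euclidean distance) against stored weight vectors for known faces or labelled expressions.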

B. Audio data decoding component

A possible auditory data decoding component can be seen in Fig. 6. This algorithm involves measuring the acoustic properties of the amplitude, pitch and spectral profile of the stimuli [9]. The basis on which the algorithm is designed is that, across emotional states, cues associated with voice quality, loudness and pitch give emotional speech its affective quality [9].

The spectral centre of gravity and the standard deviation of the spectrum are computed on the basis of fast Fourier transformations [9]. The amplitude cues included in the algorithm are measurements of intensity, aspects of the amplitude envelope and duration [9].

Fig. 6. The data decoding component (auditory)
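The spectral measurements described above can be illustrated with a minimal sketch that, assuming Hann-windowed frames of mono audio, computes the spectral centre of gravity, the standard deviation of the spectrum and an RMS intensity cue from an FFT [9]. The function name spectral_cues is hypothetical; this is not the authors' decoding component.

import numpy as np

def spectral_cues(frame, sample_rate):
    """Return spectral centre of gravity, spectral standard deviation and RMS
    intensity for one audio frame (1-D float array)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    power = spectrum ** 2
    weights = power / power.sum() if power.sum() > 0 else power
    centroid = float((freqs * weights).sum())                          # spectral centre of gravity
    spread = float(np.sqrt(((freqs - centroid) ** 2 * weights).sum())) # std. dev. of the spectrum
    intensity = float(np.sqrt((frame ** 2).mean()))                    # RMS amplitude cue
    return {"centroid_hz": centroid, "spread_hz": spread, "rms": intensity}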

C. Data matching component

Initially, the development of the system involved creating a series of prototypes demonstrating each part of the complex functionality required, such as facial recognition, video/audio emotion recognition and speech recognition [10, 11]. The app was among the first to tackle the development of a mobile app for facial and emotion recognition and recommendation. Fig. 7 presents an embodiment of an augmentive interface that can display the results of the analysis in terms of face recognition. This application was developed based on the common component of VoisJet and VoisSafe.

The application in Fig. 7 was developed in C# and uses the concept of EigenFaces to initially detect someone the user knows (from a library they must accrue on their mobile device); then, as the person talks, the app detects the emotion in their facial expressions (again using EigenFaces concepts). Currently the app runs as a Windows Tablet PC app, with the next task being to re-develop it in Xamarin so that it can be published to Android or iOS devices.
Fig. 7. An augmentive interface for visualising 7 of the 12 emotion types
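Although the application itself is written in C#, the two-stage matching flow can be sketched in Python, using the opencv-contrib EigenFace recogniser purely as a stand-in for the authors' EigenFaces library: one model checks whether the face belongs to someone in the user's accrued library, and a second labels the facial emotion. The function names and model organisation are assumptions for illustration only.

import cv2
import numpy as np

def build_recognizer(images, labels):
    """Train an eigenface recogniser on equal-sized grey-scale face images
    (requires the opencv-contrib-python package)."""
    recognizer = cv2.face.EigenFaceRecognizer_create()
    recognizer.train(images, np.array(labels))
    return recognizer

def match_frame(frame_gray, identity_model, emotion_model, emotion_names):
    """Stage 1: is this someone the user knows? Stage 2: what facial emotion is
    shown, so it can be mapped onto one of the 12 emotion sets?"""
    person_id, person_distance = identity_model.predict(frame_gray)
    emotion_id, _ = emotion_model.predict(frame_gray)
    return {"person": person_id,
            "person_distance": person_distance,
            "emotion": emotion_names[emotion_id]}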
IV. DISCUSSION

This paper has discussed the improvement of EigenFace-based facial emotion recognition by continually streamlining the facial dataset used. This has resulted in improved results, but a move to a FACS (Facial Action Coding System) variant might be required. It was decided to avoid using Xamarin for the time being, because the cost-benefit was not favourable. Future steps will include creating a Windows Phone variant of the app created for tablets, initially running on a Nokia Lumia 630. At the time of writing, Microsoft has a number of new APIs available under the Cognitive Services banner [12]. One of these mimics the authors' EigenFaces library and appears to perform better in poor lighting and with poor cameras, making it possibly better suited to drone use. It is therefore paramount that, as the authors develop the system further, they keep each component separate, in case access to third-party libraries is required. A structure such as VoisOver, which allows third-party technologies to be plugged in, would mean the authors' code can remain separate from that of third-party plug-ins, namely specific algorithms for identifying the core 12 emotion inputs devised in contexts that might not even have been considered yet, which would result in further outputs. Such algorithms could work with the systems described in this paper to make their operation in at-risk educational environments even more possible.
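To illustrate the kind of separation a VoisOver-style structure would provide, the sketch below defines a minimal plug-in contract that third-party emotion recognition algorithms could implement, keeping the host code independent of any particular vendor. The class and function names are hypothetical and do not describe an existing VoisOver API.

from abc import ABC, abstractmethod

class EmotionRecognizerPlugin(ABC):
    """Hypothetical plug-in contract: each third-party algorithm (a commercial
    cloud face API, or the in-house EigenFaces library) implements this
    interface so the host never depends on a specific vendor."""

    @abstractmethod
    def recognise(self, face_image, audio_frame):
        """Return one of the twelve emoticon codes from Table I, or None."""

def classify(plugins, face_image, audio_frame):
    """Ask each registered plug-in in turn and take the first answer; a real
    host might instead vote or weight plug-ins by context."""
    for plugin in plugins:
        label = plugin.recognise(face_image, audio_frame)
        if label:
            return label
    return ":-|"   # fall back to a neutral code when nothing matches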
REFERENCES

[1] J. Elliott, "Comparing the acoustic properties of normal and shouted speech: a study in forensic phonetics," in Proc. SST-2000: 8th Int. Conf. Speech Sci. & Tech., pp. 154-159, 2000.
[2] A. Eriksson, "Tutorial on forensic speech science," in Interspeech, Lisbon, Portugal, 2005.
[3] S. Taitechawat and P. Foulkes, "Discrimination of speakers using tone and formant dynamics in Thai," in Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, China. Hong Kong: Organizers of ICPhS XVII at the Department of Chinese, Translation and Linguistics, City University of Hong Kong, pp. 1975-1981, 2011.
[4] H. Fraser, "The role of 'educated native speakers' in providing language analysis for the determination of the origin of asylum seekers," International Journal of Speech, Language & the Law, vol. 16, 2009.
[5] J. Bishop, "Reducing corruption and protecting privacy in emerging economies: The potential of neuroeconomic gamification and Western media regulation in trust building and economic growth," in Economic Behavior, Game Theory, and Technology in Emerging Markets, B. Christiansen, Ed. Hershey, PA: IGI Global, 2014, pp. 237-249.
[6] J. Bishop, "The Role of Affective Computing for Improving Situation Awareness in Unmanned Aerial Vehicle Operations: A US Perspective," in Synthesizing Human Emotion in Intelligent Systems and Robotics, J. Vallverdú, Ed. Hershey, PA: IGI Global, 2014, pp. 408-418.
[7] M.A. Turk and A.P. Pentland, "Face recognition using eigenfaces," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '91), pp. 586-591, 1991.
[8] V. Kshirsagar, M. Baviskar and M. Gaikwad, "Face recognition using Eigenfaces," in 2011 3rd International Conference on Computer Research and Development (ICCRD), pp. 302-306, 2011.
[9] D.A. Sauter, F. Eisner, A.J. Calder and S.K. Scott, "Perceptual cues in nonverbal vocal expressions of emotion," Q. J. Exp. Psychol., vol. 63, pp. 2251-2272, 2010.
[10] J. Bishop, "The Internet for educating individuals with social impairments," J. Comput. Assisted Learn., vol. 19, pp. 546-556, 2003.
[11] J. Bishop, "The Role of Augmented E-Learning Systems for Enhancing Pro-social Behaviour in Socially Impaired Individuals," in Assistive and Augmentive Communication for the Disabled: Intelligent Technologies for Communication, Learning and Teaching, L. Bee Theng, Ed. Hershey, PA: IGI Global, 2011, pp. 248-272.
[12] A.D. Wade and K. Wang, "The rise of the machines: Artificial intelligence meets scholarly content," Learned Publishing, vol. 29, pp. 201-205, 2016.
