
SEVENTH FRAMEWORK PROGRAMME

FP7-ICT-2011-1.5 Networked Media and Search Systems b) End-to-end Immersive and Interactive Media Technologies Specific Targeted Research Project

VENTURI
(FP7-288238)

immersiVe ENhancemenT of User-woRld Interactions

D3.1 User expectations and cross-modal interaction

Due date of deliverable: 31-01-2013 Actual submission date: [31-01-2013]

Start date of project: 01-10-2011 Duration: 36 months


Summary of the document

Document Code: D3.1 User expectations and cross-modal interaction - v0.4
Last modification: 31/01/2013
State: Add audio content creation
Participant Partner(s): INRIA, SONY
Editor & Authors (alphabetically): Editor: Jacques Lemordant. Authors: Alce, Günter (SONY), Hermodsson, Klas (SONY), Lasorsa, Yohan (INRIA), Liodenot, David (INRIA), Chippendale, Paul (FBK)
Fragment: No
Audience: Public
Abstract: This document is deliverable D3.1, "User expectations and cross-modal interaction". It presents user studies conducted to understand expectations of, and reactions to, content presentation methods for mobile AR applications, together with recommendations for realizing an interface and interaction design in accordance with user needs or disabilities.
Keywords: Interaction, cross-modal, Augmented Reality, navigation, visually impaired people, 3D audio
References: Refer to the corresponding section at the end of the deliverable

VENTURI Consortium 2011-2014



Document Control Page


Version number: V0.4
Date: 31/01/2013
Modified by: Paul Chippendale
Comments: Final Quality check
Status: draft / WP leader accepted / Technical coordinator accepted / Project coordinator accepted
Action requested: to be revised by partners involved in the preparation of the deliverable / for approval of the WP leader / for approval of the technical coordinator / for approval of the project coordinator
Deadline for action: 31/01/2013

Change history
Version 0.1 (11/10/2012, G. Alce): Preliminary version
Version 0.2 (21/12/2012, D. Liodenot): Integrated user study for audio and experiments with visually impaired people
Version 0.3 (02/01/2013, D. Liodenot): Add audio content creation
Version 0.4 (31/01/2013, P. Chippendale): Final Quality check


Table of Contents

Summary of the document
Document Control Page
Change history
Table of Contents
Executive Summary
   Scope
   Audience
   Summary
   Structure
1 Introduction
2 AR gaming
   2.1 Introduction
   2.2 AR gaming user study
   2.3 Method
      2.3.1 Participants
      2.3.2 AR Games
      2.3.3 Procedure
   2.4 Results and Discussions
      2.4.1 Questionnaire
      2.4.2 Interviews
   2.5 Conclusions
   2.6 Design recommendations
3 Audio integration & User study
   3.1 Audio content creation
      3.1.1 Choosing an audio format
      3.1.2 Preparing sounds for mobile usage
      3.1.3 Creating seamless loops
      3.1.4 Pre-rendering HRTF samples
   3.2 Audio integration & User tests using IXE demonstrator
      3.2.1 Scenario
      3.2.2 Audio scene description
      3.2.3 User tests
      3.2.4 Results and conclusion
   3.3 3D HRTF Audio integration & User tests
      3.3.1 Scenario
      3.3.2 Audio scene description
      3.3.3 User tests
      3.3.4 Results and conclusion
4 Experiments with visually impaired people (June and July 2012, Grenoble)
   4.1 Methodology
      4.1.1 Plan of a typical day
      4.1.2 Routes
      4.1.3 Interview post-tests
   4.2 Conclusion and recommendations
      4.2.1 Key points for user testing
      4.2.2 Vigilance points for the design
      4.2.3 Recommendations
5 Results and Conclusions
6 References
7 Appendix for AR gaming study
   7.1 Interview questions
   7.2 Questionnaires
   7.3 NASA-TLX
   7.4 Informed consent
   7.5 Graphs from questionnaires, phone form factor
   7.6 Answers from questionnaires, phone form factor
   7.7 Graphs from questionnaires, tablet form factor
   7.8 Answers from questionnaires, tablet form factor
   7.9 Graphs from NASA-TLX, phone form factor
   7.10 Answers from NASA-TLX, phone form factor
   7.11 Graphs from NASA-TLX, tablet form factor
   7.12 Answers from NASA-TLX, tablet form factor


Executive Summary
Scope
This document provides the deliverable content related to Task 3.1, "User expectations from Mixed Reality and cross-modal interaction".

Audience
This deliverable is public.

Summary
In this report, the objective is to investigate users' expectations of and reactions to content presented in a mixed-reality fashion. AR gaming, interactive audio scenes and navigation with visually impaired people are considered.

Structure
This deliverable is structured as follows: Section 1 is an introduction explaining the objective of the deliverable. Section 2 describes the methodology, the outcome and the design recommendations of the AR gaming user study. In section 3, the audio integration based on the MAUDL format and the associated user studies are described. Finally, section 4 considers audio for indoor and outdoor navigation and experiments with visually impaired people.

1 Introduction
User studies have been undertaken to understand the expectations and reactions of users to content presentation methods for mobile AR applications, taking into account usability aspects. The use-cases defined in WP2 provide the background for this study. In a user-centred design approach, current and future audio and visual technologies are explored to learn how to improve the efficiency and quality of AR applications and assistive AR technologies. In this report, AR gaming, interactive audio scenes and navigation are considered. The expectations of visually impaired people with regard to mixed reality applications are investigated, especially for pedestrian navigation applications.


2 AR gaming
This section describes the methodology, the outcome and design recommendations of the AR gaming user study.

2.1 Introduction
The user studies are part of an iterative design process (Figure 2.1). User-centred design means keeping the user at the centre of focus when developing and designing. The users should be directly involved from the beginning of the design process, where they can actually influence the design and not only be asked to help with validation. The outcome of the user studies should then be reused in the next version of the product.

FIGURE 2.1 THE ITERATIVE DESIGN PROCESS (diagram: in T3.1, a conceptual framework supports reviewing existing research, planning user studies, forming hypotheses, conducting user studies and analysis, with design recommendations and D3.1 as the outcome; in T3.2, it supports designing the UX, creating prototypes/mockups and evaluating, with prototypes, mockups, design recommendations, D3.2 and D3.3 as the outcome)


The use-cases defined in D2.1.1 were considered, and since the first-year use-case is based on gaming, a decision was made to investigate how people react to current marker-based AR games and to compare this with future marker-less games such as those illustrated by VeDi 0.9. Since we are still at an early stage of the project, we did not focus on quantitative results. Instead, our focus was on participants' opinions of the experience of AR gaming and on whether participants find the technology's flaws irritating. Subjective data was gathered through observations, semi-structured interviews, questionnaires and NASA-TLX. The NASA-TLX workload questionnaire enabled us to look at how difficult participants found the playability of each AR game [2].

2.2 AR gaming user study


User studies were conducted on existing marker-based AR games, since one of the objectives was to investigate the pros and cons of current AR interfaces and the game developed by VENTURI was not yet available. The idea was to understand how the user experience is influenced by technical instabilities and what users think about the new concept of AR gaming in terms of presentation and interaction.


When preparing user studies, a conceptual framework is useful both for the design phase and for the evaluation phase, in order to keep the user and the context in mind. A conceptual framework assigns priorities to issues during the design phase and provides clues about what to include in the evaluation. It also gives an understanding of real usage scenarios, especially when the usage scenario is not very clear. According to Wikipedia (2012-01-05): "Conceptual frameworks (theoretical frameworks) are a type of intermediate theory that attempts to connect to all aspects of inquiry (e.g., problem definition, purpose, literature review, methodology, data collection and analysis). Conceptual frameworks can act like maps that give coherence to empirical inquiry. Because conceptual frameworks are potentially so close to empirical inquiry, they take different forms depending upon the research question or problem." Examples of conceptual frameworks are Grounded Theory, Activity Theory (AT) and distributed cognition. For the AR gaming user study, AT was used both for preparing the user studies and for analysing the collected interview data.

2.3 Method
Our approach was to design the user studies like a case study, where the main objective is rather narrow and the number of participants is small. The focus was on in-depth investigation, multiple data sources and an emphasis on qualitative data [4]. Furthermore, to analyse the qualitative data an AT checklist [3] was used. The activity checklist is intended to elucidate the most important contextual factors of human-computer interaction. It is a guide to the specific areas to which a researcher or practitioner should pay attention when trying to understand the context in which a tool will be, or is, used. An AT checklist lays out a kind of contextual design space by representing the key areas of context specified by AT. The areas are:
- Means/Ends
- Environment
- Learning/cognition and articulation
- Development

2.3.1 Participants
We started by conducting interviews with six people. In the event that answers diverged too much, plans were made to interview more users. However, similar answers and views of marker-based AR gaming were identified and a level of saturation was achieved. An interesting population for the user studies is non-engineering people with an interest in new technology. The assumption was that people from marketing and administration have little or no technical background, and that those interested in participating in the user study are interested in new technology. An invitation mail was sent out to all administrative staff within Sony Mobile Communications. Pilots were used to ensure the relevance of the questions and to get an indication of how long the user study would last. Four of the participants were female and two were male, with ages varying between 30 and 47. An additional six participants tried out the AR games and answered the questionnaire and NASA-TLX. This was done to enlarge the number of participants filling in the questionnaires, in order to be able to illustrate trends with graphs.

2.3.2 AR Games
The objective was to let participants try both unstable and stable AR games. AR Blitz and AR Defender are two games that easily lose tracking, while Danger Copter and NerdHerder are more stable. It was decided to let participants test the games on both phone and tablet form factors. However, Danger Copter worked only on the Nexus phone, so it could not be tested on a tablet. Halfway through the studies, AR Blitz stopped working with the tablet, and NerdHerder was released at the same point; so, in order to have at least two games that could be played on the tablet, it was decided to let participants play NerdHerder with the tablet. The complete list of the games and the hardware form factors that were used is given below:
1. AR Blitz with phone (Sony Xperia P) (Figure 2.2)
2. AR Blitz with tablet (Sony Tablet S)
3. AR Defender with phone (Sony Xperia P) (Figure 2.3)
4. AR Defender with tablet (Sony Tablet S)
5. Danger Copter with phone (Nexus) (Figure 2.4)
6. NerdHerder with tablet (Sony Tablet S) (Figure 2.5)

FIGURE 2.2 AR BLITZ

FIGURE 2.3 AR DEFENDER

FIGURE 2.4 DANGER COPTER


FIGURE 2.5 NERDHERDER


The order of the games and form factors was varied. For example, if the first participant started with the phone form factor, the next one would start with the tablet form factor, and so on. The order of the games was varied in the same way.

2.3.2.1 AR Blitz
AR Blitz is a game where the user needs to hit shapes popping out of a hole (Figure 2.2). It is a very simple game where users must touch the screen to hit the different shapes. It sometimes has difficulties finding the marker. Users can move the phone, but only with small movements, otherwise it loses tracking. Video link: http://youtu.be/bSFo_U30lWw .

2.3.2.2 AR Defender
AR Defender is a game involving a tower, which users need to defend by using different weapons to target the enemies (Figure 2.3). Users need to move the phone to target the enemies and press a button to shoot. It is also sensitive and loses tracking easily. Video link: http://youtu.be/rB5xUStsUs4 .

2.3.2.3 Danger Copter


Danger Copter is a game where you are the pilot of a fire-fighting helicopter (Figure 2.4). Most of the interaction is done by moving the phone around in all directions (sideways, up and down, etc.). The game is very stable and very intuitive. Video link: http://youtu.be/LlFryaZwD6Y .

2.3.2.4 NerdHerder
In NerdHerder the user is an IT manager who needs to tempt the workers back to their office with a donut (Figure 2.5). The interaction in NerdHerder is similar to that in Danger Copter, but the metaphor is not as easy to understand. NerdHerder is also a very stable game and warns users when it is about to lose tracking, which is a great advantage. Video link: http://youtu.be/RSxImyFXSXw .

2.3.3 Procedure
The in-depth interviews were performed at the Usability Lab of Sony Mobile Communications. Each session started with some refreshments, in order to relax the participant and let her/him get used to both the environment and the moderator (one of the authors). After the refreshments, the session started with a short introduction to Azuma's definition of AR [1], followed by an explanation of the objective of the study. The user study session was one hour long and it was recorded with two video cameras and a table microphone (Figure 2.6).


FIGURE 2.6 THE SETUP OF THE USER STUDY


Two cameras were used in order to cover both the participant's view of the phone and the participant's face, so as to observe the participant's reactions. All participants signed an informed consent form (Appendix 7.4). The session continued as follows: the participant played a game using a phone or tablet device, followed by a semi-structured interview, a questionnaire and NASA-TLX. These steps were repeated for all tested AR games and for both phone and tablet form factors. The video material was transcribed and colour coding was then used to identify patterns in the participants' thoughts about AR presentation and interaction. After that, the Activity Theory checklist [3] was used to go through the transcribed material again.

2.4 Results and Discussions


This section presents and discusses results of questionnaires and interviews.

2.4.1 Questionnaire
In appendices 7.5 to 7.12, all results from the questionnaires for the phone and tablet form factors are presented. The results from the questionnaires show that the participants found it easy to understand how to play the AR games (figure 7.1); this is also evident in the NASA-TLX (figure 7.15). Most participants also found the games sufficiently stable. The answers concerning technical instabilities show some divergence; however, the majority think that the games are stable enough and that both the responsiveness and the way the camera picture followed the movements are adequate. It should be noted that the graphs presented in figures 7.1 to 7.7 are the results for the phone form factor. A similar trend can be seen for the tablet form factor, shown in figures 7.8 to 7.14 in the appendix. From the NASA-TLX questionnaires, it can be seen that the majority of participants think that the mental demand and the physical demand are low (figures 7.15 and 7.16), but note that the answers for temporal demand, effort and frustration diverge (figures 7.17, 7.19 and 7.20).

2.4.2 Interviews
The following topics emerged in the analysis of the interviews: interaction, engagement and environment. These topics are discussed below.


2.4.2.1 Interaction
All participants commented on the interaction techniques. In the beginning, participants found it strange to move the phone instead of touching the display as usual when playing. However, after a couple of minutes they got into it and started to discover new features, such as zooming in and out by simply moving the phone away from and towards the marker. Some of the participants pointed out that Danger Copter in particular is very intuitive, probably because they could immediately relate the movements to a real-life scenario.

2.4.2.2 Engagement
The majority of participants showed signs of engagement in the games. One such signal was the use of spontaneous exclamations during play: "come, I will save you", "don't run away", "I like the idea of playing in real surroundings", "wow, fun", "my kids should see this" (authors' translation). Participants also found the AR games fun. Despite this, however, none of the participants would consider playing them again. The main reasons are that the games are too simple and that you need markers to be able to play. The latter view is illustrated by the quote: "I would have to plan when to play, it would not be something I do spontaneously". It was fairly noticeable that the participants found the Danger Copter game much more engaging than the others, probably because the metaphor of being a helicopter pilot is much easier to relate to a real-life scenario, and participants liked the idea of rescuing people.

2.4.2.3 Environment
The participants raised issues related to the environment, in terms of the locations in which they prefer to play. This was related to the fact that most participants generally play mobile games before going to bed. This means that, in order to play a game with markers, the room needs to be bright and to have a flat surface, e.g. a table, on which to put the marker. One participant suggested that she would like to play while waiting for the bus if the markers were, for example, on the ground or placed in the environment.

2.5 Conclusions
The objective of the questionnaires and interviews was to find out how technical instabilities influence the user experience and to see how people react to the new way of interacting. Both the questionnaires and the interviews show that participants were irritated but still found the technical aspects, such as detection of the game board, location of the virtual objects and responsiveness, sufficiently stable. Perhaps this is because it is just a game and therefore not that important; if it were something important, such as buying a ticket or being guided to a place while in a hurry, it would be essential for the system to be stable. Moving the phone instead of just touching the display as usual felt like a strange interaction in the beginning; however, participants got used to it fairly quickly. Utilizing movements together with good scenarios that people can relate to is emphasized.

2.6 Design recommendations


This section summarises recommendations that emerged both from the AR gaming user studies and from the expert evaluation of VeDi 0.9. The expert evaluation is described in detail in deliverable report D6.4.
Recommendations from AR gaming:
- Avoid markers, or at least hide markers in the environment
- Incorporate real objects in the game
- Create games which relate to real-life scenarios


Recommendations from the expert evaluation:
- Give feedback when tracking is lost
- Give feedback when tracking is about to be lost
- Have on-screen prompts. For the VeDi game, relevant prompts could be "Pick up burger" and "Deliver burger"
- Have feedback sounds/vibration to make the user aware of a new goal
- Have a floating burger on top of the car to show the state of the game (and remove it when dropped off)
- Increase the time available to drop off the hamburger
- Make pickup and delivery easier
- Pause the countdown timer while tracking is lost
- Improve the location and visibility of the drop-off location
- Steer the car with a joystick instead of the current UI

3 Audio integration & User study


This section describes the audio integration based on the MAUDL format and the related user studies. We will describe how to create audio content specifically adapted to the mobile context. Then, we will show user tests and results for an interactive audio scene based on the VeDi 1.0 game scenario (burger delivery). Finally, user tests are performed using pre-rendered HRTF samples.

3.1 Audio content creation


Creating audio content for a mobile application must take into account the specifics of the target platform and the limitations of the audio API. Special care must also be taken when preparing the sound files to optimize the quality and clarity of their reproduction under any conditions: headphones, portable speakers or larger sound systems. This includes some prerequisites on audio file formats and sound processing.

3.1.1 Choosing an audio format


The choice of the output audio formats depends on many aspects:
- The target audio API, which must be able to handle the audio format.
- The hardware limitations of the target platform, as the available quantity of RAM may limit the choice of usable audio formats.
- The length of the sound and its trigger frequency: short and frequently triggered sounds should be provided in uncompressed or linearly compressed audio formats (like PCM or ADPCM), while very long audio content (such as background music) is better handled using good compression formats (like MP3, OGG or M4A).


- The playback type of the sound: for sounds that will need to be looped, the use of compressed formats is discouraged, since the compression and the overhead due to the decompression stage may alter the looping capability of the sound, resulting in pops and cracks.

When possible, the use of the ADPCM format for short or looping sounds is encouraged, since it reduces the size of output files with no noticeable penalty on playback audio quality. For sounds with a longer duration, MP3 is a good choice when available, as it can often be decoded in hardware and has low decoding overhead when it is not. The number of channels to use (generally 1 or 2) also depends on the audio content and the limitations of the platforms:
- When space for sounds is tight (in RAM or because of application size restrictions), it is better to use mono sounds (1 channel).
- For music or sound ambiances that are not meant to be spatialized using 3D rendering, using stereo sounds (2 channels) can enhance the sound quality and user experience. UI sounds may also benefit from the stereo format in some cases.
- For voice, sound effects or spatialized sounds, it is better to use mono (1 channel).

Finally, for the sample rate, most of the time you should stick to 44100 Hz to ensure the best sound quality. If sound size is really a concern, then for voice samples or simple sound effects you can reduce the sample rate to 22050 Hz without losing too much audio quality.
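As a minimal illustration of these format guidelines, the following sketch uses Python's standard wave and audioop modules (available up to Python 3.12) to downmix a 16-bit PCM WAV file to mono and resample it to 44100 Hz. The file names are hypothetical and the snippet is not part of the MAUDL tool chain; it only shows the kind of batch preparation implied above.

import wave
import audioop

def prepare_wav(src_path, dst_path, target_rate=44100):
    # Downmix a 16-bit PCM WAV file to mono and resample it to target_rate.
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        width = src.getsampwidth()        # bytes per sample (2 for 16-bit PCM)
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    if n_channels == 2:
        # Average left and right channels into a single mono channel
        frames = audioop.tomono(frames, width, 0.5, 0.5)

    if rate != target_rate:
        # Resample the mono stream to the target sample rate
        frames, _ = audioop.ratecv(frames, width, 1, rate, target_rate, None)

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(width)
        dst.setframerate(target_rate)
        dst.writeframes(frames)

# Hypothetical example: prepare a recording for use as a spatialized mono source
prepare_wav("obj_klaxon1_raw.wav", "obj_klaxon1.wav")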

3.1.2 Preparing sounds for mobile usage


As you may use audio content coming from various sources (recordings, the internet, sound banks), the volume and spectral balance of the sounds may differ drastically. It is therefore necessary to harmonize these sounds using audio processing. To perform this processing, audio editing software is required, such as Adobe Audition [5], Steinberg WaveLab [6] or Audacity [7], a good open-source and cross-platform solution for audio editing (http://audacity.sourceforge.net).


FIGURE 3.1 AUDACITY, AN OPEN-SOURCE AUDIO EDITOR


Denoising
When using audio content coming from recordings made with low-quality microphones, there may be a lot of noise in the sound. We then need to use the denoiser plugin or processor of the audio editor to reduce this unwanted noise, which occurs mostly in the higher frequencies. The denoising process is essentially done in two stages:
- First, determine the noise profile. To do this, select a blank part of your recording that contains no significant content other than noise, then train the denoiser tool with it.
- Then select the noise reduction to apply, in dB: 6-12 dB reduces the noise with no or few side artefacts, while greater values further reduce the noise but may alter the quality of the original recording. Once you have determined the optimal noise reduction value, apply the process to your entire sound.

Equalization
Audio content from various sources may have a very different spectral balance. It is particularly important to adjust the frequency spectrum of sounds targeted at mobile applications, since most of these will be played through the integrated speaker or low-quality earbuds. Special care must be taken with low frequencies, which eat most of the energy of sounds but will be inaudible on such listening systems. Too many high frequencies may become unpleasant to the ear, while too many medium frequencies may make the sound aggressive. A lack of these frequencies may make the sound dull or empty, though. Everything is a matter of balance and harmonization. Since equalization is a very subjective process and depends heavily on the initial audio content, it is hard to provide general advice or recipes; everything is best done using testing and A/B comparisons.


The equalization process can be done using equalizer plugins, which come in various forms. For this kind of processing, it is best to use a parametric equalizer and to limit the correction to removing or attenuating unwanted frequencies rather than boosting some frequency ranges. There are three typical parameters to understand for parametric equalizers:
- The base frequency of the filter is the centre frequency at which the equalizing process will occur.
- The Q factor (or quality factor) corresponds to how steep the correction curve will be, in other words how large the correction range around the base frequency will be. Higher values mean that the correction will be very localized and precise, while lower values affect greater frequency ranges.
- The correction factor, expressed in dB, determines how much the selected frequencies will be cut or boosted. Remember as a general rule that it is always better to cut frequencies than to boost them.
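To make these three parameters concrete, here is a small sketch of a single peaking (parametric) EQ band driven by a base frequency, a Q factor and a correction in dB, using the widely used audio-EQ cookbook biquad formulas. It assumes NumPy and SciPy and a mono signal in a float array; it is an illustration, not a replacement for the editor plugins discussed above.

import numpy as np
from scipy.signal import lfilter

def peaking_eq(samples, sample_rate, base_freq, q_factor, gain_db):
    # Apply one parametric EQ band (peaking filter) to a mono float signal.
    a_gain = 10.0 ** (gain_db / 40.0)            # amplitude from the dB correction
    w0 = 2.0 * np.pi * base_freq / sample_rate   # normalized centre frequency
    alpha = np.sin(w0) / (2.0 * q_factor)        # bandwidth derived from the Q factor

    # Biquad coefficients for a peaking EQ (audio-EQ cookbook)
    b = np.array([1.0 + alpha * a_gain, -2.0 * np.cos(w0), 1.0 - alpha * a_gain])
    a = np.array([1.0 + alpha / a_gain, -2.0 * np.cos(w0), 1.0 - alpha / a_gain])
    return lfilter(b / a[0], a / a[0], samples)

# Example: cut 4 dB around 250 Hz with a fairly narrow band (Q = 2)
# on one second of test noise sampled at 44100 Hz.
rate = 44100
noise = np.random.uniform(-0.5, 0.5, rate)
filtered = peaking_eq(noise, rate, base_freq=250.0, q_factor=2.0, gain_db=-4.0)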

Volume normalization
This is one of the most critical and tricky processes. Sound files from various sources can have drastic differences in perceived audio volume. There are two notions to separate here: sound intensity, corresponding to the actual peak volume of the audio content (expressed in dB, from 0 to -96 dB for 16-bit files), and perceived volume (expressed in RMS [Root Mean Square] dB), corresponding to an average of the perceived sound energy. Peak volume normalization (as found in most audio editors) does not adjust the volume differences between audio files; it only maximizes the volume based on the most intense audio peak in the file. On the other hand, perceived volume normalization may need special care depending on the audio editor and the audio content: in order to adjust the mean perceived volume, a compression of the sound dynamics must be performed, with the use of an audio limiter to prevent digital saturation of the sound. To perform this process, you can use a compressor/limiter plugin, two separate plugins, or a special RMS normalization plugin, as some audio editors provide. The target value for perceived volume normalization heavily depends on the content, but values around -10 or -12 RMS dB are good candidates for normalization of all kinds of audio content. You may even push it to -6 or -8 RMS dB for special effects or UI sounds, to add more impact.
Trimming and fading ends
The final preparation stage is to select in an appropriate manner the specifically desired sound part from the current audio file. Silences or noises at the beginning or the end of a sound may disturb the user or the timing of interactions, so you must make sure your sounds start and end at the right time. To do this, just delete the unwanted parts at the beginning and at the end of the audio file using the audio editor. After your sound is correctly trimmed, the last stage is to ensure it has a smooth start and end at the zero-value point, using fades. Using the built-in functions of your audio editor, perform a fade-in at the beginning and a fade-out at the end of your sound. Good all-around values for fades are around 5-10 ms. If you have sounds with ending audio tails, like reverberated sounds, you may want to perform a longer fade-out (1 second or more) in order to make the sound end smoothly. Applying fades is essential to avoid unpleasant clicks and cracks at the beginning and end of sounds when they are played.
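As a rough illustration of the difference between peak level and perceived (RMS) level, and of the short edge fades recommended above, here is a sketch assuming a mono signal stored as a NumPy float array in the -1..1 range. The -12 dB RMS target and the 10 ms fades are the example values from the text; the hard clip stands in for a real limiter.

import numpy as np

def rms_db(samples):
    # Perceived level estimate: RMS of the signal expressed in dB (0 dB = full scale).
    return 20.0 * np.log10(np.sqrt(np.mean(samples ** 2)))

def normalize_rms(samples, target_db=-12.0):
    # Scale the signal to the target RMS level, then hard-limit peaks to avoid clipping.
    gain = 10.0 ** ((target_db - rms_db(samples)) / 20.0)
    return np.clip(samples * gain, -1.0, 1.0)   # crude limiter stage

def apply_edge_fades(samples, sample_rate, fade_ms=10.0):
    # Apply short linear fade-in and fade-out so the sound starts and ends at zero.
    n = int(sample_rate * fade_ms / 1000.0)
    out = samples.copy()
    out[:n] *= np.linspace(0.0, 1.0, n)
    out[-n:] *= np.linspace(1.0, 0.0, n)
    return out

# Example on one second of quiet test noise at 44100 Hz
rate = 44100
sound = np.random.uniform(-0.1, 0.1, rate)
sound = apply_edge_fades(normalize_rms(sound, target_db=-12.0), rate)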


3.1.3 Creating seamless loops


The creation of perfectly looping audio files is not an easy task: you must have appropriate audio content to create seamless loops, with not too much variation in the sound, and perform manual mixing operations to make the looping perfect and pleasing to the ear. The basic theory of this process can be decomposed as follows:
- Find audio content that is suitable for looping (an ambiance, a repetitive sound, etc.).
- Locate a part of the audio content that would be appropriate for creating a loop: it must not have too much variation in volume and general tone. You must also make sure you have additional appropriate audio content before the place where the loop will actually be created, as it will be used to create the seamless transition.
- Create the loop markers in your audio editor, to serve as a reference.
- Now comes the tricky part: crossfade the end part of the loop with the audio content preceding the beginning of the loop (see Figure 3.2). The fade curves must be of constant energy (logarithmic). The best duration is determined by ear, but 10% of the loop duration is a good starting point. It is also better to keep the crossfade duration between 500 milliseconds and 5 seconds, so as not to alter the loop too much while keeping a pleasant transition.
- Make sure your loop begins and ends on zero-crossing points, to avoid clicks.
- Trim the audio content before and after your loop. Do not perform additional audio fades after this, or they will alter the seamlessness of your audio loop!

FIGURE 3.2 CROSSFADE FOR SEAMLESS LOOPING

The most difficult stage is the crossfading part. Some audio editors can automate this process for you; on others you have to perform the different steps manually:
- Perform a fade-out of a given duration (for example 1 second) on the end of the loop, with a logarithmic curve.
- Select and copy the same duration of audio content just before the actual loop, into a new audio file.
- Perform a fade-in of the same duration on this new file, using a logarithmic curve.


- Mix this new file at the end of the loop, as in Figure 3.2.

Sometimes this process must be tweaked a little to find the optimum results.
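The sketch below mirrors these manual steps on NumPy arrays of mono samples: it takes material located just before the loop start, applies equal-power (constant-energy) fade curves as a stand-in for the logarithmic fades mentioned above, and mixes it over the faded-out loop ending. It is an illustration under those assumptions, not a prescribed tool.

import numpy as np

def make_seamless_loop(samples, loop_start, loop_end, crossfade_len):
    # Return the loop region [loop_start:loop_end] with its ending crossfaded
    # against the material that precedes the loop, for seamless playback.
    assert loop_start >= crossfade_len, "need extra material before the loop start"

    loop = samples[loop_start:loop_end].copy()
    pre = samples[loop_start - crossfade_len:loop_start]   # material before the loop

    # Equal-power (constant-energy) fade curves
    t = np.linspace(0.0, 1.0, crossfade_len)
    fade_out = np.cos(t * np.pi / 2.0)   # applied to the loop ending
    fade_in = np.sin(t * np.pi / 2.0)    # applied to the pre-loop material

    loop[-crossfade_len:] = loop[-crossfade_len:] * fade_out + pre * fade_in
    return loop

# Example: a 10-second loop at 44100 Hz with a crossfade of 10% of the loop length
rate = 44100
ambiance = np.random.uniform(-0.3, 0.3, 30 * rate)   # stand-in for a recorded ambiance
loop = make_seamless_loop(ambiance, loop_start=5 * rate,
                          loop_end=15 * rate, crossfade_len=rate)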

3.1.4 Pre-rendering HRTF samples


If you want to pre-render various HRTF directions, for example to make a 3D audio beacon, you still have to prepare your sound with the processing stages explained in section 3.1.2. Once you have a final version of your sound, you may proceed to the HRTF pre-rendering. In order to do so, you have to get an audio plugin able to perform the HRTF processing, like IRCAM Spat [8] or Wave Arts Panorama [9]. Note that mono sounds are better suited for HRTF rendering.

FIGURE 3.3 WAVE ARTS PANORAMA, A HRTF SPATIALIZATION PLUGIN


Then you have to configure the plugin for your needs: choose the right HRTF to use (most plugins include a generalized HRTF profile named "Human" or similar), set up the reverb (a small reverb helps improve the 3D perception) and save this as a new preset. Finally, you apply the processing to your sound, changing the location of the source for each direction you want to pre-render and saving the result to a new audio file. You should end up with as many audio files as the directions you chose to use.
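At playback time, the application then only has to pick the pre-rendered file whose direction is closest to the bearing of the target relative to the listener. The sketch below is an assumption about how this selection could look for 16 files named hrtf_0.wav to hrtf_15.wav (as used by the beacon in section 3.2.2); the mapping of index 0 to "straight ahead" and the 22.5 degree step are illustrative choices, not the actual IXE implementation.

N_DIRECTIONS = 16  # one pre-rendered file per 22.5 degree step: hrtf_0.wav .. hrtf_15.wav

def hrtf_file_for_bearing(listener_heading_deg, bearing_to_target_deg):
    # Map the target direction, relative to where the listener is facing,
    # to the closest pre-rendered HRTF sample file (index 0 = straight ahead, assumed).
    relative = (bearing_to_target_deg - listener_heading_deg) % 360.0
    step = 360.0 / N_DIRECTIONS
    index = int(round(relative / step)) % N_DIRECTIONS
    return "hrtf_%d.wav" % index

# Example: listener facing north (0 degrees), delivery target due east (90 degrees)
print(hrtf_file_for_bearing(0.0, 90.0))   # hrtf_4.wav under this indexing assumption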

3.2 Audio integration & User tests using IXE demonstrator


3.2.1 Scenario
We have designed an interactive audio scene based on the VeDi 1.0 game scenario (burger delivery) to demonstrate the features of our content authoring system and sound manager based on XML and its future integration in the VeDi demonstrator. In order to do this, we have recreated the OpenStreetMap navigation network of the Venturi city model (figure 3.4).


FIGURE 3.4 OSM NETWORK OF THE CITY


Based on this OSM network, we will trigger audio events using our IXE navigation demonstrator. The audio events will be interpreted by the sound manager, based on the XML audio document in MAUDL format that we provide (see next section). This scenario is designed to illustrate the following features of the audio language:
- Event synchronization and triggering
- Simple stereo ambiances with distance attenuation (garden, busy street, calm street)
- 3D spatialized ambiances (restaurant, construction site)
- Sound object 3D spatialization with rear attenuation (klaxon / angry people, piano) or without (people, dog)
- Sound randomization (klaxon / angry people)
- Internal synchronization (piano with its reverb, door with ambiances)
- Sound queues with priority classes and validity discrimination (dog, people)
- Pre-rendered HRTF beacon indicating the delivery target using multiple sound sources
- Mix groups (ambiance, objects, UI)


To demonstrate these features, we have built a scenario based on the VeDi 1.0 game, for which the goal is to deliver a burger from one location to another using a car. Using our IXE demonstrator, the car (represented here by the black head) will move on the OSM network, following this predefined route:

FIGURE 3.5 SIMULATION ROUTE OF THE SCENARIO


The simulation starts on the right at the green pin and ends on the left at the red pin. The car will move following the path specified in blue. First it will go to pick up the burger to deliver at the restaurant. Once the burger is taken, the HRTF beacon starts to indicate the delivery location. On its way to the delivery target, the car will go past two events: a barking dog and a person talking. Depending on its speed, only the dog or both of these may be heard, as these sounds are put into a sound queue (see 3.2.2.3). The car will pass near the construction site and then arrive at the delivery point, which ends the beacon. The car will then move down the street, meet a random event (klaxon / angry guy) and then make a stop. The driver will get out of the car and enter the building. As he passes the door, the car engine and exterior ambiances stop. The driver will move forward to hear a piano player in a concert room with a lot of reverberated sound, walk around him and finally return outside. As he passes the door again, the exterior ambiances will be heard again and the car engine will restart. The car will finally move into the neighbourhood, make a U-turn as it was going the wrong way and then go out of the city.

3.2.2 Audio scene description


The audio scene is defined in the XML audio format, using object names and events mapped on the OSM document. This is the audio document we use in this scenario:
<?xml version="1.0" encoding="UTF-8"?>
<maudl xmlns="http://gforge.inria.fr/projects/iaudio/maudl/1.0" id="audio_stylesheet">
  <!-- use standard 3D rendering settings, with a little rear attenuation to improve focus effect -->
  <sounds rolloff="linear" listenerRearAttenuation="0.2" listenerPos="0 0 0" listenerLookAt="0 0 1" scale="1.0"
          classOrder="objective danger info">
    <!-- local spatialized ambiances -->
    <sound id="amb.restaurant" play="amb.restaurant.trigger obj.door.started" replay="stop" render3D="advanced"
           min="3" max="30" src="amb_restaurant.wav" loopCount="-1"/>
    <sound id="amb.construction" play="amb.construction.trigger obj.door.started" replay="stop" render3D="advanced"
           min="10" max="42" src="amb_construction.wav" loopCount="-1"/>
    <!-- piano ambiance with reverb -->
    <sound id="amb.piano" play="amb.piano.trigger" stop="obj.door.trigger" render3D="advanced"
           min="1" max="4" src="amb_piano.wav" loopCount="-1"/>
    <sound id="amb.piano.reverb" play="amb.piano.started" stop="amb.piano.ended amb.piano.stopped" render3D="no"
           min="1" max="4" src="amb_piano_reverb.wav" loopCount="-1" volume="0.1"/>
    <!-- global ambiances, with attenuation only -->
    <sound id="amb.busystreet" play="amb.busystreet.trigger obj.door.started" replay="stop" render3D="simple"
           pan3DFactor="0" min="5" max="75" src="amb_busystreet.wav" loopCount="-1"/>
    <sound id="amb.calmstreet" play="amb.calmstreet.trigger obj.door.started" replay="stop" render3D="simple"
           pan3DFactor="0" min="5" max="80" src="amb_calmstreet.wav" loopCount="-1" volume="0.5"/>
    <sound id="amb.garden" play="amb.garden.trigger obj.door.started" replay="stop" render3D="simple"
           pan3DFactor="0.5" min="5" max="50" src="amb_garden.wav" loopCount="-1"/>
    <!-- spatialized ponctual random object -->
    <sound id="obj.klaxon" pick="random" play="obj.klaxon.trigger" render3D="advanced" pos="0 0 0" min="5" max="20">
      <soundsource src="obj_klaxon1.wav"/>
      <soundsource src="obj_klaxon2.wav"/>
    </sound>
    <!-- car engine (not spatialized since the scene listener is on the car) -->
    <sound id="obj.car" play="navigation.start resume_navigation.play obj.door.trigger" stop="pause_navigation.play"
           replay="stop" render3D="none" loopCount="-1" volume="0.1" src="obj_car.wav"/>
    <!-- pre-rendered HRTF spatialized beacon for the target -->
    <sound id="obj.hrtf" loopCount="0" pick="fixed" render3D="simple" pan3DFactor="0.0"
           play="ui.takeburger.ended obj.hrtf.ended" stop="ui.deliverburger.trigger" min="1" max="150">
      <soundsource src="hrtf_0.wav"/>
      <soundsource src="hrtf_1.wav"/>
      <soundsource src="hrtf_2.wav"/>
      <soundsource src="hrtf_3.wav"/>
      <soundsource src="hrtf_4.wav"/>
      <soundsource src="hrtf_5.wav"/>
      <soundsource src="hrtf_6.wav"/>
      <soundsource src="hrtf_7.wav"/>
      <soundsource src="hrtf_8.wav"/>
      <soundsource src="hrtf_9.wav"/>
      <soundsource src="hrtf_10.wav"/>
      <soundsource src="hrtf_11.wav"/>
      <soundsource src="hrtf_12.wav"/>
      <soundsource src="hrtf_13.wav"/>
      <soundsource src="hrtf_14.wav"/>
      <soundsource src="hrtf_15.wav"/>
    </sound>
    <!-- entering inside building -->
    <sound id="obj.door" play="obj.door.trigger" render3D="none" src="obj_door.wav"/>
    <!-- user interaction sounds -->
    <sound id="ui.takeburger" enqueue="ui.takeburger.trigger:ui.queue" render3D="none" src="ui_takeburger.wav" class="objective"/>
    <sound id="ui.deliverburger" enqueue="ui.deliverburger.trigger:ui.queue" render3D="none" src="ui_deliverburger.wav" class="objective"/>
    <sound id="ui.dog" enqueue="uidog.trigger:ui.queue" render3D="simple" src="ui_dog.wav" class="danger"/>
    <sound id="ui.people" enqueue="ui.people.trigger:ui.queue:-1:3" render3D="simple" src="ui_people.wav" class="info"/>
    <sound id="ui.pause" play="ui.pause.trigger pause_navigation.play resume_navigation.play" render3D="none" src="ui_pause.wav"/>
  </sounds>
  <queues>
    <!-- priority queue for playing UI sounds -->
    <queue id="ui.queue" autoPlay="true" sort="class" timeBase="realtime"/>
  </queues>
  <mixers>
    <!-- mixer for ambiance sounds -->
    <mixer id="mix.ambiance" volume="0.9">
      amb.burger, amb.construction, amb.busystreet, amb.calmstreet, amb.garden, amb.piano, amb.piano.reverb
    </mixer>
    <!-- mixer for objects sounds -->
    <mixer id="mix.objects" volume="0.5">
      obj.klaxon, obj.car, obj.door
    </mixer>
    <!-- mixer for UI sounds -->
    <mixer id="mix.ui" volume="1.0">
      ui.queue, ui.paused
    </mixer>
  </mixers>
</maudl>

Most of the sounds specified here correspond to the elements declared in the OSM document, with a few exceptions (which will be explained later). We will now look in detail at the main features illustrated in this simulation.

3.2.2.1 3D Spatialization
There are various spatialization techniques used in this demonstration, with different goals:
- Global ambiances, with a large listening radius, only use distance attenuation on top of their initial stereo rendering (garden, busy street, calm street). The goal is that these global ambiances are heard when the listener is within their range, and attenuate when he goes farther from the source of the ambiance.
- Local ambiances and some sound objects are spatialized using 3D positioning and distance attenuation with rear attenuation enabled (restaurant, construction site, piano, klaxon). Since these sounds only occur locally, they are spatialized using a natural positioning and attenuation model. The rear attenuation focuses the listener's attention on what is in front of him.
- Some user interaction sounds are spatialized using simple 3D positioning with distance attenuation (dog, people). These represent local events, but since they are quite long sounds (people talking) they should still be heard while the car is moving away from them, which is why no rear attenuation is used here.
- The audio beacon indicating the delivery target is composed of many sound sources representing 16 pre-rendered directions using HRTF processing. The selection of the right sound source to play, depending on the listener's orientation, is done in the IXE application. Distance attenuation is then used to indicate whether the listener is close to or far from the destination.

These different techniques are used to illustrate the possibilities offered by the MAUDL format in various contexts, and to compare the drawbacks and benefits of each method.
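As an aid to reading the MAUDL attributes used above (rolloff="linear", listenerRearAttenuation="0.2", and the per-sound min/max radii), here is a sketch of how such a gain could be computed for one source. It is our interpretation of those settings for illustration purposes, not the actual sound manager code.

import math

def source_gain(listener_pos, listener_lookat, source_pos,
                min_dist, max_dist, rear_attenuation=0.2):
    # Linear distance rolloff between min and max, with an extra attenuation
    # applied to sources behind the listener (a reading of rolloff="linear"
    # and listenerRearAttenuation="0.2" assumed for this sketch).
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    dist = math.hypot(dx, dy)

    # Linear rolloff: full volume inside min_dist, silent beyond max_dist
    if dist <= min_dist:
        gain = 1.0
    elif dist >= max_dist:
        return 0.0
    else:
        gain = 1.0 - (dist - min_dist) / (max_dist - min_dist)

    # Rear attenuation: reduce the gain when the source is behind the listener
    if dist > 0.0:
        facing = (dx * listener_lookat[0] + dy * listener_lookat[1]) / dist
        if facing < 0.0:                       # negative dot product = behind
            gain *= 1.0 - rear_attenuation * (-facing)
    return gain

# Construction site ambiance (min=10, max=42) heard 20 m in front of the listener
print(source_gain((0, 0), (0, 1), (0, 20), min_dist=10, max_dist=42))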

3.2.2.2 Synchronization and interactions


The MAUDL format is entirely based on an event synchronization system derived from SMIL. External events (sent by the application) are used to trigger the audio objects during the car navigation. In addition, the audio objects themselves generate internal events when they start, stop, etc. All these events are used to build the dynamic soundscape of this demonstration. Internal events are used here to create interactions between sounds:
- When the car arrives at the restaurant to take the burger for delivery (ui.takeburger.trigger event), the sound ui.takeburger will be added to the sound queue ui.queue to be played. When it has finished playing (ui.takeburger.ended event), the HRTF beacon obj.hrtf will start playing in a loop. It will end as soon as the burger is delivered (ui.deliverburger.trigger event).
- When the listener goes past the door (obj.door.trigger event), the sound obj.door is played. When this sound is started (obj.door.started event) the first time, the ambiances already playing and the car engine sound obj.car will stop, and the piano sound amb.piano will start. The reverb of the concert room, amb.piano.reverb, is also synchronized with the start and stop of the piano sound. When the listener goes out of the building, the sound obj.door is started again, causing all the previously stopped ambiances and the car engine to start again.

Using internal events makes it easy to create complex interactions between sounds in a simple way.

3.2.2.3 Sound queue


In order to demonstrate the priority and filtering system of sound queues, we have created one named ui.queue to play the user interaction sounds. We have defined three priority classes, ordered by decreasing priority: objective, danger and info.
- The sounds ui.takeburger and ui.deliverburger are set in the objective class, as they must always be played with the highest priority. However, this is just for safety, and these sounds are not part of the test cases explained later.
- The sound ui.dog is set in the danger class, since dangerous cues should be notified with high priority.
- The sound ui.people is set in the info class, as it is only informative and may be skipped in favour of higher-priority sounds. This sound has a validity distance of 3 meters. If the listener has moved more than this distance before the sound can be played, it will be automatically skipped.

This setup allows three test cases to be performed to assert that the queue behaviour is the one expected:
- If the car moves very fast, the sounds ui.dog and ui.people will be put in the sound queue at the same time. Due to the higher priority class of the ui.dog sound, it will always be played first.
- If the car moves fast, the sound ui.dog will obviously be played, and the sound ui.people may be skipped completely if the listener goes farther than the 3-meter radius before the sound ui.dog has finished playing.
- If the car moves slowly and is still within the 3-meter radius after the sound ui.dog has finished playing, the sound ui.people will be played.

Finally, simply changing the movement speed of the car modifies the playback behaviour of the sound queue, which highlights the benefits of its usage in a navigation context.
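To make the queue semantics tangible, here is a small model (an assumption written for illustration, not the actual MAUDL sound manager) of a queue that orders sounds by priority class and drops entries whose validity distance has been exceeded, reproducing the test cases above.

from dataclasses import dataclass, field
import heapq

CLASS_ORDER = {"objective": 0, "danger": 1, "info": 2}   # lower value = higher priority

@dataclass(order=True)
class QueuedSound:
    priority: int
    sound_id: str = field(compare=False)
    enqueue_pos: tuple = field(compare=False)   # listener position when enqueued (x, y)
    validity_m: float = field(compare=False)    # -1 means "always valid"

class SoundQueue:
    def __init__(self):
        self._heap = []

    def enqueue(self, sound_id, sound_class, listener_pos, validity_m=-1.0):
        heapq.heappush(self._heap, QueuedSound(
            CLASS_ORDER[sound_class], sound_id, listener_pos, validity_m))

    def next_to_play(self, listener_pos):
        # Pop the highest-priority sound that is still valid at the current position.
        while self._heap:
            item = heapq.heappop(self._heap)
            dx = listener_pos[0] - item.enqueue_pos[0]
            dy = listener_pos[1] - item.enqueue_pos[1]
            moved = (dx * dx + dy * dy) ** 0.5
            if item.validity_m < 0 or moved <= item.validity_m:
                return item.sound_id
            # otherwise the sound is discarded, like ui.people after 3 meters
        return None

# Fast car: both cues are queued together, the danger-class dog is played first
queue = SoundQueue()
queue.enqueue("ui.people", "info", listener_pos=(0.0, 0.0), validity_m=3.0)
queue.enqueue("ui.dog", "danger", listener_pos=(0.0, 0.0))
print(queue.next_to_play((1.0, 0.0)))   # ui.dog
print(queue.next_to_play((5.0, 0.0)))   # None: ui.people skipped, listener moved > 3 m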

3.2.3 User tests


In order to test how various users perceive the audio of this demonstration scenario, we have put in place a series of tests based on a questionnaire. The goal of these tests is to better understand how the various audio elements are perceived, how effective the different spatialization methods are, and what can be done to improve user immersion in such scenarios.

3.2.3.1 Testing Methodology


We will present the demonstration scenario to a group of test users. The general soundscape context (areas of the city) and the simulation objective (driving inside a city, delivering a burger) are explained, and the unmarked map of the city is shown, so that the users have an idea of what they should expect to hear. They will hear the simulation a first time with the car moving at an average speed; then a questionnaire (see next section) will be given to them to answer. They will hear the simulation a second time before answering the questions.

3.2.3.2 Questionnaire
After they have heard the simulation a second time, they will be asked to answer each question using a scale ranging from one to ten, one corresponding to "I strongly disagree" and ten corresponding to "I strongly agree".
S1: I have a good spatial conception of the sound locations.
Before the users answer this question, we will ask them to concentrate on the spatialized ambiances (restaurant, construction site) and punctual sound objects like the piano.
S2: I can easily determine if I am getting closer to or farther from a sound source.
To answer this question, we will ask the users to focus on the ambiance sounds and punctual sound objects, for example the construction site.
S3: I can determine approximately in which area of the city the car is currently located.
During their second hearing of the simulation, we will make three pauses at different places on the map: before arriving at the restaurant, near the construction site, and after delivering the burger. Each time, we will ask the users whether they can approximately tell in which area of the city they think they currently are; after the end of the hearing we will show them the simulation route and the correct answers, so they can see whether they were right or not.
S4: I can easily determine if I am inside or outside.
During the briefing before the tests, the users will be told that at some point in the simulation the driver will enter a building, do something inside and return outside. We will ask them after the second hearing at which point they think this occurred, and tell them the correct answer (between the door sounds, when they hear the piano), so they can form their opinion.
S5: When I am hearing multiple sounds concurrently, I can distinguish and understand them without effort.
After the second hearing, we will ask them to tell in their own words what they think occurred during the whole scenario; after that we will remind them what the original scenario was, so they can compare and form their opinion.

3.2.4 Results and conclusion


We have performed the tests with a group of 12 users: 6 for the iOS demonstrator and 6 for the Android version. The mean score for each question can be found in the following graph:


[Bar chart: mean scores (scale 1-10) for questions S1-S5, iOS vs. Android]

From these results, it can be seen that the iOS and Android results are similar. The majority of participants could easily localize sound objects with the different spatialization methods. They understood that the volume attenuation allows them to determine whether they are moving closer to or farther from a sound source. The general soundscape, with area-specific ambiances, helps users to know quickly in which area of the city the car is. Thanks to the sounds chosen, all participants could easily determine whether they were inside or outside. Finally, some users found it sometimes difficult to distinguish and understand multiple sounds playing concurrently. The objective of the questionnaire was to find out how the various audio elements are perceived and how effective the different spatialization methods are. All participants think that adding audio makes the user experience in a game more immersive. The MAUDL XML format and its sound manager implementation are very useful for describing a rich soundscape. This demonstration scenario shows the various possibilities offered by the audio language and its usage in a concrete example. It is also a first step towards the audio integration into the VeDi game demonstrator.

3.3 3D HRTF Audio integration & User tests


3.3.1 Scenario
A second interactive audio scene, based on the VeDi 1.0 game scenario (burger delivery), was designed to test sound objects using pre-rendered HRTF samples for different directions. Figure 3.6 below shows four sound objects using HRTF samples (bird, cat, dog, church). The OpenStreetMap document describing the navigation network of the city contains these four sound objects as POIs. Circles show the areas where the objects can be heard.


FIGURE 3.6 MAP OF THE CITY WITH HRTF SOUND OBJECTS

Based on the OSM document, a node element localizes a sound object and defines a distance in meters used to trigger audio events:
<node id='-362' action='modify' visible='true' lat='0.004243508385054006' lon='0.010137663568737965'>
  <tag k='cat' v='hrtf' />
  <tag k='triggering' v='11' />
</node>

We trigger audio events using tag elements in our IXE navigation demonstrator. In this example, tags are parsed and the audio event cat.hrtf.trigger is created from <tag k='cat' v='hrtf' />. When the listener enters the triggering circle of this node (<tag k='triggering' v='11' />), the event is sent to the sound manager and interpreted based on the XML audio document in MAUDL format that we provide. To demonstrate the pre-rendered HRTF feature, we have built a scenario based on the VeDi 1.0 game, in which the goal is to move around the city from one location to another using a car. Using our IXE demonstrator, the car (represented here by the black head) moves on the OSM network, following either a predefined route or a route computed by the OSM router embedded in the application. Using our simulator, the car follows the path shown in blue. When the simulator computes a new location and orientation, this information is sent to the Sound Manager to set the listener location and orientation.


Then, our application looks at the POI information to trigger events for the newly simulated location. Figure 3.7 is a screenshot of the IXE demonstrator in which the simulator is following a predefined route.

FIGURE 3.7 SIMULATION ON A PRE-DEFINED ROUTE
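As an illustration of the triggering mechanism described above, the following sketch (plain C with hypothetical names, not the actual IXE code) checks the distance between the simulated listener and a POI node and posts the corresponding MAUDL event when the listener enters the triggering circle:

#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical POI record built from an OSM node and its tags. */
typedef struct {
    double x, z;            /* node position projected into scene coordinates */
    double trigger_radius;  /* value of the 'triggering' tag, e.g. 11 meters */
    const char *event;      /* e.g. "cat.hrtf.trigger", derived from k='cat' v='hrtf' */
    bool inside;            /* whether the listener was already inside the circle */
} Poi;

/* Called each time the simulator produces a new listener position. */
static void update_poi(Poi *poi, double listener_x, double listener_z) {
    double dx = poi->x - listener_x, dz = poi->z - listener_z;
    bool now_inside = sqrt(dx * dx + dz * dz) <= poi->trigger_radius;
    if (now_inside && !poi->inside)
        printf("post event: %s\n", poi->event);  /* here it would be sent to the sound manager */
    poi->inside = now_inside;
}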

3.3.2 Audio scene description


The audio scene is defined in the XML audio format, using object names and events mapped onto the OSM document. This is the audio document we use in this scenario:
<?xml version="1.0" encoding="UTF-8"?>
<maudl xmlns="http://gforge.inria.fr/projects/iaudio/maudl/1.0" id="audio_stylesheet">
  <!-- use standard 3D rendering settings, with a little rear attenuation to improve focus effect -->
  <sounds rolloff="linear" listenerRearAttenuation="0.2" listenerPos="0 0 0" listenerLookAt="0 0 1" scale="1.0" classOrder="objective danger info">

    <!-- car engine (not spatialized since the scene listener is on the car) -->
    <sound id="obj.car" play="navigation.start resume_navigation.play" stop="pause_navigation.play" replay="stop" render3D="none" loopCount="-1" volume="0.1" src="obj_car.wav"/>

    <!-- pre-rendered HRTF spatialized bird -->
    <sound id="bird.hrtf" loopCount="0" pick="fixed" render3D="simple" pan3DFactor="0.0" play="bird.hrtf.trigger bird.hrtf.ended" stop="pause_navigation.play" min="5" max="20">
      <soundsource src="hrtf_bird_0.wav"/> <soundsource src="hrtf_bird_1.wav"/> <soundsource src="hrtf_bird_2.wav"/> <soundsource src="hrtf_bird_3.wav"/>
      <soundsource src="hrtf_bird_4.wav"/> <soundsource src="hrtf_bird_5.wav"/> <soundsource src="hrtf_bird_6.wav"/> <soundsource src="hrtf_bird_7.wav"/>


      <soundsource src="hrtf_bird_8.wav"/> <soundsource src="hrtf_bird_9.wav"/> <soundsource src="hrtf_bird_10.wav"/> <soundsource src="hrtf_bird_11.wav"/>
      <soundsource src="hrtf_bird_12.wav"/> <soundsource src="hrtf_bird_13.wav"/> <soundsource src="hrtf_bird_14.wav"/> <soundsource src="hrtf_bird_15.wav"/>
    </sound>

    <!-- pre-rendered HRTF spatialized dog -->
    <sound id="dog.hrtf" loopCount="0" pick="fixed" render3D="simple" pan3DFactor="0.0" play="dog.hrtf.trigger dog.hrtf.ended" stop="pause_navigation.play" min="5" max="20">
      <soundsource src="hrtf_dog_0.wav"/> <soundsource src="hrtf_dog_1.wav"/> <soundsource src="hrtf_dog_2.wav"/> <soundsource src="hrtf_dog_3.wav"/>
      <soundsource src="hrtf_dog_4.wav"/> <soundsource src="hrtf_dog_5.wav"/> <soundsource src="hrtf_dog_6.wav"/> <soundsource src="hrtf_dog_7.wav"/>
      <soundsource src="hrtf_dog_8.wav"/> <soundsource src="hrtf_dog_9.wav"/> <soundsource src="hrtf_dog_10.wav"/> <soundsource src="hrtf_dog_11.wav"/>
      <soundsource src="hrtf_dog_12.wav"/> <soundsource src="hrtf_dog_13.wav"/> <soundsource src="hrtf_dog_14.wav"/> <soundsource src="hrtf_dog_15.wav"/>
    </sound>

    <!-- pre-rendered HRTF spatialized cat -->
    <sound id="cat.hrtf" loopCount="0" pick="fixed" render3D="simple" pan3DFactor="0.0" play="cat.hrtf.trigger cat.hrtf.ended" stop="pause_navigation.play" min="5" max="15">
      <soundsource src="hrtf_cat_0.wav"/> <soundsource src="hrtf_cat_1.wav"/> <soundsource src="hrtf_cat_2.wav"/> <soundsource src="hrtf_cat_3.wav"/>
      <soundsource src="hrtf_cat_4.wav"/> <soundsource src="hrtf_cat_5.wav"/> <soundsource src="hrtf_cat_6.wav"/> <soundsource src="hrtf_cat_7.wav"/>
      <soundsource src="hrtf_cat_8.wav"/> <soundsource src="hrtf_cat_9.wav"/> <soundsource src="hrtf_cat_10.wav"/> <soundsource src="hrtf_cat_11.wav"/>
      <soundsource src="hrtf_cat_12.wav"/> <soundsource src="hrtf_cat_13.wav"/> <soundsource src="hrtf_cat_14.wav"/> <soundsource src="hrtf_cat_15.wav"/>
    </sound>

    <!-- pre-rendered HRTF spatialized church -->
    <sound id="church.hrtf" loopCount="0" pick="fixed" render3D="simple" pan3DFactor="0.0" play="church.hrtf.trigger church.hrtf.ended" stop="pause_navigation.play" min="5" max="50">
      <soundsource src="hrtf_church_0.wav"/> <soundsource src="hrtf_church_1.wav"/> <soundsource src="hrtf_church_2.wav"/> <soundsource src="hrtf_church_3.wav"/>
      <soundsource src="hrtf_church_4.wav"/> <soundsource src="hrtf_church_5.wav"/> <soundsource src="hrtf_church_6.wav"/> <soundsource src="hrtf_church_7.wav"/>
      <soundsource src="hrtf_church_8.wav"/> <soundsource src="hrtf_church_9.wav"/> <soundsource src="hrtf_church_10.wav"/> <soundsource src="hrtf_church_11.wav"/>
      <soundsource src="hrtf_church_12.wav"/> <soundsource src="hrtf_church_13.wav"/> <soundsource src="hrtf_church_14.wav"/> <soundsource src="hrtf_church_15.wav"/>
    </sound>


</sounds>

  <mixers>
    <!-- mixer for ambiance sounds -->
    <mixer id="mix.ambiance" volume="0.9"> bird.hrtf, church.hrtf, dog.hrtf </mixer>
    <!-- mixer for objects sounds -->
    <mixer id="mix.objects" volume="0.5"> obj.car, cat.hrtf </mixer>
  </mixers>
</maudl>

Most of the sounds specified here use pre-rendered HRTF sound sources for 16 different directions, selected according to the listener position and orientation. The samples are created as described in section 4.4 Pre-rendering HRTF samples of T.5.1.2 3D Audio Content Creation. The sound sources are ordered clockwise to cover all directions from 0 to 360 degrees. For example, the first source <soundsource src="hrtf_church_0.wav"/> is the pre-rendered HRTF sample for a sound at 0 degrees (in front of the listener), and <soundsource src="hrtf_church_12.wav"/> is the HRTF sample for a sound at 270 degrees (to the left of the listener).

3.3.2.1 Synchronization and interactions


The MAUDL format is entirely based on an event synchronization system derived from SMIL. External events (sent by the application) are used to trigger the audio objects during the car navigation. In addition, the audio objects themselves generate internal events when they start, stop, etc. All these events are used to build the dynamic soundscape of this demonstration.
- External events trigger a sound to start playing (play="church.hrtf.trigger").
- Internal events are used here to loop a sound when it finishes playing (play="church.hrtf.ended").
In addition, an algorithm is implemented in the IXE demonstrator to select, for each HRTF sound object, which sound source to play according to the angle between the source and the listener positions:
// update all HRTF objects
for (ADSound *hrtfSound in hrtfSounds) {
    float hrtfAngle = 0.0;
    // set the listener as the center of our world
    point2f_t newPos = point2f_init(hrtfSound.position.x - soundManager.listenerPosition.x,
                                    hrtfSound.position.z - soundManager.listenerPosition.z);
    if (!(fequalzero(newPos.x) || fequalzero(newPos.y))) {
        // get the angle between the source and the listener positions
        hrtfAngle = atan2f(newPos.y, newPos.x);
        if (!(fequalzero(soundManager.listenerOrientation.x) && fequalzero(soundManager.listenerOrientation.z))) {
            // get the angle of the direction the listener is looking at
            float orientationAngle = atan2f(soundManager.listenerOrientation.x, soundManager.listenerOrientation.z);
            hrtfAngle = hrtfAngle - orientationAngle;
        }
    }
    hrtfAngle = deg_angle_normalize(RAD_TO_DEG(hrtfAngle) - 270);
    int dir = (int)roundf(hrtfAngle * 16.0 / 360.0) % 16;
    // set the next soundsource to play
    [hrtfSound setNextSoundsource:dir];
}


3.3.3 User tests


The goal of these tests is to evaluate how effective the 3D spatialization using pre-rendered HRTF 3D sounds is. We compare a scenario using HRTF 3D sounds with a scenario using simple stereo sounds and distance attenuation, in order to determine whether pre-rendered HRTF sounds improve the user experience.

3.3.3.1 Testing Methodology


We will present the demonstration scenarios to a group of test users. The soundscape (four sound objects) and the simulated route are explained, and the map of the city is shown, so the users have an idea of what they should expect to hear. A questionnaire (see next section) is then given to them, in order to help them concentrate on what we expect. They will hear the simulation a first time (using simple stereo sounds) with the car moving at an average speed, then answer the questionnaire. They will hear the simulation a second time (using HRTF 3D sounds) and answer the same questionnaire.

3.3.3.2 Questionnaire
After they have heard each scenario, they will be asked to answer each question using a scale ranging from one to ten, one corresponding to "I strongly disagree" and ten corresponding to "I strongly agree".

S6: I have a good spatial conception of the sound locations.
Before the users answer this question, we will ask them to show where the four sound objects are on the map.

S7: I have the feeling that the sources are outside of my head.
During the hearing, we will make a pause near the dog and the church to help users focus on these two sound objects.

S8: I can easily determine whether a source is in front of or behind the car.
During the hearing, we will make a pause near the bird and use the gyroscope to change the listener orientation. We will then ask the participants to say whether the sound object is in front of or behind them.

S9: The listening experience gives me the feeling of moving.
After the route simulation, we will ask them whether coming closer to the sound objects gives them the feeling of moving.

3.3.4 Results and conclusion


We have performed the tests with a group of 6 users. The mean score for each question can be found in the following graph:


From these results, we can compare the scenario using simple stereo sounds with the one using pre-rendered HRTF 3D sounds. In both cases, users can easily localize sound objects and have the feeling of moving. HRTF samples make the user experience more realistic, because participants have the feeling that the sound sources do not come from their headphones but rather from a speaker or an object close to them. HRTF samples also help users to determine whether a source is in front of or behind them. The MAUDL XML format and its sound manager implementation are very useful for describing a rich soundscape. This demonstration scenario shows the possibility of using pre-rendered HRTF 3D sounds in the audio language. From the questionnaire, we find that the audio perception is more realistic with HRTF 3D sounds, because users are less aware of wearing headphones and can easily determine where a sound is, in particular whether it is located in front of or behind them.

4 Experiments with visually impaired people (June and July 2012 Grenoble)
This section considers audio for indoor and outdoor navigation. The objective of these tests was to understand how visually impaired people would use a mobile phone audio navigation system on an indoor-outdoor route within an unknown environment.

4.1 Methodology
For the tests, conducted with professional ergonomists, we used a prototype of the application that will be developed in task 4.1 of WP4. The objective was threefold:
- Remove critical ergonomic errors from the audio-guide voice application, for example:
  a. Can the user navigate in an unknown environment with the audio guide?
  b. Does the application speak the same language as the user?
  c. Can the users explicitly control the various functions of the application?


- Evaluate how the users felt about the usefulness of the application:
  a. Perceived effectiveness in accomplishing the task
  b. Perceived ease of use
- Test the prototype with the goal of integrating it into a continuous improvement cycle

The tests were conducted with five visually impaired people, three of them using a white cane to sense obstacles, after validation of the route by a visually impaired pre-tester (the one shown in Figure 4.1).

FIGURE 4.1 VISUALLY IMPAIRED PRE-TESTER

4.1.1 Plan of a typical day


10 AM - 11 AM: Initiation stage - presentation of the application and of the experimentation context; training on the user interface and audio navigation (calibration and a walk in a corridor)
11 AM - 12 PM: Preparation of the course (Route A and Route B)
12 PM - 1.30 PM: Lunch
1.30 PM - 3 PM: Testing on both routes
3 PM - 4.30 PM: Interview after testing

4.1.2 Routes
Each route is performed twice, with a different 3D audio guide each time: a continuously enabled beacon versus a beacon activated only when the user stops walking. The beacon is a sound that indicates the orientation of the body relative to the direction of travel: the sound is more or less strong, and changes in frequency, according to the difference between the orientation of the body and the route. This use of 3D audio can be better understood by watching this video: http://www.youtube.com/watch?v=h2b8yfCauZ8
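As a rough illustration of how such a beacon could be parameterised, the sketch below (plain C; the mapping curves and ranges are assumptions, not the actual IXE design) maps the heading error between the body orientation and the route direction to a beacon volume and pitch:

#include <math.h>

/* Hypothetical beacon parameters derived from the heading error in degrees
 * (0 = walking straight along the route). */
typedef struct { float volume; float pitch_hz; } BeaconParams;

static BeaconParams beacon_params(float heading_error_deg) {
    float err = fabsf(heading_error_deg);
    if (err > 180.0f) err = 360.0f - err;         /* wrap the error to [0, 180] */
    BeaconParams p;
    p.volume   = 0.2f + 0.8f * (err / 180.0f);    /* stronger sound when far off course */
    p.pitch_hz = 440.0f + 2.0f * err;             /* frequency changes with the error */
    return p;
}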


A discovery phase was introduced at the start of testing. It consists of two parts:
- A walking-model calibration phase, which requires walking along a straight line of 30 meters (a rough sketch of this idea is given below)
- A straight-line route of 50 meters, during which the use of the beacon and the vocal instructions were presented, to help the user understand them
For the tests in real conditions, two routes were available. The testers were able to use the route simulator incorporated in the prototype application to learn a route before trying it in the real world.
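The walking-model calibration can be thought of as estimating an average step length from the known 30-meter line; the sketch below (plain C; a deliberately simplified assumption, since the actual PDR calibration used by IXE is more sophisticated) illustrates the basic idea:

/* Hypothetical step-length calibration: the pedometer counts steps while the
 * user walks a known 30 m line, and the average step length is then reused
 * by the dead-reckoning model. */
static float calibrate_step_length(int step_count, float known_distance_m) {
    if (step_count <= 0)
        return 0.7f;                              /* fall back to a typical default step length */
    return known_distance_m / (float)step_count;  /* meters per step */
}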

The table below presents the two routes used for testing:

Route A: From the bus stop to the INRIA reception. This route is 60 meters long, with two staircases. The first staircase is outside, with an unusual step length and no ramp (it will be made more accessible very soon). Segment 1 = 12 m; Segment 2 = 38 m. Stairs 1: 10 steps, 70 cm wide. Stairs 2: 9 steps, one landing of 1 m, then 9 steps.

Route B: Inside the INRIA building. This route is 153 meters long, in 2-meter-wide corridors and open spaces. Segment 1 = 20 m; Segment 2 = 50 m; Segment 3 = 18 m; Segment 4 = 25 m; Segment 5 = 40 m.


4.1.3 Interview post-tests


The following questions were used to guide the interview and to explore the utility of the application as perceived by the users after the test session:
- Was the application easy to use?
- Which features surprised you?
- What do you not like about the application?
- Did the application allow you to guide yourself effectively?
- Was its use comfortable?
- Did you understand quickly how the application works?
- Were the information and texts accessible and understandable?
- Do you like the texture of the sounds? Would you change it? Do you have any preferences?
- Did you enjoy using the application?
- What would you like to improve? (wording of instructions, texture of sounds, guide mode, calibration)

4.2 Conclusion and recommendations


4.2.1 Key points for user testing
Usability testing with the IXE INRIA navigation application led to five main conclusions:
- IXE should be improved at the calibration and localization level. The system calculating the position of the users is defeated by two identified factors: problems with walking speed, and irregular orientations reported by the smartphone hung on the torso of the user. This generates critical errors in the announcement of the instructions and in the use of 3D audio, and renders the system inoperative. System errors are also too heterogeneous for users to be able to adapt to the system.
- The wording of the instructions is efficient. Apart from a few easily correctable problems, the announced instructions are recognized and interpreted correctly by users. It is therefore the tracking system that requires more work.


- Route A is the best route, as the error was usually at most 3 meters. Route B was a source of errors for the system, which succeeded only three times out of ten in guiding users into room G220.
- The users found the system useful. All participants found the use of such a system in an indoor environment relevant. Moreover, they expect IXE to describe the environment more precisely, so that they can benefit from the exploration of a place or a building.
- IXE's UI, available on the smartphone and the headphones, can be improved and could be made more accessible to visually impaired people. It is currently impossible for a visually impaired person to calibrate the device or to access the management options for a route.

4.2.2 Vigilance points for the design


These remarks concern technical defects that hinder the efficient use of the system and do not depend on the design of the audio and/or visual interfaces.
- The calibration process does not take into account the real walking conditions of end users. It appears that blind people tend to walk unevenly, especially in unfamiliar places. The system must therefore adapt to this irregularity to allow an effective presentation of the audio instructions; otherwise it produces critical errors that make it unusable. It is therefore necessary to allow the system to anticipate this irregularity.
- The gyroscope is sensitive, and the system may malfunction if the phone is shaken by something other than the walking pace. It is therefore necessary to reduce the risk of misdirection through real-time filtering of the gyroscope data.
- The mobility cycle is not fully integrated. The model "walk - stop and take stock of the situation - go back or forward" is not fully taken into account, and this makes the walk not secure enough. The missing phase is "stop and take stock of the situation", because it is currently impossible to stop during the walk to review past and future sections of the route.
- The calibration phase must be integrated into the design of the interfaces. The current system does not allow a blind person to calibrate the system alone; it lacks the notions of distance and the voice guidance needed to define the beginning and the end of the calibration phase.

4.2.3 Recommendations

4.2.3.1 Calibration and respect of the course by the system
The calibration system does not operate correctly because of the irregular walking pace of the users. It must therefore overcome this irregularity by using a repositioning system (automatic or manual) on the map.
- The system must allow users to reposition the localisation system themselves in the environment when they have stopped walking. To do this, they must explore their surroundings (touch, asking a third party, etc.) to validate their position. This command must be available through the headphones or the screen of the smartphone; for example, users can scroll through the instructions or POIs via the buttons on the headset and then validate their choice by double clicking.
- Propose that users manually reposition themselves in key areas of the route through the use of buttons. We need to ask the user to validate sections of the course. For example, if the system tells the user to go down a stairway of 10 steps located 5 meters away, we must ask the user to confirm that he has reached the stairs. This implies that the user interrupts his progress to validate the obstacle.


It is also necessary that the route instructions of the audio guide incorporate elements that are sufficiently identifiable to allow the user to validate the stages of the route.
- Ideally, the system should automatically capture elements of the environment to reposition the user as he or she travels through waypoints. For example, the project is exploring the possibility of using the VENTURI system to place visual helper tags at key stages of the course.

4.2.3.2 Interpretation of Instructions


Users must understand the sounds and vocalizations used in the system. Visually impaired people interpret them differently because of the absence of vision; for example, they are better able to understand "a quarter turn to the right" than "turn right".
- Focus on a regular announcement of distances, for example 5 meters, 10 meters and 15 meters. Instructions using heterogeneous distances, such as "in 8 meters turn right, then in 1 meter turn left", should be avoided (see the sketch after this list).
- Avoid announcing predictive instructions at less than 10 meters.
- Give distances before actions, for example "in 10 meters, turn left". Instructions for a change of direction must be given in real time, slightly ahead of the turn, so that people can anticipate it.
- For a change of direction, provide instructions that include the representation of the user's body, for example: "upstairs, turn half to your left into corridor B".
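A minimal sketch of these wording rules (plain C, hypothetical helper; not part of IXE) could round announced distances to regular 5-meter steps and suppress predictive instructions under 10 meters:

#include <math.h>
#include <stdio.h>

/* Announce a turn with the distance given before the action, rounded to the
 * nearest 5 m; predictive instructions under 10 m are skipped (assumptions
 * taken from the recommendations above). */
static void announce_turn(float distance_m, const char *action /* e.g. "turn left" */) {
    if (distance_m < 10.0f)
        return;                                     /* no predictive announcement below 10 m */
    int rounded = (int)(roundf(distance_m / 5.0f)) * 5;
    printf("In %d meters, %s\n", rounded, action);
}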

4.2.3.3 Soundscape
The navigation system uses four types of audio: vocal instructions, announcements of POIs, sonification of the steps, and the positioning beacon. These four kinds of audio information can conflict when they overlap, in information-rich situations or when the system malfunctions, which causes confusion for the user.
- The sonification of steps reassures users when they start walking, or when they restart after a stop, but it is less useful while the user is walking, because he wants to focus on other sounds. We therefore propose to stop this sonification after the second vocal instruction.
- A beacon indicating the true heading is useful to help people go straight or turn. However, it depends on the quality of the direction computed from the gyroscopes. It is therefore necessary to know precisely when the algorithm is likely not to produce the correct heading, and to disable the beacon in that case.
- The announcement of POIs is made in a timely manner by the system. It is preferable to announce the label of a POI ("Office A") or of a group of POIs ("multiple offices") rather than simply announcing that there is a POI; for example, it is better to say "4 offices" rather than "4 POIs". The user can then stop and explore his environment following this first piece of information.
- The exploration phase must be based on three distances: less than a meter, between 1 and 5 meters, and more than 5 meters. It must adapt quickly to the rotation of the user and vocally indicate the spatial distribution of the POIs. To do this, the application must allow the user to choose one of the three distances and then listen to the short list of available POIs. During exploration, a POI is announced when the user is facing it, but not before (a sketch of this filtering is given below).
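The exploration behaviour described above can be sketched as follows (plain C, hypothetical data structures, and a 15-degree facing tolerance chosen arbitrarily; the real IXE exploration mode may differ):

#include <math.h>
#include <stdio.h>

typedef struct { const char *label; float distance_m; float bearing_deg; } PoiInfo;

/* Announce the POIs in the selected distance band (0: <1 m, 1: 1-5 m, 2: >5 m)
 * only when the user's heading points at them within the tolerance. */
static void explore(const PoiInfo *pois, int count, int band, float user_heading_deg) {
    for (int i = 0; i < count; i++) {
        float d = pois[i].distance_m;
        int in_band = (band == 0 && d < 1.0f) ||
                      (band == 1 && d >= 1.0f && d <= 5.0f) ||
                      (band == 2 && d > 5.0f);
        float delta = fabsf(pois[i].bearing_deg - user_heading_deg);
        if (delta > 180.0f) delta = 360.0f - delta;     /* shortest angular difference */
        if (in_band && delta <= 15.0f)
            printf("%s, %.0f meters\n", pois[i].label, d);  /* announced only when faced */
    }
}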


4.2.3.4 Recommendations for the accessibility of IXE under iOS


VoiceOver [10] on iOS is a very effective system; however, it requires significant exploration time with the user's finger, so items must be findable quickly.
- Increase the size of the buttons in the interface to facilitate the exploration and selection of components with VoiceOver. A good button size is 2 x 2 cm.
- Create buttons to scroll lists, for example "up" and "down" buttons; this allows faster access in VoiceOver mode.
- For more flexibility, integrate two ways of activating each action: one on the headphones and one on the screen of the smartphone.
- Provide headphones with buttons that are easily distinguishable by touch, for a better understanding of the commands.

5 Results and Conclusions


User studies and expert evaluations were undertaken to understand users' expectations and reactions in order to improve the efficiency and quality of AR applications. This report is an input to T3.2 and WP2. It provides results, recommendations and requirements for AR gaming (see 2.5 and 2.6), interactive audio scenes (see 3.2.4 and 3.3.4) and navigation (see 4.2), taking into account current and future visual and audio technologies in order to address user needs and disabilities as fully as possible.

6 References
[1] Azuma, R., A Survey of Augmented Reality, August 1997.
[2] Cairns, P., Cox, A. L., Research Methods for Human-Computer Interaction, Cambridge University Press, 2008, 12.
[3] Kaptelinin, V., Nardi, B., Macaulay, C., The Activity Checklist: A Tool for Representing the Space of Context, interactions, July 1999.
[4] Lazar, J., Feng, H. J., Hochheiser, H., Research Methods in Human-Computer Interaction, Wiley, 2010, 144-150.
[5] Adobe Audition, http://www.adobe.com/fr/products/audition.html
[6] Steinberg Wavelab, http://www.steinberg.net/en/products/wavelab.html
[7] Audacity, http://audacity.sourceforge.net
[8] IRCAM Spat, http://www.fluxhome.com/products/plug_ins/ircam_spat
[9] WaveArts Panorama, http://wavearts.com/products/plugins/panorama/
[10] VoiceOver, http://www.apple.com/accessibility/voiceover/
[11] Heller, F., Borchers, J., Corona: Audio Augmented Reality in Historic Sites, MobileHCI Workshop on Mobile Augmented Reality: Design Issues and Opportunities, Stockholm, Sweden, August 2011.


7 Appendix for AR gaming study


7.1 Interview questions
How was your gameplay experience?
  o Fun?
  o Difficult?
  o Was something annoying?
  o Enjoyable?

Do you play mobile phone games?
  o What kind of mobile games do you enjoy playing?
  o When?
  o While waiting / on public transport?
  o How often?

Do you think you would play these games?
  o When?
  o Where?
  o How often?
  o If the marker were not needed?

Do you usually buy new inventions, or do you wait until the technology is mature?

Can you comment on how easy or hard it was to aim before shooting / smashing?

Did you notice any technical malfunctions?
  o Which ones?
  o Was the recognition of the marker lost?
  o Instability of the spatial placement of graphic overlays?
  o Camera picture lag?
  o Correct placement of graphic overlays?

Did some events cause disturbance for you?
  o Why?
  o Why not?


Is this a game that you would recommend to your girlfriend or boyfriend?

Describe the advantages/disadvantages of using a phone versus a tablet.


7.2 Questionnaires
Questionnaire, AR game: ___________________ with _____________

Please mark how these statements fit your experience.

Q1: I easily understood how to play the game. Agree ______________________ Disagree
Q2: I became physically tired of playing the game. Agree ______________________ Disagree
Q3: Technical instabilities irritated me during the game. Agree ______________________ Disagree
Q4: The detection of the game board was stable enough for me. Agree ______________________ Disagree
Q5: The location of the virtual objects was stable enough for me. Agree ______________________ Disagree
Q6: The game responded fast enough on my input. Agree ______________________ Disagree
Q7: The camera picture followed my movements. Agree ______________________ Disagree

Please write your positive experiences of playing the game. ____________________________________________________________________________________________ ____________________________________________________________________________________________ ______________________________________________________________

Please write your negative experiences of playing the game. ____________________________________________________________________________________________ ____________________________________________________________________________________________ ______________________________________________________________

Other comments ____________________________________________________________________________________________ ____________________________________________________________________________________________ ______________________________________________________________


7.3 NASA-TLX


7.4 Informed consent


Informed consent for the study of Augmented Reality gaming

I hereby confirm that I have received information about the study mentioned above. I am aware that:

the information about me that is collected as part of the study will only be studied by the research team at Lund University (Design sciences) and the research team of Sony Mobile Communications involved in the study

the recorded video from the study will only be viewed by those who are involved in the study

participation is voluntary and that I may at any time end my participation in the study

Date:______________

Name:__________________________________________________________________

Signature:___________________________________________________________


7.5 Graphs from questionnaires phone form factor


The graphs below present the results for each question answered by the participants for the phone form factor.

FIGURE 7.1 RESULTS FOR Q1: I EASILY UNDERSTOOD HOW TO PLAY THE GAME

FIGURE 7.2 RESULTS FOR Q2: I BECAME PHYSICALLY TIRED OF PLAYING THE GAME

FIGURE 7.3 RESULTS FOR Q3: TECHNICAL INSTABILITIES IRRITATED ME DURING THE GAME


FIGURE 7.4 RESULTS FOR Q4: THE DETECTION OF THE GAME BOARD WAS STABLE ENOUGH FOR ME

FIGURE 7.5 RESULTS FOR Q5: THE LOCATION OF THE VIRTUAL OBJECTS WAS STABLE ENOUGH FOR ME

FIGURE 7.6 RESULTS FOR Q6: THE GAME RESPONDED FAST ENOUGH ON MY INPUT


FIGURE 7.7 RESULTS FOR Q7: THE CAMERA PICTURE FOLLOWED MY MOVEMENTS


7.6 Answers from questionnaires phone form factor


The tables below present the answers given by the participants for each game on the phone form factor.

TABLE 7.1 S HOWS PARTICIPANTS ANSWERS FOR AR B LITZ AR Blitz Phone Disagree I easily understood how to play the game. I became physically tired of playing the game. Technical instabilities irritated me during the game. The detection of the game board was stable enough for me. The location of the virtual objects was stable enough for me. The game responded fast enough on my input. The camera picture followed my movements. TABLE 7.2 S HOWS PARTICIPANTS ANSWERS FOR AR DEFENDER AR Defender Phone Disagree I easily understood how to play the game. I became physically tired of playing the game. Technical instabilities irritated me during the game. The detection of the game board was stable enough for me. The location of the virtual objects was stable enough for me. The game responded fast enough on my input. The camera picture followed my movements. TABLE 7.3 S HOWS PARTICIPANTS ANSWERS FOR DANGER COPTER Danger copter Phone Disagree I easily understood how to play the game. I became physically tired of playing the game. Technical instabilities irritated me during the game. The detection of the game board was stable enough for me. The location of the virtual objects was stable enough for me. The game responded fast enough on my input. The camera picture followed my movements. 0 6 9 0 0 0 0 0 5 0 0 0 1 1 1 1 3 3 3 0 1 Agree 11 0 0 9 9 11 10 0 9 7 1 1 0 0 0 0 2 1 1 0 0 1 3 2 4 1 1 1 Agree 11 0 1 6 9 11 11 0 5 2 0 0 0 0 0 1 0 1 0 0 0 0 0 2 2 2 0 1 Agree 6 0 2 3 4 6 5


7.7 Graphs from questionnaires tablet form factor


The graphs below present the results for each question answered by the participants for the tablet form factor.

FIGURE 7.8 RESULTS FOR Q1: I EASILY UNDERSTOOD HOW TO PLAY THE GAME

FIGURE 7.9 RESULTS FOR Q2: I BECAME PHYSICALLY TIRED OF PLAYING THE GAME

FIGURE 7.10 RESULTS FOR Q3: TECHNICAL INSTABILITIES IRRITATED ME DURING THE GAME


FIGURE 7.11 RESULTS FOR Q4: THE DETECTION OF THE GAME BOARD WAS STABLE ENOUGH FOR ME

FIGURE 7.12 RESULTS FOR Q5: THE LOCATION OF THE VIRTUAL OBJECTS WAS STABLE ENOUGH FOR ME

FIGURE 7.13 RESULTS FOR Q6: THE GAME RESPONDED FAST ENOUGH ON MY INPUT


FIGURE 7.14 RESULTS FOR Q7: THE CAMERA PICTURE FOLLOWED MY MOVEMENTS


7.8 Answers from questionnaires tablet form factor


The tables below present the answers given by the participants for each game on the tablet form factor.

TABLE 7.4 S HOWS PARTICIPANTS ANSWERS FOR AR B LITZ TABLET AR Blitz Tablet Disagree I easily understood how to play the game. I became physically tired of playing the game. Technical instabilities irritated me during the game. The detection of the game board was stable enough for me. The location of the virtual objects was stable enough for me. The game responded fast enough on my input. The camera picture followed my movements. 0 1 0 1 0 1 0 1 0 0 2 2 0 0 0 2 3 0 0 1 1 Agree 2 0 0 0 1 1 2

TABLE 7.5 S HOWS PARTICIPANTS ANSWERS FOR AR DEFENDER TABLET AR Defender Tablet Disagree I easily understood how to play the game. I became physically tired of playing the game. Technical instabilities irritated me during the game. The detection of the game board was stable enough for me. The location of the virtual objects was stable enough for me. The game responded fast enough on my input. The camera picture followed my movements. TABLE 7.6 S HOWS PARTICIPANTS ANSWERS FOR N ERDH ERDER TABLET NerdHerder Tablet Disagree I easily understood how to play the game. I became physically tired of playing the game. Technical instabilities irritated me during the game. The detection of the game board was stable enough for me. The location of the virtual objects was stable enough for me. The game responded fast enough on my input. The camera picture followed my movements. 2 2 6 1 2 1 1 0 1 2 1 0 1 0 2 3 0 0 0 1 2 Agree 5 3 1 7 7 6 6 1 8 4 4 7 3 2 0 1 1 1 1 1 2 3 2 0 0 0 2 1 Agree 8 1 7 7 4 6 7


7.9 Graphs from NASA-TLX phone form factor


The graphs below present the NASA-TLX results for each question answered by the participants for the phone form factor.

FIGURE 7.15 RESULTS FOR MENTAL DEMAND

FIGURE 7.16 RESULTS FOR PHYSICAL DEMAND


FIGURE 7.17 RESULTS FOR TEMPORAL DEMAND

FIGURE 7.18 RESULTS FOR PERFORMANCE


FIGURE 7.19 RESULTS FOR EFFORT

FIGURE 7.20 RESULTS FOR FRUSTRATION


7.10 Answers from NASA-TLX phone form factor


The tables below present the NASA-TLX answers given by the participants for each game on the phone form factor.

TABLE 7.7 S HOWS PARTICIPANTS ANSWERS FOR AR B LITZ AR Blitz Phone VL 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 VH Mental Demand (How mentally demanding was the task?) Physical Demand (How physically demanding was the task?) Temporal Demand (How hurried or rushed was the pace of the task?) Performance (How successful were you in accomplishing what your were asked to do?) Effort (How hard did you have to work to accomplish your level of performance?) Frustration (How insecure, discouraged, irritated, stressed, and annoyed were you?) 1 0 2 0 0 2 0 0 0 3 1 0 0 0 1 1 0 0 0 0 0 0 1 2 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 1 0 1 0 0

0 0 0 0 1 1 0 1 1

2 0 1 1 0 0 0 0 0

TABLE 7.8 S HOWS PARTICIPANTS ANSWERS FOR AR DEFENDER AR Defender Phone VL 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 VH Mental Demand (How mentally demanding was the task?) Physical Demand (How physically demanding was the task?) Temporal Demand (How hurried or rushed was the pace of the task?) Performance (How successful were you in accomplishing what your were asked to do?) Effort (How hard did you have to work to accomplish your level of performance?) Frustration (How insecure, discouraged, irritated, stressed, and annoyed were you?) 2 2 0 1 1 2 1 0 0 5 2 1 0 0 2 0 0 1 0 2 0 1 1 1 1 1 0 1 1 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 2 0 1 2 0 1 1

2 0 0 1 0 3 0 1 0

4 1 3 0 0 1 0 0 0


TABLE 7.9 S HOWS PARTICIPANTS ANSWERS FOR DANGER COPTER Danger copter Phone VL 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 VH Mental Demand (How mentally demanding was the task?) Physical Demand (How physically demanding was the task?) Temporal Demand (How hurried or rushed was the pace of the task?) Performance (How successful were you in accomplishing what your were asked to do?) Effort (How hard did you have to work to accomplish your level of performance?) Frustration (How insecure, discouraged, irritated, stressed, and annoyed were you?) 2 0 1 0 2 0 2 1 2 2 2 1 0 1 1 1 0 1 1 0 0 1 2 1 0 0 1 1 1 2 1 2 1 0 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 1 1 0 2 0 1 3 0

2 0 0 1 1 0 1 2 0

4 2 1 0 1 1 1 0 0


7.11 Graphs from NASA-TLX tablet form factor


The graphs below present the NASA-TLX results for each question answered by the participants for the tablet form factor.

FIGURE 7.21 RESULTS FOR MENTAL DEMAND

FIGURE 7.22 RESULTS FOR PHYSICAL DEMAND


FIGURE 7.23 RESULTS FOR TEMPORAL DEMAND

FIGURE 7.24 RESULTS FOR PERFORMANCE


FIGURE 7.25 RESULTS FOR EFFORT

FIGURE 7.26 RESULTS FOR FRUSTRATION


7.12 Answers from NASA-TLX tablet form factor


The tables below present the NASA-TLX answers given by the participants for each game on the tablet form factor.

TABLE 7.10 S HOWS PARTICIPANTS ANSWERS FOR AR BLITZ TABLET AR Blitz Tablet VL 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 VH Mental Demand (How mentally demanding was the task?) Physical Demand (How physically demanding was the task?) Temporal Demand (How hurried or rushed was the pace of the task?) Performance (How successful were you in accomplishing what your were asked to do?) Effort (How hard did you have to work to accomplish your level of performance?) Frustration (How insecure, discouraged, irritated, stressed, and annoyed were you?) 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 1 0 0 0 0 1

0 0 0 0 0 2 0 1 0

0 0 0 0 0 0 0 0 1

TABLE 7.11 S HOWS PARTICIPANTS ANSWERS FOR AR DEFENDER TABLET AR Defender Tablet VL 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 VH Mental Demand (How mentally demanding was the task?) Physical Demand (How physically demanding was the task?) Temporal Demand (How hurried or rushed was the pace of the task?) Performance (How successful were you in accomplishing what your were asked to do?) Effort (How hard did you have to work to accomplish your level of performance?) Frustration (How insecure, discouraged, irritated, stressed, and annoyed were you?) 3 0 1 1 0 1 0 2 0 2 1 1 2 1 2 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1

1 0 0 2 0 2 1 0 1

0 2 1 0 0 0 0 1 0

0 0 1 0 1 2 0 0 0

3 2 0 0 0 0 1 0 0


TABLE 7.12 S HOWS PARTICIPANTS ANSWERS FOR N ERDH ERDER TABLET NerdHerder Tablet VL 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 VH Mental Demand (How mentally demanding was the task?) Physical Demand (How physically demanding was the task?) Temporal Demand (How hurried or rushed was the pace of the task?) Performance (How successful were you in accomplishing what your were asked to do?) Effort (How hard did you have to work to accomplish your level of performance?) Frustration (How insecure, discouraged, irritated, stressed, and annoyed were you?) 2 0 0 2 0 0 1 0 0 1 1 1 0 0 1 0 0 0 0 1 2 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1

1 0 1 2 0 0 0 1 2

1 1 0 0 1 0 1 1 0

0 1 0 0 1 0 1 0 0

2 1 1 0 0 0 0 0 0


SEVENTH FRAMEWORK PROGRAMME


FP7-ICT-2011-1.5 Networked Media and Search Systems b) End-to-end Immersive and Interactive Media Technologies Specific Targeted Research Project

VENTURI
(FP7-288238)

immersiVe ENhancemenT of User-woRld Interactions

[D3.2 Interface design prototypes and/or mock ups]

Due date of deliverable: [31-01-2013] Actual submission date: [29-01-2013]

Start date of project: 01-10-2011 Duration: 36 months


Summary of the document


Document Code: D3.2 Interface design prototypes and/or mock ups v0.4
Last modification: 29/01/2013
State: Quality checked
Participant Partner(s): Sony Mobile Communications, INRIA
Editor: Jacques Lemordant
Authors (alphabetically): Günter Alce (SONY), Klas Hermodsson (SONY), Yohan Lasorsa (INRIA), David Liodenot (INRIA), Thibaud Michel (INRIA), Mathieu Razafimahazo (INRIA), Paul Chippendale (FBK)

Fragment: Yes/No
Audience: public
Abstract: This document is deliverable D3.2 Interface design prototypes and/or mock ups and presents tools and prototypes to show new interface and interaction design.
Keywords: Wizard of Oz, user interface, interaction design, navigation audio beacon, HMD, head tracker, headphone controls, QR code
References: Refer to the corresponding section at the end of the deliverable


Document Control Page


Version number: V0.4
Date: 29/01/2013
Modified by: Paul Chippendale
Comments: Ready for Quality checks
Status: draft / WP leader accepted / Technical coordinator accepted / Project coordinator accepted
Action requested: to be revised by partners involved in the preparation of the deliverable / for approval of the WP leader / for approval of the technical coordinator / for approval of the project coordinator
Deadline for action: 29/01/2013

Change history
Version 0.1 - 28/12/2012 - G. Alce - Preliminary version
Version 0.2 - 15/01/2013 - D. Liodenot - Add sections 4 to 6
Version 0.3 - 27/01/2013 - D. Liodenot - Add introduction and conclusion
Version 0.4 - 29/01/2013 - P. Chippendale - Quality check


Table of Contents
Summary of the document .......................................................... 2
Document Control Page ............................................................. 3
Change history .................................................................... 3
Table of Contents ................................................................. 4
Executive Summary ................................................................. 6
  Scope ........................................................................... 6
  Audience ........................................................................ 6
  Summary ......................................................................... 6
  Structure ....................................................................... 6
1 Introduction .................................................................... 6
2 Wizard of Oz .................................................................... 7
  2.1 Research material ........................................................... 7
  2.2 Wizard of Oz tool ........................................................... 8
    2.2.1 Introduction ............................................................ 8
    2.2.2 Setup ................................................................... 9
    2.2.3 Connection .............................................................. 9
    2.2.4 Wizard .................................................................. 10
    2.2.5 Puppet .................................................................. 10
    2.2.6 Features ................................................................ 10
3 IXE: Interactive eXtensible Engine .............................................. 15
  3.1 Introduction ................................................................ 15
  3.2 Application features ........................................................ 15
  3.3 User interfaces and interactions ............................................ 16
  3.4 Guiding people using a navigation audio beacon .............................. 20
    3.4.1 Pre-rendering the navigation audio beacon ............................... 20
    3.4.2 Additional improvements ................................................. 22
    3.4.3 Using a head tracker to improve sound spatialization .................... 23
4 OSM authoring ................................................................... 25


  4.1 Mobile OSM Route Editor ..................................................... 25
    4.1.1 Requirements ............................................................ 25
    4.1.2 User interface .......................................................... 26
  4.2 Android kick-scooter ........................................................ 30
    4.2.1 Technologies ............................................................ 31
    4.2.2 Features ................................................................ 31
    4.2.3 Experimentation ......................................................... 31
5 PDRTrack: localization test application ......................................... 32
  5.1 Introduction ................................................................ 32
  5.2 Requirements ................................................................ 32
    5.2.1 OpenStreetMap document .................................................. 32
    5.2.2 Pedometer calibration ................................................... 33
  5.3 User Interfaces ............................................................. 34
6 Results and Conclusions ......................................................... 38
7 References ...................................................................... 38
8 Appendix for Research Material .................................................. 40
  8.1 H1: Using AR visor for AR is better than using phone or tablet for AR ....... 40
  8.2 H2: New application paradigm which dynamically filters application is better than the current application metaphor ....... 42
  8.3 H3: The AR visor gives an added value in everyday life to an extent that the user wants to keep them on ....... 45
  8.4 H4: Interaction redundancy results in low intrusion ......................... 48
  8.5 H5: Adjustable user interface elements that can change color and the placement of the user interface would be better than non adjustable user interface ....... 49
  8.6 H6: To use same type of interactions for virtual object interaction as with real objects is better than indirect screen/pointer interaction ....... 51


Executive Summary
Scope
This document provides the deliverable contents related to the T3.2 Interface and Interaction Design.

Audience
This deliverable is public.

Summary
In this report, the objective is to provide interface and interaction designs, mainly based on new audio-visual technologies, in order to improve the user experience.

Structure
This deliverable is structured as follows: Section 1 is an introduction. Section 2 describes the Wizard of Oz tool, which enables the testing of novel design ideas. Section 3 considers IXE, a mobile prototype enabling indoor and outdoor navigation solely based on the Inertial Measurement Unit (IMU) and OSM network with audio instructions. In Section 4, we present prototypes to create OSM documents. Finally, in Section 5 we present PDRTrack, a localization test application designed to test the Pedestrian Dead-Reckoning (PDR) algorithms.

1 Introduction
In this report, we explore the opportunities provided by new audio-visual technologies (HMD, headphones with head-trackers, SmartWatch). We also deal with indoor and outdoor localisation and navigation based on the Inertial Measurement Unit (IMU) or Ant+ sensors. Different tools and prototypes using these technologies are presented to show new interface design, interaction design, auditory display design and a navigation audio beacon.


2 Wizard of Oz
2.1 Research material
The project started with a descriptive approach, asking questions such as: where are we, what do we want to do, what techniques are available, and how can these techniques be used or combined? Hypotheses were then formulated, inspired by VENTURI's three use cases: gaming, indoor assistant and outdoor tourism. The following six hypotheses were posed:
H1: Using AR visor for AR is better than using phone or tablet for AR
H2: New application paradigm which dynamically filters applications is better than the current application metaphor
H3: The AR visor gives an added value in everyday life to an extent that the user wants to keep them on
H4: Interaction redundancy results in low intrusion
H5: Adjustable user interface elements that can change colour and the placement of the user interface would be better than non adjustable user interface
H6: To use same type of interactions for virtual object interaction as with real objects is better than indirect screen/pointer interaction
The discussions that followed the listed hypotheses were about which conceptual frameworks are most suitable for designing and analysing these hypotheses. Examples of conceptual frameworks that were considered are Distributed Cognition, Grounded Theory and Activity Theory. We decided to use Activity Theory and outlined a workflow process (Figure 2.1).

FIGURE 2.1 THE ITERATIVE DESIGN PROCESS

Each hypothesis was analysed, and the relevant information was gathered and explained. Furthermore, suggestions on how to collect evidence were listed, and different design inputs were proposed. For instance, for the first hypothesis we started by listing positive and negative claims (Table 2.1) of using an AR visor, a phone and a tablet.


TABLE 2.1 POSITIVE AND NEGATIVE CLAIMS OF AR GLASSES VS. HANDHELD DEVICE (POSITIVE IN ITALIC)

Claims | AR visor | Phone | Tablet
Field of view | Full view | Keyhole | Keyhole
Always connected | Have to put on | Slip out of pocket | In a bag
Power usage | Battery | Battery | Battery
View | See through | Camera image | Camera image
Holding the device in front of you | N/A | Physically tiring | Physically tiring
Social acceptance | Unobtrusive | Intrusive | Intrusive

From the research material, we identified that a tool was needed in order to test the design ideas. We decided to develop a tool which could be used to simulate non-complete systems; thus the Wizard of Oz tool was developed. The research material mentioned above is presented in the appendix. The Wizard of Oz tool will be useful throughout the whole project; for UCY2 we will use it to investigate how the user should be guided in an indoor area. Several navigation possibilities will be tested, including audio guiding, visual guiding and a combination of audio and visual navigation. Another interesting area that will be investigated is how the user should be notified when a target product is found, e.g. by sound, vibration or graphical animations. Furthermore, different notifications will be simulated to see how intrusive they are, but also to investigate how users want to read or dismiss notifications in different situations.

2.2 Wizard of Oz tool


2.2.1 Introduction
When conducting user studies of a system which is in its early development, it is essential to be able to simulate missing parts of the system to collect valuable feedback from potential users. A Wizard of Oz (WOZ) tool is one way of doing this. WOZ testing is a well-known method used to test a non-complete system on users, where a human wizard acts as the system to fill in the missing gap. The method was initially developed by J.F. Kelley in 1983 to develop a natural language application [1]. The task of the wizard varies between implementations [2]. At one end we have the supervisor, who observes the system, acts as a safeguard and intervenes if the need arises. At the other end of the spectrum we have the controller, who takes the role of a missing part of the system, or the entire system, and emulates it. The role of the wizard can also change depending on the phase of development: as more and more functionality is added, the role of the wizard moves from controller to supervisor. WOZ testing is a powerful tool for exploring design ideas with technologies that are not yet mature, especially for systems operating in physical environments, since the designers are less constrained by technical specifications [4] [3]. It is very important that a WOZ interface supports the wizard in performing their assigned tasks: observing the user, observing the environment, simulating sensors and/or controlling content [3]. The tool should be useful to the dedicated wizard operator and could, if possible, have automated functionality. S. Dow


et al.'s opinion is that WOZ testing is "leading to more frequent testing of design ideas and, hopefully, to better end-user experiences".

2.2.2 Setup
The WOZ tool consists of two Android devices communicating with each other over WiFi (Figure 2.2). One device acts as the wizard, controlled by the test leader, and the other one is the puppet, used by the test person. The wizard application can control the user interface on the puppet device. The main features identified as necessary include: presenting media such as images, video and sound; easy navigation and location-based triggering; the capability to log test results and visual feedback; and integration of the Sony SmartWatch [5] for additional interaction possibilities. The wizard device can be used as a remote control and, as already mentioned, is useful for testing new user interface ideas before the actual application is developed. The wizard and puppet applications communicate over WiFi using a proprietary protocol. Supported Android versions: Wizard: 3.2 and up; Puppet: 2.3.3 and up.

FIGURE 2.2 WIZARD OF OZ TOOL SETUP

2.2.3 Connection
The WOZ tool communicates via IP over WiFi. Both TCP and UDP are used. Network communication is handled through two separate services, one using TCP and the other UDP. UDP is only used to transfer the camera feed from the puppet device to the wizard device; the TCP service is used for all other communication. The reason UDP is used for the camera feed is that speed is more important than making sure all packets are received. All communication except the camera feed uses a protocol generated by protocol buffers over a TCP connection [6]. Since we use wireless communication, it is hard to guarantee a stable connection. Instead of trying to guarantee one, the focus was put on being able to recover from an unwanted connection drop. Another negative side effect of an unstable connection is that the wizard cannot really be sure that the connection is up at a given time. To counteract this, the wizard device shows its connection status at the top of the screen (Figure 2.4). To make clear whether or not the connection is established, the field changes colour: it is green if the devices are connected and red otherwise.


To establish whether or not a connection is up, the wizard device sends a ping every two seconds. If no answer has been received before the next ping, the wizard considers itself disconnected. The puppet device does not send out its own pings; it only answers the ones it gets. If it has not received any pings within a given timeframe, it considers itself disconnected and starts to initiate a new connection.
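For illustration, a minimal sketch of this heartbeat logic on the wizard side is given below. It is not the actual WOZ source code: the ConnectionListener interface and the pingSender hook are assumptions standing in for the real networking layer.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the wizard-side heartbeat described above (hypothetical API).
public class HeartbeatMonitor {
    private static final long PING_INTERVAL_MS = 2000;   // ping every two seconds
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private volatile boolean pongSinceLastPing = true;
    private volatile boolean connected = false;

    public interface ConnectionListener {
        void onConnectionStateChanged(boolean connected);  // e.g. turn the status field green/red
    }

    private final ConnectionListener listener;
    private final Runnable pingSender;  // sends one ping message over the TCP channel

    public HeartbeatMonitor(ConnectionListener listener, Runnable pingSender) {
        this.listener = listener;
        this.pingSender = pingSender;
    }

    public void start() {
        scheduler.scheduleAtFixedRate(() -> {
            // If no answer arrived since the previous ping, consider the link down.
            setConnected(pongSinceLastPing);
            pongSinceLastPing = false;
            pingSender.run();
        }, 0, PING_INTERVAL_MS, TimeUnit.MILLISECONDS);
    }

    // Called by the network layer whenever the puppet answers a ping.
    public void onPongReceived() {
        pongSinceLastPing = true;
        setConnected(true);
    }

    private void setConnected(boolean nowConnected) {
        if (nowConnected != connected) {
            connected = nowConnected;
            listener.onConnectionStateChanged(nowConnected);
        }
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}

The puppet side would mirror this with a timeout instead of a periodic ping, re-initiating the connection when the timeout expires.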

2.2.4 Wizard
The wizard contains most of the functionality and can be used to control what is shown on the puppet device. Different views were developed for different situations; examples of views which the wizard can enter include the navigation view, the camera view, etc. The views are presented in more detail in the Features section. An additional example of functionality running on the wizard is the SmartWatch, which is connected to the wizard device but worn by the test person. The wizard device triggers events on the puppet when it has received events from the SmartWatch.

2.2.5 Puppet
The user interface of the puppet is designed to avoid point-and-click interaction as much as possible, apart from the SmartWatch, which provides indirect interaction through the wizard as mentioned in the previous section. The only form of point-and-click interaction is a menu that contains two elements: one button to close the program and one button to enter the IP address of the wizard device. Other than that, the screen is a black background that is replaced by whatever the wizard wants the puppet to show (Figure 2.2). The reason for using a black background is that, when using HMDs with optical see-through displays, black is rendered as transparent. There is also support for showing navigation arrows on a video see-through device (Figure 2.3).

FIGURE 2.3 AN IMAGE DISPLAYED OVER THE CAMERA PREVIEW ON THE PUPPET DEVICE (VIDEO SEE-THROUGH)

The puppet device can also, if supported by the device, read out text strings, using the Android text to speech library.

2.2.6 Features
This section will describe the implemented features accessible by the wizard and also show the view of each feature.


Filebrowser
In the filebrowser view, it is possible to list all files which can be of interest to show or play to the test person on the puppet device (Figure 2.4).

FIGURE 2.4 SCREENSHOT FROM THE FILEBROWSER VIEW (LEFT SIDE: SONY TABLET S, RIGHT SIDE: SAME VIEW ON A SONY XPERIA S)

Camera
From the camera view (Figure 2.5), the wizard can start the camera on the puppet device and request the puppet device to send the camera feed. It is also possible to set the puppet to take pictures automatically at a time interval, which is also set from the wizard. It should be noted that it is not necessary to display the camera view on the puppet device; it can be started in the background without the test person knowing.

FIGURE 2.5 SCREENSHOT FROM THE CAMERA VIEW (LEFT SIDE: SONY TABLET S, RIGHT SIDE: SAME VIEW ON A SONY XPERIA S)


Puppet
This feature allows the wizard to see what is displayed on the puppet device.

FIGURE 2.6 SCREENSHOT FROM THE PUPPET VIEW (LEFT SIDE: SONY TABLET S, RIGHT SIDE: SAME VIEW ON A SONY XPERIA S)

Navigation
The purpose of this feature is to simulate navigation. It can be used both for indoor and outdoor navigation. It is also possible for the wizard to enable/disable audio navigation together with the visual navigation (Figure 2.7). Additionally, the audio navigation can be customized, since the Android Text-to-Speech engine is used.

FIGURE 2.7 SCREENSHOT FROM THE NAVIGATION VIEW (LEFT SIDE: SONY TABLET S, RIGHT SIDE: SAME VIEW ON A SONY XPERIA S)


Notification
The wizard can choose to send default notifications, but also has the possibility to create and send different simulated notifications on the fly, such as Facebook, Twitter, G-Talk, Alarm, SMS, etc. Furthermore, notification messages can be read aloud on the puppet device.

FIGURE 2.8 SCREENSHOT FROM THE NOTIFICATION VIEW (LEFT SIDE: SONY TABLET S, RIGHT SIDE: SAME VIEW ON A SONY XPERIA S)

Tours
With the tour feature, the wizard can trigger different actions at different locations. The idea is to give the wizard the opportunity to create a tour with the tool. The wizard can walk around and set different actions, e.g. showing an image, at different locations presented on a map (Figure 2.9). The wizard can then start the tour, which runs in the background while the other features remain available.


FIGURE 2.9 SCREENSHOT FROM THE TOUR VIEW (LEFT SIDE: SONY TABLET S, RIGHT SIDE: SAME VIEW ON A SONY XPERIA S)

Predefined sequence
This feature helps to predefine a user study. The idea is to list all commands and simply click through without needing to switch views.

FIGURE 2.10 SCREENSHOT FROM THE PREDEFINED VIEW (LEFT SIDE: SONY TABLET S, RIGHT SIDE: SAME VIEW ON A SONY XPERIA S)

Log
Both the wizard and puppet applications have support for logging activities. The logs are saved on the SD card. All entries have a timestamp and, if the position is known, the latitude and longitude are also saved. Examples of events that create entries are an established connection, a command received or sent and, in some places, error messages.


3 IXE: Interactive eXtensible Engine


3.1 Introduction
IXE is a mobile application allowing indoor and outdoor navigation based only on the Inertial Measurement Unit (IMU) described in section 5, and not on GPS. This application enables users to test navigation on any well-formatted OSM network with audio instructions. The embedded router in IXE provides the shortest path between the current location and a chosen destination, in OSM format. As a consequence, the generated route can be edited directly in any OSM authoring software such as JOSM, or in the mobile OSM route editor described in section 4. Figure 3.1 introduces IXE's independent modules, from the algorithm level (IMU and routing) to cross-modal user interactions (audio, buttons and so on) and interfaces. This mobile application was tested by visually impaired people, whose feedback is detailed in D3.1 [26].
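As a rough illustration of what an embedded router of this kind does (this is not the actual IXE implementation), the sketch below computes a shortest path with Dijkstra's algorithm over a graph whose vertices are OSM node ids and whose edge lengths are assumed to have been extracted from the OSM document beforehand.

import java.util.*;

// Minimal Dijkstra sketch over an OSM-like graph: node ids connected by weighted edges (metres).
public class SimpleRouter {
    private final Map<Long, Map<Long, Double>> adjacency = new HashMap<>();

    public void addEdge(long a, long b, double lengthMetres) {
        adjacency.computeIfAbsent(a, k -> new HashMap<>()).put(b, lengthMetres);
        adjacency.computeIfAbsent(b, k -> new HashMap<>()).put(a, lengthMetres); // footways are bidirectional
    }

    private static final class QueueEntry {
        final long node; final double dist;
        QueueEntry(long node, double dist) { this.node = node; this.dist = dist; }
    }

    public List<Long> shortestPath(long start, long destination) {
        Map<Long, Double> dist = new HashMap<>();
        Map<Long, Long> previous = new HashMap<>();
        PriorityQueue<QueueEntry> queue = new PriorityQueue<>(Comparator.comparingDouble(e -> e.dist));
        dist.put(start, 0.0);
        queue.add(new QueueEntry(start, 0.0));

        while (!queue.isEmpty()) {
            QueueEntry current = queue.poll();
            if (current.dist > dist.getOrDefault(current.node, Double.MAX_VALUE)) continue; // stale entry
            if (current.node == destination) break;
            for (Map.Entry<Long, Double> edge
                    : adjacency.getOrDefault(current.node, Collections.<Long, Double>emptyMap()).entrySet()) {
                double candidate = current.dist + edge.getValue();
                if (candidate < dist.getOrDefault(edge.getKey(), Double.MAX_VALUE)) {
                    dist.put(edge.getKey(), candidate);
                    previous.put(edge.getKey(), current.node);
                    queue.add(new QueueEntry(edge.getKey(), candidate));
                }
            }
        }

        if (start != destination && !previous.containsKey(destination)) return Collections.emptyList(); // unreachable
        LinkedList<Long> path = new LinkedList<>();
        for (Long node = destination; node != null; node = previous.get(node)) path.addFirst(node);
        return path;
    }
}

The returned node sequence is what would then be serialized back into an OSM route document and decorated with audio instruction tags.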

FIGURE 3.1 IXE ARCHITECTURE

3.2 Application features


As shown in figure 3.1, IXE is built on several software components:
- OSM cartography:
  o Manages map resources
  o Computes the shortest path
- Localization:
  o Provides the current location using PDR, regardless of whether the user is indoors or outdoors
  o Calibration process view
- Head orientation: provides the body orientation with respect to north. The current IXE version takes into account the device orientation and not the head orientation. In future work, a head tracker based on a compass, accelerometers and a 3-axis gyroscope would be an interesting direction to investigate.


- Interactive environment interrogation: at any time, the user can access information about the nearby environment by pointing the device in the desired direction.
- Navigation: turn-by-turn instructions are provided for the computed route and synthesized by a text-to-speech engine.

In addition to these components, which provide the must-have features, some convenience features have been added to the application in order to improve the user experience:
- QRCode reader: OSM networks and routes can be downloaded and imported directly into the application by scanning a QRCode, without exiting the application.
- Mobile OSM route editor: allows an author to edit a computed route on the fly. This feature is particularly relevant for visually impaired people who want to add custom audio instructions to a specific route.
- User profiles: user settings can be saved by the application. There are three classes of settings:
  o Routing settings: the user can choose whether the router takes stairs and/or lifts into account and whether instructions are generated automatically
  o Audio settings: the user can choose the audio beacon behaviour: never enabled, enabled when the user is stopped, or always enabled
  o PDR settings: physiological and pedometer parameters (user height, calibration value, IMU position)

3.3 User interfaces and interactions


IXE user interfaces are simple and respect the iOS human interface guidelines [31]. Focus on the primary task is the most important guideline to respect when an application is designed for visually impaired people. Each view presented to the user must be clear, and interactions must be limited as much as possible to one or two tasks. For visually impaired people it is very complicated to use a touch screen, which is why IXE supports VoiceOver and defines accessibility attributes for each UI element. An application is accessible to visually impaired people when all user interface elements with which users can interact are accessible. Figure 3.2 shows the main menu of the application, where it is possible to choose a route among those previously saved. The user only has to scroll the view up or down to select a route; when their finger touches a cell, VoiceOver announces the route name.

FIGURE 3.2 MAIN MENU SUPPORTING VOICEOVER (accessibility attributes: "Selected profile: default, double touch to choose another profile"; "Route A102-A201, double touch to load this route"; "Edit A102-A201 route"; "Compute new route from a road network")

When the user double-touches the screen, a new view (Figure 3.3) appears and invites the user to start the navigation. Simulation can be enabled from this view by touching the "auto" button.


FIGURE 3.3 NAVIGATION VIEW (controls: back to main menu; PDR or simulation mode; start, pause and resume navigation; query nearby POI; query distance; map view with destination, start point and current location; display/hide POI; display/hide instructions; map type and map parameters; new route from current location, see Figure 3.8)

Figure 3.3 introduces the navigation interface, with a map and buttons to start, pause or stop the localization process. This interface is not suitable for visually impaired people because of the small buttons and the map, which is of no use to them, but it is important to provide such an interface for sighted users. To meet the needs of visually impaired people, all interactions on this view can be controlled by the headphone buttons, as shown in figure 3.4.

FIGURE 3.4 HEADPHONE CONTROLS (in navigation mode: restart navigation, pause/resume navigation, query nearby POI; in query mode: next query level, pause/resume navigation, previous query level)

It is possible to query nearby POIs according to three predetermined distances (5 m, 10 m and 30 m), as shown in the figure below:

VENTURI Consortium 2011-2014


Page 17

FP7-288238

Document Code: D3.2 Interface design prototypes and/or mock

FIGURE 3.5 - QUERY LEVELS
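For illustration, the sketch below shows one way such a query could be implemented: POIs are filtered by great-circle distance from the current location against the selected query level. The Poi class and the haversine helper are illustrative assumptions, not the actual IXE code.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: filter nearby POIs by one of the predetermined query radii (5, 10 or 30 m).
public class PoiQuery {
    public static final double[] QUERY_LEVELS_METRES = {5.0, 10.0, 30.0};

    public static class Poi {
        public final String name;
        public final double lat, lon;
        public Poi(String name, double lat, double lon) { this.name = name; this.lat = lat; this.lon = lon; }
    }

    public List<Poi> nearby(List<Poi> allPois, double userLat, double userLon, int queryLevel) {
        double radius = QUERY_LEVELS_METRES[queryLevel];
        List<Poi> result = new ArrayList<>();
        for (Poi poi : allPois) {
            if (haversineMetres(userLat, userLon, poi.lat, poi.lon) <= radius) {
                result.add(poi);  // these would be announced by the text-to-speech engine
            }
        }
        return result;
    }

    // Great-circle distance between two WGS84 coordinates, in metres.
    private static double haversineMetres(double lat1, double lon1, double lat2, double lon2) {
        double earthRadius = 6371000.0;
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * earthRadius * Math.asin(Math.sqrt(a));
    }
}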

The user profile interfaces (Figures 3.6 and 3.7) respect the high-contrast guideline (white characters on a black background). This view is divided into four parts: routing parameters, audio parameters, physical parameters and, finally, pedometer parameters.

FIGURE 3.6 ROUTING AND AUDIO PARAMETERS (profile name; routing parameters: use stairs in routing, use lifts in routing, generate instructions; audio parameters: audio beacon behaviour, audio beacon distance)


FIGURE 3.7 PEDOMETER AND CALIBRATION PARAMETERS (save profile; pedometer parameters: calibration value, user height; calibration parameters: calibration distance, IMU position, acceleration peak (computed or manual))

IXE allows a route to be computed from an OSM network (figure 3.8). The user can load an OSM file and set the start point and destination. The routing parameters can then be modified to define whether the computed route may use stairs and/or lifts. Finally, the user defines whether the application should generate audio instructions.

FIGURE 3.8 ROUTING IN IXE (OSM network, loads routing view, start/destination point, go to settings)

The last feature is the embedded QRCode reader, which allows scanning and then downloading an OSM network or a route. A relevant use case could be scanning a QRCode on a business card referring to a predetermined route, for example from the INRIA office reception to a meeting room.

FIGURE 3.9 QRCODE READER (start QRCode scan, confirmation required, download confirmed, added route, back to main menu)

3.4 Guiding people using a navigation audio beacon


Using HRTF (Head-Related Transfer Function) rendering, it is possible to create a virtual audio beacon that is precisely positioned in 3D space. This audio beacon can be used as a navigation guide to follow, while other navigation cues are still provided at the same time. Using a spatialized audio beacon instead of synthesized speech reduces the user's cognitive load while providing confirmation feedback at all times. It is especially useful for users with visual impairments.

3.4.1 Pre-rendering the navigation audio beacon


Because mobile phones have limited processing capabilities, we need to use a set of pre-rendered HRTF samples to create the audio beacon, as HRTF rendering is a processor-intensive task. The principle is to divide the circular space around the listener into a fixed number of points and pre-calculate the HRTF 3D spatialization of the audio beacon for each of these points (figure 3.10).


FIGURE 3.10 PRE-RENDERING THE NAVIGATION AUDIO BEACON USING 8 POSITIONS AROUND THE USER

For the purpose of guiding people, at least 8 points are needed (4 cardinal points + 4 diagonals) to provide meaningful information. This granularity can be increased for more precise guidance, at the cost of greater memory usage. After having pre-rendered all the audio samples, we can use them to play the navigation audio beacon. Two methods can be implemented, and we tested both to compare their results:
- If we use a short navigation audio beacon sound (1-2 s), we can simply play the samples one after the other in a loop, choosing the sample closest to the direction to take before starting each new repetition.
- We can also start all samples at the same time and then adjust the mix of these sources in real time to obtain, by interpolation, the desired guiding direction. A different weight Pi is assigned to each sample, corresponding to its target mix volume (figure 3.11), varying with the orientation angle relative to the listener's head (figure 3.12).

FIGURE 3.11 WEIGHTS OF THE SAMPLES DEPENDING ON THE SOURCE AZIMUTH


FIGURE 3.12 SOURCE ORIENTATION ANGLE RELATIVE TO THE USER HEAD

The first case can be expressed directly using our XML audio language described in D5.1; below is an example implementation. This template can also be adapted to the second case by changing only the synchronization parameters and adding application code to perform the mix adjustments.
<maudl>
  <sounds>
    <sound id="pointer" loopCount="-1" pick="manual" render3D="no"
           play="nav.pointer.start" stop="nav.pointer.stop">
      <soundsource src="pointer_p0.wav" setNext="pointer.p0"/>
      <soundsource src="pointer_p1.wav" setNext="pointer.p1"/>
      <soundsource src="pointer_p2.wav" setNext="pointer.p2"/>
      <soundsource src="pointer_p3.wav" setNext="pointer.p3"/>
      <soundsource src="pointer_p4.wav" setNext="pointer.p4"/>
      <soundsource src="pointer_p5.wav" setNext="pointer.p5"/>
      <soundsource src="pointer_p6.wav" setNext="pointer.p6"/>
      <soundsource src="pointer_p7.wav" setNext="pointer.p7"/>
    </sound>
  </sounds>
</maudl>

Here we have used a single sound container named "pointer", in which we add the 8 samples corresponding to the 8 pre-rendered pointer positions as sound sources. It is configured to play in a loop (loopCount attribute set to -1), with manual selection of the current sample (pick attribute) and 3D rendering deactivated, as it is pre-calculated. During the user's movements, the position manager selects the best sample to play for the direction to take by sending an event of the form pointer.p* to the sound manager. As for the second case, experiments revealed various implementation difficulties: perfectly synchronizing the 8 sources to play in parallel is a difficult task and would require developing a specific sound library with sample-precise synchronization. In addition, simple tests with users showed that, due to the different imprecision factors (synchronization of the samples, mixing delay), the results were confusing, leading to guiding errors and misinterpretation of the audio beacon direction. This is why we chose to stick with the first method, which is also simpler to implement on many platforms.
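To make the first method more concrete, the following minimal sketch (illustrative names, not the actual IXE code) shows how a position manager could map the bearing to the next waypoint, relative to the listener, onto the closest of the 8 pre-rendered samples and emit the corresponding pointer.p* event; the EventSink hook stands in for the sound manager interface.

// Illustrative sketch of the sample selection for the looped navigation audio beacon.
public class BeaconPointer {
    private final int sampleCount;               // 8 in the example above, 16 in the refined version
    private final EventSink soundManager;

    public interface EventSink { void sendEvent(String eventName); }   // hypothetical hook

    public BeaconPointer(int sampleCount, EventSink soundManager) {
        this.sampleCount = sampleCount;
        this.soundManager = soundManager;
    }

    // bearingToTargetDeg: absolute bearing of the direction to take (0 = north, clockwise).
    // userHeadingDeg: current heading of the user (device or head orientation).
    public void update(double bearingToTargetDeg, double userHeadingDeg) {
        double relative = normalize(bearingToTargetDeg - userHeadingDeg);  // angle relative to the listener
        double sector = 360.0 / sampleCount;
        int index = (int) Math.round(relative / sector) % sampleCount;     // closest pre-rendered position
        soundManager.sendEvent("pointer.p" + index);                       // e.g. "pointer.p3"
    }

    private static double normalize(double degrees) {
        double d = degrees % 360.0;
        return d < 0 ? d + 360.0 : d;
    }
}

Calling update() just before each loop repetition reproduces the behaviour of the manual pick mode configured in the XML above.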

3.4.2 Additional improvements


In order to give the user the maximum amount of information with the navigation audio beacon, minimize the chance of error and improve precision, we increased the pre-rendering granularity to 16 positions, resulting in smoother indications. We also added a new level of information: the spatialized sound used by the audio beacon varies depending on the indicated direction, allowing the user to better correct their direction of movement (figure 3.13). We split the 16 possible directions into 5 zones and added a direction hint by changing the kind of sound used by the beacon. This allows the user to quickly and efficiently discriminate the direction to take.

FIGURE 3.13 NAVIGATION AUDIO BEACON WITH ZONE VARIATIONS

Based on the tests we performed with various kinds of users, it appears that some people initially feel more comfortable with an audio beacon based on voice instructions, especially those who have difficulty hearing the 3D sound spatialization or who are too hasty to get past the slight learning curve. We therefore designed a second type of navigation audio beacon based on the same concepts, but using voice instructions for the 16 directions instead of spatialized sounds. With this pointer, the ability to receive additional voice instructions from the environment is lost, as is the immediacy of the cue, because of the longer duration of the instructions and the need to interpret them. This second version of the audio beacon is therefore not recommended for most users, as it only benefits people who, because of hearing difficulties, cannot get past the learning curve of the first version.

3.4.3 Using a head tracker to improve sound spatialization


At the heart of current navigation systems, the direction indicated by the audio beacon assumes that the listener's head has the same orientation as the listener's body. This assumption generally works well, but it is more natural for users to turn only their head, without turning their body, when quickly searching for the direction to take.


FIGURE 3.14 EXAMPLE OF A HEAD TRACKING DEVICE BASED ON A GYROSCOPE

To complete our spatialization system, we experimented with a head tracking device (figure 3.14). This hardware module connects to a mobile phone and embeds various sensors, in particular a gyroscope and a magnetometer. When positioned on the user's head, the module allows the navigation system to track head rotation relative to the user's body. Using this rotation, we can adjust the sound spatialization to reflect the exact orientation of the user's head instead of their body: the virtual sound space is then stabilized and the precision of the navigation audio beacon further improved. This system has limitations, however: to extract the exact rotation of the head relative to the body, a complex calibration process has to be performed, otherwise the whole system would introduce more orientation errors than it removes.
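The following minimal sketch illustrates the core of this correction, under the assumption that the tracker already provides a calibrated head yaw relative to the body; the names are illustrative and this is not the actual IXE code. The resulting azimuth can then drive the sample selection sketched in section 3.4.1.

// Illustrative sketch: stabilizing the beacon direction with a head tracker.
public class HeadAwareBeacon {
    // bodyHeadingDeg: body orientation from the PDR/compass (0 = north, clockwise).
    // headYawRelativeToBodyDeg: calibrated yaw of the head relative to the body, from the head tracker.
    // bearingToTargetDeg: absolute bearing of the direction to take.
    public static double beaconAzimuthDeg(double bearingToTargetDeg,
                                          double bodyHeadingDeg,
                                          double headYawRelativeToBodyDeg) {
        double headHeading = normalize(bodyHeadingDeg + headYawRelativeToBodyDeg);
        // Angle at which the virtual beacon must be rendered around the listener's head.
        return normalize(bearingToTargetDeg - headHeading);
    }

    private static double normalize(double degrees) {
        double d = degrees % 360.0;
        return d < 0 ? d + 360.0 : d;
    }
}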


4 OSM authoring
The IXE application presented in section 3 provides a complete indoor/outdoor navigation solution based on OSM. In order to make quick modifications or sketch complete navigation routes on the go, we also designed a mobile OSM route editor that is integrated into the IXE application. Please note that the route editor was initially developed in French, as this was the language of the IXE application mock-up at the time, so the user interface screenshots may contain French labels. A second application, using a kick-scooter, allows OSM maps to be created with good accuracy.

4.1 Mobile OSM Route Editor


Because it is always useful to make quick edits on the go, such as changing audio instructions or adding POIs, we decided to embed an editor into the IXE application. Using the same pedometer module that constitutes the base of the PDR localization system used to estimate the user's location, new navigation routes can also be created from scratch in this mobile editor. When a plan of the place to navigate already exists, it is also possible to use the touch editing mode to place route points directly on the map.

4.1.1 Requirements
In order to use all the features of the embedded editor, there are some requirements, depending on the features you want to use:
- Pedometer editing (see figure 4.1): if you want to create or edit routes using the pedometer mode, you first need to have selected and calibrated your walk profile in the IXE application, so that the computed distances are accurate.
- Route editing: since the mobile editor can only open OSM route documents, you must first generate a route using IXE if you only have an OSM navigation network document.
- Starting location: when you want to create a new route from scratch, you must know your starting location. By default, your GPS location is used, but you should change it, since GPS is not precise enough for this kind of usage, especially inside a building.
- Audio instruction generation: the editor can automatically generate OSM audio instruction tags for basic turn-by-turn directions, but it needs a completed route to do so.
- Background image: you can add a background image layer to an OSM route, for example to trace a route manually using a building map image. This image must be configured beforehand, directly in the OSM document.


FIGURE 4.1 USING THE PEDOMETER TO CREATE AN OSM ROUTE

4.1.2 User interface


The mobile OSM editor user interface can be decomposed into 5 views:
- The main editor view, a WYSIWYG view where you can see the route, POIs and audio instructions and edit them directly.
- The document properties / editor preferences view, where you edit the basic document properties, generate basic turn-by-turn audio directions and change editor settings.
- The route node position editor, used to manually change the position and level of a selected route node.
- The POI editor view, allowing editing of all the properties of a selected POI.
- The audio instruction editor view, which can be used to test the generated audio or edit the properties of the selected audio instruction.

We will now look at how these views are linked to each other and how we designed the user interactions for an intuitive mobile editing experience.

Main editor interface


When you load an existing OSM route document, you are presented directly with the main editor interface, where you see the route with the POIs and audio instructions as a whole (see figure below, left screenshot), as well as a compass indicating your current orientation and a toolbar to make edits. Each map feature has a distinct representation for quick and easy visualization:
- Route nodes are represented by a red circle
- Route edges are represented by a blue line (the colour goes darker/lighter to differentiate the levels)
- POIs are represented by a blue pin
- Audio instructions are represented by a yellow note
- The current position and orientation are represented by a blue target with an arrow indicating the current direction on the map


(Screenshots: main editor view and route properties / editor preferences view. Visible controls: save document and exit editor; route properties and editor preferences; document name; document author; generate audio instructions; pedometer mode / touch mode switch; turn angle step; add straight line (in pedometer mode); add turn; add audio instruction; go back to editor view.)


At the top of the main editor view there are two buttons, on the left and right sides: the first saves and quits the editor, the second opens the route properties / editor preferences view. Since the editor preferences are saved within the OSM document and are thus related to the document properties, it made sense to group these two sections together. At the bottom of the view is the toolbar, with a status bar just above providing contextual information. You can immediately see which route edit mode is enabled by looking at the first toolbar icon, as shown in the figure below:

(Toolbar mode icons: pedometer edit mode / touch edit mode)

The main toolbar is composed of 4 icons, from left to right:
- The first one allows adding a straight line to the route, starting from the current position (and orientation when in pedometer mode). Depending on the mode (instructions are provided via the status bar), you either walk to the desired location and touch the icon again, or touch the desired location on the screen, to create a new route node and an edge in a straight line.
- The second icon allows changing the current orientation on the map, by simply turning the device (held in the hand) to face the desired direction. This is mainly useful in pedometer edit mode, but can still be used as a direction helper in touch edit mode.


- The third icon allows adding a new POI, by directly touching the desired POI location on the map.
- The fourth icon allows adding a new audio instruction, in the same way as a POI.

When a POI or an audio instruction is selected by touching it on the map, additional toolbar buttons appear:

(Screenshot: selected POI with a popover showing the POI name and an edit button, plus enlarge radius, reduce radius and trigger radius controls)

These buttons allow enlarging or reducing the triggering zone of the selected POI or audio instruction. The actual triggering zone is also expressed in metres in the status bar and shown on the map using a transparent blue circle. When a map entity is selected, a black popover appears showing the feature type, or its name if available, with a blue button on the right. Touching this button opens the editor view related to the entity type. A selected map entity can also be moved on the map, simply by dragging it to the desired location after touching it.

Route node editor


The route node editor allows editing the various properties of a node, such as the latitude/longitude location, the altitude and its floor level (when inside a building).

(Screenshot: route node editor — save and go back to the main view, delete entity, latitude, longitude, altitude (optional), floor level (optional))

A node can also be deleted from the map by tapping the recycle bin icon; user confirmation is then requested before the deletion is performed.


Asking for user confirmation before deletion

This basic view and its properties also form the base of the POI and audio instruction editors, since these map entities are also OSM nodes, with more advanced properties.

POI editor
In addition to the properties presented in the previous section, the POI editor view shows new editing fields:

(Screenshot: POI editor — shared properties, POI name, triggering radius, additional tags)

You can edit the POI name, set the triggering radius value directly in metres, and specify multiple OSM tags using the key=value; syntax.

Audio instruction editor


The audio instruction editor is similar to the POI editor, with specific controls related to the audio instruction itself: you can enter the audio instruction text to be synthesized by the text-to-speech engine, and try playing the instruction by tapping the listen button.


(Screenshot: audio instruction editor — shared properties, triggering radius, instruction text, play the instruction)

4.2 Android kick-scooter


Android kick-scooter is an Android application using a kick-scooter in order to map indoor and outdoor places easily in OpenStreetMap XML format. The scooter uses an ANT+ [32] sensor (with a gyroscope and a wheel revolution counter) to make a precise plan of a journey. It automatically takes corners into account (from the gyroscope data) and can create custom POIs (Points of Interest), enabling the user to obtain a relatively complete route. Finally, the OSM files are collected, cleaned and recalibrated using an OpenStreetMap editor such as JOSM. Figure 4.2 below shows users with the Android kick-scooter at Sugimotocho train station (Japan).

FIGURE 4.2 STUDENTS USING THE ANDROID KICK-SCOOTER AT SUGIMOTOCHO, OSAKA CITY UNIVERSITY


4.2.1 Technologies
Kick-scooter is a mobile Android application (version 2.3 and higher supported) connected to the ANT+ sensors on the scooter. ANT+ is a practical 2.4 GHz wireless networking protocol and embedded system solution specifically designed for wireless sensor networks (WSN). It is primarily designed for the collection and transfer of sensor data to manageable units of various types. The mobile application receives the sensor data (wheel counter and gyroscope) and computes a route with distances and directions.
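As a simplified illustration of the dead-reckoning this implies (illustrative names, not the actual application code), each magnet pass advances the position by a fixed fraction of the wheel circumference along a heading integrated from the gyroscope:

// Simplified planar dead-reckoning sketch for the kick-scooter: wheel counter + gyroscope.
public class ScooterTracker {
    private final double metresPerTick;   // wheel circumference / number of magnets
    private double headingRad = 0.0;      // integrated from the gyroscope yaw rate
    private double x = 0.0, y = 0.0;      // local planar coordinates in metres

    public ScooterTracker(double wheelDiameterMetres, int magnetsPerWheel) {
        this.metresPerTick = Math.PI * wheelDiameterMetres / magnetsPerWheel;
    }

    // Called for every gyroscope sample (yaw rate in rad/s, dt in seconds).
    public void onGyroSample(double yawRateRadPerSec, double dtSeconds) {
        headingRad += yawRateRadPerSec * dtSeconds;   // corners are captured here
    }

    // Called for every magnet pass reported by the ANT+ wheel sensor.
    public void onWheelTick() {
        x += metresPerTick * Math.sin(headingRad);
        y += metresPerTick * Math.cos(headingRad);
    }

    public double getX() { return x; }
    public double getY() { return y; }

    // The (x, y) track can later be georeferenced and exported as OSM nodes and ways.
}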

4.2.2 Features
The Android kick-scooter application provides the following features:
- Computes a route taking changes of direction into account
- Saves points of interest with voice or text
- Saves sensor data in XML files
- Saves the computed route in OSM files
- Sends files by email
- Supports wheels of different diameters
- Manages multiple magnets to increase the accuracy of measurements (by default, the accuracy is equivalent to +/- the circumference of the wheel)

4.2.3 Experimentation
The application allows OSM maps of public indoor places to be created. Users can collect data discreetly and create the full map when they come back home, using an OSM editor. This solution is a way to create maps more quickly with good accuracy. The application was used to create maps of Grenoble station and Osaka station. The right-hand image of figure 4.3 shows the OSM document of the Sugimotocho station map created with the Android kick-scooter.

FIGURE 4.3 OSM DOCUMENT OF SUGIMOTOCHO STATION


5 PDRTrack: localization test application


5.1 Introduction
A mobile application called PDRTrack has been designed to test and find the best parameters for the Pedestrian Dead-Reckoning (PDR) algorithms. This application provides the current user location for each step and displays it on a map. Navigation and routing are not part of PDRTrack; to start the localization, the user only has to choose a start point and a start orientation. Similar applications exist for infrastructure-less indoor navigation, such as FootPath [25] from Aachen, but in that application localization is constrained to a computed route. In PDRTrack, since localization is based on a full navigation network rather than on a predetermined route, it is possible to move freely in the building. For example, a user can enter a room and then go back to a corridor, take the stairs and finally turn back. From the tests done with visually impaired people [26], a software requirement was added: the localization must not be constrained to a specific route, because the user can leave the route at any time. Another main innovation of PDRTrack is the dataset recording and reading mode, which enables the user to save all sensor values during a walk and replay them. This feature is particularly useful for testing the same walk with different algorithm parameters without walking again. This section details the mobile application from requirements to user interfaces.

5.2 Requirements
5.2.1 OpenStreetMap document
The PDR module needs an OSM [28] document description (indoor, outdoor or both) as input, as detailed in part 4.1 of deliverable D4.3 [27]. As often as possible, the PDR will try to match the current location to a way defined in the OSM document. There are two map-matching algorithms, one correcting the location and the other correcting the orientation; the richer the map, the better the corrections will be. This document does not detail these algorithms; please refer to D4.3 for further information. Below is an OSM example with a way that can be used by the map-matching algorithms. This way includes standard tags and values from the approved map features [29]; additional tags needed for navigation are introduced in D4.3.
<?xml version='1.0' encoding='UTF-8'?>
<osm version='0.6' generator='JOSM'>
  <!-- Nodes held by the corridor -->
  <node id='-43' visible='true' lat='40.01724758683568' lon='2.9948837556279706' />
  <node id='-18' visible='true' lat='40.02583770246074' lon='3.0069869656592285' />
  <node id='-16' visible='true' lat='40.025826707140524' lon='2.9948704796222487' />
  <node id='-15' visible='true' lat='40.0123408216216' lon='2.994891347992221' />
  <!-- Corridor definition -->
  <way id='-17' visible='true'>
    <nd ref='-15' />
    <nd ref='-43' />
    <nd ref='-16' />
    <nd ref='-18' />
    <tag k='highway' v='footway' />
    <tag k='indoor' v='yes' />
    <tag k='level' v='2' />
    <tag k='name' v='Corridor' />
  </way>
</osm>
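As an illustration of the location map-matching idea only (the actual algorithms are described in D4.3), the sketch below snaps an estimated position onto the closest segment of a way such as the corridor above, working in a local metric frame:

// Illustrative sketch: snap an estimated (x, y) position, in metres, onto the closest way segment.
public class MapMatcher {

    // Projects point p onto segment [a, b]; all points are double[]{x, y} in a local metric frame.
    public static double[] projectOnSegment(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double lengthSquared = dx * dx + dy * dy;
        if (lengthSquared == 0) return new double[]{a[0], a[1]};           // degenerate segment
        double t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / lengthSquared;
        t = Math.max(0, Math.min(1, t));                                   // clamp to the segment
        return new double[]{a[0] + t * dx, a[1] + t * dy};
    }

    // Returns the corrected position: the closest projection over all consecutive node pairs of a way.
    public static double[] matchToWay(double[] p, double[][] wayNodes) {
        double best = Double.MAX_VALUE;
        double[] bestPoint = p;
        for (int i = 0; i + 1 < wayNodes.length; i++) {
            double[] candidate = projectOnSegment(p, wayNodes[i], wayNodes[i + 1]);
            double d = Math.hypot(candidate[0] - p[0], candidate[1] - p[1]);
            if (d < best) { best = d; bestPoint = candidate; }
        }
        return bestPoint;
    }
}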


The OSM document can be produced by any authoring application, such as the ones described in section 4, and directly imported into PDRTrack.

5.2.2 Pedometer calibration


Before using the localization part, the pedometer must be calibrated with the physical and physiological characteristics of the user. This calibration process is strongly recommended for precise localization. Indeed, stride length differs depending on the user's height and way of walking. The pedometer does not use a fixed mean value but estimates the distance from a parameterized walking model together with the calibration value. Depending on the device position, the acceleration peak enabling the detection of a step also differs: if the device is hand-held, the vertical acceleration will be more stable than if the device is chest-mounted, which is why the vertical acceleration threshold is a pedometer parameter. The device orientation is automatically detected, so landscape, upside-down, face-down or face-up modes do not affect the step detection algorithm.
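A very reduced sketch of this threshold-based step detection is shown below; the simple rising-edge test and the minimum step interval are illustrative assumptions, and the real pedometer is more elaborate.

// Reduced sketch of threshold-based step detection on the vertical acceleration.
public class StepDetector {
    private final double peakThreshold;         // pedometer parameter, depends on the device position
    private final long minStepIntervalMs = 300; // reject bounces faster than a plausible step
    private boolean aboveThreshold = false;
    private long lastStepTimeMs = 0;
    private int stepCount = 0;

    public StepDetector(double peakThresholdMetresPerSec2) {
        this.peakThreshold = peakThresholdMetresPerSec2;
    }

    // verticalAcc: gravity-compensated vertical acceleration in m/s^2, timestamp in ms.
    public boolean onSample(double verticalAcc, long timestampMs) {
        boolean stepDetected = false;
        if (!aboveThreshold && verticalAcc > peakThreshold
                && timestampMs - lastStepTimeMs > minStepIntervalMs) {
            stepCount++;
            lastStepTimeMs = timestampMs;
            stepDetected = true;      // a rising crossing of the peak threshold counts as one step
        }
        aboveThreshold = verticalAcc > peakThreshold;
        return stepDetected;
    }

    public int getStepCount() { return stepCount; }
}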

FIGURE 5.1 CHEST/BELT MOUNT OR HAND-HELD PDR

The calibration process consists of 6 steps:
(1) Enter your height in metres
(2) Choose the acceleration threshold
(3) Choose the calibration distance
(4) Enable calibration mode
(5) Start the pedometer and walk the distance entered in (3)
(6) After stopping the pedometer, save and apply the new calibration value
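As an illustration of what step (6) computes, the sketch below derives the calibration value as the average stride length over the calibration walk and reuses it to estimate the travelled distance during navigation. This plain average is used for illustration only; the actual pedometer relies on a parameterized walking model.

// Illustrative sketch of the calibration value computed at the end of the calibration walk.
public class PedometerCalibration {

    // calibrationDistanceMetres: the distance chosen in step (3) and actually walked in step (5).
    // detectedSteps: number of steps counted by the step detector during the calibration walk.
    public static double calibrationValue(double calibrationDistanceMetres, int detectedSteps) {
        if (detectedSteps <= 0) {
            throw new IllegalArgumentException("No steps detected during the calibration walk");
        }
        return calibrationDistanceMetres / detectedSteps;   // average stride length in metres
    }

    // During navigation, the travelled distance is then estimated from the step count.
    public static double estimatedDistanceMetres(double calibrationValue, int steps) {
        return calibrationValue * steps;
    }
}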


5.3 User Interfaces


PDRTrack user interfaces are divided into two parts. Left part:
1. OSM network: list of all OSM files imported in the application
2. Dataset: list of all files containing sensor value recordings
3. Pedometer: access to the calibration parameters
4. Drift elimination: access to the heading drift elimination algorithm parameters
5. Map-Matching: access to the map-matching algorithm parameters

FIGURE 5.2 LEFT MENU
FIGURE 5.3 LEFT SUBMENUS

When the PDR is in recording mode, the produced dataset is named with the current date, following the format MMddyy_hhmmss. Right part:
o Localization view: map with the overlaid user position
o Start point and orientation chooser view: enables the user to choose from which position to start the localization
o Pedometer view: provides pedometer information such as step length, cumulated distance and number of steps
o Dataset recording/reading mode: a switch button enables recording or reading mode


FIGURE 5.4 MAIN PDRTRACK VIEWS (with the recording/reading mode switch)

During localization, the trace describing the user's current trajectory is drawn directly on a map, in red, as shown in Figure 5.5. A well-known issue with the native map view for indoor visualization is the limited zoom provided by MapKit. Depending on the map type in use, the zoom can reach level 21 (satellite tiles), or level 19 otherwise. This zoom level is not enough for navigating inside a building, which is why a visualization based on SVG and HTML5 is currently in progress.

FIGURE 5.5 REAL-TIME TRACE (current location, OSM ways, trace)

Given the reliability of indoor localization, the user should be able to zoom on the map as much as they want. The latest mobile phones can easily handle CSS transformations on SVG maps in a WebView, which is why we chose to use HTML5 features, so as to be more portable across other operating systems.


FIGURE 5.6 REAL-TIME TRACE ON A MULTI-FLOOR SVG MAP (current position with accuracy, current floor, trace)
FIGURE 5.7 ZOOM TO LEVEL 24, NOT PIXELATED

Some basic features have been integrated to navigate through the map (Figure 5.6):
- Zoom-in / zoom-out without pixelation (Figure 5.7)
- Translation with fingers (mobile web browser) or mouse (desktop web browser)
- Follow current position


FIGURE 5.8 MULTI-FLOOR SVG MAP WITH OSM SUB-LAYER

To enhance the visualization system, the following functionalities have been added:
- Visualization of multiple SVG files (buildings)
- Management of multiple floors (Figure 5.6)
- Display of OSM, Google and Google Satellite sub-layers (disabled when the zoom is too high) (Figure 5.8)
- Visibility control according to the zoom level
- Embedded geographic coordinates
- Trace (Figure 5.6)

In order to improve authoring, all these features are handled in the SVG files by a new namespace with new attributes, for example:
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:svgmap="http://tyrex.inria.fr/svgmap"
     svgmap:visibleMinZoom="17"
     svgmap:minFloor="0" svgmap:maxFloor="2"
     svgmap:title="Inria"
     svgmap:startLat="45.21882318" svgmap:startLon="5.8061204"
     svgmap:endLat="45.21730293" svgmap:endLon="5.807867424">
  <g id="rooms" svgmap:floorRange="1;1">
  </g>
</svg>

This system allows designers to create their own maps easily with Inkscape/Illustrator, without any developer knowledge. PDRTrack was tested in France at INRIA Rhône-Alpes and in Japan at Sugimotocho and Wakayamadaigaku stations. These tests made it possible to improve the map-matching algorithm and therefore the localization accuracy. A public video introducing PDRTrack is available on YouTube [30].


6 Results and Conclusions


This report describes tools and prototypes showing new interface and interaction designs enabled by new audio-visual technologies. The Wizard of Oz tool (section 2) is essential to simulate parts of the system and collect feedback from users. The IXE mobile application (section 3) deals with indoor and outdoor navigation based on the Inertial Measurement Unit and presents user interfaces and interactions for guiding people using this technology. The OSM authoring part (section 4) describes solutions to create OSM maps using PDR or a kick-scooter. Finally, PDRTrack (section 5) is a localization test application and provides user interfaces and, in particular, a solution for the visualization of buildings with many levels, based on SVG and HTML5.

7 References
[1] J.F. Kelley, "An empirical methodology for writing user-friendly natural language computer applications". In CHI '83: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 193-196, 1983.
[2] S. Dow, B. MacIntyre, J. Lee, C. Oezbek, J.D. Bolter, and M. Gandy. "Wizard of Oz support throughout an iterative design process". Pervasive Computing, IEEE, 4(4):18-26, Oct.-Dec. 2005.
[3] Steven Dow, Jaemin Lee, Christopher Oezbek, Blair MacIntyre, Jay David Bolter, and Maribeth Gandy. "Wizard of Oz interfaces for mixed reality applications". In CHI '05 Extended Abstracts on Human Factors in Computing Systems, CHI EA '05, pages 1339-1342, New York, NY, USA, 2005. ACM.
[4] Y. Li, J.I. Hong, and J.A. Landay. "Design challenges and principles for Wizard of Oz testing of location-enhanced applications". IEEE Pervasive Computing, 6:70-75, April 2007.
[5] Sony Mobile Communications. Sony SmartWatch. http://www.sonymobile.com/gb/products/accessories/smartwatch/. [Online] [Accessed 27 Dec 2012].
[6] Google Developers. Protocol buffers developer guide. [Online] [Accessed 25 Oct 2012].
[7] J. P. Rolland, H. Fuchs, "Optical Versus Video See-Through Head-Mounted Displays in Medical Visualization", Journal of Presence, Vol. 9, No. 3, June 2000, pp. 287-309.
[8] C. S. Montero, J. Alexander, M. T. Marshall, S. Subramanian, "Would You Do That? Understanding Social Acceptance of Gestural Interfaces", MobileHCI '10, September 7-10, 2010, Lisbon, Portugal, ACM.
[9] Y. Rogers, H. Sharp, and J. Preece, Interaction Design - beyond human-computer interaction, 2011, pp. 51-53, 59.
[10] B. Shneiderman, C. Plaisant, Designing the User Interface - Strategies for effective human-computer interaction, Fifth edition, 2010, pp. 304-308.
[11] R. Catrambone, J. Stasko, J. Xiao, "Anthropomorphic Agents as a User Interface Paradigm: Experimental Findings and a Framework for Research".
[12] M. Haake, "Embodied Pedagogical Agents - From Visual Impact to Pedagogical Implications", Doctoral Thesis, Dept. of Design Sciences, Lund University, Sweden.
[13] B. Svensson, M. Wozniak, "Augmented Reality Visor Concept", Master Thesis, Lund University, 2011, pp. 35.
[14] Augmented City 3D, http://www.youtube.com/watch?v=3TL80ScTLlM&feature=html5_3d, [Online] [Accessed 27 Dec 2012].


[15] Steve Mann's papers, http://n1nlf-1.eecg.toronto.edu/research.htm, [Online] [Accessed 27 Dec 2012].
[16] S. Mann, "WearCam (The Wearable Camera): Personal Imaging Systems for long-term use in wearable tetherless computer-mediated reality and personal Photo/Videographic Memory Prosthesis".
[17] Y. Ishiguro, J. Rekimoto, "Peripheral vision annotation: noninterference information presentation method for mobile augmented reality", Proceedings of the 2nd Augmented Human International Conference, Article No. 8, ACM, New York, USA, 2011.
[18] Eyez 720p video streaming, http://www.engadget.com/2011/12/07/eyez-720p-video-streaming-recordingglasses-hands-on-video/ [Online] [Accessed 27 Dec 2012].
[19] Demonstrating a next generation head-up display (HUD), http://www.pioneer.eu/eur/newsroom/news/news/next-generation-head-up-display-HUD/page.html, [Online] [Accessed 27 Dec 2012].
[20] Pioneer Augmented Reality Head-Up Display Navigation System Arriving 2012, http://www.geekygadgets.com/pioneer-augmented-reality-head-up-display-navigation-system-arriving-2012-video-24-10-2011/, [Online] [Accessed 27 Dec 2012].
[21] Eye tracking research, http://www.tobii.com/en/eye-tracking-research/global/, [Online] [Accessed 27 Dec 2012].
[22] J. L. Gabbard, J. Zedlitz, J. E. Swan II, W. W. Winchester III, "More Than Meets the Eye: An Engineering Study to Empirically Examine the Blending of Real and Virtual Color Spaces", IEEE Virtual Reality 2010.
[23] Colour blind UI check tool, http://www.vischeck.com/vischeck/vischeckImage.php, [Online] [Accessed 27 Dec 2012].
[24] More than pictures under glass, http://worrydream.com/ABriefRantOnTheFutureOfInteractionDesign/, [Online] [Accessed 27 Dec 2012].
[25] Paul Smith, "Accurate Map-based Indoor Navigation Using Smartphones", IPIN 2011, http://www.comsys.rwth-aachen.de/fileadmin/papers/2011/2011-IPIN-bitsch-footpath-long.pdf

[26] VENTURI consortium, "D3.1: Report on user expectations and cross modal interaction", January 2013.
[27] VENTURI consortium, "D4.3: WP4 outcome definitions and API specifications for inter-task / inter-WP communications", January 2013.
[28] OpenStreetMap official website, http://openstreetmap.org
[29] OpenStreetMap map features, http://wiki.openstreetmap.org/wiki/Map_Features
[30] PDRTrack overview, http://www.youtube.com/watch?v=MisAjkCi0m0
[31] iOS Human Interface Guidelines, http://developer.apple.com/library/ios/#documentation/UserExperience/Conceptual/MobileHIG/Introduction/Introduction.html
[32] Ant+, http://www.thisisant.com


8 Appendix for Research Material


8.1 H1: Using AR visor for AR is better than using phone or tablet for AR
Explanation
Mobile augmented reality (AR) applications are becoming increasingly popular, but the experience is not very immersive and neither is the interaction, since the interaction area is limited to a palm-sized screen. Using the phone or tablet to augment the world is like watching the world through a keyhole. Furthermore, the augmentation on a phone or tablet is done on a picture/image of reality, and not on reality as you perceive it, which degrades the world to the quality and speed of the camera sensor. However, AR visors need to fuse information with objects, which demands spatial calculations, tracking, etc. In addition, cognitive, perceptual, ergonomic and social issues must all be considered when developing AR visors. With an AR visor, the AR application(s) can always be running, so the augmented information is always available. With a phone or tablet, however, the user must actively initiate the use of the AR application and point the device in the desired direction for any augmented information to be available. Many users would feel awkward standing in a public spot and holding up a device in front of them for extended periods of time: it is both socially awkward and physically tiring. Positive and negative claims of using an AR visor, phone and tablet (positive in italic):

Claims | AR visor | Phone | Tablet
Field of view | Full view | Keyhole | Keyhole
Always connected | Have to put on | Slip out of pocket | In a bag
Power usage | Battery | Battery | Battery
View | See through | Camera image | Camera image
Holding the device in front of you | N/A | Physically tiring | Physically tiring
Social acceptance | Unobtrusive | Intrusive | Intrusive

Questions to be answered through user test


How useful is the image size when using AR through a phone or tablet compared with AR visors?
Experience: It is well known that some phones have too small a screen for some use cases; end users buy phones with larger screens to better meet these use cases.
Test method: Test with existing AR applications on a phone and compare with visors or a larger screen (VR).

How much is the augmented reality degraded through a phone or tablet compared to "see-through" AR visors?


Experience: Both phone and tablet have display dependencies such as camera speed, camera quality, time to zoom, time to focus, processing time, screen resolution, camera resolution, reflections, brightness, etc.
Test method: Obvious? Compare specifications using data sheets.
Literature: Optical versus video see-through head-mounted displays in medical visualization [7]

How socially acceptable is it to hold up the phone in front of you all the time?
Literature: The results from the two Master theses "Augmented Reality Visor Concept" and "Social Interaction in Augmented Reality".
Literature: "Would you do that?: understanding social acceptance of gestural interfaces" [8]
Test method: Perform focus group and user studies. Compare using AR applications on a phone (Junaio, Layar or Wikitude) and on a visor.

Would indirect interaction through a phone or tablet mean less immersion compared with an AR visor?
Clicking on a screen to interact with an object, instead of actually grabbing the air or pointing at it, will give less immersion. Conversely, more direct interaction will give the user a better spatial sense, familiarity and enjoyment. Compare two interaction types: manipulating and exploring.
Test method: Perform usability tests and compare phone/tablet vs AR visor.

Would indirect interaction through a phone or tablet mean inaccuracy due to scaling and small involuntary hand movements? Will the user miss AR experience opportunities?
Comment: This is obvious; a user interface which is most often out of sight or in your pocket means many lost opportunities to discover new things or contextual data that could be offered by the mobile device. This assumes the user wears the glasses.

Will the user perceive that he or she can execute a task faster and more accurately by using the visor than by taking a mobile device from a pocket?
Test method: Measure the time it takes to perform different actions using Fitts's law.

Is an optical see-through system better for AR than a video see-through system?
Literature: Optical versus video see-through head-mounted displays in medical visualization [7]


8.2 H2: New application paradigm which dynamically filters applications is better than the current application metaphor
Explanation
The current mobile device user interface and application paradigm are often centred on one in-focus application which is displayed full screen. This application is often dedicated to supporting the user in carrying out a specific task or tasks. We are assuming that this type of application paradigm is not the optimal paradigm for an AR visor (optical see-through head-worn system). When using the current application metaphor the user needs to search for an application, start it and use it, and different applications are needed in different situations. A new application paradigm could be a non-anthropomorphous service that filters applications by context; this means that the application search function should not be needed as much as today. However, getting context-aware guesses "wrong" has a higher penalty than the positive effect of getting them "right". Dividing apps up into things like "notifications", "actions" and "intents" and then displaying them free of their applications would lessen the impact of bad guesses. It would be interesting to investigate whether WIMP is still a valid application metaphor. The menu could e.g. be shown on different surfaces such as the palm, a table, a wall or the body, but still remain the familiar WIMP.
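As an illustration of this paradigm, the following is a minimal Kotlin sketch, using hypothetical types that are not part of any existing platform API, of a context filter that surfaces application "actions" relevant to the sensed context while keeping everything reachable through a manual fallback:

```kotlin
// Minimal sketch (hypothetical types, not an existing API) of a context filter
// that surfaces application "actions" relevant to the sensed context instead of
// requiring the user to search for and launch a full-screen application.

data class SensedContext(val location: String, val activity: String, val timeOfDay: Int)

data class AppAction(val appName: String, val label: String,
                     val relevantWhen: (SensedContext) -> Boolean)

class ContextFilter(private val allActions: List<AppAction>) {
    // Returns only the actions whose relevance predicate matches the current
    // context; everything else stays reachable through a manual search fallback.
    fun visibleActions(current: SensedContext): List<AppAction> =
        allActions.filter { it.relevantWhen(current) }
}

fun main() {
    val actions = listOf(
        AppAction("Transit", "Next bus home") { it.location == "bus stop" },
        AppAction("Recipes", "Tonight's dinner") { it.location == "kitchen" && it.timeOfDay >= 17 },
        AppAction("Mail", "Compose e-mail") { true } // always available
    )
    val filter = ContextFilter(actions)
    val atBusStop = SensedContext(location = "bus stop", activity = "standing", timeOfDay = 18)
    filter.visibleActions(atBusStop).forEach { println("${it.appName}: ${it.label}") }
}
```

Splitting applications into such small actions is what limits the cost of a wrong context guess: an irrelevant action is simply not shown, rather than a wrong full-screen application being launched.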

Direct Manipulation
Direct manipulation proposes that digital objects be designed at the interface so that they can be interacted with in ways that are analogous to how physical objects in the physical world are manipulated. In so doing, direct manipulation interfaces are assumed to enable users to feel that they are directly controlling the digital objects represented by the computer [9]. Three core principles are:
  o Continuous representation of the objects and actions of interest
  o Rapid, reversible, incremental actions with immediate feedback about the object of interest
  o Physical actions and button pressing instead of issuing commands with complex syntax

Who should be in control of the interface?


Different interaction types vary in terms of how much control a user has and how much the computer has. Whereas users are primarily in control in command-based and direct manipulation interfaces, they are less so in context-aware environments and agent-based systems. User-controlled interaction is based on the premise that people enjoy mastery and being in control. In contrast, context-aware control assumes that having the environment monitor, recognize, and detect deviations in a person's behaviour can enable timely, helpful and even critical information to be provided when considered appropriate. To what extent do you need to be in control in your everyday and working life? Can you let computing technology monitor and decide what you need, or do you prefer to tell it what you want to do? How would you feel if your assistant told you to drive more slowly because it had started to rain, or to drive faster or you will be late for your meeting? [9]

Sub hypothesis
Direct manipulation UI is better than a natural UI (e.g. a natural language UI, see [10], such as Apple's Siri or Microsoft's Office assistant Clippy).
  o User study of a central application that can assist with correct information and/or use the correct application.


Literature: Anthropomorphic Agents as a User Interface Paradigm: Experimental Findings and a Framework for Research [11]
Literature: Embodied pedagogical agents [12]

A context-based filter which makes certain applications available is better than no filtering.
  o User-intended actions need to be considered. The context-based filtering assistant/application can't know what the user wants to do all the time, so the possibility of using all applications is needed. In Samsung TVs today the applications are geographically dependent.

How should the user interact with local items? Test method: User study; by pointing at or grabbing items?

How should the user interact with items far away? Test method: User study in which the user holds a hand in front of an interactive object and the content is then mirrored into the hand. See AR Visor Concept, page 35 [13].

How should the user interact with incoming events?
  o Events that come from a local place, e.g. information about an offer from a shop, or system information?
  o Events that come from something or someone far away?

Design input
Use multimodal interaction techniques such as vibration, sound, text, pictures, animation, etc.; this way several of the human senses will be involved in processing the information. Dominant senses can, however, in some situations drown out other senses. A standardized framework that makes it possible for applications to plug in and be used seamlessly is needed. Gaze tracking together with object recognition allows adaptation to the user's varying interests without being intrusive. It is important to find a balance between intrusiveness (what to present and how) and serendipity.
  o Concept video Augmented City (watchable with 3D glasses) [14]

The notification centre in Android (and recently in iOS) may be a good design for collecting notifications that do not have a connection to the local context. The desktop metaphor with widgets may be a good inspiration:
  o Only essential info is shown in a small area and can be placed next to other widgets for at-a-glance status

The world browser


A constant inflow of items augments and populates the world around us depending on our geo-location and the geo-location of items around us. Most likely a specific information architecture will be needed to realize such a population without the mobile device having to constantly scour the web with searches and aggregate and filter the results itself.
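A sketch of the client side of such an information architecture follows, assuming a hypothetical aggregation service (stubbed here as an in-memory index): the device only asks for items within a radius of its current position instead of scouring the web itself.

```kotlin
// Sketch of the client side of a "world browser": the device asks a hypothetical
// aggregation service only for items inside a radius around the user, so filtering
// and aggregation happen server-side rather than on the mobile device.

import kotlin.math.*

data class GeoItem(val id: String, val lat: Double, val lon: Double, val label: String)

// Haversine distance in metres between two WGS84 coordinates.
fun distanceMetres(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double {
    val r = 6_371_000.0
    val dLat = Math.toRadians(lat2 - lat1)
    val dLon = Math.toRadians(lon2 - lon1)
    val a = sin(dLat / 2).pow(2) +
            cos(Math.toRadians(lat1)) * cos(Math.toRadians(lat2)) * sin(dLon / 2).pow(2)
    return 2 * r * asin(sqrt(a))
}

class WorldBrowser(private val index: List<GeoItem>) {  // stands in for a remote service
    fun itemsAround(lat: Double, lon: Double, radiusMetres: Double): List<GeoItem> =
        index.filter { distanceMetres(lat, lon, it.lat, it.lon) <= radiusMetres }
}
```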

Self-contained applications
  o Similar to how applications work today in that they take up some display real estate with a canvas where you can enter text or draw an image. Even if such an app is similar to today's in that it can be an installed application, it must still be adapted so that it can function in the real world: be interacted with using other input than mouse, keyboard or touch events. It can be placed or projected on different surfaces or locations, where it needs to adapt to its surroundings.

Tools to operate on real or virtual objects


o Similar to how we can carry around tools such as a pen or a hammer, we could carry around virtual tools for applying changes or actions to real and virtual objects around us. These tools need to be directed at objects and act upon them instead of, like applications today, loading a document and then acting on it within their own environment. This will require a strong interconnectedness between different virtual tools and the objects that they can operate on.


8.3 H3: The AR visor gives added value in everyday life to such an extent that the user wants to keep it on
Explanation
The visor form factor (optical see-through head-worn system) allows for a never-out-of-sight user experience. When something is constantly in view it is crucial that it does not irritate, interrupt or make the user feel discomfort. We believe that the user must be given added value by wearing the system, while at the same time avoiding side effects which could make the user want to take it off. The idea is to immerse the user in the mixed reality environment to such an extent that the user wants to keep the AR visors on. The user should perceive a larger value in using the AR visor for everyday situations. Some important areas for achieving this are ergonomics, content, safety, privacy and technical requirements. Industrial design, form factor and light weight are some of the ergonomic issues. The type and amount of content are other factors, so as not to overload the user with information. Further, the user should feel safe using the visors; factors such as radiation, heat (visors getting too warm) and nausea need to be investigated. Indirect safety: when driving a car it is important that the user does not get a blue screen covering his or her sight. Other technical specifications that are important for keeping the visors on are field of view, resolution, power consumption, weight, grounded glass, etc.

Sub hypothesis
The user perceives a larger value in using the AR visor for achieving certain goals, rather than resorting to other solutions.
  o Try to find out: In what situations do visors add value or improve the experience? In what situations would people use visors?

Keeping the AR visors on makes the user perceive that he/she is always up to date (local offers, social network, emergency communication).
Literature: Steve Mann's papers [15], e.g. his WearCam paper [16]
Test method: Through user studies, get users' opinions and attitudes towards wearing the glasses the entire time.

The user feels more satisfied because of the sense of increased control and more available options in his/her life.
The use of the visors can be controlled by the user so that it never reaches a level of intrusion or irritation which forces the user to remove them in order to be able to complete a task.
Test method: Evaluate through user studies, applying for instance NASA RTLX.

The AR visors feel safe for users and do not harm the users' eyes or cause nausea, thermal issues or radiation issues. Test method: By questionnaire study.

The AR visors are ergonomically well designed concerning industrial design, form factor, weight, etc.
  o "Attractive things work better", Donald A. Norman

Test method: By questionnaire study and/or interviews.


The AR visors can have eye-correction lenses (replacing spectacles of a kind)
  o The AR visors might look odd worn over normal spectacles
  o Perhaps contact lenses are the only possible solution for people that need vision correction

There have been several reports that HMDs often cause nausea. Further investigation is needed to dig into existing studies and see what the top causes of this nausea are. Is it accommodation-convergence conflict, or is it poor registration (objects floating around), etc.?
  o Test method: User study?
  o Synchronization loss in the video pipeline of the displays
  o Stereoscopic 3D effect vs. a cloned image feed

Covering the centre of the FOV means high intrusion
  o A user interface which covers a central and large part of the FOV can easily create irritation and intrusiveness if graphics are placed there without a direct command from the user. When objects in the surroundings are augmented with information, the situation must be avoided where everywhere you look you activate an information popup. We assume that there may be indications of where information can be retrieved, together with a select or "more info" action which can reveal more information. There must in that case be a balance between the information given and the ease of getting more without accidentally activating this "more info" function just by glancing at things.
  Literature: Peripheral Vision Annotation: Non-interference Information Presentation Method for Mobile Augmented Reality, paper by Y. Ishiguro and J. Rekimoto [17]
  Literature: There should be design guidelines from the military and computer gaming areas on what HUD-type UI works and does not work

Overlay graphics that appear to be locked to a real-world location must move with the environment
  o If a user is looking at a wall where a virtual painting is hanging, then moving the user's head to the sides should leave the painting hanging in the same place, not lagging behind or appearing to float rather than being pinned down. If many objects are constantly floating, it may feel like being drunk, with things spinning around, resulting in poor UX. If the overlay information can be rendered at the same focus distance as the object, the user will not be forced to frequently refocus his/her vision.
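The following sketch illustrates why a world-locked overlay must be re-projected from its fixed world position every frame using the latest head pose; it uses a simplified pinhole camera model with yaw-only rotation and illustrative axis conventions, not the actual VENTURI rendering pipeline:

```kotlin
// Sketch (simplified pinhole model, hypothetical pose source) of re-projecting a
// world-locked overlay: the anchor keeps a fixed world position and its screen
// position is recomputed from the latest head pose on every frame.

import kotlin.math.cos
import kotlin.math.sin

data class Vec3(val x: Double, val y: Double, val z: Double)

// Head pose: eye position in world coordinates and yaw (rotation about the
// vertical axis); pitch and roll are omitted to keep the sketch short.
data class HeadPose(val position: Vec3, val yawRadians: Double)

data class ScreenPoint(val x: Double, val y: Double)

fun projectToScreen(anchorWorld: Vec3, pose: HeadPose,
                    focalLengthPx: Double, cx: Double, cy: Double): ScreenPoint? {
    // Transform the world point into the head/camera frame: translate, then rotate.
    val dx = anchorWorld.x - pose.position.x
    val dy = anchorWorld.y - pose.position.y
    val dz = anchorWorld.z - pose.position.z
    val cosY = cos(pose.yawRadians)
    val sinY = sin(pose.yawRadians)
    val camX = dx * cosY - dz * sinY
    val camZ = dx * sinY + dz * cosY
    if (camZ <= 0.0) return null              // anchor is behind the viewer: draw nothing
    // Pinhole projection: screen offset scales with focal length / depth.
    return ScreenPoint(cx + focalLengthPx * camX / camZ,
                       cy - focalLengthPx * dy / camZ)
}
```

If instead the overlay were drawn at a fixed screen position, it would appear to drag along with the head, which is exactly the "floating" effect described above.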

Design input
The AR visor should be able to withstand rain.
It can be good to understand why a user does not want to use the AR visor, and to try to support the solution chosen by the user or to include characteristics that are important to the user. For instance, if the user does not find it valuable to use the personal assistant for flight details, it should still support him/her in the attempt to find or double-check the information, for instance by making it possible to record the details given when he/she asks a person. The same goes for navigational advice. This tension between different solutions becomes evident if the activity diamond is applied.


Attractive glasses from Eyez [18]
Steve Mann's WearCam paper [16] has a list of extreme stress tests that cover everyday needs very well:
  o The Casino Test: "Casino operators are perhaps the most camera shy and the savviest in detecting cameras. Therefore, if the apparatus can pass undetected in the casino environment, under the scrutiny of pit bosses and croupiers, it can probably be used in other establishments such as grocery stores, where photography is also usually prohibited."
  o The Sports Test: "A sporting event, such as volleyball, requires good hand-eye coordination" and "being able to play a competitive game of volleyball through the apparatus should be a reasonable test of its visual transparency and quality."
  o The Rain (Swim) Test: "An extreme scenario might be that if one were to fall off of a small sailing vessel or the like, into the water, that the apparatus and wearer would both survive."

Should be see-through and easy to turn off
  o Maybe consider a hardware on/off switch like in noise-cancelling headphones
  o Maybe there should be a quickly accessible mute button which does not turn the system off but suspends any graphics overlay, which may be useful in situations when the user must not be disturbed

Should have a context-dependent shut-off if a failure happens.
Indirect safety: when driving a car it is important that the user does not get a blue screen covering his or her sight.
  o Will driving with visors be banned, similar to talking on the phone? It might be possible to glean information from the Car Connectivity Consortium and its Driver Distraction rules.
  o Pioneer Head-Up Display: hologram-based AR navigation system from Pioneer [19] and video [20]


8.4 H4: Interaction redundancy results in low intrusion


Explanation
Since not all user contexts can be correctly sensed, an augmented reality mobile device should allow for different interaction modalities (e.g. voice, touch, gestures and gaze) as well as different levels of discreteness (e.g. small, discrete movements of the hand versus full-body gestures). One example is when the hands of the user are busy, which makes voice a possibility while touch input is not. Another example is when the user is on a train next to a sleeping person and would not like to be forced to speak out loud. Input modalities and discreteness levels should be considered both from the perspective of the user and from that of observers/bystanders. The goal of investigating this area is to determine the minimum number of modalities and discreteness levels which an AR visor should support in order to be useful in everyday life situations. Pocket interaction can be good if gestures are not acceptable, but otherwise gestures can be used. State-of-the-art gaze tracking might be a differentiator as an input modality and an enabler for a new interaction paradigm.
  o Eye-tracking research from Tobii [21]

Redundancy: unforeseen contexts may prevent the user from using certain modalities, e.g. a loud environment and voice communication, or a bumpy bus ride and small touch areas (see the sketch below).
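A minimal sketch of the redundancy idea, using hypothetical context flags: each sensed constraint removes modalities, and the design goal is that the remaining set is never empty for simple operations.

```kotlin
// Sketch (hypothetical context flags) of interaction redundancy: the same command
// stays reachable through at least one modality whatever the sensed context blocks.

enum class Modality { VOICE, TOUCH, HAND_GESTURE, GAZE_DWELL, POCKET_BUTTON }

data class InteractionContext(
    val handsBusy: Boolean,
    val noisyEnvironment: Boolean,
    val sociallyQuietSetting: Boolean,   // e.g. next to a sleeping passenger
    val bumpyRide: Boolean
)

fun availableModalities(ctx: InteractionContext): Set<Modality> {
    val available = Modality.values().toMutableSet()
    if (ctx.handsBusy) { available -= Modality.TOUCH; available -= Modality.HAND_GESTURE }
    if (ctx.noisyEnvironment || ctx.sociallyQuietSetting) available -= Modality.VOICE
    if (ctx.bumpyRide) available -= Modality.TOUCH      // small touch targets become unreliable
    return available
}

fun main() {
    val onTrain = InteractionContext(handsBusy = false, noisyEnvironment = false,
                                     sociallyQuietSetting = true, bumpyRide = true)
    println(availableModalities(onTrain))   // e.g. [HAND_GESTURE, GAZE_DWELL, POCKET_BUTTON]
}
```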

Design input
It should always be possible to perform simple operations when hands are busy. Examples could include:
  o Dismissing an application asking for user input or requesting the user's attention

A user should never be forced to speak out loud in order to complete an action
A user should never be forced to use full-body or big-movement gestures to complete an action
It should be possible to stand and talk to someone and perform simple interactions without it being noticeable
A user should be able to receive a notification while talking to another person without feeling distracted


8.5 H5: Adjustable user interface elements that can change colour and placement are better than a non-adjustable user interface
Explanation
It is well documented that natural lighting conditions and real-world backgrounds affect the usability of optical see-through AR displays in outdoor environments. In many cases, outdoor environmental conditions can dramatically alter users' colour perception of user interface (UI) elements, for example by washing out text or icon colours [22]. The UI elements need to be adjustable to fit the user's context. The colours should be able to change depending on the light conditions and the background colour; however, the colour should not change from white to pink, but rather from white to light grey. Further, the UI needs to change when the user is out running compared to sitting on a train. Even the placement of the UI should be adjustable, depending on interesting objects in the field of the display.
Example: Pulling out information from the to-do list app about a nearby bike repair shop while biking with a broken horn, unobtrusively pointing out its location with a label "Repair bike here ->".
Example: Your keys carry a tag, e.g. Bluetooth; if the device (mediator/phone/visors) cannot find the keys around you or in your bag, you get an unobtrusive notification about the forgotten keys.
Example: Imagine the user places NFC tags across his/her social habitat (work space, home, car, the objects he/she interacts with); whenever the phone catches one of those tags, some additional information about the current context is delivered to the visors. E.g. the phone is sitting on the work desk (next to the NFC tag placed there) and the user gets notified about that, taking a relevancy level into account.
Tool: Colour-blind UI check tool [23]
Literature: More Than Meets the Eye: An Engineering Study to Empirically Examine the Blending of Real and Virtual Colour Spaces [22]

Positive and negative claims of adjustable UI elements:

| Claims | Positive | Negative |
| Adjustable colour | User can read and see the UI elements independent of the background | Intrusive if colour changes draw attention to the user; performance problem |
| Adjustable placement | Not hindering interesting objects | Irritating or difficult to read if the UI element moves around too much |
| Fixed elements | Easy for the user to find some important information (battery, number of messages etc.) | Might hinder interesting objects |
| System initiated | User doesn't have to do anything | Changing to a bad colour |
| User initiated | User has control | Irritating if the user needs to change often |

VENTURI Consortium 2011-2014


Page 49

FP7-288238

Document Code: D3.2 Interface design prototypes and/or mock

Sub hypothesis
Sound volume that is automatically adjusted to the surrounding audio level is more pleasant and results in less intrusion (no need to constantly adjust the volume up or down manually) than a constant audio volume that is only manually adjustable by the end user.
Test method: Through user studies?
Adjusting the size of UI elements and activation areas according to the movement situation of the user/device will increase input accuracy and readability. Situations where this should be of special benefit are walking or a bumpy car ride, where constant movement makes reading and input difficult with small text and input areas.
Test method: Through user studies?

Fixed placement of the UI elements can be acceptable in specific use cases such as cycling or skiing.

Design input
Brightness and colour of overlaid UI elements must adjust to brightness and colour of the user's environment in order to allow for good readability and to avoid eye strain
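A minimal sketch of such an adjustment, assuming an ambient illuminance reading in lux as input (the thresholds and scaling factors are illustrative, not measured values): the element keeps its hue and only its luminance is scaled, i.e. white becomes light grey rather than pink.

```kotlin
// Minimal sketch (hypothetical sensor input, illustrative constants) of adjusting
// overlay luminance to the ambient light level while staying within the same hue.

import kotlin.math.roundToInt

data class Rgb(val r: Int, val g: Int, val b: Int)

// Scale the element colour between a dim and a bright variant of the same hue,
// based on an ambient illuminance reading in lux (clamped to a working range).
fun adaptToAmbientLight(base: Rgb, ambientLux: Double): Rgb {
    val minLux = 10.0      // dim indoor lighting
    val maxLux = 10_000.0  // bright daylight
    val t = ((ambientLux - minLux) / (maxLux - minLux)).coerceIn(0.0, 1.0)
    // Keep at least 55% of the base luminance so text never fades out completely.
    val scale = 0.55 + 0.45 * t
    fun ch(c: Int) = (c * scale).roundToInt().coerceIn(0, 255)
    return Rgb(ch(base.r), ch(base.g), ch(base.b))
}

fun main() {
    val white = Rgb(255, 255, 255)
    println(adaptToAmbientLight(white, 50.0))     // dim room -> light grey
    println(adaptToAmbientLight(white, 10_000.0)) // daylight -> full white
}
```

A full solution would also consider the colour of the real-world background behind each element, as discussed in [22]; this sketch only covers the overall luminance adaptation.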


8.6 H6: Using the same type of interactions for virtual objects as with real objects is better than indirect screen/pointer interaction
Explanation
When a user is wearing an AR visor (optical see-through head-worn system) there is a possibility to blend the real world and the virtual world. We can have virtual objects displayed and positioned so that they appear to be in the real world; one example would be to draw a flower pot and a flower sitting on a table. When the user moves around, the sensors in the system take this into account and redraw the image so that perspective and placement are correct. It is also possible to use a shared service which makes virtual objects persistent between uses and between persons. Objects in real life are there for all our senses, and we have learned throughout our lifetime how to handle them. In the case of a tool like a hammer, we pick it up by the handle and use it by swinging it towards a nail. All real-world objects have volume, weight and location; we know that we can put them in a container and bring them with us. These are just some of the properties of real-world objects, of course. The hypothesis is that if we can represent and handle virtual objects in a similar way to real-world objects, we will introduce a natural and intuitive way of interacting which will be easier and more efficient than the indirect selection and activation paradigm of a mouse pointer or touch screen.

Design input
A virtual object should have a persistent spatial location. If you place it on a table and walk out of the room, it should still be there if you return to the room a couple of minutes later.
A virtual object can be picked up and brought along somewhere on the user's body (e.g. in a pocket, or in the hand of the user).
The virtual object may have an owner, which prohibits other users from picking it up or acting on it.
The virtual object may be limitless in quantity, which means that if it is picked up it will still remain in place while another instance of the virtual object is the one picked up. One example is a pen stand with one pen in it. The virtual pen object may be configured so that there will always be a pen in the pen stand. If a user then picks up the pen, it will be duplicated so that one pen is in the hand of the user picking it up while one pen is left in the pen stand.
There should be a simple way for a user to determine whether an object is virtual or not. One way would be to press a mute button on the visors which makes all virtual objects disappear. Another way would be for each object to be accompanied by a small floating text label which can be easily detected when looking for it.
A virtual object shall be able to be retrieved or removed from the world by its owner without the owner being at the same location.
A virtual object shall be able to be operated or acted upon both by users with the capability of direct tangible interaction and via indirect methods (pointer, touch screen or other indirect input).
Indirect screen/pointer interaction for virtual objects is better than using the same type of interactions as with real objects.
Video link: More than pictures under glass [24]
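A sketch of a data model covering the properties listed above (persistent spatial location, ownership, and "limitless" objects that leave a copy behind when picked up); the types are hypothetical and only meant to make the listed behaviours concrete.

```kotlin
// Sketch (hypothetical data model) of the virtual-object properties listed above:
// a persistent world position, an optional owner, and a "limitless" flag meaning
// that picking the object up leaves a copy in place.

data class WorldPosition(val x: Double, val y: Double, val z: Double)

data class VirtualObject(
    val id: String,
    val kind: String,                 // e.g. "pen", "flower pot"
    var position: WorldPosition?,     // null while carried by a user
    val ownerId: String?,             // null = anyone may act on it
    val limitless: Boolean            // picking up spawns a new instance in place
)

class VirtualObjectStore {
    private val objects = mutableMapOf<String, VirtualObject>()
    private var nextId = 0

    fun place(obj: VirtualObject) { objects[obj.id] = obj }

    // Returns the instance the user now holds, or null if the object is owned by
    // someone else. A limitless object is duplicated so the original stays in place.
    fun pickUp(objectId: String, userId: String): VirtualObject? {
        val obj = objects[objectId] ?: return null
        if (obj.ownerId != null && obj.ownerId != userId) return null
        return if (obj.limitless) {
            val copy = obj.copy(id = "${obj.id}-${nextId++}", position = null)
            objects[copy.id] = copy
            copy
        } else {
            obj.position = null       // carried: no longer anchored in the world
            obj
        }
    }

    // The owner can remove an object remotely, as required above.
    fun removeByOwner(objectId: String, userId: String): Boolean {
        val obj = objects[objectId] ?: return false
        if (obj.ownerId != userId) return false
        objects.remove(objectId)
        return true
    }
}
```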
