
Smart Innovation, Systems and Technologies 14

Editors-in-Chief

Prof. Robert J. Howlett
KES International
PO Box 2115
Shoreham-by-sea
BN43 9AF
UK
E-mail: rjhowlett@kesinternational.org

Prof. Lakhmi C. Jain
School of Electrical and Information Engineering
University of South Australia
Adelaide
South Australia SA 5095
Australia
E-mail: Lakhmi.jain@unisa.edu.au

For further volumes:
http://www.springer.com/series/8767
Toyohide Watanabe, Junzo Watada,
Naohisa Takahashi, Robert J. Howlett,
and Lakhmi C. Jain (Eds.)

Intelligent Interactive
Multimedia: Systems
and Services
Proceedings of the 5th International
Conference on Intelligent Interactive
Multimedia Systems and Services
(IIMSS 2012)

Editors

Professor Toyohide Watanabe
Nagoya University
Japan

Professor Junzo Watada
Waseda University
Kitakyushu
Japan

Professor Naohisa Takahashi
Nagoya Institute of Technology
Japan

Professor Robert J. Howlett
KES International
Shoreham-by-sea
United Kingdom

Professor Lakhmi C. Jain
University of South Australia
Adelaide
Australia

ISSN 2190-3018 e-ISSN 2190-3026


ISBN 978-3-642-29933-9 e-ISBN 978-3-642-29934-6
DOI 10.1007/978-3-642-29934-6
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2012937643

© Springer-Verlag Berlin Heidelberg 2012


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered
and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of
this publication or parts thereof is permitted only under the provisions of the Copyright Law of the
Publisher’s location, in its current version, and permission for use must always be obtained from Springer.
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations
are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of pub-
lication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any
errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect
to the material contained herein.

Printed on acid-free paper


Springer is part of Springer Science+Business Media (www.springer.com)
Preface

This volume contains the Proceedings of the 5th International Conference on Intelligent Interactive Multimedia Systems and Services (KES-IIMSS-12). The Conference was jointly organised by Nagoya University in Japan and the KES International organisation, and held in the attractive city of Gifu.
The KES-IIMSS conference series (series chairs Prof. Maria Virvou and Prof. George Tsihrintzis) presents novel research in various areas of intelligent multimedia systems relevant to the development of a new generation of interactive, user-centric devices and systems. The aim of the conference is to provide an internationally respected forum for scientific research in the technologies and applications of this new and dynamic research area.
At a time when computers are more widespread than ever, and computer users range from highly qualified scientists to professionals who are not computer experts, intelligent interactive systems are becoming a necessity in modern computing. A "one-fits-all" solution is no longer applicable to wide ranges of users with various backgrounds and needs. Therefore, one important goal of many intelligent interactive systems is dynamic personalization and adaptivity to users.
Multimedia systems refer to the coordinated storage, processing, transmission and retrieval of multiple forms of information, such as audio, image, video, animation, graphics, and text. The growth rate of multimedia services has become explosive, as technological progress matches consumer needs for content. The KES-IIMSS conferences explore this area.
As is frequently the case at KES conferences, invited sessions on thematic topics, under the overall umbrella of the conference, were an important part of the event. These allow for timely dissemination of research breakthroughs and novel ideas via a number of autonomous presentation sessions and workshops on emerging issues and topics that are identified each year. Of particular interest were sessions on: Kukanchi - interactive human-space design and intelligence, Intelligent Human-Computer Interaction, Nonverbal Communication Technology for Human-Computer Interaction, Knowledge Media for Learning and Creativity, Interactive Learning / Educational Systems, Services, and Environments, Student-Centred e-Learning, Social Intelligence in Human Community, Risk and Cognition, Information Service and Knowledge Sharing through Interactive Media, and Personalization of Intelligent Interactive Systems.
All papers were peer-reviewed to rigorous standards to ensure high quality and
fitness for publication. Approximately 60 papers were selected for publication from
a much larger number. We are very satisfied with the standard of these papers and
the breadth and depth of the programme. We thank the authors for choosing KES-
IIMSS-12 as the means of bringing their work to the attention of the public. We
appreciate the efforts of the keynote speakers in enlightening our delegates. We
gratefully acknowledge the work of the international programme committee for ensuring the quality of the published papers. We thank the local committee for their organisation and management: in particular, Prof. Naoto Mukai of Sugiyama Jyo-gakuen University and Prof. Taketoshi Ushiama of Kyushu University for their editing of these proceedings, and we appreciate the administrative efficiency of the KES International secretariat staff.

Prof. Toyohide Watanabe, Nagoya Univ., Japan


Prof. Junzo Watada, Waseda Univ., Japan
Prof. Lakhmi C. Jain, Univ. of South Australia, Australia
Prof. Robert J. Howlett, Bournemouth Univ., UK
Contents

1 A Decision Making for a Robot Based on Simple Interaction with
Human . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Hiroyuki Masuta, Yasuto Tamura, Hun-ok Lim
2 A Fusion of Multiple Focuses on a Focus+Glue+Context Map . . . . . 11
Hiroya Mizutani, Daisuke Yamamoto, Naohisa Takahashi
3 A Map Matching Algorithm for Sharing Map Information
among Refugees in Disaster Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Koichi Asakura, Masayoshi Takeuchi, Toyohide Watanabe
4 A Method for Supporting Presentation Planning Based on
Presentation Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Koichi Hanaue, Toyohide Watanabe
5 A Study on Privacy Preserving Collaborative Filtering with Data
Anonymization by Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Katsuhiro Honda, Yui Matsumoto, Arina Kawano, Akira Notsu,
Hidetomo Ichihashi
6 A Traffic Flow Prediction Approach Based on Aggregated
Information of Spatio-temporal Data Streams . . . . . . . . . . . . . . . . . . 53
Jun Feng, Zhonghua Zhu, Rongwei Xu
7 A Way for Color Image Enhancement under Complex
Luminance Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Margarita Favorskaya, Andrey Pakhirka
8 Animated Pronunciation Generated from Speech for
Pronunciation Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Yurie Iribe, Silasak Manosavan, Kouichi Katsurada, Tsuneo Nitta
9 Building a Domain Ontology to Design a Decision Support
Software to Plan Fight Actions against Marine Pollutions . . . . . . . . . 83
Jean-Marc Mercantini, Colette Faucher
10 Can Pictures Be a Candidate for Knowledge Media? . . . . . . . . . . . . . 97
Fuminori Akiba
11 Capturing Student Real Time Facial Expression for More
Realistic E-learning Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Asanka D. Dharmawansa, Katsuko T. Nakahira, Yoshimi Fukumura
12 Character Giving Model of KANSEI Robot Based on the
Tendency of User’s Treatment for Personalization . . . . . . . . . . . . . . . 117
Hiroki Ogasawara, Shohei Kato
13 Checklist System Based on a Web for Qualities of Distance
Learning and the Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Nobuyuki Ogawa, Hideyuki Kanematsu, Yoshimi Fukumura,
Yasutaka Shimizu
14 Comparison Analysis for Text Data by Integrating Two
FACT-Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Ryosuke Saga, Hiroshi Tsuji
15 Construction of a Local Attraction Map According to Social
Visual Attention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Ichiro Ide, Jiani Wang, Masafumi Noda, Tomokazu Takahashi,
Daisuke Deguchi, Hiroshi Murase
16 Construction of Content Recording and Delivery System for
Intercollegiate Distance Lecture in a University Consortium . . . . . . 163
Takeshi Morishita, Kizuku Chino, Masaaki Niimura
17 Data Embedding and Extraction Method for Printed Images by
Log Polar Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Kimitoshi Tamaki, Mitsuji Muneyasu, Yoshiko Hanada
18 Design and Implementation of Computer Assisted Training
System for Nursing Process Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Seiichiro Takami, Toshinobu Kawai, Takako Takeuchi, Yukiko Fukuda,
Satoko Kamiya, Kaori Nakajima, Setsuko Maeda, Junko Okumura,
Misako Sugiura, Yukuo Isomoto
19 Designing Agents That Recognise and Respond to Players’
Emotions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Weiqin Chen
20 Development of Agent-Based Model for Simulation on Residential
Mobility Affected by Downtown Regeneration Policy . . . . . . . . . . . . 201
Zhenjiang Shen, Yan Ma, Mitsuhiko Kawakami, Tatsuya Nishino
21 Development of the Online Self-Placement Test Engine That
Interactively Selects Texts for an Extensive Reading Test . . . . . . . . . 213
Kosuke Adachi, Mark Brierley, Masaaki Niimura
22 DOSR: A Method of Domain-Oriented Semantic Retrieval in
XML Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Jun Feng, Zhixian Tang, Ruchun Huang
23 Encoding Travel Traces by Using Road Networks and Routing
Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Pablo Martinez Lerin, Daisuke Yamamoto, Naohisa Takahashi
24 Estimation of Dialogue Moods Using the Utterance Intervals
Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
Kaoru Toyoda, Yoshihiro Miyakoshi, Ryosuke Yamanishi,
Shohei Kato
25 Extraction of Vocational Aptitude from Operation Logs in
Virtual Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Kyohei Nishide, Tateaki Komaki, Fumiko Harada,
Hiromitsu Shimakawa
26 Framework of a System for Extracting Mathematical Concepts
from Content MathML-Based Mathematical Expressions . . . . . . . . . 269
Takayuki Watabe, Yoshinori Miyazaki
27 Fundamental Functions of Dynamic Teaching Materials System . . . 279
George Moroni Teixeira Batista, Mayu Urata, Takami Yasuda
28 Generation Method of Multiple-Choice Cloze Exercises in
Computer-Support for English-Grammar Learning . . . . . . . . . . . . . 289
Ayse Saliha Sunar, Dai Inagi, Yuki Hayashi, Toyohide Watanabe
29 Genetic Ensemble Biased ARTMAP Method of ECG-Based
Emotion Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
Chu Kiong Loo, Wei Shiung Liew, M. Shohel Sayeed
30 Honey Bee Optimization Based on Mimicry of Threshold
Regulation in Honey Bee Foraging . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
Maki Furukawa, Yasuhiro Suzuki
31 IEC-Based 3D Model Retrieval System . . . . . . . . . . . . . . . . . . . . . . . . 317
Seiji Okajima, Yoshihiro Okada
32 Incremental Representation and Management of Recursive
Types in Graph-Based Data Model for Content Representation of
Multimedia Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
Teruhisa Hochin, Yuki Ohira, Hiroki Nomiya
33 Intelligent Collage System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
Margarita Favorskaya, Elena Yaroslavtzeva, Konstantin Levtin
34 Intuitive Humanoid Robot Operating System Based on
Recognition and Variation of Human Body Motion . . . . . . . . . . . . . . 351
Yuya Hirose, Shohei Kato
35 Knowledge-Based System for Automatic 3D Building Generation
from Building Footprint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
Kenichi Sugihara, Xinxin Zhou, Takahiro Murase
36 Locomotion Design of Artificial Creatures in Edutainment . . . . . . . . 375
Kyohei Toyoda, Takamichi Yuasa, Toshio Nakamura, Kentaro Onishi,
Shunshuke Ozawa, Kunihiro Yamada
37 Multistep Search Algorithm for Sum k-Nearest Neighbor Queries
on Remote Spatial Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Hideki Sato, Ryoichi Narita
38 (Not)Myspace: Social Interaction as Detriment to Cognitive
Processing and Aesthetic Experience in the Museum of Art . . . . . . . 399
Matthew Pelowski
39 Nuclear Energy Safety Project in Metaverse . . . . . . . . . . . . . . . . . . . . 411
Hideyuki Kanematsu, Toshiro Kobayashi, Nobuyuki Ogawa,
Yoshimi Fukumura, Dana M. Barry, Hirotomo Nagai
40 Online Collaboration Support Tools for Blended Project-Based
Learning on Embedded Software Development: Final Report . . . . . 419
Takashi Yukawa, Tomonori Iwazaki, Keisuke Ishida, Yuji Nishigaki,
Yoshimi Fukumura, Makoto Yamazaki, Naoki Hasegawa,
Hajime Miura
41 Online News Browsing over Interrelated Target Events . . . . . . . . . . . 429
Yusuke Koyanagi, Toyohide Watanabe
42 Path Planning in Probabilistic Environment by Bacterial
Memetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
János Botzheim, Yuichiro Toda, Naoyuki Kubota
43 Personalization of News Speech Delivery Service Based on
Transformation from Written Language to Spoken Language . . . . . 449
Shigeki Matsubara, Yukiko Hayashi
44 Personalized Text Formatting for E-mail Messages . . . . . . . . . . . . . . 459
Masaki Murata, Tomohiro Ohno, Shigeki Matsubara
45 Presentation Story Estimation from Slides for Detecting
Inappropriate Slide Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
Tomoko Kojiri, Fumihiro Yamazoe
46 Problem Based Learning for US and Japan Students in a Virtual
Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
Dana M. Barry, Hideyuki Kanematsu, Yoshimi Fukumura,
Toshiro Kobayashi, Nobuyuki Ogawa, Hirotomo Nagai
47 Proposal of a Numerical Calculation Exercise System for SPI2
Test Based on Academic Ability Diagnosis . . . . . . . . . . . . . . . . . . . . . . 489
Shin’ichi Tsumori, Kazunori Nishino
48 Proposal of an Automatic Composition Method of Piano Works
for Novices Based on an Analysis of Study Items in Early Stages
of Piano Education . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
Mio Iwaki, Hisayoshi Kunimune, Masaaki Niimura
49 Proposal of MMI-API and Library for JavaScript . . . . . . . . . . . . . . . 511
Kouichi Katsurada, Taiki Kikuchi, Yurie Iribe, Tsuneo Nitta
50 Proposal of Teaching Material of Information Morals Education
Based on Goal-Based Scenario Theory for Japanese High School
Students . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
Kyoko Umeda, Ayako Shimoyama, Hironari Nozaki, Tetsuro Ejima
51 Prototypical Design of Learner Support Materials Based on the
Analysis of Non-verbal Elements in Presentation . . . . . . . . . . . . . . . . 531
Kiyota Hashimoto, Kazuhiro Takeuchi
52 Reflection Support for Constructing Meta-cognitive Skills by
Focusing on Isomorphism between Internal Self-dialogue and
Discussion Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
Risa Kurata, Kazuhisa Seta, Mitsuru Ikeda
53 Skeleton Generation for Presentation Slides Based on Expression
Styles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
Yuanyuan Wang, Kazutoshi Sumiya
54 Stochastic Applications for e-Learning System . . . . . . . . . . . . . . . . . . 561
Syouji Nakamura, Keiko Nakayama, Toshio Nakagawa
55 Supporting Continued Communication with Social Networking
Service in e-Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
Kai Li, Yurie Iribe
56 Tactile Score, a Knowledge Media of Tactile Sense for Creativity . . . 579
Yasuhiro Suzuki, Junji Watanabe, Rieko Suzuki
57 Taxi Demand Forecasting Based on Taxi Probe Data by Neural
Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
Naoto Mukai, Naoto Yoden
58 The Design of an Automatic Lecture Archiving System Offering
Video Based on Teacher’s Demands . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
Shin’nosuke Yamaguchi, Yoshimasa Ohnishi, Kazunori Nishino
59 The Difference and Limitation of Cognition for Piano Playing
Skill with Difference Educational Design . . . . . . . . . . . . . . . . . . . . . . . 609
Katsuko T. Nakahira, Miki Akahane, Yukiko Fukami
60 Topic Bridging by Identifying the Dynamics of the Spreading
Topic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619
Makoto Sato, Mina Akaishi, Koichi Hori

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 629


A Decision Making for a Robot Based on Simple
Interaction with Human

Hiroyuki Masuta, Yasuto Tamura, and Hun-ok Lim

Abstract. Recently, intelligent robots have been expected to operate in our living spaces. To realize this, a robot should decide on an action from a simple order given by a human. For a robot's decision making, it is important both to perceive the environmental situation and to adapt to the preferences of the person. We have proposed a learning method based on SOM that adapts to the environmental situation and to the preferences of a human. Through simulation experiments, we verified that the proposed method can take into account changes of attributes over time, and that the decision making of a robot can be adapted to the preferences of a person through interaction.

Keywords: Human friendly robot, Service Robot, Self Organized Map, Human
Interaction.

1 Introduction

Recently, intelligent robots have been expected to work in our environments, such as factories, homes and offices [1]. In such environments, robots must have advanced intelligent functions. An intelligent robot should interact with humans through natural communication such as voice commands and gesture recognition. Moreover, a robot should decide on the behavior for an appropriate task from human orders and the environmental situation. However, previous interaction robots require detailed orders from the human, for example to turn devices on or off, to change action modes, and so on. The problem with this style of interaction is that the context of a dynamic environment is not considered; consequently, humans cannot trust the operation of a robot that cannot respond to their true request. Solving this problem by installing every action corresponding to the dynamic environment would require a huge number of action patterns, or the operator would have to re-order the task whenever something changes.

Hiroyuki Masuta · Yasuto Tamura · Hun-Ok Lim


3-27-1 Rokkakubashi, Kanagawa-Ku, Yokohama-Shi, Kanagawa, 221-8686, Japan
e-mail: masuta-hiroyuki@kanagawa-u.ac.jp,
r200802502fu@kanagawa-u.ac.jp, holim@kanagawa-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 1–10.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

On the other hand, a human can understand the true request, that is, what a person hopes for, without detailed orders by voice and gesture. Furthermore, a human can interpret ambiguous terms and referring expressions; humans thus complement the information missing from a simple request. To complement the missing information, a human attaches importance to the environmental situation, considered as a spatiotemporal context, and to past experience. Therefore, the target of this research is a robot that can understand the true task from a simple request, based on learning from past experience and the environmental situation. If a robot can understand the true request, that is, the human's intention, it will be able to provide service and assistance from simple requests such as referring expressions and simple gestures.
The specific task of this research is a service robot that clears a table in a restaurant. Our previous research developed a service robot system for clearing a table with human interaction [2]. In this system, a person gives a clearance command by voice and pointing gesture to an interaction robot, and the service robot infers the dish-clearing procedure from the human orders and the properties of the dishes, such as size, position, default repository and so on. However, it is difficult to estimate an appropriate clearing plan, because the selection of objects to clear, the meaning of the human order and the storage place change dynamically with the environmental situation. For example, the voice command "Clean up" is generally understood to mean that all dishes are removed from the table to the sink. But if a dish was not used, it should be returned to the kitchen cabinet, and if the dishes are paper, the command should be understood as "Throw away". The reason for this problem is that the robot can neither understand the environmental situation as a spatiotemporal context nor take into account past experience gained through human interaction. Therefore, we propose a learning method based on perceiving the environmental situation and considering past experience.
This paper is organized as follows: Section 2 explains our intelligent human-friendly robot system for clearing a table, Section 3 explains the learning method for estimating human intention, and Section 4 describes an experiment on clearing a table. Finally, Section 5 concludes this paper.

2 Intelligent Human Friendly Robot System


Fig. 1 shows an overview of our intelligent human-friendly robot system for clearing a table. The system consists of a service robot, an interaction robot, and an intelligent space server. The interaction robot recognizes human orders from hand gestures and spoken commands by using a stereo vision system and a voice recognition system [3]. The intelligent space server manages information on table objects, such as dish type, size and color, by using an RFID system [4]. The service robot, which consists of a robot arm on a mobile robot, picks up a dish based on vision information from stereo vision or a 3D range camera. The robot arm "Katana" is small and lightweight, similar in size to a human arm; it has a 5 degree-of-freedom (5-DOF) structure and 6 motors, including a gripper. We installed a 3D range camera and a stereo vision system beside the robot arm, as shown in Fig. 1. The stereo vision system, made by Toshiba, can recognize dish position and posture from the elliptical shape of dishes [5]. The 3D range camera can measure 3-dimensional distance up to 7.5 m [6].

Fig. 1 The human friendly robot system overview

On the other hand, it is difficult to accurately manage the various environmental information needed for perceiving the situation in this service robot system, such as the trajectory of a dish, the trajectory of the human, leftover food and so on. We therefore apply the simulation system "V-REP" and developed a simulation model similar to Fig. 1. Fig. 2 shows snapshots of the simulation. The specification of the service robot is the same, and the required information can be measured accurately. We use a Kinect sensor to interact with the human instead of the interaction robot.

Fig. 2 The simulation model on V-REP



Moreover, we apply Robot Technology Middleware (RTM) to ease development across the real robot system and the simulation model [7][8]. The purpose of RTM is to provide a common platform that assists in the development of robotic systems. RTM is a component-based platform: components, called RT-Components (RTCs), are developed independently by various companies and universities, and such RTCs can be used easily on the RTM platform. We use an OpenHRI component to recognize voice commands [9] and an OpenNI component to recognize human gestures.

3 Learning for Human Intention Estimation


A service robot requires various kinds of information to understand human intention. Fig. 3 shows the information flow. To perceive the situation, the robot needs not only environmental information, such as dish positions and the human's position, but also its own actions and state transitions. A suitable action is then decided by integrating the above information.

Fig. 3 Information flow of a service robot

3.1 Estimation of Human Intention Based on Ecological Psychology

Ecological psychology insists that human perception is not inference from the sense organs but the detection of invariant information. The invariant is an important concept of the perceptual system in ecological psychology [10]. As an example of an invariant, Lee verified that tau-coupling is used for direct perception when a human perceives an approaching object [11]. Turvey verified that dynamic touch is also direct perception, by which a human perceives the length of an object without using eyesight. The theory common to these examples is that a human directly perceives the information important for taking suitable action, such as tau-coupling. Invariants extracted as perceptual information are obtained from the coupling structure of the physical body with the environment. We have taken the concept of direct perception into account to realize humanlike judgment.

We propose a learning method based on this concept from ecological psychology. This section explains a learning method that uses a self-organizing map (SOM) to classify dish properties directly, based on environmental information and the actions taken.

3.2 The SOM for Clearing a Table

SOM is a clustering method based on neural networks. It consists of two layers, an input layer and an output layer, and is trained by unsupervised learning [12]. The feature of SOM is that it builds a map of arbitrary dimension that preserves the correlations in the input data; thus m-dimensional neurons in the output layer are learned from n-dimensional input data. Generally, the output layer is given two or three dimensions for visual understandability. We apply a general two-dimensional SOM, because the task of clearing a table is a comparatively restricted situation.
Output nodes are connected in an array of i rows and j columns. All nodes in the input layer are connected to all nodes in the output layer, and each output node has a weight vector W of n dimensions. The winner node is the node with the smallest Euclidean distance |X − Wi,j| between the input X and the output node; it is called the Best Matching Unit (BMU). The weight vectors are updated around the BMU as follows:

    Wi,j(t + 1) = Wi,j(t) + hc · (X − Wi,j(t))    (1)

where X is an input vector and hc is the neighborhood function shown in equation (2):

    hc = a · exp(−r² / σ²)    (2)

where r is the distance between the BMU and a node (i, j), and a and σ are constant parameters that set the size of the neighborhood. As a result, similar nodes gather near each other.
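As a concrete illustration, equations (1) and (2) can be sketched as a minimal two-dimensional SOM in Python. The grid size, learning rate a, and neighborhood width σ used below are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

class SOM:
    """Minimal 2-D self-organizing map implementing updates (1) and (2)."""

    def __init__(self, rows, cols, dim, a=0.5, sigma=1.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.random((rows, cols, dim))   # weight vectors W_{i,j}
        self.a, self.sigma = a, sigma
        # Grid coordinates of every node, used for the distance r in (2).
        self.grid = np.stack(
            np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"),
            axis=-1)

    def bmu(self, x):
        # Best Matching Unit: the node minimizing |X - W_{i,j}|.
        d = np.linalg.norm(self.W - x, axis=-1)
        return np.unravel_index(np.argmin(d), d.shape)

    def update(self, x):
        c = self.bmu(x)
        r2 = np.sum((self.grid - np.array(c)) ** 2, axis=-1)
        hc = self.a * np.exp(-r2 / self.sigma ** 2)   # equation (2)
        self.W += hc[..., None] * (x - self.W)        # equation (1)
        return c
```

Feeding the map the 5-dimensional dish vectors repeatedly pulls the BMU and its neighbors toward each input, so dishes with similar attributes end up in nearby map regions.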
In this research, the parameters of a dish used as the input vector are: the weight ratio of the dish (WR), the weight-ratio change (WRV), the displacement of the dish (OT), the distance between the human and the dish (DP), and the transition rate between place IDs (PIV). WR is the normalized weight of a dish with food, from 0.0 (empty) to 1.0 (full). WRV is set to 1 or 0 according to whether WR has changed, which indicates that a meal is in progress. OT is the integrated travel distance of the dish, normalized by its maximum; it reflects how frequently the dish is used while eating. DP is the distance between the human and the dish, which indicates its priority for the person. PIV is the frequency of movement between storage places over a month. These parameters were selected because they can be measured by the real robot system of Fig. 1. They are updated at intervals of 5 seconds on the simulator, and the SOM is trained using these 5 inputs.
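To make the five inputs concrete, the definitions above can be sketched as a small feature-extraction helper. The DishState fields, the normalizing constants max_dist and max_moves, and all names here are hypothetical; the paper only specifies that each value is normalized:

```python
from dataclasses import dataclass

@dataclass
class DishState:
    weight: float          # current weight of the dish with food
    full_weight: float     # weight when full (normalizer for WR)
    travel: float          # integrated travel distance of the dish
    max_travel: float      # maximum travel distance (normalizer for OT)
    dist_to_human: float   # distance between the person and the dish
    place_moves: float     # storage-place transitions over the last month

def features(d, prev_wr=None, max_dist=2.0, max_moves=30.0):
    """Build the 5-dim SOM input (WR, WRV, OT, DP, PIV), each in [0, 1]."""
    wr = d.weight / d.full_weight                     # WR: 0.0 empty .. 1.0 full
    wrv = 1.0 if (prev_wr is not None and wr != prev_wr) else 0.0  # WRV
    ot = min(d.travel / d.max_travel, 1.0)            # OT: frequency of use
    dp = min(d.dist_to_human / max_dist, 1.0)         # DP: priority for person
    piv = min(d.place_moves / max_moves, 1.0)         # PIV: place transitions
    return [wr, wrv, ot, dp, piv]
```

With these assumed normalizers, a full, unmoved dish at half the reference distance from the person, with 27 monthly place transitions, yields the vector [1.0, 0.0, 0.0, 0.5, 0.9], matching the Dish 2 row of Table 1.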


Fig. 4 The conceptual image for learning with human interaction

3.3 Direct Estimation of Clearance Attributes Using SOM

The clusters produced by SOM are difficult to label with categorical symbol names beforehand, because the clusters depend on the attributes of the input information. We therefore exploit interaction with the human. In the initial state, reference parameters (Rk) for storing a dish in a specific storage place are provided. After SOM training, the storage place for a target dish is decided by the reference nearest to it on the SOM map. Fig. 4 shows the conceptual image: the left image is a categorized SOM map at a moment after learning, based on the references Rk, and X is a target dish on this map. In this case, the target dish belongs to the R2 category, so it is stored in the place corresponding to R2.
However, the storage place for a target dish is not fixed by the references, because the preferences of the person are also involved. Normally, the service robot decides the storage place for a target dish from the resulting SOM map. When the behavior of the service robot is unacceptable to the person, the person indicates the correct storage place for the target dish through interaction. For example, when a person indicates that a target dish should be stored in the place corresponding to the R1 category, a new reference R'1 is created to modify the interpretation of the SOM map, as shown by the dashed line in Fig. 4. From the next occurrence of the same situation, the interpretation of the categories on the SOM map is changed, as in the right image of Fig. 4.

Fig. 5 The initial state of the simulation
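This decide-and-correct loop can be sketched as follows, assuming a trained SOM weight array W (rows × cols × dim) and Euclidean distance between BMU grid positions; the function names and the dictionary-of-references data structure are illustrative assumptions:

```python
import numpy as np

def bmu(W, x):
    """Grid coordinates of the Best Matching Unit on a trained map W."""
    d = np.linalg.norm(W - np.asarray(x, dtype=float), axis=-1)
    return np.array(np.unravel_index(np.argmin(d), d.shape))

def decide_place(W, dish, refs):
    """Choose the storage place whose reference R_k maps nearest to the dish.

    refs: {place_name: list of reference vectors R_k}."""
    p = bmu(W, dish)
    return min((np.linalg.norm(bmu(W, r) - p), place)
               for place, vecs in refs.items() for r in vecs)[1]

def correct(W, dish, refs, true_place):
    """Human correction (the dashed line in Fig. 4): register the corrected
    dish itself as a new reference R'_k, so the same map region is
    interpreted as true_place from the next occurrence of the situation."""
    refs.setdefault(true_place, []).append(np.asarray(dish, dtype=float))
```

On a toy 1-dimensional map, correcting a single misplaced dish immediately flips the decision for nearby map positions, which is the adaptation to personal preference described above.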

4 The Experiment

4.1 The Experimental Result of SOM

The task of the experiment is to clear 6 dishes after a human order; each dish has different properties. The initial state of the simulation is shown in Fig. 5, and the initial parameters of the dishes are shown in Table 1. The parameters of dish IDs 2, 3, 4 and 5 are changed under the assumption that the person is eating. Dish ID 1 is not changed, which means the person does not touch that meal and it stays full. Dish ID 6 is empty and being transferred, which means it should already have been cleared. This experiment considers only a single meal, so PIV is fixed.
Fig. 6 shows the SOM map at 10, 150 and 300 seconds of the simulation. The map shows the distance profile based on dish ID 6, and the dish parameters at 300 seconds are shown in Table 2. Dish ID 1 is far away from ID 6, because its assumed behavior is the opposite. On the other hand, dish ID 5 becomes empty, so its parameters become similar to those of ID 6 and their positions are close on the SOM map at 300 s. Therefore, it is possible to classify the attributes of a dish for clearance. Fig. 7 shows the time series of the parameters of dish ID 2: the WR decreases while eating, so this dish is assumed to be a bowl of soup, and its place ID changes between 30 s and 100 s. Fig. 8 shows snapshots of the experimental simulation.

t = 10[s] t = 90[s] t = 210[s] t = 230[s]

Fig. 6 SOM maps at several times during the simulation

Table 1 The initial parameters of the dishes (t = 0[s])


Dish ID WR WRV OT DP PIV
Dish 1 1.0 0 0.00 0.5 0.0
Dish 2 1.0 0 0.00 0.5 0.9
Dish 3 1.0 0 0.00 0.5 0.9
Dish 4 1.0 0 0.00 0.5 0.9
Dish 5 1.0 0 0.00 0.5 0.9
Dish 6 0.0 0 1.00 0.5 1.0
8 H. Masuta, Y. Tamura, and H. Lim

Table 2 The parameters of dishes (t=300[s])


Dish ID WR WRV OT DP PIV
Dish 1 1.0 0 0.00 0.5 0.0
Dish 2 0.6 0 0.77 0.5 0.9
Dish 3 0.4 0 0.82 0.5 0.9
Dish 4 0.2 0 0.89 0.5 0.9
Dish 5 0.0 0 0.88 0.5 0.9
Dish 6 0.0 0 1.00 0.5 1.0
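For illustration, the parameter vectors of Table 2 can be compared directly. Note that the paper measures closeness on the SOM grid rather than in raw parameter space; this hypothetical Euclidean profile merely shows why dish ID 5 ends up near ID 6 and dish ID 1 far away.

```python
import math

# Parameter vectors (WR, WRV, OT, DP, PIV) copied from Table 2 (t = 300[s]).
dishes = {
    1: (1.0, 0.0, 0.00, 0.5, 0.0),
    2: (0.6, 0.0, 0.77, 0.5, 0.9),
    3: (0.4, 0.0, 0.82, 0.5, 0.9),
    4: (0.2, 0.0, 0.89, 0.5, 0.9),
    5: (0.0, 0.0, 0.88, 0.5, 0.9),
    6: (0.0, 0.0, 1.00, 0.5, 1.0),
}

def distance_profile(dishes, ref_id=6):
    """Euclidean distance from every dish's parameter vector to the
    reference dish (ID 6, the dish that should already be cleared)."""
    ref = dishes[ref_id]
    return {i: math.dist(v, ref) for i, v in dishes.items() if i != ref_id}
```

Running the profile ranks the dishes exactly as the SOM map in Fig.6 does: the emptied dish 5 is nearest to dish 6, and the untouched, full dish 1 is farthest.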

Fig. 7 The change in the parameters of dish ID 2 during the experiment

First, the storage place is decided to be the refrigerator, because the dish is still sufficiently full. The storage place is decided to be the kitchen cabinet once the WR falls below 70%. Therefore, the service robot can decide the storage place of a target dish by referring to this SOM map when a human orders it to clear the table. The service robot moves Dish 2 to the kitchen cabinet, as in the left figure of Fig.8. That is, the proposed method based on SOM can take into account attributes that change over time.

Wait! To refrigerator

t = 190[s] t = 210[s] t = 230[s]

Fig. 8 Snapshots of the experimental simulation



However, the kitchen cabinet is not the appropriate storage place for dish ID 2 after eating. Therefore, the person orders that the dish should be moved to the refrigerator at 210[s]. A new reference 3' (the boxed number in the right figure of Fig.6) is created, as shown in Fig.6, and the storage place is changed to the refrigerator at 230[s]. The service robot can then take Dish 2 to the refrigerator, as in Fig.8. In this way, the decision making of the robot is adapted to the preference of the person through interaction. Moreover, the preference of the person is estimated by perceiving the environmental situation.

5 Conclusions
A service robot should be able to make a decision about its actions from a simple human order. To realize this decision making, both perception of the environmental situation and adaptation to the preference of the person are important. We have proposed a learning method based on SOM that adapts to the environmental situation and the preference of the human.
Through simulation experiments, we verified that the proposed method based on SOM can take into account attributes that change over time. In addition, the decision making of the robot can be adapted to the preference of a person through interaction with that person. Moreover, the preference of a person can be estimated by perceiving the environmental situation.
As future work, we plan to experiment on a real robot system. Moreover, we will consider the environmental situations required for flexible interaction with humans.

References
[1] Mitsunaga, N., Miyashita, Z., Shinozawa, K., Miyashita, T., Ishiguro, H., Hagita, N.:
What makes people accept a robot in a social environment. In: International
Conference on Intelligent Robots and Systems, pp. 3336–3343 (2008)
[2] Masuta, H., Kubota, N.: An Integrated Perceptual System of Different Perceptual
Elements for an Intelligent Robot. Journal of Advanced Computational Intelligence
and Intelligent Informatics 14(7), 770–775 (2010)
[3] Sato, E., Yamaguchi, T., Harashima, F.: Natural Interface Using Pointing Behavior
for Human-Robot Gestural Interaction. IEEE Transactions on Industrial
Electronics 54(2), 1105–1112 (2007)
[4] Chong, N.Y., Hongu, H., Ohba, K., Hirai, S., Tanie, K.: Knowledge Distributed
Robot Control Framework. In: Proc. Int. Conf. on Control, Automation, and Systems,
pp. 22–25 (2003)
[5] Nishiyama, M.: Robot Vision Technology for Target Recognition. Toshiba
Review 64(1), 40–43 (2009)
[6] Oggier, T., Lehmann, M., Kaufmann, R., Schweizer, M., Richter, M., Metzler, P., Lang, G., Lustenberger, F., Blanc, N.: An all-solid-state optical range camera for 3D real-time imaging with sub-centimeter depth-resolution (Swiss Ranger). In: Proceedings of SPIE, vol. 5249, pp. 534–545 (2003)
[7] OpenRTM-aist, http://www.openrtm.org/

[8] Ando, N., Kurihara, S., Biggs, G., Sakamoto, T., Nakamoto, H.: Software
Deployment Infrastructure for Component Based RT-Systems. Journal of Robotics
and Mechatronics 23(3), 350–359 (2011)
[9] Matsusaka, Y.: Open Source Software for Human Robot Interaction. In: Proceedings
of IROS 2010 Workshop on Towards a Robotics Software Platform (2010)
[10] Gibson, J.J.: The ecological approach to visual perception. Lawrence Erlbaum
Associates, Hillsdale (1979)
[11] Lee, D.N.: Guiding movement by coupling taus. Ecological Psychology 10(3-4), 221–
250 (1998)
[12] Kohonen, T.: Self-Organizing Maps. Springer (2000)
A Fusion of Multiple Focuses on a
Focus+Glue+Context Map

Hiroya Mizutani, Daisuke Yamamoto, and Naohisa Takahashi

Abstract. The Focus+Glue+Context map system EMMA (Elastic Mobile Map) is composed of areas of expanded detail (Focus), peripheral areas (Context), and areas that absorb the distortion between the Focus and the Context areas (Glue). The existing EMMA implementation has the drawback that road-network connections cannot be drawn correctly when multiple Focuses overlap. This paper proposes the following methods, which let nearby Focuses unite naturally like water drops merging under surface tension: 1) the Focus Transformation method enables Focuses to transform in a squeezed manner to avoid overlapping; 2) the Focus Union method enables overlapping Focuses to unite into a single Focus with a large Focus area; 3) the Union Focus Transformation method enables a Union Focus to transform along a moving mouse pointer, enabling users to easily view map areas near the Union Focus; and 4) the Focus Division method enables Union Focuses to divide so that they can be operated as individual Focuses. Moreover, we conducted an experiment with subjects to test a prototype system. Compared with the existing EMMA implementation, the proposed system required fewer operations and less time. According to the questionnaire results, the subjects could recognize road connections and follow the focused areas with their eyes more easily on the prototype system than on the existing EMMA.

Hiroya Mizutani
College of Science and Technology, Nagoya Institute of Technology, Gokiso, Showa,
Nagoya, Aichi 466–8555, Japan
e-mail: mizutani@moss.elcom.nitech.ac.jp
Daisuke Yamamoto
College of Science and Technology, Nagoya Institute of Technology, Gokiso, Showa,
Nagoya, Aichi 466–8555, Japan
e-mail: yamamoto.daisuke@nitech.ac.jp
Naohisa Takahashi
College of Science and Technology, Nagoya Institute of Technology, Gokiso, Showa,
Nagoya, Aichi 466–8555, Japan
e-mail: naohisa@nitech.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 11–21.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
12 H. Mizutani, D. Yamamoto, and N. Takahashi

1 Introduction
On web digital maps such as Google Maps [1] and Yahoo Maps [2], users have to repeatedly change scale and scroll to view multiple destinations. For instance, when a user needs to know the route to the final destination through the other destinations, he/she needs to consult detailed maps to identify landmarks and intersections near the destinations. At the same time, he/she needs to view wide-area maps to understand the relations between the destinations and the entire areas around them.
Focus+Context type fisheye map methods [5, 6, 7, 8, 9] have been proposed to solve this problem. Although these methods enable users to view both the areas of expansion (Focus) and peripheral areas (Context) in one map, like a fisheye lens, they have the problems that the whole map is distorted and/or that the density of roads in the corner areas of the map becomes large [3]. In addition, these methods have to generate the whole map dynamically [3]. Therefore, they are not suitable for web map services, which require high-speed map generation.
We proposed the Focus+Glue+Context map system EMMA (Elastic Mobile Map) [3, 4, 10] to address these issues. Based on the image of the cognitive map [11], EMMA is a map system composed of Focus, Context, and areas that absorb the distortion between the Focus and Context areas (Glue), as shown in Fig 1. Gathering the distortion intensively inside the Glue areas enables EMMA to display the Focus and Context areas without any distortion. Since only the Glue areas have to be generated dynamically, the calculation cost is low and EMMA is suitable for web map services.
When working with EMMA, users sometimes create multiple Focuses and change their positions and scales separately to look for multiple destinations. The existing EMMA implementation has the drawback that these Focuses sometimes overlap and hide each other during user operation. We propose a method in which nearby Focuses unite naturally, like water drops merging under surface tension.

Fig. 1 Focus+Glue+Context map


A Fusion of Multiple Focuses on a Focus+Glue+Context Map 13

2 Overlapping Focus Problem


In the existing EMMA, overlapping Focuses hide each other and the Glue areas. In addition, to maintain road-network connections, the inner border of a Glue area must adjoin its Focus area, and the outer border must adjoin the Context area.
We first tried to address this drawback with the Overlap Repulsion method, which avoids overlapping by moving inactive Focuses away from active Focuses, as shown in Fig 2.
However, when Focuses are too close to each other, the Glue area of the repelled Focus disappears, as shown in Fig 3. Thus, the Overlap Repulsion method cannot maintain road-network connections. In short, this method has the following problems.

Problem 1. The Overlap Repulsion method loses the map connections between the Focus and Context areas because it removes the Glue area, as shown in Fig 3.

Problem 2. Focuses are moved considerably to maintain the continuity of the map. Therefore, Focuses hide parts of the Context areas that could originally be seen.

Fig. 2 The repelled Focus, which is displaced by the Focus approaching from the right
Fig. 3 Glue disappearance and magnification of the black shaded area, like a magnifying glass

3 Overview of the Proposed System


To solve the drawbacks of the existing Overlap Repulsion method, we propose methods that cause Focuses to transform, unite, and divide. We describe the functions of the proposed system below.

1) Focus Transformation Function

This function causes Focuses to transform in a squeezed manner so that they do not overlap, which enables users to view multiple destination areas simultaneously (Fig 4). Only the overlapping parts of the Focuses are transformed, whereas the rest

Fig. 4 Focus Transformation Function: the Focuses come into contact like water drops by transforming their shapes
Fig. 5 Focus Union Function: the Focuses unite if they approach closer

Fig. 6 Union Focus Transformation Function: the Union Focus transforms along the moving mouse pointer
Fig. 7 Focus Division Function: dragging the mouse to a point outside the Union Focus makes the Union Focus divide

of the Focuses maintain their original shapes. Thus, Focuses transform in a manner similar to that in which water drops deform naturally under surface tension. Transformed Focuses return to their original shapes when they move to non-overlapping locations.

2) Focus Union Function


When Focuses are less than a predetermined distance apart as a result of the Focus Transformation Function, they unite, as shown in Fig 5. The Focuses unite like water drops merging under surface tension, because the Union Focus's outline matches that of the individual Focuses prior to unification. The scale of the Union Focus is the same as the scale of the active Focus.

Table 1 Focus definition data


LOF The polygon marking the pre-magnified Focus area on the Context area
LF The closed polygon marking the border between the Focus and the Glue areas
LG The closed polygon marking the border between the Glue and the Context areas
PF The center point of a Focus
LNG LG of a Focus in the absence of overlap
LFmG LG when the Focus is transformed so as not to overlap the mth Focus
Pn (L) Coordinates of the nth vertex of polygon L

3) Union Focus Transformation Function


After two Focuses have united, dragging the mouse makes the Union Focus trans-
form along the moving mouse pointer as shown in Fig 6. This transformation makes
the active Focus look like it is absorbed and emitted by the inactive Focus.

4) Focus Division Function


After the mouse is dragged to a point outside the Union Focus, the Union Focus
divides into the original two independent Focuses as shown in Fig 7.

4 Proposed Methods
4.1 Definitions
The proposed system uses the following data. Focuses are convex N-sided polygons, and coordinates are expressed in an XY system.

Focus Definition Data. Users designate the positions and shapes of Focuses. The Focus data is given in Table 1 and Fig 8. Focus definition data has the following constraints: 1) LOF is completely enclosed by LF, LF is completely enclosed by LG, and PF is completely enclosed by LOF (LOF ⊂ LF; LF ⊂ LG; PF ⊂ LOF). 2) LOF and LF are geometrically similar (LOF ∼ LF).
LOF describes the map area on the Context area before it is expanded and displayed in the Focus area. When the scale and coordinates of LOF are changed about the center point (PF), its area fits the Focus area (LF). PF can be found from LOF and LF. The size of LOF changes with the scale of the Focus area.
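The containment constraints can be checked mechanically. The following is a sketch under the section's own assumption that all polygons are convex (so enclosure reduces to a per-vertex test); the function names are ours, not EMMA's.

```python
def inside_convex(poly, p):
    """True if point p lies inside or on the convex polygon poly, given as
    counter-clockwise (x, y) vertices: p must not fall strictly to the
    right of any directed edge (cross-product test)."""
    n = len(poly)
    for k in range(n):
        (x1, y1), (x2, y2) = poly[k], poly[(k + 1) % n]
        if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) < 0:
            return False
    return True

def encloses(outer, inner):
    """For convex polygons, outer completely encloses inner iff every
    vertex of inner lies inside outer."""
    return all(inside_convex(outer, v) for v in inner)

def valid_focus(l_of, l_f, l_g, p_f):
    """Check the constraints of Sect. 4.1: LOF ⊂ LF ⊂ LG and PF ∈ LOF."""
    return encloses(l_f, l_of) and encloses(l_g, l_f) and inside_convex(l_of, p_f)
```

For example, three nested axis-aligned squares with the center point inside the innermost one satisfy the constraints, while swapping LOF and LF violates them.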

4.2 Focus Transformation Function


Although Focuses can be united to avoid overlapping, Focuses with a shallow overlap should not be combined but rather transformed, owing to the relationship between Focuses and scales (discussed in section 4.3). Hence, Focuses need to come close enough to overlap deeply before uniting; this is achieved by the Focus Transformation Function.

Fig. 8 Focus definition data

When vertices of one Focus enter another Focus's area, they are moved towards the center of their own Focus and away from the other Focus. When transforming Focuses move apart, each vertex tries to return to its original position. The algorithm for the Focus Transformation Function is described below.
The M Focuses are labeled 0 to M − 1. The original Focus coordinates (LNG) and the transforming Focus coordinates are stored. When the Focus Transformation Function is invoked, the following steps are performed:
step.1 i = 0 and j = 1 are set as Focus counters.
step.2 Whether Focus i overlaps Focus j is judged by two conditions: 1) for n = 0 to N − 1, whether Pn(LNG) of Focus i is in LNG of Focus j, and 2) for n = 0 to N − 1, whether Pn(LNG) of Focus j is in LNG of Focus i. If either condition is true, the Focuses are overlapping, and step.3 is executed. If both conditions are false, the Focuses are not overlapping, and step.6 is executed.
Fig 9 shows two overlapping Focuses. P1 (LNG ) of the left Focus is in the area of
the right Focus, and P8 (LNG ) of the right Focus is in the area of the left Focus. When
n = 1, the condition about Pn (LNG ) of the left Focus holds. Thus, the Focuses are
judged to be overlapping.
step.3 The angle between PF (the center point) and each vertex of each Focus is calculated. Each vertex of LFjG of Focus i and LFiG of Focus j is moved away from its own PF along the segment joining the two PF points in small increments. The movement is repeated until each vertex satisfies one of the following conditions: 1) it returns to its original position Pn(LNG); 2) for Focus i, the vertex enters the area of LFiG of Focus j, and for Focus j, the vertex enters the area of LFjG of Focus i.
step.4 Because of the processing in step.3, the two Focuses are slightly overlapping. To move them apart, the vertices that are inside the other Focus's area are moved towards the center point incrementally along the line segment R calculated in step.3. The movement is repeated on Focus i and Focus j until all vertices have exited the other Focus's area.
step.5 The vertices that were changed in the steps above are stored along with the number of the other Focus.
step.6 j is incremented by 1, and if j < M, step.2 is executed.
A Fusion of Multiple Focuses on a Focus+Glue+Context Map 17

step.7 i is incremented by 1, j is reset to i + 1, and if i < M − 1, step.2 is executed.


step.8 Several changed positions may have been stored for a vertex; among them, the changed coordinates closest to the center point are chosen. The Focus number and the coordinate data are passed to the drawing system, and the Focus transformation is realized. Each vertex of a Focus moves along R (the line toward the center point), so the Focuses transform as shown in Fig 10. As transforming Focuses are moved apart, each Focus is gradually restored from its transformed shape to its original shape owing to step.3.
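The overlap judgment of step.2 and the vertex pull-in of steps 3-4 might be sketched as below. This is an illustrative reading of the algorithm, not the EMMA source: it handles only one Focus's vertices, uses a fixed hypothetical step size, and assumes the Focus's own center lies outside the other Focus (otherwise a vertex pulled toward it would never exit).

```python
import math

def inside_convex(poly, p):
    """Point-in-convex-polygon test (counter-clockwise vertices)."""
    n = len(poly)
    for k in range(n):
        (x1, y1), (x2, y2) = poly[k], poly[(k + 1) % n]
        if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) < 0:
            return False
    return True

def overlapping(a, b):
    """Step 2: two Focuses overlap if any vertex of either LNG polygon
    lies inside the other."""
    return (any(inside_convex(b, v) for v in a)
            or any(inside_convex(a, v) for v in b))

def squeeze(poly, center, other, step=0.02, max_iter=1000):
    """Steps 3-4 for one Focus: any vertex lying inside the other Focus
    is pulled toward its own center PF in small increments until it has
    left the other Focus's area; the remaining vertices keep their
    original positions."""
    poly = [list(v) for v in poly]
    for _ in range(max_iter):
        moved = False
        for v in poly:
            if inside_convex(other, v):
                dx, dy = center[0] - v[0], center[1] - v[1]
                d = math.hypot(dx, dy) or 1.0
                v[0] += dx / d * step
                v[1] += dy / d * step
                moved = True
        if not moved:
            break
    return [tuple(v) for v in poly]
```

Running `squeeze` on a square whose right edge intrudes into a neighboring square moves only the intruding vertex inward, producing the "squeezed" shape of Fig 10 while the other vertices stay put.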

Fig. 9 Focus overlapping judgment before transforming
Fig. 10 Focus transforming owing to movement of each vertex

4.3 Focus Union Function


When two Focuses are less than a predetermined distance apart, this function deletes the individual Focuses and creates a new Union Focus. The LOF of the Union Focus needs to include the LOF of the individual Focuses. The red circles in Fig 12 show the LOF of the two Focuses in Fig 11. When Focuses are adjoining, the map areas displayed in their Focus areas are actually at a distance from each other. The greater the scale difference between the Focus and the Context areas, the smaller the LOF of the Focus.

Fig. 11 Focuses are adjoining
Fig. 12 LOF of the two Focuses in Fig 11 and LOF of the Union Focus

When Focuses that overlap only shallowly unite, the LOF of the Union Focus cannot include much of the LOF of either original Focus, and the map area that was displayed in each original Focus area cannot be displayed in the Union Focus area. The black shadow in Fig 12 shows the LOF of the Union Focus that is created when the Focuses in Fig 11 unite; the red circles show the LOF of the two Focuses in Fig 11. Because the black shadow does not include much of the circled areas, Focuses should unite only when they overlap deeply. The Focus Union algorithm is described below.

step.1 i = 0 and j = 1 are set as initial values.


step.2 If the distance between the PF points of Focus i and Focus j is less than a certain value D1, the next step is executed. If it is greater, Focus union does not occur, and the Focus Union Function terminates.
step.3 The LG of the prospective Union Focus is shown by a black outline just before the Union Focus is created. The pair of vertices with maximum separation among the LNG of Focus i and Focus j is calculated, and the adjoining (N − 2)/4 vertices on either side of both vertices are chosen. The polygon containing these vertices forms the LG of the Union Focus and is displayed in black.
step.4 When the distance between the PF of Focus i and Focus j is shorter than a certain value D2 (D2 < D1), Focus i and Focus j are deleted, and a new Union Focus is created. Its LG is calculated in the same way as in step.3. The LNG of Focus i and Focus j are stored for the Union Focus Transformation Function.

Fig 13 shows the shape of the Union Focus calculated by step.3 with a thick black outline. In Fig 13, the maximally separated vertex pair is circled. The LNG of the Union Focus is composed of the circled vertices and the three vertices on either side of them.

Fig. 13 Union Focus’s shape decision method
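The two geometric primitives used in step.3, finding the maximally separated vertex pair across the two LNG polygons and collecting the adjacent vertices on either side of a chosen vertex, can be sketched as follows (the function names are ours; assembling the full Union outline from these pieces is omitted).

```python
import math

def farthest_pair(poly_a, poly_b):
    """Step 3: the maximally separated pair of vertices among the two
    LNG polygons, one vertex taken from each."""
    return max(((va, vb) for va in poly_a for vb in poly_b),
               key=lambda pair: math.dist(pair[0], pair[1]))

def ring_neighbors(poly, idx, k):
    """Vertex idx together with its k adjacent vertices on either side,
    walking the polygon ring in both directions."""
    n = len(poly)
    return [poly[(idx + d) % n] for d in range(-k, k + 1)]
```

With N-sided Focuses, taking k = (N − 2) // 4 neighbors around each of the two extreme vertices yields exactly the N vertices that form the Union Focus's outline.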

4.4 Union Focus Transformation Function


If the mouse keeps being dragged after unification, the Union Focus transforms
along the movement of the mouse pointer. When the mouse button is released, the

shape of the Union Focus is fixed. Union Focus Transformation algorithm is shown
below.

step.1 LNG of the two Focuses that were stored in step.4 of section 4.3 are retrieved.
step.2 The LNG that the user is dragging moves with the mouse pointer. The moved
LNG is stored for Focus Division Function.
step.3 LG of the Union Focus is recalculated using the two LNG polygons as in step.3
of section 4.3.

4.5 Focus Division Function


Deleting a Union Focus and creating two new transforming Focuses gives the appearance of the Union Focus dividing. The LG of the two original Focuses are retrieved for division. Division occurs only when mouse dragging continues after Focus Union. The map areas shown by the two Focuses do not overlap. The Focus Division Function algorithm is shown below.

step.1 When a user keeps dragging a Focus after unification, the distance between the two PF points is calculated. One of them is the PF of the inactive Focus; the other is the PF that was preserved in step.2 of section 4.4.
step.2 When the distance is greater than a certain value, the Union Focus is deleted,
and two Focuses with the original LNG areas are created. Although the newly created
Focuses overlap, the Focus Transformation Function is immediately implemented
and the Focuses are drawn in a transformed non-overlapping manner.
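The unite/divide conditions of sections 4.3 and 4.5 amount to a small hysteresis on the distance between the two PF centers. Below is a sketch with hypothetical threshold values (the paper's D2 for uniting, and a larger division distance); using two distinct thresholds keeps the map from flickering between the two states at the border.

```python
def union_state(state, center_dist, d_unite=1.0, d_divide=2.5):
    """Two separate Focuses unite when their PF centers come closer than
    d_unite; a Union Focus divides again only once the dragged center has
    moved farther away than d_divide (d_divide > d_unite)."""
    if state == "separate" and center_dist < d_unite:
        return "united"
    if state == "united" and center_dist > d_divide:
        return "separate"
    return state
```

Feeding a drag trajectory through this function reproduces the intended behavior: approaching within d_unite unites the Focuses, and only a drag beyond d_divide splits them back into two.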

5 Experimental Results
We conducted the following experiment to examine the effectiveness of the proposed system. The purpose of this experiment is to confirm that the proposed system enables users to recognize roads and check geographical points more easily than the previous system.

5.1 Experimental Steps


The subjects undertook the following exercise with both the proposed and the previous systems. The systems were compared on the basis of questionnaire results, the number of operations required, and the time required. The detailed process is described below. To familiarize themselves with each system before beginning the exercises, the subjects practiced on each system for three minutes.
step.1 Focuses are placed on a start point and two check points. The subjects move the start-point Focus along a designated route and look for designated buildings near the check points. The time required and the number of operations required are recorded. Clicking and dragging the mouse are each counted as one operation.

step.2 The subjects draw a simple map including the route and the buildings while
remembering the information from step.1.
step.3 The subjects answer a questionnaire about usability.

The subjects check the positional relationship between each check point and the nearby designated buildings while viewing detailed information about the immediate area using the Focus. The designated buildings are near the check points, and the subjects need to view them through the Focus.
The subjects were asked to rate the following parameters on a scale of 1 to 5,
where 5: very good, 4: good, 3: satisfactory, 2: poor, and 1: very poor
1) recognition of road connections, 2) how natural it felt, 3) processing speed, 4)
interest, 5) comfort, 6) ease of use, 7) ease of following Focus areas visually.

5.2 Results of the Experiment


The average number of operations (i.e., the number of times the subjects switched the Focus being operated) on the existing EMMA system was 7.1, while the corresponding number on the proposed system was 1.0. The average time required on the existing EMMA system was 63.8 s, while the proposed system required only 49.1 s. These results suggest that the proposed system is more effective than the previous system, since both measures are lower for the proposed system. In particular, the proposed system required only one operation, whereas the previous system required 7.1 on average. The graphs in Fig 14 show the results of our questionnaire; the vertical bars indicate how well each system was perceived to meet each requirement.
The item “processing speed” was rated higher on the proposed system than on the previous system. The item “recognition of road connections” was also rated higher on the proposed system. These results suggest that the proposed system allows users to recognize road connections between Focuses easily, without switching Focus scales. Therefore, the proposed system solves problem 1 in section 2. The item “ease of following Focus areas visually” received double the rating on the proposed system compared to the previous system. This implies that

Fig. 14 The questionnaire result



the proposed system could keep showing the focused map information in the Focus
areas. Therefore, the proposed system solves problem 2 in section 2.

6 Conclusion
This paper proposed transformation, union, and division methods for Focuses in the
Focus+Glue+Context map EMMA. Moreover, experimental results suggest that the
proposed system is more useful than the previous EMMA system.
We have three issues for future research. First, when there is a large scale difference between two uniting Focuses, we have to explore the optimal size and scale of the Union Focus. Second, we should compare our methods with those used by navigation systems, such as car navigation systems. Third, we could propose a device and an interface that are user-friendly for pedestrians.

References
1. Google Maps, http://maps.google.com
2. Yahoo! Maps, http://map.yahoo.com
3. Takahashi, N.: An Elastic Map System with Cognitive Map-based Operations. In: International Perspectives on Maps and the Internet, February 12, pp. 73–87. Springer (2008)
4. Yamamoto, D., Ozeki, S., Takahashi, N.: Focus+Glue+Context: An Improved Fisheye
Approach for Web Map Services. In: Proceedings of the ACM SIGSPATIAL GIS 2009,
Seattle, Washington, pp. 101–110 (November 2009)
5. Furnas, G.W.: Generalized fisheye views. In: Proc. of the SIGCHI 1986, pp. 16–23
(1986)
6. Harrie, L., Sarjakoski, L.T., Lehto, L.: A variable-scale map for small-display cartography. In: Proc. of the Symp. on GeoSpatial Theory, Processing, and Applications, pp. 8–12 (2002)
7. Sarkar, M., Brown, M.H.: Graphical fisheye views of graphs. In: Proc. of the SIGCHI
1992, pp. 83–91 (1992)
8. Sarkar, M., Snibbe, S.S., Tversky, O.J., Reiss, S.P.: Stretching the rubber sheet: a
metaphor for viewing large layouts on small screens. In: Proc. of the 6th Annual ACM
Symp. on User Interface Software and Technology, pp. 81–91 (1993)
9. Gutwin, C., Fedak, C.: A comparison of fisheye lenses for interactive layout tasks. In:
Proc. of the Graphics Interface 2004, pp. 213–220 (2004)
10. Yamamoto, D., Ozeki, S., Takahashi, N.: Wired Fisheye Lens: A Motion-Based Improved Fisheye Interface for Mobile Web Map Services. In: Carswell, J.D., Fotheringham, A.S., McArdle, G. (eds.) W2GIS 2009. LNCS, vol. 5886, pp. 153–170. Springer, Heidelberg (2009)
11. Gould, P., White, R.: Mental Maps. Penguin Books Ltd, Harmondsworth (1997)
A Map Matching Algorithm for Sharing Map
Information among Refugees in Disaster Areas

Koichi Asakura, Masayoshi Takeuchi, and Toyohide Watanabe

Abstract. In this paper, we propose a map matching algorithm for a map information sharing system used by refugees in disaster areas. The map information sharing system stores the roads passed by a refugee as map information. When another refugee comes close, the two refugees exchange their map information with each other in an ad-hoc network manner. Our map matching algorithm is based on a geometric curve-to-curve matching approach and determines, from GPS position data, the road segments on which a refugee moves. In order to decrease matching errors, our algorithm adopts the point-to-curve matching method for initial matching, and incremental matching to suppress error propagation. Experimental results show that our proposed algorithm achieves the best results in comparison with the conventional point-to-curve and curve-to-curve map matching methods.

Keywords: map matching, navigation system, ad-hoc network, disaster area.

1 Introduction
Mobile ad-hoc network (MANET) technologies have attracted great attention recently, especially for communication systems in disaster situations [1, 2]. This is because communication infrastructures may be broken or malfunctioning, and thus

Koichi Asakura
Department of Information Systems, School of Informatics, Daido University,
10-3 Takiharu-cho, Minami-ku, Nagoya 457-8530, Japan
e-mail: asakura@daido-it.ac.jp
Masayoshi Takeuchi
Department of Information Systems, School of Informatics, Daido University,
10-3 Takiharu-cho, Minami-ku, Nagoya 457-8530, Japan
Toyohide Watanabe
Department of Systems and Social Informatics, Graduate School of Information Science,
Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
e-mail: watanabe@is.nagoya-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 23–31.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
24 K. Asakura, M. Takeuchi, and T. Watanabe

normal mobile phones or wired and wireless LAN networks cannot be used in such situations. Since a MANET does not require any communication infrastructure [3, 4, 5], it is suitable for communication systems in disaster areas.
In order for refugees in disaster areas to share up-to-the-minute map information, we have already proposed a map information sharing system based on MANET technologies [6]. In this system, refugees record their position histories as map information and share this information with neighboring refugees in an ad-hoc network manner. By sharing map information, refugees can obtain correct information about safe roads in disaster areas in a timely manner, without any central servers. For this system, correct position information of the refugees is essential. Although position information can generally be acquired by GPS, such information contains errors, which influence the refugees' decisions about which roads to take.
In this paper, we propose a map matching algorithm for the map information sharing system for refugees. A map matching algorithm matches a sequence of position measurements against the road network information in a map. Map matching normalizes the position information of refugees to the road network, which makes it easy to share the refugees' position histories without errors.
The rest of this paper is organized as follows. Section 2 briefly describes our map information sharing system and related work on map matching algorithms. Section 3 describes our proposed map matching algorithm. Section 4 explains our experiments. Finally, Section 5 concludes this paper and gives our future work.

2 Map Information Sharing System


2.1 System Overview
In disaster situations, such as a big earthquake or other destructive events, correct map information is essential for refugees to evacuate to safe areas quickly. In particular, information on road conditions, namely which roads can be passed through safely, which roads should be selected for quick evacuation, and so on, is very important. However, it is very difficult to collect such information in disaster areas, because sensor equipment such as surveillance video monitors and traffic counting systems, although useful in ordinary times, may be destroyed or malfunctioning. Furthermore, communication infrastructures may not work at all, which prevents refugees from acquiring timely information.
To deal with the above circumstances, we have proposed a map information sharing system for refugees in disaster areas [6]. The system runs on personal mobile terminals as an application on smartphones, slate PCs, note PCs, and so on. It acquires the current position of a refugee by using GPS or other positioning systems, and stores the roads passed as map information. When another refugee comes close, the systems exchange their map
Fig. 1 System overview: the trajectories of refugees A, B and C are shared by
ad-hoc network to form a safety road map of the disaster area

information with each other. This exchange is performed in an ad-hoc network
manner without any network infrastructure. Through this exchange, the map
information of the refugees is merged, and information on roads that can be passed
safely in the disaster situation is collected without any communication infra-
structure. Figure 1 shows the features of our proposed map information sharing
system. With this system, map information can be collected and shared in real
time, which enables refugees to move to shelters safely.
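The exchange-and-merge step can be sketched concretely. The representation below, a dictionary from road-segment id to a passability flag and an observation timestamp in which the newer observation wins on conflict, is our illustrative assumption; the paper does not specify the data format.

```python
# Hypothetical sketch of map-information merging between two terminals.
# Each map: segment id -> (passable?, observation time). On conflict,
# the most recent observation is kept. Layout is illustrative, not the
# authors' implementation.

def merge_maps(mine: dict, other: dict) -> dict:
    merged = dict(mine)
    for seg_id, (passable, t) in other.items():
        # keep the newer observation for each road segment
        if seg_id not in merged or t > merged[seg_id][1]:
            merged[seg_id] = (passable, t)
    return merged

map_a = {"r1": (True, 100), "r2": (False, 120)}
map_b = {"r2": (True, 150), "r3": (True, 90)}
print(merge_maps(map_a, map_b))  # r2 takes refugee B's newer observation
```

Because the merge is commutative up to timestamp ties, repeated pairwise exchanges let up-to-date road information spread through the ad-hoc network without a central server.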

2.2 Map Matching


In order to share map information among refugees, the road segments passed
through by refugees have to be detected. Position information acquired by GPS is
generally provided as a coordinate: a pair of latitude and longitude. Thus, a series
of position measurements has to be mapped correctly onto road segments. However,
GPS position information contains errors. The error is relatively large in urban
areas, where high buildings limit the number of GPS satellites whose radio waves
can reach the GPS devices in mobile terminals.
To solve this problem, map matching methods have been proposed[7, 8]. A map
matching algorithm maps a sequence of position measurements onto a sequence
of road segments. An example of map matching is shown in Figure 2. Figure 2(a)
represents road network data and a sequence of
Fig. 2 Map matching: (a) history of positions; (b) result of a map matching method

positions for a refugee captured by GPS. Using map matching methods, the road
segments on which the refugee moved can be extracted as in Figure 2(b), even
though not all points lie exactly on the road segments.
Many types of map matching algorithms have been developed to date, mainly
for vehicle navigation systems. Map matching algorithms are roughly classified
into the following types: geometric map matching[9, 10, 11], probabilistic map
matching[12], statistical map matching[13], and so on. In this paper, we focus on
geometric map matching algorithms, because the other types require considerable
computational resources: namely, a large amount of pre-calculated data (memory)
or computation power (CPU). Since our system runs on mobile terminals under
disaster conditions, it is important that the map matching algorithm use a small
amount of memory and little computation.
In geometric map matching, a geometrical analysis between position data and the
road network is performed, and the point on a road segment corresponding to the
position measurement is extracted. There are several approaches to geometric map
matching: point-to-point matching, point-to-curve matching and curve-to-curve
matching[9, 11]. In point-to-point map matching, a position measurement is
matched to the nearest point in the map database. This approach tends to generate
many matching errors, although it is very simple and requires few computing
resources. In point-to-curve map matching, a position measurement is matched to
the nearest point on a road segment. Since the number of candidate matching
points becomes large, this approach generates fewer matching errors than the
point-to-point approach. In curve-to-curve map matching, a sequence of position
points is matched to road segments directly. This approach generates fewer
matching errors than the former two approaches, because matching between two
line segments causes fewer topological mismatch errors. Figure 3 shows the
matching results of the three approaches. In this figure, position measurements are
denoted as crosses, points in the map database as white circles, and road segments
as lines between two circles. Figures 3(a), (b) and (c) show the matching results of
the point-to-point, point-to-curve and curve-to-curve approaches, respectively.
Since the curve-to-curve approach provides the most accurate
Fig. 3 Map matching algorithms: (a) point-to-point; (b) point-to-curve; (c) curve-to-curve

results, as shown in Figure 3, we develop our map matching algorithm based on
this approach.

2.3 Requirements
In this section, we describe the requirements for a map matching algorithm in our
map information sharing system for refugees in disaster areas.
Since our system is used by refugees who evacuate to shelters on foot, their
moving speed is relatively slow and their moving trajectories are complicated in
comparison with vehicles. Thus, the map matching algorithm has to take initial
position matching into account. If initial position matching fails, the position of a
refugee is matched to a wrong road, which causes incorrect map information to be
stored. In addition to initial position matching, an error recovery method is also
important because of the slow moving speed and complicated trajectories. In
vehicle navigation systems, making a turn or passing through an intersection
provides hints for error correction in map matching. In our system, although such
hints are still useful, additional error correction methods that take the slow speed
and complicated trajectories into account have to be provided.
To deal with the initial position matching problem, our algorithm adopts the
point-to-curve matching method: we use point-to-curve matching for initial
position matching and then switch to curve-to-curve matching, whereas many
conventional curve-to-curve map matching algorithms use point-to-point matching
for initial position matching.
Furthermore, in order to provide quick error recovery, we propose an incremental
map matching method. In our method, initial position matching based on the
point-to-curve approach is repeatedly performed at a certain time interval. This
approach resolves the problem that curve-to-curve matching can produce wrong
matching results continuously.

3 Algorithm
This section describes the map matching algorithm in the map information sharing
system for walking refugees in disaster areas.
3.1 Initial Matching


Initial position matching is performed by the point-to-curve matching method.
This approach causes fewer matching errors than the point-to-point matching used
in conventional curve-to-curve matching approaches. In particular, when the map
database contains few point data, the point-to-curve method produces better
results.
In point-to-curve matching, the geometric distances between road segments and
the position point are calculated, and the corresponding point is extracted on the
nearest road segment.
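The point-to-curve step amounts to projecting a position point onto every road segment and keeping the nearest projection. A minimal planar sketch, with illustrative names (`project`, `point_to_curve`) rather than the authors' implementation:

```python
# Point-to-curve matching sketch: project a GPS point onto each road
# segment (clamped to the segment) and keep the nearest projection.
# Coordinates are treated as planar (x, y) for simplicity.

def project(p, a, b):
    """Closest point to p on segment a-b, and its squared distance."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    # parameter of the orthogonal projection, clamped into [0, 1]
    t = 0.0 if seg_len2 == 0 else max(
        0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    qx, qy = ax + t * dx, ay + t * dy
    return (qx, qy), (px - qx) ** 2 + (py - qy) ** 2

def point_to_curve(p, segments):
    """Return (index of nearest segment, matched point on it)."""
    best = min(range(len(segments)),
               key=lambda i: project(p, *segments[i])[1])
    return best, project(p, *segments[best])[0]

segments = [((0, 0), (10, 0)), ((10, 0), (10, 10))]
idx, q = point_to_curve((4, 1), segments)
print(idx, q)  # matched onto the first (horizontal) segment at (4.0, 0.0)
```

Because the match may fall anywhere on a segment rather than only on stored vertices, this gives more candidate points than point-to-point matching, which is exactly why it suits the initial matching step.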

3.2 Curve-to-Curve Matching


In curve-to-curve matching, the geometric distances are calculated between road
segments and the line segment generated from two successive position points. In
order to decrease the computation cost, our algorithm restricts the comparison to
candidate road segments. When the previous position point was matched to road
segment r, r itself and the road segments connected to r are extracted as candidate
road segments. The road segment with the shortest distance is selected from the
candidates, and the corresponding point is calculated on that road segment.
Figure 4 shows this process. When the line segment between position points p1
and p2 is matched to road segment r1 , the map-matched point p′2 is calculated
as shown in Figure 4(a). Then, for map matching of position point p3 , candidate
road segments are extracted. Road segments r1 , r2 , r3 , r4 and r5 become candidates
because r2 , r3 , r4 and r5 are connected to r1 , on which p′2 exists. The distances
from the line segment between p2 and p3 to each candidate road segment are then
evaluated, and the map-matched point p′3 is correctly extracted on r1 as shown in
Figure 4(b), even though the geometric distance to r6 is the shortest.
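The candidate-restriction step can be sketched as follows. The segment-to-segment distance is approximated here by summing the two endpoint-to-segment distances, which is one common simplification since the paper does not spell out its exact distance metric; all names are illustrative.

```python
# Sketch of the candidate-restriction step: after the previous point was
# matched to road segment `current`, only that segment and the segments
# sharing one of its endpoints are compared against the new movement
# segment (p_prev -> p_new).

import math

def pt_seg_dist(p, a, b):
    """Distance from point p to segment a-b."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    l2 = dx * dx + dy * dy
    t = 0.0 if l2 == 0 else max(
        0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / l2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def candidates(roads, current):
    """The current segment plus all segments sharing one of its endpoints."""
    a, b = roads[current]
    return [i for i, (c, d) in enumerate(roads)
            if i == current or {c, d} & {a, b}]

def curve_match(p_prev, p_new, roads, current):
    """Pick the candidate nearest to the movement segment p_prev-p_new."""
    return min(candidates(roads, current),
               key=lambda i: pt_seg_dist(p_prev, *roads[i])
                             + pt_seg_dist(p_new, *roads[i]))

roads = [((0, 0), (10, 0)), ((10, 0), (10, 10)), ((20, 20), (30, 20))]
print(curve_match((6, 1), (8, 1), roads, 0))  # 0: the far segment is never considered
```

Restricting the search to the current segment's neighborhood keeps the per-point cost proportional to the local degree of the road network, which matters on resource-constrained mobile terminals.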

Fig. 4 Processing flow in the proposed algorithm: (a) initial matching; (b) candidate road
segments


Fig. 5 Wrong matching results in the conventional algorithms: (a) problem at initial
matching; (b) problem at a short road segment

3.3 Incremental Matching


After the curve-to-curve matching has been performed several times, the point-to-
curve matching is performed for error recovery. Figure 5 shows the effect of the
incremental matching approach. As shown in Section 3.2, candidate road segments
are extracted in the curve-to-curve matching phase. Thus, when a matching error
occurs at first and the correct road segment is not included in the candidate road
segments, this phase produces wrong matching results continuously (Figure 5(a)).
Furthermore, as shown in Figure 5(b), a matching error also occurs when short
road segments exist in the map database.
Our algorithm avoids such situations by incremental matching, because at a
certain interval the corresponding point is extracted by the point-to-curve matching
without the candidate restriction.

4 Experiments
We conducted experiments to evaluate the proposed map matching algorithm.
This section describes the experimental results.
For the experiments, we used GPS log data that we captured ourselves. We
walked around our university with a GPS logger implemented as a Java applet
program. Then, map matching was performed by several algorithms. We compared
three map matching algorithms: the conventional point-to-curve algorithm, the
conventional curve-to-curve algorithm and our proposed algorithm. The GPS log
data contains 1573 points, sampled at 1 point/sec.

4.1 Experimental Results


Detailed experimental results are shown in Figure 6. In the figure, position data
captured by GPS are denoted as blue dots, and map-matched points are denoted as
red dots. Figure 6(a) shows the results of the point-to-curve matching and the proposed
Fig. 6 Experimental results: (a) comparison with the point-to-curve method; (b) comparison
with the curve-to-curve method

Table 1 Error ratio for three algorithms

Algorithm                    Matching success   Matching failure
Point-to-curve algorithm     85.80%             14.20%
Curve-to-curve algorithm     86.86%             13.14%
Proposed algorithm           88.73%             11.27%

matching algorithms on the same map area. The point-to-curve matching algorithm
generates wrong results in the hatched region, while our proposed algorithm
generates slightly better results. Furthermore, Figure 6(b) shows the results of the
conventional curve-to-curve matching algorithm and our algorithm. This figure
shows that our algorithm generates better results in areas where the road network
is complicated. The error ratios of the map matching algorithms are shown in
Table 1. The table shows that the proposed algorithm produces the best map
matching results.
From these experimental results, we conclude that our proposed algorithm
generates better map matching results than conventional geometric map matching
algorithms.

5 Conclusion
In this paper, we proposed a map matching algorithm for a map information
sharing system. Our algorithm is based on a geometric method, whose computational
cost is lower than that of more advanced map matching algorithms. Since our
algorithm is used on mobile terminals in disaster areas, the power consumption of
batteries is one of the most important factors. However, geometric map matching
approaches generate more matching errors than the advanced approaches. In order
to decrease matching errors, our algorithm has two features: point-to-curve
matching for initial matching, and incremental matching for suppressing error
propagation. Experimental results show that our algorithm generates better
matching results than conventional algorithms. For future work, we plan to take
topological features of the road network into account. Topological information
should reduce the number of matching errors, especially at intersections.

References
1. Midkiff, S.F., Bostian, C.W.: Rapidly-Deployable Broadband Wireless Networks for Dis-
aster and Emergency Response. In: The 1st IEEE Workshop on Disaster Recovery Net-
works, DIREN 2002 (2002)
2. Meissner, A., Luckenbach, T., Risse, T., Kirste, T., Kirchner, H.: Design Challenges for
an Integrated Disaster Management Communication and Information System. In: The
1st IEEE Workshop on Disaster Recovery Networks, DIREN 2002 (2002)
3. Toh, C.-K.: Ad Hoc Mobile Wireless Networks: Protocols and Systems. Prentice Hall
(2001)
4. Murthy, C.S.R., Manoj, B.S.: Ad Hoc Wireless Networks: Architectures and Protocols.
Prentice Hall (2004)
5. Lang, D.: Routing Protocols for Mobile Ad Hoc Networks: Classification, Evaluation
and Challenges. VDM Verlag (2008)
6. Asakura, K., Chiba, T., Watanabe, T.: A Map Information Sharing System among
Refugees in Disaster Areas, on the Basis of Ad-hoc Networks. In: The 3rd International
Conference on Intelligent Decision Technologies (IDT 2011), pp. 367–376 (2011)
7. Brakatsoulas, S., Pfoser, D., Salas, R., Wenk, C.: On Map-matching Vehicle Tracking
Data. In: The 31st International Conference on VLDB, pp. 853–864 (2005)
8. Quddus, M.A., Ochieng, W.Y., Zhao, L., Noland, R.B.: Current Map-matching Algo-
rithms for Transport Applications: State-of-the Art and Future Research Directions.
Transportation Research Part C 15, 312–328 (2007)
9. Quddus, M.A., Ochieng, W.Y., Zhao, L., Noland, R.B.: A General Map Matching Algo-
rithm for Transport Telematics Applications. GPS Solutions 7(3), 157–167 (2003)
10. Yang, J., Kang, S., Chon, K.: The Map Matching Algorithm of GPS Data with Relatively
Long Polling Time Intervals. Journal of the Eastern Asia Society of Transportation Stud-
ies 6, 2561–2573 (2005)
11. Dewandaru, A., Said, A.M., Matori, A.N.: A Novel Map-matching Algorithm to Improve
Vehicle Tracking System Accuracy. In: International Conference on Intelligent and Ad-
vanced Systems, pp. 177–181 (2007)
12. Ochieng, W.Y., Quddus, M.A., Noland, R.B.: Map-matching in Complex Urban Road
Networks. Brazilian Journal of Cartography 55(2), 1–18 (2004)
13. Gustafsson, F., Gunnarsson, F., Bergman, N., Forssell, U., Jansson, J., Karlsson, R.,
Nordlund, P.: Particle Filters for Positioning, Navigation, and Tracking. IEEE Trans-
actions on Signal Processing 50(2), 425–435 (2002)
A Method for Supporting Presentation Planning
Based on Presentation Strategies

Koichi Hanaue and Toyohide Watanabe

Abstract. We propose a method for supporting presentation planning. Our
objective is to support the composition of scenarios according to the situation,
such as the background knowledge of the audience and time constraints. Our idea
is to introduce a presentation strategy and translate it into a policy for searching a
network of content fragments connected by semantic relationships. We propose a
mechanism for selecting and ordering the fragments according to a strategy
specified by a presenter. This mechanism allows a presenter to compose a scenario
from a few fragments that he/she specifies as important. Our method is expected
to bridge the gap between the product of idea organization and presentation
slides.

1 Introduction
Presentation is one of the most important activities for transferring and sharing
knowledge among people. Presenters often give talks by showing presentation
slides prepared with traditional tools such as Apple Keynote [1] and Microsoft
PowerPoint [8]. Although these tools are widely used, some problems have been
pointed out from the viewpoint of understandability for the audience. One is that
the tools do not allow presenters to clarify semantic relationships among ideas and
facts [10], which makes the construction and the important points of slides vague.
Another problem is that decks of slides are
Koichi Hanaue
Graduate School of Information Science, Nagoya University,
Furo-cho, Chikusa-ku, Nagoya, 464-8603 Japan
e-mail: hanaue@nagoya-u.jp
Toyohide Watanabe
Graduate School of Information Science, Nagoya University,
Furo-cho, Chikusa-ku, Nagoya, 464-8603 Japan
e-mail: watanabe@is.nagoya-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 33–42.
managed separately for each presentation. This makes it difficult to reuse and
reconstruct existing slides according to situations such as the background of the
audience and time constraints.
Until now, a number of methodologies and systems have been proposed for
authoring content by organizing fragments. Marshall et al. developed a system
called VIKI for authoring spatial hypertexts [7]. VIKI estimates the relationships
among content fragments by interpreting the user's manipulations and their
layout. Hasida proposed the methodology of semantic authoring based on an
ontology [5]: composing intelligent content by explicitly specifying semantic
relationships among content fragments. Haller et al. proposed a system called
iMapping for personal knowledge management [3], which allows a user to specify
relationships among content fragments by linking and nesting them.
We propose a method for supporting the process of presentation preparation on
the basis of these works. Specifically, we aim to support scenario composition
according to the situation of a presentation. Here, we assume that the structured
fragments have already been prepared by a presenter with existing systems such as
those described above. We introduce the concept of a presentation strategy into
the process of scenario composition. A presentation strategy refers to a presenter's
intention about how to unfold his/her story. Our idea is to translate a presentation
strategy into a policy for selecting and ordering fragments in the structured
fragments. We present a mechanism for constructing a presentation story and the
logical structure of slides from the structured fragments.
Our contribution is a mechanism for transforming structured idea fragments
into presentation slides. We previously developed a presentation system that
converts fragments organized in the form of a tree into a deck of presentation
slides [4]. However, that system does not consider any relationships except
sequential/hierarchical ones. FLY [6] and Prezi [9] are characteristic presentation
systems that allow a presenter to author a presentation document by arranging
objects such as texts, images and videos without the physical constraints of a slide
frame. From the viewpoint of reusing slides, Bergman et al. developed a system
for composing presentation slides from existing ones [2]. Although these systems
are effective, they do not support editing slides at the level of topics. We believe
that such systems can be improved by introducing semantic relationships among
content fragments and reflecting a presentation strategy in the design of slides.
The rest of this paper is organized as follows. Section 2 describes the framework
of our method. Section 3 explains how a presentation strategy is translated in our
model. Then, Section 4 describes the mechanism for constructing a presentation
story and the logical structure of presentation slides. Section 5 introduces our
prototype system for scenario composition. Finally, Section 6 concludes this paper
and presents our future work.
2 Approach
2.1 Scenario Composition in Presentation Planning
Presentation planning consists of constructing a presentation story and designing
presentation slides. The process of presentation preparation is divided into three
phases, as illustrated in Figure 1. First, a presenter constructs a story from
structured fragments of ideas by picking out and ordering the necessary fragments.
We call each fragment a knowledge fragment, and represent the structured
fragments as a network of knowledge fragments in which semantic relationships
are specified between them. We call this network a knowledge fragment network.
Next, the presenter designs presentation slides by selecting the necessary materials
and considering the layouts of the slides. We represent the result of designing
slides as a logical structure: a set of relationships among visual elements such as
texts, figures and tables. The types of relationships defined in the logical structure
include sequential and inclusive relations in addition to semantic relations. Finally,
the presenter makes a deck of slides according to the logical structure. In this
phase, the logical structure is handled as a set of constraints on allocating visual
elements.
We call the product of presentation planning a presentation scenario. Namely, a
presentation scenario consists of a presentation story and the logical structure of
slides. Presentation scenarios reflect the intentions of presenters on what to speak
about, how to speak and what materials to present. Handling scenarios is therefore
effective for supporting the process of presentation preparation.

2.2 Framework
Our method aims to support the process of scenario composition based on a
presentation strategy. A presentation strategy is a presenter's intention about how
to construct a persuasive story. "PREP (Point-Reason-Example-Point)", "Explain
from examples" and "Use as many illustrations as possible" are examples of
presentation strategies. In our method, a presenter specifies a presentation strategy
before composing a scenario. When the presenter picks a knowledge fragment
that he/she intends to emphasize, the fragments relevant to the picked fragment are
Fig. 1 The process of presentation planning



Fig. 2 Framework for supporting composition of presentation scenarios

selected and ordered according to the strategy. Then, the logical structure of the
slides is constructed according to the importance of each fragment. This makes it
easy for a presenter to construct a story and edit slides at the level of topics.
Figure 2 illustrates our framework for supporting the composition of presentation
scenarios. Before composing a scenario, the presenter organizes his/her ideas by
constructing a knowledge fragment network. Then, the presenter determines a
specific presentation strategy and an important knowledge fragment. According to
the strategy, we construct a presentation story by assigning importance weights
to fragments (Step 1) and extract the fragments with high importance as the
necessary elements of the story (Step 2). We use a technique of propagating
weights along semantic relationships. Finally, we derive the logical structure of
the presentation slides by grouping the knowledge fragments in the presentation
story (Step 3). If additional fragments are necessary, the presenter specifies
another important fragment. This interaction is repeated until the presenter judges
that the story is complete.

2.3 Presentation Strategy


We take the approach of translating a presentation strategy into a policy for
searching a knowledge fragment network. A presentation strategy has two aspects
from the viewpoint of adaptation to the situation of a presentation.
Common strategy: A general strategy common to most presentations. This type of
strategy includes policies such as "concepts related to many other concepts
are more important" and "important concepts have higher priority".
Specific strategy: A strategy that varies with the situation, such as the background
knowledge of the audience and time constraints. This type of strategy
includes story patterns and the important concepts to be explained in detail.
Figure 3 illustrates the process of translating a presentation strategy into the steps to
construct a story. In order to reflect the specific strategy, semantic relationships in a
given knowledge fragment network are interpreted as sequential relation (an arrow
Fig. 3 Translation of a knowledge fragment network into a presentation story according to a
presentation strategy

with a dotted line) or hierarchical relations (an arrow with a solid line). Suppose
that fragment f1 represents the reason for fragment f2 . If the strategy is "Give
higher priority to points than to reasons", the relationship between f1 and f2 is
interpreted as a hierarchical relationship in which f1 is subordinate to f2 . Then, the
network is searched from an important fragment according to the order of semantic
relationships specified as a story pattern; this order also reflects a strategy specific
to the situation of the presentation. During this search, importance weights are
assigned to fragments according to the common strategy. The importance is used
to determine the range of the search and to construct the logical structure of the
slides. Finally, a story is constructed by arranging the fragments in sequence
according to the order of visiting and the categories of the relationships.

3 Definitions
3.1 Knowledge Fragment Network
A knowledge fragment is an element that forms a part of a presentation story. A
knowledge fragment f is defined as a double (type, content), where type is the type
of f and content refers to the entity of f .
A knowledge fragment network G is defined as G = (F, L), where F is a set of
knowledge fragments and L is a set of links between knowledge fragments,
defined as L = {(r, fsrc , fdst ) | fsrc , fdst ∈ F}, where r is a semantic relationship
between knowledge fragments. In our method, twelve types of semantic relationships
are considered: causes, assumes, paraphrases, criticizes, compared-with,
exemplifies, details, specializes, supplements, illustrates, precedes and related-to.
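The definitions above translate almost directly into code. A sketch of the data structures, using the twelve relationship types listed in the text (class and field names are our own):

```python
# Sketch of a knowledge fragment network G = (F, L): a fragment is a
# (type, content) double, and links are (r, f_src, f_dst) triples with r
# drawn from the twelve relationship types in the text.

from dataclasses import dataclass, field

RELATIONS = {"causes", "assumes", "paraphrases", "criticizes",
             "compared-with", "exemplifies", "details", "specializes",
             "supplements", "illustrates", "precedes", "related-to"}

@dataclass(frozen=True)
class Fragment:
    type: str      # e.g. "text", "figure"
    content: str   # the entity of the fragment

@dataclass
class FragmentNetwork:
    fragments: set = field(default_factory=set)   # F
    links: set = field(default_factory=set)       # L: (r, f_src, f_dst)

    def add_link(self, r, f_src, f_dst):
        assert r in RELATIONS
        self.fragments |= {f_src, f_dst}
        self.links.add((r, f_src, f_dst))

g = FragmentNetwork()
f1 = Fragment("text", "sensor networks fail in disasters")
f2 = Fragment("text", "ad-hoc communication is needed")
g.add_link("causes", f1, f2)
print(len(g.fragments), len(g.links))  # 2 1
```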
3.2 Translating Presentation Strategy


3.2.1 Specific Strategy

A specific strategy S is defined as S = {(r, type, direction, priority, decay)}, where
type ∈ {sequential, hierarchical, undirected} and direction ∈ {same, reversed,
none}. In this definition, r is a semantic relationship. Type indicates whether r is
interpreted as sequential, hierarchical or undirected. Direction specifies whether
the direction of the sequential/hierarchical relationship from one fragment to
another is the same as that of the semantic relationship r. For example, fsrc appears
before fdst in a story if (r, fsrc , fdst ) ∈ L, type = sequential and direction = same.
Similarly, fsrc is subordinate to fdst in a story if (r, fsrc , fdst ) ∈ L, type = hierarchical
and direction = reversed. If type = undirected, direction is none. Priority is a
numeric value that indicates how important the semantic relationship r is considered
to be in a given strategy; this value is used in searching a knowledge fragment
network G. Decay is the decay of importance weights associated with r; this value
indicates to what extent the importance of a knowledge fragment specified by a
presenter spreads over G.
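A specific strategy can then be represented as a table of such five-tuples. The sample entries below (the priorities, the decay values, and the mapping of "causes" to a reversed hierarchical relation) are illustrative assumptions, not values from the paper:

```python
# Sketch of a specific strategy S: each entry fixes, for one semantic
# relationship, its interpretation (sequential/hierarchical/undirected),
# the direction, a search priority, and the weight decay.

from collections import namedtuple

StrategyEntry = namedtuple("StrategyEntry",
                           ["r", "type", "direction", "priority", "decay"])

# e.g. "give higher priority to points than to reasons": the reason side
# of a 'causes' link becomes subordinate (hierarchical, reversed)
strategy = [
    StrategyEntry("causes",      "hierarchical", "reversed", 2, 0.3),
    StrategyEntry("exemplifies", "sequential",   "same",     1, 0.2),
    StrategyEntry("related-to",  "undirected",   "none",     0, 0.5),
]

# entries are consumed in decreasing priority when searching the network
for e in sorted(strategy, key=lambda e: -e.priority):
    print(e.r, e.type)
```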

3.2.2 Common Strategy

In order to reflect a common strategy, we calculate the potential importance of a
knowledge fragment f based on its degree centrality in G. This is because
fragments related to many other fragments are considered more important. We
denote the potential importance of f by c( f ) and define it as the number of
fragments that have a semantic relationship with f .
Based on the potential importance, we spread the importance weights of the
knowledge fragments over G. Specifically, if fragment f1 is subordinate to
fragment f2 and c( f1 ) is less than c( f2 ), the importance weight of f1 is decreased
by the value decay specified in S. If the importance weight of a fragment is less
than a predefined threshold, we stop visiting any further fragments in the network.
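The common-strategy computation can be sketched directly from these definitions: c(f) counts the fragments linked to f, and a subordinate fragment with lower centrality than its parent has its propagated weight reduced by decay. The names and the link representation (triples, as defined earlier) are illustrative:

```python
# Sketch of the common strategy: degree centrality c(f) and the decay
# rule for importance weights propagated to subordinate fragments.

def centrality(f, links):
    """c(f): number of fragments with a semantic relationship with f."""
    return len({src for (_, src, dst) in links if dst == f} |
               {dst for (_, src, dst) in links if src == f})

def propagated_weight(w_parent, c_parent, c_child, decay):
    """Weight passed to a subordinate fragment: decayed only when the
    child's centrality is lower than the parent's."""
    return w_parent - decay if c_child < c_parent else w_parent

links = [("details", "A", "B"), ("details", "A", "C"), ("causes", "D", "A")]
print(centrality("A", links))                              # 3
print(round(propagated_weight(1.0, 3, 1, decay=0.2), 3))   # 0.8
```

Thresholding the propagated weight then bounds the search, so specifying a single important fragment pulls in only its sufficiently relevant neighborhood.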

4 Composition of Presentation Scenarios

4.1 Constructing Presentation Story


In constructing a presentation story, we represent a story as a list of doubles ( f , w)
such that f ∈ F and w is the importance weight of f in the presentation story. We
also introduce a set of candidate fragments, each element of which is represented
as a triple ( f , w, a) such that f ∈ F, w is its importance weight and a is an
activation degree.
Algorithm 1 describes the procedure for constructing a presentation story. It
searches a knowledge fragment network with a policy similar to breadth-first
search. The procedure CONSTRUCT-STORY takes six arguments as input: a
knowledge fragment network G, a translation S of a specific presentation strategy,
a list of knowledge fragments T that represents a presentation story, a set of candidate
Algorithm 1. CONSTRUCT-STORY(G, S, T, C, p, a)

T′ ← T, C′ ← C, Q ← ∅
(f, w) ← (T[p].f, T[p].w)
for all (r, type, direction, priority, decay) sorted by the priority specified in S do
    for all f′ such that (r, f, f′) ∈ L or (r, f′, f) ∈ L do
        (w′, a′, p′) ← PROPAGATE(G, r, type, direction, decay, f, f′, w, a, p)
        if (f′, w̃, ã) ∈ C′ then
            C′ ← C′ − {(f′, w̃, ã)}
            w′ ← MAX(w′, w̃), a′ ← MAX(a′, ã)
        end if
        if f′ already exists in T′ at position p̃ then
            T′[p̃] ← (f′, MAX(T′[p̃].w, w′))
        else
            if w′ > Thw and a′ > Tha then
                INSERT((f′, w′), T′, p′)
                ENQUEUE(Q, (f′, w′, a′))
            else
                C′ ← C′ ∪ {(f′, w′, a′)}
            end if
        end if
    end for
end for
while Q is not empty do
    (f′, w′, a′) ← DEQUEUE(Q)
    (T′, C′) ← CONSTRUCT-STORY(G, S, T′, C′, p′, a′)
end while
return (T′, C′)

fragments C for a story, an index p indicating the position of a knowledge
fragment in T, and an activation degree a indicating the range of the search.
CONSTRUCT-STORY returns a list of knowledge fragments T′ and a set of
knowledge fragments C′. The procedure computes T′ and C′ by visiting the
fragments in the knowledge fragment network and adding them to T and C.
In the phase of story construction, CONSTRUCT-STORY is called when a
presenter specifies a knowledge fragment f0 in G and its position p0 in T. Before
the procedure is called, the double ( f0 , w0 ) is inserted into T at position p0 , where
w0 is an initial value of the importance weight. If L is empty, for example,
CONSTRUCT-STORY is called with arguments G, S, T = [( f0 , w0 )], C = ∅,
p = 0 and a = a0 , an initial value of the activation degree.
The procedure CONSTRUCT-STORY consists of three main steps. First, the neighboring fragments in G are enumerated according to the priorities of the semantic relationships specified in the strategy S. In this step, if a neighboring fragment exists in the candidate set C, its importance weight and activation degree are updated. Second, the importance weight and the activation degree are propagated to the neighboring fragment in the procedure PROPAGATE according to the semantic relationship and its direction. The procedure PROPAGATE is described in Algorithm 2. In this procedure, the importance weight, the activation degree, and the position in a story of a neighboring fragment f′ are calculated according to the presentation strategy. How these values are calculated depends on whether the relationship between f and f′ is hierarchical, sequential, or undirected. The parameter α (0 < α < 1) determines the rate at which the propagated values decrease. Third, the neighboring fragment is added to the story T′ if both its importance weight and its activation degree are higher than the thresholds (Thw and Tha); a tuple of the fragment, its importance weight, and its activation degree is then enqueued in a queue Q. Otherwise, the neighboring fragment is saved in the candidate set C′ for future search. After these steps are finished, the procedure CONSTRUCT-STORY is called recursively for each fragment in Q.

Algorithm 2. PROPAGATE(G, r, type, direction, decay, f, f′, w, a, p)
  if type = hierarchical then
    if (direction = same and (r, f, f′) ∈ L) or (direction = reversed and (r, f′, f) ∈ L) then
      {f′ is subordinate to f}
      if c(f) > c(f′) then
        w′ ← w − decay, a′ ← a − decay, p′ ← p + 1
      else
        w′ ← w, a′ ← a − α · decay, p′ ← p + 1
      end if
    else if (direction = same and (r, f′, f) ∈ L) or (direction = reversed and (r, f, f′) ∈ L) then
      {f is subordinate to f′}
      w′ ← w, a′ ← a − α · decay, p′ ← p − 1
    end if
  else if type = sequential then
    if (direction = same and (r, f, f′) ∈ L) or (direction = reversed and (r, f′, f) ∈ L) then
      {f precedes f′}
      w′ ← w, a′ ← a − α · decay, p′ ← p + 1
    else if (direction = same and (r, f′, f) ∈ L) or (direction = reversed and (r, f, f′) ∈ L) then
      {f′ precedes f}
      w′ ← w, a′ ← a − α · decay, p′ ← p − 1
    end if
  else if type = undirected then
    if ((r, f, f′) ∈ L) or ((r, f′, f) ∈ L) then
      w′ ← w, a′ ← a − α · decay, p′ ← p + 1
    end if
  end if
  return (w′, a′, p′)
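The decay rules of PROPAGATE can be sketched in Python as follows. This is a hypothetical rendering, not the authors' implementation: the constant ALPHA, the function names, and passing the complexity measure c(·) as plain numbers are all assumptions.

```python
# Hypothetical Python sketch of the propagation rules in Algorithm 2.
# ALPHA plays the role of the paper's parameter alpha (0 < alpha < 1).
ALPHA = 0.5

def propagate_hierarchical(w, a, p, decay, c_f, c_sub, alpha=ALPHA):
    """f' is subordinate to f; decay depends on comparing c(f) with c(f')."""
    if c_f > c_sub:
        return w - decay, a - decay, p + 1    # strong decay of both values
    return w, a - alpha * decay, p + 1        # weight kept, activation damped

def propagate_sequential(w, a, p, decay, forward=True, alpha=ALPHA):
    """f precedes f' (forward=True) or f' precedes f (forward=False)."""
    return w, a - alpha * decay, (p + 1 if forward else p - 1)
```

Only the activation degree is damped along sequential and subordinate-to-superordinate links, which is what keeps the search localized around the fragments the presenter picked.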

Fig. 4 Screenshot of our prototype system

4.2 Constructing Logical Structure


When a presentation story T has been constructed through interaction with the presenter, the logical structure of the presentation slides is built by grouping the knowledge fragments in T. Two fragments located next to each other in T are grouped in one of three ways according to their importance weights. Suppose that fragment fn precedes fragment fn+1 in T.
• If the importance of fn is higher than that of fn+1, the two fragments are included in the same group and fn includes fn+1.
• If the importance of fn is lower than that of fn+1, the two fragments are included in different, mutually disjoint groups.
• If the importance of fn is equal to that of fn+1, the two fragments are included in the same group as mutually disjoint members.
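The three grouping cases can be sketched as follows. This is a simplified, hypothetical rendering that keeps only the group boundaries and drops the inclusion hierarchy within a group; the function name and data layout are assumptions.

```python
# Hypothetical sketch of the grouping rules: a story is a list of
# (fragment, weight) pairs; each returned group is a list of fragments.
def group_story(story):
    groups, prev_w = [], None
    for frag, w in story:
        if prev_w is None or w > prev_w:
            groups.append([frag])    # f_n lower than f_{n+1}: new disjoint group
        else:
            groups[-1].append(frag)  # f_n >= f_{n+1}: same group as predecessor
        prev_w = w
    return groups
```

A rising importance weight thus always opens a new group, which matches the intuition that a more important fragment starts a new slide-level unit.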

5 Prototype System
We have developed a prototype system for composing presentation scenarios. Figure 4 shows a screenshot of the system. A presenter prepares a knowledge fragment network in the left window and constructs a story in the right window. The fragments in a story are arranged from the top of the right window, and the importance of each fragment is expressed by its size in the window. First, the presenter specifies a presentation strategy by determining the type, direction, priority, and decay for each semantic relationship. Next, the presenter picks an important fragment and copies it from the left window to the right window. The system then selects the fragments relevant to the copied one and orders them according to the specified strategy. The fragments selected by the system are displayed in the right window, arranged in the sequence of the story. The presenter completes his/her story by repeating these manipulations.

6 Conclusion
We proposed a method for supporting presentation planning by transforming structured fragments into the logical structure of presentation slides. In our method, the concept of a presentation strategy is introduced to construct a presentation story and a deck of slides. A presentation strategy is translated into a policy for searching a knowledge fragment network. We believe that this mechanism allows a presenter to compose a scenario from a small number of explicitly specified knowledge fragments.

Currently, our prototype system requires considerable input for specifying a presentation strategy, so we have to consider a mechanism for specifying it with less input. We also have to confirm that our prototype system enables a presenter to compose a scenario suited to his/her presentation situation.

References
1. Apple Inc.: Keynote, http://www.apple.com/iwork/keynote/
2. Bergman, L., Lu, J., Konuru, R., MacNaught, J., Yeh, D.: Outline Wizard: Presentation
Composition and Search. In: Proceedings of the 15th International Conference on Intel-
ligent User Interfaces, Hong Kong, China, pp. 209–218 (2010)
3. Haller, H., Abecker, A.: iMapping – A Zooming User Interface Approach for Personal and Semantic Knowledge Management. In: Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, Toronto, Ontario, Canada, pp. 119–128 (2010)
4. Hanaue, K., Watanabe, T.: Supporting Design and Composition of Presentation Docu-
ment Based on Presentation Scenario. In: Proceedings of the 2nd International Sympo-
sium on Intelligent Decision Technologies, Baltimore, MD, US, SIST, vol. 4, pp. 465–
473 (2010)
5. Hasida, K.: Semantic Authoring and Semantic Computing. In: Sakurai, A., Hasida, K.,
Nitta, K. (eds.) JSAI 2003. LNCS (LNAI), vol. 3609, pp. 137–149. Springer, Heidelberg
(2007)
6. Lichtschlag, L., Karrer, T., Borchers, J.: Fly: A Tool to Author Planar Presentations. In:
Proceedings of the CHI 2009 Conference on Human Factors in Computing Systems,
Boston, MA, USA, pp. 547–556 (2009)
7. Marshall, C.C., Shipman, F.M., Coombs, J.H.: VIKI: Spatial Hypertext Supporting
Emergent Structure. In: Proceedings of the ACM European Conference on Hyperme-
dia Technologies, Edinburgh, Scotland, pp. 13–23 (1994)
8. Microsoft Inc.: PowerPoint, http://office.microsoft.com/powerpoint/
9. Prezi, http://prezi.com/
10. Tufte, E.R.: The Cognitive Style of PowerPoint. Graphics Press (2004)
A Study on Privacy Preserving Collaborative
Filtering with Data Anonymization by
Clustering

Katsuhiro Honda, Yui Matsumoto, Arina Kawano, Akira Notsu, and Hidetomo Ichihashi

Abstract. Collaborative filtering achieves personalized recommendation based on user collaboration. In this paper, how to preserve personal information in collaborative filtering is studied through several comparative experiments. k-anonymization is a standard method for guaranteeing personal privacy, in which data records are summarized so that any record is indistinguishable from at least (k − 1) other records. This study compares several clustering-based k-anonymization models in the context of collaborative filtering applications.

1 Introduction
Collaborative filtering achieves personalized recommendation by searching for user neighborhoods through comparison of user preferences such as purchase history data, and is a powerful tool for reducing information overload. GroupLens [5, 8] is a basic model of the memory-based method, in which user similarity is first measured by Pearson correlation coefficients and the applicability of a new item for an active user is predicted by the similarity-weighted average of other users' ratings. The concept is applied in many practical systems, such as Amazon.com [9], and has proved beneficial for both users and content suppliers. In real-world situations, however, users may not fully enjoy the fruits of such IT tools because they often hesitate to provide their personal information or feel nervous about information leaks.

In order to encourage users to exploit IT tools, we should develop techniques for privacy-preserving data mining [1]; techniques such as data perturbation or obfuscation have been applied to privacy-preserving collaborative filtering [11, 12].
k-anonymization [13] is a standard method for guaranteeing personal privacy, in which data records are summarized so that any record is indistinguishable from at least (k − 1) other records. This task is closely related to clustering (cluster analysis) [2], in which multivariate data observations are grouped into clusters so that objects in the same cluster are mutually similar while objects in different clusters are not. k-anonymization can be achieved by searching for clusters having k or more objects and packaging the observations into a prototypical datum in each cluster. In general, k-anonymity is considered only for the quasi-identifiers, i.e., the attributes that carry information for distinguishing a particular object from an object set.

Katsuhiro Honda · Yui Matsumoto · Arina Kawano · Akira Notsu · Hidetomo Ichihashi
Osaka Prefecture University, 1-1 Gakuen-cho, Nakaku, Sakai, Osaka 599-8531, Japan
e-mail: {honda@,matsumoto@hi.,kawano@hi.,notsu@,ichi@}cs.osakafu-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 43–52.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
In this paper, the applicability of several clustering-based k-anonymization approaches to collaborative filtering tasks is studied through several comparative experiments. The remainder of this paper is organized as follows: Section 2 gives a brief review of the GroupLens recommendation algorithm; Section 3 presents several clustering-based k-anonymization approaches; experimental results are shown in Section 4; and a concluding summary is given in Section 5.

2 GroupLens Recommendation Algorithm


Assume that we have a user–item evaluation matrix X = {xij}, whose element xij, i = 1, ..., n, j = 1, ..., m, is the rating for item j given by user i. Usually, users evaluate only a part of the m items, so the matrix has many missing elements. The goal of collaborative filtering is to predict the missing elements (the applicability of items to an active user) and to recommend an item that has not been evaluated but is expected to be preferred by the active user.
GroupLens [5, 8] is a basic model of the neighborhood-based method, in which word-of-mouth can be virtually achieved in network communities. In the neighborhood-based method, the user neighborhood of an active user is first extracted by considering the Pearson correlation coefficients among users' ratings, and then a missing element yij is predicted by calculating the similarity (Pearson correlation coefficient)-weighted average as follows:

    y_{ij} = \bar{x}_i + \frac{\sum_{a=1}^{n} (x_{aj} - \bar{x}_a) \times r_{ia}}{\sum_{a=1}^{n} r_{ia}},    (1)

where r_{ia} is the Pearson correlation coefficient between users i and a, and \bar{x}_i is the mean rating value of user i. In GroupLens, it was recommended to use the deviations from each user's mean value (x_{aj} − \bar{x}_a) instead of the original values (x_{aj}) in order to remove the influence of users' rating tendencies.
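Equation (1) can be sketched as follows. This is a minimal illustration, assuming missing ratings are stored as NaN; the paper's denominator Σ r_ia is used as written, although many implementations use Σ|r_ia| instead.

```python
import numpy as np

# Sketch of the GroupLens prediction of Eq. (1) in deviation-from-mean form.
def pearson(x, y):
    """Pearson correlation over co-rated (non-NaN) items only."""
    mask = ~np.isnan(x) & ~np.isnan(y)
    if mask.sum() < 2:
        return 0.0
    xd, yd = x[mask] - x[mask].mean(), y[mask] - y[mask].mean()
    den = np.sqrt((xd ** 2).sum() * (yd ** 2).sum())
    return float((xd * yd).sum() / den) if den else 0.0

def predict(X, i, j):
    """Predict user i's rating of item j from the other users' ratings."""
    means = np.nanmean(X, axis=1)          # per-user mean rating
    num = den = 0.0
    for a in range(X.shape[0]):
        if a == i or np.isnan(X[a, j]):
            continue
        r_ia = pearson(X[i], X[a])
        num += (X[a, j] - means[a]) * r_ia  # deviation weighted by similarity
        den += r_ia
    return means[i] + (num / den if den else 0.0)
```

The deviation (x_aj − x̄_a) rather than the raw rating is what removes each neighbor's personal rating bias, as recommended above.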
It has also proved useful to consider the item neighborhood instead of the user neighborhood [9]. Besides such memory-based methods, where missing-value prediction is performed with memory-stored ratings, model-based methods [3, 7] have been developed; these are computationally efficient in the recommendation stage, while their training processes are often computationally expensive.

3 Clustering-Based k-Anonymization
k-anonymization [13] is a standard method for privacy-preserving data mining [1], in which similar objects are summarized into a single observation so that no observation can be used to distinguish a particular individual. k-anonymity is achieved when at least k objects share the same observation, and this requirement can be identified with the process of clustering, whose goal is to extract clusters composed of similar objects.

In this study, the applicability of several clustering-based k-anonymization approaches is discussed from the viewpoint of information loss in collaborative filtering tasks.

3.1 Hierarchical Clustering


Hierarchical clustering methods perform data clustering step by step, merging (dividing) clusters in a bottom-up (top-down) manner. For example, in the bottom-up approach, all objects are first regarded as separate clusters, i.e., the number of clusters equals the number of objects. Then, the nearest pair of clusters is merged into a single cluster in each step. This step is iterated until all clusters are merged into one big (whole-data) cluster. Hierarchical clustering methods are useful for constructing a dendrogram (tree diagram) and can extract arbitrary numbers of clusters.

In this study, two major methods of the bottom-up approach are adopted.

3.1.1 Single-Linkage Method

In the single-linkage method (also called the nearest-neighbor method), the distance between clusters is measured by the shortest distance among their objects.

Let d_{ab} be the similarity degree between objects a and b; the larger the similarity degree, the nearer the objects. The similarity degree w_{ij} between clusters G_i and G_j is defined as:

    w_{ij} = \max_{a \in G_i,\, b \in G_j} d_{ab}.    (2)

It has been shown that the single-linkage method tends to construct a small number of large (long) clusters in its middle stage and is suitable for extracting thin, nonlinearly shaped clusters.

3.1.2 Complete-Linkage Method

In the complete-linkage method (also called the furthest-neighbor method), the distance between clusters is measured by the longest distance among their objects. The similarity degree w_{ij} between clusters G_i and G_j is defined as:

    w_{ij} = \min_{a \in G_i,\, b \in G_j} d_{ab}.    (3)

It has been shown that the complete-linkage method tends to construct a large number of small clusters in its middle stage and is suitable for extracting compact, spherically shaped clusters.
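A bottom-up procedure using either linkage rule can be sketched as follows. This is illustrative only: `linkage=max` gives Eq. (2) and `linkage=min` gives Eq. (3), since d_ab here is a similarity (larger means nearer); the function name and interface are assumptions.

```python
# Sketch of bottom-up (agglomerative) clustering on a similarity matrix.
# sim[a][b] is the similarity between objects a and b (larger = nearer).
def agglomerate(sim, n_clusters, linkage=max):
    clusters = [[a] for a in range(len(sim))]
    while len(clusters) > n_clusters:
        best = None  # (similarity, i, j) of the nearest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                w = linkage(sim[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or w > best[0]:
                    best = (w, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the nearest pair
    return clusters
```

Stopping the loop at different values of `n_clusters` corresponds to cutting the dendrogram at different heights.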

3.2 k-Member Clustering


Although conventional clustering models are applicable for estimating a particular number of clusters, they cannot tune the number of objects in each cluster, whereas k-anonymity requires every cluster to have at least a particular number of objects. k-member clustering is another type of clustering method, in which clusters of a particular size are extracted one by one. Byun et al. [4] proposed an efficient algorithm for k-member clustering, which can be applied to numerical and categorical observations (or a mixture of both). With the goal of achieving k-anonymity, the prototypical observation of each cluster is given by an interval value for numerical attributes or a set of merged categories for categorical attributes.
The greedy algorithm can be summarized as follows:
1. Let S be a set of objects. Choose the anonymity level k and randomly select an
object r.
2. Let a cluster index t be 0. Repeat the following process while |S| > k.
a. Replace r with its furthest object and remove r from S.
b. t = t + 1. Generate cluster Gt with a single element r.
c. Repeat the following process while |Gt | < k.
i. Find the best neighbor object r of cluster Gt .
ii. Add r to cluster Gt and remove r from S.
3. Repeat the following process while |S| > 0.
a. Randomly select an object r from S.
b. Find the best neighbor cluster Gt of r.
c. Add r to cluster Gt and remove r from S.
The best neighbor is selected so that, after merging, the sum of the interval widths of numerical observations and/or the number of merged categories of categorical observations is minimized. This merging process is essentially similar to the nearest-neighbor (single-linkage) principle.
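The greedy procedure above can be sketched as follows. This is a hypothetical rendering, assuming the object set has more than k members: `dist` is any dissimilarity function standing in for the information-loss criterion of Byun et al., and the "best neighbor" is taken as the object nearest to the growing cluster.

```python
import random

# Sketch of the greedy k-member clustering summarized above.
def k_member(objects, k, dist, seed=0):
    rng = random.Random(seed)
    S = list(objects)
    r = rng.choice(S)                          # step 1: random starting object
    clusters = []
    while len(S) > k:                          # step 2
        r = max(S, key=lambda o: dist(o, r))   # 2a: furthest object from r
        S.remove(r)
        G = [r]                                # 2b: new cluster seeded with r
        while len(G) < k:                      # 2c: grow the cluster to size k
            best = min(S, key=lambda o: min(dist(o, g) for g in G))
            S.remove(best)
            G.append(best)
        clusters.append(G)
    while S:                                   # step 3: merge leftover objects
        o = S.pop()
        target = min(clusters, key=lambda G: min(dist(o, g) for g in G))
        target.append(o)
    return clusters
```

Note that step 3 is exactly the "merging strategy" discussed in Section 3.4; the coding and rejecting strategies would replace that final loop.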

3.3 Application to GroupLens Recommendation Algorithm


In this study, the applicability of clustering-based k-anonymization approaches is discussed in the context of collaborative filtering with 0-1 type purchase history data, where "1" means an item has already been bought and "0" means it has not. Such purchase history data often include privacy-sensitive information, and all attributes are regarded as quasi-identifiers.

In the experiments shown in the next section, the similarity between objects is measured by the Jaccard index [2], a major similarity measure for 0-1 type observations:

    d_{ab} = \frac{A}{A + B + C},    (4)

where A, B, and C are the numbers of attributes whose observations are "1-1", "1-0", and "0-1", respectively, for the object pair a-b. The index is used not only for the hierarchical clustering methods but also for k-member clustering, because an interval value has no meaning for 0-1 observations, i.e., the interval [0, 1] just means that the attribute is unknown. k-member clusters are therefore extracted following a nearest-neighbor principle.

Before applying the GroupLens recommendation algorithm, in which only scalar values are available, each attribute of the purchase history data was summarized into the median of its cluster, i.e., "0 or 1" based on the majority rule for 0-1 observations.
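Equation (4) and the majority-rule summarization can be sketched together as follows. This is illustrative; the tie-handling (ties rounded up to 1) is an assumption the paper does not specify.

```python
# Sketch of the Jaccard index of Eq. (4) and the per-cluster majority vote.
def jaccard(x, y):
    """x, y: equal-length 0/1 vectors."""
    A = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    B = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    C = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    return A / (A + B + C) if A + B + C else 0.0

def summarize(cluster):
    """Replace a cluster of 0/1 vectors by the attribute-wise majority."""
    n = len(cluster)
    return [1 if 2 * sum(col) >= n else 0 for col in zip(*cluster)]
```

Attributes where both objects are "0" contribute to none of A, B, or C, which is why the Jaccard index suits sparse purchase data better than simple matching.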

3.4 Three Strategies for Handling Isolated Objects


In clustering-based k-anonymization approaches, some objects may be left isolated, alone or in small clusters having k − 1 or fewer elements, after the clustering process. In the k-member clustering proposed by Byun et al., such isolated objects are simply merged into the nearest cluster after all possible clusters have been extracted. However, there are also cases where we want to use only a smaller number of good-quality clusters and leave the many remaining noise objects isolated. In such cases, the merging strategy may increase the influence of noise.
In this study, the following three strategies are compared:
1. Merging Strategy: Isolated objects are merged into the nearest cluster and the cluster's prototypical observation is updated to reflect them.
2. Coding Strategy: Isolated objects are assigned to the nearest cluster and their observations are coded (replaced) with its prototypical observation, which is not updated.
3. Rejecting Strategy: Isolated objects are simply rejected (removed).

The coding strategy is expected to weaken the influence of noise relative to the merging strategy while not ignoring it. The rejecting strategy ignores noise objects entirely, even though such rejection may bring significant information loss.
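The three strategies can be contrasted in a short sketch. The data model here is hypothetical (each cluster is a dict holding its member vectors and a majority prototype), and the nearest cluster is chosen by Hamming agreement, an assumption standing in for the Jaccard index used elsewhere.

```python
# Hypothetical sketch of the three strategies for an isolated 0/1 vector.
def majority(members):
    n = len(members)
    return [1 if 2 * sum(col) >= n else 0 for col in zip(*members)]

def handle_isolated(obj, clusters, strategy):
    # nearest cluster = most attribute agreements with the prototype
    c = max(clusters,
            key=lambda c: sum(a == b for a, b in zip(obj, c["prototype"])))
    if strategy == "merging":     # add obj, then recompute the prototype
        c["members"].append(obj)
        c["prototype"] = majority(c["members"])
        return list(c["prototype"])
    if strategy == "coding":      # code obj as the unchanged prototype
        return list(c["prototype"])
    if strategy == "rejecting":   # drop obj from the anonymized data
        return None
    raise ValueError(strategy)
```

Only the merging strategy mutates the cluster, which is exactly why it lets noisy isolated objects distort the prototypes.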

4 Experimental Results
Several comparative experiments evaluating the availability of clustering-based k-anonymization approaches were performed using purchase history data collected by Nikkei Inc. in 2000. The data set, used in [6], includes the purchase history of 996 users (n = 996) on 18 items (m = 18). The element xij is 1 if user i has item j and 0 otherwise. In the experiments, six items (Piano, PC, Word processor, VD,
Oven, Coffee maker) were selected as the target items because they were owned by roughly half (30–60%) of the users. Randomly selected 1000 elements (xij for the above six items) were used as the test set, and the applicability (0 or 1) was predicted based on the GroupLens recommendation algorithm. Here, before applying the algorithm, the "1" elements of the test set were withheld and replaced with "0" (not yet bought, but may buy in the near future).

Fig. 1 Comparison of ROC sensitivity of Single-linkage (a) and Complete-linkage (b) under the Merging, Coding, and Rejecting strategies (x-axis: anonymity level k; y-axis: ROC sensitivity)
When the GroupLens algorithm was applied to the original purchase history data without anonymization, the ROC sensitivity measure was 0.827. ROC sensitivity [14] is a major criterion for assessing recommendation ability. The ROC curve is a plot of the true positive rate versus the false positive rate, drawn by changing the threshold of the applicability level for recommendation, and the ROC sensitivity measure is given by the area under the curve. The larger the criterion, the higher the recommendation ability.
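The ROC sensitivity can be sketched as follows: an illustrative trapezoidal area-under-the-curve computed by sweeping the threshold over the predicted applicability scores (not the authors' code).

```python
# Sketch: ROC sensitivity = area under the ROC curve, obtained by sweeping
# the recommendation threshold over the predicted scores (trapezoidal rule).
def roc_sensitivity(scores, labels):
    """scores: predicted applicability values; labels: true 0/1 values."""
    P = sum(labels)            # positives (withheld "1" elements)
    N = len(labels) - P        # negatives
    tp = fp = 0
    area = prev_fpr = prev_tpr = 0.0
    for _, y in sorted(zip(scores, labels), reverse=True):
        if y:
            tp += 1
        else:
            fp += 1
        fpr, tpr = fp / N, tp / P
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area
```

A value of 1.0 means every withheld purchase was ranked above every non-purchase; 0.5 corresponds to random ranking, so the paper's 0.827 baseline indicates a strongly informative predictor.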

4.1 Hierarchical Clustering: Single-Linkage vs. Complete-Linkage

First, the two major approaches to hierarchical clustering were applied to k-anonymization. The dendrogram built in hierarchical clustering allows arbitrary numbers of clusters to be extracted. In this experiment, anonymized data clusters having a pre-defined k or more users were extracted such that the number of clusters was as large as possible for each k value. Then, the objects isolated alone or in small clusters were handled as described in Section 3.4.
Figure 1 compares the ROC sensitivity of Single-linkage and Complete-linkage for various anonymity levels k. For Single-linkage, no clear conclusion can be drawn as to which of the three strategies for handling isolated objects is better. For Complete-linkage, on the other hand, the rejecting strategy is clearly inferior to the other two strategies. The rejecting strategy may bring information loss by simply
ignoring all isolated users when the number of clusters is not large and many users are isolated.

Fig. 2 Comparison of ROC sensitivity of k-member clustering (Single-linkage principle vs. Complete-linkage principle; x-axis: anonymity level k; y-axis: ROC sensitivity)

Comparing the two methods, Complete-linkage outperformed Single-linkage, i.e., Complete-linkage anonymizes the data with less information loss. This is because Complete-linkage tends to extract many compact clusters, in which the inner-cluster errors are minimized, while Single-linkage often extracts a few large clusters, in which the inner-cluster errors may not be minimized.

4.2 k-Member Clustering vs. Hierarchical Clustering


Second, k-member clustering is compared with hierarchical clustering. Although the original k-member clustering is based on the single-linkage principle, the results of Section 4.1 imply that a complete-linkage approach may be better. In this experiment, the single-linkage and complete-linkage approaches are compared for extracting k-member clusters.

Table 1 Comparison of number of clusters

  k   Single-linkage   Complete-linkage   k-member
  1        996               996             996
  2        154               247             427
  3         66               134             294
  4         31               100             223
  5         17                77             181
  6         12                62             155
  7          8                49             137
  8          7                41             121
  9          4                37             109
 10          2                35              99

Fig. 3 Comparison of ROC sensitivity of k-member clustering with fewer clusters ((a) Single-linkage principle, (b) Complete-linkage principle) under the Merging, Coding, and Rejecting strategies (x-axis: anonymity level k; y-axis: ROC sensitivity)

Because the k-member clustering process leaves only a small number (at most k − 1) of isolated users, the three strategies for handling isolated users produced almost the same results. Figure 2 compares the ROC sensitivity of single-linkage-based and complete-linkage-based k-member clustering and implies that both approaches anonymize the data with quite high quality, i.e., the recommendation ability remains high even when k is nearly 10. This is because k-member clustering extracts as many small clusters of size around k as possible, whereas the hierarchical clustering approaches in Section 4.1 extracted only a smaller number of good clusters, as compared in Table 1.
Next, the k-member clustering models were run with the same numbers of clusters as Complete-linkage, giving the ROC sensitivity shown in Fig. 3. The figure implies that k-member clustering produces results similar to Complete-linkage when it uses a smaller number of clusters. However, the performance of k-member clustering does not decrease monotonically as the anonymity level k grows. This may be because k-member clustering essentially involves random selection (Section 3.2), so the recommendation quality is also influenced by randomness.

4.3 Comparison of Anonymization Quality


Third, in order to discuss the quality of anonymization, the loss of information is compared. Table 2 compares the number of elements that differ between the original purchase history data and the anonymized data given by the coding strategy. The merging strategy gave similar results, while the rejecting strategy lost too much information.
k-member clustering seems to be a more promising approach than hierarchical clustering, although its performance is influenced by randomness. This may be because k-member clustering tends to assign clusters rather uniformly, owing to its random-search tendency, while hierarchical clustering can extract unequal and distorted clusters. Generally speaking, the complete-linkage principle is more suitable than the single-linkage principle for anonymization without information loss, so k-member clustering should be implemented based on the complete-linkage principle or other modified principles.

Table 2 Comparison of information loss: number of different elements in the purchase history data (SL: Single-linkage, CL: Complete-linkage)

       Hierarchical      k-member       k-member (fewer)
  k     SL      CL       SL     CL        SL      CL
  1      0       0        0      0         0       0
  2    996     752      534    523       532     532
  3   1502    1131      739    748       857     857
  4   1910    1304     1053    991      1158    1124
  5   2202    1478     1191   1134      1344    1335
  6   2432    1623     1342   1296      1516    1467
  7   2720    1851     1468   1401      1611    1601
  8   2822    2007     1633   1497      1806    1704
  9   3261    1959     1755   1570      1887    1830
 10   3821    1983     1838   1661      2039    1945

5 Conclusions
In this paper, the applicability of several clustering-based k-anonymization approaches was compared from the viewpoint of collaborative filtering applications. The experimental results imply that k-member clustering is a promising approach for data anonymization, although it may be influenced by randomness. In addition, the original k-member clustering model, which is based on the single-linkage principle, can be improved by considering other principles such as complete linkage.

A potential future work is to improve the anonymization quality by considering cluster compactness in conjunction with cluster size, i.e., the number of objects. Fuzzy clustering [10] is a potential candidate for improving the crisp partition, and the construction of a fuzzy variant of k-member clustering will be promising.

Acknowledgements. This work was supported in part by the Ministry of Education, Cul-
ture, Sports, Science and Technology, Japan, through a Grant-in-Aid for Scientific Research
(#23500283).

References
1. Aggarwal, C.C., Yu, P.S.: Privacy-Preserving Data Mining: Models and Algorithms.
Springer, New York (2008)
2. Anderberg, M.R.: Cluster Analysis for Applications. Academic Press, London (1973)
3. Breese, J., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for
collaborative filtering. In: Proc. 14th Conference on Uncertainty in Artificial Intelligence,
pp. 43–52 (1998)
4. Byun, J.-W., Kamra, A., Bertino, E., Li, N.: Efficient k-Anonymization Using Clustering
Techniques. In: Kotagiri, R., Radha Krishna, P., Mohania, M., Nantajeewarawat, E. (eds.)
DASFAA 2007. LNCS, vol. 4443, pp. 188–200. Springer, Heidelberg (2007)
5. Herlocker, J.L., Konstan, J.A., Borchers, A., Riedl, J.: An algorithmic framework for
performing collaborative filtering. In: Proc. Conference on Research and Development
in Information Retrieval (1999)
6. Honda, K., Notsu, A., Ichihashi, H.: Collaborative filtering by sequential user-item co-
cluster extraction from rectangular relational data. International Journal of Knowledge
Engineering and Soft Data Paradigms 2(4), 312–327 (2010)
7. Honda, K., Sugiura, N., Ichihashi, H., Araki, S.: Collaborative Filtering Using Principal
Component Analysis and Fuzzy Clustering. In: Zhong, N., Yao, Y., Ohsuga, S., Liu, J.
(eds.) WI 2001. LNCS (LNAI), vol. 2198, pp. 394–402. Springer, Heidelberg (2001)
8. Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R., Riedl, J.: GroupLens: applying collaborative filtering to Usenet news. Communications of the ACM 40(3), 77–87 (1997)
9. Linden, G., Smith, B., York, J.: Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 76–80 (January–February 2003)
10. Miyamoto, S., Ichihashi, H., Honda, K.: Algorithms for Fuzzy Clustering: Methods in
c-Means Clustering with Applications. Springer (2008)
11. Parameswaran, R., Blough, D.M.: Privacy preserving collaborative filtering using data
obfuscation. In: Proc. IEEE International Conference on Granular Computing 2007, pp.
380–386 (2007)
12. Polat, H., Du, W.: Privacy-preserving collaborative filtering using randomized perturba-
tion techniques. In: Proc. 3rd IEEE International Conference on Data Mining, pp. 625–
628 (2003)
13. Sweeney, L.: k-anonymity: a model for protecting privacy. International Journal on Un-
certainty, Fuzziness and Knowledge-Based Systems 10(5), 557–570 (2002)
14. Swets, J.A.: Measuring the accuracy of diagnostic systems. Science 240, 1285–1289
(1988)
A Traffic Flow Prediction Approach Based
on Aggregated Information of Spatio-temporal
Data Streams

Jun Feng, Zhonghua Zhu, and Rongwei Xu

Abstract. Predicting traffic flow efficiently, so as to encourage drivers to avoid sections that are about to become congested, is a good approach to dealing with traffic congestion. Conventional prediction methods focus on specific information (e.g., speed, density, flux, and so on), but they consume a great deal of time and storage space. This paper proposes a novel prediction approach that analyzes the aggregated information of data streams to avoid unnecessary time and storage consumption. Evaluation shows that, compared with the similar existing approach ES (Exponential Smoothing), the new approach can adjust its smoothing factor based on historical values and outperforms ES in prediction results.

1 Introduction
With the rapid economic development, traffic congestion has become a common
problem in major cities in the world and has a serious impact on people’s quality
of life. ’How to build an intelligent transportation system (ITS)’ becomes the intel-
ligent transportation research focus.The precise prediction of changing traffic flow
state (speed, density, flux, travel time and other traffic operating conditions) is one
of the core of ITS. All subsystems of ITS need a reasonable, real-time and accurate
prediction on the state of road traffic to adjust traffic management control program
and publish the travel information to the travelers and provide some optimal path
options to the drivers. However, with the application and development, ITS has ac-
cumulated a massive and complex traffic flow information, and it has wide variety
of sources, a wide range of different forms and a huge amount of information. Also,
traffic flow information is obtained in real time, and the amount of information will
rapidly expand in a relatively short period of time. Real-time traffic flow informa-
tion which is get from ITS provides an important data foundation to the intelligent

Jun Feng · Zhonghua Zhu · Rongwei Xu


College of Computer and Information, Hohai University, Nanjing 210098, China
e-mail: fengjun@hhu.edu.cn

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 53–62.
springerlink.com 
c Springer-Verlag Berlin Heidelberg 2012

transportation system management and the control of road traffic flow data. However, the storage and processing of huge traffic flows place new demands on current traffic flow data analysis and processing techniques. We therefore need aggregate processing techniques for the data streams: such techniques do not process each individual data item, but only summarized information (e.g., the number of vehicles in a query region).
To obtain reasonable, real-time, accurate predictions while avoiding high storage and time consumption, this paper proposes a new prediction approach that effectively exploits the aggregated information already available on the stream to obtain good prediction answers that can ease traffic pressure.
The rest of the paper is organized as follows. Section 2 formally defines spatio-temporal aggregation and reviews related work. Our solution is presented in Section 3. In Section 4, we describe the experimental studies and report our findings. Finally, Section 5 concludes the paper with a summary.

2 Preliminaries
2.1 Spatio-temporal Aggregation Definition
Spatio-temporal aggregation [1] computes over the objects in a query region during a query time. Objects' explicit properties and spatial regions change over time, and this change can be discrete or continuous; an example query is "calculate the forest cover of each country of the world within ten years". The main research results are AMH* [2], the aRB-tree [3], DynSketch [4], and so on. In real-life traffic applications, we need to predict the traffic situation in a region during a future query time and make appropriate adjustments. In this paper, we focus on the spatio-temporal aggregated information generated by existing aggregate methods and analyze it. According to the characteristics of the aggregated information, we use a suitable forecasting method to predict the traffic flow and ease traffic congestion.

2.2 Related Work


Predicting the future trend of real-time data streams is an important practical application, but prediction methods for data stream computation are still very few. The first typical method is that of Guo et al. [5], who proposed a continuous aggregate query prediction algorithm for the 'Count' aggregate function. The main tool is mathematical statistics; the drawback is that the number of data elements has an upper limit, whereas the size of road-network data streams is unbounded. Li et al. [6] proposed a multiple linear regression method: when the number of prediction failures exceeds a pre-specified threshold, the predictive model automatically adjusts its strategy to reduce the prediction error. They also proposed a mathematical model based on the update cycle of the sliding window and the impact of the data streams' velocity on prediction accuracy. This prediction model is only applicable when the data follow a linear relationship; once they do not, the model's error increases. It thus has its own limitations and is not suitable for road-network prediction. Sun [7] proposed a prediction aggregation method based on a basic window and a prediction equation. This method applies to data streams that have a certain regularity or satisfy a fixed function, but road-network data streams are unpredictable and do not follow such a law. Yu et al. [8] advanced a method (CSPA), based on chaos theory, which uses the answers of continuous aggregation calculations to predict future values. CSPA considers the impact of the data itself on the prediction, uses a local prediction algorithm for chaotic time series to predict the future aggregate values from historical data, and finally adjusts the prediction model according to the error between the predicted and actual values.
The four methods above are not efficient for spatio-temporal data streams; only DynSketch [4] mentions a prediction method based on spatio-temporal aggregated information, namely ES (Exponential Smoothing). This method obtains the predicted values from a smoothing formula and is appropriate for data that vary randomly around a horizontal line. Road-network data, however, can be highly volatile during peak-flow periods; for instance, around the 5:00 pm rush hour, the number of cars on the road increases significantly. Therefore, this method has limitations and is not particularly suitable for online road traffic flow prediction.

2.3 Exponential Smoothing


The exponential smoothing method was first proposed by Robert G. Brown and Richard F. Meyer [9] in 1960. It relies on successive actual observations and systematically corrects the prediction model. Its strategy is to extrapolate historical observations into the future while adaptively correcting for the changing pattern of the series. The approach has two notable features: first, it uses all historical data and related information; second, it follows the principle of 'giving more weight to closer observations than to more distant ones' when computing the weighted average and smoothing the data. The model can resist or reduce the impact of abnormal data, so that the historical regularities of the time series are reflected more clearly. The traditional ES model weights past observations as shown in formula (1):

Y_{t+1} = α_0 S_t + α_1 S_{t-1} + · · · + α_n S_{t-n}   (n → ∞)   (1)

To satisfy the requirements of ES, one sets α_0 = α and α_k = α(1 − α)^k for k = 1, 2, · · ·. When 0 < α < 1, Σ_{i=0}^{∞} α_i = α + α(1 − α) + α(1 − α)^2 + · · · = α/[1 − (1 − α)] = 1, so formula (1) can be simplified to:

Y_{t+1} = α S_t + (1 − α) Y_t   (2)

In formula (2), S_t is the actual value at time t, α is the smoothing parameter, Y_t is the predicted value at time t, and Y_{t+1} is the predicted value at time t + 1.
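As a minimal sketch, the update in formula (2) can be written in a few lines of Python. The function name and the choice of the initial prediction Y_0 are our own illustration; the paper does not fix an initialization convention:

```python
def exp_smoothing(observations, alpha, y0=None):
    """One-step-ahead exponential smoothing, formula (2):
    Y_{t+1} = alpha * S_t + (1 - alpha) * Y_t."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # A common convention (our assumption): start at the first observation.
    y = observations[0] if y0 is None else y0
    predictions = []
    for s in observations:
        y = alpha * s + (1.0 - alpha) * y
        predictions.append(y)
    return predictions
```

With α close to 1 the predictions track the latest observations; close to 0 they smooth heavily, which matches the trade-off in choosing α discussed in the text.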

In practice, the smoothing parameter α has a great influence on smoothing performance and places high demands on the data. If α is small, the smoothing effect of the prediction model is stronger; if α is large, the model responds faster to the time series. In [4], the value of α is chosen only by experimental tuning, but in real-life road applications it is not reasonable to obtain α through manual adjustment. In traditional exponential smoothing, the choice of α depends on experience. Usually, when the volatility of the time series is not significant, α is set between 0.1 and 0.3 to increase the weight of the previous predicted value; conversely, when the volatility is significant, α is set between 0.6 and 0.8 to increase the weight of the new observations. In difficult cases, one can repeat the calculation with several different values and compare the resulting predictions to determine α. A value of α obtained this way lacks accuracy; it is also static and cannot adapt to the time series at different times, which clearly has a great impact on prediction accuracy.

3 Algorithm Description

3.1 Self-Adaptive Exponential Smoothing


The choice of exponential smoothing method should match the properties of the time series. If there is no significant change in the time series, the linear exponential smoothing model (also called simple exponential smoothing, or ES) is chosen; if the series shows a linear trend, the quadratic exponential smoothing model is chosen; and if the series shows a parabolic trend, the cubic exponential smoothing model [10] is chosen. As analyzed above, road-network data distributions show some volatility, with urban peak-period data streams as a clear example: they neither stay constant nor follow a linear trend. Therefore, we take the cubic exponential smoothing model as the basis for optimizing prediction on road-network data streams.
Based on an analysis of the traditional exponential smoothing model, we put forward the concept of self-adaptive cubic exponential smoothing, which adapts the smoothing weight to the time series; we call this model self-adaptive exponential smoothing (SAES). We first examine the cubic exponential smoothing model, whose prediction model is shown in formula (3):

Y_{t+T} = a_t + b_t T + c_t T^2   (3)

Y_{t+T} is the prediction target (the predicted value at time t + T), t is the time index, T is the prediction horizon, and a_t, b_t, c_t are the linear, quadratic and cubic prediction parameters. Following formula (2), the traditional cubic exponential smoothing formulas are shown in (4):

S_t^1 = α X_t + (1 − α) S_{t−1}^1
S_t^2 = α S_t^1 + (1 − α) S_{t−1}^2
S_t^3 = α S_t^2 + (1 − α) S_{t−1}^3   (4)

In formula (4), S_t^1, S_t^2 and S_t^3 are the linear, quadratic and cubic exponential smoothing values, α is the static smoothing parameter, and X_t is the actual value at time t. The prediction parameters are then given in (5):

a_t = 3S_t^1 − 3S_t^2 + S_t^3

b_t = α / [2(1 − α)^2] · [(6 − 5α)S_t^1 − 2(5 − 4α)S_t^2 + (4 − 3α)S_t^3]

c_t = α^2 / [2(1 − α)^2] · [S_t^1 − 2S_t^2 + S_t^3]   (5)

In the cubic exponential smoothing model, the smoothing parameter α is static and can hardly adapt to changes of the time series, which is the same problem as in the ES model; the initial value of the smoothing parameter is also hard to determine [11]. In formulas (4) and (5), α remains a constant throughout the calculation. For a sequence with ups and downs, even if a suitable α is found for the earlier part of the sequence, it will not necessarily be suitable for smoothing and prediction in later periods. For most time series, randomness means there is no constant value that fits the application at all times; for road-network data streams in particular, the uncertainty is even more pronounced. Thus there will be a clear prediction error, and even serious distortion, if the traditional exponential smoothing model is used. We therefore give up a fixed α and construct a value α(t) that adjusts itself over time. First, we change α to α(t) in formula (4) and obtain:

S_t^1 = Σ_{i=0}^{t} α(t)(1 − α(t))^{t−i} X_i + (1 − α(t))^t S_0^1
S_t^2 = Σ_{i=0}^{t} α(t)(1 − α(t))^{t−i} S_i^1 + (1 − α(t))^t S_0^2
S_t^3 = Σ_{i=0}^{t} α(t)(1 − α(t))^{t−i} S_i^2 + (1 − α(t))^t S_0^3   (6)

Because the three formulas in (6) all have the same form, we set Ψ_t = α(t) / [1 − (1 − α(t))^t]; Ψ_t is a function of the time t. When 0 < α < 1 and t > 1, we have 0 < Ψ_t < 1, and as t → 1 we get lim Ψ_t = 1, so we may set Ψ_1 = 1, which lets Ψ_t satisfy the conditions of a smoothing parameter. We thus obtain the new formulas:

S_t^1 = Ψ_t X_t + (1 − Ψ_t) S_{t−1}^1
S_t^2 = Ψ_t S_t^1 + (1 − Ψ_t) S_{t−1}^2
S_t^3 = Ψ_t S_t^2 + (1 − Ψ_t) S_{t−1}^3   (7)

The corresponding prediction coefficients become:

a_t = 3S_t^1 − 3S_t^2 + S_t^3

b_t = Ψ_t / [2(1 − Ψ_t)^2] · [(6 − 5Ψ_t)S_t^1 − 2(5 − 4Ψ_t)S_t^2 + (4 − 3Ψ_t)S_t^3]

c_t = Ψ_t^2 / [2(1 − Ψ_t)^2] · [S_t^1 − 2S_t^2 + S_t^3]   (8)

Thus the new self-adaptive exponential smoothing prediction model (SAES) is constituted by formulas (3), (7), and (8). Since the new model does not need to estimate the initial values X_0 and S_0^1, it can smooth X_t and S_t^1 directly. It therefore resolves the difficulty of determining the initial value and avoids the disadvantage of selecting the initial smoothing value manually.
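The SAES recursion of formulas (3), (7) and (8) can be sketched as follows. The function names are our own; how α itself should be matched to the data distribution is left open in the paper, so here α is a fixed input from which Ψ_t is derived:

```python
def saes_coefficients(xs, alpha):
    """Self-adaptive cubic exponential smoothing, formulas (7)-(8).

    The static parameter is replaced by Psi_t = alpha / (1 - (1-alpha)**t)
    with Psi_1 = 1, so no initial smoothing values need to be guessed.
    Returns the prediction coefficients (a_t, b_t, c_t) at the last step.
    """
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    s1 = s2 = s3 = 0.0
    for t, x in enumerate(xs, start=1):
        psi = 1.0 if t == 1 else alpha / (1.0 - (1.0 - alpha) ** t)
        s1 = psi * x + (1.0 - psi) * s1
        s2 = psi * s1 + (1.0 - psi) * s2
        s3 = psi * s2 + (1.0 - psi) * s3
    a = 3.0 * s1 - 3.0 * s2 + s3
    factor = psi / (2.0 * (1.0 - psi) ** 2)
    b = factor * ((6.0 - 5.0 * psi) * s1
                  - 2.0 * (5.0 - 4.0 * psi) * s2
                  + (4.0 - 3.0 * psi) * s3)
    c = factor * psi * (s1 - 2.0 * s2 + s3)
    return a, b, c

def saes_predict(xs, alpha, horizon):
    """Formula (3): Y_{t+T} = a_t + b_t * T + c_t * T**2."""
    a, b, c = saes_coefficients(xs, alpha)
    return a + b * horizon + c * horizon ** 2
```

As a sanity check, a constant series yields b_t = c_t = 0 and a prediction equal to the constant, as expected from the formulas.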

3.2 Architecture
The ultimate goal of this paper is to establish an appropriate prediction model adapted to practical user queries, for example, "query the number of vehicles in a road segment within the next ten minutes". This paper does not further study aggregate query techniques for spatio-temporal data streams; it simply uses the aggregate query method of DynSketch [4], which also allows us to compare the performance of the ES and SAES models in the experiments. On the basis of the aggregate results, we establish the prediction system model shown in Fig. 1.
First, traffic flow data are stored in the 'Aggregate Index Architecture' in aggregated form. The user issues a query according to his or her needs, the SAES module retrieves the appropriate results from the 'Aggregate Index Architecture' based on the query condition, the SAES algorithm then computes the prediction results, and the final results are returned to the user.
A 'prediction query' is defined as follows: the prediction query time bucket is t = [T_1, T_n] (T_0 is the current time, and T_1 > T_0), and the prediction query region is q = ([X_0, X_1], [Y_0, Y_1]), where [X_0, X_1] and [Y_0, Y_1] are coordinate ranges in the two-dimensional space. The function of the prediction query is to predict the approximate number of moving objects within the region q during the time bucket t. The prediction process is shown in Fig. 2.

Fig. 1 Prediction system model

Fig. 2 Prediction process
In Fig. 2, we set the interval between the prediction time T_1 and the current time T_0 to T, the same concept as in formula (3). Because prediction models for time series generally use discretely collected historical data, this paper also uses discrete historical data for prediction. If the prediction query time bucket is [T_1, T_n], the results for each discrete time T_1, T_2, · · ·, T_n in [T_1, T_n] are added together to obtain the prediction query answer. We denote the i-th moment in the past by T_{0−i}. There are two situations to discuss:
• When T_1 = T_n, the bucket represents a single time, and we only make the prediction calculation at that single time on the road network.
• When T_1 < T_n, the bucket contains many discrete times in [T_1, T_n], and we calculate the sum of the predicted values for each time.
The algorithm, called Predict_Agg, is shown below.

Algorithm Predict_Agg(T_0, T_1, T_n)
/* T_0 is the current time; T_1 (T_n) is the prediction start (end) time. */
1. get the aggregate value of the current time by aggregate query → S_0
2. if T_1 = T_n    // only one time
3.   get the value of Ψ_1 according to Ψ_t = α(t)/[1 − (1 − α(t))^t]
4.   according to formula (8), Ψ_1 and S_0, get the values of (a_1, b_1, c_1)
5.   according to formula (3), T_1, T_0 and (a_1, b_1, c_1), calculate Y_1
6.   return Y_1
7. else T_1 < T_n    // many times
8.   for 1→n, get the values of Ψ_1, · · ·, Ψ_n with Ψ_t = α(t)/[1 − (1 − α(t))^t]
9.   for 1→n, get the series of values (a_1, b_1, c_1)→(a_n, b_n, c_n) with formula (8), the Ψ values and S_0
10.  for 1→n, get the values of Y_1, · · ·, Y_n with formula (3)
11.  Y = Y_1 + · · · + Y_n
12.  return Y
End Algorithm Predict_Agg
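The branch structure of Predict_Agg can be sketched generically. Here `predict(history, horizon)` stands in for the single-horizon SAES prediction of formula (3), and all names and the calling convention are our own illustration:

```python
def predict_agg(history, predict, t1, tn):
    """Sum the per-time predictions over the query bucket [T1, Tn].

    `history` holds the aggregate values returned by the underlying
    aggregate query (DynSketch in the paper); t1 and tn are the start and
    end horizons measured in time steps ahead of the current time.
    """
    if t1 == tn:  # a single prediction time
        return predict(history, t1)
    # many discrete times: add the predicted value of each one
    return sum(predict(history, t) for t in range(t1, tn + 1))
```

For instance, plugging in a naive last-value predictor over a bucket of three times simply triples the last aggregate value.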

4 Experiments
The experiments used the road network of Ningbo, China, with about 1451 road segments per km². There were about 30760 vehicles, divided into four kinds (car, bus, truck and auto-bike) with different speeds and moving patterns. The vehicles were uniformly distributed on the road network at the start time.
Spatio-temporal prediction aggregate queries on road networks predict results from the aggregates of past and current times. In this section, we run experiments on three factors: 1) the historical information length; 2) the smoothing parameter α; 3) the length of the prediction query time (T; this T is not the same concept as the T of formula (3)). Finally, we compare SAES with ES.
With α set to 0.6, we analyze the effect of varying the historical information length and obtain Fig. 3.

Fig. 3 Error of varying historical information length

In Fig. 3, once the historical information length exceeds 22, the relative error stays at its lowest. This is because SAES depends heavily on the past time series and adjusts the smoothing factor based on historical values: the longer the historical information length, the smaller the relative error. Once the historical information length is long enough, it has little further influence on generating future data. Moreover, the longer T is, the bigger the relative error.
Following this experiment, we set the historical information length to 22 and vary the value of α to find the pattern. The result is shown in Fig. 4.

Fig. 4 Error of varying the value of α

In the SAES prediction model, the value of α is selected based on the characteristics of the data distribution, and Ψ_t then dynamically adjusts its value depending on the value of α. Here we analyze the error when the smoothing parameter α is set in (0, 1), with the historical information length set to 22. It can be seen that the errors differ between cycles: the longer T is, the bigger the relative error. But for the same value of T, the prediction error stays within a certain range, because the SAES model can dynamically adjust the value of Ψ_t and thus adapt.
Finally, we compare our method with the ES model in DynSketch [4]. In the DynSketch method, the parameters that give the ES model its best predictions are a historical information length of 4 and a smoothing parameter α of 0.9. To fully reflect the superiority of the SAES model, we run the experiment with those two parameter values and obtain the results shown in Fig. 5.

Fig. 5 Error of comparing SAES with ES

In Fig. 5, the x-axis is the bucket number, an aggregate parameter of the aggregation process in DynSketch [4]. Clearly, the relative error of SAES is smaller than that of ES, at about 15%; if we instead set the historical information length to 4 and the smoothing parameter α to 0.65, as obtained from Fig. 4, the relative error of SAES becomes much smaller still.
In summary, we first analyzed three factors of SAES and obtained appropriate values, and then compared SAES with ES, finding that SAES performs better. Our method is a good traffic prediction approach based on the aggregated information of spatio-temporal road-network data streams.

5 Conclusion
Nowadays, traffic congestion is becoming a more and more serious problem. We urgently need a prediction approach with small storage and time consumption that can adjust the traffic management control program, publish travel information to travelers, and provide optimal path options to drivers. In this paper, we developed the SAES model to predict traffic flow based on aggregated spatio-temporal information. Experiments show that, compared with the ES model, our model has superior performance. In future work, we will address long-term traffic flow prediction.

References
1. Bao, L., Qin, X.: Research progress in spatio-temporal aggregation computation. Com-
puter Science (2006)
2. Jin, C., Guo, W., Zhao, F.: Getting Qualified Answers for Aggregate Queries in Spatio-
temporal Databases. In: Dong, G., Lin, X., Wang, W., Yang, Y., Yu, J.X. (eds.) AP-
Web/WAIM 2007. LNCS, vol. 4505, pp. 220–227. Springer, Heidelberg (2007)
3. Papadias, D., Tao, Y., Kalnis, P., Zhang, J.: Indexing spatio-temporal data warehouses. In: Proc. of the Intl. Conf. on Data Engineering, San Jose, CA, pp. 166–175 (2002)
4. Feng, J., Lu, C.: Research on novel method for forecasting aggregate queries over data
streams in road networks. Journal of Frontiers of Computer Science and Technology 11
(2010)
5. Guo, L., Li, J., Wang, W., Zhang, D.: Predictive continuous aggregate queries over data
streams. Journal of Computer Research and Development 41(10) (October 2004)
6. Li, J., Guo, L.: Processing algorithms for predictive aggregate queries over data streams. Journal of Software 16(7) (2005)
7. Sun, L.: Research on aggregate query based on continuous data streams. Master thesis
(2006)
8. Yu, Y., Wang, G., Chen, C., Fu, C.: A chaos-based predictive algorithm for continuous aggregate queries over data streams. Journal of Northeastern University (Nature Science) 28(8) (August 2007)
9. Brown, R.G., Meyer, R.F.: The fundamental theorem of exponential smoothing. JSTOR, April 7 (1960)
10. Yan, L., Ma, F.: Application of cubic exponential smoothing method to city underground
deformation prediction. Technology & Economy in Areas of Communications 43(5)
(2007)
11. Li, Y., Jia, F.: Application of dynamic cubic exponential smoothing method to the appli-
cation of predicting GDP of Liaoning province. Applied Science (2009)
A Way for Color Image Enhancement
under Complex Luminance Conditions

Margarita Favorskaya and Andrey Pakhirka

Abstract. In this paper, we present a novel method of spectrum enhancement for color and gray-scale images that were acquired under complex luminance and contain dark and/or bright areas. The classical retinex algorithm normalizes dark areas but yields a result image with large contrast values, especially for gray-scale images, which does not satisfy a perceptive observer. Our Enhanced Multi-Scale Retinex (EMSR) algorithm is based on an adaptive equalization of spectral ranges, applied simultaneously to the dark and the bright areas of an image. We built a special function that stretches the spectral ranges with low and high intensity values but compresses the ranges with middle values. We also designed a method of image improvement, applied after the EMSR algorithm and based on empirical dependences, which shows better visual results than existing filters.

1 Introduction

Methods of image enhancement based on complex filtering are used in many computer vision applications, particularly to compensate distortions that appear in non-calibrated optical devices (so-called γ-correction), to enhance the perception of color images, and to equalize the spectral ranges of an image locally and globally (an adaptive equalization of dark and/or bright areas). Such techniques can be classified as intensity transformations, histogram approaches, homomorphic filtering, and retinex algorithms.
Intensity transformations use a wide set of specific functions such as linear, logarithmic, or power functions (including γ-correction) [1]. A histogram approach modifies local histograms in dark and bright areas according to a desired shape [2]; it can be considered a generalization of intensity transformations with a stochastic view of images, and the term "histogram equalization" belongs to this approach. If a color image is represented as the product of the absorption and the background reflectance of light rays, the resulting formal image model is associated with homomorphic filtering [3]. The retinex is the most advanced method, simulating the adaptation of human vision under complex luminance conditions [4]. In spite of its high computational cost, it demonstrates

Margarita Favorskaya · Andrey Pakhirka


Siberian State Aerospace University, 31 Krasnoyarsky Rabochy, Krasnoyarsk, 660014
Russia
e-mail: favorskaya@sibsau.ru, pakhirka@sibsau.ru

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 63–72.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

the best processing results for gray-scale images (the Single-Scale Retinex (SSR) algorithm) but has difficulties with color images. The latter fact is explained by the overlapping of the color R-, G-, B-functions after the retinex functions are calculated separately for the R-, G-, B-components of the input image. This realization is called the Multi-Scale Retinex (MSR) algorithm.
An image with lower sharpness is not good for human visual perception. The traditional remedy is to design a filter with specific defined characteristics. We do not claim an exclusive approach in this area, but we tried to solve a complicated task: to design a digital filter that automatically increases image sharpness after processing by the EMSR algorithm.

2 Related Work
Observed images of a real scene depend strongly on the environmental luminance conditions. The human visual system can recognize objects in shadow thanks to its dynamical properties, but machine vision, having a restricted non-adaptive spectral range, fails in such cases. Human vision automatically compensates luminance deviations through the psychological mechanism of color constancy. Machine vision therefore needs intelligent methods and algorithms that model, and surpass, human vision, especially under complex luminance conditions. E. Land was the first researcher to propose the term "retinex" (formed from "retina" and "cortex"), suggesting that both eye and brain activities participate in the processing of visual information [5]. A general mathematical function based on three scales of gray-level variations of the input image was suggested in [6]. Some authors applied the learning mechanism of neural networks to evaluate relative brightness in arbitrary environments. Later, the SSR algorithm was proposed for dynamic-range compression. The 1D retinex function R_i(x, y, σ) according to the SSR model calculates differences of logarithmic functions:
R_i(x, y, σ) = log{I_i(x, y)} − log{F(x, y, c) ∗ I_i(x, y)},   (1)

where I_i(x, y) is the input image function in the i-th spectral channel along the OX and OY axes, c is a scale coefficient, and the sign ∗ denotes the convolution of the input image function I_i(x, y) with the surround function F(x, y, c).
Many authors proposed various surround functions, for example an inverse-square spatial surround function or an exponential function [7, 8]. The most used function is a Gaussian F(x, y, σ) defined as

F(x, y, σ) = K e^{−(x² + y²)/σ²}.   (2)

The coefficient K in Eq. (2) is chosen so that the following condition holds:

∫∫_{Ω_{x,y}} F(x, y, σ) dx dy = 1,   (3)

where Ω_{x,y} is the set of pixels of the full image.
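Eqs. (1)-(3) translate directly into code. The pure-Python sketch below is our illustration: the window size, the clamped border handling, and the small eps guarding log(0) are our choices, not specified in the paper:

```python
import math

def gaussian_surround(size, sigma):
    """Normalized Gaussian surround F(x, y, sigma), Eqs. (2)-(3):
    K is chosen so that the kernel sums to 1 over the window."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def single_scale_retinex(image, sigma, size=7, eps=1e-6):
    """SSR, Eq. (1): R = log(I) - log(F * I) for one channel, given as a
    2-D list. Borders are clamped; eps guards log(0) (our addition)."""
    h, w = len(image), len(image[0])
    kernel = gaussian_surround(size, sigma)
    half = size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            conv = 0.0
            for ky, row in enumerate(kernel):
                for kx, kval in enumerate(row):
                    yy = min(max(y + ky - half, 0), h - 1)
                    xx = min(max(x + kx - half, 0), w - 1)
                    conv += kval * image[yy][xx]
            out[y][x] = math.log(image[y][x] + eps) - math.log(conv + eps)
    return out
```

On a uniform image the surround average equals the pixel value, so the SSR output is zero everywhere, which is a quick way to sanity-check an implementation.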



The MSR function RM_i(x, y, w, σ) in the i-th spectral channel is calculated as [9]

RM_i(x, y, w, σ) = Σ_{n=1}^{N} w_n R_i(x, y, σ_n),   (4)

where w = (w_1, w_2, …, w_N) is a weight vector of the 1D retinex functions R_i(x, y, σ) in the i-th spectral channel, and σ = (σ_1, σ_2, …, σ_N) is a scale vector of the 1D output retinex function. The components of the weight vector w in Eq. (4) satisfy

Σ_{n=1}^{N} w_n = 1.

The dimension of the scale vector σ is chosen to be at least 3. Different advisable values can be found in various references; in our experiments, we used σ = (15, 90, 180). The weight vector w has components with equal values.
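Eq. (4) is simply a per-pixel weighted sum of the single-scale outputs; a minimal sketch (our naming), with equal weights as the default:

```python
def multi_scale_retinex(ssr_outputs, weights=None):
    """MSR, Eq. (4): weighted sum of N single-scale retinex outputs,
    e.g. computed at sigma = 15, 90, 180 as in the experiments.
    Weights default to equal values summing to 1."""
    n = len(ssr_outputs)
    if weights is None:
        weights = [1.0 / n] * n
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    h, w = len(ssr_outputs[0]), len(ssr_outputs[0][0])
    return [[sum(wn * ssr_outputs[k][y][x] for k, wn in enumerate(weights))
             for x in range(w)]
            for y in range(h)]
```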
The MSR algorithm distorts image chromaticity because the value of each color component of a pixel is replaced by the ratio of its input value to the mean value of the surrounding pixels of the same color component. Several remedies for this problem exist. Certain enhancement is obtained by transition to other color spaces with explicit separation of the brightness and hue components (HSI-, HSV-, HSL-spaces). The best effect is achieved by using a model of normalized handling of brightness and hue components suggested in [4]:
RM′_i(x, y, w, σ, b) = RM_i(x, y, w, σ) ∗ I′_i(x, y, b),   (5)

where I′_i(x, y, b) is a normalized brightness determined as

I′_i(x, y, b) = log(1 + b · I_i(x, y) / Σ_{i=1}^{3} I_i(x, y)),   (6)

where the coefficient b is chosen from the middle of the value range [0…255], b = 100–125.
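Per pixel, Eq. (6) reduces to a one-liner per channel; this helper is our illustration (the guard for an all-zero pixel is our addition, not in the paper):

```python
import math

def normalized_brightness(r, g, b, coeff=112.0):
    """Eq. (6): I'_i = log(1 + b * I_i / (I_R + I_G + I_B)) per channel.
    `coeff` plays the role of b, chosen from the middle of [0..255]
    (the text suggests 100-125)."""
    total = r + g + b
    if total == 0:
        return (0.0, 0.0, 0.0)  # black pixel guard (our addition)
    return tuple(math.log(1.0 + coeff * v / total) for v in (r, g, b))
```

For a gray pixel (r = g = b) every channel gets the same value, log(1 + b/3), so the normalization preserves neutrality.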
The remainder of the paper is organized as follows. In Section 3 we introduce the special function that dynamically changes the spectral ranges of the MSR algorithm in dark and bright areas. Section 4 explains in detail the way the image is improved after applying the EMSR algorithm. Experimental results are included in Section 5, and in Section 6 we summarize the findings of this paper.

3 Enhanced MSR Algorithm


It is well known that visual objects are more recognizable in shadows than in bright areas. Let us introduce a complex function for adaptive range equalization depending on a threshold value Th in the range [1…DR] of the i-th color component, where DR is the upper range value, DR_max = 255. According to the MSR algorithm, in the dark area (the interval [1…Th]) this function is a logarithmic function of the input image function; that is why the MSR algorithm cannot provide an acceptable equalization in the bright area. In the high range (the interval [Th…DR]), we propose a logarithmic dependence on the inverted input image function. The result function R(I(x, y)) with weighting coefficients k_1 and k_2 for each area is then calculated as

R(I(x, y)) = k_1 · log(I(x, y))   if I(x, y) < Th,
R(I(x, y)) = −k_2 · log(DR − I(x, y)) + log(DR)   if I(x, y) ≥ Th,   (7)

where

k_1 = (Th/DR) · log(DR) / log(Th),   k_2 = (1 − Th/DR) · log(DR) / log(DR − Th),   (8)

DR is the dynamic image range (DR = 255 for an image with 8 bits per color channel), and the threshold Th equals 200 (according to empirical analysis). The graph of the function R(I(x, y)) is shown in Fig. 1.

Fig. 1 A view of function R(I(x, y)) with threshold Th=200
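Eqs. (7)-(8) translate directly into code; the sketch below uses our naming, and the natural logarithm (the paper does not fix the base; continuity of the two branches at Th holds for any base). Since R grows without bound as I approaches DR, inputs are assumed to lie in [1, DR):

```python
import math

def emsr_transfer(i, dr=255.0, th=200.0):
    """Adaptive range-equalization function R(I), Eqs. (7)-(8): a
    logarithmic stretch below Th and a mirrored logarithmic stretch of
    the inverted intensity above it; k1 and k2 are chosen so that the
    two branches meet at I = Th."""
    k1 = (th / dr) * math.log(dr) / math.log(th)
    k2 = (1.0 - th / dr) * math.log(dr) / math.log(dr - th)
    if i < th:
        return k1 * math.log(i)
    return -k2 * math.log(dr - i) + math.log(dr)
```

Both branches are increasing, and the slope is largest near the ends of the range, which realizes the intended stretch of the dark and bright areas relative to the middle values.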

However, the multi-dimensional functions RM_i(x, y, w, σ) of Eq. (4) have overlapping spectral ranges in the MSR algorithm, which leads to distortion of the color components in the output image. To compensate for this effect, we may decompose the functions RM_i(x, y, w, σ) into range series.

The joined logarithmic branches in the result function R(I(x, y)) permit an increase in object recognition in bright areas. Our experiments have shown better recognition in dark areas, explained by the presence of edges and boundaries that are poor for human vision but still present. In bright areas, object edges and boundaries practically disappear, and there is no information left for restoration. We may also say that the complex logarithmic function shown in Fig. 1 increases object contrast in the typical brightness range [60…200] considerably less than in the dark and bright areas.
Another problem with color images processed by the MSR or EMSR algorithms is an overflow of contrast objects with high reflection coefficients. We propose a reconstruction model of luminance normalization based on γ-correction:

I_γ(x, y) = [Wh · (I_R(x, y) / Wh)^{1/γ}],   (9)

where Wh is the value of white color (Wh = 255 for 8-bit images), [·] is the integer part of a number, I_R(x, y) is the image processed by one of the retinex-like algorithms, and I_γ(x, y) is the image reconstructed by γ-correction.
An extension of the γ-correction method is the design of a linear filter that removes high-frequency components and moderates low-frequency components by using γ-correction. An interesting solution is the application of the following steps:

1. Apply a regressive γ-correction to the input image.
2. Equalize the image by the MSR or EMSR algorithm.
3. Apply the compensating γ-correction to the image enhanced by the MSR or EMSR algorithm.
To estimate the quality of enhanced images, let us calculate two estimations: a peak signal-to-noise ratio M_PSNR (PSNR – Peak Signal-to-Noise Ratio) and a contrast-to-noise ratio M_CNR (CNR – Contrast-to-Noise Ratio). We modified these estimations for the SSR, MSR, and EMSR algorithms as follows:

  M_PSNR = 20 log( I^P / √( (1/|Ω_{x,y}|) Σ_{Ω_{x,y}} Δ² ) ),   (10)

with

  Δ = I_R(x, y) − I(x, y)         if 100 < I(x, y) < 200,
  Δ = ½ (I_R(x, y) − I(x, y))     in other cases,

  M_CNR = ( E[I_R(x, y)] − E[I(x, y)] ) / ( ( μ(I_R(x, y)) + μ(I(x, y)) ) / 2 ),   (11)

where I^P is the peak value of brightness; |Ω_{x,y}| is the surrounding image area; E[⋅] is the mean value over Ω_{x,y}; and μ(⋅) is the variance over Ω_{x,y}.
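A sketch of both estimations computed over a single region Ω (a simplification; the paper evaluates over local neighborhoods Ω_{x,y}, and the ½ weighting outside the mid-brightness band follows our reading of Eq. (10)):

```python
import numpy as np

def m_psnr(I, I_R, peak=255.0):
    """Modified PSNR, Eq. (10): errors outside the mid-brightness
    band 100 < I < 200 are down-weighted by 1/2 (our reading)."""
    I, I_R = np.asarray(I, float), np.asarray(I_R, float)
    delta = I_R - I
    delta = np.where((I > 100) & (I < 200), delta, 0.5 * delta)
    rmse = np.sqrt(np.mean(delta ** 2))
    return 20.0 * np.log10(peak / rmse)

def m_cnr(I, I_R):
    """Modified CNR, Eq. (11): mean difference over averaged variances
    (absolute value taken so the score is sign-free)."""
    I, I_R = np.asarray(I, float), np.asarray(I_R, float)
    return abs(I_R.mean() - I.mean()) / ((I_R.var() + I.var()) / 2.0)
```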
Thereby, the EMSR algorithm has the following steps.
Initial data: an input image received under complex luminance conditions.
68 M. Favorskaya and A. Pakhirka

Step 1. Transform the input image from RGB space to HSL space.
Step 2. Determine a threshold value Th in the range [1…DR] of the i-th color component using the curve from Fig. 1.
Step 3. Calculate the result function R(I(x, y)), Eq. (7).
Step 4. Decompose the functions RMi(x, y, w, σ) into range series.
Step 5. If necessary, apply the reconstruction algorithm based on γ-correction, Eq. (9).
Step 6. Estimate the quality of the received images using Eqs. (10)-(11).
Output data: an improved output image.
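Step 3 depends on the paper's result function R(I(x, y)) (Eq. (7), given earlier); for orientation, the classic single-channel multi-scale retinex that EMSR extends can be sketched as follows, with the scales used in Table 1 as defaults (names are ours):

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur via two 1-D convolutions."""
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def msr(img, sigmas=(15, 90, 180), weights=(1/3, 1/3, 1/3), eps=1.0):
    """Classic MSR: weighted sum of log(I) - log(G_sigma * I) over scales."""
    img = np.asarray(img, dtype=float) + eps  # avoid log(0)
    return sum(w * (np.log(img) - np.log(blur(img, s) + eps))
               for w, s in zip(weights, sigmas))
```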

4 Image Improvement
All retinex-like algorithms give a result image with lower sharpness, which does not satisfy a perceptive observer. This fact is not essential for subsequent computer processing, but if the task is only to enhance the visual properties of an image, then a further image improvement is needed. Image sharpening improves details in an image that have become blurred or are not clear enough for human vision. Some popular filters solve this problem, for example the high-frequency "High pass" filter, the filter based on the second derivative (Laplacian), and the "Unsharp masking" filter. All of these filters increase sharpness by contrast amplification of tonal transitions. The main disadvantage of the High pass and Laplacian filters is that they sharpen not only image details but also noise. The Unsharp masking filter blurs a copy of the original image with a Gaussian function and subtracts the blurred image from the original input image where their difference exceeds some threshold value.
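The classic unsharp-masking scheme just described can be sketched as follows (a simple box blur stands in for the Gaussian; names and parameter values are ours):

```python
import numpy as np

def box_blur(img, r=1):
    """Simple (2r+1)x(2r+1) mean blur standing in for the Gaussian."""
    pad = np.pad(img, r, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            out += pad[r + di: r + di + img.shape[0],
                       r + dj: r + dj + img.shape[1]]
    return out / (2 * r + 1) ** 2

def unsharp_mask(img, amount=1.0, threshold=5.0):
    """Classic unsharp masking: add back the high-frequency residual
    img - blur(img) only where it exceeds the threshold."""
    img = np.asarray(img, dtype=float)
    residual = img - box_blur(img)
    mask = np.abs(residual) > threshold
    return np.clip(img + amount * residual * mask, 0, 255)
```

On a vertical step edge this overshoots the bright side and clips the dark side, which is exactly the contrast amplification of tonal transitions described above.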
Our Enhanced Unsharp Masking (EUM) filter improves the output image by jointly compositing a contour performance and an equalization performance based on empirical dependences. Let us calculate the function of contour performance ICP(x, y) as follows:

  I_CP(x, y) = R(F_NK(x, y)) + R(F_PK(x, y)),   (12)

where R(⋅) is a function of range equalization, and F_NK(⋅) and F_PK(⋅) are response functions with negative and positive kernels:

  F_NK(x, y) = Σ_{i=−r..r} Σ_{j=−r..r} I(x+i, y−j) − (2r+1)² (1 + k_c/(2r+1)) I(x, y),
  F_PK(x, y) = Σ_{i=−r..r} Σ_{j=−r..r} I(x+i, y−j) − (2r+1)² (1 − k_c/(2r+1)) I(x, y),   (13)

where r is the distance from the central processed pixel to the boundary of the sliding window, and k_c is a coefficient of boundary suppression, k_c = 0.2…0.7.

The function of equalization performance I_EP(x, y) has the form:

  I_EP(x, y) = I(x, y) + k_D k_SS k_SC F_δ(x, y),   if F_δ(x, y) ≤ −T_S,
  I_EP(x, y) = I(x, y) + k_L k_SS k_SC F_δ(x, y),   if F_δ(x, y) ≥ T_S,   (14)

where k_D is a blanking coefficient, k_D = 0.7…0.9; k_L is a lighting coefficient, k_L = 1.1…1.3; k_SS is a shift suppression coefficient, k_SS = 1 − F_δ/255; k_SC is a shift correction coefficient, k_SC = 1/log F_δ; F_δ(x, y) is the difference between the original image and the image smoothed by a Gaussian filter; and T_S is a threshold value. The values of the coefficients k_D and k_L are selected for contour sharpness and background smoothness.
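Eq. (14) with the empirical coefficients above can be sketched per pixel; since the paper leaves the sign handling of F_δ inside k_SS and k_SC implicit, we use |F_δ| there (an assumption on our part):

```python
import numpy as np

def equalization_performance(I, F_delta, Ts=10.0, kD=0.8, kL=1.2):
    """Eq. (14): darken where the detail residual F_delta is strongly
    negative, lighten where it is strongly positive, leave the rest."""
    I = np.asarray(I, dtype=float)
    F = np.asarray(F_delta, dtype=float)
    aF = np.abs(F)
    kSS = 1.0 - aF / 255.0                                # shift suppression
    with np.errstate(divide="ignore"):
        kSC = np.where(aF > 1.0, 1.0 / np.log(aF), 1.0)   # shift correction
    out = I.copy()
    neg, pos = F <= -Ts, F >= Ts
    out[neg] += kD * (kSS * kSC * F)[neg]
    out[pos] += kL * (kSS * kSC * F)[pos]
    return out
```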

5 Experimental Results

The experimental software "Non-Linear Adaptive Image Enhancement" (NLAIE) processes separate images and includes eight main modules: a module for image transformation from RGB space to HSL space, a module for the SSR algorithm, a module for the MSR algorithm, a module for the EMSR algorithm, a module for histogram equalization, a module for γ-correction, a module for image improvement, and an estimation module. With the software, implemented in C++, we processed nearly 200 images, color and grey-scale, and calculated the estimations of the enhanced images. An example of the input and processed images "Tree" is shown in Fig. 2. One can see the details of the image in shadow (Fig. 2 b). Such processing makes it possible to find contours and geometrical and statistical characteristics for subsequent image segmentation.

Fig. 2 Example of EMSR algorithm application: a) input image with shadow, b) enhanced image with details in shadow

Table 1 presents the comparative results for the image "Tree". We applied the SSR, MSR, and EMSR algorithms (the image was converted from RGB space to HSL space), histogram equalization, and local histogram equalization, and calculated the estimations M_PSNR and M_CNR by Eqs. (10)-(11). As one can see, the algorithms based on logarithmic equalization of spectrum ranges demonstrate better results (their peak signal-to-noise and contrast-to-noise relations have larger values).

Table 1 The comparative results for image “Tree”

Algorithm                                   M_PSNR   M_CNR
SSR algorithm (σ=15)                         7.52     3.78
SSR algorithm (σ=90)                        16.23     0.98
SSR algorithm (σ=180)                       20.14     0.54
MSR algorithm (w1=1/3, w2=1/3, w3=1/3)      12.58     1.58
MSR algorithm (w1=1/2, w2=1/4, w3=1/4)      10.98     2.08
MSR algorithm (w1=1/4, w2=1/2, w3=1/4)      14.42     1.46
MSR algorithm (w1=1/4, w2=1/4, w3=1/2)      15.88     1.21
MSR algorithm (w1=1/5, w2=3/5, w3=1/5)      14.86     1.33
EMSR algorithm                              18.49     2.42
Histogram equalization                       6.63     1.82
Local histogram equalization                 6.51     1.86

The results of image sharpening are presented in Fig. 3. We tested two filters: the Laplacian filter and our EUM filter.

Fig. 3 Image enhancement: a) input image, b) EMSR processing, c) Laplacian filter applied to image b, d) EUM filter applied to image b

Fig. 4 shows some fragments from the images in Fig. 3 b, c, d at 100% scale. It is evident that the fragments in Fig. 4 c have sharper edges (than in Fig. 4 a) and smoother homogeneous regions (than in Fig. 4 b).

Fig. 4 Image fragments from Fig. 3 b, c, d at 100% scale: a) EMSR processing, b) Laplacian filter applied to image a, c) EUM filter applied to image a

6 Conclusion

We investigated the spectrum enhancement of color and gray-scale images. Our novel Enhancement MSR (EMSR) algorithm is based on a complex logarithmic function which normalizes, as far as possible, dark and/or bright areas in images. We also developed a γ-correction model to compensate the luminance of image objects having a high reflection coefficient. We introduced the estimations M_PSNR and M_CNR of the quality of enhanced images. The EUM filter, based on empirical dependences, improves the contour representation of images, especially grey-scale images.
The EMSR algorithm lies at the base of the experimental software NLAIE. It includes not only modules functioning according to the developed methods and algorithms but also modules realizing the SSR algorithm, the MSR algorithm, and histogram equalization. The desired effect is achieved by stretching the spectral ranges with low and high intensity values and reducing the spectral ranges with middle values of the image received from a digital grabber device. Future work is connected with extending the additional technologies and with the reconstruction of video sequences by the EMSR algorithm.

Animated Pronunciation Generated
from Speech for Pronunciation Training

Yurie Iribe, Silasak Manosavan, Kouichi Katsurada, and Tsuneo Nitta

Abstract. Computer-assisted pronunciation training (CAPT) was introduced for


language education in recent years. CAPT scores the learner’s pronunciation qual-
ity and points out wrong phonemes by using speech recognition technology. How-
ever, although the learner can thus realize that his/her speech is different from the
teacher’s, the learner still cannot control the articulation organs to pronounce cor-
rectly. The learner cannot understand how to correct the wrong articulatory ges-
tures precisely. We indicate these differences by visualizing a learner’s wrong
pronunciation movements and the correct pronunciation movements with CG
animation. We propose a system for generating animated pronunciation by
estimating a learner’s pronunciation movements from his/her speech automatical-
ly. The proposed system maps speech to coordinate values that are needed to gen-
erate the animations by using multi-layer neural networks (MLN). We use MRI
data to generate smooth animated pronunciations. Additionally, we verify whether
the vocal tract area and articulatory features are suitable as characteristics of
pronunciation movement through experimental evaluation.

Keywords: Vocal Tract Area, Articulatory Feature, Animated Pronunciation,


Pronunciation Training.

1 Introduction
Computer-assisted pronunciation training (CAPT) was introduced for language
education in recent years [1][2]. CAPT typically scores pronunciation quality and
points out a learner’s wrong phonemes by using speech recognition technology
[3][4][5]. Moreover, it often indicates the differences between incorrect and correct
pronunciation by showing the learner’s speech wave and the correct speech wave.

Yurie Iribe
Information and Media Center, Toyohashi University of Technology, Japan
e-mail: iribe@imc.tut.ac.jp

Silasak Manosavan · Kouichi Katsurada · Tsuneo Nitta


Graduate School of Engineering, Toyohashi University of Technology, Japan
e-mail: {manosavan,katsurada,nitta}@cs.vox.tut.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 73–82.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
74 Y. Iribe et al.

Although the learner can thus realize that his/her speech is different from the teacher's, the learner cannot understand how to make the correct pronunciation movement. In particular, as for the speech wave, only a phonetician can see the reasons for the differences. The system should show how the wrong articulatory organs move and how to correct this movement when the learner makes a wrong pronunciation. Although other studies have examined producing correct pronunciation animations and video in advance [6][7][8], those studies do not automatically produce animations of the learner's wrong pronunciation from speech. We indicate these differences by visualizing the learner's wrong pronunciation movements (movement of the tongue, palate, and lips) and the correct pronunciation movements by using CG animation (Figure 1). As a result, the learner can study how to move the articulatory organs while visually comparing his/her mispronunciation animation with the correct pronunciation animation. We propose generating animated pronunciations by automatically estimating the pronunciation movement from speech. Concretely, the proposed system maps speech to the coordinate values that are needed to generate the animations by using multi-layer neural networks (MLN). We use MRI data to represent smooth human articulatory movements accurately; the MRI data serve as MLN training data. Additionally, we compare whether the vocal tract area and the articulatory features are appropriate as characteristics of pronunciation movement through experimental evaluation. In this paper, the method for automatically generating animated pronunciations from speech is described.

Fig. 1 Animations of learner's and correct pronunciation

Fig. 2 System outline
In section 2, we describe the method for articulatory feature extraction, the
vocal tract area calculators, coordinate vector extraction, and CG animation
generation. In section 3, an experimental evaluation to confirm the accuracy of the
generated animated pronunciation is discussed. In the last section, the paper is
summarized.

2 Animated Pronunciation Generation

2.1 System Outline


Figure 2 shows an outline of the system. The system consists mainly of a characteristics extractor (for the articulatory features or the vocal tract area), coordinate vector extraction by multi-layer neural networks (MLN), and CG animation generation based on the coordinate vectors. Coordinate vectors are acquired by transforming the articulatory features or the vocal tract area extracted from speech. Our previous research applied the articulatory features (place of articulation and manner of articulation) to extract articulatory movement from speech [9]. To generate accurate animation, we also use the vocal tract area as a characteristic, since we regarded it as suitable for mapping to the coordinate values of the articulation organs. In particular, we verify whether the vocal tract area and the articulatory features are suitable as characteristics of pronunciation movement through experimental evaluation.

Fig. 3 Articulatory feature sequence: /jiNkoese (artificial satellite)/

The CG animation is generated on the basis of coordinate values extracted from a trained MLN. As a result, the user's speech is input into our system, and a CG animation that visualizes the pronunciation movement is automatically generated. Moreover, this paper describes animation generation for English, which uses more phonemes than Japanese.

2.2 Articulatory Feature Extraction


Fig. 4 Articulatory feature extraction

In order to vocalize, human beings change the shape of the vocal tract and move articulatory organs such as the lips, alveolar arch, palate, tongue, and pharynx. This is called articulatory movement or pronunciation movement. Each attribute of the place of articulation (back tongue, front tongue, palate, etc.) and manner of articulation (fricative, plosive, nasal, etc.) in the articulatory movement is called an articulatory feature. In short, articulatory features are information (for instance, closing the lips to pronounce "m") about the movement of the articulatory organ

that contributes to the articulatory movement. In this paper, articulatory features are expressed by assigning +/− as the feature of each articulation in a phoneme. For example, the articulatory feature sequence of "/jiNkoese/ (artificial satellite)" in Japanese is shown in Figure 3. Because phoneme N is a voiced sound, "voiced" in Figure 3 is given [+] (actually, [+] is given the value "1"; right side of Figure 3) as the teacher signal. Because phoneme k is a voiceless sound, "voiced" in Figure 3 is given [−]; actually, [−] is given the value "0" (right side of Figure 3) as the teacher signal, and "unvoiced" in Figure 3 is given [+]. We generated an articulatory feature table of 15 dimensions corresponding to 25 Japanese phonemes. We defined the articulatory features based on the distinctive phonetic features (DPF) of Japanese phonemes in the International Phonetic Alphabet (IPA) [10].
We also used our previously developed articulatory feature (AF) extraction technology [10]; the extraction accuracy is about 95%. Figure 4 shows the AF extractor. Input speech is sampled at 16 kHz, and a 512-point FFT of a 25-ms Hamming-windowed speech segment is applied every 10 ms. The resultant FFT power spectrum is then integrated into the outputs of 24 band-pass filters (BPFs) with mel-scaled center frequencies. At the acoustic feature extraction stage, the BPF outputs are first converted to local features (LFs) by applying three-point linear regression (LR) along the time and frequency axes. LFs represent variation in a spectrum pattern along the two axes. After compressing these two 24-dimensional LFs into 12-dimensional LFs using a discrete cosine transform (DCT), a 25-dimensional feature vector (12 Δt, 12 Δf, and ΔP, where P stands for the log power of the raw speech signal) called LF is extracted. Our previous work showed that LF is superior to MFCC as the input to MLNs for the extraction of AFs. The LFs then enter a three-stage AF extractor. The first stage extracts 45-dimensional AF vectors from the LFs of the input speech using two MLNs, where the first MLN maps acoustic features, or LFs, onto discrete AFs and the second MLN reduces misclassification at phoneme boundaries by constraining the AF context. The second stage incorporates inhibition/enhancement (In/En) functionalities to obtain modified AF patterns. The third stage decorrelates three context vectors of AFs.
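For three equally spaced points, the linear-regression slope reduces to a central difference, so the LF computation along either axis can be sketched as follows (array and function names are ours):

```python
import numpy as np

def three_point_lr(x, axis):
    """Three-point linear-regression slope = (x[i+1] - x[i-1]) / 2,
    with edge replication at the boundaries."""
    x = np.asarray(x, dtype=float)
    xp = np.concatenate([x.take([0], axis), x, x.take([-1], axis)], axis)
    fwd = xp.take(range(2, xp.shape[axis]), axis)
    bwd = xp.take(range(0, xp.shape[axis] - 2), axis)
    return (fwd - bwd) / 2.0

bpf = np.random.default_rng(1).random((100, 24))  # 100 frames x 24 channels
lf_t = three_point_lr(bpf, axis=0)   # variation along time (delta-t)
lf_f = three_point_lr(bpf, axis=1)   # variation along frequency (delta-f)
```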

2.3 Vocal Tract Area Extraction


The vocal tract area is determined from the vocal tract area function given by the following formula:

  A_{m−1} / A_m = (1 + k_m) / (1 − k_m),   m = M, …, 1,   (1)

where A_m is the m-th section of the vocal tract area and k_m is a PARCOR coefficient.
PARCOR coefficients are equivalent to the reflection coefficients in a lossless acoustic tube model of the vocal tract. A vocal tract area function expresses the vocal tract area from the glottis to the lips as a function of the distance from the glottis, and it is related to the distance between the palate and the tongue. The vocal tract area is acquired by calculating PARCOR parameters converted from speech signals. The vocal tract area (13 dimensions) is combined with two other frames, three points prior to and following the current frame (VT(t, t−3), VT(t, t+3)), to form the articulatory movement; the MLN input is thus the vocal tract area (13 × 3 dimensions).
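Eq. (1) gives only area ratios, so the profile is recovered up to a scale fixed by one boundary section; a sketch that normalizes the lip-side section A_M to 1 (names are ours):

```python
import numpy as np

def vocal_tract_areas(parcor, A_M=1.0):
    """Area function from PARCOR coefficients via Eq. (1):
    A_{m-1} = A_m * (1 + k_m) / (1 - k_m), starting from the lip area A_M."""
    k = np.asarray(parcor, dtype=float)
    areas = [A_M]
    for km in k[::-1]:                  # m = M, ..., 1
        areas.append(areas[-1] * (1 + km) / (1 - km))
    return np.array(areas[::-1])        # A_0 (glottis side) ... A_M (lips)

areas = vocal_tract_areas([0.1, -0.2, 0.3])  # toy coefficients, M = 3
```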

2.4 Coordinate Vector Extraction


We apply magnetic resonance imaging (MRI) images to obtain the coordinate values of the shape of an articulatory organ. MRI machines capture images within the body by using magnetic fields and electric waves. The MRI data, captured in two dimensions, detail the movements of the person's tongue, larynx, and palate while making an utterance. CG animations are generated on the basis of coordinate vectors. The MLN is trained with the vocal tract area or articulatory features extracted from the speech included in the MRI data as input, and the coordinate vectors of the articulatory organs acquired from the MRI images as output (Figure 5). As a result, after the user's speech is input, the coordinate vectors adjusted to the speech are extracted, and a CG animation is generated. In this section, the extraction of the feature points on the MRI data and the method for calculating the coordinate vectors of each feature point are described.

Fig. 5 Coordinate vector extraction (in the case of vocal tract area)
We assigned initial feature points to the articulatory organs' shapes (tongue, palate, lips, and lower jaw) on the MRI data beforehand. The number of initial feature points was 43. We decreased the number of dimensions of the MLN training data in order to train the MLN effectively even with a small amount of MRI data. Therefore, we selected only eight feature points that vary greatly and are important for the pronunciation teaching method (Figure 6). More feature points should be assigned if a lot of MRI data can be used. These feature points were obtained in the following order.

1. We imported 10-ms speech and image segments from the MRI data.
2. The coordinate value of each feature point was extracted by calculating the optical flow for each frame. The input data for the optical flow program are the coordinate vectors of the initial feature points.
3. Only the y-coordinate distance of each feature point was calculated, to decrease the number of dimensions. The x-coordinate value was the same as the x-coordinate of the initial feature point.
78 Y. Iribe et al.

Fig. 6 Feature points used in MLN training

Fig. 7 CG animation of "basket"

The dimensions of the MLN were the vocal tract area (15 × 3 dimensions) as input and the y-coordinate vectors (8 × 3 dimensions) as output. In the case of the articulatory features, the MLN input had 28 × 3 dimensions, again with the y-coordinate vectors (8 × 3 dimensions) as output.

2.5 CG Animation Generation Programs


We correct the y-coordinate vectors by using a spline curve and a median filter to form the CG animations. We assigned 43 points (15 tongue points, 2 lip points, 16 palate points, and 10 lower-jaw points) as the initial feature points of the MRI images. The position relations between the 8 feature points (trained by the MLN) and the remaining 35 feature points are calculated. The spline curve is used to interpolate between the eight feature points and the other feature points while keeping these position relations. The movement is drawn on the basis of the y-coordinate distance, but since this movement is twitchy, we use a median filter to smooth it out.
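The median-filter stage can be sketched as a running median over each feature point's y-trajectory (the window size is our choice, and the spline-interpolation step is omitted):

```python
import numpy as np

def median_smooth(traj, win=5):
    """Odd-window running median over a 1-D coordinate trajectory,
    with edge replication so the output length is unchanged."""
    traj = np.asarray(traj, dtype=float)
    r = win // 2
    padded = np.pad(traj, r, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, win)
    return np.median(windows, axis=-1)
```

A median filter is a natural choice here because it removes isolated single-frame spikes in the trajectory without rounding off genuine articulatory transitions the way a moving average would.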
The system is built as a web application so that various users can use it on the web, and it can be incorporated into various web dictionaries. The CG animation program was implemented in ActionScript 3.0 to operate in a web browser with the Flash Player plug-in installed. Figure 7 shows a screen shot of a CG animation developed in the present study. It highlights the wrong articulation organs with a red circle based on the differences between the learner's animation and the correct animation.

3 Evaluations

We calculated the correlation coefficient between the coordinate values of the generated CG animations and the MRI data to confirm the accuracy of the animations. Moreover, to show which characteristic is more effective for extracting coordinate distances, we compared the correlation coefficients for AF and for the vocal tract area as MLN inputs.

3.1 Experimental Data and Setup


To evaluate the animation generated from speech, the correlation coefficient between the animations and the MRI images is calculated. Moreover, the correlation coefficients of the articulation features and the vocal tract area as MLN inputs are also compared.

The MRI data used in the evaluation were taken in a single shot, in which one female native English speaker uttered 37 English words. The data sets used for the experimental evaluation are as follows.
D1: Training data set for AF-coordinate vector or VT-coordinate vector converter training: 36 words of English speech and images included in the MRI data (one female native English speaker)
D2: Testing data set for AF-coordinate vector or VT-coordinate vector converter adaptation: one word of English speech included in the MRI data (one female native English speaker)
Experimental results were acquired by using the leave-one-out cross-validation method. The MRI data used in this experiment were recorded at ATR (Advanced Telecommunications Research Institute International) by a Kobe University research group.
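The leave-one-out scheme over the 37 words (36 for training, 1 held out, rotated) can be sketched as:

```python
def leave_one_out_splits(n_items):
    """Yield (train_indices, test_index) pairs: each word is held out
    once while the remaining n_items - 1 words form the training set."""
    for held_out in range(n_items):
        train = [i for i in range(n_items) if i != held_out]
        yield train, held_out

splits = list(leave_one_out_splits(37))  # the 37 uttered English words
```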

3.2 Experimental Results


Figure 8 shows the correlation coefficient for each phoneme. As for the average correlation coefficient over all phonemes ("all" in Figure 8), the vocal tract area achieved 0.83 and the articulatory features 0.78. In addition, the correct rate of the articulatory features extracted by the MLN is about 81.2% ("all" in Figure 9). A comparatively high correlation coefficient was acquired in spite of the small amount of training data. On the whole, the correlation coefficient of the vocal tract area was higher than that of the articulatory features.
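The per-phoneme score is presumably the Pearson correlation between the predicted trajectory and the MRI-derived reference trajectory, which can be computed as (names and toy data are ours):

```python
import numpy as np

def trajectory_correlation(pred, ref):
    """Pearson correlation between a predicted coordinate trajectory
    and the reference trajectory measured from MRI frames."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.corrcoef(pred, ref)[0, 1])

# toy example: a lightly noised copy of a reference correlates strongly
ref = np.sin(np.linspace(0, 3, 50))
pred = ref + 0.05 * np.random.default_rng(2).standard_normal(50)
r = trajectory_correlation(pred, ref)
```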

Fig. 8 Correlation coefficient of each phoneme

Fig. 9 Correct rate of articulatory feature



Fig. 10 The number of each phoneme in the training data

It is clear that the translation to coordinate vectors has higher adaptability with the vocal tract area than with the articulatory features. However, although we evaluated with a speaker-dependent data set in this experiment, the vocal tract area typically changes in response to a speaker's age and sex. On the other hand, the correlation coefficient of the articulatory features may well be higher than that of the vocal tract area in a speaker-independent experiment, because the articulatory features do not depend on the speaker. We intend to evaluate with speaker-independent data.

Figure 10 shows the number of phonemes contained in the training data. Despite the fact that phoneme "t" has the largest count of all the phonemes, its correlation coefficient in Figure 8 is not very high. Since training is insufficient for some phonemes, further improvement is required.

Figure 11 shows the result for the word "but." Generally, before humans utter speech, an articulation organ has already begun to move. Figure 11 makes it clear that the articulatory movement ["sil" (from frames 0 to 7)] before the speech of phoneme "b" is expressed accurately. It is effective to train by combining the preceding and subsequent frames (t−3, t+3) with the current frame t in the MLN.

Fig. 11 Correlation coefficient of the word "but"

Figure 12 shows the correlation coefficient for each articulatory organ. The horizontal axis shows the feature points; more specifically, feature points ① to ⑤ refer to the tongue, feature point ⑥ to the upper lip, feature point ⑦ to the soft palate, and feature point ⑧ to the lower lip in Figure 6.

Fig. 12 Correlation coefficient of each feature point

Although the soft palate shows a high correlation, the correlation of the lower lip is not very good. Moreover, although the tongue averages 0.7, since it is a very important organ for various pronunciations, it is necessary to improve the articulatory gestures of the tongue and lower lip. We plan to intensively train important articulatory manners and articulatory positions in the MLN by forming some anchor points.
We calculated only the y-coordinate distance of each feature point to decrease the number of dimensions in this experiment. However, the x-coordinate should also be assigned if a lot of MRI data can be used, because it is affected by individual variation among users. Additionally, we will verify the individuality of articulatory movement by applying MRI data composed of several users.

4 Summary
We developed a system for automatically generating CG animations that express pronunciation movement through articulatory features extracted from speech. The pronunciation mistakes of the user can be pointed out by expressing the pronunciation movements of the user's tongue, palate, lips, and lower jaw as animated pronunciations. We conducted experiments that confirmed the accuracy of the generated CG animations. The correlation coefficient was about 0.83, and we confirmed that smooth animations were generated from speech automatically. We will build a pronunciation instructor system that includes the CG animation program.

Acknowledgements. This research was supported by a Grant-in-Aid for Young Scientists


(B) (Subject No. 21700812).

References
1. Delmonte, R.: SLIM prosodic automatic tools for self-learning instruction. Speech
Communication 30(2-3), 145–166 (2000)
2. Gamper, J., Knapp, J.: A Review of Intelligent CALL Systems. Computer Assisted
Language Learning 15(4), 329–342 (2002)
3. Neumeyer, L., Franco, H., Digalakis, V., Weintraub, M.: Automatic scoring of pro-
nunciation quality. Speech Communication 30(2-3), 83–93 (2000)
4. Witt, S.M., Young, S.J.: Phone-level pronunciation scoring and assessment for interac-
tive language learning. Speech Communication 30(2-3), 95–108 (2000)
5. Deroo, O., Ris, C., Gielen, S., Vanparys, J.: Automatic detection of mispronounced
phonemes for language learning tools. In: Proceedings of ICSLP 2000, vol. 1, pp. 681–
684 (2000)
6. Wang, S., Higgins, M., Shima, Y.: Training English pronunciation for Japanese learn-
ers of English online. The JALT Call Journal 1(1), 39–47 (2005)
7. Phonetics Flash Animation Project,
http://www.uiowa.edu/~acadtech/phonetics/

8. Wong, K.H., Lo, W.K., Meng, H.: Allophonic variations in visual speech synthesis for
corrective feedback in capt. In: Proc. ICASSP 2011, pp. 5708–5711 (2011)
9. Iribe, Y., Manosavanh, S., Katsurada, K., Hayashi, R., Zhu, C., Nitta, T.: Generation
Animated Pronunciation from Speech through Articulatory Feature Extraction. In:
Proc. of Interspeech 2011, pp. 1617–1621 (2011)
10. Huda, M.N., Katsurada, K., Nitta, T.: Phoneme recognition based on hybrid neural
networks with inhibition/enhancement of Distinctive Phonetic Feature (DPF) trajecto-
ries. In: Proc. Interspeech 2008, pp. 1529–1532 (2008)
Building a Domain Ontology to Design
a Decision Support Software to Plan Fight
Actions against Marine Pollutions

Jean-Marc Mercantini and Colette Faucher

Abstract. The return on experience with techniques to combat pollution shows that their effectiveness depends on the situations in which they are implemented and that their choice is not trivial. From the maritime-field perspective, the objective of this paper is to present a software tool to assist crisis management staff in minimising the pollution impact of maritime accidents. From a methodological perspective, the objective of the paper is to show the importance of developing ontologies (i) for structuring a domain as perceived by its actors and (ii) for building computer tools aimed at supporting problem solving in that domain. Such tools are endowed with knowledge shared by the actors of the domain, which makes them more effective in critical situations. The design process followed is based on the Cognitive Engineering method "Knowledge Oriented Design" (KOD). In the paper, the methodological process is detailed. The resulting ontology, the architecture of the software tool, and the plan generation mechanism are presented and discussed.

1 Introduction

Although the Mediterranean represents only one hundredth of the world's sea surface, it supports thirty percent of the volume of international maritime traffic. An estimated 50% of the goods transported could present a risk to various degrees. A study of shipping accidents in the Mediterranean Sea [1], covering the period 1977 to 2003, identified 376 accidents involving hydrocarbons and 94 accidents involving
Jean-Marc Mercantini · Colette Faucher


Laboratoire des Sciences de l’Information et des Systèmes (UMR CNRS 6168)
Université Paul Cézanne (Aix-Marseille 3)
Avenue Escadrille Normandie-Niemen
13397 Marseille cedex 20
Phone : +33 (0) 491 05 60 15
e-mail: jean-marc.mercantini@Lsis.org, colette.faucher@Lsis.org

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 83–95.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
hazardous and noxious substances (HNS). These accidents resulted in a total discharge of 305,000 tonnes of hydrocarbons and 136,000 tonnes of HNS. These events highlight the criticality of the risk induced by transport activities in that region. In general, the strategy for fighting marine pollution by hydrocarbons following a shipping accident is divided into two complementary stages: (i) recovery of the maximum volume of hydrocarbons at sea and, once the pollutant has reached the coast, (ii) cleaning of the polluted coastline. There are many intervention techniques to combat pollution, and their effectiveness depends on the situations in which they are implemented. Thus the choice of a fight technique in a response plan is not trivial and requires taking a large number of parameters into account.
The CLARA 2 project (Calculations Relating to Accidental Releases in the Mediterranean) responds to these problems. It aims to develop and implement a computer tool to assist in managing crises resulting from a maritime accident that has caused a spill of pollutants. To carry out this national project (funded by the French National Research Agency), a consortium of 13 partners was formed [2]. The purpose of this paper is to focus on the elaboration process of the GENEPI module (GENEration de Plan d'Interventions) of CLARA 2, which aims to plan fight actions against marine pollution. The current work lies in the following research fields: (1) from the maritime field perspective, the paper presents a software tool to assist crisis management staff in minimising pollution impact and (2) from a methodological perspective, it shows the importance of developing ontologies (i) for structuring a domain (at a conceptual level) as its actors perceive it and (ii) for using these ontologies to build computer tools that aid problem solving in that domain. An overview of the CLARA 2 project is presented in Section 2, and Section 3 presents the functioning principles of the GENEPI module. Section 4 describes the methodological approach and the process used to build the GENEPI module. In Section 5 the implementation of the process is developed and exemplified. Section 6 presents the architecture of the GENEPI module. Section 7 presents the conclusions.

2 The CLARA 2 Project


The CLARA 2 project aims to provide a tool for managing crises induced by marine pollution, whether chemical or petroleum-based. This tool should facilitate the rapid establishment of relevant exclusion zones, not only to alert but also to protect people, goods and the environment, to mobilize appropriate fighting means and to anticipate critical situations. It also provides information on the capacity of the released substances to bioaccumulate in the food chain, and a preliminary approach to risk in terms of toxicological effects on humans is proposed in the case of atmospheric dispersion of toxic gases. The software tool is based on a simulator designed to predict the location of a pollutant and the changes in its concentration in the sea and in the atmosphere following a massive spill. It helps to assess the effects in the case of a fire, provides information on the bioaccumulation capacity of some marine organisms and provides sensitivity indicators for the polluted areas (Vulnerability Maps). In addition, CLARA 2 generates plans on the steps to take
and the methods of intervention to implement (GENEPI, the intervention plan generation module).

3 The GENEPI Module


The fight plans generated by GENEPI take into account accidental situations and their changes over time. The set of methods and intervention techniques that could be mobilized has been classified, and suitability criteria with respect to situations have been established and associated with each of them. Figure 1 shows the functioning principle of this module. The GENEPI module is accessed through an observation vector of a real situation (VREAL) and/or an observation vector of a simulated situation (VSIM) produced by the other CLARA 2 modules. Based on this observation vector and the suitability criteria associated with each fight (intervention) action, the Selection Process (Selection of Fight Actions) accesses the Classified Fight Actions to extract the most relevant ones. The selection is based on the analysis of the suitability criteria associated with each fight action. The result is a set of fight actions called "candidate actions", which serves as the basis for generating fight plans. Each plan can be simulated in order to be validated. Users can operate this module automatically or in a coordinated and controlled way.

[Figure: the VREAL and/or VSIM observation vectors feed the Selection Process, which draws on the Classified Fight Actions to produce the Set of Candidate Actions; the Plan Generation Process, using Additional Observations and the Needed Resource Base, turns these into Intervention Plans.]
Fig. 1 Functioning principle of the GENEPI Module

4 Methodological Approach

4.1 Analysis of the Problem


One of the main problems arising in the design of new computing tools to assist the resolution of safety problems is linked to the stability of the terminology. This problem is symptomatic of semantic and conceptual distances among the actors of the community for which the computing tools are intended. These distances can emerge in critical situations and lead to accidents or aggravate their consequences. The notion of ontology and the work currently developed by the
scientific community of knowledge engineers can provide interesting answers to this problem. One of the objectives of an ontology is to facilitate the exchange of knowledge between human beings, between human beings and machines, and between human beings through machines [3].
The advantages of developing ontologies to solve problems arising in the field of safety and risk management are the following: (i) they structure a domain by highlighting concepts and the semantic relations that link these concepts and (ii) they can serve as the basis for the design of new computer tools. Tools built in this way carry knowledge shared by the actors of the domain, which makes them more effective in critical or crisis situations. The methodological process followed is based on the "Knowledge Oriented Design" (KOD) method [4, 5]. KOD belongs to the family of methods coming from Cognitive Engineering and designed to guide the engineer (or the knowledge engineer) in the task of developing knowledge-based systems. This method was designed to introduce an explicit model between the formulation of the problem in natural language and its representation in the chosen formal language. The inductive process of KOD is based on the analysis of a corpus of documents, speeches and comments from domain experts, so as to express an explicit cognitive model (also called a conceptual model).

4.2 The KOD Method


KOD is based on an inductive approach that requires the cognitive model (or conceptual model) to be expressed explicitly from a corpus of documents, comments and experts' statements. The main features of this method rest on linguistic and anthropological principles. Its linguistic basis makes it well suited to the acquisition of knowledge expressed in natural language: it proposes a methodological framework to guide the collection of terms and to organize them through a terminological analysis (linguistic capacity). Through its anthropological basis, KOD provides a methodological framework facilitating the semantic analysis of the terminology used, in order to produce a cognitive model (conceptualisation capacity). It guides the work of the knowledge engineer from the extraction of knowledge to the development of the conceptual model.
The use of the KOD method is based on the conception of three types of successive models: the practical models, the cognitive model and the software model, as represented in Table 1. Each of these models is conceived according to the paradigms <Representation, Action, Interpretation>. The Representation paradigm models the universe as an expert represents it; this universe is made of concrete or abstract related objects. The Action paradigm models the behaviour of active objects that activate procedures upon the receipt of messages; consequently, action plans devised by human operators and by artificial operators are modelled in the same format. The Interpretation/Intention paradigm models the reasoning used by the experts to interpret situations and to elaborate action plans related to their intentions (reasoning capacity).
The practical model (PMi) is the representation of a speech or a document of the corpus, expressed in the terms of the domain by means of taxemes (static representations of objects), actemes (representations of object activity) and inferences (the basis of the cognitive structure of the task). The cognitive model is built by abstracting the practical models. It is composed of taxonomies, actinomies and reasoning patterns. The software model results from the formalization of a cognitive model in a formal language, and is independent of programming languages.

Table 1 KOD, the three modelling levels.

Paradigms / Models Representation Action Interpretation


Practical Taxeme Acteme Inferences
Cognitive Taxonomy Actinomy Reasoning Pattern
Software Classes Methods Rules

4.3 The Ontology Building Process Using KOD


Research work in Ontology Engineering has identified five main steps for building ontologies [3, 6, 7, 8, 9, 10]:

1. Ontology Specification. The purpose of this step is to provide a description of the problem as well as of the method to solve it. It allows the objectives, scope and granularity of the anticipated ontology to be described.
2. Corpus Definition. The purpose is to select, among the available information sources, those that will allow the objectives of the study to be attained.
3. Linguistic Study of the Corpus. This consists in a terminological analysis of the corpus in order to extract the candidate terms and their relations. Linguistics is especially concerned here, to the extent that the data available for ontology building are often expressed as linguistic expressions; characterizing the sense of these expressions leads to the determination of contextual meanings.
4. Conceptualization. In this step, the candidate terms and their relations resulting from the linguistic study are analyzed: the candidate terms are transformed into concepts and their lexical relations into semantic relations. The result of this step is a conceptual model.
5. Formalization. This step consists in expressing the conceptual model by means of a formal language.

The projection of the KOD method onto this general approach shows that KOD guides the constitution of the corpus and provides the tools to carry out the operational steps 3 (linguistic study) and 4 (conceptualization). The KOD method has already been applied in previous research [5, 11, 12].
Table 2 Integration of the KOD method into the elaboration process of ontology
Elaboration process of ontology   KOD process           Elaboration process of ontology with KOD
1. Specification                  -                     1. Specification
2. Corpus definition              -                     2. Corpus definition
3. Linguistic study               1. Practical Model    3. Practical Model
4. Conceptualisation              2. Cognitive Model    4. Cognitive Model
5. Formalisation                  -                     5. Formalisation
-                                 3. Software Model     6. Software Model

5 Elaboration of the Ontology

5.1 Corpus Definition


The objective of this phase is to identify the knowledge relevant to the GENEPI module within the problem domain, which requires the problem domain to be well defined and well delimited. In our study, the two important phenomena that define the field and the problem to be addressed are (i) maritime accidents and (ii) the interventions to contain the consequences of an accident. The corpus has thus been established on the basis of documents from CEDRE (le Centre de Documentation, de Recherche et d'Expérimentation sur les pollutions accidentelles des eaux) and REMPEC (the REgional Marine Pollution Emergency Response Centre for the Mediterranean Sea) concerning accidents that have already occurred and the implementation of emergency plans. The types of documents that make up this corpus are the following:

• Documents relating to the evaluation of each fight technique or method,
• General documents about the organization of emergency plans,
• Return on experience documents about major maritime disasters such as those of the Erika and the Prestige,
• Return on experience documents about maritime accidents of lower magnitude.

5.2 Practical Models


This phase consists in extracting, from each document belonging to the corpus, all the elements (objects, actions and inferences) that are relevant to the representation of the accident and to the implementation of fight actions.

5.2.1 Extracting Taxemes

The linguistic analysis is performed in two steps: verbalization and modelling. The verbalization step consists in paraphrasing the corpus documents in order to obtain simple phrases, which allow the terms employed to be qualified during document analysis. Some terms appear as objects, others as properties, and yet others as relations between objects and values. The modelling step consists in representing the phrases in the taxeme format <object, attribute, value>.
The taxeme characterizes an object of the real world by means of a relation (the attribute), which links the object to a value. There are five types of relations: classifying (is-a, type-of), identifying (is), descriptive (position, failure mode, error mode, cause, ...), structural (composed-of) and situational (is-in, is-below, ...). The following example illustrates the process employed to obtain the taxemes:
“... On November 13, 2002, the Prestige oil tanker flying the Bahamian flag,
sends an emergency message from the Finisterre Cape ...”
Paraphrases:
1. The Prestige is an oil tanker
2. The Prestige flies the flag of the Bahamas
3. On November 13, the Prestige is located at the Finisterre Cape
4. On November 13, the Prestige sends an emergency message
Taxemes:
1. <Prestige, IS A, oil tanker>
2. <Prestige, FLAG, Bahamas>
3. <Prestige, LOCATION, Finisterre Cape>
4. <Prestige, DATE, November 13th>

The last paraphrase is related to an action, so it will be modelled by means of an acteme. Extending this analysis to the whole corpus yielded the set of taxemes needed to represent the universe described by the corpus of documents. An object of the real world is modelled by the set of its related taxemes.

5.2.2 Extracting Actemes

In order to obtain the actemes, the linguistic analysis consists in identifying verbs that represent activities performed by actors during marine pollution, or object behaviour. In general terms, an activity is performed by an action manager, by means of one or more instruments, in order to modify the state (physical or knowledge state) of an addressee. The action manager temporarily takes control of the addressee by means of the instruments. Occasionally the action manager is both the one who directs the activity and the one subjected to the change of state (example: knowledge acquisition). The following example illustrates how to extract actemes from the corpus: "... the Prestige sends an emergency message ..."
The activity is "SENDING an emergency message". Once identified, the activity is translated into a 7-tuple (the acteme):
<Action Manager, Action, Addressee, Properties, State1, State2, Instruments>,
where the Action Manager performs the action; the Action causes the change; the Addressee undergoes the action; the Properties represent the way the action is performed; State1 is the state of the addressee before the change; State2 is the state of the addressee after the change; and Instruments is the instrument or set of instruments representing the means used to cause the change.
The acteme "SENDING an emergency message" is represented as follows:
<Prestige Commandant, SENDING an emergency message, CROSS MED, (date, location, duration), CROSS MED (does not know), CROSS MED (knows), Radio>,
where CROSS MED stands for "Centre Régional Opérationnel de Surveillance et de Sauvetage en Méditerranée", the French organism that receives emergency messages from ships in difficulty. Figure 2 illustrates this acteme and Figure 3 illustrates the case of a fight action, for which the tuple is extended with suitability criteria:
<Action Manager, Action, Addressee, Properties, Suitability Criteria, State1, State2, Instruments>

[Figure: the Prestige Commandant performs "SENDING an emergency message" (date, location, duration) by means of a Radio; the addressee CROSS MED passes from the state "does not know" to the state "knows".]

Fig. 2 Representation of the Acteme "SENDING an emergency message".

Fig. 3 Representation of the Acteme "FLUSHING" in table form.

Actemes model the task activity. Each acteme is composed of textual items extracted from the reports, which describe the state change of an object as described by the domain experts. Each element of the 7-tuple (or 8-tuple for fight actions, because of the suitability criteria) must first have been defined as a taxeme.
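A hypothetical encoding of the acteme tuple is sketched below (the field names are our own rendering of the tuple elements, and the FLUSHING suitability criteria are invented for illustration; the paper's Figure 3 gives the actual form):

```python
from typing import NamedTuple

# The KOD acteme: a 7-tuple describing a state change caused by an action,
# extended to an 8-tuple by suitability criteria in the case of fight actions.
class Acteme(NamedTuple):
    action_manager: str     # who performs the action
    action: str             # what causes the change
    addressee: str          # who or what undergoes the action
    properties: tuple       # how the action is performed (date, location, duration, ...)
    state_before: str       # addressee's state before the change
    state_after: str        # addressee's state after the change
    instruments: tuple      # means used to cause the change
    suitability_criteria: tuple = ()  # 8th element, present only for fight actions

sending = Acteme(
    action_manager="Prestige Commandant",
    action="SENDING an emergency message",
    addressee="CROSS MED",
    properties=("date", "location", "duration"),
    state_before="CROSS MED (does not know)",
    state_after="CROSS MED (knows)",
    instruments=("Radio",),
)

# A fight action carries suitability criteria (these two are invented examples):
flushing = Acteme(
    action_manager="Clean-up team",
    action="FLUSHING",
    addressee="Polluted shoreline",
    properties=("location", "duration"),
    state_before="shoreline (polluted)",
    state_after="shoreline (cleaned)",
    instruments=("Low-pressure water hose",),
    suitability_criteria=("substrate accessible to pumps", "pollutant still fluid"),
)
```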

5.3 The Cognitive Model


This phase consists in analyzing and abstracting the practical models. The objective is to build the domain ontology; in other words, the aim is to classify the terminology used and thus obtain the KOD cognitive model.
5.3.1 Building the Taxonomies

Term Analysis: this analysis consists in solving the problems induced by homonymous and synonymous terms, with the objective of building a common terminology.
Concept Identification: this step is based on the analysis of the taxemes and consists in highlighting the nature of the attributes that characterize each object. The nature of the attributes is the basis for the construction of the taxonomies (relations 'kind-of' and 'is-a') or of other tree-type structures (relations 'is-composed-of', 'position', 'is-in', 'is-below', 'is-above', etc.).
As an example, the analysis of the set of taxemes showed that the term "Skimmer" is meaningful and thus deserves the status of a concept: it denotes a set of recovery devices (modelled by means of taxemes). As a result of the analysis of the terms related to "Skimmer", the taxonomy in Figure 4 was built and the "Skimmer" concept is defined through its attributes as follows:
The Skimmer Concept: <Type, Flow, Quantity, Storage Location, Dimension, Weight, Performance Limit, Selectivity, Recovery Rate>
All the taxemes of the corpus are organized in taxonomies and each concept has been defined as shown in the example.

Fig. 4 The Skimmer taxonomy
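As an illustration, the "Skimmer" concept could be rendered as a class whose fields are the attributes listed above (the field types, comments and example values below are our own assumptions, not taken from the CLARA 2 knowledge base):

```python
from dataclasses import dataclass

# The "Skimmer" concept, with one field per attribute identified
# during concept identification.
@dataclass
class Skimmer:
    type: str               # e.g. weir, oleophilic, suction (illustrative)
    flow: float             # nominal recovery flow (assumed m3/h)
    quantity: int           # number of units available
    storage_location: str
    dimension: str
    weight: float           # assumed kg
    performance_limit: str  # e.g. maximum workable sea state
    selectivity: str        # water/oil selectivity
    recovery_rate: float    # fraction of pollutant actually recovered

# Purely illustrative instance (values invented for the example):
example = Skimmer("weir", 30.0, 2, "harbour depot", "2.0 m x 1.5 m",
                  120.0, "sea state 3", "medium", 0.7)
print(example.type, example.recovery_rate)
```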

5.3.2 Acteme Abstraction

One result of the acteme analysis is that actemes can be divided into five main action categories:
• Actions related to pollutant behaviour,
• Actions related to the behaviour of the ship in distress,
• Actions related to reasoning patterns,
• Actions related to CLARA 2 services,
• Actions related to operations against pollution.
Among the actions related to pollutant behaviour we can cite Evaporation and Dissolution. Among the actions related to the behaviour of the ship in distress, we can cite Listing to starboard and Sending an emergency message. The actions related to reasoning patterns, such as "Choosing the shoreline clean-up methods", are used to select or to plan fight actions; to be performed, they use the suitability criteria associated with each acteme. The actions that belong to the CLARA 2 services category are implemented to improve the GENEPI functionalities.
The actions of the last category are the fight actions (Figure 5). They are divided into two main classes: (i) the shoreline clean-up methods and (ii) the clean-up methods at sea. Some actemes of the fight action category can be organized in a structural and temporal way to form actinomies. The interest of this kind of structure is that its actions are already planned.

Fig. 5 Extract of the Fight Action Taxonomy

6 Architecture of the GENEPI Module


The architecture of the GENEPI module (Figure 6) has been designed around the
ontology enriched with the instances of the concrete classes. The association of
the ontology with instances constitutes a knowledge base.

The notion of Situation


The analysis of the accident accounts in the corpus shows that each accident has its own characteristics and that, for a particular accident, the circumstances and context change from one moment to another. To take this into consideration, we defined the notion of Situation. A Situation consists of a set of attributes (S) that characterize the accident and its context. This set of attributes is a superset of the set of suitability criteria (Ca) associated with the fight actions; thus, attributes common to Ca and S have the same types. Instances of the Situation are obtained from data delivered by the access interface to external data (coming from the other CLARA 2 modules) and from data supplied by the user.
Fig. 6 Architecture of the GENEPI module

The Action Search Engine


The search engine receives as input the Situation and the Domain of the ontology in which to search for fight actions. The domain is identified by the name of the class that characterizes it in the taxonomy of the fight actions (Shoreline Clean-up Actions, Mechanical Retrieval, etc.). As a result, it provides four sets of fight actions:

• The set A, which contains the actions for which all criteria are satisfied,
• The set B, which contains the actions for which at least one of the criteria could not be assessed for lack of information in the situation,
• The set C, which contains the actions for which at least one criterion was not satisfied,
• The set D, which contains the actions of the set B enriched with the criteria that could not be assessed.
The rules for the selection of fight actions are based on the suitability criteria and on the values taken by the corresponding attributes of the situation. The rules are of the form:
c1 ∧ c2 ∧ ... ∧ cn → True / False
with c1, c2, ..., cn the criteria associated with a fight action. The conclusion of the rule states whether or not the action can be selected. A criterion is satisfied if the value taken by the corresponding attribute of the situation is compatible with the constraints of the criterion.
Upon receipt of the Situation, the action-selecting algorithm analyzes the actemes involved in the search domain. From each acteme it extracts the criteria and applies the selection rules presented above. According to the results obtained, the acteme is placed in the corresponding set (A, B, C or D). After running the algorithm, if the users are not satisfied with the result, they can enrich the situation so as to assess the criteria that could not be evaluated. This new run should reduce the size of the set B, moving its actions into either the set A or the set C. The algorithm is independent of changes in the ontology.
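The partition into the sets A, B and C can be sketched as follows. This is a minimal illustration under our own assumptions (a situation as a dictionary of attribute values, each criterion as an attribute name paired with a predicate); the action names and criteria are invented, not taken from the CLARA 2 knowledge base:

```python
# Partition fight actions into sets A (all criteria satisfied),
# B (at least one criterion not assessable) and C (at least one criterion failed).
def select_actions(situation, actions):
    A, B, C = [], [], []
    for name, criteria in actions.items():
        unknown = [attr for attr, _ in criteria if attr not in situation]
        if unknown:
            B.append(name)  # information missing in the situation
        elif all(pred(situation[attr]) for attr, pred in criteria):
            A.append(name)  # c1 ^ c2 ^ ... ^ cn -> True
        else:
            C.append(name)  # at least one criterion not satisfied
    # The set D of the paper would pair each action in B with its unknown criteria.
    return A, B, C

actions = {
    "Flushing":   [("substrate", lambda v: v == "rocky"), ("sea_state", lambda v: v <= 3)],
    "Skimming":   [("oil_viscosity", lambda v: v < 2000)],
    "Dispersant": [("water_depth", lambda v: v > 20)],
}
situation = {"substrate": "rocky", "sea_state": 2, "oil_viscosity": 5000}

print(select_actions(situation, actions))
# (['Flushing'], ['Dispersant'], ['Skimming'])
```

Enriching the situation with the missing `water_depth` value and re-running would move "Dispersant" from B into A or C, as the paper describes.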

The Ontology Management Module


This module provides users with the functions needed for the maintenance (updating, adding and deleting classes, attributes and instances) and consultation (knowledge searching) of the ontology.

7 Conclusion
This paper has presented the first results of the design of a software tool (the GENEPI module) to plan fight actions against marine pollution. The GENEPI module is part of a wider research programme: CLARA 2. The methodological process used to build GENEPI is based on the elaboration of an ontology. The purpose of that ontology is to structure the domain (maritime accidents) according to the problem to be solved (planning fight actions) and to the problem-solving method. The ontology was obtained through a cognitive approach that consisted in applying the KOD method, which has proven to be adequate.
The Situation Management module, the Ontology Management module and the Action Search Engine are operational. The Plan Generator module and the Simulator are currently under development.

References
[1] REMPEC: Guide pour la lutte contre la pollution marine accidentelle en Méditerranée, Partie D, Fascicule 1 (April 2002)
[2] CLARA 2 Consortium: École des Mines d’Alès, le Cèdre, IFREMER, Météo France,
IRSN, TOTAL, EADS, Géocéan, UBO, INERIS, SDIS 30, Préfecture Maritime de la
Méditerranée, le CEPPOL, LSIS. Projet ANR (2006-2010)
[3] Uschold, M., Grüninger, M.: Ontologies: Principles, methods and applications.
Knowledge Engineering Review 11(2), 93–136 (1996)
[4] Vogel, C.: Génie cognitif. Sciences cognitives, Paris, Masson (1988)
[5] Mercantini, J., Tourigny, N., Chouraqui, E.: Elaboration d’ontologies à partir de
corpus en utilisant la méthode d’ingénierie des connaissances KOD. In: 1ère édition
des Journées Francophones sur les Ontologies, JFO 2007, Octobre 18-20, pp. 195–
214, Sousse, Tunisie (2007) ISBN: 978-9973-37-414-1
[6] Gandon, F.: Ontology engineering: a survey and a return on experience. Research Report no. 4396. INRIA Sophia-Antipolis (March 2002)
[7] Aussenac-Gilles, N., Biébow, B., Szulman, S.: Revisiting Ontology Design: A
Method Based on Corpus Analysis. In: Dieng, R., Corby, O. (eds.) EKAW 2000.
LNCS (LNAI), vol. 1937, pp. 172–188. Springer, Heidelberg (2000)
[8] Dahlgren, K.: A Linguistic Ontology. International Journal of Human-Computer
Studies 43(5), 809–818 (1995)
[9] Uschold, M., King, M.: Towards a Methodology for Building Ontologies. In: Proceedings of the IJCAI 1995 Workshop on Basic Ontological Issues in Knowledge Sharing, Montréal, Canada (1995)
[10] Fernández-López, M.: Overview of methodologies for building ontologies. In: Proceedings, IJCAI 1999 Workshop on Ontologies and Problem-Solving Methods (KRR5), Stockholm, Sweden, August 2, pp. 4-1–4-13 (1999)
[11] Mercantini, J.M., Capus, L., Chouraqui, E., Tourigny, N.: Knowledge Engineering
contributions in traffic road accident analysis. In: Jain, R.K., Abraham, A., Faucher,
C., van der Zwaag, B.J. (eds.) Innovations in Knowledge Engineering, pp. 211–244
(2003)
[12] Mercantini, J.-M., Turnell, M.F.Q.V., Guerrero, C.V.S., Chouraqui, E., Vieira,
F.A.Q., Pereira, M.R.B.: Human centred modelling of incident scenarios. In: IEEE
SMC 2004, Proceedings of the International Conference on Systems, Man &
Cybernetics, The Hague, The Netherlands, October 10-13, pp. 893–898 (2004)
Can Pictures Be a Candidate for Knowledge
Media?

Fuminori Akiba

Abstract. Can pictures be a candidate for knowledge media? After setting out the background of this question in chapter one, we introduce in chapter two the idea of three kinds of knowledge obtained from a picture, drawn from a book by Dominic McIver Lopes (Lopes 2006), namely knowledge about, knowledge through and knowledge in, and point out its deficiencies. In chapter three we then propose an alternative idea of the pictures from which we obtain knowledge, namely pictures as objects and facts, pictures as process and pictures as informational indicators, and find strong support for our proposal in various research fields and practices. Finally we conclude that we can think of pictures as a candidate for knowledge media.

1 Introduction: Can Artworks Convey Knowledge?

At first glance it seems quite easy to answer this question. Artworks have long given us various kinds of knowledge. For example, Giotto's fresco painting of the Last Judgment at the Scrovegni Chapel in Padova (Fig. 1) has taught people what Hell is and taught them to live good lives if they do not want to go there. Cannacher's contemporary artwork Addict to Plastic (Fig. 2) teaches us about the problem of waste disposal and about manipulation through the mass media. But can we say with certainty that these works really give us knowledge by themselves? The answer is probably no. If someone who has never read the description of Hell in the Bible sees Giotto's fresco, that person cannot understand what it is about: he or she only sees a scene in which a monster eats a man. And if someone is not accustomed to the traditional artistic convention of what an open window or a frame symbolizes (a gate through which we can reach a hidden truth), that person will fail to grasp Cannacher's intention and will only see a heap of trash.
Corresponding to our suspicion, many philosophers have cast doubt on the ability of artworks to serve as knowledge media. Among them the negative evaluation by

Fuminori Akiba
Graduate School of Information Science, Nagoya University
e-mail: akibaf@is.nagoya-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 97–105.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
Stolnitz (1992) is the most famous. He said, "[I]n science, history, and religion, confirmation of a statement also counts as evidence for other, logically related statements. Thus truths, notably in the cumulative advances of science, support and build on each other," while "art is unlike any of these kinds of knowing", because "the truth derived from one work of art never confirms that derived from another work of art [...]" (Stolnitz 1992, 341).
In contrast to Stolnitz, who still recognizes each artwork as a source of knowledge and finds another kind of knowing in art, Hideo Iwasaki, a Japanese biologist and artist, has said that no artwork can even be a source of knowledge, because for him an artwork is an "open text": in other words, a source of "never-ending, unanswerable and in a sense irresponsible questions" (Iwasaki 2010, 753).
Therefore, we need to ask again: can artworks be a candidate for knowledge media? The word "artworks", however, is too vague and covers hugely different objects, so in this paper we narrow the topic from artworks in general to pictures in order to avoid confusion. In this case, of course, pictures do not include visual representations such as diagrams and graphs. In chapter two we take up the idea of three kinds of knowledge from a picture in Lopes (2006) and point out its deficiencies. In chapter three we then explore an alternative idea of knowledge from a picture and find strong support for our proposal in various research fields and practices. In the final chapter we conclude that we can think of pictures as a candidate for knowledge media.

2 Knowledge from Picture in Lopes’ Aesthetics

2.1 Three Kinds of Knowledge We Obtain from a Picture


What kind of knowledge can we expect in a picture? Among countless studies on
the theme of art and knowledge, I have selected Lopes (2006) because in this work
he draws a useful outline of the issue and indirectly tells us what we should throw
away if we really want to explore the possibility of pictures as knowledge media. I
briefly mention his idea of three kinds of knowledge and then point out the
deficiency of his argument, at the sacrifice both of the richness of his original text
and of the respect for aesthetic experience that his text originally shows.
First he criticizes previous studies on art and knowledge because they lack “the
right conception of picture’s cognitive value” (Lopes 2006, 132). Then, according
to his own idea, he classifies knowledge which is obtained from pictures into three
different kinds: knowledge about pictures, knowledge through pictures, and
knowledge in pictures (Table 2.1).

2.2 Lopes’ Strange Argument


Lopes examines these three kinds of knowledge from the point of justification be-
cause the classical definition of knowledge is that knowledge is a justified (in his
words, “warrant[ed]”) true belief. Therefore, he tries to test whether these three
kinds of knowledge are worthy of their name.
Can Pictures Be a Candidate for Knowledge Media? 99

Table 2.1 Three kinds of knowledge from pictures (reconstructed by the author from Lopes
2006, 133-4)

Kind of knowledge    Contents                      Concrete examples

Knowledge about      Picture's properties          The names of materials and
                                                   makers, and representational
                                                   properties

Knowledge through    Facts about the makers        The circumstances in which
                     and the historical            the makers worked
                     conditions

Knowledge in         Pictures' lessons             Scientific, historical, human
                                                   psychological, and especially
                                                   moral knowledge

From this point of view Lopes makes clear the difference between knowledge
about, knowledge through and knowledge in. Knowledge about and knowledge
through are justified, though in a different way from propositional knowledge. For
example the belief that the painting is exercised in oils [knowledge about] is justi-
fied from a caption, or more scientifically, through technological investigations,
and also the belief that the painting was made under the strong influence of con-
temporary paintings [knowledge through] is justified if the painting is compared
with contemporary paintings which the painter could really see.
In contrast to these two kinds of knowledge, however, Lopes argues knowledge
in is not justified because the content of perceptual belief, which Lopes calls the
perceptual report of a picture, cannot justify the lesson of the picture. Take Lopes'
example, Dorothea Lange's Migrant Mother (1936, Fig. 3): we cannot
logically draw the message of the picture [knowledge in] –that is, “we ought to act
with greater compassion for the poor” (Lopes 2006, 140)— from its contents of
perceptual belief. We can only reasonably interpret the message from the percep-
tual contents such as “the remarkable lines of the face of the migrant mother, the
quiet pride and determination she expresses, and the depiction of the children with
their backs to the viewer” (Lopes 2006, 139-140, concerning the discussions about
the content of picture perception, see Zeimbekis 2010).
At this point, everyone should expect that Lopes will discard such knowledge
in because it does not satisfy the classical definition of knowledge. If knowl-
edge in means the lesson which we manage to obtain through interpretation of our
perceptual contents, knowledge in is not knowledge worthy of the name. How-
ever, everyone may be surprised because Lopes suddenly throws the classical de-
finition of knowledge away and changes the topic from knowledge to an “intellec-
tual virtue” (Lopes 2006, 145). Pictures may have an intellectual virtue if they
have the power by which viewers are transformed into men of “fine observation.”
He says, “looking at a picture frequently requires effort, sometimes a great deal of
effort, in attention to detail, accurate perception, and adaptable seeing. […]
100 F. Akiba

A person who misses the fine details of an image, or who cannot see how it por-
trays previously unremarked features of reality, or who cannot see things in new
way…such a person does not appreciate it fully. In order to meet these demands,
viewers must become fine observers. […] A fine observer has visual experiences
that closely track the properties of what is perceived and brings her experiences un-
der concepts which make them available to belief formation, knowledge gathering,
and reasoning. Pictures have cognitive merit in so far as they bring about revisions
to the way we conceptualize visual experience” (Lopes 2006, 149-150).
However, this alteration is quite strange. First, if we are allowed to throw the
classical definition of knowledge away, we can construct the entire argument in a
quite different way. Second, indeed pictures facilitate our ability to see something
accurately and heuristically in detail, but we can learn the same thing from any
kinds of visual representation such as diagram, graph, and scientific visualization.
And finally, according to his idea, one must learn many things about how to see
pictures and how to be a fine observer in order to understand knowledge in pic-
tures, but even if we accumulate hundreds of propositional rules about how to see
pictures accurately and heuristically in detail, the rules can never guarantee a
viewer’s jump from perceptual contents to knowledge in pictures. No one can con-
fidently say that the jump from the perception of “the lines of face of the migrant
mother” to the lesson, “we ought to act with greater compassion for the poor,” is
true or false because it is a problem of persuasion or rhetoric, not of logical dem-
onstration.

3 Our Proposal and Supportive Research


Why did Lopes' argument take such a strange turn? There seem to be at least three
reasons: first, his implicit assumption that the deepest knowledge we obtain
from pictures is moral knowledge; second, his standpoint which starts from view-
ers' experience; and finally, his halfway acceptance of the classical definition of
‘knowledge’ as a justified true belief. Conversely, we propose that if we throw
away these three deficiencies, we could then find the possibility that pictures can
be a candidate for knowledge media. In fact we can find support for our proposal
from research in various fields.

3.1 Discard the Hierarchical Assumption


Lopes’ idea of three kinds of knowledge―that is knowledge about (objects: what
are the objects?), knowledge through (facts: what is the historical situation in
which the picture was made?) and knowledge in (lessons: what does the maker
teach us?)―seems to me the long-distant echo of the hierarchical triad from the early
20th century’s philosophy of art: phenomenon (low perceptual experi-
ence)/meaning (middle conventional reading)/worldview (high spiritual intuition)
(cf. Panofsky 1932).
But we must remember that knowledge about and knowledge through are
completely sufficient for museum visitors, especially for laypeople during their informal
learning. So we do not have to reject the classical definition of knowledge and we
do not have to think that the deepest knowledge we obtain from pictures is
spiritual (or moral).
In addition, since knowledge about and knowledge through can be taught in the
form of propositional statements, they are easily exhibited with captions in
museums. Concerning knowledge through, we can exhibit the picture by the side
of other pictures relevant to it, and the contexts in which the maker
really produced the picture.
For this proposal we can find support from a practice actually carried out
in a museum. Meighen S. Katz reported in his article on the exhibition This Great
Nation Will Endure (2004/2005, at the Franklin D. Roosevelt Presidential Library and
Museum) that in this exhibition Lange's Migrant Mother was shown in a
quite interesting way. It was exhibited with a computer interaction “which allowed
the visitors to access not just the familiar image, but Lange's full series. By
viewing the lead-up photos the museum visitors were able to see the process of
compositional framing and decisions that Lange made” (Katz 2012, 332). Such
contextual display of historical facts about the making of a picture conveys to us what a
society once requested of the maker, and what the maker wanted to show people in
order to satisfy that request or to create a new vision against it. Pictures
become historical evidence.

3.2 See a Picture Not as an End Product, But as a Process

3.2.1 Knowledge in the Process

Lopes begins his argument from the viewer who expects to receive knowledge
from a picture as an end-product. From this point of view pictures might remain
open-ended questions with no definitive answer, and all that viewers
can do is imagine the picture's lesson. Consequently viewers are always
forced to jump from their unreliable perceptual contents to equally unreliable
lessons [or intentions of imaginary makers]. However, if we change our point of view
from viewers' receptive experience to makers' generative process, the situation
completely changes: even though a work of art as an end-product remains
an open-ended question, the process through which a work of art is made
is a problem-solving process in which the maker struggles to find an
optimal solution to an artistic problem.
Gregory Currie once called this process a “heuristic path” (Currie 1989). “In
speaking of a scientist’s ‘heuristic path’ to a theory I mean the process whereby
the theory was arrived at; the facts, methods and assumptions employed, including
analogical models, mathematical techniques and metaphysical ideas. […] And I
wish to take over the spirit of this idea for our analysis of artworks, though it will
undergo modification in the process” (Currie 1989, 113).
Along this line, many researchers have already produced studies that give
strong support to our proposal: for example, studies of the drawing
process (Fujihata 2008), cognitive studies of artists' creative processes (Yokochi and
Okada 2007), cognitive science of the design process (Goel 1995), and cognitive studies
of creativity and knowledge transmission through copying (Ishibashi and Okada
2004, 2010).
Among these, the two papers by Ishibashi and Okada are especially worth
mentioning. On the one hand, Ishibashi and Okada (2010) recognize
the creative process of drawing as a kind of problem-solving process and demon-
strate that laymen's copying of artists' drawings is useful for “constraint relaxation,”
which constitutes a precondition of creative drawing. According to the paper, in
the process of copying, laymen can use the knowledge structures in artists'
drawings as guidance, reflect with their help upon the knowledge
structures they already had, and free themselves from the constraints of
their existing ideas about drawing. On the other hand, in Ishibashi and Okada (2004),
copying is a way to understand oneself: it reveals our own knowledge structures and
their limits (concerning self-awareness and self-change in viewers' aesthetic
experience, see Pelowski and Akiba 2011).

3.2.2 Material Knowledge in Making Pictures

Among the countless matters relevant to pictures as process, the exploration of
pictorial materials suitable for the maker's expression is one of the most important.
Making pictures is to transform materials into a structured form. Without materials,
pictures cannot exist. If the maker handles materials badly, the picture
will physically collapse. Unfortunately, the studies mentioned in the above
section (see Sect. 3.2.1) deal only with drawing, so we must refer briefly to this
material knowledge.
For example, at Aichi Prefectural University of Arts and Music, undergraduate
students of the painting course usually spend two years learning the nature of
materials. In a chemistry class they learn the character of each material and
the combinations of different materials. In another class they learn how past
Great Masters found their own material combinations in order to realize their ideal
expressions. After this basic learning, they practically experiment with combining
various materials, acquire a full knowledge of their characteristics,
and try to find new combinations in order to know the limits and possibilities for
future productions.
So we might say that pictures are the accumulated knowledge which results
from the dialogue between makers and materials. Therefore, without material
knowledge we can never say that we fully obtain knowledge from a picture.

3.3 From the Classical Definition of Knowledge as Justified True Belief
to 'Belief-Independent' Informational Systems
As I pointed out (see Sect.2.2), if we are allowed to discard the classical defini-
tion of knowledge and accept the alternative view that knowledge is belief-
independent and needs no justification, we can construct the entire argument in a
quite different way. We can begin it with an alternative worldview, that is, the
world as “flow of information” (Dretske 1981, cited in Lopes 1996, 114). I quote
a paragraph from Gareth Evans' book The Varieties of Reference. Evans takes the
famous Mueller-Lyer illusion as an example, points out that its perceptual
illusion (one line appears to us longer than the other) still continues even when we
are sure that it is not true, and says:
“In general, it seems to me preferable to take the notion of being in an
informational state with such-and-such content as a primitive notion for philosophy,
rather than to attempt to characterize it in terms of belief. […] the subject's being in
an informational state is independent of whether or not he believes that the state is
veridical.” (Evans 1982, 124; see also Lopes 1996, 101–106)
But one can object to this idea, insisting that it is not knowledge but informa-
tion. In fact Evans reserves 'belief' for a more sophisticated cognitive state, for
example, judgment (Evans 1982, 124). Lopes also speaks of this kind of informa-
tion. He thinks that it constitutes the “air” (emotional expression) of artworks,
not knowledge.
“Smoke indicates fire, and measles indicate an infection of Morbillivirus. In
each of these cases the indicator state carries information about an indicated state:
given the indicator’s state, the conditional probability of the indicated state is one
in one. The position of a speedometer needle also indicates something – for ex-
ample, that the vehicle is moving at 50 kilometers per hour. Normally it does this
by carrying information about the speed of the vehicle. […] Natural expression-
looks frequently carry information about emotions. […] Expression-looks are
more like speedometer needles than measles. The smile is part of a mechanism
designed (in fact, evolved) to carry information about emotions.” (Lopes 2006, 75)
But is information which results from informational systems, in other words
from an evolved mechanism, not knowledge? Probably we do not have to think so.
If such information comes from mechanisms which we all share as a result of
evolution, and it conveys to us the essential features of objects or facial expres-
sions, then we could have the right to say it might be knowledge. And if pictures,
as information systems, can also convey to us the essential features of objects or
facial expressions, we might say pictures convey knowledge.
For this proposal, neuroscience may give strong support. For example, Semir
Zeki says:
“[…] the function of art that is very similar to the function of the brain: to rep-
resent the constant, lasting, essential and enduring features of objects, surfaces,
faces, situations, and so on, and thus allow us to acquire knowledge not only about
the particular object, or face, or condition represented on the canvas but to gener-
alize from that to many other objects and thus acquire knowledge about a wide
category of objects or faces. In this process, the artist, too, must be selective and
invest his work with attributes that are essential, and discard much that is super-
fluous. It follows that one of the functions of art is an extension of the major
function of the visual brain.” (Zeki 1999, 9-10)
Of course we know there are numerous criticisms of his idea. And someone
might say that there exists a researcher like Scherer, who criticizes the idea of an
evolutionarily programmed emotional system which automatically releases five
fundamental emotions, such as anger, as basic facial expressions (Scherer 2008).
However, Scherer also utilizes pictures, including paintings and photography,
for example, Caravaggio’s Judith, as evidence of his theory of “component proc-
ess model of emotion” which allegedly gives us an explanation of “a universally
valid iconic representation of emotions” (Scherer 2008, 249). In this sense his
argument does not negate our proposal that pictures can convey knowledge about
facial expressions and emotions. Here we find a possibility that pictures can be
knowledge media. They can convey knowledge about “a wide category of objects
and faces” (Zeki 1999).

4 Conclusion: Three Alternative Ideas of Pictures and Knowledge
Now we have three alternative ideas of pictures as conveyors of knowledge
(Table 4.1).

Table 4.1 Three alternative ideas of pictures and contents of knowledge

Kind of pictures           Contents of knowledge        Research fields that
                                                        support our proposal

Pictures as objects and    Pictures' properties;        History, anthropology,
facts                      historical circumstances     practices in museums
                           in which the makers
                           worked

Pictures as process        Problem-solving process;     Cognitive science,
                           material knowledge           chemistry, information
                                                        science

Pictures as informational  Essential features of        Neuroscience,
indicators which result    objects and faces            information science
from informational
systems

We can easily access the third kind, because we are all products of evolution
and share the basic informational systems it evolved. We can also easily access the
first kind, because its contents are propositionally explicable; they are therefore
easily combined with other kinds of media (texts, etc.) and exhibited in museums.
The knowledge we obtain from pictures as process we can access through
combinations with other kinds of media, such as videos which record the process of
making a picture. In addition, it is important for us to know how the maker deals
with the following matters: the degree of acceptance of already established values,
consciousness of other makers and their works, reflection on one’s own ideas, ex-
ploration of the materials which are suitable to what the maker wants to express,
development of one’s own artistic problems, relation to the societies to which the
maker belongs, etc. (Yokochi and Okada 2007, 444).
From what has been said above we can conclude that pictures can be a candi-
date for knowledge media. This conclusion results from a change in our view of
pictures ―that is, pictures as objects and facts, pictures as process and pictures as
informational indicators.
Fig.1 Giotto agli Scrovegni Visita virtuale


http://www.giottoagliscrovegni.it/eng/visita/pano/pan_01.htm Accessed 1 Feb 2012

Fig.2 Ian Connacher's Addicted to Plastic


Ars Electronica Linz GmbH (2010), 17

Fig.3 Migrant Mother, taken by Dorothea Lange in 1936


http://en.wikipedia.org/wiki/Florence_Owens_Thompson Accessed 1 Feb 2012

References
Ars Electronica Linz GmbH, Repair: sind wir noch zu retten, Linz (2010)
Currie, G.: Art works as action types. In: Lamarque & Olsen (2004), pp. 103–122 (1989)
Evans, G.: The varieties of reference. McDowell, J. (ed.), Oxford (1982)
Fujihata, M.: What is drawing process studies? In: Drawing Process Studies. Department of
Engineering, University of Tokyo (2008) (in Japanese)
Goel, V.: Sketches of thought. The MIT Press (1995)
Ishibashi, K., Okada, T.: Copying artworks as perceptual experience for creation. Cognitive
Studies 11(1), 51–59 (2004) (in Japanese)
Ishibashi, K., Okada, T.: Facilitating creative drawings by copying art works by others.
Cognitive Studies 17(1), 196–223 (2010) (in Japanese)
Iwasaki, H.: Biomedia Art: possibilities of synthetic biology from the point of aesthetics.
Science Journal KAGAKU, 747–753 (July 2010) (in Japanese)
Katz, M.S.: Reconsidering images: using the farm security administration photographs as
objects in history exhibitions. In: Dudley, S., et al. (eds.) The Thing About Museums:
Objects and Experience, Representation and Contestation, Essays in Honour of Profes-
sor Susan M. Pearce, pp. 324–337, Routledge (2012)
Lamarque, P., Olsen, S.H.: Aesthetics and the philosophy of art. Blackwell (2004)
Lopes, D.M.: Understanding pictures. Oxford (1996)
Lopes, D.M.: Sight and sensibility: evaluating pictures. Oxford (2006)
Panofsky, E.: Zum Problem der Beschreibung und Inhaltsdeutung von Werken der bildenden
Kunst. In: Kaemmerling, E. (ed.) Ikonographie und Ikonologie: Theorien-Entwicklung-
Probleme, Dumont (1932)
Pelowski, M., Akiba, F.: A model of art perception, evaluation and emotion in transforma-
tive aesthetic experience. New Ideas in Psychology 29(2), 80–97 (2011)
Scherer, K.R.: Gefrorene Gefuehle: Zur Emotionsdarstellung in der bildenden Kunst. In:
Boehm, G., et al. (eds.) Movens Bild: Zwischen Evidenz und Affekt, pp. 249–273. Wil-
helm Fink (2008)
Stolnitz, J.: On the cognitive triviality of art (1992). Reprinted in: Lamarque & Olsen
(2004), pp. 337–343. Blackwell
Yokochi, S., Okada, T.: Creative expertise of contemporary artists. Cognitive Studies 14(3),
437–454 (2007) (in Japanese)
Zeimbekis, J.: Pictures and Singular Thought. Journal of Aesthetics and Art Criti-
cism 68(1), 11–22 (2010)
Zeki, S.: Inner vision: an exploration of art and the brain. Oxford (1999)
Capturing Student Real Time Facial Expression
for More Realistic E-learning Environment

Asanka D. Dharmawansa, Katsuko T. Nakahira, and Yoshimi Fukumura

Abstract. With the development of information and communications technology,
E-learning has grown rapidly. The environment is one of the major
factors in E-learning performance, and a more realistic E-learning surrounding
contributes to an effective learning environment. Facial expressions
are assumed to have a great impact on behavior, including learning behavior.
This study attempts to transfer the real user's facial features into the virtual
learning place. A real-time facial feature detection system is developed which
continually extracts the facial expression of the E-learner. The figure in the
virtual learning environment that represents the real user, called an 'avatar',
changes when the real user's face changes. In the virtual environment, the
appropriate face changes are prepared to make the real user's face data visible.
In addition, other persons can view the user's face data through a web
component to observe the facial behavior of the E-learner.

1 Introduction

Education is one of the largest sectors of the economy in most countries. The
development of this sector is the core of future development, and most governments
invest more and more in its improvement. Traditional classroom and E-learning
settings are currently the most popular learning styles. E-learning is a
way to enhance knowledge and performance using Internet technologies to deliver
a broad range of solutions. With the increase in Internet users, the growth of
E-learning cannot be avoided: because it is a convenient, cost-effective and
consistent method, E-learning has become a most popular way to gain an education.
Over the last decade, the number of corporate universities which have learning
partners such as E-learning companies or universities grew from 400 to 1,800 [1].

Asanka D. Dharmawansa · Katsuko T. Nakahira · Yoshimi Fukumura


Dept. of Management and Information Systems Engineering
Nagaoka University of Technology
Nagaoka, Japan

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 107–116.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
40% of Fortune 500 companies have established corporate universities which are
delivering E-learning [2]. This evidence suggests that in the near future E-learning
will become a powerful tool all over the world.
Although E-learning has become more popular, it is based on a self-regulated
learning style. Students learn separately in the E-learning environment, and
must determine "what to learn" and "where to go" in each learning session.
In the virtual E-learning world, moreover, the teacher cannot immediately get
information from students and instruct them face-to-face as in the real world.
One of the ways to increase the effectiveness of a virtual-world education system
is to retrieve real-time student information, which is very helpful for making
important decisions from the teacher's perspective.
The learning environment is one of the major factors affecting student
performance in the traditional education system [3], and a similar effect can be
expected in E-learning. As a result, the use of three-dimensional virtual
environments as platforms for E-learning is gradually increasing. More
researchers and institutions try to build their learning environments like
real-world classrooms.
This paper discusses one method to overcome the above barriers, to increase
effectiveness, and to make the E-learning environment a more suitable place by
developing different kinds of tools. The system has two main purposes:

1. Making the E-learning environment more realistic
2. Identifying and analyzing student behavior during E-learning sessions

This research mainly focuses on the development of a real-time facial expression
acquiring system and a system for observing students' facial data during
E-learning sessions in the virtual world. The facial expression acquiring system
uses a geometric facial feature-based method to find the shape, texture and/or
location information of prominent components such as the mouth, eyes and nose,
which can cover the variation in the appearance of facial expressions. Through
this system, the basic facial expressions of a student can be extracted and made
visible in the virtual learning environment. A web interface is also developed to
observe the face data of the E-learner.

2 Related Works
There has been considerable interest in the potential for the development of
E-learning in universities, schools and further education [4]. With the
implementation of virtual environments, the number of E-learners has rapidly
increased; it is expected that there will be about five million online learners
within the next ten years [5]. E-learning has also spread beyond education to
other fields: it is used, for example, to supply training to workers [6], and
workplace learners can be better served by E-learning environments than by
conventional training.
Despite the significant increase in E-learners, however, it is somewhat
difficult to identify student activities in an E-learning environment. Therefore
courses conducted via E-learning use peer assessment methods to assign
appropriate marks [7]. There are investigations of the usability aspects of
E-learning interfaces that incorporate the use of an avatar as a virtual lecturer,
and researchers have analyzed users' satisfaction and views regarding a set of
facial expressions and body gestures used by a virtual lecturer in the presence
and absence of interactive context in E-learning interfaces [8, 9].
Another study analyzes affective body gestures in video sequences, exploiting
spatial-temporal features for the modeling of body gestures; the authors also
propose fusing facial expression and body gesture at the feature level using
Canonical Correlation Analysis [6]. Other researchers have tried to assess the
emotional state of learners by analyzing nonverbal behavior such as speech and
facial expressions: they developed tools to extract features from sound and video
recordings and used classifiers such as support vector machines to label
emotional states [11].
Pedagogically, it is not always true that every E-learning virtual environment
provides high-quality learning. According to Govindasamy, the development and
evaluation of E-learning involves learner and task analysis, defining instructional
objectives and strategies, testing the environment with users and producing the
initial version of the E-learning tool [12].
Eliminating the major barriers and improving the effectiveness of virtual
learning is the main target of this research. Making the virtual environment more
realistic with the user's facial features, and providing several ways to analyze
student behavior indirectly, are the major tasks of this research as a contribution
to increasing the usefulness of E-learning.

3 Architecture of the Whole System


The learning environment is set up in the three-dimensional virtual environment;
students can then access the virtual classroom and continue their educational or
other activities, as shown in Fig. 1.

Fig. 1 Architecture of the whole system


At the same time, the facial expression extracting system is also activated. The
student's real-time facial expression is extracted and connected with an HTML
web interface. The user's facial features (face, eye, mouth and nose widths and
heights), face image and expression appear in the HTML web interface, sent from
the facial expression acquiring system. This HTML web interface fulfills two
main functions:

• It works as a connection method between the real world and the virtual
learning environment.
• The web interface provides the E-learner's relevant data to the public community.

The relevant data are passed to the virtual environment, and according to those
data the appropriate avatar is changed. For example, when the real user smiles,
the appropriate avatar also smiles. In addition, the web interface hands the
necessary data to the server to store valuable data for further analysis. There are
two ways to observe student behavior:

1. Direct observation through the virtual environment.
2. The teacher or any other third person can access the student's real-time facial
features through the web interface.

This is the whole system architecture for making the three-dimensional virtual
learning environment more realistic and for facilitating observation of students'
facial behavioral patterns.
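As a rough illustration of the data flow just described, the following sketch shows how one frame's facial features and detected expression might be serialized for the web interface and mapped to an avatar gesture. The field names, expression labels, and gesture names are illustrative assumptions, not the paper's actual protocol.

```python
import json

# Hypothetical field names for the facial features listed above
# (face, eye, mouth and nose widths/heights); not the system's real schema.
FEATURE_KEYS = ("face_w", "face_h", "eye_w", "eye_h",
                "nose_w", "nose_h", "mouth_w", "mouth_h")

def build_payload(features, expression):
    """Serialize one frame's facial data for the HTML web interface."""
    record = {k: features[k] for k in FEATURE_KEYS}
    record["expression"] = expression
    return json.dumps(record)

def avatar_gesture(expression):
    """Map a detected expression to an (assumed) avatar animation name."""
    gestures = {"smile": "express_smile", "surprise": "express_surprise"}
    return gestures.get(expression, "express_neutral")
```

In a real deployment the JSON record would be posted to the web interface and stored on the server; only the record construction is shown here.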

4 Facial Expression Identification System

Fig. 2 Facial expression extraction procedure


The real-time facial expression is extracted through this system; the whole
procedure is indicated in Fig. 2. The real user's video is obtained continuously
using a webcam and consists of frames, which are then analyzed one by one.
After obtaining a frame, the objects in that frame are detected; as a result of
that analysis, the face can be detected. The process of finding the face, the
other components and the facial expressions is explained as follows.

4.1 Methodology of Object Detection


This phase describes the way to detect the face and identify face components.
The procedure for object detection is based on the work of Viola and Jones [13].
In this research, a Haar feature-based cascade classifier is used for object
detection because of the following factors:

– Detection at any scale
– Face detection at 15 frames per second for 384×288 pixel images
– 90% of objects detected
The application of Haar-like features in a real-world application is shown in
Fig. 4. It is necessary to develop a classifier based on Haar features for the face
and for each face component. First, the appropriate Haar features are applied to
positive images containing the specific face component, and the filter range and
threshold value are noted, as shown in Fig. 3. The threshold value (fi) is
determined by subtracting the summation of white-region pixel values from the
summation of dark-region pixel values.

fi = Σb − Σw (1)

where Σw is the sum of the white-region pixel values and Σb is the sum of the black-region pixel values.
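Equation (1) amounts to comparing two rectangle sums, which is why the detector can run at video rate: with an integral image, each rectangle sum becomes a constant-time lookup. Below is a minimal NumPy sketch; the (x, y, w, h) rectangle convention and the function name are illustrative, not taken from the original system.

```python
import numpy as np

def haar_feature_value(gray, white_rect, dark_rect):
    """fi = (sum of dark-region pixels) - (sum of white-region pixels).

    gray is a 2-D grayscale image; each rect is (x, y, w, h).
    """
    # Integral image with a zero border: ii[y, x] = sum of gray[:y, :x]
    ii = np.pad(np.cumsum(np.cumsum(gray.astype(np.int64), axis=0), axis=1),
                ((1, 0), (1, 0)))

    def region_sum(rect):
        x, y, w, h = rect
        # Four-corner lookup gives the rectangle sum in O(1)
        return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

    return region_sum(dark_rect) - region_sum(white_rect)
```

On a uniform image, equal-area white and dark rectangles cancel and fi is zero; a response above the learned threshold signals the contrast pattern the feature encodes.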
The same procedure is applied to many positive images, and average values for
the threshold and filtering range are obtained. After the classifier is developed,
it can be applied to real-time images as shown in Fig. 4. To detect the face area, the

Fig. 3 Haar-like classifier features Fig. 4 Apply Haar-like classifier features


112 A.D. Dharmawansa, K.T. Nakahira, and Y. Fukumura

relevant features must be scanned over the whole image area, which is a relatively
time-consuming task. Therefore, the relevant region for each face component can be
roughly set in advance, as discussed in the next part. After applying the relevant
Haar feature to the real-world image, its fi value can be determined. If the fi
value of the real image is greater than the fi value of the classifier, the relevant
face feature is present for that Haar feature. All of the selected Haar features must
be satisfied to confirm the presence of a face component, as indicated in Fig. 4.
This is the way the face and face components are detected.

4.2 Define Region of Interest to Identify Face Components


Having identified the face area, recognition of the face components is carried out.
According to Athanasios Nikolaidis and Ioannis Pitas, the details of the regions of
interest for each face component can be obtained. Using those details together with
the experimental results of this system, suitable regions of interest could be
defined [14]. As shown in Fig. 5, the height of the image can be divided
into several parts (zj) to identify the face components:
zj = { xj | xj > 0 and 1 ≤ j ≤ 5 } (2)
where j indexes the parts used to set the regions of interest for each
component. Here the parts are named from the top to the bottom of the image:
top, eye, nose, mouth, and bottom. Each zj has a value of 18.18, 15.15, 6.66,
26.66, and 33.33, respectively.
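The five zj values sum to roughly 100, so reading them as percentages of the image height gives concrete pixel bands. A small sketch under that assumption (the names are illustrative):

```python
# Band heights as percentages of the image height, top to bottom
# (assumption: the z_j values in Eq. (2) are percentages; they sum to ~100).
BANDS = {"top": 18.18, "eye": 15.15, "nose": 6.66,
         "mouth": 26.66, "bottom": 33.33}

def regions_of_interest(image_height):
    """Map each band name to its (y_start, y_end) pixel range."""
    regions, y = {}, 0
    for name, pct in BANDS.items():
        h = round(image_height * pct / 100)
        regions[name] = (y, min(y + h, image_height))
        y += h
    return regions
```

For a 100-pixel-high face image this yields, for example, an eye band of rows 18 to 33 and a mouth band of rows 40 to 67, so each component cascade only scans its own band.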
As Fig. 6 shows, the selected method is sufficient to detect the face components.
The system identifies the locations of the eyes, nose, and mouth; the corner
points of the mouth, the eye points, and the nose point are also detected. According
to the size of each face component, an appropriately sized rectangle appears on the
detected component. Detection may vary with the accessories worn by the user,
although the system can still detect the eye points reasonably well when the user
wears spectacles. The constraints of this system are as follows:

– Viewing direction
– Lighting condition
– Distance between the user and the computer
– Accessories such as earrings or spectacles
Based on the rectangles that appear on the face components, the relevant sizes of
the face features are observable, as shown in Fig. 6. When the user behaves in a normal way,

Fig. 5 Region of interest Fig. 6 Detection of face components Fig. 7 Face variables

those features may vary, and the corresponding rectangles change accordingly.
Thus, instead of raw face-component data, the relevant rectangle sizes can be
measured.

4.3 Face Variables


Based on the detected face components, different face variables are acquired as
shown in Fig. 7. Ten face variables are used to determine the facial expression
of the user:

EH – Eye Height  EW – Eye Width  MH – Mouth Height  MW – Mouth Width
NH – Nose Height  NW – Nose Width  Fh – Face Height  Fw – Face Width

4.4 Classify in to Facial Expressions


After calculating the average size of each face component in the natural
(neutral) situation, classification of the real user's facial expression begins.
For each occurrence, the system compares the average natural size with the
current size of each component. According to the changes in component size,
the system identifies basic facial expressions: "Neutral", "Happy", "Surprise",
and "Sad". The most strongly affected face variables are identified for each
facial expression, and new axes are defined for each expression based on the ten
face variables. Following Darwin and the experimental data of this system, the
behavior of each face component is classified to identify facial expressions, as
shown in Table 01 [15].

Table 01 Classification of facial expressions (ΔX = Xi − XAvg, the change of variable X from its neutral average)

Facial Expression  Behavior
Neutral    −2 < ΔEW < 2 & −2 < ΔEH < 2 & −2 < ΔNW < 2 & −2 < ΔNH < 2 & −2 < ΔMW < 2 & −2 < ΔMH < 2
Happy      ΔMW > 10 & ΔEW < −10 & ΔEH < −2
Surprise   ΔMW < −5 & ΔMH > 10 & ΔEW > 10 & ΔEH > 10
Sad        ΔEW < −10 & ΔEH < −10 & ΔMW > 5 & ΔNW > 3

Avg – average value of the first ten frames; i – frame number (i > 10)
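Table 01 reads directly as a rule-based classifier over the deviations of each variable from its first-ten-frame average. The sketch below assumes the table's differences are current value minus neutral average (the subscripts were lost in typesetting), checked in a fixed order:

```python
def classify_expression(d):
    """Classify from d: variable name -> (current value - neutral average).

    Thresholds follow Table 01; the sign convention (current minus
    average) is an assumption, since the subscripts were garbled.
    """
    if all(-2 < d[k] < 2 for k in ("EW", "EH", "NW", "NH", "MW", "MH")):
        return "Neutral"
    if d["MW"] > 10 and d["EW"] < -10 and d["EH"] < -2:
        return "Happy"       # mouth widens, eyes narrow
    if d["MW"] < -5 and d["MH"] > 10 and d["EW"] > 10 and d["EH"] > 10:
        return "Surprise"    # mouth opens tall, eyes widen
    if d["EW"] < -10 and d["EH"] < -10 and d["MW"] > 5 and d["NW"] > 3:
        return "Sad"
    return "Unknown"         # no rule matched this frame
```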

Fig. 8 Facial Expression detection system



With the real-time facial expression recognition system, the appropriate
facial expression can be identified as shown in Fig. 8.

5 Facial Data in Web Application


After recognizing the user's facial expression, the appropriate data are delivered
to the web interface. As shown in Fig. 9, the details of the face components and
the relevant data appear there and are updated continuously from the real user's
face data. The web interface is the midpoint of this system. Although the
connection between the real world and the virtual environment is important, it is
somewhat difficult to connect a virtual learning environment to an externally
developed system. However, relevant data can be passed to the virtual world using
HTTP requests. Therefore, an HTML web interface is introduced as the mechanism
connecting the system and the virtual world, and the ultimate objective can be fulfilled.
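Since the virtual world can only talk to the outside via plain HTTP requests, the extraction system can push its results to the web server with a simple POST. A sketch of that step follows; the endpoint URL and payload shape here are hypothetical:

```python
import json
import urllib.request

def post_face_data(expression, variables,
                   url="http://example.com/face-data"):  # hypothetical endpoint
    """POST the recognized expression and face variables as JSON, so the
    web interface can display them and the virtual world can poll them."""
    payload = json.dumps({"expression": expression,
                          "variables": variables}).encode("utf-8")
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status
```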

6 Virtual World Avatar Changes

Fig. 9 Web interface for face data

Normally, a virtual world avatar has no facial expressions; most of the time it
shows a neutral face image. This is a barrier to making a more realistic
learning environment. As a first step to overcome it, a system for animating basic
facial expressions in the virtual environment was developed, with which an avatar
can make basic facial expressions while engaged in learning.
In this work, the effort is to connect the real user with the appropriate
avatar in the virtual environment. The web interface described above can send the
relevant data to the virtual environment, and according to those data the
appropriate facial expression becomes visible on the avatar. As a result, the real
user's facial behavior appears on the avatar's face.

7 Future Work
Making the avatar come alive in the educational environment may affect the
educational performance of the E-learner. Although it is a virtual environment,
students learn about real-world applications, natural things, day-to-day
knowledge, and so on. In such cases, they have to compare with and think about
the real world. A realistic virtual environment provides some impression of the
real world, which is very helpful for continuing their learning activities.
Further, thinking ability may differ from person to person and may be aided by a
realistic environment.
As yet, there are no experiments identifying how this realistic virtual
environment affects student learning behavior. The next step is to conduct such
experiments and verify whether the realistic environment has any effects on
student learning behavior in all aspects.

8 Conclusion
With the growing interest in E-learning, three-dimensional virtual learning
methods have become popular. To make a virtual environment more realistic, a
live avatar with facial features and a connection between the real user and the
virtual avatar are important. Therefore, a system for extracting the real user's
facial expressions was developed, and real-time facial expressions can be
extracted. To acquire the facial features, a geometric facial feature-based
method and the location information of prominent components are used. With this
system, real-time facial features can be extracted continuously.
Once the real user's facial expression is extracted, it needs to be sent into
the virtual learning environment; the relevant data can be delivered using the
web interface. To indicate the four basic facial expressions, the avatar
face-changing system is initialized before connecting the virtual world and the
real world. After introducing the avatar face-changing system, the connection
between the avatar and the user is established. Finally, according to the real
user's face changes, the appropriate avatar face also changes.
E-learning is not conducted face to face, so student behavior is difficult to
analyze. This system, however, provides the face details of the real user to any
other person through the web interface. Anyone can observe the E-learner's face
data continuously, and the data are loaded into a database for further use. This
system makes the virtual learning environment more realistic with face data and
provides any person with the facility to observe student facial behavior.

References
[1] Moe and Blodgett, op. cit., endnote 21, p. 229. Meister op. cit., endnote 23 in US Web
Based Education Commission Report (December 2000)
[2] Moe and Blodgett, op. cit., endnote 21, p. 229, Gregory, Wilson and Husman (2000)

[3] Higgins, S., Hall, E., Wall, K., Woolner, P., McCaughey, C.: The Impact of School
Environments: A literature review. The Centre for Learning and Teaching School of
Education, Communication and Language Science, University of Newcastle
[4] Hughes, J., Attwell, G.: A framework for the evaluation of E-learning. Paper-
presented to a seminar series on Exploring Models and Partnerships for eLearning in
SME’s, held in Stirling, Scotland and Brussels, Belgium (2002/2003),
http://www.theknownet.com/ict_smes_seminars/papers/Hughes
(retrieved February 14, 2007)
[5] Bjur, J.J.: Auditory Icons in an Information Space. Department of Industrial Design,
School of Design and Craft. Goteberg University, Sweden (1998)
[6] Paynea, A.M., Stephensonb, J.E., Morrisb, W.B., Tempestb, H.G., Milehamc, A.,
Griffinb, D.K.: The use of an E-learning constructivist solution in workplace learning.
International Journal of Industrial Ergonomics 39(3), 548–553 (2009)
[7] Chang, T.-Y., Chen, Y.-T.: Cooperative learning in E-learning A peer assessment of
student-centered using consistent fuzzy preference. Expert Systems with
Applications 36(4), 8342–8349 (2009)
[8] Alseid, M., Rigas, D.: Users’ views of Facial Expressions and Body Gestures in E-
learning Interfaces: an Empirical Evaluation. In: SEPADS 2009 Proceedings of the 8th
WSEAS International Conference on Software Engineering, Parallel and Distributed
Systems, pp. 121–126 (2009)
[9] Alseid, M., Rigas, D.: Empirical results for the use of facial expressions and body
gestures in E-learning tools. International Journal of Computers and
Communications 2(3) (2008)
[10] Shan, C., Gong, S., McOwan, P.W.: Beyond Facial Expressions: Learning Human
Emotion from Body Gestures. In: British Machine Vision Conference 2007, paper-
276 (2007)
[11] Rothkrantz, L., Datcu, D., Chiriacescu, I., Chitu, A.: Assessment of the emotional
states of students during E-learning. In: International Conference on E-learning and
the Knowledge Society - E-learning (2009) ISBN:1313-9207
[12] Govindasamy, T.: Successful implementation of E-learning Pedagogical
considerations. The Internet and Higher Education 4, 287–299 (2001)
[13] Viola, P., Jones, M.J.: Robust real-time object detection. International Journal of
Computer Vision 57(2), 137–154 (2004)
[14] Nikolaidis, A., Pitas, I.: Facial feature extraction and pose determination. Pattern
Recognition 33, 1783–1791 (2000)
[15] Matsumoto, D., Ekman, P.: Facial expression analysis. Scholarpedia 3(5), 4237
(2008)
Character Giving Model of KANSEI
Robot Based on the Tendency of
User’s Treatment for Personalization

Hiroki Ogasawara and Shohei Kato

Abstract. Recently, many types of robots have been developed not only
for industrial manufacturing but also for interacting with humans. Robots
designed for human-robot interaction are expected to be able to communicate
with humans smoothly. In this paper, we propose a character giving
model for a KANSEI robot. This model makes robots individual beings that
vary with each user, and we aim to develop more human-like and empathetic
robots using it. Robots dynamically acquire their own characters
based on the tendency of the user's behaviors, which are classified into two dimen-
sions: dominance-submission and acceptance-rejection. Through the interac-
tion experiments between humans and the robot with the proposed model, we
confirmed that the proposed model could give various characters to the robot,
and that the character, which was given through communication with a user,
suited each of the users.

1 Introduction
Recently, various robots have been used not only in manufacturing but also for
communication with humans, such as PaPeRo [1] and wakamaru [4]. Commu-
nication robots are expected to take care of elderly people and comfort people.
Therefore, they should be able to communicate with humans smoothly. Many
studies that aim to give robots these abilities have been reported. Yokoyama [8]
analyzed the timing of non-verbal information in communication between humans
and used these timings to control the non-verbal information of robots.
Takada [7] proposed a system controlling robots' facial actions that outputs
facial actions in response to humans' behaviors.

Hiroki Ogasawara · Shohei Kato


Nagoya Institute of Technology,
Gokiso-cho Showa-ku Nagoya 466-8555 Japan
e-mail: {oga,SHOHEY}@juno.ics.nitech.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 117–127.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
118 H. Ogasawara and S. Kato

Fig. 1 Overview of character giving model.

In this paper, we aim to make robots more human-like and propose

robots that have their own individualities. We can see various kinds of individu-
ality in human behavior, such as gestures, accents, and character; we
focus on character. In humans, it is thought that character is shaped
by communication with other people and by one's environment.
Therefore, we propose robots that form their own characters from com-
munication with humans and their environments. Communication with a
compatible person is thought to be more interesting than communication with a
stranger, so we aim to make the robot more empathic by
changing its character based on communication with its user.
Fig. 1 shows the overview of our character giving model. Robots store
the tendency of the user's behaviors and form their own characters based on
the stored tendency. Their characters are expressed through the tendencies
of the robots' emotions, which are shown by facial expressions. We
conduct interaction experiments between humans and a robot with the
proposed model, and confirm that this model has the ability to give robots
various characters and make them more human-like.

2 Character Giving Model


In the field of personality psychology, it is known that many kinds
of causes give children their own characters. In this paper, we give
robots their own characters based on parental behaviors.

2.1 Parent-Child Relationships


In this paper, we use "The Psychology of Parent-Child Relationships" [6]
as the basis of the character giving model. In this book, Symonds reported on the rela-
tionships between parental behaviors and children's characters. He classified
Character Giving Model of KANSEI Robot 119

Fig. 2 Relationships between parental behaviors and children’s character.

parental behaviors into two dimensions: dominance-submission and

acceptance-rejection. The tendency of these behaviors shapes children's char-
acters. Fig. 2 shows the relationships between parental behaviors and children's
characters. For example, the tendency toward dominance and acceptance is defined
as "overprotection"; under this tendency, the child is characterized as overdepen-
dent and infantile.
In the proposed model, the user's behavior is classified into four types:
dominance, submission, acceptance, and rejection. The tendencies of the user's
behavior dynamically characterize the robot.

2.2 Character Expression


Saitoh [5] reported on the relationships between interpersonal behaviors and
emotions. In this paper, we use these relationships as the relationships
between the user's behaviors and the robot's emotions. Saitoh classified interper-
sonal behaviors and emotions into eight types. We use four behaviors
and four emotions from Saitoh's classification, choosing the behaviors
that correspond to parental behaviors and the emotions that correspond to
those behaviors. Table 1 shows the relationships between behaviors and
emotions.
We think it is hard for robots to give a human-like impression when they
show emotions uniformly based on these relationships. So, we propose a method
for characterizing robots in which robots show emotions based on
expressional tendencies that vary with their characters. We aim to make robots
more human-like with this method. In this paper, we treat robots' char-
acters as expressional tendencies, and we express the characters defined
by Symonds by increasing the expressional tendencies corresponding to users'
behaviors.

Table 1 Relationships between Interpersonal Behaviors and Emotions

interpersonal behaviors emotions


dominance inferiority
submission superiority
acceptance affection
rejection antipathy

2.3 Communication Model


Fig. 3 shows the proposed model for giving robots their own characters. First, a
robot shows an action to the user. Second, the user feeds back one of the interpersonal
behaviors in answer to the robot's action. The robot stores interpersonal behaviors
classified into the four types shown in Table 1, and increases the
expressional tendency of the emotion corresponding to each interpersonal behav-
ior. Finally, the robot shows a facial expression for the emotion chosen
according to its expressional tendencies and the interpersonal behavior. We define this flow
as an interchange. Human-robot communication is constructed by repeating
this interchange.
We describe the tendency toward inferiority as T1, superiority as T2, affection
as T3 and antipathy as T4. These tendencies are defined as:

T1 = (D − S) · a if D > S, otherwise 0, (1)

T2 = (S − D) · a if S > D, otherwise 0, (2)

T3 = (A − R) · a if A > R, otherwise 0, (3)

T4 = (R − A) · a if R > A, otherwise 0, (4)

where D, S, A and R are the numbers of dominance, submission, acceptance
and rejection behaviors, respectively, and a is a constant defining the strength of
the expressional tendencies. Here, P0 represents the probability of expressing an emotion based
on the relationships shown in Table 1, and we describe the probability of
expressing inferiority as P1, superiority as P2, affection as P3 and
antipathy as P4. These probabilities are calculated as:
P0 = n / (n + T1 + T2 + T3 + T4), (5)

Table 2 Examples of Situation Sentences and Users' Actions

Situations with ifbot              Dominance   Submission           Acceptance            Rejection
wants to play with you.            restrain    play                 play later            reject
wants to help cleaning with you.   turn down   rely                 clean together        let it go off
begs for toys.                     restrain    buy                  buy on another time   reject
overslept.                         admonish    forgive and prepare  help to prepare       leave it alone
is singing.                        prevent     commend              sing together         lay off

Fig. 3 Structure of character giving model.

Pi = Ti / (n + T1 + T2 + T3 + T4),  (i = 1, 2, 3, 4) (6)

where n is a constant: the number of interpersonal behaviors stored in the
robot. The robot thus reflects the interpersonal behaviors of the past n interchanges in its
own character.
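Equations (1) through (6) can be sketched directly. The function names below are illustrative, with the experimental settings used later (n = 30, a = 2) as defaults:

```python
def tendencies(D, S, A, R, a=2):
    """Eqs. (1)-(4): expressional tendencies toward inferiority (T1),
    superiority (T2), affection (T3) and antipathy (T4), computed from
    counts of dominance, submission, acceptance and rejection behaviors."""
    T1 = (D - S) * a if D > S else 0
    T2 = (S - D) * a if S > D else 0
    T3 = (A - R) * a if A > R else 0
    T4 = (R - A) * a if R > A else 0
    return T1, T2, T3, T4

def expression_probabilities(D, S, A, R, n=30, a=2):
    """Eqs. (5)-(6): [P0, P1, P2, P3, P4], which sum to 1."""
    T = tendencies(D, S, A, R, a)
    denom = n + sum(T)
    return [n / denom] + [Ti / denom for Ti in T]
```

For a user who mostly dominates and accepts, T1 and T3 grow, so the robot more often expresses inferiority and affection, matching the "overprotection" pattern of Fig. 2.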
In this paper, we use a GUI to communicate with the robot. Fig. 4 shows
a snapshot of communication between a user and the robot through the
GUI. The robot presents situation sentences as its actions; users
read each situation sentence and select one of four action buttons on the
GUI. The four user actions correspond to the interpersonal behaviors shown in
Table 1, although these correspondences were not shown explicitly to users.
We prepared 30 situation sentences based on the behaviors of five-year-
old children. Table 2 shows examples of the situation sentences and users'
actions. The situation sentences are unrelated to one another and are shown in ran-
dom order. The communication ends when all situation sentences have
appeared.

Fig. 4 Snapshot of communication.

3 Experiments and Results


We performed two experiments to confirm the efficacy of the proposed model.
Both experiments had 20 subjects. We used the KANSEI robot "ifbot" [2] in the
experiments. The number of interpersonal behaviors stored in the robot was
30 (n = 30), and the strength of expressional tendencies was 2 (a = 2).

3.1 Evaluation of Character


We performed a character evaluation experiment to confirm that the proposed
model could make robots express the characters defined by Symonds. For this
experiment, we prepared four ifbots characterized by the four parental tendencies
defined by Symonds. The characters of the prepared ifbots were given as follows.
ifbotA: ifbot characterized by "cruelty";
had a strong tendency toward inferiority and antipathy
ifbotB: ifbot characterized by "overprotection";
had a strong tendency toward inferiority and affection
ifbotC: ifbot characterized by "indulgence";
had a strong tendency toward superiority and affection
ifbotD: ifbot characterized by "neglect";
had a strong tendency toward superiority and antipathy
In this experiment, the ifbots' characters did not change. Subjects communicated
with these ifbots and evaluated their impressions on five factors on a scale of
1 to 7. These factors are based on the "Big Five" factors [3]. For these factors, the
larger and more nearly equilateral the pentagon of the cobweb chart shaped by the
factors' evaluation values, the more the character is emotional

and ideal. Fig. 5 shows the cobweb charts shaped by the average evaluation val-
ues of each factor in this experiment, and Table 3 shows the ifbot combinations
that showed significant differences on some factors. Conscientiousness and
openness are omitted from the table because no robot combination
showed a significant difference on these factors.

Evaluation by "Big Five" Factors. According to Fig. 5, the cobweb

charts had different tendencies, so the characters expressed using the pro-
posed model were confirmed to give users the impression that the
ifbots' characters differed. On the neuroticism factor, ifbotC had the high-
est score and ifbotA the lowest, and this combination showed
a significant difference on the factor. ifbotC had a strong tendency toward
superiority and affection, and it rarely expressed negative emotions such as
inferiority and antipathy. On the contrary, ifbotA had a strong tendency to-
ward inferiority and antipathy. From these results, we consider that the tendency
toward superiority and affection increases the neuroticism factor's score, while
the tendency toward inferiority and antipathy decreases it. Addition-
ally, there was little difference between the neuroticism scores of ifbotB
and ifbotD, so we think the effects of the individual emotions on the neuroticism
factor are about equal. On the extraversion and agreeableness factors, ifbotB and
ifbotC had high scores, while ifbotA and ifbotD had low scores. Also, on agreeableness, if-
botB and ifbotC showed significant positive differences over ifbotA and ifbotD.
ifbotB and ifbotC had a strong tendency toward affection, whereas ifbotA and ifbotD
had a strong tendency toward antipathy. Therefore, we think the tendency to-
ward affection or antipathy affects the extraversion and agreeableness factors.
On the conscientiousness and openness factors, ifbotB scored slightly higher than the
others, but there was no major difference among the ifbots. Overall, the cobweb
charts of ifbotB and ifbotC formed large diagrams close to an equilateral
pentagon, while those of ifbotA and ifbotD formed small, irreg-
ular diagrams. From these results, we consider that ifbotB and ifbotC obtained
characters that impressed users favorably.

Evaluation by Parent-Child Relationships. In the parent-child relation-
ships, ifbotA is defined as a "nervousness" child. In the "Big Five" factors, we
think the low neuroticism score of ifbotA shows a nervous character,
so ifbotA could express the character defined by Symonds. ifbotB is de-
fined as an "overdependent" and "infantile" child, but in the "Big Five" factors
each factor of ifbotB had an average score, so no dependent or infan-
tile character appeared. ifbotC is defined as an "authority-rejecting" and "commanding"
child; we think ifbotC could express a commanding character through its high scores
on extraversion, agreeableness, and neuroticism. ifbotD is defined as an "ag-
gressive" child; we think ifbotD could express an aggressive character through its
low scores on extraversion and agreeableness.

Fig. 5 Character evaluation by “Big Five” factors.

Table 3 ifbots' Combinations Which Showed a Significant Difference

factor          positive  negative
extraversion    ifbotB    ifbotA
                ifbotC    ifbotA
                ifbotC    ifbotD
agreeableness   ifbotB    ifbotA
                ifbotC    ifbotA
                ifbotB    ifbotD
                ifbotC    ifbotD
neuroticism     ifbotC    ifbotA

3.2 Evaluation of Impression


We performed an impression evaluation experiment to confirm that the robot char-
acterized by the proposed model has high humanity and empathy. First, in
this experiment, subjects communicated with the ifbot on which the proposed
model was mounted. We then prepared three ifbots based on this communication, and
subjects communicated with them. The characters of the prepared ifbots
were given as follows.
ifbotM: expresses emotions based on the tendencies of the subject's behaviors
ifbotR: expresses emotions in random order
ifbotS: expresses emotions based on the tendencies of others' behaviors
Subjects did not know the characters of these ifbots, and each subject commu-
nicated with them in random order. Before this experiment, we per-
formed pre-communication between 20 subjects, who differed from those in this

experiment, and the ifbot with the proposed model mounted. Through the
pre-communication, we collected the data of 20 tendencies of users' behav-
iors. ifbotS was characterized by the tendency in the pre-communication
data nearest to the origin-symmetric opposite of the subject's tendency. After
the experiment, subjects evaluated their impressions by Semantic Differen-
tial and answered a questionnaire on distinguishing the ifbot characterized by
their own tendency from the others.

Evaluation by Semantic Differential. We used Semantic Differential

to evaluate the ifbots' impressions. Subjects rated their impressions on
ten pairs of adjectives on a scale of -3 to 3. Fig. 6 shows the results of this
experiment: the averages of the evaluation values are shown by the bar graphs,
the standard deviations by the error bars, and the arcs
mark the ifbot combinations showing significant differences.
In this experiment, ifbotM showed significant positive differences over
ifbotR and ifbotS on the "familiar - unfamiliar" and "likable - unlikable" pairs,
and over ifbotR on the "agreeable - disagreeable" pair. From these results, we
consider that ifbotM, which mounted the proposed model, left a good impression
on the subjects. On the "significant - insignificant," "natural - artificial" and
"human - mechanic" pairs, ifbotM showed significant positive differences over
ifbotR, so the robots characterized by the proposed model are more human-like
than robots that show emotions in random order. In addition, ifbotS showed
significant positive differences over ifbotR on these pairs. That is to say,
robots characterized by the proposed model left a human-like impression on the
subjects even when the subjects had never communicated with those robots before.
And ifbotS showed a significant positive difference over ifbotM on the
"complicate - simple" pair. ifbotS was characterized by the opposite of the
subject's behaviors and showed different reactions from the ifbot that was
characterized for the subject. Therefore, we think ifbotS left a complicated
impression on subjects because it showed unpredictable reactions.

Questionnaire on Distinguishing the ifbot. We gave subjects a question-
naire aimed at distinguishing the ifbot characterized by their own tendency
from the others. In this questionnaire, subjects answered the question "Do you
think this ifbot is characterized by you?" with "yes" or "no." This was not
simply a multiple-choice question: we did not specify the number of "yes"
answers, allowing a subject to answer "no" for all ifbots or "yes" for several.
An answer was counted as correct when the subject answered "yes" for ifbotM
and "no" for ifbotS and ifbotR. As a result, 75% of all subjects answered "yes"
for ifbotM, no one answered "yes" for ifbotR, and 15% of all subjects answered
"yes" for ifbotS. 65% of all subjects made a correct

Fig. 6 Result of impression evaluation experiment.

answer. According to these results, we think the proposed model could char-
acterize a robot as a unique being that its user can distinguish from the
others.

4 Conclusion
In order to make robots empathic and human-like, we proposed a method to
dynamically characterize robots. We performed interaction experiments
between humans and robots with the proposed model containing the charac-
ters defined by Symonds. According to the results, the changes in expressional
tendencies produced by the proposed model could characterize robots dynamically
and make users feel that the robots had the characters defined by Symonds. We also con-
firmed that the proposed model could leave a human-like impression on users and
that a character based on the user's interpersonal tendencies could leave a good im-
pression on users. Therefore, we think the proposed model is effective for
increasing robots' empathy and humanity.
In future work, we aim to propose a method for characterizing robots more
flexibly by adding environments or various kinds of communication to the causes
that give robots their own characters.

References
1. Fujita, Y.: Development of personal robot PaPeRo. Journal of the Society of
Instrument and Control Engineers 42, 521–526 (2003) (in Japanese)
2. Kato, S., Oshiro, S., Itoh, H., Kimura, K.: Development of a communication
robot ifbot. In: IEEE ICRA, pp. 697–702 (2004)
3. Murakami, Y.: Big five and psychometric conditions for their extraction in
japanese. The Japanese Journal of Personality 11, 70–85 (2003) (in Japanese)

4. Onishi, K.: ”wakamaru”, the robot for your home. The Japan Society of Me-
chanical Engineers 109, 448–449 (2006) (in Japanese)
5. Saitoh, I.: Interpersonal sentiments and emotions in social interaction. The
Japanese Journal of Psychology 56, 222–228 (1985) (in Japanese)
6. Symonds, P.M.: The psychology of parent-child relationship. Appleton-Century-
Croft, New York (1939)
7. Takada, M., Kaneko, M.: Sympathy and reaction based on facial actions between
humanoid and user. The Institute of Electronics, Information and Communica-
tion Engineers 104, 1–6 (2005) (in Japanese)
8. Yokoyama, M., Aoyama, K., Kikuchi, H., Hoashi, K., Shirai, K.: Controlling non-
verbal information in speaker-changing for spoken dialogue interface of humanoid
robot. Information Processing Society of Japan 40, 487–496 (1999) (in Japanese)
Checklist System Based on a Web for Qualities
of Distance Learning and the Operation

Nobuyuki Ogawa, Hideyuki Kanematsu, Yoshimi Fukumura,


and Yasutaka Shimizu

Abstract. Nowadays, it is very important to ensure a high quality of e-learning
on a global scale. Therefore, an online checklist that gives a helpful guideline
for ensuring high quality is required. In the current project, we selected and
categorized 60 kinds of checklist items from Japanese viewpoints of quality
guarantee and established a web checklist. The system allows designers to diagnose
the quality assurance of their e-learning materials by themselves, using the
self-diagnosis results. In addition, users can analyze their e-learning courses
statistically, according to the data accumulated on the web. We asked the teachers
joining e-Help, a special group among some higher educational organizations for
e-learning and credit transfer in Japan, to try the web system for their
self-checking. The data collected from the web were then analyzed and the
results discussed.

1 Introduction
Recently, the quality of distance learning and its improvement are becoming
more and more important, with the tendency that ICT education, e-Learning, etc.

Nobuyuki Ogawa
Department of Architecture, Gifu National College of Technology, Japan
e-mail: ogawa@gifu-nct.ac.jp

Hideyuki Kanematsu
Department of Materials Science and Engineering, Suzuka National College of
Technology, Japan
e-mail: kanemats@mse.suzuka-ct.ac.jp

Yoshimi Fukumura
Department of Management and Information Systems Science, Nagaoka University of
Technology, Japan
e-mail: fukumura@oberon.nagaokaut.ac.jp

Yasutaka Shimizu
Tokyo Institute of Technology, Japan
e-mail: shimizu.y.ak@m.titech.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 129–141.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

have prevailed not only at a national level, but also on a global scale [1]-[3].
Therefore, we tried to make a checklist, based on much data and information
relating to distance learning, as a guideline to improve the quality of distance learning.
The checklist in this current project is available online (a web checklist),
where one can judge how he/she should prepare e-learning materials to improve
the quality of the content. It not only provides a check system as a guideline
to carry out an e-learning course, but also serves to award credits to students
and to transfer credits among different higher education organizations. It is
composed of many questions that teachers of e-learning courses should answer.
They input their answers on the web and obtain a guideline for analyzing their
own e-learning courses. The system accumulates the data input by different
teachers in order to classify and analyze them statistically. We established the
checklist system to apply it to the e-learning courses of the e-learning Higher
Education Linkage Project (e-Help). However, the contents of the checklist are
also applicable to general e-learning courses. In addition to the application in
the e-Help project, we aim to apply it to general e-learning courses to assure
their quality.

2 Checklist to Assure the Quality of e-Learning


When the contents of the checklist were chosen, we referred mainly to "Perspec-
tive for the enhancement of quality for e-learning" by Shimizu [4], cited from the
working group report of JSEE (Japan Society of Engineering Education) on quali-
tative study and educational engineering. Shimizu described approximately 200
items and analyzed them in detail. We chose items from them according to our own
criteria and adjusted them to our own purpose.
In Shimizu's paper, he classified the perspectives to enhance the quality of
e-learning into four categories: (1) the perspective to enhance the quality at the
developing stage, (2) the perspective to enhance the quality for the support of
operation, (3) the perspective to enhance the quality from the viewpoint of
administration in a higher education organization, and (4) the perspective to
enhance the quality through evaluation.
The checklist in the current project introduced three new evaluation standards
in addition to the four perspectives by Shimizu and reclassified all of them once
again. According to this concept, we analyzed and chose the items needed to
cultivate promising students who will lead the e-learning area in the future.
The three new evaluation standards were defined by their degree of difficulty,
which increases in the order L1, L2, L3. Each of them was defined as follows:
L1: The checklist items in the category for practical e-learning delivered as a
part of classes; satisfying them is absolutely necessary to deliver the course.
Concretely speaking, they involve contents relating to laws, systems, environments,
security, essential contents, etc.
L2: The checklist items in the category for awarding credits; they are needed to
actually award credits to students. According to the bylaws of the Japan University
Accreditation Association and the National Institution for Academic Degrees for
e-learning, these items provide the contents required to satisfy the same level of
quality as face-to-face learning.

L3: The checklist items for higher-level standards to improve the quality much
more. They are composed of contents which can correspond to versatile and
individual needs much more.

The current checklist is composed of 12 categories, in which the three evaluation
standards and the four perspectives are combined together. Each perspective
contains 15 check items, so in total there are 60 check items, as follows.
(1) Perspective at developing stage
L1
1-1 Does the e-learning have appropriate needs when it is used for actual learning?
1-2 Is the procedure of processing for copyrights made clear?
1-3 Is the design for display articulate? Does everything in it have its own meaning?
1-4 Is the feedback utilized to develop the e-learning contents?
L2
1-5 Are the contents used for face-to-face learning coherent and corresponding to
those for e-learning?
1-6 Are the goals and purposes of the class, and the outcomes to be achieved,
explained clearly?
1-7 Is the course schedule provided so that students can accomplish the goal of
the class?
1-8 Is the development for e-learning course appropriate for the learning goal? Is
the learning goal written properly? Is the importance of learning course
described? Are the contents of the learning course explained easily? Is the
learning course structured properly?
1-9 Is the evaluation method compatible with the learning goal?
1-10 Are the procedure, period, time needed, and due date for the evaluation
clearly shown?
1-11 Are the purpose and contents for the class described clearly? Are the
criteria for evaluation by the submission of reports and tests explained
enough?
L3
1-12 Is the continual system of surveillance established?
1-13 Are versatile learning methods corresponding to different learning styles
provided?
1-14 Are the different environments of students considered in designing the
curriculum?

1-15 Are the multiple evaluation methods (mini exams, term exams, reports,
projects, discussions etc.) carried out?
(2) Perspective to support the actual operations
L1
2-1 Does the teacher lead and guide students to submit their reports properly, so
that they do not copy information from web sites directly, exchange answers among
themselves, borrow contents (except names) from other classmates, and so on?
2-2 Does the teacher provide the system where students can get technical support?
2-3 Does the teacher adequately consider the preservation of confidentiality and
the privacy protection for students when he/she provides the educational service
to learners?
2-4 Does the teacher provide the orientation for the distance learning online and/or
offline?
2-5 Do the e-learning contents satisfy the recommended computer performance,
network line speeds, and the needs for the basic software?
2-6 Is the time needed to access the e-learning contents well considered so that
students can continue their learning?
L2
2-7 Is the support system well established, so that interaction among students
and teachers can occur smoothly?
2-8 Is the system well established to show the status of learning processes, such
as submission of reports?
2-9 Does the teacher provide feedback immediately and properly?
2-10 Does the teacher show learners the methods for evaluation?
L3
2-11 Does the e-learning course provide the system to motivate the learners?
2-12 Does the e-learning course have the support system for an online learning
community?
2-13 Does the e-learning course have the system, where students could have
access to library materials, terminology dictionaries, and other materials needed
for classes as service for learners?
2-14 Does the teacher provide the support for handicapped learners?
2-15 Does the teacher provide the support system, where learners self-check their
basic operation capability for learning? Does the system make it possible for
learners under a certain criterion to improve their operation capability on their own?

(3) Perspective for administration in higher organization


L1
3-1 Does the organization have the infrastructure for e-learning (network
system, servers, etc.)?
3-2 Does the head of higher educational organization etc. alleviate teachers’ duties
for e-learning courses or evaluate their achievements especially?
L2
3-3 Does the organization provide the system to keep the academic level in the
relation with its ideal?
3-4 Can the head of the higher educational organization (President, Deans, etc.)
address the strategic importance of distance learning and its role for their
organization?
3-5 Does the organization clarify the accountability and transparency for their
maintenance, the possibility of enlargement, future scope, the responsibility?
3-6 Does the organization support the learners by providing them with learning
guidebooks, helpdesks, etc.?
3-7 Does the organization give adequate information to the students about
e-learning courses?
L3
3-8 Does the organization establish the operation system to make communication
more active among learners?
3-9 Does the organization have the perspective to support learners as an
organization and also to improve the learning quality?
3-10 Does the organization provide not the e-learning course converted directly
from conventional face-to-face learning for day students, but the indispensable
e-learning course with great outcomes?
3-11 Does the organization help learners with learning guidebooks, teaching
assistants, tutors etc.?
3-12 Does the organization hold forums, symposiums, workshops periodically to
improve the quality of e-learning course?
3-13 Does the organization support staffers to produce guidebooks, helpdesks,
consulting, contents etc. for e-learning courses?
3-14 Does the organization provide the system to enhance self-learning capability
for learners and to give them a sense of accomplishment?
3-15 Does the organization support the e-learning courses for the effective
practice of e-learning and also for the enhancement of learners' incentives?

(4) The perspective to improve the quality through evaluation


L1
4-1 Does the teacher make the e-learning course with the media available for
learners’ learning level?
4-2 Do the exam contents evaluate the students' ability properly?
4-3 Are the strategies for ability evaluation and the language level appropriate?
Does the teacher adjust the basic operation to the learners?
4-4 Does the operation system work without problems for the learners?
L2
4-5 Does the course have a way to identify the learners when they take online
and offline tests to evaluate their achievements?
4-6 Does the course have a system to evaluate its own e-learning course? (The
evaluation for students needs, satisfaction of learners, attendance rates of learners,
occupation rates of libraries and support services etc.)
4-7 Does the teacher provide feedback regarding the evaluation results to learning
goals?
4-8 Does the learner pursue the learning and accomplish the goal as expected?
4-9 Does the teacher support students enough and do the students learn e-learning
contents effectively?
4-10 Does the teacher clarify the evaluation items for the e-learning course,
and are appropriate teachers allocated to the course properly, if the e-learning
course is developed not by staff within the organization but by a person from
outside organizations?
4-11 Does the teacher update the e-learning contents arbitrarily?
4-12 Does the teacher provide students with exams testing their learning
achievements?
4-13 Does the teacher continue to evaluate student’s outcomes through their
courses?
L3
4-14 Does the teacher consider the appropriateness of learning motivation, the
attractiveness of contents, and the learners' level, based on the information
about the evaluation by learners?
4-15 Does the e-learning course have a system to encourage students' learning in
addition to mere knowledge transmission?

3 The Establishment of Web Checklist


The web checklist system makes it possible for e-learning teachers, developers,
and administrators to answer the 60 items mentioned above on the web. Since users
answer the questions on the web, they must first be authenticated with a password
to enter the site, so that fraudulent participants can be excluded (Fig. 1). To
protect individual information, we designed the system so that the password would
not lead to the identification of individuals. When the user enters the web site,
he/she enters his/her subject, organization, name, etc. Then the user inputs
his/her evaluation on a five-step scale: #5: Very much, #4: Pretty much,
#3: Neutral, #2: Not so much, #1: Not at all.

Fig. 1 The screen shot for the entry screen.

Fig. 2 The general input screen for the checklist.

If the user's answer to a question is #3, he/she clicks the radio button for #3
on the web. When answering the questions on the web, users should check the items
from the teacher's viewpoint. When they cannot choose the right answer for some
questions, they may check the items from a personal viewpoint as an exception. At
any rate, the main purpose is that users answer from the viewpoint of the
teachers in charge of the e-learning courses, in order to improve their quality.
Fig. 2 shows the general input screen for the checklist.
After the input, the user obtains the results as a display on the web (Fig. 3).
It is basically composed of three bar graphs for each question: one for the input
data, a second for the average value in the current year, and a third for the
overall average value. The user can thus see objectively on which items he/she
has weak points. In addition, he/she can analyze his/her own e-learning course by
comparing his/her results with the average values from other users.

Fig. 3 Self-diagnosis results screen for the checklist.

By displaying both the current-year results and the all-years results
simultaneously, we aim to keep pace with the rapidly changing awareness of the
quality of e-learning contents as e-learning spreads. In addition, we also aim
to accommodate e-learning course contents that are updated year by year.

4 Statistical Classification
As already described, the user obtains useful results from the self-checking
system for the quality assurance of e-learning courses. On the other hand, the
data accumulated on the web server by various users provide useful statistical
information, which helps us understand the current situation of e-learning among
certain groups. For example, by adding up the results filtered into compulsory
courses and non-compulsory ones, respectively, and comparing the two, one can
judge whether they differ in their approach to guaranteeing the quality of
e-learning courses.
To realize such a filtering analysis, the web system requires the inputs of
information about credits for the field of the subject, the agreement of credit
transfer, and awarding credits itself.
As for the field of the subject, it is classified into 10 categories,
corresponding to the classification of scientific research funds by the Japanese
education ministry (Fig. 4).

Fig. 4 Summary and control screen for the checklist.

As for the filtering by credit transfer agreement, the alternative is whether
the organization has concluded an agreement for credit transfer with the
distributing organizations or not.
As for the filtering by award of credits, the following choices are prepared.

(1) As for credits in the organization delivering the contents: The contents are
- not delivered to their own organization
- delivered to their own organization
  - but not awarded as the credit
  - and awarded as the credit
To graduate,
- the credits are needed.
- the credits are not needed.
- I don't know if they are needed to graduate or not.
- I don't know at all about the awarding system of credits.
(2) As for credits in the organization receiving the contents:
- The contents are never delivered to other organizations.
- The contents are delivered to other organizations.
- The credit is not awarded.
- The credit is awarded.
- The credit is needed to graduate.
- The credit is not needed to graduate.
- The credit is needed in some cases.
- I don't know at all about the relation of credits with graduation.
- I don't know at all about the credit system for other organizations.

Fig. 5 Filtering screen for the checklist.

The results of filtering are shown on the web, and the user can confirm them
easily (Fig. 5).
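The filtering aggregation described above can be sketched as follows. This is our own illustrative formulation, not the system's actual implementation: the record fields (`field`, `transfer`) and the function name are assumptions standing in for the attributes the web system collects.

```python
# Illustrative sketch: average the stored five-step answers per checklist
# item, restricted to the records matching the chosen filter attributes.
from collections import defaultdict

def filtered_averages(records, **conditions):
    """records: list of dicts with filter attributes plus an 'answers' dict.
    conditions: attribute=value pairs used as the filter."""
    sums, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        if all(rec.get(k) == v for k, v in conditions.items()):
            for item, score in rec["answers"].items():
                sums[item] += score
                counts[item] += 1
    return {item: sums[item] / counts[item] for item in sums}

# Hypothetical accumulated data from three users.
records = [
    {"field": "engineering", "transfer": True,  "answers": {"1-1": 5, "1-2": 3}},
    {"field": "engineering", "transfer": False, "answers": {"1-1": 3, "1-2": 1}},
    {"field": "humanities",  "transfer": True,  "answers": {"1-1": 4, "1-2": 4}},
]

print(filtered_averages(records, transfer=True))
```

Comparing the output of two complementary filters (e.g. `transfer=True` vs `transfer=False`) gives exactly the kind of side-by-side judgment the section describes.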

5 Results and Discussion


Actually, we asked teachers in e-Help (e-learning Higher Education Linkage
Project, Japan) to input their data into the system on the web, and thereby to
check their e-learning courses with respect to the enhancement of quality. At
that point, the number of organizations constituting e-Help was 24. Among those
organizations, we got 14 answers in total. They fell into three categories: 12
teachers in charge of e-learning courses at organizations available for credit
transfer, one teacher at an organization not available for credit transfer, and
one teacher who is going to make an e-learning course.
The results collected from the web were summarized again into four categories:
A: Very much, B: Pretty much, C: Not so much, D: Not at all. In this order, the
negativity toward the enhancement of quality increases. The data show many
characteristics, and some of them are still not fully analyzed. However, at this
point they indicate the following six characteristics, three positive and three
negative, very clearly.
(1) Very positive characteristics
Q: 3-7 Does the organization show the students enough information about
e-learning courses?
Q: 3-1 Does the organization have the infrastructure for e-learning (network
system, servers, etc.)?
Q: 1-5 Are the contents used for face-to-face learning coherent and corresponding
to those for e-learning?

(2) Very negative characteristics


Q: 3-2 Does the head of higher educational organization etc. alleviate teachers’
duties for e-learning courses or evaluate their achievements especially?
Q: 2-15 Does the teacher provide the support system, where learners self-check
their basic operation capability for learning? Does the system make it possible
for learners under a certain criterion to improve their operation capability on
their own?
Q: 2-4 Does the teacher provide the orientation for the distance learning online
and/or offline?
The results are summarized in Figs. 6 and 7.
As Fig. 6 shows, most organizations prepared syllabi for their e-learning
courses. This shows the recent reform and its prevalence in Japanese higher
education organizations very clearly. Moreover, the infrastructure and design
for e-learning courses are well developed in the organizations whose teachers
decide to launch e-learning courses. On the other hand, Fig. 7 shows that the
administrators in organizations do not precisely understand what to do for
e-learning courses. These results suggest that successful e-learning courses
should be prepared not only from the viewpoint of contents at the individual
level, but also from the viewpoint of administration, and that this will be an
important mission which e-learning development must take into consideration in
the near future.

6 Conclusions
The checklist to assure the quality of e-learning was established, and the
self-checking system on the web was designed. The system enables us to check our
own e-learning courses by ourselves, as well as to perform analytical
classification on the basis of the accumulated data. The results were extensive
and versatile. However, three characteristic positive points and three negative
ones were clearly seen. The checklist was originally made according to the four
different perspectives. The graded results generally show relatively high
evaluations for all four perspectives. However, they already reveal at this
point that the administrators in the organizations do not always understand the
importance of e-learning courses, nor what to do for them. The negative points
should be considered carefully and improved in the future.

The checklist which we established in this paper can correspond to diversity
with flexibility. That was our original purpose and goal. The checklist was
originally made for the higher education organizations in e-Help, and e-Help is
composed of many different types of higher organizations, such as general
universities, technical universities, national colleges, institutes, etc.
Therefore, the checklist was inevitably made to apply to those different higher
organizations. In this point, our checklist is unique and different compared
with conventional ones. In our very rapidly changing world, we hope such a
checklist will prevail more and more.

References
[1] Council for Higher Education Accreditation: "Accreditation and Assuring Quality in
Distance Learning", CHEA Monograph Series 2002, Number 1, CHEA Institute for
Research and Study of Accreditation and Quality Assurance, Washington DC, The
USA (2002)
[2] Distance Education Certificate Program (web page),
http://depd.wisc.edu/html/quality3.htm
[3] Barker, K.C.: E-learning quality standards for consumer protection and consumer
confidence: A Canadian case study in e-learning quality assurance. Journal of
Educational Technology and Society (2007)
[4] Shimizu, Y.: Perspectives to enhance the quality of e-learning courses. JSEE Research
Report JSET 08-2, 121–128 (2008)
Comparison Analysis for Text Data
by Integrating Two FACT-Graphs

Ryosuke Saga and Hiroshi Tsuji

Abstract. This paper describes a method to visualize contrast information about
two targets by using the Frequency and Co-occurrence Trend (FACT)-Graph. The
FACT-Graph is a method to visualize the changes in keyword trends and the
relationships between terms over two time periods. We have used FACT-Graphs as a
comparison method between two targets in previous research; however, that method
cannot compare them as equals. To visualize contrast information, we combine two
FACT-Graphs generated from different viewpoints and express the features in one
graph. In a case study using 132 articles from two newspapers, we compare topics
in them, such as politics and events.

Keywords: Contrast Mining, Visualization, FACT-Graph, Text Mining,


Knowledge Management.

1 Introduction

With the increasing availability of data on the Web, many business organizations
have to create business value and sustain competitive advantage by using the data
in data warehouses [1][2]. To make these data warehouses work to their advantage,
they have to recognize their strong points, develop a strategy, and make
effective investments.
To recognize advantages, comparison analysis is often done by using not only
traditional methods such as cross-tabulation and visualization analysis but also
data/text mining methods. Comparison analysis is relatively easy when the
comparative data are expressed quantitatively. However, the most significant
data often occur as text and are difficult to obtain from pre-defined
attributes. Therefore, text data in questionnaires, reports, and so on must be
analyzed.
For text data, text mining is used to obtain new knowledge [3]. In text mining,
the applicable areas are wide-ranging, such as visualization, keyword

Ryosuke Saga ⋅ Hiroshi Tsuji


Osaka Prefecture University, Graduate School of Engineering,
1-1 Gakuen-cho, Nakaku, Sakai, 559-8531, Japan
e-mail: {saga,tsuji}@cs.osakafu-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 143–151.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

extraction, summarization of text, and so on. In previous research, we developed
the Frequency and Co-occurrence Trend (FACT)-Graph for trend visualization of
time-series text data [4] and tried to visualize comparison information by using
it. However, the comparison information is shown from the viewpoint of a
one-side target, and a FACT-Graph cannot show the features of both targets as
equals.
Therefore, this paper describes a method to compare two targets by using
FACT-Graphs. The rest of this paper is organized as follows: Section 2 describes
the overview and underlying technologies of the FACT-Graph. Section 3 describes
how to apply the FACT-Graph to comparison analysis. Section 4 presents a case
study of two Japanese newspapers. Finally, we conclude this paper.

2 FACT-Graph
A FACT-Graph is a method to create visualized graphs of large-scale trends [4].
It is drawn as a graph embedding a co-occurrence graph and information on
keyword class transitions. A FACT-Graph enables us to see hints of trends, and
it has been used with analysis tools for analyzing trends in different fields
such as politics and crime [5][6]. In addition, the FACT-Graph has been applied
to web access log data, from which we acquired useful knowledge such as the
result shown in Fig. 1 [7].
A FACT-Graph uses nodes and links. It embeds the changes in a keyword’s
class transition and co-occurrence in nodes and edges. In addition, a FACT-Graph
allocates the last keyword class attribute to nodes because we assume that recent
information is important to carry out trend analysis.
There are two essential technologies in order to compile a FACT-Graph: class
transition analysis and co-occurrence transition. Class transition analysis shows
the transition of a keyword class between two periods [8]. This analysis separates
keywords into four classes (Class A–Class D) on the basis of term frequency (TF)
and document frequency (DF) [8]. The four classes are classified by the status of
high/low TF and DF separated by two thresholds. The results of the analysis detail
the transition of keywords between two time-periods (before and after) as shown
in Table 1. For example, if a term belongs to Class A in a certain time period and
moves into Class D in the next time period, then the trend regarding that term is
referred to as "fadeout." A FACT-Graph identifies these trends by node color:
for example, red means fashionable, blue means unfashionable, and white stands
for unchanged.
In addition, a FACT-Graph visualizes relationships between keywords by using
co-occurrence information to show and analyze the topics that consist of multiple
terms. As a result, useful keywords can be obtained from their relationship with
other keywords, even though that keyword seems to be unimportant at a glance, and
the analyst can extract such keywords by using a FACT-Graph. Moreover, from the
results of the class-transition analysis, the analyst can comprehend trends in
keywords and topics (consisting of several keywords) by using the FACT-Graph. In
addition, a FACT-Graph considers the transition of the co-occurrence relationship
between the keywords. This transition is classified into the following types.

Fig. 1 Example of FACT-Graph for Web Usage[7]

Table 1 Transition of Keyword Classes; Class A (TF: High, DF: High), Class B (TF: High,
DF: Low), Class C (TF: Low, DF: High), and Class D (TF: Low, DF: Low)

              After
Before        Class A    Class B      Class C          Class D
Class A       Hot        Cooling      Bipolar          Fade
Class B       Common     Universal    -                Fade
Class C       Broaden    -            Locally Active   Fade
Class D       New        Widely New   Locally New      Negligible
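Class transition analysis as summarized in Table 1 amounts to classifying each keyword in each period and looking up the label for the (before, after) pair. The following is a minimal sketch: the class definitions follow the Table 1 caption (Class B: high TF, low DF) and the transition labels follow the table itself, while the threshold values and function names are illustrative assumptions, not those of [8].

```python
# Transition labels from Table 1, keyed by (before_class, after_class).
TRANSITIONS = {
    ("A", "A"): "Hot",     ("A", "B"): "Cooling",        ("A", "C"): "Bipolar",     ("A", "D"): "Fade",
    ("B", "A"): "Common",  ("B", "B"): "Universal",      ("B", "D"): "Fade",
    ("C", "A"): "Broaden", ("C", "C"): "Locally Active", ("C", "D"): "Fade",
    ("D", "A"): "New",     ("D", "B"): "Widely New",     ("D", "C"): "Locally New", ("D", "D"): "Negligible",
}

def keyword_class(tf, df, tf_threshold, df_threshold):
    """Class A: high TF/high DF, B: high TF/low DF, C: low TF/high DF,
    D: low TF/low DF (per the Table 1 caption)."""
    if tf >= tf_threshold:
        return "A" if df >= df_threshold else "B"
    return "C" if df >= df_threshold else "D"

def transition(tf1, df1, tf2, df2, tf_th=10, df_th=5):
    """Label the keyword's movement between the two periods."""
    before = keyword_class(tf1, df1, tf_th, df_th)
    after = keyword_class(tf2, df2, tf_th, df_th)
    return TRANSITIONS.get((before, after), "-")  # "-" matches the table gaps

print(transition(20, 8, 3, 2))   # Class A before, Class D after
```

Note that the legend at the end of this chunk swaps the Class B and Class C definitions relative to the Table 1 caption; the sketch follows the caption.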

(a) The co-occurrence relation continues in both analytical periods.
(b) The co-occurrence relation occurs only in the later analytical period.
(c) The co-occurrence relation disappears in the later analytical period.
The relationship in type (a) indicates that these words occur very close
together; thus, we can consider them essential elements of the topic. On the
other hand, relationships of types (b) or (c) indicate temporary topical change.
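The three link types can be computed directly from the edge sets of the two periods. The following is our own minimal formulation, not the authors' code; representing an edge as a frozenset of two keywords is an assumption.

```python
# Classify each co-occurrence link into the three transition types above,
# given the edge sets of the "before" and "after" periods.

def link_types(before_edges, after_edges):
    """Each edge is a frozenset of two keywords. Returns a dict mapping
    edge -> 'a' (continues), 'b' (appears later), or 'c' (disappears)."""
    types = {}
    for e in before_edges | after_edges:
        if e in before_edges and e in after_edges:
            types[e] = "a"   # essential element of the topic
        elif e in after_edges:
            types[e] = "b"   # temporary topical change (newly appeared)
        else:
            types[e] = "c"   # temporary topical change (disappeared)
    return types

# Hypothetical edges from two newspaper periods.
before = {frozenset({"election", "party"}), frozenset({"tax", "budget"})}
after = {frozenset({"election", "party"}), frozenset({"election", "scandal"})}
print(link_types(before, after))
```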
Figure 2 shows an overview of outputting a FACT-Graph. First, the text data
must be morphologically analyzed. A morpheme (a, the, -ed, etc.) is the smallest
unit that gives meaning to a sentence. The text data is divided into morphemes,
and the parts of speech are classified by using morphological analysis tools.
This step also extracts the attributes of each document, such as date of issue,
length of document, and category of document. The term database is then built.

Fig. 2 Overview of Outputting FACT-Graph

The user sets up parameters such as the analysis span, the document/term
filters, and the thresholds used in the analysis. Then the term database is
divided into two databases (first- and second-half periods) in accordance with
the analysis span. Each term's frequency is aggregated in the respective
databases, and keywords are extracted from the terms under the established
conditions. These keywords go through the procedures concerning keyword class
transition and co-occurrence. The output chart that reflects the respective
processing results is a FACT-Graph.

3 Comparison Analysis by Integrating FACT-Graph

3.1 FACT-Graph for Comparison Analysis

We have proposed a method for applying a FACT-Graph to comparison analysis
[9]. To apply the FACT-Graph, we regard the two targets as the two time periods
of the class transition analysis in a FACT-Graph. This is because one can regard
a FACT-Graph as performing a comparison analysis between two categories,
"Before" and "After," although it treats time-series text data. By replacing the
periods with targets for comparison, we can compare the targets by using a
FACT-Graph.
A FACT-Graph allocates shapes to nodes based on the "after" period class
because we assume that the important information regarding trends occurs in the
"after" period. However, for a comparison analysis, this makes it difficult to
understand the features of both targets equally. This is because a FACT-Graph
mainly shows the features of a one-side target (e.g., "after") that covers the
other side (e.g., "before").

Fig. 3 Integrated process of FACT-Graphs

For this reason, we output both sides of the FACT-Graphs, that is, not only from
"before" to "after" but also from "after" to "before," and carry out a
comparison analysis [10]. However, a comparison analysis using two FACT-Graphs
requires us to compare the two graphs at frequent intervals, and the larger a
FACT-Graph is, the higher the cost of analysis. In order to reduce this cost, we
integrate the two FACT-Graphs into one.

3.2 Principles for Integrating FACT-Graphs

In order to output the comparison information, we have to integrate the
FACT-Graphs according to certain principles. As mentioned above, each FACT-Graph
has nodes and links. For the links we keep the original concept, because a link
simply shows which terms connect with which. The nodes, however, have several
attributes such as size and shape. In order to compile an integrated FACT-Graph, we
carry out the following integration processes.
1. Recognize the gaps of features
In this process, we compare the class information of keywords between the
two targets in order to recognize gaps in their features. If a node has an
unchanged trend in either FACT-Graph (a white node), we assume that the
features of the two targets do not differ. On the other hand, if a node has an
incremental or a decremental trend (a red or blue node), we recognize that
the keyword has different features in the two targets.
2. Choose the more characteristic target
For keywords that have gaps between the two targets, we evaluate which
target to express in the integrated FACT-Graph. Here, we visualize the more
"characteristic" keywords, estimated by measurements such as TF and
TF-IDF (term frequency-inverse document frequency). Using such a
measurement, we determine for which target a keyword is more
characteristic and choose that target.
148 R. Saga and H. Tsuji

[Figure: an integrated FACT-Graph with its legend — Class A: high TF and high DF;
Class B: low TF and high DF; Class C: high TF and low DF; Class D: low TF and
low DF; white letters: terms that exist in only one target; black links:
characteristic of both targets; colored links: characteristic of Target A only or
Target B only.]

Fig. 4 Integrated FACT-Graph for comparison analysis

3. Merge node attributes and visualize results
Based on the previous processes, we integrate the attributes of the nodes.
Basically, the attributes (color, shape, and size) are taken from the
FACT-Graph of the more characteristic target. However, when a keyword has
no gap between the two targets in the first process, its node is colored white.
As a result, we can obtain an integrated graph like that shown in Figure 3. As shown
in Figure 4, the graph allocates four shapes to nodes according to Classes A to D,
and the color corresponds to the comparison target, except for white. In addition,
the node size shows the scale of TF in each target; that is, the higher the TF of a
keyword, the larger its node. A node with white letters is Class E, which indicates a
unique keyword appearing in only one target. Black links mean that the relations
between keywords appear in both targets; red and blue links mean that the relations
appear in only one target. For the comparison analysis, knowing whether a term
exists in only one target is necessary to find the features of the comparison targets.
Therefore, we add this information to the node by showing the term in white letters.
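The three integration processes can be summarized in a small Python sketch. This is a hedged approximation, not the authors' implementation: the class labels, the color names, and the use of plain TF to choose the more characteristic target are illustrative assumptions, and the gap test is simplified to "same class in both graphs" rather than the trend colors of an actual FACT-Graph:

```python
def merge_nodes(classes_a, classes_b, tf_a, tf_b):
    """Integrate node attributes of two FACT-Graphs (targets A and B).

    classes_a/classes_b map keyword -> class label ("A".."D"); a keyword
    absent from a map does not appear in that target.  tf_a/tf_b map
    keyword -> term frequency.  Returns keyword -> (class, color), where
    the color encodes the result of the comparison.
    """
    merged = {}
    for kw in set(classes_a) | set(classes_b):
        in_a, in_b = kw in classes_a, kw in classes_b
        if in_a and not in_b:                 # unique keyword: target A only
            merged[kw] = (classes_a[kw], "color_a")
        elif in_b and not in_a:               # unique keyword: target B only
            merged[kw] = (classes_b[kw], "color_b")
        elif classes_a[kw] == classes_b[kw]:  # no gap between the targets
            merged[kw] = (classes_a[kw], "white")
        elif tf_a.get(kw, 0) >= tf_b.get(kw, 0):  # gap: pick higher-TF target
            merged[kw] = (classes_a[kw], "color_a")
        else:
            merged[kw] = (classes_b[kw], "color_b")
    return merged
```

Visualization (shapes, node sizes, link colors) would then be drawn from the merged attribute table.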

4 Experiment

4.1 Data Set


We carried out an experiment using a FACT-Graph to verify whether a
comparison analysis can be performed. In this study, we used the editorials
published in the Asahi and Yomiuri newspapers, two of Japan’s major newspapers,
from the period between 2006 and 2008.
We chose editorials because they discuss important issues and are often written
on the basis of interviews or opinions. Generally, these articles are written from

several viewpoints, and the assertions are characteristic of, and different for, each
publisher. Note that we regard words with very low frequency as unnecessary terms
because they are likely to be noise or error words. Therefore, we removed the terms
with TF less than 2 and DF equal to 1.
In this case study we targeted articles on the topic of the Olympic Games (the
Asahi and the Yomiuri had 74 and 58 such editorials, respectively). We applied the
Jaccard coefficient for co-occurrence and adopted relationships with a co-occurrence
over 0.3. To carry out the class transition analysis in the FACT-Graph, we set the
threshold to the top 20% of ranked terms on the basis of Zipf's law and the Pareto
principle [11], which is often called the 20-80 rule. In addition, we determined
which target a keyword characterizes by TF.
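For reference, the Jaccard coefficient between two terms and the top-20% cutoff can be computed as follows. This is a minimal sketch; the helper names and the data layout (each term mapped to the set of document ids it occurs in) are assumptions for illustration:

```python
def jaccard(docs_with_a, docs_with_b):
    """Jaccard coefficient between two terms over sets of document ids."""
    a, b = set(docs_with_a), set(docs_with_b)
    if not a | b:
        return 0.0
    return len(a & b) / len(a | b)

def cooccurrence_links(term_docs, threshold=0.3):
    """Adopt links between term pairs whose Jaccard coefficient exceeds
    the threshold (0.3 in the experiment described above)."""
    terms = sorted(term_docs)
    links = []
    for i, t1 in enumerate(terms):
        for t2 in terms[i + 1:]:
            j = jaccard(term_docs[t1], term_docs[t2])
            if j > threshold:
                links.append((t1, t2, j))
    return links

def top_20_percent(freq):
    """Top 20% of terms ranked by frequency (Pareto / 20-80 rule)."""
    ranked = sorted(freq, key=freq.get, reverse=True)
    k = max(1, int(len(ranked) * 0.2))
    return set(ranked[:k])
```

For example, two terms sharing two of four documents get a coefficient of 0.5 and are linked, while terms with disjoint document sets are not.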

4.2 Result of Analysis


Figure 5 shows the integrated FACT-Graph obtained under these conditions. In this
graph, the blue nodes and links indicate the features of the Asahi; the red nodes and
links indicate those of the Yomiuri. Taking a global view of the FACT-Graph, we
could observe many blue nodes. This reflects the difference in article volumes: the
keywords of the Asahi spread across many areas, presumably because the Asahi
published more articles than the Yomiuri. Overall, however, each newspaper covers
the same topics equally well. When we focused on the white nodes and their related
nodes, we could roughly categorize the graph into 9 groups: "Election," "Economics,"
"Political Situation," "Diplomatic Relations," "Tibet Problem," "Food Problem,"
"Olympic Games," "Japanese Olympic Committee," and "Broadcasting and Copyright."
For example, in "Election," each newspaper discusses the election of the Tokyo
governor in connection with the Olympics, and there are several election-related
keywords such as "Governor," "Candidates," and "Pledge." In addition, both
newspapers focus on "Ishihara," one of the candidates for Tokyo governor. Several
similar terms exist in this area, but the Yomiuri also describes topics involving
political realignment, as seen from the red nodes "Political party" and
"Reorganization."
In another example, there is a Class C node "Relief," which connects to several
nodes such as "Disaster," "Aid," "Earthquake," and "Sichuan" in "Diplomatic
Relations." From these nodes we can understand that large-scale disasters occurred
in China and Myanmar; specifically, the Yomiuri frequently discusses the Olympics
together with "Earthquake," because that node is Class A. There is also an
interesting node, "War," in white letters and of Class B for the Asahi. This means
that "War" is a keyword unique to the Asahi and that it appears frequently in the
Asahi's Olympic articles.
The most striking characteristic of the Yomiuri appears in "Broadcasting and
Copyright." As shown in Figure 5, the red nodes compose a cluster of several
keywords including not only "Broadcasting" and "Copyright" but also "Recording,"
"Digital," and "Equipment." From these keywords we can guess that this area
concerns the limits of copyright and the proper equipment for recording digital
broadcasts of the Olympics, and these keywords connect to each other only on the
Yomiuri side. Therefore, we can assume that broadcasting and copyright are
important issues for the Yomiuri.

[Figure: the integrated FACT-Graph, with keyword groups labeled "Election,"
"Economics," "Political Situation," "GDP of Neighbor Countries," "Diplomatic
Relations," "Food Problem," "Broadcasting and Copyright," "Japanese Olympic
Committee," "Skating," "Tibet Problem," and "Olympic Games."]
Fig. 5 Integrated FACT-Graph of two newspapers: the Asahi and the Yomiuri (Red: the
Yomiuri, Blue: the Asahi, White: both)

On the other hand, when we checked the keywords of the Asahi, we noticed
the term "War" in the area of "Diplomatic Relations." This keyword belongs to
Class B, shown in white letters, which has high TF and high DF and appears only in
the Asahi. In other words, the Asahi used the term "War" frequently whereas the
Yomiuri did not use it at all. From this, we can infer that the Asahi used the term
"War" extensively when describing topics of diplomatic relations.
We could also understand the following by integrating the FACT-Graphs:
1. We can recognize the spread of topics and the amount of keywords from the
distribution of the colors.
2. From the common keywords, shown as white nodes in each topic, we can
recognize the basic keywords of the topics.

5 Conclusion
This paper described a method to integrate two FACT-Graphs into one to compare
two targets. To apply a FACT-Graph to a comparison analysis, we interchanged

the target data with the time-series data on the basis of class transition analysis. In
addition, we explained how to integrate two FACT-Graphs into one for comparison
analysis.
To validate the usability of an integrated FACT-Graph, we compared the
features of the Asahi and Yomiuri newspapers by analyzing their editorials. From
the results of the comparison analysis targeting the word "Olympic," we discovered
that the Asahi described several topics other than those described by the Yomiuri,
and showed that the proposed method can be used for a comparison analysis
between two targets. As future work, we will evaluate the experimental results
quantitatively and compare them with the existing FACT-Graph.
Acknowledgement. This research was supported by The Ministry of Education, Culture,
Sports, Science and Technology (MEXT), Japan Society for the Promotion of Science
(JSPS), Grant-in-Aid for Young Scientists (B), 21760305, 2009.4-2011.3.

References
1. Tiwana, A.: The Knowledge Management Toolkit: Orchestrating IT, Strategy, and
Knowledge Platforms. Prentice Hall (2002)
2. Inmon, W.H.: Building the Data Warehouse. John Wiley & Sons, Inc. (2005)
3. Feldman, R., Sanger, J.: The Text Mining Handbook: Advanced Approaches in
Analyzing Unstructured Data. Cambridge University Press (2007)
4. Saga, R., Terachi, M., Sheng, Z., Tsuji, H.: FACT-Graph: Trend Visualization by
Frequency and Co-occurrence. In: Dengel, A.R., Berns, K., Breuel, T.M., Bomarius,
F., Roth-Berghofer, T.R. (eds.) KI 2008. LNCS (LNAI), vol. 5243, pp. 308–315.
Springer, Heidelberg (2008)
5. Saga, R., Tsuji, H., Tabata, K.: Loopo: Integrated Text Miner for FACT-Graph-Based
Trend Analysis. In: Salvendy, G., Smith, M.J. (eds.) HCII 2009, Part II. LNCS,
vol. 5618, pp. 192–200. Springer, Heidelberg (2009)
6. Saga, R., Tsuji, H., Miyamoto, T., Tabata, K.: Development and case study of trend
analysis software based on FACT-Graph. Artificial Life and Robotics 15, 234–238
(2010)
7. Saga, R., Miyamoto, T., Tsuji, H., Matsumoto, K.: FACT-Graph in Web Log Data. In:
König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds.) KES
2011, Part IV. LNCS, vol. 6884, pp. 271–279. Springer, Heidelberg (2011)
8. Terachi, M., Saga, R., Tsuji, H.: Trends Recognition. In: IEEE International
Conference on Systems, Man & Cybernetics (IEEE/SMC 2006), pp. 4784–4789 (2006)
9. Saga, R., Takamizawa, S., Kitami, K., Tsuji, H., Matsumoto, K.: Comparison Analysis
for Text Data by Using FACT-Graph. In: Salvendy, G., Smith, M.J. (eds.) HCII 2011,
Part II. LNCS, vol. 6772, pp. 75–83. Springer, Heidelberg (2011)
10. Saga, R., Takamizawa, S., Tsuji, H., Matsumoto, K.: Comparison Analysis for
Editorials by Reversible FACT-Graph. In: Proceedings of the International Conference
on Information and Knowledge Engineering (IKE 2011), pp. 216–221 (2011)
11. Baayen, R.H.: Word Frequency Distributions. Springer (2002)
Construction of a Local Attraction Map
According to Social Visual Attention

Ichiro Ide, Jiani Wang, Masafumi Noda, Tomokazu Takahashi, Daisuke Deguchi,
and Hiroshi Murase

Abstract. Social media on the Internet, where millions of people share their personal
experiences, can be considered an information source that implies people's implicit
and/or explicit visual attention. In particular, when the attention of many people
around a specific geographic location focuses on a common content, we may assume
that there is a certain target in the area that attracts people's attention. In this paper,
we propose a framework that detects people's common attention in a local area (a
local attraction) from a large number of geo-tagged photos, and visualizes it on a
"Local Attraction Map." Based on the framework, as a first step of the research, we
report the results of a user study performed on a Local Attraction Map browsing
interface that showed representative scene categories as local attractions for
geographic clusters of the geo-tagged photos.

Ichiro Ide · Masafumi Noda · Hiroshi Murase


Nagoya University, Graduate School of Information Science, Furo-cho, Chikusa-ku, Nagoya
464-8601, Japan
e-mail: ide@is.nagoya-u.ac.jp,mnoda@murase.m.is.nagoya-u.ac.jp,
murase@is.nagoya-u.ac.jp
Jiani Wang
Nagoya University, Graduate School of Information Science, Furo-cho, Chikusa-ku, Nagoya
464-8601, Japan
e-mail: jwang@murase.m.is.nagoya-u.ac.jp
Currently at Oki Data Corporation
Tomokazu Takahashi
Gifu Shotoku Gakuen University, Department of Economics and Information,
1-38 Naka-Uzura, Gifu, 500-8288, Japan
e-mail: ttakahashi@gifu.shotoku.ac.jp
Daisuke Deguchi,
Nagoya University, Information and Communications Headquarters, Furo-cho, Chikusa-ku,
Nagoya 464-8601, Japan
e-mail: ddeguchi@nagoya-u.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 153–162.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
154 I. Ide et al.

1 Introduction
Following the recent diffusion of social media on the Internet, where millions of
people share their personal experiences as digital contents, we can easily obtain
thousands of photos tagged with geographic coordinates (geo-tags) indicating where
they were taken. We focus on the contents of such photos because they imply
people's implicit and/or explicit visual attention.
Especially, when the attentions of many people around a specific geographic lo-
cation focus on a common content, we may assume that there is a certain target that
attracts people’s attentions in the area. In this paper, we call such a target a “local
attraction”.
A local attraction could be a static phenomenon, such as an artificial or natural
object, or scenery observed from the area. Such local attractions are not easy to
infer from objective data such as satellite images and maps, since they can be
anything from a small statue located exactly on the spot to a panoramic view,
observable from the spot, that may contain geographic objects located miles away.
On the other hand, a local attraction could also be a non-static phenomenon that
reflects a common activity in the area, such as shopping, eating, playing, and so
on. These are even more difficult to infer from objective data, since they need to be
recognized in the context of human activity observed on the ground.
Moreover, without the help of social media, it would be difficult to infer what
attracts people only from objective data. We are not interested in providing users
with information on interesting spots located in the middle of a desert where no one
actually visits, but instead, with information on attractive spots where many other
people have also visited and showed their interest by taking a photo and sharing it
on the Internet.
Meanwhile, traditional media such as travel guides and maps cover both types of
local attractions, but their contents are not necessarily updated frequently. In order
to cope with the rapidly changing modern society, we considered that the continu-
ously updated information provided from a large number of people through social
media should be useful to construct a map that reflects the most up-to-date local
attractions.
In this paper, we propose a framework that automatically detects people’s com-
mon attention in a local area (e.g. local attraction) from a large number of geo-
tagged photos, and its visualization on a map called the “Local Attraction Map”.
Based on the framework, as a first step of the research, we report the results from
a user study performed on a Local Attraction Map browsing interface that showed
the representative scene categories as local attractions for geographic clusters of the
geo-tagged photos (Figure 1).
The paper is organized as follows: Section 2 introduces related works on land-
mark detection and travel planning based on social media. Section 3 introduces the
proposed method to construct the Local Attraction Map based on a large number of
geo-tagged photos. Section 4 reports the result of the user study, and finally Sect. 5
concludes the paper.
Construction of a Local Attraction Map According to Social Visual Attention 155

Fig. 1 A Local Attraction Map for Kyoto, Japan.

2 Related Works
Commercial services such as Google Map1, Google Earth2, and Panoramio3 are
nowadays important tools for travelers to perform a visual survey before visiting a
planned travel destination. However, it is usually difficult to find sites or areas of
interest in a destination that the user is not familiar with by simply using these
services.
Making use of the geo-tags attached to photos is a recent trend in supporting travel
planning for such users. This type of research can be separated into two topics: 1)
travel route mining and recommendation [1, 3, 8, 9], and 2) landmark detection and
representative photo selection [2, 6, 10, 12].
The travel route mining and recommendation methods analyze sequences of geo-
tagged photos and propose routes that match the interests of a user. Since they focus
mostly on the sequence of geo-tags, they usually do not make use of the image
contents, except for Cheng et al.’s work [3] that infers user attributes from faces in
the images and matches them with the user’s attributes for the recommendation.
1 http://maps.google.com/
2 http://earth.google.com/
3 http://www.panoramio.com/

Meanwhile, the landmark detection and representative photo selection methods


combine geo-clustering and tag and/or image clustering in order to obtain a pop-
ular landmark and sometimes its popular angle. Weyand and Leibe’s method [10]
even generates a 3D model of a landmark from a large number of photos taken from
different angles. Chen et al.’s method [2] is also interesting in the sense that it gen-
erates simplified icons from a representative view of a landmark, and shows them
on a simplified map. These methods are useful for landmarks, mostly buildings,
but they cannot handle all kinds of local attractions, such as those mentioned in Sect. 1.
In both research topics, providing information on actual landmarks may be too
concrete for users who are not familiar with, or do not have a clear idea of, what
awaits them at the planned destination; they may simply want to know where the
cultural heritage sites are, where they can find restaurants, or where they can do
street shopping, without having a particular site or shop in mind. Thus, in this
paper, we focus on providing users with the types of local attractions rather than
concrete information on the individual local attractions.
Scene category classification itself has been a very hot research topic in the past
decade, such as Xiao et al.’s work [11]. However, since it is a widely studied research
topic, we will not focus much on the technology in this paper.

3 Construction of a Local Attraction Map According to Social


Visual Attention
In this section, we describe how the proposed method analyzes the social visual
attention and constructs the Local Attraction Map.

3.1 Analyzing the Social Visual Attention


In order to analyze the social visual attention from a large number of geo-tagged
photos taken within a specific region, they are first clustered according to their ge-
ographic location. Next, for each cluster, a representative scene category is decided
by image classification. Details of the two steps are described in this Section.

3.1.1 Clustering of Geo-tagged Photos

Since the local attraction could be any phenomenon from an object located exactly
on the spot to a terrain that covers a wide area, the size and the shape of the area
that covers a local attraction should be flexible. Thus, we decided to extract clusters
from the distribution of the geo-tagged photos instead of using fixed-sized shapes
(most likely, blocks).
For the clustering, we used the nearest neighbor method with the restriction that
two clusters are not merged if the geographic distance between them is larger
than θ [km].
Figure 2 shows the result of the clustering for constructing the Local Attraction
Map shown in Fig. 1.
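A minimal sketch of this clustering, assuming the "nearest neighbor method" is single-linkage agglomerative clustering over great-circle (haversine) distances, might look as follows. The function names and the merge loop are illustrative, not the authors' implementation:

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 \
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def cluster_photos(points, theta_km=2.0):
    """Repeatedly merge the closest pair of clusters until every
    inter-cluster distance exceeds theta_km (single linkage).
    Returns a list of clusters, each a list of photo indices."""
    clusters = [[i] for i in range(len(points))]

    def gap(c1, c2):  # single-linkage distance between two clusters
        return min(haversine_km(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        d, a, b = min((gap(c1, c2), a, b)
                      for a, c1 in enumerate(clusters)
                      for b, c2 in enumerate(clusters) if a < b)
        if d > theta_km:
            break
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters
```

With θ = 2 km (the value used in the experiment in Sect. 4), photos a few hundred meters apart fall into one cluster, while photos tens of kilometers away form their own clusters. The naive pairwise loop is O(n³) overall; a real implementation over thousands of photos would use a spatial index.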

Fig. 2 Example of the clustering of geo-tagged photos. Each dot represents a photo
taken at the location. Different colors indicate different clusters.

3.1.2 Decision of the Type of Local Attraction

After the clustering, the type of local attraction is decided for each cluster. As
mentioned in Sect. 1, at this moment we have implemented only certain static
phenomena, namely scene categories, as local attractions. We defined the five scene
categories shown in Fig. 3: "city", "forest", "water", "flatland", and "mountain".
For the scene category classification, we implemented a five-class support vector
machine (SVM) classifier with a bag-of-features (BoF) representation of local image
features obtained by the scale-invariant feature transform (SIFT) algorithm [7] and
a normalized HSV color histogram as image features.
The classifier was trained by 16,689 categorized photos from the SUN database
[11]. We manually selected 39 categories from the SUN database and mapped them
onto the five scene categories as listed in Table 1. For reference, the classifiers had

Fig. 3 Example of photos that belong to each scene category.



Table 1 Correspondence of the scene categories and the SUN database categories.

Scene category SUN Database categories

City alley, amusement park, bridge, building, fountain, gazebo, house, market,
pagodas, place, railroad track, shopfront, street, temple, tower, village
Forest botanical garden, forest, forest path, park
Water bridge, canal, coast, creek, dam, hot spring, islet, lake, ocean, pond, river,
sea cliff, waterfall
Flatland amphitheater, badlands, desert, field
Mountain cliff, dam, mountain, sea cliff, valley

a recognition rate of approximately 77% in a ten-fold cross-validation experiment.
Although this accuracy is not very high, we expect to be able to use state-of-the-art
general object recognition methods in the future.
Finally, for each cluster, the most frequent scene category is selected as the
representative one, i.e., the type of local attraction.
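The per-cluster decision thus reduces to a majority vote over the classifier's outputs. A minimal sketch follows; the tie-breaking rule (order of the category tuple) is an assumption, and the SVM classifier itself is not reproduced here:

```python
from collections import Counter

SCENES = ("city", "forest", "water", "flatland", "mountain")

def representative_category(predicted_labels):
    """Pick the most frequent scene category among a cluster's photos.
    Ties are broken by the order of SCENES (an illustrative choice)."""
    counts = Counter(predicted_labels)
    return max(SCENES, key=lambda s: (counts[s], -SCENES.index(s)))
```

For instance, a cluster whose photos are classified as two "city", one "water", and one "forest" is labeled "city".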

3.2 Construction of the Local Attraction Map


Finally, both the location of each geo-tagged photo and the representative scene
category for each cluster are superimposed onto a digital map as shown in Fig. 1.
In the map, each geo-tagged photo is indicated by a ‘◦’ with a color that represents
the representative scene category for the cluster that it belongs to. The representative
scene category for each cluster is also indicated by a scene category icon as shown
in Fig. 4.

Fig. 4 Icons that represent scene categories in the Local Attraction Map.

4 User Study
In order to evaluate the usefulness of the proposed Local Attraction Map, we
performed a user study using the Local Attraction Map browsing interface shown in
Fig. 5. The interface allows users to browse the Local Attraction Map (left-hand
side) and, at the same time, scan through photos that belong to the representative
scene category of a specified cluster, displayed in a separate panel (right-hand
side).

Fig. 5 The Local Attraction Map browsing interface. The picture browsing panel allows users
to browse geo-tagged photos that belong to a representative scene category at a location
specified in the Local Attraction Map panel.

4.1 Preparation and Experimental Conditions


For the user study, we constructed a Local Attraction Map for Kyoto, Japan
(Figure 1). Table 2 shows the parameters and conditions set to create the map. From
the specified area, we collected 4,536 geo-tagged photos from Panoramio. The clus-
tering yielded 39 clusters with an average of 112 photos per cluster.
The user study was performed in the following steps:
Step 1. We asked each subject to perform a survey on a planned travel destination,
Kyoto, Japan, using both Panoramio and the proposed Local Attraction Map
browsing interface.

Table 2 Parameters and conditions set to create the Local Attraction Map in Fig. 1.

Parameter / condition Value

Target area (+35.00244◦ , +135.71102◦ ) – (+35.16454◦ , +135.90946◦ )


A square area with a size of approximately 20 [km] × 20 [km] that
includes Kyoto city and its vicinity.
Image source Panoramio (http://www.panoramio.com/)
Clustering threshold (θ ) 2 [km]
Subjects 25 students

Step 2. After explaining the functions of the Local Attraction Map browsing
interface, we asked each subject to evaluate its usefulness for performing a survey
on a planned travel destination. The subjects selected from the following five
options: Useful, Relatively useful, Neutral, Relatively useless, and Useless. They
were also asked to provide reasons for their judgments.

4.2 Result and Analysis


Table 3 shows the result of the evaluation by the users. 76% (= 19/25) of the users
evaluated the Local Attraction Map as useful to some extent. From this result, we
consider that the proposed representative scene category visualization, combined
with the geographic layout of the geo-tagged photos, was useful for performing a
survey on a planned travel destination.
We analyzed the reasons for the evaluations provided by the users. First, the
following are excerpts of the reasons provided by the users who evaluated the map
as either "Useful" or "Relatively useful" (translated from Japanese).

• The map allows me to grasp at a glance, the location and the types of
scenes that I can expect at an unfamiliar travel destination.
• The map shows what (which scene category) most people shoot at a
specific location.
• Since the local information is evaluated according to the number of
photos, the map may reveal hidden spots-of-interest.
• Even photos without tags can be classified and searched using the map.
• Different from Panoramio’s tag-based search, the map can show vari-
ous scene categories at the same time.

These reasons matched the purpose of the proposed Local Attraction Map.
Next, the following are excerpts of the reasons provided by the users who
evaluated the map as either "Neutral" or "Relatively useless" (translated from
Japanese).

Table 3 Result of user evaluation.

Usefulness   Useful       Relatively useful   Neutral      Relatively useless   Useless
Ratio        24% (6/25)   52% (13/25)         16% (4/25)   8% (2/25)            0% (0/25)

• It would be more useful if the map displayed not only photos from a
representative scene category, but rather popular photos for all cate-
gories.
• If the scene categorization were very accurate, the map could be use-
ful.
• The definition of the scene categories was ambiguous and difficult to
understand. The map is not useful unless they are more concrete con-
cepts.
• What happens if two different scene categories are present in a single
photo?

Following these reasons, we will consider the following points in the future in
order to improve the usefulness of the Local Attraction Map.
• Add a function that shows a ranked list of representative scene categories per
cluster.
• Modify the scene category classifier so that it could handle the situation where
multiple scene categories are present in a single photo.
• Improve the scene category classification accuracy by using a state-of-the-art
general object recognition method, and also by developing a classification method
that considers the inclusion relation between scene categories.
• In addition to the current scene categories, add those that represent non-static
phenomena, such as “Eating”, “Shopping”, “Playing (sports)”, and so on.

5 Conclusion
In this paper, we proposed a framework to construct a "Local Attraction Map" by
analyzing the social visual attention from a large number of geo-tagged photos. The
user study showed positive results, but at the same time we found several important
points that need improvement. In the future, we will work on these points and also
try to perform a larger-scale experiment and user study.

Acknowledgements. Parts of this work were supported by Grants-in-aid for Scientific Re-
search from the Japanese Ministry of Education, Culture, Sports, Science and Technology.

References
1. Arase, Y., Xie, X., Hara, T., Nishio, S.: Mining people’s trips from large scale geo-tagged
photos. In: Proceedings of the 18th ACM International Conference on Multimedia, pp.
133–142 (2010)
2. Chen, W.C., Battestini, A., Gelfand, N., Setlur, V.: Visual summaries of popular land-
marks from community photo collections. In: Proceedings of the 17th ACM International
Conference on Multimedia, pp. 789–792 (2009)

3. Cheng, A.J., Chen, Y.Y., Huang, Y.T., Hsu, W.H., Liao, H.Y.M.: Personalized travel rec-
ommendation by mining people attributes from community-contributed photos. In: Pro-
ceedings of the 19th ACM International Conference on Multimedia, pp. 83–92 (2011)
4. Crandall, D., Backstrom, L., Huttenlocher, D., Kleinberg, J.: Mapping the world’s pho-
tos. In: Proceedings of the 18th International Conference on World Wide Web, pp. 761–
770 (2009)
5. Csurka, G., Bray, C., Dance, C., Fan, L., Willamowski, J.: Visual categorization with
bags of keypoints. In: Proceedings of the ECCV2004 International Workshop on Statis-
tical Learning in Computer Vision, pp. 1–22 (2004)
6. Gao, Y., Tang, J., Hong, R., Dai, Q., Chua, T.S., Jain, R.: W2Go: A travel guidance
system by automatic landmark ranking. In: Proceedings of the 18th ACM International
Conference on Multimedia, pp. 123–132 (2010)
7. Lowe, D.: Distinctive image features from scale-invariant keypoints. International Jour-
nal of Computer Vision 60(2), 91–110 (2004)
8. Lu, X., Wang, G., Yang, J., Pang, Y., Zhang, L.: Photo2Trip: Generating travel routes
from geo-tagged photos for travel planning. In: Proceedings of the 18th ACM Interna-
tional Conference on Multimedia, pp. 143–152 (2010)
9. Okuyama, K., Yanai, K.: A travel planning system based on travel trajectories extracted
from a large number of geotagged photos on the Web. In: Proceedings of the Pacific-Rim
Conference on Multimedia (2011)
10. Weyand, T., Leibe, B.: Discovering favorite views of popular places with iconoid shift.
In: Proceedings of the 13th IEEE International Conference on Computer Vision, pp.
1132–1139 (2011)
11. Xiao, J., Hays, J., Ehinger, K., Oliva, A., Torralba, A.: SUN Database: Large-scale scene
recognition from abbey to zoo. In: Proceedings of the 2010 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, pp. 3485–3492 (2010)
12. Zheng, Y.T., Zhao, M., Song, Y., Adam, H., Buddemeier, U., Bissacco, A., Brucher,
F., Chua, T.S., Neven, H.: Tour the World: Building a Web-scale landmark recognition
engine. In: Proc. 2009 IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, pp. 1085–1092 (2009)
Construction of Content Recording
and Delivery System for Intercollegiate Distance
Lecture in a University Consortium

Takeshi Morishita, Kizuku Chino, and Masaaki Niimura

Abstract. In offering intercollegiate distance lectures, problems arose from the
different timetables of the universities in a university consortium. To address this
problem, we constructed a content recording and delivery system, based on the
distance learning system, to accommodate students who could not attend the
lectures. As a result of combining various systems, students were able to attend
the lectures in real time or via video-on-demand (VOD) content delivery,
depending on the university. In addition, we suggest the possibility of encouraging
students to view a lecture again.

Keywords: distance lecture, e-learning, consortium, system development.

1 Introduction
The consortium of higher education in Shinshu (Koutou Kyouiku Konsoshiamu
Shinshu) is a joint body with the aim of maintaining the individuality and
nurturing the talents within eight universities geographically dispersed around
Nagano Prefecture in central Japan. To meet these aims, the intercollegiate
distance learning system was set up in November, 2008 so that shared classes
could take place with faculty members and students from each of these
universities.
Since April 2009, we have operated the intercollegiate distance learning
system and have been able to connect distance-learning lecture rooms in real time. For

Takeshi Morishita ⋅ Masaaki Niimura


Shinshu University

Kizuku Chino
The Consortium of Higher Education in Shinshu
3-1-1 Asahi Matsumoto City, Nagano 390-8621 Japan
e-mail: morisita@shinshu-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 163–172.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

example, we held the intercollegiate community event “K3 Salon” for faculty
members and students twenty-four times between May, 2009 and September,
2011, which promoted utilization and operational testing of the system[1,2].
Such distance learning systems have also been constructed at other universities
and consortia [3], and faculty members have used ours for conferences entirely within
Shinshu University. In the case of intercollegiate distance lectures, however,
students were often unable to attend because the lesson schedules of the
universities differ. In addition, faculty members pointed out that it was difficult to
offer distance lectures without any disparity between universities and to
expect the same educational effect as with face-to-face classes. To solve these
problems, Morikawa et al. [4] offered distance lectures on the schedule of the
backbone university. Alternatively, lectures have been held in a time slot common
to all universities within a 120-minute window [5].
Building on these previous studies, we propose recording distance
lectures and delivering the content in multiple formats to accommodate students who
cannot attend. We considered it important to build the recording system into the
existing system, so that it remains easy to use for everyone already familiar with
that system.
In this paper, we aim to construct a content recording and delivery system
based on the existing intercollegiate distance learning system in the consortium of
higher education in Shinshu.

2 Functionality Requirements
This system had to fulfill the following requirements for distance lectures.

• Connection to Distance-Learning Lecture Rooms in Real Time

Students are effectively able to attend the distance lecture. In this instance, the
timetable of each university should be the same. In these classes, because the
teacher-student rapport is maintained, educational effects should be equivalent to
face-to-face classes.

• Content Delivery with VOD (Video on Demand)

The image and voice are transmitted to the content recording system, and lecture
content is created automatically. The content is then transferred to the
content delivery system and published as VOD on an LMS (Learning Management System),
so students can watch it at their convenience on a personal computer at
home or in an internet-equipped study room.
However, because there is no teacher-student rapport, a communication channel
such as the LMS or e-mail, as above, is necessary. In addition,
students also need their own accounts and passwords to log into the LMS on
which the content is published.

3 Configuration
To realize an intercollegiate distance learning system meeting the above
requirements, the following equipment and software are necessary.

3.1 Intercollegiate Distance Learning System[1,2]


A teleconference system is set up at each university to send and receive high-quality
sound and HD (High Definition) images between projectors, cameras and
lecturers' personal computers. Each university is connected by optical fiber to a
Multi-point Control Unit (MCU), a Polycom RMX2000 at Shinshu University.
Each university can participate in fixed or mobile video conferences. A fixed
conference takes place in a dedicated classroom or conference room, set up for
distance learning or video conferencing with large groups of people and equipped
with either a Polycom HDX9002 or a SONY PCS-XG80 teleconferencing system.
Dedicated classrooms are equipped with a Polycom HDX9002 as the CODEC
(COder-DECoder) controller. There are two or three HD projector screens or
plasma televisions of at least fifty inches, and a Component/Composite/RGB
matrix switch as the output controller. The system control unit interfaces
with a touch panel and controls the whole system (Fig. 1).
Fig. 1 Fixed System for Dedicated Classroom

The mobile system can be taken to any classroom or laboratory that has
internet access. A SONY PCS-XG80 is used in combination with an LCD TV of at
least twenty-six inches. The entire system is mounted on a castered rack that can be
moved easily and is wired so that only one plug needs to be connected to a
power outlet.

3.2 Content Recording and Delivery System


A student studies alone with his or her computer connected to the internet at home
or in a study room. This system consists of a content recorder (Mediasite ML
Recorder) and a content delivery server (Mediasite EX Server) at Shinshu University.
The image and voice recorded by the Mediasite ML Recorder are sent from the
MCU through a Polycom HDX7000 teleconference system. The recorded content
data is preserved on a content storage server (ASACA DS1200).

3.3 Booking System


The existing booking system did not hold class names, faculty members or course
information, because distance lectures were not previously recorded. Since recording
distance lectures is the purpose of this study, the booking system now has to
start and stop the recording system and transmit the class name, faculty members
and course information to the content recording and delivery system, in addition to
regulating the start-up of each university's distance learning system and the
connections between the systems.
The booking system is composed of the following, and there is also an
authentication system.
• Booking Information Manager
We are able to book and manage the booking information; distance-learning
lecture rooms, date, timetable, class name, faculty members, course information
and so on.
• Device Integration Server
The device integration database is created on the basis of the booking information
in the booking information manager and data of the authentication system
described below. A control signal is created on the basis of this database to control
related equipment.
• Schedule Manager
This manager loads the schedule of distance lectures from the booking
information system and controls MCU and the recording system on the basis of
the device integration database. It is composed of Princeton Meeting Organizer
conference management and scheduling software.

3.4 Learning Management System and Authentication System


Recorded content is automatically linked from the content delivery server to a
Moodle LMS. The authentication system is based on the existing LDAP
(Lightweight Directory Access Protocol) authentication system at Shinshu University,
which holds a data table of all participants in the consortium of higher education in
Shinshu.

Fig. 2 Design of Intercollegiate Distance Learning System (fixed and mobile systems at each member university, connected over SINET, wide-area Ethernet, B Flets and the prefectural network to the MCU, schedule manager, telecon recorders, content recorder, content delivery server and content storage server at Shinshu University)

4 Routine and Design


The following shows the routine by which distance lectures are recorded to the
content recorder and delivered as content to students on the internet. Fig. 2 is a
design of the system which is constructed according to the following routine.

(1) Set start-up-time, finish-time and rooms for distance lectures on the
booking system.
(2) Turn on distance learning system of each room automatically three
minutes before the start-up-time.
(3) Connect distance-learning lecture rooms to MCU automatically at the
start-up-time. In addition, start recording automatically with Polycom
RSS2000 teleconference recorder and content recorder in Polycom
HDX7000 at Shinshu University.
(4) Start distance lecture.
(5) Finish distance lecture at least one minute before the finish-time.
168 T. Morishita, K. Chino, and M. Niimura

(6) Disconnect distance-learning lecture rooms automatically from MCU one


minute before the finish-time. In addition, stop recording by content
recorder automatically.
(7) Turn off distance learning system at the finish-time.
(8) Publish contents through content delivery system.
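The timing rules in steps (2), (3), (6) and (7) can be sketched as a small Python function. This is an illustrative sketch of the booking logic only, not the actual system's implementation; the function and field names are assumptions.

```python
from datetime import datetime, timedelta

def schedule_events(start, finish):
    """Derive the control timestamps for one booked distance lecture.

    Power on 3 minutes before the start-up-time, connect and start
    recording at the start-up-time, disconnect and stop recording
    1 minute before the finish-time, and power off at the finish-time.
    """
    return {
        "power_on": start - timedelta(minutes=3),     # step (2)
        "connect_and_record": start,                  # step (3)
        "disconnect": finish - timedelta(minutes=1),  # step (6)
        "power_off": finish,                          # step (7)
    }

events = schedule_events(datetime(2011, 4, 11, 10, 30),
                         datetime(2011, 4, 11, 12, 0))
print(events["power_on"])   # 2011-04-11 10:27:00
```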

5 Construction

5.1 Cooperation
Table 1 shows the necessary functions of each system and the signals/data each must
receive. The received signals/data listed in the table are needed for the systems to
cooperate and work together smoothly.
Table 2 shows the data required at booking time for each content delivery form.

Table 1 Necessary Functions and Received Signal/Data of Each System

• Intercollegiate Distance Learning System
  Functions: delivery of image and voice
  Received signal/data: control signal for each piece of equipment from the booking system
• Content Recording and Delivery System
  Functions: recording classes; delivery of contents; live broadcast of classes
  Received signal/data: connection rooms, time and control signal from the booking system; class name, faculty members and course information in the device integration server
• Booking System
  Functions: connection; booking to record
  Received signal/data: class name, faculty members and course information in the device integration server
• LMS
  Functions: distributing the materials; handing in papers
  Received signal/data: class name, faculty members and course information in the device integration server; URL, viewing records and recording file name in the content delivery system
• Authentication System
  Functions: user authentication between systems; input of course information
  Received signal/data: class name, faculty members and course information in the device integration server

Table 2 Necessary Data for Each Content Delivery Form on Booking

• Real Time: rooms, time, class name, faculty member
• VOD: class name, faculty member, course information, content delivery URL, view records
5.2 Authentication
As these tables show, the authentication system is essential when constructing a
content recording and delivery system on top of an intercollegiate distance
learning system, because lecture data is closely tied to particular faculty
members and students. We therefore clarified the necessary access authorities, and
designed and constructed an authentication system accordingly (Table 3).
We decided to use part of the LDAP authentication system at Shinshu University
to authenticate users and manage access authorities. However, because conflicts
were possible, we constructed a dedicated LDAP server for the consortium and
made it work without contradictions.

Table 3 Access Authorities

• Administrator: full access to the booking system, the content recording and delivery system, the LMS and the authentication system
• Booking Administrator: full access to the booking system
• Authentication Administrator: full access to the authentication system
• Faculty Member: view only on the booking system; full access (own classes only) on the content recording and delivery system and the LMS
• Student: view only on the content recording and delivery system and the LMS
• Guest: view only (open classes only) on the content recording and delivery system and the LMS

6 Evaluation
Each university has offered credit-transfer lectures through the intercollegiate
distance learning system since April 2010. In the first semester of 2011 (April to
September), 854 students registered for fourteen distance lectures.

6.1 Support
For the first two weeks, a technical assistant from the consortium visited each
university and supported the delivery of distance lectures. The assistant handed out
a manual for the distance learning system and instructed the faculty members of
each university. In addition, another assistant at the consortium supported the
distance lectures of each university by remote control.
After this assistance period, faculty members called the technical assistants
whenever there was a problem or question. The assistants answered by telephone or
e-mail, and provided remote-control support as necessary.

6.2 Status of Content View


In total, 159 lectures were recorded for the fourteen classes from six universities,
and they drew a total of 3065 views during the semester.

According to the access log, 24.4% of all viewing took place on weekends, and on
weekdays the number of views rose after school hours, from 18:00 to 23:00 (Fig. 3).
This indicates that students used the content to prepare for or review lectures,
and that the system was able to meet this need. In addition, the content may have
encouraged students to study on their own outside of school hours.

Fig. 3 The Number of Views per Hour in Weekday

6.3 Questionnaire Results


We conducted a questionnaire survey of all students on their use of the content. The
survey was conducted on the internet, and students taking more than one class
answered for each of their classes. A total of 352 students responded, of which
165 responses were valid (46.9%).
According to the survey, 117 students (70.9% of the valid responses) had watched
the content, and 111 of them (94.9%) watched it at home. The major reasons given
were to follow up on or review a real-time lecture, and to catch up on lectures they
had not attended (Table 4). This suggests that the system solved the problem of
students being unable to attend lectures because of the differing lesson schedules
of the universities.

Table 4 Why Did You Watch Contents? (Multiple Answers)

Reason Quantity Rate (%)


1. To review a previous lecture 71 43.0
2. To learn more about an unclear point in the lecture 70 42.4
3. To learn the lecture not attended 56 33.9
4. To be interested in the lecture 24 14.6
5. To have enough time to watch 10 6.1
6. To be coached by the lecturer 9 5.5
7. To review learning from high school 3 1.8

In addition, 90.9% of students felt that the content helped them to understand the
class (Fig. 4). The content was felt to be necessary by 98.8% of students, of whom
43.6% considered it indispensable (Fig. 5). We therefore conclude that students
needed the content to understand the lectures.

Fig. 4 Contents Helped to Understand the Class? (Strongly Agree 41.8%, Somewhat Agree 49.1%, Neither Agree nor Disagree 8.5%, Somewhat Disagree 0.6%, Strongly Disagree 0.0%)

Fig. 5 Contents Were Necessary? (Absolutely Necessary 43.6%, Only as a Guide 46.7%, Better than Nothing 8.5%, Unnecessary 1.2%)

7 Conclusions
We constructed a content recording and delivery system based on the existing
intercollegiate distance learning system, to accommodate students who are unable
to attend lectures because of timetabling differences between the universities in
the consortium of higher education in Shinshu.
In a semester of practice, much of the content recorded by this system was used
at the convenience of each university and student. We therefore conclude that the
system achieved the aim of this paper: recording lectures and delivering a range of
content to accommodate students who cannot attend. In addition, the content may
have encouraged students to study on their own outside of school hours.
In the future, we would like to analyze the viewing status of each student and
clarify their study records and efforts with this system.

Acknowledgment. This research is part of a support project for strategic university collaborations
in 2008 by the Ministry of Education, Culture, Sports, Science and Technology in Japan.
We also thank the faculty members and students in the consortium of higher education in
Shinshu, Audio Visual Communications ltd., Delight Technology, Mediasite K.K. and NEC
Networks & System Integration Corporation.

Relevant URLs. The Consortium of Higher Education in Shinshu:


http://www.c-snet.jp/ (in Japanese)

References
1. Morishita, T., Niimura, M.: Quantitative Evaluation of an Intercollegiate Distance
Learning System. In: Proc. of World Conference on E-Learning in Corporate,
Government, Healthcare, and Higher Education 2009, pp. 3563–3568 (2009)
2. Morishita, T., Chino, K., Suzuki, H., Nagai, K., Niimura, M., Yabe, M.: Practice of K3
Salon with Intercollegiate Distance Learning System in the Consortium of Higher
Education in Shinshu. Journal for Academic Computing and Networking 14, 105–116
(2010) (in Japanese)
3. Sakurada, T., Hagiwara, Y.: Deployment of HD Videoconference System for Remote
Lectures at 18 National Universities. IEICE Technical Report, IA2008-82 108(460),
91–95 (2009) (in Japanese)
4. Morikawa, H., Ruangrassamee, A., Chen, H.: A Practice of International Distance
Lecture through the Internet. Geotechnical Engineering Magazine, JGS Ser. No.
610 56(11), 34–35 (2008) (in Japanese)
5. Terao, Y.: On Distance Education of Universities using SCS and its Evaluation. The
Journal of School Education 14, 179–184 (2002) (in Japanese)
Data Embedding and Extraction Method
for Printed Images by Log Polar Mapping

Kimitoshi Tamaki, Mitsuji Muneyasu, and Yoshiko Hanada

Abstract. Methods for extracting data from data-embedded printed images using
capture devices such as scanners or mobile cameras have attracted much attention.
In this technique, handling geometrical deformation, especially rotation and
scaling, during data extraction is essential. This paper proposes a new correction
algorithm that requires no conspicuous markers. The method exploits the log polar
mapping (LPM) to detect and correct the distortion, estimating the rotational angle
and scaling factor from the amount of shift in the LPM domain. Therefore, no data
embedding or additions for correcting the deformation are required, and a
data-embedded image of high quality is obtained. An image clipping technique
suitable for the proposed method is also developed. Experimental results show the
effectiveness of the proposed method.

1 Introduction
Recently, QR codes are frequently used to help readers access web sites from
printed advertisements and publications. The QR code is one of the two-dimensional
codes most widely used in Japan. However, a two-dimensional code degrades the
appearance of the original design of the printed matter, since it looks like a
conspicuous mosaic-shaped figure. Therefore, data embedding technology, which
embeds data directly into an image on the printed matter, has been proposed and
has attracted attention [1], [2], [3].
Data embedding technology is an application based on digital watermarking [4].
The severe influence of printers and image scanning devices must be considered.
In particular, geometrical distortion introduced by scanning, such as rotation, has
a strong influence, and some countermeasure is required. In such cases, a reference
mark is usually needed to correct the geometrical distortion; the mark serves as
the cue by which the amount of distortion is measured.

Kimitoshi Tamaki · Mitsuji Muneyasu · Yoshiko Hanada


Faculty of Engineering Science, Kansai University
3-3-35 Yamate-cho, Suita, Osaka, 564-8680, Japan
e-mail: muneyasu@kansai-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 173–181.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

A rotation correction method using reference marks has therefore been
proposed [4]. This method embeds reference marks in the discrete Fourier
transform (DFT) domain to detect the rotational angle of the captured image, and
can correct the rotation by using these marks. However, the reference marks must
have large values, which degrades the image quality. In addition, for an image
with a white background, it is difficult to determine the correct region of the
image.
In this paper, we propose a new method for detecting and correcting the rotational
angle and scaling factor without embedding reference marks. The LPM (log polar
mapping) [5] and POC (phase-only correlation) [6] are exploited to estimate the
rotational angle and scaling factor. The original image is used not only for the
detection of messages, but also for the estimation of the distortion. The proposed
method can also be applied to images with a white background. Experiments show
the effectiveness of the proposed method.

2 Marker Embedding Method


In this method, the data and the reference marks are embedded in the DCT and
DFT domains, respectively. The reference marks are embedded at four positions
arranged symmetrically about the DC component. This symmetry must be satisfied
because the DFT of a real-valued image is conjugate-symmetric.

2.1 Embedding Procedure


First, the DFT is applied to the original image, and the reference marks are
placed at positions arranged symmetrically about the DC component. After
embedding the reference marks, the inverse DFT is applied. The configuration of
the reference marks is shown in Fig. 1.
For embedding the data, the DCT is applied to the image. The spread spectrum
technique [7] is used to generate a printed image containing embedded data; in
the following, the embedded data are called message bits. Each bit of information
is replaced by a diffusion code and embedded in the frequency domain of the
image. Here, Walsh codes are used as the diffusion codes. A Walsh code is
assigned to each bit to be embedded: the code is kept as it is if the bit is zero,
and all chips of the code are reversed if the bit is one. All codes are added
together to generate the embedding data W. Finally, W is multiplied by a gain
coefficient g and added to the DCT coefficients D of the original image in the
region for embedding the message bits:

Dw = D + g·W    (1)

where Dw denotes the DCT coefficients after embedding.


The embedding region for the message bits is shown in Fig. 2. This region is
allocated within the intermediate frequency area along an embedding baseline
drawn from the DC component in the direction of the angle θ. The region is chosen
to give tolerance to geometrical transformation [2]. Moreover, the same
message-bit sequence is embedded in three lines centered on this embedding
region; this redundancy gives robustness against the disturbance caused by
printing and image capturing.

Fig. 1 Configuration of the reference marks in the DFT domain.

Fig. 2 Configuration of the message bits in the DCT domain.

2.2 Detection Procedure


For the captured image, the rotation correction is applied. From the rotated image,
only the object area is clipped out and resized to the size of the original image, and
the DCT is applied to the image. Finally we extract the message bits and detect the
information data.
We describe the rotation correction method and the procedure for the resize of
the image in the following.

【 Rotation correction algorithm 】

Step 1: The DFT is applied to the captured image, whose size has been adjusted to
that of the original image.
Step 2: The DFT coefficients in the shadowed area are set to 0. The shadowed area
is the low frequency area shown in Fig. 1; there, the magnitude of the DFT
coefficients is large and disturbs the detection of the reference marks.
Step 3: The maximum DFT components in the first and fourth quadrants are found;
these coefficients are taken to be the reference marks. Let α1 be the angle between
the reference mark in the first quadrant and the horizontal axis, and α2 the angle
between the reference mark in the fourth quadrant and the vertical axis.
Step 4: The rotational angle ω is obtained as ω = (α1 + α2)/2 − π/4.

After the rotation correction, the corrected image should be resized. In this paper,
the following method is used.

【 Resize method for the corrected image 】

Step 1: The rotation-corrected image is binarized, and black-and-white pixel
reversal is performed.
Step 2: Labeling is applied to the binarized image, and the pixels carrying the
same label as a corner of the image are turned black.
Step 3: Labeling is applied again, and the area of maximum size is turned black.
Step 4: The black-and-white pixel reversal is performed again.
Step 5: For the obtained image, we scan inward from every point on each side of
the image and record the distance to the first black pixel found.
Step 6: The trimmed means of the obtained vertical and horizontal distances are
calculated, and these values are used to clip only the image part.

To extract the message bits, the difference W′ between the DCT coefficients Dw′
of the luminance component of the captured image and those of the original image
is calculated:

W′ = Dw′ − D    (2)

The information bits are then detected from the inner products of the predefined
diffusion codes with W′. If the inner product is positive, the information bit is
judged to be 0; if negative, it is judged to be 1. A majority decision is also
applied across the three extracted message-bit sequences.
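The spread-spectrum embedding of eq. (1) and the sign-based extraction of eq. (2) can be illustrated with a minimal sketch. This is a toy example, not the paper's implementation: a short coefficient vector stands in for the mid-frequency DCT region, and the code length, bit count and gain are illustrative choices.

```python
import numpy as np

def walsh_codes(order):
    """Walsh-Hadamard codes (rows of a 2**order Hadamard matrix)."""
    h = np.array([[1.0]])
    for _ in range(order):
        h = np.block([[h, h], [h, -h]])
    return h

def embed(D, bits, g):
    """Eq. (1): bit 0 keeps its code, bit 1 reverses all chips; sum and add."""
    order = int(np.ceil(np.log2(len(bits) + 1)))   # skip the all-ones row 0
    codes = walsh_codes(order)[1:len(bits) + 1]
    W = sum(c if b == 0 else -c for b, c in zip(bits, codes))
    return D + g * W

def extract(Dw_captured, D, n_bits):
    """Eq. (2) plus the sign of the inner product with each diffusion code."""
    order = int(np.ceil(np.log2(n_bits + 1)))
    codes = walsh_codes(order)[1:n_bits + 1]
    W_prime = Dw_captured - D
    return [0 if c @ W_prime > 0 else 1 for c in codes]

rng = np.random.default_rng(0)
D = rng.normal(size=8)        # stand-in for 8 mid-frequency DCT coefficients
bits = [1, 0, 1, 1, 0]
Dw = embed(D, bits, g=9.0)
print(extract(Dw, D, len(bits)))   # [1, 0, 1, 1, 0]
```

The orthogonality of the Walsh rows makes each inner product recover exactly one bit's sign in this noise-free setting; with print/scan noise, the majority decision over the three redundant sequences adds robustness.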

3 Proposed Method
This paper proposes a new correction algorithm that requires no conspicuous
markers. The method is based on the LPM and POC. It estimates the rotational
angle and scaling factor from the amount of shift in the LPM domain, so no data
embedding or additions for correcting the deformation are required, and a
data-embedded image of high quality is obtained. An image acquisition technique
suitable for this method is also developed. The embedding and detection methods
for the message bits are the same as in the marker embedding method, so we omit
their details in this section.

3.1 LPM
The LPM transforms the orthogonal coordinate system into a polar coordinate
system whose radius axis is expressed on a log scale. First, the point (x, y) in
the orthogonal coordinate system is transformed to polar coordinates:

r = √(x² + y²), θ = tan⁻¹(y / x)    (3)

where r and θ denote radius and angle. r is further transformed by ρ = ln r.
Finally, the point (x, y) is mapped to the point (ρ, θ).
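A small numeric check of eq. (3), and of the property the next subsection exploits: scaling a point by σ and rotating it by α shifts its log-polar coordinates by exactly (ln σ, α). This is an illustrative sketch, not part of the paper's implementation.

```python
import numpy as np

def to_log_polar(x, y):
    """Eq. (3): Cartesian -> log-polar coordinates (rho = ln r, theta)."""
    r = np.hypot(x, y)         # r = sqrt(x^2 + y^2)
    theta = np.arctan2(y, x)   # theta = atan(y / x), quadrant-aware
    return np.log(r), theta

rho0, th0 = to_log_polar(3.0, 4.0)
sigma, alpha = 2.0, 0.3        # scaling factor and rotation angle (radians)
x2 = sigma * (3.0 * np.cos(alpha) - 4.0 * np.sin(alpha))
y2 = sigma * (3.0 * np.sin(alpha) + 4.0 * np.cos(alpha))
rho1, th1 = to_log_polar(x2, y2)
# The log-polar coordinates shift by (ln sigma, alpha):
print(round(rho1 - rho0, 6), round(th1 - th0, 6))   # 0.693147 0.3
```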

3.2 Effect of Deformation in the LPM Domain


The original image f(x, y) and the captured image g(x, y), which has undergone
translation, rotation and scaling, are related by

g(x, y) = f(σ(x cos α + y sin α) − x0, σ(−x sin α + y cos α) − y0)    (4)

where (x0, y0), α and σ denote the amounts of translation, rotation and scaling,
respectively. Let F(u, v) and G(u, v) be the Fourier transforms of f(x, y) and
g(x, y); then the following relationship holds between their amplitude spectra:

|G(u, v)| = σ⁻² |F(σ⁻¹(u cos α + v sin α), σ⁻¹(−u sin α + v cos α))|    (5)

so the amount of translation can be ignored. The polar coordinates can be
represented by

u = e^ρ cos θ, v = e^ρ sin θ    (6)

and (5) can be rewritten as

|G(u, v)| = σ⁻² |F(σ⁻¹ e^ρ cos(θ − α), σ⁻¹ e^ρ sin(θ − α))|    (7)

With ρ = ln r, (7) becomes

|G(ρ, θ)| = σ⁻² |F(ρ − ln σ, θ − α)|    (8)

This equation shows that, in the LPM domain, the scaling and rotation of the
captured image appear as translations by ln σ and α. We can therefore estimate
the amounts of scaling and rotation from the amount of translation in the LPM
domain. The POC method [6] is used to estimate this translation; it computes the
matching between images efficiently and accurately by using phase correlation.
The translation obtained by the POC method yields the estimates of the amounts of
rotation and scaling.
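The POC matching step can be sketched as follows, for the simple case of an exact circular translation between two images. The small regularizer and the wrap-around handling are implementation choices of this sketch, not prescribed by the paper.

```python
import numpy as np

def poc_shift(f, g):
    """Return (dy, dx) such that g == np.roll(f, (dy, dx), axis=(0, 1)).

    The normalized cross-power spectrum G * conj(F) / |G * conj(F)| keeps
    only the phase; its inverse FFT has a sharp peak at the translation.
    """
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    R = G * np.conj(F)
    R /= np.abs(R) + 1e-12          # phase-only: discard magnitude
    corr = np.real(np.fft.ifft2(R))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap shifts larger than half the image into negative values.
    if dy > f.shape[0] // 2: dy -= f.shape[0]
    if dx > f.shape[1] // 2: dx -= f.shape[1]
    return int(dy), int(dx)

rng = np.random.default_rng(1)
f = rng.random((64, 64))
g = np.roll(f, (5, -3), axis=(0, 1))   # g is f translated by (5, -3)
print(poc_shift(f, g))                 # (5, -3)
```

Applied to the two LPM-domain magnitude spectra, the detected shift along the angular axis corresponds to the rotation and the shift along the radius axis to the logarithm of the scaling.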

3.3 Precision Consideration in the LPM Domain


The measurement precision in the LPM domain strongly affects both the estimation
accuracy of the deformations and the required estimation time, so the precision of
each coordinate must be chosen carefully. In the proposed method, the precision is
specified by parameters.
The precision of the angular axis is specified by

W = (360 / h) × nw    (9)

where h is a parameter specifying the mapping region taken from the whole image
and nw specifies the precision of the angular axis.
The precision of the radius axis is defined as

R = rmax × nr = ⌊√2·X / 2⌋ × nr    (10)

where the image size is X × X, ⌊·⌋ denotes rounding down, and nr is a parameter
specifying the precision of the radius axis.
Figure 6 shows the relationship between the orthogonal coordinate system and
the log polar coordinate system. The point (a, b) in log polar coordinates is
related to the point (x, y) in orthogonal coordinates by

x = e^(b·ln(rmax)/(R−1)) cos(a / nw), y = e^(b·ln(rmax)/(R−1)) sin(a / nw)    (11)

This relation shows that a lattice point in one coordinate system does not always
correspond to a lattice point in the other, so an appropriate interpolation is
required. In this paper, bilinear interpolation is adopted.

Fig. 6 Relationship between coordinate systems.

3.4 Detection Method


The detection method for the message bits is summarized as follows. First, the
size of the captured image is measured, a rectangular region which includes the
target image is clipped, and the image is resized to the size of the original
image. The resized image and the original one are 2D Fourier transformed and the
magnitude spectra of these images are calculated. The LPM is applied to both
images. Since the effect of rotation and scaling on the captured image appears
as a translation in the LPM domain, the POC method is used to detect this
translation.
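A generic phase-only correlation step of this kind might look as follows (an illustrative sketch with names of our choosing; the subpixel refinement of [6] used in the actual method is omitted):

```python
import numpy as np

def poc_translation(f, g):
    """Estimate the integer translation of g relative to f by phase-only
    correlation: the POC function has a sharp peak at the displacement."""
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    cross = np.conj(F) * G
    cross /= np.abs(cross) + 1e-12                 # keep phase information only
    poc = np.real(np.fft.ifft2(cross))             # the POC function
    peak = np.unravel_index(np.argmax(poc), poc.shape)
    # fold peaks beyond the half-size back to negative displacements
    dy, dx = (p - s if p > s // 2 else p for p, s in zip(peak, poc.shape))
    return dx, dy                                  # (x_t, y_t)
```

Applied to the two LPM-domain images, the returned pair corresponds to the translations along the angular and radius axes used in Eq. (12).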
Data Embedding and Extraction Method for Printed Images 179

Let x_t and y_t be the number of translation pixels along the angular and radius
axes, respectively; then the rotation angle α and the scaling factor σ can be
obtained by

α = x_t / n_w,
σ = e^(−y_t · ln(r_max) / (R − 1)).    (12)

By using α and σ, the captured image is corrected.
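The conversion in Eq. (12) is a one-liner each; a minimal sketch (function name ours; the example values follow Sect. 4):

```python
import math

def rotation_and_scale(x_t, y_t, n_w, R, r_max):
    """Convert the LPM-domain translation (x_t, y_t) into a rotation angle
    alpha (in the angular-axis units) and a scaling factor sigma, per Eq. (12)."""
    alpha = x_t / n_w
    sigma = math.exp(-y_t * math.log(r_max) / (R - 1))
    return alpha, sigma

# With the Sect. 4 parameters (n_w = 5; R = 724 and r_max = 362 for X = 512):
print(rotation_and_scale(x_t=10, y_t=0, n_w=5, R=724, r_max=362))  # -> (2.0, 1.0)
```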


When the captured image is resized, the blank area surrounding the data-embedded
image affects the estimation of the scaling, since the blank attenuates the peak
value of the POC function by decreasing the correlation between the images. In
this paper, the image part, together with a blank border of width bl, is clipped
again from the corrected image. To identify the image part, the POC method is used.
For this clipped image, the amount of rotation and scaling is estimated. The
image is corrected by the estimated values and only the image part is clipped.
Finally, the DCT is applied to the image and the message bits are extracted.

4 Experimental Results
To show the effectiveness of the proposed method, we compare it with the
conventional method [4]. In this experiment, a Canon LBP5400 printer with a
resolution of 600 dpi and an EPSON GT-X770 image scanner with a resolution of
300 dpi were used. Five grayscale images of size 512 × 512 were selected.
Figure 7 shows an example of the original image. The value of the reference mark
was 320,000 for the conventional method. 48 bits were embedded and the gain
coefficient was 9 for both methods. We also set n_w = 5, n_r = 2, h = 5 and
bl = 15, experimentally. Figure 8 shows an example of the data-embedded image.

Fig. 7 Original image.



Fig. 8 Data Embedded image.

First, to evaluate image quality, the peak signal-to-noise ratio (PSNR) was
adopted. The result is shown in Table 1. From this result, the image quality of
the proposed method is superior to that of the conventional one.
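For reference, the PSNR used in such comparisons is the standard definition for 8-bit grayscale images; a minimal sketch (not the authors' evaluation code):

```python
import numpy as np

def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized grayscale images
    (standard definition: 10 * log10(peak^2 / MSE))."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(distorted, dtype=float)
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)
```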
Next, the data detection rate was evaluated. Table 2 shows the average detection
rates over 15 trials. The detection rate of the proposed method is over 95% for
each image and is almost equivalent to that of the conventional one. Superior
results were obtained for the Airfield and Goldhill images, whose backgrounds
are nearly white, since the resize method of the conventional approach is
strongly affected by the intensity of the background.

Table 1 Evaluation of image quality.

Image      Conventional [dB]   Proposed [dB]
Barbara    40.45               42.20
Lenna      40.44               42.29
Airfield   40.56               42.20
Peppers    40.44               42.34
Goldhill   40.46               42.29

Table 2 Evaluation of detection rate.

Image      Conventional [%]    Proposed [%]
Barbara    95.17               97.22
Lenna      96.25               95.56
Airfield   83.61               95.00
Peppers    98.89               95.69
Goldhill   Not detectable      96.53

5 Conclusion
In this paper, a new correction method to the captured image for extracting the
embedded data from the printed image has been proposed. This method estimates
the rotational angle and scaling factor based on the amount of the translation in the
LPM domain. Compared to the conventional method, the image quality can be
improved, since no markers are embedded. The employment of the POC method
enables the proposed method to detect the data from the image whose background
is nearly white. Experimental results confirmed the effectiveness of the proposed
method.

Acknowledgement. Part of this research was financially supported by JSPS
Grant-in-Aid for Scientific Research (C), 23560479.

References
1. Mizumoto, T., Matsui, K.: Robustness Investigation of DCT Digital Watermark for
Printing and Scanning. Trans. of IEICE (A) J85-A(4), 451–459 (2002)
2. Nakanishi, K., Shono, M., Muneyasu, M., Hanada, Y.: Data Detection from Data
Embedding Printing Images Using Cellular Phones with a Camera. Proc. SISB 2008,
111–114 (2009)
3. Kudo, H., Furuta, K., Muneyasu, M., Hanada, Y.: Automatic Information Retrieval
from Data Embedded Printing Images Using Correction of Rotational Angles Based on
Reference Marks. In: Proc. 2010 ISCIT, pp. 626–629 (2010)
4. Cox, I.J., Miller, M.L., Bloom, J.A.: Digital Watermarking. Morgan Kaufmann
Publishing, San Francisco (2002)
5. Zheng, D., Zhao, J., Saddik, A.E.: RST-Invariant Digital Image Watermarking Based on
Log-Polar Mapping and Phase Correlation. IEEE Trans. on Circuits Syst. Video
Technology 13(13) (2003)
6. Foroosh, H(S.), Zerubia, J.B., Berthod, M.: Extension of Phase Correlation to Subpixel
Registration. IEEE Trans. on Image Process. 11(3), 188–200 (2002)
7. Ruanaidh, J.J.K.O., Pun, T.: Rotation, scale and translation invariant spread spectrum
digital watermarking. Signal Process. 66(3), 303–317 (1998)
Design and Implementation of Computer
Assisted Training System for Nursing Process
Learning

Seiichiro Takami, Toshinobu Kawai, Takako Takeuchi, Yukiko Fukuda,


Satoko Kamiya, Kaori Nakajima, Setsuko Maeda, Junko Okumura,
Misako Sugiura, and Yukuo Isomoto

Abstract. Nurses are required to grasp the condition of each patient from
physical, mental, social, and spiritual perspectives. Based on this grasp,
nurses realise both disease control, through observation and treatment, and
support for the patient's daily living. This work process is called the
“nursing process”. Teaching assessment skills is one of the most important
tasks in understanding the nursing process, and a support system for mastering
these skills has long been awaited. We developed CASYSNUPL, a Computer Assisted
System for Nursing Process Learning, and have used it in lectures and practice.
With the system, learners can visualise a medical case through media files and
can study on a stand-alone computer using a template file built with MS-Excel
VBA. Learners can thus come to understand the nursing process much as they would
through one-to-one practice. Learners evaluated CASYSNUPL as helpful for their
understanding of the nursing process.

1 Introduction
A nurse's responsibility is to care for each patient on an individual basis.
Therefore, nurses are required to grasp the condition of each patient in a
comprehensive way, from physical, mental, social, and spiritual perspectives.
Based on this grasp, nurses realise both disease control, through observation
and treatment, and support for the patient's daily living (figure 1). This work
process is called the “nursing process”, and it is a series of steps involving
assessment, nursing diagnosis, planning,
Seiichiro Takami ⋅ Toshinobu Kawai ⋅ Takako Takeuchi ⋅ Yukiko Fukuda ⋅


Satoko Kamiya ⋅ Kaori Nakajima ⋅ Setsuko Maeda ⋅ Junko Okumura ⋅ Misako Sugiura
Faculty of Nursing Japanese Red Cross Toyota College of Nursing
12-33, Nanamagari, Hakusan-cho, Toyota, Aichi Prefecture, 471-8565, Japan

Yukuo Isomoto
Faculty of Human Life and Environmental Science, Nagoya Women’s University

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 183–189.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
184 S. Takami et al.

implementation, and evaluation. This report focuses on the teaching of
“assessment”, the first step of the nursing process. Assessment is the process
of gathering information from patients to identify their problems through
analysis and synthesis. It has a great effect on the following steps, such as
nursing diagnosis and nursing care, so it is an important process that must be
handled cautiously and appropriately. For this reason, teaching assessment
skills is one of the most important tasks, and the development of a support
system for nursing-process education has long been awaited.
The nursing process is a process of solving a patient's problems, so realising
a support system by computer requires a design based on POS (problem oriented
system). Promoting and practicing the nursing process requires a broad range of
interdisciplinary knowledge, critical thinking, discernment grounded in nursing,
the ability to build relationships with patients, nursing skills to practice
care, and a capability for holistic integration (figure 2). Many learners
therefore find the nursing process difficult, and it is said that understanding
it requires practicing at least 10 cases [1]. Although teachers are devising and
improving teaching methods, the nursing process is still taught in a one-to-one
manner.

Fig. 1 Thinking flow of nurse

Fig. 2 The image of nursing process


Design and Implementation of Computer Assisted Training System 185

Given this situation, there is much demand for an effective support system for
learning the nursing process.

2 Problem of Nurse Education

The method of thinking that nurses use when providing care for patients is
called POS. It is an important part of the fundamentals of nursing, and POS has
been taught for a long time. POS reflects a natural thinking skill of everyday
life, but in the medical and nursing sciences it requires critical thinking
based on knowledge of medicine, nursing, psychology, sociology and so on to
clarify the problems of patients. It is hard for learners to master POS as a
specialised skill.

To master the POS of the nursing process, learners need to practice more than 10
cases through paper simulation or one-to-one practice with senior nurses, but
this requires enormous human resources. To address this situation, in 2005 we
developed CASYSNUPL, the Computer Assisted System for Nursing Process Learning,
a system that provides graphical images of standardized patients and medical
environments and lets learners solve problems in inductive and deductive ways by
themselves. We have made use of the system in lectures and practice.

3 The Concept of CASYSNUPL


Main purposes of CASYSNUPL are to provide several ways of learning the nursing
process and to make learning possible at any time.

a). Improvement of learners' assessment, self-education ability and motivation
    to learn. CASYSNUPL makes learners gather information thoroughly according
    to POS and supports them, for example, in keeping learning enjoyable,
    checking their learning progress, and easily grasping a whole image of a
    patient.
b). Imaging of the nursing process by movies and pictures. CASYSNUPL helps
    students imagine patients by registering media files of movies and pictures.
c). Reduction of the distance between learners and teachers by using the web
    (figure 3). Almost all WBT systems need an internet connection for every
    operation. CASYSNUPL instead edits data through Excel VBA functions, so
    learners can study on a stand-alone computer after downloading the template
    and case data files.
d). Synchronising with the 5 steps of the nursing process (figure 4).
    Assessment, the first step of the nursing process, is important because it
    influences the accuracy of all the following steps: nursing diagnosis, the
    decision of nursing targets suitable for the patient, and the implementation
    of the nursing plan. CASYSNUPL's interface is designed to reproduce the
    thinking process of nurses during learning, for the sake of learners who
    have not yet mastered POS, so that they can learn assessment by themselves.
e). Reduction of the computer skills learners need. CASYSNUPL can be used by
    learners who know how to use Word and Excel.

Fig. 3 The Structure of CASYSNUPL

Fig. 4 Comparison of CASYSNUPL and the nursing process

4 The Process of CASYSNUPL Operation

4.1 Structure (Figure 5)


CASYSNUPL is composed of a server and client computers.

The server is a WWW server; it stores the templates, the case data (master data
and data edited by learners), and the media data that supports imaging.

A client computer needs MS-Excel. Its roles are to edit case data files through
the template, view media data, upload the case data edited by learners, and
download new case data and media provided by teachers. The template is an
MS-Excel VBA file whose role is to read case data and support editing based on
the nursing process. Case data is an XML file following our own rules. Media
data is viewed by streaming.
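The paper does not disclose the case-data schema (“an XML file by own rules”). Purely as an illustration of the idea, a client could read a case file along these lines; every element and attribute name below is invented:

```python
import xml.etree.ElementTree as ET

# A hypothetical case-data file. Every element and attribute name here is
# invented for illustration; it is NOT the actual CASYSNUPL schema.
CASE_XML = """
<case id="001">
  <patient name="example patient" age="67"/>
  <vitalsigns><bp>142/88</bp><pulse>92</pulse></vitalsigns>
  <media><movie src="case001_intro.wmv"/></media>
</case>
"""

root = ET.fromstring(CASE_XML)
print(root.get("id"), root.find("patient").get("age"))  # -> 001 67
```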

Fig. 5 Learning process of CASYSNUPL

4.2 Learning Process of Learners


Learners first log in to the server from client PCs and download a template and
case data.

i). Learners then open the template in Excel. After the template's macro runs,
    a new menu for editing CASYSNUPL case data is added to Excel's interface.
    All operations to edit case data are conducted from this additional menu.
ii). In the process of learning, learners can view media files that assist their
    learning. The skills necessary to operate the CASYSNUPL template are basic
    operations such as copy and paste, text input, and downloading from a web
    site. The process of editing case data follows the nursing process, so
    learners can deepen their understanding of the nursing process and the
    Problem Oriented System.
iii). As learners make progress, the case data used for their learning is edited
    to reflect the learners' thinking. When learners upload these edited data,
    CASYSNUPL stores them under the learner's ID. Teachers can access and
    download learners' data.
iv). After checking the edited data, teachers can teach learners on the web.
    CASYSNUPL has a BBS function, so teachers can teach each learner through the
    BBS or individually over the web. CASYSNUPL can thus reduce the distance
    between learners and teachers, similar to a one-to-one environment.

Learners can also use the BBS and web messages, so teachers and learners can
have two-way communication whenever they use the system.

5 Effects and Problems from Practice

5.1 Evaluation of Teachers


We received the following evaluations from teachers:

i. Since CASYSNUPL has functions to show a variety of supporting data, such
   as check-up data, check-up images and patients' expressions, we could
   reproduce medical scenes.

ii. Learners of the nursing process taught through classroom lectures
    previously received less instruction than learners given one-to-one
    training. Since CASYSNUPL collects the case data edited by learners, we
    could give personal and attentive teaching to all learners.
iii. We had two-way communication between teachers and learners. This
    communication led learners to have more motivation to learn.

5.2 Evaluation of Learner (a Questionnaire Survey)


We conducted a questionnaire survey of learners who had finished nursing
practice, obtaining 87 answers from 130 learners. According to the results
(figure 6):

i. A majority of learners answered that CASYSNUPL helped their understanding
   of the nursing process.
ii. However, they did not answer affirmatively about their understanding of
    the nursing support system and the Problem Oriented System.
iii. Some learners answered that they could not master how to operate Excel,
    so they could not operate the CASYSNUPL template.
iv. One learner did not feel that the deductive and inductive methods needed
    for the nursing process were taught.

Question (answers in %)                                      SA     A      N      D      SD
Did you understand the overview of nursing process?          9.2    65.5   19.5   4.6    1.1
Did you understand a series of nursing process such as
assessment, nursing diagnosis, nursing care plan,
intervention, and evaluation?                                8.4    64.4   10.3   5.7    1.1
Could you have a correct image of nursing support system?    8      35.6   44.8   8      3.4
Can you think in both inductive and deductive ways?          9.2    25.3   47.1   14.9   3.4
Do you want to continue to study nursing process with
CASYSNUPL?                                                   48.3   28.7   12.6   9.2    1.1

SA: Strongly agree, A: Agree, N: Neither agree nor disagree, D: Disagree, SD: Strongly disagree

Fig. 6 Results of questionnaire about CASYSNUPL

5.3 The Plan for Improvement


From these results, almost all problems can be traced to the interface and to
learners' skills. We designed the interface of CASYSNUPL for experienced PC
users, but some learners had little experience with PCs. The interface of the
CASYSNUPL template is not meant to depend on Excel, even though it is built with
Excel VBA. However, many learners did not see it this way. For example, they
could not understand why their data were often lost when they clicked the
“overwrite save” icon (the Excel file is only a template in the CASYSNUPL
system, so they saved the template, not their edited case data). We think the
reason is the same as the reason learners could not understand the nursing
support system. Learners' lack of computer literacy will negatively influence
their awareness of computerized medical record systems, so we must not only
improve the interface of CASYSNUPL but also train learners to strengthen their
computer literacy.

6 Prospects of CASYSNUPL
CASYSNUPL is used by several nursing institutions, including our college.
Teachers at each institution create case data for their students. These data are
recorded in the CASYSNUPL database, so learners can share all the case data in
it.

Teachers can add case data at their own discretion, so the amount of case data
will grow with use. Learners will be able to use more case data, but they cannot
be expected to have enough knowledge to select the cases appropriate for them.

The applicable range of patients' cases is wide, so the system has been operated
in multiple nursing areas. Other nursing institutions can operate CASYSNUPL by
creating case data according to their own educational policies. At present,
CASYSNUPL operates on case data for fictional patients, but it could also
operate on the case data of real patients.

In the future, we hope that CASYSNUPL will be used in more nursing institutions,
so we aim to improve its extensibility by adding functions such as searching
among many case data and multilingual support.

7 Summary
CASYSNUPL is a system for training “thinking”, not “knowledge”. It is a feature
of CASYSNUPL that it can train learners' thinking skills, and CASYSNUPL has
achieved results in training such skills.

On the other hand, we still have some problems:

• Because CASYSNUPL is installed on a web server, it cannot hold the case data
  of real patients. (In Japan, it is taboo to share the personal data of
  patients on an open web site.)
• The CASYSNUPL template leaves much room for improvement in its interface,
  because it depends on MS-Excel's interface.
• We must train learners' PC skills. Almost all learners can understand how to
  edit case data, but a few learners lack this ability.

We hope to solve these problems and advance the education of nursing skills
through the practical use of CASYSNUPL.

Reference
1. Egawa, S., Shindan, K.K.: Nissoken, p. 2 (2005)
Designing Agents That Recognise and Respond
to Players’ Emotions

Weiqin Chen

Abstract. Giving agents the ability to sense, recognise and appropriately
respond to human emotions is one of the main ways to make agents more
believable. In intelligent tutoring systems, learners' affective states have
been incorporated to provide adaptive feedback. In the game industry, however,
the emotions of players have not received enough attention. Games normally do
not take players' emotions into account while they are being played, and
non-player characters do not respond to players' emotional states. We argue that
adapting to players' emotions can make non-player characters more believable and
the game more enjoyable.

1 Introduction

Emotions play an essential role in human intelligence. Emotional intelligence,


which refers to the ability to recognise the emotions of another and to respond
appropriately to these emotions, is considered more and more important in
human-human communication.
Existing research indicates that giving agents the skills of emotional
intelligence can improve the believability of agents, where believability refers
to agents providing the illusion of life (Bates 1994). Believability and
enjoyableness are two important factors in games, where emotions can be used to
create a more human-like opponent, making it more enjoyable to play with.
However, in commercial games, players' emotions have not been taken into account
in order to provide
a more enjoyable experience for players. The most recent development is the
Vitality sensor from Nintendo, which is supposed to be able to detect players'
affective states. Although it was announced in 2009, it has not appeared on the
market.
Our earlier research in modelling emotional states of players based purely on
game state and interactions between players has shown the limitation of such an

Weiqin Chen
Department of Information Science and Media Studies, University of Bergen,
Bergen, Norway
Oslo and Akershus University College of Applied Sciences, Oslo, Norway

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 191–200.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
192 W. Chen

approach. More precise modelling based on sensor input should provide us with
better data for adapting the agents' actions. In this research we use the Emotiv
EPOC and a neural network to detect the player's emotions. The agent is designed
based on the data input from the Emotiv EPOC device and on knowledge of how
people react in different emotional states.
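The paper names the classifier only as "a neural network", so the following is a generic sketch of a small feed-forward classifier over 14-channel features; the architecture, the feature extraction, and the emotion class set are all assumptions, and the weights are untrained placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative emotion classes; the paper does not list the classifier's outputs.
EMOTIONS = ["joy", "fear", "anger", "relief"]

def classify(features, W1, b1, W2, b2):
    """One forward pass of a small feed-forward network: a 14-dimensional
    feature vector (one value per EPOC channel) in, a softmax over emotion
    classes out. Architecture and sizes are assumptions, not the paper's."""
    h = np.tanh(features @ W1 + b1)          # hidden layer
    z = h @ W2 + b2                          # class scores
    e = np.exp(z - z.max())                  # numerically stable softmax
    return e / e.sum()

# Untrained placeholder weights, for shape illustration only.
W1, b1 = rng.standard_normal((14, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, len(EMOTIONS))), np.zeros(len(EMOTIONS))

p = classify(rng.standard_normal(14), W1, b1, W2, b2)
print(EMOTIONS[int(np.argmax(p))])           # most probable class
```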
The research questions we address in this research are:

1. How does emotion modelling change players’ perception of the gameplay?


2. Are agents that are based on players’ emotion modelling more believable?

2 Emotions in Games
Researchers in affective computing have made considerable efforts in modelling
users’ affect and synthesizing emotions in software agents (Picard 1997).
Emotions have been incorporated into action selection and planning (Aylett et al.
2006). Affective expressions have been implemented in virtual characters (Paiva
2000). In order to allow for powerful emotional experiences, Sundström
(Sundström 2005) defined the “affective loop”, an interactive process in which
1) users recognise the emotional state of the system/agent and relate it to
their own affective state; and 2) the system/agent in turn recognises the
affective state of users and performs an equivalent integrative process.
Some systems are able to recognize affect without using any hardware sensors.
In such systems, affect is recognised by making inferences based on the users’ ac-
tions. Since emotions are complex and are expressed in different modalities (e.g.
facial expressions, voices, gestures and physiological signals), some systems
have explored a multimodal approach to detecting affect. Studies have shown
improvements in performance from combining contextual information and
physiological signals (Conati and Maclaren 2009). Some research has explored the
potential of using electroencephalogram (EEG) devices for affect detection.
that affective tutoring systems that adapt not only to learners’ cognitive states but
also to their affective states have recently attracted the interest of a growing com-
munity of researchers (Arroyo et al. 2009; Heraz et al. 2007; D’Mello and
Graesser 2010). Various physiological sensors that capture EEG, EMG signals,
skin conductance levels, heart rate, and respiration rate have been used to provide
adaptive feedback to learners.
In game research, some efforts have been made to take players' affective states
into consideration. For example, Hudlicka (Hudlicka 2009) proposed affective
game engines in order to provide functionality to support the recognition of user
and game character emotions, real-time adaptation and appropriate responses to
these emotions, and more realistic expression of emotions in game characters and
user avatars. Other research on emotions in games has also incorporated players'
emotional states into player modelling (Yannakakis et al. 2010; Garbarino et
al. 2011; Kim et al. 2004; Martinez and Yannakakis 2010). Some researchers have
studied gameplay emotions that arise from players’ actions in the game and
the consequent reactions of the game (Perron 2005). These emotions have the
potential to be used in providing adaptive responses.
Designing Agents That Recognise and Respond to Players’ Emotions 193

In commercial games, however, the focus has mainly been to invoke players’
emotions through a series of structuring and writing techniques, such as in “Emo-
tioneering” (Freeman 2003). Players’ affective states have not been widely used to
provide adaptive gameplay.

3 Recognizing and Synthesizing Emotions in Diplomacy


Diplomacy is a strategy-based social board game. It simulates the First World War
when seven nations fought for domination over Europe. The seven nations include
England, France, Germany, Russia, Italy, Austria-Hungary, and Turkey. The
board is a map of Europe (showing political boundaries as they existed in 1914)
divided into 75 regions of which 34 contain supply centres. For each supply centre
a player controls, she or he can build and maintain an army or a fleet on the board.
If one of the players controls 18 supply centres, this player has won the game. The
game mechanics are relatively simple. Only one unit may occupy a region at any
time. There is no chance involved. If opposing forces are equal in strength, the
result is a standoff and the units remain in their original positions. Initially each country is
roughly equal in strength, thus it is very difficult to gain territory - except by
forming alliance and then attacking. Negotiation for forming alliances is a very
important part of the game, because numerical superiority is crucial. Secret nego-
tiations and secret agreements are explicitly encouraged, but no agreements of any
kind are enforced.
Each game turn begins with a negotiation period, and after this period players
secretly write orders for each unit they control. The orders are then revealed
simultaneously, possible conflicts are resolved and the next turn can commence.
StateCraft is a software version of Diplomacy, developed by Krzywinski et al.
(Krzywinski et al. 2008). It is an online multiplayer turn-based strategy game
where each of the seven countries can be played by either a human player or an
agent (Fig. 1).
In our earlier research, we have implemented emotion module in StateCraft
(Chen et al. 2011a). The emotion module allows agents to have emotions based on
the game states and other players’ or agents’ actions toward them. They then make
decisions based on these emotions as well as current game states. We adopted the
Ortony Clore Collins-model (OCC) to synthesise emotions (Ortony et al. 1988). In
order to identify what emotions are experience and how the emotions influence
decision making when playing Diplomacy, seven human players were invited to
play the game and interview was conducted afterward. The most frequently expe-
rience emotions we found were joy, loyalty, guilt, fear, anger, shame, relief, and
disappointed. These emotions were then mapped to the categories of the
OCC-model and implemented in the emotion module of agents.
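How the interview emotions might be attached to OCC-style appraisal categories can be pictured with a toy rule table; the categories and the specific assignments below are illustrative assumptions, not the mapping actually used in StateCraft.

```python
# A toy appraisal table in the spirit of the OCC model. The categories and
# assignments are illustrative assumptions, not StateCraft's actual mapping.
def appraise(category, desirable):
    """Map an appraised game event to one of the interview emotions.
    category: 'prospect' (an anticipated consequence), 'confirmation'
    (a feared or hoped-for prospect resolved), 'own_action' or 'other_action'."""
    table = {
        ("prospect", True): "joy",               # e.g. a promising alliance
        ("prospect", False): "fear",             # e.g. a looming attack
        ("confirmation", True): "relief",        # the feared move never came
        ("confirmation", False): "disappointment",
        ("own_action", True): "loyalty",         # honouring an agreement
        ("own_action", False): "guilt",          # betraying an ally
        ("other_action", True): "joy",           # support given as promised
        ("other_action", False): "anger",        # a stab by an ally
    }
    return table[(category, desirable)]

print(appraise("other_action", False))  # -> anger
```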
A user study was conducted to examine how the emotions affect player expe-
rience, e.g. whether they can identify the different emotions of the agents and
whether they think it is fun to play against agents with emotions. The participants
played Austria-Hungary in all games, because of Austria's geographic position,
with many neighbouring countries. Each participant played three games from
1901 to 1905:

Fig. 1 Game interface for players

• Game 0: against a mixture of emotional agents and regular agents (with no
  emotion); participants were asked to identify which emotions the agents were
  feeling towards them;
• Game A: against all emotional agents;
• Game B: against all regular agents.

The sequence of Games A and B was varied to counteract any effect of the players
becoming better at the game. After playing through Games A and B, they were
asked whether they had observed any differences between the two games and, if
so, which differences they found and whether they thought one of the games was
more fun than the other.
Although four of six players thought that it was more fun to play against emo-
tional agents, most of them were unable to identify the agents’ emotions. Only two
players managed to pick an emotion that the agent actually had towards them after
Game 0. We can argue that the players did not know how emotions would be ex-
pressed through the actions of the agents and therefore they had difficulties in
identifying agent emotions. When human players play the games face to face, they
can see each other’s facial expressions, body languages and other cues. These cues
provide important information for human players to judge the emotional status and
make decisions accordingly. When human players play with agents in the game,
the only information they could base on is the actions of the agents. The same ap-
plies for agents. When agents play with human players, they can only base on the

actions taken by players. There are no other clues in the mechanism that can help
agents to better understand the emotional status of players. This is the main moti-
vation for our current research on using electroencephalogram (EEG) device in
StateCraft.

4 Emotiv EPOC in StateCraft

EEG devices capture brainwaves by measuring the electrical activity on the
scalp. The EEG device we use in this research is the Emotiv EPOC. It has 14
electrodes/channels, comparable to medical brain-computer interfaces, which
normally have 19 electrodes. A service program can automatically capture the raw
EEG signals coming from each of the channels. The device can detect 14 cognitive
actions, including movements and rotations. It can also detect expressive
actions such as facial expressions and eye- and eyelid-related expressions. In
addition, it can detect affective states such as engagement, instantaneous
excitement, and long-term excitement, with no need for training.

Emotiv EPOC has mainly been used in games to let human players manipulate and
control games with the brain instead of the hands. It has not been used to
identify players' emotions and to use these emotions to improve the realism of
the emotional responses of AI characters in games.

4.1 EmotivInterpreter in StateCraft


StateCraft implemented a three-layered agent architecture which includes an
operational layer, a tactical layer and a strategic layer. These layers respectively
handle the three main tasks when playing Diplomacy: monitoring the game
board, planning moves, and engaging in diplomatic negotiations. The operational
and tactical layers are invoked once each turn, acting on the new game state that
resulted from the previous round, while the strategic layer is active throughout
the whole game session. Thus the agent is driven by the periodical updates of the
game state, while still maintaining continuous diplomatic interaction with the
opponents.
The operational layer is a reactive layer and is triggered at the start of each
round. It monitors the game board and discovers all possible and legal moves for
each unit based on the game state. The tactical layer combines operations for
each unit into a set of operations, a tactic. Each tactic thus contains one operation
for each of the agent’s units. The strategic layer is responsible for
communicating with the other players and, based on this diplomatic activity and
the weighted tactics from the previous two layers, selects the appropriate tactic
for the current round. This layer consists of four relatively simple modules:
ChooseTactic, AnswerSupport, SupportSuggestor and Relationship (Fig. 2). The
strategic layer adopts the idea of the Subsumption architecture (Brooks 1990),
which results in a flexible layered structure: the modules impact each other in a
non-linear manner and new modules can be added without breaking the existing
functionality.
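The turn loop implied by this description can be sketched as follows. This is an illustrative Python sketch only, not the StateCraft source: all class names, the toy move rules, and the placeholder selection policy are our own assumptions.

```python
# Illustrative sketch of the three-layer split: operational and tactical layers
# run once per turn, the strategic layer keeps state for the whole session.

class OperationalLayer:
    """Reactive layer: enumerates legal moves for each unit from the game state."""
    def legal_moves(self, game_state):
        # toy rule: every unit may hold or move to an adjacent province
        return {unit: ["hold"] + [("move", p) for p in provinces]
                for unit, provinces in game_state.items()}

class TacticalLayer:
    """Combines one operation per unit into candidate tactics."""
    def build_tactics(self, moves):
        tactics = [{u: "hold" for u in moves}]       # all-hold baseline tactic
        for u, ops in moves.items():
            if len(ops) > 1:
                t = {v: "hold" for v in moves}
                t[u] = ops[1]                        # one unit makes a move
                tactics.append(t)
        return tactics

class StrategicLayer:
    """Persists across turns; picks a tactic given the diplomatic context."""
    def __init__(self):
        self.round = 0
    def choose(self, tactics):
        self.round += 1            # state carried over the whole session
        return tactics[0]          # placeholder selection policy

def play_turn(agent, game_state):
    moves = agent["operational"].legal_moves(game_state)   # invoked each turn
    tactics = agent["tactical"].build_tactics(moves)       # invoked each turn
    return agent["strategic"].choose(tactics)              # always active

agent = {"operational": OperationalLayer(),
         "tactical": TacticalLayer(),
         "strategic": StrategicLayer()}
state = {"army_vienna": ["budapest", "trieste"]}
tactic = play_turn(agent, state)
```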
196 W. Chen

Fig. 2 Emotion module integrated in the strategic layer in StateCraft

Given that the Emotion module (EmotivInterpreter) is an addition to the agent
in StateCraft, the whole module has been implemented in the Strategic layer of
the agent. The Strategic layer uses an architecture similar to the Subsumption
system, using sensors to look for changes in the environment and actuators to act
on the changes from the sensors. In the case of the EmotivInterpreter, the module
receives input through an input line from GameStateSensor, the sensor listening
for new game states from the server, and MessageSensor, the sensor listening for
new SupportRequestMessages and AnswerSupportRequestMessages. It then
performs its actions by suppressing (S in Fig. 2) the input to ChooseTactic, the
module responsible for choosing tactics. Additionally, it inhibits (I in Fig. 2) the
output from the AnswerSupport module. In other words, if the output of the
AnswerSupport module is to provide support but the output of the
EmotivInterpreter module is not, then the decision is not to provide support.
Fig. 2 depicts the EmotivInterpreter as part of StateCraft’s strategic layer.

4.2 EmotivInterpreter in Detail


Fig. 3 shows the process of adapting the agent actions based on the player’s emo-
tions. Emotiv EPOC raw data and game state sensor data are collected and
processed as input to the classifier. The classifier decides the current emotion of
the player and the intensity of the emotion. The emotion and its intensity together
with the list of tactics (produced by the tactic layer) and messages from other
players/agents are then fed into the decision-making module which decides on an
action.

Fig. 3 Choosing actions based on emotions

The EmotivInterpreter is the main class in the emotions package. It includes
three main methods:

• receive() is the method that receives the GameState and the diplomatic
messages through its input line. This ensures that the EmotivInterpreter receives
the GameState object each action round and thereby passes it to all of the
emotions in the emotion list. Each SupportRequestMessage and
AnswerSupportRequestMessage is also passed through the receive() method so
that it keeps track of the agreements made in the preceding round.
• suppress() is the method which suppresses the TacticList from the ChooseTactic
module and allows the module to remove tactics or change their values.
• inhibit() is the method which inhibits the outgoing
AnswerSupportRequestMessage. The message is stored until the next round,
when the agent will check whether it performed the support operations it
promised.
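A minimal Python skeleton of how these three methods might fit together is shown below. The real module is built against StateCraft's own sensor/actuator API, so every signature, message format, and the halving weight used in suppress() are assumptions made for illustration.

```python
# Hedged sketch of the EmotivInterpreter's three methods; all message formats
# and the weight factor are hypothetical, not StateCraft's actual API.

class EmotivInterpreter:
    def __init__(self):
        self.emotions = []           # emotion objects updated each round
        self.promised_supports = []  # agreements made in the preceding round

    def receive(self, message):
        """Input line: GameState objects and diplomatic messages."""
        if message["type"] == "GameState":
            for emotion in self.emotions:
                emotion.update(message)             # pass state to each emotion
        elif message["type"] in ("SupportRequestMessage",
                                 "AnswerSupportRequestMessage"):
            self.promised_supports.append(message)  # track agreements

    def suppress(self, tactic_list):
        """Suppress the TacticList going into ChooseTactic:
        remove tactics or change their weights (toy policy: halve them)."""
        return [(t, w * 0.5) for t, w in tactic_list if w > 0]

    def inhibit(self, answer_message):
        """Inhibit the outgoing AnswerSupportRequestMessage; store it so the
        next round can check whether the promised support was performed."""
        self.promised_supports.append(answer_message)
        return None   # inhibited: nothing is sent this round
```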

A feed-forward neural network is developed to identify emotions when playing
Diplomacy. The input layer of the neural network includes 33 neurons:

• Game state (6): number of provinces, supply centres, armies, fleets, supply
centre surplus, and occupied neighbours.
• Number of supply centres of each player (7).
• Interaction between players (6): number of supply centres stolen, provinces
stolen, accepted and denied support requests, and sent support requests that
were accepted or denied.
• Emotiv EPOC readings (14).

The output layer of the neural network is the emotion and its intensity. As
mentioned in the previous section, there are eight different emotions experienced
by players in the game: joy, loyalty, guilt, fear, anger, shame, relief and
disappointment.
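The network described above can be sketched with plain numpy. The 33 inputs follow the paper; the hidden-layer size (16), the random weights, and the choice of one linear output per emotion intensity are our assumptions, and a trained network would of course use fitted weights.

```python
# Minimal numpy sketch of the 33-input feed-forward network; hidden size and
# output layout are illustrative assumptions, weights are untrained.
import numpy as np

EMOTIONS = ["joy", "loyalty", "guilt", "fear", "anger",
            "shame", "relief", "disappointment"]

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(33, 16))             # input -> hidden
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, len(EMOTIONS)))  # hidden -> output
b2 = np.zeros(len(EMOTIONS))

def forward(x):
    """x: 33 features = 6 game-state + 7 supply-centre + 6 interaction
    + 14 EEG channel readings. Returns one intensity per emotion."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2    # linear outputs, trained against self-reports

def predict_emotion(x):
    out = forward(x)
    return EMOTIONS[int(np.argmax(out))], float(out.max())

x = rng.normal(size=33)
label, intensity = predict_emotion(x)
```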

In the training process for the neural network, players are required to self-report
their emotions. An emotion self-report window pops up after each round. The
level of intensity of each of the eight emotions can be reported using a sliding
bar with values from 1 to 100. The self-reported emotions serve as the target
output for the neural network in the training process.
The decision-making process takes into account the player’s emotion and its
intensity. Based on the emotion (with intensity) and the game state, the agent
predicts the player’s next action. A list of rules is used to generate predictions.
For example, angry players tend to make aggressive moves towards their
opponents. Fear causes players to make defensive moves. If a player feels guilty,
he or she tends to make cooperative moves; in other words, he or she will most
likely accept a Support Request and actually provide the support. These rules
were identified based on interviews with players. For more details see (Chen
et al. 2011b). The predicted actions of players are then used to decide what
actions the agent will take. In this process the weights of the actions in the
TacticList will be changed accordingly.
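The rule layer can be illustrated as follows. The rule table, the intensity cutoff, and the counter-tactic weighting factor are hypothetical stand-ins for the interview-derived rules in (Chen et al. 2011b).

```python
# Sketch of the emotion -> predicted action -> tactic reweighting chain;
# rule contents, the intensity cutoff, and the factor 2.0 are invented.

PREDICTION_RULES = {
    "anger": "aggressive_move",
    "fear": "defensive_move",
    "guilt": "cooperative_move",   # likely to accept and honour support requests
}

def predict_player_action(emotion, intensity):
    """Predict the player's next action from emotion and intensity (1-100)."""
    if emotion in PREDICTION_RULES and intensity >= 50:
        return PREDICTION_RULES[emotion]
    return "neutral_move"

def reweight_tactics(tactic_list, predicted_action):
    """Shift TacticList weights towards tactics that answer the prediction."""
    counters = {"aggressive_move": "defend", "defensive_move": "expand",
                "cooperative_move": "support", "neutral_move": None}
    target = counters[predicted_action]
    return [(name, w * 2.0 if name == target else w)
            for name, w in tactic_list]

tactics = [("defend", 0.3), ("expand", 0.4), ("support", 0.2)]
action = predict_player_action("anger", 80)
adjusted = reweight_tactics(tactics, action)
```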

5 Finishing Words
In this paper we have presented a new emotion module for agents in a
multiplayer strategy game. This work is an extension of our earlier research, in
which we found that it is difficult for an agent to identify human players’ or
other agents’ emotional states based purely on the behaviours of the players or
the other agents. The new module makes use of the EEG signals from the
Emotiv EPOC to identify players’ emotions, which helps agents provide
adaptive responses to them. This research aims to make the agent more
believable and the game more enjoyable.
The module is under development, and the decision-making process is not yet
fully functional. When it is completely implemented, we plan to conduct two
user studies to address the two research questions we presented in Section 1.
The first study will be similar to our previous user study where players will be
asked to play three games:

• Game 1: players use Emotiv EPOC and play against all 6 new emotional agents
(agents with the new emotion module)
• Game 2: players do not use Emotiv EPOC and play against all 6 old emotional
agents (agents with the previous emotion module)
• Game 3: players do not use Emotiv EPOC and play against all 6 regular agents
(agents without the emotion module)

The sequence of the games will be mixed to prevent a learning effect. After
playing through the three games, players will be asked whether they observed
any differences among the three games and, if so, which differences they found
and whether they thought the agents in one of the games were more believable
or one of the games was more fun than the others.

The second study is similar to a Turing Test: players with Emotiv EPOC will
play against a mixture of human players and new emotional agents. After the
games, they will be asked to identify which of the countries were played by
agents and which were played by human players.

References
Arroyo, I., Cooper, D.G., Burleson, W., Woolf, B.P., Muldner, K., Christopherson, R.:
Emotion sensors go to school. In: Dimitrova, V., Mizoguchi, R., Boulay, B.D., Graesser,
A. (eds.) Proc. AIED 2009, pp. 17–24. IOS Press (2009)
Aylett, R.S., Dias, J., Paiva, A.: An affectively-driven planner for synthetic characters. In:
Long, D., Smith, S.F., Borrajo, D., McCluskey, L. (eds.) Proc. ICAPS 2006, pp. 2–10.
AAAI Press (2006)
Bates, J.: The role of emotion in believable agents. Communications of the ACM 37(7),
122–125 (1994)
Brooks, R.: Elephants don’t play chess. Robotics and Autonomous Systems 6(1&2), 3–15
(1990)
Chen, W., Carlson, C., Hellevang, M.: Emotional agents in a social strategic game. In:
König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J. (eds.) Proc. KES 2011,
pp. 239–248. Springer, Heidelberg (2011a)
Chen, W., Carlson, C., Hellevang, M.: The Implementation of Emotions in a Social Strate-
gy Game. Paper presented at the GET 2011, Rome, Italy (2011b)
Conati, C., Maclaren, H.: Empirically building and evaluating a probabilistic model of user
affect. User Modeling and User-Adapted Interaction 19(3), 267–303 (2009)
D’Mello, S., Graesser, A.C.: Multimodal semi-automated affect detection from conversa-
tional cues, gross body language, and facial features. User Modeling and User-adapted
Interaction 20(2), 147–187 (2010)
Freeman, D.E.: Creating Emotion in Games: The Craft and Art of Emotioneering. New
Riders Games (2003)
Garbarino, M., Matteucci, M., Bonarini, A.: Affective Preference from Physiology in
Videogames: A Lesson Learned from the TORCS Experiment. In: D’Mello, S., Graess-
er, A., Schuller, B., Martin, J.-C. (eds.) ACII 2011, Part II. LNCS, vol. 6975, pp. 528–
537. Springer, Heidelberg (2011)
Heraz, A., Razaki, R., Frasson, C.: Using machine learning to predict learner emotional
state from brainwaves. In: Spector, J.M., Sampson, D.G., Okamoto, T., et al. (eds.) Proc.
ICALT 2007, pp. 853–857 (2007)
Hudlicka, E.: Affective game engines: motivation and requirements. In: Whitehead, J.,
Young, R.M. (eds.) Proc. the 4th International Conference on Foundations of Digital
Games (FDG 2009), pp. 299–306. ACM (2009)
Kim, J., Bee, N., Wagner, J., André, E.: Emote to win: affective interactions with a com-
puter game agent. GI Jahrestagung 1, 159–164 (2004)
Krzywinski, A., Chen, W., Helgesen, A.: Agent architecture in social games – the imple-
mentation of subsumption architecture in Diplomacy. In: Darken, C., Mateas, M. (eds.)
AIIDE 2008, pp. 100–104. AAAI Press (2008)
200 W. Chen

Martinez, H.P., Yannakakis, G.N.: Genetic search feature selection for affective modeling:
a case study on reported preferences. In: Castellano, G., Karpouzis, K., Martin, J.-C.,
Morency, L.-P., Peters, C., Riek, L. (eds.) 3rd International Workshop on Affective
Interaction in Natural Environments (AFFINE 2010), pp. 15–20. ACM (2010)
Ortony, A., Clore, G.L., Collins, A.: The cognitive structure of emotions. Cambridge
University Press, Cambridge (1988)
Paiva, A. (ed.): Affective Interactions: toward a new generation of computer interfaces.
Springer, New York (2000)
Perron, B.: A Cognitive Psychological Approach to Gameplay Emotions. Paper Presented
at the DIGRA 2005, Vancouver, British Columbia, Canada (2005)
Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997)
Sundström, P.: Exploring the Affective Loop. Licentiate thesis, Stockholm University
(2005)
Yannakakis, G.N., Martínez, H.P., Jhala, A.: Towards affective camera control in games.
User Modeling and User-Adapted Interaction 20(4), 313–340 (2010)
Development of Agent-Based Model
for Simulation on Residential Mobility Affected
by Downtown Regeneration Policy

Zhenjiang Shen, Yan Ma*, Mitsuhiko Kawakami, and Tatsuya Nishino

Abstract. In recent decades, the compact city has become a new concern of
urban planning in many Japanese cities. Some local governments in Japan aim
to realize the compact city pattern through policy intervention, for example by
encouraging households to move from the suburbs to downtown and thereby
relieving the population decrease in urban center areas. Recently one such
residential policy has been proposed by the local government of Kanazawa City,
Japan. This policy seeks to attract and encourage households to move downtown
by offering a local housing allowance. The contribution of this work is that we
developed an agent-based Household Residential Relocation Model (HRRM) for
visualizing the effect of this residential policy. HRRM is built on household
interaction through housing relocation choice and policy attitude, and it can
thereby simulate the diversified decisions of households in all their lifecycle
stages. Through simulation with HRRM the effectiveness of this residential
policy can be visualized, which helps the local government to assess the effect
of the residential policy.

Keywords: Household Relocation, Downtown Decline, Urban Shrinkage,
Agent-Based Modelling, Compact City, Policy Effect.

1 Introduction
Today, policy prescription has increasingly favoured a compact city approach to
relieve the negative sides of urban decline [Haase et al., 2010; Rieniets, 2006;
Howley et al., 2009]. The local government of Kanazawa City, Japan, has
published a series of downtown regeneration policies to vitalize its downtown
and make
Zhenjiang Shen · Yan Ma · Mitsuhiko Kawakami · Tatsuya Nishino


School of Environmental Design, Kanazawa University, Kakuma Machi, Kanazawa City,
Japan 920-1192
*
Corresponding author.
e-mail: mayan@stu.kanazawa-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 201–211.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
202 Z. Shen et al.

the city more compact. In this work, we focus on a downtown regeneration
policy for improving the residential environment in the local city (hereafter the
residential promoting policy) that targets revitalizing downtown by drawing in
residents who move to downtown with allowances for their residential
relocation. We attempt to demonstrate the possible implementation effects of
this policy by Agent-Based Modelling (ABM).
As shown by existing research [Jager and Mosler, 2007], ABM is expected to
contribute significantly to the study of behavior-environment interactions, and
to provide a valuable tool for exploring the effectiveness of policy measures in
complex urban environments. This characteristic of ABM makes it a practical
approach for simulating a policy and, meanwhile, revealing its effectiveness.
The purpose of this research is to develop an agent-based Household Residential
Relocation Model (hereafter HRRM) for visualizing household residential
mobility and, hereby, to reveal the possibility of reflecting the implementation
effects of this policy in the local city of Kanazawa.
Regarding residential location issues combined as part of simulation models,
stated preference experiments primarily focus on determining transport
characteristics versus work- and location-related variables for residential
location [Kim et al., 2005; Molin and Timmermans, 2003; Rouwendal and
Meijer, 2001]. These experiments are mostly based on individual behavior
research, which provides believable variables for further research, especially for
microsimulations or agent-based simulation. In the following we introduce how
HRRM is built and how it works.

Table 1 Residential promoting policy in Kanazawa city

Building type  Utilization type          Allowance
House          Buy new house             Single household: 10% of payment, 2 million JPY
                                         Two households: 10% of payment, less than 3 million JPY
House          Buy or repair old house   Basic part + supplementary part, less than 500,000 + 200,000
Apartment      Buy new apartment or      Basic part (5% of payment) + supplementary part (1%),
               buy old apartment         less than 1 million + 200,000
Apartment      Repair                    50% of design payment, less than 1 million

2 Method of Model Formulating

We attempt to conduct a simulation of household residential mobility through
HRRM in order to reveal the possibility of using ABM to visualize the
implementation effects of the residential promoting policy. The policy includes
allowances for household relocation to downtown as shown in Tab-1.
Development of Agent-Based Model for Simulation 203

Within HRRM we developed three modules for the simulation of household
residential mobility: a household lifecycle stage module, an evaluation module
for household relocation desire, and a household relocation choice module. In
our work, a lifecycle stage principle that covers the whole life of a person is
inputted to recognize the needs of household residential relocation. In HRRM
each residential location is further defined by a series of spatial attributes. An
agent makes a decision on a new location based on its affordability and the
utility of that location. Before relocation, a household will first evaluate the
utility of its current location. This process produces a satisfaction evaluation of
the household’s current location and hereby decides a relocation desire. After
this, households focus on finding new houses somewhere within the urban area.
We utilize utility theory to build the third module to simulate the relocation
choice process. The relocation choice module helps households compare the
utility of residential locations in different urban areas and interact dynamically
with other households to make relocation decisions.
Meanwhile, interactions between household agents are introduced to represent
the influences of the residential promoting policy. In this work, the interactions
take place on two levels: one between agents at the neighborhood level and the
other between agents at the global level. The interactions between agents are
used to simulate household attitudes to the policy, which in this paper is
measured by the number of residents who would accept the policy and move to
downtown, based on a questionnaire investigation regarding this policy. For
model testing, we estimated the necessary coefficients using the questionnaire
carried out in Kanazawa City and visualized household residential mobility in a
virtual space of Kanazawa City. By comparing the simulated relocation choices
with the real ratio of household relocation to downtown in Kanazawa City, the
possibility of using ABM to represent the effects of the residential promoting
policy in Kanazawa city can be illustrated.

3 Description of HRRM

3.1 Lifecycle Stage Module of Household Agent


In this work real persons are not treated as individual agents. A household is
defined as an agent in HRRM, and it can make decisions as a single entity on
the property within which it is located (comprising a single person or a group of
persons). Location decisions are affected by the attributes of the household
agent. We assume that households relocate only when their lifecycle stage
changes or when they are unsatisfied with their current locations.
We divided the whole life of a householder into 4 stages. As shown in Fig-1, an
independent household agent older than 18 is created in the first stage. In the
second stage the agent gets married randomly before 36 years of age, based on
the local marriage rate (0.59%), and becomes a new household agent [Kanazawa
City, 2005-2007], while, as indicated by the dashed line, some households never
marry before they die. On the other hand, some coupled households will become
pregnant and would probably move again for their babies (we define that
households younger than 45 years old can give birth to a new generation of
households, according to the local birth rate of 0.9%). In the third stage a new
generation of households is created and, as shown by another dashed line, when
the agent grows up he becomes independent (goes to college or gets a job at 18
years old) and begins his first lifecycle stage. During this process, the old
household that gave birth to this new household continues its life and finally
reaches its last lifecycle stage (a death rate of 0.8% is used to eliminate
household agents).

Fig. 1 Diagram of household lifecycle stage in HRRM

When the lifecycle stage of a household agent changes, the agent facing a move
will first make an assessment of his Household Relocation Desire and then make
a relocation decision based on the Household Relocation module, as discussed
in the following sections.
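The transitions above can be sketched as a per-step update. The rates come from the text (marriage 0.59%, birth 0.9% under 45, death 0.8%); the data layout and the decision to apply each rate once per 6-month step are our assumptions.

```python
# Illustrative lifecycle-stage update for one 6-month simulation step;
# agent representation and step structure are hypothetical.
import random

def lifecycle_step(agents, rng, marriage_rate=0.0059,
                   birth_rate=0.009, death_rate=0.008):
    """One simulation step (6 months) of lifecycle-stage transitions."""
    new_agents = []
    for a in agents:
        a["age"] += 0.5
        if a["stage"] == "single" and a["age"] < 36 and rng.random() < marriage_rate:
            a["stage"] = "married"                  # stage 2: new household
        if a["stage"] == "married" and a["age"] < 45 and rng.random() < birth_rate:
            # stage 3: a child household that becomes independent at 18
            new_agents.append({"age": 0.0, "stage": "child"})
        if a["stage"] == "child" and a["age"] >= 18:
            a["stage"] = "single"                   # begins first lifecycle stage
    # stage 4: elimination by the death rate (new births spared this step)
    survivors = [a for a in agents if rng.random() >= death_rate]
    return survivors + new_agents

rng = random.Random(42)
agents = [{"age": 20.0, "stage": "single"} for _ in range(1000)]
agents = lifecycle_step(agents, rng)
```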

3.2 Evaluation Module of Household Relocation Desire

3.2.1 Decision Tree for Household Relocation Desire


In the household relocation desire module we use a decision tree, as shown in
Fig-2, to sift the households who desire to move. As shown in the figure, a
household’s satisfaction with its current location is first calculated to judge
whether household i has the potential to move to a new location. If the result
shows that his satisfaction with the current residence Si is below the satisfaction
threshold he pursues, household i will consider a new location. The household
then predicts the possible cost of the coming relocation and compares it with his
deposit. If the deposit can cover the relocation cost, he will seek a new location,
and the relocation module takes over.

Fig. 2 Decision tree for household relocation desire

3.2.2 Evaluation on Household Satisfaction with Current Location


Each land parcel in HRRM has a series of predefined spatial attributes, which
are inputted at the beginning of the simulation by users. Agents can then conduct
a satisfaction evaluation based on the spatial attributes of their locations. The
mathematical model for evaluating household satisfaction with the current
location is based on utility theory, as shown by equations 1 to 4:

S_i = ( Σ_{j=1..n} a_j × x_ij + ε_i ) / n    (n = 20)    (1)
−2 ≤ S_threshold ≤ 2    (2)
S_i > S_threshold (satisfied with current location)    (3)
S_i ≤ S_threshold (unsatisfied with current location)    (4)

As shown by the equations, variable Si stands for the satisfaction of household
i’s current residential location. aj is a vector of retrospective coefficients for
variable j, which follows the order in Tab-2. xij, the satisfaction of household i
with variable j, is based on a questionnaire on the residential environment and
housing environment in the suburban area of Kanazawa City (shown in Tab-3).
The investigation covers 18 factors of the housing environment and residential
environment, as shown in Tab-2. The satisfaction degree for each factor was
divided into four levels, 2, 1, -1 and -2, representing very satisfied, satisfied,
unsatisfied and extremely unsatisfied. During simulation, the values of xij are
assigned in this range. Within the questionnaire, respondents were asked to rate
each of the factors shown in Tab-2 on the four satisfaction levels. The
satisfaction investigation was concluded by a relocation desire question, “will
you move?”. The answers were quantified on four levels, 1, 2, 3 and 4, standing
respectively for will not move, will not move if I can, will move if I can, and
will move. We estimate the coefficients aj between Si and xij using
Multi-Criteria Evaluation (MCE) in the statistical software R; the results are
listed as the coefficients in Tab-2. Meanwhile, following existing research, we
set the satisfaction threshold to 0.1 [Kawakami and Takayama, 1978; Kikuchi
and Nojima, 2007].
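Equations (1) to (4) can be rendered as a short sketch; the coefficient values below are invented for the example (the estimated values are those reported in the coefficient table).

```python
# Sketch of the satisfaction evaluation and threshold test, eqs. (1)-(4);
# toy coefficients, not the MCE estimates.

def satisfaction(a, x, eps=0.0):
    """Eq. (1): S_i = (sum_j a_j * x_ij + eps_i) / n, with n = len(a) = 20."""
    n = len(a)
    return (sum(aj * xj for aj, xj in zip(a, x)) + eps) / n

def wants_to_relocate(s_i, s_threshold=0.1):
    """Eqs. (3)-(4): household i is unsatisfied when S_i <= S_threshold."""
    assert -2 <= s_threshold <= 2    # Eq. (2)
    return s_i <= s_threshold

a = [0.1] * 20               # toy coefficients a_j
x_satisfied = [2] * 20       # satisfaction scores in {-2, -1, 1, 2}
x_unhappy = [-1] * 20
```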

Table 2 Partial correlation coefficients of the regression analysis for household relocation

Variable for satisfaction evaluation              Coef.  Partial coefficient       Coef.  Partial coefficient
                                                         (no policy interaction)          (with policy interaction)
House size                                        a1     0.170634****              as1    0.069904
Housing security from earthquake, typhoon         a2     0.140753****              as2    0.153358***
Housing security from fire                        a3     0.045609                  as3    0.085280*
Housing impairment                                a4     0.029561                  as4    0.112268**
Barrier-free structures for old people            a5     0.067244                  as5    0.051591
Surrounding safety equipments                     a6     0.125595***               as6    0.072290*
Safety while walking on surrounding pavement      a7     0.126829***               as7    0.105173*
Crime rate                                        a8     0.109048***               as8    0.085835*
Air or noise pollution                            a9     0.162371****              as9    0.098698*
Accessibility to work or school                   a10    0.056896                  as10   0.079705*
Shopping convenience                              a11    0.118102***               as11   0.075495*
Accessibility to community hospital               a12    0.111277***               as12   0.052196
Distance from cultural facilities (e.g. library)  a13    0.097474**                as13   0.058622
Park or playing ground for children               a14    0.100275**                as14   0.100814*
Green space                                       a15    0.079093**                as15   0.123672**
The areas of out space                            a16    0.172469****              as16   0.027558
Street landscape                                  a17    0.087435**                as17   0.102831*
Communication feasibility with neighbors          a18    0.061328                  as18   0.020922
Interaction 1: Neighborhood influence                                              as19   0.064556
Interaction 2: Global influence                                                    as20   0.174542****

Table 3 Investigation of residential environment and housing environment in the suburban
area of Kanazawa City

Investigation Time    Investigation Area   Responders   Recovery Rate
Dec 2009 - Jan 2010   Nukashin Machi       124          63.27%
                      Magae Machi          139          68.81%
                      Minma Machi          90           67.67%
                      Kubo Machi           130          62.20%
                      Izumino Machi        125          64.10%
                      Izumigaoka Machi     134          62.33%

3.3 Household Relocation Module

3.3.1 Decision Tree for Household Relocation Choice

We propose that households who desire new locations follow the
decision-making flow shown in Fig-3. The agent first compares the utility
offered by the three urban areas shown in the figure: the city center area (CCA),
the urban promoting area (UPA), and the urban control area (UCA). Based on
this comparison, the agent finally chooses the location that provides him with
the biggest utility and removes to the new location.

Fig. 3 Decision tree for household relocation choice

3.3.2 Utility of Residential Locations in Different Urban Areas

We assume that a household makes a decision on a new location based on the
utility offered at
and around the location. The utility of location s for household i can be
calculated by equation 5, where xisj is a vector of observable explanatory
variables j describing attributes of household i and location s, and asj is a vector
of retrospective coefficients for variable j. Finally, the probability of household i
choosing location s follows the logistic function shown in equation 6. Qis is the
probability that household i chooses location s, which is influenced by the utility
offered by location s, without considering unobserved random influences. The
same variables as those used for the satisfaction evaluation are employed to
calculate residential utility. Unlike in the satisfaction evaluation, however, the
variable values are assigned in the range [1, 4] based on the hypothetical space
of Kanazawa city. We assume that the values of variables 1 to 6 and 10 to 16
decrease from the CCA to the UCA by 1.5 units with a random range of 0.5,
while the others increase from the CCA to the UCA by 1.5 units with a random
range of 0.5, under the assumption that the housing environment deteriorates
from the UCA to the CCA while the residential environment shows the opposite
trend.

V_is = Σ_j a_sj × x_isj    (5)

Q_is = e^{V_is} / Σ_s e^{V_is}    (6)
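Equations (5) and (6) together define a multinomial logit choice over the three areas, which can be sketched as follows; the coefficients and attribute values are invented for the example.

```python
# Sketch of the utility (eq. 5) and logit choice probability (eq. 6);
# all numeric values are toy inputs in the stated [1, 4] range.
import math

def utility(coefs, attrs):
    """Eq. (5): V_is = sum_j a_sj * x_isj."""
    return sum(a * x for a, x in zip(coefs, attrs))

def choice_probabilities(coefs, locations):
    """Eq. (6): Q_is = exp(V_is) / sum_s' exp(V_is')."""
    v = {s: utility(coefs, attrs) for s, attrs in locations.items()}
    denom = sum(math.exp(val) for val in v.values())
    return {s: math.exp(val) / denom for s, val in v.items()}

coefs = [0.15, 0.10, 0.12]          # toy a_sj
locations = {"CCA": [4, 3, 4],      # toy x_isj values in [1, 4]
             "UPA": [3, 3, 2],
             "UCA": [1, 2, 1]}
probs = choice_probabilities(coefs, locations)
chosen = max(probs, key=probs.get)  # household picks the biggest utility
```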

The variables reflecting the policy interactions (as shown in Tab-2) are:

1) Neighborhood influence (xis19): the number of neighbors who will use the
policy of residential allowance to move to downtown, Nmove, divided by the
number of neighbors who will not use the policy to relocate, Nnomove;
2) Global influence (xis20): the total number of households who will use the
policy to relocate, Gmove, divided by the number of all households, Gtotal.

Variables xis19 and xis20 stand for the neighborhood and global acceptance of
the policy. We determined them based on the questionnaire introduced in Tab-4.

Table 4 Question regarding the residential promoting policy

Question: Do you plan to get an allowance for moving to downtown by using this policy?
Alternatives:
1. Plan to buy a house in downtown by using the policy
2. Plan to buy an apartment by using the policy
3. Planned to buy a house by using the policy
4. Planned to buy an apartment by using the policy
5. Won’t use the policy
6. Others

As shown in Tab-4, we introduced some of the questions regarding the
residential promoting policy. We assume that households who chose the first
two alternatives will utilize the policy to relocate, while the others won’t.
Treating households living in the same investigation area as neighbors, we
calculated Nmove and Nnomove over groups of, on average, 8 respondents to
the questionnaires. Because there are 6 investigation areas, there are 6 results for
xis19; finally, we calculated their average value to determine the final xis19. In
the meantime, we represent the global policy response by the ratio of households
who will use this policy for relocation, based on the answers of all respondents
in the 6 investigation areas. The result is defined as xis20. At last we also
estimated asj; the results are shown in Tab-2 as the partial coefficients with
policy influence.
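The computation of the two interaction variables from the survey answers might look like this. The grouping by investigation area and the answer codes follow the text, while the sample responses are invented.

```python
# Sketch of the policy-interaction variables x_is19 and x_is20;
# answers 1-2 mean "will use the policy to move", everything else "won't".

WILL_MOVE = {1, 2}   # alternatives 1 and 2 in Tab-4

def neighborhood_influence(area_answers):
    """x_is19: mean over areas of N_move / N_nomove among neighbors."""
    ratios = []
    for answers in area_answers:
        n_move = sum(1 for a in answers if a in WILL_MOVE)
        n_nomove = len(answers) - n_move
        ratios.append(n_move / n_nomove if n_nomove else float("inf"))
    return sum(ratios) / len(ratios)

def global_influence(area_answers):
    """x_is20: G_move / G_total over all respondents."""
    flat = [a for answers in area_answers for a in answers]
    g_move = sum(1 for a in flat if a in WILL_MOVE)
    return g_move / len(flat)

# toy data: 6 areas, 8 respondents each, answer codes 1-6
areas = [[1, 5, 5, 5, 2, 5, 6, 5],
         [5, 5, 1, 5, 5, 5, 5, 5],
         [2, 5, 5, 5, 5, 5, 6, 5],
         [5, 5, 5, 1, 5, 5, 5, 5],
         [5, 5, 5, 5, 2, 5, 5, 5],
         [5, 6, 5, 5, 5, 5, 5, 1]]
x19 = neighborhood_influence(areas)
x20 = global_influence(areas)
```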

4 Model Test of HRRM Using Hypothetical Urban Space

To consider the impact of planning policy, we define the spatial planning
information and household information based on the typical urban form of
Japan. All the spatial information and household information are taken as
external conditions for the simulation model.

4.1 Hypothetical Urban Space and Household for Policy Simulation

This study concerns a hypothetical urban space of 2500 cells (50 x 50), where
each cell measures 500 m x 500 m. This hypothetical city has the characteristics
of a typical Japanese city, with a commercial business district located in the
CCA of the city and two concentric areas, an urbanization promotion area
(UPA) and an urbanization control area (UCA). The land use zoning is based on
the ratios of different land use types reflecting the local urban plan of Kanazawa
city, as shown by the second picture from the left in Fig. 4. The following
simulation starts from 1985, and each simulation loop stands for 6 months.

Fig. 4 The virtual Kanazawa city for accommodating household agents.
Zoning types: 1, 1st low-rise exclusive residential district; 2, 2nd low-rise
exclusive residential district; 3, 1st mid-high-rise exclusive residential district;
4, 2nd mid-high-rise exclusive residential district; 5, 1st residential district;
6, 2nd residential district; 7, quasi-residential district; 8, neighborhood
commercial district; 9, commercial district; 10, quasi-industrial district;
11, industrial district; 12, exclusive industrial district. Household density:
>= 4, >= 2, = 1, = 0 households per cell. Income level: rich, middle, poor.

According to the Japanese Census Survey of 1985, household data are created
to reflect household attributes in Kanazawa city. They contain attributes such as
household income, car ownership, age, and current residential location. There
are 1500 household agents living in this virtual city, and each agent stands for
300 real households in the local city. The households are located according to
the density of each land-use zone. In the virtual city we assume that the
percentages of population in the three income levels are 20%, 60%, and 20%,
respectively. The household density in the virtual data is shown in the third
picture of Fig. 4, and household income is represented in the first picture from
the right of Fig. 4.

4.2 Model Behavior Test

HRRM is tested on the NetLogo platform and two scenarios are simulated. The
first focuses purely on the model behavior of household residential relocation,
ignoring any policy influences. The other includes the policy influences in the
simulation to capture the interactions between policy implementation and
household responses.
Development of Agent-Based Model for Simulation 209

4.2.1 Scenario One


During the simulation, the virtual data regarding household information and virtual
space is first inputted. The agent information shown in Fig. 5 serves as the initial
household information.
As shown in Fig. 6, 173 households moved to new locations after 30 simulation
steps, namely 15 years. Although relocations are happening, the number of
households in the CA is just 346, which is only half of the figure in the UCA. It
is thus evident that without special intervention, downtown decline cannot
practically be reversed.

Fig. 5 Initial household information

Fig. 6 Simulation result of the first scenario

4.2.2 Scenario Two


In this scenario we input the policy influence into our simulation. The basic
principle of this process is to update the parameter values of as19 and as20 to
reflect neighbourhood and global influences on household relocation. Simulation
results of the second scenario are shown in Fig. 7, in which two types of policy

Fig. 7 Simulation result of the second scenario


210 Z. Shen et al.

parameters are simulated. The first, namely policy interaction I, is based on the
values of as19 and as20 shown in Table 1. It supposes that all the households who
showed interest in this policy will definitely move to the CA. The results show
that households gradually all moved to the CA area. Since this is unrealistic, we
instead suppose that 10% of those interested in this policy will finally move; the
result is shown by the figure for policy interaction II in Fig. 7. Compared with
the first scenario, we can easily observe that the number of households living in
the CA or relocating to the CA increases markedly. This residential promoting
policy can thus, to some extent, accelerate household movement back to downtown.
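The relocation mechanism of policy interaction II, in which only a fraction of policy-interested households actually move, can be sketched as a small Monte Carlo fragment (illustrative Python, not the authors' NetLogo code; the agent attributes, the 30% interest share and the random seed are assumptions of this sketch):

```python
import random

random.seed(42)

def step_policy_interaction_ii(households, acceptance_rate=0.10):
    """Move a fraction of policy-interested households into the CA.

    Each household is a dict with 'zone' ('CA' or 'UCA') and
    'interested' (True if it responded positively to the policy).
    Only `acceptance_rate` of interested households move per step,
    mirroring the paper's 10% assumption for policy interaction II.
    """
    for hh in households:
        if hh["zone"] != "CA" and hh["interested"]:
            if random.random() < acceptance_rate:
                hh["zone"] = "CA"
    return households

# 1500 agents, 30% of them interested in the promoting policy (assumed)
households = [{"zone": "UCA", "interested": random.random() < 0.3}
              for _ in range(1500)]
for _ in range(30):                     # 30 half-year steps = 15 years
    step_policy_interaction_ii(households)
moved = sum(hh["zone"] == "CA" for hh in households)
print(f"{moved} households relocated to the CA")
```

Over 30 steps, a household interested in the policy fails to move with probability 0.9^30 (about 4%), so nearly all interested households eventually relocate, which matches the gradual drift toward the CA described above.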

4.3 Model Validation with the Real Data of Kanazawa City


In order to validate HRRM, we compare the simulation results with local statistical
data by converting the simulation results into household ratios in different urban
areas. We conducted the simulation of household residential relocation from 1985
to 2000. The comparison is shown in Table 5. According to the local census survey
of 1985, the household ratio in the CA is 33.9% and 66.1% in the UPA and UCA; for
these two indexes the simulation results are 33.3% and 66.7%, respectively. The
simulation results are evidently very close to the real data. At this stage we can
conclude that our model can simulate household residential relocation within
different areas in a realistic way.

Table 5 Comparison of household ratios in different urban areas between real data and
simulation results

Years                   1985     1990     1995     2000
Real dataset  CA        33.9%*   31.9%*   29.0%*   26.6%*
              UPA+UCA   66.1%*   68.1%*   71.0%*   73.4%*
Simulated     CA        33.3%*   32.1%*   30.4%*   21.3%*
results       UPA+UCA   66.7%*   67.9%*   69.6%*   78.7%*
Note: * average values of 30 round simulations with significance level over 0.05

5 Conclusions and Future Works


The HRRM integrates three modules, which together simulate the whole process of
household decision-making on residential relocation. The model can be employed
for visualization of household residential relocation within different urban
areas.
This paper focuses on model development; we conduct scenario analysis to show
the possibilities of this ABM model. As introduced in the model behavior test, we
simulated two scenarios, with and without policy influences. The scenario with
policy influences was based on a residential investigation in the local city
regarding the residential promoting policy. Thus, the parameters employed in
scenario two basically reflect the policy responses of local households. The
simulation results

thereby fulfill the purpose of visualizing the possible effects of the residential
promoting policy for downtown regeneration. Compared with the simulation results
of the first scenario, the number of households moving to downtown (the CA area)
increases evidently when the policy parameters are updated. This suggests that
local downtown decline can probably be relieved by implementation of this new
residential policy, which to some extent helps attract households to choose new
locations downtown. The model validation thus demonstrated the potential of HRRM
for simulation and visualization of household residential relocation influenced
by specific policies through scenario configuration. However, in this work all the
simulations were conducted in a virtual space, and the results can only reveal the
differences between the first and second scenarios; they therefore demonstrate the
potential of ABM for simulating policy implementation effects. In future work we
will improve the model by simulating the real implementation results of the local
residential promoting policy using a real dataset.

References
Haase, D., Lautenbach, S., Seppelt, R.: Modeling and simulating residential mobility in a
shrinking city using an agent-based approach. Environmental Modelling & Software 25,
1225–1240 (2010)
Rieniets, T.: Urban shrinkage. In: Atlas of Shrinking Cities, German, Hatje, Ostfildern, p.
30 (2006)
Howley, P., Scott, M., Redmond, D.: An examination of residential preferences for less sustainable
housing: Exploring future mobility among Dublin central city residents. Cities 26, 1–8 (2009)
Jager, W., Mosler, H.J.: Simulating Human Behavior for Understanding and Managing Environmental
Resource Use. Social Issues 63(1), 97–116 (2007)
Population investigation, Basic census survey of Kanazawa City (2005-2007),
http://www.jinko-watch.com/shicho/0846.html
Kawakami, M., Takayama, J.: Study on residential intends of housing owners – A case
study of Kanazawa City. Journal of the City Planning Institute of Japan 13, 67–72
(1978)
Kikuchi, Y., Nojima, S.: Resident’s Mind about Residence Selection in Suburban Housing
Estates: Case study of 4 suburban estates in Fukui City. City Planning Review: Special
Issue on City Planning 42(3), 217–222 (2007)
Kim, J.H., Pagliara, F., Preston, J.: The intention to move and residential location choice
behavior. Urban Studies 42(9), 1621–1636 (2005)
Molin, E., Timmermans, H.: Accessibility considerations in residential choice decisions:
accumulated evidence from the Benelux. In: TRB 2003 Annual Meeting CD-ROM,
Washington, DC (2003)
Rouwendal, J., Meijer, E.: Preferences for housing, jobs, and commuting: a mixed logit
analysis. Journal of Regional Science 41(3), 475–505 (2001)
Development of the Online Self-Placement Test
Engine That Interactively Selects Texts for an
Extensive Reading Test

Kosuke Adachi, Mark Brierley, and Masaaki Niimura

Abstract. Extensive reading is a method of learning language by reading many
books that are easy and enjoyable. There is a need for a placement instrument to
evaluate learners’ reading levels, and the Edinburgh Project on Extensive Reading
tests are well established. However, they have a few drawbacks; for example, they are
paper tests, and preparing the correct level of test for individual learners is difficult.
The Extensive Reading Foundation Online Self-Placement Test (ERFOSPT) is a
new online self-placement test that “evaluates learners’ fluent reading levels easily,
quickly and accurately”. It delivers the test via the Internet and adaptively selects
texts depending on records of the learner’s reading level and the results of previous
questions in the test. In this paper we explain the development of the Online Self-
Placement Test engine that is a part of the ERFOSPT.

1 Introduction
Extensive reading (ER) is a method of learning that improves reading comprehension
by reading many books that are easy and enjoyable. It is able to produce significant
educational benefits and has been successful in Japan [1, 6]. We are engaged
Kosuke Adachi
Graduate School of Science and Technology, Shinshu University
4-17-1, Wakasato, Nagano City, Nagano, Japan
e-mail: adachi@seclab.shinshu-u.ac.jp
Mark Brierley
Language Education Center, School of General Education, Shinshu University
3-1-1, Asahi, Matsumoto City, Nagano, Japan
e-mail: mark2@shinshu-u.ac.jp
Masaaki Niimura
e-Learning Center, Shinshu University
3-1-1, Asahi, Matsumoto City, Nagano, Japan
e-mail: niimura@shinshu-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 213–222.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
214 K. Adachi, M. Brierley, and M. Niimura

in ER as a part of the English language education curriculum at Shinshu University.
Additionally, we developed and have been operating an online ER support system,
which is effective for learning with ER [3].
Learners have to select ER books, which are graded by each publisher, when they
start ER. However, beginners often do not know their own reading level in ER, so it
is difficult for them to select suitable books. The subsequent selection of suitable
books is also difficult, because learners’ reading levels may change. If learners
are ignorant of the changes in their own reading level, they may keep on reading
books at the same grade, or may attempt higher-level books before they are ready. As a
result the learners’ reading ability may not improve.
Therefore, there is a strong need for a placement instrument to accurately evaluate
individual learners’ reading levels. At the moment various methods are used; one is
the Edinburgh Project on Extensive Reading (EPER) Extensive Reading Test [8].
However, this has a few drawbacks; for example, it is a paper test, and preparing the
correct level of test for individual learners is difficult.
Therefore we propose and have been developing a new test system that evaluates
individual learners’ reading levels, named the Extensive Reading Foundation Online
Self-Placement Test (ERFOSPT) [7]. The ERFOSPT delivers the test via
the Internet and adaptively selects texts based on the learner’s reading level and the
results of previous questions in the test. As a result it is able to evaluate individual
learners’ reading levels more easily, quickly and accurately. In this paper we explain
the development of the Online Self-Placement Test engine that is a part of the
ERFOSPT.

2 Background
2.1 SSS ER Method
Most books targeted at native speakers, such as novels, require a high level of
language proficiency and are too difficult for general learners in Japan to read with
any fluency. ER typically relies on children’s literature, or graded readers that have
been written or adapted for language learners at a particular level of proficiency.
Professor SAKAI Kunihide at the University of Electro-Communications proposes a
radical form of the ER method, emphasising the reading of many books that are easy
and enjoyable, and beginning with very low-level books. In 2001, SAKAI et al.
created the SSS (Start with Simple Stories) ER method, which has spread and come
to fruition in Japan [5].
The SSS ER method sets a goal of reading one million words in English. For this
purpose, learners should first select books that are easy, and gradually read higher-level
books as their degree of proficiency increases. Sakai’s method has the
“Three Golden Rules” [4]:
1. No dictionaries while reading.
2. Skip over difficult words and phrases.
3. Quit reading if the book is difficult or boring.
Development of the OSPT Engine That Interactively Selects Texts 215

Although there is variety among teachers, the ER method at Shinshu University is
close to the SSS ER method, and in this paper we use “ER” in the sense of “SSS ER”.

2.2 Selection of Suitable Books


Selection of suitable ER books is a critical factor in continuing ER with the Three
Golden Rules (2.1). Learners necessarily have to select suitable ER books from the
many books on the shelves, with the help of the book’s grade, which is decided
by each publisher. However, learners who do not know their own reading level in
ER have difficulty selecting suitable books. Additionally, if learners are
ignorant of changes in their own reading level, they may either keep on reading
books at the same grade, when they could have read more interesting books at higher
levels and met more new language, or, perhaps more dangerously, drop out of
reading because they selected a book that was too difficult. Therefore learners need
to accurately and routinely check their fluent reading ability, for example with an ER
placement test.

2.3 EPER Drawbacks


There are various methods to evaluate individual learners’ reading levels, and the
EPER tests are well established. In fact EPER offers two kinds of test: the
Placement/Progress Test and the Extensive Reading Test. The Placement/Progress Test is
a modified cloze test, including short texts at various levels with high-frequency
words removed. The Extensive Reading Test includes relatively long texts, which
are like parts of graded readers. The texts are graded using the 8 levels of difficulty
that are decided by EPER. The Extensive Reading Test has two or three sets of
comprehension questions that the student is instructed to answer at some point within
the text and at the end of the text.
At Shinshu University, following the EPER guidelines, learners take a pair of
texts at a suitable level, which takes an hour. The teacher then marks the question
papers by hand and, using criteria provided by EPER, feeds back the results to the
learners. The EPER Extensive Reading Test can evaluate individual learners’ reading
levels, but it has some drawbacks:
1. Paper tests waste a lot of time and paper
Although the reading texts can be recycled, the answer sheets cannot, and teachers
have to print a lot of papers at suitable grades each time there is a test, and
they have to spend time marking the question papers by hand. Additionally, they
might have to gather individual learners’ records, text levels and so on, for
individual text papers and question papers each time of testing. This is because they
need to know the state of each learner’s reading level to help advise them on
suitable graded texts. Therefore, teachers may waste a lot of their time and paper,
and this is a drawback for them. In addition, there is a drawback for learners: the
limited time for taking the test, which must be carried out during lesson time,
and will take up most of a lesson.

2. Difficulty of suitable grade text selection
Teachers have to select texts at suitable grades for learners from the various levels
of difficulty, to accurately evaluate individual learners’ reading levels. However,
in order to do this, they have to gather individual learners’ records each time of
testing. That is difficult for the reason described previously, and is another
drawback for teachers and learners.

3. Evaluation may not be accurate
This test may not indicate a learner’s fluent reading level, as the questions involve
writing and may be open to guessing. Additionally, the test is not able to evaluate
minor changes in a learner’s reading level, because the teacher selects only the
same level text for all learners in a test; in other words, teachers are not able to
select the second text in consideration of the results of the first questions. For
example, a teacher may select a test at EPER level E (where A is the highest level
and H is the lowest). However, this text can only evaluate whether the learner’s
reading level is at E, below E or above E. If a student is above level E, the test
will not show whether the student is at level B, C or D.

3 About the ERFOSPT


The ERFOSPT is a new test that evaluates individual learners’ reading levels. It is
a combined effort with the support of the ERF and most of the major graded reader
publishers. Since graded readers are most commonly used for ER, the ERFOSPT
evaluates an individual learner’s reading level using texts from published graded
reader series from all publishers. In addition, the test uses the 16-level ERF Graded
Reading Scale [2] as its leveling system. We address each drawback of the EPER
test with the following solutions:
1. Online test evaluates easily and quickly
Teachers do not need to print a lot of papers each time of testing, because learners
are able to take the test via the Internet. Additionally, marking a lot of individual
question papers and gathering individual learner records becomes automatic.
Moreover, learners can take the test anytime, anywhere, by themselves, so there
is less pressure on class time.

2. Selection of suitable grade texts is automatic
The ERFOSPT is able to select texts at a suitable level for learners automatically
from various levels of difficulty, because it has every learner’s record in the
database.

3. Evaluation is more accurate through adaptive text selection
All the questions are in true/false format and none are trick questions, which
allows learners to answer quickly and efficiently, and avoids interference from
students’ writing proficiency. Additionally, the ERFOSPT gives more suitably
graded texts than EPER to individual learners each time of testing. Moreover, the
ERFOSPT should be more accurate than EPER because it adaptively selects the
next level of text in consideration of the results of the previous questions.
Evaluation criteria are not just the results of comprehension questions but also the
reading speed and the learner’s impression of the story.

3.1 Algorithm to Evaluate Learner’s Reading Levels


The algorithm used by the test is under constant review. Below is an example of an
algorithm. The algorithm is based on the following assumptions:
• A given text can be assigned a level of difficulty (X), so we can say that text A is
more difficult than text B, or text C is easier than text B.
• A given learner has a reading level (C) that can be matched against difficulty
levels, such that for difficulty X there is a hypothetical C that is equal.
• If a learner (C) can read a text fluently (i.e. within a time limit (T), and correctly
answering all questions (Q) on the text), then the learner’s level is at or above the
level of the text.
• If a learner cannot read a text extensively (i.e. takes more than the time limit to
read it, and/or cannot answer all questions correctly), then the learner’s
level is below the level of the text.
• There is variation among learners and texts, so one result will not confirm the
level of a learner 100%.
Definitions of variables:
Xn : level of the iteration-n text.
Cn : evaluated level of the learner after the iteration-n text.
Tn : time for the learner to read the iteration-n text.
TTn : target time for the iteration-n text.
Qn : percentage of questions answered correctly on the iteration-n text.
QTn : threshold of Qn.
Dn : increase/decrease in text level between subsequent texts at iteration n.
Φ : a constant for reducing D with each iteration.
w : weight of the text level compared with the previous estimated level.
The same algorithm works at each iteration from the beginning, and the variables
are governed by the following equations.

Dn = Dn−1 /Φ (1)

D1 is the initial value of Dn ; a suitable initial value and a suitable Φ for the evaluated
level are decided by the examiner.

Cn−1 − Dn < Xn < Cn−1 + Dn (2)



[Figure: learner’s evaluated level (Cn) and text level (Xn) plotted over iterations 1 to 6. Starting from the initial estimate C0 = 5.0, the evaluated level oscillates around the learner’s actual level of 10.0 and converges toward it as the selection range Dn shrinks from 3.0 to 1.2.]
Fig. 1 Evaluation of learner’s reading level with each iteration

A text at level Xn is chosen within the above range that reduces in size with each
iteration.
Cn = (Cn−1 + wXn )/(1 + w) + Dn (3)
Cn = (Cn−1 + wXn )/(1 + w) − Dn (4)

Cn is defined by the upper equation (3) if “Tn < TTn and Qn > QTn”, and otherwise
by the lower equation (4). (Cn−1 + wXn)/(1 + w) is a weighted average of the text
difficulty and our previous estimate of the student’s level, so it takes into
consideration both the text the learner has just read and our last evaluation of
their level.
Fig.1 is an example of an algorithm operating as a learner takes the test. w is 5.0,
Φ is 1.2 and D1 is 3.0. The learner’s initial estimate of reading level is 5.0, and the
learner’s actual reading level is 10.0. Fig.1 assumes that the learner can quickly read
and correctly answer questions for texts up to level 10, but cannot fluently read texts
over level 10.
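Under these stated assumptions (w = 5.0, Φ = 1.2, D1 = 3.0, a learner who reads fluently at or below level 10), the update rules (1)–(4) can be sketched in Python. This is an illustrative fragment, not the ERFOSPT implementation; the `read_fluently` oracle stands in for the learner’s actual reading, and the text is always chosen at the midpoint of the range from Eq. (2):

```python
def estimate_level(c0, read_fluently, iterations=6, w=5.0, phi=1.2, d1=3.0):
    """Iteratively estimate a learner's reading level, following Eqs. (1)-(4).

    `read_fluently(x)` models the learner: True if a text at level x is
    read within the target time with all questions answered correctly.
    """
    c, d = c0, d1
    for _ in range(iterations):
        x = c                        # a text inside (c - d, c + d); Eq. (2)
        avg = (c + w * x) / (1 + w)  # weighted average of estimate and text level
        if read_fluently(x):
            c = avg + d              # Eq. (3): fluent, so push the estimate up
        else:
            c = avg - d              # Eq. (4): not fluent, pull it down
        d = d / phi                  # Eq. (1): shrink the step each iteration
    return c

# Worked example from Fig. 1: initial estimate 5.0, actual level 10.0
level = estimate_level(5.0, read_fluently=lambda x: x <= 10.0)
print(round(level, 1))  # → 9.9
```

After six iterations the estimate lands close to the true level of 10.0, mirroring the convergence shown in Fig. 1: the ±Dn jumps overshoot and undershoot the true level while Φ damps them toward it.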

4 Development of the OSPT Engine


The OSPT Engine is the testing system of the ERFOSPT, implemented as a web
application. It operates not only on a PC but also on a tablet PC, a smartphone and
other devices via the Internet. For this reason it is developed in HTML5, JavaScript
and some server-side programs. It has the following components for implementing
the ERFOSPT (Fig. 2).
Fig. 2 Components of the OSPT Engine: Authentication, Testing, OSPT Controller, Managing accounts, Creating problems, Managing logs

4.1 Testing Function and OSPT Controller


The primary components of this engine are the Testing Function and the OSPT
Controller. The Testing Function implements the test and has five states:
• Start
• SelectLevel
• ReadText
• TakeQuestions
• End
The OSPT Controller creates and controls state transitions while calculating the
learner’s question scores using various data, and decides the learner’s reading level.
It has full control of the control blocks. The combination of the control blocks makes
up the architecture and algorithm of the controller, and the administrator can edit
the algorithm by easily replacing control blocks in the web user interface.
Fig. 3 shows the algorithm of Section 3.1 as a combination of the control blocks for
the ERFOSPT, Fig. 4 is the state transition diagram created by the algorithm, and
Fig. 5 contains screenshots of each state. The algorithm has the following steps:
1. The first state is “Start” and it displays help text for the learner.
2. The OSPT Controller checks whether or not the learner’s record is in the
database. If there is no record, the OSPT Controller changes the state to
“SelectLevel” and the learner then roughly selects a level as a starting point for the
test from a selection of short texts.
3. After the OSPT Controller gets the learner’s level, it selects a text at a suitable
grade for the defined algorithm and changes the state to “ReadText”.
4. The learner reads the text and, after finishing the reading passage, the OSPT
Controller changes state to “TakeQuestions”.
5. After completing the questions, the OSPT Controller gets the score and the
reading time, calculates the learner’s level using various data, and estimates the
learner’s reading level.
6. After that the OSPT Controller returns to “ReadText” or changes the state to
“End”. Finally the OSPT Controller updates the learner’s level in the database
and gives the learner data on a suitable reading level.
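The six steps above can be sketched as a small state machine (an illustrative Python fragment, not the HTML5/JavaScript engine itself; the state names come from the paper, while the class layout and the boolean callbacks are assumptions):

```python
from enum import Enum, auto

class State(Enum):
    START = auto()
    SELECT_LEVEL = auto()
    READ_TEXT = auto()
    TAKE_QUESTIONS = auto()
    END = auto()

def next_state(state, has_record=None, more_iterations=None):
    """Return the next state, mirroring the transitions of Fig. 4."""
    if state is State.START:
        # Step 2: no stored record sends the learner to rough level selection
        return State.READ_TEXT if has_record else State.SELECT_LEVEL
    if state is State.SELECT_LEVEL:
        return State.READ_TEXT          # step 3: a suitable text is chosen
    if state is State.READ_TEXT:
        return State.TAKE_QUESTIONS     # step 4
    if state is State.TAKE_QUESTIONS:
        # steps 5-6: loop back for another text, or finish the test
        return State.READ_TEXT if more_iterations else State.END
    return State.END

# A first-time learner doing two texts:
s, path = State.START, []
s = next_state(s, has_record=False); path.append(s)
s = next_state(s); path.append(s)
s = next_state(s); path.append(s)
s = next_state(s, more_iterations=True); path.append(s)
s = next_state(s); path.append(s)
s = next_state(s, more_iterations=False); path.append(s)
print([p.name for p in path])
# → ['SELECT_LEVEL', 'READ_TEXT', 'TAKE_QUESTIONS', 'READ_TEXT', 'TAKE_QUESTIONS', 'END']
```

The ReadText/TakeQuestions loop is where the level-estimation algorithm of Section 3.1 runs once per iteration.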

4.2 Authentication and Management of the Accounts


The OSPT Engine can authenticate and manage accounts. There are various
authentication methods, for example ID/Password, OpenID and Shibboleth. Alternatively,
users can enter the system with a guest account. Therefore various learners, including
students at this university or at other schools and universities, or members of the
public, are able to take the test. Teachers can also enter the system.
220 K. Adachi, M. Brierley, and M. Niimura

[Figure content, reconstructed as pseudocode; the numbered markers correspond to the steps in Section 4.1:

    1. state Start
       constants: ITERMAX = 6, PHI = 1.2, WEIGHT = 5
       variables: calev, iter = 0, indec = 2, rTime, score, sText
       if learner.level exists: calev := learner.level
    2. else: state SelectLevel; calev := selectedLevel
       while iter < ITERMAX:
           sText := randomSelectText(calev - indec, calev + indec)
    3.     state ReadText(sText); rTime := readingTime
    4.     state TakeQuestions(sText.questions); score := questionScore
    5.     if rTime < sText.TT and score > sText.questions.threshold:
               calev := (calev + WEIGHT × sText.grade) / (1 + WEIGHT) + indec
           else:
               calev := (calev + WEIGHT × sText.grade) / (1 + WEIGHT) - indec
           indec := indec / PHI
           iter := iter + 1
       learner.level := calev; modifyUser(learner)
    6. state End
]
Fig. 3 The ERFOSPT algorithm with a combination of the control blocks

Fig. 4 The state transition diagram created by Fig. 3: Start → (SelectLevel) → ReadText ⇄ TakeQuestions → End

Fig. 5 The screenshots of each state: Select Level, Read Text, Take Questions (sample)

4.3 Creation of the Texts and the Questions


An administrator of the ERFOSPT is able to create the texts and the questions for
this function of the OSPT Engine. The OSPT Engine stores these texts and questions
in the database and picks them up when they are suitable for the learner taking the
test. Administrators are able to use characters and images as text, and they select a
combination of questions that are true/false, multiple-choice or written answers for
each text. Additionally, questions are categorized as comprehension, impression
or questionnaire, and the way in which the data can be used to evaluate the
learner’s level is recorded. Moreover, the administrator is able to create pages in the
“SelectLevel” state.

4.4 Management of the Logs


The OSPT Engine logs various data, including access time, selected text, and the
individual score/reading speed for each text. The administrator and designated
teachers can browse this information, and it is helpful for developing the algorithm
of the OSPT Controller.

5 Conclusion
The ERFOSPT is a new online self-placement test for ER that evaluates individual
learners’ fluent reading levels easily, quickly and accurately. Additionally, this
system adaptively selects texts depending on the learner’s reading level and the
results of previous questions in the test, using the algorithm of the OSPT Controller.
At this time the test is in the Beta stage, with students from a number of schools
in Japan participating in trial runs. Moreover, we have been analyzing the log data
and assessing a suitable algorithm for the OSPT Controller. In the future, we will
make the ERFOSPT and the OSPT Engine available to the public, and keep
on analyzing and developing them. Additionally, we hope to make the OSPT Engine
available not only to the ERFOSPT but also to other tests in other disciplines.

References
1. Furukawa, A., et al.: The Complete Book Guide for Extensive Reading, 3rd edn., p. 512. Cosmopier (2010)
2. Extensive Reading Foundation: About Graded Readers — The Extensive Reading Foundation (December 7, 2011), http://www.erfoundation.org/erf/node/44
3. Sato, H., et al.: Evaluation of English Education system based on Extensive Reading in Shinshu University. IEICE Technical Report. ET, Educational Technology 109(453), 141–146 (2010)
4. Sakai, K.: Toward One Million Words and Beyond, p. 310. Chikuma Shobo (2002)
5. Sakai, K., Kanda, M.: Extensive Reading in the Classroom, p. 227. Taishukan Shoten (2005)
6. Brierley, M.: Extensive reading levels. JABAET Journal (11), 135–144 (2007)
7. Lemmer, R., Brierley, M., Reynolds, B., Waring, R.: Introduction to the Extensive Reading Foundation Online Self-Placement Test. Extensive Reading World Congress Proceedings 1, 23–25 (2012)
8. University of Edinburgh ELTC: EPER Getting started (December 7, 2011), http://www.ials.ed.ac.uk/postgraduate/research/eper-getting-started.htm
DOSR: A Method of Domain-Oriented Semantic
Retrieval in XML Data

Jun Feng, Zhixian Tang, and Ruchun Huang

Abstract. This paper presents a method (named DOSR) to support the semantic
retrieval of XML documents in a specific domain. It takes the entity as the basic unit
of information processing to guarantee the semantic integrity of returned results.
An efficient index method named the Entity-based index is designed for indexing the
entities. It can greatly reduce the size of the index file while guaranteeing the speed
of parsing entities. In order to rank the query results, the Stratified-Weight-Method
is proposed as an improvement on the traditional technology. Experimental results
show that DOSR can infer users’ search intention effectively, locate the search
target quickly and return exact results in accordance with users’ expectations. The
results processed by DOSR guarantee semantic integrity and reasonable ranking.

1 Introduction
With its scalability, flexibility and self-descriptiveness, XML data from a specific
application domain carries rich semantic information. XML has become a
well-acknowledged standard for data storage and data exchange, which has pushed
query processing issues to become very hot topics in the XML research field.
Traditional XML query processing methods are mostly tree-based models; that
is, they first look for the nodes that match the query keywords and then locate a
sub-tree containing those matching nodes. But the semantic information of a node
cannot be acquired through this traditional retrieval method. Currently the typical
method for XML semantic retrieval is the Smallest Lowest Common Ancestor
(SLCA) [8]; a sub-tree with a high matching degree with the query keywords can
be obtained through SLCA. Based on the SLCA method, many scholars have
Jun Feng · Zhixian Tang · Ruchun Huang
College of Computer and Information, Hohai University, No.1 Xikang Road,
Nanjing 210098, China
e-mail: fengjun@hhu.edu.cn

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 223–232.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
224 J. Feng, Z. Tang, and R. Huang

conducted further research, but they are overly concerned with the accuracy of
retrieval results rather than semantic integrity. Semantic integrity means that the
retrieval results carry rich information and complete semantics, which is important
for the users of information systems. For example, when user John wants to search
for a person’s contact information (e.g. name, e-mail, address) by phone number,
traditional tree-based retrieval methods, which may return only the phone
information and nothing else, cannot achieve this purpose.
This paper presents a method of Domain-Oriented Semantic Retrieval (DOSR)
for XML; it takes the semantic integrity of retrieval results as a primary goal and
the domain entity as the basic processing unit. The main idea is domain-oriented:
semantically complete domain entities are extracted by domain experts through
analysis of the XML Schema. By indexing, retrieving and ranking the XML data
based on domain entities, DOSR guarantees the semantic integrity of the
retrieval results.
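As an illustration of entity-level retrieval (a minimal Python sketch, not the DOSR implementation; the sample XML document and the `person` entity tag are assumptions), a keyword match anywhere inside an entity returns the entire enclosing entity subtree rather than the single matching node:

```python
import xml.etree.ElementTree as ET

DOC = """
<contacts>
  <person>
    <name>John</name>
    <email>john@example.org</email>
    <address>Nanjing</address>
    <phone>025-1234</phone>
  </person>
  <person>
    <name>Mary</name>
    <phone>025-5678</phone>
  </person>
</contacts>
"""

def entity_search(xml_text, keyword, entity_tag="person"):
    """Return whole entity subtrees whose text contains the keyword."""
    root = ET.fromstring(xml_text)
    hits = []
    for entity in root.iter(entity_tag):
        if any(keyword in (node.text or "") for node in entity.iter()):
            hits.append(entity)  # the full entity, not just the matching node
    return hits

results = entity_search(DOC, "025-1234")
# The match on the phone node returns the complete contact entity:
print([child.tag for child in results[0]])
# → ['name', 'email', 'address', 'phone']
```

A node-level method would return only the `phone` element here; returning the whole `person` subtree is what preserves semantic integrity in the sense used above.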
Our paper makes the following contributions:
1. We propose taking domain entities as the basic processing unit. There is
no ambiguity for a concept in a specific application domain, which ensures the
semantic integrity of entities. The XML snippet corresponding to an entity contains
the entity node and all its child nodes, which constitutes a semantically
complete entity.
2. We present the Stratified-Weight-Method for ranking the retrieval results, which
effectively solves the problem of computing the matching degree between entity
instances and query keywords.
3. We present an effective XML indexing method named the Entity-based index, which
removes structural information and maintains only the value information by
flattening the XML structure twice, effectively improving the speed of
parsing entity instances and reducing the size of the index file.
The rest of the paper is organized as follows. Section 2 presents related work.
Section 3 introduces DOSR. Section 4 discusses the experimental analysis, and
Section 5 summarizes our work.
2 Related Work
In recent years, research on XML retrieval has focused on XML keyword queries.
The first area of research relevant to this work is the computation of the Lowest
Common Ancestor (LCA) of a set of nodes of a tree. Efficiently computing the LCA
of a pair of nodes in a tree is a well-studied problem, and efficient in-memory
approaches are known [1]. A. Schmidt et al. [2] present an algorithm that takes as
input a set of relations containing the keyword pairs and outputs all pairwise LCAs.
XKSearch [8] defined Smallest LCAs (SLCAs) to be LCAs that do not contain other
LCAs. In XKSearch the number of smallest LCAs is bounded by the size of the
smallest keyword list, so it can filter out some useless nodes, but it can also cause
meaningful results to be lost. C. Sun et al. [3] extend SLCA to handle general
keyword search involving combinations of AND and OR boolean operators.

DOSR: A Method of Domain-Oriented Semantic Retrieval in XML Data 225

XRank [5] returns subtrees as answers to keyword retrieval and ranks the answers
based on PageRank, but it does not explain how the keywords connect to each other.
V. Hristidis et al. [6] present GDMCT to return the (possibly heterogeneous) set
of minimum connecting trees (MCTs) of the matches to the individual keywords
in the query. In GDMCT, the MCTs alone are the answer to the retrieval; MCTs can
explain how the keywords are connected. GDMCTs improve the similarity between
results and keywords, but the results do not carry enough information, so their
semantic integrity is worse than that of SLCA. Z. Liu et al. [11] present XSeek to
infer the semantics of the search and identify return nodes effectively, but it lacks a
ranking of the results. Z. Bao et al. [10] present an IR-style approach that utilizes the
statistics of the underlying XML data to identify search intentions and to address
keyword ambiguity and result ranking. However, the IR-style approach cannot
accurately capture the semantic information of nodes and may return some
meaningless results. Y. Li et al. [7] present the notion of Meaningful Lowest Common
Ancestor Structure (MLCAS) for finding related nodes within an XML document.
MLCAS is semantically equivalent to SLCA; it supports strong retrieval constraints
by using XQuery. G. Li [4] introduces the notion of Valuable Lowest Common
Ancestor (VLCA) to filter out results that share the same path from the LCA nodes to
the query keywords. Y. Huang et al. [9] present a method, eXtract, for computing
summaries of keyword search results; it studies how to infer self-contained semantics
from the keywords.
3 Domain-Oriented Semantic Retrieval Method
3.1 Related Definitions
Definition 1. Entity: An Entity is an XML snippet with complete semantic
information in a given domain.

Definition 2. Match-Attribute: An attribute of an Entity that matches a keyword is
called a Match-Attribute.

Definition 3. Target-Entity: An Entity that has a Match-Attribute is called a
Target-Entity.

Definition 4. Target-Entity-Set: A set of Target-Entities.

Definition 5. Instance: The XML segment of an XML document corresponding to an
Entity is called an Instance.

Definition 6. Predicate: A tag name defined in the XML Schema document is called a
Predicate, including entity names and attribute names.

Definition 7. Text-Value: The value of an entity name or an attribute name in an
XML document is called a Text-Value.

Definition 8. Instance-Matching Degree: The matching degree between an Instance
and the keywords is called the Instance-Matching Degree.

Definition 9. Entity-Matching Degree: The matching degree between an Entity and a
Predicate is called the Entity-Matching Degree.

Definition 10. Precise-Semantic Instance: If an Instance contains a Match-Attribute
whose value is one of the Text-Values in the keywords, the Instance is called a
Precise-Semantic Instance (PSI).

Definition 11. Fuzzy-Semantic Instance: If an Instance contains Match-Attributes but
none of their values matches a Text-Value in the keywords, the Instance is called a
Fuzzy-Semantic Instance (FSI).

Definition 12. Non-Semantic Instance: An Instance that contains no Match-Attribute
is called a Non-Semantic Instance (NSI).
3.2 Entity-Based Index
In order to speed up the parsing of entities and reduce the index file size, we present
the Entity-based index method, which indexes the XML data using an inverted index.
This method takes entities as the basic processing unit for indexing, which is quite
different from the traditional inverted index. Traditional methods index each attribute
of an entity separately, so forming a complete result requires additional time for
joining those attributes. The Entity-based index has two levels: the first level indexes
entities, and the second level indexes instances. Fig. 1(a) shows the structure of the
Entity-based index. Each row corresponds to an entity, and each entity points to all its
corresponding instances in the XML documents1. On the second level, each instance
contains the XML document name and all the attribute values corresponding to the
instance2.
When building the index, the instances are first extracted from the source
documents, which removes the hierarchy between instances; then all the attribute
values of an instance are indexed at the same level, which removes the instance's
internal hierarchy. All the attribute data of an entity are stored in a line and separated
by special symbols. The semi-structured XML data is thus turned into structured data
by flattening the structure of the XML data twice, so an instance can be accessed
without analysing complex structural information, which speeds up parsing
effectively. Since all the attribute values of an entity appear in a fixed order, an
attribute name can be determined from the position of the corresponding attribute
value in the Entity-based index; only the attribute values need to be saved, which
effectively reduces the index size.
1 Each instance begins with the special symbol ”#0” and ends with ”#0;”.
2 Each attribute value begins with ”#n”, where n is a positive integer indicating the number of
occurrences of the attribute value. In particular, if n ≥ 2, ”#;” is used to separate the multiple
occurrences of the attribute value.
Fig. 1 Structure of Entity-based index

Fig. 1(b) shows an example of the Entity-based index: entity 3 contains more than
two instances; the instance in Doc1 has three attributes, and attribute 3 has two
identical values, which are separated by ”#;”.
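As an illustration, the flattened, position-based layout described above can be sketched as follows (a minimal Python sketch; the function names, the in-memory dict for the first level, and the exact spacing around the special symbols are our assumptions, not the paper's implementation):

```python
def serialize_instance(doc_name, attr_values):
    """Flatten one instance into a single line: '#0' opens the instance,
    '#0;' closes it, and each attribute's values are introduced by '#n'
    (n = number of occurrences), with multiple occurrences separated by
    '#;'.  Attribute names are not stored; they are implied by position."""
    parts = ["#0", doc_name]
    for values in attr_values:        # one entry per attribute, in schema order
        parts.append("#%d" % len(values))
        parts.append("#;".join(values))
    parts.append("#0;")
    return " ".join(parts)

def build_entity_index(records):
    """First level: entity name -> list of serialized instances.
    records: iterable of (entity_name, doc_name, attr_values)."""
    index = {}
    for entity, doc, attrs in records:
        index.setdefault(entity, []).append(serialize_instance(doc, attrs))
    return index
```

For the Fig. 1(b) instance, two identical values of attribute 3 would be serialized as ”#2” followed by the two values joined with ”#;”.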

3.3 Stratified-Weight-Method
In the XML Schema, the attributes of an entity have different semantic weights. We
assume that attribute nodes at the same level have the same weight and that the total
weight of the attribute nodes at a level is inversely proportional to the level. We take
1/2^i as the inverse coefficient, where i is the level of the attribute node. Algorithm 1
shows how to calculate the weights of the attributes: lines 6 to 10 count the attribute
nodes at the same level as node a, and the final weight divides the level's inverse
coefficient by this count (lines 14 and 15).

Fig. 2 Example for the Stratified-Weight-Method
Fig. 2 shows entity A with two levels. The inverse coefficient of the first level is
1/2^1 and that of the second level is 1/2^2, so the weight ratio of the first level is
(1/2^1)/(1/2^1 + 1/2^2) = 2/3 and that of the second level is
(1/2^2)/(1/2^1 + 1/2^2) = 1/3. Assuming the weight of A is 1, the overall weight of
the first level is 2/3 and that of the second level is 1/3; the three first-level attributes
then share the same weight, (2/3)/3 = 2/9, and each of the two second-level attributes
has weight (1/3)/2 = 1/6.
Algorithm 1. WeightOfProperty(Instance instance, Attribute a)

1 begin
2   nl = instance.level-number // the number of levels of the instance
3   count = 0
4   coefficient = 0
5   weight = 0 // weight of a
6   for each attribute a* of instance do
7     if a*.level = a.level then
8       count = count + 1 // count the attributes whose level equals a.level
9     end
10  end
11  for i = 1 to nl do
12    coefficient = coefficient + 1/2^i
13  end
14  coefficient = (1/2^a.level) / coefficient
15  weight = coefficient / count
16 end
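The weight computation of Algorithm 1 can be sketched as follows (a minimal Python sketch under our reading of the algorithm: the normalizer sums 1/2^i over all levels of the instance, and the attributes at one level share that level's weight equally):

```python
def attribute_weights(levels):
    """levels: the level (1-based depth) of each attribute of an instance,
    in attribute order.  Returns one weight per attribute; the weights of
    all attributes sum to 1."""
    nl = max(levels)                               # number of levels
    norm = sum(1.0 / 2 ** i for i in range(1, nl + 1))
    counts = {lv: levels.count(lv) for lv in set(levels)}
    # each level's share (1/2^level)/norm is split among its attributes
    return [(1.0 / 2 ** lv) / norm / counts[lv] for lv in levels]
```

For the entity of Fig. 2 (three first-level and two second-level attributes), this yields weights 2/9, 2/9, 2/9, 1/6, 1/6, matching the worked example.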
3.4 Retrieval Intention Speculation
DOSR speculates about a user's search intention by parsing the predicates. The XML
Schema of the specified domain is analyzed in advance to form the predicate table.
After DOSR receives the keywords, it first looks them up in the predicate table: if a
keyword appears in the predicate table, it is a predicate; otherwise, it is a text-value
(i.e., the value of a predicate). After the predicates of the query keywords have been
parsed, the Target-Entities can be found by searching the index file: if the name of an
entity, or the name of one of its attributes, matches a keyword, the entity is a
Target-Entity. After all the entities are processed in this way, the complete
Target-Entity-Set is generated.
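The predicate lookup and Target-Entity selection just described can be sketched as follows (a minimal Python sketch; representing the predicate table as a set and the entities as a dict from entity name to attribute names is our assumption):

```python
def split_keywords(keywords, predicate_table):
    """Classify each query keyword as a predicate (it appears in the
    predicate table built from the XML Schema) or a text-value."""
    predicates = [k for k in keywords if k in predicate_table]
    text_values = [k for k in keywords if k not in predicate_table]
    return predicates, text_values

def target_entity_set(predicates, entities):
    """entities: {entity_name: [attribute names]}.  An entity whose name
    or one of whose attribute names matches a predicate is a Target-Entity."""
    return {name for name, attrs in entities.items()
            if name in predicates or any(a in predicates for a in attrs)}
```

The entity and attribute names below are taken from the paper's Fig. 3 example.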
3.5 Query Processing
The main goal of query processing is to rank the instances of the Target-Entity-Set.
We divide the results into three classes based on their semantic precision: PSIs, FSIs,
and NSIs. Clearly, the PSIs have the highest semantic precision, followed by FSIs and
then NSIs. The results are ranked by classification plus internal ranking: the
Stratified-Weight-Method is used for ranking within each class, but not across classes.
Algorithm 2 shows how to calculate the weight and semantic precision of an
instance. For an attribute a of an instance, if a’s name and a’s value both appear in the
keywords (lines 5 and 7 are true), then a is a precise-match attribute and the instance
is a PSI (if precise > 0, the instance is a PSI). If only a’s name matches the keywords
(line 5 is true and line 7 is false) or only a’s value matches (line 11 is true), then only
the weight of a is accumulated into the instance's weight (line 6 or line 12, not both).
Algorithm 2. WeightOfInstance(Instance instance)

1 begin
2   totalWeight = 0 // matching degree of instance
3   precise = 0 // semantic precision of instance
4   for each attribute a of instance do
5     if a.name is a keyword then
6       totalWeight = totalWeight + a.weight // a.weight is the weight of a
7       if a.value is a keyword then
8         precise = precise + 1
9       end
10    end
11    else if a.value is a keyword then
12      totalWeight = totalWeight + a.weight
13    end
14  end
15 end
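Algorithm 2's accumulation and the PSI/FSI/NSI classification can be sketched as follows (a minimal Python sketch; representing an instance as (name, value) pairs with precomputed per-attribute weights is our assumption):

```python
def weight_of_instance(attributes, weights, keywords):
    """attributes: list of (name, value) pairs; weights: per-attribute
    weight (e.g. from the Stratified-Weight-Method); keywords: set of
    query keywords.  Returns (total_weight, precise): precise > 0 marks
    a PSI, total_weight > 0 with precise == 0 an FSI, and
    total_weight == 0 an NSI."""
    total, precise = 0.0, 0
    for (name, value), w in zip(attributes, weights):
        if name in keywords:
            total += w                 # name matches: accumulate once
            if value in keywords:
                precise += 1           # precise match: name and value
        elif value in keywords:
            total += w                 # value-only match
    return total, precise
```

The attribute names below are taken from the paper's Fig. 3 example.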
4 Experimental Analysis
We conducted semantic integrity experiments to test the precision of the results
retrieved by DOSR, and an index performance experiment to test the response time of
retrieval. The experimental environment is as follows: CPU Intel i5 2.40 GHz,
Memory 4 GB, HDD 300 GB/7200 rpm, OS Win7 Ultimate.
4.1 Semantic Integrity Experiment
Fig. 3 shows the data used for the experimental analysis; it is a snippet of metadata.
There are two entities: MetaSchema and mdContact. MetaSchema is the root of the
snippet and represents the metadata. mdFileID is the unique identification of the
XML document; in the Entity-based index, it is the doc name of an instance.
mdContact is the responsible party of the metadata. rpIndName is the name of the
executive of the responsible party. rpOrgName is the name of the responsible party.
rpPosName is the position of the executive. rpCntInfo is the contact information of
the responsible party; it contains cntAddress (address information), cntOnLinesRes
(online resource information), and voiceNum (phone of the responsible party).
The retrieval keywords for SLCA and DOSR, listed in Table 1, are tested. Fig. 4(a)
shows the tree cntOnLinesRes, which is the result of SLCA for Q7. It is the minimum
tree that contains all the keywords; no sub-tree of cntOnLinesRes contains all the
keywords. Although the matching degree between the tree cntOnLinesRes and the
keywords is 100%, it is in fact meaningless for users. Firstly, the sub-tree
cntOnLinesRes describes online resource information that has no context. Secondly,
the returned tree is completely equivalent to the keywords and carries no extra
information, which makes it weak in terms of semantic integrity.
Fig. 3 Example data of experiment
Table 1 Query for Testing
Q keywords
Q1 nanjing // nanjing in all field
Q2 city // city in all field
Q3 rpOrgName delPoint city // city in rpOrgName and delPoint
Q4 rpIndName John Director // john in rpIndName Director in all field
Q5 postCode 210095 // 210095 in postCode
Q6 delPoint beijing postCode 210095 // beijing in delPoint, 210095 in postCode
Q7a cntOnLineRes linkage www.hhu.edu.cn orDesc information orFunct 002
a www.hhu.edu.cn in cntOnLineRes.linkage, information in orDesc, 002 in orFunct
Fig. 4(b) shows the entity mdContact, which is the result of DOSR for Q7. The
context of entity mdContact is the contact information of the metadata, which
contains the complete description of the responsible party, not only the information
appearing in the keywords. For the user, mdContact is an entity of the domain, which
not only has a complete context but also contains a sufficient amount of information.
Fig. 4 Answer for Q7
Because SLCA semantics is widely used, we selected SLCA as the comparative
baseline for semantic queries. SLCA assumes that if a node n of the tree of an XML
document contains all the keywords, then the ancestor nodes of n have low
correlation with the keywords; it denies the contribution of those ancestor nodes. A
result of SLCA must meet the following condition: it must contain all the keywords,
and none of its sub-trees may contain all the keywords. Only when the keywords
contain the entity name can SLCA return an entire semantic entity. Compared with
SLCA, DOSR takes the domain entity as the information processing unit, so the
minimum granularity of a result is the entity: as long as the keywords contain the
name of an entity or an attribute, FSIs can be obtained, and if a (predicate, text-value)
pair in the keywords matches an (attribute, attribute-value) pair of an instance, PSIs
can be obtained.
4.2 Index Testing
We selected the data compression ratio and the access time as performance indicators
of index performance. From Table 2, the compression ratio remains steady between
20% and 21% once the number of source documents is large, because the
Entity-based index method flattens the XML data twice, which effectively reduces
the size of the index file.
Table 2 Compression Ratio of Index
No Number of doc Size of doc(KB) Size of index (KB) Ratio

1 1 6 1 16.67%
2 50 499 104 20.84%
3 100 1003 205 21.50%
4 200 2114 442 20.92%
5 500 5001 1021 21.42%
6 800 8625 1787 20.73%
7 1000 11468 2403 20.96%
Fig. 5 Time of accessing instances
For fairness, in the experiments the index file was reopened each time an instance
snippet stored in it was accessed (in fact, the index file can be kept in memory once it
has been opened, in which case the access time would be shorter). Fig. 5 shows the
time for accessing instance segments from the XML documents (doc) and from the
Entity-based index (index). As the reading load gradually increases, the time for
reading instances from the source documents increases significantly, while the time
for reading instances from the index remains at a relatively low level, about 10% of
that for reading from the documents.
5 Conclusion
This paper presents DOSR to support the semantic retrieval of XML documents in a
specific domain. DOSR takes the semantic entity as the basic processing unit, uses
the Entity-based index method for indexing the XML data, and uses the
Stratified-Weight-Method for ranking the results. Our experiments verify the
effectiveness and efficiency of DOSR. In the future, we intend to extend our keyword
parsing algorithm to support logical computing.
References
1. Czumaj, A., Kowaluk, M., Lingas, A.: Faster algorithms for finding lowest common
ancestors in directed acyclic graphs. Theoretical Computer Science 380(1-2) (July 2007)
2. Schmidt, A., Kersten, M.L., Windhouwer, M.: Querying XML Documents Made Easy:
Nearest Concept Queries. In: Proc. of the Intl. Conf. on Data Engineering, Washington,
USA, pp. 321–329 (2001)
3. Sun, C., Chan, C., Goenka, A.K.: Multiway SLCA-Based Keyword Search in XML Data.
In: Proc. of the Intl. Conf. on World Wide Web, New York, USA, pp. 1043–1052 (2007)
4. Li, G., Feng, J., Wang, J., Zhou, L.: Effective keyword search for valuable lcas over xml
documents. In: Proc. of the Intl. Conf. on Information and Knowledge Management,
New York, USA, pp. 31–40 (2007)
5. Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: XRANK: Ranked keyword search
over xml documents. In: Proc. of the Intl. Conf. on Management of Data, New York,
USA, pp. 16–27 (2003)
6. Hristidis, V., Koudas, N., Papakonstantinou, Y., Srivastava, D.: Keyword Proximity
Search in XML Trees. IEEE Transactions on Knowledge and Data Engineering 18(4)
(April 2006)
7. Li, Y., Yu, C., Jagadish, H.V.: Schema-Free XQuery. In: Proc. of the Intl. Conf. on Very
Large Data Bases (2004)
8. Xu, Y., Papakonstantinou, Y.: Efficient Keyword Search for Smallest LCAs in XML
Databases. In: Proc. of the Intl. Conf. on Management of Data, New York, USA, pp.
527–538 (June 2005)
9. Huang, Y., Liu, Z., Chen, Y.: Query biased snippet generation in xml search. In: Proc. of
the Intl. Conf. on Management of Data, New York, USA, pp. 315–326 (2008)
10. Bao, Z., Lu, J., Ling, T.W., Chen, B.: Towards an Effective XML Keyword Search. IEEE
Transactions on Knowledge and Data Engineering 22(8) (August 2010)
11. Liu, Z., Chen, Y.: Identifying Meaningful Return Information for XML Keyword Search.
In: Proc. of the Intl. Conf. on Management of Data, New York, USA (2007)
Encoding Travel Traces by Using Road
Networks and Routing Algorithms

Pablo Martinez Lerin, Daisuke Yamamoto, and Naohisa Takahashi*
Abstract. Large numbers of travel traces are collected by vehicles and stored for
applications such as optimizing delivery routes, predicting and avoiding traffic,
and providing directions. Many of the applications preprocess the travel traces,
usually composed of position data, by matching these with links in the underlying
road network. This paper addresses the problem of persistent storage of large
numbers of vehicle travel traces. We propose two methods for using a routing al-
gorithm and road network to encode a travel trace formed by a sequence of links.
An encoded trace, composed of a few links, is useful to store or share and can be
decoded into the original travel trace. Considering that drivers tend to proceed
from an origin to a destination by using the shortest path or going as straight as
possible, the two proposed methods use the following two routing algorithms: a
shortest path algorithm; and a following path algorithm, which finds the path that
avoids turns. The experimental results for 30 real traces show that a travel trace is
encoded into only 5% or 7% of its links on average using the shortest path algo-
rithm or the following path algorithm, respectively.

Keywords: GIS, travel trace, travel data encoding, shortest path, following path.
1 Introduction
There are several kinds of applications that query large numbers of travel traces,
collected by vehicles over long periods of time, in order to obtain some knowl-
edge. Examples of such applications include the optimization of delivery routes on
the basis of previous data, the prediction and avoidance of traffic, and the provi-
sion of driving directions [1,2].
The travel trace of a vehicle is usually stored as a GPS trace generated by a
GPS receiver on board the vehicle. A GPS trace consists of a sequence of GPS

Pablo Martinez Lerin ⋅ Daisuke Yamamoto ⋅ Naohisa Takahashi*
Dept. of Computer Science and Engineering, Nagoya Institute of Technology,
Gokisocho, Showaku, Nagoya 466-8555 Japan
e-mail: {pablo,daisuke,naohisa}@moss.elcom.nitech.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 233–243.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
points, each composed of coordinates (longitude, latitude) and a time-stamp.
Considering that GPS receivers generate GPS points at short intervals and that the
applications mentioned above need travel traces from many vehicles over long time
periods, the problems of storing, sharing, and processing such amounts of data are
understandable.
Before the travel traces are queried, it is common to match the GPS points with
the underlying road network as done, for example, in the T-drive approach [2].
A matched trace between two points (hereinafter referred to as a path) is com-
posed of a sequence of links in a road network, where a link is the section of road
between two neighboring intersections.
Since most applications use the links as input, a naive idea to avoid matching
the points repeatedly would be to match the GPS points with links once and then
store the links. However, storing each link usually takes more space than storing
the GPS points. For example, a GPS receiver logs a GPS point every minute;
however, in one minute, a vehicle usually travels several links.
In this paper we propose two methods to encode a path so that it can be stored
using just a few of its links. A key observation is that drivers tend to travel by fol-
lowing the shortest path from an origin to a destination or a sequence of consecu-
tive destinations. Let us consider the following example: a person drives from
home to a gas station, using the shortest path, and then from the gas station to
work, again using the shortest path. Based on this example, the key idea is not to
store every link in the full path between home and work but to store instead
a sequence of just three links: the link in front of home, the link in front of the gas
station, and the link in front of work. When the travel trace needs to be used, all
the links in the path can be recovered (decoded) by finding the shortest paths be-
tween the stored links. Therefore, the encoding methods require a routing algo-
rithm and a road network. We consider that using a road network is not a disad-
vantage for two reasons: 1) the road network is already needed in order to generate
or match the path; and 2) the road network of a city, which is a relatively small
area, can be used to encode a large dataset that holds millions of vehicle travel da-
ta generated in the city.
Besides the shortest path, drivers tend to follow a path of fewest turns: we start
driving from an origin; we drive straight until we reach a waypoint; we make a
turn; and then we continue driving straight, reaching waypoints, and making turns
until we reach a destination. Hereinafter, we refer to a path that avoids turns as a
following path.
In this study, we present two novel methods to encode a travel trace using
a road network and a routing algorithm (Section 3). As the routing algorithm,
one method uses the shortest path algorithm (Section 4), and the other method
uses the following path algorithm (Section 5). The results of the evaluation show
that these two methods drastically reduce the number of links in a travel trace
when encoded (Section 6). Our main contributions are three:
• We present a method to encode a travel trace based on the shortest path.
• We present a method to encode a travel trace based on the following path.
• We clarify how these proposed methods can encode a path into a small
number of links, through experimental evaluations of 30 real travel traces.
2 Related Work
There are two main approaches to reducing vehicle travel traces: line simplifica-
tion approaches and map-based (map matching) approaches. Line simplification is
a well-studied problem [3,4]. The line simplification approach is based on reduc-
ing the number of GPS points by introducing an error, which is bounded. In other
words, the trace of the travel is smoothed. Map matching is also a well-studied
problem [5,6]. The map-based approach involves matching the GPS points to links
in a road network. Typically, one point is then stored for each matched link. Cao
et al. have investigated several simplification methods [7]. In addition, Hönle et al.
have presented a comparison of several simplification methods with a map-based
method, in which the map-based method produced worse results than the
approximation methods in terms of data reduction and calculation time.
We propose an approach that reduces travel traces in the form of a path (i.e.,
a sequence of links in a road network), as opposed to the above-mentioned methods,
which reduce travel traces in the form of a sequence of GPS points.
We argue that comparing the results of map-based methods and line simplification
methods is not easy. Line simplification methods smooth the trace, while map-based
methods snap the trace to links; therefore, which method is more convenient depends
on the application, regardless of the storage reduction.
Our approach uses two routing algorithms: the shortest path algorithm (SPA)
and the following path algorithm (FPA). The FPA was developed for EMMA,
the Focus+Glue+Context map system, in our previous work [9,10]. EMMA im-
plements the FPA to reduce the density of roads that are drawn to connect the fo-
cus area with the context area. The FPA developed for EMMA can find the multi-
ple paths that appear when a route bifurcates, because it tries to draw as many
following paths as possible to connect the focus with the context.
The encoding method described in this paper uses the FPA to find a subpath
within a travel route that cannot have bifurcations; i.e., a traveler who arrives at
an intersection of roads can only continue along one of the roads. This implies that
the proposed encoding system imposes requirements on the FPA that are different
from those imposed by EMMA. For the proposed system, we define an adapted
version of the FPA that finds a following path without bifurcations.

3 Overview of the Encoding Methods
First, we define the important concepts and then we define the encoding methods.
Link: A link L is a section of road between two neighboring intersections associated
with an identifier L.id, a length L.len, a starting point L.start, and an endpoint L.end,
as shown in Figure 1. The length of a link is computed using Euclidean distance.
Road network: A road network is a directed graph R(V, E), where V is a set of
vertices representing the intersections and terminal points of the roads and E is
a set of directed edges representing the links.
Path: A path P is a sequence of connected links in a road network that extends
from a point p1 to a point p2, i.e., P = (L1, L2, ..., Ln), where L1.start = p1,
Ln.end = p2, and Li.end = Li+1.start for 1 ≤ i < n. Figure 1 shows an example of a
path composed of four links.
Routing algorithm: Given a road network R and links linkA and linkB in R, a rout-
ing algorithm RTA finds a unique path P between linkA and linkB in R; i.e.,
P = RTA(linkA, linkB, R), where P = (linkA, ..., linkB). For example, some automo-
tive navigational aids implement a routing algorithm that finds the quickest path.
The encoding methods proposed in this paper for encoding a path using a road
network R and a routing algorithm RTA can be defined as follows: Given R, RTA,
and a path P in R, a function Encode transforms P into a subset S of links that be-
long to P by using R and RTA, and a function Decode transforms S into P by
using R and RTA.
S = Encode(P, R, RTA)
P = Decode(S, R, RTA)
The main objective is to minimize the number of links in S and thereby reduce
the storage requirements.
Fig. 1 Representation of a path in a road network.

4 Encoding Method Using the Shortest Path Algorithm

4.1 Encode and Decode Functions
The Encode and Decode functions are defined for a generic routing algorithm RTA,
which is used through the function makePath. The function makePath(LA, LB, R,
RTA) returns the path found by RTA that includes the links LA and LB as well as in-
termediate links in the road network R. The function makePath is described in detail
for the routing algorithm of each proposed method in Subsections 4.2 and 5.2.
The key idea of the Encode function is to find consecutive subpaths as large as
possible within the given path. A subpath is a path that the given routing algorithm
would find. In consecutive subpaths, the last link of each subpath is the first link
of the next subpath.
Figure 2 shows an example of the encoding of a path, where the Encode func-
tion has found two subpaths, subP1 = (L1, L2, L3, L4) and subP2 = (L4, L5, L6, L7),
by using the shortest path algorithm (SPA) as the routing algorithm. The Decode
function would return the original path by using makePath(L1, L4, R, SPA) and
makePath(L4, L7, R, SPA).
Original path P: P = (L1, L2, L3, L4, L5, L6, L7)
Encoded with R and SPA: S = Encode(P, R, SPA) = (L1, L4, L7)
Fig. 2 Example of encoding a path by using the shortest path algorithm (SPA). The arrows
represent the links and the thick lines represent the road network R.
Below we describe the algorithms for the Encode and Decode functions. Let us
consider that the function y.append(x) adds the element or sequence x to the end
of the sequence y. Let us also consider that the function pathEqual(P1, P2) returns
TRUE if the paths P1 and P2 are identical and returns FALSE otherwise.
Function Encode(P, R, RTA)
Input: A path P, a road network R, and a routing algorithm RTA.
Output: A sequence of links S that is the result of encoding P.
N = |P|
S.append(P[1])
subPath = {}
subPath.append(P[1])
subPath.append(P[2])
For i = 2 to N–1 do
    subPath.append(P[i+1])
    M = |subPath|
    tempSubPath = makePath(subPath[1], subPath[M], R, RTA)
    If ! pathEqual(subPath, tempSubPath) then
        S.append(P[i])
        subPath = {}
        subPath.append(P[i])
        subPath.append(P[i+1])
    EndIf
EndFor
S.append(P[N])
Return S
EndFunction
Function Decode(S, R, RTA)
Input: A sequence of links S, a road network R, and a routing algorithm RTA.
Output: A path P.
N = |S|
P.append(S[1])
For i = 1 to N–1 do
    subPath = makePath(S[i], S[i+1], R, RTA)
    subPath.removeFirst() // S[i] is already the last link of P
    P.append(subPath)
EndFor
Return P
EndFunction
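The two functions above can be sketched in Python as follows (a minimal sketch: paths are lists of link identifiers, the routing algorithm is a callable make_path(link_a, link_b) that returns the full link sequence or None, and, as in the corrected Decode, the decoder drops the first link of each subpath after the first because consecutive subpaths share their boundary link):

```python
def encode(path, make_path):
    """Keep only the links needed to reconstruct `path` under `make_path`."""
    s = [path[0]]
    sub = [path[0], path[1]]
    for i in range(1, len(path) - 1):          # pseudocode's i = 2 .. N-1
        sub.append(path[i + 1])
        if make_path(sub[0], sub[-1]) != sub:  # subpath is no longer routable
            s.append(path[i])                  # start a new subpath at P[i]
            sub = [path[i], path[i + 1]]
    s.append(path[-1])
    return s

def decode(s, make_path):
    """Reconstruct the original path from the encoded link sequence."""
    path = [s[0]]
    for i in range(len(s) - 1):
        sub = make_path(s[i], s[i + 1])
        path.extend(sub[1:])                   # skip the shared boundary link
    return path
```

With a routing algorithm under which the whole trace forms a single subpath, encode returns just the first and last links, and decode recovers the original path exactly.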

4.2 Making a Path with the Shortest Path Algorithm
The shortest path from a link LA to a link LB is the path of minimum length that
extends from the point LA.end to the point LB.start, with the length of a path being
the sum of the lengths of its links. Since the road network available is often too
extensive, the algorithm searches for the shortest path using a delimited area of
the road network. The function makePath for the shortest path algorithm is defined
below.
Function makePath(LA, LB, R, SPA)
Given the two input links LA and LB along with a road network R, the function first
selects the subset SR that contains the links in R that lie within or partially within
the minimum bounding box that includes two squares of area sa × sa centered on
the points LA.end and LB.start. The length sa (in meters) is a system parameter,
discussed in Section 6. Then, the function makePath returns the shortest path that
includes LA and LB as well as links that belong to SR. However, the function make-
Path returns NULL when there is no path in SR that reaches LB from LA. For the
evaluation, we compute the shortest path by using the Dijkstra algorithm [11].
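A minimal sketch of this makePath in Python follows; the link representation (a dict with 'start' and 'end' coordinate pairs and a length 'len') is an assumption for illustration, and Dijkstra's algorithm is run only over the delimited sub-network SR:

```python
import heapq

def make_path_spa(la, lb, links, sa):
    """Sketch of makePath with the shortest path algorithm (SPA).
    'links' is the road network R; 'sa' is the square side length
    (the paper uses meters)."""
    # Minimum bounding box of the two sa x sa squares centered on
    # la.end and lb.start.
    (ax, ay), (bx, by) = la['end'], lb['start']
    h = sa / 2.0
    xmin, xmax = min(ax, bx) - h, max(ax, bx) + h
    ymin, ymax = min(ay, by) - h, max(ay, by) + h
    inside = lambda p: xmin <= p[0] <= xmax and ymin <= p[1] <= ymax
    sr = [l for l in links if inside(l['start']) or inside(l['end'])]

    # Dijkstra over SR, from la.end to lb.start.
    dist, prev = {la['end']: 0.0}, {}
    queue = [(0.0, la['end'])]
    while queue:
        d, node = heapq.heappop(queue)
        if node == lb['start']:
            break
        if d > dist.get(node, float('inf')):
            continue
        for l in sr:
            if l['start'] == node:
                nd = d + l['len']
                if nd < dist.get(l['end'], float('inf')):
                    dist[l['end']] = nd
                    prev[l['end']] = l
                    heapq.heappush(queue, (nd, l['end']))

    if lb['start'] not in prev and la['end'] != lb['start']:
        return None                        # LB unreachable inside SR
    path, node = [], lb['start']
    while node != la['end']:               # walk predecessor links back to la.end
        l = prev[node]
        path.append(l)
        node = l['start']
    return [la] + path[::-1] + [lb]
```

The linear scan over sr for each node keeps the sketch short; a real implementation would index links by their start node.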

5 Encoding Method Using the Following Path Algorithm


This method uses the same Encode and Decode functions as the method based on
the shortest path (Subsection 4.1), because those functions are defined for a ge-
neric routing algorithm RTA. The difference arises in the function makePath,
which is defined in Subsection 5.2.

5.1 Following Link Algorithm


The following link of a link L is the link connected to L that a traveler would fol-
low by going straight ahead from L, considering that a link Lx is connected to a
link Ly when Lx.start = Ly.end. The function FLA defined below returns the fol-
lowing link of a link.
Function FLA(L, R)
Given a road network R and a link L, the function first selects the subset SR that
contains the links in R that lie within or partially within a square of area sb × sb
Encoding Travel Traces by Using Road Networks and Routing Algorithms 239

centered on the point L.end. The length sb (in meters) is a system parameter, dis-
cussed in Section 6. Then, the function FLA returns the following link of L that
belongs to SR. The function FLA returns NULL when no links in SR are con-
nected to L. The following link is computed by obeying the rules prescribed be-
low. Let us consider L1–LN the N links that are connected to L and belong to SR.
Let us consider αi the angle between the connected links Li and L, as shown in
Figure 3.
Rule 1. When N = 1, the following link is L1.
Rule 2. When N > 1 and the angle αi between L and Li is the smallest among
the angles αk (1 ≤ k ≤ N) between L and the connected links, the following link is
Li, as shown in Figure 3.
Rule 3. When Li.len < l0, we assume that the links connected to Li are directly
connected to L, as shown in Figure 4 (left). The very small length l0, for example 2
m, is a system parameter that determines when a link is too small. Very small
links indicate misalignments, as shown in Figure 4 (right), for which the algorithm
uses Rule 3 to select the link Ls as the following link of L.

Fig. 3 Angles between the link L and the links connected to L.


Fig. 4 Rule 3 of function FLA (left) and example of slightly misaligned intersection (right).
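Rules 1 and 2 can be sketched in Python as follows. The link representation (dicts with 'start'/'end' coordinate pairs) is an assumption; the angle αi is taken here as the deviation of Li's direction from L's direction, so the smallest value means going straight ahead, and the tiny-link Rule 3 is omitted for brevity:

```python
import math

def deviation(l, li):
    """Angle between the directions of connected links l and li
    (0 means continuing straight ahead)."""
    vx, vy = l['end'][0] - l['start'][0], l['end'][1] - l['start'][1]
    wx, wy = li['end'][0] - li['start'][0], li['end'][1] - li['start'][1]
    cos = (vx * wx + vy * wy) / (math.hypot(vx, vy) * math.hypot(wx, wy))
    return math.acos(max(-1.0, min(1.0, cos)))


def fla(l, links, sb):
    """Following-link sketch: among the links of R connected to l that
    lie inside the sb x sb square around l.end, pick the one with the
    smallest deviation angle (Rules 1 and 2)."""
    ex, ey = l['end']
    h = sb / 2.0
    near = lambda p: abs(p[0] - ex) <= h and abs(p[1] - ey) <= h
    cand = [li for li in links
            if li['start'] == l['end'] and (near(li['start']) or near(li['end']))]
    if not cand:
        return None
    return min(cand, key=lambda li: deviation(l, li))
```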

5.2 Making a Path with the Following Path Algorithm


The following path from a link L to a link LB is the path that a traveler would fol-
low by going straight ahead from L without turning at the intersections until
reaching the point LB.start. Note that the following path needs to start from a di-
rected link, whereas the shortest path starts from a point. The function makePath
generates the following path between two input links LA and LB by successively
applying the function FLA. When the destination link LB is not ahead of the fol-
lowing path generated thus far, as defined by the function isAhead below, the
process considers that the destination link LB is unreachable and returns NULL.

Function isAhead(p, L)
Given a point p and a link L, the function isAhead(p, L) returns TRUE and p is
considered ahead of L if and only if the angle between L and the vector formed by
p and the endpoint of L is greater than 90º, with the angle defined as shown in
Figure 5.

Fig. 5 Example of the angle between a link L and a point p.
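The 90° condition reduces to a sign test on a dot product, so isAhead needs no trigonometry. A sketch, assuming links are dicts with 'start' and 'end' coordinate pairs:

```python
def is_ahead(p, link):
    """Sketch of isAhead: the point p is ahead of the link iff the
    angle between the link's direction and the vector from p to the
    link's endpoint exceeds 90 degrees, i.e. their dot product is
    negative."""
    dx = link['end'][0] - link['start'][0]
    dy = link['end'][1] - link['start'][1]
    vx = link['end'][0] - p[0]
    vy = link['end'][1] - p[1]
    return dx * vx + dy * vy < 0
```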

The algorithm of the function makePath with the following path algorithm is presented below. Let us consider that the function P.append(x) adds the element x to
the end of the sequence P and that the function last(z) returns the last element of
the sequence z.

Function makePath(LA, LB, R, FPA)


Input: two links, LA and LB, and a road network R.
Output: a path P made using R and the FPA.
P = {}
P.append(LA)
While last(P).end != LB.start do
If ! isAhead(LB.start, last(P)) then
Return NULL
EndIf
followingLink = FLA(last(P), R)
If followingLink == NULL then
Return NULL
EndIf
P.append(followingLink)
EndWhile
P.append(LB)
EndFunction

6 Evaluation
We have developed and evaluated the functions Encode and Decode for the two me-
thods presented in Sections 4 and 5. The evaluation was made in terms of
the number of links needed to encode each path in a dataset of 30 paths with diverse

lengths. The 30 paths have a total length of 356 km, and they are composed of 4374
links. We obtained the 30 paths as a result of map matching the GPS traces of real
intra-city and intercity journeys made by car in Japan by a member of our labora-
tory. For the evaluation, we used the entire road network of Japan as stored in a da-
tabase in our laboratory.
We have evaluated the two encoding methods with the following parameters:
• Method 1, using the SPA, with sa = 2000 m.
• Method 2, using the FPA, with sb = 10 m and l0 = 5 m.
Initially, we evaluated Method 1 using different values for the parameter sa and
concluded that 1) the greater the parameter sa, the fewer the links returned by
the function Encode (i.e., the better the encoding), and 2) when the parameter sa
exceeds 1000 m, in most cases the result does not vary. The reason for these con-
clusions is that when the parameter sa is small, the delimited area is also small,
and so a shortest path may not be detected because part of it falls outside the de-
limited area.
Method 2 was not evaluated using several values for the parameters l0 and sb
because 1) changing the parameter l0 would cause the FLA function to return a
link that is not the following link, and 2) changing the parameter sb has no effect
on the result as long as sb is greater than l0 since FLA only needs the connected
links.
Figures 6 and 7 show the results of the evaluation of the two methods. The re-
sults show an extremely good performance by each method. The SPA performs
better for most of the paths, although the difference is always very small compared
with the number of links in the original path. The results show that using the SPA
a path is encoded using 5% of its links on average, while using the FPA a path is
encoded using 7% of its links on average.

Fig. 6 Comparison of the numbers of links in the encoded paths and the original paths.

Fig. 7 Percentage of the original links included in the encoding of each experiment.

Although the method based on the SPA is able to encode a path using fewer links than the method based on the FPA, the shortest path algorithm (SPA) might require much more time than the following path algorithm (FPA) to perform encoding and decoding. The FPA potentially requires less computation time than the SPA to make a path, because the FPA is an incremental algorithm and uses very small areas of the road network, while the SPA requires backtracking that consumes much computation time and may use very large areas of the road network.
For applications where processing time is critical, the computation time of the encoding based on the SPA can be improved by finding the shortest path using more complex solutions based on preprocessing the road network [12, 13].
Other solutions are based on the generation of a path view from the road network
[14,15]. Basically, a path view contains the pre-computed shortest paths between
each pair of nodes in the road network.

7 Conclusions
This paper presented two novel methods to encode a path, a sequence of links in
a road network, by using a routing algorithm so that the path can be stored and
shared using very few links. One method uses the shortest path algorithm (SPA) as
the routing algorithm and the other uses the following path algorithm (FPA) as
the routing algorithm. The evaluation in this paper has shown that these two meth-
ods can drastically reduce the number of links in a path when encoded. The results
of the method that uses the SPA are slightly better than those of the method that
uses the FPA but may take longer to compute. The results confirm that vehicle
routes are usually composed of several shortest paths or following paths.
As future work, a quantitative comparison of the two methods in terms of computation time remains, which is necessary to discuss the trade-off between the two methods.

Acknowledgments. We would like to thank Yahoo Japan Corp. for support in the devel-
opment of the prototype system. This work was also supported by JSPS KAKENHI
20509003 and 23500084.

References
1. Xue, G., Li, Z., Zhu, H., Liu, Y.: Traffic-known urban vehicular route prediction based
on partial mobility patterns. In: Proc. ICPADS, pp. 369–375 (2009)
2. Yuan, J., Zheng, Y., Zhang, C., Xie, W., Xie, X., Sun, G., Huang, Y.: T-drive: driving
directions based on taxi trajectories. In: Proc. GIS, pp. 99–108 (2010)
3. McMaster, R.B.: Automated line generalization. Cartographica 24(2), 74–111 (1987)
4. Leu, J.G., Chen, L.: Polygonal approximation of 2-D shapes through boundary merg-
ing. Pattern Recognition Letters 7(4), 231–238 (1988)
5. Brakatsoulas, S., Pfoser, D., Salas, R., Wenk, C.: On map-matching vehicle tracking
data. In: Proc. 31st Int’l Conf. on Very Large Data Bases (VLDB 2005), pp. 853–864
(2005)
6. Yuan, J., Zheng, Y., Zhang, C., Xie, X., Sun, G.-Z.: An interactive-voting based map
matching algorithm. In: Proc. 11th Int’l Conf. on Mobile Data Management (MDM),
pp. 43–52 (2010)
7. Cao, H., Wolfson, O., Trajcevski, G.: Spatio-temporal data reduction with determinis-
tic error bounds. VLDB Journal 15(3), 211–228 (2006)
8. Hönle, N., Grossmann, M., Reimann, S., Mitschang, B.: Usability analysis of compres-
sion algorithms for position data streams. In: Proc. 18th ACM SIGSPATIAL Int’l
Conf. on Advances in Geographic Information Systems, pp. 240–249 (2010)
9. Takahashi, N.: An elastic map system with cognitive map-based operations. In: Inter-
national Perspectives on Maps and Internet. Lecture Notes in Geoinformation and Car-
tography, pp. 73–87 (2008)
10. Yamamoto, D., Ozeki, S., Takahashi, N.: Focus+Glue+Context: an improved fisheye
approach for web map services. In: Proc. 17th ACM SIGSPATIAL Int’l Conf. on Ad-
vances in Geographic Information Systems, pp. 101–110 (2009)
11. Dijkstra, E.W.: A note on two problems in connection with graph theory. Numerische
Mathematik 1, 269–271 (1959)
12. Idwan, S., Etaiwi, W.: Dijkstra algorithm heuristic approach for large graph. Journal of
Applied Sciences 11, 2255–2259 (2011)
13. Cho, H.-J., Lan, C.-L.: Hybrid shortest path algorithm for vehicle navigation. Journal
of Supercomputing 49(2), 234–247 (2009)
14. Huang, Y.-W., Jing, N., Rundensteiner, E.A.: A semi-materialized view approach for
route maintenance in IVHS. In: Proc. 2nd ACM Workshop on Geographic Information
Systems, pp. 144–151 (1994)
15. Huang, Y.-W., Jing, N., Rundensteiner, E.A.: A hierarchical path view model for path
finding in intelligent transportation systems. GeoInformatica 1(2), 125–159 (1997)
Estimation of Dialogue Moods
Using the Utterance Intervals Features

Kaoru Toyoda, Yoshihiro Miyakoshi, Ryosuke Yamanishi, and Shohei Kato

Abstract. Many recent studies have focused on dialogue communication. In this paper, our target is to make a robot support communication between humans. To support such communication, we believe that two functions are important: estimating dialogue moods and behaving suitably. In this paper, we propose a dialogue mood estimation model using utterance intervals. The proposed estimation model is constructed by relating subjective evaluations for several adjectives with utterance intervals features. Through estimation experiments, we confirmed that the proposed system can estimate dialogue moods with a high degree of accuracy, especially for "excitement," "seriousness," and "closeness," and we found that the utterance intervals features have high potential for dialogue mood estimation.

1 Introduction
Recently, many studies have focused on communication robots [6, 8, 11]. These studies aim to develop a robot that communicates with humans and provides people with a feeling of fullness and happiness. However, it is difficult to develop a robot that plays such a role in place of a human. While human-robot interaction attracts attention, robots that support human-human interaction can also be useful for more attractive and affective communication.
We usually communicate in groups of people, and Fig. 1 shows the kind of communication we suppose in this paper. In such communication, people can communicate with each other fluently and intimately through two important functions: estimating dialogue moods and selecting suitable behavior considering the dialogue moods.
Kaoru Toyoda · Yoshihiro Miyakoshi · Ryosuke Yamanishi · Shohei Kato
Dept. of Computer Science and Engineering, Graduate School of Engineering,
Nagoya Institute of Technology,
Gokiso-cho Showa-ku Nagoya 466-8555 Japan
e-mail: {toyoda,miyakosi,ryama,shohey}@juno.ics.nitech.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 245–254.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
246 K. Toyoda et al.

Fig. 1 The group communication we supposed in this paper

Our target is to make a robot support communication between humans, considering the personality of the speakers. In this paper, as the first step, we propose a dialogue mood estimation model using utterance intervals features, which represent who speaks and for how long in a dialogue. We also reveal the relationships between dialogue moods and the utterance intervals features, and discuss the effectiveness of the proposed dialogue mood estimation model.

2 Related Studies
There are several studies about communication between humans. Wrede [13] studied the relationships between hot spots in meetings and the bibliographic tags labeled by humans. However, the tags cannot be labeled automatically and dynamically, so it is not appropriate to use them in real-time communication. Gatica-Perez [5] and Ito [7] studied the relationships between human motion and meeting moods using motion features obtained through motion recognition. Because extracting the motion features requires a high calculation cost and a lot of motion-capture equipment, it is also not appropriate to use them in real-world communication. Mori's study [9] estimates dialogue moods using the speakers' facial expression features, but the system expects the users to always talk face-to-face with a camera, which makes it difficult to use in real-world communication.
In this paper, we believe that the intervals of speakers' states have a beneficial effect on estimating dialogue moods, and we propose a dialogue mood estimation system using the utterance intervals features. The utterance intervals features need only the information about "who speaks how long," so they are much cheaper to extract than bibliographic data, motion features, or facial expression features.
Moreover, this study differs from the existing studies above in that it intends to support communication between humans. We believe that not just knowledge about communication but also support for communication is important and significant in the field of human-computer interaction.
Estimation of Dialogue Moods Using the Utterance Intervals Features 247

Fig. 2 Overview of the proposed dialogue mood estimation system

3 Dialogue Mood Estimation System


Fig. 2 shows an overview of the proposed system. The proposed system estimates dialogue moods using utterance intervals features. First, the utterance intervals features are extracted from the voices in a dialogue. The features are then input to an estimation model that has learned the relationships between each dialogue mood and the utterance intervals features in advance, and the estimated dialogue moods are output.
The estimation model is constructed by relating the subjective evaluations for several adjectives with the utterance intervals features. The utterance intervals features are detailed in Section 4, and relating the subjective evaluations with these features is detailed in Section 5.

4 The Utterance Intervals Features


We believe that the intervals of speakers' states have a beneficial effect on estimating dialogue moods, and we have focused on the intervals of speakers' states: for example, a solitary utterance state in which one human speaks alone, a simultaneous utterance state in which two humans speak at once, and a silent state that occurs when the speaker changes.


Fig. 3 An example of the representation of a dialogue. In this case (dialogue index = d), A’s
utterance (st= 1) is 3 seconds, 1 second, and 1 second. And the utterance intervals of st = 1
is shown as the multiset S1d = {3, 1, 1}.

4.1 Speakers’ States


In this study, we defined the speakers' states in dialogues as below.
st1: A's solitary utterance state
st2: B's solitary utterance state
st3: simultaneous utterance state
st4: silent state
In general dialogues, there are a leading speaker and following speakers, and they take turns speaking. It can be thought that a leading speaker has longer utterance intervals than the following speakers in a dialogue. Thus, we define the speaker with the longer utterance intervals as the leading speaker A, and the other speaker as the following speaker B.
In the first step of extracting the utterance intervals features, a dialogue is represented with the four speakers' states; each interval of a speaker's state st is calculated, and the utterance intervals are collected in the multiset Sst. Fig. 3 shows an example of the representation of a dialogue.
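Under the assumption that each speaker's utterances are given as (start, end) time intervals, the conversion of a dialogue into the four states and their interval multisets can be sketched as:

```python
def state_multisets(a_intervals, b_intervals, total):
    """Sketch: represent a dialogue by the four speakers' states and
    collect the interval-length multisets S1..S4 (st1: A alone,
    st2: B alone, st3: simultaneous, st4: silent).  a_intervals and
    b_intervals are lists of (start, end) utterance times in seconds;
    total is the dialogue length."""
    points = sorted({0.0, total}
                    | {t for iv in a_intervals + b_intervals for t in iv})
    inside = lambda t, ivs: any(s <= t < e for s, e in ivs)
    runs = []                              # merged (state, duration) runs
    for s, e in zip(points, points[1:]):
        mid = (s + e) / 2.0                # the state is constant within a segment
        a, b = inside(mid, a_intervals), inside(mid, b_intervals)
        st = 3 if a and b else 1 if a else 2 if b else 4
        if runs and runs[-1][0] == st:
            runs[-1][1] += e - s           # extend the current run
        else:
            runs.append([st, e - s])
    multisets = {st: [] for st in (1, 2, 3, 4)}
    for st, dur in runs:
        multisets[st].append(dur)
    return multisets
```

For example, if A speaks during [0, 3] and [4, 5] and B during [2, 4] in a 6-second dialogue, the multisets are S1 = {2, 1}, S2 = {1}, S3 = {1}, S4 = {1}.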

4.2 Calculating the Utterance Intervals Features


Table 1 shows the utterance intervals features and examples of their calculation. Here, the function stat(Sstd) denotes one of the following six functions, and each function is used in the utterance intervals features in index order. For example, utterance intervals feature 7 is mean(S2d), and utterance intervals feature 12 is occupy(S2d).
Function 1: mean(Sstd ): The mean of Sstd .
Function 2: var(Sstd ): The variance of Sstd .
Function 3: min(Sstd ): The minimum of Sstd .
Function 4: max(Sstd ): The maximum of Sstd .
Function 5: count(Sstd ): The element count of Sstd .
Function 6: occupy(Sstd ): The occupancy of speakers’ states (st) in dialogue (d).
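A sketch of the six statistics applied to a state's interval multiset follows; the dialogue length is needed for the occupancy:

```python
def interval_stats(s, total):
    """Sketch of the six functions applied to a state's interval
    multiset S: mean, variance, minimum, maximum, element count and
    occupancy (total time in the state divided by the dialogue
    length 'total')."""
    if not s:
        return [0.0] * 6                   # state absent from the dialogue
    n = len(s)
    mean = sum(s) / n
    var = sum((x - mean) ** 2 for x in s) / n
    return [mean, var, min(s), max(s), float(n), sum(s) / total]
```

With the multiset S1d = {3, 1, 1} of Fig. 3 and a 10-second dialogue, this yields mean 5/3, variance 8/9, minimum 1, maximum 3, count 3, and occupancy 0.5.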

Table 1 The Utterance Intervals Features

Index   Features                                                        Example
1-6     Statistics about utterances A                                   stat(S1d)
7-12    Statistics about utterances B                                   stat(S2d)
13-18   Statistics about simultaneous utterance                         stat(S3d)
19-24   Statistics about silent state                                   stat(S4d)
25-48   Comparison of utterances A and utterances B                     (stat(S2d)+stat(S3d)) / (stat(S1d)+stat(S3d))
49-54   Comparison of utterance states and silent states                stat(S4d) / Σi=1..3 stat(Sid)
55-66   Comparison of simultaneous utterance and solitary               stat(S3d) / stat(S1d)
        utterance of each speaker
67-78   Comparison of summation utterance and simultaneous              stat(S3d) / (stat(S1d)+stat(S3d))
        utterance of each speaker
79-90   Comparison of summation utterance and solitary                  stat(S1d) / (stat(S1d)+stat(S3d))
        utterance of each speaker

The features indexed 25-90 in Table 1 are prepared for comparison of speakers' states (e.g., comparison of A's solitary utterances and B's solitary utterances), because we believe, based on our heuristics, that these features have high potential for estimating dialogue moods.

5 The Estimation Model


5.1 Affective Evaluation of Dialogue
We constructed the estimation model by relating the subjective evaluations for several adjectives with the utterance intervals features. The subjective evaluations of dialogue moods were obtained through subjective evaluation experiments. In the experiments, we prepared two hundred dialogues included in a corpus [4]. Fifteen Japanese males and females in their twenties participated in the experiments; they were asked to listen to each dialogue and evaluate each adjective pair listed below (the words in parentheses are the original Japanese used in the experiments).

Excitement (Moriagari) : Excite - Not Excite


Seriousness (Majimesa) : Serious - Not Serious
Smoothness (Kamiai) : Smooth - Not Smooth
Brightness (Akarusa) : Bright - Not Bright
Closeness (Shitashisa) : Close - Not Close
Equivalent (Taitousa) : Equal - Not Equal
The majority values were used as the affective evaluations of each dialogue, which serve as the dialogue mood labels.

5.2 Composition of the Estimation Model


The utterance intervals features described in Section 4 were extracted from each dialogue prepared in the subjective evaluation experiments. Features that contribute to estimating each adjective were selected from all utterance intervals features using the feature selection described in Section 5.3. The estimation model was constructed by relating the selected utterance intervals features with the affective evaluations of each dialogue using Tree-Augmented Naive Bayes (TAN). TAN is a Bayesian method that constructs causal relations among features using learning data and estimates the target class under a probabilistic model. It has been reported that TAN shows high estimation performance on problems in which the features have causal relations among themselves, for example, estimation of affect from facial features [3]. Therefore, we used TAN to construct the estimation model using the utterance intervals features, which have causal relations among themselves.

5.3 Features Selection Using Genetic Algorithm (GA)


In multivariate analysis, it is known that a critical problem can occur: learning effectiveness is reduced because the feature space expands as the number of features increases. Moreover, we believe that the effective features differ for each dialogue mood. Thus, for each dialogue mood, we selected the contributing utterance intervals features using a GA [10], which can rapidly obtain a practical solution from a large number of features. We performed feature selection using a GA with the parameters listed in Table 2. We used elite selection, uniform crossover, and mutation as the genetic operations, and AIC (Akaike's Information Criterion) [1] as the fitness.
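Under the stated operators, the selection loop can be sketched as follows. The fitness callback is assumed to score a binary feature mask, lower being better (e.g., the AIC of a TAN model trained on the selected features); parameter defaults follow Table 2:

```python
import random

def ga_select(n_features, fitness, pop=150, steps=1000, elite=5, p_mut=0.001):
    """Sketch of GA-based feature selection: individuals are binary
    masks over the utterance intervals features; elite selection,
    uniform crossover, and bit-flip mutation."""
    random.seed(0)                         # reproducible runs for the sketch
    popn = [[random.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop)]
    for _ in range(steps):
        popn.sort(key=fitness)             # lower fitness (e.g., AIC) is better
        elites = popn[:elite]
        children = []
        while len(children) < pop - elite:
            pa, pb = random.sample(elites, 2)
            child = [a if random.random() < 0.5 else b   # uniform crossover
                     for a, b in zip(pa, pb)]
            child = [1 - g if random.random() < p_mut else g  # mutation
                     for g in child]
            children.append(child)
        popn = elites + children           # elites survive unchanged
    return min(popn, key=fitness)
```

With a toy fitness such as the Hamming distance to a target mask, the loop converges to that mask.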

5.4 Results of Composing “Excitement” Estimation Model


Table 3 shows the selected utterance intervals features for estimating each dialogue mood. Since we believe that "excitement" is one of the most important adjectives

Table 2 GA parameters

Number of Individuals 150


Number of Steps 1000
Number of Selecting Individuals 5
Rate of Mutation 0.1%

Table 3 The utterance intervals features selected by GA

Adjectives Features selected by GA


Excitement 18, 44, 79
Seriousness 7, 18
Smoothness 25, 67
Brightness 15, 18, 28, 53
Closeness 34, 71, 84
Equivalent 4, 35

for a robot to select its behavior for supporting humans' communication, we describe the excitement estimation model in detail in this paper.
Focusing on the excitement estimation model, the features indexed (18), (44), and (79) were selected as the contributing features. Fig. 4 and Fig. 5 show the normal distribution plots of the features indexed (18) and (44), whose degrees of separation were relatively high; intuitively, these features also seem to contribute to the excitement estimation model.
In the normal distribution plots, the light and dark lines indicate the classes "excite" and "not excite," respectively. A dialogue whose light-colored feature value in the plot is higher than the dark one is estimated as the "excite" mood.
From Fig. 4, it was confirmed that a high occupancy of simultaneous utterance led to the excite mood. This result suggests that a dialogue containing a lot of simultaneous utterance (e.g., a speaker overlaps another speaker's utterance before the latter finishes speaking, or gives a lot of back-channel feedback) creates the excite mood.
Focusing on Fig. 5, it was confirmed that the dialogue mood was estimated as the excite mood when the variance of B's utterances was relatively higher than the variance of A's utterances. This result suggests that a dialogue in which the following speaker B replies with long utterances involving his/her own beliefs creates the excite mood, because a high variance of B's utterances means that B had not only short utterances but also long ones.


Fig. 4 The statistics about simultaneous utterance; (18) occupy(S3d) (normalized)

Fig. 5 The comparison of utterances A and utterances B; (44) (var(S2d)+var(S3d)) / (var(S1d)+var(S3d)) (normalized)

6 Experiment of Dialogue Mood Estimation


We described the estimation model in Section 5; here we verify the effectiveness of the proposed dialogue mood estimation model by conducting 5-fold cross validation.
Table 4 shows the experimental results: the positive accuracy rate, the negative accuracy rate, and the whole accuracy rate. The positive and negative accuracy

Table 4 The accuracy of estimating

Moods  Positive Accuracy (%)  Negative Accuracy (%)  Whole Accuracy (%)


Excitement 71.4 90.7 86.0
Seriousness 87.7 79.8 84.0
Smoothness 88.0 40.0 76.0
Brightness 48.4 93.4 79.0
Closeness 65.8 87.9 79.5
Equivalent 76.9 61.4 73.5

rates mean the accuracy in estimating the dialogues labeled positive and negative, respectively. The accuracy rate over all dialogues is called the whole accuracy rate. Focusing on excitement, seriousness, and closeness, we confirmed that the estimation models achieved over 80% whole accuracy and more than 70% positive and negative accuracy, so these estimation models appear to have high potential for dialogue mood estimation. Moreover, this suggests that the proposed utterance intervals features are effective for estimating "excitement," "seriousness," and "closeness."
However, for the "smoothness" and "brightness" estimation models, either the positive or the negative accuracy rate was below 50% while the whole accuracy rate was more than 70%. This suggests that it is difficult to estimate the smoothness and brightness moods using only the utterance intervals features.
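The three rates of Table 4 can be computed from the labels and predictions as follows (a sketch; labels are booleans, positive meaning the mood adjective applies):

```python
def accuracy_rates(y_true, y_pred):
    """Positive, negative, and whole accuracy rates as used in
    Table 4: the share of positively (resp. negatively) labeled
    dialogues estimated correctly, and the overall share."""
    pos = [(t, p) for t, p in zip(y_true, y_pred) if t]
    neg = [(t, p) for t, p in zip(y_true, y_pred) if not t]
    pos_acc = sum(t == p for t, p in pos) / len(pos)
    neg_acc = sum(t == p for t, p in neg) / len(neg)
    whole = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return pos_acc, neg_acc, whole
```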

7 Conclusion
In this paper, we proposed a dialogue mood estimation model using the utterance intervals features, focusing on the intervals of speakers' states. Through the estimation experiments, we confirmed that the proposed system can estimate dialogue moods with a high degree of accuracy, especially for excitement, seriousness, and closeness. We also found that the utterance intervals features have high potential for dialogue mood estimation. By estimating dialogue moods, the personality of the speakers can be taken into consideration in human communication support, so more affective/emotional communication is expected to be realized with the proposed system.
In the future, we will study the effectiveness of robot behavior depending on the dialogue moods for human communication support, and propose a behavior selection method depending on the dialogue moods. As one such behavior, we believe that playing suitable background music (BGM) depending on the mood is effective, because in the field of psychology it has been suggested that the affect of music influences humans' minds and bodies [2]. Therefore, as a human-human interaction support system, we will develop an automated suitable BGM

selection system for the dialogue mood using our previously proposed song selection system with affective requests [12].

Acknowledgment. This work was supported in part by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research under grant #20700199, and HORI SCIENCE AND ART FOUNDATION.

References
1. Akaike, H.: Information theory and an extension of the maximum likelihood principle.
In: 2nd Inter. Symp. on Information Theory, vol. 1, pp. 267–281 (1973)
2. Bruner, G.: Music, mood, and marketing. Journal of Marketing 54(4), 94–104 (1990)
3. Cohen, I., Sebe, N., Chen, L., Garg, A., Huang, T.S.: Facial expression recognition from
video sequences: Temporal and static modelling. In: Computer Vision and Image Under-
standing, pp. 160–187 (2003)
4. Consortium, N.T.S.R.: Priority area spoken dialogue spoken dialogue corpus (pasd)
(1993-1996)
5. Gatica-perez, D., Mccowan, I., Zhang, D., Bengio, S.: Detecting group interest level
in meetings. In: Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing
(ICASSP), pp. 489–492 (2005)
6. Hayashi, T., Kato, S., Itoh, H.: A Synchronous Model of Mental Rhythm Using Paralan-
guage for Communication Robots. In: Yang, J.-J., Yokoo, M., Ito, T., Jin, Z., Scerri, P.
(eds.) PRIMA 2009. LNCS, vol. 5925, pp. 376–388. Springer, Heidelberg (2009)
7. Ito, H., Shigeno, S., Nishimoto, T., Araki, M., Nimi, Y.: The analysis of the atmosphere
in the dialogues. IPSJ SIG Technical Report, pp. 103–108 (2011) (in Japanese)
8. Itoh, C., Kato, S., Itoh, H.: Mood-transition-based emotion generation model for the
robot’s personality. In: Proceedings of the 2009 IEEE International Conference on Sys-
tems, Man, and Cybernetics, SMC 2009, San Antonio, TX, USA, pp. 2957–2962 (2009)
9. Mori, H., Miyawaki, K., Nishiguchi, S., Sano, M., Yamashita, N.: An affections model
of group activities for estimation of individual’s affection. IEICE Technical Report, pp.
519–523 (2010)
10. Siedlecki, W., Sklansky, J.: A note on genetic algorithms for large-scale feature selection.
Pattern Recogn. Lett. 10, 335–347 (1989)
11. Takasugi, S., Yoshida, S., Okitsu, K., Yokoyama, M., Yamamoto, T., Miyake, Y.: Influ-
ence of pause duration and nod response timing in dialogue between human and com-
munication robot. In: Transactions of the Society of Instrument and Control Engineers,
pp. 72–81 (2010) (in Japanese)
12. Toyoda, K., Yamanishi, R., Kato, S.: Song selection system with affective requests.
In: 12th International Symposium on Advanced Intelligent Systems, Suwon, Korea, pp.
462–465 (2011)
13. Wrede, B., Shriberg, E.: The relationship between dialogue acts and hot spots in meet-
ings. In: Proc. IEEE Automatic Speech Recognition and Understanding Workshop,
ASRU, Virgin Islands (2003)
Extraction of Vocational Aptitude from
Operation Logs in Virtual Space

Kyohei Nishide, Tateaki Komaki, Fumiko Harada, and Hiromitsu Shimakawa

Abstract. Although vocational aptitude tests have been conducted, they sometimes judge that people do not have aptitude for their desired jobs because of weak points. If people know their weak points at an early age, such as elementary school age, they may be able to overcome them, which would extend their range of job choices. However, elementary school children cannot take most of these tests because of their lack of knowledge. In this paper, we propose a method to quantitatively evaluate the vocational aptitudes of elementary school children. In this method, we extract aptitude values by applying evaluation expressions to operation logs, where a specific evaluation expression is provided for every aptitude. We implemented an aptitude test system assuming a home-delivery service job experience as an example and applied the system to 15 people. We evaluated it with the correlation coefficient between the extracted values and the results of CPS-J, which is one of the widely used vocational aptitude tests. From the results, we found that some aptitudes should be calculated in an early set of repeated job experiences while others can be derived from any set.

1 Introduction
Vocational aptitude tests are widely used [1][2][3]. In these tests, the interest, personality, and ability a person has for jobs are checked through many test items. The results of the tests are compared with general tendencies of vocational aptitude to derive the jobs with which the person has a high concordance rate. These tests aim to make examinees
Kyohei Nishide · Tateaki Komaki
Graduate School of Science and Engineering, Ritsumeikan University, Nojihigashi 1-1-1,
Kusatsu, Shiga
e-mail: mario@de.is.ritsumei.ac.jp,tateaki76@de.is.ritsumei.ac.jp
Fumiko Harada · Hiromitsu Shimakawa
College of Information Science and Engineering, Ritsumeikan University, Nojihigashi 1-1-1,
Kusatsu, Shiga
e-mail: harada@cs.ritsumei.ac.jp,simakawa@cs.ritsumei.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 255–267.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
256 K. Nishide et al.

understand their concordance with jobs, which leads them to choose their preferred job. Many vocational aptitude tests consist of questionnaires and written tests, and they target persons older than high school students and university students. If the result of a test shows that a person does not match the aptitude type of the desired job, he may give up that job because he does not have time to acquire the required aptitude. If people knew their weak points at an early age, such as in elementary school, they might be able to overcome them, which would extend their range of job choices. Therefore, it is important to judge vocational aptitude at an early age. However, because these tests are based on questionnaires and written tests, it is too difficult for elementary school children to give correct answers due to their lack of knowledge. In addition, these tests have the strong flavor of an examination; taking them is not pleasant for children.
In this paper, we propose a method to evaluate the vocational aptitude of elementary school children quantitatively. We provide a game-flavored tool with which children can enjoy experiencing a job in a virtual space. We focus on the operation logs of the tool, and extract the children's vocational aptitude by applying evaluation expressions to those logs.

2 Vocational Aptitudes
2.1 Vocational Aptitudes with CPS-J
CPS-J (Career Planning Survey - Japanese Version) [3] is a vocational aptitude test widely used in Japan. It was developed based on the theory of Holland [4]. The test targets persons of university age and older, and evaluates interest and aptitude for a wide variety of jobs, from primary industry to tertiary industry. In the interest test, examinees answer 150 questions that ask their preference for specific activities, choosing from three mutually exclusive options: "like it", "dislike it", and "neither". In the aptitude test, they answer fifteen questions on specific actions, rating whether they are good or poor at each on a five-grade scale. CPS-J judges aptitude along the six axes proposed in the theory of Holland: Realistic, Investigative, Artistic, Social, Enterprising, and Conventional. Each question is relevant to more than one axis, so a person's aptitude can be represented along the six axes from the answers to the fifteen questions.

2.2 Related Work


The study in [5] proposed a method to extract a student's current abilities. In [5], the academic results of graduates are used to set an evaluation value for each type of job. Students compare their own results with the evaluation values, so that they can grasp their current abilities visually and train the abilities required for the job they desire. In this method, the evaluation values are based on the academic results of the graduates who went into the desired job. Therefore, students do not always have the aptitude for the job
Fig. 1 Outline of proposed method

even if their results exceed all the required evaluation values. Our proposed method instead uses CPS-J [3] to classify jobs on the basis of required abilities, and can therefore extract aptitudes for each job.
The methods proposed in [6] and [7] provide chances to experience the work of a job in a virtual space. In [6], users are trained in the work through both multimedia and haptic technology; since general behavior is reflected in the 3D objects, they can learn efficiently. With the method of [7], users can learn medical field work efficiently. However, neither method reveals the user's aptitude for the job.

3 Extracting Vocational Aptitude with Virtual Space

3.1 Method Overview


In this paper, we propose a method to extract vocational aptitude by analyzing the operation logs of a job experience in a virtual space. In this method, we extract aptitude values by applying evaluation expressions to the operation logs. We compare each aptitude value with those of children who experienced the same job in the past, evaluate how superior the child is relatively, and visualize the result for the child.
Figure 1 shows the flow of the proposed method. First, an elementary school child logs in to the virtual space, which offers many kinds of jobs. Each job consists of several works specific to it. For example, the works of a bookstore clerk include accounting with a register, checking the stock, and serving customers. The system records the operation logs while the child deals with these works. After the child finishes all the works, the system calculates the aptitude values by applying evaluation expressions to the operation logs. Since the needed ability differs for every aptitude, the system calculates every aptitude value with a different evaluation expression.

3.2 Operation Log


In this method, we focus on the operation logs. People do well what they are good at: a person good at calculation can calculate precisely and quickly, and a person with deft hands can accomplish detailed work. On the contrary, people cannot do well what they are poor at. This feature is reflected in the same way in operations in a virtual space. Each job consists of some works. If a person is good at a specific work, he can plan the actions which compose the work. Since he can predict what he should do next, he can smoothly accomplish the sequence of actions for the work, even when it is implemented in a virtual space. A person good at a specific work is thus expected to show differences in every operation, such as touching an object or moving the avatar. It also affects not only the time necessary for each operation, but also the interval between operations. The system records the operation logs, where each record is a sequence of three values: the touched object, the avatar position, and the timestamp at which the operation was taken.
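A minimal sketch of such a log record follows; the class and field names are our own illustration, not taken from the system:

```python
from dataclasses import dataclass

# Hypothetical structure of one operation log record (field names are ours).
@dataclass
class OperationLog:
    touched_object: str      # e.g. "package_07", "door_of_house_123"
    avatar_position: tuple   # (x, y) position of the avatar in the space
    timestamp: float         # seconds since the job experience started

# A short example sequence recorded during one work
logs = [
    OperationLog("package_07", (0.0, 0.0), 1.2),
    OperationLog("door_of_house_123", (14.5, 3.0), 9.8),
]
# Both the duration of an operation and the interval between
# operations can be derived from consecutive timestamps.
interval = logs[1].timestamp - logs[0].timestamp
```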

3.3 Calculation of Aptitude Value


This section explains the evaluation expressions on operation logs. Since the six axes of the theory of Holland are abstract, we use the fifteen aptitude questions of CPS-J that lead to them. We focus on the five aptitudes shown in Table 1.
The system aptitude requires the ability to work systematically, recording and organizing the details of duties. Action cases include working on a schedule and processing tasks according to priorities. A person of high system aptitude can decide the correct priorities of tasks using given information. We therefore expect that we can evaluate the system aptitude from the ability to assign priorities to tasks in the virtual space. A person who does this well assigns the correct priorities in a short time. A person who is weak at it decides wrong priorities, or takes a long time for the decision. A person who does it carelessly decides wrong priorities even though his thinking time is short. This discussion leads to the correct rate of the priorities divided by the thinking time as the index of the system aptitude value.
The machine operation aptitude requires the ability to understand the principle of a machine and to operate it. Technophobes have many difficulties in machine operations [8]; they take a long time to use various functions because it is difficult for them to grasp the overall functions. Conversely, persons familiar with machine operations come to use machines correctly in a short time. Therefore, we expect that we can evaluate the machine operation aptitude by a task requiring the user to understand how to operate the avatar correctly. In this method, we focus on the avatar movement. In the job experience, avatars are assumed to run whenever they move. The person experiencing a job needs to understand how to make the avatar run, and must operate the avatar correctly; when he does not, his avatar may walk or stop. A person good at the movement operation needs less time than others to move the avatar a specific distance, while a person operating carelessly takes a long time because his avatar tends to hit the walls. Therefore, the moving distance divided by the movement time is used as the index of the machine operation aptitude.

Table 1 Abilities and actions required for each aptitude

Aptitude: System
  Ability: Recording and organizing the details of duties and working systematically
  Actions: Working on a schedule; processing tasks according to priorities

Aptitude: Machine Operation
  Ability: Understanding the principle of the machine and operating the machine
  Actions: Understanding the mechanism of machines and equipping oneself with them

Aptitude: Scientific Laws
  Ability: Understanding and leveraging scientific laws
  Actions: Understanding the basic fundamentals of science; checking programs and magazines on health information and the latest science and technology; applying scientific knowledge to daily life

Aptitude: Japanese Ability
  Ability: Understanding and using Japanese grammar and reading correctly
  Actions: Writing and speaking Japanese correctly; framing and describing ideas with appropriate words

Aptitude: Space Recognition
  Ability: Grasping the spatial structure of a house or a machine from a design
  Actions: Explaining the photographic image from the blueprint

The aptitude for scientific laws requires the ability to leverage scientific laws after understanding them. Action cases include understanding the laws and formulas of mathematics. Applying arithmetic to their surroundings is an advanced ability for elementary school children [9]. We expect that we can calculate this aptitude with a task requiring arithmetic knowledge in a job experience. An elementary school child operates in the virtual space according to his own thinking. A person of high science laws aptitude performs the ideal operation because he thinks well, while a person of low science laws aptitude operates aimlessly. The rate of the child's actual operation against the ideal operation is adopted as the index of the science laws aptitude value.
The Japanese ability aptitude requires the ability to understand conversation and writing in Japanese. One examination that measures Japanese ability is ACTFL-OPI (ACTFL Oral Proficiency Interview) [10], which evaluates the ability to have logical conversations. A person of high Japanese ability understands what the other person says and responds correctly. We can therefore expect to calculate this aptitude from the degree to which the person logically copes with questions from other avatars. Persons good at coping with questions understand what they are told in a short time, and give correct responses quickly. Persons poor at it take a long time to understand, and their responses are often wrong. We estimate that a careless person takes a short time but responds wrongly. Therefore, the number of correct answers divided by the total response time is the index of the Japanese ability aptitude value. If we permitted the user of the job experience tool to type arbitrary messages, the response time would depend on the typing speed, and the input message would vary
with his expressive ability of Japanese. To avoid these dependencies, we provide options so that the user can choose his response from them. In this way, we can judge whether the user can give a response suitable for the context, without depending on his expressive ability of Japanese, even if he is an elementary school child.
The space recognition aptitude requires the ability to grasp the spatial structure of a house or a machine from a design. A person of high space recognition ability correctly grasps the current position in the virtual space from two-dimensional information. There is a test that measures space recognition ability using a virtual space [11]: examinees walk freely through a building in the virtual space and, imagining the spatial structure, sketch the floor plan of the building. We therefore expect that we can calculate this aptitude with a task requiring the user to recognize a spatial structure. A person who can grasp the spatial structure knows his current position on a map immediately. A poor person easily loses the correspondence between a position on the map and that in the virtual space. Since he loses his current position, he rotates his avatar to learn the spatial structure around him, and as a result cannot move the avatar. Therefore, the system calculates the stop time while the avatar should be moving, and this time is the index of the space recognition value.

4 Implementation of Home-Delivery Service Job Experience


4.1 Experience Scenario
Applying the proposed method, we have implemented a tool for the home-delivery service job experience, shown in Fig. 2, in a three-dimensional virtual space. The virtual space has 318 houses, each with an address, and 15 houses are chosen as delivery destinations. A user operates an avatar imitating a deliveryman; the avatar can move only on the roads among the houses. The home-delivery service job comprises two works: route definition and delivery.
The route definition is the work of defining the delivery order of the 15 packages. Every package has an ID number, and the user learns the delivery destination of a package by clicking it. The user may write the delivery destinations on a paper map which illustrates the arrangement of the buildings in the virtual space. After the user decides what he believes is the shortest delivery order, he registers it. If the registered order is not the shortest, the system presents the shortest order automatically. The user is required to deliver the packages according to the shortest order.
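The check of whether a registered order is the shortest can be sketched by exhaustive comparison on a small instance; this is our illustration, as the paper does not state which algorithm the system uses:

```python
from itertools import permutations

def route_length(order, dist):
    """Total distance when visiting destinations in the given order,
    starting from node 0 (the start point)."""
    stops = [0] + list(order)
    return sum(dist[a][b] for a, b in zip(stops, stops[1:]))

def shortest_order(destinations, dist):
    """Brute-force search over all delivery orders. Only feasible for small
    instances; with 15 packages a more efficient search would be needed."""
    return min(permutations(destinations), key=lambda o: route_length(o, dist))

# Toy symmetric distance matrix: node 0 is the start point, 1-3 are destinations.
dist = [
    [0, 2, 9, 4],
    [2, 0, 6, 3],
    [9, 6, 0, 8],
    [4, 3, 8, 0],
]
best = shortest_order([1, 2, 3], dist)
registered = (2, 1, 3)
if route_length(registered, dist) > route_length(best, dist):
    print("Not the shortest; the shortest order is", best)
```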
The delivery is the work of carrying each package to its delivery destination according to the shortest order. The user starts from the start point and carries each given package to the correct delivery destination. The user can view the undelivered packages and their delivery destinations at any time, and can set the focus on a certain package. To hand over a package, the user moves the avatar to the front of the delivery destination and clicks the door. If the delivery destination of the focused package is the same as the house of the clicked door, the user hands over the package, and the avatar talks to the person of the house. Questions such as "Who are you?" and "Who sends the package?" are displayed on the message board. The
Fig. 2 Screenshot of home-delivery service job application

user can answer by clicking on the given options. If the delivery destination of the focused package is not the same as the house of the clicked door, the system warns by displaying "It does not match delivery destination" or "It does not match the package" on the message board. When the user finishes delivering all the packages, the job experience finishes.
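The door-click handling described above can be sketched as follows; the function name, return values, and the mapping from packages to houses are illustrative assumptions:

```python
def handle_door_click(focused_package, clicked_house, destinations):
    """Message shown when the user clicks a door (our sketch of the rules;
    `destinations` maps package IDs to house addresses)."""
    if focused_package is None:
        return "It does not match the package"
    if destinations.get(focused_package) != clicked_house:
        return "It does not match delivery destination"
    return "delivered"  # the avatar then talks to the person of the house

destinations = {7: "house_123", 8: "house_045"}
assert handle_door_click(7, "house_123", destinations) == "delivered"
```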

4.2 Aptitude Value Expressions


The system calculates the five aptitude values of CPS-J from the operation logs.
A person of high system aptitude can decide the correct priorities from given information. In the home-delivery service job experience, the system aptitude corresponds to checking the delivery destinations of the packages and deciding the order of the shortest delivery route. We use Equation 1 to evaluate the system aptitude value.
Vsys = max(2 × answer_d − user_d, 0) / (log(time) + 1)   (1)

where answer_d is the shortest move distance and user_d is the move distance of the delivery order which the user defines. If user_d is more than twice answer_d, the numerator of the right side of Equation 1 evaluates to 0. We divide it by the logarithm of the answer time, adding 1 to guarantee that the divisor is greater than 0. The resulting value is the system aptitude value, denoted Vsys.
A person good at machine operations understands how to operate a machine quickly and operates it correctly. To evaluate the machine operation, the system uses the moving distance, move_distance, divided by the movement time, move_time, as the index of the machine operation aptitude, as shown in Equation 2. The movement time does not include the loss time, which is any interval of more than ten seconds in which the user does not operate. The resulting value is the machine operation value, denoted Vmac.

Fig. 3 Shortest route of scientific law

Vmac = move_distance / move_time   (2)
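Equations 1 and 2 can be written directly as functions; the function and argument names below are our own, with units as defined in the text:

```python
import math

def v_sys(answer_d, user_d, thinking_time):
    """Equation 1: system aptitude value."""
    numerator = max(2 * answer_d - user_d, 0)
    return numerator / (math.log(thinking_time) + 1)

def v_mac(move_distance, move_time):
    """Equation 2: machine operation aptitude value
    (move_time already excludes the loss time)."""
    return move_distance / move_time

# A route exactly as short as the answer, decided instantly (log(1) = 0):
assert v_sys(100, 100, 1.0) == 100.0
# A route more than twice the shortest distance scores zero:
assert v_sys(100, 250, 10.0) == 0.0
```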
A person strong at scientific laws finds an ideal route because he reasons well with scientific laws. In the home-delivery service job experience, the user needs to understand the shortest route to make the delivery efficient. The science laws aptitude corresponds to finding the shortest route for the delivery and moving along that route. We expect that a person of high science laws aptitude can shorten the movement distance, as in Fig. 3. We use Equation 3 to evaluate the scientific laws aptitude value.
Vsci = (min_d / move_d)^2   (3)
where min_d is the shortest move distance for delivering all the packages; it is a constant derived from the distance combinations of the delivery orders. On the other hand, move_d is the distance the avatar actually moves to deliver all the packages. The shorter the distance the avatar moves, the higher the value the user gets. The system squares the ratio of min_d to move_d to widen the gap between move distances. The resulting value is the science laws value, denoted Vsci.
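Equation 3 can be sketched likewise (min_d being the precomputed constant for the given destinations):

```python
def v_sci(min_d, move_d):
    """Equation 3: science laws aptitude value.
    Squaring widens the gap between users' move distances."""
    return (min_d / move_d) ** 2

assert v_sci(100, 100) == 1.0   # moved along the shortest route
assert v_sci(100, 200) == 0.25  # moving twice the distance quarters the score
```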
A person of high Japanese ability understands what the other person says and responds correctly. To evaluate the Japanese ability, the system calculates the number of correct replies, correct_answers, divided by the total replying time, reply_time, as shown in Equation 4. The value is the Japanese ability value, denoted Vjap.
Vjap = correct_answers / reply_time   (4)
A user identifies the building to which he should deliver a package using a map. Since the user experiences the home-delivery service job in a virtual space, he must associate his map with the roads and buildings in the virtual space. A person of high space recognition ability correctly grasps the current position in the virtual space from the two-dimensional information on the map. The space recognition aptitude corresponds to moving the avatar without losing one's way while watching the map. We expect that a person of high space recognition aptitude can shorten the time in which he loses his way. The system calculates the loss time, loss_time, to evaluate the space recognition, as shown in Equation 5. The loss time is the time that
the user does not operate for more than ten seconds. The value is the space recognition value, denoted Vspa.

Vspa = loss_time   (5)
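Equations 4 and 5 can be sketched similarly; note that a higher Vspa indicates worse space recognition, which is why negative correlations are expected in Section 5. How the loss time is aggregated from idle intervals is our assumption:

```python
def v_jap(correct_answers, reply_time):
    """Equation 4: Japanese ability aptitude value."""
    return correct_answers / reply_time

def v_spa(idle_intervals):
    """Equation 5: total loss time. We assume it sums every interval
    (in seconds) during which the user did not operate for more than ten
    seconds."""
    return sum(t for t in idle_intervals if t > 10)

assert v_jap(12, 60.0) == 0.2
assert v_spa([3.0, 12.5, 25.0]) == 37.5  # the 3-second pause is not loss time
```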

5 Experiments and Evaluation

5.1 Objective and Content


We have conducted an experiment to validate the effectiveness of the proposed method, with two experimental items: whether the system correctly extracts vocational aptitudes from the operation logs of each job experience in the virtual space, and whether the system is enjoyable for elementary school children. We compare the degree of coincidence between the aptitudes extracted by the proposed method and those given by the CPS-J vocational aptitude test.
To compare the proposed method with CPS-J, we should make examinees take both of them. Unfortunately, elementary school children cannot take CPS-J tests, because the tests require much knowledge, so we needed a special way to compare the proposed method with CPS-J. Since children cannot take CPS-J tests, we adopted university students as examinees. If the proposed method extracts the university students' aptitudes correctly, we expect that the system can also extract the aptitudes of elementary school children. We have conducted the aptitude extraction experiment with fifteen university students, who took both the CPS-J test and the test based on the proposed method, and compared both results to see the correlation between them. On the other hand, many elementary school children take classes with PCs. Because the implemented system is operated with a mouse and arrow keys, we can expect them to be familiar with these operations. In addition to the aptitude extraction experiment, we let four elementary school children use the system and gave them a questionnaire about the pleasure of the experience and the operability.
All examinees experienced the route definition and the delivery of the home-delivery service job experience. In the route definition, the examinees check the delivery destinations of the given fifteen packages and write memos on a prepared paper map. Successively, they estimate the delivery order with the shortest movement distance and register the order in the system. In the delivery, they carry the fifteen packages to the delivery destinations. The university students performed three sets of these works, while the elementary school students performed one set for the questionnaire. The university students took CPS-J after all sets. The system calculates the correlation coefficients between the extracted aptitude values and the values of the CPS-J answers for each aptitude. A high correlation coefficient means that the higher an examinee's value from the proposed method, the higher his value from CPS-J, which shows that the aptitude extracted by the proposed method is correct.
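The correlation coefficient used here is the standard Pearson coefficient, which can be computed as follows (the examinee data shown are hypothetical):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical extracted aptitude values and CPS-J scores for five examinees
extracted = [0.8, 0.5, 0.9, 0.3, 0.6]
cpsj = [70, 55, 80, 40, 60]
r = pearson(extracted, cpsj)
```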

5.2 Result and Consideration

Table 2 Result of correlation coefficients

Aptitude Set 1 Set 2 Set 3 Average


System 0.3464 0.5143 0.1699 0.3435
Machine Operation 0.6056 0.5068 0.4023 0.5049
Scientific Laws 0.3403 0.5432 0.6342 0.5059
Japanese Ability 0.4652 0.4059 0.4653 0.4455
Space Recognition -0.5779 -0.495 -0.0794 -0.3841

Table 2 summarizes the correlation coefficients between the extracted aptitude values and the values of CPS-J. In this paper, we judge that an aptitude value and the corresponding value of CPS-J are associated with each other if their correlation coefficient is more than 0.4.
The system aptitude has coefficients below 0.4 in the first and third sets, which is an undesired result. In the first set, some of the examinees only marked the delivery destinations, while the others wrote the package numbers for the delivery destinations on the paper map. In the second set, all examinees wrote the package numbers for the delivery destinations on the paper map. The correlation coefficient is presumed to be high in the second set because all of them took notes in the same way; it would therefore be effective to show a sample of how to write the memo beforehand. The delivery destinations in the third set closely resemble those of the second set, so some examinees planned the delivery route referring to the second set. They got better aptitude values in the third set than in the second, which made their aptitudes in the proposed method differ greatly from the results of CPS-J; we presume this is why the correlation coefficient is lower. This problem can be solved by setting the delivery destinations so that they do not closely resemble each other.
The machine operation aptitude has a high correlation coefficient in the first set, which gets lower as the examinees experience the job more times. While the examinees of high aptitude get familiar with the operations immediately, the examinees of low aptitude do not; however, the examinees of low aptitude come to operate well as they experience the job more. Therefore, the correlation coefficient is presumed to get lower. To avoid this effect, we should take the operation logs of an early set for the machine operation aptitude.
The science laws aptitude has a coefficient below 0.4 in the first set, but it gets better later; we should use later operation logs for the science laws aptitude.
The Japanese ability aptitude has coefficients above 0.4 in all sets, so we can take any set of operation logs to calculate it. We also investigated the correlation coefficient between the machine operation value of CPS-J and the total answer time; its value is 0.13 on average. This indicates that users cannot select the options quickly even if they are good at machine operation.

The space recognition aptitude should have negative correlation coefficients, because a user of high aptitude does not lose his way. The experimental result shows that the correlation coefficient has a large absolute value in the first and second sets, which means we can apply the method in an early set. The more the examinees experienced the building arrangement in the virtual space, the better they grasped the positions of the buildings, which prevented them from losing their way. We can expect to keep the high correlation by increasing the variety of building placements, to prevent examinees from memorizing the building arrangement.
From these results, we have found that some aptitudes should be calculated in an early set, while others can be derived from any set in repeated job experiences. We cannot expect elementary school children to repeat the same experience many times; the system should judge the vocational aptitudes in the first set, even though the children are not yet accustomed to it. To obtain the aptitudes which can only be derived after users get accustomed to the tool, we need to make users understand the operations of the job experience tool in advance, which eliminates the necessity of repeating the job experience. In particular, it is important for examinees to have an image of what kind of operations they will perform. It would be useful for them to watch tutorials, such as demonstration movies of the job experience, in advance.
The elementary school children answered a questionnaire with four-grade evaluations after the experience. For the item "Was the experience of the home-delivery service job fun?", all of them answered that they enjoyed it. For the item "Do you want to experience a service like this for other jobs again?", 75% of them answered that they do. Therefore, job experience in the virtual space is a service that elementary school children enjoy. For the item "Could you operate as expected?", all of them answered that they could. Elementary school children can operate the system well when it uses only the mouse and the arrow keys. Because they can operate as expected, our proposed method can be used for elementary school children.

6 Discussion
A significant issue is how many applications we should implement. The proposed method would be infeasible if we needed to implement an application corresponding to every job in the world. By measuring a wide range of abilities, CPS-J and the theory of Holland judge the aptitudes for a wide variety of jobs, from primary industry to tertiary industry. The proposed method aims to judge vocational aptitudes from the same viewpoint as CPS-J and the theory of Holland. To achieve this, we need to implement applications that measure a wide range of abilities in the virtual space. One application allows us to measure many kinds of abilities of the user, from which we can judge his vocational aptitudes for more than one job. In the experiment described above, we used the home-delivery service application; note, however, that it does not judge the aptitude for home-delivery service only. Therefore, we do not have to implement an application for every individual job whose aptitude is to be judged. Meanwhile, we cannot judge all of the vocational aptitudes with one
application. We should implement several applications which enable us to judge all the vocational aptitudes we want to judge.
It is desirable to reduce the number of applications in the virtual space, because implementing an application requires much effort. We should therefore implement each application so that it measures as many vocational aptitudes as possible. To accomplish this, we should consider the story of the application and the way users interact with it. The goal is a small set of well-implemented applications with which we can judge many kinds of abilities; establishing a method to find this small set is future work.

7 Conclusion
In this paper, we proposed a method to extract vocational aptitude using a virtual space. We use a virtual space to implement a vocational aptitude test system that elementary school children can enjoy. The method focuses on the operation logs of each work of the job experiences in the virtual space, and the proposed evaluation expressions extract aptitude values from those logs. By extracting vocational aptitude from operation logs, the system can be applied to examinees who cannot answer the complicated questions of traditional tests.
We have experimented on whether vocational aptitude can be extracted by this method. Examinees took the home-delivery service job experience in a virtual space, and we examined the correlation coefficients between the aptitude values extracted from the operation logs and the answered values of CPS-J. We have found that some aptitudes should be derived from the first operation log, while others should come from later logs. The aptitudes that need many experiences can be calculated after the examinees get accustomed to the operations. Since we cannot expect children to try the same job experience many times, we should give them enough knowledge of the operations in the virtual space in advance; watching tutorials of the job experience in advance would be useful for this.
As future work, we will develop a useful presentation method for the extracted aptitude values.

References
1. A Student Site for ACT Test Takers,
http://www.actstudent.org/index.html (cited November 29, 2011)
2. Free Sample of Vocational Aptitude Test, http://www.personality-and-aptitude-career-tests.com/vocational-aptitude-test.html (cited November 29, 2011)
3. CPS-J,
http://www.nipponmanpower.co.jp/ps/think/cpsj/
(cited November 29, 2011) (in Japanese)
4. Holland, J.L.: Making vocational choices: A theory of vocational personalities and work
environments, 3rd edn. Psychological Assessment Resources (1997)

5. Ogawa, K.: Application of role model based e-portfolio system to career design support.
In: Proceedings of World Conference on E-Learning in Corporate, Government, Health-
care, and Higher Education 2008, pp. 3052–3057 (2008)
6. Bhavani, B., Sheshadri, S., Unnikrishnan, R.: Vocational education technology: rural
India. In: A2CWiC 2010 Proceedings of the 1st Amrita ACM-W Celebration on Women
in Computing in India (2010)
7. Coles, T.R., Meglan, D., John, N.W.: The Role of Haptics in Medical Training Simula-
tors: A Survey of the State of the Art. IEEE Transactions on Haptics 4(1), 51–66 (2011)
8. Ueda, K., Endo, M., Suzuki, H.: Task decomposition: Why do some novice users have
difficulties in manipulating the user-interface of daily electronic appliances. In: Harris,
D., Duffy, V., Smith, M., Stephanidis, C. (eds.) Human-Centred Computing: Cognitive,
Social and Ergonomic Aspects, pp. 345–349. Lawrence Erlbaum Associates (2003)
9. Saito, K.: Study on systematic instruction to bring up the power to think mathematically (2009) (in Japanese),
http://www.fuku-c.ed.jp/center/houkokusyo/h21/
h21sansuuchoken.pdf
(cited November 29, 2011)
10. ACTFL Certified Proficiency Testing Programs (oral and written),
http://www.actfl.org/i4a/pages/index.cfm?pageid=3642
(cited November 29, 2011)
11. Yasufuku, K., Abe, H., Yoshida, K.: Development of Architectural Visualization Ability Test Using Real-Time CG. In: Proceedings of the 7th Japan-China Joint Conference on Graphics Education, pp. 44–49 (2005)
Framework of a System for Extracting
Mathematical Concepts from Content
MathML-Based Mathematical Expressions

Takayuki Watabe and Yoshinori Miyazaki *

Abstract. This study proposes the framework of a system that extracts mathemati-
cal concepts from an input mathematical expression. In this paper, math concepts
are represented as specific patterns in math expressions such as “differential equa-
tions” and “quadratic functions.” This system, termed the concept extraction sys-
tem, presents a math concept when an input math expression includes the pattern
for that particular math concept. The system uses two key components: “Math-
Placeholder,” an originally defined XML vocabulary to describe patterns, and a
pattern discriminator, a mechanism to identify whether an input math expression
includes the predefined pattern(s). Math expressions described by an XML voca-
bulary called Content MathML have been used for this study. Lastly, the follow-
ing two applications of the proposed system are presented: (1) it can be used as an
information retrieval tool to match math concepts in math expressions, and (2) it
can be used together with a learning management system that provides study
material for the concepts used in a given math expression.

1 Introduction

Mathematical concepts are often represented as specific patterns in math expres-


sions. For example, the math expressions for “differential equations” include “diffe-
rential operators” in the form of “equations”. This implies that precise identification
of patterns in math expressions enables us to extract math concepts used in them. In
this study, the authors aim to propose the framework of a concept extraction system,

Takayuki Watabe
*

Graduate School of Informatics, Shizuoka University, Japan


e-mail: gs11055@s.inf.shizuoka.ac.jp
Yoshinori Miyazaki
Faculty of Informatics, Shizuoka University, Japan
e-mail: yoshi@inf.shizuoka.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 269–278.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

and the development of such a system will allow us to extract math concepts for a
variety of applications. For instance, the system performs the function of matching
math expressions having similar math concepts; furthermore, the extracted concept
can be used to obtain educational material to understand the math expression.
In order to distinguish between math expressions with various patterns such as
“differential operators” or “equations” and those without such specific patterns,
use of rigid notations is indispensable. Content MathML [8], which meets the abovementioned requirement, allows us to clearly describe the types of operators and operands in math expressions. In addition, TeX is a well-known markup language for describing math expressions; however, this language does not clearly distinguish between "the product of the fraction d/dx and y" and "the differentiation of y with respect to x." Content MathML clearly distinguishes between such concepts, and this is why we chose this markup language to perform the extraction of math concepts from the input math expression.
In section 2, Content MathML will be introduced with some examples. Section
3 elaborates on the proposed system configuration and algorithms for extracting
math concepts, particularly focusing on how math patterns are described using the
originally defined XML vocabulary, and how math expressions are identified to
have specific patterns. Subsequently, section 4 explains two future applications of
the proposed system, such as the use of the system as an information retrieval tool
to match math concepts, and its use with a learning management system (LMS) to obtain relevant study material for a math expression after extracting concepts from the expression. Section 5 presents the concluding remarks.

2 MathML
Content MathML is an application of XML for describing mathematical notation, capturing both its structure and content. It has been released as a W3C Recommendation and is defined as one of the XML vocabularies. Central to Content MathML is the <apply> tag, which represents the application of the function or operator given as the first child to the remaining elements.

2.1 Examples of Content MathML


Fig. 2.1 shows the code for the Content MathML-based expression a + b, with its tags explained in Table 2.1.

<apply>
  <plus/>
  <ci>a</ci>
  <ci>b</ci>
</apply>

Fig. 2.1 Representation of a + b using Content MathML

Table 2.1 Description of tags in Fig. 2.1

Tag    Description
apply  applies the first child operator to the other elements
plus   performs addition
ci     encloses identifiers

To be precise, the code in Fig. 2.1 is interpreted as "applying addition to the identifiers a and b." The next example involves differentiation. Fig. 2.2 shows the Content MathML code for the second derivative d²f(x)/dx², with its tags defined in Table 2.2.

<apply>
  <diff/>
  <bvar><ci>x</ci></bvar>
  <degree><cn>2</cn></degree>
  <apply>
    <ci>f</ci>
    <ci>x</ci>
  </apply>
</apply>

Fig. 2.2 Representation of d²f(x)/dx² by Content MathML

Table 2.2 Description of tags in Fig. 2.2

Tag     Description
diff    performs differentiation
bvar    specifies bound variables
degree  order of differentiation
cn      encloses a number

In other words, the code in Fig. 2.2 is for "applying second-order differentiation to the function f(x)." Thus, Content MathML is used to clarify the structures of math expressions using the <apply> tag with different types of operators and operands.
There are different tags for operators such as subtraction, multiplication/division,
trigonometric functions, and logarithmic functions.
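As an illustration (not part of the proposed system), the operator/operand layout of <apply> can be read with a standard XML parser; the following Python sketch parses the expression of Fig. 2.1 and separates the operator from its operands:

```python
# Sketch: reading the <apply> structure of Content MathML with Python's
# standard xml.etree module.  The tag names follow the examples above.
import xml.etree.ElementTree as ET

mathml = """
<apply>
  <plus/>
  <ci>a</ci>
  <ci>b</ci>
</apply>
"""

root = ET.fromstring(mathml)
operator = root[0].tag                          # first child names the operator
operands = [child.text for child in root[1:]]   # remaining children are operands
print(operator, operands)                       # plus ['a', 'b']
```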
Content MathML also has introduced a mechanism to describe types of identi-
fiers by the values of type attributes of <ci> tag. For example, <ci
type="vector">x</ci> means that the identifier is a vector. The values of
type attributes include “integer,” “real,” “matrix,” “set,” and more.

2.2 Representation of MathML by Tree Structure


MathML is defined as one of the XML vocabularies. Because an XML document corresponds one-to-one with its tree structure, Content MathML is also uniquely represented by a tree. The representation of sin(a + b) by Content MathML and its corresponding tree are shown in Fig. 2.3.

<apply>
<sin/>
<apply>
<plus/>
<ci>a</ci>
<ci>b</ci>
</apply>
</apply>

Fig. 2.3 Representation of sin(a + b) by Content MathML and its corresponding tree

For the rest of this paper, let us use the two representations interchangeably and
allow the terminology to describe trees with “sin as a child of apply” or “plus and
ci as siblings.”
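This one-to-one correspondence can be made concrete by walking the document in pre-order; the following Python sketch (an illustration, not taken from the paper) lists the elements of the tree for sin(a + b):

```python
# Sketch: pre-order traversal of the Content MathML tree for sin(a + b),
# matching the wording used in the text ("sin as a child of apply",
# "plus and ci as siblings").
import xml.etree.ElementTree as ET

mathml = "<apply><sin/><apply><plus/><ci>a</ci><ci>b</ci></apply></apply>"
root = ET.fromstring(mathml)

def preorder(elem):
    """Yield tag names in document (pre-order) sequence."""
    yield elem.tag
    for child in elem:
        yield from preorder(child)

print(list(preorder(root)))
# ['apply', 'sin', 'apply', 'plus', 'ci', 'ci']
```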

3 Framework of Concept Extraction System


Concept extraction is a function that extracts the math concepts used in input math expressions. In particular, this function represents math concepts as specific patterns that are matched against the input math expressions. The system stores a number of pairs of such concepts and patterns, called "concept tuples." A combination of a pattern "including differential operators in the form of equations" and a concept "differential equations" is one example of a concept tuple. In this case, if an input math expression has a pattern "including differential operators in the form of equations," the corresponding concept, "differential equations," will be displayed. This approach is an extension of the keyword extraction of [7].
The proposed system consists of “a concept tuple database” and “a pattern dis-
criminator.” The concept tuple database is a database retaining a number of con-
cept tuples, and the pattern discriminator is a mechanism to detect whether or not
an input math expression (hereafter referred to as “target”) has a predefined
pattern.
The operation of the system is described as follows: First, a concept tuple is ex-
tracted from the concept tuple database in a sequential manner. Second, the pattern
discriminator detects whether the target includes the patterns specified in the con-
cept tuple. The math concept is extracted from the system if the matching is
successful.
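The operation described above can be sketched as a simple driver loop; the names and the naive substring discriminator below are illustrative placeholders, not the system's actual implementation:

```python
# Sketch of the overall control flow: iterate over the concept tuple
# database and report every concept whose pattern the target includes.
# `includes_pattern` stands in for the pattern discriminator of Section 3.2;
# here it is reduced to a substring test on serialized MathML so the
# driver is runnable.
def includes_pattern(target_xml, pattern_xml):
    return pattern_xml in target_xml          # placeholder discriminator

concept_tuples = [                            # (pattern, concept) pairs
    ("<diff/>", "differential equations"),
    ("<factorial/>", "combinations"),
]

def extract_concepts(target_xml):
    return [concept for pattern, concept in concept_tuples
            if includes_pattern(target_xml, pattern)]

target = "<apply><equal/><apply><diff/><ci>y</ci></apply><ci>y</ci></apply>"
print(extract_concepts(target))               # ['differential equations']
```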
The following sections elucidate the representation of patterns and the function-
ing of the pattern discriminator.

3.1 Pattern Representation


New tags are devised for the purpose of realizing pattern descriptions: "arbitrary," "partial," and "logical (operator)." Patterns are described by Content MathML along with an originally defined XML vocabulary, which we call "MathPlaceholder." In [6], MathMLQ is proposed as an XML vocabulary that can be used along with MathML to express queries. Because the "logical (operator)" tags have a strong connotation as a query, they are analogous to some parts of the functions of MathMLQ.
Tags in MathPlaceholder perform the function of treating the element they enclose as one that satisfies specific conditions, in addition to the structure described by Content MathML. The conditions are described by the tags, attributes, or child element(s) of MathPlaceholder. Now, we explain each of the abovementioned tags. First, the <arbitrary> tag is used to permit arbitrary elements. Arbitrary elements include, for instance, a single identifier <ci>a</ci> and an expression that adds x and y, i.e., <apply> <plus/> <ci>x</ci> <ci>y</ci> </apply>. The <arbitrary> tag may be used with the "label" attribute. The value of this attribute is any string.

<arbitrary> tags with the same values of the "label" attribute are considered to be identical. Another attribute that may be taken by the <arbitrary> tag is "sibling." The value of the "sibling" attribute is either true or false. Unless this attribute is explicitly specified, its value is considered to be false. If this attribute value is set to true, the <arbitrary> tag is interpreted as an arbitrary plural number of sibling elements. The <partial> tag is also devised to consider its child elements as its part. With the use of this tag, a sample pattern code <mp:partial> <ci>x</ci> </mp:partial> may match targets such as <ci>x</ci> and an expression that multiplies x and y, i.e., <apply> <times/> <ci>x</ci> <ci>y</ci> </apply>. One may append the "sibling" attribute to the <partial> tag. When the value of the "sibling" attribute is true, the <partial> tag is interpreted as plural siblings, either of which has the child of the <partial> tag as its part. Lastly,
<and>, <or>, and <not> represent three logical operators “and,” “or,” and “not,”
respectively. The <and> and <or> tags have more than one element, whereas the
<not> tag has only one child. The <and> tag, usually used with the <partial>
tag, is used to represent all the child elements of the tag. For instance, math expressions containing both x and y can be coded as shown in Fig. 3.1.

<mp:and>
<mp:partial>
<ci>x</ci>
</mp:partial>
<mp:partial>
<ci>y</ci>
</mp:partial>
</mp:and>

Fig. 3.1 Code for math expressions containing both x and y

Likewise, the <or> tag is used for the "or" operation, and the <not> tag matches an element that is not the child of the tag. Table 3.1 summarizes the tags introduced in this section.

Table 3.1 List of tags of MathPlaceholder

Tag        Attribute        Synopses (meaning of the string enclosed by the tags)
Arbitrary  -                arbitrary elements
           label="string"   elements corresponding to this tag have label attributes equal to "string"
           sibling="true"   arbitrary plural sibling elements
Partial    -                elements with the child element of this tag as their part
           sibling="true"   sibling elements either of which has the child element of this tag as its part
And        -                logical "and" operator
Or         -                logical "or" operator
Not        -                logical "not" operator

By introducing this MathPlaceholder, we show that it is feasible to obtain the


pattern for the math concept “differential equations” as shown in Fig. 3.2.

<mp:partial>
<apply>
<equal/>
<mp:partial sibling="true">
<apply>
<diff/>
<mp:arbitrary sibling="true"/>
</apply>
</mp:partial>
</apply>
</mp:partial>

Fig. 3.2 Pattern for “differential equations” generated by MathPlaceholder

The pattern for "combinations (nCk)" is also obtained, as shown in Fig. 3.3.

<mp:partial>
<apply>
<divide/>
<apply>
<factorial/>
<mp:arbitrary label="n"/>
</apply>
<apply>
<times/>
<apply>
<factorial/>
<mp:arbitrary label="k"/>
</apply>
<apply>
<factorial/>
<apply>
<minus/>
<mp:arbitrary label="n"/>
<mp:arbitrary label="k"/>
</apply>
</apply>
</apply>
</apply>
</mp:partial>

Fig. 3.3 Pattern for "combinations (nCk)" generated by MathPlaceholder

3.2 Pattern Discriminator


In order to discriminate among various math concept patterns, the concept extraction
system closely compares the elements of a pattern and a target. If any element of a
pattern does not match with that of the target, the system discontinues traversing
through the tree structure of that pattern. However, if all the elements of the pattern

match with those of the target, that target is identified to have the matched pattern as
a math concept. Let us take a simple example, wherein the target math expression is y = sin x and the pattern is y = cos x. The tree structures of the target and the pattern are shown in Fig. 3.4.
The number near each element indicates the traversal order. It can be observed that the tenth elements in the two tree structures do not match, thereby leading to the termination of the traversal procedure. As a result, the system indicates that the target y = sin x does not include the pattern y = cos x in it.

Fig. 3.4 Tree structures of y = sin x (left) and y = cos x (right)
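The element-by-element comparison described above can be sketched as follows. The nested-tuple tree encoding and helper names are assumptions made for illustration, so the reported mismatch position reflects this simplified encoding rather than the numbering in Fig. 3.4:

```python
# Sketch of the exact-match traversal: both trees are compared element by
# element in pre-order, and traversal stops at the first mismatch.
# Trees are written as nested tuples (tag, children...) instead of MathML.
def preorder(tree):
    tag, *children = tree
    yield tag
    for c in children:
        yield from preorder(c)

def first_mismatch(target, pattern):
    """Return the 1-based position of the first differing element, or None."""
    for i, (t, p) in enumerate(zip(preorder(target), preorder(pattern)), 1):
        if t != p:
            return i
    return None

# y = sin x (target) vs y = cos x (pattern)
target  = ("apply", ("equal",), ("ci:y",), ("apply", ("sin",), ("ci:x",)))
pattern = ("apply", ("equal",), ("ci:y",), ("apply", ("cos",), ("ci:x",)))
print(first_mismatch(target, pattern))   # 5 (sin vs cos under this encoding)
```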

The traversing procedure to find the elements of a MathPlaceholder is different.


In particular, the MathPlaceholder controls the traversing path in pattern discrimi-
nation. In an algorithm, the “arbitrary” and “partial” elements are considered as
wildcard characters. In [1], the authors have focused on discussions and observa-
tions concerning the use of wildcard characters in math expressions, whereas our
study focuses on the implementation of the wildcard characters using the “arbi-
trary” and “partial” elements for traversing the tree structure paths.
Now, we explain the traversing algorithm. When the tree structure contains the
“arbitrary” element, the system continues traversing irrespective of the target ele-
ments. If the target element is a subtree, the system continues traversing by skip-
ping the subtree. Fig. 3.5 shows an example of an expression with the “arbitrary”
element and the traversing path followed by the system.

Fig. 3.5 Tree with “arbitrary” element and its traversing path
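The skipping behaviour of the "arbitrary" element can be sketched in the same simplified style (nested tuples stand in for MathML trees; the names are illustrative, not the system's actual API):

```python
# Sketch of the "arbitrary" behaviour: when the pattern node is ARB, the
# matcher accepts whatever the target holds there and skips the whole
# target subtree.  Trees are nested tuples (tag, children...).
ARB = "mp:arbitrary"

def match(target, pattern):
    ptag, *pkids = pattern
    if ptag == ARB:
        return True                  # arbitrary element: skip target subtree
    ttag, *tkids = target
    if ptag != ttag or len(pkids) != len(tkids):
        return False
    return all(match(t, p) for t, p in zip(tkids, pkids))

# pattern: differentiation applied to anything
diff_pattern = ("apply", ("diff",), (ARB,))
print(match(("apply", ("diff",), ("ci:f",)), diff_pattern))   # True
print(match(("apply", ("plus",), ("ci:f",)), diff_pattern))   # False
```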

The next example is that of a pattern with a "partial" element. When the "partial" element is detected, the child element of that element and the corresponding element of the target tree are separated, and they might form other trees. Let the child element of the "partial" element and its corresponding element in the target be denoted as pt and tt, respectively. Further, let pte and tte denote the elements currently being examined in pt and tt, respectively. pt and tt are scanned according to the algorithm shown in Fig. 3.6.

pte = root element of pt; tte = root element of tt;
while (true) {
    if (pte and tte are identical) {
        pte = next element of pte; tte = next element of tte;
    } else {
        if (pte is the root element of pt) tte = next element of tte;
        else pte = root element of pt;
    }
    if (pte has passed the last element of pt) exit with the result "the matching is successful";
    if (tte has passed the last element of tt) exit with the result "the matching is unsuccessful";
}

Fig. 3.6 Scanning algorithm for pt and tt ("partial" element case)
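Read over the pre-order element lists of pt and tt, the algorithm of Fig. 3.6 can be transliterated to Python as follows; list indices replace the "next element" pointers, and non-empty lists are assumed (an illustrative reading of the pseudocode, not the system's actual code):

```python
# Direct transliteration of the scanning algorithm in Fig. 3.6 over the
# pre-order element lists of pt (pattern subtree) and tt (target subtree).
def partial_scan(pt, tt):
    """Return True if pt's element sequence is found while scanning tt."""
    i = j = 0                        # i indexes pt (pte), j indexes tt (tte)
    while True:
        if pt[i] == tt[j]:
            i += 1; j += 1
        else:
            if i == 0:               # pte is at the root of pt
                j += 1
            else:
                i = 0                # restart the pattern at its root
        if i == len(pt):
            return True              # the matching is successful
        if j == len(tt):
            return False             # the matching is unsuccessful

print(partial_scan(["times", "ci:x"], ["apply", "times", "ci:x", "ci:y"]))  # True
print(partial_scan(["cos"], ["apply", "sin", "ci:x"]))                      # False
```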

When the child element of the “partial” element in the pattern and the corre-
sponding element in the target are matched successfully, the system continues
traversing the tree by skipping the subtree of the “partial” element in the pattern as
well as in the target. However, if the elements in the target and the pattern do not
match, the system indicates that the target does not include that pattern. Fig. 3.7
shows the scanning procedure of the pattern and the target tree structures.

Fig. 3.7 Scanning of pattern and target trees with “partial” elements

When a tree includes an “and” element, each of the child elements is extracted
individually. In addition, the system extracts the corresponding element (tt) from the
target tree, as well. Next, each child element of the “and” element is considered as a
subpattern of the element and is matched with tt to check whether tt has similar
subpatterns. If tt includes all the subpatterns, the system will continue scanning by
skipping tt as well as the subtrees of the “and” element; however, if tt does not in-
clude all the subpatterns the system indicates that the scanning result is unsuccessful.
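The handling of the "and" element can be sketched in the same way; containment is reduced here to membership of each subpattern's root in tt's pre-order tag list, an illustrative simplification of the full subpattern matching:

```python
# Sketch of the "and" handling: every child of the <and> element is treated
# as a subpattern, and the target subtree tt must contain all of them.
def preorder_tags(tree):
    tag, *children = tree
    tags = [tag]
    for c in children:
        tags.extend(preorder_tags(c))
    return tags

def matches_and(subpatterns, tt):
    """True if tt contains the root tag of every subpattern somewhere."""
    tags = preorder_tags(tt)
    return all(root in tags for root, *_ in subpatterns)

tt = ("apply", ("times",), ("ci:x",), ("ci:y",))
print(matches_and([("ci:x",), ("ci:y",)], tt))   # True
print(matches_and([("ci:x",), ("ci:z",)], tt))   # False
```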

Examples of trees with “or” and “not” elements are omitted owing to space
constraints in the paper.

4 Applications
In this section, we discuss two applications that use the concept extraction algorithm.

4.1 Collaboration with Learning Management System


The first application of the concept extraction system is to develop an LMS module that provides study material to aid in understanding input math expressions. When a user provides an input math expression, the module matches the concept tuples in its database against the input expression, extracts the concepts from the expression, and then presents relevant study material instead of only the math concepts. This module is particularly useful when users want to study math expressions involving math concepts beyond their comprehension, because for an incomprehensible expression it is very difficult to determine which concepts are required to understand it. Table 4.1 shows a few examples of input math expressions and their corresponding output materials.

Table 4.1 Input-output correspondence (concept extraction and LMS)

Input (math expression)   Math concepts            Output (learning material)
dy/dx = ...               differential equations   (e.g.) materials on differentiation, integration, and differential equations
... 1/2 ...               quadratic functions      (e.g.) materials on graph and discriminant of quadratic function
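The concept-to-material lookup such a module performs can be sketched as a catalogue keyed by concept; the entries follow Table 4.1, and all names are illustrative rather than the system's actual API:

```python
# Sketch of the LMS module's lookup step: extracted concepts are mapped to
# the study materials registered for them.
materials = {
    "differential equations": [
        "materials on differentiation",
        "materials on integration",
        "materials on differential equations",
    ],
    "quadratic functions": [
        "materials on the graph of a quadratic function",
        "materials on the discriminant of a quadratic function",
    ],
}

def materials_for(concepts):
    """Collect the study materials registered for each extracted concept."""
    found = []
    for concept in concepts:
        found.extend(materials.get(concept, []))
    return found

print(materials_for(["differential equations"]))
```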

4.2 Information Retrieval Tool to Match Math Concepts


The second application of the concept extraction system is an IR (information re-
trieval) tool, which outputs math expressions that match with the input math expres-
sion. This tool also allows us to find math expressions, which share the same math
concepts, even though their appearances are different. For an input math expression
provided by a user, the tool extracts its math concept and then presents math expres-
sions with concepts that match with the extracted math concept. The target math ex-
pressions in this application are texts and exercises used for math study. As related literature, [2] proposed creating the most often used texts using MathML, which will contribute to an abundant amount of documents being available as targets.
can also be used to find similarities between two math expressions by the compari-
son of their corresponding math concepts. One probable shortcoming of this applica-
tion is the complexity of the algorithm required to find similarities between math
concepts. In [9], an algorithm to find the similarities between MathML-based ex-
pressions has been presented, using the sets of paths in the tree structures of the

expressions. This algorithm finds the similarities by considering the structural prop-
erties of each math expression. On the other hand, the IR tool is capable of finding
similarities by comprehending math expressions more abstractly because the system
performs computations on the basis of the math concepts. There exists another study
that finds the similarities between expressions on the basis of the number of tags in
their MathML codes [4].
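One simple way to realize such concept-level similarity, assuming the concepts have already been extracted from both expressions, is a Jaccard coefficient over the two concept sets (an illustrative sketch, not the paper's algorithm):

```python
# Sketch: compare two expressions through the sets of concepts extracted
# from them, abstracting away from notation (unlike path-based structural
# similarity).  Concepts are given directly here for brevity.
def concept_similarity(concepts_a, concepts_b):
    a, b = set(concepts_a), set(concepts_b)
    if not a | b:
        return 0.0
    return len(a & b) / len(a | b)

sim = concept_similarity(
    ["differential equations", "trigonometric functions"],
    ["differential equations"],
)
print(sim)   # 0.5
```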

5 Concluding Remarks
This study deals with the representation of math concepts as specific patterns in
math expressions and aims to devise a method for extracting math concepts from an
input math expression. In order to describe patterns, an originally extended Math-
Placeholder is introduced, and its notations and their functions are elucidated. Two
key components, namely, “concept tuple” and “pattern discriminator,” are essential
to this system and have been explained in detail. Lastly, applications to show the va-
lidity and usefulness of the proposed concept extraction system have been presented.
In the future, we plan to enhance this system by devising a method for the metadata representation of the concepts, classifying the concepts into different categories, and establishing relations among the concepts. This enhancement will enable us to use the extracted math concepts optimally. In contrast to our future plans, [3] deals with knowledge management concerning math expressions, and [5] deals with the development of an ontology of expressions in MathML. It is also desirable to implement the applications shown in section 4 to evaluate the usefulness of the proposed system.

References
[1] Altamimi, M.E., Youssef, A.S.: Wildcards in math search, implementation issues. In:
CAINE/ISCA, pp. 90–96 (2007)
[2] David, C., Kohlhase, M., Lange, C., Rabe, F., Zhiltsov, N., Zholudev, V.: Publishing
math lecture notes as linked data. The Semantic Web: Research and Applications,
370–375 (2010)
[3] Jeschke, S., Natho, N., Wilke, M.: KEA-A knowledge management system for mathe-
matics. In: 2007 IEEE International Conference on Signal Processing and Communica-
tions, pp. 1431–1434 (2007)
[4] Kishimoto, S., Nakanishi, T., Sakurai, T., Kitagawa, T., Tochigi, T.: An Implementa-
tion method of similarity-based retrieval for formulas using MathML. In: IEICE
DEWS 2003 6-P-07 (2003)
[5] Kitani, N., Yukita, S.: The educational uses of mathematical ontology and the search-
ing tool. Frontiers in Education Conference (FIE 2008) T4B-11 (2008)
[6] Kohlhase, M., Sucan, I.: A search engine for mathematical formulae. Artificial Intelli-
gence and Symbolic Computation, pp. 241–253. Springer (2006)
[7] Watabe, T., Miyazaki, Y.: Toward math education utilizing math expressions with
their semantic information. In: Annual Conference of Japan e-Learning Association,
pp. 13–20 (2011)
[8] W3C Math Home, http://www.w3.org/Math/
[9] Yokoi, K., Aizawa, A.: An approach to similarity search for mathematical expressions
using MathML. Towards Digital Mathematics Library (DML), 27–35 (2009)
Fundamental Functions of Dynamic Teaching
Materials System*

George Moroni Teixeira Batista**, Mayu Urata, and Takami Yasuda

Abstract. In this research an e-Learning system was developed focusing on how to


implement the idea of dynamic contents in teaching materials, creating what we call
Dynamic Teaching Materials, with which teachers themselves can easily change or
update the teaching materials according to the requirements of the class. The
contents created with the system can also be easily shared between any teachers, and
when the shared content of one teacher is updated all the other teaching materials
that use the same content can also be automatically updated. The teaching materials
are produced in a WYSIWYG interface that allows teachers to create interactive
multimedia content without any knowledge of programming languages; the content
can also be edited in realtime. The interface design was focused on maximum
utilization of the user’s prior computer knowledge, so they do not have to learn basic
operations. The system was tested and is being used by the Department of Foreign
Languages and Translation of Brasilia University in Brazil.

Keywords: e-Learning, dynamic, language, teaching materials, interaction.

1 Introduction
In many countries, including Brazil, there is a lack of e-Learning teaching materials, as well as other problems, such as: how will students and teachers access the materials? Do they have all the necessary hardware and software configurations to use the materials? Do they have the computer user experience necessary to operate the system? Another problem is that most of the e-Learning teaching materials are

George Moroni Teixeira Batista ⋅ Mayu Urata ⋅ Takami Yasuda


Graduated School of Information Science, Nagoya University, Japan
e-mail: tenno.kun@gmail.com, mayu@nagoya-u.jp,
yasuda@is.nagoya-u.ac.jp
* Submitted for a Special Edition IS05: Intelligent Network Service Yuichiro Tateiwa.
** Corresponding author, address: Aichi ken, Nagoya shi, Higashi ku, Tsutsui 3-8-15 Residense
Higashi Yaba 210, 461-0003 Japan. Tel. 080-4303-1887.

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 279–287.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

simply digital versions of printed teaching materials or a new material that works
in the same way. There is nothing that is really new or that makes the use of the
computer really necessary, or that in fact utilizes the real potential of a computer.
There is also a lack of e-Learning materials designed to be used during classes;
most of the materials are designed to be used in distance education, so teachers
have problems using it during classes, limiting them to only the use of videos,
audio and slide presentations which lead to a underutilization of the computer. In
fact, it is difficult to have a class where the teacher and all the students can access
a computer and have everyone participate as they do in a normal class; making
e-Learning teaching materials for this kind of situation is also difficult.
Because there are many people involved in the process of producing e-Learning
teaching material, it is more difficult to make the material really match the needs
encountered during classes by the teachers and students. First in the process is the instructional designer's work; then the programmer or web designer creates the material itself based on the instructional design; and finally the teacher uses the material with his/her students. However, as the instructional
designer is not the teacher that will be using the material, he/she does not know
exactly what is necessary in the design of the material, and what the teachers and the
students really need. Another problem is that the teacher does not know the technical
limitations faced by the programmer in the process of producing the materials, and
does not know exactly which problems can be solved by updating the material and
which problems would be better solved by adapting his/her teaching methods. If
they do not work together, therefore, the material may have many problems.
This research was done together with the Brasilia University Department of
Foreign Languages and Translation, located in Brazil. A group consisting of
teachers and tutors from the Brasilia University was responsible for testing the
system, creating teaching materials for the Brasilia University students. Those
teachers and tutors are members of the Project ELO (ELO stands for Online Language School in Portuguese: Escola de Línguas Online), and the Dynamic
Teaching Materials System developed as a result of this research is also part of the
project. Fundamentally, the Project ELO analyzed what kind of features are
necessary in the system, based on the teachers’ and tutors’ needs; however, in
Brasilia University we did not have the necessary resources and knowledge to
develop the Dynamic Teaching Materials System. As the Brasilia University had a
student exchange program with Nagoya University, we took this opportunity to
transfer the development of the system to Nagoya. Another reason for the transfer
was the opportunity to try to use the system in Japanese schools intended for
Brazilian children, so that Brazilian teachers could also help with the education of
those children without having to come to Japan.
The teaching material present in the system was created by veteran students
with the supervision of the teachers. In addition to the teaching materials, the
project also had communication features which were considered extremely
important; as stated in [1] and [2], allowing students to communicate with tutors,
teachers and other students is also important in the learning process, as they can
participate more in the process and help each other. The project started only with
Japanese, but the system is now available to the whole of the Department of

Foreign Languages and Translation; it has a total of 55 lectures and 25 teachers, each working together with 2 tutors, and a total of 2,982 users have already accessed the system.

2 Dynamic Teaching Materials


When the development of the new version of the system began, making the teaching materials dynamic was found to be the best way to make them easy to edit and update in realtime according to the necessities found in classes. Here is an example to show the basic idea of the Dynamic Teaching Materials. Imagine the situation when a teacher creates a teaching material. Common teaching materials are static; they cannot be changed as needed in realtime. Every time something needs to be changed or corrected, a newer version of the whole teaching material is usually needed, and this generally costs money. A great deal of time is spent producing new versions of the materials, and the newer version will probably not arrive until after the students have finished the semester or the course.
However with the Dynamic Teaching Materials System, when a teacher creates a
material, all the changes can be made by the teacher him/herself in realtime even
during the classes, so the students can have a teaching material that is focused on
what they need all the time.
When making a change or update, the teachers can also save it as a new
version, so they can have the new and the previous versions stored on the database
if needed. The content of the materials are stored in the database and can be shared
by the teachers, so a teacher can use the teaching material of other teachers as a
template to create a new one or make his/her own version of the material, and the
materials can also be combined to create new ones. When the content of one
material is linked to another material and the first one is updated, the content can
be automatically updated on the second material; so there is no need to update all
the materials one by one, and all the teachers can use the most recent version of
the contents all the time without having to worry about looking for new versions
of the contents linked to in their materials.
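The link-and-update behaviour described above can be made concrete with a minimal Python sketch: contents are stored once and referenced by ID from several materials, so one update propagates everywhere. All names and the in-memory layout are hypothetical; the actual system stores these records in a database.

```python
# Minimal sketch of shared, linked contents: materials store references
# (content IDs) rather than copies, so updating a content updates every
# material that links to it. All names here are hypothetical.

contents = {}   # content_id -> {"text": ..., "version": ...}
materials = {}  # material_id -> list of content IDs (links, not copies)

def create_content(content_id, text):
    contents[content_id] = {"text": text, "version": 1}

def link_content(material_id, content_id):
    materials.setdefault(material_id, []).append(content_id)

def update_content(content_id, new_text):
    # One update; every material that links this content sees it immediately.
    record = contents[content_id]
    record["text"] = new_text
    record["version"] += 1

def render(material_id):
    # Links are resolved at load time, so the newest version is always shown.
    return [contents[cid]["text"] for cid in materials[material_id]]

create_content("c1", "Hiragana chart v1")
link_content("material_A", "c1")
link_content("material_B", "c1")
update_content("c1", "Hiragana chart v2")
print(render("material_A"))  # ['Hiragana chart v2']
print(render("material_B"))  # ['Hiragana chart v2']
```

With copies instead of links, the final prints would still show version 1 for any material not updated by hand, which is exactly the static-material problem the text describes.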
There are three principal differences between static teaching materials and
dynamic teaching materials. The first is the realtime editing feature, which makes
the teaching material editable at any time, so that the teacher can always make the
material match the needs of his or her classes as soon as those needs appear. This
differs from a static material, which usually cannot be edited by the teacher
directly; instead, the teacher must send an edit or update request to the person in
charge of the material, and the request may only be addressed when the next
version of the material is published.
The second difference is the shared content between the materials that allows
teachers to use contents created by other teachers to make their own materials. The
content link feature can also be used by teachers to cooperate in creating different
contents of the same material. They can work on different contents separately and
then put everything together linking the contents in the same material. However,
on static materials the contents are isolated and there is no database for the
contents themselves, so the contents of one material cannot be linked to other
282 G.M.T. Batista, M. Urata, and T. Yasuda

materials; in order to use the same content in another material, a copy of the
content must be inserted in the source of the other material, and this must be
repeated for every material that uses the content.
The third principal difference is the automatic update of the shared contents.
This feature helps the teaching materials to be edited or updated faster, as all the
materials that have a linked content get updated at once. Since static materials do
not support shared contents, every time a content is updated the copies present in
the other materials stay the same; to update them, a copy of the updated version of
the content must be inserted in place of the old one, on every material, one by one.
The features of the dynamic teaching materials allow the teaching materials to
evolve, adapting to the needs encountered by the teachers during classes. They
help the teachers work together by sharing content, and allow them to grow
together and share their updates faster, automatically and in realtime.

3 About the System

3.1 System Design


The system development had two main points. The first was to find a way to allow
the teachers themselves to create or edit the e-Learning teaching materials in a
viable time, so that the changes really reach the students who need them; for this,
a completely new interface was needed. The second point was the Dynamic
Teaching Materials System itself, how to handle all the needed features, like
realtime editing and multimedia, and at the same time make all the contents
linkable, so the teachers could share contents and help each other.
As the interface was the biggest problem for the teachers on the Project ELO
using Moodle, it was decided that all the necessary content editing tools would be
put on the new interface. During the development it was observed that many of the
teachers could not use the Moodle interface, but they could use other software like
Microsoft Word or Microsoft Power Point; so instead of making them learn
everything again from the beginning, it was decided to make the new interface
look like the interface of the software that they already knew how to work with,
but organized in a way that was more practical to them, so that they could have a
more intuitive experience even if they were using the system for the first time.
Because much of the software like Microsoft Word or Microsoft Power Point
has a WYSIWYG interface, another challenge to the system development was to
find a way to make a WYSIWYG interface to create or edit multimedia contents
that could also be interactive, which is a really important feature as indicated by
[3]. One important thing in this part of the system design was to talk to the
teachers and find out what kind of interactive features they needed and how they
expected to use those features. Almost all of the teachers had little experience
operating computers, so even if they wanted to make something really complicated
with the interaction features, they could not, because of this lack of user
experience. In fact, the system did not need more than basic interaction features
such as mouse clicks, drag and drop, and collision detection.

A content linkage feature was also created in the system. As the teachers
wanted to create a database of the contents used in the teaching materials, when a
content is stored in the database, the content link feature allows teachers to share
the contents, link other teachers’ contents to their own materials or make copies
and create new versions of the contents. Any content created can be used in any
teaching material in the system just by making a reference to the desired content's
ID number; this feature works not only with texts, images or videos, but can be
used with a whole slide with multiple contents or a whole folder of slides.
The structure of the Dynamic Teaching Material is basically a group of folders;
the folders are filled with slides and inside the slides are the contents like texts,
images and videos. Everything has its own ID number, so when the system is
accessed by a teacher, it loads the necessary group of folders based on the
teacher's ID. Then, when the teacher opens a folder, the system checks the ID
number of the folder to verify which slides' ID numbers are related to that folder
and loads the slides. When the system is loading a slide, the process
is the same as loading a folder: the system checks the slide ID number to verify
the ID numbers of the contents related to the slide and loads the contents based on
the number written in the database table.
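The folder-to-slide-to-content resolution described above might be sketched as follows; the dictionaries stand in for the database tables, and all table layouts, IDs and field names are hypothetical.

```python
# Sketch of the folder -> slide -> content lookup described in the text.
# Dictionaries play the role of database tables keyed by ID numbers;
# the real system queries a database instead. All names are hypothetical.

folders = {"f1": {"teacher_id": "t1"}}
slides = {"s1": {"folder_id": "f1"}, "s2": {"folder_id": "f1"}}
contents = {
    "c1": {"slide_id": "s1", "kind": "text", "value": "Konnichiwa"},
    "c2": {"slide_id": "s2", "kind": "image", "value": "greeting.png"},
}

def load_folders(teacher_id):
    # Load the group of folders based on the teacher's ID.
    return [fid for fid, f in folders.items() if f["teacher_id"] == teacher_id]

def load_slides(folder_id):
    # Find which slide IDs are related to this folder.
    return [sid for sid, s in slides.items() if s["folder_id"] == folder_id]

def load_contents(slide_id):
    # Same pattern one level down: contents related to the slide.
    return [c for c in contents.values() if c["slide_id"] == slide_id]

for fid in load_folders("t1"):
    for sid in load_slides(fid):
        for content in load_contents(sid):
            print(fid, sid, content["kind"], content["value"])
```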
In the system the contents are organized in packages called ‘Packy’. Each
Packy is a package with a maximum of ten contents plus one content that is used
only in interactive operations. This package organization of the contents is
necessary for the interaction features. When the teacher is creating a teaching
material, he/she can configure all the interactions of the content and save it in a
package. For example, suppose the teacher needs one image to change to another
when the student clicks on it. In this case the teacher puts the two images in the
same package and configures the change from one image to the other in the editing
menu. All the information about the type of interaction and the images' URLs in
the package are stored on the database as a content, and the package can be
loaded, edited and linked to any Dynamic Teaching Material as a single content.
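As an illustration, a 'Packy' record could look like the following sketch; the field names and the "click_swap" interaction label are hypothetical, chosen to mirror the image-swap example above, and only the ten-content limit comes from the text.

```python
# Hypothetical sketch of a 'Packy': a package of at most ten contents plus
# one content used only in interactive operations, stored together with its
# interaction type so the whole package can be linked as a single content.

MAX_CONTENTS = 10

def make_packy(content_urls, interaction=None, extra_content=None):
    """content_urls: list of URLs (max 10); interaction: e.g. 'click_swap'."""
    if len(content_urls) > MAX_CONTENTS:
        raise ValueError("a Packy holds at most ten contents")
    return {
        "contents": list(content_urls),
        "interaction": interaction,      # how the contents behave
        "extra_content": extra_content,  # used only in interactive operations
    }

# The example from the text: two images in one package, configured to
# change from one to the other on a mouse click.
packy = make_packy(["before.png", "after.png"], interaction="click_swap")
print(packy["interaction"])  # click_swap
```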

3.2 System Features


Because there were many problems with the Moodle’s interface, the Dynamic
Teaching Material System editing interface was created completely separately
from Moodle, but the two systems can still be used together. The main part of the
system was programmed in ActionScript 3 and compiled with the Flex SDK. The
system interface can run standalone with Adobe AIR, and can also run in the
web browser like any other Flash application. When running with Adobe AIR,
the interface behaves like a native application on the computer's OS, which makes
it run smoothly; some graphical features only work when running with Adobe
AIR, but all the functionality of the system is the same in both Adobe AIR and
the web browser.
A system diagram showing the connections between the technologies utilized
can be seen in Fig. 1. The system has a MySQL database with the tables
necessary to store the users' and contents' data. ActionScript 3 cannot
communicate directly with the MySQL database; however, it can communicate with

Fig. 1 System diagram showing the connections between the main interface and the other
parts of the system, and which technology was used to create each part.

PHP, so PHP was used to link the interface of the system to the database. The
file upload feature was also implemented in PHP to store the files on the server,
because this is not possible with ActionScript 3 alone. The system also has
a content search feature that can be used to search for images and videos. In the
case of searching for images, the system allows the teacher to use the Google
image search directly from the system interface and put the image in the teaching
material. In the case of searching for videos, the system allows the teacher to
search videos from YouTube and opens the YouTube player inside the teaching
material as a content that can also interact with other contents. In both cases the
search feature was created using the Google AS3 API.
Compared with other e-Learning systems, the Dynamic Teaching Materials
System has some features that make it more useful and faster. Almost
all of the e-Learning systems have an interface built focusing on the developer's
perspective, which is a big barrier to novice computer users, as stated in [4] and
[5]. One of the most important things about the system is the fact that the teachers
have everything they need in one place, on a single screen. All the creation,
editing, linking and configuration tools are on the same screen organized in
menus, and everything is online. The WYSIWYG interface allows the teacher to
open the teaching material and edit only the part that needs to be edited (Fig. 2);
the system has all the necessary features in the same place, so that the interface
used by the students to study, the interface used by the teachers to utilize the

Fig. 2 Context menu and Packy editing menu used by teachers to create and configure the
content of the Dynamic Teaching Materials.

teaching materials in class, and the interface used by the teachers to edit the
materials are all the same; the only difference is that the editing features and
menus are not enabled for the student. Thus, when a teacher creates a teaching
material, the contents are already inside the system during editing, so he/she
does not need to upload the teaching material later and can edit it in realtime
whenever they like. When a teaching material is finished it can be accessed
instantaneously by the students.
As stated in [6], “Novices tend to have limited knowledge and will often make
assumptions about what to do using other knowledge about similar situations.”
The system interface was made to be very similar to the computer OS interface, to
utilize the maximum of the user’s prior knowledge about operating a computer, so
they do not have to learn basic operations like opening files and folders with a double
click, and will have a more intuitive experience even if they are using the system
for the first time. The WYSIWYG interface for editing the teaching materials is
very important, not just because it is simpler, but because the whole system
was structured to work based on this interface. This kind of interface helps the
user get a response from the system more quickly, because the interface allows the
user to see the changes in realtime.

4 Tests and Interviews


Since the beginning of the Project ELO the teachers and tutors involved were
always working side by side with the system developer team. In the development
of the Dynamic Teaching Materials System there was a small difficulty due to the
distance: the teachers were in Brazil but the system was being developed in Japan.
However, the developer team and the teachers kept in contact as much as possible,
using many kinds of resources like Skype, Google Wave, e-mail and the forums of
the Project ELO site.
The Dynamic Teaching Materials System is now being used by 6 teachers in 6
lectures with approximately 50 students and 9 Dynamic Teaching Materials
created for the first three basic levels of the Japanese course. The feedback of the
teachers was used to evaluate the system. The teachers gave feedback about the
system functions, interface and what they needed the system to do. The teachers
also were observed by the developers to analyze how they worked with the
system, and the interface could then be adapted to the way they work. The system
was also tested by a designer and developer from CETEB, Technological
Education Center of Brasilia.
Some of the teachers were interviewed individually, and there were also meetings
with all the teachers and the developer team to discuss the system. Those
interviews and meetings enabled a much more profitable sharing of experiences
with the system than the feedback given by e-mail, and were very important to
the decisions made during the development of the system.
After the tests, the teachers were satisfied with the simplicity of the interface;
the search feature and the ability to insert content from the internet into the
materials from inside the system itself, even during classes, were found very
useful; and the possibility of putting all the materials inside the system could
spare students the expense of teaching materials. There were also some comments
about features that could be added to the system or some possible improvements,
like making the access to the Packy editing menu more simple, or adding an
internal content search feature.

5 Conclusion
The system has shown good results, allowing the teachers to create interactive
multimedia teaching materials even without knowing any programming language.
The dynamic teaching materials are created faster than the materials previously
created with Moodle and the other authoring software used by the teachers,
because of the WYSIWYG interface integrated directly into the system content
view screen. That the teachers create and edit the teaching materials by
themselves is also very important: the speed at which the teaching materials can
be updated makes the updates more efficient, and they reach the students who
really need them, making the teaching materials more appropriate for both
students and teachers.

The system still needs some improvements, such as an internal content search
engine, and improvements to the editing interface to allow teachers to really work
together on the same material at the same time on the same slide, communicating
and seeing what the others are doing in realtime. Another useful feature would be
the possibility of dragging content from a file folder on the computer directly to
the system interface, which would make uploading files from the computer to the
system faster and easier. At present the system only runs on computers such as
desktops and notebooks, but a mobile version that runs on tablets like iPads or
Android tablets should make the system more useful during classes, as a tablet
can be operated more easily than a PC; it would also let anyone access the system
and study or create a teaching material from anywhere, giving more freedom to
the users. Another important point is that those devices usually have cameras that
can help in the communication between users. As stated in [7] and [8], letting the
users use the system more freely and improving the communication features
should be useful: for example, this can allow different students to study in
different ways and let them participate and interact to create their own knowledge.
In the current version of the system, the basic functions that allow the creation
of the Dynamic Teaching Materials have been developed, allowing the teaching
materials to evolve. However, there are still many possibilities for improving the
system.
Acknowledgements. Part of this work was supported by Grants-in-Aid for Scientific
Research Japan.

References
[1] Grant, L., Facer, K., Owen, M., Sayers, S.: Opening Education: Social software and
learning. Futurelab United Kingdom (2006)
[2] Letramento digital através de narrativas de aprendizagem de língua inglesa,
http://www.veramenezes.com/narmult.mht
[3] Gillani, B.: Learning Theories and the Design of E-Learning Environments. University
Press of America United States (2003)
[4] Gestural Interfaces: A Step Backwards In Usability,
http://www.jnd.org/dn.mss/
gestural_interfaces_a_step_backwards_in_usability_6.html
[5] Natural User Interfaces Are Not Natural,
http://www.jnd.org/dn.mss/
natural_user_interfaces_are_not_natural.html
[6] Preece, J., Rogers, Y., Sharp, H.: Interaction Design: Beyond Human-Computer
Interaction, 2nd edn. John Wiley & Sons Ltd, England (2009)
[7] Blackwood, A., Anderson, P.: Mobile and PDA technologies and their future use in
education. JISC Technology and Standards Watch: 04-03 (2004)
[8] Chao, H., Wu, T.: Mobile e-Learning for Next Generation Communication
Environment. Journal of Distance Education Technologies 6(4), 1–13 (2008)
Generation Method of Multiple-Choice Cloze
Exercises in Computer-Support for
English-Grammar Learning

Ayse Saliha Sunar, Dai Inagi, Yuki Hayashi, and Toyohide Watanabe

Abstract. With many remarkable advances in technology, not only studying with a
tutor at school but also studying through a computer at home has become popular.
Intelligent Tutoring Systems (ITS) are one of the research fields which aim to
support individual learning intelligently. To provide the learning material of the
domain knowledge, in many ITS the learning materials are statically associated
with each other in advance and given to the student based on her/his
understanding state. Motivating students and making them more interested in the
learning content is the system's task in computer-supported systems. If students
study content which they are interested in, the learning activity becomes more
effective. Our research objective is to
construct a system which automatically generates multiple-choice cloze exercises
from text input by the student. We focus on supporting individual study of learn-
ing English grammar. In this paper, we propose a representation method of English
grammar by Part-Of-Speech (POS) tags and words, the calculation procedure for
estimating the understanding state of student in the student model, and the learning
strategy for generating the next exercise based on the student model.

1 Introduction
Knowing English has become one of the most important and essential issues in
everyone's life. It is necessary for many foreigners to develop themselves and
advance their careers. In order to learn English and check its effect, many people
prepare for English examinations. People prefer to study English through the computer by
Ayse Saliha Sunar · Yuki Hayashi · Toyohide Watanabe
Graduate School of Information Science, Nagoya University,
Furo-cho, Chikusa-ku, Nagoya, Japan
e-mail: {saliha,yhayashi,watanabe}@watanabe.ss.is.nagoya-u.ac.jp
Dai Inagi
Faculty of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
e-mail: inagi@watanabe.ss.is.nagoya-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 289–298.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
290 A.S. Sunar et al.

themselves in recent years. Intelligent Tutoring Systems (ITS) are a research
field which aims to support individual study, using Artificial Intelligence
technology. An ITS gives students appropriate learning content from the domain according
to their understanding [1]. Although ITS provides individual learning environments,
the systems mostly give the student a part of the same learning content. In the
domain knowledge, the learning content must be structured in such a way that it
can be easily managed in order to adapt learning to the student's understanding.
However, constructing the domain of English is quite hard [2]. The biggest
challenge in representing English is that English does not consist of particular
formulae and theorems as mathematics does. If English grammar were represented
as formulae, it would be easy to manage the domain knowledge. Kyriakou et al.
proposed a tool for managing domain knowledge and helping tutors in ITS [3].
Their system consists of three
components: knowledge concepts which are organized in network, course units, and
meta-description which is a data set of learning objects. This system describes not
only new metadata, but also its concepts and relations. Although this system
allows the tutor to manage the domain by creating, storing, viewing, and editing
the metadata, and can create the concept network of the domain, relation links,
contents and concepts still have to be defined manually; they are not derived
automatically. Faulhaber et al. constructed the web-based learning system named
ActiveMath for mathematics [4]. This system represents the learning content
(rule/concept) as nodes, and inter-node relations are dynamically extracted from
the domain. Even if relations are dynamically determined, this system must also
predefine a domain which
is assigned a difficulty level. However, if students study content which they are
interested in, in contrast to predefined learning content, the learning activity
would be more effective and students would be more motivated to study.
In this research, we aim to construct an ITS which automatically generates
multiple-choice cloze English grammar exercises from student’s input text based on
the understanding state of student. To automatically generate the English questions,
MAGIC (the Multiple-choice Automatic GeneratIon system for Cloze questions),
which generates multiple-choice cloze questions from an input English text, was
developed in our laboratory [5]. The MAGIC system only generates English
questions; it is not adapted as a learning system for English grammar. In this
paper, we propose mechanisms for estimating the student's understanding and for
generating exercises based on the student model, to be added to MAGIC. Since
grammar consists of rules, we propose to formulate English grammar rules by
Part-of-Speech (POS) tags and some words. Firstly, our
system generates a few exercises from the student's input text. After the student
answers the exercises, the understanding state of the student is estimated in the
student model based on the accuracy of the answers. Then, the next exercises are
generated from the newly-input text based on the understanding state of the
student. In this paper, we represent English grammar rules by POS tags and
words, and then discuss this formulation so as to be useful for our learning
strategy. We also discuss how to estimate the understanding state of the student
and generate the next exercises.

Fig. 1 Framework of ITS. The student interacts, through an interface, with the three
main components of the ITS framework: domain knowledge, learning strategy, and
student model.

2 Framework of Computer-Supported Learning System


ITS is a computer-supported learning system which consists of three main compo-
nents: namely, domain knowledge, student model, and learning strategy as shown
in Figure 1. Domain knowledge represents a learning content to study. ITS supports
their learning activities by managing the understanding states of students in the stu-
dent model. The student model manages learning records of the student, the learn-
ing progress, and the understanding states derived correspondingly from the domain
knowledge, and then the learning strategy module determines the next instructional
actions with respect to the student’s experiments [1].
The key idea of ITS is to dynamically provide a learning content which is appro-
priate for the student’s understanding. If the domain knowledge is defined properly,
ITS can accurately determine the understanding state of student. Also, the learning
strategy is applied based on the student model. In other words, the learning strategy
estimates the next content, which is defined in the domain knowledge, according to
the understanding state of the student. On the contrary, if the domain knowledge is
defined insufficiently, the system can neither estimate the understanding state of the
student in the student model nor generate appropriate learning content for the student.
If a system has these components, it can provide learning activity. Since MAGIC
only generates the English multiple-choice cloze questions, some mechanisms are
needed to modify MAGIC. In this research, the English grammar is defined in the
domain knowledge. After the student inputs the text, the system puts the appropriate
sentence in the domain knowledge. The student model is also inserted in MAGIC to
enhance it. The model supports estimating the understanding degree of the student
for each English grammar rule based on the student's answers. Then, the learning
strategy determines the next multiple-choice cloze exercises based on the student model.
If these main parts are constructed coherently, MAGIC can generate exercises which
are appropriate for the student.
Figure 2 illustrates a framework of our system for learning English grammar.
The student model, which is one of main components of ITS, should be added
to determine the understanding state of student. In MAGIC, after inputting the
text, the system attaches Penn Treebank II tags [6] to all words. Then, the system

Fig. 2 Framework of our system. The MAGIC pipeline processes the input text in steps
(extract sentences, select the unit for exercises, select the blank part, generate distracters,
output the exercises, evaluate the answer, display the explanation, and renew the student
model), connected to (a) the domain knowledge of grammar rules, (b) the student model
holding the understanding state of the student, and (c) the learning strategy.

performs three procedures: extracting sentences from texts which are appropriate for
multiple-choice cloze questions, determining blank part and generating distracters.
The system uses some methods to carry out these processes. In the first process,
Preference Learning is executed using words and POS tag information emerging in
the existing multiple-choice cloze questions to put the input sentences in order. In
order to make up this order, Ranking Voted Perceptron [7] is used. Ranking Voted
Perceptron calculates ranks for each input sentence according to their similarity to
the existing questions. As a result, the sentences which are appropriate to com-
pose a cloze question for an examination like TOEIC are ranked. In the next pro-
cesses, blank part and its distracters are estimated based on Conditional Random
Field (CRF). In the current MAGIC system, CRF is introduced to attach labels to
words of the sentence, and then a blank part is defined as the named entity in a
sequence of words and represented by IOB2 format [8]. Sequences of words, POS
tags and distracters with their named entities in the existing multiple-choice cloze
questions are learned. The word which has the largest marginal probability of some
particular tags is determined as a blank part. Its distracters are also determined based
on result of CRF.
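The blank-part selection rule (take the word with the largest marginal probability for the target tags) can be illustrated with a toy stand-in: the hard-coded marginals below replace the trained CRF, and the helper name is hypothetical.

```python
# Toy sketch of MAGIC's blank-part selection: the word with the largest
# marginal probability for the target tags becomes the blank.
# The marginals are hard-coded here; MAGIC obtains them from a trained CRF
# over IOB2-labelled sequences. The function name is hypothetical.

def select_blank(tokens, marginals):
    """tokens: words of a sentence; marginals: one probability per token."""
    best = max(range(len(tokens)), key=lambda i: marginals[i])
    cloze = tokens[:best] + ["____"] + tokens[best + 1:]
    return " ".join(cloze), tokens[best]

tokens = ["The", "house", "was", "built", "in", "1470", "by", "somebody"]
marginals = [0.01, 0.05, 0.30, 0.85, 0.02, 0.03, 0.60, 0.04]
question, answer = select_blank(tokens, marginals)
print(question)  # The house was ____ in 1470 by somebody
print(answer)    # built
```

The same argmax idea applies to distracter generation, except that candidates are ranked by the CRF result rather than chosen by the single largest marginal.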
In order to generate exercises based on the understanding degree of student, it is
needed to construct learning contents first. The domain knowledge which is shown
in Fig. 2(a) represents English grammar rule as divided units. In order to estimate the
understanding state of student for each unit, the student model is added in MAGIC
as shown in Fig. 2(b). To generate exercises which are suitable for the student the

Table 1 Example of English grammar rule representation

    Chapter                     Passive Voice
    Unit                        General structure of passive sentences
    Grammar rule                VB VBD by NP
    Candidate(s) of blank part  "VBD" or "by"
    Example sentence            The house was built in 1470 by somebody.

process of selecting blank part and generating distracters are altered. The learn-
ing strategy is applied based on the student model in Fig. 2(c). Blank part and its
distracters, hereby, are to determine the student model to generate appropriate exer-
cises. Then, the system outputs multiple exercises to the student. Finally, after the
student answered the exercises, our system evaluates the student’s answer and cor-
rects the student’s mistakes. Also, our system updates the student model according
to the accuracy of the answers before generating the new exercises.

3 Grammar Structure
To construct a system for learning English grammar, the domain knowledge of
English grammar should be defined so that the student model can be managed and
the learning strategy applied. We examined English textbooks to decide the structure
of English grammar. In general, the English grammar is divided into chapters and
units. Each chapter includes some units which are relevant to the chapter’s subject.
Each unit represents a single grammar rule. Since blank parts are selective part(s) of
exercises to assess the student’s knowledge of the grammar rules, we also examined
blank parts of the exercises on the English textbooks.
The results of our investigation show that most English grammar rules can be
represented as an ordering of parts of speech. In addition, certain specific words
can be as important as their part-of-speech type and their placement in the sentence.
In this research, we represent each English grammar rule by a composition of POS
tags and words. We determine candidates of the blank part for the grammar rules
while defining the grammar rules.
Table 1 shows an example of the representation of a grammar rule. The unit
represents the basic rule of the Passive Voice chapter. In this example, "VB" (base
form of verb) and "VBD" (past form of verb) are word-level POS tags, "by" is an
important word, and "NP" (noun phrase) is a phrase-level POS tag. For instance,
the usage of the past participle form of the verb (VBD) and the usage of the
preposition "by" are more characteristic parts in learning the passive voice than
other parts such as the subject of the sentence. If the student answers these parts
properly, our system can estimate that the student has knowledge of the passive voice.
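As a sketch, the rule in Table 1 can be encoded as an ordered pattern of POS tags and literal words, and matched against a tagged sentence as a subsequence. The tags below follow the paper's simplified scheme for the example sentence ("was" as VB, "built" as VBD), not necessarily actual Penn Treebank output, and the matcher is a simplification of the system's structure comparison.

```python
# Sketch of the rule representation in Table 1: a unit is an ordered pattern
# of POS tags and literal words, matched as a subsequence of a tagged
# sentence. Tags follow the paper's simplified scheme, not real tagger output.

passive_unit = {
    "chapter": "Passive Voice",
    "unit": "General structure of passive sentences",
    "pattern": ["VB", "VBD", "by", "NP"],   # POS tags plus the word "by"
    "blank_candidates": ["VBD", "by"],
}

def matches(pattern, tagged):
    """tagged: list of (word, tag); a pattern item matches a word or a tag."""
    it = iter(tagged)  # shared iterator enforces in-order (subsequence) match
    return all(any(item in (word, tag) for word, tag in it) for item in pattern)

tagged = [("The", "DT"), ("house", "NP"), ("was", "VB"), ("built", "VBD"),
          ("in", "IN"), ("1470", "CD"), ("by", "IN"), ("somebody", "NP")]
print(matches(passive_unit["pattern"], tagged))  # True
```

A sentence that matches a unit's pattern is paired with that unit; as the text notes, one sentence may match several units, and one of them is then chosen for exercise generation.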

4 Learning of English Grammar


4.1 Student Model
The student is supposed to understand a unit completely when s/he correctly
answers most of the exercises which are relevant to each defined blank part of the
grammar rule in the unit. To estimate the understanding state of the student for
each unit, the answer information of each blank part of the grammar rule is used.
Here, U is the set of all units, as shown in expression (1). A set B_i contains the
determined blank parts of grammar rule i, and the exercises relevant to each blank
part of unit i form the set E_i, as shown in expressions (2) and (3), respectively.
f(e_{i,j,k}) returns 1 if the student's answer to exercise k is correct; otherwise, f
returns 0. The number of correct answers for a blank part of unit i is counted with
the function f. Based on this, the understanding state d_i of unit i is calculated by
expression (5): d_i sums the rate of correct answers for each blank part over all
defined blank parts of the unit, and averages them.

U = {u_i : ∀i ∈ [1, n], i indicates the unit ID}                                  (1)

B_i = {b_{i,j} : ∀j ∈ [1, m], j indicates a determined blank part ID in unit u_i}  (2)

E_i = {e_{i,j,k} : ∀k, k indicates an exercise ID which belongs to b_{i,j} in unit u_i}  (3)

f(e_{i,j,k}) = 1  (if the answer to e_{i,j,k} ∈ E_i is correct)
             = 0  (otherwise)                                                      (4)

d_i = (1/|B_i|) Σ_{b_{i,j} ∈ B_i} (1/|E_{i,j}|) Σ_k f(e_{i,j,k}),                  (5)

where E_{i,j} ⊆ E_i denotes the exercises which belong to blank part b_{i,j}.

After receiving all of the student’s answers, the system evaluates the correctness of the
answers. Then, the system updates the understanding state of student based on the
checked answers.
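The update of the understanding state in expression (5) can be sketched as follows. This is a minimal illustration with an assumed data layout, not the authors' implementation:

```python
def understanding_state(answers):
    """Compute d_i of expression (5) for one unit.

    `answers` maps each blank part b_{i,j} of the unit to the list of
    0/1 correctness values f(e_{i,j,k}) of the exercises asked on it.
    """
    if not answers:
        return 0.0
    # Per-blank-part correct-answer rates, averaged over all blank parts.
    rates = [sum(fs) / len(fs) for fs in answers.values()]
    return sum(rates) / len(rates)

# Unit 4-5 of the worked example in Section 5: VBD 1 of 2, MD VB 2 of 3.
d = understanding_state({"VBD": [1, 0], "MD VB": [1, 1, 0]})
```

After each batch of checked answers, re-running this over the updated answer lists yields the new understanding state.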

4.2 Learning Strategy


The learning strategy is applied to select an appropriate exercise based on the un-
derstanding state of the student described in Section 4.1. While selecting rules
and deciding the blank part in the process of generating exercises, the learning
strategy is considered as shown in Fig. 2. In order to select the unit which has the suitable
grammar rule, extracted sentences are tagged by Penn Treebank II tags. If a sentence
includes the ordering of the POS tag(s) and important word(s) which are defined for
the grammar rule of the unit, we can say that the sentence has the grammar structure
of the unit which represents the grammar rule. In order to decide which suitable
grammar rules can be adapted to generate exercises from each sentence of input
text, the grammar structure of the sentence and the grammar structure of the rule
are compared. If the sentence includes the grammar rule, the sentence and the unit
are matched. In this case, one or more units can be matched with one sentence. An
exercise is generated from the sentence based on one of the grammar rules of the
matched unit. Thus, the learning strategy is applied to choose the appropriate unit
among the matched plural units to generate an exercise based on its grammar rule.

Generation Method of Multiple-Choice Cloze Exercises in Computer-Support 295

[Figure: flowchart of matching tagged sentences with units and selecting the appropriate one]

Fig. 3 Unit selection process
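The matching of a tagged sentence against a rule's sequence of POS tags and important words can be sketched as below. This is an illustrative reading: rule elements are assumed to match as an in-order (not necessarily contiguous) subsequence, which the text does not fully specify, and phrase-level tags such as NP are assumed to be already collapsed onto single tokens:

```python
def matches(rule, tagged):
    """Check whether a tagged sentence contains a rule's structure.

    `rule`   : list of required elements, each a POS/phrase tag or a
               literal important word.
    `tagged` : list of (word, tag) pairs for the sentence.
    """
    it = iter(tagged)
    for want in rule:
        for word, tag in it:
            # An element matches either the token's tag or its word.
            if want == tag or want.lower() == word.lower():
                break
        else:
            return False  # ran out of tokens before satisfying the rule
    return True

# Simplified tagging of the Section 5 example sentence.
sentence = [("If", "IN"), ("I", "NP"), ("were", "VBD"), (",", ","),
            ("I", "NP"), ("would", "MD"), ("ask", "VB")]
```

With this sketch, the rule "If NP VBD , NP MD VB" of unit 4-5 matches the example sentence, while a rule such as "TO VB" does not.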
Figure 3 shows a flowchart of the selection process. If a sentence matches
none of the units, no exercise is generated from this sentence. Then, the system
carries out the matching process for the next sentence of the input text, if one
exists. If a sentence matches only one unit, the exercise is generated from this
sentence based on the grammar rule of this unit. If the sentence matches more
than one unit, one of them is selected to generate the exercise from the sentence
based on the grammar rule of the selected unit. After the unit is selected, the
system checks whether there is another sentence in the input text or not. If there
is, the matching process proceeds. The learning strategy is applied based on
the understanding state of the student to select the unit. In order to judge whether a
student understands a unit or not, we introduce a threshold α which determines the
understanding state of the student for each unit. α ranges from 0 to 1 (0 ≤ α ≤ 1). If
d_i is larger than α, it is judged that the student has understood unit i. Otherwise, the
system judges that s/he has not understood unit i yet. In the selection process, the
unit which is the easiest one to advance to the threshold α is selected. If all of the
candidate units have already reached α, the one with the lowest state is selected.
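The selection among matched units can be sketched as follows; `select_unit` is a hypothetical helper reflecting the strategy just described:

```python
def select_unit(candidates, d, alpha=0.6):
    """Pick one unit among those matching a sentence.

    Prefers the not-yet-understood unit closest to the threshold (the
    "easiest one to advance to alpha"); if every candidate has already
    reached alpha, the one with the lowest state is chosen.  A sketch
    of the strategy in Section 4.2, not the authors' code.
    """
    below = [u for u in candidates if d.get(u, 0.0) < alpha]
    if below:
        return max(below, key=lambda u: d.get(u, 0.0))
    return min(candidates, key=lambda u: d.get(u, 0.0))

# With the understanding states of the Section 5 example, unit 4-5 wins.
unit = select_unit(["2-1", "4-5"], {"2-1": 0.75, "4-5": 0.58})
```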
After selecting the unit, the placement of the blank part is determined. In order to
decide the appropriate blank part of an exercise, our system considers the blank part
information of the matched grammar rule. In order to decide that a unit is completed
successfully, each blank part of the unit should be studied and completed successfully.
Therefore, at least one exercise should be asked on each blank part. Thus, the
percentage of correct answers per blank part is also calculated. The blank part which
has the lowest percentage is determined as a candidate for the blank part. Then, our
system first checks whether the results of the CRF are compatible with the decided
candidates for the blank part. If the highest result of the CRF is not one of the
candidates for the blank part of the unit, the next highest result is checked to
generate a suitable exercise.
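One plausible reading of this blank-part decision can be sketched as below; the data shapes and the names `correct_rate` and `crf_ranking` are illustrative assumptions:

```python
def choose_blank(correct_rate, crf_ranking):
    """Pick the blank part of the next exercise (illustrative sketch).

    `correct_rate` maps each blank-part candidate of the matched unit
    to the student's correct-answer percentage on it; `crf_ranking`
    lists blank-part candidates from the highest CRF result downwards.
    The highest CRF result is used only if it is one of the unit's
    defined candidates; otherwise the next highest is checked, falling
    back to the candidate with the lowest correct-answer rate.
    """
    for part in crf_ranking:
        if part in correct_rate:
            return part
    return min(correct_rate, key=correct_rate.get)
```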

5 Example
In our system, a student can input a long-length or short-length English text. In this
example, we assume that the student inputs one following sentence:

“If I were in your situation, I would ask to speak with the manager.”

First, this sentence is tagged by Penn Treebank II tags:

(TOP (S (SBAR (IN If) (S (NP (PRP I) ) (VP (VBD were) (PP (IN in) (NP
(PRP$ your) (NN situation) ) ) ) ) ) (, ,) (NP (PRP I) ) (VP (MD would) (VP
(VB ask) (S (VP (TO to) (VP (VB speak) (PP (IN with) (NP (DT the) (NN
manager) ) ) ) ) ) ) ) (. .) ) ).

Then, the system compares the grammar structure of the sentence to the structures
of the defined grammar rules. Currently, 65 grammar rules have been defined. This
sentence is matched with the unit 2-1 (unit 1 of chapter 2) and the unit 4-5 (unit
5 of chapter 4). Chapter 2 is on “Infinitive”, which has the basic rule
represented by “TO VB” and only one candidate for the blank part (TO VB);
chapter 4 is on “If Clauses”, which has the rule represented by “If NP VBD,
NP MD VB” and two candidates for the blank part (VBD or MD VB). These two
units are shown as follows:

Chapter 2 Infinitive
Unit 1 General rule of infinitive
Grammar rule TO VB
Candidate(s) of blank part “TO VB”

Chapter 4 If Clauses
Unit 5 Usage past tense in if clauses
Grammar rule If NP VBD, NP MD VB
Candidate(s) of blank part “VBD” or “MD VB”

Since this sentence is matched with two units, the system should decide on one of
them to generate the exercise, based on the student model. In this case, we set α = 0.6.
The number of exercises and the student’s correct answers are shown in Table 2.

Table 2 Number of the student’s answers


                               2-1      4-5
                               TO VB    VBD    MD VB
Number of correct answers      3        1      2
Number of all exercises        4        2      3
Percentage of correct answers  0.75     0.5    0.67

By using the expression (5), the understanding states of student for units d2−1
and d4−5 are calculated as shown in the expressions (6) and (7) respectively.
d_{2-1} = (1/1) · (3/4) = 0.75                                   (6)

d_{4-5} = (1/2) · (1/2 + 2/3) = 0.58                             (7)
The exercise is generated from the grammar rule of unit 4-5 according to the
learning strategy with α equal to 0.6, because d_{4-5} is smaller than the thresh-
old. After selecting the unit, our system estimates the blank part. Since the part of
VBD has the higher CRF result and the lower percentage of correct answers, the
part of VBD is estimated as the blank part of the exercise. Then distracters are gen-
erated for the estimated blank part based on the CRF. Finally, the system outputs the
exercise which was generated on the blank part (VBD) of the unit 4-5:

If I ( ) in your situation, I would ask to speak with the manager.

1. were (correct answer)


2. been
3. being
4. will be been
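The arithmetic of this example can be replayed directly:

```python
# Understanding states from Table 2, following expression (5).
d_21 = (1 / 1) * (3 / 4)          # unit 2-1: one blank part, 3 of 4 correct
d_45 = (1 / 2) * (1 / 2 + 2 / 3)  # unit 4-5: two blank-part rates averaged

alpha = 0.6
# d_45 falls below the threshold, so unit 4-5 is selected for the exercise.
selected = "4-5" if d_45 < alpha else "2-1"
```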

6 Conclusion
In this paper, we proposed a framework for the learning system which generates En-
glish multiple-choice cloze exercises from input text according to the understanding
state of the student. We propose a way to represent knowledge of English grammar. It
can be formulated by POS tags and words, because English grammar consists of
grammar rules. All grammar rules are defined as individual units. In order to select
an appropriate one of the units which are matched with the structure of the input sen-
tence, the learning strategy has been defined. Furthermore, the calculation method
for estimating the understanding state of student in the student model for each unit
is defined.

For our future work, we have to add new English grammar rules to the domain
knowledge. We need to confirm that sentences and units can be matched correctly.
In addition, the rules are defined independently at this time. For estimating the under-
standing state of a student more accurately, the relations among grammar rules may
need to be considered. In this paper, we focused on the process of generating ex-
ercises. We plan to consider a method for automatically providing explanations to
correct mistakes after evaluating the answers of the student.

References
1. Corbett, A.T., Koedinger, K.R., Anderson, J.R.: Handbook of Human-Computer Interac-
tion, pp. 849–874 (1997)
2. Heilman, M., Eskenazi, M.: Language Learning: Challenges for Intelligent Tutoring Sys-
tems. In: Proc. of the Workshop of Intelligent Tutoring Systems for Ill-Defined Domains,
8th International Conference on Intelligent Tutoring System, pp. 20–28 (2006)
3. Kyriakou, P., Hatzilygeroudis, I., Garofalakis, J.: A Tool for Managing Domain Knowl-
edge and Helping Tutors in Intelligent Tutoring Systems. Journal of Universal Computer
Science 16(19), 2841–2861 (2010)
4. Faulhaber, A., Melis, E.: An Efficient Student Model Based on Student Performance and
Metadata. In: 18th European Conference on Artificial Intelligent (ECAI 2008). Frontiers
in Artificial Intelligent and Applications (FAIA), vol. 178, pp. 276–280. IOS Press (2008)
5. Goto, T., Kojiri, T., Watanabe, T., Iwata, T., Yamada, T.: Automatic Generation System of
Multiple-Choice Cloze Question and its Evaluation. KM and E-Learning: An International
Journal 2(3), 210–224 (2010)
6. Tsuruoka, Y., Tsujii, J.: Bidirectional Inference with the Easiest-First Strategy for Tagging
Sequence Data. In: Proc. of HLT/EMNLP 2005, pp. 467–474 (2005)
7. Collins, M., Duffy, N.: New Ranking Algorithms for Parsing and Tagging: Kernels over
Discrete Structures, and the Voted Perceptron. In: Proc. of 40th Annual Meeting of the
Association for Computational Linguistics, pp. 263–270 (2002)
8. Sang, T.K., Veenstra, J.: Representing Text Chunks. In: Proc. of EACL 1999, pp. 173–179
(1999)
Genetic Ensemble Biased ARTMAP Method
of ECG-Based Emotion Classification

Chu Kiong Loo, Wei Shiung Liew, and M. Shohel Sayeed



Abstract. This study is an attempt to design an autonomous pattern
classification and recognition system for emotion recognition. The proposed sys-
tem utilizes Biased ARTMAP for pattern learning and classification. The
effectiveness of the ARTMAP’s learning process depends on the presentation
order of the training sequence, as well as on the strength of the biasing
parameter λ.
computed efficiently using a genetic permutation algorithm. The best combina-
tions were selected to train individual ARTMAPs as voting members, and the final
class predictions were determined using probabilistic ensemble voting strategy.
Classification performance can be improved by implementing a reliability thresh-
old for training data. Reliability metric for each training sample was computed
from the current voter output, and unreliable training samples were excluded from
the performance calculation. Individual emotional states are highly variable and
are subject to evolution from personal experiences. For this reason, the above sys-
tem is designed to be able to perform learning and classification in real-time to
account for inter-individual and intra-individual emotional drift over time.

1 Introduction
Several methods have been developed to build emotion recognition systems based on
facial and speech recognition [1] [2], as well as physiological signal measurements

Chu Kiong Loo ⋅ Wei Shiung Liew



Faculty of Computer Science and Information Technology


University of Malaya
Kuala Lumpur, Malaysia
e-mail: ckloo.um@gmail.com, liew.wei.shiung@gmail.com
M. Shohel Sayeed
Faculty of Information Science and Technology
Multimedia University
Melaka, Malaysia
e-mail: shohel.sayeed@mmu.edu.my

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 299–306.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
300 C.K. Loo, W.S. Liew, and M. Shohel Sayeed

[3] [4] [5]. The main advantage of physiological measurements over facial and
speech recognition is the practicality of implementing the monitoring systems.
Physiological signal monitoring has the benefit of a constant and robust signal
recording, especially with the development of portable biosensors that are also un-
obtrusive for daily activities.
A study by Zhong et al. [6] analyzes the nonlinear components of heart-rate dy-
namics caused by the two main branches of the autonomic nervous system (ANS).
Together with another study [7], these methods can be used to examine the fluctua-
tions of the ANS. Principal dynamic modes (PDM) allow for nonlinear analysis
of the separated dynamics in the ECG signal, as well as clear separation between
the contributions of the two ANS branches.
Individual emotion states are subject to variations due to external and internal
influences. This emotional drift requires that any autonomous emotion classifica-
tion system be regularly updated with its users’ current physiological-emotion
data. The system must be capable of incorporating new learning patterns while re-
taining previous knowledge without performing the entire learning sequence. This
“stability vs. plasticity” dilemma can be minimized by using adaptive resonance
theory (ART) for pattern learning and classification.
ART-based neural networks were developed as a model of human cognitive in-
formation processing. During learning or training, certain input sequences with a
specific featural attention can distort the system’s memory and reduce its classifi-
cation accuracy. Biased ARTMAP [9] solves the problem of overemphasis on ear-
ly critical features by biasing attention away from previously attended features
when the system makes a predictive error. The strength of the biasing is controlled
by an attention parameter, λ.
Using Biased ARTMAP for pattern recognition reduces the number of vari-
ables which determine the system’s performance to two factors: the attention
parameter λ, and the sequence of the training data presentation. Optimum combi-
nations of λ and training data sequence can be computed efficiently by implement-
ing a genetic permutation method [10] to “evolve” the optimal combinations over
several generations.
To further improve the classification accuracy, a voting strategy is used to de-
termine the final class predictions. The proposed voting strategy [11] calculates
recognition rates of plurality voting techniques while considering the system's
measure of reliability, that is, the probability of a decision to be classified cor-
rectly given a specific input pattern. Using the reliability metric to describe a
given training data can reduce the classification error by rejecting suspicious train-
ing data which do not meet a minimum reliability requirement [12].
The final incarnation of the classification system will be a prototype emotion
recognition system that can be customized for individual by continuous feedback
of ECG measurements into the system to improve predictive accuracy of the indi-
vidual’s emotions. Constant online learning will generate enough data for the
system to adapt to its user’s emotional drift.
Genetic Ensemble Biased ARTMAP Method 301

Fig. 1.1 Block diagram of the Genetic Ensemble Biased ARTMAP system

2 Genetic Ensemble Biased ARTMAP


2.1 Genetic Permutation for Optimal Pattern Ordering
20 chromosomes were generated for the initial population. Each chromosome con-
sists of a number of genes equal to the number of feature patterns, N, in each train-
ing sample. Thus, the gene arrangement for each chromosome is representative of
a single data presentation sequence of the given data set.
To calculate the fitness value of each chromosome, a single-voter Biased
ARTMAP was trained and tested with 5-fold cross-validation. Fitness value of the
chromosome was calculated as the percentage of correctly classified patterns over
the total number of tested patterns. Chromosomes were then sorted according to
fitness, and half of the least fit chromosomes were discarded.
Reproduction was performed to repopulate the chromosome pool and replace
the discarded chromosomes with offspring of randomly-chosen fit parents. Muta-
tion was also applied to randomly selected offspring to prevent early convergence.
The process of fitness testing and selection, mating, and mutation ensured that
each successive generated population of chromosomes will have a higher average
fitness than the previous generation.
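The procedure above can be sketched as follows; the crossover and mutation operators shown here are generic permutation operators, not necessarily those used in the paper, and `fitness` stands in for the cross-validated accuracy of a single-voter ARTMAP:

```python
import random

def evolve(fitness, n_items, pop=20, generations=20, seed=0):
    """Genetic search over training-pattern orderings (Section 2.1 sketch).

    Each chromosome is a permutation of pattern indices.  Per the text:
    keep the fitter half, refill by crossover of random fit parents,
    and mutate some offspring to avoid early convergence.
    """
    rng = random.Random(seed)
    popn = [rng.sample(range(n_items), n_items) for _ in range(pop)]
    for _ in range(generations):
        popn.sort(key=fitness, reverse=True)
        fit = popn[: pop // 2]                     # discard the weaker half
        children = []
        while len(fit) + len(children) < pop:
            a, b = rng.sample(fit, 2)
            cut = rng.randrange(1, n_items)
            # Order crossover: prefix of a, remaining genes in b's order.
            child = a[:cut] + [g for g in b if g not in a[:cut]]
            if rng.random() < 0.2:                 # mutation: swap two genes
                i, j = rng.sample(range(n_items), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        popn = fit + children
    return max(popn, key=fitness)
```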

2.2 Biased ARTMAP Theory


ART-based neural networks model real-time prediction, searching, learning, and
recognition, capable of learning stable recognition categories in response to arbi-
trary input sequences. One example of the ART learning system is the ARTMAP, a
hierarchical network architecture that can self-organize stable categorical mappings
between m-dimensional input vectors and n-dimensional output vectors [13].

The novelty of the ARTMAP’s learning strategy on attended critical feature
patterns possesses a design flaw which presents itself during online fast learning.
Extensive testing show that certain input sequences with a strong featural attention
can distort the learning process and affect the system’s testing accuracy.
This flaw is addressed by adding a module to the ARTMAP system to selec-
tively bias previously attended features whenever the system makes a predictive
error. Biasing the input pattern to favour previously inactive nodes implements
search by allowing the network to activate new nodes, and biases the system
against reselecting the category node which had just produced the predictive error.

2.3 Probabilistic Voting


The probabilistic voting strategy was proposed as an alternative to majority voting
[11]. In general, the error rate of the combination system was minimized by
choosing the classification category with the largest a posteriori probability ac-
cording to Bayes’ rule, while all other classes have equal probability of being cho-
sen in case of incorrect classification. Each classifier can have a different weight
and each class has a constant representing its a priori probability.
A study by Loo and Rao [12] implements a method in extension of the above to
measure the reliability of a class prediction computed from the probabilistic voting
results. The desired reliability of a classification system can be enforced by requir-
ing that each and every input object’s winning class to have at least r more votes
than the closest competing class, failure of which the classification of the input
object is rejected due to unreliability of the prediction.
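A minimal sketch of such margin-based rejection follows; raw vote counts stand in for the weighted a posteriori probabilities of [11], and the margin `r` is expressed here as an absolute vote difference rather than the paper's probabilistic threshold:

```python
from collections import Counter

def vote(predictions, r=0):
    """Plurality voting with a rejection margin (illustrative sketch).

    `predictions` holds one class label per voting member.  The winning
    class must have at least `r` more votes than the closest competing
    class; otherwise the prediction is rejected (None), mirroring the
    reliability rule of [12].
    """
    ranked = Counter(predictions).most_common()
    top_votes = ranked[0][1]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    if top_votes - runner_up < r:
        return None  # rejected as unreliable
    return ranked[0][0]
```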

3 Experiment
The performance of the Genetic Ensemble ARTMAP system was tested using several
datasets from the UCI Machine Learning Repository [15]. The tested datasets were
Dermatology, Glass, Hepato, and Wine, chosen for non-binary classification. Optimi-
zation was performed for each data set using the same methods outlined above. The
resultant pool of 220 potential training sequences were then used for training a single-
voter, five-voter, and ten-voter system. In addition, the 220 training sequences were
used to generate a bootstrapped mean with 1000 resamplings and 95% confidence
interval, as a representation of the overall classification accuracy of each data set.
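The bootstrap aggregation can be sketched as below; this is a generic percentile bootstrap, and the paper's exact procedure may differ:

```python
import random

def bootstrap_mean(values, resamples=1000, seed=0):
    """Bootstrapped mean with a 95% percentile interval (sketch).

    Resample the accuracy scores with replacement, average each
    resample, and read the 2.5th/97.5th percentiles of the means.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(resamples)
    )
    low = means[int(0.025 * resamples)]
    high = means[int(0.975 * resamples) - 1]
    return low, sum(means) / resamples, high
```

Applied to the per-sequence accuracies of the 220 training sequences, this yields the Low/Mean/High columns of a table like Table 3.1.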

Table 3.1 Prediction accuracy of bootstrapped mean of the genetic optimized population,
and the probabilistic voting system

         Bootstrapped population       Probabilistic ensemble voting
         Low     Mean    High          1-voter   5-voters   10-voters
Derm     93.84   94.00   94.16         97.74     96.05      96.05
Glass    67.85   68.48   69.07         82.38     82.38      76.19
Hepato   70.05   70.17   70.29         72.52     70.65      69.90
Wine     89.56   89.71   89.86         92.57     90.28      89.71



The obtained results, when compared to similar literature using other pattern
classification methods with the same data sets, show comparable performance of
the Genetic Ensemble Biased ARTMAP with other contemporary methods.
The system is then tested using a database of physiological signals collected by
Wagner et al. [3]. The data set consists of 100 ECG samples divided into four
emotion classes. Feature extraction was performed using an algorithm designed by
Wagner et al. [14]. A total of 106 ECG features were extracted from each
recording, including several features not available in the original toolbox
algorithm. The features of principal dynamic modes [6] [7] were included to
provide nonlinear analysis to the overall feature set.
A genetic optimization algorithm was employed to obtain optimal training se-
quences for the data set for each value of λ from 0 to 10. The genetic optimization
process was iterated for 20 generations, and then repeated with another random
population of chromosomes for a different value of λ. A total of 220 chromosomes
were generated from the optimization exercise and the chromosomes with the best
fitness were chosen to train a Biased ARTMAP each. A probabilistic voting strat-
egy was then used to determine the final class prediction for any given data input.
For each value of λ, 220 training sequences were generated using the genetic op-
timization algorithm. The predictive accuracy of each individual training sequence
was obtained and used to generate a bootstrapped mean as a statistical aggre-
gate of the entire population’s predictive accuracy. Results show little distinction in
predictive accuracy when different values of λ were used. All bootstrapped mean
results were clustered around 65-67% accuracy, while the individual predictive
accuracies range from 54-78% prediction rate.
One hypothesis is that genetic ordering compensation inadvertently solves the
problem of early featural distortion which the Biased ARTMAP was designed to
solve. Nevertheless, the results were obtained from offline learning, and the bias-
ing technique will be more useful during online learning.
A probabilistic ensemble voting system was applied, in which N voters were
individually trained by N of the best training sequences from the combined popu-
lation of 220 chromosomes. Testing was performed on the voting system based on
probabilistic majority rules to determine the final class prediction of the test data.
Testing was repeated using a reliability metric to evaluate each class prediction.
Class predictions which did not meet the reliability threshold were removed from
the final accuracy calculation.

Table 3.2 Classification performance of Fuzzy ARTMAP (λ = 0) with probabilistic ensem-
ble voting and reliability threshold

Voters \ Reliability   R=0        R=0.5      R=0.9       R=0.99
1                      76.00 (0)  76.00 (0)  NaN (100)   NaN (100)
3                      76.00 (0)  76.76 (1)  85.13 (26)  85.13 (26)
5                      70.00 (0)  70.40 (2)  77.64 (15)  78.31 (17)
7                      71.00 (0)  71.42 (2)  76.13 (12)  78.57 (16)
10                     73.00 (0)  74.22 (3)  73.95 (4)   73.33 (10)

Table 3.3 Classification performance of Biased ARTMAP (λ = 0:10) with probabilistic en-
semble voting and reliability threshold

Voters \ Reliability   R=0        R=0.5      R=0.9       R=0.99
1                      78.00 (0)  78.00 (0)  NaN (100)   NaN (100)
3                      79.00 (0)  79.00 (0)  87.17 (22)  87.17 (22)
5                      79.00 (0)  79.38 (3)  84.44 (10)  84.88 (14)
7                      75.00 (0)  74.48 (2)  77.77 (10)  78.65 (11)
10                     79.00 (0)  79.78 (6)  80.64 (7)   80.89 (11)

The number in brackets represents the percentage of class predictions which
were rejected due to low reliability. In particular, predictions from a voting system
with few voting members are considered less reliable due to lack of information
compared with systems with more voting members. However, this experiment also
indicates that while predictive accuracy increased when a more stringent reliability
threshold was applied, increasing the number of voters did not elicit an improve-
ment. This may be explained by the method in which each voter was trained. Each
additional voter besides the first was trained using training sequences which were
increasingly less accurate, effectively affecting the system’s predictive accuracy
by adding an increasing amount of noisy data. Even so, each additional voter
served to contribute additional information into the ensemble classifier by improv-
ing recognition rates of reliable training data.
The above results were then compared against similar pattern classification me-
thods: linear discriminant analysis (LDA), k-nearest neighbor (kNN), and multi-
layer perceptron (MLP). For kNN, a series of training and testing was performed
for a range of values for k, in the range [1, 10]. A bootstrapped mean was gener-
ated from the results. For MLP, the main initial network parameters are the num-
ber of hidden layers (set to 9), the rate of learning (set to 1), and the number of
training iterations (set to 100). For Fuzzy ARTMAP and Biased ARTMAP, the re-
sults were using the classification performance from the best combination of voter
ensemble, reliability threshold, and training sequence.

Table 3.4 Comparison of pattern classification methods

Classification method               Predictive accuracy (%)
Linear discriminant analysis        66.00
Kth nearest neighbor [k = 1:10]     72.00
Multilayer perceptron               83.00
Genetic Ensemble Fuzzy ARTMAP       85.13 (3-voter, 90% reliability)
Genetic Ensemble Biased ARTMAP      87.17 (3-voter, 90% reliability)

Both ARTMAPs show comparable classification performance with the multi-
layer perceptron (MLP). However, ARTMAPs have several distinct advantages
over the MLP classification method, including the ability for incremental learning
to evolve the classification system over time, and a faster convergence during
training and testing.

4 Conclusion
From the experiment, several conclusions can be drawn. The genetic optimization
algorithm is an effective method for training and testing an ARTMAP system for
pattern learning and classification. When combined with Biased ARTMAP, the
genetic optimization method rendered the biasing technique redundant. In addi-
tion, a more effective voter selection method will be required, as the current me-
thod reduces the predictive accuracy of the system with each additional voter add-
ed. Implementing a reliability threshold allows a slight increase in classification
accuracy by rejecting class predictions which do not meet the minimum consensus
among voting members, as opposed to simple majority voting. Overall, the Ge-
netic Ensemble Biased ARTMAP has comparable pattern prediction capability as
compared with other pattern classification methods such as LDA, kNN, and MLP.
The system’s features can be further improved based on the results from this
study.

References
[1] De Silva, L.C., Miyasato, T., Nakatsu, R.: Facial emotion recognition using multimo-
dal information. In: Proceedings on International Conference on Information, Com-
munications, and Signal Processing, vol. 1, pp. 397–401 (1997)
[2] Busso, C., Deng, Z., Yildrim, S., Bulut, M., Lee, C.M., Kazemzadeh, A., Lee, S.,
Neumann, U., Narayanan, S.: Analysis of emotion recognition using facial expres-
sions and multimodal information. In: Proceedings of the 6th International Confer-
ence on Multimodal Interfaces, pp. 205–211 (2004)
[3] Wagner, J., Kim, J., Andre, E.: From physiological signals to emotion: Implementing
and comparing selected methods for feature extraction and classification. In: IEEE In-
ternational Conference on Multimedia and Expo, pp. 940–943 (2005)
[4] Kim, K.H., Bang, S.W., Kim, S.R.: Emotion recognition system using short-term
monitoring of physiological signals. Medical and Biological Engineering and Com-
puting 42(3), 419–427 (2004)
[5] Mandryk, R.L., Atkins, M.S.: A fuzzy physiological approach for continuously mod-
eling emotion during interaction with play technologies. International Journal of Hu-
man-Computer Studies 65(4), 329–347 (2007)
[6] Zhong, Y., Wang, H., Ju, K.H., Jan, K.M., Chon, K.H.: Nonlinear analysis of the
separate contributions of autonomics nervous systems to heart rate variability using
principal dynamic modes. IEEE Transactions on Biomedical Engineering 51(2), 255–
262 (2004)
[7] Choi, J., Gutierrez-Osuna, R.: Using heart rate monitors to detect mental stress. In:
6th International Workshop on Wearable and Implantable Body Sensor Networks, pp.
219–223 (2009)
[8] Plutchik, R.: The nature of emotions. American Scientist (2001)
[9] Carpenter, G.A., Gaddam, S.C.: Biased ART: A neural architecture that shifts atten-
tion toward previously disregarded features following an incorrect prediction. Neural
Networks 23(3), 435–451 (2010)

[10] Palaniappan, R., Eswaran, C.: Using genetic algorithm to select the presentation order
of training patterns that improves simplified fuzzy ARTMAP classification perform-
ance. Applied Soft Computing 9(1), 100–106 (2009)
[11] Lin, X., Yacoub, S., Burns, J., Simske, S.: Performance analysis of pattern classifier
combination by plurality voting. Pattern Recognition Letters 24, 1959–1969 (2003)
[12] Loo, C.K., Rao, M.V.C.: Accurate and reliable diagnosis and classification using
probabilistic ensemble simplified fuzzy ARTMAP. IEEE Transactions on Knowledge
and Data Engineering 17(11), 1589–1593 (2005)
[13] Carpenter, G.A., Grossberg, S., Reynolds, J.H.: ARTMAP: A self-organizing neural
network architecture for fast supervised learning and pattern recognition. Neural Net-
works 4(5), 565–588 (1991)
[14] Wagner, J.: The Augsburg Biosignal Toolbox (2009),
http://www.informatik.uni-augsburg.de/
en/chairs/hcm/projects/aubt/
(retrieved June 29, 2011)
[15] Frank, A., Asuncion, A.: UCI Machine Learning Repository. Irvine, CA: University of
California, School of Information and Computer Science (2010),
http://archive.ics.uci.edu/ml (retrieved November 2011)
Honey Bee Optimization Based on Mimicry
of Threshold Regulation in Honey Bee Foraging

Maki Furukawa and Yasuhiro Suzuki*

Abstract. Honey bees correctly allocate their work force to nectar sources using
the “waggle dance”. In addition, they can determine the necessity for a nectar
source. Thus, they possess a value threshold for a nectar source and they can col-
lectively regulate it. Based on the mimicry of the threshold regulation used in ho-
ney bee foraging, we are developing a system that allows agents to determine
whether their own solution is worth communicating to other agents. We propose a
novel bio-inspired optimization algorithm, honey bee optimization (HBO). HBO
is a multi-agent system based on the foraging activities of honey bees. To test the
characteristics of HBO, we applied it to the travelling salesperson problem (TSP).

1 Introduction

One of the most familiar social insects in the world is the honey bee. Work force
allocation by foraging honey bees is organized by the "waggle dance", which was
first identified by von Frisch[1]. The foraging activities of honey bees are well-
organized systems. Lucic and Teodorovic proposed the "Bee System (BS, an ar-
tificial bee swarm)" to solve the travelling salesman problem (TSP) in 2003[2]. Li-
Pei Wong et al. proposed bee colony optimization (BCO) for more general prob-
lem solving in 2008 [3, 4]. These algorithms are multi-agent systems based on the
“waggle dance”. To improve solutions efficiently, these optimization algorithms
control two functions. Because it is a type of roulette-wheel selection, the waggle
dance is a function of exploration. The function of exploitation inspired by honey
bee foraging is also needed.
In the bee hive, bees adjust the foraging criteria that individuals use to evaluate
the worth of a newly located nectar source. They automatically adjust their criteria

* Maki Furukawa ⋅ Yasuhiro Suzuki
Department of Complex Systems Science, Graduate School of Information Science,
Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
e-mail: furukawa.maki@d.mbox.nagoya-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 307–316.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

and they change nectar sources to maximize nectar collection with minimal effort. This mechanism is not widely known, and it has not been adopted in optimization algorithms.
We propose the honey bee optimization (HBO) algorithm, which uses a threshold regulation system based on honey bee foraging. Each agent can determine whether it should communicate its solution to other agents. This method allows agents to control the overall variety of solutions.
We applied HBO to the travelling salesperson problem (TSP). First, we describe honey bee foraging. We then define the threshold and give agents individual control over threshold regulation within the multi-agent system. Finally, we compare the results obtained with fixed and regulated thresholds, before discussing our findings.

2 Honey Bee Foraging


Honey bees collect nectar from nectar sources such as flower patches located
around the hive. To collect nectar efficiently, bees collect from rewarding nectar
sources and abandon less rewarding sources. To avoid convergence on the most
useful source, they allocate their workforce as detailed below.

a) Optimization of work force allocation


Bees appear to comprehensively evaluate and compare nectar sources. However,
each bee chooses one source, gathers nectar, and evaluates that source according
to its own foraging criteria.
Each bee evaluates a source it has found based on a few factors, including its distance from the hive, the concentration of its nectar, etc. When the importance of the source is higher than the bee's threshold for foraging, the bee advertises the source position to other bees using a recruitment dance known as the waggle dance. The higher the importance of the source, the longer the bee dances. If the source is not rewarding, the bee abandons it and randomly follows one of the dancers.
Each bee has a different threshold, but the distribution of foraging bees is determined by the importance of the nectar sources when sufficient bees are foraging.

b) Adjustment of the threshold


The individual thresholds for foraging vary. This avoids convergence on the most
rewarding source. However, if there is a very rewarding source with sufficient
nectar, it is beneficial for them to rush to the source and collect it in large
amounts. Thus, bees can adjust their foraging criteria in response to environmental
and domestic change.
There are two types of honey bees in the hive: foragers and receivers. The receivers collect nectar from foragers and process it in the hive. After foragers collect nectar and return to the hive, they search for an available receiver. If a forager easily finds a free receiver, there are more receivers than foragers; in this case, the forager should advertise the food source and collect more nectar, so the threshold should be lower. If not, bees will abandon

sources too often. This is not a well-known system and it requires different types
of bees, so it has not been used in bee-inspired optimization algorithms.

3 Honey Bee Optimization


We propose the honey bee optimization (HBO) algorithm, which uses a threshold
regulating system to mimic the honey bees' foraging criteria.
HBO is an optimization algorithm that uses a multi-agent system. Each agent
has a threshold, which it uses to determine whether it should tell another agent its
own solution. Via communication, each agent can adjust its threshold.
The main components of HBO are:
1. Agents imitating honey bee foragers
2. The hive
3. Nectar sources
A nectar source is a partial candidate solution that is communicated to other agents via the waggle dance. After foraging, agents return to the hive and dance for the nearest agents, i.e., those that arrive at the hive at the same time.
The agents are split into two groups, elite agents and normal agents, depending on the threshold. Agents that exceed the threshold with a good solution become elite agents and advertise their solutions to the normal agents, who abandon their own solutions. The agents then regulate their thresholds based on the number of nearby elite agents.
An outline of the HBO algorithm is shown below.

1 Initialize
1.1 Set the agents, the nectar sources and the hive in the field
1.2 Supply every agent with one solution
2 Foraging (loop until a terminal condition is satisfied)
2.1 Agents move and collect nectar
2.2 Agents return to the hive
3 Evaluation of the nectar source
- Each agent determines whether to advertise or abandon
4 Communication (improvement of the solution)
- Elite agents: advertise the nectar source
- Normal agents: abandon the nectar source and acquire a new source from
one of the elite agents
5 Threshold regulation
6 Check the terminal condition
- Whether the terminal number of iterations is completed
- End or go back to 2
7 Output the best solution and end
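This outline can be sketched as a minimal Python skeleton. The paper's implementation was in Java, and every data structure and function name below (the agent dicts, `improve`, `regulate_threshold`) is an illustrative assumption, not the authors' code:

```python
import random

def hbo(agents, evaluate, improve, regulate_threshold, max_iter=100):
    """Skeleton of the HBO loop outlined above (steps 2-7).

    agents: list of dicts, each holding a candidate 'solution' and an
            individual threshold indicator 'r'
    evaluate: cost of a solution (lower is better, e.g. tour length)
    improve: lets a normal agent adopt part of an elite agent's solution
    regulate_threshold: per-agent threshold regulation (step 5, Sec. 3.2)
    """
    best = min((a["solution"] for a in agents), key=evaluate)
    for _ in range(max_iter):                      # step 2: foraging loop
        costs = [evaluate(a["solution"]) for a in agents]
        lmin, lave = min(costs), sum(costs) / len(costs)
        for a, c in zip(agents, costs):            # step 3: evaluation
            a["elite"] = c <= lmin + (lave - lmin) * a["r"]
        elites = [a for a in agents if a["elite"]]
        for a in agents:                           # step 4: communication
            if not a["elite"] and elites:
                a["solution"] = improve(a["solution"], random.choice(elites))
        for a in agents:                           # step 5: regulation
            regulate_threshold(a, len(elites))
        cand = min((a["solution"] for a in agents), key=evaluate)
        if evaluate(cand) < evaluate(best):
            best = cand
    return best                                    # step 7
```

The elite test already uses the threshold form introduced in Sec. 3.2; the regulation callback is left abstract here.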

3.1 Applying HBO to TSP


In HBO, the tour from city i to city j is described as (i, j) and the city list as {…, i, j, …}. Each agent has a city list as its solution, which is generated randomly. The shortest tour is the optimal tour.
Steps 3 and 4 of the outline are specialized as follows.

3 Each agent calculates the length of its tour
- Agents whose tour is shorter than the threshold become elite agents
4 Improvement of the city list
- Normal agents change their city lists according to the elite agents' dances

The agents imitate the bee's waggle dance: an elite agent with a shorter next tour leg is more likely to be selected. The dance probability is determined using Eq. (1):

Pa = (1 / La,(i,next)) / Σe=1..Nelite (1 / Le,(i,next)) (1)

Pa: the dance probability of elite agent a, i.e., the likelihood of it being selected by normal agents
La,(i,next): the distance travelled by agent a from the current city i to its next city
Nelite: the number of elite agents
To improve the solutions, we need two groups of agents. A group of elite agents is
known as group E while the group of normal agents is known as group N.
As an example, Fig. 1 shows the improvement of the solution at city 1 by a normal agent N0 with {…,1,3,…} in its city list. N0 will move to city 3 after city 1. When N0 arrives at city 1, it randomly selects one agent from group E in the same city. Because elite agents E0 ~ E4 were in city 1, N0 selected E1 with {…,1,2,…}. The city list of N0 changed from {…,1,3,…} to {…,1,2,3,…}: the tour parts …(1,3)…(a,2)(2,b)… became …(1,2)(2,3)…(a,b)…, where a and b are the arbitrary cities before and after city 2. N0 then moved to city 2.
Compared with its previous status, the tour length of N0 was increased by the lengths of (1,2), (2,3) and (a,b), and decreased by the lengths of (1,3), (a,2) and (2,b). If the decrease exceeds the increase, N0 may become an elite agent in the next city and advertise other parts of its tour. In this way, all normal agents select elite agents and modify their city lists. If some normal agents succeed in improving their city lists, other normal agents will refer to their tours and find shorter ones.
The number of normal agents moving to each city increases in proportion to the number of elite agents advertising that city and to their dance probabilities. Agents thus perform a local search; this partial improvement does not require normal agents to hold an optimal tour.
Elite agents move to the next city together with normal agents, but they do not change their city lists.

Fig. 1 Improvement of the visited cities order of a normal agent


N0: normal agent; E0,…, E4: elite agents
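The list-splice operation in the example above can be sketched as follows. This is a hypothetical reading of the operation (the function name is ours): the adopted city is removed from its old position and reinserted directly after the current city.

```python
def splice_city(tour, current, inserted):
    """Adopt an elite agent's edge (current, inserted): remove `inserted`
    from its old position and place it right after `current`, so that
    (1,3)...(a,2)(2,b) becomes (1,2)(2,3)...(a,b) as in the example."""
    t = [c for c in tour if c != inserted]
    i = t.index(current)
    return t[:i + 1] + [inserted] + t[i + 1:]
```

For instance, splicing city 2 after city 1 in the tour [0, 1, 3, 5, 2, 4] removes the edges (1,3), (5,2), (2,4) and creates (1,2), (2,3), (5,4).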

3.2 Definition of the Threshold


We introduce a threshold to select agents with shorter tours, using Eq. (2). The threshold, denoted T, is constructed from Lmin, Lave and r. Lmin is the shortest tour length among all agents (not necessarily an optimal tour length), Lave is the average tour length of all agents, and the parameter r is an indicator that determines the threshold: T equals Lmin plus r times the difference between Lave and Lmin. Agents with tour lengths shorter than T become elite agents.
T: the threshold for selecting elite agents
L: tour length of an agent
Lmin: the shortest L of all agents (not the optimal tour length)
Lave: the average L of all agents
r: indicator for calculating the threshold

T = Lmin + (Lave − Lmin) × r (2)


r is an indicator of the threshold value, which affects the number of elite agents. When r is small, the solution converges rapidly because there are few elite agents to communicate with normal agents. When r is too large, there are many elite agents, including agents with bad solutions, and so the solutions are not improved.
To maintain an appropriate number of elite agents, each agent has to determine whether elite agents are many or few. Thus, the agents are equipped with an individual memory of the average number of elite agents encountered in every travelled city, referred to as "En".
En is defined below.
Ean = (Ean-1 × (n − 1) + en,i) / n (3)

Ean: the average number of elite agents in each travelled city from the start until n
iterations of agent a
a: agent
n: iterations (n>2)
i: current city
en,i: number of elite agents in the current city i (n iterations)

Because the weight of en,i in Eq. (3) becomes too small as n increases, En is modified to a running average over one tour; thus, the number of cities c is used instead of n:

Ean = (Ean-1 × (c − 1) + en,i) / c (4)

Ea0: the total number of all agents / c
c: the number of cities
Each agent regulates r based on En so that an increase in Ean restrains the increase in r. The threshold T and the indicator r are redefined using Eqs. (5) and (6):

Δran = − (Ean − Ean-1) / Ean (0 < ran-1 < 1) (5)

Tan = Lmin + (Lave − Lmin) × ran (6)

ran: the individual r of agent a at iteration n
Tan: the individual threshold of agent a at iteration n
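Read together, Eqs. (3)–(6) amount to the following per-agent update. This is a hedged sketch of our reading of the equations: the clamping of r into (0, 1) reflects the constraint noted with Eq. (5), and all names (`update_en`, `regulate`, the agent dict keys) are illustrative.

```python
def update_en(en_prev, e_current, c):
    """Eq. (4): running average of the number of elite agents met per
    city, using the number of cities c as the window length."""
    return (en_prev * (c - 1) + e_current) / c

def regulate(agent, e_current, lmin, lave, c):
    """Eqs. (5)-(6): when En rises, r falls (and vice versa), then the
    agent's individual threshold T is recomputed."""
    en_new = update_en(agent["En"], e_current, c)
    delta_r = -(en_new - agent["En"]) / en_new
    agent["En"] = en_new
    agent["r"] = min(1.0, max(0.0, agent["r"] + delta_r))  # keep r in (0, 1)
    agent["T"] = lmin + (lave - lmin) * agent["r"]
    return agent["T"]
```

With En = 2.0, r = 0.5 and four elite agents met in the current city (c = 4), En rises to 2.5, r drops by 0.2, and T moves closer to Lmin, as the early-/middle-stage behaviour in Sec. 5 describes.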

4 Experiments
The HBO algorithm described in this paper was implemented in Java, using Eclipse IDE 3.5 as the development tool.
Experiments were carried out with a regulated r whose initial value was set to 0.10, 0.30, 0.50, 0.70, 0.90, or a random number between 0 and 1. The number of agents was 1,000 and the terminal condition was 10,000 iterations.
In the experiment, we used the ATT48 instance taken from TSPLIB [5]. ATT48 is a 48-city problem whose optimal tour length is 3329.
Results are presented below for agents using an individually regulated r.

Table 1 Comparison of Best and nBest with a fixed and regulated r

Optimal route length: 3329; Best: minimum tour length; nBest: iterations until Best was found

            Fixed r              Regulated r
Initial r   nBest    Best       nBest    Best    Average r
0.10           850   4,391     30,496   3,477      0.377
0.30        38,713   3,467     36,048   3,484      0.380
0.50        97,016   5,269     35,442   3,468      0.295
0.70        94,896   5,438     36,239   3,458      0.384
0.90        95,448   6,478     35,844   3,479      0.363
Random      95,375   5,199     35,113   3,467      0.389

Table 1 compares Best using a fixed and a regulated r. In the table, each trial with a regulated r started from the same initial value used in the corresponding trial with a fixed r. Best is the average of the shortest tour length and nBest is the average number of iterations until Best was found, over 10 trials. Average r is the average r of all agents when Best was found in the regulated-r trials.
In the trials using a regulated r, both Best and nBest were lower than those with a fixed r, except for r = 0.30. Overall, trials with a regulated r found similar solutions from every initial r.

5 Discussions

Fig. 2 shows En, Lmin and Lave with a regulated r when the initial r was a random value.

Fig. 2 Lmin, Lave and Average En versus iterations n

According to the features of En shown in Fig. 2, the search process was divided
into three different stages.

(1) Early stage: preparation of the regulator r


In Fig. 2, En decreases until 8,000 iterations in the early stage.
Fig. 3-(1) shows the initial value of r versus L. The values of r were uniformly
distributed between 0.0 and 1.0. The values of L were normally distributed be-
tween about 12,000 and 19,000. Threshold T is indicated, as defined by Eq. (6).
Agents with a smaller L than T became elite agents.

Fig. 3 r versus L for each agent in the early stage, n = 0, 100

During the early stage, the agents with small L values are not all elite agents; therefore, these agents cannot efficiently search for shorter tours.
In Fig. 3-(2), the number of agents with a large r increases. The increase in r is attributable to the decrease in En, as given by Eq. (5). Because the initial value of En is the average number of agents per city, which is greater than the initial number of elite agents, En continues to decrease until the elite agents increase sufficiently. Thus, the early stage is a preparatory period. After it, En begins to increase while r decreases during the middle stage.

Fig. 4 Number of agents with each r

Figure 4 shows the number of agents at each value of r in each stage, measured when the value of r is almost level. In the early stage, there are many agents with a high r.

(2) Middle stage: searching for the Best


From 8,000 to 36,000 iterations, En is stable. During this period, the distribution of agents over r reaches its highest peak, as shown in Fig. 4.
Because there are sufficient elite agents in every city at the start of the middle stage, En remains stable or increases slightly. Thus, r tends to decrease, and agents with shorter tour lengths become elite agents, as shown in Fig. 5.
In Fig. 5, normal agents have a small r close to 0, whereas elite agents have a high r. During the middle stage, agents narrow down the candidate solutions. However, agents with a high L can also become elite agents if they have a high r, because such agents may not encounter many elite agents. These agents help to maintain a variety of solutions, thereby avoiding premature convergence.

Fig. 5 r versus L for each agent during the middle stage, n = 20,000

(3) Terminal stage: convergence of solutions


After n = 36,000, En increases in spite of the restraining decrease in r, and the number of elite agents increases rapidly. In this case, the solutions finally converge.

The features described above were observed in other trials with different initial values of r (Fig. 6). During the early stage, many agents had a high r in every trial; in trials started with a lower r, agents tended to have a lower r. During the middle stage, the distribution of r was similar to that shown in Fig. 4. Thus, agents narrowed down the candidate solutions.
In summary, agents can improve solutions because of the increase in r during the early stage and the decrease in r during the middle stage. In addition, there is no restriction on the overall number of elite agents: the agents control their individual thresholds like bees.

Fig. 6 Simulations with different initial values of regulated r


Initial r = 0.1, 0.3, 0.5, 0.7, 0.9

6 Conclusion
We proposed a novel bio-inspired optimization algorithm, honey bee optimization
(HBO), which mimics the threshold regulation in honey bee foraging. We applied
HBO to the travelling salesperson problem (TSP). We investigated the regulation
of the threshold.
In the future, this algorithm should be compared with the BCO algorithm, in which all agents advertise their solutions.

References
[1] Frisch, K.: Decoding the language of the bee. Science 185(4152), 663–668 (1974)
[2] Lucic, P., Teodorovic, D.: Attacking Complex Transportation Engineering Problems.
International Journal on Artificial Intelligence Tools 12(3), 375–394 (2003)
[3] Wong, L., et al.: A Bee Colony Optimization Algorithm for Traveling Salesman Problem. In: Second Asia International Conference on Modelling & Simulation, IEEE Xplore, pp. 818–823 (2008)
[4] Wong, L., et al.: An Efficient Bee Colony Optimization Algorithm for Traveling Salesman Problem using Frequency-based Pruning. In: 7th IEEE International Conference on Industrial Informatics (INDIN 2009), pp. 775–782 (2009)
[5] TSPLIB, http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/
IEC-Based 3D Model Retrieval System

Seiji Okajima and Yoshihiro Okada

Abstract. Recently, 3D CG animations have come into great demand in the movie and video game industries. As a result, many 3D model and motion data have been created and stored. In this situation, we need tools that help us efficiently retrieve required data from such a pool of 3D models and motions. The authors have already proposed a motion retrieval system using Interactive Evolutionary Computation (IEC) based on a Genetic Algorithm (GA). In this paper, the authors propose a 3D model retrieval system using IEC based on GA and clarify the usefulness of the system by showing experimental results of 3D model retrievals practically performed by several users. The results indicate that the proposed system is useful for effectively retrieving required data from a 3D model database containing more than one thousand data items.

1 Introduction
Recently, there has been great demand for 3D CG animations in the video game and movie industries. In the creation of 3D CG animations, character design is a very important but time-consuming, laborious task, so it is valuable to reuse existing data and create new data by modifying them. Therefore, we are interested in providing 3D multimedia data retrieval systems that enable 3D CG creators to effectively retrieve their required data.
A 3D CG animation mainly consists of texture image data, 3D model data and motion data. Although there has been much research on image data retrieval and search systems, there has been very little on motion data retrieval and search systems. We have already proposed a motion retrieval system using
Seiji Okajima · Yoshihiro Okada
Graduate School of ISEE, Kyushu University, 744, Motooka, Nishi-ku,
Fukuoka, 819-0395 Japan
e-mail: seiji.okajima@inf.kyushu-u.ac.jp,
okada@inf.kyushu-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 317–327.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012

Interactive Evolutionary Computation [20]. This system allows the user to retrieve motion data similar to his/her required data easily and intuitively, merely by repeatedly scoring retrieved motion data with satisfaction points, without entering any search queries. The IEC method of the system is based on a Genetic Algorithm, so motion data are represented as genes used as similarity features for the similarity calculation in the system. This IEC-based retrieval approach is especially significant when users have no query data to enter into the system and only have an image of the required data in their minds. Recently, we have applied the IEC method based on GA to a 3D model retrieval system. In this paper, we propose the 3D model retrieval system and introduce the 3D model features employed in the system for the similarity calculation among 3D models. We also clarify the usefulness of the proposed 3D model retrieval system by showing experimental results of 3D model retrievals practically performed by several users. The results indicate that the proposed system is useful for effectively retrieving required data from a 3D model database containing more than one thousand data items.
The remainder of this paper is organized as follows. Section 2 introduces the IEC method based on GA. Section 3 describes related work. Section 4 introduces the 3D model features we employed as the gene representation for GA. Section 5 describes how our proposed 3D model retrieval system works. In Section 6, we describe user experiments and show their results to clarify the usefulness of the system. Finally, we conclude the paper in the last section.

2 IEC Method Based on GA


In this section, we explain Interactive Evolutionary Computation (IEC) based on a Genetic Algorithm (GA). IEC is a general term for methods of evolutionary computation that use interactive human evaluation to obtain optimized solutions [17]. In the IEC method, the system first presents some candidate solutions to the user, and the user then evaluates them by giving each a numerical score depending on his/her requirement. After that, the system again presents some solutions, more suitable for the user's requirement, computed by an algorithm such as GA. After several trials of this operation, the user obtains his/her most desirable solution. Since the IEC method is thus intuitive and well suited to problems that depend on human feelings, we decided to employ the IEC method based on GA for our 3D model retrieval system.

3 Related Works
There have been many studies on 3D model search and retrieval systems. In general, 3D model search systems are regarded as similarity search systems that retrieve 3D model data similar to the 3D model entered as the search query. For similarity

search systems, the choice of features used as similarity measures is significant because it affects the search performance of the system.
There are several kinds of similarity features of 3D model data: parameter-based features, graph-based features and others. Several parameter-based features of 3D model data have been proposed so far. Osada et al. proposed the D2 shape distribution, represented as a histogram of distances between random point pairs on a 3D model surface [14]. Vandeborre et al. proposed the curvature histogram, a histogram of curvature values at randomly sampled points on a 3D model surface [19]. Elad et al. proposed the geometric moment, a moment of a 3D shape model [5]. Several graph-based features of 3D model data have also been proposed. McWherter et al. proposed the Model Graph, a graph constructed from the component structure of a 3D model [10]. Hilaga et al. proposed the topology matching method for 3D model similarity search, which uses Reeb graphs of 3D models as their similarity features [8]. As another kind of similarity feature of 3D model data, there are appearance-based features represented as 2D images. Assfalg et al. proposed using signatures of spin images of 3D models as their similarity features [1]. Chen et al. proposed the light field descriptor based on silhouette images of 3D models [3]. Ohbuchi et al. proposed using Salient Visual Features of 3D models as their similarity features [12].
On the other hand, 3D model retrieval systems are in general regarded as similarity search systems that retrieve 3D model data similar to the models the user wants, specified through interactive operations such as selecting candidate data or browsing a whole database. For browsing-based data retrieval systems, how candidate data are presented to the user, i.e., the layout algorithms and visualization methods, is significant in order to enable the user to find his/her required data as fast as possible. As layout algorithms based on the hierarchical structure of a database, there are tree-maps, hyperbolic trees, cone trees, etc. As dimensionality reduction methods for similarity-feature-based layouts, there are multidimensional scaling, principal component analysis, self-organizing maps, etc. Among data retrieval systems driven by the user's interactive operations, there are several systems using the IEC method. IEC is an interactive calculation method in which the user evaluates target data interactively and the system finally outputs an optimized solution based on the evaluated values. The remarkable point of IEC is that the only operation required of the user is the evaluation of data. There are some experimental IEC-based retrieval systems. Takagi et al., Cho et al. and Lai et al. proposed image and music retrieval systems using an Interactive Genetic Algorithm (IGA) [16, 4, 9]. Yoo et al. proposed video scene retrieval based on emotion using IGA [21]. Cho et al. proposed a music retrieval system using IGA [4]. However, there is no 3D model retrieval system using IEC that retrieves and presents model data according to the user's requirement from a model database. In this paper, we propose such a 3D model retrieval system using the IEC method based on GA. In GA, 3D models should be represented as genes, and we employ the D2 shape distribution and the curvature histogram as similarity features of 3D models for their gene representations

for the following reasons. In general, graph matching has a high calculation cost, so graph-based similarity features are not suitable for interactive systems like ours. The D2 shape distribution performs well in terms of both similarity calculation cost and similarity accuracy. Furthermore, the curvature histogram is a completely different kind of similarity feature from the D2 shape distribution, so their combined use can compensate for each other's weaknesses.

4 3D Model Features for Genetic Algorithm


As described above, we have been developing a 3D model retrieval system using
IEC based on GA. To use GA, it is necessary to regard 3D models as their corre-
sponding genes represented as similarity features. In this section, we describe 3D
model features we employed and genetic operations performed in the system.

4.1 3D Model Data File Format


There are several types of 3D model data: polygonal model data (i.e., surface model data), CSG (Constructive Solid Geometry) data and voxel (volume cell) data. Since polygonal model data are very popular for 3D CG contents, we employ them in our proposed 3D model retrieval system. Polygonal model data mainly consist of vertex information and face information. There are several file formats for polygonal model data: for example, the Stanford Triangle Format (PLY) was introduced by Stanford University, the Wavefront OBJ file format by Wavefront Technologies Inc., and the 3ds file format by Autodesk Inc. The 3D model database used for the user-evaluation experiment explained in Sec. 6 was collected from the Princeton University collection, whose file format (OFF) is very similar to PLY.

4.2 D2 Histogram (Shape Distribution) and Curvature Histogram


As 3D model features, we employ the D2 histogram (shape distribution) and the curvature histogram.
The shape distribution is a histogram of distances between random point pairs on a 3D model surface, and it can be used as the model's feature value for similarity searches. Fig. 1 shows histograms of three typically shaped models obtained by applying the D2 method, the histogram extraction operation for the shape distribution; (a), (b) and (c) correspond to a sphere, a cylinder and a torus, respectively. As shown in the figure, typically shaped models have distinct histograms, so the dissimilarity between two 3D models can be calculated as the error between their D2 histograms.

Fig. 1 D2 histograms of typical shaped models, a sphere (left), a cylinder (middle) and a
torus (right).

Fig. 2 Curvature histograms of typical shaped models, a sphere (left), a cylinder (middle)
and a torus (right).

The curvature histogram is obtained as a histogram of curvature values at randomly sampled vertices on a 3D model surface. In this paper, we use the approximate Gaussian curvature as the curvature value. The curvature histogram nicely represents detailed information about the surface of a 3D model. Fig. 2 shows histograms of three typically shaped models obtained by applying the curvature histogram method.
These two histograms are normalized so that their bin widths and ranges are the same.
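A simplified sketch of the D2 extraction follows. Note the hedge: the original D2 method samples points uniformly over the triangulated surface (weighted by triangle area); for brevity this sketch draws pairs from a precomputed surface point cloud, and all parameter values are illustrative.

```python
import math
import random

def d2_histogram(points, samples=10000, bins=32, max_d=2.0):
    """D2 shape distribution: a normalized histogram of distances
    between random point pairs on the model (sampled here from a
    precomputed surface point cloud rather than the triangle mesh)."""
    hist = [0] * bins
    for _ in range(samples):
        p, q = random.sample(points, 2)
        d = math.dist(p, q)
        b = min(int(d / max_d * bins), bins - 1)  # clamp to the last bin
        hist[b] += 1
    return [h / samples for h in hist]  # normalized, as in Sec. 4.2
```

Normalizing by the sample count, as in the last line, gives the common bin width and range the two histogram types need for the combined gene representation.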

4.3 Gene Representation


We represent 3D models as their corresponding genes using the D2 and curvature model features, so a 3D model is represented as the combination of its D2 and curvature histograms. A chromosome, a gene and an allele are represented as a real vector, a real number and a real value, respectively. As the similarity measure of chromosomes, we choose the Bhattacharyya distance. Let x and y be feature vectors. Then the Bhattacharyya distance d is defined as

d(x, y) = − log( Σi=1..n √(xi yi) ). (1)
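Eq. (1) can be computed directly for two normalized histograms; a minimal sketch (the function name is ours):

```python
import math

def bhattacharyya_distance(x, y):
    """Eq. (1): d(x, y) = -log(sum_i sqrt(x_i * y_i)).
    For identical normalized histograms the sum is 1, so d = 0."""
    bc = sum(math.sqrt(a * b) for a, b in zip(x, y))
    return -math.log(bc)
```

The inner sum (the Bhattacharyya coefficient) is largest when the two histograms overlap most, so more similar models yield a smaller distance.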

4.4 Genetic Operations


As the selection algorithm, we employ the roulette wheel selection algorithm [2] because it is widely used. This selection algorithm calculates the probability with which each individual is selected by GA. Let fi be the fitness value of individual i. The probability pi of individual i being selected by GA is calculated by

pi = fi / Σk=1..N fk. (2)

This expression assumes that fitness values are positive. The higher the fitness value of an individual, the higher its selection probability. If some fitness values are much higher than the others, this causes early convergence, in which the search settles in the early stages.
There are several crossover operators for real-coded GA, such as BLX-α [6, 7], UNDX [13], SPX [18] and so on. In this study, we employ BLX-α because of its simplicity and fast convergence. Let C1 = (c1_1, ..., c1_n) and C2 = (c2_1, ..., c2_n) be parent chromosomes. Then, BLX-α picks each gene of a new individual uniformly from the interval [cmin − I · α, cmax + I · α], where cmax = max(c1_i, c2_i), cmin = min(c1_i, c2_i), and I = cmax − cmin.
As the mutation operator, we choose the random mutation operator [7, 11] because it is widely used. Let C = (c_1, ..., c_i, ..., c_n) be a chromosome and c_i ∈ [a_i, b_i] a gene to be mutated. Then, c_i is replaced by a uniform random number picked from the domain [a_i, b_i].
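The two operators can be sketched as follows. This is a hedged sketch: the gene domains, default rates and function names are illustrative assumptions, not the authors' code.

```python
import random

def blx_alpha(c1, c2, alpha=0.5):
    """BLX-alpha crossover: each child gene is drawn uniformly from
    [c_min - I*alpha, c_max + I*alpha], where I = c_max - c_min."""
    child = []
    for a, b in zip(c1, c2):
        lo, hi = min(a, b), max(a, b)
        spread = (hi - lo) * alpha
        child.append(random.uniform(lo - spread, hi + spread))
    return child

def random_mutation(chrom, bounds, rate=0.01):
    """Random mutation: with probability `rate`, redraw gene c_i
    uniformly from its domain [a_i, b_i]."""
    return [random.uniform(a, b) if random.random() < rate else c
            for c, (a, b) in zip(chrom, bounds)]
```

With α = 0.5, each child gene can land up to half the parent interval outside it on either side, which is what lets BLX-α explore beyond the parents.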

5 3D Model Retrieval System


In this section, we explain our proposed IEC-based 3D model retrieval system.
Fig. 3 and Fig. 4 show an overview and a screen snapshot of the 3D model retrieval system, respectively. As preprocessing, the system generates a database of genes, each of which corresponds to one entry of the 3D model database.
After that, the system retrieves 3D models according to the following steps (Fig. 3): 1) The system randomly generates initial genes. 2) The system enters the genes as search queries into the database. 3) The database retrieves the 3D models corresponding to the queries. 4) The system receives the results of the 3D model retrieval. 5) The system displays the retrieved models to the user. 6) The user evaluates each displayed model by three-stage scoring, i.e., good, normal or bad; this evaluation is performed only by mouse clicks on the thumbnails of the 3D models. 7) After the evaluation, the system applies the GA operations, i.e., selection, crossover and mutation, to the genes in order to generate the next generation of genes. 8) Steps 2 to 7 are repeated until the user is satisfied with the retrieval results.
After several iterations of this process, the user can obtain his/her most desirable models without any difficult operations and without entering any query models.

Fig. 3 Overview of 3D model retrieval system.

Fig. 4 Screenshot of 3D model retrieval system.
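Step 7 of the retrieval loop (selection, crossover and mutation on the scored genes) can be sketched as one generational update. The score constants are the three-stage fitness values reported in the experiments of Sec. 6; the function names and the plugged-in operators are illustrative assumptions:

```python
import random

# Three-stage scores used in the experiments of Sec. 6 (good/normal/bad)
SCORE = {"good": 2.0, "normal": 0.5, "bad": 0.05}

def next_generation(genes, ratings, crossover, mutate):
    """One IEC/GA step: map user ratings to fitness values, select
    parents by roulette wheel, then apply crossover and mutation."""
    fitness = [SCORE[r] for r in ratings]
    children = []
    while len(children) < len(genes):
        p1, p2 = random.choices(genes, weights=fitness, k=2)
        children.append(mutate(crossover(p1, p2)))
    return children
```

Each call replaces the displayed population with one of the same size, so the user evaluates a fixed number of thumbnails per generation.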

6 User Experiment
In this section, we present experimental results of 3D model retrievals performed with the proposed system by several subjects. Five students of the Graduate School of ISEE, Kyushu University, volunteered to participate. The experiment was carried out on a standard PC with Windows XP Professional, a 2.66 GHz Core 2 Quad processor and 4.0 GB of memory.
As the 3D model database for the experiment, we employed the Princeton Shape Benchmark [15]. It contains around 1800 model data collected from the World Wide
324 S. Okajima and Y. Okada

Web. As for the GA operators, we employed the roulette wheel selection operator, the BLX-
α crossover operator, and the random mutation operator. The value of α is 0.5, the crossover
rate is 1.0, and the mutation rate is 0.01. The fitness values of the three-stage scoring are 2.0
for good, 0.5 for normal, and 0.05 for bad. Also, the population size is 12, determined
from the results of our previous study [20].
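Under the reported settings (α = 0.5, fitness values 2.0/0.5/0.05 for good/normal/bad), the selection and crossover operators might look like the sketch below; the function names are ours, and this is an illustrative reading of the cited operators, not the system's actual code.

```python
import random

def roulette_select(population, fitnesses):
    """Roulette wheel selection: an individual is chosen with probability
    proportional to its fitness (here 2.0 good, 0.5 normal, 0.05 bad)."""
    return random.choices(population, weights=fitnesses, k=1)[0]

def blx_alpha(parent1, parent2, alpha=0.5):
    """BLX-alpha crossover: each child gene is drawn uniformly from
    [min - alpha*d, max + alpha*d], where d = |x - y| is the distance
    between the corresponding parent genes."""
    child = []
    for x, y in zip(parent1, parent2):
        lo, hi = min(x, y), max(x, y)
        d = hi - lo
        child.append(random.uniform(lo - alpha * d, hi + alpha * d))
    return child
```

The weighting makes a "good" model forty times more likely to be selected as a parent than a "bad" one, which is what drives the population toward the user's preference.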
In the experiment for evaluating the usefulness of our proposed system, the participants
retrieved each of five target models using the system. The five target models
were selected one by one from each of five model classes: tire, car, dolphin,
plant, and human head. Each participant tried to search for each of the five target models
for up to 20 generations; thus, we obtained 25 trial results in total for all participants.
We measured computation and operation times, and we examined the retrieved
models. These trials were carried out according to the following procedure.
1. Introduction of the model retrieval system (1 minute).
2. Try to use the system for answering preparation questions (3 minutes).
3. Actual searches for target models using the system.

6.1 Performance Evaluation


We measured the actual computation time spent for one GA operation and the
average user operation time. First, the time spent for one GA operation is less than
ten milliseconds, and the average retrieval time to present the next generation is 0.62
seconds in the case of about 1800 model data in the database. Thus, the user can
manipulate the system without any noticeable delay. Second, the average user operation
times until the 10th, 15th, and 20th generations are 119.3 seconds, 177.0 seconds, and
233.5 seconds, respectively. Therefore, it can be said that our system allows the user to
search for his/her desirable models in a reasonable time.

6.2 Retrieval Results


Next, we examined the retrieved models and categorized the results of the trials into three
types: 1) retrieval of the same model as a target model, 2) retrieval of a model of the same
class as a target model, and 3) retrieval failure. Table 1 shows the classification
of the retrieved model results. Result 1) can be judged from the corresponding file name.
Results 2) and 3) are judged from the descriptions of the Princeton Shape Benchmark. Also,
Fig. 5 shows the change in the cumulative number of cases in which the same class or
the same 3D models are retrieved.

Table 1 Types of retrieved model results.

                                                        Number of Results
1) Retrieval of the same model as a target model               11
2) Retrieval of the same class model as a target model         14
3) Retrieval failure                                            0
Sum                                                            25

Fig. 5 Counts of cases in which the same class or the same 3D models are retrieved at each generation.

From this table and chart, it can be said that the system retrieves 3D models of the same
class as the desirable model before the 20th generation, and in many cases, the user
retrieves same-class 3D models before the 10th generation. These results indicate that our
proposed system is practically useful for retrieving 3D model data even in the case
of a huge database containing more than one thousand entries.

7 Conclusion and Remarks


In this paper, we proposed a 3D model retrieval system using IEC based on GA.
Our proposed system allows the user to retrieve 3D models similar to his/her required
models easily and intuitively, only through the interactive operation of evaluating
retrieved models, without any difficult operations. Furthermore, we performed a
user experiment for evaluating the usefulness of our proposed 3D model retrieval
system. The results indicate that our proposed system is useful for effectively
retrieving 3D model data even in a huge database containing more than one
thousand entries.
As future work, we will investigate more effective similarity features of 3D models
to improve the performance of our proposed model retrieval system. In addition,
we will improve the GUI of the system to make it more useful.

References
1. Assfalg, J., Del Bimbo, A., Pala, P.: Spin images for retrieval of 3D objects by local and global
similarity. In: Proc of the 17th International Conference on Pattern Recognition, ICPR
2004 (2004), doi:10.1109/ICPR.2004.1334675
2. Baker, J.E.: Reducing bias and inefficiency in the selection algorithm. In: Proc. of the
Second International Conference on Genetic Algorithms on Genetic Algorithms and their
Application, pp. 14–21 (1987)

3. Chen, D.Y., Tian, X.P., Shen, Y.T., Ouhyoung, M.: On visual similarity based 3D
model retrieval. In: Proc. of Eurographics Computer Graphics Forum, EG 2003 (2003),
doi:10.1111/1467-8659.00669
4. Cho, S.B.: Emotional image and musical information retrieval with interactive genetic
algorithm. Proc. of the IEEE 92(4), 702–711 (2004), doi:10.1109/JPROC.2004.825900
5. Elad, M., Tal, A., Ar, S.: Content based retrieval of VRML objects - an iterative and
interactive approach. EG Multimedia, 97–108 (2001)
6. Eshelman, L.J., Schaffer, J.D.: Real-Coded Genetic Algorithms and Interval-Schemata.
In: Foundations of Genetic Algorithms 2, pp. 187–202. Morgan Kaufman Publishers,
San Mateo (1993)
7. Herrera, F., Lozano, M., Verdegay, J.L.: Tackling Real-Coded Genetic Algorithms: Operators
and Tools for Behavioural Analysis. Journal of Artificial Intelligence Review 12(4),
265–319 (1998), doi:10.1023/A:1006504901164
8. Hilaga, M., Shinagawa, Y., Kohmura, T., Kunii, T.L.: Topology matching for fully
automatic similarity estimation of 3D shapes. In: Proc. of the 28th Annual Confer-
ence on Computer Graphics and Interactive Techniques, SIGGRAPH 2001 (2001),
doi:10.1145/383259.383282
9. Lai, C.-C., Chen, Y.-C.: Color Image Retrieval Based on Interactive Genetic Algo-
rithm. In: Chien, B.-C., Hong, T.-P., Chen, S.-M., Ali, M. (eds.) IEA/AIE 2009. LNCS,
vol. 5579, pp. 343–349. Springer, Heidelberg (2009), doi:10.1007/978-3-642-02568-6_35
10. McWherter, D., Peabody, M., Regli, W.C., Shokoufandeh, A.: Solid Model Databases:
Techniques and Empirical Results. Journal of Computing and Information Science in
Engineering (2001), doi:10.1115/1.1430233
11. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer
(1994)
12. Ohbuchi, R., Osada, K., Furuya, T., Banno, T.: Salient local visual features for shape-
based 3D model retrieval. In: IEEE International Conference on Shape Modeling and
Applications 2008 (2008), doi:10.1109/SMI.2008.4547955
13. Ono, I., Kobayashi, S.: A real-coded genetic algorithm for function optimization using
the unimodal normal distribution crossover. In: Proc. of the Seventh International Con-
ference on Genetic Algorithms, pp. 246–253 (1997)
14. Osada, R., Funkhouser, T., Chazelle, B., Dobkin, D.: Shape distributions. Journal of
ACM Transactions on Graphics (TOG) 21(4) (2002), doi:10.1145/571647.571648
15. Shilane, P., Min, P., Kazhdan, M., Funkhouser, T.: The Princeton Shape Benchmark. In:
Proc. of Shape Modeling Applications 2004 (2004), doi:10.1109/SMI.2004.1314504
16. Takagi, H., Cho, S.B., Noda, T.: Evaluation of an IGA-based image retrieval system
using wavelet coefficients. In: IEEE International Conference on Fuzzy Systems (1999),
doi:10.1109/FUZZY.1999.790176
17. Takagi, H.: Interactive Evolutionary Computation: Fusion of the Capacities of EC
Optimization and Human Evaluation. Proc. of the IEEE 89(9), 1275–1296 (2001),
doi:10.1109/5.949485
18. Tsutsui, S., Yamamura, M., Higuchi, T.: Multi-parent Recombination with Simplex
Crossover in Real Coded Genetic Algorithm. In: Proc. of the 1999 Genetic and Evo-
lutionary Computation Conference (GECCO 1999), pp. 657–664 (1999), doi:10.1007/3-540-45356-3_36

19. Vandeborre, J.P., Couillet, V., Daoudi, M.: A practical approach for 3D model in-
dexing by combining local and global invariants. In: Proc. of 1st International Sym-
posium on 3D Data Processing Visualization and Transmission, pp. 644–647 (2002),
doi:10.1109/TDPVT.2002.1024132
20. Wakayama, Y., Okajima, S., Takano, S., Okada, Y.: IEC-Based Motion Retrieval System
Using Laban Movement Analysis. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C.
(eds.) KES 2010. LNCS (LNAI), vol. 6279, pp. 251–260. Springer, Heidelberg (2010),
doi:10.1007/978-3-642-15384-6_27
21. Yoo, H.W., Cho, S.B.: Video scene retrieval with interactive genetic algorithm. Multimedia
Tools and Applications (2007), doi:10.1007/s11042-007-0109-8
Incremental Representation and Management
of Recursive Types in Graph-Based Data Model
for Content Representation of Multimedia Data

Teruhisa Hochin, Yuki Ohira, and Hiroki Nomiya

Abstract. A data model incorporating the concepts of recursive graphs has been
proposed for representing the contents of multimedia data. A shape graph, which
represents the structure of a set of instances, has to track their incremental updates.
It is difficult to manage instances when they have a recursive structure. This paper
proposes a method of managing the recursive structure of instances. A procedure
that incrementally revises the structural information of shape graphs is presented.
Owing to this procedure, the recursive structure can be incrementally and properly
managed and represented in the shape graph.

1 Introduction
In recent years, content retrieval of multimedia data has been extensively investigated.
Using graphs that represent the contents of multimedia data is one of the
major approaches to the content retrieval of multimedia data. Petrakis
et al. have proposed the representation of the contents of medical images by using
directed labeled graphs [1]. Uehara et al. have used the semantic network in order
to represent the contents of a scene of a video clip [2]. Jaimes has proposed a data
model representing the contents of multimedia by using four components and the
relationships between them [3]. The contents of video data are represented with a kind of
tree structure in XML [4].
We have proposed a graph-based data model, the Directed Recursive Hypergraph
data Model (DRHM), for representing the contents of multimedia data [5, 6, 7, 8].
It incorporates the concepts of directed graphs, recursive graphs, and hypergraphs.
An instance graph is the fundamental unit in representing an instance. A collec-
tion graph is a graph having instance graphs as its components. A shape graph
Teruhisa Hochin · Yuki Ohira · Hiroki Nomiya
Kyoto Institute of Technology, Goshokaidocho, Mastugasaki, Sakyo-ku,
Kyoto 606-8585 Japan
e-mail: hochin@kit.ac.jp,nomiya@kit.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 329–339.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012

represents the structure of the collection graph. Shape graphs may change when
instance graphs are inserted or modified. As the existence of instance graphs
affects shape graphs, DRHM is said to be an instance-based data model. Moreover,
generalization and specialization relationships have been introduced to DRHM [8].
An instance graph can include other instance graphs in it. Instance graphs
can, of course, represent self-nested objects. A self-nested object is an object
containing objects of the same type as the container object itself. An example of
a self-nested object is a nested lunch box, whose picture is shown in Fig. 5(a).
A nested lunch box is both a lunch box and a container. The largest lunch box contains
a middle-sized lunch box, which contains the smallest one. Whereas the instance
graph has the capability of representing such nested objects, the shape graph, which
represents the structure of a set of instance graphs, could not properly represent this
situation.
This paper proposes a revision of the shape graph. The shape graph is extended
to capture self-nested objects. The mapping function from a collection graph to
the shape graph is revised in the formal definition. This extension brings a new
construct to the notation of the shape graph.
This paper is organized as follows: Section 2 briefly explains the structure in
DRHM by using examples. Section 3 describes the formal definition of DRHM. The
extension for self-nested shape graphs is proposed in Section 4. Some considerations
are made in Section 5. Lastly, Section 6 concludes this paper.

2 Informal Description of DRHM


The structure of DRHM is described through examples. In DRHM, the fundamental
unit in representing data or knowledge is an instance graph. It is a directed recursive
hypergraph. It has a label composed of its identifier, its name, and its data value. It
corresponds to a tuple in the relational model.

Example 1. Consider the representation of the picture shown in Fig. 1(a). An ornament
is on a floor. The ornament consists of three bags of rice and a tassel.
Fig. 1(b) represents the contents of this picture in DRHM. An instance graph is
represented with a round rectangle. For example, g1 is an instance graph. An edge
is represented with an arrow. A dotted round rectangle surrounds a set of initial or
terminal elements of an edge. For example, g5 and g6, which are surrounded by a
dotted round rectangle, are the initial elements of the edge e2. When an edge has
only one element as an initial or terminal element, the dotted round rectangle could
be omitted for simplicity. The instance graph g4, which is the terminal element of
the edge e2, is an example of this representation. An instance graph may contain
instance graphs and edges. For example, g1 contains g2, g3, e1, and e4.

A set of the instance graphs having similar structure is captured as a collection


graph. A collection graph is a graph whose components are instance graphs. It cor-
responds to a relation in the relational model.


Fig. 1 (a) A picture and (b) an instance graph representing its contents.


Fig. 2 (a) A collection graph and (b) its shape graph.

Example 2. An example of a collection graph is shown in Fig. 2(a). A collection


graph is represented with a dashed dotted line. A collection graph has a unique name
in a database. The name of the collection graph shown in Fig. 2(a) is Ornament.
The instance graph g1 is the one shown in Fig. 1(b). The instance graph g10 is for
another picture. These instance graphs are called representative instance graphs.

The structure of a collection graph is represented with the graph called a shape
graph. It corresponds to a relation schema in the relational model. The collection
graph, whose structure the shape graph represents, is called its corresponding col-
lection graph.

Example 3. Figure 2(b) shows the shape graph for the collection graph Ornament
shown in Fig. 2(a). It represents that an instance graph ornament includes an
instance graph object, and an instance graph object is connected to object
by an edge pos.

A shape graph does not have to exist prior to the creation of a collection graph.
Inserting an instance graph results in the creation of a shape graph if the shape
graph describing the definition of the instance graph does not exist yet. It may, of
course, exist prior to the collection graph creation. A shape graph must exist while
a collection graph exists. A shape graph may change when new instance graphs
are inserted into the corresponding collection graph, or the instance graphs in it are
modified. Once shape graphs are created, they are not deleted by deleting instance
graphs. Shape graphs can be deleted only by the operation deleting the shape graphs.

3 Formal Description without Self-nested Shape Graphs


In this section, the structural aspect of DRHM without self-nested shape graphs [7]
is formally described. Please refer to our previous work [7] for the detailed definitions.
Let us begin with an instance graph, which is a directed recursive hypergraph.
Definition 1. An instance graph g is an octuple (V, E, Lv , Le , φv , φe , φconnect , φcomp ),
where V is a set of instance graphs included in g, E is a set of edges, Lv is a set
of labels of the instance graphs, Le is a set of labels of the edges, φv is a mapping
from the set of the instance graphs to the set of the labels of the instance graphs (φv :
V → Lv ), φe is a mapping from the set of the edges to the set of the labels of the edges
(φe : E → Le ), φconnect is a partial mapping representing the connections between
sets of instance graphs (φconnect : E → 2V × 2V ), and φcomp is a partial mapping
representing the inclusion relationships (φcomp : V ∪ {g} → 2V ∪E ). A label is a triple
(did , nm, d), where did is an identifier, nm is a name, and d is a tuple of a data type
and a data value.
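As a rough illustration only, the octuple of Definition 1 might be transcribed as below; the field names are ours, and the mappings φv and φe are left implicit because each object carries its own label.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)   # identity-based hashing, so graphs can be set members
class InstanceGraph:
    """Sketch of Definition 1. `label` is the triple (d_id, nm, d) of
    identifier, name, and (data type, data value)."""
    label: tuple
    V: set = field(default_factory=set)     # instance graphs included in this one
    E: set = field(default_factory=set)     # edges included in this one
    connect: dict = field(default_factory=dict)  # phi_connect: edge -> (initial set, terminal set)
    comp: dict = field(default_factory=dict)     # phi_comp: element -> its direct components
```

For Example 1, g1 would be built with g2, g3, e1, and e4 registered as its direct components in comp.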
Here, the order of an instance graph is described.
Definition 2. Let g = (V, E, Lv , Le , φv , φe , φconnect , φcomp ) be an instance graph. A set
of the nth constructing elements Vcen (v) for an instance graph v ∈ (V ∪ {g}) is defined
by the following recursion formula:
Vce0 (v) = {v},
Vcei+1 (v) = ⋃ { φcomp (p) | p ∈ Vcei (v) ∩ (V ∪ {g}) } (i ≥ 0).
Here, n is called the order of constructing elements. Instance graphs (edges, respec-
tively) that are the nth constructing elements of v are called the nth constructing
instance graphs (edges) of v. The first constructing elements of v are called direct
constructing elements of v.
Example 4. The direct constructing elements of the instance graph g1 shown in Fig.
1(b) are g2, g3, e1, and e4.
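The recursion of Definition 2 can be sketched directly. Here `phi_comp` is a dictionary from an element to the set of its direct components; absent keys model the partiality of φcomp, and the nesting below g2 in the usage example is made up for illustration.

```python
def constructing_elements(v, phi_comp, n):
    """n-th constructing elements of v (Definition 2):
    V_ce^0(v) = {v}; V_ce^{i+1}(v) is the union of phi_comp(p) over all
    p in V_ce^i(v) that have components."""
    elems = {v}
    for _ in range(n):
        nxt = set()
        for p in elems:
            nxt |= phi_comp.get(p, set())  # edges and leaves contribute nothing
        elems = nxt
    return elems
```

With phi_comp = {"g1": {"g2", "g3", "e1", "e4"}, "g2": {"g4", "g5"}}, n = 1 yields the direct constructing elements of g1, and n = 2 descends one level further.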
A collection graph captures a set of instance graphs. The concise definition is de-
scribed here. Please refer to our previous work [7] for precise definition.
Definition 3. A collection graph is a graph having instance graphs as its compo-
nents. That is, when a collection graph cg has n instance graphs : gi = (Vi , Ei , Lvi , Lei ,
φvi , φei , φconnecti , φcompi )(i = 1, · · · , n), and φcompcg (cg) = {g1 , · · · , gn }, cg is repre-
sented with a 9-tuple (nm,V, E, Lv , Le , φv , φe , φconnect , φcomp ), where nm is the name
of a collection graph, V (E, Lv , Le , respectively) contains the union of Vi (Ei , Lvi , Lei ),
φv (φe , φconnect , respectively) is the union of the mappings φvi (φei , φconnecti ), and
φcomp is the union of the mappings φcompi and φcompcg . Each component instance
graph (gi ) is called a representative instance graph. A database is a set of collection
graphs. The name of a collection graph must be unique in a database.

A shape graph represents the structure of instance graphs in a collection graph. The
instance graphs having the same name are mapped to a shape graph, whose name is
that of the instance graphs. The edges having the same name are similarly mapped
to a shape edge [7].

Definition 4. The structure of a shape graph is the same as that of a collection graph.
That is, it is represented with a 9-tuple. An edge in a shape graph is called a shape
edge. The labels of shape graphs and shape edges are different from those of collec-
tion graphs. The label of a shape graph or shape edge is a triple (sid , nmd , DT ), where
sid is an identifier, nmd is a name of a shape graph or shape edge, and DT is a set of
data types of data values. This label is called a shape label. There are the following
relationships between a collection graph (nmcg ,V, E, Lv , Le , φv , φe , φconnect , φcomp )
and its corresponding shape graph (nmsg ,Vs , Es , Lvs , Les , φvs , φes , φconnect s , φcomps ).
Here, nm(l) represents the name in a label l, and nm(L) = { nm(l) | l ∈ L }, where
L is a set of labels.
• nmcg = nmsg
• nm(Lvs ) ⊇ nm(Lv ),
• nm(Les ) ⊇ nm(Le ),
• There is a mapping θv : V → Vs such that ∀ v ∈ V ∃ vs ∈ Vs (nm(φv (v)) =
nm(φvs (vs )) ∧ θv (v) = vs ),
• There is a mapping θe : E → Es such that ∀ e ∈ E ∃ es ∈ Es (nm(φe (e)) =
nm(φes (es ))∧ θe (e) = es ), and φconnect (e) = (U,W ) ⇒ φconnects (θe (e)) = (Θv (U),
Θv (W )), where Θv (U) means a set of shape graphs {θv (v1 ), · · · , θv (vn )} for a set
of instance graphs U = {v1 , · · · , vn }, and
• φcomp (v) = U ∪ Z ⇒ φcomps (θv (v)) = Θv (U) ∪ Θe (Z), where Θe (Z) means a set
of shape edges {θe (e1 ), · · · , θe (en )} for a set of edges Z = {e1 , · · · , en }.
A shape graph has to be changed in order to satisfy the conditions described above
when new instance graphs are inserted, or instance graphs are modified.
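The name-based grouping behind θv can be sketched as follows; the flat (identifier, name, data type) representation of labels is a simplification of ours.

```python
def derive_shapes(labels):
    """Group instance graphs by name, as theta_v of Definition 4 does:
    every instance graph named nm is mapped to the single shape graph nm,
    and the data types observed under nm accumulate into the DT of the
    shape label."""
    theta_v = {}   # instance graph identifier -> shape graph name
    dt = {}        # shape graph name -> set of data types (DT)
    for d_id, nm, dtype in labels:
        theta_v[d_id] = nm
        dt.setdefault(nm, set()).add(dtype)
    return theta_v, dt
```

This is also why inserting an instance graph with a new name, or a new data type under an existing name, forces the shape graph to change: the conditions of Definition 4 must keep holding.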

4 Revision for Self-nested Shape Graphs


The inclusion relationship of the shape graph (φcomps ) described in the previous
section could not capture the recursive inclusion relationship. This relationship is
revised for representing the structure of self-nested objects.
Here, the inclusion relationship of the shape graph (φcomps ) is mainly revised.
Before revising φcomps , the procedure deriving the recursive structure of inclusion
relationships is described. This procedure, Analyze, is shown in Fig. 3. It
takes two arguments as input. One is a set of pairs of an instance graph
and its path. The other is a kind of table Tcomp , which represents the inclusion

Algorithm Analyze
Input
S: a set of pairs <instance graph (g), path(path)>
Tcomp : a set of triplets <gname, Pnr , Pr >
Output
Tcomp : revised Tcomp
Method
1: foreach ent in S
2: push ent to Q; // Q is a queue
3: end
4: while(Q is not empty)
5: ent = pop Q;
6: path = ent.path; nm = current(path); nm p = parent(path);
7: if(nm p == NULL) then
8: if(nm ∉ NM) then
9: add <nm, {}, {}> to Tcomp ;
10: endif
11: else // nm p = NULL
12: if(nm p ∈ NM) then
13: if(path ∉ Pr (nm p ) ∧ path ∉ Pnr (nm p )) then
14: if(nm ∈ NM) then
15: add path to Pr (nm p );
16: else
17: add path to Pnr (nm p );
18: endif
19: endif
20: else // (nm p ∉ NM)
21: lowest order = min(order(path), order(Pnr (nm)));
22: processed = move path from Pnr to Pr(Tcomp , path, lowest order);
23: if(processed == true ∧ order(path) > lowest order) then
24: add < nm p , {}, {path} > to Tcomp ;
25: else
26: add < nm p , {path}, {} > to Tcomp ;
27: endif
28: endif
29: endif
30: foreach ig in φcomp (ent.g)
31: push <ig, path + ";" + nm(θv (ig))> to Q;
32: end
33: end
End

Fig. 3 Procedure Analyze.

relationships. That is, it represents the relationship between an instance graph and
the instance graphs included in it. The procedure Analyze revises Tcomp .
When Analyze is firstly called, a set of representative instance graphs are specified
as the instance graphs of the first argument S. In this case, S is a set of pairs of
the form (a representative instance graph, its name) because representative instance
graphs have no parents. Moreover, Tcomp has no entry because it is the first call.

The procedure Analyze has to analyze whether the type of an instance graph included
in another one is already defined or not. If the type is already defined, the
instance graph must be treated as a recursive instance graph, which includes itself.
Therefore, the included instance graphs are managed by separating recursive
instance graphs from non-recursive ones. In the procedure, a set of paths of non-recursive
(recursive, respectively) instance graphs is managed in Pnr (Pr ). An entry
of the table Tcomp is a triplet (gname, Pnr , Pr ), where gname is the name of an
instance graph. In the procedure, Pnr (nm) (Pr (nm), respectively) means the Pnr (Pr ) of the
entry of Tcomp whose gname is nm. Moreover, NM represents the set of names,
each of which is the gname of an entry in Tcomp .
In the procedure Analyze, the elements in S are first pushed to Q (lines 1–3). In lines
7–10, if the parent name is not obtained, the name of the current instance graph
is registered to Tcomp . If the parent name is obtained, the procedure tries to insert the
path (lines 11–29). If the parent name nm p is already registered (line 12), the path is
not registered yet (line 13), and the current name nm is also already registered (line
14), then the path is registered as that of a recursive instance graph (line 15). Otherwise,
the path is registered as that of a non-recursive instance graph (line 17). Please note that the
path is registered to Pr or Pnr only if it is not registered yet, because Pr and Pnr are
sets. If the parent name is not registered yet (line 20), then the parent name and its
path are registered to Tcomp (lines 21–28). The lowest order of nm is obtained because
the path having the lowest order is registered or kept as a path of a non-recursive
instance graph. Then, the function move path from Pnr to Pr() is invoked. This
function is not described precisely here because it is cumbersome and requires many
lines to describe, but it is not difficult. It searches all of Pnr for a path that is of
nm, and that includes the parent of nm or whose parent is included in path.
If such a path is found, it is decided that nm is not shared. In this case, the
path having the lowest order is registered or kept in Pnr , and the other paths are
moved from Pnr to Pr . The function returns true when nm is not shared. If nm is not
shared and the order of the path is not the lowest one (line 23), the path is registered
as that of a recursive instance graph (line 24). Otherwise, the path is registered as that of a
non-recursive instance graph (line 26). Lastly, the instance graphs included in the current
instance graph are pushed into Q (lines 30–32). These steps are
repeated until Q becomes empty.
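A much simplified Python sketch of Analyze is given below. Instance graphs are reduced to (name, children) pairs, and a path is flagged as recursive as soon as its name is already registered in Tcomp (lines 12–18 of Fig. 3); the sharing analysis of move path from Pnr to Pr is omitted, so the sketch relies on breadth-first processing order and is not a faithful implementation of the full procedure.

```python
from collections import deque

def analyze(roots):
    """Build Tcomp: gname -> {"Pnr": non-recursive paths, "Pr": recursive
    paths}.  Each root is a (name, children) pair standing in for a
    representative instance graph and its phi_comp."""
    t_comp = {}
    q = deque((g, g[0]) for g in roots)   # the path of a root is its own name
    while q:
        g, path = q.popleft()
        parts = path.split(";")
        nm, nm_p = parts[-1], (parts[-2] if len(parts) > 1 else None)
        if nm_p is None:                  # a representative instance graph
            t_comp.setdefault(nm, {"Pnr": set(), "Pr": set()})
        else:
            entry = t_comp.setdefault(nm_p, {"Pnr": set(), "Pr": set()})
            if path not in entry["Pnr"] and path not in entry["Pr"]:
                # a name whose type is already defined is treated as recursive
                kind = "Pr" if nm in t_comp else "Pnr"
                entry[kind].add(path)
        for child in g[1]:
            q.append((child, path + ";" + child[0]))
    return t_comp
```

On a structure shaped like Fig. 4(a), this sketch reproduces the Tcomp of Fig. 4(b): A;B and A;C land in Pnr (A), A;B;D in Pnr (B), and the self-inclusion A;B;B in Pr (B).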
Example 5. Let us consider the collection graph shown in Fig. 4(a). There are two
representative instance graphs (g10 and g20). The procedure Analyze produces
Tcomp shown in Fig. 4(b). The instance graph B non-recursively contains the instance
graph D, and recursively contains the instance graph B.
By using the table Tcomp , the structure of the shape graph and the mapping function
φcomps described in Definition 4 are revised as follows.

Definition 5. the structure of the shape graph and the mapping function φcomps are
re-defined as follows:
• The structure of a shape graph is represented with a 10-tuple: (nmsg ,Vs , Es , Lvs ,
Les , φvs , φes , φconnect s , φcomps , Tcomp ).



Tcomp:
gname   Pnr                 Pr
A       {A;B, A;C}          {}
B       {A;B;D}             {A;B;B}
C       {A;C;D, A;C;E}      {}

Fig. 4 (a) An example of a collection graph including a nested instance graph, (b) its Tcomp ,
and (c) its shape graph.

• φcomp (v) = U ∪ Z ⇒ φcomps (θv (v)) = Wnr (v,U, Tcomp ) ∪ Wr (v, Tcomp ) ∪ Θe (Z),
where Wnr (v,U, Tcomp ) = {θv (u)|nm(φv (u)) = Pnr (nm(φv (v))) ∧ u ∈ U}, Wr (v,
Tcomp ) is a set of shape graphs whose name is included in Pr (nm(φv (v))), and
Θe (Z) is the same as that shown in Definition 4.

Here, several shape graphs for self-nested objects are described. This example also
demonstrates the representation of the nested shape graph.

Example 6. Let us consider a nested lunch box. In a nested lunch box, a large lunch
box contains a smaller lunch box, and it contains the smallest one. A nested lunch
box is shown in Fig. 5(a). In this figure, three lunch boxes are displayed in parallel
for clarity. The collection graph of the nested lunch box is shown in Fig. 5(b). The in-
stance graph corresponding to the largest lunch box contains the one corresponding
to the middle-sized one. It similarly contains the instance graph corresponding to the
smallest one. The shape graph of the collection graph is shown in Fig. 5(c). A lunch
box recursively contains another lunch box. The inner lunch box is represented with
a broken line. This represents that the inner lunch box is already represented (or
defined) as the outer instance graph.


Fig. 5 (a) A nested lunch box, (b) the collection graph, and (c) its shape graph.

Example 6 shows a self-nested object containing the same kind of objects. The next
example shows a self-nested object that may contain another kind of object.


Fig. 6 (a) A handcrafted nested box, (b) the collection graph, and (c) its shape graph.


Fig. 7 (a) The evolved collection graph of the nested box and (b) its shape graph.

Example 7. Let us consider a handcrafted nested box shown in Fig. 6(a). The largest
box contains a smaller box. The smaller box contains a block. In Fig. 6(a), two boxes
and a block are displayed in parallel for clarity. The collection graph and the shape
graph of the nested box are shown in Fig. 6(b) and Fig. 6(c), respectively. In the
instance graph, the inner instance graph box contains the instance graph block.
In the shape graph, the shape graph block is contained in the shape graph box.

Example 8. Let us consider the situation in which another instance graph block is
inserted into the instance graph block in the collection graph shown in Fig. 6(b).
The updated collection graph is shown in Fig. 7(a). The shape graph evolves from
the one shown in Fig. 6(c) into the one shown in Fig. 7(b).

5 Consideration
An instance graph can include instance graphs in it. When shape graphs represent
their structure, a shape graph is shared if possible. An example of such representation
is the shape graph D shown in Fig. 4(c). This shape graph is shared by the shape
graphs B and C. If a shape graph cannot be shared, the inner shape graph refers to
the outer one as a recursive shape graph, e.g., B.
In conventional schema-based data models [9], data are defined before they are
inserted. In this case, the data definition plays the role of a kind of constraint. That is,
illegal data are not permitted to be inserted into the database. On the other hand,
any data can be inserted under an instance-based data model. As inserting and/or
updating data may result in the modification of the information on the structure of the
data, maintaining this information is very important. The proposed method enables
incremental modification of this information even if it includes a recursive structure.
Semistructured data and XML data are considered to be instance-based data.
Research efforts have been made to derive a kind of schema from such data
[11, 12, 13, 14, 15, 16, 17, 18]. Some methods follow automata or grammar approaches
[12, 15, 17, 18]. A clustering method or the Minimum Description Length
(MDL) principle may be used in deriving a kind of schema information [14, 16].
As XML data are represented with a kind of tree, the methods for XML data
are not applicable to deriving shape graphs from instance graphs, because the inclusion
relationships of instance graphs may constitute a graph structure rather than
a tree structure. In the research efforts on semistructured data, graph structures
are considered [11, 12, 13, 14]. Although DataGuides [11] can precisely represent
the structure of semistructured data, the cost of incremental maintenance is
high [12, 13]. Wang et al. have proposed the approximate graph schema for summarizing
semistructured data graphs by using an incremental clustering method [14].
The method proposed in this paper does not approximate data. The shape graphs,
however, have a structure similar to the approximate graph schema, obtained by unifying
the shape graphs having the same name into one. The proposed method can bring
a concise representation of instance graphs to users.

6 Conclusion
This paper extended the shape graph in order to capture self-nested objects.
The mapping function from a collection graph to the shape graph was revised in
the formal definition. The procedure Analyze, which obtains the recursive structure,
was clarified. The mapping function could be defined through the information obtained
by the procedure Analyze. A recursively contained shape graph is represented with
a broken line in drawings of shape graphs. These extensions enable the shape graph to
represent self-nested objects properly.
Future research includes the application of the shape graph to a real application.
We plan to represent the contents of videos capturing the movements of traditional
Japanese craft workers.

Acknowledgements. This research is partially supported by the Ministry of Education,
Science, Sports and Culture, Grant-in-Aid for Scientific Research (B), 23300037, 2011-2014.

References
1. Petrakis, E.G.M., Faloutsos, C.: Similarity Searching in Medical Image Databases. IEEE
Trans. on Know. and Data Eng. 9, 435–447 (1997)
2. Uehara, K., Oe, M., Maehara, K.: Knowledge Representation, Concept Acquisition and
Retrieval of Video Data. In: Proc. of Int’l Symposium on Cooperative Database Systems
for Advanced Applications, pp. 218–225 (1996)
Incremental Representation and Management of Recursive Types 339

3. Jaimes, A.: A Component-Based Multimedia Data Model. In: Proc. of ACM Workshop
on Multimedia for Human Communication: from Capture to Convey (MHC 2005), pp.
7–10 (2005)
4. Manjunath, B.S., Salembier, P., Sikora, T. (eds.): Introduction to MPEG-7. John Wiley
& Sons, Ltd (2002)
5. Hochin, T.: Graph-Based Data Model for the Content Representation of Multimedia
Data. In: Proc. of 10th Int’l Conf. on Knowledge-Based Intelligent Information and Eng.
Systems (KES 2006), pp. 1182–1190 (2006)
6. Hochin, T., Nomiya, H.: A Logical and Graphical Operation of a Graph-based Data
Model. In: Proc. of 8th IEEE/ACIS Int’l Conference on Computer and Information Sci-
ence (ICIS 2009), pp. 1079–1084 (2009)
7. Hochin, T.: Decomposition of Graphs Representing the Contents of Multimedia Data.
Journal of Communication and Computer 7(4), 43–49 (2010)
8. Ohira, Y., Hochin, T., Nomiya, H.: Introducing Specialization and Generalization to a
Graph-Based Data Model. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett,
R.J., Jain, L.C. (eds.) KES 2011, Part IV. LNCS, vol. 6884, pp. 1–13. Springer, Heidel-
berg (2011)
9. Silberschatz, A., Korth, H., Sudarshan, S.: Database System Concepts, 4th edn. McGraw-
Hill (2002)
10. Tanaka, K., Nishio, S., Yoshikawa, M., Shimojo, S., Morishita, J., Jozen, T.: Obase Ob-
ject Database Model: Towards a More Flexible Object-Oriented Database System. In:
Proc. of Int’l. Symp. on Next Generation Database Systems and Their Applications
(NDA 1993), pp. 159–166 (1993)
11. Goldman, R., Widom, J.: DataGuides: Enabling Query Formulation and Optimization
in Semistructured Databases. In: Proc. of 23rd Int’l Conf. on Very Large Databases, pp.
436–445 (1997)
12. Nestorov, S., Ullman, J., Wiener, J., Chawathe, S.: Representative Objects: Concise Rep-
resentations of Semistructured, Hierarchical Data. In: Proc. of 13th Int’l Conf. on Data
Engineering (ICDE 1997), pp. 79–90 (1997)
13. Soe, D.-Y., Lee, D.-H., Moon, K.-S., Chang, J., Lee, J.-Y., Han, C.-Y.: Schemaless Repre-
sentation of Semistructured Data and Schema Construction. In: Tjoa, A.M. (ed.) DEXA
1997. LNCS, vol. 1308, pp. 387–396. Springer, Heidelberg (1997)
14. Wang, Q.Y., Yu, J.X., Wong, K.-F.: Approximate Graph Schema Extraction for Semi-
structured Data. In: Zaniolo, C., Grust, T., Scholl, M.H., Lockemann, P.C. (eds.) EDBT
2000. LNCS, vol. 1777, pp. 302–316. Springer, Heidelberg (2000)
15. Chidlovskii, B.: Schema Extraction from XML Data: a Grammatical Inference Ap-
proach. In: Proc. of 8th Int’l Workshop on Knowledge Representation Meets Databases,
KRDB 2001 (2001)
16. Garofalakis, M., Gionis, A., Rastogi, R., Seshadri, S., Shim, K.: XTRACT: Learning
Document Type Descriptors from XML Document Collections. In: Data Mining and
Knowledge Discovery, vol. 7, pp. 23–56. Kluwer Academic Publishers (2003)
17. Bex, G.J., Neven, F., Schwentick, T., Vansummeren, S.: Inference of Concise Regular
Expressions and DTDs. ACM Trans. on Database Systems 35(2) (2010)
18. Bex, G.J., Gelade, W., Neven, F., Vansummeren, S.: Learning Deterministic Regular Ex-
pressions for the Inference of Schemas from XML Data. ACM Trans. on the Web 4(4)
(2010)
Intelligent Collage System

Margarita Favorskaya, Elena Yaroslavtzeva, and Konstantin Levtin*

Abstract. Automatic collage design is a widespread task in many applications,
from home collages to commercial advertising projects. We have proposed an
intelligent collage system based on adaptive segmentation of Regions of
Interest (ROI) in photos or video frames. By ROI we mean images of people,
their faces, animals, cars, ships, buildings, or other large objects that are
typically situated near the center of a photo. We have also enhanced a method
of seamless blending of collage regions using flexible contours and the color
distribution on region boundaries. In the case of a video sequence, we use
adaptive selection of frames and a special algorithm for removing
non-informative or repeated frames. The intelligent collage system can work in
automatic or manual mode, in the latter case with tuning of parameters. The
main advantages of our approach are the great variability of possible
placements of collage regions and the high aesthetic effect of the designed
collage.

1 Introduction
A digital collage is an effective presentation of various images combined into
one or several scenarios according to user preferences. During intelligent
collage modeling, we solve the task of optimally placing collage segments;
this is equivalent to the cut-and-package task. We also need suitable ROI,
which are selected according to certain criteria and associated with subjects
from our everyday environment. When we have a limited set of photos, the
collage system is restricted in its selection of informative images;
otherwise, an algorithm for ROI detection is run. We therefore apply a simple
boosting procedure for face detection and a procedure for removing repeated
and non-informative frames from video sequences. The optimal placement of
collage regions is insufficient unless we use a method for seamlessly joining
the regions. The most efficient method of collage

Margarita Favorskaya ⋅ Elena Yaroslavtzeva ⋅ Konstantin Levtin
Siberian State Aerospace University, 31 Krasnoyarsky Rabochy, Krasnoyarsk, 660014 Russia
e-mail: favorskaya@sibsau.ru, yaroslavtzeva@sibsau.ru, levtin@sibsau.ru

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 341–350.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

design is the use of seamless blending of regions with similar color areas.
Such smooth transitions between collage regions require algorithms that select
similar color boundaries and maintain smoothness and opacity parameters. It is
also desirable that an automatic collage system process photos and videos at
high speed.
One can find in the literature many methods that deal separately with photo
collage and video collage [1–3]. We have proposed an integrated approach in
which video collage is considered an extension of the photo collage technique.
In the case of video collage, we face all the tasks of photo collage plus the
selection of informative frames, called key-frames. Three aspects characterize
collage design: sampling of candidate images, their content analysis, and the
seamless joining of collage regions. Existing methods for sampling candidate
images from a photo set or video are classified into two categories, i.e.,
stochastic skimming and intelligent summary. Stochastic skimming composes
random series of frames from one or more photo collections or video sequences.
The more promising intelligent summary contains selected photos or key-frames,
chosen manually or automatically. The task of automatic extraction is of
special interest, and we can consider it a particular case of content analysis
in multimedia databases. The next aspect is content analysis of the selected
photos or key-frames. Two paradigms, frame-based and ROI-based, are available
for such systems. The advantage of the second paradigm is evident, because a
photo or key-frame may include many non-informative areas. Methods of seamless
blending may be simple or complex; our contribution is the use of flexible
contours for this purpose.

2 Related Work
Yeung and Yeo were the pioneering researchers who presented their system
“Picture Collage” [4]. A compact and smooth algorithm that automatically and
seamlessly arranges the ROI within a photo collage was proposed by Rother et al.
[5]; this project was called “AutoCollage”. Wang et al. then extended the ideas
of “AutoCollage” to video sequences and proposed “Video Collage” [6]. ROI were
extracted from representative temporal key-frames and aggregated in spatial
structures with blended boundaries. Wang et al. preferred fixed rectangles for
ROI and for the final collage, a strategy not well suited to human aesthetic
perception. A great diversity of collage templates with arbitrarily shaped ROI
and different styles significantly improves the browsing of video content.
That is why a kind of enhanced video collage, called “Free-Shaped Video
Collage”, was proposed by Yang et al. [7]. At present, great variability of
shapes and templates has been introduced into multiple software products that
provide manual tools for collage design.
Many authors define criteria for cutting and packing collage regions for
automatic ROI placement on the final image according to such basic properties
as representativeness, compactness, and visual smoothness. In some studies
these properties are formalized as a series of energies [7]. For each pixel pC
in a collage template C, its label L(pC) depends on the number n of frames
F∈{F1, F2, … , Fn}, the resize factor r of frame Fi, which makes more salient
regions larger (resized frame Fi′(r)), and a shift parameter s, which denotes
the 2D shift between the original image I(p) and the resized frame Fi′(r). The
optimal solution minimizes an energy function E(L):
E(L) = w1 Erep(L) + w2 Ecomp(L) + w3 Esmo(L), (1)

where Erep(L), Ecomp(L), and Esmo(L) denote a representativeness cost, a
compactness cost, and a visual smoothness cost, respectively; w1, w2, and w3
are their empirical weights. More detailed estimations of Erep(L), Ecomp(L),
and Esmo(L) are also defined according to empirical dependences, which
contribute little to a theory of human aesthetic perception. We take a
different approach, based on traditional estimators in digital image theory:
mask preparation for ROI, contour analysis, affine transformations, and
descriptors of color, brightness, and smoothness. Moreover, the three collage
styles of [7] (book, diagonal, and spiral) limit the representation of a final
collage. We propose an intelligent search algorithm based on optimization of
the placement, overlap, and sizes of ROI.
However, video collage has some principal challenges that make photo collage
techniques directly unsuitable for video collage: the selection of
representative key-frames from temporal scenes and the alignment of the most
salient ROI. More advanced techniques have been applied to effective content
summarization, such as stained-glass visualization [8], video snapshots [9],
and some others.
The paper is organized as follows. Section 3 briefly summarizes the enhanced
method of frame selection. Section 4 describes the main underlying ideas of
the proposed cut-and-package algorithm. The method of seamless blending based
on flexible contours is described in Section 5. The system “Intelligent
Collage” and experimental results are presented in Section 6, while Section 7
contains conclusions and future research.

3 Frames Selection
We have proposed an enhanced method of frame selection from a video sequence.
This method uses additional criteria that were not considered in known
publications. We develop automatic frame selection based on a heuristic search
technique and a production system for decision making. First, we pick a frame
F0 randomly and check whether it satisfies the determined constraints. We
formulate these constraints as follows:
1. The current frame Fi cannot be equivalent to any frame in the set of selected
frames FC = {F0, …}.
2. The current frame Fi is not a “pause” frame between scenes.
3. The current frame Fi is not an intermediate (black or white) frame between
scenes.
4. The current frame Fi does not include subtitles.
5. The current frame Fi contains ROI (faces, buildings, etc.).

If we analyze a TV video sequence, we also have to deal with logotypes. This
problem is solved by segmenting ROI in the center of the frame, because
logotypes are usually situated in the four corners of the frame. If the
current frame Fi does not satisfy the declared constraints, we check the
following frame Fi+1, and so on, until the needed frame F1 is detected and
written into the set of selected frames FC. This set is limited by the number
of collage segments (in our case, 6). In the same way we find the second
selected frame F2, the third selected frame F3, and so on. If no objects of
interest are detected in the whole video sequence (after a set number of
iterations), random frames satisfying the remaining criteria are selected.
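The selection loop can be sketched as follows; the constraint predicates are hypothetical placeholders for the five checks above, since the paper does not give their implementations.

```python
import random

def select_frames(frames, n_segments=6, max_iters=1000, constraints=()):
    """Pick up to n_segments frames that satisfy every constraint predicate.

    `constraints` is a sequence of functions f(frame, selected) -> bool,
    standing in for the paper's five checks (not a duplicate, not a pause
    frame, not a black/white transition, no subtitles, contains ROI).
    """
    selected = []
    for _ in range(max_iters):
        if len(selected) == n_segments:
            break
        frame = random.choice(frames)
        if all(check(frame, selected) for check in constraints):
            selected.append(frame)
    # Fallback: if too few frames passed all checks, fill with random ones
    # that at least exist, mirroring the paper's fallback to random frames.
    while len(selected) < n_segments and frames:
        selected.append(random.choice(frames))
    return selected
```

The duplicate-frame check, for instance, can be passed as `lambda f, sel: f not in sel`.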
The enhanced method of frame selection permits extracting a more
representative sample not only from one video sequence but also from several
available video sequences. The following stage is ROI cutting, followed by
sorting and packing of collage segments into the final collage.

4 Cut and Package Algorithm


Our algorithm is not a universal tool, but it is very efficient for collage
design when the number of collage segments is fixed in advance, in our case at
6. This is the main constraint of the proposed approach. The task of cutting
ROI from selected images or frames is connected with content analysis and
pattern recognition; it has no appropriate solution for an unlimited number of
object classes. That is why collage designers limit the objects of interest to
faces, people, buildings, and other large significant objects from our
everyday environment. We have applied a simple boosting procedure for face and
people detection in video sequences, based on anthropometric features
[10, 11]. We also remove some undesirable small objects (logotypes, subtitles)
from selected frames [12].
The packing algorithm sequentially places collage segments on the canvas of
the final image according to the parameter “size of overlapping segments”. It
includes the following stages: (1) sorting, (2) placement, and (3)
minimization of unfilled areas of the canvas. Suppose we have 6 input images.
We sort them by size as follows: image pair 1 includes the two images with
maximum and minimum sizes; image pair 2 contains the two images with maximum
and minimum sizes among those remaining after pair 1; image pair 3 consists of
the unlabeled images. This procedure balances the optimization of placement,
overlap, and sizes of collage segments.
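The pairing step above can be sketched as follows; the size measure and the tuple format are our assumptions, added for illustration only.

```python
def pair_by_size(images):
    """Sort 6 images by size and group them into the three pairs described
    in the text: (max, min), (next max, next min), (the two middle ones).

    `images` is a list of (name, size) tuples; size could be, e.g., pixel
    area.
    """
    ordered = sorted(images, key=lambda im: im[1])  # ascending by size
    pair1 = (ordered[-1], ordered[0])   # largest and smallest
    pair2 = (ordered[-2], ordered[1])   # largest and smallest remaining
    pair3 = (ordered[-3], ordered[2])   # the two unlabeled middle images
    return [pair1, pair2, pair3]
```

Pairing the extremes keeps each pair's combined area roughly balanced, which is what the placement stage relies on.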
Placement of collage segments is the next stage of the algorithm. The first
image is chosen randomly from the 6 selected images or frames and placed in
the top left corner of the canvas; moreover, the size of this image is varied
randomly but in scale ratios relative to the collage canvas. This constraint
was introduced for uniform canvas filling. Then we begin to locate the image
pairs determined at the previous stage. We therefore need the sizes and corner
coordinates of the current segment, which depend on the sizes, corner
coordinates, and overlap area of the previous segment:
[xi, yj] = [xi±1 ⋅ (1 − L); yj±1 ⋅ (1 − L)], (2)

where xi, yj are the coordinates of the current segment; xi±1, yj±1 are the
coordinates of the adjacent horizontal and vertical segments, respectively;
and L is the parameter “size of overlapping segments”.
The height of the second segment (the second image of a pair) is calculated as
H2 = H0 − H1(1 − L), (3)

where H0 is the height of the canvas and Hi is the height of the corresponding
segment. The width of the second segment is calculated proportionally.
The second segment is placed under the first segment at coordinates
[x2, y2] = [x1; y1 + H1(1 − L)]. (4)

The third and fourth segments are located in the right part of the canvas,
similarly to the first pair. The remaining images become the fifth and sixth
segments; they are situated in the unfilled areas between the first and third,
and the second and fourth, segments, respectively. The sizes of these segments
depend on the area of the unfilled regions and the overlap sizes:
W5,6 = W0 − W1,2(1 − L) − W3,4(1 − L),
[x5, y5] = [x1 + W1(1 − L); y1], (5)
[x6, y6] = [x2 + W2(1 − L); y5 + H5(1 − L)],
where W0 is the width of the canvas and Wi is the width of the corresponding
segment. The heights of the fifth and sixth segments are calculated
proportionally.
Then we analyze unfilled areas on the collage canvas. If such areas are
detected, the sizes of adjacent segments are increased until the gaps
disappear. Fig. 1 shows existing approaches and a result of our technique.
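Equations (3) and (4) can be sketched as a small helper; the function name and the sample numbers in the note below are illustrative assumptions, not from the paper.

```python
def place_left_column(H0, H1, L, x1=0, y1=0):
    """Place the second segment of a pair per Eqs. (3)-(4): it sits under
    the first segment, overlapping it by a fraction L of the first
    segment's height, and fills the rest of the canvas height H0."""
    H2 = H0 - H1 * (1 - L)            # Eq. (3): remaining canvas height
    x2, y2 = x1, y1 + H1 * (1 - L)    # Eq. (4): shifted down minus overlap
    return H2, (x2, y2)
```

For example, with a canvas of height 100, a first segment of height 60, and overlap L = 0.5, the second segment gets height 70 and is placed at y = 30; the right column and the middle segments follow the same pattern with Eqs. (2) and (5).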

Fig. 1 Examples of segment placement: a) and b) first and last stages of
segment placement (by the k-means method) in the system “Mobile photo collage”
[2]; c) and d) results without and with segment overlapping in the system
“Microsoft AutoCollage”; e) our result with 6 segments

5 Seamless Blending Based on Flexible Contours


Our method of flexible contour generation is based on adaptive thresholding,
under the theoretical assumption that ROI have a higher intensity and a more
textured structure than the background. After binary splitting, ROI are
labeled by white pixels and the background by black pixels. We also assume
that the area of the background is larger than the area of the ROI;
otherwise, the binary mask is inverted.
The next stage is the building of flexible contours of ROI with noise
compensation and approximation of ROI by graphic primitives. We realized two
methods in our system: a method of shape estimation and a block analysis
method. The method of shape estimation includes iterative marking of connected
areas and removal of areas whose sizes are less than a threshold value for the
determined side ratio. Then we estimate the shapes of the connected areas,
select approximating graphic primitives for each connected area, and finally
generate the ROI.
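The connected-area marking and small-region removal step can be sketched with a simple flood-fill labeling; the fixed pixel-count threshold and the omission of the side-ratio check are our simplifications.

```python
def filter_small_regions(mask, min_size):
    """Shape-estimation sketch: label 4-connected white regions of a binary
    mask (list of lists of 0/1) and erase those smaller than min_size
    pixels, treating them as noise."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                # Flood-fill one connected region, collecting its pixels.
                stack, region = [(sy, sx)], []
                seen[sy][sx] = True
                while stack:
                    y, x = stack.pop()
                    region.append((y, x))
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w and \
                           mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(region) < min_size:   # noise: erase the region
                    for y, x in region:
                        mask[y][x] = 0
    return mask
```

The surviving regions would then be approximated by graphic primitives as described above.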
The block analysis method is based on partitioning the image into blocks of
size n × n. We then analyze the filled level of each block and replace each
filled block with a blob. In our experiments n was set to 2, 4, 8, or 16
(according to the image size), and optimal ROI approximation was achieved near
300 blobs. The radius of an approximating blob is calculated as

Ri,j = ki,j ⋅ n + n; ki,j = Fli,j / n², (6)

where i, j are coordinates; Ri,j is the blob radius; ki,j is a coefficient
based on the filled level of the block; n is the block size; and Fli,j is the
number of pixels belonging to the object of interest (the filled level). The
coefficient ki,j may increase the blob size.
The coordinates of the blob center are determined as the coordinates of the
mass center of the ROI divided by the value Fli,j. Then we generate the array
of blobs that approximates the ROI. The block analysis method is characterized
by increased complexity and uses 100–200 blocks (instead of 1–10 blocks in the
method of shape estimation). However, with optimization, the block analysis
method has sufficient speed and higher accuracy in comparison with the method
of shape estimation.
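Eq. (6) can be sketched as follows; for brevity the blob center is taken as the block center, whereas the paper derives it from the mass center of the ROI pixels, so this is a simplified illustration.

```python
def block_blobs(mask, n):
    """Block-analysis sketch of Eq. (6): split a binary mask into n-by-n
    blocks and emit one blob (center, radius) per non-empty block, with
    R = k*n + n and k = Fl / n**2, where Fl counts ROI pixels in the
    block."""
    h, w = len(mask), len(mask[0])
    blobs = []
    for by in range(0, h, n):
        for bx in range(0, w, n):
            fl = sum(mask[y][x]
                     for y in range(by, min(by + n, h))
                     for x in range(bx, min(bx + n, w)))
            if fl:
                k = fl / n ** 2              # filled-level coefficient
                r = k * n + n                # Eq. (6)
                blobs.append(((by + n / 2, bx + n / 2), r))
    return blobs
```

A fully filled block thus gets k = 1 and radius 2n, so denser blocks produce larger blobs, as the text describes.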
The resulting array of approximating graphic primitives is used for fuzzy mask
generation during seamless blending between ROI on the collage canvas. The
recursive blending algorithm applies the image processed by a fuzzy mask at
step n−1 as the background at the following step n. A fuzzy mask is a
grey-scale image fragment of the ROI size. It permits imposing a collage
segment on the background with a determined transparency for each pixel. There
are two principally different ways of fuzzy mask generation: a simple method
of semitransparent transitions between segments and background, and an
adaptive method based on the calculated ROI approximation. The second method
is preferable for large and non-uniform ROI.
The imposition procedure is executed as so-called α-blending of textures
according to the following rules. White areas of the fuzzy mask give opaque
ROI, black areas give opaque background, and grey-scale values create
semitransparent combinations of ROI and background by the formula:

Iout(i,j) = IROI(i,j) ⋅ Imask(i,j) + Ibg(i,j) ⋅ (Imax − Imask(i,j)), (7)

where i, j are coordinates; Iout is the output pixel value; IROI is the ROI
pixel value; Imask is the mask pixel value; Ibg is the background pixel value;
and Imax is the maximum value in the given color space.
The image produced by Eq. (7) is the output collage with the seamless blending
effect. The main advantage of this adaptive method of collage design is the
ability to detect ROI while excluding non-informative regions, and to design
blending transitions between ROI and background. Thanks to these properties,
the output collage becomes a demonstrative, visually attractive, and balanced
product.
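The α-blending rule of Eq. (7) can be sketched per pixel as follows; note that this sketch normalizes by Imax so 8-bit values stay in range, a step Eq. (7) leaves implicit.

```python
def alpha_blend(roi, mask, bg, i_max=255):
    """Per-pixel alpha blending per Eq. (7): white mask pixels give opaque
    ROI, black pixels give opaque background, grey values give
    semitransparent mixtures. Images are lists of rows of 0..i_max ints."""
    out = []
    for roi_row, mask_row, bg_row in zip(roi, mask, bg):
        out.append([(r * m + b * (i_max - m)) // i_max
                    for r, m, b in zip(roi_row, mask_row, bg_row)])
    return out
```

Applying this with the previous step's blended result as the background implements the recursive blending described above.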

6 System “Intelligent Collage” and Experimental Results


The system “Intelligent Collage”, v. 2.0, for composing photo and video
collages, was designed in the rapid application development environment
Borland Delphi 7 using the language Object Pascal. The software includes four
main modules: a module for frame selection from video sequences, a module for
segment placement, a module for ROI detection, and a module for collage
generation. The main functions of these modules are listed in Table 1.
For working with video sequences through the DirectShow utility, we used the
DSPACK component. A set of components from the AlphaControls palette was
applied for the interface design. We also created user components with the
following functions: TStretchHandle for segment transformation in manual
Drag&Drop mode, TCoolTrayIcon for minimizing the resources of the running
application, and TOrImage for extended graphic effects in the collage.

Table 1 Main functions of system “Intelligent Collage”

Module                                          Functions
Module of frame selection from video sequences  1. Removing similar frames.
                                                2. Removing frame transitions between scenes.
                                                3. Removing subtitles.
                                                4. Removing blended frames.
                                                5. Face detection.
Module of segment placement                     1. Location according to a determined prototype.
                                                2. Random location.
                                                3. Interactive random location with canvas filling.
                                                4. Automatic location.
Module of ROI detection                         1. Method of shape estimation.
                                                2. Block analysis method.
Module of collage generation                    1. Seamless blending.
                                                2. Fuzzy mask generation.
                                                3. α-blending.

The software “Intelligent Collage” has an intuitive user interface that
permits generating a composition from 6 images or frames with a determined
background in a few simple steps. Digital photos or selected frames are used
for collage generation. We realized four automated methods of collage
generation: with prototypes (3 main variants), randomly, by filling, and by
automatic location. An interactive manual mode also permits creating and
editing each collage region. Adaptive ROI location, seamless blending, support
for various color decorations of the user interface, filters for processing
the designed compositions, project saving in a native format, and subsequent
editing are the main intelligent functions of this software. Examples of
various types of generated collages are presented in Fig. 2. We see that the
criteria of collage visualization and ROI selection on various images are very
personal and depend strongly on expert opinion.
The software “Intelligent Collage” was scored by empirical and visual
estimations given by 30 experts. The flexible contours were estimated by three
parameters (ROI selection, seamless blending, and collage relevance). Three
methods were used for this comparison: rectangular formatting, the method of
shape estimation, and the block analysis method. The test results, in scores
(0…100), are presented in Table 2. One can see that the most effective method
is the block analysis method. Estimations of computation speed are given in
Table 3. Tests 1, 2, and 3 were selected from three video sequences by the
frame selection module of “Intelligent Collage”.

Fig. 2 Types of collages: a) a collage designed by simply compositing regions
on a background; b) a collage with the seamless blending effect without
adaptive ROI; c) a collage with the seamless blending effect and adaptive ROI
by the method of shape estimation; d) a collage with the seamless blending
effect and adaptive ROI by the block analysis method
Intelligent Collage System 349

Table 2 Visual estimations of test collating

                   Rectangular formatting   Shape estimation   Block analysis
ROI selection
Test 1             30                       76                 84
Test 2             23                       75                 89
Test 3             40                       78                 81
Seamless blending
Test 1             42                       80                 82
Test 2             33                       85                 84
Test 3             34                       87                 90
Collage relevance
Test 1             67                       91                 91
Test 2             79                       93                 95
Test 3             73                       89                 94

Table 3 Estimations of computer speed

                         Test 1, s  Test 2, s  Test 3, s  Test 4, s  Test 5, s  Test 6, s  Test 7, s
Rectangular formatting   2.54       3.71       3.61       3.25       3.32       3.78       4.03
Block analysis           13.02      11.02      10.5       12.12      10.09      11.64      12.37
Shape estimation         18.77      15.56      16.02      19.01      15.07      15.69      17.28

7 Conclusion
The adaptive methods applied in the software “Intelligent Collage”, v. 2.0,
permit realizing several interesting functions and achieving a high aesthetic
effect of the designed collage. Collage relevance is also at a high level
thanks to exact ROI selection based on the developed methods of shape
estimation and block analysis.
We will continue research to increase the accuracy of ROI selection in frames.
We plan to design a hybrid method combining the advantages of the shape
estimation and block analysis methods: the shape of a single blob will be
analyzed and approximated by a graphic primitive of more suitable shape. We
also intend to extend the set of criteria for object detection by using
pattern recognition methods.

References
1. Diakopoulos, N., Essa, I.: Mediating Photo Collage Authoring. In: UIST, pp. 183–186
(2005)
2. Man, H.-L., Singhal, N., Cho, S., Park, I.K.: Mobile photo collage. In: IEEE WECV,
pp. 24–30 (2010)

3. Mei, T., Hua, X.-S., Zhu, C.-Z., Zhou, H.-Q., Li, S.: Home Video Visual Quality As-
sessment with Spatiotemporal Factors. IEEE Transactions on Circuits and Systems for
Video Technology 17(6), 699–706 (2007)
4. Yeung, M.M., Yeo, B.L.: Video visualization for compact presentation and fast brows-
ing of pictorial content. IEEE Trans. on CSVT 7(5), 771–785 (1997)
5. Rother, C., Bordeaux, L., Hamadi, Y., Blake, A.: AutoCollage. In: SIGGRAPH, pp.
847–852 (2006)
6. Wang, T., Mei, T., Hua, X.-S., Liu, X., Zhou, H.-Q.: Video Collage: A Novel Presen-
tation of Video Sequence. In: ICME, pp. 1479–1482 (2007)
7. Yang, B., Mei, T., Sun, L.-F., Yang, S.-Q., Hua, X.-S.: Free-Shaped Video Collage. In:
Satoh, S., Nack, F., Etoh, M. (eds.) MMM 2008. LNCS, vol. 4903, pp. 175–185.
Springer, Heidelberg (2008)
8. Chiu, P., Girgensohn, A., Liu, Q.: Stained-glass visualization for highly condensed
video summaries. In: ICME, pp. 2059–2062 (2004)
9. Ma, Y.-F., Zhang, H.-J.: Video snapshot: A bird view of video sequence. In: MMM
2005, pp. 94–101 (2005)
10. Favorskaya, M.: A Way to Recognize Dynamic Visual Images on the Basis of Group
Transformations. Pattern Recognition and Image Analysis 21(2), 179–183 (2011)
11. Favorskaya, M.: Motion Estimation for Object Analysis and Detection in Videos. In:
Handbook “Advances in Reasoning-Based Image Processing, Analysis and Intelligent
Systems: Conventional and Intelligent Paradigms, pp. 211–253. Springer (2012)
12. Favorskaya, M., Zotin, A., Damov, M.: Intelligent Inpainting System for Texture
Reconstruction in Videos with Text Removal. In: ICUMT, pp. 867–874 (2010)
Intuitive Humanoid Robot Operating
System Based on Recognition and
Variation of Human Body Motion

Yuya Hirose and Shohei Kato

Abstract. In this paper, we propose an intuitive operating system for
humanoid robots based on recognition and variation of human body motion. In
this system, human body motion is dynamically sensed by six-axis sensors in
controllers held in the user's hands. The variation of human body motion is
also recognized by gyro sensors in the controllers. Based on the obtained
sensor information, a body motion classifier is constructed using a Hidden
Markov Model. By using the body motion classifier, the proposed system
recognizes the user's body motion, and redundant motions of the humanoid
robot are prevented. The user can intuitively operate the humanoid robot and
make it perform the user's intended motion. We conducted a task experiment to
evaluate the usability of the proposed system. In the experiment, for
comparison with the proposed system, we prepared three methods of operation:
a joystick, Kinect, and an operating system using six-axis sensors without
recognizing the variation of human body motion. As a result, compared with
the other systems, we confirmed that the proposed system has more versatility
and that the humanoid robot could be operated more appropriately with it. It
was suggested that the proposed system enables users to operate a humanoid
robot appropriately and intuitively.

1 Introduction
In recent years, personal robots have been researched intensively. A personal
robot is a robot that lives in the same environment as humans and supports
daily life while communicating with them. Various personal robots have been
developed, for example, PaPeRo [7] and PARO [12]. These personal robots are
designed for intended therapeutic effects on users. Humanoid personal robots,
which have a shape similar to that of humans, have also been developed.
Because of their human form, humanoid personal robots have the potential to
work in the same environment as humans, for example, on household chores,
helping the elderly, and simple tasks in the workplace. Therefore, the
development of humanoid personal robots is anticipated, because they can help
humans in many ways if they become widespread.
In very recent years, many and various robot control systems and interfaces
have been proposed and developed. The personalization of robot control has
also attracted much attention. The user's personalized motion can be adapted
into the robot motion by dynamic and direct control, in which a user can
intuitively operate the robot. For example, motion capture is one of the most
usable interfaces for obtaining a user's free motion, and the obtained motion
data are directly used for robot control in existing studies [6, 10].
However, a direct motion control system may catch some unintended motions and
reflect them in the robot; hence, it may reduce usability. On the other hand,
recognition of the user's motion is also used in robot control systems [5, 9]
in order to reduce unintended motion; hence, it may be usable for problems in
which no mistakes are allowed. However, using only a recognition mechanism in
robot motion control, personalization cannot be realized. Therefore, we
consider that personalization requires both the recognition of a user's
motion and the reflection of the user's dynamic control adjustment. Thus, we
propose a robot motion control system that combines motion recognition and
dynamic adjustment, and discuss its usability through task and subjective
evaluation experiments.

2 Motion Recognition/Control System


Figure 1 shows an overview of the motion recognition/control system. The system is composed of three phases: a learning section, a recognition section, and a control section. In this research we use Wiimotes as six-axis sensors; they are widely used as game controllers and have also been applied to robot control in recent years [3], [4], [11]. In the proposed system, the acceleration and angular velocity of the user's hands are obtained through the Wiimotes and used for recognizing the user's arm motion. The recognition is realized using a Hidden Markov Model (HMM), which can recognize time-series data.
Each motion to be recognized has a corresponding motion model. The proposed system constructs the motion models in the learning section and recognizes a motion based on these models. Then, in the control section, the system calculates the variation of the robot's joint angle corresponding to the recognition result, using the information from the angular velocity sensor that changed with the user's body motion.
Intuitive Humanoid Robot Operating System Based on Recognition 353

Fig. 1 Overview of the proposed system. (The diagram shows three phases: a learning section, which generates observation sequences of acceleration O_X, O_Y, and O_Z for the motions M = {m_1, m_2, ..., m_N} and constructs the HMMs λ_{m_1}, λ_{m_2}, ..., λ_{m_N} with the Baum-Welch algorithm; a recognition section, which recognizes a motion u using the Viterbi algorithm; and a control section, which calculates the joint angle variation from the angular velocity and drives the robot action.)

Fig. 2 Left-to-right HMM. (Three states q_1, q_2, q_3 with self-transitions a_11, a_22, a_33, forward transitions a_12, a_23, a_34, and output probabilities b_1(o), b_2(o), b_3(o).)

2.1 Hidden Markov Model


The HMM is a well-known recognition technique for time-sequence data, often used in speech recognition. In this paper, we use the left-to-right HMM shown in Figure 2 to recognize the user's motion. The HMM λ = (Q, S, A, B, π) consists of the following five elements: a finite set of states Q = {q_1, ..., q_N}; a finite set of output symbols S = {o_1, ..., o_M}; state transition probabilities A = {a_ij} for transiting from state q_i to state q_j; output probabilities B = {b_i(o_j)} of emitting an observation value o_j in state q_i; and the initial probability distribution π = {π_i} of being in state q_i at the initial time.
Moreover, we use the Baum-Welch algorithm [1] to learn the parameters {A, B, π} and the Viterbi algorithm [2] to recognize the user's motion.
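To make the recognition step concrete, here is a minimal sketch (not the authors' implementation) of computing the Viterbi log-likelihood log P(O | λ) for a discrete left-to-right HMM; the model parameters below are invented toy values.

```python
import math

def safe_log(p):
    """log that tolerates zero probabilities (returns -inf)."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi_log_likelihood(obs, pi, A, B):
    """Log-probability of the best state path for the symbol sequence obs,
    i.e. the Viterbi approximation to log P(O | lambda)."""
    n = len(pi)
    # initialization: delta_1(i) = log pi_i + log b_i(o_1)
    delta = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        # recursion: delta_t(j) = max_i (delta_{t-1}(i) + log a_ij) + log b_j(o_t)
        delta = [max(delta[i] + safe_log(A[i][j]) for i in range(n))
                 + safe_log(B[j][o]) for j in range(n)]
    return max(delta)

# toy 3-state left-to-right model (invented numbers), 2 output symbols
pi = [1.0, 0.0, 0.0]                      # always start in q1
A = [[0.6, 0.4, 0.0],                     # only self- and forward transitions
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # b_i(o)
ll = viterbi_log_likelihood([0, 1, 1, 0], pi, A, B)
print(ll)
```

Because the left-to-right structure forbids backward transitions, the zeros in A simply become -inf terms that the max ignores.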

2.2 Learning Section


In the learning section, the proposed system constructs a motion model for each of the N motion patterns to be recognized. The motion model of a user's body motion u is denoted M^u and is composed of the following elements: the HMMs and the set of robot joints. From the acceleration
Table 1 Recognition Algorithm

1. From the user's motion u, the acceleration sequences O_X^u, O_Y^u, and O_Z^u are generated.
2. A threshold Th is set, and every motion m_i that satisfies all three of the following conditions is selected:
   N(O_X^u | λ_X^{m_i}) > Th,  N(O_Y^u | λ_Y^{m_i}) > Th,  N(O_Z^u | λ_Z^{m_i}) > Th.
   The set of the selected motions m_i is defined as M.
3. The motion m_j ∈ M that maximizes the following expression is output as the recognition result:
   arg max_{m_j ∈ M} (1/3) (N(O_X^u | λ_X^{m_j}) + N(O_Y^u | λ_Y^{m_j}) + N(O_Z^u | λ_Z^{m_j})).

sequences of the user's motion u on each axis, O_X^u, O_Y^u, and O_Z^u, the HMMs (λ_X^u, λ_Y^u, and λ_Z^u) are constructed using the Baum-Welch algorithm, respectively.
The set of robot joints J^u is composed of the joints whose angles need to be changed for the user's motion u. J^u is determined heuristically; for example, J^down = {shoulder joint} in the case of the arm-down motion and J^push = {shoulder joint, elbow joint} in the case of pushing the arm forward.
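The motion model described above can be sketched as a simple container: per-axis HMMs plus the joint set J^u. The class and field names below are illustrative assumptions, not the authors' data structure.

```python
from dataclasses import dataclass

@dataclass
class MotionModel:
    """Motion model M^u for one learned body motion u:
    one HMM per acceleration axis, plus the robot joints it drives."""
    name: str
    hmm_x: object = None                  # lambda^u_X, trained on O^u_X (Baum-Welch)
    hmm_y: object = None                  # lambda^u_Y
    hmm_z: object = None                  # lambda^u_Z
    joints: frozenset = frozenset()       # J^u, determined heuristically

# the two examples given in the text: J^down and J^push
down = MotionModel("arm down", joints=frozenset({"shoulder"}))
push = MotionModel("push forward", joints=frozenset({"shoulder", "elbow"}))
```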

2.3 Recognition Section


In the recognition section, the proposed system recognizes whether the user's body motion u is one of the previously learned motions or not. Taking the motion recognition on the x-axis as an example, the recognition section is detailed in the following. The acceleration sequence of the user's motion u on the x-axis is written as O_X^u = o_{X1}^u, o_{X2}^u, ..., o_{XT}^u, where o_{Xt}^u is the t-th quantized value of the acceleration A_{Xt}^B in the absolute coordinate system during motion u, and T is the length of O_X^u. For O_X^u, the likelihood P(O_X^u | λ_X^{m_i}) can be calculated using the Viterbi algorithm. The calculated likelihood tends to be a very small value, so it is represented as a logarithm to prevent underflow:

   L(O_X^u | λ_X^{m_i}) = log P(O_X^u | λ_X^{m_i}).   (1)

In the same way, L(O_Y^u | λ_Y^{m_i}) and L(O_Z^u | λ_Z^{m_i}) can be calculated.

Fig. 3 Arm down motion.

Fig. 4 Arm bend motion.

Normalization of the Likelihood. The scale of the likelihood differs between motion models, so we normalize the likelihood for each motion model; the likelihoods then become comparable between motion models. For the normalization we use the average and the standard deviation of the likelihood over the training data used in learning each HMM. The normalized likelihood of motion u on the D-axis is calculated as

   N(O_D^u | λ_D^{m_i}) = (L(O_D^u | λ_D^{m_i}) − L̄(λ_D^{m_i})) / σ(λ_D^{m_i}),   (2)

where L̄(λ_D^{m_i}) and σ(λ_D^{m_i}) denote the average and the standard deviation, respectively, of the likelihood over the training data of motion m_i on the D-axis.
The proposed system recognizes a motion with the normalized likelihood, based on the algorithm shown in Table 1.
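Combining the normalization of equation (2) with the Table 1 algorithm, the recognition step can be sketched as follows. This is an illustrative simplification: the function names, the model statistics, and the numeric values are assumptions, and `loglik` stands in for the per-axis Viterbi log-likelihood L.

```python
def normalize(loglik, mean, std):
    """Equation (2): z-score the log-likelihood against the
    training-data statistics of the motion model on that axis."""
    return (loglik - mean) / std

def recognize(logliks, stats, threshold):
    """Table 1: keep motions whose normalized likelihood exceeds the
    threshold on all three axes, then return the one with the largest
    mean normalized likelihood (or None if no motion qualifies).

    logliks[m] = (Lx, Ly, Lz) for candidate motion m
    stats[m]   = ((mean_x, std_x), (mean_y, std_y), (mean_z, std_z))
    """
    best, best_score = None, float("-inf")
    for m, Ls in logliks.items():
        N = [normalize(L, mu, sd) for L, (mu, sd) in zip(Ls, stats[m])]
        if all(n > threshold for n in N):    # the three threshold conditions
            score = sum(N) / 3.0             # (1/3) * sum of N(...) over axes
            if score > best_score:
                best, best_score = m, score
    return best

# invented numbers for illustration
stats = {"down": ((-40.0, 5.0),) * 3, "bend": ((-50.0, 8.0),) * 3}
logliks = {"down": (-38.0, -39.0, -41.0), "bend": (-80.0, -75.0, -70.0)}
print(recognize(logliks, stats, threshold=-1.0))
```

Normalizing before thresholding is what makes a single threshold Th meaningful across motion models with different likelihood scales.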

2.4 Control Section


In the control section, the proposed system calculates the variation of the robot's joint angle corresponding to the recognition result from the angular velocity of motion u, and controls the robot based on the recognition result and this joint angle variation.

2.5 Motion Recognition Demonstration


Figures 3 and 4 show two examples of robot control using the proposed system: the user performing the arm-down motion and the arm-bend motion, respectively.

3 Usability Evaluation Experiment


To verify the usability of the proposed robot control system (S) detailed in the previous section, we conducted a usability evaluation experiment. In the experiment, we prepared three other operation methods for comparison with the proposed system: a joystick (Sj), a Kinect (Sk), and the proposed system without dynamic adjustment of the joint angle (Sr). Sj was prepared to evaluate the effect of body motion control, Sk to evaluate the effect of incorporating body motion recognition into direct control, and Sr to evaluate the dynamic adjustment of the joint angle. In Sj and Sk, the robot moves its hands to the three-dimensional coordinate received from the interface in real time; hereafter these coordinates are called the target coordinates for Sj and Sk. Each joint angle of the robot arm is calculated by inverse kinematics from the target coordinate. In Sj, the variation of the target coordinate is calculated from the direction and amount of joystick tilt. In addition, the two buttons on the joystick's side are used in Sj, because it is difficult to specify a three-dimensional coordinate using the stick alone. In Sj, the target coordinate is determined as follows: 1) tilting the joystick in pitch moves the target up and down, 2) tilting the joystick in roll moves it left and right, and 3) pushing one button or the other moves it forward or back, respectively. In Sk, the target coordinate is the three-dimensional coordinate of the user's hand obtained through the sensor.
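The joystick mapping of Sj described above can be sketched as follows; the axis conventions, the gain, and the function name are illustrative assumptions.

```python
def joystick_to_target_delta(pitch, roll, btn_fwd, btn_back, gain=1.0):
    """Map joystick state to a change of the target hand coordinate:
    pitch tilt -> up/down (z), roll tilt -> left/right (y),
    the two side buttons -> forward/back (x)."""
    dz = gain * pitch                                     # 1) up-down
    dy = gain * roll                                      # 2) left-right
    dx = gain * (1 if btn_fwd else 0) - gain * (1 if btn_back else 0)  # 3) fwd/back
    return dx, dy, dz

# one control-loop step: accumulate the delta into the target coordinate
target = [0.0, 0.0, 0.0]
dx, dy, dz = joystick_to_target_delta(pitch=0.5, roll=-0.2,
                                      btn_fwd=True, btn_back=False)
target = [target[0] + dx, target[1] + dy, target[2] + dz]
```

The target coordinate would then be fed to the inverse-kinematics step that computes each joint angle of the robot arm.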

3.1 Task Experiment


To evaluate the operability of the proposed system for robot control, we conducted a task experiment (shown in Figure 5) under the assumption that the robot is controlled from a remote place. The robot was placed in a remote location that the participant could not see directly. The participant controlled the robot to perform a task while viewing the robot's condition on a monitor, captured through a web camera. The condition for success was to "take up a box placed in front of the robot and put the box down at its initial position". If the participant failed the task, the robot and the box were returned to their initial condition and position, and the participant repeated the task until he or she succeeded. In the experiment, we prepared two boxes of different sizes, 55 × 90 × 46 [mm] and 55 × 102 × 46 [mm], and each participant performed the same task for each box. The humanoid robot KHR-2HV made by Kondo Kagaku Co., Ltd. (Figure 6) was used in the experiment.
The experimental environment was as follows: the participants were 20 male and female college students. The control systems and the objects (boxes A and B) were used in random order for each participant.
In this paper, we prepared four motion patterns (arm down, up, bend, and spread) for the proposed system, and five motion patterns (arm down, up, widely bend, compactly bend, and spread) for Sr.

Fig. 5 Box take up/down task.

Fig. 6 KHR-2HV (Kondo Kagaku Co., Ltd.).

Fig. 7 Experimental environment.

Figure 7 shows the experimental environment. We calculated the average time to complete the task, the standard error of the average time, and the average number of failures before success for each control system. We also calculated the recognition rate of each motion when using the proposed system and Sr. Tables 2 and 3 show the experimental results.
From Table 2, we confirmed that the proposed system could complete the task successfully regardless of box size, so it is more versatile than Sr. It was also shown that the proposed system could complete the task in a shorter

Table 2 Experimental Results

          box A (55 × 102 × 46 mm)              box B (55 × 90 × 46 mm)
       failures  avg. time (s)  SE (s)       failures  avg. time (s)  SE (s)
S         0.9        39.98       6.87           0.8        33.19       5.82
Sj        3.2       119.82      15.36           3.8       134.74      16.47
Sk        1.6        43.18       7.14           2.5        61.75       6.36
Sr        0.9        37.43       9.70           1.5        70.32      14.17

(failures = average number of failures before success)

Table 3 Recognition Rate (%)

       down    bend                 spread    up
S      86.2    90.0                  88.3    82.1
Sr     84.3    compactly: 74.5       84.2    86.1
               widely: 36.1

time than Sj . It was suggested that the proposed system could control the
robot intuitively by associated user’s motion with robot’s motion. Besides,
it was shown that the proposed system was less failure number than Sk . It
was suggested that the robot dropped a box in the situation the robot was
holding a box in its hands, because Sk directly transfered the user’s motion
to the robot and reflected even a minute motion of the user’s hands. On
the other hand, it was suggested that the proposed system could prevent
redundant motions on performing the task, because it selects a motion from
the previously learned motions by the body motion recognition. From the
above results, it was confirmed that the proposed system had relatively high
versatility, and enabled user could perform the task quickly and successfully.
Furthermore, as focusing on the standard error, it can be confirmed that the
difference of the task achieve time between users is small. From the result, the
difference is comparable to Sk in which user’s motion is directly corresponded
to the robot one. Hence, it was suggested that the proposed system dose not
need proficiency.
Meanwhile, from Table 3, the proposed system showed relatively high
recognition rate for each motion. And, especially in the bend motion, the
proposed system showed higher recognition rate than Sr . From these facts,
it was suggested that the proposed system realized not only higher degree of
freedom but also improvement of recognition rate by the dynamic adjustment
of the joint angle. Therefore it seems that the realization of high recognition
rate for varied motion is one of the factor for user to achieve the task quickly
and successfully. Overall, from these results, we confirmed that the proposed
system enables user to operate humanoid robot appropriately.

Fig. 8 Evaluation of the control systems. (Semantic differential ratings of S, Sj, Sk, and Sr on a scale from −3 to 3 for six adjective pairs: good–bad, easily-handled–hardly-handled, intuitive–non-intuitive, affinitive–non-affinitive, accustomed–unaccustomed, and new–old. Solid lines mark 1% significance; broken lines mark 5% significance.)

3.2 Subjective Evaluation Experiment


After the task experiment, we conducted a subjective evaluation of the control systems used for robot control. We used the semantic differential method (SD method) [8], and the evaluations were made on a seven-point scale for six pairs of adjectives. Figure 8 shows the evaluation results. Using the Tukey test [13], the significant differences in the subjective evaluations between the control systems were verified: the broken lines and the solid lines show 5% and 1% significance, respectively. From Figure 8, we confirmed that the proposed system and Sk received more positive evaluations than the other two control systems. Focusing on "intuitive" and "have a sense of unity", it was also confirmed that the proposed system and Sk received significantly more positive evaluations than Sj. These results suggest that the user's motions and the robot's motions correspond under the proposed system and Sk, and that these systems enable intuitive control by the user. Besides, focusing on "good", the proposed system received a significantly more positive evaluation than Sj. This confirms that the proposed system is suitable for robot control, which seems to be due to the recognition of body motion and the dynamic adjustment of the joint angle.

4 Conclusion
In this paper, we proposed an intuitive robot control system that combines recognition of the user's motion with dynamic adjustment of the robot's joint angles. The proposed system recognizes the user's motion using HMMs and dynamically reflects the joint angle variation in the robot's motion. Through the task experiment, we confirmed that the proposed system completed tasks more reliably than the compared systems, and through the subjective evaluation experiment we confirmed that it allows intuitive robot control. From these experimental results, we suggest that the proposed system has high usability for robot control and provides intuitive control in which the robot appropriately performs the user's intended motion.
In this paper, the proposed system did not feed the robot's state information back to the user, so the user could not perceive the robot's state and its surroundings. A feedback function using, for example, a pressure sensor is expected to further improve the operability of the robot.
In future work, to verify the utility of the proposed system, we will conduct additional experiments, including a communication experiment between remote locations using the humanoid robot.

Acknowledgment. This work was supported in part by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research under grant #20700199.

References
1. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39, 1–38 (1977)
2. Forney, G.D.: The Viterbi algorithm. Proc. IEEE 61(3), 268–278 (1973)
3. Gams, A., Mudry, P.-A.: Gaming controllers for research robots: controlling a
humanoid robot using a wiimote. In: 17th International Electrotechnical and
Computer Science Conference, ERK 2008 (2008)
4. Guo, C., Sharlin, E.: Exploring the use of tangible user interfaces for human-
robot interaction: A comparative study. In: Proceeding of the SIGCHI Confer-
ence on Human Factors in Computing System, pp. 121–130 (2008)
5. Inamura, T., Nakamura, Y., Toshima, I., Tanie, H.: Embodied symbol emer-
gence based on mimesis theory. Int’l J. of Robotics Research 23(4), 363–378
(2004)
6. Nakaoka, S., Nakazawa, A., Yokoi, K.: Generating whole body motions for a
biped humanoid robot from captured human dances. In: Proceedings of the
IEEE International Conference on Robotics and Automation, vol. 3, pp. 3905–
3910 (2003)
7. NEC. PaPeRo, http://www.nec.co.jp/products/robot/en/
8. Osgood, C., Suci, G., Tannenbaum, P.: The measurement of meaning. Univer-
sity of Illinois Press, Urbana (1967)
9. Pook, P.K., Ballard, D.H.: Recognizing teleoperated manipulations. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 578–585

10. Riley, M., Ude, A., Atkeson, C.G.: Methods for motion generation and interac-
tion with a humanoid robot: Case studies of dancing and catching. In: Proceed-
ings of AAAI and CMU Workshop on Interactive Robotics and Entertainment,
pp. 35–42 (2000)
11. Smith, C., Christensen, H.I.: Wiimote robot control using human motion mod-
els. In: The 2009 IEEE/RSJ International Conference on Intelligent Robots
and Systems, pp. 5509–5515 (2009)
12. Shibata, T.: PARO, http://paro.jp/english/
13. Tukey, J.W.: The problem of multiple comparisons. Mimeographed Monograph
(1953)
Knowledge-Based System for Automatic 3D
Building Generation from Building Footprint

Kenichi Sugihara, Xinxin Zhou, and Takahiro Murase*

Abstract. A 3D urban model is an important information infrastructure that can be utilized in several fields, such as urban planning and the game industry. However, enormous time and effort have to be spent to create 3D urban models using 3D modeling software. In this paper, we propose a GIS and CG integrated system for automatically generating 3D building models from building polygons (building footprints) on a digital map. By combining building footprint analysis with information from GIS maps and domain-specific knowledge, the complexity of the building generation process can be greatly reduced.

1 Introduction
A 3D urban model, shown at the bottom of Fig. 1, is important in urban planning and in facilitating public involvement. To facilitate public involvement, 3D models simulating a real or near-future city in 3D CG (computer graphics) can be of great use. However, enormous time and labour have to be spent to create these 3D models using 3D modeling software such as 3ds Max or SketchUp. For example, when manually modeling a house with roofs by Constructive Solid Geometry (CSG), one must use the following laborious steps: (1) generation of primitives of appropriate size, such as boxes, prisms, or polyhedra, that will form the parts of a house; (2) Boolean operations applied to these primitives to form the shapes of the parts of a house, such as making holes in a building body for doors and windows; (3) rotation of the parts of a house; (4) positioning of the parts of a house; (5) texture mapping onto these parts.

Kenichi Sugihara*
Gifu Keizai University, 5-50 Kitagata-chou, Ogaki-City, Gifu-Pref., Japan 503-8550
e-mail: sugihara@gifu-keizai.ac.jp

Xinxin Zhou
Nagoya Bunri University, Inazawa-chou, Inazawa-City, Aichi-Pref., Japan 492-8520
e-mail: xinxin@nagoya-bunri.ac.jp

Takahiro Murase
Chukyo Gakuin University, 2216 Toki-chou, Mizunami-City, Gifu-Pref., Japan 509-6101
e-mail: murase@chukyogakuin-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 363–373.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
364 K. Sugihara, X. Zhou, and T. Murase

GIS Application (ArcGIS)
* Digital residential map containing building polygons linked with attribute data such as the number of storeys and the type of roof.
* Attributes for the 3D model, such as the number of storeys, the type of roof (flat, gable roof, hipped roof, oblong gable roof, gambrel roof, and so forth), and the image codes of roof and wall.

GIS Module (Visual Basic & MapObjects)
* Filtering out unnecessary vertices whose internal angle is almost 180 degrees
* Partitioning orthogonal building polygons into sets of rectangles
* Generating inside contours by straight skeleton computation for positioning windows and façades of a building

CG Module (MaxScript)
* Generating 3D models of appropriate size, such as boxes and prisms forming the parts, and applying Boolean operations, making holes for doors and windows
* Rotating and positioning 3D models
* Automatic texture mapping onto 3D models

Fig. 1 Flow of Automatic Generation System for 3D Building Models


Knowledge-Based System for Automatic 3D Building Generation 365

In order to streamline these laborious steps, we propose a GIS (geographic information system) and CG integrated system that automatically generates 3D building models based on building polygons, or building footprints, on a digital map, shown at the top of Fig. 1. The top of Fig. 1 also shows that most building polygons' edges meet at right angles (orthogonal polygons).
A complicated orthogonal polygon can be partitioned into a set of rectangles. The proposed integrated system partitions orthogonal building polygons into sets of rectangles and places rectangular roofs and box-shaped building bodies on these rectangles. In order to partition an orthogonal polygon, a useful polygon expression (RL expression: edges' Right & Left turns expression) and a partitioning scheme were proposed for deciding from which vertex a dividing line (DL) is drawn [Sugihara 2006].
In this paper, we propose a new scheme for partitioning complicated orthogonal building polygons. By combining building footprint analysis with information from GIS maps and domain-specific knowledge, the complexity of the building generation process can be greatly reduced.

2 Related Work
Since 3D urban models are an important information infrastructure that can be utilized in several fields, research on the creation of 3D urban models is in full swing. Various types of technologies, ranging from computer vision, computer graphics (CG), and photogrammetry to remote sensing, have been proposed and developed for creating 3D urban models.
Using photogrammetry, Gruen et al. [1998, 2002] introduced a semi-automated topology generator for 3D building models: CC-Modeler. Feature identification and measurement with aerial stereo images is implemented in manual mode. During feature measurement, measured 3D points belonging to a single object are coded into two different types according to their functionality and structure: boundary points and interior points. After these manual operations, the faces are defined and the related points are determined. Then the CC-Modeler fits the faces jointly to the given measurements in order to form a 3D building model.
Suveg and Vosselman [2002] presented a knowledge-based system for automatic 3D building reconstruction from aerial images. The reconstruction process starts with the partitioning of a building into simple building parts based on the building polygon provided by a 2D GIS map. If the building polygon is not a rectangle, it can be divided into rectangles. A polygon can have multiple partitioning schemes. To avoid a blind search for optimal partitioning schemes, the minimum description length principle is used. This principle gives higher priority to partitioning schemes with a smaller number of rectangles. Among these schemes, the optimal partitioning is 'manually' selected. Then, the building primitives of the CSG representation are placed on the partitioned rectangles.
These proposals and systems using photogrammetry will provide us with a primitive 3D building model with accurate height, length, and width, but without details such as windows, eaves, or doors. Research on 3D reconstruction has concentrated on reconstructing the rough shape of buildings, neglecting details on the façades such as windows [Zlatanova 2002].
On the other hand, in some application areas such as urban planning and the game industry, the immediate creation and modification of many detailed building models is requested in order to present alternative 3D urban models. Procedural modeling is an effective technique to create 3D models from sets of rules such as L-systems, fractals, and generative modeling languages [Parish et al. 2001]. Müller et al. [2006] have created an archaeological site of Pompeii and a suburbia model of Beverly Hills by using a shape grammar that provides a computational approach to the generation of designs. They import data from a GIS database and try to classify imported mass models as basic shapes in their shape vocabulary. If this is not possible, they use a general extruded footprint together with a general roof obtained by a straight skeleton computation defined by a continuous shrinking process [Aichholzer et al. 1995].
More recently, image-based capturing and rendering techniques, together with procedural modeling approaches, have been developed that allow buildings to be quickly generated and rendered realistically at interactive rates. Bekins et al. [2005] exploit building features taken from real-world captured scenes. Their interactive system subdivides and groups the features into feature regions that can be rearranged to texture a new model in the style of the original. The redundancy found in architecture is used to derive procedural rules describing the organization of the original building, which can then be used to automate the subdivision and texturing of a new building. This redundancy can also be used to automatically fill occluded and poorly sampled areas of the image set.
Aliaga et al. [2007] extend the technique to inverse procedural modeling of buildings, and they describe how to use an extracted repertoire of building grammars to facilitate the visualization and modification of architectural structures. They present an interactive system that enables both creating new buildings in the style of others and modifying existing buildings in a quick manner.
Vanegas et al. [2010] interactively reconstruct 3D building models with a grammar for representing changes in building geometry that approximately follow the Manhattan-world (MW) assumption, which states that there is a predominance of three mutually orthogonal directions in the scene. They note that automatic approaches using laser scans or LIDAR data, combined with aerial imagery or ground-level images, suffer from one or all of low-resolution sampling, robustness problems, and missing surfaces. One way to improve quality or automation is to incorporate assumptions about the buildings, such as the MW assumption. However, there are many buildings that have cylindrical or generally curved surfaces, based on non-orthogonal building polygons.
Through such interactive modeling, 3D building models with plausible, detailed façades can be achieved. However, the limitation of such modeling is the large amount of user interaction involved [Nianjuan et al. 2009]. When creating a 3D urban model for urban planning or for facilitating public involvement, the model should cover many citizens' and stakeholders' buildings. This means that it would take enormous time and labour to model a 3D urban model with hundreds or thousands of buildings.

Thus, we propose a GIS and CG integrated system that automatically and immediately generates 3D urban models, where the generated 3D building models constituting the urban model are approximate geometric models that citizens and stakeholders can recognize as their future houses or as real-world buildings.

3 Process for Automatic 3D Building Model Generation


As shown in Fig. 1, the proposed automatic building generation system consists of a GIS application (ArcGIS, ESRI Inc.), a GIS module, and a CG module. The source of the 3D urban model is a digital residential map that contains building polygons linked with attribute data such as the number of storeys and the type of roof.
The GIS module 'pre-processes' the building polygons on the digital map. Pre-processing includes filtering out unnecessary vertices whose internal angle is almost 180 degrees, partitioning orthogonal building polygons into sets of rectangles, generating inside contours by straight skeleton computation for positioning windows and façades of a building, and exporting the coordinates of the polygons' vertices and the attributes of the buildings. The attributes of a building consist of the number of storeys, the image codes of roof and wall, and the type of roof (flat, gable roof, hipped roof, oblong gable roof, gambrel roof, mansard roof, temple roof, and so forth). The GIS module has been developed using 2D GIS software components (MapObjects, ESRI).
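The first pre-processing step, filtering out vertices whose internal angle is almost 180 degrees, can be sketched as follows; the tolerance value and function name are assumptions for illustration, not the module's actual code.

```python
import math

def filter_collinear_vertices(vertices, tol_deg=2.0):
    """Drop vertices whose internal angle is within tol_deg of 180 degrees,
    i.e. vertices that lie (almost) on a straight edge."""
    out = []
    n = len(vertices)
    for i in range(n):
        ax, ay = vertices[i - 1]
        bx, by = vertices[i]
        cx, cy = vertices[(i + 1) % n]
        # signed turn angle between the incoming and outgoing edge directions
        turn = math.degrees(math.atan2(
            (bx - ax) * (cy - by) - (by - ay) * (cx - bx),   # cross product
            (bx - ax) * (cx - bx) + (by - ay) * (cy - by)))  # dot product
        if abs(turn) > tol_deg:        # keep only real corners
            out.append(vertices[i])
    return out

# a square digitized with one spurious mid-edge vertex
poly = [(0, 0), (5, 0.05), (10, 0), (10, 10), (0, 10)]
print(filter_collinear_vertices(poly))
```

A near-zero turn angle means the vertex is a digitizing artifact on a straight wall, which would otherwise break the orthogonal-polygon assumption used by the partitioning step.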
The CG module receives the pre-processed data that the GIS module exports and generates 3D building models. The CG module has been developed using MaxScript, which controls the 3D CG software (3ds Max, Autodesk Inc.). In the case of modeling a building with roofs, the CG module follows these steps:
(1) Generation of primitives of appropriate size, such as boxes, prisms, or polyhedra, that will form the various parts of the house
(2) Boolean operations applied to these primitives to form the shapes of the parts of the house, for example making holes in a building body for doors and windows, and making trapezoidal roof boards for a hipped roof and a temple roof
(3) Rotation of the parts of the house
(4) Positioning of the parts of the house
(5) Texture mapping onto these parts according to the attributes received
(6) Copying the 2nd floor to form the 3rd floor and above, in the case of buildings of three storeys or more
As mentioned in this section, the proposed system consists of the GIS application, the GIS module, and the CG module. The GIS module is discussed in the next section.

4 Functionality of GIS Module


At map production companies, technicians draw building polygons manually with digitizers, based on aerial photos or satellite imagery, as shown in Fig. 1. The aerial photo and digital map also show that most building polygons are orthogonal polygons. An orthogonal polygon can be replaced by a combination of rectangles.

4.1 Proposed Polygon Expression


When following the edges of a polygon clockwise, each edge turns to the right or to the left by 90 degrees. Therefore, an orthogonal polygon can be expressed as the sequence of its edges' turning directions: each edge turning to the 'Right' or to the 'Left'.
A useful polygon expression (RL expression: edges' Right & Left turns expression) was proposed for specifying the shape pattern of an orthogonal polygon [Sugihara 2005]. For example, the orthogonal polygon with 22 vertices shown in Fig. 2 is expressed as the sequence of its edges' turning directions LRRRLLRRLRRLRRLRLLRRRL, where R and L mean a change of an edge's direction to the right and to the left, respectively. The number of shapes that a polygon can take depends on the number of vertices of the polygon.
The advantages of this RL expression are as follows:
(1) The RL expression specifies the shape pattern of a polygon without regard to the length of its edges.
(2) The expression decides from which vertex a dividing line (DL) is drawn.
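The RL expression can be computed directly from a polygon's vertex list: the sign of the 2D cross product of the incoming and outgoing edges at each vertex gives the turn direction. The sketch below assumes clockwise vertex order with y growing upward; it is an illustration, not the authors' code.

```python
def rl_expression(vertices):
    """Edges' Right/Left-turn expression of an orthogonal polygon.
    vertices: clockwise list of (x, y), with y growing upward.
    Returns a string of 'R'/'L', one character per vertex."""
    n = len(vertices)
    out = []
    for i in range(n):
        ax, ay = vertices[i - 1]          # previous vertex
        bx, by = vertices[i]              # the vertex whose turn we classify
        cx, cy = vertices[(i + 1) % n]    # next vertex
        # cross product of incoming edge (a->b) and outgoing edge (b->c):
        # negative = right turn, positive = left turn (for y-up coordinates)
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        out.append("R" if cross < 0 else "L")
    return "".join(out)

# an L-shaped footprint, listed clockwise: exactly one 'L' vertex
L_shape = [(0, 2), (2, 2), (2, 1), (3, 1), (3, 0), (0, 0)]
print(rl_expression(L_shape))
```

For any simple orthogonal polygon traversed clockwise, the number of R's exceeds the number of L's by exactly four (the 22-vertex example above has 13 R's and 9 L's), which is a handy sanity check on digitized footprints.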

4.2 Partitioning Scheme


The more vertices a polygon has, the more partitioning schemes it has, since the interior angle of an 'L' vertex is 270 degrees and two DLs (dividing lines) can be drawn from an 'L' vertex. A partitioning scheme that gives higher priority to DLs that divide 'fat rectangles' was proposed [Sugihara 2006]. A 'fat rectangle' is a rectangle close to a square. That partitioning scheme is similar to Delaunay triangulation in the sense that Delaunay triangulation avoids thin triangles and generates fat triangles. However, the proposal did not always result in plausible and probable 3D building models with roofs. In the new proposal, among the many possible DLs, the DL that satisfies the following conditions is selected for partitioning.

(1) A DL that cuts off 'one rectangle'.

(2) Among the two DLs from the same 'L' vertex, the shorter DL is selected to cut off a
rectangle.
(3) A DL whose length is shorter than the width of the 'main roof' that a 'branch
roof' extends to.

A 'branch roof' is a roof that is cut off by a DL and extends to a main roof. To cut
off one rectangle, the edge crossed by a DL must lie three or four edges away from an
'L' vertex, when following the edges of a polygon clockwise or counter-clockwise.
Knowledge-Based System for Automatic 3D Building Generation 369

Stage 1: Building polygon expression: LRRRLLRRLRRLRRLRLLRRRL.
Stage 2: From an 'L' vertex, two possible DLs can be drawn. Among the DLs, a
shorter DL that cuts off one rectangle, or a DL whose length is shorter than the
width of a 'main roof', can be selected.
Stage 3: A DL that satisfies the conditions is selected for partitioning.
Stage 4: The upper left geometry ('LRRRL') is evaluated as an independent
rectangle when the area overlapped with the body polygon is small.
Stage 5: Partitions continue until the number of vertices of the body polygon is
four.
Stage 6: After partitioning, 3D building models are automatically generated on
the divided rectangles by using CSG.

Fig. 2 Partitioning process of an orthogonal building polygon into a set of rectangles



4.3 Partitioning Process


Fig. 2 shows the partitioning process of an orthogonal building polygon into a set
of rectangles. The vertices of the polygon are numbered in clockwise order. Stage 2
in Fig. 2 shows an orthogonal polygon with all possible DLs drawn as thin dotted
lines and with the DLs that satisfy condition (1) drawn as thick dotted lines. Also in
Stage 2, an example of a branch roof is shown as the rectangle formed by vertices
6, 7, 8 and 9, cut off by a DL.
Since each roof has the same slope in most multiple-roofed buildings, a wider
roof is higher than a narrower roof, and 'probable multiple-roofed buildings' take
the form of narrower branch roofs diverging from a wider and higher main roof.
Narrower branch roofs are formed by dividing a polygon along a shorter DL, and
the width of a branch roof is equal to the length of the DL. The reason for setting
up these conditions is that, like breaking down a tree into a collection of branches,
the system cuts along the 'thin' parts of the branches of a polygon. Thus, a scheme
that prioritizes the shorter DL cutting off a branch roof is proposed.
In the partitioning process shown in Fig. 2, the DLs that satisfy the conditions
mentioned above are selected for partitioning. By cutting off one rectangle, the
number of vertices of the body polygon is reduced by two or four. After
partitioning off the branches, the edges' lengths and the RL data are recalculated to
find new branches. Partitioning continues until the number of vertices of the body
polygon is four.
After the polygon has been partitioned into a set of rectangles, the system places
3D building models on these rectangles. A partitioned rectangle is extended to the
wider and higher main roof so that it will form a narrower and lower branch roof.

4.4 How to Partition Branches


The system finds 'branches' as follows. The vertices of a polygon are
numbered in clockwise order as shown in Fig. 2. The system counts the number
of consecutive 'R' vertices (= nR) between 'L' vertices. If nR is two or more, the
run can be a branch. One or two DLs can be drawn from an 'L' vertex in the
clockwise or counter-clockwise direction, depending on the length of the edges
adjacent to the 'L' vertex.
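This counting step can be sketched directly on the RL string. The helper below is our own illustrative sketch, not the authors' implementation: it treats the expression as cyclic (matching the closed polygon) and reports each run of two or more consecutive 'R' vertices, together with the index of the run's first vertex.

```python
def find_branches(rl):
    """Find candidate branches in an RL expression.

    Returns a list of (start_index, nR) pairs, where start_index is
    the position of the first 'R' in a run of consecutive 'R'
    vertices bounded by 'L' vertices, and nR >= 2.  The string is
    treated as circular.
    """
    n = len(rl)
    if 'L' not in rl:
        return []  # a plain rectangle ('RRRR') has no branches
    start = rl.index('L')  # begin the scan at an 'L' vertex
    branches = []
    run_start, run_len = None, 0
    for k in range(1, n + 1):
        i = (start + k) % n
        if rl[i] == 'R':
            if run_len == 0:
                run_start = i
            run_len += 1
        else:  # an 'L' vertex closes the current run of 'R's
            if run_len >= 2:
                branches.append((run_start, run_len))
            run_len = 0
    return branches
```

Applied to the 22-vertex example of Fig. 2, LRRRLLRRLRRLRRLRLLRRRL, the sketch reports five candidate branches with nR = 3, 2, 2, 2 and 3; the single isolated 'R' between two 'L' vertices is discarded.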
Fig. 3 shows the various cases of drawing a DL when nR is 2, 3 and 4, depending on
the length of the edges adjacent to the 'L' vertex. Fig. 3 (nR=2) shows three cases of
drawing a DL when nR is two. The way of drawing the DL depends on the comparison
between Len(FCP) and Len(jpb). In Fig. 3, FCP (Forward Cutting Point) is the 'L'
vertex that precedes the consecutive 'R' vertices and from which a DL can be drawn
forwardly in terms of the clockwise-numbered vertices. BCP (Backward Cutting
Point) is the 'L' vertex that succeeds the consecutive 'R' vertices and from which a DL
can be drawn backwardly.
'jnsf' is the index specifying the vertex that succeeds FCP by 'n' vertices, and
'jnpb' is the index specifying the vertex that precedes BCP by 'n' vertices (n=1,2,...).
Len(FCP) means the length of the edge between FCP and pt(jsf). Fig. 3 (nR=3)
shows three cases of drawing a DL when nR is three. The way of drawing the DL

depends on the comparison between Len(FCP) and Len(j2sf) and the comparison
between Len(jsf) and Len(jpb). Of the two DLs from FCP or BCP, the shorter DL
is selected for partitioning. In the third case of nR=3 in Fig. 3, the rectangle
consisting of the vertices pt(jsf), pt(j2sf), pt(jpb) and pt(A) is not partitioned but
separated as an independent rectangle. This is the only case where separation occurs.
Fig. 3 (nR=4) shows three cases of drawing a DL when nR is four. The way of
drawing the DL depends on the comparison between Len(jsf) and Len(j2pb). With
the exception of the first case of nR=4, where the vertices pt(jsf), pt(j2sf), pt(j2pb)
and pt(jpb) form one rectangle, the partition method for nR=4 or more is the same
as the method for nR=3, since the branch formed by these 4 or more 'R' vertices
would be self-intersecting and cannot form 'one rectangle'.

nR=2: three cases of the DL, depending on the comparison between Len(FCP) and
Len(jpb): Len(FCP) = Len(jpb); Len(FCP) < Len(jpb); Len(FCP) > Len(jpb).

nR=3: three cases of the DL, depending on the comparison between Len(FCP) and
Len(j2sf) and the comparison between Len(jsf) and Len(jpb).

nR=4: if Len(jsf) = Len(j2pb), the branch can form a rectangle; otherwise, the
same as nR=3.

Fig. 3 Various cases of drawing a DL when nR is 2, 3 and 4



5 Knowledge Based Partitioning Scheme


For procedural modeling, as mentioned in the related work, Müller et al. [2006]
used a general extruded footprint together with a general roof obtained by a
straight-skeleton computation. The straight skeleton is the set of lines traced out by
the moving vertices in a shrinking process and can be used as the set of ridge
lines of a building roof [Aichholzer et al. 1996].
However, the roofs created by the straight skeleton are limited to hipped roofs
or gable roofs with their ridges parallel to the long edges of the rectangle into which a
building polygon is partitioned. As shown in the satellite image of Fig. 4, there are
many roofs whose ridges are perpendicular to a long edge of the rectangle, and
these roofs cannot be created by the straight skeleton. Since the straight skeleton
treats a building polygon as a whole, it forms a seamless roof and cannot
place roofs independently on partitioned polygons. Fig. 4 also shows 3D house
models automatically generated by the proposed system, depending on the different
partitioning schemes, 'separation prioritizing' or 'shorter DL (dividing line)
prioritizing', which are decided by domain knowledge stored as attribute data
linked to the building polygon.
To create the various shapes of 3D roofs, the proposed system has an option to
choose the partitioning scheme: prioritizing separation or prioritizing the shorter DL.
The proposed system also tries to select a suitable DL for partitioning, or a suitable
separation, depending on the RL expression of the polygon, the lengths of the DLs
and the lengths of the edges of the polygon.

6 Conclusion
A 3D urban model is quite effective in helping anyone understand what would
happen if an alternative plan were realized, what the town used to look like, or
what has been built. Traditionally, urban planners design the future layout of a
town by drawing building polygons on a digital map. Based on the building
polygons, the integrated system automatically generates a 3D urban model so
quickly that it meets the urgent demand to examine alternative urban plans.
In this paper, a new scheme for orthogonal polygon partitioning is proposed:
the system divides a polygon along the thin parts of its branches. Thus, the
proposed integrated system succeeds in automatically generating typical residential
areas.
The limitation of the system is that automatic generation is executed based only
on ground plans or top views. There are some complicated shapes of buildings
whose outlines are curved or even crooked. To create these curved buildings, the
system needs side views and front views for information on the curved outlines.
Future work will be directed towards the development of methods for:

1) automatic generation of models of curved buildings by using side views and
front views;
2) creation of general shapes of roofs by a straight-skeleton computation based
on general shapes of building polygons.

References
1. Daniel, A.G., Paul, R.A., Daniel, B.R.: Style Grammars for Interactive Visualization of
Architecture. IEEE Transactions on Visualization and Computer Graphics 13, 786–797
(2007)
2. Gruen, A., Wang, X.: CC Modeler: A topology generator for 3D urban models. ISPRS
J. of Photogrammetry and Remote Sensing 53, 286–295 (1998)
3. Gruen, A., et al.: Generation and visualization of 3D-city and facility models using
CyberCity Modeler. MapAsia, 8, CD-ROM (2002)
4. Daniel, B.R., Daniel, A.G.: Build-by-number: rearranging the real world to visualize
novel architectural spaces. In: Visualization, VIS 2005, pp. 143–150. IEEE (2005)
5. Jiang, N., Tan, P., Cheong, L.-F.: Symmetric architecture modeling with a single image.
ACM Transactions on Graphics - TOG 28(5) (2009)
6. Aichholzer, O., Aurenhammer, F., Alberts, D., Gärtner, B.: A novel type of skeleton
for polygons. Journal of Universal Computer Science 1(12), 752–761 (1995)
7. Aichholzer, O., Aurenhammer, F.: Straight Skeletons for General Polygonal Figures in
the Plane. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp.
117–126. Springer, Heidelberg (1996)
8. Mueller, P., Wonka, P., Haegler, S., Ulmer, A., Van Gool, L.: Procedural Modeling of
Buildings. ACM Transactions on Graphics 25(3), 614–623 (2006)
9. Kenichi, S.: Automatic Generation of 3D Building Model from Divided Building
Polygon. In: ACM SIGGRAPH 2005, Posters Session, Geometry & Modeling, CD-ROM
(2005)
10. Kenichi, S.: Generalized Building Polygon Partitioning for Automatic Generation of
3D Building Models. In: ACM SIGGRAPH 2006, Posters Session Virtual & Augmented
& Mixed Reality & Environments, CD-ROM (2006)
11. Suveg, I., Vosselman, G.: Automatic 3D Building Reconstruction. In: Proceedings of
SPIE, vol. 4661, pp. 59–69 (2002)
12. Carlos, V.A., Daniel, A.G., Bedřich, B.: Building reconstruction using Manhattan-world
grammars. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pp. 358–365 (2010)
13. Parish, Y.I.H., Müller, P.: Procedural modeling of cities. In: Fiume, E. (ed.) Proceedings
of ACM SIGGRAPH 2001, pp. 301–308. ACM Press, New York (2001)
14. Zlatanova, S., Heuvel Van Den, F.A.: Knowledge-based automatic 3D line extraction
from close range images. International Archives of Photogrammetry and Remote
Sensing 34, 233–238 (2002)
Locomotion Design of Artificial Creatures
in Edutainment

Kyohei Toyoda, Takamichi Yuasa, Toshio Nakamura, Kentaro Onishi,


Shunshuke Ozawa, and Kunihiro Yamada

Abstract. This paper discusses a methodology for project-based learning (PBL) for
edutainment using locomotion robots. The aim of the PBL is to develop a locomotion
robot as a new shape of artificial creature not existing in the natural world, and to
design new locomotion patterns. The PBL is therefore composed of two steps: (1)
students conduct the conceptual design of artificial creatures, and (2) students develop
the hardware design and locomotion generation based on the conceptual design. This
paper introduces two examples of the development of locomotion robots. Through
their experience of the problems and troubles encountered in the trial and error of
developing locomotion robots, the students learn the relationship between shape and
locomotion patterns, the tradeoff between stability and high-speed locomotion, and
the difficulty of problem solving.

1 Introduction
Recently, biologically inspired robots have been discussed and developed from various
viewpoints, e.g., Kukanchi [1] and Mobiligence [2]. Kukanchi is a fundamental
concept based on interactive human-space design and intelligence. This research
direction is related to human-centered environmental design, intelligent spaces,
and human-friendly robots. The living and moving abilities of animals and people
depend strongly on their surrounding environments. Therefore, we should discuss the
relationship between the shape and locomotion of animals and robots. In particular,
animals behave adaptively in diverse environments. In the concept of Mobiligence,
the mechanisms by which intelligent adaptive behaviors emerge from the interaction
of body, brain, and environment have been discussed.
On the other hand, various types of embedded systems have been applied to
edutainment. In general, edutainment is known as a word coined from education
and entertainment. Basically, there are three different aims in robot edutainment. The
first is to develop the knowledge and skills of students through project-based learning
(PBL) by the development of robots (Learning on Robots). Students can learn basic
knowledge on robotics itself through the development of a robot [3,4]. The next is to
learn interdisciplinary knowledge on mechanics, electronics, dynamics, biology, and
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 375–384.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
376 K. Toyoda et al.

informatics by using robots (Learning through Robots) [5]. The last is to apply
human-friendly robots instead of personal computers for computer-assisted instruction
(Learning with Robots) [6,7]. A student learns (together) with a robot. We have also
applied robots in the field of education with the aim of realizing new robot
edutainment, and have developed various types of robots for courses in primary and
junior high schools since 2009. Furthermore, we tested them in experimental classes
as activities of the Science Partnership Projects (SPP) [8] of the Japan Science and
Technology Agency.
In this paper, we focus on learning through robots based on PBL. The aim
of PBL is to develop various human abilities, such as management, discussion,
survey, and presentation, through performing a project. Therefore, learning
through robots based on PBL enables students to learn both basic knowledge
on robots and the management of a project at the same time. The aim of PBL in
this class is to develop a locomotion robot as a new shape of artificial creature not
existing in the natural world, and to generate new locomotion patterns based on
the above discussions on Kukanchi and Mobiligence. The PBL is composed of
two steps: (1) students conduct the conceptual design of artificial creatures, and (2)
students develop the hardware design and locomotion generation based on the
conceptual design. Through their experience of problems and troubles in the trial and
error of developing locomotion robots, the students learn the difficulty of problem
solving. Figure 1 shows the locomotion robots developed by two groups of graduate
students who joined the class in the autumn semester of 2010-2011.
The common aim of these two robots is to minimize the number of actuators for
locomotion while maintaining stable locomotion. The students learned the
effectiveness of the project through cooperative and competitive discussion during the
development of the robots. Furthermore, a human can easily walk, jump, and run in an
environment, but it is very difficult to realize such motions in robots. Thus, the
research and development of robots is very useful for understanding the dynamics and
intelligence of humans themselves. In this paper, we introduce two different types of
locomotion robots developed by two groups of graduate students who joined the class
in the autumn semester of 2011-2012.
This paper is organized as follows: Section 2 explains the hardware of the robot
kits for developing locomotion robots and the procedure of the PBL. Section 3 shows
an example of a locomotion robot based on four legs for stable high-speed rotation.
Section 4 shows an example of a one-leg locomotion robot based on cyclic patterns
of upright posture and falling behavior. Section 5 summarizes this paper and
discusses the future direction of this PBL.

(a) Rotation behaviors by six legs (b) Rotation behaviors by two legs
Fig. 1 Locomotion robots in 2010-2011 [9]

2 Robot Kits for Development of Locomotion Robots


We use the robot development kits Bioloid and Freedom, produced by Robotis in
South Korea [10,11]. Fig. 2 shows the components of the robot kits, including the
micro-controller main unit, actuators, a sensor unit, frames, connection cables, and
bolts and nuts. The robot can communicate wirelessly by ZigBee through
RS-232C with host computers or other robots.

(a) Robot parts (b) Body of Bioloid (c) Body of Freedom

Fig. 2 Robot development kit; Bioloid and Freedom

Fig. 3 ZigBee wireless communication modules

Basically, the teaching material in the PBL is composed of three stages: (1)
the design of the locomotion robot, (2) the design of locomotion patterns, and (3) the
experiments on the locomotion robots and the creation of teaching materials. Figure 4
shows the procedure of the PBL. The standard number of students in a group is 3 to 5. A
teaching technical assistant is assigned to each group.
In the first stage, we make the students consider the shape of artificial creatures not
existing in the natural world. First of all, we explain the fundamental definition of
locomotion. Next, we make the students imagine the concept of artificial creatures.

The students discuss the concept of artificial creatures by drawing various
shapes of artificial creatures. They then cut the sketches of artificial
creatures into several parts, and combine some parts to imagine a new shape of
artificial creature. Next, we show movies of the locomotion of animals and insects, and
discuss the locomotion patterns. Finally, we make the students decide the
combination of shape and locomotion of the target locomotion robot.

Fig. 4 The procedure of PBL

3 Cartwheel Locomotion Robots

3.1 Conceptual Design of Locomotion Robots

In the first stage, we made the students consider the shape of artificial creatures not
existing in the natural world. They proposed more than 10 ideas as inspiration for
the conceptual design of artificial creatures, such as a moonwalk, rotational
locomotion, locomotion using a Dharma doll, gliding steps, and an expansion and
contraction mechanism. They discussed the advantages and disadvantages of each
idea through brainstorming. The most important thing is to increase their motivation
to develop locomotion robots. As a result, they decided to develop a robot for
high-speed cartwheel locomotion.

Fig. 5 A human cartwheel behavior



Fig. 6 A cartwheel locomotion robot

3.2 Hardware Design of Locomotion Robots

First, the students proposed several types of shapes based on the shapes of the
actuators, mechanical parts, and body (the controller box of Bioloid). Furthermore,
they searched for several photos of human cartwheels. Figure 5 shows an example of
a human cartwheel. Finally, they built the cartwheel locomotion robot shown in Fig. 6
to realize smooth and high-speed locomotion.

Fig. 7 Locomotion of a cartwheel locomotion robot

3.3 Design of Locomotion Patterns

The shape and locomotion of the robot should be designed together to realize the
locomotion patterns. First, the students designed locomotion patterns by hand, based
on the sequence shown in Fig. 5 (Fig. 7). Next, they measured the change of the
center of gravity (COG) according to the difference in the shapes of the body. They
designed the reference trajectory of the joint angles of each actuator, where the output
level of each actuator is fixed. However, the robot was not able to complete one full
rotation (one cycle). They discussed the reason from the viewpoints of shape and
locomotion patterns. Finally, they changed the attachment position between the
actuator and the foot-plate, and they successfully realized one cycle of forward
movement.
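The design of a reference trajectory of joint angles can be illustrated with a simple keyframe interpolation. The sketch below is our own illustration, not the procedure actually used in the class (the Bioloid kit is normally programmed through its motion editor), and the angle values are hypothetical.

```python
def joint_trajectory(keyframes, steps_per_segment):
    """Expand a cyclic list of joint-angle keyframes into a dense
    reference trajectory by linear interpolation.

    Each keyframe is a tuple of joint angles in degrees, one per
    actuator; the last keyframe wraps around to the first, closing
    one locomotion cycle.
    """
    traj = []
    n = len(keyframes)
    for i in range(n):
        a, b = keyframes[i], keyframes[(i + 1) % n]
        for s in range(steps_per_segment):
            t = s / steps_per_segment
            # interpolate every joint between keyframe a and keyframe b
            traj.append(tuple(x + (y - x) * t for x, y in zip(a, b)))
    return traj

# Two hypothetical keyframes for a two-joint leg, two steps per segment:
cycle = joint_trajectory([(0.0, 0.0), (90.0, 45.0)], 2)
```

With the output level of each actuator fixed, varying only such keyframes (and the attachment positions of the parts) is what changes the resulting motion, which matches the trial-and-error procedure described above.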

Fig. 8 Backward movement of the cartwheel locomotion robot

3.4 Experiments of Locomotion Robots

The students evaluated the designed locomotion patterns in several experiments.
First of all, we discussed the difference between the 6-leg locomotion robot
(Fig. 1 (a)) and the cartwheel locomotion robot. Originally, the 6-leg locomotion
robot had only 4 legs, but it was not able to rotate one cycle with 4 legs and 4
actuators. Therefore, the students developed the 6-leg locomotion robot. In the case
of the 6-leg locomotion robot, it is very difficult to change the COG for forward
movement, because the link length is short. On the other hand, the cartwheel
locomotion robot can easily change the COG for forward movement. Furthermore,
the movement of the cartwheel locomotion robot is very stable, because it can stand
on the ground on two legs.
Next, they tried to realize backward movement. First of all, they used the
same reference trajectory of joint angles with inverse actuator outputs, but this
failed. They then realized backward movement with a different reference
trajectory that changes the COG rapidly toward the outside of the body (Fig. 8).
Finally, they tried to improve the speed of forward movement with the same
actuator output value. As a result, they generated a different locomotion pattern
based on a two-step movement. In this movement, the robot raises its leg as high
as possible, and falls forward with the resulting moment. As a result, they
realized high-speed forward movement. Figure 9 shows the comparison of
three types of locomotion patterns per cycle with the same actuator output
values. The moving distance of the two-step locomotion is the longest, but it takes
much more moving time than the other two locomotion patterns in order to stabilize
the posture during posture transitions. Next, they compared the maximal speed
(actuator outputs) of stable locomotion among the three locomotion patterns. The
moment becomes larger as the actuator output increases. As a result, they considered
the effect of dynamics, and generated a different reference trajectory to conduct
high-speed rotation. Table 1 shows the comparison of the maximal outputs that
realize stable high-speed rotation. In the experimental results, the original forward
movement is the best for realizing high-speed rotation. This means that high-speed
rotation requires stable posture transitions. In this way, they discussed the
relationship between the shape and the locomotion patterns of the cartwheel
locomotion robot in detail.

Fig. 9 Comparison of three types of locomotion patterns per one cycle

Table 1 Comparison of maximal speed (actuator outputs)

4 One-Leg Locomotion Robots with Multiple Links


4.1 Conceptual Design of Locomotion Robots
This subsection shows the development of a one-leg locomotion robot based on
cyclic patterns of upright posture and falling behavior. The students of the other
group in 2011-2012 proposed more than 5 ideas as inspiration for the conceptual
design of artificial creatures, such as one-leg locomotion, three-leg locomotion, and
dynamics-based chair movement. They discussed the advantages and disadvantages
of each idea through brainstorming. As a result, they decided to develop a one-leg
locomotion robot with multiple links (Fig. 10).

Fig. 10 A conceptual design of locomotion by one-leg with multiple links

4.2 Hardware Design of Locomotion Robots

The students proposed several types of shapes based on the shapes of the actuators,
mechanical parts, and body (the controller boxes of Bioloid and Freedom). According
to the conceptual design shown in Fig. 10, they developed two different types of
one-leg locomotion robots (Fig. 11), because the weight of the body parts is quite
different between Bioloid and Freedom. In fact, the body part of Freedom can be
considered as equivalent to two links (Fig. 11 (b)).

(a) Bioloid-base (b) Freedom-base


Fig. 11 One-leg locomotion robots

4.3 Design of Locomotion Patterns

First, the students designed locomotion patterns by hand, based on the shapes of
the one-leg locomotion robots shown in Fig. 11 (Fig. 12). The most important point
is to discuss the movability of the body part of Bioloid on the ground. In Fig. 12 (a),
the locomotion robot must pull the body part toward the contact point of the
foot-plate. Here, the students discussed the effect of friction between the robot and
the ground (floor). In preliminary experiments, the robot was not able to pull the
body part of Bioloid to the original foot-plate owing to low friction with the ground.
Therefore, they attached a rubber sheet to the back of the foot-plate. As a result, the
robot was able to pull the body part to the position of the foot-plate.

(a) Bioloid-base (b) Freedom-base


Fig. 12 Locomotion patterns of one-leg locomotion robots

4.4 Experiments of Locomotion Robots

The aim of this one-leg locomotion robot is to realize cyclic patterns of upright
posture and falling behavior (see Fig.12). Therefore, the stability of upright posture

depends on the number of links corresponding to the actuators. The students
changed the number of actuators, and found that the one-leg locomotion robot
with fewer than 4 actuators was not able to fall down, because the COG of the
robot cannot be moved outside the body part (Bioloid). They also discussed the
effect of friction between the robot and the ground (floor), as shown in Fig. 13.
This result shows that the one-leg robot can move without consideration of the
effect of friction if the number of actuators is changed. On the other hand, it is
very difficult to realize a stable upright posture with more than 8 actuators. The
moving distance increases linearly as the number of actuators increases.
Next, they conducted several experiments on the moving velocity performance
(Table 2). The time required for one cyclic movement of the one-leg locomotion
robot increases as the number of actuators increases, but they realized a speed-up
from the viewpoint of average moving velocity.
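The falling condition discussed above can be sketched numerically: the robot can start its falling phase only when the horizontal COG of its links leaves the span of the foot-plate. The sketch below is our own illustration; the masses and positions are hypothetical, not values measured on the actual robots.

```python
def center_of_gravity(links):
    """COG of a linkage given as a list of (mass, (x, y)) pairs."""
    total = sum(m for m, _ in links)
    x = sum(m * p[0] for m, p in links) / total
    y = sum(m * p[1] for m, p in links) / total
    return x, y

def tips_over(links, support_min_x, support_max_x):
    """True if the horizontal COG lies outside the foot-plate span."""
    x, _ = center_of_gravity(links)
    return x < support_min_x or x > support_max_x
```

With few actuated links, the reachable COG positions all stay above the foot-plate and `tips_over` never becomes true, which matches the observation that the robot with fewer than 4 actuators was not able to fall down.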

Fig. 13 Comparison of locomotion by one-leg locomotion robots (Bioloid-base)

Table 2 Comparison of performance of one-leg locomotion robot (Bioloid-base)

5 Summary
This paper discussed the applicability of locomotion robots to edutainment.
First, we prepared the teaching materials based on the robot kits. The aim of the
subject prepared in this study is to design a new shape of artificial creature not existing

in the natural world without using wheels, and then to realize its locomotion. We
conducted project-based learning with two groups of four or five graduate students.
The students who joined in the autumn semester of 2011-2012 became interested in
the design of stable high-speed locomotion, while also trying to develop locomotion
robots of minimal size. They understood the relationship between the shape and
locomotion patterns of artificial creatures, the tradeoff between stability and
high-speed locomotion, and other issues.
As future work, we intend to conduct teaching by edutainment using the
developed locomotion robots at elementary schools or junior high schools this
summer.

References
[1] Zhen, J., Aoki, H., Sato-Shimokawara, E., Yamaguchi, T.: Interactive System for
Sharing Objects Information by Gesture and Voice Recognition between Human and
Robot with Facial Expression. In: The Fourth Symposium in System Integration (SII
2011), pp. 293–298 (2011)
[2] Ogawa, H., Chiba, R., Takakusaki, K., Asama, H., Ota, J.: Method for Obtaining
Quantitative Change in Muscle Activities by Difference in Sensory Inputs about
Human Posture Control. In: Proceedings of The 5th International Symposium on
Adaptive Motion of Animals and Machines (AMAM 2011), pp. 9–10 (2011)
[3] Gonzalez-Gomez, J., Valero-Gomez, A., Prieto-Moreno, A., Abderrahim, M.: A New
Open Source 3D-printable Mobile Robotic Platform for Education. In: Proceedings of
the 6th International Symposium on Autonomous Minirobots for Research and
Edutainment, p. S22 (2011)
[4] Riedo, F., Retornaz, P., Bergeron, L., Nyffeler, N., Mondada, F.: A two years informal
learning experience using the Thymio robot. In: Proceedings of the 6th International
Symposium on Autonomous Minirobots for Research and Edutainment, p. S11 (2011)
[5] Salvini, P., Macrì, G., Cecchi, F., Orofino, S., Coppedè, S., Sacchini, S., Guiggi, P.,
Spadoni, E., Dario, P.: Teaching with minirobots: The Local Educational Laboratory
on Robotics. In: Proceedings of the 6th International Symposium on Autonomous
Minirobots for Research and Edutainment, p. S12 (2011)
[6] Yorita, A., Hashimoto, T., Kobayashi, H., Kubota, N.: Remote Education based on
Robot Edutainment. In: Proc (CD-ROM) of The 5th International Symposium on
Autonomous Minirobots for Research and Edutainment (AMiRE 2009), pp. 204–213
(2009)
[7] Yorita, A., Kubota, N.: Robot Assisted Instruction in Elementary School Based on
Robot Theater. Journal of Robotics and Mechatronics 23(5), 893–901 (2011)
[8] http://spp.jst.go.jp/
[9] Narita, T., Tajima, K., Takase, N., Zhou, X., Hata, S., Yamada, K., Yorita, A.,
Kubota, N.: Reconfigurable Locomotion Robots for Project-based Learning based on
Edutainment. In: Proc (CD-ROM) of International Workshop on Advanced Computa-
tional Intelligence and Intelligent Informatics, IWACIII 2011, p. SS5-3 (2011)
[10] http://www.robotis.com/zbxe/main
[11] Kubota, N., Tomioka, Y., Ozawa, S.: Intelligent Systems for Robot Edutainment. In:
Proc. of 4th International Symposium on Autonomous Minirobots for Research and
Edutainment, pp. 37–46 (2007)
Multistep Search Algorithm for Sum k-Nearest
Neighbor Queries on Remote Spatial Databases

Hideki Sato and Ryoichi Narita

Abstract. Processing sum k-Nearest Neighbor (NN) queries on remote spatial
databases suffers from a large amount of communication. In this paper, we
propose the RQP-M search algorithm for efficiently searching sum k-NN query
results to overcome this difficulty. It refines query results originally searched by
the RQP-S algorithm with subsequent k-NN queries, whose query points are chosen
among the vertices of a regular polygon inscribed in a previously searched circle.
Experimental results show that Precision is over 0.99 for uniformly distributed data,
over 0.95 for skew-distributed data, and over 0.97 for real data. Also, NOR (Number
of Requests) ranges between 3.2 and 4.0, between 3.1 and 3.8, and between 2.9 and
3.5, respectively. The Precision of RQP-M increases by 0.04-0.20 for uniformly
distributed data, in comparison with that of RQP-S.

1 Introduction
New types of Location-Based Services (LBS) for supporting a group of mobile users
are potentially promising. Consider, for example, a group of mobile users, each at a
different location (query point), who want to obtain POI (Point Of Interest)
information in order to meet somewhere together. Additionally, it is assumed that location
data regarding the mobile users is obtainable from a location management server and
that POI information is accessible via other Web services. In this case, realizing such
an LBS requires Aggregate k-Nearest Neighbor (k-ANN) queries, which return the
POIs whose sum (or maximum) of distances from the query points is among the top-k
minimum.
Hideki Sato
School of Informatics, Daido University, 10-3 Takiharu-cho,
Minami-ku, Nagoya, 457-8530 Japan
e-mail: hsato@daido-it.ac.jp
Ryoichi Narita
Aichi Toho University, 3-11 Heiwagaoka, Meito-ku, Nagoya, 465-8515 Japan
e-mail: narita@aichi-toho.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 385–397.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
However, there are difficulties in realizing the LBS for two reasons. First, a Web
service receiving k-ANN queries has to access the corresponding spatial databases
to answer them. If the spatial databases to be queried are local, and the query
processing algorithms have direct access to their spatial indices (i.e., R-trees[1] and
their variants), queries can be answered efficiently. However, this assumption does not
hold when k-ANN queries are processed by accessing remote spatial databases that
operate autonomously. Although some or all of the data from remote databases can be
replicated in a local database and a separate index structure built for them, this is
infeasible when the database is huge or a large number of remote databases are
accessed.
Secondly, access to spatial data on the WWW is limited to certain types of
queries, due to simple and restrictive Web API interfaces. A typical scenario is
retrieving the POI nearest to an address given as query point through a Web
API interface. Unfortunately, Web API interfaces are not provided for processing
k-ANN queries on remote spatial databases. In other words, a new strategy for
efficiently processing k-ANN queries is required in this setting.
We have proposed the Representative Query Point (RQP) based algorithm RQP-S
as a new solution for processing k-ANN queries[2],[3]. Instead of the original
k-ANN query, it issues a k-Nearest Neighbor (NN) query using the RQP as query
point. However, it returns approximate results, not exact ones. According to
experimental results, its Precision is acceptable in some settings but not in all [2],[3].
Thus, additional refinement must be imposed upon the query results searched by RQP-S.
In this paper, we propose RQP-M for efficiently searching more exact results
of sum k-NN queries. It refines the query results originally searched by RQP-S with
subsequent k-NN queries. While RQP-S is a single-step search algorithm, RQP-M
is a multistep search algorithm.
The remainder of this paper is organized as follows. Sect.2 mentions related
work. Sect.3 describes sum k-NN queries and the difficulties in processing them
for the later discussion. Sect.4 presents RQP-M algorithm. Sect.5 evaluates RQP-M
experimentally, using synthetic and real data. Finally, Sect.6 concludes the paper
and gives our future work.

2 Related Work
The existing literature in the field of location-dependent queries is extensively sur-
veyed in the article[4]. Among many location-dependent queries, NN queries[5],
[6] and their variants such as Reverse NN[7], Constrained NN[8], and Group
NN[9],[10] are considered to be important in supporting spatial decision making.
A Reverse k-NN query retrieves objects that have a specified object/location among
their k nearest neighbors. A Constrained NN query retrieves objects that satisfy
a range constraint. For example, a visible k-NN query retrieves k objects with the
smallest visible distance to a query object[11].
Since a Group NN query retrieves ANN objects, the work in [9],[10] is closely
related to ours. It was first dedicated to the case of Euclidean distance and the sum
function[9], and was then generalized to the case of network distance[10].
However, their setting assumes that the spatial database storing the data objects is local
to the site where the query is processed, whereas we deal with k-ANN queries where
each database is located at a remote site.
The works [12],[13] are also closely related to ours, because they provide users
with location-dependent query results by using Web API interfaces to remote
databases. The former[12] proposes a k-NN query processing algorithm that uses
one or more Range queries1[14],[15],[16] to retrieve the nearest neighbors of a given
query point. The latter[13] proposes two Range query processing algorithms that use
k-NN queries. However, our work differs from theirs in dealing with k-ANN
queries, rather than k-NN queries or Range queries.

3 Preliminaries
ANN queries are an extension of NN queries. Let p be a point and Q be a set
of query points. Then, the aggregate distance function dagg(p, Q) is defined to be
agg({d(p, q) | q ∈ Q}), where agg() is an aggregate function (e.g., sum, max, min). Given
a set P of data objects and a set Q of query points, an ANN query retrieves the object p
in P such that dagg(p, Q) is minimized. k-ANN queries are the generalization of ANN
queries to top-k. Given a set of data objects P, a set of query points Q, and an aggregate
distance function dagg(p, Q), a k-ANN query k-ANNagg(P, Q) retrieves S ⊂ P such that
|S| = k and dagg(p, Q) ≤ dagg(p′, Q), ∀p ∈ S, p′ ∈ (P − S), for some k (< |P|).
In the remainder of the paper, the sum k-Nearest Neighbor (sum k-NN) query is
examined as the k-ANN query where sum is used as the aggregate distance function.
Consider the example of Fig.1, where P (= {p1, p2, p3, p4}) is a set of data objects (e.g.,
restaurants) and Q (= {q1, q2}) is a set of query points (e.g., locations of mobile
users). The number on each edge connecting a data object and a query point
represents the distance cost between them. Table 1 presents dsum(p, Q) for each p in P,
together with the sum NN query result and the sum 3-NN query result.
Let p be a point (x, y) and Q be a set of query points. The sum distance function
dsum,Q(x, y) over Q is defined in Eq.1. Fig.2 presents a contour graph of the sum distance
function dsum,Q(x, y) over a set of 10 query points whose locations are randomly
generated. Since dsum,Q(x, y) is a convex function, there certainly exists a single
point at which the function value is lowest.

dsum,Q(x, y) = Σ_{i=1}^{|Q|} √((x − xi)² + (y − yi)²)    (1)

1 A Range query retrieves the objects located within a certain range/region.
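For reference, when the whole dataset P is locally accessible, an exact sum k-NN query under these definitions can be answered by brute force, as the following illustrative Python sketch shows (the point of this paper is precisely that such direct access is unavailable for remote databases; names are ours):

```python
import math

def d_sum(p, Q):
    # Sum distance of Eq.1: total Euclidean distance from point p to
    # every query point in Q.
    return sum(math.hypot(p[0] - qx, p[1] - qy) for qx, qy in Q)

def sum_knn(P, Q, k):
    # Exact sum k-NN query: the k objects of P minimizing d_sum,
    # returned in ascending order of sum distance.
    return sorted(P, key=lambda p: d_sum(p, Q))[:k]

# Data objects and query points, as in Fig.1, are plain (x, y) tuples.
P = [(0.1, 0.2), (0.8, 0.9), (0.5, 0.4), (0.3, 0.7)]
Q = [(0.4, 0.5), (0.6, 0.3)]
print(sum_knn(P, Q, 2))  # → [(0.5, 0.4), (0.3, 0.7)]
```

This brute-force scan is what the remote-database setting rules out: the server exposes only a k-NN Web API, not the full object set P.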


Table 1 Results of sum NN query and sum 3-NN query shown in Fig.1

dsum(p1, Q): 760
dsum(p2, Q): 860
dsum(p3, Q): 750
dsum(p4, Q): 800
sum NN query result: p3
sum 3-NN query result: {p3, p1, p4}

Fig. 1 Example of query points (solid circle) and data objects (hollow square)

Fig. 2 Sum distance function dsum,Q (x, y) (euclidean distance, number of query points=10)

4 RQP Based Search Algorithm for Sum k-NN Query


In this section, RQP-S and RQP-M for searching sum k-NN query results are
described. The former is presented first as the basis for discussing the latter, which is
introduced as an algorithm for refining the query results searched by the former.

4.1 RQP-S Algorithm


RQP-S was proposed to search k-ANN query results by requesting a single k-NN
query[2],[3]. An RQP is a single point representing the set of query points specified
for a k-ANN query; it is used as the query point of a substitutive k-NN query issued
in place of the k-ANN one. Table 2 lists the RQPs proposed for sum k-NN queries.
Since dsum,Q(x, y) is a convex function, there certainly exists a unique minimal point
for a set of query points Q. However, it cannot be computed by an analytical method.
Furthermore, dsum,Q(x, y) is not differentiable at the points (xi, yi) belonging to Q,
so it cannot be computed by a gradient-based algorithm either. Instead, it can be
obtained by employing the Nelder-Mead method[17] for nonlinear programming
problems, which does not rely on gradients of the objective function. Fig.3 shows
the searched circle of a 5-NN
Table 2 RQP over a set of query points Q

RQP: description
minimal point: the point at which the value of the sum distance function dsum,Q(x, y) over Q is lowest.
middle point: the point (median({xi | (xi, yi) ∈ Q}), median({yi | (xi, yi) ∈ Q})), where median() is a function returning the middle-ordered value of the elements of a set.
mean point: the centroid of Q.
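For illustration, all three RQPs of Table 2 can be computed directly. The paper obtains the minimal point with the Nelder-Mead method; the sketch below instead uses a Weiszfeld iteration (the minimal point of dsum,Q is the geometric median of Q), which is a different but likewise gradient-free route. Function names are ours:

```python
import math
from statistics import median

def mean_point(Q):
    # Mean point: centroid of Q.
    return (sum(x for x, _ in Q) / len(Q), sum(y for _, y in Q) / len(Q))

def middle_point(Q):
    # Middle point: coordinate-wise median of Q.
    return (median(x for x, _ in Q), median(y for _, y in Q))

def minimal_point(Q, iters=200):
    # Minimal point: minimizer of d_sum,Q, i.e. the geometric median of Q,
    # approximated here by Weiszfeld iteration (the paper uses the
    # gradient-free Nelder-Mead method [17] instead).
    x, y = mean_point(Q)
    for _ in range(iters):
        # Inverse-distance weights; guard against division by zero when
        # the iterate coincides with a query point.
        ws = [1.0 / max(math.hypot(x - qx, y - qy), 1e-12) for qx, qy in Q]
        x = sum(w * qx for w, (qx, _) in zip(ws, Q)) / sum(ws)
        y = sum(w * qy for w, (_, qy) in zip(ws, Q)) / sum(ws)
    return (x, y)
```

For a symmetric configuration such as the four corners of a unit square, all three RQPs coincide at the center.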

Fig. 3 Searched circle of 5-NN query (query point (solid circle) and data objects (hollow
circle))

query with RQP q as query point. {p1 , p2 , p3 , p4 , p5 } is the query result. The radius
of the circle equals the distance r between the 5th nearest neighbor p5 and q.

4.2 RQP-M Algorithm


k-ANN query results searched by RQP-S are not necessarily correct. This implies
that spatial data to be answered might reside outside a searched circle (See Fig.3).
However, regions where such data reside cannot be computed by an analytical
method. Instead, a heuristic method is chosen for searching the regions.
Let q be the query point of a k-NN query, p be a point outside the searched circle,
and v be the point at which line segment pq crosses the circumference of the circle.
Since dsum() is a convex function, relation (2) holds for q, p, v and a set of query points
Q, where 0 ≤ α ≤ 1, β = 1 − α, v = αq + βp. In case dsum(q, Q) ≤ dsum(v, Q)
holds, it follows from relation (2) that dsum(v, Q) ≤ dsum(p, Q) holds; consequently,
dsum(p, Q) ≤ upperbound can hold only if dsum(v, Q) ≤ upperbound. In the
opposite case, where dsum(q, Q) > dsum(v, Q) holds, dsum(p, Q) ≤ upperbound might
hold regardless. Eq.3 is a logical formula, derived by merging both cases, stating
whether a point p with dsum(p, Q) ≤ upperbound might exist.

dsum(v, Q) ≤ α·dsum(q, Q) + β·dsum(p, Q)    (2)

(dsum(q, Q) ≤ dsum(v, Q) ∧ dsum(v, Q) ≤ upperbound) ∨ (dsum(q, Q) > dsum(v, Q))    (3)
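As a concrete check, the test in Eq.3 can be written directly (illustrative Python; function names are ours):

```python
import math

def d_sum(p, Q):
    # Sum of Euclidean distances from p to every query point (Eq.1).
    return sum(math.hypot(p[0] - qx, p[1] - qy) for qx, qy in Q)

def worth_querying(q, v, Q, upperbound):
    # Eq.3: True iff a point p beyond vertex v (outside the searched circle,
    # on the far side of v from q) might still satisfy
    # d_sum(p, Q) <= upperbound, so v is kept as a candidate query point.
    dq, dv = d_sum(q, Q), d_sum(v, Q)
    return (dq <= dv and dv <= upperbound) or (dq > dv)
```

With a single query point at the origin, q at distance 1 and v at distance 2, the vertex is kept only while upperbound is at least 2; when the center is farther out than v (dq > dv), the vertex is always kept.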
Since an infinite number of points exist continuously on the circumference of a searched
circle, which points on the circumference should be chosen as v is problematic when
searching regions for spatial data to be answered. A regular polygon inscribed in the
searched circle can be employed to provide its vertices as candidate points. Fig.4(a)
shows an inscribed 6-regular polygon. Let upperbound be max({dsum(pi, Q) | 1 ≤ i ≤
5}) and let dsum(p4, Q) equal upperbound. The first vertex v1 of the polygon is set to be
the point at which a line extending line segment p4q ahead of q crosses the circumference
of the circle2. Each element of {v1, v2, v3, v4, v5, v6} can be chosen as a point for
searching regions for spatial data to be answered, if it satisfies Eq.3. A list of these
vertices is called the CQPlist (Candidate Query Point list). A query result searched by
RQP-S might be refined by merging it with the result of a k-NN query with v as query
point, where v is a point belonging to the CQPlist.

(a) Inscribed 6-regular polygon (b) Subsequent 5-NN query search

Fig. 4 Additional k-NN query search whose query point is a vertex of n-regular polygon

Fig.4(b) shows the newly searched circle of a 5-NN query with v1 as query point.
{p1, p3, p6, p7, p8} is the query result. While p1 and p3 have already been searched,
p6, p7, p8 are newly found points. The latter three might refine the previously obtained
query result. Let upperbound be max({dsum(pi, Q) | 1 ≤ i ≤ k}), where pi belongs
to the refined query result. The new 6-regular polygon inscribed in the new circle
supplies its vertices. Each element of {v7, v8, v9, v10, v11} can be added to the CQPlist
if it satisfies Eq.3. However, v12 is not added to the CQPlist, because it resides inside
the previously searched region. Additionally, either v2 or v6 is removed from the CQPlist
for the same reason, if it belongs to the list. Of course, v1 is removed from the CQPlist,
because it has been used as query point.
Fig.5 shows the RQP-M algorithm. The k-NN query results (line 1) are rearranged in
the ascending order of sum distance (line 2). These two lines correspond to RQP-S.
upperbound is set to the maximum sum distance (line 3). In case that k1 > k2, infinity is set

2 This is heuristically decided because p might reside on a line extending line segment qv1
ahead of v1 such that dsum (p, Q) ≤ upperbound.
RQP_M (k1, k2, qp, rqp, n)


Input:
number of data to be returned for k-ANN query k1
number of data to be returned for k-NN query k2
a set of query points qp
representative query point for a set of query points rqp
number of edges of a regular polygon n
Output:
k-ANN query result searched by RQP-M Rlist
01 Slist:=NEAREST_NEIGHBOR_SEARCH(k2, rqp);
02 Rlist:=MAKE_AGGREGATE_DISTANCE_LIST(k1, qp, [], Slist);
03 upperbound:=MAX_AGGREGATE_DISTANCE(Rlist, k1, qp);
04 Clist:=[];
05 circle:=MAKE_CIRCLE(rqp, DISTANCE(rqp, Slist, k2));
06 Vlist:=MAKE_VERTEX_LIST(circle, n, MAX_AGGREGATE_DISTANCE_POINT(Slist, k2, qp));
07 CQPlist:=MAKE_CANDIDATE_QUERY_POINT_LIST([], Vlist, qp, upperbound, Clist);
08 while(not(CQPlist=[])){
09 let search_point be the head element of CQPlist and CQPlist be the remaining list of CQPlist;
10 Slist:=NEAREST_NEIGHBOR_SEARCH(k2, search_point);
11 Rlist:=MAKE_AGGREGATE_DISTANCE_LIST(k1, qp, Rlist, Slist);
12 upperbound:=MAX_AGGREGATE_DISTANCE(Rlist, k1, qp);
13 Clist:=APPEND(Clist, circle);
14 circle:=MAKE_CIRCLE(search_point, DISTANCE(search_point, Slist, k2));
15 Vlist:=MAKE_VERTEX_LIST(circle, n, MAX_AGGREGATE_DISTANCE_POINT(Slist, k2, qp));
16 CQPlist:=MAKE_CANDIDATE_QUERY_POINT_LIST(CQPlist, Vlist, qp, upperbound, Clist);
17 }
18 return Rlist;

Fig. 5 RQP-M search algorithm

instead. Clist maintains the before-searched circles and is initialized (line 4). A searched
circle with rqp as center is created (line 5) and the vertices of the regular polygon
inscribed in the circle are gathered (line 6). The CQPlist is initially created (line 7), in
which candidate query points are arranged in the ascending order of sum distance.
The same search process is repeated (lines 8-17) until the CQPlist becomes empty. The
candidate query point with the least sum distance is selected as the query point (line 9)
for a k-NN query (line 10). Rlist is refined by using the query results (line 11).
upperbound is set to the maximum for the updated Rlist (line 12). A searched circle
is created (line 14) and the vertices of the regular polygon inscribed in the circle are
gathered (line 15).
The CQPlist is related to the termination condition of the loop (lines 8-17); it is
initially created with at most n candidate query points (line 7). Each execution
of the loop body necessarily consumes a single query point, which is removed from
the CQPlist. In line 16, the CQPlist is updated to be the list of points that belong either
to the CQPlist or to the set of vertices of the regular polygon inscribed in the searched
circle (line 14), and that satisfy the following two conditions: the sum distance of the
point is upperbound or less, and the point does not reside inside before-searched
circles. Since upperbound decreases monotonically (line 12) and the regions covered by
before-searched circles grow monotonically, the loop execution necessarily terminates.
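The control flow of Fig.5 can be sketched compactly as below. This is an illustrative reading, not the paper's implementation: helper names are ours, k1 = k2 = k, a local brute-force k-NN stands in for the remote Web API, vertex placement is simplified to uniform angles (the paper orients the first vertex toward the max-aggregate-distance object), and a request cap is added for safety:

```python
import math

def d_sum(p, Q):
    # Sum of Euclidean distances from p to every query point (Eq.1).
    return sum(math.hypot(p[0] - qx, p[1] - qy) for qx, qy in Q)

def knn(P, c, k):
    # Local brute-force stand-in for the remote k-NN Web API.
    return sorted(P, key=lambda p: math.dist(p, c))[:k]

def rqp_m(P, Q, rqp, k, n=5, max_requests=50):
    # Sketch of RQP-M (Fig.5); comments point at the corresponding lines.
    circles, result, nor = [], [], 0  # nor = Number of Requests (NOR)

    def refine(center):
        nonlocal result, nor
        nor += 1
        found = knn(P, center, k)                        # lines 01/10
        result = sorted(set(result) | set(found),
                        key=lambda p: d_sum(p, Q))[:k]   # lines 02/11
        ub = d_sum(result[-1], Q)                        # upperbound, 03/12
        radius = math.dist(center, found[-1])            # lines 05/14
        verts = [(center[0] + radius * math.cos(2 * math.pi * i / n),
                  center[1] + radius * math.sin(2 * math.pi * i / n))
                 for i in range(n)]                      # lines 06/15
        dq = d_sum(center, Q)
        # Keep vertices passing Eq.3 and lying outside before-searched circles.
        cands = [v for v in verts
                 if not any(math.dist(v, c) < r for c, r in circles)
                 and ((dq <= d_sum(v, Q) <= ub) or dq > d_sum(v, Q))]
        circles.append((center, radius))
        return cands

    cqp = refine(rqp)                                    # lines 01-07
    while cqp and nor < max_requests:                    # lines 08-17
        cqp.sort(key=lambda v: d_sum(v, Q))              # least sum distance
        cqp += refine(cqp.pop(0))                        # lines 09-16
        # Prune leftover candidates covered by newly searched circles.
        cqp = [v for v in cqp
               if not any(math.dist(v, c) < r for c, r in circles)]
    return result, nor
```

The returned `result` is the refined sum k-NN answer and `nor` is the NOR cost measure used in the experiments.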
RQP-M refines the k-NN query result searched by RQP-S by requesting subse-
quent k-NN queries. NOR (Number of Requests) is employed to measure the search
costs, which counts the number of k-NN queries requested (line 1 and line 10).

5 Experimental Evaluation
In this section, the performance of RQP-M is experimentally evaluated by measuring
Precision and NOR. The former is used as the criterion specifying the accuracy of sum
k-NN query results. It is defined in Eq.4, where Rsum k−NN is the exact sum k-NN
query result, RRQP−M(sum k−NN) is the query result searched by RQP-M, and k is the
cardinality of both Rsum k−NN and RRQP−M(sum k−NN). The latter is the number
of k-NN queries requested, which specifies the search cost of sum k-NN queries.
Experimental results are averages of 100 trials conducted for each setting. Parameters k1
and k2 of RQP-M are set equal in the experiments (See Fig.5). Furthermore, the
locations of query points are uniformly distributed in the experiments.

Precision(k) = |Rsum k−NN ∩ RRQP−M(sum k−NN)| / k    (4)
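Eq.4 is simply the overlap ratio of the exact and approximate result sets; a minimal sketch:

```python
def precision(exact, approx):
    # Eq.4: fraction of the exact sum k-NN result recovered by RQP-M.
    exact, approx = set(exact), set(approx)
    assert len(exact) == len(approx)  # both results have cardinality k
    return len(exact & approx) / len(exact)

# A 10-NN result sharing 3 of 4 objects with the exact answer:
print(precision({'p1', 'p2', 'p3', 'p4'}, {'p1', 'p3', 'p4', 'p5'}))  # → 0.75
```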

5.1 Influence of the Regular Polygon on Performance


The number of edges of a regular polygon is the parameter of RQP-M (See Fig.5).
How it affects performance is measured. Experiments regarding sum 10-NN queries
are conducted by varying the value from 3 to 9, with a minimal point as RQP and
10000 data points whose locations are uniformly distributed. Precision is over 0.99
in case that the value is 5 or more (See Fig.6(a)). On the other hand, NOR does not
increase in proportion to the value (See Fig.6(b)). From the results, it is supposed
that a regular polygon whose number of edges is 5 or more is sufficient to refine sum
k-NN query results.
(a) Precision (b) Number of Requests (NOR)

Fig. 6 Performance of sum 10-NN search for varying number of edges of a regular polygon
5.2 Performance of RQP-M over Several Types of Data


Performance of RQP-M with a minimal point as RQP is measured by using several
types of data. Firstly, experiments are conducted by varying the number of query
points and k of sum k-NN queries, with 5-regular polygons and 10000 data points
whose locations are uniformly distributed. Precision is over 0.99 (See Fig.7(a)) and
NOR ranges between 3.2 and 4.0 (See Fig.7(b)).

(a) Precision (b) Number of Requests (NOR)

Fig. 7 Performance of sum k-NN search over data points of uniform distribution

Secondly, performance is measured by using skew-distributed data points.
Experiments regarding sum 10-NN queries are conducted by varying the number
of query points, with 5-regular polygons and 10000 data points whose locations are
generated according to a two-dimensional Gaussian distribution. Let the location of a
data point be (x, y) (x ∈ [0, 1), y ∈ [0, 1)). The mean point of the Gaussian distribution
is randomly generated and the standard deviation (σ) is varied. Precision is over
0.95 in case that σ is 0.06 and over 0.99 in case that σ is 0.14 (See Fig.8(a)). On
the other hand, NOR ranges between 3.1 and 3.8 (See Fig.8(b)). The larger σ is, the
closer performance is to the uniform-distribution case. This is because a Gaussian
distribution with large σ is similar to a uniform distribution.
(a) Precision (b) Number of Requests (NOR)

Fig. 8 Performance of sum 10-NN search over data points of Gaussian distribution

Thirdly, performance is measured by using real data points. Experiments are con-
ducted by varying the number of query points and k of sum k-NN queries, with
5-regular polygons and real data points. The data is concerned with restaurants
located in Nagoya, which is available on the Web and accessible via a Web
API3. There are 2003 corresponding restaurants, which are concentrated in
downtown Nagoya. Precision is over 0.97 (See Fig.9(a)) and NOR ranges
between 2.9 and 3.5 (See Fig.9(b)).

(a) Precision (b) Number of Requests (NOR)

Fig. 9 Performance of sum k-NN search over real data points

5.3 Precision Improvement Process


The Precision improvement process is experimentally clarified. Let ⟨p1, p2, . . . , pn⟩ be
a sequence of Precision values regarding sum k-NN query results, where pi (1 ≤ i ≤ n)
is the Precision after requesting the i-th k-NN query. Note that pi (f ≤ i ≤ n) is set
to pf in case that NOR (= f) is less than n. Experiments are conducted to compute
a sequence of average Precision values for sum 10-NN queries with a minimal point as
RQP, by varying the number of query points, with 5-regular polygons and 10000 data
points whose locations are uniformly distributed. Fig.10(a) shows that Precision is
over 0.99 after requesting the 3rd 10-NN query. Additionally, the ratio of exact
results after requesting the i-th 10-NN query is measured. However, exact results
do not necessarily lead to immediate termination of RQP-M, because it cannot be
aware that exact results have been obtained. RQP-M continues to execute until the
available query points are exhausted. Conversely, it stops its execution when no
available query points remain, even if it has not obtained exact results. Fig.10(b) shows that
the ratio of exact results is over 0.93 after requesting the 3rd 10-NN query.

5.4 Influence of the Representative Query Point on Performance


The RQP is the parameter of RQP-M for the first k-NN search. How the RQP affects
the performance of RQP-M is measured. Experiments are conducted by varying the number
of query points, with 5-regular polygons and 10000 data points whose locations are
uniformly distributed. RQP-S is a special version of RQP-M, which requests just a
single k-NN query. Fig.11 shows the Precision of sum 10-NN query results searched
by RQP-S using distinct RQPs. It is certain that minimal points as the RQP of RQP-S
3 http://webservice.recruit.co.jp/hotpepper/gourmet/v1/
(a) Precision (b) Ratio of exact results

Fig. 10 Precision improvement process of sum 10-NN query search

are superior in Precision to both mean points and middle points. Fig.12 shows the
Precision and NOR of sum 10-NN query results searched by RQP-M using distinct
RQPs. The Precision of RQP-M using minimal points as RQP increases by 0.04-0.20
for uniformly distributed data, in comparison with that of RQP-S. However, there
is little difference in Precision among the three kinds of RQP used by RQP-M
(See Fig.12(a)). By contrast, there remains a certain difference in NOR among the
three (See Fig.12(b)). NOR regarding minimal points ranges from 3.42 to 3.74, while
NOR regarding mean points and middle points ranges from 4.3 to 7.56. It is certain
that minimal points are much superior in NOR to the others.

Fig. 11 Precision of sum 10-NN query results searched by RQP-S with distinct RQP
(a) Precision (b) Number of Requests (NOR)

Fig. 12 Performance of sum 10-NN query results searched by RQP-M with distinct RQP
6 Conclusion
In this paper, we have proposed the RQP-M search algorithm for efficiently searching
sum k-NN query results. It refines query results originally searched by RQP-S with
subsequent k-NN queries, whose query points are chosen among the vertices of regular
polygons inscribed in previously searched circles. Experimental results on the performance
of RQP-M are as follows. (1) A regular polygon whose number of edges is 5 or
more is sufficient to refine sum k-NN query results. (2) Precision is over 0.99 for
uniformly distributed data, over 0.95 for skew-distributed data, and over 0.97 for real
data. Also, NOR ranges between 3.2 and 4.0, between 3.1 and 3.8, and between 2.9
and 3.5, respectively. Precision of RQP-M using minimal points as RQP increases
by 0.04-0.20 for uniformly distributed data, in comparison with that of RQP-S. (3)
Precision is over 0.99 after requesting the 3rd k-NN query, and over 93% of the results
are exact by that point. (4) Minimal points as RQP are much superior in NOR to both
mean points and middle points. Our future work includes further examination
of available query points, experiments on the case that parameters k1 and k2 are
unequal (See Fig.5), development of an efficient algorithm for max k-NN queries,
and study of aggregate within-distance queries[18].

References
1. Guttman, A.: R-trees: A Dynamic Index Structure for Spatial Searching. In: Proc. ACM
SIGMOD Int’l Conf. on Management of Data, pp. 47–57 (1984)
2. Sato, H.: Approximately Solving Aggregate k-Nearest Neighbor Queries over Web Ser-
vices. In: Phillips-Wren, G., Jain, L.C., Nakamatsu, K., Howlett, R.J. (eds.) IDT 2010.
SIST, vol. 4, pp. 445–454. Springer, Heidelberg (2010)
3. Sato, H.: Approximately Searching Aggregate k-Nearest Neighbors on Remote Spatial
Databases Using Representative Query Points. In: Watanabe, T., Jain, L.C. (eds.) In-
novations in Intelligent Machines – 2. SCI, vol. 376, pp. 91–102. Springer, Heidelberg
(2012)
4. Ilarri, S., Mena, E., Illarramendi, A.: Location-Dependent Query Processing: Where
We Are and Where We Are Heading. ACM Computing Survey 42(3), Article 12 (2010)
5. Roussopoulos, N., Kelley, S., Vincent, F.: Nearest Neighbor Queries. In: Proc. ACM SIG-
MOD Int’l Conf. on Management of Data, pp. 71–79 (1995)
6. Hjaltason, G.R., Samet, H.: Distance Browsing in Spatial Databases. ACM Trans.
Database Systems 24(2), 265–318 (1999)
7. Korn, F., Muthukrishnan, S.: Influence Sets Based on Reverse Nearest Neighbor Queries.
In: Proc. ACM SIGMOD Int’l Conf. on Management of Data, pp. 201–212 (2000)
8. Ferhatosmanoglu, H., Stanoi, I., Agrawal, D.P., El Abbadi, A.: Constrained Nearest
Neighbor Queries. In: Jensen, C.S., Schneider, M., Seeger, B., Tsotras, V.J. (eds.) SSTD
2001. LNCS, vol. 2121, pp. 257–276. Springer, Heidelberg (2001)
9. Papadias, D., Shen, Q., Tao, Y., Mouratidis, K.: Group Nearest Neighbor Queries. In:
Proc. Int’l Conf. Data Eng., pp. 301–312 (2004)
10. Yiu, M.L., Mamoulis, N., Papadias, D.: Aggregate Nearest Neighbor Queries in Road
Networks. IEEE Trans. on Knowledge and Data Engineering 17(6), 820–833 (2005)
11. Nutanong, S., Tanin, E., Zhang, R.: Visible Nearest Neighbor Queries. In: Kotagiri, R.,
Radha Krishna, P., Mohania, M., Nantajeewarawat, E. (eds.) DASFAA 2007. LNCS,
vol. 4443, pp. 876–883. Springer, Heidelberg (2007)
12. Liu, D., Lim, E., Ng, W.: Efficient k-Nearest Neighbor Queries on Remote Spatial
Databases Using Range Estimation. In: Proc. SSDBM, pp. 121–130 (2002)
13. Bae, W.D., Alkobaisi, S., Kim, S.H., Narayanappa, S., Shahabi, C.: Supporting Range
Queries on Web Data Using k-Nearest Neighbor Search. In: Ware, J.M., Taylor, G.E.
(eds.) W2GIS 2007. LNCS, vol. 4857, pp. 61–75. Springer, Heidelberg (2007)
14. Xu, B., Wolfson, O.: Time-Series Prediction with Applications to Traffic and Moving
Objects Databases. In: Proc. Third ACM Int’l Workshop on MobiDE, pp. 56–60 (2003)
15. Trajcevski, G., Wolfson, O., Xu, B., Nelson, P.: Managing Uncertainty in Moving Ob-
jects Databases. ACM Trans. Database Systems 29(3), 463–507 (2004)
16. Yu, P.S., Chen, S.K., Wu, K.L.: Incremental Processing of Continual Range Queries over
Moving Objects. IEEE Trans. Knowl. Data Eng. 18(11), 1560–1575 (2006)
17. Nelder, J.A., Mead, R.: A Simplex Method for Function Minimization. The Computer
Journal 7(4), 308–313 (1965)
18. Trajcevski, G., Scheuermann, P.: Triggers and Continuous Queries in Moving Objects
Database. In: Proc. 6th Int’l DEXA Workshop on Mobility in Databases and Distributed
Systems, pp. 905–910 (2003)
(Not)Myspace: Social Interaction as Detriment
to Cognitive Processing and Aesthetic
Experience in the Museum of Art

Matthew Pelowski*

Abstract. This paper considers the effect of social interaction on art museum be-
havior and cognitive/ aesthetic experience, arguing that social interaction may
represent one of the potentially most detrimental elements in museum-based view-
ing of art—calling into question the current push to increase social interactions
through museum social and knowledge media design. This is considered through
three case studies with the same works of art, varying only design elements creat-
ing social interaction, and considering the differences this creates. From a
psychological viewpoint, these cases are examined and social interaction's effect is
presented, considering how these findings might connect to museum and general
conceptions of social and knowledge media design.

1 Introduction
Modern society is increasingly becoming a technologically connected and “social”
space. Social communication, knowledge media, networking applications and cul-
tural norms involving the sharing of social and personal information are becoming
a more and more core component of human life. In modern times, it is quite com-
mon—often unthinkingly expected—for one to publish a record of the sights they
have seen, curate the foods they have eaten, announce their immediate opinions,
aesthetic reactions and new understandings of experiences throughout the day; not
to mention search out and share announcements, activities and opinions of others.
This behavior is in turn finding its way into many of the spaces that had formed
the previous venues for social and cultural sharing in human life: the school, the
street and—notably here—the museum of art.
Viewing art itself has long been a social task—matching one's appraisal against
that of an artist or artworld [1]—while the making of art has long been defined as a
social act [2], introducing a new work amid a history of other such activities.

Matthew Pelowski
Nagoya University, Japan

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 399–410.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
And much current marketing and sociological research suggests that patrons have
long gone to museums (art or otherwise) expressly to join in a social event [3]—
with the driving force for attendance being to integrate and be acknowledged by a
community of likeminded others. With modern and post-modern art, it is also of-
ten noted that art-viewing is increasingly becoming a more difficult, confusing
task [5]. Sharing or learning from the knowledge of others, and accessing the
wealth of existing artwork information, are given as primary means for increasing
enjoyment and understanding in the 'common' art patron. Therefore—as the panel
discussion, of which this paper is a part, implies—it is no surprise that the art mu-
seum would represent a microcosm for social media integration today, using cell
phones or i-pod to record and share opinions or remarks, offering technology
augmenting viewing with digital overlay of community and others art facts [4].
However, in the push to add social technology to the museum space, and to work out the technological kinks, what has gone unconsidered is the question of what effect this interaction might in fact have on the cognitive understanding, emotional response, even basic assessment, of art itself. It appears to go without saying that art interaction should be a social event, that we should share our experiences with others and benefit from others' company in our perceptions of art. Yet this assumption may not necessarily be true. Social interaction, especially within forums such as a museum that tend to amplify social response, may be uniquely harmful to the personal act of enjoying, understanding and aesthetically experiencing works of art. In fact, social meetings, created by design elements that bring together the opinions and presence of others (elements even as seemingly mundane as a hallway or bench), may be a primary reason why individuals do not have fulfilling or rewarding outcomes with art. This may only be exacerbated by technology that makes connections within social space even more efficiently.
It is on this topic that this paper hopes to open debate. Through short consideration of three case studies from our research in museums of art, we explore evidence from art encounters, each with the same basic paintings, viewers and layout, the only major difference being design elements within one space which led to heightened social interaction, in turn causing very different outcomes for viewers themselves. Based on our previous work on the psychological underpinnings of aesthetic experience, we then provide a frame for considering physical/psychological evidence of negative social effect, considering why and how this occurs, and, by briefly walking through a decidedly "analog" example of social interaction effect, offering a counterpoint that should be considered as we push to integrate social encounters or social/knowledge-media design into aesthetic life.

2 Review: The Aesthetic Progression: Goals and Outcomes of Viewing Art
First, let us begin with a context for what we can expect from viewing art. While
many outcomes can be considered here, we [6, 7] recently published a cognitive
model for art perception (shown on the left side of Fig. 2) that encapsulates much
of the current debate, and around which we can consider the discussion to come.
(Not)Myspace: Social Interaction as Detriment to Cognitive Processing 401

This model treats art viewing as a series of 5 stages, each with significance for un-
derstanding art, and with elements that raise important points for social design:
1.) Pre-expectations: Essentially, the viewing of art (although the same can be said for other perceptual activity as well), and the key point for this discussion, begins with the nature and structure of one's self. Before a viewer can set foot in a museum or encounter a cognitive task, they already hold expectations for what they will see and do. Viewers carry "'fundamental' meanings regarding themselves, other persons, objects or behaviors" ("Who am I?" "What is art?" "How does art relate to me?" "How do I relate to (art) society?") which collectively combine to form one's image of the world or "ideal self". This self-image can be divided into a hierarchical frame [8], topped by traits that are aspired to and that are integral to one's identity ("be an art person"), pursued through goals for actions (understand art) and subdivided into schemata (classify, find meaning, follow social norms). In this way, all action becomes an application of the self, protecting and navigating via this structure, and determining what viewers can do or see, leading to three outcomes, only one of which is generally desirable for art.
2.) Assimilation/cognitive mastery: First, upon engaging stimuli, individuals classify, understand and form a response, based upon these expectations and in such a way as to control and reinforce the self. This is, in fact, generally considered to be the evolutionary goal of human action: successfully navigating the environment without cause for disruption, processing and controlling smoothly without danger to the self. In the case of art, however, as we argue extensively in our previous publications, because this marks a matching of perception to existing schema or self, this stage (recently called "cognitive mastery" by Leder et al. [5]) is also the result of circularity, requiring that a viewer expand classification or break off perception rather than modify the self. When viewed in isolation, this cuts off the possibility for new perception or change in ideas, and becomes a 'facile' act of assimilation, a blasé outcome without fundamental mark or effect on one's life (think of the typical viewer briskly walking past a painting, identifying mimetic signs, noting an artist, reading a label, and moving on to the next work in a room).
To move past this point requires something to bump us out of our preconceived frame. This occurs through discrepancy in the matching of world to self. Discrepancy might arise for numerous reasons: between expectations for perception and perception itself, between perceived information and its relevance to the self, or between actions and one's expectations. In any case, when it cannot be assimilated or ignored, it moves one to the second outcome.
3.) Discrepancy, escape or secondary control: Upon discrepancy, the next response is to escape, through what Rothbaum et al. [9] call "secondary control". In order to minimize discrepancy, the individual must alter one of the two perceptual elements: either the self or the environment. Typically viewers choose to discard the latter, through: 1) re-classification, often taking on an accusatory tone (art is meaningless, bad or esoteric); 2) physical escape; or, last, 3) mental withdrawal, lowering the importance of the discrepant event ("it's only art"), in all cases lowering the perceived demand for interaction, and therefore its impact on the self.

4-5.) Self-awareness, schema change and aesthetic experience: It is only in cases when viewers do not assimilate or escape, then, that we instead find an impact on the self. Viewers find themselves in an intractable position. Unable to
process a discrepancy and unable to downplay its importance, they are left with no
choice but to reframe their own involvement, seeking what Torrance [10] has
called a “second-order change” whereby they adopt a meta-cognitive approach to
interaction, “looki[ng] outside the problem situation to the system” itself, giving
up attempts at overt control, revisiting expectations—what did I expect; what did I
see?—discarding or changing schema and eventually, through schema- or “self-”
change, creating a new perception or worldview.
It is this outcome, then, that holds unique importance for art and for perceptual
experience—and which is the essential target of our studies. By changing the self,
one can reset involvement and re-enter cognitive mastery, employing new schemata allowing new, more harmonious interaction and the ability to attend to or understand
new elements. In this way, viewers can be said to grow and to learn from interac-
tion, to feel harmony, or to see and experience something new, and to end in an
outcome commonly referred to as “aesthetic experience.”
It is this outcome that is generally agreed to be the goal and essence of viewing art. Art is argued to mark one rare case where we are presented with the opportunity to come up against a new take on perception, reassess conceptions and find new ideas or alignments. Writers speak of "challenge" [5], or of novelty and arousal as the point of interaction [6]; philosophers equate this to "enlightenment"; theorists champion final harmony and "aesthetic" response. This is contrasted against the alternative outcomes of assimilation or escape, which are the antithesis of aesthetic experience. Therefore it is this conclusion that museums naturally seek to enhance, often by the social/technology means mentioned above: providing ready access to information so that one might overcome confusion, and soliciting and bringing together viewers with the experiences of others in order to offer a path to harmony, new perceptions or aesthetic response.
However, this outcome, and model’s discussion, also raise issues not often fully
considered by museum/ perceptual design. First, is the basic importance of this
progression itself. The aesthetic experience outcome is of course a goal, but it is
also temporally tied to the previous outcomes noted above, requiring a viewer to
move, often within one encounter, through assimilation and secondary control, be-
fore arriving at aesthetic end. When we do look to museum viewer response, our
research [7] shows that this progression can literally be traced. Beginning emo-
tionally, a viewer’s path that does end in epiphany or happiness will also correlate
to earlier anxiety, confusion and reflection on the self. In appraisal, viewers move
from ugliness (assimilation/ escape) to beauty (aesthetic schema change), from
blasé facile reception to meaninglessness to assessments of meaning tied to the
self; from lasting negative to positive effect. Therefore it is actually this movement
that may hold the most pragmatic importance for art, suggesting a progression,
which viewers can exit at any outcome, but must be fully experienced to arrive at
an aesthetic end. This in turn leads to the key issue in connection to social interac-
tion’s effect. As a viewer moves through this progression there are two elements
that are required in order to move past the major checkpoints of assimilation and
escape. First, one must precisely not sufficiently understand, must not (at least at
first) master their interaction—they must fail in their attempt at control. Second,
they must undergo overt focus on the self. It is these two elements that are specifi-
cally impacted by social presence, and which call into question many of the
museum approaches above. This might best be considered in the case studies below.

3 Case Studies: 3 Encounters with Rothko Art, and Effect of Social Interaction
For the past several years, we have been conducting research in three rooms of art, in three museums, all containing the same basic artworks by the same artist (Mark Rothko). These spaces, in Houston, U.S.A., Kawamura, Japan, and London's Tate Modern, are particularly interesting for a discussion of aesthetic experience. Critics and viewers routinely report the emotions considered above: epiphany, pleasure, happiness. They also mention discrepancy and anxiety, as well as self-awareness [7 for review]. However, it is the layout of the rooms, and the artworks themselves, that we find most intriguing. Each space has essentially the same arrangement (Fig. 1): one or two closed rooms without labels or information, and with only Rothko's abstract art, itself compelling in its simplicity: large squares of color (purple and black in the U.S., red and orange in the Tate and Japan; ranging from roughly 2.5 x 3.5-4.5 m) set on a darkened ground. Although perceived as paintings in the classical sense, the artworks cannot be said to depict anything, and in fact according to critical analysis cease to be mimetic paintings and instead become 'just paint', allowing us to focus not on any specific variety of mimetic or personal meaning, but on the underlying mechanism of aesthetic response.
And again, these rooms are specifically noted for this. Critical discussion [7; also 11] describes an experience that mirrors the progression above. Upon entering, because of the paintings' monumental quality and minute brushstrokes, viewers have a sense of profound significance or meaning. However, as viewers attempt to fit the paintings together and understand their collective meaning, the paintings' redundancy eventually causes discrepancy, forcing viewers to give up mimesis. The outcome of this discrepancy, then, comes to explicitly follow our argument: either one leaves, or one finds schema change. "Frustrated and thrown back to yourself," [14] notes, "you become the center of the room. You think about your conduct, your body." "[The art] forcibly redirects attention to the viewer's situation and conduct… specifies the focus of the viewer's newly evoked self-awareness." The art "forces disturbing questions about the nature of the self and its relation to the world," and this self-confrontation induces an expectational change, in which we "may emerge on a new plateau" of understanding where "the experiential structure [itself] is transformed."
Our empirical analysis also shows exactly this. Although we will not consider the statistical analysis of the evidence here [see 8], what is important to note is that we do find three outcomes, related to time, appraisals and the emotions found. First, there is the facile outcome, notable for generally neutral emotion, shallow assessment, a very short time spent inside, and no confusion. Second, we find the outcome
of secondary control, containing confusion and anxiety, very negative assessment, and no conclusion in self-awareness or positive response. And finally (condensed in Table 1) we find aesthetic response, correlating in all three rooms with the same collection of emotional and cognitive events, and suggesting full progression through each stage of the model: confusion, anxiety, need to escape, self-awareness, and final motive re-assessment, happiness and epiphany. As argued above, this final outcome is also tied to many of the objective measures that a museum would hope for from art: hedonic evaluations of beauty and goodness, and basic understanding and a sense of "meaningfulness" in the artworks themselves. This holds constant regardless of culture, country, gender and specific work.
However, when we compare these rooms, we also find a notable difference, centering specifically on the room in Japan. As shown in Table 1, while Houston and London saw roughly 70% of viewers achieve aesthetic response, with very few failing in secondary control and roughly 20% ending in facile mastery, in Kawamura we found a quite different breakdown. Very few viewers reported the first, facile or non-discrepant outcome; a very high number (33%) reported aborted escape, and significantly fewer (57%) reported aesthetic experience. The potential reason for this difference, then, can be directly attributed to social awareness.

Table 1 Cross-cultural/cross-stimuli evidence for aesthetic experience in three Rothko rooms (adapted from Pelowski (2011))

element                        Rothko Chapel,             Kawamura, Japan          Tate Modern, London, UK
                               Houston, USA (n = 21)      (n = 30)                 (n = 29)

outcome:
  facile                       24% (n = 5)                10% (n = 3)              17% (n = 5)
  abort                        5% (n = 1)                 33% (n = 10)             10% (n = 3)
  aesthetic                    71% (n = 15)               57% (n = 17)             73% (n = 21)

emotional experience, correlation to aesthetic experience:
  discrepancy                  confusion, anxiety,        confusion, anxiety       confusion
                               tension
  secondary control            need to leave/escape       need to leave/escape     need to leave/escape
  meta-cognitive reflection/   very aware of myself,      self-awareness,          self-awareness, aware of
  self-awareness               changed my mind,           aware of others          my body, examined my
                               examined my motives                                 motives
  aesthetic outcome            epiphany, felt like        epiphany (satori),       epiphany, felt like crying,
                               crying, happy, relief      felt like crying,        happy, relief or catharsis
                               or catharsis               happy

artwork appraisal,             good, meaningful,          meaningful, beautiful    good, meaningful, beautiful
  tie to epiphany              beautiful, nice
artwork understanding          understood artist's        understood artist's      art was meaningful
                               intention, art was         intention, art was
                               meaningful                 meaningful

Note: adapted from [8]. All denoted terms are significant at p < .10.
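As a quick arithmetic check, the percentage breakdown at the top of Table 1 can be re-derived from the raw counts. The sketch below is purely illustrative (room labels and counts are taken from the table); note that the Tate aesthetic share computes to 72%, so the table's 73% presumably reflects rounding the column to sum to 100.

```python
# Recompute the outcome percentages in Table 1 from the raw counts.
# Counts are taken directly from the table; percentages are rounded
# to whole numbers, as in the original.
counts = {
    "Houston":  {"facile": 5, "abort": 1,  "aesthetic": 15},  # n = 21
    "Kawamura": {"facile": 3, "abort": 10, "aesthetic": 17},  # n = 30
    "Tate":     {"facile": 5, "abort": 3,  "aesthetic": 21},  # n = 29
}

def outcome_shares(room_counts):
    """Return {outcome: percent} for one room, rounded to whole percents."""
    n = sum(room_counts.values())
    return {outcome: round(100 * k / n) for outcome, k in room_counts.items()}

for room, room_counts in counts.items():
    print(room, outcome_shares(room_counts))
```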

In tandem with considering the psychological experience of these rooms, we also considered viewer behavior (the following is based on observation of 65 subjects inside [7]). And when we explore this aspect of looking at the art, we find profound importance of social interactions inside the space. Whereas the Houston and Tate
galleries were, by square area, much bigger spaces, the Kawamura room, at the time of this study, presented, as the only major difference, a much narrower room. And when we look at what viewers actually did in this room, it appeared that as a viewer stood at the entrance to the gallery forming an assessment and schema to be used in the encounter, as per the model above, they were also making one more assessment before engaging with the art, with even more profound effect upon what they did.
Essentially, upon entering, and pausing to survey the inside, subsequent viewer
action came to be determined, in almost all cases, by the location, and the avoid-
ance, of other individuals themselves. The potential locations where other viewers
might be were primarily centered on two points: the bench, located in the middle
of the space (Fig. 1) and in the far corner where the in-room docent stood. Viewer
interaction first showed an effect from a seated viewer. As our observations
show, upon entering and pausing, if another viewer was sitting on the bench fac-
ing to the right (positions 6 and 7), 100% of viewers moved to the left. If another
was sitting looking left (2 or 3), viewers moved to the right, 91% of the time.


Fig. 1 Gaze spaces and effect on viewer movement in Kawamura Rothko Room, Japan.
(Images created by the author).

In turn, what was determining viewer action in these cases, or again what was
being avoided, was not another viewer per se, but their pool of vision or what
might be called their ‘gaze space’—most often trained upon the particular painting
ahead. It was a quite common occurrence to observe a viewer wait, just outside
another’s pool of vision, trained on the next painting in the waiting viewer’s natu-
ral progression, until the other had moved their gaze before proceeding on.
This initial gaze avoidance was coupled with subsequent interaction with the docent as well. This, again, occurred at two possible points, depending on which direction the viewer had initially moved. If a viewer entered moving left, upon turning the corner from painting 4 to 5, they immediately came face to face with the docent. Likewise, if a viewer moved right, they encountered the gaze of the docent upon turning the corner from paintings 7 and 6 and moving toward 5. Again, observation quite clearly showed that this gaze came to have a profound effect on viewer action. Of 38 (58%) viewers who did move initially left,
29% (n = 11) stopped and turned around when hitting the corner of 3/4. Of 27
viewers who moved right, 22% (n = 6) turned around at painting 5 in front of the
docent, while 37% (n = 10) turned around and left at the first corner (painting 6/7).
These gaze interactions, then, came to have the key role in setting this space apart. This can also be put in very objective terms. Those who entered when another viewer was already sitting on the bench saw on average 1½ fewer (of 7) paintings and spent roughly one full minute less in the room itself. That is, if we compare these findings to the layout of this room, the time spent inside and the amount of gallery covered align almost exactly with the amount of time and space presumably available before one came upon a point where they had no choice but to enter another's gaze, or to leave (the choice of the majority): an amount of time and number of paintings, presumably, short of aesthetic response. This also appeared in our questionnaires, with a significant number of viewers, again unlike the other two rooms, noting a specific awareness of others (Table 1).

4 Why Walk Away? Social Interaction Considered Psychologically

But what does this have to do with social interaction and cognitive processing of
art? There is in fact a good deal of literature we might attach to what is occurring
here. This returns specifically to the assessment made by a viewer at the entrance
to a gallery and the self-image basis of the model discussion above. While viewers
can be said to carry conceptions with them into the gallery, so too do they utilize
social schema. According to Rapee and Heimberg [12], “on encountering a social
situation,” before any processing of specific tasks it might contain, interaction be-
gins with a classification of the audience and one’s social, in addition to personal,
self. "An individual forms a mental representation of his/her external appearance
and behavior as presumably seen by the audience. The individual simultaneously
formulates a prediction of the performance standard or norm which he/she expects
the audience to utilize in the given situation. The representation of how the au-
dience is expected to view the individual and the appraisal of the audience's pre-
sumed situational standards are compared [to one’s image of the self] to provide
an estimate of the audience's perception of the individual's performance... a deter-
mination is made whether the individual [is likely to] perform in a manner which
meets the presumed standard,” and one creates a classification for social standing
with which to engage in the cognitive task.
The gaze, then, is the physical manifestation—and specific test—of this cogni-
tive preparation. Stepping into the gaze space of another is an act of stepping on
stage, or essentially directly beginning a social engagement. However, when we
do consider the Kawamura room, this model also directly touches on the two is-
sues raised above for aesthetic response. As we said, in order to arrive at aesthetic
experience, there are two points that must be moved through—discrepancy and
self-awareness. However, it is these two points, according to the literature, that are
specifically affected by this type of social awareness (we have placed these into a
cognitive model [7] for viewing art, shown in Fig. 2).
First, in the case of discrepancy, viewers presumably do not find significant de-
ficiency in their social assessment—otherwise they would likely refrain from en-
tering. However, in cases where some discrepancy has arisen within one's
cognitive task, the game changes. Discrepancy, where others are present, cannot
help but to take on a social tone, involving one’s expected fit within the social en-
vironment and social relation to others in the space. Rapee and Heimberg [12]
note, “perceptions of 'poor' performance would provide powerful input to the men-
tal representation indicating an inept appearance to the audience” [also 9]. And it
is specifically this sort of individual who would avoid gazes. Individuals who have
low expectations actively seek to avoid social interaction, and in this way, seek
self protection—“behaviors aimed at reducing potential for social interaction with-
in a situation… avoiding eye contact, standing on the periphery of a group” [12]—
or in this case, on the periphery of art. The same can also be said of the latter point, self-
awareness. According to [13], when individuals consider themselves to be defi-
cient “they tend to shift attention inward,” away from self-awareness, to prevent
“embarrassment and humiliation.” In fact, perception of a social imbalance
between viewer and others actually impedes meta-cognition.
It is this outcome, then, that we specifically find in the Kawamura room. It is important to note that this behavior is not unique to Kawamura or to Japanese viewers [see 8]. We observe the same actions and behavior in the other museums we study, and this kind of social interaction is a commonly considered point of psychological study. However, again, as viewers progress around this particular Japanese space they have no choice but to bump into others, and in turn no choice but to introduce a social interaction into their cognitive process. This social interaction, while potentially minor in a general museum of art where there is another path to take, here becomes the driving force for how much time and in what manner viewers engage.

5 Implications for Social Media Design—Kawamura Revisited

This discussion, then, should raise important issues for in-museum social media design. Returning to the two elements raised above and reconsidered in the psychological discussion of social interaction: in order to have aesthetic response, and in turn, according to our data, to find meaningfulness, beauty or positive emotion with art (the essential goals of art museums), one must encounter discrepancy, must become self-aware, and must not run away. However, it is these two elements that are specifically hampered by the introduction of social awareness. We saw this with something as simple as a room too small to avoid bumping into another viewer, with unavoidable points where one must share another's space. But is this not the very thing that is amplified by social media design?

Fig. 2 Combined cognitive flow model of aesthetic experience (left side, adapted from [6])
and cognitive processing of viewer social-relations (right, adapted from [7]).
Note that viewer cognition involves parallel processing of both outward cognitive task (e.g.,
artwork) and social position/ perceived balance between one’s self-efficacy and that of
‘others’ within the environment. Viewers switch allocation of resources between processing
of the task and monitoring social situation depending upon level of comfort within social
task. Upon discrepancy in either external/internal processing, viewers first abort through as-
similation or secondary control (outcomes 1 and 2). Given situations that cannot be aborted,
viewers often switch cognitive resources to environmental monitoring, withdrawing from
attempts at cognitive mastery (outcome 3). Only when one cannot escape due to strong tie
between perceived task and the self and one does not perceive large self-other discrepancy
might aesthetic outcome occur (4).
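The four-way outcome logic described in the caption can be sketched as a toy decision function. This is purely illustrative; the predicate names (flags) are my own shorthand for the caption's conditions, not terms from the model itself.

```python
# Toy sketch of the outcome logic described in the Fig. 2 caption.
# All predicate names are illustrative, not taken from the model.
def predicted_outcome(discrepancy, can_abort, tied_to_self, large_self_other_gap):
    """Map a viewer's situation to one of the four outcomes in Fig. 2."""
    if not discrepancy:
        return "1: facile assimilation (cognitive mastery)"
    if can_abort:
        return "2: escape / secondary control"
    if large_self_other_gap or not tied_to_self:
        return "3: withdraw to environmental monitoring"
    return "4: schema change / aesthetic experience"

# A viewer who cannot abort, is tied to the task, and perceives no large
# self-other gap is the only one predicted to reach the aesthetic outcome.
print(predicted_outcome(discrepancy=True, can_abort=False,
                        tied_to_self=True, large_self_other_gap=False))
```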

As noted by [7], art viewing is of course a social task. But by ensuring that art viewing will always have others present, even if only digitally, and by giving information derived from other sources and a conduit into the opinions (the gazes?) of others, whether bench or Twitter feed, we essentially offer up: 1) a task which has a correct answer and which one should not misunderstand, and 2) a prime for social comparison: will I succeed? are my opinions as important as others'? While a few may answer in the affirmative, a vast majority, when reminded of their social position, decide to turn and walk away. In turn, it is the very appreciation of discrepancy that is required for final aesthetic/cognitive appreciation, yet it is this very initial failure that primes one for susceptibility to the negative impact of social interaction. While this is not to argue that interaction will always end in such a negative encounter, designers should be very careful about the potential effect of social awareness and others' information, considered in tandem with cognitive processing, on appraisal and aesthetic interaction, so that we do not create a digital form of the very interaction considered physically above.
This point can be driven home by returning to Kawamura's room. The curators, aware of the negative encounters occurring, redesigned the space, specifically making it larger so as to reduce social contact. In a return study of this space (N = 22), our findings further support the claims above. Where before 10% had recorded the facile outcome (without confusion or epiphany), 33% had recorded escape, and only 57% reported aesthetic experience, now we find an essential reversal of the latter two: 13.6% recorded the facile outcome, only 18% recorded escape, and 68% (n = 15, or almost exactly the 70% found in the other rooms above) recorded aesthetic experience. Comparison of means between the old and new rooms, again, found one notable change: a reduction, to almost zero, of social or other-awareness.
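As a quick arithmetic check, the return-study percentages are consistent with 22 viewers splitting 3 / 4 / 15 across the three outcomes. The counts 3 and 4 are inferred here from the reported percentages (only n = 15 is stated in the text), so this sketch is illustrative rather than definitive.

```python
# Sanity-check the return-study breakdown: 68% aesthetic is reported as
# n = 15, implying N = 22; the facile and escape counts (3 and 4) are
# inferred from the reported 13.6% and 18% figures, not stated directly.
N = 22
inferred = {"facile": 3, "escape": 4, "aesthetic": 15}
assert sum(inferred.values()) == N  # the three outcomes partition the sample

shares = {k: round(100 * v / N, 1) for k, v in inferred.items()}
print(shares)  # facile 13.6, escape 18.2, aesthetic 68.2
```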

References
[1] Becker, H.S.: Art worlds. University of California Press, Berkeley (1982)
[2] Harrington, A.: Art and social theory: Sociological arguments in aesthetics. Polity
Press, Cambridge (2004)
[3] Goulding, C.: The museum environment and the visitor experience. European Journal
of Marketing 34(3), 261–278 (2000)
[4] Leder, H., et al.: A model of aesthetic appreciation and aesthetic judgments. British Journal of Psychology 95, 489–508 (2004)
[5] Pelowski, M., Akiba, F.: A model of art perception, evaluation and emotion in trans-
formative aesthetic experience. New Ideas in Psychology, 1–18 (2011)
[6] Pelowski, M.: Disruption, change and aesthetic experience. Doctoral Dissertation,
Nagoya University, Japan (2011)
[7] Carver, C.: Cognitive interference and the structure of behavior. In: Sarason, I.G., et al. (eds.) Cognitive Interference: Theories, Methods, and Findings, pp. 25–46. Erlbaum, Mahwah (1996)
[8] Rothbaum, F., et al.: Changing the world and changing the self: A two-process model of
perceived control. Journal of Personality and Social Psychology 42(1), 5–37 (1982)
[9] Torrance, E.P.: The search for satori & creativity. The Creative Education Founda-
tion, New York (1979)
[10] Nodelman, S.: The Rothko chapel paintings: Origins, structure, meaning. University
of Texas Press, Austin (1997)
[11] Rapee, R.M., Heimberg, R.G.: A cognitive-behavioral model of anxiety in social phobia. Behaviour Research and Therapy 35(8), 741–756 (1997)
[12] Wells, A., Papageorgiou, C.: Social phobia: Effects of external attention on anxiety,
negative beliefs, and perspective taking. Behavior Therapy 29, 357–370 (1998)
Nuclear Energy Safety Project in Metaverse

Hideyuki Kanematsu, Toshiro Kobayashi, Nobuyuki Ogawa, Yoshimi Fukumura, Dana M. Barry, and Hirotomo Nagai*

Abstract. This project for learning nuclear energy safety was carried out through e-
learning. Problem Based Learning (PBL) was selected as the educational tool and
Metaverse as the class environment. The virtual classroom was built on a virtual
island of Second Life owned by Nagaoka University of Technology. Three students
from two National Technical Colleges in Japan joined the project. A teacher gave
the students a short lecture and proposed the problem. Students understood the con-
tents very well and solved the problem through chat-based discussions in Metaverse.
Students' clear and precise understanding, active discussion, and high interest in the safety of nuclear energy were apparent throughout this successful

Hideyuki Kanematsu*

Department of Materials Science and Engineering,
Suzuka National College of Technology, Japan
e-mail: kanemats@mse.suzuka-ct.ac.jp
Toshiro Kobayashi
Department of Electronics & Control Engineering,
Tsuyama National College of Technology, Japan
e-mail: t-koba@tsuyama-ct.ac.jp
Nobuyuki Ogawa
Department of Architecture, Gifu National College of Technology, Japan
e-mail: ogawa@gifu-nct.ac.jp
Yoshimi Fukumura
Department of Management and Information Systems Science,
Nagaoka University of Technology, Japan
e-mail: fukumura@oberon.nagaokaut.ac.jp
Dana M. Barry
Center for Advanced Materials Processing (CAMP), Clarkson University, US
e-mail: dmbarry@clarkson.edu
Hirotomo Nagai
Water Cell Inc. Japan
e-mail: nagai.h@water-cell.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 411–418.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

PBL class project. The results clearly indicate that this kind of PBL class is viable
for actual e-learning in nuclear engineering and engineering education.

1 Introduction
On March 11, 2011, a huge earthquake hit the eastern part of Japan. With it
came a nuclear disaster, from whose unending aftermath the people of Japan are
still suffering. Among these tragic events, the accident at Fukushima's nuclear
plants is the worst nuclear disaster Japan has experienced. We are now doing our
best to solve the many difficult problems relating to the disaster in different areas
of Japan. From the viewpoint of engineering education, what could we contribute
to the restoration and recovery processes? Engineers of today and the future should
be optimistic problem solvers at all times. At the same time, they should know,
scientifically, what technology can and cannot do, and they have the important
mission of transferring this knowledge to people (our society at large). As for
nuclear energy, engineers should behave in this same way, and here engineering
education has a chance to play an important role. What is dangerous and what is
safe about nuclear energy? It is very important for engineers of the future to be
able to answer this question: "To-be-or-not-to-be." We believe that engineering
education can contribute to this in various ways.
We, the current authors, have been involved with Japanese higher education in
national colleges (Kosen) for many years. Almost 50 years ago, the national college
system was established to produce practical engineers who could immediately solve
practical problems in industry. The national colleges originally had five-year
programs for junior high school graduates. Later, two-year advanced programs
leading to graduate courses were added to the original one. Now there are over 50
national colleges all over the country. Seven years ago, the colleges were united
into one large organization, even though each college has kept its own autonomy
to some extent. From this viewpoint, the national college system may be regarded
as a huge higher-educational network. Two science and technology universities,
Nagaoka University of Technology and Toyohashi University of Technology, are
located at the center of this educational network to pursue scientific and
educational collaborations with the colleges (Fig. 1).
Given these geographical conditions and the practical purpose required of national
college education, e-learning is becoming important. In addition, creative
engineering design education is required in every aspect of national college
education. Therefore, the authors have established a Problem Based Learning (PBL)
model for e-learning [1]-[10]. We believe that such an educational project could
lead to the curriculum of the near future, which would provide students with
distance learning and creative education at the same time.
As previously mentioned, a huge earthquake and the disasters that followed hit the
eastern part of Japan. In particular, the accidents at the nuclear plants in
Fukushima Prefecture severely shook confidence in the safety of nuclear energy.
Against such a background, it is very important and informative for engineering

Fig. 1 Kosen System in Japan. The left map was cited from the web page of the Institute of
National Colleges of Technology, http://www.kosen-k.go.jp/english/map_mechanical.html

students to know precisely what is safe and what is dangerous. This topic is most
appropriate and timely for youngsters who will enter the engineering field in the
near future. For this project, we proposed an engineering problem relating to the
nuclear safety issues in Japan. After listening to a lecture, the students tackled
the problem-solving project in Metaverse. The effectiveness of, and problems with,
the virtual distance learning class in Metaverse are discussed.

2 Experimental

2.1 Metaverse and PBL


For this project, we utilized Metaverse as the virtual space in which the
e-learning class activity took place. There are several interesting Metaverse
environments in the world; we chose Second Life for our purpose. Second Life is
run by Linden Lab, based in San Francisco, USA, and we have often utilized this
Metaverse. A Metaverse is a three-dimensional virtual world where avatars do
everything on behalf of the participants. In our previous studies, there were
several reasons why we utilized it for e-learning projects. First of all, one can
possess a feeling of identity from the viewpoint of space and time in Metaverse.
E-learning based on texts and static images usually offers students the
asynchronous merit that they can join the project anywhere and at any time.
However, this usually limits their social contact to some extent, since they
cannot share the real feeling of joining something together in a certain space at
a certain time. We applied e-learning to Problem

Based Learning (PBL). Problem Based Learning is basically a problem-solving type
of educational tool in which students learn through mutual discussion in a team.
Therefore, various social feelings and skills are required to pursue this
educational activity, and it would be very difficult to realize PBL in
conventional e-learning classes. Our PBL classes in Metaverse have been carried
out successfully so far, and we have confirmed their effectiveness in various
ways. In the current project, the goal was for the students to gain a precise and
scientific impression of nuclear energy safety through PBL. Therefore, Second
Life was chosen as the virtual Metaverse environment.

2.2 Classroom Environments


A virtual classroom was built on the island of Nagaoka University of Technology
(NUT). NUT bought an island in Second Life and has run it to date for various
e-learning activities. Fig. 2 shows the outward appearance of the virtual island,
and Fig. 3 shows a distant view of the classroom where students participated in
this virtual PBL project. The classroom has a blackboard-like big screen, a
teacher's desk, students' desks and chairs, walls and a roof, as shown in Fig. 4.
It gives the impression of being in an ordinary real-life classroom. We think
this is an important key factor for pursuing the PBL activity in virtual space.
The discussion for PBL in Metaverse was basically carried out by exchanging text
messages. From that viewpoint, students could discuss easily without any of the
classroom facilities mentioned above. However, students arguably enjoyed the
class activities in the virtual space more, since the realistic atmosphere of the
classroom gave them a sense of reality. Creative and positive discussion is
always inspired by a pleasant, delighted and optimistic mindset. Therefore, the
classroom and its various facilities are always needed for successful PBL in
virtual space.

Fig. 2 A panoramic view of the island owned by NUT.

Fig. 3 The appearance of the virtual classroom for this project.
In the current project, a teacher (Kobayashi) first gave a 30-minute lecture using
a PowerPoint slide show, displayed on the big screen at the front of the
classroom. The lecture itself was delivered as text messages in a chat-like way.
He sometimes asked the students questions, and the students answered, also in a
chat-like way. Tablets (Bamboo tablet,

Wacom Co.) were prepared to help the students' discussion and understanding. The
students could write sentences, sketches, figures, equations, etc. on them. The
data was then sent to a web server, so that the participants could share it on the
web through browsers.

2.3 Project Outline


Three students were chosen from Tsuyama National College of Technology and Gifu
National College of Technology: one from Tsuyama and two from Gifu. They were 16
years old, corresponding to the first-grade level of ordinary high schools. One of
the students belonged to the same college as the teacher, while the other two
joined the class from a different college. Even the teacher and the student at the
same college joined the class from completely different rooms. The students got
together with the teachers in a preliminary session scheduled one day before the
class was to be held on the virtual island of Nagaoka University of Technology.
There they confirmed with the teachers how to move, look at the big screen,
discuss, use the tablet, and so on. In the following formal session, they gathered
once again on the virtual island and received the 30-minute lecture by Kobayashi.

Fig. 4 Kobayashi's lecture scene.

Fig. 5 Materials for the calculations: shielding against radiation for safety.

Kobayashi's lecture was composed of four parts. At the first stage, he explained
to the students schematically what radiation is, what kinds of radiation exist,
and how we are exposed to them on a daily basis. Secondly, he stressed that the
strength of radiation decreases as the distance between the source and the
measured position increases. And he

also stressed that the shielding capability depends on what kind of material is
used to protect human beings. He mentioned some kinds of metallic materials and
had the students calculate their shielding capability against radiation. At the
final stage, the following problem was proposed:

What kind of metal could shield against radiation effectively?

The radiation decay caused by shielding materials was explained to the students,
as shown in Fig. 5. The students calculated the radiation decay with increasing
distance for copper and aluminum. They also calculated the thickness at which the
original radiation strength decreases to half its value under a certain condition.
All of these problem-solving processes were carried out by the team. In this way,
each student could learn about the safety and danger of nuclear energy
scientifically and quantitatively. Finally, the teacher ended his lecture with the
following remark: "Nuclear energy is absolutely safe where the radiation intensity
is decreased completely to a safe level by the shielding material. These results
show that one can use nuclear energy completely safely with well-learned
knowledge." After the PBL class, questionnaires were provided to the students, and
they answered the questions off-line immediately. Fig. 6 summarizes the project
outline.

Fig. 6 PBL procedure in Metaverse.
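The two quantities the students computed can be sketched numerically. The relations are the standard inverse-square law and exponential attenuation with its half-value thickness; the attenuation coefficients below are illustrative placeholders, not the values from the actual class materials:

```python
import math

def intensity_at_distance(i0, r):
    """Inverse-square law for a point source: intensity falls off as 1/r^2."""
    return i0 / (r ** 2)

def intensity_after_shield(i0, mu, x):
    """Exponential attenuation I = I0 * exp(-mu * x) through a shield of thickness x (cm)."""
    return i0 * math.exp(-mu * x)

def half_value_thickness(mu):
    """Thickness at which the intensity drops to half its original value: x = ln 2 / mu."""
    return math.log(2) / mu

# Hypothetical linear attenuation coefficients (1/cm), for illustration only;
# real values depend on the metal and the photon energy.
MU = {"copper": 0.58, "aluminum": 0.20}

for metal, mu in MU.items():
    print(f"{metal}: half-value thickness = {half_value_thickness(mu):.2f} cm")
```

At the half-value thickness the transmitted intensity is exactly half of the incident one, which is the quantity the students solved for in the exercise.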

3 Results and Discussion


The questionnaire was composed of 11 questions, listed below.
1. Did you enjoy your overall activity?
#1: Very much #2: Pretty much #3: Neutral #4: Not so much #5: Not at all
2. Did you chat with the other colleagues and the teacher effectively?
#1: Very much #2: Pretty much #3: Neutral #4: Not so much #5: Not at all
3. Did you feel that the discussion was easy?
#1: Very much #2: Pretty much #3: Neutral #4: Not so much #5: Not at all
4. Did you feel that the usage of the tablet was easy?
#1: Very much #2: Pretty much #3: Neutral #4: Not so much #5: Not at all
5. Did you understand the contents of the lecture?
#1: Very much #2: Pretty much #3: Neutral #4: Not so much #5: Not at all

6. What was the easiest?
#1: Avatar's movement #2: Discussion #3: Sketch #4: Teleport #5: Other ( )
7. What was the most difficult?
#1: Avatar's movement #2: Discussion #3: Sketch #4: Teleport #5: Other ( )
8. Was your teacher friendly to you?
#1: Very much #2: Pretty much #3: Neutral #4: Not so much #5: Not at all
9. Do you want to join such a project again in the future?
#1: Very much #2: Pretty much #3: Neutral #4: Not so much #5: Not at all
10. Are you more interested in nuclear safety than before?
#1: Very much #2: Pretty much #3: Neutral #4: Not so much #5: Not at all
11. Please write your impressions, ideas, etc. freely.
Several typical students' answers are cited and analyzed below.
As for Question 1, most of the students enjoyed the activity and gave us positive
answers. As for the conversation (Question 2), it depended on the student: some
felt it was easy, while others did not. Inevitably, the answers to Question 3 also
differed from student to student. However, they answered unanimously that the
tablet usage was difficult (Questions 4 and 6). Actually, the tablet did not work
effectively, even though we had originally expected much of it. Before the
project, we had planned and presumed that each tablet-connected PC could send its
data to a common web server. However, the web-based whiteboard could not display
the figures precisely. The reason could be attributed to the lack of actions such
as MouseDown and Drag on the tablet side. To solve this problem in the future, we
are now planning two approaches. The first is to develop special applications for
the tablets, and the second is to use the whiteboard through PC browsers. Since
the latter should be much better, we will use it next time. As for Questions 5, 8,
9 and 10, all of the students gave positive answers. They understood the class
contents pretty well and therefore showed a relatively high degree of
satisfaction. For the final question, their answers concentrated on wishes
concerning the tablets.
The discussion in Metaverse among the students was very active, with the group
discussion led by an active student. It was really like an actual classroom
activity. This shows clearly that Problem Based Learning in Metaverse can be used
as an educational tool in engineering education.

4 Conclusions
We aimed for the students to learn about nuclear energy safety through PBL as
e-learning. For this purpose, we prepared a virtual classroom and other
educational facilities on a virtual island and confirmed the effectiveness of the
virtual PBL. The students understood the class contents very well through a short
lecture and discussion. They discussed very actively and finally learned what
safety means for nuclear energy and how dangerous radiation can be decreased
drastically. The series of educational investigations in this project shows a
positive and optimistic possibility for the application of PBL in Metaverse.

References
[1] Kanematsu, H., Fukumura, Y., Ogawa, N., Okuda, A., Taguchi, R., Nagai, H., Barry,
D.M.: Problem Based Learning in Metaverse As a Digitized Synchronous Type
Learning. In: Kim, H.S. (ed.) Proceedings of the ICEE and ICEER (International
Conference on Engineering Education and Research), ICEE & ICEER 2009, Korea,
vol. 1, pp. 330–335. Se Yung Lim, Publishing Committee Chair, Seoul (2009)
[2] Barry, D.M., Kanematsu, H., Fukumura, Y.: Problem Based Learning in Metaverse.
ERIC (Education Resource Information Center) Paper, ED512315 (2010),
http://www.eric.ed.gov/ERICWebPortal/
recordDetail?accno=ED512315 (retrieved)
[3] Kanematsu, H., Fukumura, Y., Barry, D.M., Sohn, S.Y., Taguchi, R.: Multilingual
Discussion in Metaverse among Students from the USA, Korea and Japan. In: Setchi,
R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds.) KES 2010, Part IV. LNCS, vol. 6279,
pp. 200–209. Springer, Heidelberg (2010)
[4] Farjami, S., Taguchi, R., Nakahira, K.T., Nunez Rattia, R., Fukumura, Y.,
Kanematsu, H.: Multilingual Problem Based Learning in Metaverse. In: König, A.,
Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds.) KES 2011,
Part III. LNCS (LNAI), vol. 6883, pp. 499–509. Springer, Heidelberg (2011)
[5] Farjami, S., Taguchi, R., Nakahira, K.T., Fukumura, Y., Kanematsu, H.: Problem
Based Learning for Materials Science education in Metaverse. In: Proceedings of
2011 JSEE Annual Conference, pp. 20–23 (2011)
[6] Barry, D., Kanematsu, H., Fukumura, Y., Ogawa, N., Okuda, A., Taguchi, R., Nagai,
H.: Problem Based Learning Experiences in Metaverse and the Differences between
Students in the US and Japan. In: International Session Proceedings of 2009 JSEE
Annual Conference - International Cooperation in Engineering Education, pp. 72–75.
Japan Society of Engineering Education (JSEE), Nagoya (2009)
[7] Nakahira, K., Rodrigo, N.R., Taguchi, R., Kanematsu, H., Fukumura, Y.: Design of a
Multilinguistic Problem Based Learning - Learning Environment in the Metaverse,
pp. 298–303. IEEE, Taiwan (2010)
[8] Barry, D.M., Kanematsu, H., Fukumura, Y., Ogawa, N., Okuda, A., Taguchi, R.,
Nagai, H.: International Comparison for Problem Based Learning in Metaverse. In:
Kim, H.S. (ed.) The ICEE and ICEER 2009 Korea (International Conference on
Engineering Education and Research), ICEE & ICEER 2009, Korea, vol. 1, pp. 60–66.
Lim, Se Yung, Publishing Committee Chair, Intercontinent Grand Hotel, Seoul (2009)
[9] Taguchi, R., Nakahira, K., Kanematsu, H., Fukumura, Y.: Construction and
evaluation of a multilanguage environment which aims at smooth PBL in Metaverse,
p. 35. The Institute of Electronics, Information and Communication Engineers
(IEICE), Nagaoka University of Technology, Nagaoka, Niigata (2010)
[10] Kanematsu, H., Fukumura, Y., Barry, D.M., Sohn, S.Y., Taguchi, R.: Multilingual
Discussion in Metaverse among Students from the USA, Korea and Japan. In: Setchi,
R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds.) KES 2010, Part IV. LNCS, vol. 6279,
pp. 200–209. Springer, Heidelberg (2010)
Online Collaboration Support Tools
for Blended Project-Based Learning
on Embedded Software Development
— Final Report —

Takashi Yukawa, Tomonori Iwazaki, Keisuke Ishida, Yuji Nishigaki,


Yoshimi Fukumura, Makoto Yamazaki, Naoki Hasegawa, and Hajime Miura

Abstract. The present paper reports on online collaboration support tools
for a blended, project-based learning (PBL) program. The authors have
been conducting a research project to implement e-Learning technology for
project-based learning on the development of embedded software, and have
reported several times on the individual tools and the learning program.
This paper is the final report of the research project; it clarifies the
position of each tool and provides evaluation results and discussion. The
tools proposed and developed comprise an integrated repository tool, a
review support tool, an online whiteboard, and a postmortem support tool.
A project-based learning program has also been constructed on the premise
of the use of these tools. A trial of the proposed program was carried
out, and then a questionnaire survey and an achievement test were
conducted to evaluate the learning effect of the program and the tools.
The results suggest that the program is feasible for PBL on embedded
software development, and that the
Takashi Yukawa · Tomonori Iwazaki · Keisuke Ishida · Yuji Nishigaki ·
Yoshimi Fukumura
Nagaoka University of Technology, 1603-1 Kamitomioka-machi,
Nagaoka-shi, Niigata 940-2188 Japan
e-mail: yukawa@vos.nagaokaut.ac.jp
Makoto Yamazaki
Nagaoka National College of Technology, 888 Nishikatagai-machi, Nagaoka-shi,
Niigata 940-8532 Japan
Naoki Hasegawa
Industrial Research Institute of Niigata Prefecture, 4-1-14 Shinsan,
Nagaoka-shi, Niigata 940-2127 Japan
Hajime Miura
Techno Holon Corporation, 3-19-2 Takamatsu-cho,
Tachikawa-shi, Tokyo 190-0011 Japan

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 419–428.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012

tools facilitate collaborative activities between learners and are effective for
enhancing their ability.

Keywords: collaborative learning, project-based learning, e-learning,


embedded systems.

1 Introduction
Project-based learning (PBL) is intended to strengthen students' abilities in
design, teamwork, and communication through the experience of solving
practical problems as a team. PBL has become popular in engineering
education, for example, because industry requires new university graduates
to have engineering design abilities.
However, there are some obstacles to popularizing PBL. Because PBL
assumes a group working in a classroom, the learning opportunity is limited
for students who cannot attend the classroom at the same time. In
addition, project-based learners must organize their experiences into
systematic knowledge and insight to acquire the desired abilities;
otherwise, PBL becomes an ineffective and time-consuming process.
Therefore, the authors have conducted a research project applying
information and communication technology (ICT) to PBL, so that PBL
sessions can be implemented even if the learners are in distant places or
participate at different times. The project targets a training course on
embedded software development [7].
Since collaboration between learners is a significant factor in PBL, this
project focused on both the e-Learning program for the collaborative design
process and the tools that support it. In addition, learners should look
back on their activities at the end of a PBL session; this is called a
"postmortem". The postmortem is important in PBL so that the learners
organize their experiences into systematic knowledge and insight for
acquiring the desired abilities [6].
The present paper describes each online collaboration support tool
developed in the project and a model learning program for embedded software
development training. The position of each tool in PBL is clarified.
To evaluate the learning effect, actual learning sessions accompanied
by a questionnaire survey and an achievement test were conducted. The
paper also reports the evaluation results and discusses the learning effect
and the usability of the tools.

2 Background
2.1 Related Works
In recent years, a number of training programs on embedded software have
been established. In Niigata Prefecture, the Niigata Industrial Creation
Organization (NICO) and associated organizations have conducted training

courses on embedded software since 2006. The NICO program incorporates
a PBL course.
Research on collaborative learning with the support of ICT has become
increasingly active, and research on Computer Supported Collaborative
Learning (CSCL) has evolved [1, 2, 3, 4]. In a CSCL environment,
communication between a teacher and a learner and/or between learners, the
sharing of documents and program codes created through the learning
process, and the exchange of atmosphere (awareness) among learners are
supported by computers.

2.2 Project Overview


The authors have established a project for implementing CSCL technology
for the PBL course on embedded software development. The final goal of the
project is to achieve the combination of the following two objectives:
• to construct a blended learning program for the embedded software PBL,
and
• to develop supporting software tools for the learning program.
Figure 1 shows the process of developing embedded software. Most existing
training courses incorporating e-Learning technology focus on having learners
acquire programming skills through a trial-and-error process, as shown on the
right side of the figure.
Without ignoring the right side of Fig. 1, the authors focus on sharing de-
sign knowledge through a review process, as shown on the left side, as well as
the postmortem process. The postmortem process enables learners to orga-
nize their experiences into systematic knowledge about designing embedded
software and to strengthen their insight for carrying out a project.

Fig. 1 Embedded Software Development Process

3 Online Collaboration Support Tools for PBL on Embedded Software Development
3.1 Requirements for Collaboration Support Tools
As described in the previous section, most existing training courses incorpo-
rating e-Learning technology focus on the right half of the process. Simula-
tors, virtual hardware environments, and version control systems are often
used for this purpose. In contrast, the authors intensively focus on the left
half of the development process and the postmortem process. For these pur-
poses, ICT is required to handle and process the information and knowledge
contributed by the learners.
To clarify the requirements for the ICT tools, the learners' activities and
communication in a real PBL training course were captured and analyzed.
The following findings on the requirements for collaboration support were
obtained.
1. The learners are trained by the experience of carrying out the project
and applying their own knowledge and skills to software development
and project management. Frequent verification and feedback would
improve the learning effect. Cross review and face-to-face group review
are necessary in each step of the development process, although each
learner conducts the design process individually. Therefore, a function
for performing cross and group reviews, even if the learners are in
remote places, is required.
2. In the face-to-face group discussion, use of a whiteboard was very
effective, and continuous presentation of the discussion process was rated
as having a positive effect on the productivity and quality of the
software. Therefore, an online whiteboard with a history function is
required.
3. In the PBL course, the learners gain a sense of accomplishment on
completion of the project; however, this by itself does not contribute to
the learners' ability. To achieve a learning effect, the experiences
should be organized into structured knowledge; the postmortem study is
therefore very important. Records of the learners' activities and
communication are expected to make the postmortem more productive.

3.2 An Integrated Repository Tool


The Integrated Repository Tool (IRT) integrates file storage with a
bulletin board. It is basically a BBS that can store multiple files as
attachments to a posted message. Obviously, this tool can be used as a
simple asynchronous communication tool; it can also be used for sharing the
documents and programs (products) created through the learning process and
for annotating those products. This enables remote parties to perform
cross and group reviews. A screenshot of this tool is shown in Fig. 2.
This tool supports the

Fig. 2 Screen Shot of the Integrated Repository Tool

first requirement, that is, project-based distant learners can frequently review
their designs with each other.
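As a rough sketch of the data model such a repository implies (hypothetical names; the paper does not describe the IRT's actual implementation), a posted message carries both file attachments and review replies:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Attachment:
    filename: str
    data: bytes

@dataclass
class Message:
    author: str
    body: str
    posted: datetime = field(default_factory=datetime.now)
    attachments: list = field(default_factory=list)  # products under review
    replies: list = field(default_factory=list)      # review annotations on the post

# A learner posts a design document; a remote peer reviews it asynchronously.
post = Message("learner_a", "Draft data-flow diagram, please review")
post.attachments.append(Attachment("dfd_v1.xml", b"<diagram/>"))
post.replies.append(Message("learner_b", "The buffer store has no output flow."))
print(len(post.attachments), len(post.replies))
```

Threading replies under the message that carries the product is what lets the BBS double as an annotated file store rather than a plain forum.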

3.3 A Review Support Tool


The Review Support Tool (RST) displays the differences between an original
design document and a revised one. Several tools exist that can clarify the
differences between text files; in this learning program, however,
differences between diagrams must also be displayed. Figure 3 demonstrates
the function of the RST. For detecting added, deleted, and modified
objects, the tool uses the edit graph algorithm [5]. This tool also
supports the first requirement.

Fig. 3 Function of the Review Support Tool
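The idea of diffing diagrams can be illustrated by serializing each diagram's objects into comparable strings and classifying them as kept, added, or deleted. The sketch below uses Python's difflib as a stand-in for the edit-graph algorithm of [5]; the object names are invented for the example:

```python
from difflib import SequenceMatcher

def diagram_diff(old, new):
    """Classify serialized diagram objects as kept, added, or deleted."""
    sm = SequenceMatcher(a=old, b=new, autojunk=False)
    kept, added, deleted = [], [], []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            kept += old[i1:i2]
        else:  # "replace", "delete", or "insert"
            deleted += old[i1:i2]
            added += new[j1:j2]
    return kept, added, deleted

old = ["Process:ReadSensor", "Store:Buffer", "Process:Display"]
new = ["Process:ReadSensor", "Store:RingBuffer", "Process:Display", "Process:Log"]
kept, added, deleted = diagram_diff(old, new)
print("added:", added)      # objects only in the revised diagram
print("deleted:", deleted)  # objects removed from the original
```

A real implementation would also have to match objects by identity rather than by serialized text, so that a renamed object is reported as modified rather than as a delete plus an insert.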

3.4 An Online Whiteboard Tool with Chat and Playback Functions
To support the second requirement, an Online Whiteboard Tool (OWB) with
chat and playback functions is proposed. Although several online whiteboard

Fig. 4 Screenshot of the Online Whiteboard Tool

tools have been developed, these tools can only be used for synchronous
(real-time) discussion. In our learning program, cross-review processes
should be able to be performed asynchronously. Adding a chat function
together with a playback function, which can replay the chat contents
sequentially as well as display previous drawings on the whiteboard,
enables asynchronous discussion using drawings. The OWB also has a function
whereby users can upload an image file and display it as a background
image on the whiteboard. A screenshot of the OWB is shown in Figure 4.
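The playback function can be understood as an append-only event log replayed in timestamp order, which is what makes the discussion consumable asynchronously. A minimal sketch (assumed structure, not the OWB's actual code):

```python
import time
from dataclasses import dataclass, field

@dataclass
class Event:
    ts: float       # timestamp of the action
    kind: str       # "draw" or "chat"
    payload: str    # stroke data or chat text

@dataclass
class Whiteboard:
    events: list = field(default_factory=list)

    def draw(self, stroke):
        self.events.append(Event(time.time(), "draw", stroke))

    def chat(self, user, text):
        self.events.append(Event(time.time(), "chat", f"{user}: {text}"))

    def playback(self):
        """Replay drawings and chat in their original order for a later reviewer."""
        return [(e.kind, e.payload) for e in sorted(self.events, key=lambda e: e.ts)]

wb = Whiteboard()
wb.draw("line(0,0 -> 10,10)")
wb.chat("reviewer", "Should this module own the buffer?")
wb.draw("circle(5,5 r=2)")
print(wb.playback())
```

Because the log interleaves strokes and chat messages on one timeline, an asynchronous participant sees not just the final drawing but how the discussion arrived at it.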

3.5 A Postmortem Support Tool


Figure 5 shows a display image of the Postmortem Support Tool (PST),
which supports the third requirement. This tool provides a comparison
between the activity logs and the schedule. In addition, software
configuration management information, such as the total number of files
and folders, is provided alongside. Plotting the review schedule and the
milestones on the same graph is also useful for seeing at a glance the
causes of delay in the project. The learner can toggle the display of each
kind of information with buttons, because displaying too much information
on the same graph at the same time is assumed to be hard to view.

Fig. 5 Screenshot of the Postmortem Support Tool

4 A Model Program
Postulating the use of the proposed e-Learning environment, a blended
learning program on project-based embedded software development was
constructed, as shown in Table 1. The table lists the schedule, the
learning modes, and the tool usage for each learning unit or activity. The
learning modes comprise synchronous (sync.) or asynchronous (async.);
face-to-face (f2f) or distant (dist.); and individual (indiv.), group of
learners (group), or all learners (all).
The program requires 21 days, whereas the equivalent face-to-face program
requires only five days. This is because most learning units and activities
are conducted asynchronously, which requires a time allowance. Learners are
assumed to be part-time students with workplace responsibilities;
therefore, they have limited time available each day for the learning
program.

5 Evaluation of the Program and Tools


5.1 Evaluation with a Trial of the Program
A trial of the proposed learning program involving actual learners was
carried out. After the trial was completed, a questionnaire survey was
conducted to evaluate the learning program and the supporting tools. The
results of the survey are shown in Figure 6. As the figure shows, the
learners gave positive responses to most questions.
An achievement test, which assesses the manner of embedded software
design, was also conducted to evaluate the effectiveness of the PST. The
learners were divided into two groups: one group used the PST in the
postmortem phase, while the other did not. Table 2 shows the result: the
group using the PST scored better than the other group at the 0.05 level
of significance.

5.2 Discussion
The question on the effectiveness of the tools received positive responses
(very good and good) from 80% of the learners and no negative responses.
On the other hand, the question on the reduction of the review burden by
the supporting tools received negative responses from 20% of the learners,
whereas 40% gave positive responses. This suggests that the usability of
the supporting tools should be improved. All trial learners responded
positively regarding satisfaction with the information provided by the PST
and the smoothness of its use. In addition, they reported that the review
quality of this program is equivalent to that of the face-to-face program.
Although the blended learning program takes longer to complete, its
learning quality is not far behind that of the face-to-face program, and
knowledge and skills on the software design process are organized better
with the supporting tools.

Table 1 Blended Learning Program on Project-based Embedded Software Development and Supporting Tool Applicability to the Learning Units

Learning Unit/Activity Day Learning Modes IRT RST OWB PST


Requirement Specification
a lecture on a development process 1 async./dist./indiv.
a lecture on device drivers 1 async./dist./indiv.
a lecture on requirement specification 1 async./dist./indiv.
a task description 2 async./dist./indiv.
writing an event list 2 async./dist./indiv. x
cross review 3 sync./dist./group x x x
general review 3 async./dist./all x
writing a content diagram 4 async./dist./indiv. x
cross review 5 async./dist./group x x x
general review 4 sync./f2f/all
Structured Design
a lecture on structured design 6–7 async./dist./indiv.
a lecture on data flow and state transition diagrams 6–7 async./dist./indiv.
writing a data flow diagram 6–7 async./dist./indiv. x
cross review 8 sync./dist./group x x x
general review 8 async./dist./all x
a lecture on a module structure diagram 9–10 async./dist./indiv.
writing a module structure diagram 9–10 async./dist./indiv. x
cross review 11 sync./dist./group x x x
general review 11 async./dist./all x
a lecture on module specifications 12–13 async./dist./indiv.
a lecture on a timer interrupt 12–13 async./dist./indiv.
writing module specifications 12–13 async./dist./indiv. x
cross review 14 async./dist./group x x x
general review 14 sync./f2f/all
Test Design
a lecture on test design 15–16 async./dist./indiv.
writing system test specifications 15–16 async./dist./indiv. x
writing linkage test specifications 15–16 async./dist./indiv. x
writing unit test specifications 15–16 async./dist./indiv. x
cross review 17 sync./dist./group x x x
general review 17 sync./f2f/all
Coding, Debugging, and Testing
coding, debugging, and testing 18–20 async./dist./indiv. x
Postmortem
individual retrospect 21 async./dist./indiv. x x
overall retrospect 21 sync./dist./all x x x
summarization 21 sync./f2f/all x x x x
Online Collaboration Support Tools 427

Fig. 6 Evaluation Results of the Proposed Learning Program

Table 2 Results of the Achievement Test


w/ PST w/o PST
Average score (out of 100) 71.4 53.0
Standard deviation 15.8 14.3
Number of Samples 5 6
Test statistic (t) 2.03
P (T ≤ t) (one-tailed) 0.0367
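The reported statistic can be reproduced from the summary rows of the table. The following sketch (plain Python, with helper names of our own choosing) assumes a pooled-variance two-sample t-test, which matches the reported t value and one-tailed p value:

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Summary statistics from Table 2 (w/ PST vs. w/o PST).
t, df = pooled_t(71.4, 15.8, 5, 53.0, 14.3, 6)
print(round(t, 2), df)  # → 2.03 9
```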

6 Conclusions
A research project to implement e-Learning technology for PBL of embedded software development was described. Based on observation of a real PBL training course, functional requirements for collaboration support were clarified, and support tools including an integrated repository, review support, an online whiteboard, and postmortem support were developed. A model learning program that presupposes the use of these tools was also presented. Trial sessions of the program were conducted to evaluate the program itself and the tools. The results of the questionnaire survey suggested that the tools were easy to use for most learners and that the learning effect of the program was equivalent to that of the face-to-face program. In addition, the result of the achievement test suggested that the tools help learners establish organized knowledge and skills in the software development process.

Acknowledgments. The present research is supported by a grant through the Strategic Information and Communications R&D Promotion Programme (SCOPE) from the Ministry of Internal Affairs and Communications, Japan.

References
1. Alavi, M.: Computer-mediated collaborative learning: An empirical evaluation.
MIS Quarterly: Management Information Systems 18(2), 159–174 (1994)
2. Brandon, D.P., Hollingshead, A.B.: Collaborative learning and computer-
supported groups. Communication Education 48(2), 109–126 (1999)
3. Gillet, D., Nguyen-Ngoc, A.V., Rekik, Y.: Collaborative web-based experimen-
tation in flexible engineering education. IEEE Transactions on Education 48(4),
696–704 (2005)
4. Kojiri, T., Kayama, M., Tamura, Y., Har, K., Itoh, Y.: CSCL and support
technology. JSiSE Journal 23(4), 209–221 (2006) (in Japanese)
5. Myers, E.W.: An O(ND) difference algorithm and its variations. Algorithmica 1, 251–266 (1986)
6. Nishigaki, Y., Yukawa, T.: Proposal of postmortem support tools for use in
project-based learning. In: Proceedings of Society for Information Technology
and Teacher Education International Conference 2011, pp. 593–598 (2011)
7. Yukawa, T., Takahashi, H., Fukumura, Y., Yamazaki, M., Miyazaki, T., Yano, S.,
Takeuchi, A., Miura, H., Hasegawa, N.: Implementing e-learning technology for
project-based learning for the development of embedded software. In: Proceed-
ings of Society for Information Technology and Teacher Education International
Conference 2009, pp. 2208–2212 (2009)
Online News Browsing over Interrelated Target
Events

Yusuke Koyanagi and Toyohide Watanabe

Abstract. Recently, we can easily acquire various kinds of information from many Web sites. However, it is difficult to determine the next page to browse among many Web pages. We often change the search target in order to narrow down the result pages found by an existing search engine. Generally, it is not easy for a searcher to change the query before he/she fully knows the content of the pages in the search result. In this paper, we propose a method for specifying query terms to narrow down the pages in the search result effectively. Our method detects the representative terms of the current search target according to the currently browsed page. In order to detect the representative terms, our method calculates the occurrences of the individual terms of the currently browsed page per constant time interval. We explain the prototype system based on our method and describe our experiment.

1 Introduction
Recently, we can easily acquire various kinds of information from the Web with search engines such as Google1. However, it is difficult for us to find the next page to browse when we have acquired a lot of pages as the search result [1]. In order to search efficiently, it is necessary to pick out only the pages directly related to the browsed page.
The news articles in some online news sites such as Yahoo! Japan News2 contain hyperlinks to other related news articles. The hyperlinks are useful for selecting the next article to browse. When a searcher wants to find out information about a
Yusuke Koyanagi · Toyohide Watanabe
Graduate School of Information Science, Nagoya University,
Furo-cho, Chikusa-ku, Nagoya, Japan
e-mail: koyanagi@watanabe.ss.is.nagoya-u.ac.jp,
watanabe@is.nagoya-u.ac.jp
1 http://www.google.com/
2 http://headlines.yahoo.co.jp/

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 429–438.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012

term in the currently browsed article, he/she tries to acquire articles about that term. However, hyperlinks are not provided for every term in the currently browsed article. That would require every article to be linked by hyperlinks to every related article, as is done in Wikipedia. In Wikipedia3, each article has hyperlinks to other articles whose titles appear in it. However, a Wikipedia article does not contain more detailed content than a news article.
In Web search, we sometimes refine the search target. For example, in a search for information about the aftermath of the 2011 Tohoku earthquake, we sometimes focus on a more specific target such as the Fukushima Daiichi nuclear disaster. In this case, in order to search for information about the new target, we usually input a new query to the search engine and narrow down the search result. However, when the searcher does not know much about the current search target, it is difficult for the searcher to decide on a query that narrows down the result effectively.
We propose a method for presenting effective, more specific targets. Our method detects the representative terms of the current search target according to the currently browsed page.
In this paper, we assume a search over online news articles in Japanese. In addition, we focus on changes of the search target caused not by the searcher's whim but by the browsing of the articles. For example, the searcher sometimes changes the search target to a term described in the browsed article.

2 Related Work
Several studies have investigated methods for detecting the searcher's interest for Web page recommendation. Sumathi et al. proposed a Web page recommender system based on the searcher's sequential browsing patterns [2]. He et al. proposed a sequential query prediction method based on the searcher's past query sequence [3]. Both extract the searcher's interest from information such as the browsing history or the query history. When the searcher does not know the search target, such history does not represent the searcher's interest. In addition, a method based on the browsing history may detect the change of the target inaccurately. For search with an explicit search target, a method should present candidates of the search target and let the searcher select one.
In order to support the selection of browsed articles on the Web, several studies have proposed document clustering methods. Kaski proposed WebSOM, which maps similar documents to neighboring areas in a 2D space [4]. To determine document similarity, he adopts the cosine similarity between the feature vectors of documents. Hu et al. proposed a method that extracts semantic relations from Wikipedia and then clusters documents with these relations [5]. They enhanced traditional similarity measures such as cosine similarity. However, these studies did not assume a search with an explicit search target. We assume that the searcher changes the search target to a term described in the browsed article. Under this assumption, it is
3 http://www.wikipedia.org/

Fig. 1 Searcher’s activity for refinement

more appropriate to present related articles based on the terms of the browsed article than based on the similarity of the article's whole description.

3 Approach
We assume that the searcher performs two activities: indication of the target and selection of the browsed article. Figure 1 shows the searcher's activity for refinement. In indication of the target, the searcher acquires articles about the target. In this paper, we call the acquired articles target articles. In selection of the browsed article, the searcher selects the article to browse from the articles acquired in indication of the target. We consider a method for presenting candidates of the target, from which the searcher selects a new target. In this paper, we call the candidate list of targets candidate targets.
The important functions in search with news articles are:
Determination of the Importance Degree of Each Term in the Browsed Article: We assume that the searcher decides on a new target by reading terms of the browsed article. In order to present candidate targets which tend to be selected by the searcher, the candidate targets should be presented according to the importance degree of each term in the browsed article.
Extraction of Representative Terms in the Target Articles: The searcher does not know the content of the search result. The representative terms of the target articles should therefore be extracted and presented.

Fig. 2 Overview of our approach

Hereinafter, we explain our approach to determining candidate targets. Figure 2 shows an overview of our approach.
The importance degree of a term in the browsed article differs according to the target. Therefore, the importance degree of each term in the browsed article is determined according to the target articles. In our method, the content of the browsed article is represented as the set of terms in the browsed article. In this paper, we call this set of terms the browsed terms. We assume that a term which is contained in the browsed article but not in the other target articles is more important. In order to calculate the importance degree of each browsed term, we adopt tf-idf.
We assume that a term used in a specific time period is more representative. Therefore, we consider regular interval time periods covering the target articles. We call these interval time periods the target time periods. Our method determines the representative term of each target time period from the browsed terms, according to the number of articles containing each browsed term in the target time period, and presents these representative terms as the candidate targets.
In addition, in order to determine the order of the candidate targets, the importance degree of each representative term is calculated. If the candidate target is a more important term in the browsed article, the target is more important. If the candidate target is contained in more articles in the target time period, the target is more important. Therefore, we calculate the importance degree of a representative term according to its importance degree in the browsed article and the number of articles containing the term in the target time period.
Figure 3 shows the processing flow of our method. First, the target articles for the input target terms are acquired; the browsed terms, their importance degrees, and the candidate targets are determined; and the target articles and the candidate targets are presented. When the searcher selects a new browsed article from the target articles, the browsed terms, their importance degrees, and the candidate targets

Fig. 3 Processing of our method

are updated, and the candidate targets are presented. When the searcher selects a new target from the candidate targets, the target articles, the importance degrees of the browsed terms, and the candidate targets are updated, and the target articles and the candidate targets are presented. For the acquisition of the target articles, a Web search engine or a prepared index of the articles is used.
In Section 4, we explain each process: extraction/update of the browsed terms and their importance degrees, and determination of the representative term in each target time period and the presenting order.

4 Proposed Method
4.1 Extraction/Update of Browsed Terms
The browsed terms BT are represented by the following equation:

BT = { (t_1, v_1), (t_2, v_2), (t_3, v_3), ..., (t_n, v_n) }   (1)

In this equation, n is the number of browsed terms, t_i is the i-th term, and v_i is the importance degree of t_i.
Each time the browsed article is changed, the terms are extracted from the browsed article. When the searcher browses no article, the terms are extracted from all the target articles. General nouns, proper nouns, sahen-setsuzoku (verbal nouns), and keiyoudoushi-gokan (adjectival noun stems) are extracted with MeCab [6], a morphological analysis tool for the Japanese language. Only the title and the body text of the article are used for the extraction; the hyperlink text to related articles is not used.
The importance degree of each browsed term is calculated with the following equation:

v_i = ( Freq(t_i) / Σ_{j=1}^{n} Freq(t_j) ) · log2( N / DocNum(t_i) )   (2)

In this equation, Freq(t_i) is the number of occurrences of t_i in the browsed article, DocNum(t_i) is the number of target articles containing t_i, and N is the number of target articles. These values are used in the next step.
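As a sketch, the computation of Eq. (2) can be written as follows (plain Python; article texts are assumed to be already tokenized into term sets, e.g. by MeCab, and the helper names are ours, not the authors'):

```python
import math
from collections import Counter

def browsed_term_importance(browsed_terms, target_articles):
    """Importance degree v_i of each browsed term, following Eq. (2):
    relative frequency in the browsed article times log2(N / DocNum(t_i))."""
    freq = Counter(browsed_terms)        # Freq(t_i) in the browsed article
    total = sum(freq.values())           # sum over j of Freq(t_j)
    n = len(target_articles)             # N, the number of target articles
    importance = {}
    for term, f in freq.items():
        doc_num = sum(1 for article in target_articles if term in article)
        if doc_num == 0:
            continue                     # term absent from the target articles
        importance[term] = (f / total) * math.log2(n / doc_num)
    return importance

# Toy example: three target articles represented as sets of terms.
articles = [{"genkai", "restart", "operation"},
            {"genkai", "inspection"},
            {"saga", "report"}]
v = browsed_term_importance(["genkai", "genkai", "restart"], articles)
```

Here "restart" scores higher than "genkai": it is rarer among the target articles, so its idf factor outweighs the lower term frequency.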

4.2 Determination of Representative Term in Each Time Period and Presenting Order
The i-th target time period is denoted as TP_i. The series of target time periods TPs is represented by the following equation:

TPs = { (TA_1, rt_1, o_1), (TA_2, rt_2, o_2), ..., (TA_m, rt_m, o_m) }   (3)

In this equation, m is the number of target time periods, TA_i is the set of target articles in TP_i, rt_i is the representative term of TP_i, and o_i is the importance degree of rt_i.
The representative term of TP_i is selected from the browsed terms. In order to select rt_i, the importance degree Value_{i,j} of the browsed term t_j in TP_i is calculated with the following equation:

Value_{i,j} = ( DocNum(t_j, TA_i) / |TA_i| ) · log2( m / TPNum(t_j) )   (4)

In this equation, DocNum(t_j, TA_i) is the number of articles in TA_i containing t_j, and TPNum(t_j) is the number of target time periods that include an article containing t_j. Value_{i,j} becomes high when t_j is contained in many articles in TA_i, and also when t_j occurs in few target time periods.
The representative term of TP_i is the browsed term whose importance degree in TP_i is the highest. After the importance degrees of all the browsed terms are calculated, the representative term is decided.
When rt_j is t_i, the importance degree of rt_j is calculated by the following equation:

o_j = v_i · DocNum(rt_j, TA_j)   (5)

In this equation, o_j becomes high when the importance degree of rt_j in the browsed article is higher, and also when rt_j is contained in many articles in TA_j.
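The selection in Eqs. (4) and (5) can be sketched as follows (plain Python; articles are assumed to be given as sets of terms already grouped into target time periods, `browsed_v` holds the Eq. (2) importance degrees, and all identifiers are ours):

```python
import math

def representative_terms(periods, browsed_v):
    """For each target time period TP_i, pick the browsed term that maximizes
    Value_{i,j} (Eq. (4)), then order the picks by o_j (Eq. (5))."""
    m = len(periods)
    # TPNum(t): number of target time periods containing an article with t
    tp_num = {t: sum(1 for ta in periods if any(t in a for a in ta))
              for t in browsed_v}
    ranked = []
    for ta in periods:
        best_term, best_value = None, -1.0
        for t in browsed_v:
            doc_num = sum(1 for a in ta if t in a)   # DocNum(t, TA_i)
            if doc_num == 0:
                continue
            value = (doc_num / len(ta)) * math.log2(m / tp_num[t])
            if value > best_value:
                best_term, best_value = t, value
        if best_term is not None:
            # Eq. (5): importance in the browsed article times DocNum(rt, TA)
            o = browsed_v[best_term] * sum(1 for a in ta if best_term in a)
            ranked.append((best_term, o))
    ranked.sort(key=lambda pair: pair[1], reverse=True)   # presenting order
    return ranked

# Toy example: two one-day periods, each a list of articles (sets of terms).
periods = [[{"genkai", "restart"}, {"genkai"}], [{"saga", "report"}]]
ranked = representative_terms(periods, {"genkai": 0.4, "restart": 0.5, "saga": 0.1})
```

In this toy run "genkai" represents the first period (it appears in both of its articles) and ranks first overall because its o value combines its browsed-article importance with two supporting articles.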

5 Prototype System
We developed a prototype system based on the method explained in Section 4. The system consists of three windows: the Target Article Window, the Candidate Target Window, and the Browsed Article Window. Initially, the Target Article Window is opened.
The Target Article Window presents the target articles. Figure 4 shows the Target Article Window. When the searcher inputs the target terms and clicks the "Search" button, the system opens the Candidate Target Window and outputs

Fig. 4 Display of Target Article Window

Fig. 5 Display example of Candidate Target Window

the target articles and the candidate targets. The candidate targets are output in the listbox of the Candidate Target Window. The target articles are output in the two listboxes of the Target Article Window. In the upper listbox, the target time periods are shown in reverse chronological order, and the most recent target time period is selected initially. In the lower listbox, the target articles published in the selected target time period are shown in reverse chronological order. Each time the searcher selects another target time period in the upper listbox, the lower listbox is updated. When the searcher selects the title of an article in the lower listbox, the Browsed Article Window is opened and the article is shown in it.
The Candidate Target Window presents the candidate targets. Figure 5 shows the Candidate Target Window. When the searcher selects a candidate target from the listbox and clicks the "Select" button, the candidate target is added to the target terms, and the target articles and the candidate targets are updated. In this case, in the upper listbox, the target time period whose

representative term was the selected candidate target is selected initially. The first element of the listbox is "(delete)." When the searcher selects "(delete)," the most recently added candidate target is deleted from the target terms.
The Browsed Article Window presents the browsed article. The searcher cannot click any hyperlink in the presented article.
In addition to constructing the prototype system, we collected news articles periodically, stored them on a local hard disk, and constructed an inverted index of every noun contained in them. The inverted index is used to acquire the target articles from the local hard disk.
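A minimal sketch of such an inverted index (plain Python; the identifiers are ours, and the noun extraction itself is assumed to have been done beforehand by a morphological analyzer):

```python
from collections import defaultdict

def build_inverted_index(articles):
    """Map each term to the set of article ids that contain it."""
    index = defaultdict(set)
    for art_id, terms in articles.items():
        for term in terms:
            index[term].add(art_id)
    return index

def acquire_target_articles(index, target_terms):
    """Acquire the ids of articles containing ALL target terms."""
    sets = [index.get(t, set()) for t in target_terms]
    return set.intersection(*sets) if sets else set()

# Toy example: article id -> extracted nouns.
arts = {1: {"genkai", "restart"}, 2: {"genkai", "inspection"}, 3: {"saga"}}
idx = build_inverted_index(arts)
hits = acquire_target_articles(idx, ["genkai"])  # → {1, 2}
```

Intersecting the posting sets mirrors what happens when a candidate target is added to the target terms: the article set can only narrow.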

6 Experiment
In this section, we explain our experiment, in which 13,231 economic news articles acquired from Yahoo! News are used. They contain 41,263 kinds of terms. Their publication dates range from Sep. 21, 2011 to Nov. 6, 2011. The interval of the target time period is one day: articles published on the same day belong to the same target time period.
Table 1 shows the candidate targets when the searcher inputs "nuclear power station" as the target term and browses the article of Nov. 4 whose title is "The No. 4 reactor of Genkai nuclear power plant recovers to normal operation." Important terms representing the content of the browsed article, such as operation, restart, Kyushu Electric Power Co., Inc., Genkai, and recovery, rank high on the list of candidate targets.
Some listed terms, such as regular, are difficult to understand as a single term. For example, in the browsed article, regular is used with examination. It is difficult for us to guess the sense of regular examination from

Table 1 Candidate targets when the target term is "nuclear power station" and the browsed article is "The No. 4 reactor of Genkai nuclear power plant recovers to normal operation"

Rank Candidate Target
1 operation
2 restart
3 Kyushu Electric Power Co., Inc.
4 regular
5 Genkai
6 regular
7 regular
8 recovery
9 restart
10 report
11 Saga
12 report
13-45 consultation, group, citizen, etc.

Fig. 6 Target Article Window immediately after the searcher selects Genkai

only regular. To clarify the sense of such a candidate target, we will consider adding another term to it.
In addition, some terms are listed twice or more, such as restart and regular. This is because the representative terms of two or more target time periods can be the same. For easier selection of the candidate target, we have to reconsider how to present such duplicated candidate targets.
Figure 6 shows the Target Article Window immediately after the searcher selects Genkai from Table 1. In the upper listbox, the target time period "Nov. 1, 2011" is selected initially. Nov. 1, 2011 is the date on which the restart of the No. 4 reactor of the Genkai nuclear power plant was announced. Therefore, it is adequate that the representative term of "Nov. 1, 2011" is Genkai, and this is helpful for selecting an article suitable for the target. To make the selection of a suitable article even easier, we will consider presenting other important target time periods as well.

7 Conclusion
We proposed a method for presenting effective search targets. Our method detects the representative terms of the current search target according to the currently browsed article. We constructed a prototype system based on the method, and presented a case example of its behavior in the experiment. In order to confirm the precision of the method, we plan to conduct an experiment that compares the output with a correct result set.
As future work, we should consider how to display the candidate targets and the target time periods effectively. For example, we will consider how to present the same candidate target appearing in two or more target time periods, how to present candidate targets consisting of two or more terms, and how to present the target time periods based on their importance degrees.

Acknowledgements. We are thankful to Mr. Y. Tabuchi for his collection of news articles
from Yahoo! Japan News, which we used in our experiment.

References
1. Thüring, M., Haake, J.M., Hannemann, J.: Hypermedia and Cognition: designing for comprehension. Communications of the ACM 38(8), 57–66 (1995)
2. Sumathi, C.P., Valli, R.P., Santhanam, T.: Automatic Recommendation of Web Pages in
Web Usage Mining. Int’l Journal on Computer Science and Engineering 2(9), 3046–3052
(2010)
3. He, Q., Jiang, D., Liao, Z., Hoi, S.C., Chang, K., Lim, E., Li, H.: Web Query Recommen-
dation via Sequential Query Prediction. In: Proc. of the 25th Int’l Conf. on Data Engineer-
ing, pp. 1443–1454 (2009)
4. Kaski, S.: Dimensionality reduction by random mapping: Fast similarity computation for
clustering. In: Proc. of Int’l Joint Conf. on Neural Networks, vol. 1, pp. 413–418 (1998)
5. Hu, J., Fang, L., Cao, Y., Zeng, H.J., Li, H., Yang, Q., Chen, Z.: Enhancing Text
Clustering by Leveraging Wikipedia Semantics. In: Proc. of the 31st Annual Int’l
ACM SIGIR Conf. on Research and Development in Information Retrieval (2008), doi:
10.1145/1390334.1390367
6. Kudo, T., Yamamoto, K., Matsumoto, Y.: Applying Conditional Random Fields to
Japanese Morphological Analysis. In: Proc. of the 2004 Conf. on Empirical Methods in
Natural Language Processing, EMNLP 2004, pp. 230–237 (2004)
Path Planning in Probabilistic Environment
by Bacterial Memetic Algorithm

János Botzheim, Yuichiro Toda, and Naoyuki Kubota

Abstract. The goal of the path planning problem is to determine an optimal collision-free path between a start and a target point for a mobile robot in an environment surrounded by obstacles. In a probabilistic environment, not only do static obstacles obstruct the free passage of the robot, but obstacles may also appear with certain probabilities. The problem is approached by the bacterial memetic algorithm. The objective is to minimize the path length and the number of turns without colliding with an obstacle. Our method is able to generate a collision-free path in a probabilistic environment. The proposed algorithm is tested by simulations.

1 Introduction
The emerging synthesis of information technology, network technology, and robot
technology is one of the most promising approaches to realize a safe, secure, and
comfortable society for the next generation [4]. Intelligent technology plays a key
role in this process. Information technology and intelligent technology have been
discussed from various points of view. Information resources and the accessibility within an environment are essential for both people and robots. Therefore, the
environment surrounding people and robots should have a structured platform for
gathering, storing, transforming, and providing information. Such an environment is
called informationally structured space [12, 13] (Fig. 1). The intelligent technology
János Botzheim
Department of Automation, Széchenyi István University, 1 Egyetem tér, Győr, 9026,
Hungary, Graduate School of System Design, Tokyo Metropolitan University,
6-6 Asahigaoka, Hino, Tokyo, 191-0065 Japan
e-mail: botzheim@sze.hu
Yuichiro Toda · Naoyuki Kubota
Graduate School of System Design, Tokyo Metropolitan University, 6-6 Asahigaoka,
Hino, Tokyo, 191-0065 Japan
e-mail: toda-yuuichirou@sd.tmu.ac.jp,kubota@tmu.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 439–448.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012

for the design and usage of the informationally structured space should be discussed
from various points of view such as information gathering of real environment and
cyber space, structuralization, visualization and display of the gathered information.
The structuralization of informationally structured space realizes the quick update
and access of valuable and useful information for people. It is very useful for both
robots and people to easily access the information on real environments. The information is transformed into a useful form suitable to the features of robot partners
and people. Furthermore, if the robot can share the environmental information with
people, the communication with people might become very smooth and natural.

Fig. 1 Informationally Structured Space

Path planning for mobile robots is an essential task in informationally structured


spaces. The goal is to find an optimal, collision-free path between two points in the
environment composed of walls and obstacles. The optimality of the path is usually
measured by the distance covered by the robot. For its practical importance many
approaches have been suggested in the literature, based mainly on evolutionary computation [8, 10, 11, 18, 19, 21]. In our previous paper, we solved the path planning problem by the bacterial memetic algorithm for a static environment [2]. However,
real life situations are uncertain in most cases. In an informationally structured space
depicted in Figure 1, there can be some static obstacles like a bookshelf and a table,

but there can also be some movable objects, like objects A and B in the figure. For example, there can be a chair which is used in two different positions during the day: in one position for 80% of the day (i.e., with 80% probability), and in another position, perhaps outside the room, for 20% of the day. This probabilistic extension of our previous approach is investigated in this paper.

2 Bacterial Memetic Algorithm


Nature inspired evolutionary optimization algorithms are often suitable for global
optimization of even non-linear, high-dimensional, multi-modal, multi-objective,
and discontinuous problems. Bacterial Evolutionary Algorithm (BEA) [16] is one of
these techniques. BEA uses two operators: bacterial mutation and gene transfer. These operators are based on the microbial evolution phenomenon. The bacterial mutation operation optimizes the chromosome of one bacterium, while the gene transfer operation allows the transfer of information between the bacteria in the population. BEA has been applied to a wide range of problems, for instance, optimization of fuzzy rule bases [16], fuzzy rule selection [5], and combinatorial optimization problems [14].
Evolutionary algorithms are global searchers; however, in most cases they give only a quasi-optimal solution to the problem, because their convergence speed is low. Local search approaches can give a more accurate solution, but they search only in a neighborhood of the search space. Local search approaches might be useful in improving the performance of the basic evolutionary algorithm, which may then find the global optimum with sufficient precision. Combinations of evolutionary and local search methods are usually referred to as memetic algorithms [15]. Memetic algorithms have been applied to a wide variety of problems such as NP-hard combinatorial problems [6, 20], dynamic optimization problems [3], and scheduling problems [9].
A new kind of memetic algorithm based on the bacterial approach is the bacterial memetic algorithm (BMA) proposed in [1]. The bacterial memetic algorithm has been applied to fuzzy rule base optimization [1], to the traveling salesman problem and its modifications [7], and to the path planning problem in a static environment [2]. Bacterial memetic algorithms can also deal with individuals of different lengths. The algorithm consists of four steps. First, a random initial population with N_ind individuals is created. Then bacterial mutation, a local search, and gene transfer are applied until a stopping criterion (number of generations, N_gen) is fulfilled.
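The four steps can be sketched as the following skeleton (Python; the operator arguments are problem-specific placeholders supplied by the caller, not the authors' actual operators):

```python
import random

def bacterial_memetic_algorithm(create_individual, evaluate,
                                bacterial_mutation, local_search, gene_transfer,
                                n_ind=20, n_gen=50):
    """Generic BMA loop: create an initial population, then repeat
    bacterial mutation, local search, and gene transfer for N_gen generations."""
    population = [create_individual() for _ in range(n_ind)]
    for _ in range(n_gen):
        population = [bacterial_mutation(b, evaluate) for b in population]
        population = [local_search(b, evaluate) for b in population]
        population = gene_transfer(population, evaluate)
    return min(population, key=evaluate)  # best (lowest-cost) bacterium

# Toy usage: minimize f(x) = x^2 over scalar "bacteria". The three operator
# lambdas below are crude placeholders, not the real BMA operators.
f = lambda x: x * x
best = bacterial_memetic_algorithm(
    create_individual=lambda: random.uniform(-10.0, 10.0),
    evaluate=f,
    bacterial_mutation=lambda b, ev: min(b, b + random.uniform(-1.0, 1.0), key=ev),
    local_search=lambda b, ev: min(b, b * 0.9, key=ev),
    gene_transfer=lambda pop, ev: sorted(pop, key=ev)[:len(pop) // 2] * 2,
)
```

Even with these crude operators the loop converges toward the optimum, which illustrates why combining a global searcher with local refinement is attractive.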

2.1 Encoding Method and Evaluation

We use cell decomposition, as illustrated in Fig. 1 by the blue cells. The input of the algorithm is a map with a cell structure, created by the SLAM method [17]. Each cell contains the probability of having an obstacle. The size of the map (number of rows and number of columns) is also given, as well as the positions of the start and target points. The encoding of the individuals is similar to that in [21].
442 J. Botzheim, Y. Toda, and N. Kubota

The chromosome consists of intermediate points between the start and target points.
Each intermediate point (cell) is identified by a single integer that packs the
point's two coordinates together, as shown in Figure 2. The encoding of the
individual (bacterium) depicted in Figure 2 is [10|36|14]. The start and target
points are not represented in the bacterium.
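For illustration, one plausible realization of this packing is a row-major scheme; the exact scheme is our assumption, since the paper only states that the two coordinates are combined into one integer:

```python
def encode_cell(row, col, n_cols):
    """Pack a cell's (row, col) coordinates into one integer, row-major."""
    return row * n_cols + col

def decode_cell(index, n_cols):
    """Recover (row, col) from the packed integer."""
    return divmod(index, n_cols)
```

On a map with 20 columns, for example, cell (1, 16) packs to 36 and unpacks back to (1, 16), so the round trip is lossless for any map width.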
The individuals are evaluated based on new intermediate points which cover the
way from the start point to the target point using only neighboring points with
4 possible movement directions. This is done by drawing "straight" lines between
the points encoded in the individual. The way a straight line is represented by
only neighboring points with 4 possible directions is illustrated in Figure 3.
After this extension of the individual, the path contains only neighboring cells.
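One way to carry out this extension is the following 4-connected rasterization sketch; the paper does not specify the exact stepping rule, so the tie-breaking below is our own choice:

```python
def four_connected_line(p, q):
    """Cells on a 4-connected approximation of the straight segment p -> q.
    At each step, move one cell along the axis whose relative progress lags
    behind, so the path stays close to the true line."""
    (r, c), (r2, c2) = p, q
    cells = [(r, c)]
    dr, dc = r2 - r, c2 - c
    sr = 1 if dr > 0 else -1
    sc = 1 if dc > 0 else -1
    n_r, n_c = abs(dr), abs(dc)
    i = j = 0
    while i < n_r or j < n_c:
        # compare progress fractions (i+1/2)/n_r vs (j+1/2)/n_c cross-multiplied
        if j >= n_c or (i < n_r and (2 * i + 1) * n_c < (2 * j + 1) * n_r):
            i += 1
            r += sr
        else:
            j += 1
            c += sc
        cells.append((r, c))
    return cells
```

The resulting sequence always has |dr| + |dc| + 1 cells, and consecutive cells differ by exactly one orthogonal step.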
The individuals are evaluated as:

    Eval_i = L_i + Penalty · ∑_{j=1}^{L_i} CP_j + S · Turns_i,    (1)

where Eval_i is the i-th individual's evaluation value, L_i is the number of
neighboring cells in the extended individual, and CP_j is the collision
probability of the j-th cell in the extended individual's sequence. Each cell in
the extended individual has a collision probability, and these probabilities are
summed over the whole cell sequence. The collision probability of a cell is 0 if
the cell is free, 1 if the cell is occupied by a static obstacle, and between 0
and 1 in the case of probabilistic obstacles. Penalty is a parameter reflecting
the strength of punishment for crossing occupied cells. Turns_i is the number of
the robot's turns, and parameter S reflects the smoothness of the path. Paths
with fewer turns are preferred.
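Equation (1) translates directly into code. In this sketch the collision probabilities are looked up per cell, and a turn is counted whenever the movement direction changes between consecutive steps; the default Penalty and S values are those later listed in Table 1:

```python
def evaluate_path(extended_path, collision_prob, penalty=1000.0, s=0.1):
    """Eval_i = L_i + Penalty * sum_j CP_j + S * Turns_i  (Equation (1)).
    extended_path: list of (row, col) cells, neighbors only.
    collision_prob: dict mapping cell -> obstacle probability (0 if absent)."""
    length = len(extended_path)
    collision = sum(collision_prob.get(cell, 0.0) for cell in extended_path)
    # a turn occurs whenever the movement direction changes between steps
    dirs = [(b[0] - a[0], b[1] - a[1])
            for a, b in zip(extended_path, extended_path[1:])]
    turns = sum(1 for d1, d2 in zip(dirs, dirs[1:]) if d1 != d2)
    return length + penalty * collision + s * turns
```

With Penalty at 1000, a single cell with even a small obstacle probability dominates the length and smoothness terms, which is exactly the punishment behavior described above.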
An advantage of the approach is that it can handle individuals of different
lengths, i.e., individuals with different numbers of intermediate points. This
property is observable in the evaluation function: if the length of a bacterium
changes, e.g., a new intermediate point is added to the bacterium or an
intermediate point is removed from it, then the extended individual (described
by only neighboring cells, as seen in Figure 3) changes as well, so its L and
Turns values will differ from before and the sum of the collision probabilities
can differ too. In [19] a memetic algorithm is proposed for solving the path
planning problem based on the classical operators, mutation and crossover. In
their approach the length of the individual and the number of intermediate
points are predetermined, not found automatically by the algorithm.
During initial population creation, bacteria of different lengths can be created.
The length of a bacterium has to be between 1 and the maximum allowed bacterium
length, which is a parameter of the algorithm.
Another advantage of the approach is that it can handle probabilistic
environments, as presented in this paper. In [2] we dealt with deterministic
environments. The classical techniques for path planning based on graph theory
and similar approaches rely on the exact estimation of the cost. However, in a
probabilistic environment this cannot be achieved.

Fig. 2 Cell structure
Fig. 3 Representation of a line

2.2 Bacterial Mutation


To find a global optimum, it is necessary to explore new regions of the search space
not yet covered by the current population. This is achieved by adding new, randomly
generated information to the bacteria using bacterial mutation.
Bacterial mutation is applied to the bacteria one by one. First, Nclones copies
(clones) of the bacterium are created. Then a random segment of length lbm is
mutated in each clone, except for one clone, which is left unmutated. After
mutating the same segment in the clones, each clone is evaluated using
Equation (1). The clone with the best evaluation result transfers the mutated
segment to the other clones. These three operations (mutation of the clones,
selection of the best clone, transfer of the mutated segment) are repeated until
each segment of the bacterium has been mutated once. The mutation may change
not only the content but also the length [2, 5, 7]. The length of the new
segment is chosen randomly as l′bm ∈ {lbm − l′, . . . , lbm + l′}, where l′ ≥ 0
is a parameter specifying the maximal change in length. When changing a segment
of a bacterium, we must take care that the new segment is unique within the
selected bacterium. At the end, the best bacterium is kept and the clones are
discarded.
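The procedure can be sketched as follows; for simplicity this version keeps the segment length fixed (the paper also varies it by up to l′), and random_segment is an assumed helper supplying new random genes:

```python
import random

def bacterial_mutation(bacterium, evaluate, n_clones, l_bm, random_segment):
    """Bacterial mutation sketch: for each segment, mutate it in all clones
    but one, keep the best clone's segment, and discard the clones."""
    bact = list(bacterium)
    segments = [(i, min(i + l_bm, len(bact))) for i in range(0, len(bact), l_bm)]
    random.shuffle(segments)               # each segment is mutated exactly once
    for start, end in segments:
        clones = [list(bact) for _ in range(n_clones)]
        for clone in clones[1:]:           # one clone stays unmutated
            clone[start:end] = random_segment(end - start)
        best = min(clones, key=evaluate)   # select the best clone (lowest cost)
        bact[start:end] = best[start:end]  # transfer its (possibly mutated) segment
    return bact                            # clones are discarded
```

Keeping one unmutated clone guarantees the bacterium never gets worse during this step, since the original segment always competes against the mutated ones.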

2.3 Local Search


Local search is performed between the bacterial mutation and the gene transfer
operators. The local search has a probability, a parameter reflecting the chance
for a bacterium to undergo local search in a given generation. Too frequent local
search can drive the population into a local optimum, but if its probability is
too small it will not sufficiently accelerate the evolutionary process. Four
kinds of local search operator are proposed; the one performed on a given
bacterium is chosen randomly with equal probability [2].

The insertion operator inserts a new point between the two intermediate points
where the insertion has the biggest benefit. The operation is performed only if
the bacterium is better after the insertion than before.
The deletion operator deletes the point from the individual whose removal has
the biggest benefit. If no such point is found, i.e., the bacterium would be
worse after every removal, the operation is not performed.
The swap operator tries to swap each pair of consecutive points in the
individual's chromosome. The operation is performed if the bacterium is better
after the swap.
The local improvement operator tries to improve each point in the chromosome
within a given radius, which is a parameter of the algorithm. The local
environment of each point is investigated, and the best new position for the
point is selected, or the old position is retained in the path if there is no
better position.
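As an example of the try-and-keep-if-better pattern shared by these operators, the deletion operator can be sketched as:

```python
def deletion_operator(bacterium, evaluate):
    """Deletion local search: remove the intermediate point whose removal
    improves the evaluation most; leave the bacterium unchanged if no
    removal helps (lower evaluation value = better)."""
    best, best_val = None, evaluate(bacterium)
    for i in range(len(bacterium)):
        candidate = bacterium[:i] + bacterium[i + 1:]
        val = evaluate(candidate)
        if val < best_val:
            best, best_val = candidate, val
    return best if best is not None else bacterium
```

The insertion, swap, and local improvement operators follow the same structure: enumerate the candidate modifications, evaluate each one, and apply only the single best modification if it improves the bacterium.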

2.4 Gene Transfer


The bacterial mutation operator optimizes the bacteria in the population
individually. To ensure information flow in the population, gene transfer is
applied.
First, the population is sorted and divided into two halves according to the
evaluation results. The bacteria with lower cost are called the superior half;
the bacteria with higher cost are referred to as the inferior half. Then one
bacterium is randomly chosen from the superior half and another from the
inferior half. These two bacteria are called the source bacterium and the
destination bacterium, respectively. A segment of length lgt from the source
bacterium is randomly chosen, and this segment is used to overwrite a random
segment of the destination bacterium, if the source segment is not already in
the destination bacterium. These two segments may vary in size up to a given
length [2, 5, 7]. This ensures, together with the variable length in the
bacterial mutation step, that the bacteria are automatically adjusted to the
optimal length. The steps above (sorting the population, selection of the source
and destination bacteria, transfer of the segment) are repeated Ninf times,
where Ninf is the number of "infections" per generation.
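The gene transfer step can be sketched as follows; segment lengths are kept fixed here for simplicity, while the paper allows them to vary up to a given length:

```python
import random

def gene_transfer(population, evaluate, n_inf, l_gt):
    """Gene transfer sketch: n_inf 'infections', each copying a random
    segment from a superior-half bacterium over a random segment of an
    inferior-half bacterium (lower evaluation value = better)."""
    pop = [list(b) for b in population]
    for _ in range(n_inf):
        pop.sort(key=evaluate)                    # lowest cost first
        half = len(pop) // 2
        src = random.choice(pop[:half])           # superior half
        dst = random.choice(pop[half:])           # inferior half
        if len(src) < l_gt or len(dst) < l_gt:
            continue
        i = random.randrange(len(src) - l_gt + 1)
        segment = src[i:i + l_gt]
        # skip if the segment already occurs in the destination bacterium
        if any(dst[k:k + l_gt] == segment
               for k in range(len(dst) - l_gt + 1)):
            continue
        j = random.randrange(len(dst) - l_gt + 1)
        dst[j:j + l_gt] = segment                 # overwrite destination segment
    return pop
```

Because the population is re-sorted before every infection, a destination bacterium that improves enough can become a source in later infections within the same generation.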

3 Simulation Results
In the simulation tests, two different environments were used, as illustrated in
Figure 4. The map size is 20 × 20 in the smaller problem and 30 × 30 in the
second problem. In Figure 4 the blue cell represents the start position and the
red cell the goal position. The white cells represent free positions, where the
obstacle probability is 0. The dark cells represent static obstacles with
collision probability 1. The gray cells illustrate obstacles that appear
probabilistically; their probabilities are larger than 0 and less than 1. The
probabilities in the first task are P1 = 0.4, P2 = 0.8, P3 = 0.2, and P4 = 0.6.
The probabilities in the second task are P1 = 0.8, P2 = 0.3, and P3 = 0.7.

Fig. 4 Simulation environments

BMA has many parameters, which allows the behavior of the algorithm to be tuned
finely. However, it is not always easy to find the most appropriate parameter
setting for a given problem. After some preliminary tests, the parameter settings
presented in Table 1 appeared to be the most suitable for the two tasks.
With the parameter settings presented in Table 1, we obtained the optimal result
seven times out of ten simulations for the smaller problem and three times out
of ten simulations for Task 2. Optimality means that there was no collision and
the path length and the number of turns were both optimal. In the

Table 1 Parameter settings

Operation Parameter Task 1 Task 2


Number of generations 30 200
Number of bacteria 30 200
Max. bacterium length 10 20
Penalty 1000 1000
S 0.1 0.1
Bacterial mutation Number of clones 5 25
Segment length 2 2
Segment length change 1 1
Gene transfer Number of infections 5 25
Segment length 2 2
Segment length change 1 1
Local search Probability 20% 20%
Radius 2 3

Fig. 5 Results for Task 1

Fig. 6 Results for Task 2

remaining cases the obtained paths were also collision-free; only the path
length and the number of turns were not optimal. In the first task, the path
length was optimal in all simulations; only the number of turns was not. This
situation is illustrated in Fig. 5 for Task 1 and in Fig. 6 for Task 2, where
Fig. 5a and Fig. 6a depict the solution that is optimal in all respects, while
Fig. 5b and Fig. 6b show a solution where the path length is optimal and there
is no collision with any obstacle, but the number of turns is not optimal.

4 Conclusions and Future Works


In this paper the path planning problem in a probabilistic environment was solved
by the bacterial memetic algorithm. Classical approaches to path planning rely on
the exact estimation of the cost, which cannot be performed in probabilistic
cases. Evolutionary techniques can be useful in these cases; however, their
convergence speed to the optimum is rather slow. Memetic algorithms can
accelerate the evolutionary process by local search. The bacterial memetic
algorithm effectively combines the bacterial operators with local search
heuristics and can speed up the evolutionary process in this way. This property
was investigated in [2], where BMA provided better results than BEA and a
genetic algorithm.
The algorithm can handle different individual lengths, which is useful for the
encoding applied in this paper.
Our future plan is to extend the algorithm to handle multiple robots
simultaneously. Another goal is a deeper analysis of the effect of the penalty
factor in order to provide practical solutions in complex probabilistic
environments.

Acknowledgements. The research was supported by Universitas-Győr Foundation in the


framework of TÁMOP 4.1.1/A-10/1/KONV-2010-0005 Programme.

References
[1] Botzheim, J., Cabrita, C., Kóczy, L.T., Ruano, A.E.: Fuzzy rule extraction by bacterial
memetic algorithms. In: Proceedings of the 11th World Congress of International Fuzzy
Systems Association, Beijing, China, pp. 1563–1568 (2005)
[2] Botzheim, J., Toda, Y., Kubota, N.: Bacterial memetic algorithm for offline path plan-
ning of mobile robots. Memetic Computing (2012), doi: 10.1007/s12293-012-0076-0
[3] Caponio, A., Cascella, G.L., Neri, F., Salvatore, N., Sumner, M.: A fast adaptive
memetic algorithm for online and offline control design of PMSM drives. IEEE Trans-
actions on Systems, Man, and Cybernetics, Part B: Cybernetics 37, 28–41 (2007)
[4] Cordeiro, C.M., Agrawal, D.P.: Ad Hoc & Sensor Networks – Theory and Applications.
World Scientific Publishing (2006)
[5] Drobics, M., Botzheim, J.: Optimization of fuzzy rule sets using a bacterial evolutionary
algorithm. Mathware and Soft Computing 15(1), 21–40 (2008)
[6] Fischer, T., Bauer, K., Merz, P.: Solving the routing and wavelength assignment problem
with a multilevel distributed memetic algorithm. Memetic Computing 1(2), 101–123
(2009)
[7] Földesi, P., Botzheim, J., Kóczy, L.T.: Eugenic bacterial memetic algorithm for fuzzy
road transport traveling salesman problem. International Journal of Innovative Comput-
ing, Information and Control 7(5(B)), 2775–2798 (2011)
[8] Geisler, T., Manikas, T.: Autonomous robot navigation system using a novel value en-
coded genetic algorithm. In: Proceedings of the IEEE Midwest Symposium on Circuits
and Systems, pp. 45–48 (2002)
[9] Hasan, S.M.K., Sarker, R., Essam, D., Cornforth, D.: Memetic algorithms for solving
job-shop scheduling problems. Memetic Computing 1(1), 69–83 (2009)
[10] Hermanu, A.: Genetic algorithm with modified novel value encoding technique for au-
tonomous robot navigation. Master’s thesis, The University of Tulsa, Tulsa, OK, USA
(2002)
[11] Hosseinzadeh, A., Izadkhah, H.: Evolutionary approach for mobile robot path planning
in complex environment. International Journal of Computer Science Issues 7(4), 1–9
(2010)

[12] Kubota, N., Yorita, A.: Topological environment reconstruction in informationally


structured space for pocket robot partners. In: Proceedings of the 2009 IEEE Interna-
tional Symposium on Computational Intelligence in Robotics and Automation, CIRA
2009, pp. 165–170 (2009)
[13] Kubota, N., Sotobayashi, H., Obo, T.: Human interaction and behavior understanding
based on sensor network with iPhone for rehabilitation. In: Proceedings of the Interna-
tional Workshop on Advanced Computational Intelligence and Intelligent Informatics
(2009)
[14] Luh, G.C., Lee, S.W.: A bacterial evolutionary algorithm for the job shop scheduling
problem. Journal of the Chinese Institute of Industrial Engineers 23(3), 185–191 (2006)
[15] Moscato, P.: On evolution, search, optimization, genetic algorithms and martial arts:
Towards memetic algorithms. Tech. Rep. Caltech, Pasadena, USA (1989)
[16] Nawa, N.E., Furuhashi, T.: Fuzzy system parameters discovery by bacterial evolution-
ary algorithm. IEEE Transactions on Fuzzy Systems 7(5), 608–616 (1999)
[17] Sasaki, H., Kubota, N., Taniguchi, K.: Evolutionary Computation for Simultaneous Lo-
calization and Mapping Based on Topological Map of a Mobile Robot. In: Xiong, C.-H.,
Liu, H., Huang, Y., Xiong, Y.L. (eds.) ICIRA 2008, Part I. LNCS (LNAI), vol. 5314,
pp. 883–891. Springer, Heidelberg (2008)
[18] Sedighi, K.H., Ashenayi, K., Manikas, T.W., Wainwright, R.L., Tai, H.M.: Autonomous
local path planning for a mobile robot using a genetic algorithm. In: Proceedings of the
2004 IEEE Congress on Evolutionary Computation, CEC 2004, pp. 1338–1345 (2004)
[19] Shahidi, N., Esmaeilzadeh, H., Abdollahi, M., Lucas, C.: Memetic algorithm based path
planning for a mobile robot. In: Proceedings of the International Conference on Com-
putational Intelligence, pp. 56–59 (2004)
[20] Tang, J., Lim, M.H., Ong, Y.S.: Diversity-adaptive parallel memetic algorithm for solv-
ing large scale combinatorial optimization problems. Soft Computing Journal 11(1),
873–888 (2007)
[21] Yang, S.X., Hu, Y.: Robot path planning in unstructured environments using a
knowledge-based genetic algorithm. In: Proceedings of the 16th IFAC World Congress
(2005)
Personalization of News Speech Delivery Service
Based on Transformation from Written
Language to Spoken Language

Shigeki Matsubara and Yukiko Hayashi

Abstract. This paper proposes a method for automatically transforming newspaper
articles in order to generate spontaneous news speech in a news speech delivery
service. Among the several differences between written and spoken Japanese, this
study pays particular attention to differences in sentence style, and works on
the taigen-dome sentence, a Japanese-specific rhetorical style in which the
words following a certain noun are cut off. Sentence-style transformation from
written language to spoken language is a kind of personalization of a
text-to-speech application. We take an example-based approach to complementing
taigen-dome sentences. An experiment using Japanese newspaper articles showed
acceptable performance in selecting the complementary phrase for taigen-dome
sentences by using the type of the last noun of the sentence, the tense, and
so on.

1 Introduction
By the spread of the data release on the Web such a podcast, speech contents can be
easily heard during driving a car and walking in a town. The news speech is one of
such the contents, and it is utilized as a means to obtain information efficiently. Cur-
rently, the news speech is created by manually reading out the articles and recording
it. In order to increase the variety and scale of news speech from now on, it is desired
to automatically create news speech data.
Generally, speech reading out a text can be created simply by using speech
synthesis software. In recent years, progress in speech synthesis technology has
been remarkable, and acoustically natural speech can be generated automatically.
However, there exist differences in vocabulary and expression between written
language and
Shigeki Matsubara · Yukiko Hayashi
Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku,
Nagoya, 464-8603, Japan
e-mail: matubara@nagoya-u.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 449–457.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012

spoken language, and therefore converting a text into speech as it is would
produce linguistically unnatural speech. Since the naturalness of speech affects
how easy it is to listen to, it is also important to generate linguistically
natural speech.
This paper describes conversion from written language to spoken language, aiming
at automatically generating spontaneous news speech from newspaper articles. In
this research, we noted that taigen-dome1 occurs frequently in newspaper
articles and that the speech generated by reading out sentences including
taigen-dome is unnatural.
In this paper, a technique for complementing the omitted expression of
taigen-dome sentences is presented. The technique was implemented by a
statistical approach using linguistic information in taigen-dome sentences,
including the type of the noun at the sentence end, the dependency relation,
tense, etc. A complement experiment using newspaper articles achieved an
accuracy rate of 79.9%, confirming the effectiveness of the technique.
This paper is organized as follows: Section 2 describes a speech news deliv-
ery system. Section 3 gives comparisons between written language and spoken lan-
guage. Section 4 explains a method of predicate paraphrasing of Japanese newspa-
per articles. Section 5 reports experimental results.

2 On-Demand News Speech Delivery System


We developed an on-demand news speech delivery system which enables a driver to
listen to the latest news while driving a car. Figure 1 shows the configuration
of the system. The system consists of two components: a server and an
application.
The server component collects the latest articles from news sites on the Web and
executes paraphrase processing on the articles in order to generate the news
speech scripts to be read. The details of the paraphrase processing are
explained in Section 4. The news speech files of the paraphrased articles are
generated by speech synthesis. The files are uploaded to the Internet and
delivered to drivers in podcast style.
The application component, on the other hand, recognizes the driver's speech
commands and executes news retrieval. The speech commands include "headline" to
read out the title, "back" and "next" to move between articles, and "domestic",
"economy", and "IT" to select genres, among others. That is, the system selects
an appropriate article from the podcast according to the driver's commands and
plays the speech.
Figure 2 shows the selection screen for news articles. The driver can operate
the system by speech command while driving and via the screen while stopped. We
implemented the paraphrase processing in Ruby and used HitVoice for speech
1 Taigen-dome is a rhetorical device in which the words following a certain noun
are cut off so that the sentence ends with the noun. The eliminated words,
however, have to be inferable from the context. In Japanese, taigen means an
indeclinable word, including nouns and pronouns, and dome means "to end"; hence
taigen-dome approximately corresponds to "ending with a noun." By doing this,
the writer can make an impression on the reader.

Fig. 1 Configuration of on-demand news speech delivery system

Fig. 2 Selection screen of news articles



synthesis. Moreover, we implemented the speech delivery with Flash Professional


8, and used Julius for speech recognition.

3 Written Language and Spoken Language


3.1 Linguistic Differences between News Articles and News
Speech
In general, natural language is divided into written language and spoken
language. Table 1 shows the differences between written and spoken language in
Japanese. Among these differences, this study takes up sentence style and
diffuseness. Note that we equate diffuseness with taigen-dome sentences.
Figure 3 shows a comparison between written language and spoken language using
an actual newspaper article. The differences in sentence style and the
taigen-dome sentence are indicated by underlining.

Table 1 Differences between written language and spoken language

written language spoken language


style “dearu” style “desu-masu” style
length long short
vocabulary difficult easy
tone formal informal
diffuseness low high

Fig. 3 Comparison between written language and spoken language



3.2 Related Works


Kaji et al. proposed a method of paraphrasing written-language-specific
vocabulary into spoken-language-specific vocabulary by using written- and
spoken-language corpora collected from the Web [1].
On the other hand, as a study on predicate paraphrasing, Hayashi et al.
constructed rules for sentence-style conversion [2]. They achieved a precision
rate of 99.8% in a conversion experiment using news articles. However, reliable
complementing of taigen-dome sentences has remained an open problem.

4 Complement of Taigen-dome Sentences


In this study, a taigen-dome sentence is defined as a sentence in which the only
words after the last noun are a punctuation mark or a symbol. Generally, a
taigen-dome sentence is generated by deleting an expression at the end of an
original sentence. Therefore, the original sentence can be restored by inferring
and complementing the deleted expression.
Since a suitable complementary expression for a taigen-dome sentence is greatly
influenced by the context, it is difficult to create comprehensive complement
rules. In this research, the complement of a taigen-dome sentence is realized by
a statistical method using a text corpus. That is, complementary words and
phrases are selected by paying attention to the sentences which have the same
features as the taigen-dome sentence.

4.1 Type of Complementary Phrase


For example, in the sentence
• Yoshi-wa tsugi-no tori. (The summary is as follows.),
the subtype of the last noun “tori” is “general.” In the sentence:
• Taisaku-no hitsuyosei-o shiteki. (He indicated the need for countermeasures.),
the subtype of the last noun "shiteki" is "sahen." The subtype "sahen" means
that the noun is a verbal noun and can be followed by the verb "shimasu,"
roughly the English equivalent of "do." Thus, if the last noun of a sentence is
a "sahen" noun, the sentence is complemented by "shi-masu" or its past form
"shi-mashita"; otherwise it is complemented by "desu" or its past form
"deshi-ta." In the sentence:
• Kankoku-gawa-kara gutai-teki-na tean-ga aru yote. (South Korea will make a
concrete suggestion.),
although the last noun "yote" is a "sahen" noun, the proper complement of the
sentence is "desu." Thus, even if the last noun is a "sahen" noun, a sentence
including a word which can become the subject is complemented by "desu" or its
past form.

Fig. 4 Flow of complement processing

In addition to the type of the complementary word, we have to decide its proper
tense. In order to determine the tense, we use particular auxiliary verbs and
phrases which indicate the past. A sentence which includes any of the words
listed below is considered to be in the past tense.
• The auxiliary verb "ta", which indicates the past form.
• Phrases which indicate the past, such as "sakunen (last year)" and
"san-nen-mae (three years ago)."
There exist taigen-dome sentences which need no complement. For example, the
sentence
• Kanada-no ese-toshi-gun-no hitotsu ricchimondo-hiru. (Richmond Hill, one of
Canada's satellite cities.)
needs no complement. If it were complemented by a word like "desu," it would
sound considerably artificial. Therefore, if the subtype of the last noun in a
sentence is not "sahen" and the sentence does not include a word which can
become a subject, the sentence is considered to need no complement.
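The decision rules of this section can be summarized in a small function; detecting sahen nouns, subject-capable words, and past-tense cues is assumed to be handled by the morphological analysis described later:

```python
def complement_phrase(last_noun_is_sahen, has_subject_word, is_past):
    """Choose the complementary phrase for a taigen-dome sentence following
    the rules of Section 4.1; None means no complement is needed."""
    if last_noun_is_sahen and not has_subject_word:
        return "shi-mashita" if is_past else "shi-masu"
    if has_subject_word:
        return "deshi-ta" if is_past else "desu"
    return None   # not a sahen noun and no subject word: leave as is
```

These rules serve as the baseline ("simple complement") methods compared in the evaluation; the statistical method of the next subsection refines the choice using corpus evidence.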

4.2 Flow of Complement Processing


The complement processing consists of the following three steps:
1. Extraction of candidate sentences: The method extracts sentences whose last
noun corresponds with that of the taigen-dome sentence from the learning data
as candidate sentences.
2. Refinement of candidate sentences: The method refines the candidate
sentences based on the context of the dependent bunsetsus of the last noun.
3. Selection of a complement word or phrase: The method selects the most
frequent sentence-end expression among the refined candidate sentences as the
complement phrase.

Fig. 5 Replacement of named entity
Figure 4 shows the flow of complement processing for an example taigen-dome
sentence ending with the noun "tenkai". First, the method extracts the sentences
whose last noun is "tenkai" from the learning data, and then retrieves those in
which the particle of the dependent bunsetsu of "tenkai" is "-wo". Finally, the
method counts the frequency of each sentence-end expression and selects the most
frequent one, "shimashi-ta", as the complement phrase.
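The three-step selection can be sketched as follows; representing each learning-data sentence as a (last noun, dependent-bunsetsu particle, sentence-end expression) tuple is our simplification of the corpus preprocessing:

```python
from collections import Counter

def select_complement(last_noun, particle, corpus):
    """Statistical complement selection (Section 4.2).
    corpus: list of (last_noun, dependent_particle, sentence_end_expression)
    tuples extracted from the learning data."""
    # Step 1: candidate sentences sharing the taigen-dome sentence's last noun
    candidates = [s for s in corpus if s[0] == last_noun]
    # Step 2: refine by the particle of the dependent bunsetsu
    refined = [s for s in candidates if s[1] == particle]
    if not refined:
        return None
    # Step 3: most frequent sentence-end expression among the survivors
    return Counter(s[2] for s in refined).most_common(1)[0][0]
```

On the "tenkai" example from Figure 4, the function would return "shimashi-ta" whenever that ending is the most frequent among the sentences surviving both filters.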
The rest of this section explains the supplementary processing for the
complement processing.
To facilitate the extraction of candidate sentences, the method replaces named
entities and numerical expressions with category names, generalizing the
sentences. CaboCha [3] was used for named entity recognition. Classification
names such as "PERSON" or "LOCATION" were assigned according to the
specification of the IREX (Information Retrieval and Extraction) evaluation [4].
Figure 5 shows the outline of the replacement of named entities, in which a
person name and a percentage expression are replaced with "PERSON" and
"PERCENT", respectively.
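The generalization step can be sketched as follows; the (surface, tag) input format and the catch-all "NUMBER" category for bare numerals are our own assumptions, with entity tags assumed to come from an NE recognizer such as CaboCha:

```python
import re

def generalize(tagged_tokens):
    """Replace named entities and numerical expressions with category names
    before candidate extraction (cf. Figure 5). tagged_tokens is a list of
    (surface, ne_tag) pairs, with "" as the tag of non-entity tokens."""
    out = []
    for surface, tag in tagged_tokens:
        if tag:
            if not out or out[-1] != tag:   # collapse multi-token entities
                out.append(tag)
        elif re.fullmatch(r"[0-9,.]+", surface):
            out.append("NUMBER")            # assumed catch-all numeric category
        else:
            out.append(surface)
    return out
```

Generalizing in this way lets candidate sentences match on their structure rather than on the specific names and figures they happen to mention.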

5 Evaluation
The articles of January 3rd, 1995 from the Mainichi newspaper text corpus were
used as test data. The articles consist of 687 sentences including 714 places to
be converted. Of these, 164 are taigen-dome sentences. MeCab [5], CaboCha [3],
and CBAP [6] were used for morphological analysis, dependency analysis, and
clause boundary analysis, respectively. We set up the following four comparative
techniques:
1. Simple complement based on the fine classification of nouns
2. Simple complement based on the fine classification of nouns, tense, and
adnominal phrases
3. Statistical complement using only the last nouns
4. Our method
As learning data, 715,429 sentences in the articles from January 4th to December
31st, 1995 in the Mainichi newspaper text corpus were used.

Table 2 Experimental result

method correct incorrect


1. simple (noun) 122 (74.4%) 42 (25.6%)
2. simple (noun, tense, adnominal) 128 (78.0%) 36 (22.0%)
3. statistical 120 (73.2%) 44 (26.8%)
4. our method 131 (79.9%) 33 (20.1%)

Table 3 Causes of faulty complement

cause number
Tense 14
Necessity of complement 9
Type of complementary word 7
Voice 3

Table 2 shows the experimental result for the 164 taigen-dome sentences. The
result shows our method to be effective for complementing taigen-dome sentences.
Table 3 shows the causes of the faulty complements produced by our method.

6 Conclusion
This paper has proposed a method for converting newspaper articles into news
speech in order to generate spontaneous Japanese speech in a text-to-speech
synthesis system. We focused on the complement of taigen-dome sentences. In an
experiment on complementing taigen-dome sentences, we confirmed the
effectiveness of our method.

Acknowledgements. This research was partially supported by the Grant-in-Aid for
Challenging Exploratory Research (No. 21650028) of JSPS.

References
1. Kaji, N., Okamoto, M., Kurohashi, S.: Paraphrasing Predicates from Written Language to
Spoken Language using the Web. In: Proceedings of the Human Language Technology
Conference, pp. 241–248 (2004)
2. Hayashi, Y., Matsubara, S.: Sentence-style conversion of Japanese news article for text-to-
speech application. In: Proceedings of 7th International Symposium on Natural Language
Processing, pp. 257–262 (2007)

3. CaboCha: Yet Another Japanese Dependency Analyzer,


http://chasen.org/taku/software/cabocha/
4. IREX, http://nlp.cs.nyu.edu/irex/
5. MeCab: Yet Another Part-of-Speech and Morphological Analyzer,
http://mecab.sourceforge.jp/
6. Kashioka, H., Maruyama, T.: Segmentation of semantic units in Japanese monologues. In:
Proceedings of International Conference on Speech Language Technology and Oriental,
COCOSDA, pp. 87–92 (2004)
Personalized Text Formatting
for E-mail Messages

Masaki Murata, Tomohiro Ohno, and Shigeki Matsubara

Abstract. E-mail systems are common communication tools, and it is desirable
that e-mail messages be written readably for recipients. One technique for
writing readable e-mail messages is to insert linefeeds and blank lines
appropriately. However, linefeeds and blank lines in incoming e-mails are not
always inserted at positions where recipients find them readable. If linefeeds
and blank lines are inserted automatically at proper positions, the readability
of e-mail texts is improved and recipients can read them efficiently. This paper
proposes a method for formatting e-mail texts by inserting linefeeds and blank
lines into incoming e-mails at positions that recipients find readable.

1 Introduction
E-mail systems are common communication tools, and their users spend a great
deal of time reading e-mail messages. It is desirable that e-mail messages be
written readably so that recipients can read them effectively. One technique for
writing readable e-mail messages is to insert linefeeds and blank lines
appropriately [1, 2, 3]. However, linefeeds and blank lines in incoming e-mails
are not always inserted at positions where recipients find them readable.
This paper proposes a method for formatting texts in e-mail messages by inserting
linefeeds and blank lines into incoming e-mails at positions where recipients feel
Masaki Murata · Shigeki Matsubara
Graduate School of Information Science, Nagoya University,
Furo-cho, Chikusa-ku, 464-8603, Japan
e-mail: murata@el.itc.nagoya-u.ac.jp, matubara@nagoya-u.jp
Tomohiro Ohno
Information Technology Center, Nagoya University,
Furo-cho, Chikusa-ku, 464-8601, Japan
e-mail: ohno@nagoya-u.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 459–468.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

readable. In our method, we assume that users insert linefeeds and blank lines at
positions they find readable when they write their own e-mail messages. Therefore,
our method realizes linefeed and blank line insertion through a statistical approach,
using the e-mails written and sent by the recipient as learning data.
The positions at which linefeeds and blank lines are inserted in sent e-mails show
different tendencies among individuals. We organized the factors associated with
the positions where linefeeds and blank lines should be inserted, and decided the
feature set based on them. Our method realizes personalized text formatting of e-mails by
selecting, from this feature set, the features that are useful for inserting linefeeds
and blank lines into a particular person's incoming e-mails.

2 Factors of Linefeed and Blank Line Insertion

When people insert linefeeds and blank lines into e-mail texts, they are thought to
decide the insertion positions based on certain factors. A person's tendency in
linefeed and blank line insertion differs depending on which of those factors he or
she focuses on. In the following sections, we organize the factors that are
potentially associated with linefeed and blank line insertion.

Semantic Boundaries
Linefeeds are inserted at semantic boundaries if the writer wants each line of the
e-mail text to consist of a semantic unit. This avoids splitting a word
or a bunsetsu1 across a linefeed and leads to efficient reading of e-mail texts.
The authors have previously analyzed linefeed insertion that makes texts more readable
[4]. Clause boundaries and dependency relations serve as information on semantic
boundaries.

Line Length
If linefeeds are inserted so that line length does not exceed a certain number of
characters, that number relates to the window width configured in one's e-mail
environment. When people read e-mail texts whose lines are longer than that width,
they sometimes find them unreadable because they have to scroll. Therefore, writers
compose e-mail texts so that line length does not exceed the width of their own e-mail
environment. Because this width setting differs among individuals, the maximum number
of characters per line also shows individual tendencies.

Balance of Line Length

Regarding line length, there also exists a writing style that maintains a balance
among the lengths of lines. People may find a text unreadable if the right edges of
the lines in a paragraph are not aligned. Such writers insert linefeeds so that the
balance of line lengths
1 A bunsetsu is a linguistic unit in Japanese that roughly corresponds to a basic phrase in
English. A bunsetsu consists of one independent word and zero or more ancillary words.

is maintained. It has been pointed out that e-mail texts become readable when
linefeeds are inserted so that the length of each line is kept at a certain level [1].

Topic Boundaries
One role of blank lines is to divide e-mail texts into topics when they are inserted
at topic boundaries. If an e-mail text is divided into topics, recipients can read it
efficiently. Cues for topic boundaries include, for example, an interrogative
sentence or a conjunction at the beginning of a sentence.

Number of Lines in a Paragraph

A further factor is the relation between blank lines and the number of lines in a
paragraph. Text on a display is harder to read than text on paper because of the
brightness of the display and its fonts. Therefore, the readability of e-mail
texts can be improved by creating white space through blank line insertion where
appropriate. The paragraph length at which people start to find text unreadable
differs among individuals.

3 Text Formatting Method for E-mails

Figure 1 shows the flow of our method. First, our method divides the sent e-mails of a
recipient into learning data and development data, and selects from the feature set
the features that are useful for capturing the recipient's writing-style tendencies.
The feature set was decided based on the factors of linefeed and blank
line insertion described in Section 2. Then, our method inserts linefeeds and
blank lines sequentially, using these features, into the recipient's incoming mails,
from which the original linefeeds and blank lines have been deleted. In doing so, our
method uses the e-mails written and sent by the recipient as learning data. Figure 2
shows an example of the text formatting of an e-mail message by our method.

3.1 Linefeed Insertion Method

The linefeed insertion method takes as input a sentence on which morphological
analysis, bunsetsu segmentation, clause boundary analysis, and dependency2 analysis
have been performed. Our method decides whether or not to insert a linefeed at each
bunsetsu boundary in the input sentence. The linefeed insertion method identifies the
most appropriate combination among all combinations of positions at which a linefeed
can be inserted, using a probabilistic model.
In this paper, an input sentence consisting of $n$ bunsetsus is represented by
$B = b_1 \cdots b_n$, and the result of linefeed insertion by $O = o_1 \cdots o_n$.
Here, $o_i$ is 1 if a linefeed is inserted right after bunsetsu $b_i$, and 0 otherwise.
Also, $o_n = 1$. We indicate the j-th sequence of bunsetsus created by dividing an input sentence into m
2 A dependency in Japanese is a modification relation in which a modifier bunsetsu depends
on a modified bunsetsu. That is, the modifier bunsetsu and the modified bunsetsu work as
modifier and modifyee, respectively.

[Figure content: the recipient's sent e-mails are divided into learning data and development data, on which feature selection (Section 3.3) is performed; an incoming e-mail that has no linefeeds and blank lines then passes through linefeed insertion (Section 3.1) and blank line insertion (Section 3.2), yielding a mail text into which linefeeds and blank lines are inserted.]
Fig. 1 Flow of our method

[Figure content: a Japanese input e-mail text (with an English gloss beginning "I'm sorry for this late reply. ..."), the same text after linefeed insertion with one clause or sentence per line, and the final text after blank line insertion, in which blank lines separate the topics.]
Fig. 2 Example of text formatting of an e-mail message by our method

sequences as $L_j = b_1^j \cdots b_{n_j}^j$ $(1 \le j \le m)$; then $o_k^j = 0$ if $1 \le k < n_j$, and $o_k^j = 1$
if $k = n_j$.

3.1.1 Probabilistic Model for Linefeed Insertion

When an input sentence B is provided, our method identifies the linefeed insertion
O that maximizes the conditional probability P(O|B). Assuming that whether or not
to insert a linefeed right after a bunsetsu is independent of other linefeeds except
the one appearing immediately before that bunsetsu, P(O|B) can be calculated as
follows:

$$
\begin{aligned}
P(O|B) &= P(o_1^1=0,\ldots,o_{n_1-1}^1=0,\,o_{n_1}^1=1,\;\ldots,\;o_1^m=0,\ldots,o_{n_m-1}^m=0,\,o_{n_m}^m=1 \mid B) \quad (1)\\
&\cong P(o_1^1=0 \mid B) \times \cdots \times P(o_{n_1-1}^1=0 \mid o_{n_1-2}^1=0,\ldots,o_1^1=0,\,B)\\
&\quad\times P(o_{n_1}^1=1 \mid o_{n_1-1}^1=0,\ldots,o_1^1=0,\,B) \times \cdots \times P(o_1^m=0 \mid o_{n_{m-1}}^{m-1}=1,\,B) \times \cdots\\
&\quad\times P(o_{n_m-1}^m=0 \mid o_{n_m-2}^m=0,\ldots,o_1^m=0,\,o_{n_{m-1}}^{m-1}=1,\,B)\\
&\quad\times P(o_{n_m}^m=1 \mid o_{n_m-1}^m=0,\ldots,o_1^m=0,\,o_{n_{m-1}}^{m-1}=1,\,B)
\end{aligned}
$$
where $P(o_k^j = 1 \mid o_{k-1}^j = 0, \ldots, o_1^j = 0, o_{n_{j-1}}^{j-1} = 1, B)$ is the probability that a linefeed
is inserted right after bunsetsu $b_k^j$ when the sequence of bunsetsus $B$ is provided
and the position of the $(j-1)$-th linefeed has been identified. Similarly,
$P(o_k^j = 0 \mid o_{k-1}^j = 0, \ldots, o_1^j = 0, o_{n_{j-1}}^{j-1} = 1, B)$ is the probability that a linefeed is not inserted right after bunsetsu
$b_k^j$. These probabilities are estimated by the maximum entropy method.
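As a minimal sketch, the chain-rule factorization above amounts to multiplying per-boundary conditional probabilities. The probability tables below are toy values standing in for the maximum entropy model (the function and numbers are hypothetical, not the paper's implementation):

```python
# Chain-rule score of a linefeed assignment: P(O|B) is the product of
# per-boundary conditional probabilities. probs[i] maps the label o_i
# (1 = linefeed after bunsetsu i) to a toy probability; a real system
# would query the trained maximum entropy model instead.
def score_insertion(probs, labels):
    assert labels[-1] == 1, "o_n = 1: the sentence end always closes a line"
    p = 1.0
    for dist, label in zip(probs, labels):
        p *= dist[label]
    return p

# Toy 3-bunsetsu sentence: break after the 2nd bunsetsu (and the last).
probs = [{0: 0.8, 1: 0.2}, {0: 0.3, 1: 0.7}, {0: 0.0, 1: 1.0}]
p = score_insertion(probs, [0, 1, 1])  # 0.8 * 0.7 * 1.0
```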
Next, we describe how to calculate $\arg\max P(O|B)$. To insert linefeeds in
consideration of the balance of line lengths described in Section 2, comparing the
lengths of individual lines would be a simple way. However, such a comparison requires
information about the linefeeds both before and after the current position. In that
case we cannot assume that the position where a linefeed is inserted is independent
of all other linefeeds except the one appearing immediately before, and
$\arg\max P(O|B)$ cannot be calculated efficiently by dynamic programming.
In this work, to calculate $\arg\max P(O|B)$ as efficiently as possible, we calculate
how close the length of each line is to the average length, which is obtained
by dividing the number of characters of $B$ by the number of lines of $B$, and use this
value as the feature that captures the balance of line lengths. This is because, in a
linefeed insertion result with balanced line lengths, the length of each line is close
to the number of characters of $B$ divided by the number of lines of $B$. By introducing
this feature, $\arg\max P(O|B)$ can be calculated as follows. First, our method
identifies the linefeed insertion $O_l$ that maximizes $P(O|B)$ among
$O \in \{O \mid \sum_{i=1}^{n} o_i = l\}$, the insertions which divide the input sentence into $l$ lines:

$$O_l = \mathop{\arg\max}_{O \in \{O \mid \sum_{i=1}^{n} o_i = l\}} P(O|B) \qquad (1 \le l \le n)$$

Then, our method identifies the linefeed insertion $O$ that maximizes $P(O|B)$
among $O_1, \ldots, O_n$:

$$\arg\max P(O|B) = \mathop{\arg\max}_{O \in \{O_1, \ldots, O_n\}} P(O|B)$$
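A brute-force sketch of this two-step search (for each line count l, take the best insertion with exactly l linefeeds, then take the overall maximum) might look as follows. The per-boundary probability tables here are fixed toy values; in the paper they come from the maximum entropy model, and feature 12 makes them depend on the average line length implied by l, which is the reason for the per-l loop:

```python
from itertools import combinations

# Exhaustive version of the two-step search: for each line count l,
# find the best assignment O_l with exactly l linefeeds, then return
# the overall maximum. probs[i] maps label o_i to a toy probability.
def best_insertion(probs):
    n = len(probs)
    best_p, best_labels = 0.0, None
    for l in range(1, n + 1):
        # choose l-1 interior breaks; boundary n-1 is always a break
        for breaks in combinations(range(n - 1), l - 1):
            labels = [1 if i in breaks or i == n - 1 else 0
                      for i in range(n)]
            p = 1.0
            for dist, lab in zip(probs, labels):
                p *= dist[lab]
            if p > best_p:
                best_p, best_labels = p, labels
    return best_p, best_labels

probs = [{0: 0.8, 1: 0.2}, {0: 0.3, 1: 0.7},
         {0: 0.9, 1: 0.1}, {0: 0.0, 1: 1.0}]
p, labels = best_insertion(probs)  # best: break after bunsetsu 2 and 4
```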

3.2 Blank Line Insertion Method

The blank line insertion method takes as input sentences on which morphological
analysis and linefeed insertion have been performed. Our method decides whether or
not to insert a blank line at each sentence boundary in the input text.

Table 1 Features used for the maximum entropy method

Linefeed insertion:
- Morphological information: 1. the rightmost independent morpheme (i.e., the head word; its part-of-speech and inflected form) and the rightmost morpheme (part-of-speech) of bunsetsu $b_k^j$
- Clause boundary information: 2. whether or not a clause boundary exists right after $b_k^j$; 3. the type of the clause boundary right after $b_k^j$, if one exists
- Dependency information: 4. whether or not $b_k^j$ depends on the next bunsetsu; 5. whether or not $b_k^j$ depends on the final bunsetsu of a clause; 6. whether or not $b_k^j$ depends on a bunsetsu for which the number of characters from the start of the line is less than or equal to the maximum number of characters; 7. whether or not $b_k^j$ is depended on by the final bunsetsu of an adnominal clause; 8. whether or not $b_k^j$ is depended on by the bunsetsu located right before it; 9. whether or not the dependency structure of the sequence of bunsetsus between $b_k^j$ and $b_1^j$, the first bunsetsu of the line, is closed; 10. whether or not there exists a bunsetsu which depends on the modified bunsetsu of $b_k^j$, among the bunsetsus located after $b_k^j$ for which the number of characters from the start of the line is less than or equal to the maximum number of characters
- Line length: 11. the proportion of the number of characters from the start of the line to $b_k^j$, relative to the maximum number of characters
- Balance of line length: 12. the proportion of the difference between the number of characters from the start of the line to $b_k^j$ and the average length, relative to the average length
- Leftmost morpheme of a bunsetsu: 13. whether or not the basic form or part-of-speech of the leftmost morpheme of the bunsetsu following $b_k^j$ is one of the following (basic form: "思う (think)," "問題 (problem)," "する (do)," "なる (become)," "必要 (necessary)"; part-of-speech: noun-non independent-general, noun-nai adjective stem, noun-non independent-adverbial)
- Comma: 14. whether or not a comma exists right after $b_k^j$

Blank line insertion:
- Number of lines in paragraph: 15. the number of lines from the start of the paragraph; 16. the number of lines of $s_{h+1}^g$
- Keyword: 17. whether or not the first morpheme of $s_{h+1}^g$ is a conjunction; 18. the first morpheme (surface form) of $s_{h+1}^g$ if its part-of-speech is conjunction; 19. whether or not $s_h^g$ is an interrogative sentence

The blank line insertion method identifies the most appropriate combination among all
combinations of positions at which a blank line can be inserted, using a
probabilistic model.
In this paper, an input text consisting of $q$ sentences is represented by
$S = s_1 \cdots s_q$, and the result of blank line insertion by $T = t_1 \cdots t_q$. Here, $t_i$ is 1 if a blank
line is inserted right after sentence $s_i$, and 0 otherwise. Also, $t_q = 1$. We indicate the
g-th sequence of sentences created by dividing the input text into $r$ sequences
as $S_g = s_1^g \cdots s_{q_g}^g$ $(1 \le g \le r)$; then $t_h^g = 0$ if $1 \le h < q_g$, and $t_h^g = 1$ if $h = q_g$.
When an input text S is provided, our method identifies the blank line insertion
T that maximizes the conditional probability P(T|S). Assuming that whether
or not to insert a blank line right after a sentence is independent of all other blank
lines except the one appearing immediately before that sentence, we calculate P(T|S)
by the same method as described in Section 3.1.1.

3.3 Feature Selection Method

We decided the feature set based on the literature [4] and the factors of linefeed and
blank line insertion described in Section 2. This feature set can be thought of as the
typically useful features for linefeed and blank line insertion. Table 1 shows the
feature set.
When people insert linefeeds and blank lines into e-mail texts, different individuals
may focus on different factors. Therefore, when the incoming e-mails of a certain
recipient are formatted automatically by linefeed and blank line insertion, not all
of the features shown in Table 1 are useful. Our method uses a part of the recipient's
sent e-mails as development data and compares the accuracy (the harmonic mean of the
recall and precision defined in Section 4.1) obtained when using all features against
that obtained when each feature in Table 1 is deleted in turn. If the accuracy improves
when a certain feature is deleted, we consider that this feature does not contribute
to improving the readability of that person's e-mail texts, and our method does not
use it when inserting linefeeds and blank lines into the recipient's incoming e-mails.
That is, our method selects and uses, from the feature set shown in Table 1, the
features that are useful for formatting the e-mail texts of a particular recipient.
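The ablation procedure described above can be sketched as follows; `evaluate` stands in for training on the learning data and measuring the F-measure on the development data, and the toy scoring function is invented purely for illustration:

```python
# Backward ablation over the feature set: delete each feature in turn
# and keep it deleted only if the development-data accuracy improves
# over the all-features baseline.
def select_features(all_features, evaluate):
    full = frozenset(all_features)
    baseline = evaluate(full)
    dropped = {f for f in all_features if evaluate(full - {f}) > baseline}
    return set(all_features) - dropped

# Invented scoring in which features 6 and 12 hurt accuracy,
# mirroring the ablation outcome reported in Section 4.2.
def toy_eval(feats):
    good = sum(1.0 for f in feats if f not in (6, 12))
    bad = sum(2.0 for f in feats if f in (6, 12))
    return 50.0 + good - bad

features = list(range(1, 15))  # linefeed features 1..14 of Table 1
selected = select_features(features, toy_eval)
```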

4 Preliminary Experiment
To evaluate the effectiveness of our method, it is necessary to conduct an experiment
using the e-mail data of many users. As a preliminary experiment, we used the e-mail
data of one of the authors.

4.1 Outline of Experiment

First, we divided the sent e-mails of the author into learning data and development
data, and selected the features by applying the method described in Section 3.3.
We then performed text formatting of e-mails by our method using the selected feature
set. We used 517 sent e-mails as learning data and 65 sent e-mails as development
data for selecting the features. In addition, we used 65 sent e-mails and 100 incoming
e-mails as test data. This paper aims to make e-mail texts more readable for a
certain recipient by inserting linefeeds and blank lines into incoming e-mails;
however, we also used sent e-mails as test data in order to perform a quantitative
evaluation.
In the evaluation on sent e-mails, we measured the recall and precision. The recall
and precision of linefeed insertion are respectively defined as follows:

$$\mathrm{recall} = \frac{\#\ \text{of correctly inserted linefeeds}}{\#\ \text{of linefeeds in the correct data}}$$

$$\mathrm{precision} = \frac{\#\ \text{of correctly inserted linefeeds}}{\#\ \text{of automatically inserted linefeeds}}$$
Those of blank line insertion are defined likewise.
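As a quick check, these definitions, together with the F-measure (the harmonic mean of recall and precision), can be computed directly; the counts below are the linefeed row of Table 2:

```python
# Recall, precision, and F-measure (harmonic mean) from raw counts:
# correctly inserted linefeeds, linefeeds in the correct (gold) data,
# and automatically inserted linefeeds.
def prf(correct, in_gold, inserted):
    recall = correct / in_gold
    precision = correct / inserted
    f = 2 * recall * precision / (recall + precision)
    return recall, precision, f

# Linefeed row of Table 2: 61 of 104 gold linefeeds recovered,
# 61 of 109 automatic insertions correct.
r, p, f = prf(61, 104, 109)
print(f"{r:.2%} {p:.2%} F={100 * f:.2f}")  # prints: 58.65% 55.96% F=57.28
```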
In the evaluation on incoming e-mails, we conducted a subjective evaluation of the
formatted e-mail texts.
We deleted greetings, signatures, quotations, and forwarded text using a simple
script so that the experiment targets the body content of the e-mail texts. Linguistic
information was provided automatically: we used MeCab [5] as a morphological analyzer,
CaboCha [6] with its default learning data as a dependency parser, and CBAP [7] as a
clause boundary analyzer. We used the maximum entropy modeling toolkit [8] with
the default options except "-i 2000." In the experiment, we defined the maximum
number of characters per line as 37.

4.2 Result of Feature Selection

As a result of applying the method described in Section 3.3 to the development
data, the F-measure of linefeed insertion increased by 1.16 and 0.75 when features
6 and 12 were deleted, respectively. The F-measure of blank line insertion
increased by 19.56 when feature 15 was deleted and by 4.06 when features 17 and 18
were deleted.
Based on these results, we used the features shown in Table 1 except features 6, 12,
15, 17, and 18 in the experiment.

4.3 Experimental Result

Table 2 shows the experimental result of our method on sent e-mails. In linefeed
insertion, the recall and precision were 58.65% and 55.96%, respectively. In blank
line insertion, the recall and precision were 100% and 62.12%, respectively; however,
this is identical to the result of inserting a blank line at every sentence boundary
in the test data.
Figure 3 shows a result of linefeed and blank line insertion by our method; it is
identical to the corresponding sent mail written by the author. Although the
F-measures of linefeed insertion and blank line insertion are not high, our
Table 2 Experimental result

recall precision F-measure


linefeeds 58.65% (61/104) 55.96% (61/109) 57.28
blank lines 100% (123/123) 62.12% (123/198) 76.74

[Figure content: a Japanese input e-mail about the Mainichi newspaper data (with an English gloss beginning "When we checked the Mainichi newspaper data, ..."), and the text into which linefeeds and blank lines were inserted by our method.]
Fig. 3 Example of linefeed and blank line insertion by our method

method was able to output results that correspond exactly to the author's sent
e-mails, as shown in Figure 3.

4.4 Subjective Evaluation

We conducted a subjective evaluation of the 100 incoming mails into which linefeeds
and blank lines were inserted by our method. In the subjective evaluation, the author
whose sent e-mails were used as the learning data compared the original incoming
mails with the formatted texts and judged whether each formatted text had become
more readable, become less readable, or stayed at the same level.
Of the 100 incoming e-mails, 17 were judged more readable than the originals,
23 were judged comparable, and 60 were judged less readable.
The reasons for the high rate of less readable e-mails are that the linefeed insertion
method does not consider the white space and symbols included in e-mail texts and does
not insert linefeeds well into itemized texts. E-mail texts containing these errors are
unreadable not only for the author but also for many other people. Therefore, a
linefeed insertion method which considers the structure of e-mail texts needs to
be developed.

5 Conclusion
This paper proposed a method for the text formatting of e-mails based on linefeed and
blank line insertion. In our method, the sent e-mails written by a certain recipient
are used as learning data. Our method realizes linefeed and blank line insertion
that fits the recipient's writing tendencies by using the features, selected from a
common feature set, that are useful for formatting that recipient's e-mail texts. An
experiment using the mail data of one of the authors showed that the F-measure of
linefeed insertion was 57.28 and that of blank line insertion was 76.74. We also
conducted a subjective evaluation and analyzed the causes by which our method made
readability worse.
In the experiment, we used the e-mail data of one of the authors. In the future, we
will conduct an experiment using the mail data of multiple users and evaluate whether
or not our method inserts linefeeds and blank lines at positions that each recipient
finds readable. Moreover, on the experimental e-mail data, blank lines were inserted
at every sentence boundary, so our blank line insertion method did not work well;
it needs to be improved in future work.

Acknowledgements. This research was partially supported by Challenging Exploratory
Research (No. 21650028).

References
1. Fujita, E.: Mail bunsyoryoku no kihon (Fundamentals of a writing ability of e-mail). Nip-
pon Jitsugyo Publishing Co., Ltd. (2010) (in Japanese)
2. Ueda, M., Hosoda, S.: Tyosoku master E-mail, Rirekisyo Entry sheet Seikou Jitureisyu
(Ultra-fast Mastery: Examples of Success of E-mail, Resume, Entry Sheet). Takahashi
Shoten Co., Ltd. (2009) (in Japanese)
3. Ando, S.: E-mail handbook. Kyoritsu Shuppan Co., Ltd. (1998) (in Japanese)
4. Ohno, T., Murata, M., Matsubara, S.: Linefeed Insertion into Japanese Spoken Monologue
for Captioning. In: Proceedings of Joint Conference of the 47th Annual Meeting of the
Association for Computational Linguistics and the 4th International Joint Conference on
Natural Language Processing of the Asian Federation of Natural Language Processing,
pp. 531–539 (2009)
5. Kudo, T., Yamamoto, K., Matsumoto, Y.: Applying Conditional Random Fields to
Japanese Morphological Analysis. In: Proceedings of the 2004 Conference on Empirical
Methods in Natural Language Processing, pp. 230–237 (2004)
6. Kudo, T., Matsumoto, Y.: Japanese dependency analysis using cascaded chunking. In:
Proceedings of 6th Conference on Computational Natural Language Learning, pp. 63–69
(2002)
7. Kashioka, H., Maruyama, T.: Segmentation of semantic units in Japanese monologues. In:
Proceedings of International Conference on Speech Language Technology and Oriental,
COCOSDA, pp. 87–92 (2004)
8. Le, Z.: Maximum entropy modeling toolkit for python and c++ (2008),
http://homepages.inf.ed.ac.uk/s0450736/maxenttoolkit.html
(online; accessed March 1, 2008)
Presentation Story Estimation from Slides
for Detecting Inappropriate Slide Structure

Tomoko Kojiri and Fumihiro Yamazoe*

Abstract. In many situations, we present our ideas using presentation tools such
as PowerPoint in Microsoft Office. However, the story created by the author is
sometimes inappropriate, and listeners cannot understand the topics that the author
wants to emphasize by watching the generated slides. Our research aims at constructing
a system which automatically detects differences between the author's intention and
the slide structure and points out the inappropriateness of the generated slides.
The author's intention for each slide is captured by assigning each slide to an
element of a topic template. On the other hand, the topics that listeners may
understand from the generated slides are estimated based on the lexical information
in the slides and are organized into a topic tree. Relations between the generated
slides are detected from the change of their focused nodes in the topic tree. By
comparing the relations grasped through the topic template and the topic tree, the
system is able to point out inappropriate slides automatically.

1 Introduction
In many situations, we present our ideas using presentation tools such as PowerPoint
in Microsoft Office. A presentation file consists of a sequence of presentation
slides (slides) in which descriptions of topics are organized sequentially. The author
of the slides (author) considers the presentation story (story) based on the topics
that he/she wants to emphasize (intention). Then, he/she generates contents, such as
text, diagrams, or tables, that explain the topics, considering the relations between
them. However, the story created by the author is sometimes inappropriate, and
listeners cannot understand the topics that the author wants to emphasize by watching
the generated slides. In many cases, it is difficult for authors to detect the
inappropriateness of their slides by themselves. Thus, pointing out the differences
between the author's intention and listeners' understanding is valuable.
There are several studies that support authors in generating logical presentation
slides [1, 2]. Maeda et al. constructed a collaborative learning environment for

Tomoko Kojiri · Fumihiro Yamazoe
Faculty of Engineering Science, Kansai University
3-3-35 Yamate-cho, Suita, Osaka, 564-8680, Japan
e-mail: kojiri@kansai-u.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 469–478.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

discussing slide story generation skills [3]. This research provides a story amendment
tool which helps participants re-organize the presented slides and generate slide
amendments. It also proposes a discussion support tool which detects meaningful
modifications in participants' amendments. In this environment, whether participants
can effectively discuss story generation skills depends on the characteristics of the
participants, so participants are sometimes unable to acquire knowledge for slide
generation. Kamewada et al. constructed a tool which captures listeners' eye
movements during a presentation [4]. By being notified of differences between the
presented slide and the listeners' eye directions, authors can notice the
inappropriateness of their story. However, to modify the generated slides using this
system, authors need to give the presentation in front of listeners, so the system
cannot be used when no listeners are available.
On the other hand, Hanaue et al. constructed a system which generates slides
automatically [5]. In this system, the author inputs topics and their relations, and
slides are generated automatically from the input topics. Using this system, slides
that satisfy the author's intention can be generated. However, the author's skill in
generating slides is not developed, so the author needs to use the system every time
he/she creates slides.
Our research aims at constructing a system which automatically detects the author's
intention and the topic flow in the generated slides. It then extracts conflicts
between the generated slides and the author's intention and points out inappropriate
parts of the generated slides. By considering the reasons for the detected
inappropriateness, authors can develop the skill of generating logical slides.
There are typical stories for each presentation genre, such as research presentations
at academic conferences and presentations of commercial goods to customers. Elements
in such typical stories have sequential and inclusive relations, and how these
relations are expressed in the slides is important. Authors tend to create their
slides by following the typical stories. Therefore, the roles of the generated slides
and their relations in the author's intention can be grasped by matching the slides to
the elements of the typical story.
For listeners, the relations among slides are inferred from the descriptions in the
slides. For instance, if one slide explains a topic that is already described in
another slide, the two slides contain the same words, and the slide which explains
the topic has several explanation sentences below it. To estimate listeners'
understanding of the slides, our system automatically analyzes the descriptions in
the slides and detects the relations between topics. Then, by comparing them with the
author's intention acquired from the typical story, conflicts between the author's
intention and the generated slides can be detected.

2 Approach
Figure 1 shows the overall framework of our system. The objective of our system
is to detect conflicts between the author's intention and the constructed slides. As
Tufte [6] described, relations among slides are one of the points that are difficult
to represent when creating slides. This research focuses on the relations between
slides, such as sequential relations, inclusive relations, or no relation, as the
author's intention.

Fig. 1 Framework of the system

The target presentation in our research is the research presentation in the computer
science field at an academic conference. In this presentation genre a typical story
exists, in which the relations among the elements of the story are defined statically.
If the author's slides are matched to the elements of the typical story, the author's
intention for each slide can be grasped. Thus, our system introduces a topic template
which corresponds to the typical story and in which the elements and the relations
among them are defined. When using the system, the author first assigns each slide
to an element of the topic template.
On the other hand, the listener's understanding of the slides is estimated from the
descriptions in the constructed slides. New words appear when a new topic is proposed,
while a complementary topic contains words that are already used in other topics.
Currently, we regard each sentence in the slides as an individual topic. Based on the
appearance of words in sentences and the relations between them, relations among
topics are inferred and a topic tree is constructed. The topic tree represents the
relations between topics: topics with sequential relations are placed as sibling
nodes, and topics with inclusive relations are placed as child nodes.
Relations between slides can be grasped from the topics of the topic tree included in
each slide. For instance, if one slide contains topics of upper nodes and another
slide contains those of lower nodes, the latter slide may provide supplementary
explanation for the former. Thus, by comparing the topics included in each slide, the
relations between slides are estimated. Then, if these relations differ from those
acquired from the topic template, the differences are pointed out to the author.
472 T. Kojiri and F. Yamazoe

3 Topic Template
The topic template represents typical stories for a specific presentation genre. A topic
template consists of nodes and links: nodes represent topics and links represent
relations between topics. Currently, two types of links are prepared: the se-
quential relation and the inclusive relation. These relations are defined statically for
each element in the typical story.
This research focuses on research presentations in the computer science field at
academic conferences. Figure 2 is an example of the topic template for such a re-
search presentation. Usually, a research presentation starts from the background,
and the objective follows the background. Then, the constructed system is introduced,
and the effectiveness of the system is discussed based on the results of an experiment. In
addition to this basic story, several other topics are added for the purpose of ex-
plaining the story in detail. The approach provides a global viewpoint for achieving the
objective, so it is included by the objective. The method embodies the approach by
showing algorithms, equations, or models, so it is included by the ap-
proach. In the same way, experiment and discussion, and discussion and conclu-
sion, have inclusive relations, respectively.

Fig. 2 Example of topic template

The author's slides are manually allocated to the nodes in the topic template. Based
on the allocation, the author's intention among slides is grasped according to the rela-
tions between nodes in the topic template.

4 Topic Tree
The topic tree represents topics and their relations as they may be grasped by listeners
from the slides. In slides, each sentence forms a topic, so relations among sentences
should be detected.
Within one slide, relations among topics can be observed from their layout infor-
mation. If the itemize level of one topic is lower than that of another, the former
topic may explain the latter topic, so the former topic is included by the latter
topic. When topics are at the same itemize level, their relation is grasped from
their physical position: if one topic is written above the other, the latter topic has a
sequential relation from the former topic. Relations are
also defined between slides: sequential relations exist between slides whose order
is next to each other, so a slide has a sequential relation to its next slide. Based on
this viewpoint, the topic tree is constructed from the slides.
Figure 3(a) is an example of slides and Figure 3(b) shows the corresponding topic tree. Since
the objective slide is located next to the background slide, a sequential relation is attached
from the node of background to that of objective. Since the background slide consists
of two topics of the same itemize level, they become child nodes of background,
and sequential relations are attached between them. On the other hand, the objective
slide has two topics of different itemize levels, so the first topic becomes a child node
of objective, and the second topic is generated as a child node of the first topic.

Fig. 3 Example of topic tree
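The within-slide layout rules above (a deeper itemize level becomes a child node; topics at the same level become ordered siblings) can be sketched roughly as follows. `Node` and `build_topic_tree` are hypothetical names, and the sample sentences echo the objective slide of Figure 3(a):

```python
# Build a topic tree for one slide from (itemize_level, sentence) pairs,
# following the layout rules described above (a sketch, not the authors' code).
class Node:
    def __init__(self, text, level):
        self.text, self.level, self.children = text, level, []

def build_topic_tree(slide_title, items):
    root = Node(slide_title, 0)          # slide title is level 0
    stack = [root]                       # ancestors on the current path
    for level, text in items:
        while stack[-1].level >= level:  # same/shallower level: close subtrees
            stack.pop()
        node = Node(text, level)
        stack[-1].children.append(node)  # deeper level: inclusive (child)
        stack.append(node)
    return root

# Objective slide of Figure 3(a): the second topic is one level deeper.
objective = build_topic_tree("Objective", [
    (1, "To support slide construction"),
    (2, "Detect gaps between intention and slides"),
])
```

Sequential relations among siblings are implicit in the order of each `children` list.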

On the other hand, sometimes topics (explaining topics) explain topics in a
different slide (explained topics). In such a case, the topic tree should be re-organized so
that the explaining node becomes a sub-tree of the explained node. The following are the
steps for re-organizing the topic tree.
Step 1: Detect keywords that characterize each topic.
Step 2: Detect nodes of the same topic.
Step 3: Re-organize the topic tree according to the nodes of the same topic.

In step 1, in order to detect nodes of the same topic, words that may character-
ize the topics are extracted. Keywords are words that are often used in a topic
but are not used in other topics. In order to detect such words, a method whose idea
is similar to the tf-idf method [7] is proposed. Figure 4 shows the equations for calculating
the uniqueness of words. The uniqueness of word i is calculated by multiplying its tf and
idf values: tf calculates the ratio of the word in the sentence, and idf derives the logarithm of
the reciprocal of the distribution of the word over the slides. If the uniqueness of a word is larger than
a threshold, the word is regarded as a keyword.

Fig. 4 Equation for calculating uniqueness of words
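Since the exact equations appear only in Fig. 4, the following is an approximate sketch of the tf-idf-style uniqueness score as described in the text, treating each topic as a list of words; all names are illustrative assumptions:

```python
import math

# Approximate tf-idf-style uniqueness of a word (the exact formula is the
# authors' Fig. 4). Each topic is modeled as a list of words.
def uniqueness(word, topic, topics):
    tf = topic.count(word) / len(topic)               # ratio of word in topic
    containing = sum(1 for t in topics if word in t)  # topics using the word
    idf = math.log(len(topics) / containing)          # log of reciprocal ratio
    return tf * idf

topics = [["topic", "tree", "slide"], ["slide", "relation"], ["keyword"]]
rare = uniqueness("tree", topics[0], topics)     # appears in one topic only
common = uniqueness("slide", topics[0], topics)  # shared with another topic
```

A word concentrated in one topic scores higher than a word spread across topics, which is the property the keyword threshold relies on.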

In step 2, nodes that address the same topic are detected using their keywords.
Since each sentence consists of a small number of words, nodes are regarded as similar top-
ics if they contain the same keywords. Therefore, nodes that have more than one
keyword in common are detected as the same topic.
In step 3, the topic tree is re-organized if a topic in one slide explains a topic in
another slide. An explaining topic may appear in a slide after the explained topic is
described. In addition, an explaining topic may be situated in an upper layer of the topic tree,
because concrete explanations are expected to be situated in topics of lower layers.
The amount of explanation is grasped by the size of its sub-tree. Therefore, if
two nodes are regarded as the same topic in step 2 and the node that comes lat-
er has a certain amount of explanation, the sub-tree of the latter node is moved to become
a sub-tree of the former node. Figure 5 shows the equation for calculating the impor-
tance of node j, which represents the amount of explanation of node j. In this
equation, level_j represents the indent level of node j and sentence_k corresponds to the
number of topics at level k. The level of the title is regarded as 0 and itemize level
1 corresponds to 1. As an example, let us calculate the importance of "To support slide
construction" in Figure 3(a). The itemize level of the sentence is 1 and the number
of topics at each level is 1, so its importance is calculated as
(1/2)^1 / {1·(1/2)^0 + 1·(1/2)^1 + 1·(1/2)^2} = 2/7. If the importance is larger than a thre-
shold, the node is regarded as an explanation and its sub-tree is moved to become a sub-tree
of the other node.

Fig. 5 Equation for calculating importance of nodes

In Figure 3(a), two slides have the word "slide" in common. The importance of the
node that includes "slide" in slide 2 is 2/7. If the threshold for the importance is
set to 1/7, its sub-tree moves, and the topic tree of Figure 3(b) is transformed into Figure 6.

Fig. 6 Topic tree (transformed)

5 Detection of Relations among Slides from Topic Tree

Relations between two slides are determined using the topic tree. Usually, a slide con-
sists of several nodes in the topic tree. The average location of these nodes is regarded as the fo-
cus of the slide. The change of focus between slides may express the relations between
slides globally. In order to represent the focus, two axes are introduced: the
proceeding degree and the depth. The focus of a slide is then expressed by the follow-
ing form:

Focus = (proceeding degree, depth)

The proceeding degree indicates the status from the first node of the topics in the slide.
Figure 7 shows the equation for determining the proceeding degree. It is derived by
calculating the ratio of included nodes for each sub-tree of the first layer in the
slide. Let us assume that a slide includes the colored nodes in Figure 8. The proceeding
degree for the slide is calculated as 1·4/6 + 2·2/3 + 3·0/5 = 4/3.

Fig. 7 Equation for calculating proceeding ratio

Fig. 8 Example of nodes included in slide

Depth corresponds to the level of detail of the explanation for each topic. Figure 9 shows the
equation for determining depth. In the equation, Layer_i corresponds to the level of
layer i. Depth is determined by the number of included nodes in each layer. In the
example of Figure 8, the depth is calculated as (1·2 + 2·3 + 3·1)/7 = 11/7.

Fig. 9 Equation for calculating depth

The relation between slides can be grasped from the direction of the vector whose
starting point corresponds to the focus of the former slide and whose ending point
corresponds to that of the latter slide. If two slides have a sequential relation, the pro-
ceeding degree becomes larger while the depth may not change. On the other hand, if
two slides have an inclusive relation, the depth becomes larger while the proceeding degree
may not change. So, if the angle a of the vector from the horizontal axis (see Figure
10) is larger than a threshold, the slides are regarded as having an inclusive relation; on
the contrary, if angle a is smaller than the threshold, the slides may have a sequential
relation.
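A minimal sketch of this classification, assuming a focus pair (proceeding degree, depth) per slide and an illustrative threshold of 45 degrees:

```python
import math

# Classify the relation between consecutive slides from the shift of their
# focus (proceeding degree, depth); the 45-degree threshold is illustrative.
def slide_relation(focus_prev, focus_next, threshold_deg=45.0):
    d_proceed = focus_next[0] - focus_prev[0]
    d_depth = focus_next[1] - focus_prev[1]
    angle = abs(math.degrees(math.atan2(d_depth, d_proceed)))
    return "inclusive" if angle > threshold_deg else "sequential"

forward = slide_relation((1.0, 1.0), (2.0, 1.1))  # mostly proceeds
deeper = slide_relation((1.0, 1.0), (1.1, 2.0))   # mostly deepens
```

With this rule, a mostly horizontal focus shift reads as sequential and a mostly vertical one as inclusive, matching the description above.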
The determined relations are compared with those defined by the topic template. If
there are conflicts between the relations, the system points out that the slide description is
inappropriate.

Fig. 10 Detection of relations between slides

6 Conclusion
In this paper, a mechanism that automatically detects differences between the author's
intention and the created presentation slides was introduced. Currently, we are develop-
ing the system using C# and MeCab [8] as the morphological analyzer. As soon as the
system is implemented, our mechanism needs to be evaluated through experiments.
So far, we have not discussed the kind of messages to give to authors when dif-
ferences are detected. If a message does not explain the reason for a difference,
authors may not be able to modify their slides. However, telling the reason directly
prevents authors from considering the reasons for the inappropriateness by themselves.
Therefore, a mechanism for generating messages that promote meta-learning of
slide generation skill should be developed in our future work.

References
1. Okamoto, R., Kashihara, A.: Back-Review Support Method for Presentation Rehearsal
Support System. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J.,
Jain, L.C. (eds.) KES 2011, Part II. LNCS (LNAI), vol. 6882, pp. 165–175. Springer,
Heidelberg (2011)
2. Kashihara, A., Saito, K., Hasegawa, S.: A Cognitive Apprenticeship Framework for
Developing Presentation Skill. IEICE Technical Report 111(141), 23–28 (2011) (in Jap-
anese)
3. Maeda, K., Hayashi, Y., Kojiri, T., Watanabe, T.: Skill-up Support for Slide Composi-
tion through Discussion. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett,
R.J., Jain, L.C. (eds.) KES 2011, Part III. LNCS (LNAI), vol. 6883, pp. 637–646.
Springer, Heidelberg (2011)

4. Kamewada, K., Nishimoto, K.: Supporting Composition of a Presentation by Showing


Transitions of Audiences’ Attentions. Trans. of Information Processing Society of Ja-
pan 48(12), 3859–3872 (2007)
5. Hanaue, K., Watanabe, T.: Externalization Support of Key Phrase Channel in Presenta-
tion Preparation. International Journal of Intelligent Decision Technologies 3(2), 85–92
(2009)
6. Tufte, E.R.: The Cognitive style of PowerPoint. Graphic Press (2003)
7. Salton, G., Fox, E.A., Wu, H.: Term-weighting Approaches in Automatic Text Retriev-
al. Journal of Information Processing & Management 24(5), 513–523 (1988)
8. MeCab: Yet Another Part-of-Speech and Morphological Analyzer (2009),
http://mecab.sourceforge.net/
Problem Based Learning for US and Japan
Students in a Virtual Environment

Dana M. Barry, Hideyuki Kanematsu, Yoshimi Fukumura, Toshiro Kobayashi,
Nobuyuki Ogawa, and Hirotomo Nagai*

Abstract. Problem Based Learning (PBL) is important for engineering education
and has been a tool for creative engineering design. It can enhance creativity and
has been used to successfully carry out many experiments in the real world. Re-
searchers in the US and Japan (the authors) are pursuing studies to determine the
effectiveness of its use in a virtual environment, one with cutting-edge technology
and opportunities for complementary activities between face to face learning and
electronic learning. Here students can work from anywhere in the world, at any
time, and at their own pace. For this project, student teams from the US and
Japan were asked to solve problems in a virtual community. Each team worked

* Dana M. Barry
Center for Advanced Materials Processing (CAMP), Clarkson University, US
e-mail: dmbarry@clarkson.edu
Hideyuki Kanematsu
Department of Materials Science and Engineering,
Suzuka National College of Technology, Japan
e-mail: kanemats@mse.suzuka-ct.ac.jp
Yoshimi Fukumura
Department of Management and Information Systems Science,
Nagaoka University of Technology, Japan
e-mail: fukumura@oberon.nagaokaut.ac.jp
Toshiro Kobayashi
Department of Electronics & Control Engineering,
Tsuyama National College of Technology, Japan
e-mail: t-koba@tsuyama-ct.ac.jp
Nobuyuki Ogawa
Department of Architecture, Gifu National College of Technology, Japan
e-mail: ogawa@gifu-nct.ac.jp
Hirotomo Nagai
Water Cell Inc. Japan
e-mail: nagai.h@water-cell.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 479–488.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
480 D.M. Barry et al.

independently on a different project. The US team designed and built a car for the
future, while the Japan team focused on designing a safe way for using nuclear
energy. A discussion about the US team’s successful project is provided.

1 Introduction
PBL provides students with challenging, ill-structured problems that relate to their
daily lives [1]. The students receive guidance from their teachers and work coope-
ratively in a group to seek solutions to the problems. For this project, student
teams from the US and Japan worked independently on different problems in
Second Life (SL), an online three-dimensional community. All of the activities
took place on a virtual island owned by Nagaoka University of Technology, Japan.
Students met in virtual classrooms that resembled those seen in real life, with
items such as tables and chairs. This arrangement gave the team members a sense
of reality. Also their teachers presented the PBL material on a big screen in the
virtual classroom by using Power Point slides.
Students’ conversations and discussions were based on the chat function in SL.
Participants typed text messages in the input box at the bottom of the screen to ex-
change ideas and thoughts with each other. All of the dialogue was recorded as
files written by the Linden Script Language. Then it was sent to a web server,
and finally saved as CSV files. This information was accessed (by the teacher and
students) for analysis by using various web browsers. The students were to also
use touch tablets with special pens and Bamboo software (Wacom Company) to
exchange ideas with each other by drawing sketches. A touch tablet was con-
nected to each student’s personal computer so that their information and drawings
could be sent to a common server. This way a whiteboard (set for the specific
website) could display individuals’ sketches (as they were being prepared) in the
virtual classroom. Unfortunately because of technical problems, etc. this activity
did not work well. Therefore, the US team just prepared a simple sketch of their
solar car and displayed it in the virtual classroom for discussion purposes.
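The chat-logging pipeline described earlier (dialogue recorded via the Linden Script Language and saved server-side as CSV files for later analysis) might be queried along these lines; the column layout is an assumption for illustration, and the avatar names mirror those used later in the paper:

```python
import csv
import io
from collections import Counter

# The paper saves SL chat logs as CSV for analysis; the (time, avatar,
# message) column order below is assumed, not specified by the authors.
log = io.StringIO(
    "2011-10-01 10:00,Fountainer14,Let's compare the three cars\n"
    "2011-10-01 10:01,Alilovleylights,Solar looks safest\n"
    "2011-10-01 10:02,Fountainer14,Agreed\n"
)
messages_per_avatar = Counter(row[1] for row in csv.reader(log))
```

A tally like this gives a teacher a quick view of how evenly the team members participated in a discussion.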
The US team presented the final solution to their PBL project by constructing
the eco car of the future using prims (formally called primitives). Their avatars
prepared prims (three dimensional objects written by the Linden Script) in a space
referred to as a sandbox, located in front of the virtual classroom. A detailed de-
scription of the US team’s activity is provided.
The authors have previous experience of successfully carrying out problem based
learning activities in the virtual world [2 - 4]. Since PBL has been very useful in
real classrooms, they wondered if it would be an effective teaching tool in Meta-
verse. To find out, they tried a Pilot Study in 2009 for a PBL project in the virtual
world. This project was carried out in Second Life by student teams from the US
and Japan in virtual classrooms owned by Nagaoka University of Technology
(NUT). Each team worked independently on the same PBL project. They received
guidance from their teachers. The student teams were asked to solve the following
problem. What will the typical house look like in the near future, during the global
warming era? The participants from both countries enjoyed and successfully completed
this project. They communicated well using the chat function and built their
houses of the future with prims. The US house was a dome-shaped structure with

solar panels on the roof and a floor made of synthetic wood to preserve the Earth’s
trees. Japan’s team built an energy efficient dome-shaped house with a ceiling that
automatically opened so that cool breezes could flow through it. The results of the
Pilot Study clearly indicated that this kind of PBL class was possible for actual e-
learning in engineering education. Therefore, the researchers decided to pursue
their studies with two new PBL projects (The Virtual Eco Car Project by the US
team and The Nuclear Energy Safety Project in Metaverse by the Japan team).

2 Details of the Virtual Eco Car Project


This project took place in SL on an island owned by Nagaoka University of Tech-
nology (NUT) in Japan. Researchers at the University built virtual buildings con-
taining virtual classrooms so that the eco car project could be carried out in SL [5].
The classrooms included red chairs, tables, a podium, whiteboards, and a recording
box to collect chat dialogues. A team of three US students (16 years old and older)
was set up to carry out the virtual car project. See Figure 1, with a teacher and the
three students seated. At the same time, a group of three Japan students (16 years
old and older) was formed to independently carry out their project.

Fig. 1 US team members carry out chat discussions in a virtual class.
It should be mentioned that in order for students to carry out the eco car project in
SL, they need to perform various functions in the virtual world. To start, they
register and name an avatar to carry out activities on behalf of them. They make
their avatars move by using tasks such as walking, running, and flying. They use
the teleport function to transport their avatars to different locations in SL. The
students brainstorm, participate in group discussions, and make decisions (in the
virtual classroom) by using the chat function. Also they design and prepare prims
(three-dimensional objects, such as cubes, written in Linden script) in order to
build their car of the future.
To start, each team's instructor introduced the virtual project to them. The
Japan team members sat in the SL classroom, where their instructor introduced the
nuclear safety project problem to them. They needed to design a safe way for
using nuclear energy. See Figure 2.

Fig. 2 The Japan team is provided with some information about nuclear energy.

The US instructor used a Power Point presentation to present the car problem and
to provide general information about three types of cars (a solar car, a nuclear car,
and a fuel cell car). This team was to select one of the three cars to be their best eco
car of the future. See Figure 3.
Each team began their project (in the virtual classroom) by brainstorming and
holding discussions about possible solutions to the problem they were asked to
solve. The US team needed to select one type of car to design and build as their
eco car of the future. In order to make a decision they compared the three cars in
terms of energy efficiency, ecological friendliness, and safety.
The US team emphasized safety and felt overall that the best car would be a
solar car. They made some important points about the cars. They said the solar car
is an electric car powered by an available source of energy (the Sun); therefore,
our natural resources will not be depleted. They also said that the solar car was the
safest and that the technology for making it was already available. In regards to
the other two cars, the team agreed that the nuclear and fuel cell cars were energy
efficient but had safety issues. They were concerned about radiation and the
storage of wastes for a nuclear car and about the flammability of hydrogen for the
fuel cell car.

Fig. 3 The US instructor gives a Power Point presentation about the project to her team.
The next part of this project involved the designing of their solar car. The team
was asked to prepare a simple sketch of their car design by using special tablets
and Bamboo software. This required some practice on their part.
At the same time, the students practiced making prims which would be used to
make their eco car of the future. The students traveled to various locations in SL
(such as Natoma and the Ivory Tower) to obtain information about making prims.
Figure 4 shows the US students making prims.

Fig. 4 US student team members practice making prims.
During another meeting in the virtual classroom, the US team's simple car design
was displayed on the whiteboard for discussion purposes. See Figure 5. The stu-
dents decided to slightly modify the design by adding more solar panels. They
agreed to build a car that resembled this sketch. It would be green and have
wheels, solar panels, and a small passenger section in the front. (The battery for
storing energy would not be visible.)

Fig. 5 The US instructor examines her team's solar car design sketch.

Finally the students decided (as a result of
chat discussions) how they would build the car as a team. Avatar Fountainer14
built the car's body by making a long, green rectangular solid prim. Avatar
Alilovleylights built the small passenger section for the front of the car by starting
with a green hemisphere prim. Avatar Swimmywimmy15 made the wheels and
took the lead in making the solar panels. See Figure 6.
The students completed this project by placing solar panels (the flat blue items)
on top of their car. See Figure 7.

Fig. 6 The US team makes car parts using prims.

Fig. 7 The US team finishes building their solar car.



3 Project Results
The student teams in the US and Japan successfully completed their projects. The
US team designed and built virtual cars of the future. To obtain more information
about these Problem Based Learning (PBL) activities, each student was asked to
complete two questionnaires. Questionnaire (Part 1) and the US results for this
questionnaire are provided.

Questionnaire (Part 1)

1. Did you enjoy your overall activity?


#1: Very much #2: Pretty much #3: Neutral, #4: Not so much
#5: Not at all
2. Did you chat with other colleagues and the teacher effectively?
#1: Very much #2: Pretty much #3: Neutral, #4: Not so much
#5: Not at all
3. Did you feel that the discussion was easy?
#1: Very much #2: Pretty much #3: Neutral, #4: Not so much
#5: Not at all
4. Did you feel that the sketch making was easy?
#1: Very much #2: Pretty much #3: Neutral, #4: Not so much
#5: Not at all
5. Did you feel that the prim making was easy?
#1: Very much #2: Pretty much #3: Neutral, #4: Not so much
#5: Not at all
6. What was the most enjoyable?
#1: Avatar’s movement #2: Discussion #3: Sketch
#4: Prim Making #5: other ( )
7. What was the most difficult?
#1: Avatar’s movement #2: Discussion #3: Sketch
#4: Prim Making #5: other ( )
8. Was your teacher friendly to you?
#1: Very much #2: Pretty much #3: Neutral, #4: Not so much
#5: Not at all
9. Do you want to join such a project again in the future?
#1: Very much #2: Pretty much #3: Neutral, #4: Not so much
#5: Not at all
10. Please write your impressions, ideas, etc. freely.

The US team results for Questionnaire (Part 1) are provided in Figure 8.


The answers for Questions (1) – (9) are shown graphically as follows.

Fig. 8 Questionnaire (part 1) questions & responses of the 3 students



Fig. 8 (continued)

As for question 10, we got the following results. Student number 1: "It was fun
and enjoyable. We would like more time individually to play with and create the
car." Student number 2: "It was fun. We would like more time on the car and
more prims available." Student number 3: "We would like more time to
individually build the car."
The results show that the US team members enjoyed this activity and would be
interested in participating in another similar project. Overall they appeared com-
fortable performing functions (such as walking, making prims, etc.) in SL. The
students said it was easy drawing sketches and making prims. Two of them
thought that the avatar’s movement was the most enjoyable, while one liked prim
making the best. Participants expressed a need for more time to complete the
project. They all felt that discussion was the hardest task, even though they had
good brainstorming sessions in the virtual classroom. It may have been difficult
(using the chat function) because they had to think and type fast and be careful
about spelling errors. Also they had to be aware of other chat messages and read
them quickly before they disappeared. These results suggest that the students may
be more relaxed speaking than writing. The voice chat in SL is a good option for
them. Overall this was an exciting and successful project for all involved.

4 Conclusions
The Virtual Car Project had several goals for the students. The participants were
to learn about various car types, and to design and build a car of the future through
problem based learning (PBL) in Second Life (SL). Therefore, virtual classrooms
were built on a virtual island to determine the effectiveness of PBL in SL. For this
project, the students understood the lecture material presented in the virtual class-
room. Using the chat function, they actively discussed the possible car types in re-
gards to energy efficiency, ecological friendliness, and safety. As a team they
placed much emphasis on safety and decided to design and build a solar car for
their virtual car of the future.
The US team’s PBL project seemed to apply well to the e-learning environment
offered in Second Life. The participants enjoyed the activity, which was a great
exercise in engineering design. They successfully carried out the PBL project in
SL by using the chat function for discussions and by using prims for making their
PBL product: the eco car of the future. For this project, SL provided some benefits
to PBL. The virtual world appeared to be a relaxed and comfortable setting for
discussions and decision making activities. The virtual classroom was bright and
cheerful, and offered a private gathering place where the students could focus on
their PBL activity. Also decision making in the virtual world was quicker and eas-
ier than in the real world. To design and build a car in the real world would require
lots of time and money, along with major decisions like what materials to use for
the car, where to buy these materials, where to build the car, etc. In addition, the
virtual world allowed for creative decision making because it has fewer restrictions
(for example, in terms of time, money, and space). Students were free to expand
their thoughts and to consider more options (for possible solutions) to the problem
they were solving. Overall it can be said that this project confirmed the effec-
tiveness of PBL in SL.

References
1. http://en.wikipedia.org/wiki/Problem-based_learning
2. Barry, D.M., Kanematsu, H., Fukumura, Y.: Problem Based Learning in Metaverse (ED
512315), Education Resources Information Center, U.S (2010)
3. Barry, D.M., Kanematsu, H., Fukumura, Y., Ogawa, N., Okuda, A., Taguchi, R., Nagai,
H.: International Comparison for Problem Based Learning in Metaverse. In: Proceed-
ings of the ICEE and ICEER 2009 Korea (International Conference on Engineering
Education and Research), Seoul, Korea, pp. 59–65 (2009)

4. Kanematsu, H., Fukumura, Y., Barry, D.M., Sohn, S.Y., Taguchi, R., Farjami, S.: Vir-
tual Classroom Environment for MultiLingual Problem Based Learning with US, Ko-
rean, and Japanese Students. In: Proceedings of 2010 JSEE Annual Conference, Toho-
ku, Japan (August 2010)
5. Kanematsu, H., Fukumura, Y., Ogawa, N., Okuda, A., Taguchi, R., Nagai, H.: Practice
and Evaluation of Problem Based Learning in Metaverse. In: ED-MEDIA 2009 (World
Conference on Educational Multimedia, Hypermedia & Telecommunications), pp.
2862–2870. Association for the Advancement of Computing in Education, Honolulu
(2009)
Proposal of a Numerical Calculation
Exercise System for SPI2 Test
Based on Academic Ability Diagnosis

Shin'ichi Tsumori and Kazunori Nishino*

Abstract. This paper describes a concept for a calculation exercise system that
allows students who study for the SPI2 Test to practice numerical calculation
repeatedly. We aim to develop a system that generates questions dynamically for
each student in order to ease the burden on the teacher of preparing many original
questions. In such a case, in order to raise the learning effect, it is important to
measure a student's academic ability exactly and to distribute questions according
to that ability. In this paper, we propose a method to estimate students'
understanding based on an academic ability diagnostic test using item response
theory, and a method to control the questions to distribute using information on
the hierarchical structure among questions in each study unit.

Keywords: e-learning, SPI2 Test, numerical calculation exercise, academic ability
diagnosis, generation of questions, item response theory.

1 Introduction
In recent years, in many universities and colleges, the number of students who lack
fundamental academic ability in Mathematics is increasing, and this interferes with the
progress of lectures premised on an understanding of Mathematics. In addition, since
many companies now give a Mathematics test as part of their employment examinations,
it is difficult for students with low academic ability in Mathematics to pass them. The
college to which Tsumori belongs has been giving a class of remedial Mathematics to
freshman students. However, since the difference in academic ability between students
is large, it is difficult to raise learning effectiveness with the same curriculum.
In such a case, individualized learning adapted to each student's understanding is
more effective than simultaneous learning. Therefore, it

* Shin'ichi Tsumori · Kazunori Nishino
Kyushu Junior College of Kinki University,
1-5-30 Komoda-Higashi, Iizuka, Fukuoka, 820-8513 Japan
e-mail: tsumori@kjc.kindai.ac.jp, nishino@lai.kyutech.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 489–498.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
490 S. Tsumori and K. Nishino

seems that an e-learning system functions effectively. Since many students have an
insufficient understanding of the Mathematics learned at Elementary and Junior High
Schools, it is important for them to solve many fundamental numerical calculations
repeatedly. However, since they do not like to study "Mathematics" generally, it is
difficult for them to maintain motivation to study contents of the existing
"Mathematics" as is. Instead, their volition to study for an employment examination is
comparatively high. Therefore, it is expectable that the teaching materials made for
employment examinations raise their motivation for learning.
We therefore aim to improve students' mathematical ability using the
mathematics questions of the SPI2 Test, which is widely adopted as an
employment examination. The range of SPI2 Test questions is very wide: it
consists of questions that require academic ability from the upper grades
of elementary school to the lower grades of high school. Therefore, in
order to study efficiently, it is important to measure each student's
academic ability and to have him/her practice repeatedly with many
questions in the study units he/she does not understand well.
Many exercise systems that distribute questions according to academic
ability have been developed over the years. One typical method is to
choose a question suited to the student's academic ability from a
database that stores a large number of questions [1][2]. This method is
very effective if the difficulty levels of all questions in the database
have already been set using a method such as Item Response Theory (IRT).
However, since a teacher must prepare a large number of questions in
advance, it places a heavy burden on him/her. Another approach is to
generate questions automatically [3]-[10]. In this case, the problem is
how to evaluate the validity of the generated questions.
We therefore propose a calculation exercise system that proceeds in two
steps: academic ability diagnosis, followed by question exercises using
numerical calculation. In the first step, the student's understanding of
every study unit is measured from the result of a diagnostic test. Next,
the student repeatedly solves questions that are generated automatically
by our system. This paper explains the concept of a system with these
features.

2 What Is SPI2?
SPI (Synthetic Personality Inventory) is the name of an aptitude test
developed in 1974 by the present Recruit Co., Ltd., based on the MMPI
(Minnesota Multiphasic Personality Inventory) developed by the
University of Minnesota. The version of SPI used now is SPI2, developed
in 2005. The SPI2 Test is widely used in company employment
examinations: as of March 2011, 8,610 companies had adopted SPI2 and
about 1,230,000 people had taken it. There are two delivery systems for
the SPI2 Test: a paper-based test, which uses an answer sheet, and a
computer-based test (CBT). It is said that IRT is used in the CBT in
order to change the difficulty level of the questions according to
whether previous answers were right or wrong.
SPI2 consists of an achievement test and a personality test. Although
the contents and the range of questions of the achievement test are not
published, it includes two fields: a language field, which measures
language ability, and a non-language field, which measures mathematical
ability (the subject of this paper).

Proposal of a Numerical Calculation Exercise System for SPI2 Test 491

Fig. 1 Histogram of raw scores of the SPI2 trial test (horizontal axis:
score, 0-16; vertical axis: number of people)

Many of the questions in the non-language field are simple questions
that a student with fundamental mathematical ability can answer
correctly. However, in the results of our SPI2 trial test, even
questions of this level were difficult for junior college students.
Fig. 1 shows the histogram of raw scores of the SPI2 trial test, which
consists of 16 questions (Max: 8, Average: 4).

3 Studying Method with Our System

Many questions in the non-language field of the SPI2 Test are numerical
calculations, and their contents can be divided into tens of study
units, such as calculation of probability, calculation of speed,
calculation of concentration, and so on. Since the range of SPI2 Test
questions is wide, in order to raise the learning effect in limited time
it is important to learn only the study units the student does not
understand well. Incidentally, since the non-language field of the SPI2
Test requires answering 30 questions in about 40 minutes, each question
must also be solved within a short time.
In this paper, learning proceeds in the following two steps.
(1) Academic ability diagnosis
First, all students take a diagnostic test that measures their academic
ability in the non-language field of SPI2. A student who is going to use
the calculation exercise system needs to take this test in advance. The
diagnostic test consists of questions from every study unit, and the
same test is given to all students.
After the test, the difficulty level of each question and the academic
ability of each student are calculated using IRT.
(2) Question exercises using numerical calculation
After a student chooses the study unit he/she wants to learn, the
question exercise begins at a level (High, Middle, or Low) according to
his/her academic ability. The system generates a question automatically
using a question template and presents it to the student. It also raises
or lowers the level of the generated questions according to whether the
student's answers are right or wrong. The student finishes exercising a
study unit when he/she can solve high-level questions in that unit, and
finishes learning with this system when the exercises of all study units
are completed.

4 Academic Ability Diagnosis

A teacher creates the test questions used for academic ability diagnosis
in advance, based on the questions in the actual SPI2 Test. It is
desirable that the level of all questions in the diagnostic test be set
to medium. All questions are numerical calculations, and the created
questions are registered in a learning management system (LMS). Students
access the web page of the academic ability diagnostic test using a web
browser and solve the questions.
After the test, the parameters of both the students' academic ability
and the difficulty of every question are calculated using the
one-parameter logistic model of IRT. The PROX method (approximation
procedure) is used for parameter estimation. There are other methods of
parameter estimation, such as the maximum likelihood method. However,
since academic ability only needs to be divided into three levels (High,
Middle, and Low), the necessity for a high-precision estimation method
is low. We considered the simplicity of calculation important and
adopted the PROX method.
The outline of the parameter estimation procedure of the PROX method is
shown using Table 1. For simplicity of explanation, it is assumed that
the test consists of six questions and that seven students took the test.

Table 1 Parameter estimation using the PROX method

Student                Question ID
ID          1       2       3       4       5       6       Lc       θ
1           1       0       0       0       1       0     -0.695  -0.940
2           1       1       1       0       1       0      0.695   0.940
3           0       0       1       0       0       0     -1.607  -2.174
4           1       0       0       0       1       0     -0.695  -0.940
5           1       0       0       0       0       1     -0.695  -0.940
6           1       1       1       0       0       1      0.695   0.940
7           1       1       1       1       0       1      1.607   2.174
Li       -1.791   0.286  -0.286   1.791   0.286   0.286   unbiased estimates
IIC      -1.886   0.191  -0.381   1.696   0.191   0.191   of variance: U: 1.338, V: 1.252
bj       -2.525   0.256  -0.510   2.271   0.256   0.256   expansion factors: X: 1.353, Y: 1.339

1) Setting up right-or-wrong information

1 (correct) or 0 (incorrect) is set as the right-or-wrong information
for every question. Table 1 shows that the student with ID 1 answered
questions 1 and 5 correctly and answered the other questions incorrectly.
2) Calculation of logit incorrect (written as "Li" in Table 1)
The logit incorrect, calculated from the right-or-wrong information, is
an index of the difficulty of an item (question). If the rate of correct
answers to an item is p, it is calculated by the following formula:

    Logit incorrect = log((1 - p) / p)

3) Calculation of initial item calibration (written as "IIC" in Table 1)

Since the logit incorrect has the problem that it changes with the level
of the students' academic ability, the average value of the logit
incorrect is subtracted from each logit incorrect. The result is called
the initial item calibration.
4) Calculation of logit correct (written as "Lc" in Table 1)
The logit correct is an index of a student's academic ability. If the
rate of correct answers of a student is p, it is calculated by the
following formula:

    Logit correct = log(p / (1 - p))
5) Calculation of the unbiased estimates of variance of the logit
incorrect and the logit correct
Each unbiased estimate of variance of values x_1, ..., x_n with mean x̄
is calculated by

    s² = Σ (x_i - x̄)² / (n - 1)

The unbiased estimates of variance of the logit incorrect and the logit
correct are denoted by U and V, respectively.
6) Calculation of expansion factors
The expansion factors are used to keep the variances of both the
students' academic ability and the difficulty levels of the questions
from changing with the sample. The expansion factor of academic ability
(X) and the expansion factor of question difficulty (Y) are calculated
from the unbiased estimates of variance U and V by the following
formulas:

    X = sqrt((1 + U / 2.89) / (1 - U·V / 8.35)),
    Y = sqrt((1 + V / 2.89) / (1 - U·V / 8.35))

7) Calculation of each student's academic ability and each question's
difficulty level

Each student's academic ability θ is obtained by multiplying his/her
logit correct by the expansion factor X. The difficulty level b of each
question is obtained by multiplying its initial item calibration by the
expansion factor Y.
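The steps above can be sketched compactly in code. This is our own
illustration (function and variable names are ours, not part of the
original system), assuming no student or question has an all-correct or
all-incorrect response pattern, for which the logits would be undefined:

```python
import math

def prox(responses):
    """PROX estimation of student abilities (theta) and question
    difficulties (b) from a 0/1 right-or-wrong matrix, following
    steps 1)-7). responses[i][j] is 1 if student i answered
    question j correctly."""
    n_students, n_items = len(responses), len(responses[0])

    # 2) logit incorrect of each item: log((1 - p) / p)
    p_item = [sum(row[j] for row in responses) / n_students
              for j in range(n_items)]
    li = [math.log((1 - p) / p) for p in p_item]

    # 3) initial item calibration: subtract the average logit incorrect
    mean_li = sum(li) / n_items
    iic = [x - mean_li for x in li]

    # 4) logit correct of each student: log(p / (1 - p))
    lc = [math.log((sum(row) / n_items) / (1 - sum(row) / n_items))
          for row in responses]

    # 5) unbiased estimates of variance U (items) and V (students)
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    u, v = var(iic), var(lc)

    # 6) expansion factors (2.89 = 1.7 ** 2 and 8.35 = 2.89 ** 2)
    denom = 1 - u * v / 8.35
    x = math.sqrt((1 + u / 2.89) / denom)
    y = math.sqrt((1 + v / 2.89) / denom)

    # 7) final estimates
    theta = [x * t for t in lc]
    b = [y * d for d in iic]
    return theta, b
```

Applied to the right-or-wrong matrix of Table 1, this sketch reproduces
its values up to rounding, e.g. θ ≈ -0.940 for student 1 and b ≈ -2.525
for question 1.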
By the above method, a student's academic ability θ and the difficulty b
of each question can be calculated. Table 2 shows the probabilities P
that a student whose academic ability is θ answers correctly a question
whose difficulty is b. These are calculated using the following formula
for the probability of a correct answer in the one-parameter logistic
model:

    P(θ) = 1 / (1 + exp(-(θ - b)))
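As a minimal sketch of this formula (the function name is ours), the
entries of Table 2 can be reproduced as follows:

```python
import math

def p_correct(theta, b):
    """One-parameter logistic model: probability that a student of
    ability theta answers a question of difficulty b correctly."""
    return 1 / (1 + math.exp(-(theta - b)))

# Student 1 (theta = -0.940) and question 1 (b = -2.525), as in Table 2:
print(round(p_correct(-0.940, -2.525), 2))  # → 0.83
```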
Unfortunately, with this method the estimated difficulty of a question
may change with each diagnostic test, contrary to the idea of IRT. In
practice, we do not expect that enough students will take a diagnostic
test at one time; we would therefore need to repeat the tests until
reliable difficulty estimates are obtained for each question.
The shaded cells in Table 2 show the questions that the student actually
answered correctly. Although the prediction is mistaken for some cells,
it can be said that the parameter estimation using the PROX method is
almost appropriate.
In this research, the initial question level of each study unit is
determined as follows using the correct-answer probability P of Table 2:
- P < 0.3: questions of a level lower than the diagnostic test
- 0.3 ≤ P ≤ 0.7: questions of a level equivalent to the diagnostic test
- P > 0.7: questions of a level higher than the diagnostic test
The two criterion values 0.3 and 0.7 are tentative. They may be changed
appropriately after a number of experiments with students.

Table 2 Probability that a student answers a question correctly (P)

                                     bj
Student     θ        1       2       3       4       5       6
ID               -2.525   0.256  -0.510   2.271   0.256   0.256
1        -0.940    0.83    0.23    0.39    0.04    0.23    0.23
2         0.940    0.97    0.66    0.81    0.21    0.66    0.66
3        -2.174    0.59    0.08    0.16    0.01    0.08    0.08
4        -0.940    0.83    0.23    0.39    0.04    0.23    0.23
5        -0.940    0.83    0.23    0.39    0.04    0.23    0.23
6         0.940    0.97    0.66    0.81    0.21    0.66    0.66
7         2.174    0.99    0.87    0.94    0.48    0.87    0.87

For example, the student with ID 1 starts to exercise questions at a
level:

- lower than the diagnostic test for study units 2, 4, 5, and 6,
- equivalent to the diagnostic test for study unit 3, and
- higher than the diagnostic test for study unit 1.
The student with ID 1 answered the question of study unit 5 correctly.
However, he/she may not answer other questions of the same level in
study unit 5 correctly, because his/her P is lower than the criterion
value. Therefore, he/she needs to start exercising at a low level.

5 Question Generation Method

Table 3 Example of a question template

attribute          value
Question ID        7
Question sentence  What percentage of salt water solution will be made
                   if (p1)g of water is added to (p2)g of (p3)% salt
                   water solution?
Level              Middle
Parameter set      (50, 100, 3, 2), (60, 100, 4, 2.5), (50, 200, 5, 4), …
Parent ID          11
Child ID           3, 4, 6

A question is generated using a question template registered in the
question-template database.
An example of a question template is shown in Table 3. The contents of
each attribute of a template are as follows.

(a) What percentage of salt water solution will be made if 300g of 4%
    salt water solution is added to 100g of 10% salt water solution?
        100 * 10 / 100 = 10
        300 * 4 / 100 = 12
        10 + 12 = 22
        22 / (100 + 300) * 100 = 5.5

(b) What percentage of salt water solution will be made if 150g of water
    is added to 100g of 10% salt water solution?
        100 * 10 / 100 = 10
        10 / (100 + 150) * 100 = 4

(c) How many grams of salt is contained in the 100g of 10% salt water
    solution?
        100 * 10 / 100 = 10

Fig. 2 Layer of questions

- Question ID
The ID of the question template.
- Question sentence
The question sentence given to a student. (px), x = 1, 2, 3, ..., is a
parameter whose value is determined at the time of generating a question.
- Question level
The difficulty level of the question, expressed in three levels (High,
Middle, or Low).
- Parameter set
The set of the numerical values given to a question sentence and the
numerical value of the correct answer. A question template has several
parameter sets, and one set is chosen at the time of generating the
question. For example, when (50, 100, 3, 2) is chosen, p1=50, p2=100,
and p3=3 are set in the question sentence, and the correct answer is 2.
- Parent ID
The ID of the question whose solution contains the solution of this
question.
- Child ID
The IDs of the questions used as partial solutions of this question.
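Instantiating such a template can be sketched as follows; the dictionary
layout and function name are our own illustration of Table 3, not the
system's actual data format, which stores templates in the
question-template database:

```python
import random

# Hypothetical in-memory representation of the template of Table 3.
template = {
    "id": 7,
    "sentence": ("What percentage of salt water solution will be made if "
                 "{p1}g of water is added to {p2}g of {p3}% salt water "
                 "solution?"),
    "level": "Middle",
    "parameter_sets": [(50, 100, 3, 2), (60, 100, 4, 2.5), (50, 200, 5, 4)],
    "parent_id": 11,
    "child_ids": [3, 4, 6],
}

def generate_question(t):
    """Choose one parameter set at random and instantiate the question
    sentence; the last value of the chosen set is the correct answer."""
    *params, answer = random.choice(t["parameter_sets"])
    values = {f"p{i + 1}": v for i, v in enumerate(params)}
    return t["sentence"].format(**values), answer

sentence, answer = generate_question(template)
```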
Fig. 2 shows three questions (a), (b), and (c) and their solutions. As
Fig. 2 shows, the solution of question (b) includes the solution of
question (c) and needs one more calculation. Similarly, the solution of
question (a) contains the solution of question (b). Therefore, the three
questions are (a), (b), (c) in order of difficulty, and the layer of
solutions of the questions shown in Fig. 2 is defined. That is, the
following relations are assumed:
- A student who answers a certain question correctly can correctly
answer all the questions at levels lower than that question.
- A student who answers a certain question incorrectly cannot correctly
answer any of the questions at levels higher than that question.
All the questions in a study unit are then defined hierarchically, and a
question-template database like Fig. 3 is created. Each node of Fig. 3 is
a question, and the number in a node is the question ID explained in
Table 3. All links between nodes can be expressed by setting up the
parent IDs and child IDs of Table 3.

Fig. 3 Layer which consists of question templates: within a study unit,
templates 11 and 12 form the High level, templates 7-10 the Middle
level, and templates 1-5 the Low level; the question-template database
holds such a layer for every study unit (study units 1, 2, 3, ...)

All the nodes in Fig. 3 are divided into three levels (High, Middle, and
Low). Although a clear rule for dividing the levels is not specified, one
view is to divide them by the number of formulas required to solve the
question.

6 The Flow of a Question Exercise and Student Model


The flowchart of a question exercise is shown in Fig. 4. Fundamentally,
a template that has not yet been shown is chosen from the question
templates of the level to which the student currently belongs (for
example, "Middle"), and a question is generated by setting a parameter
set to it. If the student correctly answers all the questions of the
level to which he/she belongs, he/she moves up to the higher level. On
the other hand, when the student gives an incorrect answer to a question,
the learning status of the child templates of that question template is
returned to "not learned" (the "learned" status is unset), the student's
level is lowered by one, and an easier question is set.

Fig. 4 Flow chart of a question exercise: if all questions in the study
unit are "learned," the exercise ends; otherwise, if all questions at the
current level are "learned," the question level is raised; otherwise, a
question is generated using a template with an unset understanding status
and given to the student; if the answer is correct, the status of both
this question and its children is set to "learned"; if it is incorrect,
the status of the question's child templates is unset and the question
level is lowered; then the first check is repeated.

A student's understanding is expressed by an overlay student model,
defined as the pairs of each question and its understanding status
("learned" or "not learned").
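The loop of Fig. 4 together with the overlay student model can be
sketched as follows. The data structures and names are our own; the
`ask` callback stands in for question generation and the student's
answer, and for brevity the sketch assumes the student eventually
answers each question correctly:

```python
LEVELS = ["Low", "Middle", "High"]

def exercise(templates, start_level, ask):
    """Run the question exercise of one study unit (cf. Fig. 4).

    templates: dict mapping template ID to {"level": ..., "children": [...]}
    ask: callback taking a template ID and returning True when the
         student answers the generated question correctly.
    Returns the overlay student model: the set of "learned" template IDs.
    """
    learned = set()
    level = LEVELS.index(start_level)
    while len(learned) < len(templates):
        candidates = [i for i, t in templates.items()
                      if t["level"] == LEVELS[level] and i not in learned]
        if not candidates:
            level = min(level + 1, len(LEVELS) - 1)   # raise the level
            continue
        qid = candidates[0]
        if ask(qid):
            # correct: this question and its children become "learned"
            learned.add(qid)
            learned.update(templates[qid]["children"])
        else:
            # incorrect: unset the children and lower the level
            learned.difference_update(templates[qid]["children"])
            level = max(level - 1, 0)
    return learned
```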

7 Conclusion
We have explained the outline of a calculation exercise system for the
numerical calculations set in the SPI2 Test. With this method, a student
performs question exercises based on the result of a diagnostic test for
every study unit. Therefore, the student can avoid spending learning
time on the study units he/she already understands thoroughly and can
instead learn the study units he/she understands insufficiently.
Furthermore, since the method does not depend on a specific study field
or study unit, one of its features is that it is also applicable to
numerical calculations other than SPI2 by the addition of templates. We
plan to develop this system as a web-based system and to verify its
validity.

Acknowledgement. This work was supported by a Grant-in-Aid for Scientific Research (C)
(23501191).

References
1. Huang, S.X.: A Content-Balanced Adaptive Testing Algorithm for Computer-Based
Training Systems. In: Lesgold, A.M., Frasson, C., Gauthier, G. (eds.) ITS 1996. LNCS,
vol. 1086, pp. 306–314. Springer, Heidelberg (1996)
2. Suganuma, A., Mine, T., Shoudai, T.: Automatic Generating Appropriate Exercises
Based on Dynamic Evaluating both Students’ and Questions’ Levels. In: Proceedings of
World Conference on Educational Multimedia, Hypermedia and Telecommunications,
pp. 1898–1903 (2002)
3. Hoshino, A., Nakagawa, H.: A real-time multiple-choice question generation for
language testing - a preliminary study. In: Proceedings of the 2nd Workshop on Building
Educational Applications Using NLP, pp. 17–20 (2005)
4. Mitkov, R., Ha, L.A.: Computer-Aided Generation of Multiple-Choice Tests. In:
Proceedings of the HLT-NAACL 2003 Workshop on Building Educational
Applications Using Natural Language Processing, vol. 2, pp. 17–22 (2003)
5. Holohan, E., Melia, M., McMullen, D., Pahl, C.: The Generation of E-Learning Exercise
Problems from Subject Ontologies. In: Proceedings of The Sixth IEEE International
Conference on Computer and Information Technology, pp. 967–969 (2006)
6. Holohan, E., Melia, M., McMullen, D., Pahl, C.: Adaptive E-Learning Content
Generation based on Semantic Web Technology. In: AI-ED 2005 Workshop 3, SW-EL
2005: Applications of Semantic Web Technologies for E-Learning, pp. 29–36 (2005)
7. Gonzalez, J.A., Munoz, P.: E-status: An Automatic Web-Based Problem Generator -
Applications to Statistics. Computer Applications in Engineering Education 14(2),
151–159 (2006)
8. Guzmán, E., Conejo, R.: A Model for Student Knowledge Diagnosis Through Adaptive
Testing. In: Lester, J.C., Vicari, R.M., Paraguaçu, F. (eds.) ITS 2004. LNCS, vol. 3220,
pp. 12–21. Springer, Heidelberg (2004)
9. Lazcorreta, E., Botella, F., Fernandez-Caballero, A.: Auto-Adaptive Questions in
E-Learning System. In: Proceedings of the Sixth International Conference on Advanced
Learning Technologies, pp. 270–274 (2006)
10. Tsumori, S., Kaijiri, K.: System Design for Automatic Generation of Multiple-Choice
Questions Adapted to Students’ Understanding. In: Proceedings of the 8th International
Conference on Information Technology Based Higher Education and Training, pp.
541–546 (2007)
Proposal of an Automatic Composition Method
of Piano Works for Novices Based on an
Analysis of Study Items in Early Stages of Piano
Education

Mio Iwaki, Hisayoshi Kunimune, and Masaaki Niimura

Abstract. This paper proposes an automatic composition method of piano
works for novices, because there is a shortage of piano works suitable
for the various learning stages in existing textbooks. First, this study
analyzed and systematized the study items in the learning process of the
textbooks, and confirmed that the study items are adequate to represent
the learning stage of each piano work in the textbooks. This study then
defined restrictions on piano works for novices according to the
learning stages. Finally, this study proposes an automatic composition
method based on the learning stages and their restrictions. This paper
describes these study items, their restrictions, and the proposed
method. Additionally, this paper shows an example of a piano work
composed with the proposed method.

Keywords: Piano works, novices, learning stages, study items, automatic compo-
sition method.

1 Introduction
Piano learners in early stages use textbooks for novices in their piano
practice. We surmise that the authors of the textbooks compose a piano
work for novices based on the knowledge and skill which the novices have
already acquired and which the author would like them to learn in the
work. The textbooks include various types of piano works suitable for
novices in various learning stages; however, it is difficult to make
enough progress in practice with only such textbooks because of a
shortage of works for some stages.

Mio Iwaki
Interdisciplinary Graduate School of Science and Technology, Shinshu University, Japan
Hisayoshi Kunimune
Faculty of Engineering, Shinshu University, Japan
e-mail: kunimune@cs.shinshu-u.ac.jp
Masaaki Niimura
Graduate School of Science and Engineering, Shinshu University, Japan

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 499–509.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

Novices in a learning stage need to acquire knowledge and skill by
playing some works suitable for the stage. This study thus proposes an
automatic composition method of piano works for novices based on the
knowledge and skill that a learner has already acquired and the items to
learn in the composed work. The works composed with the proposed method
become supplementary works for the practice of the learner.
Firstly, this paper classifies and systematizes the study items in the
early stages of piano practice to define the general curriculum.
Secondly, this paper organizes the basic components of piano works for
novices (rhythm, chord progression, playing style, accompaniment
patterns, and so on) based on the study items. Finally, this paper
proposes an automatic composition method of piano works.

2 Related Works
Kitamura et al. propose a method of composing melodies that include a
learner's weak point, for novices' self-learning [3]. This method
produces sheets of music that consist of five measures. Every measure of
a produced sheet includes two notes at the same interval. A learner
chooses an interval from the I to the V degree as his/her weak point to
produce a sheet. The method composes five one-measure melodies including
the chosen interval and combines these melodies to produce a sheet.
Yoshida et al. suggest a method of producing tunes like 'Hanon'
containing exercises suitable for a learner, which are found from the
learner's piano performance [7]. This method analyzes the performance
and detects the learner's weak points.
The research of Kitamura et al. and our study are similar in that they
limit the range based on a learner's level. However, these related works
focus on composing melodies suitable for practicing the weak points of
his/her finger movements. Moreover, these works intend their methods for
self-learners of various ages.
On the other hand, McCormack proposes adapting the L-system for
automatic composition based on a grammar [4]. This method can compose
various pieces of music; however, the work does not focus on learning.
Thus, we should define another grammar to compose pieces of music based
on the learning stages of learners.

3 Classifying and Systematizing Study Items

Each textbook has its original aim and curriculum for the lessons, which
are empirically established by the author of the textbook. The common
characteristics among the textbooks are that (1) the texture of almost
all of the works is homophony, and (2) new kinds of the frames composing
a work (meters, ranges, playing styles, kinds of notes and rests, and
tonalities) are introduced depending on the learning stage of the work.
This study has classified and systematized the study items of the early
stages of piano learning in these textbooks based on these common
characteristics.

3.1 Study Items in the Early Stages

Figure 1 shows the hierarchical structure of the study items in the
early stages of piano learning, divided into three viewpoints:
"knowledge," "skill," and "perception."
"Knowledge" consists of the knowledge to learn in early-stage piano
lessons: the frames of works, the relationship between kinds of notes
and note values, and the correspondence between the notes on sheet music
and the fingering on pianos. "Skill" contains the skills needed to play
piano works in the early stages: the number of fingers pressing keys
simultaneously, fingering techniques, and so on. "Perception" means the
sensuous aspects of music, such as the sense of music, expressive power,
and so on. The goal of piano learners in the early stages is to acquire
"Knowledge," "Skill," and "Perception" through playing piano works.

Fig. 1 Study items in the early stages of piano learning: a tree whose
root branches into "Knowledge" (frames such as meters and tonality, the
relationship between notes and note values, and the correspondence
between notes and fingering), "Perception," and "Skill" (the number of
fingers to play, fingering techniques, etc.)

Fig. 2 The model of piano education in the textbook "Piano Dream" (RH =
right hand, LH = left hand): the study items are arranged into learning
stages, from the middle-C position (RH and LH single tones within fifth
intervals; 3/4 and 4/4 meters), through the five-fingers position (RH:
C4-G4, LH: F3-C4; RH single tones with LH double notes) and the
extension of the five fingers (RH: A4-D5, LH: C3-E3; RH single tones
with LH chords; 2/4 meter), up to more keys and extensions of both hands
(RH: E5-C6; 3/8 and 6/8 meters)



We do not treat the items in "Perception" in this study because they are
too vague to classify.

3.2 Model of Piano Education in the Early Stages

We confirmed that the study items mentioned above can be applied to the
curricula of various textbooks. Figure 2 shows the model of the
curriculum of the textbook "Piano Dream," which is the most popular
textbook in Japan.

4 Restrictions for Composing Piano Works for Novices

Piano works in the textbooks for novices are composed based on the study
items mentioned above. This study treats only works with the following
restrictions for automatic composition, because we surveyed the
textbooks for novices and found a lack of works with these restrictions.
• Obeying functional harmony
– including only 11 harmonies: I-VII, V7, ◦I, IV/V7, V/V7, VI/V7
– derived notes are not used
• Homophony
– with single tones in melodies
– with single tones, double notes, chords, or broken chords in
accompaniment patterns
• Every phrase is four measures long, and the last measure of each
phrase is a perfect cadence or half cadence
• One-part, binary, or ternary form
• Values of notes: 1, 2, 3, 4, 1/4, 1/2, 3/4, 3/2, 7/4, and triplet
• Values of rests: 1, 2, 4, 1/4, and 1/2
• No ties and ornaments in melodies
• No dynamics marks and expression marks
• Five-fingers position or its extensions
– All notes in melodies and accompaniments are within fifth to seventh
intervals.

4.1 Restrictions on Composition Methods

Homophonic musical pieces following functional harmony have some
restrictions on composition methods, and there are additional
restrictions for composing music under the above-mentioned restrictions.
We organized the study items related to the usable notes, rests, and
their values, which are decided by the learning stage, and the usable
pitches and chords based on the restrictions. We also organized and
analyzed the accompaniment patterns, which are decided by these items.
The usable values of rests and notes in a learning stage restrict the
usable rhythm patterns in composed works. The usable notes, decided from
the study items ("correspondence between notes and fingering," "number
of fingers to play," and "position of fingers"), affect the usable
chords and chord progressions in composed works. Thus, the study items
of a learner restrict the melody and the accompaniment pattern of a
composed work.

4.2 Rhythms
Learning of rhythm follows the learning progress in understanding notes,
rests, and their values. Thus, the usable notes and rests in a piano
work are decided by the designated learning stage of the work. There is
a vast number of combinations of notes and rests; however, we narrowed
the combinations down by surveying the combinations actually used in the
textbooks for novices.
We selected three textbooks ("Beyer Op.101" [2], "Methode Roses" [6],
and "Piano Dream" [5]) that include many homophonic works, and we
surveyed the combinations used in 162 works in these textbooks. These
works consist of 87 works in 4/4, 63 works in 3/4, and 12 works in 2/4.
We counted the combinations in every measure of the melodies and
accompaniments in these works and divided these combinations into
combinations usually used for (1) melodies, (2) accompaniments, and (3)
cadences.
For example, Table 1 shows the usually used combinations of notes and
rests with values one and two at 4/4 in these textbooks.

Table 1 Example of used rhythm patterns with values one and two at 4/4.

Rhythm          (six patterns of half and quarter notes and rests)
Melody           ◦   ◦   ◦   ◦   ◦   ◦
Accompaniment    ◦   ◦   ◦   ◦   ◦   ◦
Cadence          -   -   ◦   ◦   ◦   ◦
(◦: Used, -: Not used)
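The enumeration of candidate rhythm patterns for one measure, before
filtering against the combinations actually observed in the textbooks
(Table 1), can be sketched as follows (our own illustration; values are
in quarter-note beats):

```python
def patterns(values, beats=4):
    """All ordered sequences of note/rest values that exactly fill
    `beats` beats of a measure."""
    result = []
    def extend(seq, total):
        if total == beats:
            result.append(tuple(seq))
        elif total < beats:
            for v in values:
                extend(seq + [v], total + v)
    extend([], 0)
    return result

# Patterns built only from values one and two in a 4/4 measure:
print(patterns([1, 2]))
# → [(1, 1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 2)]
```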

4.3 Chord Progression

The subject piano works in this study have the following restrictions,
as mentioned above:
• following functional harmony,
• five-fingers position or its extensions, and
• having intervals within fifth to seventh.
Piano works composed with these restrictions have limitations on the
chords in each work. For example, works in C major whose range is within
fifth intervals can express only five chords: I, V7, ◦I, V/V7, and
VI/V7. The other chords (II, III, IV, VI, VII, and IV/V7) in such a work
have the same combinations of notes as I, IV, or V. Table 2 shows the
expressible chords in works in C major with each interval and playing
style.

Table 2 Expressible chords in C-major with each interval and playing style.

                                                 Chord
Range                 Playing style  I  II III IV V7 VI VII ◦I IV/V7 V/V7 VI/V7
C-G fifth interval    Single tone    ◦  -  -   ◦  ◦  -  ◦   ◦  -     ◦    ∗
                      Double notes   ◦  -  -   ◦  ◦  -  ◦   ◦  -     ◦    ∗
                      Chord          ◦  -  -   -  ◦  -  ◦   ◦  -     ∗    ∗
C-A sixth interval    Single tone    ◦  ◦     ◦  ◦       ◦     ◦    ◦
                      Double notes   ◦  ◦     ◦  ◦       ◦     ◦    ◦
                      Chord          ◦  ◦     ◦  ◦       ◦     ◦    ◦
C-B seventh interval  Single tone    ◦  ◦  ◦   ◦  ◦  ◦  ◦   ◦  ◦     ◦    ◦
                      Double notes   ◦  ◦  ◦   ◦  ◦  ◦  ◦   ◦  ◦     ◦    ◦
                      Chord          ◦  ◦  ◦   ◦  ◦  ◦  ◦   ◦  ◦     ◦    ◦
(◦: Expressible, -: Not expressible, : Cannot exist together in the same
work, ∗: No following chords or unused inverted chords)

4.4 Accompaniment Patterns

Accompaniment patterns consist of their playing styles (single tone,
double notes, chords, and broken chords) and their rhythms. The piano
works treated in this study are homophonic, and the works have an
accompaniment part. All accompaniment patterns in a work have the same
playing style and rhythm in every measure except the measures of cadence
or half cadence. It is rare that two or more accompaniment patterns
coexist in a work to enhance the mood of the work or to aim at special
effects.
We organized the accompaniment patterns based on the playing styles and
ranges. Table 3 shows an example of the usable combinations of pitches
in accompaniment patterns.

Table 3 Example of usable combinations of pitches in accompaniment patterns.

                               Range
Chord  Playing style    fifth interval    sixth interval    seventh interval
I      single tone      C, E, or G
       double notes     CE, EG, or CG
       chord            CEG
       broken chord     C-G-E-G or C-E-G
II     single tone      -                 D, F, or A
       double notes     -                 DF, FA, or DA
       chord            -                 DFA
       broken chord     -                 D-A-F-A or D-F-A
(-: Impossible to play or expressing other chord)
Proposal of an Automatic Composition Method of Piano Works for Novices 505

4.5 Chord-Tone Connection


The chord progression of a homophonic musical piece affects its melody [1]. In
the piano works in the textbooks, chord-tones tend to be assigned to the downbeats
of the melody. We thus decided to place chord-tones on the downbeats of a melody.
The downbeats in each meter are as follows:
• First and third beats in works in quadruple time (4/4),
• First, second, and third beats in works in triple time (3/4 and 3/8), and
• First and second beats in works in double time (2/4).
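The downbeat list above can be captured in a small helper; the sketch below is our own illustration (the function name and the meter strings are assumptions, not the paper's notation):

```javascript
// Downbeat positions for each meter, following the list above.
function downbeats(meter) {
  switch (meter) {
    case '4/4':             return [1, 3];     // quadruple time
    case '3/4': case '3/8': return [1, 2, 3];  // triple time
    case '2/4':             return [1, 2];     // double time
    default: throw new Error('unsupported meter: ' + meter);
  }
}
```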
If notes must be placed between downbeats, we make them nonharmonic tones
that connect the surrounding chord-tones. Where a nonharmonic tone cannot be
placed, the seventh or ninth tone is used instead of a chord-tone. We decided that
neighboring downbeats should lie within a fourth of each other, because this makes
it easy to insert nonharmonic tones.
We also organized chord-tone connections based on these criteria. Table 4 shows
an example of the connections among four notes N1-n2-n3-N4, where N1 and N4
are harmonic tones on downbeats and the interval between them is a unison (first).

Table 4 Example of connection between chord-tones.

Interval between notes                 Examples
N1 and n2   n2 and n3   n3 and N4
first       first       first          C-C-C-C
first       second      second         C-C-D-C, C-C-B-C
second      second      first          C-D-C-C, C-B-C-C
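Connections like those in Table 4 can be generated mechanically. The sketch below is our own illustration (the names SCALE and connect are assumptions): it repeats or decorates the tone when the two downbeats are in unison, and otherwise fills the gap with stepwise passing tones.

```javascript
// Sketch of chord-tone connection (Sect. 4.5) on the C-major scale.
const SCALE = ['C', 'D', 'E', 'F', 'G', 'A', 'B'];

function connect(n1, n4) {
  const i1 = SCALE.indexOf(n1), i4 = SCALE.indexOf(n4);
  // the method keeps neighboring downbeats within a fourth
  if (Math.abs(i4 - i1) > 3) throw new Error('downbeats must lie within a fourth');
  if (i1 === i4) {
    // unison: repeat the tone or decorate it with an upper/lower neighbor
    return [
      [n1, n1, n1, n4],                  // e.g. C-C-C-C
      [n1, n1, SCALE[(i1 + 1) % 7], n4], // e.g. C-C-D-C
      [n1, n1, SCALE[(i1 + 6) % 7], n4], // e.g. C-C-B-C
    ];
  }
  // otherwise fill the span stepwise with passing (nonharmonic) tones
  const step = i4 > i1 ? 1 : -1;
  const tones = [n1];
  for (let i = i1 + step; i !== i4; i += step) tones.push(SCALE[i]);
  tones.push(n4);
  return [tones];
}
```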

4.6 Form
All phrases in the subject works are four measures long and end in a perfect cadence
or a half cadence. All of the works are in one-part, binary, or ternary form, as follows:
• One-part form: AA or A’A
• Binary form: AB or A’B
• Ternary form: ABA, AB’A, or A’B’A
A, A’, B, and B’ indicate phrases. A and B end in perfect cadences, and A’ and B’
end in half cadences.
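The forms and the cadence rule can be encoded directly. In this sketch (our own encoding; the apostrophe marking and the object keys are assumptions), a phrase written with an apostrophe ends in a half cadence:

```javascript
// The three forms of Sect. 4.6 and the cadence of each phrase.
const FORMS = {
  onePart: ['AA', "A'A"],
  binary:  ['AB', "A'B"],
  ternary: ['ABA', "AB'A", "A'B'A"],
};

// A and B end in perfect cadences; A' and B' end in half cadences.
function cadences(form) {
  return form.match(/[AB]'?/g)
             .map(p => (p.endsWith("'") ? 'half' : 'perfect'));
}
```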

4.7 Practice Point


Supplementary works for novices contain practice points, each of which is a
sequence of two or three notes on designated pitches in a measure. These pitches affect the
chord assigned to the measure, and at least one note in the sequence becomes a
chord-tone.
We organized the relationship between the chord assigned to a measure
containing a practice point and the pitches in the practice point’s sequence. Table 5
shows an example of these relationships in a work in C major that has a fifth-interval
range and a chord pattern in its accompaniment.

Table 5 Example of the relationship between the chords and pitches in practice points.

                          Chord-tone(s)
Sequence of two notes   Both notes   The first note   The second note
C-C                     I            I                I
C-D                     -            I                V
C-E                     I            I                V
C-F                     -            I                I
C-G                     I            I                I and V
C-A                     -            I                -
C-B                     -            I                V
(-: No chord satisfies the condition)

5 Process and Rules to Determine Melody and Accompaniment Patterns

This section describes a method for determining the melody and accompaniment
patterns of a piano work. The method composes the piano work based on the study
items defined in the previous section and on the practice points in the composed
work, following the three steps below.
Step 1: Choose the bases of the piano work: range, note and rest values, number
of measures, meter, form, tonality, accompaniment patterns, and permission
of fourth intervals. These are chosen automatically based on the learning
stage of the learner.
Step 2: Designate the practice points in the work.
Step 3: Determine the notes of the work. In this step, the harmony, rhythm, and
accompaniment are chosen from the selections made in steps 1 and 2.
In step 3, the method first decides a chord progression that harmonizes with the
practice points. The method then decides the rhythm of the work using the note and
rest values chosen in step 1. Next, it determines the melody of the measures that
include the learning items, and then the melody of the other measures. Finally, it
selects an accompaniment for the work using the accompaniment patterns chosen
in step 1 and the chord progression decided earlier in this step.
We defined the following additional rules for step 3:
Rule 1: The value of a note in the melody is chosen between long (≥ 1) and short
(< 1) when the next note lies a fourth interval away from it.

Rule 2: When the practice point of the work is designated as a sequence of two or
three notes, the following two rules assign a chord to the measure that
includes the practice point.
Rule 2-1: If there are chords that include all notes of the practice point as chord-tones,
the method chooses one of these chords and sets the first note on a downbeat
in the measure.
Rule 2-2: Otherwise, the method assigns a chord that includes the second note as a
chord-tone and sets the second note on a downbeat in the measure.
Rule 3: If the user designates two or more practice points and there is no chord
progression that can include all of them, the method composes the work
in binary form.
Rule 4: If the chord of a 4n-th (e.g., fourth, eighth, 12th) measure that is not the
last measure of the work is I (the tonic triad), the method makes the last
note in that measure the third or the fifth of the chord.
Rule 5: The method sets the last note of the work to its keynote.
Rule 6: If the melody of the work finishes on rests, the accompaniment finishes on
the same rests.
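Rule 2 in particular lends itself to a compact sketch. The chord table below is a deliberately tiny fragment for C major (I and V only, with V7 reduced to its triad), and all names are our assumptions rather than the authors' implementation:

```javascript
// Rule 2: choose a chord for a measure that contains a practice point.
const CHORD_TONES = {
  I: ['C', 'E', 'G'],
  V: ['G', 'B', 'D'],
};

function assignChord(point) { // point: e.g. ['C', 'E']
  // Rule 2-1: prefer a chord whose tones include every note of the point,
  // and set the first note on a downbeat.
  for (const [chord, tones] of Object.entries(CHORD_TONES)) {
    if (point.every(n => tones.includes(n))) {
      return { chord, downbeatNote: point[0] };
    }
  }
  // Rule 2-2: otherwise take a chord containing the second note,
  // and set that note on a downbeat.
  for (const [chord, tones] of Object.entries(CHORD_TONES)) {
    if (tones.includes(point[1])) {
      return { chord, downbeatNote: point[1] };
    }
  }
  return null; // no suitable chord in this fragment
}
```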

6 Example of Work Composition

Figure 3 shows an example of the study items in a learner’s stage and the practice
point. Figure 4 shows a result of automatic composition with the proposed method.
Step 1 chose the bases of the piano work: for example, the range that the learner can
play with the right hand (C4-G4) and with the left hand (B2-A3).

[Figure: diagram of the study items in the early stages (RH = right hand, LH = left
hand). Knowledge items: frames (meters: 4/4), the relationship between notes and
note values, and the correspondence between notes and fingering. Skill items:
number of fingers to play and position of fingers (range RH: C4-G4, LH: B2-A3;
extension of five fingers) and fingering techniques (RH: single tone, LH: broken
chord; extension of seventh intervals).]

Fig. 3 Example of the study items in a learning stage.




[Figure: two-stave musical score in 4/4 showing the automatically composed work.]

Fig. 4 Result of automatic composition with the proposed method.

Step 2 designates the practice point in the work. In this case, the practice point is
the sequence of two notes E-F played with the right hand.
Step 3 determines the chord progression of the work. For example, the method
chose the usable notes and chords, which are limited by the range of this work and
by the practice point (E-F with the right hand). In this case, the method determined
I-V/V7-V-I as the chord progression of the work. The method also determined the
melody and the accompaniment pattern of the work.

7 Conclusion
The objective of this study is the automatic composition of supplementary piano
works for novices in the early stages.
First, we classified and systematized the study items in piano works in textbooks
for novices. These works have a characteristic: study items increase step by step as
the learner’s skill grows. The study items limit elements of the piano works, such
as the range, note and rest values, and number of measures. This study hypothesizes
that it is possible to automatically compose piano works for novices that suit a
learner’s learning stage by using the limitations introduced by the study items of
that stage. We classified and systematized the study items of each stage and
confirmed that these items can explain the curricula of textbooks for novices.
This study then proposed a method, consisting of three steps and six rules, for
automatic composition. Because the study items in a learning stage restrict the
elements of a piano work, selections for the base elements of the work can be
derived from these restrictions. The proposed method can thus choose from the
selections created by the restrictions introduced by the study items.
We confirmed that the proposed method could automatically compose piano
works for novices at various stages.

References
1. Akutagawa, Y.: Ongaku no Kiso (Fundamentals of Music). Iwanami Shoten (1971) (in
Japanese)
2. Beyer, F.: Vorschule im Klavierspiel Op. 101: for Piano. Zen-On (2005)
3. Kitamura, T., Miura, M.: Constructing a support system for self-learning playing the
piano at the beginning stage. In: Proc. International Conference on Music Perception
and Cognition, ICMPC 2006, pp. 258–262 (2006)
4. McCormack, J.: Grammar Based Music Composition. In: Complex Systems: From Local
Interactions to Global Phenomena, pp. 320–336. IOS Press (1996)
5. Tamaru, N.: Piano Dream 1–6. Gakushu Kenkyu Sha (1993)
6. Velde, E.V.D.: Methode Rose par Ernest Van De Velde. Ongaku No Tomo Sha Corp.
(1950)
7. Yoshida, K., Muraki, M., Emura, N., Miura, M., Yanagida, M.: Generating appropriate
exercises like Hanon for practicing the piano. Technical Report of Musical Acoustics,
Acoustical Society of Japan, MA2008-52, pp. 51–56 (2008) (in Japanese)
Proposal of MMI-API and Library
for JavaScript

Kouichi Katsurada, Taiki Kikuchi, Yurie Iribe, and Tsuneo Nitta*

Abstract. This paper proposes a multimodal interaction API (MMI-API) and a


library for the development of web-based multimodal applications. The API and
library enable us to embed synchronized multiple inputs/outputs into an applica-
tion, as well as to specify concrete speech inputs/outputs and actions of dialogue
agents. Because the API and the library are provided for JavaScript, which is a
commonly used web-development language, they can be executed on general web
browsers without having to install special add-ons. The users can therefore expe-
rience multimodal interaction simply by accessing a web site from their web
browsers. In addition to presenting an outline of the API and the library, we offer
a practical example of the use of the multimodal interaction system, as applied to
an English pronunciation training application for Japanese students.

Keywords: MMI-API, JavaScript, web-based multimodal interaction.

1 Introduction
The use of web-based multimodal interaction is one of the most absorbing research
topics in the area of multimodal interaction. Some multimodal description languages
such as X+V [1] and XISL [2] have been proposed to describe speech interaction
scenarios and are used together with HTML pages, while SALT [3] provides a set of
tags that are used to embed a speech interface into HTML documents. Other languag-
es such as SMIL [4], MPML [5], and TVML [6] define the synchronization of output
media or the gestures of animated characters for the rich presentation of output me-
dia/agents. Although these languages have resulted in significant advances in web-
based multimodal interaction, not many of them are widely used as practical systems.
One reason for this is the difficulty that application developers face in master-
ing a new description language. Although the above languages provide a strong

Kouichi Katsurada ⋅ Taiki Kikuchi ⋅ Yurie Iribe ⋅ Tsuneo Nitta


* Toyohashi University of Technology,


Toyohashi, Aichi 441-8580, Japan
e-mail: {katsurada,nitta}@cs.tut.ac.jp, kikuchi@vox.cs.tut.ac.jp,
iribe@imc.tut.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 511–520.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

foundation for developing multimodal applications, application developers must


make considerable effort to learn them. If developers could instead use a well-
known language, they could create many applications without such an investment
of time and effort. The other difficulty presented by the currently available lan-
guage is the complexity faced by users in the installation and compilation of an
interpreter. For general web users, it is time-consuming to install new software
when they use a new application. To avoid this labor, it is desirable for multimod-
al applications to be executed without the use of any additional software.
Against this background, we propose an MMI-API for JavaScript and provide a
library that can be executed on general web browsers. Because JavaScript is the
most commonly used language for web development and provides rich descriptive
power for controlling multimodal interaction, developers can create rich multi-
modal applications without needing to learn a new language. This enables users to
experience multimodal interaction on the web through their browsers. The ease of
use of these features will accelerate the use of multimodal interaction on the web.
A number of APIs and libraries have been proposed for the handling of speech
and dialogue agents in JavaScript. Microsoft agent [7] is one approach to handling
animated characters on the web. It provides an API to control the gestures of ani-
mated characters in JavaScript. It differs from our proposal, however, in that the
characters need to be installed on the user’s PC, whereas our API and library re-
quire no installation in order for the user to experience multimodal interaction.
Other approaches to providing rich web contents using Flash and JavaScript util-
ize synchronized outputs of speech, music, and movies. Most of them, however,
just provide output functionality. The purpose of our approach is to provide an
API that can handle both output and input modalities, including speech interaction
with dialogue agents. This distinguishes our approach from other approaches for
delivering rich content through the web.
In the rest of this paper, we first discuss the requirements for the API in Section 2,
and present an outline of the API specifications in Section 3. In Section 4, we
show the system structure of the library. Next, in Section 5, we introduce a sample
application, and then conclude our proposal in Section 6.

2 Requirements for MMI-API


A number of languages have been proposed for the development of multimodal
interaction systems. SALT is a tag set that is used to add a voice interface to its
mother language (which is usually presumed to be HTML). It provides tags for
speech input/output, binding them to HTML forms, along with some other func-
tions that are used in speech interaction. SALT tags are activated by the JavaScript
described in its mother HTML document. X+V achieves multimodal interaction
by using VoiceXML, which defines speech interaction, together with XHTML.
VoiceXML and XHTML tags are bound by the original tags. The VoiceXML
document is invoked by an XML event that occurs in the XHTML document.
XISL is a language that is used to control a multimodal interaction scenario, which
includes speech interface, HTML pages, and so on. In an XISL document, HTML
is handled as one of the output/input modalities. SMIL, MPML, and TVML are
languages that are used to control multimedia presentations or the gestures of

dialogue agents. SMIL provides detailed synchronization of multimedia outputs


such as audio and movies, whereas MPML offers a number of gestures that are
performed by animated characters. TVML also controls the performance of ani-
mated characters for TV programs. Table 1 lists the features of these languages.

Table 1 Features of existing languages

SALT X+V XISL SMIL MPML TVML


multimodal inputs Yes Yes Yes No Yes* No
multimodal outputs Yes Yes Yes Yes Yes Yes
media synchronization No No Yes** Yes Yes Yes
dialogue agent No No Yes No Yes Yes
compatibility with HTML Yes Yes Yes No No No
* Some versions of MPML support speech interface [8].
** XISL provides limited input/output synchronization.

For an MMI-API, the handling of multimodal inputs/outputs is an essential


function. At the same time, the synchronization of inputs/outputs and the gestures
of dialogue agents play important roles in the development of rich applications
with various input/output modalities. Another important feature of an MMI-API is
compatibility with HTML. To create useful applications, data binding with HTML
is a fundamental requirement. To summarize the above discussion, we defined the
following requirements that an MMI-API should ideally satisfy.
1. It can handle multimodal inputs.
2. It can handle multimodal outputs.
3. It can control the synchronization of multimodal inputs/outputs.
4. It can control the gestures of dialogue agents.
5. It has a strong connection with HTML.
Because requirement 5 is a preexisting JavaScript feature, we sought to create an
API that satisfies requirements 1-4.

3 Outline of MMI-API
Based upon the discussion in Section 2, we created the following specifications for
an API in order for it to handle multimodal inputs/outputs. Table 2 shows the mul-
timodal input/output functions provided by the API. It provides four types of input
functions: Input, seqInput, altInput, and parInput; these handle un-
imodal input, multimodal sequential inputs, multimodal alternative inputs, and mul-
timodal parallel inputs, respectively. As for outputs, the API provides three types of
functions: Output, seqOutput, and parOutput; these handle unimodal out-
put, multimodal sequential outputs, and multimodal parallel outputs, respectively.
These functions only partially satisfy requirements 1-3 because they do not de-
fine any details with regard to concrete inputs, outputs, and synchronization. These
details are provided by the arguments given to the functions. These arguments are
described in the JSON format [9], as listed in Figure 1. The JSON format is used to

describe multiple pairs of properties and their values. Each property represents the
type of modality, the conditions for accepting input, or some other options. Tables
3-5 show the available properties of inputs, outputs, and gestures performed by the
dialogue agents, respectively.

Table 2 Multimodal input/output functions

Type Function Outline


Input Input Accepts a unimodal (single) input
seqInput Accepts sequential inputs
altInput Accepts alternative (exclusive) input
parInput Accepts simultaneous inputs
Output Output Outputs a unimodal (single) content
seqOutput Outputs contents sequentially
parOutput Outputs contents simultaneously

// Input example
mmi.altInput({
"type" :"click",
"match" :"agent"
},{
"type" :"speech",
"match" :"./grammar/start.txt"
}); //start interaction with the agent
//click the agent or talk to the agent

// Output example
mmi.parOutput({
"type" :"audio",
"event" :"play",
"match" :"./sound/isshoni.ogg",
"options":{
"begin":500
}
},{
"type" :"agent",
"event" :"gesture",
"gesture":"speak",
"options":{
"begin":500,
"dur" :5500
}
}); //output the agent’s speech
//This example does not use TTS.

Fig. 1 Descriptive example of multimodal input/output functions
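Because these arguments are plain JSON objects, they can also be assembled programmatically before being handed to the API. The helpers below are our own illustration, not part of the proposed API; only the property names and values ("type", "match", "event", "gesture", "options") follow Figure 1 and Tables 3-4.

```javascript
// Hypothetical convenience wrappers around the JSON argument format.
// Only the property names follow the paper; the helper names are assumptions.
function speechInput(grammarFile, options = {}) {
  return { type: 'speech', match: grammarFile, options };
}

function agentGesture(gesture, options = {}) {
  return { type: 'agent', event: 'gesture', gesture, options };
}

// The parOutput call of Fig. 1 could then be written, e.g.:
//   mmi.parOutput(audioSpec, agentGesture('speak', { begin: 500, dur: 5500 }));
const spec = agentGesture('speak', { begin: 500, dur: 5500 });
```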



Table 3 Example of input properties and available values

Property Value Outline


“type” “click” Mouse click
“mouseover” Mouseover event
“speech” Speech from the user
“keydown” Key event
: :
“match” HTML ID Specify the ID in the HTML document
(“type” is mouse event)
“agent” Operation to the agent
(“type” is mouse event)
Grammar file Specify the grammar file
(“type” is “speech”)
Character Specify the character to accept
(“type” is key event)
: :
“options”
“begin” Time or event Start time or event of input acceptance
“dur” Time Duration of input acceptance
“repeatCount” Number Number of repetitions
: : :

Table 4 Example of output properties and available values

Property Value Outline


“type” “agent” Agent’s gesture
“audio” Output an audio file
“browser” Output a HTML page
: : :
“event” “speech”,“gesture” Agent’s gesture (“type” is “agent”)
“play”, “stop”,… Operation to audio (“type” is “audio”)
:
“match” HTML ID or path to media Specify an output content
(“type” is “browser”, “audio”, …)
“text” Output text Specify the output text
(“type” is “agent”, “browser”)
“gesture” Define a gesture The gesture is performed by the agent
“options”
“begin” Time or event Start time or event of presentation
“dur” Time Duration of presentation
“fadein” Fade-in effect
“volume” 0.1 to 1.0 Volume control
: : :

Table 5 Gestures performed by dialogue agent

Type Gesture Outline


Facial “happy” Happy expression
expressions “sad” Sad expression
and gestures “blink” Blink
(24 gestures) “speak” Lip sync
: :
Hand “pointup” Point upward
gestures “grab” Grab an object
(15 gestures) : :
Body “walk” Walk
gestures “explain” An explanatory posture
(14 gestures) : :
User “appear” An agent appears
interaction “disappear” The agent disappears
(4 gestures) : :
Others “search” Search for something
(5 gestures) “wait” Wait
: :

4 Multimodal Library and System Structure


We provide a multimodal library together with the API. The library includes a
JavaScript program for handling multimodal inputs/outputs, a Java applet program
for recording speech input, and a Java servlet program for managing server-side
applications such as a speech recognition engine, speech synthesis engine, and
agent manager. Figure 2 shows the system structure of the multimodal interaction
system.
In this system, a speech input from a user is recorded in the sound recorder mod-
ule, which is executed on the web browser. It is then sent to the session manager on
the web server through the JavaScript program that is executed on the web browser.
The session manager sends it to the input manager, and the speech is then recog-
nized by the speech recognition engine Julius [10]. The recognition result is sent to
the JavaScript program through the input manager and the session manager.
The production of speech output and dialogue agent animation is ordered from
the JavaScript program that is executed on the web browser. An output text and a
command specified by the JavaScript program (using the API) are sent to the
agent manager through the session manager, the output control manager, and the
output manager. The agent manager produces a synthesized voice using the speech
synthesis engine Aques Talk [11], and sends it to JavaScript, together with the
images of the dialogue agent through the output manager and the session manager.
The images of the dialogue agent are given in a gif format, and are converted into
animation on the web browser.

Fig. 2 Structure of multimodal interaction system

5 Sample Application Using Developed API and Library

To confirm the usability of the developed API and library, we embedded multi-
modal interaction into a web-based English pronunciation training application for
Japanese students [12] that was scripted using ActionScript (a scripting language
used for developing Flash software). The application recognizes the user’s
pronunciation of a phoneme and shows its manner and place of articulation (such as
the shape of the mouth and position of the tongue) on the International Phonetic
Alphabet (IPA) vowel-chart. The user can correct his/her pronunciation by mod-
ifying his/her mouth shape and tongue position according to the chart.
Figure 3 shows a screenshot of the application, and Figure 4 outlines its system
structure. The character shown at the center of the figure is a dialogue agent to
which the user speaks. The user can input commands to the system either by oper-
ating the mouse or by speaking to the agent. These commands are sent to the Flash
program that is executed on the browser, after which they are sent to the server.
The content is delivered to the user after synchronization of the dialogue agent’s
gestures, speech, and background music.

Fig. 3 Screenshot of English pronunciation training application

Fig. 4 Browser-side system structure of pronunciation training application



Through the process of developing this application with the proposed API and
library, we confirmed a number of findings. Compared to developing the
application using the other languages, our API and library realize more detailed
and complex control of interaction, including control of the timing of
inputs/outputs and coordination with Flash or other APIs (such as the Google Maps
API). Because an application developer can now construct an interaction scenario
using only JavaScript, he/she can regard the interaction scenario and the application
as a single program unit. This feature enables developers to construct complicated
interaction scenarios far more easily than they could using other languages.
However, an issue arises due to a characteristic of the language JavaScript, namely
the difficulty of synchronization that is triggered by the end of an input acceptance
or an output presentation. This is because JavaScript does not provide a “wait”
function. The developer still has to write a somewhat complicated program to
execute this type of synchronization. We would like to resolve this issue in the
future.

6 Conclusions
In this paper, we have proposed a multimodal interaction API for JavaScript, and
have also provided a library that can be executed on general web browsers.
Although the API is a very simple one, containing only four types of input
functions and three types of output functions, it has strong descriptive power in the
arguments given to these functions. Through the process of developing a web-based
English pronunciation training application for Japanese students, we confirmed
that the API and the library provide more detailed control of complicated
interaction, including control of the timing of inputs/outputs and coordination with
Flash programs. This feature enables developers to construct complicated
interaction scenarios more easily than with other languages.
Remaining work is to resolve the problem of output synchronization mentioned
in Section 5 and to implement various functions of the library that are not yet
implemented (such as the nesting of inputs/outputs and some gestures of the
dialogue agent, among others). In the near future, we intend to publish the API
and the library on the web.

References
1. XHTML+Voice, http://www.w3.org/TR/xhtml+voice/
2. Katsurada, K., Nakamura, Y., Yamada, H., Nitta, T.: XISL: A Language for Describing
Multimodal Interaction Scenarios. In: Proc. of ICMI 2003, pp. 281–284 (2003)
3. Wang, K.: SALT: A spoken language interface for web-based multimodal dialog
systems. In: Proc. of Interspeech 2002, pp. 2241–2244 (2002)
4. SMIL, http://www.w3.org/AudioVideo/
5. Tsutsui, T., Saeyor, S., Ishizuka, M.: MPML: A Multimodal Presentation Markup
Language with Character Agent Control Functions. In: Proc. WebNet 2000 World
Conf. on the WWW and Internet (2000)
6. Hayashi, Ueda, Kurihara: TVML (TV program Making Language) - Automatic TV
Program Generation from Text-based Script. In: ACM Multimedia 1997 State of the
Art Demos (1997)
7. Microsoft Agent, http://www.microsoft.com/products/msagent/main.aspx
8. Nishimura, Y., Minotsu, S., Dohi, H., Ishizuka, M., Nakano, M., Funakoshi, K.,
Takeuchi, J., Hasegawa, Y., Tsujino, H.: A markup language for describing interactive
humanoid robot presentations. In: Proc. of IUI 2007, pp. 333–336 (2007)
9. JSON, http://www.json.org/index.html
10. Kawahara, T., Kobayashi, T., Takeda, K., Minematsu, N., Itou, K., Yamamoto, M.,
Yamada, A., Utsuro, T., Shikano, K.: Sharable software repository for Japanese large
vocabulary continuous speech recognition. In: Proc. ICSLP 1998, pp. 3257–3260
(1998)
11. Aques Talk, http://www.a-quest.com/aquestalk/
12. Mori, T., Iribe, Y., Katsurada, K., Nitta, T.: Real-time Visualization of English
Pronunciation on an IPA Vowel-Chart Based on Articulatory Feature Extraction. IPSJ
SIG Technical Report 89-15 (2011) (in Japanese)
Proposal of Teaching Material of Information
Morals Education Based on Goal-Based
Scenario Theory for Japanese High School
Students

Kyoko Umeda, Ayako Shimoyama, Hironari Nozaki, and Tetsuro Ejima

Abstract. We considered the knowledge and skills that are required for information
morals education at high schools in Japan. Such knowledge and skills are clearly
critical for high school students. We identify problems with the existing teaching
materials for information morals in Japan. Teaching materials are needed that help
students cope with problems by utilizing the knowledge and skills of the
information society, that provide a more general way of thinking applicable to
other examples, and that foster a positive attitude toward the information society.
We propose teaching material based on goal-based scenario theory. Our classroom
practice with teaching material on the subject of spam e-mail indicates that the
learning goals of the material were achieved.

1 Introduction
This paper has two purposes. The first is to consider the knowledge and skills that
are required for information morals education at high schools in Japan. The second
is to propose teaching materials for learning such knowledge and skills.

1.1 Requirement of Knowledge and Skills on Information Morals


Education
The Ministry of Education, Culture, Sports, Science and Technology in Japan
(MEXT) defines information morals as the “attitude that all people should acquire

Kyoko Umeda ⋅ Ayako Shimoyama ⋅ Hironari Nozaki ⋅ Tetsuro Ejima


Faculty of Education, Aichi University of Education, Hirosawa1, Igaya-cho, Kariya-shi
448-8542, Japan
e-mail: kumeda@auecc.aichi-edu.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 521–529.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

for living in an information society and developing an information society” [1].
According to Gagne [2], the study of such an attitude includes the feelings involved
when a child chooses his/her behavior. An attitude can be perceived by whether one
chooses appropriate behavior in particular situations. For example, choosing to pick
up trash represents an affirmative attitude toward the environment, and choosing to
do homework instead of playing a video game represents an affirmative attitude
toward study. We examined what kinds of knowledge and skills are effective for
studying attitudes. Of course, understanding knowledge intellectually is not the
same as the feelings from which an attitude is formed, but knowing is one step that
leads to making decisions. One can identify the intellectual/motor skills behind the
expression of an attitude by asking how the attitude can be expressed, and one can
identify the knowledge behind the meaning of an attitude by asking why it is
necessary to select that attitude.
Given this theory, what knowledge and skills should be required when teaching
information morals? Before considering this question, we review information
morals education in Japan. MEXT published a guidebook and a curriculum of
information morals education for elementary through high school students in 2007
[1]. According to the guidebook, the curriculum can be classified into two main
parts. One deals with correct judgments and appropriate attitudes toward the
information society. The other deals with the methodology of risk avoidance and
with knowledge of security and information technology for living safely in the
information society. The guidebook also argues that the difference between daily
morals and information morals originates in studying the knowledge and skills of
the information society, even though the two moralities have many overlapping
parts. Information morals education in Japan therefore requires both a moral
standard (the part that overlaps with daily morality) and the knowledge and skills
of the information society (the part that differs).
This study’s target is high school students. The curriculum of information morals
shows that children’s knowledge and skills of the information society increase as
they advance toward high school (Fig. 1).

[Figure content: the learning contents of information morals at elementary
school, junior high school, and high school consist of a moral standard (the part
overlapping daily morals) and the knowledge and skills of the information
society.]

Fig. 1 Knowledge and skills of information society at each level
Proposal of Teaching Material of Information Morals Education 523

1.2 Knowledge and Skills of Information Society


In this section, we explain the knowledge and the skills of the information society.
Recent studies have focused on the following three aspects of its knowledge and
skills.
(1) Understanding its characteristics
Tamada and Matsuda [3] examined many examples relevant to information morals
and identified the following five characteristics:

• Authenticity: The Internet contains much unreliable information, since anyone
can contribute to it. Information must be verified.
• Openness: The information written on bulletin boards or SNSs can be seen by
anyone all over the world, since they are open to the whole world.
• Recorded information: The information written on the Internet is difficult to
delete completely. Moreover, sent information is not anonymous: records of
senders always remain.
• Mutual payment and public resources: Both senders and recipients pay Internet
fees. Networks are public resources that should be used economically.
• Invading problems: Simply by connecting to the Internet, computers may be
attacked by dangerous Web pages that extract information.

(2) Knowledge and skills about information technology


Fundamental knowledge about information technology and networks includes,
for example, the fact that a video file attached to an e-mail generally needs a
larger bandwidth, and that a firewall is a structure that prevents outside
invasions of an organization's computer network. Such knowledge is necessary to
cope with problems and to live in an information society.
(3) Knowledge about laws relevant to information morals
This includes fundamental legal knowledge, for example, copyright and private
information protection.

In this paper, we define these three aspects as the knowledge and the skills of the
information society.

1.3 Problems of Teaching Materials of Information Morals in Japan
Japanese high school students study the above knowledge and skills, especially
knowledge (2) and (3), in the subject “information,” which exists only in high
school. However, based on our previous research [4], some students with a certain
level of morality cannot cope well with information moral problems using the
knowledge and skills of the information society. Such students cannot function in
the information society and do not know how to use its knowledge and skills. For

example, a student may know the terms of domain names without knowing that
domains are helpful for judging a site's authenticity, so he does not check which
domains are more reliable when judging websites. Another example is a case where
Student A libeled a friend on her own blog under the illusion that the blog was
read only by herself and close friends. Students sometimes fail to realize how
open the information society is.
We also identified the following problems in information morals education in
Japan.

• Among the teaching materials of information morals used in schools since 2000,
81% emphasize the negative conclusions of various scenarios [5]. Some students
are intimidated by information technology, and others do not want to join the
information society. Teaching materials must promote positive feelings that
encourage participation in it.
• The teaching method of information morals is generally based on case studies.
However, this method cannot give students every possible example, so they cannot
cope with new problems. A more generalizable way of teaching is required, in
which students can deal not only with the examples in the class/teaching material
but with other examples as well [6].

To overcome these problems, we developed teaching material based on the
goal-based scenario (GBS) theory. Explaining this theory and proposing the
teaching material we developed is the second purpose of this study.

2 Developing Teaching Material Based on GBS


GBS theory is an instructional design that provides a learn-by-doing simulation
environment where students pursue a goal by practicing target skills with relevant
content knowledge [7]. It constructs a story that gives students the opportunity to
learn from mistakes in realistic contexts. Such learning from expectation failure is
compatible with the education of information morals because students can simulate
experiences in the information society, for example, having troubles and coping
with them. Students gain experience of the information society by applying their
knowledge and skills and dealing with information problems.

2.1 GBS Theory


First, we briefly explain [7][8] the following seven components of the GBS theory:
(1) Learning goals
The learning goals, which are specific descriptions of what students should learn,
have two different knowledge categories: process and content. Process knowledge
is how to practice skills that contribute to goal achievement, and content knowledge
is the information required to achieve goals. In constructing GBS, the designers
focused on the skill set they wanted students to practice as well as the content
knowledge they wanted students to find. Students are not told these learning goals.

(2) Mission
The mission is the problem that students are expected to solve. Students come to
recognize the mission, rather than the learning goals, as the goal of the teaching
material. It must be motivational and somewhat realistic. It should also require
the use and application of the knowledge and skills mentioned in the learning
goals.
(3) Cover story
The cover story is the background storyline that creates the need to accomplish the
mission. It must motivate the students and allow enough opportunities to practice
the skills and seek knowledge.
(4) Role
The role is the character or position that the students will play within the story. It is
important to think about what role is best in particular scenarios to practice the
necessary skills. The role should also motivate students to engage in the story.
(5) Scenario operations
The scenario operations are all the activities done by students in pursuing the
mission goals. They must be closely related to both the mission and the goals. They
must also include decision points with evident consequences. If students cannot
make the right decision, they should learn the negative consequences through
their interaction with the material.
(6) Resources
Resources provide the information that might assist students to achieve their
mission goal. Well-organized information must be readily accessible to help
students successfully complete their missions.
(7) Feedback
Students can receive adequate feedback at appropriate times in any of three
ways: as a consequence of actions, through coaches, or from domain experts who
tell stories about similar experiences.

2.2 Method of Developing a Teaching Material of Information Morals
Based on GBS, we developed teaching materials for information morals so that
students can realize the information society and its troubles and gain experience in
judging right actions in it using its knowledge and skills. Besides overcoming all
three problems mentioned in Section 1.3., we also addressed the following two
points:

• Encouraging students to participate in the information society by engendering


positive feelings about it
• Helping students learn by examples

In this paper, we explain our teaching material using spam e-mail as an example.
Spam e-mail may lead to such problems as one-click fraud or virus distribution. In
recent years, spam e-mail has often been generated from personal information that
users themselves entered into prize sites, rather than being sent randomly. The
fundamental way of coping with spam e-mail is to ignore it. We must also use
filters that block it and avoid entering personal information. However, personal
information may be disclosed on SNSs during communication with others.
Therefore, students must be taught more than simple warnings to avoid entering
personal information on webpages.
To implement GBS, we designed our teaching material as follows:
(1) Learning goals of the spam e-mail teaching material
We define process knowledge as the following three skills: (i) correctly
identifying spam e-mails and coping with them; (ii) deciding for themselves how
much personal information to disclose based on the website or their purposes; (iii)
developing positive feelings about participating in the information society. We
define content knowledge as the following three aspects, based on the definition of
the knowledge and skills of the information society in Section 1.2: (i) students
have to learn about invaders, authenticity, and openness as characteristics of the
information society; (ii) students have to learn information technology relevant to
spam e-mails, for example, filtering services and the domains that effectively
reject them; (iii) such laws as the Consumer Contract Act for the Internet, which
was enacted to give relief to consumers who make operation mistakes in electronic
commerce, are used by students when they practice skills in the scenarios. With
these learning goals, the process knowledge can be widely applied not only to a
specific example or trouble but to other examples in the information society.
(2) Mission, (3) cover story, and (4) role of the spam e-mail teaching material
In Japan, more than 95% of high school students and 34.5% of junior high school
students have mobile phones [9]. Based on this situation, we set the mission, the
cover story, and the role as follows. The mission for students is to give advice
about coping with the information society to a younger junior high school sibling
who is eager to start using a mobile phone. The role of the students is the elder
high school sibling.
We gave the students a role that does not act in the information society itself
but supports the sibling, for the following reason. In some scenarios, events
purposely go wrong so that students can learn something. In such cases, students
cannot avoid the failure even though they did not choose it, which some of them
find disagreeable. Therefore, students are assigned a role that supports the
younger sibling and views the process of her activities, instead of a role that
fails itself. In the cover story's scenario, the younger sibling is excited because
she finally has her own mobile phone and starts to use it. But she runs into
problems as she advances into the information society and needs advice.

(5) Scenario operations, (6) resources, and (7) feedback of the spam e-mail
teaching material
First, students learn the mechanism of spam e-mails through the younger sister's
behavior. Then the question of how students should cope with them is prepared as
scenario operations. The teaching material immediately returns feedback that
informs students of failures, and of the reasons, when they select wrong actions.
The scenario operations give students a chance to practice specifying a domain
for rejecting spam e-mails, helping them become comfortable with information
technology. The scenario operations, which include such other topics as one-click
fraud and SNS communication, help students judge how much personal information
to disclose based on the website or their particular purpose. Moreover, not only
troubles but also effective uses of advertising e-mail, for example, discount
coupons, are contained in the scenario operations. The teaching material describes
the knowledge and skills of the information society as resources that students can
read depending on the scenario operations.
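To make the mechanism concrete, the decision-point-with-immediate-feedback
pattern described above can be sketched as follows. This is our own minimal
illustration: the scenario text, choice names, and the `choose` helper are
hypothetical and are not taken from the actual Flash material.

```python
# Minimal sketch of a GBS-style scenario operation: a decision point that
# returns immediate feedback, as the teaching material does for wrong actions.
# All scenario text and identifiers here are hypothetical.
SPAM_SCENARIO = {
    "question": ("Your younger sister received a spam e-mail with an "
                 "'unsubscribe' link. What should she do?"),
    "choices": {
        "reply": ("failure", "Replying confirms that her address is active "
                             "and invites more spam."),
        "click_link": ("failure", "The link may lead to a one-click fraud "
                                  "site."),
        "ignore_and_filter": ("success", "Ignoring spam and setting a domain "
                              "filter is the fundamental way of coping."),
    },
}

def choose(scenario, choice):
    """Return the outcome and the feedback shown immediately to the student."""
    return scenario["choices"][choice]

outcome, feedback = choose(SPAM_SCENARIO, "reply")
print(outcome, "-", feedback)
```

In this structure, every wrong choice carries its own explanation, so the
expectation failure and its reason reach the student at the moment of the
decision, as GBS theory requires.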

3 Practice Using Developed Teaching Materials

3.1 Overview
We used our developed teaching materials at a high school on December 20–21,
2010 in 90-minute classes with 40 students who had already studied information
technology. In the first half of the class, we explained the mission and the
scenario using presentation files, and all students dealt with the mission in
groups of four. In the second half of the class, each student practiced the skills
and knowledge on a PC with the teaching material, which we developed using
Adobe Flash Professional CS3.0 for Windows.

3.2 Results
We conducted pre- and post-tests to evaluate whether the students achieved the
learning goals and examined the results from the three aspects of process
knowledge.
(i) Students correctly recognized spam e-mails and coped with them.
In the pre- and post-tests, students were given examples of spam e-mails and asked
how they would deal with them. The actions students decided on were scored as
skills, and the reasons they gave for those actions were scored as knowledge. In
the pre-test, students chose their answers from given choices; in the post-test,
they chose actions from the choices but gave their reasons freely, which made it
more difficult than the pre-test. Both tests had two types of questions: similar
questions that confirmed whether students understood what they learned in the
class or the teaching material, and new questions that confirmed whether students
could apply what they learned to new situations that were not treated in the class.

The average scores of these questions are shown in Table 1. The results of
multiple comparisons showed that the post-test scores are significantly higher than
the pre-test scores.

Table 1 Pre- and post-test scores (N = 40; maximum score is 10)

         Pre-test   Post-test
Mean     5.58       6.59
SD       1.73       1.76
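As a worked example, a standardized size of the pre-to-post gain can be
estimated from Table 1 alone. The sketch below is our illustration, not part of
the original analysis; because the design is within-subject, a pooled-SD Cohen's
d ignores the pre-post correlation and is only a rough indication of magnitude.

```python
import math

# Values reported in Table 1 (N = 40, maximum score 10).
pre_mean, pre_sd = 5.58, 1.73
post_mean, post_sd = 6.59, 1.76

# Cohen's d with the SD pooled over the two test occasions.
# Caveat: the test was paired, so this ignores the pre-post correlation.
pooled_sd = math.sqrt((pre_sd ** 2 + post_sd ** 2) / 2)
d = (post_mean - pre_mean) / pooled_sd
print(f"Cohen's d = {d:.2f}")  # about 0.58, a medium-sized gain
```

A value around 0.58 is consistent with the reported finding that the post-test
scores are significantly higher than the pre-test scores.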

(ii) Students can decide how much personal information to disclose based on the
website.
In the post-test, students were given the role of Student A, who wants to
exchange information about her favorite singer B with B's fans on an SNS.
Students were given a situation involving providing personal information to the
SNS and answered freely. Student A's personal information is listed, such as
where she lives, her e-mail address, her favorite animals, food, etc. Students
write self-introductions of Student A on the SNS using the listed information.
The average score was 4.73 on a scale of one to five, which is very high.
(iii) Students developed positive feelings about participating in the information
society.
Student comments about the teaching materials showed that they were satisfied
with treating such familiar items as mobile phones and SNSs as scenario subjects
and with their role of giving advice to a younger sibling. This shows that our
teaching materials provided realistic situations for students, in line with GBS
theory. Moreover, student opinions included “I will try to use the coupon of my
favorite store” and “I'd like to use SNS to utilize the class's knowledge and
skill.” We can see positive attitudes toward participating in the information
society. These results suggest that our learning goals were clearly achieved.

4 Conclusions
In this paper, we considered the knowledge and skills required for information
morals education in Japan, which high school students clearly need. We also
proposed teaching material based on GBS theory. After the students practiced with
the teaching material, whose subject was spam e-mails, they achieved its learning
goals. In future work, we will develop other teaching materials based on such
student characteristics as preexisting knowledge of information technology. We
would also like to compare our material with teaching materials based on other
methods.

Acknowledgments. This research was supported by the Ministry of Education, Science,


Sports and Culture, Grant-in-Aid for Scientific Research (B), 19700634, 2011.

References
[1] JAPET, Instructional practice kickoff guide of “information morals” for all teachers
(2007) (in Japanese)
[2] Gagné, R.M., Wager, W.W., Golas, K.C., Keller, J.M.: Principles of Instructional
Design, 5th edn. Thomson Learning Inc. (2005); Japanese translation supervised by
Suzuki, K. and Iwasaki, S., Kitaoji-shobo (2007)
[3] Tamada, K., Matsuda, T.: Systematic and methodical information morals education in
elementary school stage. In Consideration of Consistency with Instructional Method
Using “Three types of knowledge” for information morals education. In: Research
Report of JSET Conferences, vol. 08(5), pp. 109–116 (2008) (in Japanese)
[4] Umeda, K., Ejima, T., Nozaki, H.: The development and trial of goal-based scenario
teaching material for learning a framework for judging information ethics in a high
school class. Bulletin Paper of Center for Research, Training and Guidance in
Educational Practice (11), 67–72 (2008) (in Japanese)
[5] Ishihara, K.: The Transition of Information Morality Education and Teaching Materials
of Information Moral. The Annals of Gifu Shotoku Gakuen University. Faculty of
Education 50, 101–116 (2011) (in Japanese)
[6] Tamada, K., Matsuda, T.: Development of the Instruction method of Information Morals
by “the combination of three types of knowledge”. Japan Journal of Educational
Technology 28(2), 79–88 (2004) (in Japanese)
[7] Schank, R.C., Berman, T.R., Macpherson, K.A.: Learning by Doing. In: Reigeluth,
C.M. (ed.) Instructional-Design Theories and Models: A New Paradigm of Instructional
Theory, vol. II (1999)
[8] Nemoto, J., Suzuki, K.: A checklist development for instructional design based on
Goal-Based Scenario theory. Japan Journal of Educational Technology 29(3), 309–318
(2005) (in Japanese)
[9] National Police Agency: The result of a survey on the actual conditions of usage
of mobile phones by elementary school students,
http://www.npa.go.jp/safetylife/syonen1/shonen20110825.pdf
(accessed December 17, 2011) (in Japanese)
Prototypical Design of Learner Support
Materials Based on the Analysis of Non-verbal
Elements in Presentation

Kiyota Hashimoto* and Kazuhiro Takeuchi**

Abstract. There is a growing need for a well-designed learner support system for
presentation in English, particularly in non-English-speaking countries. We have
developed a prototype of a comprehensive learner support system for basic
presentation that consists of several modules, including digital contents of
preliminary tutorials, an interactive aide for organizing a presentation and its
corresponding slides, semi-automatic evaluation estimation, and an online review
of recorded presentations. After trials, it has become clear that non-verbal
aspects have to be extensively supported by such a system. In this study, we made
an extensive observation and analysis of available professional and learner
presentations and extracted significant non-verbal elements from them. We then
designed learner support materials for non-verbal aspects, based on a non-verbal
ontology we also designed. It is expected that the implementation of those
materials will enable learners to learn more effectively how to make and conduct
a presentation.

1 Introduction
It is more and more important for all of us to make good presentations,
particularly in English, in various situations. An oral presentation is one of the
most sophisticated communicative activities: it is deliberately designed to be
presented to a specific audience with slides, to deliver information and attempt
persuasion. Not only the linguistic organization but also paralinguistic effects like

Kiyota Hashimoto
School of Humanities and Social Sciences, Osaka Prefecture University, Japan
e-mail: hash@lc.osakafu-u.ac.jp

Kazuhiro Takeuchi
Faculty of Information and Communication Engineering, Osaka Electro-Communication
University
e-mail: takeuchi@isc.osakac.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 531–540.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
532 K. Hashimoto and K. Takeuchi

body language and eye contact are utilized, and the presenter should have a deep
understanding not only of what he or she delivers but of how the audience will
react. However, presentation education has not been as systematic and pervasive
as expected, particularly in Asian countries. There are at least two reasons for
this situation. First, oral presentation has not been culturally emphasized, as
symbolized by the saying “actions speak louder than words” (fugen-jikkou in
Japanese). Second, not many teachers are trained for presentation education. On
the other hand, the social demand for good presentation education is growing in
every country. Considering these situations, we are constructing a comprehensive
learner support system for presentation in English [1-4]. It consists of several
modules, including digital contents of preliminary tutorials, an interactive aide
for organizing a presentation and its corresponding slides, semi-automatic
evaluation estimation, and an online review of recorded presentations, together
with a multimedia learner corpus of presentation. The overview of our prototypical
system is shown in Fig. 1.

[Figure content: the system comprises a Preparatory Online Tutorial (English
grammar, pronunciation, paragraph organization, essay organization, presentation
organization, presentation tips, and a model example view); an Interactive
Presentation Organizer (helping to organize a presentation and its slides); a
Semi-automatic Presentation Evaluator (automatically detecting inappropriate
errors and visually pointing them out for human revision, and inferring
element-based gradings from human holistic grading); Presentation Ontologies (a
presentation ontology, a presentation task ontology, and a presentation error
ontology); a Multimedia Learner Corpus of Basic Presentation; and a Presentation
Review of recorded presentations.]

Fig. 1 Overview of our prototypical learner support system of Basic Presentation



We conducted some experimental education as blended learning in which part of
our prototypical system was employed for self-learning, together with usual class
hours. We have found that what our system offers to learners is mostly useful for
improving their presentation skills, but the peer evaluations have indicated that
the audience is influenced more heavily than we had expected by paralinguistic
elements like body language, speech intonation, and eye contact, which are not
yet treated in our system. We have also found that most textbooks and handbooks
on presentation do not treat these aspects in detail, though their importance is
pointed out [6, 12, 14]. To help learners, particularly those whose native
language is not English, it is highly desirable to develop a learner support
module on paralinguistic aspects in our system, and it must be based on a
comparative analysis of those aspects.
In this paper, we focus on non-verbal aspects of presentation. In Section 2, we
make an extensive observation of professionals' and learners' video-recorded
presentations and construct a model of non-verbal aspects of presentation. Based
on this, we design learner support materials in Section 3, and Section 4 is a
tentative conclusion.

2 Analysis on Professionals’ and Learners’ Presentations


We made an extensive observation and analysis of various presentations made by
both professionals and learners, based on our prior assumptions. First, as our
system is designed for use by non-native speakers of English, in particular
Japanese, we need to capture differences between Japanese and English-speaking
people, though we of course admit that there is a wide diversity among
English-speaking people. Although there have been many studies on non-verbal
aspects of communication [e.g., 8, 10], and there are even a couple of guidebooks
on body language [11, 13], few compare non-verbal aspects based on objective
consideration [9]. Second, as non-verbal aspects, together with linguistic
aspects, influence the impression the audience has, it is desirable to interrelate
the non-verbal aspects of a presentation with the impression or evaluation of
the audience.

Fig. 2 Sample View of presentation stored in MLCP



As for learners' presentations, we employed our multimedia learner corpus of
basic presentation, MLCP [1], in which hundreds of video-recorded learner
presentations in English, each three to six minutes long, are stored. The
presenters are mostly Japanese freshmen at a Japanese university. An example of
the recorded presentations is shown in Fig. 2. The most remarkable feature of
MLCP compared to other presentation databases [5, 7, 14] is that each
presentation is accompanied by more than 30 peer-evaluations by the audience. A
peer-evaluation consists of a whole (holistic) grading and element-based
gradings. The elements are shown in Table 1.

Table 1 Elements for evaluation

Physical:       1 Posture, gesture, and eye contact
                2 Volume, clearness, effectiveness of his/her voice
Whole:          3 Validity of the title
                4 Presentation organization (including slide organization)
                5 Persuasiveness of the presentation
Slides:         6 Organization and structure of each slide
                7 Understandability and effectiveness of slides
                8 Proper use of graphs, images, or illustration
Pronunciation:  9 Phonetic correctness
                10 Phonetic fluency
Grammar:        11 Grammatical correctness
                12 Grammatical variation

As for professionals' presentations, we recorded a small number of professional
presentations at academic meetings, together with some famous presentations
whose video recordings are widely available.
The elements related to non-verbal aspects are 1 and 2 in Table 1. So we
compared the whole grading, Element 5, and Elements 1 and 2 for 100 learner
presentations in MLCP. The result is shown in Table 2. Note that each
presentation has more than 30 peer-evaluations, so we show their averages.
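The aggregation behind Table 2 can be sketched as follows: each presentation's
30-plus peer-evaluations are first averaged, and the per-presentation averages
are then grouped into whole-grading bands. The records and the `band_averages`
helper below are our own hypothetical illustration of that procedure, not the
actual MLCP data.

```python
# Hypothetical per-presentation averages: (whole grading on 1-30,
# Element 5, Element 1, Element 2 on 1-6). Values are illustrative only.
presentations = [
    (8, 1.9, 2.5, 2.2), (13, 2.1, 2.6, 3.0),
    (18, 3.0, 4.2, 3.7), (23, 3.5, 4.5, 4.0),
]

BANDS = [("6-10", 6, 10), ("11-15", 11, 15),
         ("16-20", 16, 20), ("21-25", 21, 25)]

def band_averages(records):
    """Average each element grading within each whole-grading band."""
    result = {}
    for label, lo, hi in BANDS:
        rows = [r for r in records if lo <= r[0] <= hi]
        if rows:
            result[label] = tuple(round(sum(r[i] for r in rows) / len(rows), 2)
                                  for i in (1, 2, 3))
    return result

print(band_averages(presentations))
```

Grouping by the whole-grading bands, as in Table 2, makes it easy to see whether
element gradings rise together with the holistic grading.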
The result shows that the higher group and the lower group have big gaps for
Elements 1, 2, and 5, though other elements like Elements 11 and 12 do not show
such a gap between the two groups. Another interesting point is that though each
peer-reviewer naturally made a different evaluation for a given presentation,
they mostly agreed on Elements 1, 2, and 5. These results indicate

Table 2 Average gradings of the whole, Element 5, Element 1, and Element 2 of 100
learner presentations

                   Whole grading  Element 5  Element 1  Element 2
                   (1≤n≤30)       (1≤n≤6)    (1≤n≤6)    (1≤n≤6)
The lower group    6-10           1.8        2.4        2.3
                   11-15          2.2        2.7        3.1
The higher group   16-20          3.1        4.3        3.8
                   21-25          3.4        4.4        3.9

that peer-reviewers tend to associate the superiority of non-verbal aspects with
that of the whole presentation. Then, what kinds of non-verbal aspects are highly
evaluated by peer-reviewers? To answer this question, we made an extensive
observation of 20 of the 100 presentations employed. This observation is
qualitative rather than quantitative, since it is difficult to count every move
distinctly and to evaluate their frequencies. Based on a small sample of
learners' and professionals' presentations, together with instructions and
descriptions in body-language guidebooks [11, 13], we classified non-verbal
aspects as shown in Table 3.

Table 3 Classification of non-verbal aspects

Category Details
Big move Walking around
Stepping
Approaching to the audience
Approaching to the screen
Posture Basic posture
Basic arm position
Turning to different sides of the audience
Turning to the screen
Upper body Bending
Shrugging
Turning to different sides of the audience
Turning to the screen
Arm Widening both arms
Moving both arms with various hand shapes
Turning up an arm
Pointing out a particular part of the audience
Pointing out a particular part of the screen
Taking hold of both hands
Hand without arm movement Moving a hand around
Showing the palm side
Showing the back side
Turning the palm up
Turning the palm down
Holding fast
Pointing up the thumb
Pointing up the index finger
Head Nodding
Shaking
Turning to different sides of the audience
Turning to the screen
Inclining the head
Face Richness of facial changes
Eye Eye contact to the audience
Eye contact to different sides of the audience
Turning to the screen

As expected, famous presenters like the late Steve Jobs and Michael J. Sandel
employ most of these different moves, and the professional presentations we
evaluated highly also contain many. A remarkable feature common to them,
however, is the rather long duration of each move. Most of their moves are rather
slow, and the finished position generally lasts more than one second. On the
other hand, presenters whose moves are fast and hasty look less trustworthy, as
naturally expected. Interestingly, the same evaluation tends to be seen in the
peer-evaluations of learners' presentations.
Another intriguing feature is arm position. According to our observation, good
presenters almost always keep their arms, or at least one arm, above the line of
the diaphragm. Though it is often said that putting one hand or both hands in
one's pocket(s) looks rude, many presenters still do so. However, even when they
put one hand in a pocket, the other arm is kept upwards.
Note that our study focuses on presentations with slides (i.e., there is a screen
where slides are projected), and that none of them were primarily made for
broadcasting. The former condition leads to frequent turns to the screen
regardless of which part of the body is used. The latter leads to the full
employment of the body, partly in order to face every part of the audience.
In sum, though the discussion above is not exhaustive at all, we have newly
classified the presenter's moves as shown in Table 3, and we have found two
remarkable features: duration and arm positioning. Presenters who are not
evaluated highly tend to lack both features. In particular, we noticed that
Japanese presenters, whether learners or professionals, tend to show this lack,
though we are not certain whether it is a cultural tendency or just a
manifestation of inexperience in presentation.
Before turning to the next section, note that we did not relate each move to a
particular communicative function, as most guidebooks and handbooks do. Of
course some moves have an obvious communicative function, like pointing to a
particular part of the screen, but mostly their communicative functions were not
pursued in our observation. There are two reasons: one is that it is simply
impossible for us to find such relations with evidence from our data, and the
other is, more importantly, that the frequency and variation of non-verbal moves
clearly divides good and bad presentations, regardless of whether a presentation
is made by a professional or a learner. So, though we admit that it would be
important to relate each move to a particular communicative function, any attempt
at it is a future task.

3 Design of Learner Materials for Non-verbal Aspects


Based on our observations discussed in the last section, we designed learner
materials for non-verbal aspects. In Section 3.1, we introduce our prototypical
ontology of non-verbal elements in presentation. In Sections 3.2 and 3.3, we
explain our design of two related learner materials.

3.1 Prototypical Ontology on Non-verbal Elements in Presentation


Based on our observation, we classified the presenter's moves as in Table 3. It might
be possible simply to employ this classification for constructing learner materials,
Prototypical Design of Learner Support Materials Based on the Analysis 537

but, as we explain later, we plan to offer simulative images, and for this purpose
it is preferable to construct an ontology rather than a simple list of moves. The
overview of our prototypical ontology of non-verbal elements in presentation is
shown in Fig. 3.

Fig. 3 Overview of our prototypical ontology on non-verbal elements in presentation

This prototypical ontology has each body part as its elemental entry. Entries
are related by "part-of" relations, and each body part has a list of roles that describe
actions used in a presentation. The lists of body parts and their roles are therefore
not exhaustive from the viewpoint of describing all human actions. Each role has
two values: distance and duration. As naturally expected, some roles are related to
others by the feature "induced move," which means that, due to the structure of the
human body, they move in accord. Note that an induced move is triggered by
physical necessity. If a person moves more than one body part of his own will,
it is not an induced move, and such simultaneous moves are captured simply by
the activation of the roles of those body parts.
This ontology is designed to be used not only for clarifying the actions employed in
a presentation but also for describing and analyzing presentations, and simultaneous
moves and sequential moves are easily captured with it.
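As an illustration only (the paper gives no formal encoding), the ontology described above, with part-of relations, roles carrying distance and duration values, and physically induced moves, might be modeled as follows; all class names, role names, and values are our own assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Role:
    """An action a body part can perform in a presentation."""
    name: str
    distance: float  # feature value: how far the part moves (illustrative unit)
    duration: float  # feature value: how long the move lasts, in seconds

@dataclass
class BodyPart:
    name: str
    roles: list = field(default_factory=list)
    parts: list = field(default_factory=list)       # "part-of" children
    induced_by: list = field(default_factory=list)  # parts whose moves physically induce this one

# A tiny fragment of the ontology: raising the arm induces hand movement
# by physical necessity, not by the presenter's will.
hand = BodyPart("hand", roles=[Role("pointing", distance=0.4, duration=2.0)])
arm = BodyPart("arm", roles=[Role("raising", distance=0.6, duration=1.5)], parts=[hand])
hand.induced_by.append(arm)
body = BodyPart("body", parts=[arm])

def induced_moves(part):
    """Names of body parts whose moves induce this part's move."""
    return [p.name for p in part.induced_by]
```

Simultaneous voluntary moves, by contrast, would be represented simply by activating the roles of several body parts at once, without any `induced_by` link.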

3.2 Tutorial for Non-verbal Aspects


We prepared a tutorial for non-verbal aspects based on our observations and the
related literature [6,11,13]. The tutorial consists of video-recorded mini lectures
and accompanying documents. It is available online to learners, and we used it for
538 K. Hashimoto and K. Takeuchi

two classes in 2011; 82% of the students answered in the post-class questionnaire
survey that the tutorial was highly comprehensible and useful. However, our
analysis of their final presentations, which should have improved if the tutorial
were truly effective, indicates that such a tutorial is not enough. It may be true
that the students understand the importance of non-verbal aspects and some functions
of non-verbal actions, but, as naturally expected, mastery requires more than
understanding.

3.3 Simulative Analysis Tool for Non-verbal Aspects


The key to turning understanding into mastery is awareness, or the keen sense of
reviewing one's own actions dynamically. Such awareness is usually attained by
repeated exercises and practice, and the process is considered to be accelerated
by well-designed review analysis of one's own presentation(s) in the PDCA
cycle shown in Fig. 4.

Fig. 4 PDCA Cycle of presentation learning

We therefore designed our simulative analysis tool for non-verbal aspects. This tool
is designed to support better review analysis of the non-verbal aspects of
presentations by providing opportunities to analyze presentations visually.
The user interface consists of three panes, as shown in Fig. 5. In the leftmost pane,
video playback buttons and tag-drawing lists are placed. The user chooses the
presentation video he wants to review and plays it. While it is playing, he can stop
the video at any time by pushing the "Capture Now" button, and can draw lines to
the body part he focuses on. When drawing a line, he can choose the reason why he
focuses on it. The classification is quite simple: [!], [?], [×]. He can then add a
comment to the line in the rightmost pane.
The unique point of the design is that a review result can be overlaid on
other results, so the user can easily compare multiple reviews of a
presentation.
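A minimal sketch of the review data such a tool could record, assuming a simple in-memory model (the tool's actual data format is not described in the paper; all names here are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    time: float     # video position (seconds) captured with "Capture Now"
    body_part: str  # body part the reviewer drew a line to
    tag: str        # reason for focusing: "!", "?", or "x"
    comment: str = ""

def overlay(*reviews):
    """Pile several reviews of one presentation together, grouped by capture
    time, so that more than one review can be compared at a glance."""
    merged = {}
    for reviewer, annotations in reviews:
        for a in annotations:
            merged.setdefault(a.time, []).append((reviewer, a.tag, a.body_part))
    return merged

# Two reviewers mark the same moment of the same presentation.
self_review = [Annotation(12.5, "right arm", "!", "good open gesture")]
peer_review = [Annotation(12.5, "right arm", "?", "held too briefly?")]
combined = overlay(("self", self_review), ("peer", peer_review))
```

Grouping by capture time is one simple way to realize the "piling" of reviews: disagreements between self-review and peer review become visible at each captured moment.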

Fig. 5 Prototypical Window Image

4 Concluding Remarks
In this paper, we first analyzed the non-verbal aspects of professionals'
and learners' presentations, and then we proposed the construction of an ontology
and two learning materials. Most of our attempts are still at the design phase and
much remains to be done, but we are developing a prototypical simulative tool for
non-verbal aspects.

References
[1] Hashimoto, K., Takeuchi, K.: Multimedia Learner Corpus of Foreigner's Basic
Presentation in English with Evaluations. In: Proc. of International Conference on
Educational and Information Technology, vol. 2, pp. 469–473 (2010)
[2] Hashimoto, K., Takeuchi, K.: Prototypical Development of Awareness Promoting
Learning Support System of Basic Presentation. In: Proc. of 2nd International Sym-
posium on Aware Computing, pp. 304–311 (2010)
[3] Hashimoto, K., Takeuchi, K.: A Task Ontology Construction for Presentation Skills.
In: Proc. of 16th International Conference on Artificial Life and Robotics, pp. 162–
165 (2011)
[4] Hashimoto, K., Takeuchi, K.: Rhetorical Structure Ontology for Representing Learn-
er’s Presentations with Potential Textual Inconsistencies and Imperfections. ICIC Ex-
press Letters 5(5), 1649–1654 (2011)
[5] Ishikawa, S.: Elements of English Presentation Skills. Proc. of Japan and British Lan-
guage and Culture 1, 1–18 (2009)
[6] Koegel, T.J.: The Exceptional Presenter: A Proven Formula to Open Up and Own the
Room. Greenleaf Book Group, New York (2007)
[7] Kayatsu, R.: A Trial of Peer Review in an Information Presentations Class and its
Evaluation. J. of Nagano Junior College 64, 71–79 (2009)

[8] Krauss, R.M., Chen, Y., Chawla, P.: Nonverbal Behavior and Nonverbal
Communication: What Do Conversational Hand Gestures Tell Us? In: Zanna, M. (ed.)
Advances in Experimental Social Psychology, pp. 389–450. Academic Press, San Diego
(1996)
[9] Nakano, Y., Rehm, M., Lipi, A.A.: Parameters for Linking Socio-Cultural
Characteristics with Nonverbal Expressiveness: Comparison between Japanese and
German Nonverbal Behaviors. In: Proc. of HAI 2008, vol. 1A-4 (2008)
[10] Nakata, A., Sumi, Y., Nishida, T.: Sequential Pattern Analysis of Nonverbal
Behaviors in Multiparty Conversation. IEICE Transactions J94-D-1, 113–123 (2011)
[11] Pease, A., Pease, B.: The Definitive Book of Body Language. Bantam Books, New
York (2004)
[12] Powell, M.: Presenting in English: How to Give Successful Presentations. Language
Teaching Publications, New York (1996)
[13] Reiman, T.: The Power of Body Language. Pocket Books, New York (2007)
[14] Reinhart, S.M.: Giving Academic Presentations. U. Michigan Press, Ann Arbor
(2002)
[15] Takahashi, Y., Kato, M., Kashiwagi, H.: Development of a Web-based Presentation
Database for English Language Learning. J. of the School of Languages and
Communication 4, 93–103 (2007)
Reflection Support for Constructing
Meta-cognitive Skills by Focusing
on Isomorphism between Internal
Self-dialogue and Discussion Tasks

Risa Kurata, Kazuhisa Seta, and Mitsuru Ikeda

Abstract. Our goal is to develop a system that improves knowledge co-creation skills
through reflection on so-called "knowledge co-creation discussion." This discussion
is a core part of an educational program for knowledge co-creation processes that we
are developing in the medical service field. Meta-cognitive skill plays a key role in
improving knowledge co-creation skills, especially in fields where there is no
pre-defined definite answer. However, it is difficult for learners to train
meta-cognitive skills since they are quite tacit, latent, and context dependent: one
cannot observe another person's cognitive processes, which are conducted in the mind
as internal self-dialogue. In our research, we aim to develop a learning support
system by focusing on the isomorphism between the tacit internal self-dialogue task
and discussion tasks conducted observably in the external world. In this paper, we
first describe the background of our research. Then, we describe our underlying
philosophy for training the meta-cognitive skills essentially required for knowledge
co-creation. Finally, we give an overview of our learning support system for
reflective monitoring of learners' discussion processes.

1 Introduction

Our goal is to develop a system that improves knowledge co-creation skills through
reflection on so-called "knowledge co-creation discussion." This discussion is a core
part of an educational program for knowledge co-creation processes that we are
developing in the medical service field.

Risa Kurata ⋅ Kazuhisa Seta


Graduate School of Science, Osaka Prefecture University

Mitsuru Ikeda
School of Knowledge Science, JAIST

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 541–550.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
542 R. Kurata, K. Seta, and M. Ikeda

Fig. 1 Reducing the Difficulty of Learning Thinking Processes

The knowledge co-creation skill is required to plan and conduct responsible
medical treatment by grasping a wide range of information about patients.
Meta-cognitive skill plays a key role in improving knowledge co-creation skills,
especially in fields where there is no pre-defined definite answer. It is the
essential skill of monitoring one's own cognitive and problem-solving processes from
objective viewpoints and controlling them adequately. However, it is difficult for
learners to train meta-cognitive skills since they are quite tacit, latent, and
context dependent: one cannot observe another person's cognitive processes, which are
conducted in the mind as internal self-dialogue.
In our research, we aim to develop a learning support system by focusing on the
isomorphism between the tacit internal self-dialogue task and discussion tasks
conducted observably in the external world.

2 Background of Our Research

Medical service practitioners must have not only technical capabilities
but also the interpersonal skills to conduct intellectual collaboration for knowledge
co-creation. In this chapter, we describe the skills they should have and an
educational program that we are developing to train those skills.

2.1 Characteristics of Decision-Making in the Medical Service


Field
The characteristics of decision-making in the medical service field are twofold:

1. Medical staff have to make evidence-based decisions even in urgent
situations.
2. They have to consider individual patient factors, such as the patient's
intertwined values and beliefs.
Reflection Support for Constructing Meta-cognitive Skills by Focusing 543

Therefore, a medical staff member must have the following two skills:

1. The skill to adequately comprehend the structure of conflict relations in one's
mind by monitoring and controlling one's own thinking processes from objective
viewpoints.
2. The knowledge creation skill to create reasonable solutions that satisfy the
relevant factors even in situations where trade-off relations exist.

Figure 1(a) shows the cognitive processes described above. Medical staff must have
the skill to resolve conflicts by thinking logically and accurately, objectively
reflecting on their own thinking processes, taking into account others' viewpoints,
and integrating their own ideas and others' ideas into meaningful knowledge.

2.2 A Method for Training Knowledge Co-creation Thinking


Skills: Reflecting Isomorphism Structures between
“Meta-Cognitive Activities in Internal Self-Dialogue
Processes” and “Thinking Processes for Discussion”
To develop knowledge co-creation skills in problem-solving processes, learners need
much meaningful experience of performing meta-cognitive activities in internal
self-dialogue processes and thinking processes for discussion.
In our training method, learners write their own cases of thinking processes by
reflecting on their experiences in medical practice with Sizhi (described in
Sect. 2.2.1). They then conduct group discussions on the questions raised through
this reflective case writing. Our goal is to train knowledge co-creation thinking
skills through these processes [1][2][3].
Figure 1(b) shows the training processes for internal self-dialogue and
thinking processes for discussion. They are serialized to reduce the learners'
cognitive load in training their meta-cognitive skills.
In the following, we outline our educational processes by referring to the "goal
attainment model of verbalization" proposed by Ito [4], shown in Fig. 2. This goal
attainment model can be regarded as a learning model that includes a model of
thought for dynamic internal self-dialogue.

Fig. 2 A Goal Attainment Model of Verbalization as a Learning Strategy



2.2.1 The Process of “Internal Self-Dialogue” in a Method for Training


Knowledge Co-creation Thinking Skills

In the internal self-dialogue processes, learners write their own cases by
objectively reflecting on their problem-solving experiences in medical practice with
Sizhi (Fig. 3).
In this method, we divide the training into two processes to clarify the
learning goals of each and to reduce the learner's cognitive load. Furthermore,
we divided the internal self-dialogue processes into three phases, i.e., the
description phase, the cognitive conflict phase, and the knowledge building phase,
as shown in Fig. 2.

1. The description phase: A learner writes her/his own case of thinking processes
by reflecting on an experience in medical practice in which she/he felt a
psychological conflict.
2. The cognitive conflict phase: The learner writes another idea that she/he has
thought of as another person's idea, or an idea conceived by assuming the thinking
style of her/his teacher, supervisor, colleague, or parent. She/he then experiences
conflict between her/his own opinion and the other's opinion.
3. The knowledge building phase: The learner thinks of and describes new knowledge
to overcome the conflicts.

Figure 3 shows a screen image of Sizhi, which the learner uses in case-writing
[5][6]. Sizhi is a learning environment designed to develop the learner's
ability to conduct logical thinking for self-dialogue and to appropriately reflect on
one's thinking processes by oneself. The set of Sizhi tags consists of nine tags: fact
(patient), fact (medical), policy/principle, assumption, decision, medical decision,
conflict, reflect, and resolve. The tasks in case-writing are to reflect on one's own
thinking processes in nursing patients and to clarify the structure of those thinking
processes by tagging them with the Sizhi tags.
Prompting learners' logical thinking by adding Sizhi tags in this environment
fosters meta-cognitive monitoring and control of their own thinking processes, and
contributes to training their ability to discuss logically.
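The nine Sizhi tags above can be pictured as labels attached to the statements of a written case. The sketch below is our own illustration of such tagged-case data, not Sizhi's actual implementation; the example statements are hypothetical:

```python
# The nine Sizhi tags named in the text.
SIZHI_TAGS = {
    "fact (patient)", "fact (medical)", "policy/principle", "assumption",
    "decision", "medical decision", "conflict", "reflect", "resolve",
}

def tag_statements(case):
    """Check that every statement in a written case carries a valid Sizhi tag
    and return the tag sequence, which exposes the case's thinking structure."""
    for tag, _statement in case:
        if tag not in SIZHI_TAGS:
            raise ValueError(f"unknown Sizhi tag: {tag}")
    return [tag for tag, _ in case]

# A hypothetical three-statement fragment of a case.
case = [
    ("fact (patient)", "The patient refuses the restraint."),
    ("conflict", "The safety policy conflicts with the patient's dignity."),
    ("resolve", "Consider an alternative that respects both."),
]
```

Reading off the tag sequence (fact, conflict, resolve, and so on) is one way the structure of a learner's thinking process becomes visible for reflection.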

2.2.2 The Process of “Social Dialogue” in a Method for Training Knowledge


Co-creation Thinking Skills

In the discussion process, each learner plays the role of discussion leader (DL) in
the discussion of her/his own case. The DL presents her/his case in the discussion
and leads the flow of discussion based on that case. The DL is required to think
about how to bring out conflict and how to control the members' thoughts, and to
perform her/his own meta-cognition. A discussion member (DM) reads the case that
the DL described, simulates the knowledge building in the DL's thinking processes
for self-dialogue, takes part in the discussion, and gains insight into other members'
thinking processes.


Fig. 3 A Screen Image of Sizhi (Description Phase)

Discussing one’s case indirectly interacts with the one’s self-dialogue processes
in nursing patients, since a case represents one’s self-dialogue processes.

1. The empathy phase: The DL presents a topic referring to her/his own case
using Sizhi. This phase corresponds to the description phase in the internal
self-dialogue processes.
2. The critique phase: This phase clarifies relative associations, such as
similarities and conflicts between opinions, based on what was discussed in the
previous phase. Its goal is to make the conflicts among members evident. This
phase corresponds to the cognitive conflict phase in the internal self-dialogue
processes.
3. The creation phase: Learners find conflicts between their own opinions and
other people's opinions and create a new solution to overcome these conflicts.
This phase corresponds to the knowledge building phase in the internal
self-dialogue processes.

In the internal self-dialogue processes, a learner writes her/his own case after the
problem-solving. The discussion process, on the other hand, is configured to create
a situation of solving the problem in real time.
In this paper, we call this discussion "knowledge co-creation discussion."
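The one-to-one correspondence between the three discussion phases and the three internal self-dialogue phases described above can be sketched as a simple mapping (our own encoding, for illustration):

```python
# Correspondence between the discussion phases (Sect. 2.2.2) and the
# internal self-dialogue phases (Sect. 2.2.1), as stated in the text.
PHASE_MAP = {
    "empathy": "description",
    "critique": "cognitive conflict",
    "creation": "knowledge building",
}

def internal_phase(discussion_phase):
    """Map an observable discussion phase to its tacit self-dialogue counterpart."""
    return PHASE_MAP[discussion_phase]
```

This mapping is the structural core of the isomorphism the paper exploits: each observable discussion activity has a tacit counterpart in self-dialogue.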

3 Principles for Developing Meta-cognitive Skills: Prompting


Learners’ Perception of Isomorphism between
“Meta-cognitive Activities in Internal Self-dialogue
Processes” and “Thinking Processes for Discussion”
A learner develops knowledge co-creation skills by exerting meta-cognitive skills
when she/he plays the role of a DL.
An important mission of a DL is the adequate monitoring and controlling of the
discussion processes.
The knowledge co-creation discussion is a method for training knowledge
co-creation thinking skills by prompting learners' perception of the isomorphism
between "meta-cognitive activities in internal self-dialogue processes" and
"thinking processes for discussion."
A DL's intervention activities, such as suggesting a method for conducting the
discussion, confirming the basis of an opinion, and questioning the intention of each
order, correspond to meta-cognitive activities in internal self-dialogue processes.
For example, the question "What is the basis of that conclusion?" corresponds
to a meta-cognitive activity that monitors the validity of an assertion, i.e., "Is the
basis of this assertion of mine valid?" in internal self-dialogue, whereas the
confirmation "This original conclusion might change by considering this new
viewpoint" corresponds to thinking about "What is the important factor that affects
this decision-making?"
Our goal is for learners to acquire meta-cognitive skills by exercising their skills
of adequately controlling the discussion through knowledge co-creation discussion.
Table 1 shows example correspondences between thinking activities for controlling
discussion and meta-cognitive ones in internal self-dialogue [7][8][9].
In the discussion processes of this training method, we provide a situation in
which a learner has to perform on-going monitoring of a discussion by playing the
role of DL: this contributes to prompting meta-cognitive activities in her/his
internal self-dialogue processes in the real-time situation of problem-solving.

Table 1 Example orders and corresponding meta-cognitive activities

Order: "What is the evidence for the conclusion?"
Meta-cognitive activity: "Is the evidence for this assertion clear?"

Order: "Let's get back to where we were, since we have strayed in an undesirable direction."
Meta-cognitive activity: "Is what I think valid for my purpose of thinking? Make sure what I have to clarify: what is the purpose of thinking about this?"

Order: "Is the answer valid for the question?" / "Is your answer wandering from the intention of his question?"
Meta-cognitive activity: "Is my conclusion valid for the purpose?"

Order: "Let's think about it from another viewpoint."
Meta-cognitive activity: "Although I have considered the viewpoint of A, is there another viewpoint to consider? Will the conclusion change by adding the viewpoint of B?"

Order: "Is your assertion A?" / "Is the evidence E?" / "What is the reason for the conclusion?"
Meta-cognitive activity: "Although I have considered a wide range of viewpoints, what is the conclusion?"

Order: "We first came to conclusion A. Now we come to another conclusion B, since we added this viewpoint."
Meta-cognitive activity: "What are the important factors that affect this decision-making?"

Order: "Let's discuss this important item first, because we have limited time."
Meta-cognitive activity: "What is the critical point in this problem-solving?"
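The correspondences of Table 1 could be held as a small lookup from a DL's externalized orders to the tacit meta-cognitive activities behind them; the abbreviated keys below are our own paraphrases of the table's rows, not the paper's data:

```python
# Abbreviated paraphrases of Table 1's rows (our own wording, for illustration).
ORDER_TO_ACTIVITY = {
    "ask for evidence": "Is the evidence for this assertion clear?",
    "redirect the discussion": "Is what I think valid for my purpose of thinking?",
    "check answer against question": "Is my conclusion valid for the purpose?",
    "request another viewpoint": "Is there another viewpoint to consider?",
    "summarize assertion and evidence": "What is the conclusion across all viewpoints?",
    "note the changed conclusion": "What factors affect this decision-making?",
    "prioritize under time limits": "What is the critical point in this problem-solving?",
}

def meta_cognitive_activity(order):
    """Return the internal self-dialogue activity corresponding to a DL order."""
    return ORDER_TO_ACTIVITY[order]
```

Such a lookup makes the isomorphism operational: every observable order a DL gives can be traced back to a self-dialogue question the learner should eventually pose to herself.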

Performing meta-cognitive skills is a tacit and latent activity in one's
self-dialogue processes. In collaborative discussion processes, on the other hand,
meta-cognitive activities are externalized as orders. In this paper, we propose a
learning support environment in which learners can train their meta-cognitive skills
after conducting a knowledge co-creation discussion by reflectively monitoring the
thinking processes in the discussion, which serve as meaningful learning resources.

4 Reflection Support System for Training Meta-cognitive Skills


We built our reflection support environment from two systems: (i) an
annotation support system for discussion protocols and (ii) a collaborative learning
support system that facilitates meta-cognitive learning using the discussion
protocols of knowledge co-creation discussions.

4.1 Protocol Annotation Support System


Figure 3 shows a screen image of the protocol annotation environment, in which
learners add predefined tags to their discussion protocols to characterize the
discussion. The contents are in Japanese; however, this does not prevent
understanding of the subject of this section. The system provides vocabulary for
(a) the medical service domain, (b) logical relations, and (c) orders. In Fig. 3(a),
for instance, vocabulary representing the vital situation of patients, the economic
situation of


Fig. 3 Protocol Annotation Support System



patients, medical service, and so on are shown, whereas opposite, support, evidence,
and so on are shown in Fig. 3(b), and question, proposal, answer, and so on in
Fig. 3(c). Fig. 3(d) is a graphical representation of the DL's internal
self-dialogue imported from Sizhi. Showing it prompts learners to focus on
the differences between the DL's internal self-dialogue and the collaborative
thinking processes.
Using the vocabulary in Fig. 3(a), for instance, a learner adds the tag "ends for
safety management" to the statement "We should use an instrument for
restricting the patient's movement for her own safety" (statement 1). On the other
hand, a learner adds an "opposite" relation tag between statement 1 and the statement
"It might impair her dignity and hurt her pride." If learners add different tags to
the same statement, the system prompts them to discuss their recognition of the
statement; this can provide an opportunity to acquire the meta-cognitive knowledge
to monitor and characterize their thinking processes.
Learners who attended the knowledge co-creation discussion are then ready for
collaborative reflection on their discussion processes.
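The system's prompt on conflicting tags could be realized by a check like the following sketch; the data shape and function name are our own assumptions, and the tag strings echo the example above:

```python
from collections import defaultdict

def find_disagreements(annotations):
    """Given (learner, statement_id, tag) triples, return the statements to
    which learners attached different tags; the system would prompt a
    discussion of the learners' recognition of exactly these statements."""
    tags_by_statement = defaultdict(set)
    for _learner, statement_id, tag in annotations:
        tags_by_statement[statement_id].add(tag)
    return {s for s, tags in tags_by_statement.items() if len(tags) > 1}

# Two learners tag two statements; they disagree only on statement 1.
annotations = [
    ("A", 1, "ends for safety management"),
    ("B", 1, "opposite"),
    ("A", 2, "evidence"),
    ("B", 2, "evidence"),
]
```

Only statements with more than one distinct tag surface for discussion, so agreement on a statement costs the learners no extra effort.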

4.2 CSCL System for Training Meta-cognitive Skills


Figure 4 shows a screen image of the computer-supported collaborative learning
system for training meta-cognitive skills using discussion protocols annotated in the
annotation support system.
Fig. 4(c) presents the logical relationships in a case created in Sizhi. Fig. 4(b)
shows the discussion protocols with the tags annotated in the annotation support
Fig. 4 CSCL system for training meta-cognitive Skills

system regarding the case. The structure of the discussion protocols is graphically
represented in Fig. 4(d) and Fig. 4(e): the logical relations in the medical service
domain are shown in Fig. 4(d), and the meta-cognitive structures of a DL, represented
as orders, are shown in Fig. 4(e). By scrolling the scroll bar indicated in Fig. 4(a)
up or down, learners can replay their discussion processes: the nodes shown in
Fig. 4(d) and Fig. 4(e) appear and disappear according to the scrolling. The
system highlights the statements in Fig. 4(b), Fig. 4(c), and Fig. 4(d) from the
viewpoint of a term of the medical service domain selected in Fig. 4(f).
Learners perform collaborative reflection for training meta-cognitive skills using
the tool. In their discussion, they especially focus on the orders in Fig. 4(e).
They discuss the validity of the orders proposed by the DL, which are depicted as
red-colored statements in Fig. 4(e), since these are recognized as results of
meta-cognitive self-dialogue processes. By discussing the validity of the orders, they
examine the tacit meta-cognitive activities performed behind them. Balloons in
Fig. 4(d) represent orders that the DL did not propose but that other members think
would have been valid if proposed: learners conduct collaborative learning by
discussing their usefulness and validity while viewing this information.
The balloon "Confirmation" shown in the upper part of Fig. 4(e) was added by a
DL or DM who thought a confirmation order was required, although the
intervention had not been performed. Clicking the balloon shows its contents in
Fig. 4(d): it points out the gap between two learners' intentions behind their
statements, statement (1) "What would you do if you were in my shoes?" and
statement (2), the answer to (1), "I support your action." The intention of
the learner who made statement (1) was to investigate the validity of her actions
as if she were currently in the situation of performing her problem-solving
processes (Fig. 4(d)(1)). However, the other learner interpreted her statement as a
request for his agreement with her actions (Fig. 4(d)(2)). The system also presents
another learner's suggestion for externalizing the gap from the viewpoint of
knowledge creation: "The DL should give an intervention order, for instance, 'there
exists some misunderstanding of her intentions.'"
In this way, a CSCL environment for training meta-cognitive skills is provided to
learners by giving them the role of DL, who is required to control the discussion
processes as a problem-solving task. This is realized only by focusing on the
isomorphism between internal self-dialogue processes and discussion processes.

5 Concluding Remarks
In this paper, we first described why training meta-cognitive skills is a key issue
in training problem-solving skills in the medical service domain. Then, we
proposed our underlying philosophy for building a learning support system in which
learners can collaboratively train their meta-cognitive skills through discussion
processes: we focus on the isomorphism between meta-cognitive activities in
internal self-dialogue processes and thinking processes in discussion processes.
Furthermore, we presented our learning support system based on this philosophy.
Further evaluation of the validity and usefulness of our system will be addressed in
future work.

References
[1] Keio Business School, Theory and Practice of Case Method. Toyo Keizai Inc., Tokyo
(1977) (in Japanese)
[2] Hyakkai, S.: Learning by Case Method. Gakubunsha Inc., Tokyo (2009) (in Japanese)
[3] Ishida, H., Hoshino, H., Okubo, T.: Case Book 1: Introduction to Case-Method. Keio
University Press, Tokyo (2007) (in Japanese)
[4] Ito, T.: Effects of Verbalization as Learning Strategy: A Review. Japanese Journal of
Educational Psychology 57, 237–251 (2009) (in Japanese)
[5] Cui, L., et al.: Thinking Skill Development Program To Support Co-Creation of
Knowledge for Improving the Quality of Medical Services. In: Proceedings of
Conference on Education and Education Management (2011)
[6] Morita, Y., Cui, L., Kamiyama, M.: A learning program that externalizes thinking
and promotes knowledge collaboration skill development. The Institute of Electronics,
Information and Communication Engineers Technology Research Report 111(98), 7–12
(2011) (in Japanese)
[7] Tomida, E., Maruno, J.: Theoretical Background and Empirical Findings of Argument
as Thinking. Japanese Psychological Review 24(2), 187–209 (2004) (in Japanese)
[8] Billig, M.: Arguing and thinking: A rhetorical approach to social psychology.
Cambridge University Press, Cambridge (1987)
[9] Kuhn, D.: The skills of argument. Cambridge University Press, Cambridge (1991)
Skeleton Generation for Presentation Slides
Based on Expression Styles

Yuanyuan Wang and Kazutoshi Sumiya

Abstract. With the advent of PowerPoint and Keynote, which can effectively create
attractive presentation slides, people can use slides to exchange and discuss ideas.
However, because many slides are necessary to enable audiences to understand the
content, authors need to prepare the best possible slides. Our skeleton generation
method is designed to help authors prepare slides with ease by constructing slide
layouts based on expression styles, i.e., the level positions of words in slides that
express their roles, derived from the text in the textbooks the authors use.
By analyzing the roles of the words in the slides, our method can extract the
differences between the important elements in both the texts and the slides. To
generate skeletons for slides from target texts in a textbook, our method derives the
expression styles of words from pre-existing texts and their slides. Finally, it
generates slide skeletons by arranging the corresponding words from the target texts
in slides with the same expression styles, matching the layouts of the pre-existing
slides. We also present the results of an evaluation of the method's effectiveness.

1 Introduction
Presentations now play a socially important role in many fields, including business
and education. Many university teachers have used Web services such as
SlideShare [1] and CiteSeerX [2] to store the slides they use in lectures. However,
because teachers prepare many slides to enable students to understand their content,
they should prepare the best possible slides. In fact, when authors plan their
slides, they often refer to texts (e.g., lectures in a textbook) to determine the
information that

Yuanyuan Wang
University of Hyogo, Japan
e-mail: ne11u001@stshse.u-hyogo.ac.jp
Kazutoshi Sumiya
University of Hyogo, Japan
e-mail: sumiya@shse.u-hyogo.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 551–560.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
552 Y. Wang and K. Sumiya

(Figure contents: word roles and their expressions: Specialization, 1st level (slide
title); Consistency, 3rd level; Aggregation, 1st, 2nd, 3rd levels; an interface
provides generation and feedback to the user)

Fig. 1 Conceptual diagram of skeleton generation from textbook and its slides

should be conveyed. It is important to focus on how to express, in slides, the
information drawn from texts. We can generate skeletons that serve as slide layouts
expressing typical words from the texts based on their roles in slides, by considering
how to convey those words in the slide layout. For example, the word
"vegetable" may appear in all the chapters of a textbook but in only one slide.
We consider the role of "vegetable" to be Aggregation, i.e., a concentrated
summary of the information regarding "vegetable" in one slide for that textbook.
Our approach creates an editable slide skeleton that is able to produce a slide
layout based on specific words to help authors prepare slides easily and efficiently.
In this paper, we define expression styles as the level positions at which words
are arranged in slides, based on the roles of the words in the slides, by considering
how each word's representation in a slide differs from its appearance in the text.
We derive the document structure of texts by focusing on their logical units, and
the document structure of slides by focusing on the levels of indentation in slide
text, which are often used to help users better organize their slide contents. As
depicted in Fig. 1, when a textbook contains a number of lectures, authors can take
a target text, such as lecture 2's text, to prepare slides. When the logical units
that constitute 2's text are the same as in the pre-existing 1's and 3's texts, we
can detect the expression styles of words in the slides by analyzing the differences
between the pre-existing texts and their slides as input. We can therefore generate
skeletons for 2's slides from 2's text, based on the expression styles of the words.
We found two main features particularly helpful for deriving expression styles,
based upon the differences between important elements obtained by analyzing
the document structure of texts and their slides: (1) When a word appears within
the body of a text, it is an important word in the text; and when a word appears in the
slide title or in lines that are less indented, it is an important word in the slides [3]. (2)
When a word occurs with high density in a certain passage of a text segment, that passage
is an important description of the word in the text [4]. Also, when a number of sentences
appear in lines that are deeply indented, they are an important description of
a word in a slide. Therefore, we can generate skeletons for slides from a target text
Skeleton Generation for Presentation Slides Based on Expression Styles 553

based on the expression styles of words by extracting the differences between the
important elements of pre-existing texts and their slides that are in such a textbook.
The next section reviews related work. Section 3 describes how to determine key
elements in texts and slides. Section 4 presents the generation of skeletons for slides.
Experimental results and conclusions are given in Sections 5 and 6, respectively.

2 Related Work
Most of the research related to slide-making support has focused on slide generation.
Mathivanan et al. [5], Beamer et al. [6], and Yasumura et al. [7] proposed systems for
generating slides from academic papers. These methods extract information from a
paper by the TF-IDF method and assign the sentences, figures, and tables to slides
by identifying important phrases for bullets. Shibata et al. [8] converted Japanese
documents into slide representations by parsing their discourse structure and representing
the resulting tree in an outline format. However, conventional approaches
focus only on the consistency of the document structure between the text and the slides,
and ignore the role played by how words are expressed from the text to the slides.
Our method focuses on the differences between the key elements of texts and their
slides, and it generates skeletons for slides based on the expression styles of words.
Kan [9] proposed a system for the discovery, alignment, and presentation of such
document and slide pairs. Hayama et al. [10] aligned academic papers and slides
based on Jing's method, which uses a hidden Markov model. These studies are similar
to ours in analyzing information that is common to texts and their slides. Our
approach focuses not only on the information that is in common, but also on information
that differs between texts and slides. Yokota et al. [3] retrieve important
information in slides, which is also similar to our work. Kurohashi et al. [4] detected important
descriptions of a word in a text. Their method is based on the assumption that the most
important description of a word in a text is the passage where the word occurs with
the highest density. We have employed the same method for detecting important de-
scriptions of a word in a text. Therefore, our goal is to generate skeletons for slides
by analyzing the differences between important elements of texts and their slides.

3 Determination of Important Elements Using Document Structures
We determine important elements by calculating the distribution of words based on
the document structure of the text, and by using the document structure of the slides.
A chapter in a textbook is referred to as a text. We define the document structure of
a text in terms of its logical units, which consist of sections, which in turn consist of
a section head and paragraphs. The content of a presentation includes a number of
slides that have structured text information. We define the document structure from
slides, based on the indentations in the slide text. We define the slide title as the 1st

level. The first item of text is considered to be on the 2nd level, and the depth of the
sub-items increases with the level of indentation (3rd level, 4th level, etc.).

3.1 Determination of Important Elements in a Text


If the locations in which a word b appears in the text are dispersed, b is deemed an
important word in the text; the set of such words is called Wt. We explain the determination of Wt
using b, and we calculate the degree of importance of b by the word's dispersion.
$$W_t = \left\{\, b \;\middle|\; \min\left(\frac{1}{n}\sum_{u=1}^{n}\mathrm{dist}(c_1,b_u),\ \ldots,\ \frac{1}{n}\sum_{u=1}^{n}\mathrm{dist}(c_j,b_u)\right) > \alpha \right\} \qquad (1)$$
where bu is the uth occurrence of the word b, and cj is the jth section in the text. The
function dist calculates the distance between sections, i.e., a number indicating how many
sections lie between the section cj and the section containing an occurrence of b. n is the
number of times that b appears in the text. The function min selects the section with the
lowest dispersion, which gives the highest degree of expectation. Wt is a bag of important
words in the text: if the left-hand side of Eq. (1) is greater than a threshold α, b is
determined to be an important word and placed in Wt.
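The dispersion test of Eq. (1) can be sketched in code. The following minimal Python sketch (all function and variable names are our own illustration, not from the paper) represents each word by the list of section indices in which it occurs, takes dist as the number of sections between section j and an occurrence, and keeps a word when even the nearest section has an average distance above α:

```python
def avg_section_distance(occ_sections, j):
    # Mean number of sections between section j and each occurrence of the word.
    return sum(abs(j - s) for s in occ_sections) / len(occ_sections)

def important_words_in_text(word_sections, num_sections, alpha):
    """Sketch of Eq. (1): a word is dispersed (hence important) when even
    the closest section has an average distance above alpha."""
    wt = set()
    for word, occ in word_sections.items():
        dispersion = min(avg_section_distance(occ, j) for j in range(num_sections))
        if dispersion > alpha:
            wt.add(word)
    return wt

# "vegetable" occurs in sections 0, 2, 4 of a 5-section text; "intro" only in section 1.
print(important_words_in_text({"vegetable": [0, 2, 4], "intro": [1, 1, 1]}, 5, 1.0))
```

With α = 1, only the dispersed word "vegetable" survives, since all occurrences of "intro" sit in a single section.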
If a word m occurs with high density in a certain range of a text segment, the text segment
is considered an important description of m in the text, called Dt. When the density of m
in a text segment is high, it is determined that the text segment provides Dt of m in the
text. We define the position i of m, define l as a center position, and take w as the distance
from l to either end of the range in the text. To calculate the density of m, we use the hanning
window function [11] to decrease the weight of the words in the range from l − w to
l + w. The density of m at l in the range |i − l| ≤ w can be calculated as
$$D_t = \left\{\, m \;\middle|\; \sum_{i=l-w}^{l+w} a_m(i)\cdot\frac{1}{2}\left(1+\cos 2\pi\frac{i-l}{2w}\right) > \beta \,\right\} \qquad (2)$$

The factor ½(1 + cos 2π(i−l)/2w) is a hanning window function. The function
am(i) indicates whether the word at position i is m: if so, am(i) returns 1; otherwise,
am(i) returns 0. Here, the location starts from 0 (the head of the text segment),
and each position in order is taken as the center l of the hanning window. The number of
sections in the text and the number of words in each section are not constant; therefore,
we set the window range (2w) to the average number of words in each section.
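The density computation of Eq. (2) can be sketched as follows; the helper names and the toy word positions are hypothetical, and positions are taken as word offsets within one text segment:

```python
import math

def hanning_density(positions, l, w):
    # Eq. (2): weighted count of occurrences of m around center l, with
    # hanning-window weights 0.5 * (1 + cos(2*pi*(i - l) / (2*w))).
    total = 0.0
    for i in positions:
        if abs(i - l) <= w:
            total += 0.5 * (1 + math.cos(2 * math.pi * (i - l) / (2 * w)))
    return total

def important_description_centers(positions, text_length, w, beta):
    # Slide the window center l from the head of the segment onward (Dt test).
    return [l for l in range(text_length) if hanning_density(positions, l, w) > beta]

# Three occurrences clustered near position 11 give a high density there.
centers = important_description_centers([10, 11, 12], 50, 5, 2.0)
```

A cluster of occurrences yields a density close to the number of occurrences, while isolated occurrences contribute little, so thresholding by β picks out the dense passages.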

3.2 Determination of Important Elements in Slides


If a slide has more information in terms of a word g than is contained in a prior slide
in the presentation, g is an important word in the slides; the set of such words is called Ws. We
explain the determination of Ws using g, which is present in both slides x and y.

Kl (x, g) = {ki |ki ∈ x, l(x, g) < l(x, ki )} (3)



Here, Kl(x, g) is a bag of words that can be considered to provide an explanation in
terms of g in slide x, and l(x, g) is a function that returns the level of g in slide x. A word
ki is included in the levels that have a hierarchical relationship with the level of g,
and ki belongs to Kl(x, g) in slide x; l(x, ki) is greater than l(x, g), in that ki is a child
of g in the document structure. Then, we compute the number of words in the detailed
information related to g for slides x and y, and compare these numbers as follows:

Ws = {g||Kl (x, g)| < |Kl (y, g)|} (4)

where |Kl(x, g)| is the total number of words ki in Kl(x, g) in slide x. Kl(y, g) is
likewise a bag of words in slide y that satisfies the same conditions as
Kl(x, g) in Eq. (3). Ws is a bag of important words in the slides: if |Kl(x, g)| for
slide x is lower than |Kl(y, g)| for slide y in Eq. (4), g is determined to be an
important word in Ws.
If a number of sentences appear in lines that are deeply indented below a word d,
these sentences are an important description of d in the slides, called Ds. When d and
the other words in slide x satisfy certain conditions, the lower-level sentences
Ls(x, d) of d are determined to be Ds of d.

Ds = [d, Ls (x, d)] (5)


Ls (x, d) = {rs |l(x, d) ≤ l(x, rs )} (6)

A set Ls(x, d) consists of sentences from levels related to d in slide x. A sentence rs
belongs to Ls(x, d) in slide x if rs is included in one of the indentation levels.
Additionally, l(x, rs) is greater than or equal to l(x, d), and the words of rs are children
of d, or the words of rs and d are siblings in the document structure; Ls(x, d) will
also extract sentences containing d from the levels from l(x, rs) to l(x, d).
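Equations (3) and (4) can be illustrated with a minimal sketch in which a slide is a list of (indentation level, word) pairs. For simplicity this sketch compares raw levels only and skips the exact parent-child check of the document structure, so it is an approximation of Kl(x, g), not the paper's full definition:

```python
def child_words(slide, g):
    """Approximation of K_l(x, g) in Eq. (3): words on deeper indentation
    levels than g. A slide is a list of (level, word) pairs."""
    levels = [lvl for lvl, word in slide if word == g]
    if not levels:
        return []
    g_level = min(levels)
    return [word for lvl, word in slide if lvl > g_level]

def important_words_in_slides(slide_x, slide_y):
    # Eq. (4): g is important when the later slide y explains it with more words.
    shared = {w for _, w in slide_x} & {w for _, w in slide_y}
    return {g for g in shared
            if len(child_words(slide_x, g)) < len(child_words(slide_y, g))}

# "query" is a bare title in slide x but is expanded with sub-items in slide y.
slide_x = [(1, "query")]
slide_y = [(1, "query"), (2, "expansion"), (2, "log"), (3, "function")]
```

Here slide y attaches three deeper-level words to "query" while slide x attaches none, so "query" enters Ws.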

4 Skeleton Generation
4.1 Detecting Expression Styles
To generate skeletons, a slide layout is used that consists of words placed according to
the expression styles determined by the roles of the words, using the differences between
the important elements in the pre-existing text and its slides. The differences between
the importance of a word q in the slides and in the text fall into 3 categories:
• tw1 : q ∈ Wt ∩ Ws , q is an important word in both the text and the slides.
• tw2 : q ∈ Wt , q is an important word in the text.
• tw3 : q ∈ Ws , q is an important word in the slides.
For the differences between important descriptions of a word that appear in the text
and slides, we compute the similarity of the bags of words in the important descriptions
of q, Dt in the text and Ds in the slides. This is done using the Simpson similarity
coefficient [12], Sim(Dt, Ds) = |Dt ∩ Ds| / min(|Dt|, |Ds|). Depending on their similarity,
and on whether the text and slides contain one or multiple important descriptions of q, they

[Figure: Chapter 5 and its Presentation 5 (slides a3, a6) are aligned with Chapter 6 and the generated slide skeletons (slides b3, b6). The extracted word correspondences and expression styles are: "document" ↔ "query" (tw1, td3, Aggregation, 1st/2nd/3rd levels); "summary" ↔ "interface" (tw3, td1, Specialization, 1st/3rd levels); "display" ↔ "modification" (tw1, td1, Consistency, 2nd level).]
Fig. 2 Example of skeleton generation

fall into 6 categories. When Sim(Dt, Ds) ≥ 0.7, the content of the important descriptions
of q in the text and in the slides is similar, and there are 3 categories:
• td1 : one (multiple) descriptions of q in Dt corresponds to one (multiple) descrip-
tions of q in Ds .
• td2 : one description of q in Dt corresponds to multiple descriptions of q in Ds .
• td3 : one description of q in Ds corresponds to multiple descriptions of q in Dt .
When 0.3 ≤ Sim(Dt, Ds) < 0.7, the important descriptions of q in the text and in
the slides have only some content in common, which falls into 3 categories:
• td4 : one (multiple) descriptions of q in Dt has information in common with one
(multiple) descriptions of q in Ds .
• td5 : one description of q in Dt has information in common with multiple descrip-
tions of q in Ds .
• td6 : one description of q in Ds has information in common with multiple descrip-
tions of q in Dt .
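The Simpson coefficient and the td1–td6 classification above can be sketched as follows; the function names are our own, and the 0.7/0.3 thresholds follow the paper:

```python
def simpson(dt_words, ds_words):
    # Simpson similarity coefficient [12]: |Dt ∩ Ds| / min(|Dt|, |Ds|).
    if not dt_words or not ds_words:
        return 0.0
    return len(dt_words & ds_words) / min(len(dt_words), len(ds_words))

def description_category(n_text_descs, n_slide_descs, sim):
    # td1-td3 for similar descriptions (sim >= 0.7),
    # td4-td6 for partially common ones (0.3 <= sim < 0.7).
    if sim >= 0.7:
        base = 0
    elif sim >= 0.3:
        base = 3
    else:
        return None
    if n_text_descs > 1 and n_slide_descs == 1:
        return "td%d" % (base + 3)   # many text descriptions -> one slide description
    if n_text_descs == 1 and n_slide_descs > 1:
        return "td%d" % (base + 2)   # one text description -> many slide descriptions
    return "td%d" % (base + 1)       # one-to-one (or many-to-many)
```

For instance, many dense text passages summarized by a single slide with high similarity map to td3, matching the "document"/Aggregation example below.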
From the differences between important elements in the text and the slides, we can find
which words are emphasized and how the words should be described in the text and
the slides, i.e., whether multiple descriptions are dispersed or one description is
centered. In the example shown in Fig. 2, the word "document" is dispersed in
all sections in Chapter 5, with some text segments having a high density of “docu-
ment,” and it also appears frequently in the body of text in slide a6 of Presentation
5. When “document” is an important word in both the text and slides as tw1 , mul-
tiple important descriptions in the text correspond to one important description in
the slides as td3 . We consider that slide a6 is concentrated when it summarizes the

Table 1 Patterns in the role of words in slides

  P   | td1            | td2         | td3           | td4       | td5                 | td6
  tw1 | Consistency    | Separation  | Aggregation   | Portion   | Partial Separation  | Partial Aggregation
  tw2 | Generalization | Dispersion  | Unification   | Mention   | Separable Mention   | Centered Mention
  tw3 | Specialization | Subdivision | Concentration | Expansion | Separable Expansion | Centered Expansion

information in terms of "document" in Chapter 5, and the role of "document" will be
Aggregation. On the other hand, the word "summary" repeatedly appears in a
certain text segment that has a high density of "summary", and slide a3 of Presentation 5
is titled "summary." "Summary" is an important word in the slides (tw3), and one
important description in the text corresponds to one important description in the slides
(td1). Slide a3 offers specialized information regarding "summary" from Chapter 5,
and the role of "summary" is then Specialization. Therefore, we define the expression
style ES such that the role R of words, together with the expression E of the presentation
represented by the level positions of the words in slides, is given as follows:

ES = (R, E)   (7)
R = (wi, pwi)   (wi ∈ W, pwi ∈ P)   (8)
W = Wt ∪ Ws   (9)
P = {pw1(tw1, td1), ..., pw6(tw1, td6), ..., pw13(tw3, td1), ..., pw18(tw3, td6)}   (10)

Here, W is a bag of words belonging to Wt or Ws that can be considered as the
words playing key roles in the slides. E denotes the level positions of the words in
slides according to the role of the words in R, and P denotes the total of 18 patterns that
indicate the role of the words in R, where the words belong to W. These patterns combine
3 categories of differences in the important words and 6 categories of differences
between the important descriptions of the text and slides, as shown in Table 1.
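Table 1 can be treated as a simple lookup from a (tw, td) pattern to a role, as in this hypothetical sketch:

```python
# Role names transcribed from Table 1; rows tw1-tw3, columns td1-td6.
ROLE_TABLE = {
    "tw1": ["Consistency", "Separation", "Aggregation",
            "Portion", "Partial Separation", "Partial Aggregation"],
    "tw2": ["Generalization", "Dispersion", "Unification",
            "Mention", "Separable Mention", "Centered Mention"],
    "tw3": ["Specialization", "Subdivision", "Concentration",
            "Expansion", "Separable Expansion", "Centered Expansion"],
}

def role_of(tw, td):
    # Map a (tw, td) pattern to the role R of a word (Eq. 8, Table 1).
    return ROLE_TABLE[tw][int(td[2:]) - 1]
```

For example, the (tw1, td3) pattern of "document" in Fig. 2 yields the role Aggregation, and (tw3, td1) for "summary" yields Specialization.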

4.2 Generating Skeletons for Slides


Based upon the expression styles drawn from pre-existing texts and slides, we can
generate skeletons for slides from a target text in a textbook by extracting the words
in the target text that correspond to the words in the pre-existing texts. We consider
texts in which the chapters of a textbook have the same document structure in terms of
the sections in each chapter. When the frequency of a word z across all sections of the
pre-existing text Ta and the frequency of a word z′ across all sections of the target text Tb
have the same tendency, we consider that z′ corresponds to z.
For each word, we rank the sections in terms of its frequency, and calculate
Spearman's rank correlation coefficient R(f(z, Ta), f(z′, Tb)) between the section
rankings f(z, Ta) of z in Ta and f(z′, Tb) of z′ in Tb. Based on
the above criteria, we extract a pair Cp of z in Ta and z′ in Tb as follows:

[Figure: the ten generated slide skeletons (slides 2–11), arranging words such as "citation", "relationship", "visualization", "frequency", "analysis", and "representation" at their level positions; positions that do not appear in Sb's slides are marked.]

Fig. 3 Generated slide skeletons compared with slides in Sb

Cp = {(z, z′) | R(f(z, Ta), f(z′, Tb)) > γ, z ∈ W}   (11)

If R(f(z, Ta), f(z′, Tb)) is greater than a threshold γ close to 1 in Eq. (11), z′
is determined to be the corresponding word of z. Therefore, we are able to generate
skeletons for layout slides by giving z′ the same expression style as z, according to
Eqs. (7), (8), (9) and (10); the number of skeletons for slides is the same as the number
of pre-existing slides.
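The corresponding-word extraction of Eq. (11) can be sketched with a stdlib-only Spearman rank correlation over per-section frequency vectors; all names and the sample threshold γ = 0.8 are illustrative:

```python
def ranks(values):
    # Average ranks (1-based), with ties sharing their mean rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Spearman's rank correlation = Pearson correlation of the rank vectors.
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

def corresponding_words(freq_a, freq_b, gamma=0.8):
    """Sketch of Eq. (11): pair (z, z') when their per-section frequency
    rankings in Ta and Tb correlate strongly."""
    return {(z, zp)
            for z, fa in freq_a.items()
            for zp, fb in freq_b.items()
            if spearman(fa, fb) > gamma}

# "document" in Ta and "query" in Tb peak in the same sections; "figure" does not.
freq_a = {"document": [5, 3, 1, 4]}
freq_b = {"query": [10, 6, 2, 7], "figure": [1, 2, 9, 3]}
```

With these toy frequencies, "document" and "query" have identical section rankings (correlation 1.0) and are paired, while "figure" is rejected.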
For example, an author wants to make slides for a lecture regarding Chapter 6 in
a textbook. Our method generates skeletons for slides from Chapter 6, referring to
Presentation 5 from Chapter 5 (see Fig. 2). In Chapter 5 the word “document” ap-
pears in all sections, and it occurs in high density in some certain ranges of the text
segments. Meanwhile, if “document” appears frequently in slide a6 only in Presen-
tation 5, then the role of “document” is Aggregation. In Chapter 6 the word “query”
appears in all sections that correspond to “document” in Chapter 5. The skeleton for
slide b6 generated from Chapter 6 shows that “query” appears frequently in slide
b6, which explains "query expansion" in terms of "query." Next, "query" in slide
b6 has the same role as "document." When the author makes slides referring to the
skeletons for slides, such as slide b6, the information for “query” in slide b6 is con-
structed in the same way as it is for the level positions of “document” in slide a6,
based upon the same expression style. The generated skeletons can be used to cre-
ate layout slides that construct words according to the same roles the words play in
pre-existing slides, and these skeletons then enable the author to make slides easily.

5 Evaluation: Validity of Generating Skeletons


The aim of this experiment was to verify whether our method is useful for generating
skeletons for slides. We first prepared two presentation files: Sa from text Ta and Sb
from text Tb were made by the same person, both from Chapter 11 in a textbook
called Search User Interfaces [13]. Because of their single authorship, Sa and Sb both
have the same expression styles, and Ta and Tb have the same document structure.
Each presentation file contains 10 slides, not counting the cover slide. We used Ta
and Sa to generate skeletons from Tb based on our method; the slides in Sb serve as
correct answers for judging whether the level positions of the words in the slides
generated from the skeletons from Tb are correct or not.
First, we extracted the expression styles of 14 important words in Ta and Sa and of
9 words in Tb , which correspond to 8 words of the 14 important words in Ta , based
on our method. There were 40 level positions of 8 words from Ta that are in Sa .
Next, we generated 10 slide skeletons from Tb with the same number of slides as in
Sa , and 40 level positions of 9 words from Tb were arranged in slide skeletons based
on the expression styles of the 8 corresponding words in Ta . Finally, we compared
them with the correct answers as Sb ’s slides (see Fig. 3).
In the experimental results, the correct rate of the level positions of words in
slides by the generated skeletons based on our method was 62.5% (25/40), and the
correct rate of the expression styles of the words was 66.7% (6/9). The result for the
skeleton generation was low, and it depended upon the expression styles of
the words that were arranged in the slides. For example, our method determined the
expression style of a word that has one important description in Sa; however, the same
expression style was used even though the corresponding word has multiple important
descriptions in the correct answer Sb. In addition, we need to consider figure
captions when determining the important elements in the text. Sa and Sb, which were
written by the same person, contain a number of important words in slides that
appear in figure captions in the texts. However, words in the body of the text
that appear only once cannot be determined to be important words by our method.
This experiment showed that our method can arrange the words in slides using
generated skeletons based on their expression styles. However, our method could not
extract the corresponding words by using the frequency of each word in all sections
of Ta and Tb when some words appeared frequently in only one section, or when
some words appeared just once in one section. This was one of the reasons why
the rate of correct responses was low. Therefore, such corresponding words in the
target text used for generating skeletons also need to be considered.

6 Concluding Remarks
In this paper, we proposed a method of skeleton-generation that provides support
for making slides based on the expression styles of words. We described in detail
how expression styles are determined by extracting the patterns that combine the
differences between the important words and the important descriptions of words in

texts and slides, respectively. To generate skeletons for slides from a target text, we
extracted the words in the target text that correspond to the words in pre-existing
text, and we then used the same expression styles of the words in the target text.
In the future, we plan to improve our algorithm for skeleton generation and to
evaluate it using a large set of actual presentation data. We also plan to enhance
our method for extracting corresponding words based on the document structures of
texts, not only in terms of sections but also in terms of paragraphs in a section.

References
1. SlideShare, http://www.slideshare.net/
2. CiteSeerX, http://citeseer.ist.psu.edu/index
3. Yokota, H., Kobayashi, T., Okamoto, H., Nakano, W.: Unified contents retrieval from an
academic repository. In: Proc. of International Symposium on Large-scale Knowledge
Resources (LKR 2006), pp. 41–46 (March 2006)
4. Kurohashi, S., Shiraki, N., Nagao, M.: A Method for Detecting Important Descriptions of
a Word Based on Its Density Distribution in Text. IPSJ (Information Processing Society
of Japan) 38(4), 845–854 (1997)
5. Mathivanan, H., Jayaprakasam, M., Prasad, K.G., Geetha, T.V.: Document summariza-
tion and information extraction for generation of presentation slides. In: Proc. of Inter-
national Conference on Advances in Recent Technologies in Communication and Com-
puting (ARTCOM 2009), pp. 126–128 (October 2009)
6. Beamer, B., Girju, R.: Investigating automatic alignment methods for slide generation
from academic papers. In: Proc. of the 13th Conference on Computational Natural Lan-
guage Learning (CoNLL 2009), pp. 111–119 (June 2009)
7. Yoshiaki, Y., Masashi, T., Katsumi, N.: A support system for making presentation slides.
In: Transactions of the Japanese Society for Artificial Intelligence, pp. 212–220 (2003)
(in Japanese)
8. Shibata, T., Kurohashi, S.: Automatic Slide Generation Based on Discourse Structure
Analysis. In: Dale, R., Wong, K.-F., Su, J., Kwong, O.Y. (eds.) IJCNLP 2005. LNCS
(LNAI), vol. 3651, pp. 754–766. Springer, Heidelberg (2005)
9. Kan, M.: Slideseer: A digital library of aligned document and presentation pairs. In:
Proc. of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2007, pp.
81–90 (2007)
10. Hayama, T., Nanba, H., Kunifuji, S.: Alignment between a technical paper and presenta-
tion sheets using a hidden markov model. In: Proc. of the 2005 International Conference
on Active Media Technology (AMT 2005), pp. 102–106 (May 2005)
11. Blackman, R.B., Tukey, J.W.: Particular Pairs of Windows. In: The Measurement of
Power Spectra, From the Point of View of Communications Engineering, pp. 95–101.
Dover, New York (1959)
12. Simpson, E.H.: Measurement of diversity. Nature 163, 688 (1949)
13. Hearst, M.A.: Search user interfaces, pp. 281–296. Cambridge University Press (Novem-
ber 2009)
Stochastic Applications for e-Learning System

Syouji Nakamura, Keiko Nakayama, and Toshio Nakagawa

Abstract. This paper considers the optimal interval of face-to-face study support for
a learner, and derives analytically the optimal study support policy using a
stochastic model based on access log data of the e-learning system contents. If
a lecturer does not provide face-to-face study support to a student in an e-learning
system, the learner may drop out of the target subject to be
studied. However, if the lecturer provides such support to the learner every
time, a problem occurs from the viewpoint of cost-effectiveness.

1 Introduction
Highly efficient and effective education is a responsibility for every
school and society [1]. One measure of evaluation is whether education reaches
a high target level with a small educational effort. Because education is an external
economy, it has not been thought to be related to marginal utility in economics up to
now. If a lecturer does not provide study support to a learner in an e-learning
system, the learner often may drop out of the study [2]. Such a problem causes
study stagnation due to changes in the learner's life environment. Then, as study stagnation
is prolonged, the student's motivation decreases [3]. Therefore, it is an important role
Syouji Nakamura
Kinjo Gakuin University, 1723 Omori 2-chome, Moriyama-ku, Nagoya, 463-8521, Japan
e-mail: snakam@kinjo-u.ac.jp
Keiko Nakayama
Chukyo University, 101-2 Yagoto-Honmachi, Showa-ku, Nagoya, 466-8666, Japan
e-mail: nakayama@mecl.chukyo-u.ac.jp
Toshio Nakagawa
Aichi Institute of Technology, 11247 Yachigusa, Yakusa-cho, Toyota, 470-0392, Japan
e-mail: toshi-nakagawa@aitech.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 561–568.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012

to provide study support such that the lecturer may lead the learner to completion in the
e-learning system.
In this paper, we consider how a lecturer who uses the e-learning system can
give the learner efficient study support. In general, the content of study support
differs depending on the lecture. Therefore, rather than the content, method,
or level of study support, we consider the frequency of the study support [4, 5]. In
addition, the learner's understanding level is proportional to the number of access
logs of the e-learning system. That is, in an e-learning system for self-study, if a
learner's accesses to the system increase, the learner's acquired items increase;
conversely, the learner acquires few items when there is little access to the
e-learning system. We use the study support history data in the e-learning system.
It is necessary to decide the study support frequency so as to reduce the workload of
the lecturer.
We apply the cumulative damage model [6, 7] to the study support of the e-learning
system, and analytically derive the optimal number of study supports. In the cumulative
damage model, shocks occur at random times, and damage such as
fatigue, wear, crack growth, creep, and dielectric breakdown is additive. In this paper,
we apply the cumulative damage model to the e-learning system by identifying a
shock with an access to the e-learning system, a failure with the learner's acquirement
of items, and the damage with the credit threshold.

2 e-Learning Model
Suppose that a learner should acquire a total number of items K (> 0) in the e-learning
system over the period [0, S], and the number of acquired items at time t is At, which
is proportional to time t. If the number of acquired items at the final period
S is AS ≥ K in the e-learning system, then the study support is effective. Conversely,
if the number of acquired items at period S is AS < K, then the
study support is ineffective. Regarding the learner's acquirement of learning items,
we consider that the understanding time per item is not constant because of
the learner's characteristics and the lecturer's teachability. Thus, we assume that A is
a random variable with mean E{A} = a > 0 and probability distribution function
G(x) ≡ Pr{A ≤ x}.
The probability that a learner can achieve the number of items K for period S is

Pr{AS > K} = Pr{A > K/S } = 1 − G(K/S ), (1)

and the probability that a learner cannot achieve K for S is

Pr{AS ≤ K} = Pr{A ≤ K/S } = G(K/S ). (2)
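Equations (1) and (2) can be sketched directly; the distribution G below is an illustrative exponential choice (mean a = 1.5), not taken from the paper:

```python
import math

def achieve_probability(K, S, G):
    # Eq. (1): Pr{AS > K} = 1 - G(K/S), with G the distribution function of A.
    return 1.0 - G(K / S)

# Illustrative choice, not from the paper: exponential A with mean a = 1.5.
a = 1.5
G = lambda x: 1.0 - math.exp(-x / a)

p = achieve_probability(K=10, S=10, G=G)  # requires the rate A to exceed K/S = 1
```

The complementary probability G(K/S) of Eq. (2) is simply 1 − p, so the two equations partition all outcomes.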



3 Optimal Study Support Policy


3.1 Continuous Optimal Study Support Policies
In a cumulative damage model, the amount of damage at time t is defined by Z(t) = at + Bt
(a > 0), where Bt is assumed to have the exponential distribution
Pr{Bt ≤ x} = 1 − e^(−x/(σ√t)) [8]; i.e., we assume that the number of acquired items
is proportional to time t, and in addition that Bt increases through the study support.
Let T (T ≤ K/a) be the acquirement period. Suppose that the learner should
achieve more than K acquirement items in the e-learning system by
time T. The probability of achieving the number K of items until time T is

$$\Pr\{Z(T) > K\} = \Pr\{B_T > K - aT\} = \begin{cases} \exp\!\left[-\dfrac{K-aT}{\sigma\sqrt{T}}\right] & T \le K/a, \\[4pt] 1 & T > K/a. \end{cases} \qquad (3)$$
Conversely, the probability of not achieving K is

$$\Pr\{Z(T) \le K\} = \Pr\{B_T \le K - aT\} = \begin{cases} 1 - \exp\!\left[-\dfrac{K-aT}{\sigma\sqrt{T}}\right] & T \le K/a, \\[4pt] 0 & T > K/a. \end{cases} \qquad (4)$$
The mean time to achieve K, or to reach time T without achieving K, is

$$T\,\Pr\{Z(T) \le K\} + \int_0^T t\, d\Pr\{Z(t) > K\} = \int_0^T \left\{1 - \exp\!\left[-\frac{K-at}{\sigma\sqrt{t}}\right]\right\} dt. \qquad (5)$$

As a standard measure of whether a learner can accomplish the K items or not,
we introduce the following costs:
c1: loss cost when a learner cannot achieve the regulated acquirement items K
during the period S.
c2: cost per unit of time for which the lecturer checks a learner in the e-learning
system and executes the study support.
The total cost of a learner in this e-learning system during T is

$$C_1(T) = c_1\left[1 - \exp\!\left(-\frac{K-aT}{\sigma\sqrt{T}}\right)\right] + c_2 T. \qquad (6)$$
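Minimizing the continuous cost C1(T) of Eq. (6) can be sketched by a grid search over (0, K/a]; the parameter values are purely illustrative (for these values the boundary T = K/a happens to be optimal, since the failure probability vanishes there faster than the support cost grows):

```python
import math

def C1(T, K, a, sigma, c1, c2):
    # Total cost of Eq. (6), valid for 0 < T <= K/a.
    return c1 * (1 - math.exp(-(K - a * T) / (sigma * math.sqrt(T)))) + c2 * T

def optimal_T(K, a, sigma, c1, c2, steps=10000):
    # Grid search over (0, K/a]; no closed-form solution is attempted here.
    grid = [(K / a) * (i + 1) / steps for i in range(steps)]
    return min(grid, key=lambda T: C1(T, K, a, sigma, c1, c2))

# Illustrative parameters: K = 10, a = 1, sigma = 2, c1 = 50, c2 = 1.
t_star = optimal_T(10, 1, 2, 50, 1)
```

The same search also covers the regime where waiting is pointless (c1 small relative to c2), in which the minimizer moves toward small T.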

3.2 Discrete Optimal Study Support Policy


A learner has to achieve the regulated number K of items during a period S. The lecturer
checks the number of the learner's items at periodic times jT (j = 1, 2, ..., N),
where NT = S, and supports the learner's study. In such a system, it is assumed that
the lecturer's support at each time jT effectively adds b (> 0) items.
The probability of achieving the number K of items until time S is
   
$$\Pr\{AS + Nb > K\} = \Pr\left\{A > \frac{K-Nb}{S}\right\} = 1 - G\!\left(\frac{K-Nb}{S}\right), \qquad (7)$$

and the probability of not achieving K is

$$\Pr\{AS + Nb \le K\} = G\!\left(\frac{K-Nb}{S}\right). \qquad (8)$$

The total cost of a learner in this e-learning system during S is


 
$$C_2(N) = c_1\, G\!\left(\frac{K-Nb}{S}\right) + N c_2 \qquad (N = 0, 1, 2, \ldots). \qquad (9)$$

We seek an optimal number N* that minimizes C2(N) in (9), i.e., the minimum
number of supports N* with which the lecturer can support a learner. From Nb ≤ K, it is clear that
N ≤ K/b. Thus, we may obtain N* among N = 0, 1, 2, ..., [K/b].
Letting Nb ≡ x, 0 ≤ x ≤ K, from (9),

$$\widetilde{C}_2(x) \equiv \frac{b}{c_2}\, C_2(x/b) = \frac{b c_1}{c_2}\, G\!\left(\frac{K-x}{S}\right) + x. \qquad (10)$$

Clearly,

$$\widetilde{C}_2(0) = \frac{b c_1}{c_2}\, G\!\left(\frac{K}{S}\right), \qquad \widetilde{C}_2(K) = K.$$

Differentiating $\widetilde{C}_2(x)$ with respect to x and setting it equal to zero,

$$g\!\left(\frac{K-x}{S}\right) = \frac{c_2 S}{c_1 b}, \qquad (11)$$

where the function g(x) is the density function of G(x).
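The discrete search for N* over N = 0, ..., [K/b] that minimizes C2(N) in Eq. (9) can be sketched as follows, with G supplied as any distribution function; the sample values use the Weibull case with m = 1 and a = 1 (so G(x) = 1 − e^(−x) and K = aS = 10), which are illustrative, not taken from the paper's data:

```python
import math

def C2(N, K, S, b, c1, c2, G):
    # Expected cost of Eq. (9) with N support times.
    return c1 * G((K - N * b) / S) + N * c2

def optimal_N(K, S, b, c1, c2, G):
    # Search N = 0 .. [K/b], per the bound Nb <= K.
    candidates = range(int(K // b) + 1)
    return min(candidates, key=lambda N: C2(N, K, S, b, c1, c2, G))

# Weibull case with m = 1, a = 1: G(x) = 1 - exp(-x); K = aS = 10, b = 1.
G = lambda x: 1 - math.exp(-x)
```

When the loss c1 dominates the per-support cost c2, the minimizer climbs to the bound K/b; when support is relatively expensive, no support at all (N* = 0) is optimal, mirroring cases (i) and (ii) of the continuous analysis.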

4 Numerical Examples
As numerical examples, we consider two cases where A has a Weibull distribution
and a normal distribution.

4.1 Weibull Distribution Case


Suppose that A is a random variable with a Weibull distribution G(x) = 1 − exp[−(x/a)^m]. Obviously,

    E{A} = aΓ(1 + 1/m),        (12)

where Γ(α) ≡ ∫₀^∞ x^(α−1) e^(−x) dx (α > 0). From (8), the probability that the learner cannot achieve the regulation number K of items until time S is

    G((K − Nb)/S) = 1 − exp{−[(K − Nb)/(aS)]^m}.        (13)

In particular, from (12), it is assumed that when b = 1,

    aΓ(1 + 1/m) S = K = 10.

Equation (9) is

    C2(N)/c2 = (c1/c2) {1 − exp[−((10 − N)/(10/Γ(1 + 1/m)))^m]} + N
                                  (N = 0, 1, 2, . . . , 10).        (14)

We seek an optimal number N∗ (N∗ = 0, 1, 2, . . . , 10) that minimizes C2(N) in (14). When m = 1, a learner usually has to acquire 10 items by time 10/a. Then, from (11),

    (1/a) exp[−(K − x)/(aS)] = c2 S/(c1 b).        (15)

(i)  If exp(−1)/10 ≤ c2/(bc1), then x∗ = 0, i.e., N∗ = 0.
(ii) If exp(−1)/10 > c2/(bc1), then there exists a finite and unique x∗ (0 < x∗ < K)
     which satisfies (15).
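The discrete search for N∗ in (14) is also simple to carry out directly. The sketch below (function names are ours) reproduces the c1/c2 = 20 row of Table 1:

```python
import math

def C2_over_c2(N, m, r, K=10, b=1):
    """Expected cost C2(N)/c2 of Eq. (14); r = c1/c2, Weibull shape m."""
    aS = K / math.gamma(1.0 + 1.0 / m)  # from a*Gamma(1 + 1/m)*S = K
    p_fail = 1.0 - math.exp(-(((K - N * b) / aS) ** m))
    return r * p_fail + N

def optimal_N(m, r, K=10, b=1):
    """Smallest-cost number of support times, searched over N = 0..K/b."""
    return min(range(K // b + 1), key=lambda N: C2_over_c2(N, m, r, K, b))
```

For example, `optimal_N(1, 20)` and `optimal_N(2, 20)` give the entries N∗ = 10 and N∗ = 6 of the c1/c2 = 20 row in Table 1.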

4.2 Normal Distribution Case


Suppose that a random variable A has a normal distribution N(a, σ²/S) and aS = K = 10. Then,

    G((K − Nb)/S) = Φ(((K − Nb)/S − a)/(σ/√S)) = Φ(−Nb/(σ√S)),        (16)
where Φ(x) is the standard normal distribution N(0, 1). From (9), we seek an optimal N∗ (0 ≤ N∗ ≤ K/b) which minimizes

    C2(N)/c2 = (c1/c2) Φ(−Nb/(σ√S)) + N.        (17)
Letting Nb = x,

    C̃2(x) ≡ b C2(x/b)/c2 = (b c1/c2) Φ(−x/(σ√S)) + x        (0 ≤ x ≤ K).        (18)

Differentiating C̃2(x) with respect to x and setting it equal to zero,

    (1/(σ√S)) φ(−x/(σ√S)) = c2/(b c1),        (19)

where φ(x) = (1/√(2π)) e^(−x²/2).

(i)  If 1/(σ√(2πS)) ≤ c2/(bc1), then x∗ = 0, i.e., N∗ = 0.
(ii) If 1/(σ√(2πS)) > c2/(bc1), then there exists a finite and unique x∗ (0 < x∗ < K)
     which satisfies (19), and N∗ = [x∗/b] or N∗ = [x∗/b] + 1. If N∗ > K/b, then N∗ = K/b.

In particular, when N = 0,

    C2(0) = c1/2.        (20)
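The normal-distribution case can be searched in the same way, using the error function for Φ. This is a sketch with our own helper names; note that C2(0)/c2 = (c1/c2)Φ(0) recovers (20):

```python
import math

def Phi(x):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def C2_normal(N, r, sigma, S=10.0, b=1.0):
    """Expected cost C2(N)/c2 of Eq. (17); r = c1/c2."""
    return r * Phi(-N * b / (sigma * math.sqrt(S))) + N

def optimal_N_normal(r, sigma, S=10.0, b=1, K=10):
    """Cost-minimizing number of support times over N = 0..K/b."""
    return min(range(K // b + 1), key=lambda N: C2_normal(N, r, sigma, S, b))
```

As (20) states, `C2_normal(0, r, sigma)` equals r/2 for any σ, and the optimal N∗ grows with r, matching the trend of Table 2.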
Table 1 shows that if c1/c2 becomes large, the lecturer should increase the frequency of study support. For example, when c1/c2 = 15, the lecturer should make N∗ = 10 supports for m = 1 and N∗ = 5 supports for m = 2. That is, when the ratio of cost c1 to c2 is 15, study support should be provided almost every week for m = 1 and every 2 weeks for m = 2.

Table 1 Optimal number N∗ (b = 1, S = K = 10) in the Weibull distribution case

c1 /c2 m = 1 m = 2
5 0 0
10 0 0
15 10 5
20 10 6
30 10 8
40 10 8
50 10 9

Table 2 Optimal number N∗ (b = 1, S = K = 10) in the normal distribution case

c1 /c2 σ = 1 σ=2
2 0 0
5 3 0
10 5 5
15 6 8
20 6 9
30 7 10
40 7 10
50 10 10

Table 2 presents the optimal numbers N∗ which minimize C2(N) when b = 1, S = K = 10, for σ = 1, 2 and c1/c2 = 2–50. The optimal N∗ becomes large with c1/c2, and the total cost increases over the study period S. When c1/c2 is more than 15, the optimal N∗ becomes large with σ.

5 Conclusion
In the proposed model, the optimal study support interval can be determined analytically from the accumulated access to the e-learning system. However, we do not consider the form and content of the study support, and a support method should follow the learner's acquirement progress. In addition, it is a problem that the same cost is assumed for every support, although the content of the study support may differ each time.
As future problems, we should analyze the characteristics of the distribution of a learner's acquirement situation and the cost of the learner's study support, and compare them.

Acknowledgements. The authors gratefully acknowledge the financial support by the Grant-in-Aid for Scientific Research (C), Grant No. 21530318 (2009-2011) and Grant No. 22500897 (2010-2012) from the Ministry of Education, Culture, Sports, Science and Technology.

References
1. Obara, Y.: ICT using on University class. Tamagawa university press (2002) (in Japanese)
2. Ueno, M.: Knowledge Society in E-learning. Baifuukann (2007) (in Japanese)
3. Yamashita, J., et al.: Development of Aiding System for the Support for Distance Learners.
JSiSE Research Report 23(2), 41–46 (2008) (in Japanese)
4. Nakamura, S., Nakayama, K., Nakagawa, T.: The optimal study support interval policy in
e-learning. Discussion Paper, Institute of Economics Chukyo University. No. 0812 (2009)
5. Nakayama, K. (ed.), Nakamura, S., Nakagawa, T., et al.: Associated Economics filed
Stochastic Process and Education. Keisou Syobou (2011) (in Japanese)
6. Nakagawa, T.: Maintenance Theory of Reliability. Springer (2005)
7. Nakagawa, T.: Shock and Damage Models in Reliability Theory. Springer (2007)
8. Takács, L.: Stochastic Processes. Wiley (1960)
9. Barlow, R.E., Proschan, F.: Mathematical Theory of Reliability. Wiley, New York (1965)
Supporting Continued Communication with
Social Networking Service in e-Learning

Kai Li and Yurie Iribe

Abstract. In this paper, we describe a practical use of social networking service to


support continued communication among adult students in an e-learning program.
The access counts of functions such as my homepage, friends' diary,
ranking, send message and add diary are analyzed, and communication activities are
compared among students of three course periods. The results show that the most
frequent communication activity is reading friends' diaries, and the next most
frequent activities are accessing footprints and listing all diaries. Furthermore,
both the first-year students and the second-year students show continued
communication activities in the SNS even after they have graduated. In particular,
the first-year students show significantly more communication activities than the
students of the other two course periods. As a result, we conclude that continued
communication among adult students is supported by the SNS.

Keywords: e-learning, SNS, communication.

1 Introduction
In recent years, educational research has focused on the use of computer-mediated
communication (CMC)(Kato, S. & Akahori, K., 2004, Joinson, A.N., 2001).
Research has demonstrated that CMC in the teaching-learning process creates
more flexible communication patterns (Berge & Collins, 1996; Heller & Kearsley,
1996; Ruberg, Moore, & Taylor, 1996). CMC allows students to interact with their

Kai Li
Research Center for Agrotechnology and Biotechnology, Toyohashi University of
Technology, Japan
e-mail: kaili@recab.tut.ac.jp

Yurie Iribe
Information and Media Center, Toyohashi University of Technology, Japan
e-mail: iribe@imc.tut.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 569–577.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
570 K. Li and Y. Iribe

instructors and peers in a time that is convenient for them and may increase stu-
dent responsibility and self-discipline (Berge & Collins, 1996, Hsi & Hoadley,
1997). CMC can also equalize participation by masking social cues and cultural
differences (Berge & Collins, 1996; Hsi & Hoadley, 1997).
Social network services were defined as web-based services that allow individuals to construct a public or semi-public profile within a bounded system, articulate a list of other users with whom they share a connection, and view and traverse their list of connections and those made by others within the system (boyd, d. m., & Ellison, N. B., 2007). A social network service focuses on building online communities of people who share interests and/or activities, or who are interested in exploring the interests and activities of others (Gross, R., Acquisti, A., 2005).
Recently, social networking services (SNS) have been introduced as a new CMC mode among university students (Sagayama, K. et al., 2008, Tokuno, J. et al., 2007, Umeda, K. et al., 2007). Most studies reported that SNS was effective in supporting communication between university students and teachers, but few studies have focused on whether SNS is effective in supporting continued communication among adult students. In our previous study we found that adult students have more private communication than public communication in SNS.
In this study, we focus on discussing whether SNS can support continued communication among adult students and on comparing the different communication activities in three course periods.

2 Project Background
This study is about a blended learning project combining classroom lectures and e-learning courses. All the students are adult students who are interested in IT and agricultural technology. The learning period of the project is about 18 months. In the beginning of the project, the adult students have two months of classroom lectures held only on weekends, and then they take 12 e-learning courses at home for the following 8 months. Finally, they have on-site training for one month in different places. After the first-year students graduated, the second-year students took the same learning program as the first-year students.
In order to support communication between students and instructors, the Sendoshi-SNS was developed (see figure 1). It was developed based on the Open Source Software OpenPNE (Official site, 2008). OpenPNE is a social networking service engine providing SNS functions such as profile, diary, footprint, message, community and ranking.

2.1 Users of the Sendoshi-SNS


There are 30 first-year students, 32 second-year students and 32 third-year students in the IT-agriculture learning project. They have different educational backgrounds, from high school graduate to doctoral graduate, and a wide range of ages from the 20s to the 60s, which resulted in the different communication activities in the SNS that we discuss later in this study. All of the students and 10 staff members, serving as administrators and mentors, participated in the SNS. It is a membership system: no one except the students and the staff can log in to the SNS. All of the users were asked to use their real names in their profiles so that everyone knows who they are.
Supporting Continued Communication with SNS in e-Learning 571

Fig. 1 Snapshot of Sendoshi-SNS

2.2 Hit Count


The author, as a super administrator of the network, extracted the log data from November 2008 to December 2011. In this study, the authors focus on the hit counts of the main communication functions: my homepage, friends' diary, footprint, ranking, diary list, send message, reply message and add diary.

2.3 Functions in SNS


In the Sendoshi-SNS, there are many functions to support communication among students. In this study, we focus on the main functions of my homepage, friends' diary, footprint, list all diaries, ranking, send message, reply message, and add diary.
The function of my homepage is the first page shown when the user logs in to the SNS.
The function of friends' diary shows his/her friends' new diaries.
The function of footprint is a list of visitors' names and visiting times, which shows who visited his/her homepage and when.
The function of list all diaries is a list of new diaries in the SNS. An outline of each diary is shown, and the user can click the outline to read the details of the diary if he/she is interested in the contents.
The function of ranking is a list of the most noticed users in recent days.
The function of send message is private communication between users. No one else can read the private messages.
The function of reply message lets a user reply privately to other students' messages.
The function of add diary is public communication that can be read by everyone.

3 Analysis of the Communication Activities


First, we compared all of the students' communication activities in their first year of learning. Because they had the same learning program, even in different course periods, their activities in the first year of learning can be compared. Second, in order to discuss whether the SNS can support students' continued communication, three years of communication activities of the first-year students and two years of activities of the second-year students were compared.

3.1 Communication Activities of all the Students


The results show that the first-year students have more activities in the SNS than the students of the other two years (see figure 2). An ANOVA shows that there are significant differences among the three years' students in accessing the my homepage function (F(2,79)=6.78, p=.002<.005). The first-year students show significantly more hit counts than the second-year students (p=.008) and the third-year students (p=.005) (see Table 1). Furthermore, there are significant differences among the three years' students in the functions of replying message (F(2,63)=5.52, p=.006<.05), listing all diaries (F(2,49)=3.69, p=.032<.05), friends' diaries (F(2,78)=9.49, p=.000<.001), and sending messages (F(2,75)=5.16, p=.008<.05). In the function of replying message, the first-year students (M=79.78) show significantly more hit counts than the second-year students (M=14.92) (p=.012) and the third-year students (M=10.71) (p=.029). In the function of reading friends' diaries, the first-year students (M=606.42) show significantly more hit counts than the second-year students (M=146.62) (p=.001) and the third-year students (M=163.55) (p=.002). In the function of sending messages, the first-year students (M=120.60) show significantly more hit counts than the second-year students (M=26.00) (p=.026) and the third-year students (M=11.38) (p=.017).

Fig. 2 Hit count of functions of my homepage, friends’ diary, footprint, etc. (Vertical axis is
hit count)

Table 1 Average hit count in functions of home, diary, etc. among three years students

first-year students second-year students third-year students p


home 1270.81 264.86 154.04 0.002 **
diary 606.42 146.62 163.55 0.000 ***
footprint 302.27 46.81 41.41 0.079
diary_list_all 196.93 95.24 85.57 0.243
ranking 118.46 42.47 44.85 0.185
message_send 120.60 26.00 11.38 0.008 *
reply_message 79.78 14.92 10.71 0.006 *
diary_add 32.73 7.52 44.67 0.032 *
*p<.05, ** p<.005, ***p<.001.
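The one-way ANOVA comparisons above (e.g. F(2,79)=6.78 for my homepage) can be reproduced from raw hit counts with a short routine. This is a generic sketch; the sample groups below are invented for illustration and are not the study's data:

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total sample size
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# hypothetical per-student hit counts for three cohorts
F = anova_f([[1200, 1400, 1100], [300, 250, 260], [150, 160, 140]])
```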

3.2 Communication Activities of the First-Year Students in Three


Course Periods
The first-year students have used the Sendoshi-SNS for three course periods. In the first course period, they had in-classroom courses and e-learning courses, but in the next two course periods, they had no learning activities. Although they had graduated, they could still use the SNS to communicate with others.

Fig. 3 Hit count in different functions by the first-year students in three years (Vertical axis
is hit count)

The results show that the first-year students have more communication activities in the first course period (see figure 3). In particular, there are significant differences in reading friends' diaries (F(2,75)=5.52, p=0.006<0.05) (see table 2). They read significantly more friends' diaries in the first course period (M=606.42) than in the third course period (M=178.29) (p=0.007<0.05). There are no significant differences in the activities of listing diaries, ranking, etc. Although they had no learning activities in the second and the third course periods, we could still find some communication activities in the SNS, from which we conclude that the first-year students maintained continued communication across the three course periods even after they had graduated.

Table 2 The first-year students' average hit counts in the functions of home, diary, etc. over three years

the first year the second year the third year p


home 1270.81 753.15 366.05 0.155
friends_diary 606.42 299.73 178.29 0.006 *
footprint 302.27 438.18 238.77 0.773
diary_list_all 196.93 234.05 203.12 0.904
ranking 118.46 113.11 86.25 0.905
message_send 120.60 55.68 10.81 0.077
reply_message 79.78 36.58 9.08 0.071
diary_add 32.73 19.41 7.80 0.14
*p<.05.

3.3 Communication Activities of the Second-Year Students in


Two Course Periods
As with the first-year students, we performed the same analysis of the second-year students' communication activities in two course periods (see figure 4). There are no significant differences in communication activities between the two course periods (see table 3), from which we conclude that the second-year students also maintained continued communication even after they had graduated.

Fig. 4 Hit count in different functions by the second-year students in two years (Vertical
axis is hit count)

Table 3 The second-year students' average hit counts in the functions of home, diary, etc. over two years

the second year the third year p


home 264.86 154.12 0.205
friends_diary 146.62 165.26 0.774
diary_list_all 95.24 158.89 0.480
footprint 46.81 45.83 0.974
ranking 42.47 38.18 0.860
message_send 26.00 12.11 0.147
reply_message 14.92 8.29 0.230
diary_add 7.52 11.56 0.313

4 Discussion

4.1 Hit Count of Different Functions in SNS


The most frequent communication activity in the Sendoshi-SNS is reading friends' diaries (see figure 2), which serves the main purpose of the project: sharing information among students. The next most frequent communication activity is reading footprints, by which a user can know who has visited his/her pages and when. As a result, we conclude that the adult students not only share their information with others but also are concerned about whether and how many times their diaries are read by others. The function of listing all diaries is also among the most accessed functions in the SNS: the students not only share their own information but also pay attention to others' information. Compared with adding diaries to share with all students, the students engage in more private communication by sending messages to friends, which cannot be read by others. These results are the same as in our previous research.
Comparing the communication activities of the three years' students, the first-year students show significantly more hits than the second-year students and the third-year students (see Table 1), but we found no significantly different activities between the second-year students and the third-year students. Therefore, we conclude that the first-year students are the most active in communication among the three years' students. Since all the students have the same learning courses, the first-year students had the most curiosity about the learning project because they had no information about it, whereas the students of the other two years had obtained some information from the first-year students and progressively lost curiosity about the project. From these results, it is suggested that in order to promote continued communication in SNS, new information or learning materials should be updated occasionally.

4.2 Continued Communication Activities


In figure 3 and figure 4, we can find continued communication activities of the first-year students and the second-year students even after they have graduated. The Sendoshi-SNS can support their communication after school. Since the adult students share the same interest in IT and agricultural technology, they continued to share information and experiences with others even after graduation. Compared with the first course period, they show significantly fewer communication activities in reading friends' diaries in the second and the third course periods. However, reading friends' diaries is still the most frequent communication activity in the SNS. Furthermore, activities such as listing all diaries and ranking are found after they have graduated. As a result, we conclude that continued communication is supported by the Sendoshi-SNS, which is what is desired in this learning project.

5 Conclusion and Future Work


In this study, the communication activities of three course periods' students in the functions of reading friends' diaries, footprint, sending messages, adding diaries, etc. are analyzed and compared. The results show that the first-year students and the second-year students maintained continued communication activities in the SNS, and that the most frequent communication activity in the SNS is reading friends' diaries. The next most frequent activities are hitting footprints and listing all diaries. Furthermore, the first-year students showed more communication activities than the students of the other two course periods.
Since continued communication is confirmed in this study, further studies are needed to find out the effective factors for keeping up continued communication in e-learning. In addition, studies should be conducted to evaluate whether tutors' and staff's communication activities can keep up continued communication in SNS, and new SNS functions should also be developed to support continued communication.

References
(1) Kato, S., Akahori, K.: Influences of Past Postings on a Bulletin Board System to New
Participants in a Counseling Environment. In: Proceedings of ICCE 2004, pp. 1549–
1557 (2004)
(2) Joinson, A.N.: Self-disclosure in computer-mediated communication: The role of self-
awareness and visual anonymity. European Journal of Social Psychology 31, 177–192
(2001)
(3) Berge, Z., Collins, M.: Computer mediated communication and the online classroom:
Overview and perspectives. In: Collins, B. (ed.) Computer Mediated Communication,
vol. I, pp. 129–137. Hampton, New Jersey (1996)

(4) Heller, H., Kearsley, G.: Using a computer BBS for graduate education: Issues and outcomes. In: Berge, Z., Collins, M. (eds.) Computer-mediated communication and the online classroom. Distance learning, vol. III, pp. 129–137. Hampton Press, NJ (1996)
(5) Ruberg, L., Moore, D., Taylor, D.: Student participation, interaction, and regulation in
a computer-mediated communication environment: A qualitative study. Journal of
Educational Computing Research 14(3), 243–268 (1996)
(6) Hsi, S., Hoadley, C.: Productive discussion in science: Gender equity through elec-
tronic discourse. Journal of Science Education and technology 6(1), 23–36 (1997)
(7) Boyd, D., Ellison, N.: Social Network Sites: Definition, History, and Scholarship.
Journal of Computer-Mediated Communication 13(1) (2007)
(8) Sagayama, K., Kume, K., et al.: Characteristics and Method for Initial Activity on Campus SNS. In: Proc. of ED-MEDIA 2008, pp. 936–945 (2008)
(9) Tokuno, J., Sakurada, T., Hagiwara, Y., Akita, K., Terada, M., Miyaura, C.: Devel-
opment of a Social Networking Service for Supporting Alumnae’s Re-challenge. IPSJ
SIG Technical Report, 2007-CE 91(10), 53–60 (2007)
(10) Umeda, K., Naito, Y., Nozaki, H., Ejima, T.: A study of university student communi-
cation using SNS Web diaries. In: Supplementary Proc. of ICCE 2007 (WS/DSC),
Hiroshima, Japan, vol. 2, pp. 315–320 (2007)
(11) OpenPNE Official Site (2008) (in Japanese), http://www.openpne.jp/
(12) Gross, R., Acquisti, A.: Information Revelation and Privacy in Online Social Net-
works (The Facebook case). In: Proceedings of WPES 2005, pp. 71–80. Association
of Computing Machinery, Alexandria (2005)
(13) Archer, J.L.: Self-disclosure. In: Wegner, D., Vallacher, R. (eds.) In the self in social
psychology, pp. 183–204. Oxford University Press, London (1980)
Tactile Score, a Knowledge Media of Tactile
Sense for Creativity

Yasuhiro Suzuki, Junji Watanabe, and Rieko Suzuki

Abstract. Sensory information processing has been developed through visualization, by devising methods of description such as musical scores. We propose a knowledge medium for describing and visualizing the tactile sense, with reference to the musical score. We analyze massages in a beauty salon by using tactile scores: we decompose the massage into 42 kinds of basic method components and then analyze them by using principal component analysis, finding that they are classified into 6 basic groups. We visualize the massages as transition sequences among the basic groups in the principal component space, and discover that two basic groups are intermediate between one basic group and another.

1 Introduction
When perceiving an apple, one cannot precisely confirm whether everybody perceives the apple in the same or an equivalent manner. However, for visual and auditory perception, one can share comparable sensations with others to some extent. When given the name of Mona Lisa, we can recall the identical painting. We also can hum the same tune of the subject of Symphony No. 5 of Beethoven. On the other hand, tactile perception has no way to be visualized or recorded in regeneratable formats like paintings and musical scores, to share with others. Tactile perception is among the important senses, along with visual and auditory perception [1].

Yasuhiro Suzuki
Department of Complex Systems Science, Graduate School of Information Science, Nagoya
University
Furocho, Chikusa, Nagoya, Japan
e-mail: ysuzuki@nagoya-u.jp
Junji Watanabe
NTT Communication Laboratories, 3-1 Morinosato Wakamiya Atsugi-shi, Kanagawa
243-0198, Japan
Rieko Suzuki
Tokyo Face Therapie, 2-3-4 Koishikawa Bunkyo Tokyo Japan

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 579–587.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
580 Y. Suzuki, J. Watanabe, and R. Suzuki

Tactile perception has mainly been the subject of cognitive science and psychology studies and has been applied in engineering. Massage, especially, has been studied as a part of alternative medicine, and its effect on relaxation and on the activation of immunocytes has been widely recognized. This study proposes a method of visualizing tactile stimulation in a regeneratable manner, and analyzes the effect of massage.

1.1 Massage as Cosmetology


A coauthor, Rieko Suzuki, has been and still is running an esthetic salon business. Right after its opening, treatments using various cosmetics and devices did not successfully promote beauty, so the number of clients stayed low.
Hence, she reinvestigated the treatment procedures and switched to one comprised mainly of massage. This transition brought improved beauty results, which led to the take-off of the business. Based on this experience, she confirmed that chemical substances (i.e. cosmetics), physical stimulation (i.e. treatment devices) and visual/auditory stimulation (i.e. decor, music and other elements of the salon) cannot effectively improve physical condition, but tactile stimulation can.
Conventional beauty massage was a supplement to cosmetics and treatment devices, and was rather mechanical and plain. Hence, conventional methods could not bring out the intended beauty effect. She developed a more effective massaging method (for whitening and sharpening of the face line, among other effects) by trial and error with more complex methods than conventional ones. This series of trials has shown that how the touch is made can make a significant difference in the result.

1.2 Language of Tactile Sense


Although the face is three dimensional, massages are two dimensional movements over its surface, from which multi-dimensional changes, such as contact area, pressure and speed, are perceived. Also, tactile perception conveys messages differently from spoken language. When one is patted on the shoulder once, he/she might think of an accidental collision, yet when patted twice, it has meaning and he/she interprets it as someone calling. Also, mothers gently tap babies at a steady rhythm when caressing them; the steady rhythm evokes a sense of security in babies.
In other words, counts and rhythm are important in tactile perception. Suppose the basic "count" of massage is the circular stroke from the base point. Just as when one is patted on the shoulder, a single stroke cannot be distinguished from mere rubbing, and two or more strokes are required to be recognized as massage. This set of two or more strokes is considered the basic element. Counts are the alphabet of massage, basic elements are words, and combinations of words correspond to massages. As when mothers gently tap, a steady rhythm adds meaning and a sense of security to massage; it corresponds to measures in music. For example, in quadruple measure, basic elements in 4 counts compose one unit of
Tactile Score, a Knowledge Media of Tactile Sense for Creativity 581

massaging. Hence, in order to describe massaging, we borrow from the method of


musical scoring to make use of the tactile score.

2 Tactile Score
To visualize and investigate the method of massage, we take massaging to be composed of pressure, the area of touching, and the velocity of the movement of the hands. In the staff notation of the tactile score, we define the third line as the basic pressure; the basic pressure is the pressure with which we hold a baby or an expensive jewel very carefully. Hence, the basic pressure is not defined absolutely but may change from person to person or for different types of massage. For the tactile score, we define the pressure strength as the difference in pressure from the basic pressure. We denote stronger pressure downward from the third line in the staff notation and weaker pressure upward from the third line (Figure 2). We also define the part of the hand used in massage and the kind of stroke when massaging (see Figure 1). For example, the fingertip to the first joint is 1, the second joint is 2, the third joint is 3, the upper part of the palm is 4, the center of the palm is 5 and the bottom of the palm is 6; when we flow from a fingertip to the third joint, this is denoted as "1-3".
For massage strokes, we analyze the massaging method of face therapy and extract strokes; we symbolize each stroke as A, a, N, n, etc. For example, the symbol A stands for the massage stroke of drawing a circle on the cheek. In this notation, A5, for example, denotes drawing a circle on the cheek with the center of the palm. The tactile score in this contribution is the basic version, in which each musical note denotes massage by both hands, and we denote a gap in hand motion with a special mark above the staff notation: 1 denotes both hands moving the same, 2 indicates a small gap between hands and 3 indicates a large gap between hands.
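As an illustration of how one count of this basic notation might be encoded as data (a sketch of our own, not the authors' actual file format; field names and the sign convention for pressure are assumptions):

```python
from dataclasses import dataclass

@dataclass
class TactileNote:
    """One count of a basic tactile score (illustrative encoding)."""
    stroke: str      # stroke symbol, e.g. "A" = drawing a circle on the cheek
    hand_part: str   # "1".."6", or a flow such as "1-3"
    pressure: int    # offset from the basic pressure (the third line); sign convention is ours
    hand_gap: int    # 1 = both hands the same, 2 = small gap, 3 = large gap

# "A5": a circle drawn on the cheek with the center of the palm
note = TactileNote(stroke="A", hand_part="5", pressure=0, hand_gap=1)
```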

2.1 Simplified Version of the Tactile Score


For people who are not familiar with musical scores, we also propose a schematic expression of the tactile score, in which the size of a circle expresses the size of the area of touching and the movement velocity, and the color of a circle expresses the strength of pressure: a darker color (near black) shows stronger pressure, while a lighter color (near white) shows lighter pressure.
In the tactile score of Figure 2.1, at the first count in the beginning part, A5, circles are drawn on both sides of the cheeks using the center of the palm with weaker pressure than the basic pressure; at the second count, the hands are moved to the tails of the eyes and a small circle is drawn using the center of the palm while keeping the same pressure as at the first count; and at the third and fourth counts, the hands are moved to both sides of the cheeks and circles are drawn using the fingertips with stronger pressure than the basic pressure.

Fig. 1 Top: Strokes of massaging on a face; these strokes are obtained from massage expe-
riences in beauty shops; strokes that pass uncomfortable areas have been excluded. Bottom:
Usage of part of the hand.

3 Analyzing the Massage by Using Tactile Score


By using a tactile score, we can analyse the standard massage used in face therapy.
We examined the method and confirmed that it can be decomposed into 42 kinds

Fig. 2 Top: An example of a tactile score, with special marking above the staff notation; 1 denotes both hands moving the same, 2 indicates a small gap between hands, and 3 indicates a large gap between hands; the slur-like marks illustrate a unit component of massaging, the integral-like marks illustrate releasing pressure, and the breath-like mark corresponds to a short pause in massaging, much like a breath in playing music. Bottom: Schematic expression of the change of pressure and areas of touching, where the size of each circle illustrates the area of touching and the solid line illustrates the change of pressure.

of basic massage components. Through this investigation, we can describe various


massages by combining these basic components. To characterize these basic components,
we use the semantic differential (SD) method. We asked the inventor of face
therapy, R. Suzuki, to be the respondent in our SD analysis. In this analysis,
we used the nine pairs of adjectives listed in Table 1. The respondent chose an integer
on a scale from −3 to 3 for each basic component. We then examined the results
using principal component analysis. The first principal component was the characteristics
of touching (soft–large, brow–wrap), the second principal component was
the time variation of touching (disappearing–releasing), and the third principal component
was the change of pressure (heavy–sharp).

Table 1 Pairs of adjectives used in the SD method

Soft – Hard
Light – Heavy
Large – Small
Sharp – Blunt
Disappearing – Remaining
Inhibitory – Releasing
Calm – Stable
Hollow – Brow
Dub – Wrap

By using these principal components, we classify the 42 kinds of basic components
into six groups, named I: light pressure, II: middle pressure, III: heavy pressure,
IV: light flow, V: keen flow, and VI: soft flow (see Figure 3).
The characterization of basic components corresponds to having drawing materials
for basic motifs of massaging. For example, in painting we compose artwork
by using drawing materials to create a beautiful form. Tactile stimuli have no
visual or auditory form by which we can judge their beauty; hence we define the
beauty of massaging as comfort.
For a beauty salon, massages that can improve the skin condition or physical
state of the body and attract customers are required; otherwise, the business fails.
Hence, we define a massage that has kept a high client satisfaction level for ten years
as a "good massage". We have studied various massages and found that comfortable
massages are likely to be good massages. The standard massage has been obtained
through the crystallization of such comfortable massages, so we analyse it. We
described the standard massage using the tactile score and transformed it into basic
components I to VI, and we analysed the massage as a time series of basic components
(Figure 4). We found that the basic components of IV (light flow)
and V (keen flow) are used as intermediate components; for example, for a massage
starting from I (light pressure) to reach III (heavy pressure), since there is no direct
transition path from I to III, it has to go through IV or V, such as I → IV → III or I
→ V → III.

Fig. 3 Map of the 42 basic components in the principal component space, where the horizon-
tal axis illustrates the first and second principal components and the vertical axis illustrates
the third principal component.

Fig. 4 The result of a time series of basic massage components, where each bidirectional
arrow illustrates possible transitions between basic groups. Groups IV and V (indicated by
circles) are intermediate groups; they mediate transitions between the other groups.

3.1 How to Create a Good Massage?


Having obtained a method for visualizing and describing massages, we are able
to compose massages by using this method, where the basic components are useful in
the composition. We have created various tactile scores, performed them, and
examined the massages composed using tactile scores to find those that
are comfortable. We synthesized these experiments and our experience of massaging
customers, and obtained the common characteristic of a comfortable massage as the
relational expression

Constant = S × P × V,

where S is the area of touching, P is the pressure of the massage, and V is the velocity
of hand movement. In massaging, we feel comfortable when S, P, and V change in an
oscillatory manner while preserving this relation. For instance, draw a circle on the
back of your hand using your fingertip with strong pressure, and then draw a circle
on the back of your hand using your palm at the same movement velocity; if you
massage with the same pressure, it will not be comfortable, but if you massage more
softly, it will be comfortable.

This relational expression requires generating oscillations in a multidimensional
space composed of S and/or P and/or V. In the example above, the movement velocity
is kept the same and the area of touching becomes larger, so in order to preserve the
relation, the pressure of the massage must be relieved.
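The relation can be illustrated with a short computation; a minimal sketch in Python, where the numeric values of S, P, and V are arbitrary illustrative units, not measurements from the paper.

```python
def comfortable_pressure(constant, area, velocity):
    """Solve Constant = S * P * V for the pressure P."""
    return constant / (area * velocity)

# Calibrate the constant from a comfortable fingertip stroke
# (illustrative units: small contact area, firm pressure).
C = 2.0 * 3.0 * 1.5   # S * P * V = 9.0

# Switch to the palm (three times the contact area) at the same velocity:
palm_pressure = comfortable_pressure(C, area=6.0, velocity=1.5)
# The larger contact area forces a softer pressure: 9.0 / (6.0 * 1.5) = 1.0
```

A larger circle drawn at the same speed must therefore be pressed more softly, matching the back-of-the-hand example above.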

4 Conclusion
In this contribution, we proposed a visualizing and describing method for massages,
the tactile score. Tactile sensations, especially complex ones such as massages,
are invisible, and there has been no way to visualize them or record them in a
reproducible format that can be shared with others. Studies of tactile perception have
centered on the generation of tactile stimulation (i.e., the receiving end), and how to
touch has not been discussed much. We believe that this visualization method is useful
for analysing tactile sensations and for designing and presenting complex tactile
sensations. For example, by visualizing tactile sense using the tactile score, we can
apply it to computational aesthetics1 [3], [2] and 'compose' a massage as if composing
music. In this example, we compose a "motif" of massaging with the simplified
version of the tactile score (Figure 5); then we delete the leftmost tactile note and
add a new one at the rightmost position;

Fig. 5 A motif of massage expressed by using the simplified version of the tactile score

Fig. 6 From the motif, we delete the leftmost note and insert a new one at the rightmost position

1 Computational aesthetics aims at understanding aesthetics as computation.



Fig. 7 Composed tactile score

we repeat this operation three times and compose the tactile score for four measures
(Figure 6). Then we transform the obtained simplified tactile score into a score with
tactile notes while keeping the relational expression, and we obtain the new tactile
score (Figure 7).
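The compose-by-shifting operation described above can be sketched as follows; the note names are placeholders for tactile notes, not notation from the paper.

```python
def shift_motif(motif, new_note):
    """Delete the leftmost tactile note and append a new one
    at the rightmost position (the operation of Figure 6)."""
    return motif[1:] + [new_note]

# Repeat the operation three times to obtain four measures.
measures = [["n1", "n2", "n3", "n4"]]          # the motif (Figure 5)
for note in ["n5", "n6", "n7"]:
    measures.append(shift_motif(measures[-1], note))
# `measures` now holds four related measures forming the composed score
```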

References
1. Leeuwenberg, E.: A perceptual coding language for visual and auditory patterns. American
Journal of Psychology 84(3), 307–349 (1971)
2. Bense, M.: Aesthetica. Einführung in die neue Ästhetik (1965)
3. Scha, R., Bod, R.: Computationele esthetica. Informatie en Informatiebeleid 11(1), 54–63
(1993)
Taxi Demand Forecasting Based on Taxi Probe
Data by Neural Network

Naoto Mukai and Naoto Yoden

Abstract. The taxi is a flexible transportation system by which anyone can travel to
any destination. In Japan, however, taxi fares are more expensive than those of
other transportation facilities. The taxi business is in a very tough situation because
the cost of crude oil suddenly increased, in addition to the influence of oversupply
in the taxi market. Recently, the application of information technologies
has advanced in the taxi industry (e.g., fare payment by non-contact IC cards and car
navigation systems). One of the technologies gaining such attention is the probe
system, which can store a large amount of customer trajectory data. The probe system
will improve the profitability of taxi companies if future demand can
be forecasted from its statistics. Therefore, in this paper, we try to forecast taxi
demands from taxi probe data by a neural network (i.e., a multilayer perceptron).
First, we analyze the statistics of the taxi demands and make the training data set for
the neural network. Then, back-propagation learning is applied to the neural network
to reveal the relationship of regions in Tokyo (i.e., the 23 wards, Mitaka-shi,
and Musashino-shi). Finally, we report our discussion of the results.

1 Introduction
Recently, transportation systems in Japan such as taxis and trains have introduced
information technologies in diverse ways. For example, most taxis are equipped with
a car navigation system that shows the way to the destination, and we can
Naoto Mukai
Culture-Information Studies, School of Culture-Information Studies, Sugiyama Jogakuen
University, 17-3, Hoshigaoka-motomachi, Chikusa-ku, Nagoya, Aichi, 464-8662, Japan
e-mail: nmukai@sugiyama-u.ac.jp
Naoto Yoden
Dept. of Electronic Telecommunications, Matsumoto-denryokusho, Tokyo Electric Power
Co.Inc., 1-1-17, Chuou, Matsumoto, Nagano, 390-0811, Japan
e-mail: utinokoinu@msn.com

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 589–597.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

pass through ticket gates simply by holding non-contact IC cards over a scanner
at train stations. One of the technologies for transportation systems in the spotlight is
the taxi probe system, which provides historical data of taxis (i.e., latitude and longitude
when a taxi picks up a customer). Taxi probe data are just beginning to be applied to
a variety of uses. Nakajima et al. aimed at the improvement of road timetables by
using taxi probe data [4]; they discussed how to deal with weekly and seasonal factors
in road traveling times. Taguchi et al. analyzed the relationship between taxi
behaviors and the characteristics of region and time [6], and found a macro model
of customers in Sendai, Japan. Furthermore, there has been considerable research
on probe data [2, 5].
In this paper, we try to forecast taxi demands by using taxi probe data.
Transport demand forecasting is a very important factor for transportation systems
because more precise prediction can improve their profits. In particular, on-demand
systems such as demand-bus and car-sharing systems are very sensitive to the prediction
accuracy of transport demands. For example, the allocation of taxis can be
controlled adequately, and the time required to transport customers can be reduced.
A typical forecasting approach is the neural network, which has been used for demand
forecasting [3, 1, 7]. We also adopt a neural network (i.e., a multilayer perceptron)
for our objective. First, we analyze the statistics of the taxi probe data and make
the training data set for the neural network. Then, back-propagation learning is
applied to the neural network to reveal the relationship of regions in Tokyo (i.e.,
the 23 wards, Musashino-shi, and Mitaka-shi).
The remainder of this paper is as follows. Section 2 shows the format of the taxi probe
data. Section 3 defines the training data set for the neural network. Section 4 reports
our results and discussion. Section 5 describes conclusions and future work.

2 Taxi Probe Data


A taxi probe system is one method of gathering traffic information: taxis are equipped
with compact sensors that record vehicle information (e.g., vehicle position). Advanced
traffic information such as traffic jams can be estimated from the accumulated
probe data. In this paper, we utilize taxi probe data to develop a demand-forecasting
neural network for taxis. The taxi probe data we used were offered by Tokyo
Musen Taxi 1 . The probe data were recorded from February 1st to March 31st, 2009,
and the travel region of the taxis is Tokyo's 23 wards, Mitaka-shi, and Musashino-shi.
An example of taxi probe data is shown in Table 1. Each record contains "ID", "X",
"Y", "Month", "Day", "Hour", "Minute", and "Region". ID is an identification number
of the taxi. X and Y are latitude and longitude (converted to Euclidean space).
Month, Day, Hour, and Minute are the date and time when the taxi picks up a customer.
Region is a region name (i.e., one of Tokyo's 23 wards, Mitaka-shi, or Musashino-shi).
Figure 1 is Tokyo's demand map, where red circles represent pick-up demands
(recorded from 8:00 to 8:30 on February 1st, 2009).

1 Tokyo Musen Taxi:http://www.tokyomusen.or.jp/



Table 1 An example of taxi probe data

ID X Y Month Day Hour Minute Region


3215531044 -26736.4 -13714.5 2 1 7 59 Itabashi-Ku
3124267042 -34983.6 -19709.1 2 1 8 2 Suginami-Ku
3179710059 -27798.4 -22352.6 2 1 8 3 Nerima-Ku
3567305063 -34411.7 -12192.6 2 1 8 23 Shinjuku-Ku

Fig. 1 Demand map of Tokyo

3 Demand Forecasting by Neural Network


In this section, we explain how to convert from taxi probe data to training data set
for neural network, and show the setting of neural network (e.g., learning pattern
and factors).

3.1 Training Data Set


We have taxi probe data covering two months (i.e., February 1st to March 31st, 2009).
The data of the first month (i.e., February 1st to 28th) are used as the training data set,
and the data of the last month (i.e., March 1st to 31st) are used as the validation data set.
We adopt three kinds of input data for the input layer of the neural network:
"demands in each region", "day of the week", and "amount of precipitation".
We explain the details of the input data as follows.

3.1.1 Demands in Each Region

There are 25 regions (Tokyo's 23 wards, Mitaka-shi, and Musashino-shi), so we
set 25 neurons in the input layer. We calculate the number of demands in each region
during 4-hour slots (i.e., 0:00-4:00, 4:00-8:00, 8:00-12:00, 12:00-16:00, 16:00-20:00,
and 20:00-24:00) or 6-hour slots (i.e., 0:00-6:00, 6:00-12:00, 12:00-18:00, and 18:00-
24:00). Each number is normalized by the following equation, where x_i is the number
of demands in region i and x_max and x_min are the maximum and minimum numbers of
demands over all regions, in order to map each input value into [0, 1]:

x̂_i = (x_i − x_min) / (x_max − x_min)    (1)

Table 2 is an example of input data. Each record represents a sequence of the normalized
demands of a region in a day.

Table 2 An example of input data (4-hours)

Region 0-4 4-8 8-12 12-16 16-20 20-24


Adachi-Ku 0.29 0.03 0.25 0.09 0.06 0.12
Bunkyo-Ku 0.05 0.03 0.03 0.02 0.03 0.05
Chiyoda-Ku 0.10 0.00 0.27 0.10 0.07 0.05
Chuo-Ku 0.12 0.03 0.10 0.05 0.01 0.29
Edogawa-Ku 0.44 0.21 0.18 0.11 0.08 0.27
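Equation (1) can be implemented directly; a small helper is sketched below, assuming demand counts are held in a dict keyed by region name (the data layout and function name are our illustration, not the authors' code).

```python
def normalize_demands(counts):
    """Min-max normalize demand counts to [0, 1], as in Eq. (1).
    `counts` maps region name -> number of pick-ups in one time slot."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1          # guard against all-equal counts
    return {region: (x - lo) / span for region, x in counts.items()}

# Example: three regions with 2, 6, and 4 pick-ups in one slot.
print(normalize_demands({"Adachi-Ku": 2, "Chuo-Ku": 6, "Bunkyo-Ku": 4}))
# -> {'Adachi-Ku': 0.0, 'Chuo-Ku': 1.0, 'Bunkyo-Ku': 0.5}
```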

3.1.2 Day of the Week

In [6], Taguchi et al. indicated that taxi demands depend strongly on the day of the
week (i.e., weekday or holiday). Thus, we considered three input patterns for the
day of the week: "One-day", "Three-days", and "Seven-days", shown in Table 3.

Table 3 Neurons of input layer for day of the week

Pattern Number of neurons Recognition pattern


One-day 1 If the day is weekday, the input value is 0, otherwise, 1.
Three-days 3 Three neurons show previous-day, current-day, and after-day.
Seven-days 7 Each neuron shows day of the week.
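The three patterns of Table 3 can be encoded as input neurons roughly as follows; the treatment of "Three-days" (previous-day, current-day, and after-day as weekday/holiday flags) is our reading of the table, and Saturday and Sunday stand in for holidays.

```python
import datetime

def day_neurons(date, pattern="Seven-days"):
    """Encode the day of the week for the input layer (Table 3)."""
    is_holiday = lambda d: 1.0 if d.weekday() >= 5 else 0.0  # Sat/Sun
    if pattern == "One-day":
        return [is_holiday(date)]
    if pattern == "Seven-days":
        return [1.0 if i == date.weekday() else 0.0 for i in range(7)]
    # "Three-days": previous-day, current-day, and after-day flags
    one = datetime.timedelta(days=1)
    return [is_holiday(d) for d in (date - one, date, date + one)]

# February 1st, 2009 (a Sunday), the first day of the training data:
print(day_neurons(datetime.date(2009, 2, 1), "Three-days"))  # [1.0, 1.0, 0.0]
```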

3.1.3 Amount of Precipitation

We assumed that taxi demands also depend on the weather; in fact, the number
of taxi users increases in poor weather conditions. The Japan Meteorological Agency
provides the past amount of precipitation in Japan 2 . Table 4 shows the amount
2 Japan Meteorological Agency:http://www.jma.go.jp/jma/index.html

Table 4 Weather information in Tokyo on February 27th, 2009

Time Amount of precipitation (mm) Air temperature (◦ C)


1 0.0 4.7
2 0.0 4.9
3 0.0 5.1
4 0.0 4.8
5 0.0 4.6
6 0.0 4.3
7 0.5 3.9
8 0.0 3.7
9 1.5 3.5
10 2.5 2.7
11 3.0 2.0
12 1.5 2.2

of precipitation and the air temperature in Tokyo on February 27th, 2009, which were
obtained from the website. We set one neuron in the input layer; if rainfall is
observed, the input value is 1, otherwise 0.

3.2 Setting of Neural Network


The structure of the neural network for demand forecasting is illustrated in Figure 2.
The network consists of three layers: an input layer, a hidden layer, and an output
layer (i.e., a feed-forward multilayer perceptron). The number of neurons in the input
layer depends on the 11 input patterns shown in Table 5. At least 25 neurons are needed
for the input values of demands in each region (i.e., Tokyo's 23 wards, Mitaka-shi,
and Musashino-shi). In addition, we set extra neurons that represent the day of
the week (One-day, Three-days, or Seven-days) and the amount of precipitation;
as previously explained, the numbers of these extra neurons are shown in Table 3. In
order to forecast the future demand, past data are used as input values and future
data are used as output values (e.g., demands during 8:00-12:00 as input values
and demands during 12:00-16:00 as output values) for back-propagation learning 3 .
Consequently, the output layer has 25 neurons, which represent the "future"
demands in each region.
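A minimal sketch of such a three-layer feed-forward network with back-propagation is given below, using NumPy. The learning coefficient of 0.5 follows the paper's footnote; the layer sizes shown, the sigmoid activation, the weight initialization, and per-sample updates are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DemandMLP:
    """Input -> hidden -> output perceptron trained by back-propagation."""
    def __init__(self, n_in=32, n_hidden=50, n_out=25, eta=0.5):
        # e.g. 25 region neurons + 7 day-of-week neurons = 32 inputs
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.eta = eta  # learning coefficient (0.5 in the paper)

    def forward(self, x):
        self.h = sigmoid(x @ self.W1)
        return sigmoid(self.h @ self.W2)

    def train_step(self, x, target):
        y = self.forward(x)
        # deltas for squared error with sigmoid activations
        d_out = (y - target) * y * (1.0 - y)
        d_hid = (d_out @ self.W2.T) * self.h * (1.0 - self.h)
        self.W2 -= self.eta * np.outer(self.h, d_out)
        self.W1 -= self.eta * np.outer(x, d_hid)
        return float(np.mean((y - target) ** 2))
```

Training iterates `train_step` over the February records until the mean squared error falls below the convergence error (0.05 in the paper).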

4 Results and Discussion


Here, we report our results and discussion. Figure 3 shows the comparison of the
4-hour and 6-hour data with regard to the number of neurons in the hidden layer. The
error on the horizontal axis is the difference between the actual demands and the
output values of the neural network after back-propagation learning. The result indicates
3 The learning coefficient is set to 0.5 and the convergence error is set to 0.05.

Fig. 2 Structure of neural network

Table 5 Input Patterns of Neural Network

Pattern One-day Three-days Seven-days Precipitation


PT1 Unused Unused Unused Unused
PT2 Used Unused Unused Unused
PT3 Unused Used Unused Unused
PT4 Unused Unused Used Unused
PT5 Used Unused Used Unused
PT6 Unused Used Used Unused
PT7 Unused Unused Used Used
PT8 Used Unused Unused Used
PT9 Unused Used Unused Used
PT10 Used Unused Used Used
PT11 Unused Used Used Used

that the neural network with 4-hour data and 50 neurons in the hidden layer outperforms
the other networks. From here, we focus on the results of this best-performing
network.
Figure 4 shows the comparison of the input patterns shown in Table 5. First, when
comparing the input patterns for the day of the week (i.e., one-day, three-days, and
seven-days), the seven-days pattern reduces the error most effectively. This fact implies
that taxi demands vary widely according to the day of the week. Moreover, three-days
is superior to one-day in the cases of PT2 and PT3, but inferior in the cases of PT5 and
PT6. A possible cause is that the seven-days pattern overlaps three-days, and the
redundant neurons may negatively affect the result. Next, we found that the weather information

Fig. 3 Comparison of neural network structures

Fig. 4 Comparison of training patterns

about precipitation is ineffective. We considered only whether rain falls or not; we
should instead have considered the amount of precipitation. We would like to
address this problem as a future task.
Figure 5 shows the error comparison for each region. The result indicates that the
error of Chuo-Ku is the smallest among the regions. Chuo-Ku is positioned roughly in
the center of Tokyo's 23 wards. The population of Chuo-Ku is the second smallest,
but its daytime population increases sharply because most areas in Chuo-Ku are business
zones; thus, taxi demands occur periodically within a day. On the other hand, Edogawa-Ku
is the worst among the regions. Edogawa-Ku is positioned at the east end of
Tokyo's 23 wards. In the town, many younger families live because the town has

Fig. 5 Comparison of regions in Tokyo

easy access to the center of Tokyo (there are five railway and subway lines). Thus, taxi
demands are small and non-periodic. We think that these features of the regions
explain why demand forecasting is effective in some regions and not in others.
Figure 6 shows the comparison of time zones in a week. We found that the error on
weekdays is relatively small compared to weekends. Most businessmen work on
weekdays; thus, taxi demands occur periodically, as previously described. Moreover,
the error during 4:00-8:00 is small because the number of demands during that
time is also small.

Fig. 6 Comparison of time zones in a week



5 Conclusion
In this paper, we considered demand forecasting for taxis by a neural network. Taxi
probe data, which contain historical data of taxis (i.e., latitude and longitude when
a taxi picks up a customer), are used as the training data set for the neural network. We
adopted three kinds of input data for the neural network: "demands in each region",
"day of the week", and "amount of precipitation", and evaluated their effects.
We found that the day of the week is an important factor for demand forecasting because
the demands occur periodically within a week. Furthermore, demand forecasting in a
business town like Chuo-Ku is easier than in a commuter town like Edogawa-Ku.
However, the amount of precipitation was ineffective because we considered only whether
the rain fell or not. Therefore, we must consider how to deal with weather information
or other events (e.g., festivals).

Acknowledgements. We appreciate the provision of the taxi probe data by System Origin
Corporation. This work was supported by Grant-in-Aid for Young Scientists (B).

References
1. Araki, H., Kimura, A., Arizono, I., Ohta, H.: Demand forecasting based on differences of
demands via neural networks. Journal of Japan Industrial Management Association 47(2),
59–68 (1996)
2. Kanazawa, F., Sawada, Y., Wakatsuki, T., Iwasaki, K.: Applying the probe data, accumu-
lated by the its-spot, to the road governance. In: Proceedings of ITS Symposium 2011, pp.
73–76 (2011)
3. Kimura, A., Arizono, I., Ohta, H.: An application of layered neural networks to demand
forecasting. Journal of Japan Industrial Management Association 44(5), 401–407 (1993)
4. Nakajima, Y., Makimura, K.: Study on improvement of road time table using taxi probe
vehicle data. Journal of Japan Society of Civil Engineering 29 (2004)
5. Nishimura, S., Suzuki, K., Kobayashi, M., Matsumoto, O.H.H., Nagashima, Y.: An im-
provement on traffic signal control through use of probe vehicle data. In: Proceedings of
ITS Symposium 2010, pp. 365–370 (2010)
6. Taguchi, K., Yoshida, S., Sadohara, S.: Time spatial analysis of taxi demand using probe.
In: Summaries of Technical Papers of Annual Meeting Architectural Institute of Japan,
vol. 2009, pp. 519–520 (2009)
7. Xu, J.X., Lim, J.S.: A new evolutionary neural network for forecasting net flow of a car
sharing system. In: Proceedings of IEEE Congress on Evolutionary Computation 2007,
pp. 1670–1676 (2007)
The Design of an Automatic Lecture Archiving
System Offering Video Based on Teacher’s
Demands

Shin’nosuke Yamaguchi, Yoshimasa Ohnishi, and Kazunori Nishino

Abstract. In this research, the authors propose improvements to automatic
lecture-archiving systems. We identify silent periods in lecture videos by analyzing
the lecture's audio content. We then create shorter videos by removing the
identified silences, and evaluate the encoding time and contents of those videos.

Keywords: e-learning, lecture-archiving system, developing teaching material,


video on demand

1 Introduction
As the use of audio/visual equipment spreads, many educational facilities are
beginning to develop and apply automatic lecture-archiving systems [1][2][3].
Some systems can record not only the teacher but also the lecture slides, and output a
combined video [4].
Many videos recorded by an archiving system can be used for review by
students. For teachers, these videos can be used to create teaching material.
However, because a video is often long, it can take significant time for a teacher
to edit it by deleting unnecessary sections, subdividing it into sections, etc.
Such edited lecture videos will be authoritative and easy to use, but
only at the human cost of the editing. Methods for creating videos for archiving
systems have been studied by many research groups [5][6]. This research has
proposed methods for creating videos without these human costs, including
rule-based control of the camera recording and automatic construction of the video
from the teacher's action patterns.

Shin’nosuke Yamaguchi · Yoshimasa Ohnishi · Kazunori Nishino


Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering,
680-4 Kawazu, Iizuka, Fukuoka, 820-8502 Japan
e-mail: yamas@iizuka.isc.kyutech.ac.jp,
ohnishi@el.kyutech.ac.jp, nishino@lai.kyutech.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 599–608.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

We focus on teacher input for improving archiving systems. The main requests
by teachers at our university are:

• Deleting unnecessary sections of the video, such as silences and unrelated
material.
• Subdividing the video in terms of lecture content.
• Emphasizing important sections of the video.

Our goal is the development of an automatic lecture-archiving system based on
teacher demands. In this paper, we start by focusing on the first demand, namely the
deletion of unnecessary sections of the lecture, by analyzing its sound.

2 The Design of Lecture-Archiving System Based on Teacher


Demands
In this section, we explain the design of a lecture-archiving system based on teacher
demands. We place a recording system in each lecture room and set up a control
server and a streaming Flash server. The recording system records a lecture
according to a schedule. The control server sends videos and information from the
recording system to the file server. The current system is able to make full-length
lecture videos without editing.

Fig. 1 Diagram of the automatic lecture-archiving system

We propose adding a mobile terminal to obtain lecture information and
an editing server to edit the lecture videos. Figure 1 is a diagram of the
proposed lecture-archiving system.
The teacher sends information about the lecture to the control server via the
mobile terminal while giving the lecture. This information mainly records the times
at which the teacher speaks about an important section or changes the lecture
content. The editing server uses this information to encode the video. Because

these operations have to be simple, we consider using a mobile terminal with a
touch interface.
Our recording system records the whole lecture, because some silent sections of
the lecture may be necessary (such as the teacher showing the operation of a
computer to a student); judging the necessity of a silent section is difficult for the
recording system alone. Moreover, we expect that recording the full lecture makes it
easy to synchronize the times at which the teacher sends information with the timeline
of the lecture video. Therefore, we choose what to edit using the editing
server, after recording the full lecture.
The editing server analyzes the lecture video and creates encoding orders
according to the results of this analysis and the information sent by the teacher. The
editing server then runs the encoding orders to create concisely edited lecture
videos for distribution.
We expect that our system will improve on full-length lecture videos, without human
cost, by generating shorter, useful lecture videos. Initially, therefore, we evaluate
whether it is useful to edit the video automatically based on an analysis of the
lecture's sound.

3 Analysis of the Lecture's Sound

3.1 Sounds in a Programming Lecture That Includes Some Exercises

In this section, we explain the analysis of the audio files recorded by the
lecture-archiving system. The system records the lecture sound as a Broadcast
Wave Format (BWF) file, which stores the sound in an uncompressed format.
First, we analyze the sound files for 14 programming lectures presented at our
university. These lectures contain explanations by the teacher and exercises using a
computer. Figure 2 shows the sound records for the 14 programming lectures. The
deeply colored sections indicate that sound is recorded, i.e., that the teacher is
explaining. Each lecture lasts 3 h. We can see that explanatory sections and exercise
sections are divided clearly in these lectures. Note that the orientation lecture occurs
first, with the teacher talking for much of the time and ending the lecture earlier than
usual.
While the students are exercising, the teacher does not do much explaining.
Figure 2 shows some vertical deeply colored lines in the silent sections, indicating
that the teacher is offering hints and short messages of a few words during the
exercises. The archiving system does not need to record these exercise sections;
if the archiving system can delete the silent sections from the videos, the videos will
become more useful for users.
In Figure 2, the speech sections and silent sections are separated clearly, so we
can assume that separating the required sections from those that are not required is
easy for the videos of this lecture.
However, because the archiving system does not recognize when an exercise
starts, it is difficult to remove the exercise sections while recording the lecture.
We therefore consider removing these sections from the video after the recording.

Fig. 2 The recorded waveforms for the 14 programming lectures

3.2 Identification and Deletion of Silent Sections

Next, we describe the identification of silent sections in the video. In fact, the
"silent" sections shown in Figure 2 are not always completely silent; we can see
some small audio signals by expanding the waveform. Moreover, at the start of
speech the volume increases little by little. We therefore measure the value of the
sound signal for every second of the video, as shown in Figure 3. We check the
maximum and minimum values for each 1 sec period and record the differences. A
section of the record is:

• 63 min 43 sec; max value: 2; min value: -6; difference: 8.


• 63 min 44 sec; max value: 3; min value: -6; difference: 9.
• 63 min 45 sec; max value: 2; min value: -9; difference: 11.
• 63 min 46 sec; max value: 7; min value: -12; difference: 19.
• 63 min 47 sec; max value: 2,855; min value: -2,382; difference: 5,237.
• 63 min 48 sec; max value: 5,272; min value: -2,485; difference: 7,757.

This section of the record includes the start of some talking by the teacher. The
difference values are around 10 in the initial silent section, and then increase greatly
from the 5th record on. From these records, we can assume that silent sections have
difference values of 100 or less, and we use this threshold in our automatic
editing program. The program analyzes the BWF file and generates a new BWF file
without the silent sections. The algorithm of the automatic editing program is as
follows:

• Read the header of the BWF file and record it in a new BWF file.
• Record to an information file the times for which the difference value in each 1
sec sample is over 100.
• Copy to the new BWF file the waveform data for only those times recorded in
the information file.
• Update the header of the new BWF file to match the new recording time.
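The algorithm above can be sketched in Python using the standard wave module (a BWF file carries extra metadata chunks but is otherwise WAV-compatible; we assume 16-bit PCM samples here, and the function name and threshold constant are our illustration):

```python
import array
import wave

THRESHOLD = 100  # max-min difference per 1 sec window, from Section 3.2

def remove_silence(src_path, dst_path):
    """Copy to a new file only the 1-second windows whose peak-to-peak
    amplitude exceeds THRESHOLD; return the kept times (in seconds)."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        values_per_sec = src.getframerate() * src.getnchannels()
        samples = array.array("h", src.readframes(src.getnframes()))

    kept = array.array("h")
    kept_seconds = []  # written to the information file in the real system
    for sec in range(len(samples) // values_per_sec):
        window = samples[sec * values_per_sec:(sec + 1) * values_per_sec]
        if max(window) - min(window) > THRESHOLD:
            kept.extend(window)
            kept_seconds.append(sec)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)        # header length is fixed up on close
        dst.writeframes(kept.tobytes())
    return kept_seconds
```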

Table 1 The length of the audio files with unnecessary silences deleted

Lecture   Edit by analysis program   Edit by student staff
1         1 h 46 m                   Not edited
2         1 h 51 m                   1 h 42 m
3         1 h 31 m                   1 h 27 m
4         1 h 44 m                   1 h 33 m
5         1 h 34 m                   1 h 43 m
6         1 h 32 m                   1 h 30 m
7         1 h 35 m                   1 h 33 m
8         1 h 48 m                   1 h 43 m
9         1 h 31 m                   1 h 29 m
10        1 h 37 m                   1 h 34 m
11        1 h 20 m                   1 h 14 m
12        1 h 41 m                   1 h 40 m
13        1 h 49 m                   1 h 48 m
14        1 h 20 m                   1 h 13 m

Fig. 3 Signal values (maximum and minimum) for 1 sec in the audio file used by the analysis
program

Table 1 shows the result of this analysis and editing by the program. We can
confirm that our program deletes the silent sections from the BWF file and reduces
its length by about half. The right-hand column shows the corresponding length of
the videos edited by a student, who was employed by our research group to
support the creation of teaching material. The student watches the video and
extracts only those sections that are required for the archiving system.
(The first lecture was not edited by the student, because it contained only
orientation material.)
604 S. Yamaguchi, Y. Ohnishi, and K. Nishino

Analysis and editing by the student required 70–90 min per video, whereas analysis by our program required only 2–3 min. (The encoding time for
the two methods has not been included.) From Table 1, we can confirm that there is
not much difference in the output video’s length for the two methods. The
student-edited file is slightly shorter than that from our analysis program. This is
because the student judged some sections for which the teacher was speaking not to
be related to the lecture, and therefore deleted them from the video. We expect that
our program would include all such sections that the student omitted.
If our program deletes the same sections from the BWF file as the student-edited
video, we are then able to use the results of the analysis effectively.

4 Video Encoding Based on the Analysis Results

4.1 Identification of the Sections to Retain in the Video


In this section, we explain the identification of those sections of the video to be
encoded. As explained in Section 3, our program identifies silent sections in a BWF
file and deletes them. We then use FFmpeg [7] to encode the video. FFmpeg is an
encoding program that is able to encode video data from an arbitrary start time for
an arbitrary period. We use FFmpeg to encode only the required sections of the
video.
However, it is not necessary to encode sections containing a period of sound lasting only a few seconds. (Such a period is considered to be noise.) Similarly, we think that it is not necessary to omit those periods for which the teacher becomes silent for only a few seconds.
Therefore, we identify the sections of the video to be encoded using the
following rules:

• The time at which a sound starts is an encoding start time.
• If a silent section continues for 10 sec following the identification of an encoding start time, the starting time for the silent section is used as the encoding end time.
• The minimum length of a section to be encoded is 10 sec.
• If the video ends with sound, the ending time of the video is an encoding end time. However, if this section is shorter than 10 sec, the section is omitted.
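These rules can be sketched as follows (a minimal interpretation, not the authors' code; the input is assumed to be a list of per-second flags derived from the threshold analysis in Section 3, and we read the 10-sec minimum as omitting shorter sections):

```python
SILENCE_RUN = 10   # a silent run of 10 sec ends a section
MIN_LENGTH = 10    # sections shorter than this are omitted

def find_sections(sound):
    """sound[t] is True if second t contains sound.

    Returns a list of (start, end) pairs, end exclusive, to be encoded.
    """
    sections = []
    start = None
    silent_run = 0
    for t, has_sound in enumerate(sound):
        if has_sound:
            if start is None:
                start = t                       # a time at which sound starts
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run == SILENCE_RUN:
                end = t - SILENCE_RUN + 1       # start of the silent run
                if end - start >= MIN_LENGTH:   # drop too-short sections
                    sections.append((start, end))
                start, silent_run = None, 0
    if start is not None:                       # video ends with sound
        end = len(sound) - silent_run
        if end - start >= MIN_LENGTH:
            sections.append((start, end))
    return sections
```

For example, twenty seconds of sound starting at second 5 followed by a 10-sec silent run yields one section that ends where the silent run begins.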

We improved our program based on these rules, and analyzed the video for the 14
lectures. Figure 4 illustrates an example of our results by showing three rows, where
the first row is the sound waveform for the 13th lecture. As in Figure 2, the deeply
colored sections show where sound has been recorded. The second row gives the
analysis results from our program. The deeply colored sections show the sections to
be encoded that our program identified. The third row is the result following student
editing. The student divided up the lecture movie according to the teacher’s
demands, with the vertical lines showing where these divisions occur.

Fig. 4 Sections of a lecture video encoded by the analysis program and by a student

From Figure 4, we can confirm that there are no big gaps in the results. Our program has removed some sections of the sound data that were also removed by the student. Therefore, even if our program does not retain these sections, we think that the result will be acceptable.
The student removed more sections than our program. This was almost always because these sections contained material unrelated to the lecture. However, the teacher may explain the relationship to the lecture. In Figure 4, we can see a thin vertical line in the last gap in the student's data, between 2:30 and 3:00. Here, the student judged that this short section was required according to the teacher's demands. Our program also judged that this section was required.
Therefore, even if our program removes some silent sections automatically based on the analysis results, we judge that these videos can be used effectively.

4.2 Results of Video Encoding

Next, we explain the encoding of the lecture video. We improved our program so that it creates an encoding script automatically from the analysis results. An encoding command for the video has this form:

/usr/bin/ffmpeg -y -i {input video file name}.m2t -f flv -vcodec flv -r 25 -b 1500k -s 960x540 -acodec libmp3lame -ar 44100 -ab 64k -ss {starting time} -t {encoding time} {output video file name}.flv

This command creates a one-section lecture video. Our program creates commands based on the analysis results, and records the script on the editing server. The editing server runs this script, and the subdivided lecture video is created. The encoding environment for these commands is as follows:

• Output video format: Flash video.
• Frame rate: 25.
• Video bit rate: 1,500 kbps.
• Output video resolution: 960 x 540 dots.
• Output audio format: mp3.
• Audio sampling frequency: 44,100 Hz.
• Audio bit rate: 64 kbps.
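Our reading of the script-generation step can be illustrated with a short sketch (not the authors' program; `sections` holds (start, end) pairs in seconds produced by the rules in Section 4.1, and the per-section output file naming is an assumption):

```python
def build_script(input_name, sections):
    """Build one ffmpeg command per section, mirroring the settings above."""
    template = (
        "/usr/bin/ffmpeg -y -i {src}.m2t -f flv -vcodec flv -r 25 -b 1500k "
        "-s 960x540 -acodec libmp3lame -ar 44100 -ab 64k "
        "-ss {start} -t {length} {src}_{i:02d}.flv"
    )
    return [
        template.format(src=input_name, start=start, length=end - start, i=i)
        for i, (start, end) in enumerate(sections, 1)
    ]

# Hypothetical example: two sections of a lecture recording.
script = build_script("lecture13", [(5, 25), (60, 180)])
```

Writing these lines to a shell script on the editing server and running it produces the subdivided lecture videos.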

We now describe the automatic subdivision of lecture videos into sections. First, we evaluate the encoding times. Figure 5 shows the results of encoding two lecture videos.
In Figure 5, the video length indicates the length of the video of a certain lecture
that is subdivided into sections. The encoding times show the times required for
encoding each section of the video. Video length 1 and Encoding time 1 are the
results for the same lecture video (Lecture video 1). Video length 2 and Encoding
time 2 are the results for a different lecture video (Lecture video 2). The horizontal
axis in Figure 5 shows the number of subdivisions. Lecture 1 is subdivided into 17
sections and Lecture 2 is subdivided into 15 sections.

(Chart: the length of each subdivided section, shown in h:mm:ss, and its encoding time, shown in minutes, for Lecture videos 1 and 2; the horizontal axis is the number of divided videos, 1–17.)

Fig. 5 Time required for encoding each section and the length of the videos

From Figure 5, we can see that the encoding time is long if the video is long (such
as the 2nd section in Lecture video 1 and the 7th section in Lecture video 2).

However, the encoding time depends on more than just the section's length. For example, in Lecture video 1, although the 16th section is very short (21 sec), its encoding time is the same as that for the 10th section.
In examining the behavior of the program, we note that whenever FFmpeg is run,
it searches for the encoding start time from the beginning of the file. Figure 5 shows
that the time to search for the start time is longer than the encoding time. The results
for Lecture video 2 show this effect. Lecture video 2 comprises two videos. The 8th
section of Lecture video 2 comes from the 2nd video. Therefore, the time to search
for the starting time is short, whereas the time to encode the data for the 8th section
remains long.
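The seek cost noted here comes from `-ss` being applied on the output side. As an aside that is not part of the system described above: FFmpeg also accepts `-ss` as an input option, placed before `-i`, which seeks in the input by keyframe instead of decoding from the beginning of the file, at some cost in frame accuracy. A sketch (file names are hypothetical):

```python
# Variant command: "-ss" before "-i" requests an input-side (keyframe) seek,
# so FFmpeg does not decode the whole file up to the start time.
OPTS = ("-f flv -vcodec flv -r 25 -b 1500k -s 960x540 "
        "-acodec libmp3lame -ar 44100 -ab 64k")

def fast_seek_command(src, start, length, out):
    # src and out are hypothetical file name stems
    return "/usr/bin/ffmpeg -y -ss {0} -i {1}.m2t {2} -t {3} {4}.flv".format(
        start, src, OPTS, length, out)

cmd = fast_seek_command("lecture1", 600, 30, "section08")
```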
As a result, the total encoding time for Lecture video 1 is over 10 h. On the other
hand, the total encoding time for Lecture video 2 is only 3 h. We think that the
encoding time for our method is greatly affected by the situation in the lecture. The
encoding time will be long if there are two or more explanations lasting 10 sec or
more at the end of a lecture.
We watched some of the subdivided video sections to check whether there were any problems with their contents. Although the sections had not been subdivided according to the contents of the lecture, everything about which the teacher spoke was covered. We therefore judged that they would offer satisfactory viewing and listening. These videos are poorer than the videos edited by the student. However, we assessed them as acceptable when compared with watching a 3 h video containing unnecessary silent sections.

5 Conclusion
In this research, we propose an automatic lecture-archiving system. We identified
silent sections in the lecture videos by analyzing the sound in 14 lectures. We then
created some short videos by removing their silent sections.
These videos were produced by a different method from video editing by students. However, the videos retained all the content that was in the videos edited by students.
We expect that our archiving system will be able to create lecture videos subdivided
into content-related periods by adding teacher-supplied information about the
lecture to the encoding method used in this paper.
In future work, we will first evaluate the analysis program by showing the videos to students and teachers, and develop an interface for mobile terminals to send information about the lecture. Next, we will consider encoding methods for other
types of lectures. Finally, we will offer all the lecture videos to both students and
teachers to evaluate the lecture-archiving system.

References
1. Baeker, R.N., Wolf, P., Rankin, K.: The ePresence Interactive Webcasting and Archiving
System: Technology Overview and Current Research Issues. In: Proceedings of the
World Conference on E-Learning in Corporate, Government, Healthcare, and Higher
Education, pp. 2532–2537 (2004)

2. Herr, J., Lougheed, L., Neal, H.A.: Lecture Archiving on a Larger Scale at the University
of Michigan and CERN. In: 17th International Conference on Computing in High
Energy and Nuclear Physics, CHEP 2009, p. 11 (2009)
3. Takayuki, N.: Automated lecture recording system with AVCHD camcorder and
microserver. In: Proceedings of the 37th annual ACM SIGUCCS Fall Conference, pp.
47–54 (2009)
4. Cha, Z., Yong, R., Jim, C., Li-wei, H.: An Automated End-to-End Lecture Capture and
Broadcasting System. ACM Transactions on Multimedia Computing, Communications,
and Applications 4(1) (2008)
5. Takafumi, M., Yoshitaka, S., Koh, K., Michihiko, M.: Lecture Context Recognition
Based on Statistical Features of Lecture Action for Automatic Video Recording. Journal
of the Institute of Electronics, Information and Communication Engineers J90-D(10),
2775–2786 (2007)
6. Atsuo, Y., Tsukasa, H.: A Framework for Rule Based Video Editing for Lecture Video
Archiving. Information Processing Society of Japan Technical Report, 2009-HCI-132,
pp. 123–129 (2009)
7. FFmpeg (December 18, 2011), http://ffmpeg.org/
The Difference and Limitation of Cognition
for Piano Playing Skill with Difference
Educational Design

Katsuko T. Nakahira, Miki Akahane, and Yukiko Fukami

Abstract. It is an important theme for pre-school teacher education to identify the merits of face-to-face lessons and of e-Learning methods under limitations on time or on the number of instructors. In this paper, we discuss the difference and limitation of skill transfer by comparing two types of prepared learning environment. One consists of annotated scores and model performance videos served by e-Learning, while the other is supplemented with a face-to-face lesson after the self-learning. The analysis of our experiment will indicate that e-Learning contents provide a powerful method for the cognition of the correct length or pitch of notes. Based on this result, we will propose a hypothesis that e-Learning is effective for learning items related to motion control and music perception. Although students can improve their skill by self-learning via e-Learning to some extent, they get more stimuli from a face-to-face lesson. Those stimuli lead to more frequent awareness and to improved cognition of the length of notes, fine movement of notes, or tempo. In contrast, the expression of the dynamics of music will turn out not to show any radical improvement under either learning method. However, the students who take a face-to-face lesson can receive instruction reaching the dynamics of music because of their faster mechanical skill improvement. As a result, we will find that a face-to-face lesson has the additional effect of changing students' attitude in a direction more oriented toward improving their musical expression.

1 Introduction
There are many case studies of piano singing and playing lessons in pre-school teacher education, for example, Nakajima [1] or Imaizumi [2]. However, almost all of these case
Katsuko T. Nakahira
Nagaoka University of Technology, Nagaoka, Niigata, Japan
e-mail: katsuko@vos.nagaokaut.ac.jp
Miki Akahane
Tokyo College of Music, Tokyo, Japan
Yukiko Fukami
Kyoto Women’s University, Higashiyama-ku, Kyoto, Japan
e-mail: fukami@kyoto-wu.ac.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 609–617.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012

studies treat only face-to-face lessons, and few studies consider the point of educational design.
In the past, we studied the improvement of educational design for piano singing and playing lessons in pre-school teacher education. The design includes the concurrent use of face-to-face lessons and e-Learning. The main results are as follows:
1. We made an experiment of video submission of piano playing and singing by 300 students. The analysis of the number of video submissions and of the mid-term and end-of-term performance examinations suggests that it gives motivation to students and is effective for students' skill improvement (Fukami et al. [3], Nakahira et al. [4]). We also made an experiment of giving asynchronous comments on students' playing and singing, which suggested the effectiveness and limitation of the educational design (Fukami et al. [6]).
2. We developed the e-learning material (entitled “e-learning course on piano performance for teachers and pre-school teachers”, Nakahira et al. [5], [8]) and have delivered it over the internet since April 2008.
3. We developed an educational design that includes 2 components: (1) self-learning via the e-Learning contents for piano playing and singing developed in 2, and (2) submission of self-made videos of students' playing and singing before and after the e-Learning for piano playing and singing. After the experiments, we analyzed the difference in students' skills and awareness before and after the e-Learning. The analysis suggested that the educational design has the effects of (1) building basic skills for early-stage students and expression skills for advanced students, and (2) improving students' singing.
4. We developed annotated scores for 50 numbers [9].
In the process of developing them, we analyzed the students' change before and after gazing at the annotated scores. The analysis suggested that the concurrent use of meta-cognitive language [11] and intercorporate imagination of physical skills (the ability to mimic or copy skills) [12] has the potential to radically improve students' skills (Nakahira et al. [13]).
Through these processes, we think that we succeeded to some extent in proposing a good educational design for piano singing and playing. We could not, however, address piano playing skill transfer. This is an important theme for pre-school teacher education, and we need to distinguish the merits of face-to-face lessons and of e-Learning methods under limitations of time and/or the number of instructors.
In this paper, we discuss the difference and limitation of skill transfer between 2 types of learning environment, which are composed of (1) annotated scores and model performance videos served by e-Learning, and (2) a face-to-face lesson after taking (1).

2 Environment of Education
2.1 Construction of e-Learning Contents
First of all, we explain how we constructed the e-Learning contents, which play the most important role in this experiment. Figure 1 shows the index of the contents. There are 2 links for each number: one is for the model performance video, and the other for the annotated score. Figure 2 shows those elements.
The model performance video was played by a professional pianist who has been involved in piano playing education for many years. We recorded and edited the playing ourselves. The process of making the annotated scores was as follows. First, all the annotations were decided by the authors. Then, after making the scores with the annotations, we converted them into pdf files to include in the Web page.
We also prepared an upload website for video submission, which enables students to submit their playing videos at any time.

2.2 Educational Design


Our educational environment is as follows. The sample consists of 106 students in K University who took the course “Music for Children I”, which was lectured from April to July 2011. We implemented our experiment in June. Two target music scores with different features were selected from etudes that are supposed to be learned by beginners at the piano. The score A has the following features: written in a minor key, without black keys, and composed only of quavers, quarter notes, and rests. The score B is more difficult than the score A and has the following features: written in a major key, and composed of quavers, quarter notes, rests, and dotted notes.
Fig. 3 shows the outline of the educational design for students in this experiment.

Fig. 1 Outline of the e-Learning site



Fig. 2 (a), (b) Images of the model performance video; each video is taken from 2 angles. (c) Score A (abr.), (d) Score B (abr.)

Fig. 3 The outline of the educational design in this experiment.



Students are instructed to choose which score they would like to learn, the score A or the score B. After the students' choices are decided, the students are divided into 2 groups randomly by the instructor: one group practices by self-learning via e-Learning (served with the annotated score and the model performance video), while the other group is supplemented with a face-to-face lesson after the self-learning via e-Learning. We require all students to submit their own performance videos before and after the learning. The instructor does not insist that the students record their performance of the whole score if they cannot, in order to reduce the barrier to submitting the videos.

3 Results
We analyzed the video datasets gathered through the process mentioned above from the following two points of view: (1) the dependence of the effectiveness of e-Learning on the degree of difficulty of the tune; (2) the difference in skill transfer quality arising from the difference in learning method. The estimation of skill transfer quality was made by a professional piano instructor who is not acquainted with the students.

3.1 Differences of Score


We first analyze the dependence of the effectiveness of e-Learning on the degree of difficulty of the tune. Table 1 shows the results of analyzing the data from 12 students who submitted (sufficiently complete) videos of their performance for the score A without taking any face-to-face lesson. We did not find great improvement in the students' playing, but we recognized some improvement in the point of expression, presumably owing to the model performance video. The

Table 1 The fraction of students in each category that classifies the degree of improvement of the students' skills, when comparing Take 1 and Take 2 for the score A. ◎ means significant improvement, ○ tiny improvement, and × no improvement. Error means that the data is incomplete.

Improvement degree between Take 1 and Take 2
                              ◎     ○     ×     Error
correct length of note
  half note in left hand      0.17  0.50  0.17  0.17
  quarter note in right hand  0.25  0.63  0.12  0.00
  the last bar                0.33  0.58  0.08  0.00
  rest                        0.25  0.75  0.00  0.00
  ligature                    0.17  0.67  0.00  0.17
Tempo                         0.42  0.58  0.00  0.00
dynamics                      0.58  0.42  0.00  0.00
phrase                        0.08  0.75  0.00  0.17

Table 2 The same table as before but for the score B.

Improvement between Take 1 and Take 2
                  ◎     ○     ×
correct length of note
  dotted rhythm   0.64  0.28  0.08
  half note       0.52  0.12  0.36
  keeping note    0.44   -    0.56
  demiquaver      0.28  0.16  0.56
Tempo             0.24  0.28  0.48
fingering         0.40  0.40  0.20
dynamics          0.04  0.24  0.72
phrase            0.32   -    0.68
marcato           0.76  0.08  0.16

ratios written in italic letters with underline in Table 1 represent the mentioned improvement: the last bar rit. (ritardando, ratio 0.58), motif expression (ratio 0.58 in dynamics, ratio 0.75 in phrase). In contrast, for the item of correct length of note, there is no improvement arising from the learning method, namely self-study via e-Learning.
Table 2 shows the results of analyzing the data from 20 students who submitted videos of their performance for the score B without taking any face-to-face lesson. There is some improvement in students' skills related to correct length of note (dotted rhythm, ratio 0.64) and marcato (ratio 0.76). On the other hand, there is no improvement in the skills related to dynamics or phrase.

3.2 Difference of Learning Tool


Next, we analyze the difference in the skill transfer quality arising from the dif-
ference in the learning method. Table 3 shows the results of analyzing the data for

Table 3 The same table as before but for the score B, for the students who took a face-to-face lesson (left three columns); the right three columns repeat the results of Table 2 for comparison.

Improvement between Take 1 and Take 2
                  ◎     ○     ×     ◎     ○     ×
correct length of note
  dotted rhythm   0.96  0.04   -    0.64  0.28  0.08
  half note       0.65  0.30  0.04  0.52  0.12  0.36
  keeping note    0.56  0.09  0.30  0.44   -    0.56
  demiquaver      0.83  0.17   -    0.28  0.16  0.56
Tempo             0.87  0.17   -    0.24  0.28  0.48
fingering         0.83  0.17   -    0.40  0.40  0.20
dynamics          0.26  0.52  0.22  0.04  0.24  0.72
phrase            0.17   -    0.78  0.32   -    0.68
marcato           0.96  0.04   -    0.76  0.08  0.16

the score B from 23 students who took a face-to-face lesson. Comparing Table 3 with Table 2, we find that the students who practiced with a face-to-face lesson drastically improved their skills for demiquavers, keeping notes, dynamics, and tempo. The skills for which improvement was observed even for students who did not take a face-to-face lesson also improved consistently for the students who did take one. In total, the students who took a face-to-face lesson made a large improvement in their skills.

4 Discussion
In this section we add further discussion on the points analyzed in Section 3.
First, we discuss the difference in skill transfer for the different tunes. The score A does not contain demiquavers, while the score B contains many demiquavers and dotted notes, but both tunes are basically easygoing numbers. Both scores have simple dynamics composed of p (piano), f (forte), crescendo, and decrescendo. The items of students' skill that were improved by the e-Learning contents alone are the last bar rit. and the motif expression affected by the model performance video. By contrast, there was no improvement in dynamics, tempo, or fine notes such as demiquavers. These points, which are difficult to improve by e-Learning, were improved by taking a face-to-face lesson. The reason for this difference can be explained as follows.
The first point to mention is the difference in learning style. Self-learning via e-Learning contents proceeds without any interaction with other people. Students can learn following their own learning rhythm, and the learning environment is basically free from any stress. We consider what students can learn most in such a learning environment. Fujimura and Ohmi [14] suggested a difference in the roles of the right and left sides of the brain in recognizing rhythm. From the optical topography images, the left side of the brain works for rhythm cognition, but the cognition becomes more accurate when the person uses the right side of the brain together. Iwasaka et al. [15] gave some insight into the relation between score reading and playing music. They pointed out that (1) the music information processing of players who are practiced in score reading tends to make active planning for playing expression in their brain, (2) amateur players make less planning for playing expression, and (3) the dominant part of brain activity is spent on the functions of motion control and music perception when amateur players listen to music.
From these suggestions and the fact that the students are amateurs at piano playing, we made the hypothesis that e-Learning is a powerful method in situations where the functions of motion control and music perception dominate, and hence it is useful for the cognition of the correct length or pitch of notes. By contrast, it is not so effective for the purpose of improving fine note cognition.
On the other hand, when the students have a chance to take a face-to-face lesson, they can get not only a model performance but also realtime oral suggestions from the instructor. The oral suggestions from the instructor can be regarded as a different type of “stimuli” for students, since they contain points that the students would not be able to find out by themselves by comparing their own playing with the scores or

the model performance. In consequence, the students get more stimuli from a face-to-face lesson than from self-learning via e-Learning.
These stimuli bring frequent awareness, leading to improved cognition of the length of notes, fine movement of notes, and tempo. Although the expression of dynamics in the score did not improve radically under either learning method, the students who took a face-to-face lesson could get instruction reaching the dynamics of music because of their faster mechanical skill improvement, which further caused some change in their attitude toward improving their musical expression.

5 Conclusion
In this paper, we discussed the difference and limitation of skill transfer by comparing two types of prepared learning environment: one consists of annotated scores and model performance videos served by e-Learning, and the other is supplemented with a face-to-face lesson after taking the e-Learning. We showed that the following two points can be suggested from our experiment.
1. We find that e-Learning contents can provide a powerful method for the cognition of the correct length or pitch of notes.
2. When students have a chance to take a face-to-face lesson, they can get not only a model performance but also realtime oral suggestions from the instructor.
In the future, we would like to discuss the reason why fine note cognition is not improved in this training.

Acknowledgements. We are very grateful to Prof. Muneo Kitajima at Nagaoka University of Technology for his useful comments. The study described in this paper has been funded in part by the Scientific Research Expense Foundation C (Representative: Yukiko Fukami, 18500742).

References
1. Nakajima, T.: A Practical Study on Piano Teaching Method at the Department of
Education–For Acquirement of Musical Capacity. In: Studies on Educational Practice,
Center for Educational Research and Training, Faculty of Education, Shinshu Univer-
sity, vol. 3, pp. 31–40 (2002)
2. Imaizumi, A.: A Trial of Teaching Piano Playing to Students with No Experience–Group
Lesson using Keyboard Pianos (2) Introduction of Practice Record Cards. Japan Society
of Research on Early Childhood Care and Education Annual Report 57, 281–282 (2004)
3. Fukami, Y., Nakahira, K.T., Akahane, M.: Effect of submitting self-made videos in piano
playing and singing practicing for preschool teacher education. The Bulletin of the Department of Pedology, Kyoto Women's University 4, 19–27 (2008)
4. Nakahira, K.T., Akahane, M., Fukami, Y.: Combining Music Practicing with the Sub-
mission of Self-made Videos for Pre-School Teacher Education. In: The Proceedings of
the 15th International Conference on Computers in Education, pp. 573–576 (2007)

5. Nakahira, K.T., Akahane, M., Fukami, Y.: Development e-learning contents with blended
learning for teaching piano singing and playing piano. JSiSE Research Report 23(1), 85–
92 (2008)
6. Fukami, Y., Nakahira, K.T., Akahane, M.: Effects and problems of remote and non-face-
to-face teaching to singing with simultaneous piano self-accompaniment. The Bulletin of the Department of Pedology, Kyoto Women's University 5, 31–40 (2009)
7. Nakahira, K.T., Akahane, M., Fukami, Y.: Use of Electronic Media for Teaching Singing
with Simultaneous Piano Self-Accompaniment. The Journal of Three Dimensional Im-
ages 23(1), 82–87 (2008)
8. http://oberon.nagaokaut.ac.jp/kwu/
9. Fukami, Y., Akahane, M.: 50 Best Annotated Scores for Simultaneous Piano Playing and
Singing of Children’s Songs (2011) (in Japanese) ISBN-4276820723
10. Nakahira, K.T., Akahane, M., Fukami, Y.: Verification of the Effectiveness of Blended
Learning in Teaching Performance Skills for Simultaneous Singing and Piano Playing.
Biometric Systems, Design and Applications, 978–953 (2011) ISBN 978-953-307-542-6
11. Suwa, M.: The Act of Creation: A Cognitive Coupling of External Representation, Dy-
namic Perception, the Construction of Self and Meta-Cognitive Practice. Cognitive Stud-
ies 11(1), 26–36 (2004)
12. Saito, T.: “Culture(Buildung in German)” as Bodily Wisdom. The Japan Society for the
Study of Education 66(3), 29–36 (1999)
13. Nakahira, K.T., Akahane, M., Fukami, Y.: Awareness Promoting Learning Design of
Sing-along Piano Playing – the role of annotated musical score and multimedia contents
–. In: The Proceedings of the 3rd International Conference on Awareness Science and
Technology, pp. 372–378 (2011)
14. Fujimura, A., Ohmi, M.: Difference of brain activity when musical element is perceived
by subjects with different musical instrument experience. IEICE Technical Report HIP-
2007-126, pp. 143–147 (2007)
15. Iwasaka, M., Sugo, K., Shimo, M., Ishii, T.: Human brain hemodynamics during ac-
tive/passive musical listening revealed by near infrared spectroscopy. IPSJ SIG Technical
Report, 2007-MUS-69, pp. 1–6 (2007)
Topic Bridging by Identifying the Dynamics
of the Spreading Topic Model

Makoto Sato, Mina Akaishi, and Koichi Hori

Abstract. We propose topic bridging as a method for story generation support. In our research, a document is defined as a story fragment and a story is defined as
a sequence of story fragments. Topic bridging suggests story fragments that can
function as a bridge between the start topic and the goal topic to generate a story.
To do this, we propose a topic dynamics model corresponding to the story that is
based on a spreading activation model, which we call the spreading topic model.
On the basis of this model, we defined the term context-dependent attractiveness
to indicate the dynamic popularity of a term spreading through the term relations
in a new concatenated story fragment. The term context-dependent attractiveness
features the topic of the story fragment. We propose the topic bridging method to
estimate the feature of the story fragment that bridges the topics of the start story
fragment and the goal story fragment by solving the inverse problem of the term
context-dependent attractiveness.

1 Introduction
Representing knowledge in the form of a story has many advantages [5]. For exam-
ple, when learning computer programming, students can progress more quickly if
they read not only descriptions of functions, such as reference materials, but also
descriptions on how to use the functions, such as tutorials or sample applications,
or descriptions on what to do in the case of a problem, such as FAQs. Depending on
the objectives, it is usually preferable to impart knowledge about a function through
a story that links different bits of knowledge related to different functions.
Individuals can piece together relevant knowledge to empirically create a story,
and it is possible to attach different meanings and/or values to a piece of the story

Makoto Sato · Mina Akaishi · Koichi Hori


Department of Aeronautics and Astronautics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku,
Tokyo, Japan
e-mail: satomakoto@ailab.t.u-tokyo.ac.jp,
akaishi@ailab.t.u-tokyo.ac.jp, hori@computer.org

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 619–627.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

depending on the context. Therefore, if we want a system to string together pieces of


knowledge to automatically create a story, we must take the context dependency of
knowledge into account in its representation.
Methods for generating natural language stories automatically have been actively
studied since the 1960s. Artificial intelligence-based generation approaches
can be traced back to TALE-SPIN [11], a method that generates a story by
using agents that perform rational actions to achieve the goals set in the world of the
story, in conjunction with a story grammar. Since the advent of TALE-SPIN, com-
putational story grammar-based generation approaches have been developed with the
goal of understanding stories [4, 12, 13, 19].
There has been extensive research related to the application of story generation
to the educational or entertainment fields [6, 8]. One approach is to generate stories
dynamically or on a per-session basis [14]. The system can adapt narratives to user
preference and ability through interactions between the system and the user. Topic
detection and tracking (TDT), which is also regarded as an application of story
generation, is an event-based automatic method for organizing news stories [2]. A
method for generating stories by forming connections between the start and goal
documents has been proposed [18]. The problem of connecting the dots is almost
the same in that study as in our problem.
In this paper, we define a story as a composition of connected documents. The
objective of our research is to find the relationship between documents and gener-
ate a new story in a new context from accumulated documents. We will propose a
context-dependent dynamics model of the topics in the story in addition to proposing
a method called “Topic Bridging,” which, given start and goal documents, generates
a new story (a chain of documents).

2 Topic Bridging
Any time a particular topic is discussed, there are many possible ways to progress
to its conclusion. For example, there can be several potential ideas when discussing
approaches to solving environmental problems from the viewpoint of aerospace en-
gineering, such as to design a more efficient engine, to use cost-effective material that
can be reused, or to take a more efficient flight path. Connecting aerospace engineering
to environmental problems can be done with a process called topic bridging.
In order to successfully implement topic bridging in a computational system, we
propose an algorithm in which the system returns candidates of a chain of docu-
ments connecting a given start document with a goal document. An overview of this
implementation is shown in Fig. 1.

2.1 The Spreading Topic Model


We propose the spreading topic model, in which the term weight in a topic spreads
depending on how the documents are connected, to extract context-dependent
relations from among chains of documents.

[Figure: the user provides the start (y0) and goal (y4) documents; the system searches the document database (x0–x4) and returns bridge candidates (y′1, y′2, y′3).]

Fig. 1 Overview of the topic bridging implementation. The input is the start document and
goal document. The output is chains of documents as the bridge candidates. The documents
for the bridge candidates should be accumulated in advance.

When connecting documents to generate a story, each story fragment should have
a different meaning depending on the composition of the story. For example, con-
sider a story fragment about the features of nuclear power. If it links up with a
story fragment about global warming, the subject of the story would be relatively
low CO2 emissions. On the other hand, if it links up with a fragment about the
Great East Japan Earthquake, the subject would be the dangers of nuclear power.
We modeled this idea as follows: terms that are important in the “global warming”
story fragment become more important in the “nuclear power” story fragment.
We referred to the spreading activation model [3, 7], a model used in network
analyses such as calculating the importance of nodes in a network, to define the
spreading topic model. There has been much previous research applying it to
document analysis [9, 10]. In the spreading activation model, a network has both
weighted nodes and weighted links. The weight of a node is called “activation”,
and the activation spreads along the weighted links.
In the spreading topic model, a term is assigned to a node and a term relation is
assigned to a link. In this paper, we define the term weight and the term relation
weight as the term context-dependent attractiveness and the term dependency,
respectively [1]. These concepts are explained in more detail in the sections that follow.

2.2 Term Dependency and Term Attractiveness


Term dependency represents the weight of a term relation in a document, and term
attractiveness represents the weight of a term. The dependency of term $t_i$ on
term $t_j$ in document $s$, $d_s(t_i, t_j)$, is given by a conditional probability as follows:
622 M. Sato, M. Akaishi, and K. Hori

$$d_s(t_i, t_j) = \frac{\mathrm{sentences}_s(t_i, t_j)}{\mathrm{sentences}_s(t_i)}, \qquad (1)$$

where $\mathrm{sentences}_s(t_i)$ is the number of sentences that contain term $t_i$ in document $s$
and $\mathrm{sentences}_s(t_i, t_j)$ is the number of sentences that contain both term $t_i$ and term $t_j$
in document $s$. The attractiveness of term $t_j$ in document $s$, $a_s(t_j)$, is the sum of the
dependency of term $t_i$ on term $t_j$ over all terms $t_i$:

$$a_s(t_j) = \sum_{t_i \in T} d_s(t_i, t_j), \qquad (2)$$

where T is the set of all the terms in the document sets.
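As an illustration of equations (1) and (2), both quantities can be computed directly from sentence-level co-occurrence counts. The sketch below is our own (not from the paper's implementation); the toy sentences and term set are hypothetical, and each sentence is treated simply as a set of terms:

```python
from collections import defaultdict

def dependency_and_attractiveness(sentences, terms):
    """Term dependency d_s(t_i, t_j) (Eq. 1) and term attractiveness
    a_s(t_j) (Eq. 2) for one document whose sentences are sets of terms."""
    # sentences_s(t_i): number of sentences containing t_i
    count = {t: sum(1 for sent in sentences if t in sent) for t in terms}
    d = defaultdict(float)
    for ti in terms:
        if count[ti] == 0:
            continue  # t_i never occurs, so its dependencies are undefined
        for tj in terms:
            both = sum(1 for sent in sentences if ti in sent and tj in sent)
            d[(ti, tj)] = both / count[ti]  # conditional probability (Eq. 1)
    # a_s(t_j): total dependency of all terms t_i on t_j (Eq. 2)
    a = {tj: sum(d[(ti, tj)] for ti in terms) for tj in terms}
    return d, a

sentences = [{"network", "society"}, {"network", "digital"}, {"society"}]
d, a = dependency_and_attractiveness(sentences, {"network", "society", "digital"})
```

With these three toy sentences, "network" is depended on by both "society" and "digital", so it receives the highest attractiveness.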

2.3 Term Context-Dependent Attractiveness


In addition, we define term context-dependent attractiveness as an extension of
term dependency and term attractiveness [16, 17]. A term depended on by terms
that have high attractiveness in a document gains more attractiveness in the next
document than a term depended on by terms that have low attractiveness.
The term context-dependent attractiveness $c_\tau(t_j)$ of a term $t_j$ in the document at
position $\tau$ is the sum, over all terms $t_i$, of the products of the term context-dependent
attractiveness $c_{\tau-1}(t_i)$ in the previous document at position $\tau-1$ and the term
dependency $d_s(t_i, t_j)$ in the present document $s$:

$$c_\tau(t_j) = \sum_{t_i \in T} c_{\tau-1}(t_i)\, d_s(t_i, t_j). \qquad (3)$$

An overview is shown in Fig. 2.

Fig. 2 Overview of term context-dependent attractiveness. The topic is characterized by
the terms with their weights; a bubble in the figure represents a term, and there are four
terms. The weight of the terms spreads depending on the relationships in the story
fragments (bridges); this weight corresponds to the term context-dependent attractiveness.
(Figure labels: Topic0, Topic1, Topic2; Fragment1, Fragment2; Story.)
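Equation (3) amounts to one step of linear spreading from fragment to fragment. A minimal numerical sketch, assuming NumPy; the 3-term vocabulary and the dependency values are invented for illustration:

```python
import numpy as np

# Hypothetical 3-term vocabulary: ["network", "society", "digital"].
# D_s[i, j] = d_s(t_j, t_i): attractiveness flows from t_j into t_i
# through the term dependencies of the current fragment s.
D_s = np.array([
    [1.0, 0.5, 1.0],  # flowing into "network"
    [0.5, 1.0, 0.0],  # flowing into "society"
    [0.5, 0.0, 1.0],  # flowing into "digital"
])

# Topic of the previous fragment: "network" and "society" are salient.
c_prev = np.array([1.0, 1.0, 0.0])

# One spreading step (Eq. 3): context-dependent attractiveness
# of each term in the next fragment.
c_next = D_s @ c_prev
print(c_next)  # [1.5 1.5 0.5]
```

Here "network" and "society" remain dominant in the next fragment because the salient terms of the previous fragment depend on them most strongly.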

2.4 Bridging Topics


Our goal is to find bridge documents between the start and goal documents. To
achieve this, we propose a method that solves an inverse problem of the term
context-dependent attractiveness described in section 2.3.
We rewrite equation (3) in matrix form:

$$\mathbf{c}_\tau = D_s \mathbf{c}_{\tau-1}, \qquad (4)$$

where $\mathbf{c}_\tau[i] = c_\tau(t_i)$ and $D_s[i, j] = d_s(t_j, t_i)$. Then, when the start document is charac-
terized by $\mathbf{c}_0$ and the goal document is characterized by $\mathbf{c}_\tau$, the relation between
the two is written with bridges $\{s_1, s_2, \ldots, s_{\tau-1}\}$ as follows:

$$\mathbf{c}_\tau = D_{\mathrm{bridges}} \mathbf{c}_0, \qquad (5)$$

where

$$D_{\mathrm{bridges}} = D_{s_{\tau-1}} D_{s_{\tau-2}} \cdots D_{s_1}. \qquad (6)$$
If we know $\mathbf{c}_\tau$ and $\mathbf{c}_0$, we can estimate $D_{\mathrm{bridges}}$ as follows:

$$\hat{D}_{\mathrm{bridges}} = \mathbf{c}_\tau \mathbf{c}_0^{+}, \qquad (7)$$

where $\mathbf{c}_0^{+}$ is the pseudoinverse of $\mathbf{c}_0$. Equation (5) usually has many solutions because
it is under-determined owing to its sparseness. We therefore use the pseudoinverse, a
generalization of the inverse matrix, to obtain a solution; the pseudoinverse gives
the “least-squares” answer.
Next, since we want to find chains of documents as story fragments, we score a
chain of documents $\{s_1, s_2, \ldots, s_{\tau-1}\}$ in the following way:

$$\mathrm{score}(\hat{D}_{\mathrm{bridges}}, D_{\mathrm{bridges}}) = \frac{\sum_{i,j} \hat{D}_{\mathrm{bridges}}[i,j]\, D_{\mathrm{bridges}}[i,j]}{\|\hat{D}_{\mathrm{bridges}}\|_F \, \|D_{\mathrm{bridges}}\|_F}, \qquad (8)$$

where $\|D\|_F$ is the Frobenius norm of $D$. The system makes recommendations on
the basis of the scores of the chains of documents.
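Equations (7) and (8) can be implemented with an off-the-shelf pseudoinverse. The following sketch is our own, assuming NumPy; the two-term vectors and the two candidate bridge operators are toy values:

```python
import numpy as np

def estimate_bridge(c0, c_tau):
    """Eq. (7): least-squares estimate of D_bridges from the start and
    goal attractiveness vectors via the Moore-Penrose pseudoinverse."""
    return c_tau.reshape(-1, 1) @ np.linalg.pinv(c0.reshape(-1, 1))

def score(D_hat, D_bridges):
    """Eq. (8): elementwise inner product of the estimated and candidate
    bridge operators, normalized by their Frobenius norms."""
    num = np.sum(D_hat * D_bridges)
    return num / (np.linalg.norm(D_hat, "fro") * np.linalg.norm(D_bridges, "fro"))

# Toy start/goal topics over a 2-term vocabulary.
c0 = np.array([1.0, 0.0])
c_tau = np.array([0.0, 1.0])
D_hat = estimate_bridge(c0, c_tau)

# A candidate chain {s1, s2} would contribute D_bridges = D_s2 @ D_s1;
# here we score two hand-made candidate operators against the estimate.
good = np.array([[0.0, 0.0], [1.0, 0.0]])  # moves weight from term 0 to term 1
bad = np.array([[1.0, 0.0], [0.0, 1.0]])   # identity: leaves the topic unchanged
assert score(D_hat, good) > score(D_hat, bad)
```

The score is a cosine-style similarity between matrices, so chains whose composed dependency operator points in the same "direction" as the estimate are ranked first.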

3 Evaluation
We tested the proposed method by conducting user studies to determine the utility
of our algorithm as it would be used in practice. We evaluated our method by using
descriptions of lectures. The stories generated from such descriptions can then be
used for the reconstruction of new courses as chains of lectures.
For our evaluation, we selected a course called “Global Focus on Knowledge”
from the University of Tokyo1. The concept of this course is “to look at the global
knowledge system on a macro scale, capture the entire picture of each field, and

1 http://www.gfk.c.u-tokyo.ac.jp/

understand how they are organically linked up with each other.” In this course, one
or two different themes are set each semester and the sub-themed lectures in a course
are held by professors from multiple departments. There have been lectures on 20
different course themes and 104 sub-themes from 2005 to 2010. For example, in
the 2010 Winter term, one theme was “The World of Diverse Matter - The Distant
Journey from Space to Earth”, and it consisted of five lectures with five sub-themes:
“From Micro Particles to Macro Space”, “The Diversity of Matter Born from the Ac-
tions of Atoms, Electrons, and Molecules”, “The Search for and Creation of Matter
with Desirable Properties -Discoveries from the Field of Pharmaceutical Science”,
“Changing Matter - From Matter to Material”, and “An Everlasting Future for Our
Small Earth”.
Each lecture had a description composed of about 200 Japanese characters. We
used Japanese language morphological analysis on the descriptions and treated the
nouns as terms and removed the function words. Stories were then composed from
the descriptions.
The generated stories were presented to users, who then evaluated them. There
were two bridges; that is, four story fragments (the start, two bridges, and the goal)
represented one story. We tried to generate stories by linking start and goal pairs
using the following two techniques:
Topic bridging: As described in Section 2.
Shortest path: We computed the distance between documents on the basis of the
cosine similarity of their tf-idf vectors and then located the path with the largest
total similarity. The tf-idf (term frequency-inverse document frequency) weight
is often used to evaluate how important a word is to a document [15].
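The baseline can be reproduced with standard tf-idf weighting and cosine similarity. A self-contained sketch of our own; the exact tf-idf variant the authors used is not specified, so we assume raw term frequency and a logarithmic inverse document frequency:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """tf-idf vectors [15] for documents given as lists of terms."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) for t in vocab}
    return [[Counter(doc)[t] * idf[t] for t in vocab] for doc in docs]

def cosine(u, v):
    """Cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy documents: each pair shares exactly one of its two terms.
docs = [["space", "matter"], ["space", "earth"], ["matter", "earth"]]
v = tfidf_vectors(docs)
sim = cosine(v[0], v[1])
```

The baseline then searches, over chains of the required length, for the path whose pairwise similarities sum to the largest total.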
The start and goal documents were selected from lectures with the same theme. We
were able to generate 262 stories by using each technique with the highest score,
so there were a total of 464 stories. We presented six participants with 20 stories
(10 stories for each technique; randomly selected) which were then shuffled. We
then asked the participants if the chain made sense for the concept of the course.
Examples of the stories used are shown in Table 1.

Table 1 Examples of generated stories. Story 1 is regarded as appropriate, while Story 2
is not.

Story 1: “Clinical Psychology and Abnormal Psychology” (Summer, 2008) // “Psychological
Development and Educational Psychology” (Summer, 2008) // “Science of Humans”
(Winter, 2006) // “Social Mechanisms and the Individual Psyche” (Summer, 2008)

Story 2: “Life and Death in Russian Literature” (Summer, 2009) // “Responsibilities of
Technology in Energy and Environment Issues” (Winter, 2007) // “Information Explosion
and Creation of New Network Society” (Winter, 2007) // “Thoughts on Bio-Power and
Death, Ideas Regarding Euthanasia” (Summer, 2009)

First, we checked whether the bridges of all the stories were originally from the
same theme. In the case of our algorithm, there were 106 stories that had bridges
between the start and goal from the same theme, while with the shortest path
algorithm, there were 44. Extracting bridges from the same theme as the start and
goal results in better suitability to the concept of the course, indicating that our
algorithm is more effective than the shortest path algorithm.
Next, we compared the evaluated stories (Fig. 3). There were 51 out of 60 (85%)
appropriate stories (suitable for the course concept) generated by our algorithm. In
contrast, the shortest path algorithm generated 24 out of 60 (40%). Our algorithm
outperformed the competitor.
Moreover, we compared the evaluated stories for the case of bridges with the same
theme and for the case of bridges with a different theme.
Our algorithm generated 15 out of 16 (94%) appropriate stories with the same
theme, and the shortest path algorithm had 7 out of 7 (100%). Both algorithms could
recommend appropriate bridges when these belonged to the same theme.
On the other hand, our algorithm generated 36 out of 44 (82%) appropriate
stories with a different theme. In contrast, the shortest path algorithm generated
17 out of 53 (32%). This demonstrates that bridges that are evaluated as appropriate
for the concept of the course are not necessarily selected from the lectures with the
same theme. We were able to get such bridges with the proposed method more than
with the shortest path algorithm.
This is because the proposed method is essentially a method of weighting the co-
occurrence of words in a bridge story fragment. It is possible to extract unique
co-occurrence combinations of common words (e.g., the combination of the word
“network” with the word “society” versus that of “network” with “digital”) and
treat them separately even if a sentence includes all three words. Focusing on
specific combinations of words enables us to link documents by identifying
polysemous words.
Fig. 3 Evaluations of generated stories by using topic bridging and shortest path algorithms.

However, our algorithm is computationally expensive: calculating the pseudoinverse
matrix requires $O(|T|^2)$ and calculating the score requires $O(|T|^{2\tau})$. Therefore,
in the future, we plan to deal with this intractability, e.g., by letting the user explicitly
provide the number of bridges, as in the experiment in this section, or by applying
dimension reduction such as feature selection or feature extraction.

4 Conclusion
We proposed using the spreading topic model to model the topic transition dynamics
and an algorithm that identifies the spreading topic model for topic bridging. The goal
of topic bridging is to link chains of documents from the start document to the
goal document to create a story. The method bridges topics by solving an inverse
problem of the term context-dependent attractiveness.
The findings we have presented are the results of a simple strategy, particularly
in the case of the context-dependency model. In the future, we intend to extend this
work by exploring more complex models and evaluating additional tools for the
generated stories.

References
1. Akaishi, M.: A dynamic Decomposition/Recomposition framework for documents based
on narrative structure model. Transactions of the Japanese Society for Artificial Intelli-
gence 21(5), 428–438 (2006)
2. Allan, J.: Topic detection and tracking: event-based information organization, vol. 12.
Springer, Heidelberg (2002)
3. Anderson, J.: A spreading activation theory of memory. Journal of Verbal Learning and
Verbal Behavior 22(3), 261–295 (1983)
4. Bringsjord, S., Ferrucci, D.: Artificial Intelligence and Literary Creativity: Inside the
Mind of Brutus, a Storytelling Machine. L. Erlbaum Associates Inc., Hillsdale (1999)
5. Brown, J.: Storytelling in organizations: Why storytelling is transforming 21st century
organizations and management. Butterworth-Heinemann (2005)
6. Cavazza, M., Charles, F., Mead, S.: Character-based interactive storytelling. IEEE Intel-
ligent Systems 17(4), 17–24 (2002)
7. Crestani, F.: Application of spreading activation techniques in information retrieval. Ar-
tificial Intelligence Review 11(6), 453–482 (1997)
8. Gordon, A., Van Lent, M., Van Velsen, M., Carpenter, P., Jhala, A.: Branching story-
lines in virtual reality environments for leadership development. In: Proceedings of the
National Conference on Artificial Intelligence, pp. 844–851. AAAI Press (2004)
9. Mani, I., Bloedorn, E.: Summarizing similarities and differences among related docu-
ments. Inf. Retr. 1(1-2), 35–67 (1999)
10. Matsumura, N., Ohsawa, Y., Ishizuka, M.: Automatic indexing based on term activity.
Transactions of the Japanese Society for Artificial Intelligence 17(4), 398–406 (2002)
11. Meehan, J.: Tale-spin, an interactive program that writes stories. In: Proceedings of the
Fifth International Joint Conference on Artificial Intelligence, pp. 91–98 (1977)

12. Montfort, N.: Generating narrative variation in interactive fiction (2007)


13. Pérez y Pérez, R., Sharples, M.: Mexica: A computer model of a cognitive account of cre-
ative writing. Journal of Experimental & Theoretical Artificial Intelligence 13(2), 119–
139 (2001)
14. Riedl, M., Young, R.: From linear story generation to branching story graphs. IEEE
Computer Graphics and Applications 26(3), 23–31 (2006)
15. Salton, G., McGill, M.: Introduction to modern information retrieval (1986)
16. Sato, M., Akaishi, M., Hori, K.: Analyzing topic transitions using term context-
dependent attractiveness. In: Information Modelling and Knowledge Bases XXI, pp.
324–331. IOS Press, Amsterdam (2010)
17. Sato, M., Akaishi, M., Hori, K.: Bridging topics for story generation. In: Proceeding of
the 2011 Conference on Information Modelling and Knowledge Bases XXII, pp. 247–
257. IOS Press, Amsterdam (2011)
18. Shahaf, D., Guestrin, C.: Connecting the dots between news articles. In: Proceedings
of the 16th ACM SIGKDD international conference on Knowledge discovery and data
mining, pp. 623–632. ACM (2010)
19. Turner, S.: The creative process: A computer model of storytelling and creativity.
Lawrence Erlbaum (1994)
Author Index

Adachi, Kosuke 213 Hayashi, Yuki 289


Akahane, Miki 609 Hayashi, Yukiko 449
Akaishi, Mina 619 Hirose, Yuya 351
Akiba, Fuminori 97 Hochin, Teruhisa 329
Asakura, Koichi 23 Honda, Katsuhiro 43
Hori, Koichi 619
Barry, Dana M. 411, 479 Huang, Ruchun 223
Batista, George Moroni Teixeira 279
Botzheim, János 439 Ichihashi, Hidetomo 43
Brierley, Mark 213 Ide, Ichiro 153
Ikeda, Mitsuru 541
Chen, Weiqin 191 Inagi, Dai 289
Chino, Kizuku 163 Iribe, Yurie 73, 511, 569
Ishida, Keisuke 419
Deguchi, Daisuke 153 Isomoto, Yukuo 183
Dharmawansa, Asanka D. 107 Iwaki, Mio 499
Iwazaki, Tomonori 419
Ejima, Tetsuro 521
Kamiya, Satoko 183
Faucher, Colette 83 Kanematsu, Hideyuki 129, 411, 479
Favorskaya, Margarita 63, 341 Kato, Shohei 117, 245, 351
Feng, Jun 53, 223 Katsurada, Kouichi 73, 511
Fukami, Yukiko 609 Kawai, Toshinobu 183
Fukuda, Yukiko 183 Kawakami, Mitsuhiko 201
Fukumura, Yoshimi 107, 129, 411, 419, Kawano, Arina 43
479 Kikuchi, Taiki 511
Furukawa, Maki 307 Kobayashi, Toshiro 411, 479
Kojiri, Tomoko 469
Hanada, Yoshiko 173 Komaki, Tateaki 255
Hanaue, Koichi 33 Koyanagi, Yusuke 429
Harada, Fumiko 255 Kubota, Naoyuki 439
Hasegawa, Naoki 419 Kunimune, Hisayoshi 499
Hashimoto, Kiyota 531 Kurata, Risa 541

Lerin, Pablo Martinez 233 Okada, Yoshihiro 317


Levtin, Konstantin 341 Okajima, Seiji 317
Li, Kai 569 Okumura, Junko 183
Liew, Wei Shiung 299 Onishi, Kentaro 375
Lim, Hun-ok 1 Ozawa, Shunshuke 375
Loo, Chu Kiong 299
Pakhirka, Andrey 63
Ma, Yan 201 Pelowski, Matthew 399
Maeda, Setsuko 183
Manosavan, Silasak 73 Saga, Ryosuke 143
Masuta, Hiroyuki 1 Sato, Hideki 385
Matsubara, Shigeki 449, 459 Sato, Makoto 619
Matsumoto, Yui 43 Sayeed, M. Shohel 299
Mercantini, Jean-Marc 83 Seta, Kazuhisa 541
Miura, Hajime 419 Shen, Zhenjiang 201
Miyakoshi, Yoshihiro 245 Shimakawa, Hiromitsu 255
Miyazaki, Yoshinori 269 Shimizu, Yasutaka 129
Mizutani, Hiroya 11 Shimoyama, Ayako 521
Morishita, Takeshi 163 Sugihara, Kenichi 363
Mukai, Naoto 589 Sugiura, Misako 183
Muneyasu, Mitsuji 173 Sumiya, Kazutoshi 551
Murase, Hiroshi 153 Sunar, Ayse Saliha 289
Murase, Takahiro 363 Suzuki, Rieko 579
Murata, Masaki 459 Suzuki, Yasuhiro 307, 579

Takahashi, Naohisa 11, 233


Nagai, Hirotomo 411, 479
Takahashi, Tomokazu 153
Nakagawa, Toshio 561
Takami, Seiichiro 183
Nakahira, Katsuko T. 107, 609
Takeuchi, Kazuhiro 531
Nakajima, Kaori 183
Takeuchi, Masayoshi 23
Nakamura, Syouji 561
Takeuchi, Takako 183
Nakamura, Toshio 375
Tamaki, Kimitoshi 173
Nakayama, Keiko 561
Tamura, Yasuto 1
Narita, Ryoichi 385
Tang, Zhixian 223
Niimura, Masaaki 163, 213, 499
Toda, Yuichiro 439
Nishide, Kyohei 255
Toyoda, Kaoru 245
Nishigaki, Yuji 419
Toyoda, Kyohei 375
Nishino, Kazunori 489, 599
Tsuji, Hiroshi 143
Nishino, Tatsuya 201
Tsumori, Shin’ichi 489
Nitta, Tsuneo 73, 511
Noda, Masafumi 153 Umeda, Kyoko 521
Nomiya, Hiroki 329 Urata, Mayu 279
Notsu, Akira 43
Nozaki, Hironari 521 Wang, Jiani 153
Wang, Yuanyuan 551
Ogasawara, Hiroki 117 Watabe, Takayuki 269
Ogawa, Nobuyuki 129, 411, 479 Watanabe, Junji 579
Ohira, Yuki 329 Watanabe, Toyohide 23, 33, 289, 429
Ohnishi, Yoshimasa 599
Ohno, Tomohiro 459 Xu, Rongwei 53

Yamada, Kunihiro 375 Yasuda, Takami 279


Yamaguchi, Shin’nosuke 599 Yoden, Naoto 589
Yamamoto, Daisuke 11, 233 Yuasa, Takamichi 375
Yamanishi, Ryosuke 245 Yukawa, Takashi 419
Yamazaki, Makoto 419
Yamazoe, Fumihiro 469 Zhou, Xinxin 363
Yaroslavtzeva, Elena 341 Zhu, Zhonghua 53
