Smart Innovation, Systems and Technologies 14
Editors-in-Chief

Prof. Robert J. Howlett
KES International
PO Box 2115
Shoreham-by-Sea
BN43 9AF
UK
E-mail: rjhowlett@kesinternational.org

Prof. Lakhmi C. Jain
School of Electrical and Information Engineering
University of South Australia
Adelaide
South Australia SA 5095
Australia
E-mail: Lakhmi.jain@unisa.edu.au
Intelligent Interactive
Multimedia: Systems
and Services
Proceedings of the 5th International
Conference on Intelligent Interactive
Multimedia Systems and Services
(IIMSS 2012)
Editors

Professor Toyohide Watanabe
Nagoya University
Japan

Professor Junzo Watada
Waseda University
Kitakyushu
Japan

Professor Naohisa Takahashi
Nagoya Institute of Technology
Japan

Professor Robert J. Howlett
KES International
Shoreham-by-Sea
United Kingdom

Professor Lakhmi C. Jain
University of South Australia
Adelaide
Australia
Keywords: Human-friendly robot, Service robot, Self-Organizing Map, Human interaction.
1 Introduction
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 1–10.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
2 H. Masuta, Y. Tamura, and H. Lim
On the other hand, a human can understand the true request, that is, what a person hopes for, without detailed orders given by voice and gesture. Furthermore, a human can resolve ambiguous terms and referring expressions. Thus, humans complement the missing information in a simple request. To do so, a human attaches importance to the environmental situation, regarded as a spatiotemporal context, and to past experience. The target of this research is therefore a robot that can understand the true task behind a simple request by learning from past experience and the environmental situation. If a robot can understand the true request, that is, the human's intention, it will be able to provide service and assistance in response to simple requests such as a referring expression or a simple gesture.
The specific task in this research is a service robot that clears a table in a restaurant. Our previous research developed a service-robot system for clearing a table through human interaction [2]. In this system, a person gives a clearance command by voice and pointing gesture to an interaction robot. A service robot then infers a dish-clearing procedure from the human's orders and the properties of the dishes, such as size, position, and default storage place. However, it is difficult to estimate an appropriate clearing plan, because the selection of objects to clear, the meaning of the human's order, and the storage place change dynamically with the environmental situation. For example, the voice command "Clean up" is generally understood to mean that all dishes should be moved from the table to the sink. But if a dish was not used, it should be returned to the kitchen cabinet, and if the dishes are paper, the command should be understood as "Throw away." The cause of this problem is that the robot can neither understand the environmental situation as a spatiotemporal context nor take past experience gained through human interaction into account. Therefore, we propose a learning method based on perceiving the environmental situation and considering past experience.
This paper is organized as follows: Section 2 explains our intelligent human-friendly robot system for clearing, Section 3 explains the learning method for estimating human intention, and Section 4 describes an experiment on clearing a table. Finally, Section 5 concludes this paper.
Fig. 1 The robot system: a service robot and a mobile robot in an intelligent space equipped with a stereo camera, a 3D range camera, an RFID tag and reader/writer, an intelligent space server, and a control panel; storage places: 1. Sink, 2. Kitchen Cabinet, 3. Refrigerator
SOM is a clustering method based on neural networks: it consists of an input layer and an output layer, and learning is unsupervised [12]. The feature of SOM is that it builds a map of arbitrary dimension that reflects the correlations in the input data; the m-dimensional neurons in the output layer are learned from n-dimensional input data. Generally, the output layer is given two or three dimensions for visual understandability. We apply a general two-dimensional SOM, because the task of clearing a table is a comparatively restricted situation.
Output nodes are connected in an array of i rows and j columns. All nodes in the input layer are connected to all nodes in the output layer. Each output node has an n-dimensional weight vector W. The winner node is the node whose weight is closest to the input X in Euclidean distance |X − Wi,j|; it is called the Best Matching Unit (BMU). The weight vectors are updated around the BMU as follows:
Wi,j(t + 1) = Wi,j(t) + hc · (X − Wi,j(t))   (1)

hc = a · exp(−r² / σ²)   (2)
where r is the distance between the BMU and node (i, j), and a and σ are constant parameters defining the neighborhood. As a result, similar nodes gather near each other.
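Eqs. (1)-(2) can be sketched as a short training loop. This is an illustrative implementation, not the authors' code; the grid size, a, σ, and the epoch count are assumed values:

```python
import math
import random

def train_som(data, rows=8, cols=8, a=0.5, sigma=2.0, epochs=50):
    """Minimal 2-D SOM following Eqs. (1)-(2): find the BMU by Euclidean
    distance, then pull every node toward the input with the Gaussian
    neighborhood hc = a * exp(-r^2 / sigma^2)."""
    dim = len(data[0])
    rng = random.Random(0)
    # one n-dimensional weight vector W[i][j] per output node
    W = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
         for _ in range(rows)]

    for _ in range(epochs):
        for x in data:
            # Best Matching Unit: output node whose weight is closest to x
            bi, bj = min(((i, j) for i in range(rows) for j in range(cols)),
                         key=lambda ij: math.dist(x, W[ij[0]][ij[1]]))
            for i in range(rows):
                for j in range(cols):
                    r2 = (i - bi) ** 2 + (j - bj) ** 2      # squared grid distance
                    hc = a * math.exp(-r2 / sigma ** 2)     # Eq. (2)
                    W[i][j] = [w + hc * (xk - w)            # Eq. (1)
                               for w, xk in zip(W[i][j], x)]
    return W
```

Because hc < 1 for every node, the BMU moves a fraction a of the way toward each input, and more distant grid nodes move progressively less, which is what makes similar inputs settle in neighboring regions of the map.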
In this research, the dish parameters used as the input vector are the weight ratio of a dish (WR), the weight-ratio change (WRV), the displacement of a dish (OT), the distance between the human and a dish (DP), and the transition rate between place IDs (PIV). WR is the normalized weight of a dish including its food, from 0.0 (empty) to 1.0 (full). WRV is 1 when WR has changed and 0 otherwise, which indicates that a meal is in progress. OT is the accumulated travel distance of a dish, normalized by its maximum; it reflects how frequently the dish is used while eating. DP is the distance between the human and a dish, which indicates its priority for the person. PIV is the frequency of movement between storage places over a month. These parameters were selected because they can all be measured by the real robot system shown in Fig. 1. The parameters are updated at 5-second intervals in the simulator, and the SOM is trained on these five inputs.
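Assembling the five-dimensional input vector might look as follows. This is a sketch: the normalization constants (full_weight, max_travel, max_dist, max_moves) are assumptions, since the text only states that each feature is normalized to [0, 1]:

```python
def dish_input_vector(weight, full_weight, prev_weight,
                      travel, max_travel, dist_to_human, max_dist,
                      moves_per_month, max_moves):
    """Sketch of the five-dimensional SOM input (WR, WRV, OT, DP, PIV).
    All maximum constants are assumed scale factors for normalization."""
    wr = weight / full_weight                      # WR: 0.0 empty .. 1.0 full
    wrv = 1.0 if weight != prev_weight else 0.0    # WRV: did the weight change?
    ot = min(travel / max_travel, 1.0)             # OT: accumulated travel distance
    dp = min(dist_to_human / max_dist, 1.0)        # DP: distance to the person
    piv = min(moves_per_month / max_moves, 1.0)    # PIV: storage-place transitions
    return [wr, wrv, ot, dp, piv]
```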
4 The Experiment
The experimental task is to clear six dishes after a human order. Each dish has different properties. The initial state of the simulation is shown in Fig. 5, and the initial parameters of the dishes are given in Table 1. The parameters of dishes 2, 3, 4, and 5 change on the assumption that someone is eating from them. Dish 1 does not change, meaning the meal is untouched and the dish stays full. Dish 6 is empty and being moved, meaning it should already have been cleared. This experiment considers only a single meal, so PIV is fixed.
Fig. 6 shows the SOM map at 10, 150, and 300 seconds of the simulation. The map shows the distance profile relative to dish 6. The dish parameters at 300 seconds are given in Table 2. Dish 1 is far away from dish 6, because its assumed state is the opposite. On the other hand, dish 5 becomes empty, so its parameters grow similar to those of dish 6 and the two are close on the SOM map at 300 s. It is therefore possible to classify the dishes by their clearance attributes. Fig. 7 shows the time series of the parameters of dish 2; its WR decreases while eating, since this dish is assumed to be a bowl of soup. Its place ID changes between 30 s and 100 s. Fig. 8 shows snapshots of the experimental simulation.
However, the kitchen cabinet is not the recommended place for dish 2 after eating, so the person orders at 210 s that the dish should be moved to the refrigerator. A new reference 3′ (the boxed number in the right panel of Fig. 6) is created, as shown in Fig. 6, and the storage place changes to the refrigerator at 230 s. The service robot can then take dish 2 to the refrigerator, as in Fig. 8. Thus the robot's decision making adapts to a person's preference through interaction with that person. Moreover, the person's preference is estimated by perceiving the environmental situation.
5 Conclusions
A service robot should be able to decide on an action from a simple human order. To realize this decision making, it is important to perceive the environmental situation and to adapt to a person's preferences. We have proposed a learning method based on SOM that adapts to the environmental situation and to human preferences.
Through simulation experiments, we verified that the proposed method can track attribute changes over time, that the robot's decision making adapts to a person's preference through interaction, and that the person's preference can be estimated by perceiving the environmental situation.
As future work, we plan experiments on the real robot system. Moreover, we will investigate which environmental situations are required for flexible interaction with humans.
References
[1] Mitsunaga, N., Miyashita, Z., Shinozawa, K., Miyashita, T., Ishiguro, H., Hagita, N.:
What makes people accept a robot in a social environment. In: International
Conference on Intelligent Robots and Systems, pp. 3336–3343 (2008)
[2] Masuta, H., Kubota, N.: An Integrated Perceptual System of Different Perceptual
Elements for an Intelligent Robot. Journal of Advanced Computational Intelligence
and Intelligent Informatics 14(7), 770–775 (2010)
[3] Sato, E., Yamaguchi, T., Harashima, F.: Natural Interface Using Pointing Behavior
for Human-Robot Gestural Interaction. IEEE Transactions on Industrial
Electronics 54(2), 1105–1112 (2007)
[4] Chong, N.Y., Hongu, H., Ohba, K., Hirai, S., Tanie, K.: Knowledge Distributed
Robot Control Framework. In: Proc. Int. Conf. on Control, Automation, and Systems,
pp. 22–25 (2003)
[5] Nishiyama, M.: Robot Vision Technology for Target Recognition. Toshiba
Review 64(1), 40–43 (2009)
[6] Oggier, T., Lehmann, M., Kaufmann, R., Schweizer, M., Richter, M., Metzler, P., Lang, G., Lustenberger, F., Blanc, N.: An all-solid-state optical range camera for 3D real-time imaging with sub-centimeter depth-resolution (Swiss Ranger). In: Proceedings of SPIE, vol. 5249, pp. 534–545 (2003)
[7] OpenRTM-aist, http://www.openrtm.org/
[8] Ando, N., Kurihara, S., Biggs, G., Sakamoto, T., Nakamoto, H.: Software
Deployment Infrastructure for Component Based RT-Systems. Journal of Robotics
and Mechatronics 23(3), 350–359 (2011)
[9] Matsusaka, Y.: Open Source Software for Human Robot Interaction. In: Proceedings
of IROS 2010 Workshop on Towards a Robotics Software Platform (2010)
[10] Gibson, J.J.: The ecological approach to visual perception. Lawrence Erlbaum
Associates, Hillsdale (1979)
[11] Lee, D.N.: Guiding movement by coupling taus. Ecological Psychology 10(3-4), 221–
250 (1998)
[12] Kohonen, T.: Self-Organizing Maps. Springer (2000)
A Fusion of Multiple Focuses on a
Focus+Glue+Context Map
Hiroya Mizutani
College of Science and Technology, Nagoya Institute of Technology, Gokiso, Showa,
Nagoya, Aichi 466–8555, Japan
e-mail: mizutani@moss.elcom.nitech.ac.jp
Daisuke Yamamoto
College of Science and Technology, Nagoya Institute of Technology, Gokiso, Showa,
Nagoya, Aichi 466–8555, Japan
e-mail: yamamoto.daisuke@nitech.ac.jp
Naohisa Takahashi
College of Science and Technology, Nagoya Institute of Technology, Gokiso, Showa,
Nagoya, Aichi 466–8555, Japan
e-mail: naohisa@nitech.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 11–21.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
1 Introduction
On web digital maps such as Google Maps [1] and Yahoo Maps [2], users have to repeatedly change the scale and scroll to view multiple destinations. For instance, when a user needs to know the route to a final destination via several intermediate ones, he/she needs detailed maps to identify landmarks and intersections near the destinations, while at the same time needing wide-area maps to understand the relations between the destinations and the entire areas around them.
Focus+Context fisheye map methods [5, 6, 7, 8, 9] have been proposed to solve this problem. Although these methods enable users to view both expanded areas (Focus) and peripheral areas (Context) in one map, like a fisheye lens, the whole map is distorted and/or the density of roads in the corner areas of the map becomes large [3]. In addition, these methods have to regenerate the whole map dynamically [3]. They are therefore unsuitable for web map services, which require high-speed generation.
We proposed the Focus+Glue+Context map system EMMA (Elastic Mobile Map) [3, 4, 10] to address these issues. Based on the image of the cognitive map [11], EMMA is composed of Focus areas, Context areas, and areas that absorb the distortion between them (Glue), as shown in Fig 1. Concentrating the distortion inside the Glue areas enables EMMA to display the Focus and Context areas without any distortion. Since only the Glue areas have to be generated dynamically, the calculation cost is low, making EMMA suitable for web map services.
When working with EMMA, users sometimes create multiple Focuses and change their positions and scales separately to look at multiple destinations. The existing EMMA implementation has the drawback that these Focuses sometimes overlap and hide each other during user operation. We propose a method in which nearby Focuses unite naturally, like water drops merging under surface tension.
Problem 1. The Overlap Repulsion method loses the map connections between the
Focus and Context areas because it removes the Glue area, as shown in Fig 3.
Fig. 2 The repelled Focus, displaced by the Focus approaching from the right
Fig. 3 Glue disappearance and magnification of the black shaded area, like a magnifying glass
Fig. 4 Focus Transformation Function: the Focuses contact like water drops by transforming their shapes
Fig. 5 Focus Union Function: the Focuses unite if they approach closer
Fig. 6 Union Focus Transformation Function: the Union Focus transforms along the moving mouse pointer
Fig. 7 Focus Division Function: dragging the mouse to a point outside the Union Focus makes the Union Focus divide
of the Focuses maintain their original shapes. This enables Focuses to transform in a manner similar to water drops deforming naturally under surface tension. Transformed Focuses return to their original shapes when they move to non-overlapping locations.
4 Proposed Methods
4.1 Definitions
The proposed system uses the following data. Focuses are convex N-sided polygons. Coordinates are expressed in XY form.
Focus Definition Data. Users designate the positions and shapes of Focuses. Focus data is given in Table 1 and Fig 8. Focus definition data has the following constraints: 1) LOF is completely enclosed by LF, LF is completely enclosed by LG, and PF is completely enclosed by LOF (LOF ⊂ LF; LF ⊂ LG; PF ⊂ LOF). 2) LOF and LF are geometrically similar (LOF ∼ LF).
LOF describes the map area in the Context area before it is expanded and displayed in the Focus area. When the scale and coordinates of LOF are changed about a center point (PF), its area fits the Focus area (LF); PF can therefore be found from LOF and LF. The size of LOF varies with the scale of the Focus area.
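Since LOF and LF are similar about the fixed center PF, each vertex satisfies vF = PF + s·(vOF − PF), where s is the Focus magnification. The following sketch (an illustration assuming corresponding vertex order, not code from the paper) recovers s and PF from the two outlines:

```python
import math

def focus_center(lof, lf):
    """Recover the magnification s and the fixed center point PF of the
    scaling that maps the on-Context outline LOF onto the Focus outline LF.
    Assumes the two polygons are geometrically similar (LOF ~ LF) with
    vertices listed in corresponding order."""
    # magnification from the ratio of the first edge lengths
    s = math.dist(lf[0], lf[1]) / math.dist(lof[0], lof[1])
    # every vertex satisfies  vF = PF + s * (vOF - PF)
    # =>  PF = (vF - s * vOF) / (1 - s)   (s != 1 for a real Focus)
    (ux, uy), (vx, vy) = lof[0], lf[0]
    pf = ((vx - s * ux) / (1 - s), (vy - s * uy) / (1 - s))
    return pf, s
```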
When vertices of one Focus enter another Focus's area, they are moved towards the center of their own Focus and away from the other Focus. When transforming Focuses move apart, each vertex tries to return to its original position. The algorithm for the Focus Transformation Function is described below.
The M Focuses are labeled 0 to M − 1. The original Focus coordinates (LNG) and the transforming Focus coordinates are stored. When the Focus Transformation Function is invoked, the following steps are performed:
step.1 i = 0 and j = 1 are set as Focus counters.
step.2 Whether Focus i overlaps Focus j is judged by two conditions:
1) for n = 0 to N − 1, whether Pn(LNG) of Focus i is in LNG of Focus j, and
2) for n = 0 to N − 1, whether Pn(LNG) of Focus j is in LNG of Focus i.
If either condition is true, the Focuses are overlapping and step.3 is executed. If both conditions are false, the Focuses are not overlapping and step.6 is executed.
Fig 9 shows two overlapping Focuses. P1(LNG) of the left Focus is in the area of the right Focus, and P8(LNG) of the right Focus is in the area of the left Focus. When n = 1, the condition on Pn(LNG) of the left Focus holds, so the Focuses are judged to be overlapping.
step.3 The angle between PF (the center point) and each vertex of each Focus is calculated. Each vertex of LFjG of Focus i and of LFiG of Focus j is moved away from its own PF along the segment joining the two PFs in small increments. The movement is repeated until each vertex satisfies one of the following conditions: 1) it returns to its original position Pn(LNG); 2) when the Focus's number is i, Focus i enters the area of LFiG of Focus j, and when the Focus's number is j, Focus j enters the area of LFjG of Focus i.
step.4 After step.3, the two Focuses still overlap slightly. To move them apart, the vertices lying in the other Focus's area are moved towards the center point incrementally along the line segment R calculated in step.3. The movement is repeated on Focus i and Focus j until all vertices have exited the other Focus's area.
step.5 The vertices changed in the steps above are stored together with the number of the other Focus.
step.6 j is incremented by 1, and if j < M, step.2 is executed.
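The overlap judgment of step.2 reduces to point-in-polygon tests on convex outlines. A minimal sketch (assuming counter-clockwise vertex order; not the authors' implementation):

```python
def point_in_convex_polygon(pt, poly):
    """Test whether pt lies inside a convex polygon given as CCW vertices.
    For a point inside, the cross product is non-negative for every edge.
    (A vertex-in-polygon test misses the rare edge-crossing-only case,
    as the step.2 conditions above also do.)"""
    x, y = pt
    n = len(poly)
    for k in range(n):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % n]
        if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < 0:
            return False
    return True

def focuses_overlap(f_i, f_j):
    """step.2: two Focuses overlap if any vertex of one lies inside the other."""
    return (any(point_in_convex_polygon(p, f_j) for p in f_i) or
            any(point_in_convex_polygon(p, f_i) for p in f_j))
```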
Fig. 9 Focus overlapping judgment before transforming
Fig. 10 Focus transforming owing to movement of each vertex
Fig. 11 Focuses adjoining
Fig. 12 LOF of the two Focuses in Fig 11 and LOF of the Union Focus
When two Focuses that overlap only shallowly unite, the LOF of the Union Focus cannot include much of the LOF of the original Focuses, and the map area that was displayed in each original Focus cannot be displayed in the Union Focus area. The black shadow in Fig 12 shows the LOF of the Union Focus made when the Focuses in Fig 11 unite, and the red circles show the LOF of the two Focuses in Fig 11. Because the black shadow does not include much of the circled area, Focuses should unite only when they overlap deeply. The Focus Union algorithm is described below.
Fig 13 shows, with a thick black outline, the shape of the Union Focus calculated in step.3. In Fig 13, the maximally separated vertex pair is circled. The LNG of the Union Focus is composed of the circled vertices and the three vertices on either side of them.
shape of the Union Focus is fixed. The Union Focus Transformation algorithm is shown below.
step.1 The LNG of the two Focuses stored in step.4 of Section 4.3 are retrieved.
step.2 The LNG that the user is dragging moves with the mouse pointer. The moved LNG is stored for the Focus Division Function.
step.3 The LG of the Union Focus is recalculated from the two LNG polygons, as in step.3 of Section 4.3.
step.1 When a user keeps dragging a Focus after unification, the distance between the two PF is calculated. One of them is the PF of the inactive Focus; the other is the PF preserved in step.2 of Section 4.4.
step.2 When the distance exceeds a certain value, the Union Focus is deleted and two Focuses with the original LNG areas are created. Although the newly created Focuses overlap, the Focus Transformation Function is immediately applied and the Focuses are drawn in a transformed, non-overlapping manner.
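The division test in steps 1-2 is essentially one distance comparison. A sketch, with an assumed threshold value (the paper only says "a certain value"):

```python
import math

DIVISION_THRESHOLD = 120.0  # assumed pixel distance for the "certain value" above

def should_divide(pf_dragged, pf_inactive, threshold=DIVISION_THRESHOLD):
    """Focus Division check: the Union Focus splits back into the two stored
    Focuses once the dragged PF moves farther than the threshold from the
    inactive Focus's PF."""
    return math.dist(pf_dragged, pf_inactive) > threshold
```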
5 Experimental Results
We conducted the following experiment to examine the effectiveness of the proposed system. Its purpose is to verify that the proposed system enables users to recognize roads and check geographical points more easily than the previous system.
step.2 The subjects draw a simple map, including the route and the buildings, from what they remember of step.1.
step.3 The subjects answer a questionnaire about usability.
The subjects check the spatial relationship between the check point and the nearby designated buildings while examining the detailed information of the immediate area through the Focus. The designated buildings are near the check points, and the subjects need to view them through the Focus.
The subjects were asked to rate the following items on a scale of 1 to 5, where 5: very good, 4: good, 3: satisfactory, 2: poor, and 1: very poor:
1) recognition of road connections, 2) how natural it felt, 3) processing speed, 4) interest, 5) comfort, 6) ease of use, 7) ease of following Focus areas visually.
the proposed system could keep showing the focused map information in the Focus areas. Therefore, the proposed system solves Problem 2 in Section 2.
6 Conclusion
This paper proposed transformation, union, and division methods for Focuses in the Focus+Glue+Context map EMMA. Moreover, experimental results suggest that the proposed system is more useful than the previous EMMA system.
We have three issues for future research. First, when there is a large scale difference between two uniting Focuses, we have to explore the optimal size and scale of the Union Focus. Second, we should compare our methods with those used in navigation systems, such as car navigation systems. Third, we could propose a device and interface that is user-friendly for pedestrians.
References
1. Google Maps, http://maps.google.com
2. Yahoo! Maps, http://map.yahoo.com
3. Takahashi, N.: An Elastic Map System with Cognitive Map-based Operations. In: International Perspectives on Maps and the Internet, pp. 73–87. Springer (2008)
4. Yamamoto, D., Ozeki, S., Takahashi, N.: Focus+Glue+Context: An Improved Fisheye
Approach for Web Map Services. In: Proceedings of the ACM SIGSPATIAL GIS 2009,
Seattle, Washington, pp. 101–110 (November 2009)
5. Furnas, G.W.: Generalized fisheye views. In: Proc. of the SIGCHI 1986, pp. 16–23
(1986)
6. Harrie, L., Sarjakoski, L.T., Lehto, L.: A variable-scale map for small-display cartography. In: Proc. of the Symp. on GeoSpatial Theory, Processing, and Applications, pp. 8–12 (2002)
7. Sarkar, M., Brown, M.H.: Graphical fisheye views of graphs. In: Proc. of the SIGCHI
1992, pp. 83–91 (1992)
8. Sarkar, M., Snibbe, S.S., Tversky, O.J., Reiss, S.P.: Stretching the rubber sheet: a
metaphor for viewing large layouts on small screens. In: Proc. of the 6th Annual ACM
Symp. on User Interface Software and Technology, pp. 81–91 (1993)
9. Gutwin, C., Fedak, C.: A comparison of fisheye lenses for interactive layout tasks. In:
Proc. of the Graphics Interface 2004, pp. 213–220 (2004)
10. Yamamoto, D., Ozeki, S., Takahashi, N.: Wired Fisheye Lens: A Motion-Based Improved Fisheye Interface for Mobile Web Map Services. In: Carswell, J.D., Fotheringham, A.S., McArdle, G. (eds.) W2GIS 2009. LNCS, vol. 5886, pp. 153–170. Springer, Heidelberg (2009)
11. Gould, P., White, R.: Mental Maps. Penguin Books Ltd, Harmondsworth (1997)
A Map Matching Algorithm for Sharing Map
Information among Refugees in Disaster Areas
Abstract. In this paper, we propose a map matching algorithm for a map information sharing system used by refugees in disaster areas. The system stores the roads a refugee has passed along as map information; when another refugee comes close, the two exchange their map information in an ad-hoc network manner. Our map matching algorithm is based on a geometric curve-to-curve matching approach and derives, from GPS position data, the road segments on which a refugee moves. In order to decrease matching errors, our algorithm adopts point-to-curve matching for initial matching and incremental matching to suppress error propagation. Experimental results show that our proposed algorithm achieves the best result in comparison with the conventional point-to-curve and curve-to-curve map matching methods.
1 Introduction
Mobile ad-hoc network (MANET) technologies have attracted great attention recently, especially for communication systems in disaster situations [1, 2]. This is because communication infrastructures may be broken or may malfunction, and thus
Koichi Asakura
Department of Information Systems, School of Informatics, Daido University,
10-3 Takiharu-cho, Minami-ku, Nagoya 457-8530, Japan
e-mail: asakura@daido-it.ac.jp
Masayoshi Takeuchi
Department of Information Systems, School of Informatics, Daido University,
10-3 Takiharu-cho, Minami-ku, Nagoya 457-8530, Japan
Toyohide Watanabe
Department of Systems and Social Informatics, Graduate School of Information Science,
Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
e-mail: watanabe@is.nagoya-u.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 23–31.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
normal mobile phones or wired and wireless LAN networks cannot be used in such situations. Since a MANET does not require any communication infrastructure [3, 4, 5], it is suitable for communication systems in disaster areas.
In order for refugees in disaster areas to share up-to-the-minute map information, we have already proposed a map information sharing system based on MANET technologies [6]. In this system, refugees record their position histories as map information and share them with neighboring refugees in an ad-hoc network manner. By sharing map information, refugees can obtain correct information about safe roads in disaster areas in a timely fashion, without any central servers. For this system, correct position information of refugees is essential. Although position information can generally be acquired by GPS, it contains errors, and such errors influence refugees' decisions about which roads to take.
In this paper, we propose a map matching algorithm for this map information sharing system. A map matching algorithm matches a sequence of position fixes against the road network in a map. Position information of refugees is thereby normalized to the road network, which makes it easy to share refugees' position histories without errors.
The rest of this paper is organized as follows. Section 2 briefly describes our map information sharing system and related work on map matching algorithms. Section 3 describes our proposed map matching algorithm, and Section 4 explains our experiments. Finally, Section 5 concludes this paper and gives our future work.
information with each other. This exchange is performed in an ad-hoc network manner, without any network infrastructure. Through it, the refugees' map information is merged, and information on roads that can be passed safely in the disaster situation is collected without any communication infrastructure. Figure 1 shows the features of our proposed map information sharing system. With this system, map information can be collected and shared in real time, which enables refugees to move to shelters safely.
Figure 2(a) shows the history of positions for a refugee captured by GPS. By using map matching methods, the road segments on which the refugee moves can be extracted, as in Figure 2(b), even though not all points lie exactly on the road segments.
Until now, many types of map matching algorithms have been developed, mainly for vehicle navigation systems. They are classified roughly into geometric map matching [9, 10, 11], probabilistic map matching [12], statistical map matching [13], and so on. In this paper, we focus on geometric map matching algorithms, because the other types require substantial computational resources, namely a large amount of pre-calculated data (memory) or computation power (CPU). Since our system runs on mobile terminals in disaster situations, it is important that the map matching algorithm use a small amount of memory and little computation.
In geometric map matching, the relation between position data and the road network is analyzed geometrically, and the point on a road segment corresponding to each position fix is extracted. There are several approaches: point-to-point, point-to-curve, and curve-to-curve matching [9, 11]. In point-to-point matching, a position fix is matched to the nearest point in the map database. This approach is very simple and requires few computing resources, but it tends to generate many matching errors. In point-to-curve matching, a position fix is matched to the nearest point on a road segment. Since the number of candidate matching points becomes large, this approach generates fewer matching errors than the point-to-point approach. In curve-to-curve matching, a sequence of position points is matched to road segments directly. This approach generates fewer errors than the former two, because matching between two line sequences produces fewer topological mismatches. Figure 3 shows the matching results of the three approaches. In this figure, position fixes are denoted as crosses, points in the map database as white circles, and a road segment as a line between two circles. Figures 3(a), (b), and (c) show the results of the point-to-point, point-to-curve, and curve-to-curve approaches, respectively. Since the curve-to-curve approach provides the most accurate results, our algorithm is based on it.
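Point-to-curve matching, used later for initial matching, snaps each GPS fix to its orthogonal projection on the nearest road segment. A sketch (an illustration, not the paper's implementation):

```python
import math

def project_point_to_segment(p, a, b):
    """Orthogonal projection of point p onto segment a-b, clamped to the
    endpoints; returns (projected point, distance to p)."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0,
            ((px - ax) * dx + (py - ay) * dy) / seg2))
    q = (ax + t * dx, ay + t * dy)
    return q, math.dist(p, q)

def point_to_curve_match(p, segments):
    """Snap one position fix to the closest point over all road segments
    (each segment is a pair of map-node coordinates)."""
    return min((project_point_to_segment(p, a, b) for a, b in segments),
               key=lambda qd: qd[1])[0]
```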
2.3 Requirements
In this section, we describe the requirements for a map matching algorithm in our map information sharing system for refugees in disaster areas.
Since our system is used by refugees who evacuate to shelters on foot, their moving speed is relatively slow and their moving trajectories are complicated in comparison with vehicles. Thus, the map matching algorithm has to take initial position matching into account: if initial position matching fails, the position of a refugee is matched to the wrong road, which causes incorrect map information to be stored. In addition to initial position matching, an error recovery method is also important, again because of the slow moving speed and complicated trajectories. In vehicle navigation systems, making a turn or passing through an intersection provides hints for error correction in map matching. In our system, although such hints are still useful, other error correction methods have to be provided that take the slow speed and complicated trajectories into account.
In order to deal with the initial position matching problem, our algorithm adopts the point-to-curve matching method: we use point-to-curve matching for initial position matching and then use curve-to-curve matching, whereas many conventional curve-to-curve map matching algorithms use point-to-point matching for initial position matching.
Furthermore, in order to provide quick error recovery, we propose an incremental map matching method in which initial position matching based on the point-to-curve approach is repeated at a certain time interval. This resolves the problem that curve-to-curve matching, once wrong, continues to produce wrong matching results.
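The combination of periodic point-to-curve re-initialization and connectivity-restricted tracking can be sketched as follows. This is an illustration under simplifying assumptions: curve-to-curve matching is reduced to choosing among segments connected to the current one, and `interval` is an assumed parameter:

```python
import math

def _seg_dist(p, seg):
    """Distance from point p to segment seg = (a, b), clamped to endpoints."""
    (ax, ay), (bx, by) = seg
    px, py = p
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0,
            ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def _neighbors(i, segments):
    """Indices of segments sharing an endpoint with segment i (plus i itself)."""
    ends = set(segments[i])
    return [j for j, s in enumerate(segments) if j == i or ends & set(s)]

def incremental_match(points, segments, interval=30):
    """Every `interval` GPS fixes, restart with a plain nearest-segment
    (point-to-curve) match over all segments; in between, restrict the
    candidates to segments connected to the current one, so that one wrong
    match cannot propagate indefinitely."""
    matched, cur = [], None
    for k, p in enumerate(points):
        if cur is None or k % interval == 0:
            cand = range(len(segments))        # initial / periodic re-initialization
        else:
            cand = _neighbors(cur, segments)   # stay on connected roads
        cur = min(cand, key=lambda j: _seg_dist(p, segments[j]))
        matched.append(cur)
    return matched
```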
3 Algorithm
This section describes the map matching algorithm in the map information sharing
system for walking refugees in disaster areas.
4 Experiments
We conducted an experiment to evaluate the proposed map matching algorithm. This section describes the experimental results.
For the experiments, we used GPS log data that we captured ourselves by walking around our university with a GPS logger implemented as a Java applet program. Map matching was then performed by several algorithms. We compare three map matching algorithms: the conventional point-to-curve algorithm, the conventional curve-to-curve algorithm, and our proposed algorithm. The GPS log contains 1573 points, sampled at 1 point/sec.
(a) Comparison with the point-to-curve method; (b) Comparison with the curve-to-curve method
Figure 6(a) shows the results of the conventional point-to-curve matching algorithm
and our proposed algorithm on the same map area. The point-to-curve matching
algorithm generates wrong results in the hatched region, while our proposed
algorithm generates slightly better results. Furthermore, Figure 6(b) shows the
results of the conventional curve-to-curve matching algorithm and our algorithm.
This figure shows that our algorithm generates better results in the area where the
road network is complicated. Error ratios for each map matching algorithm are shown
in Table 1. This table shows that the proposed algorithm produces the best map
matching result.
From these experimental results, we can conclude that our proposed algorithm
generates better map matching results than conventional geometric-based map
matching algorithms.
5 Conclusion
In this paper, we proposed a map matching algorithm for a map information sharing
system. Our algorithm is based on a geometric method, whose computational cost is
lower than that of other advanced map matching algorithms.
A Map Matching Algorithm for Sharing Map Information among Refugees 31
Since our algorithm is used on mobile terminals in disaster areas, power consumption
of batteries is one of the most important factors. However, geometric map matching
approaches generate more matching errors than other advanced approaches. In order
to decrease matching errors, our algorithm has two features: point-to-curve matching
for initial matching and incremental matching for suppressing error propagation.
Experimental results show that our algorithm generates better matching results than
other conventional algorithms. As future work, we have to take topological features
of the road network into account. Topological information would decrease the number
of matching errors, especially at intersections.
A Method for Supporting Presentation Planning
Based on Presentation Strategies
1 Introduction
Presentation is one of the most important activities for transferring and sharing
knowledge among people. Presenters often make speeches by showing presenta-
tion slides prepared with traditional tools such as Apple Keynote [1] and Microsoft
PowerPoint [8]. Although the traditional presentation tools are widely used, some
problems have been pointed out from the viewpoint of understandability for people
looking at slides. One is that the tools do not allow presenters to clarify semantic
relationships among ideas and facts [10]. This makes the construction and the
important points of slides vague. Another problem is that a deck of slides is
managed separately for each presentation, which makes it difficult to reuse and
reconstruct existing slides according to situations such as the background of the
audience and the constraint of time.
Koichi Hanaue
Graduate School of Information Science, Nagoya University,
Furo-cho, Chikusa-ku, Nagoya, 464-8603 Japan
e-mail: hanaue@nagoya-u.jp
Toyohide Watanabe
Graduate School of Information Science, Nagoya University,
Furo-cho, Chikusa-ku, Nagoya, 464-8603 Japan
e-mail: watanabe@is.nagoya-u.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 33–42.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
Until now, a number of methodologies and systems have been proposed for au-
thoring contents by organizing pieces of fragments. Marshall et al. developed a
system called VIKI for authoring spatial hypertexts [7]. VIKI estimates the rela-
tionships among content fragments by interpreting the user’s manipulation and their
layout. Hasida proposed the methodology of semantic authoring based on an ontol-
ogy [5]. Semantic authoring is to compose intelligent contents by specifying seman-
tic relationships among content fragments explicitly. Also, Haller et al. proposed a
system called iMapping for personal knowledge management [3]. This system al-
lows a user to specify relationships among content fragments by linking and nesting
them.
We propose a method for supporting the process of presentation preparation
on the basis of these works. Specifically, we aim to support scenario composition
according to the situation of a presentation. Here, we assume that the structured
fragments are already prepared by a presenter with existing systems such as those
described above. We introduce a concept of a presentation strategy to the process
of scenario composition. A presentation strategy refers to the intention of a presen-
ter on how to unfold his/her story. Our idea is to translate a presentation strategy
into the policy of selecting and ordering fragments in the structured fragments. We
present a mechanism of constructing a presentation story and logical structure of
slides from the structured fragments.
Our contribution is the development of the mechanism of transforming structured
idea fragments into presentation slides. We have developed a presentation system
that converts fragments organized in the form of a tree into a deck of presentation
slides [4]. However, this system does not consider any relationships except for se-
quential/hierarchical relationship. FLY [6] and Prezi [9] are characteristic systems
for presentations. These systems allow a presenter to author a presentation docu-
ment by arranging objects such as texts, images and videos without imposing phys-
ical constraints of a slide frame. From the viewpoint of reusing slides, Bergman et
al. developed a system for composing presentation slides from existing ones [2]. Al-
though these systems are effective, they do not support editing of slides at the level
of topics. We believe that these systems will be improved by introducing semantic
relationships among content fragments and reflecting a presentation strategy in the
design of slides.
The rest of this paper is organized as follows. Section 2 describes the frame-
work of our method. Section 3 explains how to translate a presentation strategy in
our model. Then, Section 4 describes the mechanism of constructing a presentation
story and logical structure of a presentation slide. Section 5 introduces our proto-
type system for scenario composition. Finally, Section 6 concludes this paper and
presents our future work.
2 Approach
2.1 Scenario Composition in Presentation Planning
Presentation planning is to construct a presentation story and to design presenta-
tion slides. The process of presentation preparation is divided into three phases as
illustrated in Figure 1. First, a presenter constructs a story from structured frag-
ments of ideas by picking and ordering necessary fragments. We call each fragment
a knowledge fragment, and represent the structured fragments as a network struc-
ture of knowledge fragments in which semantic relationships are specified between
them. We call this network structure a knowledge fragment network. Next, a pre-
senter designs presentation slides by selecting necessary materials and considering
the layouts of slides. We represent the result of designing slides as a logical
structure: a set of relationships among visual elements such as texts, figures and
tables. The types of relationships defined in the logical structure include
sequential and inclusive relations in addition to semantic relations. Finally, a
presenter makes a deck of slides according to the logical structure. In this phase, the
logical structure is handled as constraints in allocating visual elements.
We call the product of presentation planning a presentation scenario. Namely, a
presentation scenario consists of a presentation story and logical structure of slides.
Presentation scenarios reflect the intentions of presenters on what to speak, how to
speak and what materials to present. It is effective to handle a scenario to support
the process of presentation preparation.
2.2 Framework
Our method aims to support the process of scenario composition based on a
presentation strategy. A presentation strategy is a presenter’s intention on how
to construct a persuasive story. “PREP (Point-Reason-Example-Point)”, “Explain
from examples” and “Use as many illustrations as possible” are examples of
presentation strategies. In our method, a presenter specifies a presentation
strategy before composing a scenario. When a presenter picks a knowledge fragment
that he/she intends to emphasize, the fragments relevant to the picked fragment are
selected and ordered according to the strategy. Then, a logical structure of slides
is constructed according to the importance of each fragment. This makes it easy for
a presenter to construct a story and edit slides at the level of topics.
Figure 2 illustrates our framework for supporting composition of presentation
scenarios. Before composing a scenario, a presenter organizes ideas by construct-
ing a knowledge fragment network in advance. Then, a presenter determines a
specific presentation strategy and an important knowledge fragment. According to
the strategy, we construct a presentation story by assigning importance weights
to fragments (Step 1) and extract the fragments with high importance as neces-
sary elements for a story (Step 2). We use the technique of propagating weights
according to the semantic relationship. Finally, we derive the logical structure of
presentation slides by grouping knowledge fragments in a presentation story (Step
3). If additional fragments are necessary, a presenter specifies another important
fragment. This interaction is repeated until a presenter judges that the story is
complete.
with a dotted line) or hierarchical relation (an arrow with a solid line). Suppose
that the fragment f1 represents the reason of the fragment f2 . If a strategy is “Give
higher priority to points than to reasons”, the relationship between f1 and f2 is
interpreted as a hierarchical relationship in which f1 is subordinate to f2 . Then, the
network is searched from an important fragment according to the order of semantic
relationships specified as a story pattern. The order also reflects a strategy
specific to the situation of a presentation. Also, importance is assigned to fragments during
this step according to the common strategy. The importance is used to determine the
range of searching and to construct the logical structure of slides. Finally, a story is
constructed by arranging fragments in sequence according to the order of visiting
and the categories of relationships.
3 Definitions
3.1 Knowledge Fragment Network
A knowledge fragment is an element that forms a part of a presentation story. A
knowledge fragment f is defined as a double (type, content) where type is the type
of f and content refers to the entity of f .
A knowledge fragment network G is defined as G = (F, L), where F is a set of
knowledge fragments and L is a set of links between knowledge fragments defined as
L = {(r, fsrc, fdst) | fsrc, fdst ∈ F}, where r is a semantic relationship between
knowledge fragments. In our method, twelve types of semantic relationships are
considered: causes, assumes, paraphrases, criticizes, compared-with, exemplifies,
details, specializes, supplements, illustrates, precedes and related-to.
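Under the definitions above, the network G = (F, L) might be represented as follows. The class and method names are our own; the paper does not prescribe an implementation.

```python
from dataclasses import dataclass, field

# The twelve semantic relationship types listed in Section 3.1.
RELATIONS = {"causes", "assumes", "paraphrases", "criticizes", "compared-with",
             "exemplifies", "details", "specializes", "supplements",
             "illustrates", "precedes", "related-to"}

@dataclass(frozen=True)
class Fragment:
    """A knowledge fragment f = (type, content)."""
    type: str
    content: str

@dataclass
class FragmentNetwork:
    """G = (F, L): fragments plus labelled links (r, f_src, f_dst)."""
    fragments: set = field(default_factory=set)
    links: set = field(default_factory=set)

    def add_link(self, r, src, dst):
        assert r in RELATIONS, f"unknown relationship: {r}"
        self.fragments.update({src, dst})
        self.links.add((r, src, dst))

    def neighbors(self, f):
        """Fragments directly linked to f, with the relationship label."""
        return [(r, d) for (r, s, d) in self.links if s == f] + \
               [(r, s) for (r, s, d) in self.links if d == f]
```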
fragments C for a story, the index p indicating the position of a knowledge fragment
in T, and the activation degree a indicating the range of search. CONSTRUCT-STORY
returns a list of knowledge fragments T and a set of knowledge fragments C. This
procedure computes T and C by visiting the fragments on the knowledge fragment
network and adding them to T and C.
In the phase of story construction, CONSTRUCT-STORY is called when a presenter
specifies a knowledge fragment f0 in G and its position p0 in T. Before the
procedure is called, a double (f0, w0) is inserted into T at position p0, where w0
is an initial value of the importance weight. If L is empty, for example,
CONSTRUCT-STORY is called with arguments G, S, T = [(f0, w0)], C = ∅, p = 0 and
a = a0, an initial value of the activation degree.
The procedure CONSTRUCT-STORY is divided into three main steps. First, the
neighboring fragments in G are enumerated according to the priorities of semantic
relationships specified in the strategy S. In this step, if a neighboring fragment
exists in the candidate set C, its importance weight and activation degree are
updated. Second, the importance weight and the activation degree are propagated to
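Since only a verbal description of CONSTRUCT-STORY survives here, the following is a hedged sketch of one possible reading; the representation of `strategy` as priority/decay pairs and the threshold `a_min` are our own assumptions, not the authors' specification.

```python
def construct_story(g, strategy, T, C, p, a, a_min=0.1):
    """Possible reading of CONSTRUCT-STORY.
    g: adjacency {fragment: [(relation, neighbor), ...]}
    strategy: {relation: (priority, decay)}; priority <= 0 means "skip"
    T: story, a list of (fragment, weight); C: candidates {fragment: (weight, activation)}
    p: index of the fragment being expanded; a: its activation degree.
    Returns the number of fragments inserted into T."""
    f, w = T[p]
    inserted = 0
    # Step 1: enumerate neighbours, highest-priority semantic relation first.
    neighbours = sorted(g.get(f, []),
                        key=lambda rn: strategy.get(rn[0], (0, 0.0))[0], reverse=True)
    for r, nf in neighbours:
        priority, decay = strategy.get(r, (0, 0.0))
        if priority <= 0:
            continue
        # Step 2: propagate importance weight and activation degree with decay.
        nw, na = w * decay, a * decay
        if na < a_min:
            continue  # the activation degree bounds the range of search
        if nf in C:
            ow, oa = C[nf]
            C[nf] = (max(ow, nw), max(oa, na))  # already a candidate: just update
            continue
        C[nf] = (nw, na)
        pos = p + 1 + inserted
        T.insert(pos, (nf, nw))
        inserted += 1
        # Step 3: visit the newly added fragment recursively.
        inserted += construct_story(g, strategy, T, C, pos, na, a_min)
    return inserted
```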
5 Prototype System
We have developed a prototype system for composing presentation scenarios. Figure
4 shows a screenshot of the system. A presenter prepares a knowledge fragment
network in the left window and constructs a story in the right window. The fragments
in a story are arranged from the top of the right window. The importance of each
fragment is expressed as its size in the window. First, a presenter specifies a presentation
strategy by determining the type, the direction, the priority and the decay for each
semantic relationship. Next, a presenter picks an important fragment and copies it
from the left window to the right window. Then, the system selects the fragments
relevant to the copied one and orders them according to the specified strategy. The
fragments selected by the system are displayed in the right window and arranged
in the sequence of a story. A presenter completes his/her story by repeating these
manipulations.
6 Conclusion
We proposed a method for supporting presentation planning by transforming struc-
tured fragments into logical structure of presentation slides. In our method, the con-
cept of presentation strategy is introduced to construct a presentation story and a
deck of slides. A presentation strategy is translated into a policy of searching a
knowledge fragment network. We believe that this mechanism allows a presenter to
compose a scenario from a small number of knowledge fragments explicitly speci-
fied by him/her.
Currently, our prototype system requires much input for specifying a presentation
strategy. Therefore, we have to consider a mechanism for specifying it with less
input. Also, we have to confirm that our prototype system enables a presenter to
compose a scenario according to his/her presentation situation.
References
1. Apple Inc.: Keynote, http://www.apple.com/iwork/keynote/
2. Bergman, L., Lu, J., Konuru, R., MacNaught, J., Yeh, D.: Outline Wizard: Presentation
Composition and Search. In: Proceedings of the 15th International Conference on Intel-
ligent User Interfaces, Hong Kong, China, pp. 209–218 (2010)
3. Haller, H., Abecker, A.: iMapping – A Zooming User Interface Approach for Personal
and Semantic Knowledge Management. In: Proceedings of the 21st ACM Conference on
Hypertext and Hypermedia, Toronto, Ontario, Canada, pp. 119–128 (2010)
4. Hanaue, K., Watanabe, T.: Supporting Design and Composition of Presentation Docu-
ment Based on Presentation Scenario. In: Proceedings of the 2nd International Sympo-
sium on Intelligent Decision Technologies, Baltimore, MD, US, SIST, vol. 4, pp. 465–
473 (2010)
5. Hasida, K.: Semantic Authoring and Semantic Computing. In: Sakurai, A., Hasida, K.,
Nitta, K. (eds.) JSAI 2003. LNCS (LNAI), vol. 3609, pp. 137–149. Springer, Heidelberg
(2007)
6. Lichtschlag, L., Karrer, T., Borchers, J.: Fly: A Tool to Author Planar Presentations. In:
Proceedings of the CHI 2009 Conference on Human Factors in Computing Systems,
Boston, MA, USA, pp. 547–556 (2009)
7. Marshall, C.C., Shipman, F.M., Coombs, J.H.: VIKI: Spatial Hypertext Supporting
Emergent Structure. In: Proceedings of the ACM European Conference on Hyperme-
dia Technologies, Edinburgh, Scotland, pp. 13–23 (1994)
8. Microsoft Inc.: PowerPoint, http://office.microsoft.com/powerpoint/
9. Prezi, http://prezi.com/
10. Tufte, E.R.: The Cognitive Style of PowerPoint. Graphics Press (2004)
A Study on Privacy Preserving Collaborative
Filtering with Data Anonymization by
Clustering
1 Introduction
Collaborative filtering achieves personalized recommendation by searching for user
neighborhoods through comparison of user preferences such as purchase history data,
and is a powerful tool for reducing information overload. GroupLens [5, 8] is a
basic model of the memory-based method, in which user similarity is first measured
by Pearson correlation coefficients and the applicability of a new item for an
active user is predicted by the similarity-weighted average of other users’ ratings.
The concept is applied in many practical systems, such as Amazon.com [9], and has
proved to be beneficial for both users and content suppliers. In real world
situations, however, users may not fully enjoy such fruits of IT tools because they
often hesitate to provide their personal information or feel nervous about
information leaks.
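The GroupLens scheme described above — Pearson correlation for user similarity, then a similarity-weighted average of mean-centred neighbour ratings — can be sketched as follows. This is a textbook rendering under simplified assumptions, not the exact system.

```python
import math

def pearson(u, v):
    """Pearson correlation over the items both users rated."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = math.sqrt(sum((u[i] - mu) ** 2 for i in common) *
                    sum((v[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def predict(active, others, item):
    """GroupLens-style prediction: the active user's mean plus the
    similarity-weighted average of the neighbours' mean-centred ratings."""
    base = sum(active.values()) / len(active)
    num = den = 0.0
    for v in others:
        if item not in v:
            continue
        w = pearson(active, v)
        vbar = sum(v.values()) / len(v)
        num += w * (v[item] - vbar)
        den += abs(w)
    return base + num / den if den else base
```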
In order to encourage users to exploit IT tools, we should develop the techniques
for privacy preserving data mining [1], and such techniques as data perturbation or
obfuscation have been applied to privacy preserving collaborative filtering [11, 12].
Katsuhiro Honda · Yui Matsumoto · Arina Kawano · Akira Notsu · Hidetomo Ichihashi
Osaka Prefecture University, 1-1 Gakuen-cho, Nakaku, Sakai, Osaka 599-8531, Japan
e-mail: {honda@,matsumoto@hi.,kawano@hi.,notsu@,ichi@}cs.osakafu-u.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 43–52.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
k-anonymization [13] is a standard method for guaranteeing personal privacy, in
which data records are summarized so that any record is indistinguishable from at
least (k − 1) other records. The task has a close relation to clustering (or cluster
analysis) [2], in which multivariate data observations are grouped into several
clusters so that objects in the same cluster are mutually similar while objects in
different clusters are not. k-anonymization can be achieved by searching for
clusters having k or more objects and packaging the observations into a prototypical
datum in each cluster. In general, k-anonymity is considered only for the
quasi-identifiers, i.e., the attributes that carry information for distinguishing a
particular object within an object set.
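Cluster-based k-anonymization as described — packaging each cluster of at least k records into one prototypical datum — might look like this sketch; the majority-vote prototype anticipates the 0-1 data used later in the paper, and the function names are ours.

```python
from collections import Counter

def anonymize(records, clusters, k):
    """Replace every record in a cluster of size >= k by the cluster's
    prototype (here: attribute-wise majority vote for 0-1 data).
    Clusters smaller than k are rejected, i.e., their records are withheld."""
    out = {}
    for members in clusters:
        if len(members) < k:
            continue  # cannot guarantee k-anonymity for this group
        cols = list(zip(*(records[m] for m in members)))
        proto = [Counter(c).most_common(1)[0][0] for c in cols]
        for m in members:
            out[m] = proto  # all members become indistinguishable
    return out
```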
In this paper, the applicability of several clustering-based k-anonymization
approaches to collaborative filtering tasks is studied through several compara-
tive experiments. The remaining parts of this paper are organized as follows:
Section 2 gives a brief review of GroupLens recommendation algorithm. Section 3
presents several clustering-based k-anonymization approaches. Experimental
results are shown in Section 4 and a concluding summary is given in Section 5.
3 Clustering-Based k-Anonymization
k-anonymization [13] is a standard method for privacy preserving data mining [1],
in which similar objects are summarized into a single observation so that each
observation cannot be used to distinguish a particular individual. k-anonymity is
conceptually achieved when at least k objects have the same observation, and this
requirement can be identified with the process of clustering, whose goal is to
extract clusters composed of similar objects.
In this study, the applicability of several clustering-based k-anonymization
approaches is discussed from the viewpoint of information loss in collaborative
filtering tasks.
In the single-linkage method (also called the nearest-neighbor method), the distance
between clusters is measured by the shortest distance between objects.
Let dab be the similarity degree between objects a and b. The larger the similarity
degree, the nearer the objects. The similarity degree wij between clusters Gi and Gj
is defined as:
wij = max{dab | a ∈ Gi, b ∈ Gj}.    (2)
It has been shown that the single-linkage method has the tendency of constructing a
small number of large (long) clusters in its middle stage and is suitable for extracting
thin and nonlinear-shape clusters.
In the complete-linkage method (also called the farthest-neighbor method), the
similarity degree between clusters is measured by the least similar pair of objects:
wij = min{dab | a ∈ Gi, b ∈ Gj}.    (3)
It has been shown that the complete-linkage method has the tendency of constructing
a large number of small clusters in its middle stage and is suitable for extracting
compact and spherical-shape clusters.
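The two merge criteria can be written down directly from equation (2) and its complete-linkage counterpart, assuming similarity degrees where larger means nearer; the toy similarity function in the test is our own illustration.

```python
def single_linkage_sim(Gi, Gj, d):
    """Nearest-neighbour linkage: the most similar pair decides (eq. (2))."""
    return max(d(a, b) for a in Gi for b in Gj)

def complete_linkage_sim(Gi, Gj, d):
    """Farthest-neighbour linkage: the least similar pair decides."""
    return min(d(a, b) for a in Gi for b in Gj)
```

Agglomerative clustering repeatedly merges the pair of clusters with the largest linkage similarity; only this scoring function differs between the two methods.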
In the experiments shown in the next section, the similarity among objects is
measured by the Jaccard index [2], a major similarity measure for 0-1 type
observations:
dab = A / (A + B + C),    (4)
where A, B and C are the numbers of attributes whose observations are “1-1”, “1-0”
and “0-1” for the object pair a-b. The index is used not only for the hierarchical
clustering methods but also for k-member clustering, because interval values have no
meaning for 0-1 observations, i.e., the interval [0,1] just means that the attribute
is unknown. So, k-member clusters are extracted on a nearest-neighbor principle.
Before applying the GroupLens recommendation algorithm, in which only scalar values
are available, each attribute of the purchase history data was summarized into the
median of each cluster, i.e., “0 or 1” based on the majority rule for 0-1
observations.
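Equation (4) translates directly into code; note that 0-0 agreements are deliberately excluded from the denominator.

```python
def jaccard(x, y):
    """Jaccard index (eq. (4)) for 0-1 vectors: 0-0 agreements are ignored."""
    A = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # both own the item
    B = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # only x owns it
    C = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # only y owns it
    return A / (A + B + C) if A + B + C else 0.0
```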
4 Experimental Results
Several comparative experiments for evaluating the applicability of clustering-based
k-anonymization approaches were performed using purchase history data collected by
Nikkei Inc. in 2000. The data set used in [6] includes the purchase history of 996
users (n = 996) on 18 items (m = 18). The element xij is 1 if user i owns item j and
0 otherwise. In the experiments, six items (Piano, PC, Word processor, VD,
Fig. 1 Comparison of ROC sensitivity of Single-linkage and Complete-linkage ((a) Single-linkage, (b) Complete-linkage; ROC sensitivity vs. anonymity level k, for the Coding, Marging and Rejecting strategies)
Oven, Coffee maker) were selected as the target items because they were owned by
nearly half (30-60%) of the users. 1000 randomly selected elements (xij for the
above 6 items) were used as the test set, and the applicability (0 or 1) was
predicted based on the GroupLens recommendation algorithm. Before applying the
algorithm, the “1” elements of the test set were withheld and replaced with “0”
(not yet bought but may buy in the near future).
When the GroupLens algorithm was applied to the original purchase history data
without anonymization, the ROC sensitivity measure was 0.827. The ROC sensitivity
[14] is a major criterion for assessing recommendation ability. An ROC curve is a
plot of the true positive rate vs. the false positive rate drawn by changing the
threshold of the applicability level in recommendation, and the ROC sensitivity
measure is given by the area under the curve. The larger the criterion, the higher
the recommendation ability.
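The ROC sensitivity (area under the ROC curve) can equivalently be computed as the probability that a randomly chosen positive outscores a randomly chosen negative; the sketch below uses that rank formulation rather than explicitly sweeping thresholds.

```python
def roc_sensitivity(scores, labels):
    """Area under the ROC curve, computed as the probability that a random
    positive outscores a random negative (ties count 1/2).  This equals the
    threshold-sweeping construction described in the text."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        return 0.0
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```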
Fig. 2 Comparison of ROC sensitivity of k-member clustering based on the Single-linkage principle and the Complete-linkage principle
ignoring all isolated users when the number of clusters is not so large and there
are many isolated users.
Comparing Single-linkage vs. Complete-linkage, Complete-linkage outperformed
Single-linkage, i.e., Complete-linkage anonymizes the data with less information
loss. This is because Complete-linkage tends to extract many compact clusters, in
which the inner-cluster errors are minimized, while Single-linkage often extracts a
few large clusters, in which the inner-cluster errors may not be minimized.
(a) Single-linkage-principle; (b) Complete-linkage-principle (ROC sensitivity vs. anonymity level k, for the Coding, Marging and Rejecting strategies)
Fig. 3 Comparison of ROC sensitivity of k-member clustering with fewer clusters
Because the k-member clustering process leaves only a small number (at most k − 1)
of isolated users, the three strategies for handling isolated users derived almost
the same results. Figure 2 compares the ROC sensitivity of single-linkage-based and
complete-linkage-based k-member clustering and implies that both k-member clustering
approaches have quite high quality in anonymizing data, i.e., the recommendation
ability remains high even when k is nearly 10. This is because k-member clustering
extracts as many small clusters of size around k as possible, whereas the
hierarchical clustering approaches used in Section 4.1 extracted only a smaller
number of good clusters, as compared in Table 1.
Next, the k-member clustering models were performed with the same number of clusters
as Complete-linkage, and the ROC sensitivity is given in Fig. 3. The figure implies
that k-member clustering derives results similar to Complete-linkage if it uses a
smaller number of clusters. However, the performance of k-member clustering does not
decrease monotonically as the anonymity level k becomes large. This may be because
k-member clustering is essentially a random search, as shown in Section 3.4, so the
recommendation quality is also influenced by randomness.
k-member clustering often assigns clusters uniformly because of its random search
tendency, while hierarchical clustering can extract unequal and distorted clusters.
Generally speaking, the Complete-linkage principle is more suitable than
Single-linkage for anonymization without information loss. So, k-member clustering
should be implemented based on the Complete-linkage principle or other modified
principles.
5 Conclusions
In this paper, the applicability of several clustering-based k-anonymization
approaches was compared from the viewpoint of collaborative filtering applications.
The experimental results imply that k-member clustering is a promising approach for
data anonymization, although it may be influenced by randomness. In addition, the
original k-member clustering model, which is based on the Single-linkage principle,
can be improved by considering other principles such as Complete-linkage.
A potential future work is to improve the anonymization quality by considering
cluster compactness in conjunction with cluster size, i.e., the number of objects.
Fuzzy clustering [10] is a potential candidate for improving crisp partitions, and
the construction of a fuzzy variant of k-member clustering will be promising.
Acknowledgements. This work was supported in part by the Ministry of Education, Cul-
ture, Sports, Science and Technology, Japan, through a Grant-in-Aid for Scientific Research
(#23500283).
References
1. Aggarwal, C.C., Yu, P.S.: Privacy-Preserving Data Mining: Models and Algorithms.
Springer, New York (2008)
2. Anderberg, M.R.: Cluster Analysis for Applications. Academic Press, London (1973)
3. Breese, J., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for
collaborative filtering. In: Proc. 14th Conference on Uncertainty in Artificial Intelligence,
pp. 43–52 (1998)
4. Byun, J.-W., Kamra, A., Bertino, E., Li, N.: Efficient k-Anonymization Using Clustering
Techniques. In: Kotagiri, R., Radha Krishna, P., Mohania, M., Nantajeewarawat, E. (eds.)
DASFAA 2007. LNCS, vol. 4443, pp. 188–200. Springer, Heidelberg (2007)
5. Herlocker, J.L., Konstan, J.A., Borchers, A., Riedl, J.: An algorithmic framework for
performing collaborative filtering. In: Proc. Conference on Research and Development
in Information Retrieval (1999)
6. Honda, K., Notsu, A., Ichihashi, H.: Collaborative filtering by sequential user-item co-
cluster extraction from rectangular relational data. International Journal of Knowledge
Engineering and Soft Data Paradigms 2(4), 312–327 (2010)
7. Honda, K., Sugiura, N., Ichihashi, H., Araki, S.: Collaborative Filtering Using Principal
Component Analysis and Fuzzy Clustering. In: Zhong, N., Yao, Y., Ohsuga, S., Liu, J.
(eds.) WI 2001. LNCS (LNAI), vol. 2198, pp. 394–402. Springer, Heidelberg (2001)
8. Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R., Riedl, J.:
GroupLens: applying collaborative filtering to Usenet news. Communications of the
ACM 40(3), 77–87 (1997)
9. Linden, G., Smith, B., York, J.: Amazon.com recommendations: item-to-item collabora-
tive filtering. IEEE Internet Computing, 76–80 (January-February 2003)
10. Miyamoto, S., Ichihashi, H., Honda, K.: Algorithms for Fuzzy Clustering: Methods in
c-Means Clustering with Applications. Springer (2008)
11. Parameswaran, R., Blough, D.M.: Privacy preserving collaborative filtering using data
obfuscation. In: Proc. IEEE International Conference on Granular Computing 2007, pp.
380–386 (2007)
12. Polat, H., Du, W.: Privacy-preserving collaborative filtering using randomized perturba-
tion techniques. In: Proc. 3rd IEEE International Conference on Data Mining, pp. 625–
628 (2003)
13. Sweeney, L.: k-anonymity: a model for protecting privacy. International Journal on Un-
certainty, Fuzziness and Knowledge-Based Systems 10(5), 557–570 (2002)
14. Swets, J.A.: Measuring the accuracy of diagnostic systems. Science 240, 1285–1289
(1988)
A Traffic Flow Prediction Approach Based
on Aggregated Information of Spatio-temporal
Data Streams
Abstract. Predicting traffic flow efficiently to encourage drivers to avoid sections
that are going to have traffic jams is a good approach to dealing with traffic
congestion. Conventional prediction methods focus on specific information (e.g.,
speed, density, flux, and so on); however, they consume a lot of time and storage
space. This paper proposes a novel prediction approach that analyzes the aggregated
information of data streams to avoid unnecessary time and storage consumption.
Evaluation shows that, compared with the existing similar approach ES (Exponential
Smoothing), the new approach can adjust its smoothing factor based on historical
values and outperforms ES in prediction results.
1 Introduction
With rapid economic development, traffic congestion has become a common problem in major cities around the world and has a serious impact on people's quality of life. How to build an intelligent transportation system (ITS) has therefore become the focus of intelligent transportation research. Precise prediction of the changing traffic flow state (speed, density, flux, travel time, and other traffic operating conditions) is one of the cores of ITS. All subsystems of ITS need reasonable, real-time, and accurate prediction of the state of road traffic in order to adjust the traffic management control program, publish travel information to travelers, and provide optimal path options to drivers. However, in the course of its application and development, ITS has accumulated massive and complex traffic flow information, with a wide variety of sources, a wide range of forms, and a huge amount of data. Moreover, traffic flow information is obtained in real time, so the amount of information expands rapidly within a relatively short period. The real-time traffic flow information obtained from ITS provides an important data foundation for intelligent
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 53–62.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2012
54 J. Feng, Z. Zhu, and R. Xu
transportation system management and the control of road traffic flow data. However, the storage and processing of huge traffic flows impose new requirements on current traffic flow data analysis and processing techniques. We therefore need aggregate processing techniques to handle the data streams: such techniques do not process each individual data item, but only summarized information (e.g., the number of vehicles in a query region).
To obtain reasonable, real-time, and accurate prediction answers while avoiding high storage and time consumption, this paper proposes a new prediction approach that effectively takes advantage of existing aggregated information on the stream to obtain high-quality prediction answers that can ease traffic pressure.
The rest of the paper is organized as follows. Section 2 formally defines spatio-temporal aggregation and reviews related work. Our solution is proposed in Section 3. In Section 4, we present the related experimental studies and report our findings. Finally, Section 5 concludes the paper with a summary.
2 Preliminaries
2.1 Spatio-temporal Aggregation Definition
Spatio-temporal aggregation calculation [1] computes over the objects in a query region during a query time. Objects' explicit properties and spatial regions change over time, and this change can be discrete or continuous: for example, "calculating the forest cover of each country of the world within ten years". The current main research methods are AMH* [2], the aRB-tree [3], DynSketch [4], and so on. In real-life traffic applications, we need to predict the traffic situation in a region during a future query time and make the appropriate adjustments. In this paper, we focus on the spatio-temporal aggregated information generated by existing aggregation methods and analyze it. According to the characteristics of the aggregated information, we use reasonable forecasting methods to predict the traffic flow and thereby ease traffic congestion.
model's error will increase. It therefore has its own limitations and is not suitable for prediction over road networks. Sun [7] proposed a prediction aggregation method based on the basic window and a prediction equation. This method applies to data streams that have a certain regularity or satisfy a fixed function, but road-network data streams are unpredictable and do not follow such a law. Yu et al. [8] advanced a method (CSPA), based on chaos theory, which takes the answers of continuous aggregation calculations to predict future values. CSPA considers the impact of the data itself on the prediction, uses a local prediction algorithm for chaotic time series to predict future aggregate values from historical data, and finally adjusts the prediction model according to the error between the predicted and actual values.
The four methods above are not efficient for spatio-temporal data streams, and only DynSketch [4] mentions a prediction method, ES (Exponential Smoothing), that is based on spatio-temporal aggregated information. This method uses a smoothing formula to obtain predicted values and is appropriate for data that varies randomly around a horizontal line. Road-network data, however, is highly volatile during peak-flow periods: around the 5:00 pm rush hour, for instance, the number of cars on the road increases significantly. This method therefore has limitations that make it not particularly suitable for online road traffic flow prediction.
3 Algorithm Description
Y_{t+T} = a_t + b_t T + c_t T^2   (3)

Y_{t+T} is the target of prediction (the predicted value at time t + T), t is the time-series index, T is the prediction horizon, and a_t, b_t, c_t are the linear, quadratic, and cubic prediction parameters. Following formula 2, the traditional cubic exponential smoothing formula is shown in 4:
S_t^1 = α X_t + (1 − α) S_{t−1}^1,
S_t^2 = α S_t^1 + (1 − α) S_{t−1}^2,
S_t^3 = α S_t^2 + (1 − α) S_{t−1}^3.   (4)

In formula 4, S_t^1, S_t^2, and S_t^3 are the linear, quadratic, and cubic exponential smoothing values, α is the static smoothing parameter, and X_t is the actual value at time t. The prediction parameters (formula 5) are then given by Brown's standard expressions:

a_t = 3 S_t^1 − 3 S_t^2 + S_t^3,
b_t = (α / (2(1 − α)^2)) [(6 − 5α) S_t^1 − 2(5 − 4α) S_t^2 + (4 − 3α) S_t^3],
c_t = (α^2 / (2(1 − α)^2)) (S_t^1 − 2 S_t^2 + S_t^3).   (5)
In the cubic exponential smoothing model, the smoothing parameter α is static and has difficulty adapting to changes in the time series, which is the same problem as in the ES model. The initial value of the smoothing parameter is also hard to determine [11]. From formulas 4 and 5 we can see that α is a constant throughout the calculation. For an original sequence with ups and downs, even a value of α that suits the earlier part of the sequence will not necessarily suit smoothing and prediction in later periods. For most time series, randomness means there is no single constant value that matches the application at all times; for road-network data streams in particular, the uncertainty is even more pronounced. Thus, there will be a clear prediction error, and even serious distortion, if the traditional exponential smoothing model is used for prediction. We therefore give up a fixed value of α and construct a value α(t) that adjusts itself as time changes. First, we replace α with α(t) in formula 4 and obtain:
S_t^1 = α(t) X_t + (1 − α(t)) S_{t−1}^1
Thus the new adaptive exponential smoothing prediction model (SAES) is constituted by formulas 3, 7, and 8. Since the new model does not need to estimate the initial values x_0 and S_0^1, it can smooth X_t and S_t^1 directly. It thereby deals with the problem that the initial value is difficult to determine and avoids the disadvantage of selecting the initial smoothing value manually.
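The smoothing recursion (formula 4), the prediction parameters (formula 5), and the forecast (formula 3) can be sketched as follows. The static α shown here is what SAES's adaptive α(t) replaces; the adaptive update rule (formulas 7–8) is not reproduced in this excerpt, so this sketch covers only the classical cubic model.

```python
def triple_smoothing_forecast(xs, alpha=0.6, horizon=1):
    """Cubic (triple) exponential smoothing forecast, Brown's method.

    `alpha` is static here; SAES replaces it with a time-varying value.
    """
    s1 = s2 = s3 = xs[0]                        # smooth directly from x0
    for x in xs[1:]:
        s1 = alpha * x + (1 - alpha) * s1       # linear smoothing
        s2 = alpha * s1 + (1 - alpha) * s2      # quadratic smoothing
        s3 = alpha * s2 + (1 - alpha) * s3      # cubic smoothing
    # prediction parameters (formula 5)
    a = 3 * s1 - 3 * s2 + s3
    b = (alpha / (2 * (1 - alpha) ** 2)) * (
        (6 - 5 * alpha) * s1 - 2 * (5 - 4 * alpha) * s2 + (4 - 3 * alpha) * s3)
    c = (alpha ** 2 / (2 * (1 - alpha) ** 2)) * (s1 - 2 * s2 + s3)
    T = horizon
    return a + b * T + c * T ** 2               # Y_{t+T}, formula 3
```

On a constant series the forecast reproduces the constant, which is a quick sanity check of the parameter formulas.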
3.2 Architecture
The ultimate goal of this paper is to establish a prediction model appropriate to the practical query needs of the user, for example, "query the number of vehicles in a road segment within the next ten minutes". This paper does not make a further study of aggregate query techniques for spatio-temporal data streams; it simply uses the aggregate query method of DynSketch [4], which also allows us to interpret the performance differences between the ES model and the SAES model in the experiment section. On the basis of the aggregate results, we establish the prediction system model shown in Fig. 1.
First, traffic flow data are stored in the 'Aggregate Index Architecture' in aggregated form. The user issues a query according to his or her need, the SAES module retrieves the appropriate results from the 'Aggregate Index Architecture' based on the query condition, the SAES algorithm then obtains the prediction results, and the final results are sent to the user.
A 'prediction query' is defined as follows: the prediction query time bucket is t = [T_1, T_n] (T_0 is the current time, and T_1 > T_0), the prediction query region is q = ([X_0, X_1], [Y_0, Y_1]) (coordinates in two-dimensional space), and the function of the prediction query is to predict the approximate number of moving objects within the region q during the time bucket t. The prediction process is shown in Fig. 2.
In Fig. 2, we set the interval between the prediction time T_1 and the current time T_0 to T, the same concept as in formula 3. Because historical data are generally collected discretely in time-series prediction models, this paper also uses discrete historical data for prediction. If the prediction query time bucket is [T_1, T_n], the results at each discrete time T_1, T_2, ..., T_n of [T_1, T_n] are added together to obtain the prediction query answer. We denote the i-th moment of the past by T_{0−i}. There are two situations to discuss:
• When T_1 = T_n, the query covers a single time, and we only make the prediction calculation at that single time on the road network.
• When T_1 < T_n, the query covers many discrete times of [T_1, T_n], and we calculate the sum of the prediction values at each time.
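The two query cases above can be sketched as follows. Both names are illustrative, not the paper's Predict Agg listing: `predict_at(t)` stands for the single-time SAES prediction, and integer time steps are assumed.

```python
def predict_agg(predict_at, t1, tn):
    """Answer a prediction query over the time bucket [t1, tn].

    `predict_at(t)` is a placeholder for the single-time SAES
    prediction; discrete integer time steps are assumed.
    """
    if t1 == tn:                       # single-time query
        return predict_at(t1)
    # many discrete times: sum the per-time predictions
    return sum(predict_at(t) for t in range(t1, tn + 1))
```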
The algorithm, called Predict Agg, is shown as follows.
4 Experiments
The experiment adopted the road network of Ningbo, China, with about 1451 road segments per km². There were about 30760 vehicles, which we divided into four kinds (car, bus, truck, and auto-bike) with different speeds and moving patterns. The vehicles were uniformly distributed on the road network at the start time.
Spatio-temporal prediction aggregate queries over road networks predict results based on the aggregation of past and current times. In this section, we run experiments varying three factors: 1) the historical information length; 2) the smoothing parameter α; 3) the length of the prediction query time (T; this T is not the same concept as the T of formula 3). Finally, we compare SAES with ES.
With α set to 0.6, we analyze the effect of varying the historical information length and obtain Fig. 3.
In Fig. 3, once the historical information length exceeds 22, the relative error stays at its lowest. This is because SAES depends heavily on the past time series and adjusts the smoothing factor based on historical values: the longer the historical information length, the smaller the relative error. Once the historical information length is long enough, it has little further influence on generating future data. Moreover, the longer T is, the bigger the relative error.
Based on the foregoing experiment, we set the historical information length to 22 and vary the value of α to find the pattern. The result is shown in Fig. 4.
5 Conclusion
Nowadays, traffic congestion is becoming a more and more serious problem. We urgently need a prediction approach with small storage and time consumption to adjust the traffic management control program, publish travel information to travelers, and provide optimal path options to drivers. In this paper, we developed the SAES model, which predicts traffic flow based on aggregated spatio-temporal information. Experiments show that, compared with the ES model, our model has superior performance. In future work, we will extend the approach to long-term traffic flow prediction.
References
1. Bao, L., Qin, X.: Research progress in spatio-temporal aggregation computation. Com-
puter Science (2006)
2. Jin, C., Guo, W., Zhao, F.: Getting Qualified Answers for Aggregate Queries in Spatio-
temporal Databases. In: Dong, G., Lin, X., Wang, W., Yang, Y., Yu, J.X. (eds.) AP-
Web/WAIM 2007. LNCS, vol. 4505, pp. 220–227. Springer, Heidelberg (2007)
3. Papadias, D., Tao, Y., Kalnis, P., Zhang, J.: Indexing spatio-temporal data ware houses.
In: Proc. of the Intl. Conf. on Data Engineering, San Jose, CA, pp. 166–175 (2002)
4. Feng, J., Lu, C.: Research on novel method for forecasting aggregate queries over data
streams in road networks. Journal of Frontiers of Computer Science and Technology 11
(2010)
5. Guo, L., Li, J., Wang, W., Zhang, D.: Predictive continuous aggregate queries over data
streams. Journal of Computer Research and Development 41(10) (October 2004)
6. Li, J., Guo, L.: Processing algorithms for predictive aggregate queries over data streams. Journal of Software 16(7) (2005)
7. Sun, L.: Research on aggregate query based on continuous data streams. Master thesis
(2006)
8. Yu, Y., Wang, G., Chen, C., Fu, C.: A chaos-based predictive algorithm for continuous aggregate queries over data streams. Journal of Northeastern University (Natural Science) 28(8) (August 2007)
9. Brown, R.G., Meyer, R.F.: The fundamental theorem of exponential smoothing. Operations Research 9(5), 673–685 (1961)
10. Yan, L., Ma, F.: Application of cubic exponential smoothing method to city underground
deformation prediction. Technology & Economy in Areas of Communications 43(5)
(2007)
11. Li, Y., Jia, F.: Application of dynamic cubic exponential smoothing method to the appli-
cation of predicting GDP of Liaoning province. Applied Science (2009)
A Way for Color Image Enhancement
under Complex Luminance Conditions
1 Introduction
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 63–72.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
64 M. Favorskaya and A. Pakhirka
the best processing results for grey-scale images (the Single-Scale Retinex (SSR) algorithm) and has difficulties with color images. The latter fact is explained by the overlapping of the color R-, G-, B-functions after the retinex functions are calculated separately for the R-, G-, B-components of the input image. This realization is called the Multi-Scale Retinex (MSR) algorithm.
An image with lower sharpness is not good for human visual perception. The traditional remedy is to design a filter with defined specific characteristics. We claim no exclusive approach in this area, but we have tried to solve a complicated task: to design a digital filter that automatically increases image sharpness after processing by the EMSR algorithm.
2 Related Work
Observed images of a real scene depend strongly on the environmental luminance conditions. The human visual system can recognize objects in shadow thanks to its dynamic properties, but machine vision, with its restricted non-adaptive spectral range, fails in such cases. Human vision automatically compensates for luminance deviations through the psychological mechanism of color constancy. Machine vision therefore needs intelligent methods and algorithms that model, and even surpass, human vision, especially under complex luminance conditions. E. Land was the first researcher to propose the term "retinex" (formed from "retina" and "cortex"), suggesting the participation of both the eye and the brain in the processing of visual information [5]. A general mathematical function based on three scales of gray-level variations of the input image was suggested in [6]. Some authors applied the learning mechanism of neural networks to evaluate relative brightness in arbitrary environments. Later, the SSR algorithm was proposed for dynamic-range compression. The 1D retinex function R_i(x, y, σ) according to the SSR model calculates differences of logarithmic functions, given by
R_i(x, y, σ) = log{I_i(x, y)} − log{F(x, y, c) ∗ I_i(x, y)},   (1)

where I_i(x, y) is the input image function in the i-th spectral channel along the OX and OY axes; c is a scale coefficient; and the sign ∗ denotes the convolution of the input image function I_i(x, y) with the surround function F(x, y, c).
Many authors have proposed various surround functions, for example an inverse-square spatial surround function or an exponential function [7, 8]. The most used is the Gaussian F(x, y, σ), defined as

F(x, y, σ) = K e^{−(x^2 + y^2)/σ^2}.   (2)
The coefficient K in Eq. (2) is chosen so that the following condition holds:

∫∫_{Ω_{x,y}} F(x, y, σ) dx dy = 1,   (3)
The multi-scale retinex output in the i-th spectral channel is the weighted sum of the 1D retinex functions over the scales:

RM_i(x, y, w, σ) = Σ_{n=1}^{N} w_n R_i(x, y, σ_n),   (4)

where w = (w_1, w_2, ..., w_N) is the weight vector of the 1D retinex functions R_i(x, y, σ) in the i-th spectral channel, and σ = (σ_1, σ_2, ..., σ_N) is the scale vector of the 1D output retinex function. The components of the weight vector w in Eq. (4) satisfy

Σ_{n=1}^{N} w_n = 1.

The dimension of the scale vector σ is chosen to be not less than 3. Different advisable values can be found in the literature; in our experiments we used σ = (15, 90, 180), with a weight vector w whose components are equal.
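Eqs. (1)–(4) can be sketched as follows, assuming a square image, FFT-based circular convolution for the surround, and equal weights over the scales σ = (15, 90, 180); function names are illustrative.

```python
import numpy as np

def gaussian_surround(size, sigma):
    """Gaussian surround F(x, y, sigma), normalized to sum to 1 (Eq. 3)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    f = np.exp(-(xx ** 2 + yy ** 2) / sigma ** 2)   # Eq. (2); K fixed by Eq. (3)
    return f / f.sum()

def single_scale_retinex(channel, sigma):
    """R_i = log I_i - log(F * I_i)  (Eq. 1), via circular FFT convolution."""
    eps = 1e-6                                      # avoid log(0)
    surround = gaussian_surround(channel.shape[0], sigma)  # assumes square image
    blurred = np.real(np.fft.ifft2(np.fft.fft2(channel) *
                                   np.fft.fft2(np.fft.ifftshift(surround))))
    return np.log(channel + eps) - np.log(np.abs(blurred) + eps)

def multi_scale_retinex(channel, sigmas=(15, 90, 180)):
    """Equal-weight sum of SSR outputs over the scale vector (Eq. 4)."""
    w = 1.0 / len(sigmas)
    return sum(w * single_scale_retinex(channel, s) for s in sigmas)
```

A quick check: on a constant channel the surround average equals the pixel value, so both SSR and MSR outputs are (numerically) zero.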
The MSR algorithm distorts image chromaticity because the value of each color component of a pixel is replaced by the ratio of its input value to the mean value of the surrounding pixels in the same color component. Several remedies for this problem exist. Some enhancement is obtained by moving to other color spaces with an explicit separation of brightness and hue components (HSI-, HSV-, HSL-spaces). The best effect is achieved by the model of normalized separation of brightness and hue components suggested in [4]:

RM′_i(x, y, w, σ, b) = RM_i(x, y, w, σ) ∗ I′_i(x, y, b),   (5)

I′_i(x, y, b) = log( 1 + b I_i(x, y) / Σ_{i=1}^{3} I_i(x, y) ),   (6)

where the coefficient b is chosen from the middle of the value range [0…255], b = 100–125.
The remainder of the paper is organized as follows. In Section 3 we introduce a special function that dynamically changes the spectral ranges of the MSR algorithm in dark and bright areas. Section 4 explains in detail a way of improving the image after the EMSR algorithm has been applied. Experimental results are included in Section 5, and in Section 6 we summarize the findings of this paper.
where

k_1 = (Th / DR) · log(DR) / log(Th),   k_2 = (1 − Th / DR) · log(DR) / log(DR − Th),   (8)

with DR the dynamic image range (DR = 255 for an image with 8 bits per color channel) and the threshold Th equal to 200 (according to empirical analysis).
The graph of the function R(I(x, y)) is shown in Fig. 1.
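A plausible implementation of the jointed logarithmic function R(I(x, y)) is sketched below. Since the equation defining R itself is not reproduced in this excerpt, the two-branch form here is inferred as an assumption from the coefficients k_1 and k_2 of Eq. (8) and the requirement that the branches join continuously at I = Th.

```python
import math

DR, TH = 255, 200                      # dynamic range, empirical threshold

K1 = (TH / DR) * math.log(DR) / math.log(TH)
K2 = (1 - TH / DR) * math.log(DR) / math.log(DR - TH)   # Eq. (8)

def jointed_log(i):
    """Assumed form of R(I): two logarithmic branches, scaled by k1
    below the threshold and by k2 above it, continuous at I = Th."""
    if i <= TH:
        return K1 * math.log(max(i, 1))
    return math.log(DR) - K2 * math.log(DR - i) if i < DR else math.log(DR)
```

With these k_1, k_2 the two branches meet exactly at I = Th, which is what allows a single monotone response over the whole dynamic range.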
The jointed logarithmic branches of the resulting function R(I(x, y)) improve object recognition in bright areas. Our experiments have also shown better recognition in dark areas; this is explained by the availability there of edges and boundaries that are merely poor for human vision, whereas in bright areas the object edges and boundaries practically disappear and we have no information for restoration. We may also say that the complex logarithmic function shown in Fig. 1 increases object contrast in the typical brightness range [60…200] considerably less than in dark and bright areas.
Another problem with color images processed by the MSR or EMSR algorithms is the overflow of contrast objects with high reflection coefficients. We propose a reconstruction model of luminance normalization based on γ-correction:

I_γ(x, y) = [ Wh (I_R(x, y) / Wh)^{1/γ} ],   (9)

where Wh is the value of white color (Wh = 255 for 8-bit images); [·] is the integer part of a number; I_R(x, y) is the image processed by one of the retinex-like algorithms; and I_γ(x, y) is the image reconstructed by γ-correction.
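Eq. (9) can be sketched per pixel as follows; the γ value used here is an assumption, since the excerpt does not fix it.

```python
def gamma_reconstruct(i_r, gamma=2.2, wh=255):
    """Luminance normalization by gamma-correction, Eq. (9):
    I_gamma = [Wh * (I_R / Wh)^(1/gamma)], [.] the integer part.
    gamma = 2.2 is an assumed value, not taken from the paper."""
    return int(wh * (i_r / wh) ** (1.0 / gamma))
```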
An extension of the γ-correction method is the design of a linear filter that removes high-frequency components and moderates low-frequency components by using γ-correction. An interesting decision is the application of the following steps:
M_PSNR = 20 log( I^P / sqrt( (1 / |Ω_{x,y}|) Σ_{Ω_{x,y}} Δ^2 ) ),
  Δ = I_R(x, y) − I(x, y)             if 100 < I(x, y) < 200,
  Δ = (1/2) (I_R(x, y) − I(x, y))     in other cases,   (10)

M_CNR = ( E[I_R(x, y)] − E[I(x, y)] ) / ( μ(I_R(x, y)) + μ(I(x, y)) ),   (11)

where I^P is the peak value of brightness; |Ω_{x,y}| is the surrounding image area; E[·] is the mean value over Ω_{x,y}; and μ(·) is the variance over Ω_{x,y}.
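The two quality estimates M_PSNR and M_CNR (Eqs. 10–11) might be computed as in the sketch below. The halved difference outside the mid-tone band 100..200 is an assumption recovered from the garbled source, and the whole image is taken as the region Ω.

```python
import numpy as np

def m_psnr(i_ref, i_res, peak=255.0):
    """Peak signal-to-noise estimate of Eq. (10): full-weight differences
    in the mid-tone band 100..200, half-weight elsewhere (assumed)."""
    mid = (i_ref > 100) & (i_ref < 200)
    delta = np.where(mid, i_res - i_ref, 0.5 * (i_res - i_ref))
    rmse = np.sqrt(np.mean(delta ** 2))
    return 20 * np.log10(peak / max(rmse, 1e-12))

def m_cnr(i_ref, i_res):
    """Contrast-to-noise estimate of Eq. (11): mean difference over the
    sum of variances of the processed and input images."""
    return (i_res.mean() - i_ref.mean()) / (i_res.var() + i_ref.var())
```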
Thus, the EMSR algorithm consists of the following steps.
Initial data: an input image received under complex luminance conditions.
4 Image Improvement
All retinex-like algorithms produce a result image with a lower sharpness function, which does not satisfy a perceptive observer. This is not essential for subsequent computer processing, but if the task is only to enhance the visual properties of an image, then a further image-improvement step is needed. Increasing image sharpness improves details that have become blurred or are not clear enough for human vision. Several popular filters solve this problem, for example the high-frequency "High pass" filter, the filter based on the second derivative (Laplacian), and the "Unsharp masking" filter. All of them increase sharpness by amplifying the contrast of tonal transitions. The main disadvantage of the High pass and Laplacian filters is that they sharpen not only image details but also noise. The Unsharp masking filter blurs a copy of the original image with a Gaussian function and subtracts the blurred image from the original input image wherever their difference exceeds some threshold value.
Our Enhanced Unsharp Masking (EUM) filter improves the output image by jointly compositing a contour performance and an equalization performance based on empirical dependences. We calculate the function of contour performance I_CP(x, y) as follows:
I_CP(x, y) = R(F^NK(x, y)) + R(F^PK(x, y)),   (12)

where R(·) is a function of range equalization, and F^NK(·) and F^PK(·) are response functions with negative and positive kernels:

F^NK(x, y) = Σ_{i=−r}^{r} Σ_{j=−r}^{r} ( I(x + i, y − j) − (2r + 1)^2 (1 + k_c / (2r + 1)) I(x + i, y − j) ),   (13)

F^PK(x, y) = Σ_{i=−r}^{r} Σ_{j=−r}^{r} ( I(x + i, y − j) − (2r + 1)^2 (1 − k_c / (2r + 1)) I(x + i, y − j) ),   (14)

where r is the distance from the central processed pixel to the boundary of the sliding window, and k_c is a coefficient of boundary suppression, k_c = 0.2…0.7.
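The kernels of Eqs. (13)–(14) are hard to recover exactly from the source, so the sketch below takes one plausible reading: each response compares the window sum against a scaled centre pixel, with k_c in 0.2..0.7 controlling boundary suppression, and the equalization R(·) taken as min-max scaling. All of these readings are assumptions, not the paper's definitive filter.

```python
import numpy as np

def contour_performance(img, r=1, kc=0.5):
    """Sketch of the contour-performance step, Eqs. (12)-(14), under one
    plausible reading of the kernels; R(.) is min-max scaling here."""
    h, w = img.shape
    pad = np.pad(img, r, mode="edge")
    # sum of the (2r+1)x(2r+1) window around each pixel
    win = sum(pad[i:i + h, j:j + w]
              for i in range(2 * r + 1) for j in range(2 * r + 1))
    n = (2 * r + 1) ** 2
    f_nk = win - n * (1 + kc / (2 * r + 1)) * img   # negative-kernel response
    f_pk = win - n * (1 - kc / (2 * r + 1)) * img   # positive-kernel response

    def equalize(a):                                # R(.): scale into [0, 1]
        rng = a.max() - a.min()
        return (a - a.min()) / rng if rng else np.zeros_like(a)

    return equalize(f_nk) + equalize(f_pk)          # Eq. (12)
```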
5 Experimental Results
Fig. 2 Example of EMSR algorithm application: a) input image with shadow, b) enhanced image with details in shadow
Table 1 presents the comparative results for the image "Tree". We applied the SSR, MSR, and EMSR algorithms (the image was converted from RGB-space to HSL-space), histogram equalization, and local histogram equalization, and calculated the estimates M_PSNR and M_CNR by Eqs. (10)-(11). As one can see, the algorithms based on logarithmic equalization of spectral ranges demonstrate better results (the ratios of peak signal to noise and of contrast to noise have larger values).
The results of image sharpening are presented in Fig. 3. We tested two filters: the Laplacian filter and our EUM filter.
Fig. 3 Image enhancement: a) input image, b) EMSR processing, c) Laplacian filter applied to image b, d) EUM filter applied to image b
Fig. 4 shows some fragments of the images in Fig. 3 b, c, d at 100% scale. It is evident that the fragments in Fig. 4 c have sharper edges (than in Fig. 4 a) and smoother homogeneous regions (than in Fig. 4 b).
6 Conclusion
References
1. Gonzalez, R.C., Woods, R.E.: Digital image processing, 2nd edn. Prentice Hall, Inc.,
New Jersey (2002)
2. Chen, S.D., Ramli, A.R.: Preserving brightness in histogram equalization based
contrast enhancement techniques. Digit. Sig. Proc. 14(5), 413–428 (2004)
3. Kimmel, R., Shaked, D., Elad, M., Sobel, I.: Space-dependent color gamut mapping: A
variational approach. IEEE Trans. Image Process. 14(6), 796–803 (2005)
4. Moroney, N.: Method and System of Local Color Correction Using Background Luminance Mask. U.S. Patent 6,741,753 (2004)
5. Meylan, L., Alleysson, D., Süsstrunk, S.: Model of retinal local adaptation for the tone
mapping of color filter array images. J. Opt. Soc. Am. A 24(9), 2807–2816 (2007)
6. Land, E.: An alternative technique for the computation of the designator in the retinex
theory of color vision. Proc. Natl. Acad. Sci. USA 83(10), 3078–3080 (1986)
7. Hurlbert, A.C., Poggio, T.: Synthesizing a color algorithm from examples.
Science 239(4839), 482–485 (1988)
8. Choi, D.H., Jang, I.H., Kim, M.H., Kim, N.C.: Color image enhancement based on
single-scale retinex with a JND-based nonlinear filter. In: IEEE Int. Symp. Circuits and
Syst., pp. 3948–3951 (2007)
9. Sun, B., Chen, W., Li, H., Tao, W., Li, J.: Modified luminance based adaptive MSR.
In: IEEE ICIG, pp. 116–120 (2007)
10. Meylan, L., Süsstrunk, S.: High Dynamic Range Image Rendering with a Retinex-
Based Adaptive Filter. IEEE Trans. Image Process. 15(9), 2820–2830 (2006)
Animated Pronunciation Generated
from Speech for Pronunciation Training
1 Introduction
Computer-assisted pronunciation training (CAPT) has been introduced into language education in recent years [1][2]. CAPT typically scores pronunciation quality and points out a learner's wrong phonemes by using speech recognition technology [3][4][5]. Moreover, it often indicates the differences between incorrect and correct pronunciation by showing the learner's speech wave alongside the correct speech wave.
Yurie Iribe
Information and Media Center, Toyohashi University of Technology, Japan
e-mail: iribe@imc.tut.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 73–82.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
74 Y. Iribe et al.
that contributes to the articulatory movement. In this paper, articulatory features are expressed by assigning +/− to each articulation feature of a phoneme. For example, the articulatory feature sequence of "/jiNkoese/ (space satellite)" in Japanese is shown in Figure 3. Because the phoneme N is a voiced sound, "voiced" in Figure 3 is given [+] (in practice, [+] is given the value "1"; right side of Figure 3) as the teacher signal. Because the phoneme k is a voiceless sound, "voiced" in Figure 3 is given [−]; in practice, [−] is given the value "0" (right side of Figure 3) as the teacher signal, and "unvoiced" in Figure 3 is given [+]. We generated an articulatory feature table of 15 dimensions corresponding to 25 Japanese phonemes. We defined the articulatory features based on the distinctive phonetic features (DPF) of Japanese phonemes in the International Phonetic Alphabet (IPA) [10].
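The +/− table can be encoded as 0/1 teacher signals, for example as below. This is a toy slice of the 15-dimensional table; the "nasal" feature is an illustrative addition beyond the features named in the text.

```python
# Toy slice of the articulatory feature table: +/- becomes 1/0.
AF_TABLE = {
    "N": {"voiced": 1, "unvoiced": 0, "nasal": 1},   # voiced phoneme
    "k": {"voiced": 0, "unvoiced": 1, "nasal": 0},   # voiceless phoneme
}

def teacher_signal(phoneme, feature):
    """Return the 0/1 teacher signal for a phoneme's feature."""
    return AF_TABLE[phoneme][feature]
```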
We also used our previously developed articulatory feature (AF) extraction technology [10], whose extraction accuracy is about 95%. Figure 4 shows the AF extractor. Input speech is sampled at 16 kHz, and a 512-point FFT of a 25 ms Hamming-windowed speech segment is applied every 10 ms. The resulting FFT power spectrum is then integrated into the outputs of 24 band-pass filters (BPFs) with mel-scaled center frequencies. At the acoustic feature extraction stage, the BPF outputs are first converted to local features (LFs) by applying three-point linear regression (LR) along the time and frequency axes. LFs represent the variation of a spectrum pattern along these two axes. After compressing the two 24-dimensional LFs into 12-dimensional LFs using a discrete cosine transform (DCT), a 25-dimensional feature vector called LF (12 Δt, 12 Δf, and ΔP, where P stands for the log power of the raw speech signal) is extracted. Our previous work showed that LF is superior to MFCC as the input to MLNs for the extraction of AFs. The LFs then enter a three-stage AF extractor. The first stage extracts 45-dimensional AF vectors from the LFs of the input speech using two MLNs, where the first MLN maps acoustic features (LFs) onto discrete AFs and the second MLN reduces misclassification at phoneme boundaries by constraining the AF context. The second stage incorporates inhibition/enhancement (In/En) functionalities to obtain modified AF patterns. The third stage decorrelates three context vectors of AFs.
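The framing stage described above (16 kHz input, 25 ms Hamming windows every 10 ms, 512-point FFT power spectrum) might look like the sketch below; the mel filterbank, LR, and DCT stages are omitted, and the function name is illustrative.

```python
import numpy as np

def frame_power_spectrum(signal, sr=16000, frame_ms=25, hop_ms=10, nfft=512):
    """Framing stage of the AF front end: 25 ms Hamming windows every
    10 ms, with a 512-point FFT power spectrum per frame."""
    flen = int(sr * frame_ms / 1000)          # 400 samples per frame
    hop = int(sr * hop_ms / 1000)             # 160-sample hop
    win = np.hamming(flen)
    n_frames = 1 + max(0, (len(signal) - flen) // hop)
    frames = np.stack([signal[i * hop:i * hop + flen] * win
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, n=nfft)        # 257 frequency bins
    return np.abs(spec) ** 2                  # power spectrum
```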
speech signals. The vocal tract area (13 dimensions) is combined with two other frames, three points prior to and following the current frame (VT(t, t−3), VT(t, t+3)), to form the articulatory movement; the MLN input is thus the vocal tract area (13 × 3 dimensions).
The dimensions of the MLN were the vocal tract area (15 × 3 dimensions) as input and y-coordinate vectors (8 × 3 dimensions) as output. In the case of articulatory features, the MLN dimensions were 28 × 3 as input and y-coordinate vectors (8 × 3 dimensions) as output.
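The frame concatenation (VT(t, t−3), VT(t), VT(t, t+3)) can be sketched as follows; clamping indices at the sequence ends is an assumption, since the text does not say how boundary frames are handled.

```python
import numpy as np

def stack_context(frames, offset=3):
    """Concatenate each frame with the frames `offset` steps before and
    after it, clamping at the ends, to form the (dim x 3) MLN input."""
    t = np.arange(len(frames))
    prev = frames[np.maximum(t - offset, 0)]            # VT(t, t-offset)
    nxt = frames[np.minimum(t + offset, len(frames) - 1)]  # VT(t, t+offset)
    return np.concatenate([prev, frames, nxt], axis=1)
```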
3 Evaluations
The MRI data used in the evaluation were taken in a single shot, in which one female native English speaker uttered 37 English words. The data sets used for the experimental evaluation are as follows.
D1: Training data set for the AF-coordinate vector or VT-coordinate vector converter: 36 words of English speech and images included in the MRI data (one female native English speaker)
D2: Testing data set for AF-coordinate vector or VT-coordinate vector converter adaptation: one word of English speech included in the MRI data (one female native English speaker)
Experimental results were acquired using the leave-one-out cross-validation method. The MRI data used in this experiment were recorded at ATR (Advanced Telecommunications Research Institute International) by a Kobe University research group.
(Figure: per-phoneme results; y-axis: phoneme number, 0–18; x-axis: phonemes p b t d k f th s z sh ch hh m n r l w y iy ih ey eh ae aa ay aw ow oy uh uw ah er ax.)
…bility with the vocal tract area than do the articulatory features. However, although we evaluated with a speaker-dependent data set in this experiment, … speaker. We intend to evaluate with …
(Figure: correlation coefficients, approximately in the range 0.4–0.8.)
Although the soft palate shows high correlation, the correlation of the lower lip is not very good. Moreover, although the tongue averages 0.7, it is a very important organ for various pronunciations, so it is necessary to improve the articulatory gestures of the tongue and lower lip. We plan to intensively train important articulatory manners and articulatory positions in the MLN by forming some anchor points.
We calculated only the y-coordinate distance of each feature point to decrease the number of dimensions in this experiment. However, the x-coordinate should also be assigned if more MRI data can be used, because it is affected by the individual variation of users. Additionally, we will verify the individuality of articulatory movement by applying MRI data composed of several users.
4 Summary
We developed a system for automatically generating CG animations that express pronunciation movement through articulatory features extracted from speech. The user's pronunciation mistakes can be pointed out by expressing the pronunciation movements of the user's tongue, palate, lips, and lower jaw as animated pronunciations. We conducted experiments that confirmed the accuracy of the generated CG animations: the correlation coefficient was more than about 0.83, and we confirmed that smooth animations were generated from speech automatically. We will build a pronunciation instructor system that includes the CG animation program.
References
1. Delmonte, R.: SLIM prosodic automatic tools for self-learning instruction. Speech
Communication 30(2-3), 145–166 (2000)
2. Gamper, J., Knapp, J.: A Review of Intelligent CALL Systems. Computer Assisted
Language Learning 15(4), 329–342 (2002)
3. Neumeyer, L., Franco, H., Digalakis, V., Weintraub, M.: Automatic scoring of pronunciation quality. Speech Communication 30(2-3), 83–93 (2000)
4. Witt, S.M., Young, S.J.: Phone-level pronunciation scoring and assessment for interactive language learning. Speech Communication 30(2-3), 95–108 (2000)
5. Deroo, O., Ris, C., Gielen, S., Vanparys, J.: Automatic detection of mispronounced
phonemes for language learning tools. In: Proceedings of ICSLP 2000, vol. 1, pp. 681–
684 (2000)
6. Wang, S., Higgins, M., Shima, Y.: Training English pronunciation for Japanese learners of English online. The JALT Call Journal 1(1), 39–47 (2005)
7. Phonetics Flash Animation Project,
http://www.uiowa.edu/~acadtech/phonetics/
82 Y. Iribe et al.
8. Wong, K.H., Lo, W.K., Meng, H.: Allophonic variations in visual speech synthesis for corrective feedback in CAPT. In: Proc. ICASSP 2011, pp. 5708–5711 (2011)
9. Iribe, Y., Manosavanh, S., Katsurada, K., Hayashi, R., Zhu, C., Nitta, T.: Generating Animated Pronunciation from Speech through Articulatory Feature Extraction. In: Proc. of Interspeech 2011, pp. 1617–1621 (2011)
10. Huda, M.N., Katsurada, K., Nitta, T.: Phoneme recognition based on hybrid neural networks with inhibition/enhancement of Distinctive Phonetic Feature (DPF) trajectories. In: Proc. Interspeech 2008, pp. 1529–1532 (2008)
Building a Domain Ontology to Design
a Decision Support Software to Plan Fight
Actions against Marine Pollutions
1 Introduction
Although the Mediterranean is only one hundredth of the world's sea surface, it supports thirty percent of the volume of international maritime traffic. An estimated 50% of the goods transported could present a risk to various degrees. A study of shipping accidents in the Mediterranean Sea [1], covering the period 1977–2003, identified 376 accidents involving hydrocarbons and 94 accidents involving
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 83–95.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
84 J.-M. Mercantini and C. Faucher
hazardous and noxious substances (HNS). These accidents resulted in a total discharge of 305,000 tonnes of hydrocarbons and 136,000 tonnes of HNS. These events highlight the criticality of the risk induced by transport activities in that region. In general, the strategy for fighting marine hydrocarbon pollution following a shipping accident is divided into two complementary stages: (i) recovery of the maximum volume of hydrocarbon at sea and, once the pollutant has reached the coast, (ii) cleaning of the polluted coastline. There are many intervention techniques to combat pollution, and their effectiveness depends on the situations in which they are implemented. Thus, the choice of a fight technique in a response plan is not trivial and requires taking a large number of parameters into account.
The CLARA 2 project (Calculations Relating to Accidental Releases in the Mediterranean) addresses these problems. It aims to develop and implement a computer tool to assist crisis management following a maritime accident that has caused a spill of pollutants. To carry out this national project (funded by the French National Research Agency), a consortium of 13 partners was formed [2]. The purpose of this paper is to focus on the elaboration process of the GENEPI module (GENEration de Plan d'Interventions) of CLARA 2, which aims to plan fight actions against marine pollution. The current work lies in the following research fields: (1) from the maritime-field perspective, the paper presents a software tool that assists crisis management staff in minimising pollution impact, and (2) from a methodological perspective, the paper shows the importance of developing ontologies (i) for structuring a domain (at a conceptual level) as its actors perceive it and (ii) for using these ontologies to build computer tools that aid problem solving in that domain. An overview of the CLARA 2 project is presented in Section 2, and Section 3 presents the functioning principles of the GENEPI module. Section 4 describes the methodological approach and the process used to build the GENEPI module. In Section 5 the implementation of the process is developed and exemplified. Section 6 presents the architecture of the GENEPI module. Section 7 presents the conclusions.
[Figure: processing flow of the GENEPI module: classified fight actions and additional observations feed a Selection Process that produces a set of candidate actions; a Plan Generation Process then yields intervention plans, supported by VREAL, VSIM and a base of needed resources]
4 Methodological Approach
The projection of the KOD method onto the general approach for developing an ontology shows that KOD guides the corpus constitution and provides the tools to carry out operational steps 3 (linguistic study) and 4 (conceptualisation). In previous research, the KOD method has already been applied [5, 11, 12].
Table 2 Integration of the KOD method into the elaboration process of ontology

  Elaboration process of ontology | KOD process        | Elaboration process of ontology with KOD
  1. Specification                |                    | 1. Specification
  2. Corpus definition            |                    | 2. Corpus definition
  3. Linguistic study             | 1. Practical Model | 3. Practical Model
  4. Conceptualisation            | 2. Cognitive Model | 4. Cognitive Model
  5. Formalisation                |                    | 5. Formalisation
                                  | 3. Software Model  | 6. Software Model
The linguistic analysis is performed in two steps: verbalization and modelling. The verbalization step consists in paraphrasing the corpus documents in order to obtain simple phrases, which allow the terms employed to be qualified during document analysis. Some terms appear as objects, others appear as properties, and yet others appear as relations between objects and values.
In order to obtain the actems, the linguistic analysis consists in identifying verbs that represent activities performed by actors during marine pollution, or object behaviour. In general terms, an activity is performed by an action manager, by means of one or more instruments, in order to modify the state (physical or knowledge) of the addressee. The action manager temporarily takes control of the addressee by means of instruments. Occasionally the action manager is the one who directs the activity and at the same time is also subjected to the change of state (example: knowledge acquisition). The following example illustrates how to extract actems from the corpus: "... the Prestige sends an emergency message..."
The activity is "SENDING an emergency message". Once identified, the activity is translated into a 7-tuple (the actem):
<Action Manager, Action, Addressee, Properties, State 1, State 2, Instruments>,
where: the Action Manager performs the action; the Action causes the change; the Addressee undergoes the action; the Properties represent the way the action is performed; State 1 is the state of the addressee before the change; State 2 is the state of the addressee after the change; and Instruments is one instrument or a set of instruments representing the means used to cause the change.
[Figure: the actem extracted from the example. Action Manager: Prestige commandant; Action: SENDING; Addressee: CROSS MED; Properties: (date, location, duration); State 1: CROSS MED (does not know); State 2: CROSS MED (knows); Instrument: radio]
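As an illustration only (the class name, field names and the Python representation below are our own, not part of the KOD method's formal definitions), the 7-tuple and the SENDING example could be encoded as:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Actem:
    """One activity extracted from the corpus, modelled as a 7-tuple."""
    action_manager: str    # performs the action
    action: str            # causes the change
    addressee: str         # undergoes the action
    properties: List[str]  # the way the action is performed
    state_before: str      # state of the addressee before the change
    state_after: str       # state of the addressee after the change
    instruments: List[str] = field(default_factory=list)  # means used to cause the change

# The corpus example: "... the Prestige sends an emergency message ..."
sending = Actem(
    action_manager="Prestige commandant",
    action="SENDING an emergency message",
    addressee="CROSS MED",
    properties=["date", "location", "duration"],
    state_before="does not know",
    state_after="knows",
    instruments=["radio"],
)
```

For fight actions, an eighth field holding the suitability criteria would be added to this structure.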
Actems model the task activity. Each actem is composed of textual items extracted from the reports, which describe the state change of an object as described by the domain experts. Each element of the 7-tuple (or 8-tuple for fight actions, because of the suitability criteria) must previously be defined as a taxem.
One result of the actem analysis is that actems can be divided into five main action categories:
• Actions related to pollutant behaviour,
• Actions related to damaged-ship behaviour,
• Actions related to reasoning patterns,
• Actions related to CLARA 2 services,
• Actions related to operations against pollution.
Amongst actions related to pollutant behaviour we can cite Evaporation and Dissolution. Amongst actions related to damaged-ship behaviour, we can cite:
• The set A, which contains the actions for which all criteria are verified,
• The set B, which contains the actions for which at least one criterion could not be assessed, for lack of information in the situation,
• The set C, which contains the actions for which at least one criterion was not satisfied,
• The set D, which contains the actions of the set B enriched with the criteria that were not assessed.
The rules for the selection of fight actions are based on the suitability criteria and the values taken by the corresponding attributes of the situation. The rules are of the form:
c1 ∧ c2 ∧ ... ∧ cn → True / False
with c1, c2, ..., cn the criteria associated with a fight action. The conclusion of the rule states whether or not the action may be selected. A criterion is satisfied if the value taken by the corresponding attribute of the situation is compatible with the criterion constraints.
Upon receipt of the Situation, the action selection algorithm analyses the actems involved in the Search Domain. From each actem, it extracts the criteria and applies the selection rules presented above. According to the results obtained, the actem is placed in the corresponding set (A, B, C or D). After running the algorithm, if the user is not satisfied with the result, they can enrich the situation so as to assess the criteria that have not yet been assessed. This new run should reduce the size of the set B by moving its actions into either the set A or the set C. The algorithm is independent of changes in the ontology.
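The selection rules and the A/B/C partition described above can be sketched as follows. This is a hedged illustration, not the GENEPI implementation: the encoding of a criterion as a (situation attribute, constraint function) pair and the toy fight actions and attribute names are all our own assumptions.

```python
# Sketch of the fight-action selection algorithm described above.

def evaluate(criterion, situation):
    """Return True/False when the situation attribute can be checked against
    the criterion constraint, or None when the attribute is missing from the
    situation (the criterion cannot be assessed)."""
    attr, constraint = criterion
    if attr not in situation:
        return None
    return constraint(situation[attr])

def classify(actions, situation):
    """Partition actions into A (all criteria satisfied), B (no criterion
    violated, at least one not assessable) and C (at least one violated)."""
    A, B, C = [], [], []
    for name, criteria in actions.items():
        results = [evaluate(c, situation) for c in criteria]
        if all(r is True for r in results):
            A.append(name)
        elif any(r is False for r in results):
            C.append(name)
        else:
            B.append(name)
    return A, B, C

# Toy example with two fight actions and two suitability criteria each.
actions = {
    "mechanical recovery": [("sea_state", lambda s: s <= 4),
                            ("viscosity", lambda v: v < 10000)],
    "dispersant spraying": [("sea_state", lambda s: s >= 2),
                            ("depth_m",   lambda d: d > 20)],
}
situation = {"sea_state": 3, "viscosity": 5000}   # water depth not yet observed
A, B, C = classify(actions, situation)            # dispersant spraying lands in B

# Enriching the situation with the missing attribute and re-running moves the
# action out of B (the set D of the text corresponds to B's actions enriched
# with their previously unassessed criteria).
situation["depth_m"] = 50
A2, B2, C2 = classify(actions, situation)
```

Because `classify` only inspects whatever criteria each action carries, the sketch shares the property noted in the text: the procedure itself is independent of changes in the ontology.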
7 Conclusion
The paper has presented the first results of the design of a software tool (the GENEPI module) to plan fight actions against marine pollution. The GENEPI module is part of a wider research program, CLARA 2. The methodological process used to build GENEPI is based on the elaboration of an ontology. The purpose of that ontology is to structure the domain (maritime accidents) according to the problem to solve (planning fight actions) and to the problem-solving method. The ontology was obtained through a cognitive approach consisting in the application of the KOD method, which has proven to be adequate.
The Situation Management module, the Ontology Management module and the Action Search Engine are in service. The Plan Generator module and the Simulator are still under development.
References
[1] Rempec, Guide pour la lutte contre la pollution marine accidentelle en Méditerranée,
Partie D, Fascicule 1 (Avril 2002)
[2] CLARA 2 Consortium: École des Mines d’Alès, le Cèdre, IFREMER, Météo France,
IRSN, TOTAL, EADS, Géocéan, UBO, INERIS, SDIS 30, Préfecture Maritime de la
Méditerranée, le CEPPOL, LSIS. Projet ANR (2006-2010)
[3] Uschold, M., Grüninger, M.: Ontologies: Principles, methods and applications.
Knowledge Engineering Review 11(2), 93–136 (1996)
[4] Vogel, C.: Génie cognitif. Sciences cognitives, Paris, Masson (1988)
[5] Mercantini, J., Tourigny, N., Chouraqui, E.: Elaboration d’ontologies à partir de
corpus en utilisant la méthode d’ingénierie des connaissances KOD. In: 1ère édition
des Journées Francophones sur les Ontologies, JFO 2007, Octobre 18-20, pp. 195–
214, Sousse, Tunisie (2007) ISBN: 978-9973-37-414-1
[6] Gandon, F.: Ontology engineering: a survey and a return on experience. Research
Report no. 4396. INRIA Sophia-Antipolis (Mars 2002)
[7] Aussenac-Gilles, N., Biébow, B., Szulman, S.: Revisiting Ontology Design: A
Method Based on Corpus Analysis. In: Dieng, R., Corby, O. (eds.) EKAW 2000.
LNCS (LNAI), vol. 1937, pp. 172–188. Springer, Heidelberg (2000)
[8] Dahlgren, K.: A Linguistic Ontology. International Journal of Human-Computer
Studies 43(5), 809–818 (1995)
[9] Uschold, M., King, M.: Towards a Methodology for Building Ontologies. In: Proceedings of the IJCAI 1995 Workshop on Basic Ontological Issues in Knowledge Sharing, Montréal, Canada (1995)
[10] Fernández-López, M.: Overview of methodologies for building ontologies. In: Proceedings, IJCAI 1999 Workshop on Ontologies and Problem-Solving Methods (KRR5), Stockholm, Sweden, August 2, pp. 4-1–4-13 (1999)
[11] Mercantini, J.M., Capus, L., Chouraqui, E., Tourigny, N.: Knowledge Engineering
contributions in traffic road accident analysis. In: Jain, R.K., Abraham, A., Faucher,
C., van der Zwaag, B.J. (eds.) Innovations in Knowledge Engineering, pp. 211–244
(2003)
[12] Mercantini, J.-M., Turnell, M.F.Q.V., Guerrero, C.V.S., Chouraqui, E., Vieira,
F.A.Q., Pereira, M.R.B.: Human centred modelling of incident scenarios. In: IEEE
SMC 2004, Proceedings of the International Conference on Systems, Man &
Cybernetics, The Hague, The Netherlands, October 10-13, pp. 893–898 (2004)
Can Pictures Be a Candidate for Knowledge
Media?
Fuminori Akiba
Abstract. Can pictures be a candidate for knowledge media? After presenting the background of this question in chapter one, we introduce in chapter two the idea of three kinds of knowledge from a picture, drawn from a book by Dominic McIver Lopes (Lopes 2006) ―that is, knowledge about, knowledge through and knowledge in― and point out its deficiencies. In chapter three, we then propose an alternative view of pictures from which we obtain knowledge ―that is, pictures as objects and facts, pictures as process and pictures as informational indicators― and find strong support for our proposal in various research fields and practices. Finally we conclude that pictures can be considered a candidate for knowledge media.
At first glance it seems quite easy to answer this question. Artworks have long given us various kinds of knowledge. For example, Giotto's fresco of the Last Judgment at the Scrovegni Chapel in Padova (Fig. 1) has taught people what Hell is and taught them to live good lives if they do not want to go there. Cannacher's contemporary artwork Addict to Plastic (Fig. 2) teaches us about the problems of waste disposal and of manipulation through the mass media. But can we really say that such works give us knowledge by themselves? The answer is probably no. If someone who has never read the description of Hell in the Bible sees Giotto's fresco, that person cannot understand what it is about; they see only a scene in which a monster eats a man. And someone who is not accustomed to the traditional artistic convention that an open window or a frame symbolizes a gate through which we can reach a hidden truth will fail to grasp Cannacher's intention and see only a heap of trash.
Corresponding to our suspicion, many philosophers have cast doubt on the ability of artworks to serve as knowledge media. Among them the negative evaluation by
Fuminori Akiba
Graduate School of Information Science, Nagoya University
e-mail: akibaf@is.nagoya-u.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 97–105.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
Stolnitz (1992) is the most famous. He said, "[I]n science, history, and religion, confirmation of a statement also counts as evidence for other, logically related statements. Thus truths, notably in the cumulative advances of science support and build on each other," while "art is unlike any of these kinds of knowing", because "the truth derived from one work of art never confirms that derived from another work of art […]" (Stolnitz 1992, 341).
In contrast to Stolnitz, who still recognizes each artwork as a source of knowledge and finds another kind of knowing in art, Hideo Iwasaki, a Japanese biologist and artist, has said that no individual artwork can even be a source of knowledge, because for him an artwork is an "open text": in other words, a source of "never-ending, unanswerable and in a sense irresponsible questions" (Iwasaki 2010, 753).
Therefore, we need to ask again: can artworks be a candidate for knowledge media? But the word artworks is too vague and covers hugely different objects, so in this paper we narrow the topic from artworks in general to pictures in order to avoid confusion. In this case, of course, pictures do not include visual representations such as diagrams and graphs. In chapter two we take up the idea of three kinds of knowledge from a picture in Lopes (2006) and point out its deficiencies. In chapter three, we then explore an alternative idea of knowledge from a picture and find strong support for our proposal in various research fields and practices. In the final chapter we conclude that pictures can be considered a candidate for knowledge media.
Table 2.1 Three kinds of knowledge from pictures (reconstructed by the author from Lopes
2006, 133-4)
From this point of view Lopes clarifies the difference between knowledge about, knowledge through and knowledge in. Knowledge about and knowledge through are justified, though in a different way from propositional knowledge. For example, the belief that the painting is executed in oils [knowledge about] is justified by a caption or, more scientifically, through technological investigation, and the belief that the painting was made under the strong influence of contemporary paintings [knowledge through] is justified if the painting is compared with contemporary paintings which the painter could really have seen.
In contrast to these two kinds of knowledge, however, Lopes argues that knowledge in is not justified, because the content of perceptual belief, which Lopes calls the perceptual report of a picture, cannot justify the lesson of the picture. Take Lopes' example, Dorothea Lange's Migrant Mother (1936, Fig. 3). We cannot logically derive the message of the picture [knowledge in] ―that is, "we ought to act with greater compassion for the poor" (Lopes 2006, 140)― from its contents of perceptual belief. We can only reasonably interpret the message from the perceptual contents, such as "the remarkable lines of the face of the migrant mother, the quiet pride and determination she expresses, and the depiction of the children with their backs to the viewer" (Lopes 2006, 139-140; concerning discussions about the content of picture perception, see Zeimbekis 2010).
At this point, everyone might expect Lopes to cast such knowledge in away, because it does not satisfy the classical definition of knowledge. If knowledge in means a lesson which we manage to obtain through interpretation of our perceptual contents, knowledge in is not knowledge worthy of the name. Surprisingly, however, Lopes suddenly throws the classical definition of knowledge away and changes the topic from knowledge to an "intellectual virtue" (Lopes 2006, 145). Pictures may have an intellectual virtue if they have the power to transform viewers into people of "fine observation." He says, "looking at a picture frequently requires effort, sometimes a great deal of effort, in attention to detail, accurate perception, and adaptable seeing. […]
A person who misses the fine details of an image, or who cannot see how it portrays previously unremarked features of reality, or who cannot see things in new way…such a person does not appreciate it fully. In order to meet these demands, viewers must become fine observers. […] A fine observer has visual experiences that closely track the properties of which perceived and bring her experiences under concepts which make them available to belief formation, knowledge gathering, and reasoning. Pictures have cognitive merit in so far as they bring about revisions to the way we conceptualize visual experience" (Lopes 2006, 149-150).
However, this move is quite strange. First, if we are allowed to throw the classical definition of knowledge away, we can construct the entire argument in a quite different way. Second, pictures do indeed train our ability to see something accurately and heuristically in detail, but we can learn the same thing from any kind of visual representation, such as diagrams, graphs, and scientific visualizations. And finally, according to his idea, one must learn many things about how to see pictures and how to be a fine observer in order to understand knowledge in pictures; but even if we accumulate hundreds of propositional rules about how to see pictures accurately and heuristically in detail, those rules can never guarantee a viewer's jump from perceptual contents to knowledge in pictures. No one can confidently say that the jump from the perception of "the lines of the face of the migrant mother" to the lesson, "we ought to act with greater compassion for the poor," is true or false, because it is a problem of persuasion or rhetoric, not of logical demonstration.
We do not have to think that the deepest knowledge we obtain from pictures is spiritual (or moral).
In addition, since knowledge about and knowledge through can be taught in the form of propositional statements, they are easily exhibited with captions in museums. Concerning knowledge through, we can exhibit the picture side by side with other pictures relevant to it, and with the contexts in which the maker really produced the picture.
For this proposal we can find support in a practice actually carried out in a museum. Meighen S. Katz reported in his article on the exhibition This Great Nation Will Endure (2004/2005, at the Franklin D. Roosevelt Presidential Library and Museum) that Lange's Migrant Mother was exhibited there in a quite interesting way. It was shown with a computer interaction "which allowed the visitors to access not just the familiar image, but Lange's full series. By viewing the lead-up photos the museum visitors were able to see the process of compositional framing and decisions that Lange made" (Katz 2012, 332). Such contextual display of historical facts about the making of a picture conveys what a society once asked of the maker, and what the maker wanted to show people in order to satisfy that request, or to create a new vision against it. Pictures become historical evidence.
Lopes begins his argument from the viewer, who expects to receive knowledge from a picture as an end-product. From this point of view pictures might remain unanswerable, open-ended questions with no definitive answer, and all viewers can do is imagine the picture's lesson. Consequently viewers are always forced to jump from their unreliable perceptual contents to equally unreliable lessons [or intentions of imaginary makers]. However, if we shift our point of view from the viewers' receptive experience to the makers' generative process, the situation changes completely, because even though a work of art as an end-product remains an open-ended question, the process through which the maker of a work of art makes it is a problem-solving process, in which the maker struggles to find an optimal solution to an artistic problem.
Gregory Currie once called this process a “heuristic path” (Currie 1989). “In
speaking of a scientist’s ‘heuristic path’ to a theory I mean the process whereby
the theory was arrived at; the facts, methods and assumptions employed, including
analogical models, mathematical techniques and metaphysical ideas. […] And I
wish to take over the spirit of this idea for our analysis of artworks, though it will
undergo modification in the process” (Currie 1989, 113).
Along this line, many researchers have already produced studies that give strong support to our proposal: for example, studies of the drawing process (Fujihata 2008), cognitive studies of artists' creative processes (Yokochi and Okada 2007), the cognitive science of the design process (Goel 1995), and cognitive studies of creativity and knowledge transmission through copying (Ishibashi and Okada 2004, 2010).
Among these, the two papers by Ishibashi and Okada are especially worth mentioning. On the one hand, Ishibashi and Okada (2010) recognize the creative process of drawing as a kind of problem-solving process and demonstrate that laymen's copying of an artist's drawing is useful for "constraint relaxation", which constitutes a precondition of creative drawing. According to the paper, in the process of copying, laymen can use the knowledge structures in the artist's drawings as guidance, reflect with their help upon the knowledge structures they already had, and free themselves from the constraints of their existing ideas about drawing. On the other hand (Ishibashi and Okada 2004), copying is a way to understand oneself: it reveals our own knowledge structures and their limits (concerning self-awareness and self-change in viewers' aesthetic experience, see Pelowski and Akiba 2011).
This argument does not negate our proposal that pictures can convey knowledge about facial expressions and emotions. Here we find a possibility that pictures can be knowledge media: they can convey knowledge about "a wide category of objects and faces" (Zeki 1999).
The third kind we can access easily, because we are all products of evolution and share the same evolved basic informational systems. The first kind we can also access easily, because it is propositionally explicable; such knowledge is therefore easily combined with other kinds of media (texts, etc.) and exhibited in museums. The knowledge we obtain from pictures as process we can access through combination with other kinds of media, such as videos recording the process of making a picture. In addition, it is important for us to know how the maker deals with the following matters: the degree of acceptance of already established values, consciousness of other makers and their works, reflection on one's own ideas, exploration of the materials suitable to what the maker wants to express, development of one's own artistic problems, relation to the societies to which the maker belongs, etc. (Yokochi and Okada 2007, 444).
From what has been said above, we can conclude that pictures can be a candidate for knowledge media. This conclusion results from a change in our view of pictures ―that is, pictures as objects and facts, pictures as process and pictures as informational indicators.
References
Ars Electronica Linz GmbH, Repair: sind wir noch zu retten, Linz (2010)
Currie, G.: Art works as action types. In: Lamarque & Olsen (2004), pp. 103–122 (1989)
Evans, G.: The varieties of reference. McDowell, J. (ed.). Oxford (1982)
Fujihata, M.: What is drawing process studies? In: Drawing Process Studies. Department of
Engineering, University of Tokyo (2008) (in Japanese)
Goel, V.: Sketches of thought. The MIT Press (1995)
Ishibashi, K., Okada, T.: Copying artworks as perceptual experience for creation. Cognitive
Studies 11(1), 51–59 (2004) (in Japanese)
Ishibashi, K., Okada, T.: Facilitating creative drawings by copying art works by others.
Cognitive Studies 17(1), 196–223 (2010) (in Japanese)
Iwasaki, H.: Biomedia Art: possibilities of synthetic biology from the point of aesthetics.
Science Journal KAGAKU, 747–753 (July 2010) (in Japanese)
Katz, M.S.: Reconsidering images: using the Farm Security Administration photographs as objects in history exhibitions. In: Dudley, S., et al. (eds.) The Thing About Museums: Objects and Experience, Representation and Contestation, Essays in Honour of Professor Susan M. Pearce, pp. 324–337. Routledge (2012)
Lamarque, P., Olsen, S.H.: Aesthetics and the philosophy of art. Blackwell (2004)
Lopes, D.M.: Understanding pictures. Oxford (1996)
Lopes, D.M.: Sight and sensibility: evaluating pictures. Oxford (2006)
Panofsky, E.: Zum Problem der Beschreibung und Inhaltsdeutung von Werken der bildenden Kunst. In: Kaemmerling, E. (ed.) Ikonographie und Ikonologie: Theorien-Entwicklung-Probleme, Dumont (1932)
Pelowski, M., Akiba, F.: A model of art perception, evaluation and emotion in transformative aesthetic experience. New Ideas in Psychology 29(2), 80–97 (2011)
Scherer, K.R.: Gefrorene Gefühle: Zur Emotionsdarstellung in der bildenden Kunst. In: Boehm, G., et al. (eds.) Movens Bild: Zwischen Evidenz und Affekt, pp. 249–273. Wilhelm Fink (2008)
Stolnitz, J.: On the cognitive triviality of art (1992). Reprinted in: Lamarque & Olsen (2004), pp. 337–343. Blackwell
Yokochi, S., Okada, T.: Creative expertise of contemporary artists. Cognitive Studies 14(3),
437–454 (2007) (in Japanese)
Zeimbekis, J.: Pictures and Singular Thought. Journal of Aesthetics and Art Criticism 68(1), 11–22 (2010)
Zeki, S.: Inner vision: an exploration of art and the brain. Oxford (1999)
Capturing Student Real Time Facial Expression
for More Realistic E-learning Environment
1 Introduction
Education is one of the largest sectors of the economy in most countries, and its development is at the core of future development; most governments invest more and more in improving this field. Traditional classrooms and E-learning settings are currently the most popular learning styles. E-learning is a way to enhance knowledge and performance by using Internet technologies to deliver a broad range of solutions. As the number of Internet users increases, growth in the number of E-learners cannot be avoided. Because it is a convenient, cost-effective and consistent method, E-learning has become one of the most popular ways to obtain an education. Over the last decade, the number of corporate universities with learning partners such as E-learning companies or universities grew from 400 to 1,800 [1].
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 107–116.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
108 A.D. Dharmawansa, K.T. Nakahira, and Y. Fukumura
40% of Fortune 500 companies have established corporate universities which deliver E-learning [2]. This evidence suggests that in the near future E-learning will become a powerful tool all over the world.
Although E-learning has become more popular, it is based on a self-regulated learning style. Students learn separately in an E-learning environment and must determine "what to learn" and "where to go" in each learning session. In the virtual E-learning world, moreover, the teacher cannot immediately get information from students and instruct them face-to-face as in the real world. One way to increase the effectiveness of a virtual-world education system is to retrieve real-time student information, which is very helpful for making important decisions from the teacher's perspective.
The learning environment is one of the major factors affecting student performance in the traditional education system [3]. A similar effect applies to E-learning. As a result, the use of three-dimensional virtual environments as E-learning platforms is gradually increasing, and more researchers and institutions try to build learning environments resembling real-world classrooms.
In this paper, we discuss one method to overcome the above barriers, increase effectiveness, and make the E-learning environment a more suitable place, by developing several kinds of tools. The main target of this system consists of two major purposes.
2 Related Works
There has been considerable interest in the potential for the development of E-learning in universities, schools and further education [4]. With the implementation of virtual environments, the number of E-learners has rapidly increased; it is expected that there will be about five million online learners within the next ten years [5]. E-learning is not limited to education but has spread to every field; for example, it is used to supply training to workers [6]. Workplace learners can be better served by E-learning environments than by conventional training.
Although there are significant increases in the number of E-learners, it is somewhat difficult to identify student activities in an E-learning environment. Therefore
Capturing Student Real Time Facial Expression 109
courses conducted via E-learning often use a peer assessment method to assign
appropriate marks [7]. There have been investigations into the usability
aspects of E-learning interfaces that incorporate an avatar as a virtual
lecturer, and researchers have analyzed users' satisfaction and views regarding
a set of facial expressions and body gestures used by a virtual lecturer, in
the presence and absence of interactive context, in E-learning interfaces [8, 9].
Another study analyzed affective body gestures in video sequences, exploiting
spatial-temporal features to model body gestures; it also proposed fusing
facial expression and body gesture at the feature level using Canonical
Correlation Analysis [10]. Other researchers tried to assess the emotional
state of learners by analyzing nonverbal behavior, namely speech and facial
expressions; they developed tools to extract features from sound and video
recordings and used classifiers such as support vector machines to label
emotional states [11].
Pedagogically, it is not always true that every E-learning virtual environment
provides high-quality learning. According to Govindasamy, the development and
evaluation of E-learning involve learner and task analysis, defining
instructional objectives and strategies, testing the environment with users,
and producing the initial version of the E-learning tool [12].
Eliminating the major barriers and improving the effectiveness of virtual
learning is the main target of this research. Its major tasks, as contributions
to increasing the usefulness of E-learning, are making the virtual environment
more realistic with the user's facial features and facilitating several ways to
analyze student behavior indirectly.
At the same time, the facial expression extraction system is also activated.
The student's real-time facial expression is extracted and connected to an HTML
web interface, in which the user's facial features (face, eye, mouth, and nose
widths and heights), face image, and expression, sent from the facial
expression acquisition system, appear. This HTML web interface fulfills two
main activities.
The relevant data are passed to the virtual environment, and according to those
data the appropriate avatar is changed; for example, when the real user smiles,
the corresponding avatar also smiles. In addition, the web interface hands the
necessary data to the server, which stores them for further analysis. There are
two ways to observe student behavior.
This is the whole system architecture for making the three-dimensional virtual
learning environment more realistic and for facilitating the observation of
students' facial behavior patterns.
Real-time facial expressions are extracted through this system; the whole
procedure is indicated in Fig. 2. The real user's video, which consists of
frames, is obtained continuously using a webcam, and the analysis is conducted
frame by frame: after obtaining a frame, the objects in that frame are
detected, and as a result the face can be found. The process of finding the
face, the other components, and the facial expressions is explained as follows.
fi = W - B (1)
where W is the sum of the pixel values under the white region of the
Haar-feature and B is the sum under the black region.
The same procedure is applied to many positive images to obtain average values
for the threshold and filtering range. Once the classifier is developed, it can
be applied to real-time images as shown in Fig.4. To detect the face area, the
relevant features would have to be scanned over the whole image, which is a
relatively time-consuming task; therefore, the relevant region for each face
component is roughly set, as discussed in the next part. After applying the
relevant Haar-feature to the real-world image, its fi value can be determined.
If the real image's fi value is greater than the classifier's fi value, then
the relevant face feature is present for that Haar-feature. All the selected
Haar-features must be satisfied to confirm the presence of the face components,
as indicated in Fig.4. This is the way the face and face components are detected.
Fig. 5 Region of interest
Fig. 6 Detection of face components
Fig. 7 Face variables
those features may vary. According to the variation of the face features, the
relevant rectangles also change, so the relevant rectangle size can be
determined from the face-component data.
Surprise: MW' - MW < -5 and MH' - MH > 10 and EW' - EW > 10 and EH' - EH > 10
Sad: EW' - EW < -10 and EH' - EH < -10 and MW' - MW > 5 and NW' - NW > 3
where MW/MH, EW/EH, and NW denote the neutral-face mouth, eye, and nose
widths/heights, and the primed values are the current measurements.
According to the real time facial expression recognition system, the appropriate
facial expression can be identified as shown in Fig.8.
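The threshold rules above can be sketched as a simple classifier. The dictionary keys, the assumption that each value is the difference between the current and neutral measurement, and the fallback to "neutral" are illustrative; the thresholds follow the rules in the text.

```python
# Sketch of rule-based expression recognition from facial-feature
# differences d[k] (current minus neutral): MW/MH mouth width/height,
# EW/EH eye width/height, NW nose width (names assumed).

def classify_expression(d):
    if d["MW"] < -5 and d["MH"] > 10 and d["EW"] > 10 and d["EH"] > 10:
        return "surprise"
    if d["EW"] < -10 and d["EH"] < -10 and d["MW"] > 5 and d["NW"] > 3:
        return "sad"
    return "neutral"  # no rule fired

print(classify_expression(
    {"MW": -8, "MH": 15, "EW": 12, "EH": 12, "NW": 0}))  # surprise
```

Further expressions would be added as additional rules of the same form.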
Normally, a virtual-world avatar has no facial expression; most of the time it
is visible with a neutral face, which is a barrier to making a more realistic
learning environment. As a first step to overcoming it, a system that animates
basic facial expressions in the virtual environment was developed. Using that
system, the avatar can make basic facial expressions while it is engaged in
learning.
Here, the effort is to connect the real user with the corresponding avatar in
the virtual environment. The above web interface sends the relevant data to the
virtual environment, and according to those data the appropriate facial
expression becomes visible on the avatar. As a result, the real user's facial
behavior appears on the avatar's face.
7 Future Work
Making the avatar alive in the educational environment may affect the
educational performance of E-learners. Although it is a virtual environment,
students learn about real-world applications, natural things, day-to-day
knowledge, and so on; in doing so they have to compare with, and think about,
the real world. A realistic virtual environment provides some impression of the
real world, which is very helpful for continuing their learning activities.
Further, thinking ability may vary from person to person, and a realistic
environment may make such thinking easier.
As yet, however, there have been no experiments to identify how this realistic
virtual environment affects student learning behavior. The next step is to
conduct experiments and verify whether the realistic environment has any effect
on student learning behavior in all aspects.
8 Conclusion
With the growing interest in E-learning, three-dimensional virtual learning
methods have become popular. To make the virtual environment more realistic, a
live avatar with facial features, and a connection between the real user and
the virtual avatar, are important. Therefore, a real-user facial expression
extraction system was developed with which real-time facial expressions can be
extracted. To acquire the facial features, a geometric facial-feature-based
method and the location information of prominent components were used, so
real-time facial features can be extracted continuously.
Once the real user's facial expression is extracted, it needs to be sent to the
virtual learning environment; the relevant data are delivered via the web
interface. To indicate the four basic facial expressions, the avatar
face-changing system is initialized before connecting the virtual and real
worlds. After introducing the avatar face-changing system, the connection
between the avatar and the user is established; finally, the avatar's face
changes according to the real user's face changes.
E-learning is not conducted face to face, so student behavior is difficult to
analyze. This system, however, provides the real user's face details to any
other person through the web interface: anyone can observe the E-learner's face
data continuously, and the data are loaded into a database for further use.
This system makes the virtual learning environment more realistic with face
data and provides a facility for anyone to observe students' facial behavior.
References
[1] Moe and Blodgett, op. cit., endnote 21, p. 229. Meister op. cit., endnote 23 in US Web
Based Education Commission Report (December 2000)
[2] Moe and Blodgett, op. cit., endnote 21, p. 229, Gregory, Wilson and Husman (2000)
116 A.D. Dharmawansa, K.T. Nakahira, and Y. Fukumura
[3] Higgins, S., Hall, E., Wall, K., Woolner, P., McCaughey, C.: The Impact of School
Environments: A literature review. The Centre for Learning and Teaching School of
Education, Communication and Language Science, University of Newcastle
[4] Hughes, J., Attwell, G.: A framework for the evaluation of E-learning. Paper
presented to a seminar series on Exploring Models and Partnerships for eLearning in
SME's, held in Stirling, Scotland and Brussels, Belgium (2002/2003),
http://www.theknownet.com/ict_smes_seminars/papers/Hughes
(retrieved February 14, 2007)
[5] Bjur, J.J.: Auditory Icons in an Information Space. Department of Industrial Design,
School of Design and Craft, Göteborg University, Sweden (1998)
[6] Payne, A.M., Stephenson, J.E., Morris, W.B., Tempest, H.G., Mileham, A.,
Griffin, D.K.: The use of an E-learning constructivist solution in workplace learning.
International Journal of Industrial Ergonomics 39(3), 548–553 (2009)
[7] Chang, T.-Y., Chen, Y.-T.: Cooperative learning in E-learning: A peer assessment of
student-centered using consistent fuzzy preference. Expert Systems with
Applications 36(4), 8342–8349 (2009)
[8] Alseid, M., Rigas, D.: Users’ views of Facial Expressions and Body Gestures in E-
learning Interfaces: an Empirical Evaluation. In: SEPADS 2009 Proceedings of the 8th
WSEAS International Conference on Software Engineering, Parallel and Distributed
Systems, pp. 121–126 (2009)
[9] Alseid, M., Rigas, D.: Empirical results for the use of facial expressions and body
gestures in E-learning tools. International Journal of Computers and
Communications 2(3) (2008)
[10] Shan, C., Gong, S., McOwan, P.W.: Beyond Facial Expressions: Learning Human
Emotion from Body Gestures. In: British Machine Vision Conference 2007, paper 276
(2007)
[11] Rothkrantz, L., Datcu, D., Chiriacescu, I., Chitu, A.: Assessment of the emotional
states of students during E-learning. In: International Conference on E-learning and
the Knowledge Society - E-learning (2009), ISBN 1313-9207
[12] Govindasamy, T.: Successful implementation of E-learning: Pedagogical
considerations. The Internet and Higher Education 4, 287–299 (2001)
[13] Viola, P., Jones, M.J.: Robust real-time face detection. International Journal of
Computer Vision 57(2), 137–154 (2004)
[14] Nikolaidis, A., Pitas, I.: Facial feature extraction and pose determination. Pattern
Recognition 33, 1783–1791 (2000)
[15] Matsumoto, D., Ekman, P.: Facial expression analysis. Scholarpedia 3(5), 4237
(2008)
Character Giving Model of KANSEI
Robot Based on the Tendency of
User’s Treatment for Personalization
Abstract. Recently, many types of robots have been developed not only for
industrial manufacturing but also for interacting with humans. Robots
designed for human-robot interaction are expected to have the ability to
communicate with humans smoothly. In this paper, we propose a character
giving model for a KANSEI robot. This model makes robots individual beings
that vary with each user; we aim to develop more human-like and empathetic
robots by using it. Robots dynamically acquire their own characters based on
the tendency of the user's behaviors, which are classified along two
dimensions: dominance-submission and acceptance-rejection. Through
interaction experiments between humans and the robot with the proposed
model, we confirmed that the model could give various characters to the
robot, and that the character given through communication with a user suited
that user.
1 Introduction
Recently, various robots are used not only in manufacturing but also for
communication with humans, such as PaPeRo [1] and wakamaru [4].
Communication robots are expected to take care of elderly people and heal
people's hearts; therefore, they should be able to communicate with humans
smoothly. Many studies intended to give robots these abilities have been
reported. Yokoyama [8] analyzed the timing of non-verbal information in
human-human communication and used these timings to control the non-verbal
information of robots. Takada [7] proposed a system controlling a robot's
facial actions, which outputs facial actions in accordance with the human's
behaviors.
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 117–127.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
118 H. Ogasawara and S. Kato
Users' Actions by situation:

Situations with ifbot ("ifbot ...") | Dominance | Submission | Acceptance | Rejection
wants to play with you. | restrain | play | play later | reject
wants to help cleaning with you. | turn down | rely | clean together | let it go off
begs for toys. | restrain | buy | buy on another time | reject
overslept. | admonish | forgive and prepare | help to prepare | leave it alone
is singing. | prevent | commend | sing together | lay off
Pi = Ti / (n + T1 + T2 + T3 + T4), (i = 1, 2, 3, 4) (6)
where n is a constant: the number of interpersonal behaviors stored in the
robot. Robots thus reflect the interpersonal behaviors of the past n
interactions in their own characters.
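Equation (6) can be sketched directly; the function name and the example counts below are assumptions for illustration.

```python
# Sketch of Eq. (6): the robot's character is the set of values P_i over
# the four interpersonal-behavior tendencies, computed from the counts
# T_i of the user's behaviors and the constant n of stored behaviors.

def character(T, n):
    """T = [T1, T2, T3, T4]; returns P_i = T_i / (n + T1 + T2 + T3 + T4)."""
    denom = n + sum(T)
    return [t / denom for t in T]

# e.g. n = 30 stored behaviors, with the third tendency dominant
P = character([3, 5, 18, 4], n=30)
print([round(p, 3) for p in P])  # [0.05, 0.083, 0.3, 0.067]
```

Note that, following Eq. (6), the constant n in the denominator damps all four values, so the P_i do not sum to 1.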
In this paper, we use a GUI to communicate with the robots. Fig. 4 shows a
snapshot of communication between the user and the robot using the GUI. The
robot shows a situation sentence as its action in this GUI; users read the
situation sentence and select one of the four action buttons. The four user
actions correspond to the interpersonal behaviors shown in Table 1; in this
communication, we did not explicitly show users these correspondences. We
set 30 situation sentences based on the behaviors of five-year-old children.
Table 2 shows examples of situation sentences and users' actions. The
situation sentences have no relationships to one another and are shown in
random order; the communication ends when all situation sentences have
appeared.
and ideal. Fig. 5 shows the cobweb chart shaped by each factor's average
evaluation value in this experiment, and Table 3 shows the ifbot combinations
that show significant differences for some factors. Conscientiousness and
openness are omitted from the table because no robot combination showed
significant differences for these factors.
experiment, and the ifbot was mounted with the proposed model. Through the
pre-communication, we collected data on 20 tendencies of users' behaviors.
ifbotS was characterized by the pre-communication tendency nearest to the
point symmetric, about the origin, to the subject's own tendency. After the
experiment, subjects evaluated their impressions by Semantic Differential
and answered questionnaires on distinguishing the ifbot characterized by
their own tendency from the others.
answer. According to the results, we think the proposed model could
characterize a robot as a unique being that its user can distinguish from
others.
4 Conclusion
In order to make robots empathetic and human-like, we proposed a method to
dynamically characterize robots. We performed interaction experiments
between humans and robots with the proposed model, using the characters
defined by Symonds. According to the results, the changes in expressional
tendencies produced by the proposed model could characterize robots
dynamically and make users feel that the robots had the characters defined
by Symonds. We also confirmed that the proposed model could leave an
impression of humanity on users, and that a character based on the user's
interpersonal tendencies could leave a good impression. Therefore, we think
the proposed model is effective for increasing robots' empathy and humanity.
In future work, we aim to propose a method for characterizing robots more
flexibly by adding environments and various communications to the causes
that give robots their own characters.
References
1. Fujita, Y.: Development of personal robot PaPeRo. Journal of the Society of
Instrument and Control Engineers 42, 521–526 (2003) (in Japanese)
2. Kato, S., Oshiro, S., Itoh, H., Kimura, K.: Development of a communication
robot ifbot. In: IEEE ICRA, pp. 697–702 (2004)
3. Murakami, Y.: Big five and psychometric conditions for their extraction in
japanese. The Japanese Journal of Personality 11, 70–85 (2003) (in Japanese)
4. Onishi, K.: ”wakamaru”, the robot for your home. The Japan Society of Me-
chanical Engineers 109, 448–449 (2006) (in Japanese)
5. Saitoh, I.: Interpersonal sentiments and emotions in social interaction. The
Japanese Journal of Psychology 56, 222–228 (1985) (in Japanese)
6. Symonds, P.M.: The psychology of parent-child relationship. Appleton-Century-
Croft, New York (1939)
7. Takada, M., Kaneko, M.: Sympathy and reaction based on facial actions between
humanoid and user. The Institute of Electronics, Information and Communica-
tion Engineers 104, 1–6 (2005) (in Japanese)
8. Yokoyama, M., Aoyama, K., Kikuchi, H., Hoashi, K., Shirai, K.: Controlling non-
verbal information in speaker-changing for spoken dialogue interface of humanoid
robot. Information Processing Society of Japan 40, 487–496 (1999) (in Japanese)
Checklist System Based on a Web for Qualities
of Distance Learning and the Operation
1 Introduction
Recently, the quality of distance learning and its improvement are becoming
more and more important as ICT education, e-Learning, etc.
Nobuyuki Ogawa
Department of Architecture, Gifu National College of Technology, Japan
e-mail: ogawa@gifu-nct.ac.jp
Hideyuki Kanematsu
Department of Materials Science and Engineering, Suzuka National College of
Technology, Japan
e-mail: kanemats@mse.suzuka-ct.ac.jp
Yoshimi Fukumura
Department of Management and Information Systems Science, Nagaoka University of
Technology, Japan
e-mail: fukumura@oberon.nagaokaut.ac.jp
Yasutaka Shimizu
Tokyo Institute of Technology, Japan
e-mail: shimizu.y.ak@m.titech.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 129–141.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
130 N. Ogawa et al.
have prevailed not only at a national level but also on a global scale [1]-[3].
Therefore, we tried to make a checklist, based on a large amount of data and
information relating to distance learning, as a guideline to improve the
quality of distance learning.
The checklist in the current project is available online (a Web checklist),
where one can judge how he/she should prepare e-learning materials to improve
the quality of content. It not only provides a check system as a guideline for
carrying out an e-learning course, but also serves for awarding credits to
students and transferring credits among different higher-education
organizations. It is composed of many questions that teachers of e-learning
courses should answer: they input their answers on the web and get a guideline
for analyzing their own e-learning courses, while the system accumulates the
input data from different teachers to classify and analyze them statistically.
We established the checklist system to apply it to the various e-learning
courses of the e-learning Higher Education Linkage Project (e-Help). However,
the contents of the checklist are applicable to e-learning courses in general;
in addition to the application in the e-Help project, we aim to apply it to
general e-learning courses to assure their quality.
L3: The checklist for higher-level standards, to improve the quality still
further. It is composed of contents that can correspond to more versatile and
individual needs.
1-15 Are multiple evaluation methods (mini exams, term exams, reports,
projects, discussions, etc.) carried out?
(2) Perspective to support the actual operations
L1
2-1 Does the teacher lead and guide students to submit their reports properly,
so that they do not copy information from web sites directly, exchange
information among themselves, borrow contents (except the names) from other
classmates, and so on?
2-2 Does the teacher provide the system where students can get technical support?
2-3 Does the teacher adequately consider the preservation of confidentiality
and the privacy protection of students when he/she provides the educational
service to learners?
2-4 Does the teacher provide the orientation for the distance learning online and/or
offline?
2-5 Do the e-learning contents satisfy the recommended computer performance,
network line speeds, and basic software needs?
2-6 Is the time needed to access the e-learning contents well considered, so
that students can continue their learning?
L2
2-7 Is the support system well established, so that interaction among students
and teachers can occur smoothly?
2-8 Is the system well established to show the situation for learning processes such
as submission of reports etc.?
2-9 Does the teacher provide feedback immediately and properly?
2-10 Does the teacher show learners the methods for evaluation?
L3
2-11 Does the e-learning course provide the system to motivate the learners?
2-12 Does the e-learning course have the support system for an online learning
community?
2-13 Does the e-learning course have the system, where students could have
access to library materials, terminology dictionaries, and other materials needed
for classes as service for learners?
2-14 Does the teacher provide the support for handicapped learners?
2-15 Does the teacher provide a support system where learners self-check their
basic operation capability for learning? Does the system make it possible for
learners below a certain criterion to improve their operation capability on
their own?
Neutral, #2: Not so much, #1: Not at all. If his/her answer to a question is
#3, he/she clicks the radio button for #3 on the web. When answering the
questions on the web, the user should check the items from the viewpoint of
the teacher; only when the user cannot choose an answer from that viewpoint
may he/she exceptionally check the item from a personal viewpoint. At any
rate, the main purpose is that the user answers from the viewpoint of the
teacher in charge of the e-learning course, in order to improve its quality.
Fig.2 shows the general input screen for the virtual checklist.
After the input, the user gets the results as an answer display on the web
(Fig.3). It is basically composed of three bar graphs for each question: one
for the input data, a second for the average value in the current year, and a
third for the overall average value. The user can objectively see on which
items he/she has weak points and, in addition, can analyze his/her own
e-learning course by comparing the results with the average values from other
users.
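The three values behind each bar graph can be sketched as follows; the record layout, question identifiers, and function names are assumptions about how the accumulated answers might be stored on the server.

```python
# Sketch of the three bar values per question: the user's own answer,
# the average over current-year answers, and the overall average.
from statistics import mean

def bar_values(question_id, user_answer, records, year):
    """records: list of (year, question_id, score) tuples from the server."""
    this_year = [s for (y, q, s) in records if q == question_id and y == year]
    overall = [s for (y, q, s) in records if q == question_id]
    return user_answer, mean(this_year), mean(overall)

# Hypothetical accumulated answers for question 2-9 on the 1-5 scale
records = [(2011, "2-9", 4), (2012, "2-9", 3), (2012, "2-9", 5)]
print(bar_values("2-9", 4, records, 2012))
```

Comparing the first value against the other two is what lets the user see his/her weak points objectively.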
By simultaneously displaying the results both for the current year and over
all years, we aim to keep up with the rapidly changing consciousness of
e-learning content quality that accompanies the rapid spread of e-learning,
and also with the year-by-year updating of e-learning course contents.
4 Statistical Classification
As already described, the user gets useful results from the self-checking
system for quality assurance of e-learning courses. On the other hand, the data
accumulated in the web server by various users provide useful statistical
information, which helps us understand the current situation of e-learning
within certain groups. For example, by adding up the results filtered into
compulsory and non-compulsory courses and comparing them, one can judge whether
the two differ in their approach to assuring the quality of e-learning courses.
To realize such a filtering analysis, the web system requires the input of
information about the field of the subject, the agreement on credit transfer,
and the awarding of credits itself.
As for the field of the subject, it is classified into 10 categories,
corresponding to the classification of scientific research funds used by the
Japanese Ministry of Education (Fig.4).
As for the filtering by credit-transfer agreement, the alternative is whether
the organization has concluded an agreement on credit transfer with the
organizations the contents are distributed to. As for the filtering by the
award of credits, the following choices are prepared.
(1) As for credits in the organization delivering the contents: The contents are
- not delivered to their own organization
- delivered to their own organization
138 N. Ogawa et al.
To graduate,
- The credits are needed.
- The credits are not needed.
- I don't know if they are needed to graduate or not.
- I don't know at all about the awarding system of credits.
(2) As for credits in the organization receiving the contents:
- The contents are never delivered to other organizations.
- The contents are delivered to other organizations.
- The credit is not awarded.
- The credit is awarded.
- The credit is needed to graduate.
- The credit is not needed to graduate.
- The credit is needed in some cases.
The results of filtering can be shown on the web, and the user can confirm them
easily (Fig.5).
6 Conclusions
The checklist to assure the quality of e-learning was established, and the
self-checking system on the web was designed. The system enables us to check
our own e-learning courses by ourselves, as well as to carry out analysis and
classification on the basis of the accumulated data. The results were extensive
and versatile; however, three characteristic positive points and three negative
ones became clear. The checklist was originally made according to four
different perspectives, and the graded results generally show relatively high
evaluations for all four. However, they already reveal that the administrators
in the organizations do not always understand the importance of e-learning
courses, nor what to do for them. The negative points should be considered
carefully and improved in the future.
References
[1] Council for Higher Education Accreditation: "Accreditation and Assuring Quality in
Distance Learning", CHEA Monograph Series 2002, Number 1, CHEA Institute for
Research and Study of Accreditation and Quality Assurance, Washington DC, The
USA (2002)
[2] Distance Education Certificate Program (web page),
http://depd.wisc.edu/html/quality3.htm
[3] Barker, K.C.: E-learning quality standards for consumer protection and consumer
confidence: A Canadian case study in e-learning quality assurance. Journal of
Educational Technology and Society (2007)
[4] Shimizu, Y.: Perspectives to enhance the quality of e-learning courses. JSEE Research
Report JSET 08-2, 121–128 (2008)
Comparison Analysis for Text Data
by Integrating Two FACT-Graphs
1 Introduction
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 143–151.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
144 R. Saga and H. Tsuji
2 FACT-Graph
A FACT-Graph is a method for visualizing large-scale trends as a graph [4]. It
embeds a co-occurrence graph together with information on keyword class
transitions. A FACT-Graph lets us see hints of trends, and it has been used,
with analysis tools, for analyzing trends in different fields such as politics
and crime [5][6]. In addition, the FACT-Graph has been applied to web access
log data, from which we acquired useful knowledge such as the result shown in
Fig. 1 [7].
A FACT-Graph consists of nodes and links: it embeds the changes in a keyword's
class transition and co-occurrence in its nodes and edges. In addition, a
FACT-Graph assigns the last keyword-class attribute to nodes, because we assume
that recent information is important for trend analysis.
Two technologies are essential for compiling a FACT-Graph: class transition
analysis and co-occurrence transition. Class transition analysis shows the
transition of a keyword's class between two periods [8]. This analysis
separates keywords into four classes (Class A to Class D) on the basis of term
frequency (TF) and document frequency (DF) [8]; the four classes are determined
by whether TF and DF are high or low relative to two thresholds. The results of
the analysis describe the transition of keywords between the two time periods
(before and after), as shown in Table 1. For example, if a term belongs to
Class A in a certain time period and moves into Class D in the next, the trend
regarding that term is referred to as "fadeout." A FACT-Graph identifies these
trends by node color: for example, red means fashionable, blue unfashionable,
and white unchanged.
In addition, a FACT-Graph visualizes relationships between keywords using
co-occurrence information, so that topics consisting of multiple terms can be
shown and analyzed. As a result, useful keywords can be identified from their
relationships with other keywords even when a keyword seems unimportant at
first glance, and the analyst can extract such keywords with a FACT-Graph.
Moreover, combined with the results of the class transition analysis, the
analyst can comprehend trends in keywords and in topics (consisting of several
keywords) by using the FACT-Graph. In addition, a FACT-Graph considers the
transition of the co-occurrence relationships between keywords; this transition
is classified into the following types.
[Fig. 1: FACT-Graph from web access logs, with nodes such as Admission, News,
Academic/Research, Contents page, Top page, and Lifelong Study]
Table 1 Transition of Keyword Classes; Class A (TF: High, DF: High), Class B (TF: High,
DF: Low), Class C (TF: Low, DF: High), and Class D (TF: Low, DF: Low)

Before \ After | Class A | Class B | Class C | Class D
Class A | Hot | Cooling | Bipolar | Fade
Class B | Common | Universal | - | Fade
Class C | Broaden | - | Locally Active | Fade
Class D | New | Widely New | Locally New | Negligible
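The class-transition scheme of Table 1 can be sketched as follows; the threshold values and the function names are illustrative assumptions, while the class definitions and trend labels follow the table.

```python
# Sketch of class transition analysis: each keyword gets Class A-D from
# its TF/DF relative to two thresholds, and the (before, after) class
# pair names the trend, following Table 1.

TRANSITION = {
    ("A", "A"): "Hot", ("A", "B"): "Cooling",
    ("A", "C"): "Bipolar", ("A", "D"): "Fade",
    ("B", "A"): "Common", ("B", "B"): "Universal", ("B", "D"): "Fade",
    ("C", "A"): "Broaden", ("C", "C"): "Locally Active", ("C", "D"): "Fade",
    ("D", "A"): "New", ("D", "B"): "Widely New",
    ("D", "C"): "Locally New", ("D", "D"): "Negligible",
}

def keyword_class(tf, df, tf_th, df_th):
    """Class A: TF&DF high, B: TF high only, C: DF high only, D: both low."""
    if tf >= tf_th:
        return "A" if df >= df_th else "B"
    return "C" if df >= df_th else "D"

def trend(before, after, tf_th=10, df_th=5):
    """before/after are (TF, DF) pairs for the two periods."""
    a = keyword_class(*before, tf_th, df_th)
    b = keyword_class(*after, tf_th, df_th)
    return TRANSITION.get((a, b), "-")

print(trend((25, 8), (3, 1)))  # Class A -> Class D: Fade
```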
extracts the attributes of each document, such as date of issue, document
length, and document category, and the term database is then built.
The user sets parameters such as the analysis span, the document/term filters,
and the thresholds used in the analysis. The term database is then divided into
two databases (first- and second-half periods) according to the analysis span;
each term's frequency is aggregated in the respective database, and keywords
are extracted from the terms under the established conditions. These keywords
go through the procedures for keyword class transition and co-occurrence, and
the output chart reflecting the processing results is a FACT-Graph.
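The period split and frequency aggregation just described can be sketched as follows; the document representation (date-tagged term lists) and the function names are assumptions about the term database.

```python
# Sketch of the pipeline step above: split documents into first- and
# second-half periods by date, then aggregate TF and DF per term.
from collections import Counter

def aggregate(docs):
    """docs: list of term lists. Returns (TF, DF) Counters."""
    tf, df = Counter(), Counter()
    for terms in docs:
        tf.update(terms)       # every occurrence counts toward TF
        df.update(set(terms))  # each document counts once toward DF
    return tf, df

def split_by_span(docs_with_dates, boundary):
    """Divide the term database at the analysis-span boundary."""
    first = [t for d, t in docs_with_dates if d < boundary]
    second = [t for d, t in docs_with_dates if d >= boundary]
    return aggregate(first), aggregate(second)

docs = [("2008-01", ["olympic", "war"]),
        ("2008-06", ["olympic", "olympic"]),
        ("2008-09", ["war", "election"])]
(tf1, df1), (tf2, df2) = split_by_span(docs, "2008-07")
print(tf1["olympic"], df1["olympic"], tf2["war"])  # 3 2 1
```

The two (TF, DF) pairs per term are exactly what the class transition analysis of Table 1 consumes.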
[Fig. 3: Procedure for integrating two FACT-Graphs: recognize the gaps between
the two graphs, choose the more characteristic nodes and links for each target,
then merge the nodes and visualize.]
For this reason, we output both directions of the FACT-Graph, that is, not only
from "before" to "after" but also from "after" to "before," and carry out a
comparison analysis [10]. However, a comparison analysis using two FACT-Graphs
requires frequently comparing the two graphs, and the larger a FACT-Graph is,
the higher the cost of analysis. To reduce this cost, we integrate the two
FACT-Graphs into one.
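The integration step can be sketched as follows. Representing each FACT-Graph simply as its set of characteristic keywords is a simplifying assumption (the real graphs also carry links and class attributes); the coloring follows the scheme used later in the experiment (red: first target only, blue: second target only, white: both).

```python
# Sketch of integrating two FACT-Graphs into one: keywords from either
# graph are merged, and each node is colored by which target(s) it is
# characteristic of.

def integrate(keywords_a, keywords_b):
    merged = {}
    for k in keywords_a | keywords_b:
        if k in keywords_a and k in keywords_b:
            merged[k] = "white"  # characteristic of both targets
        elif k in keywords_a:
            merged[k] = "red"    # first target only
        else:
            merged[k] = "blue"   # second target only
    return merged

g = integrate({"olympic", "war"}, {"olympic", "election"})
print(sorted(g.items()))
# [('election', 'blue'), ('olympic', 'white'), ('war', 'red')]
```

A single merged graph like this is what lets the analyst read off both targets' characteristic terms at once instead of comparing two large graphs side by side.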
[Fig. 4: White nodes and links are the terms (links) characteristic of both
targets.]
4 Experiment
several viewpoints, and the assertions are characteristic of and different for
each publisher. Note that we treat infrequent words as unnecessary terms,
because they may be noise or error words; therefore, we removed terms with TF
less than 2 and DF equal to 1.
In this case study we targeted articles on the topic of the Olympic Games (the Asahi and the Yomiuri had 74 and 58 editorials, respectively). We applied the Jaccard coefficient for co-occurrence and adopted relationships with a co-occurrence value over 0.3. To carry out class transition analysis in the FACT-Graph, we set the threshold at the top 20% of ranked terms, on the basis of Zipf's law and the Pareto principle [11], which is often called the 20-80 rule. In addition, we determined which target each keyword characterizes on the basis of TF.
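The keyword selection and co-occurrence filtering just described can be sketched as follows. This is a simplified illustration with hypothetical helper names; the actual FACT-Graph tool performs these steps over the full term database.

```python
import itertools

def jaccard(docs_a, docs_b):
    """Jaccard coefficient between the document sets of two terms."""
    a, b = set(docs_a), set(docs_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def select_keywords(tf):
    """Keep the top 20% of terms by TF (Zipf's law / 20-80 rule),
    after dropping infrequent terms with TF < 2."""
    terms = sorted((t for t, f in tf.items() if f >= 2),
                   key=lambda t: tf[t], reverse=True)
    cutoff = max(1, int(len(terms) * 0.2))
    return terms[:cutoff]

def cooccurrence_links(term_docs, keywords, threshold=0.3):
    """Edges between keyword pairs whose Jaccard coefficient exceeds 0.3."""
    return [(u, v) for u, v in itertools.combinations(keywords, 2)
            if jaccard(term_docs[u], term_docs[v]) > threshold]
```

A pair of keywords sharing most of their documents (Jaccard above 0.3) would thus become a link in the graph.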
[Topic areas labeled in the figure: Election, Economics, Diplomatic Relations of Neighboring Countries, Food Problem, Tibet Problem, Olympic Games]
Fig. 5 Integrated FACT-Graph of two newspapers: the Asahi and the Yomiuri (Red: the
Yomiuri, Blue: the Asahi, White: both)
On the other hand, when we checked the keywords of the Asahi, we noticed the term "War" in the area of "Diplomatic Relations." This keyword belongs to Class B, which has a high TF, and the keyword appears only in the Asahi. This means that the Asahi used the term "War" frequently whereas the Yomiuri did not use it at all. From this, we could comprehend that the Asahi used the term "War" extensively when topics of diplomatic relations were described.
We could also understand the following by integrating FACT-Graphs.
1. We can recognize the spread of topics and the number of keywords from the distribution of the colors.
2. From the common keywords shown as white nodes in each topic, we can
recognize the basic keywords in the topics.
5 Conclusion
This paper described a method to integrate two FACT-Graphs into one to compare
two targets. To apply a FACT-Graph to a comparison analysis, we interchanged
Comparison Analysis for Text Data by Integrating Two FACT-Graphs 151
target data with time series on the basis of class transition analysis. In addition, we
explained how to integrate two FACT-Graphs into one for comparison analysis.
To validate the usability of an integrated FACT-Graph, we compared the
features of the Asahi and Yomiuri newspapers by analyzing their editorials. From
the results of the comparison analysis and targeting the word “Olympic,” we
discovered that the Asahi described several topics other than those described by
the Yomiuri and showed that the proposed method could be used for a comparison
analysis between two targets. As future work, we will evaluate the experimental results quantitatively and compare them with the existing FACT-Graph.
Acknowledgement. This research was supported by The Ministry of Education, Culture,
Sports, Science and Technology (MEXT), Japan Society for the Promotion of Science
(JSPS), Grant-in-Aid for Young Scientists (B), 21760305, 2009.4-2011.3.
References
1. Tiwana, A.: The Knowledge Management Toolkit: Orchestrating IT, Strategy, and
Knowledge Platforms. Prentice Hall (2002)
2. Inmon, W.H.: Building the Data Warehouse. John Wiley & Sons, Inc. (2005)
3. Feldman, R., Sanger, J.: The Text Mining Handbook: Advanced Approaches in
Analyzing Unstructured Data. Cambridge University Press (2007)
4. Saga, R., Terachi, M., Sheng, Z., Tsuji, H.: FACT-Graph: Trend Visualization by
Frequency and Co-occurrence. In: Dengel, A.R., Berns, K., Breuel, T.M., Bomarius,
F., Roth-Berghofer, T.R. (eds.) KI 2008. LNCS (LNAI), vol. 5243, pp. 308–315.
Springer, Heidelberg (2008)
5. Saga, R., Tsuji, H., Tabata, K.: Loopo: Integrated Text Miner for FACT-Graph-Based
Trend Analysis. In: Salvendy, G., Smith, M.J. (eds.) HCII 2009, Part II. LNCS,
vol. 5618, pp. 192–200. Springer, Heidelberg (2009)
6. Saga, R., Tsuji, H., Miyamoto, T., Tabata, K.: Development and case study of trend
analysis software based on FACT-Graph. Artificial Life and Robotics 15, 234–238
(2010)
7. Saga, R., Miyamoto, T., Tsuji, H., Matsumoto, K.: FACT-Graph in Web Log Data. In:
König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds.) KES
2011, Part IV. LNCS, vol. 6884, pp. 271–279. Springer, Heidelberg (2011)
8. Terachi, M., Saga, R., Tsuji, H.: Trends Recognition. In: IEEE International
Conference on Systems, Man & Cybernetics (IEEE/SMC 2006), pp. 4784–4789 (2006)
9. Saga, R., Takamizawa, S., Kitami, K., Tsuji, H., Matsumoto, K.: Comparison Analysis
for Text Data by Using FACT-Graph. In: Salvendy, G., Smith, M.J. (eds.) HCII 2011,
Part II. LNCS, vol. 6772, pp. 75–83. Springer, Heidelberg (2011)
10. Saga, R., Takamizawa, S., Tsuji, H., Matsumoto, K.: Comparison Analysis for
Editorials by Reversible FACT-Graph. In: Proceedings of the International Conference
on Information and Knowledge Engineering (IKE 2011), pp. 216–221 (2011)
11. Baayen, R.H.: Word Frequency Distributions. Springer (2002)
Construction of a Local Attraction Map
According to Social Visual Attention
Ichiro Ide, Jiani Wang, Masafumi Noda, Tomokazu Takahashi, Daisuke Deguchi,
and Hiroshi Murase
Abstract. Social media on the Internet, where millions of people share their personal experiences, can be considered as an information source that implies people's implicit and/or explicit visual attentions. In particular, when the attentions of many people around a specific geographic location focus on a common content, we may assume that there is a certain target that attracts people's attentions in the area. In
this paper, we propose a framework that detects people’s common attention in a
local area (local attraction) from a large number of geo-tagged photos, and its vi-
sualization on the “Local Attraction Map”. Based on the framework, as a first step
of the research, we report the results from a user study performed on a Local At-
traction Map browsing interface that showed the representative scene categories as
local attractions for geographic clusters of the geo-tagged photos.
1 Introduction
Following the recent trend of the diffusion of social media on the Internet where
millions of people share their personal experiences as digital contents, we can easily
obtain thousands of photos tagged with geographic coordinates (geo-tags) indicating
where they were taken. We are focusing on the contents of such photos from the
point that they imply people’s implicit and/or explicit visual attentions.
In particular, when the attentions of many people around a specific geographic location focus on a common content, we may assume that there is a certain target that
attracts people’s attentions in the area. In this paper, we call such a target a “local
attraction”.
A local attraction could be a static phenomenon, such as an object, artificial or natural, or scenery observed from the area. Such local attractions are not easy to infer from objective data such as satellite images and maps, since they can be anything from a small statue located exactly on the spot to a panoramic view observable from the spot, which may contain geographic objects located miles away.
On the other hand, a local attraction could also be a non-static phenomenon that
reflects a common activity in the area, such as shopping, eating, playing, and so
on. These are even more difficult to infer from objective data, since they need to be
recognized in the context of human activity observed on the ground.
Moreover, without the help of social media, it would be difficult to infer what
attracts people only from objective data. We are not interested in providing users
with information on interesting spots located in the middle of a desert where no one
actually visits, but instead, with information on attractive spots where many other
people have also visited and showed their interest by taking a photo and sharing it
on the Internet.
Meanwhile, traditional media such as travel guides and maps cover both types of
local attractions, but their contents are not necessarily updated frequently. In order
to cope with the rapidly changing modern society, we considered that the continu-
ously updated information provided from a large number of people through social
media should be useful to construct a map that reflects the most up-to-date local
attractions.
In this paper, we propose a framework that automatically detects people’s com-
mon attention in a local area (e.g. local attraction) from a large number of geo-
tagged photos, and its visualization on a map called the “Local Attraction Map”.
Based on the framework, as a first step of the research, we report the results from
a user study performed on a Local Attraction Map browsing interface that showed
the representative scene categories as local attractions for geographic clusters of the
geo-tagged photos (Figure 1).
The paper is organized as follows: Section 2 introduces related works on land-
mark detection and travel planning based on social media. Section 3 introduces the
proposed method to construct the Local Attraction Map based on a large number of
geo-tagged photos. Section 4 reports the result of the user study, and finally Sect. 5
concludes the paper.
2 Related Works
Commercial services such as Google Maps1, Google Earth2, and Panoramio3 are nowadays important tools for travelers to perform a visual survey before visiting a planned travel destination. However, it is usually difficult to find sites or areas of interest in a destination with which the user is not familiar by simply using these services.
Making use of the geo-tags attached to photos is a recent trend to support travel planning for such users. This type of research can be separated into two topics: 1)
travel route mining and recommendation [1, 3, 8, 9], and 2) landmark detection and
representative photo selection [2, 6, 10, 12].
The travel route mining and recommendation methods analyze sequences of geo-
tagged photos and propose routes that match the interests of a user. Since they focus
mostly on the sequence of geo-tags, they usually do not make use of the image
contents, except for Cheng et al.’s work [3] that infers user attributes from faces in
the images and matches them with the user’s attributes for the recommendation.
1 http://maps.google.com/
2 http://earth.google.com/
3 http://www.panoramio.com/
Since the local attraction could be any phenomenon from an object located exactly
on the spot to a terrain that covers a wide area, the size and the shape of the area
that covers a local attraction should be flexible. Thus, we decided to extract clusters
from the distribution of the geo-tagged photos instead of using fixed-sized shapes
(most likely, blocks).
For the clustering, we used the nearest neighbor method with the restriction that if the geographic distance between two clusters is larger than θ [km], they are not merged.
Figure 2 shows the result of the clustering for constructing the Local Attraction
Map shown in Fig. 1.
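A minimal sketch of this clustering rule is single-linkage agglomerative clustering over great-circle distances with the θ [km] merge restriction. This is our own illustration under that reading of the text; the authors do not publish code.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def cluster_photos(points, theta_km):
    """Single-linkage clustering: repeatedly merge the two nearest clusters
    until the smallest inter-cluster distance exceeds theta [km]."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(haversine_km(p, q)
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > theta_km:   # restriction from the text: clusters farther
            break          # apart than theta [km] are never merged
        clusters[i] += clusters.pop(j)
    return clusters
```

The O(n²) pairwise search is fine for a sketch; a real system over thousands of photos would use a spatial index instead.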
After the clustering, for each cluster, the type of local attraction is decided. As men-
tioned in Sect. 1, at this moment, we have only implemented certain static phenom-
ena, namely, scene categories as local attractions. We defined five scene categories
as shown in Fig. 3; “city”, “forest”, “water”, “flatland”, and “mountain”.
For the scene category classification, we implemented a five-class support vector
machine (SVM) classifier with a bag-of-features (BoF) representation of local image
features obtained by the scale-invariant feature transform (SIFT) algorithm [7] and
a normalized HSV color histogram as image features.
The classifier was trained on 16,689 categorized photos from the SUN database [11]. We manually selected 39 categories from the SUN database and mapped them onto the five scene categories as listed in Table 1. For reference, the classifiers had
Table 1 Correspondence of the scene categories and the SUN database categories.
City alley, amusement park, bridge, building, fountain, gazebo, house, market,
pagodas, place, railroad track, shopfront, street, temple, tower, village
Forest botanical garden, forest, forest path, park
Water bridge, canal, coast, creek, dam, hot spring, islet, lake, ocean, pond, river,
sea cliff, waterfall
Flatland amphitheater, badlands, desert, field
Mountain cliff, dam, mountain, sea cliff, valley
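The bag-of-features part of the image representation described above can be sketched as follows. This is an illustration in NumPy with toy dimensions; the paper's actual pipeline extracts 128-D SIFT descriptors, quantizes them against a learned codebook, and feeds the combined feature to the five-class SVM.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Bag-of-features: assign each local descriptor to its nearest
    visual word and return the normalized word histogram."""
    # squared distances between every descriptor and every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def image_feature(descriptors, codebook, hsv_hist):
    """Concatenate the BoF histogram with a normalized HSV color histogram;
    this combined vector is what the scene classifier would be trained on."""
    hsv = np.asarray(hsv_hist, dtype=float)
    return np.concatenate([bof_histogram(descriptors, codebook),
                           hsv / hsv.sum()])
```

Both parts are normalized so that images with different numbers of local features remain comparable.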
4 User Study
In order to evaluate the usefulness of the proposed Local Attraction Map, we performed a user study using the Local Attraction Map browsing interface shown in
Fig. 5. The interface allows users to browse the Local Attraction Map (left-hand
side), and at the same time, scan through photos that belong to the representa-
tive scene category for a specified cluster displayed in a separate panel (right-hand
side).
Fig. 5 The Local Attraction Map browsing interface. The picture browsing panel allows users
to browse geo-tagged photos that belong to a representative scene category at a location
specified in the Local Attraction Map panel.
Table 2 Parameters and conditions set to create the Local Attraction Map in Fig. 1.
Step 2. After explaining the functions of the Local Attraction Map browsing interface, we asked the subject to evaluate its usefulness for the purpose of performing a survey on a planned travel destination. The subjects selected from the following five candidates for the evaluation: Useful, Relatively useful, Neutral, Relatively useless, and Useless. They were also asked to provide reasons for their judgments.
• The map allows me to grasp at a glance the location and the types of scenes that I can expect at an unfamiliar travel destination.
• The map shows what (which scene category) most people shoot at a specific location.
• Since the local information is evaluated according to the number of
photos, the map may reveal hidden spots-of-interest.
• Even photos without tags can be classified and searched using the map.
• Different from Panoramio’s tag-based search, the map can show vari-
ous scene categories at the same time.
• It would be more useful if the map displayed not only photos from a
representative scene category, but rather popular photos for all cate-
gories.
• If the scene categorization were very accurate, the map could be use-
ful.
• The definition of the scene categories was ambiguous and difficult to
understand. The map is not useful unless they are more concrete con-
cepts.
• What happens if two different scene categories are present in a single
photo?
Following these reasonings, in the future, we will consider the following points in
order to improve the usefulness of the Local Attraction Map.
• Add a function that shows a ranked list of representative scene categories per
cluster.
• Modify the scene category classifier so that it could handle the situation where
multiple scene categories are present in a single photo.
• Improve the scene category classification accuracy by using a state-of-the-art
general object recognition method, and also by developing a classification method
that considers the inclusion relation between scene categories.
• In addition to the current scene categories, add those that represent non-static
phenomena, such as “Eating”, “Shopping”, “Playing (sports)”, and so on.
5 Conclusion
In this paper, we proposed a framework to construct a “Local Attraction Map” by
analyzing the social visual attention from a large number of geo-tagged photos. The
user study showed positive results, but at the same time we found several important
points that need improvement. In the future, we will work on these points, and also
try to perform an experiment and a user study on a larger scale.
Acknowledgements. Parts of this work were supported by Grants-in-aid for Scientific Re-
search from the Japanese Ministry of Education, Culture, Sports, Science and Technology.
References
1. Arase, Y., Xie, X., Hara, T., Nishio, S.: Mining people’s trips from large scale geo-tagged
photos. In: Proceedings of the 18th ACM International Conference on Multimedia, pp.
133–142 (2010)
2. Chen, W.C., Battestini, A., Gelfand, N., Setlur, V.: Visual summaries of popular land-
marks from community photo collections. In: Proceedings of the 17th ACM International
Conference on Multimedia, pp. 789–792 (2009)
3. Cheng, A.J., Chen, Y.Y., Huang, Y.T., Hsu, W.H., Liao, H.Y.M.: Personalized travel rec-
ommendation by mining people attributes from community-contributed photos. In: Pro-
ceedings of the 19th ACM International Conference on Multimedia, pp. 83–92 (2011)
4. Crandall, D., Backstrom, L., Huttenlocher, D., Kleinberg, J.: Mapping the world’s pho-
tos. In: Proceedings of the 18th International Conference on World Wide Web, pp. 761–
770 (2009)
5. Csurka, G., Bray, C., Dance, C., Fan, L., Willamowski, J.: Visual categorization with
bags of keypoints. In: Proceedings of the ECCV2004 International Workshop on Statis-
tical Learning in Computer Vision, pp. 1–22 (2004)
6. Gao, Y., Tang, J., Hong, R., Dai, Q., Chua, T.S., Jain, R.: W2Go: A travel guidance
system by automatic landmark ranking. In: Proceedings of the 18th ACM International
Conference on Multimedia, pp. 123–132 (2010)
7. Lowe, D.: Distinctive image features from scale-invariant keypoints. International Jour-
nal of Computer Vision 60(2), 91–110 (2004)
8. Lu, X., Wang, G., Yang, J., Pang, Y., Zhang, L.: Photo2Trip: Generating travel routes
from geo-tagged photos for travel planning. In: Proceedings of the 18th ACM Interna-
tional Conference on Multimedia, pp. 143–152 (2010)
9. Okuyama, K., Yanai, K.: A travel planning system based on travel trajectories extracted
from a large number of geotagged photos on the Web. In: Proceedings of the Pacific-Rim
Conference on Multimedia (2011)
10. Weyand, T., Leibe, B.: Discovering favorite views of popular places with iconoid shift.
In: Proceedings of the 13th IEEE International Conference on Computer Vision, pp.
1132–1139 (2011)
11. Xiao, J., Hays, J., Ehinger, K., Oliva, A., Torralba, A.: SUN Database: Large-scale scene
recognition from abbey to zoo. In: Proceedings of the 2010 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, pp. 3485–3492 (2010)
12. Zheng, Y.T., Zhao, M., Song, Y., Adam, H., Buddemeier, U., Bissacco, A., Brucher,
F., Chua, T.S., Neven, H.: Tour the World: Building a Web-scale landmark recognition
engine. In: Proc. 2009 IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, pp. 1085–1092 (2009)
Construction of Content Recording
and Delivery System for Intercollegiate Distance
Lecture in a University Consortium
1 Introduction
The consortium of higher education in Shinshu (Koutou Kyouiku Konsoshiamu
Shinshu) is a joint body with the aim of maintaining the individuality and
nurturing the talents within eight universities geographically dispersed around
Nagano Prefecture in central Japan. To meet these aims, the intercollegiate
distance learning system was set up in November, 2008 so that shared classes
could take place with faculty members and students from each of these
universities.
From April, 2009, we have operated the intercollegiate distance learning
system and been able to connect distance-learning lecture rooms in real time. For
Kizuku Chino
The Consortium of Higher Education in Shinshu
3-1-1 Asahi Matsumoto City, Nagano 390-8621 Japan
e-mail: morisita@shinshu-u.ac.jp
164 T. Morishita, K. Chino, and M. Niimura
example, we held the intercollegiate community event “K3 Salon” for faculty
members and students twenty-four times between May, 2009 and September,
2011, which promoted utilization and operational testing of the system[1,2].
Such a distance learning system has also been constructed at other universities
and consortia[3], and faculty members have used it for conferences entirely within
Shinshu University. However, in the case of intercollegiate distance lectures, students were not able to attend lectures because of differences in the lesson schedules of the universities. In addition, faculty members suggested that it was difficult to offer distance lectures without any discrimination between universities and to achieve the same educational effect as with face-to-face classes. To solve these problems, Morikawa et al. [4] offered distance lectures on the schedule of the backbone university. Alternatively, one practice used a time slot common to all the universities within a 120-minute window [5].
On the basis of the above previous studies, we suggest recording distance lectures and delivering the content in multiple formats to accommodate students who are not able to attend the lectures. We felt it necessary to construct the recording system within the existing system so that it would be easy to use for everyone already familiar with that system.
In this paper, we aim to construct a content recording and delivery system
based on the existing intercollegiate distance learning system in the consortium of
higher education in Shinshu.
2 Functionality Requirements
This system had to fulfill the following requirements for distance lectures.
Students are effectively able to attend the distance lecture. In this instance, the timetable of each university should be the same. In these classes, because teacher-student rapport is maintained, the educational effect should be equivalent to that of face-to-face classes.
The image and voice are transmitted to the content recording system, and the contents of lectures are created automatically. These contents are then transmitted to the content delivery system and delivered via VOD on an LMS (Learning Management System), so students are able to watch them at their convenience on a personal computer at home or in a study room with Internet access.
However, because there is no teacher-student rapport, it is necessary to provide
a method of communications such as LMS or E-mail, as above. In addition,
students also need their own accounts and passwords to log into the LMS on
which contents are published.
Construction of Content Recording and Delivery System 165
3 Configuration
To achieve the intercollegiate distance learning system with the above requirements, the following equipment and software are necessary.
[Figure: configuration of the distance learning system — a system control unit and a component/composite/RGB matrix switch driving three projectors and a display; a teleconference system with echo canceller, amplifier, and speakers; a touch panel; and a wireless LAN access point]
The mobile system can be taken to any classroom or laboratory that has internet access. A SONY PCS-XG80 is used in combination with at least a twenty-six-inch LCD TV. The entire system is set up on a rack on casters so that it can be moved easily, and it is wired so that only one plug needs to be connected to a power outlet.
(1) Set start-up-time, finish-time and rooms for distance lectures on the
booking system.
(2) Turn on distance learning system of each room automatically three
minutes before the start-up-time.
(3) Connect distance-learning lecture rooms to MCU automatically at the
start-up-time. In addition, start recording automatically with Polycom
RSS2000 teleconference recorder and content recorder in Polycom
HDX7000 at Shinshu University.
(4) Start distance lecture.
(5) Finish distance lecture at least one minute before the finish-time.
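Steps (1)–(5) amount to a fixed event timeline derived from each booking. A toy sketch of that timeline (our illustration, not the consortium's actual control software) might look like this:

```python
from datetime import datetime, timedelta

def lecture_schedule(start, finish):
    """Automated events for one booked distance lecture, following
    steps (1)-(5): power-on three minutes early, connect and record
    at start, and end at least one minute before the finish time."""
    return [
        (start - timedelta(minutes=3), "power on distance learning systems"),
        (start, "connect lecture rooms to MCU and start recorders"),
        (finish - timedelta(minutes=1), "lecture must end"),
        (finish, "disconnect rooms and stop recording"),
    ]
```

A booking system would emit such a timeline per reservation and hand each event to a scheduler.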
5 Construction
5.1 Cooperation
Table 1 shows necessary functions and received signal/data of each system. The
received signal/data on the table is necessary to cooperate with other systems and
work smoothly.
Table 2 shows necessary data of each content delivery form on booking.
learning system because the data of lectures is closely dependent upon faculty
members and students. So we clarified the necessary access authorities, and
designed and constructed an authentication system (Table 3).
We decided to use part of the LDAP authentication system of Shinshu University to authenticate users and manage access authorities. However, since conflicts were possible, we constructed a dedicated LDAP server for the consortium and made it work without contradictions.
Table 3 Access authority of each user

User | Booking System | Content Recording and Delivery System | Authentication System | LMS
Administrator | Full access | Full access | Full access | Full access
Booking Administrator | Full access | - | - | -
Authentication Administrator | - | - | Full access | -
Faculty Member | View only | Full access (only own class) | - | Full access (only own class)
Student | - | View only | - | View only
Guest | - | View only (only open class) | - | View only (only open class)
6 Evaluation
Each university has operated credit-transfer lectures within the intercollegiate
distance learning system since April, 2010, and 854 students registered for
fourteen distance lectures in the first semester, from April to September, 2011.
6.1 Support
For the first two weeks, a technical assistant visited each university from the
consortium and supported the delivery of distance lectures. The assistant gave out
a manual to use the distance learning system and instructed faculty members of
each university. In addition, another assistant in the consortium supported distance
lectures of each university by remote control.
After the above assistance period, faculty members of each university called the
technical assistants when there was a problem or question. The assistants
answered by telephone or E-mail, and supported by remote control as necessary.
According to the access log, 24.4% of all students watched the contents on weekends, and on weekdays the number of views increased after school hours, from 18:00 to 23:00 (Fig. 3). We therefore found that students were interested in contents for preparing for or reviewing lectures, and that this system was able to meet their needs. In addition, the contents possibly encouraged students to study by themselves outside of school hours.
In addition, 90.9% of students felt that the contents helped them to understand the class (Fig. 4). The contents were felt to be necessary by 98.8% of students, of whom 43.6% felt them indispensable (Fig. 5). We therefore found that students needed the contents to understand the lectures.
[Fig. 4 (pie chart): Strongly Agree 41.8%, Somewhat Agree 49.1%, Somewhat Disagree 8.5%, Strongly Disagree 0.6%, no answer 0.0%]
[Fig. 5 (pie chart): Absolutely Necessary 43.6%, Necessary 46.7%, Only as a Guide 8.5%, Unnecessary 1.2%]
7 Conclusions
We constructed a content recording and delivery system based on the existing
intercollegiate distance learning system to accommodate students who are not able
to attend lectures because of timetabling differences between each university in
the consortium of higher education in Shinshu.
We found that many of the contents recorded by this system were used at the convenience of each university and student over a semester of practice. As a result, we suggest that this system achieved the aim of this paper: it records lectures and delivers a range of content to accommodate students who are not able to attend. In addition, the contents possibly encouraged students to study by themselves outside of school hours.
In the future, we would like to analyze the status of content views for each
student and make clear their study records and efforts with this system.
References
1. Morishita, T., Niimura, M.: Quantitative Evaluation of an Intercollegiate Distance
Learning System. In: Proc. of World Conference on E-Learning in Corporate,
Government, Healthcare, and Higher Education 2009, pp. 3563–3568 (2009)
2. Morishita, T., Chino, K., Suzuki, H., Nagai, K., Niimura, M., Yabe, M.: Practice of K3
Salon with Intercollegiate Distance Learning System in the Consortium of Higher
Education in Shinshu. Journal for Academic Computing and Networking 14, 105–116
(2010) (in Japanese)
3. Sakurada, T., Hagiwara, Y.: Deployment of HD Videoconference System for Remote
Lectures at 18 National Universities. IEICE Technical Report, IA2008-82 108(460),
91–95 (2009) (in Japanese)
4. Morikawa, H., Ruangrassamee, A., Chen, H.: A Practice of International Distance
Lecture through the Internet. Geotechnical Engineering Magazine, JGS Ser. No.
610 56(11), 34–35 (2008) (in Japanese)
5. Terao, Y.: On Distance Education of Universities using SCS and its Evaluation. The
Journal of School Education 14, 179–184 (2002) (in Japanese)
Data Embedding and Extraction Method
for Printed Images by Log Polar Mapping
Abstract. Methods for extracting data from data-embedded printed images using capturing devices such as scanners or mobile cameras have attracted much attention. In this technique, handling geometrical deformation, especially rotation and scaling, in the data extraction process is essential. This paper proposes a new correction algorithm that requires no conspicuous markers. The method exploits the log polar mapping (LPM) to detect and correct the distortion, estimating the rotational angle and scaling factor from the amount of shift in the LPM domain. Therefore, no additional data for correcting the deformation needs to be embedded, and a data-embedded image of high quality is obtained. An image clipping technique suitable for the proposed method is also developed. Experimental results show the effectiveness of the proposed method.
1 Introduction
Recently, QR codes are frequently used to help access websites from many printed advertisements and publications. QR codes are among the most widely used two-dimensional codes in Japan. However, a two-dimensional code degrades the appearance of the original design of the printed matter, since it looks like a conspicuous mosaic-shaped figure. Therefore, data embedding technology, which directly embeds data into an image on the printed matter, has been proposed and has attracted attention [1], [2], [3].
The data embedding technology is an application based on digital watermarking [4]. The severe influence of printers and image scanning devices should be considered. In particular, geometrical distortion caused by scanning, such as rotation, has a strong influence, and some countermeasure against it is required. In this case, a reference mark is usually required to correct the geometrical distortion; the mark is used as the cue by which the amount of distortion is measured.
174 K. Tamaki, M. Muneyasu, and Y. Hanada
for giving tolerance to the geometrical transformation [2]. Moreover, the same message bit sequence is embedded in three lines centering on this embedding region, and this redundancy gives robustness against the disturbance caused by printing and image capturing.
[Figure: embedding layout on the printed image — reference marks embedded in the DFT domain and message bits in the DCT domain, with shadowed areas, margins, and the embedding base line indicated]
After the rotation correction, the corrected image should be resized. In this paper,
the following method is used.
For the extraction of the message bits, the difference W′ between the DCT coefficients Dw′ of the luminance component of the captured image and the DCT coefficients D of the original image is calculated. This is expressed by

W′ = Dw′ − D (2)
The information data bits are then detected from the inner product of the predefined diffusion codes and W′. If the value of the inner product is positive, the information bit is judged to be 0; if negative, it is judged to be 1. A majority decision is also applied to the three extracted message bit sequences.
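The detection rule of Eq. (2) and the majority decision can be sketched as follows. This is our illustration with toy inputs, not the authors' implementation; in practice W′ and the diffusion codes are vectors of DCT coefficients.

```python
import numpy as np

def extract_bits(w_diff, codes):
    """Judge each information bit from the sign of the inner product of
    W' (the DCT-coefficient difference of Eq. (2)) with its diffusion
    code: positive -> 0, negative -> 1."""
    return [0 if float(np.dot(w_diff, c)) > 0 else 1 for c in codes]

def majority_decision(seq_a, seq_b, seq_c):
    """Majority decision over the three redundantly embedded sequences."""
    return [1 if a + b + c >= 2 else 0 for a, b, c in zip(seq_a, seq_b, seq_c)]
```

The redundancy means a single corrupted line (e.g., from print noise) does not flip the final bit.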
3 Proposed Method
This paper proposes a new correction algorithm that requires no conspicuous markers. The method is based on the LPM and POC. The proposed method estimates the rotational angle and scaling factor from the amount of shift in the LPM domain. Therefore, no additional data for correcting the deformation needs to be embedded, and a data-embedded image of high quality is obtained. An image acquisition technique suitable for this method is also developed. In this section, the embedding and detection methods for the message bits are the same as those of the marker embedding method, and we omit their details.
3.1 LPM
The LPM transforms the orthogonal coordinate system into a polar coordinate system whose radius axis is expressed on a log scale. First, the point (x, y) in the orthogonal coordinate system is transformed into the polar coordinate system:

r = √(x² + y²),  θ = tan⁻¹(y / x).    (3)
Data Embedding and Extraction Method for Printed Images 177
This equation shows that ln σ and α in this domain represent the effects of scaling and rotation on the captured image. Therefore we can estimate the amount of scaling and rotation from the amount of translation in the LPM domain. The POC method [6] is used to estimate the amount of translation in the LPM domain; it can compute the matching between images effectively and accurately by using the phase correlation. The amount of translation obtained by the POC method is exploited to estimate the amount of rotation and scaling.
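A minimal phase-only correlation (POC) routine, sketched from the standard formulation (the inverse transform of the phase-normalized cross-power spectrum peaks at the displacement); the subpixel refinement of [6] is omitted here.

```python
import numpy as np

def poc_shift(f, g):
    """Return the integer cyclic shift d such that f ~ np.roll(g, d, axis=(0, 1)).

    The phase of F * conj(G) encodes the translation; its inverse FFT is a
    delta-like peak at the displacement.
    """
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    cross = F * np.conj(G)
    cross /= np.abs(cross) + 1e-12           # keep only the phase
    corr = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # map peak indices to signed shifts (wrap-around convention)
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))
```

In the proposed method such a routine would be applied to the two LPM images, so the recovered translation corresponds to rotation and log-scale.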
R = ⌊r_max⌋ × n_r = ⌊√2 · X / 2⌋ × n_r,    (10)

where the size of the image is X × X, ⌊·⌋ denotes rounding down, and n_r is the parameter that specifies the precision of the radius axis.
Figure 6 shows the relationship between the orthogonal coordinate system and the log-polar coordinate one. The relationship between a point (a, b) on the log-polar coordinate and a point (x, y) on the orthogonal coordinate is expressed as

x = e^{b · ln(r_max) / (R − 1)} · cos(a / n_w),
y = e^{b · ln(r_max) / (R − 1)} · sin(a / n_w).    (11)

From this relation, we find that a lattice point in one coordinate system does not always correspond to a lattice point in the other. Therefore an appropriate interpolation is required; in this paper, bilinear interpolation is adopted.
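A sketch of the resampling in eq. (11) with bilinear interpolation; sampling around the image centre and zero-filling of points that fall outside the image are our assumptions, not stated in the paper.

```python
import numpy as np

def log_polar(img, n_w=1, n_r=1):
    """Resample a square X-by-X image onto a log-polar grid via eq. (11).

    Angular axis: 360 * n_w samples; radius axis: R = floor(r_max) * n_r
    samples on a log scale, with r_max = sqrt(2) * X / 2.
    """
    X = img.shape[0]
    r_max = np.sqrt(2.0) * X / 2.0
    R, A = int(np.floor(r_max)) * n_r, 360 * n_w
    b = np.arange(R)[:, None]                   # log-radius index
    a = np.arange(A)[None, :]                   # angular index
    r = np.exp(b * np.log(r_max) / (R - 1))     # eq. (11) radius term
    theta = np.deg2rad(a / n_w)
    cx = (X - 1) / 2.0                          # centre of the image (assumed)
    x, y = cx + r * np.cos(theta), cx + r * np.sin(theta)
    # bilinear interpolation between the four surrounding lattice points
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    valid = (x0 >= 0) & (x0 < X - 1) & (y0 >= 0) & (y0 < X - 1)
    x0, y0 = np.clip(x0, 0, X - 2), np.clip(y0, 0, X - 2)
    fx, fy = x - x0, y - y0
    out = ((1 - fx) * (1 - fy) * img[y0, x0] + fx * (1 - fy) * img[y0, x0 + 1] +
           (1 - fx) * fy * img[y0 + 1, x0] + fx * fy * img[y0 + 1, x0 + 1])
    return np.where(valid, out, 0.0)
```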
[Fig. 6: the orthogonal coordinate system (x, y) and the log-polar coordinate system (θ, ln r).]
Let x_t and y_t be the numbers of translation pixels along the angular and radius axes, respectively; then the rotation angle α and the scaling factor σ can be obtained by

α = x_t / n_w,  σ = e^{−y_t · ln(r_max) / (R − 1)}.    (12)
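Eq. (12) then converts the LPM-domain translation found by POC into the rotation angle and scaling factor; a direct transcription (α comes out in degrees when the angular axis is sampled at n_w points per degree):

```python
import numpy as np

def rotation_and_scale(x_t, y_t, r_max, R, n_w):
    """Eq. (12): alpha = x_t / n_w, sigma = exp(-y_t * ln(r_max) / (R - 1))."""
    alpha = x_t / n_w
    sigma = np.exp(-y_t * np.log(r_max) / (R - 1))
    return alpha, sigma
```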
4 Experimental Result
To show the effectiveness of the proposed method, we compare it with the conventional method [4]. In this experiment, a Canon LBP5400 printer with a resolution of 600 dpi and an EPSON GT-X770 image scanner with a resolution of 300 dpi were used. Five grayscale images of size 512 × 512 were selected; Figure 7 shows an example of an original image. The value of the reference mark was 320,000 for the conventional method. 48 bits were embedded and the gain coefficient was 9 for both methods. We also set n_w = 5, n_r = 2, h = 5 and bl = 15, experimentally. Figure 8 shows an example of the data-embedded image.
First, to evaluate image quality, the peak signal-to-noise ratio (PSNR) was adopted. The result is shown in Table 1. From this result, the image quality of the proposed method is superior to that of the conventional one.
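PSNR here is the standard definition; a sketch assuming 8-bit images (peak value 255):

```python
import numpy as np

def psnr(original, embedded, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means the embedding
    distortion is less visible."""
    mse = np.mean((np.asarray(original, float) - np.asarray(embedded, float)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```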
Next, the data detection rate was evaluated. Table 2 shows the average detection rates over 15 trials. The detection rate of the proposed method is over 95% for each image and is almost equivalent to that of the conventional one. Superior results were obtained for the airfield and Goldhill images, whose backgrounds are nearly white, since the resize method of the conventional approach is strongly affected by the intensity of the background.
5 Conclusion
In this paper, a new correction method for the captured image, for extracting the embedded data from a printed image, has been proposed. This method estimates the rotation angle and scaling factor from the amount of translation in the LPM domain. Compared to the conventional method, the image quality is improved, since no markers are embedded. The use of the POC method enables the proposed method to detect the data from images whose background is nearly white. Experimental results confirmed the effectiveness of the proposed method.
References
1. Mizumoto, T., Matsui, K.: Robustness Investigation of DCT Digital Watermark for Printing and Scanning. Trans. of IEICE (A) J85-A(4), 451–459 (2002)
2. Nakanishi, K., Shono, M., Muneyasu, M., Hanada, Y.: Data Detection from Data Embedding Printing Images Using Cellular Phones with a Camera. In: Proc. SISB 2008, pp. 111–114 (2009)
3. Kudo, H., Furuta, K., Muneyasu, M., Hanada, Y.: Automatic Information Retrieval from Data Embedded Printing Images Using Correction of Rotational Angles Based on Reference Marks. In: Proc. 2010 ISCIT, pp. 626–629 (2010)
4. Cox, I.J., Miller, M.L., Bloom, J.A.: Digital Watermarking. Morgan Kaufmann Publishing, San Francisco (2002)
5. Zheng, D., Zhao, J., Saddik, A.E.: RST-Invariant Digital Image Watermarking Based on Log-Polar Mapping and Phase Correlation. IEEE Trans. on Circuits Syst. Video Technology 13(13) (2003)
6. Foroosh, H(S.), Zerubia, J.B., Berthod, M.: Extension of Phase Correlation to Subpixel Registration. IEEE Trans. on Image Process. 11(3), 188–200 (2002)
7. Ruanaidh, J.J.K.O., Pun, T.: Rotation, scale and translation invariant spread spectrum digital watermarking. Signal Process. 66(3), 303–317 (1998)
Design and Implementation of Computer
Assisted Training System for Nursing Process
Learning
Abstract. Nurses are required to grasp the condition of each patient from physical, mental, social, and spiritual perspectives. From that grasp, nurses realise both disease control, which enables observation and treatment, and support of the patient's daily living. This work process is called the "nursing process". Teaching assessment skill is one of the most important assignments in understanding the nursing process, and a support system for mastering assessment skills has long been awaited. We developed CASYSNUPL, a Computer Assisted System for Nursing Process Learning, and have used the system in lectures and practices. It lets learners picture a medical case through media files and supports learning on a stand-alone computer through a template file built with MS-Excel VBA, so learners can understand the nursing process much as learners taught in one-to-one practice do. Learners evaluated CASYSNUPL as helpful to their understanding of the nursing process.
1 Introduction
A nurse's responsibility is to care for each patient on an individual basis. Therefore, nurses are required to grasp the condition of each patient comprehensively, from physical, mental, social, and spiritual perspectives. From this grasp, nurses realise both disease control, which enables observation and treatment, and support of the patient's daily living (figure 1). This work process is called the "nursing process", and it is a series of steps involving assessment, nursing diagnosis, planning,
Yukuo Isomoto
Faculty of Human Life and Environmental Science, Nagoya Women’s University
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 183–189.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
184 S. Takami et al.
Given this situation, there is much demand for an effective system to support learning the nursing process.
The method of thinking that nurses use when providing care for patients is called POS. It is an important part of the fundamentals of nursing, so POS has long been taught. POS resembles the natural thinking of everyday life, but in the medical and nursing sciences it requires critical thinking, based on knowledge of medicine, nursing, psychology, sociology and so on, to clarify patients' problems. It is therefore hard for learners to master POS as a specialist skill.
To master the POS of the nursing process, learners need to practice more than ten cases through paper simulation or one-to-one practice with senior nurses, which requires enormous human resources. To resolve this situation, in 2005 we developed CASYSNUPL, a Computer Assisted System for Nursing Process Learning: a system that provides graphical images of a standardized patient and medical environments and lets learners solve problems in inductive and deductive ways by themselves. We have made use of the system in lectures and practices.
[Figure: system architecture of CASYSNUPL. A student-side Excel (learning) template edits downloaded case data, supported by case images (movies, photos, illustrations); a teacher-side client browser downloads students' edited case data and uploads new case data; an Excel (teaching) template creates case data; student and server exchange the data.]

[Figure: the nursing-process flow handled by the system, from synthesis 1..n through a related chart to diagnosis 1..n, a problem list, nursing care plans, and nursing results with evaluation, connected by deductive inference and exchanged between student and server.]
ii. Most learners who study the nursing process through classroom lectures did not receive enough instruction, less than learners given one-to-one training. CASYSNUPL has a function for collecting the case data edited by learners, so we could give personal, friendly teaching to all learners.
iii. We had two-way communication between teachers and learners. This communication leads learners to greater motivation to learn.
The questionnaire results (percentages of respondents) were as follows.

Question | Strongly agree | Agree | Neither agree nor disagree | Disagree | Strongly disagree
Did you understand the overview of the nursing process? | 9.2 | 65.5 | 19.5 | 4.6 | 1.1
Did you understand the series of nursing process steps such as assessment, nursing diagnosis, nursing care plan, intervention, and evaluation? | 8.4 | 64.4 | 10.3 | 5.7 | 1.1
Could you form a correct image of a nursing support system? | 8.0 | 35.6 | 44.8 | 8.0 | 3.4
Can you think in both inductive and deductive ways? | 9.2 | 25.3 | 47.1 | 14.9 | 3.4
Do you want to continue to study the nursing process with CASYSNUPL? | 48.3 | 28.7 | 12.6 | 9.2 | 1.1
6 Prospects of CASYSNUPL
CASYSNUPL is used by some nursing institutions including our college. Each in-
stitution teachers created case data for their students. These data is recorded by the
database of CASYSNUPL, so learners can share all the case data in it.
Teachers can add case data by their own free will, so case data will increase
with implementation. Learners will be able to use more case data, but it is ex-
pected that they will not have enough knowledge to select their proper case data.
The range of applicable patient cases is wide, so the system has been operated across multiple nursing areas. In other nursing institutions, CASYSNUPL can operate by creating case data according to each institution's educational policies. At present, CASYSNUPL operates on case data for fictional patients, but it can also operate on case data from real patients.
In the future, we hope that CASYSNUPL will be used in more nursing institutions, so we aim at extensibility for adding functions such as searching large sets of case data and multilingual support.
7 Summary
CASYSNUPL is a system for training "thinking", not "knowledge". A feature of CASYSNUPL is that it can train learners' thinking skills, and it has achieved results in doing so.
On the other hand, we have some problems.
We hope to solve these problems and develop the education of nursing skills through the practical use of CASYSNUPL.
Designing Agents That Recognise and Respond
to Players’ Emotions
Weiqin Chen
Abstract. Giving agents the ability to sense, recognise and appropriately respond to human emotions is one of the main ways to make agents more believable. In intelligent tutoring systems, learners' affective states have been incorporated into adaptive feedback. In the game industry, however, players' emotions have not received enough attention. Games normally do not take players' emotions into account during play, and non-player characters do not respond to players' emotional status. We argue that adapting to players' emotions can make non-player characters more believable and the game more enjoyable.
1 Introduction
Weiqin Chen
Department of Information Science and Media Studies, University of Bergen,
Bergen, Norway
Oslo and Akershus University College of Applied Sciences, Oslo, Norway
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 191–200.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
approach. More precise modelling based on sensor input should provide better data for adapting agents' actions. In this research we use the Emotiv EPOC and a neural network to detect players' emotions. The agent is designed based on the data input from the Emotiv EPOC device and knowledge of how people react in different emotional states.
The research questions we address in this research are:
2 Emotions in Games
Researchers in affective computing have made considerable efforts in modelling
users’ affect and synthesizing emotions in software agents (Picard 1997).
Emotions have been incorporated into action selection and planning (Aylett et al.
2006). Affective expressions have been implemented in virtual characters (Paiva
2000). In order to allow for powerful emotional experiences, Sundström (Sundström 2005) defined the "affective loop", an interactive process in which 1) users recognise the emotional state of the system/agent and relate it to their own affective state; and 2) the system/agent in turn recognises the affective state of users and performs an equivalent integrative process.
Some systems are able to recognize affect without using any hardware sensors. In such systems, affect is recognised by making inferences from the users' actions. Since emotions are complex and are expressed in different modalities (e.g. facial expressions, voice, gestures and physiological signals), some systems have explored a multimodal approach to detecting affect. Studies have shown improvements in performance from combining contextual information and physiological signals (Conati and Maclaren 2009). Some researchers have explored the potential of electroencephalogram (EEG) devices for affect detection. Notably, affective tutoring systems that adapt not only to learners' cognitive states but also to their affective states have recently attracted the interest of a growing community of researchers (Arroyo et al. 2009; Heraz et al. 2007; D'Mello and Graesser 2010). Various physiological sensors that capture EEG and EMG signals, skin conductance levels, heart rate, and respiration rate have been used to provide adaptive feedback to learners.
In game research, some efforts have been made to take players' affective states into consideration. For example, Hudlicka (Hudlicka 2009) proposed affective game engines that would provide functionality to support the recognition of user and game-character emotions, real-time adaptation and appropriate responses to these emotions, and more realistic expression of emotions in game characters and user avatars. Other research on emotions in games has also incorporated players' emotional states into player modelling (Yannakakis et al. 2010; Garbarino et al. 2011; Kim et al. 2004; Martinez and Yannakakis 2010). Some researchers have studied gameplay emotions that arise from players' actions in the game and the consequent reactions of the game (Perron 2005). These emotions have the potential to be used in providing adaptive responses.
Designing Agents That Recognise and Respond to Players’ Emotions 193
In commercial games, however, the focus has mainly been on evoking players' emotions through a series of structuring and writing techniques, as in "Emotioneering" (Freeman 2003). Players' affective states have not been widely used to provide adaptive gameplay.
The sequence of Games A and B was mixed to prevent the effect of players simply becoming better at the game. After playing through Games A and B, they were asked whether they had observed any differences between the two games and, if so, which differences they found and whether they thought one of the games was more fun than the other.
Although four of the six players thought that it was more fun to play against emotional agents, most of them were unable to identify the agents' emotions. Only two players managed to pick an emotion that the agent actually had towards them after Game 0. We can argue that the players did not know how emotions would be expressed through the actions of the agents and therefore had difficulty identifying agent emotions. When human players play games face to face, they can see each other's facial expressions, body language and other cues. These cues provide important information for judging emotional status and making decisions accordingly. When human players play with agents in the game, the only information they can rely on is the actions of the agents. The same applies to agents: when agents play with human players, they can only rely on the
actions taken by players. There are no other cues in the mechanism that can help agents better understand the emotional status of players. This is the main motivation for our current research on using an electroencephalogram (EEG) device in StateCraft.
EEG devices capture brainwaves by measuring the electrical activity on the scalp. The EEG device we use in this research is the Emotiv EPOC. It has 14 electrodes/channels, comparable to medical brain-computer interfaces, which normally have 19 electrodes. A service program can automatically capture the raw EEG signals coming from each of the channels. It can detect 14 cognitive actions, including movements and rotations. It can also detect expressive actions such as facial expressions and eye- and eyelid-related expressions. In addition, it can detect affective states such as engagement, instantaneous excitement, and long-term excitement without any need for training.
Emotiv EPOC has mainly been used in games to let human players manipulate and control games with the brain instead of the hands. It has not been used to identify players' emotions and to use those emotions to improve the realism of the emotional responses of AI characters in games.
• receive() is the method that receives the GameState and the diplomatic messages through its input line. This ensures that the EmotivInterpreter receives the GameState object each action round and passes it to all of the emotions in the emotion list. Each SupportRequestMessage and AnswerSupportRequestMessage is also passed through the receive() method so that it keeps track of the agreements made in the preceding round.
• suppress() is the method that suppresses the TacticList from the ChooseTactic module and allows the module to remove tactics or change their values.
• inhibit() is the method that inhibits the outgoing AnswerSupportRequestMessage. The message is stored until the next round, when the agent will check whether it performed the support operations it promised.
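The three methods above can be sketched as a minimal Python interface. The class and message names follow the text (EmotivInterpreter, GameState, the TacticList entries), but the bodies are illustrative stand-ins of ours, not the authors' implementation.

```python
class GameState:
    """Stand-in for the per-round game state object."""

class EmotivInterpreter:
    def __init__(self, emotions):
        self.emotions = emotions      # emotion objects updated every round
        self.agreements = []          # messages tracked from the preceding round

    def receive(self, message):
        # Pass each GameState to every emotion in the emotion list; keep
        # support request / answer messages to check agreements next round.
        if isinstance(message, GameState):
            for emotion in self.emotions:
                emotion.update(message)
        else:
            self.agreements.append(message)

    def suppress(self, tactic_list):
        # Remove tactics or change their values before ChooseTactic acts;
        # here we simply drop tactics whose weight has fallen to zero.
        return [t for t in tactic_list if t["weight"] > 0]

    def inhibit(self, answer):
        # Withhold the outgoing answer until the next round, when the agent
        # checks whether it performed the promised support operations.
        self.agreements.append(answer)
        return None
```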
• Game states (6): number of provinces, supply centres, armies, fleets, supply-centre surplus, and occupied neighbours.
• Number of supply centres of each player (7).
• Interaction between players (6): number of supply centres stolen, provinces stolen, accepted and denied support requests, and sent support requests that were accepted or denied.
• Emotiv EPOC readings (14).
The output layer of the neural network is the emotion and its intensity. As mentioned in the previous section, there are eight different emotions experienced by players in the game: joy, loyalty, guilt, fear, anger, shame, relief and disappointment.
In the training process for the neural network, players are required to self-report their emotions. An emotion self-report window pops up after each round. The intensity of each of the eight emotions can be reported using a sliding bar with values from 1 to 100. The self-reported emotions serve as the target output for the neural network during training.
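As a hedged sketch of such a network: the 33 inputs (6 game-state, 7 supply-centre, 6 interaction, 14 Emotiv EPOC features) and 8 output intensities follow the text; the hidden-layer size, the rescaling of the 1–100 reports to (0, 1), and the plain gradient-descent rule are our assumptions.

```python
import numpy as np

N_IN, N_HIDDEN, N_OUT = 6 + 7 + 6 + 14, 16, 8     # 33 features -> 8 emotions

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (N_IN, N_HIDDEN))
W2 = rng.normal(0.0, 0.1, (N_HIDDEN, N_OUT))

def forward(x):
    """Predict the eight emotion intensities in (0, 1) for one feature vector."""
    h = np.tanh(x @ W1)
    return 1.0 / (1.0 + np.exp(-(h @ W2)))

def train_step(x, target, lr=0.1):
    """One gradient step on squared error against the self-reported
    intensities (sliding-bar values 1..100 rescaled to 0..1)."""
    global W1, W2
    h = np.tanh(x @ W1)
    y = 1.0 / (1.0 + np.exp(-(h @ W2)))
    dy = (y - target) * y * (1.0 - y)     # sigmoid-output delta
    dh = (dy @ W2.T) * (1.0 - h ** 2)     # tanh-hidden delta
    W2 -= lr * np.outer(h, dy)
    W1 -= lr * np.outer(x, dh)
    return float(np.mean((y - target) ** 2))
```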
The decision-making process takes into account the player's emotion and its intensity. Based on the emotion (with intensity) and the game state, the agent predicts the player's next action. A list of rules is used to generate predictions. For example, angry players tend to make aggressive moves towards their opponents; fear causes players to make defensive moves; and a player who feels guilty tends to make cooperative moves, in other words, will most likely accept a Support Request and actually provide the support. These rules were identified from interviews with players; for more details see (Chen et al. 2011b). The predicted actions of players are then used to decide what actions the agent will take. In this process the weights of the actions in the TacticList are changed accordingly.
5 Finishing Words
In this paper we have presented a new emotion module for agents in a multiplayer strategy game. This work extends our earlier research, in which we found that it is difficult for an agent to identify human players' or other agents' emotional states based purely on their behaviour. The new module makes use of the EEG signals from the Emotiv EPOC to identify players' emotions, which helps agents provide adaptive responses to them. The research aims to make the agent more believable and the game more enjoyable.
The module is under development, and the decision-making process is not yet fully functional. When it is completely implemented, we plan to conduct two user studies to address the two research questions presented in Section 1.
The first study will be similar to our previous user study where players will be
asked to play three games:
• Game 1: players use Emotiv EPOC and play against all 6 new emotional agents
(agents with the new emotion module)
• Game 2: players do not use Emotiv EPOC and play against all 6 old emotional
agents (agents with the previous emotion module)
• Game 3: players do not use Emotiv EPOC and play against all 6 regular agents (agents without the emotion module)
The sequence of the games will be mixed to prevent a learning effect. After playing through the three games, players will be asked whether they observed any differences among the three games and, if so, which differences they found and whether they thought the agents in one of the games were more believable or one of the games was more fun than the others.
The second study is similar to a Turing test: players wearing the Emotiv EPOC will play against a mixture of human players and new emotional agents. After the games, they will be asked to identify which of the countries were played by agents and which by human players.
References
Arroyo, I., Cooper, D.G., Burleson, W., Woolf, B.P., Muldner, K., Christopherson, R.:
Emotion sensors go to school. In: Dimitrova, V., Mizoguchi, R., Boulay, B.D., Graesser,
A. (eds.) Proc. AIED 2009, pp. 17–24. IOS Press (2009)
Aylett, R.S., Dias, J., Paiva, A.: An affectively-driven planner for synthetic characters. In:
Long, D., Smith, S.F., Borrajo, D., McCluskey, L. (eds.) Proc. ICAPS 2006, pp. 2–10.
AAAI Press (2006)
Bates, J.: The role of emotion in believable agents. Communications of the ACM 37(7),
122–125 (1994)
Brooks, R.: Elephants don’t play chess. Robotics and Autonomous Systems 6(1&2), 3–15
(1990)
Chen, W., Carlson, C., Hellevang, M.: Emotional agents in a social strategic game. In:
König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J. (eds.) Proc. KES 2011,
pp. 239–248. Springer, Heidelberg (2011a)
Chen, W., Carlson, C., Hellevang, M.: The Implementation of Emotions in a Social Strate-
gy Game. Paper presented at the GET 2011, Rome, Italy (2011b)
Conati, C., Maclaren, H.: Empirically building and evaluating a probabilistic model of user
affect. User Modeling and User-Adapted Interaction 19(3), 267–303 (2009)
D’Mello, S., Graesser, A.C.: Multimodal semi-automated affect detection from conversa-
tional cues, gross body language, and facial features. User Modeling and User-adapted
Interaction 20(2), 147–187 (2010)
Freeman, D.E.: Creating Emotion in Games: The Craft and Art of Emotioneering. New
Riders Games (2003)
Garbarino, M., Matteucci, M., Bonarini, A.: Affective Preference from Physiology in
Videogames: A Lesson Learned from the TORCS Experiment. In: D’Mello, S., Graess-
er, A., Schuller, B., Martin, J.-C. (eds.) ACII 2011, Part II. LNCS, vol. 6975, pp. 528–
537. Springer, Heidelberg (2011)
Heraz, A., Razaki, R., Frasson, C.: Using machine learning to predict learner emotional
state from brainwaves. In: Spector, J.M., Sampson, D.G., Okamoto, T., et al. (eds.) Proc.
ICALT 2007, pp. 853–857 (2007)
Hudlicka, E.: Affective game engines: motivation and requirements. In: Whitehead, J.,
Young, R.M. (eds.) Proc. the 4th International Conference on Foundations of Digital
Games (FDG 2009), pp. 299–306. ACM (2009)
Kim, J., Bee, N., Wagner, J., André, E.: Emote to win: affective interactions with a com-
puter game agent. GI Jahrestagung 1, 159–164 (2004)
Krzywinski, A., Chen, W., Helgesen, A.: Agent architecture in social games – the imple-
mentation of subsumption architecture in Diplomacy. In: Darken, C., Mateas, M. (eds.)
AIIDE 2008, pp. 100–104. AAAI Press (2008)
Martinez, H.P., Yannakakis, G.N.: Genetic search feature selection for affective modeling:
a case study on reported preferences. In: Castellano, G., Karpouzis, K., Martin, J.-C.,
Morency, L.-P., Peters, C., Riek, L. (eds.) 3rd International Workshop on Affective
Interaction in Natural Environments (AFFINE 2010), pp. 15–20. ACM (2010)
Ortony, A., Clore, G.L., Collins, A.: The cognitive structure of emotions. Cambridge
University Press, Cambridge (1988)
Paiva, A. (ed.): Affective Interactions: toward a new generation of computer interfaces.
Springer, New York (2000)
Perron, B.: A Cognitive Psychological Approach to Gameplay Emotions. Paper Presented
at the DIGRA 2005, Vancouver, British Columbia, Canada (2005)
Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997)
Sundström, P.: Exploring the Affective Loop. Licentiate thesis, Stockholm University
(2005)
Yannakakis, G.N., Martínez, H.P., Jhala, A.: Towards affective camera control in games.
User Modeling and User-Adapted Interaction 20(4), 313–340 (2010)
Development of Agent-Based Model
for Simulation on Residential Mobility Affected
by Downtown Regeneration Policy
Abstract. In recent decades, the compact city has become a new concern of urban planning in many Japanese cities. Some local governments in Japan aim to realize the compact-city pattern through policy intervention, such as encouraging households to move from the suburbs to downtown and thereby relieving the population decline in urban centre areas. Recently one such residential policy has been discussed by the local government of Kanazawa City, Japan. This policy seeks to attract and encourage households to move downtown by offering a local housing allowance. The contribution of this work is an agent-based Household Residential Relocation Model (HRRM) for visualizing the effect of this residential policy. HRRM is built on household interaction through housing relocation choice and policy attitude, so it can simulate the diversified decisions of households across all their lifecycle stages. Through simulation with HRRM, the effectiveness of this residential policy can be visualized, helping the local government assess its effect.
1 Introduction
Today, policy prescription has increasingly favoured a compact-city approach to relieve the negative sides of urban decline [Haase et al., 2010; Rieniets, 2006; Howley et al., 2009]. The local government of Kanazawa City, Japan, has published a series of downtown regeneration policies to vitalize the downtown and make
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 201–211.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
202 Z. Shen et al.
their city more compact. In this work, we focus on a downtown regeneration policy for improving the residential environment in a local city (hereafter, the residential promoting policy), which aims to revitalize downtown by drawing residents who move to downtown with allowances for their residential relocation. We attempt to demonstrate the possible implementation effects of this policy through Agent-Based Modelling (ABM).
As shown by existing research [Jager and Mosler, 2007], ABM is expected to contribute significantly to the study of behavior-environment interactions and to provide a valuable tool for exploring the effectiveness of policy measures in complex city environments. This characteristic of ABM makes it a practical approach for simulating a policy and, at the same time, revealing its effectiveness. The purpose of this research is to develop an agent-based Household Residential Relocation Model (hereafter, HRRM) to visualize household residential mobility and thereby reveal the possibility of reflecting the implementation effects of this policy in the local city of Kanazawa.
Regarding the residential location issues incorporated into the simulation model, stated-preference experiments have primarily focused on determining transport characteristics versus work- and location-related variables for residential location [Kim et al., 2005; Molin and Timmermans, 2003; Rouwendal and Meijer, 2001]. These experiments are mostly based on individual behavior research, which provides credible variables for further research, especially for microsimulation or agent-based simulation. In the following we introduce how HRRM is built and how it works.
that location. Before relocating, a household first evaluates the utility of its current location. This process produces a satisfaction evaluation of the household's current location and thereby determines a relocation desire. After this, households focus on finding new houses somewhere within the urban area. We use utility theory to build the third module, which simulates the relocation choice process. The relocation choice module helps households compare the utility of residential locations in different urban areas and interact dynamically with other households to reach relocation decisions.
Meanwhile, interactions between household agents are introduced to represent the influence of the residential promoting policy. In this work, the interactions take place on two levels: between agents at the neighborhood level and between agents at the global level. These interactions are used to simulate household attitudes towards the policy, measured in this paper by the number of residents who would accept the policy and move downtown, based on a questionnaire investigation regarding the policy. For model testing, we estimated the necessary coefficients using the questionnaire carried out in Kanazawa City and visualized household residential mobility in a virtual space of Kanazawa City. By comparing the simulated relocation choices with the real ratio of household relocation to downtown Kanazawa, the possibility of using ABM to represent the effects of the residential promoting policy in Kanazawa City can be illustrated.
3 Description of HRRM
households may become pregnant and would probably move again for their babies
(we assume that households less than 45 years old can give birth to a new
generation of households, according to the local birth rate of 0.9%). In the third
stage a new generation of households is created and, as shown by another dashed
line, when the agent grows up he becomes independent (goes to college or gets a
job at 18 years old) and begins his first lifecycle stage. During this process, the old
household that gave birth to the new household continues its life and finally
reaches its last lifecycle stage (a death rate of 0.8% is used to eliminate household
agents).
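The lifecycle mechanics described above (independence at 18, births for households under 45 at the 0.9% rate, elimination at the 0.8% death rate) can be sketched as follows; this is an illustrative toy simulation, not the paper's implementation, and the Household class and function names are hypothetical:

```python
import random

BIRTH_RATE = 0.009  # 0.9% local birth rate (from the paper)
DEATH_RATE = 0.008  # 0.8% death rate (from the paper)

class Household:
    def __init__(self, age):
        self.age = age

def simulate_year(households, rng):
    """Advance every household agent by one year: ageing, births, deaths."""
    newborn = []
    survivors = []
    for h in households:
        h.age += 1
        # households younger than 45 may produce a new generation
        if h.age < 45 and rng.random() < BIRTH_RATE:
            # simplification: the child enters its first lifecycle stage
            # (independent at 18) immediately as a new agent
            newborn.append(Household(age=18))
        # the death rate eliminates household agents
        if rng.random() >= DEATH_RATE:
            survivors.append(h)
    return survivors + newborn

rng = random.Random(42)
population = [Household(age=rng.randint(18, 80)) for _ in range(1500)]
for _ in range(10):
    population = simulate_year(population, rng)
```

With the near-balanced birth and death rates, the population stays close to its initial size of 1500 over a ten-year run.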
When the lifecycle stage of a household agent changes, the agent facing a move
first makes an assessment regarding his Household Relocation Desire and then
makes a relocation decision based on the Household Relocation module, as
discussed in the following sections.
3.3.2 Utility of Residential Locations in Different Urban Areas

[Fig. 3 Decision tree for household relocation (1. CCA, City Center Area; 2. UPA, Urban Promoting Area; 3. UCA, Urban Control Area)]

We assume that a household makes a decision on a new location choice based on the utility offered at
and around the location. The utility of location s for household i can be calculated
by equation 5:

Vis = Σj asj xisj (5)

Here xisj is a vector of observable explanatory variables j describing attributes of
household i and location s, and asj is a vector of retrospective coefficients for
variable j. The probability of household i choosing location s then follows the
logistic function shown in equation 6: Qis is the probability that household i
chooses location s, which is determined by the utility offered by location s,
without considering unobserved random influences. The same variables as those
used for the satisfaction evaluation are employed for calculating residential
utility. Unlike the satisfaction evaluation, however, each variable value is
assigned within the range [1, 4] based on the hypothetical space of Kanazawa
City. We assume that the values of variables 1 to 6 and variables 10 to 16
decrease from CA to UCA by 1.5 units with a random range of 0.5, while the
others increase from CA to UCA by 1.5 units with a random range of 0.5,
respectively.
Development of Agent-Based Model for Simulation 207
Qis = e^Vis / Σs e^Vis (6)
1) Neighborhood influence (xis19): the number of neighbors who will use the
residential allowance policy to move to downtown, Nmove, divided by the
number of neighbors who will relocate without using the policy, Nnomove;
2) Global influence (xis20): the total number of households who will use the
policy to relocate, Gmove, divided by the total number of households, Gtotal.
Variables xis19 and xis20 stand for the neighborhood and global acceptance of the
policy. We determined them based on the questionnaire introduced in Table 4.
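Equations 5 and 6 together form a standard multinomial logit choice model, which can be sketched as follows; the coefficient and variable values are invented for illustration, with the last two entries of each vector standing in for xis19 and xis20:

```python
import math

def utility(x, a):
    # V_is = sum_j a_sj * x_isj  (equation 5, linear-in-parameters utility)
    return sum(a_j * x_j for a_j, x_j in zip(a, x))

def choice_probabilities(utilities):
    # Q_is = exp(V_is) / sum_s exp(V_is)  (equation 6, logit)
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical coefficients and variable values for three areas,
# the last two entries standing in for xis19 and xis20
a = [0.4, -0.2, 1.1, 0.8]
x_by_area = {
    "CA":  [3.0, 1.0, 0.6, 0.3],
    "UPA": [2.0, 2.0, 0.2, 0.3],
    "UCA": [1.0, 3.0, 0.1, 0.3],
}
v = [utility(x, a) for x in x_by_area.values()]
q = choice_probabilities(v)  # one probability per candidate area
```

The probabilities sum to one, and the area offering the highest utility (CA in this made-up example) receives the highest relocation probability.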
[Fig. 4 Maps of the virtual space of Kanazawa City, showing the CA, UPA and UCA zones]
Household data was created according to the Japanese Census Survey in 1985 to
reflect household attributes in Kanazawa City, including household income, car
ownership, age, current residential location, etc. There are 1500 household agents
living in this virtual city, and each agent stands for 300 real households in the
local city. The households are located according to the density of each land-use
zoning. In the virtual city we assume that the percentages of population in the
three income levels are 20%, 60%, and 20%, respectively. The household density
in the virtual data is shown in the third picture of Fig. 4, and household income is
represented in the first picture from the right of Fig. 4.
parameters are simulated. The first one, namely policy interaction I, is based on
the values of as19 and as20 shown in Table 1. It supposes that all the households
who showed interest in this policy will definitely move to CA. The results showed
that households gradually all moved to the CA area. In reality this is impossible,
so we instead suppose that 10% of those interested in the policy will finally move;
the result is shown in the figure for policy interaction II in Fig. 7. Compared with
the first scenario, we can easily observe that the numbers of households living in
CA or relocating to CA increase markedly. Apparently this residential promoting
policy can, to some extent, accelerate household moves to downtown.
Table 5 Comparison of household ratios in different urban areas between real data and
simulation results
thereby fulfilling the purpose of visualizing the possible effects of the residential
promoting policy for downtown regeneration. Compared with the simulation
results of the first scenario, the number of households that moved to downtown
(the CA area) increases evidently when the policy parameters are updated. This
means that the local downtown decline can probably be relieved by
implementation of this new residential policy. The implementation of this
residential promoting policy, to some extent, helps and attracts households to
choose new locations in downtown. As a result, the model validation
demonstrated the possibility of using HRRM to simulate and visualize household
residential relocation influenced by specific policies through scenario
configuration. However, in this work all the simulations were conducted in a
virtual space, and the simulation results can only reveal the differences between
the first and second scenarios. Thus the simulations demonstrate the potential of
ABM for simulating policy implementation effects. In future work we will
improve the model by simulating the real implementation results of the local
residential promoting policy using a real dataset.
Development of the Online Self-Placement Test
Engine That Interactively Selects Texts for an
Extensive Reading Test
1 Introduction
Extensive reading (ER) is a method of learning that improves reading comprehension
by reading many books that are easy and enjoyable. It can produce significant
educational benefits and has been successful in Japan [1, 6]. We are engaged
Kosuke Adachi
Graduate School of Science and Technology, Shinshu University
4-17-1, Wakasato, Nagano City, Nagano, Japan
e-mail: adachi@seclab.shinshu-u.ac.jp
Mark Brierley
Language Education Center, School of General Education, Shinshu University
3-1-1, Asahi, Matsumoto City, Nagano, Japan
e-mail: mark2@shinshu-u.ac.jp
Masaaki Niimura
e-Learning Center, Shinshu University
3-1-1, Asahi, Matsumoto City, Nagano, Japan
e-mail: niimura@shinshu-u.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 213–222.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
2 Background
2.1 SSS ER Method
Most native-targeted books, such as novels, require a high level of language
proficiency and are too difficult for general learners in Japan to read with any
fluency. ER typically relies on children's literature, or graded readers that have
been written or adapted for language learners at a particular level of proficiency.
Professor SAKAI Kunihide at the University of Electro-Communications proposed
a radical form of the ER method, emphasising the reading of many books that are
easy and enjoyable, and beginning with very low-level books. In 2001, SAKAI
et al. created the SSS (Start with Simple Stories) ER method, which has spread
and come to fruition in Japan [5].
The SSS ER method sets a goal to read one million words in English. For this
purpose, learners should first select books that are easy, and gradually read higher
level books as the learner’s degree of proficiency increases. Sakai’s method has the
“Three Golden Rules”[4]:
1. No dictionaries while reading.
2. Skip over difficult words and phrases.
3. Quit reading if the book is difficult or boring.
The ERFOSPT should be more accurate than EPER because it adaptively selects
the level of the next text in consideration of the results of the previous questions.
The evaluation criteria are not just the results of comprehension questions but also
the reading speed and the learner's impression of the story.
Dn = Dn−1 /Φ (1)
D1 is the initial value of Dn; a suitable initial value and a Φ for the evaluated level
are decided by the examiner.
Cn−1 − Dn ≤ Xn ≤ Cn−1 + Dn (2)

A text at level Xn is chosen within the above range, which reduces in size with
each iteration.
Cn = (Cn−1 + wXn )/(1 + w) + Dn (3)
Cn = (Cn−1 + wXn )/(1 + w) − Dn (4)
Cn is defined by the upper equation (3) if “Tn < TTn and Qn > QTn”, and otherwise
by the lower equation (4). The term (Cn−1 + wXn)/(1 + w) is a weighted average of
the text difficulty and our previous estimate of the student's level, so it takes into
consideration both the text they have just read and our last estimate of their level.
Fig. 1 is an example of the algorithm operating as a learner takes the test, with
w = 5.0, Φ = 1.2 and D1 = 3.0. The learner's initial estimate of reading level is 5.0,
and the learner's actual reading level is 10.0. Fig. 1 assumes that the learner can
quickly read and correctly answer questions for texts up to level 10, but cannot
fluently read texts over level 10.
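Under the same settings as Fig. 1 (w = 5.0, Φ = 1.2, D1 = 3.0, an initial estimate of 5.0 and a true level of 10.0), the update rules of equations (1), (3) and (4) can be traced with a short script; serving a text at exactly the current estimate each round is a simplifying assumption, not part of the algorithm:

```python
PHI, W, D1, ITERMAX = 1.2, 5.0, 3.0, 6
TRUE_LEVEL = 10.0

def run_placement(initial_estimate):
    c, d = initial_estimate, D1
    for _ in range(ITERMAX):
        x = c                               # simplification: serve a text at the current estimate
        fast_and_correct = x <= TRUE_LEVEL  # "Tn < TTn and Qn > QTn" holds up to level 10
        if fast_and_correct:
            c = (c + W * x) / (1 + W) + d   # equation (3)
        else:
            c = (c + W * x) / (1 + W) - d   # equation (4)
        d = d / PHI                         # equation (1): the step shrinks each round
    return c

estimate = run_placement(5.0)  # oscillates around and converges toward 10
```

After six iterations the estimate lands close to the learner's true level of 10, mirroring the convergence shown in Fig. 1.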
[Fig. 2 Components of the OSPT Controller: authentication, testing, managing accounts, creating problems, and managing logs (within the OSPT Engine)]
The algorithm proceeds through six states:

state 1 (Start):
    constants: ITERMAX = 6, PHI = 1.2, WEIGHT = 5
    variables: calev, iter = 0, indec = 2, rTime, score, sText
    if learner.level exists:
        calev = learner.level
state 2 (SelectLevel), otherwise:
        calev = selectedLevel
while iter < ITERMAX:
    sText = randomSelectText(from calev - indec to calev + indec)
    state 3 (ReadText, text sText):
        rTime = reading time
    state 4 (TakeQuestions, questions sText.questions):
        score = question score
    state 5:
        if rTime < sText.TT and score > sText.questions.threshold:
            calev = (calev + WEIGHT × sText.grade) / (1 + WEIGHT) + indec
        else:
            calev = (calev + WEIGHT × sText.grade) / (1 + WEIGHT) - indec
        indec = indec / PHI
        iter = iter + 1
learner.level = calev
modifyUser(learner)
state 6 (End)
5 Conclusion
The ERFOSPT is a new online self-placement test for ER that evaluates an
individual learner's fluent reading level easily, quickly and accurately.
Additionally, the system adaptively selects texts depending on the learner's
reading level and the results of previous questions in the test, using the algorithm
of the OSPT Controller. At this time the test is in the beta stage, with students
from a number of schools in Japan participating in trial runs. Moreover, we have
been analyzing the log data and assessing a suitable algorithm for the OSPT
Controller. In the future, we will make the ERFOSPT and the OSPT Engine
available to the public, and keep
on analyzing and developing this. Additionally we hope to make the OSPT Engine
available not only to the ERFOSPT but also to other tests for other disciplines.
References
1. Furukawa, A., et al.: The Complete Book Guide for Extensive Reading, 3rd edn., p. 512.
Cosmopier (2010)
2. Extensive Reading Foundation.: About Graded Readers — The Extensive Reading Foun-
dation (December 7, 2011), http://www.erfoundation.org/erf/node/44
3. Sato, H., et al.: Evaluation of English Education system based on Extensive Reading in
Shinshu University. IEICE Technical Report. ET, Educational Technology 109(453), 141–
146 (2010)
4. Sakai, K.: Toward One Million Words and Beyond, p. 310. Chikuma Shobo (2002)
5. Sakai, K., Kanda, M.: Extensive Reading in the Classroom, p. 227. Taishukan Shoten
(2005)
6. Brierley, M.: Extensive reading levels. JABAET Journal (11), 135–144 (2007)
7. Lemmer, R., Brierley, M., Reynolds, B., Waring, R.: Introduction to the Extensive Read-
ing Foundation Online Self-Placement Test. Extensive Reading World Congress Proceed-
ings 1, 23–25 (2012)
8. University of Edinburgh ELTC.: EPER Getting started (December 7, 2011),
http://www.ials.ed.ac.uk/postgraduate/research/
eper-getting-started.htm
DOSR: A Method of Domain-Oriented Semantic
Retrieval in XML Data
Abstract. This paper presents a method (named DOSR) to support the semantic
retrieval of XML documents in a specific domain. It takes the entity as the basic
unit of information processing in order to guarantee the semantic integrity of the
returned results. An efficient index method named the Entity-based index is
designed for indexing the entities; it greatly reduces the size of the index file while
preserving the speed of parsing entities. In order to rank the query results, the
Stratified-Weight-Method is proposed as an improvement over the traditional
technique. Experimental results show that DOSR can infer users' search intentions
effectively, locate the search target quickly and return exact results in accordance
with users' expectations. The results processed by DOSR guarantee semantic
integrity and reasonable ranking.
1 Introduction
With its scalability, flexibility and self-descriptiveness, XML data from a specific
application domain carries rich semantic information. XML has become a
well-acknowledged standard for data storage and exchange, making query
processing a very active topic in the XML research field.
Traditional XML query processing methods are mostly tree-based models; that
is, they first look for the nodes that match the query keywords and then locate a
sub-tree containing those nodes. However, the semantic information of a node
cannot be acquired through this traditional retrieval method. Currently the typical
method for XML semantic retrieval is the Smallest Lowest Common Ancestor
(SLCA) [8]; a sub-tree with a high matching degree to the query keywords can be
obtained through SLCA. Based on the SLCA method, many scholars have
Jun Feng · Zhixian Tang · Ruchun Huang
College of Computer and Information, Hohai University, No.1 Xikang Road,
Nanjing 210098, China
e-mail: fengjun@hhu.edu.cn
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 223–232.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
224 J. Feng, Z. Tang, and R. Huang
conducted further research, but it is overly concerned with the accuracy of
retrieval results rather than their semantic integrity. Semantic integrity refers to
retrieval results with rich information and complete semantics, which is important
for the users of information systems. For example, if user John wants to search for
a person's contact information (e.g. name, e-mail, address) by phone number,
traditional tree-based retrieval methods, which may return only the phone
information and nothing else, cannot achieve this purpose.
This paper presents a method of Domain-Oriented Semantic Retrieval (DOSR)
for XML; it takes the semantic integrity of retrieval results as a primary goal and
the domain entity as the basic processing unit. The main idea is domain-oriented:
semantically complete domain entities are extracted by domain experts through
analysis of the XML Schema. By indexing, retrieving and ranking the XML data
based on domain entities, DOSR guarantees the semantic integrity of the retrieval
results.
Our paper has the following contributions:
1. We propose taking domain entities as the basic processing unit. There is no
ambiguity for a concept in a specific application domain, which ensures the
semantic integrity of entities. The XML snippet corresponding to an entity
contains the entity node and all its child nodes, which constitutes a semantically
complete entity.
2. We propose the Stratified-Weight-Method for ranking the retrieval results,
which effectively solves the problem of computing the matching degree
between entity instances and query keywords.
3. We propose an effective XML indexing method named the Entity-based index,
which removes structural information and maintains only the value information
by flattening the XML structure twice, effectively improving the speed of
parsing entity instances and reducing the size of the index file.
The rest of the paper is organized as follows: In Section 2 we present related work.
In Section 3, we introduce DOSR. We discuss experiments analysis in Section 4.
We also give a summary of our work in Section 5.
2 Related Work
In recent years, research on XML retrieval has focused on XML keyword query.
The first area of research relevant to this work is the computation of the Lowest
Common Ancestor (LCA) of a set of nodes in a tree. Efficiently computing the
LCA of a pair of nodes in a tree is a well-studied problem, and efficient
main-memory approaches are known for its solution [1]. Schmidt et al. [2] present
an algorithm that takes a set of relations containing the keyword pairs as input
and outputs all pairwise LCAs. XKSearch [8] defined Smallest LCAs (SLCAs) to
be LCAs that do not contain other LCAs. In XKSearch the number of smallest
LCAs is bounded by the size of the smallest keyword list, so it can filter out some
useless nodes, but it also causes some meaningful results to be lost. Sun et al. [3]
extend the SLCA to handle general
Definition 6. Predicate: The defined tag name in the XML Schema document is
called Predicate, including the schema of entity name and attribute name.
Fig. 1(b) shows an example of the Entity-based index: entity 3 contains more than
two instances, the instance in Doc1 has three attributes, and attribute 3 has two
identical values, which are separated by “#;”.
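The flattening idea can be illustrated with Python's standard ElementTree: each entity instance is reduced to flat (attribute → value) records, with repeated values joined by the “#;” separator. The XML snippet and tag names below are invented for illustration and are not from the paper's dataset:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<contacts>
  <person><name>John</name><phone>123</phone><phone>456</phone></person>
  <person><name>Mary</name><phone>789</phone></person>
</contacts>"""

def build_entity_index(xml_text, entity_tag):
    """Flatten each entity instance into an attribute -> value record."""
    index = []
    for instance in ET.fromstring(xml_text).iter(entity_tag):
        values = {}
        for attr in instance:
            # identical attributes are merged, their values separated by "#;"
            values.setdefault(attr.tag, []).append(attr.text or "")
        index.append({tag: "#;".join(v) for tag, v in values.items()})
    return index

index = build_entity_index(SAMPLE, "person")
```

Each record can then be matched against (predicate, text value) keyword pairs without touching the document structure again.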
3.3 Stratified-Weight-Method
In the XML Schema, the attributes of an entity have different semantic weights.
We consider that attribute nodes from the same level have the same weight, and
that the total weight of the attribute nodes from the same level is inversely
proportional to their level. We take 1/2^i as the inverse coefficient, where i is the
level of the attribute node. Algorithm 1 shows how to calculate the weights of the
attributes: lines 7 to 11 count the nodes at the level of node p, and the final
coefficient is the inverse coefficient (line 15).
Fig. 2 shows that entity A has two layers: the inverse coefficient of the first
level is 1/2^1 and that of the second level is 1/2^2, so the weight-ratio of the first
level is (1/2^1)/(1/2^1 + 1/2^2) = 2/3, and (1/2^2)/(1/2^1 + 1/2^2) = 1/3 for the
second level. Assuming the weight of A is 1, the overall weight of the first level is
2/3 and that of the second level is 1/3; the three first-level attributes a, b, c then
have the same weight, (2/3)/3 = 2/9, and each of the two second-level attributes
has weight (1/3)/2 = 1/6.
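For the two-layer example above, the weight calculation can be reproduced in a few lines; this is a sketch of the computation, not Algorithm 1 itself:

```python
def stratified_weights(nodes_per_level):
    """nodes_per_level[i] = number of attribute nodes at level i+1.

    Returns the per-node weight for each level, assuming the entity's
    total weight is 1.
    """
    # inverse coefficient 1/2^i for level i
    coeff = [1.0 / 2 ** (i + 1) for i in range(len(nodes_per_level))]
    total = sum(coeff)
    # each level's share of the weight is split equally among its nodes
    return [c / total / n for c, n in zip(coeff, nodes_per_level)]

# entity A: 3 first-level attributes (a, b, c), 2 second-level attributes
w1, w2 = stratified_weights([3, 2])
```

This reproduces the values in the text: 2/9 for each first-level attribute and 1/6 for each second-level attribute.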
4 Experimental Analysis
We conducted the semantic integrity experiments to test the precision of the
results retrieved by DOSR. The index performance experiment tested the retrieval
response time. The experiment environment is as follows: CPU Intel i5 2.40 GHz,
Memory 4 GB, HDD 300 GB/7200 rpm, OS Windows 7 Ultimate.
Q   Keywords
Q1  nanjing                              // nanjing in all fields
Q2  city                                 // city in all fields
Q3  rpOrgName delPoint city              // city in rpOrgName and delPoint
Q4  rpIndName John Director              // John in rpIndName, Director in all fields
Q5  postCode 210095                      // 210095 in postCode
Q6  delPoint beijing postCode 210095     // beijing in delPoint, 210095 in postCode
Q7  cntOnLineRes linkage www.hhu.edu.cn orDesc information orFunct 002
    // www.hhu.edu.cn in cntOnLineRes.linkage, information in orDesc, 002 in orFunct
Fig. 4(b) shows the entity mdContact, which is the result of DOSR for Q7. The
context of entity mdContact is the contact information of the metadata, which
contains the complete description of the responsible party, not only the
information appearing in the keywords. For the user, mdContact is an entity of the
domain, which not only has complete context but also contains a sufficient
amount of information.
correlation with keywords, and it denies the contribution of ancestor nodes. The
results of SLCA must meet the following condition: the result must contain all the
keywords, and no sub-tree of the result may contain all the keywords. Only when
the keywords contain the entity name can SLCA return the entire semantic entity.
Compared with SLCA, DOSR takes the domain entity as the information
processing unit. The minimum granularity of a result is the entity: as long as the
keywords contain the name of an entity or attribute, FSIs can be obtained. If the
(predicate, text value) pairs of the keywords match the (attribute, attribute value)
pairs of an instance, PSIs can be obtained.
1    1     6      1     16.67%
2    100   1003   205   21.50%
3    200   2114   442   20.92%
4    500   5001   1021  21.42%
5    800   8625   1787  20.73%
6    1000  11468  2403  20.96%
7    50    499    104   20.84%
the Entity-based index (index). We find that as the read load gradually increases,
the time for reading instances from the source document increases significantly,
while the time for reading instances from the index remains at a relatively low
level, about 10% of that for reading instances from the document.
5 Conclusion
This paper presents DOSR to support the semantic retrieval of XML documents in
a specific domain. DOSR, which takes the semantic entity as the basic processing
unit, uses the Entity-based index method for indexing the XML data and the
Stratified-Weight-Method for ranking the results. Our experiments verify the
effectiveness and efficiency of DOSR. In the future, we intend to extend our
keyword parsing algorithm to support logical computing.
References
1. Czumaj, A., Kowaluk, M., Lingas, A.: Faster algorithms for finding lowest common
ancestors in directed acyclic graphs. Theoretical Computer Science 380(1-2) (July 2007)
2. Schmidt, A., Kersten, M.L., Windhouwer, M.: Querying XML Documents Made Easy:
Nearest Concept Queries. In: Proc. of the Intl. Conf. on Data Engineering, Washington,
USA, pp. 321–329 (2001)
3. Sun, C., Chan, C., Goenka, A.K.: Multiway SLCA-Based Keyword Search in XML Data.
In: Proc. of the Intl. Conf. on World Wide Web, New York, USA, pp. 1043–1052 (2007)
4. Li, G., Feng, J., Wang, J., Zhou, L.: Effective keyword search for valuable lcas over xml
documents. In: Proc. of the Intl. Conf. on Information and Knowledge Management,
New York, USA, pp. 31–40 (2007)
5. Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: XRANK: Ranked keyword search
over xml documents. In: Proc. of the Intl. Conf. on Management of Data, New York,
USA, pp. 16–27 (2003)
6. Hristidis, V., Koudas, N., Papakonstantinou, Y., Srivastava, D.: Keyword Proximity
Search in XML Trees. IEEE Transactions on Knowledge and Data Engineering 18(4)
(April 2006)
7. Li, Y., Yu, C., Jagadish, H.V.: Schema-Free XQuery. VLDB Endowment Very Large
Database Endowment 30 (2004)
8. Xu, Y., Papakonstantinou, Y.: Efficient Keyword Search for Smallest LCAs in XML
Databases. In: Proc. of the Intl. Conf. on Management of Data, New York, USA, pp.
527–538 (June 2005)
9. Huang, Y., Liu, Z., Chen, Y.: Query biased snippet generation in xml search. In: Proc. of
the Intl. Conf. on Management of Data, New York, USA, pp. 315–326 (2008)
10. Bao, Z., Lu, J., Ling, T.W., Chen, B.: Towards an Effective XML Keyword Search. IEEE
Transactions on Knowledge and Data Engineering 22(8) (August 2010)
11. Liu, Z., Chen, Y.: Identifying Meaningful Return Information for XML Keyword Search.
In: Proc. of the Intl. Conf. on Management of Data, New York, USA (2007)
Encoding Travel Traces by Using Road
Networks and Routing Algorithms
Abstract. Large numbers of travel traces are collected by vehicles and stored for
applications such as optimizing delivery routes, predicting and avoiding traffic,
and providing directions. Many of the applications preprocess the travel traces,
usually composed of position data, by matching these with links in the underlying
road network. This paper addresses the problem of persistent storage of large
numbers of vehicle travel traces. We propose two methods for using a routing
algorithm and a road network to encode a travel trace formed by a sequence of
links. An encoded trace, composed of a few links, is easy to store or share and can
be decoded into the original travel trace. Considering that drivers tend to proceed
from an origin to a destination by using the shortest path or by going as straight as
possible, the two proposed methods use the following two routing algorithms: a
shortest path algorithm, and a following path algorithm, which finds the path that
avoids turns. The experimental results for 30 real traces show that a travel trace is
encoded into only 5% or 7% of its links on average using the shortest path
algorithm or the following path algorithm, respectively.
Keywords: GIS, travel trace, travel data encoding, shortest path, following path.
1 Introduction
There are several kinds of applications that query large numbers of travel traces,
collected by vehicles over long periods of time, in order to obtain some
knowledge. Examples of such applications include the optimization of delivery
routes on the basis of previous data, the prediction and avoidance of traffic, and
the provision of driving directions [1,2].
The travel trace of a vehicle is usually stored as a GPS trace generated by a
GPS receiver on board the vehicle. A GPS trace consists of a sequence of GPS
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 233–243.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
234 P.M. Lerin, D. Yamamoto, and N. Takahashi
2 Related Work
There are two main approaches to reducing vehicle travel traces: line
simplification approaches and map-based (map matching) approaches. Line
simplification is a well-studied problem [3,4]. The line simplification approach
reduces the number of GPS points by introducing a bounded error; in other words,
the travel trace is smoothed. Map matching is also a well-studied problem [5,6].
The map-based approach matches the GPS points to links in a road network;
typically, one point is then stored for each matched link. Cao et al. have
investigated several simplification methods [7]. In addition, Hönle et al. have
presented a comparison of several simplification methods with a map-based
method, where the map-based method produced worse results than the
approximation methods in terms of data reduction and calculation time.
We propose an approach that reduces travel traces in the form of a path (i.e.,
a sequence of links in a road network), as opposed to the above-mentioned
methods, which reduce travel traces in the form of a sequence of GPS points.
We argue that comparing the results of map-based methods and line
simplification methods is not easy. Line simplification methods smooth the trace,
while map-based methods fix the trace to links; therefore, one method may be
more convenient than the other depending on the application, regardless of the
storage reduction.
Our approach uses two routing algorithms: the shortest path algorithm (SPA)
and the following path algorithm (FPA). The FPA was developed for EMMA,
the Focus+Glue+Context map system, in our previous work [9,10]. EMMA
implements the FPA to reduce the density of roads that are drawn to connect the
focus area with the context area. The FPA developed for EMMA can find the
multiple paths that appear when a route bifurcates, because it tries to draw as
many following paths as possible to connect the focus with the context.
The encoding method described in this paper uses the FPA to find a subpath
within a travel route that cannot have bifurcations; i.e., a traveler who arrives at
an intersection of roads can only continue along one of the roads. This implies that
the proposed encoding system imposes requirements on the FPA that are different
from those imposed by EMMA. For the proposed system, we define an adapted
version of the FPA that finds a following path without bifurcations.
Fig. 2 Example of encoding a path by using the shortest path algorithm (SPA). The arrows
represent the links and the thick lines represent the road network R.
Below we describe the algorithms for the Encode and Decode functions. Let us
consider that the function y.append(x) adds the element or sequence x to the end
of the sequence y. Let us also consider that the function pathEqual(P1, P2) returns
TRUE if the paths P1 and P2 are identical and returns FALSE otherwise.
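As a sketch of the overall idea, the following toy implementation encodes a path by keeping only the nodes from which a deterministic routing algorithm regenerates the original path. It uses graph nodes instead of road links for brevity, an invented unweighted graph, and a BFS shortest path as a stand-in for the paper's SPA:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """BFS shortest path on an unweighted graph (stand-in for the SPA)."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def encode(graph, path):
    """Keep only the nodes needed to regenerate `path` with shortest_path."""
    encoded, i, j = [path[0]], 0, 1
    while j < len(path):
        if shortest_path(graph, path[i], path[j]) != path[i:j + 1]:
            encoded.append(path[j - 1])  # last node still reachable via a shortest path
            i = j - 1
        else:
            j += 1
    encoded.append(path[-1])
    return encoded

def decode(graph, encoded):
    """Rebuild the full path by routing between consecutive kept nodes."""
    path = [encoded[0]]
    for a, b in zip(encoded, encoded[1:]):
        path.extend(shortest_path(graph, a, b)[1:])
    return path

graph = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "E"],
         "D": ["A", "E"], "E": ["C", "D"]}
trace = ["A", "B", "C", "E"]   # not the shortest A-E route (that is A-D-E)
code = encode(graph, trace)    # keeps only the nodes needed to rebuild the trace
```

Decoding reproduces the trace exactly only because the routing is deterministic; the paper's methods rely on the same property of the SPA and FPA.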
centered on the point L.end. The length sb (in meters) is a system parameter,
discussed in Section 6. The function FLA then returns the following link of L that
belongs to SR; it returns NULL when no links in SR are connected to L. The
following link is computed according to the rules prescribed below. Let L1–LN be
the N links that are connected to L and belong to SR, and let αi be the angle
between the connected link Li and L, as shown in Figure 3.
Rule 1. When N = 1, the following link is L1.
Rule 2. When N > 1, the following link is the link Li whose angle αi with L is
the smallest among the angles αk (1 ≤ k ≤ N) between L and the connected links,
as shown in Figure 3.
Rule 3. When Li.len < l0, we assume that the links connected to Li are directly
connected to L, as shown in Figure 4 (left). The very small length l0 (for example,
2 m) is a system parameter that determines when a link is too small. Very small
links indicate misalignments, as shown in Figure 4 (right), for which the algorithm
uses Rule 3 to select the link Ls as the following link of L.
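Rules 1–3 can be sketched as follows. The candidate-link representation (angle, length, onward links) is an assumption for illustration, and treating an onward link's angle as if it were measured from L is a simplification of Rule 3:

```python
def following_link(candidates, l0=2.0):
    """Pick the following link of L among its candidate connected links.

    Each candidate is a dict: {"id", "angle" (degrees, relative to L),
    "length" (metres), "onward" (the links connected to it, optional)}.
    Returns the chosen link id, or None (FLA's NULL) if there is none.
    """
    candidates = list(candidates)
    while candidates:
        # Rules 1 and 2: the connected link with the smallest angle to L
        best = min(candidates, key=lambda c: c["angle"])
        if best["length"] >= l0:
            return best["id"]
        # Rule 3: a very short link indicates a misalignment; treat the
        # links connected to it as directly connected to L instead
        candidates.remove(best)
        candidates.extend(best.get("onward", []))
    return None

links = [
    {"id": "L1", "angle": 40.0, "length": 30.0},
    {"id": "Ltiny", "angle": 5.0, "length": 1.0,   # shorter than l0 = 2 m
     "onward": [{"id": "Ls", "angle": 3.0, "length": 20.0}]},
]
```

Here the tiny misaligned link is skipped over and Ls is selected as the following link, matching the Figure 4 example.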
Fig. 4 Rule 3 of function FLA (left) and example of slightly misaligned intersection (right).
Function isAhead(p, L)
Given a point p and a link L, the function isAhead(p, L) returns TRUE and p is
considered ahead of L if and only if the angle between L and the vector formed by
p and the endpoint of L is greater than 90º, with the angle defined as shown in
Figure 5.
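Reading Figure 5 as measuring the angle between L's direction and the vector from p to L's endpoint, the test reduces to the sign of a dot product; the coordinate representation below is an assumption:

```python
def is_ahead(p, link):
    """isAhead(p, L): True iff the angle between L's direction and the
    vector from p to L's endpoint exceeds 90 degrees, i.e. the dot
    product of the two vectors is negative."""
    (sx, sy), (ex, ey) = link
    dx, dy = ex - sx, ey - sy       # direction of the link L
    vx, vy = ex - p[0], ey - p[1]   # vector from p to the endpoint of L
    return dx * vx + dy * vy < 0

# a unit link pointing along the x axis
L = ((0.0, 0.0), (1.0, 0.0))
```

A point beyond the endpoint in the direction of travel is classified as ahead, while a point alongside the link is not.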
The algorithm of the function makePath with the shortest path algorithm is pre-
sented below. Let us consider that the function P.append(x) adds the element x to
the end of the sequence P and that the function last(z) returns the last element of
the sequence z.
6 Evaluation
We have developed and evaluated the functions Encode and Decode for the two methods presented in Sections 4 and 5. The evaluation was made in terms of the number of links needed to encode each path in a dataset of 30 paths with diverse
Encoding Travel Traces by Using Road Networks and Routing Algorithms 241
lengths. The 30 paths have a total length of 356 km, and they are composed of 4374 links. We obtained the 30 paths as a result of map matching the GPS traces of real intra-city and intercity journeys made by car in Japan by a member of our laboratory. For the evaluation, we used the entire road network of Japan as stored in a database in our laboratory.
We have evaluated the two encoding methods with the following parameters:
• Method 1, using the SPA, with sa = 2000 m.
• Method 2, using the FPA, with sb = 10 m and l0 = 5 m.
Initially, we evaluated Method 1 using different values for the parameter sa and concluded that 1) the greater the parameter sa, the fewer the links returned by the function Encode (i.e., the better the encoding), and 2) when the parameter sa exceeds 1000 m, in most cases the result does not vary. The reason for these conclusions is that when the parameter sa is small, the delimited area is also small, and so a shortest path may not be detected because part of it falls outside the delimited area.
Method 2 was not evaluated using several values for the parameters l0 and sb
because 1) changing the parameter l0 would cause the FLA function to return a
link that is not the following link, and 2) changing the parameter sb has no effect
on the result as long as sb is greater than l0 since FLA only needs the connected
links.
Figures 6 and 7 show the results of the evaluation of the two methods. The results show an extremely good performance by each method. The SPA performs better for most of the paths, although the difference is always very small compared with the number of links in the original path. The results show that using the SPA a path is encoded using 5% of its links on average, while using the FPA a path is encoded using 7% of its links on average.
Fig. 6 Comparison of the numbers of links in the encoded paths and the original paths.
242 P.M. Lerin, D. Yamamoto, and N. Takahashi
Fig. 7 Percentage of the original links included in the encoding of each experiment.
Although the method based on the SPA is able to encode a path using fewer links than the method based on the FPA, the shortest path algorithm (SPA) might require much more time than the following path algorithm (FPA) to perform encoding and decoding. The FPA potentially requires less computation time than the SPA to make a path, because the FPA is an incremental algorithm and uses very small areas of the road network, while the SPA requires backtracking that consumes much computation time and may use very large areas of the road network.
For applications where the processing time is very important, the computation time of the encoding based on the SPA can be improved by finding the shortest path using more complex solutions based on preprocessing the road network [12,13]. Other solutions are based on the generation of a path view from the road network [14,15]. Basically, a path view contains the pre-computed shortest paths between each pair of nodes in the road network.
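As an illustration of the path-view idea, the all-pairs shortest paths can be pre-computed once, for example with Floyd-Warshall; this textbook algorithm is an assumption for the sketch, while [14,15] describe more sophisticated view structures.

```python
def path_view(nodes, edges):
    """Pre-compute shortest paths between every pair of nodes (Floyd-Warshall),
    so decoding can answer shortest-path queries by table lookup instead of
    running the SPA each time. `edges` maps (u, v) -> length in meters."""
    INF = float("inf")
    dist = {(u, v): (0 if u == v else edges.get((u, v), INF))
            for u in nodes for v in nodes}
    nxt = {(u, v): (v if (u, v) in edges else None) for u in nodes for v in nodes}
    for k in nodes:
        for u in nodes:
            for v in nodes:
                if dist[u, k] + dist[k, v] < dist[u, v]:
                    dist[u, v] = dist[u, k] + dist[k, v]
                    nxt[u, v] = nxt[u, k]  # first hop of the improved route
    return dist, nxt

def lookup_path(nxt, u, v):
    """Recover the node sequence from u to v out of the pre-computed view
    (assumes a path exists; nxt[u, v] is None for unreachable pairs)."""
    path = [u]
    while u != v:
        u = nxt[u, v]
        path.append(u)
    return path
```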
7 Conclusions
This paper presented two novel methods to encode a path, a sequence of links in a road network, by using a routing algorithm so that the path can be stored and shared using very few links. One method uses the shortest path algorithm (SPA) as the routing algorithm, and the other uses the following path algorithm (FPA). The evaluation in this paper has shown that these two methods can drastically reduce the number of links in an encoded path. The results of the method that uses the SPA are slightly better than those of the method that uses the FPA, but the SPA may take longer to compute. The results confirm that vehicle routes are usually composed of several shortest paths or following paths.
A quantitative comparison of the computation times of the two methods remains as future work; it is necessary in order to discuss the trade-off between them.
Acknowledgments. We would like to thank Yahoo Japan Corp. for support in the development of the prototype system. This work was also supported by JSPS KAKENHI 20509003 and 23500084.
References
1. Xue, G., Li, Z., Zhu, H., Liu, Y.: Traffic-known urban vehicular route prediction based
on partial mobility patterns. In: Proc. ICPADS, pp. 369–375 (2009)
2. Yuan, J., Zheng, Y., Zhang, C., Xie, W., Xie, X., Sun, G., Huang, Y.: T-drive: driving
directions based on taxi trajectories. In: Proc. GIS, pp. 99–108 (2010)
3. McMaster, R.B.: Automated line generalization. Cartographica 24(2), 74–111 (1987)
4. Leu, J.G., Chen, L.: Polygonal approximation of 2-D shapes through boundary merg-
ing. Pattern Recognition Letters 7(4), 231–238 (1988)
5. Brakatsoulas, S., Pfoser, D., Salas, R., Wenk, C.: On map-matching vehicle tracking
data. In: Proc. 31st Int’l Conf. on Very Large Data Bases (VLDB 2005), pp. 853–864
(2005)
6. Yuan, J., Zheng, Y., Zhang, C., Xie, X., Sun, G.-Z.: An interactive-voting based map
matching algorithm. In: Proc. 11th Int’l Conf. on Mobile Data Management (MDM),
pp. 43–52 (2010)
7. Cao, H., Wolfson, O., Trajcevski, G.: Spatio-temporal data reduction with determinis-
tic error bounds. VLDB Journal 15(3), 211–228 (2006)
8. Hönle, N., Grossmann, M., Reimann, S., Mitschang, B.: Usability analysis of compres-
sion algorithms for position data streams. In: Proc. 18th ACM SIGSPATIAL Int’l
Conf. on Advances in Geographic Information Systems, pp. 240–249 (2010)
9. Takahashi, N.: An elastic map system with cognitive map-based operations. In: Inter-
national Perspectives on Maps and Internet. Lecture Notes in Geoinformation and Car-
tography, pp. 73–87 (2008)
10. Yamamoto, D., Ozeki, S., Takahashi, N.: Focus+Glue+Context: an improved fisheye
approach for web map services. In: Proc. 17th ACM SIGSPATIAL Int’l Conf. on Ad-
vances in Geographic Information Systems, pp. 101–110 (2009)
11. Dijkstra, E.W.: A note on two problems in connection with graph theory. Numerische
Mathematik 1, 269–271 (1959)
12. Idwan, S., Etaiwi, W.: Dijkstra algorithm heuristic approach for large graph. Journal of
Applied Sciences 11, 2255–2259 (2011)
13. Cho, H.-J., Lan, C.-L.: Hybrid shortest path algorithm for vehicle navigation. Journal
of Supercomputing 49(2), 234–247 (2009)
14. Huang, Y.-W., Jing, N., Rundensteiner, E.A.: A semi-materialized view approach for
route maintenance in IVHS. In: Proc. 2nd ACM Workshop on Geographic Information
Systems, pp. 144–151 (1994)
15. Huang, Y.-W., Jing, N., Rundensteiner, E.A.: A hierarchical path view model for path
finding in intelligent transportation systems. GeoInformatica 1(2), 125–159 (1997)
Estimation of Dialogue Moods
Using the Utterance Intervals Features
1 Introduction
Recently, many studies have focused on communication robots [6, 8, 11]. These studies aim to develop a robot that communicates with humans and provides people with a feeling of fullness and happiness. However, it is difficult to develop a robot that can act as a substitute for a human. While human-robot interaction attracts attention, robots that support human-human interaction can also be useful for more attractive and affective communication.
We usually communicate in a group of people, and Fig. 1 shows the kind of communication assumed in this paper. In a conversation, people can communicate with each other fluently and intimately thanks to two important functions: estimating the dialogue mood, and selecting suitable behavior considering that mood.
Kaoru Toyoda · Yoshihiro Miyakoshi · Ryosuke Yamanishi · Shohei Kato
Dept. of Computer Science and Engineering, Graduate School of Engineering,
Nagoya Institute of Technology,
Gokiso-cho Showa-ku Nagoya 466-8555 Japan
e-mail: {toyoda,miyakosi,ryama,shohey}@juno.ics.nitech.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 245–254.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
246 K. Toyoda et al.
2 Related Studies
There are some studies about communication between humans. Wrede [13] studied the relationships between the hot spots in meetings and the bibliographic tags labeled by humans. However, the tags cannot be labeled automatically and dynamically, so it is not appropriate to use them in real-time communication. Gatica-Perez [5] and Ito [7] studied the relationships between human motion and meeting moods using motion features obtained through motion recognition. Because extracting the motion features requires a high calculation cost and motion capture, which involves a lot of equipment, the motion features are also not appropriate for real-world communication. Mori's study [9] estimates dialogue moods using the speakers' facial expression features, but this system expects that the users always have a face-to-face talk with a camera, so it is difficult to use in real-world communication.
In this paper, we believe that the intervals of speakers' states have a beneficial effect on estimating dialogue moods, and we propose a dialogue mood estimation system using the utterance intervals features. The utterance intervals features need only the information about "who speaks how long," and therefore cost much less to extract than bibliographic data, motion features, or facial expression features. Moreover, unlike the existing studies above, this study intends to support communication between humans. We believe that not just knowledge about communication but support for communication is important and significant in the field of human-computer interaction.
(Fig. 2 diagram: overview of the proposed system. Utterance intervals features are extracted from the voice of a dialogue and input to a mood estimation model; the model is a discriminant learned from feature-extracted learning data with subjective evaluations.)
Fig. 3 An example of the representation of a dialogue. In this case (dialogue index = d), A's utterances (st = 1) are 3 seconds, 1 second, and 1 second, and the utterance intervals of st = 1 are shown as the multiset Sd1 = {3, 1, 1}.
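The representation in Fig. 3 can be sketched as follows. The per-second state sequence follows the figure; the state coding (1 = A speaks alone, 2 = B speaks alone, 3 = simultaneous utterance, 0 = silence) is an assumption for illustration.

```python
from itertools import groupby

def interval_multisets(states):
    """Given a dialogue as a per-second sequence of speaker states, return a
    mapping {state: multiset of run lengths}, like Sd1 = {3, 1, 1} in Fig. 3."""
    sets = {}
    for st, run in groupby(states):
        # each maximal run of an identical state is one utterance interval
        sets.setdefault(st, []).append(sum(1 for _ in run))
    return sets

# Example matching Fig. 3's Sd1: A speaks 3 s, pause, A 1 s, B 2 s, A 1 s
d = [1, 1, 1, 0, 1, 2, 2, 1]
# interval_multisets(d)[1] == [3, 1, 1]
```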
The features indexed 25-90 in Table 1 are prepared for the comparison of the speakers' states (e.g., comparing A's solitary utterances with B's solitary utterances), because we believe that these features have high potential for estimating dialogue moods based on our heuristics.
Table 2 GA parameters
for a robot to select its behavior for supporting humans' communication, we describe the excitement estimation model in detail in this paper.
Focusing on the excitement estimation model, the features indexed (18), (44), and (79) were selected as the contributing features. Figs. 4 and 5 show the normal distribution plots of the features indexed (18) and (44), respectively, whose degrees of separation were relatively high; these features also seemed intuitively to contribute to the excitement estimation model.
In the normal distribution plots, the light and dark lines represent the classes "excite" and "not excite," respectively. A dialogue whose light-colored feature value in the plot is higher than the dark one is estimated as having the "excite" mood.
From Fig. 4, it was confirmed that a high occupancy of simultaneous utterance led to the excite mood. This result suggests that a dialogue with a lot of simultaneous utterance (e.g., a speaker overlapping another speaker's utterance before he/she finishes speaking, or giving a lot of back-channel feedback) creates the excite mood.
Focusing on Fig. 5, it was confirmed that the dialogue mood was estimated as the excite mood when the variance of B's utterances was relatively higher than the variance of A's utterances. This result suggests that a dialogue in which the following speaker B replies with long utterances involving his/her own belief creates the excite mood, because a high variance of B's utterances means that B had not only short utterances but also long ones.
(Figs. 4 and 5 diagrams: normal distribution plots with legend "Excite" / "Not Excite" and vertical axis "Occurrence Probability.")
Fig. 5 The comparison of utterances A and utterances B; feature (44): (var(Sd2) + var(Sd3)) / (var(Sd1) + var(Sd3)) (normalized).
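The variance-ratio feature (44) from the caption of Fig. 5 can be computed as below. Treating states 1, 2, and 3 as A's solitary, B's solitary, and simultaneous utterances is an assumption, and the dataset-level normalization mentioned in the caption is omitted.

```python
from statistics import pvariance

def feature_44(s1, s2, s3):
    """Feature (44): (var(Sd2) + var(Sd3)) / (var(Sd1) + var(Sd3)), comparing
    the spread of B's utterance lengths (s2) against A's (s1); s3 is assumed
    to hold the simultaneous-utterance intervals."""
    num = pvariance(s2) + pvariance(s3)
    den = pvariance(s1) + pvariance(s3)
    return num / den if den else 0.0
```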
rate each means the accuracy rate of estimating dialogues labeled positive and negative, respectively. The accuracy rate of estimating all the dialogue moods is named the whole accuracy rate. Focusing on excitement, seriousness, and closeness, we confirmed that the estimation models showed a whole accuracy rate of over 80% and positive and negative accuracy rates of more than 70%, and it seemed that these estimation models had high potential for dialogue mood estimation. Moreover, this suggests that the proposed utterance intervals features were effective for the "excitement," "seriousness," and "closeness" estimations.
However, for the "smoothness" and "brightness" estimation models, either the positive or the negative accuracy rate was below 50%, while the whole accuracy rate was more than 70% in both estimations. This suggests that it is difficult to estimate the smoothness and brightness moods using only the utterance intervals features.
7 Conclusion
In this paper, we proposed a dialogue mood estimation model using the utterance intervals features, focusing on the intervals of speakers' states. Through the estimation experiments, we confirmed that the proposed system could estimate the dialogue moods with a high degree of accuracy, especially for excitement, seriousness, and closeness, and we suggested that the utterance intervals features have high potential for dialogue mood estimation. By estimating dialogue moods, the personality of speakers can be taken into consideration in human communication support. It is thus expected that more affective/emotional communication can be realized with the proposed system.
In the future, we will study the effectiveness of the robot's behavior depending on the dialogue moods for human communication support, and propose a behavior selection method depending on the dialogue moods. As one such behavior, we believe that playing suitable background music (BGM) depending on the mood is effective, because in the field of psychological science it has been suggested that the affections of music influence humans' minds and bodies [2]. Therefore, as a human-human interaction support system, we will develop an automated BGM selection system suited to the dialogue mood using our previously proposed song selection system with affective requests [12].
Acknowledgment. This work was supported in part by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research under grant #20700199, and HORI SCIENCE AND ART FOUNDATION.
References
1. Akaike, H.: Information theory and an extension of the maximum likelihood principle.
In: 2nd Inter. Symp. on Information Theory, vol. 1, pp. 267–281 (1973)
2. Bruner, G.: Music, mood, and marketing. Journal of Marketing 54(4), 94–104 (1990)
3. Cohen, I., Sebe, N., Chen, L., Garg, A., Huang, T.S.: Facial expression recognition from
video sequences: Temporal and static modelling. In: Computer Vision and Image Under-
standing, pp. 160–187 (2003)
4. Consortium, N.T.S.R.: Priority area spoken dialogue: spoken dialogue corpus (PASD) (1993-1996)
5. Gatica-perez, D., Mccowan, I., Zhang, D., Bengio, S.: Detecting group interest level
in meetings. In: Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing
(ICASSP), pp. 489–492 (2005)
6. Hayashi, T., Kato, S., Itoh, H.: A Synchronous Model of Mental Rhythm Using Paralan-
guage for Communication Robots. In: Yang, J.-J., Yokoo, M., Ito, T., Jin, Z., Scerri, P.
(eds.) PRIMA 2009. LNCS, vol. 5925, pp. 376–388. Springer, Heidelberg (2009)
7. Ito, H., Shigeno, S., Nishimoto, T., Araki, M., Nimi, Y.: The analysis of the atmosphere
in the dialogues. IPSJ SIG Technical Report, pp. 103–108 (2011) (in Japanese)
8. Itoh, C., Kato, S., Itoh, H.: Mood-transition-based emotion generation model for the
robot’s personality. In: Proceedings of the 2009 IEEE International Conference on Sys-
tems, Man, and Cybernetics, SMC 2009, San Antonio, TX, USA, pp. 2957–2962 (2009)
9. Mori, H., Miyawaki, K., Nishiguchi, S., Sano, M., Yamashita, N.: An affections model
of group activities for estimation of individual’s affection. IEICE Technical Report, pp.
519–523 (2010)
10. Siedlecki, W., Sklansky, J.: A note on genetic algorithms for large-scale feature selection.
Pattern Recogn. Lett. 10, 335–347 (1989)
11. Takasugi, S., Yoshida, S., Okitsu, K., Yokoyama, M., Yamamoto, T., Miyake, Y.: Influ-
ence of pause duration and nod response timing in dialogue between human and com-
munication robot. In: Transactions of the Society of Instrument and Control Engineers,
pp. 72–81 (2010) (in Japanese)
12. Toyoda, K., Yamanishi, R., Kato, S.: Song selection system with affective requests.
In: 12th International Symposium on Advanced Intelligent Systems, Suwon, Korea, pp.
462–465 (2011)
13. Wrede, B., Shriberg, E.: The relationship between dialogue acts and hot spots in meet-
ings. In: Proc. IEEE Automatic Speech Recognition and Understanding Workshop,
ASRU, Virgin Islands (2003)
Extraction of Vocational Aptitude from
Operation Logs in Virtual Space
1 Introduction
Vocational aptitude tests are widely used [1][2][3]. In these tests, the interest, personality, and ability a person has for jobs are checked through many test items. The results of the tests are compared to general tendencies of vocational aptitude to derive the jobs with which the person has a high concordance rate. These tests aim to make examinees
Kyohei Nishide · Tateaki Komaki
Graduate School of Science and Engineering, Ritsumeikan University, Nojihigashi 1-1-1,
Kusatsu, Shiga
e-mail: mario@de.is.ritsumei.ac.jp,tateaki76@de.is.ritsumei.ac.jp
Fumiko Harada · Hiromitsu Shimakawa
College of Information Science and Engineering, Ritsumeikan University, Nojihigashi 1-1-1,
Kusatsu, Shiga
e-mail: harada@cs.ritsumei.ac.jp,simakawa@cs.ritsumei.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 255–267.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
256 K. Nishide et al.
understand their concordance with jobs, which leads them to choose their preferred job. Many vocational aptitude tests consist of questionnaires and written tests, and they target persons of high school age and older. If the result of the test shows that examinees do not match the aptitude type of their desired jobs, they may give up those jobs because they do not have time to acquire the required aptitude. If they knew their weak points at an early age, such as in elementary school, they might be able to overcome them, which would extend their range of job choices. Therefore, it is important to judge vocational aptitude early. However, because these tests are based on questionnaires and written tests, it is too difficult for elementary school children to give correct answers due to their lack of knowledge. In addition, these tests have a strong implication of an examination, and it is not pleasant for children to take them.
In this paper, we propose a method to evaluate the vocational aptitude of elementary school children quantitatively. We provide a game-flavored tool with which children can enjoy experiencing a job in a virtual space. We focus on the operation logs of this tool, and extract the children's vocational aptitude from the operation logs by calculation with evaluation expressions.
2 Vocational Aptitudes
2.1 Vocational Aptitudes with CPS-J
CPS-J (Career Planning Survey - Japanese Version) [3] is a vocational aptitude test widely used in Japan. It was developed based on the theory of Holland [4]. The test targets persons of university age and older, and it evaluates the interest and aptitude for a wide variety of jobs, from primary industry to tertiary industry. In the interest test, examinees answer 150 questions that ask their preference for specific activities. They are required to answer with one of three exclusive options: "like it," "dislike it," or "neither." In the aptitude test, they answer fifteen questions on specific actions, rating whether they are good or poor at each on a five-grade scale. CPS-J judges the aptitude along the six axes proposed in the theory of Holland: Realistic, Investigative, Artistic, Social, Enterprising, and Conventional. Each question is relevant to more than one axis. The aptitude of a person can be represented with the six axes from his or her answers to the fifteen questions.
even if their results exceed all required evaluation values. Our proposed method uses CPS-J [3] to classify jobs on the basis of required abilities; therefore, it can extract the aptitudes for each job.
The methods proposed in [6] and [7] provide chances to experience the work of a job in a virtual space. In [6], the work is trained through both multimedia and haptic technology; since general behavior is reflected in the 3D object, users can learn efficiently. With the method of [7], users can learn medical field work efficiently. However, these methods cannot tell users their aptitude for the job.
The aptitude for scientific laws requires the ability to leverage scientific laws after understanding them. Action cases include understanding the laws and formulas of mathematics. Applying arithmetic to their surroundings is an advanced ability for elementary school children [9]. We expect that we can calculate this aptitude with a task requiring arithmetic knowledge in a job experience. An elementary school child operates in the virtual space according to his own thinking. A person with high system aptitude takes ideal operations because he reasons well, while a person with low system aptitude operates aimlessly. The ratio of the child's operation to the ideal operation is adopted as the index of the science laws aptitude value.
The Japanese ability aptitude requires the ability to understand conversation and writing in Japanese. One examination that measures Japanese ability is the ACTFL-OPI (ACTFL Oral Proficiency Interview) [10], which evaluates the ability to have logical conversations. A person with high Japanese ability understands what the other person says and responds correctly. We can therefore expect to calculate this aptitude from the degree to which the person copes logically with questions from other avatars. Persons who are good at coping with questions understand what they are told in a short time and can give correct responses quickly. Persons who are poor at it take a long time to understand, and their responses are often wrong. We also assume that a careless person takes a short time but responds wrongly. Therefore, the number of correct answers divided by the total response time is the index of the Japanese ability aptitude value. If we permitted the user of the job experience tool to type in an arbitrary message, the response time would depend on the typing speed, and the input message would vary with the user's expressive ability in Japanese. To avoid this, we provide options from which the user chooses his response. In this way, we can judge whether the user gives a response suitable for the context without depending on his expressive ability in Japanese, even if he is an elementary school child.
The space recognition aptitude requires the ability to grasp the spatial structure of a house or a machine from a design. A person with high space recognition ability correctly grasps the current position in the virtual space from two-dimensional information. There is a test that measures space recognition ability using a virtual space [11]: examinees walk freely through a building in the virtual space and, imagining its spatial structure, sketch the floor plan of the building. Therefore, we expect that we can calculate this aptitude using a task that requires recognizing spatial structure. A person who can grasp spatial structure knows the current position on a map immediately. A poor person easily loses the correspondence between a position on a map and that in the virtual space. Because he loses the current position, he rotates his avatar to grasp the spatial structure around him and, as a result, cannot move the avatar forward. Therefore, the system measures the stop time when the avatar should be moving, and this time is the index of the space recognition value.
user can answer by clicking on the given options. If the delivery destination of the focused package is not the same as the house of the clicked door, the system warns the user by displaying "It does not match the delivery destination" or "It does not match the package" on the message board. When the user finishes delivering the packages, the job experience finishes.
Vmac = (move distance) / (move time)    (2)
A person strong at scientific laws finds an ideal route because he reasons well with them. In the home-delivery service job experience, a user needs to work out the shortest route to make the delivery efficient. The science law aptitude thus corresponds to finding the shortest route for the delivery as well as to following that route. We expect that a person with high science law aptitude can shorten the movement distance, as in Fig. 3. We use Equation 3 to evaluate the scientific laws aptitude value.
Vsci = (min d / move d)²    (3)
where min d is the shortest movement distance for delivering all the packages; it is a constant derived from the distances of the possible delivery orders. On the other hand, move d is the distance the avatar actually moves to deliver all the packages. The shorter the distance the avatar moves, the higher the value the user gets. The system squares the ratio of min d to move d to widen the gap between movement distances. The value is the science laws value, denoted by Vsci.
A person with high Japanese ability understands what the other person says and responds correctly. To evaluate the Japanese ability, the system divides the number of correct replies, correct answers, by the total replying time, reply time, as shown in Equation 4. The value is the Japanese ability value, denoted by Vjap.
Vjap = (correct answers) / (reply time)    (4)
A user identifies the building to which he should deliver a package using a map. Since the user experiences the home-delivery service job in a virtual space, he must associate his map with the roads and buildings in the virtual space. A person with high space recognition ability correctly grasps the current position in the virtual space from the two-dimensional information on the map. The space recognition aptitude corresponds to moving the avatar, while watching the map, without losing one's way. We expect that a person with high space recognition aptitude can shorten the time in which he loses his way. The system calculates the loss time, loss time, to evaluate the space recognition, as shown in Equation 5. The loss time is the time during which the user does not operate for more than ten seconds. The value is the space recognition value, denoted by Vspa.
Vspa = loss time    (5)
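Equations 2-5 can be collected into a short sketch. The helper names and the idle-gap detection of the loss time are assumptions (the paper only states a ten-second threshold), and min d in Equation 3 is read as defined in the surrounding text.

```python
def v_mac(move_distance, move_time):
    """Machine operation value, Eq. (2): average moving speed."""
    return move_distance / move_time

def v_sci(min_d, move_d):
    """Science laws value, Eq. (3): squared ratio of the shortest possible
    delivery distance min_d to the distance actually moved, move_d; squaring
    widens the gap between efficient and aimless routes."""
    return (min_d / move_d) ** 2

def v_jap(correct_answers, reply_time):
    """Japanese ability value, Eq. (4): correct replies per unit reply time."""
    return correct_answers / reply_time

def v_spa(op_times, idle=10.0):
    """Space recognition value, Eq. (5): total loss time, sketched here as the
    summed gaps longer than `idle` seconds between operation timestamps."""
    return sum(b - a for a, b in zip(op_times, op_times[1:]) if b - a > idle)
```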
Table 2 summarizes the correlation coefficients between the extracted aptitude values and the values of CPS-J. In this paper, we judge that an aptitude value and the corresponding value of CPS-J are associated with each other if their correlation coefficient is more than 0.4.
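The association test can be sketched with a plain Pearson correlation; the numbers below are hypothetical, for illustration only, not the paper's data.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient (plain Python, no dependencies)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical per-examinee values (illustration only)
aptitude = [0.42, 0.61, 0.35, 0.80, 0.55]  # extracted aptitude values
cpsj = [48, 62, 40, 75, 58]                # corresponding CPS-J values

r = pearson(aptitude, cpsj)
associated = r > 0.4  # the paper's criterion: associated if r exceeds 0.4
```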
The system aptitude has coefficients below 0.4 at the first set and the third set, which is an undesirable result. In the first set, some of the examinees only marked the delivery destinations on the paper map, while the others also wrote the package numbers next to the destinations. In the second set, all examinees wrote the package numbers for the delivery destinations on the paper map. The correlation coefficient is presumed to be high in the second set because all of them took notes in the same way; it is therefore effective to show a sample of how to write the memo beforehand. The delivery destinations in the third set closely resemble those of the second set. Some examinees planned the delivery route by referring to the second set, so they got better aptitude values in the third set than in the second set. This makes their aptitude in the proposed method differ greatly from the result of CPS-J, and we presume this is why the correlation coefficient is lower. This problem can be solved by setting delivery destinations that do not resemble each other closely.
The machine operation aptitude has a high correlation coefficient in the first set, but it gets lower as the examinees experience the job more times. While examinees with high aptitude get familiar with the operations immediately, examinees with low aptitude do not; however, the latter come to operate well as they experience the job more. Therefore, the correlation coefficient is presumed to get lower. To avoid this effect, we should take the operation logs for the machine operation aptitude from an early set.
The science law aptitude has coefficients below 0.4 at the first set, but it gets better later. We should use later operation logs for the science law aptitude.
The Japanese ability aptitude has coefficients above 0.4 in all sets, so we can use any set of operation logs to calculate it. We also investigated the correlation coefficient between the machine operation value of CPS-J and the total answer time; its value is 0.13 on average. This indicates that users cannot necessarily select an option quickly even if they are good at machine operation.
The space recognition aptitude should have negative correlation coefficients, because a user with high aptitude does not lose his way. The experimental result shows that the correlation coefficient has a high absolute value at the first and second sets, which means we can apply the method in an early set. The more the examinees experienced the building arrangement in the virtual space, the better they grasped the positions of the buildings, which prevented them from losing their way. We can expect to keep the correlation coefficient high by increasing the variety of building placements so that examinees cannot memorize the arrangement.
From these results, we found that some aptitudes should be calculated in an early set, while others can be derived from any set in repeated job experiences. We cannot expect elementary school children to repeat the same experience many times, so the system should judge the vocational aptitudes in the first set, even though the children are not yet accustomed to it. To obtain the aptitudes that can only be derived after users get accustomed to the tool, we need to make users understand the operations of the job experience tool in advance, which eliminates the necessity of repeating the job experience. In particular, it is important for examinees to have an image of what kind of operations they will perform; it is useful for them to watch tutorials, such as demonstration movies of the job experience, beforehand.
The elementary school children answered a questionnaire with four-grade evaluations after the experience. For the item "Was the experience of the home-delivery service job fun?", all of them answered that they enjoyed it. For the item "Do you want to experience a service like this for other jobs again?", 75 of them answered that they want to. Therefore, job experience in the virtual space is a service that elementary school children enjoy. For the item "Could you operate as expected?", all of them answered that they could, so elementary school children can operate the tool well when it uses only the mouse and the arrow keys. Because they can operate as expected, our proposed method can be used for elementary school children.
6 Discussion
A significant issue is how many applications we should implement. The proposed method would be infeasible if we needed to implement an application corresponding to every job in the world. By measuring a wide range of abilities, CPS-J and the theory of Holland judge the aptitudes for a wide variety of jobs, from primary industry to tertiary industry. The proposed method aims to judge vocational aptitudes from the same viewpoint as CPS-J and the theory of Holland. To achieve this, we need to implement applications that measure a wide range of abilities in the virtual space. One application allows us to measure many kinds of abilities of the user, from which we can judge his vocational aptitudes for more than one job. In the experiment described in Section 4, we used the home-delivery service application; note, however, that it does not judge the aptitude for home-delivery service only. Therefore, we do not have to implement an application for every individual job whose aptitude is to be judged. Meanwhile, we cannot judge all of the vocational aptitudes with one application.
266 K. Nishide et al.
7 Conclusion
In this paper, we proposed a method to extract vocational aptitudes using a virtual space. We use a virtual space to implement a vocational aptitude test system that elementary school children can enjoy. The method focuses on the operation logs of each work step of the job experiences in the virtual space. The proposed evaluation expressions extract the aptitude values from the operation logs. By extracting the vocational aptitude from the operation logs, the system can be applied to examinees who are unable to answer the complicated questions of traditional tests.
We experimented to determine whether the vocational aptitude can be extracted by this method. Examinees took the home-delivery service job experience in a virtual space. We examined the correlation coefficients between the aptitude values extracted from the operation logs and the answered values of CPS-J. We found that some aptitudes should be derived from the first operation log, while others should come from later logs. An aptitude that needs many experiences can only be calculated after examinees get accustomed to the operation. Since we cannot expect children to try the same job experience many times, we should give them enough knowledge about the operations in the virtual space in advance. To further improve performance, it is useful for them to watch tutorials of the job experience beforehand.
As future work, we will develop a useful presentation method for the extracted aptitude values.
References
1. A Student Site for ACT Test Takers,
http://www.actstudent.org/index.html (cited November 29, 2011)
2. Free Sample of Vocational Aptitude Test,
http://www.personality-and-aptitude-career-tests.com/vocational-aptitude-test.html
(cited November 29, 2011)
3. CPS-J,
http://www.nipponmanpower.co.jp/ps/think/cpsj/
(cited November 29, 2011) (in Japanese)
4. Holland, J.L.: Making vocational choices: A theory of vocational personalities and work
environments, 3rd edn. Psychological Assessment Resources (1997)
5. Ogawa, K.: Application of role model based e-portfolio system to career design support.
In: Proceedings of World Conference on E-Learning in Corporate, Government, Health-
care, and Higher Education 2008, pp. 3052–3057 (2008)
6. Bhavani, B., Sheshadri, S., Unnikrishnan, R.: Vocational education technology: rural
India. In: A2CWiC 2010 Proceedings of the 1st Amrita ACM-W Celebration on Women
in Computing in India (2010)
7. Coles, T.R., Meglan, D., John, N.W.: The Role of Haptics in Medical Training Simula-
tors: A Survey of the State of the Art. IEEE Transactions on Haptics 4(1), 51–66 (2011)
8. Ueda, K., Endo, M., Suzuki, H.: Task decomposition: Why do some novice users have
difficulties in manipulating the user-interface of daily electronic appliances. In: Harris,
D., Duffy, V., Smith, M., Stephanidis, C. (eds.) Human-Centred Computing: Cognitive,
Social and Ergonomic Aspects, pp. 345–349. Lawrence Erlbaum Associates (2003)
9. Saito, K.: Study on systematic instruction to bring up power to think about mathematically (2009) (in Japanese),
http://www.fuku-c.ed.jp/center/houkokusyo/h21/h21sansuuchoken.pdf
(cited November 29, 2011)
10. ACTFL Certified Proficiency Testing Programs (oral and written),
http://www.actfl.org/i4a/pages/index.cfm?pageid=3642
(cited November 29, 2011)
11. Yasufuku, K., Abe, H., Yoshida, K.: Development of Architectural Visualization Ability
Test Using Real-Time CG. In: Proceedings of 7th. Japan-China Joint Conference on
Graphics Education, pp. 44–49 (2005)
Framework of a System for Extracting
Mathematical Concepts from Content
MathML-Based Mathematical Expressions
Takayuki Watabe*
Abstract. This study proposes the framework of a system that extracts mathematical concepts from an input mathematical expression. In this paper, math concepts are represented as specific patterns in math expressions, such as "differential equations" and "quadratic functions." This system, termed the concept extraction system, presents a math concept when an input math expression includes the pattern for that particular math concept. The system uses two key components: "MathPlaceholder," an originally defined XML vocabulary to describe patterns, and a pattern discriminator, a mechanism to identify whether an input math expression includes the predefined pattern(s). Math expressions described by an XML vocabulary called Content MathML have been used for this study. Lastly, the following two applications of the proposed system are presented: (1) it can be used as an information retrieval tool to match math concepts in math expressions, and (2) it can be used together with a learning management system that provides study material for the concepts used in a given math expression.
1 Introduction
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 269–278.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
270 T. Watabe and Y. Miyazaki
and the development of such a system will allow us to extract math concepts for a variety of applications. For instance, the system performs the function of matching math expressions having similar math concepts; furthermore, the extracted concepts can be used to obtain educational material for understanding the math expression.
In order to distinguish between math expressions with specific patterns such as "differential operators" or "equations" and those without such patterns, the use of rigid notations is indispensable. Content MathML [8], which meets this requirement, allows us to clearly describe the types of operators and operands in math expressions. TeX is another well-known markup language for describing math expressions; however, it does not clearly distinguish between, for example, "the quotient of dy and dx" and "the differentiation of y with respect to x." Content MathML clearly distinguishes between such concepts, which is why we chose this markup language for extracting math concepts from input math expressions.
In Section 2, Content MathML is introduced with some examples. Section 3 elaborates on the proposed system configuration and the algorithms for extracting math concepts, particularly focusing on how math patterns are described using the originally defined XML vocabulary and how math expressions are identified as having specific patterns. Subsequently, Section 4 explains two future applications of the proposed system: the use of the system as an information retrieval tool to match math concepts, and its use with a learning management system (LMS) to obtain relevant study material for a math expression after extracting concepts from it. Section 5 presents the concluding remarks.
2 MathML
Content MathML is an application of XML for describing mathematical notation, capturing both its structure and content. It has been released as a W3C Recommendation and is defined as one of the XML vocabularies. Central to Content MathML is the <apply> tag, which represents the application of the function or operator given as its first child to the remaining elements.
  <apply>
    <plus/>
    <ci>a</ci>
    <ci>b</ci>
  </apply>

Fig. 2.1 Representation of a + b using Content MathML

Table 2.1 Description of tags in Fig. 2.1

  Tag    Description
  apply  applies the first-child operator to the other elements
  plus   performs addition
  ci     encloses identifiers
Framework of a System for Extracting Mathematical Concepts 271
In other words, the code in Fig. 2.2 represents "applying second-order differentiation to a function." Thus, Content MathML clarifies the structures of math expressions using the <apply> tag with different types of operators and operands. There are different tags for operators such as subtraction, multiplication/division, trigonometric functions, and logarithmic functions.
Content MathML also introduces a mechanism to describe the types of identifiers by the values of the type attribute of the <ci> tag. For example, <ci type="vector">x</ci> means that the identifier x is a vector. The values of the type attribute include "integer," "real," "matrix," "set," and more.
  <apply>
    <sin/>
    <apply>
      <plus/>
      <ci>a</ci>
      <ci>b</ci>
    </apply>
  </apply>

Fig. 2.3 Representation of sin(a + b) by Content MathML and its corresponding tree
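To make the <apply> semantics concrete, here is a minimal Python sketch (not part of the paper's system) that parses the Content MathML of Fig. 2.3 with the standard library and evaluates it recursively: the first child of each <apply> names the operator, and the remaining children are its operands.

```python
import math
import xml.etree.ElementTree as ET

# Content MathML for sin(a + b), as in Fig. 2.3
SRC = """
<apply>
  <sin/>
  <apply>
    <plus/>
    <ci>a</ci>
    <ci>b</ci>
  </apply>
</apply>
"""

OPS = {"plus": lambda *xs: sum(xs), "sin": math.sin}

def evaluate(node, env):
    """Evaluate an <apply> tree: the first child names the operator,
    the remaining children are its operands."""
    if node.tag == "ci":                  # identifier: look up its value
        return env[node.text.strip()]
    if node.tag == "cn":                  # numeric literal
        return float(node.text)
    if node.tag == "apply":
        op, *args = list(node)
        return OPS[op.tag](*(evaluate(a, env) for a in args))
    raise ValueError("unsupported tag: " + node.tag)

print(evaluate(ET.fromstring(SRC), {"a": 0.0, "b": math.pi / 2}))  # 1.0
```

Only plus and sin are registered here; other Content MathML operator tags would be added to the OPS table in the same way.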
For the rest of this paper, let us use the two representations interchangeably and
allow the terminology to describe trees with “sin as a child of apply” or “plus and
ci as siblings.”
<arbitrary> tags with the same value of the "label" attribute are considered to be identical. Another attribute that may be taken by the <arbitrary> tag is "sibling." The value of the "sibling" attribute is either true or false; unless this attribute is explicitly specified, its value is considered false. If this attribute is set to true, the <arbitrary> tag is interpreted as an arbitrary number of sibling elements. The <partial> tag is devised to treat its child element as a part of a larger expression. With this tag, a sample pattern <mp:partial> <ci>x</ci> </mp:partial> matches targets such as <ci>x</ci> as well as an expression multiplying x and y, i.e., <apply> <times/> <ci>x</ci> <ci>y</ci> </apply>. One may also append the "sibling" attribute to the <partial> tag; when its value is true, the <partial> tag is interpreted as plural siblings, each of which has the child of the <partial> tag as its part. Lastly, <and>, <or>, and <not> represent the three logical operators "and," "or," and "not," respectively. The <and> and <or> tags have more than one child element, whereas the <not> tag has only one. The <and> tag, usually used with the <partial> tag, requires all of its child patterns to match. For instance, math expressions containing both x and y can be coded as shown in Fig. 3.1.
  <mp:and>
    <mp:partial>
      <ci>x</ci>
    </mp:partial>
    <mp:partial>
      <ci>y</ci>
    </mp:partial>
  </mp:and>
Likewise, the <or> tag is used for the "or" operation, and the <not> tag matches elements that do not match its child. Table 3.1 summarizes the tags introduced in this section.
  <mp:partial>
    <apply>
      <equal/>
      <mp:partial sibling="true">
        <apply>
          <diff/>
          <mp:arbitrary sibling="true"/>
        </apply>
      </mp:partial>
    </apply>
  </mp:partial>
  <mp:partial>
    <apply>
      <divide/>
      <apply>
        <factorial/>
        <mp:arbitrary label="n"/>
      </apply>
      <apply>
        <times/>
        <apply>
          <factorial/>
          <mp:arbitrary label="k"/>
        </apply>
        <apply>
          <factorial/>
          <apply>
            <minus/>
            <mp:arbitrary label="n"/>
            <mp:arbitrary label="k"/>
          </apply>
        </apply>
      </apply>
    </apply>
  </mp:partial>
match those of the target, that target is identified as having the matched pattern as a math concept. Let us take a simple example, wherein the target math expression is sin z and the pattern is cos z. The tree structures of the target and the pattern are shown in Fig. 3.4.
The number near each element indicates the traversing order. It can be observed that the tenth elements in the two tree structures do not match, leading to the termination of the traversing procedure. As a result, the system indicates that the target sin z does not include the pattern cos z.
Fig. 3.5 Tree with “arbitrary” element and its traversing path
The next example is that of a pattern with a "partial" element. When the "partial" element is detected, the child element of that element and the corresponding element of the target tree are separated, and each may form another tree. Let the child element of the "partial" element and its corresponding element in the target be denoted as pt and tt, respectively. Further, let pte and tte denote the elements currently focused on in pt and tt, respectively. Note that pt and tt are scanned according to the algorithm shown in Fig. 3.6.
When the child element of the “partial” element in the pattern and the corre-
sponding element in the target are matched successfully, the system continues
traversing the tree by skipping the subtree of the “partial” element in the pattern as
well as in the target. However, if the elements in the target and the pattern do not
match, the system indicates that the target does not include that pattern. Fig. 3.7
shows the scanning procedure of the pattern and the target tree structures.
Fig. 3.7 Scanning of pattern and target trees with “partial” elements
When a tree includes an "and" element, each of its child elements is extracted individually. In addition, the system extracts the corresponding element (tt) from the target tree. Next, each child element of the "and" element is considered a subpattern and is matched against tt to check whether tt contains it. If tt includes all the subpatterns, the system continues scanning by skipping tt as well as the subtrees of the "and" element; if tt does not include all the subpatterns, the system indicates that the scanning result is unsuccessful.
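The preorder matching described in this section can be sketched as follows. This is a simplified illustration, not the paper's implementation: the mp: namespace prefix is dropped for brevity, <arbitrary> matches any single subtree (with equal labels required to bind identical subtrees), and the "sibling", <partial>, <and>/<or>/<not> behaviours are omitted.

```python
import xml.etree.ElementTree as ET

def matches(pattern, target, bindings=None):
    """Preorder check of whether `target` matches `pattern`.
    <arbitrary> matches any single subtree; equal `label` attributes
    must bind to identical subtrees."""
    if bindings is None:
        bindings = {}
    if pattern.tag == "arbitrary":
        label = pattern.get("label")
        if label is not None:
            if label in bindings:          # same label => identical subtree
                return ET.tostring(bindings[label]) == ET.tostring(target)
            bindings[label] = target
        return True
    if pattern.tag != target.tag:
        return False
    if (pattern.text or "").strip() != (target.text or "").strip():
        return False
    p_kids, t_kids = list(pattern), list(target)
    return len(p_kids) == len(t_kids) and all(
        matches(p, t, bindings) for p, t in zip(p_kids, t_kids))

target = ET.fromstring(
    "<apply><sin/><apply><plus/><ci>a</ci><ci>b</ci></apply></apply>")
print(matches(ET.fromstring("<apply><sin/><arbitrary/></apply>"), target))  # True
print(matches(ET.fromstring("<apply><cos/><arbitrary/></apply>"), target))  # False
```

In the second call the traversal stops at the cos/sin mismatch, just as in the Fig. 3.4 example where the mismatched element terminates the scan.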
Examples of trees with “or” and “not” elements are omitted owing to space
constraints in the paper.
4 Applications
In this section, we discuss two applications that use the concept extraction algorithm.
expressions. This algorithm finds the similarities by considering the structural prop-
erties of each math expression. On the other hand, the IR tool is capable of finding
similarities by comprehending math expressions more abstractly because the system
performs computations on the basis of the math concepts. There exists another study
that finds the similarities between expressions on the basis of the number of tags in
their MathML codes [4].
5 Concluding Remarks
This study deals with the representation of math concepts as specific patterns in math expressions and aims to devise a method for extracting math concepts from an input math expression. To describe patterns, an originally extended vocabulary, MathPlaceholder, is introduced, and its notations and functions are elucidated. Two key components, namely the "concept tuple" and the "pattern discriminator," are essential to this system and have been explained in detail. Lastly, applications showing the validity and usefulness of the proposed concept extraction system have been presented.
In the future, we plan to enhance this system by devising a method for the metadata representation of the concepts, classifying the concepts into different categories, and establishing relations among different concepts. This enhancement will enable us to use the extracted math concepts optimally. In relation to our future plans, [3] deals with knowledge management concerning math expressions, and [5] deals with the development of an ontology of expressions in MathML. It is also desirable to implement the applications shown in Section 4 to evaluate the usefulness of the proposed system.
References
[1] Altamimi, M.E., Youssef, A.S.: Wildcards in math search, implementation issues. In:
CAINE/ISCA, pp. 90–96 (2007)
[2] David, C., Kohlhase, M., Lange, C., Rabe, F., Zhiltsov, N., Zholudev, V.: Publishing
math lecture notes as linked data. The Semantic Web: Research and Applications,
370–375 (2010)
[3] Jeschke, S., Natho, N., Wilke, M.: KEA-A knowledge management system for mathe-
matics. In: 2007 IEEE International Conference on Signal Processing and Communica-
tions, pp. 1431–1434 (2007)
[4] Kishimoto, S., Nakanishi, T., Sakurai, T., Kitagawa, T., Tochigi, T.: An Implementa-
tion method of similarity-based retrieval for formulas using MathML. In: IEICE
DEWS 2003 6-P-07 (2003)
[5] Kitani, N., Yukita, S.: The educational uses of mathematical ontology and the search-
ing tool. Frontiers in Education Conference (FIE 2008) T4B-11 (2008)
[6] Kohlhase, M., Sucan, I.: A search engine for mathematical formulae. Artificial Intelli-
gence and Symbolic Computation, pp. 241–253. Springer (2006)
[7] Watabe, T., Miyazaki, Y.: Toward math education utilizing math expressions with
their semantic information. In: Annual Conference of Japan e-Learning Association,
pp. 13–20 (2011)
[8] W3C Math Home, http://www.w3.org/Math/
[9] Yokoi, K., Aizawa, A.: An approach to similarity search for mathematical expressions
using MathML. Towards Digital Mathematics Library (DML), 27–35 (2009)
Fundamental Functions of Dynamic Teaching
Materials System*
1 Introduction
In many countries, including Brazil, there is a lack of e-Learning teaching materials, as well as other problems, such as: how will students and teachers access the materials? Do they have all the necessary hardware and software configurations to use the materials? Do they have the computer experience necessary to operate the system? Another problem is that most e-Learning teaching materials are
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 279–287.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
280 G.M.T. Batista, M. Urata, and T. Yasuda
simply digital versions of printed teaching materials, or new materials that work in the same way. There is nothing really new, nothing that makes the use of the computer truly necessary, or that in fact utilizes the real potential of a computer. There is also a lack of e-Learning materials designed to be used during classes; most materials are designed for distance education, so teachers have problems using them during classes, limiting them to videos, audio and slide presentations, which leads to an underutilization of the computer. In fact, it is difficult to have a class where the teacher and all the students can access a computer and have everyone participate as they do in a normal class; making e-Learning teaching materials for this kind of situation is also difficult.
Because there are many people involved in the process of producing e-Learning teaching material, it is more difficult to make the material really match the needs encountered by teachers and students during classes. First in the process comes the instructional designer's work; then the programmer or web designer creates the material itself based on the instructional design; and finally the teacher uses the material with his/her students. However, as the instructional designer is not the teacher who will be using the material, he/she does not know exactly what is necessary in the design of the material or what the teachers and students really need. Another problem is that the teacher does not know the technical limitations faced by the programmer in producing the materials, and does not know exactly which problems can be solved by updating the material and which would be better solved by adapting his/her teaching methods. If they do not work together, therefore, the material may have many problems.
This research was done together with the Brasilia University Department of Foreign Languages and Translation, located in Brazil. A group of teachers and tutors from Brasilia University was responsible for testing the system and creating teaching materials for Brasilia University students. Those teachers and tutors are members of Project ELO (ELO stands for Online Language School in Portuguese, Escola de Línguas Online), and the Dynamic Teaching Materials System developed as a result of this research is also part of the project. Fundamentally, Project ELO analyzed what kind of features are necessary in the system, based on the teachers' and tutors' needs; however, at Brasilia University we did not have the necessary resources and knowledge to develop the Dynamic Teaching Materials System. As Brasilia University had a student exchange program with Nagoya University, we took this opportunity to transfer the development of the system to Nagoya. Another reason for the transfer was the opportunity to try the system in Japanese schools intended for Brazilian children, so that Brazilian teachers could also help with the education of those children without having to come to Japan.
The teaching material present in the system was created by veteran students under the supervision of the teachers. In addition to the teaching materials, the project also included communication features, which were considered extremely important; as stated in [1] and [2], allowing students to communicate with tutors, teachers and other students is also important in the learning process, as they can participate more and help each other. The project started only with Japanese, but the system is now available to the whole of the Department of
Fundamental Functions of Dynamic Teaching Materials System 281
materials; in order to use the same content in another material, a copy of the content needs to be inserted in the source of the other material, and this needs to be done for every material that uses the content.
The third principal difference is the automatic update of shared contents. This feature helps the teaching materials to be edited or updated faster, as all the materials that have a linked content get updated at once. Since static materials do not support shared contents, every time a content is updated the copies present in the other materials stay the same; to update the copies, it is necessary to insert a copy of the updated version of the content in place of the old one, material by material.
The features of the dynamic teaching materials allow the teaching materials to evolve, adapting to the needs encountered by the teachers during classes. They help the teachers to work together by sharing content, and allow them to grow together and share their updates automatically in realtime.
A content linkage feature was also created in the system. As the teachers wanted to create a database of the contents used in the teaching materials, when a content is stored in the database, the content link feature allows teachers to share the contents, link other teachers' contents to their own materials, or make copies and create new versions of the contents. Any content created can be used in any teaching material in the system just by making a reference to the desired content's ID number; this feature works not only with texts, images and videos, but also with a whole slide containing multiple contents, or a whole folder of slides.
The structure of the Dynamic Teaching Material is basically a group of folders; the folders are filled with slides, and inside the slides are the contents, such as texts, images and videos. Everything has its own ID number, so when the system is accessed by a teacher, it loads the necessary group of folders based on the teacher's ID. Then, when the teacher opens a folder, the system checks the ID number of the folder to verify which slides' ID numbers are related to that folder and loads the slides. Loading a slide follows the same process as loading a folder: the system checks the slide's ID number to verify the ID numbers of the contents related to the slide, and loads the contents based on the numbers written in the database table.
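The folder → slide → content resolution described above can be sketched with in-memory tables standing in for the system's database (the table layouts and names here are hypothetical, for illustration only):

```python
# In-memory stand-ins for the system's database tables (hypothetical layout).
FOLDERS = {1: {"name": "Lesson 1", "slides": [10, 11]}}
SLIDES = {10: {"contents": [100, 101]}, 11: {"contents": [102]}}
CONTENTS = {
    100: {"type": "text", "data": "Ola!"},
    101: {"type": "image", "data": "greeting.png"},
    102: {"type": "video", "data": "intro.mp4"},
}

def load_folder(folder_id):
    """Resolve a folder ID into its slides, and each slide ID into its
    contents, mirroring the ID-based lookups described in the text."""
    folder = FOLDERS[folder_id]
    slides = []
    for slide_id in folder["slides"]:
        contents = [CONTENTS[cid] for cid in SLIDES[slide_id]["contents"]]
        slides.append({"id": slide_id, "contents": contents})
    return {"name": folder["name"], "slides": slides}

print(load_folder(1)["slides"][0]["contents"][0]["data"])  # Ola!
```

Every lookup goes through an ID number, which is what lets any content be linked into any material without copying it.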
In the system, the contents are organized in packages called 'Packy'. Each Packy is a package with a maximum of ten contents, plus one content that is used only in interactive operations. This package organization of the contents is necessary for the interaction features. When the teacher is creating a teaching material, he/she can configure all the interactions of the content and save them in a package. For example, suppose the teacher wants one image to change to another when the student clicks on it. In this case, the teacher puts the two images in the same package and configures it, on the editing menu, to change from one image to the other. All the information about the type of interaction and the images' URLs in the package is stored in the database as a content, and the package can be loaded, edited and linked to any Dynamic Teaching Material as a single content.
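As a rough data-structure sketch (with hypothetical field names; the actual system stores this information in database tables), a Packy and its ten-content limit could be modeled as:

```python
from dataclasses import dataclass, field

MAX_CONTENTS = 10  # a Packy holds at most ten ordinary contents

@dataclass
class Packy:
    """A package of contents plus one interaction configuration."""
    contents: list = field(default_factory=list)     # content URLs or IDs
    interaction: dict = field(default_factory=dict)  # e.g. {"type": "swap-on-click"}

    def add(self, content):
        if len(self.contents) >= MAX_CONTENTS:
            raise ValueError("a Packy holds at most ten contents")
        self.contents.append(content)

# e.g. two images that swap when the student clicks
p = Packy(interaction={"type": "swap-on-click"})
p.add("before.png")
p.add("after.png")
```

Because the whole package is stored as a single content, linking a Packy into another material links its interaction configuration along with it.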
Fig. 1 System diagram showing the connections between the main interface and the other parts of the system, and which technology was used for creating each part.
the PHP, so PHP was used to link the interface of the system to the database. The file upload feature was also made using PHP to store the files on the server, because it is not possible to do this with ActionScript 3 alone. The system also has a content search feature that can be used to search for images and videos. In the case of images, the system allows the teacher to use Google image search directly from the system interface and put the image into the teaching material. In the case of videos, the system allows the teacher to search videos from YouTube and opens the YouTube player inside the teaching material as a content that can also interact with other contents. In both cases the search feature was created using the Google AS3 API.
Compared with other e-Learning systems, the Dynamic Teaching Materials System has some features that make it more useful and faster. Almost all e-Learning systems have an interface built from the developer's perspective, which is a big barrier to novice computer users, as stated in [4] and [5]. One of the most important things about the system is that teachers have everything they need in one place, on a single screen. All the creation, editing, linking and configuration tools are on the same screen, organized in menus, and everything is online. The WYSIWYG interface allows the teacher to open the teaching material and edit only the part that needs to be edited (Fig. 2); the system has all the necessary features in the same place, so that the interface used by the students to study, the interface used by the teachers to utilize the
Fig. 2 Context menu and Packy editing menu used by teachers to create and configure the
content of the Dynamic Teaching Materials.
teaching materials in class, and the interface used by the teachers to edit the materials are all the same; the only difference is that the editing features and menus are not enabled for the student. Thus, when a teacher creates a teaching material, the contents are already inside the system during editing, so he/she does not need to upload the teaching material later and can edit it in realtime whenever they like. When a teaching material is finished it can be accessed instantaneously by the students.
As stated in [6], "Novices tend to have limited knowledge and will often make assumptions about what to do using other knowledge about similar situations." The system interface was made to be very similar to a computer OS interface, to make maximum use of the user's prior knowledge of operating a computer, so they do not have to learn basic operations such as opening files and folders with a double click, and will have a more intuitive experience even when using the system for the first time. The WYSIWYG interface for editing the teaching materials is very important, not just because it is simpler, but because the whole system was structured to work based on this interface. This kind of interface helps the user get a response from the system more quickly, because it allows the user to see the changes in realtime.
5 Conclusion
The system has shown good results, allowing the teachers to create interactive multimedia teaching materials even without knowing any programming language. The dynamic teaching materials are created faster than the materials previously created with Moodle and the other authoring software used by the teachers, because of the WYSIWYG interface integrated directly into the system's content view screen. The teachers creating and editing the teaching materials by themselves is also very important: the speed with which the teaching materials can be updated makes updates more efficient, and they reach the students who really need them, making the teaching materials more appropriate for both students and teachers.
The system still needs some improvements, such as an internal content search engine and improvements to the editing interface to allow teachers to really work together on the same material, at the same time, on the same slide, communicating and seeing what the others are doing in realtime. Another useful feature would be the possibility of dragging content from the computer's file folders directly onto the system interface, which could make uploading files from the computer to the system faster and easier. At present the system only runs on computers such as desktops and notebooks, but a mobile version that runs on tablet PCs such as iPads or Android tablets would make the system more useful during classes, as a tablet can be operated more easily than a PC and makes it easier for anyone to access the system and study or create a teaching material from anywhere, giving more freedom to the users. Another important point is that those devices usually have cameras that can help in the communication between users. As stated in [7] and [8], letting users use the system more freely and improving the communication features should be useful: for example, this can allow different students to study in different ways and let them participate and interact to create their own knowledge.
In the current version of the system, the basic functions that allow the creation of the Dynamic Teaching Materials were developed, allowing the teaching materials to evolve. However, there are still many possibilities for improving the system.
Acknowledgements. Part of this work was supported by Grants-in-Aid for Scientific Research, Japan.
References
[1] Grant, L., Facer, K., Owen, M., Sayers, S.: Opening Education: Social software and
learning. Futurelab United Kingdom (2006)
[2] Letramento digital através de narrativas de aprendizagem de língua inglesa,
http://www.veramenezes.com/narmult.mht
[3] Gillani, B.: Learning Theories and the Design of E-Learning Environments. University
Press of America United States (2003)
[4] Gestural Interfaces: A Step Backwards In Usability,
http://www.jnd.org/dn.mss/gestural_interfaces_a_step_backwards_in_usability_6.html
[5] Natural User Interfaces Are Not Natural,
http://www.jnd.org/dn.mss/natural_user_interfaces_are_not_natural.html
[6] Preece, J., Rogers, Y., Sharp, H.: Interaction Design: beyond human-computer interac-
tion, 2nd edn. John Wiley & Sons Ltd, England (2009)
[7] Blackwood, A., Anderson, P.: Mobile and PDA technologies and their future use in
education. JISC Technology and Standards Watch: 04-03 (2004)
[8] Chao, H., Wu, T.: Mobile e-Learning for Next Generation communication Environ-
ment. Journal of Distance Education Technologies 6(4), 1–13 (2008)
Generation Method of Multiple-Choice Cloze
Exercises in Computer-Support for
English-Grammar Learning
Ayse Saliha Sunar, Dai Inagi, Yuki Hayashi, and Toyohide Watanabe
Abstract. With the many remarkable advances in technology, studying not only with
a tutor at school but also through a computer at home has become a common preference.
Intelligent Tutoring Systems (ITS) form one of the research fields that aim to support
individual learning intelligently. In many ITS, the learning materials that convey the
domain knowledge are statically associated with each other in advance and presented
to the student based on her/his understanding state. In computer-supported systems,
it is the system's task to motivate students and make them more interested in the
learning content. If students study content they are interested in, the learning
activity becomes more effective. Our research objective is to construct a system
that automatically generates multiple-choice cloze exercises from text input by the
student. We focus on supporting the individual study of English grammar. In this
paper, we propose a representation method for English grammar based on Part-Of-Speech
(POS) tags and words, a calculation procedure for estimating the understanding state
of the student in the student model, and a learning strategy for generating the next
exercise based on the student model.
1 Introduction
Knowing English became one of the most important and essential issues in every-
one’s life. It is necessary for many foreigners to develop themselves and advance
their works. In order to learn English and check the effect, many people prepare
for English examinations. People prefer to study English through computer by
Ayse Saliha Sunar · Yuki Hayashi · Toyohide Watanabe
Graduate School of Information Science, Nagoya University,
Furo-cho, Chikusa-ku, Nagoya, Japan
e-mail: {saliha,yhayashi,watanabe}@watanabe.ss.is.nagoya-u.ac.jp
Dai Inagi
Faculty of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
e-mail: inagi@watanabe.ss.is.nagoya-u.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 289–298.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
themselves. Intelligent Tutoring Systems (ITS) constitute the research field that
aims to support individual study using Artificial Intelligence technology. An ITS
gives students appropriate learning content from the domain according to their
understanding [1]. Although ITS provide individual learning environments, most
systems give every student parts of the same learning content. In the domain
knowledge, the learning content must be structured in such a way that it can be
easily managed in order to adapt learning to the student's understanding. However,
constructing the domain of English is quite hard [2]. The biggest challenge in
representing English is that English does not consist of particular formulae and
theorems as mathematics does. If English grammar could be represented as formulae,
it would be easy to manage the domain knowledge. Kyriakou et al. proposed a tool for
managing domain knowledge and helping tutors in ITS [3]. Their system consists of
three components: knowledge concepts organized in a network, course units, and a
meta-description, which is a data set of learning objects. The system describes not
only new metadata but also their concepts and relations. Although it allows a tutor
to manage the domain by creating, storing, viewing, and editing the metadata, and
can create the concept network of the domain, relation links, contents, and concepts
must still be defined manually rather than automatically. Faulhaber et al.
constructed ActiveMath, a web-based learning system for mathematics [4]. This system
represents the learning content (rules/concepts) as nodes, and inter-node relations
are dynamically extracted from the domain. Even though relations are determined
dynamically, the system still requires a predefined domain with assigned difficulty
levels. However, if students study content they are interested in, rather than
predefined learning content, the learning activity would be more effective and
students would be more motivated.
In this research, we aim to construct an ITS that automatically generates
multiple-choice cloze English grammar exercises from the student's input text based
on the student's understanding state. To automatically generate English questions,
MAGIC (Multiple-choice Automatic GeneratIon system for Cloze question), which
generates multiple-choice cloze questions from an input English text, was developed
in our laboratory [5]. However, MAGIC only generates English questions; it is not
adapted as a learning system for English grammar. In this paper, we propose
mechanisms for estimating the student's understanding and for generating exercises
based on the student model, to be added to MAGIC. Since grammar consists of rules,
we propose to formulate English grammar rules using Part-of-Speech (POS) tags and
some words. First, our system generates a few exercises from the student's input
text. After the student answers the exercises, the understanding state of the
student is estimated in the student model based on the accuracy of the answers.
Then, the next exercises are generated from newly input text based on the student's
understanding state. In this paper, we represent English grammar rules by POS tags
and words, and discuss this formulation so as to be useful for our learning
strategy. We also discuss how to estimate the understanding state of the student
and how to generate the next exercises.
Fig. 2 System architecture: the ITS framework (interface, domain knowledge (a),
student model (b), and learning strategy (c)) combined with the MAGIC system's
processing flow, in which sentences are extracted from the input text, a unit and a
blank part are selected, distracters are generated, exercises are output, the
student's answers are evaluated, explanations are displayed, and the student model
is renewed.
The MAGIC system performs three procedures: extracting sentences that are
appropriate for multiple-choice cloze questions from texts, determining the blank
part, and generating distracters.
The system uses several methods to carry out these processes. In the first process,
Preference Learning is executed using the words and POS tag information appearing
in existing multiple-choice cloze questions to put the input sentences in order. To
compute this order, the Ranking Voted Perceptron [7] is used: it calculates a rank
for each input sentence according to its similarity to the existing questions. As a
result, the sentences that are appropriate for composing a cloze question for an
examination such as TOEIC are ranked highly. In the next processes, the blank part
and its distracters are estimated based on Conditional Random Fields (CRF). In the
current MAGIC system, CRF is introduced to attach labels to the words of a sentence;
a blank part is then defined as a named entity in the sequence of words and
represented in IOB2 format [8]. Sequences of words, POS tags, and distracters,
together with their named entities in existing multiple-choice cloze questions, are
learned. The word that has the largest marginal probability for certain tags is
determined as the blank part. Its distracters are also determined based on the CRF
results.
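The last step above can be illustrated with a small sketch (not the authors' implementation): given per-word marginal probabilities for the blank label, such as a CRF-based labeller might produce, the word with the largest marginal is chosen and labelled in IOB2 format [8]. The sentence and the probability values here are made up.

```python
# Illustrative sketch: selecting a blank part from hypothetical per-word
# CRF marginal probabilities, and labelling it in IOB2 format.
def choose_blank(words, blank_marginals):
    """Index of the word with the largest marginal probability of
    being labelled as the blank part."""
    return max(range(len(words)), key=lambda i: blank_marginals[i])

def iob2_labels(words, blank_index):
    """B-BLANK opens the blank-part entity; every other word is
    Outside (O)."""
    return ["B-BLANK" if i == blank_index else "O"
            for i in range(len(words))]

words = ["I", "would", "ask", "to", "speak"]
marginals = [0.05, 0.30, 0.10, 0.15, 0.40]  # hypothetical CRF marginals
idx = choose_blank(words, marginals)
print(words[idx], iob2_labels(words, idx))  # speak ['O', 'O', 'O', 'O', 'B-BLANK']
```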
In order to generate exercises based on the student's degree of understanding, the
learning content must be constructed first. The domain knowledge, shown in
Fig. 2(a), represents English grammar rules as separate units. In order to estimate
the student's understanding state for each unit, a student model is added to MAGIC,
as shown in Fig. 2(b). To generate exercises that are suitable for the student, the
processes of selecting the blank part and generating distracters are altered. The
learning strategy is applied based on the student model, as in Fig. 2(c). The blank
part and its distracters are thereby determined from the student model so as to
generate appropriate exercises. The system then outputs multiple exercises to the
student. Finally, after the student answers the exercises, our system evaluates the
answers and corrects the student's mistakes. Our system also updates the student
model according to the accuracy of the answers before generating new exercises.
3 Grammar Structure
To construct a system for making English grammar learn, the domain knowledge of
English grammar should be defined so as to be able to manage the student model and
apply the learning strategy. We examined English textbooks to decide the structure
of English grammar. In general, the English grammar is divided into chapters and
units. Each chapter includes some units which are relevant to the chapter’s subject.
Each unit represents a single grammar rule. Since blank parts are selective part(s) of
exercises to assess the student’s knowledge of the grammar rules, we also examined
blank parts of the exercises on the English textbooks.
The results of our investigation show that most English grammar rules can be
represented as an ordering of parts of speech. In addition, some words may be as
important as their type (part of speech) and their placement in the sentence. In
this research, we therefore represent each English grammar rule as a composition of
POS tags and words. We determine the candidates for the blank part of each grammar
rule while defining the rules.
Table 1 shows an example of the representation of a grammar rule. The unit
represents the basic rule in the Passive Voice chapter. In this example, “VB” (base
form of verb) and “VBD” (past form of verb) are word-level POS tags, “by” is the
important word, and “NP” (noun phrase) is a phrase-level POS tag. For instance, the
use of the past form of the verb (VBD) and of the preposition “by” are more
characteristic parts in learning the passive voice than other parts such as the
subject of the sentence. If the student answers these parts properly, our system
can estimate that the student has knowledge of the passive voice.
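A minimal sketch of this representation (our own illustration, not the authors' code): a grammar rule is a sequence of POS tags and important words, and a sentence matches the rule if the pattern occurs in its tagged form. We assume the sentence has already been tagged and chunked so that noun phrases appear as single "NP" items.

```python
# Match a grammar rule, written as a sequence of POS tags and words
# (e.g. "If NP VBD, NP MD VB"), against a tagged and chunked sentence.
def matches_rule(rule, tagged):
    """Rule items match either the word (case-insensitively) or the
    tag; the rule must occur as a contiguous subsequence."""
    items = rule.replace(",", " , ").split()
    for start in range(len(tagged) - len(items) + 1):
        if all(item.lower() == tagged[start + k][0].lower()
               or item == tagged[start + k][1]
               for k, item in enumerate(items)):
            return True
    return False

# "If I were in your situation, I would ask ..." after tagging/chunking:
tagged = [("If", "IN"), ("I", "NP"), ("were", "VBD"), (",", ","),
          ("I", "NP"), ("would", "MD"), ("ask", "VB")]
print(matches_rule("If NP VBD, NP MD VB", tagged))  # True
```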
After receiving all of the student's answers, the system evaluates their
correctness. The system then updates the student's understanding state based on the
checked answers.
Fig. 3 Flowchart of the unit-selection process: each tagged sentence is matched
against the units in the grammar-rules database; a sentence that matches no unit is
discarded, a sentence that matches exactly one unit is used with that unit, and
when a sentence matches more than one unit the appropriate unit is selected; the
process repeats while input sentences remain.
When a sentence matches several units, the learning strategy is applied to choose
the appropriate unit among them, and an exercise is generated based on its grammar
rule. Figure 3 shows a flowchart of the selection process. If a sentence matches
none of the units, no exercise is generated from it; the system then carries out
the matching process on the next sentence of the input text, if one exists. If a
sentence matches exactly one unit, the exercise is generated from the sentence
based on that unit's grammar rule. If the sentence matches more than one unit, one
of them is selected, and the exercise is generated based on the selected unit's
grammar rule. After the unit is selected, the system checks whether another
sentence remains in the input text; if so, the matching process is repeated. The
learning strategy selects the unit based on the student's understanding state. In
order to judge whether a student understands a unit, we introduce a threshold α
(0 ≤ α ≤ 1) on the understanding state for each unit. If d_i is larger than α, the
student is judged to have understood unit i; otherwise, the system judges that s/he
has not yet understood it. In the selection process, the unit that is easiest to
advance to the threshold α, i.e., whose d_i is closest below α, is selected. If all
candidate units have already reached α, the unit with the lowest d_i is selected.
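The selection policy above can be sketched as follows (our illustration; the unit identifiers and values are examples):

```python
# Select a unit among the matched candidates.  d holds the understanding
# state d_i of each unit; alpha is the threshold.  Prefer the unit whose
# d_i is closest below alpha ("easiest to advance to the threshold");
# if every candidate already reached alpha, take the lowest d_i.
def select_unit(candidates, d, alpha=0.6):
    below = [u for u in candidates if d[u] < alpha]
    if below:
        return max(below, key=lambda u: d[u])   # closest below alpha
    return min(candidates, key=lambda u: d[u])  # all understood: lowest

d = {"2-1": 0.75, "4-5": 0.58}
print(select_unit(["2-1", "4-5"], d))  # 4-5
```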
After selecting the unit, the placement of the blank part is determined. In order
to decide an appropriate blank part for an exercise, our system considers the
blank-part information of the matched grammar rule. For a unit to be judged
successfully completed, each of its blank parts should be studied and completed
successfully; therefore, at least one exercise should be asked about each blank
part. The percentage of correct answers per blank part is thus also calculated, and
the blank part with the lowest percentage is determined as a candidate. Our system
then checks whether the CRF results are compatible with the decided candidates for
the blank part. If the word with the highest CRF result is not one of the unit's
blank-part candidates, the next highest result is checked in order to generate a
suitable exercise.
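A sketch of this blank-part decision (our reading; the paper does not fully specify how the two criteria interact, and all names and numbers here are hypothetical): the blank part with the lowest percentage of correct answers is preferred, and the CRF ranking is walked top-down until a result compatible with the unit's candidates is found.

```python
# Decide the blank part of the next exercise from the unit's candidates,
# the per-candidate correct-answer percentage, and the CRF ranking.
def pick_blank_part(candidates, percent_correct, crf_ranking):
    """candidates: blank parts defined for the unit;
    percent_correct: fraction of correct answers per blank part;
    crf_ranking: labels ordered by decreasing CRF score."""
    # fallback: the least-mastered candidate
    weakest = min(candidates, key=lambda bp: percent_correct[bp])
    # walk the CRF ranking until a unit candidate is found
    compatible = [bp for bp in crf_ranking if bp in candidates]
    return compatible[0] if compatible else weakest

# Hypothetical numbers echoing the worked example of Sect. 5:
print(pick_blank_part(["VBD", "MD VB"],
                      {"VBD": 0.50, "MD VB": 0.67},
                      ["VBD", "NN", "MD VB"]))  # VBD
```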
5 Example
In our system, a student can input a long or short English text. In this example,
we assume that the student inputs the following sentence, which the system tags and
parses as shown below:
“If I were in your situation, I would ask to speak with the manager.”
(TOP (S (SBAR (IN If) (S (NP (PRP I) ) (VP (VBD were) (PP (IN in) (NP
(PRP$ your) (NN situation) ) ) ) ) ) (, ,) (NP (PRP I) ) (VP (MD would) (VP
(VB ask) (S (VP (TO to) (VP (VB speak) (PP (IN with) (NP (DT the) (NN
manager) ) ) ) ) ) ) ) (. .) ) ).
The system then compares the grammar structure of the sentence with the structures
of the defined grammar rules. Currently, 65 grammar rules have been defined. This
sentence matches unit 2-1 (unit 1 of chapter 2) and unit 4-5 (unit 5 of chapter 4).
Chapter 2 covers “Infinitive”, whose basic rule is represented by “TO VB” with only
one candidate for the blank part (TO VB); chapter 4 covers “If Clauses”, whose rule
is represented by “If NP VBD, NP MD VB” with two candidates for the blank part (VBD
or MD VB). These two units are shown as follows:
Chapter 2: Infinitive
Unit 1: General rule of the infinitive
Grammar rule: TO VB
Candidate(s) of blank part: “TO VB”

Chapter 4: If Clauses
Unit 5: Usage of the past tense in if clauses
Grammar rule: If NP VBD, NP MD VB
Candidate(s) of blank part: “VBD” or “MD VB”
Since this sentence matches two units, the system must decide which of them to use
for generating the exercise, based on the student model. In this case, we set
α = 0.6. The number of exercises and the student's correct answers are shown in
Table 2.
Using expression (5), the understanding states of the student for units 2-1 and 4-5
are calculated as shown in expressions (6) and (7), respectively:

d_{2-1} = (1/1) · (3/4) = 0.75    (6)

d_{4-5} = (1/2) · (1/2 + 2/3) ≈ 0.58    (7)
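The worked numbers can be reproduced as follows. This assumes, from our reading of expressions (6) and (7), that expression (5) averages, over the blank parts of a unit, the fraction of correctly answered exercises per blank part:

```python
# Understanding state d_i of a unit: the mean, over its blank parts,
# of (correct answers / exercises asked) per blank part.
def understanding_state(results):
    """results: list of (correct, asked) pairs, one per blank part."""
    return sum(c / a for c, a in results) / len(results)

d_2_1 = understanding_state([(3, 4)])          # one blank part: 3/4
d_4_5 = understanding_state([(1, 2), (2, 3)])  # two blank parts
print(round(d_2_1, 2), round(d_4_5, 2))        # 0.75 0.58
```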
Since d_{4-5} = 0.58 is smaller than the threshold α = 0.6, the exercise is
generated from the grammar rule of unit 4-5 according to the learning strategy.
After selecting the unit, our system estimates the blank part. Since the VBD part
has both the higher CRF result and the lower percentage of correct answers, it is
estimated as the blank part of the exercise. Distracters are then generated for the
estimated blank part based on the CRF. Finally, the system outputs the exercise
generated for the blank part (VBD) of unit 4-5.
6 Conclusion
In this paper, we proposed a framework for a learning system that generates English
multiple-choice cloze exercises from input text according to the student's
understanding state. We proposed a way to represent knowledge of English grammar:
because English grammar consists of grammar rules, it can be formulated using POS
tags and words. All grammar rules are defined as individual units. A learning
strategy was defined in order to select an appropriate unit among those matched
with the structure of the input sentence. Furthermore, a calculation method for
estimating the student's understanding state in the student model was defined for
each unit.
For future work, we have to add new English grammar rules to the domain knowledge
and confirm that sentences and units can be matched correctly. In addition, the
rules are currently defined independently of one another; to estimate the
understanding state of a student more accurately, the relations among grammar rules
may need to be considered. In this paper, we focused on the process of generating
exercises. We plan to consider a method for automatically providing explanations
that correct the student's mistakes after evaluating the student's answers.
References
1. Corbett, A.T., Koedinger, K.R., Anderson, J.R.: Handbook of Human-Computer
Interaction, pp. 849–874 (1997)
2. Heilman, M., Eskenazi, M.: Language Learning: Challenges for Intelligent Tutoring
Systems. In: Proc. of the Workshop on Intelligent Tutoring Systems for Ill-Defined
Domains, 8th International Conference on Intelligent Tutoring Systems, pp. 20–28 (2006)
3. Kyriakou, P., Hatzilygeroudis, I., Garofalakis, J.: A Tool for Managing Domain
Knowledge and Helping Tutors in Intelligent Tutoring Systems. Journal of Universal
Computer Science 16(19), 2841–2861 (2010)
4. Faulhaber, A., Melis, E.: An Efficient Student Model Based on Student Performance
and Metadata. In: 18th European Conference on Artificial Intelligence (ECAI 2008).
Frontiers in Artificial Intelligence and Applications (FAIA), vol. 178, pp. 276–280.
IOS Press (2008)
5. Goto, T., Kojiri, T., Watanabe, T., Iwata, T., Yamada, T.: Automatic Generation
System of Multiple-Choice Cloze Questions and its Evaluation. Knowledge Management &
E-Learning: An International Journal 2(3), 210–224 (2010)
6. Tsuruoka, Y., Tsujii, J.: Bidirectional Inference with the Easiest-First Strategy
for Tagging Sequence Data. In: Proc. of HLT/EMNLP 2005, pp. 467–474 (2005)
7. Collins, M., Duffy, N.: New Ranking Algorithms for Parsing and Tagging: Kernels
over Discrete Structures, and the Voted Perceptron. In: Proc. of the 40th Annual
Meeting of the Association for Computational Linguistics, pp. 263–270 (2002)
8. Tjong Kim Sang, E., Veenstra, J.: Representing Text Chunks. In: Proc. of
EACL 1999, pp. 173–179 (1999)
Genetic Ensemble Biased ARTMAP Method
of ECG-Based Emotion Classification
1 Introduction
Several methods have been developed for building emotion recognition systems based
on facial and speech recognition [1] [2], as well as on physiological signal measurements
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 299–306.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
300 C.K. Loo, W.S. Liew, and M. Shohel Sayeed
[3] [4] [5]. The main advantage of physiological measurements over facial and
speech recognition is the practicality of implementing the monitoring systems.
Physiological signal monitoring has the benefit of a constant and robust signal
recording, especially with the development of portable biosensors that are
unobtrusive during daily activities.
A study by Zhong et al. [6] analyzes the nonlinear components of heart-rate
dynamics caused by the two main branches of the autonomic nervous system (ANS).
Together with another study [7], these methods can be used to examine fluctuations
of the ANS. Principal dynamic modes (PDM) allow nonlinear analysis of the separated
dynamics in the ECG signal, as well as a clear separation between the contributions
of the two ANS branches.
Individual emotion states are subject to variation due to external and internal
influences. This emotional drift requires that any autonomous emotion
classification system be regularly updated with its users' current
physiological-emotion data. The system must be capable of incorporating new
learning patterns while retaining previous knowledge, without repeating the entire
learning sequence. This “stability vs. plasticity” dilemma can be minimized by
using adaptive resonance theory (ART) for pattern learning and classification.
ART-based neural networks were developed as a model of human cognitive information
processing. During learning or training, certain input sequences with a specific
featural attention can distort the system's memory and reduce its classification
accuracy. Biased ARTMAP [9] solves the problem of overemphasis on early critical
features by biasing attention away from previously attended features when the
system makes a predictive error. The strength of the biasing is controlled by an
attention parameter, λ.
Using Biased ARTMAP for pattern recognition reduces the variables that determine
the system's performance to two factors: the attention parameter λ and the order in
which the training data are presented. Optimal combinations of λ and training-data
sequence can be computed efficiently by implementing a genetic permutation
method [10] to “evolve” the best combinations over several generations.
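The genetic permutation idea can be sketched as follows. This is a toy stand-in, not the method of [10]: the real fitness function is ARTMAP classification accuracy, while here a self-contained surrogate fitness rewards ascending order so that the example is checkable; swap mutation keeps each chromosome a valid permutation of the presentation order.

```python
import random

# Surrogate fitness (stand-in for ARTMAP accuracy): number of adjacent
# ascending pairs in the presentation order.
def fitness(perm):
    return sum(perm[i] < perm[i + 1] for i in range(len(perm) - 1))

def evolve(n_items=8, pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [rng.sample(range(n_items), n_items) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]          # truncation selection
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.sample(range(n_items), 2)     # swap mutation keeps
            child[i], child[j] = child[j], child[i]  # it a permutation
            children.append(child)
        pop = survivors + children               # survivors act as elites
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```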
To further improve the classification accuracy, a voting strategy is used to
determine the final class predictions. The proposed voting strategy [11] calculates
the recognition rates of plurality voting techniques while considering the system's
measure of reliability, that is, the probability that a decision is classified
correctly given a specific input pattern. Using the reliability metric to describe
given training data can reduce the classification error by rejecting suspicious
training data that do not meet a minimum reliability requirement [12].
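A minimal sketch of voting with a reliability cut-off (our illustration; the consensus share is used here as a simple stand-in for the reliability measure of [11, 12], and the emotion labels are made up):

```python
from collections import Counter

# Plurality voting with a reliability threshold: the winning class is
# accepted only if it reaches a minimum share of the votes; otherwise
# the prediction is rejected rather than counted.
def vote(predictions, min_reliability=0.6):
    """predictions: class labels from the individual voters.
    Returns the winning class, or None if consensus is too low."""
    counts = Counter(predictions)
    label, n = counts.most_common(1)[0]
    reliability = n / len(predictions)
    return label if reliability >= min_reliability else None

print(vote(["joy", "joy", "anger", "joy", "sadness"]))   # joy (3/5 = 0.6)
print(vote(["joy", "anger", "sadness", "joy", "anger"]))  # None (2/5 < 0.6)
```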
The final incarnation of the classification system will be a prototype emotion
recognition system that can be customized for each individual by continuously
feeding ECG measurements back into the system to improve the predictive accuracy
for the individual's emotions. Constant online learning will generate enough data
for the system to adapt to its user's emotional drift.
Fig. 1.1 Block diagram of the Genetic Ensemble Biased ARTMAP system
3 Experiment
The performance of the Genetic Ensemble ARTMAP system was tested using several
datasets from the UCI Machine Learning Repository [15]. The tested datasets were
Dermatology, Glass, Hepato, and Wine, chosen because they involve non-binary
classification. Optimization was performed for each data set using the same methods
outlined above. The resulting pool of 220 potential training sequences was then
used to train a single-voter, a five-voter, and a ten-voter system. In addition,
the 220 training sequences were used to generate a bootstrapped mean with 1000
resamplings and a 95% confidence interval, as a representation of the overall
classification accuracy on each data set.
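The bootstrapped summary described above can be sketched as follows (our illustration; the accuracy values are made up, only the 220-sequence/1000-resample setup echoes the text):

```python
import random

# Resample the per-sequence accuracies with replacement 1000 times and
# report the mean with a 95% percentile confidence interval.
def bootstrap_mean_ci(values, n_resamples=1000, seed=0):
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(values, k=len(values))) / len(values)
        for _ in range(n_resamples))
    lo = means[int(0.025 * n_resamples)]       # 2.5th percentile
    hi = means[int(0.975 * n_resamples) - 1]   # 97.5th percentile
    return sum(means) / n_resamples, (lo, hi)

rng = random.Random(1)
accuracies = [rng.uniform(0.54, 0.78) for _ in range(220)]  # hypothetical
mean, (lo, hi) = bootstrap_mean_ci(accuracies)
print(round(mean, 3), round(lo, 3), round(hi, 3))
```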
Table 3.1 Prediction accuracy of the bootstrapped mean of the genetically optimized
population, and of the probabilistic voting system
When compared with published results of other pattern classification methods on the
same data sets, the obtained results show that the Genetic Ensemble Biased ARTMAP
performs comparably with other contemporary methods.
The system was then tested using a database of physiological signals collected by
Wagner et al. [3]. The data set consists of 100 ECG samples divided into four
emotion classes. Feature extraction was performed using an algorithm designed by
Wagner et al. [14]. A total of 106 ECG features were extracted from each recording,
including several features not available in the original toolbox algorithm.
Principal-dynamic-mode features [6] [7] were included to add nonlinear analysis to
the overall feature set.
A genetic optimization algorithm was employed to obtain optimal training sequences
for the data set for each value of λ from 0 to 10. The genetic optimization process
was iterated for 20 generations and then repeated with another random population of
chromosomes for a different value of λ. A total of 220 chromosomes were generated
from this optimization exercise, and the chromosomes with the best fitness were
each chosen to train a Biased ARTMAP. A probabilistic voting strategy was then used
to determine the final class prediction for any given data input. The predictive
accuracy of each individual training sequence was obtained and used to generate a
bootstrapped mean as a statistical aggregate of the entire population's predictive
accuracy. The results show little distinction in predictive accuracy between
different values of λ. All bootstrapped means clustered around 65–67% accuracy,
while the individual predictive accuracies ranged from 54% to 78%.
One hypothesis is that the genetic ordering optimization inadvertently solves the
problem of early featural distortion that Biased ARTMAP was designed to address.
Nevertheless, these results were obtained with offline learning, and the biasing
technique should be more useful during online learning.
A probabilistic ensemble voting system was applied, in which N voters were
individually trained with the N best training sequences from the combined
population of 220 chromosomes. The voting system was tested using probabilistic
majority rules to determine the final class prediction for the test data. Testing
was repeated using a reliability metric to evaluate each class prediction: class
predictions that did not meet the reliability threshold were removed from the final
accuracy calculation.
Table 3.3 Classification performance of Biased ARTMAP (λ = 0–10) with probabilistic
ensemble voting and reliability threshold
Conclusion
Several conclusions can be drawn from the experiment. The genetic optimization
algorithm is an effective method for training and testing an ARTMAP system for
pattern learning and classification. When combined with Biased ARTMAP, however,
the genetic optimization method rendered the biasing technique redundant. In
addition, a more effective voter selection method will be required, as the current
method reduces the predictive accuracy of the system with each additional voter.
Implementing a reliability threshold allows a slight increase in classification
accuracy, compared with simple majority voting, by rejecting class predictions
that do not meet the minimum consensus among voting members. Overall, the Genetic
Ensemble Biased ARTMAP has pattern prediction capability comparable with other
pattern classification methods such as LDA, kNN, and MLP. The system's features
can be further improved based on the results of this study.
References
[1] De Silva, L.C., Miyasato, T., Nakatsu, R.: Facial emotion recognition using
multimodal information. In: Proceedings of the International Conference on
Information, Communications, and Signal Processing, vol. 1, pp. 397–401 (1997)
[2] Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C.M., Kazemzadeh, A., Lee, S.,
Neumann, U., Narayanan, S.: Analysis of emotion recognition using facial
expressions and multimodal information. In: Proceedings of the 6th International
Conference on Multimodal Interfaces, pp. 205–211 (2004)
[3] Wagner, J., Kim, J., Andre, E.: From physiological signals to emotion:
Implementing and comparing selected methods for feature extraction and
classification. In: IEEE International Conference on Multimedia and Expo,
pp. 940–943 (2005)
[4] Kim, K.H., Bang, S.W., Kim, S.R.: Emotion recognition system using short-term
monitoring of physiological signals. Medical and Biological Engineering and
Computing 42(3), 419–427 (2004)
[5] Mandryk, R.L., Atkins, M.S.: A fuzzy physiological approach for continuously
modeling emotion during interaction with play technologies. International Journal
of Human-Computer Studies 65(4), 329–347 (2007)
[6] Zhong, Y., Wang, H., Ju, K.H., Jan, K.M., Chon, K.H.: Nonlinear analysis of the
separate contributions of autonomic nervous systems to heart rate variability using
principal dynamic modes. IEEE Transactions on Biomedical Engineering 51(2),
255–262 (2004)
[7] Choi, J., Gutierrez-Osuna, R.: Using heart rate monitors to detect mental
stress. In: 6th International Workshop on Wearable and Implantable Body Sensor
Networks, pp. 219–223 (2009)
[8] Plutchik, R.: The nature of emotions. American Scientist (2001)
[9] Carpenter, G.A., Gaddam, S.C.: Biased ART: A neural architecture that shifts
attention toward previously disregarded features following an incorrect prediction.
Neural Networks 23(3), 435–451 (2010)
[10] Palaniappan, R., Eswaran, C.: Using genetic algorithm to select the
presentation order of training patterns that improves simplified fuzzy ARTMAP
classification performance. Applied Soft Computing 9(1), 100–106 (2009)
[11] Lin, X., Yacoub, S., Burns, J., Simske, S.: Performance analysis of pattern
classifier combination by plurality voting. Pattern Recognition Letters 24,
1959–1969 (2003)
[12] Loo, C.K., Rao, M.V.C.: Accurate and reliable diagnosis and classification
using probabilistic ensemble simplified fuzzy ARTMAP. IEEE Transactions on
Knowledge and Data Engineering 17(11), 1589–1593 (2005)
[13] Carpenter, G.A., Grossberg, S., Reynolds, J.H.: ARTMAP: A self-organizing
neural network architecture for fast supervised learning and pattern recognition.
Neural Networks 4(5), 565–588 (1991)
[14] Wagner, J.: The Augsburg Biosignal Toolbox (2009),
http://www.informatik.uni-augsburg.de/en/chairs/hcm/projects/aubt/
(retrieved June 29, 2011)
[15] Frank, A., Asuncion, A.: UCI Machine Learning Repository. Irvine, CA:
University of California, School of Information and Computer Science (2010),
http://archive.ics.uci.edu/ml (retrieved November 2011)
Honey Bee Optimization Based on Mimicry
of Threshold Regulation in Honey Bee Foraging
Abstract. Honey bees correctly allocate their work force to nectar sources using
the “waggle dance”. In addition, they can determine how necessary a nectar source
is: they possess a value threshold for nectar sources, which they can regulate
collectively. By mimicking the threshold regulation used in honey bee foraging, we
are developing a system that allows agents to determine whether their own solution
is worth communicating to other agents. We propose a novel bio-inspired
optimization algorithm, honey bee optimization (HBO), a multi-agent system based on
the foraging activities of honey bees. To test the characteristics of HBO, we
applied it to the travelling salesperson problem (TSP).
1 Introduction
The honey bee is one of the most familiar social insects in the world. Work-force
allocation by foraging honey bees is organized through the “waggle dance”, first
identified by von Frisch [1], and their foraging activities form a well-organized
system. Lucic and Teodorovic proposed the “Bee System” (BS, an artificial bee
swarm) to solve the travelling salesman problem (TSP) in 2003 [2]. Li-Pei Wong
et al. proposed bee colony optimization (BCO) for more general problem solving in
2008 [3, 4]. These algorithms are multi-agent systems based on the “waggle dance”.
To improve solutions efficiently, such optimization algorithms must balance two
functions: the waggle dance, being a type of roulette-wheel selection, serves as
the exploration function, and an exploitation function inspired by honey bee
foraging is also needed.
In the bee hive, bees adjust the foraging criteria that individuals use to evaluate
the worth of a newly located nectar source. They automatically adjust their criteria
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 307–316.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
308 M. Furukawa and Y. Suzuki
and they change nectar sources to maximize nectar collection with minimal effort. This mechanism is not well known, and it has not been adopted in optimization algorithms.
We propose the honey bee optimization (HBO) algorithm, which uses a threshold regulation system based on honey bee foraging. Each agent can determine
whether it should communicate its solution to other agents. This method allows
agents to control the overall variety of solutions.
We applied HBO to the travelling salesperson problem (TSP). First, we describe honey bee foraging. We then define the threshold so that each agent in the multi-agent system can regulate it individually. Finally, we compare the results obtained with fixed and regulated thresholds and discuss our findings.
sources too often. This is not a well-known system and it requires different types
of bees, so it has not been used in bee-inspired optimization algorithms.
1 Initialize
  1.1 Set agents, nectar sources and the hive in the field
  1.2 Supply every agent with one solution
2 Foraging (loop until a terminal condition is satisfied)
  2.1 Agents move and collect nectar
  2.2 Agents return to the hive
3 Evaluation of the nectar source
  - Each agent determines whether to advertise or abandon it
4 Communication (improvement of the solution)
  - Elite agents: advertise the nectar source
  - Normal agents: abandon the nectar source and acquire a new source from one of the elite agents
5 Threshold regulation
6 Check the terminal condition
  - Whether the terminal number of iterations is completed
  - End, or go back to 2
7 Output the best solution and end
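The outline above can be sketched in code. The following is an illustrative Python sketch, not the authors' Java implementation; the elite test (a value threshold r between the best and mean tour lengths), the copy-the-whole-tour communication step, and all names are our assumptions:

```python
import random

def hbo_sketch(distance, n_agents=50, n_iters=20, r=0.5, seed=1):
    """Illustrative sketch of the HBO outline (not the authors' code).

    distance: dict-of-dicts d[i][j] of inter-city distances.
    Each agent holds one tour; elite agents advertise, normal agents
    abandon their tour and copy a random elite tour.
    """
    rng = random.Random(seed)
    cities = list(distance)

    def tour_length(t):
        return sum(distance[a][b] for a, b in zip(t, t[1:] + t[:1]))

    # 1 Initialize: supply every agent with one (random) solution
    agents = [rng.sample(cities, len(cities)) for _ in range(n_agents)]
    best = min(agents, key=tour_length)

    for _ in range(n_iters):                       # 2 Foraging loop
        lengths = [tour_length(t) for t in agents]
        lo, mean = min(lengths), sum(lengths) / len(lengths)
        # 3 Evaluation: advertise if the tour beats a value threshold
        elite = [t for t, L in zip(agents, lengths) if L <= lo + r * (mean - lo)]
        # 4 Communication: normal agents copy a randomly chosen elite tour
        agents = [t if t in elite else list(rng.choice(elite)) for t in agents]
        # 5 Threshold regulation is omitted in this sketch (see Eq. (5))
        best = min(agents + [best], key=tour_length)
    return best, tour_length(best)                 # 7 Output the best solution
```

In the paper, step 4 splices individual edges rather than copying whole tours, and step 5 adjusts r per agent; both are simplified away here.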
The agents imitate the bees' waggle dance. An elite agent with a shorter tour is more likely to be selected. The dance probability is determined using Eq. (1):

P_i = L_{a,(i,next)}^{-1} / Σ_{a=1}^{N_elite} L_{a,(i,next)}^{-1}    (1)

P_i: the dance probability, i.e. the likelihood of being selected by normal agents
L_{a,(i,next)}: distance from the current city i to the next city of elite agent a
N_elite: number of elite agents
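A minimal sketch of this selection rule, assuming Eq. (1) normalises inverse next-edge lengths over the elite agents (function names are ours):

```python
import random

def dance_probabilities(next_edge_lengths):
    """Roulette-wheel weights per Eq. (1): an elite agent whose next edge
    (current city i -> its next city) is shorter gets a higher probability."""
    inv = [1.0 / L for L in next_edge_lengths]
    s = sum(inv)
    return [w / s for w in inv]

def select_elite(next_edge_lengths, rng=random):
    """Pick one elite agent index according to the dance probabilities."""
    weights = dance_probabilities(next_edge_lengths)
    return rng.choices(range(len(next_edge_lengths)), weights=weights)[0]
```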
To improve the solutions, we need two groups of agents: the group of elite agents, known as group E, and the group of normal agents, known as group N.
For example, the improvement of the solution at city 1 by a normal agent
known as N0 with {…1,3…} in its city list is shown in Fig. 1. N0 will move to city
3 after city 1. When N0 arrives at city 1, it will randomly select one agent from
group E in the same city. Because elite agents E0 ~ E4 were in city 1, N0 selected
E1 with {…,1,2,…}. The city list of N0 changed from {…,1,3,…} to {…,1,2,3,…}: parts of the tour …(1,3)…(a,2)(2,b)… became …(1,2)(2,3)…(a,b)…, where a and b are arbitrary cities before and after city 2. N0 then moved to city 2.
The tour length of N0 increased by the lengths of (1,2), (2,3) and (a,b), and decreased by the lengths of (1,3), (a,2) and (2,b), compared with its previous tour. If the decrease is greater than the increase, N0 may become an elite agent in the next city and advertise other parts of its tour. In this way, all normal agents select elite agents and modify their city lists. If some normal agents succeed in improving their city lists, other normal agents will refer to their tours and find a shorter tour.
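The splice described above can be sketched as follows; `splice_city` is a hypothetical helper that mirrors the edge replacements (1,3)→(1,2)(2,3) and (a,2)(2,b)→(a,b):

```python
def splice_city(tour, city, after, distance):
    """Sketch of the partial improvement: move `city` (e.g. 2) so that it
    follows `after` (e.g. 1), as when N0 adopts edge (1,2) from elite E1.
    Returns the new tour and the change in tour length (negative = shorter)."""
    t = [c for c in tour if c != city]      # remove (a,2) and (2,b), add (a,b)
    t.insert(t.index(after) + 1, city)      # remove (1,3), add (1,2) and (2,3)

    def length(tr):
        return sum(distance[x][y] for x, y in zip(tr, tr[1:] + tr[:1]))

    return t, length(t) - length(tour)
```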
The number of normal agents moving to each city increases in proportion to the
number of elite agents advertising each city and each dance probability. Agents
use local search; this partial improvement does not require that normal agents hold an optimal tour.
Elite agents move to the next city with normal agents. They do not change their
city list.
E_an: the average number of elite agents in each travelled city of agent a, from the start until n iterations
a: agent
n: iteration number (n > 2)
i: current city
e_{n,i}: number of elite agents in the current city i at iteration n
Because e_{n,i} becomes too small as n increases, E_n is modified based on the average over a tour; thus c, the number of cities, is used rather than n.
E_an = (1/c) Σ_i e_{n,i}    (4)

Δr_an = − (E_an − E_an-1) / E_an    (0 < r_n-1 < 1)    (5)
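Under the reconstructed Eq. (5), a single agent's threshold update could look like this; the clamping of r to the open interval (0, 1) follows the stated constraint, and the function name is ours:

```python
def regulate_threshold(r, e_prev, e_curr):
    """Threshold update following Eq. (5): when the elite count E rises,
    delta is negative and r falls; when E falls, r rises. The result is
    clamped so that r stays strictly between 0 and 1."""
    delta = -(e_curr - e_prev) / e_curr
    return min(max(r + delta, 1e-6), 1.0 - 1e-6)
```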
4 Experiments
The HBO algorithm described in this paper was implemented in Java, using Eclipse IDE 3.5 as the development tool.
Experiments were carried out with a regulated r, whose initial value was set at 0.10, 0.30, 0.50, 0.70, 0.90, or a random number between 0 and 1. The number of agents was 1,000, and the run was terminated after 10,000 iterations.
In the experiment, we used the ATT48 instance taken from TSPLIB [5]. ATT48 is a 48-city problem; the optimal tour length is 3329.
Results are presented where agents use an individually regulated r.
Table 1 Comparison of fixed r and regulated r (optimal route length: 3329; Best: minimum tour length; nBest: iterations until Best was found)
Table 1 shows a comparison of Best using a fixed and a regulated r. In the table, each trial with a regulated r started with the same value used in the corresponding trial with a fixed r. Best is the average value of the shortest tour length, while nBest is the average number of iterations until Best was found, over 10 trials. Average r is the average r of all agents when Best was found in the regulated-r trials.
In the trials using a regulated r, both Best and nBest were lower than those with a fixed r, except for r = 0.30. Overall, trials with a regulated r found similar solutions for every initial r.
5 Discussion
Fig. 2 shows the En, Lmin and Lave with a regulated r when the initial r was a
random value.
According to the features of En shown in Fig. 2, the search process was divided
into three different stages.
During the early stage, not all agents with small L values are elite agents. Therefore, these agents cannot search efficiently for shorter tours.
In Fig. 3-(2), the number of agents with a large r increases. The increase in r is attributable to the decrease in En, as given by Eq. (6). Because the initial value of En is the average number of agents per city, which is greater than the initial number of elite agents, En continues to decrease until the elite agents increase sufficiently. Thus, the early stage is a preparatory period. After this, En begins to increase while r decreases during the middle stage.
Figure 4 shows the number of agents at each value of r, measured in each stage when the value of r was almost level. In the early stage, there are many agents with a high r.
From 8,000 to 36,000 iterations, En is stable. During this period, the number of agents at each value of r reaches its peak, as shown in Fig. 4.
Because there are sufficient elite agents in every city at the start of the middle stage, En remains stable or increases slightly. Thus, r tends to decrease, and agents with shorter tour lengths become elite agents, as shown in Fig. 5.
In Fig. 5, normal agents have a small r, close to 0; in contrast, elite agents have a high r. During the middle stage, agents narrow down the candidate solutions. However, agents with a high L also become elite agents if they have a high r, because such agents may not encounter many elite agents. These agents help to maintain a variety of solutions, thereby avoiding premature convergence.
Fig. 5 r versus L for each agent during the middle stage, n = 20,000
The features described above were observed in other trials with different initial values of r (Fig. 6). During the early stage, many agents had a high r in every trial. In trials that started with a lower value of r, agents tended to have a lower r. During the middle stage, the distribution of r was similar to that shown in Fig. 4; thus, agents narrowed down the candidate solutions.
In summary, agents can improve solutions because r increases during the early stage and decreases during the middle stage. In addition, there is no restriction on the overall number of elite agents; each agent controls its individual threshold, as bees do.
6 Conclusion
We proposed a novel bio-inspired optimization algorithm, honey bee optimization
(HBO), which mimics the threshold regulation in honey bee foraging. We applied
HBO to the travelling salesperson problem (TSP). We investigated the regulation
of the threshold.
In the future, this algorithm should be compared with the BCO algorithm,
where all agents advertise the solution.
References
[1] von Frisch, K.: Decoding the language of the bee. Science 185(4152), 663–668 (1974)
[2] Lucic, P., Teodorovic, D.: Attacking Complex Transportation Engineering Problems.
International Journal on Artificial Intelligence Tools 12(3), 375–394 (2003)
[3] Wong, L., et al.: A Bee Colony Optimization Algorithm for Traveling Salesman Prob-
lem. In: Second Asia International Conference on Modelling & Simulation IEEE
Xplore, pp. 818–823 (2008)
[4] Wong, L., et al.: An Efficient Bee Colony Optimization Algorithm for Traveling Sa-
lesman Problem using Frequency-based Pruning. In: 2009 7th IEEE International Con-
ference on Industrial Informatics (INDIN 2009), pp. 775–782 (2009)
[5] TSPLIB, http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/
IEC-Based 3D Model Retrieval System
1 Introduction
Recently, there has been great demand for 3D CG animations in the video game and movie industries. In the creation of 3D CG animations, character design is a very important but time-consuming and laborious task, so it is valuable to reuse existing data and create new data by modifying it. Therefore, we are interested in providing 3D multimedia data retrieval systems that enable 3D CG creators to retrieve their required data effectively.
3D CG animation mainly consists of texture image data, 3D model data and motion data. Although there has been much research on image data retrieval and search systems, there have been very few studies on motion data retrieval and search systems. We have already proposed a motion retrieval system using
Seiji Okajima · Yoshihiro Okada
Graduate School of ISEE, Kyushu University, 744, Motooka, Nishi-ku,
Fukuoka, 819-0395 Japan
e-mail: seiji.okajima@inf.kyushu-u.ac.jp,
okada@inf.kyushu-u.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 317–327.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
Interactive Evolutionary Computation [20]. This system allows the user to retrieve motion data similar to the data he/she requires, easily and intuitively, simply by repeatedly scoring the retrieved motion data with satisfaction points, without entering any search queries. The IEC method of the system is based on a Genetic Algorithm (GA), so motion data must be represented as genes, which are used as similarity features for the similarity calculation. This IEC-based retrieval system is especially valuable when users have no query data to enter into the system and only have an image of the required data in their minds. Recently, we have applied the GA-based IEC method to a 3D model retrieval system. In this paper, we propose this 3D model retrieval system and introduce the 3D model features employed in the system for the similarity calculation among 3D models. We also clarify the usefulness of the proposed system by showing experimental results of 3D model retrievals performed by several users. The results indicate that the proposed system is useful for effectively retrieving required data from a 3D model database containing more than one thousand entries.
The remainder of this paper is organized as follows: Section 2 introduces the IEC method based on GA. Section 3 describes related work. Section 4 introduces the 3D model features we employ as the gene representation for GA. Section 5 describes how our proposed 3D model retrieval system works. In Section 6, we describe user experiments and show their results to clarify the usefulness of the system. Finally, we conclude the paper in the last section.
3 Related Work
There is much research on 3D model search and retrieval systems. In general, 3D model search systems are regarded as similarity search systems for retrieving 3D model data similar to the 3D model entered as the search query. For similarity
search systems, the choice of the kinds of features used as similarity measures is significant because it affects the search performance of the system.
There are several kinds of similarity features for 3D model data: parameter-based features, graph-based features, and others. Several parameter-based features of 3D model data have been proposed. Osada et al. proposed the D2 shape distribution, represented as a histogram of distances between random point pairs on a 3D model surface [14]. Vandeborre et al. proposed the curvature histogram, a histogram of curvature values at randomly sampled points on a 3D model surface [19]. Elad et al. proposed geometric moments, the moments of a 3D shape model [5]. Several graph-based features of 3D model data have also been proposed. McWherter et al. proposed the Model Graph, a graph constructed from the component structure of a 3D model [10]. Hilaga et al. proposed a topology matching method for 3D model similarity search that uses Reeb graphs of 3D models as similarity features [8]. As another kind of similarity feature for 3D model data, there are appearance-based features represented as 2D images. Assfalg et al. proposed using signatures of spin images of 3D models as similarity features [1]. Chen et al. proposed the light field descriptor, based on silhouette images of 3D models [3]. Ohbuchi et al. proposed using salient local visual features of 3D models as similarity features [12].
On the other hand, 3D model retrieval systems are generally regarded as similarity search systems for retrieving 3D model data similar to the models the user wants, specified through interactive operations such as selecting candidate data or browsing a whole database. For browsing-based data retrieval systems, how candidate data are presented to the user, i.e., the layout algorithms and visualization methods, is significant for enabling the user to find the required data as fast as possible. Layout algorithms based on the hierarchical structure of a database include tree-maps, hyperbolic trees, cone trees, etc. Dimensionality reduction methods for similarity-feature-based layouts include multidimensional scaling, principal component analysis, self-organizing maps, etc. Among data retrieval systems driven by the user's interactive operations, there are several systems using the IEC method. IEC is an interactive optimization method in which the user evaluates target data interactively and the system finally outputs an optimized solution based on the evaluated values. The remarkable advantage of IEC is that the only operation required of the user is the evaluation of data. Several experimental IEC-based retrieval systems exist. Takagi et al., Cho et al. and Lai et al. proposed image and music retrieval systems using the Interactive Genetic Algorithm (IGA) [16, 4, 9]. Yoo et al. proposed video scene retrieval based on emotion using IGA [21]. Cho et al. proposed a music retrieval system using IGA [4]. However, there is no 3D model retrieval system using IEC that retrieves and presents model data from a model database according to the user's requirements. In this paper, we propose such a 3D model retrieval system using the IEC method based on GA. In GA, 3D models must be represented as genes, and we employ the D2 shape distribution and the curvature histogram as similarity features of 3D models for their gene representations
for the following reasons. In general, graph matching has a high calculation cost, so graph-based similarity features are not suitable for interactive systems like ours. The D2 shape distribution performs well in terms of both similarity calculation cost and similarity accuracy. Furthermore, the curvature histogram is a completely different kind of similarity feature from the D2 shape distribution, so the combined use of the two can compensate for each other's weaknesses.
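As an illustration of the D2 feature used here as a gene, the following is a minimal sketch that assumes points sampled on the model surface are already available; the function name, bin count, and normalisation are our assumptions, not the paper's implementation:

```python
import math
import random

def d2_histogram(points, n_pairs=2000, n_bins=16, seed=0):
    """Sketch of the D2 shape distribution [14]: a histogram of distances
    between random pairs of surface points. `points` is a list of (x, y, z)
    tuples assumed to be sampled on the model surface; the bin range is the
    maximum observed distance, and the histogram is normalised so that
    models of different sizes remain comparable."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_pairs):
        p, q = rng.sample(points, 2)       # two distinct surface points
        dists.append(math.dist(p, q))
    top = max(dists) or 1.0                # guard against degenerate input
    hist = [0] * n_bins
    for d in dists:
        hist[min(int(n_bins * d / top), n_bins - 1)] += 1
    return [h / float(n_pairs) for h in hist]
```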
Fig. 1 D2 histograms of typical shaped models, a sphere (left), a cylinder (middle) and a
torus (right).
Fig. 2 Curvature histograms of typical shaped models, a sphere (left), a cylinder (middle)
and a torus (right).
probabilities that individuals are selected by GA. We define f_i as the fitness value of individual i. The probability p_i that individual i is selected by GA is calculated by

p_i = f_i / Σ_{k=1}^{N} f_k .    (2)

This expression assumes that fitness values are positive. The higher the fitness value of an individual, the higher its selection probability. If some fitness values are much higher than the others, early convergence occurs, in which the search settles in the early stages.
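Eq. (2) amounts to roulette-wheel selection. A minimal sketch (the function name is ours; the weights in the test use the paper's three-stage scores of 2.0, 0.5 and 0.05 only as example values):

```python
import random

def select_index(fitness, rng=random):
    """Roulette-wheel selection per Eq. (2): individual i is chosen with
    probability f_i / sum_k f_k. Fitness values must be positive."""
    total = sum(fitness)
    weights = [f / total for f in fitness]
    return rng.choices(range(len(fitness)), weights=weights)[0]
```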
There are several crossover operators for real-coded GA, such as BLX-α [6, 7], UNDX [13], SPX [18] and so on. In this study, we employ BLX-α because of its simplicity and fast convergence. Let C1 = (c1_1, ..., c1_n) and C2 = (c2_1, ..., c2_n) be parent chromosomes. Then, for each gene, BLX-α uniformly picks the new individual's value from the interval [c_min − I·α, c_max + I·α], where c_max = max(c1_i, c2_i), c_min = min(c1_i, c2_i), and I = c_max − c_min.
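A minimal sketch of BLX-α under this definition (function and parameter names are ours):

```python
import random

def blx_alpha(c1, c2, alpha=0.5, rng=random):
    """BLX-alpha crossover [6, 7]: each child gene is drawn uniformly from
    [c_min - I*alpha, c_max + I*alpha], where I = c_max - c_min per gene."""
    child = []
    for a, b in zip(c1, c2):
        lo, hi = min(a, b), max(a, b)
        spread = hi - lo                      # the interval I for this gene
        child.append(rng.uniform(lo - spread * alpha, hi + spread * alpha))
    return child
```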
For the mutation operator, we choose the random mutation operator [7, 11] because it is widely used. Let C = (c_1, ..., c_i, ..., c_n) be a chromosome and c_i ∈ [a_i, b_i] be the gene to be mutated. Then c_i is replaced by a uniform random number picked from the domain [a_i, b_i].
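A sketch of the random mutation operator under the same assumptions (the per-gene mutation rate is applied independently; names are ours):

```python
import random

def random_mutation(chrom, bounds, rate=0.01, rng=random):
    """Random (uniform) mutation [7, 11]: each gene c_i is replaced, with
    probability `rate`, by a uniform draw from its domain [a_i, b_i]."""
    return [rng.uniform(a, b) if rng.random() < rate else c
            for c, (a, b) in zip(chrom, bounds)]
```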
6 User Experiment
In this section, we present experimental results of 3D model retrievals performed with the proposed system by several subjects. Five students of the Graduate School of ISEE, Kyushu University volunteered to participate in the experiment. The experiment was carried out on a standard PC with Windows XP Professional, a 2.66 GHz Core 2 Quad processor and 4.0 GB of memory.
As a 3D model database for the experiment, we employed Princeton Shape
Benchmark [15]. It contains around 1800 model data collected from the World Wide
Web. As for the GA operators, we employed the roulette wheel selection operator, the BLX-α crossover operator and the random mutation operator. The value of α is 0.5, the crossover rate is 1.0 and the mutation rate is 0.01. The fitness values of the three-stage scoring are 2.0 for good, 0.5 for normal and 0.05 for bad. The population size is 12, determined from the results of our previous study [20].
In the experiment evaluating the usefulness of our proposed system, the participants retrieved each of five target models using the system. The five target models were selected one by one from each of five model classes: tire, car, dolphin, plant and human head. Each participant searched for each of the five target models for up to 20 generations, yielding 25 trial results in total for all participants. We measured computation and operation times, and we examined the retrieved models. The trials were carried out according to the following procedure.
1. Introduction of the model retrieval system (1 minute).
2. Try to use the system for answering preparation questions (3 minutes).
3. Actual searches for target models using the system.
Result                                                    Number
1) Retrieval of the same model as a target model              11
2) Retrieval of the same class model as a target model        14
3) Retrieval failure                                           0
Sum                                                           25
Fig. 5 Counts of cases that same class or same 3D models are retrieved at each generation.
From this table and chart, we can say that the system retrieves 3D models of the same class as the desired model within 20 generations, and in many cases the user retrieves same-class 3D models within 10 generations. These results indicate that our proposed system is practically useful for retrieving 3D model data, even from a huge database containing more than one thousand entries.
References
1. Assfalg, J., Del Bimbo, A., Pala, P.: Spin images for retrieval of 3D objects by local and global similarity. In: Proc. of the 17th International Conference on Pattern Recognition, ICPR 2004 (2004), doi:10.1109/ICPR.2004.1334675
2. Baker, J.E.: Reducing bias and inefficiency in the selection algorithm. In: Proc. of the
Second International Conference on Genetic Algorithms on Genetic Algorithms and their
Application, pp. 14–21 (1987)
3. Chen, D.Y., Tian, X.P., Shen, Y.T., Ouhyoung, M.: On visual similarity based 3D
model retrieval. In: Proc. of Eurographics Computer Graphics Forum, EG 2003 (2003),
doi:10.1111/1467-8659.00669
4. Cho, S.B.: Emotional image and musical information retrieval with interactive genetic
algorithm. Proc. of the IEEE 92(4), 702–711 (2004), doi:10.1109/JPROC.2004.825900
5. Elad, M., Tal, A., Ar, S.: Content based retrieval of VRML objects - an iterative and
interactive approach. EG Multimedia, 97–108 (2001)
6. Eshelman, L.J., Schaffer, J.D.: Real-Coded Genetic Algorithms and Interval-Schemata.
In: Foundations of Genetic Algorithms 2, pp. 187–202. Morgan Kaufman Publishers,
San Mateo (1993)
7. Herrera, F., Lozano, M., Verdegay, J.L.: Tackling Real-Coded Genetic Algorithms: Operators and Tools for Behavioural Analysis. Journal of Artificial Intelligence Review 12(4), 265–319 (1998), doi:10.1023/A:1006504901164
8. Hilaga, M., Shinagawa, Y., Kohmura, T., Kunii, T.L.: Topology matching for fully
automatic similarity estimation of 3D shapes. In: Proc. of the 28th Annual Confer-
ence on Computer Graphics and Interactive Techniques, SIGGRAPH 2001 (2001),
doi:10.1145/383259.383282
9. Lai, C.-C., Chen, Y.-C.: Color Image Retrieval Based on Interactive Genetic Algo-
rithm. In: Chien, B.-C., Hong, T.-P., Chen, S.-M., Ali, M. (eds.) IEA/AIE 2009. LNCS,
vol. 5579, pp. 343–349. Springer, Heidelberg (2009), doi:10.1007/978-3-642-02568-6_35
10. McWherter, D., Peabody, M., Regli, W.C., Shokoufandeh, A.: Solid Model Databases:
Techniques and Empirical Results. Journal of Computing and Information Science in
Engineering (2001), doi:10.1115/1.1430233
11. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer (1994)
12. Ohbuchi, R., Osada, K., Furuya, T., Banno, T.: Salient local visual features for shape-
based 3D model retrieval. In: IEEE International Conference on Shape Modeling and
Applications 2008 (2008), doi:10.1109/SMI.2008.4547955
13. Ono, I., Kobayashi, S.: A real-coded genetic algorithm for function optimization using
the unimodal normal distribution crossover. In: Proc. of the Seventh International Con-
ference on Genetic Algorithms, pp. 246–253 (1997)
14. Osada, R., Funkhouser, T., Chazelle, B., Dobkin, D.: Shape distributions. Journal of
ACM Transactions on Graphics (TOG) 21(4) (2002), doi:10.1145/571647.571648
15. Shilane, P., Min, P., Kazhdan, M., Funkhouser, T.: The Princeton Shape Benchmark. In:
Proc. of Shape Modeling Applications 2004 (2004), doi:10.1109/SMI.2004.1314504
16. Takagi, H., Cho, S.B., Noda, T.: Evaluation of an IGA-based image retrieval system
using wavelet coefficients. In: IEEE International Conference on Fuzzy Systems (1999),
doi:10.1109/FUZZY.1999.790176
17. Takagi, H.: Interactive Evolutionary Computation: Fusion of the Capacities of EC
Optimization and Human Evaluation. Proc. of the IEEE 89(9), 1275–1296 (2001),
doi:10.1109/5.949485
18. Tsutsui, S., Yamamura, M., Higuchi, T.: Multi-parent Recombination with Simplex
Crossover in Real Coded Genetic Algorithm. In: Proc. of the 1999 Genetic and Evo-
lutionary Computation Conference (GECCO 1999), pp. 657–664 (1999), doi:10.1007/3-540-45356-3_36
19. Vandeborre, J.P., Couillet, V., Daoudi, M.: A practical approach for 3D model in-
dexing by combining local and global invariants. In: Proc. of 1st International Sym-
posium on 3D Data Processing Visualization and Transmission, pp. 644–647 (2002),
doi:10.1109/TDPVT.2002.1024132
20. Wakayama, Y., Okajima, S., Takano, S., Okada, Y.: IEC-Based Motion Retrieval System
Using Laban Movement Analysis. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C.
(eds.) KES 2010. LNCS (LNAI), vol. 6279, pp. 251–260. Springer, Heidelberg (2010),
doi:10.1007/978-3-642-15384-6_27
21. Yoo, H.W., Cho, S.B.: Video scene retrieval with interactive genetic algorithm. Multimedia Tools and Applications (2007), doi:10.1007/s11042-007-0109-8
Incremental Representation and Management
of Recursive Types in Graph-Based Data Model
for Content Representation of Multimedia Data
Abstract. A data model incorporating the concept of recursive graphs has been proposed for representing the contents of multimedia data. A shape graph, which represents the structure of a set of instances, has to capture their incremental updates, but instances are difficult to manage when they have a recursive structure. This paper proposes a method of managing the recursive structure of instances. A procedure that incrementally revises the structural information of shape graphs is presented. Owing to this procedure, recursive structure can be incrementally and properly managed and represented in the shape graph.
1 Introduction
In recent years, content retrieval of multimedia data has been investigated extensively. Using graphs to represent the contents of multimedia data is one of the major approaches to content retrieval. Petrakis et al. proposed representing the contents of medical images by using directed labeled graphs [1]. Uehara et al. used a semantic network to represent the contents of a scene of a video clip [2]. Jaimes proposed a data model representing the contents of multimedia by using four components and the relationships between them [3]. The contents of video data have also been represented with a kind of tree structure in XML [4].
We have proposed a graph-based data model, the Directed Recursive Hypergraph
data Model (DRHM), for representing the contents of multimedia data [5, 6, 7, 8].
It incorporates the concepts of directed graphs, recursive graphs, and hypergraphs.
An instance graph is the fundamental unit in representing an instance. A collec-
tion graph is a graph having instance graphs as its components. A shape graph
Teruhisa Hochin · Yuki Ohira · Hiroki Nomiya
Kyoto Institute of Technology, Goshokaidocho, Mastugasaki, Sakyo-ku,
Kyoto 606-8585 Japan
e-mail: hochin@kit.ac.jp,nomiya@kit.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 329–339.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
represents the structure of the collection graph. Shape graphs may change when instance graphs are inserted or modified. As the existence of instance graphs affects shape graphs, DRHM is said to be an instance-based data model. Moreover, generalization and specialization relationships have been introduced into DRHM [8].
An instance graph can include other instance graphs. Instance graphs can, of course, represent self-nested objects. A self-nested object is an object that contains objects of the same type as the container object. An example of a self-nested object is a nested lunch box, a picture of which is shown in Fig. 5(a). A nested lunch box is both a lunch box and a container: the largest lunch box contains a middle-sized lunch box, which contains the smallest one. Whereas the instance graph is capable of representing such nested objects, the shape graph, which represents the structure of a set of instance graphs, could not properly represent this situation.
This paper proposes a revision of the shape graph. The shape graph is extended to capture self-nested objects. The mapping function from a collection graph to the shape graph is revised in the formal definition. This extension brings a new construct to the notation of the shape graph.
This paper is organized as follows: Section 2 briefly explains the structure in
DRHM by using examples. Section 3 describes the formal definition of DRHM. The
extension for self-nested shape graphs is proposed in Section 4. Some considerations
are made in Section 5. Lastly, Section 6 concludes this paper.
Example 1. Consider the representation of the picture shown in Fig. 1(a). An ornament is on a floor. The ornament consists of three bags of rice and a tassel. Fig. 1(b) represents the contents of this picture in DRHM. An instance graph is represented by a rounded rectangle; for example, g1 is an instance graph. An edge is represented by an arrow. A dotted rounded rectangle surrounds a set of initial or terminal elements of an edge. For example, g5 and g6, which are surrounded by a dotted rounded rectangle, are the initial elements of the edge e2. When an edge has only one initial or terminal element, the dotted rounded rectangle can be omitted for simplicity; the instance graph g4, the terminal element of the edge e2, is an example of this representation. An instance graph may contain instance graphs and edges. For example, g1 contains g2, g3, e1, and e4.
Fig. 1 (a) A picture and (b) an instance graph representing its contents.
The structure of a collection graph is represented by a graph called a shape graph, which corresponds to a relation schema in the relational model. The collection graph whose structure a shape graph represents is called its corresponding collection graph.
Example 3. Figure 2(b) shows the shape graph for the collection graph Ornament
shown in Fig. 2(a). It represents that an instance graph ornament includes an
instance graph object, and an instance graph object is connected to object
by an edge pos.
A shape graph does not have to exist prior to the creation of a collection graph. Inserting an instance graph results in the creation of a shape graph if the shape graph describing the definition of the instance graph does not yet exist. It may, of course, exist prior to the creation of the collection graph. A shape graph must exist while its collection graph exists. A shape graph may change when new instance graphs are inserted into the corresponding collection graph, or when the instance graphs in it are modified. Once shape graphs are created, they are not deleted by deleting instance graphs; shape graphs can be deleted only by the operation that deletes shape graphs.
φv (φe , φconnect , respectively) is the union of the mappings φvi (φei , φconnecti ), and
φcomp is the union of the mappings φcompi and φcompcg . Each component instance
graph (gi ) is called a representative instance graph. A database is a set of collection
graphs. The name of a collection graph must be unique in a database.
A shape graph represents the structure of the instance graphs in a collection graph. Instance graphs having the same name are mapped to a shape graph whose name is that of the instance graphs. Edges having the same name are similarly mapped to a shape edge [7].
Definition 4. The structure of a shape graph is the same as that of a collection graph.
That is, it is represented with a 9-tuple. An edge in a shape graph is called a shape
edge. The labels of shape graphs and shape edges are different from those of collec-
tion graphs. The label of a shape graph or shape edge is a triple (sid , nmd , DT ), where
sid is an identifier, nmd is a name of a shape graph or shape edge, and DT is a set of
data types of data values. This label is called a shape label. There are the following
relationships between a collection graph (nmcg ,V, E, Lv , Le , φv , φe , φconnect , φcomp )
and its corresponding shape graph (nmsg ,Vs , Es , Lvs , Les , φvs , φes , φconnect s , φcomps ).
Here, nm(l) represents the name in a label l, and nm(L) = { nm(l) | l ∈ L }, where
L is a set of labels.
• nmcg = nmsg
• nm(Lvs ) ⊇ nm(Lv ),
• nm(Les ) ⊇ nm(Le ),
• There is a mapping θv : V → Vs such that ∀ v ∈ V ∃ vs ∈ Vs (nm(φv (v)) =
nm(φvs (vs )) ∧ θv (v) = vs ),
• There is a mapping θe : E → Es such that ∀ e ∈ E ∃ es ∈ Es (nm(φe (e)) =
nm(φes (es ))∧ θe (e) = es ), and φconnect (e) = (U,W ) ⇒ φconnects (θe (e)) = (Θv (U),
Θv (W )), where Θv (U) means a set of shape graphs {θv (v1 ), · · · , θv (vn )} for a set
of instance graphs U = {v1 , · · · , vn }, and
• φcomp (v) = U ∪ Z ⇒ φcomps (θv (v)) = Θv (U) ∪ Θe (Z), where Θe (Z) means a set
of shape edges {θe (e1 ), · · · , θe (en )} for a set of edges Z = {e1 , · · · , en }.
A shape graph has to be changed in order to satisfy the conditions described above
when new instance graphs are inserted, or instance graphs are modified.
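The name conditions of Definition 4 can be checked mechanically. The following sketch builds the mapping θv under the simplifying assumption that graphs are identified by their names alone; the function name is invented for the example.

```python
# Illustrative construction of the mapping theta_v from Definition 4:
# each instance graph maps to the shape graph carrying the same name.
def build_theta_v(instance_names, shape_names):
    mapping = {}
    for v in instance_names:
        if v not in shape_names:
            # Condition nm(Lvs) >= nm(Lv) is violated: the shape graph
            # would first have to be extended with a shape graph named v.
            raise ValueError(f"no shape graph named {v!r}")
        mapping[v] = v   # theta_v(v) = vs with nm(phi_v(v)) = nm(phi_vs(vs))
    return mapping
```

A failed lookup corresponds exactly to the case in which the shape graph has to be changed before the insertion can be accepted.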
Algorithm Analyze
Input
S: a set of pairs <instance graph (g), path(path)>
Tcomp : a set of triplets <gname, Pnr , Pr >
Output
Tcomp : revised Tcomp
Method
1: foreach ent in S
2: push ent to Q; // Q is a queue
3: end
4: while(Q is not empty)
5: ent = pop Q;
6: path = ent.path; nm = current(path); nm_p = parent(path);
7: if(nm_p == NULL) then
8: if(nm ∉ NM) then
9: add <nm, {}, {}> to Tcomp;
10: endif
11: else // nm_p ≠ NULL
12: if(nm_p ∈ NM) then
13: if(path ∉ Pr(nm_p) ∧ path ∉ Pnr(nm_p)) then
14: if(nm ∈ NM) then
15: add path to Pr(nm_p);
16: else
17: add path to Pnr(nm_p);
18: endif
19: endif
20: else // nm_p ∉ NM
21: lowest_order = min(order(path), order(Pnr(nm)));
22: processed = move_path_from_Pnr_to_Pr(Tcomp, path, lowest_order);
23: if(processed == true ∧ order(path) > lowest_order) then
24: add <nm_p, {}, {path}> to Tcomp;
25: else
26: add <nm_p, {path}, {}> to Tcomp;
27: endif
28: endif
29: endif
30: foreach ig in φcomp (ent.g)
31: push <ig, path + ";" + nm(θv(ig))> to Q;
32: end
33: end
End
relationships. That is, it represents the relationship between an instance graph and
the instance graphs included in it. The procedure Analyze revises Tcomp .
When Analyze is first called, a set of representative instance graphs is specified as the instance graphs of the first argument S. In this case, S is a set of pairs of
the form (a representative instance graph, its name) because representative instance
graphs have no parents. Moreover, Tcomp has no entry because it is the first call.
Incremental Representation and Management of Recursive Types 335
The procedure Analyze has to analyze whether a type of instance graph included
in another one is already defined or not. If the type of instance graph is already
defined, it must be treated as a recursive instance graph, which includes itself in
it. Therefore, the included instance graphs are managed by separating recursive instance graphs from non-recursive ones. In the procedure, a set of paths of non-recursive (recursive, respectively) instance graphs is managed in Pnr (Pr). An entry
of the table Tcomp is a triplet of (gname, Pnr , Pr ), where gname is a name of an in-
stance graph. In the procedure, Pnr (nm) (Pr (nm), respectively) means Pnr (Pr ) of the
entry of Tcomp , whose gname is nm. Moreover, NM represents a set of the names,
each of which is gname of the entry in Tcomp .
In the procedure Analyze, the elements in S are pushed to Q (lines 1–3). In lines
7–8, if the parent name is not obtained, then the name of the current instance graph
is registered to Tcomp . If the parent name is obtained, the procedure tries to insert the
path (lines 11–29). If the parent name nm p is already registered (line 12), the path is
not registered yet (line 13), and the current name nm is also already registered (line
14), then the path is registered as a recursive instance graph (line 15). Otherwise,
the path is registered as a non-recursive instance graph (line 17). Please note that the
path is registered to Pr or Pnr if the path is not registered yet because Pr and Pnr are
sets. If the parent name is not registered yet (line 20), then the parent name and its path are registered to Tcomp (lines 21–28). The lowest order of nm is obtained because
the path having the lowest order is registered or kept as a path of a non-recursive
instance graph. Then, the function move_path_from_Pnr_to_Pr() is invoked. This function is not described precisely here because its description is lengthy, but it is not difficult. This function searches all of Pnr for the path which is of nm, and which includes the parent of nm or whose parent is included in path. If the path is found, it is decided that nm is not shared. In this case, the path having the lowest order is registered or kept in Pnr, and the other paths are moved from Pnr to Pr. This function returns true when nm is not shared. If nm is not
shared and the order of the path is not the lowest one (line 23), the path is registered
as a recursive instance graph (line 24). Otherwise, the path is registered as a non-
recursive instance graph (line 26). Lastly, the instance graphs included in the current
instance graph are pushed into Q (lines 30–32). The procedure described above is repeated until Q becomes empty.
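A stripped-down version of Analyze can be sketched in Python as follows. This sketch keeps only the recursion-detection core: order() and move_path_from_Pnr_to_Pr() are omitted, so it handles plain self-containment (as in Example 5) but not the shared-graph case those functions resolve. The encoding of an instance graph as a nested dict is an assumption made for the example.

```python
from collections import deque

def analyze(roots, tcomp=None):
    """Revise Tcomp: map each graph name to (Pnr, Pr), the path sets of
    non-recursively and recursively contained instance graphs."""
    tcomp = {} if tcomp is None else tcomp
    # Representative instance graphs have no parent: their path is their name.
    q = deque((g, g["name"]) for g in roots)
    while q:
        g, path = q.popleft()
        parts = path.split(";")
        nm, nm_p = parts[-1], (parts[-2] if len(parts) > 1 else None)
        if nm_p is None:
            tcomp.setdefault(nm, (set(), set()))      # lines 7-10
        else:
            pnr, pr = tcomp.setdefault(nm_p, (set(), set()))
            if nm in tcomp:
                pr.add(path)    # type already defined: recursive (line 15)
            else:
                pnr.add(path)   # first occurrence: non-recursive (line 17)
        for child in g["children"]:                   # lines 30-32
            q.append((child, path + ";" + child["name"]))
    return tcomp
```

On an input shaped like Example 5 (B containing D and another B), the sketch classifies D's path as non-recursive and the inner B's path as recursive.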
Example 5. Let us consider the collection graph shown in Fig. 4(a). There are two
representative instance graphs (g10 and g20). The procedure Analyze produces
Tcomp shown in Fig. 4(b). The instance graph B non-recursively contains the instance
graph D, and recursively contains the instance graph B.
By using the table Tcomp , the structure of the shape graph and the mapping function
φcomps described in Definition 4 are revised as follows.
Definition 5. The structure of the shape graph and the mapping function φcomps are redefined as follows:
• The structure of a shape graph is represented with a 10-tuple: (nmsg ,Vs , Es , Lvs ,
Les , φvs , φes , φconnect s , φcomps , Tcomp ).
Fig. 4 (a) An example of a collection graph including a nested instance graph, (b) its Tcomp ,
and (c) its shape graph.
• φcomp (v) = U ∪ Z ⇒ φcomps (θv (v)) = Wnr (v,U, Tcomp ) ∪ Wr (v, Tcomp ) ∪ Θe (Z),
where Wnr (v,U, Tcomp ) = {θv (u)|nm(φv (u)) = Pnr (nm(φv (v))) ∧ u ∈ U}, Wr (v,
Tcomp ) is a set of shape graphs whose name is included in Pr (nm(φv (v))), and
Θe (Z) is the same as that shown in Definition 4.
Here, several shape graphs for self-nested objects are described. This example also
demonstrates the representation of the nested shape graph.
Example 6. Let us consider a nested lunch box. In a nested lunch box, a large lunch
box contains a smaller lunch box, and it contains the smallest one. A nested lunch
box is shown in Fig. 5(a). In this figure, three lunch boxes are displayed in parallel
for clarity. The collection graph of the nested lunch box is shown in Fig. 5(b). The in-
stance graph corresponding to the largest lunch box contains the one corresponding
to the middle-sized one. It similarly contains the instance graph corresponding to the
smallest one. The shape graph of the collection graph is shown in Fig. 5(c). A lunch
box recursively contains another lunch box. The inner lunch box is represented with
a broken line. This represents that the inner lunch box is already represented (or
defined) as the outer instance graph.
Fig. 5 (a) A nested lunch box, (b) the collection graph, and (c) its shape graph.
Example 6 concerns a self-nested object containing the same kind of objects. The next example concerns a self-nested object that may contain another kind of object.
Fig. 6 (a) A handcrafted nested box, (b) the collection graph, and (c) its shape graph.
Fig. 7 (a) The evolved collection graph of the nested box and (b) its shape graph.
Example 7. Let us consider a handcrafted nested box shown in Fig. 6(a). The largest
box contains a smaller box. The smaller box contains a block. In Fig. 6(a), two boxes
and a block are displayed in parallel for clarity. The collection graph and the shape
graph of the nested box are shown in Fig. 6(b) and Fig. 6(c), respectively. In the
instance graph, the inner instance graph box contains the instance graph block.
In the shape graph, the shape graph block is contained in the shape graph box.
Example 8. Let us consider the situation that another instance graph block is in-
serted into the instance graph block in the collection graph shown in Fig. 6(b).
The updated collection graph is shown in Fig. 7(a). The shape graph evolves from the one shown in Fig. 6(c) into the one shown in Fig. 7(b).
5 Consideration
An instance graph could include instance graphs in it. When shape graphs represent
their structure, the shape graph is shared if possible. An example of the representa-
tion is the shape graph D shown in Fig. 4(c). This shape graph is shared by the shape
graphs B and C. If the shape graph cannot be shared, the inner shape graph refers to
the outer one as the recursive shape graph, e.g., B.
In conventional schema-based data models [9], data are defined before they are
inserted. In this case, data definition plays a role of a kind of constraints. That is,
illegal data are not permitted to be inserted into a database. On the other hand,
any data can be inserted under an instance-based data model. As inserting and/or
updating data may result in the modification of the information on the structure of
data, maintaining this information is very important. The proposed method enables
incremental modification of this information even if it includes recursive structure.
Semistructured data and XML data are considered to be instance-based data.
Research efforts have been made to derive a kind of schema from such data
[11, 12, 13, 14, 15, 16, 17, 18]. Some methods follow automata or grammar ap-
proaches [12, 15, 17, 18]. A clustering method or the Minimum Description Length
(MDL) principle may be used in deriving a kind of schema information [14, 16].
As XML data are represented with a kind of tree, the methods for XML data
are not applicable for deriving shape graphs from instance graphs because inclu-
sion relationships of instance graphs may constitute a graph structure rather than
a tree structure. In the research efforts on the semistructured data, graph structures
are considered [11, 12, 13, 14]. Although DataGuides [11] could precisely repre-
sent the structure of semistructured data, the cost of incremental maintenance is
high [12,13]. Wang et al. have proposed the approximate graph schema for summa-
rizing semistructured data graphs by using an incremental clustering method [14].
The method proposed in this paper does not approximate data. The shape graphs,
however, have the structure similar to the approximate graph schema by unifying
the shape graphs having the same name into one. The proposed method could bring
the concise representation of instance graphs to users.
6 Conclusion
This paper extended the shape graph in order to capture the self-nested objects.
The mapping function from a collection graph to the shape graph was revised in
the formal definition. The procedure Analyze obtaining the recursive structure was
clarified. The mapping function could be defined through the information obtained
by the procedure Analyze. The recursively contained shape graph is represented with
a broken line in drawing shape graphs. These extensions enable the shape graph to
represent the self-nested objects properly.
Future research includes the application of the shape graph to real applications. We plan to represent the contents of videos capturing Japanese traditional craft workers' movements.
References
1. Petrakis, E.G.M., Faloutsos, C.: Similarity Searching in Medical Image Databases. IEEE
Trans. on Know. and Data Eng. 9, 435–447 (1997)
2. Uehara, K., Oe, M., Maehara, K.: Knowledge Representation, Concept Acquisition and
Retrieval of Video Data. In: Proc. of Int’l Symposium on Cooperative Database Systems
for Advanced Applications, pp. 218–225 (1996)
3. Jaimes, A.: A Component-Based Multimedia Data Model. In: Proc. of ACM Workshop
on Multimedia for Human Communication: from Capture to Convey (MHC 2005), pp.
7–10 (2005)
4. Manjunath, B.S., Salembier, P., Sikora, T. (eds.): Introduction to MPEG-7. John Wiley
& Sons, Ltd (2002)
5. Hochin, T.: Graph-Based Data Model for the Content Representation of Multimedia
Data. In: Proc. of 10th Int’l Conf. on Knowledge-Based Intelligent Information and Eng.
Systems (KES 2006), pp. 1182–1190 (2006)
6. Hochin, T., Nomiya, H.: A Logical and Graphical Operation of a Graph-based Data
Model. In: Proc. of 8th IEEE/ACIS Int’l Conference on Computer and Information Sci-
ence (ICIS 2009), pp. 1079–1084 (2009)
7. Hochin, T.: Decomposition of Graphs Representing the Contents of Multimedia Data.
Journal of Communication and Computer 7(4), 43–49 (2010)
8. Ohira, Y., Hochin, T., Nomiya, H.: Introducing Specialization and Generalization to a
Graph-Based Data Model. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett,
R.J., Jain, L.C. (eds.) KES 2011, Part IV. LNCS, vol. 6884, pp. 1–13. Springer, Heidel-
berg (2011)
9. Silberschatz, A., Korth, H., Sudarshan, S.: Database System Concepts, 4th edn. McGraw-
Hill (2002)
10. Tanaka, K., Nishio, S., Yoshikawa, M., Shimojo, S., Morishita, J., Jozen, T.: Obase Ob-
ject Database Model: Towards a More Flexible Object-Oriented Database System. In:
Proc. of Int’l. Symp. on Next Generation Database Systems and Their Applications
(NDA 1993), pp. 159–166 (1993)
11. Goldman, R., Widom, J.: DataGuides: Enabling Query Formulation and Optimization
in Semistructured Databases. In: Proc. of 23rd Int’l Conf. on Very Large Databases, pp.
436–445 (1997)
12. Nestorov, S., Ullman, J., Wiener, J., Chawathe, S.: Representative Objects: Concise Rep-
resentations of Semistructured, Hierarchical Data. In: Proc. of 13th Int’l Conf. on Data
Engineering (ICDE 1997), pp. 79–90 (1997)
13. Soe, D.-Y., Lee, D.-H., Moon, K.-S., Chang, J., Lee, J.-Y., Han, C.-Y.: Schemaless Repre-
sentation of Semistructured Data and Schema Construction. In: Tjoa, A.M. (ed.) DEXA
1997. LNCS, vol. 1308, pp. 387–396. Springer, Heidelberg (1997)
14. Wang, Q.Y., Yu, J.X., Wong, K.-F.: Approximate Graph Schema Extraction for Semi-
structured Data. In: Zaniolo, C., Grust, T., Scholl, M.H., Lockemann, P.C. (eds.) EDBT
2000. LNCS, vol. 1777, pp. 302–316. Springer, Heidelberg (2000)
15. Chidlovskii, B.: Schema Extraction from XML Data: a Grammatical Inference Ap-
proach. In: Proc. of 8th Int’l Workshop on Knowledge Representation Meets Databases,
KRDB 2001 (2001)
16. Garofalakis, M., Gionis, A., Rastogi, R., Seshadri, S., Shim, K.: XTRACT: Learning
Document Type Descriptors from XML Document Collections. In: Data Mining and
Knowledge Discovery, vol. 7, pp. 23–56. Kluwer Academic Publishers (2003)
17. Bex, G.J., Neven, F., Schwentick, T., Vansummeren, S.: Inference of Concise Regular
Expressions and DTDs. ACM Trans. on Database Systems 35(2) (2010)
18. Bex, G.J., Gelade, W., Neven, F., Vansummeren, S.: Learning Deterministic Regular Ex-
pressions for the Inference of Schemas from XML Data. ACM Trans. on the Web 4(4)
(2010)
Intelligent Collage System
1 Introduction
A digital collage is an effective presentation of various images with multiple similar or different scenarios according to user preferences. In intelligent collage modeling, we solve the task of optimal placement of collage segments; this is equivalent to a cut-and-package task. We also need suitable ROIs, which are selected under certain criteria and associated with subjects from our living environment. When we have a limited set of photos, the collage system is restricted in its selection of informative images; otherwise an algorithm for ROI detection is run. We therefore apply a simple boosting procedure for face detection and a procedure for removing repeated and non-informative frames from video sequences. The
optimal placement of collage regions will be insufficient if we do not use one of the seamless joining methods between regions. The most efficient method of collage
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 341–350.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
342 M. Favorskaya, E. Yaroslavtzeva, and K. Levtin
design is the use of seamless blending of regions with similar color areas. Such smooth transitions between collage regions require algorithms that select similar color boundaries and maintain smoothness and opacity parameters. It is also desirable that an automatic collage system achieves high processing speed for photos and videos.
In the literature, one can find many methods that deal separately with photo collage and video collage [1–3]. We have proposed an integrated approach in which a video collage is considered an extension of the photo collage technique. In the case of video collage, we have all the tasks of photo collage plus the selection of informative frames, which are called key-frames. Three aspects characterize a collage design: the sampling of candidate images, their content analysis, and the seamless joining of collage regions. Existing methods of sampling candidate images from a photo set or a video are classified into two categories: stochastic skimming and intelligent summary. Stochastic skimming composes random series of frames from one or multiple photo collections or video sequences. The more promising intelligent summary contains photos or key-frames selected manually or automatically. The task of automatic extraction is of special interest, and we can consider it a particular case of content analysis in multimedia databases. The next aspect is the content analysis of the selected photos or key-frames. Two paradigms, frame-based and ROI-based, are available for such systems. The advantage of the second paradigm is evident because a photo or key-frame may include many non-informative areas. Methods of seamless blending may be simple or complex; our contribution is the use of flexible contours for this purpose.
2 Related Work
Yeung and Yeo were the pioneering researchers who presented their system "Picture Collage" [4]. A compact and smooth algorithm that automatically and seamlessly arranges ROIs within a photo collage was proposed by Rother et al. [5]. This project was called "AutoCollage". Wang et al. then extended the ideas of "AutoCollage" to video sequences and proposed "Video Collage" [6]. ROIs were extracted from representative temporal key-frames and aggregated into spatial structures with blended boundaries. Wang et al. preferred fixed rectangles for the ROIs and for the final collage, and this strategy was not a good decision for human aesthetic perception. A great diversity of collage templates with arbitrarily shaped ROIs and different styles significantly improves the browsing of video content. That is why a kind of enhanced video collage, called "Free-Shaped Video Collage", was proposed by Yang et al. [7]. At present, a great variety of shapes and templates is available in multiple software products that provide manual tools for collage design.
Many authors define criteria for the cutting and packaging of collage regions for automatic ROI placement on the final image according to such basic properties as representativeness, compactness, and visual smoothness. In some studies these properties are formalized as a series of energies [7]. For each pixel pC in a collage template C, its label L(pC) depends on the number n of frames F∈{F1, F2, …, Fn}, the resize factor r of
frame Fi, which makes more salient regions larger (the resized frame Fi′(r)), and a shift parameter s, which denotes the 2D shift between the original image I(p) and the resized frame Fi′(r). The optimal decision consists in the minimization of the energy function E(L):
E(L) = w1 Erep(L) + w2 Ecomp(L) + w3 Esmo(L), (1)
where Erep(L), Ecomp(L), and Esmo(L) denote the representative cost, the compact cost, and the visual smoothness cost, respectively; w1, w2, and w3 are their empirical weights.
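Eq. (1) is a plain weighted sum, so label selection reduces to minimizing it over candidate labelings. A minimal sketch follows; the weight values are purely illustrative, since the paper only says they are empirical.

```python
def collage_energy(e_rep, e_comp, e_smo, w=(0.5, 0.3, 0.2)):
    """E(L) = w1*Erep(L) + w2*Ecomp(L) + w3*Esmo(L), per Eq. (1)."""
    w1, w2, w3 = w
    return w1 * e_rep + w2 * e_comp + w3 * e_smo

def best_labeling(candidates, w=(0.5, 0.3, 0.2)):
    """Pick the candidate (Erep, Ecomp, Esmo) triple with minimal E(L)."""
    return min(candidates, key=lambda c: collage_energy(*c, w=w))
```

How the three component energies themselves are computed is exactly what the empirical dependences discussed below define.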
More detailed estimations of Erep(L), Ecomp(L), and Esmo(L) are also defined according to empirical dependencies that do not make a great contribution to the theory of aesthetic human perception. We used another way based on traditional estimators in digital image theory, i.e., mask preparation for ROIs, contour analysis, affine transformations, and descriptors of color, brightness, and smoothness. Also, the three collage styles book, diagonal, and spiral [7] limit the representation of a final collage. We propose an intelligent search algorithm based on the optimization of ROI placement, overlapping, and ROI sizes.
However, video collage has some principal challenges that make photo collage techniques directly unsuitable for video collage: the selection of representative key-frames from temporal scenes and the alignment of the most salient ROIs. More enhanced techniques have been applied to the effective summarization of content, such as stained-glass visualization [8], video snapshots [9], and some others.
The paper is organized as follows. Section 3 briefly summarizes the enhanced method of frame selection. Section 4 is devoted to describing the main underlying ideas of the proposed cut-and-package algorithm. A way of seamless blending based on flexible contours is described in Section 5. The system "Intelligent Collage" and experimental results are presented in Section 6, while Section 7 contains conclusions and future research.
3 Frames Selection
We have proposed an enhanced method of frame selection from a video sequence. This method possesses additional criteria that were not considered in known publications. We develop an automatic frame selection based on a heuristic search technique and a production system of decision making. First, we find an initial frame F0 randomly and check whether it satisfies the determined constraints. We formulate these constraints in the following manner:
1. The current frame Fi cannot be identical to any frame from the set of selected frames FC = {F0, …}.
2. The current frame Fi is not a "pause" frame between scenes.
3. The current frame Fi is not an intermediate (black or white) frame between scenes.
4. The current frame Fi does not include subtitles.
5. The current frame Fi contains ROIs (faces, buildings, etc.).
If we analyze a TV video sequence, we have to deal with logotypes. This problem is solved by ROI segmentation in the center of the frame, because logotypes are usually situated in the four corners of the frame. If the current frame Fi does not satisfy the declared constraints, then we check the following frame Fi+1 until the required frame F1 is detected and written into the set of selected frames FC. This set is limited by the number of collage segments (in our case, 6). In the same way we find the second selected frame F2, the third selected frame F3, and so on. If no objects of interest are detected in the whole video sequence (within a given number of iterations), then random frames are selected that satisfy the remaining criteria.
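The selection procedure above can be sketched as a bounded scan with a random fallback. The predicate functions are hypothetical placeholders for constraints 1–5; frame comparison and ROI detection themselves are outside this sketch, and the frame list must contain at least n_segments distinct entries.

```python
import random

def select_frames(frames, n_segments=6, max_iters=1000,
                  is_pause=lambda f: False,
                  is_transition=lambda f: False,
                  has_subtitles=lambda f: False,
                  has_roi=lambda f: True):
    """Scan forward from a random start frame, keeping frames that satisfy
    constraints 1-5; fall back to random frames if ROIs are never found."""
    chosen = []
    i = random.randrange(len(frames))
    for _ in range(max_iters):
        if len(chosen) == n_segments:
            break
        f = frames[i % len(frames)]
        i += 1
        if f in chosen:                       # constraint 1: no duplicates
            continue
        if is_pause(f) or is_transition(f):   # constraints 2-3
            continue
        if has_subtitles(f):                  # constraint 4
            continue
        if has_roi(f):                        # constraint 5
            chosen.append(f)
    while len(chosen) < n_segments:           # fallback: relax constraint 5
        f = random.choice(frames)
        if f not in chosen:
            chosen.append(f)
    return chosen
```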
The enhanced method of frame selection permits the extraction of a more representative sampling, not only from one video sequence but also from several available video sequences. The next stage is ROI cutting, sorting, and packaging of collage segments on the final collage.
where xi, yj are the coordinates of the current segment; xi±1, yj±1 are the coordinates of the adjacent horizontal and vertical segments, respectively; L is the "segment overlap size" parameter.
The height of the second segment (the second image of the pair) is calculated as
H2 = H0 − H1(1 − L), (3)
The third and fourth segments are located in the right part of the canvas, similarly to the first pair of segments. The remaining images are the fifth and sixth segments; they are situated in the unfilled area between the first and third, and the second and fourth segments, respectively. The sizes of these segments depend on the area of the unfilled regions and the overlap sizes:
W5,6 = W0 − W1,2(1 − L) − W3,4(1 − L), (4)
[x5, y5] = [x1 + W1(1 − L); y1], (5)
[x6, y6] = [x2 + W2(1 − L); y5 + H5(1 − L)],
where W0 is the width of the canvas and Wi is the width of the corresponding segment. The heights of the fifth and sixth segments are calculated proportionally.
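The sizing rules of Eqs. (3)–(5) translate directly to code. Eq. (2) for the first segment pair is not part of this excerpt, so the first-pair dimensions are taken as inputs here, and the numbers in the test are purely illustrative.

```python
def segment_sizes(W0, H0, L, W1, H1, W2, W3, W4):
    """Dependent segment sizes per Eqs. (3)-(4) on a W0 x H0 canvas with
    overlap parameter L; the first-pair sizes H1 and W1..W4 are inputs."""
    H2 = H0 - H1 * (1 - L)                    # Eq. (3)
    W5 = W0 - W1 * (1 - L) - W3 * (1 - L)     # Eq. (4): fifth segment width
    W6 = W0 - W2 * (1 - L) - W4 * (1 - L)     # Eq. (4): sixth segment width
    return H2, W5, W6

def segment_positions(x1, y1, x2, W1, W2, L, H5):
    """Positions of the fifth and sixth segments per Eq. (5)."""
    x5, y5 = x1 + W1 * (1 - L), y1
    x6, y6 = x2 + W2 * (1 - L), y5 + H5 * (1 - L)
    return (x5, y5), (x6, y6)
```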
Then we analyze unfilled areas on the collage canvas. If such areas are detected, the sizes of adjacent segments are increased until the existing gaps are eliminated. Fig. 1 shows existing approaches and a result of our technique.
Fig. 1 Examples of segment placement: a) and b) the first and last stages of segment placement (by the k-means method) in the system "Mobile photo collage" [2]; c) and d) results without segment overlapping and with segment overlapping in the system "Microsoft AutoCollage"; e) our results with 6 segments
Ri,j = ki,j · n + n; ki,j = Fli,j / n², (6)
where i, j are coordinates; Ri,j is the blob radius; ki,j is a coefficient of the filled level of a block; n is the block size; Fli,j is the factor of the filled level in pixels belonging to the object of interest. The coefficient k may increase the blob size.
The coordinates of the blob center are determined as the coordinates of the mass center of the ROI divided by the value Fli,j. Then we generate an array of blobs which approximate the ROI.
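The blob construction of Eq. (6) fits in a few lines. Here fill_pixels counts the ROI pixels inside an n×n block, and the center rule follows the text literally (mass center divided by the fill factor); the function names are invented for the sketch.

```python
def blob_radius(n, fill_pixels):
    """Eq. (6): R = k*n + n with k = Fl / n**2, where Fl counts the pixels
    of the object of interest inside an n x n block. A well-filled block
    thus gets a larger blob."""
    k = fill_pixels / n ** 2
    return k * n + n

def blob_center(roi_mass_center, fl):
    """Blob center: the ROI mass center divided by Fl, as the text states."""
    mx, my = roi_mass_center
    return mx / fl, my / fl
```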
The block analysis method is characterized by increased complexity and uses 100–200 blocks (instead of 1–10 blocks in the method of shape estimation). However, with optimization, the block analysis method achieves sufficient processing speed and high accuracy in comparison with the method of shape estimation.
The resulting array of approximating graphic primitives is used for fuzzy mask generation during seamless blending between ROIs on the collage canvas. The recursive blending algorithm applies an image processed by a fuzzy mask at step n−1 as the background at the following step n. A fuzzy mask is a grey-scale image fragment with the ROI sizes. It permits superimposing a collage segment on the background with a determined transparency for each pixel. There are two principally different ways of fuzzy mask generation: a simple method of semitransparent transitions between segments and background, and an adaptive method based on the calculated ROI approximation. The second method is preferable for large and non-uniform ROIs.
The superimposition procedure is executed as so-called α-blending of textures according to the following rules. White areas of the fuzzy mask give an opaque ROI, black areas of the fuzzy mask give an opaque background, and grey-scale colors create semitransparent combinations of ROI and background by the formula:
I^out_{i,j} = I^ROI_{i,j} · I^mask_{i,j} + I^bg_{i,j} · (I_max − I^mask_{i,j}), (7)
where i, j are coordinates; Iout is the output pixel value; IROI is the ROI pixel value; Imask is the mask pixel value; Ibg is the background pixel value; Imax is the maximum value in the given color space.
The resulting image obtained by Eq. (7) is an output collage with a seamless blending effect. The main advantage of this adaptive method of collage design consists in ROI detection that excludes non-informative regions and in the design of blending transitions between ROIs and background. Thanks to these properties, the output collage becomes a demonstrative, visually attractive, and balanced product.
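Eq. (7) maps directly to per-pixel code. The sketch below works on nested lists of 8-bit grey values and, unlike the literal Eq. (7), divides by Imax so the output stays in the 0–255 range; this normalization is our addition, which the equation leaves implicit.

```python
I_MAX = 255  # maximum value in the chosen color space (8-bit grey here)

def alpha_blend(roi, mask, bg):
    """Per-pixel blending per Eq. (7): white mask pixels keep the ROI
    opaque, black pixels keep the background, grey values mix the two."""
    return [[(r * m + b * (I_MAX - m)) // I_MAX
             for r, m, b in zip(r_row, m_row, b_row)]
            for r_row, m_row, b_row in zip(roi, mask, bg)]
```

A NumPy expression would do the same in one vectorized line; plain lists are used here only to keep the sketch dependency-free.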
Module                                    Functions
Frame selection from video sequences      1. Removal of similar frames.
                                          2. Removal of frame transitions between scenes.
                                          3. Subtitle removal.
                                          4. Removal of blended frames.
                                          5. Face detection.
Segment placement                         1. Location according to a determined prototype.
                                          2. Random location.
                                          3. Interactive random location with canvas filling.
                                          4. Automatic location.
ROI detection                             1. Method of shape estimation.
                                          2. Block analysis method.
Collage generation                        1. Seamless blending.
                                          2. Fuzzy mask generation.
                                          3. α-blending.
Fig. 2 Types of collages: a) a collage designed by simple compositing of regions on a background; b) a collage with seamless blending effect without adaptive ROI; c) a collage with seamless blending effect and adaptive ROI by the method of shape estimation; d) a collage with seamless blending effect and adaptive ROI by the block analysis method
7 Conclusion
The adaptive methods applied in the software "Intelligent Collage", v. 2.0, permit the realization of some interesting functions and achieve a high aesthetic effect in the designed collage. Collage relevance is also at a high level thanks to exact ROI selection based on the developed methods of shape estimation and block analysis.
We will continue research to increase the accuracy of ROI selection in frames. We plan to design a hybrid method combining the advantages of the shape estimation and block analysis methods. The shape of a single blob will be analyzed and approximated by graphic primitives with a more suitable shape. We also intend to extend the set of criteria for object detection by using pattern recognition methods.
References
1. Diakopoulos, N., Essa, I.: Mediating Photo Collage Authoring. In: UIST, pp. 183–186
(2005)
2. Man, H.-L., Singhal, N., Cho, S., Park, I.K.: Mobile photo collage. In: IEEE WECV,
pp. 24–30 (2010)
3. Mei, T., Hua, X.-S., Zhu, C.-Z., Zhou, H.-Q., Li, S.: Home Video Visual Quality As-
sessment with Spatiotemporal Factors. IEEE Transactions on Circuits and Systems for
Video Technology 17(6), 699–706 (2007)
4. Yeung, M.M., Yeo, B.L.: Video visualization for compact presentation and fast brows-
ing of pictorial content. IEEE Trans. on CSVT 7(5), 771–785 (1997)
5. Rother, C., Bordeaux, L., Hamadi, Y., Blake, A.: Autocollage. In: SIGGRAPH, pp.
847–852 (2006)
6. Wang, T., Mei, T., Hua, X.-S., Liu, X., Zhou, H.-Q.: Video Collage: A Novel Presen-
tation of Video Sequence. In: ICME, pp. 1479–1482 (2007)
7. Yang, B., Mei, T., Sun, L.-F., Yang, S.-Q., Hua, X.-S.: Free-Shaped Video Collage. In:
Satoh, S., Nack, F., Etoh, M. (eds.) MMM 2008. LNCS, vol. 4903, pp. 175–185.
Springer, Heidelberg (2008)
8. Chiu, P., Girgensohn, A., Liu, Q.: Stained-glass visualization for highly condensed
video summaries. In: ICME, pp. 2059–2062 (2004)
9. Ma, Y.-F., Zhang, H.-J.: Video snapshot: A bird view of video sequence. In: MMM
2005, pp. 94–101 (2005)
10. Favorskaya, M.: A Way to Recognize Dynamic Visual Images on the Basis of Group
Transformations. Pattern Recognition and Image Analysis 21(2), 179–183 (2011)
11. Favorskaya, M.: Motion Estimation for Object Analysis and Detection in Videos. In:
Handbook “Advances in Reasoning-Based Image Processing, Analysis and Intelligent
Systems: Conventional and Intelligent Paradigms, pp. 211–253. Springer (2012)
12. Favorskaya, M., Zotin, A., Damov, M.: Intelligent Inpainting System for Texture
Reconstruction in Videos with Text Removal. In: ICUMT, pp. 867–874 (2010)
Intuitive Humanoid Robot Operating
System Based on Recognition and
Variation of Human Body Motion
1 Introduction
In recent years, personal robots have been researched intensively. A personal robot is a robot that lives in the same environment as humans and supports daily life while communicating with them. Various personal robots have been developed, for example, PaPeRo [7] and PARO [12]. These
Yuya Hirose · Shohei Kato
Dept. of Computer Science and Engineering, Graduate School of Engineering,
Nagoya Institute of Technology,
Gokiso-cho Showa-ku Nagoya 466-8555 Japan
e-mail: {hirose,shohey}@juno.ics.nitech.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 351–361.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
personal robots are designed for intended therapeutic effects on users. Humanoid personal robots, which have a shape similar to that of humans, have also been developed. Because of their human form, humanoid personal robots have the potential to work in the same environment as humans, for example on household chores, helping the elderly, and simple tasks in the workplace. The development of humanoid personal robots is therefore promising, because they can help humans in many ways once they become widespread.
In very recent years, many and various robot control systems and interfaces have been proposed and developed. The personalization of robot control has also attracted much attention. A user's personalized motion can be adapted into the robot motion by dynamic and direct control in which the user operates the robot intuitively. For example, motion capture is one of the most usable interfaces for obtaining a user's free motion, and the obtained motion data are used directly for robot control in existing studies [6] [10]. However, a direct motion control system will probably capture some unintended motions and reflect them in the robot, which may degrade usability. On the other hand, recognition of the user's motion is also used in robot control systems [5] [9] in order to reduce unintended motions; hence it may be suitable for tasks in which no mistakes are allowed. However, using only a recognition mechanism, the personalization of robot motion control cannot be realized. Therefore, we consider that personalization requires both the recognition of a user's motion and the reflection of the user's dynamic control adjustments. Thus, we propose a robot motion control system that combines motion recognition and dynamic adjustment, and we discuss the usability of the proposed system through task and subjective evaluation experiments.
From the observation sequences of the user's motion u on each axis, O_X^u, O_Y^u and O_Z^u, HMMs (\lambda_X^u, \lambda_Y^u and \lambda_Z^u) are each constructed using the Baum-Welch algorithm. The recognition rule requires the normalized likelihood to exceed a threshold on each axis, i.e. N(O_Y^u \mid \lambda_Y^{m_i}) > Th and N(O_Z^u \mid \lambda_Z^{m_i}) > Th.
The set of robot joints J^u is composed of the joints whose angles need to be changed for the user's motion u. J^u is determined heuristically; for example, J^{down} = {shoulder joint} in the case of the arm-down motion and J^{push} = {shoulder joint, elbow joint} in the case of pushing the arm forward.
the likelihood for the training data used in learning the HMM. The normalized likelihood of motion u on the D-axis is calculated as follows:

N(O_D^u \mid \lambda_D^{m_i}) = \frac{L(O_D^u \mid \lambda_D^{m_i}) - \bar{L}(\lambda_D^{m_i})}{\sigma(\lambda_D^{m_i})}, \qquad (2)

where \bar{L}(\lambda_D^{m_i}) and \sigma(\lambda_D^{m_i}) denote the average and the standard deviation of the likelihood for the training data of motion m_i on the D-axis.
The proposed system recognizes a motion from the normalized likelihoods, based on the algorithm shown in Table 1.
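As a sketch of how the normalized-likelihood rule of Eq. (2) and the per-axis threshold check could drive recognition, consider the following Python fragment. The motion names, likelihood statistics, and threshold below are illustrative assumptions, not values from the paper, and the log-likelihoods are assumed to be precomputed by per-axis HMMs.

```python
def normalized_likelihood(loglik, mean_loglik, std_loglik):
    """Eq. (2): N(O|λ) = (L(O|λ) − mean over training data) / std over training data."""
    return (loglik - mean_loglik) / std_loglik

def recognize(logliks, stats, threshold=0.0):
    """Return the motion whose worst per-axis normalized likelihood is highest,
    provided every axis exceeds the threshold; otherwise return None."""
    best, best_score = None, float("-inf")
    for motion, per_axis in logliks.items():
        scores = [normalized_likelihood(per_axis[d], *stats[motion][d])
                  for d in ("X", "Y", "Z")]
        if all(s > threshold for s in scores) and min(scores) > best_score:
            best, best_score = motion, min(scores)
    return best
```

Taking the minimum over axes makes the rule conservative: a motion is accepted only if all three axes agree, which matches the conjunction of threshold conditions above.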
widely bend, compactly bend and spread) for Sr. Figure 7 shows the experimental environment. We calculated the average time, the standard error of the average time, and the average number of failures before completing the task with each control system. We also calculated the recognition rate of each motion when using the proposed system and Sr. Tables 2 and 3 show the experimental results.
From Table 2, we confirmed that the proposed system could complete the task regardless of object size and that it was more versatile than Sr. It was also shown that the proposed system could complete the task in a shorter
time than Sj. This suggests that the proposed system allows intuitive control by associating the user's motions with the robot's motions. Moreover, the proposed system produced fewer failures than Sk. We suspect that, under Sk, the robot dropped the box it was holding because Sk directly transferred the user's motion to the robot and reflected even minute motions of the user's hands. On the other hand, the proposed system could prevent redundant motions while performing the task, because it selects a motion from the previously learned motions by body motion recognition. From the above results, it was confirmed that the proposed system had relatively high versatility and enabled users to perform the task quickly and successfully. Furthermore, focusing on the standard error, the difference in task completion time between users is small; it is comparable to that of Sk, in which the user's motion corresponds directly to the robot's. Hence, the proposed system does not require proficiency.
Meanwhile, Table 3 shows that the proposed system achieved a relatively high recognition rate for each motion; in particular, for the bend motion it achieved a higher recognition rate than Sr. These facts suggest that the proposed system realized not only a higher degree of freedom but also an improved recognition rate through the dynamic adjustment of the joint angles. Therefore, a high recognition rate for varied motions appears to be one of the factors that allow users to complete the task quickly and successfully. Overall, these results confirm that the proposed system enables users to operate a humanoid robot appropriately.
[Figure: Subjective evaluation scores (scale 1 down to -3) for the proposed system S and the compared systems Sj, Sk and Sr on the adjective scales Hardly-handled, Non-intuitive, Non-affinitive, Unaccustomed, Bad and Old; markers indicate 1% and 5% significance.]
4 Conclusion
In this paper, we proposed an intuitive robot control system that combines the recognition of a user's motion with dynamic adjustment of the
robot's joint angles. The proposed system recognizes the user's motion using HMMs and dynamically reflects the joint angle variation in the robot motion control. Through the task experiment, we confirmed that the proposed system completed the task more reliably than the compared systems, and through the subjective evaluation experiment, we confirmed that it allows intuitive control. From these results, we suggest that the proposed system offers high usability and provides intuitive control in which the robot appropriately performs the user's intended motion.
In this paper, the proposed system did not feed the robot's state information back to the user, so the user could not perceive the robot's state and its surroundings. A feedback function using, for example, a pressure sensor is expected to improve the operability of the robot.
In future work, to verify the utility of the proposed system, we will conduct additional experiments, including a communication experiment between remote locations using humanoid robots.
References
1. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39, 1–38 (1977)
2. Forney, G.D.: The Viterbi algorithm. Proc. IEEE 61(3), 268–278 (1973)
3. Gams, A., Mudry, P.-A.: Gaming controllers for research robots: controlling a
humanoid robot using a wiimote. In: 17th International Electrotechnical and
Computer Science Conference, ERK 2008 (2008)
4. Guo, C., Sharlin, E.: Exploring the use of tangible user interfaces for human-
robot interaction: A comparative study. In: Proceeding of the SIGCHI Confer-
ence on Human Factors in Computing System, pp. 121–130 (2008)
5. Inamura, T., Nakamura, Y., Toshima, I., Tanie, H.: Embodied symbol emer-
gence based on mimesis theory. Int’l J. of Robotics Research 23(4), 363–378
(2004)
6. Nakaoka, S., Nakazawa, A., Yokoi, K.: Generating whole body motions for a
biped humanoid robot from captured human dances. In: Proceedings of the
IEEE International Conference on Robotics and Automation, vol. 3, pp. 3905–
3910 (2003)
7. NEC. PaPeRo, http://www.nec.co.jp/products/robot/en/
8. Osgood, C., Suci, G., Tannenbaum, P.: The measurement of meaning. Univer-
sity of Illinois Press, Urbana (1967)
9. Pook, P.K., Ballard, D.H.: Recognizing teleoperated manipulations. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 578–585 (1993)
10. Riley, M., Ude, A., Atkeson, C.G.: Methods for motion generation and interac-
tion with a humanoid robot: Case studies of dancing and catching. In: Proceed-
ings of AAAI and CMU Workshop on Interactive Robotics and Entertainment,
pp. 35–42 (2000)
11. Smith, C., Christensen, H.I.: Wiimote robot control using human motion mod-
els. In: The 2009 IEEE/RSJ International Conference on Intelligent Robots
and Systems, pp. 5509–5515 (2009)
12. Shibata, T.: PARO, http://paro.jp/english/
13. Tukey, J.W.: The problem of multiple comparisons. Mimeographed Monograph
(1953)
Knowledge-Based System for Automatic 3D
Building Generation from Building Footprint
1 Introduction
A 3D urban model, as shown at the bottom of Fig. 1, is important in urban planning and in facilitating public involvement. To facilitate public involvement, 3D models simulating a real or near-future city in 3D CG (Computer Graphics) can be of great use. However, enormous time and labour have to be spent creating these 3D models with 3D modeling software such as 3ds Max or SketchUp. For example, when manually modeling a house with roofs by Constructive Solid Geometry (CSG), one must carry out the following laborious steps: (1) generation of primitives of appropriate size, such as boxes, prisms or polyhedra, that will form the parts of a house; (2) application of Boolean operations to these primitives to form the shapes of the parts, such as making holes in the building body for doors and windows; (3) rotation of the parts; (4) positioning of the parts; (5) texture mapping onto these parts.
Kenichi Sugihara
Xinxin Zhou
Nagoya Bunri University, Inazawa-chou, Inazawa-City, Aichi-Pref., Japan 492-8520
e-mail: xinxin@nagoya-bunri.ac.jp
Takahiro Murase
Chukyo Gakuin University, 2216 Toki-chou, Mizunami-City, Gifu-Pref., Japan 509-6101
e-mail: murase@chukyogakuin-u.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 363–373.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
CG Module (MaxScript)
*Generating primitives of appropriate size, such as boxes and prisms, that form the parts, and applying Boolean operations, e.g. making holes for doors and windows
*Rotating and positioning the 3D models
*Automatic texture mapping onto the 3D models
2 Related Work
Since 3D urban models are an important information infrastructure that can be utilized in several fields, research on the creation of 3D urban models is in full swing. Various technologies, ranging from computer vision and computer graphics (CG) to photogrammetry and remote sensing, have been proposed and developed for creating 3D urban models.
Using photogrammetry, Gruen et al. [1998, 2002] introduced a semi-automated
topology generator for 3D building models: CC-Modeler. Feature identification
and measurement with aerial stereo images is implemented in manual mode. Dur-
ing feature measurement, measured 3D points belonging to a single object should
be coded into two different types according to their functionality and structure:
boundary points and interior points. After these manual operations, the faces are
defined and the related points are determined. Then the CC-Modeler fits the faces
jointly to the given measurements in order to form a 3D building model.
Suveg and Vosselman [2002] presented a knowledge-based system for automatic 3D building reconstruction from aerial images. The reconstruction process starts with the partitioning of a building into simple building parts based on the building polygon provided by a 2D GIS map. If the building polygon is not a rectangle, it can be divided into rectangles. A polygon can have multiple partitioning schemes. To avoid a blind search for the optimal partitioning scheme, the minimum description length principle is used. This principle gives higher priority to partitioning schemes with a smaller number of rectangles. Among these schemes, the optimal partitioning is 'manually' selected. Then, the building primitives of the CSG representation are placed on the partitioned rectangles.
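The priority rule just described can be sketched in a few lines; this is only an illustration of preferring the scheme with the fewest rectangles, not Suveg and Vosselman's full MDL cost, and the rectangle representation is an assumption of ours.

```python
def best_partitioning(schemes):
    """Among candidate partitioning schemes (each a list of rectangles),
    prefer the most compact description, i.e. the fewest rectangles."""
    return min(schemes, key=len)

# Hypothetical schemes for one building polygon, as (x, y, w, h) rectangles.
schemes = [
    [(0, 0, 4, 2), (0, 2, 2, 2), (2, 2, 2, 1), (2, 3, 2, 1)],  # 4 rectangles
    [(0, 0, 4, 2), (0, 2, 4, 2)],                              # 2 rectangles
]
```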
These proposals and systems, using photogrammetry, will provide us with a
primitive 3D building model with accurate height, length and width, but without
details such as windows, eaves or doors. The research on 3D reconstruction is
Thus, a GIS and CG integrated system that automatically and instantly generates 3D urban models is proposed; the generated 3D building models that constitute the 3D urban models are approximate geometric models that citizens and stakeholders can recognize as their future houses or as real-world buildings.
Fig. 1. This aerial photo and digital map also show that most building polygons
are orthogonal polygons. An orthogonal polygon can be replaced by a combina-
tion of rectangles.
A 'branch roof' is a roof that is cut off by a DL and extends to a main roof. To cut off one rectangle, the edge crossed by a DL is three or four edges away from an 'L' vertex, when following the edges of the polygon clockwise or counter-clockwise.
Stage 1: Building polygon expression (e.g. LRRRLLRRLRRLRRLRLLRRRL).
Stage 2: From an 'L' vertex, two possible DLs can be drawn; among them, a shorter DL that cuts off one rectangle, or a DL whose length is shorter than the width of a 'main roof', is selected.
Stages 3-4: (intermediate partitions).
Stage 5: Partitions continue until the number of vertices of the body polygon is four.
Stage 6: After the partitions, 3D building models are automatically generated on the divided rectangles by using CSG.
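The 'L'/'R' vertex expression above can be produced mechanically from the polygon's vertex coordinates. The following sketch assumes a counter-clockwise orthogonal polygon and labels convex turns 'R' and reflex turns 'L' by the sign of the cross product of adjacent edges; the function name is ours, not the paper's.

```python
def vertex_string(pts):
    """Label each vertex of an orthogonal polygon, listed counter-clockwise,
    as 'R' (convex) or 'L' (reflex) from the cross product of the incoming
    and outgoing edges."""
    labels = []
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i - 1]          # previous vertex
        x1, y1 = pts[i]              # current vertex
        x2, y2 = pts[(i + 1) % n]    # next vertex
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        labels.append("R" if cross > 0 else "L")
    return "".join(labels)

# An L-shaped polygon has exactly one reflex ('L') vertex.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
```

The count of 'L' labels gives the number of places where a dividing line (DL) may be drawn.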
depends on the comparison between Len(FCP) and Len(j2sf) and the comparison between Len(jsf) and Len(jpb). Of the two DLs from FCP or BCP, the shorter DL is selected for the partition. In the third case of nR=3 in Fig. 3, the rectangle consisting of the vertices pt(jsf), pt(j2sf), pt(jpb) and pt(A) is not partitioned but separated as an independent one. This is the only case in which separation occurs.
Fig. 3 (nR=4) shows three cases of drawing a DL when nR is four. The way of drawing the DL depends on the comparison between Len(jsf) and Len(j2pb). With the exception of the first case of nR=4, where the vertices pt(jsf), pt(j2sf), pt(j2pb) and pt(jpb) form one rectangle, the partition method for nR=4 or more is the same as the method for nR=3, since the branch formed by these four or more 'R' vertices would be self-intersecting and could not form 'one rectangle'.
[Fig. 3: Cases of drawing a DL. nR=2: three cases of DL depending on the comparison between Len(FCP) and Len(jpb). nR=3: three cases of DL depending on the comparison between Len(FCP) and Len(j2sf) and the comparison between Len(jsf) and Len(jpb); labeled points include FCP, BCP, pt(jsf), pt(jpb) and pt(A).]
6 Conclusion
A 3D urban model is quite effective for everyone in understanding what would happen if an alternative plan were realized, what the town used to look like, or what has been built. Traditionally, urban planners design the future layout of a town by drawing building polygons on a digital map. Based on these building polygons, the integrated system automatically generates a 3D urban model so quickly that it meets the urgent demand of examining alternative urban plans.
In this paper, a new scheme for an orthogonal polygon partitioning is proposed;
the system divides a polygon along the thin part of its branches. Thus, the pro-
posed integrated system succeeds in automatically generating typical residential
areas.
The limitation of the system is that automatic generation is executed based only
on ground plans or top views. There are some complicated shapes of buildings
whose outlines are curved or even crooked. To create these curved buildings, the
system needs side views and front views for curved outlines information.
Future work will be directed towards the development of methods for:
References
1. Aliaga, D.G., Rosen, P.A., Bekins, D.R.: Style Grammars for Interactive Visualization of
Architecture. IEEE Transactions on Visualization and Computer Graphics 13, 786–797
(2007)
2. Gruen, A., Wang, X.: CC Modeler: A topology generator for 3D urban models. ISPRS
J. of Photogrammetry and Remote Sensing 53, 286–295 (1998)
3. Gruen, A., et al.: Generation and visualization of 3D-city and facility models using
CyberCity Modeler. MapAsia, 8, CD-ROM (2002)
4. Bekins, D.R., Aliaga, D.G.: Build-by-number: rearranging the real world to visualize
novel architectural spaces. In: Visualization, VIS 2005, pp. 143–150. IEEE (2005)
5. Jiang, N., Tan, P., Cheong, L.-F.: Symmetric architecture modeling with a single im-
age. ACM Transactions on Graphics - TOG 28(5) (2009)
6. Aichholzer, O., Aurenhammer, F., Alberts, D., Gärtner, B.: A novel type of skeleton
for polygons. Journal of Universal Computer Science 1(12), 752–761 (1995)
7. Aichholzer, O., Aurenhammer, F.: Straight Skeletons for General Polygonal Figures in
the Plane. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp.
117–126. Springer, Heidelberg (1996)
8. Mueller, P., Wonka, P., Haegler, S., Ulmer, A., Van Gool, L.: Procedural Modeling of
Buildings. ACM Transactions on Graphics 25(3), 614–623 (2006)
9. Sugihara, K.: Automatic Generation of 3D Building Model from Divided Building Polygon. In: ACM SIGGRAPH 2005, Posters Session, Geometry & Modeling, CD-ROM (2005)
10. Sugihara, K.: Generalized Building Polygon Partitioning for Automatic Generation of 3D Building Models. In: ACM SIGGRAPH 2006, Posters Session, Virtual & Augmented & Mixed Reality & Environments, CD-ROM (2006)
11. Suveg, I., Vosselman, G.: Automatic 3D Building Reconstruction. In: Proceedings of
SPIE, vol. 4661, pp. 59–69 (2002)
12. Vanegas, C.A., Aliaga, D.G., Beneš, B.: Building reconstruction using Manhattan-world grammars. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 358–365 (2010)
13. Parish, Y.I.H., Müller, P.: Procedural modeling of cities. In: Fiume, E. (ed.) Proceed-
ings of ACM SIGGRAPH 2001, pp. 301–308. ACM Press, New York (2001)
14. Zlatanova, S., van den Heuvel, F.A.: Knowledge-based automatic 3D line extraction from close range images. International Archives of Photogrammetry and Remote Sensing 34, 233–238 (2002)
Locomotion Design of Artificial Creatures
in Edutainment
Abstract. This paper discusses a methodology of project-based learning (PBL) for edutainment using locomotion robots. The aim of the PBL is to develop a locomotion robot as a new shape of artificial creature not existing in the natural world, and to design new locomotion patterns. The PBL is therefore composed of two steps: (1) students conduct the conceptual design of artificial creatures, and (2) students carry out hardware design and locomotion generation based on the conceptual design. This paper introduces two examples of the development of locomotion robots. Based on their experience with the problems and troubles encountered through trial and error in developing the locomotion robots, the students learn the relationship between shape and locomotion patterns, the tradeoff between stability and high-speed locomotion, and the difficulty of problem solving.
1 Introduction
Recently, biologically inspired robots have been discussed and developed from various viewpoints, e.g., Kukanchi [1] and Mobiligence [2]. Kukanchi is a fundamental concept based on interactive human-space design and intelligence. This research direction is related to human-centered environmental design, intelligent spaces, and human-friendly robots. The living and moving abilities of animals and people depend strongly on their surrounding environments. Therefore, we should discuss the relationship between the shape and locomotion of animals and robots. In particular, animals behave adaptively in diverse environments. In the concept of Mobiligence, the mechanisms generating intelligent adaptive behaviors have been discussed as emerging from the interaction of the body, brain, and environment.
On the other hand, various types of embedded systems have been applied to edutainment. In general, edutainment is known as a word coined from education and entertainment. Basically, there are three different aims in robot edutainment. One is to develop the knowledge and skills of students through project-based learning (PBL) by the development of robots (Learning on Robots); students can learn basic knowledge of robotics itself by developing a robot [3,4]. The next is to learn interdisciplinary knowledge of mechanics, electronics, dynamics, biology, and
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 375–384.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
376 K. Toyoda et al.
informatics by using robots (Learning through Robots) [5]. The last is to apply human-friendly robots, instead of personal computers, to computer-assisted instruction (Learning with Robots) [6,7]; a student learns together with a robot. We have also applied robots in education, aiming to realize new forms of robot edutainment, and have developed various types of robots for courses in primary and junior high schools since 2009. Furthermore, we tested them in experimental classes as activities of the Science Partnership Projects (SPP) [8] of the Japan Science and Technology Agency.
In this paper, we focus on learning through robots based on PBL. The aim of PBL is to develop various human abilities, such as management, discussion, survey, and presentation, through carrying out a project. Therefore, learning through robots based on PBL lets students learn both basic knowledge of robots and the management of a project at the same time. The aim of the PBL in this class is to develop a locomotion robot as a new shape of artificial creature not existing in the natural world, and to generate new locomotion patterns based on the above discussions of Kukanchi and Mobiligence. The PBL is composed of two steps: (1) students conduct the conceptual design of artificial creatures, and (2) students carry out hardware design and locomotion generation based on the conceptual design. Based on their experience with the problems and troubles encountered through trial and error in developing the locomotion robots, the students learn the difficulty of problem solving. Figure 1 shows the locomotion robots developed by two groups of graduate students who joined the class in the autumn semester of 2010-2011. The common aim of these two robots was to minimize the number of actuators for locomotion while maintaining stable locomotion. The students learned the effectiveness of the project through cooperative and competitive discussion during the development of the robots. Furthermore, although a human can easily walk, jump, and run in an environment, it is very difficult to realize such motions in robots. Thus, the research and development of robots is very useful for understanding the dynamics and intelligence of humans themselves. In this paper, we introduce two different types of locomotion robots developed by two groups of graduate students who joined the class in the autumn semester of 2011-2012.
This paper is organized as follows: Section 2 explains the hardware of the robot kits for developing locomotion robots and the procedure of the PBL. Section 3 shows an example of a locomotion robot based on four legs for stable high-speed rotation. Section 4 shows an example of a one-leg locomotion robot based on cyclic patterns of upright posture and falling behavior. Section 5 summarizes this paper and discusses the future direction of this PBL.
(a) Rotation behaviors by six legs (b) Rotation behaviors by two legs
Fig. 1 Locomotion robots in 2010-2011 [9]
Basically, the teaching material in the PBL is composed of three stages: (1) the design of the locomotion robot, (2) the design of locomotion patterns, and (3) the experiments with the locomotion robots and the creation of teaching materials. Figure 4 shows the procedure of the PBL. The standard number of students in a group is 3 to 5, and a technical teaching assistant is assigned to each group.
In the first stage, we have the students consider the shape of artificial creatures not existing in the natural world. First of all, we explain the fundamental definition of locomotion. Next, we have the students imagine the concept of artificial creatures.
The students discuss the concept of artificial creatures by drawing various shapes of them. They then cut the sketches of the artificial creatures into several parts and combine some of the parts to imagine a new shape. Next, we show movies of the locomotion of animals and insects and discuss the locomotion patterns. Finally, we have the students decide on the combination of shape and locomotion of the target locomotion robot.
In the first stage, we had the students consider the shape of artificial creatures not existing in the natural world. They proposed more than 10 ideas as inspiration for the conceptual design of artificial creatures, such as a moonwalk, rotational locomotion, locomotion using a Dharma doll, gliding steps, and an expansion-and-contraction mechanism. They discussed the advantages and disadvantages of each idea through brainstorming. The most important thing is to increase their motivation to develop locomotion robots. As a result, they decided to develop a robot for high-speed cartwheel locomotion.
First, the students proposed several types of shapes based on the shapes of the actuators, mechanical parts, and body (the controller box of the Bioloid kit). Furthermore, they searched for several photos of human cartwheels; Figure 5 shows an example of a human cartwheel. Finally, they built the cartwheel locomotion robot shown in Fig. 6 to realize smooth and high-speed locomotion.
The shape and locomotion of the robot should be designed so as to realize the intended locomotion patterns. First, the students designed locomotion patterns by hand based on the sequence shown in Fig. 5 (Fig. 7). Next, they measured the change of the center of gravity (COG) according to the difference in the shape of the body. They designed the reference trajectory of the joint angle of each actuator, where the output level of each actuator is fixed. However, the robot was not able to complete one full rotation (one cycle). They discussed the reason from the viewpoints of shape and locomotion patterns. Finally, they changed the attachment position between the actuator and the foot-plate, and successfully realized one cycle of forward movement.
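A hand-designed joint-angle pattern like the one described above can be represented as keyframes and interpolated between them. This generic sketch illustrates the idea only; the times and angles are made up, not the students' actual trajectory data.

```python
def reference_angle(keyframes, t):
    """Piecewise-linear joint-angle reference from (time, angle) keyframes."""
    t0, a0 = keyframes[0]
    if t <= t0:
        return a0
    for t1, a1 in keyframes[1:]:
        if t <= t1:
            # Linear interpolation between the bracketing keyframes.
            return a0 + (a1 - a0) * (t - t0) / (t1 - t0)
        t0, a0 = t1, a1
    return a0  # hold the last angle after the final keyframe

# Example: swing a joint up to 90 degrees and back over 2 seconds.
swing = [(0.0, 0.0), (1.0, 90.0), (2.0, 0.0)]
```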
The students proposed several types of shapes based on the shapes of the actuators, mechanical parts, and body (the controller boxes of Bioloid and Freedom). According to the conceptual design shown in Fig. 10, they developed two different types of one-leg locomotion robots (Fig. 11), because the weight of the body parts differs considerably between Bioloid and Freedom. In fact, the body part of Freedom can be regarded as a two-link structure (Fig. 11(b)).
First, the students designed locomotion patterns by hand based on the shapes of the one-leg locomotion robots shown in Fig. 11 (Fig. 12). The most important issue is the movability of the Bioloid body part on the ground. In Fig. 12(a), the locomotion robot must draw the body part toward the contact point of the foot-plate. The students therefore discussed the effect of friction between the robot and the ground (floor). In preliminary experiments, the robot was not able to draw the Bioloid body part back to the original foot-plate owing to the low friction with the ground. They therefore attached a rubber sheet to the back of the foot-plate, after which the robot was able to draw the body part to the position of the foot-plate.
The aim of this one-leg locomotion robot is to realize cyclic patterns of upright posture and falling behavior (see Fig. 12). The stability of the upright posture therefore
depends on the number of links corresponding to the actuators. The students varied the number of actuators and found that a one-leg locomotion robot with fewer than 4 actuators was not able to fall down, because the COG of the robot could not be shifted outside the body part (Bioloid). They also discussed the effect of friction between the robot and the ground (floor), as shown in Fig. 13. This result shows that the one-leg robot can move regardless of the effect of friction if the number of actuators is changed. On the other hand, it is very difficult to realize a stable upright posture with more than 8 actuators. The moving distance increases linearly as the number of actuators increases.
Next, they conducted several experiments on the moving-velocity performance (Table 2). The time required for one cyclic movement of the one-leg locomotion robot increases as the number of actuators increases, but the students achieved a speed-up in terms of average moving velocity.
5 Summary
This paper discussed the applicability of locomotion robots to edutainment. First, we prepared teaching materials based on a robot kit. The aim of the subject prepared in this study is to design a new shape of artificial creature not existing
in the natural world without using wheels, and then to realize its locomotion. We conducted project-based learning with two groups of four or five graduate students. As a result, the students who joined in the autumn semester of 2011-2012 became interested in the design of stable high-speed locomotion while trying to develop a minimal-size locomotion robot. They came to understand the relationship between the shape and locomotion patterns of artificial creatures, the tradeoff between stability and high-speed locomotion, and other lessons.
As future work, we intend to conduct edutainment-based teaching using the developed locomotion robots at elementary schools or junior high schools this summer.
Multistep Search Algorithm for Sum k-Nearest
Neighbor Queries on Remote Spatial Databases
1 Introduction
New types of Location-Based Services (LBS) for supporting a group of mobile users
are potentially promising. Consider, for example, a group of mobile users, each
at a different location (query point), who want to obtain POI (Point Of Interest)
information so that they can meet somewhere together. It is assumed that the location data
of the mobile users is obtainable from a location management server and that POI
information is accessible via other Web services. Realizing such an LBS
requires Aggregate k-Nearest Neighbor (k-ANN) queries, which return the k POIs whose
sum (or maximum) of distances from the query points is smallest.
Hideki Sato
School of Informatics, Daido University, 10-3 Takiharu-cho,
Minami-ku, Nagoya, 457-8530 Japan
e-mail: hsato@daido-it.ac.jp
Ryoichi Narita
Aichi Toho University, 3-11 Heiwagaoka, Meito-ku, Nagoya, 465-8515 Japan
e-mail: narita@aichi-toho.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 385–397.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
However, there are difficulties in realizing the LBS for two reasons. First, the Web
service receiving k-ANN queries has to access the corresponding spatial databases
to answer them. If the spatial databases to be queried are local, and the query pro-
cessing algorithms have direct access to their spatial indices (i.e., R-trees[1] and their
variants), queries can be answered efficiently. However, this assumption does not hold
when k-ANN queries are processed by accessing remote spatial databases that op-
erate autonomously. Although some or all of the data from remote databases could be
replicated in a local database and a separate index structure built for it,
this is infeasible when the database is huge or a large number of remote databases are
accessed.
Secondly, access to spatial data on the WWW is limited to certain types of
queries, due to simple and restrictive Web API interfaces. A typical scenario is
retrieving the POI nearest to an address given as query point through a Web
API interface. Unfortunately, Web API interfaces do not support processing
k-ANN queries on remote spatial databases. In other words, a new strategy for effi-
ciently processing k-ANN queries is required in this setting.
We have proposed the Representative Query Point (RQP) based algorithm RQP-S
as a solution for processing k-ANN queries[2],[3]. Instead of processing the original
k-ANN query, it issues a k-Nearest Neighbor (k-NN) query using the RQP as query
point. However, it returns approximate results, not exact ones. According
to experimental results, Precision is only partially acceptable [2],[3]. Thus,
further improvement must be applied to the query results returned by RQP-S.
In this paper, we propose RQP-M for efficiently obtaining more exact results
for sum k-NN queries. It refines the query results initially obtained by RQP-S with
subsequent k-NN queries. While RQP-S is a single-step search algorithm, RQP-M
is a multistep search algorithm.
The remainder of this paper is organized as follows. Sect.2 reviews related
work. Sect.3 describes sum k-NN queries and the difficulties in processing them,
as preliminaries for the later discussion. Sect.4 presents the RQP-M algorithm. Sect.5 evaluates RQP-M
experimentally, using synthetic and real data. Finally, Sect.6 concludes the paper
and outlines our future work.
2 Related Work
The existing literature in the field of location-dependent queries is extensively sur-
veyed in [4]. Among the many location-dependent queries, NN queries[5],
[6] and their variants, such as Reverse NN[7], Constrained NN[8], and Group
NN[9],[10], are considered important for supporting spatial decision making.
A Reverse k-NN query retrieves the objects that have a specified object/location among
their k nearest neighbors. A Constrained NN query retrieves objects that satisfy
a range constraint. For example, a visible k-NN query retrieves the k objects with the
smallest visible distance to a query object[11].
Since a Group NN query retrieves ANN objects, the work in [9],[10] is closely re-
lated to ours. It was first dedicated to the case of euclidean distance and the sum
function[9], and was then generalized to the network distance[10].
Their setting assumes that the spatial database storing the data objects is local to the
querying site. By contrast, we deal with k-ANN queries where each
database is located at a remote site.
The work in [12],[13] is also closely related to ours, because both provide users
with location-dependent query results by using Web API interfaces to remote
databases. The former[12] proposes a k-NN query processing algorithm that uses
one or more Range queries[14],[15],[16] to retrieve the nearest neighbors of a given
query point. The latter[13] proposes two Range query processing algorithms us-
ing k-NN queries. However, our work differs from theirs in dealing with k-ANN
queries, not k-NN or Range queries.
3 Preliminaries
ANN queries are an extension of NN queries. Let p be a point and Q be a set
of query points. Then the aggregate distance function dagg(p, Q) is defined as
agg({d(p, q) | q ∈ Q}), where agg() is an aggregate function (e.g., sum, max, min). Given
a set P of data objects and a set Q of query points, an ANN query retrieves the object p
in P such that dagg(p, Q) is minimized. k-ANN queries are the generalization of ANN
queries to top-k. Given a set of data objects P, a set of query points Q, and an aggregate
distance function dagg(p, Q), the k-ANN query k-ANNagg(P, Q) retrieves S ⊂ P such that
|S| = k and dagg(p, Q) ≤ dagg(p′, Q), ∀p ∈ S, p′ ∈ (P − S), for some k (< |P|).
In the remainder of the paper, the sum k-Nearest Neighbor (sum k-NN) query is ex-
amined as the k-ANN query in which sum is used as the aggregate distance function. Con-
sider the example of Fig.1, where P(= {p1, p2, p3, p4}) is a set of data objects (e.g.,
restaurants) and Q(= {q1, q2}) is a set of query points (e.g., locations of mobile
users). The number on each edge connecting a data object and a query point repre-
sents the distance cost between them. Table 1 presents dagg(p, Q) for each p in P,
together with the sum NN query result and the sum 3-NN query result under the sum distance function.
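The worked example of Fig. 1 can be checked with a brute-force sketch. The per-edge distances below are one assignment consistent with the aggregate values of Table 1 (the figure alone does not fix the assignment unambiguously), and `sum_knn` is an illustrative name, not part of the paper:

```python
# Brute-force sum k-NN over the worked example of Fig. 1 / Table 1.
d = {  # d[(p, q)] = distance between data object p and query point q
    ("p1", "q1"): 420, ("p1", "q2"): 340,
    ("p2", "q1"): 280, ("p2", "q2"): 580,
    ("p3", "q1"): 450, ("p3", "q2"): 300,
    ("p4", "q1"): 240, ("p4", "q2"): 560,
}

def sum_knn(P, Q, dist, k):
    """Return the k objects of P with the smallest sum of distances to Q."""
    dsum = {p: sum(dist[(p, q)] for q in Q) for p in P}
    return sorted(P, key=dsum.get)[:k]

P, Q = ["p1", "p2", "p3", "p4"], ["q1", "q2"]
print(sum_knn(P, Q, d, 1))  # ['p3']  (dsum = 750, the minimum)
print(sum_knn(P, Q, d, 3))  # ['p3', 'p1', 'p4']
```

The aggregate values reproduce Table 1: dsum(p1) = 420 + 340 = 760, dsum(p2) = 860, dsum(p3) = 750, dsum(p4) = 800.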
Let p be a point (x, y) and Q be a set of query points. The sum distance function
dsum,Q(x, y) over Q is defined in Eq.1. Fig.2 presents a contour graph of the sum distance
function dsum,Q(x, y) over a set of 10 query points whose locations are randomly
generated. Since dsum,Q(x, y) is a convex function, there certainly exists a single
point at which the function value is lowest.
d_{sum,Q}(x, y) = \sum_{i=1}^{|Q|} \sqrt{(x - x_i)^2 + (y - y_i)^2}    (1)
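Eq. (1) is the objective of the classic geometric-median (Fermat-Weber) problem, so its minimal point can be approximated with Weiszfeld's iteration. This is only an illustrative sketch; the paper does not prescribe this particular minimizer (its reference list includes the Nelder-Mead simplex method [17] as an alternative):

```python
import math
import random

def d_sum(x, y, Q):
    """Eq. (1): sum of euclidean distances from (x, y) to the query points."""
    return sum(math.hypot(x - xi, y - yi) for xi, yi in Q)

def minimal_point(Q, iters=200):
    """Approximate the minimizer of Eq. (1) by Weiszfeld's iteration,
    starting from the centroid; each step is a distance-weighted average
    of the query points and never increases d_sum."""
    x = sum(xi for xi, _ in Q) / len(Q)
    y = sum(yi for _, yi in Q) / len(Q)
    for _ in range(iters):
        nx = ny = den = 0.0
        for xi, yi in Q:
            di = math.hypot(x - xi, y - yi)
            if di < 1e-12:          # sitting exactly on a query point
                return x, y
            nx += xi / di
            ny += yi / di
            den += 1.0 / di
        x, y = nx / den, ny / den
    return x, y

random.seed(0)
Q = [(random.random(), random.random()) for _ in range(10)]
mx, my = minimal_point(Q)
cx = sum(xi for xi, _ in Q) / len(Q)
cy = sum(yi for _, yi in Q) / len(Q)
print(d_sum(mx, my, Q) <= d_sum(cx, cy, Q))  # True
```

Because the iteration starts at the centroid and is monotone, the returned point is never worse than the mean point under Eq. (1).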
Fig. 1 Example sum k-NN query setting: data objects p1-p4, query points q1, q2, and edge distances (figure not reproduced)

Table 1 Results of sum NN query and sum 3-NN query shown in Fig.1

dsum(p1, Q)  760
dsum(p2, Q)  860
dsum(p3, Q)  750
dsum(p4, Q)  800
sum NN query result: p3
sum 3-NN query result: {p3, p1, p4}
Fig. 2 Sum distance function dsum,Q (x, y) (euclidean distance, number of query points=10)
RQP: description
minimal point: the point at which the value of the sum distance function dsum,Q(x, y) over Q is lowest.
middle point: the point (median({xi | (xi, yi) ∈ Q}), median({yi | (xi, yi) ∈ Q})), where median() returns the middle ordered value of the elements in a set.
mean point: the centroid of Q.
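The middle- and mean-point definitions translate directly into code; the function names below are illustrative, not the paper's:

```python
from statistics import median

def mean_point(Q):
    """Centroid of the query points Q."""
    return (sum(x for x, _ in Q) / len(Q), sum(y for _, y in Q) / len(Q))

def middle_point(Q):
    """Coordinate-wise median of the query points Q."""
    return (median(x for x, _ in Q), median(y for _, y in Q))

Q = [(0.0, 0.0), (1.0, 0.0), (0.0, 10.0)]
print(middle_point(Q))  # (0.0, 0.0): robust to the outlier at (0, 10)
print(mean_point(Q))    # roughly (0.33, 3.33): pulled toward the outlier
```

The example highlights the practical difference between the two candidates: the middle point is insensitive to outlying query points, while the mean point is not.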
Fig. 3 Searched circle of 5-NN query (query point (solid circle) and data objects (hollow
circle))
query with RQP q as query point. {p1 , p2 , p3 , p4 , p5 } is the query result. The radius
of the circle equals the distance r between the 5th nearest neighbor p5 and q.
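The searched circle can be sketched as follows: its radius is simply the distance from the query point to the k-th nearest neighbor (`searched_circle` is an illustrative name):

```python
import math

def searched_circle(q, P, k):
    """k-NN search from q over data objects P: returns the k nearest
    objects and the radius of the searched circle, i.e. the distance
    from q to the k-th nearest neighbor."""
    ranked = sorted(P, key=lambda p: math.dist(q, p))
    result = ranked[:k]
    return result, math.dist(q, result[-1])

q = (0.0, 0.0)
P = [(1.0, 0.0), (5.0, 0.0), (2.0, 0.0), (3.0, 0.0), (6.0, 0.0), (4.0, 0.0)]
result, r = searched_circle(q, P, 5)
print(r)  # 5.0: every object strictly inside this circle is in the 5-NN result
```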
(dsum (q, Q) ≤ dsum (v, Q) ∧ dsum (v, Q) ≤ u) ∨ (dsum(q, Q) > dsum (v, Q)) (3)
Fig. 4 Additional k-NN query search whose query point is a vertex of n-regular polygon
Fig.4(b) shows the newly searched circle of a 5-NN query with v1 as query point.
{p1, p3, p6, p7, p8} is the query result. While p1 and p3 are points searched again,
p6, p7, and p8 are newly searched points. The latter three might refine the previously obtained
query result. Let upperbound be max({dsum(pi, Q) | 1 ≤ i ≤ k}), where pi belongs
to the refined query result. The new 6-regular polygon inscribed in the new circle
supplies its vertices. Each element of {v7, v8, v9, v10, v11} can be added to CQPlist
if it satisfies Eq.3. However, v12 is not added to CQPlist, because it resides inside
the previously searched region. Additionally, either v2 or v6 is removed from CQPlist
for the same reason, if it belongs to the list. Of course, v1 is removed from CQPlist,
because it has already been used as a query point.
Fig.5 shows the RQP-M algorithm. The k-NN query results (line 1) are rearranged in
ascending order of sum distance (line 2). These two lines correspond to RQP-S.
uppervalue is set to the maximum (line 3); in case k1 > k2, infinity is set
instead. Clist maintains previously searched circles and is initialized (line 4). A searched
circle with rqp as center is created (line 5), and the vertices of the regular polygon
inscribed in the circle are gathered (line 6). CQPlist is initially created (line 7), in
which candidate query points are arranged in ascending order of sum distance.
The same search process is repeated (lines 8-17) until CQPlist becomes empty. The
candidate query point with the least sum distance is selected as query point (line 9)
for the k-NN query (line 10). Rlist is refined using the query results (line 11).
uppervalue is set to the maximum for the updated Rlist (line 12). A searched circle
is created (line 14), and the vertices of the regular polygon inscribed in the circle are
gathered (line 15).

2 This is heuristically decided because p might reside on a line extending line segment qv1
ahead of v1 such that dsum(p, Q) ≤ upperbound.
CQPlist is related to the termination condition of the loop (lines 8-17); it is
initially created with at most n candidate query points (line 7). One execution
of the loop body necessarily consumes a single query point, which is removed from
CQPlist. In line 16, CQPlist is updated to be the list of elements, belonging either to
CQPlist or to the set of vertices of the regular polygon inscribed in the searched circle
(line 14), that satisfy the following two conditions: the sum distance of the element
is upperbound or less, and the element does not reside inside any previously searched
circle. upperbound decreases monotonically (line 12) and the regions covered by previously
searched circles grow monotonically. Accordingly, the loop execution necessar-
ily terminates.
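The Fig. 5 listing itself is not reproduced in this excerpt, so the loop described above can be sketched from the prose. This is a minimal reading with k1 = k2, an in-memory `knn` standing in for the remote k-NN Web API, and a `max_nor` cap added purely as a safeguard for the sketch; none of these names come from the paper:

```python
import math
import random

def dsum(p, Q):
    """Sum of euclidean distances from data object p to the query points Q."""
    return sum(math.dist(p, q) for q in Q)

def knn(P, q, k):
    """Stand-in for the remote k-NN Web API: the k objects of P nearest to q."""
    return sorted(P, key=lambda p: math.dist(p, q))[:k]

def vertices(c, r, n):
    """Vertices of the n-regular polygon inscribed in the circle (c, r)."""
    return [(c[0] + r * math.cos(2 * math.pi * i / n),
             c[1] + r * math.sin(2 * math.pi * i / n)) for i in range(n)]

def rqp_m(P, Q, rqp, k, n=6, max_nor=25):
    res = knn(P, rqp, k); nor = 1                      # line 1: RQP-S step
    Rlist = sorted(res, key=lambda p: dsum(p, Q))      # line 2
    upper = dsum(Rlist[-1], Q)                         # line 3: uppervalue
    Clist = [(rqp, math.dist(rqp, res[-1]))]           # lines 4-5: searched circles
    CQP = sorted((v for v in vertices(rqp, Clist[0][1], n)
                  if dsum(v, Q) <= upper),
                 key=lambda v: dsum(v, Q))             # lines 6-7: CQPlist
    while CQP and nor < max_nor:                       # lines 8-17
        v = CQP.pop(0)                                 # line 9: least sum distance
        res = knn(P, v, k); nor += 1                   # line 10
        Rlist = sorted(set(Rlist) | set(res),
                       key=lambda p: dsum(p, Q))[:k]   # line 11: refine Rlist
        upper = dsum(Rlist[-1], Q)                     # line 12
        r = math.dist(v, res[-1])
        Clist.append((v, r))                           # line 14
        CQP.extend(vertices(v, r, n))                  # line 15
        CQP = sorted({w for w in CQP                   # line 16: keep points that
                      if dsum(w, Q) <= upper           # could still improve and lie
                      and not any(math.dist(w, c) < cr # outside searched circles
                                  for c, cr in Clist)},
                     key=lambda w: dsum(w, Q))
    return Rlist, nor

random.seed(1)
P = [(random.random(), random.random()) for _ in range(200)]
Q = [(random.random(), random.random()) for _ in range(5)]
centroid = (sum(x for x, _ in Q) / len(Q), sum(y for _, y in Q) / len(Q))
approx, nor = rqp_m(P, Q, centroid, k=10)
rqp_s = sorted(knn(P, centroid, 10), key=lambda p: dsum(p, Q))
print(len(approx), nor)  # k results and the NOR spent obtaining them
```

By construction, the refined result's k-th sum distance is never worse than that of the initial RQP-S result, which is the guarantee the multistep refinement rests on.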
RQP-M refines the k-NN query result obtained by RQP-S by issuing subse-
quent k-NN queries. NOR (Number Of Requests), which counts the number of k-NN
queries requested (line 1 and line 10), is employed to measure the search cost.
5 Experimental Evaluation
In this section, the performance of RQP-M is experimentally evaluated by measuring
Precision and NOR. The former is used as the criterion specifying the accuracy of sum
k-NN query results. It is defined in Eq.4, where Rsum k−NN is the exact sum k-NN
query result, RRQP−M(sum k−NN) is the query result obtained by RQP-M, and k is the
cardinality of both Rsum k−NN and RRQP−M(sum k−NN). The latter is the number
of k-NN queries requested, which specifies the search cost of sum k-NN queries. Experi-
mental results are averages over 100 trials conducted for each setting. Parameters k1
and k2 of RQP-M are set equal in the experiments (see Fig.5). Furthermore, the
locations of the query points are uniformly distributed in the experiments.
Precision(k) = |R_{sum k−NN} ∩ R_{RQP−M(sum k−NN)}| / k    (4)
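Eq. (4) is a straightforward set intersection; a minimal sketch (names illustrative):

```python
def precision(exact, approx, k):
    """Eq. (4): fraction of the exact sum k-NN result recovered by RQP-M."""
    return len(set(exact) & set(approx)) / k

# Using the Table 1 example: the exact sum 3-NN result is {p3, p1, p4}
print(precision({"p3", "p1", "p4"}, {"p3", "p1", "p2"}, 3))  # 0.666...
```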
Fig. 6 Performance of sum 10-NN search for varying number of edges of a regular polygon
Fig. 7 Performance of sum k-NN search over data points of uniform distribution
Fig. 8 Performance of sum 10-NN search over data points of Gaussian distribution
Thirdly, performance is measured using real data points. Experiments are con-
ducted by varying the number of query points and the k of sum k-NN queries, with
5-regular polygons and real data points. The data concerns restaurants
located in Nagoya, which is available at a Web site and accessible via its Web
API3. There are 2003 such restaurants, which are concentrated in the
downtown of Nagoya. Precision is over 0.97 (see Fig.9(a)) and NOR ranges be-
tween 2.9 and 3.5 (see Fig.9(b)).
Fig. 9 Performance of sum k-NN search over real data points (Precision and NOR for varying |QP| and k)
are superior in Precision to both mean points and middle points. Fig.12 shows the
Precision and NOR of sum 10-NN query results searched by RQP-M using distinct
RQPs. The Precision of RQP-M using minimal points as RQP increases by 0.04-0.20
for uniformly distributed data, in comparison with that of RQP-S. However, there
is little difference in Precision among the three kinds of RQP used
by RQP-M (see Fig.12(a)). By contrast, there remains a certain difference in NOR
among the three (see Fig.12(b)). The NOR for minimal points ranges from 3.42
to 3.74, while the NOR for mean points and middle points ranges from 4.3 to
7.56. Minimal points are thus clearly superior in NOR to the others.
Fig. 11 Precision of sum 10-NN query results searched by RQP-S with distinct RQP
Fig. 12 Performance of sum 10-NN query results searched by RQP-M with distinct RQP
6 Conclusion
In this paper, we have proposed the RQP-M search algorithm for efficiently obtaining
sum k-NN query results. It refines the query results initially obtained by RQP-S with
subsequent k-NN queries, whose query points are chosen among the vertices of a regular
polygon inscribed in a previously searched circle. Experimental results on the performance
of RQP-M are as follows. (1) A regular polygon with 5 or
more edges is sufficient to refine sum k-NN query results. (2) Precision is over 0.99 for
uniformly distributed data, over 0.95 for skew-distributed data, and over 0.97 for real
data; NOR ranges between 3.2 and 4.0, between 3.1 and 3.8, and between 2.9
and 3.5, respectively. The Precision of RQP-M using minimal points as RQP increases
by 0.04-0.20 for uniformly distributed data, in comparison with that of RQP-S. (3)
Precision is over 0.99 after the third k-NN query is requested, and over 93% of those
results are exact. (4) Minimal points as RQP are clearly superior in NOR to both
mean points and middle points. Our future work includes further examination
of available query points, experiments for the case in which parameters k1 and k2 are
unequal (see Fig.5), development of an efficient algorithm for max k-NN queries,
and the study of aggregate within-distance queries[18].
References
1. Guttman, A.: R-trees: A Dynamic Index Structure for Spatial Searching. In: Proc. ACM
SIGMOD Int’l Conf. on Management of Data, pp. 47–57 (1984)
2. Sato, H.: Approximately Solving Aggregate k-Nearest Neighbor Queries over Web Ser-
vices. In: Phillips-Wren, G., Jain, L.C., Nakamatsu, K., Howlett, R.J. (eds.) IDT 2010.
SIST, vol. 4, pp. 445–454. Springer, Heidelberg (2010)
3. Sato, H.: Approximately Searching Aggregate k-Nearest Neighbors on Remote Spatial
Databases Using Representative Query Points. In: Watanabe, T., Jain, L.C. (eds.) In-
novations in Intelligent Machines – 2. SCI, vol. 376, pp. 91–102. Springer, Heidelberg
(2012)
4. Ilarri, S., Menna, E., Illarramendi, A.: Location-Dependent Query Processing: Where
We Are and Where We Are Heading. ACM Computing Survey 42(3), Article 12 (2010)
5. Roussopoulos, N., Kelley, S., Vincent, F.: Nearest Neighbor Queries. In: Proc. ACM SIG-
MOD Int'l Conf. on Management of Data, pp. 71–79 (1995)
6. Hjaltason, G.R., Samet, H.: Distance Browsing in Spatial Databases. ACM Trans.
Database Systems 24(2), 265–318 (1999)
7. Korn, F., Muthukrishnan, S.: Influence Sets Based on Reverse Nearest Neighbor Queries.
In: Proc. ACM SIGMOD Int’l Conf. on Management of Data, pp. 201–212 (2000)
8. Ferhatosmanoglu, H., Stanoi, I., Agrawal, D.P., El Abbadi, A.: Constrained Nearest
Neighbor Queries. In: Jensen, C.S., Schneider, M., Seeger, B., Tsotras, V.J. (eds.) SSTD
2001. LNCS, vol. 2121, pp. 257–276. Springer, Heidelberg (2001)
9. Papadias, D., Shen, Q., Tao, Y., Mouratidis, K.: Group Nearest Neighbor Queries. In:
Proc. Int’l Conf. Data Eng., pp. 301–312 (2004)
10. Yiu, M.L., Mamoulis, N., Papadias, D.: Aggregate Nearest Neighbor Queries in Road
Networks. IEEE Trans. on Knowledge and Data Engineering 17(6), 820–833 (2005)
11. Nutanong, S., Tanin, E., Zhang, R.: Visible Nearest Neighbor Queries. In: Kotagiri, R.,
Radha Krishna, P., Mohania, M., Nantajeewarawat, E. (eds.) DASFAA 2007. LNCS,
vol. 4443, pp. 876–883. Springer, Heidelberg (2007)
Multistep Search Algorithm for Sum k-NN Queries 397
12. Liu, D., Lim, E., Ng, W.: Efficient k-Nearest Neighbor Queries on Remote Spatial
Databases Using Range Estimation. In: Proc. SSDBM, pp. 121–130 (2002)
13. Bae, W.D., Alkobaisi, S., Kim, S.H., Narayanappa, S., Shahabi, C.: Supporting Range
Queries on Web Data Using k-Nearest Neighbor Search. In: Ware, J.M., Taylor, G.E.
(eds.) W2GIS 2007. LNCS, vol. 4857, pp. 61–75. Springer, Heidelberg (2007)
14. Xu, B., Wolfson, O.: Time-Series Prediction with Applications to Traffic and Moving
Objects Databases. In: Proc. Third ACM Int’l Workshop on MobiDE, pp. 56–60 (2003)
15. Trajcevski, G., Wolfson, O., Xu, B., Nelson, P.: Managing Uncertainty in Moving Ob-
jects Databases. ACM Trans. Database Systems 29(3), 463–507 (2004)
16. Yu, P.S., Chen, S.K., Wu, K.L.: Incremental Processing of Continual Range Queries over
Moving Objects. IEEE Trans. Knowl. Data Eng. 18(11), 1560–1575 (2006)
17. Nelder, J.A., Mead, R.: A Simplex Method for Function Minimization. The Computer
Journal 7(4), 308–313 (1965)
18. Trajcevski, G., Scheuermann, P.: Triggers and Continuous Queries in Moving Objects
Database. In: Proc. 6th Int’l DEXA Workshop on Mobility in Databases and Distributed
Systems, pp. 905–910 (2003)
(Not)Myspace: Social Interaction as Detriment
to Cognitive Processing and Aesthetic
Experience in the Museum of Art
Matthew Pelowski*
Abstract. This paper considers the effect of social interaction on art museum be-
havior and cognitive/aesthetic experience, arguing that social interaction may
represent one of the most detrimental elements in museum-based view-
ing of art, calling into question the current push to increase social interaction
through museum social and knowledge media design. This is examined through
three case studies with the same works of art, varying only the design elements that
create social interaction and considering the differences this creates. From a psy-
chological viewpoint, these cases are analyzed and social interaction's effect is
presented, considering how these findings might connect to museum and general
conceptions of social and knowledge media design.
1 Introduction
Modern society is increasingly becoming a technologically connected and "social"
space. Social communication, knowledge media, networking applications and cul-
tural norms involving the sharing of social and personal information are becoming
an ever more core component of human life. It is now quite com-
mon, often unthinkingly expected, for one to publish a record of the sights one has
seen, curate the foods one has eaten, and announce one's immediate opinions,
aesthetic reactions and new understandings of experiences throughout the day, not
to mention search out and share the announcements, activities and opinions of others.
This behavior is in turn finding its way into many of the spaces that formed
the previous venues for social and cultural sharing in human life: the school, the
street and, notably here, the museum of art.
Viewing art itself has long been a social task, matching one's appraisal against
that of an artist or artworld [1], and the making of art has long been defined as a
social act [2], introducing a new work amid a history of other such activities.
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 399–410.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
Much current marketing and sociological research suggests that patrons have
long gone to museums (art or otherwise) expressly to join in a social event [3],
with the driving force for attendance being to integrate with and be acknowledged by a
community of likeminded others. With modern and post-modern art, it is also of-
ten noted that art-viewing is becoming an increasingly difficult, confusing
task [5], and sharing or learning from the knowledge of others, and accessing the
wealth of existing artwork information, is cited as a primary means of increasing
enjoyment and understanding in the 'common' art patron. Therefore, as the panel
discussion of which this paper is a part implies, it is no surprise that the art mu-
seum would represent a microcosm for social media integration today: using cell
phones or iPods to record and share opinions or remarks, or offering technology
that augments viewing with digital overlays of community commentary and artwork facts [4].
However, in the push to add social technology to the museum space, and to
work out the technological kinks, what has gone unconsidered is the question of what effect this
interaction might in fact have on the cognitive understanding, emotional response,
even basic assessment, of art itself. It appears to go without saying that art interac-
tion should be a social event, that we should share our experiences with others and
benefit from others' company in our perceptions of art. Yet this assumption may
not necessarily be true. Social interaction, especially within forums such as a mu-
seum that tend to amplify social response, may be uniquely harmful to the person-
al act of enjoying, understanding and aesthetically experiencing works of art. In
fact, social meetings, created by design elements that bring together the opinions and
presence of others, even elements as seemingly mundane as a hallway or bench, may be a
primary reason why individuals do not have fulfilling or rewarding outcomes with
art. This may only be exacerbated by technology that makes connections within
social space even more efficiently.
It is on this topic that this paper hopes to open debate. Through a short considera-
tion of three case studies from our research in museums of art, we explore evi-
dence from art encounters, each with the same basic paintings, viewers and
layout, the only major difference being design elements within one space that
lead to heightened social interaction, in turn causing a very different outcome for
the viewers themselves. Based on our previous work on the psychological underpin-
nings of aesthetic experience, we then provide a frame for considering physical/
psychological evidence of the negative social effect, considering why and how it
occurs, and, by briefly walking through a decidedly "analog" example of social
interaction effect, offering a counterpoint that should be considered as we push to
integrate social encounters or social/knowledge-media design into aesthetic life.
This model treats art viewing as a series of 5 stages, each with significance for un-
derstanding art, and with elements that raise important points for social design:
1.) Pre-expectations: Essentially, the viewing of art (although the same can be
said for other perceptual activity as well), and the key point for this discussion,
begins with the nature and structure of one's self. Before viewers can set foot in a
museum or encounter a cognitive task, they already hold expectations for what they
will see and do. Viewers carry "'fundamental' meanings regarding themselves,
other persons, objects or behaviors" ("Who am I?" "What is art?" "How does art
relate to me?" "How do I relate to (art) society?") which collectively combine to
form one's image of the world or "ideal self". This self-image can be divided into
a hierarchical frame [8], topped by traits that are aspired to and that are integral to
one's identity ("be an art person"), pursued through goals for action, i.e., under-
stand art, and subdivided into schemata: classify, find meaning, follow social
norms. In this way, all action becomes an application of the self, protecting and navi-
gating via this structure and determining what viewers can do or see, leading to
three outcomes, only one of which is generally desirable for art.
2.) Assimilation/cognitive mastery: First, upon engaging stimuli, individuals
classify, understand and form a response, based upon these expectations and in
such a way as to control and reinforce the self. This is, in fact, generally consi-
dered to be the evolutionary goal of human action: successfully navigating the en-
vironment without cause for disruption, processing and controlling smoothly
without danger to the self. In the case of art, however, as we argue extensively in
our previous publications, because this marks a matching of perception to existing
schema or self, this stage (recently called "cognitive mastery" by Leder et al. [5])
is also the result of circularity, requiring that a viewer expand classification or
break off perception rather than modify the self. When viewed in isolation, this
cuts off the possibility of new perception or change in ideas, and becomes a 'facile' act
of assimilation, a blasé outcome without fundamental mark or effect on one's life
(think of the typical viewer briskly walking past a painting, identifying mimetic
signs, noting an artist, reading a label, and moving on to the next work in a room).
To move past this point requires something to bump us out of our preconceived
frame. This occurs through discrepancy in the matching of world to self. Discre-
pancy might arise for numerous reasons: between expectations for perception and
perception itself, between perceived information and its relevance to the self, or between
actions and one's expectations. In any case, when the discrepancy cannot be assimilated or
ignored, it moves one to the second outcome.
3.) Discrepancy, escape or secondary control: Upon discrepancy, the next re-
sponse is to escape, through what Rothbaum et al. [9] call "secondary control".
In order to minimize discrepancy, the individual must alter one of the two percep-
tual elements, either the self or the environment. Typically viewers choose to
discard the latter, through: 1) re-classification, often taking on an accusatory
tone (i.e., art is meaningless, bad or esoteric); 2) physical escape; or finally 3) mental
withdrawal, lowering the importance of the discrepant event (i.e., "it's only art"), in
all cases lowering the perceived demand for interaction, and therefore the impact on the
self.
escape. First, one must precisely not sufficiently understand, must not (at least at
first) master the interaction; one must fail in the attempt at control. Second, one
must undergo overt focus on the self. It is these two elements that are specifi-
cally impacted by social presence, and which call into question many of the
museum approaches above. This might best be considered in the case study below.
Correlation of emotional experience factors to aesthetic experience across the three case studies (one column per case):

discrepancy: confusion, anxiety, tension | confusion, anxiety | confusion
secondary control: need to leave/escape | need to leave/escape | need to leave/escape
meta-cognitive reflection (self-awareness): very aware of myself, changed my mind, examined my motives | self-awareness, aware of others | self-awareness, aware of my body, examined my motives
aesthetic outcome: epiphany, felt like crying, happy, relief or catharsis | epiphany (satori), felt like crying, happy | epiphany, felt like crying, happy, relief or catharsis

Note: adapted from [8]. All denoted terms are significant at <.10.
galleries were, by square area, much bigger spaces; the Kawamura room, at the
time of this study, differed mainly in being a much narrower
room. And when we look at what viewers actually do in this room, it appeared
that as a viewer stood at the entrance to the gallery forming an assessment and
schema to be used in the encounter, as per the model above, they were also mak-
ing one more assessment before engaging with the art, with an even more profound
effect upon what they do.
Essentially, upon entering and pausing to survey the inside, subsequent viewer
action came to be determined, in almost all cases, by the location, and the avoid-
ance, of other individuals. The potential locations of other viewers
were primarily centered on two points: the bench, located in the middle
of the space (Fig. 1), and the far corner where the in-room docent stood. Viewer
interaction first showed an effect from the seated viewer. As our observations
show, upon entering and pausing, if another viewer was sitting on the bench fac-
ing right (positions 6 and 7), 100% of viewers moved to the left. If another
was sitting looking left (2 or 3), viewers moved to the right 91% of the time.
Fig. 1 Gaze spaces and effect on viewer movement in Kawamura Rothko Room, Japan.
(Images created by the author).
In turn, what was determining viewer action in these cases, or again what was being avoided, was not another viewer per se, but their pool of vision, or what might be called their 'gaze space', most often trained upon the particular painting ahead. It was quite common to observe a viewer wait, just outside another's pool of vision trained on the next painting in the waiting viewer's natural progression, until the other had moved their gaze before proceeding.
This initial gaze avoidance was coupled with subsequent interaction with the docent as well. This, again, occurred at two possible points, depending on which direction the viewer had initially moved. If a viewer entered moving left, upon turning the corner from painting 4 to 5 they immediately came face to face with the docent. Likewise, if a viewer moved right, they encountered the gaze of the docent upon turning the corner from paintings 7 and 6 and moving toward 5. Again, observation quite clearly showed that this gaze came to have a profound effect on viewer action. Of 38 (58%) viewers who initially moved left, 29% (n = 11) stopped and turned around when hitting the corner of 3/4. Of 27
406 M. Pelowski
viewers who moved right, 22% (n = 6) turned around at painting 5 in front of the
docent, while 37% (n = 10) turned around and left at the first corner (painting 6/7).
These gaze interactions, then, came to play the key role in setting this space apart. This can also be put in very objective terms. Viewers who entered when another was already sitting on the bench saw, on average, 1½ fewer paintings (of 7) and spent roughly one full minute less in the room itself. That is, comparing these findings to the layout of this room, the time spent inside and the amount of gallery covered align almost exactly with the amount of time and space presumably available before one came upon a point with no choice but to enter another's gaze or to leave (the choice of the majority), an amount of time and number of paintings presumably short of aesthetic response. This also appeared in our questionnaires, with a significant number of viewers, again unlike the other two rooms, noting a specific awareness of others (Table 1).
But what does this have to do with social interaction and cognitive processing of art? There is in fact a good deal of literature we might attach to what is occurring here. This returns specifically to the assessment made by a viewer at the entrance to a gallery and the self-image basis of the model discussed above. While viewers can be said to carry conceptions with them into the gallery, so too do they utilize social schema. According to Rapee and Heimberg [12], "on encountering a social situation," before any processing of the specific tasks it might contain, interaction begins with a classification of the audience and of one's social, in addition to personal, self. "An individual forms a mental representation of his/her external appearance
and behavior as presumably seen by the audience. The individual simultaneously formulates a prediction of the performance standard or norm which he/she expects the audience to utilize in the given situation. The representation of how the audience is expected to view the individual and the appraisal of the audience's presumed situational standards are compared [to one's image of the self] to provide an estimate of the audience's perception of the individual's performance... a determination is made whether the individual [is likely to] perform in a manner which meets the presumed standard," and one creates a classification for social standing with which to engage in the cognitive task.
The gaze, then, is the physical manifestation, and specific test, of this cognitive preparation. Stepping into the gaze space of another is an act of stepping on stage, of directly beginning a social engagement. When we consider the Kawamura room, this model also directly touches on the two issues raised above for aesthetic response. As we said, in order to arrive at aesthetic experience, two points must be moved through: discrepancy and self-awareness. However, it is these two points, according to the literature, that are
(Not)Myspace: Social Interaction as Detriment to Cognitive Processing 407
specifically affected by this type of social awareness (we have placed these into a
cognitive model [7] for viewing art, shown in Fig. 2).
First, in the case of discrepancy, viewers presumably do not find significant deficiency in their social assessment; otherwise they would likely refrain from entering. However, in cases where some discrepancy has arisen within one's cognitive task, the game changes. Discrepancy, where others are present, cannot help but take on a social tone, involving one's expected fit within the social environment and social relation to others in the space. Rapee and Heimberg [12]
note that "perceptions of 'poor' performance would provide powerful input to the mental representation indicating an inept appearance to the audience" [also 9]. And it is specifically this sort of individual who would avoid gazes. Individuals who have low expectations actively seek to avoid social interaction, and in this way seek self-protection: "behaviors aimed at reducing potential for social interaction within a situation… avoiding eye contact, standing on the periphery of a group" [12], or, in this case, on the periphery of art. The same can be said of the latter point, self-awareness. According to [13], when individuals consider themselves to be deficient "they tend to shift attention inward," away from self-awareness, to prevent "embarrassment and humiliation." In fact, perception of a social imbalance between viewer and others actually impedes meta-cognition.
It is this outcome, then, that we specifically find in the Kawamura room. It is important to note that this behavior is not unique to Kawamura or to Japanese viewers [see 8]. We observe the same actions and behavior in the other museums we study, and this social interaction is a commonly considered point of psychological study. However, as a viewer progresses around this specific Japanese space they have no choice but to bump into others, and in turn no choice but to introduce a social interaction into their cognitive process. This social interaction, while potentially minor in a general art museum where there is another path to take, here becomes the driving force for how much time and in what manner viewers engage.
This discussion, then, should raise important issues for in-museum social media design. Returning to the two elements raised above and reconsidered in light of the psychology of social interaction: in order to have aesthetic response, and in turn, according to our data, to find meaningfulness, beauty, or positive emotion with art (essential goals of art museums), one must encounter discrepancy, must become self-aware, and must not run away. However, it is these two elements that are specifically hampered by the introduction of social awareness. This was demonstrated with something as simple as a room too small to avoid bumping into another viewer, with unavoidable points where one must share their space. But is this not the very thing that is amplified by social media design?
Fig. 2 Combined cognitive flow model of aesthetic experience (left side, adapted from [6]) and cognitive processing of viewer social relations (right, adapted from [7]).
Note that viewer cognition involves parallel processing of both the outward cognitive task (e.g., artwork) and social position, the perceived balance between one's self-efficacy and that of 'others' within the environment. Viewers switch allocation of resources between processing the task and monitoring the social situation depending upon their level of comfort within the social task. Upon discrepancy in either external or internal processing, viewers first abort through assimilation or secondary control (outcomes 1 and 2). In situations that cannot be aborted, viewers often switch cognitive resources to environmental monitoring, withdrawing from attempts at cognitive mastery (outcome 3). Only when one cannot escape, due to a strong tie between the perceived task and the self, and one does not perceive a large self-other discrepancy, might an aesthetic outcome occur (4).
As noted by [7], art viewing is of course a social task. By ensuring both that art viewing will always have others present, even if digitally, and by giving information derived from other sources and a conduit into the opinions (the gazes?) of others, whether bench or Twitter feed, we essentially offer up: 1) a task which has a correct answer and in which one should not misunderstand, and 2) a prime for social comparison: will I succeed? Are my opinions as important as others'? While a few may answer in the affirmative, a vast majority, when reminded of their social position, decide to turn and walk away. It is the very appreciation of discrepancy that is required for final aesthetic/cognitive appreciation, yet it is this very initial failure that primes one for susceptibility to the negative impact of social interaction. While this is not to argue that such encounters will always end negatively, designers should be very careful about the potential effect of social awareness and others' information, considered in tandem with cognitive processing, on appraisal and aesthetic interaction, so that we do not create a digital form of the very interaction considered physically above.
This point can be driven home by returning to the Kawamura room. The curators, aware of the negative encounters occurring, redesigned the space, specifically making it larger so as to avoid social contact. In a return study of this space, our findings further support the claims above. Where before 10% (of N = 22) had recorded the facile outcome (without confusion or epiphany), 33% had recorded escape, and only 57% reported the aesthetic experience, we now find an essential reversal of the latter two: 13.6% recorded the first outcome, only 18% recorded escape, and 68% (n = 15, almost exactly the 70% found in the other studies above) recorded the aesthetic experience. Comparison of means between old and new again found one notable change: a reduction, to almost zero, of social or other-awareness.
References
[1] Becker, H.S.: Art worlds. University of California Press, Berkeley (1982)
[2] Harrington, A.: Art and social theory: Sociological arguments in aesthetics. Polity
Press, Cambridge (2004)
[3] Goulding, C.: The museum environment and the visitor experience. European Journal
of Marketing 34(3), 261–278 (2000)
[4] Leder, et al.: A model of aesthetic appreciation and aesthetic judgments. British Jour-
nal of Psychology 95, 489–508 (2004)
[5] Pelowski, M., Akiba, F.: A model of art perception, evaluation and emotion in trans-
formative aesthetic experience. New Ideas in Psychology, 1–18 (2011)
[6] Pelowski, M.: Disruption, change and aesthetic experience. Doctoral Dissertation,
Nagoya University, Japan (2011)
[7] Carver, C.: Cognitive interference and the structure of behavior. In: Sarason, I.G., et
al. (eds.) Cognitive Interference; Theories, Methods, and Findings, pp. 25–46. Erl-
baum, Mahwah (1996)
[8] Rothbaum, et al.: Changing the world and changing the self: A two-process model of
perceived control. Journal of Personality and Social Psychology 42(1), 5–37 (1982)
[9] Torrance, E.P.: The search for satori & creativity. The Creative Education Founda-
tion, New York (1979)
[10] Nodelman, S.: The Rothko chapel paintings: Origins, structure, meaning. University
of Texas Press, Austin (1997)
[11] Rapee, R., Heimberg, G.: A cognitive-behavioral model of anxiety in social phobia.
Behavioral Research Therapy 35(8), 741–756 (1997)
[12] Wells, A., Papageorgiou, C.: Social phobia: Effects of external attention on anxiety,
negative beliefs, and perspective taking. Behavior Therapy 29, 357–370 (1998)
Nuclear Energy Safety Project in Metaverse
Abstract. This project for learning nuclear energy safety was carried out through e-
learning. Problem Based Learning (PBL) was selected as the educational tool and
Metaverse as the class environment. The virtual classroom was built on a virtual
island of Second Life owned by Nagaoka University of Technology. Three students
from two National Technical Colleges in Japan joined the project. A teacher gave
the students a short lecture and proposed the problem. Students understood the con-
tents very well and solved the problem through chat-based discussions in Metaverse.
Students' clear and precise understanding, their high level of activity in discussion, and their strong interest in the safety of nuclear energy were apparent throughout this successful PBL class project. The results indicate very clearly that this kind of PBL class is feasible for actual e-learning in nuclear engineering and engineering education.

Hideyuki Kanematsu

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 411–418.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

412 H. Kanematsu et al.
1 Introduction
On March 11, 2011, a huge earthquake hit the eastern part of Japan. With it came a nuclear disaster, from whose unending aftermath the people of Japan are still suffering. Among these tragic events, the trouble at Fukushima's nuclear plants is the worst nuclear disaster Japan has experienced. We are now doing our best to solve the many difficult problems relating to the disaster in different areas of Japan. From the viewpoint of engineering education, what could we contribute to the restoration and recovery processes? Engineers of today and the future should be optimistic problem solvers at all times. At the same time, they should know, scientifically, what technology can and cannot do, and they have the important mission of transferring this knowledge to people (our society at large). With regard to nuclear energy, engineers should behave in this same way, and on this point engineering education has a chance to play an important role. What is dangerous and what is safe about nuclear energy? It is very important for engineers of the future to answer the question: "To-be-or-not-to-be." We believe that engineering education can contribute to this very much in various ways.
We, the current authors, have been involved with Japanese higher education in the national colleges (Kosen) for many years. Almost 50 years ago, the national college system was established to produce practical engineers who could immediately solve practical problems in industry. The national colleges originally offered five-year programs for junior high school graduates; later, two-year advanced programs leading to graduate courses were added. Now there are over 50 national colleges all over the country. Seven years ago, they were united into one large organization, although each campus retains its own autonomy to some extent. From this viewpoint, the national college system may be seen as a huge higher-education network. Two universities of science and technology, Nagaoka University of Technology and Toyohashi University of Technology, sit at the center of this educational network to pursue scientific and educational collaboration (Fig. 1).
Given this geographical distribution and the emphasis on practical education at the national colleges, e-learning is becoming important. In addition, creative engineering design education is required in every aspect of the colleges' programs. The authors have therefore established a Problem Based Learning (PBL) model for e-learning [1]-[10]. We believe that such an educational project could lead to the curriculum of the near future, which would provide students with distance learning and creative education at the same time.
A huge earthquake and the disasters that followed hit the eastern part of Japan, as previously mentioned. In particular, the troubles at the nuclear plants in Fukushima Prefecture remarkably shook confidence in the safety of nuclear energy. Against this background, it is very important and informative for engineering
Fig. 1 Kosen System in Japan. The left map was cited from the web page of institute of Na-
tional Colleges of Technology, http://www.kosen-k.go.jp/english/map[]_mechanical.html
students to know precisely what is safe and what is dangerous. This topic is most appropriate and timely for the youngsters who will enter the engineering field in the near future. For this project, we proposed an engineering problem relating to the nuclear safety issues in Japan. After listening to a lecture, the students tackled the problem-solving project in Metaverse. The effectiveness of, and problems with, the virtual distance-learning class in Metaverse are discussed.
2 Experimental
Wacom Co.) were prepared to help the students' discussion and understanding. The students could write sentences, sketches, figures, equations, etc. on them; the data was then sent to a web server so that the participants could share it on the web through browsers.
also stressed that the shielding capability would depend on what kind of material was used to protect human beings. He mentioned some kinds of metallic materials and had the students calculate their shielding capability against radiation. At the final stage, the following problem was proposed: What kind of metal could protect against radiation effectively?
Fig. 6 PBL procedure in Metaverse

The radiation decay by shielding materials was explained to the students, as shown in Fig. 5. Students calculated the radiation decay with increasing distance for copper and aluminum. They also calculated the thickness at which the original radiation strength decreased to half its value under a certain condition. All of these problem-solving processes were carried out by the team. In that way, each student could learn the safety and danger of nuclear energy scientifically and quantitatively. Finally, the teacher ended his lecture with the following remark.
"Nuclear energy is absolutely safe where the radiation intensity is decreased completely to the safe level by the shielding material. These results show that, with well-learned knowledge, one could use nuclear energy in complete safety." After the PBL class, questionnaires were given to the students, who answered them off-line immediately. Fig. 6 summarizes the project outline.
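The students' shielding exercise follows the standard exponential attenuation law. A minimal sketch of the calculation, assuming a simple narrow-beam model; the function names and the coefficient in the usage note are illustrative assumptions, as the actual coefficients and conditions used in the class are not given here:

```python
import math

def attenuated_intensity(i0, mu, x):
    # Exponential attenuation law: I = I0 * exp(-mu * x),
    # where mu is the linear attenuation coefficient of the shield.
    return i0 * math.exp(-mu * x)

def half_value_thickness(mu):
    # Thickness at which the original intensity drops to half:
    # solving I0/2 = I0 * exp(-mu * x) gives x = ln(2) / mu.
    return math.log(2) / mu
```

For a hypothetical coefficient mu = 0.5 per cm, `half_value_thickness(0.5)` gives about 1.39 cm, and passing that thickness to `attenuated_intensity` returns half the original intensity, the kind of half-value calculation the students performed for copper and aluminum.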
4 Conclusions
We aimed for the students to learn about nuclear energy safety through PBL as e-learning. For this purpose, we prepared a virtual classroom and other educational facilities on a virtual island and confirmed the effectiveness of the virtual PBL. Students understood the class contents very well through a short lecture and discussion. They discussed very actively and finally learned what safety means for nuclear energy and how dangerous radiation energy can be decreased drastically. This series of educational investigations shows a positive and optimistic outlook for the application of PBL in Metaverse.
References
[1] Kanematsu, H., Fukumura, Y., Ogawa, N., Okuda, A., Taguchi, R., Nagai, H., Barry,
D.M.: Problem Based Learning in Metaverse As a Digitized Synchronous Type
Learning. In: Kim, H.S. (ed.) Proceedings of the ICEE and ICEER (International
Conference on Engineering Education and Research), ICEE & ICEER 2009, Korea,
vol. 1, pp. 330–335. Se Yung Lim, Publishing Committee Chair, Seoul (2009)
[2] Barry, D.M., Kanematsu, H., Fukumura, Y.: Problem Based Learning in Metaverse.
ERIC (Education Resource Information Center) Paper, ED512315 (2010),
http://www.eric.ed.gov/ERICWebPortal/
recordDetail?accno=ED512315 (retrieved)
[3] Kanematsu, H., Fukumura, Y., Barry, D.M., Sohn, S.Y., Taguchi, R.: Multilingual
Discussion in Metaverse among Students from the USA, Korea and Japan. In: Setchi,
R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds.) KES 2010, Part IV. LNCS, vol. 6279,
pp. 200–209. Springer, Heidelberg (2010)
[4] Farjami, S., Taguchi, R., Nakahira, K.T., Nunez Rattia, R., Fukumura, Y., Kanematsu, H.: Multilingual Problem Based Learning in Metaverse. In: König, A., Dengel, A.,
Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds.) KES 2011, Part III. LNCS
(LNAI), vol. 6883, pp. 499–509. Springer, Heidelberg (2011)
[5] Farjami, S., Taguchi, R., Nakahira, K.T., Fukumura, Y., Kanematsu, H.: Problem
Based Learning for Materials Science education in Metaverse. In: Proceedings of
2011 JSEE Annual Conference, pp. 20–23 (2011)
[6] Barry, D., Kanematsu, H., Fukumura, Y., Ogawa, N., Okuda, A., Taguchi, R., Nagai,
H.: Problem Based Learning Experiences in Metaverse and the Differences between
Students in the US and Japan. In: International Session Proceedings of 2009 JSEE
Annual Conference - International Cooperation in Engineering Education, pp. 72–75.
Japan Society of Engineering Education (JSEE), Nagoya (2009)
[7] Nakahira, K., Rodrigo, N.R., Taguchi, R., Kanematsu, H., Fukumura, Y.: Design of a
multilinguistic Problem Based Learning - Learning Environment in the Metaverse,
pp. 298–303. IEEE, Taiwan (2010)
[8] Barry, D.M., Kanematsu, H., Fukumura, Y., Ogawa, N., Okuda, A., Taguchi, R., Na-
gai, H.: International Comparison for Problem Based Learning in Metaverse. In: Kim,
H.S. (ed.) The ICEE and ICEER 2009 Korea (International Conference on Engineer-
ing Education and Research), ICEE & ICEER 2009, Korea, vol. 1, pp. 60–66. Lim,
Se Yung, Publishing Committee Chair, Intercontinent Grand Hotel, Seoul (2009)
[9] Taguchi, R., Nakahira, K., Kanematsu, H., Fukumura, Y.: Construction and evalua-
tion of multilanguage environment which aims smooth PBL in metaverse, p. 35. The
Institute of Electronics, Information and Communication Engineers (IEICE), Nagaoka University of Technology, Nagaoka, Niigata (2010)
[10] Kanematsu, H., Fukumura, Y., Barry, D.M., Sohn, S.Y., Taguchi, R.: Multilingual
Discussion in Metaverse among Students from the USA, Korea and Japan. In: Setchi,
R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds.) KES 2010, Part IV. LNCS, vol. 6279,
pp. 200–209. Springer, Heidelberg (2010)
Online Collaboration Support Tools
for Blended Project-Based Learning
on Embedded Software Development
— Final Report —
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 419–428.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
420 T. Yukawa et al.
tools facilitate collaborative activities between learners and are effective for
enhancing their ability.
1 Introduction
Project-based learning (PBL) is intended to strengthen students’ abilities in
design, teamwork, and communication through experiences developed in solv-
ing practical problems as a team. As an example, PBL has become popular
in engineering education because industry requires new university graduates
to have engineering design abilities.
However, some obstacles prevent PBL from becoming more widespread. Since PBL assumes that a group works together in a classroom, the learning opportunity is limited for students who cannot attend the classroom at the same time. In addition, project-based learners must organize their experiences into systematic knowledge and insight to acquire the desired abilities; otherwise, PBL becomes an ineffective and time-consuming process. Therefore, the authors have conducted a research project applying information and communication technology (ICT) to PBL so that PBL sessions can be held even if the learners are in distant places or attend at different times. The project targets a training course on embedded software development [7].
Since collaboration between learners is a significant factor in PBL, this project focused on both the e-Learning program of the collaborative design process and the tools that support that program. In addition, learners should look back on their activities at the end of a PBL session, a step called a "postmortem". The postmortem is important in PBL so that the learners organize their experiences into systematic knowledge and insight for acquiring the desired abilities [6].
The present paper describes each online collaboration support tool developed in the project and a model learning program for embedded software development training. The position of each tool in PBL is clarified. To evaluate the learning effect, actual learning sessions accompanied by a questionnaire survey and an achievement test were conducted. The paper also reports the evaluation results and discusses the learning effect and usability of the tools.
2 Background
2.1 Related Works
In recent years, a number of training programs on embedded software have
been established. In Niigata prefecture, the Niigata Industrial Creation Or-
ganization (NICO) and associated organizations have conducted training
Fig. 1 Embedded Software Development Process
first requirement, that is, project-based distant learners can frequently review
their designs with each other.
tools have been developed, but these tools can only be used for synchronous (real-time) discussion. In our learning program, cross-review processes should be able to be performed asynchronously. Adding a chat function together with a playback function, which can replay the chat contents sequentially as well as display previous drawings on the whiteboard, enables asynchronous discussion using drawings. The OWB also has a function whereby users can upload an image file and display it as a background image on the whiteboard. A screenshot of the OWB is shown in Figure 4.
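The playback function described above amounts to recording timestamped events and replaying them in chronological order. A minimal sketch of the idea; the class and method names are hypothetical and not the OWB's actual API:

```python
import time

class WhiteboardLog:
    """Append-only log of timestamped events (chat lines and drawing
    strokes) so a reviewer can replay a session asynchronously."""

    def __init__(self):
        self.events = []

    def record(self, kind, payload, t=None):
        # kind: "chat" or "stroke"; payload: message text or stroke points.
        self.events.append({"t": time.time() if t is None else t,
                            "kind": kind, "payload": payload})

    def replay(self):
        # Yield events in chronological order; a client would render chat
        # lines and strokes with their original relative timing.
        return iter(sorted(self.events, key=lambda e: e["t"]))
```

A reviewer joining later simply replays the log, which is what makes the cross review asynchronous rather than tied to a live session.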
Fig. 5 Screenshot of the Postmortem Support Tool
4 A Model Program
Postulating the use of the proposed e-Learning environment, a blended learning program on project-based embedded software development is constructed as shown in Table 1. The table lists the schedule, the learning modes, and the tool usage for each learning unit or activity. The learning modes include synchronous (sync.) or asynchronous (async.), face-to-face (f2f) or distant (dist.), and individual (indiv.), group of learners (group), or all learners (all). The program requires 21 days, whereas the equivalent face-to-face program requires only five days; this is because most learning units and activities are conducted asynchronously, which requires a time allowance. Learners are assumed to be part-time students with workplace responsibilities and therefore have limited time available each day for the learning program.
5.2 Discussion
The question on the effectiveness of the tools received positive responses (very good and good) from 80% of the learners and no negative responses. On the other hand, the question on whether the supporting tools reduced the burden of review received negative responses from 20% of the learners, whereas 40% provided positive responses. This suggests that the usability of the supporting tools should be improved. All trial learners responded positively regarding satisfaction with the provided information and the smoothness of use of the PST. In addition, they reported that the review quality of this program is equivalent to that of the face-to-face program. Although the blended learning program takes longer to complete, the learning quality is not far behind that of the face-to-face program, and knowledge and skills on the software design process are organized better with the supporting tools.
6 Conclusions
A research project implementing e-Learning technology for PBL of embedded software development was described. Based on observation of a real PBL training course, functional requirements for collaboration support were clarified, and support tools including an integrated repository, review support, an online whiteboard, and postmortem support were developed. A model learning program that presumes use of the tools was also presented. Trial sessions of the program were conducted to evaluate both the program itself and the tools. The results of the questionnaire survey suggested that the tools were easy to use for most learners and that the learning effect of the program was equivalent to that of the face-to-face program. In addition, the result of the achievement test suggested that the tools help learners establish organized knowledge and skills in the software development process.
References
1. Alavi, M.: Computer-mediated collaborative learning: An empirical evaluation.
MIS Quarterly: Management Information Systems 18(2), 159–174 (1994)
2. Brandon, D.P., Hollingshead, A.B.: Collaborative learning and computer-
supported groups. Communication Education 48(2), 109–126 (1999)
3. Gillet, D., Nguyen-Ngoc, A.V., Rekik, Y.: Collaborative web-based experimen-
tation in flexible engineering education. IEEE Transactions on Education 48(4),
696–704 (2005)
4. Kojiri, T., Kayama, M., Tamura, Y., Har, K., Itoh, Y.: CSCL and support
technology. JSiSE Journal 23(4), 209–221 (2006) (in Japanese)
5. Myers, E.W.: An O(ND) difference algorithm and its variations. Algorithmica 1,
251–266 (1986)
6. Nishigaki, Y., Yukawa, T.: Proposal of postmortem support tools for use in
project-based learning. In: Proceedings of Society for Information Technology
and Teacher Education International Conference 2011, pp. 593–598 (2011)
7. Yukawa, T., Takahashi, H., Fukumura, Y., Yamazaki, M., Miyazaki, T., Yano, S.,
Takeuchi, A., Miura, H., Hasegawa, N.: Implementing e-learning technology for
project-based learning for the development of embedded software. In: Proceed-
ings of Society for Information Technology and Teacher Education International
Conference 2009, pp. 2208–2212 (2009)
Online News Browsing over Interrelated Target
Events
Abstract. Recently, we can easily acquire a variety of information from many Web sites. However, it is difficult to determine the next page to browse from among many Web pages. We often change the search target to narrow down the result pages found by an existing search engine. Generally, it is not easy for a searcher to change the query before he/she fully knows the content of the pages in the search result. In this paper, we propose a method for specifying query terms that narrow down the pages in the search result effectively. Our method detects the representative terms of the current search target according to the currently browsed page, by calculating the occurrences of individual terms in the currently browsed page per constant time interval. We describe a prototype system based on our method and report our experiment.
1 Introduction
Recently, we can easily acquire various information from the Web with a search engine such as Google1. However, it is difficult to find the next page to browse when we have acquired many pages as the search result [1]. In order to search efficiently, it is necessary to pick out only the pages directly related to the currently browsed page.
The news articles in some online news sites such as Yahoo! Japan News2 contain hyperlinks to other related news articles. The hyperlinks are useful for selecting the next article to browse. When a searcher wants to find out information about a
Yusuke Koyanagi · Toyohide Watanabe
Graduate School of Information Science, Nagoya University,
Furo-cho, Chikusa-ku, Nagoya, Japan
e-mail: koyanagi@watanabe.ss.is.nagoya-u.ac.jp,
watanabe@is.nagoya-u.ac.jp
1 http://www.google.com/
2 http://headlines.yahoo.co.jp/
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 429–438.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
term in the currently browsed article, he/she tries to acquire articles about that
term. However, hyperlinks are not available for every term in the currently browsed
article; that would require each article to be tied by hyperlinks to every related
article, as in Wikipedia. In Wikipedia3, each article has hyperlinks to other articles
whose titles are mentioned in it. However, a Wikipedia article does not contain more
detailed content than a news article.
In Web search, we sometimes refine the search target. For example, while searching
for information about the Aftermath of the 2011 Tohoku earthquake, we may focus on a
more detailed target such as the Fukushima Daiichi nuclear disaster. In this case, in
order to search for information about the new target, we usually input a new query to
the search engine and narrow down the search result. However, when the searcher does
not know much about the current search target, it is difficult for the searcher to
decide on a query that narrows down the result effectively.
We propose a method for presenting effective, detailed targets. Our method detects
the representative term of the current search target according to the currently
browsed page.
In this paper, we assume a search over online news articles in Japanese. In addition,
we focus on changes of the search target caused not by the searcher's whim but by the
browsing of articles. For example, the searcher sometimes changes the search target to
a term described in the browsed article.
2 Related Works
Several studies have investigated methods for detecting the searcher's interest for
Web page recommendation. Sumathi et al. proposed a Web page recommender system based
on the searcher's sequential browsing patterns [2]. He et al. proposed a sequential
query prediction method based on the searcher's past query sequence [3]. Both extract
the searcher's interest from information such as the browsing history or the query
history. When the searcher does not know the search target, however, such history does
not represent the searcher's interest. In addition, methods based on the browsing
history may detect changes of the target inaccurately. For a search with an explicit
search target, our method should present candidates of the search target and let the
searcher select one.
To support the selection of browsed articles on the Web, several studies have proposed
document clustering methods. Kaski proposed WebSOM, which maps similar documents to
neighboring areas in a 2D space [4]. To determine document similarity, it computes the
cosine similarity between document feature vectors. Hu et al. proposed a method that
extracts semantic relations from Wikipedia and then clusters documents with these
relations [5]. They enhanced traditional similarity measures such as cosine
similarity. However, neither study assumes a search with an explicit search target. We
assume that the searcher changes the search target to a term described in the browsed
article. Under this assumption, it is
3 http://www.wikipedia.org/
[Fig. 1: the searcher's refinement activity, involving the search target, the target
articles, and the searcher]
more appropriate to present related articles based on the terms of the article than
based on the similarity of the article's description as a whole.
3 Approach
We assume that the searcher performs two activities: indication of the target and
selection of the browsed article. Figure 1 shows the searcher's activity for
refinement. In indication of the target, the searcher acquires articles about the
target; in this paper, we call the acquired articles target articles. In selection of
the browsed article, the searcher selects the article to browse from the articles
acquired in indication of the target. We consider a method for presenting candidates
of the target from which to select a new target. In this paper, we call the candidate
list of targets the candidate targets.
The important functions in a search with news articles are:
Determination of the Important Degree of Terms in the Browsed Article: We assume
that the searcher decides on a new target by reading terms in the browsed article. To
present candidate targets that the searcher is likely to select, the candidates should
be presented according to the important degree of each term in the browsed article.
Extraction of Representative Terms in the Target Articles: The searcher does not
know the content of the search result, so representative terms of the target articles
should be extracted and presented.
[Figure: candidate targets A, B, C, ... presented to the searcher in order of the
important degree of the browsed terms while an article is browsed]
[Figure: target articles divided into target time periods at regular intervals along
the time axis; flow of the search process]
are updated, and the candidate targets are presented. When the searcher selects a new
target from the candidate targets, the target articles, the important degrees of the
browsed terms, and the candidate targets are updated, and the target articles and
candidate targets are presented again. To acquire the target articles, a Web search
engine or a prepared index of the articles is used.
In Section 4, we explain each process: extraction/update of the browsed terms and
their important degrees, and determination of the representative term in each time
period and of the presentation order.
4 Proposed Method
4.1 Extraction/Update of Browsed Terms
The set of browsed terms BT is represented by the following equation:

    BT = {(t1, v1), (t2, v2), ..., (tn, vn)}                                (1)

In this equation, n is the number of browsed terms, ti is the i-th term, and vi is
the important degree of ti.
Each time the browsed article changes, terms are extracted from the browsed article.
When the searcher is not browsing any article, terms are extracted from all the target
articles. General nouns, proper nouns, sahen-setsuzoku (verbal nouns), and
keiyoudoushi-gokan (adjectival-noun stems) are extracted with MeCab [6], a
morphological analyzer for Japanese. Only the title and the body text of the article
are used for the extraction; the hyperlink text to related articles is not used.
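The POS-based term extraction can be sketched as a filter over morphological-analysis output; the tag names below are illustrative stand-ins for MeCab's Japanese POS labels, not its actual tagset:

```python
# Sketch of the term-extraction step (Sect. 4.1), assuming tokens have already
# been POS-tagged (e.g., by MeCab); the tag names here are illustrative.
EXTRACTED_POS = {
    "general_noun",        # 名詞-一般
    "proper_noun",         # 名詞-固有名詞
    "sahen_setsuzoku",     # 名詞-サ変接続 (verbal noun)
    "keiyoudoushi_gokan",  # 名詞-形容動詞語幹 (adjectival-noun stem)
}

def extract_browsed_terms(tagged_tokens):
    """Keep only surface forms whose POS is one of the four noun subtypes.

    tagged_tokens: list of (surface, pos) pairs from a morphological analyzer.
    """
    return [surface for surface, pos in tagged_tokens if pos in EXTRACTED_POS]

tokens = [("原発", "general_noun"), ("が", "particle"),
          ("再開", "sahen_setsuzoku"), ("した", "verb")]
terms = extract_browsed_terms(tokens)
```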
The important degree of a browsed term is calculated with the following equation:

    vi = ( Freq(ti) / ∑j=1..n Freq(tj) ) · log2( N / DocNum(ti) )           (2)
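Equation (2) is a TF-IDF style weight. A minimal Python sketch, assuming N is the total number of articles in the collection and DocNum(ti) is the number of articles containing ti (a standard IDF reading; the small article collection here is purely illustrative):

```python
import math
from collections import Counter

def important_degrees(doc_terms, all_docs):
    """v_i = (Freq(t_i) / sum_j Freq(t_j)) * log2(N / DocNum(t_i)), Eq. (2).

    doc_terms: terms extracted from the browsed article.
    all_docs:  list of term sets, one per article; defines N and DocNum.
    """
    freq = Counter(doc_terms)
    total = sum(freq.values())
    n_docs = len(all_docs)
    degrees = {}
    for term, f in freq.items():
        # guard against terms absent from the collection (DocNum = 0)
        doc_num = sum(1 for d in all_docs if term in d) or 1
        degrees[term] = (f / total) * math.log2(n_docs / doc_num)
    return degrees

docs = [{"restart", "reactor"}, {"reactor", "inspection"}, {"economy"}]
weights = important_degrees(["restart", "restart", "reactor"], docs)
```

A frequent term appearing in few articles (here "restart") receives a higher weight than a term spread over more articles.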
Here, m is the number of target time periods, TAi is the set of target articles in
TPi, rti is the representative term of TPi, and oi is the important degree of rti.
The representative term of TPi is selected from the browsed terms. To select rti
from the browsed terms, the important degree of the browsed term tj in TPi, written
Valuei,j, is calculated with the following equation:
    Valuei,j = ( DocNum(tj, TAi) / |TAi| ) · log2( m / TPNum(tj) )          (4)

In this equation, DocNum(tj, TAi) is the number of articles in TAi that contain tj,
and TPNum(tj) is the number of target time periods that include an article containing
tj. Valuei,j becomes high when tj is contained in many articles in TAi, and also when
tj occurs in only a few target time periods.
The representative term of TPi is the browsed term whose important degree in TPi is
the highest. After the important degrees of all the browsed terms have been
calculated, the representative term is decided.
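The per-period selection by Eq. (4) can be sketched as follows; the data layout (one term set per article, one article list per target time period) is an assumption made for illustration:

```python
import math

def representative_term(browsed_terms, target_articles_by_period, i):
    """Pick the representative term of target time period TP_i via Eq. (4).

    browsed_terms: candidate terms t_j.
    target_articles_by_period: list of lists of term sets; element i is TA_i.
    """
    m = len(target_articles_by_period)
    ta_i = target_articles_by_period[i]

    def tp_num(term):
        # number of target time periods with an article containing `term`
        return sum(1 for ta in target_articles_by_period
                   if any(term in art for art in ta)) or 1

    def value(term):
        doc_num = sum(1 for art in ta_i if term in art)
        return (doc_num / len(ta_i)) * math.log2(m / tp_num(term))

    return max(browsed_terms, key=value)

periods = [[{"genkai", "restart"}, {"restart"}],  # TP_0
           [{"economy", "restart"}]]              # TP_1
rt0 = representative_term(["genkai", "restart"], periods, 0)
```

A term confined to one period ("genkai") beats a term spread over every period ("restart"), whose IDF-like factor is log2(m/m) = 0.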
When rtj is ti, the important degree oj of rtj is calculated so that oj becomes high
when the important degree of rtj in the browsed article is high, and when rtj is
contained in many articles in TAi.
5 Prototype System
We developed a prototype system based on the method explained in Section 4. The system
consists of three windows: the Target Article Window, the Candidate Target Window, and
the Browsed Article Window. Initially, the Target Article Window is opened.
The Target Article Window presents the target articles. Figure 4 shows the Target
Article Window. When the searcher inputs the target terms and clicks the “Search”
button, the system opens the Candidate Target Window and outputs
[Figs. 4 and 5: the Target Article Window, with listboxes for target time periods and
target articles, and the Candidate Target Window, with a listbox for candidate targets
and a “Select” button]
the target articles and the candidate targets. The candidate targets are output in the
listbox of the Candidate Target Window. The target articles are output in the two
listboxes of the Target Article Window. In the upper listbox, the target time periods
are shown in reverse chronological order, and the most recent target time period is
selected initially. In the lower listbox, the target articles published in the
selected target time period are shown in reverse chronological order. Each time the
searcher selects another target time period from the upper listbox, the lower listbox
is updated. When the searcher selects the title of an article from the lower listbox,
the Browsed Article Window is opened and the article is shown in that window.
The Candidate Target Window presents the candidate targets. Figure 5 shows the
Candidate Target Window. When the searcher selects a candidate target from the listbox
and clicks the “Select” button, the candidate target is added to the target terms, and
the target articles and the candidate targets are updated. In this case, in the upper
listbox, the target time period whose
representative term was the selected candidate target is selected initially. The first
element of the listbox is “(delete).” When the searcher selects “(delete),” the last
added candidate target is deleted from the target terms.
The Browsed Article Window presents the browsed article. The hyperlinks in the
presented article cannot be clicked.
In addition to constructing the prototype system, we collected news articles
periodically, stored them on a local hard disk, and constructed an inverted index over
every noun contained in them. The inverted index is used to acquire the target
articles from the local hard disk.
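The inverted index and the lookup of target articles can be sketched as follows; treating the target terms conjunctively (all terms must appear) is an assumption, since the paper does not specify the query semantics:

```python
from collections import defaultdict

def build_inverted_index(articles):
    """Map each noun to the ids of articles containing it.

    articles: hypothetical list of (article_id, noun_list) pairs.
    """
    index = defaultdict(set)
    for art_id, nouns in articles:
        for noun in nouns:
            index[noun].add(art_id)
    return index

def target_articles(index, target_terms):
    """Articles containing ALL target terms (conjunctive lookup)."""
    ids = [index.get(t, set()) for t in target_terms]
    return set.intersection(*ids) if ids else set()

idx = build_inverted_index([(1, ["genpatsu", "saikai"]), (2, ["genpatsu"])])
hits = target_articles(idx, ["genpatsu", "saikai"])
```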
6 Experiment
In this section, we explain our experiment. We used 13,231 economic news articles
acquired from Yahoo! News, containing 41,263 distinct terms, with publication dates
from Sep. 21, 2011 to Nov. 6, 2011. The interval of the target time period is one day:
articles published on the same day are included in the same time period.
Table 1 shows the candidate targets when the searcher inputs “nuclear power station”
as the target term and browses the article of Nov. 4 whose title is “The No. 4 reactor
of Genkai nuclear power plant recovers to normal operation.” Important terms
representing the content of the browsed article rank high on the list of candidate
targets, such as operation, restart, Kyushu Electric Power Co., Inc., Genkai, and
recovery.
Some listed terms are difficult to understand as a single term, such as regular.
For example, in the browsed article, regular is used with examination. It is difficult
for us to guess the sense of regular examination from
Table 1 Candidate targets when the target term is “nuclear power station” and the
browsed article is “The No. 4 reactor of Genkai nuclear power plant recovers to normal
operation”

1 operation
2 restart
3 Kyushu Electric Power Co., Inc.
4 regular
5 Genkai
6 regular
7 regular
8 recovery
9 restart
10 report
11 Saga
12 report
13-45 consultation, group, citizen, etc.
Fig. 6 Target Article Window immediately after the searcher selects Genkai
only regular. To clarify the sense of such candidate targets, we will consider adding
another term to the candidate target.
In addition, some terms are listed twice or more, such as restart and regular,
because the representative term of two or more target time periods is the same term.
For easier selection of candidate targets, we have to reconsider how to present the
same candidate target.
Figure 6 shows the Target Article Window immediately after the searcher selects
Genkai from Table 1. In the upper listbox, the target time period “Nov. 1, 2011” is
selected initially. “Nov. 1, 2011” is the date on which the restart of the No. 4
reactor of the Genkai nuclear power plant was announced. Therefore, it is adequate
that the representative term of “Nov. 1, 2011” is Genkai, and this helps the searcher
select an article suitable for the target. For even easier selection of suitable
articles, we will consider presenting other important target time periods as well.
7 Conclusion
We proposed a method for presenting effective search targets. Our method detects the
representative term of the current search target according to the currently browsed
article. We constructed a prototype system based on the method and, in the experiment,
presented a case example. To confirm the precision of the method, we plan to conduct
an experiment comparing its output against a correct result set.
As future work, we will consider how to display the candidate targets and target
time periods effectively: for example, how to present the same candidate target
appearing in two or more target time periods, how to present a candidate target
consisting of two or more terms, and how to present the target time periods according
to their important degree.
Acknowledgements. We are thankful to Mr. Y. Tabuchi for his collection of news articles
from Yahoo! Japan News, which we used in our experiment.
References
1. Thüring, M., Haake, J.M., Hannemann, J.: Hypermedia and Cognition: Designing for
Comprehension. Communications of the ACM 38(8), 57–66 (1995)
2. Sumathi, C.P., Valli, R.P., Santhanam, T.: Automatic Recommendation of Web Pages in
Web Usage Mining. Int’l Journal on Computer Science and Engineering 2(9), 3046–3052
(2010)
3. He, Q., Jiang, D., Liao, Z., Hoi, S.C., Chang, K., Lim, E., Li, H.: Web Query Recommen-
dation via Sequential Query Prediction. In: Proc. of the 25th Int’l Conf. on Data Engineer-
ing, pp. 1443–1454 (2009)
4. Kaski, S.: Dimensionality reduction by random mapping: Fast similarity computation for
clustering. In: Proc. of Int’l Joint Conf. on Neural Networks, vol. 1, pp. 413–418 (1998)
5. Hu, J., Fang, L., Cao, Y., Zeng, H.J., Li, H., Yang, Q., Chen, Z.: Enhancing Text
Clustering by Leveraging Wikipedia Semantics. In: Proc. of the 31st Annual Int’l
ACM SIGIR Conf. on Research and Development in Information Retrieval (2008), doi:
10.1145/1390334.1390367
6. Kudo, T., Yamamoto, K., Matsumoto, Y.: Applying Conditional Random Fields to
Japanese Morphological Analysis. In: Proc. of the 2004 Conf. on Empirical Methods in
Natural Language Processing, EMNLP 2004, pp. 230–237 (2004)
Path Planning in Probabilistic Environment
by Bacterial Memetic Algorithm
1 Introduction
The emerging synthesis of information technology, network technology, and robot
technology is one of the most promising approaches to realize a safe, secure, and
comfortable society for the next generation [4]. Intelligent technology plays a key
role in this process. Information technology and intelligent technology have been
discussed from various points of view. Information resources and the accessibil-
ity within an environment are essential for both people and robots. Therefore, the
environment surrounding people and robots should have a structured platform for
gathering, storing, transforming, and providing information. Such an environment is
called informationally structured space [12, 13] (Fig. 1). The intelligent technology
János Botzheim
Department of Automation, Széchenyi István University, 1 Egyetem tér, Győr, 9026,
Hungary, Graduate School of System Design, Tokyo Metropolitan University,
6-6 Asahigaoka, Hino, Tokyo, 191-0065 Japan
e-mail: botzheim@sze.hu
Yuichiro Toda · Naoyuki Kubota
Graduate School of System Design, Tokyo Metropolitan University, 6-6 Asahigaoka,
Hino, Tokyo, 191-0065 Japan
e-mail: toda-yuuichirou@sd.tmu.ac.jp,kubota@tmu.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 439–448.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
440 J. Botzheim, Y. Toda, and N. Kubota
for the design and usage of the informationally structured space should be discussed
from various points of view, such as information gathering in the real environment and
cyber space, and the structuralization, visualization, and display of the gathered
information. The structuralization of the informationally structured space enables
quick update of, and access to, valuable and useful information. It is very useful for
both robots and people to be able to access information on the real environment
easily. The information is transformed into a form suitable to the features of robot
partners and people. Furthermore, if the robot can share the environmental information
with people, communication with people may become very smooth and natural.
but there can also be some movable objects, like objects A and B in the figure. For
example, a chair may be used in two different positions during the day, e.g., in 80%
of the day (i.e., with 80% probability) in one position, and in 20% of the day in
another position, perhaps outside the room. This probabilistic extension of our
previous approach is investigated in this paper.
The chromosome consists of intermediate points between the start and target points.
Each intermediate point (cell) is identified by a single integer that encodes the
point's two coordinates together, as shown in Figure 2. The encoding of the individual
(bacterium) depicted in Figure 2 is [10|36|14]. The start and target points are not
represented in the bacterium.
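The encoding can be sketched with a row-major cell numbering; the paper only states that one integer holds both coordinates, so the exact numbering scheme below is an assumption:

```python
# Row-major cell numbering on a grid with `width` columns (illustrative):
# cell = y * width + x packs both coordinates into one integer.
def encode(x, y, width):
    return y * width + x

def decode(cell, width):
    return cell % width, cell // width

WIDTH = 20  # e.g., the 20 x 20 map of the smaller task
cell = encode(16, 1, WIDTH)
point = decode(cell, WIDTH)
```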
For evaluation, the individual is first expanded: the intermediate points are
connected so that the way from the start point to the target point is covered only by
neighboring cells in 4 possible directions. This is done by drawing “straight” lines
between the points encoded in the individual. How a straight line is approximated by
neighboring cells with 4 possible directions is illustrated in Figure 3. After this
expansion, the path contains only neighboring cells.
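The 4-connected expansion of a “straight” line between two cells can be sketched as follows; the tie-breaking rule (step along the axis whose progress lags behind the ideal line) is one plausible implementation, not necessarily the paper's:

```python
def four_connected_line(p0, p1):
    """Expand the segment p0 -> p1 into a sequence of 4-connected cells.

    At each step we move one cell in x or y, picking the axis whose progress
    fraction lags behind, so the path hugs the ideal straight line.
    """
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    sx = 1 if dx > 0 else -1
    sy = 1 if dy > 0 else -1
    nx, ny = abs(dx), abs(dy)
    path = [(x0, y0)]
    ix = iy = 0
    while ix < nx or iy < ny:
        # compare progress fractions (ix + 0.5)/nx vs (iy + 0.5)/ny
        if iy >= ny or (ix < nx and (ix + 0.5) * ny < (iy + 0.5) * nx):
            ix += 1
        else:
            iy += 1
        path.append((x0 + ix * sx, y0 + iy * sy))
    return path

cells = four_connected_line((0, 0), (2, 1))
```

The expanded path always has |dx| + |dy| + 1 cells, each adjacent to the previous one in exactly one of the 4 directions.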
The individuals are evaluated as:

    Evali = Li + Penalty · ∑j=1..Li CPj + S · Turnsi,                       (1)

where Evali is the i-th individual's evaluation value, Li is the number of neighboring
cells in the extended individual, and CPj is the collision probability of the j-th
cell in the extended individual's sequence. Each cell in the extended individual has a
collision probability, and these probabilities are summed over the whole cell
sequence. The collision probability of a cell is 0 if the cell is free, 1 if the cell
is occupied by a static obstacle, and between 0 and 1 in the case of probabilistic
obstacles. Penalty is a parameter reflecting the strength of the punishment for
crossing occupied cells. Turnsi is the number of the robot's turns, and the parameter
S reflects the smoothness of the path: paths with fewer turns are preferred.
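Equation (1) can be sketched directly; taking L_i as the number of cells in the extended path and counting a turn whenever the movement direction changes are assumptions consistent with, but not spelled out in, the text (lower values are better):

```python
def evaluate(path, collision_prob, penalty, s):
    """Eval_i = L_i + Penalty * sum_j CP_j + S * Turns_i  (Eq. (1)).

    path: extended individual as a list of 4-connected cells.
    collision_prob: dict mapping cell -> probability in [0, 1]
                    (0 free, 1 static obstacle, in between probabilistic).
    """
    length = len(path)
    collisions = sum(collision_prob.get(cell, 0.0) for cell in path)
    turns = 0
    for a, b, c in zip(path, path[1:], path[2:]):
        # a turn occurs when the movement direction changes between steps
        if (b[0] - a[0], b[1] - a[1]) != (c[0] - b[0], c[1] - b[1]):
            turns += 1
    return length + penalty * collisions + s * turns

score = evaluate([(0, 0), (1, 0), (1, 1)], {(1, 0): 0.5}, penalty=10, s=2)
```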
An advantage of the approach is that it can handle individuals of different lengths,
i.e., individuals with different numbers of intermediate points. This property is
observable in the evaluation function: if the length of a bacterium changes, e.g., a
new intermediate point is added or an intermediate point is removed, then the extended
individual (described by only neighboring cells, as seen in Figure 3) changes as well,
so its L and Turns values differ from before, and the sum of the collision
probabilities can differ too. In [19], a memetic algorithm based on the classical
operators, mutation and crossover, is proposed for solving the path planning problem.
In their approach, the length of the individual and the number of intermediate points
are predetermined rather than found automatically by the algorithm.
In the creation of the initial population, bacteria of different lengths can be
created. The length of a bacterium has to be between 1 and the maximum allowed
bacterium length, which is a parameter of the algorithm.
Another advantage of the approach is that it can handle a probabilistic environment,
as presented in this paper. In [2] we dealt with deterministic environments. Classical
path planning techniques based on graph theory and similar approaches rely on an exact
estimation of the cost; in a probabilistic environment, however, this cannot be
achieved.
The insertion operator inserts a new point between the two intermediate points where
the insertion has the biggest benefit. The operation is performed only if the
bacterium is better after the insertion than before.
The deletion operator deletes the point whose removal from the individual has the
biggest benefit. If no such point is found, i.e., the bacterium would be worse after
any removal, the operation is not performed.
The swap operator tries to swap each pair of consecutive points in the individual's
chromosome. The operation is performed when the bacterium is better after the swap.
The local improvement operator tries to improve each point in the chromosome within
a given radius, which is a parameter of the algorithm. The local environment of each
point is investigated, and the best new position for the point is selected, or the old
position is retained in the path if there is no better one.
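All four operators share an accept-the-change-only-if-better pattern. A sketch of the swap operator under this pattern, with the evaluation function passed in (lower values are better; the in-place list representation is an assumption):

```python
def swap_operator(bacterium, evaluate):
    """Try swapping each pair of consecutive intermediate points, keeping a
    swap only if it improves the evaluation value (lower is better).
    Sketch of the accept-if-better pattern shared by the local operators.
    """
    current = evaluate(bacterium)
    for k in range(len(bacterium) - 1):
        bacterium[k], bacterium[k + 1] = bacterium[k + 1], bacterium[k]
        candidate = evaluate(bacterium)
        if candidate < current:
            current = candidate  # keep the improvement
        else:
            # revert the swap: it did not help
            bacterium[k], bacterium[k + 1] = bacterium[k + 1], bacterium[k]
    return bacterium
```

The insertion and deletion operators follow the same shape, trying every position and committing only the most beneficial change.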
3 Simulation Results
Two different environments were used in the simulation tests, as illustrated in
Figure 4. The map size is 20 × 20 for the smaller problem and 30 × 30 for the second
problem. In Figure 4, the blue cell represents the start position and the red cell the
goal position. The white cells represent free positions where the probability of an
obstacle is 0. The dark cells represent static obstacles with collision probability 1.
The gray cells illustrate obstacles with probabilistic appearance; their probabilities
are larger than 0 and less than 1. The probabilities in the first task are P1 = 0.4,
P2 = 0.8, P3 = 0.2, and P4 = 0.6; in the second task they are P1 = 0.8, P2 = 0.3, and
P3 = 0.7.
BMA has many parameters, which allow fine-grained control of the algorithm's behavior.
However, it is not always easy to find the most appropriate parameter setting for a
given problem. After some preliminary tests, the parameter settings presented in
Table 1 appeared to be the most suitable for the two tasks.
With the parameter settings presented in Table 1, we obtained the optimal result
seven times out of ten simulations for the smaller problem, and three times out of ten
simulations for Task 2. Optimality means that there was no collision and that both the
path length and the number of turns were optimal. In the remaining cases the obtained
paths were also collision-free; only the path length and the number of turns were not
optimal. In the case of the first task, the path length was optimal in all
simulations; only the number of turns was not. This situation is illustrated in Fig. 5
for Task 1 and in Fig. 6 for Task 2, where Fig. 5a and Fig. 6a depict the solution
that is optimal in all terms, while Fig. 5b and Fig. 6b show a solution where the path
length is optimal and there is no collision with any obstacle, but the number of turns
is not optimal.
speed to the optimum is quite slow. A memetic algorithm can accelerate the
evolutionary process by local search. The bacterial memetic algorithm effectively
combines the bacterial operators with local search heuristics and can speed up the
evolutionary process in this way. This property was investigated in [2], where BMA
provided better results than BEA and a genetic algorithm.
The algorithm can handle individuals of different lengths, which is useful for the
encoding applied in this paper.
Our future plan is to extend the algorithm to handle several robots simultaneously.
Another goal is a deeper analysis of the effect of the penalty factor, in order to
provide practical solutions in complex probabilistic environments.
References
[1] Botzheim, J., Cabrita, C., Kóczy, L.T., Ruano, A.E.: Fuzzy rule extraction by bacterial
memetic algorithms. In: Proceedings of the 11th World Congress of International Fuzzy
Systems Association, Beijing, China, pp. 1563–1568 (2005)
[2] Botzheim, J., Toda, Y., Kubota, N.: Bacterial memetic algorithm for offline path plan-
ning of mobile robots. Memetic Computing (2012), doi: 10.1007/s12293-012-0076-0
[3] Caponio, A., Cascella, G.L., Neri, F., Salvatore, N., Sumner, M.: A fast adaptive
memetic algorithm for online and offline control design of PMSM drives. IEEE Trans-
actions on Systems, Man, and Cybernetics, Part B: Cybernetics 37, 28–41 (2007)
[4] Cordeiro, C.M., Agrawal, D.P.: Ad Hoc & Sensor Networks – Theory and Applications.
World Scientific Publishing (2006)
[5] Drobics, M., Botzheim, J.: Optimization of fuzzy rule sets using a bacterial evolutionary
algorithm. Mathware and Soft Computing 15(1), 21–40 (2008)
[6] Fischer, T., Bauer, K., Merz, P.: Solving the routing and wavelength assignment problem
with a multilevel distributed memetic algorithm. Memetic Computing 1(2), 101–123
(2009)
[7] Földesi, P., Botzheim, J., Kóczy, L.T.: Eugenic bacterial memetic algorithm for fuzzy
road transport traveling salesman problem. International Journal of Innovative Comput-
ing, Information and Control 7(5(B)), 2775–2798 (2011)
[8] Geisler, T., Manikas, T.: Autonomous robot navigation system using a novel value en-
coded genetic algorithm. In: Proceedings of the IEEE Midwest Symposium on Circuits
and Systems, pp. 45–48 (2002)
[9] Hasan, S.M.K., Sarker, R., Essam, D., Cornforth, D.: Memetic algorithms for solving
job-shop scheduling problems. Memetic Computing 1(1), 69–83 (2009)
[10] Hermanu, A.: Genetic algorithm with modified novel value encoding technique for au-
tonomous robot navigation. Master’s thesis, The University of Tulsa, Tulsa, OK, USA
(2002)
[11] Hosseinzadeh, A., Izadkhah, H.: Evolutionary approach for mobile robot path planning
in complex environment. International Journal of Computer Science Issues 7(4), 1–9
(2010)
1 Introduction
With the spread of data published on the Web, such as podcasts, speech content can
easily be listened to while driving a car or walking around town. News speech is one
such kind of content, and it is used as a means of obtaining information efficiently.
Currently, news speech is created by manually reading out articles and recording them.
To increase the variety and scale of news speech, it is desirable to create news
speech data automatically.
Generally, speech reading out a text can be created using only speech synthesis
software. In recent years, progress in speech synthesis technology has been
remarkable, and acoustically natural speech can be generated automatically. However,
there exist differences in vocabulary and expression between written language and
Shigeki Matsubara · Yukiko Hayashi
Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku,
Nagoya, 464-8603, Japan
e-mail: matubara@nagoya-u.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 449–457.
springerlink.com c Springer-Verlag Berlin Heidelberg 2012
spoken language, and therefore converting a text into speech as it is would induce
linguistically unnatural speech. Since the naturalness of speech affects how easy it
is to listen to, it is also important to generate linguistically natural speech.
This paper describes conversion from written language to spoken language, aiming at
automatically generating spontaneous news speech from newspaper articles. In this
research, we noted that taigen-dome1 occurs frequently in newspaper articles and that
speech generated by reading out a sentence including taigen-dome sounds unnatural.
In this paper, a technique for complementing the omitted expressions of taigen-dome
sentences is presented. The technique is implemented by a statistical approach using
linguistic information in taigen-dome sentences, including the type of the
sentence-final noun, dependency relations, tense, etc. A complementation experiment
using newspaper articles achieved an accuracy of 79.9%, confirming the effectiveness
of the technique.
This paper is organized as follows: Section 2 describes a speech news delivery
system. Section 3 compares written language and spoken language. Section 4 explains a
method of paraphrasing predicates in Japanese newspaper articles. Section 5 reports
experimental results.
In addition to the type of the complementary word, we have to decide the proper
tense of the word. To determine the tense, we use particular auxiliary verbs and
phrases which indicate the past form. A sentence which includes any of the words
listed below is considered to be in the past tense:
• The auxiliary verb “ta,” which indicates the past form.
• Phrases which indicate the past, such as “sakunen (last year)” and “san-nen-mae
(three years ago).”
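The tense decision above can be sketched as a simple lexical check; the POS tag name and the phrase list below are illustrative placeholders, not the paper's full lexicon:

```python
# Sketch of the tense decision: a sentence is treated as past tense if it
# contains the past auxiliary "ta" or a past-time phrase (illustrative list).
PAST_PHRASES = {"sakunen", "san-nen-mae"}  # "last year", "three years ago"

def is_past_tense(tokens):
    """tokens: list of (surface, pos) pairs from a morphological analyzer."""
    for surface, pos in tokens:
        if pos == "auxiliary_verb" and surface == "ta":
            return True
        if surface in PAST_PHRASES:
            return True
    return False

past = is_past_tense([("sakunen", "noun"), ("kettei", "noun")])
```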
There exist taigen-dome sentences which need no complement. For example, the
sentence
• Kanada-no eisei-toshi-gun-no hitotsu ricchimondo-hiru. (Richmond Hill, one of the
Canadian satellite cities.)
needs no complement. If it were complemented with a word like “desu,” it would sound
considerably artificial. Therefore, if the subtype of the last noun in a sentence is
not “sahen” and the sentence does not include a word which can become a subject, the
sentence is considered to need no complement.
5 Evaluation
The articles of January 3rd, 1995 in the Mainichi newspaper text corpus were used as
test data. The articles consist of 687 sentences including 714 places to be converted;
of these, 164 are taigen-dome sentences. MeCab [5], CaboCha [3], and CBAP [6] were
used for morphological analysis, dependency analysis, and clause boundary analysis,
respectively. We set up the following four comparative techniques:
1. Simple complement based on the fine classification of nouns
2. Simple complement based on the fine classification of nouns, tense, and adnominal
phrases
3. Statistical complement using only the last nouns
4. Our method
As learning data, we used 715,429 sentences from the articles of January 4th to
December 31st, 1995 in the Mainichi newspaper text corpus.
Table 3 Causes of the faulty complements

Cause                          Number
Tense                          14
Necessity of complement        9
Type of complementary word     7
Voice                          3
Table 2 shows the experimental results for the 164 taigen-dome sentences. The results
show that our method is effective for the complementation of taigen-dome sentences.
Table 3 shows the causes of the faulty complements produced by our method.
6 Conclusion
This paper has proposed a method for converting newspaper articles into news speech,
for generating spontaneous Japanese speech in a text-to-speech synthesis system. We
focused on the complementation of taigen-dome sentences, and in the complementation
experiment we confirmed the effectiveness of our method.
Acknowledgements. This research was supported in part by a Grant-in-Aid for
Challenging Exploratory Research (No. 21650028) from JSPS.
References
1. Kaji, N., Okamoto, M., Kurohashi, S.: Paraphrasing Predicates from Written Language to
Spoken Language using the Web. In: Proceedings of the Human Language Technology
Conference, pp. 241–248 (2004)
2. Hayashi, Y., Matsubara, S.: Sentence-style conversion of Japanese news article for text-to-
speech application. In: Proceedings of 7th International Symposium on Natural Language
Processing, pp. 257–262 (2007)
Personalization of News Speech Delivery Service 457
Abstract. E-mail systems are common communication tools, and it is desirable that
e-mail messages are written readably for recipients. One technique for writing readable
e-mail messages is to insert linefeeds and blank lines appropriately. However,
linefeeds and blank lines in incoming e-mails are not always inserted at positions
that recipients find readable. If linefeeds and blank lines are inserted automatically
at proper positions, the readability of e-mail texts is improved and recipients can
read them efficiently. This paper proposes a method for text formatting of e-mail
texts by inserting linefeeds and blank lines into incoming e-mails at positions
that recipients find readable.
1 Introduction
E-mail systems are common communication tools, and their users spend a great
deal of time reading e-mail messages. It is desirable that e-mail messages are
written readably so that recipients can read them effectively. One technique for
writing readable e-mail messages is to insert linefeeds and blank lines appropriately
[1, 2, 3]. However, linefeeds and blank lines in incoming e-mails are not
always inserted at positions that recipients find readable.
This paper proposes a method for formatting texts in e-mail messages by inserting
linefeeds and blank lines into incoming e-mails at positions that recipients find
readable.

Masaki Murata · Shigeki Matsubara
Graduate School of Information Science, Nagoya University,
Furo-cho, Chikusa-ku, 464-8603, Japan
e-mail: murata@el.itc.nagoya-u.ac.jp

Tomohiro Ohno
Information Technology Center, Nagoya University,
Furo-cho, Chikusa-ku, 464-8601, Japan
e-mail: ohno@nagoya-u.jp, matubara@nagoya-u.jp

T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 459–468.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
460 M. Murata, T. Ohno, and S. Matsubara

In our method, we assume that users insert linefeeds and blank lines at
positions that they find readable when they write e-mail messages. Therefore,
our method realizes linefeed and blank line insertion with a statistical approach
that uses e-mails written and sent by the recipient as learning data.

Positions where linefeeds and blank lines are inserted in sent e-mails show
different tendencies among individuals. We organized the factors associated with
the positions where linefeeds and blank lines should be inserted, and decided the
features based on them. Our method realizes personalized text formatting of e-mails
by selecting, from this feature set, the features which are useful for inserting
linefeeds and blank lines into one's incoming e-mails.
Semantic Boundaries
Linefeeds are inserted at semantic boundaries when writers want each line of an
e-mail text to consist of a semantic unit. This makes it possible to avoid separating
a word or a bunsetsu1 by a linefeed, and leads to efficient reading of e-mail texts.
The authors have analyzed linefeed insertions which make texts more readable
[4]. Clause boundaries and dependency relations serve as information on semantic
boundaries.
Line Length
Linefeeds are often inserted so that the length of a line does not exceed a certain
number of characters; this number relates to the width setting of one's e-mail
environment. When people read e-mail texts whose lines are longer than that width,
they sometimes find them unreadable because they have to scroll. Therefore, writers
compose e-mail texts so that line lengths do not exceed the width of their e-mail
environment. Because the width setting of the e-mail environment differs among
individuals, the maximum number of characters per line also shows different
tendencies from person to person. Writers also tend to insert linefeeds so that the
balance of line lengths is maintained: it has been pointed out that e-mail texts become
more readable when linefeeds keep the length of each line at roughly the same level [1].
Topic Boundaries
One of blank line’s roles is to divide e-mail texts into topics by inserting blank line
at topic boundaries. If e-mail texts are divided into topics, recipients can read e-mail
texts efficiently. Information on topic boundaries is, for example, an interrogative
sentence or a conjunction at the beginning of a sentence.
[Figure: Overview of linefeed insertion. A user's sent e-mails serve as learning and
development data for feature selection (Section 3.3). An example input text, a
Japanese e-mail confirming the participation unit, the number of presenters, and a
13:00 start on 7/10, is shown before and after linefeed insertion; linefeeds are
placed at clause boundaries such as 時間帯の件ですが、 (With regard to hours,) and
そちらの研究室のご都合さえよろしければ (if that's OK with your laboratory,).]
sequences as $L_j = b^j_1 \cdots b^j_{n_j}$ ($1 \le j \le m$); then $o^j_k = 0$ if
$1 \le k < n_j$, and $o^j_k = 1$ if $k = n_j$.
When an input sentence B is provided, our method identifies the linefeed insertion
O that maximizes the conditional probability P(O|B). Assuming that whether or not
to insert a linefeed right after a bunsetsu is independent of other linefeeds except
the one appearing immediately before that bunsetsu, P(O|B) can be calculated as
follows:
Personalized Text Formatting for E-mail Messages 463
$$P(O|B) = P(o^1_1 = 0, \cdots, o^1_{n_1-1} = 0, o^1_{n_1} = 1, \cdots, o^m_1 = 0, \cdots, o^m_{n_m-1} = 0, o^m_{n_m} = 1 \mid B)$$
$$= P(o^1_1 = 0 \mid B) \times \cdots \times P(o^m_{n_m-1} = 0 \mid o^m_{n_m-2} = 0, \cdots, o^m_1 = 0, o^{m-1}_{n_{m-1}} = 1, B)$$
$$\times P(o^m_{n_m} = 1 \mid o^m_{n_m-1} = 0, \cdots, o^m_1 = 0, o^{m-1}_{n_{m-1}} = 1, B) \qquad (1)$$

Then, our method identifies the linefeed insertion $\hat{O}$ that maximizes $P(O|B)$
among the candidates $O_1, \cdots, O_n$:
$$\hat{O} = \mathop{\arg\max}_{O \in \{O_1, \cdots, O_n\}} P(O|B)$$
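Under the independence assumption, the maximization can be sketched by exhaustive search over linefeed patterns. This is a toy sketch, not the authors' code: in the paper the per-bunsetsu probability comes from a statistical model trained over the features listed below, while here `p_insert` is an arbitrary stand-in supplied by the caller.

```python
from itertools import product

def best_linefeed_insertion(bunsetsus, p_insert):
    """Enumerate all linefeed patterns O over a sentence B = b_1 ... b_n
    (o_k = 1 means a linefeed right after bunsetsu b_k; o_n is fixed to 1)
    and return the pattern maximizing P(O|B) under the assumption that
    each o_k depends only on B and the previous linefeed position.

    p_insert(k, line_start, bunsetsus) -> probability that o_k = 1, where
    line_start is the index of the first bunsetsu of the current line.
    """
    n = len(bunsetsus)
    best, best_p = None, -1.0
    for head in product([0, 1], repeat=n - 1):
        O = list(head) + [1]              # the last bunsetsu ends a line
        p, line_start = 1.0, 0
        for k, o in enumerate(O):
            q = p_insert(k, line_start, bunsetsus)
            p *= q if o == 1 else (1.0 - q)
            if o == 1:
                line_start = k + 1
        if p > best_p:
            best, best_p = O, p
    return best, best_p

# Toy stand-in model: prefer a linefeed once a line holds two bunsetsus.
toy = lambda k, start, bs: 0.9 if k - start >= 1 else 0.1
```

With this toy model, `best_linefeed_insertion(["a", "b", "c", "d"], toy)` selects the pattern `[0, 1, 0, 1]`, i.e., lines of two bunsetsus each.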
Linefeed insertion

Morphological information
  1. the rightmost independent morpheme (i.e., the head word; its part-of-speech and
     inflected form) and the rightmost morpheme (part-of-speech) of a bunsetsu $b^j_k$

Clause boundary information
  2. whether or not a clause boundary exists right after $b^j_k$
  3. the type of the clause boundary right after $b^j_k$, if one exists

Dependency information
  4. whether or not $b^j_k$ depends on the next bunsetsu
  5. whether or not $b^j_k$ depends on the final bunsetsu of a clause
  6. whether or not $b^j_k$ depends on a bunsetsu to which the number of characters
     from the start of the line is less than or equal to the maximum number of characters
  7. whether or not $b^j_k$ is depended on by the final bunsetsu of an adnominal clause
  8. whether or not $b^j_k$ is depended on by the bunsetsu located right before it
  9. whether or not the dependency structure of the sequence of bunsetsus between
     $b^j_k$ and $b^j_1$, the first bunsetsu of the line, is closed
  10. whether or not there exists a bunsetsu which depends on the modified bunsetsu of
      $b^j_k$, among the bunsetsus which are located after $b^j_k$ and to which the
      number of characters from the start of the line is less than or equal to the
      maximum number of characters

Line length
  11. proportion of the number of characters from the start of the line to $b^j_k$
      to the maximum number of characters

Balance of line length
  12. proportion of the difference between the number of characters from the start of
      the line to $b^j_k$ and the average line length, to the average line length

Leftmost morpheme of a bunsetsu
  13. whether or not the basic form or part-of-speech of the leftmost morpheme of the
      bunsetsu next to $b^j_k$ is one of the following (basic form: 思う (think),
      問題 (problem), する (do), なる (become), 必要 (necessary); part-of-speech:
      noun-non-independent-general, noun-nai-adjective-stem, noun-non-independent-adverbial)

Comma
  14. whether or not a comma exists right after $b^j_k$

Blank line insertion

Number of lines in paragraph
  15. number of lines from the start of the paragraph
  16. number of lines of $s^g_{h+1}$

Keyword
  17. whether or not the first morpheme of $s^g_{h+1}$ is a conjunction
  18. the first morpheme (surface form) of $s^g_{h+1}$, if its part-of-speech is conjunction
  19. whether or not $s^g_h$ is an interrogative sentence
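Of the features above, the surface-level ones (11, 12, and 14) can be computed directly from character counts; a minimal sketch follows. Features 1–10 additionally require morphological, clause boundary, and dependency information and are omitted; the function and dictionary keys are our own naming.

```python
def surface_features(bunsetsus, k, line_start, max_chars, avg_len):
    """Features 11, 12, and 14 for bunsetsu b_k (0-based index), where
    line_start is the index of the first bunsetsu of the current line."""
    chars = sum(len(b) for b in bunsetsus[line_start:k + 1])
    return {
        # 11: characters from the line start through b_k, relative to the
        # maximum number of characters per line.
        "length_ratio": chars / max_chars,
        # 12: difference between the current length and the average line
        # length, relative to the average line length.
        "balance": (chars - avg_len) / avg_len,
        # 14: whether a comma appears right after b_k.
        "comma": bunsetsus[k].endswith(("、", ",")),
    }
```

For example, for bunsetsus `["ab", "cd", "e,"]` with `k = 2`, `line_start = 0`, a 10-character maximum, and an average line length of 5, this yields a length ratio of 0.6, a balance of 0.2, and a comma flag of True.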
The blank line insertion method identifies the most appropriate combination among all
combinations of positions where a blank line can be inserted, by using the probabilistic
model.

In this paper, an input text consisting of q sentences is represented by
$S = s_1 \cdots s_q$, and the result of blank line insertion by $T = t_1 \cdots t_q$.
Here, $t_i$ is 1 if a blank line is inserted right after sentence $s_i$, and 0
otherwise. Also, $t_q = 1$. We denote the g-th sequence of sentences created by
dividing an input text into r sequences as $S_g = s^g_1 \cdots s^g_{q_g}$
($1 \le g \le r$); then $t^g_h = 0$ if $1 \le h < q_g$, and $t^g_h = 1$ if $h = q_g$.
When an input text S is provided, our method identifies the blank line insertion
T that maximizes the conditional probability P(T|S). Assuming that whether or not
to insert a blank line right after a sentence is independent of the other blank lines
except the one appearing immediately before that sentence, we calculate P(T|S) by
the same method as described in Section 3.1.1.
4 Preliminary Experiment
To evaluate the effectiveness of our method, it is necessary to conduct an experiment
using the e-mail data of many users. As a preliminary experiment, we used the e-mail
data of one of the authors.
[Figure 3: Example of formatted output. The input text, a Japanese e-mail explaining
that the Mainichi newspaper data set contains both morning and evening paper articles,
which is why similar sentences appeared in the provided data, is shown together with
the text into which linefeeds and blank lines were inserted, e.g.,
毎日新聞データ集を確認したところ、 (When we checked the Mainichi newspaper data,) /
データ集には朝刊と夕刊の記事が含まれていました。 (the data included articles of the
morning paper and the evening paper.)]
Our method was able to output results which completely correspond to the sent
e-mails of the author, as shown in Figure 3.
5 Conclusion
This paper proposed a method for text formatting of e-mails based on linefeed and
blank line insertion. In our method, e-mails written and sent by a recipient are
used as learning data. Our method realizes linefeed and blank line insertion which
fits the writing tendency of that recipient, by using features useful for formatting
the recipient's e-mail texts, selected from the common feature set. An experiment
using the e-mail data of one of the authors showed an F-measure of 57.28 for linefeed
insertion and 76.74 for blank line insertion. We also conducted a subjective
evaluation and analyzed the causes by which our method worsened readability.
In the experiment, we used the e-mail data of one of the authors. In the future, we
will conduct an experiment using the e-mail data of multiple users and evaluate
whether or not our method inserts linefeeds and blank lines at positions that each
recipient finds readable. Moreover, on the experimental e-mail data of one of the
authors, blank lines were inserted at every sentence boundary, so our blank line
insertion method did not work well. It is necessary to improve the blank line
insertion method in future work.
References
1. Fujita, E.: Mail bunsyoryoku no kihon (Fundamentals of a writing ability of e-mail). Nip-
pon Jitsugyo Publishing Co., Ltd. (2010) (in Japanese)
2. Ueda, M., Hosoda, S.: Tyosoku master E-mail, Rirekisyo Entry sheet Seikou Jitureisyu
(Ultra-quick Mastery: Successful Examples of E-mail, Resumes, and Entry Sheets).
Takahashi Shoten Co., Ltd. (2009) (in Japanese)
3. Ando, S.: E-mail handbook. Kyoritsu Shuppan Co., Ltd. (1998) (in Japanese)
4. Ohno, T., Murata, M., Matsubara, S.: Linefeed Insertion into Japanese Spoken Monologue
for Captioning. In: Proceedings of Joint Conference of the 47th Annual Meeting of the
Association for Computational Linguistics and the 4th International Joint Conference on
Natural Language Processing of the Asian Federation of Natural Language Processing,
pp. 531–539 (2009)
5. Kudo, T., Yamamoto, K., Matsumoto, Y.: Applying Conditional Random Fields to
Japanese Morphological Analysis. In: Proceedings of the 2004 Conference on Empirical
Methods in Natural Language Processing, pp. 230–237 (2004)
6. Kudo, T., Matsumoto, Y.: Japanese dependency analysis using cascaded chunking. In:
Proceedings of 6th Conference on Computational Natural Language Learning, pp. 63–69
(2002)
7. Kashioka, H., Maruyama, T.: Segmentation of semantic units in Japanese monologues. In:
Proceedings of International Conference on Speech Language Technology and Oriental,
COCOSDA, pp. 87–92 (2004)
8. Le, Z.: Maximum entropy modeling toolkit for python and c++ (2008),
http://homepages.inf.ed.ac.uk/s0450736/maxenttoolkit.html
(online; accessed March 1, 2008)
Presentation Story Estimation from Slides
for Detecting Inappropriate Slide Structure
Abstract. In many situations, we present our ideas using presentation tools such
as PowerPoint in Microsoft Office. However, the story created by an author is
sometimes inappropriate, and listeners cannot understand the topics that the author
wants to emphasize by watching the generated slides. Our research aims at
constructing a system which automatically detects differences between the author's
intention and the slide structure, and points out the inappropriateness of the
generated slides. The author's intention for each slide is captured by assigning
each slide to an element of a topic template. On the other hand, the topics that
listeners may understand from the generated slides are estimated based on the
lexical information in the slides and are organized as a topic tree. Relations
between generated slides are detected from the change of their focusing nodes in the
topic tree. By comparing the relations grasped from the topic template and the topic
tree, the system is able to point out inappropriate slides automatically.
1 Introduction
In many situations, we present our ideas using presentation tools such as PowerPoint
in Microsoft Office. Presentation files consist of a sequence of presentation
slides (slides) in which descriptions of topics are organized sequentially. The author
of the slides (author) considers the presentation story (story) based on the topics
that he/she wants to emphasize (intention). Then, he/she generates contents, such as
texts, diagrams, or tables, that explain the topics, by considering the relations
between them. However, the story created by an author is sometimes inappropriate,
and listeners cannot understand the topics that the author wants to emphasize by
watching the generated slides. In many cases, it is difficult for authors to detect
the inappropriateness of their slides by themselves. Thus, pointing out the
differences between the author's intention and listeners' understanding is valuable.
There are several research efforts that support authors in generating logical
presentation slides [1, 2]. Maeda et al. constructed a collaborative learning environment for
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 469–478.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
470 T. Kojiri and F. Yamazoe
discussing slide story generation skill [3]. This research provides a story amendment
tool which assists participants in re-organizing a presented slide and generating a
slide amendment. It also proposes a discussion support tool which detects meaningful
modifications in participants' amendments. In this environment, whether participants
can effectively discuss story generation skill or not depends on the characteristics
of the participants, so participants are sometimes not able to acquire the knowledge
for slide generation. On the other hand, Kamewada et al. constructed a tool which
captures listeners' eye movements during the presentation [4]. By being notified of
differences between the presented slide and listeners' eye directions, authors can
notice the inappropriateness of their story. However, in order to modify the
generated slides using this system, authors need to give the presentation in front
of listeners, so the system cannot be used when there are no listeners.
On the other hand, Hanaue et al. constructed a system which generates slides
automatically [5]. In this system, the author has to input topics and their relations,
and slides are generated automatically based on the input topics. Using this system,
slides that satisfy the author's intention can be generated. However, the author's
skill for generating slides is not developed, so the author needs to use the system
every time he/she creates slides.
Our research aims at constructing a system which automatically detects the author's
intention and the topic flow in the generated slides. It then extracts conflicts
between the generated slides and the author's intention and points out the
inappropriateness of the generated slides. By considering the reasons for the
detected inappropriateness, authors can develop the skill for generating logical
slides.
There are typical stories for each presentation genre, such as a research
presentation at an academic conference or a presentation of commercial goods to
customers. Elements in such typical stories have sequential and inclusive relations,
and how these relations are expressed in the slides is important. Authors tend to
create their slides by following the typical stories. Therefore, the roles of the
generated slides and their relations in the author's intention can be grasped by
matching them to the elements in the typical stories.
For listeners, relations among slides are inferred from the descriptions in the
slides. For instance, if one slide explains a topic which is already described in
another slide, these slides contain the same words, and the slide which explains the
topic has several explanation sentences below it. In order to estimate listeners'
understanding of the slides, our system automatically analyzes the descriptions in
the slides and detects the relations of topics. Then, by comparing them with the
author's intention, which is acquired from the typical stories, conflicts between
the author's intention and the generated slides can be detected.
2 Approach
Figure 1 shows the overall framework of our system. The objective of our system
is to detect conflicts between the author's intention and the constructed slides.
As Tufte [6] described, relations among slides are one of the difficult points to
represent in creating slides. This research focuses on the relations between slides,
such as the sequential relation, the inclusive relation, or no relation, as the
author's intention.
The target of the presentation in our research is the research presentation in the
computer science field for the academic conference. In such a presentation genre, a
typical story exists, in which relations among elements of the typical story are
defined statically. If the author's slides are corresponded to the elements in the
typical story, the author's intentions for each slide can be grasped. Thus, in our
system, a topic template is introduced which corresponds to the typical story and by
which elements and relations among elements are defined. When using the system, the
author firstly assigns each slide to an element in the topic template.

On the other hand, the listener's understanding of a slide is estimated from the
descriptions in the constructed slides. New words appear when a new topic is
proposed, while a complementary topic holds words that are already used in other
topics. Currently, we regard each sentence in the slides as an individual topic.
Based on the appearance of and relations between words in sentences, relations among
topics are inferred and a topic tree is constructed. The topic tree represents the
relations between topics. Topics with sequential relations are situated as sibling
nodes, and topics with inclusive relations are located as child nodes.

Relations of slides can be grasped by the topics in the topic tree included in each
slide. For instance, if one slide holds topics of upper nodes and the other slide
contains those of lower nodes, the latter slide may provide supplementary explanation
to the former slide. Thus, by comparing the topics included in each slide, relations
of slides are estimated. Then, if the relations are different from those acquired
from the topic template, the differences are pointed out to the author.
3 Topic Template
The topic template represents typical stories for a specific presentation genre. A
topic template consists of nodes and links. Nodes represent topics, and links
represent relations between topics. Currently, two types of links are prepared: the
sequential relation and the inclusive relation. Such relations are defined statically
for each element in the typical story.

This research focuses on the research presentation in computer science at the
academic conference. Figure 2 is an example of the topic template for a research
presentation. Usually, such a research presentation starts from the background, and
the objective follows the background. Then, the constructed system is introduced,
and the effectiveness of the system is discussed based on the result of an
experiment. In addition to this basic story, several other topics are added for the
purpose of explaining the story in detail. The approach provides a global viewpoint
for achieving the objective, so it is included by the objective. Also, the method
embodies the approach by showing algorithms, equations, or models, so it is included
by the approach. In the same way, experiment and discussion, and discussion and
conclusion, have inclusive relations, respectively.

The author's slides are manually allocated to the nodes in the topic template.
Based on this allocation, the author's intention among slides is grasped according
to the relations between nodes in the topic template.
4 Topic Tree
The topic tree represents topics and their relations as they may be grasped by
listeners from the slides. In slides, each sentence forms a topic, so relations
among sentences should be detected.

Within one slide, relations among topics can be observed from their layout
information. If the itemize level of one topic is lower than that of another, the
former topic may explain the latter, so the former topic is included by the latter
topic. When topics are at the same itemize level, their relations are grasped from
their physical relation; that is, if one topic is written above another, the latter
topic has a sequential relation from the former topic. Relations are also defined
between slides: a sequential relation exists between slides whose order is next to
each other, so a slide has a sequential relation to its next slide. Based on these
viewpoints, the topic tree is constructed from the slides.
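The within-slide layout rules can be sketched with a simple stack-based construction. This is a minimal sketch, not the authors' implementation: the node representation is ours, and only indentation (not the between-slide sequential relations) is handled.

```python
def build_topic_tree(items):
    """Build a topic tree for one slide from (indent_level, sentence) pairs,
    following the layout rules above: a more deeply indented topic becomes a
    child of the preceding shallower topic, and topics at the same level
    become siblings in reading (sequential) order. Each node is a dict with
    "text" and "children"; a virtual root gathers the slide's top topics.
    """
    root = {"text": "<slide>", "children": []}
    stack = [(-1, root)]  # (indent level, node), root sits below any level
    for level, text in items:
        node = {"text": text, "children": []}
        # Pop back to the nearest ancestor with a shallower indent.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root

# Two topics at the same level become siblings; a deeper one becomes a child.
tree = build_topic_tree([(0, "topic A"), (0, "topic B"), (1, "detail of B")])
```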
Figure 3(a) is an example of slides and Figure 3(b) shows its topic tree. Since the
objective slide is located next to the background slide, a sequential relation is
attached from the node of background to that of objective. Since the background slide
consists of two topics of the same itemize level, they become child nodes of
background and sequential relations are attached between them. On the other hand,
the objective slide has two topics of different itemize levels, so the first topic
becomes a child node of objective, and the second topic is generated as the child
node of the first topic.
On the other hand, a topic (explaining topic) sometimes explains a topic in a
different slide (explained topic). In such a case, the topic tree should be
re-organized so as to make the explaining node become a sub-tree of the explained
node. The following are the steps for re-organizing the topic tree.

Step 1: Detection of keywords that characterize a topic
Step 2: Detection of nodes of the same topic
Step 3: Re-organization of the topic tree according to the nodes of the same topic

In step 1, in order to detect the nodes of the same topic, words that may
characterize the topics are extracted. Keywords are words that are often used in the
topic but are not used in other topics. In order to detect such words, a method
whose idea is similar to the tf-idf method [7] is proposed. Figure 4 shows the
equations for calculating the importance of such keywords.
In Figure 3(a), two slides have the word “slide” in common. The importance of the
node which includes “slide” in slide 2 is 2/7. If the threshold for the importance
is set to 1/7, the sub-tree moves, and the topic tree of Figure 5 is transformed
into that of Figure 6.
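Step 3, moving an explaining sub-tree under its explained node, can be sketched as below. Which nodes count as "the same topic" is decided by the keyword-importance computation of Figure 4, which is not reproduced in the text, so this sketch takes the two nodes as given; the node representation is ours.

```python
def move_subtree(root, explaining, explained):
    """Detach the sub-tree rooted at the `explaining` node and reattach it
    under the `explained` node that covers the same topic. Nodes are dicts
    with "text" and "children". The decision of *whether* to move (the
    importance threshold of Figure 4) is left to the caller.
    """
    def detach(node):
        # Depth-first search for the parent of `explaining`, removing the
        # child link once found.
        for child in node["children"]:
            if child is explaining:
                node["children"].remove(child)
                return True
            if detach(child):
                return True
        return False

    if detach(root):
        explained["children"].append(explaining)
    return root
```

For example, if a node explaining "slide" in slide 2 is detached and reattached under the node of slide 1 that introduces "slide", the explaining node's whole sub-tree moves with it.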
5 Detection of Relations among Slides from Topic Tree

Relations between two slides are determined using the topic tree. Usually, a slide
consists of several nodes in the topic tree, and the average location of these nodes
is regarded as the focus of the slide. The change of focus between slides may express
relations between slides globally. In order to represent the focus, two axes are
introduced: the proceeding degree and the depth. The focus of a slide is then
expressed as a pair of these two values.

The proceeding degree indicates the status from the first node of the topics in the
slide. Figure 7 shows the equation for determining the proceeding degree, which is
derived by calculating the ratio of included nodes for each sub-tree of the first
layer in the slide. Let us assume that a slide includes the colored nodes in
Figure 8. The proceeding degree for the slide is calculated as
1*4/6 + 2*2/3 + 3*0/5 = 4/3.

The depth corresponds to the detail of the explanation for each topic. Figure 9
shows the equation for determining the depth. In the equation, Layer_i corresponds
to the level of a layer, and the depth is determined by the number of included nodes
for each layer. In the example of Figure 8, the depth is calculated as
(1*2 + 2*3 + 3*1)/7 = 11/7.

The relations between slides can be grasped by the direction of the vector whose
starting point corresponds to the focus of the former slide and whose ending point
corresponds to that of the latter slide. If two slides have a sequential relation,
the proceeding degree becomes larger and the depth may not change. On the other
hand, if two slides have an inclusive relation, the depth becomes larger while the
proceeding degree may not change. So, if the angle a of the vector from the
horizontal axis (see Figure 10) is bigger than a threshold, the slides are regarded
as having an inclusive relation. On the contrary, if the angle a is smaller than the
threshold, the slides may have a sequential relation.

Determined relations are compared with those defined by the topic template. If
there are conflicts between the relations, the system points out that the slide
description is inappropriate.
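The angle test can be sketched as follows. The focus representation (proceeding degree, depth) follows the text, while the 45° default threshold is our assumption, since the paper does not give a concrete value.

```python
from math import atan2, degrees

def classify_relation(focus_prev, focus_next, threshold_deg=45.0):
    """Classify the relation between two consecutive slides from their foci.
    A focus is a (proceeding_degree, depth) pair; the vector from the former
    slide's focus to the latter's is compared against the horizontal axis.
    A large angle (depth grows) suggests an inclusive relation; a small one
    (proceeding degree grows) suggests a sequential relation.
    """
    dx = focus_next[0] - focus_prev[0]   # change in proceeding degree
    dy = focus_next[1] - focus_prev[1]   # change in depth
    angle = abs(degrees(atan2(dy, dx)))
    return "inclusive" if angle > threshold_deg else "sequential"

print(classify_relation((1.0, 1.0), (2.0, 1.1)))  # mostly forward: sequential
print(classify_relation((1.0, 1.0), (1.1, 2.0)))  # mostly deeper: inclusive
```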
6 Conclusion
In this paper, a mechanism which automatically detects differences between the
author's intention and the created presentation slides was introduced. Currently,
we are developing the system using C# and MeCab [7] as the morphological analyzer.
As soon as the system is implemented, our mechanism needs to be evaluated through
experiments.

So far, we have not discussed the kind of messages to give to authors when
differences are detected. If a message does not explain the reason for a difference,
authors may not be able to modify the slides. However, telling the reason directly
prevents authors from considering the reasons for the inappropriateness by
themselves. Therefore, a mechanism for generating messages that promote
meta-learning of slide generation skill should be developed in our future work.
References
1. Okamoto, R., Kashihara, A.: Back-Review Support Method for Presentation Rehearsal
Support System. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J.,
Jain, L.C. (eds.) KES 2011, Part II. LNCS (LNAI), vol. 6882, pp. 165–175. Springer,
Heidelberg (2011)
2. Kashihara, A., Saito, K., Hasegawa, S.: A Cognitive Apprenticeship Framework for
Developing Presentation Skill. IEICE Technical Report 111(141), 23–28 (2011) (in
Japanese)
3. Maeda, K., Hayashi, Y., Kojiri, T., Watanabe, T.: Skill-up Support for Slide
Composition through Discussion. In: König, A., Dengel, A., Hinkelmann, K., Kise, K.,
Howlett, R.J., Jain, L.C. (eds.) KES 2011, Part III. LNCS (LNAI), vol. 6883,
pp. 637–646. Springer, Heidelberg (2011)
Dana M. Barry
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 479–488.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
480 D.M. Barry et al.
independently on a different project. The US team designed and built a car of the
future, while the Japan team focused on designing a safe way of using nuclear
energy. A discussion of the US team's successful project is provided.
1 Introduction
PBL provides students with challenging, ill-structured problems that relate to their
daily lives [1]. The students receive guidance from their teachers and work
cooperatively in groups to seek solutions to the problems. For this project, student
teams from the US and Japan worked independently on different problems in
Second Life (SL), an online three-dimensional community. All of the activities
took place on a virtual island owned by Nagaoka University of Technology, Japan.
Students met in virtual classrooms that resembled those seen in real life, with
items such as tables and chairs. This arrangement gave the team members a sense
of reality. Also their teachers presented the PBL material on a big screen in the
virtual classroom by using Power Point slides.
Students’ conversations and discussions were based on the chat function in SL.
Participants typed text messages in the input box at the bottom of the screen to ex-
change ideas and thoughts with each other. All of the dialogue was recorded as
files written by the Linden Script Language. Then it was sent to a web server,
and finally saved as CSV files. This information was accessed (by the teacher and
students) for analysis by using various web browsers. The students were also to
use touch tablets with special pens and Bamboo software (Wacom Company) to
exchange ideas with each other by drawing sketches. A touch tablet was connected
to each student's personal computer so that their information and drawings
could be sent to a common server. This way a whiteboard (set for the specific
website) could display individuals’ sketches (as they were being prepared) in the
virtual classroom. Unfortunately, because of technical problems, this activity
did not work well. Therefore, the US team just prepared a simple sketch of their
solar car and displayed it in the virtual classroom for discussion purposes.
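The logging pipeline described here (in-world chat captured by a script, sent to a web server, and saved as CSV) could be sketched on the server side roughly as follows. This is purely an illustrative sketch, not the authors' implementation; all names are assumptions.

```python
import csv
import io
from datetime import datetime, timezone

def log_chat_message(stream, avatar, message, when=None):
    """Append one chat line, timestamped in UTC, as a CSV row."""
    when = when or datetime.now(timezone.utc)
    csv.writer(stream).writerow([when.isoformat(), avatar, message])

# Example: write to an in-memory buffer (a real server would append to a file
# that teachers and students could later open for analysis in a browser).
buf = io.StringIO()
log_chat_message(buf, "Fountainer14", "I can make the body a long green prim")
log_chat_message(buf, "Swimmywimmy15", "I will take the wheels and solar panels")
```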
The US team presented the final solution to their PBL project by constructing
the eco car of the future using prims (formally called primitives). Their avatars
prepared prims (three-dimensional objects written in the Linden Script) in a space
referred to as a sandbox, located in front of the virtual classroom. A detailed
description of the US team’s activity is provided.
The authors have previous experience of successfully carrying out problem based
learning activities in the virtual world [2-4]. Since PBL has been very useful in
real classrooms, they wondered if it would be an effective teaching tool in
Metaverse. To find out, they tried a Pilot Study in 2009 for a PBL project in the
virtual world. This project was carried out in Second Life by student teams from
the US and Japan in virtual classrooms owned by Nagaoka University of Technology
(NUT). Each team worked independently on the same PBL project. They received
guidance from their teachers. The student teams were asked to solve the following
problem: what will the typical house look like in the near future, during the
global warming era? The participants from both countries enjoyed and successfully
completed this project. They communicated well using the chat function and built
their houses of the future with prims. The US house was a dome-shaped structure with
Problem Based Learning for US and Japan Students in a Virtual Environment 481
solar panels on the roof and a floor made of synthetic wood to preserve the Earth’s
trees. Japan’s team built an energy efficient dome-shaped house with a ceiling that
automatically opened so that cool breezes could flow through it. The results of the
Pilot Study clearly indicated that this kind of PBL class was possible for actual e-
learning in engineering education. Therefore, the researchers decided to pursue
their studies with two new PBL projects (The Virtual Eco Car Project by the US
team and The Nuclear Energy Safety Project in Metaverse by the Japan team).
The US instructor used a Power Point presentation to present the car problem and
to provide general information about three types of cars (a solar car, a nuclear car,
and a fuel cell car). This team was to select one of the three cars to be their best eco
car of the future. See Figure 3.
Each team began their project (in the virtual classroom) by brainstorming and
holding discussions about possible solutions to the problem they were asked to
solve. The US team needed to select one type of car to design and build as their
eco car of the future. In order to make a decision they compared the three cars in
terms of energy efficiency, ecological
friendliness, and safety.
The US team emphasized safety and felt overall that the best car would be a
solar car. They made some important points about the cars. They said the solar
car is an electric car powered by an available source of energy (the Sun).
Therefore our natural resources will not be depleted. They also said that the
solar car was the safest and that the technology for making it was already
available. In regards to the other two cars, the team agreed that the nuclear
and fuel cell cars were energy efficient but had safety issues. They were
concerned about radiation and the storage of wastes for a nuclear car and about
the flammability of hydrogen for the fuel cell car.

Fig. 3 The US instructor gives a Power Point presentation about the project to her team.
The next part of this project involved the designing of their solar car. The
team was asked to prepare a simple sketch of their car design by using special
tablets and Bamboo software. This required some practice on their part.
At the same time, the students practiced making prims which would be used to
make their eco car of the future. The students traveled to various locations in
SL (such as Natoma and the Ivory Tower) to obtain information about making
prims. Figure 4 shows the US students making prims.

Fig. 4 US student team members practice making prims.
During another meeting in the virtual classroom, the US team’s simple car design
was displayed on the whiteboard for discussion purposes. See Figure 5. The
students decided to slightly modify the design by adding more solar panels. They
agreed to build a car that resembled this sketch. It would be green and have
wheels, solar panels, and a small passenger section in the front. (The battery for
storing energy would not be visible.)

Fig. 5 The US instructor examines her team’s solar car design sketch.

Finally the students decided (as a result of chat discussions) how they would
build the car as a team. Avatar Fountainer14 built the car’s body by making a
long, green rectangular solid prim. Avatar Alilovleylights built the small
passenger section for the front of the car by starting with a green hemisphere
prim. Avatar Swimmywimmy15 made the wheels and took the lead in making the solar
panels. See Figure 6.
The students completed this project by placing solar panels (the flat blue items)
on top of their car. See Figure 7.
3 Project Results
The student teams in the US and Japan successfully completed their projects. The
US team designed and built virtual cars of the future. To obtain more information
about these Problem Based Learning (PBL) activities, each student was asked to
complete two questionnaires. Questionnaire (Part 1) and the US results for this
questionnaire are provided.
[Fig. 8: Questionnaire (Part 1) and the US results]
For question 10, we received the following responses. Student number 1: “It
was fun and enjoyable. We would like more time individually to play with and
create the car.” Student number 2: “It was fun. We would like more time on
the car and more prims available.” Student number 3: “We would like more
time to individually build the car.”
The results show that the US team members enjoyed this activity and would be
interested in participating in another similar project. Overall they appeared
comfortable performing functions (such as walking, making prims, etc.) in SL. The
students said it was easy drawing sketches and making prims. Two of them
thought that the avatar’s movement was the most enjoyable, while one liked prim
making the best. Participants expressed a need for more time to complete the
project. They all felt that discussion was the hardest task, even though they had
good brainstorming sessions in the virtual classroom. It may have been difficult
(using the chat function) because they had to think and type fast and be careful
about spelling errors. Also they had to be aware of other chat messages and read
them quickly before they disappeared. These results suggest that the students may
be more relaxed speaking than writing. The voice chat in SL is a good option for
them. Overall this was an exciting and successful project for all involved.
4 Conclusions
The Virtual Car Project had several goals for the students. The participants were
to learn about various car types, and to design and build a car of the future through
problem based learning (PBL) in Second Life (SL). Therefore, virtual classrooms
were built on a virtual island to determine the effectiveness of PBL in SL. For this
project, the students understood the lecture material presented in the virtual
classroom. Using the chat function, they actively discussed the possible car
types with regard to energy efficiency, ecological friendliness, and safety. As a team they
placed much emphasis on safety and decided to design and build a solar car for
their virtual car of the future.
The US team’s PBL project seemed to apply well to the e-learning environment
offered in Second Life. The participants enjoyed the activity, which was a great
exercise in engineering design. They successfully carried out the PBL project in
SL by using the chat function for discussions and by using prims for making their
PBL product: the eco car of the future. For this project, SL provided some benefits
to PBL. The virtual world appeared to be a relaxed and comfortable setting for
discussions and decision making activities. The virtual classroom was bright and
cheerful, and offered a private gathering place where the students could focus on
their PBL activity. Also decision making in the virtual world was quicker and
easier than in the real world. To design and build a car in the real world would
require lots of time and money, along with major decisions like what materials to
use for the car, where to buy these materials, where to build the car, etc. In
addition, the virtual world allowed for creative decision making because it has
fewer restrictions (for example, in terms of time, money, and space). Students
were free to expand
their thoughts and to consider more options (for possible solutions) to the problem
they were solving. Overall it can be said that this project confirmed the
effectiveness of PBL in SL.
References
1. http://en.wikipedia.org/wiki/Problem-based_learning
2. Barry, D.M., Kanematsu, H., Fukumura, Y.: Problem Based Learning in Metaverse (ED
512315), Education Resources Information Center, U.S (2010)
3. Barry, D.M., Kanematsu, H., Fukumura, Y., Ogawa, N., Okuda, A., Taguchi, R., Nagai,
H.: International Comparison for Problem Based Learning in Metaverse. In: Proceed-
ings of the ICEE and ICEER 2009 Korea (International Conference on Engineering
Education and Research), Seoul, Korea, pp. 59–65 (2009)
4. Kanematsu, H., Fukumura, Y., Barry, D.M., Sohn, S.Y., Taguchi, R., Farjami, S.: Vir-
tual Classroom Environment for MultiLingual Problem Based Learning with US, Ko-
rean, and Japanese Students. In: Proceedings of 2010 JSEE Annual Conference,
Tohoku, Japan (August 2010)
5. Kanematsu, H., Fukumura, Y., Ogawa, N., Okuda, A., Taguchi, R., Nagai, H.: Practice
and Evaluation of Problem Based Learning in Metaverse. In: ED-MEDIA 2009 (World
Conference on Educational Multimedia, Hypermedia & Tele-communications), pp.
2862–2870. Association for the Advancement of Computing in Education, Honolulu
(2009)
Proposal of a Numerical Calculation
Exercise System for SPI2 Test
Based on Academic Ability Diagnosis
Abstract. This paper describes a concept for a calculation exercise system that
enables students studying for the SPI2 Test to practice numerical calculations
repeatedly. We aim to develop a system that generates questions dynamically for
each student in order to ease the teacher's burden of preparing many original
questions. In such a system, in order to raise the learning effect, it is
important to measure a student's academic ability exactly and to distribute
questions according to that ability. In this paper, we propose a method to
estimate students' understanding based on an academic ability diagnostic test
using item response theory, and a method to control which questions to
distribute using the hierarchical structure among the questions in a study unit.
1 Introduction
In recent years, in many universities or colleges, the students who lack fundamental
academic ability of Mathematics are increasing in number, and it interferes the
advance of some lectures premised on Mathematics understanding. In addition, since
many companies give Mathematics test as a part of their employment examination
nowadays, it is difficult for students with low academic ability of Mathematics to pass
it. The College which Tsumori belongs to has been giving a class of remedial
Mathematics to the freshman students. However, since the difference of the academic
ability between students is large, it is difficult to raise learning effectiveness with the
same curriculum. In such a case, the individualized learning which has been adapted
for the understanding is more effective than the simultaneous learning. Therefore, it
490 S. Tsumori and K. Nishino
seems that an e-learning system can function effectively. Since many students
have an insufficient understanding of the mathematics learned at elementary and
junior high school, it is important for them to solve many fundamental numerical
calculations repeatedly. However, since they generally do not like to study
"Mathematics," it is difficult for them to maintain the motivation to study the
existing "Mathematics" contents as is. In contrast, their will to study for an
employment examination is comparatively high. Therefore, it can be expected that
teaching materials made for employment examinations will raise their motivation
for learning.
Then, we aim to improve students' academic ability in mathematics using the
mathematical questions of the SPI2 Test, which is widely adopted as an
employment examination. The range of questions in the SPI2 Test is very wide: it
includes questions that require academic ability from the upper grades of
elementary school to the lower grades of high school. Therefore, in order to
study efficiently, it is important to measure the student's academic ability and
to have him/her practice repeatedly with many questions in the study units that
he/she does not understand well.
Many exercise systems characterized by distributing questions according to
academic ability have been developed over the years. One typical method is to
choose a question adapted to the student's academic ability from a database that
stores many questions [1][2]. This method is very effective if the difficulty
levels of all questions in the database have already been set using a method
such as Item Response Theory (IRT). However, since it is necessary for a teacher
to make many questions in advance, it places a big burden on him/her. Another
method is to generate questions automatically [3]-[10]. In this case, the
problem is how to evaluate the validity of the generated questions.
Therefore, we propose a calculation exercise system that carries out two steps:
academic ability diagnosis and question exercise using numerical calculation. In
the first step, the student's understanding of every study unit is measured from
the result of a diagnostic test. Next, the student repeatedly solves questions
generated automatically by our system. This paper explains the concept of a
system with the above features.
2 What Is SPI2?
SPI (Synthetic Personality Inventory) is a name of the aptitude test developed by
(present)Recruit Co., Ltd. in 1974 based on MNPI (Minnesota Multiphasic
Personality Inventory) which was developed by University of Minnesota. The version
of SPI used now is SPI2 developed in 2005. SPI2 Test is used widely for the
employment examination of a company. As of March, 2011, 8,610 companies
adopted SPI2 and about 1,230,000 people have taken it. There are two systems for
SPI2 Test. These are Paper-Based Test which uses an answer sheet and
Computer-Based Test (CBT). It is said that IRT is used for CBT in order to change the
question's difficulty level according to the right-or-wrong situation of an answer.
SPI2 consists of an achievement test and a personality test. Although the
contents and the range of questions of the achievement test are not disclosed,
it includes two fields. One is the language field, which asks about linguistic
academic ability; the other is the non-language field, which asks about
mathematical academic ability (the object of this paper). Many of the questions
in the non-language field are simple questions that can be answered correctly by
a student with fundamental mathematical ability. However, according to the
results of a trial SPI2 test, even questions at this level were difficult for
junior college students. Fig. 1 shows the histogram of raw scores for the SPI2
trial test, which consists of 16 questions (Max: 8, Average: 4).

[Fig. 1: histogram of raw scores on the SPI2 trial test (number of people vs. score, 0-16)]
Responses of seven students to six questions (1: correct, 0: incorrect), with
person logits Lc and ability estimates θ:

Student ID   Q1 Q2 Q3 Q4 Q5 Q6      Lc       θ
1             1  0  0  0  1  0   -0.695  -0.940
2             1  1  1  0  1  0    0.695   0.940
3             0  0  1  0  0  0   -1.607  -2.174
4             1  0  0  0  1  0   -0.695  -0.940
5             1  0  0  0  0  1   -0.695  -0.940
6             1  1  1  0  0  1    0.695   0.940
7             1  1  1  1  0  1    1.607   2.174

Li    -1.791   0.286  -0.286   1.791   0.286   0.286
IIC   -1.886   0.191  -0.381   1.696   0.191   0.191
bj    -2.525   0.256  -0.510   2.271   0.256   0.256

Unbiased estimates of variance: U = 1.338 (item logits), V = 1.252 (person logits).
Expansion factors: X = sqrt((1 + U/2.89) / (1 - UV/8.35)) = 1.353 and
Y = sqrt((1 + V/2.89) / (1 - UV/8.35)) = 1.339, so that θ = X · Lc and bj = Y · IIC.
From these values, it can be said that the parameter estimation using the PROX
method is almost appropriate.
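The PROX computation can be reproduced with a short script. The formulas below are our reconstruction from the printed values (Lc, Li, U, V, X, Y); the paper does not state them explicitly, so treat this as a sketch. Small differences from the printed table come from rounding of intermediate values.

```python
import math
from statistics import variance

# Rows: 7 students, columns: 6 questions (1: correct, 0: incorrect).
responses = [
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 1],
]
n_students, n_questions = len(responses), len(responses[0])

# Person logits Lc = ln(correct / incorrect) for each student
Lc = [math.log(sum(r) / (n_questions - sum(r))) for r in responses]
# Item logits Li = ln(incorrect / correct) for each question
s = [sum(r[j] for r in responses) for j in range(n_questions)]
Li = [math.log((n_students - sj) / sj) for sj in s]
# Item logits centered on their mean (the IIC row)
mean_Li = sum(Li) / n_questions
IIC = [l - mean_Li for l in Li]

# Unbiased variances and PROX expansion factors
U = variance(Li)   # item logit variance   (paper: 1.338)
V = variance(Lc)   # person logit variance (paper: 1.252)
X = math.sqrt((1 + U / 2.89) / (1 - U * V / 8.35))  # person factor, 1.353
Y = math.sqrt((1 + V / 2.89) / (1 - U * V / 8.35))  # item factor, 1.339

theta = [X * l for l in Lc]   # ability estimates
b = [Y * c for c in IIC]      # difficulty estimates bj
```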
In this research, the initial question level of each study unit is determined as
follows, using the correct answer probability P of Table 2.
- P < 0.3: question of a level lower than the diagnostic test
- 0.3 ≤ P < 0.7: question of a level equivalent to the diagnostic test
- 0.7 ≤ P: question of a level higher than the diagnostic test
The two criterion values 0.3 and 0.7 are tentative. They may be changed
appropriately after a number of experiments with students.
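As a minimal sketch (the function name and threshold parameters are our own), the rule above is a two-threshold mapping:

```python
def initial_level(p, low=0.3, high=0.7):
    """Map a correct-answer probability p from the diagnostic test
    to an initial question level."""
    if p < low:
        return "lower"        # question below the diagnostic-test level
    if p < high:
        return "equivalent"   # question at the diagnostic-test level
    return "higher"           # question above the diagnostic-test level
```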
Table 2 Correct answer probability of each student for each question.

Question ID:              1       2       3       4       5       6
bj:                  -2.525   0.256  -0.510   2.271   0.256   0.256
Student 1 (θ = -0.940)  0.83    0.23    0.39    0.04    0.23    0.23
Student 2 (θ = 0.940)   0.97    0.66    0.81    0.21    0.66    0.66
Student 3 (θ = -2.174)  0.59    0.08    0.16    0.01    0.08    0.08
Student 4 (θ = -0.940)  0.83    0.23    0.39    0.04    0.23    0.23
Student 5 (θ = -0.940)  0.83    0.23    0.39    0.04    0.23    0.23
Student 6 (θ = 0.940)   0.97    0.66    0.81    0.21    0.66    0.66
Student 7 (θ = 2.174)   0.99    0.87    0.94    0.48    0.87    0.87
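These probabilities are consistent with the one-parameter logistic (Rasch) model applied to the estimated θ and bj. The formula is not printed in the paper, so the following is our reconstruction:

```python
import math

def p_correct(theta, b):
    # Probability that a student with ability theta answers a question
    # of difficulty b correctly, under the one-parameter logistic model.
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

For example, student 1 (θ = -0.940) on question 1 (bj = -2.525) gives about 0.83, matching the table.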
Table 3 Example of a question template.

Attribute          Value
Question ID        7
Question sentence  What percentage of salt water solution will be made if (p1) g
                   of water is added to (p2) g of (p3)% salt water solution?
Level              Middle
Parameter set      (50, 100, 3, 2), (60, 100, 4, 2.5), (50, 200, 5, 4), …
Parent ID          11
Child ID           3, 4, 6
A question is generated using a question template registered in the
question-template database.
An example of a question template is shown in Table 3. The contents of each
attribute of a template are as follows.
- Question ID: the ID of the question template.
- Question sentence: the question sentence given to a student. (px)
  (x = 1, 2, 3, …) is a parameter whose value is determined at the time of
  generating a question.
- Question level: the difficulty level of the question, expressed in three
  levels (High, Middle, or Low).
- Parameter set: a set of the numerical values given to the question sentence
  together with the numerical value of the correct answer. A question template
  has several parameter sets, and one set is chosen at the time of generating a
  question. For example, when (50, 100, 3, 2) is chosen, p1 = 50, p2 = 100, and
  p3 = 3 are set in the question sentence, and the correct answer is 2.
- Parent ID: the ID of the question whose solution contains the solution of
  this question.
- Child ID: the IDs of the questions used as partial solutions of this question.
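The template mechanism can be sketched as follows, using the salt-water template of Table 3. The dictionary layout and function names are our assumptions, not the authors' implementation:

```python
import random

template = {
    "question_id": 7,
    "sentence": ("What percentage of salt water solution will be made if {p1} g "
                 "of water is added to {p2} g of {p3}% salt water solution?"),
    "level": "Middle",
    "parameter_sets": [(50, 100, 3, 2), (60, 100, 4, 2.5), (50, 200, 5, 4)],
    "parent_id": 11,
    "child_ids": [3, 4, 6],
}

def generate_question(tmpl, rng=random):
    # Choose one parameter set; the last value is the correct answer.
    p1, p2, p3, answer = rng.choice(tmpl["parameter_sets"])
    return tmpl["sentence"].format(p1=p1, p2=p2, p3=p3), answer
```

Each stored answer is consistent with the dilution formula 100 · (p3/100 · p2) / (p1 + p2): for instance, (50, 100, 3, 2) gives 3 g of salt in 150 g of solution, i.e. 2%.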
Fig. 2 shows three questions (a), (b), (c) and their solutions. As Fig. 2 shows,
the solution of question (b) includes the solution of question (c) plus another
calculation. Similarly, the solution of question (a) contains the solution of
question (b). Therefore, the order of difficulty of the three questions is (a),
(b), (c), and the layered structure of solutions shown in Fig. 2 is defined.
That is, the following relations are assumed.
- A student who answers a certain question correctly can also correctly answer
  all questions at the levels below that question.
- A student who answers a certain question incorrectly cannot correctly answer
  any question at the levels above that question.
Then, all the questions in the study unit are defined hierarchically and a
question-template database like Fig. 3 is created. Each node in Fig. 3 is a
question, and the number in a node is the question ID explained in Table 3. All
links between nodes are expressed by setting the parent IDs and child IDs of
Table 3.
[Fig. 3: the question-template database; in each study unit, questions are
arranged hierarchically into High (11, 12), Middle (7, 8, 9, 10), and Low
(1, 2, 3, 4, 5) levels]
All the nodes in Fig. 3 are divided into three levels (High, Middle, and Low).
Although no clear rule for dividing the levels is specified, one viewpoint is to
use the number of formulas required to solve the question.
[Figure: flowchart of the exercise; generate a question from a template whose
understanding status is unset and give it to the student; if the answer is
correct, set the status of both the question and its children to "learned"]
A question template is chosen from the level to which the student belongs (for
example, "Middle"). A question is generated by setting parameters in it. If a
student correctly answers all the questions of the level to which he/she
belongs, he/she moves up to the higher level. On the other hand, when a student
gives an incorrect answer to a question, the learning status of the child
templates of that question template is returned to "not learned" (the "learned"
flag is unset), the student's level is lowered by one, and an easier question is
set. A student's understanding is expressed by the overlay student model,
defined as pairs of a question and its understanding status ("learned" or "not
learned").
7 Conclusion
We explained the outline of the calculation exercise system for the numerical
calculation set in SPI2 Test. A student performs the question exercise based on the
result of the diagnostic test for every study unit using this method. Therefore, the
student can avoid spending learning time on the study unit which he already
understands thoroughly, and instead, learn the study unit which he/she becomes
Acknowledgement. This work was supported by a Grant-in-Aid for Scientific Research (C)
(23501191).
References
1. Huang, S.X.: A Content-Balanced Adaptive Testing Algorithm for Computer-Based
Training Systems. In: Lesgold, A.M., Frasson, C., Gauthier, G. (eds.) ITS 1996. LNCS,
vol. 1086, pp. 306–314. Springer, Heidelberg (1996)
2. Suganuma, A., Mine, T., Shoudai, T.: Automatic Generating Appropriate Exercises
Based on Dynamic Evaluating both Students’ and Questions’ Levels. In: Proceedings of
World Conference on Educational Multimedia, Hypermedia and Telecommunications,
pp. 1898–1903 (2002)
3. Hoshino, A., Nakagawa, H.: A real-time multiple-choice question generation for
language testing - a preliminary study. In: Proceedings of the 2nd Workshop on Building
Educational Applications Using NLP, pp. 17–20 (2005)
4. Mitkov, R., Ha, L.A.: Computer-Aided Generation of Multiple-Choice Tests. In:
Proceedings of the HLT-NAACL 2003 Workshop on Building Educational
Applications Using Natural Language Processing, vol. 2, pp. 17–22 (2003)
5. Holohan, E., Melia, M., McMullen, D., Pahl, C.: The Generation of E-Learning Exercise
Problems from Subject Ontologies. In: Proceedings of The Sixth IEEE International
Conference on Computer and Information Technology, pp. 967–969 (2006)
6. Holohan, E., Melia, M., McMullen, D., Pahl, C.: Adaptive E-Learning Content
Generation based on Semantic Web Technology. In: AI-ED 2005 Workshop 3, SW-EL
2005: Applications of Semantic Web Technologies for E-Learning, pp. 29–36 (2005)
7. Gonzalez, J.A., Munoz, P.: E-status: An Automatic Web-Based Problem Generator -
Applications to Statistics. Computer Applications in Engineering Education 14(2),
151–159 (2006)
8. Guzmán, E., Conejo, R.: A Model for Student Knowledge Diagnosis Through Adaptive
Testing. In: Lester, J.C., Vicari, R.M., Paraguaçu, F. (eds.) ITS 2004. LNCS, vol. 3220,
pp. 12–21. Springer, Heidelberg (2004)
9. Lazcorreta, E., Botella, F., Fernandez-Caballero, A.: Auto-Adaptive Questions in
E-Learning System. In: Proceedings of the Sixth International Conference on Advanced
Learning Technologies, pp. 270–274 (2006)
10. Tsumori, S., Kaijiri, K.: System Design for Automatic Generation of Multiple-Choice
Questions Adapted to Students’ Understanding. In: Proceedings of the 8th International
Conference on Information Technology Based Higher Education and Training, pp.
541–546 (2007)
Proposal of an Automatic Composition Method
of Piano Works for Novices Based on an
Analysis of Study Items in Early Stages of Piano
Education

Mio Iwaki
Interdisciplinary Graduate School of Science and Technology, Shinshu University, Japan

Hisayoshi Kunimune
Faculty of Engineering, Shinshu University, Japan
e-mail: kunimune@cs.shinshu-u.ac.jp

Masaaki Niimura
Graduate School of Science and Engineering, Shinshu University, Japan

Keywords: Piano works, novices, learning stages, study items, automatic
composition method.

1 Introduction

Piano learners in early stages use textbooks for novices in their piano
practice. We assume that the authors of such textbooks compose a piano work for
novices based on the knowledge and skills which the novices have already
acquired and those the author would like them to learn through the work. The
textbooks include various types of piano works suitable for novices in various
learning stages; however, it is difficult to make enough progress in practice
with only such textbooks because of a shortage of works for some stages.
Novices in a learning stage need to acquire knowledge and skills by playing
works suitable for the stage. Thus this study proposes an automatic composition
method of piano works for novices based on the knowledge and skills a learner
has already acquired and on the items to learn in the composed work. The works
composed with the proposed method become supplementary works for the learner’s
practice.
Firstly, this paper classifies and systematizes study items in the early stages
of piano practice to define a general curriculum. Secondly, it organizes the
basic components of piano works for novices (rhythm, chord progression, playing
style, accompaniment patterns, and so on) based on the study items. Finally, it
proposes an automatic composition method of piano works.
2 Related Works
Kitamura et al. propose a method for composing melodies that include a learner’s
weak point, for novices’ self-learning [3]. This method produces sheets of music
which consist of five measures. Every measure of a produced sheet includes two
notes in the same interval. A learner chooses an interval from the I to V degree
as his/her weak point to produce a sheet. The method composes five one-measure
melodies including the chosen interval and combines these melodies to produce a
sheet.
Yoshida et al. suggest a method for producing tunes like ‘Hanon’ containing
exercises suitable for a learner, which are found from the learner’s piano
performance [7]. This method analyzes the performance and detects the learner’s
weak points.
The research of Kitamura et al. and our study are similar in that they limit the
range of the composition based on a learner’s level. However, these related
works focus on composing melodies suitable for practicing his/her weak points in
finger movements. Moreover, they intend their methods for self-learners of
various ages.
On the other hand, McCormack proposes adapting the L-system for grammar-based
automatic composition [4]. This method can compose various pieces of music;
however, this work does not focus on learning. Thus, we should define another
grammar to compose pieces of music based on the learning stages of learners.
[Figure: classification of study items in the early stages into Knowledge
(frames of notes and meters, the relationship between notes and note values,
the correspondence between notes and fingering) and Skill (fingering
techniques, the number of fingers to play, the position of fingers)]
We do not treat items in “Perception” in this study because they are too vague
to classify.
4.2 Rhythms
Learning of rhythm follows the learning progress in understanding notes and rests
and their values. Thus usable notes and rests in a piano work are decided from
the designated learning stage of the work. There are vast combinations of notes
and rests; however, we narrow the combinations down by surveying actually used
combinations in the textbooks for novices.
We selected three textbooks, “Beyer Op.101” [2], “Methode Roses” [6], and “Piano
dream” [5], which include many homophonic works, and we surveyed the
combinations used in 162 works in these textbooks. These works consist of 87 works
at 4/4, 63 works at 3/4, and 12 works at 2/4. We counted the combinations in every
measure of melodies and accompaniments in these works and divided these combi-
nations into combinations usually used for (1) melodies, (2) accompaniments, and
(3) cadence.
For example, Table 1 shows the usually used combinations of notes and rests with
values one and two at 4/4 in these textbooks.
Table 1 Example of used rhythm patterns with values one and two at 4/4.
The other chords in the work (II, III, IV, VI, VII, and IV/V7) have the same
combinations of notes as I, IV, or V. Table 2 shows the expressible chords in
works in C major for each interval and playing style.
Table 2 Expressible chords in C-major with each interval and playing style.
Chord
Range Playing style I II III IV V7 VI VII ◦I IV/V7 V/V7 VI/V7
Single tone ◦ - - ◦ ◦ - ◦ ◦ - ◦ ∗
C-G fifth interval Double notes ◦ - - ◦ ◦ - ◦ ◦ - ◦ ∗
Chord ◦ - - - ◦ - ◦ ◦ - ∗ ∗
Single tone ◦ ◦ ◦ ◦ ◦ ◦ ◦
C-A sixth interval Double notes ◦ ◦ ◦ ◦ ◦ ◦ ◦
Chord ◦◦ ◦ ◦ ◦ ◦ ◦
Single tone ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦
C-B seventh interval Double notes ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦
Chord ◦◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦
(◦: Expressible, -: Not expressible, : Cannot exist together in the same work, ∗: No
following chords or unused inverted chords)
[Table: notes for each chord and playing style, by range (fifth, sixth, or
seventh interval)]
Chord I: single tone: C, E, or G; double notes: CE, EG, or CG; chord: CEG;
broken chord: C-G-E-G or C-E-G.
Chord II: single tone: D, F, or A; double notes: DF, FA, or DA; chord: DFA;
broken chord: D-A-F-A or D-F-A (not playable within the fifth interval).
(-: Impossible to play, or expresses another chord)
Proposal of an Automatic Composition Method of Piano Works for Novices 505
4.6 Form

All phrases in the surveyed works are four measures long and end with a perfect
cadence or a half cadence. All of the works are in one-part, binary, or ternary
form as follows:
• One-part form: AA, A’A
• Binary form: AB, A’B
• Ternary form: ABA, AB’A, A’B’A
A, A’, B, and B’ indicate phrases. A and B end with a perfect cadence, and A’
and B’ with a half cadence.
chord assigned to the measure, and at least one note in the sequence becomes the
chord-tone.
We organized the relationship between assignment of a chord to the measure,
which contains a practice point, and the pitches in the sequence of a practice point.
Table 5 shows an example of the relationships in a work, which has fifth intervals
and chord pattern in its accompaniment and is in C major.
Table 5 Example of the relationship between the chords and pitches in practice points.

                      Chord(s) whose chord-tone(s) are:
Sequence of two notes   Both notes   The first note   The second note
C-C                     I            I                I
C-D                     -            I                V
C-E                     I            I                V
C-F                     -            I                I
C-G                     I            I                I and V
C-A                     -            I                -
C-B                     -            I                V
(-: No chord satisfies the condition)
Rule 2: When the practice point of the work is designated as a series of two or three notes, the following two rules assign a chord to the measure that includes the practice point.
Rule 2-1: If there are chords that include all notes of the practice point as chord-tones, the method chooses from these chords and sets the first note on a downbeat in the measure.
Rule 2-2: Otherwise, the method assigns a chord that includes the second note as a chord-tone and sets the second note on a downbeat in the measure.
Rule 3: If the user designates two or more practice points and no chord progression can include all of the practice points, the method composes the work in binary form.
Rule 4: If the chord of the 4n-th (e.g., fourth, eighth, 12th) measure, which is not the last measure of the work, is I (the tonic triad), the method sets the last note in the measure to the third or the fifth note of the chord.
Rule 5: The method sets the last note to the key-note of the work.
Rule 6: If the melody of the work finishes on rests, the method makes the accompaniment finish on the same rests.
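Under simplifying assumptions, the chord-assignment behavior of Rules 2-1 and 2-2 can be sketched in JavaScript. Only the I and V triads of C major are modeled here, and all function and variable names are ours for illustration, not part of the proposed method:

```javascript
// Illustrative sketch of Rules 2-1/2-2 (names are ours, not the paper's).
// Chord-tones in C major for the two triads used in the running example.
const CHORD_TONES = {
  "I": ["C", "E", "G"], // tonic triad
  "V": ["G", "B", "D"]  // dominant triad
};

// Rule 2-1: prefer a chord whose tones include ALL notes of the practice
// point, and set the first note on a downbeat.
// Rule 2-2: otherwise choose a chord containing the second note as a
// chord-tone, and set the second note on a downbeat.
function assignChord(practicePoint) {
  const chords = Object.keys(CHORD_TONES);
  const coversAll = chords.find(c =>
    practicePoint.every(n => CHORD_TONES[c].includes(n)));
  if (coversAll !== undefined) {
    return { chord: coversAll, downbeatNote: practicePoint[0] }; // Rule 2-1
  }
  const coversSecond = chords.find(c =>
    CHORD_TONES[c].includes(practicePoint[1]));
  return { chord: coversSecond, downbeatNote: practicePoint[1] }; // Rule 2-2
}

console.log(assignChord(["C", "E"])); // { chord: 'I', downbeatNote: 'C' }
console.log(assignChord(["C", "B"])); // { chord: 'V', downbeatNote: 'B' }
```

The two calls reproduce the C-E and C-B rows of Table 5: C-E is covered entirely by I (Rule 2-1), while C-B forces Rule 2-2 with V.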
[Table: knowledge and skill frames of the example work. Knowledge frames: relationship between meters and notes: 4/4, RH: C4-G4, LH: B2-A3; correspondence between notes, note values, and fingering: extension of five fingers. Skill (fingering techniques): RH: extension, single tone; LH: seventh intervals, broken chord. The table is followed by the score of the example work in 4/4 time.]
Step 2 designates the practice point in the work. In this case, the practice point is the sequence of two notes E-F played with the right hand.
Step 3 determines the chord progression of the work. For example, the method chooses usable notes and chords, which are limited by the range of this work and the practice point (E-F with the right hand). In this case, the method determines I-V/V7-V-I as the chord progression of the work. The method also determines the melody and the accompaniment pattern of the work.
7 Conclusion
The objective of this study is the automatic composition of supplementary piano works for novices at the early stage.
First, we classified and systematized the study items in piano works in textbooks for novices. These piano works have a notable characteristic: study items increase progressively as the learner's skill grows. The study items limit elements of the piano works such as range, note and rest values, number of measures, and so on. This study hypothesizes that it is possible to automatically compose piano works for novices that are suitable for a learner's learning stage by using the limitations introduced from the study items of that stage. In this study, we classified and systematized the study items of each stage and confirmed that these items can explain the curricula of textbooks for novices.
This study then proposed a method, which consists of three steps and six rules, for automatic composition. As the study items of a learning stage restrict the elements of a piano work, it is possible to derive selections for the base elements of the work from these restrictions. Thus, the proposed method can choose from the selections made by the restrictions introduced from the study items. We confirmed that the proposed method could automatically compose some piano works for novices in various stages.
References
1. Akutagawa, Y.: Ongaku no Kiso (Fundamentals of music). Iwanami Shoten (1971) (in
Japanese)
2. Beyer, F.: Vorschule im Klavierspiel Op.101: for Piano. Zen-On (2005)
3. Kitamura, T., Miura, M.: Constructing a support system for self-learning playing the piano at the beginning stage. In: Proc. International Conference on Music Perception and Cognition, ICMPC 2006, pp. 258–262 (2006)
4. McCormack, J.: Grammar Based Music Composition. In: Complex Systems: From Local
Interactions to Global Phenomena, pp. 320–336. IOS Press (1996)
5. Tamaru, N.: Piano Dream 1–6. Gakushu Kenkyu Sha (1993)
6. Velde, E.V.D.: Methode Rose Par Ernest Van De Velde. Ongaku No Tomo Sha Corp.
(1950)
7. Yoshida, K., Muraki, M., Emura, N., Miura, M., Yanagida, M.: Generating appropriate exercises like Hanon for practicing the piano. Technical report of musical acoustics, Acoustical Society of Japan, MA2008-52, pp. 51–56 (2008) (in Japanese)
Proposal of MMI-API and Library
for JavaScript
1 Introduction
The use of web-based multimodal interaction is one of the most absorbing research
topics in the area of multimodal interaction. Some multimodal description languages
such as X+V [1] and XISL [2] have been proposed to describe speech interaction
scenarios and are used together with HTML pages, while SALT [3] provides a set of
tags that are used to embed a speech interface into HTML documents. Other languages such as SMIL [4], MPML [5], and TVML [6] define the synchronization of output media or the gestures of animated characters for the rich presentation of output media/agents. Although these languages have resulted in significant advances in web-based multimodal interaction, not many of them are widely used in practical systems.
One reason for this is the difficulty that application developers face in mastering a new description language. Although the above languages provide a strong
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 511–520.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
512 K. Katsurada et al.
3 Outline of MMI-API
Based upon the discussion in Section 2, we created the following specifications for
an API in order for it to handle multimodal inputs/outputs. Table 2 shows the multimodal input/output functions provided by the API. It provides four types of input
functions: Input, seqInput, altInput, and parInput; these handle unimodal input, multimodal sequential inputs, multimodal alternative inputs, and multimodal parallel inputs, respectively. As for outputs, the API provides three types of functions: Output, seqOutput, and parOutput; these handle unimodal output, multimodal sequential outputs, and multimodal parallel outputs, respectively.
These functions only partially satisfy requirements 1-3 because they do not define any details with regard to concrete inputs, outputs, and synchronization. These details are provided by the arguments given to the functions. These arguments are described in the JSON format [9], as listed in Figure 1. The JSON format is used to
describe multiple pairs of properties and their values. Each property represents the
type of modality, the conditions for accepting input, or some other options. Tables
3-5 show the available properties of inputs, outputs, and gestures performed by the
dialogue agents, respectively.
// Input example
mmi.altInput({
    "type" : "click",
    "match": "agent"
}, {
    "type" : "speech",
    "match": "./grammar/start.txt"
}); // start interaction with the agent
    // click the agent or talk to the agent

// Output example
mmi.parOutput({
    "type"   : "audio",
    "event"  : "play",
    "match"  : "./sound/isshoni.ogg",
    "options": {
        "begin": 500
    }
}, {
    "type"   : "agent",
    "event"  : "gesture",
    "gesture": "speak",
    "options": {
        "begin": 500,
        "dur"  : 5500
    }
}); // output the agent’s speech
    // this example does not use TTS
To confirm the usability of the developed API and library, we embedded multimodal interaction into a web-based English pronunciation training application for Japanese students [12] that was scripted using ActionScript (a scripting language used for developing Flash software). The application recognizes the user's pronunciation of a phoneme and shows its manner and place of articulation (such as the shape of the mouth and the position of the tongue) on the International Phonetic Alphabet (IPA) vowel chart. The user can correct his/her pronunciation by modifying his/her mouth shape and tongue position according to the chart.
Figure 3 shows a screenshot of the application, and Figure 4 outlines its system structure. The character shown at the center of the figure is a dialogue agent to which the user speaks. The user can input commands to the system either by operating the mouse or by speaking to the agent. These commands are sent to the Flash program that is executed on the browser, after which they are sent to the server. The content is delivered to the user after synchronization of the dialogue agent's gestures, speech, and background music.
Through the development of this application using the proposed API and library, we confirmed a number of findings. Compared to developing the application in the other languages, our API and library enable more detailed and complex control of interaction, including control of the timing of inputs/outputs and coordination with Flash or other APIs (such as the Google Maps API). Because an application developer can now construct an interaction scenario using only JavaScript, he/she can regard the interaction scenario and the application as a single program unit. This feature enables developers to construct complicated interaction scenarios far more easily than with the other languages.
However, an issue arises from a characteristic of JavaScript: the difficulty of synchronization triggered by the end of an input acceptance or an output presentation. This is because JavaScript does not provide a “wait” function. The developer still has to write a somewhat complicated program to execute this type of synchronization. We would like to resolve this issue in the future.
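The problem can be made concrete with a small sketch. `playOutput` below is a hypothetical stand-in for an output function and is not part of the proposed API; since JavaScript offers no blocking wait, every step that must start after the previous output ends has to be nested inside its completion callback:

```javascript
// Hypothetical stand-in for an output function (NOT part of the proposed
// API): it "presents" the output and signals completion via a callback.
// A real output would complete asynchronously, but the pattern the
// developer must write is the same.
const log = [];

function playOutput(name, done) {
  log.push(name); // present the output
  done();         // completion event
}

// Without a blocking "wait", each dependent step nests inside the
// completion callback of the previous one -- the "somewhat complicated
// program" needed for end-of-output synchronization.
playOutput("agentSpeech", () =>
  playOutput("agentGesture", () =>
    playOutput("backgroundMusic", () => log.push("scenario finished"))));

console.log(log.join(" -> "));
// agentSpeech -> agentGesture -> backgroundMusic -> scenario finished
```

Wrapping such callbacks in Promises is one way to flatten the nesting, but the underlying lack of a synchronous wait remains.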
6 Conclusions
In this paper, we have proposed a multimodal interaction API for JavaScript and have also provided a library that can be executed on general web browsers. Although the API is a very simple one, containing only four types of input functions and three types of output functions, it has strong descriptive power in the arguments given to these functions. Through the development of a web-based English pronunciation training application for Japanese students, we confirmed that the API and the library provide more detailed control of complicated interaction, including control of the timing of inputs/outputs and coordination with Flash programs. This feature enables developers to construct complicated interaction scenarios more easily than with other languages.
Remaining work includes resolving the problem of output synchronization mentioned in Section 5 and implementing various unimplemented functions of the library (such as the nesting of inputs/outputs and some gestures of the dialogue agent). In the near future, we intend to publish the API and the library on the web.
References
1. XHTML+Voice, http://www.w3.org/TR/xhtml+voice/
2. Katsurada, K., Nakamura, Y., Yamada, H., Nitta, T.: XISL: A Language for Describing Multimodal Interaction Scenarios. In: Proc. of ICMI 2003, pp. 281–284 (2003)
3. Wang, K.: SALT: A spoken language interface for web-based multimodal dialog systems. In: Proc. of Interspeech 2002, pp. 2241–2244 (2002)
4. SMIL, http://www.w3.org/AudioVideo/
5. Tsutsui, T., Saeyor, S., Ishizuka, M.: MPML: A Multimodal Presentation Markup
Language with Character Agent Control Functions. In: Proc. WebNet 2000 World
Conf. on the WWW and Internet (2000)
Abstract. We considered the knowledge and skills that are required for information morals education at high schools in Japan. Clearly, such knowledge and skills are critical for high school students. We demonstrate the problems with the teaching materials for information morals in Japan. Teaching materials are needed that help students cope with problems by utilizing the knowledge and skills of the information society, that provide a more general way of thinking that can be applied to other examples, and that foster a positive attitude toward the information society. We propose teaching material based on goal-based scenario theory. Practice results using teaching material whose subject was spam e-mail show that students achieved the learning goals of the teaching material.
1 Introduction
This paper has two purposes. The first is to consider the knowledge and skills that are required for information morals education at high schools in Japan. The second is to propose teaching materials for learning such knowledge and skills.
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 521–529.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
522 K. Umeda et al.
• Authenticity: The Internet contains a great deal of unverified information, since anyone can contribute to it. Information must be verified.
• Openness: Information written on bulletin boards or SNSs can be seen by anyone in the world, since they are open to the whole world.
• Recorded information: Information written on the Internet is difficult to delete completely. Moreover, sent information is not anonymous: records of senders always remain.
• Mutual payment and public resources: Both senders and recipients pay Internet fees. Networks are public resources that should be used economically.
• Invading problems: Simply by connecting to the Internet, computers may be attacked by dangerous web pages that extract information.
In this paper, we define these three aspects as the knowledge and the skills of the
information society.
For example, a student who knows a domain's terms without knowing whether the domain is helpful for judging authenticity will not check which domains are more reliable for judging websites. Another example is a case where Student A wrote libel about a friend on Student A's blog, under the illusion that the blog was read only by Student A and close friends. Students sometimes fail to realize how open the information society is.
We also identified the following problems in information morals education in
Japan.
(2) Mission
The mission is the problem that students are expected to solve. Students come to recognize the mission, rather than the learning goals, as the goal of the teaching material. The mission must be motivational and somewhat realistic, and it should require the use and application of the knowledge and skills mentioned in the learning goals.
(3) Cover story
The cover story is the background storyline that creates the need to accomplish the
mission. It must motivate the students and allow enough opportunities to practice
the skills and seek knowledge.
(4) Role
The role is the character or position that the students will play within the story. It is
important to think about what role is best in particular scenarios to practice the
necessary skills. The role should also motivate students to engage in the story.
(5) Scenario operations
The scenario operations are all the activities done by students in pursuit of the mission goals. They must be closely related to both the mission and the learning goals, and they must include decision points with evident consequences. When a student makes a wrong decision, he or she should learn the negative consequences through the interaction.
(6) Resources
Resources provide the information that might assist students to achieve their
mission goal. Well-organized information must be readily accessible to help
students successfully complete their missions.
(7) Feedback
The students can receive adequate feedback at appropriate times in any of three
ways: consequence of actions, through coaches, or domain experts who tell stories
about similar experiences.
In this paper, we explain our teaching material using spam e-mail as an example. Spam e-mail may lead to problems such as one-click fraud or virus distribution. In recent years, spam e-mail has often been generated from personal information that users themselves entered into prize sites, rather than being sent randomly. The fundamental way of coping with spam e-mail is to ignore it. We must also use filters that block it and avoid entering personal information. However, personal information may be disclosed on SNSs during communication with others. Therefore, students must be taught more than simply to avoid entering personal information on webpages.
To implement GBS, we designed our teaching material as follows:
(1) Learning goals of spam e-mail teaching material
We define process knowledge as the following three skills: (i) correctly identifying spam e-mails and coping with them; (ii) deciding for themselves how much personal information to disclose based on the website or their purposes; (iii) developing positive feelings about participating in the information society. We define content knowledge as the following three aspects, based on the definition of the knowledge and skills of the information society mentioned in Section 1.2: (i) students must learn about invaders, authenticity, and openness as characteristics of the information society; (ii) students must learn information technology relevant to spam e-mails, for example, filtering services and domains that effectively reject them; (iii) laws such as the Consumer Contract Act for the Internet, which was enacted to relieve consumers of operational mistakes in electronic commerce, are used by students when they practice skills in the scenarios. With these learning goals, the process knowledge can be widely applied not only to a specific example or trouble but also to other examples in the information society.
(2) Mission, (3) cover story, and (4) role of the spam e-mail teaching material
In Japan, more than 95% of high school students have mobile phones, as do 34.5% of junior high school students [9]. Based on this situation, we set the mission, the cover story, and the role as follows. The students' mission is to advise a younger junior high school sibling, who is eager to start using a mobile phone, on coping with the information society. The students' role is that of the elder high school sibling.
We designed the student's role as one who does not act in the information society but supports the sibling, for the following reason. In some scenarios, events purposely go wrong so that students can learn something. In such cases, students cannot avoid the failure even though they did not choose it, and some students find this disagreeable. Therefore, students are assigned a role that supports the younger sibling and observes the process of the activities, rather than the role that fails. In the cover story's scenario, the younger sibling is excited because she finally has her own mobile phone and starts to use it, but she runs into problems as she advances in the information society and needs advice.
Proposal of Teaching Material of Information Morals Education 527
(5) Scenario operations, (6) resources, and (7) feedback of spam e-mail teaching
material
First, students learn the mechanism of spam e-mails through the younger sister's behavior. Then the question of how students should cope with them is presented as scenario operations. The teaching material immediately returns feedback informing students of failures and the reasons when they select wrong actions.
The scenario operations give students a chance to practice specifying a domain for rejecting spam e-mails, helping them become comfortable with information technology. The scenario operations, which include other topics such as one-click fraud and SNS communication, help students judge how much personal information to disclose based on the website or their particular purpose. Moreover, the scenario operations cover not only problems but also effective uses of advertising e-mail, such as discount coupons. The teaching material describes the knowledge and skills of the information society as resources that students can read depending on the scenario operations.
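As an illustration only, a decision point of such scenario operations might be represented as data that pairs each choice with its immediate feedback; all names and texts below are hypothetical and not taken from the actual Flash material:

```javascript
// Hypothetical representation of one scenario-operation step: a decision
// point whose choices each carry the feedback (and reason) that the
// material returns immediately after selection.
const decisionPoint = {
  situation: "Your sister received an e-mail saying she won a prize " +
             "and asking her to enter her address.",
  choices: [
    { action: "Tell her to enter the address",
      correct: false,
      feedback: "Failure: the site harvests personal information, " +
                "so spam e-mail to her address will increase." },
    { action: "Tell her to ignore the e-mail",
      correct: true,
      feedback: "Correct: ignoring spam e-mail is the fundamental " +
                "way of coping with it." }
  ]
};

// The material immediately returns the feedback for the selected choice.
function answer(point, index) {
  const choice = point.choices[index];
  return { correct: choice.correct, feedback: choice.feedback };
}

console.log(answer(decisionPoint, 0).correct); // false
```

Encoding the reason alongside each wrong choice is what lets the material explain the failure at the moment it happens.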
3.1 Overview
We used our developed teaching materials at a high school on December 20-21, 2010, in 90-minute classes with 40 students who had already studied information technology. In the first half of the class, we explained the mission and the scenario using presentation files, and all students dealt with the mission in groups of four. In the second half of the class, each student practiced the skills and knowledge using a PC and the teaching material, which we developed using Adobe Flash Professional CS3.0 for Windows.
3.2 Results
We conducted pre- and post-tests to evaluate whether the students achieved the
learning goals and examined the results from three aspects of process knowledge.
(i) Students correctly recognized spam e-mails and coped with them.
In the pre- and post-tests, students were given examples of spam e-mails and asked how they would approach them. Students answered by deciding on an action (testing skills) and giving reasons for the action (testing knowledge). In the pre-test, students chose their answers from given choices; in the post-test, which was more difficult, they chose actions from the choices and gave the reasons themselves. The pre- and post-tests had two types of questions: similar questions, which confirmed whether students understood what they learned in the class or the teaching material, and new questions, which confirmed whether students could apply what they learned to new situations that were not treated in the class.
The average scores of these questions are shown in Table 1. The results of multiple comparisons showed that the post-test scores were significantly higher than the pre-test scores.
(ii) Students can decide how much personal information to disclose based on the website.
In the post-test, students were given the role of Student A, who wants to exchange information about her favorite singer B with B's fans on an SNS, were given a situation about providing personal information to the SNS, and answered freely. Student A's personal information is listed, such as where she lives, her e-mail address, her favorite animals, food, and so on. Students wrote a self-introduction of Student A on the SNS using the listed information. The average score was 4.73 on a scale of one to five, which is very high.
(iii) Students developed positive feelings about participating in the information
society.
The students' comments about the teaching materials showed that they felt satisfied with such familiar items as mobile phones and SNSs as scenario subjects and with their role of giving advice to a younger sibling. This shows that our teaching materials provided realistic situations for students, in line with GBS theory. Moreover, student opinions included "I will try to use the coupon of my favorite store" and "I'd like to use an SNS to utilize the class's knowledge and skills." These show positive attitudes toward participating in the information society. These results suggest that our learning goals were achieved.
4 Conclusions
In this paper, we considered the knowledge and skills that are required for information morals education in Japan, which high school students clearly need. We also proposed teaching material based on GBS theory. After the students practiced using the teaching material whose subject was spam e-mails, they achieved its learning goals. In future work, we will develop other teaching materials based on student characteristics such as preexisting knowledge of information technology. We would also like to compare our material with teaching materials based on other methods.
References
[1] JAPET, Instructional practice kickoff guide of “information morals” for all teachers
(2007) (in Japanese)
[2] Gagne, R.M., Wager, W.W., Golas, K.C., Keller, J.M.: Principles of Instructional Design, 5th edn. Thomson Learning Inc. (2005); Japanese translation by Suzuki, K., and Iwasaki, S. (translation supervisors), Kitaoji-shobo (2007)
[3] Tamada, K., Matsuda, T.: Systematic and methodical information morals education in
elementary school stage. In Consideration of Consistency with Instructional Method
Using “Three types of knowledge” for information morals education. In: Research
Report of JSET Conferences, vol. 08(5), pp. 109–116 (2008) (in Japanese)
[4] Umeda, K., Ejima, T., Nozaki, H.: The development and trial of goal-based scenario
teaching material for learning a framework for judging information ethics in a high
school class. Bulletin Paper of Center for Research, Training and Guidance in
Educational Practice (11), 67–72 (2008) (in Japanese)
[5] Ishihara, K.: The Transition of Information Morality Education and Teaching Materials
of Information Moral. The Annals of Gifu Shotoku Gakuen University. Faculty of
Education 50, 101–116 (2011) (in Japanese)
[6] Tamada, K., Matsuda, T.: Development of the Instruction method of Information Morals
by “the combination of three types of knowledge”. Japan Journal of Educational
Technology 28(2), 79–88 (2004) (in Japanese)
[7] Schank, R.C., Berman, T.R., Macpherson, K.A.: Learning by Doing. In: Reigeluth, C.M. (ed.) Instructional-Design Theories and Models: A New Paradigm of Instructional Theory, vol. II (1999)
[8] Nemoto, J., Suzuki, K.: A checklist development for instructional design based on Goal-Based Scenario theory. Japan Journal of Educational Technology 29(3), 309–318 (2005) (in Japanese)
[9] National Police Agency (Japan), The result of a survey on the actual conditions of mobile phone usage by elementary school students,
http://www.npa.go.jp/safetylife/syonen1/shonen20110825.pdf
(accessed December 17, 2011) (in Japanese)
Prototypical Design of Learner Support
Materials Based on the Analysis of Non-verbal
Elements in Presentation
Abstract. There is a growing need for a well-designed learner support system for presentation in English, particularly in non-English-speaking countries. We have developed a prototype of a comprehensive learning support system for basic presentation that consists of several modules, including digital contents of preliminary tutorials, an interactive aide for organizing a presentation and its corresponding slides, semi-automatic evaluation estimation, and an online review of recorded presentations. After trials, it became clear that non-verbal aspects have to be extensively supported by such a system. In this study, we made an extensive observation and analysis of available professional and learner presentations and extracted significant non-verbal elements in those presentations. Then, we designed learner support materials for non-verbal aspects, based on a non-verbal ontology we also designed. It is expected that the implementation of those materials will enable learners to learn more effectively how to make and conduct a presentation.
1 Introduction
It is more and more important for everyone to make a good presentation, particularly in English, in various situations. Oral presentation is one of the most sophisticated communicative activities: it is deliberately designed to be presented to a specific audience with slides in order to deliver information and to persuade. Not only the linguistic organization but also paralinguistic effects like
Kiyota Hashimoto
School of Humanities and Social Sciences, Osaka Prefecture University, Japan
e-mail: hash@lc.osakafu-u.ac.jp
Kazuhiro Takeuchi
Faculty of Information and Communication Engineering, Osaka Electro-Communication
University
e-mail: takeuchi@isc.osakac.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 531–540.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
532 K. Hashimoto and K. Takeuchi
body language and eye contact are utilized, and the presenter should have a deep understanding not only of what he or she delivers but of how the audience will react. However, education for presentation has not been as systematic and pervasive as expected, particularly in Asian countries. There are at least two reasons for this situation. First, oral presentation has not been culturally emphasized, as symbolized by the saying "actions speak louder than words" (fugen-jikkou in Japanese). Second, not many teachers are trained for presentation education. On the other hand, social demands for good presentation education are growing in every country. Considering these situations, we are constructing a comprehensive learner support system for presentation in English [1-4]. It consists of several modules, including digital contents of preliminary tutorials, an interactive aide for organizing a presentation and its corresponding slides, semi-automatic evaluation estimation, and an online review of recorded presentations, with a multimedia learner corpus of presentation. The overview of our prototypical system is shown in Fig. 1.
[Fig. 1: Overview of the prototypical system: presentation ontologies (presentation ontology, presentation task ontology, and presentation error ontology), a multimedia learner corpus of basic presentation, and review of recorded presentations.]
Table 2 Average gradings of the whole, Element 1, and Element 2 of 100 learner
presentations
Category: Details
Big move: walking around; stepping; approaching the audience; approaching the screen
Posture: basic posture; basic arm position; turning to different sides of the audience; turning to the screen
Upper body: bending; shrugging; turning to different sides of the audience; turning to the screen
Arm: widening both arms; moving both arms with various hand shapes; turning up an arm; pointing at a particular part of the audience; pointing at a particular part of the screen; taking hold of both hands
Hand (without arm movement): moving a hand around; showing the palm side; showing the back side; turning the palm up; turning the palm down; holding fast; pointing up the thumb; pointing up the index finger
Head: nodding; shaking; turning to different sides of the audience; turning to the screen; inclining the head
Face: richness of facial changes
Eye: eye contact with the audience; eye contact with different sides of the audience; turning to the screen
As expected, famous presenters like the late Steve Jobs and Michael J. Sandel employ most of these different moves, and the professional presentations we evaluated highly also contain many. However, a remarkable feature common to them is the rather long duration of each move. Most of their moves are rather slow, and the finished position generally lasts more than one second. On the other hand, presenters whose moves are fast and hasty look less trustworthy, as naturally expected. Interestingly, the same evaluation tends to appear in peer evaluations of learners' presentations.
Another intriguing feature is arm position. According to our observation, good presenters almost always keep their arms, or at least one arm, above the line of the diaphragm. Though it is often said that putting one or both hands in one's pockets looks rude, many presenters still do so. However, even when they put one hand in a pocket, the other arm is kept upwards.
Note that our study focuses on presentations with slides (i.e., there is a screen where slides are projected) and that none of them are primarily made for broadcasting. The former condition leads to frequent turns to the screen, regardless of which part of the body is used. The latter leads to the full employment of the body, partly in order to face every part of the audience.
In sum, though the discussion above is by no means exhaustive, we newly classified the presenter's moves as shown in Table 3, and we have found two remarkable features: duration and arm positioning. Presenters evaluated less highly tend to lack both features. In particular, we noticed that Japanese presenters, whether learners or professionals, tend to show these lacks, though we are not certain whether this is a cultural tendency or just a manifestation of inexperience in presentation.
Before turning to the next section, note that we did not relate each move to a particular communicative function, as most guidebooks and handbooks do. Of course, some moves have an obvious communicative function, like pointing to a particular part of the screen, but mostly their communicative functions were not pursued in our observation. There are two reasons: one is that it is simply impossible for us to find such relations with our data with evidence; the other, more importantly, is that the frequency and variation of non-verbal moves clearly divides good and bad presentations, regardless of whether a presentation is made by a professional or a learner. So, though we admit that it would be important to relate each move to a particular communicative function, any attempt at it is a future task.
However, as we explain later, we are planning to offer simulative images, and for this purpose it is desirable to construct an ontology rather than a simple list of moves. The overview of our prototypical ontology of non-verbal elements in presentation is shown in Fig. 3.
This prototypical ontology has each body part as an elemental entry. Entries are related by "part-of" relations, and each body part has a list of roles that describe actions used in a presentation; the lists of body parts and their roles are thus not exhaustive from the viewpoint of describing all human actions. Each role has two values: distance and duration. As naturally expected, some roles are related to others by the feature "induced move," which means that, due to the structure of the human body, they move in accord. Note that an induced move is triggered by physical necessity: if a person moves more than one body part of his own will, it is not an induced move, and such simultaneous moves are captured simply by activation of the roles of those body parts.
This ontology is designed to be used not only for clarifying actions employed in
a presentation but for describing and analyzing presentations, and simultaneous
moves and sequential moves are easily captured with it.
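For illustration, the entries just described could be encoded as follows (a minimal sketch under our reading of Fig. 3; the particular parts, roles, and values shown are assumptions, since the full ontology appears only in the figure):

```python
# Minimal sketch of the non-verbal ontology: body parts linked by
# "part-of" relations, each carrying roles with distance/duration values,
# and "induced move" links for parts that move in accord by physical necessity.
class BodyPart:
    def __init__(self, name, part_of=None):
        self.name = name
        self.part_of = part_of          # "part-of" relation to a larger part
        self.roles = {}                 # role name -> {"distance", "duration"}
        self.induced_moves = []         # parts moved in accord, by necessity

    def add_role(self, role, distance, duration):
        self.roles[role] = {"distance": distance, "duration": duration}

# Example entries (assumed): an arm, its hand, and an induced move.
arm = BodyPart("arm")
hand = BodyPart("hand", part_of=arm)
hand.add_role("pointing_to_screen", distance="far", duration="short")
arm.induced_moves.append(hand)          # moving the arm induces a hand move

print(hand.part_of.name)
print(hand.roles["pointing_to_screen"]["duration"])
```

Simultaneous moves made deliberately would be captured not as induced moves but simply as the activation of the roles of several body parts at once.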
two classes in 2011, and 82% of the students answered in the post-class question-
naire survey that the tutorial was highly comprehensible and useful. However, our
analysis of their final presentations, which should have improved if the tutorial
was truly useful, indicates that such a tutorial is not enough. It may be true
that the students understand the importance of non-verbal aspects and some
functions of non-verbal actions, but mastery requires more than understanding, as
naturally expected.
So we designed our simulative analysis tool for non-verbal aspects. This tool is
designed to enable a better review analysis of the non-verbal aspects of
presentations by providing opportunities to analyze presentations visually.
The user interface consists of three panes, as shown in Fig. 5. At the leftmost,
video playing buttons and tag drawing lists are placed. The user chooses the
presentation video he wants to review and plays it. While playing, he can stop the
video at any time by pushing the "Capture Now" button, and can draw lines to the
body part he focuses on. When drawing a line, he can choose the reason why he
focuses on it. The classification is quite simple: [!], [?], and [×]. Then he can
add a comment to the line in the rightmost pane.
The unique point of the design is that a review result can be piled on
other results, so that the user can easily compare more than one review of a
presentation.
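The piling of review results described above could be represented roughly as follows (a sketch; the field names and the time-based merge are assumptions, not the tool's actual implementation):

```python
# Sketch of a review annotation and of overlaying several reviews of one
# presentation so they can be compared side by side.
from dataclasses import dataclass

@dataclass
class Annotation:
    time_sec: float      # where "Capture Now" was pressed
    body_part: str       # body part the line was drawn to
    mark: str            # one of "!", "?", "x"
    comment: str = ""

def overlay(*reviews):
    """Pile several reviewers' results into one time-ordered list."""
    return sorted((a for r in reviews for a in r), key=lambda a: a.time_sec)

review_a = [Annotation(12.5, "right arm", "!", "good pointing gesture")]
review_b = [Annotation(12.5, "right arm", "?", "held too long?"),
            Annotation(40.0, "head", "x", "back to the audience")]

for a in overlay(review_a, review_b):
    print(a.time_sec, a.body_part, a.mark)
```

Two reviewers' marks on the same moment (12.5 s above) then appear next to each other, which is the comparison the design aims at.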
Prototypical Design of Learner Support Materials Based on the Analysis 539
4 Concluding Remarks
In this paper, we first made an analysis on non-verbal aspects of professionals’
and learners’ presentations, and then we proposed the construction of an ontology
and two learning materials. Most of our attempts are still at a design phase and
much as to be done, but we are developing a prototypical simulative tool for non-
verbal aspects.
References
[1] Hashimoto, K., Takeuchi, K.: Multimedia Learner Corpus of Foreigner's Basic
Presentation in English with Evaluations. In: Proc. of International Conference on
Educational and Information Technology, vol. 2, pp. 469–473 (2010)
[2] Hashimoto, K., Takeuchi, K.: Prototypical Development of Awareness Promoting
Learning Support System of Basic Presentation. In: Proc. of 2nd International Sym-
posium on Aware Computing, pp. 304–311 (2010)
[3] Hashimoto, K., Takeuchi, K.: A Task Ontology Construction for Presentation Skills.
In: Proc. of 16th International Conference on Artificial Life and Robotics, pp. 162–
165 (2011)
[4] Hashimoto, K., Takeuchi, K.: Rhetorical Structure Ontology for Representing Learn-
er’s Presentations with Potential Textual Inconsistencies and Imperfections. ICIC Ex-
press Letters 5(5), 1649–1654 (2011)
[5] Ishikawa, S.: Elements of English Presentation Skills. Proc. of Japan and British Lan-
guage and Culture 1, 1–18 (2009)
[6] Koegel, T.J.: The Exceptional Presenter: A Proven Formula to Open Up and Own the
Room. Greenleaf Book Group, New York (2007)
[7] Kayatsu, R.: A Trial of Peer Review in an Information Presentations Class and its
Evaluation. J. of Nagano Junior College 64, 71–79 (2009)
540 K. Hashimoto and K. Takeuchi
[8] Krauss, R.M., Chen, Y., Chawla, P.: Nonverbal Behavior and Nonverbal Communi-
cation: What Do Conversational Hand Gestures Tell Us? In: Zanna, M. (ed.) Ad-
vances in Experimental Social Psychology, pp. 389–450. Academic Press, San Diego
(1996)
[9] Nakano, Y., Rehm, M., Lipi, A.A.: Parameters for Linking Socio-Cultural Characte-
ristics with Nonverbal Expressiveness; Comparison between Japanese and German
Nonverbal Behaviors. In: Proc. of HAI 2008, vol. 1A-4 (2008)
[10] Nakata, A., Sumi, Y., Nishida, T.: Sequential Pattern Analysis of Non-verbal Beha-
viors in Multiparty Conversation. IEICE Transactions J94-D-1, 113–123 (2011)
[11] Pease, A., Pease, B.: The Definitive Book of Body Language. Bantam Books, New
York (2004)
[12] Powell, M.: Presenting in English: How to Give Successful Presentations. Language
Teaching Publications, New York (1996)
[13] Reiman, T.: The Power of Body Language. Pocket Books, New York (2007)
[14] Reinhart, S.M.: Giving Academic Presentations. U. Michigan Press, Ann Arbor
(2002)
[15] Takahashi, Y., Kato, M., Kashiwagi, H.: Development of a Web-based Presentation
Database for English Language Learning. J. of the School of Languages and Commu-
nication 4, 93–103 (2007)
Reflection Support for Constructing
Meta-cognitive Skills by Focusing
on Isomorphism between Internal
Self-dialogue and Discussion Tasks
Mitsuru Ikeda
School of Knowledge Science, JAIST

1 Introduction
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 541–550.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
542 R. Kurata, K. Seta, and M. Ikeda
The knowledge co-creation skill is required to create and conduct responsible
medical treatment by grasping a wide range of information on patients.
Meta-cognitive skill plays a key role in improving knowledge co-creation skills,
especially in fields where there is no pre-defined definite answer. It is the
essential skill of monitoring one's own cognitive and problem-solving processes
from an objective viewpoint and controlling them adequately. However, it is
difficult for learners to train meta-cognitive skills, since these skills are
quite tacit, latent, and context dependent: one cannot observe the cognitive
processes conducted in another person's mind as internal self-dialogue.
In our research, we aim to develop a learning support system by focusing on the
isomorphism between tacit internal self-dialogue tasks and discussion tasks
conducted observably in the external world.
Medical service practitioners must have not only technical capabilities
but also the interpersonal skills to conduct intellectual collaboration for
knowledge co-creation. In this chapter, we describe the skills that they should
have, and an educational program that we are developing to train those skills.
1. The medical staff have to make evidence-based decisions even under urgent
situations.
2. They have to consider each patient's intertwined values and beliefs.
Figure 1(a) shows the cognitive processes described above. The medical staff must
have the skill to resolve conflicts by thinking logically and accurately,
objectively reflecting on their own thinking processes, taking other people's
viewpoints into account, and integrating their own ideas with others' into
meaningful knowledge.
In the internal self-dialogue processes, learners write their own cases by
objectively reflecting on their problem-solving experiences in medical practice
with Sizhi (Fig. 3).
In this method, we divide the training into two processes to clarify the learning
goals of each process and to reduce the learner's cognitive load required for
training. Furthermore, we divided the internal self-dialogue processes into three
phases, i.e., the description phase, the cognitive conflict phase, and the
knowledge building phase, as shown in Fig. 2.
1. The description phase: A learner writes his/her own case of thinking processes
by reflecting on an experience in medical practice in which he/she felt a
psychological conflict.
2. The cognitive conflict phase: A learner writes another idea, either one
he/she has thought of as another person's idea, or one conceived by assuming
the thinking style of a teacher, supervisor, colleague, or parent. He/she then
experiences the conflict between his/her own opinion and the other's.
3. The knowledge building phase: A learner thinks of and describes new knowledge
to overcome the conflict.
Figure 3 shows a screen image of Sizhi, which the learner uses in case-writing
[5][6]. Sizhi is a learning environment designed to develop the learner's ability
to think logically in self-dialogue and to appropriately reflect on his/her own
thinking processes. The set of Sizhi tags consists of nine tags: fact (patient),
fact (medical), policy/principle, assumption, decision, medical decision,
conflict, reflect, and resolve. The task in case-writing is to reflect on one's
own thinking processes in nursing patients and to clarify the structure of those
processes by tagging them with the Sizhi tags.
Prompting the learner's logical thinking by adding Sizhi tags in this
environment fosters meta-cognitive monitoring and control of one's own thinking
processes, and thus contributes to training the ability to discuss logically.
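As an illustration of case-writing with the nine Sizhi tags, a tagged case might be represented as follows (a sketch; only the tag names come from the text, and the statements are invented for illustration):

```python
# Sketch of Sizhi-style case writing: each statement in a learner's case
# is tagged with one of the nine Sizhi tags listed above.
SIZHI_TAGS = {"fact (patient)", "fact (medical)", "policy/principle",
              "assumption", "decision", "medical decision",
              "conflict", "reflect", "resolve"}

def tag_statement(statement, tag):
    # Reject anything outside the fixed tag set.
    if tag not in SIZHI_TAGS:
        raise ValueError(f"unknown Sizhi tag: {tag}")
    return {"statement": statement, "tag": tag}

case = [
    tag_statement("The patient dislikes being restrained.", "fact (patient)"),
    tag_statement("Restraints reduce the risk of falls.", "fact (medical)"),
    tag_statement("These two considerations are in tension.", "conflict"),
    tag_statement("Administer a sleeping drug instead.", "resolve"),
]
print([c["tag"] for c in case])
```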
In the discussion process, each learner plays the role of discussion leader (DL)
in the discussion of his/her own case. The DL presents the case and leads the
flow of discussion based on it. The DL is required to think about how to bring
out conflicts, how to control the members' thoughts, and how to perform his/her
own meta-cognition. A discussion member (DM) reads the case that the DL described,
simulates the DL's way of knowledge building in the thinking processes of
self-dialogue, takes part in the discussion, and gains insight into the other
members' thinking processes.
Discussing one’s case indirectly interacts with the one’s self-dialogue processes
in nursing patients, since a case represents one’s self-dialogue processes.
1. The empathy phase: The DL presents a topic referring to the case of oneself
using Sizhi. This phase corresponds to the description phase in the internal
self-dialogue processes.
2. The critique phase: This phase clarifies relative associations, such as
similarities and conflicts between opinions, based on what was discussed in the
previous phase. The goal of this phase is to make the conflicts among members
evident. This phase corresponds to the cognitive conflict phase in the internal
self-dialogue processes.
3. The creation phase: Learners find conflicts between their own opinions and
other people's opinions and create new solutions to overcome these conflicts.
This phase corresponds to the knowledge building phase in the internal
self-dialogue processes.
In the internal self-dialogue processes, a learner writes his/her own case after
the problem-solving. The discussion process, on the other hand, is configured to
create a situation in which the problem is solved in real time.
In this paper, we call this discussion "knowledge co-creation discussion."
patients, medical services, and so on are shown, whereas "opposite," "support,"
"evidence," and so on are shown in Fig. 3(b), and "question," "proposal,"
"answer," and so on in Fig. 3(c). Fig. 3(d) is a graphical representation of the
DL's internal self-dialogue imported from Sizhi. Showing it prompts learners to
focus on the differences between the DL's internal self-dialogue and the
collaborative thinking processes.
Using the vocabulary in Fig. 3(a), for instance, a learner adds the tag "ends for
safety management" to the statement "We should use an instrument for restricting
the patient's movement for her own safety" (statement 1). Another learner adds an
"opposite" relation between statement 1 and the statement "It might impair her
dignity and hurt her pride." If learners add different tags to the same statement,
the system prompts them to discuss their recognition of the statement; this may
provide an opportunity to acquire meta-cognitive knowledge for monitoring and
characterizing their thinking processes.
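The prompting behavior described here can be sketched as a simple check for disagreeing tags on one statement (the function and data layout are assumptions; only the example tag names come from the text):

```python
# Sketch: if learners attach different domain tags to the same statement,
# flag that statement so the system can prompt a discussion about it.
from collections import defaultdict

def find_disagreements(taggings):
    """taggings: iterable of (learner, statement_id, tag) triples."""
    by_statement = defaultdict(set)
    for learner, sid, tag in taggings:
        by_statement[sid].add(tag)
    # A statement with more than one distinct tag signals disagreement.
    return {sid for sid, tags in by_statement.items() if len(tags) > 1}

taggings = [("DM1", 1, "ends for safety management"),
            ("DM2", 1, "ends for patient dignity"),
            ("DM1", 2, "opposite")]
print(find_disagreements(taggings))   # statement 1 needs discussion
```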
Learners who have attended a knowledge co-creation discussion are ready for
collaborative reflection on their discussion processes.
[Fig. 4 dialogue example (translated from Japanese): under headings "Setting a
policy / Policy / Result," the DL explains deciding not to put a restraint device
on patient A, on the grounds of the patient's feeling that being restrained would
impair her humanity; as a conflicting other's opinion, the DL describes
restraining the patient, grounded in the patient's safety and traditional policy;
from the conflict between these two thoughts, the DL builds the knowledge of
"administering a sleeping drug to patient A," considering both her feelings and
her safety, and asks the members what they would think in that position. Members
respond with support ("I support your action"), opposition ("Not using the
restraint is too dangerous in this case"), questions about each other's reasons,
a remark that the intention of a question and its answer differ, and a proposal
to also consider the result of the conflict.]
system regarding the case. The structure of the discussion protocols is
graphically represented in Fig. 4(d) and Fig. 4(e): logical relations in the
medical service domain are shown in Fig. 4(d), and the meta-cognitive structures
of the DL, represented as orders, in Fig. 4(e). By scrolling the scroll bar
indicated in Fig. 4(a), learners can replay their discussion processes: the nodes
shown in Fig. 4(d) and Fig. 4(e) appear and disappear according to the scrolling.
The system highlights the statements in Fig. 4(b), Fig. 4(c), and Fig. 4(d) from
the viewpoint of a term selected from the medical service domain shown in
Fig. 4(f).
Learners perform collaborative reflection to train meta-cognitive skills using
the tool. In their discussion they especially focus on the orders in Fig. 4(e).
They discuss the validity of the orders proposed by the DL, since these orders,
depicted as red-colored statements in Fig. 4(e), are recognized as results of
meta-cognitive self-dialogue processes. By discussing the validity of the orders,
the learners examine the tacit meta-cognitive activities performed behind them.
Balloons in Fig. 4(d) represent orders that the DL did not propose but that other
members think would have been valid if proposed: the learners conduct
collaborative learning by discussing their usefulness and validity while seeing
this information.
The balloon "Confirmation" shown in the upper part of Fig. 4(e) was added by a
DL or DM who thought an order confirmation was required, although the
intervention had not been performed. By clicking the balloon, its contents are
shown in Fig. 4(d): it points out the gap between two learners' intentions behind
their statements, statement (1) "What would you do if you were in my shoes?" and
statement (2), the answer to (1), "I support your action." The intention of the
learner who made statement (1) was to investigate the validity of her actions as
if she were now in the situation of performing her problem-solving processes
(Fig. 4(d)(1)). However, the other learner interpreted her intention as wanting
his agreement with her actions (Fig. 4(d)(2)). It also presents another learner's
suggestion for externalizing the gap from the viewpoint of knowledge creation:
"The DL should give an intervention order, for instance, 'there exists some
misunderstanding of her intentions.'"
In this way, a CSCL environment for training meta-cognitive skills is provided to
learners by giving them the role of DL, which requires controlling the discussion
processes as a problem-solving task. This is realized precisely by focusing on
the isomorphism between internal self-dialogue processes and discussion
processes.
5 Concluding Remarks
In this paper, we first described why training meta-cognitive skills is a key
issue in training problem-solving skills in the medical service science domain.
Then we presented the underlying philosophy of building a learning support system
with which learners can collaboratively train their meta-cognitive skills through
discussion processes: we focus on the isomorphism between meta-cognitive
activities in internal self-dialogue processes and thinking processes in
discussion. Furthermore, we proposed a learning support system based on this
philosophy. Evaluation of the validity and usefulness of the system will be
addressed in future work.
References
[1] Keio Business School, Theory and Practice of Case Method. Toyo Keizai Inc., Tokyo
(1977) (in Japanese)
[2] Hyakkai, S.: Learning by Case Method. Gakubunsha Inc., Tokyo (2009) (in Japanese)
[3] Ishida, H., Hoshino, H., Okubo, T.: Case Book 1: Introduction to Case-Method. Keio
University Press, Tokyo (2007) (in Japanese)
[4] Ito, T.: Effects of Verbalization as Learning Strategy: A Review. Japanese Journal of
Educational Psychology 57, 237–251 (2009) (in Japanese)
[5] Cui, L., et al.: Thinking Skill Development Program To Support Co-Creation of
Knowledge for Improving the Quality of Medical Services. In: Proceedings of
Conference on Education and Education Management (2011)
[6] Morita, Y., Cui, L., Kamiyama, M.: A learning program that externalizes thinking
and promotes knowledge collaboration skill development. The Institute of Electronics,
Information and Communication Engineers Technology Research Report 111(98), 7–12
(2011) (in Japanese)
[7] Tomida, E., Maruno, J.: Theoretical Background and Empirical Findings of Argument
as Thinking. Japanese Psychological Review 24(2), 187–209 (2004) (in Japanese)
[8] Billig, M.: Arguing and thinking: A rhetorical approach to social psychology.
Cambridge University Press, Cambridge (1987)
[9] Kuhn, D.: The skills of argument. Cambridge University Press, Cambridge (1991)
Skeleton Generation for Presentation Slides
Based on Expression Styles
Abstract. With the advent of PowerPoint and Keynote, which make it easy to create
attractive presentation slides, people can use slides to exchange and discuss
ideas. However, because many slides must be prepared to enable audiences to
understand the content, authors need to prepare the best possible slides. Our
skeleton generation method is designed to help authors prepare slides with ease
by constructing slide layouts based on expression styles, that is, the level
positions of words that express their roles in slides, derived from the text of
the textbooks they use. By analyzing the roles of the words in the slides, our
method extracts the differences between the important elements in texts and in
slides. To generate skeletons for slides from target texts in a textbook, our
method derives the expression styles of words from pre-existing texts and their
slides. Finally, it generates slide skeletons by applying the same expression
styles to the corresponding words from the target texts, arranged in slides with
the same layouts as the pre-existing slides. We also present the results of an
evaluation of the method's effectiveness.
1 Introduction
Presentations now play a socially important role in many fields, including business
and education, among others. Many university teachers have used Web services such
as SlideShare [1] and CiteSeerX [2] to store the slides they use in lectures.
However, because teachers prepare many slides to enable students to understand
the content, they should prepare the best possible slides. In fact, when authors
plan their slides, they often refer to texts (e.g., lectures in a textbook) to
determine the information
Yuanyuan Wang
University of Hyogo, Japan
e-mail: ne11u001@stshse.u-hyogo.ac.jp
Kazutoshi Sumiya
University of Hyogo, Japan
e-mail: sumiya@shse.u-hyogo.ac.jp
552 Y. Wang and K. Sumiya
Fig. 1 Conceptual diagram of skeleton generation from textbook and its slides
based on the expression styles of words, by extracting the differences between
the important elements of pre-existing texts and their slides in such a textbook.
The next section reviews related work. Section 3 describes how to determine key
elements in texts and slides. Section 4 presents the generation of skeletons for slides.
Experimental results and conclusions are given in Sections 5 and 6, respectively.
2 Related Work
Most research related to slide-making support has focused on slide generation.
Mathivanan et al. [5], Beamer et al. [6], and Yasumura et al. [7] proposed
systems for generating slides from academic papers. Their methods extract
information from a paper with TF-IDF and assign the sentences, figures, and
tables to slides by identifying important phrases for bullets. Shibata et al. [8]
converted Japanese documents to slide representations by parsing their discourse
structure and representing the resulting tree in an outline format. However,
these conventional approaches focus only on the consistency of the document
structure between text and slides, and ignore the role played by how words are
expressed from the text to the slides.
Our method focuses on the differences between the key elements of texts and their
slides, and it generates skeletons for slides based on the expression styles of
words. Kan [9] proposed a system for the discovery, alignment, and presentation
of such document and slide pairs. Hayama et al. [10] aligned academic papers and
slides based on Jing's method, which uses a hidden Markov model. These studies
are similar to ours in analyzing information common to texts and their slides.
Our approach focuses not only on the information in common, but also on the
information that differs between texts and slides. The work of Yokota et al. [3],
which retrieves important information in slides, is also similar to ours.
Kurohashi et al. [4] detected important descriptions of a word in a text. Their
method is based on the assumption that the most important description of a word
in a text is the passage where the word occurs with the highest density. We
employ the same method for detecting important descriptions of a word in a text.
Our goal, therefore, is to generate skeletons for slides by analyzing the
differences between important elements of texts and their slides.
level. The first item of text is considered to be on the 2nd level, and the depth of the
sub-items increases with the level of indentation (3rd level, 4th level, etc.).
W_t = \left\{ b \;\middle|\; \min\left( \frac{\sum_{u=1}^{n} \mathrm{dist}(c_1, b_u)}{n}, \ldots, \frac{\sum_{u=1}^{n} \mathrm{dist}(c_j, b_u)}{n} \right) > \alpha \right\} \quad (1)
where b_u is the u-th occurrence of word b, and c_j is the j-th section in the
text. The function dist calculates the distance between sections, that is, the
number of sections between the two words. n is the number of times that b appears
in the text. The lowest dispersity, and hence the highest degree of expectation,
is obtained by the function min. W_t is the bag of important words in the text:
if the formula exceeds the threshold α in Eq. (1), b is determined to be an
important word in W_t.
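One possible reading of Eq. (1) can be sketched as follows (an assumption-laden illustration: `dist` is taken as the absolute difference of section indices, and the occurrence data and threshold are hypothetical):

```python
# Sketch of Eq. (1): score a word b by the minimum, over sections c_j, of
# the mean section distance to its occurrences b_1..b_n; words whose score
# exceeds alpha go into W_t.
def importance(occ_sections, num_sections):
    n = len(occ_sections)
    return min(sum(abs(j - s) for s in occ_sections) / n
               for j in range(num_sections))

def important_words(word_occurrences, num_sections, alpha):
    return {b for b, occs in word_occurrences.items()
            if importance(occs, num_sections) > alpha}

# "document" is dispersed across sections; "node" sits in one section.
occ = {"document": [0, 2, 4, 6],
       "node": [3, 3, 3]}
W_t = important_words(occ, num_sections=7, alpha=1.5)
print(W_t)
```

Under this reading, a dispersed word like "document" scores above the threshold while a locally concentrated one does not.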
If a word m occurs at high density within a certain range of a text segment, that
segment is considered an important description of m in the text, called D_t. We
define i as a position of m, l as a center position, and w as the half-width of
the range around l. To calculate the density of m, we use the hanning window
function [11] to decrease the weight of words toward the edges of the range
[l − w, l + w]. The density of m at l, over the range |i − l| ≤ w, can be
calculated as
l − w, l + w. The density of m on l in the range of |i − l| ≤ w can be calculated as
l+w
1 i−l
Dt = {m| ∑ am (i) · (1 + cos2π
2 2w
) > β} (2)
i=l−w
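The hanning-window density of Eq. (2) can be sketched as follows (a minimal illustration; the positions and window width are hypothetical, and a_m(i) is taken to be 1 where m occurs and 0 otherwise):

```python
import math

# Sketch of Eq. (2): density of word m around center position l using a
# hanning window of half-width w.
def density(positions, l, w):
    a = set(positions)                      # positions where m occurs
    return sum(0.5 * (1 + math.cos(2 * math.pi * (i - l) / (2 * w)))
               for i in range(l - w, l + w + 1) if i in a)

# m occurs at positions 48, 50, 52; a window centered at 50 scores high.
pos = [48, 50, 52]
d = density(pos, l=50, w=5)
print(round(d, 3))   # → 2.309
```

Occurrences near the center contribute close to 1 each, while occurrences near the window edges contribute close to 0, which is the smoothing the hanning window provides.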
where the function |K_l(x, g)| extracts the total number of k_i in K_l(x, g) in
slide x. K_l(y, g) is likewise a bag of words in slide y, satisfying the same
conditions as K_l(x, g) in Eq. (3). W_s is the bag of important words in the
slides: if |K_l(x, g)| for slide x is lower than |K_l(y, g)| for slide y in
Eq. (4), g is determined to be an important word in W_s.
If the sentences in the lines indented below a word d are deep in the level
indentation, these sentences are an important description of d in the slides,
called D_s. When d and the other words in slide x satisfy certain conditions, the
lower-level sentences L_s(x, d) of d are determined to be D_s of d.
4 Skeleton Generation
4.1 Detecting Expression Styles
To generate skeletons, a slide layout is used that consists of words arranged
according to expression styles, i.e., the roles of the words determined from the
differences between the important elements in the pre-existing text and its
slides. The difference between the importance of a word q in the slides and in
the text falls into 3 categories:
• tw1 : q ∈ Wt ∩ Ws , q is an important word in both the text and the slides.
• tw2 : q ∈ Wt , q is an important word in the text.
• tw3 : q ∈ Ws , q is an important word in the slides.
For the differences between the important descriptions of a word that appear in
the text and in the slides, we compute the similarity of the bags of words in the
important descriptions of q, D_t in the text and D_s in the slides, using the
Simpson similarity coefficient [12]: Sim(D_t, D_s) = |D_t ∩ D_s| / min(|D_t|, |D_s|).
Depending on this similarity, and on whether the text and slides contain one or
multiple important descriptions of q, they
fall into 6 categories. When Sim(D_t, D_s) ≥ 0.7, the important descriptions of q
in the text and in the slides have similar content, and there are 3 categories:
• td1 : one (multiple) descriptions of q in Dt corresponds to one (multiple) descrip-
tions of q in Ds .
• td2 : one description of q in Dt corresponds to multiple descriptions of q in Ds .
• td3 : one description of q in Ds corresponds to multiple descriptions of q in Dt .
When 0.3 ≤ Sim(D_t, D_s) < 0.7, the important descriptions of q in the text and
in the slides have some content in common but are not similar, which also falls
into 3 categories:
• td4 : one (multiple) descriptions of q in Dt has information in common with one
(multiple) descriptions of q in Ds .
• td5 : one description of q in Dt has information in common with multiple descrip-
tions of q in Ds .
• td6 : one description of q in Ds has information in common with multiple descrip-
tions of q in Dt .
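Taking Sim(D_t, D_s) = |D_t ∩ D_s| / min(|D_t|, |D_s|), a sketch of the six-way classification might look like this (the function names and the exact one-vs-multiple rule are assumptions; only the thresholds 0.7 and 0.3 come from the text):

```python
# Sketch: Simpson coefficient plus the six description categories.
# sim >= 0.7 selects td1-td3 (similar content); 0.3 <= sim < 0.7 selects
# td4-td6 (content partly in common); the count of descriptions picks within.
def simpson(dt, ds):
    return len(dt & ds) / min(len(dt), len(ds))

def description_category(dt_descs, ds_descs, dt_words, ds_words):
    sim = simpson(dt_words, ds_words)
    if sim >= 0.7:
        base = 0          # td1..td3
    elif sim >= 0.3:
        base = 3          # td4..td6
    else:
        return None       # no important-description correspondence
    if len(dt_descs) == len(ds_descs):
        return f"td{base + 1}"   # one-to-one or multiple-to-multiple
    if len(dt_descs) < len(ds_descs):
        return f"td{base + 2}"   # one in D_t, multiple in D_s
    return f"td{base + 3}"       # one in D_s, multiple in D_t

# One description in the text vs. two similar ones in the slides -> td2.
print(description_category(["d1"], ["s1", "s2"],
                           {"citation", "node"},
                           {"citation", "node", "graph"}))   # → td2
```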
From the differences between the important elements in the text and the slides,
we can find which words are emphasized and how they should be described, i.e.,
whether multiple descriptions are dispersed or one description is centered. In
the example shown in Fig. 2, the word "document" is dispersed across all sections
of Chapter 5, with some text segments having a high density of "document," and it
also appears frequently in the body text of slide a6 of Presentation 5.
"Document" is thus an important word in both the text and the slides (tw1), and
multiple important descriptions in the text correspond to one important
description in the slides (td3). We consider that slide a6 is concentrated when
it summarizes the
ES = (R, E) \quad (7)
R = (w_i, p_{w_i}) \quad (w_i \in W,\; p_{w_i} \in P) \quad (8)
W = W_t \cup W_s \quad (9)
P = \{ pw_1(tw_1, td_1), \cdots, pw_6(tw_1, td_6), \cdots, pw_{13}(tw_3, td_1), \cdots, pw_{18}(tw_3, td_6) \} \quad (10)
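Under Eqs. (7)–(10), the 18 patterns pw_1..pw_18 are simply the cross product of the three word categories tw1–tw3 and the six description categories td1–td6; this can be sketched as follows (the dictionary representation is an assumption):

```python
# Sketch of Eqs. (7)-(10): enumerate the 18 expression-style patterns and
# pair a word with its pattern, as R does.
from itertools import product

# pw1 = (tw1, td1), pw6 = (tw1, td6), pw13 = (tw3, td1), pw18 = (tw3, td6).
P = {f"pw{k + 1}": (f"tw{tw}", f"td{td}")
     for k, (tw, td) in enumerate(product(range(1, 4), range(1, 7)))}

def pattern_of(tw, td):
    """Look up the pattern name for a (tw, td) pair."""
    return next(name for name, pair in P.items() if pair == (tw, td))

# Example from the text: "document" is tw1 (important in both) and td3.
R = {"document": pattern_of("tw1", "td3")}
print(len(P), R["document"])   # → 18 pw3
```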
[Example skeleton slides 7–11, with bulleted layouts arranging the words
"citation," "relationship," "visualization," "analysis," "node," "text," and
"representation" at their expression-style levels.]
6 Concluding Remarks
In this paper, we proposed a skeleton-generation method that supports slide-
making based on the expression styles of words. We described in detail how
expression styles are determined by extracting the patterns that combine the
differences between the important words and the important descriptions of words in
texts and slides, respectively. To generate skeletons for slides from a target text, we
extracted the words in the target text that correspond to the words in pre-existing
text, and we then used the same expression styles of the words in the target text.
In the future, we plan to improve our algorithm for skeleton generation and to
evaluate it using a large set of actual presentation data. We also plan to enhance
our method for extracting corresponding words based on the document structures of
texts, not only in terms of sections but also in terms of paragraphs in a section.
References
1. SlideShare, http://www.slideshare.net/
2. CiteSeerX, http://citeseer.ist.psu.edu/index
3. Yokota, H., Kobayashi, T., Okamoto, H., Nakano, W.: Unified contents retrieval from an
academic repository. In: Proc. of International Symposium on Large-scale Knowledge
Resources (LKR 2006), pp. 41–46 (March 2006)
4. Kurohashi, S., Shiraki, N., Nagao, M.: A Method for Detecting Important Descriptions of
a Word Based on Its Density Distribution in Text. IPSJ (Information Processing Society
of Japan) 38(4), 845–854 (1997)
5. Mathivanan, H., Jayaprakasam, M., Prasad, K.G., Geetha, T.V.: Document summariza-
tion and information extraction for generation of presentation slides. In: Proc. of Inter-
national Conference on Advances in Recent Technologies in Communication and Com-
puting (ARTCOM 2009), pp. 126–128 (October 2009)
6. Beamer, B., Girju, R.: Investigating automatic alignment methods for slide generation
from academic papers. In: Proc. of the 13th Conference on Computational Natural Lan-
guage Learning (CoNLL 2009), pp. 111–119 (June 2009)
7. Yoshiaki, Y., Masashi, T., Katsumi, N.: A support system for making presentation slides.
In: Transactions of the Japanese Society for Artificial Intelligence, pp. 212–220 (2003)
(in Japanese)
8. Shibata, T., Kurohashi, S.: Automatic Slide Generation Based on Discourse Structure
Analysis. In: Dale, R., Wong, K.-F., Su, J., Kwong, O.Y. (eds.) IJCNLP 2005. LNCS
(LNAI), vol. 3651, pp. 754–766. Springer, Heidelberg (2005)
9. Kan, M.: Slideseer: A digital library of aligned document and presentation pairs. In:
Proc. of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2007, pp.
81–90 (2007)
10. Hayama, T., Nanba, H., Kunifuji, S.: Alignment between a technical paper and presenta-
tion sheets using a hidden markov model. In: Proc. of the 2005 International Conference
on Active Media Technology (AMT 2005), pp. 102–106 (May 2005)
11. Blackman, R.B., Tukey, J.W.: Particular Pairs of Windows. In: The Measurement of
Power Spectra, From the Point of View of Communications Engineering, pp. 95–101.
Dover, New York (1959)
12. Simpson, E.H.: Measurement of diversity. Nature 163, 688 (1949)
13. Hearst, M.A.: Search user interfaces, pp. 281–296. Cambridge University Press (Novem-
ber 2009)
Stochastic Applications for e-Learning System
Abstract. This paper considers the optimal interval for face-to-face study
support for a learner, and derives analytically the optimal study support
interval policy with a stochastic model using access log data from an e-learning
system. If a lecturer does not provide face-to-face study support to a student in
an e-learning system, the learner may drop out of the target subject. However, if
the lecturer provides such support to the learner every time, a problem arises
from the viewpoint of cost-effectiveness.
1 Introduction
High-efficiency education is a responsibility of every school and of society [1].
One measure of evaluation is whether education reaches a high target level with a
small educational investment. Because education is an external economy, it has so
far been considered unrelated to marginal utility in economics. If a lecturer
does not provide study support to a learner in the e-learning system, the learner
may often drop out of the study [2]. Such a problem causes study stagnation due
to changes in the learner's life environment, and as study stagnation is
prolonged, the student's motivation decreases [3]. Therefore, it is an important
role
Syouji Nakamura
Kinjo Gakuin University, 1723 Omori 2-chome, Moriyama-ku, Nagoya, 463-8521, Japan
e-mail: snakam@kinjo-u.ac.jp
Keiko Nakayama
Chukyo University, 101-2 Yagoto-Honmachi, Showa-ku, Nagoya, 466-8666, Japan
e-mail: nakayama@mecl.chukyo-u.ac.jp
Toshio Nakagawa
Aichi Institute of Technology, 11247 Yachigusa, Yakusa-cho, Toyota, 470-0392, Japan
e-mail: toshi-nakagawa@aitech.ac.jp
for the lecturer to provide study support that leads the learner to completion in the
e-learning system.
In this paper, when the lecturer uses the e-learning system, we consider how to
give the learner efficient study support. In general, the content of study support
differs depending on the lecture. Therefore, rather than the content, method,
or level of study support, we consider the frequency of the study support [4, 5]. In
addition, we assume that the learner's understanding level is proportional to the number
of access logs of the e-learning system. That is, in self-study with the e-learning system,
if a learner's accesses to the system increase, the learner's acquired items
increase; conversely, the learner acquires few items when there is little access to the
e-learning system. We use the study support history data in the e-learning system.
It is necessary to decide the study support frequency so as to reduce the workload of
the lecturer.
We apply the cumulative damage model [6, 7] to the study support of the e-learning
system, and derive analytically the optimal number of study supports. In the cumu-
lative damage model, shocks occur at random times, and damage such as
fatigue, wear, crack growth, creep, and dielectric breakdown is additive. In this pa-
per, we apply the cumulative damage model to the e-learning system by identifying
shocks with accesses to the e-learning system, damage with the learner's acquired
items, and the failure threshold with the credit threshold.
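This analogy can be made concrete with a short simulation: accesses (shocks) arrive at random times, each adds a random number of acquired items (damage), and the subject is completed once the cumulative total reaches the credit threshold K. The interarrival and increment distributions below are illustrative assumptions, not parameters from the paper.

```python
import random

def simulate_learner(threshold_k, mean_gap_days=2.0, mean_items=1.5, seed=1):
    """Cumulative damage analogy: accesses (shocks) occur at random times,
    each adds acquired items (damage), until the credit threshold K is met."""
    rng = random.Random(seed)
    t, total, accesses = 0.0, 0.0, 0
    while total < threshold_k:
        t += rng.expovariate(1.0 / mean_gap_days)   # time until the next access
        total += rng.expovariate(1.0 / mean_items)  # items acquired at this access
        accesses += 1
    return t, total, accesses

days, items, n = simulate_learner(threshold_k=10)
```

Running the simulation repeatedly with different seeds would estimate the distribution of the completion time, which is what the analytical model captures in closed form.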
2 e-Learning Model
Suppose that a learner should acquire a total number of items K (> 0) in the e-learning sys-
tem over the period [0, S], and that the number of acquired items at time t is At, which
is proportional to time t. If the number of acquired items at the final time
S is AS ≥ K in the e-learning system, then the study support is effective. Con-
versely, if the number of acquired items at time S is AS < K, then the
study support is ineffective. As for the learner's acquisition of the learning items,
we consider that the understanding time per item is not constant because of
the learner's characteristics and the lecturer's teachability. Thus, we assume that A is
a random variable with mean E{A} = a > 0 and probability distribution function
G(x) ≡ Pr{A ≤ x}.
The probability that a learner can achieve the number of items K over the period S is Pr{AS ≥ K} = 1 − G(K/S).
We seek an optimal number N ∗ that minimizes C2 (N) in (9), i.e., the minimum
number N ∗ by which the lecturer can support a learner. From Nb ≤ K, it is clear that
N ≤ K/b. Thus, we may obtain N ∗ for N = 0, 1, 2, . . . , [K/b].
Letting Nb ≡ x (0 ≤ x ≤ K), from (9),
C̃2(x) ≡ bC2(N)/c2 = (bc1/c2) G((K − x)/S) + x.  (10)
Clearly,
C̃2(0) = (bc1/c2) G(K/S),
C̃2(K) = K.
Differentiating C̃2(x) with respect to x and setting it equal to zero,
g((K − x)/S) = c2S/(c1b),  (11)
where g(x) is the density function of G(x).
4 Numerical Examples
As numerical examples, we consider two cases where A has a Weibull distribution
and a normal distribution.
Equation (9) is
C2(N)/c2 = (c1/c2){1 − exp[−((10 − N)/(10/Γ(1 + 1/m)))^m]} + N  (N = 0, 1, 2, . . . , 10).  (14)
where φ(x) = (1/√(2π)) e^(−x²/2).
(i) If 1/(σ√(2π)S) ≤ c2/(bc1), then x∗ = 0, i.e., N ∗ = 0.
(ii) If 1/(σ√(2π)S) > c2/(bc1), then there exists a finite and unique x∗ (0 < x∗ < K)
which satisfies (19), and N ∗ = [x∗/b] or N ∗ = [x∗/b] + 1. If N ∗ > K/b, then N ∗ = K/b.
In particular, when N = 0,
C(0) = c1/2.  (20)
Table 1 shows that if c1/c2 becomes large, then the lecturer should increase the
frequency of study support. For example, when c1/c2 = 15, the lecturer should make
N ∗ = 10 supports for m = 1 and N ∗ = 5 supports for m = 2. That is, when the ratio
of cost c1 to c2 is 15, i.e., c1/c2 = 15, we should provide study support almost every
week for m = 1 and every 2 weeks for m = 2.
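The optimal N∗ in the Weibull case can be found by evaluating (14) directly for each candidate N and taking the minimizer. A minimal sketch (assuming, as in the numerical example, K = 10 and b = 1; r stands for the cost ratio c1/c2):

```python
import math

def cost_ratio(n, r, m):
    """C2(N)/c2 from Eq. (14): r = c1/c2, m = Weibull shape, K = 10, b = 1."""
    scale = 10.0 / math.gamma(1.0 + 1.0 / m)
    return r * (1.0 - math.exp(-(((10 - n) / scale) ** m))) + n

def optimal_n(r, m):
    """Brute-force minimizer of C2(N)/c2 over N = 0, 1, ..., 10."""
    return min(range(11), key=lambda n: cost_ratio(n, r, m))
```

For example, optimal_n(15, 2) searches the eleven candidate support counts for c1/c2 = 15 and m = 2.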
Table 1 Optimal number N∗ of study supports (Weibull case)

c1/c2   m = 1   m = 2
 5        0       0
10        0       0
15       10       5
20       10       6
30       10       8
40       10       8
50       10       9
Table 2 Optimal number N∗ of study supports (normal case)

c1/c2   σ = 1   σ = 2
 2        0       0
 5        3       0
10        5       5
15        6       8
20        6       9
30        7      10
40        7      10
50       10      10
5 Conclusion
In the proposed model, the optimal study support interval can be determined ana-
lytically from the accumulated accesses to the e-learning system. However, we do not
consider the form and content of the study support, although a support method
could be tailored to the learner's acquisition progress. In addition, it is a problem
that every support is assumed to incur the same cost regardless of its content.
As future work, we should analyze the characteristics of the distribution of
learners' acquisition situations and the cost of the learners' study support, and compare
them.
Acknowledgements. The authors would like to thank the Ministry of Education, Culture,
Sports, Science and Technology for financial support through the Grant-in-Aid for Scientific
Research (C), Grant No. 21530318 (2009-2011) and Grant No. 22500897 (2010-2012).
References
1. Obara, Y.: ICT Use in University Classes. Tamagawa University Press (2002) (in Japanese)
2. Ueno, M.: Knowledge Society in E-learning. Baifukan (2007) (in Japanese)
3. Yamashita, J., et al.: Development of Aiding System for the Support for Distance Learners.
JSiSE Research Report 23(2), 41–46 (2008) (in Japanese)
4. Nakamura, S., Nakayama, K., Nakagawa, T.: The optimal study support interval policy in
e-learning. Discussion Paper, Institute of Economics Chukyo University. No. 0812 (2009)
5. Nakayama, K. (ed.), Nakamura, S., Nakagawa, T., et al.: Associated Economics Field:
Stochastic Process and Education. Keisou Syobou (2011) (in Japanese)
6. Nakagawa, T.: Maintenance Theory of Reliability. Springer (2005)
7. Nakagawa, T.: Shock and Damage Models in Reliability Theory. Springer (2007)
8. Takács, L.: Stochastic Processes. Wiley (1960)
9. Barlow, R.E., Proschan, F.: Mathematical Theory of Reliability. Wiley, New York (1965)
Supporting Continued Communication with
Social Networking Service in e-Learning
1 Introduction
In recent years, educational research has focused on the use of computer-mediated
communication (CMC) (Kato, S. & Akahori, K., 2004; Joinson, A.N., 2001).
Research has demonstrated that CMC in the teaching-learning process creates
more flexible communication patterns (Berge & Collins, 1996; Heller & Kearsley,
1996; Ruberg, Moore, & Taylor, 1996). CMC allows students to interact with their
Kai Li
Research Center for Agrotechnology and Biotechnology, Toyohashi University of
Technology, Japan
e-mail: kaili@recab.tut.ac.jp
Yurie Iribe
Information and Media Center, Toyohashi University of Technology, Japan
e-mail: iribe@imc.tut.ac.jp
instructors and peers at a time that is convenient for them and may increase stu-
dent responsibility and self-discipline (Berge & Collins, 1996; Hsi & Hoadley,
1997). CMC can also equalize participation by masking social cues and cultural
differences (Berge & Collins, 1996; Hsi & Hoadley, 1997).
A social network service has been defined as a web-based service that allows individu-
als to construct a public or semi-public profile within a bounded system, articulate
a list of other users with whom they share a connection, and view and traverse
their list of connections and those made by others within the system (boyd, d. m.,
& Ellison, N. B., 2007). Social network services focus on building online
communities of people who share interests and/or activities, or who are interested
in exploring the interests and activities of others (Gross, R., Acquisti, A., 2005).
Recently, social networking services (SNS) have been introduced as a new
CMC mode among university students (Sagayama, K. et al., 2008; Tokuno, J. et
al., 2007; Umeda, K. et al., 2007). Most of these studies reported that SNS was ef-
fective in supporting communication between university students and teachers,
but few studies have focused on whether SNS is effective in supporting con-
tinued communication among adult students. In our previous study, we found that adult
students engage in more private than public communication in SNS.
In this study, we focus on whether SNS can support continued
communication among adult students, and we compare the communication
activities in three course periods.
2 Project Background
This study concerns a blended learning project combining classroom lectures and
e-learning courses. All the students are adults who are interested in IT and agricultural
technology. The learning period of the project is about 18 months. At the beginning of the
project, the students have two months of classroom lectures held only on
weekends, and then they take 12 e-learning courses at home over the following 8 months.
Finally, they have one month of on-site training in different places. After the first-
year students graduate, the second-year students follow the same learning
program as the first-year students.
In order to support communication between students and instructors, the
Sendoshi-SNS was developed (see figure 1). It was developed based on the
open source software OpenPNE (Official site, 2008). OpenPNE is a social
networking service engine providing SNS functions such as profile, diary,
footprint, message, community, ranking, etc.
[Fig. 1: the Sendoshi-SNS homepage, showing functions such as Home, Add diary, List all, Ranking, Footprint, Send message, and Friends' diary.]
No one other than the students and the staff could log in to the SNS, and all of the
users were asked to use their real names in their profiles so that they could be
identified.
Fig. 2 Hit count of functions of my homepage, friends’ diary, footprint, etc. (Vertical axis is
hit count)
Table 1 Average hit count for functions of home, diary, etc. among the three years of students
Fig. 3 Hit count in different functions by the first-year students in three years (Vertical axis
is hit count)
The results show that the first-year students had more communication
activities in the first course period (see figure 3). In particular, there is a significant
difference in reading friends' diaries (F(2,75) = 5.52, p = 0.006 < 0.05) (see table 2).
They read significantly more friends' diaries in the first course period
(M = 606.42) than in the third course period (M = 178.29) (p = 0.007 < 0.05). There are no
significant differences in activities such as listing diaries, ranking, etc. Although they had
no learning activities in the second and third course periods, we could still
find some communication activities in the SNS, from which we conclude
that the first-year students maintained continued communication across the three course
periods even after they graduated.
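The test reported above is a one-way ANOVA of hit counts across the three course periods. For reference, the F statistic can be computed from grouped per-student counts as below; this is a generic sketch, not the authors' analysis script, and the data passed in would come from the SNS access logs.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square divided by
    within-group mean square, as used to compare hit counts across periods."""
    k = len(groups)                              # number of groups (periods)
    n = sum(len(g) for g in groups)              # total number of observations
    grand = sum(sum(g) for g in groups) / n      # grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

With three groups of 26 students each, the degrees of freedom are (2, 75), matching the F(2,75) reported in the text.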
Table 2 The first-year students' average hit count in functions of home, diary, etc. in three
years
Fig. 4 Hit count in different functions by the second-year students in two years (Vertical
axis is hit count)
Table 3 The second-year students' average hit count in functions of home, diary, etc. in two
years
4 Discussion
References
(1) Kato, S., Akahori, K.: Influences of Past Postings on a Bulletin Board System to New
Participants in a Counseling Environment. In: Proceedings of ICCE 2004, pp. 1549–
1557 (2004)
(2) Joinson, A.N.: Self-disclosure in computer-mediated communication: The role of self-
awareness and visual anonymity. European Journal of Social Psychology 31, 177–192
(2001)
(3) Berge, Z., Collins, M.: Computer mediated communication and the online classroom:
Overview and perspectives. In: Collins, B. (ed.) Computer Mediated Communication,
vol. I, pp. 129–137. Hampton, New Jersey (1996)
Supporting Continued Communication with SNS in e-Learning 577
(4) Heller, H., Kearsley, G.: Using a computer BBS for graduate education: Issues and
outcomes. In: Berge, Z., Collins, M. (eds.) Computer-Mediated Communication and
the Online Classroom. Distance Learning, vol. III, pp. 129–137. Hampton Press, NJ
(1996)
(5) Ruberg, L., Moore, D., Taylor, D.: Student participation, interaction, and regulation in
a computer-mediated communication environment: A qualitative study. Journal of
Educational Computing Research 14(3), 243–268 (1996)
(6) Hsi, S., Hoadley, C.: Productive discussion in science: Gender equity through elec-
tronic discourse. Journal of Science Education and Technology 6(1), 23–36 (1997)
(7) Boyd, D., Ellison, N.: Social Network Sites: Definition, History, and Scholarship.
Journal of Computer-Mediated Communication 13(1) (2007)
(8) Sagayama, K., Kume, K., et al.: Characteristics and Method for Initial Activity on
Campus SNS. In: Proc. of ED-MEDIA 2008, pp. 936–945 (2008)
(9) Tokuno, J., Sakurada, T., Hagiwara, Y., Akita, K., Terada, M., Miyaura, C.: Devel-
opment of a Social Networking Service for Supporting Alumnae’s Re-challenge. IPSJ
SIG Technical Report, 2007-CE 91(10), 53–60 (2007)
(10) Umeda, K., Naito, Y., Nozaki, H., Ejima, T.: A study of university student communi-
cation using SNS Web diaries. In: Supplementary Proc. of ICCE 2007 (WS/DSC),
Hiroshima, Japan, vol. 2, pp. 315–320 (2007)
(11) OpenPNE Official Site (2008) (in Japanese), http://www.openpne.jp/
(12) Gross, R., Acquisti, A.: Information Revelation and Privacy in Online Social Net-
works (The Facebook case). In: Proceedings of WPES 2005, pp. 71–80. Association
of Computing Machinery, Alexandria (2005)
(13) Archer, J.L.: Self-disclosure. In: Wegner, D., Vallacher, R. (eds.) In the self in social
psychology, pp. 183–204. Oxford University Press, London (1980)
Tactile Score, a Knowledge Media of Tactile
Sense for Creativity
1 Introduction
When perceiving an apple, one cannot precisely confirm whether everybody per-
ceives the apple in the same manner. However, for visual and auditory
perception, one can share comparable sensations with others to some extent. Given
the name Mona Lisa, we can recall the identical painting; we can also
hum the same theme of Beethoven's Symphony No. 5. On the other
hand, tactile perception has no way to be visualized or recorded in regeneratable
formats, like paintings and musical scores, to be shared with others. Tactile perception is
among the important senses, along with visual and auditory perception [1].
Yasuhiro Suzuki
Department of Complex Systems Science, Graduate School of Information Science, Nagoya
University
Furocho, Chikusa, Nagoya, Japan
e-mail: ysuzuki@nagoya-u.jp
Junji Watanabe
NTT Communication Laboratories, 3-1 Morinosato Wakamiya Atsugi-shi, Kanagawa
243-0198, Japan
Rieko Suzuki
Tokyo Face Therapie, 2-3-4 Koishikawa Bunkyo Tokyo Japan
Tactile perception has mainly been the subject of cognitive science and psychol-
ogy studies and has been applied in engineering. In particular, massage has been stud-
ied as a part of alternative medicine, and its effects on relaxation and activation of
immunocytes have been widely recognized. This study proposes a visualization
method of tactile stimulation in a regeneratable manner, and analyzes the effect of
massage.
2 Tactile Score
To visualize and investigate the method of massage, we take massaging to be com-
posed of pressure, the area of touching, and the velocity of the movement of hands.
In staff notation of the tactile score, we define the third line as the basic pressure;
the basic pressure is the pressure when we hold a baby or an expensive jewel very
carefully. Hence, the basic pressure is not defined absolutely but may change from
person to person or for different types of massage. For the tactile score, we de-
fine the pressure strength as the difference in pressure from the basic pressure. We
define stronger pressure as downward from the third line in the staff notation and
weaker pressure as upward from the third line (Figure 2). We also define the part
of the hand used in massage and the kind of stroke when massaging (see Figure
2). For example, the fingertip to the first joint is 1, the second joint is 2, the third
joint is 3, the upper part of the palm is 4, the center of the palm is 5 and the bottom
of the palm is 6; when we stroke from a fingertip to the third joint, this is denoted
as “1-3”.
For massage strokes, we analyse the method of massaging in face therapy and
extract strokes; we symbolize each stroke as A, a, N, n, etc. For example, the symbol
A stands for the massage stroke of drawing a circle on the cheek. In this notation,
for example, A5 illustrates drawing a circle on the cheek with the center of
the palm. The tactile score in this contribution is the basic version in which each
musical note denotes massage by both hands and we denote a gap in hand mo-
tion with a special mark above the staff notation; 1 denotes both hands moving the
same, 2 indicates a small gap between hands and 3 indicates a large gap between
hands.
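For processing, this notation lends itself to a simple machine-readable form. The sketch below is our own illustrative encoding, not part of the proposal: one tactile note carries a stroke symbol, the part of the hand used, and the gap mark between hands.

```python
from dataclasses import dataclass

@dataclass
class TactileNote:
    stroke: str     # stroke symbol, e.g. "A" = drawing a circle on the cheek
    hand_part: int  # 1-6: fingertip first joint through bottom of the palm
    gap: int = 1    # 1 = hands move the same, 2 = small gap, 3 = large gap

def parse_note(text, gap=1):
    """Parse a symbol such as 'A5' (stroke A with the center of the palm, 5)."""
    return TactileNote(stroke=text[0], hand_part=int(text[1:]), gap=gap)
```

A sequence of such notes, one per beat of the staff notation, would then represent a whole tactile score.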
Fig. 1 Top: Strokes of massaging on a face; these strokes are obtained from massage expe-
riences in beauty shops; strokes that pass uncomfortable areas have been excluded. Bottom:
Usage of part of the hand.
Fig. 2 Top: An example of a tactile score, with special marking above the staff notation; 1
denotes both hands moving the same, 2 indicates a small gap between hands, and 3 indicates
a large gap between hands; the slur-like marks illustrate a unit component of massaging,
the integral-like marks illustrate releasing pressure, and the breath-like mark corresponds to a
short pause in massaging, much like a breath in playing music. Bottom: Schematic expression
of the change of pressure and areas of touching, where the size of each circle illustrates the area
of touching and the solid line illustrates the change of pressure.
Fig. 3 Map of the 42 basic components in the principal component space, where the horizon-
tal axis illustrates the first and second principal components and the vertical axis illustrates
the third principal component.
Fig. 4 The result of a time series of basic massage components, where each bidirectional
arrow illustrates possible transitions between basic groups. Groups IV and V (indicated by
circles) are intermediate groups; they mediate transitions between groups.
Constant = S × P × V,
where S is the area of touching, P is the pressure of the massage, and V is the velocity of
hand movement. In massaging, we feel comfortable when S, P, and V are changed
in an oscillatory way while preserving this relation. For instance, draw a circle on the back of
your hand strongly using your fingertip, and then draw a circle on the back of your
hand using your palm with the same movement velocity; if you massage with the
same pressure, it will not be comfortable, but if you massage more softly, it will be
comfortable.
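This relation can be read as a rule for adjusting one parameter when another changes; for example, solving Constant = S × P × V for the pressure (an illustrative helper, not from the paper):

```python
def comfortable_pressure(area, velocity, constant=1.0):
    """Pressure that keeps S * P * V constant: massaging with a larger
    touching area (e.g. the palm instead of a fingertip) at the same
    velocity requires softer pressure to stay comfortable."""
    return constant / (area * velocity)

# Doubling the touching area at the same velocity halves the pressure.
p_fingertip = comfortable_pressure(area=1.0, velocity=1.0)
p_palm = comfortable_pressure(area=2.0, velocity=1.0)
```

This matches the fingertip-versus-palm example above: the larger contact area calls for softer pressure.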
4 Conclusion
In this contribution, we proposed a visualization and describing method, the tactile
score, for massages. The tactile sense, especially a complex tactile sense such as massage,
is invisible, and there has been no way to visualize or record it in regeneratable formats
to share with others. Studies of tactile perception have centered on the generation
of tactile stimulation (i.e., the receiving end), and how to touch has not been discussed
much. We believe that this visualization method is useful for analysing the tactile sense
and for designing and showing complex tactile senses. For example, by visualizing the tactile
sense using the tactile score, we can apply it to computational aesthetics [3],
[2] and 'compose' a massage as if composing music. In this example, we compose
a “motif” of massaging with a simplified version of the tactile score (Figure 5), and
we delete the leftmost tactile note and add a new one in the rightmost position;
Fig. 5 A motif of massage expressed using a simplified version of the tactile score
Fig. 6 From the motif, we delete the leftmost note and insert a new one at the rightmost position
we repeat this operation three times and compose the tactile score for four measures
(Figure 6). Then we transform the obtained simplified tactile score into a score with
tactile notes while keeping the relational expression, and we obtain the new tactile
score (Figure 7).
References
1. Leeuwenberg, E.: A perceptual coding language for visual and auditory patterns. American
Journal of Psychology 84(3), 307–349 (1971)
2. Bense, M.: Aesthetica. Einführung in die neue Aesthetik (1965)
3. Scha, R., Bod, R.: Computationele esthetica. Informatie en Informatiebeleid 11(1), 54–63
(1993)
Taxi Demand Forecasting Based on Taxi Probe
Data by Neural Network
Abstract. The taxi is a flexible transportation system that allows everyone to move to
any destination. However, in Japan, the charge for a taxi is more expensive than for
other transportation facilities. The taxi business is in a very tough situation because
the cost of crude oil increased suddenly, in addition to the influence of the over-
supply in the taxi market. Recently, the application of information technologies
has advanced in taxi industries (e.g., fare payment by non-contact IC cards and car
navigation systems). One of the technologies gaining such attention is a probe
system, which can store a large amount of customer trajectory data. The probe sys-
tem will improve the profitability of taxi companies if future demand can
be forecasted from the statistics. Therefore, in this paper, we try to forecast the taxi
demands from the taxi probe data by a neural network (i.e., a multilayer perceptron).
First, we analyze the statistics of the taxi demands and make the training data set for
the neural network. Then, back-propagation learning is applied to the neural net-
work to reveal the relationship of regions in Tokyo (i.e., the 23 wards, Mitaka-shi,
and Musashino-shi). Finally, we report our discussion about the result.
1 Introduction
Recently, transportation systems in Japan such as taxis and trains have introduced
information technologies in diverse ways. For example, most taxis are equipped with
car navigation systems that show the way to your destination, and we can
Naoto Mukai
Culture-Information Studies, School of Culture-Information Studies, Sugiyama Jogakuen
University, 17-3, Hoshigaoka-motomachi, Chikusa-ku, Nagoya, Aichi, 464-8662, Japan
e-mail: nmukai@sugiyama-u.ac.jp
Naoto Yoden
Dept. of Electronic Telecommunications, Matsumoto-denryokusho, Tokyo Electric Power
Co.Inc., 1-1-17, Chuou, Matsumoto, Nagano, 390-0811, Japan
e-mail: utinokoinu@msn.com
pass through ticket gates in train stations simply by holding our non-contact IC cards over a scanner.
One of the technologies for transportation systems in the spotlight is
the taxi probe system, which provides historical data of taxis (i.e., latitude and longitude
when a taxi picks up a customer). Taxi probe data is just beginning to be applied to
a variety of uses. Nakajima et al. aimed at the improvement of road time tables by
using taxi probe data [4]. They discussed how to deal with weekly and seasonal factors
in the traveling time of roads. Taguchi et al. analyzed the relationship between taxi
behaviors and the characteristics of region and time [6], and found a macro model
of customers in Sendai, Japan. Furthermore, there has been considerable research
on probe data [2, 5].
In this paper, we try to forecast taxi demands by using the taxi probe data.
Transport demand forecasting is a very important factor for transportation systems
because more precise prediction can improve their profits. In particular, on-demand
systems such as demand-bus and car sharing systems are very sensitive to the pre-
diction accuracy of transport demands. For example, with accurate forecasts, the allocation
of taxis can be controlled adequately, and the time required to transport customers can be reduced.
A typical forecasting approach is the neural network, which has been used for demand
forecasting [3, 1, 7]. We also adopt a neural network (i.e., a multilayer perceptron)
for our objective. First, we analyze the statistics of the taxi probe data and make
the training data set for the neural network. Then, back-propagation learning is
applied to the neural network to reveal the relationship of regions in Tokyo (i.e.,
the 23 wards, Musashino-shi, and Mitaka-shi).
The remainder of this paper is as follows. Section 2 shows the format of the taxi probe
data. Section 3 defines the training data set for a neural network. Section 4 reports
our results and discussion. Section 5 describes conclusions and future work.
In [6], Taguchi et al. indicated that taxi demands depend strongly on the day of the
week (i.e., weekday or holiday). Thus, we considered three input patterns, “one-
day, three-days, and seven-days”, shown in Table 3, for the day of the week.
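The three patterns correspond to day-of-week encodings of different granularity. The sketch below is one plausible reading; the exact grouping of days is defined by Table 3, so the groupings used here are assumptions for illustration.

```python
def encode_day(weekday, pattern):
    """Encode day of week (0=Mon .. 6=Sun) as input neurons.
    The groupings are illustrative assumptions; Table 3 of the paper
    defines the actual patterns."""
    if pattern == "one-day":          # single neuron: weekday vs. weekend
        return [1.0 if weekday < 5 else 0.0]
    if pattern == "three-days":       # weekday / Saturday / Sunday groups
        group = 0 if weekday < 5 else (1 if weekday == 5 else 2)
        return [1.0 if i == group else 0.0 for i in range(3)]
    if pattern == "seven-days":       # one-hot over the seven days
        return [1.0 if i == weekday else 0.0 for i in range(7)]
    raise ValueError(pattern)
```

The coarser patterns use fewer input neurons, which is the trade-off evaluated in the experiments below.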
We assumed that taxi demands also depend on the weather; in fact, the number
of taxi users increases in poor weather conditions. The Japan Meteorological Agency
provides past amounts of precipitation in Japan 2 . Table 4 shows the amount
2 Japan Meteorological Agency: http://www.jma.go.jp/jma/index.html
of precipitation and the air temperature in Tokyo on February 27, 2009, which were
obtained from the website. We set one neuron in the input layer; if rainfall is
observed, the input value is 1; otherwise, it is 0.
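The forecasting model itself is a standard multilayer perceptron trained by back-propagation. The following NumPy sketch mirrors the setup at a small scale; the layer sizes, learning rate, and synthetic data are illustrative, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, W1, b1, W2, b2):
    h = sigmoid(X @ W1 + b1)          # hidden activations
    return h, sigmoid(h @ W2 + b2)    # forecast demand, scaled into [0, 1]

# Toy data: 7 day-of-week neurons plus 1 rain neuron per sample;
# the target is a demand count scaled into [0, 1].
X = rng.random((64, 8))
y = rng.random((64, 1))

# One hidden layer of 50 neurons, as in the best-performing network.
W1 = rng.normal(0.0, 0.5, (8, 50)); b1 = np.zeros(50)
W2 = rng.normal(0.0, 0.5, (50, 1)); b2 = np.zeros(1)
lr = 0.5

_, out0 = forward(X, W1, b1, W2, b2)
mse_before = float(((out0 - y) ** 2).mean())

for _ in range(500):                  # full-batch gradient descent
    h, out = forward(X, W1, b1, W2, b2)
    d_out = (out - y) * out * (1 - out)   # output-layer delta
    d_h = (d_out @ W2.T) * h * (1 - h)    # hidden-layer delta
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X);   b1 -= lr * d_h.mean(axis=0)

_, out1 = forward(X, W1, b1, W2, b2)
mse_after = float(((out1 - y) ** 2).mean())
```

With real probe data, X would hold the demand, day-of-week, and precipitation inputs described above, and y the demand in the next time zone.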
that the neural network with 4-hour time zones and 50 neurons in the hidden layer outperforms
the other networks. From here, we focus on the results of this best-performing
neural network.
Figure 4 shows the comparison of the input patterns shown in Table 5. First, when
comparing the input patterns for the day of the week (i.e., one-day, three-days, and seven-days),
the seven-days pattern reduces the error effectively. This fact implies that the taxi de-
mands vary widely according to the day of the week. Moreover, three-days is superior
to one-day in the cases of PT 2 and PT 3, but inferior in the cases of PT 5 and PT 6.
A possible cause is that the seven-days pattern overlaps the three-days pattern, and redundant
neurons may negatively affect the result. Next, we found that the weather information
about precipitation is ineffective. We considered only whether rain falls or not;
we should have considered the amount of precipitation. We would like to
address this problem as a future task.
Figure 5 shows the error comparison in each region. The result indicates that
the error of Chuo-ku is the least among the regions. Chuo-ku is positioned roughly in
the center of Tokyo's 23 wards. The population of Chuo-ku is the second smallest,
but its daytime population increases sharply because most areas in Chuo-ku are business
zones. Thus, taxi demands occur periodically within a day. On the other hand, Edogawa-
ku is the worst among the regions. Edogawa-ku is positioned at the east end of
Tokyo's 23 wards. Many younger families live in the town because it has
easy access to the center of Tokyo (there are five railways and subways). Thus, taxi
demands are small and non-periodic. We think that these features of the regions
determine whether the demand forecasting is effective or not.
Figure 6 shows the comparison of time zones in a week. We found that the error on
weekdays is relatively small compared to weekends. Most businessmen work on
weekdays; thus, taxi demands occur periodically, as previously described. Moreover,
the error during 4:00-8:00 is small because the number of demands during that
time is also small.
5 Conclusion
In this paper, we considered demand forecasting for taxis by a neural network. Taxi
probe data, which contains historical data of taxis (i.e., latitude and longitude when
a taxi picks up a customer), was used as the training data set for the neural network. We
adopted three kinds of input data for the neural network: “demands in each region”, “day of the week”,
and “amount of precipitation”, and evaluated their effects.
We found that the day of the week is an important factor for demand forecasting because
the demands occur periodically in a week. Furthermore, demand forecasting in a
business town like Chuo-ku is easier than in a commuter town like Edogawa-ku.
However, the amount of precipitation was ineffective because we considered only whether
rain falls or not. Therefore, we must consider how to deal with weather informa-
tion or other events (e.g., festivals).
Acknowledgements. We appreciate the provision of the taxi probe data by System Origin
Corporation. This work was supported by Grant-in-Aid for Young Scientists (B).
References
1. Araki, H., Kimura, A., Arizono, I., Ohta, H.: Demand forecasting based on differences of
demands via neural networks. Journal of Japan Industrial Management Association 47(2),
59–68 (1996)
2. Kanazawa, F., Sawada, Y., Wakatsuki, T., Iwasaki, K.: Applying the probe data, accumu-
lated by the its-spot, to the road governance. In: Proceedings of ITS Symposium 2011, pp.
73–76 (2011)
3. Kimura, A., Arizono, I., Ohta, H.: An application of layered neural networks to demand
forecasting. Journal of Japan Industrial Management Association 44(5), 401–407 (1993)
4. Nakajima, Y., Makimura, K.: Study on improvement of road time table using taxi probe
vehicle data. Journal of Japan Society of Civil Engineering 29 (2004)
5. Nishimura, S., Suzuki, K., Kobayashi, M., Matsumoto, O.H.H., Nagashima, Y.: An im-
provement on traffic signal control through use of probe vehicle data. In: Proceedings of
ITS Symposium 2010, pp. 365–370 (2010)
6. Taguchi, K., Yoshida, S., Sadohara, S.: Time spatial analysis of taxi demand using probe.
In: Summaries of Technical Papers of Annual Meeting Architectural Institute of Japan,
vol. 2009, pp. 519–520 (2009)
7. Xu, J.X., Lim, J.S.: A new evolutionary neural network for forecasting net flow of a car
sharing system. In: Proceedings of IEEE Congress on Evolutionary Computation 2007,
pp. 1670–1676 (2007)
The Design of an Automatic Lecture Archiving
System Offering Video Based on Teacher’s
Demands
1 Introduction
As the use of audio/visual equipment spreads, many educational facilities have
begun to develop and apply automatic lecture archiving systems [1][2][3].
Some systems can record not only the teacher but also the lecture slides, and output a
combined video [4].
Many videos recorded by an archiving system can be used by students for review.
For teachers, these videos can be used to create teaching material.
However, because a video is often long, it can take significant time for a teacher
to edit it, deleting unnecessary sections, subdividing it into sections, etc.
Such edited lecture videos are authoritative and easy to use, but
only at the human cost of the editing. Methods for creating videos for archiving
systems have been studied by many research groups [5][6]. These studies have
proposed methods for creating videos without this human cost, including
rule-based control of camera recording and automatic construction of the video
from videos of the teacher's action patterns.
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 599–608.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
600 S. Yamaguchi, Y. Ohnishi, and K. Nishino
We focus on teacher input into improving archiving systems. The main requests
by teachers at our university are
In this section, we explain the analysis of the audio files recorded by the
lecture-archiving system. The system records the lecture sound as a Broadcast
Wave Format (BWF) file. A BWF file stores the sound in an uncompressed
format.
First, we analyze the sound files for 14 programming lectures presented at our
university. These lectures contain explanations by the teacher and exercises using a
computer. Figure 2 shows the sound records for the 14 programming lectures. The
deeply colored sections indicate that sound is recorded, i.e., that the teacher is
explaining. Each lecture lasts 3 h. We can see that explanatory sections and exercise
sections are divided clearly in these lectures. Note that the orientation lecture occurs
first, with the teacher talking for much of the time and ending the lecture earlier than
usual.
While the students are exercising, the teacher does not do much explaining.
Figure 2 shows some deeply colored vertical lines within the silent sections,
indicating that the teacher offers hints and short messages of a few words during the exercises.
The archiving system will not need to record these user-exercise sections. If the
archiving system can delete the silent sections from the videos, these videos will
become more useful for users.
In Figure 2, the speech sections and silent sections are separated clearly. From
the videos of this lecture, we can assume that separating the required sections
from those that are not required is easy.
However, because the archiving system does not recognize when an exercise
starts, it is difficult to remove these exercise sections while recording the lecture.
We therefore consider removing these sections from the video after the recording.
Next, we describe the identification of silent sections in the video. In fact, the
“silent” sections shown in Figure 2 are not always completely silent. We can see
some small audio signals by expanding the waveform.
Moreover, at the start of each speech section the volume increases little by little. We
therefore measure the value of the sound signal for every second of the video, as
shown in Figure 3. We check the maximum and minimum values for each 1 sec
period, and record the differences. A section of the record is:
This section of the record includes the start of some talking by the teacher. The
difference values are around 10 in the initial silent section, and then increase greatly
from the 5th record on. From these records, we can assume that silent sections will
have difference values of 100 or less and we use this threshold in our automatic
editing program. The program analyzes the BWF file and generates a new BWF file
without the silent sections. The algorithm for the automatic editing program is as
follows:
• Read the header of the BWF file and record it in a new BWF file.
• Record to an information file the times for which the difference value in each 1
sec sample is over 100.
• Copy to the new BWF file the waveform data for only those times recorded in
the information file.
• Update the header of the new BWF file to match the new recording time.
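The steps above can be sketched directly. The following is a minimal illustration, not the authors' implementation, assuming a 16-bit mono PCM file (a real BWF file is a WAV file with an extra broadcast-extension chunk, which Python's `wave` module skips); the function name and parameters are ours, with the paper's threshold of 100 as the default:

```python
import array
import wave

def remove_silence(src, dst, threshold=100):
    with wave.open(src, "rb") as r:
        rate = r.getframerate()
        kept = bytearray()
        while True:
            # Read one second of audio at a time.
            frames = r.readframes(rate)
            if not frames:
                break
            samples = array.array("h", frames)  # 16-bit signed samples
            # Keep the slice only if its max-min difference exceeds
            # the threshold, i.e., it is not a "silent" second.
            if samples and max(samples) - min(samples) > threshold:
                kept.extend(frames)
        with wave.open(dst, "wb") as w:
            # The wave module writes the header first and patches the
            # frame count (the new recording time) when the file closes.
            w.setnchannels(r.getnchannels())
            w.setsampwidth(r.getsampwidth())
            w.setframerate(rate)
            w.writeframes(bytes(kept))
```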
Table 1 shows the result of this analysis and editing by the program. We can
confirm that our program deletes the silent sections from the BWF file and reduces
its length by about a half. The right-hand column shows the corresponding length of
videos edited by a student. The student was employed by our research group to
support the creation of teaching material. The student watches the video and
extracts only the sections from the video that are required by the archiving system.
(The first lecture was not edited by the student, because it contained only
orientation material.)
The analysis and editing time for the student required 70–90 min per video,
whereas the analysis time for our program required 2–3 min. (The encoding time for
the two methods has not been included.) From Table 1, we can confirm that there is
not much difference in the output video’s length for the two methods. The
student-edited file is slightly shorter than that from our analysis program. This is
because the student judged some sections for which the teacher was speaking not to
be related to the lecture, and therefore deleted them from the video. We expect that
our program would include all such sections that the student omitted.
If our program deletes the same sections from the BWF file as the student-edited
video, we are then able to use the results of the analysis effectively.
We improved our program based on these rules, and analyzed the video for the 14
lectures. Figure 4 illustrates an example of our results by showing three rows, where
the first row is the sound waveform for the 13th lecture. As in Figure 2, the deeply
colored sections show where sound has been recorded. The second row gives the
analysis results from our program. The deeply colored sections show the sections to
be encoded that our program identified. The third row is the result following student
editing. The student divided up the lecture movie according to the teacher’s
demands, with the vertical lines showing where these divisions occur.
From Figure 4, we can confirm that there are no big gaps in the results. Our
program has removed some sections of the sound data that were also removed by
the student. Therefore, even if our program does not retain these sections, we think
that the result will be acceptable.
The student removed more sections than our program. This was almost always
because those sections contained material unrelated to the lecture. However, the
teacher may explain their relationship to the lecture. In Figure 4, we can see a thin
vertical line in the last gap in the student's data, between 2:30 and 3:00. Here, the
student judged that this short section was required according to the teacher's
demands. Our program also judged that this section was required.
Therefore, even if our program removes some silent sections automatically
based on the analysis results, we judge that these videos can be used effectively.
Fig. 5 Time required for encoding each section and the length of the videos
From Figure 5, we can see that the encoding time is long if the video is long (such
as the 2nd section in Lecture video 1 and the 7th section in Lecture video 2).
However, the encoding time depends on more than just the section's length. For example, in
Lecture video 1, although the 16th section is very short (21 sec), its encoding time is
the same as that for the 10th section.
In examining the behavior of the program, we note that whenever FFmpeg is run,
it searches for the encoding start time from the beginning of the file. Figure 5 shows
that the time to search for the start time is longer than the encoding time. The results
for Lecture video 2 show this effect. Lecture video 2 comprises two videos. The 8th
section of Lecture video 2 comes from the 2nd video. Therefore, the time to search
for the starting time is short, whereas the time to encode the data for the 8th section
remains long.
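The behavior observed here matches FFmpeg's distinction between output seeking and input seeking: when `-ss` is given after `-i`, FFmpeg decodes from the beginning of the file and discards frames until the start time, which is the slow search described above; placing `-ss` before `-i` seeks in the input first, so the cost no longer grows with the start offset. A hypothetical helper (the file names and function are ours, not the paper's) showing the two command forms:

```python
import subprocess

def cut_command(src, dst, start_sec, duration_sec, fast_seek=True):
    """Build an ffmpeg command that extracts one section of a video."""
    if fast_seek:
        # Input seeking: "-ss" before "-i", so ffmpeg jumps near the
        # start time instead of decoding everything before it.
        return ["ffmpeg", "-ss", str(start_sec), "-i", src,
                "-t", str(duration_sec), dst]
    # Output seeking: "-ss" after "-i"; decode-and-discard (slow).
    return ["ffmpeg", "-i", src, "-ss", str(start_sec),
            "-t", str(duration_sec), dst]

# Example: extract a 21-second section starting two hours in.
cmd = cut_command("lecture1.avi", "section16.mp4", 7200, 21)
# subprocess.run(cmd, check=True)  # requires ffmpeg to be installed
```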
As a result, the total encoding time for Lecture video 1 is over 10 h. On the other
hand, the total encoding time for Lecture video 2 is only 3 h. We think that the
encoding time for our method is greatly affected by the situation in the lecture. The
encoding time will be long if there are two or more explanations lasting 10 sec or
more at the end of a lecture.
We watched the subdivided videos for some sections to check whether there were
any problems with their contents. Although the video sections had not been
subdivided according to the contents of the lecture, everything the teacher spoke
about was covered. We therefore estimated that the videos would offer satisfactory
viewing and listening. These videos are poor compared with the videos edited by
the student; however, we judged them acceptable when compared with watching a
3 h video containing unnecessary silent sections.
5 Conclusion
In this research, we propose an automatic lecture-archiving system. We identified
silent sections in the lecture videos by analyzing the sound in 14 lectures. We then
created some short videos by removing their silent sections.
These videos were produced by a different method from student editing. However,
they retained all the content present in the videos edited by the student.
We expect that our archiving system will be able to create lecture videos subdivided
into content-related periods by adding teacher-supplied information about the
lecture to the encoding method used in this paper.
In future work, we will first evaluate the analysis program by showing the videos
to students and teachers, and develop an interface for mobile terminals to send
information about the lecture. Next, we will consider encoding methods for other
types of lectures. Finally, we will offer all the lecture videos to both students and
teachers to evaluate the lecture-archiving system.
References
1. Baecker, R.M., Wolf, P., Rankin, K.: The ePresence Interactive Webcasting and Archiving
System: Technology Overview and Current Research Issues. In: Proceedings of the
World Conference on E-Learning in Corporate, Government, Healthcare, and Higher
Education, pp. 2532–2537 (2004)
2. Herr, J., Lougheed, L., Neal, H.A.: Lecture Archiving on a Larger Scale at the University
of Michigan and CERN. In: 17th International Conference on Computing in High
Energy and Nuclear Physics, CHEP 2009, p. 11 (2009)
3. Takayuki, N.: Automated lecture recording system with AVCHD camcorder and
microserver. In: Proceedings of the 37th annual ACM SIGUCCS Fall Conference, pp.
47–54 (2009)
4. Cha, Z., Yong, R., Jim, C., Li-wei, H.: An Automated End-to-End Lecture Capture and
Broadcasting System. ACM Transactions on Multimedia Computing, Communications,
and Applications 4(1) (2008)
5. Takafumi, M., Yoshitaka, S., Koh, K., Michihiko, M.: Lecture Context Recognition
Based on Statistical Features of Lecture Action for Automatic Video Recording. Journal
of the Institute of Electronics, Information and Communication Engineers J90-D(10),
2775–2786 (2007)
6. Atsuo, Y., Tsukasa, H.: A Framework for Rule Based Video Editing for Lecture Video
Archiving. Information Processing Society of Japan Technical Report, 2009-HCI-132,
pp. 123–129 (2009)
7. FFmpeg (December 18, 2011), http://ffmpeg.org/
The Difference and Limitation of Cognition
for Piano Playing Skill with Difference
Educational Design
1 Introduction
There are many case studies of piano singing and playing lessons in pre-school
teacher education, for example, Nakajima [1] or Imaizumi [2]. However, almost all of these case
Katsuko T. Nakahira
Nagaoka University of Technology, Nagaoka, Niigata, Japan
e-mail: katsuko@vos.nagaokaut.ac.jp
Miki Akahane
Tokyo College of Music, Tokyo, Japan
Yukiko Fukami
Kyoto Women’s University, Higashiyama-ku, Kyoto, Japan
e-mail: fukami@kyoto-wu.ac.jp
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 609–617.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
610 K.T. Nakahira, M. Akahane, and Y. Fukami
studies consider only the view to view lesson, and few examine it from the point of view
of educational design.
In the past, we studied the improvement of the educational design for piano
singing and playing lessons in pre-school teacher education. The design includes
the concurrent use of view to view lessons and e-Learning. The main results are as
follows:
1. We conducted an experiment in which 300 students submitted videos of their piano
playing and singing. Analysis of the number of video submissions together with
the mid-term and end-of-term performance examinations suggests that the submission
motivates students and is effective for improving their skills (Fukami et al. [3],
Nakahira et al. [4]). We also conducted an experiment in which asynchronous
comments were given on the students' playing and singing, which suggested both the
effectiveness and the limitations of this educational design (Fukami et al. [6]).
2. We developed e-Learning material (entitled “e-learning course on piano per-
formance for teachers and pre-school teachers”; Nakahira et al. [5], [8]) and have
delivered it over the Internet since April 2008.
3. We developed an educational design with two components: (1) self-learning via
the e-Learning contents for piano playing and singing developed in item 2, and
(2) submission of videos of the students' own playing and singing before and after
the e-Learning. After the experiments, we analyzed the differences in the students'
skills and awareness before and after the e-Learning. The analysis suggested that
the educational design (1) helps early-stage students acquire basic skills and
advanced students acquire expression skills, and (2) improves the students'
singing.
4. We developed annotated scores for 50 numbers [9].
In the process of developing them, we analyzed the students' change before and
after they studied the annotated scores. The analysis suggested that the concurrent use
of meta-cognitive language [11] and the intercorporeal imagination of physical skills
(the ability to mimic or copy skills) [12] has the potential to radically improve
students' skills (Nakahira et al. [13]).
Through these processes, we think that we have succeeded, to some extent, in proposing
a good educational design for piano singing and playing. We could not, however,
address the transfer of piano playing skill. This is an important theme for pre-school
teacher education, and we need to distinguish the merits of the view to view lesson
from those of the e-Learning method, under limitations on time and/or the number of instructors.
In this paper, we discuss the difference and limitation of skill transfer between
two types of learning environment: (1) annotated scores and model performance
videos served by e-Learning, and (2) a view to view lesson taken after (1).
2 Environment of Education
2.1 Construction of e-Learning Contents
First of all, we explain how we constructed the e-Learning contents, which play the
most important role in this experiment. Figure 1 shows the index of the
contents. There are two links for each number: one for the model performance video,
and the other for the annotated score. Figure 2 shows those elements.
The model performance video was played by a professional pianist who has been
involved in piano playing education for many years. We recorded and edited the
playing ourselves. The process of making the annotated scores was as follows:
first, all the annotations were decided by the authors; then, after preparing the scores
with the annotations, we converted them into PDF files for inclusion in the Web page.
We also prepared an upload website for video submission, which enables students
to submit videos of their playing at any time.
Fig. 2 (a), (b) Images of the model performance video; each video is taken from two angles.
(c) Score A (abr.), (d) Score B (abr.)
Students are instructed to choose which of the scores A and B they would like to
learn. After the students choose their compulsory piece, the instructor divides them
randomly into two groups: one group practices by self-learning via e-Learning
(served with the annotated score and the model performance video), while the other
group is supplemented with a view to view lesson after the self-learning via
e-Learning. We require all students to submit videos of their own performance
before and after the learning. The instructor does not insist that students record
the whole score if they cannot, in order to lower the barrier to submitting videos.
3 Results
We analyzed the video datasets gathered through the process described above
from the following two points of view: (1) the dependence of the effectiveness
of e-Learning on the degree of difficulty of the tune, and (2) the difference
in skill transfer quality arising from the difference in learning method. The
skill transfer quality was estimated by a professional piano instructor who
is not acquainted with the students.
Table 1 The fraction of students in each category classifying the degree of improvement of
the students' skills when comparing Take 1 and Take 2 for the score A: means significant
improvement, tiny improvement, and × no improvement. Error means that the data is
incomplete.
The ratios written in italics with underlining in Table 1 represent the mentioned
improvement: the last-bar rit. (ritardando, ratio 0.58) and motif expression (ratio 0.58 in
dynamics, 0.75 in phrase). In contrast, for the item of correct note length
there is no improvement arising from this learning method, namely self-study via
e-Learning.
Table 2 shows the results of analyzing the data from 20 students who submitted
videos of their performance of the score B without taking any view to view lesson.
There is some improvement in the skills related to correct note length
(dotted rhythm, ratio 0.64) and marcato (ratio 0.76). On the other hand, there is no
improvement in the skills related to dynamics or phrase.
Table 3 shows the results for the score B from 23 students who took a view to view lesson.
Comparing Table 3 with Table 2, we find that the students who practiced with a view to
view lesson drastically improved their skills regarding demiquavers, keeping notes, dynamics, and
tempo. The skills for which improvement was observed even for students who
did not take a view to view lesson also improved consistently for the students who took one.
Overall, the students who took a view to view lesson improved their skills substantially.
4 Discussion
In this section, we discuss further the points analyzed in Section 3.
First, we discuss the difference in skill transfer between the two tunes. The score
A does not contain demiquavers, while the score B contains many demiquavers
and dotted notes, but both tunes are basically easygoing numbers. Both scores have
simple dynamics composed of p (piano), f (forte), crescendo, and decrescendo. The
items of the students' skill that are improved by the e-Learning contents alone are the
last-bar rit. and the motif expression influenced by the model performance video. By
contrast, there is no improvement in dynamics, tempo, or fine notes such as demiquavers.
These points, difficult to improve by e-Learning, are improved by taking a view to
view lesson. This difference can be explained as follows.
The first point is the difference in learning style. Self-learning via the e-Learning
contents proceeds without any interaction with other people. Students can follow
their own learning rhythm, and the learning environment is basically free from stress.
We therefore consider what students can learn most in such an environment.
Fujimura and Ohmi [14] suggested a difference in the roles of the right and left
hemispheres of the brain in recognizing rhythm: optical topography images show
that the left hemisphere works for rhythm cognition, but cognition becomes more
accurate when the right hemisphere is used together with it.
Iwasaka et al. [15] gave some insight into the relation between score reading and
playing music. They pointed out that (1) players practiced in score reading tend to
plan actively for expressive playing in their brains, (2) amateur players plan less
for expressive playing, and (3) when amateur players listen to music, the dominant
part of their brain activity is spent on motion control and music perception.
From these suggestions, and the fact that our students are amateurs at piano playing,
we hypothesized that e-Learning is a powerful method in situations where motion
control and music perception dominate, and hence that it is useful for the cognition
of the correct length or pitch of notes. By contrast, it is not so effective for
improving fine note cognition.
On the other hand, when the students have a chance to take a view to view lesson,
they receive not only a model performance but also real-time oral suggestions from the
instructor. These oral suggestions can be regarded as a different type of “stimulus”
for the students, since they contain points that the students cannot find by
themselves by comparing their own playing with the scores or
the model performance. In consequence, the students get more stimuli from a view
to view lesson than from self-learning via e-Learning.
These stimuli bring frequent moments of awareness, leading to improved cognition
of note length, fine movement of notes, and tempo. Although the expression of
dynamics in the score did not improve radically under either learning method, the
students who took a view to view lesson could receive instruction that reached the
dynamics of the music because of their faster improvement in mechanical skill,
which in turn changed their attitude toward improving their musical expression.
5 Conclusion
In this paper, we discussed the difference and limitation of skill transfer by comparing
two types of learning environment: one consisting of annotated scores and model
performance videos served by e-Learning, and the other supplemented with a view
to view lesson after the e-Learning. Our experiment suggests the following two points.
1. The e-Learning contents provide a powerful method for the cognition of the
correct length or pitch of notes.
2. When students have a chance to take a view to view lesson, they receive not only
a model performance but also real-time oral suggestions from the instructor.
In future work, we would like to discuss why fine note cognition is not improved
by this training.
References
1. Nakajima, T.: A Practical Study on Piano Teaching Method at the Department of
Education–For Acquirement of Musical Capacity. In: Studies on Educational Practice,
Center for Educational Research and Training, Faculty of Education, Shinshu Univer-
sity, vol. 3, pp. 31–40 (2002)
2. Imaizumi, A.: A Trial of Teaching Piano Playing to Students with No Experience–Group
Lesson using Keyboard Pianos (2) Introduction of Practice Record Cards. Japan Society
of Research on Early Childhood Care and Education Annual Report 57, 281–282 (2004)
3. Fukami, Y., Nakahira, K.T., Akahane, M.: Effect of submitting self-made videos in piano
playing and singing practicing for preschool teacher education. The Bulletin of the Depart-
ment of Pedology, Kyoto Women’s University 4, 19–27 (2008)
4. Nakahira, K.T., Akahane, M., Fukami, Y.: Combining Music Practicing with the Sub-
mission of Self-made Videos for Pre-School Teacher Education. In: The Proceedings of
the 15th International Conference on Computers in Education, pp. 573–576 (2007)
5. Nakahira, K.T., Akahane, M., Fukami, Y.: Development e-learning contents with blended
learning for teaching piano singing and playing piano. JSiSE Research Report 23(1), 85–
92 (2008)
6. Fukami, Y., Nakahira, K.T., Akahane, M.: Effects and problems of remote and non-face-
to-face teaching of singing with simultaneous piano self-accompaniment. The Bulletin
of the Department of Pedology, Kyoto Women’s University 5, 31–40 (2009)
7. Nakahira, K.T., Akahane, M., Fukami, Y.: Use of Electronic Media for Teaching Singing
with Simultaneous Piano Self-Accompaniment. The Journal of Three Dimensional Im-
ages 23(1), 82–87 (2008)
8. http://oberon.nagaokaut.ac.jp/kwu/
9. Fukami, Y., Akahane, M.: 50 Best Annotated Scores for Simultaneous Piano Playing and
Singing of Children’s Songs (2011) (in Japanese) ISBN-4276820723
10. Nakahira, K.T., Akahane, M., Fukami, Y.: Verification of the Effectiveness of Blended
Learning in Teaching Performance Skills for Simultaneous Singing and Piano Playing.
Biometric Systems, Design and Applications, 978–953 (2011) ISBN 978-953-307-542-6
11. Suwa, M.: The Act of Creation: A Cognitive Coupling of External Representation, Dy-
namic Perception, the Construction of Self and Meta-Cognitive Practice. Cognitive Stud-
ies 11(1), 26–36 (2004)
12. Saito, T.: “Culture(Buildung in German)” as Bodily Wisdom. The Japan Society for the
Study of Education 66(3), 29–36 (1999)
13. Nakahira, K.T., Akahane, M., Fukami, Y.: Awareness Promoting Learning Design of
Sing-along Piano Playing – the role of annotated musical score and multimedia contents
–. In: The Proceedings of the 3rd International Conference on Awareness Science and
Technology, pp. 372–378 (2011)
14. Fujimura, A., Ohmi, M.: Difference of brain activity when musical element is perceived
by subjects with different musical instrument experience. IEICE Technical Report HIP-
2007-126, pp. 143–147 (2007)
15. Iwasaka, M., Sugo, K., Shimo, M., Ishii, T.: Human brain hemodynamics during ac-
tive/passive musical listening revealed by near infrared spectroscopy. IPSJ SIG Technical
Report, 2007-MUS-69, pp. 1–6 (2007)
Topic Bridging by Identifying the Dynamics
of the Spreading Topic Model
1 Introduction
Representing knowledge in the form of a story has many advantages [5]. For exam-
ple, when learning computer programming, students can progress more quickly if
they read not only descriptions of functions, such as reference materials, but also
descriptions on how to use the functions, such as tutorials or sample applications,
or descriptions on what to do in the case of a problem, such as FAQs. Depending on
the objectives, it is usually preferable to impart knowledge about a function through
a story that links different bits of knowledge related to different functions.
Individuals can piece together relevant knowledge to empirically create a story,
and it is possible to attach different meanings and/or values to a piece of the story
T. Watanabe et al. (Eds.): Intelligent Interactive Multimedia: Systems & Services, SIST 14, pp. 619–627.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012
620 M. Sato, M. Akaishi, and K. Hori
2 Topic Bridging
Any time a particular topic is discussed, there are many possible ways to progress
to its conclusion. For example, there can be several potential ideas when discussing
approaches to solving environmental problems from the viewpoint of aerospace en-
gineering, such as to design a more efficient engine, to use cost-effective material that
can be reused, or to take a more efficient flight path. Connecting aerospace engineer-
ing to the environmental problems can be done with a process called topic bridging.
In order to implement topic bridging in a computational system, we propose an
algorithm in which the system returns candidate chains of documents connecting a
given start document with a given goal document. An overview of this
implementation is shown in Fig. 1.
Fig. 1 Overview of the topic bridging implementation. The input is the start document and
goal document. The output is chains of documents as the bridge candidates. The documents
for the bridge candidates should be accumulated in advance.
When connecting documents to generate a story, each story fragment should have
a different meaning depending on the composition of the story. For example, con-
sider a story fragment about the features of nuclear power. If it links up with a
story fragment about global warming, the subject of the story would be relatively
low CO2 emissions. On the other hand, if it links up with a fragment about the
Great East Japan Earthquake, the subject would be the dangers of nuclear power.
We modeled this idea as follows: terms that are important in the “global warming”
story fragments become more important in the “nuclear power” story fragments.
To define the spreading topic model, we referred to the spreading activation
model [3, 7], a model used in network analyses, for example to calculate the
importance of nodes in a network. There has been much previous research on its
application to document analysis [9, 10]. In the spreading activation model, a
network has both weighted nodes and weighted links. The weight of a node is
called its “activation”, and activation spreads along the weighted links.
In the spreading topic model, each node is assigned a term and each link is assigned
a relation between terms. In this paper, we define the term weight and the term-relation
weight as the term's context-dependent attractiveness and the term dependency, respectively
[1]. These concepts are explained in more detail in the sections that follow.
$$d_s(t_i, t_j) = \frac{\mathrm{sentences}_s(t_i, t_j)}{\mathrm{sentences}_s(t_i)},\qquad(1)$$
where $\mathrm{sentences}_s(t_i)$ is the number of sentences that contain term $t_i$ in document $s$,
and $\mathrm{sentences}_s(t_i, t_j)$ is the number of sentences that contain both term $t_i$ and term $t_j$
in document $s$. The attractiveness of term $t_j$ in document $s$, $a_s(t_j)$, is the sum of the
dependency of term $t_i$ on term $t_j$ over all terms $t_i$:
$$a_s(t_j) = \sum_{t_i \in T} d_s(t_i, t_j),\qquad(2)$$
$$c_\tau = D_s\, c_{\tau-1},\qquad(4)$$
where $c_\tau[i,1] = c_\tau(t_i)$ and $D_s[i,j] = d_s(t_j, t_i)$. Then, when the start document is
characterized by $c_{\tau-1}$ and the goal document is characterized by $c_\tau$, the relation between
the two is written with bridges $\{s_1, s_2, \ldots, s_{\tau-1}\}$ as follows:
$$c_\tau = D_{\mathrm{bridges}}\, c_0,\qquad(5)$$
where
$$D_{\mathrm{bridges}} = D_{s_{\tau-1}} D_{s_{\tau-2}} \cdots D_{s_1}.\qquad(6)$$
If we know $c_\tau$ and $c_0$, we can estimate $D_{\mathrm{bridges}}$ as follows:
$$\hat{D}_{\mathrm{bridges}} = c_\tau\, c_0^{+},\qquad(7)$$
where $c_0^{+}$ is a pseudoinverse of $c_0$. Equation (4) usually has many solutions because
it is under-determined owing to its sparseness. We therefore use the pseudoinverse, a
generalization of the inverse matrix, to obtain a solution. The pseudoinverse gives
the “least-squares” answer.
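Because $c_0$ is a single column vector, its pseudoinverse has the closed form $c_0^{+} = c_0^{T}/(c_0^{T} c_0)$, so the estimate in Eq. (7) is a rank-one outer product. A minimal sketch (the function name is ours, not the paper's):

```python
def estimate_bridges(c_goal, c_start):
    """Eq. (7): least-squares estimate of D_bridges from two term-weight
    vectors, using the closed-form pseudoinverse of a column vector."""
    norm2 = sum(x * x for x in c_start)
    c_start_pinv = [x / norm2 for x in c_start]  # row vector c_0^+
    # Outer product c_goal * c_0^+ gives the estimated matrix.
    return [[g * p for p in c_start_pinv] for g in c_goal]
```

By construction, the estimated matrix maps $c_0$ back onto $c_\tau$ exactly, since $c_0^{+} c_0 = 1$.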
Next, we want to find chains of documents as story fragments, so we score a
chain of documents $\{s_1, s_2, \ldots, s_{\tau-1}\}$ in the following way:
3 Evaluation
We tested the proposed method by conducting user studies to determine the utility
of our algorithm as it would be used in practice. We evaluated our method by using
descriptions of lectures. The stories generated from such descriptions can then be
used for the reconstruction of new courses as chains of lectures.
For our evaluation, we selected a course called “Global Focus on Knowledge”
from the University of Tokyo.¹ The concept of this course is “to look at the global
knowledge system on a macro scale, capture the entire picture of each field, and
1 http://www.gfk.c.u-tokyo.ac.jp/
624 M. Sato, M. Akaishi, and K. Hori
understand how they are organically linked up with each other.” In this course, one
or two different themes are set each semester, and lectures on the sub-themes are given
by professors from multiple departments. There have been lectures on 20
different course themes and 104 sub-themes from 2005 to 2010. For example, in
the 2010 Winter term, one theme was “The World of Diverse Matter - The Distant
Journey from Space to Earth”, and it consisted of five lectures with five sub-themes:
“From Micro Particles to Macro Space”, “The Diversity of Matter Born from the Ac-
tions of Atoms, Electrons, and Molecules”, “The Search for and Creation of Matter
with Desirable Properties -Discoveries from the Field of Pharmaceutical Science”,
“Changing Matter - From Matter to Material”, and “An Everlasting Future for Our
Small Earth”.
Each lecture had a description composed of about 200 Japanese characters. We
applied Japanese morphological analysis to the descriptions, treated the nouns as terms,
and removed function words. Stories were then composed from
the descriptions.
The generated stories were presented to users, who then evaluated them. Each story
used two bridges; that is, four story fragments (the start, two bridges, and the goal)
made up one story. We tried to generate stories by linking start and goal pairs
using the following two techniques:
Topic bridging: as described in Section 2.
Shortest path: we computed the distance between documents on the basis of the
cosine similarity of their tf-idf vectors and then located the path with the largest
total similarity. The tf-idf (term frequency-inverse document frequency) weight
is often used to evaluate how important a word is to a document [15].
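The tf-idf cosine similarity used by the baseline can be sketched in a few lines of Python. The weighting below (raw term frequency times log(N/df)) is one common variant and an assumption on our part, since the exact formula is not specified here:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Toy tf-idf vectors; `docs` is a list of token lists.
    Uses raw term frequency times log(N / df), one common variant."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

A chain score for the baseline can then be obtained by summing these cosine similarities along each candidate bridge path.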
The start and goal documents were selected from lectures with the same theme. We
generated the highest-scoring stories with each technique, obtaining 262 stories per
technique, for a total of 464 stories. We presented six participants with 20 stories
each (10 per technique, randomly selected), which were then shuffled. We
then asked the participants if the chain made sense for the concept of the course.
Examples of the stories used are shown in Table 1.
First, we checked whether the bridges of all the stories were originally from the
same theme. In the case of our algorithm, there were 106 stories that had bridges
between the start and goal from the same theme, while with the shortest path al-
gorithm, there were 44. Extracting bridges from the same theme as the start and
goal results in a better suitability to the concept of the course, indicating that our
algorithm is more effective than the shortest path algorithm.
Next, we compared the evaluated stories (Fig. 3). There were 51 out of 60 (85%)
appropriate stories (suitable for the course concept) generated by our algorithm. In
contrast, the shortest path algorithm generated 24 out of 60 (40%). Our algorithm
outperformed the competitor.
Moreover, we compared the evaluated stories for bridges with the same theme and
for bridges with a different theme.
Our algorithm generated 15 out of 16 (94%) appropriate stories with the same
theme, and the shortest path algorithm had 7 out of 7 (100%). Both algorithms could
recommend appropriate bridges when the bridges belonged to the same theme.
On the other hand, our algorithm generated 36 out of 44 (82%) appropriate
stories with a different theme. In contrast, the shortest path algorithm generated
17 out of 53 (32%). This demonstrates that bridges that are evaluated as appropriate
for the concept of the course are not necessarily selected from the lectures with the
same theme. The proposed method obtained such bridges more often than the shortest
path algorithm did.
This is because the proposed method essentially weights the co-occurrence of words
within a bridge story fragment. It can extract distinct co-occurrence combinations of
common words, e.g., the combination of the word “network” with the word “society”
and the combination of “network” with “digital”, and treat them separately even if a
sentence includes all three words. Focusing on a specific combination of words enables
us to link documents by identifying polysemous words.
Fig. 3 Evaluations of generated stories by using topic bridging and shortest path algorithms.
4 Conclusion
We proposed using the spreading topic model to model the topic transition dynamics
and an algorithm to identify the spreading topic model for topic bridging. The goal
of the topic bridging was to link chains of documents from the start document to the
goal document to create a story. The method bridges topics by solving an inverse
problem of the term context-dependent attractiveness.
The findings we have presented are the results of a simple strategy, particularly
in the case of the context-dependency model. In the future, we intend to extend this
work by exploring more complex models and evaluating additional tools for the
generated stories.
References
1. Akaishi, M.: A dynamic Decomposition/Recomposition framework for documents based
on narrative structure model. Transactions of the Japanese Society for Artificial Intelli-
gence 21(5), 428–438 (2006)
2. Allan, J.: Topic detection and tracking: event-based information organization, vol. 12.
Springer, Heidelberg (2002)
3. Anderson, J.: A spreading activation theory of memory. Journal of Verbal Learning and
Verbal Behavior 22(3), 261–295 (1983)
4. Bringsjord, S., Ferrucci, D.: Artificial Intelligence and Literary Creativity: Inside the
Mind of Brutus, a Storytelling Machine. L. Erlbaum Associates Inc., Hillsdale (1999)
5. Brown, J.: Storytelling in organizations: Why storytelling is transforming 21st century
organizations and management. Butterworth-Heinemann (2005)
6. Cavazza, M., Charles, F., Mead, S.: Character-based interactive storytelling. IEEE Intel-
ligent Systems 17(4), 17–24 (2002)
7. Crestani, F.: Application of spreading activation techniques in information retrieval. Ar-
tificial Intelligence Review 11(6), 453–482 (1997)
8. Gordon, A., Van Lent, M., Van Velsen, M., Carpenter, P., Jhala, A.: Branching story-
lines in virtual reality environments for leadership development. In: Proceedings of the
National Conference on Artificial Intelligence, pp. 844–851. AAAI Press (2004)
9. Mani, I., Bloedorn, E.: Summarizing similarities and differences among related docu-
ments. Inf. Retr. 1(1-2), 35–67 (1999)
10. Matsumura, N., Ohsawa, Y., Ishizuka, M.: Automatic indexing based on term activity.
Transactions of the Japanese Society for Artificial Intelligence 17(4), 398–406 (2002)
11. Meehan, J.: Tale-spin, an interactive program that writes stories. In: Proceedings of the
Fifth International Joint Conference on Artificial Intelligence, Citeseer, pp. 91–98 (1977)
Topic Bridging by Identifying the Dynamics 627