
Stereo-Vision Based Motion Stabilization of a Humanoid Robot for the Environment Recognition by Type-2 Fuzzy Logic

Tae-Koo Kang, Huazhen Zhang, Gwi-Tae Park
School of Electrical Engineering
Korea University
Seoul, Korea
Email: {tkkang,zhanghz,gtpark}@korea.ac.kr
Abstract—This paper presents an efficient ego-motion compensation method for a humanoid robot, using stereo vision and type-2 fuzzy logic. A humanoid robot should have the ability to autonomously recognize its surroundings and to make the right decisions in an unknown environment. To enable a humanoid robot to do this, an ego-motion compensation method, which can eliminate the motion of the humanoid robot that causes errors in environment recognition, is suggested in this paper. The method uses a disparity map obtained from stereo vision and can be divided into three modules: the segmentation, feature extraction, and compensation modules. In the segmentation module, the objects are analyzed using type-2 FCM, and features are extracted using wavelet level set extraction in the feature extraction module. The displacements for rotation and translation are estimated by tracking the least-squares ellipse and the correlation coefficient computed by FNCC in the compensation module. Based on the results of the experiments conducted in this study, it was found that the proposed method can be effectively applied to a humanoid robot.
I. INTRODUCTION
The structure of a humanoid robot is similar to that of a human being, and it has greater mobility than a wheeled robot. Unlike wheeled robots, humanoid robots have the advantage of moving over obstacles; in other words, they can walk on noncontinuous and uneven surfaces, such as stairs and doorsills. When moving over an obstacle, however, it is critically important to obtain information about the obstacle that is as precise as possible, since the robot establishes contact with the obstacle by calculating the appropriate motion trajectories toward it. As such, humanoid robots need algorithms that can autonomously determine their actions and paths in unknown environments from the vision system. The vision system is one of the most important sensors in a humanoid robot system, and it indispensably requires a stabilization module that can compensate for the robot's own ego-motion for more precise recognition.
Over the years, a considerable amount of research has been undertaken on motion compensation for vision systems installed in robots. These studies commonly use stereo vision, which can extract information regarding the depth of the environment. Robot motion can be estimated from stereo vision through a 3D rigid transform, using a 2D multiscale tracker that projects the 3D depth information onto a 2D feature space. Such methods are used in the scale-invariant feature transform (SIFT) [8] and in iterative closest point (ICP) [9] and have been applied mainly to the motion estimation or obstacle/object recognition of wheeled robots [10-12]. Moreover, the optical-flow-based method, which can estimate the motion by a 3D normal-flow constraint using a gradient-based error function, is currently widely used because of its computational simplicity [13]. These methods, however, are not appropriate for a biped humanoid robot: the walking motion of a humanoid robot exhibits simultaneous vertical and horizontal movements, unlike the motion of a mobile robot, and their point-to-point operation incurs a high computational cost. Therefore, a more efficient stereo-vision-based ego-motion compensation method is proposed for a humanoid robot, using type-2 fuzzy logic. The proposed method consists of three parts: segmentation, feature extraction, and compensation. In the segmentation part, objects are extracted through image analysis, using interval type-2 fuzzy c-means (IT2-FCM). As IT2-FCM models the uncertainty of the image obtained from stereo vision, it can cluster the objects in the image more precisely. A method that automatically determines the number of objects in an image and extracts the precise degree of displacement using type-2 fuzzy logic is likewise proposed in this paper. In the feature extraction part, the feature images are extracted via wavelet level set extraction; from the multilevel set, much information can be obtained from each image. The displacement is then calculated using least-squares ellipse approximation and fast normalized cross-correlation (FNCC). The results of the feature extraction step are used as the input data for the compensation part, in which only a couple of datasets, rotation and translation, are extracted using type-2 fuzzy logic filtering and FNCC. Moreover, a more precise displacement can be obtained by using the type-2-fuzzy-logic-based filtering method.
This paper is organized as follows. In chapter 2, the proposed stereo-vision-based motion stabilization of a humanoid robot by type-2 fuzzy logic is introduced. In chapter 3, the results of the experiments conducted to verify the performance of the proposed system are presented. Finally, chapter 4 presents the conclusions and the paper's contributions.

(17th Mediterranean Conference on Control & Automation, Makedonia Palace, Thessaloniki, Greece, June 24-26, 2009. 978-1-4244-4685-8/09/$25.00 © 2009 IEEE)
II. MOTION STABILIZATION
A. Architecture of Motion Stabilization
The overall system architecture was constructed as illustrated in Fig. 1. The system largely consists of three parts: segmentation, feature extraction, and compensation. The disparity map obtained from stereo vision is used as the input image. In the segmentation step, the number of objects is initialized, and all the objects to be used as features are extracted. In the feature extraction step, the feature data obtained by the wavelet level set are extracted, and the displacements for the angle and translation are estimated in the corresponding estimation parts. Consequently, all the displacements, such as the angle and translation, are compensated by the estimation results in the compensation step. The detailed explanations are given as follows.
Fig. 1. Overall system architecture
B. Segmentation and Feature Extraction
The IT2-FCM method [1,6] is used for the segmentation of the objects from the disparity map obtained from stereo vision. The IT2-FCM algorithm was proposed to control the uncertainty of the fuzzifier in FCM, which affects the assignment of memberships to the patterns. The maximum fuzzy boundary can be controlled by incorporating two fuzzifier values, as shown in Fig. 2.
Fig. 2. Desirable maximum fuzzy region by two fuzzifiers
The primary membership $J_x$ of a pattern $x_i$ can be represented as a membership interval with all secondary grades of the primary memberships equal to one. The interval type-2 fuzzy set $\tilde{A}$ can be represented as

$$\tilde{A} = \{((x, u), \mu_{\tilde{A}}(x, u)) \mid x \in A,\ u \in J_x \subseteq [0, 1],\ \mu_{\tilde{A}}(x, u) = 1\} \qquad (1)$$

Fig. 3 illustrates an example of a Gaussian interval type-2 fuzzy set with an uncertain standard deviation, where the gray-shaded region indicates the footprint of uncertainty (FOU). The primary membership interval $J_x = [\underline{u}(x), \overline{u}(x)]$ in IT2-FCM can be expressed as equation (2). The fuzzifiers $m_1$ and $m_2$ represent and manage the uncertainty, and give two different objective functions to be minimized in FCM, as in equation (3).
Fig. 3. Gaussian interval type-2 fuzzy set with an uncertain standard deviation
$$\overline{u}_j(x_i) =
\begin{cases}
\dfrac{1}{\sum_{k=1}^{C} (d_{ij}/d_{ik})^{2/(m_1 - 1)}}, & \text{if } \dfrac{1}{\sum_{k=1}^{C} (d_{ij}/d_{ik})^{2/(m_1 - 1)}} > \dfrac{1}{\sum_{k=1}^{C} (d_{ij}/d_{ik})^{2/(m_2 - 1)}} \\[2mm]
\dfrac{1}{\sum_{k=1}^{C} (d_{ij}/d_{ik})^{2/(m_2 - 1)}}, & \text{otherwise}
\end{cases}$$

and

$$\underline{u}_j(x_i) =
\begin{cases}
\dfrac{1}{\sum_{k=1}^{C} (d_{ij}/d_{ik})^{2/(m_1 - 1)}}, & \text{if } \dfrac{1}{\sum_{k=1}^{C} (d_{ij}/d_{ik})^{2/(m_1 - 1)}} \le \dfrac{1}{\sum_{k=1}^{C} (d_{ij}/d_{ik})^{2/(m_2 - 1)}} \\[2mm]
\dfrac{1}{\sum_{k=1}^{C} (d_{ij}/d_{ik})^{2/(m_2 - 1)}}, & \text{otherwise}
\end{cases}
\qquad (2)$$

$$J_{m_1}(U, v) = \sum_{i=1}^{N} \sum_{j=1}^{C} u_j(x_i)^{m_1}\, d_{ji}^{2}, \qquad
J_{m_2}(U, v) = \sum_{i=1}^{N} \sum_{j=1}^{C} u_j(x_i)^{m_2}\, d_{ji}^{2}
\qquad (3)$$
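The interval memberships of equation (2) can be sketched as follows. Since each case of equation (2) simply selects the larger (upper bound) or smaller (lower bound) of the two FCM memberships computed with fuzzifiers m1 and m2, the sketch uses max/min directly; function and variable names are illustrative and not from the paper.

```python
# Interval type-2 FCM membership sketch (equation (2) as max/min of two
# standard FCM memberships). d[j, i] is the distance from cluster j to
# pattern x_i; names and the toy data are assumptions for illustration.
import numpy as np

def it2_memberships(d, m1=1.5, m2=3.0):
    """Return (upper, lower) membership arrays of shape (C, N)."""
    eps = 1e-12

    def fcm_u(m):
        # Standard FCM membership: u_j(x_i) = 1 / sum_k (d_ji/d_ki)^(2/(m-1))
        ratio = (d[:, None, :] / (d[None, :, :] + eps)) ** (2.0 / (m - 1.0))
        return 1.0 / ratio.sum(axis=1)          # shape (C, N)

    u1, u2 = fcm_u(m1), fcm_u(m2)
    upper = np.maximum(u1, u2)                  # "if u1 > u2" branch of eq. (2)
    lower = np.minimum(u1, u2)                  # "if u1 <= u2" branch of eq. (2)
    return upper, lower

d = np.array([[0.5, 2.0],                       # 2 clusters, 2 patterns (toy data)
              [1.5, 0.4]])
up, lo = it2_memberships(d)
```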
The computing procedure for updating the cluster centers and membership functions in IT2-FCM requires type reduction and defuzzification methods using type-2 fuzzy operations [2]. The generalized centroid (GC) type reduction can be used as the type-reduction procedure. The centroid obtained via type reduction is the following interval:

$$V_x = [V_L, V_R] = \int_{u(x_1) \in J_{x_1}} \cdots \int_{u(x_N) \in J_{x_N}} 1 \Big/ \frac{\sum_{i=1}^{N} x_i\, u(x_i)^m}{\sum_{i=1}^{N} u(x_i)^m} \qquad (4)$$
The crisp center for each estimated cluster center is simply obtained via defuzzification as:

$$V_j = \frac{V_L + V_R}{2} \qquad (5)$$
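A minimal sketch of equations (4)-(5): instead of the full Karnik-Mendel iteration implied by the generalized centroid, the interval ends V_L and V_R are approximated here by the weighted means under the lower and upper memberships. This is a simplification for illustration, not the paper's exact procedure; names are assumptions.

```python
# Simplified crisp-center computation (equations (4)-(5), approximated).
# x: 1-D pattern values; u_lower/u_upper: interval membership bounds.
import numpy as np

def crisp_center(x, u_lower, u_upper, m=2.0):
    wl, wu = u_lower ** m, u_upper ** m
    v_a = (x * wl).sum() / wl.sum()     # weighted mean under lower memberships
    v_b = (x * wu).sum() / wu.sum()     # weighted mean under upper memberships
    v_L, v_R = min(v_a, v_b), max(v_a, v_b)
    return (v_L + v_R) / 2.0            # defuzzification, equation (5)

x = np.array([0.0, 1.0, 2.0, 3.0])      # toy data
u_lo = np.array([0.9, 0.7, 0.2, 0.1])
u_up = np.array([1.0, 0.9, 0.4, 0.2])
v = crisp_center(x, u_lo, u_up)
```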
Before applying the IT2-FCM segmentation method, the number of cluster seeds must be determined. The number of peaks in the histogram of the image whose height exceeds a fixed ratio of the total pixel count, within a domain interval of 20, is set as the number of cluster seeds; in this study, the ratio was set at 2%. Fig. 4 shows an example of the IT2-FCM segmentation of a depth image: with five gray seeds, four objects can be segmented. Morphological operations are needed to remove the noise in the segmented image, such as in (f).
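The seed-count heuristic above can be sketched as follows. The 2% ratio and the interval of 20 gray levels come from the text; the exact peak-detection details (local maxima, minimum gap enforcement) are assumptions.

```python
# Sketch of the cluster-seed heuristic: count histogram peaks whose height
# exceeds 2% of the pixel count, keeping peaks at least 20 gray levels apart.
import numpy as np

def count_cluster_seeds(img, ratio=0.02, min_gap=20):
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    thresh = ratio * img.size
    peaks, last = [], -min_gap
    for g in range(1, 255):
        # local maximum above the threshold, far enough from the previous peak
        if hist[g] >= thresh and hist[g] >= hist[g - 1] and hist[g] >= hist[g + 1]:
            if g - last >= min_gap:
                peaks.append(g)
                last = g
    return len(peaks)

# toy depth image with three dominant gray levels
img = np.concatenate([np.full(500, 40), np.full(400, 120), np.full(300, 200)])
seeds = count_cluster_seeds(img)
```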
Fig. 4. Example of segmentation using IT2-FCM. (a) original depth image
(b) segmentation result (c)-(f) segmented objects
Feature extraction based on the wavelet transformation [7] is executed after the segmentation. Fig. 5 presents the two-level wavelet transformation.

Fig. 5. Two-level wavelet transformation

As shown in Fig. 5, the 2D wavelet transform decomposes an image into four subbands that are localized in frequency and orientation, denoted by LL, HL, LH, and HH. The low-frequency components contribute to the global description, while the high-frequency components contribute to the details. The LL subband can be further decomposed at the next level. Enough information can be obtained for the estimation using the wavelet transform. Of course, selecting the number of levels is a problem, because it has a great effect on the computational cost and the processing time. In our case, the level set was determined to be two.
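One level of the decomposition in Fig. 5 can be sketched with a plain Haar transform; a wavelet library such as PyWavelets would normally be used, and the paper does not specify the wavelet basis, so Haar is an assumption here.

```python
# Minimal one-level 2-D Haar decomposition producing the LL/HL/LH/HH
# subbands of Fig. 5. Applying it again to LL gives the second level.
import numpy as np

def haar2d(img):
    a = img.astype(float)
    # horizontal pass: average / difference of adjacent columns
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # vertical pass: the same on adjacent rows
    LL = (lo[0::2] + lo[1::2]) / 2.0
    LH = (lo[0::2] - lo[1::2]) / 2.0
    HL = (hi[0::2] + hi[1::2]) / 2.0
    HH = (hi[0::2] - hi[1::2]) / 2.0
    return LL, HL, LH, HH

img = np.arange(16).reshape(4, 4)
LL, HL, LH, HH = haar2d(img)        # each subband is half-size (2x2 here)
```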
C. Rotation and Translation Estimation
A numerically stable direct least-squares method for fitting an ellipse [3] to a set of data points was used to calculate the rotation angle between the image sequences. This fitting method is robust for localizing the optimal ellipse solution. The datasets used for fitting an ellipse were generated by the wavelet feature extraction process. Every dataset, i.e., the pixel coordinates of a wavelet-decomposed image, belongs to one ellipse because it stands for one segmented object. Fig. 6 shows the fitting result for an object and the same object rotated by 13 degrees.
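The paper uses the numerically stable direct least-squares ellipse fit of [3]; as a lighter sketch, the orientation of an elliptical point cloud can also be estimated from its second-order moments (the principal axis), which yields the same rotation angle for well-conditioned point sets. This moment-based substitute is an assumption, not the paper's method.

```python
# Orientation of a 2-D point set from its covariance (principal axis),
# as a simplified stand-in for direct least-squares ellipse fitting.
import numpy as np

def orientation_deg(points):
    """Angle of the major axis of a 2-D point set, in degrees."""
    p = points - points.mean(axis=0)
    cov = p.T @ p / len(p)
    w, v = np.linalg.eigh(cov)              # eigenvalues in ascending order
    major = v[:, -1]                        # eigenvector of largest eigenvalue
    return np.degrees(np.arctan2(major[1], major[0]))

# toy point set along a 30-degree line
t = np.linspace(-1, 1, 50)
pts = np.c_[t * np.cos(np.radians(30)), t * np.sin(np.radians(30))]
ang = orientation_deg(pts)                  # ~30 degrees (mod 180)
```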
Fig. 6. Estimation result of an object for the rotation (a) reference frame object (b) rotated frame object
Many rotation values can be obtained, depending on the level of the wavelet transform and the number of segmented objects, including some noise values that can occur when an object partially disappears in the image sequence. A type-2 fuzzy thresholding method based on information-theoretical measures [4,5] was used to remove such noise values. This method shifts the membership function over the range of the dataset, calculates the amount of ultrafuzziness at each position, and selects the position of maximum ultrafuzziness as the optimal threshold. The ultrafuzziness can be defined as equation (6), where MN is the total pixel number, L is the number of gray levels, h is the histogram, and u_U and u_L are the upper and lower membership functions derived from the primary membership function u_A, respectively.
$$\gamma(\tilde{A}) = \frac{1}{MN} \sum_{g=0}^{L-1} h(g)\,[u_U(g) - u_L(g)]$$

$$\text{where } u_U(g) = [u_A(g)]^{1/\alpha}, \quad u_L(g) = [u_A(g)]^{\alpha}, \quad \alpha \in (1, 2] \qquad (6)$$
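The thresholding of equation (6) can be sketched as follows: a primary membership is shifted across the gray range, the ultrafuzziness is computed at each position, and the position of maximum ultrafuzziness is taken as the threshold. The Gaussian membership shape and its width are assumptions; the paper only specifies the measure itself.

```python
# Type-2 fuzzy thresholding sketch (equation (6)): slide a membership
# function over the gray range and pick the position of maximum
# ultrafuzziness. 'width' controls the assumed membership shape.
import numpy as np

def ultrafuzzy_threshold(hist, alpha=2.0, width=30.0):
    L = len(hist)
    MN = hist.sum()
    g = np.arange(L)
    best_t, best_uf = 0, -1.0
    for t in range(L):
        u = np.exp(-0.5 * ((g - t) / width) ** 2)         # shifted membership u_A
        # ultrafuzziness: sum of h(g) * [u_U(g) - u_L(g)] / MN
        uf = (hist * (u ** (1.0 / alpha) - u ** alpha)).sum() / MN
        if uf > best_uf:
            best_uf, best_t = uf, t
    return best_t

# toy bimodal histogram: two clusters of rotation estimates
hist = np.zeros(256)
hist[60] = 100
hist[180] = 100
thr = ultrafuzzy_threshold(hist)
```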
Two peaks can be obtained as thresholds that remove the noise values on the left and right, as shown in Fig. 7; the average of the angles between the two thresholds is then taken as the rotation angle.
Fig. 7. Type-2 Fuzzy based thresholding method
The rotation compensation proceeds with the rotation angle calculated by the rotation estimation procedure. In the case of Fig. 6, the difference between the true rotation and the angle of 12.8 degrees calculated from the ellipse-fitting result is less than 1 degree. After the rotation compensation, the rotated image is used as the input data for the translation compensation.
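The compensation step above can be sketched as rotating the current frame back by the estimated angle (e.g. the 12.8 degrees from the ellipse fitting) before translation estimation. The nearest-neighbor rotation below is a minimal stand-in for a real image-warping routine, not the paper's implementation.

```python
# Rotation compensation sketch: inverse-map each destination pixel to its
# source location, rotating about the image center (nearest neighbor).
import numpy as np

def rotate_image(img, deg):
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    ys, xs = np.mgrid[0:h, 0:w]
    x0, y0 = xs - cx, ys - cy
    # inverse mapping: source coordinates for each destination pixel
    sx = np.round( c * x0 + s * y0 + cx).astype(int)
    sy = np.round(-s * x0 + c * y0 + cy).astype(int)
    out = np.zeros_like(img)
    ok = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out[ok] = img[sy[ok], sx[ok]]
    return out

frame = np.zeros((64, 64))
frame[20:44, 30:34] = 1.0                      # toy frame with one object
compensated = rotate_image(frame, -12.8)       # undo the estimated ego-rotation
```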
The translation estimation uses the FNCC algorithm, which measures the similarity between the image and the feature. FNCC overcomes the difficulties of cross-correlation, which depends on the size of the feature and is not invariant to image amplitude changes, such as changing lighting conditions across the image sequence, by normalizing the image and feature vectors to unit length [14,15]. The translation compensation uses this translation estimation.
$$\gamma(u, v) = \frac{\sum_{x,y} [f(x, y) - \bar{f}_{u,v}]\,[t(x - u, y - v) - \bar{t}]}{\left\{ \sum_{x,y} [f(x, y) - \bar{f}_{u,v}]^2 \sum_{x,y} [t(x - u, y - v) - \bar{t}]^2 \right\}^{0.5}} \qquad (7)$$
In equation (7), $\bar{t}$ is the mean of the feature and $\bar{f}_{u,v}$ is the mean of $f(x, y)$ in the region under the feature.
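A direct (not FFT-accelerated) evaluation of equation (7) at a single offset can be sketched as follows; FNCC computes the same quantity faster using running sums over the image, which this sketch omits for clarity. The indexing convention (template window anchored at its top-left corner) is an assumption.

```python
# Normalized cross-correlation of equation (7) at one offset (u, v):
# zero-mean both the template and the image window, then normalize.
import numpy as np

def ncc(f, t, u, v):
    th, tw = t.shape
    win = f[v:v + th, u:u + tw].astype(float)   # image region under the feature
    tz = t - t.mean()                           # t(...) - t-bar
    wz = win - win.mean()                       # f(...) - f-bar_{u,v}
    denom = np.sqrt((wz ** 2).sum() * (tz ** 2).sum())
    return (wz * tz).sum() / denom if denom > 0 else 0.0

f = np.zeros((10, 10))
f[3:6, 4:7] = np.arange(9).reshape(3, 3)        # embed the pattern at (u, v) = (4, 3)
t = np.arange(9).reshape(3, 3).astype(float)
score = ncc(f, t, 4, 3)                          # correlation at the true offset
```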
III. EXPERIMENT RESULTS
The performance of the proposed motion stabilization method for the humanoid robot was evaluated experimentally. The experiments can be divided into two subexperiments: one executed in an ideal walking environment of a humanoid robot, and the other in a real walking environment, which here means an indoor environment. The detailed results of the experiments are given as follows.
A. Stabilization Performance in an Ideal Environment
The proposed motion stabilization method was first evaluated in an artificial, ideal environment. The quantity of errors was determined by comparing the results of the test algorithms with the ideal data. The algorithms compared with the proposed method were SIFT and ICP, for the translation and rotation displacements. The performance evaluation measured the displacements along the x and y axes, the rotation angle, and the average error of each algorithm relative to the ideal case over one cycle. A standard set of stereo pairs with available ground truth was used [16]. Each disparity value had 256 gray levels, with brighter levels representing points closer to the camera and unmatched points depicted as white. The origin of the coordinates in each frame is the center of the image. The results of the evaluation of the stabilization performance and errors in the ideal case are presented in Fig. 8. Moreover, the errors of the compared methods are shown in Table 1.
TABLE I
ERRORS IN THE IDEAL CASE

Displacement            Method            Mean of errors   Variance
Rotation displacement   Proposed method   0.42             0.32
                        SIFT              0.83             0.48
                        ICP               2.14             1.52
X-axis displacement     Proposed method   0.59             0.39
                        SIFT              1.12             0.37
                        ICP               1.40             1.23
Y-axis displacement     Proposed method   0.39             0.30
                        SIFT              3.92             2.73
                        ICP               6.84             3.96
As shown in Table 1, the proposed method demonstrated better performance than the other algorithms. In particular, the proposed method performed at the same level as SIFT in the planar displacements, if not slightly better.
B. Stabilization Performance in a Real Environment
A second evaluation was executed in a real environment. The algorithms were tested on a real image sequence obtained from the stereo vision system mounted on the humanoid robot, which had a height of 0.6 m, a weight of 6 kg, 24 DOF, and an SR4000 Time-of-Flight (TOF) sensor [17]. Fig. 9 shows the results of the evaluation in a real environment. In Fig. 10, the X-axis displacements show peak points around 40 and -40, the Y-axis displacements show peak points around 32 and 2, and the rotation displacements show peak points around 12 and -12.
Fig. 8. Stabilization results in an ideal environment
IV. CONCLUSION
The problems in developing a vision system for a humanoid robot in the real world spring from the fact that the conditions for the vision system of a humanoid robot are entirely different from those for a camera mounted on a wheeled robot. As such, an efficient and appropriate method for the ego-motion stabilization of the humanoid robot was proposed in this paper. The proposed system employs type-2 fuzzy logic and the wavelet transform to handle the ego-motion of the humanoid robot. To remove the ambiguity caused by the motion of the robot in an input image, a type-2-fuzzy-logic-based method, in which the IT2-FCM and type-2 fuzzy thresholding methods are applied, is proposed in this paper. The meaningful objects are extracted by IT2-FCM, and the meaningless estimation data are filtered out through the type-2 fuzzy thresholding method. Moreover, to obtain more information from the input image, a method that decomposes the input image into two-level subband images is also proposed.
Fig. 9. Stabilization results in a real environment
This paper shows that the type-2-fuzzy-logic-based method has a slightly better stabilization performance than the other algorithms, such as SIFT and ICP. As the wavelet transform is added at the feature extraction step, many feature subimages and more meaningful results can be obtained therefrom. Moreover, as IT2-FCM is used at the segmentation step, the range of the input image is restricted to the object or obstacle candidates rather than the whole image, and the accuracy can thereby be improved. In the realization of a humanoid robot, stabilization is a mandatory condition for enabling the robot to autonomously recognize its surrounding environment. Therefore, this paper is important in that it helps in the development of aiding technologies for the humanoid robot.
ACKNOWLEDGMENT
This work was supported by the Korean Institute of Construction & Transportation Technology Evaluation and Planning (Program No.: 06-United Advanced Construction Technology Program-D01).
REFERENCES
[1] C. Hwang and F. Rhee, "Uncertain fuzzy clustering: interval type-2 fuzzy approach to C-means," IEEE Trans. on Fuzzy Systems, vol. 15, issue 1, pp. 107-120, 2007.
[2] J. Mendel, Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions, Prentice-Hall, 2001.
[3] R. Halir and J. Flusser, "Numerically stable direct least squares fitting of ellipses," Proc. of Intl. Conf. on Computer Graphics and Visualization, vol. 1, pp. 125-132, 1998.
[4] H. R. Tizhoosh, "Image thresholding using type II fuzzy sets," Pattern Recognition, vol. 38, pp. 2363-2372, 2005.
[5] H. R. Tizhoosh, "Type II fuzzy image segmentation," Fuzzy Sets and Their Extensions, pp. 607-618, 2008.
[6] B. Choi and F. C. Rhee, "Interval type-2 fuzzy membership function generation methods for pattern recognition," Information Sciences, 2008.
[7] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1999.
[8] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Intl. J. of Computer Vision, vol. 60, pp. 91-110, 2004.
[9] Y. Chen and G. Medioni, "Object modelling by registration of multiple range images," Proc. of IEEE Intl. Conf. on Robotics and Automation, pp. 2724-2728, 1991.
[10] R. Lienhart and J. Maydt, "An extended set of Haar-like features for rapid object detection," Proc. of IEEE Intl. Conf. on Image Processing, vol. 1, pp. 900-903, 2002.
[11] J. R. Beveridge, K. She, B. Draper, and G. H. Givens, "A nonparametric statistical comparison of principal component and linear discriminant subspaces for face recognition," Proc. of IEEE Conf. on Pattern Recognition and Machine Intelligence, pp. 535-542, 2001.
[12] L. P. Morency and R. Gupta, "Robust real-time egomotion from stereo images," Proc. of Intl. Conf. on Image Processing, vol. 2, pp. 719-722, 2003.
[13] S. Vedula, S. Baker, P. Rander, R. Collins, and T. Kanade, "Three-dimensional scene flow," Intl. Conference on Computer Vision, vol. 2, pp. 722-729, 1999.
[14] K. Briechle and U. D. Hanebeck, "Template matching using fast normalized cross correlation," Proc. of Optical Pattern Recognition XII, vol. 4386, pp. 95-102, 2001.
[15] J. P. Lewis, "Fast template matching," Vision Interface, pp. 120-123, 1995.
[16] Middlebury Stereo Vision Page: http://vision.middlebury.edu/stereo/.
[17] Mesa-Imaging Page: http://www.mesa-imaging.ch/.