Email: fiorentino@poliba.it
http://www.dimeg.poliba.it/vr3lab/
Abstract
The interaction metaphor based on mouse, monitor and keyboard shows evident limits in
engineering design review activities, where real and virtual models must be explored and
compared, and also in outside-the-office environments, where a desk is not available. The
presented research aims to explore a new generation of gesture-based interfaces, called natural
interfaces, which promise intuitive control using free hands and without desk support. We
present a novel natural design review workspace which acquires user motion using a combination
of video and depth cameras and visualizes the CAD models using monitor-based augmented
reality. We implemented a bimanual egocentric pointer paradigm by means of a virtual active
surface in front of the user. We used an XML-configurable approach to explore bimanual gesture
commands to browse, select, dis/assemble and explode complex 3D models imported in the
standard STEP format. Our experiments demonstrated that the virtual active surface effectively
triggers a set of CAD-specific commands and improves technical navigation in non-desktop
environments: e.g. shop floor maintenance, on-site quality control, etc. We evaluated the
feasibility and robustness of the interface and report a high degree of acceptance from the
users, who preferred the presented interface to unconstrained 3D manipulation.
Introduction
Ubiquitous computing is becoming reality with the fast diffusion of smartphones and
tablets, thanks to their increasing processing and 3D graphics power. A novel
human-computer interaction paradigm, called the post-desktop approach [1], is also
strictly related to pervasive computing, where the devices are not personal
computers but tiny, even invisible devices, embedded in almost any type of
surrounding object, including cars, tools, appliances, clothing, etc., all
communicating through increasingly interconnected networks [2]. Nowadays,
these mobile devices are candidates to replace desktop PCs in our daily life and, in
the near future, to play an important role also in industrial scenarios. The
incorporation of ubiquitous devices, micro projectors, and embedded 3D scanners
can lead to a revolutionary way to interact with 3D CAD models. Therefore, in the
area of computer-aided design methods, the study of the potential and the limits of
desktop-less interfaces in industrial use is a very important issue. In particular,
current CAD interfaces show evident limits in Design Review (DR). DR is a
crucial step in Product Lifecycle Management (PLM). Its goal is to spot, as
early as possible in the production chain, product and process weaknesses, errors
and manufacturing problems. One critical aspect of the design evaluation of an
industrial component is the understanding of the engineering model. DR requires an
efficient workspace to understand complex 3D geometries, to browse a large
number of components in an assembly, and to select and manipulate them. In real
industrial scenarios, we can find the following conditions conflicting with the use
of a traditional desk interface: the lack of a clean desk, users wearing gloves, and
the necessity of comparing virtual parts (i.e. ideal CAD models) with real, defective
or uncompleted parts or assemblies. In particular, this paper aims to study a new
generation of gesture-based interfaces, called natural interfaces, to facilitate
technical discussion and CAD model navigation (see Figure 1). The natural
interfaces are designed to reuse existing skills for interacting directly with
content [3], and are therefore particularly user friendly. However, due to their
novelty and to the lack of gesture recognition support in commercial CAD
kernels, the literature on this topic is still scarce or too general, while a
specific CAD-oriented methodology is necessary.
NOTE: This is a manuscript: visit http://link.springer.com/article/10.1007%2Fs12008-012-0179-3
for the printed version
Unlike other similar approaches in the literature, in this work we integrate
natural interfaces into an augmented reality environment. Using bimanual
interaction on a virtual active surface, the user can navigate, inspect, and interact
with CAD models.
The paper is organized as follows: we start with a brief survey of related works in
Section 2 and analyze the design review requirements in Section 3. Section 4
contains a detailed description of the virtual active surface concept, and in Section
5 we detail the implementation. In Section 6 we present a case study and the users'
responses, while in Section 7 we conclude the paper and outline future work.
Related Works
It is a well-known issue in industry that CAD software and downstream
applications hardly support DR, because their human-computer interfaces are
specifically oriented to expert CAD users seated at an office desk and not suited
achieve realistic and real-time CAD assembly simulations in an augmented reality
environment using 3D natural interfaces and a depth camera. A real-time solver
allowed the picking of multiple simple 3D objects (e.g. cylinders, cubes,
etc.), while the depth information resolved occlusions between virtual and real
objects (including the user's body).
All the presented interaction metaphors are designed to work with 3D graphic
models without taking into account the engineering knowledge that they contain.
This paper aims to explore an egocentric virtual pointer interface with a specific
design review approach for CAD assemblies.
data which need to be visualized. The extraction of non-geometrical data and how
the user can interact with them are still open issues. Each CAD vendor
masks the data structure in proprietary formats, and therefore the only feasible
way to access the data is through neutral formats. In particular, we decided to use
the STEP format, which is a de facto standard in most of the commercial CAD
systems [19]. The retrieved information can be used to navigate the models using
the engineering knowledge that is embedded in the CAD design. Figure 4 depicts
the CAD data workflow. In the first step, independently from the specific CAD
platform, the models are exported in the STEP file format.
In the second step, the AR converter module, based on the OpenCascade CAD
kernel, is used to prepare the data files for the AR application. The main function
of this module is the tessellation: it converts each part model from a B-rep
mathematical representation into a separate mesh file. The system supports both
STEP protocols: AP 203 (Configuration controlled 3D designs of mechanical
parts and assemblies) and AP 214 (Core data for automotive mechanical design
processes). The level of detail of the triangulation and, consequently, the
precision of the graphical representation can be optimized for the specific
visualization hardware in order to obtain real-time interaction. The local
translations and rotations of the single CAD parts are flattened to a common
world reference system by traversing the assembly structure. The assembly
structure, the part filenames and other CAD-related data (e.g. volume and
constraints) are stored in a custom 3D model XML file. This is the input file for
the AR application, which allows the user to browse parts and assemblies using
simple gestures and visual feedback according to our novel virtual active surface
paradigm.
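The flattening of local part transforms into a common world reference system, described above, can be sketched as a recursive traversal of the assembly tree. The following is a minimal illustrative sketch (in Python for brevity; the actual AR converter is a C++/OpenCascade module). The node layout and field names are our own assumptions, and only translations are composed, omitting rotations:

```python
# Minimal sketch: flatten local part translations to world coordinates
# by traversing the assembly tree. Names are illustrative, not the
# actual AR converter API; rotations are omitted for brevity.

def flatten_assembly(node, parent_offset=(0.0, 0.0, 0.0)):
    """Return {part_name: world_position} for all leaf parts."""
    ox, oy, oz = parent_offset
    lx, ly, lz = node.get("local", (0.0, 0.0, 0.0))
    world = (ox + lx, oy + ly, oz + lz)
    parts = {}
    children = node.get("children", [])
    if not children:                      # leaf node: an actual part
        parts[node["name"]] = world
    for child in children:                # sub-assembly: recurse
        parts.update(flatten_assembly(child, world))
    return parts

# Example: an assembly containing a sub-assembly with one part.
assembly = {
    "name": "root", "local": (0, 0, 0),
    "children": [
        {"name": "sub", "local": (10, 0, 0),
         "children": [{"name": "bolt", "local": (1, 2, 3)}]},
    ],
}
world_positions = flatten_assembly(assembly)
```

After flattening, each part carries an absolute location, so the AR application can place every mesh directly without re-traversing the CAD hierarchy at runtime.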
A big issue that we experienced in the implementation of this metaphor is the lack
of tactile feedback. Our solution is to provide visual feedback when the user
touches the virtual plane, by displaying a semi-transparent red frame on the
margin of the user's field of view (see Figure 6). The active surface geometry is
defined by three parameters: width (w), height (h) and depth (d). Their values
must be adapted to the user's anthropometry for an ergonomic cursor mapping.
The depth d also determines the collision sensitivity of the active surface: the
interaction is active if the centre of the hand is contained between the back and
front planes (see Figure 6).
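The active surface test described above can be sketched as follows. This is an illustrative Python sketch, not the system's C++ code: the parameter names w, h and d follow the paper, but the user-centric coordinate convention and function names are our own assumptions:

```python
# Illustrative sketch of the virtual active surface test. Parameter
# names w, h, d follow the paper; the coordinate convention and the
# function name are our own assumptions, not the system's API.

def active_surface_cursor(hand, w, h, d, surface_z):
    """Map a 3D hand position to a normalized 2D cursor.

    `hand` is (x, y, z) in a user-centric frame where the active
    surface is centred at x = y = 0 and spans z in
    [surface_z, surface_z + d] (the front/back planes).
    Returns (u, v) in [0, 1]^2, or None when the hand centre is
    outside the active volume (no interaction).
    """
    x, y, z = hand
    inside = (abs(x) <= w / 2 and abs(y) <= h / 2
              and surface_z <= z <= surface_z + d)
    if not inside:
        return None
    return (x / w + 0.5, y / h + 0.5)

# A hand at the surface centre maps to the middle of the plane.
cursor = active_surface_cursor((0.0, 0.0, 0.55), w=0.8, h=0.5,
                               d=0.1, surface_z=0.5)
```

Returning None outside the volume is what allows the red-frame feedback to be toggled only while the hand actually "touches" the virtual plane.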
These three parameters are defined during a preliminary calibration phase while
the user extends his/her arms. Visual cursors are displayed on the active plane as
semi-transparent proxies (2D discs, 3D spheres, or also a virtual hand model).
The cursor state is obtained by a gesture recognition module. Although we could
exploit all the gestures obtainable with five fingers, in this metaphor we use only
three states: open hand state (OS), fist state (FS), and bang state (BS). We use the
open hand for idle cursor visual feedback, the grasping fist for selection, and the
three-finger gesture for navigation (see Figure 7). The user has visual feedback of
these states because the proxies change colour accordingly. The 3D object
interaction is obtained by ray casting the 2D cursors according to the virtual
pointer metaphor.
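The ray casting behind the virtual pointer metaphor can be illustrated with a minimal stand-in: a ray cast from the eye through the cursor is intersected with part bounding spheres, and the nearest hit is picked. The sphere-based scene and all names here are our own illustrative assumptions:

```python
# Minimal stand-in for the virtual pointer metaphor: cast a ray from
# the eye through the cursor and pick the nearest bounding sphere.
# The sphere-based scene and all names are illustrative assumptions.
import math

def pick(ray_origin, ray_dir, spheres):
    """Return the name of the nearest sphere hit by the ray, or None.

    `spheres` is a list of (name, centre, radius). The ray direction
    need not be normalized.
    """
    best = (None, math.inf)
    n = math.sqrt(sum(c * c for c in ray_dir))
    d = tuple(c / n for c in ray_dir)
    for name, centre, radius in spheres:
        oc = tuple(o - c for o, c in zip(ray_origin, centre))
        b = sum(dc * occ for dc, occ in zip(d, oc))
        disc = b * b - (sum(c * c for c in oc) - radius * radius)
        if disc < 0:
            continue                       # ray misses this sphere
        t = -b - math.sqrt(disc)           # nearest intersection
        if 0 <= t < best[1]:
            best = (name, t)
    return best[0]

# A ray down the z axis hits the closer of two parts.
hit = pick((0, 0, 0), (0, 0, 1),
           [("far_part", (0, 0, 10), 1.0), ("near_part", (0, 0, 5), 1.0)])
```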
from each other. The midpoint between the hands controls the centre of the
scaling operation (i.e., the only point that remains constant).
Object selection is obtained by moving the cursor onto the object and changing
the state to FS, simulating a grasping action.
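The bimanual scaling above amounts to scaling each point about the hands' midpoint by the ratio of the current to the initial hand distance. A minimal sketch (function names are ours, not the system's API):

```python
# Sketch of the bimanual scaling described above: the scale factor is
# the ratio of current to initial hand distance, and the midpoint
# between the hands is the fixed centre of the scaling.
# Function names are illustrative, not the system's API.
import math

def scale_about_midpoint(point, left, right, initial_distance):
    """Scale `point` about the hands' midpoint by the distance ratio."""
    s = math.dist(left, right) / initial_distance
    c = tuple((l + r) / 2 for l, r in zip(left, right))
    # p' = c + s * (p - c): the midpoint c maps to itself.
    return tuple(cc + s * (p - cc) for p, cc in zip(point, c))

# Moving the hands twice as far apart doubles distances from the
# midpoint; the midpoint itself stays fixed.
p = scale_about_midpoint((3.0, 0.0, 0.0), (-2.0, 0.0, 0.0),
                         (2.0, 0.0, 0.0), initial_distance=2.0)
```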
Files interaction
As regards managing CAD files with natural interaction, we already presented a
visual file manager in a previous work [21].
In the current approach, differently from that solution, opening a CAD
file is as simple and intuitive as scrolling the contact list on a smartphone. The
Documentation Browsing Bar (see bottom of Figure 8) appears and disappears
automatically when the user's pointer reaches the lower zone of the active surface.
The CAD files are visually represented as miniaturized icons on the documentation
browsing bar. The user scrolls the files with a hand in FS toward left or right (see
Figure 9). Once found, the file is loaded by selecting and dragging its icon into the
centre of the active surface area. The current CAD document is closed by a
zoom reduction until the hands are joined for more than 3 seconds.
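The two triggers above can be sketched as simple predicates: the bar is shown while the cursor sits in the lower zone, and the document closes once the hands stay joined past the hold time. This is an illustrative Python sketch; the zone size, join radius and names are assumptions (only the 3-second hold comes from the text):

```python
# Illustrative sketch of two interface behaviours described above.
# LOWER_ZONE and JOIN_RADIUS are assumed values; only the 3 s hold
# time comes from the paper.

LOWER_ZONE = 0.2        # bottom 20% of the active surface (assumed)
JOIN_RADIUS = 0.05      # hands closer than 5 cm count as "joined"
CLOSE_HOLD_S = 3.0      # hold time required to close the document

def bar_visible(cursor_v):
    """Show the bar while the normalized cursor height is in the
    lower zone (v = 0 is the bottom edge of the active surface)."""
    return cursor_v <= LOWER_ZONE

def should_close(join_start_time, now, hands_joined):
    """Return (close?, new_join_start_time) given whether the hands
    are currently joined and when the join began (None if not)."""
    if not hands_joined:
        return False, None              # reset the timer
    if join_start_time is None:
        return False, now               # join just started
    return (now - join_start_time) > CLOSE_HOLD_S, join_start_time

visible = bar_visible(0.1)
_, t0 = should_close(None, 0.0, True)       # hands just joined
close, _ = should_close(t0, 3.5, True)      # still joined 3.5 s later
```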
Implementation
The system is composed of two distinct PCs that operate in parallel: one for the
user tracking and one for the AR application. This approach was necessary due to
the computational requirements of the user interaction, but it also increases the
flexibility of the entire system during the development and testing phase.
Hardware
Figure 10 shows an overview of the hardware setup of the presented AR
application. The user stands in a fixed position in front of the main working area.
Although any AR display system (e.g. a head mounted display) can be effectively
integrated with natural interaction, we decided to use a simple monitor-based AR
system. It incorporates a 24" widescreen LCD monitor and a video camera
Software
The software is written in C++ using object-oriented programming and open
source libraries. Figure 11 depicts a schematic overview of the two applications
we implemented for the system. The hands tracker system is divided into a
Skeleton Computation Module and a Hand State Recognition Module. Their
function is to generate real-time events carrying the hand positions in 3D space
and their states. The skeleton computation relies on the OpenNI framework
(http://www.openni.org), an application programming interface that
provides a middleware component to retrieve the images from the Kinect and to
determine the user's limb positions, and in particular the approximate hand
locations.
Since the user anthropometrics are registered in each session, our system can
recognize different users and recall their calibration data.
The Hand State Recognition Module uses an image processing algorithm based on
OpenCV (http://opencv.willowgarage.com/), an open source computer vision
library. The starting point of the algorithm is the depth image retrieved from the
Kinect as shown in Figure 12 on the right.
The system segments the user silhouette from the background using the depth
information and projects the 3D positions of the hands of the user skeleton onto the
depth image plane. We define two 80x80 pixel square regions of interest around
the projected positions and apply a threshold function. The final output is
composed of two black and white images which contain only the hand outlines.
The hand shape is described using the Hu set of invariant moments. They are
invariant to translations and rotations; thus, they increase the system robustness.
To identify the hand gesture, we compare the estimated Hu set of moments with a
pre-defined set using a support vector machine, a non-probabilistic binary linear
classifier. The pre-defined Hu set of moments of each gesture is calculated from
200 image samples. We limited the hand states to just three (OS, FS and BS) for
two main reasons. Firstly, image processing in an uncontrolled environment is very
challenging and the classifier error rate grows more than linearly with the number
of hand states. Moreover, we experienced how false or wrong gesture detections
are very frustrating for the user (e.g. triggering one command instead of another).
We chose three states as an optimal trade-off between robustness and flexibility to
perform our DR task. The second reason is that some hand configurations proved
to be uncomfortable and wearisome (e.g. indicating the number two), especially if
repeated or kept for more than a few seconds. Our states proved to be not tiring
for hands and fingers and easy to learn and remember because they mimic real
object manipulation.
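The invariant-moment idea behind this pipeline can be illustrated with a minimal pure-Python stand-in: we compute the first two Hu invariants on tiny binary images and classify by nearest template. This replaces the actual OpenCV Hu moments, the SVM, the 80x80 Kinect ROIs and the 200-sample training with toy equivalents, purely to show why a translated hand silhouette maps to the same gesture:

```python
# Minimal stand-in for the gesture classifier: compute the first two
# Hu invariant moments of a binary image and classify by nearest
# pre-defined template. The real system uses OpenCV moments and an
# SVM trained on 200 samples per gesture; this toy sketch only shows
# the translation/rotation-invariant idea.

def hu2(img):
    """First two Hu moments of a binary image (list of 0/1 rows)."""
    pts = [(x, y) for y, row in enumerate(img)
           for x, v in enumerate(row) if v]
    m00 = float(len(pts))
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00
    def mu(p, q):                      # central moment mu_pq
        return sum((x - cx) ** p * (y - cy) ** q for x, y in pts)
    def eta(p, q):                     # scale-normalized moment
        return mu(p, q) / m00 ** (1 + (p + q) / 2)
    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return (e20 + e02, (e20 - e02) ** 2 + 4 * e11 ** 2)

def classify(img, templates):
    """Nearest-template label by squared distance in Hu space."""
    h = hu2(img)
    return min(templates,
               key=lambda lbl: sum((a - b) ** 2
                                   for a, b in zip(h, templates[lbl])))

# Toy "gestures": a horizontal bar vs. a square blob.
bar    = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
square = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
templates = {"bar": hu2(bar), "square": hu2(square)}

# A translated copy of the bar has identical central moments,
# so it is classified as "bar".
shifted_bar = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
label = classify(shifted_bar, templates)
```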
The 3D hand positions and states are submitted as network events via UDP/IP to
the AR application. The augmented reality application can be broken into four
modules: the visualization engine, the tracking, the flexible interface and the
model manager. The visualization engine's main function is to perform the real-time
overlay and registration of the virtual scene over the real world captured by the
camera. This module is based on OpenSceneGraph (an OpenGL-based graphics
library, www.openscenegraph.org) for the rendering and on the ARToolKit library
[22] to perform the scene tracking using image-based markers.
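A hand event like the one sent from the tracker PC to the AR application over UDP/IP can be sketched as a fixed binary record. The actual wire format of the system is not documented here; this layout (two hand positions as 32-bit floats plus one state byte per hand) is purely illustrative:

```python
# Sketch of a hand-event network message like the one sent over
# UDP/IP from the tracker to the AR application. This record layout
# is purely illustrative, not the system's actual wire format.
import struct

# Hand states as in the paper: open hand, fist, bang.
OS, FS, BS = 0, 1, 2
FMT = "<6f2B"   # x,y,z for each hand (float32), then two state bytes

def pack_event(left_pos, right_pos, left_state, right_state):
    return struct.pack(FMT, *left_pos, *right_pos,
                       left_state, right_state)

def unpack_event(data):
    vals = struct.unpack(FMT, data)
    return (vals[0:3], vals[3:6], vals[6], vals[7])

# Round-trip a sample event: left hand open, right hand in a fist.
msg = pack_event((0.1, 0.2, 0.5), (-0.1, 0.2, 0.5), OS, FS)
left, right, ls, rs = unpack_event(msg)
```

The payload would then be handed to `socket.sendto` on the tracker side; keeping the record small and fixed-size suits the fire-and-forget UDP delivery used here.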
The application interface module collects the events from the hand tracking and
decodes them into a sequence of signals for the application finite state machine. The
state machine activates the application functions in a flexible way, because it is
described by a standardized UML-XMI model. This approach allowed us to
easily explore different interface metaphors just by changing an XML file instead of
doing it programmatically. The last component of the AR application, the model
manager, is expressly dedicated to model-related activities. The model manager
reads the XML model file generated by the AR converter, parses the assembly
structure and manages: visualization, registration, assembly configurations,
dynamic model loading and explosions.
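Such a data-driven state machine can be sketched as a table-driven FSM. Here a Python dict stands in for the UML-XMI/XML description, and the state and signal names are illustrative, not the system's actual vocabulary; the point is that metaphors change by editing data, not code:

```python
# Table-driven finite state machine sketch: the transition table
# stands in for the UML-XMI / XML description, so interface metaphors
# can be changed by editing data instead of code. State and signal
# names are illustrative assumptions.

TRANSITIONS = {
    ("idle",       "hand_fist"): "selecting",
    ("selecting",  "hand_open"): "idle",
    ("selecting",  "hand_bang"): "navigating",
    ("navigating", "hand_open"): "idle",
}

def run_fsm(start, signals, table=TRANSITIONS):
    """Feed a signal sequence through the table; unknown signals
    leave the state unchanged. Returns (final_state, state_trace)."""
    state = start
    trace = [state]
    for sig in signals:
        state = table.get((state, sig), state)
        trace.append(state)
    return state, trace

final, trace = run_fsm("idle", ["hand_fist", "hand_bang", "hand_open"])
```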
This file is based on the schema ARProTo.xsd (see
https://homepages.uni-paderborn.de/rafael76/ARXML/ARProTo.xsd). Figure 13
represents the XML file of a simple product composed of several cubic parts. This
file contains a sequence of "Model3D" elements, each representing a single part
with the following data: a unique ID, the OSG mesh model filename (extension
.osgt), the location in the global coordinate system (instead of the relative one
provided by the CAD system), and the volume. The volume information is the
exact value computed by the CAD kernel, instead of the approximated mesh
internal volume. The volume is used by the Model Manager to optimize the
visualization by simply unloading at runtime small components located far from
the user's view.
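Reading such a model file can be sketched with a standard XML parser. Only the element name "Model3D" and the ID/mesh/location/volume fields come from the description above; the attribute names and the unloading thresholds are our own guesses, not the actual ARProTo.xsd schema:

```python
# Sketch of reading the custom 3D model XML file described above.
# Only the element name "Model3D" comes from the text; the attribute
# names (id, mesh, x/y/z, volume) and the thresholds are guesses,
# not the actual ARProTo.xsd schema.
import xml.etree.ElementTree as ET

SAMPLE = """
<Product>
  <Model3D id="1" mesh="cube_a.osgt" x="0" y="0" z="0" volume="8.0"/>
  <Model3D id="2" mesh="cube_b.osgt" x="100" y="0" z="0" volume="0.5"/>
</Product>
"""

def load_parts(xml_text):
    root = ET.fromstring(xml_text)
    return [{"id": m.get("id"), "mesh": m.get("mesh"),
             "pos": tuple(float(m.get(k)) for k in ("x", "y", "z")),
             "volume": float(m.get("volume"))}
            for m in root.iter("Model3D")]

def keep_loaded(part, view_pos, max_dist=50.0, min_volume=1.0):
    """Model-manager style heuristic: unload small parts located far
    from the user's view (threshold values are illustrative)."""
    d = sum((p - v) ** 2 for p, v in zip(part["pos"], view_pos)) ** 0.5
    return part["volume"] >= min_volume or d <= max_dist

parts = load_parts(SAMPLE)
loaded = [p["id"] for p in parts if keep_loaded(p, view_pos=(0, 0, 0))]
```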
Figure 16 compares the responses of the users as to the overall ease of use of the
2D interaction with the 3D one, showing that the former is the preferred one
(median value 4 vs. 3). Also, in a direct question asking whether the completion of
a task is more difficult using 3D techniques than using 2D techniques, the users
strongly agreed (median value 5).
Figure 17 compares the responses as to a specific task (Pan & Orbit). The 2D
interaction is once more preferred (median value 4 vs. 3).
In a direct question about manipulating objects, the users agreed (median value 4)
that moving and rotating objects using 2D interaction techniques is easier than
using constrained 3D techniques.
The results from these user ratings are clearly consistent in favour of the presented
active plane interaction techniques. We can justify this non-trivial result with two
main explanations. Firstly, all the users in the test already had experience and
familiarity with desktop CAD 2D interfaces, so the presented approach fits
seamlessly with their skills. Secondly, due to the existing depth camera tracking
limitations, a constrained movement can be more effective and precise than a
completely free one.
References
[1] Kumar, R., Chatterjee, R.: Shaping Ubiquity for the Developing World. International
[3] Blake, J.: Natural User Interfaces in .NET. Manning Publications Co., Shelter Island, NY (2010)
[4] Bowman, D.A., Kruijff, E., LaViola, J.J., Poupyrev, I.: 3D User Interfaces: Theory and Practice. Addison Wesley Longman Publishing Co. Inc., Redwood City, CA, USA (2004)
[5] Pausch, R., Burnette, T., Brockway, D., Weiblen, M.E.: Navigation and locomotion in virtual worlds via flight into hand-held miniatures. In: Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '95), ACM, New York, NY, USA, pp. 399-400 (1995)
[6] Fiorentino, M., Monno, G., Uva, A.E.: Tangible digital master for product lifecycle management in augmented reality. Int. Journal on Interactive Design and Manufacturing, Vol. 3, Issue 2, pp. 121-129 (2009) doi:10.1007/s12008-009-0062-z
[7] Ullmer, B., Ishii, H.: Emerging Frameworks for Tangible User Interfaces. IBM Systems
[8] Buchmann, V., Violich, S., Billinghurst, M., Cockburn, A.: FingARtips, gesture based direct manipulation in Augmented Reality. In: Proc. of the 2nd Int. Conf. on Computer Graphics and Interactive Techniques, GRAPHITE '04 (2004)
[9] Reifinger, S., Wallhoff, F., Ablassmeier, M., Poitschke, T., Rigoll, G.: Static and Dynamic Hand-Gesture Recognition for Augmented Reality Applications. In: HCI Intelligent Multimodal Interaction Environments. Springer-Verlag (2007)
[10]
[11] Vision and Image Understanding, Volume 108, Issues 1-2, pp. 116-134 (2007)
[12] Zhu, Y., Xu, G., Kriegman, D.J.: A real-time approach to the spotting, representation, and recognition of hand gestures for human-computer interaction. Computer Vision and Image Understanding 85(3), pp. 189-208 (2002)
[13] Liu, X., Fujimura, K.: Hand gesture recognition using depth data. In: Proceedings of Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 529-534 (2004) doi:10.1109/AFGR.2004.1301587
[14] Valentini, P.P.: Enhancing user role in augmented reality interactive simulations. Chapter
[15] Dani, T.H., Gadh, R.: Creation of concept shape designs via a virtual reality interface.
[16] Dellisanti, M., Fiorentino, M., Monno, G., Uva, A.E.: Enhanced 3D object snap for CAD
[17] Valentini, P.P.: Natural interface in augmented reality interactive simulations. Virtual and
[18] Fiorentino, M., Uva, A.E., Monno, G., Radkowski, R.: Augmented technical drawings: A novel technique for natural interactive visualization of computer-aided design models. Journal of Computing and Information Science in Engineering, 12(2) (2012) doi:10.1115/1.4006431
[19] STEP Application Handbook: ISO 10303 (Version 3 ed.), North Charleston, SC: SCRA (2006)
[20] Dünser, A., Grasset, R., Seichter, H., Billinghurst, M.: Applying HCI principles to AR systems design. HIT Lab New Zealand Technical Report, pp. 15 (2007)
[21] Radkowski, R., Stritzke, C.: Interactive Hand Gesture-based Assembly for Augmented
[22] Kato, H., Billinghurst, M.: Marker Tracking and HMD Calibration for a video-based