
COPYRIGHT DISCLAIMER

This is a draft manuscript, confidential, and no material from this document may be copied, reproduced, republished, uploaded, posted, transmitted or distributed in any way without the express and specific permission of the copyright author(s) or the publisher. You may download one copy of the materials on any single computer for your personal, informal, non-commercial, EDUCATIONAL and RESEARCH PURPOSES only, provided you keep intact all original copyright and other proprietary notices. Modification of the materials, or use of the materials for any other purpose, is a violation of international copyright law.

Please download and cite the published form using the link to the publisher:
http://link.springer.com/article/10.1007%2Fs12008-012-0179-3
or
Fiorentino M., Radkowski R., Stritzke C., Uva A. E., Monno G.: Design review of CAD assemblies using bimanual natural interface. International Journal on Interactive Design and Manufacturing (IJIDeM), Springer-Verlag, ISSN 1955-2513, pp. 1-12 (2012). doi: 10.1007/s12008-012-0179-3

NOTE: This is a manuscript: visit http://link.springer.com/article/10.1007%2Fs12008-012-0179-3 for the printed version.

Design review of CAD assemblies using bimanual natural interface
Michele Fiorentino (a), Rafael Radkowski (b), Christian Stritzke (b), Antonio E. Uva (a), Giuseppe Monno (a)
(a) Department of Mechanics, Mathematics and Management, Politecnico di Bari, {fiorentino, a.uva, gmonno}@poliba.it
(b) Heinz Nixdorf Institute, University of Paderborn, {rafael.radkowski, cstritzk}@hni.uni-paderborn.de

Contact author: Michele Fiorentino
Viale Japigia 182, 70126, Bari, IT
Phone: 0039 080 596 2800
Fax: 0039 080 596 2777
Email: fiorentino@poliba.it
http://www.dimeg.poliba.it/vr3lab/

Abstract
The interaction metaphor based on mouse, monitor and keyboard shows evident limits in engineering design review activities, where real and virtual models must be explored and compared, and also in outside-the-office environments, where a desk is not available. The presented research aims to explore a new generation of gesture-based interfaces, called natural interfaces, which promise intuitive control using free hands and without desk support. We present a novel natural design review workspace which acquires user motion using a combination of video and depth cameras and visualizes the CAD models using monitor-based augmented reality. We implemented a bimanual egocentric pointer paradigm based on a virtual active surface in front of the user. We used an XML-configurable approach to explore bimanual gesture commands to browse, select, assemble/disassemble and explode complex 3D models imported in the standard STEP format. Our experiments demonstrated that the virtual active surface is able to effectively trigger a set of CAD-specific commands and to improve technical navigation in non-desktop environments, e.g. shop floor maintenance, on-site quality control, etc. We evaluated the feasibility and robustness of the interface and report a high degree of acceptance from the users, who preferred the presented interface to an unconstrained 3D manipulation.

Keywords: Natural interfaces, Augmented reality, Human-computer interfaces, CAD, Depth camera, 3D manipulation

Introduction
Ubiquitous computing is becoming reality with the fast diffusion of smartphones and tablets, driven by their increasing processing and 3D graphics power. A novel human-computer interaction paradigm, called the post-desktop approach [1], is also strictly related to pervasive computing, where the devices are not personal computers but tiny, even invisible devices embedded in almost any type of surrounding object, including cars, tools, appliances, clothing, etc., all communicating through increasingly interconnected networks [2]. Nowadays, these mobile devices are candidates to replace desktop PCs in our daily life and, in the near future, to play an important role in industrial scenarios as well. The incorporation of ubiquitous devices, micro projectors, and embedded 3D scanners can lead to a revolutionary way to interact with 3D CAD models. Therefore, in the area of computer aided design methods, the study of the potential and the limits of desktop-less interfaces in industrial use is a very important issue. In particular,
current CAD interfaces show evident limits in Design Review (DR). DR is a crucial step in Product Lifecycle Management (PLM). Its goal is to spot, as early as possible in the production chain, product and process weaknesses, errors and manufacturing problems. One critical aspect of the design evaluation of an industrial component is the understanding of the engineering model. DR requires an efficient workspace to understand complex 3D geometries, to browse a large number of components in an assembly, and to select and manipulate them. In real industrial scenarios, we can find the following conditions conflicting with the use of a traditional desk interface: the lack of a clean desk, users wearing gloves, and the need to compare virtual models (i.e. ideal CAD models) with real, defective or incomplete parts or assemblies. In particular, this paper aims to study a new generation of gesture-based interfaces, called natural interfaces, to facilitate technical discussion and CAD model navigation (see figure 1). Natural interfaces are designed to reuse existing skills for interacting directly with
content [3], and therefore they are particularly user friendly. However, due to their novelty and to the lack of gesture recognition support in commercial CAD kernels, the literature on this topic is still scarce or too general, while a specific CAD-oriented methodology is necessary.
Differently from other similar approaches in the literature, in this work we integrate natural interfaces into an augmented reality environment. Using bimanual interaction on a virtual active surface, the user can navigate, inspect, and interact with CAD models.
The paper is organized as follows: we start with a brief survey of related works in Section 2 and analyze the design review requirements in Section 3. Section 4 contains a detailed description of the virtual active surface concept, and in Section 5 we detail the implementation. In Section 6 we present a case study and the users' response, while in Section 7 we conclude the paper and outline future work.

Related Works
It is a well-known issue in industry that CAD software and downstream applications hardly support DR, because their human-computer interfaces are specifically oriented to expert CAD users seated at an office desk and are not suited
for natural 3D model navigation and collaboration. Figure 2 depicts a simplified taxonomy of the common interaction metaphors for the manipulation of 3D virtual objects according to Bowman et al. [4]. They split the interaction metaphors into two main branches: exocentric and egocentric. In the former, the exocentric (also called the "God's-eye" perspective), the user interacts from outside the virtual environment. An example of this approach is the World-in-Miniature [5], where the user is provided with a miniature model of the whole scene and the interactions are mapped to the real-scale world. In the latter metaphor, the egocentric, the user interacts from inside the environment according to two metaphors: the virtual hand and the virtual pointer. The first method needs 3D tracking to map the user's real hands to a virtual representation of them, in order to simulate real-life gestures and benefit from natural skills and coordination. With the second metaphor, the virtual pointer, the user selects and manipulates objects by pointing at them. A very common example of this approach is the "laser ray": an imaginary linear vector emanating from the user's eye through the virtual pointer, which emulates the mouse pointer. Each of the presented approaches has positive aspects and drawbacks, and there is no optimal solution for all possible scenarios: model size, number of elements, object complexity, user expertise, etc.

In a previous work [6] we presented a desk-less egocentric interface for enriching paper technical drawings with virtual technical content (2D images, 3D graphics, annotations). The system was based on two advanced technologies: Tangible User Interfaces (TUI) and Augmented Reality (AR). TUIs enable a person to interact with digital information using the physical environment [7]. TUIs are designed as an alternative paradigm to conventional GUIs, allowing users to manipulate objects in virtual space using physical, thus tangible, objects. The study proved the advantages of a desk-less approach in engineering discussion; however, it also revealed some limits: the fiducial-based tracking, the restricted number of available commands, and the need to have at least one hand free and clean. Those conditions have a deep impact on the usability in industrial scenarios, where the multiple markers needed to keep the tracking robust reduced the useful information area. We also needed a hybrid interface, supporting the tangible one with a traditional GUI, to provide a larger set of commands. Finally, tangible interfaces proved to be very intuitive but, especially in industrial scenarios such as maintenance, an interface that leaves the hands free could be more appropriate.
Buchmann et al. [8] introduced one significant step towards egocentric and highly natural interfaces with FingARtips: a gesture-based system for the direct manipulation of virtual objects. They attached fiducial markers to each finger to track the fingertips of a person and to derive gestures. This system allows the user to pick up virtual objects in 3D space. Reifinger et al. [9] presented a similar approach, but using infrared tracking for hand and gesture recognition. Infrared-reflective markers are attached to the fingertips of the user, and s/he can grasp virtual objects as in the physical world.
Recent technologies such as 3D depth cameras (e.g. Microsoft Kinect [10], see figure 3) can potentially allow low-cost optical gesture recognition without the need to wear or handle any awkward device [11].

Recognizing human gestures in real time in a general environment is a complex task, which involves multiple multidisciplinary aspects such as motion modelling, motion analysis, pattern recognition, and machine learning. For hand detection, which is an essential component of gesture recognition, the most common approaches in the literature use colour information (skin segmentation) and motion information. However, these approaches struggle with changing light conditions and make hand motion tracking not robust [12]. Recently, 2D vision-based gesture recognition systems have received a boost from the additional aid provided by depth cameras [13][14]. A lot of research effort is spent on improving the algorithms in terms of latency, robustness and precision, but few studies have been carried out on the usability in a specific application field. Our contribution to the state of the art of this emerging technology is to apply natural interfaces to the CAD domain, and in particular to support engineering 3D model navigation for design review. In fact, the idea of gesture-based CAD interaction is not new in the literature and is deeply connected with virtual reality (VR) technology [15]. In a previous research work using a VR setup, we investigated 3D input asymmetry with user studies [16]. In particular, pointing accuracy along the depth direction (i.e. in front of the user) is significantly lower than along the other (horizontal/vertical) directions. Valentini [17] presented a novel approach to
achieve realistic and real-time CAD assembly simulations in an augmented reality environment using 3D natural interfaces and a depth camera. A real-time solver allowed the picking of multiple simple 3D objects (e.g. cylinders, cubes, etc.), while the depth information resolved occlusions between virtual and real objects (including the user's body).
All the presented interaction metaphors are designed to work with 3D graphic models without taking into account the engineering knowledge that they contain. This paper aims to explore an egocentric virtual pointer interface with a specific design review approach for CAD assemblies.

Design review requirements


A significant issue in current DR tools is related to usability. Differently from CAD modeling, the audience of DR is much wider and ranges from specialized designers to marketing, managers and quality control professionals. This heterogeneous group of users can benefit from a natural interface. In a previous research work we already experimented with natural interfaces using 3D interaction, where user tests proved the advantages but also the disadvantages of a completely unconstrained 3D navigation [18]. In fact, a fully unconstrained and direct 3D interaction, while theoretically the most natural, can be awkward due to the model dimensions, the tracking latency and precision, the user attention allocation, the interaction anisotropy and the lack of tactile feedback. In the specific domain of design review, our approach aims to improve the usability by reducing the available degrees of freedom of the natural interaction. In practice, DR requires the implementation of a limited but effective subset of CAD functions. These functions should be accessible without the need of mouse/keyboard input, for the aforementioned reasons.
Another important aspect in DR is the straightforward integration in the engineering design workflow, according to industry standards and practice (e.g. file formats, technical knowledge, etc.). Most of the natural interaction systems described in the literature focus mainly on geometry and aesthetic rendering. A CAD model is not just geometry and materials; it contains a lot of technical
data which need to be visualized. The extraction of non-geometric data, and how the user can interact with it, are still open issues. Each CAD vendor hides the data structure in proprietary formats, and therefore the only feasible way to access the data is through neutral formats. In particular, we decided to use the STEP format, which is a de facto standard supported by most commercial CAD systems [19]. The retrieved information can be used to navigate the models using the engineering knowledge that is embedded in the CAD design. Figure 4 depicts the CAD data workflow. In the first step, independently from the specific CAD platform, the models are exported in the STEP file format.

In the second step, the AR converter module, based on the OpenCascade CAD kernel, is used to prepare the data files for the AR application. The main function of this module is the tessellation: it converts each part model from a B-rep mathematical representation into a separate mesh file. The system supports both STEP protocols: AP 203 "Configuration controlled 3D designs of mechanical parts and assemblies" and AP 214 "Core data for automotive mechanical design processes". The level of detail of the triangulation and, consequently, the precision of the graphical representation can be optimized for the specific visualization hardware in order to obtain real-time interaction. The local translations and rotations of the single CAD parts are flattened to a common world reference system by traversing the assembly structure. The assembly structure, the part filenames and other CAD-related data (e.g. volume and
constraints) are stored in a custom 3D model XML file. This is the input file for
the AR application, which allows the user to browse parts and assemblies using
simple gestures and visual feedback according to our novel virtual active surface
paradigm.
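
To make this step concrete, the sketch below reads a STEP file and tessellates it with the OpenCascade kernel, on which the AR converter module is based. The file names, the deflection value and the STL output are placeholder assumptions for brevity; the actual converter writes .osgt meshes and additionally extracts the assembly structure, positions and volumes described above.

// Minimal sketch of the STEP import and tessellation performed by the AR
// converter (assumed file names and tolerance; the real module also walks
// the assembly tree and writes the XML description).
#include <STEPControl_Reader.hxx>
#include <IFSelect_ReturnStatus.hxx>
#include <BRepMesh_IncrementalMesh.hxx>
#include <TopoDS_Shape.hxx>
#include <StlAPI_Writer.hxx>

int main()
{
    // 1. Read the neutral STEP file (AP 203 / AP 214) exported by the CAD system.
    STEPControl_Reader reader;
    if (reader.ReadFile("assembly.step") != IFSelect_RetDone)
        return 1;
    reader.TransferRoots();                  // translate all root entities
    TopoDS_Shape shape = reader.OneShape();  // merged shape of the model

    // 2. Tessellate the B-rep into triangles. The linear deflection controls
    //    the level of detail and therefore the rendering performance.
    const Standard_Real linearDeflection = 0.5;   // model units, placeholder
    BRepMesh_IncrementalMesh mesher(shape, linearDeflection);

    // 3. Export the triangulation to a mesh file for the AR application
    //    (STL here for brevity; the converter in the paper writes .osgt files).
    StlAPI_Writer writer;
    writer.Write(shape, "assembly.stl");
    return 0;
}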

The virtual active surface


Our interaction approach was inspired by the well-known multi-touch interaction metaphor (e.g. Apple iPad). The proposed interaction technique follows the design recommendations by Dünser et al. [20]. Their guidelines demand a reduced cognitive load, immediate feedback, and an error-tolerant system. We need to consider that the user is always in movement, and that some human actions do not necessarily indicate an intended interaction (e.g. answering a mobile phone). Moreover, in the interface design we need to consider the limited precision of current low-cost depth cameras: approximately 1 cm at a 2 m distance. We implemented a bimanual egocentric pointer paradigm with an imaginary virtual plane located approximately 50 cm in front of the user's chest (see Figure 5). If the user's hands are near this plane, the system recognizes an intentional interaction and consequently triggers an action if it is associated with a previously classified gesture.
The active surface is designed to separate the interaction volume from the personal area between the plane and the user. The personal area is a private, untracked zone, which allows normal working activity (e.g. sketching on paper or handling objects) without triggering any event. The system recognizes the user's chest position and moves the plane accordingly.

A big issue that we experienced in the implementation of this metaphor is the lack of tactile feedback. Our solution is to provide visual feedback when the user touches the virtual plane, by displaying a semi-transparent red frame at the margin of the user's field of view (see Figure 6). The active surface geometry is defined by three parameters: width (w), height (h) and depth (d). Their values must be adapted to the user's anthropometry for an ergonomic cursor mapping. The depth d also represents the collision sensitivity of the active surface. The interaction is active if the centre of the hand is contained between the back and front planes, see figure 6.
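
A minimal sketch of this test is shown below. The chest-centred coordinate convention, the default parameter values and the cursor normalization are illustrative assumptions; the paper only prescribes the w, h, d parameters and the roughly 50 cm offset.

// Sketch of the active-surface test: a hand triggers interaction only when
// its centre lies inside the w x h x d box placed about 50 cm in front of
// the tracked chest position (assumed convention: x right, y up, z from the
// chest towards the screen).
#include <cmath>

struct Vec3 { float x, y, z; };

struct ActiveSurface {
    Vec3  chest;           // tracked chest position; the plane follows it
    float w = 1.2f;        // width  [m], calibrated on the user's arm span
    float h = 0.8f;        // height [m]
    float d = 0.3f;        // depth, i.e. collision sensitivity [m]
    float offset = 0.5f;   // distance of the plane from the chest [m]

    // True if the hand centre lies between the back and front planes and
    // inside the w x h extent of the surface.
    bool isActive(const Vec3& hand) const {
        const float dx = hand.x - chest.x;
        const float dy = hand.y - chest.y;
        const float dz = hand.z - chest.z;
        return std::fabs(dx) <= 0.5f * w &&
               std::fabs(dy) <= 0.5f * h &&
               dz >= offset - 0.5f * d && dz <= offset + 0.5f * d;
    }

    // Map the hand position to a normalized 2D cursor in [0,1] x [0,1].
    void cursor(const Vec3& hand, float& u, float& v) const {
        u = (hand.x - chest.x) / w + 0.5f;
        v = (hand.y - chest.y) / h + 0.5f;
    }
};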

These three parameters are defined during a preliminary calibration phase while the user extends his/her arms. Visual cursors are displayed on the active plane as semi-transparent proxies (2D discs, 3D spheres, or even a virtual hand model). The cursor state is obtained from a gesture recognition module. Although we could exploit all the gestures obtainable with five fingers, in this metaphor we use only three states: open hand state (OS), fist state (FS), and bang state (BS). We use the open hand for idle cursor visual feedback, the grasping fist for selection, and the three-finger gesture for navigation (see figure 7). The user has visual feedback of these states because the proxies change colour accordingly. The 3D object interaction is obtained by ray-casting the 2D cursors according to the virtual pointer metaphor.

Basic CAD interaction


Our interface provides both single-hand (left or right) and bimanual gesture interaction. To visualize the cursor(s), the hand(s) must act on the virtual plane, where d is usually around 30 cm.
The classical scene navigation is obtained with three functions: orbit, pan and zoom. The user can orbit the scene with one single hand in BS. With both hands in BS, the system can perform pan and zoom at the same time. If the hands move in roughly the same direction, a pan action is activated. To zoom in, the user moves the two hands closer to each other, and to zoom out the user moves them away from each other. The midpoint between the hands controls the centre of the scaling operation (i.e., the only point that remains constant).
Object selection is obtained by moving the cursor onto the object and changing the state to FS, simulating a grasping action.
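
The bimanual pan and zoom mapping described above can be sketched as follows; the incremental formulation and the use of the cursor-distance ratio as zoom factor are assumptions for illustration, since the paper does not give the exact formulas.

// Sketch of the bimanual pan/zoom mapping with both hands in FS: the
// midpoint of the two cursors drives the pan and acts as scaling pivot,
// while the change of the distance between the cursors drives the zoom
// (hands moving closer -> zoom in, as described in the text).
#include <cmath>

struct Cursor2D { float x, y; };

struct PanZoomState {
    float    prevDist = 0.f;
    Cursor2D prevMid  {0.f, 0.f};
    bool     hasPrev  = false;

    // Computes the incremental pan, zoom factor and pivot for this frame.
    void update(const Cursor2D& left, const Cursor2D& right,
                Cursor2D& pan, float& zoom, Cursor2D& pivot) {
        pivot.x = 0.5f * (left.x + right.x);
        pivot.y = 0.5f * (left.y + right.y);
        const float dist = std::hypot(right.x - left.x, right.y - left.y);

        if (hasPrev && prevDist > 0.f && dist > 0.f) {
            pan.x = pivot.x - prevMid.x;     // pan follows the midpoint
            pan.y = pivot.y - prevMid.y;
            zoom  = prevDist / dist;         // >1 when the hands move closer
        } else {
            pan  = Cursor2D{0.f, 0.f};
            zoom = 1.f;
        }
        prevMid  = pivot;
        prevDist = dist;
        hasPrev  = true;
    }
};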
Files interaction
As regards managing CAD files with natural interaction, we already presented a
visual file manager in a previous work [21].

In the current approach, differently from that solution, opening a CAD file is as simple and intuitive as scrolling the contact list on a smartphone. The Documentation Browsing Bar (see bottom of figure 8) appears and disappears automatically when the user's pointer reaches the lower zone of the active surface. The CAD files are visually represented as miniaturized icons on the Documentation Browsing Bar. The user scrolls the files with one hand in FS moving left or right (see figure 9). Once found, the file is loaded by selecting and dragging its icon to the centre of the active surface area. The current CAD document is closed by a zoom reduction until the hands stay joined for more than 3 seconds.

Interactive exploded view


If an assembly file is loaded, it is ready for navigation. The user can explode it by applying a zoom-in-like gesture, but with both hands in FS. Correspondingly, the user can recompose the assembly by applying a zoom-out-like gesture with the hands in FS. In the exploded view, the user can visualize and select a sub-assembly from the Documentation Browsing Bar, which is updated according to the assembly context.
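
The paper does not detail the explosion algorithm itself; a simple scheme that is compatible with the description, sketched below as an assumption, pushes each part away from the assembly centroid by an amount proportional to an explosion factor driven by the bimanual gesture.

// Assumed exploded-view scheme (not taken verbatim from the paper): every
// part is offset from the assembly centroid along the direction of its own
// rest position, scaled by an explosion factor in [0, 1] that the bimanual
// FS gesture increases (explode) or decreases (recompose).
#include <vector>

struct Vec3f { float x, y, z; };

struct PartPose {
    Vec3f rest;      // flattened global position from the XML model file
    Vec3f current;   // position used for rendering
};

void applyExplosion(std::vector<PartPose>& parts, float factor)
{
    if (parts.empty()) return;

    // Centroid of the rest positions.
    Vec3f c{0.f, 0.f, 0.f};
    for (const PartPose& p : parts) {
        c.x += p.rest.x; c.y += p.rest.y; c.z += p.rest.z;
    }
    const float n = static_cast<float>(parts.size());
    c.x /= n; c.y /= n; c.z /= n;

    // Offset each part along the centroid-to-part direction.
    const float spread = 2.0f;   // placeholder gain controlling the spacing
    for (PartPose& p : parts) {
        p.current.x = p.rest.x + factor * spread * (p.rest.x - c.x);
        p.current.y = p.rest.y + factor * spread * (p.rest.y - c.y);
        p.current.z = p.rest.z + factor * spread * (p.rest.z - c.z);
    }
}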
In the next section we describe the hardware setup and the software architecture
we implemented to provide the presented natural interface.

Implementation
The system is composed of two distinct PCs that operate in parallel: one for the user tracking and one for the AR application. This approach was necessary due to the computational requirements of the user interaction, but it also increases the flexibility of the entire system during the development and testing phase.
Hardware
Figure 10 shows an overview of the hardware setup of the presented AR application. The user stands at a fixed position in front of the main working area. Although any AR display system (e.g. a head-mounted display) can be effectively integrated with natural interaction, we decided to use a simple monitor-based AR system. It incorporates a 24" widescreen LCD monitor and a video camera
mounted on a tripod. The video camera used for pattern-based tracking is a Creative Live Cam Video IM Ultra webcam (1280 x 960 pixels at 30 fps), located next to the user's head and aimed towards the working space. This configuration simulates a camera attached to a head-mounted display. The user observes the augmented scene on the screen of the monitor, and the Kinect device is located under the monitor, facing the user, to detect her/his gestures. The Kinect video camera provides RGB colour images with a resolution of 640 x 480 pixels and 12-bit depth images. The user does not see these images during normal usage.

Software
The software is written in C++ using object-oriented programming and open source libraries. Figure 11 depicts a schematic overview of the two applications we implemented for the system. The hand tracker system is divided into a Skeleton Computation Module and a Hand State Recognition Module. Their function is to generate real-time events with the hand positions in 3D space and their state. The skeleton computation relies on the OpenNI framework (http://www.openni.org), an application programming interface that provides a middleware component to retrieve the images from the Kinect and to determine the user's limb positions, and in particular the approximate hand locations.

Since the user anthropometrics are registered in each session, our system can
recognize different users and recall their calibration data.
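
The fragment below sketches how the approximate hand positions can be queried through the OpenNI skeleton capability; context creation, user detection and calibration callbacks are omitted, and the actual Skeleton Computation Module is naturally more complete.

// Sketch (OpenNI 1.x C++ wrapper): read the 3D positions of both hands of
// the first tracked user. Setup and calibration handling are omitted.
#include <XnCppWrapper.h>

bool getHandPositions(xn::UserGenerator& userGen,
                      XnPoint3D& leftHand, XnPoint3D& rightHand)
{
    XnUserID users[1];
    XnUInt16 nUsers = 1;
    userGen.GetUsers(users, nUsers);                       // first detected user
    if (nUsers == 0 || !userGen.GetSkeletonCap().IsTracking(users[0]))
        return false;

    XnSkeletonJointPosition left, right;
    userGen.GetSkeletonCap().GetSkeletonJointPosition(users[0],
                                                      XN_SKEL_LEFT_HAND, left);
    userGen.GetSkeletonCap().GetSkeletonJointPosition(users[0],
                                                      XN_SKEL_RIGHT_HAND, right);
    if (left.fConfidence < 0.5f || right.fConfidence < 0.5f)
        return false;                                      // unreliable joints

    leftHand  = left.position;     // real-world coordinates in millimetres
    rightHand = right.position;
    return true;
}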

The Hand State Recognition Module uses an image processing algorithm based on
OpenCV (http://opencv.willowgarage.com/), an open source computer vision
library. The starting point of the algorithm is the depth image retrieved from the
Kinect as shown in Figure 12 on the right.

The system segments the user's silhouette from the background using the depth information and projects the 3D positions of the hands of the user skeleton onto the depth image plane. We define two 80x80 pixel square regions of interest around the projected positions and apply a threshold function. The final output is composed of two black-and-white images which contain only the hand outlines. The hand shape is described using the Hu set of invariant moments. They are invariant to translations and rotations; thus, they increase the system robustness. To identify the hand gesture, we compare the estimated Hu set of moments with a pre-defined set using a support vector machine, a non-probabilistic binary linear classifier. The pre-defined Hu set of moments of each gesture is calculated from 200 image samples. We limited the hand states to just three (OS, FS and BS) for two main reasons. Firstly, image processing in an uncontrolled environment is very challenging, and the classifier error rate grows more than linearly with the number of hand states. Moreover, we experienced how false or wrong gesture detections are very frustrating for the user (e.g. triggering one command instead of another). We chose three states as an optimal trade-off between robustness and flexibility to perform our DR task. The second reason is that some hand configurations proved to be uncomfortable and wearisome (e.g. indicating the number two), especially if repeated or held for more than a few seconds. Our states proved not to be tiring for hands and fingers and easy to learn and to remember, because they mimic real object manipulation.
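
A condensed sketch of this classification step is given below, written against the OpenCV 2.x C++ API of the time; the depth-threshold value, the scaling of the 12-bit depth and the trained classifier are placeholders, and the production module adds the segmentation and projection steps described above.

// Sketch of the hand-state classification: extract an 80x80 ROI around the
// projected hand position in the depth image, threshold it to a binary
// silhouette, compute the seven Hu moments and classify them with a
// pre-trained SVM (placeholder threshold and model; OpenCV 2.x API).
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/ml/ml.hpp>

int classifyHandState(const cv::Mat& depthImage, cv::Point hand,
                      const CvSVM& svm)
{
    cv::Rect roi(hand.x - 40, hand.y - 40, 80, 80);
    roi &= cv::Rect(0, 0, depthImage.cols, depthImage.rows);   // clamp to image

    cv::Mat hand8u, handBin;
    depthImage(roi).convertTo(hand8u, CV_8U, 255.0 / 4096.0);  // 12-bit depth
    cv::threshold(hand8u, handBin, 100, 255, cv::THRESH_BINARY_INV);

    cv::Moments m = cv::moments(handBin, true);   // binary image moments
    double hu[7];
    cv::HuMoments(m, hu);                         // translation/rotation invariant

    cv::Mat feature(1, 7, CV_32F);
    for (int i = 0; i < 7; ++i)
        feature.at<float>(0, i) = static_cast<float>(hu[i]);

    return static_cast<int>(svm.predict(feature)); // e.g. 0 = OS, 1 = FS, 2 = BS
}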

The 3D hand positions and states are submitted as network events via UDP/IP to the AR application. The augmented reality application can be broken into four modules: the visualization engine, the tracking, the flexible interface and the model manager. The visualization engine's main function is to perform the real-time overlay and registration of the virtual scene over the real world captured by the camera. This module is based on OpenSceneGraph (an OpenGL-based graphics library, www.openscenegraph.org) for the rendering and on the ARToolKit library [22] to perform the scene tracking using image-based markers.
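
The message format of these events is not specified in the paper; a hypothetical minimal encoding is sketched below with POSIX sockets for brevity (the original system runs on Windows and would use the equivalent Winsock calls), and the packet layout, address and port are assumptions.

// Hypothetical layout of a hand event sent from the tracker PC to the AR PC
// over UDP/IP; the actual message format of the system is not documented.
#include <cstdint>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <unistd.h>

#pragma pack(push, 1)
struct HandEvent {
    uint8_t hand;      // 0 = left, 1 = right
    uint8_t state;     // 0 = open (OS), 1 = fist (FS), 2 = bang (BS)
    float   x, y, z;   // hand position in millimetres (tracker frame)
};
#pragma pack(pop)

// Send one event to the AR application (placeholder address and port).
bool sendHandEvent(const HandEvent& ev)
{
    const int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) return false;

    sockaddr_in dst{};
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(5000);                        // placeholder port
    inet_pton(AF_INET, "192.168.0.2", &dst.sin_addr);    // AR application PC

    const ssize_t sent = sendto(sock, &ev, sizeof(ev), 0,
                                reinterpret_cast<const sockaddr*>(&dst),
                                sizeof(dst));
    close(sock);
    return sent == static_cast<ssize_t>(sizeof(ev));
}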
The application interface module collects the events from the hand tracking and decodes them into a sequence of signals for the application's finite state machine. The state machine activates the application functions in a flexible way, because it is described by a standardized UML-XMI model. This approach allowed us to easily explore different interface metaphors by just changing an XML file instead of doing it programmatically. The last component of the AR application, the model manager, is expressly dedicated to model-related activities. The model manager reads the XML model file generated by the AR converter, parses the assembly structure and manages visualization, registration, assembly configurations, dynamic model loading and explosions.
This file is based on the schema ARProTo.xsd (see https://homepages.uni-paderborn.de/rafael76/ARXML/ARProTo.xsd). Figure 13 shows the XML file of a simple product composed of several cubic parts. This file contains a sequence of "Model3D" elements, each representing a single part with the following data: a unique ID, the OSG mesh model filename (extension .osgt), the location in the global coordinate system (instead of the relative one provided by the CAD system) and the volume. The volume information is the exact value computed by the CAD kernel, instead of the approximated internal volume of the mesh. The volume is used by the Model Manager to optimize the visualization by simply unloading, at runtime, small components located far from the user's view.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<p1:ARPrototypingToolkit xmlns:p1="ARProToXML" xmlns="ns1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ns1 https://homepages.uni-paderborn.de/rafael76/ARXML/ARProTo.xsd">
  <Models>
    <Model3D label="From CAD" userID="SOLID59" visible="true" volume="399944.25">
      <Position X="0" Y="187.04" Z="0"/>
      <Orientation RX="0" RY="-0" RZ="-0"/>
      <File name="./models/SOLID59.osgt"/>
    </Model3D>
    <Model3D label="From CAD" userID="SOLID63" visible="true" volume="29344.25">
      <Position X="0.90" Y="186.14" Z="100"/>
      <Orientation RX="0" RY="-0" RZ="-1.57"/>
      <File name="./models/SOLID63.osgt"/>
    </Model3D>
  </Models>
  <Structure>
    <Constraint First="CADROOT" ID="26" Second="PRODUCT L156" Type="UNKNOWN"/>
    <Constraint First="PRODUCT L156" ID="27" Second="SOLID59" Type="UNKNOWN"/>
    <Constraint First="CADROOT" ID="28" Second="PRODUCT L160" Type="UNKNOWN"/>
    <Constraint First="PRODUCT L160" ID="29" Second="SOLID63" Type="UNKNOWN"/>
  </Structure>
</p1:ARPrototypingToolkit>
<ARToolKit camera_config="camera_config" threshold="100">
  <Pattern id="0">
    <File value="patt.hiro"/>
    <Dimension value="60"/>
    <Model3D userID="SOLID59"/>
    <Model3D userID="SOLID63"/>
  </Pattern>
</ARToolKit>

The hierarchical structure of the assembly is stored in a separate "Structure" element, a sequence of elements of type "LinkType". In the example in figure 13, the second-level SOLID59 part belongs to the sub-assembly PRODUCT L156, which is a child of the CADROOT main assembly. The links between the parts can be used in the interactive explosion to generate the steps of the animation. The Type
attribute was designed to contain detailed information about the nature of the constraints among the parts. Unfortunately, none of the CAD systems that we tested exported this information in the STEP format, and therefore we set Type to "UNKNOWN".
Another important data element of the XML file is the ARToolKit element, which contains the ARToolKit tracking configuration: the camera calibration file and the mapping between each fiducial marker (e.g. patt.hiro) and the unique ID of the associated model.
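
As an illustration of how the model manager can consume this file, the sketch below parses the Model3D elements and loads the referenced meshes into an OpenSceneGraph scene graph. TinyXML-2 is used here only as an assumed parser, since the paper does not name the XML library of the model manager, and the Structure and ARToolKit elements are ignored for brevity.

// Sketch: build a scene-graph node per part from the AR converter output
// (TinyXML-2 assumed as parser; error handling kept minimal).
#include <tinyxml2.h>
#include <osg/Group>
#include <osg/PositionAttitudeTransform>
#include <osgDB/ReadFile>

osg::ref_ptr<osg::Group> loadModels(const char* xmlPath)
{
    tinyxml2::XMLDocument doc;
    if (doc.LoadFile(xmlPath) != tinyxml2::XML_SUCCESS) return nullptr;

    tinyxml2::XMLElement* root = doc.FirstChildElement("p1:ARPrototypingToolkit");
    tinyxml2::XMLElement* models = root ? root->FirstChildElement("Models") : nullptr;
    if (!models) return nullptr;

    osg::ref_ptr<osg::Group> scene = new osg::Group;
    for (tinyxml2::XMLElement* m = models->FirstChildElement("Model3D");
         m != nullptr; m = m->NextSiblingElement("Model3D"))
    {
        tinyxml2::XMLElement* pos  = m->FirstChildElement("Position");
        tinyxml2::XMLElement* file = m->FirstChildElement("File");
        if (!pos || !file || !file->Attribute("name")) continue;

        osg::ref_ptr<osg::Node> mesh = osgDB::readNodeFile(file->Attribute("name"));
        if (!mesh) continue;                              // missing .osgt mesh

        // Parts are already expressed in the global coordinate system.
        osg::ref_ptr<osg::PositionAttitudeTransform> xform =
            new osg::PositionAttitudeTransform;
        xform->setPosition(osg::Vec3d(pos->DoubleAttribute("X"),
                                      pos->DoubleAttribute("Y"),
                                      pos->DoubleAttribute("Z")));
        xform->setName(m->Attribute("userID") ? m->Attribute("userID") : "");
        xform->addChild(mesh.get());
        scene->addChild(xform.get());
    }
    return scene;
}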

The case study


We compared the proposed virtual active surface interface with a natural interaction interface we previously presented [18], which allows a completely unconstrained 3D navigation. For our test case, we decided to use a mechatronic design: the Bebot. The Bebot was designed using CATIA at the University of Paderborn in collaboration with the Heinz Nixdorf Institute.
We carried out a usability test with 12 students from the local engineering faculty in Paderborn. All the users (male) were proficient with at least one CAD system, in particular CATIA. No user had previous experience with hand gesture-based interaction techniques. The Bebot model (a total of 169 parts in 5 levels of sub-assemblies) was correctly exported in STEP and visualized with the AR application. The original CAD assembly had a total size of more than 200 MB, while the converted mesh models in .osgt format occupied 32 MB. We let the users try the application for 20 minutes without any instruction. The users were asked to simply examine the model and to understand the main components and their function. The application ran on a Windows 7 PC with an Intel Xeon 3.6 GHz processor, 6 GB RAM and an NVIDIA Quadro 5500 GPU. Figure 14 shows a screenshot during the selection phase on the Bebot. Figure 15 shows an exploded view animation triggered by a bimanual fist gesture. The users found exploded views very helpful to understand complex 3D models when occlusion occurs and for the selection of single components. After the tests we interviewed the participants for their opinions.

The post-experiment questionnaire featured five-point Likert-scale questions (1 = most negative; 5 = most positive) to evaluate ease of use, satisfaction level, and intuitiveness for each interaction mode.

Figure 16 compares the users' responses on the overall ease of use of the 2D interaction with those of the 3D one, showing that the former is the preferred one (median value 4 vs. 3). Also, when asked directly whether the completion of a task is more difficult using the 3D techniques than using the 2D techniques, the users strongly agreed (median value 5).

Figure 17 compares the responses for a specific task (Pan & Orbit). The 2D interaction is once more preferred (median value 4 vs. 3). In a direct question about object manipulation, the users agreed (median value 4) that moving and rotating objects using the 2D interaction techniques is easier than using the unconstrained 3D techniques.

The results from these user ratings are clearly consistent in favour of the presented active plane interaction technique. We can justify this non-trivial result with two main explanations. Firstly, all the users in the test already had experience and familiarity with 2D desktop CAD interfaces, so the presented approach fits seamlessly with their skills. Secondly, due to the existing depth camera tracking limitations, a constrained movement can be more effective and precise than a completely free one.

Conclusions and future work


This paper presents a set of natural interaction techniques to facilitate desk-less interaction with 3D CAD models. In our approach the user interacts with mono/bimanual hand gestures on a virtual plane to control CAD models in an AR environment. We developed a module to access geometry and engineering information directly from standard STEP CAD files. Therefore, each assembly/part can be selected, examined, and navigated using natural hand gestures. The navigation includes orbiting, panning, and zooming. A dynamic exploded view is provided to understand complex assemblies when occlusion occurs and to select single components. One advantage of this desk-less approach, in addition to its simplicity, is that all commands can be triggered also in a noisy and dirty industrial environment, with bare hands or hands covered by protective gloves. The case study showed the feasibility of hand gesture-based techniques and how the presented virtual plane approach is preferred by the users over an unconstrained 3D navigation. As regards the integration in the current industry workflow, the STEP format proved to be very effective in exporting the geometry, the hierarchical assembly structure, the engineering materials and the exact volume data, but in our test we could not retrieve any constraint information (i.e. kinematic joints). This information could have been used to develop better explosion and interaction strategies. In future work we will focus on two main aspects. First, we will improve the gesture algorithm with specific filtering to obtain more precise and stable tracking. Secondly, we will find a way to access the kinematic constraints of the model in order to provide real-time simulation in
multi-body assemblies. It is also important to notice that we designed the presented system specifically for industrial applications (e.g. shop floor maintenance, on-site quality control, etc.), but we can also picture different scenarios where it can be used to improve 3D model navigation in non-desktop environments (e.g. scientific or medical data visualization, advertising, etc.).

References
[1] Weiser, M., Brown, J. S.: The Coming Age Of Calm Technology. http://www.ubiq.com/hypertext/weiser/acmfuture2endnote.htm (1996). Accessed May 2012
[2] Kumar, R., Chatterjee, R.: Shaping Ubiquity for the Developing World. International Telecommunications Union (ITU) Workshop on Ubiquitous Network Societies, Geneva, Switzerland (2005)
[3] Blake, J.: Natural User Interfaces in .NET. Manning Publications Co., Shelter Island, NY (2010)
[4] Bowman, D.A., Kruijff, E., LaViola, J.J., Poupyrev, I.: 3D User Interfaces: Theory and Practice. Addison Wesley Longman Publishing Co. Inc., Redwood City, CA, USA (2004)
[5] Pausch, R., Burnette, T., Brockway, D., Weiblen, M.E.: Navigation and locomotion in virtual worlds via flight into hand-held miniatures. In: Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '95), Susan G. Mair and Robert Cook (Eds.), ACM, New York, NY, USA, pp. 399-400 (1995)
[6] Fiorentino, M., Monno, G., Uva, A. E.: Tangible digital master for product lifecycle management in augmented reality. International Journal on Interactive Design and Manufacturing, Vol. 3, Issue 2, pp. 121-129 (2009). doi: 10.1007/s12008-009-0062-z
[7] Ullmer, B., Ishii, H.: Emerging Frameworks for Tangible User Interfaces. IBM Systems Journal, vol. 39, no. 3-4, pp. 915-931 (2000)
[8] Buchmann, V., Violich, S., Billinghurst, M., Cockburn, A.: FingARtips: gesture based direct manipulation in Augmented Reality. In: Proceedings of the 2nd International Conference on Computer Graphics and Interactive Techniques, GRAPHITE '04 (2004)
[9] Reifinger, S., Wallhoff, F., Ablassmeier, M., Poitschke, T., Rigoll, G.: Static and Dynamic Hand-Gesture Recognition for Augmented Reality Applications. In: HCI Intelligent Multimodal Interaction Environments. Springer-Verlag (2007)
[10] Microsoft Kinect website. http://www.xbox.com/kinect. Accessed May 2012
[11] Jaimes, A., Sebe, N.: Multimodal human-computer interaction: A survey. Computer Vision and Image Understanding, Volume 108, Issues 1-2, pp. 116-134 (2007)
[12] Zhu, Y., Xu, G., Kriegman, D.J.: A real-time approach to the spotting, representation, and recognition of hand gestures for human-computer interaction. Computer Vision and Image Understanding 85(3), pp. 189-208 (2002)
[13] Liu, X., Fujimura, K.: Hand gesture recognition using depth data. In: Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 529-534 (2004). doi: 10.1109/AFGR.2004.1301587
[14] Valentini, P.P.: Enhancing user role in augmented reality interactive simulations. Chapter in Human Factors in Augmented Reality Environments, Springer (in press)
[15] Dani, T.H., Gadh, R.: Creation of concept shape designs via a virtual reality interface. Computer-Aided Design, Volume 29, Issue 8, pp. 555-563 (1997). doi: 10.1016/S0010-4485(96)00091-7
[16] Dellisanti, M., Fiorentino, M., Monno, G., Uva, A.E.: Enhanced 3D object snap for CAD modelling on large stereo displays. International Journal of Computer Applications in Technology, 33(1), pp. 54-62 (2008). doi: 10.1504/IJCAT.2008.021885
[17] Valentini, P.P.: Natural interface in augmented reality interactive simulations. Virtual and Physical Prototyping, 7:2, pp. 137-151 (2012)
[18] Fiorentino, M., Uva, A. E., Monno, G., Radkowski, R.: Augmented technical drawings: A novel technique for natural interactive visualization of computer-aided design models. Journal of Computing and Information Science in Engineering, 12(2) (2012). doi: 10.1115/1.4006431
[19] STEP Application Handbook: ISO 10303 (Version 3 ed.), North Charleston, SC: SCRA (2006)
[20] Dünser, A., Grasset, R., Seichter, H., Billinghurst, M.: Applying HCI principles to AR systems design. HIT Lab New Zealand Technical Report (2007)
[21] Radkowski, R., Stritzke, C.: Interactive Hand Gesture-based Assembly for Augmented Reality Applications. In: Proc. ARIA, pp. 303-308 (2012)
[22] Kato, H., Billinghurst, M.: Marker Tracking and HMD Calibration for a Video-based Augmented Reality Conferencing System. In: Proceedings of the 2nd International Workshop on Augmented Reality (IWAR '99), San Francisco, USA, pp. 85-94 (1999)

Fig. 1 Natural gestures to navigate 3D CAD models in augmented reality
Fig. 2 Classification of interaction metaphors according to Bowman et al.
Fig. 3 The Xbox Kinect specifications and outputs
Fig. 4 CAD data integration workflow
Fig. 5 Concept of the virtual active surface
Fig. 6 Active plane depth limits
Fig. 7 Hand states
Fig. 8 The virtual workspace provided to the user
Fig. 9 Documentation Browsing Bar concept
Fig. 10 The hardware setup
Fig. 11 The software architecture
Fig. 12 An example of the Kinect camera image (left) and the depth image (right)
Fig. 13 A simple example of the XML file generated by the AR converter module
Fig. 14 Selection with the right hand
Fig. 15 Model explosion
Fig. 16 Survey response histograms for overall ease of use. Median values for each condition are shown as triangles
Fig. 17 Survey response histograms for the Pan & Orbit task. Median values for each condition are shown as triangles
