Deliverable D1.1
Kinematic model of the human hand
Nature: R = Report, P = Prototype, D = Demonstrator, O = Other
Dissemination level: PU = Public; PP = Restricted to other programme participants (including the Commission Services); RE = Restricted to a group specified by the consortium (including the Commission Services); CO = Confidential, only for members of the consortium (including the Commission Services)
ICT – FP7 216239 – DEXMART Deliverable D1.1
TABLE OF CONTENTS
1 Introduction
2 Hand modelling: state of the art
3 Motion capture
4 Kinematic model calibration
4.1 Least squares calibration
4.2 Error analysis
4.3 Marker motion model
5 Methods for experimental validation
5.1 Statistical validation
6 Results
6.1 Marker motion model
6.2 Kinematic model selection
6.3 Evaluation of finger joint interdependencies
6.4 Marker set selection: human hand wearing a data-glove
7 Conclusions
A Bayes factors for marker regression: algebraic form
1 Introduction
The articulations of the human hand are more complex than the comparable articulations of other animals. The hand skeleton alone consists of 27 bones: 14 in the fingers, five metacarpals forming the palm and eight carpal bones in the wrist (Fig. 1¹). Thanks to this complexity and to highly sensitive tactile feedback, humans can manipulate objects in the environment and execute complex tasks that are still out of reach for state-of-the-art robotic systems. As many of the tools that we use in everyday life were designed for humans, robots that can manipulate the same tools would gain enhanced human interaction capabilities. Also, a natural ability to interact with the environment could ease the creation of richer training data sets and consequently enhance the robot's artificial intelligence.
Improving robotic capabilities requires a model that can reproduce the vast majority of hand manipulation tasks, that can be measured with available motion capture technology and that can guide the evolution of robotic hands. Unfortunately, a model that emulates the human hand in toto leads to two major problems. First, the subtle movements of the wrist (carpal) bones are difficult to measure using non-invasive techniques. Second, a detailed robotic replica of the human hand is complex to implement. Fortunately, an approximate articulation model that can reproduce most common manipulation tasks can be implemented and is often sufficient in practice.
This report describes the research on kinematic models of the human hand conducted within the DEXMART project. In Sec. 2 we review state-of-the-art approaches adopted in a variety of research fields and applications. In particular, we analyse the different choices in terms of number of bones, number of joints and marker configurations. After a brief overview of the Vicon motion capture system (Sec. 3), we describe our contribution toward a more accurate subject calibration procedure (Sec. 4). To this end, we first present the standard procedure, then we analyse the dominant errors using Magnetic Resonance Imaging (MRI), and finally we propose a novel solution that aims at reducing soft tissue artefacts by explicitly modelling marker movements. Before presenting the experimental results, we briefly discuss best practices for kinematic model evaluation (Sec. 5). The experimental validation in Sec. 6 assesses the performance of the proposed calibration procedure, compares different articulation models and evaluates the level of interdependency between joint parameters. Preliminary work on the selection of the best marker set for combined optical and data-glove-based motion capture is also presented. Finally, in Sec. 7 we draw our conclusions.

¹ The original image is courtesy of Mariana Ruiz Villarreal, http://en.wikipedia.org/wiki/File:Scheme_human_hand_bones-en.svg.
of each vertex as a linear combination of the joint poses. Typically, only the poses of the closest segments are used and the animator manually adjusts the weights to improve the quality of the rendering. Although this technique is simple, it is also prone to the classic "pinching" at the joints, caused by the sub-space of the deformation function becoming degenerate and collapsing when joints are extremely flexed. To reduce pinching artefacts and increase the level of realism, Wang et al. [21] still use linear transformations but add multiple weights per segment. Singh and Kokkevis [20] instead extend the work of Sederberg and Parry [22] and use control points aligned with the character skin and 3D Bezier control volumes to deform the skin according to the poses of the character skeleton. Thanks to the Bezier sub-space deformation, this method yields much more realistic results than SWE. Lewis et al. [23] propose a different approach that does not rely on initial manual tuning of the weights. Their method allows CG animators to bind the skin to a character skeleton in a number of key poses; the algorithm then interpolates the intermediate deformations using non-linear kernels. Unfortunately, a large number of poses is required to generate the mapping for complex limbs like the hand. Anguelov et al. [24] propose a data-driven approach to learn secondary motion such as muscle bulging. Their method applies a linear regression between the vertex positions and the skeleton poses. The problem is formulated as a large optimisation procedure that learns the model parameters and enforces mesh smoothness. Although the mesh deforms realistically, this approach is not suited to biomechanical analysis, as it requires several range scans of the subject. Park and Hodgins [25] instead model the dynamic movements of the skin from a large set of markers.
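The linear blending described above (commonly called linear blend skinning) can be sketched in a few lines of NumPy. This is a generic illustration under stated assumptions, not the formulation of any cited work: the rotation axis, the two-segment setup and the single test vertex are hypothetical choices made to expose the pinching effect. A vertex weighted equally between a fixed segment and a flexed one moves toward the joint axis as flexion grows and collapses onto it at 180 degrees.

```python
import numpy as np

def rot_x(angle):
    """Homogeneous 4x4 rotation about the x axis (the assumed flexion axis)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, c, -s, 0.0],
                     [0.0, s, c, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def linear_blend_skinning(rest_vertices, weights, joint_transforms):
    """Deform each vertex by a weighted sum of per-joint rigid transforms.

    rest_vertices:    (N, 3) vertex positions in the rest pose
    weights:          (N, J) skinning weights, each row summing to 1
    joint_transforms: (J, 4, 4) homogeneous transforms from rest to current pose
    """
    homogeneous = np.hstack([rest_vertices, np.ones((len(rest_vertices), 1))])
    # Blend the matrices, M_i = sum_j w_ij T_j, then apply M_i to vertex i.
    blended = np.einsum("nj,jab->nab", weights, joint_transforms)
    return np.einsum("nab,nb->na", blended, homogeneous)[:, :3]

# One vertex at unit distance from the joint axis, shared equally by two segments.
vertex = np.array([[0.0, 1.0, 0.0]])
weights = np.array([[0.5, 0.5]])
for angle in (np.pi / 2, np.pi):
    transforms = np.stack([np.eye(4), rot_x(angle)])
    out = linear_blend_skinning(vertex, weights, transforms)
    # Distance from the axis: ~0.707 at 90 degrees, ~0 at 180 degrees.
    print(angle, np.linalg.norm(out))
```

The shrinking distance is the volume loss that the multi-weight and Bezier-based methods above try to avoid.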
Unlike computer graphics animators, biomechanicists model skin motion to reduce capture artefacts and consequently improve the accuracy of the joint angle measurements. Standard approaches to model fitting and calibration model the skin motion as Gaussian noise added to rigid marker positions [26]; rigid markers are then fitted to the deformed 3D reconstructions in a least squares sense. Andriacchi et al. [27] propose an interesting solution to the soft tissue artefact problem. Their design relies on distributing as many markers as possible on each segment. A mass is then assigned to each marker, and the centre of mass and the inertia tensor of the cluster are calculated on a frame-by-frame basis. By changing the masses, the model adapts the motion of the markers to the skin. Despite its elegance in design and implementation, this method has been shown by Cereatti et al. [28] to be remarkably unstable. Cappello et al. [29] extend the popular Calibrated Anatomical System Technique (CAST) of Cappozzo et al. [30] by calibrating two distinct poses at the same time. The two sets of marker positions are interpolated using a linear function of the joint angle. Although simple and extensible to many poses, this method is limited by the linear interpolation, which may not fit the actual skin motion.
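The per-frame computation of the point cluster approach of [27] can be sketched as follows. This is a minimal sketch: only the centre of mass, the inertia tensor and its principal axes are computed, while the strategy for changing the marker masses over time (which is what lets the method follow the skin) is not reproduced, so the masses are simply given. The function name and the four-marker example are illustrative.

```python
import numpy as np

def cluster_frame(markers, masses):
    """Centre of mass, inertia tensor and principal axes of a weighted marker cluster.

    markers: (N, 3) marker positions in one frame
    masses:  (N,) weights assigned to the markers (assumed given here)
    """
    masses = np.asarray(masses, dtype=float)
    centre = masses @ markers / masses.sum()
    rel = markers - centre
    # I = sum_i m_i (|r_i|^2 Id - r_i r_i^T)
    inertia = (np.eye(3) * np.sum(masses * np.sum(rel ** 2, axis=1))
               - np.einsum("n,na,nb->ab", masses, rel, rel))
    # Eigenvectors of the symmetric inertia tensor define a cluster frame.
    moments, axes = np.linalg.eigh(inertia)
    return centre, inertia, moments, axes

markers = np.array([[1.0, 0, 0], [-1.0, 0, 0], [0, 2.0, 0], [0, -2.0, 0]])
centre, inertia, moments, axes = cluster_frame(markers, np.ones(4))
print(centre)   # approximately [0, 0, 0]
print(moments)  # approximately [2, 8, 10], in ascending order
```

Changing the masses shifts the centre and rotates the principal axes, which is the degree of freedom the original technique exploits.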
The interaction between bones, tendons and skin constrains the possible poses of the human hand. To reduce the tracking search space, and consequently the computational cost, Lin et al. [31] and the Santos model [11] impose constraints on the joint angle ranges. Also, although the parametrisation of the hand can have between 20 and 30 DoF, the physical action of the tendons imposes strong dependencies between different joints. These dependencies can be used to reduce the number of free parameters [32]. A simple rule of thumb for the Proximal Interphalangeal (PIP) and Distal Interphalangeal (DIP) joint angles $\theta_{PIP}$ and $\theta_{DIP}$ is $\theta_{DIP} = \frac{2}{3}\theta_{PIP}$. This rule holds as long as the hand moves freely. However, if the subject grasps an object such as a pen or a knife, $\theta_{DIP}$ can deviate strongly from the predicted angle $\frac{2}{3}\theta_{PIP}$. Lin et al. [31] instead perform a Principal Component Analysis (PCA) of the finger motions. The analysis leads to a principled model of the inter-parameter dependencies. The authors show that, for tracking purposes, it is possible to represent the vast majority of hand poses with just seven degrees of freedom. Also, the kinematic poses of the hand are weakly correlated with the body poses. Jin et al. [33] exploit this dependency to generate realistic hand animations.
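PCA of joint-angle data, as used by Lin et al. [31], can be sketched with an SVD. The data below are an assumption for illustration only: three angles of one finger in which the DIP angle follows the 2/3 PIP rule exactly, so PCA finds just two independent components. The 95% variance threshold and the function names are illustrative choices, not taken from [31].

```python
import numpy as np

def pca(joint_angles):
    """Principal directions and explained-variance ratios of (frames, dof) joint-angle data."""
    centred = joint_angles - joint_angles.mean(axis=0)
    _, singular_values, directions = np.linalg.svd(centred, full_matrices=False)
    variances = singular_values ** 2
    return directions, variances / variances.sum()

def components_needed(ratios, fraction=0.95):
    """Smallest number of leading components explaining at least `fraction` of the variance."""
    return int(np.searchsorted(np.cumsum(ratios), fraction) + 1)

# Synthetic finger: MCP and PIP vary independently, DIP follows the 2/3 rule exactly.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
mcp = 0.5 + 0.4 * np.sin(t)
pip = 0.8 + 0.6 * np.cos(t)
dip = 2.0 / 3.0 * pip
directions, ratios = pca(np.column_stack([mcp, pip, dip]))
print(components_needed(ratios))  # 2: the DIP angle adds no independent dimension
```

With real grasping data the DIP angle departs from the rule, so the third component no longer vanishes; this is why a data-driven analysis such as PCA is more general than the fixed ratio.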
Biomechanical research has investigated joint dependencies in more depth [34, 35, 36]. It is well known that when healthy humans attempt to move just one finger, the other fingers move as well. Also, the movements of the thumb, index finger and little finger are typically more independent than the movements of the middle or ring fingers. Simultaneous motion of non-instructed digits may result in part from passive mechanical connections between the digits, in part from the organisation of multitendoned finger muscles, and in part from distributed neural control of the hand. Recent studies have shown that mechanical coupling between the fingers, rather than limits of neuromuscular control, appears to be a major factor limiting the complete independence of finger movements [37]. Finger independence is generally similar during passive and active movements, but shows a trend toward less independence in the middle, ring and little fingers during active, large-arc movements. Mechanical coupling limits the independence of the index, middle and ring fingers to the greatest degree, followed by the little finger, and places only negligible limitations on the independence of the thumb. Studies involving simple grasping or skilled tasks have shown that a small number of combined joint motions (i.e., synergies) can account for most of the variance in observed hand postures representative of naturalistic object manipulation. These synergies are used broadly across a variety of tasks, from simple hand motions, such as reaching and grasping objects that vary in width, curvature and angle, to skilled motions such as the precision pinch. These studies suggest that this small set of synergies represents the basic building blocks underlying natural human hand motions [38]. The degree of interdependence of the fingers depends on the extent of the flexion movement and on the frequency of rhythmic movements. Angular motion tended to be greatest at the middle joint of each digit, with increased angular motion at the proximal and distal joints during 3 Hz movements [35]. Nakamura et al. [34] found that the correlation between distal and proximal joints may depend on the grasped object. Also, while Hager-Ross and Schieber [35] simply quantify the level of dependency between different fingers, Lee and Zhang [36] propose a control model that uses finger interactions to simulate natural motion.
Another critical element for capturing the motion of the human hand is the marker configuration. Motion capture systems are a well-established technology for measuring the motion of the main human limbs. However, only recent advances in sensor resolution have allowed researchers to use small markers in medium-size (3 m or more) capture volumes, where natural movements of the hand are easier to reproduce. Nevertheless, selecting the correct number of markers and their positions is critical to minimise occlusions. In [39], 13 colour-coded markers are placed at key hand locations: five on the fingertips, four on the PIP joints (excluding the thumb), three on the MCP joints of the thumb, index and little finger, and one on the wrist. In this case the large size of the markers heavily constrained their positioning, and therefore subtle palm movements remained unobserved. Zhang et al. [8] use 21 markers in their experiments: one per fingertip, one right above each joint, and one on the wrist. The known relative position between markers and joints is then used to estimate the centres of rotation. A similar setup is also presented in [28], but without markers on the fingertips. For more accurate experiments, up to six markers are used to capture the complex motion of the thumb CMC joint alone [15]. A larger number of markers is positioned on the hand by Cerveri et al. [9]: although this setup requires a longer preparation, 42 markers can provide a certain level of redundancy in case of occlusions. More recently, Baker et al. [40] used 24 markers with a 4 mm diameter to capture finger and wrist movements during computer keyboard use. Finally, Cerveri et al. [9] showed that 24 markers are sufficient to capture simple tasks in a constrained scenario with a fixed wrist position.
3 Motion capture
This section gives a brief overview of the Vicon motion capture procedures. For the life sciences market, Vicon provides motion capture software called Nexus. This application allows a user to control the hardware, to post-process the marker data and to compute joint angles. A typical optical motion capture session includes the following steps:
1. Hardware setup.
2. Subject setup.
Figure 2: Vicon system (a) and motion capture intermediate results (b)-(f). (b): visualization of a calibrated
camera setup; (c): 3D reconstructions of markers on a hand; (d): Vicon Skeleton (subject) for a human
hand; (e)-(f): examples of the hand subject kinematically fitted in different poses.
between the $n_m$ markers and their parent segments and the marker positions $M = \{m_i\}_{i=1}^{n_m}$ in the parent segment coordinate systems. For each marker $i$ we define $S_i(\theta, \Lambda)$, a $4 \times 4$ matrix that transforms the local coordinates $m_i$ to world coordinates. This transform depends on the joint angle state $\theta$ and on the vector of subject parameters $\Lambda$ (i.e., bone lengths and orientations). Given the kinematic chain, we can decompose $S_i$ as the product of paired transformations, where each pair is composed of a fixed segment transformation $P(\Lambda)$ and a time-varying joint transformation $T(\theta)$. For example, if a marker $i$ is attached to a segment $c$, and $c$ is the third segment of a chain $a$, $b$ and $c$, then
Subject calibration adapts the model to the physical dimensions of the subject and is formulated as an optimisation problem over the segment parameters $\Lambda = \{l_j\}_{j=1}^{n_b}$ as well as over the marker positions $M$.
where $n_{i,k}$ is a Gaussian-distributed random vector with zero mean and covariance $\Sigma_i$.
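Given a set $K$ of preselected key frames from the ROM trial, the objective function $f(\cdot)$ to be minimised is the sum of squared differences between marker positions and the reckons, that is

$$f(\Theta, M, \Lambda) = \sum_{k \in K} \sum_{i=1}^{n_m} \left\| f_{i,k}(\theta_k, m_i, \Lambda) \right\|^2 \qquad (3)$$

$$= \sum_{k \in K} \sum_{i=1}^{n_m} \left\| \sqrt{\Sigma_i}\, \left( S_i(\theta_k, \Lambda)^{-1} r_{i,k} - m_i \right) \right\|^2, \qquad (4)$$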
where $\Theta = \{\theta_k\}_{k \in K}$ denotes the joint angles for all the key frames and $f_{i,k}(\cdot)$ outputs the per-frame, per-marker residual. In a typical calibration scenario the number of free parameters is high and may run to several thousand. Such a large problem is solved with an iterative, conjugate gradient-based procedure. The quality of the final results depends above all on whether the model in Eq. (2) can predict the reconstructed data. To point out the limitations of the calibration procedure, in the next section we analyse the residual errors.
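The objective in Eqs. (3)-(4) can be evaluated with a short NumPy sketch. The decomposition of $S_i$ into paired segment and joint transformations is written out under our own assumption, since the corresponding equation is not shown in this section: each pair is taken to be a fixed translation $P$ followed by a rotation $T$ about a single flexion axis. All function names, offsets and angles are hypothetical. The sketch only evaluates the cost; minimising it over $\Theta$, $M$ and $\Lambda$ (e.g. by conjugate gradients) is not shown.

```python
import numpy as np

def rot_z(angle):
    """Joint transformation T(θ): rotation about z, assumed to be the flexion axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0, 0.0],
                     [s, c, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def translation(offset):
    """Segment transformation P(Λ): a fixed translation (hypothetical parametrisation)."""
    p = np.eye(4)
    p[:3, 3] = offset
    return p

def chain_transform(offsets, angles):
    """S = P_1 T_1 P_2 T_2 ... P_n T_n, with the pair ordering assumed."""
    s = np.eye(4)
    for offset, angle in zip(offsets, angles):
        s = s @ translation(offset) @ rot_z(angle)
    return s

def calibration_objective(transforms, markers_local, reconstructions, sqrt_sigmas):
    """Eq. (4): sum over k, i of || sqrt(Σ_i) (S_i(θ_k, Λ)^-1 r_ik - m_i) ||^2.

    transforms:      (K, n_m, 4, 4) matrices S_i(θ_k, Λ)
    markers_local:   (n_m, 3) marker positions m_i in parent-segment coordinates
    reconstructions: (K, n_m, 3) reconstructed marker positions r_ik (world)
    sqrt_sigmas:     (n_m, 3, 3) per-marker weighting matrices sqrt(Σ_i)
    """
    total = 0.0
    for k in range(reconstructions.shape[0]):
        for i in range(markers_local.shape[0]):
            r_h = np.append(reconstructions[k, i], 1.0)
            local = np.linalg.solve(transforms[k, i], r_h)[:3]  # S^-1 r in local coordinates
            residual = sqrt_sigmas[i] @ (local - markers_local[i])
            total += residual @ residual
    return total

# Hypothetical 3-joint finger (offsets in mm) observed in two key frames, one marker.
offsets = [np.zeros(3), np.array([40.0, 0.0, 0.0]), np.array([25.0, 0.0, 0.0])]
key_frames = [np.array([0.1, 0.3, 0.2]), np.array([0.4, 0.8, 0.5])]
marker = np.array([[10.0, 5.0, 0.0]])
S = np.array([[chain_transform(offsets, theta)] for theta in key_frames])
recon = np.array([[(s[0] @ np.append(marker[0], 1.0))[:3]] for s in S])
weights = np.eye(3)[None]
print(calibration_objective(S, marker, recon, weights))  # ~0: reckons match the data
recon[1, 0] += [0.0, 0.0, 1.0]                           # 1 mm error in one frame
print(calibration_objective(S, marker, recon, weights))  # ~1.0
```

Because $S_i$ is rigid, a world-space error of 1 mm maps to a local residual of the same length, so the cost grows by exactly 1 with identity weights.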
[Figure 3 plots: 3D scatter plots of residual vectors; axes x, y, z.]
Figure 3: Example of 3D residuals on hand motion capture. The red dots show the difference vector between
predicted marker positions and reconstructed measurements. The standard VICON calibration assumes the
residuals to be Gaussian distributed. The plots show that a Gaussian does not approximate the data distribution well.
At the Second University of Naples this point was investigated using hand data captured with a Magnetic Resonance Imaging (MRI) device. The hand was captured with different marker setups. We performed static and sequential MRI acquisitions of different hand poses that simulate the tasks proposed in the DEXMART testing scenario [41]. We then extracted from the MRI data the displacements of markers placed on the back of the hand. In particular, in this study we analyse the marker movements caused by a set of predefined finger flexions. The displacement of each marker relative to the underlying bone is observed and quantified.
First, we used the MRI equipment to capture a static hand in two different poses and reconstructed three-dimensional models of the hand bones. Then, reflective markers were attached to the subject's hand (see Fig. 4) and a sequential protocol was used to track their position in two different postures. To validate the static data, a dynamic MRI scan of the sensorised hand was also performed. No significant differences were measured between the static and dynamic displacements. In kinematic studies based on MRI acquisition, some authors [42, 43] have reported the importance of acquiring joint motion actively, owing to statistically significant differences between active and passive acquisitions. Unlike for other articulations such as the knee and hip [44, 45, 46], this does not apply to the evaluation of soft tissue artefacts on the back of the hand: in active acquisition, no abnormal tracking patterns due to the influence of hand muscles and tendons were observed during the flexion and extension of the fingers.
Although MRI data usually show relevant differences across subjects in soft tissue elasticity, and these differences depend on the subject's weight, height and age, for the purpose of our experiment we considered the variation of the distance between marker and reference bone to be small and therefore subject-independent. Our experiments were executed on a healthy male subject, whose hand is approximately 20.5 cm long. The subject consented to the use of his anatomical data for scientific purposes.
MRI acquisition
The MRI scanning was performed at the University of Naples "Federico II" with a 1.5 T scanner manufactured by Philips Medical Systems. The subject was captured in the supine position with the right arm on top of the body. Two high-resolution MRI scans of the right hand containing thin axial slices were obtained. The two series have a small Field Of View (FOV), as they measure one hand only. Two series of T1-weighted spin echo images and two series of T1-weighted gradient echo images were acquired with one frame every 11.4 ms, a 4.4 ms echo and a 250 mm FOV. The surface markers appear as small cylinders in the MR images, as highlighted by the arrows in Figs. 6, 7 and 8.
For each hand pose, one hundred images, each representing a slice of the hand, were generated. The slices are spaced 1.5 mm apart. Each image is 256 × 256 pixels, 8 bits per pixel, with each pixel covering a physical rectangular area 0.98 mm wide. In the first posture the slicing plane is parallel to the longitudinal direction of the fingers, while in the second pose it is perpendicular to it.
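Distances such as those in Tables 1-3 are measured in physical units, so voxel indices must be scaled by the pixel and slice spacing given above. A minimal sketch, assuming axis-aligned slices and ignoring the orientation and origin information that a DICOM header would normally supply; the constant and function names are illustrative.

```python
import numpy as np

PIXEL_SPACING_MM = 0.98  # in-plane pixel size reported above
SLICE_SPACING_MM = 1.5   # spacing between consecutive slices reported above
MATRIX_SIZE = 256        # pixels per image side

def voxel_to_mm(row, col, slice_index):
    """Physical position (mm) of a voxel, assuming axis-aligned slices and origin at voxel (0, 0, 0)."""
    return np.array([col * PIXEL_SPACING_MM,
                     row * PIXEL_SPACING_MM,
                     slice_index * SLICE_SPACING_MM])

def voxel_distance(a, b):
    """Euclidean distance (mm) between two voxels given as (row, col, slice)."""
    return float(np.linalg.norm(voxel_to_mm(*a) - voxel_to_mm(*b)))

print(MATRIX_SIZE * PIXEL_SPACING_MM)         # ~250.9 mm, consistent with the 250 mm FOV
print(voxel_distance((0, 0, 0), (10, 0, 2)))  # hypot(9.8, 3.0), about 10.25 mm
```

Note that the through-plane spacing (1.5 mm) is coarser than the in-plane one (0.98 mm), so displacements across slices are resolved less finely.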
MRI processing
In the first stage of the processing, a semi-automatic analysis was conducted. We used the software Vitrea ver. 2.0 by Vital Images Inc. for 2D and 3D visualisation and editing of the MR image data. The software can convert the scans (DICOM format) into many different image formats, segment the region of interest and generate iso-surfaces. We also performed a manual segmentation to distinguish bones and surface markers from soft tissues. First, pixels were removed by thresholding, with the threshold value tuned to 60. Then, a contour tracing method was used to identify the object edges. In the second stage of processing, a more accurate measurement of the sliding of the metacarpal markers was carried out without any manual editing of the recorded images. For this purpose, we coregistered the MR images of the two hand poses (pose 1: open hand; pose 2: closed hand) using a method developed at the Biostructure and Bioimaging Institute (IBB) of the Italian National Research Council (CNR). We used the SPM software, a suite of MATLAB functions and subroutines typically used for functional PET and MRI brain image analysis that implements "statistical parametric mapping", to implement the automatic coregistration of the MRI data. First, the MR image sequences (pose 1 and pose 2) are smoothed and filtered to eliminate coregistration artefacts caused by the fat and skin present in the recorded images. The algorithms work by minimising the sum of squared differences between the images to be coregistered. The first step of the process is to determine the optimum 12-parameter affine transformation. Initially, the coregistration matches the two hand poses as a whole; the registration then proceeds by matching only the metacarpal bones, through an appropriate weighting of the voxels. A Bayesian framework is used, such that the registration searches for the solution that maximises the a posteriori probability of it being correct, i.e., it maximises the product of the likelihood function (derived from the residual squared difference) and the prior function (which is based on the probability of obtaining a particular set of zooms and shears). The affine registration is followed by the estimation of non-linear deformations, defined by a linear combination of three-dimensional discrete cosine transform (DCT) basis functions. The parameters represent coefficients of the deformations in three orthogonal directions. The matching involves simultaneously minimising the bending energies of the deformation fields and the residual squared difference between the images.

Table 1: Pair-wise marker distances with open and closed hand. The distance measures are in millimetres; DIFF. = OPEN - CLOSED.

FIRST   SECOND   OPEN   CLOSED   DIFF.
RMM4    RH4      19.1   25.4     -6.3
RMM4    RH6      27.8   31.8     -4.0
RMM2    RH1      34.0   36.2     -2.2
RMM2    RH3      20.5   24.3     -3.8
RMF1    RH3      13.5   16.9     -3.4

Table 2: Distances between a marker and the corresponding bone head. The radius bone is used as a reference. The distance measures are in millimetres.
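The first registration step described above, a 12-parameter affine transformation chosen to minimise the sum of squared intensity differences, can be sketched with NumPy and SciPy. This is not SPM: the parameter ordering and the composition A = T R Z S are an assumed convention, the Bayesian prior on zooms and shears and the subsequent DCT-based non-linear warp are omitted, and only the cost is evaluated (an optimiser would search over the 12 parameters). The toy volume is hypothetical.

```python
import numpy as np
from scipy.ndimage import affine_transform

def affine_from_params(p):
    """4x4 affine from 12 parameters: translations, rotations (rad), zooms, shears.

    The composition A = T @ R @ Z @ S is one common convention, assumed here; it is
    not claimed to match SPM's internal parametrisation exactly.
    """
    tx, ty, tz, rx, ry, rz, zx, zy, zz, s1, s2, s3 = p
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0, 0], [0, cx, sx, 0], [0, -sx, cx, 0], [0, 0, 0, 1]])
    Ry = np.array([[cy, 0, sy, 0], [0, 1, 0, 0], [-sy, 0, cy, 0], [0, 0, 0, 1]])
    Rz = np.array([[cz, sz, 0, 0], [-sz, cz, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
    Z = np.diag([zx, zy, zz, 1.0])
    S = np.array([[1, s1, s2, 0], [0, 1, s3, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
    return T @ Rx @ Ry @ Rz @ Z @ S

def ssd_cost(fixed, moving, params):
    """Sum of squared differences after resampling `moving` with the affine.

    affine_transform maps output (fixed-grid) coordinates to input (moving) coordinates.
    """
    warped = affine_transform(moving, affine_from_params(params), order=1)
    return float(np.sum((fixed - warped) ** 2))

# Toy volume: a blob, and a copy shifted by one voxel along the first axis.
grid = np.indices((16, 16, 16)) - 8
fixed = np.exp(-np.sum(grid ** 2, axis=0) / 8.0)
moving = np.roll(fixed, 1, axis=0)
identity = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
shift = [1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
print(ssd_cost(fixed, moving, identity), ssd_cost(fixed, moving, shift))  # the shift costs less
```

Weighting the voxels, as done above to match only the metacarpal bones, would amount to multiplying the squared differences by a mask before summing.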
Results
In the first stage of the analysis, after registering the 3D hand models, we performed two different measurements. First, we measured the distance of a marker from a reference bone in both poses (Fig. 6(c)-(f)). Then, we measured the pair-wise distances between metacarpal markers and their variations due to pose changes (Fig. 7(e)-(f)). Tables 1 and 2 summarise the results.
Figure 6: MRI measurements for the marker RH6 on open (left column) and closed (right column) hand. (a)-(b): marker spatial position (yellow arrow); (c)-(d): distance from radius proximal head; (e)-(f): distance from metacarpal proximal head.

Figure 7: MRI measurements for the marker RMM4 on open (left column) and closed (right column) hand. (a)-(b): marker spatial position (yellow arrow); (c)-(d): distance from metacarpal proximal head; (e)-(f): distances from markers RH4 and RH6.

Figure 8: MRI measurements for the marker RMF2. The blue lines show the distances between the marker and the 3rd middle phalanx bone heads.

The results in Table 1 show that when the hand is flexed, the distance between markers increases because of skin stretch and muscle deformation. The markers "slide" over the bones while the hand moves, and this causes large residual errors during calibration and fitting with the optical system. In absolute value, RMM4 moves by 6.3 mm and 4.0 mm with respect to RH4 and RH6, respectively. This indicates that the skin deformation follows a non-linear law: the distance between RH4 and RMM4 changes by about 33% of its initial value, while the distance between RMM4 and RH6 changes by about 14%. These measures give us a good starting point to correct passive optical data. The results of the RH1, RMM2 and RH3 chain are even clearer. The distance between RMM2 and RH1 changes by 6%, while that between RMM2 and RH3 changes by 18%. RMF1 slides by 25% with respect to RH3, even though RH3 is itself moving. The results in Table 2 show that the sliding of RH6 is apparently incongruous. Closing the hand does not stretch the skin, as one might expect, but contracts it. This happens because the subject does not only close the hand, but also moves it. This causes a larger skin slide of about 24% toward the metacarpal and 9% toward the radius. As a result, RH6 moves toward the metacarpal. RMM4 instead shifts toward the radius because of the relative hand-radius movement, while RMM2 moves toward the middle finger. Conversely, RMF2 moves toward the metacarpal zone (Fig. 8). Finally, the skin sliding causes RH3 to move toward the 3rd phalanx.
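The percentages quoted above follow from Table 1 by dividing each distance change by the open-hand distance. A small sketch of that arithmetic; the data structure and function name are illustrative.

```python
# Pair-wise marker distances (mm) from Table 1: (first, second, open, closed).
TABLE_1 = [
    ("RMM4", "RH4", 19.1, 25.4),
    ("RMM4", "RH6", 27.8, 31.8),
    ("RMM2", "RH1", 34.0, 36.2),
    ("RMM2", "RH3", 20.5, 24.3),
    ("RMF1", "RH3", 13.5, 16.9),
]

def relative_changes(rows):
    """Change of each distance (mm) and the same change as a percentage of the open-hand value."""
    return {
        (first, second): (closed - opened, 100.0 * (closed - opened) / opened)
        for first, second, opened, closed in rows
    }

for pair, (change_mm, percent) in relative_changes(TABLE_1).items():
    print(f"{pair[0]}-{pair[1]}: {change_mm:+.1f} mm ({percent:.1f}%)")
```

The resulting 33.0%, 14.4%, 6.5%, 18.5% and 25.2% agree with the rounded figures in the text.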
In the second stage of our analysis, the output of the coregistration process provided more reliable measurements of the displacements of the metacarpal markers between the two hand poses (Fig. 9). The first step of the coregistration process required filtering the two MRI series. The absence of the antenna during the acquisition phase produced noise, which was reduced by dividing the images by a parabolic fit. A further, anisotropic filtering step was necessary to reduce noise while preserving the image regions with higher gradients (edge preserving). Then, a level-set process was applied to eliminate irrelevant parts of the images (Fig. 10). Table 3 summarises the results.
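The parabolic fitting step is not detailed in the text; one plausible reading, assumed here, is a least-squares fit of a 2D quadratic surface to each slice's intensities followed by division by the fitted surface, which flattens a slowly varying intensity trend. The function name and synthetic image are hypothetical, and the anisotropic edge-preserving filter and the level-set step are not shown.

```python
import numpy as np

def parabolic_bias_correction(image):
    """Divide a 2D image by a least-squares quadratic surface fitted to its intensities."""
    rows, cols = np.indices(image.shape, dtype=float)
    x, y = cols.ravel(), rows.ravel()
    # Quadratic basis: 1, x, y, x^2, xy, y^2
    design = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
    coeffs, *_ = np.linalg.lstsq(design, image.ravel(), rcond=None)
    surface = (design @ coeffs).reshape(image.shape)
    return image / surface, surface

# Synthetic slice: uniform tissue (intensity 5) under a parabolic intensity bias.
rows, cols = np.indices((32, 32))
bias = 1.0 + 0.001 * ((cols - 16) ** 2 + (rows - 16) ** 2)
corrected, surface = parabolic_bias_correction(5.0 * bias)
print(corrected.min(), corrected.max())  # both ~1.0: the trend is removed
```

Division also rescales the intensities to around 1; multiplying the result by the mean of the fitted surface would restore the original intensity range if needed.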
The results in Table 3 show that the metacarpal markers move significantly over the bones when the hand moves from pose 1 to pose 2. As expected, the dominant contribution is the displacement along the axial direction of the hand (Y RANGE). The largest displacement occurs on the third metacarpal bone, in the vicinity of the proximal phalanx (middle finger): RH3 slips more than 11 mm along the direction of the third metacarpal bone, while RH2 and RH4 move slightly less. The distances covered by markers RMM2 and RMM4 are 68% and 45% of the maximum displacement, respectively, and RH5 slips approximately 68% of the maximum displacement. The sliding is therefore largest in the middle of the back of the hand and decreases more toward the wrist and the little finger than toward the thumb. The significant variations along the other two directions must also be considered to reduce the residual errors during subject calibration and fitting with an optical capture system.
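The 2-norm distance column of Table 3 is, to within the rounding of the table, the Euclidean norm of the X, Y and Z ranges, and the percentages quoted above are ratios to the largest displacement (RH3). A quick check of both, using the values in Table 3; names are illustrative.

```python
import math

# Table 3 (mm): marker -> (reported 2-norm distance, X range, Y range, Z range)
TABLE_3 = {
    "RH2": (10.94, 0.94, 10.85, 1.07),
    "RH3": (11.31, 2.01, 10.91, 2.23),
    "RH4": (10.08, 1.29, 9.15, 4.05),
    "RH5": (7.76, 0.01, 6.07, 4.83),
    "RMM2": (7.70, 0.24, 7.61, 1.13),
    "RMM4": (5.16, 0.54, 4.96, 1.32),
    "RH1": (5.52, 0.55, 5.24, 1.68),
    "RH6": (4.98, 1.01, 3.82, 3.03),
}

def recomputed_distance(x_range, y_range, z_range):
    """Euclidean norm of the per-axis displacement ranges."""
    return math.sqrt(x_range ** 2 + y_range ** 2 + z_range ** 2)

max_distance = max(row[0] for row in TABLE_3.values())  # RH3
for marker, (reported, x, y, z) in TABLE_3.items():
    share = 100.0 * reported / max_distance
    print(f"{marker}: {recomputed_distance(x, y, z):.2f} mm "
          f"(reported {reported:.2f}), {share:.0f}% of the maximum")
```

Recomputed and reported distances differ by at most about 0.01 mm, and the shares for RMM2, RMM4 and RH5 agree with the quoted 68%, 45% and 68% to within one percentage point.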
Figure 9: Coregistration of MR images for the two hand poses. (a)-(b): labelled markers in the two poses; (c)-(d): MR images of the two poses with labelled metacarpal markers.

Figure 10: Coregistration of MR images for the two hand poses. (a)-(b): filtered images (left column) and MR images after the coregistration (right column).

Table 3: Displacements of the metacarpal markers between the two hand poses after the coregistration process. The 2-NORM DISTANCE is the Euclidean norm of the X, Y and Z ranges. The distance measures are in millimetres.

MARKER ID   2-NORM DISTANCE   X RANGE   Y RANGE   Z RANGE
RH2         10.94             0.94      10.85     1.07
RH3         11.31             2.01      10.91     2.23
RH4         10.08             1.29      9.15      4.05
RH5         7.76              0.01      6.07      4.83
RMM2        7.70              0.24      7.61      1.13
RMM4        5.16              0.54      4.96      1.32
RH1         5.52              0.55      5.24      1.68
RH6         4.98              1.01      3.82      3.03

We also analysed the MRI scans of a gloved hand in the two hand poses. The considerable noise due to the presence of the glove made it impossible to obtain good image sequences after the filtering and segmentation phases of the coregistration process. As can be seen in Fig. 11(c)-(d), many markers are lost after the filtering process, and the segmentation of the metacarpal bone images is not correct. We believe that a different MRI acquisition protocol must be investigated to obtain a reliable coregistration process for a gloved hand.
Figure 11: Gloved hand for coregistration process; open (left column) and closed (right column) hand. (a)-(b): Labelled markers; (c)-(d): Output after filtering and segmentation process.

[Figure 12 plots: (a)-(c) residual d versus joint angle (θ); (d) correlation map with θ on the horizontal axis.]
Figure 12: Visual analysis of the relationship between joint parameters and unnormalised marker to reckon residuals. (a)-(c): sample residual components (in mm) plotted against the maximally correlated joint parameter (in radians). (d): correlation coefficient absolute values.

$$F(\theta) = \begin{bmatrix} \phi_x(\theta)^T & 0 & 0 & 0 \\ 0 & \phi_y(\theta)^T & 0 & 0 \\ 0 & 0 & \phi_z(\theta)^T & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

contains the regressor vectors $\phi(\theta)$ for each of the three positional coordinates. In our implementation the regressors are simple polynomial components. The modified variable position marker model becomes
Note that if the polynomials are zero-order (i.e., $F(\theta) = I$), Eq. (6) reduces to Eq. (2) with $m_i = w_i$.

To limit the number of additional parameters, it is not desirable to fix the polynomial order for all the residuals or to have polynomial components for all the joint parameters in $\theta$. As shown in Fig. 12, each residual is usually correlated with only a limited number of joint angles, and in some cases non-linear components may not be necessary. At the same time, favouring simpler models with few extra parameters improves the computational efficiency of the calibration step and prevents overfitting. A model selection procedure is therefore required to find the right balance between model complexity, i.e. the number of extra parameters, and the reduction of the residual error. As in other model selection problems, our goal is to add only those parameters that contribute to a significant reduction of the overall residual.
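A sketch of how the matrix $F(\theta)$ defined above can be assembled from per-coordinate regressor vectors, assuming for brevity that each coordinate depends on a single joint parameter through plain powers $\theta^0, \dots, \theta^p$ (the report selects the regressors per residual component). The function names and coefficients are hypothetical. With zero-order regressors the matrix reduces to the identity, as noted above.

```python
import numpy as np

def polynomial_regressors(theta, order):
    """Regressor vector [1, θ, θ^2, ..., θ^order] for one joint parameter."""
    return np.array([theta ** q for q in range(order + 1)], dtype=float)

def regressor_matrix(phi_x, phi_y, phi_z):
    """Assemble F(θ) with rows φ_x^T, φ_y^T, φ_z^T on a block diagonal and a trailing 1."""
    blocks = [np.atleast_1d(phi_x), np.atleast_1d(phi_y), np.atleast_1d(phi_z), np.ones(1)]
    F = np.zeros((4, sum(block.size for block in blocks)))
    col = 0
    for row, block in enumerate(blocks):
        F[row, col:col + block.size] = block
        col += block.size
    return F

# Zero-order regressors give F(θ) = I.
print(regressor_matrix(1.0, 1.0, 1.0))
# Quadratic motion in x, static y, linear motion in z, for a joint angle of 0.5 rad.
F = regressor_matrix(polynomial_regressors(0.5, 2), polynomial_regressors(0.5, 0),
                     polynomial_regressors(0.5, 1))
w = np.array([10.0, 2.0, -4.0, 5.0, 1.0, 0.5, 1.0])  # hypothetical coefficients, last entry 1
print(F @ w)  # homogeneous marker position [10, 5, 1.25, 1] in the parent segment frame
```

Each additional regressor adds one column to $F(\theta)$ and one coefficient to $w$, which is why the number of extra parameters grows quickly if every coordinate depends on every joint.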
Given a measure of model quality, the optimal model selection scheme would require optimising the parameters for every combination of regressors. Unfortunately, as the parameter optimisation is an expensive procedure on its own, a suboptimal but faster methodology is necessary. To this end, we propose a procedure that requires the optimisation of only two models.

The outline of the procedure is given in Algorithm 1. First, we calibrate the standard model defined in Eq. (2). Then we perform an analysis of the residual error to select the additional polynomial parameters. Finally, we calibrate the new model, where the markers move according to the polynomial functions.
To perform model selection we treat the residual components independently. Therefore, it is convenient to define $d_k = [d_{1,k} \dots d_{n_m,k}]$ as the concatenation of the unnormalised residuals at frame k, and an index u = 1, ..., U for the single residual components $d_k^{(u)}$. Also, we define the 1D polynomial function $g^{(u)}(\theta, w^{(u)}) = \phi^{(u)}(\theta)^T w^{(u)}$ modelling the marker motion for the component u. For each residual the model selection procedure is composed of two steps: (i) first we analyse the correlation between $\theta_k$ and $d_k^{(u)}$; those parameters with correlation larger than a threshold T are selected as active inputs for the skin model function $g^{(u)}$ (Algorithm 2); (ii) then we initialise $\phi^{(u)} = 1$ with the zero-order regressor only and, for each input, we add higher-order regressors (i.e., θ, θ², θ³, etc.) in a greedy fashion (Algorithm 3). The greedy procedure halts when none of the more complex models under test improves the performance with respect to the current best model.
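The performance of two models g′ and g′′ on the data d is compared by computing the Bayes factor
$$\frac{p(d|g')}{p(d|g'')} = \frac{\int p(d|w', g')\, p(w'|g')\, dw'}{\int p(d|w'', g'')\, p(w''|g'')\, dw''}. \quad (7)$$
As in Eq. (6), we use a Gaussian noise model with independent marker residuals. Thus we can write the likelihood as
$$p(d|w, g) = \prod_k \mathcal{N}\left(d_k \mid g(\theta_k, w), \sigma^2\right),$$
where $\mathcal{N}(d_k \mid g(\theta_k, w), \sigma^2)$ is a Gaussian density with mean $g(\theta_k, w)$ and variance $\sigma^2$, evaluated at $d_k$. The definition of the prior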
p(w|g) requires some preliminary considerations. The model selection step does not recalibrate the subject for each marker motion model, but compares the models on a fixed residual obtained assuming static markers. Therefore we can expect a bias between this approximated residual and the actual one. The bias magnitude is unknown a priori and should not affect the model selection result. Consequently, we use a uniform prior for the zero-order parameter $w_0 \in w$, while for all the other parameters we use a standard Gaussian regulariser, that is
$$p(w|g) \propto \mathcal{N}(\hat{w} \mid 0, \Sigma_{\hat{w}}), \quad (8)$$
where ŵ is the vector containing all the parameters but w0 (i.e., w = [w0 ŵ]), and Σŵ is a diagonal prior
covariance. Also, note that the regression steps 4 and 9 in Algorithm 3 use the same regulariser. Given
this choice of likelihood and prior, an algebraic solution of the integrals in Eq. (7) exists. The derivation is
presented in Appendix A.
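As a sketch of how the Bayes factor of Eq. (7) can drive the greedy step (ii), the code below evaluates the log marginal likelihood with the closed form derived in Appendix A and adds polynomial degrees while the log Bayes factor is positive. It is a simplified illustration under stated assumptions, not the report's Algorithm 3: a single active input, a known noise variance `sigma2`, an isotropic prior covariance $\Sigma_{\hat{w}}$ = `prior_var` · I, and hypothetical helper names.

```python
import numpy as np


def log_marginal(d, A, sigma2, prior_var):
    """log p(d|g) from the closed form of Appendix A.

    d: residual component over the frames, shape (n,)
    A: regressors of order > 0, shape (n, n_w - 1); may have zero columns
    The zero-order weight w0 has a flat prior and is integrated out.
    """
    n = d.shape[0]
    ones = np.ones(n)
    B = sigma2 * np.eye(n) + prior_var * (A @ A.T)   # B = s^2 I + A S A^T
    Binv_d = np.linalg.solve(B, d)
    Binv_1 = np.linalg.solve(B, ones)
    a = ones @ Binv_1                                # 1^T B^-1 1
    b = ones @ Binv_d + d @ Binv_1                   # 1^T B^-1 d + d^T B^-1 1
    _, logdet_B = np.linalg.slogdet(B)
    log_c = 0.5 * (1 - n) * np.log(2 * np.pi) - 0.5 * (logdet_B + np.log(a))
    return log_c - 0.5 * (d @ Binv_d - b ** 2 / (4.0 * a))


def design(theta, degree):
    """Regressors theta, theta^2, ..., theta^degree for one input."""
    if degree == 0:
        return np.empty((theta.size, 0))
    return np.column_stack([theta ** p for p in range(1, degree + 1)])


def greedy_degree(theta, d, sigma2, prior_var, max_degree=3):
    """Increase the degree while the log Bayes factor of the larger
    model against the current best one is positive."""
    best_deg = 0
    best_lm = log_marginal(d, design(theta, 0), sigma2, prior_var)
    while best_deg < max_degree:
        cand_lm = log_marginal(d, design(theta, best_deg + 1), sigma2, prior_var)
        if cand_lm - best_lm <= 0.0:      # log Bayes factor <= 0: stop
            break
        best_deg, best_lm = best_deg + 1, cand_lm
    return best_deg


# Synthetic residual that depends linearly on one joint angle.
rng = np.random.default_rng(0)
theta = np.linspace(-1.0, 1.0, 200)
d = 0.5 + 2.0 * theta + 0.05 * rng.normal(size=theta.size)
print(greedy_degree(theta, d, sigma2=0.05 ** 2, prior_var=1.0))
```

The maximal degree of three mirrors the setting used in the experiments of Sec. 6.1; with several active inputs the same comparison would be repeated for each candidate extension.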
To conclude this section we comment on the assumption that the residual components are independent. In general the marker noise components are not independent; for example, the sensor noise depends on the camera positions and on the pose of the subject in the capture volume. However, the dominant component of the residual may still be due to unmodelled soft tissue artefacts, like those caused by abrupt limb accelerations. Although the formulation in Appendix A could easily be extended to full covariance matrices, in practice, before the subject calibration step, the user is often unable to provide a good estimate of the residual error covariance, and isotropic models are usually the default.
procedure. Percutaneous fixation of markers has also been used, but this is only marginally less invasive and
still requires ethical approval [48]. Less invasive X-ray studies have been performed, but these invariably
require the attachment of radiolucent markers to the bone [49]. Despite this less invasive approach, ionising
radiation still carries with it the need for ethical approval. Instead of comparing joint angles, Veber and Bajd [50] compare the subject calibration results for the phalanx segments with data from statistically-based anthropometry (i.e., hand lengths and palm widths). However, in our opinion the accuracy of the bone lengths may not be sufficient to differentiate similar models.
As direct measurements are impractical, researchers have evaluated other desirable model qualities [51, 10] or have used synthetic or semi-synthetic data [28]. A well-established procedure is to measure the repeatability of the calibration results [51, 10]. In these tests researchers capture 20 to 30 trials of the same subject performing the same movement. Then a calibration step is run on each trial and the results, in terms of bone lengths and joint angles, are analysed. A good model and a good calibration procedure should produce consistent results with small cross-trial variance (see ANOVA [52]). Cereatti et al. [28] generate 3D data with a synthetic model of the knee. The authors animate the model with real gait data and then add real and synthetic soft tissue artefacts to the predicted marker positions. Finally, they compare the joint angles and the calibration parameters with those of the synthetic subject.
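The cross-trial statistics used in such repeatability tests can be summarised with a few lines of NumPy. This is a minimal sketch, assuming one calibration per trial has already produced an array of bone lengths of shape (trials, bones); the helper name `repeatability` and the synthetic numbers below are purely illustrative and are not data from [51, 10].

```python
import numpy as np


def repeatability(bone_lengths):
    """Cross-trial statistics of calibrated bone lengths.

    bone_lengths: array (n_trials, n_bones), one calibration per trial.
    Returns per-bone mean, sample standard deviation and coefficient of
    variation (std / mean); a smaller spread means a more repeatable model.
    """
    L = np.asarray(bone_lengths, dtype=float)
    mean = L.mean(axis=0)
    std = L.std(axis=0, ddof=1)
    return mean, std, std / mean


# Synthetic illustration only: 25 trials, 3 bones, small cross-trial noise.
rng = np.random.default_rng(0)
lengths = np.array([40.0, 25.0, 20.0]) + 0.2 * rng.normal(size=(25, 3))
mean, std, cv = repeatability(lengths)
print(np.round(cv, 4))
```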
Although repeatability analysis and ANOVA are well-established techniques for model validation, they do not tell us how well the model explains the data. For example, an over-simple model that cannot explain some movements could still be highly consistent on a particular movement. The problem is similar to the skin parameter selection of Sec. 4.3, as the goal is again to find a compromise between complexity, descriptiveness and what we can effectively measure. Broadly speaking, complex models (with many segments and more DoF per joint) have the potential to better explain the data; however, they are less stable and slower to optimise than simpler models. Also, given the data noise level, model selection should tell us whether the model is overfitting the data. In this regard, in the next section we report on our study on statistical model
where $n_\Psi$ is the number of parameters in $\Psi$. A variant of AIC is the Consistent AIC (CAIC), which accounts for the number of samples $n_R$. The CAIC formulation is
$$\mathrm{CAIC} = -2 \log p(R|\Psi, M) + n_\Psi \left(\log n_R + 1\right).$$
The Bayesian Information Criterion [54] instead assigns a score to each model according to an approximation of the marginal p(R|M), under the assumption that the data distribution is in the exponential family. This results in
$$p(R|M) \approx p(R|\Psi, M) \cdot n_R^{-n_\Psi/2}. \quad (11)$$
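The three scores can be computed from the maximised log-likelihood, the number of parameters $n_\Psi$ and the number of samples $n_R$. The sketch below is illustrative only: the AIC equation falls outside this excerpt, so the standard definitions (Akaike's AIC and Bozdogan's CAIC) are assumed, and the function names are hypothetical.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion (lower is better)."""
    return -2.0 * log_lik + 2.0 * n_params


def caic(log_lik, n_params, n_samples):
    """Consistent AIC in Bozdogan's form (lower is better)."""
    return -2.0 * log_lik + n_params * (math.log(n_samples) + 1.0)


def bic_log_marginal(log_lik, n_params, n_samples):
    """Logarithm of the approximation in Eq. (11):
    log p(R|Psi, M) - (n_Psi / 2) log n_R (higher is better)."""
    return log_lik - 0.5 * n_params * math.log(n_samples)
```

Multiplying `bic_log_marginal` by −2 gives the familiar form −2 log L + n_Ψ log n_R, so ranking models by either quantity is equivalent; this is why the CAIC and BIC scores in the result tables are read in opposite directions.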
The BIC approximation is quite crude, especially for the parameter prior p(Ψ|M). A more elegant
solution is to bootstrap the data and estimate, under a Gaussian assumption, the prior covariance V as
well [55]. Algorithm 4 outlines the procedure used to compute an approximated p(R|M). First we calibrate
the model as explained in Sec. 4.1. Then we bootstrap the unnormalised residuals and we create a set of
semi-synthetic trials. Each trial is again calibrated and the parameter covariance V is computed. Finally we
use the covariance estimate to compute the approximated marginal (Algorithm 4, step 13).
Although the estimates produced by the bootstrap procedure are potentially more accurate than the BIC ones, its application is limited to small subjects (i.e., with few segments). In fact, bootstrapping typically requires at least 1000 samples, which in our case correspond to 1000 computationally expensive subject calibrations.
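The resampling part of Algorithm 4 (calibrate, bootstrap the unnormalised residuals, recalibrate each semi-synthetic trial, estimate V) can be sketched on a toy problem. Here a plain linear least-squares fit stands in for the subject calibration; this stand-in, the helper name and the data are assumptions, and the final marginal computation of step 13 is not reproduced because its formula is not given in this section.

```python
import numpy as np


def bootstrap_param_cov(X, y, n_boot=1000, seed=0):
    """Residual bootstrap of a least-squares fit y ~ X p.

    Returns the fitted parameters and the bootstrap covariance V.
    The least-squares 'calibration' is a hypothetical stand-in.
    """
    rng = np.random.default_rng(seed)
    p_hat = np.linalg.lstsq(X, y, rcond=None)[0]     # initial calibration
    resid = y - X @ p_hat                             # unnormalised residuals
    samples = np.empty((n_boot, X.shape[1]))
    for i in range(n_boot):
        # semi-synthetic trial: model prediction plus resampled residuals
        y_star = X @ p_hat + rng.choice(resid, size=resid.size, replace=True)
        samples[i] = np.linalg.lstsq(X, y_star, rcond=None)[0]
    V = np.cov(samples, rowvar=False)                 # parameter covariance
    return p_hat, V


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
X = np.column_stack([np.ones_like(t), t])
y = 1.0 + 2.0 * t + 0.1 * rng.normal(size=t.size)
p_hat, V = bootstrap_param_cov(X, y)
print(p_hat, np.sqrt(np.diag(V)))
```

Each loop iteration corresponds to one full recalibration, which is what makes the approach expensive for models with many segments.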
6 Results
6.1 Marker motion model
We evaluate the kinematic model with moving markers described in Sec. 4.3 on capture data acquired with a rig of nine 4-megapixel Vicon MX cameras. As a proof of concept we limited the capture to two fingers: the right thumb and index of one healthy subject. Thirty-one markers of 3 mm diameter were glued to the latex glove worn by the subject, as shown in Fig. 13. Also, to ensure the accuracy of the global position, we glued one larger (7 mm) marker over the wrist and limited the capture volume to about 1 m. Finally, to reduce the occurrence of marker occlusions under wrist rotations, we pointed two of the nine cameras upwards. As in the Santos model [11], we defined the index and thumb articulations with two DoF for the CMC and TMC joints and one DoF for the thumb IP, PIP and DIP joints.
Figure 13: High density markerset. The thumb and the index are sensorised with 32 markers. The markers
are glued to a latex glove.
We compared the results of the standard Static Marker (SM) model (Eq. (2)) and the enhanced model with Moving Markers (MM) (Eq. (6)) on four capture trials. Trials 1, 2 and 3 are ROM trials. In Trial 4 the subject repeatedly picks a piece of plastic cutlery (a knife) from a small container that he holds with the other hand. For our experiments we used 100 frames from Trial 1 to calibrate the subject with both models. Then, for the remaining frames in Trial 1 and for the other three trials, we computed the joint angles and the Root Mean Square Error (RMSE) of the unnormalised marker residuals. We set the maximal degree of the polynomials in the marker motion model to three. Fig. 14 and Table 4 summarise the results. For all four trials MM has a significantly lower RMSE than SM. The reduction was expected on Trial 1, as this is the trial used to calibrate the subject and the proposed model has a larger number of free parameters than the standard one. However, the improvement on the other three trials shows that MM generalises well to unseen data. This result holds for motions similar to the training ones (i.e., Trial 2 and Trial 3) as well as for fairly dissimilar movements, as in Trial 4. Fig. 14 also shows that the performance improvement is consistent over time: the marker motion model outperforms the standard model both on extreme poses, when the fingers are fully flexed (see the RMSE peaks in Fig. 14), and near the mean pose. During the experiments we also noted that the marker motion sometimes shows a hysteretic behaviour, caused by the glove sliding over the skin and not returning to its original position. The problem affects both models,
Table 4: Performance comparison between the standard model with static markers and the proposed moving marker model. RMSE values are in millimetres.

           RMSE (mm)
           Static Markers   Moving Markers   Perc. difference
Trial 1    0.91             0.66             27.8%
Trial 2    1.02             0.79             23.0%
Trial 3    0.97             0.80             18.2%
Trial 4    0.92             0.74             19.7%
but we can speculate that better motion prediction could be achieved with markers attached directly to the skin.
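A minimal sketch of the error measures behind Fig. 14 and Table 4 follows. It assumes residuals are stored as an array of shape (frames, markers, 3) in millimetres and that the RMSE is taken over all scalar components of a frame; it also assumes the "Perc. difference" column is the relative reduction of MM with respect to SM. Both are assumptions, though the tabulated percentages are consistent with that definition up to the two-decimal rounding of the RMSE values.

```python
import numpy as np


def rmse_per_frame(residuals):
    """RMSE of the unnormalised marker residuals for each frame.

    residuals: array (n_frames, n_markers, 3), predicted minus measured
    marker positions (mm); all markers x coordinates of a frame are pooled.
    """
    r = np.asarray(residuals, dtype=float)
    return np.sqrt(np.mean(r.reshape(r.shape[0], -1) ** 2, axis=1))


def overall_rmse(residuals):
    """RMSE pooled over all frames, markers and coordinates."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sqrt(np.mean(r ** 2)))


def percent_reduction(rmse_sm, rmse_mm):
    """Relative RMSE reduction of MM with respect to SM, in percent."""
    return 100.0 * (rmse_sm - rmse_mm) / rmse_sm
```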
To evaluate the repeatability and stability of the calibration procedure we produced a dataset of 200 semi-synthetic subjects. Each subject is a randomly rescaled version of the subject calibrated with the static marker model. First, we multiply all the bone lengths by a global rescaling factor G, where G is a Gaussian random variable with mean 1 and standard deviation 0.1; then we independently multiply each segment k by a local rescaling factor L_k, again Gaussian distributed with mean 1 and standard deviation 0.5. Fig. 15 shows the bone length statistics (mean and standard deviation) for the two models. Although both models predict similar bone lengths, the standard model is slightly more stable than the proposed approach. This result is not unexpected: more complex models are usually more difficult to optimise than simpler ones. However, Fig. 15 (right) shows that the worst case for MM occurs at the thumb CMC joint, where the standard deviation is only 0.07 millimetres, about 0.2% of the CMC segment length.
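The rescaling used to build the semi-synthetic subjects can be sketched as below. The global factor G is shared by all bones of a subject, while each segment gets its own local factor L_k. The redraw of non-positive factors is an assumption added here, not stated in the report: with a standard deviation of 0.5, roughly 2% of the local draws would otherwise be non-positive. The function name and the example bone lengths are hypothetical.

```python
import numpy as np


def rescaled_subjects(bone_lengths, n_subjects=200, global_sd=0.1,
                      local_sd=0.5, seed=0):
    """Randomly rescaled copies of a calibrated subject.

    Each subject's bone lengths are base * G * L_k, with one global factor
    G ~ N(1, global_sd^2) per subject and one local factor
    L_k ~ N(1, local_sd^2) per segment.
    """
    rng = np.random.default_rng(seed)
    base = np.asarray(bone_lengths, dtype=float)
    G = rng.normal(1.0, global_sd, size=(n_subjects, 1))
    L = rng.normal(1.0, local_sd, size=(n_subjects, base.size))
    # Assumption (not from the report): redraw non-positive factors so
    # that every rescaled bone length stays positive.
    for F, sd in ((G, global_sd), (L, local_sd)):
        bad = F <= 0
        while bad.any():
            F[bad] = rng.normal(1.0, sd, size=int(bad.sum()))
            bad = F <= 0
    return base * G * L


# Hypothetical bone lengths (mm) for the six thumb and index segments.
subjects = rescaled_subjects([45.0, 28.0, 22.0, 40.0, 32.0, 25.0])
print(subjects.shape)
```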
[Figure 14 plots: RMSE versus frame number for Trial 1 (ROM), Trial 2, Trial 3 and Trial 4, each comparing SM and MM.]
Figure 14: RMSE comparison between the standard calibration model with static markers (SM) and the
proposed model with polynomial moving markers (MM). For all test trials MM better predicts the marker
positions.
[Figure 15 plots: bone length mean (left) and standard deviation (right) for MCP-I, PIP-I, DIP-I, CMC-T, MCP-T and IP-T, comparing SM and MM.]
Figure 15: Reproducibility comparison between the standard calibration model with static markers (SM) and
the proposed model with polynomial moving markers (MM). Bone length statistics (Left: mean µ; right:
standard deviation σ) over 100 synthetically rescaled subjects. The lower the standard deviation the better.
Table 5: Best kinematic models of the thumb and index according to the four model selection strategies (HS = Hardy-Spicer joint).

            Joint types
            MCP I   PIP I   DIP I   CMC T   MCP T   IP T
AIC         Ball    Ball    Hinge   Ball    Ball    Hinge
CAIC        HS      Hinge   Hinge   Ball    HS      Hinge
BIC         HS      Hinge   Hinge   Ball    HS      Hinge
Bootstrap   Ball    Hinge   Hinge   HS      Ball    Hinge
chain. In fact, high scores on models with rigid joints would indicate that a model with a lower number of
segments is more appropriate.
We calibrated each model using 100 frames from Trial 1 and then, to evaluate the scores, we computed the joint angles and the residuals on the remaining frames. Table 5 shows the joint types of the best models according to each of the four model selection methods. First, we note that AIC seems to grossly overestimate the model complexity: it assigns 3 DoF (a ball joint) to the index PIP joint, which undoubtedly should have a single free parameter. This result led us to exclude AIC from our pool of model selection methods. CAIC and BIC, instead, agree on the best model. It is interesting to note that both methods select a three-DoF model for the CMC joint of the thumb. This indicates that a standard two-DoF joint may not be sufficient to predict complex thumb actions. Finally, the bootstrap method assigns 3 DoF to the MCP joint of the index and, with respect to CAIC and BIC, inverts the DoF assignment of the thumb CMC and MCP joints. As the number of models grows exponentially with the number of segments, a complete test for the full hand is not feasible. Therefore, we combined the results on thumb and index with the state-of-the-art research and we implemented a set
of 24 plausible models. The models were generated by combining four thumb models, two finger models and three palm arching models. The thumb models (TMs) are: TM5, the Santos model [11] with five DoF (Hardy-Spicer joints for CMC and MCP and a hinge for the IP joint); TM6a, the thumb model with six DoF selected by CAIC and BIC (ball CMC joint); TM6b, the model with six DoF selected via bootstrapping, using a ball TMC joint; and TM7, a model with ball CMC and TMC joints. From the literature only two plausible Finger Models (FM) exist. The most common has two DoF on the TMC joints; the other has three DoF. We name these two models FM2 and FM3. Finally, the Palm Model (PM) can be rigid (PMR), have two CMC Hardy-Spicer joints for the ring and little fingers (PM4), or have a third CMC Hardy-Spicer joint for the index (PM6). To capture the motion we sensorised the hand with 22 markers positioned as shown in Fig. 2 (c),(e): one marker per phalanx and the rest on the dorsum of the hand. For the evaluation we ran the model selection methods on three capture trials. While the first trial is a classic ROM, in the other two trials the subject was asked to perform two actions that are particularly relevant to the benchmarking scenario of the DEXMART project [41]. In one trial the subject unscrews a jar lid; in the other the subject repeatedly picks a piece of cutlery from a small box. Finally, as the bootstrap procedure is too computationally demanding on full hand subjects, we present results for the CAIC and BIC methods only.

Table 6: Five best kinematic models of the hand according to the CAIC and BIC scores when training on a ROM trial (CAIC score: the lower the better; BIC score: the higher the better).
Tables 6, 7 and 8 show the best five models according to BIC and CAIC for the three trials, respectively. The first observation is that the BIC and CAIC outputs are fully coherent. Therefore, without loss of generality, we can comment on the results of either one. The ROM trial results (Table 6) show which models produce a good fit to generic hand movements. In this case the Santos model (TM5-FM2-PM4) achieves the highest score. The other high-ranked models are more complex than Santos and have extra DoF on the thumb (TM6a, TM6b and TM7) or on the palm (PM6). None of the models uses three DoF for the finger MCP joints. The Santos model is also the highest scorer on the jar lid unscrewing movement (Table 7). However, the other high-ranked models have a larger number of DoF than in the ROM trial case. This indicates that, to achieve high accuracy on this specific movement, a more complex palm model like PM6 can be a viable option. Finally, the results in Table 8 show that the cutlery picking task also triggers extra DoF on the palm as well as on the thumb. In this case the highest score is produced by the kinematic model using the most complex palm model.
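For reference, the 24 candidate hand models ranked in Tables 6, 7 and 8 are the Cartesian product of the four thumb, two finger and three palm models. The short sketch below enumerates them using the names from the text; the list constants are illustrative and carry no joint definitions.

```python
from itertools import product

THUMB_MODELS = ["TM5", "TM6a", "TM6b", "TM7"]
FINGER_MODELS = ["FM2", "FM3"]
PALM_MODELS = ["PMR", "PM4", "PM6"]

# One name per thumb/finger/palm combination, e.g. Santos = "TM5-FM2-PM4".
candidates = ["-".join(c) for c in product(THUMB_MODELS, FINGER_MODELS, PALM_MODELS)]
print(len(candidates))   # prints 24
print(candidates[:3])    # prints ['TM5-FM2-PMR', 'TM5-FM2-PM4', 'TM5-FM2-PM6']
```

Each candidate must be calibrated separately before it can be scored, which is why exhaustive enumeration of all per-joint choices for the full hand was ruled out.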
Table 7: Five best kinematic models of the hand according to the CAIC and BIC scores. The training is done on motion capture data of a hand opening a jar lid (CAIC score: the lower the better; BIC score: the higher the better).
Table 8: Five best kinematic models of the hand according to the CAIC and BIC scores. The training is done on motion capture data of a hand picking up pieces of cutlery from a small box (CAIC score: the lower the better; BIC score: the higher the better).
After labelling and tracking the markers, we performed a dynamic subject calibration and fitted the subject motion. The computed joint angle values during the trials were exported to Matlab in CSV format. The movements were acquired with the performers seated in an initial pose with the torso approximately upright, the right upper arm vertical and the forearm horizontal. The fingers were in natural full extension and the palm was supported by a desk. During task execution, small forearm pronation/supination and torso assistance were involved⁴.
In the first two tasks (Fig. 18), subjects reached forward over a distance of approximately 250 mm to grasp two different vertical cylinders with diameters of 50 mm and 65 mm (once for each trial). The observation focuses on the concurrent voluntary flexion of all digits in the whole-hand grasp task. Before the subject returns to the initial posture, the cylinder is placed 150 mm from its initial position and a concurrent voluntary extension of all fingers is observed.
In the third and fourth tasks, subjects maintained the same initial posture as in the first two tasks. Each subject performed two consecutive repetitions of individual flexion, i.e., voluntary flexion of individual fingers, one digit at a time. For the latter task, the palm of the performer rests on a special support without any other constraints. In the latter two tasks, the subjects were instructed not to consciously control involuntary joint flexion of the non-intended fingers; they completed 10 trials (five different movements, two repetitions) for each task.
⁴ The subjects moved each finger into flexion and extension while attempting to keep the other, non-instructed fingers still.

A local coordinate system x0-y0-z0 was established to facilitate kinematic descriptions and definitions. The origin of this local coordinate system was the marker adhered to the dorsal landmark of the wrist. The y0-axis lay in the plane, pointing radially while being perpendicular to the x0-axis. The z0-axis was therefore normal to the plane, pointing dorsally. Coordinates of the markers measured in the global (laboratory) coordinate system (x-y-z) were transformed and expressed in the local coordinate system (x0-y0-z0).
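The change from laboratory to local coordinates amounts to building an orthonormal frame from three markers and applying $R^T(p - o)$ to every marker position. The sketch below is a hedged illustration: the definition of the x0-axis is not given in this excerpt, so it assumes x0 points from the wrist origin marker to a second marker (`x_point`) and that a third marker (`plane_point`) fixes the plane; the function names are hypothetical, and whether y0 ends up radial and z0 dorsal depends on which markers are chosen.

```python
import numpy as np


def local_frame(origin, x_point, plane_point):
    """Rotation matrix whose columns are the x0, y0, z0 axes in lab coordinates.

    x0: unit vector from origin to x_point (assumed definition).
    y0: in the plane of the three markers, perpendicular to x0, towards plane_point.
    z0: x0 cross y0, normal to that plane (right-handed frame).
    """
    o = np.asarray(origin, dtype=float)
    x = np.asarray(x_point, dtype=float) - o
    x /= np.linalg.norm(x)
    v = np.asarray(plane_point, dtype=float) - o
    y = v - (v @ x) * x          # Gram-Schmidt: remove the x0 component
    y /= np.linalg.norm(y)
    z = np.cross(x, y)
    return np.column_stack([x, y, z])


def to_local(points, origin, R):
    """Express lab-frame points of shape (n, 3) in the local frame: R^T (p - o)."""
    return (np.asarray(points, dtype=float) - np.asarray(origin, dtype=float)) @ R


R = local_frame([1.0, 1.0, 1.0], [1.0, 1.0, 3.0], [2.0, 1.0, 1.0])
print(to_local([[1.0, 2.0, 1.0]], [1.0, 1.0, 1.0], R))
```

Recomputing the frame at every frame from the tracked markers makes the local coordinates independent of the hand's global position and orientation.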
From the local coordinates, the time-varying joint angles (Fig. 16) measuring all the involved flexion-
extension DoF were derived through a computational procedure in the Nexus software that determined
the finger segmental centres of rotation. The flexion portions of the angular profiles for the MP, proximal interphalangeal (PIP) and DIP joints of digits 2-5, and for the CM, MP and IP joints of the thumb, were analysed in the current study, with a total of 25 DoF. A semi-automatic procedure was established to identify the initiation and termination times of the flexion and extension motions.
The first type of analysis consists of computing the correlation coefficient matrix for all the DoFs, in order to quantify the degree of correlation of each DoF with all the others. Figures 19 and 20 summarise the results for the flexion-extension movement, while Figures 21 and 22 summarise the results for the grasping movement of only one of the objects considered; the correspondence between number and name of each DoF is summarised in Table 6.3. Analogous results were obtained for the grasping of the other object and are not reported for brevity.
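The per-trial correlation matrices and their across-trial variance (as in Figs. 19 to 22) can be sketched as follows. This assumes each trial is an array of joint angles of shape (frames, DoF); taking absolute correlation values is an assumption suggested by the 0 to 1 colour scale of Fig. 19, the population variance (ddof = 0) is a choice, and the helper name is hypothetical.

```python
import numpy as np


def correlation_stats(trials):
    """Per-entry mean and variance of the DoF correlation matrices.

    trials: list of arrays, each (n_frames, n_dof), joint angles of one trial.
    Returns (mean, variance) across trials, each of shape (n_dof, n_dof).
    """
    mats = np.stack([np.abs(np.corrcoef(a, rowvar=False)) for a in trials])
    return mats.mean(axis=0), mats.var(axis=0)


# Toy trial in which all three DoFs move together.
t = np.linspace(0.0, 1.0, 50)
trial = np.column_stack([t, 2.0 * t, -t])
mean_r, var_r = correlation_stats([trial, trial])
print(np.round(mean_r, 3))
```

Entries with a high mean and a low variance across trials are the interdependencies that can be considered reliable.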
In the analysis of the flexion-extension movement (see the correlation coefficient matrix in Fig. 19) we can observe that:
• each metacarpal joint has a high correlation coefficient (0.65-0.85) with respect to the adjacent metacarpal joints, both in the flexion DoF and in the abduction DoF (see the entries (1,5), (5,11), (11,17))
• each proximal interphalangeal joint has a correlation coefficient of around 0.6 with respect to the proximal interphalangeal joint of the adjacent fingers (see the entries (3,7), (7,13), (13,19))
• the abduction of the ring finger metacarpal joint (labelled CMC_3_MCP_3) is strongly correlated with the flexion of the little finger proximal interphalangeal joint (labelled MCP_4_PIP_4) (see the entry (12,19))
[Figure 19 plot: matrix of correlation coefficients between hand DoFs, DOF number versus DOF number.]
Figure 19: Estimated correlation among hand dof’s in the voluntary flexion-extension trials.
[Figure 20 plot: variance of the correlation indices, DOF number versus DOF number.]
Figure 20: Estimated variance of the correlation indices in the voluntary flexion-extension trials.
• the empirical model commonly used in the literature, which correlates the DIP and PIP angular positions for all fingers except the thumb, is not valid in the flexion movement (see the entries (3,4), (7,8), (13,14), (19,20))
• the two DoFs of the dorsum joint labelled Wrist_CMC_3 are strongly correlated (see the entry (9,10))
• the two DoFs of the hand dorsum joint labelled Wrist_CMC_4 are strongly correlated with each other (see the entry (15,16))
• the flexion DoF of the joint labelled OCMC_T is strongly correlated with the abduction DoF of the thumb metacarpal joint (labelled CMC_T_MCP_T) and with both the abduction and flexion DoFs of the index metacarpal joint, labelled Wrist_MCP_1 (see the entries (21,1), (21,2), (21,24), (2,24))
• the flexion DoF of the Wrist_MCP_1 joint (index metacarpal joint) has a significant correlation (0.65) with the MCP_T_IP_T joint (proximal interphalangeal thumb joint)
• the two DoFs of the CMC_T_MCP_T joint are both correlated with the MCP_T_IP_T joint; the values in entries (23,26) and (24,26) show that a correlation of about 0.6 exists between the two thumb joints in the flexion movement
Figure 21: Estimated correlation among hand dof’s in the grasp trials.
From the analysis of the variance matrix in Fig. 20 we can conclude that the above findings are quite reliable, since the variance values are almost uniformly low.
Figure 21 shows that, during the grasp of a cylinder with a diameter of 60 mm, the angular joint positions are highly correlated. The thumb joints are the least correlated with the others, and the associated correlation coefficients vary significantly across capture sessions. This is due to the occlusion phenomenon, which makes capturing and tracking the marker positions very difficult. The low quality of the measurements in this task is confirmed by the variance matrix in Fig. 22, where the last four rows, corresponding to the thumb DoFs, show a high variance among the different trials. The sensorised glove and the sensor fusion algorithms to be developed within the project will be used to reduce this problem and to gain more insight into the joint interdependencies in manipulation tasks that also involve the thumb. The obtained information about hand joint correlation will be used by the Kalman-like sensor fusion algorithm cited above to improve its tracking performance.
A second type of analysis has been carried out: Principal Component Analysis (PCA) was employed to investigate the synergistic behaviour among finger joints. PCA is used to establish the minimum number of signals necessary to approximately describe a motion during the two considered tasks.
The PCA of the flexion-extension task shows that 90% of the variance is contained in the first five principal components and 98% in the first 12 principal components. The values shown in Fig. 23 are obtained by dividing each singular value of the covariance matrix by the sum of all the singular values; by combining 12 signals it is possible to represent all the hand joint movements with good approximation.
The values shown in Fig. 24 indicate that 90% of the variance is contained in the first three principal components and 98% in the first six principal components. This means that fewer signals are needed to reconstruct the movements in a grasping task than in a flexion task.
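The normalisation described for Figs. 23 and 24 (each singular value of the joint-angle covariance matrix divided by their sum) and the count of components needed to reach a variance level can be sketched as follows. The joint angles are assumed to be an array of shape (frames, DoF); the helper names are hypothetical.

```python
import numpy as np


def explained_variance_ratio(angles):
    """Singular values of the joint-angle covariance matrix, each divided
    by the sum of all singular values (descending order).

    angles: array (n_frames, n_dof); np.cov removes the mean of each DoF.
    """
    C = np.cov(np.asarray(angles, dtype=float), rowvar=False)
    s = np.linalg.svd(C, compute_uv=False)
    return s / s.sum()


def n_components(ratio, level):
    """Smallest number of leading components whose cumulative ratio >= level."""
    return int(np.searchsorted(np.cumsum(ratio), level) + 1)


# Two uncorrelated DoFs with equal variance: two components are needed for 90%.
X = np.column_stack([[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]])
ratio = explained_variance_ratio(X)
print(np.round(ratio, 3), n_components(ratio, 0.9))
```

Applied to the recorded trials, `n_components(ratio, 0.9)` and `n_components(ratio, 0.98)` correspond to the five/12 (flexion-extension) and three/six (grasp) counts reported above.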
[Figure 22 plot: variance of the correlation indices, DOF number versus DOF number.]
Figure 22: Estimated variance of the correlation indices in the grasp trials.
[Plot: normalised singular values of the covariance matrix versus principal component number.]
[Plot: normalised singular values versus principal component number.]
the high number of degrees of freedom of the hand. In particular, there are two very difficult problems to solve: the first is the reduction of the number of markers to place on a small area of the back of the hand, and the second is the reduction of marker occlusions caused by dexterous hand movements in the capture area (the ghost marker problem also becomes more relevant as the number of markers in the small field of view of each camera increases).
Most researchers in this area use fewer markers than the minimum number required to reconstruct the motion of all the bones of the human hand. The reduction has been made possible by using a mathematical model of the hand or additional sensors, e.g. data gloves. The second strategy will also be used in DEXMART; in fact, the planned activities already include the integration of a data glove (under development in WP5) into the OMG optical motion capture system. The main motivation is to minimise tracking failures of the hand bones due to marker occlusions, which are very frequent during manipulation tasks. Some preliminary measurements have already been conducted, in which the motion of a single finger was captured using the typical marker set from the literature (one marker for each finger bone); even in such a simple case, occlusions proved very frequent, despite the use of five cameras well distributed around the hand workspace. To improve the quality of the acquired kinematic data and to reduce the number of markers, a sensor fusion algorithm for hand motion tracking will be realised. In particular, our sensorised glove is equipped with only three markers and three low-cost angular sensors per finger. In detail, three markers are used to define a reference system fixed to the hand wearing the glove, and three markers placed on the index finger are used to estimate the joint angles between the phalanges. The three angular sensors are then mounted on the same finger (see Fig. 25).
To perform the experiments we used a Vicon 460 motion capture system equipped with five high-resolution M2 cameras. Figure 26 shows the marker trajectories for four consecutive flexion and extension movements of the index. This experiment was executed in the two cases under the same constraint conditions (the palm of the hand is held still in a fixed position and the index motion is performed without any constraint on the other fingers).
The results show high variability of the marker trajectories across consecutive movements. This is due to the sliding of the glove with respect to the phalanx bones. To evaluate and reduce this effect, it will be necessary to analyse the capture error for a gloved hand in an MRI environment.

Figure 26: Trajectories generated by three markers mounted on the data glove.
7 Conclusions
In this report we analysed in depth the kinematic model of the human hand. We reviewed the existing state of the art in different research communities and, although a common ground on articulated models exists, we pointed out that different models suit different applications. Similarly, our results suggest that the kinematic model should depend not only on the application but also on the type of capture system. With small camera counts and a relatively low number of large markers, it is not feasible to try to recover subtle movements like palm arching. In this case a simple kinematic model with 20 DoF should be used. However, results have shown that 20 DoF are not sufficient to accurately model complex grasp actions. When high-resolution cameras and small markers are used, more complex modelling is a feasible option. In particular, we have seen that the Santos model with 25 DoF is a good compromise between simplicity and predictive power. This model can be enhanced with a more complex thumb articulation that uses three DoF on either the CMC or MCP thumb joint, or via the addition of extra DoF on the palm to better approximate the subtle movements of the carpal bones.
To improve the capture accuracy, we presented a novel kinematic calibration procedure that accounts for soft tissue artefacts by allowing the markers to move according to polynomial functions of the joint angles. The extra parameters added to model marker motions are selected by an automated model selection procedure. The results on thumb and index capture show that the proposed model generalises well to unseen data and produces significant improvements in terms of marker residual reduction.
The finger interdependencies analysis showed that, on simple grasp tasks, the first three principal components account for 90% of the variance and the first six for 98%. This result paves the way for a low-dimensional and compact representation of the grasp movements that can simplify the design of robotic hand control algorithms.
Finally, results have shown that optical motion capture can accurately track hand movements when these movements are heavily constrained and the capture volume is small. However, natural object manipulations in an unconstrained environment may produce long-term occlusions, especially of the fingertips. In these situations an independent source of information is necessary. Research has started in T1.3 and will also focus on the integration of data-glove data with optical motion capture. In this regard, we presented preliminary results on marker set selection for hybrid optical/data-glove based motion capture.
d = g(θ, w) = φ(θ)T w + e
with e a zero mean Gaussian noise and the parameter prior defined in Eq. (8) we have to formulate the
marginal probability Z
p(d|g) = p(d|w, g)p(w|g)dw. (12)
To this extent let us formulate the linar regression on the data vector d as
d = Φw + e,
39
ICT – FP7 216239 – DEXMART Deliverable D1.1
where e is the random vector generated from the realisations of e, Φ is the design matrix
φ(θ0 )T
φ(θ1 )T
Φ= .. = A .
.
φ(θnk )T
is an nk vector with all elements equal to one (i.e., the regressor for the the zero order component) and
A is an nk × (nw − 1) matrix containing the regressor values for the components with order higher than
zero. Finally, we can rewrite the likelihood as
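By substituting the prior and the likelihood into Eq. (12) and applying the Gaussian integration rules, we obtain
$$
\begin{aligned}
p(\mathbf{d} \mid g) &= \int_{w_0} \int_{\hat{\mathbf{w}}} \mathcal{N}\left(\mathbf{d} \mid A\hat{\mathbf{w}} + \mathbf{1} w_0,\; \sigma^2 I\right) \mathcal{N}\left(\hat{\mathbf{w}} \mid \mathbf{0}, \Sigma_{\hat{w}}\right) d\hat{\mathbf{w}}\, dw_0 \qquad (13) \\
&= \int_{w_0} \mathcal{N}\left(\mathbf{d} \mid \mathbf{1} w_0,\; B = \sigma^2 I + A \Sigma_{\hat{w}} A^T\right) dw_0 \qquad (14) \\
&= c \cdot \exp\left(-\frac{1}{2}\left(\mathbf{d}^T B^{-1}\mathbf{d} - \frac{1}{4}\,\frac{\left(\mathbf{1}^T B^{-1}\mathbf{d} + \mathbf{d}^T B^{-1}\mathbf{1}\right)^2}{\mathbf{1}^T B^{-1}\mathbf{1}}\right)\right), \qquad (15)
\end{aligned}
$$
where the normalisation factor is $c = (2\pi)^{\frac{1-n_k}{2}} \left(|B|\, \mathbf{1}^T B^{-1} \mathbf{1}\right)^{-\frac{1}{2}}$.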
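As a sanity check of the algebra in Eqs. (13)–(15), the closed form can be compared against a direct numerical integration of Eq. (14) over $w_0$, and then used to compute a log Bayes factor between two candidate regressor sets. Everything beyond the equations is an illustrative assumption: the function names are hypothetical; since Eq. (13) contains no prior density for $w_0$, we read it as a flat (improper) prior on $w_0$; and the polynomial regressors in $\theta$, the choice $\Sigma_{\hat{w}} = I$, the noise level and the data are synthetic. This is a sketch, not the implementation used in this deliverable.

```python
import numpy as np


def marginal_cov(n_k, A, sigma2, Sigma_w):
    """B = sigma^2 I + A Sigma_w A^T, the covariance left after integrating out w_hat (Eq. 14)."""
    return sigma2 * np.eye(n_k) + A @ Sigma_w @ A.T


def log_evidence_closed_form(d, A, sigma2, Sigma_w):
    """log p(d|g) from Eq. (15); w0 is integrated out under an (assumed) flat prior."""
    n_k = d.shape[0]
    ones = np.ones(n_k)
    B = marginal_cov(n_k, A, sigma2, Sigma_w)
    _, logdet_B = np.linalg.slogdet(B)
    B_inv_d = np.linalg.solve(B, d)
    B_inv_1 = np.linalg.solve(B, ones)
    a = ones @ B_inv_1                   # 1^T B^-1 1
    b = ones @ B_inv_d + d @ B_inv_1     # 1^T B^-1 d + d^T B^-1 1
    log_c = 0.5 * (1 - n_k) * np.log(2 * np.pi) - 0.5 * (logdet_B + np.log(a))
    return log_c - 0.5 * (d @ B_inv_d - 0.25 * b**2 / a)


def log_evidence_numeric(d, A, sigma2, Sigma_w, half_width=20.0, n=40001):
    """log p(d|g) from Eq. (14), integrating over w0 with the trapezoidal rule."""
    n_k = d.shape[0]
    B = marginal_cov(n_k, A, sigma2, Sigma_w)
    _, logdet_B = np.linalg.slogdet(B)
    B_inv = np.linalg.inv(B)
    w0 = np.linspace(-half_width, half_width, n)
    r = d[None, :] - w0[:, None]                  # residual d - 1*w0 at every grid point
    quad = np.einsum("ij,jk,ik->i", r, B_inv, r)  # r^T B^-1 r per grid point
    log_f = -0.5 * (n_k * np.log(2 * np.pi) + logdet_B + quad)
    m = log_f.max()                               # shift before exp to avoid underflow
    f = np.exp(log_f - m)
    dx = w0[1] - w0[0]
    return m + np.log(dx * (f.sum() - 0.5 * (f[0] + f[-1])))


# Hypothetical example: polynomial regressors in the joint angle theta.
rng = np.random.default_rng(0)
theta = np.linspace(-1.0, 1.0, 20)
sigma2 = 0.05**2
d = 0.5 + 0.2 * theta + 3.0 * theta**2 + rng.normal(0.0, 0.05, theta.size)

A_lin = theta[:, None]                        # model g1: first-order term only
A_quad = np.column_stack([theta, theta**2])   # model g2: first- and second-order terms

ev_lin = log_evidence_closed_form(d, A_lin, sigma2, np.eye(1))
ev_quad = log_evidence_closed_form(d, A_quad, sigma2, np.eye(2))
log_bayes_factor = ev_quad - ev_lin           # > 0 favours g2 over g1

print(ev_lin, log_evidence_numeric(d, A_lin, sigma2, np.eye(1)))
print(ev_quad, log_evidence_numeric(d, A_quad, sigma2, np.eye(2)))
print("log Bayes factor (g2 vs g1):", log_bayes_factor)
```

A positive `log_bayes_factor` indicates that the data support the regressor set with the second-order term. The closed form needs no numerical integration, which makes it convenient when many candidate regressor sets have to be compared.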
References
[1] A. Erol, G. Bebis, M. Nicolescu, R.D. Boyle, and X. Twombly. Vision-based hand pose estimation: A
review. Computer Vision and Image Understanding, 108(1-2):52–73, 2007.
[2] J. Davis and M. Shah. Recognizing hand gestures. In Proc. of European conference on Computer
Vision, pages 331–340, Secaucus, NJ, USA, 1994. Springer-Verlag New York, Inc.
[3] K.G. Derpanis. A review of vision-based hand gestures. Technical report, York University, February
2004.
[4] J.M. Rehg and T. Kanade. Visual tracking of high DOF articulated structures: an application to human
hand tracking. In Proc. of European conference on Computer Vision, pages 35–46, Secaucus, NJ, USA,
1994. Springer-Verlag New York, Inc.
[6] Y. Wu, J. Lin, and T.S. Huang. Analyzing and capturing articulated hand motion in image sequences.
IEEE Trans. Pattern Anal. Mach. Intell., 27(12):1910–1922, 2005.
[7] K.N. An, E.Y. Chao, W.P. Cooney III, and R.L. Linscheid. Normative model of human hand for
biomechanical analysis. J. Biomechanics, 12(10):775–788, 1979.
[8] X. Zhang, S. W. Lee, and P. Braido. Determining finger segmental centers of rotation in flexion-
extension based on surface marker measurement. J Biomech, 36(8):1097–1102, August 2003.
[9] P. Cerveri, N. Lopomo, A. Pedotti, and G. Ferrigno. Derivation of centers and axes of rotation for
wrist and fingers in a hand kinematic model: Methods and reliability results. Annals of Biomedical
Engineering, 33(3):402–412, March 2005.
[10] P. Cerveri, E. De Momi, N. Lopomo, G. Baud-Bovy, R. Barros, and G. Ferrigno. Finger kinematic
modeling and real-time hand motion estimation. Annals of Biomedical Engineering, 35(11):1989–2002,
November 2007.
[11] E.P. Pena-Pitarch, J. Yang, and K. Abdel-Malek. SANTOS™ hand: A 25 degree-of-freedom model.
In Proc. of SAE Digital Human Modeling for Design and Engineering, Iowa City, USA, June 2005.
[12] J. H. Coert, H. G. van Dijke, S. E. Hovius, C. J. Snijders, and M. F. Meek. Quantifying thumb rotation
during circumduction utilizing a video technique. J Orthop Res, 21(6):1151–1155, November 2003.
[13] F. J. Valero-Cuevas, M. E. Johanson, and J. D. Towles. Towards a realistic biomechanical model of
the thumb: the choice of kinematic description may be more critical than the solution method or the
variability/uncertainty of musculoskeletal parameters. J Biomech, 36(7):1019–1030, July 2003.
[14] L. Kuo, W.P. Cooney, M. Oyama, K.R. Kaufman, F.C. Su, and K.N. An. Feasibility of using surface markers for assessing motion of the thumb trapeziometacarpal joint. Clinical Biomechanics, 18(6):558–563, July 2003.
[15] L.Y. Chang and N. Pollard. Method for determining kinematic parameters of the in vivo thumb
carpometacarpal joint. IEEE Trans. Biomed. Eng., 55(7):1897–1906, July 2008.
[16] A. Hollister, D. J. Giurintano, W. L. Buford, L. M. Myers, and A. Novick. The axes of rotation of
the thumb interphalangeal and metacarpophalangeal joints. Clinical Orthopaedics & Related Research,
320:188–193, November 1995.
[17] Y. Yasumuro. Three-dimensional modeling of the human hand with motion constraints. Image and
Vision Computing, 17(2):149–156, February 1999.
[18] I. Albrecht, J. Haber, and H.P. Seidel. Construction and animation of anatomically based human hand
models. In Proceedings of SIGGRAPH the conference on Computer graphics and interactive techniques,
pages 98–109, San Diego, California, 2003.
[19] S. Sueda, A. Kaufman, and D.K. Pai. Musculotendon simulation for hand animation. ACM Trans.
Graph. (SIGGRAPH), 27(3):1–8, 2008.
[20] K. Singh and E. Kokkevis. Skinning characters using surface oriented free-form deformations. In Proc.
of the Conference on Graphics Interface, pages 35–42, Montreal, Canada, 2000.
[21] X.C. Wang and C. Phillips. Multi-weight enveloping: least-squares approximation techniques for skin
animation. In SCA ’02: Proceedings of the 2002 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 129–138, New York, NY, USA, 2002. ACM.
[22] T.W. Sederberg and S.R. Parry. Free-form deformation of solid geometric models. In Proceedings
of SIGGRAPH the conference on Computer graphics and interactive techniques, pages 151–160, New
York, NY, USA, 1986. ACM Press.
[23] J. P. Lewis, M. Cordner, and N. Fong. Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation. In Proceedings of SIGGRAPH the conference on Computer graphics and interactive techniques, pages 165–172, New York, NY, USA, 2000. ACM Press/Addison-Wesley Publishing Co.
[24] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis. SCAPE: shape completion
and animation of people. ACM Trans. Graph., 24(3):408–416, 2005.
[25] S.I. Park and J.K. Hodgins. Capturing and animating skin deformation in human motion. ACM Trans.
Graph., 25(3):881–889, 2006.
[26] I. Söderkvist and P.A. Wedin. Determining the movements of the skeleton using well-configured
markers. Journal of Biomechanics, 26(12):1473–1477, 1993.
[27] T. Andriacchi, E. Alexander, M. Toney, C. Dyrby, and J. Sum. A point cluster method for in vivo motion
analysis: Applied to a study of knee kinematics. Journal of Biomechanical Engineering, 120(12):743–
749, 1998.
[28] A. Cereatti, U. Della Croce, and A. Cappozzo. Reconstruction of skeletal movement using skin markers:
comparative assessment of bone pose estimators. Journal of NeuroEngineering and Rehabilitation, 3(7),
2006.
[29] A. Cappello, A. Cappozzo, P.F. La Palombara, L. Lucchetti, and A. Leardini. Multiple anatomical
landmark calibration for optimal bone pose estimation. Human Movement Science, 16(2-3):259–274,
1997.
[30] A. Cappozzo, F. Catani, U. Della Croce, and A. Leardini. Position and orientation of bones during
movement: anatomical frame definition and determination. Clinical Biomechanics, 10:171–178, 1995.
[31] J. Lin, Y. Wu, and T.S. Huang. Modeling the constraints of human hand motion. In Proc. of Workshop
on Human Motion, page 121, Washington, DC, USA, 2000. IEEE Computer Society.
[32] C.S. Chua, H.Y. Guan, and Y.K. Ho. Model-based finger posture estimation. In Proc. of Asian
Conference on Computer Vision, pages 43–48, January 2000.
[33] G. Jin and J.K. Hahn. Adding hand motion to the motion capture based character animation. In
International Symposium on Advances in Visual Computing, pages 17–24, 2005.
[34] M. Nakamura, C. Miyawaki, N. Matsushita, R. Yagi, and Y. Handa. Finger kinematic modeling and
real-time hand motion estimation. J Electromyography and Kinesiology, 8(5):295–303, 1998.
[35] C. Hager-Ross and M. H. Schieber. Quantifying the independence of human finger movements: comparisons of digits, hands, and movement frequencies. Journal of Neuroscience, 20(22):8542–8550,
2000.
[36] S.W. Lee and X. Zhang. Biodynamic modeling, system identification, and variability of multi-finger
movements. J Biomech., 40(14):3215–3222, 2007.
[37] C.E. Lang and M.H. Schieber. Human finger independence: limitations due to passive mechanical
coupling versus active neuromuscular control. Journal of Neurophysiology, 92:2802–2810, 2004.
[38] P.H. Thakur, A.J. Bastian, and S.S. Hsiao. Multidigit movement synergies of the human hand in an
unconstrained haptic exploration task. Journal of Neuroscience, 28(6):1271–1281, 2008.
[39] E. Holden. Visual Recognition of Hand Motion. PhD thesis, University of Western Australia, 1997.
[40] N.A. Baker, R. Cham, and E.H. Cidboy. Kinematics of the fingers and hands during computer keyboard
use. Clinical Biomechanics, 22(1):34–43, January 2007.
[41] DLR et al. Specification of benchmarks. Technical report, European research project DEXMART
(FP7-216239), 2009.
[42] D. Witonski. Dynamic magnetic resonance imaging. Clinics in Sports Medicine, 21(3):403–415.
[43] H.H. Quick, M.E. Ladd, and M. Hoevel. Real-time MRI of joint movement with TrueFISP. Magnetic
Resonance Imaging, 15:710–715, 2002.
[44] J. Brossmann, C. Muhle, and C. Schroder. Patellar tracking patterns during active and passive knee extension: evaluation with motion-triggered cine MR imaging. Radiology, 187:205–212, 1993.
[45] B. Gilles, R. Perrin, N. Magnenat-Thalmann, and J. Vallee. Bone motion analysis from dynamic MRI:
Acquisition and tracking. Academic Radiology, 12(10):1285–1292.
[46] C. Muhle. Kinematic CT and MR imaging of the patellofemoral joint. Eur Radiol, 9(3):508–518, 1999.
[47] J. Fuller, L. Liu, M.C. Murphy, and R.W. Mann. A comparison of lower-extremity skeletal kinematics
measured using skin- and pin-mounted markers. In 3-D Analysis of Human Movement, volume 16,
pages 219–242, 1997.
[48] I.K. Sahni, J.A. Hipp, B.C. Kirking, J.W. Alexander, and S.I. Esses. Use of percutaneous transpedicular
external fixation pins to measure intervertebral motion. Spine, 24(18):1890–1893, September 1999.
[49] D. Nunn, M.A. Freeman, P.F. Hill, and S.J. Evans. The measurement of migration of the acetabular
component of hip prostheses. Journal of Bone and Joint Surgery - British Volume, 71-B:629–631,
1989.
[50] M. Veber and T. Bajd. Assessment of human hand kinematics. In Proc. of International Conference
on Robotics and Automation, pages 2966–2971. IEEE, 2006.
[51] I.W. Charlton, P. Smyth, and L. Roren. Repeatability of an optimized lower body model. Gait and
Posture, 20:213–221, 2004.
[52] H.R. Lindman. Analysis of variance in complex experimental designs. SIAM Review, 18(1):134–137,
January 1976.
[53] H. Akaike. A new look at the statistical model identification. IEEE Trans. Autom. Control,
19(6):716–723, 1974.
[54] G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461–464, 1978.
[55] K. Bubna and C.V. Stewart. Model selection and surface merging in reconstruction algorithms. In
Proc. of International Conference on Computer Vision, pages 895–902, 1998.