Professional Documents
Culture Documents
C3 c2
C2 C3 c3
C2 c6
C3 c5
C9 C4 c4 c4
C6 C10 C6 c3
C7 C1 c5 c2
C5 C2
C4 C8 C4 c6 c1
C5
Figure 2. From left to right, the figure illustrates two-view motion estimation from a generalized camera of 3 different types (cases). Left:
the most general case, where all the image rays are entirely unconstrained; Middle: the locally-central case, where for every 3D scene
point the projection center is unique and fixed locally in camera frame; Right: the axial case, where all image rays must intersect a common
line called as axis. Note that in the latter two cases the relative positions of all the camera centers are fixed in the local frame. In this
paper we show that the ranks of generalized epipolar equations corresponding to the above three cases are respectively r = 17, r = 16
and r = 14. The standard SVD algorithm works only for the first case, while our new algorithm applies to all 3 cases.
In the following sections, we solve the problem effec- claimed value. This will mean that generically (that is, for
tively and linearly. The results we obtain are remarkably almost all input data) the rank will indeed reach this upper
accurate. bound. In this paper, we do not explicitly exhibit examples
where the rank attains the claimed value, but in all cases this
3. Analysis of Degeneracies has been verified by example.
The derivation of the generalized epipolar equation can The most general case. In the most general case (see
be easily obtained from the Plücker coordinate representa- fig-2 left), the camera is simply a set of unconstrained im-
tion of 3D lines (see [13]). An image ray passing through age rays in general position. For this case, the rank of the
a point v (e.g., camera center) with unit direction x can be obtained generalized epipolar equation will be 17. There-
represented by a Plücker 6-vector L = (x , (v × x) ) . fore a unique solution is readily solvable by the standard
Using this representation we then re-state the GEC as SVD method. We do not further consider this case in the
follows: paper.
xi Exi + xi R(vi × xi ) + (vi × xi ) R xi = 0 . (2) Locally central projection. Next, we consider a degen-
erate case of a “generalized camera” consisting of a set of
In the ground-truth solution to these equations, the ma- locally central projection rays (see fig-2 middle). The com-
trix E is of the form [t]× R, where R is the same matrix oc- monly used (non-overlapping) multi-camera rigs are exam-
curring in the second and third terms of this equation. There ples of this case. When the camera rig moves from an initial
is no simple way of enforcing this condition in solving the to a final position, points are tracked. We assume no point
equations, however, so in a linear approach we solve the from one component camera to another is used, so that all
equations ignoring this condition. Our initial goal is to ex- point correspondences are for points seen in the same cam-
amine the linear structure of the solution set to these equa- era. We assume further that each component camera is a
tions under various different camera geometries. central projection camera, so that all rays go through the
same point, i.e. the camera center. We will refer to this as
3.1. Degeneracies locally central projection.
We identify several degeneracies for the set of equations Since rays are represented in a coordinate system at-
arising from (2), which cause the set of equations to have tached to the camera rig, the correspondence is between
smaller than expected rank. Suppose we wish to show that points (xi , vi ) ↔ (xi , vi ) where vi is the camera center.
the rank of the system is r < 17. Assume that there are In particular, note that vi = vi . The equations (2) are now
at least r equations arising from point correspondences via
xi Exi + xi R(vi × xi ) + (vi × xi ) R xi = 0 . (3)
the equations (2). To show that the rank is r, we will exhibit
a linear family of solutions to the equations. If the linear Now let (E, R) be a one solution to this set of equa-
family of solutions has rank 18−r, then the equation system tions, with E = 0. It is easily seen that (0, I) is also
must have rank no greater than r. a solution. In fact, substituting (0, I) in (3) results in
The reader may object that this argument only places an xi (vi × xi ) + (vi × xi ) xi , which is zero because of
upper bound on the rank of the equation system. However, the antisymmetry of the triple-product.
to show that the system does not have smaller rank, it is suf- Generically, the rank is not less than 16, so a complete
ficient to exhibit a single example in which the rank has the solution to this set of equations is therefore of the form
(λE, λR + µI), a two-dimensional linear family. From this In summary, in the case of a locally central axial camera
formulation, an interesting property of the set of solutions the complete solution set is of the form
to (2) is found: the ambiguity is contained entirely in the
estimation of R, while the essential matrix E is still able to (αE, αR + βI + γ[w]× + δww ) .
be determined uniquely up to scale.
under the assumption that the coordinate origin lies on the
camera axis. Once more, the E part of the solution is de-
Axial cameras. Our second example of degenerate con- termined uniquely up to scale, even thought there is a 4-
figuration is what we will call an axial camera (see fig-2 dimensional family of solutions.
right). This is defined as a generalized camera in which all
the rays intersect in a single line, called the axis. There are
4. Algorithm
several examples of this which may be of practical interest.
Next, we give some algorithms based on (2) for retriev-
1. A pair of rigidly mounted central projection cameras ing the motion of a generalized camera. The algorithm ap-
(for instance, ordinary perspective cameras). plies to the situations we have considered involving locally
2. A set of central projection cameras with collinear cen- central and axial cameras where the equation set is rank-
ters. We call this a linear camera array. deficient, resulting in different dimensional families of so-
3. A set of non-central catadioptric or fisheye cameras lutions. Despite the degeneracy, we obtain a unique linear
mounted with collinear axes. solution.
The first two cases are also locally central projections,
provided that points are not tracked from one camera to oth- Linear algorithm. The condition (2) gives one equation
ers. for each point correspondence. Given sufficiently many
To analyze this configuration, we will assume that the point correspondences, we may solve for the entries
origin of the world coordinate system lies on the axis, and of matrices E and R linearly from the set of equations
examine the solution set of equations (2) for this case. A(vec(E) , vec(R) ) = 0. However, we have seen that
In this coordinate system, we may write vi = αi w and the standard SVD solution to this set of equations gives
vi = αi w, where w is the direction vector of the axis. multiple solutions. If one ignores the rank deficiency of the
Equation (2) then takes the form equations, totally spurious solutions may be found.
xi Exi + αi (w × xi ) R xi + αi xi R(w × xi ) = 0 . (4) It was observed that for locally central projections, one
trivial solution to the linear system is E = 0 and R = I.
Suppose that (E, R) is the true solution to these equa- In practice, this solution is often found using the standard
tions. Another solution is given by (0, ww ), It satisfies SVD algorithm. The corresponding motion are R = I and
the equation (4) because (w × xi ) w + w (w × xi ) = 0. t = 0, since E = [t]× R. This means that the camera rig
Generically, the equation system has rank 16, so the general neither rotates, nor translates — a null motion. However,
solution to (4) for an axial camera is (λE, λR + µww ). for a moving camera this solution is not compatible with
Note the most important fact that the E part of the solu- the observed data. This shows a very curious property of
tion is constant, and the ambiguity only involves the R part the algebraic solution, that the equation set may be satisfied
of the solution. Thus, we may retrieve the matrix E with- exactly with zero error even though the solution found is
out ambiguity from the degenerate system of equations. It totally wrong geometrically.
is important to note that this fact depends on the choice of Various possibilities for finding a single solution from
coordinate system such that the origin lies on the axis. With- among a family of solutions may be proposed, enforcing
out this condition, there is still a two-dimensional family of necessary conditions on the essential matrix E and the ro-
solutions, but the solution for the matrix E is not invariant. tation R. Such methods will be non-linear, and awkward to
Locally-central-and-axial cameras. If in addition we implement. In addition, observe that the solution E = 0,
assume that the projections are locally central, then further R = I with t = 0 satisfies all compatibility conditions be-
degeneracies occur. We have seen already that for locally tween a rotation R and E = [t]× R, and yet is wrong geomet-
central projections, (0, R) is also a solution. However, in the rically. We prefer a linear solution avoiding this problem,
case of an axial camera array, a further degeneracy occurs. which will be described next.
The condition of local centrality means that αi = αi in (4).
We may now identify a further solution (0, [w]× ), since The key idea. To avoid the problem of multiple solutions
we observe the crucial fact that although there exists a fam-
(w × xi ) [w]× xi + xi [w]× (w × xi ) ily of solutions (of dimension 2 to 4 depending on the case),
= (w × xi ) (w × xi ) + (xi × w) (w × xi ) = 0 . all the ambiguity lies in the determination of the R part of
the solution. The E part of the solution is unchanged by the 2. From the standard essential matrix the translation t
ambiguity. This suggests using the set of equations to solve may be computed only up to scale. For a generalized
only for E, and forget about trying to solve for the R part, camera, however, the scale of t may be computed un-
which provides redundant information anyway. ambiguously. There is only one translation t that is
Thus, given a set of equations compatible with the correctly scaled rotation matrix R.
A (vec(E) , vec(R) ) subject to E = 1 , xi [t]× (R xi ) + xi R(vi × xi ) + (vi × xi ) R xi = 0 .
instead of (vec(E) , vec(R) ) = 1 as in the standard The only unknown in this set of equations is the transla-
SVD algorithm. This seemingly small change to the algo- tion vector t which may then be computed using linear-least
rithm avoids all the difficulties associated with the standard squares. The computed t will not have scale ambiguity.
SVD algorithm.
Solving a problem of this form is discussed in [6] (sec- Algorithm description. We now summarize the linear
tion 9.6, page 257) in a more general form. Here we sum- algorithm for solving GEC in the case of locally central or
marize the method. Write the equations as AE vec(E) + axial cameras.
AR vec(R) = 0, where AE and AR are submatrices of A con-
' $
sisting of the first and last 9 columns. Finding the solution 1. Given: a set of correspondences (xi , vi ) ↔
that satisfies vec(E) = 1 is equivalent to solving (xi , vi ). For the case of locally central projection,
vi = vi for all i. For the case of an axial camera,
(AR A+
R − I)AE vec(E) = 0 all vi and vi lie on a single line.
where A+R is the pseudo-inverse of AR . This equation is 2. Normalization: Center the cameras by the trans-
then solved using the standard SVD method, and it gives formations vi ← vi − v̄, vi ← vi − v̄ for some v̄,
a unique solution for E. normally the centroid of the different camera cen-
ters. For axial cameras, it is essential that v̄ be a
Handling axial cameras. The method for solving for point on the axis of the cameras. Next, scale so that
axial cameras, and particularly for the case of two cameras the cameras are approximately unit distance from
(i.e, stereo) is just the same, except that we must take care the origin.
to write the GEC equations in terms of a world coordinate 3. Form the set of linear equations AE vec(E) +
system where the origin lies on the axis. This is an essential AR vec(R) = 0 using (2).
(non-optional) step to allow us to compute the matrix E cor-
rectly. In the case of two cameras, it makes sense that the 4. Compute the pseudo-inverse A+ R of AR and write
origin should be the mid point between the two cameras. In the equation for vec(E) as B vec(E) = 0, where
addition, we scale the rays such that each of the two cameras B = (AR A+R − I)AE . Solve this equation using the
lies at unit distance from the origin, in opposite directions. standard SVD algorithm to find E.
5. Decompose E to get the twisted pair of rotation ma-
Extracting the rotation and translation. The E part trices R and R .
once found may be decomposed as E = [t]× R to obtain
both the rotation and translation up to scale. This problem is 6. Knowing possible rotations, solve equations (2)
a little different from the corresponding method of decom- to compute t linearly. The equations are non-
posing the essential matrix E for a standard pair of pinhole homogeneous in t, so t is computed with the cor-
images. There are two differences. rect scale. Keep either R or R and the correspond-
ing t, whichever gives the best residual.
& %
1. The decomposition of E = [t]× R gives two possi-
ble values for the rotation, differing by the so-called 5. Alternation
twisted pair ambiguity. This ambiguity may be re-
solved by cheirality considerations. However, in the It was indicated that once R is known, t may be computed
GCM case, only one of the two possibilities is com- linearly. Similarly, if t is known, then R may be computed
patible with the GEC. linearly from the same equations. We solve linearly for R
subject to the condition R2 = 3 so as to approximate a For both real and simulated experiments, the algorithm’s
rotation matrix. performance is characterized by the following criteria.
This suggests an alternating approach in which one
E−Ê
solves alternately for R and t. Since the cost decreases at • Error in E : E = E ;
each step, this alternation will converge to a local minimum • Error in R : R = R − R̂;
of the algebraic cost function
Tr(RR̂ )−1
• Angle difference in R: δθ = cos−1 ;2
xi [t]× R xi +xi R(vi ×xi )+(vi ×xi ) R xi 2 . (5)
i t ·t̂
• Direction difference in t: δt = cos−1 t t̂
;
The matrix R so found may not be orthogonal, but it may be
corrected at the end. Note that the cost (5) being minimized The symbols used above are defined as following. We
decreases at each step of the iteration. denote E, R, t as the ground-truth information used in sim-
ulations, and Ê, R̂, t̂ the corresponding estimated versions.
Unfortunately, the alternation algorithm just given has a Note that all these data (matrix or vector) are defined with
problem that manifests itself occasionally. Namely, it re- absolute scales (rather than defined up to a scale). Further
turns to the spurious minimum R = I and t = 0, i.e., the denote the R̂ and t̂ as the final results obtained after the al-
null motion, even if it may not be geometrically meaning- ternation, while Ê is computed by Ê = [t̂]× R̂. All the norms
ful. Note that we avoided this spurious solution in the linear are Frobenius norms.
algorithm by enforcing a constraint that E = 1. However,
since we are computing the value of t exactly (without scale 6.1. Tests on simulated multi-camera rigs
ambiguity) in this alternation method, we can not enforce
this constraint.
The way to solve this is by modifying the equations (2) as
will be explained now. We rewrite the equations as follows.
Let t̂ = βt be a unit vector in the direction of t, with β
being chosen accordingly. Then multiplying each term of
(5) by β results in a cost function Figure 3. We simulate three different configurations of multi cam-
xi [t̂]× R x + β xi R(v × x ) + (vi × xi ) R x 2 era rigs. Left: a general non-axial camera rig; middle: an axial
i i i i
camera rig; right: a non-overlap stereo head. The standard SVD
i
(6) algorithm is not applicable to any of these cases.
in which t̂ is a unit vector, and β is an additional unknown.
We wish to minimize this cost over all values of β, t̂ and R
subject to the conditions t̂ = 1, and R = 3. Given t̂ We simulate three cases of multi-camera rigs (see fig-
3). The first two cases—one is non-axial and one is axial—
and β it is easy to solve for R linearly as described before.
Similarly, if R is known, the problem is a linear least-squares each consists of five pinhole cameras, and the last case is a
problem in t̂ and β, that we may solve subject to t̂ = 1 non-overlapping stereo (binocular) system . The synthetic
image size is about 1000 × 1000 pixels. We do not use
in the same way as we solved for E above.
Note that the trick of multiplying the cost by β, the re- any cross camera matches, therefore they both satisfy the
ciprocal of the magnitude of t prevents t from converging locally-central projection model.
to zero, since this will result in increasing cost. We build up GEC equations for each of these three cases.
By a sequence of such alternating steps, the algebraic When there is no noise, the observed rank of each of the
equations is 16, 14 and 14. This has confirmed our theoret-
cost function associated with (6) is minimized subject to
t̂ = 1 and R = 3, the cost diminishing at every step. ical prediction.
In this way, we can not fall into the same spurious mini- We add Gaussian noise of 0.05 degrees in std to the di-
rection part of the Plücker vectors. This is a reasonable
mum of the cost function as before. If further accuracy is
required, the alternation result may be used to initialize a level of noise, as in the image plane it roughly corresponds
bundle-adjustment algorithm. to one pixel std error in our experiments.
We test our linear algorithm+alternation algorithm. Ex-
6. Experiments periments confirm that the alternation procedure always de-
creases the residual and thus improves the estimation accu-
To demonstrate the proposed algorithm works well in racy. An average convergence curve of 50 tests is illustrated
practice, we conduct extensive experiments on both syn- in fig-4.
thetic data and real image data using multi-camera rig con- Figure-5 gives the histograms of estimation accuracy for
figuration. the first two cases. Figure-6 shows the estimation accuracy
Residual convergence curve (noise = 0.05 ,100 points, avearge of 50 runs) Error v.s. noise level, for non−axial multi−camera rig Error v.s. noise level, for axial multi−camera rig
30 0.045 0.05
Rotation error Rotation error
Translation error Translation error
0.04
Scale error Scale error
0.04
25 0.035
0.03
0.03
20 0.025
0.02 0.02
residuals
15 0.015
0.01
0.01
10 0.005
0
5 −0.005 −0.01
0.02 0.04 0.06 0.08 0.1 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1
Standard deviations of noise (in degrees) Standard deviations of noise (in degrees)
0
0 10 20 30 40 50 Error v.s. noise level (when the noise level is relatively higher) Error v.s. noise level (when the noise level is relatively higher)
0.7 0.7
iterations Rotation error Rotation error
Translation error Translation error
0.6 Scale error 0.6 Scale error
dure, i.e., residual error v.s. number of iterations. The curve was 0.4 0.4
generated by averaging 50 tests with 0.05 degrees std noise. 0.3 0.3
0.2 0.2
0.1 0.1
0 0
Error in rotation Error in translation direction Estimated scale 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Standard deviations of noise (in degrees) Standard deviations of noise (in degrees)
50 50 200
40 40
150 Figure 6. This figure shows estimation accuracy (in rotation, trans-
30 30
100
lation, scale) as a function of noise level. The error in scale esti-
20 20 mate is defined as |1 − t
t̂
|. Top row left: results for simulated
50
10 10
non-axial camera rigs; Top row right: results for simulated axial
0
0
in degrees
0.05
0
0 0.05
in degrees
0.1
0
0.95 1
scales
1.05 camera rigs. Bottom row shows the same results as in the top row,
but with much higher levels of noise.
Error in rotation Error in translation direction Estimated scale
50 50 200
40 40
150
Monocular
15
0 0 0
0 0.02 0.04 0.06 0 0.05 0.1 0.95 1 1.05 100
in degrees in degrees scales 10
15
20
10
10
5
−0.02
ear approach. Using this method for initializing a nonlinear
0.02
0
0.02 −0.02
0 algorithm (e.g., bundle adjustment algorithm based on geo-
metric error metric) would no doubt give even better results.
Figure 8. Left: our experiment setup. A ladybug camera is moving
along a plotted planar curve; Right: spatial configurations of its Acknowledgement. This research was partly supported by
NICTA, a research centre funded by the Australian Government as rep-
six camera units. resented by the Department of Broadband, Communications and the Dig-
ital Economy. The authors wish to thank anonymous reviewers for their
invaluable suggestions.
References
[1] M. Byröd, K. Josephson, and K. Åström. Improving numerical ac-
curacy of grbner basis polynomial equation solver. In ICCV, 2007.
[2] B. Clipp, J. Kim, J. Frahm, M. Pollefeys, and R. Hartley. Robust
6dof motion estimation for non-overlapping, multi-camera systems.
Figure 9. Some sample images used in the real date experiment. Technical Report TR07-006, UNC, ftp://ftp.cs.unc.edu/pub, 2007.
[3] D. Feldman, T. Pajdla, and D. Weinshall. On the epipolar geometry
of the crossed-slits projection. In ICCV, page 988, Washington, DC,
USA, 2003. IEEE Computer Society.
[4] J.-M. Frahm, K. Köser, and R. Koch. Pose estimation for Multi-
Camera Systems. In DAGM, 2004.
[5] M. D. Grossberg and S. K. Nayar. A general imaging model and a
method for finding its parameters. In ICCV, pages 108–115, 2001.
[6] R. Hartley and A. Zisserman. Multiple View Geometry in Computer
Vision. Cambridge University Press, 2003.
[7] M. Lhuillier. Effective and generic structure from motion using an-
Figure 10. Measured ground-truth trajectories vs. recovered tra- gular error. In ICPR, pages 67–70. IEEE Computer Society, 2006.
jectories. From left to right we test three different type of curves, [8] H. Li. A practical algorithm for l-infinity triangulation with outliers.
an ∞-shaped curve, a spiral and a circle. Each sequence has 101 In CVPR, 2007.
frames. [9] H. Li and R. Hartley. Five-point motion estimation made easy. In
ICPR ’06, pages 630–633, 2006.
[10] R. Molana and C. Geyer. Motion estimation with essential and gen-
eralized essential matrice. In Imaging Beyond the Pinhole Camera
(Daniilidis and Klette ed.). Springer-Verlag, 2007.
the incremental motion between them. Even with errors [11] E. Mouragnon, M. Lhuillier, M. Dhome, F. Dekeyser, and P. Sayd.
accumulate over 100 frames, the recovered trajectories are Generic and real time structure from motion. In Proc. BMVC, 2007.
remarkably good, not show obvious drift. This has vali- [12] J. Neumann, C. Fermüller, and Y. Aloimonos. Polydioptric cameras:
dated the effectiveness of our algorithm. During the tests, New eyes for structure from motion. In DAGM, pages 618–625, Lon-
don, UK, 2002. Springer-Verlag.
we manually checked the feature matches to rule out mis- [13] R. Pless. Using many cameras as one. In CVPR03, pages II: 587–
matches. Otherwise, the recovered trajectory would be 593, 2003.
skewed, indicating that the algorithm is sensitive to outliers. [14] G. Schweighofer and A. Pinz. Fast and globally convergent structure
To address the outlier issue, we are currently investigating a and motion estimation for general camera models. In Proc. BMVC,
volume 1, pages 206–212, 2006.
new algorithm based on Generalized Linear Programming [15] O. Shakernia, R. Vidal, and S. Sastry. Infinitesimal motion estimation
([8]). from multiple central panoramic views. In Proc. MOTION’02. IEEE
Computer Society, 2002.
[16] H. Stewénius, D. Nistér, M. Oskarsson, and K. Åström. Solutions to
7. Summary minimal generalized relative pose problems. In ICCV Workshop on
Omnidirectional Vision, Beijing China, Oct. 2005.
Although the standard 17-point algorithm does not work [17] P. Sturm. Multi-view geometry for general camera models. In CVPR,
for many of the common Generalized Camera config- volume 1, pages 206–212, jun 2005.
urations (such as axial camera arrays, non-overlapping [18] P. Sturm, S. Ramalingam, and S. Lodha. On calibration, structure
from motion and multi-view geometry for generic camera models.
multiple-camera rigs, non-central catadioptric cameras or
In Imaging Beyond the Pinhole Camera (Daniilidis and Klette ed.).
non-overlapping binocular stereo), a new linear algorithm Springer-Verlag New York, Inc., 2006.
(based on as few as 14 or 16 points) is proposed which al- [19] S. Tariq and F. Dellaert. A Multi-Camera 6-DOF Pose Tracker. In
lows us to solve linearly for the generalized essential matrix IEEE and ACM International Symposium on Mixed and Augmented
Reality (ISMAR), 2004.
and the 6-dof motion of the generalized camera.