Geometry
Subhashis Banerjee
Dept. Computer Science and Engineering
IIT Delhi
email: suban@cse.iitd.ac.in
May 29, 2001
Camera Models
1.1
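\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix}
T_{11} & T_{12} & T_{13} & T_{14} \\
T_{21} & T_{22} & T_{23} & T_{24} \\
T_{31} & T_{32} & T_{33} & T_{34}
\end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix}
\]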
1.2
A special case of the projective camera is the perspective (or central) projection,
reducing to the familiar pin-hole camera when the leftmost 3 × 3 sub-matrix of T
is a rotation matrix with its third row scaled by the inverse focal length 1/f . The
simplest form is:
\[
T_p =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1/f & 0
\end{pmatrix}
\]
which gives the familiar equations
\[
\begin{pmatrix} x \\ y \end{pmatrix}
= \frac{f}{Z} \begin{pmatrix} X \\ Y \end{pmatrix}
\]
Each point is scaled by its individual depth, and all projection rays converge to the
optic center.
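The perspective equations above can be sketched in a few lines of Python. This is only an illustration: the function name `project_perspective` and the sample points are assumptions made for the example, not part of the notes.

```python
def project_perspective(point, f):
    """Pin-hole projection (x, y) = (f / Z) * (X, Y) of a 3D point (X, Y, Z)."""
    X, Y, Z = point
    if Z == 0:
        raise ValueError("point lies in the plane Z = 0 through the optic center")
    scale = f / Z  # every point is scaled by its own depth
    return (scale * X, scale * Y)


if __name__ == "__main__":
    # Two points on the same ray through the optic center share one image point.
    print(project_perspective((2.0, 4.0, 10.0), f=5.0))
    print(project_perspective((4.0, 8.0, 20.0), f=5.0))
```

Both sample points lie on one projection ray, so they map to the same image location, which is the convergence of rays at the optic center described above.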
1.3
The affine camera is a special case of the projective camera and is obtained by constraining the matrix T such that T31 = T32 = T33 = 0, thereby reducing the degrees
of freedom from 11 to 8:
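\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix}
T_{11} & T_{12} & T_{13} & T_{14} \\
T_{21} & T_{22} & T_{23} & T_{24} \\
0 & 0 & 0 & T_{34}
\end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix}
\]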
In terms of image and scene coordinates, the mapping takes the form
x = MX + t
where M is a general 2 × 3 matrix with elements Mij = Tij / T34, while t is a general
2-vector representing the image center.
The affine camera preserves parallelism.
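The parallelism claim follows because x = MX + t maps a scene direction d to the image direction Md, whatever the position of the line. A minimal Python check, assuming an arbitrary made-up camera (M, t) and hypothetical helper names:

```python
def affine_project(M, t, X):
    """Affine camera x = M X + t, with M given as two rows of three numbers."""
    return tuple(sum(m * c for m, c in zip(row, X)) + ti for row, ti in zip(M, t))


def image_direction(M, t, P, Q):
    """Image-plane direction of the projected segment from scene point P to Q."""
    p, q = affine_project(M, t, P), affine_project(M, t, Q)
    return (q[0] - p[0], q[1] - p[1])


def cross2(u, v):
    """2D cross product; it vanishes exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


if __name__ == "__main__":
    M = [[0.9, 0.2, -0.4], [-0.1, 1.1, 0.3]]  # arbitrary illustrative camera
    t = (5.0, -2.0)
    # Two scene segments with parallel directions (1, 2, 3) and (2, 4, 6).
    u = image_direction(M, t, (0.0, 0.0, 0.0), (1.0, 2.0, 3.0))
    v = image_direction(M, t, (4.0, -1.0, 7.0), (6.0, 3.0, 13.0))
    print(u, v, cross2(u, v))
```

The same test applied to a perspective camera would generally give a non-zero cross product, since parallel scene lines then meet at a vanishing point.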
1.4
The affine camera becomes a weak-perspective camera when the rows of M form a
uniformly scaled rotation matrix. The simplest form is
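\[
T_{wp} =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & Z_{ave}/f
\end{pmatrix}
\]
yielding,
\[
M_{wp} = \frac{f}{Z_{ave}}
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0
\end{pmatrix}
\]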
and
\[
\begin{pmatrix} x \\ y \end{pmatrix}
= \frac{f}{Z_{ave}} \begin{pmatrix} X \\ Y \end{pmatrix}
\]
This is simply the perspective equation with individual point depths Zi replaced by
an average constant depth Zave.
The weak-perspective model is valid when the average variation of the depth of the
object (ΔZ) along the line of sight is small compared to Zave and the field of view
is small. We see this as follows.
Expanding the perspective projection equation using a Taylor series, we obtain
\[
x = \frac{f}{Z_{ave} + \Delta Z} \begin{pmatrix} X \\ Y \end{pmatrix}
  = \frac{f}{Z_{ave}}
    \left( 1 - \frac{\Delta Z}{Z_{ave}} + \left( \frac{\Delta Z}{Z_{ave}} \right)^2 - \dots \right)
    \begin{pmatrix} X \\ Y \end{pmatrix}
\]
When |ΔZ| << Zave only the zero-order term remains, giving the weak-perspective
projection. The error in image position is then xerr = xp - xwp:
\[
x_{err} = -\frac{f}{Z_{ave}} \, \frac{\Delta Z}{Z_{ave} + \Delta Z}
          \begin{pmatrix} X \\ Y \end{pmatrix}
\]
showing that a small focal length (f), a small field of view (X/Zave and Y/Zave) and
a small depth variation (ΔZ) contribute to the validity of the model.
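The error expression can be checked numerically. The sketch below compares the directly measured difference between the perspective and weak-perspective projections with the closed form -(f/Zave)(ΔZ/(Zave + ΔZ))(X, Y). All function names and sample values are assumptions chosen for illustration.

```python
def perspective(X, Y, Z, f):
    """Perspective projection using the point's own depth Z."""
    return (f * X / Z, f * Y / Z)


def weak_perspective(X, Y, z_ave, f):
    """Weak-perspective projection using the average depth z_ave."""
    return (f * X / z_ave, f * Y / z_ave)


def measured_error(X, Y, dz, z_ave, f):
    """x_err = x_p - x_wp for a point at depth z_ave + dz."""
    xp = perspective(X, Y, z_ave + dz, f)
    xwp = weak_perspective(X, Y, z_ave, f)
    return (xp[0] - xwp[0], xp[1] - xwp[1])


def predicted_error(X, Y, dz, z_ave, f):
    """Closed-form error -(f / z_ave) * (dz / (z_ave + dz)) * (X, Y)."""
    k = -(f / z_ave) * (dz / (z_ave + dz))
    return (k * X, k * Y)


if __name__ == "__main__":
    for dz in (1.0, 10.0, 50.0):  # growing depth variation, fixed z_ave = 100
        print(dz, measured_error(10.0, 5.0, dz, 100.0, 1.0),
              predicted_error(10.0, 5.0, dz, 100.0, 1.0))
```

The two columns agree to rounding error, and the error grows with ΔZ as the formula predicts.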
1.5
The affine camera reduces to the case of orthographic (parallel) projection when M
represents the first two rows of a rotation matrix. The simplest form is
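\[
T_{orth} =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]
yielding,
\[
M_{orth} =
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0
\end{pmatrix}
\]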
and
\[
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} X \\ Y \end{pmatrix}
\]
2.1
When the perspective effects are small, the problem of locating perspective epipolar
lines becomes ill-conditioned. In such cases it is convenient to assume the parallel
projection model of the affine camera which explicitly models the ambiguities.
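The affine epipolar constraint can be described in terms of the affine fundamental
matrix Q as p'^T Q p = 0, i.e.,
\[
\begin{pmatrix} x'_i & y'_i & 1 \end{pmatrix}
\begin{pmatrix}
0 & 0 & a \\
0 & 0 & b \\
c & d & e
\end{pmatrix}
\begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix}
= 0
\]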
Figure 1: 1D image formation with image plane at Z = f. Xp, Xwp and Xorth are the
perspective, weak-perspective and orthographic projections respectively.
(Figure labels: scene point, average depth plane at Zave, image plane, optic center.)
where p' = (x', y', 1)^T and p = (x, y, 1)^T are homogeneous 3-vectors representing corresponding image points in two views.
(See Shapiro, Zisserman and Brady).
To derive the above, we write M as (B | b) where B is a general (non-singular)
2 × 2 matrix and b is a 2-vector. The projection equation then gives
\[
x_i = B \begin{pmatrix} X_i \\ Y_i \end{pmatrix} + Z_i b + t
\]
\[
x'_i = B' \begin{pmatrix} X_i \\ Y_i \end{pmatrix} + Z_i b' + M' D + t'
\]
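\[
\begin{pmatrix} x'_i & y'_i & 1 \end{pmatrix}
\begin{pmatrix}
0 & 0 & a \\
0 & 0 & b \\
c & d & e
\end{pmatrix}
\begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix}
= 0
\]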
2.1.1
Given correspondences in two views the affine fundamental matrix can be computed
using orthogonal regression by minimizing
\[
\frac{1}{|n|^2} \sum_{i=0}^{n-1} (r_i \cdot n + e)^2
\]
Here r_i = (x'_i, y'_i, x_i, y_i)^T and n = (a, b, c, d)^T. The minimization finds a hyper-plane
that globally minimizes the sum of the squared perpendicular distances between r_i
and the hyper-plane.
Defining
\[
v_i = r_i - \bar{r}
\qquad \text{and} \qquad
W = \sum_{i=0}^{n-1} v_i v_i^T
\]
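As background for the definitions above: a standard result for orthogonal regression is that the minimizing n is the unit eigenvector of W with the smallest eigenvalue, with e = -n·r̄, where r̄ is the centroid of the r_i. The Python sketch below applies this to noise-free synthetic correspondences. The helper names, the small Jacobi eigen-solver and the two example cameras are all assumptions made for illustration, not the notes' own implementation.

```python
import math


def jacobi_eigh(sym, sweeps=30):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V); column k of V is the eigenvector of eigenvalues[k].
    """
    n = len(sym)
    a = [row[:] for row in sym]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle with |theta| <= pi/4 that zeroes a[p][q].
                num, den = 2.0 * a[p][q], a[q][q] - a[p][p]
                if den < 0.0:
                    num, den = -num, -den
                theta = 0.5 * math.atan2(num, den)
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):  # rotate columns p, q of A and of V
                    for k in range(n):
                        mkp, mkq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):  # rotate rows p, q of A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


def affine_fundamental(matches):
    """Fit (a, b, c, d, e) to matches given as tuples r_i = (x', y', x, y)."""
    count = len(matches)
    mean = [sum(r[k] for r in matches) / count for k in range(4)]
    W = [[0.0] * 4 for _ in range(4)]
    for r in matches:
        vi = [r[k] - mean[k] for k in range(4)]  # v_i = r_i - centroid
        for i in range(4):
            for j in range(4):
                W[i][j] += vi[i] * vi[j]
    values, vectors = jacobi_eigh(W)
    k = min(range(4), key=lambda i: values[i])  # smallest eigenvalue
    n = [vectors[i][k] for i in range(4)]
    e = -sum(ni * mi for ni, mi in zip(n, mean))
    return (n[0], n[1], n[2], n[3], e)


def synthetic_matches():
    """Noise-free matches from two hypothetical affine cameras (illustration only)."""
    points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, -1, 3)]
    matches = []
    for X, Y, Z in points:
        x, y = float(X), float(Y)                # first view
        xp = 0.8 * X + 0.1 * Y + 0.5 * Z + 1.0   # second view
        yp = -0.2 * X + 0.9 * Y + 0.3 * Z + 2.0
        matches.append((xp, yp, x, y))
    return matches


if __name__ == "__main__":
    a, b, c, d, e = affine_fundamental(synthetic_matches())
    for xp, yp, x, y in synthetic_matches():
        print(a * xp + b * yp + c * x + d * y + e)  # epipolar residuals
```

With exact affine data the residuals vanish to rounding error. With noisy correspondences the same code returns the least-squares hyper-plane, and the size of the smallest eigenvalue indicates how well the affine model fits.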
2.2
Affine Structure
and
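\[
\Delta\mathbf{X}' = \mathbf{X}' - \mathbf{X}'_0 = A \, \Delta\mathbf{X}
\]
Affine projections
If the affine camera models for the two views are given by the parameters {M, t}
and {M', t'} respectively, then
\[
\Delta\mathbf{x} = M \, \Delta\mathbf{X}
\]
and
\[
\Delta\mathbf{x}' = M' \, \Delta\mathbf{X}' = M' A \, \Delta\mathbf{X}
\]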
for
i = 1, . . . , n
(2)
and
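These correspondences establish the bases {e1, e2, e3} and {e'1, e'2, e'3} provided
no two axes, in either image, are collinear. Each additional point gives four
equations in 3 unknowns
\[
\begin{pmatrix} \Delta\mathbf{x}_i \\ \Delta\mathbf{x}'_i \end{pmatrix}
=
\begin{pmatrix} e_1 & e_2 & e_3 \\ e'_1 & e'_2 & e'_3 \end{pmatrix}
\begin{pmatrix} \alpha_i \\ \beta_i \\ \gamma_i \end{pmatrix}
\]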
and the affine structure can be computed. The redundancy in the system enables
us to verify whether the affine projection model is valid.
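The four-equations-in-three-unknowns system can be solved by least squares, and the size of the residual then serves as the validity check mentioned above. The sketch below uses the normal equations with a 3 × 3 Cramer solve. The helper names, the two example cameras and the choice of a least-squares solver are assumptions for illustration only.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve the 3x3 system m z = b by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate basis: system is (near) singular")
    out = []
    for k in range(3):
        mk = [row[:] for row in m]
        for r in range(3):
            mk[r][k] = b[r]
        out.append(det3(mk) / d)
    return out


def sub(p, q):
    return [pi - qi for pi, qi in zip(p, q)]


def project(M, t, X):
    """Affine camera x = M X + t."""
    return [sum(m * c for m, c in zip(row, X)) + ti for row, ti in zip(M, t)]


def affine_coordinates(ref1, ref2, p1, p2):
    """Affine coordinates (alpha, beta, gamma) of a point seen in two views.

    ref1, ref2: images of the reference point and three basis points in
    view 1 and view 2. p1, p2: images of the additional point.
    Returns the coordinates and the largest residual of the four equations.
    """
    # Each basis column stacks e_k (view 1) on top of e'_k (view 2).
    cols = [sub(ref1[k], ref1[0]) + sub(ref2[k], ref2[0]) for k in (1, 2, 3)]
    rhs = sub(p1, ref1[0]) + sub(p2, ref2[0])
    ata = [[sum(cols[i][r] * cols[j][r] for r in range(4)) for j in range(3)]
           for i in range(3)]
    atb = [sum(cols[i][r] * rhs[r] for r in range(4)) for i in range(3)]
    coeffs = solve3(ata, atb)
    fit = [sum(coeffs[k] * cols[k][r] for k in range(3)) for r in range(4)]
    residual = max(abs(f - b) for f, b in zip(fit, rhs))
    return coeffs, residual


if __name__ == "__main__":
    M1, t1 = [[1.0, 0.2, 0.3], [-0.1, 0.9, 0.4]], [0.0, 0.0]   # made-up cameras
    M2, t2 = [[0.8, 0.1, 0.5], [-0.2, 0.9, 0.3]], [1.0, 2.0]
    scene = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    ref1 = [project(M1, t1, X) for X in scene]
    ref2 = [project(M2, t2, X) for X in scene]
    P = (2.0, -1.0, 3.0)
    print(affine_coordinates(ref1, ref2, project(M1, t1, P), project(M2, t2, P)))
```

For exact affine data the residual is zero to rounding error; a large residual suggests that the affine projection model (or the correspondence) is not valid.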
2.2.1
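\[
\begin{pmatrix}
\Delta\mathbf{x}_1 & \Delta\mathbf{x}_2 & \dots & \Delta\mathbf{x}_{n-1} \\
\Delta\mathbf{x}'_1 & \Delta\mathbf{x}'_2 & \dots & \Delta\mathbf{x}'_{n-1} \\
\Delta\mathbf{x}''_1 & \Delta\mathbf{x}''_2 & \dots & \Delta\mathbf{x}''_{n-1} \\
\vdots & \vdots & & \vdots
\end{pmatrix}
=
\begin{pmatrix}
e_1 & e_2 & e_3 \\
e'_1 & e'_2 & e'_3 \\
\vdots & \vdots & \vdots
\end{pmatrix}
\begin{pmatrix}
\alpha_1 & \alpha_2 & \dots & \alpha_{n-1} \\
\beta_1 & \beta_2 & \dots & \beta_{n-1} \\
\gamma_1 & \gamma_2 & \dots & \gamma_{n-1}
\end{pmatrix}
\]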
the invariant affine structure of the n points in motion, and the i-th row of M, M(i),
along with the corresponding image center x0(i), gives the projection parameters for
Once the affine structure has been computed, it can be used to generate a new view
of the object (transfer) by simply selecting a new spanning set {e''1, e''2, e''3}. No
camera calibration is needed. Note that this is the same as choosing a new projection
matrix M''.
\[
\mathbf{x}''_i = \mathbf{x}''_0 + \alpha_i e''_1 + \beta_i e''_2 + \gamma_i e''_3
\]
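If the affine structure is not of interest (as in graphics), it is possible to bypass the affine
coordinates and express the new image coordinates Δx'' directly in terms of the first
two sets of image coordinates Δx and Δx' (Δ denotes coordinates measured relative to
the reference point). One can write the projection equations in the first two views as
\[
\Delta\mathbf{x} = G \, \Delta\mathbf{X}, \qquad \Delta\mathbf{x}' = G' \, \Delta\mathbf{X}
\]
where G and G' are 2 × 3 matrices with rows {G1, G2} and {G'1, G'2} respectively.
The new view can be similarly written as
\[
\Delta\mathbf{x}'' = G'' \, \Delta\mathbf{X}
\]
where G'' has rows {G''1, G''2}.
Now, any three rows of {G1, G2, G'1, G'2} define a linearly independent spanning
set for A^3, say {G1, G2, G'1}. So, there exist scalars a_i and b_i such that
\[
G'' =
\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} G
+
\begin{pmatrix} a_3 & 0 \\ b_3 & 0 \end{pmatrix} G'
\]
and hence
\[
\Delta\mathbf{x}'' =
\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \Delta\mathbf{x}
+
\begin{pmatrix} a_3 & 0 \\ b_3 & 0 \end{pmatrix} \Delta\mathbf{x}'
=
\begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta y \\ \Delta x' \end{pmatrix}
\]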
Thus, if images of an object are obtained using affine cameras, then a novel view can
be expressed as a linear combination of views (this is useful for object recognition).
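The linear-combination result can be illustrated numerically. The sketch below takes three made-up 2 × 3 projection matrices G, G', G'' as known, solves for the scalars a_i and b_i, and checks that the third view is reproduced from the first two. The function names and matrices are assumptions for the example; in practice the coefficients would typically be estimated from point correspondences instead.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve the 3x3 system m z = b by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("rows G1, G2, G'1 are not linearly independent")
    out = []
    for k in range(3):
        mk = [row[:] for row in m]
        for r in range(3):
            mk[r][k] = b[r]
        out.append(det3(mk) / d)
    return out


def apply(G, X):
    """Centred affine projection dx = G dX for a 2x3 matrix G."""
    return [sum(g * c for g, c in zip(row, X)) for row in G]


def combination_coefficients(G, Gp, Gpp):
    """Return [[a1, a2, a3], [b1, b2, b3]] expressing the rows of G''
    in the spanning set {G1, G2, G'1}."""
    basis = [G[0], G[1], Gp[0]]
    # Column c of the system matrix is basis row c.
    m = [[basis[c][r] for c in range(3)] for r in range(3)]
    return [solve3(m, row) for row in Gpp]


def transfer(coeffs, dx, dxp):
    """New-view coordinates from (dx, dy) in view 1 and dx' in view 2."""
    feats = (dx[0], dx[1], dxp[0])
    return [sum(c * f for c, f in zip(row, feats)) for row in coeffs]


if __name__ == "__main__":
    G = [[1.0, 0.2, 0.3], [-0.1, 0.9, 0.4]]      # illustrative cameras
    Gp = [[0.8, 0.1, 0.5], [-0.2, 0.9, 0.3]]
    Gpp = [[0.6, -0.3, 0.7], [0.2, 0.5, -0.4]]
    coeffs = combination_coefficients(G, Gp, Gpp)
    dX = (2.0, -1.0, 3.0)
    print(transfer(coeffs, apply(G, dX), apply(Gp, dX)), apply(Gpp, dX))
```

The two printed vectors agree, so the third view of any point is obtained from its first two views without recovering the 3D coordinates.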
2.2.3
Change of basis
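Given the current spanning sets {e1, e2, e3} and {e'1, e'2, e'3} in the two images, we
have that
\[
\begin{pmatrix} \Delta\mathbf{x}_i \\ \Delta\mathbf{x}'_i \end{pmatrix}
=
\begin{pmatrix} e_1 & e_2 & e_3 \\ e'_1 & e'_2 & e'_3 \end{pmatrix}
\begin{pmatrix} \alpha_i \\ \beta_i \\ \gamma_i \end{pmatrix}
\]
Suppose that we now wish to express the same set of points using alternative spanning
sets {h1, h2, h3} and {h'1, h'2, h'3}. The new affine coordinates must then obey
\[
\begin{pmatrix} \Delta\mathbf{x}_i \\ \Delta\mathbf{x}'_i \end{pmatrix}
=
\begin{pmatrix} h_1 & h_2 & h_3 \\ h'_1 & h'_2 & h'_3 \end{pmatrix}
\begin{pmatrix} \bar{\alpha}_i \\ \bar{\beta}_i \\ \bar{\gamma}_i \end{pmatrix}
\]
2.2.4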
[Figure labels: scene points P̃ and Q̃, reference plane, image points p, q, p̃ and q̃, views V1 and V2.]
2.3
Rigid reconstruction
2.3.1
Assumptions
Affine projection
Metric constructions
2.3.2
Procedure
[Figure labels: image plane, fronto-parallel plane.]
rotate. Consider the projection of all image points on to this axis. If they differ
in the two views, they must differ by only a constant scale factor. Otherwise,
the rigidity assumption is falsified.
4. Now the two views differ only by a rotation about an axis in the fronto-parallel
plane. Define a Euclidean frame (e1 , e2 , e3 ), such that e1,2,3 are unit vectors
with e1 along the axis of rotation and e3 along the line of sight.
Let G1 e1 + G2 e2 denote the depth gradient of a plane in the object. That is,
the depth of a point λ e1 + μ e2 in the image with respect to the fronto-parallel
plane is λ G1 + μ G2. Note that
G1 = tan σ cos τ
G2 = tan σ sin τ
where σ is the slant and τ is the tilt of the plane.
Consider any triangle OXY in the plane. Let the coordinates of X and Y be
(X1 , X2 ) and (Y1 , Y2 ) respectively. Then the third coordinates must be
X3 = G1 X1 + G2 X2
Y3 = G1 Y1 + G2 Y2
For a given turn ρ the rotation can be represented by
\[
\begin{pmatrix}
1 & 0 & 0 \\
0 & \cos\rho & -\sin\rho \\
0 & \sin\rho & \cos\rho
\end{pmatrix}
\]
Of the three transformed coordinates, the first one is trivially unchanged and
the third one is not observable. The second coordinate is observable, and the
equations are:
\[
X_2^1 = X_2^0 \cos\rho - \sin\rho \, (X_1^0 G_1 + X_2^0 G_2)
\]
\[
Y_2^1 = Y_2^0 \cos\rho - \sin\rho \, (Y_1^0 G_1 + Y_2^0 G_2)
\]
Here the upper indices label the views and the lower indices label the components.
Because the turn is unknown, we eliminate it from these equations to obtain a
single equation in (G1, G2). This equation represents a one-parameter solution
for the two view case. The parameter is the unknown turn ρ. The equation is
quadratic in (G1, G2) with the linear term absent, and represents a hyperbola
in the (G1, G2) space (please derive it).
5. Repeating the steps above between the second and a third view, we obtain a
pair of two view solutions. Each two view solution represents a one-parameter
family of solutions. The one-parameter families for the 0-1 transition and the
1-2 transition are represented by the hyperbolic loci in the gradient space. The
pair of hyperbolae has either two or four intersections. The case of no intersection
occurs only in the non-rigid case. If the motion is rigid, then there has to be
one solution and hence a pair of them. The intersections represent either one or
two pairs of solutions that are related through a reflection in the fronto-parallel
plane.
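The two-view equation of step 4 can be obtained as follows (one possible route, offered as a sketch). Write the unknown turn as ρ and put X_3 = G_1 X_1^0 + G_2 X_2^0 and Y_3 = G_1 Y_1^0 + G_2 Y_2^0. Solving the two observation equations for cos ρ and sin ρ and imposing cos²ρ + sin²ρ = 1 gives

\[
(X_3 Y_2^1 - X_2^1 Y_3)^2 - (X_3 Y_2^0 - X_2^0 Y_3)^2 = -(X_2^0 Y_2^1 - X_2^1 Y_2^0)^2
\]

The left-hand side is a difference of squares of two linear forms in (G1, G2), so the locus is a hyperbola with no linear term. The Python check below simulates one turn for an assumed gradient and triangle and evaluates this residual; all names and numbers are illustrative.

```python
import math


def third_coordinate(G, p):
    """Depth G1 * p1 + G2 * p2 of an image point p on the plane with gradient G."""
    return G[0] * p[0] + G[1] * p[1]


def observe_after_turn(G, p, rho):
    """Observable second coordinate after a turn rho about the e1 axis."""
    return p[1] * math.cos(rho) - math.sin(rho) * third_coordinate(G, p)


def two_view_residual(G, X0, Y0, X2_1, Y2_1):
    """Left side minus right side of the eliminated two-view equation.

    X0, Y0: triangle vertices (first, second coordinate) in view 0.
    X2_1, Y2_1: their second coordinates in view 1.
    """
    X3, Y3 = third_coordinate(G, X0), third_coordinate(G, Y0)
    lhs = (X3 * Y2_1 - X2_1 * Y3) ** 2 - (X3 * Y0[1] - X0[1] * Y3) ** 2
    rhs = -(X0[1] * Y2_1 - X2_1 * Y0[1]) ** 2
    return lhs - rhs


if __name__ == "__main__":
    G_true, rho = (0.5, -0.3), 0.4            # assumed gradient and turn
    X0, Y0 = (1.0, 2.0), (-1.5, 0.7)          # assumed triangle vertices
    X2_1 = observe_after_turn(G_true, X0, rho)
    Y2_1 = observe_after_turn(G_true, Y0, rho)
    print(two_view_residual(G_true, X0, Y0, X2_1, Y2_1))      # true gradient
    print(two_view_residual((0.0, 0.0), X0, Y0, X2_1, Y2_1))  # wrong gradient
```

The residual vanishes (to rounding error) for the true gradient and not for an arbitrary one. Note that the turn ρ never enters `two_view_residual`, which is why a single pair of views leaves a one-parameter family of solutions.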