
The University of Melbourne

Department of Computer Science and Software Engineering


433-480: Computer Vision and Image Processing
Semester 1, 2002

Programming Assignment

1 Introduction
The objective of this assignment is to give you some experience with mid-level
computer vision processing, namely region segmentation using linked pyramids.

2 Background
A pyramid, or more particularly a gray-level pyramid, is a multi-resolution im-
age data structure. The main originator of pyramids (at least by that name)
was Peter Burt (in the late 1970s), although a similar idea, the image cone,
originated with Leonard Uhr in the 1960s. The main distinction (aside from the
cultural one of what lineage of research lab work is done in) is that pyramids
tend to contain only numeric pixel data, while cones conceptually can contain
other information, such as symbolic interpretations. The multiresolution ideas
of pyramids are related to the multiresolution ideas behind wavelet transforms.1
Pyramids can be constructed in various ways. A simple pyramid construction
is this: Start out with the original full-resolution image. Call this level 0. The
construction can be made to work for arbitrary image sizes, but it works most
neatly for square images with a power-of-two size, let's say 2^N × 2^N. We compute
the average of every disjoint 2 × 2 block in the level 0 base image. Those averages
comprise an image of size 2^(N-1) × 2^(N-1), with resolution reduced by a linear factor
of 2 (therefore by a factor of 4 in area). This is level 1 of the pyramid. We can
then repeat this process on level 1 to produce level 2, of size 2^(N-2) × 2^(N-2). In
general, level k of the pyramid is of size 2^(N-k) × 2^(N-k). This can continue up to
the top level of the pyramid, which is just a single pixel in size, although often
the pyramid stops short of that top level.2
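As a concrete sketch in C (one of the languages accepted for this assignment), one step of this construction might look like the following. The function name, float pixel type, and row-major layout are my own illustrative choices, not part of the assignment specification:

```c
#include <stdlib.h>

/* Build one reduced level from a size x size image (size even) by
 * averaging disjoint 2x2 blocks.  Images are row-major float arrays;
 * the caller owns the returned buffer. */
float *reduce_level(const float *img, int size)
{
    int half = size / 2;
    float *out = malloc((size_t)half * half * sizeof *out);
    for (int y = 0; y < half; y++)
        for (int x = 0; x < half; x++) {
            const float *r0 = img + (2 * y) * size + 2 * x; /* top row of block */
            const float *r1 = r0 + size;                    /* bottom row */
            out[y * half + x] = (r0[0] + r0[1] + r1[0] + r1[1]) / 4.0f;
        }
    return out;
}
```

Applying `reduce_level` repeatedly, halving `size` each time, yields levels 1, 2, ... up to the single-pixel top.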
An example of a pyramid (at least of the successive reduced-resolution images
that make up the pyramid) is shown in Figure 1 (taken at Cape Schank on the
Mornington Peninsula). Note the reduction in image size with accompanying
loss of detail.
The name pyramid comes, obviously, from visualizing these successively re-
duced images stacked on top of each other. Of course, they are exponentially
tapering pyramids, not linearly tapering like Egyptian pyramids. In fact, since
the sum of sizes forms a geometric series, the amount of storage needed to store
the pixel data is bounded above by 1 1/3 times that needed for the base level.
Another way of looking at pyramid construction is as a combination of
smoothing and subsampling. In effect, we’re smoothing the image with a 2 × 2
1 In fact, pyramid construction can be thought of as part of a wavelet transform.
2 In some treatments of pyramids, the level numbering goes the opposite way: For a 2^N × 2^N image, the base of the image is counted as level N, with the top of the pyramid being level 0. The term layer is sometimes used instead of level.

Figure 1: Reduced-resolution images in a pyramid: Level 0 (256 × 256), Level 1 (128 × 128), Level 2 (64 × 64), ...

mean filter, and then subsampling the image by taking alternate pixels on al-
ternate rows. However, instead of computing the smoothed value at every pixel
position, we do it only where we subsample. There’s a minor, technical “phase”
issue here: When we take alternate pixels, do we take odd-numbered or even-
numbered pixels? (Similarly for rows.) The convention we’ll follow here is that
we put the image origin at pixel (0, 0) at the lower left-hand corner of the image,
and sample at points both of whose x and y coordinates are even. The other
three conventions are at least possible, and will lead to different results. But
for reasonable natural images, the differences should be negligible. (See Note 3
in Section 7 for more on this issue.)
Pyramids can be used for coarse-to-fine processing: We find something first
at a high level of the pyramid, where there are few pixels to search through. We
cannot locate that thing accurately in the coarse-resolution image, but we can
use its approximate location in the coarse-resolution image as a starting point
for searching with higher accuracy in the lower-level higher-resolution layers of
the pyramid. This way, we can find what we’re looking for with full accuracy
without the cost of searching completely through the full-resolution image. Of
course, this only works for things that are large enough still to be reasonably
detectable higher up in the pyramid.
Pyramids are related to quadtrees and other regular hierarchical image data
structures. One important difference is that pyramid construction is mainly a
bottom-up process while quadtree construction is mainly top-down.3
We can think of the pixels at various levels in a pyramid as being nodes in
a tree structure, with each pixel at a higher level being the parent of the child
pixels in the corresponding 2 × 2 block in the level immediately below.
This is easier to visualize in 1D. Imagine we just have a single 1D slice
of an image (say along an image row). Each pair of child pixels on one level
contributes to their parent pixel on the level above, as shown in Figure 2. In
2D, it’s a 2 × 2 block of child pixels that contribute to a parent pixel on the level
above. One way of looking at it is that each (parent) pixel at a certain level has
a receptive field (where it gets its input from) which is the corresponding 2 × 2
3 This is not an absolute distinction, just an observation on the general nature of the
respective structures. For example, algorithms to convert from a raster image to a quadtree
are mainly bottom-up.

Figure 2: Parent/child tree structure in 1D pyramid.

    Upper level:      3   4
    Lower level:    2 2 4 4 4 4

Figure 3: Overlapped pyramid in 1D.

block of pixels on the level below. For an adjacent parent pixel on the higher
level, the receptive field is moved over by 2 pixel positions, so the receptive
fields are disjoint and don't overlap.
This leads us to an important variation on pyramids, the overlapped pyramid.
Again, there are various ways of constructing overlapped pyramids, but a simple
way is as follows:
Let’s look first at the 1D version, as shown in Figure 3. (For simplicity, this
figure shows only two adjacent levels of a very small pyramid. For now, ignore
the numbers, which represent example intensities.) Each pixel at the higher
level has a receptive field 4 pixels long at the level below. But the receptive
fields are still stepped over only in 2-pixel steps, so that there is (in this case) a
50% overlap of receptive fields. In Figure 3 the child-parent links that would
have been there in the non-overlapped pyramid are shown as solid arrows; the
additional links arising from the overlapping are shown as dashed arrows. The
overlapping receptive fields at the lower level are shown respectively as dashed
and dotted boxes, slightly offset for clarity.
In 2D each pixel at the higher level has a corresponding 4 × 4 receptive field
on the level below, and these similarly overlap, since the receptive field for an
adjacent pixel at the higher level is moved over by only 2 pixels at the lower
level. If the value for a parent pixel is computed as the average over its entire
receptive field (though this is not always done), then the effect of the pyramid
construction is as if we’d first smoothed the image with a 4 × 4 mean filter,

and then subsampled by taking pixels in steps of 2 in both rows and columns.
(But of course, as before, the computation of the average is done only at the
subsampled pixels.)
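The 4 × 4 averaging with a 2-pixel step can be sketched as follows. For the image borders this sketch uses the toroidal wrap-around discussed in Note 1 of Section 7 (one simple option, with the caveats noted there); the function name and float layout are illustrative:

```c
/* Mean over the 4x4 receptive field of overlapped-pyramid parent
 * (px, py).  The field's lower-left child is at (2*px - 1, 2*py - 1),
 * so the core 2x2 block sits in its middle.  Border pixels are
 * handled by wrapping coordinates modulo the image size (toroidal
 * topology).  Row-major float image of side 'size'. */
float overlapped_parent(const float *child, int size, int px, int py)
{
    float sum = 0.0f;
    for (int dy = -1; dy <= 2; dy++)
        for (int dx = -1; dx <= 2; dx++) {
            int x = ((2 * px + dx) % size + size) % size; /* wrap around */
            int y = ((2 * py + dy) % size + size) % size;
            sum += child[y * size + x];
        }
    return sum / 16.0f;
}
```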
Dropping back to 1D, you can see from Figure 3 that, owing to the overlap,
some child pixels at the lower level have two parents in the level above. In fact,
in general almost all pixels will have two parents on the level above. The reason
that there are only two such pixels in Figure 3 is that the rows of pixels are very
short, and edge effects predominate. Analogously, in an overlapped pyramid in
2D every pixel will have 4 parents on the level above (ignoring edge effects).
Now it starts to be interesting. We can regard the tree links in a pyramid
not as actual links always to be used for averaging, but as potential links that
might be used for averaging. This leads to the family of iterative pyramid linking
algorithms, which can (inter alia) be used for region segmentation. As always,
there are many variations. I’ll describe first a simple version, which for reasons
that will become clear later, is called unweighted forced-choice linking.
This is how it goes:

1. Build an overlapped pyramid structure.

2. Initialize the pyramid: Load the original grayscale image into the lowest
level. Moving up through the levels of the pyramid to the top, compute
each parent pixel as the average of the child pixels below it. Even though
this is an overlapped pyramid, in which each parent pixel has 16 children
in its 4 × 4 receptive field, we normally compute this initial average just
from the core 4 children of the receptive field, as if it were a non-overlapped
pyramid.
As a 1D example, look at the numeric pixel values in Figure 3. The values
in the lower level are the original grayscale values (a run of 2s followed
by an edge transition to a run of 4s). At the level above, the right-hand
parent pixel value, 4, is the average of its core children (both 4, linked
by the solid arrows); the left-hand parent pixel value, 3, is the average of
its core children (in this case 2 and 4). At this stage, the left-hand parent
is a mixed pixel, in that it mixes in pixels from both sides of the edge.

3. Then iterate the following process, typically until there is no further
change (or at least until changes become sufficiently small):

(a) Starting from the base level, each child pixel examines its 4 potential
parents in the overlapped scheme, and elects to link with the parent
most similar to it in gray level. It’s as if we deleted all the upwards
links, except those that link a child to its most similar parent.
(b) Then on the level above, each parent pixel recomputes its gray level
as the average only of those children that chose to link to it.4
(c) This linking and recomputation is repeated on each level up through
the pyramid. It might go right up to the single pixel at the very top
of the pyramid, or it might stop short of the top, say at the 4 × 4
level.
4 It’s possible that none of a parent’s children may choose to link to it—they all prefer

other parents. This needs to be handled specially.

Note that it is necessary to iterate, since the recomputation of a parent’s
gray level may change which children link to it. Typically, though, the
process converges well enough after only a few iterations.
Returning to our 1D example in Figure 3: On the lower level, the outer
pixels don’t have a choice, they each have only one parent to link to. The
two inner pixels (both 4) have two potential parents each. The one on the
right links to its “primary” parent (via the solid arrow), while the one on
the left links to the same parent (in this case its “secondary” parent, via
the dashed arrow). When the averages are recomputed, the right-hand
parent stays at 4, but the left-hand parent now recomputes its average as
2, which corresponds to the true value on the left of the edge. For this
example there would be no further changes on subsequent iterations. The
point is that while the edge between 2 and 4 on the lower level falls at
a point that doesn’t match up with the pyramid structure, the relinking
process can (to a limited degree) regroup pixels into regions. Of course,
in a realistic situation, there would be noise to take into account, but the
overall principle should be clear.
4. After this iterative process has converged, the pyramid will contain within
it a stable tree structure of child nodes linking to their parents. We want
to identify “well formed” subtrees in the pyramid. The idea is that the
base nodes of such a subtree projected all the way down to the base level
will constitute a segmented region of the image.
A simple way of doing this is, as mentioned above, just to run the
linking process up to say the 4 × 4 level, and then treat each of these 16
pixels as the root of a subtree. While simple, this is often not entirely
satisfactory, since the number of regions found will depend on how high
up in the pyramid we stop—although for some applications this is good
enough. (Going up to the 4 × 4 level does not necessarily mean getting
exactly 16 regions, since some parents may have no children linking to them.)
Another, usually better way, is to look through the tree to find places
where a child is very different from its parent. Every child must link to
a parent (that is why it’s called forced choice). However, sometimes even
the most similar parent is significantly different from the child (since the
gray values of all the parents are computed from different sets of children).
In such a case we make the “orphan” child the root of a subtree, which
we trace back down to make a region at the base level of the pyramid. In
effect, we cut links that aren’t sufficiently similar. If we form subtrees this
way, we can do the linking process all the way up to the top single-pixel
level of the pyramid. Of course, to decide which links to sever, we need
some sort of threshold on similarity (or difference).
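The 1D version of steps 3(a) and 3(b) can be sketched in C as follows, using the Figure 3 geometry (parent p's receptive field is children 2p..2p+3, so interior children have two candidate parents). The function and array names are my own, and the no-children case from the footnote is only noted, not solved:

```c
#include <math.h>

/* One iteration of unweighted forced-choice linking on a 1D slice.
 * child[0..nc-1] and parent[0..np-1] are gray levels; link[c] records
 * which parent child c chose. */
void link_iterate_1d(const float *child, int nc,
                     float *parent, int np, int *link)
{
    /* Step (a): every child links to its most similar candidate parent. */
    for (int c = 0; c < nc; c++) {
        int best = -1;
        for (int p = 0; p < np; p++) {
            if (c < 2 * p || c > 2 * p + 3)
                continue;               /* c outside p's receptive field */
            if (best < 0 ||
                fabsf(child[c] - parent[p]) < fabsf(child[c] - parent[best]))
                best = p;
        }
        link[c] = best;
    }
    /* Step (b): each parent re-averages over its linked children only.
     * A parent that attracts no children keeps its old value here; as
     * the footnote says, that case needs to be handled specially. */
    for (int p = 0; p < np; p++) {
        float sum = 0.0f;
        int n = 0;
        for (int c = 0; c < nc; c++)
            if (link[c] == p) { sum += child[c]; n++; }
        if (n > 0)
            parent[p] = sum / n;
    }
}
```

Running this on the Figure 3 values (children 2 2 4 4 4 4, initial parents 3 and 4) regroups the pixels exactly as described in the text: the left-hand parent drops to 2 and the right-hand parent stays at 4.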

3 Variations
The algorithm described above, as mentioned, is called forced choice, in that
every child is forced to link to one of its parents, the most similar. The breaking
of links to form region subtrees happens only at the very last stage. A variation
on pyramid linking is to do unforced linking, in which linking from child to
parent is optional. That is, a child node still links to its most similar parent,

but only if that parent is sufficiently similar. If none of the parents is similar
enough, then no link from that child is made. This happens at every iteration,
not just at the end.
Another variation is to perform weighted linking. Here, rather than have
the linking be an on/off, all-or-nothing affair, each link between a child and its
4 parents is assigned a numeric weight.5 This weight can be computed from a
combination of gray-scale similarity and spatial proximity (so that even if the
gray-scale differences are the same, a child pixel would tend to link more strongly
with its closer, “primary” parent, the one it would have had without overlap).
Then, at the recomputation stage, the new average for a parent is computed as
a weighted average of all its children. This is like a “soft computation” version
of the selective averaging done in unweighted pyramid linking. See [HR84],
for example, for ideas about link-weight formulae. After convergence, the link
weights can be thresholded to find the region subtrees.
Weighted linking done this way is analogous to unforced linking, in that it’s
possible for a child to have low-weight links to all its parents. The analog to
forced linking is to normalize the weights from a child to its parents, say by
dividing them by their sum (or possibly maximum). That way, if all the links
from a child to its parents are low, the normalization will bring them back up
higher—a “soft” version of forcing the linking.
Clearly, pyramid linking can be applied to any reasonable kind of pixel value,
not just gray scale. It’s reasonably straightforward to adapt pyramid linking to
color images. The pixel similarity can be based on Euclidean distance between
pixels in 3D RGB space, although there are probably better ways of doing it.6

4 Basic Requirements
For this assignment, I want you at a minimum to write a program to read a
grayscale PGM image from standard input (see Note 5 in Section 7) and produce
a region segmentation of that image using unweighted pyramid linking (forced
or unforced at your choice). Again at a minimum, the output (on standard out-
put) should be a simple, textual description of the “significant” regions found,
essentially one line per region, with whitespace-delimited numeric fields (integer
or float7 as appropriate) in the following format:
i n xmin ymin xmax ymax xc yc Iavg
where
i is an identifying number for the region. Identifying numbers need not be in
order, or even be contiguous, they just have to be unique for each region.
5 The weight can be thought of as a fuzzy-logic truth value assigned to the link, but there is no requirement that the weights be between zero and one.
6 Human subjective perception of color differences does not correspond with Euclidean distance in RGB color space. This is one more instance of a general issue in computer vision. There's no fundamental reason why computer vision processing has to be based on human perception—you don't have to build aeroplanes with feathers. However, there is empirical evidence that human perception works quite well at coping with the world (look at how many humans there are around), so human perception can provide useful ideas for computer vision systems. A more subtle issue is that if robots are to interact with humans, then it would be helpful if they perceived the world in similar ways.
7 No need to express floats to an excessive number of decimals. Three significant figures is probably quite adequate.

It’s certainly a little neater to have the region numbers be contiguous inte-
gers from 1 on up, but it’s not a requirement. (To allow for continuation
lines, as described below, i should start right at the beginning, column
one, of the line.)

n is the area of the region, as a pixel count in the original image (base level of
the pyramid).

xmin ymin xmax ymax specify the bounding box of the region, respectively min-
imum and maximum x and y coordinates.

xc yc specify the centroid of the region (average of x and y coordinates over the
region).

Iavg is the average gray-scale intensity over the region.

Exactly what is a “significant” region is left up to your judgement. A reasonable
default rule would be to count a region as significant if it occupies at
least 5% of the area of the image.
Take it that coordinates are cartesian coordinates (x rightwards, y upwards)
with origin in the lower-left corner of the image, measured at full resolution
(base level of the pyramid). These may be different from the external image
row/column coordinates, or even from the internal array indices that you use.
You will need to convert appropriately if this is so.
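A sketch of that coordinate conversion, plus emitting one region line in the required format, might look like this (function names are my own; the `%.3g` format follows the three-significant-figures suggestion in the footnote):

```c
#include <stdio.h>

/* Convert an internal (row, col) position -- row 0 at the top, as in
 * most image file layouts -- to the required cartesian convention:
 * x rightwards, y upwards, origin at the lower-left pixel.  height is
 * the base-level image height. */
void rowcol_to_xy(int row, int col, int height, int *x, int *y)
{
    *x = col;
    *y = height - 1 - row;
}

/* One output line: i n xmin ymin xmax ymax xc yc Iavg */
void print_region(FILE *out, int id, int area,
                  int xmin, int ymin, int xmax, int ymax,
                  double xc, double yc, double iavg)
{
    fprintf(out, "%d %d %d %d %d %d %.3g %.3g %.3g\n",
            id, area, xmin, ymin, xmax, ymax, xc, yc, iavg);
}
```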
The average graylevel Iavg can be expressed following either of two conven-
tions: it can be an integer between 0 and the image’s maxval, or it can be
a floating-point number between zero and one, representing a fraction of the
image’s maxval. No other values are valid.
You may, at your option, as part of the challenges described below, provide
additional information about each region. However, that should be done in a
way that is sensibly compatible with the above format, generally by tacking the
additional fields on to the right of the required fields specified above. If the lines
end up being uncomfortably long, you can adopt the convention that any line
beginning with whitespace is a continuation of the previous line.
At this “basic requirements” level your program need take no command-line
arguments or options; it just reads from standard input and writes to standard
output. Handling the challenges described below will probably require having
your program handle appropriate command-line arguments and options. How-
ever, strive for compatibility: without any arguments or options, the default
behavior of your program should be sensibly compatible with what’s described
here. Even at the basic requirements level, you might find it convenient to have
command-line options to specify various tunable parameters, like the threshold
level for similarity for cutting subtrees, or the percentage area of “significant”
regions. These should be documented, but your program should have reasonable
default behavior without them.

5 Challenges
Since this is an Honours subject after all, some initiative is expected on your
part. To get a top mark, you will need to go beyond the basic requirements
described above. This part is deliberately left open ended: What I talk about

here are just possible ideas that you might try. You might have bright ideas of
your own. If you have, you’re encouraged to discuss them with me.
Weighted linking Implement some kind of weighted linking scheme. You
might get some ideas from [HR84]. This might involve somehow com-
paring the results of weighted with unweighted linking to see whether it
gives any improvement in results (as a trade-off with the increased com-
putation cost). Presumably, you would have some kind of command-line
option to switch weighted linking on or off. You might compare the quality
of segmentation results for different settings of weighting formulae.
Color You could adapt the pyramid linking scheme to work with segmenting
color images. As mentioned above, using Euclidean distance in RGB color
space could be used, but perhaps there are better color metrics.
Display Find effective ways of displaying the results of the segmentation. This
is mainly for human consumption, but it can be of great help in visualizing
and understanding what’s happening in the segmentation process, and can
be important for presenting results effectively in publications. One idea is to
produce an output image in which the value of each pixel is replaced by the
average intensity (or color) value of the region it belongs to. Another idea
is to produce (maybe as bitmap image, maybe as Postscript) a line graphic
which draws the boundaries of the extracted regions, possibly with anno-
tations. (To do this properly, you need to draw the region boundaries in
the “cracks” between the pixels.) This will require command-line options
or arguments to specify things like output image and scaling parameters.
Other region features You might consider computing additional features of
the extracted regions, like perimeter (or at least a digital approximation of
it), or various measures of elongatedness, or other shape measures. Check
with me for suggestions.
Processing You might try variations on the pyramid algorithm. For exam-
ple, in the algorithm sketched above, links and averages are recomputed
upwards through the pyramid, and this overall process is iterated until
convergence. A variation is to repeat the recomputation of links and aver-
ages between a particular two levels until convergence, before moving up
to the next pair of levels. Is this actually any different? You could play
around with the convergence criterion: Iterate until there are no further
changes in pixel values, or until the maximum (or average) pixel change
is sufficiently small.
Another idea is to introduce some top-down processing as well. In the
standard scheme, propagation of averages is only bottom-up: a parent re-
computes its gray level from its children. The only top-down process is the
tracing out of the subtrees at the end. We could also have children average
with their respective chosen parent (maybe according to some weighting
scheme), on an interleaved, repeated top-down pass, so that eventually
(we might hope) even at the base level, pixel values will converge towards
the average for the region they belong to.
Evaluation Work on ways of evaluating the performance of pyramid linking.
A simple thing is just to do timing measurements, so you can, for example,

get an idea of the additional computation cost of doing weighted linking.
More challenging is to come up with criteria for evaluating the quality of
the segmentation results. One idea is to start out with a known synthetic
test image, say of various circles or squares of constant intensity on a
constant background, corrupted by various amounts of noise and blurring,
and measure how well the results of the segmentation match up with
the “ideal” original image. (This is to view the segmentation as a kind of
image restoration process.) The evaluation could be done at the pixel level:
compute say the pixelwise RMS (root mean square) difference between the
image after segmentation (with each pixel replaced by the region average)
and the “ideal” uncorrupted original. Or the evaluation could be done at
a higher level: for example, measure how accurately the circle parameters
(center position, radius) can be measured from the segmentation.
One line to take is to measure (and graph) the quality of the segmenta-
tion iteration by iteration. Maybe all of the improvement comes in the
first few iterations. This might give you ideas for ways of improving the
performance of pyramid linking.
Different pyramid geometries Most pyramids (ignoring overlap) are based
on assembling 2 × 2 blocks bottom-up (which corresponds to decomposing
by quadrants top-down), giving a reduction in resolution by a linear factor
of 2 in going from one level to the next. However, pyramids can also be
constructed based on 3×3 blocks, giving a factor of 3 resolution reduction.
One advantage of this is that the alignment between levels is neater: the
center of a parent pixel is geographically exactly at the center of its central
child. However, for many purposes, the linear step of 3 in resolution
between levels is too big. Going the other way, some researchers, like
James Crowley, have worked with pyramid-like structures in which the
level-to-level resolution change is the square root of 2. Doing this requires
some funny interpolation, but makes for a smaller change level to level.
You might investigate the effects of different pyramid geometries like these.
3D You could adapt pyramid linking to work with 3D image data, such as
comes from CT or MRI imaging. This could be quite a challenge, both
from a computational and visualization point of view. Talk to me first if
you’re thinking about taking this on.
These extensions are not mutually exclusive: you might work on some in com-
bination. For example, you could do some sort of evaluation to compare the
effectiveness of weighted versus unweighted linking.

6 Working, Submission and Assessment


This project is worth 30% of your mark for 480 and is due at 5pm on Monday
17 June 2002 (note the change to the due date). Late submissions will be accepted
only by prior arrangement with me, and in any case will suffer a penalty of 3 marks
per day late, unless you have some valid reason for special consideration.
If you like, you can do this as an individual project, but I recommend you
do it as a group project in teams of two or three. Your programming project
team need not be the same as your presentation team, but I expect for most

people this would be the most convenient arrangement, since you’re already
working together.
Whichever way you decide to do it, let me know as soon as possible, so I
can do things like set up sharable CVS repositories as required. Let me know
explicitly even if you are sticking with your existing presentation team.
General discussion about approaches and ideas is permitted, even encour-
aged, but your actual submission (code, documentation, etc.) must be the work
only of your team (or of you alone if you’re doing the project individually).
Detected violations of this rule will be treated under the usual departmental
and university discipline procedures. If you are in doubt about any such issue
(for example, making use of publicly available library code) then check with me
first. See Note 4 in Section 7 for certain exceptions.
Assessment will be based on correctness, completeness, style, documenta-
tion, ambition and efficiency—in about that order of importance. Unless there
is some good reason for someone not being in a team, individual and two-person
submissions will be assessed on the same basis. For three-person teams I will
make allowance for the additional labor available. That is, to receive the same
mark as a two-person team, a three-person team will have to achieve something
commensurately more (though this is not a linear scale just measured in lines of
code). For just satisfying the basic requirements, the maximum mark for indi-
vidual and two-person team submissions is 24/30 (80%); for three-person teams
this is 22/30 (74%). To receive a higher mark, you will need to show some
creativity and go beyond satisfying the basic requirement by doing at least one
of those things listed under “Challenges” above (Section 5).
Details of submission mechanics will be announced later. However, your
submission should consist of your source code, a makefile to build your system,
documentation explaining your method, and optionally some test data files and
shell scripts to run them. To make life easier for me in marking, make it so
that your executable ends up being called pyrseg and all necessary programs
are created by the default target of your makefile. You should write your code
so that it will run as expected on any reasonable Unix/Posix platform. How-
ever, the default target platform is the Department’s Sun Solaris x86 student
machines. That is, I will initially mark your program on whatever reasonable
Unix platform is convenient for me, but if I run into problems that I suspect
are platform-related, I’ll fall back onto running on those student machines.
Acceptable languages for coding your project are C, C++, or Java, with
possibly shell or Perl scripts as wrappers to glue together compiled executables.8
Other languages may be acceptable—check with me for approval. Java will run
much slower than C or C++ and will limit the size of images that you can
reasonably run on. But if you particularly like Java, it’s a viable choice, since
you can still get interesting results on fairly small images.9
You can optionally submit some reasonably small test cases (images and shell
scripts to run the cases) on which you believe your program functions properly.
Normally, I won’t look at these at all. However, if your program fails badly on
8 For example, you might have two separate compiled executables for doing unweighted and weighted linking respectively, but have a single top-level shell script called pyrseg which execs one or the other depending on a command-line option.
9 A lot of the early work on pyramids was done on a PDP-11/45 with an approximately 1 microsecond instruction time. Memory limitations meant that 120 × 120 was then the biggest image that could reasonably be processed in a pyramid.

my test cases, I might—as a last resort—try your own provided examples. If
your program runs only on your examples, but not my test cases, then you will
receive only a low mark, but possibly better than if your program had failed
even on your examples, or if you provided no examples of your own.
You may also need to provide your own test cases if they’re necessary to
demonstrate some additional features you’ve implemented in your program (be-
yond the basic requirements), features that may not be exercised by my test
cases. For example, if you implement color pyramid linking, then you will need
to provide some test cases with suitable color images.
I’ll provide some sample test images. However, you should regard provision
of test data as largely your own responsibility. For my testing in marking, I
may use some of the sample image files I provide, or I may use different (though
similar) images.
Your documentation does not have to be extensive, but should be sufficient
that I can understand the approach you’ve taken and how your programs work.
Your documentation (in addition to program comments) can be in plain ASCII
text, HTML, Postscript, or PDF. Make sure your HTML can be rendered as
expected by netscape, and your Postscript or PDF by gv. If there’s anything
I should pay special attention to in marking, then you should mention it in a
short README file.

7 Notes
1. For an overlapped pyramid, the “natural” size for an image is not nec-
essarily an exact power of two. This is something for you to figure out.
A related issue is how you handle edge effects—pixels near the boundary
of the image will not have a full choice of potential parents. One simple
solution is to compute all image coordinates modulo the image size. This,
in effect, wraps the image around into a toroidal (doughnut) topology.10
Though simple, this isn’t entirely satisfactory, since you can get spurious
linkages across the wrap-around.
You’ll also need to do something sensible with an image that isn’t a “natu-
ral” size for the pyramid construction. One solution is to embed the image
inside an image of the next larger “natural” size. What value should the
extra pixels be? One simple solution is just to make them black. Per-
haps more satisfactory is to mark them with some distinguished value
that won’t link with anything, which can be treated specially. You might
be able to find a way of dealing reasonably with arbitrary-sized images
without embedding them into a larger “natural-sized” image.

2. Representation and processing: How you process and represent a pyramid


is up to you. While it is possible to process a pyramid by keeping only a
few rows of each level in memory at a time, for the sizes of images we’ll be
using it’s far and away more convenient just to store the entire pyramid
in memory. A convenient way is to have an array of pointers, one for each
level of the pyramid, which points to a representation for the image at
that level. That in turn is an array of pointers to allocated image-pixel
rows. That way you can refer to the pixel at coordinates (x, y) at level k
by something like p[k][y][x] in C. (Or if you allocate by columns, you
could do p[k][x][y].) At each pixel, as well as the grayscale intensity, you
may need to store various auxiliary pieces of information, most importantly
about links, but possibly also the previous intensity value so you can check
for convergence (though this may not be necessary, depending on how you
do the checking).
10 This cyclic wrap-around also naturally happens with Fourier transforms, which are
inherently periodic.
Even though they're called links, you shouldn't think of them as pointers.
In practice, you may need at most a small integer per pixel to encode which
of the four candidate parents is linked to (with possibly an extra code to
indicate "no link").
You may not even need to encode the links explicitly at all—they can
be inferred as needed on the fly. For example, at the point where a child
elects to link to a particular parent, the child’s intensity can be added into
a running total for the parent. Whether it’s better to have links explicit
or implicit is a small-scale design decision for you to make. This is for the
unweighted case—the computational cost of computing weights may shift
the balance towards some more explicit representation.
3. Geometry: One of the things you’ll need to do is figure out the geometric
relationship, in terms of actual coordinates and image coordinates, be-
tween a parent pixel on one level of the pyramid and its children on the
level below. This can be a little tricky. If we imagine that pixels are little
square tiles, with position measured nominally at the center of the tile,
then a parent pixel is like a big tile that covers its 4 children’s tiles, and
the nominal position of the parent pixel, strictly speaking, lies in between
the child pixels on the level below. For this reason, it’s far simpler to
regard the nominal position of a pixel as being at say the lower left-hand
corner of its pixel tile. Under this scheme, the position of a parent pixel
is geometrically the same as that of its southwest child. Even though this
probably introduces some small bias, it makes things much easier to code,
and I recommend you follow it. This means that a child pixel at image
coordinates (x, y) has its parent at image coordinates (⌊x/2⌋ , ⌊y/2⌋) on
the level above. This is for the non-overlapped case. You can figure out
the corresponding mapping for the overlapped case, and for the mapping
from a parent’s image coordinates to those of its children.
4. You can make use of existing library code (like in pbmplus) for simple
things like reading, writing, and pixel access of images (although it’s not
hard to roll your own). If the code you’re using isn’t already installed on
the system, you may need to include it with your submission. Any use of
existing code beyond this must be approved by me ahead of time. And
in any case you must make proper acknowledgement of any use of other
people’s code and respect any licensing requirements.
5. As a starting point, you can find out about PGM format via man pgm. It’s
part of the pbmplus/netpbm package, originally written by Jef Poskanzer.
Pbmplus specifies a family of very simple image formats, numerous utility
programs to operate on them, and some library code for image access and
storage (see Note 4 above).
6. Do not worry much about error detection and recovery. You can assume
that all inputs and arguments are correct and properly formatted. Having
said that, including some sanity checks (defensive programming) would
certainly help you with development and debugging, and make your pro-
gram more robust.
7. By its nature, pyramid linking is good at segmenting out compact, blob-
like objects/regions, but not so good at other kinds of objects, like elon-
gated wiggly worms. Whether this is a bug or a feature depends on what
kind of objects you’re trying to find. In some applications it’s an advan-
tage.
8. You can see that, in some respects, pyramid linking implements a kind
of iterated, selective-averaging smoothing. Also, pyramid linking is very
similar to the classic k-means or ISODATA clustering algorithm. In fact,
the convergence proof for k-means can be adapted to prove convergence of
pyramid linking. This was work done by Kasif and Rosenfeld in the early
1980s. I can track down a reference if anybody’s interested.
9. Don’t hesitate to ask questions about the project, or to discuss your so-
lution approaches with me. Also, be prepared for fine-tuning and clari-
fications of the specification, and possibly corrections. These will be an-
nounced in lectures, via cs.480, or via the 480 webpages, as appropriate.
It is your responsibility to be aware of any such announcements.

Bibliography
[HR84] T. H. Hong and Azriel Rosenfeld. Compact region extraction using
weighted pixel linking in a pyramid. IEEE Trans. Pattern Analysis and
Machine Intelligence, 6(2):222–229, March 1984.

Les Kitchen, 13 May 2002 (updated 20 May)

$Id: proj-2002.tex,v 1.11 2005/09/22 01:46:15 ljk Exp $
