Fleming Anderson Preprint

The Perceptual Organization of Depth

Roland Fleming and Bart Anderson
Department of Brain and Cognitive Sciences, MIT
Running Head: Perceptual Organization of Depth

Correspondence:
Roland Fleming,
MIT Room NE20-451,
77 Mass. Ave.,
Cambridge, MA 02139.
Email: roland@psyche.mit.edu
Introduction
The goal of depth perception is to identify the spatial layout of the objects and surfaces
that constitute our surroundings. One important observation about the world around us
that influences the way we see depth is that physical matter is not distributed randomly,
with arbitrary depths at every location. On the contrary, the environment is generally
organized: the world consists mainly of tightly bound objects in a discernable layout.
This order results from countless forces and processes in our world which tend to
organize matter into objects and place those objects in certain spatial relations. The
central thesis of this chapter is that our perception of depth mirrors this organization. We
argue that because the world consists of objects and surfaces, our perception of depth
should likewise be represented in terms of the functionally valuable units of the
environment, namely surfaces and objects.
As we shall see, this has profound consequences for the processing of depth
information. In particular, there is more to depth perception than simply measuring
the distance from the observer of every location
in the visual field. Rather, the perception of depth is the active organization of depth
estimates into meaningful bodies. Depth constrains the formation of perceptual units,
and, reciprocally, the figural relations between depth measurements allow the visual
system to parse its representation of depth into ecologically valuable structures.
There are many sources of information about depth, from pictorial perspective to
motion parallax. An exhaustive review of all these sources of information is beyond the
scope of this chapter (although see Bruce et al., 1996 and Palmer, 1999 for introductory
reviews). Instead, we discuss three key domains in which the visual system organizes
our perception of depth into meaningful units, to emphasise the intimate relationship
between depth processing and perceptual unit formation.
In the first section we discuss how the visual system infers the layout of surfaces
from local measurements of depth. We will argue that local estimates of depth are
ambiguous, but that the geometry of occlusion critically constrains the legal
interpretations. Occlusion occurs when one opaque object partly obscures the view of a
more distant object, as happens frequently under normal viewing conditions. Occlusion
is important because it occurs at object boundaries, and therefore the depth
discontinuities introduced by occlusion provide ideal locations for the segmentation of
depth into objects. Moreover, as we will show in section 1, the geometry of occlusion
causes relatively near and relatively far depths to play different roles in the inference of
surface structure.
In the second section we discuss the visual representation of environmental
structures that are hidden from view. If the visual system is to organize depth into
meaningful bodies, it must represent whole objects and not only those fragments that
happen to be visible. In order to do this, the visual system must interpolate across gaps in
the image to complete its representation of form. We argue that by considering the
particular environmental conditions under which structures become invisible (specifically
occlusion and camouflage) we can make predictions about the mechanisms underlying
visual completion. We also discuss how visual completion influences the representation
of depth.
Finally, we discuss what happens when the scene contains transparent surfaces,
and thus multiple depths are visible along a single line of sight. We argue that this
introduces a second segmentation problem in the perceptual organization of depth. The
visual system not only needs to segment perpendicular to the image plane, such that
neighbouring locations are assigned to different objects; with transparency, the visual
system also has to segment depth parallel to the image plane, by separating a single
image intensity into multiple depths, a process known as scission (Koffka, 1935). We
discuss the conditions under which the visual system performs scission, and how the
ordering of the surfaces in depth is resolved.
We argue that the ambiguity of local depth measurements, the representation of
missing structure, and the depiction of multiple depth planes are three of the major
problems faced by a visual system if it is to organize depth into surfaces and objects.
Through systematic explanations of example stimuli, we discuss some of the ways in
which the visual system overcomes these problems.
1. Interpreting local depth measurements: the contrast depth asymmetry principle
In this section we discuss how occlusion constrains the interpretation of local depth
estimates. Specifically we show that occlusion enforces a crucial asymmetry between
relatively near and relatively distant structures that can have profound implications for
the representation of surface layout. Although the principles are discussed in terms of
binocular disparity, the fundamental logic relates to the geometry of occlusion and
therefore applies to any local estimate of depth.
1.1 Binocular stereopsis and the correspondence problem.

Binocular stereopsis is the most thoroughly studied source of information about depth.
Binocular depth perception relies on the fact that the two eyes receive slightly different
views of the same scene. The horizontal parallax between the views has the consequence
that a given feature in the world often projects to two slightly different locations on the
two retinae (see Figure 1). These small differences in retinal location, or binocular
disparities, vary systematically with distance in depth from the point of convergence
and can thus be used to triangulate depth. For a thorough treatment of stereopsis see
Howard and Rogers (1995) and chapters [chapter numbers for Ohzawa, Schor and
Shimojo] of this volume.
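As a rough illustration of how disparity can be scaled to depth, the following sketch computes the disparity of a point on the midline relative to the point of convergence, and then inverts the relation. It is a minimal geometric sketch rather than a model of the visual system: the function names, the interocular distance of 0.065 m, the restriction to midline points, and the sign convention (positive for points nearer than fixation) are all assumptions made for the example.

```python
import math

# Assumed typical interocular separation in metres (illustrative value only).
INTEROCULAR_M = 0.065

def vergence_angle(distance_m, interocular_m=INTEROCULAR_M):
    """Angle in radians subtended at a midline point by the two eyes."""
    return 2.0 * math.atan(interocular_m / (2.0 * distance_m))

def relative_disparity(fixation_m, target_m, interocular_m=INTEROCULAR_M):
    """Disparity of a midline target relative to the fixated point.

    Assumed sign convention: positive when the target is nearer than
    fixation, negative when it is farther.
    """
    return (vergence_angle(target_m, interocular_m)
            - vergence_angle(fixation_m, interocular_m))

def depth_from_disparity(fixation_m, disparity_rad, interocular_m=INTEROCULAR_M):
    """Invert the relation above to recover the target distance in metres."""
    angle = vergence_angle(fixation_m, interocular_m) + disparity_rad
    return interocular_m / (2.0 * math.tan(angle / 2.0))

if __name__ == "__main__":
    d = relative_disparity(fixation_m=1.0, target_m=0.8)
    print(d, depth_from_disparity(1.0, d))
```

Note that the same disparity corresponds to different depth intervals at different fixation distances, which is why the disparity has to be scaled before it can serve as a depth estimate.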
In order to determine the disparity of a feature in the world, the visual system
must localize that feature in the two retinal images. Once it has identified matching
image features, the difference in retinal location is the binocular disparity, which can then
be scaled to estimate depth. The visual system must not measure the disparity between
features that do not belong together, otherwise it will derive spurious depth estimates (see
Figure 1). Because of this, the accuracy of the matching process is critical to binocular
depth perception. The problem of identifying matching features in the two eyes' views
(that is, features that originate from a common source in the world) is known as the
correspondence problem.
If the features that the visual system localizes in the two images are very simple,
such as raw intensity values (or pixels) then in principle there could be many distracting
features that do not in reality share a common origin in the world. Under these
conditions, the correspondence problem would be difficult as the visual system would
have to identify the one true match from among a large number of false targets.
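The scale of the false-target problem can be made concrete with a small counting sketch. It assumes a deliberately simplified situation that is not taken from any particular model: n indistinguishable features lie on one scanline in each eye, and any feature in one eye may in principle be paired with any feature in the other. The function name is hypothetical.

```python
def candidate_matches(n_features):
    """Count pairings for n indistinguishable features per eye on one scanline.

    Returns (all_pairings, true_matches, false_targets) under the assumption
    that every left-eye feature can be paired with every right-eye feature.
    """
    all_pairings = n_features * n_features
    true_matches = n_features          # one correct partner per feature
    false_targets = all_pairings - true_matches
    return all_pairings, true_matches, false_targets

if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, candidate_matches(n))
```

Under these assumptions the number of false targets grows with the square of the number of features, which is why the choice of matching primitive matters.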
However, there is considerable debate about what types of image features the
visual system matches to determine disparity (Julesz, 1960, 1971; Sperling, 1970; Marr
and Poggio, 1976, 1979; Pollard, Mayhew and Frisby, 1985; Prazdny, 1985; Jones and
Malik, 1992). Psychophysically, at least, it now seems unlikely that the visual system
matches raw luminances. Rather, the visual system seems to match local contrast
signals, that is, localizable variations in intensity, such as luminance edges (Anderson
and Nakayama, 1994; Smallman and McKee, 1995). This seems an almost inevitable
consequence of early visual processing, which maximises sensitivity to contrasts, rather
than to absolute luminances (Hartline, 1940; Wallach, 1948; Ratliff, 1965; Cornsweet,
1970). By the time binocular information converges in V1, the visual field appears to be
represented in terms of local measurements of oriented contrast energy (Hubel and
Wiesel, 1962; DeValois and DeValois, 1988) and thus it is likely that these are the
features from which disparity is computed.
If this is true, then the image features that carry disparity information are local
contrasts, such as luminance edges. However, this poses a problem for the visual system,
for in order to capture the functional units of the environment, the visual representation of
depth should be tied to surfaces and objects, not to local image features. There is,
therefore, a potential discrepancy between the image features that carry disparity
information (i.e. local contrasts), and the perceptual structures to which depth is assigned
(i.e. regions) in the ultimate representation of environmental layout. This discrepancy
plays a critical role in the theoretical discussion that follows.
A local image feature, such as an edge, has only one true match in the other eye's
image. Therefore, the edge carries only one disparity. However, depth is ultimately
assigned to the two regions that meet to form the edge. This results in a problem: in
order to represent surface structure the visual system must assign depth to both sides of
an edge, even though the edge carries only one disparity (see Figure 2). How does the
visual system infer the depths of two regions from every local disparity signal? We will
show that the geometry of occlusion imposes an inviolable constraint on the
interpretation of local disparity-carrying features. To anticipate, we show that the simple
fact that near surfaces can occlude more distant ones, but not vice versa, has profound
consequences for the assignment of depth to whole regions.
1.2 Asymmetries in depth: a demonstration.

By way of motivation for the theoretical discussion that follows, consider figure 3, which
is based on a figure developed by Takeichi, Watanabe and Shimojo (1992). The figure
consists of a Kanizsa illusory triangle and three diamonds. When disparity places the
diamonds nearer to the observer than the triangle and inducers (by cross-fusing the
stereopair on the left of figure 3), the diamonds appear to float independently in front of
the background, and the Kanizsa triangle tends to be seen as a figure in front of the
circular inducers; this percept is schematised in figure 3b. The disparities in the display
can be inverted simply by swapping the left and right eyes' views, as can be seen by
cross-fusing the stereopair on the right of figure 3. In this case what was previously
distant becomes near and vice versa, such that the diamonds are placed behind the plane
of the inducers. In both versions of the display, the triangle itself carries no disparity
relative to the circular inducers; only the disparity of the diamonds changes from near to
far. This simple inversion leads to a change in surface representation that is more
complex than a simple reversal in the depth ordering of the perceptual units (as
schematised in figure 3b). When the diamonds recede, they drag their background back
with them, such that the triangle appears as a hole through which the observer can see a
white surface; the three black diamonds lie embedded in the more distant white surface.
This recession of the background has a secondary effect of increasing the strength of the
illusory contour (the border of the triangle).
The important observations with regard to the theory are the following. First,
when the diamonds are in front, they are freely floating and separate, while when they
recede, they drag the background with them. Second, when the diamonds are forward, the
Kanizsa triangle tends to be seen as a figure (rather than ground), but when the diamonds
are more distant, the triangle is seen as a hole. And yet all that changed in the display
was the disparity of the diamonds. Why does this simple reversal in depth lead to an
asymmetric change in the surface representation? Why does the disparity of the
diamonds influence the appearance of the triangle? These are the asymmetries of depth
to which the following discussion pertains.
1.3 From features to surfaces: interpretation of local disparity signals.

Let us assume that the visual system has located a luminance edge and derived a
disparity, d0, from that edge. What possible surface configurations are consistent with the
local disparity measurement? Broadly, the legal interpretations fall into two classes, as
shown in figure 4. The first class consists of surface events in which both sides of the
edge meet at the depth of the edge, d0. There are many surface events for which this is
the case: reflectance edges, cast shadows, and creases in the surface, to name just three.
When the feature originates from a continuous manifold, as in these cases, interpretation
is simple, as both sides of the edge are assigned the same depth, d0.
The second class of interpretations occurs when the edge corresponds to an object
boundary, and therefore represents a depth discontinuity (see figure 4). In this case, one
side of the edge lies at the depth of the occluding object, and the other side of the edge
lies at the depth of the background. Therefore, the visual system must assign different
depths to the two sides of the edge. How can the visual system assign two depths, when
it is given only one disparity, d0? The answer is that it only assigns a unique depth to the
occluding side. The critical insight is the following: The depth measurement acquired at
an occluding edge only specifies the depth of the occluding surface. The visual system
assigns depth d0 to the occluding surface. All that it knows about the other side is that it
must be more distant than the occluding surface. If the more distant surface is
untextured, then it could be at any depth behind the occluder and the local image data
would remain the same. By contrast, if the depth of the occluding surface varies, the
disparity carried by the object boundary must also change, because the occluding surface
owns the contour (Koffka, 1935; Nakayama, Shimojo and Silverman, 1989) and is
therefore responsible for the disparity associated with the edge.
Although the visual system cannot uniquely derive the depth of the occluded side
(i.e. the background) from the local disparity computation, there is one critical piece of
information that it does have, and that is that the occluded side is more distant than the
occluder. There is no way for an occluding object to be more distant than the background
that it occludes. If the background is brought closer than the object, then the background
becomes the occluding surface, and carries the edge with it. In this way, occlusion
introduces a fundamental asymmetry into the interpretation of disparity-carrying edges:
the occluded side of the edge can be at any distance greater than d0, but neither side can
be nearer than d0.
We can summarise the possible depth assignments (from the occlusion and non-occlusion
classes just described) in the form of a constraint on the interpretation of local
disparity-carrying contrasts, which is termed the contrast depth asymmetry principle
(Anderson, submitted; see also Anderson, Singh and Fleming, 2002):
Both sides of an edge must be situated at a depth that is greater than
or equal to the depth carried by that edge.
Although this geometric fact is simple in form, it can have pronounced effects on
the global interpretation of images, when the constraint applies to all edges
simultaneously. We will now run through an example to show how the principle can
explain the asymmetric changes in perceived surface structure that occur when near and
far disparities are inverted.
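Before the worked example, the constraint and the two classes of interpretation described above can be written as a check on a candidate depth assignment for a single edge. This is only an illustrative sketch: depths are treated as distances from the observer (larger means farther), and the function name, the labels it returns, and the numerical tolerance are assumptions introduced here. The label "unowned" marks assignments that satisfy the inequality but fall into neither class described in the text, because no surface lies at the depth of the edge.

```python
def classify_edge(d0, depth_a, depth_b, tol=1e-9):
    """Classify a candidate depth assignment for the two sides of an edge.

    d0 is the depth carried by the edge; depth_a and depth_b are the depths
    proposed for the regions on either side (larger = farther away).
    """
    if depth_a < d0 - tol or depth_b < d0 - tol:
        return "illegal"        # a side is nearer than the edge: violates the principle
    a_at_edge = abs(depth_a - d0) <= tol
    b_at_edge = abs(depth_b - d0) <= tol
    if a_at_edge and b_at_edge:
        return "continuous"     # both sides meet at the edge (e.g. reflectance edge)
    if a_at_edge:
        return "occluder_a"     # side A owns the edge, side B lies somewhere behind
    if b_at_edge:
        return "occluder_b"     # side B owns the edge, side A lies somewhere behind
    return "unowned"            # both sides are farther than the edge

if __name__ == "__main__":
    print(classify_edge(1.0, 1.0, 1.0))   # both sides at the edge depth
    print(classify_edge(1.0, 1.0, 3.0))   # side A occludes side B
    print(classify_edge(1.0, 0.5, 1.0))   # side A is nearer than the edge
```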
1.4 Application of the contrast depth asymmetry principle.

In order to demonstrate the explanatory power of the contrast depth asymmetry principle
(hereafter CDAP), we will now use it to account for the demo in figure 3. Recall that
when the diamonds carry near disparity, they float freely in front of the background, and
the illusory triangle tends to be seen as figure. When the disparity is reversed, however,
the diamonds drag the background back with them, and the triangle appears as a hole.
This asymmetry in surface layout is depicted in figure 3b.
Let us first consider the case in which the diamonds appear to float in front. The
visual system has to interpret the disparity signals carried by the edges of the diamonds.
The CDAP requires that both sides of the diamonds' edges (i.e. the black inside and the
white outside of the diamonds) have to be at least as distant as the edges. Now consider
the inducers, which are more distant than the diamonds. The constraint requires both
sides of these edges to be at least as distant as their edges. This means that all of the
black interior of the inducers must be at least this distant and, more importantly, all of the
white background must be at least this distant, which is further than the disparity of the
diamonds. If all of the white background is further than the diamonds, then the edges of
the diamonds must be occluding edges, and the black interior of the diamonds must be an
occluding surface. This explains why the diamonds are seen as independent occluders,
floating in front of the large white background and black inducers: the edges of the
inducers drag the white background back, leaving the diamonds floating in front.
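The reasoning for this first case can be sketched as a propagation of lower bounds on depth. The sketch below makes several simplifying assumptions that are not part of the original account: each image region is given a single depth, the whole white background is treated as one region, depths are arbitrary distances with larger values farther away, and every edge is assumed to be owned by at least one of its two sides. All names are hypothetical.

```python
def depth_lower_bounds(edges):
    """Apply the CDAP to every edge at once.

    edges is a list of (edge_depth, region_a, region_b). Each region must be
    at least as far as every edge that borders it, so its lower bound is the
    largest depth among its bordering edges.
    """
    bounds = {}
    for depth, region_a, region_b in edges:
        for region in (region_a, region_b):
            bounds[region] = max(bounds.get(region, depth), depth)
    return bounds

def forced_occluders(edges):
    """Find edges where only one side can lie at the edge depth.

    If one side is forced behind the edge by some other edge, the remaining
    side must own the edge and is therefore an occluding surface.
    """
    bounds = depth_lower_bounds(edges)
    owners = {}
    for depth, region_a, region_b in edges:
        a_can_own = bounds[region_a] <= depth
        b_can_own = bounds[region_b] <= depth
        if a_can_own != b_can_own:
            owners[(region_a, region_b)] = region_a if a_can_own else region_b
    return owners

if __name__ == "__main__":
    NEAR, FAR = 1.0, 2.0  # arbitrary illustrative depths
    diamonds_in_front = [
        (NEAR, "diamond_interior", "white_background"),
        (FAR, "inducer_interior", "white_background"),
    ]
    print(depth_lower_bounds(diamonds_in_front))
    print(forced_occluders(diamonds_in_front))
```

With the diamonds near, the inducer edges push the white background back, and the diamond interiors are the only regions left that can own the diamond edges. Because the white background is modelled as a single region, this sketch cannot represent percepts in which the white splits into a near and a far surface.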
Now consider the case in which the diamonds are more distant than the inducers.
Again, the CDAP requires that both the inside and the outside of the diamonds have to be
at least as far back as their disparity dictates. This means that both the diamonds and
their white background are dragged back to the more distant disparity. Now consider the
inducers, which carry a relatively near disparity. Because the white background behind
the diamonds has been dragged back with the diamonds, the inducers and their white
background must be occluding surfaces. This means that the background immediately
surrounding the diamonds must be visible through a hole in the occluding surface. The
edges of this hole are the illusory contours of the Kanizsa figure. Note again that the
requirement that both sides of every edge be at least as far as the edge leads to asymmetrical
surface structures when disparities are inverted.
This is just one example that shows how the CDAP can account for asymmetrical
effects of relatively near and relatively far disparities on perceived surface layout;
because the CDAP is derived from the geometry of occlusion, it can account for a very
large number of displays, and can be used to generate surprising new displays (see
Anderson, 1999; Anderson, submitted).
2. Occlusion and camouflage: hallucinating the invisible
The central thesis of this chapter is that the visual system does not merely record depth at
each location in the visual field; rather, it actively organizes its depth measurements into
functionally valuable units. In the last section, we discussed how occlusion plays a key
role in this organization. In this section, we discuss how the visual system handles what
is arguably the hardest problem posed by occlusion: the visual representation of
structures that are hidden and are therefore completely invisible. If seeing depth is about
representing the actual layout of objects in the environment, then all portions of the
objects must be represented, even those that are hidden from view: hidden portions do not
disappear from the environment just because they do not appear in the image. Therefore,
the visual system has to go beyond local image data to construct representations of
hidden structures. We will now discuss how the environmental conditions of occlusion
and camouflage predict properties of the construction process.
2.1 Modal and amodal completion

We will consider two major ways in which parts of the scene can become invisible. The
first is simple occlusion, when an opaque object obscures part of a more distant object.
When this happens, the occluded structures of the more distant object have no
corresponding features in the image, and thus the visual system must somehow
reconstruct the missing data. The second way that viewing conditions can lead to
invisible structures is through camouflage. In camouflage it is the nearer, occluding
surface that is rendered invisible because it happens to match the color of its background.
Because the boundaries of the camouflaged object do not project any contrast, they have
no corresponding features in the image and thus the nearer object is effectively invisible.
Under these circumstances, the visual system must actively hallucinate the invisible
structures. In both cases, the visual system interpolates missing data, a process that is
known as visual completion. This process is important to depth perception because it
is one of the means by which the visual system organizes its depth measurements into
meaningful bodies. We argue that depth perception and unit formation are intimately
intertwined, for depth constrains the perceptual units that are formed, and perceptual
organization influences the interpretation of local depth measurements.
The phenomenal quality of completed structures differs, depending on whether it
is near (camouflaged) or far (occluded) structures that are interpolated. In the case of
camouflage, the interpolation leads to a distinct impression of a contour or surface across
the region of missing data. This is referred to as modal completion (Michotte, Thines
and Crabbe, 1991/1964) because the experience is of the same phenomenal modality as
ordinary visual experience. An illusory contour, for example, is crisp, and subjectively
similar to a real contour, as can be seen in figure 5a. In contrast to this, the sense of
completion experienced with occluded structures is less distinct. The black form in
figure 5b tends to be seen as a single object, part of which is hidden, rather than as two
distinct objects, whose boundaries coincide with the boundary of the grey occluder.
There is a compelling sense that the two visible portions of the black form belong to the
same object, and that that object continues in the space behind the occluder. However,
this impression, although visual in origin, is not of the same phenomenal mode as normal
and modal contours, and is therefore referred to as amodal completion (Michotte et al.,
1991/1964).
In general, the regions of the image which are visible, and lead to visual completion are
referred to as inducers.
2.2 The identity hypothesis.

There is a vast literature on visual completion and a thorough discussion of all the issues
is beyond the scope of this chapter. One important issue that is discussed in greater detail
in chapter [chapter number for Shimojo], is whether visual completion occurs relatively
early or late in the putative processing hierarchy. However, the perceptual organization
of depth has a direct bearing on another current debate, specifically, the extent to which
modal and amodal completion are the consequence of a single process. This issue is
intimately bound to depth perception because it determines the extent to which depth
processing and perceptual organization are independent.
The debate runs roughly as follows. On the one hand there has been the strong
claim that a single completion mechanism is responsible for both modal and amodal
completion. According to this account, perceptual organization (including visual
completion) produces perceptual units, and an independent process places those units in
depth. The theory states that psychological differences between modal and amodal
completion result from the final depth ordering of the completed forms (Kellman and
Shipley, 1991; Shipley and Kellman, 1992; Kellman, Yin and Shipley, 1998) rather than
from a difference between the completion processes themselves. This is known as the
identity hypothesis. On the other hand the two processes could be largely independent,
subject to different constraints and subserved by distinct neural mechanisms. The strong
form of this dual mechanism hypothesis would be that the two processes are of a
fundamentally different kind, for example, that modal completion is largely data-driven,
while amodal completion is essentially cognitive. To anticipate, although we do not
subscribe to the strongest form of the dual-mechanism hypothesis, we will provide
evidence that modal and amodal completion follow different constraints and argue that
they are subserved by distinct neural processes. Central to the arguments that we present
are the geometric and photometric conditions under which occlusion and camouflage
actually occur in the environment.
The principal evidence for the identity hypothesis has been that subjects perform
similarly with modally and amodally completed figures in a variety of tasks. In one task,
Shipley and Kellman (1992) varied the spatial alignment of the inducing elements in both
modally and amodally completed squares. Such misalignment is known to weaken the
sense of completion, as the completed boundary is forced to undergo an inflection.
Subjects were asked to rate the subjective strength of visual completion as a function of
the degree of misalignment for modal and amodal versions of the display. Shipley and
Kellman (1992) found that ratings declined at the same rate as a function of misalignment
for both modal and amodal figures. This has been interpreted as evidence that a single
mechanism is responsible for both forms of completion.
Using a more rigorous method, Ringach and Shapley (1996) performed a shape
discrimination task with modal and amodal versions of a Kanizsa figure. By rotating the
inducing elements, the vertical contours of the completed square can be made to bow out
(creating a Fat Kanizsa), or curve in (creating a Thin Kanizsa). Subjects were asked
to discriminate between Fat and Thin versions of the display while the angle through
which the inducers were rotated was varied. Ringach and Shapley found that
discrimination performance as a function of rotation was nearly identical for modal and
amodal versions of the display, a finding which is consistent with the identity hypothesis.
One problem with this type of evidence is that it relies on negative results, that is,
a failure to detect a difference, which could be due to the method rather than a
fundamental property of the system being studied. Should positive evidence be provided
that modal and amodal completion are subject to different constraints, or result in
different perceptual units, then the identity hypothesis would no longer be tenable.
There are two major reasons for believing that modal and amodal completion
should be subject to different constraints, both of which are related to the environmental
conditions under which occlusion and camouflage occur. First, occlusion occurs over
greater distances across images because it only requires that one object is in front of
another. Camouflage, on the other hand, requires a perfect match in color between the
near surface and its background, and thus occurs less frequently in general. This
difference is reflected in a constraint on the image distances over which modal and
amodal completion occur, which was first documented by Petter (1956). Petter used a
class of stimuli now known as spontaneously splitting objects (SSOs), which consist of a
single homogeneously colored shape, such as the one shown in figure 5c, that tends to be
interpreted as two independent shapes, one behind the other. Which object is seen in
front tends to oscillate with prolonged viewing. However, which shape is seen in front
first, and which tends to be seen in front for a greater proportion of the time can be
predicted rather well from the lengths of the contours that must be interpolated. Petter's
rule states that longer contours tend to be completed amodally, while shorter contours
tend to be completed modally. Thus, which figure is seen in front can be predicted from
the length of the contours that must be completed. If the two types of completion are
subject to different constraints on the distances over which they occur, this opens the
possibility that they are subserved by different mechanisms.
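Petter's rule, as stated above, can be expressed as a simple comparison. The sketch below assumes that the total length of contour to be interpolated has already been measured for each of the two candidate shapes. The function name, the dictionary input, and the example shape names and lengths are hypothetical. The rule describes a tendency, so the output should be read as the shape predicted to be seen in front first or for the greater proportion of the time.

```python
def predicted_front_shape(interpolated_lengths):
    """Predict which shape tends to be seen in front under Petter's rule.

    interpolated_lengths maps each shape name to the total length of contour
    that must be interpolated to complete it. Shorter interpolated contours
    tend to be completed modally, so that shape is predicted to appear in
    front. Returns None when the lengths are tied and the rule is silent.
    """
    shortest = min(interpolated_lengths.values())
    candidates = [name for name, length in interpolated_lengths.items()
                  if length == shortest]
    return candidates[0] if len(candidates) == 1 else None

if __name__ == "__main__":
    # Illustrative lengths in arbitrary image units.
    print(predicted_front_shape({"bar": 2.0, "ellipse": 5.0}))
```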
A second reason for believing that modal and amodal completion are subject to
different constraints relates to the color conditions that are required for occlusion and
camouflage to occur. Again, occlusion can happen between objects of any color. The
reflectance of the near object is unrelated to the fact that it hides the more distant one
from view. This suggests that amodal completion should not be sensitive to the
luminance relations between the image regions involved. Camouflage, by contrast,
requires a perfect match in luminance between the near and far surfaces. This implies that
modal completion should be sensitive to the luminance relations between the image
regions involved.
Recent experimental work has shown that this luminance sensitivity can lead to
large differences between modal and amodal displays (Anderson, Singh and Fleming, 2002).
Anderson et al. created displays consisting of two vertically separated circles filled with
light and dark stripes, as shown in figure 6. The binocular disparity of the circles was
kept constant, but the disparity of the light/dark contours inside the circles was altered to
place the stripes behind or in front of the circular boundaries. When the stripes were
further than the circles, the top and bottom stripes tended to complete amodally to form a
single continuous dark and light surface, which appeared to be visible through two
circular holes, as schematised in figure 6d. This percept occurred irrespective of the
luminance of the region surrounding the circles.
By contrast, when the disparity placed the contours in front of the circles, the dark
and light stripes separated into different depth planes. The way in which the stripes
separated from one another depended on the luminance of the surround. When the
surround was the same color as the light stripes, the light stripes appeared to float in front
and completed modally across the gap between the two circles. In this condition, the
dark stripes completed amodally underneath the light stripes to form complete circles.
This led to an impression of light vertical stripes in front of dark circles, as schematised
in figure 6e. However, when the surround was the same luminance as the dark stripes,
the percept inverted, such that the dark stripes appeared to float in front of light disks.
This demonstrates a fundamental dependence on luminance that was not present in the
amodal version of the display. Furthermore, if the surround was an intermediate grey,
then the display was not consistent with camouflage, as neither the light nor the dark
stripes perfectly matched the luminance of the background. Under these conditions, there
was no modal completion across the gap, and the percept was difficult to interpret. This
demonstrates that modal completion is sensitive to luminance relations, while amodal
completion is not.
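The pattern of percepts just described can be summarised as a small decision rule. This is a descriptive sketch of the reported outcomes, not a model of the underlying mechanism: the function name, the returned labels, the use of plain numbers for luminance, and the tolerance used to decide whether two luminances match are all assumptions made for the example.

```python
def predicted_percept(stripes_in_front, surround, light, dark, tol=1e-6):
    """Summarise the reported percepts for the striped-circle displays.

    stripes_in_front: True if disparity places the stripe contours in front
    of the circular boundaries, False if it places them behind.
    surround, light, dark: luminances of the surround and the two stripe types.
    """
    if not stripes_in_front:
        # Amodal case: the outcome does not depend on the surround luminance.
        return "amodal_surface_through_holes"
    if abs(surround - light) <= tol:
        return "light_stripes_in_front"   # light stripes complete modally
    if abs(surround - dark) <= tol:
        return "dark_stripes_in_front"    # dark stripes complete modally
    return "no_modal_completion"          # surround matches neither stripe

if __name__ == "__main__":
    LIGHT, DARK, GREY = 0.8, 0.2, 0.5  # illustrative luminance values
    for surround in (LIGHT, DARK, GREY):
        print(predicted_percept(True, surround, LIGHT, DARK),
              predicted_percept(False, surround, LIGHT, DARK))
```

The asymmetry is visible in the structure of the rule: the surround luminance is consulted only in the branch where the stripes lie in front.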
Anderson et al. showed that this luminance sensitivity could affect performance
on basic visual tasks such as vernier acuity. The stripes in the top and bottom circles can
be horizontally offset (i.e. misaligned slightly), without destroying the sense of
completion. Subjects were asked to report in which of two displays the contours were
slightly misaligned. Both modal and amodal completion facilitate performance in this
task. However, in the amodal case performance was unaffected by the luminance of the
surround, while in the modal case, performance was much worse when the luminance of
the surround was an intermediate grey (the condition in which the stripes do not complete
across the gap). Thus, modal and amodal completion are subject to different constraints,
both on the distance over which they occur, and the luminance conditions that are
required to induce them. This positive evidence for a difference between modal and
amodal completion uses essentially the same types of task as the negative evidence that
had previously been used to support the identity hypothesis.
2.3 Visual completion and the perceptual organization of depth.

The geometric and photometric differences between modal and amodal completion are
derived directly from the environmental conditions of occlusion and camouflage.
Because occlusion and camouflage occur under different circumstances, they have
different consequences for the organization of depth into meaningful bodies. In fact, the
differences can be exploited to generate stimuli in which modal and amodal completion
lead to different shapes. This is important as it shows that unit formation is intimately
bound to the placement of structures in depth.
The greater promiscuity of amodal completion is the key in the generation of
these displays. Figure 7 is a recently developed stereoscopic variant of the Kanizsa
configuration in which the inducing elements are rotated outwards (Anderson et al.,
2002). When the straight segments (the mouths of the pacmen) are placed in front of
the circular portions of the inducers, the impression is of 5 independent illusory
fragments that float in front of 5 black disks on a white background. However, when the
two eyes' views are interchanged, and thus the straight contours are placed behind the
circular segments, the impression is rather dramatically altered.
With the disparity inverted, the impression is of a single amodally-completed,
irregularly-shaped, black figure on a white background, which is visible through 5 holes
in a white surface (these
percepts are schematised in figures 7b and c). Thus, the former case consists of a total of
11 surfaces (5 fragments + 5 disks + white background), while the latter case consists of
3 (1 white surface with 5 holes + 1 black shape + white background). Clearly the
placement in depth has a considerable effect on what perceptual units are formed.
Anderson et al. also provided evidence that differences between modal and
amodal interpolation can lead to differences in the very shapes of completed contours
themselves. When the left-hand stereopair in Figure 8a is uncross-fused, the resulting
percept consists of six circular disks that are partly occluded by a jagged white surface on
the right-hand side, as schematised in figure 8b. However, when the disparities are
inverted (by uncross-fusing the right pair of Figure 8a), the modal completion across the
regions between the four black blobs tends to take the form of a continuous wavy contour
that runs down the center of the display. This percept is schematised in figure 8c. The
importance of this demonstration is that it shows that modal and amodal completion can
not only result in different surface structures, but even in differently shaped contours. It
is difficult to see what the concept of a single completion mechanism serves to explain if
the two processes can result in different completed forms.
Ultimately, the identity hypothesis is a claim about mechanism and can therefore
be assessed physiologically. There is a considerable body of evidence for extrastriate
units that are sensitive to illusory, but not to amodally-completed, contours (see chapter
[chapter number for von der Heydt], this volume, for a review). A critical additional
piece of evidence was provided recently by Sugita (1999), who found cells in V1 that
respond to amodal completion across their receptive fields, but not to modal completion.
Cells responded weakly when presented with two unconnected edges; holes and
occluding surfaces on their own; and stimuli in which two unconnected edges were
separated by a hole. However, when the cells were presented with two edge fragments
separated by an occluder (a stimulus that leads to amodal completion of the edge), the
cells responded vigorously. This shows that at the earliest stages of cortical processing,
there is a double dissociation between the representations of modal and amodal
structures, a conclusion which supports the dual mechanism hypothesis.
3. Transparency, scission, and the representation of multiple depth planes.
Transparency poses a particularly interesting problem in the perceptual organization of
depth. With transparency, one object is visible through another, and thus two distinct
depths lie along the same line of sight (see Figure 9). If the visual system is to represent
depth in terms of the actual surfaces of the environment, it has to depict two distinct
depths at a single location in the visual field. The process of projection compresses the
light arriving from the transparent surface and the light arriving from the more distant
surface into a single image intensity on the retina. In order to represent both surfaces, the
visual system has to separate a single luminance value into multiple contributions, a
process known as scission (Koffka, 1935). We argue that scission is a type of perceptual
segmentation as it parses the representation of depth into distinct surfaces. However,
rather than segmenting neighbouring locations into distinct objects, scission separates
depth into layers, or planes, and thus operates parallel to the image plane.
Scission presents the visual system with two principal problems. The first is to
identify when a single luminance results from two distinct depths. The second is to
assign surface properties correctly at the two depths. By studying when and how we see
transparency, we can learn how the visual system scissions depth into layers.
Much of the seminal work on perceptual transparency was conducted by Metelli
(1970, 1974a,b; see also Metelli et al., 1985), who provided a thorough quantitative
analysis of the color mixing that occurs when one surface is visible through another.
When a background is visible through a transparent sheet, only certain geometrical and
luminance relations can hold between the various regions of the display (see Figure 9).
From these relations Metelli derived constraints that determine whether a region will look
transparent or not, and how opaque it will appear if it does look transparent. This is
important as it determines the conditions under which the visual system scissions a single
image intensity into multiple layers, and thus how the visual system stratifies its
representation of depth.
Broadly the conditions required for perceptual scission fall into two classes. The
first are the photometric conditions for transparency, which detail the relations between
the light intensities of neighbouring regions that are necessary for scission. The second
set of conditions for perceptual scission are geometrical, or figural. Depth only separates
into layers when these relations hold between the various regions of the display.
3.1 Photometric conditions for scission.

Consider the display shown in figure 9a, which tends to be seen as a bipartite background
that is visible through a transparent filter. The vivid separation of the central region into
two depths only occurs when certain luminance relations hold. Metelli derived two
constraints on the photometric conditions required for perceptual scission.
The intuition behind the first constraint, which we refer to as the magnitude
constraint, is that a transparent medium cannot increase the contrast of the structures
visible through it. The consequence of this constraint is that the central diamond must be
lower contrast than its surround in order to appear transparent, as shown in Figure 9a.
This constraint is important as it restricts the conditions under which scission occurs: a
region can only scission if its contrast is less than or equal to the contrast of its flanking
regions. As can be seen from figure 9c, infringement of this constraint with respect to the
central diamond prevents the diamond from undergoing scission. However, in this
display, the constraint is satisfied for the region surrounding the diamond, and thus, the
display can be seen as a bipartite display seen through a transparent filter with a
diamond-shaped hole in the centre.
The intuition behind the second luminance constraint, which we refer to as the
polarity constraint, is that a transparent medium cannot alter the contrast polarity of the
structures visible through it. Put another way, if a dark-light edge passes underneath a
transparent medium, the dark side will remain darker than the light side, no matter what
the absolute luminances are. As can be seen from Figure 9d, infringement of this
constraint prevents perceptual scission, demonstrating that the visual system respects this
optical outcome of transparency. This constraint is particularly important in determining
the depth ordering in transparent displays.
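As a rough illustration, the two photometric constraints can be written as a simple check on the four luminances of a bipartite display like figure 9a. This is only a sketch: the function name and the labels a, b, p and q are our own, and contrast is taken here to be a plain luminance difference, which is an assumption rather than a definition given in this chapter.

```python
def metelli_constraints(a, b, p, q):
    """Check the magnitude and polarity constraints for a bipartite display.

    a, b: luminances of the two background halves in plain view.
    p, q: luminances of the same halves inside the candidate transparent
          region (p overlies a, q overlies b).
    Assumption: contrast is measured as a simple luminance difference.
    """
    outside = a - b   # edge contrast in plain view
    inside = p - q    # contrast of the same edge inside the candidate region

    # Magnitude: a transparent medium cannot increase contrast.
    magnitude_ok = abs(inside) <= abs(outside)
    # Polarity: a transparent medium cannot flip which side is darker.
    polarity_ok = inside * outside > 0

    return {
        "magnitude": magnitude_ok,
        "polarity": polarity_ok,
        "scission_possible": magnitude_ok and polarity_ok,
    }
```

For example, `metelli_constraints(80, 20, 60, 40)` satisfies both constraints. Swapping the roles of the regions, as in `metelli_constraints(60, 40, 80, 20)`, infringes the magnitude constraint for the central region, and `metelli_constraints(80, 20, 40, 60)` infringes the polarity constraint.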
The polarity constraint enforces certain restrictions on the ordinal relationships
between the luminances of neighbouring regions. This means that, in principle, we can
classify the locations where neighbouring regions meet to determine whether scission is
or is not possible in each region. This provides the visual system with a local signature
of transparency. Beck and Ivry (1988) noted that if one draws a series of lines running
progressively from the brightest to the darkest regions, there are three possible shapes
that result, as shown in figure 10. The only difference between the three figures is the
luminance of the region of overlap between the two squares. In the first instance (Figure
10a), the image is bistable, as either square can be seen as a transparent overlay. In these
circumstances the lines linking regions of increasing luminance form a Z-configuration.
When the lines form a C-shape (Figure 10b), only one of the squares is seen as
transparent, and when the lines criss-cross (Figure 10c), the polarity constraint is
infringed for all regions, and neither square scissions. Adelson and Anandan (1990)
provided a similar taxonomy based on the number of polarity reversals. A number of
lightness illusions demonstrate that scission can be predicted from the class of
X-junctions in the display, and that these X-junctions can have powerful effects on many
qualities of our experience (see, for example, Adelson, 1993, 1999).
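The idea of classifying an X-junction by its polarity reversals can be sketched as follows for two overlapping figures, A and B, on a common ground. The function and argument names are hypothetical, and the rule that a figure may be seen as transparent only if the other figure's border keeps its polarity beneath it is our reading of the polarity constraint described above, not a published algorithm.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def classify_x_junction(ground, a_only, b_only, overlap):
    """Count polarity reversals at an X-junction formed by figures A and B.

    The four arguments are the luminances of the four regions that meet
    at the junction. Labels are descriptive only.
    """
    # Border of A: seen in plain view (ground | a_only) and through B
    # (b_only | overlap). If its polarity is kept, B may be transparent.
    a_in_view = sign(a_only - ground)
    a_border_kept = a_in_view != 0 and a_in_view == sign(overlap - b_only)

    # Border of B: seen in plain view (ground | b_only) and through A
    # (a_only | overlap). If its polarity is kept, A may be transparent.
    b_in_view = sign(b_only - ground)
    b_border_kept = b_in_view != 0 and b_in_view == sign(overlap - a_only)

    reversals = 2 - int(a_border_kept) - int(b_border_kept)
    labels = {0: "non-reversing", 1: "single-reversing", 2: "double-reversing"}

    can_be_transparent = []
    if b_border_kept:
        can_be_transparent.append("A")
    if a_border_kept:
        can_be_transparent.append("B")
    return reversals, labels[reversals], can_be_transparent
```

With no reversals either figure can be seen as the overlay (the bistable case), with one reversal only one figure can, and with two reversals neither figure scissions.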
The magnitude and polarity constraints can be unified as a single rule that
describes a powerful local cue to scission. Anderson (1997) phrased the rule as follows:
When two aligned contours undergo a discontinuous change in contrast magnitude, but
preserve contrast polarity, the lower-contrast region is decomposed into two causal
layers. There are two valuable consequences of this rule. The first is that it unifies the
two Metelli constraints. The second is that it provides a local signature of transparency
that can be applied to any meeting of contours. This includes those T-junctions that are
in fact degenerate X-junctions; that is, those in which two neighbouring regions happen
to have exactly the same luminance. Anderson (1997) also demonstrated that a number
of traditional lightness phenomena, including White's effect and its variants, and neon
color spreading, can be accounted for as cases of scission, rather than the consequence of
traditional contrast or assimilation processes.
Having identified that a location contains two surfaces, the visual system has to
partition the luminance at that location between the two depths. How much of the light is
due to the reflectance of the underlying surface, and how much is due to the properties of the
overlying layer? The opacity of the overlying layer determines how the luminance is
divided between the two depths. Metelli's model makes explicit predictions about the
perceived opacity and lightness of the transparent layer. The equations predict that two
surfaces with identical transmittance should look equally opaque irrespective of their
lightness. However, Metelli himself noted that dark filters tend to look more transparent
than light filters with the same transmittance. Why does the visual system confuse
lightness and transmittance in partitioning luminance between two depths?
In a series of matching experiments, Singh and Anderson (in press) recently
resolved this issue. Subjects adjusted the opacity of one filter until it matched the
perceived opacity of another filter with a different lightness. Singh and Anderson found
that perceived transmittance is predicted almost perfectly by the ratio of Michelson
contrasts inside and outside the transparent region, even though such a measure is
actually inconsistent with the optics of transparency. As discussed above, there is a
general consensus that early visual processing tends to optimise sensitivity to
contrast, rather than absolute luminance. Hence, in assigning transmittance, the visual
system appears to use the readily available contrast measurements, even though they are
not strictly accurate measurements of opacity.
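The difference between the two measures can be illustrated numerically. The sketch below assumes the standard episcotister formulation of Metelli's model, in which a filter of transmittance alpha and its own luminance contribution t turns a background luminance a into alpha * a + (1 - alpha) * t. This equation is not written out in this chapter, and the function names and example values are ours.

```python
def michelson(l1, l2):
    """Michelson contrast of two luminances."""
    return abs(l1 - l2) / (l1 + l2)


def filter_luminances(a, b, alpha, t):
    """Luminances of backgrounds a and b seen through a filter.

    Assumed episcotister mixing: alpha is the transmittance and t is the
    luminance contributed by the filter material itself.
    """
    return alpha * a + (1 - alpha) * t, alpha * b + (1 - alpha) * t


def metelli_alpha(a, b, p, q):
    """Transmittance recovered from luminance differences."""
    return (p - q) / (a - b)


def contrast_ratio(a, b, p, q):
    """Ratio of Michelson contrasts inside and outside the filter."""
    return michelson(p, q) / michelson(a, b)


# Two filters with the same transmittance but different lightness.
a, b, alpha = 80.0, 20.0, 0.5
dark = filter_luminances(a, b, alpha, t=10.0)    # (45.0, 15.0)
light = filter_luminances(a, b, alpha, t=90.0)   # (85.0, 55.0)

for name, (p, q) in (("dark", dark), ("light", light)):
    print(name, metelli_alpha(a, b, p, q), round(contrast_ratio(a, b, p, q), 3))
```

Both filters have a recovered transmittance of 0.5, yet the contrast ratio is about 0.83 for the dark filter and about 0.36 for the light one. Under these assumptions a contrast-based measure rates the dark filter as more transparent, in line with Metelli's observation.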
3.2 Figural conditions for scission.

In addition to the luminance conditions, certain geometrical relations must hold between
the various regions of the display in order for depth stratification to occur (Metelli, 1974;
Kanizsa, 1979/1955). These figural conditions fall in two broad classes. The first class
requires good continuation of the underlying layer. Specifically, the contours that are in
plain view should be continuous with the contours viewed through the region of
presumed transparency. As can be seen from figure 9e, infringement of this condition
interrupts the percept of transparency.
The second figural condition requires good
continuation of the transparent layer. Figure 9f shows that infringement of this condition
weakens or eliminates the percept of transparency.
There are conditions in which the figural cues to transparency are so strong that
they can override the luminance cues. Beck and Ivry (1988) showed subjects displays
like the one shown in figure 10c, in which the region of overlap between the two figures
is the wrong contrast polarity for either figure to be seen as transparent. Despite this,
naive subjects did occasionally report seeing such figures as transparent, demonstrating
that the sense of figural overlap is a central aspect of the percept of transparency.
Certainly most observers are willing to agree that the region of overlap in Figure 10c
appears to belong to two figures simultaneously, an impression that can be enhanced with
stereo and relative motion. However, it should be noted that the grey of the overlap
region does not appear to scission into two distinct sources, at least not in the same way
as the overlap of a normal transparency display does (as in Figures 10a and 10b). This
leads to the possibility of two distinct neural processes in the perception of transparency.
One is driven by relatively local cues and leads to phenomenal color scission. The other
is driven by more global geometrical relations, and leads to stratification in depth. Under
normal conditions of transparency, the two processes operate in concert to produce the
full impression of transparency. However, using carefully designed cue-conflict stimuli,
such as those used by Beck and Ivry, these two factors in the representation of transparent
surfaces can be distinguished. An open question, however, is how these processes are
instantiated neurally. All we can conclude is that the representation of depth is much
more sophisticated than a mere 2D map of depth values.
3.3 Scission and the perceptual organization of depth.

Scission can have pronounced effects on perceptual organization. For example, Stoner,
Albright and Ramachandran (1990) demonstrated that perceived transparency can alter
the integration of motion signals into coherent moving objects. When a plaid is drifted at
constant velocity across the visual field, it is typically seen as a single coherent pattern
that moves at the velocity of the intersections between the two component gratings.
However, with prolonged viewing the plaid appears to separate into two component
gratings that slide across each other, each of which appears to move in the direction
perpendicular to its orientation. When the plaid is coherent, it appears to occupy a single
depth plane, but when it separates into its components, the gratings tend to appear at
different depths.
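The velocity of the intersections in the coherent percept can be computed from the two component motions. The sketch below uses the standard intersection-of-constraints construction, which we assume is the relevant computation here: each grating constrains only the velocity component along its own normal, and the pattern velocity is the single vector consistent with both. The function name and the angle convention are illustrative choices.

```python
import math


def pattern_velocity(theta1_deg, speed1, theta2_deg, speed2):
    """Velocity (vx, vy) of the plaid intersections.

    theta*_deg: direction of each grating's normal, i.e. the direction in
                which that grating appears to drift when seen alone.
    speed*:     drift speed of each grating along its normal.
    Solves v . n1 = speed1 and v . n2 = speed2 by Cramer's rule.
    """
    t1, t2 = math.radians(theta1_deg), math.radians(theta2_deg)
    n1x, n1y = math.cos(t1), math.sin(t1)
    n2x, n2y = math.cos(t2), math.sin(t2)

    det = n1x * n2y - n1y * n2x
    if abs(det) < 1e-12:
        raise ValueError("parallel gratings do not define a unique pattern velocity")

    vx = (speed1 * n2y - n1y * speed2) / det
    vy = (n1x * speed2 - speed1 * n2x) / det
    return vx, vy


# Two gratings drifting at unit speed along +45 and -45 degrees.
vx, vy = pattern_velocity(45.0, 1.0, -45.0, 1.0)
```

In this example the coherent plaid moves horizontally at a speed of about 1.41 (the square root of 2), faster than either component, whereas the separated gratings each move obliquely at unit speed.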
Stoner et al. varied the intensity of the intersections of the plaids and measured
the proportion of time for which the plaid was seen as coherent. They found that when
the color of the intersection was consistent with one grating being seen through the other
(i.e. when the junctions are consistent with transparency), the proportion of the time for
which the plaid appeared to separate into gratings was greatly increased. By contrast,
when the color of the intersections infringed the polarity constraint, such that neither
grating could be seen as transparent, the pattern tended to be seen as a coherent plaid,
rather than undergoing scission into distinct layers. This demonstrates that scission has
important consequences for the representation of visual structure. When an image region
scissions, the effects can spread to regions distant from the local cues to scission.
Scission acts as a nexus between depth and other visual attributes. Scission of
depth can cause regions to change in apparent lightness, and conversely changes in
luminance can cause changes in depth stratification. Figure 11 (taken from Anderson,
1999) demonstrates this close relationship between luminance, scission and the
perceptual organization of depth. Three circular patches of a random texture were placed
on a uniform background. Critical to the demonstration is that disparity is introduced
between the circular boundaries and the texture inside the circles. When the disparity
places the texture behind the circular boundaries, the circles appear as holes, through
which the texture is visible.
The texture tends to appear as a single plane with
continuously, stochastically varying lightness. However, when the disparity places the
texture in front of the circular boundaries, the percept changes considerably. The texture
separates into two distinct layers: a near layer made of clouds with spatially varying
transmittance, and a far layer that is visible through the clouds, which consists of uniform
disks on a uniform background.
Another interesting property of this display is that the lightness and spatial
structure of the clouds and disks reverse completely when the luminance of the surround
varies. In figure 11, the top and bottom displays are completely identical except for the
lightness of the surround. When the surround is dark, the texture scissions into dark,
smoke-like clouds in front of white disks. However, when the surround is white, it is the
light portions of the texture that move forward, floating like mist in front of dark disks.
One final observation about the display is that when the texture carries near disparity, and
thus undergoes scission, the clouds that float in front tend to complete modally across
the gaps in between the disks. This is in part due to the fact that the conditions for
camouflage are satisfied, as discussed in section 2.
When the depth is reversed in the display, two asymmetries occur. The first is
geometrical in that it alters the structure of the depths in the scene. In the near case the
texture scissions into two layers, while in the far case the texture appears relatively
uniform in depth by comparison. The second asymmetry that occurs with depth inversion
is photometric in that it is driven by the luminance of the surround and determines the
lightness of the cloud and disks. When the texture is distant, the percept changes very
little with changes to the luminance of the surround; by contrast, when the texture is near,
the luminance of the surround critically determines how the scission occurs as well as the
lightness of the cloud and disks.
In what follows, we will use the contrast depth asymmetry principle (CDAP) discussed in
section 1 and the concept of scission to explain these asymmetries. For a more thorough
discussion see Anderson (submitted).
Let us first consider the case in which the texture carries far disparity relative to
the circular boundaries. Because the texture is continuously varying in luminance, it
carries localizable disparity signals at almost every location. Put another way, if disparity
is carried by contrast, as argued in section 1, then patterns that are richly structured bear
the densest distribution of disparities. Recall that the CDAP requires both sides of every
contrast to be at least as distant as the disparity carried by the contrast. This means that
when the texture is given far disparity (or more precisely, when the contrasts of the
texture are given far disparity), both the light and dark matter in the texture recede to this
depth.
In turn, the depth-placement of the texture uniquely determines the border-ownership
of the boundaries of the disks, which carry relatively near disparity. If the
insides of the disks (i.e. the texture) carry far disparity, then the outsides (i.e. the region
surrounding the disks) must be at the depth carried by the circular boundary. Thus, the
circles are seen as holes in the surrounding surface; it is through these holes that the
texture is visible.
The situation is more complex when the depth is reversed, i.e. when the contrasts
of the texture are nearer than the contrast of the circular boundaries. Crucial to the
following argument is that it is contrasts that carry disparity, while it is the light and dark
regions that make up the contrasts to which depth is assigned. First let us consider the
circular boundary between the surround and the texture. When the surround is light, it is
the dark portions of the texture (inside the circles) that contrast with the surround. Thus,
the disparity of the circular boundary is carried by the contrast between the light matter of
the surround, and the dark portion of the disk. The CDAP requires both of these regions
to be at least as distant as the disparity carried by the boundary. This means that the light
surround is dragged back to this depth, and the dark matter of the texture is also dragged
back to this depth. Now consider the contrasts between the dark and light portions within
the texture. These contrasts carry relatively near disparity. But the contrast between the
dark matter and the surround has already constrained the dark matter to be at least as
distant as the circular boundary. This means that it must be the light matter of the texture
that is responsible for the near disparity of the texture; i.e. the light matter is a near
surface that partly obscures the dark matter. This explains why the texture splits into two
depths: the dark matter is dragged back by forming a contrast that carries far disparity
(i.e. the boundary of the disk) and the light matter floats in front as its boundaries with
the dark matter carry near disparity.
The final logical step in the explanation involves scission. The texture does not
consist of only two luminances, but of a continuous range of luminances from light to
dark. How can we explain the appearance of the intermediate luminances in the texture?
Scission makes it possible to separate the intermediate luminances into two distinct
components: dark stuff, and light stuff, which have been compressed into a single
luminance by the process of projection onto the retina. These two components lie in
different depth planes. Put another way, scission allows the visual system to interpret the
grey regions as dark matter viewed through light matter. The critical insight is that it is
the dark stuff in the texture that forms the contrast with the surround. Therefore, all of
the dark stuff belongs to the more distant depth, including the dark stuff in the greys.
All of the remaining lightness in the greys belongs to the transparent clouds that float
in front of the disks. In this way, the intermediate luminances are interpreted as varying
degrees of transmittance of the overlying layer. The lighter the grey, the thicker the
cloud; the darker, the sparser. This explains why the disk appears as a uniform black
disk: all of the black is sucked out of intermediate regions and is dragged back to form
the disk. The left-over lightness is attributed to the transparent clouds.
The whole argument reverses when we change the surround from light to dark.
When the surround is dark, it is the light portions of the texture that contrast with the
surround, and therefore, it is the light portions of the texture that are dragged back. The
near disparity of the texture must therefore be due to the dark regions, and thus dark
clouds are seen to float in front of white disks. Again, as it is the whiteness of the texture
that is dragged back, all of the whiteness in the intermediate luminances is attributed to
the more distant disks. The remaining darkness in the greys is attributed to the dark
clouds that float in front. In this way, changing the luminance of the surround changes
which contrasts carry the disparities, and thus which regions are dragged back by virtue
of the CDAP. Scission enables the visual system to separate luminances into multiple
contributions and thus segment the intermediate greys into two distinct depth planes.
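The partitioning of intermediate greys described above can be made concrete with a toy decomposition. This is a deliberately simplified sketch, not a model proposed in this chapter. It assumes that the far layer takes the extreme texture luminance that contrasts with the surround, that the near layer takes the opposite extreme, and that each grey is a linear mixture of the two weighted by the local opacity of the near layer. The function name and the boolean `surround_is_light` are hypothetical.

```python
def scission(texture, surround_is_light):
    """Split texture luminances into a far layer, a near layer and an opacity map.

    texture:           list of luminance values within the texture.
    surround_is_light: True if the surround is light, False if it is dark.
    Returns (far, near, opacity), where each texture value v satisfies
    v = opacity * near + (1 - opacity) * far  (assumed linear mixing).
    """
    lo, hi = min(texture), max(texture)
    if hi == lo:
        raise ValueError("a uniform texture carries no contrast to partition")

    if surround_is_light:
        far, near = lo, hi   # dark matter dragged back, light clouds in front
    else:
        far, near = hi, lo   # light matter dragged back, dark clouds in front

    opacity = [(v - far) / (near - far) for v in texture]
    return far, near, opacity


greys = [10.0, 55.0, 100.0]
far_l, near_l, opacity_l = scission(greys, surround_is_light=True)
far_d, near_d, opacity_d = scission(greys, surround_is_light=False)
```

With a light surround the far layer is uniformly dark and the lightest greys correspond to the thickest cloud; with a dark surround the same luminances yield a uniformly light far layer and the opacity map inverts.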
This demonstration and others like it are important as they show how multiple
processes interface to determine our percepts of depth and material quality. It is through
the CDAP and scission that the visual system interprets local variations in luminance as
meaningful surfaces located in depth.
Depth stratification complements traditional
segmentation as an important process through which the visual system organizes its
representation of depth into ecologically valid structures.
Conclusions
It is common to think that depth perception involves little more than determining the
depth at each location in the visual field. We have argued, to the contrary, that the visual
system mirrors the structural organization of the environment by tying its representation
of depth to surfaces and objects. Thus depth perception is an active process of perceptual
organization, as well as a passive process of acquiring depth estimates. We have argued
that luminance, disparity and contrast are some of the basic image features that carry
local information about depth, while scission, visual completion and the CDAP are some
of the means by which depth is organized into surfaces.
In the first section we introduced the CDAP and argued that:
(1) disparity is carried by local contrasts (e.g. luminance edges) but assigned to
the regions that meet to form the contrasts.
(2) Occlusion introduces a critical constraint on the interpretation of local
disparity signals, the CDAP. This constraint requires either that both sides of a
contrast lie at the depth specified by the contrast, or that one side is a more
distant occluded surface. In the latter case, the disparity determines the depth
of the occluding side.
(3) The CDAP imposes a fundamental asymmetry between near and far
structures. When simultaneously applied to all edges in a display, the CDAP
can explain a number of asymmetrical changes in perceived surface layout
that occur with simple inversion of the disparity field.
In the second section, we discussed how the visual system deals with structures
that are invisible because they are hidden by occlusion or camouflaged against their
background. We argued that:
(1) The visual system has to actively complete the missing data if it is to
accurately segment depth into objects.
(2) Consideration of the environmental conditions of occlusion and camouflage
predicts (a) that modal completion is sensitive to luminance, while amodal
completion is not, and (b) that modal completion tends to occur over shorter
distances than amodal completion.
(3) As predicted from the environmental differences, distinct mechanisms are
responsible for the two types of completion. The differences can be used to
generate displays in which the completed forms differ when the disparity
field is inverted.
Finally, in the third section, we discussed how scission allows the visual system to
represent two depths along the same line of sight, and thus organize depth into layers. We
argued that:
(1) Certain luminance and figural relations must obtain in order for a region to
undergo scission.
(2) Scission can have pronounced effects on perceptual organization in regions
distant from the local signatures of transparency.
References
Adelson, E.H. & Anandan, P. (1990). Ordinal characteristics of transparency. AAAI-90
Workshop on Qualitative Vision, July 29, 1990, Boston, MA.
Adelson, E.H. (1999). Lightness perception and lightness illusions, in The new cognitive
neurosciences, (M. Gazzaniga, Editor-in-chief), Cambridge, MA: MIT Press.
Anderson, B.L. (1997). A theory of illusory lightness and transparency in monocular
and binocular images: The role of contour junctions. Perception, 26: 419-453.
Anderson, B.L. (1999). Stereoscopic surface perception. Neuron, 24: 919-928.
Anderson, B.L. Stereoscopic surface perception: Contrast, disparity and perceived depth.
Submitted to Psychological Review.
Anderson, B.L., Singh, M. & Fleming, R.W. (2002). The Interpolation of Object and
Surface Structure. Cognitive Psychology, 44, 148-190.
Anderson, B.L. & Nakayama, K. (1994). Towards a general theory of stereopsis:
Binocular matching, occluding contours and fusion. Psychological Review, 101:
414-445.
Beck, J. & Ivry, R. (1988). On the role of figural organization in perceptual
transparency. Perception & psychophysics, 44: 585-594.
Bruce, V., Green, P.R. & Georgeson, M.A. (1996). Visual Perception (3rd Edition).
Hove, East Sussex, UK: Psychology Press.
Cornsweet, T.N. (1970). Visual Perception. New York: Academic Press.
DeValois, R.L. & DeValois, K.K. (1988). Spatial Vision. New York: Oxford University
Press.
Hartline, H.K. (1940). The Receptive Fields of Optic Nerve Fibres. American Journal
of Physiology, 130: 690-699.
Howard, I.P. & Rogers, B.J. (1995). Binocular vision and stereopsis., New York: Oxford
University Press.
Hubel, D.H. & Wiesel, T.N. (1962). Receptive fields, binocular interaction and
functional architecture of monkey striate cortex. Journal of Physiology, 160: 106-154.
Jones, J. & Malik, J. (1992). A computational framework for determining stereo
correspondence from a set of linear spatial filters. Image and Vision Computing,
10: 699-708.
Julesz, B. (1960). Binocular depth perception of computer generated patterns. Bell
System Technical Journal, 39: 1125-1162.
Julesz, B. (1971). Foundations of cyclopean perception., Chicago, IL: University of
Chicago Press.
Kellman, P.J. & Shipley, T.F. (1991). A theory of visual interpolation in object
perception. Cognitive Psychology, 23: 141-221.
Kellman, P.J., Yin, C. & Shipley, T.F. (1998). A common mechanism for illusory and
occluded object completion. Journal of Experimental Psychology: Human
Perception & Performance, 24: 859-869.
Koffka, K. (1935). Principles of Gestalt Psychology. Harcourt, Brace and World:
Cleveland.
Kanizsa, G. (1979/1955). Organization in Vision. New York: Praeger.
Marr, D. & Poggio, T. (1976). Cooperative computation of stereo disparity. Science,
194: 283-287.
Marr, D. & Poggio, T. (1979). A computational theory of human stereo vision.
Proceedings of the Royal Society of London (B), 204: 301-328.
Metelli, F. (1970). An algebraic development of the theory of perceptual transparency.
Ergonomics, 13: 59-66.
Metelli, F. (1974a). The perception of transparency. Scientific American, 230: 90-98.
Metelli, F. (1974b). Achromatic color conditions in the perception of transparency, in
Perception: Essays in Honor of J.J. Gibson, (R.B. MacLeod, H.L. Pick, eds.).
Ithaca, NY: Cornell University Press.
Metelli, F., da Pos, O. & Cavedon, A. (1985). Balanced and unbalanced, complete and
partial transparency. Perception & Psychophysics, 38: 354-366.
Michotte, A., Thines, G. & Crabbe, G. (1991/1964). Amodal completion of perceptual
structures, in Michotte's experimental phenomenology of perception, (G. Thines,
A. Costall, & G. Butterworth, eds.). Hillsdale, NJ: Erlbaum, pp. 140-167.
Nakayama, K., Shimojo, S. & Silverman, G.H. (1989). Stereoscopic depth: its relation
to image segmentation, grouping, and the recognition of occluded objects.
Perception, 18: 55-68.
Palmer, S.E. (1999). Vision Science. Cambridge, MA: MIT Press.
Petter, G. (1956). Nuove ricerche sperimentali sulla totalizzazione percettiva. Rivista di
Psicologia, 50: 213-227.
Pollard, S.B., Mayhew, J.E.W. & Frisby, J.P. (1985). A stereo correspondence algorithm
using a disparity gradient limit. Perception, 14: 449-470.
Prazdny, K. (1985). Detection of binocular disparities. Biological Cybernetics, 52: 93-99.
Ratliff, F. (1965). Mach Bands: Quantitative studies on neural networks in the retina.
San Francisco, CA: Holden-Day.
Ringach, D.L. & Shapley, R. (1996). Spatial and temporal properties of illusory contours
and amodal boundary completion. Vision Research, 36: 3037-3050.
Singh, M. & Anderson, B.L. (in press). Toward a perceptual theory of transparency. To
appear in Psychological Review.
Sugita, Y. (1999). Grouping of image fragments in primary visual cortex. Nature, 401:
269-272.
Shipley, T.F. & Kellman, P.J. (1992). Perception of partly occluded objects and illusory
figures: Evidence for an identity hypothesis. Journal of Experimental
Psychology: Human Perception and Performance, 18: 106-120.
Smallman, H.S. & McKee, S.P. (1995). A contrast ratio constraint on stereo matching.
Proceedings of the Royal Society of London (B), 260: 265-271.
Sperling, G. (1970). Binocular vision: A physiological and neural theory. American
Journal of Psychology, 83: 461-534.
Stoner, G.R., Albright, T.D. & Ramachandran, V.S. (1990). Transparency and coherence
in human motion perception. Nature, 344: 153-155.
Takeichi, H., Watanabe, T. & Shimojo, S. (1992). Illusory occluding contours and
surface formation by depth propagation. Perception, 21: 177-184.
Wallach, H. (1948). Brightness constancy and the nature of achromatic colors. Journal
of Experimental Psychology, 38: 310-324.
Figure Captions.
Figure 1. (a) The two eyes converge on a point P; the angle between the two lines of
sight is the vergence angle. By definition, P therefore projects to the foveae of
both eyes (P). The Vieth-Müller circle is one of the geometrical horopters, that is,
it traces a locus of points in space that project to equivalent retinal locations in
the two eyes, and thus carry no interocular disparity. Point Q is closer to the
observer than P (as it falls inside the horopter). Therefore, it projects to
different locations on the two retinae (Q). The difference in the retinal locations
of the images of Q is the binocular disparity, which can be scaled by the vergence
angle to derive depth. (b) When the visual field contains many points, there is a
potential ambiguity as to which image features correspond in the two eyes. Correct
matches yield correct depth estimates, such as dA. (c) By contrast, false matches
yield erroneous depth estimates. Here, the image of point A has been incorrectly
matched with the image of point B, leading to an incorrect depth estimate, d*.
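The geometry in panel (a) can be made concrete with a small numerical sketch. The Python fragment below is an illustration under simplifying assumptions, not a model from the chapter: both P and Q are taken to lie straight ahead on the midline, disparity is expressed as the difference between the angles that Q and P subtend at the two eyes, and the interocular separation of 0.065 m is an assumed typical value. All function and parameter names are hypothetical.

```python
import math

def vergence_angle(distance_m, interocular_m=0.065):
    """Angle (radians) subtended at the two eyes by a midline point."""
    return 2.0 * math.atan(interocular_m / (2.0 * distance_m))

def disparity(fixation_m, target_m, interocular_m=0.065):
    """Angular disparity of a midline target relative to the fixated point.

    Positive values mean the target is nearer than fixation.
    """
    return (vergence_angle(target_m, interocular_m)
            - vergence_angle(fixation_m, interocular_m))

def depth_from_disparity(disparity_rad, vergence_rad, interocular_m=0.065):
    """Recover the target's distance from its disparity and the vergence angle."""
    return interocular_m / (2.0 * math.tan((vergence_rad + disparity_rad) / 2.0))

# Fixate P at 1.0 m; Q lies at 0.8 m, inside the horopter.
alpha = vergence_angle(1.0)
delta = disparity(1.0, 0.8)
recovered = depth_from_disparity(delta, alpha)
```

The same disparity paired with a different vergence angle yields a different distance, which is why the caption describes disparity as being scaled by vergence rather than specifying depth on its own.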
Figure 2. (a) The image of a square occluding a diamond. A receptive field of limited
extent (the ellipse) captures only local information about the scene, here a vertical
luminance edge. This local information is ambiguous as many different scenes
could have resulted in the same image feature. (b) If disparity is calculated by
matching local contrasts, then the edge carries only a single disparity. However,
in this case, the light and dark sides of the edge result from two distinct objects
and therefore different depths have to be assigned to the two sides of the edge.
Figure 3. Asymmetries in depth interpolation, adapted from Takeichi et al. (1992). (a)
When the left stereopair is cross-fused, the diamonds appear to float
independently in front of the Kanizsa triangle, as schematised in (b). When the
disparity of the diamonds is inverted (by cross-fusing the right stereopair), the
diamonds drag their background with them, creating the percept of a triangular
hole, even though only the disparity of the diamonds has changed. This
asymmetrical change in surface structure can be explained by the contrast depth
asymmetry principle (see main text).
Figure 4. Adapted from Anderson et al. (2002). A contour which carries a depth signal
(e.g. disparity) is inherently ambiguous. Two main classes of world states could
have given rise to the contour: the contour could have originated from a single
continuous surface (e.g. a reflectance edge or cast shadow), or it could have
originated from an occlusion event. In the occlusion case, the border ownership
of the contour (i.e. which side is the occluder) is ambiguous. Nonetheless, in all
configurations, both sides of the contour are constrained to be at least as far as the
depth signal carried by the contour. This introduces a fundamental asymmetry in
the role of near and far contours in determining surface structure (see text for
details).
Figure 5. (a) Modal completion. Most observers report seeing a vivid white triangle in
front of three disks and a black triangular outline. The contours of the white
triangle are subjectively distinct, resembling real contours, even though there is
no corresponding image contrast, and hence the triangle is illusory. (b) Amodal
completion. Most observers report seeing a single continuous black shape, part of
which is hidden from view by the grey occluder, even though the
parts that are hidden from view are, by definition, invisible. (c) A self-splitting
object (SSO). Even though the shape is uniform black, it tends to be seen as two
forms, one in front of the other. Which form tends to complete modally, and
which amodally, depends in part on the distance that must be spanned by the
completion (Petter's law).
Figure 6. Adapted from Anderson et al. (2002). Demonstration of dependence of modal
completion on surround luminance. When the left stereopairs of (a), (b), and (c)
are cross-fused, the stripes tend to complete amodally across the gap between
the circular holes, creating the impression of a single striped surface (like
wallpaper) viewed through two apertures, as depicted in (d). This occurs
irrespective of the luminance of the surround. However, when the right
stereopairs are cross-fused, thus inverting the disparity, only two stripes appear to
complete modally, and which stripes complete depends critically on the surround
luminance, as depicted in (e). When the surround is dark, as in (a), the dark
stripes complete modally; when the surround is light, as in (b), the light stripes
complete modally; and when the surround is intermediate, as in (c), no completion is
visible. This demonstrates that modal completion is luminance dependent, whereas
amodal completion is not.
Figure 7. Adapted from Anderson et al. (2002). (a) Relative depth alters perceptual
organization. When the left stereopair is cross-fused, the figure tends to appear as
five disks occluded by five distinct image fragments, as depicted in (b); the
transparency in (b) is included only so that both depth planes can be depicted
simultaneously. When the depth ordering is reversed by cross-fusing the right
stereopair, a single irregular black star appears to lie on a continuous white
background, which is visible through five holes in a continuous overlying layer.
In this depth ordering the black shape tends to appear as figure.
Figure 8. The serrated-edge illusion, adapted from Anderson et al. (2002). When the left
stereopair in (a) is uncross-fused, the resulting percept consists of six circular
disks that are partly occluded by a jagged white surface on the right, as depicted
in (b). When the right stereopair is uncross-fused, the modal completion of these
four black blobs tends to take the form of a single wavy contour that runs
vertically down the center of the display, as depicted in (c). Although other
percepts are possible, this is an existence proof that depth inversion alone can
alter the shape of modally and amodally completed contours.
Figure 9. Perceptual transparency. The figure in (a) tends to be seen as a light grey
transparent surface in front of a bipartite background, as depicted in (b), and thus
two distinct surfaces are visible along the same line of sight. Transparency is
only seen when certain relations hold between the various regions of the display.
In (c) the central region is higher contrast than its surround and thus is not seen as
transparent. In (d), the polarity of the contrasts is reversed, and again
transparency is not seen. In (e), the contour of the underlying layer is not
continuous inside and outside the central region, eliminating the percept of
transparency. In (f), the contour of the overlying layer is not continuous, which
also reduces the percept of transparency.
Figure 10. Adapted from Beck and Ivry (1988). The polarity constraint means that
transparency manifests itself in distinctive local ordinal relations in luminance.
The only difference between the three figures is the luminance of the region of
overlap. In (a), the region is dark, and the image is bistable as either square can
be seen in front. When this occurs, a line that progressively passes from brighter
to darker regions creates a Z-shape. In (b), the overlap is intermediate, such that
the line that joins regions of decreasing brightness is C-shaped. When this
happens, exactly one of the surfaces appears transparent. In (c), the overlap is
light, creating a criss-cross pattern. In this case, neither square appears
transparent as the polarity constraint is infringed for both squares.
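The ordinal relations described for Figure 10 can be sketched in code. The Python fragment below is a minimal illustration, not the authors' model: it assumes each region of the two-square display is summarised by a single luminance value, and it treats a square as a candidate transparent layer only when the contour of the other square keeps its contrast polarity on crossing into the first square. This is a necessary condition only; it says nothing about contrast magnitudes or figural continuity. The function names, argument names, and luminance values are hypothetical.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)

def polarity_preserved(outside_pair, inside_pair):
    """True if an edge has the same non-zero contrast polarity in both places.

    Each pair holds the luminances on either side of the same contour,
    listed in the same spatial order.
    """
    s_out = sign(outside_pair[1] - outside_pair[0])
    s_in = sign(inside_pair[1] - inside_pair[0])
    return s_out != 0 and s_out == s_in

def transparency_candidates(background, square_a, square_b, overlap):
    """List the squares that could be seen as transparent under the polarity constraint."""
    candidates = []
    # A can be transparent only if B's contour keeps its polarity inside A.
    if polarity_preserved((background, square_b), (square_a, overlap)):
        candidates.append("A")
    # B can be transparent only if A's contour keeps its polarity inside B.
    if polarity_preserved((background, square_a), (square_b, overlap)):
        candidates.append("B")
    return candidates

# Illustrative luminances on a 0-1 scale (assumed values, not measured from the figure).
dark_overlap = transparency_candidates(1.0, 0.5, 0.5, 0.2)
intermediate_overlap = transparency_candidates(1.0, 0.3, 0.7, 0.5)
light_overlap = transparency_candidates(1.0, 0.5, 0.5, 0.8)
```

With these assumed values the dark overlap leaves both squares as candidates (the bistable case), the intermediate overlap leaves exactly one, and the light overlap leaves none, matching the three cases described in the caption.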
Figure 11. Scission and the perceptual organization of depth; adapted from Anderson
(1999). The top and bottom figures are identical apart from the brightness of the
surround. When the right stereopair is cross-fused, the figure appears as a single
textured plane that is visible through three circular holes. This is seen irrespective
of the luminance of the surround. However, when the disparity is reversed (by
cross-fusing the left stereopair), the texture appears to separate into two depth
planes. The near layer contains clouds that vary spatially in thickness or
opacity. Through these clouds can be seen three more distant disks, which appear
more-or-less uniform in lightness. With this depth ordering, the structure
completely reverses with a change in the luminance of the surround. In the top case, the dark
portions of the texture form the clouds; in the bottom case, the light portions of
the texture form the clouds. Scission makes these percepts possible by allowing
the visual system to separate the intermediate greys into two distinct
contributions.
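Figure 1. [Figure not reproduced. Panels (a)-(c); labels: Vieth-Müller circle, P, Q, dA, d*.]
Figure 2. [Figure not reproduced. Panels (a)-(b); labels: world, image.]
Figure 3. [Figure not reproduced. Panels (a)-(c).]
Figure 4. [Figure not reproduced. Labels: Local Image Data; matching, disparity computation; Possible depth interpretations; Continuous Surfaces; Occluding Surfaces.]
Figure 5. [Figure not reproduced. Panels (a)-(c).]
Figure 6. [Figure not reproduced. Panels (a)-(e).]
Figure 7. [Figure not reproduced. Panels (a)-(c).]
Figure 8. [Figure not reproduced. Panels (a); (b) Serrated edge near; (c) Serrated edge far.]
Figure 9. [Figure not reproduced. Panels (a)-(f).]
Figure 10. [Figure not reproduced. Panels (a)-(c).]
Figure 11. [Figure not reproduced.]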
