Introduction
The goal of depth perception is to identify the spatial layout of the objects and surfaces
that constitute our surroundings. One important observation about the world around us
that influences the way we see depth is that physical matter is not distributed randomly,
with arbitrary depths at every location. On the contrary, the environment is generally
organized: the world consists mainly of tightly bound objects in a discernible layout.
This order results from countless forces and processes in our world which tend to
organize matter into objects and place those objects in certain spatial relations. The
central thesis of this chapter is that our perception of depth mirrors this organization. We
argue that because the world consists of objects and surfaces, our perception of depth
should likewise be represented in terms of the functionally valuable units of the
environment, namely surfaces and objects.
our perception of depth into meaningful units, to emphasise the intimate relationship
between depth processing and perceptual unit formation.
In the first section we discuss how the visual system infers the layout of surfaces
from local measurements of depth. We will argue that local estimates of depth are
ambiguous, but that the geometry of occlusion critically constrains the legal
interpretations. Occlusion occurs when one opaque object partly obscures the view of a
more distant object, as happens frequently under normal viewing conditions. Occlusion
is important because it occurs at object boundaries, and therefore the depth
discontinuities introduced by occlusion provide ideal locations for the segmentation of
depth into objects. Moreover, as we will show in section 1, the geometry of occlusion
causes relatively near and relatively far depths to play different roles in the inference of
surface structure.
In the second section we discuss the visual representation of environmental
structures that are hidden from view. If the visual system is to organize depth into
meaningful bodies, it must represent whole objects and not only those fragments that
happen to be visible. In order to do this, the visual system must interpolate across gaps in
the image to complete its representation of form. We argue that by considering the
particular environmental conditions under which structures become invisible (specifically
occlusion and camouflage) we can make predictions about the mechanisms underlying
visual completion. We also discuss how visual completion influences the representation
of depth.
Finally, we discuss what happens when the scene contains transparent surfaces,
and thus multiple depths are visible along a single line of sight. We argue that this
In this section we discuss how occlusion constrains the interpretation of local depth
estimates. Specifically we show that occlusion enforces a crucial asymmetry between
relatively near and relatively distant structures that can have profound implications for
the representation of surface layout. Although the principles are discussed in terms of
binocular disparity, the fundamental logic relates to the geometry of occlusion and
therefore applies to any local estimate of depth.
conditions, the correspondence problem would be difficult as the visual system would
have to identify the one true match from among a large number of false targets.
However, there is considerable debate about what types of image features the
visual system matches to determine disparity (Julesz, 1960, 1971; Sperling, 1970; Marr
and Poggio, 1976, 1979; Pollard, Mayhew and Frisby, 1985; Prazdny, 1985; Jones and
Malik, 1992). Psychophysically, at least, it now seems unlikely that the visual system
matches raw luminances.
signals, that is, localizable variations in intensity, such as luminance edges (Anderson
and Nakayama, 1994; Smallman and McKee, 1995). This seems an almost inevitable
consequence of early visual processing, which maximises sensitivity to contrasts, rather
than to absolute luminances (Hartline, 1940; Wallach, 1948; Ratliff, 1965; Cornsweet,
1970). By the time binocular information converges in V1, the visual field appears to be
represented in terms of local measurements of oriented contrast energy (Hubel and
Wiesel, 1962; DeValois and DeValois, 1988) and thus it is likely that these are the
features from which disparity is computed.
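To make the idea concrete, the toy sketch below computes disparity from local contrast rather than from raw luminance. Each eye's one-dimensional luminance profile is converted into a signal of differences between neighbouring samples, and the horizontal shift that best aligns the two edge signals (scored by their dot product) is taken as the disparity. The function names, the single row of pixels and the single uniform shift are simplifying assumptions made for illustration; this is not a model of binocular processing in V1. Note that the uniform interior of the bar contributes nothing to the match: only its edges carry disparity.

```python
def luminance_edges(signal):
    """Local contrast signal: difference between neighbouring samples.
    Uniform regions give 0, so only edges carry information."""
    return [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]


def edge_disparity(left, right, max_shift):
    """Return the shift (in samples) that best aligns the two eyes' edge
    signals, scored by the dot product of the overlapping samples."""
    left_edges = luminance_edges(left)
    right_edges = luminance_edges(right)
    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = sum(
            left_edges[i] * right_edges[i + shift]
            for i in range(len(left_edges))
            if 0 <= i + shift < len(right_edges)
        )
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


# A bright bar on a dark background; the right eye's image is the left
# eye's image shifted two samples to the right (hypothetical values).
left = [0.2] * 10 + [0.9] * 5 + [0.2] * 10
right = [0.2] * 12 + [0.9] * 5 + [0.2] * 8
print(edge_disparity(left, right, max_shift=4))  # → 2
```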
If this is true, then the image features that carry disparity information are local
contrasts, such as luminance edges. However, this poses a problem for the visual system,
for in order to capture the functional units of the environment, the visual representation of
depth should be tied to surfaces and objects, not to local image features. There is,
therefore, a potential discrepancy between the image features that carry disparity
information (i.e. local contrasts), and the perceptual structures to which depth is assigned
(i.e. regions) in the ultimate representation of environmental layout. This discrepancy
plays a critical role in the theoretical discussion that follows.
A local image feature, such as an edge, has only one true match in the other eye's
image. Therefore, the edge carries only one disparity. However, depth is ultimately
assigned to the two regions that meet to form the edge. This results in a problem: in
order to represent surface structure the visual system must assign depth to both sides of
an edge, even though the edge carries only one disparity (see Figure 2). How does the
visual system infer the depths of two regions from each local disparity signal? We will
show that the geometry of occlusion imposes an inviolable constraint on the
interpretation of local disparity-carrying features. To anticipate, we show that the simple
fact that near surfaces can occlude more distant ones, but not vice versa, has profound
consequences for the assignment of depth to whole regions.
relative to the circular inducers; only the disparity of the diamonds changes from near to
far. This simple inversion leads to a change in surface representation that is more
complex than a simple reversal in the depth ordering of the perceptual units (as
schematised in figure 3b). When the diamonds recede, they drag their background back
with them, such that the triangle appears as a hole through which the observer can see a
white surface; the three black diamonds lie embedded in the more distant white surface.
This recession of the background has a secondary effect of increasing the strength of the
illusory contour (the border of the triangle).
The important observations with regard to the theory are the following. First,
when the diamonds are in front, they are freely floating and separate, while when they
recede, they drag the background with them. Second, when the diamonds are forward, the
Kanizsa triangle tends to be seen as a figure (rather than ground), but when the diamonds
are more distant, the triangle is seen as a hole. And yet all that changed in the display
was the disparity of the diamonds. Why does this simple reversal in depth lead to an
asymmetric change in the surface representation?
diamonds influence the appearance of the triangle? These are the asymmetries of depth
to which the following discussion pertains.
edge meet at the depth of the edge, d0. There are many surface events for which this is
the case: reflectance edges, cast shadows, and creases in the surface, to name just three.
When the feature originates from a continuous manifold, as in these cases, interpretation
is simple, as both sides of the edge are assigned the same depth, d0.
The second class of interpretations occurs when the edge corresponds to an object
boundary, and therefore represents a depth discontinuity (see figure 4). In this case, one
side of the edge lies at the depth of the occluding object, and the other side of the edge
lies at the depth of the background. Therefore, the visual system must assign different
depths to the two sides of the edge. How can the visual system assign two depths, when
it is given only one disparity, d0? The answer is that it only assigns a unique depth to the
occluding side. The critical insight is the following: The depth measurement acquired at
an occluding edge only specifies the depth of the occluding surface. The visual system
assigns depth d0 to the occluding surface. All that it knows about the other side is that it
must be more distant than the occluding surface.
untextured, then it could be at any depth behind the occluder and the local image data
would remain the same. By contrast, if the depth of the occluding surface varies, the
disparity carried by the object boundary must also change, because the occluding surface
owns the contour (Koffka, 1935; Nakayama, Shimojo and Silverman, 1989) and is
therefore responsible for the disparity associated with the edge.
Although the visual system cannot uniquely derive the depth of the occluded side
(i.e. the background) from the local disparity computation, there is one critical piece of
information that it does have, and that is that the occluded side is more distant than the
occluder. There is no way for an occluding object to be more distant than the background
that it occludes. If the background is brought closer than the object, then the background
becomes the occluding surface, and carries the edge with it. In this way, occlusion
introduces a fundamental asymmetry into the interpretation of disparity-carrying edges:
the occluded side of the edge can be at any distance greater than d0, but neither side can
be nearer than d0.
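We can summarise the possible depth assignments (from the occlusion and non-occlusion classes just described) in the form of a constraint on the interpretation of local
disparity-carrying contrasts, which is termed the contrast depth asymmetry principle (CDAP)
(Anderson, submitted; see also Anderson, Singh and Fleming, 2002): both sides of a disparity-carrying contrast must lie at least as far away as the depth d0 specified by its disparity, with at least one side lying at d0.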
Although this geometric fact is simple in form, it can have pronounced effects on
the global interpretation of images, when the constraint applies to all edges
simultaneously. We will now run through an example to show how the principle can
explain the asymmetric changes in perceived surface structure that occur when near and
far disparities are inverted.
the illusory triangle tends to be seen as figure. When the disparity is reversed, however,
the diamonds drag the background back with them, and the triangle appears as a hole.
This asymmetry in surface layout is depicted in figure 3b.
Let us first consider the case in which the diamonds appear to float in front. The
visual system has to interpret the disparity signals carried by the edges of the diamonds.
The CDAP requires that both sides of the diamonds' edges (i.e. the black inside and the
white outside of the diamonds) have to be at least as distant as the edges. Now consider
the inducers, which are more distant than the diamonds. The constraint requires both
sides of these edges to be at least as distant as their edges. This means that all of the
black interior of the inducers must be at least this distant and, more importantly, all of the
white background must be at least this distant, which is further than the disparity of the
diamonds. If all of the white background is further than the diamonds, then the edges of
the diamonds must be occluding edges, and the black interior of the diamonds must be an
occluding surface. This explains why the diamonds are seen as independent occluders,
floating in front of the large white background and black inducers: the edges of the
inducers drag the white background back, leaving the diamonds floating in front.
Now consider the case in which the diamonds are more distant than the inducers.
Again, the CDAP requires that both the inside and the outside of the diamonds have to be
at least as far back as their disparity dictates. This means that both the diamonds and
their white background are dragged back to the more distant disparity. Now consider the
inducers, which carry a relatively near disparity. Because the white background behind
the diamonds has been dragged back with the diamonds, the inducers and their white
background must be occluding surfaces. This means that the background immediately
surrounding the diamonds must be visible through a hole in the occluding surface. The
edges of this hole are the illusory contours of the Kanizsa figure. Note again that the fact
that both sides of every edge must be at least as far as the edge leads to asymmetrical
surface structures when disparities are inverted.
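The logic of this example can be sketched as a small constraint-propagation exercise. In the toy version below (all names are hypothetical), each edge is recorded with the two regions it separates and the depth specified by its disparity, with larger numbers meaning farther. Applying the CDAP to all edges simultaneously gives each region a lower bound on its depth. An edge is then forced to be an occluding edge whenever one of its sides is pushed beyond the edge's depth, in which case the other side must sit at that depth and own the edge. As a simplifying assumption, the white background is treated as a single region, so the sketch recovers which edges become occluding but not the illusory contours or the hole.

```python
def cdap_lower_bounds(edges):
    """Apply the CDAP to every edge at once.

    edges: list of (region_a, region_b, edge_depth); larger depth = farther.
    Both sides of an edge must be at least as far as the edge, so each
    region's lower bound is the largest depth among the edges bounding it.
    """
    bounds = {}
    for a, b, depth in edges:
        for region in (a, b):
            bounds[region] = max(bounds.get(region, depth), depth)
    return bounds


def occluding_edges(edges, bounds):
    """Return (occluder, occluded) pairs for edges forced to be depth steps."""
    pairs = []
    for a, b, depth in edges:
        a_pushed_back = bounds[a] > depth
        b_pushed_back = bounds[b] > depth
        if a_pushed_back and b_pushed_back:
            # Neither side can sit at the edge's depth: no legal interpretation.
            raise ValueError(f"edge {a}|{b} has no side at depth {depth}")
        if b_pushed_back:
            pairs.append((a, b))  # a sits at the edge depth and owns the edge
        elif a_pushed_back:
            pairs.append((b, a))
    return pairs


# Depths are arbitrary ordinal units (1 = nearest); the inducers sit at 2.
diamonds_near = [("diamond", "white", 1), ("inducer", "white", 2)]
diamonds_far = [("diamond", "white", 3), ("inducer", "white", 2)]

for label, edges in [("near", diamonds_near), ("far", diamonds_far)]:
    bounds = cdap_lower_bounds(edges)
    print(label, bounds, occluding_edges(edges, bounds))
```

Inverting the disparities does not simply swap the depth order. With near diamonds, the diamond edges become occluding edges and the inducer edges do not; with far diamonds, the pattern flips, and the white region beside the diamonds is pushed behind the inducers.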
This is just one example that shows how the CDAP can account for asymmetrical
effects of relatively near and relatively far disparities on perceived surface layout;
because the CDAP is derived from the geometry of occlusion, it can account for a very
large number of displays, and can be used to generate surprising new displays (see
Anderson, 1999; Anderson, submitted).
The central thesis of this chapter is that the visual system does not merely record depth at
each location in the visual field; rather, it actively organizes its depth measurements into
functionally valuable units. In the last section, we discussed how occlusion plays a key
role in this organization. In this section, we discuss how the visual system handles what
is arguably the hardest problem posed by occlusion: the visual representation of
structures that are hidden and are therefore completely invisible. If seeing depth is about
representing the actual layout of objects in the environment, then all portions of the
objects must be represented, even those that are hidden from view: hidden portions do not
disappear from the environment just because they do not appear in the image. Therefore,
the visual system has to go beyond local image data to construct representations of
hidden structures. We will now discuss how the environmental conditions of occlusion
and camouflage predict properties of the construction process.
the region of missing data. This is referred to as modal completion (Michotte, Thines
and Crabbe, 1991/1964) because the experience is of the same phenomenal modality as
ordinary visual experience. An illusory contour, for example, is crisp, and subjectively
similar to a real contour, as can be seen in figure 5a. In contrast to this, the sense of
completion experienced with occluded structures is less distinct. The black form in
figure 5b tends to be seen as a single object, part of which is hidden, rather than as two
distinct objects, whose boundaries coincide with the boundary of the grey occluder.
There is a compelling sense that the two visible portions of the black form belong to the
same object, and that that object continues in the space behind the occluder. However,
this impression, although visual in origin, is not of the same phenomenal mode as normal
and modal contours, and is therefore referred to as amodal completion (Michotte et al.).
In general, the visible image regions that lead to visual completion are referred to as
inducers.
The debate runs roughly as follows. On the one hand there has been the strong
claim that a single completion mechanism is responsible for both modal and amodal
completion.
completion) produces perceptual units, and an independent process places those units in
depth. The theory states that psychological differences between modal and amodal
completion result from the final depth ordering of the completed forms (Kellman and
Shipley, 1991; Shipley and Kellman, 1992; Kellman, Yin and Shipley, 1998) rather than
a difference between the completion processes themselves.
identity hypothesis. On the other hand the two processes could be largely independent,
subject to different constraints and subserved by distinct neural mechanisms. The strong
form of this dual mechanism hypothesis would be that the two processes are of a
fundamentally different kind, for example, that modal completion is largely data-driven,
while amodal completion is essentially cognitive. To anticipate, although we do not
subscribe to the strongest form of the dual-mechanism hypothesis, we will provide
evidence that modal and amodal completion follow different constraints and argue that
they are subserved by distinct neural processes. Central to the arguments that we present
are the geometric and photometric conditions under which occlusion and camouflage
actually occur in the environment.
The principal evidence for the identity hypothesis has been that subjects perform
similarly with modally and amodally completed figures in a variety of tasks. In one task,
Shipley and Kellman (1992) varied the spatial alignment of the inducing elements in both
modally and amodally completed squares. Such misalignment is known to weaken the
sense of completion, as the completed boundary is forced to undergo an inflection.
Subjects were asked to rate the subjective strength of visual completion as a function of
the degree of misalignment for modal and amodal versions of the display. Shipley and
Kellman (1992) found that ratings declined at the same rate as a function of misalignment
for both modal and amodal figures. This has been interpreted as evidence that a single
mechanism is responsible for both forms of completion.
Using a more rigorous method, Ringach and Shapley (1996) performed a shape
discrimination task with modal and amodal versions of a Kanizsa figure. By rotating the
inducing elements, the vertical contours of the completed square can be made to bow out
(creating a Fat Kanizsa), or curve in (creating a Thin Kanizsa). Subjects were asked
to discriminate between Fat and Thin versions of the display while the angle through
which the inducers were rotated was varied.
discrimination performance as a function of rotation was nearly identical for modal and
amodal versions of the display, a finding which is consistent with the identity hypothesis.
One problem with this type of evidence is that it relies on negative results, that is,
a failure to detect a difference, which could be due to the method rather than a
fundamental property of the system being studied. Should positive evidence be provided
that modal and amodal completion are subject to different constraints, or result in
different perceptual units, then the identity hypothesis would no longer be tenable.
There are two major reasons for believing that modal and amodal completion
should be subject to different constraints, both of which are related to the environmental
conditions under which occlusion and camouflage occur. First, occlusion occurs over
greater distances across images because it only requires that one object is in front of
another. Camouflage, on the other hand, requires a perfect match in color between the
near surface and its background, and thus occurs less frequently in general. This
difference is reflected in a constraint on the image distances over which modal and
amodal completion occur, which was first documented by Petter (1956). Petter used a
class of stimuli now known as spontaneously splitting objects (SSOs), which consist of a
single homogeneously colored shape, such as the one shown in figure 5c, that tends to be
interpreted as two independent shapes, one behind the other. Which object is seen in
front tends to oscillate with prolonged viewing. However, which shape is seen in front
first, and which tends to be seen in front for a greater proportion of the time can be
predicted rather well from the lengths of the contours that must be interpolated. Petter's
rule states that longer contours tend to be completed amodally, while shorter contours
tend to be completed modally. Thus, which figure is seen in front can be predicted from
the length of the contours that must be completed. If the two types of completion are
subject to different constraints on the distances over which they occur, this opens the
possibility that they are subserved by different mechanisms.
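Petter's rule can be stated as a one-line prediction. In the sketch below (the function name and the gap lengths are hypothetical), each candidate figure of a spontaneously splitting object is paired with the total length of contour that would have to be completed modally if that figure were seen in front; the rear figure's contours are then completed amodally. Because shorter contours tend to be completed modally, the figure needing the least interpolation is predicted to appear in front. This captures only the ordinal prediction, not the oscillation or the proportion of viewing time.

```python
def petter_front(interpolated_lengths):
    """Toy version of Petter's rule for a spontaneously splitting object.

    interpolated_lengths maps each candidate figure to the total length of
    contour (in any consistent unit) that would be completed modally if that
    figure were seen in front. Shorter contours tend to be completed modally,
    so the figure with the shortest length is predicted to be in front.
    Ties go to the first figure listed.
    """
    return min(interpolated_lengths, key=interpolated_lengths.get)


# Hypothetical gap lengths, in degrees of visual angle.
print(petter_front({"figure_A": 1.5, "figure_B": 4.0}))  # → figure_A
```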
A second reason for believing that modal and amodal completion are subject to
different constraints relates to the color conditions that are required for occlusion and
camouflage to occur. Again, occlusion can happen between objects of any color. The
reflectance of the near object is unrelated to the fact that it hides the more distant one
from view. Camouflage, by contrast,
requires a perfect match in luminance between the near and far surfaces. This implies that
modal completion should be sensitive to the luminance relations between the image
regions involved.
Recent experimental work has shown that this luminance sensitivity can lead to
large differences between modal and amodal displays (Anderson, Singh and Fleming, 2002).
Anderson et al. created displays consisting of two vertically separated circles filled with
light and dark stripes, as shown in figure 6. The binocular disparity of the circles was
kept constant, but the disparity of the light/dark contours inside the circles was altered to
place the stripes behind or in front of the circular boundaries. When the stripes were
further than the circles, the top and bottom stripes tended to complete amodally to form a
single continuous dark and light surface, which appeared to be visible through two
circular holes, as schematised in figure 6d. This percept occurred irrespective of the
luminance of the region surrounding the circles.
By contrast, when the disparity placed the contours in front of the circles, the dark
and light stripes separated into different depth planes. The way in which the stripes
separated from one another depended on the luminance of the surround. When the
surround was the same color as the light stripes, the light stripes appeared to float in front
and completed modally across the gap between the two circles. In this condition, the
dark stripes completed amodally underneath the light stripes to form complete circles.
This led to an impression of light vertical stripes in front of dark circles, as schematised
in figure 6e. However, when the surround was the same luminance as the dark stripes,
the percept inverted, such that the dark stripes appeared to float in front of light disks.
This demonstrates a fundamental dependence on luminance that was not present in the
amodal version of the display. Furthermore, if the surround was an intermediate grey,
then the display was not consistent with camouflage, as neither the light nor the dark
stripes perfectly matched the luminance of the background. Under these conditions, there
was no modal completion across the gap, and the percept was difficult to interpret. This
demonstrates that modal completion is sensitive to luminance relations, while amodal
completion is not.
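The pattern of results for the near-disparity (modal) condition can be summarised as a toy camouflage rule: a stripe can complete modally across the gap only if its luminance matches the surround, since only then could the near surface be camouflaged against it. The sketch below assumes luminances on a 0 to 1 scale and a hypothetical matching tolerance; both are illustrative choices, not values from Anderson et al. The amodal condition needs no such rule, since its percept did not depend on the surround.

```python
def modal_front_layer(light, dark, surround, tolerance=0.02):
    """Toy camouflage rule for the near-disparity stripe display.

    Luminances are on a 0-1 scale (an assumption). Returns which stripes
    are predicted to float in front and complete modally across the gap,
    or None if neither matches the surround (no modal completion).
    """
    if abs(light - surround) <= tolerance:
        return "light"  # light stripes in front of dark circles
    if abs(dark - surround) <= tolerance:
        return "dark"  # dark stripes in front of light disks
    return None  # intermediate grey surround: no camouflage, no completion


print(modal_front_layer(0.9, 0.1, surround=0.9))  # → light
print(modal_front_layer(0.9, 0.1, surround=0.1))  # → dark
print(modal_front_layer(0.9, 0.1, surround=0.5))  # → None
```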
Anderson et al. showed that this luminance sensitivity could affect performance
on basic visual tasks such as vernier acuity. The stripes in the top and bottom circles can
be horizontally offset (i.e. misaligned slightly), without destroying the sense of
completion. Subjects were asked to report in which of two displays the contours were
slightly misaligned. Both modal and amodal completion facilitate performance in this
task. However, in the amodal case performance was unaffected by the luminance of the
surround, while in the modal case, performance was much worse when the luminance of
the surround was an intermediate grey (the condition in which the stripes do not complete
across the gap). Thus, modal and amodal completion are subject to different constraints,
both on the distance over which they occur, and the luminance conditions that are
required to induce them. This positive evidence for a difference between modal and
amodal completion uses essentially the same types of task as the negative evidence that
had previously been used to support the identity hypothesis.
lead to different shapes. This is important as it shows that unit formation is intimately
bound to the placement of structures in depth.
The greater promiscuity of amodal completion is the key in the generation of
these displays. Figure 7 is a recently developed stereoscopic variant of the Kanizsa
configuration in which the inducing elements are rotated outwards (Anderson et al.,
2002). When the straight segments (the mouths of the pacmen) are placed in front of
the circular portions of the inducers, the impression is of 5 independent illusory
fragments that float in front of 5 black disks on a white background. However, when the
two eyes' views are interchanged, and thus the straight contours are placed behind the
circular segments, the impression is rather dramatically altered.
that runs down the center of the display. This percept is schematised in figure 8c. The
importance of this demonstration is that it shows that modal and amodal completion can
not only result in different surface structures, but even in differently shaped contours. It
is difficult to see what the concept of a single completion mechanism serves to explain if
the two processes can result in different completed forms.
Ultimately, the identity hypothesis is a claim about mechanism and can therefore
be assessed physiologically. There is a considerable body of evidence for extrastriate
units that are sensitive to illusory, but not to amodally-completed, contours (see chapter
[chapter number for von der Heydt], this volume, for a review). A critical additional
piece of evidence was provided recently by Sugita (1999), who found cells in V1 that
respond to amodal completion across their receptive fields, but not to modal completion.
Cells responded weakly when presented with two unconnected edges; holes and
occluding surfaces on their own; and stimuli in which two unconnected edges were
separated by a hole. However, when the cells were presented with two edge fragments
separated by an occluder (a stimulus that leads to amodal completion of the edge), the
cells responded vigorously. This shows that at the earliest stages of cortical processing,
there is a double dissociation between the representations of modal and amodal
structures, a conclusion which supports the dual mechanism hypothesis.
important as it determines the conditions under which the visual system scissions a single
image intensity into multiple layers, and thus how the visual system stratifies its
representation of depth.
Broadly, the conditions required for perceptual scission fall into two classes. The
first are the photometric conditions for transparency, which detail the relations between
the light intensities of neighbouring regions that are necessary for scission. The second
set of conditions for perceptual scission are geometrical, or figural. Depth only separates
into layers when these relations hold between the various regions of the display.
display can be seen as a bipartite display seen through a transparent filter with a
diamond-shaped hole in the centre.
The intuition behind the second luminance constraint, which we refer to as the
polarity constraint, is that a transparent medium cannot alter the contrast polarity of the
structures visible through it. Put another way, if a dark-light edge passes underneath a
transparent medium, the dark side will remain darker than the light side, no matter what
the absolute luminances are. As can be seen from Figure 9d, infringement of this
constraint prevents perceptual scission, demonstrating that the visual system respects this
optical outcome of transparency. This constraint is particularly important in determining
the depth ordering in transparent displays.
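Because the polarity constraint is ordinal, it can be checked at any single edge from the signs of two luminance differences. The sketch below assumes a bipartite edge with luminances a and b outside the candidate transparent region and p and q for the same two sides seen through it (p continuing a, q continuing b); the function names and luminance values are hypothetical. It checks polarity only and ignores any constraint on the magnitude of the contrast change.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def polarity_preserved(a, b, p, q):
    """Check the polarity constraint at one edge.

    a, b: luminances on the two sides of an edge outside the candidate
    transparent region; p, q: the same two sides seen through it
    (p continues a, q continues b). Scission is only possible if the
    edge keeps its contrast polarity under the transparent region.
    """
    return sign(p - q) == sign(a - b)


# Hypothetical luminances for a dark-light edge partly covered by a filter.
print(polarity_preserved(a=20, b=80, p=35, q=50))  # dark side stays darker
print(polarity_preserved(a=20, b=80, p=50, q=35))  # polarity reversed
```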
The polarity constraint enforces certain restrictions on the ordinal relationships
between the luminances of neighbouring regions. This means that, in principle, we can
classify the locations where neighbouring regions meet to determine whether scission is
or is not possible in each region. This provides the visual system with a local signature
of transparency. Beck and Ivry (1988) noted that if one draws a series of lines running
progressively from the brightest to the darkest regions, there are three possible shapes
that result, as shown in figure 10. The only difference between the three figures is the
luminance of the region of overlap between the two squares. In the first instance (Figure
10a), the image is bistable, as either square can be seen as a transparent overlay. In these
circumstances the lines linking regions of increasing luminance form a Z-configuration.
When the lines form a C-shape (Figure 10b), only one of the squares is seen as
transparent, and when the lines criss-cross (Figure 10c), the polarity constraint is
infringed for all regions, and neither square scissions. Adelson and Anandan (1990)
lightness. However, Metelli himself noted that dark filters tend to look more transparent
than light filters with the same transmittance. Why does the visual system confuse
lightness and transmittance in partitioning luminance between two depths?
In a series of matching experiments, Singh and Anderson (in press) recently
resolved this issue. Subjects adjusted the opacity of one filter until it matched the
perceived opacity of another filter with a different lightness. Singh and Anderson found
that perceived transmittance is predicted almost perfectly by the ratio of Michelson
contrasts inside and outside the transparent region, even though such a measure is
actually inconsistent with the optics of transparency. As discussed above, there is a
general consensus that early visual processing tends to optimise sensitivity to
contrast, rather than absolute luminance. Hence, in assigning transmittance, the visual
system appears to use the readily available contrast measurements, even though they are
not strictly accurate measurements of opacity.
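The discrepancy between the two measures can be illustrated numerically. The sketch below assumes Metelli's episcotister model as the optics of transparency, in which a layer with transmittance alpha and its own luminance t turns a background luminance a into p = alpha * a + (1 - alpha) * t, so that alpha = (p - q) / (a - b). It compares this alpha with the ratio of Michelson contrasts inside and outside the transparent region for a dark and a light filter of equal transmittance. The luminance values are hypothetical. Both filters have alpha = 0.5, but the dark filter yields a contrast ratio of 1.0 and the light filter about 0.333, which is in line with Metelli's observation that dark filters tend to look more transparent.

```python
def michelson(l1, l2):
    """Michelson contrast of two luminances."""
    return abs(l1 - l2) / (l1 + l2)


def episcotister(background, alpha, t):
    """Metelli-style model: luminance seen through a layer with
    transmittance alpha and layer luminance t."""
    return alpha * background + (1 - alpha) * t


def metelli_alpha(a, b, p, q):
    """Transmittance implied by the episcotister model."""
    return (p - q) / (a - b)


def contrast_ratio(a, b, p, q):
    """Michelson contrast inside the transparent region over contrast outside."""
    return michelson(p, q) / michelson(a, b)


a, b = 80.0, 20.0  # hypothetical luminances of the bipartite background
for name, t in [("dark filter", 0.0), ("light filter", 100.0)]:
    p, q = episcotister(a, 0.5, t), episcotister(b, 0.5, t)
    print(name, metelli_alpha(a, b, p, q), round(contrast_ratio(a, b, p, q), 3))
```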
continuation of the transparent layer. Figure 9f shows that infringement of this condition
weakens or eliminates the percept of transparency.
There are conditions in which the figural cues to transparency are so strong that
they can override the luminance cues. Beck and Ivry (1988) showed subjects displays
like the one shown in figure 10c, in which the region of overlap between the two figures
is the wrong contrast polarity for either figure to be seen as transparent. Despite this,
naïve subjects did occasionally report seeing such figures as transparent, demonstrating
that the sense of figural overlap is a central aspect of the percept of transparency.
Certainly most observers are willing to agree that the region of overlap in Figure 10c
appears to belong to two figures simultaneously, an impression that can be enhanced with
stereo and relative motion. However, it should be noted that the grey of the overlap
region does not appear to scission into two distinct sources, at least not in the same way
as the overlap of a normal transparency display does (as in Figures 10a and 10b). This
leads to the possibility of two distinct neural processes in the perception of transparency.
One is driven by relatively local cues and leads to phenomenal color scission. The other
is driven by more global geometrical relations, and leads to stratification in depth. Under
normal conditions of transparency, the two processes operate concinnously to produce the
full impression of transparency. However, using carefully designed cue-conflict stimuli,
such as those used by Beck and Ivry, these two factors in the representation of transparent
surfaces can be distinguished. An open question, however, is how these processes are
instantiated neurally. All we can conclude is that the representation of depth is much
more sophisticated than a mere 2D map of depth values.
luminance can cause changes in depth stratification. Figure 11 (taken from Anderson,
1999) demonstrates this close relationship between luminance, scission and the
perceptual organization of depth. Three circular patches of a random texture were placed
on a uniform background. Critical to the demonstration is that disparity is introduced
between the circular boundaries and the texture inside the circles. When the disparity
places the texture behind the circular boundaries, the circles appear as holes, through
which the texture is visible.
continuously, stochastically varying lightness. However, when the disparity places the
texture in front of the circular boundaries, the percept changes considerably. The texture
separates into two distinct layers: a near layer made of clouds with spatially varying
transmittance, and a far layer that is visible through the clouds, which consists of uniform
disks on a uniform background.
Another interesting property of this display is that the lightness and spatial
structure of the clouds and disks reverse completely when the luminance of the surround
is changed. In Figure 11, the top and bottom displays are identical except for the
lightness of the surround. When the surround is dark, the texture scissions into dark,
smoke-like clouds in front of white disks. However, when the surround is white, it is the
light portions of the texture that move forward, floating like mist in front of dark disks.
One final observation about the display is that when the texture carries near disparity, and
thus undergoes scission, the clouds that float in front tend to complete modally across
the gaps between the disks. This is partly because the conditions for
camouflage are satisfied, as discussed in section 2.
When the depth is reversed in the display, two asymmetries occur. The first is
geometrical in that it alters the structure of the depths in the scene. In the near case the
texture scissions into two layers, while in the far case the texture appears relatively
uniform in depth by comparison. The second asymmetry that occurs with depth inversion
is photometric in that it is driven by the luminance of the surround and determines the
lightness of the cloud and disks. When the texture is distant, the percept changes very
little with changes to the luminance of the surround; by contrast, when the texture is near,
the luminance of the surround critically determines how the scission occurs as well as the
lightness of the cloud and disks.
ownership of the boundaries of the disks, which carry relatively near disparity. If the
insides of the disks (i.e. the texture) carry far disparity, then the outsides (i.e. the region
surrounding the disks) must be at the depth carried by the circular boundary. Thus, the
circles are seen as holes in the surrounding surface; it is through these holes that the
texture is visible.
The situation is more complex when the depth is reversed, i.e. when the contrasts
of the texture are nearer than the contrast of the circular boundaries. Crucial to the
following argument is that it is contrasts that carry disparity, while it is the light and dark
regions that make up the contrasts to which depth is assigned. First let us consider the
circular boundary between the surround and the texture. When the surround is light, it is
the dark portions of the texture (inside the circles) that contrast with the surround. Thus,
the disparity of the circular boundary is carried by the contrast between the light matter of
the surround, and the dark portion of the disk. The CDAP requires both of these regions
to be at least as distant as the depth signalled by the boundary's disparity. This means that the light
surround is dragged back to this depth, and the dark matter of the texture is also dragged
back to this depth. Now consider the contrasts between the dark and light portions within
the texture. These contrasts carry relatively near disparity. But the contrast between the
dark matter and the surround has already constrained the dark matter to be at least as
distant as the circular boundary. This means that it must be the light matter of the texture
that is responsible for the near disparity of the texture; i.e., the light matter is a near
surface that partly obscures the dark matter. This explains why the texture splits into two
depths: the dark matter is dragged back by forming a contrast that carries far disparity
(i.e. the boundary of the disk) and the light matter floats in front as its boundaries with
the dark matter carry near disparity.
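The logic of the preceding argument can be made concrete as a small constraint-propagation exercise. In the sketch below, which is our own simplification rather than a model proposed in this chapter, the texture is reduced to just two regions (dark and light matter), depths are arbitrary numbers with larger values meaning farther, and each region is placed at the nearest depth the CDAP allows: the farthest depth carried by any contrast it helps to form. The region names, the NEAR and FAR values and the function names are hypothetical.

```python
from collections import defaultdict

# One contrast: the two regions that meet to form it, and the depth signalled
# by its disparity (larger value = farther from the observer).
Edge = tuple[str, str, float]


def assign_depths(edges: list[Edge]) -> dict[str, float]:
    """Place each region at the nearest depth the CDAP allows (hypothetical helper).

    Both regions forming a contrast must be at least as far as the depth carried
    by that contrast, so a region's depth is bounded below by the farthest depth
    among all contrasts it takes part in; here we simply take that bound.
    """
    depth: dict[str, float] = defaultdict(lambda: float("-inf"))
    for region_a, region_b, edge_depth in edges:
        for region in (region_a, region_b):
            depth[region] = max(depth[region], edge_depth)
    return dict(depth)


def all_edges_explained(edges: list[Edge], depth: dict[str, float]) -> bool:
    """Check that every contrast has at least one side lying at its own depth,
    i.e. some surface actually sits at the depth the contrast signals."""
    return all(min(depth[a], depth[b]) == d for a, b, d in edges)


NEAR, FAR = 1.0, 2.0  # arbitrary depth values for illustration
cases = {
    "texture far (holes)": [("surround", "dark", NEAR), ("dark", "light", FAR)],
    "texture near, light surround": [("surround", "dark", FAR), ("dark", "light", NEAR)],
    "texture near, dark surround": [("surround", "light", FAR), ("dark", "light", NEAR)],
}
for name, edges in cases.items():
    depth = assign_depths(edges)
    print(name, depth, all_edges_explained(edges, depth))
```

Under these assumptions, the far-texture case puts the surround at the near depth and both texture regions at the far depth, so the texture is uniform in depth and seen through holes. In the near-texture cases the texture splits: the light matter comes forward when the surround is light, and the dark matter comes forward when the surround is dark.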
The final logical step in the explanation involves scission. The texture does not
consist of only two luminances, but of a continuous range of luminances from light to
dark. How can we explain the appearance of the intermediate luminances in the texture?
Scission makes it possible to separate the intermediate luminances into two distinct
components: dark stuff, and light stuff, which have been compressed into a single
luminance by the process of projection onto the retina. These two components lie in
different depth planes. Put another way, scission allows the visual system to interpret the
grey regions as dark matter viewed through light matter. The critical insight is that it is
the dark stuff in the texture that forms the contrast with the surround. Therefore, all of
the dark stuff belongs to the more distant depth, including the dark stuff in the greys.
All of the remaining lightness in the greys belongs to the transparent clouds that float
in front of the disks. In this way, the intermediate luminances are interpreted as varying
degrees of transmittance of the overlying layer. The lighter the grey, the thicker the
cloud; the darker, the sparser. This explains why the disk appears as a uniform black
disk: all of the black is sucked out of intermediate regions and is dragged back to form
the disk. The left-over lightness is attributed to the transparent clouds.
The whole argument reverses when we change the surround from light to dark.
When the surround is dark, it is the light portions of the texture that contrast with the
surround, and therefore, it is the light portions of the texture that are dragged back. The
near disparity of the texture must therefore be due to the dark regions, and thus dark
clouds are seen to float in front of white disks. Again, as it is the whiteness of the texture
that is dragged back, all of the whiteness in the intermediate luminances is attributed to
the more distant disks. The remaining darkness in the greys is attributed to the dark
clouds that float in front. In this way, changing the luminance of the surround changes
which contrasts carry the disparities, and thus which regions are dragged back by virtue
of the CDAP. Scission enables the visual system to separate luminances into multiple
contributions and thus segment the intermediate greys into two distinct depth planes.
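The scission step can also be sketched numerically. The chapter does not commit to a particular formula, so the code below assumes a simple linear mixture in which each grey in the texture is a weighted average of the far disk and the overlying cloud, with the weight read as the cloud's opacity. The function name and the 0 to 1 luminance scale are hypothetical. Whichever extreme of the texture contrasts with the surround is assigned to the far disk; whatever remains is attributed to the cloud.

```python
def scission(grey: float, surround_is_light: bool,
             dark: float = 0.0, light: float = 1.0) -> tuple[float, float, float]:
    """Split one texture luminance into (far-disk luminance, cloud luminance, cloud opacity).

    Hypothetical helper based on an assumed linear mixture model:
        grey = opacity * cloud + (1 - opacity) * far
    """
    if not dark < light:
        raise ValueError("dark must be lower than light")
    if not dark <= grey <= light:
        raise ValueError("grey must lie between the dark and light extremes")
    # The extreme that contrasts with the surround is dragged back to form the
    # uniform far disk; the opposite extreme is attributed to the near cloud.
    far, cloud = (dark, light) if surround_is_light else (light, dark)
    opacity = (grey - far) / (cloud - far)
    return far, cloud, opacity


for g in (0.1, 0.5, 0.9):
    print(g, scission(g, surround_is_light=True), scission(g, surround_is_light=False))
```

Under this assumed model, with a light surround a grey of 0.25 becomes a fairly thin light cloud (opacity 0.25) over a black disk, whereas with a dark surround the same grey becomes a denser dark cloud (opacity 0.75) over a white disk.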
This demonstration and others like it are important as they show how multiple
processes interface to determine our percepts of depth and material quality. It is through
the CDAP and scission that the visual system interprets local variations in luminance as
meaningful surfaces located in depth.
segmentation as an important process through which the visual system organizes its
representation of depth into ecologically valid structures.
Conclusions
It is common to think that depth perception involves little more than determining the
depth at each location in the visual field. We have argued, to the contrary, that the visual
system mirrors the structural organization of the environment by tying its representation
of depth to surfaces and objects. Thus depth perception is an active process of perceptual
organization, as well as a passive process of acquiring depth estimates. We have argued
that luminance, disparity and contrast are some of the basic image features that carry
local information about depth, while scission, visual completion and the CDAP are some
of the means by which depth is organized into surfaces.
In the first section we introduced the CDAP and argued that:
(1) disparity is carried by local contrasts (e.g. luminance edges) but assigned to
the regions that meet to form the contrasts.
(2)
(3)
Finally, in the third section, we discussed how scission allows the visual system to
represent two depths along the same line of sight, and thus organize depth into layers. We
argued that:
(1) Certain luminance and figural relations must obtain in order for a region to
undergo scission.
(2) Scission can have pronounced effects on perceptual organization in regions
distant from the local signatures of transparency.
References
(1988).
Hartline, H.K. (1940). The receptive fields of optic nerve fibres. American Journal
of Physiology, 130: 690-699.
Howard, I.P. & Rogers, B.J. (1995). Binocular vision and stereopsis. New York: Oxford
University Press.
Hubel, D.H. & Wiesel, T.N. (1962).
correspondence from a set of linear spatial filters. Image and Vision Computing,
10: 699-708.
Julesz, B. (1960). Binocular depth perception of computer generated patterns. Bell
System Technical Journal, 39: 1125-1162.
Julesz, B. (1971). Foundations of cyclopean perception. Chicago, IL: University of
Chicago Press.
Kellman, P.J. & Shipley, T.F. (1991).
Journal of Experimental
Figure Captions.
Figure 1. (a) The two eyes converge by angle on a point P. Therefore, by definition,
P projects to the foveae of both eyes (P). The Vieth-Müller circle is one of the
geometrical horopters, that is, it traces a locus of points in space that project to the
equivalent retinal locations in the two eyes, and thus carry no interocular
disparity. Point Q is closer to the observer than P (as it falls inside the horopter).
Therefore, it projects to different locations on the two retinae (Q).
The
Figure 2. (a) The image of a square occluding a diamond. A receptive field of limited
extent (the ellipse) captures only local information about the scene, here a vertical
luminance edge. This local information is ambiguous as many different scenes
could have resulted in the same image feature. (b) If disparity is calculated by
matching local contrasts, then the edge carries only a single disparity. However,
in this case, the light and dark sides of the edge result from two distinct objects
and therefore different depths have to be assigned to the two sides of the edge.
Figure 3. Asymmetries in depth interpolation, adapted from Takeichi et al. (1990). (a)
When the left stereopair is cross-fused, the diamonds appear to float
independently in front of the Kanizsa triangle, as schematised in (b). When the
disparity of the diamonds is inverted (by cross-fusing the right stereopair), the
diamonds drag their background with them, creating the percept of a triangular
hole, even though only the disparity of the diamonds has changed.
This
Figure 4. Adapted from Anderson et al. (2002). A contour which carries a depth signal
(e.g. disparity) is inherently ambiguous. Two main classes of world states could
have given rise to the contour: the contour could have originated from a single
continuous surface (e.g. a reflectance edge or cast shadow), or it could have
originated from an occlusion event. In the occlusion case, the border ownership
of the contour (i.e. which side is the occluder) is ambiguous. Nonetheless, in all
configurations, both sides of the contour are constrained to be at least as far as the
depth signal carried by the contour. This introduces a fundamental asymmetry in
the role of near and far contours in determining surface structure (see text for
details).
Figure 5. (a) Modal completion. Most observers report seeing a vivid white triangle in
front of three disks and a black triangular outline. The contours of the white
triangle are subjectively distinct, resembling real contours, even though there is
no corresponding image contrast, and hence the triangle is illusory.
(b)
This occurs
stereopairs are cross-fused, thus inverting the disparity, only two stripes appear to
complete modally, and which stripes complete depends critically on the surround
luminance, as depicted in (e). When the surround is dark, as in (a), the dark
stripes complete modally; when the surround is light, as in (b), the light stripes
complete modally; and when the surround is intermediate, no completion is
visible. This demonstrates that modal completion is luminance dependent, while
amodal completion is not.
Figure 7. Adapted from Anderson et al. (2002). (a) Relative depth alters perceptual
organization. When the left stereopair is cross-fused, the figure tends to appear as
five disks occluded by five distinct image fragments, as depicted in (b); the
transparency in (b) is included only so that both depth planes can be depicted
simultaneously. When the depth ordering is reversed by cross-fusing the right
stereopair, a single irregular black star appears to lie on a continuous white
background, which is visible through five holes in a continuous overlying layer.
In this depth ordering the black shape tends to appear as figure.
Figure 8. The serrated-edge illusion, adapted from Anderson et al. (2002). When the left
stereopair in (a) is uncross-fused, the resulting percept consists of six circular
disks that are partly occluded by a jagged white surface on the right, as depicted
in (b). When the right stereopair is uncross-fused, the modal completion of these
four black blobs tends to take the form of a single wavy contour that runs
vertically down the center of the display, as depicted in (c). Although other
percepts are possible, this is an existence proof that depth inversion alone can
alter the shape of modally and amodally completed contours.
Figure 9. Perceptual transparency. The figure in (a) tends to be seen as a light grey
transparent surface in front of a bipartite background, as depicted in (b), and thus
two distinct surfaces are visible along the same line of sight. Transparency is
only seen when certain relations hold between the various regions of the display.
In (c) the central region is higher contrast than its surround and thus is not seen as
transparent.
continuous inside and outside the central region, eliminating the percept of
transparency. In (f), the contour of the overlying layer is not continuous, which
also reduces the percept of transparency.
Figure 10. Adapted from Beck and Ivry (1988). The polarity constraint means that
transparency manifests itself in distinctive local ordinal relations in luminance.
The only difference between the three figures is the luminance of the region of
overlap. In (a), the region is dark, and the image is bistable as either square can
be seen in front. When this occurs, a line that progressively passes from brighter
to darker regions creates a Z-shape. In (b), the overlap is intermediate, such that
the line that joins regions of decreasing brightness is C-shaped.
When this
happens, exactly one of the surfaces appears transparent. In (c), the overlap is
light, creating a criss-cross pattern.
Figure 11. Scission and the perceptual organization of depth; adapted from Anderson
(1999). The top and bottom figures are identical apart from the brightness of the
surround. When the right stereopair is cross-fused, the figure appears as a single
textured plane that is visible through three circular holes. This is seen irrespective
of the luminance of the surround. However, when the disparity is reversed (by
cross-fusing the left stereopair), the texture appears to separate into two depth
planes. The near layer consists of clouds that vary spatially in thickness or
opacity. Through these clouds can be seen three more distant disks, which appear
more-or-less uniform in lightness.
completely reverses with a change in the luminance of the surround. In the top case, the dark
portions of the texture form the clouds; in the bottom case, the light portions of
the texture form the clouds. Scission makes these percepts possible by allowing
the visual system to separate the intermediate greys into two distinct
contributions.
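Figure 1. [Image not reproduced; panels (a) to (c), with labelled points P and Q.]
Figure 2. [Image not reproduced; panels (a) and (b), labelled "world" and "image".]
Figure 3. [Image not reproduced; panels (a) to (c).]
Figure 4. [Image not reproduced; labels "Occluding Surfaces" and "matching, disparity computation".]
Figure 5. [Image not reproduced; panels (a) to (c).]
Figure 6. [Image not reproduced; panels (a) to (e).]
Figure 7. [Image not reproduced; panels (a) to (c).]
Figure 8. [Image not reproduced.]
Figure 9. [Image not reproduced; panels (a) to (f).]
Figure 10. [Image not reproduced; panels (a) to (c).]
Figure 11. [Image not reproduced.]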