
Axiomathes (2005) 15: 399–486
DOI 10.1007/s10516-004-5445-y
© Springer 2005

DHANRAJ VISHWANATH

THE EPISTEMOLOGICAL STATUS OF VISION AND ITS IMPLICATIONS FOR DESIGN

ABSTRACT. Computational theories of vision typically rely on the analysis of two aspects of human visual function: (1) object and shape recognition, and (2) co-calibration of sensory measurements. Both these approaches are usually based on an inverse-optics model, where visual perception is viewed as a process of inference from a 2D retinal projection to a 3D percept within a Euclidean space schema. This paradigm has had great success in certain areas of vision science, but has been relatively less successful in understanding perceptual representation, namely, the nature of the perceptual encoding. One of the drawbacks of inverse-optics approaches has been the difficulty in defining the constraints needed to make the inference computationally tractable (e.g. regularity assumptions, Bayesian priors, etc.). These constraints, thought to be learned assumptions about the nature of the physical and optical structures of the external world, have to be incorporated into any workable computational model in the inverse-optics paradigm. But inference models that employ an inverse optics plus structural assumptions approach inevitably result in a naïve realist theory of perceptual representation. Another drawback of inference models for theories of perceptual representation is their inability to explain central features of the visual experience. The one most evident in the process and visual understanding of design is the fact that some visual configurations appear, often spontaneously, as perceptually more coherent than others. The epistemological consequences of inferential approaches to vision indicate that they fail to capture enduring aspects of our visual experience. Therefore they may not be suited to a theory of perceptual representation, or useful for an understanding of the role of perception in the design process and product.

KEY WORDS: 3D shape and space perception, aesthetics, Bayesian inference, computational vision, design, epistemology, visual perception and cognition

1.

INTRODUCTION

When it comes to deriving suitable and rigorous concepts and designations for the various characteristics of our sensations, the first requirement is that these concepts should be derived entirely out of the sensations themselves. We must rigorously avoid confusing sensations with their physical or physiological causes, or deducing from the latter any principle of classification. (Ewald Hering, 1878)


A standard refrain in the introduction to most undergraduate textbooks on perception is that vision is not the result of a simple camera-like process in which the external world is imaged faithfully onto the mind's eye. Instead, it is often claimed that the first step towards an understanding of perception is to discard the notion that what we perceive is an objective view of the external world. For example, in their highly regarded textbook, Sekuler and Blake (1986, p. 3) suggest that a distinction has to be made between one's perception of the world and the world itself. What we perceive should be more correctly thought of as the mind's reconstructed 3D representation of the world generated from a meager 2D image impinging on the retina. Perception textbooks typically go on to say that dispelling the naïve realist view that the world is exactly as it appears (Figure 1) has historically taken two opposing approaches: (1) Empiricism, which is best exemplified by Helmholtz's theory of unconscious inference. (2) Nativism, which is best exemplified by Hering and the Gestalt school (see for example Rock (1984), Sekuler and Blake (1986), Palmer (1999), Turner (1994)). We find out that the empiricist believes that our perceptions are the result of our extensive experience and interaction with the world, while the nativist believes that our perceptions are entirely due to the mind's innate predisposition to organize the sensory stimulation in a particular way.

Figure 1. Naïve realism.


The underlying motivation for both these theories, we are told, is what is known as the poverty of the stimulus argument: the retinal image highly underdetermines the structures that it gives rise to in our percepts. Take the example of two possible images of a cube (Figure 2). Evidently, we perceive A as a cube while we perceive B as a square. But B is also consistent with an image of a cube. In fact, both images, assuming Euclidean projective geometry, are consistent with an infinite class of 3-D shapes. The empiricist's reasoning for our stable and unitary percepts in A and B might be as follows: Through experience we have noted that in the preponderance of situations in which we have encountered a cube, it has appeared to us as image A. We have rarely been in the position to view it head-on as shown in B, and have only experienced such an image when encountering a square. We have thus learned to recognize a cube in A and a square in B. Obviously, the actual story is more complicated. It may, for example, entail the fact that we support these claims through association with our other senses such as touch. In more quantitative analyses of such inferential approaches, notions such as non-accidentalness or generic viewpoint assumptions may also be brought to bear (see, for example, Barlow, 1961; Nakayama and Shimojo, 1996; Richards et al., 1996). The Gestaltist, on the other hand, might say that the reason we perceive A and B the way we do should be attributed entirely to the mind's innate predisposition to organize each image. There are no cubes, squares, or surfaces in the world in the folk sense of the terms, i.e. in exactly the way we perceive them. Rather, what we see is the result of the spontaneous cortical organization of the sensory flux. Naturally, this does not preclude the possibility for the organized image to be correlated non-trivially with the physical structure of the environment that gave rise to that image.

Figure 2.
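To make the underdetermination concrete, the following minimal sketch (not from the original text; the pinhole model, focal length and coordinates are arbitrary illustrative choices) shows two very different 3D configurations, a frontoparallel square and a non-planar set of points slid along the same lines of sight, that project to exactly the same 2D image:

    def project(point, f=1.0):
        # Perspective (pinhole) projection of a 3D point (x, y, z) onto an
        # image plane at focal distance f; z is depth from the pinhole.
        x, y, z = point
        return (f * x / z, f * y / z)

    # Four corners of a frontoparallel unit square at depth 4.
    square = [(-1, -1, 4), (1, -1, 4), (1, 1, 4), (-1, 1, 4)]

    # A second configuration: each vertex is slid along its own line of
    # sight to a different depth, so the 3D shape is no longer planar,
    # let alone a square, yet its projection is unchanged.
    depths = [3, 5, 7, 9]
    warped = [(x * d / 4, y * d / 4, d) for (x, y, _), d in zip(square, depths)]

    image_a = [project(p) for p in square]
    image_b = [project(p) for p in warped]
    print(image_a == image_b)  # True: identical images from different 3D shapes

Any prior, regularity assumption, or learned constraint that an inferential model invokes is, in effect, a rule for choosing one member of this infinite equivalence class over the others.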


The aforementioned textbooks will usually inform us that the last half century of research has found both these models, taken by themselves, lacking as theories of perception. Therefore, a compromise between the two must be struck. This new approach, usually a sophisticated variant on the classical theory of constructivism originating from Helmholtz's notion of unconscious inference, one might generically call Neoconstructivism.1 It is best characterized in the classic text by Marr (1982) (cf. Palmer, 1999). The theory is an attempt at an amalgamation of empirical findings in visual neurophysiology and computational theories of vision originating in artificial intelligence, which view perception as a problem of inference in an inverse-optics framework. In other words, perception is the inversion of the optical process that generates the 2D image from a 3D environment. It is usually argued that Neoconstructivism has the appropriate combination of elements from both empiricist and nativist theories of knowledge. The Neoconstructivist rejects a purely empiricist notion of perception because it can be shown that inverse optics, as well as concept learning, is impossible unless the visual system has pre-specified constraints for determining how the image must be processed. To the Neoconstructivist, the Gestaltist or nativist position is also deemed unattractive because it seems in danger of slipping into a kind of solipsism (if it's all in the cortex, where does the real world come into play?). A perfect compromise for the Neoconstructivist would be to assume, as a nativist might, that there are indeed innate constraints for processing the image; constraints which capture objective properties and behavior of the world that are learnt from interacting with the external environment. A few examples might be: that there exist surfaces, lines, parallel lines, and common object shapes; that light impinges from above; that the observer is not viewing the environment from a special vantage point; and so on and so forth. These constraints, along with some tractable form of learning, are combined with the outputs of early perceptual processes that measure properties of the objects and environment such as brightness, illumination, distance, direction, size and orientation. The task of the visual system, then, is to detect, recover or infer from the 2D retinal image the simplest environmental configuration that is consistent with these various measurements and constraints. The requirement for simplicity arises from the well-regarded notion that nature abhors unnecessary complexity. This principle of Occam's Razor has been expressed in the perception literature in
such terms as the minimum principle (see Hochberg and McAllister, 1953; Hatfield and Epstein, 1985), minimum length encoding (e.g. Boselie and Leeuwenberg, 1986), homogeneity and isotropy assumptions (e.g. Knill, 1998), regularity assumptions (e.g. Horn, 1986), and genericity (e.g. Richards et al., 1996). In much of the literature, these simplicity assumptions are assumed to be a direct reflection of the well-structured behavior of the physical world. A cursory glance at Neoconstructivism may make it appear to have achieved, simultaneously, a successful rejection of naïve realism and a perfect compromise between a nativist and empiricist theory of knowledge; an achievement that for many renders moot any discussion of theories of knowledge. On closer inspection, though, it appears such a conclusion may be premature, because such a theory usually leads to the question of where the assumptions or constraints about structure-in-the-world come from, and how they are encoded. The Neoconstructivist will typically say that it happens through evolution. In essence, the claim is that these assumptions have to be hardwired into the system through phylogenetic interaction with the objective external world (see Pinker (1997) for a popular scientific account of this notion; also see Behavioral and Brain Sciences, Volume 24 for related analysis). The story might go: different computational tricks or rules could compete with each other through evolution, until those that most effectively detect the objective external structure are the ones that are incorporated into the phenotype. But this, one would submit, betrays an empiricist theory of knowledge applied to the bootstrapping of hardwired assumptions or constraints. Any empiricist theory of knowledge, as Hume admirably demonstrated, has to either reach the inexorable conclusion of an idealistic world, or cling to its initial mistaken (naïve realist) belief that experience can provide objective knowledge of a real world. Since the central claim of Neoconstructivism is non-idealistic (i.e. the constraints and assumptions actually reflect something objective about the real world), it reduces, by its own claims, to a naïve realist theory. In other words, we find that despite the view espoused in the introductory paragraphs of the perception texts alluded to earlier, the theoretical basis for current approaches to perception is essentially an empiricist, naïve-realist one. The foundational issues that afflict Neoconstructivist approaches do not by any means bear on the whole research enterprise of human and computer vision. Many areas of research can indeed
remain agnostic to epistemic assumptions within the theory and, at least to a point, implicitly assume a naïve realist model. Table I is a partial classification of areas of research in visual science and perception based on whether or not epistemological issues are critical to such research. Where the foundational issues do raise a red flag is in any theoretical or empirical research that falls in category B, which involves the issue of perceptual representation.2 In these areas of research, the representational scheme that is assumed, explicitly or implicitly, has a direct bearing on whether the theory is plausible or not. Note that we only refer to the theory's plausibility as a theory of human perception. Undoubtedly, many of the approaches in column B are quite suited to applications in machine vision. The Neoconstructivist approach aligns itself with a theory of perceptual representation where the fruits of perception are, more or less, an objective 3D description of the external world; the heavy lifting of perceptual processes is the inference from a 2D retinal image to just such a 3D description. Neoconstructivist theories are ultimately aligned with a notion of representation that involves symbolic tokens that signal external world measurements, properties or entities, such as orientation, size, shape, color, surface, object, part, a face, etc. This symbolic token may take the form of the firing pattern of an individual neuron or groups of neurons. The critical assumption is that the properties being signaled are properties of the real external world that have been learned through experience, and are not synthetic constructs of perception. The existence of a symbolic form signaling a property in the brain indicates that such and such a property, measure, or entity has been successfully (or perhaps erroneously) detected from the available sensorium. The informational content of the symbolic form is necessarily parasitic on the objective information contained in the external objects that they signal, and thus the representation is a direct mapping, indeed a faithful image up to some resolution and hardware limitations, of objective properties in the world. From an epistemological standpoint, it is perhaps the issue of information content that has been most overlooked by contemporary theories of perception. The critical importance of defining the nature of the information content of perception was first broached by the Gestaltists, and has been most forcefully and elegantly put forward by Leyton (1992, 1999).

TABLE I.

A: Research topics that can be neutral to epistemological assumptions

Forward optics
Physiological optics
Sensory physiology
Front-end properties of sensory apparatus (e.g. thresholds, adaptation)
Estimation of spatial properties (direction, distance, slant, size, etc.)
Sensor co-calibration (spatial estimation across multiple sources of information), including probabilistic approaches
Spatial acuities and sensitivities (e.g. vernier, stereo acuities)
Spatial localization and capacity
Attentional allocation, and limitations, in perception
Correlations between perceptual and visuo-motor estimates of space

B: Research topics affected by epistemological assumptions

Inverse optics; shape recovery as inverse optics
Perception as shape/object recognition
Shape perception as probabilistic inference
Shape recovery from multiple cues
Shape recovery and representation via primitives (e.g. geons)
Shape recovery via heuristics, biases, bags of tricks, minima principles, non-accidentalness, regularity, genericity, etc.
Application of ecological statistics to shape recovery and representation
Image correlation approaches to shape recovery and representation (cross-correlation, eigenvector, etc.)
Perceptual organization; grouping; grouping principles; shape recovery and representation via grouping
Perceptual completion; figure-ground
Lightness and brightness; parts and wholes
Feature binding; object perception as feature binding
Perceived stability of the visual world across eye movements, blinks, etc.


Leyton's theory makes two crucial points regarding the informational structure of perception: (1) The information content of a percept (its causal structure) is constituted internal to the perceptual schema and does not reside in the external world. (2) The entities and relations used to construct a representational model cannot be parasitic on entities identified in the perceptual product, such as lines, surfaces, etc., but rather such entities have to derive from the representational scheme itself. This is achieved in Leyton's model through a purely abstract, nested, algebraic (group-theoretic) representational schema. In contrast, let us try to understand the informational content of a percept as proposed in Neoconstructivist theories by considering the example of an observer looking at a bend on a road. Under a Neoconstructivist theory, a bend perceived in the road is the activation of some set of signals that a bend in the road exists, and the bend exists descriptively in the world in more or less the way that those perceptual signals or symbols specify. For example, certain measures that we may specify the bending road to have, such as length, curvature, width, distance and direction, as well as any of the ontological categories that we might ascribe to it, such as line or surface, provide an objective spatial description of the bend in the road that exists externally, independent of perception, and also specify the content of those signals or symbols that make up the percept of the bending road. In other words, both the physical thing that exists that is the bending road, as well as its percept, should naturally be described using these descriptors. Under a Neoconstructivist theory, we use these spatial and ontological descriptors not because that is the format in which our perceptions specify the world, but because our perceptions are (more or less) faithful descriptions of such objective physical properties and entities. In other words, describing such objective properties and entities in the world in terms of these ontological categories and spatial attributes is the only objective way that they can be described. And the descriptions that can be applied to the fruit of our percepts are also exactly those descriptions that apply to the physical thing out there that is the road; perhaps something less (resolution and hardware limitations), but certainly nothing more. Thus, the epistemology and ontology of a Neoconstructivist appear, even at first blush, to be at least weakly naïve realist. Putting aside any generic distaste for naïve realism, what else might possibly be wrong with a theory of perception like Neoconstructivism? One of the most enduring questions an inferential theory such as Neoconstructivism raises is the following: if a percept is
either an objective measure, or an indicator of the existence of a property or entity in the world, then how is this indication psychologically experienced? This question has been vexing to perceptual researchers from Mach, Hering, and the Gestalt school, through to Gibson. Yet perhaps the most penetrating analysis on the question of perceptual experience and its relationship to the information content of the percept has been put forth by Leyton (1992). The question his theory raises and answers is the following: for the example of the bend in the road given earlier, how is it that we have a phenomenological sense of the bend itself if the indication is merely specifying certain static properties or quantities? Let us look more closely at Leyton's question, using the example of the sculpture in Figure 3. The sculpture does not represent any familiar object, and the most immediate markers of familiarity are that it is carved out of stone, and is a solid rigid object. Yet the most phenomenologically striking aspect of it, as Leyton would point out, is that we can perceptually sense the forces, the bending, and the bulging. Yet all our direct familiarity cues should be telling us that such processes are not at work in the object, and that it is instead a static, stress-free object.

Figure 3. Carving #11, Barry Flanagan, 1981 (from Beal and Jacob, 1987).


One might argue that those perceived forces are merely because the object resembles, say, a rolled toothpaste tube, or a clasped hand, and so we are not experiencing the bending, but merely experiencing the lighting up of a hierarchical neural symbolic linkage, that might be as follows: like a rolled paste tube; rolling requires force and action; that force and action produces internal stresses; internal stresses cause stretching of the external membrane; excessive external stress can cause disruption of the membrane. What would a Neoconstructivist theory predict sculptor Barry Flanagan3 would see when looking at his own finished product? Presumably, since he is a sculptor, and has himself carved the object, his experience should not make his visual system light up the above symbolic hierarchy (even though he may intend such trompe l'oeil in his observers). Instead, since the shape only weakly invokes some sort of familiar object, while his experience should strongly evoke what the object itself really is, he should just have activation of the symbolic set that simply says solid, hard, carved, roundish object (let us ignore the fact that even these have to be cashed out experientially). Indeed, if he hires an assistant to carve a multitude of the same shape through his lifetime, that assistant should cease to phenomenally experience any of the bending and bulging, and his very percept of the object should change. Any cognitive understanding of its similarity to a rolled toothpaste tube must be post-perceptual. Indeed, an animal with a visual system comparable to the human visual system should have a completely neutral perceptual experience with respect to the object, since it is neither similar to a known object nor created with a familiar procedure. The entire informational structure (the sensed forces, deformations, etc., that Leyton enumerates) is under a Neoconstructivist theory either non-existent or the result of a simple application of our cognitive experience with objects. This line of thinking leads to another question that is definitely a more relevant issue to the theme of this volume. It arises when we consider what a Neoconstructivist theory of perception has to say about aesthetics and design. Do we reflexively perceive qualitative differences above and beyond the objective spatial and recognition measures when we view different visual configurations? In other words, is there a natural reflexive qualitative evaluation that occurs at the level of the perceptual understanding of a visual configuration, which is prior to any application of cognitive factors such as memory, experience, etc.? The ubiquitous perceptual evaluation that seems integral to the process of designing and the experience of a designed product, as well as common visual phenomenology, suggests that the answer is yes. That such direct perceptual evaluation is at some level central to the aesthetic experience in art and architecture has been of great interest historically
both in psychology (e.g. analyses by Kant, Klee, the Gestalt theorists, Arnheim, etc.) as well as in artistic movements (e.g. Abstract Expressionism and Minimalism). More recently, Leyton's theory of perceptual representation has taken as its central charge the ability to explain fundamental aspects of aesthetics. An inferential theory of perception such as Neoconstructivism implies that all physically plausible visual configurations are, at the perceptual level, psychologically equivalent. This assumption of psychological equivalence in inferential theories of perception is reflected in the fact that qualitative aspects of perception are usually judiciously sidestepped in favor of measurable ones. The implicit assumption is that since a functioning perceptual system only faithfully infers what is out in the world (up to limits on hardware), and does not inject any non-trivial informational structure of its own, all physically plausible configurations should yield the same perceptual quality; or perhaps, no perceptual quality. Of course cognitive factors, such as memory, appetite, or experience, might color the cognitive experience of the object that perception delivers, but the perceptual act remains neutral, since all it does is indicate that such and such a thing is out there, in the way that it is out there. Since the way that it is out there is physically valid (we have already made this caveat), there is nothing else that can be said about it in terms of quality. Obviously, sometimes the recovery may be erroneous, but since there is no marker on the percept telling us this, the erroneous percept is, from perception's point of view, just as valid. Any judgment on the appropriateness of a configuration must come from extra-perceptual considerations (memory, appetite, aversion, experience, etc.), what we refer to in this paper as cognitive aesthetics. It is interesting to note that for the aesthetician who wants to claim that all perceptual preferences are learned, a Neoconstructivist theory works very well, since rather than being a result of the very act of perception, aesthetic preference is cognitively applied onto the neutral product of perception. Yet the nature of the process and product of design (and art), as well as common phenomenology, are convincing evidence that such perceptual neutrality is not what we typically experience. The very acts of painting and designing involve choices and manipulation of physical configurations that are deeply connected to perceiving differences in the quality of the configurations. Such differences are inexplicable within a naïve-realist theory of perception (which we will hopefully show Neoconstructivism to be). Our experience
of what one might call perceptual aesthetics suggests that, for a workable perceptual theory, the differences in perceptual quality should be deducible from the representational schema that embodies our perceptual system. The notion that the representational scheme of perception reveals its signature in our perceptual phenomenology is implicit, historically, in the work of several researchers (e.g. Hering) and particularly the Gestaltists. Leyton (1992, 2001) in his theory has rigorously raised and answered many of the epistemological, phenomenological and aesthetic criteria implicit in Gestalt theory. Yet surprisingly, these central observations of Gestalt theory are precisely the ones that have been jettisoned from contemporary theories of representation aligned with Neoconstructivism. Generically, contemporary vision science has shied away from tackling the enduring but difficult puzzles of perception that are tied to phenomenology, epistemology and aesthetics. Much of this might be attributed to the current lack of resources on the historical lineage of the epistemological and phenomenological problems, and how they apply to contemporary scientific research in perception. None of the introductory or survey texts used for pedagogy provide a sustained critique of current approaches and their consequences. This paper is an attempt at filling this gap by bringing together issues within an epistemological framework that have been sometimes explicit and sometimes implicit in prior research, and applying them to current approaches to understanding perception within vision research. There are six sections to this paper. Through these sections we will attempt to communicate a range of ideas. Most, if not all, have been expressed before in the literature, starting from the natural philosophy of the 18th century, empirical and theoretical research in vision (notably Hering, the Gestalt theorists and Gibson), and most particularly Leyton's theory of shape.4 We will generously borrow ideas from the analyses provided in these works, to weave an argument that consists of the following observations:

1. In empiricist theories of vision such as Neoconstructivism (perception as inference, inverse optics, etc.) the critical informational and causal distinction between the 2D image and the 3D percept, specified by the theory, is erased by the computational rendering of the theory.

2. The result of Neoconstructivist theories is a computational model of perception where the percept itself is largely non-informative. In such theories, the percept contains no non-metric information about the perceived world. The only non-metric information is generic rather than percept-specific (e.g. the fact that surfaces are continuous), and such information is entirely the property of the inferential device. The remaining metric information is itself not informative outside of the purview of inter- and intra-sensory calibration, and is especially not, as often assumed, an objective measure on the external world. All other information is rendered to be properties of the outside world; properties which are merely symbolically instantiated in the inferential device.

3. Theories of perception-as-inference always involve ascribing objective measures, attributes and entities to both the sensory stimulation and the external world. On closer inspection such attributes (features), measures (cues) and entities (lines, surfaces, objects) turn out to be subjective descriptors parasitic on the very perceptual structures that they are used to explain. This results, inexorably, in such theories becoming naïve realist ones.

4. Standard computational renderings of Neoconstructivist theories conflate sensor co-calibration and object recognition with perceptual representation. Both calibration and object recognition exhibit characteristics of learning, which are usually taken by such theories to support an empiricist or constructivist epistemology for perceptual representation.

5. A restricted model of perception as inverse optics that deals only with inter-sensory and intra-sensory co-calibration issues is a viable model for a range of empirical research studies in vision, particularly 3D space perception. Such a model is viable because it takes a strictly behaviorist approach to the notion of perceptual estimation of spatial attributes, where relationships are restricted to predictions between output and input, and can usually remain agnostic to explicit representational structures.

6. Although the notion of cues, and their combination, is a very useful construct for understanding how inter- and intra-sensory calibration occurs, the use of the notion of cues is problematic for areas of research that are aimed at understanding the nature of perceptual representation, because cues are merely ways in which to specify measurements within the perceptual output, and are not, as is commonly assumed, objective descriptors of either the external stimulus or the internal image. (A schematic example of such cue combination is sketched after this list.)

7. Recent Neoconstructivist approaches (e.g. perceptual organization, grouping, figure-ground) embrace Gestalt principles as important factors in the generation of the visual percept. Yet most of these approaches are contrary to the basic epistemological and functional proposals implied by Gestalt theory.

8. Theories of inference introduce spurious problems to the understanding of the perceptual process. One such red herring is the puzzle of how a stable percept is maintained despite the constant changes in the retinal image across saccadic eye movements and blinks.

9. Neoconstructivist theories cannot explain why our percepts seem to provide greater information content than appears to be objectively present in the external array. This is an argument implicit in Gestalt theory and central to Leyton's generative theory of shape.

10. Neoconstructivist theories cannot explain how the percept is experienced. This is a fundamental charge of the theories put forth by Hering, Gestalt theory, Gibson and Leyton.

11. Neoconstructivist theories cannot explain the phenomenological reality of the reflexive qualitative judgments of perceived visual configurations that appear pre-cognitively in art, design and everyday visual experience. Leyton is among the few who have argued that the understanding of aesthetics is central to any computational theory of shape.
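Since points 5 and 6 turn on what cue combination amounts to computationally, the following minimal sketch shows the generic reliability-weighted (minimum-variance) combination of two independent Gaussian estimates that is standard in the cue-combination literature; the numbers and names are purely illustrative, and the scheme is not a construct of this paper.

    def combine(est_1, var_1, est_2, var_2):
        # Weight each cue in inverse proportion to its variance
        # (the minimum-variance linear combination of two independent
        # Gaussian estimates of the same quantity).
        w1 = (1.0 / var_1) / (1.0 / var_1 + 1.0 / var_2)
        w2 = 1.0 - w1
        combined_est = w1 * est_1 + w2 * est_2
        combined_var = 1.0 / (1.0 / var_1 + 1.0 / var_2)
        return combined_est, combined_var

    # e.g. surface slant from binocular disparity (35 deg, variance 4) and
    # from texture gradient (45 deg, variance 16): the combined estimate
    # (about 37 deg, variance 3.2) is pulled toward the more reliable cue.
    print(combine(35.0, 4.0, 45.0, 16.0))

On the view argued for here, such a scheme describes how estimates are co-calibrated; by itself it says nothing about what the percept represents.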

In Section 2 we do a rudimentary review of the basic epistemological arguments in modern philosophy stretching from Descartes to Kant. This is important because the notion of the distinction between contingent and necessary connections between events will be crucial for understanding why all constructivist theories of shape representation/recovery ultimately reduce to untenable naïve realist ones. In Section 3 we review the two basic approaches to shape representation/recovery in modern research: (1) standard computational vision5 and (2) shape perception as Bayesian probabilistic inference. We reiterate that the methodologies in both these approaches have important and wide application to many problems in human and
computer vision, and are irreplaceable in the development of artificial systems, as well as the assessment of visuo-motor capacities of humans. The intent here will be to try and show why they cannot be successful theories of human perceptual representation. Many other ad hoc approaches to shape representation suffer similar problems, but in addition, they do not provide any useful quantitative framework for other basic aspects of vision research. In that sense, an important distinction must be made between ad hoc theories and the sound quantitative frameworks of computer-vision and probabilistic approaches. In Section 4 we assess two key theories of perception that have heavily influenced current research, namely Gibson's theory of perception and Gestalt theory. For the latter we mention only the theory and approach of the Berlin school of Gestalt (e.g. Wertheimer and Köhler), which is the one that is most familiar to researchers in perception. There are many important and crucial ideas that come out of the early Gestalt theorists such as Brentano, Von Ehrenfels and Mach, as well as other philosophers and psychologists of the Austrian and Italian schools of Gestalt theory. The reader is directed to the extensive reviews and analyses of their application to contemporary perceptual science by Albertazzi et al. (1996) and Albertazzi (2000, 2001, 2002). Section 5 analyses the shortcomings of inferential approaches and outlines diagrammatic frameworks for understanding the various approaches one might take for a theory of perception. Specifically we will outline three of them: (1) shape perception as inference from a 2D image to a 3D world (naïve realism); (2) shape perception as a calibration map (here we will also argue that shape or object recognition can be thought of as a form of calibration); (3) shape perception as the presentation6 of sensory flux. Section 6 discusses the implications of each approach for perceptual experience. Section 7 discusses the implications of theories of perception for aesthetics and design. Here the notion of representational conflict in perception is introduced. This section will, by design, be of a speculative nature. Since the paper is quite long, a first reading might be possible by skipping Section 3, and, for those familiar with the basic philosophical arguments, Section 2. A short reading of the paper might include the introduction, and Sections 5, 6 and 7.

2.

PHILOSOPHICAL PRELIMINARIES

Ich gestehe frei: die Erinnerung des David Hume war eben dasjenige, was mir vor vielen Jahren zuerst den dogmatischen Schlummer unterbrach und meinen Untersuchungen im Felde der spekulativen Philosophie eine ganz andere Richtung gab.7 [I freely admit: it was the remembrance of David Hume which, many years ago, first interrupted my dogmatic slumber and gave my investigations in the field of speculative philosophy a quite different direction.] Immanuel Kant (1783)

The central metaphysical questions and theories of knowledge in modern philosophy stretching from Descartes to Kant essentially arise from the attempt to explain how we arrive at our perceptual understanding of the world. Namely, it is the question of how we support the knowledge we have of the world, when there is no basis for such knowledge within the sensorium itself. And as we mentioned earlier, the two competing theoretical approaches to explaining how knowledge is supported are nativism and empiricism. One reading of the philosophy from Descartes to Kant (the analysis of space, causality, objecthood, and induction) indicates that rather than being a tussle between nativism and empiricism, the entire enterprise is essentially a proof of the untenable nature of an empiricist theory of knowledge. Notwithstanding the colloquial understanding of Hume's contributions as the refutation of the Cartesian position, Hume's analysis could be thought of less as an attack on rationalism than as a device to expose the inherent contradiction in empiricism. It does so by arriving at the following conclusion: If we embrace empiricism as a theory of knowledge, we must either give up any claim to the existence of an objective world external to the perceiver, or maintain a naïve realist view of perception. Contemporary theories of perception appear to betray a blind sighting of this central epistemological result in philosophy.8 Yet its analytic conclusion has reappeared quantitatively in every computational model of concept learning or perception ever devised, leading to the need for the introduction of inductive biases, priors, regularity assumptions, heuristics, etc., in order to make the computation tractable. A possible explanation of why these foundational questions receive short shrift is that the epistemological implication of the induction problem is viewed as having been beaten to death, and that, with a more sophisticated analytic philosophy having arisen, neither the psychologist nor the philosopher
finds a need to consider it in constructing theories of perception or cognition. We will quickly revisit the induction problem in the simplest way possible. In the interest of not getting sucked into a philosophical black hole, we will steer clear of debates on causality, concept formation, etc., in current analytic philosophy and, instead, focus on only those aspects that are historically undisputed. We will be interested in those basic aspects that are essential to understand for anyone attempting to develop a theory of perceptual representation. In order to keep the historical introductory perspective we will use characterizations from a well-regarded introduction to modern western philosophy by Scruton (1986) that quite sufficiently and elegantly captures the basic ideas we need. Induction can be thought of as a process by which we create a necessary connection between two things that are, by their very definition, contingently separate. Hume's argument against a rationalist understanding of cause, and the invalidity of induction,9 begins with his distinction between relations of ideas (necessary truths) and matters of fact (contingent truths). He stakes the claim that there can be no a priori proof of any matter of fact. Though it appears that he may be saying one cannot rationalize one's way to the truth, what he is essentially saying is that the attempt to construct a theory of necessary connections from particulars of observation is doomed to failure. Since matters of fact are contingent on observing the state of affairs, and no set of observations however large will ever exhaust the possible theories that we may be able to construct through reason, we can do no better than merely summarize what happened to be true (Scruton, p. 119). Hume then goes on to show that the notions of cause and objecthood as necessary connections suffer from precisely this problem. Scruton's excellent summary goes thus:
The idea of necessary connection cannot be derived from an impression of necessary connection for there is no such impression. If A causes B, we can observe nothing in the relation between the individual events A and B besides their contiguity in space and time, and the fact that A precedes B. We say that A causes B only when the conjunction between A and B is constant, that is, when there is a regular connection of A-type and B-type events, leading us to expect B whenever we have observed a case of A. Apart from this constant conjunction, there is nothing that we observe, and nothing that we could observe, in the relation between A and B, that would constitute a bond of necessary connection...


Why is Hume so confident that necessary connections between events cannot be observed? His reasoning seems to be this: causal relations exist only between distinct events. If A causes B, then A is a distinct event from B. Hence it must be possible to identify A without identifying B. But, if A and B are identifiable apart from each other, we cannot deduce the existence of B from that of A: the relations between the two can only be matters of fact. Propositions expressing matters of fact are always contingent; it is only those conveying relations of ideas between A and B that are necessary. If there were a relation of ideas between A and B, then there might also be a necessary connection, as there is a necessary connection between 2 + 3 and 5. But, in that case A and B would not be distinct, any more than 2 + 3 is distinct from 5. The very nature of causality, as a relation between distinct existences, rules out the possibility of a necessary connection. [Scruton, p. 121]

It is the same line of reasoning that denies us the grounds for inducing the notion of enduring objecthood. The concept of an object relies on the notion of persistence over time, or in other terms, the necessary connection between distinct experiences of the same object. Once again there is no rational basis for positing a necessary connection between these distinct impressions and thus no basis for positing the existence of objects that endure even when unobserved (Scruton, p. 122). These arguments against the concepts of cause and objecthood are part of Hume's more general argument against the idea of induction. Namely, that the problem with induction (of causality, of the existence of objects, of the existence of a material world) is that it hinges on establishing necessary connections based on the observation of contingent connections between distinct existences (Scruton, p. 123). How then does the mind construct notions of causality and objecthood? Hume, relying on his empiricist ideology, claims that it is habit and custom. But his statements to the effect that the notion of causality appears to arise spontaneously within us, or that the mind has a propensity to spread itself upon objects (Hume, 1748), seem to betray his implicit belief that these notions, whose validity cannot be established a priori, originate in the peculiar way our mind organizes our experiences. Thus one way of interpreting Hume is that the properties imputed to the world cannot be derived solely from the particular sequence of sensory events, but rather, that the most fundamental properties attributed to the external world are constructs of our mind. Kant's theory of knowledge places equal importance on both experience and the role of the innate predisposition of the mind to interpret that experience. He rejects from Descartes the notion
that through reason alone one can arrive at knowledge of the world. He rejects from Hume the position that concepts arise only via association and custom. What he retains is the idea from Hume (and the other empiricists) that our senses provide the only subject matter for knowledge, while from Descartes he retains the idea of innate conceptual constructs as the only basis with which to describe that experience. These innate concepts then represent the most basic foundations of human thought beyond which no further analysis is plausible. Another way of saying this is that perception must be explained in terms of the innate predispositions that are required to bootstrap sensory stimulation so that it may be experienced. Kant's brilliant insight is his claim that despite these constraints on knowledge, a synthetic knowledge of the world is still possible. Thus, though 2 + 3 and 5 are necessarily and not contingently connected, the relationship is genuine synthetic knowledge of the behavior of things. In the same way, in the gas equation PV = nRT, there is a necessary connection between pressure and temperature, and thus their distinction is lost in such a formulation (i.e. a real distinct property such as pressure cannot be said to exist in distinction to another real and distinct property, temperature). Still, the relationship gives us synthetic knowledge. It provides synthetic knowledge of the behavior of the distinct quantities and qualities (e.g. pressure and temperature) that we define over the sensorium via a particular mode of perceptual-motor measurement, whether it be the space between molecules (measurement in motor space) or the kinetic energy of a molecule (rate of change measured in motor space as a function of measured time). (See Section 5 for a description of visuo-motor measurement space.) An objective external distinction cannot exist between pressure and temperature for good reason, because their distinction is merely in the way we define each in perceptual measurement space. Still, the equation does give us synthetic knowledge of the relationship between qualities and quantities defined in perceptual or motor terms; quantities which there appear to be functional reasons to distinguish by defining a specific mode of measurement. But then P cannot be said to be causally linked with T, because a causal link is by definition a connection between two distinguishable quantities or entities. Within a Neoconstructivist theory, the 2D retinal image and the resulting 3D percept are defined to be distinguishable entities, and the critical causal linkage occurs at the level of the inference
(the inductive step) from the 2D image to the 3D world. In such a theory, the foundational assumption is that the 2D retinal image is causally distinct from the 3D percept, since the former (part of the real world) gives rise to the latter (part of perceptual space). What we will hopefully show is that though such a theory depends on the distinguishability of the two, their very mode of definition leads, inexorably, to a place where that distinction is lost, resulting in a naïve realist or idealist metaphysics, as Hume warned. Historically, the epistemological concerns in psychology have centered almost exclusively on the nature-nurture debate. This has derived from the nearly century-long tussle between psychological theories that embraced an empiricist theory of knowledge and ones that embraced a nativist one; the former represented by Helmholtz, the Structuralists, and the Behaviorists, and the latter by Hering and the Gestaltists. With the waning of Behaviorism and associationist models of perception, what appears to have taken their place is something in the guise of a compromise, but, as we have already begun to see in the introduction, it is essentially aligned to the same epistemological claims as empiricism; and in effect there has been an unconscious return to naïve realism. Yet within the research arena most see the empiricist-nativist argument as moot, assured that the reality is somewhere in between, and that any focus on the epistemological question is not required for a scientific understanding of perception. This agnosticism regarding the status of perceptual knowledge has spread to a large swath of research surrounding the intersection of modern cognitive psychology and analytic philosophy, possibly due to the deference that philosophers began to give perceptual scientists late in the century with the advent of sophisticated experimental methodologies and some rather remarkable empirical results in physiology.10,11 What is interesting, and what we will hopefully show, is that it is precisely the things that Kant rejected on the basis of Descartes' and Hume's analyses that have been embraced by Neoconstructivism (the possibility of rationalizing an objective world from our sense experience and the possibility of an empiricist theory of knowledge), and it is precisely those things that Kant embraced that have been rejected (the innate predisposition of the mind in organizing experience and the essential unknowability of the world as it is in itself12).


3.

COMPUTATIONAL APPROACHES TO PERCEPTION AS INFERENCE

So when, as Bayesians, we examine the external world to determine what priors we should use, what do we find? We find our own posteriors. And nothing else. All we can ever see in perception is our own posteriors. Hoffman, 1996

The classical work in computational vision originated in work in artificial intelligence, and the problem of developing robotic devices that could optically detect and categorize objects in controlled environments. One of the early approaches that embraced theory from perceptual research was that of Horn and collaborators (compiled in Horn (1986)). It was J. J. Gibson's formulation of invariant properties of surface features under optical projection that provided the foundation for this early work. Though we will later see that Gibson's approach is epistemologically untenable, its use in the artificial vision domain was appropriate since epistemological issues could be ignored. The original goal of these approaches was not to understand human perception, but to develop computational schemes where video images could be used to recover or recognize objects in highly constrained environmental configurations. However, this approach eventually became a model for theories of human perception as inference, and therefore it is important to see why it fails as a model of perception. We will call this approach standard computational vision following Leyton (1992), to distinguish it from other computational approaches to vision such as perception as Bayesian inference, or computational theories generically referred to as spatial vision (spectral image analysis in space and time in the Fourier domain). Leyton (1992, 1999)13 has provided a very clear analysis of standard computational vision, and we will borrow much of the following analysis from there.

3.1. Standard computational vision

Computational vision has largely focused on how the geometry of surfaces can be inferred from a 2D image using certain cues in the image. These cues are aspects of the image that can be shown to have invariant relations between the 3D environment and the 2D images generated from that 3D environment via a forward optical
projection. Gibson provided a particularly clear description of these invariants. The invariant relationships, once characterized, are then used by the visual system to accomplish inverse optics: infer the 3D environment, given the 2D image that was created by an optical projection. Analytically, standard computational vision can be broadly broken down into two phases (from Leyton, 1992): (1) modeling the observer-environment interaction, i.e. the image formation process (forward optics); and (2) modeling the recovery of observer-independent properties of the environment, i.e. recovering the objective 3D environmental configuration from a 2D image (inverse optics). Modeling of observer-environment interaction involves the derivation of functions that predict how the value of a particular measurable quantity at the 2D image varies with a measurable physical property of the environment, such as surface orientation, assuming certain relationships between quantities and properties in the 3D environment. For example, in the domain of surface shading, the predicted quantity at the 2D image would be the measured illuminance or irradiance,14 while the assumed properties and quantities in the 3D environment would be surface reflectance properties and the nature of the illumination. The analysis is derived for sparse environment/image configurations specified only in a single domain, e.g. shading (Ikeuchi and Horn, 1981), contour (Kanade, 1981), or texture (Stevens, 1981; Brady, 1983; Witkin and Tenenbaum, 1983), and defines the image formation process in that domain. The image formation process is based on an optical model that involves assumptions regarding the nature of interaction between the environment (material substance), the medium (light), and the observer. One such assumption might be that the angle of reflection is equal to the angle of incidence. At this stage, no assumptions about the actual structure of the environment (e.g. actual surface orientations) need to be made. The result of this modeling would provide, for example, functions that predict the illuminance of a point in the image given the orientation of the corresponding surface point in the environment (assuming the environmental illumination and surface reflectance properties) (e.g. Ikeuchi and Horn, 1981), or the shape of a texture element in the image as a function of the orientation of a texture element in the environment (e.g. Stevens, 1981).
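As a concrete illustration of such a forward model, the sketch below computes the predicted brightness of an image point as a function of the corresponding surface orientation, expressed in gradient space, for an ideal Lambertian surface lit by a single distant source. It follows the spirit of the treatment in Horn (1986), but the function name and the specific numbers are illustrative assumptions, not material from the paper.

    import math

    def lambertian_brightness(p, q, p_s, q_s):
        # Cosine of the incidence angle between the surface normal
        # (-p, -q, 1) and the illumination direction (-p_s, -q_s, 1),
        # both expressed in gradient space.
        cos_i = (1.0 + p * p_s + q * q_s) / (
            math.sqrt(1.0 + p * p + q * q) * math.sqrt(1.0 + p_s * p_s + q_s * q_s)
        )
        # An ideal Lambertian surface reflects in proportion to cos_i;
        # self-shadowed orientations receive no light.
        return max(cos_i, 0.0)

    # Predicted brightness for a few surface orientations, source at (0.5, 0.0):
    for p, q in [(0.0, 0.0), (0.5, 0.0), (1.0, 1.0)]:
        print((p, q), round(lambertian_brightness(p, q, 0.5, 0.0), 3))

Note that the mapping is many-to-one: distinct orientations (p, q) can yield the same predicted brightness, which is why the inverse problem described next requires further assumptions about the environment.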


However, these computations do not result in a solution for environmental (surface) structure, but rather constrain the possible point-wise orientations of the surface or surfaces that could have given rise to a particular illuminance measurement at the image. In order to arrive at a single interpretation of surface structure, certain basic assumptions about the nature of the observer-independent environment must be made. These are the assumptions that are brought to bear in the second part, namely, modeling the recovery of the observer-independent properties of the environment. This phase has been typically referred to as surface interpolation. The classical results in this area are from Grimson (see Grimson, 1981) and Terzopoulos (see Blake and Zisserman, 1981). In order to get a clearer understanding of this two-stage process, we review the classical work on shape from shading by Ikeuchi and Horn (1981), also reviewed in Horn (1986).

3.2. Shape from shading

The goal of any shape from shading computation is to determine some property of surface shape, usually surface orientation, given the pattern of illuminance, or spatial pattern of intensity of light, observed at the image. Since the pattern of shading (the light emanating from an object) in the 3D environment is correlated with the pattern of illuminance measured at the image, knowledge of the latter might be used to infer the former. The first part of Ikeuchi and Horn's shape from shading model is to determine the intensity of light reflected from a point on a surface in the environment based on the local surface orientation with respect to the viewing position (the surface luminance). Naturally, the intensity of the measured illuminance at the corresponding image point will be directly proportional to the intensity of the light reflected from the surface (Figure 4, left panel). Thus, we can relate point-wise surface orientations in the environment to point-wise image intensities. Figure 4 (right panel) shows a surface patch viewed from point V, illuminated from point S, where the directions are specified with respect to the surface normal N. These vectors can be expressed in terms of what is known as gradient space.15

Figure 4.

a surface perpendicular to the image plane will have gradient (), ). (see Figure 5, right panel) The relationship between the surface normal vector, the illumination direction vector and the line of sight can then be expressed in the coordinates of gradient space. If (p, q) represent the co-ordinates of the surface patch orientation in gradient space and (ps, qs) the illumination orientation, then Horn (1986) shows that we can derive expressions relating the angles between the surface normal, illumination direction, and viewing direction in terms of the gradient space for arbitrary surface reectance property (note: the viewpoint direction is always at the origin of the gradient space): coshe fp; q; coshi fp; q; ps ; qs : The luminance of a point on the surface (of a particular orientation), as observed from a particular viewpoint, given a particular direction of illumination, can then be expressed purely as a function the variables hi and he. In order to do this we have to assume what is known as the bi-directional reectance distribution function (BRDF) for the material that makes up the surface. This species the reectance characteristics of the surface material due to its particular surface microstructure. The BRDF is the ratio of the amount of light reected from a surface patch in a particular direction to the amount of light arriving at the patch from the illuminant. In other words, for any surface point whose orientation is expressed in terms of gradient space, given a particular direction of illumination to that point we can derive the luminance as observed from a particular viewpoint, as long as we know or assume its

So if we assume an illumination direction and the surface BRDF (e.g. specular reflectance: reflection such that surface brightness is only measurable when the angle of incidence equals the angle of reflection; or Lambertian reflectance: a distribution of reflection such that the same amount of brightness is seen from any viewing angle), we can derive the predicted luminance (radiance) for different orientations of a surface patch expressed in gradient space (p, q); this predicted luminance is written R(p, q). For a purely specular surface (i.e. a mirror), luminance will only be observed for one particular surface orientation, i.e. the one that makes θe equal to θi; at all other orientations no luminance will be observed. However, for other BRDFs, it is possible that different surface-patch orientations will result in the same luminance (given a particular illumination angle with respect to the viewing direction). Indeed there may be a whole class of orientations that may give rise to the same observed luminance. Using the vector expressions for surface orientation and illumination direction and the BRDF, we can plot the predicted surface luminance for all possible surface orientations for a particular set of illuminants. Such a plot is called a reflectance map (see Horn, 1986); essentially a plot of surface luminance as a function of surface orientation expressed in terms of gradient space, where the view vector orientation is represented by (0, 0). Any set of orientations that gives rise to the same luminance will appear as an iso-luminance contour in this reflectance map. Three examples of this are shown in Figure 5. Points on a given contour are effectively different surface orientations (p, q) that have the same luminance with respect to the viewer. The first is a reflectance map for a Lambertian (matte) surface where the viewpoint and illumination direction are identical. The
surface orientation with the highest luminance with respect to the viewer will be the one whose normal is parallel to the line of sight (and illumination). The second is for a Lambertian surface where the illumination direction and view vector are not the same. Here the highest luminance will be for a surface whose normal is parallel to the illumination direction. The third is an example of a Lambertian surface with a specular (shiny) component (e.g. semi-gloss surfaces) for a viewpoint different from the illumination direction. This more complex reflectance map is derived as a simple sum of the contributions of the matte and specular components, and shows two areas of maximum luminance. Note that in all cases the assumption is a single infinite, uniform light source (all rays parallel to the assumed direction of illumination and of the same intensity). Addition of light sources, or altering their spatial distribution properties, would change the nature of the iso-reflectance contours. It is possible that a given set of iso-reflectance contours could arise from entirely different sets of lighting and BRDF conditions. Thus, if we (1) assume a particular surface reflectance characteristic (BRDF), and (2) assume a particular orientation and distribution of the illumination, we can derive the expected luminance with respect to the line of sight (view vector) for a surface patch of a given orientation. Conversely, we can derive the class of surface orientations consistent with a given surface luminance (under assumptions of viewpoint direction, illumination and surface properties). Also, since image illuminance (the amount of light recorded at the image) is directly proportional to surface luminance, we can derive the class of surface orientations consistent with the measured illuminance at a point in the image. This relationship is implicit in what is called the image irradiance equation (see Horn, 1986):

E(x, y) = R(p, q),

where E is the image illuminance (or irradiance) and R is the expected surface luminance (or radiance) as a function of surface orientation. So, for example, for a given surface orientation (p, q) we can predict a particular luminance R, which in turn allows us to predict, up to a scaling factor, a particular measured image illuminance. Then, we could potentially use the inverse of this function to predict the surface orientation consistent with any given measured image illuminance value. But it is obvious that such a mapping is not unique. There are an infinite number of possible surface orientations that could give rise to a particular illuminance value, since any surface orientation on a particular iso-luminance contour is a valid prediction for a given value of image illuminance. In other words, any given image illuminance value signals a class of possible surface orientations.
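
The non-uniqueness is easy to exhibit numerically. Below is a minimal sketch, not Ikeuchi and Horn's implementation: it assumes a Lambertian reflectance map with the source at the viewing direction, so that R(p, q) = 1/sqrt(1 + p^2 + q^2), and simply collects every sampled gradient whose predicted brightness matches a single measured illuminance value. The tolerance and the sampling grid are arbitrary choices of mine.

```python
import numpy as np

def R(p, q):
    # Lambertian reflectance map with the illuminant at the viewing direction:
    # predicted brightness depends only on how far the patch is slanted away.
    return 1.0 / np.sqrt(1.0 + p**2 + q**2)

E_measured = 0.8   # one image illuminance value (irradiance, proportional to radiance)
tol = 0.005        # measurement tolerance

# Sample gradient space and keep every orientation consistent with E_measured.
p, q = np.meshgrid(np.linspace(-3, 3, 601), np.linspace(-3, 3, 601))
consistent = np.abs(R(p, q) - E_measured) < tol

print("gradients consistent with one brightness value:", int(consistent.sum()))
# A large set of (p, q) pairs -- the sampled iso-brightness circle
# p^2 + q^2 = 1/E^2 - 1 -- so the image irradiance equation alone cannot
# single out a surface orientation.
```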

Note that so far, in constructing an inferential model, we are implicitly stating that the inference has to have a range of built-in assumptions about the environment, the very structure of which we are trying to recover. To summarize, we have assumed: (1) a particular physical and ontological model of the world that contains the relevant entities (light and reflecting surfaces), including assumptions such as that light travels in straight lines, does not dissipate over space, and reflects off of surfaces; and (2) that we are only interested in one aspect of surfaces, namely orientation with respect to the line of sight. Then, in order to set up the relationships between surface orientation, view direction and illumination direction, we assume (3) the illumination direction with respect to the view vector and (4) the surface reflectance characteristics (e.g., Lambertian). Finally, we have to use (5) a Euclidean projective geometry in arriving at a particular reflectance map. It is important to note that we have not used any information from the image per se up to this point. Once we have a particular image description, all we are able to do is put a weak constraint on the local surface orientation in the environment given the measured illuminance at a point in the image. The constraint is weak because, despite it, there are an infinite number of possible environmental configurations that could be inferred. So we are still very far away from getting any handle on inferring what the actual surface configurations are. Note that if we were to construe the analysis thus far as demonstrative of some real process in the human visual system, we are already missing large swaths of the informational content of the percept of surfaces, since the entire informational content of the percept here is reduced to local surface patch orientation. Even accepting this limited description of surface perception (orientation), and even if we could somehow specify a unique value for local surface orientation (i.e. pick a point on the contour) given the pixel brightness in the image, we still have an infinite number of possible solutions for the surface structure. This is because, though we have uniquely specified the local surface orientations, we still don't know the nature of the connectivity between the piecewise orientations. For example, in Figure 6, the image on the left is consistent with both a single connected surface and a set of disconnected
surface patches (some small ones 1 m away from the observer, other large ones even a mile away from the observer!). Either of these two configurations will result in identical retinal images, and both are thus valid interpretations of the image. So in order to infer the overall surface structure, after we have picked the possible set of orientations that matches the observed surface luminance (a contour in Figure 5), we additionally need to (1) limit the inferred orientation to a single value given a particular image illuminance value, i.e. pick a point on the given contour in the reflectance plots; and (2) figure out the actual location and size of the local surface patches. This procedure in classical computational vision is known as surface interpolation, and it involves making two assumptions:

1. Surface consistency: there is minimal variation in the geometry of surfaces in the environment.
2. Boundary conditions of the surface given in the image must be satisfied (the surface must conform to edges, assuming one can recover these using some other, independent mechanism).

In other words, we want to find the orientation or gradient (p, q) at each image location such that the resulting hypothesized surface (quoting from Leyton, 1992):

1. Conforms to the image intensities (illuminance);
2. Is as smooth as possible;
3. Conforms to the boundary values.

The first constraint is satisfied by picking the surface orientation (f, g)16 that minimizes the variation between the derived surface luminance (from the reflectance plots) and the measured image illuminance, and may be given by the following expression:
e_i = \iint \left[ E(x, y) - R(f, g) \right]^2 dx\, dy.

It can be thought of as the error between the luminance derived from the reflectance plots and the observed luminance as given by the image. The second constraint is satisfied by minimizing the variation in the surface normal orientations hypothesized for each location on the image (derived using the reflectance-gradient relationship above), and may be expressed as

e_s = \iint \left( f_x^2 + f_y^2 + g_x^2 + g_y^2 \right) dx\, dy.

To satisfy both, we minimize the two terms simultaneously through the combined expression

e_s + k\, e_i,

where k, the regularization term, is adjusted depending on how much noise we assume in the image. These equations are in effect the expression for a thin membrane clamped to an assumed boundary and allowed to settle to minimum variation, subject to conformance with the image intensity values. This model was developed further by Grimson (1981), Terzopoulos (1983), and Blake and Zisserman (1987) into more sophisticated models in which the environmental surfaces are assumed to be thin plates, and in which the reflectance error term and the minimization of surface orientation variation are modeled using spring models. Additionally, other cost functions might be applied to determine whether the inferred surface should break or bend at high image-contrast boundaries.
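
The following is a minimal one-dimensional sketch of this kind of regularized interpolation, not Ikeuchi and Horn's actual algorithm: the unknowns are per-pixel gradients f, the data term penalizes the mismatch between the reflectance-map prediction R(f) and the observed brightness E, and the smoothness term penalizes variation between neighbouring gradients. The reflectance function, the weight k, and the synthetic image are illustrative choices of mine.

```python
import numpy as np
from scipy.optimize import minimize

def R(f):
    # A simple Lambertian-like reflectance function of a 1-D gradient f,
    # standing in for the reflectance map R(p, q) of the text.
    return 1.0 / np.sqrt(1.0 + f**2)

# A synthetic brightness profile E generated from a smoothly varying gradient.
true_f = np.linspace(0.2, 2.0, 50)
rng = np.random.default_rng(0)
E = R(true_f) + rng.normal(0.0, 0.005, true_f.size)   # observed image brightness

k = 10.0   # regularization weight: trust in the image versus the smoothness assumption

def energy(f):
    e_i = np.sum((E - R(f))**2)    # brightness (data) error term
    e_s = np.sum(np.diff(f)**2)    # smoothness ("thin membrane") term
    return e_s + k * e_i

# Start from a uniform guess and let the discrete membrane settle.
result = minimize(energy, 0.5 * np.ones_like(E))
print("max |recovered - true gradient| =", round(float(np.max(np.abs(result.x - true_f))), 3))
```

The recovered gradients track the ones that generated the brightness profile only because the smoothness assumption was built in; nothing in the image itself dictated that choice.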

3.3. Hume and shape from shading

In conclusion, we see that the standard computational vision model of inferring surface orientation from image brightness values (shape from shading) relied on (a) assuming a particular objective model for observer-environment interaction and (b) assuming a particular physical model of the environment, in effect the physics of membranes or thin-plate surfaces (see Leyton (1992) for a fuller discussion). Given an image, the surface structure we infer is fixed, given the assumptions that are incorporated into the transformation. Note how, in such a model, the inferred environmental structure is deterministic on the image. That is, for such a mechanism, the information constituted in the inferred 3D environment is informationally identical with the 2D image: no new information results from making the inference from 2D image to 3D environment. All that is being done is that the information in the image is transformed into a different data format using a fixed and known set of transformations internal to the perceptual system, like a lookup table or a translation from one language to another. Since there is no information gain due to such a translation, it cannot genuinely constitute an inference. This is not a problem relevant for the machine-vision type of application for which this analysis was developed, but it is a problem if we construe it as a model of perceptual inference. If we take it to be a workable model of inference, then we must assume in the model that the 2D image and the 3D environment are distinct entities, where the inferred 3D environment has some additional contingent information that is missing in the 2D image, and that we want to infer the former from the latter. But in this process we have defined a mechanism which removes the very contingent identity that makes any inference interesting or informative. Another way of thinking about it is that inferential theories implicitly suggest that there is a gain in dimensionality, and by extension a gain in information, in the process of inference from a 2D image to the 3D environment. But the computational rendering of the inference reveals that there is a fixed relationship between the 2D image and the inferred 3D structure such that no information gain is possible: the transformation can be thought of as merely changing the frame of reference in a fixed-dimension information space. This is precisely what Hume told us will happen if we try to induce state B from state A while assuming that they are two distinct entities. In the process we will have to abandon the very distinction that makes the inference informative. The result is a model where the two states are necessarily connected. The acceptance of the validity of the inferential process negates the very distinction that makes the inference interesting, rendering the inferential transformation non-informative. No knowledge of the world is gained by mere application of the 3D inferential transformation, inasmuch as there is no knowledge gained from counting five objects as being
a set of two and a set of three objects and then adding them to make five, rather than by counting them together as a set of five; the addition transformation does not tell us anything more about the objects in front of us, or change their objective status in any way, because from the point of view of the knowledge of objects, two and three is the same as five. Any synthetic knowledge contained in the + operator (or 3D transformation function) about the way things behave is already internal to the perceiver; application of the operator (or transform) does not add information to the original set of five objects (or the 2D image). The five objects (or the 2D image) remain the only contingent truth.17 If one were to insist that the inference process was valid, and that it results in a gain of contingent truth, then one would be embracing, as the empiricists warned, a naïve realist epistemology. Note that the derivation of the transformation functions from the 3D environment to the 2D projection (i.e. the forward-optics functions) in the brightness domain constitutes genuine synthetic knowledge of the behavior of things in the environment (captured in terms of our perceptual ontology, as in any physical science). Establishing the mathematics of forward optics therefore constitutes a valid scientific enterprise in and of itself, having application to domains in machine vision, aspects of physiology, and the assessment of perceptual capacities in human vision. However, it does not bear on the question of how the visual system parses the sensory flux into the apparent brightness variation that is perceived, or how the phenomenology of the continuous surface that appears to support those brightness variations is achieved, or why it does so in the way that it does. Let us now look at a more recent approach to perception as the inference of 3D structure from a 2D image. This approach attempts to get around some of the problems of standard computational vision by focusing on the stochastic nature of the sensory flux and the organization and structure of the external world itself. Its ostensible strength comes from the realization that inferences occur in a probabilistic, and not deterministic, domain.

3.4. Perception as Bayesian inference

There has been enormous interest in the last decade in reformulating the perceptual inference problem as one of probabilistic inference. The idea is that the image (or set of images) is a statistical sample of some objective external environment, which itself is made
up of a stochastic distribution of properties and elements (say, illumination, surfaces, points, objects, etc.). The problem of perception, then, is to infer from the properties of the image (F) which specific configuration of the external environment (C) may most likely have given rise to that image. (Note that for now we'll skip over the problem that the properties of the image are usually described in terms of the features that the percept itself delivers, e.g. points, lines, angles, as in the previous models.) We will use a generic, and notably clear, formulation of perception as Bayesian inference presented by Jepson and Richards (1992). To paraphrase Jepson and Richards (1992): there are certain properties or configurations P that occur in the environment or world context C with probability p(P|C). Considering some collection of measurements on that environment F, also referred to as image features, we can express the probability of obtaining a particular image feature F given knowledge of P, namely p(F|P). The problem of perceptual inference is to determine the probability of the property P, in a world context C, given a measured feature F. The important addition that Bayesian theory (as opposed to standard probability theory) brings to the formulation is that the probability of the occurrence of P in the world, p(P|C), or simply p(P), is a critical factor in the inferential step. Within a Bayesian framework (as indicated by Richards and Jepson) the prior probabilities p(P|C) can be thought of as the learned objective structure and behavior of the world, or a priori idealizations of the world. The probability p(F|C) is pre-determined due to the fact that we assume a particular structure for the interaction between the observer, the environmental configuration (surface, point, what have you), and the medium (light), within a Euclidean projective framework. We know or establish p(P|C) because we have a priori knowledge or experience that the world behaves in a certain way. Not surprisingly, Bayesian analysis of perceptual inference breaks down into the same two components enumerated for the standard computational vision example: (1) probabilities relating to observer-environment interaction; (2) probabilities for the inference of observer-independent structure. The observer-environment interaction phase is what is typically referred to in the Bayesian literature as observer analysis, or ideal observer analysis. It is the derivation of the likelihood functions for the probability of observing a particular image configuration given a
particular environmental configuration. In the shape from shading domain, this derivation would essentially be isomorphic to what we saw as the iso-luminance function in Horn's analysis. Once again, it only constrains the possible space of environmental configurations that could have given rise to the observed image configuration. In order to arrive at a probabilistic inference of a single solution, this class of solutions must be further constrained by making assumptions about the actual structure of the world. This is the second phase: the inference of observer-independent structure. In a Bayesian framework these assumptions are introduced as the prior probability p(P|C). The two components, the likelihood function (observer-environment interaction) and the prior distribution (observer-independent environment properties), are then combined to produce the posterior distribution, which, in effect, underwrites the validity of any particular interpretation. In the case of the shape from shading model of Ikeuchi and Horn (1981), the most valid interpretation is the one in which we choose the configuration that minimizes the size of the expression. In the Bayesian case, the most valid interpretation is the one that maximizes the size of the expression, namely the posterior probability. Jepson and Richards (1992) present a formulation that does not directly rely on probabilities, but rather on ratios of probabilities. They propose a posterior probability ratio R_post:

R_{post} = \frac{p(P \mid F, C)}{p(\neg P \mid F, C)}.

The perceptual system infers the property P which generates the largest value of R_post, given F and C. Note that in this formulation F and C are fixed, i.e. the structure of the image and the behavior of the world are assumed, and P is the inferred property that we are trying to select so as to maximize this ratio. Using Bayes' rule, this can be rewritten as

R_{post} = \frac{p(F \mid P, C)}{p(F \mid \neg P, C)} \cdot \frac{p(P, C)}{p(\neg P, C)},

which can be written simply as

R_{post} = L \cdot R_{prior},

where L, the likelihood ratio, is called the measurement likelihood condition, and R_prior, the ratio of priors, is called the genericity condition, because this ratio is an expression of how generic the property P is in C.


Jepson and Richards now consider the perceptual inference of a simple image of a "V", shown in Figure 7. The image feature or property (F) is the co-termination (within some resolution ε) of the two lines in the image. From this we seek to infer the structure P in the environment: two sticks in 3D space attached at one end. Obviously, F holds when P is true, namely

p(F \mid P, C) = 1.

If P does not hold in the world then, normalizing the entire image to unit area, the probability of observing F in the image is equal to the area included within some region of size ε (where ε is an arbitrarily small dimension that can be thought of as the resolution of the image; note that ε is akin to the image noise parameter k in the Ikeuchi and Horn model):

p(F \mid \neg P, C) \approx \varepsilon^2.

Figure 7. From Jepson and Richards (1992).

From this we see that the likelihood ratio is a very large number (assuming some reasonable degree of resolution, i.e. ε is suitably small):

L = \frac{p(F \mid P, C)}{p(F \mid \neg P, C)} = \frac{1}{\varepsilon^2} \gg 1.

This would lead one to think that inferring P from F would be reasonable. However, as Richards and Jepson correctly inform us, in a random world it is actually guaranteed to be wrong from a probabilistic standpoint. If we assume a random world (and normalize the environmental context to unit volume), then the probability of P in that world (two sticks co-terminating in 3D) will be proportional to ε³. Thus we have

p(P) \approx \varepsilon^3, \qquad p(F) \approx \varepsilon^2, \qquad p(P \mid F) = \frac{p(P)}{p(F)} \approx \varepsilon \ll 1.

Or, in terms of the ratio of priors,

R_{prior} = \frac{p(P, C)}{p(\neg P, C)} = \frac{\varepsilon^3}{1 - \varepsilon^3} \approx \varepsilon^3.

So, combining, for the final posterior ratio we have

R_{post} = \frac{1}{\varepsilon^2} \cdot \varepsilon^3 = \varepsilon \ll 1

(again, assuming a reasonable resolution ε). This implies that the inference of P from F is actually not supported, since R_post is a small number less than 1. This is precisely what we came up against in the Ikeuchi and Horn model: if we assumed a particular orientation of a surface patch for a particular image illuminance (based on our observer-environment analysis), in a probabilistic sense we would be guaranteed to be wrong, since without any surface constraints the set of possible orientations is uniformly distributed over the respective iso-luminance contour in gradient space. In order to achieve a singular solution, we saw in the Ikeuchi and Horn model the need to make some assumptions about the behavior of entities in the world. In the present example we need to make R_post arbitrarily large for V-type configurations in the world (say some value δ less than 1, which represents a discrete
probability mass). This could be thought of as a prior for connectivity, just like the assumption that surface patches are connected smoothly in the Horn model for the shape from shading domain. Then we get

R_{post} = \frac{1}{\varepsilon^2} \cdot \delta = \frac{\delta}{\varepsilon^2} \gg 1

(with the reasonable assumption that ε is at least as small a fraction as the probability mass δ). Now R_post is very large and so supports the inference to P.
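
The arithmetic of the example can be spelled out with concrete numbers. In the sketch below, the resolution ε = 0.01 and the prior mass δ = 0.1 are illustrative values of mine, not Jepson and Richards':

```python
eps = 0.01     # image resolution: co-termination is detected within this tolerance
delta = 0.1    # discrete prior probability mass assumed for connected sticks in C

# Likelihood ratio: F is certain given P, but has probability ~eps^2 given not-P.
L = 1.0 / eps**2                        # 10,000

# In a purely random world the prior ratio is ~eps^3 and the inference fails:
R_prior_random = eps**3 / (1.0 - eps**3)
print(L * R_prior_random)               # ~0.01 (= eps) << 1: do not infer P

# With the genericity assumption (a mode of mass delta for connectivity) it succeeds:
print(L * delta)                        # 1000.0 (= delta / eps^2) >> 1: infer P
```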

In the Bayesian framework we would say that what we have done is assumed a model environment in which the property P is generic (namely, that such a property has an arbitrary non-zero probability mass in the probability distribution over possible configurations). Then the inference is guaranteed to be correct. This assumption of non-zero probability masses in the distribution is the model of the world (C) that the perceiver must assume in making the inference. Since our perceptual judgments are categorical, Richards and Jepson suggest that there must exist a framework to reduce the prior distribution to a single mode that can be selected for a perceptual inference. Indeed, they propose that the model of the world (C) may be considered to be just a collection of modes, or assumptions about the world, that are presumably hardwired into the observer. These priors are assumptions about structure in the world (right angles, connectivity, regularity assumptions or generic-viewpoint assumptions). In such a framework, the likelihood ratio is assumed to be infinite, i.e. the probability of the image feature F arising in the absence of the property P is infinitesimally small (see expression on p. 48). Given a positive mode in the prior distribution for property P in C, the inference to P is certain to occur. Thus the prior component, when configured with an infinite likelihood ratio, ensures a deterministic rather than a probabilistic inference from F to P. In other words, given the feature in the image, the inference to P is inexorable. So once again, in a Bayesian inference device, the image and percept are no longer distinct; their connection has become a necessary one. The notion of having to assume priors in order to constrain observer-independent properties is directly related to the regularity constraints for surfaces in the shape from shading models. Indeed, many other similarities abound between Bayesian models and standard computational vision models. For example, some have suggested a variation on prior distribution models called
competitive priors, in which the observer-independent environment constraint is not limited to a single prior distribution but to a set of competitive prior distributions (Yuille and Bülthoff, 1996). This is equivalent to what is captured by the weighting parameters in the multiple-component standard computational vision models of Terzopoulos (1983), the weighting parameters being adjusted based on the relative significance and cost of each of the environmental assumptions (e.g. boundary conditions, criteria for breaking surfaces, etc.). Thus, from the standpoint of understanding perception as a problem in inference, Bayesian models of inference are not essentially distinct from standard computational vision models. Though Bayesian analysis provides purchase in those areas where the probabilistic nature of the input is relevant, e.g. estimation of spatial properties, and is also useful in applications for artificial inference devices (as are the standard computational models), it is unclear what the redefinition of the problem of inference in a probabilistic domain contributes. As we have seen, the models, though initially construed in terms of probabilities, have to be configured such that the inference given an image is deterministic on that image, and the original formulations based on stochastic models of image sampling or distributions of properties in the environment become moot. It should be noted, however, that in Bayesian models of spatial estimation, metric estimates of the scene are indeed genuinely probabilistic, but those models have to be carefully distinguished from the one discussed here, because such models are ultimately about sensor co-calibration, and not perceptual inference. Several studies have provided empirical evidence showing that estimates of metric properties of the sensorium can be well modeled by Bayesian statistics, and shown to behave like maximum likelihood estimators (Landy et al., 1995; Ernst and Banks, 2002; Alais and Burr, 2004; Hillis et al., 2002); a sketch of this kind of estimator is given below.
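
For contrast, here is a minimal sketch of the kind of maximum-likelihood estimator these co-calibration studies test: two noisy estimates of the same metric property (say, slant from disparity and slant from texture) are combined with weights inversely proportional to their variances. The numbers are illustrative and are not taken from any of the cited studies.

```python
# Two sensory estimates of the same quantity, with their (assumed known) noise variances.
slant_disparity, var_disparity = 32.0, 4.0   # degrees, degrees^2
slant_texture,  var_texture   = 38.0, 16.0

# Maximum-likelihood (inverse-variance) weights.
w_d = (1.0 / var_disparity) / (1.0 / var_disparity + 1.0 / var_texture)
w_t = 1.0 - w_d

combined = w_d * slant_disparity + w_t * slant_texture
combined_var = 1.0 / (1.0 / var_disparity + 1.0 / var_texture)

print(round(combined, 2), round(combined_var, 2))   # 33.2 3.2
# The fused estimate leans toward the more reliable cue, and its variance is
# lower than that of either single cue: estimation, not shape inference.
```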

It has to be noted, however, that the most successful of these Bayesian studies have been exactly the ones that have not attempted to apply the theory to shape recovery or inference. They have not attempted an application of Bayesian theory to so-called mid-level vision, e.g. the recovery of surface brightness or surface shape, or to ecological-statistics approaches to deriving or explaining Gestalt grouping phenomena. Again, approaches such as ecological statistics can prove effective in providing compact descriptions of the external environment for use in artificial applications, such as object recognition, line detectors, etc., or to establish
the statistical relationships between metric environmental properties and the properties of the perceptual structures they give rise to. This latter type of analysis provides precisely the kind of useful synthetic knowledge that forward-optics analysis in the shape from shading or shape from texture domains in standard computational vision provides. But they do not directly address the nature of perceptual representation, and thus what the information format of our percepts is. We discuss this further in part 5.

3.5. The problem of induction in any computational model of learning

The central problem with empiricism, evident in the models presented above, has cropped up in many domains, not the least of which have been efforts to develop artificial inferential devices, e.g. machine-learning algorithms. Machine-learning research has repeatedly shown that induction is not possible in a concept learning space without the introduction of an inductive bias to constrain the search space (see Pratt, 1994). The most critical point is not that the system has to have a bias, but that once such a bias has been chosen, the rule that will be induced via learning is already determined by the nature of the training examples. The rule is already a property of the image set, given the inductive bias. In other words, given the computational procedure, the training image set and the computed rule have a necessary (and not contingent) connection, just as the inferred 3D interpretation achieved by applying a fixed transformation on a 2D image has a necessary connection to that image itself. Namely, it is no longer an inductive solution but rather a deductive one. And, to boot, there is no independent objective basis to determine whether one inductive bias is better than another (see Pratt, 1994; Mitchell, 1997); the sketch below illustrates the point. The requirement to assume prior distributions in Bayesian models, or regularity assumptions in standard computational vision models, is precisely this sort of selection of inferential bias. In the perception literature the bias that is claimed to be the right one is naturally the one that explains the data. This leads to the implication that the solution is already determined in the image, rather than being an inference to the external environment.
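
A minimal sketch of the point about inductive bias (the toy data and the two bias families are mine): once a hypothesis space is fixed, the "learned" rule follows deductively from the training set, and nothing in the data itself says which hypothesis space was the right one to fix.

```python
# Training examples: (feature_1, feature_2) -> label
data = [((1.0, 1.5), 0), ((2.0, 2.0), 0), ((6.0, 5.5), 1), ((7.0, 7.0), 1)]

def learn_threshold(examples, feature_index):
    # Inductive bias: "the rule is a threshold on one pre-chosen feature".
    # Given that bias, the induced rule is fully determined by the examples.
    zeros = [x[feature_index] for x, y in examples if y == 0]
    ones = [x[feature_index] for x, y in examples if y == 1]
    return (max(zeros) + min(ones)) / 2.0   # midpoint threshold

rule_a = learn_threshold(data, 0)   # bias A: threshold on feature 1 -> 4.0
rule_b = learn_threshold(data, 1)   # bias B: threshold on feature 2 -> 3.75

# Both rules classify every training example correctly, yet they disagree on a
# new input, and the training data alone cannot say which bias was "right".
new_point = (5.0, 3.0)
print(new_point[0] > rule_a, new_point[1] > rule_b)   # True False
```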

This confound, between the need to define the image and the inferred external world as contingently separate while at the same time attempting to understand how the latter may be inferred from the former without falling into a naïve realist trap, has not been lost on all researchers in the field, as is evident from the following quotes from two prominent Bayesian theorists:
...uncertainties associated with how closely any chosen model for a world fact matches Mother Nature's model... requires measures on knowledge which may not be explicit rule-based knowing... It is not enough simply to give... a probabilistic structure and then to incorporate them into theoretical frameworks in a like manner... We must know specifically just how the structure of our cognitive models (and their underlying assumptions) force particular conclusions about the world. (Richards, 1996, p. 227; my italics)

Is there nevertheless an observer-independent world out there? I think so... But does this observer-independent world resemble what I see, hear, feel or smell? That is more than I can know. But I suspect it does not. (Hoffman, 1996, p. 220)

However, the next statement by Hoffman indicates how difficult indeed it is to shake loose of the notion of perception as inference:
In sum, for good Bayesian analysis we need appropriate priors (and likelihoods). But when we look we see only our posteriors. What are we to do? Well, what we in fact do is to fabricate those priors (and likelihoods), which best square with our posteriors. And we are happy when the three are finally consistent... (Hoffman, 1996, p. 220; my italics)

But Hoffman does not go on to tell us that fabricating these priors and likelihoods by squaring them with our posteriors (e.g. choosing the delta probability mass for connectedness in the example above) results in a model where the image and the perceived external world lose their distinction, and one where the inferential process is reduced to a naïve-realist one. It does so because generating the right posterior depends on choosing the right prior and likelihood function, which themselves are based on that very posterior; i.e. the priors and likelihoods are themselves meaningless without being defined in terms of the posterior they are supposed to generate. Unfortunately, despite the careful and thorough analysis of perception by Hering, Mach, and later the Gestaltists, the appeal of the Helmholtzian and Gibsonian approaches, which maintain an intrinsic naïve realism, has unwittingly discouraged a more epistemologically sound approach to the study of why our perceptions force particular conclusions about the world. Naïve realism in all its guises is so potent an idea that a concerted effort has to be made to recognize it, lest sophistication of analysis provide false cover.

4. GESTALT THEORY AND GIBSON

Two approaches of the last century that resisted the constructivist approach to perception championed by the Helmholtz school were those of the Gestalt school and the psychologist J.J. Gibson. Gibson himself was influenced by the Gestalt school and drew heavily from their insights, but eventually sought to distance himself from their approach. We discuss here briefly how these two approaches bear on the analysis of computational theories provided in the previous section.

4.1. Gestalt theory and its current popularity

There has been a great resurgence of interest in Gestalt theory in the last two decades, as researchers have begun to realize that traditional approaches in computational vision, such as those proposed by Marr (1982), and neurophysiological theories aligned with Hubel and Wiesel's model of cortical detectors, have failed to deliver everything they promised (see, for example, Westheimer (1999), Nakayama and Shimojo (1995)). There is a move to codify and quantify Gestalt theory within the general framework of Neoconstructivism. The popularity of a sub-field called perceptual organization, or grouping, suggests that Gestalt theory is alive and well and enjoys wide application. But that may be too hasty a conclusion, because on closer inspection it is precisely those things that the Gestaltists held to be of paramount importance, the ontological and epistemological concerns that underlie their formulation, that appear to be most often sidestepped. Meanwhile, the observed effects in perception, such as the Gestalt grouping principles and the contextual effects of the whole on the part, have come to constitute the entirety of the modern notion of Gestalt. In a sense, the current interpretation of Gestalt is probably what Gibson would have ended up with were it not for his visceral distaste for the Gestalt notion of grouping, a distaste that the following quote betrays:
Wertheimer's drawings were nonsense patterns of the extreme type, far removed from the images of a material world. (Gibson, 1950, p. 196).

4.2. Grouping principles

The common interpretation of Gestalt theory is that it specifies a set of rules, heuristics, or processes used by the visual system to recover the shape of objects in the world from elementary units available early in the visual stream. The application of grouping processes to elementary visual units gives rise to observed groupings, such as detected line segments being grouped into contours. The end product of this process is a set of 2D entities (which resolve ambiguities of figure/ground, connectivity, occlusion, etc.), where the final 3D interpretation (the disposition and shape of objects) is constrained by the various cues available (e.g. binocular disparity). The organizational rules that can take us from the retinal atoms to the perceived 3D space are ones that explain the observed phenomenology of object perception, e.g. closer objects appear to group together (for a description of the Gestalt rules see Sekuler and Blake, 1986; Palmer, 1999). Added to this is the generic rule of grouping that says: the part is defined by its relation to the whole. This has led in recent times to empirical and quantitative analysis of what has become known as global vs. local processes (see, for example, Sekuler et al., 1994), and to a whole range of empirically observed configural effects, particularly in the lightness and brightness perception domain. In some recent approaches to grouping, the grouping rules are themselves assumed to be instantiated in the organization of the external world or in objective images of the external world. That is, the actual elementary units in the retinal image, say line segments or points, have a particular organizational structure due to the special nature of the external environment, and the Gestalt rules of grouping are the visual system's learned instantiations of that structure. This idea has developed into an approach called ecological statistics, where the grouping principles are derived from statistical analysis of the distribution of properties or elements in photographs of natural scenes (see, for example, Elder and Goldberg, 2002; Geisler et al., 2001). The idea is that the grouping principles we observe in our percepts are derived from the nature of the grouping of elements (say line segments) that obtains statistically in a set of images of an objective external world (environment). In other words, grouping principles that the visual system uses to bind elementary units available at early stages of visual processing can
be understood by analyzing sets of images produced of an objective external ecology (environment)18. In typical computational renderings of such theories, grouping principles are assumed to be rules or heuristics used by the visual system to group and then detect or identify lines, curves, surfaces, objects, etc. in the image. The grouping principles are considered to be effective because they end up detecting things that are out there in the world in the way we end up seeing them. Once again, the final product is a neural code that signals the existence of a real external property or behavior (closure, continuation, etc.) given in some objective description and arrived at by the appropriate application of Gestalt rules. But this interpretation is precisely what the grouping principles appear to have been arguing against. For the Gestaltists, the grouping principles were a description of the nature of the spontaneous organization of elements that can be observed in a percept despite the fact that no indication of that organization is present in any objective spatial description of the raw stimulation, or even of the objects given to us in perception. The primary purpose was to show that our perceptions over-specify any descriptions that we might assign objectively to the percept or the retinal image. In the classic example shown in Figure 4, perceptually we naturally group the dots into vertical lines, yet nothing in the stimulus itself, or for that matter in the perceptual objects before us (the dots), provides information that the vertically aligned dots should be seen as related. The vertical grouping constitutes a qualitative content that is missing in the raw objects before us. Note that even though it is the case that the measured distance between dots in a vertical column is smaller, there is no objective external criterion that specifies that we must group them as such, since an external criterion that says more distant dots should be grouped is equally valid. It is not so much that grouping seems to follow certain rules (e.g. shortest distance) but that such a perceptual thing as grouping occurs at all. In other words, there are qualities (or informational content) that are purely in the perceptual domain and cannot be attributed to the objective external world. These qualities are not re-presentations of the selfsame qualities in the external world, because there are no such qualities that can be defined on the external world independently of perception. Even if we uncover some basic quantitative rules that specify under what conditions certain groupings occur, this still does not explain the grouping phenomenon itself. The Gestalt argument about
observable rules of grouping is thus ultimately an epistemological one, pointing out, as Leyton (1992) has vividly demonstrated, that information exists at the perceptual level that cannot be explained in an inferential model of perception (Figure 8). These two considerations lead to what one might claim are the two most important ideas at the core of Gestalt theory:

(1) Perception is the presentation19 of an organized image. It is not a re-presentation of real entities out in the world, in some objective descriptive mode. The Gestalt rules of grouping are merely indications of how one might begin to construct a description of image organization in order to understand the informational content of the percept. Enumerating or identifying rules of organization is just a descriptive exercise on the way to understanding what information is contained in the organized percept. The percept, by definition, is achieved through an organization that takes into account the entire image.

(2) Psychophysical isomorphism: the organized percept is isomorphic to the state of the neural circuitry in response to the image. This very important idea provides a basis for why we experience our perceptions in exactly the same terms in which we describe their content. Psychophysical isomorphism allows us to explain the fact that we do indeed experience the perceptual world directly, rather than experience a symbolic representation of a real world. A line 4 ft long is measured to be twice the length of a line 2 ft long, and is also perceived to be twice as long. The scientific problem of perception is to determine objective principles, independent of the perceived image, that can be
brought to bear in explaining what is occurring in such an organization. The problem of how it occurs can then be tackled by empirical research in neurophysiology. In the Berlin school of Gestaltists, this second principle was promoted most vehemently by Köhler. As he remarks, "this last application of the principle (of isomorphism) has perhaps the greatest importance for Gestalt psychology..." (Köhler, 1947). This principle is also central to Hering's thinking: that percepts are nothing more than the state of the organismic entities. It was this thesis that allowed him to develop the opponent-process model of color, which had to wait over half a century to be finally empirically vindicated. In the modern era, the only quantitative theory of perception that has incorporated both these crucial Gestalt ideas has been Leyton's theory of perception (1992).

4.3. Gibson's direct perception


the characteristic of perception is that the result is not so much spontaneous as it is faithful to the thing perceived. The question is not how the percept gets organized but why it is always organized like the particular entity toward which the eye happens to be pointing (Gibson, 1950, p. 25; my italics)

Gibson's approach is a strange combination of Gestalt intuitions with an aversion to Helmholtzian inferential models, directed by a rigid behaviorist and naïve realist ethic.20 He starts out by criticizing a cue-based inferential model on the basis that it is not explicative of the perceptual process or experience, and replaces it with the novel idea that perception can be explained by examining the invariants between the visual field (retinal image) and the external visual world (Gibson, 1950). Gibson essentially redefines the nature of the term cues, as measures defined over the extent of surfaces rather than, in the classical notion, as measures over points in space. He provided a novel analysis of sets of gradients that can be defined on surface properties such as texture, shading, etc. Note that, from an epistemological point of view, there is nothing substantively different between his notion of cues and the ones he starts out railing against in traditional space perception approaches. The essential difference is that while the older space perception model suggested that cues could be used to infer the distance and direction of points in space, Gibson proposed that
gradients in the image contain information about the external layout of surfaces. Here there is both a novel idea and a sleight of hand. The novel idea is that space perception may be thought of as a process of inference of surface layout, rather than the inference of point layout; an idea that rightfully has had an enormous impact on computational approaches to vision, as we have seen in the previous section. There is also something prescient and phenomenologically correct with respect to his claim about surfaces from an ontological standpoint. First, a range of empirical results on human discriminative capacities has supported the primacy of surfaces as perceptual entities (see Nakayama and Shimojo, 1996). Second, Gibson's notion that anything we see, we see as a surface, seems to have a phenomenal reality to it. Gibson's sleight of hand is in packaging his idea of surface perception as somehow fundamentally distinct from the inferential model of space perception that preceded it. He starts off with the idea that we should examine directly the percept and its correlate, the visual field, suggesting that the analysis can ignore the external world and the actual physical stimulation; a view sympathetic to Gestalt principles. Yet he ends up concluding that the percept is essentially a copy of the external world, and that the visual field is essentially a description of the retinal pattern of stimulation; a conclusion consistent with the inferential model of perception. His analysis of invariants then reduces to a description of forward optics in the external world (see Figure 11), given in terms of certain pre-specified measures that he calls gradients. This, of course, as he rightfully states at the outset, can be done without even considering the workings of the human brain! Gibson's claim that the total stimulation contains all that is needed for perception is, on closer inspection, essentially an admission that there is no informational distinction between the 2D retinal image and the perceived 3D visual world in an invariants-based approach. As we saw in the previous section, any computational rendering of perception as inference results in the erasure of the informational or causal distinction that such models start out with, and thus inadvertently confirms that the transformation from 2D image to 3D percept under an inference model does not tell us anything about perceptual processes, but just describes relationships between two ways in which the perceptual product can be measured.

Figure 9. Perception as inference.

Gibson, in his later work, must have seen that the inexorable conclusion of his theory of invariants/gradients was one of two mutually exclusive models: (1) an inferential model in which the percept is a re-presentation and so cannot be experienced; or (2) one that reduces to the naïve-realist camera-like model where the external world is directly painted onto the mind (Figure 9). He correctly saw that in a schema where the perceptual inferences are supposed to be describing an objective external world, that description would have to be a re-presentation. Gibson was keenly aware that this was problematic: a re-presentation cannot provide an experience of the thing it is representing, any more than inferring the passage of a bear from paw patterns constitutes an experience of the bear's passage. In a scheme where the neural code is merely a symbolic code for the existence of a real external property or behavior, that neural code cannot appear to us as the thing itself. Gibson's Gestalt roots and his exceptional phenomenological sense did not allow him to abandon experience. Yet at the same time Gibson was apparently unable to rid himself of his distaste for the merely phenomenal world of the Gestaltists. Conjoining this distaste for a phenomenal world with his fearless embrace of naïve realism led him to the almost evangelical notion of direct perception. Though Gibson realizes that he would rather accept conclusion (2) above, because of the nagging reality of perceptual
experience, he also realizes that a theory of perception must be something more than just a mere camera (with its associated homunculus). So he develops what came to be seen as the rather obscure notion of direct pickup, whereby the objective information in the external world is directly picked up by the senses, and whereby the invariants (gradients) in the percept directly reflect the invariants (gradients) in the world. Thus he achieves a model where the perceptual state is a direct experience of the external world, and by introducing a new piece of terminology, direct perception, he skillfully brushes the camera and homunculus under the carpet. But the obscurity of the notion of direct perception, and its obvious resemblance to the camera-like model, is what presumably won it disfavor in modern perceptual science.21 Neoconstructivist theories have, for better or worse, tended to accept conclusion (1) above, a perceptual theory where perceptual experience is brushed under the rug.

5. SHORTCOMINGS OF NEOCONSTRUCTIVISM

[A] theory of cues could never really explain how we see the world, or why it looks the way it does, but only how we make judgments about the world. Gibson, 1950

The analysis we have provided in part 3 is meant to highlight shortcomings of two approaches in computational vision only insofar as they apply to research in perceptual inference or perceptual representation. As asserted before, both approaches are valid for a range of basic research in sensory physiology as well as in the development of artificial systems (object recognition, image processing, robotics, etc.). However, a considerable amount of this work has crossed over into the research program that is directed at understanding how the visual system constructs and encodes a percept (column B, Table 1); and indeed some have advocated this crossover. It is precisely because standard computational vision and Bayesian approaches have attained such a foothold that there needs to be recognition that they are not viable as models for the explanation of perceptual representation. Bayesian approaches and standard computational vision approaches to perceptual inference, both of which are quantitative instantiations of Gibsonian and
Helmholtzian versions of perceptual inference, share the same set of problems, which are summarized here. Figure 1 showed the naïve realist camera model of perception that is typically rejected in the introduction to texts on perception. Usually, the main reason it is rejected is that it clearly involves an all-knowing homunculus who has to interpret the image created by the camera. Figure 9 shows the standard Neoconstructivist model of perception as inference. The 2D image is considered to be an entity distinct from the internal visual processes and the 3D percept they give rise to. The image formation process is assumed to occur, and to have an objective description, external to the perceptual system. Figure 10 shows Gibson's model, which is essentially a combination of Figures 1 and 9, the main difference being that, since Gibson does not want any representational or inferential schema to be specified, we place a homuncular direct perceptor in between the external and psychological domains. In both the Gibsonian and the Neoconstructivist inferential model, the quantitative description of forward optics is assumed to be an objective description independent of perception. In Neoconstructivism, perception is the inversion of this optical process, so that a 3D description can be inferred from the 2D image. Since this inductive process is underdetermined, it requires that the perceptual process have built-in biases to constrain the inversion. The crucial point is that the 3D description that we arrive at in our inference has the same objective description that we can apply to the objective external world. Certain aspects of the perceptual
description, e.g. the sensation of color, are not thought to necessarily reflect properties of the external world under the same descriptive mode (the percept of a color is hard to describe quantitatively, but its external correlate, wavelength, can be described spatio-temporally).22 One might argue, therefore, that Neoconstructivism does not hold to a purely naïve realist view of this process. Instead it can be considered to be only weakly naïve realist, because it is only the most important aspect of the percept (the spatial description) that is thought to directly reflect properties of the external world, under the same geometric descriptive mode. Thus, when I see a line that is X meters long, that percept is veridical when there exists a line that is X meters long in the world. Note that both the term line and the term length are considered to be objective geometric descriptors, and to have objective informational content, independent of perception. In the same way, in this model, the 2D image is assumed to have an objective description independent of perception, e.g. wavelength of impinging light, intensity of impinging light, distance between image elements, size, orientation, etc. Are these attributes, measures and entities correctly thought of as objective descriptors? Yes and no. As long as we are trying to attain synthetic knowledge of the external world (as in all the other sciences), they can be considered to be objective in the same way as an electron orbit around an atom can be considered objective. This is because all the sciences can be agnostic (for the most part) as to the source of concepts such as objects, persistence over time, measurement, distance, etc. The goal of the non-perceptual sciences is to understand, given our perceptual window into the world, what regularities hold in that world as described by our perception, or by motor-perceptual extensions of it. Thus, understanding forward optics (likelihood functions, ideal observers, etc.) is a valid scientific endeavor that describes functional relationships between entities in the external world in terms of our perception of them. But, as has been most clearly pointed out by Leyton (1992, 1999), forward optics does not provide any knowledge of the perceptual representational structure, or of the information content of percepts. Rather, as Kant would say, forward optics gives us synthetic knowledge of the invariant relations among entities in the external world, which, in terms of the ontology, attributes, and modes of measurement specified by our extended perception (perception + measurement), we label as light rays and surfaces. (Later we will explain how the notion of external measurement is merely an extension of perception, or more
correctly, visuo-motor action.) This knowledge that we gain from understanding forward optics is useful, since it can be used to determine how the sensory systems calibrate, or to understand the physiology of the sensory system, e.g. that there are three receptors with differential sensitivity to wavelength.23 In modern physics, some of the very assumptions about modes of measurement, and about the validity of assuming persistence over time, have already been explicitly challenged and incorporated into modern physical theories. Indeed, there is a direct lineage from breakthroughs in modern physics back to early perceptual science, through Mach, Hering, and Fechner, all the way to Kant. Yet, astoundingly, in contemporary perceptual science some of these very assumptions are taken as objective! What Gestalt theory suggested, and what Leyton has more explicitly and quantitatively pointed out, is the problematic assumption made by Neoconstructivist-type theories that the percept is descriptively equivalent to some actual external object. In other words, they have pointed out the flaw in the assumption (of inferential theories) that there is nothing descriptively present in the percept that constitutes more information than what is available in the external object itself. Inferential theories allow for the description to be erroneous or sparser, but in no way to contain more information.24 According to Neoconstructivist theories, the informational content of the percept of a line (for example), its description given spatially, is parasitic on the objective informational content that the real object constituting a line in the external world possesses, given in the same type of spatial description. Furthermore, regularities of the external world are captured in terms of the same descriptive atoms that are identified in the perceptual product, e.g. parallelism of lines, collinearity of lines, isotropy of texture elements, etc. The problem is that the atoms and regularities are opportunistically defined based on how well they fit with the perceptual product. Yet the atoms and regularities are assumed in these theories to have an objective external basis. Such an assumption is fine for computer vision systems, because there is no other external basis for determining what constitutes a regularity or an atom or a property in the external world, apart from falling back on the phenomenological senses of the computer scientist, which implicitly rely on the perceptual differentiation of things like objects, surfaces, lines, etc. But the visual system itself does not have an external interpreter telling it what is objective and what is not. The lack of
an objective basis for defining how to parse a given data set, or what constitutes a special property over that set, was famously illustrated in pattern recognition research by Watanabe's Ugly Duckling Theorem (Watanabe, 1969).

5.1. Shape as calibration map

One way out of the problems just characterized, while staying within an inverse-optics framework, is to take a strong behaviorist position and be explicitly agnostic to the existence of perceptual representations. Instead, a percept can be thought of simply as being that which will quantitatively predict particular behavior (motor response or perceptual judgment). A report of a perceptual quantity (say length or slant), through a discrimination or magnitude estimation paradigm, implies nothing more than that the reported estimate can quantitatively predict what the magnitude of a related motor action will be, or predict the magnitude of an external motor measurement of the object giving rise to the stimulation. Taking this approach has the simultaneous effect of banishing the homunculus and rendering moot the question of what descriptive moles the percept is based on, or what their information content might be. One no longer needs to make a claim about whether the descriptive moles and measures are objective properties of the external world, or merely perceptual constructs. In such a model there is no need to discuss what the informational content of either the perceptual representation or the actual object in the world is. In this approach, notions such as slant, or other constitutive properties, don't need to be thought of as being represented symbolically, or otherwise, in the visual system. They can be thought of as merely useful quantitative devices used by the researcher to relate the perceptual estimates and the predicted motor action or measurement. One might use a quantity such as horizontal disparity under this model, but there is no claim that such a quantity is actually represented in the system in the way the quantitative theory specifies. Such quantities are merely operationalized variables that allow us to determine how and if sensory-motor co-calibration occurs. The quantities themselves, such as perceived surface curvature, don't need to be experienced per se, because they are defined solely for the purposes of investigating sensory co-calibration, and can be thrown out once that process is understood. The ultimate question in such a model becomes: are the perceived locations

Figure 11. Perception as sensory co-calibration.

of points in space consistent with measured locations, and under what conditions? So, perceptual space or perceptual shape can be thought of simply as a calibration map: a map of the distance and direction of a set of points with respect to some frame of reference, which can be used for motor localization. For a collection of points in space that constitute an object, the only information that is required under this model is the specification of their distance and direction in some co-ordinate frame, along with higher-order quantities that are based on these, such as size, curvature, collinearity, slant, etc., which are operationalized by the researcher in order to study the co-calibration. This approach is illustrated in Figure 11, which shows the output of the perceptual system to be estimates of metric environmental properties on which behavior can be based, and which behavior can in turn change. In a very important sense, this is a limited but viable application of the notion of perception as inverse optics. It gets around some of the problems of an inferential version of Neoconstructivism, which eventually has to define how the various aspects of the percept are symbolically represented, and what their epistemological source is. By avoiding the issue of representation, and by focusing only on calibration space, this restrictive version of perception as inverse optics is very effective in understanding basic issues in 3D space perception, binocular vision, etc., where the goal is to determine

how various sources of information are co-calibrated, and whether estimates of spatial properties (in terms of perceptual descriptors such as distance, size, etc.) are consistent with a motor test of the external environment (we will later define measurement as nothing more than an extended motor test on the sensorium). Additionally, such an approach is adequate for many aspects of neurophysiology, e.g. finding out which pathways of the visual system are binocularly coded. Note, however, that Figure 11 is not a correct representation of the calibration model, since, strictly speaking, there should be no external world in a calibration model. The model operates only at the level of comparing measures on sensory responses, be they cues defined on the retinal image, or tactile and proprioceptive feedback. A more correct diagram that shows the true locus of calibration approaches is shown in Figure 17. Figure 12 shows an extension of the calibration model of perception that specifies other quantitative and qualitative measures that the perceptual system might estimate and explicitly represent. This model is essentially the core Neoconstructivist model, and it is this one that suffers from the problems we have been addressing

Figure 12. Representational Neoconstructivism: perception as calibration and recognition in an inverse-optics paradigm.

thus far. In such representational models, not acknowledging the epistemic source of the descriptive moles, attributes, or informational content of the quantities and qualities being represented leads to the secondary problems that are described below.

5.2. Measurement as visuo-motor action

There has been a conflation, in representational versions of Neoconstructivism, of what might be called motor measurement space with perceptual representation. Under this conflation, perceptual representation is considered to be nothing more than a collection of estimates of metric properties of the visual scene. When the domain is geometric shape or space, the properties that are perceptually represented are usually considered to be spatial quantities such as length, distance, depth, orientation, curvature, etc. Now, it is correct to say that the visual system has implicit spatial estimates of properties of the visual scene, but a more careful epistemology would reveal that those estimates are specified within the terms of the representational structures of perception; in other words, the measurements are of the perceptual sensorium and not objective measures of the external world.25 The notion of measurement as some independent test of the external environment is essentially meaningless outside of the structures, attributes and information content of perception, which include notions such as distance, persistence of objects over time, and Euclidean 3D space; it is meaningless in the same way that the idea of motion is meaningless without the assumption of persistence over time. Instead, measurement is better thought of as nothing more than an extension of motor action. When we apply a motor action within our perceptual sensorium, the result (touch, proprioception, or a new visual layout) is essentially a measurement on that sensorium. Our percepts seem vivid and real because, usually, these measurements coincide with the values already predicted within the percept. Most deviations from this coincidence can usually be assigned to limitations in resources, erroneous assumptions, or damage to the perceptual/motor plant. Our percepts of objects seem vivid and real because they coincide with our measurement of them. In that sense, motor action is an implicit measurement test operating over the perceptual representation to determine whether the representation is coherent and predictive. Repeatability and predictability of motor-action-as-measurement is

what defines whether a perceptual estimate of a spatial property available to perception is reliable and veridical. When the reliability or predictability is compromised, the system adapts to bring the percept and measurement space back into coincidence. Device-based measurement is just an extension of such motor action. Measurements, whether they involve a ruler, a micrometer screw gauge, or a laser interferometer, are all merely extensions of motor action as measurement. The informational content derived from making such measurements with external devices is no less dependent on perceptual structure and attributes than are our motor actions. Thus, when we say that the visual system has spatial estimates of the visual scene, we have to assume that it is only in terms of visuo-motor representational space that we have such estimates.

Two erroneous conclusions come out of the conflation of measurement space with representation space in Neoconstructivism:

(1) The implication that, since perceptual estimates of spatial properties are more or less matched by external measurement of the objects themselves, in some important sense our perceptual representation reflects the objective spatial properties of objects in the world. This in turn leads to the idea that objects in the world (as they actually exist) are descriptively much like the description of objects given by our percepts; namely, that they can, and should, be described by specifying a set of spatial measures and nothing more.

(2) The ability of the visuo-motor system to recalibrate (or, in the case of neonates, to calibrate) has usually been taken as evidence that supports empiricist theories, because it shows that learning can occur. The sensors and motor plant can recalibrate within the visuo-motor representation space when motor-actions-as-measurement are not predictive. Since some of these modes of calibration or recalibration appear to go beyond merely adjusting spatial estimates, this is seen as even more impressive evidence of learning perceptual representations and structures. Some of these examples involve cases where the visual system has putatively learned statistical regularities of certain spatial properties of the environment, which might have the effect of biasing otherwise inherently ambiguous spatial estimates toward expected values in the environment (e.g. Purves, 2003). In other examples the calibration might not even involve a metric adjustment, but instead a learned rule that biases the

overall perceived configuration when the percept is potentially ambiguous, an example being the well-known shadow illusion shown in Figure 13. This illusion is usually attributed to a learned assumption that light impinges from above. But on closer inspection, in both these examples the learning or calibration merely biases a spatial estimate toward a particular value, given a discrete or continuous set of ambiguous spatial estimates that are already available at the level of the percept.26

5.3. Object recognition as calibration

Analogous to spatial calibration as a form of learning within the visuo-motor domain is the notion of learning in object recognition. Object recognition (to be carefully distinguished from object representation) provides numerous examples where an otherwise ambiguous stimulus generates an unambiguous percept that is consistent with a learned stimulus configuration. Recognition learning may manifest through exposure to particular stimulus sets or priming, or might imply natural learning of experientially common configurations. An example of the former is a degraded display of text, which might have an ambiguous interpretation at the level of the individual letters, but which will nevertheless be recognized as a particular letter in a familiar word, depending on the context of that word in a sentence. Another example would be a Mooney-type figure, such as the one in Figure 13 (right panel), where no object is initially perceived, but after being primed one sees the outline of a Dalmatian dog (see Van Tonder and Ejima, 2000, for an interesting empirical analysis of this illusion). An example of naturally learned

Figure 13.

recognition might be biological motion. Note that the example we gave in Figure 13 (left panel) is better thought of as belonging to the spatial calibration domain rather than to recognition, since, unlike biological motion, learning in the shadow example only results in resolving an ambiguity in metric spatial structure. In biological motion there may be a component of such a spatial (or motion) ambiguity resolution, but here one can also make a distinction between seeing coherent motion and recognizing what that motion is. In all the cases of recognition learning as a form of calibration, the epistemic issues are moot; since the learning, while useful and robust, is occurring within a given perceptual representation, none of these phenomena indicate that the representational structure itself is being altered.

5.4. Why learning in sensory calibration cannot be used to learn perceptual representations

There is sometimes the temptation to extend the notion of calibration-as-learning from the object recognition and spatial estimation domains to the perceptual representation domain. Such learning might be suggested as a way in which the visual system

Figure 14. Experiential Neoconstructivism.

might learn generic configurations of the environment that can be incorporated into the perceptual inference mechanism. An example might be learning of the existence and ubiquity of continuous surfaces in the environment, and the incorporation of this learning into a bias to parse the image into continuous surfaces when possible. (This is precisely the kind of bias needed in the shape-from-shading domain discussed earlier.) Evidence of such a bias might be cited in the tendency to perceive illusory surfaces even when there is no direct visual evidence in the image for such a surface (e.g. the Kanizsa triangle). Another example might be the learning of the ubiquity of orthogonal surface relations through exposure to carpentered environments, which in turn gives rise to rectangularity constraints and related illusions (the Ames room, or the tilted balcony illusion (Griffiths and Zaidi, 2000)).

But there is a problem with the notion of learning constraints through calibration in a Neoconstructivist paradigm. This is because it is precisely these constraints (biases, assumptions, priors) that are required in order for a unitary perceptual inference to occur (as we reviewed in the quantitative examples provided in part 3). Without these constraints, no unitary percept, or set of percepts, can be achieved onto which a calibration process can act. For example, without a rectangularity constraint, the projection of two orthogonal surfaces specifies an infinite class of possible surface configurations under inverse optics, which means that there is no unitary percept or set of percepts of the surfaces that can be used for learning a rectangularity constraint via calibration. In other words, we have a circularity: co-calibration across senses is used to learn the very constraints that are required in order to infer a singular spatial structure onto which the calibration can be applied. Assumptions such as smoothness or rectangularity cannot be deduced by calibration in an inverse-optics model, because those very assumptions have to be already embedded in the perceptual structures that are being calibrated! One might suggest that a unitary percept is not required for the calibration to occur, because the calibration might involve a different sense that does have a unitary estimate; e.g. in the example above, one might say that touching the orthogonal surfaces provides the disambiguation needed to calibrate the visual percept. But this is not workable either, because such an explanation still does not explain how the particular

percept that is seen as being a right angle is matched with the felt right angle, when there is no unitary seen percept available in the first place! So, to summarize: in the case of spatial estimation within a perceptual representation, learning via calibration merely adjusts the metric value of an estimate. In the case of the shadow example (Figure 13, left panel), the calibration merely resolves a metric ambiguity. In the case of recognition (Figure 13, right panel), the calibration merely resolves the ambiguity based on matching to some visual memory. In none of these cases is any representational structure or mode of inference itself learned. As we saw in part 3, assuming that the perceptual representation itself is generated using assumptions that are learned from the environment leads us inexorably to a naïve-realist camera model of perception. Crucially, learning through calibration has no bearing on how notions such as causality, object persistence, connectedness, uniformity, etc., get embedded in the perceptual system, and indeed no empirical finding has ever established that learning of such aspects exists. Some infant learning literature has explored at what age neonates might begin to show evidence of perceptual notions such as individuation, causal connectedness, etc. (see Spelke et al., 1995). But this work is more about determining when causal inferences might be measurably identified in neonates, and less about how the causal notions or constraints might themselves be encoded at the level of perceptual representation. It cannot tell us whether a change in measured response in babies is due to learning the notion of causation, or is merely evidence of the timeline of the calibration process that brings other perceptual apparatus online; apparatus that is required to support the sensory parsing of events into objects, causality, etc. Hopefully, this makes clear the distinction between the theoretically implausible mode of learning rules for perceptual representation, and the empirically established mode of learning as a process of calibration. This distinction is crucial!

5.5. How can simple to complex perceptual representations be possible without learning?

We have been arguing against the possibility that learning can alter the nature of perceptual representation, and instead pointing out that most empirically demonstrable examples of learning are more

correctly thought of as a process of calibration, either phylogenetic (the shadow example) or ontogenetic (the degraded letter example). But if a mode of learning that alters perceptual representation is not plausible, how might one explain the different degrees of complexity in perceptual representation that are surely present across species? It is intuitively clear that the perceptual representations of the primate visual system are more complex than those of the cat, which in turn are more complex than those of a fly. How, one might ask, does one get a continuum of complexity in perceptual systems if the learning of perceptual representations is not viable? How can one account for the fact that a continuum of complexity in perceptual systems exists, while at the same time accepting the notion that the core aspects of perceptual representation cannot be learned? Leyton's theory proposes an answer to this dilemma by defining a perceptual representation schema based on basic rules for the conversion of image asymmetries into nested representations of perceived symmetries, where the image may be in any sensory domain, and symmetries are defined abstractly in terms of algebraic group structures.
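To give the flavor of such a schema in concrete terms, the following Python sketch illustrates, in a deliberately simplified way, the idea of encoding a shape not as a list of metric measurements but as a nested stack of transformations applied to a maximally symmetric reference state. This is only an illustrative caricature under our own assumptions (the choice of a square, a stretch and a rotation, and all the names, are ours), not Leyton's group-theoretic formalism.

    import numpy as np

    def rotate(theta):
        # planar rotation by theta radians
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s], [s, c]])

    def stretch(sx, sy):
        # anisotropic scaling along the coordinate axes
        return np.array([[sx, 0.0], [0.0, sy]])

    # A unit square: the maximally symmetric reference or "ground state".
    square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)

    # A tilted rectangle described generatively: the asymmetries (unequal
    # sides, tilt) are carried entirely by the transformation stack, while
    # the square remains the symmetric reference from which the shape is
    # "grown".
    transformation_stack = [stretch(2.0, 1.0), rotate(np.pi / 6)]

    shape = square.copy()
    for T in transformation_stack:
        shape = shape @ T.T      # apply each transformation to every vertex

    print(np.round(shape, 2))    # metric coordinates appear only on evaluation

The point of the toy example is that the description is read off the transformational history rather than off a set of coordinates; the metric values appear only when the stack is evaluated.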
6. EXPERIENCING THE PERCEPT

The ecology is the minimal order in which image asymmetries can be changed back to symmetries.
Michael Leyton, 1999

When we get beyond the issues of calibration, recognition and learning, and into the realm of perceptual representation within a Neoconstructivist model, we find that a profound aspect of perception is entirely missing from such theories: namely, they do not explain the reality of the perceptual experience. The need for a theory of perception to be able to explain the realities of perceptual experience has been put forward throughout history by a range of vision scientists, from Mach and Hering to the Gestalt movement and Gibson, and recently, in a more systematic and quantitative way, by Leyton. For our purposes we will distinguish three aspects of the perceptual experience. The first, and the one most often discussed by both the Gestaltists and Gibson, is why we seem to experience the objects of our perceptions so directly, rather than having the impression that we only have a collection of symbols, features and measurements of the

objects bound up in some hierarchy, giving us indirect perceptual access to the object. The second, implicit in the Gestalt arguments on grouping and analyzed more explicitly by Leyton (1984, 1986a, 1986b, 1986c, 1987, 1992, 2001), is why the informational content of the percept seems to go not only beyond what is available in the 2D retinal image, but beyond the information content that seems to reside in the very objects before us (for example, perceiving an aligned set of objects as constituting a group or meta-object). The third, which may appear more subtle to the vision scientist but is omnipresent for the artist and designer, is that our percepts themselves seem to provide a visceral subjective judgment of the visual objects, distinct from any cognitive or experience-based effect. Let us look at each separately and explain why Neoconstructivism cannot account for any of these three aspects, and is thus not viable as a theory of perceptual representation.

6.1. Experiencing the percept in conventional theories of perception

Previously we argued that the only viable version of the inverse-optics model of perception is the restricted calibration model shown in Figure 11. This model, as we discussed, is suitable for a range of problems in vision science that can remain agnostic to representational issues (column B, Table 1). Since the only claim in this model is that a set of spatial estimates of points can be measured in the percept, it deliberately leaves open the question of how the percept is represented or might be experienced. In the calibration model, all we have in essence are distances and directions of sets of points (where distance, direction and point are perceptual modes of description). No claims are made about the representation of any other quantities based on these spatial measures. As we discussed, the full Neoconstructivist model assumes that there is a whole range of quantities and attributes that might be derived from these sets of points representing the objects, including things like relative size, orientation, shape, part structure, surface continuity, etc.; where these quantities and attributes are represented as neural symbols (Figure 12). Yet in real visual experience, rather than having the impression that our percept gives us indirect access to the objects and their properties via these symbolic codes, we instead

seem to have a direct, vivid perceptual representation of the objects themselves, where such quantities don't seem to be giving rise to the percept, but instead seem to be derived from the percept. Indeed, it seems that these quantities and qualities don't come anywhere near to exhausting the range of measures and attributes that we can arbitrarily assign to our percept of the object. We are conscious of things like continuity of surfaces, variable part-whole structure, groupings, etc., that just cannot seem to be exhaustively captured by a set of symbolic codes, however complex a hierarchical structure they might be embedded in. We have a sense of the form of objects that, as even Gibson claims, no set of descriptors, however complex, can seem to exhaust. What we see, in other words, is not a description of a picture, but the picture itself. In order to achieve such a direct percept (in Gibson's words) in the standard model of Neoconstructivism, we'd have to add a special kind of homuncular draftsman (shown in Figure 14) to look at and interpret the symbolic (neural) output of estimates and then draw them out and color them in as objects onto the canvas of our conscious percepts. Notice that this act of drawing by a homunculus, which captures our experience of the connectedness of the points and the continuity of the surfaces made up by those points, is precisely what the regularity assumptions, priors, etc., that are introduced ad hoc in computational models are in effect trying to underwrite (see Leyton (1992) for a detailed analysis). But even after we have such a draftsman, he still can't draw out for us other rudimentary aspects of our percepts, such as continuation behind an occluder, or the presence of points that are not visible, such as those in the self-occluded portions of a solid object. Indeed, the development of computational models that are designed for highly constrained environments usually stops right when issues such as occlusion rear their ugly head. Thus any viable model of perceptual representation would need to explain how the representational scheme itself embeds notions such as continuity and connectedness. In other words, as Leyton has intimated in his theory, a valid perceptual representation must be able to draw itself out.27 Thus a representationalist model of perception as inference, which we have argued Neoconstructivism to be, reduces to what we might call a sophisticated naïve realist model (Figure 15).

Figure 15. Sophisticated naïve realism.

6.2. The information content of the percept

We have been making the point that, in conventional theories of perception, a key component of the information that is normally ascribed to perceptual representation, indeed the most important one, is spatial measurements. We have explained how such a view of perception is a useful but limited one for understanding some important aspects of visual processing, particularly sensor co-calibration. We have also tried to point out that the descriptors and modes of measurement specified in such models are inextricably linked to the nature of perceptual representation itself, rather than being objective measures of the external world. Let us take the example of a line again and look at the possible modes of information content of a line under the various incarnations of Neoconstructivism. Under a naïve realist model (to which, as we have seen, all Neoconstructivist theories of perceptual representation reduce), a line is, informationally speaking, a description of a line-object in the world. The descriptors we can assign it might include something like: a line is a continuous one-dimensional entity that spans two points in 3D space; the current line exists between measured points A[x1, y1, z1] and B[x2, y2, z2]; the orientation of the line with respect to the observer is [theta, alpha, phi]; etc. Under a naïve realist model, the descriptions that the line is given in our percept are exactly those objective descriptions that we give to the real line in the world (namely its position, length, orientation).
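As a concrete, purely illustrative rendering of that descriptive content, the following Python fragment computes exactly the sort of descriptors such a model assigns to a line; the endpoint values, and the choice of a unit-vector representation for orientation, are invented for the example.

    import numpy as np

    A = np.array([0.0, 0.0, 0.0])    # measured endpoint A [x1, y1, z1]
    B = np.array([1.0, 2.0, 2.0])    # measured endpoint B [x2, y2, z2]

    length = np.linalg.norm(B - A)   # the "objective" length descriptor
    direction = (B - A) / length     # orientation, here as a unit vector

    print(length)        # 3.0
    print(direction)     # approximately [0.33 0.67 0.67]

On the naïve realist reading these numbers are properties of the world; on the reading argued for here they are measures defined within, and only within, the descriptive structures of perception. Note also that length and orientation add nothing over the two point estimates themselves: they are fixed transforms of them, a point taken up below.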

In the restricted calibration version of inverse-optics approaches, the only informational content of the percept is the position, in Euclidean co-ordinates, of all measurable points that lie on that line with respect to the observer. There is no need to posit an entity called a line within the representation, because such a theory is not concerned with entities other than points in space and an estimate of their direction and distance. It is agnostic as to what that object might be. Within calibration theory, the point estimates constitute all the content we can genuinely speak of. Though we might conjecture that certain other quantities might be derived from these measurements, such as curvature, slant, etc., such quantities (e.g. slant) do not constitute additional information content in the percept from the standpoint of the calibration model, because they can be derived from the point estimates by mere application of a fixed transform that is already part of the interpretive perceptual apparatus. In other words, in terms of our earlier philosophical discussion, there is no distinction between the measured spatial locations of points and any quantity derived from them by application of an internal transform. Thus the information content of the percept in a calibration model of perception is solely a metric estimate of the position of points, where these estimates are given in terms of the descriptive structures of perception (distance, direction). But, as pointed out by Gestalt theory and Leyton (1992), when we examine our phenomenology there seems to be additional information content available in the percept, such as groupings, connectedness of points, continuity of surfaces, etc. In standard Neoconstructivist theory, these are captured in the built-in assumptions that are referred to variously as Gestalt grouping rules, uniformity assumptions, biases, heuristics, priors, minima principles, regularity assumptions, and genericity assumptions. But again, since these assumptions are already part of the perceptual mechanism in Neoconstructivist models, their application to the inferential process does not add any additional information content over and above the metric information in the percept as specified in the calibration model. Note that saying that the notion of continuity of surfaces is somehow encoded in memory just makes the problem circular, because in Neoconstructivist theories the final percept would still need to be read out and drawn out onto the phenomenological canvas. When one asks what the representational scheme of the drawn-out percept is, we return to the problem we

started out with. This is probably the most subtle but crucial point of the analysis we have been through.

6.3. Perceiving a square reduced to checking if spatial measurements correlate with those of a square

As we have seen, the information content of the percept, both in a full-fledged model of perception as inverse optics and in the restricted calibration model, is by definition solely metric. That is why such models are viable for studies that are primarily investigating aspects of vision that can be captured purely by metric descriptions, and not for those aspects that need to capture the nature of the representation of shape (e.g. the transformational structure of a square; see Leyton, 1992 and 2001). In Neoconstructivist theories such properties are assumed instead to be encoded in a hierarchical symbolic representation of the square built into the visual system, and invoked when the measurements of the collection of points tally with those of a square. Again, this implies that the entire non-metric information content of the square is already part of the perceptual apparatus (so, for example, there could be innate shape primitives such as the square, or indeed, as some have suggested, parametric volumetric ones (Marr and Nishihara, 1978), often called geons (Biederman, 1987)). This would imply that all non-metric information in any percept of an object is not part of the percept, but instead part of the perceptual apparatus (think of a list of x, y coordinates of a set of points that make up a square, compared to a drawing of a square: in order to ascertain whether the set of coordinates constitutes a square, one needs an interpreter that already has knowledge of all the geometric, connective and topological properties of a square; the set of points themselves contains no such information). The percept itself, in this schema, is just a set of measurements. Thus in the Neoconstructivist model, perceiving a square is reduced to checking if spatial measurements correlate with those of a square. But our phenomenology, like the picture of the square, clearly points to measurement-free information content in the percept (see Leyton, 1992). This suggests that a very important requirement for any viable theory of perceptual representation is that the core of such a representation be metric-free (precisely what Leyton's theory achieves by specifying the representational schema in terms of abstract algebraic structures).
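The reduction can be made almost embarrassingly literal. In the Python sketch below (the function and variable names are ours, and the tolerance is arbitrary), the "percept" is nothing but a list of measured coordinates, and "perceiving a square" is a checking procedure: every geometric, connective and topological fact about squareness lives in the checker, the interpreter, and none of it lives in the list of points.

    import numpy as np
    from itertools import permutations

    def looks_like_a_square(points, tol=1e-6):
        # True if the four measured points can be ordered into a square.
        pts = [np.asarray(p, dtype=float) for p in points]
        if len(pts) != 4:
            return False
        for order in permutations(pts):
            sides = [np.linalg.norm(order[i] - order[(i + 1) % 4])
                     for i in range(4)]
            diagonals = [np.linalg.norm(order[0] - order[2]),
                         np.linalg.norm(order[1] - order[3])]
            equal_sides = max(sides) - min(sides) < tol
            equal_diagonals = abs(diagonals[0] - diagonals[1]) < tol
            right_angled = abs(diagonals[0] - sides[0] * np.sqrt(2)) < tol
            if equal_sides and equal_diagonals and right_angled:
                return True
        return False

    # The "percept", on this account, is only the list of coordinates:
    measurements = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(looks_like_a_square(measurements))   # True

Nothing in the coordinate list knows that it is a square; that knowledge resides entirely in the checking function, which is exactly the contrast drawn above between a list of x, y coordinates and a drawing of a square.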

In any perceptual representation, the metric information contained in the percept is meaningful only from the standpoint of the co-calibration of sensor measurements (including device-based measurement). The metric information itself does not constitute any objective knowledge of the world, but instead a set of measures relating one aspect of the percept with others. Thus building a representational schema based on metric information alone, such as cue-combination methods or Gibsonian gradient-computation methods, does not provide purchase on important properties of perception outside of calibration.

6.4. The problem of a stable percept

One of the most vexing problems that constructivist theories generate is how the visual system maintains a stable percept of the world despite the constantly and rapidly shifting sensory stimulation brought about by eye movements, blinks, self-motion, etc. Many solutions suggest that a sort of memory of the prior scene may be maintained across fixations, which can be used to establish correspondence. Additionally, efferent or outflow signals of the eye-movement command may be coupled with this memory representation to allow this correspondence to be achieved. Gibson (1950), on the other hand, maintained that the problem of the stability of the visual world across fixations was moot, because the objective external environment was itself stable, and the visual system need only depend on that assumption to maintain its own stability. As he says: "The perception of ... a unitary visual world over time, might be explained by the assumption that unchanging information underlies the changing sequence of obtained stimulation, and that gets attended to" (Gibson, 1966). Gibson's alternative proposal, like his others, does contain a modicum of truth. This is because one interpretation of Gibson's proposal is that eye movements are occurring within the framework of a stable percept, and thus stability across eye movements is not a critical problem (though his insistence that the stability is due to the external world makes his position epistemologically untenable). A similar position, but one that was epistemologically correct, was also held by Hering in opposition to the Helmholtz school (see Turner, 1994). The problem of the stability of the visual world across eye movements arises from the naïve realist notion that the visual

system is passively recording information that impinges on the retina from an objective external world, and that it is therefore the perceptual system that has to make adjustments to account for the movement of the eyes. On the other hand, if we maintain the proposal that motor actions are occurring within the framework of an organized image, and only within that framework, the notion of stability of the visual world no longer appears to be a central problem of vision. In a model where the organism is viewed as being located within the organized image (a perceptual space or sensorium), the organism is merely exploring this image (or representation) using what might be referred to as an internal eye (which might be thought of as shifts of attention). The result of this exploration alters the deployment of the parallel sensory device, the retina, located in the external world (whatever that may be!). This device is entirely under the control of the exploratory internal eye, and has been calibrated through early development to move in such a way as to result in successive retinal patterns of stimulation that are already predicted within the image. Specifically, the internal eye is merely exploring what is a stable perceptual space, and it is the role of the adaptive mechanisms that control the actual eye movements to make sure that retinal stimulation maintains registration with the scanning of the internal eye. There is some empirical evidence for this alternate view of eye movements, attentional control and perception. It has been shown, for example, that artificially adapting eye-movement mechanisms leads to perceptual illusions, showing that the eye-movement apparatus is designed to be calibrated to the perceived image rather than vice versa (Bahcall and Kowler, 1999). Other neurophysiological evidence shows shifts of receptive-field characteristics of cortical cells that occur due only to changes in attentional locus (e.g. Duhamel et al., 1992). Indeed, a large area of research into how visuo-motor coordination occurs has established the need for fully specified internal models of the external sensory-motor plant (see Kawato, 1999, for a review). Organisms learn to calibrate motor behavior such that it operates successfully within the perceptual representation (or sensorium). This idea is crucially distinct from the notion that sensory and motor processes operate independently, but in a coordinated way, on a physically objective world.
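The direction of fit claimed here can be illustrated with a toy calibration loop in Python. This is not a model from any of the cited studies; the gain parameter, the learning rate and the simulated undershoot are invented for the illustration. The oculomotor command is adjusted until the retinal consequence of a saccade again matches what the percept predicted, while the percept itself is left untouched.

    class OculomotorCalibration:
        def __init__(self, gain=1.0, learning_rate=0.1):
            self.gain = gain                    # command gain (hypothetical)
            self.learning_rate = learning_rate

        def command_for(self, intended_shift_deg):
            # Motor command intended to realize a shift of the "internal eye".
            return self.gain * intended_shift_deg

        def update(self, intended_shift_deg, observed_retinal_shift_deg):
            # A persistent mismatch between the shift predicted within the
            # percept and the observed retinal consequence recalibrates the
            # command gain, not the perceived layout.
            error = intended_shift_deg - observed_retinal_shift_deg
            self.gain += self.learning_rate * error / max(abs(intended_shift_deg), 1e-6)

    # Example: the plant consistently undershoots (a hypothetical perturbation,
    # loosely analogous to artificially adapting eye-movement mechanisms).
    plant_gain = 0.8
    model = OculomotorCalibration()
    for _ in range(50):
        intended = 10.0                                   # degrees
        executed = model.command_for(intended) * plant_gain
        model.update(intended, executed)
    print(round(model.gain, 2))   # about 1.25: the command absorbs the error

In this caricature it is the command gain, not the perceived layout, that absorbs the discrepancy, which is what being calibrated to the perceived image rather than vice versa amounts to.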

6.5. Summary

So, let us now summarize the essential problems with the Neoconstructivist, inverse-optics approach to perceptual representation:

(1) The 2D image and the 3D percept are defined to be distinct entities: the former existing in the external world and the latter in the psychological domain. Perception is supposed to provide the causal inferential link between the two. Yet, as we have seen, an inferential model of perception, even when placed within a sophisticated quantitative framework, removes the very distinction that makes the inferential link informative. Any model of perception as inference reduces inexorably to one of two trivial models: (a) a naïve realist model (Figure 15), where the perceptual system does not involve an inductive (inferential) step between the external and the psychological, but instead has an all-knowing homunculus examining an objective image on a camera; or (b) an idealistic model in which the external world does not exist. This essential problem is nothing more than Hume's argument against any empiricist theory of knowledge that wants to retain the notion of an objective physical world. In all such theories we end up losing the very distinction that we are trying to explain, and with it most of the information content in the percept.

(2) The external world, the 2D image, and the 3D percept are all defined in terms of the same descriptive parameters, which are erroneously considered to be objective parameters independent of perception. Thus, geometric descriptors like length, position, and orientation are taken to be objective measures or properties of the world, in the same way that objects and surfaces are erroneously taken to be objective things in the world. Similarly, the image is thought to have objective descriptors as well as objective cues, which are assumed to have informational content distinct from the perceptual machinery and the final percept. In these models, perception is assumed to be a process that involves the successful detection of objects that have distinct objective existences in the external world, and which have been imaged onto an objective sensory field. A central aspect of these models is the assumption that spatial descriptions are objective from an informational point of view.

(3) Inference requires a re-presentational scheme, and re-presentation precludes direct experience; so any re-presentational scheme that specifies the existence of an external world reduces to a naïve realist scheme.

(4) Though many recent Neoconstructivist approaches claim to be aligned with Gestalt theories of perception, nearly all maintain epistemological positions exactly opposite to those espoused by Gestalt theory.

(5) Neoconstructivist theories conflate the notion of calibration with perceptual representation. One way they do so is by creating an unnecessary distinction between what we call measurement as motor action, and device-based measurement. In these models, device-based measurement is considered, erroneously, to be objective and informationally distinct from perception and motor action.

(6) Limiting the notion of inverse optics to a calibration domain allows for workable inverse-optics models for exploring a range of issues in vision. Such approaches, in which shape is viewed simply as a calibration map, implicitly posit that the only information contained in the percept is spatial estimates of points in the visual field. Note that such a model is epistemologically a behaviorist model, where there is only an idealized world of stimulus and response.

(7) The non-metric informational content of a percept, such as connectedness of points, continuity of surfaces, etc., is assumed to be encoded as learned biases, constraints, priors, etc. There are two problems with this: (a) such constraints cannot be learned, since a unique percept (on which the learning could take place) is impossible without these biases already in place; (b) it implies that the only information content in the percept itself is metric; all non-metric information is not part of the percept, but part of the inferential device.

(8) Neoconstructivist theories do not provide a satisfactory explanation of perceptual experience. This is because Neoconstructivism regards perception as a re-presentation of properties of the external world. Though we have been using the term perceptual representation throughout the text, we did introduce the notion of perception as a presentation of the sensory flux. This is the most critical aspect of any theory of perception: it is most correctly defined not as a system of re-presentation, but of presentation. Thinking in such terms, color is not thought of

as a re-presentation of a property in the external world; it is the presentation of chromatic differences, where chromatic difference is a property of the percept and not of the external world.28 In other words, the information content of color is parasitic on the percept and not on objective properties of the external world. It is certainly not a re-presentation of differences in electromagnetic wavelength; any correlation between wavelength and perceived color is a synthetic description of relations between perceptual states and measurements in visuo-motor space. Figure 16 illustrates what an epistemologically correct model of perception would look like, where the information content of the percept (both the image description and the perceived-object description) is part of the perceptual apparatus and not the external world. There is naturally a correlation between the states of the external world and the percept, as determined by the flux at the sensory interface. Figure 17 illustrates the correct construal of the calibration domain in perception; again, any metric descriptions of this domain are based on perceptual entities and metrics.

Figure 16. Perception as the presentation of an external sensory flux.

Figure 17. Calibration domain in an epistemologically correct model of perception.

7. PERCEPTUAL EXPERIENCE, AESTHETICS AND DESIGN

Objects project possibilities for action as much as they project that they themselves were acted upon; the former allows for certain subtle identifications and orientations, the latter, if emphasized, is a recovery of the time that welds together ends and means... The work is such that materials are not so much brought into alignment with static a priori forms, as that the material itself is being probed for openings that allow the artist behavioral access.
Robert Morris, Sculptor.29

Perhaps the most enduring puzzle in the history of perception is its relationship to visual aesthetics. The range of phenomena that can be ascribed to visual aesthetics is very broad and often involves personal, sociological, religious, ritualistic, and convention-based aspects. Though many of these are best discussed within the framework of art history and criticism, a more enduring puzzle in aesthetics has been the question of what the purely perceptual dimensions of aesthetics are. This has been a particularly vexing and important question in the domain of the design of artifacts, namely architecture and product design. A central feature of perception, one that is most evident in the process and products of design,

is that some visual configurations appear, often spontaneously, as perceptually more coherent than others (e.g. a particular arrangement of objects or parts). In architecture, certain relationships of elements are universally acknowledged to be more aesthetically coherent than others. There have been many historic efforts, from Alberti to Le Corbusier, to codify design rules that embody these visual parameters of design. In addition to such classical rules of architecture, there are much more subtle rules of form-making (e.g. structural validity), arrangement of elements (e.g. repetition, alignment), and treatment of materials that are invariably, if implicitly, manifest in the process of design at all scales. Such written and unwritten rules abound in all fields of design, particularly in architecture. Design styles that explore the deliberate breaking of implicit and explicit design rules (e.g. the postmodernism and deconstructivism of the late 70s and 80s) underscore their import and ubiquity. We might define the aesthetic underwritten by these rules as pre-cognitive, reflexive perceptual judgments on perceived visual configurations. There are strong reasons to believe that such reflexive perceptual preferences, which inform and shape every step of the design process, are a result of the very nature of the underlying perceptual mechanisms, rather than being merely experience-dependent or cognitively applied. This idea has been proposed throughout history by a range of philosophers, artists, and scientists such as Kant, Klee, and Arnheim. The very trajectory of modern art has been a dialogue with these qualitative effects of perception on painting and sculpture, culminating in quite explicit investigation of the linkage between art and perception in contemporary art, particularly Minimalism. Recent theoretical and empirical studies have sought to establish more clearly what these links might be. Leyton (1992, 2001) has explained why qualitative effects in perception and aesthetics have to be explicable within the representational structure of perception, and has provided a comprehensive view of art and aesthetics that comes out of his generative theory of shape. Other empirical work has pointed to connections between possible representational schemas and their aesthetic import (e.g. Van Tonder et al. (2002), Taylor et al. (1999)). Albertazzi has provided extensive analysis of the views on the relation between aesthetics and perception held by the Gestalt School, particularly in the Brentano and Graz traditions. Also, the notion that aesthetics in art may be a window into underlying neural mechanisms has now made connecting the visual arts and brain science a respectable topic,

almost a fashion, in scientific circles previously skeptical of 'soft' issues like art and design. The foundations of Gestalt theory, originating in Ehrenfels' work, and more recently Leyton's theory of shape, propose that the most important aspect of qualitative perceptual judgments is that they are not judgments of external configurations, but rather of the internal perceptual state. What these theories essentially imply is that no qualitative judgment can be made on the external configuration, because it is never experienced independently of perception. Thus, for the Gestaltist, the qualitative perceptual strength of aligned objects cannot be ascribed to some objective notion of alignment in the external configuration, but to the natural strength of aligned perceptual objects inside the sensorium. Our purpose here will be limited to investigating the qualitative implications of contemporary approaches to perception. We will specifically be interested in their implications for the design of artifacts, rather than for aesthetics in general.

7.1. Qualitative implications of calibration and recognition theories of perception

Conventional theories of perception cannot account for reflexively perceived subjective differences precisely because they define perception as a problem of inference. Neoconstructivist theories of perceptual representation imply that all perceived configurations should, at the perceptual level, be qualitatively equivalent. We have seen that there are two plausible models of perception as inference via inverse optics: perception as calibration or perception as recognition. It is abundantly clear that in a pure calibration model, no subjective differences should obtain in the perception of any physically plausible external configuration. The product of a percept in a calibration model is a set of estimates of the spatial positions of the points that make up the viewed configuration; we have characterized such a set of spatial estimates as a calibration map. For external configurations (objects) that are physically plausible, there should be agreement across all the various estimators (cues) of spatial position, and so they will, in combination, provide the best possible estimates for the points. For another physically plausible configuration, a different but equally valid set of estimates will obtain. Thus a set of estimates for one spatial configuration will be no different

qualitatively than the estimates for a different configuration. Both sets of estimates are merely metric values. Now, there are two possible ways in which the existence of additional information in the calibration map might allow for a qualitative judgment on the estimates. The first way is that the sets of point estimates (that constitute the calibration map) have a concomitant indicator of their reliability. Thus, for example, spatial estimates for a visual configuration that is viewed under sufficient illumination and close up (where binocular disparity cues are best) would be more reliable than those for one viewed in dim conditions further away, and thus could be thought of as being better. But note that such a reliability signal is in effect an indicator of the quality of the estimate being made, and not an indicator of the quality of the configuration, since better viewing conditions mean better estimates, regardless of the nature of the configuration. Thus for any set of configurations that are viewed under the same conditions, no qualitative differences can be inferred, since the reliabilities will all be the same. The second way is that the estimates from one set of sensors do not correspond to the estimates from another set of sensors. For example, the calibration map based on disparity may be different from the calibration map specified by motion parallax. This is a state of affairs that is very familiar to the vision researcher in what is known as a cue-conflict study. It is possible to artificially stimulate the visual system so that the estimates from one set of cues conflict with those from another set of cues, giving rise to a cue conflict. It is plausible that the visual system could assign qualitative differences to the percept based on the level of cue conflict; thus high cue-conflict percepts might be seen as inferior to those with low or zero conflict. But note that what we started out by claiming is that calibration theories cannot assign qualitative differences in situations where the external physical configuration is a physically plausible one, i.e. where the sensory stimulation is consistent with a real object, and thus has no cue conflict. Cue-conflict stimuli are not physically plausible, but are situations created artificially in the laboratory in order to study the action of the various sensor estimates that give rise to the overall percept. Thus a system that can assess differences between real and artificial stimulation does not provide a means to distinguish qualitative differences between physically plausible (zero cue-conflict) configurations. The fact that cue-conflict situations do provide a means to distinguish between

physically plausible and physically implausible configurations underscores the fact that cue-conflict stimuli are ideal for studying how co-calibration across sensory measurements is maintained: a calibration process that is designed to remove detected conflicts when possible. Cue-conflict methods are perfect for exploring those areas where issues of perceptual representation are not of relevance, namely how metric estimates within the visuo-motor sensorium are maintained and correlated.

We can extend the argument above to the full-fledged form of Neoconstructivism that we have been claiming reduces to a theory of perception as recognition. We argued that in such a model, the final percept becomes a set of symbolic codes for the existence of pre-defined measures, attributes, or spatial entities, and possibly some hierarchical combination thereof. The argument against the plausibility of qualitative differences at the perceptual level in such a recognition model is similar to that for the calibration model. The final product of the percept in such a model is the activation of a set of symbolic codes in what is essentially a look-up table, or visual dictionary, and the activation of any set of symbols merely indicates that such-and-such an attribute or shape has been identified in the sensorium. From a perceptual standpoint, a visual configuration that lights up one set of symbols is not qualitatively different from another visual configuration that lights up a different set of symbols. Since the representational schema in such a recognition model is essentially a learned instantiation of the objective physical world (in some neural symbology), such a functioning recognition system only faithfully infers what is out in the world (up to limits on hardware). So all physically plausible configurations that are recognized should yield the same perceptual quality; i.e. from the standpoint of perceptual representation, the inferences themselves are equally appropriate for any recognized configuration. Representation in a recognition model is just a set of symbolic codes that indicate that such-and-such a thing has been recognized as being out there, in the way that it is out there. It may be the case that some symbols (or sets of symbols) are predefined, in the symbolic look-up table, to be better, more useful, common, rare, of hedonic value, etc.; i.e. the symbols, or hierarchy of symbols, may have attributes that have become encoded through experience. But under such a model these values are crucially not properties of the percept itself, i.e. they do not inhere in the perceived configuration, but rather are properties of the symbols (or hierarchy of

symbols) that make up the look-up table of the recognition device. Under a Neoconstructivist model, such cognitive factors would be encoded in memory via experience, and might color the cognitive experience of the perceived object, but the perceptual experience remains neutral. Since the configuration is physically valid (we have already made this caveat), there is nothing else that can be said about its perceptual representation in terms of quality. Obviously, it is possible that the recognition may be erroneous, but since there is no marker on the percept telling us this, the erroneous percept is, from perception's point of view, just as valid. It is interesting, and predictable, to note that the Neoconstructivist would want to claim that all aesthetic qualities are learned and have no basis in raw perceptual experience, because such a position is completely consistent with what an empiricist, naïve realist theory of perception would predict.

7.2. Representational conflict

The sort of reflexive qualitative judgments that our perception seems to apply to visual configurations, which are most acutely observed in the process and product of design (and art), are also strongly indicated in our common phenomenology. The very act of arranging or combining elements, in everyday tasks that require it, involves choices and changes of physical configuration that are deeply connected to perceiving differences in the quality of the configurations. Certain configurations just don't look right! This suggests that there is something in the perceptual product akin to the cue-conflict configurations in artificial stimulation. If perception is viewed as a particular type of representation, or more correctly, presentation, of the sensory flux (Figure 16), then we can think of a situation beyond the calibration domain of cue conflict, which we will call representational conflict. Representational conflict can be thought of as a measure of the degree of internal conflict in the perceptual organization. This conflict has nothing to do with the physical validity or invalidity of the external configuration, or with its coherence based on some objective external measure. Neither does it have anything to do with qualitative judgments applied as a result of memory or cognition. It is instead a measure of the internal structural coherence of the percept, in terms of its predefined representational schema and ontological entities. The perceptual system is constrained to present the percept in terms of certain spatial and

ontological modes; thus the sensory stimulation generated by different external entities results in perceptual configurations with different degrees of internal perceptual conflict. Such conflict will be present even when the external stimulation is perfectly physically valid and cue-conflict free. We might also predict that the special property of representational conflict is that it can never be corrected out by calibration, as in the cue-conflict situation. It may be possible for representational conflicts to be masked by learned cognitive attributes, such as those listed above, but the crucial distinction between representational conflict and cue conflict is that the former is always available at the perceptual level, even when masked by cognitive factors, while the latter (cue-conflict) might be erased through adaptation or re-calibration. In other words, representational conflict cannot be resolved at the perceptual level; it is the very qualitative product of a perceptual organization. We also propose that some degree of representational conflict is always present in any perceived visual configuration. In other words, we can think of the idealized ground state as one where there is no representational conflict; but such a state is never achieved. Any given percept is thus worse, in terms of representation, than this idealized ground state. For a given visual configuration, changes may shift the percept toward better, lower-conflict states, but the idealized ground state is never attained. In contrast, a Neoconstructivist or inferential theory of perception would imply a neutral perceptual product modulated by appetitive and aversive cognitive factors accrued through experience. Crucially, an inferential theory cannot explain how we perceive qualitative differences in configurations that we cannot even recognize or have any cognitive knowledge of. Another implication of the idea of representational conflict is that qualitative perceptual judgments are internal to the object, and are not comparisons across objects. Perceptually, a particular configuration is not qualitatively superior or inferior to a different visual configuration; the measure of its qualitative status is self-referential, and thus no metric exists to compare it to a different object. One building cannot be said to be more aesthetically coherent than another; rather, each has its own measure of internal coherence that, strictly speaking, cannot be compared. This is in contrast to cognitive theories of aesthetic quality, where one would posit a neutral perceptual ground state and the given percept is rendered better or worse by application of an extra-perceptual evaluation
based on memory or other cognitive factors. In such theories there can be direct comparison across entities, since the quality accrues to the symbols and not directly to the percept. One cannot deny that such a cognitive aesthetic is also a central aspect of the psychological experience; our aim here is to show that such aspects do not correctly fall within the purview of perceptual representation. We can now add one more item to the list of problems with Neoconstructivist theories we gave in Section 6.4: (9) The implicit assumption of an objective external world, and the view of perception as inferring the contents of that world, do not allow for qualitative differences to be experienced at the perceptual level. Thus, such theories cannot explain the ubiquitous phenomenology of design. They can only imply a cognitive aesthetic applied onto the neutral product of perception.

7.3. Representational conflict, design, and visual homeostasis

The notion of cue conflict plays a role in the ergonomic development of display devices because of inherent cue conflicts in displaying stereo images on 2D computer displays. In such displays the cue conflict cannot be calibrated out. This usually leads to visual stress or the inability to make consistent metric judgments. A major goal of the development of stereo display devices is to remove such conflicts, making the stimulating light array more spatially consistent with that emanating from a real (physically plausible) object. Since design involves the manipulation of real objects, and such objects do not have cue-conflict, it suggests that the notion of representational conflict may instead play a greater role in the visual efficacy of design. Since inherent representational conflicts in the perceived configuration cannot be corrected out, they most likely lead to states of perceptual stress akin to the visual stress induced by cue-conflicts in displays. Good designers can then be thought of as those who intuitively configure artifacts toward a deep local minimum of representational conflict. One can think of poor visual configurations (in other words, poor design) as ones where the homeostasis of perceptual organization is disrupted. Such disruptions of homeostasis can naturally be cloaked by cognitive factors, and in common design often are! Thus, perceptual aesthetics (in contrast to cognitive aesthetics) can be thought of as a form of visual homeostasis. It is interesting to
note that the concept of physiological homeostasis was central to the figures who first began the anti-constructivist view of vision, namely Mach and Hering.
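
To make the contrast drawn in this section concrete, the following toy sketch is offered purely as an illustration; it is not drawn from, and does not formalize, the present proposal. The symbol table, the scalar conflict function, and all names used (VISUAL_DICTIONARY, Configuration, representational_conflict, design_step) are hypothetical assumptions. The sketch contrasts a recognition-as-look-up output, where value can only attach to stored symbols, with a conflict measure that inheres in the configuration itself and can only ever be locally reduced, never driven to the idealized zero ground state.

```python
# Toy illustration only; names and the conflict function are hypothetical.
from dataclasses import dataclass
import random

# --- Recognition as look-up: the output is just which symbols light up. ---
VISUAL_DICTIONARY = {"cube", "cylinder", "handle"}  # hypothetical symbol table

def recognize(features):
    """Return the activated symbols; nothing about perceptual quality appears in the output."""
    return set(features) & VISUAL_DICTIONARY

# --- Representational conflict: a property of the configuration itself. ---
@dataclass
class Configuration:
    params: list  # stand-in for a perceived arrangement of elements

def representational_conflict(c):
    """Hypothetical internal-coherence cost: strictly positive, so the
    idealized zero-conflict ground state is never attained."""
    return 0.05 + sum(p * p for p in c.params)

def design_step(c, step=0.1):
    """One intuitive 'design move': keep a random perturbation only if it
    lowers the conflict, drifting toward a local minimum."""
    trial = Configuration([p + random.uniform(-step, step) for p in c.params])
    return trial if representational_conflict(trial) < representational_conflict(c) else c

config = Configuration([0.8, -0.6, 0.3])
for _ in range(200):
    config = design_step(config)

print(recognize({"cube", "shadow"}))                 # {'cube'}: symbols only, no quality
print(round(representational_conflict(config), 3))   # small, but never 0.0
```

The point of the sketch is only the asymmetry: the look-up output can acquire value solely through what is stored against its symbols, whereas the conflict score is a property of the configuration under its representational schema, and the descent only ever finds a local minimum above zero.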

8. EPILOGUE: EMPIRICISM, NATIVISM, AND REPRESENTATION

In the case of one observation almost all psychologists agree, that local sensory experience is determined by more than merely local stimulation. The case is that of color contrast, which at the present time most psychologists suppose to be an effect of interaction in the nervous system. Here the point-to-point correlation between retinal stimuli and sensory experience is no longer defended, because the determination of local experience by conditions in a larger area is too evident. But after this concession, how can we proceed as though nothing serious has happened? It took science some time to accept obvious evidence even in this case. Wolfgang Köhler (1947)

By now, the reader will hopefully begin to see that the underlying problem with all inferential inverse-optics approaches to perceptual representation is nothing more than the deceptively simple problem first posed to us by Hume in 1739. The problem is that they attempt to be causal theories in an empiricist framework. Hume's analysis implies that if one wants to establish causal linkages outside of perception, the distinct entities between which we seek to establish those links lose their identity, making those very causal links meaningless. But establishing external causal linkages between distinct existences is the central goal of any empiricist theory of perception. As we have seen, such an approach reduces to either a naïve realist model of perception, or an idealistic one in which we have lost the very external world we are trying to explain. We cannot conclude from this that a causal theory is not possible, but rather that it is not plausible under an empiricist theory. Hume himself betrays his implicit belief that a causal theory is only plausible within a nativist framework when he essentially says that the mind wraps causality around objects. Leyton has laid the foundation for such a model by arguing that the very schema of perception must be a causal one. Within perception research, it often seems that the debate from Descartes through Hume to Kant is merely a pleasant philosophical interlude that, though interesting and of relevance to historic developments in perception research, is ultimately of minimal bearing
to contemporary research, which is taken to be on more solid scientific ground. The desire for an exactness matching that of physics is what is usually held up by Neoconstructivism as a basis for rejecting merely philosophical musings on the epistemology of perception. What is ironic about this is that it is these very epistemological concerns in perception, starting with Hume and Kant, through Brentano, Mach, Fechner, and Hering, that form the direct lineage to modern physics. Research in perceptual representation has suffered because the debate between empiricists and nativists, exemplified by the debate between Hering and Helmholtz (see Turner, 1994), has always been characterized as an anachronism, futile and unnecessary (as Gibson states, quoting Boring: "the long and barren controversy over nativism and empiricism" (Gibson, 1950, p. 9)). Nativism typically gets aligned either with Descartes, and the associated muddles in the notion of self, being, rationality, etc., or with the idea that perception does not involve learning of any kind. Empiricism typically gets aligned with the notion of perception as learning through association, the notion of a blank slate, or the idea that perception can be explained without recourse to representation (behaviorism). Invariably, researchers have assumed that there was a safe middle ground that could be occupied, one that rejects both these notions. Thus most Neoconstructivists will see themselves as neither empiricist nor nativist, but rather as holding to a scientific methodology that is orthogonal to the debate between the two. The assumption is that the argument reduces to one where nativism means that the modes of perceptual processing are already established at birth, and empiricism, that they are all learned. Naturally, when the argument is characterized in that way, a natural, but illusory, middle ground between empiricism and nativism appears. The problem has been that the epistemological incommensurability of nativism and empiricism, as originally formulated by Hume, Descartes, Kant, etc., gave way to the supposed incommensurability of the nature vs. nurture debate as it came to be played out in the Helmholtz-Hering controversy (see Turner, 1994). But the latter incommensurability is false, which is why so many have been drawn to take the natural middle road on the nature/nurture divide, and with it the appearance that a compromise between empiricism and nativism is possible. Indeed, the denouements of the stories of color perception and space perception are often held to be the hallmarks of this successful compromise, merely because the
protagonists of each approach regarded themselves as being on opposite sides of the divide (for a wonderful historical account of this, see Turner, 1994). Arguably, in neither domain (color or spatial perception) has any support been found for a genuine empiricist theory of representation. Neoconstructivists have assumed that perceptual inference can be split up into two parts, the ontogenetic component and the phylogenetic component. The typical reasoning is that an empiricist-like learning process occurs in ontogeny, utilizing nativist-like constraints that have evolved through phylogeny. In other words, nativism and empiricism just represent two ways in which information might come to be encoded, and the time frame involved, rather than mutually exclusive theories of the nature of the information or knowledge that our percepts provide. Thus, most characterizations of current theory would claim that Neoconstructivism is partly nativist because it is fairly well established that there are constraints at birth. But for the real nativist position (e.g. Hering and the Gestaltists), the least interesting aspect of nativism is that it predicts constraints at birth. Hering evidently saw that any distinction between nativist and empiricist portions of a compromise theory dissolves once one posits that nativist constraints are merely ones that have been learned through evolution (see Turner, 1994). Even though Hering has been cast in the role of a person against any form of learning, much of his theorizing on color and binocular vision still stands on solid epistemological ground, and indeed his theories have been redeemed. The later introduction of nativism by the Gestalt theorists, particularly the Berlin school, took a stronger epistemic stance, but was directed against the atomistic approach of both Helmholtz and Hering. This distinction introduced another red herring into the debate by defining empiricists as those who worked atomistically, and Gestaltists as those who held to the paramount importance of the whole. I believe this distinction hurt the Gestalt movement, because it precluded precisely the kind of so-called atomistic work followed by the Hering school, which had a very clear understanding of its epistemological claims, unlike the Helmholtz school. Yet, in the apparent conflation of Hering and Helmholtz, we lost an early opportunity to redefine the entire gamut of vision research, from receptor physics to perceptual representation, on strong epistemological grounds. The Gestalt challenge, on epistemological grounds,
is entirely commensurate with the Hering school, and in hindsight the differences may only be ones of levels of abstraction and function. Hering's exceptional phenomenological and epistemological sense led to the elegant solution of the opponent color process, which redefined color more strongly as a perceptual construct and only very weakly as a correlate of wavelength. It is possible that the Gestalt school lost its influence for several decades precisely because it did not provide a continuum with the solid scientific base that Hering's school had already established; meanwhile, the empiricist and behaviorist approaches naturally followed from Helmholtz. The increasing interest in Gestalt in the last two decades again threatens to further bury the critical epistemic foundations spelled out by Hering, and the original thrust of Gestalt theory, by focusing on secondary aspects of the theory such as enumerating the rules of grouping. An alternative theory of representation proposed by Leyton (1992, 2001) has explicitly and implicitly resolved many of these challenges of Gestalt theory by providing a model whose very schema is a causal one, and one where the informational content and qualitative aspects of the perception of shape are explicable within a non-metric algebraic description. A fruitful enterprise for future work in vision would be a redefinition of research problems in space and object perception, combining the careful, epistemologically correct, empirical approach taken by Hering with the quantitative theoretical basis provided by Leyton. Such research might more usefully shed light on issues in design.
NOTES
1 This term has been introduced previously in the literature in the context of the general approaches to cognition (see Harnad, 1982). Our usage here is restricted to visual perception, and no commentary for or against any existing interpretation of the term is intended. We merely use the term as shorthand for contemporary constructivist theories that view perception as a process of inference.
2 The term representation has varied use in the literature. When the standard English usage of the term representation is implied, as something that stands in for, or symbolizes, something else, the word will be hyphenated to read re-presentation. Without a hyphen, the term representation or perceptual representation will refer generically to how the percept is actually encoded or instantiated in some structural model (e.g. a particular symbolic hierarchy). A more correct term for the actual output of perception from an epistemological standpoint would be perceptual presentation or presentation. See Note 19.
3 Barry Flanagan is one of a group of British sculptors whose work since the mid-60s has explored the interface between sculptural convention and the exigencies of perception. The two other major artistic movements of the time that worked in a similar vein were Minimalism (US) and Arte Povera (Italy). The image is used with permission from Waddington Galleries, London.
4 I am deeply indebted to Michael Leyton for the development of this paper, and thank him both for introducing me to his work and for his personal communications. It was an understanding of his theory, and his analysis of the critical issues in visual phenomenology and the information content of perception, that crystallized for me the epistemological argument I present here.
5 We use the term standard computational vision, following Leyton (1992), for computational analyses of vision originating in Artificial Intelligence research.
6 See notes 2 and 19.
7 From Kant, Immanuel (1783), Prolegomena zu einer jeden künftigen Metaphysik, die als Wissenschaft wird auftreten können, Riga: Johann Friedrich Hartknoch. "I confess frankly, it was the warning voice of David Hume that first, years ago, roused me from dogmatic slumbers and gave a new direction to my investigations in the field of speculative philosophy." Translation in Müller, F. Max (tr.) and Noiré, Ludwig (1881), Immanuel Kant's Critique of Pure Reason, London: Macmillan.
8 Hume's argument is essential for the cognitive and perceptual sciences in a way that it is not for the physical and biological sciences.
9 Also anticipated by Al-Ghazali in The Incoherence of the Philosophers (Scruton, 1986).
10 There have been several attempts to re-ignite the discourse (for example, see Behavioral and Brain Sciences Issue 24 (2001), Varela et al. (1991), Mausfeld (2002)), yet no comprehensive analysis of the foundational issues and current scientific approaches is available. This is especially problematic from the point of view of the training of new perception scientists and philosophers of perception. There has been a resurgence of interest in perception research on some of these foundational notions, particularly the psychological primacy of the notion of object (see Pylyshyn, 2004; Scholl, 2002). Another area of much analysis in epistemology and ontology has been that of color perception (for a review and analysis see Byrne and Hilbert, 2003). It is generally accepted that there are four distinct positions on the status of color as a mental and/or physical entity: externalism, eliminativism, dispositionalism, and functionalism. Externalism appears to toe a line close to Neoconstructivism, while eliminativism appears to betray an idealist position, at least on one reading. The other two theories seem to hedge their bets with regard to whether properties inhere in the percept or in the external world. It is unclear, however, what status the distinctions made among these theories retain once one assumes (a) the existence of an external world but without an objective description, and (b) that perceptual information is correlated non-trivially with the external flux.
11 An additional drawback has been the professionalization of analytic philosophy, which considers itself to be the true heir to philosophical treatments of perception. The increasingly technical bulwark that analytic philosophy has put up has shifted epistemological discourse more into the domain of language and concepts. Thus, in cognitive psychology, it appears that both philosophers and psychologists of perception have been eager to relegate issues of epistemology to the analysis of the nature of reference.
12 From the German Ding an sich, or thing-in-itself (Kant, from Critique of Pure Reason (1781)).
13 I am very grateful to Michael Leyton for providing me with his lecture notes for the course on computational vision that he taught at Rutgers. Most of the section is based on his review of computational theories.
14 Luminance (or radiance) is the amount of visible light that comes to the image from a surface. Illuminance (or irradiance) is the amount of light incident on a surface (or the retinal image). Horn uses the terms radiance and irradiance, though we will use the terms luminance and illuminance. Reflectance is the proportion of light reflected from a surface in a particular direction. All these are physical properties that can be measured using physical devices. Lightness is the perceptual correlate of reflectance, i.e. perceived surface reflectance. Brightness is the perceptual correlate of luminance, i.e. perceived surface luminance.
15 Any vector orientation is expressed as a coordinate (p, q), where p and q are the partial derivatives (along a flat surface patch normal to the vector) dz/dx and dz/dy, in a coordinate system where the (x, y) plane is parallel to the image plane and the line of sight passes through the origin. Thus, vectors parallel to the line of sight have gradient (0, 0), and vectors perpendicular to the line of sight have gradient (∞, ∞). See Horn (1986), and the short formal sketch following these notes.
16 The gradient space coordinates (f, g) here refer to a projected version of the original gradient space coordinates (p, q). See Horn (1986).
17 A possible objection to the reasoning above is that the transformation is informative because it generates a 3D interpretation of the data that might be useful for motor programming. But on closer consideration it is clear that if the translation from a 2D data set to a 3D data set is accomplished by a fixed transformation, then that same transformation can be directly applied to the image data set to program the motor commands without any need for perceptual access to it. Indeed, there does not seem to be a reason for us to have perceptual access to a translated 3D space, when the visual information, as well as any motor program, can be adequately expressed in the lower-dimensional (2D) space of the image.
18 Many researchers may object to this characterization. The claim here is that, whatever may be the intent of ecological statistics, its internal assumptions are consistent with an assumption of an image and external environment that can be objectively and correctly described.
19 I am indebted to Liliana Albertazzi for introducing the usage of this term to me, which derives from the usage of the German Vorstellung (as opposed to Darstellung, or representation) by the Gestaltists, particularly in Brentano and the Graz school of Gestalt perception. We have used the term perceptual representation, or just representation, throughout the text. Here we imply the actual structural model that encodes percepts.
20 It is perhaps the reason why Gibson is so many different things to so many people: a person who can be berated and admired in the same breath by those with either Gestalt or empiricist leanings, for exactly the opposite reasons.
21 Many contemporary descriptions of direct pickup or direct perception are notoriously difficult to understand, and this evidently arises from the fact that the concept itself is, at some level, incoherent.
22 See note 10.
23 Wavelength is itself an extended perception concept.
24 Note that we exclude cognitive associations, or categorizations, that the perceiver might have with the object and that might be construed as additional information, since these do not accrue to the perceptual process that delivers the percept of an object.
25 Naturally, the relationships among those measurements have some correlation to the actual physical flux external to the perceptual system, the actual nature and dimensionality of which perception has no access to.
26 Recently there has been evidence that this particular assumption of lighting direction is amenable to learning on a short time scale and can be altered within laboratory conditions (Adams et al., 2004).
27 It is precisely the capacity of the visual system to draw out the percept that is central to Leyton's proposal for a nested, causal representational schema (see Leyton, 1992).
28 See note 10.
29 Robert Morris was one of the pre-eminent artists of the Minimalist movement. His work most directly explored issues underlying the nature of perception and its relation to sculpture, to such a degree that his mode of work was criticized as being "theater" in the now controversial essay by Michael Fried, "Art and Objecthood".
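
The gradient-space convention invoked in notes 15 and 16 can be restated compactly. The brief sketch below (after Horn, 1986) only re-expresses those definitions in symbols and adds nothing to the argument; the choice of axes, with the (x, y) plane parallel to the image plane and z along the line of sight, follows note 15.

```latex
% Gradient-space convention of notes 15 and 16 (after Horn, 1986); a restatement
% of the definitions given there, with the limiting case made explicit.
% The (x, y) plane is parallel to the image plane; z runs along the line of sight.
\[
  (p, q) \;=\; \Bigl( \tfrac{\partial z}{\partial x},\; \tfrac{\partial z}{\partial y} \Bigr)
\]
% A surface patch whose normal is parallel to the line of sight has (p, q) = (0, 0);
% as the normal turns perpendicular to the line of sight, (p, q) \to (\infty, \infty).
% The (f, g) coordinates of note 16 are a projected version of (p, q).
```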

REFERENCES

Adams, W. J., E. W. Graf and M. O. Ernst: 2004, Experience Can Change the Light-from-above Prior, Nature Neuroscience 7(10), 1057–1058.
Albertazzi, L., M. Libardi and R. Poli: 1996, The School of Franz Brentano, Kluwer: Dordrecht.
Albertazzi, L.: 2000, Early European Contributors to Cognitive Science, Kluwer: Dordrecht.
Albertazzi, L.: 2001, Presentational Primitives. Parts, Wholes and Psychophysics, in L. Albertazzi (ed.), The Dawn of Cognitive Science: Early European Contributors, Dordrecht: Kluwer, pp. 29–60.
Albertazzi, L. (ed.): 2002, Unfolding Perceptual Continua, Benjamins.
Alais, D. and D. Burr: 2004, The Ventriloquist Effect Results from Near-Optimal Bimodal Integration, Current Biology 14(3), 257–262.
Bahcall, D. O. and E. Kowler: 1999, Illusory Shifts in Perceived Visual Direction Accompany Adaptation of Saccadic Eye Movements, Nature 400, 864–866.
Barlow, H. B.: 1961, Possible Principles Underlying the Transformation of Sensory Messages, in W. A. Rosenblith (ed.), Sensory Communication, Cambridge, MA: MIT Press.
Beal, G. and M. J. Jacob: 1987, A Quiet Revolution: British Sculpture since 1965, Thames and Hudson: London.
Biederman, I.: 1987, Recognition-by-Components: A Theory of Human Image Understanding, Psychological Review 94, 115–147.
Blake, A. and A. Zisserman: 1987, Visual Reconstruction, MIT Press: Cambridge.
Boselie, F. and E. Leeuwenberg: 1986, A Test of the Minimum Principle Requires a Perceptual Coding System, Perception 15, 331–354.
Brady, M.: 1983, Criteria for Representation of Shape, in A. Rosenfeld and J. Beck (eds), Human and Machine Vision, Vol. 1, Erlbaum: Hillsdale, NJ.
Braun, J.: 2000, Intimate Attention, Nature, Vol. 9, p. 408.
Buhmann, J. M., J. Malik and P. Perona: 1999, Image Recognition: Visual Grouping, Recognition and Learning, Proceedings of the National Academy of Sciences 96(25), 14203–14204.
Byrne, A. and D. Hilbert: 2003, Color Realism and Color Science, Behavioral and Brain Sciences 26, 3–21.
Dennett, D.: 1991, Consciousness Explained, Little, Brown and Co.: Boston.
Duhamel, J. R., C. L. Colby and M. E. Goldberg: 1992, The Updating of the Representation of Visual Space in Parietal Cortex by Intended Eye Movements, Science 255, 90–92.
Elder, J. H. and R. M. Goldberg: 2002, Ecological Statistics of Gestalt Laws for the Perceptual Organization of Contours, Journal of Vision 2(4), 324–353.
Ernst, M. O. and M. S. Banks: 2002, Humans Integrate Visual and Haptic Information in a Statistically Optimal Fashion, Nature 415, 429–433.
Feldman, J. and W. A. Richards: 1998, Mapping the Mental Space of Rectangles, Perception 27, 1191–1202.
Fodor, J.: 1983, Modularity of Mind, MIT Press: Cambridge.
Geisler, W. S., J. S. Perry, B. J. Super and D. P. Gallogly: 2001, Edge Co-occurrence in Natural Images Predicts Contour Grouping Performance, Vision Research 41, 711–724.
Gibson, J. J.: 1950, The Perception of the Visual World, Houghton Mifflin: Boston.
Gibson, J. J.: 1966, The Senses Considered as Perceptual Systems, Houghton Mifflin: Boston.
Gibson, J. J.: 1979, The Ecological Approach to Visual Perception, Houghton Mifflin: Boston.
Griffiths, A. F. and Q. Zaidi: 2000, Perceptual Assumptions and Projective Distortions in a Three-dimensional Shape Illusion, Perception 29(2), 171–200.
Grimson, W. E. L.: 1981, From Images to Surfaces, MIT Press: Cambridge.
Harnad, S.: 1982, Neoconstructivism: A Unifying Theme for the Cognitive Sciences, in T. Simon and R. Scholes (eds), Language, Mind and Brain, Hillsdale, NJ: Erlbaum, pp. 1–11.
Hatfield, G. and W. Epstein: 1985, The Status of the Minimum Principle in the Theoretical Analysis of Visual Perception, Psychological Bulletin 97(2), 155–186.
Hillis, J. M., M. O. Ernst, M. S. Banks and M. S. Landy: 2002, Combining Sensory Information: Mandatory Fusion within, but not between, Senses, Science 298, 1627–1630.
Hochberg, J. and E. McAlister: 1953, A Quantitative Approach to Figural Goodness, Journal of Experimental Psychology 46(5), 361–364.
Horn, B. K. P.: 1986, Robot Vision, MIT Press: Cambridge.
Hume, D.: 1739, A Treatise of Human Nature, Vol. 1: Of the Understanding, in The Empiricists, London: Macmillan Press.
Hoffman, D. D.: 1996, What Do We Mean by the Structure of the World? Commentary, in D. C. Knill and W. Richards (eds), Perception as Bayesian Inference, Cambridge: Cambridge University Press.
Ikeuchi, K. and B. K. P. Horn: 1981, Numerical Shape from Shading and Occluding Boundaries, Artificial Intelligence 17, 141–184.
Jepson, A. and W. A. Richards: 1992, What Makes a Good Feature?, in L. Harris and M. Jenkin (eds), Spatial Vision in Humans and Robots, Cambridge University Press.
Kanade, T.: 1981, Recovery of the Three-dimensional Shape of an Object from a Single View, Artificial Intelligence 17, 409–460.
Kawato, M.: 1999, Internal Models for Motor Control and Trajectory Planning, Current Opinion in Neurobiology 9, 718–727.
Knill, D. C.: 1998, Surface Orientation from Texture: Ideal Observers, Generic Observers and the Information Content of Texture Cues, Vision Research 38, 1655–1682.
Köhler, W.: 1947, Gestalt Psychology, Liveright: New York.
Landy, M. S., L. T. Maloney, E. B. Johnston and M. J. Young: 1995, Measurement and Modeling of Depth Cue Combination: In Defense of Weak Fusion, Vision Research 35, 389–412.
Leyton, M.: 1984, Perceptual Organization as Nested Control, Biological Cybernetics 51, 141–153.
Leyton, M.: 1986a, Principles of Information Structure Common to Six Levels of the Human Cognitive System, Information Sciences 38, 1–120.
Leyton, M.: 1986b, A Theory of Information Structure I: General Principles, Journal of Mathematical Psychology 30, 103–160.
Leyton, M.: 1986c, A Theory of Information Structure II: A Theory of Perceptual Organization, Journal of Mathematical Psychology 30, 257–305.
Leyton, M.: 1987, Nested Structures of Control: An Intuitive View, Computer Vision, Graphics, and Image Processing 37, 20–53.
Leyton, M.: 1992, Symmetry, Causality, Mind, MIT Press: Cambridge.
Leyton, M.: 1999, New Foundations for Perception, in E. Lepore and Z. Pylyshyn (eds), What is Cognitive Science?, Malden: Blackwell.
Leyton, M.: 2001, A Generative Theory of Shape, Springer-Verlag: Heidelberg.
Marr, D.: 1982, Vision, Freeman Press: San Francisco.
Mausfeld, R.: 2002, The Physicalist Trap in Perception Theory, in Perception and the Physical World, Wiley.
Mitchell, T. M.: 1997, Machine Learning, McGraw Hill.
Nakayama, K. and S. Shimojo: 1996, Experiencing and Perceiving Visual Surfaces, in D. C. Knill and W. Richards (eds), Perception as Bayesian Inference, Cambridge: Cambridge University Press.
Palmer, S.: 1999, Vision Science, MIT Press: Cambridge.
Pinker, S.: 1997, How the Mind Works, Norton: New York.
Pratt, I.: 1994, Artificial Intelligence, Macmillan: London.
Pylyshyn, Z. W.: 2001, Visual Indexes, Preconceptual Objects, and Situated Vision, Cognition 80, 127–158.
Richards, W.: 1996, A Claim Commentary, in D. C. Knill and W. Richards (eds), Perception as Bayesian Inference, Cambridge: Cambridge University Press.
Richards, W., A. Jepson and J. Feldman: 1996, Priors, Preferences, and Categorical Percepts, in D. Knill and W. Richards (eds), Perception as Bayesian Inference, Cambridge University Press.
Rock, I.: 1984, Perception, Scientific American Books: New York.
Scholl, B. J. (ed.): 2002, Objects and Attention, MIT Press: Cambridge, MA.
Scruton, R.: 1986, A Short History of Modern Philosophy, 2nd edition, Routledge: London.
Sekuler, R. and R. Blake: 1986, Sensation and Perception.
Sekuler, A. B., S. E. Palmer and C. Flynn: 1994, Local and Global Processes in Visual Completion, Psychological Science 5, 260–267.
Shiffman, H. R.: 1999, Perception, John Wiley and Sons.
Spelke, E. S., G. Gutheil and G. Van de Walle: 1995, The Development of Object Perception, in D. Osherson (ed.), Invitation to Cognitive Science: Visual Cognition, 2nd edition, Vol. 2, Cambridge, MA: MIT Press.
Stevens, K. A.: 1981, The Information Content of Texture Gradients, Biological Cybernetics 42, 95–105.
Taylor, R. P., A. P. Micolich and D. Jonas: 1999, Fractal Analysis of Pollock's Drip Paintings, Nature 399, 422.
Terzopoulos, D.: 1983, Multilevel Computational Processes for Visual Surface Reconstruction, Computer Vision, Graphics, and Image Processing 24, 52–96.
Turner, R. S.: 1994, In the Eye's Mind: Vision and the Helmholtz-Hering Controversy, Princeton University Press: Princeton, NJ.
Van Tonder, G. and Y. Ejima: 2000, Perception 29, 149–157.
Van Tonder, G. J., M. J. Lyons and Y. Ejima: 2002, Visual Structure of a Japanese Zen Garden, Nature 419, 359–360.
Varela, F., E. Thompson and E. Rosch: 1991, The Embodied Mind: Cognitive Science and Human Experience, MIT Press: Cambridge.
Watanabe, S.: 1969, Knowing and Guessing: A Quantitative Study of Inference and Information, John Wiley and Sons: New York.
Westheimer, G.: 1999, Gestalt Theory Reconfigured: Max Wertheimer's Anticipation of Recent Developments in Visual Neuroscience, Perception 28(1).
Witkin, A. and J. M. Tenenbaum: 1983, On the Role of Structure in Vision, in A. Rosenfeld and J. Beck (eds), Human and Machine Vision, Vol. 1, Hillsdale, NJ: Erlbaum.
Yang, Z. and D. Purves: 2003, A Statistical Explanation of Visual Space, Nature Neuroscience 6, 632–640.
Yuille, A. and H. H. Bülthoff: 1996, Bayesian Decision Theory and Psychophysics, in D. C. Knill and W. Richards (eds), Perception as Bayesian Inference, Cambridge: Cambridge University Press.
