
THE BEHAVIORAL AND BRAIN SCIENCES (1983) 6, 55-90

Printed in the United States of America

Precis of Knowledge and the Flow of Information


Fred I. Dretske
Department of Philosophy, University of Wisconsin, Madison, Wis. 53706

Abstract: A theory of information is developed in which the informational content of a signal (structure, event) can be specified. This content is expressed by a sentence describing the condition at a source on which the properties of a signal depend in some lawful way. Information, as so defined, though perfectly objective, has the kind of semantic property (intentionality) that seems to be needed for an analysis of cognition. Perceptual knowledge is an information-dependent internal state with a content corresponding to the information producing it. This picture of knowledge captures most of what makes knowledge an important epistemological notion. It also avoids many of the problems infecting traditional justificational accounts of knowledge (knowledge as "justified, true belief"). Our information pickup systems are characterized in terms of the way they encode incoming information (perception) for further cognitive processing. Our perceptual experience is distinguished from our perceptual beliefs by the different way sensory information is encoded in these internal structures. Our propositional attitudes - those (unlike knowledge) having a content that can be either true or false (e.g., belief) - are described in terms of the way internal (presumably neural) structures acquire during learning a certain information-carrying role. The content of these structures (whether true or false) is identified with the kind of information they were developed to carry.

Keywords: belief; cognition; concept; information; intentionality; knowledge; meaning; perception; representation; semantics

Knowledge and the Flow of Information (Dretske 1981; henceforth Knowledge) is an attempt to develop a philosophically useful theory of information. To be philosophically useful the theory should: (1) preserve enough of our common understanding of information to justify calling it a theory of information; (2) make sense of (or explain its failure to make sense of) the theoretically central role information plays in the descriptive and explanatory efforts of cognitive scientists; and (3) deepen our understanding of the baffling place of mind, the chief consumer of information, in the natural order of things. A secondary motive in writing this book, and in organizing its approach to philosophical problems around the notion of information, was to build a bridge, if only a terminological one, to cognitive science. Even if we don't have the same problems (psychologists are no more interested in Descartes's Demon than philosophers are in Purkinje's twilight shift), we have the same subject, and both sides could profit from improved communication. In pursuit of these ends, it was found necessary to think of information as an objective commodity, as something whose existence (as information) is (largely) independent of the interpretative activities of conscious agents. It is common among cognitive scientists to regard information as a creation of the mind, as something we conscious agents assign to, or impose on, otherwise meaningless events. Information, like beauty, is in the eye of the beholder. For philosophical purposes though, this puts things exactly backward. It assumes what is to be explained. For we want to know what this interpretative ability amounts to, why some physical systems (typically, those with brains) have this capacity and others do not.

What makes some processors of information (persons, but not television sets) sources of meaning? If we begin our study by populating the world with fully developed cognitive systems, systems that can transform "meaningless" stimuli into thoughts, beliefs, and knowledge (or whatever is involved in interpretation), we make the analysis of information more tractable, perhaps, but only by abandoning it as a tool in our quest to understand the nature of cognitive phenomena. We merely postpone the philosophical questions. Part I of Knowledge develops a semantic theory of information, a theory of the propositional content of a signal (event, structure, or state of affairs). It begins by rehearsing some of the elementary ideas of the mathematical theory of communication (Shannon & Weaver 1949). This theory, though developed for quite different purposes, and though having (as a result) only the remotest connection (some would say none) with the kinds of cognitive issues of concern to this study, does, nonetheless, provide a key that can be used to articulate a semantical theory of information. Chapters 2 and 3 are devoted to adapting and extending this theory's account of an information source and channel into an account of how much information a particular signal carries about a source and what (if any) information this is. Part II applies this theory of information to some traditional problems in epistemology: knowledge, skepticism, and perception. Knowledge is characterized as information-produced belief. Perception is a process in which incoming information is coded in analog form in preparation for further selective processing by cognitive (conceptual) centers. The difference between seeing a

duck and recognizing it as a duck (seeing that it is a duck) is to be found in the different way information about the duck is coded (analog vs. digital). Part III is devoted to an information-theoretic analysis of what has come to be called our propositional attitudes - in particular, the belief that something is so. Belief, the thinking that something is so, is characterized in terms of the instantiation of structures (presumably neural) that have, through learning, acquired a certain information-carrying role. Instances of these structures (the types of which are identified as concepts) sometimes fail to perform satisfactorily. This is false belief.
Information

The mathematical theory of communication (Cherry 1951; Shannon & Weaver 1949) is concerned with certain statistical quantities associated with "sources" and "channels." When a certain condition is realized at a source, and there are other possible conditions that might have been realized (each with its associated probability of occurring), the source can be thought of as a generator of information. The ensemble of possibilities has been reduced to a single reality, and the amount of information generated is a function of these possibilities and their associated probabilities. The die is cast. Any one of six faces might appear uppermost. A "3" appears. Six possibilities, all (let us say) equally likely, have been reduced to one. The source, in this case the throw of the die, generates 2.6 bits of information (log2 6 = 2.6). But more important (for my purposes and for the purpose of understanding communication) is the measure of how much information is transmitted from one point to another, how much information there is at point r (receiver) about what is transpiring at s (source). Once again, communication theory is concerned with the statistical properties of the "channel" connecting r and s, because, for most engineering purposes, it is this channel whose characteristics must be exploited in designing effective coding strategies. The theory looks at a statistical quantity that is a certain weighted average of the conditional probabilities of all signals that can be transmitted from s to r. It does not concern itself with the individual events (the particular signals) except as a basis for computing the statistical functions that define the quantities of interest. I skip over these matters rather lightly here, because it should be obvious that, insofar as communication theory deals with quantities that are statistical averages (sometimes called entropy to distinguish them from real information), it is not dealing with information as it is ordinarily understood. For information as it is ordinarily understood, and as it must figure in semantic and cognitive studies, is something associated with, and only with, individual events (signals, structures, conditions). It is only the particular signal (utterance, track, print, gesture, sequence of neural discharges) that has a content that can be given propositional expression (the content, message, or information carried by the signal). This is the relevant commodity in semantic and cognitive studies, and content - what information a signal carries - cannot be averaged. All one can do is average how much information is carried. There is no meaningful average for the information that my grandmother had a stroke and that

my daughter is getting married. If we can say how much information these messages represent, then we can speak about their average. But this tells us nothing about what information is being communicated. Hence, the quantities of interest in engineering - and, of course, some psychophysical contexts (Attneave 1959; Garner 1962; Miller 1953) - are not the quantities of interest to someone, like myself, concerned to develop an account of what information travels from source to receiver (object to receptor, receptor to brain, brain to brain) during communication. Nevertheless, though communication theory has its attention elsewhere, it does, as Sayre (1965) and others have noted, highlight the relevant objective relations on which the communication of genuine information depends. For what this theory tells us is that the amount of information at r about s is a function of the degree of lawful (nomic) dependence between conditions at these two points. If two conditions are statistically independent (the way the ringing of your telephone is independent of the ringing of mine), then the one event carries no information about the other. When there is a lawful regularity between two events, statistical or otherwise, as there is between your dialing my number and my phone's ringing, then we can speak of one event's carrying information about the other. And, of course, this is the way we do speak. The ring tells me (informs me) that someone is calling my number, just as fingerprints carry information about the identity of the person who handled the gun, tracks in the snow about the animals in the woods, the honeybee's dance about the location of nectar, and light from a distant star about the chemical constitution of that body. Such events are pregnant with information, because they depend, in some lawfully regular way, on the conditions about which they are said to carry information. If things are working properly, the ringing of my phone tells me that someone has dialed my number. It delivers this piece of information. It does not tell me that your phone is ringing, even if (coincidentally) your phone happens to be ringing at the same time. Even if A dials B's number whenever C dials D's number (so that D's phone rings whenever A dials B's number), we cannot say that the ringing of D's phone carries information about A's dialing activities - not if this "correlation" is a mere coincidence. We cannot say this, because the correlation, being (by hypothesis) completely fortuitous, does not affect the conditional probability of A's dialing B's number, given that D's phone is ringing. Of course, if we know about this (coincidental) correlation (though how one could know about its persistence is beyond me), we can predict one event from a knowledge of the other, but this doesn't change the fact that they are statistically independent. If I correctly describe your future by consulting tea leaves, this is not genuine communication unless the arrangement of tea leaves somehow depends on what you are going to do, in the way a barometer depends on meteorological conditions and, therefore, indirectly on the impending weather. To deny the existence of mental telepathy is not to deny the possibility of improbable co-occurrences (between what A thinks and what B thinks A is thinking); it is, rather, to deny that they are manifestations of lawful regularities. Communication theory only makes sense if it makes sense to talk about the probability of certain specific



conditions given certain specific signals. This is so because the quantities of interest to communication theory are statistical functions of these probabilities. It is this presupposed idea that I exploit to develop an account of a signal's content. These conditional probabilities determine how much, and indirectly what, information a particular signal carries about a remote source. One needs only to stipulate that the content of the signal, the information it carries, be expressed by a sentence describing the condition (at the source) on which the signal depends in some regular, lawful way. I express this theoretical definition of a signal's (structure's) informational content (Chapter 3, p. 65) in the following way:

A signal r carries the information that s is F = The conditional probability of s's being F, given r (and k), is 1 (but, given k alone, less than 1)

My gas gauge carries the information that I still have some gas left, if and only if the conditional probability of my having some gas left, given the reading on the gauge, is 1. For the same reason, the discharge of a photoreceptor carries the information that a photon has arrived (perhaps a photon of a certain wavelength), and the pattern of discharge of a cluster of ganglion cells carries the information that there is a sharp energy gradient (a line) in the optic array (Lindsay & Norman 1972; Rumelhart 1977). The following comments explain the main features of this definition.

1. There are, essentially, three reasons for insisting that the value of the conditional probability in this definition be 1 - nothing less. They are:

a. If a signal could carry the information that s was F while the conditional probability (of the latter, given the former) was less than 1 (.9 say), then the signal could carry the information that s was F (probability = .91), the information that s was G (probability = .91), but not the information that s was F and G (because the probability of their joint occurrence might be less than .9). I take this to be an unacceptable result.

b. I accept something I call the xerox principle: If C carries the information that B, and B's occurrence carries the information that A, then C carries the information that A. You don't lose information about the original (A) by perfectly reproduced copies (B of A and C of B). Without the transitivity this principle describes, the flow of information would be impossible. If we put the threshold of information at anything less than 1, though, the principle is violated. For (using the same numbers) the conditional probability of B, given C, could be .91, the conditional probability of A, given B, also .91, but the conditional probability of A, given C, less than .9. The noise (equivocation, degree of nomic independence, or nonlawful relation) between the end points of this communication channel is enough to break communication, even though every link in the chain passes along the information to its successor. Somehow the information fails to get through, despite the fact that it is nowhere lost.

c. Finally, there is no nonarbitrary place to put a threshold that will retain the intimate tie we all intuitively feel between knowledge and information. For, if information about s's being F can be obtained from a signal that makes the conditional probability of this situation only (say) .94, then information loses its cognitive punch. Think of a bag with 94 red balls and 6 white balls.
If one is pulled at random (probability of red = .94), can you know (just from the fact that it was drawn from a bag with that composition of colored marbles) that it was red? Clearly not. Then why suppose you have the information that it is red? The only reason I know for not setting the required probability this high is worries (basically skeptical in character) that there are no (or precious few) conditional probabilities of 1 - hence, that no information is ever communicated. I address these worries in Chapter 5. They raise issues (e.g., the idea of a "relevant alternative") that have received some attention in recent epistemology.

2. The definition captures the element that makes information (in contrast, say, to meaning) an important epistemic commodity. No structure can carry the information that s is F unless, in fact, s is F. False information, misinformation, and (grimace!) disinformation are not varieties of information - any more than a decoy duck is a kind of duck. A glance at the dictionary reveals that information is related to intelligence, news, instruction, and knowledge - things that have an important connection to truth. And so it should be with any theoretical approximation to this notion. Information is an important commodity: We buy it, sell it, torture people to get it, and erect booths to dispense it. It should not be confused with meaning, despite some people's willingness to speak of anything (true, false, or meaningless) stored on a magnetic disk as information.

3. Information, as defined above, is an objective commodity, the sort of thing that can be delivered to, processed by, and transmitted from instruments, gauges, computers, and neurons. It is something that can be in the optic array,1 on the printed page, carried by a temporal configuration of electrical pulses, and stored on a magnetic disk, and it exists there whether or not anyone appreciates this fact or knows how to extract it. It is something that was in this world before we got here. It was, I submit, the raw material out of which minds were manufactured. The parenthetical k occurring in the definition above (and explained below) relativizes information to what the receiver already knows (if anything) about the possibilities at the source, but this relativization does not undermine the essential objectivity of the commodity so relativized (MacKay 1969). We still have the flow of information (perhaps not so much) without conscious agents who know things, but without a lawfully regular universe (no matter how much knowledge we assign the occupants), no information is ever communicated.

4. A signal's informational content is not unique. There is, generally speaking, no single piece of information in a signal or structure. For anything that carries the information that s is a square, say, also carries the information that it is a rectangle, a parallelogram, not a circle, a circle or a square, and so on. If the acoustic pattern reaching my ears carries the information that the doorbell is ringing, and the ringing of the bell carries the information that the doorbell button is being pressed, then the acoustic pattern also carries the information that the doorbell button is being pressed (xerox principle). The one piece of information is nested in the other. This, once again, is as it should be. The linguistic meaning of an utterance may be


unique (distinguishable, for instance, from what it implies), but not the information carried by that utterance. Herman's statement that he won't come to my party means, simply, that he won't come to my party. It doesn't mean (certainly not in any linguistically relevant sense of "meaning") that he doesn't like me or that he can speak English, although his utterance may well carry these pieces of information.

5. The definition of a signal's informational content has been relativized to k, what the receiver (in the event that we are talking about a communication system in which the receiver - organism or computer - already has knowledge about the possible conditions existing at the source) already knows. This is a minor concession to the way we think and talk about information. The k is dischargeable by recursive applications of the definition. So, for instance, if I receive the information that your knight is not on KB-3 (by some signal), this carries the information that it is on KB-5, if I already know that the other possible positions to which your knight could have moved are already occupied by your pieces. To someone lacking such knowledge, the same signal does not carry this information (though it still carries the information that your knight is not on KB-3). The less we know, the more pregnant with information must be the signals we receive if we are to learn.

6. There is, finally, the important fact, already mentioned, that the informational content of a signal is a function of the nomic (or law-governed) relations it bears to other conditions. Unless these relations are what philosophers like to call "counterfactual supporting" relations (a symptom of a background, lawful regularity), the relations in question are not such as to support an assignment of informational content (Dretske 1977). The reason my thermometer carries information about the temperature of my room (the information that it is 72° F. in the room), but not about your room (though both rooms are at the same temperature), is that (given its location) the registration of my thermometer is such that it would not read 72° F. unless my room was at this temperature. This isn't true of your room. This fact helps explain an (otherwise puzzling) feature of information and, ultimately, of the cognitive attitudes that depend on it (belief, knowledge). For it is by virtue of this fact that a structure (some neural state, say) can carry the information that s (a distal object) is F (spherical) without carrying the information that s is G (plastic), even though (let us suppose) all spheres (in the relevant domain) are plastic. If the fact that all spheres are plastic is sheer accident, not underwritten by any lawful constraint, then the neural state might depend on s's being spherical without depending, in the same way, on its being plastic. Another way of expressing this fact (dear to the heart of philosophers) is to say that the informational content of a structure exhibits intentional properties. By saying that it exhibits intentional properties, I mean what philosophers typically mean by this technical term: that the informational content of a signal or structure (like the content of a belief, a desire, or knowledge) depends, not only on the reference (extension) of the terms used in its sentential expression, but on their meaning (intension).
That is, in the sentential expression of a structure's informational content, one cannot substitute coreferring (i.e., referring to the same thing, coextensional) expressions without (possible) alteration in content. Just as a belief that this man is my cousin differs from a belief that he is Susan's husband, despite the fact that Susan's husband is my cousin (these expressions have the same reference), the information (as defined above) that he is my cousin differs from the information that he is Susan's husband. A signal can carry the one piece of information without carrying the other. We have, then, an account of a signal's informational content that exhibits a degree of intentionality. We have, therefore, an account of information that exhibits some of the attributes we hope eventually to be able to explain in our account of our cognitive states. Perhaps, that is, one can know that s is F without knowing that s is G, despite the fact that all Fs are G, because knowledge requires information, and one can get the information that s is F without getting the information that it is G. If intentionality is "the mark of the mental," then we already have, in the physically objective notion of information defined above (even without k), the traces of mentality. And we have it in a form that voltmeters, thermometers, and radios have. What distinguishes us from these more pedestrian processors of information is not our occupation of intentional states, but the sophisticated way we process, encode, and utilize the information we receive. It is our degree of intentionality (see Part III).
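The threshold argument in reason (b) above can be made vivid with a little arithmetic. The following sketch is mine, not the book's; the .91 figures come from reason (b), while the assumption that A depends on C only by way of B (and never occurs without B) is added purely for illustration.

def chain(p_a_given_b, p_b_given_c, p_a_given_not_b=0.0):
    # P(A|C) = P(A|B) * P(B|C) + P(A|not-B) * P(not-B|C), assuming A is
    # conditionally independent of C given B.
    return p_a_given_b * p_b_given_c + p_a_given_not_b * (1 - p_b_given_c)

# With a hypothetical sub-unity threshold of .9, each link clears the bar
# but the chain as a whole does not, violating the xerox principle:
print(chain(0.91, 0.91))   # 0.8281
# With the threshold at 1, transitivity is preserved:
print(chain(1.0, 1.0))     # 1.0

Nothing in the illustration turns on the particular numbers; any threshold short of 1 admits a chain of this kind.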
Knowledge

Knowledge is defined (Chapter 4) as information-caused (or causally sustained) belief. The analysis is restricted to perceptual knowledge of contingent states of affairs (conditions having an informational measure of something greater than 0) of a de re form: seeing (hence, knowing) that this (the perceptual object) is blue, moving, a dog, or my grandmother. This characterization of knowledge is a version of what has come to be called the "regularity analysis" of knowledge (Armstrong 1973; Dretske 1969; 1971). It is an attempt to get away from the philosopher's usual bag of tricks (justification, reasons, evidence, etc.) in order to give a more realistic picture of what perceptual knowledge is. One doesn't need reasons, evidence, or rational justification for one's belief that there is wine left in the bottle, if the bottle is sitting in good light directly in front of one. One can see that it is still half-full. And, rightly or wrongly, I wanted a characterization that would at least allow for the possibility that animals (a frog, rat, ape, or my dog) could know things without my having to suppose them capable of the more sophisticated intellectual operations involved in traditional analyses of knowledge. What can it mean to speak of information as causing anything - let alone causing a belief? (The analysis of belief, the propositional attitude most often taken as the subjective component of knowledge, is postponed until Part III.) Assuming that belief is some kind of internal state with a content expressible as s is F, this is said to be caused by the information that s is F, if and only if those physical properties of the signal by virtue of which it carries this information are the ones that are causally efficacious in the production of the belief. So, for instance, not just any knock on the door tells you it is your friend. The (prearranged) signal is three quick knocks,



followed by a pause, and then another three quick knocks. It is that particular signal, that particular temporal pattern, that constitutes the information-carrying property of the signal. The amplitude and pitch are irrelevant. When it is this pattern of knocks that causes you to believe that your friend has arrived, then (it is permissible to say that) the information that your friend has arrived causes you to believe he has arrived. The knocks might also frighten away a fly, cause the windows to rattle, and disturb the people upstairs. But what has these effects is not the information, because, presumably, the fly would have been frightened, the windows rattled, and the neighbors disturbed by any sequence of knocks (of roughly the same amplitude). Hence, the information is not the cause. In most ordinary situations, there is no explanatory value in talking about the information (in an event) as the cause of something, because there is some easily identifiable physical (nonrelational) property of the event that can be designated as the cause. Why talk of the information (that your friend has arrived) as the cause, when it is clear enough that it is the particular temporal pattern of knocks (or acoustic vibrations) that was the effective agent? The point of this definition is not to deny that there are physical properties of the signal (e.g., the temporal pattern of knocks in the above example) that cause the belief, but to say which of these properties must be responsible for the effect if the resultant belief is to qualify as knowledge.2 If the belief that your friend has arrived is caused by the knock, but the pattern of knocks is irrelevant, then (assuming that someone else could be knocking at your door), though you are caused to believe it by the knock on the door, you do not know your friend has arrived. Those properties of the signal that carry the information (that your friend has arrived) are not the ones that are causally responsible for your belief. The need to speak in this more abstract way - of information (rather than the physical event carrying this information) as the cause of something - becomes much more compelling as we turn to more complex information processing systems. For we then discover that there are an indefinitely large number of different sensory inputs, having no identifiable physical (nonrelational) property in common, that all have the same cognitive outcome. The only way we can capture the relevant causal regularities is by retreating to a more abstract characterization of the cause, a characterization in terms of its relational (informational) properties. We often do this sort of thing in our ordinary descriptions of what we see. Why did he stop? He could see that he was almost out of gas. We speak here of the information (that he was almost out of gas) that is contained in (carried by) the fuel gauge pointer and not the fuel gauge pointer itself (which, of course, is what we actually see), because it is a property of this pointer (its position, not its size or color) carrying this vital piece of information that is relevantly involved in the production of the belief. We, as it were, ignore the messenger bringing the information (the fuel gauge indicator) in order to focus on what information the messenger brings. We also ignore the infinite variety of optical inputs (all of varying size, shape, orientation, intensity) in order to focus on the information they carry. Often we have no choice.
The only thing they have in common is the information they bear.3 A belief that s is F may not itself carry the information that s is F just because it is caused by this information (thereby qualifying as knowledge). A gullible person may believe almost anything you tell him - for example, that there are three elephants in your backyard. His beliefs may not, as a result, have any reliable relation to the facts (this is why we don't believe him when he tells us something). Yet this does not prevent him from knowing something he observes firsthand. When he sees the elephants in your backyard, he knows they are there, whatever other signal (lacking the relevant information) might have caused him to believe this. If the belief is caused by the appropriate information, it qualifies as knowledge whatever else may be capable of causing it. This definition of knowledge accords, I think, with our ordinary, intuitive judgments about when someone knows something. You can't know that Jimmy is home by seeing him come through the door, if it could be his twin brother Johnny. Even if it is extremely unlikely to be Johnny (for Johnny rarely comes home this early in the afternoon), as long as this remains a relevant possibility, it prevents one from seeing (hence, knowing) that it is Jimmy (though one may be caused to believe it is Jimmy). The information that it is Jimmy is missing. The optical input is equivocal. Furthermore, this account of knowledge neatly avoids some of the puzzles that intrigue philosophers (and bore everyone else to death). For example, Gettier-like difficulties (Gettier 1963) arise for any account of knowledge that makes knowledge a product of some justificatory relationship (having good evidence, excellent reasons, etc.) that could relate one to something false. For on all these accounts (unless special ad hoc devices are introduced to prevent it), one can be justified (in a way appropriate to knowledge) in believing something that is, in fact, false (hence, not know it); also know that Q (which happens to be true) is a logical consequence of what one believes, and come to believe Q as a result. On some perfectly natural assumptions, then, one is justified (in a way appropriate to knowledge) in believing the truth (Q). But one obviously doesn't know Q is true. This is a problem for justificational accounts. The problem is evaded in the information-theoretic model, because one can get into an appropriate justificational relationship to something false, but one cannot get into an appropriate informational relationship to something false. Similarly, the so-called lottery paradox (Kyburg 1961; 1965) is disarmed. If one could know something without the information (as here defined), one should be able to know before the drawing that the 999,999 eventual losers in a (fair) lottery, for which a million tickets have been sold, are going to lose. For they all are going to lose, and one knows that the probability of each one's (not, of course, all) losing is negligibly less than 1. Hence, one is perfectly justified in believing (truly) that each one is going to lose. But, clearly, one cannot know this. The paradox is avoided by acknowledging what is already inherent in the information-theoretic analysis - that one cannot know one is going to lose in such a lottery no matter how many outstanding tickets there may be. And the reason one cannot is (barring a fixed drawing) the information that one is going to lose is absent.
There remains a small, but nonetheless greater than 0, amount of equivocation for each outcome.
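To put a number on that equivocation (my arithmetic, not the book's): the conditional probability that a given ticket loses is 999,999/1,000,000 = 0.999999, so the equivocation associated with that outcome, -(p log2 p + (1 - p) log2 (1 - p)) with p = 0.999999, comes to about 2 x 10^-5 bits. Negligible by engineering standards, but greater than 0, and on the definition above that is enough: the information that this ticket will lose is not carried.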


There are further, technical advantages to this analysis (discussed in Chapter 4), but many will consider these advantages purchased at too great a price. For the feeling will surely be that one never gets the required information. Not if information requires a conditional probability of 1. The stimuli are always equivocal to some degree. Most of us know about Ames's demonstrations, Brunswik's ecological and functional validities, and the fallibility of our own sensory systems. If knowledge requires information, and information requires 0 equivocation, then precious little, if anything, is ever known. These concerns are addressed in Chapter 5, a chapter that will prove tedious to almost everyone but devoted epistemologists (i.e., those who take skepticism seriously). An example will have to suffice to summarize this discussion. A perfectly reliable instrument (or one as reliable as modern technology can make it) has its output reliably correlated with its input. The position of a mobile pointer on a calibrated scale carries information about the magnitude of the quantity being measured. Communication theorists would (given certain tolerances) have no trouble in describing this as a noiseless channel. If we ask about the conditional probabilities, we note that these are determined by regarding certain parameters as fixed (or simply ignoring them). The spring could weaken, it could break, its coefficient of elasticity could fluctuate unpredictably. The electrical resistance of the leads (connecting the instrument to the apparatus on which measurements are being taken) could change. Error would be introduced if any of these possibilities was realized. And who is to say they are not possibilities? There might even be a prankster, a malevolent force, or a god who chooses to interfere. Should all these possibilities go into the reckoning in computing the noise, equivocation, and information conveyed? To do so, of course, would be to abandon communication theory altogether. For this theory requires for its application a system of fixed, stable, enduring conditions within which the degree of covariation in other conditions can be evaluated. If every logical possibility is deemed a possibility, then everything is noise. Nothing is communicated. In the same manner, if everything is deemed a thing for purposes of assessing the emptiness of containers (dust? molecules? radiation?), then no room, pocket, or refrigerator is ever empty. The framework of fixed, stable, enduring conditions within which one reckons the flow of information is what I call "channel conditions." Possible variations in these conditions are excluded. They are what epistemologists call "irrelevant alternatives" (Dretske 1970; Goldman 1976). And so it is with our sensory systems. Certainly, in some sense of the word could, Herman, a perfectly normal adult, could be hallucinating the entire football game. There is no logical contradiction in this supposition; it is the same sense in which a voltmeter's spring could behave like silly putty. But this is not a sense of could that is relevant to cognitive studies or the determination of what information these systems are capable of transmitting. The probability of these things happening is set at 0. If they remain possibilities in some sense, they are not possibilities that affect the flow of information.
This discussion merely accentuates the way our talk of information presupposes a stable, regular world in which some things can be taken as fixed for the purpose of

assessing the covariation in other things. There is here a certain arbitrary or pragmatic element (in what may be taken as permanent and stable enough to qualify as a channel condition), but this element (it is argued) is precisely what we find when we put our cognitive concepts under the same analytical microscope. It is not an objection to regarding the latter as fundamentally information-dependent notions.
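The role of channel conditions can be illustrated numerically. The sketch below is my own, not the book's; the measuring instrument, the even prior, and the one-in-a-thousand spring-failure rate are invented for the example. Treating spring failure as excluded (probability 0) yields a conditional probability of 1, and the reading carries the information; admitting it as a genuine possibility drops the probability below 1, and, on the definition above, the information is no longer carried.

def prob_source_given_reading(p_failure):
    # Source: the measured quantity is "high" or "low", each with prior 0.5.
    # Channel: with an intact spring the pointer reads "high" exactly when
    # the quantity is high; if the spring fails, the reading is random.
    p_high = 0.5
    p_read_high_if_high = (1 - p_failure) * 1.0 + p_failure * 0.5
    p_read_high_if_low = p_failure * 0.5
    p_read_high = p_high * p_read_high_if_high + (1 - p_high) * p_read_high_if_low
    return p_high * p_read_high_if_high / p_read_high   # Bayes' rule

print(prob_source_given_reading(0.0))     # 1.0    (failure excluded as a channel condition)
print(prob_source_given_reading(0.001))   # 0.9995 (failure counted as a real possibility)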
Perception

Perception itself is often regarded as a cognitive activity: a form of recognizing, identifying, categorizing, distinguishing, and classifying the things around us (R. N. Haber 1969). But there is what philosophers (at least this philosopher) think of as an extensional and an intensional way of describing our perceptions (Dretske 1969). We see the duck (extensional: a concrete noun phrase occurs as object of the verb) and we recognize it (see it) as a duck see that it is a duck (intensional: typically taking a factive nominal as complement of the verb). Too many people (both philosophers and psychologists) tend to think about perception only in the latter form, and in so doing they systematically ignore one of the most salient aspects of our mental life: the experiences we have when we see, hear, and taste things. The experience in question, the sort of thing that occurs in you when you see a duck (without necessarily recognizing it as a duck), the internal state without which (though you may be looking at the duck) you don't see the duck, is a stage in the processing of sensory information in which information about the duck is coded in what I call analog form, in preparation for its selective utilization by the cognitive centers (where the belief that it is a duck may be generated). To describe what object you see is to describe what object you are getting information about; to describe what you recognize it as (see it to be) is to describe what information (about that object) you have succeeded in cognitively processing (e.g., that it is a duck). You can see a duck, get information about a duck, without getting, let alone cognitively processing, the information that it is a duck. Try looking at one in dim light at such a distance that you can barely see it. To confuse seeing a duck with recognizing it (either as a duck or as something else) is simply to confuse sentience with sapience. Our experience of the world is rich in information in a way that our consequent beliefs (if any) are not. A normal child of two can see as well as I can (probably better). The child's experience of the world is (I rashly conjecture) as rich and as variegated as that of the most knowledgeable adult. What is lacking is a capacity to exploit these experiences in the generation of reliable beliefs (knowledge) about what the child sees. I, my daughter, and my dog can all see the daisy. I see it as a daisy. My daughter sees it simply as a flower. And who knows about my dog? There are severe limits to our information-processing capabilities (Miller 1956), but most of these limitations affect our ability to cognitively process the information supplied in such profusion by our sensory systems (Rock 1975). More information gets in than we can manage to digest and get out (in some appropriate response). Glance around a crowded room, a library filled with books, or a

garden ablaze with flowers. How much do you see? Is all the information embodied in the sensory representation (experience) given a cognitive form? No. You saw 28 people in a single brief glance (the room was well lit, all were in easy view, and none was occluded by other objects or people). Do you believe you saw 28 people? No. You didn't count and you saw them so briefly that you can only guess. That there were 28 people in the room is a piece of information that was contained in the sensory representation without receiving the kind of cognitive transformation (what I call digitalization) associated with conceptualization (belief). This homely example illustrates what is more convincingly demonstrated by masking experiments with brief visual displays (Averbach & Coriell 1961; Neisser 1967; Sperling 1960). Although it is misleading to put it this way, our sensory experience encodes information in the way a photograph encodes information about the scene at which the camera is pointed. This is not to say that our sensory experience is pictorial (consists of sounds, sights, smells, etc.). I don't think there are daisy replicas inside the head, although I do think there is information about - and in this sense a representation of - daisies in the head. Nor do I mean to suggest (by the picture metaphor) that we are aware of (somehow perceive) these internal sensory representations. On the contrary, what we perceive (what we are aware of) are the things represented by these internal representations (not the representations themselves), the things about which they carry information (see section on "The Objects of Perception" in Chapter 6). I see a red apple in a white bowl surrounded by a variety of other objects. I recognize it as an apple. I come to believe that it is an apple. The belief has a content that we express with the words, "That is an apple." The content of this belief does not represent the apple as red, as large, or as lying next to an orange. I may have (other) beliefs about these matters, but the belief in question abstracts from the concreteness of the sensory representation (icon, sensory information store, experience) in order to represent it simply as an apple. However, these additional pieces of information are contained in the sensory experience of the apple. As Haber and Hershenson (1973) put it (in commenting on a specific experimental setup), "It appears as if all of the information in the retinal projection is available in the iconic storage, since the perceiver can extract whichever part is asked for." In passing from the sensory to the cognitive representation (from seeing the apple to realizing that it is an apple), there is a systematic stripping away of components of information (relating to size, color, orientation, surroundings), which makes the experience of the apple the phenomenally rich thing we know it to be, in order to feature one component of this information - the information that it is an apple. Digitalization (of, for example, the information that s is an apple) is a process whereby a piece of information is taken from a richer matrix of information in the sensory representation (where it is held in what I call "analog" form) and featured to the exclusion of all else. The difference between the analog and digital coding of information is illustrated by the way a picture of an apple (that carries the information that it is an apple) differs from a statement that it is an apple.
Both represent it as an apple, but the one embeds this information in an informationally richer representation. Essential to this process of digitalization (the essence of conceptualization) is the loss of this excess information. Digitalization is, of course, merely the information-theoretic version of stimulus generalization. Until information is deleted, nothing corresponding to recognition, classification, or identification has occurred. Nothing distinctively cognitive or conceptual has occurred. To design a pattern-recognition routine for a digital computer, for example, is to design a routine in which information inessential to s's being an instance of the letter A (information about its specific size, orientation, color) is systematically discarded (treated as noise) in the production of some single type of internal structure, which, in turn, will produce some identificatory output label (Uhr 1973). If all the computer could do was pass along the information it received, it could not be credited with recognizing anything at all. It would not be responding to the essential sameness of different inputs. It would be merely a sophisticated transducer. Learning, the acquisition of concepts, is a process whereby we acquire the ability to extract, in this way, information from the sensory representation. Until that happens, we can see but we do not believe.
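As a toy illustration of the digitalization just described (my sketch, not a routine from the book; the feature names are invented), a pattern-recognition routine can be thought of as a function that discards everything about a token except the one piece of information that defines the concept:

def digitalize(token):
    # Treat size, color, and orientation as noise; keep only the feature
    # that identifies the token as an instance of the letter in question.
    return token["letter"]

tokens = [
    {"letter": "A", "size": 12, "color": "red", "orientation": 5},
    {"letter": "A", "size": 48, "color": "blue", "orientation": -10},
]
# Many informationally rich (analog) inputs, one identificatory (digital) output:
print({digitalize(t) for t in tokens})    # {'A'}

A device that merely passed the full token along, losing nothing, would in the terms used above be a sophisticated transducer rather than a recognizer.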
Belief

The content of a belief, what we believe when we believe (think) that something is so, can be either true or false. If we think of beliefs as internal representations (as I do), then these representations must be capable of misrepresenting how things stand. This is one aspect of intentionality. Furthermore, if two sentences, S1 and S2, mean something different, then the belief we express with S1 is different from the belief we express with S2. Believing that a man is your brother is different from believing that he is my uncle (even if your brother is my uncle), because the sentences "He is your brother" and "He is my uncle" mean something different. A difference in meaning is sufficient, not necessary, for a difference in corresponding beliefs. The belief you express with the words "I am sick" is different from the belief I express with these words, despite the fact that the words mean the same thing. They have a different reference. This is a second aspect of intentionality. But beliefs not only have a content exhibiting these peculiar intentional characteristics; they also, in association with desires, purposes, and fears, help to determine behavior. They are, if we can trust our ordinary ways of thinking, intentional entities with a hand on the steering wheel (Armstrong 1973). It is the purpose of Part III to give a unified, information-theoretic account of these entities. The account is incomplete in a number of important ways, but the underlying purpose is to exhibit the way meanings (insofar as these are understood to be the conceptual contents of our internal states) are developed out of informational contents. We have already seen (Chapter 3) the way information-bearing structures have a content (the information they carry - e.g., that s is F) exhibiting traces of intentionality. But this is only what I call the first order of intentionality. If two properties are lawfully related in the right way,


then no signal can carry information about the one without carrying information about the other. No structure can have the (informational) content that s is F without having the (informational) content that s is G, if it turns out that nothing can be F without being G. This is the first respect in which the informational content of a structure fails to display the degree of intentionality of a belief (we can certainly believe that s is F without believing that s is G, despite the nomic connection between F and G). The second respect in which information-carrying structures are ill prepared to serve as beliefs, despite their possession of content, is that, as we have seen, nothing can carry the information that s is F, nothing can have this informational content, unless, in fact, s is F. But we can certainly believe that something is so without its being so. Without the details, the basic strategy in Part III is quite simple. Consider a map. What makes the symbols on a map say or mean one thing, not another? What makes a little patch of blue ink on a map mean that there is a body of water in a specific location (whether or not there actually is a body of water there)? It seems that it acquires this meaning, this content, by virtue of the information-carrying role that that symbol (in this case, a conventionally selected and used sign) plays in the production and use of maps. The symbol means this because that is the information it was designed to carry. In the case of maps, of course, the flow of information from map-maker to map-user is underwritten by the executive fidelity of the map-makers. A type of structure, in this case blue ink, means there is water there, even though particular instances of that (type of) structure may, through ignorance or inadvertence, fail to carry this information. Misrepresentation becomes possible, because instances (tokens) of a structure (type) that has been assigned (and in this sense has acquired) an information-carrying role may fail to perform in accordance with that role. The instances mean what they do by virtue of their being instances of a certain type, and the structure type gets its meaning from its (assigned) communicative function. Neural structures, of course, are not conventionally assigned an information-carrying role. They are not, in this sense, symbols. Nevertheless, they acquire such a role, I submit, during their development in learning (concept acquisition). In teaching a child what a bird is, for example, in giving the child this concept (so that the youngster can subsequently have beliefs to the effect that this is a bird, that is not), we expose the child to positive and negative instances of the concept in question (with some kind of appropriate feedback) in order to develop a sensitivity to the kind of information (that s is a bird) that defines the concept. When the child can successfully identify birds, distinguish them from other animals (how this actually happens is, as far as I am concerned, a miracle), we have created something in the child's head that responds, in some consistent way, to the information that something is a bird. When the learning is successful, we have given the pupil a new concept, a new capacity, to exploit in subsequent classificatory and identificatory activities.
If the child then sees an airplane and says "bird," this stimulus has triggered another token of a structure type that was developed to encode the information that the perceptual object was a bird (thereby representing it as a bird). We have a case of misrepresentation, a false belief.4 But we still have not captured the full intentionality of beliefs. In teaching our child the concept water, for instance, why say that the structure that develops to encode information about water is not, instead, a structure that was developed to encode information about the presence of oxygen atoms? After all, any incoming signal that carries the information that s is water carries (nested in it) the information that s has oxygen atoms in it (since there is a lawful regularity between something's being water and its having oxygen atoms in it). The answer to this question is, of course, that the child has not developed a sensitivity to the information that s has oxygen atoms in it just because the pupil has been taught to respond positively to signals all of which carry that information. This can easily be demonstrated by testing the child with samples that are not water but do have oxygen atoms in them (rust, air, etc.). The crucial fact is that, although every signal to which the child is taught to respond positively carries information about the presence of oxygen atoms, it is not the properties of the signal carrying this information to which the child has acquired a sensitivity. Recall, it is those properties of the signal that are causally responsible for the child's positive response that define what information he is responding to and, hence, what concept he has acquired when he has completed his training. These properties (if the training was reasonably successful) are those carrying the information that the substance is water (or some approximation thereto - as time goes by, the concept may be refined, its information-response characteristics modified, into something more nearly resembling our mature concept of water). Concept acquisition (of this elementary, ostensive sort) is essentially a process in which a system acquires the capacity to extract a piece of information from a variety of sensory representations in which it occurs. The child sees birds in a variety of colors, orientations, activities, and shapes. The sensory representations are infinitely variegated. To learn what a bird is is to learn to recode this analogically held information (that s is a bird) into a single form that can serve to determine a consistent, univocal response to these diverse stimuli. Until such structures have been developed, or unless we come into this world with them preformed (see the discussion of innate concepts in Chapter 9), nothing of cognitive significance has taken place.

NOTES

1. Though I am sympathetic to some of the (earlier) views of the late James Gibson (1950; 1966), and though some of my discourse on information (e.g., its availability in the proximal stimulus) is reminiscent of Gibson's language, this work was not intended as support for Gibson's views - certainly not the more extravagant claims (1979). If criticized for getting Gibson wrong, I will plead "no contest." I wasn't trying to get him right. If we disagree, so much the worse for one of us at least.
2. This is not so much a denial of Fodor's (1980) formality condition as it is an attempt to say which syntactical (formal) properties of the representations must figure in the computational processes if the resulting transformations are to mirror faithfully our ordinary ways of describing them in terms of their semantical relations.





3. I skip here a discussion of information's causally sustaining a belief. The idea is simply that one may already believe something when one receives the relevant supporting information. In this case, the belief is not caused or produced by the information. It nonetheless - after acquisition of the relevant information - qualifies as knowledge if it is, later, causally sustained by this information.

4. In my eagerness to emphasize the way conceptual content is determined by etiological factors (the information-response characteristics of the internal structures) and to contrast it with the (behavioristically inspired) functionalist account (where what you believe is largely determined by the kind of output it produces), I seriously misrepresented (in Chapter 8) Dennett's (1969) position. Dennett stresses, as I do, the importance of the way these internal structures mediate input and output. He does, however, trace their ultimate significance, meaning, or content to the kind of (appropriate) behavior they produce.

Open Peer Commentary


Commentaries submitted by the qualified professional readership of this journal will be considered for publication in a later issue as Continuing Commentary on this article. Integrative overviews and syntheses are especially encouraged.

Dretske on knowledge
William P. Alston
Department of Philosophy, Syracuse University, Syracuse, N.Y. 13210

K knows that s is F = K's belief that s is F is caused (or causally sustained) by the information that s is F. (p. 86)

First, as Dretske notes, although this formulation (given his account of information) ensures that what causes the belief is a reliable sign of the fact believed, it puts no reliability restrictions on the process by which the belief is formed (pp. 90-91). The cause of the belief couldn't exist without s's actually being F. But there is no analogous requirement that the belief couldn't (or wouldn't) have been formed if it hadn't been caused in this way, and hence there is no requirement that the belief is a reliable sign of its cause (and, by transitivity, of the fact that s is F). This leaves the account open to "accidentality" objections. Consider a case in which the sensory state that causes the belief carries the information that s is F. Nevertheless, K is a markedly undiscriminating and intemperate perceptual believer. He by no means confines himself to sensory presentations that carry the relevant information; whenever s looks even remotely F-like, then, provided he is predisposed to believe s to be F, he forms the belief. Thus, he frequently forms perceptual beliefs on an insufficient basis, and these beliefs are frequently false. In this case, the basis was sufficient, but it was just luck that he got it right this time. Hence, it isn't knowledge. When one just lucks into a true belief that p, one doesn't know that p. In this respect Goldman's version of a reliability theory of perceptual knowledge (1976), which requires that K's belief-forming mechanism be suitably discriminating, is superior to the present account.

Second, Dretske tells us that a belief will be said to be caused by the information that p, when what causes the belief is the feature of the signal by virtue of which it carries that information. But what if the signal will produce the belief only if the subject is aware of it; indeed, what if the belief is forthcoming only if the subject recognizes the signal to contain the information in question? This possibility is strongly suggested by the very example Dretske uses to introduce the thesis in question, namely, a pattern of knocks on the door conveying the information that the courier has arrived. (p. 87) Presumably, the recipient will not form the belief that the courier has arrived, unless he recognizes the pattern of knocks as the signal that had been agreed on. This would seem to be the rule with linguistic and other "meaningful" signals. There are also general theoretical reasons for supposing that the subject's awareness is often required. At one point Dretske notes:

Since the same information can be carried in a large variety of physically different signals, the possibility exists of there being no readily available causal explanation of the signal's effect in terms of its physical properties (the properties that carry the information). One may have to appeal to the information contained in the signal to explain its effects. (p. 248)

How is it that causes widely different in physical properties can all have the same cognitive effect? It is highly plausible to locate the connecting thread in an awareness by the subject of the informational content. The physically different causes have the same effect, because the subject "sees" in all of them the same information. Dretske might reply that, in such cases, the external signal first produces the belief that the signal carries the information that s is F, and then it is that first belief that produces the belief that s is F. But this would by no means obviate the difficulty. At each stage in the process at which K has knowledge, the belief must be produced by the information-carrying features of the signal, rather than (even in part) by K's awareness that the signal has those features or carries that information. To be sure, we can hardly claim that the signal N produces the belief that N carries information I by virtue of K's recognition that N carries I, if that recognition itself involves that belief; the belief cannot be causally responsible for its own existence. But what of the transition from the first belief to the second? If that is to eventuate in knowledge, the first belief must carry the information that s is F and must produce the second belief by virtue of its possession of features that carry that information. But it seems extremely plausible to hold that (sound) inference essentially involves the subject's recognition of the content and the truth of the premises; that is, it involves the subject's awareness of the "information carried" by the belief in those premises. Otherwise, the conclusion is not forthcoming. Once again, something other than the information-carrying features of the signal is required.

Dretske is specially concerned with perceptual knowledge, in which he takes the proximate belief-producing signal to be "sensory experience." This signal may have a "built-in" awareness of its own character and, in particular, of the feature by virtue of which it carries its information. Without that awareness, it wouldn't be sensory experience. But even if this is so, it doesn't follow that this awareness is any part of what enables the signal to carry the information. And if it isn't, then the problem remains.

Finally, let's consider the point, noted by Dretske, that there seems to be a circularity in the definition of knowledge. Since the information carried by a signal has been made relative to what the potential receiver already knows, there is a covert reference to knowledge in the concept of information that is used in the definition of knowledge. Dretske responds (pp. 86-87) that the definition is recursive. Basic knowledge involves information that does not depend on any prior knowledge on the part of the receiver. In that case, the signal itself (or features thereof) gives the probability of 1 to s's being F, apart from any additional information. Other cases can then be built upon this, but it requires much more discussion than Dretske gives it. Even if our standards of "relevance" can be justifiably set so that there are cases in which the basis of the belief suffices by itself to rule out all "relevant" alternatives to its truth, there is still the very large assumption that all other knowledge can be recursively based on these foundational cases. And it is certainly not obvious that this is so.

Knowledge is mutable
Michael A. Arbib
Department of Computer and Information Science, University of Massachusetts at Amherst, Amherst, Mass. 01003

According to Dretske, a signal r carries the information that s is F if P(s is F/r) = 1.

1. Yet how do we know that a probability is 1? Even in a Newtonian universe, there are errors in measurement, so that the observations which led, for example, to our knowledge of the existence of Uranus (Hoyt 1980) did not yield the trajectory as a probability 1 event, but rather as the most plausible, using least-squares methods to fit sparse and noisy data. Well, Dretske might respond, it wasn't knowledge - as shown by the later observation of anomalies that led to the discovery of Neptune. But does this force us to conclude that scientists never have knowledge, though in some cases they may approach knowledge?

Dretske's discussion of Kyburg's (1965) paradox does show that even a probability very, very close to 1 cannot, in general, be taken as providing knowledge. But in this example, one is appealing to prior knowledge of the way in which the lottery is run to provide an even higher probability that someone will be a winner. For a general theory of "mutable" knowledge, might not the fact that the latter probability is higher be more pertinent than the fact that the probability is computed to be 1?

Artificial intelligence is now looking at various schemes for making inferences from uncertain data - HEARSAY (Erman & Lesser 1980) and MYCIN (Shortliffe 1976) are two early examples - which may help philosophers build upon Dretske's work. I often reach a state of certainty based on limited data (and this is not just my personal failing!), only to change my mind when faced with new data or arguments. Many philosophers would say that I am then talking of belief rather than knowledge. I would invite Dretske, then, to reflect upon the philosophical implications of the work cited above as they might appear in a sequel to be entitled Belief and the flow of information.

Indeterminism, proximal stimuli, and perception


D. M. Armstrong
Department of Traditional and Modern Philosophy, University of Sydney, New South Wales, Australia 2006

I very much admire Dretske's Knowledge and the Flow of Information. I think that his naturalistic account of knowledge and belief is on the right track. What is more important, he has taken us farther down that track than any previous worker. I will comment on three particular points: (1) the effect of combining Dretske's views with a probabilistic theory of laws of nature; (2) Dretske's method of distinguishing the perceived distal stimulus from the unperceived proximal stimulus; (3) Dretske's distinction between perception and belief.

1. Dretske's analysis of the transmission of information demands a conditional probability of 1. Information, to be information, must be transmitted with complete reliability. Suppose, however, what may be true, that all the fundamental laws of nature are irreducibly probabilistic. It does not then seem possible to have completely reliable transmission of information. Suppose that our world is an irreducibly probabilistic world.

Would Dretske say that there is then no knowledge? This is a very skeptical-sounding thing to say. Yet, as Dretske shows, if he links knowledge merely with information transmitted with very high, but not complete, reliability, there is trouble. His xerox principle becomes false. Given enough xeroxing of xeroxes, the conditional probability that the original is faithfully reproduced in the latest xerox will fall to well below 1. So should Dretske say that, in an indeterministic world, there can be rational belief, but that knowledge is an unattainable ideal? I raise this difficulty for Dretske in no unfriendly spirit. It is equally a difficulty for my own "reliability" (pace, Dretske, not "regularity") account of knowledge (Armstrong 1973).

2. Next I want to call attention to the ingenuity and importance of Dretske's solution to a problem that plagues views such as his. When we hear the doorbell ringing, it is the doorbell that we hear ringing, and not the vibration of the membranes of our ears. It is not immediately clear why this should be so in Dretske's analysis. Should not the causal chain that ends in the brain and that carries information (pleonastically, reliable information) about the doorbell's ringing also carry information about the vibration? I have wrestled with this problem (1968, Chapter 11, Section V) and have proposed what I think Dretske would characterize as an "effect-oriented" solution. I said that the perception gave a capacity for selective behavior, which could be interpreted (with suitable recharacterizing of the purposes involved in the behavior) either as behavior directed toward the ringing, or toward the vibration. However, the doorbell's ringing is (in normal circumstances) the furthest point out in the causal chain to which the perception gives us a capacity to react in a selective way. That the ringing is the "furthest point out" I then linked conceptually with its being the event (immediately) perceived.

I was never particularly impressed by this solution. Now Dretske has suggested a truly ingenious alternative (pp. 162-68), based upon the operation of constancy mechanisms in perception. If we consider the causal chain from distal stimulus (the ringing) to the proximal stimulus (the vibration), then, in general, the proximal stimulus will not yield information in Dretske's sense - that is, reliable information - about the distal stimulus. This is because perception is not like getting a series of telephone messages along independent telephone lines about what is going on at various points in the environment, messages which are then merely conjoined to give the whole picture. Our sensory system is not sensitive to localized stimuli, but rather to global characteristics of the whole stimulus pattern, including information from other sense modalities. Without this, objects would not, for instance, look to retain the same size while projecting successively different shapes on the retina. Consequently, between the distal stimulus and the actual perceptions themselves there are no individual proximal stimuli bearing information about the distal stimulus. More generally, the perceptions will not reflect the nature of the individual proximal stimuli. Hence, we cannot react to the proximal stimuli. So they cannot be said to be perceived. This strikes me as an engaging, and rather plausible, solution to the problem of eliminating the proximal stimulus as an object of perception.

3. I am not particularly impressed by Dretske's sharp distinction between perception and belief. The elaborate nature of the mechanisms of perceptual constancy, an elaborate nature which we have just seen that Dretske effectively exploits, shows how sophisticated our perceptual system is. Dretske calls attention to the phenomenon of "restoration" (p. 147), that is, closures of boundaries, insertion of missing sounds, where the system behaves as if it is reasoning inductively beyond the limits of the presented stimuli. The influence of expectations and attitudes upon our perceptions is well known (and Dretske would have to grant that many such expectations are fully blown beliefs). Perception seems belieflike.

It seems clear at least that our perceptions are propositional in character, involving both referential elements and classifications and sortings, and that they may correspond or, in some cases, fail to correspond, to reality (may be veridical or nonveridical). They constitute information in a sense which is not Dretske's sense, because it is a sense which allows the notion of misinformation. It is true that our perceptions are much richer in such information than are our verbally expressed beliefs, and it is true that they involve far more information than we can ever give attention to. But it still seems that we could think of perceiving as the acquiring of beliefs, even if most of those beliefs fade almost immediately, without having been made the object of selective attention. Perceiving that or seeming to perceive that (seeing that, smelling that, etc.) certainly seems to entail the acquiring of beliefs, and perceiving that seems to be perceiving. It is true that we have idioms where we speak of perceiving things, processes, and so on, and that these idioms are much less obviously cognitive ones. Seeing a cat does not entail that one sees that there is a cat before one. Yet I take the use of such idioms to be that they range over the vast mass of information, and misinformation, acquired about the cat, but without attempting the impossible task of specifying that information.
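The xerox-principle worry raised in section 1 above is easy to check numerically. The following is a minimal sketch, not part of the commentary, which assumes purely for illustration that each copying step independently preserves the original with a fixed probability just short of 1.

```python
# Illustrative sketch only: how "very high, but not complete, reliability"
# erodes over a chain of copies, on the assumption that each copying step
# independently preserves the original with probability p < 1.

def fidelity_after_copies(p: float, n: int) -> float:
    """Probability that the original survives n independent copying steps."""
    return p ** n

if __name__ == "__main__":
    p = 0.999  # per-copy reliability: very high, but not 1
    for n in (1, 10, 100, 1000, 5000):
        print(f"{n:5d} copies: P(faithful) = {fidelity_after_copies(p, n):.3f}")
    # Even with p = 0.999, the probability of faithful reproduction falls to
    # roughly 0.37 after 1,000 copies and below 0.01 after 5,000 - well below 1.
```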


Information and semantics


Jon Barwise
Departments of Mathematics, Computer Sciences, and Linguistics, University of Wisconsin-Madison, Madison, Wise. 53706

Dretske lays his cards on the table with his first 10 words. "In the beginning there was information. The word came later." (p. vii) The basic notion in Dretske's book is that of one situation carrying information about another. Information in this sense is out there in the world, not in the head, and it is not dependent on there being a representation of it - linguistic or otherwise. It is out there because lawlike constraints obtaining between situations enable one situation to carry information about another. Dretske discusses situations and signals more or less interchangeably. For reasons connected with my own work with John Perry on situation semantics, I use "situation." Dretske shows that the notion of one situation carrying information about another has a central place in cognitive science. He says little that is specific to the way that linguistic situations carry information, but his insights into the basic notion are extremely important for the study of natural language. On the other hand, certain features of the linguistic transfer of information point to inadequacies in his definition of the basic notion. Let me lay my own cards on the table. The word may have come later, but it is the word that interests me - how it is that linguistic situations carry information? I take it to be an important task to give an account of the information-carrying capacity of language, and that this should be a part of model theory. It is a startling fact that two mathematical disciplines that have to do with language, information theory and model theory, pass like strangers in the night.' As Dretske stresses, traditional information theory is not a semantic theory at all; it deals not with what information is in a given situation, but how much. Model theory, for its part, is so preoccupied with the language of mathematics that it thoughtlessly makes assumptions that render it inadequate for the study of information. Model theory is concerned with the relation between expressions of language and the world. Since information is independent of the language in which it is represented, we need model-theoretic structures that represent how the world is independent of language. But the starting point of the standard

theory is that structures are defined only relative to a specific language. It is as though God first created language and then the world, not the other way around. Another problem is what Dretske calls the "intentionality" of information: Its inability to distinguish between coextensional properties is a notorious difficulty with first-order logic. Both of these problems can be solved by admitting that the properties things have are prior to representations of them. Then a structure can be defined in terms of what things have what properties, independent of what words happen to designate those properties. Further problems arise from the fact that information is not complete but partial, and it is not closed under logical consequence. But structures are total representations of how the world is, and logical consequence is defined in terms of total structures. To mesh with a theory of information, model theory must pay attention to the partial structures needed to represent situations.2

Dretske stresses the distinction between information and meaning. The doctor's asserting "You have the flu" is a typical case of one situation (an utterance) containing information about another (your state of health), but the information is not the meaning of the sentence, "You have the flu." The latter is fixed by the conventions of English, whereas the former depends on whom someone is talking to and when. It is because the sentence has the meaning it does that it can be used in specific situations to convey information. Model theory ignores this and treats sentences as having a truth value independent of the situations in which they are used. This, too, must go if the relation between linguistic meaning and information is to be captured.

Let me turn to two worries about Dretske's basic definition of information. First, Dretske emphasizes that it is only due to nomic relations between types of situations that one can carry information about another. But when he defines the basic notion, he relativizes only to what the receiver knows about alternative possibilities. More important is what the receiver knows about the nomic relations. While information is out there, it informs only those attuned to the relations that allow its flow. This is true in general, but obvious where the relations are conventional, as with language. If you don't understand English, the doctor's uttering "You have the flu" will not inform you. Dretske gives a recursive account of knowledge of contingent fact - but only in terms of knowledge of the nomic relations that allow information flow.

The second worry is that the probabilistic account misses the point when it comes to linguistic events. The issue is not whether the probability of your having the flu is 1, but whether the doctor has the information and intends to convey it to you in saying "You have the flu." Or suppose that Fermat's last theorem is false. On the probabilistic account, this information is carried by every situation. Yet surely there is a transfer of information in my telling you an explicit counterexample that is not in a situation like a sunset.

Dretske has moved information to center stage where it belongs, for it puts important constraints on cognitive theories. But we must understand the various types of nomic relations that allow information flow, and how one comes to be attuned to such relations, before we will really understand what it means to be a cognitive agent.

NOTES
1. I take as prototypical the model theory of first-order logic, one of the success stories of mathematical logic. The criticisms below are directed toward this. Some will consider it a straw man, since there are other theories designed specifically to circumvent some of these problems, but I don't know any that circumvents all of them.
2. This is a point I have made at length (Barwise 1981). It applies twice-over to possible-world semantics, in which the assumption of total information is built in at every possible world, but also in the use of total functions of finite type.


Determining what is perceived

Radu J. Bogdan
Department of Philosophy, Tulane University, New Orleans, La. 70118

Determining what talk of what-is-perceived (W) amounts to is like computing, metatheoretically, a function of several variables. Three are essential. They characterize principled perspectives on perception qua physical interaction, processing competence, and representational performance. To say in this sense that a system S perceives X is to say that (1) S interacts with X, (2) S has the competence to process the input from X in specific ways, and (3) S, as a matter of performance, (a) is in a certain aggregate internal state (undergoes a perceptual experience), (b) forms a phenomenologically accessible percept of X, and (c) acquires new information about X. I will call (1) the interaction function, (2) the competence function and (3) the performance function. Determining W depends on which of these perspectives is being considered and hence which function is being computed; each brings its own facts and regularities and hence its own descriptions and explanations to bear on the identification and analysis of W. For example, when (1) alone is considered, the value of the interaction function is X. This tells us something like the following: Blackboxing competence and performance, if perception is viewed (merely) as physical interaction, then what S perceives is (thought of as) a thing or event X. Informationally, the value of the interaction function is the "nuclear information" carried by a signal.

This is the part of the story of perception that Dretske tells best. Blackboxed at the receiving end, S is not yet in the picture. This is why the perspective of the interaction function is external to S, its language extensional, its vocabulary naturalist, its posited regularities physical. This is also why there is no question of veridical perception or true/false information (pp. 45-46): the values of the interaction function are semantically inert for the simple reason that X is not yet represented by S, and the nuclear information about it is not yet information-for-S. It takes a change of perspective, hence a new valuation of W, to bring S in.

So consider the competence function. The problem now is to determine what-is-perceivable. We are not asking, as before, what is S interacting with? but rather (assuming or blackboxing that), what can S represent and how? and, what can be information for S and in what form? Since competence can hardly be isolated as such (eye doctors and neurophysiologists can judge and study it when manifested in some form), let us assume that data on fixed, rigid performance count as data on competence.

At this juncture I find Dretske hesitating. On the one hand, he seems to hold that the value of the competence function is X, if no cognitive digitalization is involved, on the grounds that analog representation can and does deliver nuclear information on X, and the competence function can be determined in analog terms. The reasoning seems to be this: Whatever delivers nuclear information does so lawfully and hence is a part of a rigid, law-abiding communication setup, indeed part of physical nature; it is semantically inert (i.e. there is nothing to be right or wrong about), can justifiably be assimilated to the previous, interactional perspective, and can thus be regarded as a further link in S's overall physical interaction with X. In short, when exercised in an analog, lawful, and precognitive way, sensory competence is not essentially (functionally?) different from the competence that, say, light has to carry nuclear information.

But things are not so simple. For, on the one hand, Dretske defines the "perceptual object" as what is representable in a "primary way" (pp. 160-162). Two things are interesting about the definition. One is that what is primarily representable is something-having-a-property, that is, a fact. So primary representation can be analog and propositional. I will take this up shortly. The other is that, with Gibson (1979), Dretske distinguishes between "visual world" and "visual field": one is populated by X's (chairs, etc.) that adults perceive, the other is what we have to posit to explain perceptual reduction as well as, possibly, what children and animals perceive. The difference appears to be one of computational competence (pp. 165-167). But this does not make sense unless, from a competence perspective, (i) what-is-perceivable is always a specific visual field and (ii) it is conceptually true that, as an inevitably privileged frame of reference, the mature, human visual field is thought of as (best delivering) the visual world. So the value of the competence function cannot be X but rather its visual field correlate.

There is a parallel story for information. What counts as information-for-S is (among other things) relative to objective possibilities at the source and what S knows about them (pp. 65, 80-81, 112ff.). "Knows" here is ambiguous. It may mean specific data S has, as a matter of incremental performance, about which alternatives are likely or realized, and so forth. This is what Dretske seems to have in mind. But, antecedently, it may also mean that S's competence is such that S can (is wired to) set up the alternatives in a given format. Call this the "discrimination space of alternatives" (DSA). A DSA is the informational counterpart of the visual field. If two systems have different DSAs, then the same signal provides different information for those systems and thus different values for what-is-perceivable by them. This result is compatible with the fact that visual fields and DSAs are framed in analog, lawful, and precognitive ways.

This brings me to the earlier remark that analog representation can be fact-capturing, hence propositional. To see this, it is essential to counter two dominant and related views to which Dretske appears to subscribe, occasionally with qualifications (p. 256, n. 11). The first links the fact-capturing ability of a representation, hence its propositionality, to a symbol-manipulating, description-generating, and rule-governed system: in short, a language. This need not be so. I take it as a basic intuition that whatever extracts information from an input captures facts and represents them propositionally. "Facts" and "propositions," of course, name constructs of ours that are intended to account for what a representational system does. We may have empirical or terminological difficulties specifying real instances of a system's propositional representation of facts. But this is no argument against looking at things the way I suggest. The second view, which typically legitimizes the first, is that only cognitive digitalization is a selective and abstractive analysis of input. It is of course so, relative to a prior analog input. But the latter, too, is the outcome of an equally selective and abstractive analysis of the external input. A peripheral receptor, sensitive to specific aspects of that input, will fragment and analyze it according to its program. Such receptors are built to frame their respective DSAs (for, say, light intensity, texture, etc.). Together, these DSAs help in articulating a visual field. Again, it seems to me that if a receptor does all this, then it can be said to represent particular facts about the input in a propositional format (e.g., the value of light intensity = so and so) which has semantic value. No higher mental functions and no digitalization are needed to explain the phenomenon. If this is so, then from a competence or design view of what a visual system can (is built to) represent, the W value of the competence function cannot be X. If it is, we are computing the wrong function.

A final remark. Dretske notoriously holds that "we can see, hear and smell objects and events (be aware or conscious of them) without necessarily categorizing them in any way" (p. 258, n. 28). The claim is ambiguous. If it refers to the object of perception as the object of physical interaction, then it is trivially true. If it says that S can have a percept of X without having specific, X-sensitive concepts and beliefs, then it is again true. But if it implies that no concepts and beliefs whatsoever, no categorization at all, are required for S to have that percept, then it is probably false. As Dretske notes (p. 144), a child might not perceive the flower as a daffodil. Yet she must (as a matter of necessity, I think) perceive it as something or other (flower, colored thing, whatever). It is hard to imagine how this can fail to be true. Evolutionary arguments will tell us that lack of categorization is inefficient and thus detrimental to behavior and survival. It is also plausible to imagine that perception requires memory. This would mean that percept formation involves an assimilation to familiar representations, for example, some matching process along some dimension. This is categorization. It may be unconscious, automatic, and propositional. What percepts are about is a matter of interaction; what they represent, a matter of competence and performance. Determining what-is-perceived is choosing how to talk and what to say about perception.

Content: Semantic and information-theoretic


Paul M. Churchland and Patricia S. Churchland
Institute for Advanced Study, Princeton, N.J. 08540

Dretske's clear and engaging book invites comment on many of its themes, but we will here address only three: (1) the conceptual exploitation of sensory information, (2) the identity of concepts and the propositional content of cognitive structures, and (3) the proper role of information-theoretic notions in epistemology. Despite our sympathy with Dretske's naturalistic motivations, we think he is importantly mistaken on all three matters.

On perception, we agree with the initial picture. As Dretske represents them, our peripheral or low-level sensory states carry teeming masses of information, most of which is quite properly lost in the essentially selective activity of subsequent processing, processing which is to some degree plastic and controllable with regard to just what information it selects. (This leaves open the possibility that learning can produce different and much more penetrating forms of perception.) We also accept Dretske's thoughtful account of what (natural) information is contained in any given state. So far, so good.

But, in our view, there is a deep and essential difference between the natural information that ends up contained in a high-level cognitive state, and the semantical/intensional/propositional content of that state as it figures in the creature's own representational system. Elsewhere (P. M. Churchland 1979; 1982; P. S. Churchland 1983) we have called the former (information-theoretic) kind of content "calibrational content," since it reflects a creature's natural status as an effective instrument of measurement or detection concerning the status of the objective world. And we have called the latter (semantic) kind of content "translational content," since it reflects what one seeks to represent in faithful translation, namely, how the other creature conceives of (what it thinks or believes about) the objects of his perception. We will here call them "I-theoretic content" and "semantic content," respectively.

These two kinds of content can, and often do, diverge radically. A primitive human being may have a state whose semantic content is "The gods are shouting," but whose I-theoretic content is "There was a large-scale atmospheric electrical discharge nearby." Or he may have a state whose semantic content is "There are windows, in the crystal sphere, onto the realm of heaven," but whose I-theoretic content is "There are gravitationally ignited nuclear fires at points distant from earth"; and so forth. Which is to say, humans, and other creatures, can systematically misconceive what they perceive. This divergence is important, because (assuming the rough integrity of folk psychology) it is the semantic content of our internal states that is critical for our cognitive processing: for the evolution of belief and for the production of behavior. Most regrettably, their I-theoretic content is causally inert (see Fodor 1980, this journal).

For some of us, there is no essential connection between these two kinds of content, and it is decidedly problematic how

the semantic contents of our states manage to mimic their actual I-theoretic contents well enough for us to survive and flourish in this world. The answer, some of us suppose, lies in an evolutionary approach to cognition: in the physical evolution of species and in the cognitive evolution of individuals. Collectively, the elements of one's cognitive economy become progressively better "tuned" to the environment, qua elements in a control system for behavior. For Dretske's account, on the other hand, it is decidedly problematic that there is any divergence at all in the information-theoretic and the semantic content of our cognitive states. He would like just to identify them and have done with it; but since false belief and misconceptions are obviously possible, he must concede some divergence. His account of the divergence, however, still leaves semantic content closely dependent on I-theoretic content. For our earliest beliefs involving "primitive" concepts, Dretske does identify the two kinds of content outright; that means that our earliest concepts cannot help but enjoy extension, and our earliest beliefs cannot help but be true. And though cognitive structures or concepts may subsequently be misapplied, so that their semantic and I-theoretic contents then diverge, the structure's semantic content is said to be identical with the original I-theoretic content of the structure-type's earliest instances. This account of the semantic content of our cognitive states will not stand scrutiny. It could be hounded to death by counterexamples, its defense requires a contrast between "primitive" and "complex" concepts that is without theoretical integrity, and it is already superseded by a better account: that which ascribes semantic content to a state (type) according to the conceptual/computational/inferential role that it plays in the subject's cognitive economy (see P. M. Churchland 1979; Field 1977; Harman 1975; Stich 1982). This account of semantic content appeals to a dimension of cognition that receives surprisingly little attention in Dretske's book: the dynamical dimension of cognition. The rich patterns displayed in a creature's cognitive dynamics provide a criterion for ascribing content to its role-playing elements: A state or structure has the semantic content that-P just in case it plays a certain inferential role in his cognitive economy: the same role that my belief-that-P plays in my cognitive economy. This criterion provides a means of ascribing semantic content that does not beg any questions about the integrity of any of the target creature's concepts or about the truth of any of his beliefs. And this is as it should be. The perfect coincidence of our semantic and I-theoretic contents is the infinitely distant epistemic goal toward which we strive, not the immediate and inevitable outcome of merely being a cognitive creature in the first place. We conclude that Dretske's information-theoretic strategy cannot provide an adequate account of the semantic content of a perceptual belief. And this central failure has company. Dretske's account of what makes a structure a belief (complete digitalization of contained information) seems wrong to us: It is its peculiar dynamic role that matters. Neither does his account of intentionality seem to carve nature at a joint. His account of knowledge is much more compelling, but here the appeal seems to derive mostly from the fact that it is a causal account, not from anything essentially information-theoretic. 
Since the conditional probabilities required for his account are all equal to unity, simple conditionals will do the trick. The apparatus of probability theory is not necessary. This is not to say that information theory has no significant role to play in an account of cognition. We think it has. But it seems more promising as a tool to help us understand the operation of certain neural structures, or the evolutionary selection of certain processing strategies, or the efficiency limits on certain others, and so on: in short, as a tool for understanding the physical brain. It need not be an indictment that information theory fails to explicate the suspect distinctions and categories of folk psychology. The special theory of relativity fails to explicate the relative sizes and oscillations of the crystal spheres that turn in the heavens.
Information and cognitive agents

Robert Cummins
Department of Philosophy, University of Illinois at Chicago Circle, Chicago, Ill. 60680

A television camera is coupled with a device that punches cards when the camera is turned on (and, perhaps, moved about). The cards produced are called percept-cards. In addition to the perceptual system, there is a conceptual system, consisting of a deck of prepunched cards (concept-cards) and a sorter. When a percept-card is produced, the sorter finds every concept-card having only holes duplicated in the percept-card. (Put the concept-card in front, and the percept-card behind, and make sure you can see through every hole in the concept-card.) We teach this system - call it Locke - English words by programming a third component, a card reader cum voice synthesizer, to utter the English word expressing the concept corresponding to the concept-card inserted into it. For instance, there could be a concept-card that is selected when and only when the television camera is pointed at something yellow. If so, we program component three, the speech system, to token the word yellow when that concept-card is inserted.

All of Locke's concepts are innate, though none of its beliefs is (if it has any). Locke's famous namesake thought concepts could be acquired by abstraction, that is, that concept-cards could be made by putting together bunches of percept-cards and making a card that has only the common holes - but we now know that this will not make very useful concept-cards. And since we don't know what will work, I've just left concept acquisition aside. This won't matter in what follows, however, for the semantic content of Locke's concept-cards, or rather, the matching events in which they are featured, is readily determined by the information to which they are selectively sensitive. I take it that Locke does have concepts by Dretske's standards, since the concept-cards are complete digitalizations of information available to Locke in analog form, and since they have a striking, if crude, executive function.

Dretske says concepts imply belief (p. 214), but I don't think this can be right because, given what I've said so far, there is no provision for false contents, and no reason to suppose Locke exhibits intentionality of the second order (pp. 172-73), yet Locke seems to satisfy the spirit of Dretske's account of concept-possession. Not to quibble, let's credit Locke with would-be beliefs - things that would be beliefs in a ritzier neighborhood - and turn to the question of how we might upgrade the systematic neighborhood Locke provides.

At this level of hand waving, the third order of intentionality, and hence the second as well, is easy: We need only suppose that the concept-card instantiating the concept yellow is such that the event of matching a percept-card to the yellow-card does not have as part of its content that the object of perception is colored, but does have as part of its content that the object of perception is yellow (read this de re). Dretske explains how this could happen in Chapter 7, and I have only minor quibbles with his account. Falsehood, though, is a bit more difficult. Call the object of Locke's current perception o, and the (or one of the) percept-cards produced by encounter with o, p(o). Finally, call the event that is a matching of p(o) with the yellow-card m[Y, p(o)], which has as its semantic content o is yellow. If m[Y, p(o)] is to be the false belief that o is yellow, then we must assume that o is not yellow. How, then, did m[Y, p(o)] occur? There are several possibilities: (1) the sorter malfunctioned; (2) the card punch malfunctioned; (3) the television camera malfunctioned; (4) the concept instantiated in the yellow-card (or, rather, in the fact that yellow-card matches all and only cards punched when . . . , etc.) is not the concept yellow, but, perhaps, the concept looks yellow. Possibility (4) is beside the point. Any of possibilities (1) through (3) could do the job, provided they are compatible with the concept in question being the concept of yellow. The restriction is fairly severe. This cannot, for example, be like a normal case of illusion, for example, chromatic adaptation; for cases of that sort are cases in which the system functions normally, as expected and according to law, and hence are cases that show that merely looking yellow without being yellow was always a relevant possibility, hence cases in which there is positive equivocation, hence the sort of case envisioned in (4).

I don't think all cases of false belief - even all simple de re cases - can be due to malfunction, but I want to let that pass to focus on something else. Locke now has beliefs, according to Dretske's analysis. Yet something essential is surely lacking. I don't wish to disparage Locke - Locke is a marvelous system far beyond anything we can build today, mainly because of the first stage of digitalization I have assumed in the perceptual system. But Locke has no understanding of its own representations, and this is why, despite its great virtues, it seems a mere machine and not a cognitive agent or knowing subject. Locke's representations are understood by us: We have to understand them to understand how Locke works. So Locke's representations are representations for us, but they are not representations for Locke. They mean nothing to Locke: Though they have meanings, given their informational origins, and they would have these meanings, I am willing to concede, whether they meant anything to us or not, they have no meaning for Locke. What is lacking is what many philosophers, Searle (1980), for example, think of as intentionality, but a kind of intentionality that does not derive from, or correspond to, any kind of intensionality.

I think we need two words here, one for the sort of thing typically explained by reference to intensional idiom (Dretske's three orders) and one for the sort of thing Locke lacks, the sort of meaningful state that is understood by the system that has it (what Searle was concerned about). I propose to call states of the first sort contentful states, reserving "intentional states" for the second sort. In this sense, Locke is not an intentional system, though Locke does, I suppose, have contentful states of the third order, and false ones as well. Yet these states are still would-be beliefs, at least by my lights, for, while they have meanings and truth values and the right sort of role in directing behavior (they ought to be available as premises in inference, too), a sentence in a recipe has all these features (in a slightly enriched neighborhood). Content is only half the story we have to tell to describe a cognitive state or structure. The other half of the story must provide an account of what it is for a system to understand its own representations: Intentional states of S = contentful states of S that S understands. It is this part of the story that makes the difference between, say, merely doing a truth-table, and checking the validity of an inference form. When I do a truth-table, I crunch symbols, but I also know what my symbol crunching means - I not only process information, I know what information I'm processing. Most computers and many logic students do not.

So the question becomes, given Dretske's promotion of information-flow into contentful systems, how do we go on to promote contentful systems into intentional systems? I've said that an intentional system is a contentful system that understands its own contentful states, that is, a contentful system that knows what its representations mean. Dretske has an analysis of knowledge, so the natural move is a kind of bootstrapping: Apply the analysis to the problem of knowing what representations mean.

Sigma is an intentional state of S with content m, if and only if sigma has content m, and S knows that the content of sigma is m.

This obviously won't work as it stands, for it is blatantly circular, the capacity for knowledge presupposing intentionality, but perhaps something like this would work:

Sigma is an intentional state of S with content m, if and only if sigma has content m, and S would-be knows (i.e., Dretske-knows) that the content of sigma is m.

I don't suppose this would make for consciousness, but it might make for something that is plainly lacking in Locke, and is just as plainly essential to cognition.
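Because Cummins describes Locke's sorter procedurally, a small sketch may help fix ideas. The following is an illustration only, with hypothetical hole positions and card names; it implements just the matching rule stated above: a concept-card is selected exactly when every one of its holes is also punched in the percept-card.

```python
# Illustrative sketch of Cummins's "Locke" (hole positions and words are invented).
# A percept-card is the set of holes punched by the camera/punch device; a
# concept-card is a prepunched set of holes paired with the English word that
# the speech component is programmed to utter when that card is selected.

ConceptCard = tuple[str, frozenset[int]]  # (word to utter, holes)

CONCEPT_DECK: list[ConceptCard] = [
    ("yellow", frozenset({1, 4, 7})),    # illustrative holes only
    ("colored", frozenset({1})),         # a less specific concept-card
    ("square", frozenset({2, 5})),
]

def sorter(percept_card: frozenset[int]) -> list[ConceptCard]:
    """Select every concept-card whose holes are all duplicated in the percept-card
    (hold the concept-card in front: you must see through every one of its holes)."""
    return [card for card in CONCEPT_DECK if card[1] <= percept_card]

def speech_system(selected: list[ConceptCard]) -> list[str]:
    """Token the English word programmed for each selected concept-card."""
    return [word for word, _ in selected]

if __name__ == "__main__":
    percept = frozenset({1, 3, 4, 7, 9})   # punched, say, when the camera views something yellow
    print(speech_system(sorter(percept)))  # ['yellow', 'colored']
```

On this rendering, the matching event m[Y, p(o)] is simply the selection of the yellow-card by a particular percept-card, which is why its content is fixed by what the card is selectively sensitive to rather than by anything Locke understands.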

Four difficulties with Dretske's theory of knowledge

Carl Ginet
Sage School of Philosophy, Cornell University, Ithaca, N.Y. 14853

1. Gettier counterexamples. Dretske claims that his account of perceptual knowledge (p. 96) avoids Gettier's sort of counterexample (Gettier 1963) to the thesis that justified true belief is knowledge. One can take "Gettier counterexample" narrowly, so that any account of knowledge that fails to require the belief to be justified ipso facto avoids such counterexamples. But if we count as "Gettier" any counterexample to an account of knowledge in which a person with a true belief satisfies whatever further condition the account puts on knowledge, but still fails to know (because of peculiarities in the way the condition is satisfied), then there are Gettier counterexamples to Dretske's account.

Machine M has a red indicator light. It comes on when M gets too hot. K, having good reason to think H would know, asked H the meaning of the light. H in fact knew nothing about it but brazenly told K what, by sheer luck, happened to be the truth about it. Seeing the light on, K believes that M is too hot. This belief is caused by that information (carried by the light) and satisfies Dretske's sufficient condition for knowledge. But it seems clear that in the circumstances this belief is not knowledge.

There is a peculiar disorder in K's optical system of which K is unaware. Most of the time, but not always, it translates the events in K's retinas, carrying the information that an object K sees is chartreuse, into the visual experience of the object's looking magenta to K. K looks at a chartreuse object in good light when this disorder happens to be inoperative. K is caused, by the retinal event carrying the information that the object is chartreuse, to believe that the object is chartreuse. But the fact that the same information could so easily have caused a false belief about the color, with no clue to K that things were amiss, makes it impossible to count the true belief as knowledge.

Given what Dretske says about his boiler-gauge example (pp. 124-25), he may be prepared to say that K does, too, have knowledge in these cases. Would Dretske insist that the boiler attendant knew on the basis of the gauge that the pressure was normal even if he believed this despite knowing that the designer of the gauge now thinks it quite unreliable and despite having no reason to suspect (the truth) that the designer is wrong?

2. Perceptual objects. According to Dretske, the objects I see are those whose properties my visual experience represents in a primary way (p. 162). If my visual experience represents (carries the information) that M is overheating but does this by means of representing that the red light is on, then it fails to represent the machine's overheating in a primary way (p. 160). How, on this account, can any distal object be one that I see? The information that the light is on is carried to my visual experience through a chain of events: It gets this information by representing the intermediate events that represent that the light is on. So my visual experience does not represent the light's being on in a primary way. It would seem that on Dretske's account we see only events in our eyes or perhaps only events farther back in our neural optical system.

Dretske's attempt to deal with this difficulty (pp. 162ff.) seems inadequate. He cites constancy phenomena. As I watch my friend walk away, my retinal images of her shrink, but it does not look to me as if she shrinks. From this Dretske infers that my visual experience does not represent the shrinking of the retinal images. (This does not follow: The shrinking of the retinal images may be represented by its looking to me as if my friend recedes.) From this conclusion he infers further that my visual experience does not represent (in any way) the retinal events that bring me information about external objects. Non sequitur. Suppose there are alternative ways my retinas can carry the information that the light is on - R1 or R2 - and they are such as to make no difference to the visual experience that represents the light's being on. So my visual experience fails to represent the fact that my retinas are, say, R1. It does not follow that there is no property of my retinas that both represents the light's changing and is represented by my visual experience. There is the property R1-or-R2. There must be some such property if my visual experience is to receive the information about the distal object via my retinas. We cannot rule out primary representation of disjunctive properties, for any property can be specified as a disjunction of (usually more specific) properties.

3. The requirement that the probability be 1. For S2 to carry the information that S1, Dretske requires that the conditional probability of S1 relative to S2 be 1. It will be < 1, if the conditional probability that some alternative to S1 caused S2 is > 0. Is not the latter almost always the case? No, says Dretske. We may regard the existence of the channel conditions that ensure that S2 is caused by S1 rather than some alternative as having a probability of 1; we may regard the conceivable alternatives as not genuine possibilities. Our entitlement to do this is a matter of degree (pp. 129, 133). Degree of what? I am not sure, but it appears (from pp. 130-31) that it is the conditional probability (determined by frequency of occurrence in the long run) that the possibility will occur, given such factors as when and what checking has been done on the channel. Presumably, to make the possibility nongenuine, this probability must be very small, but it is suggested that what is small enough for some uses of the channel may not be small enough for others (pp. 132-33).

If this is how Dretske is to be understood - and how else? - then an accumulating-probability argument [like one Dretske himself gives for the requirement now under discussion (pp. 59, 103-4)] shows Dretske's account inconsistent. Suppose a long chain of signals: S1 causes . . . causes Sn. For each i > 1, there is a certain very small probability that Si was caused by some alternative to Si-1, small enough that we can regard this possibility as nongenuine and take the probability of Si-1 relative to Si to be 1. From this it follows that we may take the probability of S1 relative to Sn to be 1. But suppose that n is large enough that the cumulative probability that somewhere in the chain one of the alternative possibilities occurs is too large to permit us to regard this possibility as nongenuine. From this it follows, on Dretske's account, that we must take the probability of S1 relative to Sn to be < 1.

4. Belief contents. According to Dretske (pp. 208-9), what makes it the case that the belief of the newt with the rotated eye is that the bait is in back is not that this belief causes the newt to behave in a way appropriate to the bait's being in back. It is rather that "signals bearing the information that the bait is in front . . . are now . . . being coded in a structure designed to carry the information that the bait is in back." (p. 209) But suppose a smarter (or just luckier) newt whose behavior changes to what is appropriate to the bait's being in front, though the light signals are still being coded in a retinal structure that was developed (or prewired) to carry the information that the bait is in back. Surely the newt's belief, though caused by the same retinal event, has now changed its content. It would be absurd to suppose that it still has the same content, but now somehow causes behavior inappropriate to itself. The contents of perceptual beliefs must be in part controlled by their sensory causes - else they would not be perceptual beliefs - but why cannot these contents be some function of both sensory causes and behavioral effects?

Physical probability, surprise, and certainty


I. J. Good
Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Va. 24061

Here come I, my name is Jowett,
If it's knowledge, then I know it.
I am the Master of this College,
What I know not is not knowledge.
Traditional Oxonian verse

The author's main thesis, if I am not mistaken, is that it should be possible to understand, without assuming vitalism, how a person can have beliefs. He has thus tried to answer a question that the present reviewer (1959) once asked in the title of an article, "Could a machine make probability judgements?" Because my emphasis was on the "subjective" ("personal") probabilities of the machine, it might seem that our views are opposed, for Dretske's discussion is in terms of physical (objective) probabilities. This opposition is more apparent than real because, of course, I agree that the "beliefs" of an information-processing machine can only be interpreted in terms of its propensities to act in various ways, including such actions as the typing out of probability estimates, and there are also propensities in its internal working, if it is not deterministic. These are, I suppose, physical probabilities. But if it is a deterministic machine, the probabilities might be called pseudoprobabilities, by analogy with pseudorandom numbers. Yet even if you believe you are an information-processing machine, you should be well advised to make use of a theory of subjective probability for your own decision making. It might be theoretically possible to describe another person in terms of physical probabilities, but for yourself you must also make use of subjective probabilities and value judgements (utilities). I mention this to show that, in spite of the emphasis on subjective probability in my writings, this would not by itself rule out the possibility of philosophical agreement with Dretske's thesis.

The first of the three parts of his book deals with information and its measurement. Here I believe the discussion is liable to be misleading from the points of view of history and terminology. For example, the expression log(1/p) - where p is the probability of a proposition - as a measure of the amount of information in the proposition, occurs at least as early as 1950 (Good 1950, p. 75), whereas Dretske refers to Attneave (1959), who unfortunately called the expression a "surprisal." It was unfortunate for reasons explained below. In my note (Good 1966), I used the notation I(H:E|G) for the amount of information concerning H provided by E given G. After assuming reasonable desiderata or axioms, I found that I(H:E|G) should be defined as a continuous monotonic function of P(E|H and G)/P(E|G), the logarithm being convenient if nice additive properties are desired. I mentioned there that, "The analysis applies whether the probabilities are physical, logical, or subjective, in which case we would be talking about physical information, logical information, and subjective information respectively" (p. 578). So, as far as the formulae are concerned, the kind of probability appears to be irrelevant.

Consider now the topic of surprise. Suppose that a die is thrown and it gives a 5. There is nothing surprising about this unless one had mentioned 5 in advance. Otherwise, to measure the surprise of obtaining 5, its probability, 1/6, must be compared with some kind of average of all six probabilities, 1/6, 1/6, . . . , 1/6. Every kind of average of these six probabilities is itself 1/6. Hence, there is no surprise in seeing a 5. The first paper known to me that offered a measure of surprise was by Warren Weaver (1948). He suggested dividing the expected probability of all events that might have occurred by the probability of the event that did occur. Thus, if the set of probabilities is p_1, p_2, . . . , p_n, then the expected probability is p_1^2 + p_2^2 + . . . + p_n^2, which is also known as the "repeat rate," and if the ith event occurs, then Weaver's surprise index is (p_1^2 + p_2^2 + . . . + p_n^2)/p_i.

The economist G.L.S. Shackle organized a conference on surprise at the meeting of the British Association for the Advancement of Science in 1953. In my paper at that conference (Good 1957), I pointed out that a degree of surprise depends on how one has already categorized the possible events, such as outcomes of an experiment, and I suggested a generalization of Weaver's surprise index. The most interesting special case of this generalization is -log p_i - (-Σ_j p_j log p_j), or its antilogarithm.

The first of these forms can be described in words as


the amount of information (concerning itself) in the event that occurred, minus the entropy.

This is precisely Dretske's formula (2.2). (p. 52) He says that this formula, together with I = log(1/p), "hold the key" for understanding the commodity of the information in particular messages. (p. 53) In my opinion, the name "surprisal" would be better applied to "log(1/p) minus the entropy" rather than to the first term alone. If the surprisal is negative, then the event was "only to be expected." Of course, if the probabilities are interpreted as physical ones, then the "surprisal" would need to be understood as the "physically justifiable amount of surprise."

Regarding the measurement of knowledge, Donald Michie had an interesting insight in 1974 (Good 1977; Michie 1977; 1982). He pointed out that knowledge of a proposition E depends on a time parameter, namely the time taken to retrieve it, possibly by means of a "computation." In Dretske's discussion, I believe there is no distinction between recalling and computing it. It is interesting that there are therefore two parameters in the definition of knowledge, which is supposed by some to be an "absolute"; the other parameter is the threshold for P(E) beyond which one refers to knowledge. Of course, the threshold is conventional in practice, and varies from time to time and from person to person. It is not unity: One of my intelligent betting friends once said he was certain of something, but refused to give me odds of 3 to 1!

In conclusion, it seems to me that Dretske's emphasis on digitalization, even at the expense of loss of information, for the purpose of better information processing in the nervous system, is the same as my emphasis on "regeneration" (Good 1965, pp. 37-40, 46, 57, 62, 77). That paper also dealt largely with information and semantics. (I cannot blame the author for not having read everything I have written.) In these comments I have confined my attention to areas of my own interest, in accordance with BBS's multiple book review instructions, rather than trying to review the whole book in detail.
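As a quick check on the quantity just named (the arithmetic is mine, added for illustration, using the die example from earlier in this commentary): for a fair die the self-information of any single outcome equals the entropy, so the proposed "surprisal minus entropy" is zero, in line with Good's remark that a 5 is not surprising.

\[
I = \log_2 \frac{1}{p} = \log_2 6 \approx 2.585 \text{ bits}, \qquad
H = \sum_{i=1}^{6} \frac{1}{6}\,\log_2 6 = \log_2 6,
\]
\[
\log_2 \frac{1}{p} - H = \log_2 6 - \log_2 6 = 0 .
\]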

Can information be objectivized?


Ralph Norman Haber
Department of Psychology, University of Illinois at Chicago, Chicago, Ill. 60680

Information-theoretic descriptions of perception, memory, and concept formation (which are now lumped together as cognition) briefly dazzled psychologists in the 1950s and early 1960s, and then simply faded away. A few researchers continued to use some of the mathematics to help them describe their manipulations (most notably Berlyne 1972, e.g., in his research on novelty; and Garner 1962, e.g., in his work on dimensionality and judgment), but they were the rare exceptions. Ten years ago, I briefly reviewed the demise of information theory in psychology (R. N. Haber 1974), concluding that it died for empirical reasons - it just did not work. Specifically, while it was generally easy to calculate the amount of information in a stimulus or in a response, such calculations did not correlate with any interesting or relevant behaviors of real perceivers, rememberers, or thinkers. After a while, that discouraged further use of the metric.

In retrospect, we can now see that the failure was due to a specific theoretical problem in the definition of information, a problem that Dretske has labeled, but, in spite of his claim to the contrary, one he has neither understood nor solved. That problem is the relativization of the amount of information in a signal to what the recipient of that signal already knows about the signal and about the circumstances of its reception. Information theory provided measures that were entirely independent of the recipient. This independence presents problems for any task involving differences in what people know. For example, if the signal consists of visually presented English alphabetic characters, an English reader "sees" letters, whereas someone unfamiliar with the Roman alphabet "sees" geometric combinations of lines. Not only is the seeing different, but the accuracy of reproduction is affected, even in copying the letters while the signal remains visible. Further, the amount that can be remembered is altered, reaction times are substantially different, and so forth. The signal is the same to each recipient; all that is changed is a characteristic of the recipient.

Such effects can also be shown within the same recipient for signals that differ in their familiarity to him. A perceiver does remarkably better in perceiving a sequence of letters that spells a familiar word compared to one that spells an unfamiliar one or no word at all. Familiarity is a state of affairs in the recipient, not in the stimulus alone, and it can be changed suddenly, as with a prompt, hint, or instruction, as well as by the gradual effects of sheer experience.

Finally, such effects can be shown within the same recipient, holding experience and familiarity constant, even for the same signal. Any time a person has two or more paths or choices available in making a cognitive decision, he is in a state of "muzzy" (L. R. Haber 1975). Such a state is created when two or more rule-systems apply simultaneously to the same cognitive domain. Suppose that I need the past tense for a word I know only in its present tense form (e.g., bink). I have a number of equally acceptable choices (binked, from blink/blinked; bought, from think/thought; bank, from sink/sank; or bunk, from sink/sunk), because English has a number of rules for preterite formation that apply to roots of this form (see L. R. Haber 1976). Muzzy applies to real words; is the past tense of dive dove or dived? And what is the plural of mongoose? Nor are the examples of muzzy limited to language; they are found in all aspects of cognitive behavior - whenever the rules that people know about a circumstance lead to more than one acceptable response.
Information theory had no convenient way to specify the effects of familiarity (let alone muzziness), yet familiarity, and, more generally, knowledge, has turned out to be the single most potent variable in predicting cognitive performance and cognitive ability. So cognitive psychologists had to abandon information theory in favor of some measure that was sensitive to knowledge. This was one of the reasons that led Miller (1956) to reject the information-theoretic metric of bits, and to substitute the subjective and far less precise unit he dubbed "chunks," a chunk being whatever unit the recipient is using at the moment.
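A back-of-the-envelope illustration of the gap Haber describes (the numbers are my own, added for illustration, and are not in the commentary): treating the 26 letters as equiprobable, information theory assigns a seven-letter string the same number of bits whether or not it spells a word, yet Miller's (1956) point is that recall tracks chunks - roughly seven of them - not bits.

\[
\log_2 26 \approx 4.70 \text{ bits per letter}, \qquad 7 \times 4.70 \approx 32.9 \text{ bits},
\]

for a random seven-letter string and for a familiar seven-letter word alike; but the word is retained as a single chunk, the random string as seven.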


We have not progressed much beyond Miller's chunks in the ensuing 25 years. Since a chunk is a subjective unit, and not objectivized in the stimulus, the interpretative task facing a researcher is complex. It is not sufficient merely to say that the perceiver is processing letters rather than squiggles, or words rather than letters; we have to demonstrate the subjective unit of analysis by a set of converging operations applied to the data. Most of the remarkable ingenuity in experimental design exploited by the information-processing research of the past 25 years stems from these required converging operations. Dretske proposes on the first page of his preface (p. vii) that the subjective definition of information as used since Miller's time, while intuitively reasonable, is philosophically indefensible. To be useful, he says, information must exist independently of any real or potential recipient. How else can we study the interpretative abilities of recipients (a goal of cognitive psychology), unless we have a measure of information independent of that ability, that is, independent of any relativization due to characteristics of the recipients of the information? Dretske's premise for his book - that we need an objective definition of information - appears at first blush like a step in the right direction. But on reflection, I think it is not useful, and his book fails in its purpose. Let me illustrate.

First, if an objective definition of information is critical, how is it that the information-theoretic definition failed for psychology, even though it was objective and independent of the characteristics of the recipient? Dretske suggests that it failed because it described the amount of information (in average terms), but not the content of any particular piece of information. However, examination of the research literature of the 1950s and 1960s shows many examples in which the application of average measures was quite appropriate, and could have been useful. Rather, what failed was the very objectivity Dretske wishes to resurrect. That objective measure was of no use in any context in which differential knowledge, familiarity, or ability among or within recipients was relevant.

Second, Dretske fails to develop a case - other than his unexamined appeal to epistemology (p. 1) - for any benefits that would accrue from an objective definition. He simply does not face up to the complexity of specifying the individual differences in what recipients know, even though he does devote substantial space to the question of how to describe the concept of knowledge itself. I do not see why a goal of cognitive psychology should be to seek an objective definition. Rather, we have to work to make a better definition that does reflect the subjective characteristics of information.

Finally, the objective definition of information proposed by Dretske fails even by his own criterion to meet his demand for independence. Specifically, his formula for information at a source (p. 65) has an explicit term k, where k "is meant to stand for what the receiver already knows . . . about the possibilities that exist at the source." (p. 65) However, Dretske argues (p. 80) that the k term is really unimportant, since there are usually no differences among recipients in what they know. Further, he does not describe how k is to be defined or quantified. Except for a trivial example of a shell game (pp. 78-80), he does not consider such differences at all.
It is precisely these inter- and intra-individual differences in knowledge that have already killed one attempt at a definition of information. For me, Dretske's page 80 suggests that he has not fully appreciated the struggles that cognitive psychology has had with independent definitions and converging operations. The most important part of our research and theorizing is dismissed by Dretske. He could have made a useful contribution if he had seriously examined the implications of different kinds of definitions of information. As it is, he made the wrong assumption on page one, and then proceeded in a direction that can be of no help to cognitive psychology at all.



Knowledge and the relativity of information


Gilbert Harman
Department of Philosophy, Princeton University, Princeton, N.J. 08544

Dretske appeals to a notion of probability according to which the probability that a given coin will come up heads when tossed is not to be identified with how likely heads is on the basis of someone's evidence, but is supposed to be an objective property of the coin, or of the procedure of tossing the coin, a property which might be discovered empirically by tossing the coin many times to determine the frequency of heads. He does not sufficiently stress, however, that this sort of probability is always relative to a framework. Suppose the coin has been randomly selected from a group of ten coins, nine of which are fair, in the sense that the probability of heads with any one of them is .50, and with the tenth being weighted in such a way that there is a probability of .75 of heads if it is tossed. In a framework that includes randomly selecting a coin from the original ten coins and tossing it, the probability of heads is .525. In a framework that includes tossing only this particular coin, the probability of heads is either .50 or .75, depending on which coin it is.

Dretske defines information, knowledge, and meaning in terms of objective probability. The key notion is "informational content," which Dretske defines as follows: A signal r carries the information that s is F = The conditional probability of s's being F, given r (and k), is 1 (but, given k alone, less than 1) (p. 65) where k is the background knowledge possessed by someone K. This formulation hides the fact that the relevant probability is conditional on a framework that includes more than r and k.

According to Dretske, a signal that K is aware of can carry information without there being any way for K to determine what the information is. Suppose several employees use a random method to select someone for an unpleasant job, communicating their choice to the boss by sending in a letter the name of the person selected. Suppose also that they have a different-colored envelope for each person, and the letter arrives in a pink envelope. Dretske holds that the color of the envelope carries the information that Herman is the person selected, even if the boss does not know the "code" connecting colors with employees. But this satisfies his definition only if the probabilities are taken to be probabilities relative to a framework in which this particular code is taken to be fixed. Often Dretske overlooks this relativity; for example:

. . . suppose that all Herman's children have the measles. Despite the "correlation," a signal might well carry the information that Alice is one of Herman's children without carrying the information that Alice has the measles. Presumably the fact that all Herman's children (living in different parts of the country) happened to contract the measles at the same time does not make the probability of their having the measles, given their common parentage, 1. (p. 74)

This is inexact, because it matters what the framework is. In a framework that includes the fact that all Herman's children have the measles, the information that Alice is one of Herman's children does carry the information that Alice has the measles! In a framework that does not include this fact, the information that Alice is one of Herman's children may not carry the information that Alice has the measles.

I worry about the implications of this relativity for Dretske's definition of knowledge: K knows that s is F = K's belief that s is F is caused (or causally sustained) by the information that s is F. (p. 86) When the relativity of information is made explicit, we get a definition of relative knowledge: K knows relative to framework C that s is F = K's belief that s is F is caused (or causally sustained) by the information that s is F in relation to C.

We might then define ordinary knowledge in terms of such relative knowledge: K knows that s is F = there is some "acceptable" framework C such that K knows relative to C that s is F. We must then say which frameworks are "acceptable." For one thing, we need to find restrictions on acceptable frameworks to rule out counterexamples involving deviant causal chains from signal to belief. If the boss associates the color pink with Herman's ties and, on seeing the pink envelope, irrationally comes to believe that Herman has been selected, as happens by accident to be true, he does not come to know in any sense that Herman has been selected, even though the pinkness of the envelope causes his belief, and there is a framework in which the pinkness of the envelope carries the information that Herman has been selected. However, it is not clear to me how to rule out cases of this sort. Similarly, if Albert wrongly believes a coin has come up heads, and says it has come up tails in order to mislead his hearers, then, although there is a framework in relation to which his remark carries the information that the coin has come up tails, a naive hearer, who simply trusts what Albert says on this subject, does not come to know the coin has come up tails. Again, it is unclear what principles would rule this framework out as "not acceptable" while allowing other cases Dretske wants. Since any cause r of a true belief that s is F carries the information that s is F relative to some framework (e.g., that consisting in the set of actual and possible situations in which either s is F or r does not occur), Dretske's analysis of knowledge reduces at best to the trivial point that knowledge is true belief plus something else.

Knowledge and the absolute


Henry E. Kyburg, Jr.
Department of Philosophy, University of Rochester, Rochester, N. Y. 14627

Dretske seeks to clothe a relatively traditional approach to epistemology in new information-theoretic clothes. The clothing is clear, not to say transparent, and it is therefore surprising that Dretske himself doesn't notice that the Emperor really isn't much to look at. It is claimed that knowledge must be based on information, where the information that s is F is carried by the signal r if and only if the conditional probability that s is F, given r and k, is 1, where k constitutes "what the receiver already knows (if anything) about the possibilities that exist at the source." (p. 65) This is weaker than a common epistemological view, which requires that knowledge that s is F entail that s is F, and not merely that the conditional probability that s is F be 1.¹ But no matter: It is still too strong.

In Chapter 5, Dretske is at pains to show that his absolute information-theoretic view of knowledge does not lead to crippling skepticism. His argument is not unfamiliar in traditional epistemology: In justificationist terms, if you believe S justifiably, and perhaps some other condition is met, and in addition S is true, then you know S, however possible it may be that S is false. Similarly for Dretske. If the channel over which a signal is transmitted generates no (or only redundant) information, then, if that signal carries the information that s is F and leads you to believe that s is F, you know that s is F, regardless of the fact that some channels producing similar signals may be subject to equivocation. Both arguments are persuasive against crippling skepticism. Both show that absolute knowledge is possible. The former more than the latter seems to lead to philosophical puzzlement over the question of whether you can also know that you know. But I find absolute knowledge, under either a justificationist or an information-theoretic construal, as uninteresting as knowledge of the Absolute.

The reason is that what I want from a theory of knowledge is a sorting criterion - a criterion that will tell me what I can take as evidence, what beliefs I can use in evaluating other beliefs, courses of action, and so on. Given a number of beliefs or knowledge claims, I want to be able to sort them into one pile I don't have to worry about and another pile I have to leave open to question. Information and Truth may give me a perfect division, but they don't give me a criterion I can apply. Even Dretske needs such a criterion. In the discussion of the calibration of a voltmeter, he says, "one uses the instrument to measure known values of a quantity in order to see if [it] registers correctly. . . . " (p. 117, his italics) He refers, in numerous places, to "what one knows on independent grounds."

Dretske's discussion of measurement brings the problem of absolute knowledge into sharp focus. His example is a voltmeter: "When the instrument is in proper working order, the information generated at the resistor by the 7-v difference across its leads is communicated to the face of the instrument . . . the equivocation between the source (voltage across resistor) and receiver (position of pointer) is zero. . . ." (p. 111) But "if the resistance of the leads varied, the position of the pointer would be equivocal with respect to the voltage being measured." (p. 115) And, as noted in note 9, "the resistance depends on temperature, and the temperature may vary . . . these slight variations can be ignored, since they produce no equivocation. The changes induced occur well below the level of precision at which the instrument operates." (p. 252) Is the position of the pointer equivocal, or is it not? Dretske appears to claim that it is not equivocal, since the equivocation is low: "below the level of precision at which the instrument operates."²

Suppose I consider n resistors in series. Suppose the "precision" of my voltmeter is 0.1 v. Dretske claims that if the voltmeter is working correctly (whether I know it or not), I can know, for every i, that the voltage drop across resistor i is Vi. Therefore I know (by the conjunction principle, which Dretske gives as a "reason" for demanding a conditional probability of no less than 1 as a condition of carrying information) that the voltage drop across the whole series is the sum of the Vi. Now let me measure the total voltage drop V. We all know that the chances are pretty good that V is different from the sum of the Vi. But if the voltmeter is working correctly (whether I know it or not), I therefore know that the total voltage drop is V. I see only three alternatives: I know a contradiction - that the total voltage is both the sum of the Vi and V; I know the voltmeter is not working correctly; I know that conventional circuit theory is wrong. None of them seems acceptable.

According to the most common theory of errors of measurement, random errors of measurement are distributed roughly normally, with a mean of 0 and a variance that depends on the instrument. If we replace the simplistic notion of a "level of precision" with the more realistic idea of the distribution of errors characteristic of the instrument - not necessarily the common normal distribution that allows errors of any magnitude, but any distribution of error whatever - then we must acknowledge not only that the equivocation in the channel is not 0, but also that we know that it is not 0.
And yet if there is anything that gives us knowledge, in a practical sense, in the sense in which items of knowledge can be used as evidence for evaluating hypotheses, making predictions, designing machinery, choosing courses of action, it is measurement. If the channel of communication that leads from the voltage drop across the resistor to the belief that the voltage drop is 7.0 ± 0.1 v is known to be equivocal, if it is known that there is a chance of .05 that the true voltage is not in that interval, then I see no way in which this paradigm of empirical knowledge can be construed as absolute knowledge. So much the worse for absolute knowledge.
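Kyburg's n-resistor worry can be restated as a routine error-propagation calculation (the normal error model and the particular numbers are my illustrative assumptions, not Kyburg's):

\[
\hat{V}_i = V_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2) \ \text{independent}
\;\Longrightarrow\;
\operatorname{SD}\!\left(\sum_{i=1}^{n} \hat{V}_i - \sum_{i=1}^{n} V_i\right) = \sigma\sqrt{n}.
\]

With \(\sigma = 0.05\) v and \(n = 25\), the summed readings stray from the true total with a standard deviation of 0.25 v, well outside the 0.1-v "precision" of a single reading of the whole series; the conjunction of individually "known" values will therefore typically conflict with a direct measurement of the total.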


This was the point of the description of the lottery in Kyburg (1961). It seemed to me that most of what properly passes for empirical knowledge has a known lotterylike chance of being wrong. If so, a number of prima facie suppositions about knowledge must be questioned: among them Dretske's conjunction principle (pp. 100-101) and his xerox principle (pp. 57-58), as well as the claim that there is "no nonarbitrary place to put a threshold." (BBS Precis, "Information") Most epistemologists took the lottery to be a paradox to be explained away in conventional terms, rather than an indication that there might be something wrong with conventional epistemology. It was to show that the lottery has a very real, down-to-earth analog in statistics that I wrote the paper cited by Dretske (Kyburg 1965). But now it seems to me that the very example of measurement brought up by Dretske provides the clearest evidence that if empirical knowledge is construed as absolute knowledge, it is absolutely uninteresting.

NOTES
1. This is not altogether clear, since in Dretske's Precis (at point 2 in the section "Information") it is claimed that nothing can carry the information that s is F, "unless, in fact, s is F." This seems to be a return to the stronger requirement of entailment.
2. Perhaps this is true of Dretske's instrument; perhaps it is not true of mine. But no matter, for all measurements, on all instruments, admit of random error.

Dretske on knowledge
Keith Lehrer and Stewart Cohen
Department of Philosophy, University of Arizona, Tucson, Ariz. 85721

Dretske, in his original and important book Knowledge and the Flow of Information, attempts to solve some traditional philosophical problems in terms of his information-theoretic account of knowledge. Dretske offers us the following analysis of knowledge when, as he says, "there is a positive amount of information associated with s's being F": K knows that s is F = K's belief that s is F is caused (or causally sustained) by the information that s is F. (p. 86) Our own objection to analyses of this kind is that knowledge arises when there is the appropriate fit between a system of belief and the facts. Consequently, the causal relation between a single fact and a single belief is neither necessary nor sufficient for knowledge. We shall attempt to illustrate and, indeed, establish this point with examples, but we wish to stress the general point that knowledge is a systematic affair in the sense that whether a particular belief constitutes knowledge depends on what else a person believes, and not simply on the relation of the belief to what is believed. Dretske initially says that he intends to restrict application of his analysis of knowledge to cases of perceptual knowledge, though, in fact, he then applies his analysis to cases of communicative knowledge.

The simplest counterexamples to this sort of analysis are ones in which a belief that s is F is caused by the information that s is F, but the causal process depends essentially on false beliefs. Let us consider an example. Céleste looks through a telescope. She sees a planet, and the signal carries the information that the object is a planet. Moreover, she is thus caused to believe that she sees a planet by the information. However, in fact, she identifies the object as a planet because of the position of the object with respect to other celestial objects. She believes that the right-hand object in a configuration at this time is a planet. In fact, the telescope, because it contains a mirror, has inverted the image of the objects. She does not know this. So the object she identifies as the planet is, in fact, not in the appropriate position for the planet when seen with the naked eye at this time. Why, then, does the information cause her to have the correct belief that the object that she sees is the planet? The answer is that she has the false belief that the right-hand object in the configuration as seen with the naked eye is the planet. In fact, the left-hand object as so seen is the planet, but she has the further false belief that she is seeing the objects in the order in which they would be seen by the naked eye. As a result, she makes two errors, one compensates for the other, and she has a true belief. She is right as a matter of luck, however, and we deny that she knows that she sees a planet. Unfortunately, by Dretske's analysis, she would be said to know.

The second problem arises when one is misled by the evidence. Consider a stellar example. Sopor is in a planetarium looking at what he believes to be a fabrication of the heavens. It is a marvelous planetarium, and people report that often they cannot but believe that they are really seeing the heavens when inside. Sopor falls asleep in the planetarium, and when he awakens, forgetting he is inside the planetarium, believes he sees a bright star above him. It so happens that the people who run the planetarium are able to open small windows in the roof of the planetarium, so that part of what viewers see is the real sky and part is the designers' work of art. The windows are opened without notifying the observers in the planetarium to test the efficacy of the illusion. In fact, people cannot tell the difference between the illusion and the real thing. As a result, when Sopor believes that he sees a bright star, he actually does see it, because a very small window is open in that part of the planetarium roof. His background beliefs, that he is seated in a planetarium and that he is seeing only an illusion created by the owners, are for the moment forgotten due to his sleepy state. So he believes that he sees a star, which he does, though his background beliefs affirm that he must be deceived. The signal from the star overcomes his good grounds for withholding belief and causes him in his sleepy state to believe he sees a star. He would thus be said to have knowledge by Dretske's analysis, but, again, we would deny that Sopor knows that he sees a star.

The problem in all such cases is really the same, and it is a problem with all causal accounts of knowledge. It is that they fail to take adequate account of the role of background belief and information in the formation of knowledge. Such beliefs play an essential role in the formation of knowledge, and, therefore, when a belief fails to fit with those background beliefs in the appropriate way, or when the requisite background beliefs are erroneous, knowledge is lacking. There must be an appropriate correspondence between the system of background beliefs and the belief in question, not just a match between the particular belief and the facts, and that fit must not depend on any error in the system. Only then do we have knowledge. Put in the language of computer modeling, knowledge always involves processing by the central processing system, which has access to background belief. Input that is not so processed is simply not knowledge.

In conclusion, we raise a more fundamental objection to Dretske's epistemology. Dretske assumes that a signal carries the information that s is F only if the conditional probability that s is F on the signal is 1. He defends this assumption on the grounds that such a restriction yields conjunctivity and avoids the Gettier (1963) problem.
He affirms that a conditional probability of 1 gives us the result that the truth of s is F is entailed by the signal. This line of argument is incorrect on the following grounds: Lehrer (1970) has shown that one can have a probability-based rule that is conjunctive. So we do not need the restriction to a probability of 1 to obtain that result. Secondly, as Carnap (1971, pp. 101, 111-14) showed, except for finite languages (and our natural language is not finite), a probability of 1 does not entail truth. Since it does not, a probability account of the sort Dretske offers does not solve the Gettier problem, but is disastrously affected by it. Suppose that the conditional probability of s is F on r is 1, but that r does not entail that s is F. Then we may have r rendering it probable that s is F to degree 1, even though it is false that s is F. Suppose that s is F entails that s is G, and suppose s is F is false. It follows from the calculus of probability that the conditional probability of s is G on r is also 1. The Gettier problem results when a person believes that s is G only because he deduced it from s is F.

It has been argued that a probability of 1 is too restrictive a requirement for knowledge, and Dretske's reply is that we may exclude some alternatives on pragmatic grounds as not being relevant. The problem is that the irrelevant alternatives are not excluded by the signal r, and, therefore, the probability that s is F is not 1 on the signal r. Thus, Dretske is not entitled to the advantages of employing a probability of unity in his account. The Gettier problem again emerges. Dretske cannot have it both ways. Either he requires a probability of unity and embraces skepticism or he requires less and takes the Gettier consequences.
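The Carnap point invoked above - that outside finite languages a probability of 1 does not entail truth - can be illustrated with a standard textbook case (the example is mine, not the commentators'):

\[
X \sim \mathrm{Uniform}[0,1]: \qquad P\!\left(X \neq \tfrac{1}{2}\right) = 1,
\]

yet \(X = \tfrac{1}{2}\) is a possible outcome; so a conditional probability of 1 falls short of entailment, which is what the commentators' Gettier-style construction exploits.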

Information and error


Isaac Levi
Department of Philosophy, Columbia University, New York, N.Y. 10027

An agent's knowledge at a given time serves then as his standard for serious possibility. Such a standard defines the "space" of possibly true hypotheses over which the judgments of credal or subjective probability that guide his conduct are or ought to be made; and it is relative to that space of possibilities that the informational value of hypotheses is to be assessed. (Levi 1980, 1.2) Dretske appears to agree with at least the latter claim. Yet the agreement is attended by a commitment to a pedigree theory of knowledge, according to which knowledge is true belief produced in a certain way. Dretske wants to restrict knowledge to a subset of those true beliefs an agent has at a given time - namely, those produced by processes producing error-free beliefs with nomological necessity. Such processes transmit information from a "source" to a "signal," which is then decoded through perception in a way which produces belief. Both the value of the information transmitted and its content are relative to the agent's background knowledge. To this extent, the background knowledge functions as a standard for serious possibility. Clearly, beliefs not produced as specified by Dretske could also serve in a standard for serious possibility; but Dretske appears to require that only beliefs produced in an error-free manner with nomological necessity can serve as part of the standard.

Dretske acknowledges a threat of circularity in his characterization of knowledge. He seeks to disarm the threat by claiming his characterization to be recursive. Background knowledge is itself produced error free through an information-processing mechanism relative to a more attenuated background knowledge, which by iteration of the recursion eventually, so it seems, reduces to nothing. (pp. 86-87) Prima facie, Dretske is committed to a tabula rasa. Yet he also thinks it an open question whether we are "prewired," with innate concepts furnishing us with the ability to form beliefs without prior conditioning. (pp. 231-35) The conflict is only apparent. For Dretske, innate concepts do not presuppose innate knowledge but only an innate capacity to acquire beliefs and knowledge in the flow of information. But any conceptual framework must constrain the space of serious possibilities while it remains in place and, hence, it must function as background knowledge. In this sense, innate conception presupposes innate knowledge, after all. And such knowledge confounds Dretske's characterization of knowledge. Thus, Dretske must rule out prewiring. But if he does, where is the standard for serious possibility required by his account of information processing that is to be used at the initial step of his recursion formula? Dretske's open question is a dilemma for his position. His devotion to a pedigree theory of knowledge is to blame. We need not feel deprived of "solutions" to the alleged "paradoxes" of Gettier and the lottery when we turn our backs on Dretske's information-theoretic pedigree theory of knowledge.¹

Still, maybe Dretske's discussion can instruct us about one way in which an agent may expand his corpus and thereby add new information through reducing the space of serious possibilities. Although we sometimes expand by deliberate choice between rival hypotheses on the basis of the available knowledge, such inductive or inferential expansion is unsuitable as a model for expansion in response to sensory stimulation. Perhaps Dretske's project, when reduced to size, offers us an account of such routine expansion (Levi 1980, 2.2).

In my judgement, it does not. For Dretske, in order for informational content of some given value to be carried by a signal from a source, a code for decoding the condition of the signal must be given. Dretske thinks there is at least one correct code whose correctness is independent of the opinions of any agent. Such a code decodes in a way which, as a matter of nomological necessity, retrieves error-free information. Only such codes are properly called codes. They correspond roughly to perfectly reliable routines or programs for using data as input in routine expansion (Levi 1980, 2.2, Chapter 17). Perfectly reliable programs undoubtedly exist. But in many contexts, restricting ourselves to perfectly reliable routines for the purpose of fixing beliefs would be foolish. If we seek to retrieve information of great precision in reading voltmeters, gasoline gauges, scales, and the like, we have to contend with random error. Such error may be avoided if we rest content with very crude messages. Thus, if I am content to take the reading of the scale on which I weigh meat as accurate to within 5 lb. on either side of the indicated value, I can, perhaps, even with nomological necessity, avoid error. But if I take the scale to be accurate to within 2 oz. on either side, perhaps I shall be entitled to declare that with a 95% chance or objective probability I shall avoid error. The benefits in precision (and, hence, informational value) gained seem to me to outweigh the risk of error incurred. To be sure, such a code is not a code in Dretske's sense, and the informational value and content obtained are not information according to Dretske's stipulation (which requires that information be true information). That only shows the disadvantages of speaking as Dretske does - in a manner that prevents us from talking without circumlocution of trading off risk of error against information (Levi 1967b). It does not prevent the practice - well known to all users of confidence-interval estimation. Nor does it refute the good sense in that practice. Of course, confidence-interval estimation is applied in contexts where we deliberately choose a code for reading signals from sources. Dretske appears to be focused on decoding processes that are learned or prewired without deliberate choice of the code. Even so, are we to suppose that the rate of exchange between probability of avoiding error and new informational value must be different for codes adopted without deliberation than for codes we acquire through deliberation? Surely such a view requires substantial theoretical and experimental support. Dretske is silent on this matter.
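Levi's trade-off between precision and chance of error can be made concrete with a normal-error calculation (the error model and numbers are my illustrative assumptions, not Levi's): suppose the scale's random error has a standard deviation of about one ounce.

\[
\varepsilon \sim N(0, \sigma^2),\ \sigma = 1 \text{ oz}: \qquad
P(|\varepsilon| \le 2 \text{ oz}) \approx 0.95, \qquad
P(|\varepsilon| \le 80 \text{ oz}) \approx 1.
\]

Reporting the weight only to within 5 lb (80 oz) is practically error-proof but nearly uninformative; reporting it to within 2 oz buys far more informational value at roughly a 5% risk of error.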
Dretske's use of the notion of probability is hopelessly confusing. No harm is done as long as one thinks of the information carried to the signal as carried infallibly with nomological necessity. Trouble arises when we take random error seriously. In discussing the flow of information from source to signal, we may speak of objective chances or statistical probabilities of conditions at the signal given conditions at the source. As a general rule, we cannot do the reverse. Probabilities of source conditions given signal conditions are subjective or epistemological. Such probabilities can be chances when the condition at the source is itself the outcome of some stochastic process; but the weight of a piece of meat placed on a scale is not a random variable in the stochastic sense. Yet it seems vital to Dretske's project that he treat such probabilities as if they were chances. (The chances of avoiding error in confidence-interval estimation are not, by the way, probabilities of conditions at the source given conditions at the signal.) Objective probabilities or chances should be distinguished not only from credal and epistemological probabilities but from information-determining probabilities as well. A person might regard the victory of any one of the three candidates X, Y, or Z as a serious possibility and he might consider each hypothesis as to who will win equally informative. Since informational value varies inversely with information-determining probability, the information-determining probabilities ought all to be equal. Yet, the person's credal probability that X will win might be higher than his credal probability for the alternatives. In some contexts, it is appropriate to equate information-determining probability with credal probability, in others with chances, and in still others with neither (Levi 1967b; 1980, p. 48). Informational value and content are relative to the demands for information of the agent and, in this sense, depend on his values. This is yet another obstacle to Dretske's effort to construe the informational content and value carried from source to signal as objective; and it indicates additional difficulties, beyond those implied by my foregoing remarks, for his attempt to reduce the intensionality of propositional attitudes to the nomological intensionality involved, on his account, in the flow of information.

NOTE
1. My own response to the lottery is given in Levi (1965; 1967a, pp. 38-42). A response to the Gettier problem, seeking to emulate the admirable brevity of Gettier, is given in Levi (1980, 1.9).

Information and belief


Barry Loewer
Department of Philosophy, University of South Carolina, Columbia, S.C. 29208

Dretske attempts, as he puts it in the preface to his book, to "bake a mental cake using only physical yeast and flour." (p. xi) His attempt is novel in that its main ingredient is a concept of informational content that he claims to have extracted from information theory. In my commentary, I will focus on two steps in Dretske's recipe, his account of informational content and his account of belief. According to Dretske: A signal r carries the information that F(s) if and only if (iff) P(F(s)|r) = 1. Since "F(s)" is a statement, "r" should also be a statement, something like "an event r of type R occurs at time t"; "|" means "given that." Dretske arrives at this definition after a clear presentation of information theory, but his definition does not really have much to do with information theory a la Shannon and Weaver (1949).

The main difficulty with Dretske's definition is that none of the usual interpretations of probability supports the notion of informational content that Dretske later uses in his accounts of knowledge and belief. A degree of belief interpretation is inappropriate, since he wants to analyze belief in terms of a clearly physicalistic notion of informational content. Information theory (unreflectively) employs a relative frequency interpretation, but Dretske explicitly rules this out. (p. 245) In any case, frequency interpretations don't seem suitable, since they apply only to events that are random outcomes of repeatable experiments. In his accounts of belief and knowledge Dretske applies his account to events that are not the outcomes of repeatable experiments. A propensity interpretation may seem more promising, since it is supposed to solve the problem of assigning probabilities to nonrepeatable events. The propensity of a chance setup C producing outcome r is usually explained by proponents of the propensity interpretation as a measure of the causal tendency of C producing r. But Dretske is after the converse probability, the probability that r was produced by chance setup C. This probability is usually not meaningful on a propensity (or for that matter on a frequency) interpretation. [The point is that P(r|C) may be meaningful but not P(C|r), since there may be no propensity P(C).]

My suggestion to Dretske is that he reformulate his definition of informational content without appealing to probability. He suggests how this might be done in a note in which he states that he wants conditional probability to be interpreted as, "I mean to be saying that there is a nomic (lawful) regularity between these event types, a regularity which nomically precludes r's occurrence when s is not F." (p. 245) Dretske's suggestion seems to be that R(r) carries the information that F(s) iff it is a law that whenever an event of type R occurs, s is F. Clearly this won't work, since he wants, for example, to say that a brain state of type R carries the information that s is a dog; but it is quite implausible that there are laws connecting brain states and dogs (described as such). Perhaps Dretske's idea can be salvaged by this formulation: r's being R carries the information that s is F iff r is R, and if r is R then s must have been (or must be) F. "If r is R then s must have been F" is what Lewis (1979) calls "a backtracking conditional." Truth conditions for these, as for other conditionals, are (approximately) "there are laws L and conditions C which are cotenable with R(r) such that L & C & R(r) imply F(s)." This account dovetails with Dretske's view that the information carried by a signal is relative to "channel conditions." These are conditions that are thought of as fixed in a given situation. My suggestion is that the conditions cotenable with R(r) are the channel conditions. Dretske emphasizes that factors that count as channel conditions are sensitive to context. He remarks that whether or not C is included in the channel conditions "is a question of degree, a question about which people (given their different interests and purposes) can reasonably disagree, a question that may not have an objectively correct answer." (pp. 132-33) This relativity to context accords with my reformulation of his account of information (conditions cotenable with R(r) also depend on context), but it is troubling in view of Dretske's project of constructing a purely physicalistic account of belief. If the information carried by a brain state (and hence what belief it tokens) is relative to our "interests and purposes," then it is not clear that this notion of information is the purely physicalistic one Dretske advertises it to be.

According to Dretske, a belief is a type of structure (in humans a type of neural structure) that plays a certain role in the production of behavior. Belief content, however, is not determined by the role of a belief in the production of behavior, but by the information that the original tokens of the structure carried. The content of a belief structure of Type S is the most specific information carried by the original token of S. This account has two notable features. It allows us to distinguish beliefs that p from beliefs that q when p logically implies q, and it provides an account of false belief, since tokens of S other than the original one need not carry the information that the original one did. There are two difficulties with this account that I will mention.
The first is that r's being R cannot have as its semantic content - the semantic content of a signal is its logically strongest information - information that we associate with belief contents, for example, that s is a dog, at least not if R is a neurophysiological property. The trouble is that r's being R will carry information about other neurophysiological states (as well as other features of the organism and its environment), since if r is R, then the organism must have previously been in certain other neurophysiological states. One way - perhaps the only way - to avoid this would be to let R be the property of having the content F(s), but this would frustrate the project of giving a purely physicalistic account of belief content. I think that Dretske has noticed an important feature of belief: A person can come to believe p in an innumerable variety of ways, and the content of his belief leaves behind the particular way in which the belief arose. Unfortunately, his account fails to capture this feature of belief.

The second difficulty concerns false belief. Dretske's account depends on his having a way of specifying when two token brain states are tokens of the same type. But he provides no account of this. He cannot say that they are tokens of the same type just in case they have the same content, since it is content that he is trying to characterize. My guess is that he would say that two tokens are of the same type just in case they are of the same neurophysiological or the same physical type. But these answers won't do. It may very well be that the ways in which a developed neurophysiological theory individuates brain states will not correspond to the way we individuate belief states. Neurophysiology may count r and r' as tokens of the same neurophysiological type, even though they have different contents and vice versa. Until Dretske has given an account of state type, his account of false belief is seriously incomplete.
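Loewer's bracketed remark earlier in this commentary - that P(r|C) may be meaningful while P(C|r) is not - amounts to the familiar observation that inverting a propensity requires a prior probability for the setup itself (a standard point, added here for illustration):

\[
P(C \mid r) = \frac{P(r \mid C)\, P(C)}{P(r)},
\]

so unless a probability P(C) is itself available, the converse probability that Dretske needs is undefined on a pure propensity (or frequency) reading.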

Can information be de-cognitized?


William W. Rozeboom
Department of Psychology, University of Alberta, Edmonton, Alberta, Canada T6G 2E9

It is a noble effort that Dretske has made here, and timely as well, considering the pretentious blather about "information" that too often substitutes for honest analysis of transduction mechanisms in those sectors of psychology, philosophy, and artificial intelligence now coalescing into cognitive science. And Dretske succeeds admirably in cutting to the bone on this concept's noncognitive side, even if his book works its way to this through a muchness of beguilement with Shannon statistics. (His probability-theoretic model of information can be reached far more deftly from a broader conception of statistical covariation.) But Dretske oddly slights the cognitive side of information - on purpose it is true, but impeachably nonetheless. There is rather more to information than Dretske lets on; and while it is never fair to chastise an author for not having said everything worth saying on his chosen topic, it needs be asked whether Dretske's partial account may not be a half-truth more deceptive than enlightening.

Ordinary language, to which Dretske surely acknowledges some obligation, endows us with the information concept as foremost a process verb in contexts such as

{ Mary / The phone call / Smoke over his roof } informed John { that his house was on fire / about the fire }.

I make two presumptions about these variations, whose defense I must forego here but which are also implicit in Dretske's account. First, informational content is characterized primarily by a proposition and only derivatively by objectual reference. Thus, John is informed about the fire only by virtue of being informed that-p for some proposition that-p that makes reference to the fire. (This may not be quite what Dretske would say, but fine differences here will not matter.) And secondly, despite the grammatical diversity which ordinary language accepts for the antecedent of an informing, this is always elliptical for a subject/predicate state of affairs, that is, something's having certain attributes. Thus it is Mary's displaying particular gestures or utterances, or the phone call's having a certain phonemic character, or the roof's billowing smoke that informs John of the fire. Accordingly, we may say that informing is basically a process of form

r's being A informs s that-p    (1)



wherein r is paradigmatically a dated segment (temporal stage) of some transient or enduring object of the sort to which we attribute stimulus properties, and s is a dated segment of some entity capable of mental acts. From (1), we derive two fundamentally different notions of nominalized information states, on the one hand,

s { possesses the information / is informed } that-p    (2)

and, on the other, with an important forthcoming reservation,

r's-being-A { carries / contains } the information that-p    (3)

Although (2) and (3) are both intuitive consequences of (1), common sense allows (2) to hold even lacking (1) for any (r,A), and insists that (3) does not require r's-being-A in fact to inform any s that-p. Even so, just as we can explicate (1) by conjoining analyses of (2) and (3) with a suitable coupling of r to s, so should the components of (1)'s analysis include explications of (2) and (3) that allow the latter to hold even when (1) fails through lack of a suitable carrier/cognizer coupling. Cognizer information state (2) evidently calls for an analysis of form "s φs that-p" for some, perhaps complex, propositional attitude φ, though eventually we would like to find a psychonomic reduction for this. There is much to debate over the proper choice of verb-phrase "φ" here; but for simplicity let us take φ to be *know, with the asterisk tokening some to-be-negotiated compromise between the loose, everyday sense of "know" and the classic philosophical conception of knowledge as justified true belief. Then when s's-*knowing-that-p is partialed out of our analysis of (1), it only remains to choose some fragment of this remainder for our explication of (3) - which, to be nontrivial, however, must include more of (1) than just conditions localized in (r,A,that-p). That is, like the incompleteness of "John is taller," schema (3) is best viewed as elliptic for some completion of

r's-being-A carries the information that-p relative to ______    (3')

whose blank must be filled before we have a proper target of analysis. To me, it seems evident that (3') is best completed by reference to particular cognizer-stages and commonsensically interpreted something like

r's-being-A carries the information that-p relative to s iff s's reception of r's-being-A would cause s to *know that-p    (4)

The be-caused-to-*know disposition attributed to s in (4) cries for cashing out in terms of psychonomic mechanism, but that is exactly the directive that cognition research needs. Dretske, on the other hand, reads into (3') an epistemological extremity that cognitive science can ill afford to respect. For philosophical reasons, Dretske wants being-informed-that-p to be a hard-core knowing, under which that-p is not merely true but in some sense certain. And for that-p to be certain in (2), it must also be certain given (2)'s source in (1). But that-p is never certain given just r's-being-A for commonsensical or cognitive-science instantiations of schema (1). So to fend off vacuity, Dretske fills the blank in (3') by reference to some cognizer's background knowledge and explicates the result as

r's-being-A carries the information that-p relative to [s's-knowing-that]-k iff the probability that-p is unity given r's-being-A-and-[s's-knowing-that]-k, but is less than unity given only [s's-knowing-that]-k    (5)

I write "s's-knowing-that-k" where Dretske makes explicit just "k," only because the narrower reading seems truer to his intent. Deleting the bracketed qualifier makes no difference here beyond exacerbating the question of precisely how (5)'s sentential clauses should be nominalized. Either way, (5) no more puts the information-that-p wholly in r's-being-A than the taller-that-John-is-than-Mary is wholly in John, albeit (5)-sans-brackets does make carrier information independent of cognizers.

So what is wrong with (5)? Well, for one thing, everyday usage is far from adamant that information must be veridical. (When John is informed by Mary that his house is afire, common sense allows John not merely to retain some doubt about this but to be entirely justified in doing so.) I suggest that the prima facie truthfulness of "information" is best reconstructed in terms of context cues that urge belief upon the recipient (e.g., declarative rather than interrogative mood of a verbal message). But let that pass. Suppose that authentic information is by nature truthful, just as in some common sense (but not psychonomic) construals of perception, it is not logically possible for s erroneously to perceive that-p. If so, authentic information holds little interest, at least for the psychology of cognition and, I should think, for cognitive science more broadly. In part, this is because extant theories of cognition and information processing do not in fact concern themselves with veridicality of the signals transmitted in natural systems transactions. But neither is there any good psychonomic reason for such concern: Truthfulness of the "information" contained in various stages of a causal process plays no role in the laws that govern these events. Or, if that is too contentious for your taste, look at it this way: (1) is just one species of genus

r's-being-A induces s to φ that-p    (6)

from which we derive

r's-being-A { carries / contains } the { message / idea } that-p φ-wise, relative to ______    (7)

by the same common sense intuition that pulls (3) out of (1). Cognitive science needs to study many φ-instances of (6), including, in particular, all degrees of belief/disbelief as well as varieties of desire, volition, and contemplative thought; and we expect many of these accounts to have much in common, especially for mental acts that stick close to psychology proper. The counterpart of (4) for other φ-verbs in (7) is obvious, including its directive to seek out the disposition's psychonomic nature. In contrast, (5) severs all conceptual ties between carrier information and the psychonomic production of mental acts, and has no instructive generalization to other φ-instances of (7). There is indeed an important place for statistical dependencies in the story of real-world cognitive accuracy, but not one reserved for certainties.

Meanwhile, it should not go unnoticed that major technical problems bedevil the explicans in (5) even if certitude is relaxed. Dretske understandably wants his conditional probability in (5) to be a de re nomic connection, not a de dicto credibility relation. But a strong argument can be made that if "Pr(x|y)" denotes objective probabilities, "x" and "y" must be unsaturated predicates; that is, open sentences, not names of particular states of affairs as now schematized in (5). Dretske's passing remark that his theory of knowledge is restricted to de re perception suggests sensitivity to the issue here; yet I venture that he will not find it easy to open (5)'s closed sentential clauses in any way that can stand close inspection. Beyond that lies the question whether Pr(x|y) and Pr(y|x) can both exist de re and, if not, whether (5) isn't backing the loser. Underlying all of this is the need for an understanding of scientific lawfulness far more articulate about, inter alia, molar abstraction and locus structure than the philosophy of science has yet recognized. For an introduction to these complexities, see my article on the future of mental systems (Rozeboom 1983).


The sufficiency of information-caused belief for knowledge
Bede Rundle
Trinity College, Oxford OX1 3BH, England

Dretske's fascinating and skilful deployment of the concept of information enables him to make sense, in a comprehensive and systematic fashion, of some of the most recalcitrant concepts in epistemology. I shall confine my observations to just one of these, that of knowledge. Knowledge, for Dretske, is a matter of information-caused belief. This characterization brings to mind causal theories of knowledge, one dubious feature of which concerns the supposed terms of the causal relation. So, for instance, it is not clear that, as some formulations would have it, the fact that s is F can meaningfully be said to cause the belief that s is F, facts not enjoying the kind of spatiotemporal location that such a role would require. Dretske recognizes the awkwardness in speaking of the similarly abstract infonnation as being causally efficacious, but he provides an explanation of this way of speaking: The information that s is F is said to causethe belief that s is F, if this belief is caused by that feature of the relevant signal by virtue of which it carries the information. This is not so much to show how, despite what one might think, information can be causally efficacious; rather, it amounts to a proposal to speak as if the information had this role when its carrier does. However, since the latter does give us a genuine cause, there is no way of pressing the objection that confronts the usual causal theories. Dretske makes a strong case for the necessity of the condition in terms of which he characterizes knowledge, not trading on the closeness of "information" and "knowledge" in their everyday uses, but arguing that what causes the belief that s is F must carry information to this effect in the sense of making the probability of s's being F equal to 1. Insistence on this condition enables him to cope with the familiar counterexamples to the definition of knowledge in terms ofjustified true belief- and, if any doubt remains, it seems to me to be in respect to the sufficiency of the condition. To ensure that the belief and its cause stand in the right relation, it would appear necessary to insist not merely that the belief be caused by that feature of the signal that, as it happens, carries the information, but that it be by virtue of carrying the information that the particular feature should have this effect. This is, I believe, Dretske's intention, but it is still not altogether clear that a belief caused in the specified way must be knowledge. Suppose, for instance, that the positions of the planets are in fact informative as to events in people's lives. By observing the planets a person might come to believe that he was destined for travel, a long life, or whatever, but we should wish to know more about the way in which this information was extracted before we allowed that this belief, correct though it was, amounted to knowledge. Here are two possible ways of dealing with such cases. First, it may be said that such instances of complex inferential knowledge are not those at which the analysis is aimed. However, the line between these and the allowable cases is going to be difficult to draw, given that Dretske invokes such examples as that of the spy's learning from the pattern of knocks that the courier is at the door. Here supplementary information relating to the past is required for the interpretation of what is heard. Second, it may be held that the person does know, even if he does not know that he knows. 
Dretske perhaps hints at this possibility when he mentions that "someday we may have to revise our estimates of what various psychics now know," should we discover that they have access to information that we believe is unavailable to them (p. 127). Even though the reasons they now offer for their beliefs may be inadequate, we may conceivably have to allow that they know what they claim. This way of looking at the matter strikes me as more plausible when there is a lack of supporting reasons, rather than an appeal to reasonings
that involve actual error. In this latter case, we might allow the belief to be knowledge only if the supplementary erroneous beliefs or inferences were not necessary for its generation. Accounts of knowledge that are obliged to introduce a range of special conditions to cope with particular applications of the concept often have a complexity and an ad hoc character that detract from their plausibility. Dretske's invocation of information-caused belief has a refreshing simplicity, and it is to be hoped that this will survive any refinements that may be called for by examples of the above kind, together with the further forms of knowledge to which his analysis is avowedly not addressed.

Some untoward consequences of Dretske's "causal theory" of information


Kenneth M. Sayre
Department of Philosophy, University of Notre Dame, Notre Dame, Ind. 46556

The project of this subtly argued book is to develop an account of semantic information that will help us understand how physical systems can occupy cognitive states. Given the largely metaphorical use of "information' in contemporary cognitive studies, the importance of this project is undeniable. I shall commentfirstupon Dretske's account of information itself, and then upon certain of its ramifications for the overall project. Although the book begins with a brief survey of mathematical communication theory (MCT), and although Dretske repeatedly uses the terms "communication theory" and "information theory" for his own account, it should be noted that his account and MCT have very little in common. In fact, his account rejects almost every fundamental concept of MCT except its measure of infonnation in terms of inverse probabilities. Whereas MCT is concerned exclusively with statistical properties of sets of events, for example, his theory focuses upon the information content of individual signals (acknowledged to be contrary to the intent of MCT on pp. 51-52). Whereas in MCT an infonnation channel is comprised of any two (appropriately constituted) sets of statistically dependent events, a channel by his account is a "fixed framework" that itself "generates no (new) information." (pp. 129,119) And whereas noise in MCT is a statistical property equivalent to the equivocation of the input with respect to the output, in Dretske's account noise is simply information (like "crackle' in a radio receiver's output) that does not originate at the source, a treatment he admits "is unorthodox from a technical standpoint." (p. 239n. 13) The most striking deviation from MCT, however, comes with his definition of informational content (see Precis at section "Information"), stipulating in effect that a signal r carries the information that s is F (if and) only if it is conveyed without equivocation (in terms of MCT, conveyed without noise). For such a signal r to be conveyed without equivocation there must be "a nomic (lawful) regularity between these event types, a regularity which nomically precludes r's occurrence when s is not F." (p. 245n. 1, his emphasis) It is this exceedingly strong requirement, of course, that establishes the connection in his account between information and truth (precluding/a/se information, p. 45), and enables him to move so quickly to the definition of knowledge as information-caused belief (p. 86). So central is this definition to his subsequent results regarding cognition, in fact, that any problems it engenders affect the remainder of his analysis. There are three apparent problems with this definition to which I would like to invite his response. One problem is primarily rhetorical. Inasmuch as the conditional probability of s's being F given r is established at unity on the basis of nomic regularities, it at least appears that an

equivalent definition of informational content could be formulated in terms of nomic regularity exclusively - that is, a signal r's carrying the information that s is F = r depends nomically upon s's being F. (To be sure, this definition makes no direct mention of the receiver's prior knowledge of possibilities at the source; but it remains unclear how in Dretske's account the knowledge of a cognizer is supposed to aifect the lawful regularity between r and s's being F.) The same may be said for the nesting relation (p. 71), upon which his all-important distinction between digital and analog coding depends - that is, the information that t is G is nested in s's being F = s's being F depends nomically upon t's being G. Moreover, since, as Dretske remarks, the "ultimate source of the intentionality inherent in the transmission and receipt of information is . . . the nomic regularities on which the transmission of information depends" (p. 76, his emphases), and since this intentionality is the key to his subsequent analysis of cognitive states, the question arises of why Dretske goes through the exercise of discussing MCT at all. By the cumulative effect of his own admissions, all he accepts from MCT is its quantitative measure of information content, and this was available in the work of Hartley, (1928), which antedated MCT (in Shannon's 1948 systematic formulation) by 20 years. Although there are rhetorical advantages to be gained from the title "communication theory," a more accurate one for Dretske's own account would be "a causal theory of semantic information. ' At any rate, the reader should be warned against assuming that the bridge being built to cognitive science (in the words of the PrtScis) is being built from communication theory in any recognizably standard form. A second problem is factual in character. As Dretske's discussion makes clear, the sort of nomic regularity that his definition involves is not that of causal determination, in which an antecedent cause necessitates a subsequent effect. It is instead that of exclusive causal production, in which events of a given type (r) are due to causal antecedents of a given type only (s's being F). The problem is whether regularities of this sort are ever found in nature (apart perhaps from the isotropic laws of classical mechanics and particle physics). This problem is not a version of the issue of skepticism, handled so effectively in Chapter 5. It is not the question of whether there are channels with conditions sufficiently fixed not to introduce relevant information at the output; instead it is the question of whether there are any determinate sources that generate signals that could not be generated by other sources as well. If not (or if such sources are relatively rare), then there are in nature no (or very few) signals with informational content in the sense defined. An illustration of this problem is provided by Dretske's example (pp. 185-86) of someone's coming to know that Elmer died on the basis of information to that effect contained in the newspaper sentence "Elmer died." Is it ever the case that a newspaper report renders 100% probable (excluding the standard skeptical caveats) the event of which it purports to be an account? If not, then Dretske's definition of informational content seems to rule out our ever gaining knowledge by reading newspapers. The third problem concerns the effect of this definition upon the account of belief in Chapter 8. 
Contributing to this account is the definition of a structure's semantic content as the information it carries in completely digitalized form (p. 184), and the notion of a semantic structure in the form of an internal state evolved to be selectively sensitive to the information that s is F (p. 193). Since the content of such a structure (the digitalized information that s is F) - unlike beliefs - cannot be false, beliefs cannot be semantic structures themselves (p. 190). But a belief must have semantic content in some respect, else it could not be capable of misrepresentation. Dretske's attempted resolution is summarized in the following quotation. We are to suppose a cognitive structure evolved to be selectively sensitive to the information that s is F, and such that once it is developed, it acquires a life of its own, so to speak, and is capable of conferring on its subsequent tokens (particular instances of that structure type) its

semantic content . . . whether or not these subsequent tokens actually have this as their informational content. . . . [Thus] a structure type can have its origins in information about the F-ness of things without every (indeed, without any) subsequent token of that type having this information as its origin, (p. 193, his emphasis)

The upshot is that tokens of this structure type can be triggered by signals that lack the appropriate information; and when this happens, the subject has the false belief that s is F. (p. 195) If I have understood this account of belief correctly, it seems to be flawed in two basic respects. First, although a false belief by its nature fails to carry the information that s is F, which it purports to convey, Dretske's account allows a false belief to have semantic content conferred upon it by semantic structures "with a life of their own. " But to have semantic content is to carry the digitalized information that s is F. Thus, a false belief both carries and fails to carry the information that s is F, which at least appears to be sheer double-talk. Second, if indeed a given internal structure were to carry the information that s is F, then by definition all tokens of that structure must carry that information, else the probability of s's being F given the occurrence of that structure would not be unity as the definition of informational content requires. Hence, the suggestion that the structure type might carry the information that s is F even though some or all of its tokens fail to do so seems to be ruled out as necessarily counterfactual. If these difficulties in Dretske's analysis of belief cannot be made to disappear, we have yet further evidence that his definition of informational content is too strong to do the job for which it is intended. And since this definition is the focal point of his novel account of information, we have yet further reason to question the value of this particular account. I, for one, hope these apparent problems can be resolved since, in addition to being timely, this book is acutely argued and delightful to read.

On the "content" and "relevance" of information-theoretic epistemology Ernest Sosa


Department of Philosophy, Brown University, Providence, R.I. 02912

Here are five passages from Knowledge and the Flow of Information:
Informational content: A signal r carries the information that s is F = The conditional probability of s's being F, given r (and k), is 1 (but given k alone, less than 1 [where k is information known to the receiver]). (p. 65) (I)

The information that t is G is nested in s's being F = s's being F carries the information that t is G. (p. 71) (II)

Structure S has the fact that t is F as its semantic content = (a) S carries the information that t is F and (b) S carries no other piece of information, r is G, which is such that the information that t is F is nested (nomically or analytically) in r's being G. (p. 185) (III)

With the idea of a structure's semantic content we are, finally, in a position to give an account of belief. (p. 189) (IV)

Beliefs are semantic structures, but that is not all they are. They are semantic structures that occupy an executive office in a system's functional organization. . . . Hereafter, those semantic structures that have an executive function, that help to shape a system's output, shall be called cognitive structures for the system in which they occur. . . . To qualify as a cognitive structure, therefore, an internal state must not only have a semantic content, it must be this content that defines the structure's causal influence on output. (pp. 198-99) (V)
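For readers who want the probabilistic shorthand, passages (I) and (II) can be rendered schematically as follows. This is an editorial paraphrase of the quoted definitions, not Dretske's own formulation; k stands for the receiver's background knowledge:

\[
r \text{ carries the information that } F(s)
\;\equiv\;
\Pr\!\big(F(s) \mid r, k\big) = 1
\ \text{ and } \
\Pr\!\big(F(s) \mid k\big) < 1,
\]
\[
\text{the information that } G(t) \text{ is nested in } F(s)
\;\equiv\;
F(s) \text{ itself carries the information that } G(t).
\]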


Comments on the passages

1. Given I, the information carried by a signal is relative (a) to information k that the receiver already has, and (b) to what is considered a given (firm, rigid) channel (with no relevant alternatives). So far as I can see, the nature and constitution of channels can vary ad lib. There are no general noncircular constraints on the conditions that can constitute a channel. And since different people may receive a signal while in possession of different relevant information, that signal may carry different information to them. We are told that "(. . . the conditional probabilities defining the informational content of the signal) are all determined by the lawful relations that exist between source and signal." (p. 77) Strictly speaking, of course, the relations that matter are those between the source O, the channel conditions C, the information (k) already in possession of the receiver, and the signal S. But if there are no further restrictions on C or k, then it is useless to require that the relations be lawful. For either k or C could be made to include the conditional that "if S then O," and then the relations in question would be logically lawful (modus ponens). That requirement alone is hence no restriction.

2. The account of semantic content (III) derives from a distinction between two ways in which a structure can carry information: in analog form and in digital form. Sensory experience carries information, picturelike, mostly in analog form: As in a picture, most of its information is carried nested in more specific information (cf. II). A cognitive structure (semantic structure, belief) carries some information in digital form: Such information is carried free, unconfined in any nest. Given I-V, the content of an internal state can apparently vary from zero at one extreme to the Library of Congress (and beyond) in the other direction, depending on how we set the channel and on the recipient's knowledge. And this leads to the consequence that there is no such thing as what one believes (neat). What one believes relative to one observer and channel can be as different as one can imagine (and more) from what one believes relative to another channel and observer. But surely there are things I believe - consciously consider and accept as true - right now, independently of what any signal may carry to any recipient by any channel about my internal states. (Serious though it appears, this problem seems also in a way external, when compared with the questions below.)

3. Internal questions concerning I-III:

a. Do not the definitions as they stand make it impossible for any structure S to have as its content anything other than the fact S? The following reductio says they do.

i. S has the fact T as its semantic content (Assumption)
ii. Not-(T = S) (Assumption)
iii. S carries the information T (i, III)
iv. S carries the information S (I)
v. T is not nested in S (i, ii, iv, III)
vi. T is nested in S (iii, II)

Since the only two assumptions that lead to the contradiction are i and ii, it follows that, given definitions I-III, if S has fact T as its semantic content (i), then T and S must be identical (not-ii). We might accordingly change III to read:

Structure S has the fact T as its semantic content = (first) S carries the information T and (second) S carries no piece of information I other than T and other than S itself such that T is nested in I. (III')

(Note that in this question and in what follows I have replaced Dretske's monadic "s is F," "t is G," etc., by the more general sentential letters "S," "T," etc. I have done this for simplicity of exposition, and because the reasoning could always be retailored along monadic lines with no essential change.)

b. Our definition is little better for the change, alas, subject as it remains to a similar objection. Suppose, for example, that S nests T and T nests U (for arbitrary S, T, U). Then the following would seem to qualify as cases of I (distinct from both T and S) such that S nests I and I nests T: (i) I = T & U, (ii) I = S & (V or not-V), (iii) I = V (where V is nomologically equivalent to S). How then could S have anything (even S itself) as its semantic content if we retain I and II and modify III into III'?
c. Also, relative to any receiver who accepts T if and only if U as background information k, no one can have a structure S with T as its semantic content (apparently, no one can believe T). For S would always carry information U if it carried T (relative to k), and U would always nest T (relative to k). So U would always serve as an I of the sort prohibited by III'.

d. Here let's waive questions a through c, and let's suppose that III or something close to that can be cleared of such questions. What relation must there be between S and T for it to be possible that S have T as its semantic content? The following must hold:

d(i) P(T|S) = 1
d(ii) There is no I such that: {not-(I = T) & [P(I|S) = 1] & [P(T|I) = 1]}

(Since S itself, if distinct from T, would serve as such an I, and given the further difficulties of questions b and c, let us assume as an implicit condition of d(ii) that I is of some (to be specified) sort K; it is thus an I of sort K that d(ii) concerns. So we are supposing for the sake of argument that questions a through c can be answered by specifying such a sort K.) The requirement imposed by d(i) and d(ii) acting jointly is in effect that S cannot have T as its semantic content, unless S "probabilifies" T directly, without the aid of any intermediaries. This requirement is made puzzling by the following facts: First, among the semantic structures of greatest interest in the present context are presumably brain states or the like. Second, such cognitive structures are causally and nomologically far removed from many of the contents that they supposedly enable one to think about. Third, for nomologically distant S and T - say a brain state of mine (S) and the rectangularity of the sheet of paper before me (T) - if S is to probabilify T, it must do so relative to some channel of givens (C) and some background of presuppositions. Fourth, such a channel C cannot itself alone probabilify T, presumably not ever and especially not in our present case, since, if it did, that would make it an I of the sort ruled out by d(ii) - since, relative to C, P(C|S) = 1. Fifth, it is hard to see what such a channel might be, except the conditional that T holds if S holds. [I mean here the weakest such conditional, the material conditional (S -> T), tantamount to not-(S & not-T).] What we want, recall, is a direct connection: a channel relative to which S guarantees T in the absence of any intermediate I (any I that guarantees T and is guaranteed by S). But if T is not logically entailed by S and if it is "spatially and temporally distant" from S, then how could S thus guarantee T - how could it ensure T "directly" - except relative to some such channel as T holds if S holds (S -> T)? But if such a conditional as (S -> T) can serve as a channel, does it not then turn out that any brain state B has any truth T as its semantic content? After all, for any truth T, the conditional (B -> T) will always be true, for any arbitrary B. And relative to that conditional, it will then always turn out that P(T|B) = 1, for any such B. Perhaps one could appeal to relevance here. Perhaps in every case where a truth T is in fact the semantic content of some brain state B but not of B', one could distinguish B from B' by saying that the conditional (B -> T) has no relevant alternative, whereas the conditional (B' -> T) does have a relevant alternative [e.g., not-(B' -> T)]. But whatever one may think of the appeal to relevance in general, here in particular it seems mysterious.
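Sosa's triviality worry can be stated in one line. The following is an editorial rendering, not his notation: if the material conditional linking a brain state to a truth is itself allowed to count as a fixed channel condition, the conditional probability collapses to unity for any B and any truth T,

\[
\Pr\!\big(T \mid B \wedge (B \rightarrow T)\big) = 1
\quad\text{whenever } \Pr\!\big(B \wedge (B \rightarrow T)\big) > 0,
\]

since B together with (B -> T) logically entails T.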




What could make not-(B -> T) irrelevant but not-(B' -> T) relevant? The answer could not possibly be that B has T as its semantic content, of course; not without vicious circularity.

e. It seems to me in fact very hard to understand the notion of relevant alternatives even in the best of circumstances. A thing's appearance does not determine its kind. The possibility of plastic surgery, imitations, mutations, and the like should make this clear. Hence, unless the circumstances take up the slack, it would always be false (even in cases of true knowledge by sight) that the thing would not look thus were it not of that kind. And P(K|L) would never reach unity. How can the circumstances ever take up the slack, however, unless they include some such condition as: If it looks L, then it is a K? But in every case of a true K, such a conditional will be true simply by virtue of the truth of its consequent. No matter how a K happens to look - no matter how well disguised or how badly disfigured - it will be true that, if it looks L', then it is a K (again, for any L' look that it may happen to have - and, indeed, for any that it may happen to lack as well). If such a conditional is allowed as a "given" circumstance, then no matter what look a K presents, its look (L') will always "rule out" the alternative of its not being a K (relative to the given circumstance). But if (L -> K) is allowed to operate when needed, how can we block the unwanted (U -> K) - for arbitrary U - in any case of a true K? We can, of course, invoke relevance: (L -> K) may be said to have no relevant alternative, whereas (L' -> K) does have not-(L' -> K) as a relevant alternative. But it is not as though (L -> K) had no alternative at all. There is, after all, not-(L -> K). So our question is: What makes the negations of some such conditionals relevant and the negations of others irrelevant? What sort of difference is this? To me this remains occult.

information he gives me is of a probabilistic and partial sort, but neither one of us is shocked or surprised at the state of that information, and we properly think of it as a reasonable exchange. Matters are the same in science. In his good review of Dretske's book, Loewer (1982) quotes the following useful example:
A signal r carries the information that F(s) iff the conditional probability that F(s), given r (and k), is 1 (but given k alone is less than 1). For example, if the probability that an alpha ray was emitted, given that the Geiger counter clicked, is 1, then the click carries the information that an alpha ray was emitted. (p. 297)

Probability and information
Patrick Suppes


Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, Calif. 94305

There is much to be commended in Dretske's Knowledge and the flow of information. It is hard to think of another book, old or new, that makes information such a central concept in the theory of knowledge. My central objection to Dretske's work is essential to the way he has developed his ideas, but not, I think, fatal to his general plan. I think he is mistaken in requiring that in order for a signal r to carry the information that s is F, the conditional probability of s's being F, given r, is 1. (I have omitted the background k in this formulation.) Obviously he wants to force a condition that intuitively satisfies a strong requirement of knowledge, but in doing so, it seems to me that he has removed himself from the real world of experience to an idealized world in which the condition he imposes is satisfied only in the simplest cases. What is mistaken about his insistence on the conditional probability's being 1 is that, both in common sense and scientific experience, we continually use intuitively the concept of information to convey partial probabilistic beliefs - beliefs that do not satisfy his requirement of being strictly knowledge. Let me give some examples. I call my lawyer for some urgent legal advice, but he is on vacation. His office advises me that they do not know exactly where he is, but they do have information about his travel plans. The obvious and implicit implication is that they have information about his whereabouts but do not know exactly where he is. Thus, I could say, "Well, he is almost certainly somewhere on the East Coast, because he was scheduled to fly into New York yesterday and to be in Washington tomorrow." I ask a friend in the real estate business, "Do you have any information about houses for sale in South Palo Alto?" The

This is an artificial example if there ever was one from the standpoint of this kind of experiment. It is absolutely standard in any sort of counting experiment involving particles, alpha rays, or what have you, that the efficiency of the counter is an issue, and in no complicated real experiments is that efficiency ever 100% - some false alarms and some misses are expected. Thus, in such experiments, Dretske's requirements would not be met, but it would seem absurd to say that the experiment conveyed no information. Still another kind of example of great importance scientifically is the observation of any continuous quantity. Beginning in the 18th century with the work of Simpson, Lagrange, Laplace, Gauss, and others, an elaborate and sophisticated theory of error was developed, and since about the middle of the 19th century, it has been standard in the more precise parts of physics and other sciences to report the standard errors of observations. What kind of account does Dretske give of this massive accumulation of scientific observations reporting endless standard errors? He might take the phenomenological position that each individual observation is somehow what he is talking about, but of course that observation in itself is not interesting. It is merely a phenomenological reading of some instrument. What is important is the estimate of the physical quantity in question, whether it is a particular optical angle observed, or, in another context, the mass of an object, the distance between two points, and so forth. It is these physical quantities about which we want to have knowledge, even if that knowledge is partial and subject to error. The absence of a sophisticated discussion of continuous quantities I found one of the most serious flaws in the book. Thus, in discussing a pressure gauge example (pp. 123-24), Dretske writes "No one has the slightest hesitation about saying that the attendant knows what the boiler pressure is when he consults the gauge. The gauge delivers the relevant information." (p. 124) I would say that 150 years of empirical observations in physics and the development of an elaborate theory of error estimates contradict this statement in as bald a fashion as is possible. The most that the attendant could know is the reading of the gauge - not what the boiler pressure is. (Even the attendant's knowing what the reading is is a matter about which I am skeptical. There are endless studies of another kind in psychology showing that our perceptual observations are always influenced by our interpretation of the framework within which the perceptual observations are made, and that that framework is subject to unconscious and subtle distortions - even my use of the word "distortions" is a mistake; I should choose a neutral word like "transformations" the better to reflect the ubiquitous and often innocent character of context.) One more remark about the pressure gauge. Dretske goes on to deal with the problem of malfunction, but it is precisely the point of the probabilistic theory of error developed from the 18th century on to deal, as physicists would put it, with nonsystematic errors. The sophisticated point was a recognition that many errors are not systematic, in the sense that their causal sources cannot be identified and dealt with. 
The important step was taken of recognizing that errors will persist whatever we do to eliminate them, and it is fundamental to a proper empirical theory of scientific observation to recognize the irreducible and probabilistic nature of these nonsystematic errors.
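Suppes's point about counter efficiency can be put in explicitly Bayesian terms. The following is an editorial sketch; the 90% detection rate, 5% false-alarm rate, and 50% emission rate are illustrative assumptions, not figures from the commentary or the book. For such a counter,

\[
\Pr(\text{emission} \mid \text{click})
= \frac{0.9 \times 0.5}{0.9 \times 0.5 + 0.05 \times 0.5}
= \frac{0.45}{0.475}
\approx 0.95,
\]

which is high but still short of the unity that Dretske's definition of informational content demands.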


One of Dretske's objections to letting the conditional probability of s's being F, given r, be less than 1, is that, if two such conditional probabilities, for example, that s is F and s is G, each has conditional probability, say, of .91, it could be the case that the probability of their joint occurrence might be less than .9. I do not see this as an objection at all. It is a standard and acceptable fact about probability and also about the relative certainty of information we may have. A standard way of putting this is that the more an inference depends on many different kinds of evidence, the more skeptical we often are of the conclusion, because we are dubious about the conjoint reliability of the premises. Another objection Dretske raises to having the conditional probability less than 1 is that one then no longer has strict transitivity; that is, c may carry the information that b, b may carry the information that a, and yet, to a lesser degree, c may carry the information that a. This is what he calls the failure of transitivity, and, he claims, it jeopardizes thereby the flow of information. But, of course, what one has is a decay of information, and it seems to me that this is exactly what we expect in the real world. He labels this the xerox principle, but copying, in the minds of real people, as information flows from one source to another, is subject to error - as we all recognize. We expect it, we plan for it, we allow for it. Dretske should do the same. I reiterate that there is much of value in Dretske's book. He has done us all a real service by bringing information to the fore as a theoretical concept in the theory of knowledge. He is mistaken, I think, in the way he has constructed the relation between probability and information.
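Suppes's first arithmetical point can be made concrete with a short editorial illustration (the .91 figures are his; the bound is the standard Fréchet inequality and is not drawn from Dretske's text). If each of two conditions has probability .91 conditional on the signal, their joint conditional probability is only guaranteed to satisfy

\[
\Pr(F \wedge G \mid r) \;\ge\; \Pr(F \mid r) + \Pr(G \mid r) - 1 \;=\; 0.91 + 0.91 - 1 \;=\; 0.82,
\]

so it can indeed fall below .9; if the two conditions happen to be independent given r, it is \(0.91 \times 0.91 \approx 0.83\).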

things sometimes go wrong with the gauge, causing it to misrepresent the contents of my gas tank, the same can be said about George when he comes to believe falsely something about the objects he sees. Our humble gauge even exhibits the rudiments of intensionality - representing the amount of gas in my tank, not yours, even if (quite by coincidence) both tanks are always at the same level. The project in Knowledge and the Flow of Information was to provide an information-theoretic model of knowledge, belief, and perception. If the gauge isn't fancy enough (and it isn't), it represents a promising point of departure. Some of my commentators, I think, took me to be doing something more ambitious. If the story I told left something out, if it didn't include an account of intelligent deliberation, then this, I submit, is less a criticism than a complaint, a complaint that there is more to our mental life than is dreamt of in information theory - even my version of information theory. I agree. Information theory. It is noted by Sayre how little my own account of information resembles the mathematical theory of communication (MTC). After cataloguing some of the ways my idea of information differs from that of MTC, he wonders why I go through the exercise at all. Why devote a full chapter to the details of a theory that is promptly discarded in the rest of the book? This is a fair question. Sayre's warning to the reader - against assuming that the bridge being built to cognitive science is being built from communication theory in any recognizable standard form - is timely and appropriate. I thought I had been careful to plant these warnings myself, but it doesn't hurt to emphasize the point. Others have similar worries. Rozeboom notes, correctly, that my ends could have been achieved more deftly from the broader conception of statistical covariation. Loewer, Good, Haber, and the Churchlands all question the usefulness or necessity of the statistical machinery. Is it, as Sayre asks, merely a rhetorical device? It was not the mathematical theory per se that interested me. It was, instead, the ideas clothed in this mathematical dress. These ideas have an application far beyond the restricted set of conditions required for application of the mathematical theory. When Levi finds my account "hopelessly confusing" because (among other things) the weight of a piece of meat placed on a scale is not a random variable in the stochastic sense, he is, in effect, complaining that MTC cannot be applied to such cases. He is right, but he has also missed the point. The question is whether we can apply a suitably relaxed set of information-theoretic notions to systems (gauges, speech, sensory processes) that, in some ordinary sense, transmit information. What, then, are the ideas embodied in MTC that I thought worth exhibiting (in Chapter 1), adapting (in Chapters 2 and 3), and applying (Chapters 4 through 9)? There is, first, the idea of information as an objective commodity, something which, though we speak of it as being in a signal at a receiver, is constituted by the network of relationships existing between a signal and its source. This is objective in the sense that the amount of information transmitted is independent of its potential use, interpretation, or even recognition. There is, furthermore, the idea that two signals can be informationally different (or the same) though the same (or different) in

Author's Response
Why information?
Fred I. Dretske
Department of Philosophy, University of Wisconsin, Madison, Wisc. 53706

We may need a well-organized army of stupid homunculi, simple physical systems behaving in predetermined ways, to adequately model (in naturalistic terms) intelligent thought and action. But it would be a mistake to suppose that such hordes are therefore needed to provide an illuminating model of belief, knowledge, and perception. For, on the face of it, there is no reason to suppose that stupid systems cannot know, believe, and perceive. Consider, for a moment, a simple fuel gauge. Why does the gauge in my car register (accurately or inaccurately, as the case may be) the contents of my tank, not yours? The answer is, obviously, that it is hooked up to my tank, not yours. Well, then, why isn't this an appropriate model for understanding perception and reference? The reason George sees his car, not my car parked three blocks away - the reason his belief is about his car, not mine - is that he is hooked up in some appropriate way to his car, not mine. And when the gauge is operating normally, doesn't it represent some fact about my tank, some condition or situation to which we can give propositional expression: for example, that my tank is half-full? Why isn't this an appropriate model for knowledge? When George is operating normally, he also represents in some reliable way facts about the objects to which he is "connected." If 82

Response/Dretske: Knowledge and the flow of information meaning; the idea that the informational value of a message is a function of how many possible options it forecloses; the idea that redundancy can be employed to achieve better communication over a noisy channel (and what this tells us about why our concepts are designed to "chunk" the world into informationally redundant packets); and the idea that signals may carry little or no information about their causal antecedents (thus driving a wedge between causal and informational theories). Add to these ideas the fact that MTC supplies us with a wealth of epistemologically suggestive terminology, the fact that cognitive psychology seems to need something like this constellation of concepts (if not the particular articulation they receive in MTC) in its processing models of human cognition, and the fact that these ideas are, arguably at least, close approximations to our ordinary ways of thinking about information - and one has compelling reasons to rescue information from the clutches of the statisticians and put it to work in mathematically less purified surroundings. W. R. Garner put the point exactly right twenty years ago: As psychologists, we are certainly free to use the concepts in any manner which helps us, and we may even develop them to suit our particular purposes better. We refuse, in other words, to be concerned about a comment Cherry once made in discussing the role of communication theory in experimental psychology. He stated that a particular use of information concepts went beyond established communication theory. He was undoubtedly correct in that statement, but, as psychologists, we are not particularly concerned with this. If going beyond or even distorting established usage helps solve our behavioral problems, then we should feel free to do so. (1962, p. 15) The same should be said about philosophers and their problems. Haber gives me low grades because I haven't helped cognitive psychologists with their problems. I am using tools they discarded years ago. So what has psychology done for philosophy lately? And why can't a tool useless for job A be exactly the right tool for job B? Haber misunderstands what a philosopher is up to when one of them talks about perception and knowledge. I am, to be sure, interested in how a person sees a thing, how one actually gets the job done. But as a philosopher I am also interested in what is experienced. Is it true, as I keep reading in psychology textbooks, that one is never really aware of external objects (only their internal representations)? Do our judgments about and reactions to various stimuli qualify as genuine knowledge or not? If so, what confers this status upon them? These are issues in the semantics of psychological processes. They have to do with reference, truth, and reliability - the epistemological trinity. These topics may bore psychologists concerned (as Rozeboom observes) with the psychological dynamics of cognition, but they lie at the heart of traditional philosophical interest in cognitive issues. Questions about how the brain manages to extract information from incoming stimuli, just how this process is sensitive to such factors as background knowledge, values, and expectations, are (for me) questions about the decoding process and the factors influencing its operation. For answering these questions information theory may be of little use. 
But for questions about whether, once these operations are performed (whatever the details), the result is knowledge and, if so, whether it is of something external, it is crucial that we understand the etiology of these processes and, in particular, their relations to the situations about which they purport to inform us. My television repairer doesn't care whether I watch the news or stare at the test pattern, whether I am informed or merely entertained by the flickering display. His problems remain the same (television repairers are also methodological solipsists). [See Fodor: "Methodological Solipsism" BBS 3(1) 1980.] But this doesn't mean we can assess the epistemological role of this instrument, its power to inform us, without understanding its relations to distant events. And for this task, I submit, we need to understand information. The question remains, though, whether the ideas embodied in MTC can be adapted and applied in the way I use them. Loewer wonders which interpretation of probability I adopt that would support my use of it in defining informational content. Is it relative frequency, propensity, or something else? Levi insists that the converse probabilities are either credal or epistemological, not the objective probabilities I require. Rozeboom doubts whether a coherent account can be given of these probabilities once it is realized that, for an objective account, the predicate expressions involved in their statement must be (what he calls) "unsaturated." Good objects to my eccentric use of the term "surprisal" and suggests that my account (in Chapter 1) is historically misleading in some respects. (I concede these points to Good; I wasn't aware of the literature he cites.) There are several ways to reply to these criticisms. Perhaps the simplest is to embrace Loewer's suggestion: to drop the notion of probability altogether and emphasize that my account of informational content only requires a particular kind of lawful dependency between signal and source (given channel conditions and k). Even when we are dealing with a nonstochastic source (gas tank half-full, meat weighing 3 lb.), we can still sensibly talk about an equivocation-free signal (the reading on the gauge, the scale's pointer position), and this, of course, is the only condition required for my account of informational content. In fact, I gestured in this direction several times in the book when I was feeling particularly insecure about attributing converse conditional probabilities. We can, however, do better than this. We make comparative judgments about the reliability of communication systems, and these judgments are meant to describe objective facts about the system itself. They are not judgments about our willingness to bet or the integrity of our beliefs, although, of course, such judgments (when correct) imply something about the credal and epistemological usefulness of the systems described. If this instrument is more reliable than that one, more sensitive, and (because of careful manufacture) less susceptible to distorting influences, this describes the instrument's capacity for systematically converting input differences into output differences. Similarly, a psychologist's interest in sensory thresholds, powers of discrimination, and the detectability of signals is an interest in the degree or amount of correlation between input and output in a communication system - the probability, in some objective sense, that the input has the properties the

output says it has. As the doctor moves the eye chart farther away, dims the lights, or removes the patient's glasses, the "probability" of a correct identification goes down. This is a conditional inverse probability: the probability that the third letter on the last line of the eye chart is an "F" given that he said it was. If these inverse probabilities are not real probabilities (because, for instance, the condition at the source is not a random variable), then, whatever they are, they come in variable amounts, are characteristic of the system itself (a measure of the resolving power of the eye or the sensitivity of an instrument), and are the properties on which the transmission of real information depends. I see no reason not to think of these probabilities as relative frequencies (among condition types). But two points must be remembered. First, no finite sample need reflect the actual probability. This, of course, is why frequentists need the notion of a limiting relative frequency - to maintain the distinction between accidental correlations and genuine probabilities. Second, the relationship on which the communication of content depends is the lawful dependence of one condition on another - a dependence that precludes the occurrence of one without the other. In other words, I do not want to identify, as Lehrer & Cohen suggest I must, a probability of 1 with a limit of 1 (see Chapter 3 n. 1). Harman is right in asserting that these probabilities are always relative to some framework - the reference class relative to which probabilities are assessed. My use of the open ("unsaturated") expression "s is F" in designating the condition at the source whose probability was in question - as well as my discussion of background knowledge and channel conditions in determining a signal's informational content - was addressed, however, to exactly this point. (He is right, though, in observing that some of my statements on this point are "inexact.") Whether a signal carries the information that s is F does depend, among other things, on what the speaker already knows about the object s. It also depends on what conditions are taken to be stable enough to ignore in assessing the conditional probabilities. If a communication system systematically transforms 0's into 1's (the analog of a chronic but reliable liar), then, as I said, the flow of information is unaffected. My gas gauge consistently reads a quarter-tank too high. That doesn't prevent me from using it to get information about how much gas I have left. If the employees are as reliable in their use of a pink envelope as (I was supposing) they were in their use of the name "Herman," then I don't see why the employer couldn't learn who was selected by the envelope's color.

A probability of 1? My commentators reserve their greatest scorn for my insistence that the communication of informational content (hence, given my definition of knowledge - the acquisition of knowledge) requires a conditional probability of 1. Zero equivocation. Armstrong, a sympathetic and early voice in this project, worries about whether there would be any knowledge if it turned out, as it might, that this was an irreducibly probabilistic world. Kyburg finds such an absolute notion as uninteresting (for real-life problems) as knowledge of the Absolute. Arbib wonders whether scientists, using statistical methods to arrive at the most plausible
hypothesis, will ever know anything on this account; he offers the friendly (?) suggestion that next time out I concern myself with the more pertinent topic of belief and the flow of information. Sosa suspects that I can only achieve these high probabilities by the trivial device of specifying the channel conditions (or the receiver's background knowledge) in such a way as to logically ensure the result. Rozeboom doubts whether this strong interaction of information is of any interest, much less has any application, to real-world investigations of cognitive phenomena. Levi, after reminding me of the ubiquity of random error in all measurement, finds it foolish to restrict attention to perfectly reliable routines for the fixation of belief. Sayre asks whether regularities of this sort are ever found in nature. Is it ever the case, he asks, that a newspaper account renders it 100% probable that the event it reports did, indeed, happen? And Suppes chides me for removing myself from the real world of experience to an idealized world of absolute certainties.

I do not hanker after certainty. It may well be that most of our perfectly reasonable, justified beliefs are based on less than complete information. When my bridge partner opens the bidding with one spade, I have a perfect (epistemic) right to carry on with the belief that he has at least five spades in his hand. If it turns out that he doesn't, he is to blame for violating an understanding between us. He will merit abuse when we land in a completely untenable contract. I was perfectly reasonable in believing he had five spades in his hand. But, given that he sometimes indulges in "psychic" bids (bidding one spade with as few as one or two spades in his hand), I can't know he had five spades when he opens the bidding in this way. How could I? Do I know he isn't "psyching"? How? I am - in other words - perfectly happy to admit that a great deal of what is interesting about the fixation of belief - and this applies especially to science - concerns situations in which decisions, choices, and beliefs are based on less than complete information. But what does this tell us about knowledge? Nothing, so far as I can see. I sometimes get the impression that my commentators think I invented the concept of knowledge, that I concocted these requirements - perhaps because, being a philosopher, I find the real world too messy. But I didn't manufacture these standards. I find them present when I think about the difference we mean to be marking when we contrast knowing that the voltage is somewhere between 4.7 and 5.3 and being perfectly reasonable (at a certain confidence level) in believing that it is. I can get information about a 7-lb. object - that it is, say, over 6 lb. - without getting the information that it is 7 lb. Partial information about the weight of an object is not information (in some attenuated sense) that it is exactly 7 lb. It is, rather, information (in some strict sense) that it is almost 7 lb., over 6 lb., or somewhere between 6 and 8 lb. I value this information as much as anyone, and I agree therefore with Suppes's claims about the usefulness of partial information, but this is not relevant to understanding the conditions that must be satisfied for the communication of informational content. If weighing 7 lb. means weighing exactly 7 lb. (± 0), and I don't think it ever means this, then no device will deliver this information. There are, as my commentators point out, random errors. The probabilities are never 1.
But this, surely, is why we don't speak of knowing that the weight is exactly 7 lb. unless this is meant to contrast (as it typically is) with its weighing 6 or 8 lb. (see Chapter 5 n. 6). If we keep these contrasts in mind, recalling that such implicit contrasts are the way we describe certain intervals, I see no reason to think we can't have a probability of 1. Is no instrument, no matter how reliable and sensitive, capable of telling us (with a probability of 1) that the object weighs between 6.9 and 7.1 lb.? I will be surprised (and disappointed in our technology) to hear that this is so, but if it is so, I won't conclude that we can know with a probability of less than 1. I will conclude that we cannot, with present instruments, know that something's weight lies in this interval. Measurement is a rich source of information. Who could disagree? But the question I was asking myself is what information it supplies. I simply do not understand how one could suppose that one gets the information that the quantity in question is, say, 7 ± .1, when, as Kyburg points out, we know there is a .05 chance that it is not in that interval. This is not to say we are not getting information - valuable information that can be used to guide theoretical speculation. It is to say that we are not getting that information. [See also Kyburg: "Rational Belief" BBS 6(2) 1982.] Do we ever get information from a newspaper about the events of which it purports to inform us? We all know that newspapers aren't infallible; they sometimes publish falsehoods. But we also know that we sometimes make mistakes about the whereabouts of our children. Is this relevant to whether I, while arm wrestling with my son, could be mistaken about his whereabouts? The point is that the relevant reference class is not "stories appearing in newspapers" but "stories on this topic appearing in this newspaper." We are entitled to this more restricted reference class when the reader knows the paper is the Wall Street Journal (not the National Enquirer) and the topic is the performance of the Dow Jones average (not the secret sex life of Nancy Kissinger). As my discussion of background knowledge and channel conditions was meant to indicate, exactly the same considerations affect our assessment of how much - and what - information is available from instruments and through sensory channels. Aside from the theoretical fruitfulness of thinking about informational content in this way, I gave several positive arguments as to why information (and knowledge) must be thought of in this way. I still think these arguments are compelling and I note, with interest, that few of my commentators mentioned them. (I know, I know, what can you do in 1,000 words or less.) Kyburg (not unexpectedly, because the arguments derive from his seminal work in this area) is an exception. So are Ginet and, indirectly, Suppes. Kyburg asks us to consider one of my voltmeters that (I supposed) delivered the information that the voltage drop across a resistor is 7 v. I was assuming, of course, that this meant not exactly 7 v., but 7 v. rather than 5 or 6 v. (once again, see Chapter 5, n. 6). Kyburg wires n of these resistors in series and asks whether we could know, by the conjunction principle (if S knows P and S knows Q, then S knows P and Q) that their combined voltage drop is 7n. If I understand him correctly, he is suggesting that we should be able to know this by the conjunction principle.
Because (according to Kyburg) we cannot know (with a probability of 1) what the combined voltage drop will be (random errors may combine to yield a result significantly different from 7n), my argument, relying as it does on an application of this principle, is defective. This is a misapplication of the conjunction principle. To know that we have 2 gal. of water and that we have 2 gal. of alcohol is not to know what their combined volume will be. All that follows from the conjunction principle is knowledge that there are 2 gal. of water and 2 gal. of alcohol - not what will happen, with regard to volume or anything else, if they are combined. The same is true of the resistors connected in series. Ginet suggests a more interesting case. I use a communication chain to argue that the flow of information requires a probability of 1. He asks what happens as we increase the number of links in this chain. Don't we increase the probability that one or more of the (combined) channel conditions will change, thus interrupting the flow of information, though (according to my account) the information successfully flows from link to link? The answer is that we do not increase the probability of a channel condition's changing when the probability of any such condition's changing is 0. Ginet notes that my discussion of what is to qualify as a channel condition makes this determination a matter of degree. It sounds (I admit it) as though channel conditions are just those that have a small probability of changing. My intent, though, was different: The probability of these changes must be 0. What is a matter of degree is which of those (relevant) conditions that do not happen (modulo k) can be said to have a probability of 0. Conditions that sometimes change (modulo k) are never eligible as channel conditions, but conditions that never change need not be channel conditions. I agree with Suppes that as we extend a communication chain we typically get a "decay" of information. Less information comes out than goes in. The point of my argument, however, was that if a given piece of information is successfully passed from link to link throughout the communication chain, then that piece of information does not decay. To account for this fact we need probabilities of 1 between adjacent links in the chain. Ginet's concern with my invocation of channel conditions (conditions the probability of which is correctly set as 1 for purposes of evaluating the dependency relations between source and receiver) is, however, understandable enough. It looks like a cop-out, a way of effortlessly securing the requisite dependency between source and receiver. If anything can be held constant, everything becomes (artificially) bloated with information. Harman and Sosa agree. They suggest that I impose no significant, noncircular restrictions on what is to qualify as a channel condition and thereby render my account trivial. I don't think I merit this criticism. As my discussion of absolute concepts was meant to show, although one cannot say in a general way what kinds of objects are being excluded in describing something as empty, a discussion of what factors enter into the determination of relevance is useful once one understands that all relevant objects are being excluded. This was the project in Chapter 5. Things aren't so tidy as we might like.
Just when, one wants to ask, is a condition stable enough so that its persistence generates 0 information (probability of 1) and can be ignored in evaluating the degree of dependence between receiver and source? At the same time (I would like to reply) that the poor man ceases to be poor as we keep giving him pennies.

Finally, to return to Armstrong's worry, my discussion of the quantum effects discernible in low-level illumination experiments was meant to illustrate how information (in my sense) could flow despite an inherently probabilistic universe. Even if the sensory effects of a stimulus are completely indeterministic (in the sense of being completely unpredictable from a knowledge of the stimulus and receptor conditions), this does not prevent the sensory effect, if it (unpredictably) occurs, from carrying information about the presence of the stimulus.

Knowledge. Some commentators find my definition of knowledge too weak; others find it too strong. Some find it both. Those who find it too strong are those who object to probabilities of 1. Those who find it too weak offer examples in which the subject is not "aware" of the information in the signal (but is caused to believe nonetheless) or is caused to believe by the relevant information, but in some fortuitous, unreasonable, or deviant way. Alston invites us to consider a person who is caused to believe that s is F by the information that s is F, but who is so undiscriminating in his habits that he is disposed to believe this by a variety of signals lacking the requisite information. It was, he tells us, just "luck" that he got it right, and this isn't knowledge. Rundle wonders about the sort of case in which (unknown to everyone) some situation (e.g., the position of a planet) carries information about people's lives. If someone, for perfectly silly reasons, is led to believe something about his life by the planetary positions, which (as luck would have it) carry this information, does he know? Harman's example of the pink envelope, several of Ginet's examples, and Lehrer & Cohen's problems are similar.

Some of these examples are more troublesome than others. I don't think Alston's example works, for instance, because the person he describes is being caused to believe, not by the properties of the signal that carry the information that s is F, but by other properties of the signal. This person's disposition to believe that s is F whenever s looks even remotely F-like (often when s is not F) suggests that it is s's looking somewhat F-like that is causing the belief and this, according to the example, does not carry the information that s is F. Not everyone who is caused to believe that yonder tree is a maple by a sensory state carrying this information is caused to believe this by those properties of the sensory state that carry this information. If one frequently mistakes other trees for maples, under relevantly similar perceptual conditions, the suspicion is that one doesn't know, because one's belief isn't being caused by the right information. It is important to keep this point in mind since the condition (for knowledge) that the belief be caused by the information-carrying properties of the signal was designed to capture what is worth capturing in the idea that our perceptual beliefs must be reliable to qualify as knowledge. In general, our beliefs do not have to be reliable to qualify as knowledge - not if we type them according to content. If I happen to think (falsely) that Saab automobiles are fancy Volkswagens (hence being very unreliable in my beliefs that s is a Volkswagen), this
does not prevent me from knowing that this car (an easily recognizable VW beetle) is a Volkswagen (see Chapter 4, n. 6). The particular belief qualifies as knowledge, not because beliefs of that type are reliable, but because it is produced by those properties (distinctive silhouette, markings, etc.) that carry the relevant information.

Neither do I think Ginet's second example works. If an erratic optical system often makes chartreuse things look magenta (thereby causing the perceiver to acquire a false belief), I don't see why he shouldn't be credited with knowledge when the system is functioning so as to make chartreuse things look chartreuse. As long as the system doesn't sometimes make magenta things look chartreuse (in which case "looking chartreuse" would be equivocal), it seems to me this is a case of knowledge. I know someone is at my door when the bell rings, even if the bell sometimes malfunctions and doesn't ring (causing me to have the false belief that no one is at the door). The decisive question is whether the bell sometimes rings when no one is at the door.

Lehrer & Cohen's planetarium example exhibits, I think, a confusion between seeing a real star and knowing it is a real star. We can all agree that the sleepy visitor to the planetarium sees a real star, but the question is whether he knows it is a real star (or knows that he sees a real star). The answer seems to me to be "No," and the reason he doesn't is that, given the conditions as described, he is not getting the information that it is a real star. A simulated point of light from the planetarium's projector is a genuine alternative possibility, one that Sopor is (as Lehrer & Cohen admit) in no position to exclude on perceptual grounds. The optical input is equivocal.

This leaves those examples (Rundle, Harman, Ginet, Lehrer & Cohen's first example) that do seem to satisfy my definition of knowledge but that, because of defective background beliefs (compensating false beliefs, etc.), appear not to be genuine cases of knowledge. These examples present a rather mixed bag, and they deserve individual treatment, a luxury I cannot afford in the brief compass of this Response. I attempted to anticipate such cases in my discussion of the technician consulting a pressure gauge in Chapter 5, but my commentators obviously remain unconvinced. Rundle and Ginet correctly anticipate my reaction: One does know if one's belief is caused by the relevant piece of information no matter how one might have acquired the disposition to believe on that basis. I bite the bullet. I tend to think that conflicting intuitions about these examples sometimes involve confusions between someone's knowing something and someone's knowing (or being reasonable in thinking or saying) he knows, but my commentators are too savvy to let me kick this dust in their eyes. Rundle is right when he suggests that my view is more plausible when there is an actual absence of support for one's background beliefs rather than the appeal to reasoning that involves actual error. Still, what shall we say about a person who correctly and confidently predicts the weather by consulting the color of the wooly bear caterpillar's hairs? I know what I now say, but I say that because I don't believe there is any information (in my sense) about the impending weather in the coloration of the caterpillar. But what should be said if we find, much to our surprise, that this is a perfectly reliable indicator, as reliable as our most trustworthy natural signs? I'm not quite sure what to say, but there seems to be a strong case for saying that these prognosticators really did know we were in for a bad winter, and they knew this however they may have acquired their confidence in caterpillar fur as a meteorologically significant indicator. If they got it at their mothers' knee, or from some crazy notions about divine intervention, I can accuse them of having been unreasonable, both in believing their own predictions and in thinking they knew what was going to happen, but this, remember, isn't the issue. The question is whether they knew, not whether they were reasonable in thinking they knew or whether their knowing was a piece of luck.

Barwise suggests that signals carrying information only inform those "attuned" to the nomic relations that make possible its flow. Alston seems to be making a similar point when he alleges that a subject must be "aware" of the information in a signal to be informed by it. I agree with this in one interpretation of "attuned," not in others. I don't have to know the laws whose existence enables me to learn - any more than I have to know the principles of logic in order to argue validly. In the case of language, of course, we must be "attuned" to the conventional relations, but this, I submit, is a matter of learning how to decode information-carrying signals. If you don't understand English, the doctor's uttering "You have the flu" will not inform you, but not because learning English is learning the set of nomic relationships that make this communication informative. Learning English is learning what information these signals carry, not how they carry it, and this, I submit, is a matter of restructuring one's disposition to believe in response to these signals. Barwise's question about Fermat's last theorem is, I concede, a difficulty. I confessed as much in the book. I wish I knew what to say about "necessary" truths, truths which do not generate any information. My only comfort is that hardly anyone else seems to know what to say about them either. This is why I confined myself to perceptual knowledge.

Perception. Doubts are expressed by Bogdan and by Armstrong about my notion of seeing an object without any beliefs - without, as Bogdan says, categorizing the perceptual object in some way. Bogdan thinks my point is trivially true if we are merely talking about the object of physical interaction. That, of course, is what we are talking about, but we are talking about it as the perceptual object. The point is that categorization (in belief) does not make it the perceptual object. Bogdan thinks that a child (presumably also a chipmunk or a crow) must see a daffodil as something in order to see it at all - if not as a daffodil, then as a flower, a colored thing, or whatever. Maybe chipmunks do see daffodils as something, but I wonder whether they would be blind if they didn't. How does one acquire the ability to see things as colored if one cannot, until acquiring this ability, see colored things? I thought my arguments in Chapter 6 were enough to establish the distinction between seeing an A, seeing it as an A, and seeing that it was an A; and some of Bogdan's remarks (though expressed in a way I don't always understand) indicate that he agrees.
He prefers to talk about the values of certain functions but, unless I miss something, these values are precisely those aspects of perception that I also find important: the perceptual object, the sensory representation, and the perceptual belief.

Armstrong isn't impressed by my sharp distinction between perception and belief. He cites the sophisticated processing occurring in perception and concludes that perception seems "belieflike." I agree: Perception does exhibit an "intelligence" that is remarkably belieflike in character. But so does our gastrointestinal tract. The key difference is that in the case of perception we are dealing with an information delivery system, which, because of the propositional character of information, is easily misassimilated to belief. Information, having a content like belief (though, I argue, different from belief) is easily given a ratiomorphic gloss with the result that sensory processes are confused with cognitive processes. Information gives us the propositional "contents" we need to describe the processes involved in the perception of objects (seeing a daffodil). Why go on to characterize the processes as themselves cognitive? What more do we gain by this attribution? I am not, incidentally, denying that our sensory states can be modified by our beliefs, expectations, and values. These processes may be "cognitively penetrable." [See Pylyshyn: "Computation and Cognition" BBS 3(1) 1980.] But that doesn't mean that the processes (or results) of sensory information extraction are themselves belieflike. What appears on my television screen is also modifiable, and often is modified by my beliefs and values, but that doesn't imply that the electronic image or the processes going into its production are themselves belieflike.

Ginet makes some excellent points about my characterization of the perceptual object. He notes that even if a structure (our sensory experience) does not carry the information that my retina is in state R1, it does not follow that it carries no information about the retinal condition. Perhaps it carries the information that the retina is in state R1 or R2 or R3. If we admit disjunctive properties, then it would appear that the sensory experience represents, in a primary way, the retina - not, as I maintain, the distal object. This point troubled me during numerous revisions of Chapter 6. I am still not certain how to express it properly, but the idea is that there are specific properties of the proximal state that lead, causally, to the perceptual object's looking square, to its being represented as a square. Now this (often holistic) property of the proximal event, though causally efficacious in producing the sensory representation, is not itself represented. I tried to express this by speaking of the way our sensory experience carries "highly specific" information about the distal object without carrying the same kind of specific information about its more proximal causal antecedents (p. 165). This talk of specificity was my way of attempting to rule out artificial disjunctive properties. The message may contain information about who sent it without containing information about who brought it, although, of course, if I couldn't have received it unless someone brought it, then receipt of the message also tells me something about its mode of delivery: that it was either A or B or C that brought it.

Concepts and belief. My account of concepts identifies a structure's semantic content with the information it was developed to carry in completely digital form. Loewer
and Sosa think (for different reasons) that on this account one would never be able to acquire concepts applicable to external, objective conditions, because any neurophysiological structure N (a candidate for the concept "x is a dog") will inevitably carry more specific information about other neurophysiological states (presumably those neural antecedents causally involved in the production of N). I don't know what makes Loewer so confident about this. There are an unlimited number of ways of getting the information that s is a dog. Why couldn't a variety of such different causal lines converge on a structure so that the structure itself, though carrying information about the remote source of these lines (the dog), carried no information about which line joins it to the source? It was the point of Figures 6.2 and 6.3 (Chapter 6, pp. 157, 159) to suggest that such patterns for the delivery of information could representationally "skip" the more proximal causal antecedents to yield a structure whose semantic content applied directly to the more distal condition.

Sosa does not see how a neural structure N could directly probabilify an external condition T without the aid of intermediaries. This will seem mysterious unless one carefully distinguishes a causal theory of representation from an informational theory. If T is the remote cause of N, we can (even if we have to use the artificial device of "chunking" an otherwise continuous process) always find causal intermediaries. But this does not mean there are informational intermediaries. N can carry the information that T without carrying the information that I, where I is a causal mediator in this process. Sosa suspects, though, that the only way I can get a neural state I to directly probabilify an external condition T (with a probability of 1) is by tinkering with channel conditions so as to include the conditional that I only if T. If this is so, he argues, then the floodgates swing wide open. And so they do. But I see no reason to accept the claim that this is the only way to achieve the required dependency relations between I and T. Unless I miss something, Sosa is claiming that the only way to get the required probabilities is by converting them into logical truths. I don't agree. I may need the help of channel conditions (and background knowledge) to give a realistic account of when the requisite dependencies obtain, but the dependencies in question are not logical. They are the familiar sort that obtain when I cannot - the "cannot" not being a logical modality - get my lights to go on without throwing the switch.

What, Loewer asks, do I mean by the same type of neural structure? Neurophysiology may count R and R' as tokens of the same type even though they have (on my account) different semantic contents and vice versa. There is, in other words, no assurance that my account will square with projected scientific accounts of our capacities and performance. I think the possible divergence Loewer envisages is unlikely to happen. If it happens, so much the worse for our ordinary folk psychology and the enterprise of explaining behavior in terms of contentful internal states (beliefs, intentions, desires). But the reason I think this unlikely is that the individuation of neurophysiological states is, at least in part, functional. Scientists must keep their eye on the role that these structures play in the processing of sensory information, and this, of course, is precisely the criterion for individuating states that my account recommends. To ignore this constraint on type identification in neurophysiology is to render unintelligible how things work or why they evolved in the way they did. It is like trying to individuate state types of my fuel gauge in terms of how far from Chicago the pointer is. You can do this if you like, but by ignoring the information-carrying role of the pointer (by classifying informationally different states as the same state type), one deprives oneself of any coherent explanation of why the device works the way it does.

The Churchlands correctly point out that information and meaning (their I-theoretic and semantic contents) can diverge radically. A primitive human may have a state whose semantic content is "The Gods are shouting" but whose I-theoretic content is "There was an electrical discharge nearby." We can, as they put it, misconceive what we perceive. So far we differ only in terminology. Where we disagree is over the matter of how information and meaning are related. They say there is no essential connection. By my account the relation is (to put it mildly) intimate. They threaten to hound me to death with counterexamples. Their main objection seems to be that there is a better account available of all these matters - that which ascribes semantic content to a state (type) according to the conceptual/inferential role it plays in one's cognitive economy. This has a nice ring to it, but I wonder what makes a structure's role a conceptual/inferential role. This, it seems to me, begs the question. It presupposes that the structures over which computations are being performed already have a semantic content. Where did they get it? To be told that a state or structure has the semantic content that P if it plays the same inferential role as my belief that P plays in my cognitive economy is to leave one wondering what makes my neural structures play an inferential role or participate in a cognitive economy. It sounds like magic: signifying something by multiplying sound and fury. Unless you put cream in, you won't get ice cream out no matter how fast you turn the crank or how sophisticated the "processing." The cream, in the case of a cognitive system, is the representational role of those elements over which computations are performed. And the representational role of a structure is, I submit, a matter of how the elements of the system are related, not to one another, but to the external situations they "express."

How can a structure acquire a semantic content and then have a false instantiation? Sayre objects to my "double-talk" about false belief. According to my account of the matter, a structure can acquire the content "x is F" only if its tokens are confined to situations in which something is F. If we could token this structure type without anything's being F, then the structure would not carry the information required for it to have this content. There does (I admit) appear to be some sleight of hand at work here, but I thought I had all my cards on the table - even if some of them fell out of my sleeve. A structure's semantic content was identified with its informational content modulo a certain restricted set of conditions: those conditions that obtained during learning. What makes a gauge "mean" that the tank is half-full (even when it isn't) is that, under certain conditions, that is the information it carries. In the case of cognitive systems these "certain" conditions are defined by the learning situation. What information did the system acquire a sensitivity to under those conditions? That is the concept the system has. If it misapplies that concept in an enlarged set of conditions, then it has a false belief.

Cummins is skeptical about this answer and, together with Sosa and Loewer, notes that my account of what people believe involves a hefty dose of arbitrariness. Cummins asks why we should identify a structure token as a false instantiation of the concept yellow rather than, say, a true instantiation of the concept looks yellow. If there is no principled way of specifying the learning conditions, there is no principled way to say what someone believes - what concept is acquired and later falsely instantiated. Sosa notes (correctly) that the content of an internal state can vary from 0 at one extreme to the Library of Congress (and beyond) at the other. It depends on how one fixes channel conditions and on the learner's background knowledge. He concludes that there is no such thing as what one believes neat.

These are incisive comments. They help illustrate a point I tried to make in Chapter 9. Whatever "softness" there is in our determination of channel conditions will infect the determinateness of what someone believes. That is, the same relativity that appeared in the discussion of knowledge, a relativity associated with the relevance of alternative possibilities, appears again in the discussion of belief. These relativities neutralize one another, however, in an epistemologically interesting way (p. 229). Does the frog have the concept bug or just the concept of a moving-dark-spot? Or neither? This depends on how generous we are about channel conditions. (Some of the same issues arise when we ask whether an altimeter represents the altitude or just the pressure.) If we take the conditions prevailing in the frog's natural habitat as part of the conditions constituting the channel of communication, then - as there are no shadow simulations of bugs in this environment - we can credit the frog with the concept of a bug or whatever it can, in this environment, reliably identify. That is, we can, and do, assign concepts that faithfully reflect a creature's cognitive resources for applying them. If, however, we enlarge our viewpoint, including the scientist's laboratory as a relevant condition (where frogs are fooled by carefully contrived shadows), then, admittedly, the frog cannot know that anything is a bug (since it is incapable of getting this information); but, for the same reason, neither can the creature acquire the concept of a bug. The restrictions on knowledge and concept formation (at least for simple concepts) cancel out, so to speak, so that a creature can only believe what it possesses the information-gathering resources for knowing. This is the upshot of the joint indeterminacy (or, if you will, relativity) in knowledge and belief. But I take this epistemological consequence to be a powerful argument in favor of the informational account, not an objection to it. The account neatly dissolves an otherwise troubling skeptical objection to the possibility of knowledge. It transforms it into a question about what we can believe.

Editorial note: An instance of botanomic laxity - if not referential opacity - seems to have crept into these pages in the indiscriminate way in which the author and various commentators have spoken of daisies and daffodils.
I take it that the point at issue (concerning whether the child in question sees the flower in question as the flower in question) does not hinge on any critical (generic or specific) differences between B. perennis and N. tazetta.

References
Armstrong, D. M. (1968) A materialist theory of the mind. Routledge & Kegan Paul. [DMA]
(1973) Belief, truth and knowledge. Cambridge University Press. [DMA, ta FID]
Attneave, F. (1959) Applications of information theory to psychology: A summary of basic concepts, methods and results. Henry Holt. [ta FID, IJG]
Averbach, E. & Coriell, A. S. (1961) Short-term memory in vision. Bell System Technical Journal 40:309-28. [ta FID]
Bar-Hillel, Y. (1964) Language and information. Addison-Wesley. [ta FID]
Barwise, J. (1981) Scenes and other situations. Journal of Philosophy 78:369-97. [JB]
Berlyne, D. E. (1972) Aesthetics and psychobiology. Appleton. [RNH]
Carnap, R. (1971) A basic system of inductive logic, part 1. In: Inductive logic and probability, vol. 1, ed. R. Carnap & R. Jeffrey. University of California Press. [KL]
Cherry, E. C. (1951) A history of the theory of information. Proceedings of the Institute of Electrical Engineers 98:383-93. [ta FID]
Churchland, P. M. (1979) Scientific realism and the plasticity of mind. Cambridge University Press. [PMC]
(1982) Functionalism, qualia, and intentionality. Philosophical Topics 12 (no. 1):121-45. [PMC]
Churchland, P. S. (1983) Stalking the wild epistemic engine. Nous. In press. [PMC]
Dennett, D. (1969) Content and consciousness. Routledge & Kegan Paul. [ta FID]
Dretske, F. (1969) Seeing and knowing. University of Chicago Press. [ta FID]
(1970) Epistemic operators. Journal of Philosophy 67:1007-23. [ta FID]
(1971) Conclusive reasons. Australasian Journal of Philosophy 49:1-22. [ta FID]
(1977) The laws of nature. Philosophy of Science 44:248-68. [ta FID]
(1981) Knowledge and the flow of information. MIT Press/Bradford Books.
Erman, L. D. & Lesser, V. R. (1980) The HEARSAY-II speech understanding system: A tutorial. In: Trends in speech recognition, ed. Wayne A. Lea, pp. 361-81. Prentice-Hall. [MAA]
Field, H. (1977) Logic, meaning, and conceptual role. Journal of Philosophy 74:379-409. [PMC]
Fodor, J. (1980) Methodological solipsism considered as a research strategy in cognitive psychology. Behavioral and Brain Sciences 3:63-110. [PMC, ta FID]
Garner, W. R. (1962) Uncertainty and structure as psychological concepts. Wiley. [tar FID, RNH]
Gettier, E. (1963) Is justified true belief knowledge? Analysis 23:121-23. [ta FID, CG, KL]
Gibson, J. (1950) The perception of the visual world. Houghton Mifflin. [ta FID]
(1966) The senses considered as perceptual systems. Houghton Mifflin. [ta FID]
(1979) The ecological approach to visual perception. Houghton Mifflin. [RJB, ta FID]
Goldman, A. (1976) Discrimination and perceptual knowledge. Journal of Philosophy 73:771-91. [WPA, ta FID]
Good, I. J. (1950) Probability and the weighing of evidence. Hafner. [IJG]
(1957) The approximate and mathematical tools for describing and measuring uncertainty. In: Uncertainty and business decisions, ed. G. F. Carter, G. P. Meredith & G. L. S. Shackle, pp. 20-36. Liverpool University Press. [IJG]
(1959) Could a machine make probability judgments? Computers and Automation 8:14-16, 24-26. (Now Computers and People) [IJG]
(1965) Speculations concerning the first ultra-intelligent machine. Advances in Computers 6:31-88. [IJG]
(1966) A derivation of the probabilistic explication of information. Journal of the Royal Statistical Society B 28:578-81. [IJG]
(1977) Dynamic probability, computer chess, and the measurement of knowledge. In: Machine intelligence 8, ed. E. W. Elcock & D. Michie, pp. 139-50. John Wiley & Sons. [IJG]

Haber, L. R. (1975) The muzzy theory. In: The proceedings and papers of the 11th annual meeting of the Chicago Linguistic Society, ed. R. Grossman, pp. 240-56. University of Chicago Press. [RNH]
(1976) Leaped and leapt: A theoretical account of linguistic variation. Foundations of Language 14:211-38. [RNH]
Haber, R. N., ed. (1969) Information processing approaches to visual perception. Holt, Rinehart & Winston. [ta FID]
(1974) Information processing. In: Handbook of perception, vol. 1, Historical and philosophical roots of perception, ed. E. C. Carterette & M. P. Friedman, pp. 313-33. Academic Press. [RNH]
Haber, R. N. & Hershenson, M. (1973) The psychology of visual perception. Holt, Rinehart & Winston. [ta FID]
Harman, G. (1975) Meaning and semantics. In: Semantics and philosophy, ed. M. Munitz & P. Unger, pp. 11-16. New York University Press. [PMC]
Hartley, R. V. L. (1928) Transmission of information. Bell System Technical Journal 7:535-63. [KMS]
Hoyt, W. G. (1980) Planets X and Pluto. University of Arizona Press. [MAA]
Kyburg, H. (1961) Probability and the logic of rational belief. Wesleyan University Press. [ta FID, HEK]
(1965) Probability, rationality, and the rule of detachment. In: Proceedings of the 1964 International Congress for Logic, Methodology, and Philosophy of Science. North-Holland. [MAA, ta FID, HEK]
Lehrer, K. (1970) Justification, explanation, and induction. In: Induction, acceptance, and rational belief, ed. M. Swain. Reidel. [KL]
Levi, I. (1965) Deductive cogency in inductive inference. Journal of Philosophy 62:68-77. [IL]
(1967a) Gambling with truth. Knopf. [IL]
(1967b) Information and inference. Synthese 17:369-91. [IL]
(1980) The enterprise of knowledge. MIT Press. [IL]
Lewis, D. (1979) Counterfactual dependence and time's arrow. Nous 13:455-76. [BL]
Lindsay, P. H. & Norman, D. A. (1972) Human information processing. Academic Press. [ta FID]
Loewer, B. (1982) Review of Knowledge and the flow of information. Philosophy of Science 49:297-300. [PS]
MacKay, D. M. (1969) Information, mechanism and meaning. MIT Press. [ta FID]
Michie, D. (1977) A theory of advice. In: Machine intelligence 8, ed. E. W. Elcock & D. Michie, pp. 151-68. John Wiley & Sons. [IJG]
(in preparation) Measuring the knowledge content of advice. [IJG]
Miller, G. A. (1953) What is information measurement? American Psychologist 8:3-11. [ta FID]
(1956) The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63:81-97. [ta FID, RNH]
Neisser, U. (1967) Cognitive psychology. Appleton-Century-Crofts. [ta FID]
Rock, I. (1975) An introduction to perception. Macmillan. [ta FID]
Rozeboom, W. W. (1983) Do mental systems have a scientific future? Annals of Theoretical Psychology 3. [WWR]
Rumelhart, D. E. (1977) Introduction to human information processing. John Wiley & Sons. [ta FID]
Sayre, K. (1965) Recognition: A study in the philosophy of artificial intelligence. University of Notre Dame Press. [ta FID]
Searle, J. (1980) Minds, brains, and programs. Behavioral and Brain Sciences 3:417-57. [RC]
Shannon, C. (1948) A mathematical theory of communication. Bell System Technical Journal 27:379-423, 623-56. [KMS]
Shannon, C. & Weaver, W. (1949) The mathematical theory of communication. University of Illinois Press. [ta FID, BL]
Shortliffe, E. H. (1976) Computer-based medical consultations: MYCIN. Elsevier. [MAA]
Sperling, G. (1960) The information available in brief visual presentations. Psychological Monographs 74:11. [ta FID]
Stich, S. (1982) On the ascription of content. In: Thought and object, ed. A. Woodfield, pp. 153-206. Clarendon Press. [PMC]
Uhr, L. (1973) Pattern recognition, learning, and thought. Prentice-Hall. [ta FID]
Weaver, W. (1948) Probability, rarity, interest and surprise. Scientific Monthly 67:390-92. [IJG]

90

THE BEHAVIORAL AND BRAIN SCIENCES (1983) 1

You might also like