
CORRECTED MARCH 18, 2010; SEE LAST PAGE

Journal of Experimental Psychology: Learning, Memory, and Cognition, 2010, Vol. 36, No. 2, 298–323. © 2010 American Psychological Association. 0278-7393/10/$12.00 DOI: 10.1037/a0018705

Conditional Reasoning in Context: A Dual-Source Model of Probabilistic Inference


Karl Christoph Klauer, Sieghard Beller, and Mandy Hütter
Albert-Ludwigs-Universität Freiburg
A dual-source model of probabilistic conditional inference is proposed. According to the model, inferences are based on 2 sources of evidence: logical form and prior knowledge. Logical form is a decontextualized source of evidence, whereas prior knowledge is activated by the contents of the conditional rule. In Experiments 1 to 3, manipulations of perceived sufficiency and necessity mapped on the parameters quantifying prior knowledge. Emphasizing rule validity increased the weight given to form-based evidence relative to knowledge-based evidence (Experiment 1). Manipulating rule form (only-if vs. if-then) had a focused effect on the parameters quantifying form-based evidence (Experiment 3). The model also provides a parsimonious description of data from the so-called negations paradigm and adequately accounts for polarity bias in that paradigm (Experiment 4). Relationships to alternative conceptualizations of conditional inference are discussed.

Keywords: conditional reasoning, probabilistic reasoning, dual-process model

Recent years have seen an increased interest in the role of prior knowledge in conditional reasoning. Such work is characterized by a couple of features: Prior knowledge is activated either through the use of contents with which participants have prior experience (e.g., Beller, 2008; Beller & Spada, 2003; Cummins, Lubart, Alksnis, & Rist, 1991; Thompson, 1994) or by providing explicit prior information about the relationships between the antecedent and consequent of a conditional rule, often in the form of bivariate frequency information (e.g., Evans, Handley, & Over, 2003; Oaksford, Chater, & Larkin, 2000; Oberauer & Wilhelm, 2003). In contrast, earlier work frequently relied on so-called abstract materials, for which little prior information is presumably available. Simultaneously, instructions in earlier work tended to stress the notion of logical necessity, according to which a conclusion can be drawn only if it logically follows from the given premises, whereas the recent work more frequently relies on a graded response format in which the acceptability, plausibility, or probability of the conclusion given the premises has to be assessed (e.g., Liu, Lo, & Wu, 1996; Oaksford et al., 2000). We refer to the recent line of research as research on probabilistic conditional inference. In this article, we propose a dual-source model of probabilistic conditional inference. According to the model, inferences are based on two sources of evidence: logical form and prior knowledge. Logical form is a decontextualized source of evidence, whereas prior knowledge is activated by the contents of

the conditional rule. The dual-source hypothesis is contrasted with the view that probabilistic conditional reasoning draws primarily on prior knowledge. In this view, exemplified by Oaksford et al.'s (2000) probabilistic model of conditional inference, the role of the conditional rule is to alter the knowledge base from which inferences are derived. In the introduction, we review existing evidence for two qualitatively distinct modes of reasoning, one based on logical form, the other based on prior knowledge. This is followed by a review of research on the role of logical form in probabilistic conditional inference. The research suggests that both prior knowledge and logical form play a role in such inferences. This hypothesis is then formally specified by means of the dual-source model. In four experiments, the dual-source model is evaluated in terms of its ability to fit the data, in terms of whether the effects of experimental manipulations targeted at specific model parameters indeed affect these parameters as expected, and in terms of whether the model adequately reproduces critical data patterns observed in probabilistic conditional inference. In each experiment, the performance of the dual-source model is compared with that of Oaksford et al.'s one-source model.

Four conditional inferences are typically studied for a conditional rule of the form if p then q:

Modus ponens (MP): Given the rule and p, it follows that q.
Modus tollens (MT): Given the rule and not-q, it follows that not-p.
Affirmation of the consequent (AC): Given the rule and q, it follows that p.
Denial of the antecedent (DA): Given the rule and not-p, it follows that not-q.

Karl Christoph Klauer, Sieghard Beller, and Mandy Hütter, Institut für Psychologie, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany. The research reported in this article was supported by Grant Kl 614/31-1 from the Deutsche Forschungsgemeinschaft to Karl Christoph Klauer. Correspondence concerning this article should be addressed to Karl Christoph Klauer, Institut für Psychologie, Sozialpsychologie und Methodenlehre, Albert-Ludwigs-Universität Freiburg, D-79085 Freiburg, Germany. E-mail: klauer@psychologie.uni-freiburg.de


Each of these consists of the major premise (i.e., the rule), the minor premise (e.g., p for MP), and a conclusion (e.g., q for MP). Under a traditional interpretation of the conditional rule (p is sufficient, but not necessary, for q; Evans & Over, 2004), MP and MT are logically valid inferences, whereas AC and DA are not logically valid.

Let us briefly review results obtained with abstract or arbitrary rule contents and instructions stressing logical necessity. Endorsement rates are typically close to 100% for MP, whereas the AC, DA, and MT inference rates vary from 23%, 17%, and 39% to 89%, 82%, and 91%, respectively (Schroyens, Schaeken, & d'Ydewalle, 2001). Across studies, MP is accepted significantly more frequently than MT and than AC; MT and AC are accepted significantly more frequently than DA; the difference between MP and MT is significantly and substantially larger than that between AC and DA (Schroyens et al., 2001); and the difference between AC and DA is often not significant in individual studies (Evans, 1993; O'Brien, Dias, & Roazzi, 1998). Taken together, acceptance rates tend to be ordered as MP > MT > AC > DA. Some of the variability in the inference rates reflects the fact that participants sometimes adopt a biconditional rather than conditional interpretation of if p then q. The biconditional interpretation (p is sufficient and necessary for q) justifies acceptance of all four inferences. But many procedural variations seem to play a role in shaping the profile of acceptance rates (Evans & Over, 2004, Chapter 3; Schroyens et al., 2001). As pointed out by Schroyens et al. (2001), these results are consistent with a (revised version of the) mental model theory (Johnson-Laird, Byrne, & Schaeken, 1992) and the theory of mental rules (e.g., Rips, 1994).

Studies using contents for which prior knowledge is available also address these inferences and typically require ratings of plausibility, probability, or confidence in the truth of the conclusion. In such studies, there is even more variability in the profiles of ratings over the four inferences, but two major variables account for much of it: perceived sufficiency of p for q and perceived necessity of p for q. Perceived sufficiency and necessity have been assessed in different ways; one possibility is to have participants rate sentences such as "It is necessary for p to happen in order for q to happen" (perceived necessity) and "p happening is enough to ensure that q will happen" (perceived sufficiency; Thompson, 1994, p. 745). Thompson (1994) systematically chose contents differing in perceived sufficiency and necessity from different domains dealing with causal relationships, permissions, obligations, and definitions. Across these domains, she consistently found strong effects of perceived sufficiency on the acceptance of MP and MT and strong effects of perceived necessity on DA and AC: Perceived sufficiency and MP and MT acceptability are monotonically related, as are perceived necessity and DA and AC acceptability. Figure 1 (middle panel) illustrates the typical results with data compiled from experiments by Liu (2003; Experiments 1a and 2), which we use as an illustrative example throughout the introduction. Perceived sufficiency and necessity were varied in three steps: high (H), medium (M), and low (L), creating six rules HL, ML, LL, LH, LM, and LL, with the first letter referring to degree of sufficiency and the second letter to degree of necessity.
Contents with high sufficiency of p for q (HL) received the highest MP and MT ratings, followed by contents with medium sufficiency (ML), whereas contents with high necessity of p for q (LH)

received the highest AC and DA ratings, followed by contents with medium necessity (LM). A distinction related to perceived necessity and sufficiency is that between alternative antecedents and disabling conditions. An alternative antecedent is an event distinct from p which is sufficient for q; for example, an alternative antecedent for the rule If a stone is thrown at a window, it will break is to fire a gun at a window. A disabling condition is a condition that prevents q from happening in the presence of p (e.g., the window is made of Plexiglas). Alternative antecedents thus undermine perceived necessity, and disabling conditions undermine perceived sufficiency. Perceived necessity and availability of disablers as well as perceived sufficiency and availability of alternatives are relatively highly correlated (Verschueren, Schaeken, & d'Ydewalle, 2005a), and they engender similar effects on the profiles of ratings over the four conditional inferences (e.g., Cummins et al., 1991; De Neys, Schaeken, & d'Ydewalle, 2003).

One or Two Modes of Reasoning?


One issue raised by the two different lines of research is whether they reflect the operation of two distinct modes of reasoning. The first mode capitalizes on prior knowledge about the particular p and q related by the rule. The second mode capitalizes on the logical form of the proposed inference, irrespective of content.

Consider the first mode. When confronted with premises and a proposed conclusion, reasoners might sample their long-term memory, weighing memory traces of situations in which the conclusion and the minor premise were simultaneously fulfilled (e.g., cases with p and q for MP) against those in which the conclusion was not fulfilled, but the minor premise was in force (i.e., cases with p and not-q), retrieving counterexamples in the form of alternatives (i.e., factors other than p that entail q) and/or disablers (i.e., factors that prevent q from happening in the presence of p). These bits of information are then integrated and condensed into a rating of plausibility, probability, or confidence in the truth of the conclusion as the task may require (e.g., Verschueren et al., 2005a). In contrast, the second mode might reflect an effort to judge the logical validity of the proposed inference through an algorithm such as specified in mental model theory (Johnson-Laird et al., 1992; Schroyens et al., 2001) or the theory of mental rules (e.g., Rips, 1994): an algorithm that is endowed with the competence to provide decontextualized, logically correct assessments of the inferences but is fallible due to capacity constraints, misinterpretations, intrusions of strong knowledge-based associations, and a host of other factors. The first mode is what is tapped in studies using probabilistic instructions and response formats; the second mode is more strongly implicated in studies using instructions emphasizing logical necessity and abstract content. Note that both modes are likely to involve multiple, dissociable processes when scrutinized more closely (see, e.g., Verschueren et al., 2005a, for a dual-process characterization of the first mode). For lack of better terms, let us refer to the first mode as knowledge-based and to the second mode as form-based (Beller & Spada, 2003).

A few studies have addressed this issue head-on by contrasting instructions intended to elicit the knowledge-based mode with instructions intended to elicit the form-based mode while

[Figure 1 appears here. Three panels (Without Rule, With Rule, Model Parameters) plot values from 0 to 100 on the y-axis (probability ratings or parameter estimates in percent) against the inferences MP, MT, AC, and DA on the x-axis.]

Figure 1. Reanalysis of data from Experiments 1a and 2 by Liu (2003). The left and middle panels show mean ratings for problems without rule and problems with rule, respectively, as a function of content (HL, ML, LL, LH, LM, and LL) and inference (MP [modus ponens], MT [modus tollens], AC [affirmation of the consequent], and DA [denial of the antecedent]). The right panel shows mean parameter estimates (multiplied by 100) of the knowledge parameters ξ for the dual-source model. The τ parameters are also shown in this panel, with values marked as τ. HL, ML, LL, LH, LM, and LL refer to high (H), medium (M), and low (L) degrees of perceived sufficiency (first letter in the pair) and perceived necessity (second letter in the pair).

keeping rule contents constant. Thus, Markovits and Handley (2005; see also Markovits & Thompson, 2008) compared (a) an inductive condition with probabilistic response format ("How probable is it that the conclusion follows?") with (b) a deductive condition with a binary response format ("Is it certain that the conclusion follows?"). Markovits and Handley noted that their results are consistent with a particular one-mode model invoking only the knowledge-based mode of reasoning. In this threshold model, an inference is accepted in the deductive condition if and only if it receives a very high probability rating in the inductive condition. In other words, in the threshold

model, subjective probability underlies the responses in both the inductive and the deductive condition, but the binary judgments in the deductive condition are generated by placing a high threshold on an underlying probability scale. Such threshold models were explicitly tested by Rips (2001) and Heit and Rotello (2005, 2008; Rotello & Heit, 2009). Like Markovits and Handley (2005), these authors compared inductive and deductive instructions using the same problems. Rips presented inferences that were either logically valid or not. Orthogonally, the inferences were either plausible in the light of prior knowledge or


implausible. When comparing deductively valid, but implausible, problems with deductively invalid, but plausible, problems, a double dissociation emerged: Under the deductive instruction, deductively valid, but implausible, problems were more frequently accepted than deductively invalid, but plausible, problems, and vice versa under the inductive instruction. This reversal is not compatible with a threshold model in which responses under both kinds of instruction are generated from a single underlying probability scale with different thresholds. Heit and Rotello (2005, 2008; Rotello & Heit, 2009) provided further tests of the threshold model couched in a signal-detection framework. For example, they found that the signal-detection parameter d′ for the participants' ability to discriminate between the logically valid and invalid problems was larger under the deductive instruction than under the inductive instruction, which is also inconsistent with a threshold model postulating only one mode of reasoning. Heit and Rotello (2008; Rotello & Heit, 2009) argued that a two-dimensional signal-detection model that draws on two different sources of evidence (consistency with background knowledge and perceived deductive correctness) is capable of accounting for the observed dissociations between the inductive and the deductive condition. These findings, although not dealing with conditional reasoning in particular, strongly suggest that reasoners avail themselves of two modes of reasoning that draw on different sources of evidence (background knowledge and logical correctness) and that reliance on one source versus the other depends upon the mode of reasoning stressed by instructions (see also Evans & Over, 2004, Chapters 8 and 9, for a similar point of view in the context of conditional reasoning).

Probabilistic Conditional Reasoning and Mode of Reasoning


The findings considered so far remain silent with regard to the possibility of an involvement of the form-based mode of reasoning in probabilistic conditional reasoning. There is in fact relatively little evidence for a contribution of the form-based mode of reasoning to probabilistic conditional reasoning. Strong effects of variables such as perceived necessity/sufficiency and availability of disablers/alternatives attest, however, that the knowledge-based mode is involved (e.g., Thompson, 1994; Verschueren et al., 2005a; Verschueren, Schaeken, & d'Ydewalle, 2005b). In all of these studies, it is usually possible and even perfectly natural to assess perceived necessity/sufficiency and the availability of alternatives/disablers without reference to a conditional rule.1 Simultaneously, it is natural in knowledge-rich contexts to pose the questions corresponding to the above conditional inferences with the conditional rule left out. Consider the rule If a stone is thrown at a window, then the window breaks. MP without rule would simply ask: A stone is thrown at a window. What is the probability that the window breaks? A few studies have presented the inferences without rule (George, 1995; Liu, 2003; Liu et al., 1996; Matarazzo & Baldassarre, 2008; see also Beller, 2008; Beller & Kuhnmünch, 2007, for similar experiments with deductive instructions) and found effects of perceived sufficiency and necessity analogous to those observed when a conditional rule is stated. Consider, for example, results from the studies by Liu (2003) shown in Figure 1. The left panel of Figure 1 presents the probability ratings without conditional rule

as a function of perceived necessity and sufficiency (with labels such as HL already explained above). As when a conditional rule is present, perceived sufficiency of p for q (varied across contents HL, ML, and LL) is monotonically related to what would be MP and MT ratings if the rule if p then q had also been presented, and perceived necessity (varied across contents LH, LM, and LL) to AC and DA ratings. This suggests that we may not need a conditional rule to account for the effects observed in probabilistic conditional reasoning. Without rule, there is, however, no logical form on which the form-based mode of reasoning could operate. This also raises a methodological issue: Studies that do not implement a baseline condition without rule are inherently ambiguous with regard to a possible causal role of the conditional rule. In the rare instances in which such a baseline was obtained, the effects of the presence of a rule were generally found to be small (see also Beller, 2008). For example, in the studies by Liu (2003), most of the variance in ratings without rule and with rule is accounted for by perceived necessity and sufficiency (compare left and middle panels of Figure 1). Another relevant approach is to manipulate how content is mapped onto logical form. For example, Cummins (1995) and Thompson (1994) manipulated clause order in the rule, comparing inferences with rules If p then q versus If q then p, with minor premise and conclusion held constant. Formally, what are MP and MT with regard to the first rule are AC and DA, respectively, for the second rule, and vice versa. The effects of rule form were small relative to the dominant effects of perceived necessity and sufficiency (Thompson, 1994) and disabling conditions and alternative antecedents (Cummins, 1995). Finally, Stevenson and Over (2001) looked at MP and MT inferences and manipulated the expertise of a speaker positing the major premise and thereby the perceived validity of the rule. For example, the rule might be If Bill has typhoid, then he will make a quick recovery. In one condition, the rule would be uttered by a professor of medicine, in another condition by a first-year medical student. Both MP and MT conclusions were assigned greater subjective certainty when the major premise was asserted by the expert than by the novice. Taken together, there seem to be effects of rule presence, rule form, and rule validity (manipulated via speaker expertise), even if they tend to be small relative to the effects of background knowledge itself. How can the effects of logical form be explained? As discussed above, reasoners can in principle access evidence from two sources: consistency with background knowledge and perceived deductive correctness. One possibility is therefore that reasoners draw on both of these sources where possible (i.e., where there is relevant background knowledge and a conditional rule is present) and that their judgments then integrate evidence from both sources with
1 For example, considering p = A stone is thrown at a window and q = The window breaks, one might ask participants to assess, without mentioning a conditional rule, the perceived likelihood that p is sufficient for q to occur (perceived sufficiency of p for q) and the perceived likelihood that p is required for q to occur (perceived necessity of p for q). Similarly, participants can be asked to generate alternative antecedents as factors distinct from p that lead to q, and disabling conditions as conditions that prevent q from occurring in situations with p, without reference to any conditional rule whatsoever.


weights determined by instructions and other factors (Heit & Rotello, 2008; Rotello & Heit, 2009). In this view, both the knowledge-based and the form-based modes of reasoning are engaged, and they jointly determine participants' ratings even in probabilistic conditional reasoning. Another possibility is firmly couched in the knowledge-based mode of reasoning and exemplified by Oaksford et al.'s (2000) probabilistic model of conditional inference. In their view, the effect of the conditional rule is to alter the knowledge base from which probability ratings are derived. Specifically, a conditional rule serves to depress the perceived probability of cases with p and not-q below the levels that may be in force when no rule is stated (Oaksford & Chater, 2007, p. 164), as elaborated below. The purpose of the present article is to explore the first possibility, which we refer to as the dual-source hypothesis (Beller & Spada, 2003). For this purpose, the dual-source hypothesis is sharpened to the point where it is specified as a computational model of probabilistic conditional reasoning with the same degree of specification and elaboration as the well-established one-source model by Oaksford et al. (2000). We then present the results of four empirical studies evaluating the ability of the resulting dual-source model to account for major findings from probabilistic conditional reasoning and compare its performance with that of Oaksford et al.'s model. We focus on that model for the comparison because it is the only one that makes precise quantitative predictions for our data. To foreshadow, both models performed reasonably well in accounting for our data, and we conclude that the dual-source model is thereby at a minimum established as a viable alternative to the dominant knowledge-based view of conditional reasoning (e.g., Geiger & Oberauer, 2007; Liu et al., 1996; Oaksford & Chater, 2007; Oaksford et al., 2000; Verschueren et al., 2005a) as exemplified here by Oaksford et al.'s model. In several respects, however, the dual-source model outperformed Oaksford et al.'s model.


Oaksford et al.'s (2000) Model of Probabilistic Conditional Reasoning


Oaksford et al. (2000) presented an impressive model of probabilistic conditional reasoning. It is a model for the subjective probabilities of the conclusions in the above conditional inferences. The idea is that conclusion probability is the perceived conditional probability of the conclusion given the premises. In particular, the probability ratings of conditional inferences MP, MT, AC, and DA are predicted by the formulae for the conditional probabilities of the conclusion given the minor premise, in order, P(q|p), P(not-p|not-q), P(p|q), and P(not-q|not-p) for the if p then q rule. These conditional probabilities can be expressed in terms of three parameters, a, b, and e. Parameter a is the perceived probability of p events, b is the perceived probability of q events, and e the conditional probability of not-q given p. Parameter e is termed the exceptions parameter because cases with p, but without q, are exceptions from the rule. The values of these parameters summarize the reasoner's knowledge about the particular content (e.g., stones and windows) that is used for the rule and premises. We refer to the parameters as a(C), b(C), and e(C) to highlight that different parameters are required for rules that differ in content C. Using elementary probability calculus, the relevant probabilities are as follows:

MP: P(q|p) = 1 − e(C),
MT: P(not-p|not-q) = [1 − b(C) − a(C)e(C)] / [1 − b(C)],
AC: P(p|q) = a(C)[1 − e(C)] / b(C),
DA: P(not-q|not-p) = [1 − b(C) − a(C)e(C)] / [1 − a(C)].   (1)
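To make these formulae concrete, here is a minimal sketch (in Python; the function name and the illustrative parameter values are our own, not taken from the article) that computes the four predicted ratings from a content's a, b, and e parameters.

```python
def oaksford_predictions(a, b, e):
    """Predicted conclusion probabilities for MP, MT, AC, and DA (Equation 1).

    a = P(p), b = P(q), e = P(not-q | p); the values must describe a proper
    bivariate distribution (in particular, b >= a * (1 - e) and b + a * e <= 1).
    """
    return {
        "MP": 1 - e,
        "MT": (1 - b - a * e) / (1 - b),
        "AC": a * (1 - e) / b,
        "DA": (1 - b - a * e) / (1 - a),
    }

# Illustrative (hypothetical) values: p fairly rare, q common, few exceptions.
print(oaksford_predictions(a=0.3, b=0.6, e=0.1))
```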

These equations in themselves impose only mild conditions on the probability ratings, namely that prior knowledge bearing on the different conditional questions is consistent with a bivariate probability distribution relating p and q of that content. As a consequence, observed ratings should obey the laws of the probability calculus. For example, the probability rating for q given p as assessed in MP should be one minus the probability rating for not-q given p assessed in the so-called converse inference MP′ that presents the rule, minor premise p, and the negated conclusion not-q. That is, to the extent to which q is accepted given p, not-q should be rejected. There is evidence, from studies using abstract conditionals, suggesting that this consistency assumption is an oversimplification (Handley & Feeney, 2003, Experiment 3), but it is an integral part of any model couched in a probability metric, including the dual-source model presented below, and it seems to provide at least a good first approximation of the data from probabilistic conditional reasoning (Oaksford et al., 2000). Oaksford et al.'s (2000) model makes the additional, more restrictive assumption that the exceptions parameter e(C) is small, although it may still vary from content to content, being derived from the reasoner's current state of knowledge. This directly leads to high predicted ratings for MP, and it implies a positive contingency of p and q in the 2 × 2 contingency table crossing p and not-p with q and not-q. The model has been fit to many data sets both in the domain of abstract conditional reasoning and in the domain of probabilistic conditional reasoning (Oaksford & Chater, 2007). For these applications, different parameters a, b, and e are usually estimated for each rule.

The Dual-Source Model


The dual-source hypothesis (Beller & Spada, 2003) claims that reasoners draw on two sources of evidence. One source is the signal derived from relevant background knowledge. It encodes the degree of consistency of the conclusion given the premises with background knowledge. The second source is perceived logical correctness of the proposed inference, derived from logical form. As is evident from the studies of conditional reasoning with abstract content, perceived logical correctness is influenced by many factors such as processing constraints and interpretation of the premises. As reviewed above, these studies suggest that the inferences are perceived as logically valid with acceptance rates ordered as MP > MT > AC > DA. When one of these inferences is not accepted, the conclusion is typically seen as neither necessitated nor logically forbidden by the premises (Evans, Newstead, & Byrne, 1993, Chapter 2). In studies using abstract content, the response "maybe" or "don't know" might then be given to express the resulting uncertainty about the conclusion. In such cases, participants have little choice other than


to fall back on extralogical sources of information such as their prior knowledge to assess the plausibility or probability of the conclusion. This leads one to expect that the impact of prior knowledge should increase going from the frequently accepted MP inference to the relatively rarely accepted DA inference. In other words, inasmuch as an attempt at logical reasoning plays a role and inasmuch as the results with abstract conditionals can be taken as a guideline for perceived logical correctness, the role of prior knowledge would be progressively reduced moving through the inferences in the order DA, AC, MT, MP. This can again be illustrated using data from Liu (2003). In Liu's experiments, conclusions were rated twice, once with the rule present (e.g., MP: If a substance is a diamond, then it is very hard. Given that a substance is a diamond, how probable is it that it is very hard?) and once without the rule (e.g., MP: Given that a substance is a diamond, how probable is it that it is very hard?). The left two panels of Figure 1 show mean probability ratings in percentages. Considering the left panel for problems without rule and the middle panel for problems with rule, two patterns are prominent. First, as already noted, prior knowledge has a strong impact (i.e., there are pronounced effects of perceived sufficiency and necessity). Second, the effect of adding a rule appears to be a compression of ratings toward higher values for MP and MT but not as much for AC and least for DA. Thus, the impact of prior knowledge was reduced for MP and MT but not as much for AC and least for DA. The compression of endorsements toward high values for MP and MT inferences was recently replicated by Matarazzo and Baldassarre (2008; see also George, 1995), who did not consider AC and DA inferences. It suggests in the present framework that perceived logical correctness served to push ratings for MP and MT toward higher levels. Note also that within each inference, the probability ratings are ordered the same over contents for problems without rule as for problems with rule in most cases. This implies, for example, that perceived sufficiency, P(q|p), as assessed in MP problems without rule, predicts ratings for MP problems with rule. This in turn suggests that ratings with rule still integrate background knowledge over and above perceived logical correctness, even when logical form provides a strong decontextualized signal as in MP. This pattern of data is consistent with a dual-source model in which participants integrate information from two sources: perceived consistency with background knowledge (knowledge-based evidence) and perceived logical correctness (form-based evidence; see also Beller & Spada, 2003). The dual-source model assumes that information from both sources is integrated as a weighted average with proportional weights given by λ for the form-based mode of reasoning and 1 − λ for the knowledge-based mode of reasoning. The factor λ varies between 0 and 1. It is assumed to be the same for each inference and rule, but it may depend upon how much the instructions stress the rule versus the particular contents and on similar context variables. Knowledge-based evidence depends on the particular content C to which the statements p and q refer. It is summarized by parameters ξ(C,x). Parameter ξ(C,x) quantifies, on a probability scale, the subjective certainty that the conclusion in inference x is warranted by background knowledge about the particular p and q presented.
In Liu's (2003) experiments, prior knowledge was manipulated by the contents C that differ in perceived sufficiency and necessity of p for q, and it was assessed in the conditions without rule. The

parameter ξ(C,x) is the contribution of the knowledge-based mode of reasoning to observed ratings (with weight 1 − λ). As an example, consider the content C dealing with p = A stone is thrown at a window and q = The window breaks. There are four ξ parameters for this content, one for each inference under study. For example, ξ(C,MP) is one's subjective certainty that a window at which a stone is thrown breaks, based on what one knows about stones and windows in general and on average; ξ(C,MT) is one's subjective certainty that a stone was not thrown at a window that is not broken, and so forth. Note that these parameters refer to one's knowledge independently of an if p then q rule. The inferences MP, MT, and so forth are still labeled with reference to the if p then q rule, but this is merely a notational convenience because it allows us to use the same labels for problems with and without rule. The inferences could alternatively be labeled by minor premise and conclusion without reference to any rule whatsoever. The form-based evidence is different for each inference, but being decontextualized, it does not depend upon rule content. It is modeled by parameters τ(x), with x being one of the inferences, MP, MT, DA, and AC. Parameter τ(x) quantifies, on a probability scale, the subjective certainty that the inference x is warranted by the logical form of the conditional argument. Taking the results from abstract conditional reasoning as a guideline, we expect the τ parameters to be ordered as τ(MP) > τ(MT) > τ(AC) > τ(DA). The remaining uncertainty, 1 − τ(x), quantifies the extent to which participants are uncertain about whether the conclusion is warranted by the logical form of x. In studies using abstract content, the response "maybe" or "don't know" might be given in the case of uncertainty. Where rich background knowledge is available, background knowledge suggests a default fall-back position in such cases, namely to use the knowledge-based evidence as summarized in parameters ξ(C,x). Integrating both the extent of certainty and the remaining uncertainty, the component contributed by the form-based mode of reasoning is thus τ(x) · 1 + (1 − τ(x)) · ξ(C,x).2 This component and the contribution of the knowledge-based mode of reasoning are integrated with weight factors λ and 1 − λ, respectively, so that the predicted rating is
2 We also present so-called converse inferences with the opposite conclusion. For example, the converse inference MP′ for MP is thus: Given if p, then q and p, it follows that not-q. Parameter τ is defined, more precisely, as the subjective certainty, quantified on a probability scale, that the conclusion must either be accepted or rejected on logical grounds. For the original inferences, a conditional interpretation of if-then suggests to accept MP and MT, and a biconditional interpretation suggests to accept all four inferences. For the converse inferences, a conditional interpretation suggests to reject the conclusions of MP′ and MT′, and a biconditional interpretation suggests to reject the conclusions for all four converse inferences. We assume that parameter τ for the original inference has the same value as parameter τ for the converse inference. For example, consider MP and MP′: To the extent to which q is seen as following logically from p, not-q should be rejected on logical grounds, leading to the assumption that τ(MP′) = τ(MP). For the converse inferences x′, the appropriate formula integrating certainty and remaining uncertainty is therefore τ(x′) · 0 + (1 − τ(x′)) · ξ(C,x′), with τ(x′) = τ(x) and, based on the consistency assumption already mentioned (the probability of not-q should be 1 minus the probability of q), with ξ(C,x′) = 1 − ξ(C,x).


λ [τ(x) · 1 + (1 − τ(x)) ξ(C,x)] + (1 − λ) ξ(C,x).

Thus, the knowledge parameters ξ(C,x) enter the dual-source model in two places: (a) in the component for the form-based mode of reasoning as fall-back response when uncertain about the appropriate logical conclusion and (b) in the component for the knowledge-based mode of reasoning. The knowledge-based evidence exclusively drives ratings in conditions in which no rule is present in the first place. Taken together, probability ratings, P(x), should be given by the following equations for each inference x = MP, MT, AC, and DA and content C:

P(x | C, no rule is present) = ξ(C,x),
P(x | C, a rule is present) = λ [τ(x) + (1 − τ(x)) ξ(C,x)] + (1 − λ) ξ(C,x).   (2)
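As an illustration of Equation 2, the sketch below computes the predicted rating for a single inference; the function and the example values are ours (a hypothetical illustration, not code from the article).

```python
def dual_source_rating(xi, tau=None, lam=0.0):
    """Predicted probability rating for one inference x and content C (Equation 2).

    xi  : knowledge-based evidence xi(C, x)
    tau : form-based evidence tau(x); None for problems presented without a rule
    lam : weight lambda given to the form-based component
    """
    if tau is None:                  # no rule: knowledge-based evidence only
        return xi
    form = tau + (1 - tau) * xi      # fall back on knowledge when uncertain about form
    return lam * form + (1 - lam) * xi

# Hypothetical values: strong form-based evidence (as for MP), moderate rule weight.
print(dual_source_rating(xi=0.70, tau=0.90, lam=0.40))   # with rule -> 0.808
print(dual_source_rating(xi=0.70))                        # without rule -> 0.70
```

For a converse inference, the same scheme would use 0 in place of τ(x) in the form-based component and 1 − ξ(C,x) as the knowledge-based evidence (see Footnote 2).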

The dual-source model can be framed as a normative model of Bayesian model averaging (O'Hagan & Forster, 2004, Chapter 7). Details can be obtained from Karl Christoph Klauer. In a first test, we fit this model to the data shown in the left two panels of Figure 1 by minimizing the sum of the squared deviations of the data from the model predictions (i.e., using a least-squares objective) using an iterative gradient-based search to obtain best-fitting parameter estimates. There are 48 data points, which were modeled by 14 parameters. The τ parameters were a function of only the inferences and did not vary across the different contents C = HL, ML, LL, LH, LM, LL. Thus, there are four τ parameters, one for each inference MP, MT, AC, and DA. A special feature of Liu's (2003) experiments was that the problems labeled HL, ML, and LL used the same contents as the problems labeled, in order, LH, LM, and LL, with the statements referring to p and q exchanged from one set of problems to the other set of problems. For example, an HL rule was If H1 is 5 years old, then H1 is a child, where H1 stands for a boy's or girl's name; the corresponding LH rule was If H1 is a child, then H1 is 5 years old, and as a consequence, the knowledge parameters should be the same (taking the exchange of p for q and of q for p into account) for the HL and LH problems.3 Thus, the six different kinds of rules are based on only three different contents. For each content C, there are four knowledge parameters ξ(C,x), one for each inference x, which we reexpressed by three parameters a, b, and e per content C: a(C) = P(p), b(C) = P(q), and e(C) = 1 − P(q|p), using Oaksford et al.'s (2000) formulae (Equation 1) without the restriction that e should be small. As already explained, this implies the relatively mild restriction that the knowledge parameters ξ are consistent with a bivariate probability distribution for the 2 × 2 contingency table crossing p and not-p with q and not-q. With three parameters a, b, and e per content, this results in a total of nine parameters underlying the knowledge parameters ξ. Furthermore, there was one λ parameter for all problems with rule. Model fit was not bad (R² = .88), so that the model accounted for almost 90% of the variance in the data.4 The weight λ given to rule-based evidence was .40, and thus, the rule when present had only a medium-sized impact on ratings. In fact, when λ is restricted to be zero, R² is still .77, so that roughly 75% of the variance is accounted for by the knowledge parameters alone. The right panel of Figure 1

shows the estimates of the knowledge parameters ξ and the rule-based parameters τ. For example, the knowledge parameters ξ(HL,x) for the HL content are shown as small squares for each inference x = MP, MT, AC, and DA, connected by lines. It can be seen that their values were estimated to be high for the MP and MT inferences, but much lower, close to the 50% mark, for the AC and DA inferences. The knowledge parameters roughly follow the pattern of the original ratings without rule. The τ(x) parameters for the different inferences x are shown as points labeled τ. Encouragingly, the τ parameters reproduce the pattern τ(MP) > τ(MT) > τ(AC) > τ(DA) suggested by studies on abstract conditional reasoning. In the concluding sections of the introduction, we (a) discuss how Oaksford et al.'s (2000) model alternatively deals with the effects of rule presence and (b) consider the relationship of the dual-source model to Liu's (2003) concept of so-called second-order conditionalization. Relationships to broader theories of conditional reasoning are considered in the General Discussion.
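The least-squares fit described above might be set up along the following lines. This is a simplified sketch (placeholder data, one problem set per content rather than the full design with p and q exchanged, and no constraint keeping each content's parameters within a proper joint distribution); it is our illustration, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def xi_from_abe(a, b, e):
    # Knowledge parameters xi(C, x) for MP, MT, AC, DA, expressed via a, b, e (Equation 1).
    return np.array([1 - e,
                     (1 - b - a * e) / (1 - b),
                     a * (1 - e) / b,
                     (1 - b - a * e) / (1 - a)])

def predictions(params, n_contents=3):
    # Parameter vector: 4 tau values, then (a, b, e) per content, then lambda.
    tau = params[:4]
    lam = params[-1]
    preds = []
    for c in range(n_contents):
        a, b, e = params[4 + 3 * c: 7 + 3 * c]
        xi = xi_from_abe(a, b, e)
        preds.append(xi)                                             # without rule
        preds.append(lam * (tau + (1 - tau) * xi) + (1 - lam) * xi)  # with rule
    return np.concatenate(preds)

def loss(params, data):
    return np.sum((predictions(params) - data) ** 2)

# Placeholder ratings rescaled to [0, 1]: 3 contents x (without/with rule) x 4 inferences.
data = np.repeat([0.8, 0.9, 0.6, 0.7, 0.5, 0.6], 4)
start = np.full(14, 0.5)
bounds = [(0.01, 0.99)] * 14          # keep denominators away from zero
fit = minimize(loss, start, args=(data,), bounds=bounds, method="L-BFGS-B")
print(round(fit.fun, 4), np.round(fit.x, 2))
```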

Effects of Rule Presence in Oaksford et al.'s (2000) Model


What is the effect of the conditional rule in probabilistic conditional reasoning, and how can it be characterized in precise quantitative terms? This is one question answered by the dual-source model. Oaksford et al.'s (2000) model provides an alternative answer. As already mentioned, the idea is that probability ratings of conditional inferences MP, MT, AC, and DA are given by the formulae for the conditional probabilities of, in order, P(q|p), P(not-p|not-q), P(p|q), and P(not-q|not-p). These probabilities are expressed in terms of three parameters, a = P(p), b = P(q), and e = 1 − P(q|p), with the restriction that the parameter e, the exceptions parameter, should be small. As argued by Oaksford and Chater (2007), the only effect of adding a rule should be to reduce parameter e: "It seems that the only effect the assertion of the conditional premise could have is to provide additional evidence that q and p are related, which increases the assessment of P2(q|p)" [i.e., of 1 − e] (p. 164). For the data from Figure 1, the model thus requires three a parameters, three b parameters, and three e parameters (estimated without restriction) for the problems without rule, taking into account that the six different kinds of problems are based on only
3 The inferences MP, MT, AC, and DA for the rule If p then q each present the same minor premise and conclusion as the inferences, in order, AC, DA, MP, and MT for the rule If q then p. The knowledge parameters ξ are defined only in terms of minor premise and conclusion, whereas the rule plays no role. This means that exchanging p and q changes the labels for the inferences, but not the parameter values. Considering, for example, the HL rule and the LH rule generated from it by exchanging p and q, it follows that ξ(HL,MP) = ξ(LH,AC), ξ(HL,MT) = ξ(LH,DA), ξ(HL,AC) = ξ(LH,MP), and ξ(HL,DA) = ξ(LH,MT).
4 The relatively high R² values for the dual-source model and Oaksford et al.'s (2000) model reported in this article should be interpreted in relation to the relatively high ratio of parameters to data points; for example, there were 14 parameters to account for 48 data points in Liu's (2003) experiments. This issue does not affect, however, the comparison between our model and Oaksford et al.'s model. It is addressed in more detail in Experiment 4.


three different sets of contents. Six new e parameters are required for problems with rule to capture the effects of the rules, remembering that the rules were different, if sometimes only in the order of p and q, across the six kinds of problems. Because the effects of the rule are to depress the exceptions parameter according to Oaksford and Chater (2007), the new parameters e were constrained to assume values smaller than or equal to the corresponding parameter estimated from the problems without rule. The model is thus based on 15 parameters when applied to reanalyze Liu's (2003) data. Although less parsimonious than the dual-source model (14 parameters) in terms of number of parameters, the probabilistic model fit somewhat less well, with R² = .83. Note, however, that Oaksford et al. (2000) and Oaksford and Chater (2007, Chapter 5) argued that rules such as the present LH rule, which imply a = P(p) > b = P(q), are reinterpreted. In particular, P(q) is adjusted upward in the presence of the rule to meet the model's restriction that a(1 − e) ≤ b. This subtracts from the normativeness of the model in that there are many ways in which the model parameters could be adjusted to meet the restriction a(1 − e) ≤ b imposed by the probability calculus, with probability calculus itself being silent about the normatively appropriate manner of adjusting model parameters. Nevertheless, the model remains a psychologically viable theory, and when we admitted an additional and separate b parameter for the problematic LH problems with rule, the model with 16 parameters approached the model fit of the dual-source model with 14 parameters (R² = .85). Clearly, Oaksford et al.'s (2000) model is a viable alternative to the dual-source model. Its conceptual appeal is that it does not postulate two qualitatively distinct sources of information, one based on prior knowledge and a second one based on form-based evidence. Because of the conceptual simplicity, it would probably be preferred over the dual-source model at this point, given that the data from Figure 1 are not strong enough to permit a firm decision for or against one of the two models. Another limitation of this reanalysis is that, with few exceptions, it is not legitimate to test nonlinear models such as Oaksford et al.'s and our model on data averaged across persons (Rouder, Lu, Morey, Sun, & Speckman, 2008).
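One simple way to impose the constraint that a with-rule exceptions parameter cannot exceed its without-rule counterpart is to estimate a multiplicative shrinkage factor in [0, 1]; this parameterization is our own illustration of the constraint, not necessarily how the original analyses implemented it.

```python
def constrained_e(e_without_rule, shrink):
    """Exceptions parameter for problems with rule, kept <= the without-rule value
    by estimating shrink in [0, 1] instead of a free e parameter."""
    assert 0.0 <= shrink <= 1.0
    return e_without_rule * shrink

def mp_rating(e):
    # MP prediction from Equation 1: P(q | p) = 1 - e.
    return 1 - e

e_no_rule = 0.30                        # hypothetical estimate from problems without rule
e_rule = constrained_e(e_no_rule, 0.4)  # asserting the rule depresses e
print(mp_rating(e_no_rule), mp_rating(e_rule))   # 0.7 vs. 0.88: the rule raises MP
```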

The Dual-Source Model and Second-Order Conditionalization

In interpreting his data, Liu (2003) proposed that for problems without rule, participants base responses on the conditional probability of the conclusion given the minor premise (first-order conditional probability) and, for problems with rule, on the conditional probability given minor premise and major premise (i.e., the rule) (second-order conditional probability). Conditionalizing on both minor and major premise simultaneously is called second-order conditionalization. One way to look at the dual-source model is to see it as specifying an explicit model for the second-order conditional probabilities; they are given by the component contributed by the form-based mode of reasoning: For that component, both minor premise and major premise are considered true and given. In contrast, the component contributed by the knowledge-based mode of reasoning reflects the first-order conditional probabilities: For that component, only the minor premise is considered true and given, and the major premise plays no role. By formulating an explicit model of the second-order conditional probabilities, the dual-source model isolates, and allows one to estimate, a specifically rule-based component in the parameters τ. According to Liu (2003), ratings for problems with rule directly reflect these second-order conditional probabilities. In terms of the dual-source model, this translates into the claim that the weight parameter λ for the form-based component is 1 and that for the knowledge-based component, 1 − λ, is 0. This implies that reliance on the rule should be perfect when a rule is asserted. For example, in the presence of a rule, there should be little room for effects of instructions that emphasize the rule over and above the degree that is already implemented in standard instructions as used by Liu, a prediction that we test in the present Experiment 1. Note, however, that Liu also assumed that participants would not always be able to compute the appropriate second-order conditional probabilities, especially for MT inferences, and that content and instructions might have an effect on how accessible the second-order conditional probabilities are.

Outlook
In four experiments, we test the dual-source model empirically. All experiments employ several phases in which the same contents are presented. In a baseline phase, problems are presented without rule; in the other phases, problems are presented with rule. As already mentioned, a baseline phase without rule is needed as a control condition to allow one to assess the effects of the conditional rule in probabilistic conditional reasoning and as a means to assess how participants respond to the different problems when responses can be based only on relevant background knowledge, but not on logical form. In particular, the dual-source model and Oaksford et al.'s (2000) model make the same predictions for ratings without rule (because they use the same equations), but they differ in their predictions for the effects of adding a rule. To compare the two models, it is therefore necessary to contrast ratings for problems with rule and those for problems without rule. In all experiments, phases were separated by at least 1 week for each participant. As in Liu (2003), prior knowledge was systematically manipulated through the use of HH, HL, LH, and LL conditionals in Experiments 1 to 3. In Experiment 1, Phases 2 and 3 furthermore differed in the emphasis put upon the rule. This leads to two major predictions for the parameter estimates of the dual-source model: (a) The manipulation of prior knowledge should systematically affect the knowledge parameters ξ, and (b) the emphasis put upon the rule should systematically influence the relative weight λ of form-based evidence versus knowledge-based evidence. Experiment 1 also allowed us to compare the fit of the dual-source model and Oaksford et al.'s (2000) model of probabilistic inference. Experiment 2 is a control experiment designed to defend some of our procedural choices. In Experiment 3, prior knowledge was again manipulated. In addition, we compared two conditional rules differing in form, namely the if p then q rule and the p only if q rule. Both rules were presented with the same contents in different phases of the experiment. The major predictions were (a) that prior knowledge should again be mapped onto the knowledge parameters ξ and (b) that the form of the conditional rule should affect the τ parameters for the form-based evidence.



Finally, in Experiment 4, the so-called negations paradigm was employed. In the negations paradigm, four rules are presented for each content. The four rules differ in whether antecedent and/or consequent are affirmed or negated as elaborated below. We tested whether the model accounts for so-called polarity bias, the major effect emerging in the negations paradigm. Polarity bias was the effect targeted by Oaksford et al. (2000) in the original exposition of their model.

Experiment 1
Experiment 1 used four contents taken from Verschueren et al. (2005a), pretested to be either high or low in sufficiency of p for q and simultaneously either high or low in perceived necessity of p for q (Verschueren et al., 2005a, Appendix 1). Presented as if p then q rules, the four contents were as follows:

HH: If a predator is hungry, then it will search for prey.
HL: If a balloon is pricked with a needle, then it will pop.
LH: If a girl has sexual intercourse, then she will be pregnant.
LL: If a person drinks lots of Coke, then that person will gain weight.

Problems were presented in three phases of 32 problems each, separated by at least 1 week to reduce trivial carry-over effects from phase to phase. The 32 problems were generated by presenting the minor premise and conclusion of each conditional inference MP, MT, AC, and DA with rule left out (Phase 1) or with rule included (i.e., the complete logical form; Phases 2 and 3) for each content, resulting in 16 problems per phase. The remaining 16 problems presented the negation of the conclusion for each inference, or in Oaksford et al.'s (2000) terms, the converse inferences MP′, MT′, AC′, and DA′ for each content. The task was to rate the probability of the conclusion on a percent scale ranging from 0 to 100. For example, the MP problem without rule (Phase 1) for content HL would present the observation that a balloon is pricked with a needle and ask for the probability that the balloon will pop. The MP′ problem would present the same premise and ask for the probability that the balloon will not pop. Presenting the traditional inferences along with the converse inferences has several advantages: It allows us to assess the consistency of the individual ratings in that the ratings for the original inference and its converse should add up to approximately 100% according to normative prescriptions. Consistency is a basic prerequisite for any model of conditional inference stated in terms of a probability metric. Furthermore, by averaging over both ratings (with the rating r for the converse inference entered as 100 − r), it allows us to mitigate the effects of relatively superficial response biases such as differential tendencies to endorse conclusions, whatever the inference, or general tendencies to prefer forward inferences (from p or not-p to q or not-q), whatever the inference, over backward inferences (from q or not-q to p or not-p), as might be induced by a shallow if heuristic. For Phase 1 problems, participants were told that they would see an observation and that they were to rate how probable it is that a certain conclusion drawn from it would hold. Phases 2 and 3

presented the problems with rule. Phases 2 and 3 differed from each other in the degree to which the validity of the rule was stressed. For Phase 2 problems, participants were told that a rule had been stated, that they would see the rule and an observation, and that they were to rate for each problem how probable it is that a certain conclusion drawn from this information would hold. Immediately preceding Phase 3, participants were to rate their belief in the validity of each rule in order to give them an opportunity to express that some of the rules were less plausible than others. For Phase 3 problems, participants were then told that they would see a rule and an observation and that they were to take the rule as valid without exception. They were asked to rate for each problem how probable it is that a certain conclusion drawn from this information would hold, taking the rule as valid and given the observation. Thus, we manipulated the factors (a) prior knowledge, by means of the different contents, and (b) emphasis on the rule, by means of the different instructions. The dual-source model was evaluated by the extent to which it met the following predictions:

Prediction 1: The dual-source model should fit the data satisfactorily, and it should provide better fit than Oaksford et al.'s (2000) model of probabilistic inference. That model can be directly applied to our data by assuming that rule presence in Phase 2 depresses the exceptions parameters e and that emphasis on the rule in Phase 3 may further decrease these parameters.

Prediction 2: The effects of content should be mapped on the knowledge parameters ξ. In particular, knowledge parameters for MP and MT should be higher for contents high rather than low in perceived sufficiency of p for q. Simultaneously, knowledge parameters for AC and DA should be higher for contents with high rather than low perceived necessity.

Prediction 3: Rule emphasis should affect the weight λ for form-based evidence relative to knowledge-based evidence. Thus, λ should be higher in Phase 3 than in Phase 2.

Prediction 4: Taking the results from reasoning with abstract conditionals as a guideline, the τ parameters should be ordered across inferences approximately as τ(MP) > τ(MT) > τ(AC) > τ(DA).

Method
Participants. Participants were 15 University of Freiburg students (four male, 11 female, with age ranging from 18 to 26 years) with different majors, excluding majors that imply formal training in logic such as mathematics. Each participant was requested to go through three phases of the experiment, with different phases separated by at least 1 week. Participants received monetary compensation of €14.

Procedure. In each phase, the 32 problems described above were presented in two blocks of 16 problems in individual sessions. Between blocks, participants had an opportunity to take a break. The order of problems was newly randomized for each phase and participant with the restrictions (a) that there were four problems per content in each block, (b) that an inference and its converse were assigned to different blocks in each phase, and (c)


that within each block problems were blocked by content. We presented problems blocked by content because pretests suggested that a random sequence of contents was seen as very demanding and confusing. Participants were told that no two problems presented were identical and that they were to read each problem carefully because some of the problems would differ only in small, but important, details. Furthermore, the words signaling negated statements such as not or no were presented in capital letters, because pretests suggested that these particulars were sometimes overlooked. Following the two blocks of each phase, the computer program presenting the experiment compared responses for each inference and its converse. Problem pairs for which responses were very inconsistent, defined as a sum of ratings less than 40 or larger than 160, were then presented once more in a third block, if any such problem pairs existed. Remember that responses were given in percentages and that the ratings for an inference and its converse should add to a value of approximately 100. Sums smaller than 40 or larger than 160 imply that both the original inference and its converse had been rated as highly improbable or highly probable. Prior to the third block, participants learned by means of an example using a content different from the ones used in the problems themselves that they had rated both an inference and its converse as either highly probable or highly improbable. Participants were asked to work through such problems once more, paying particular attention to the details of the wording. The third block, if it was necessary, presented problem pairs of an inference and its converse in immediate succession.
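As a concrete illustration of the consistency criterion just described, the following sketch flags problem pairs whose two ratings (original inference and its converse) do not sum to roughly 100; the data values are hypothetical.

```python
def inconsistent_pairs(rating_pairs, low=40, high=160):
    """Return indices of pairs whose ratings sum to < low or > high.

    rating_pairs: list of (rating_original, rating_converse) on the 0-100 scale;
    consistent pairs should sum to approximately 100.
    """
    return [i for i, (r, r_conv) in enumerate(rating_pairs)
            if not (low <= r + r_conv <= high)]

# Hypothetical ratings: the third pair (90 + 85 = 175) would be repeated in a third block.
print(inconsistent_pairs([(80, 25), (60, 45), (90, 85), (10, 95)]))   # -> [2]
```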

Results and Discussion


Degrees of freedom of F tests of analyses of variance with repeated measures are Greenhouse-Geisser corrected throughout this article.

Consistency. As in previous research (Oaksford et al., 2000), participants were reasonably highly consistent with respect to the ratings given to an inference and its converse over the 48 (= 3 × 16) problem pairs presented. Leaving out the data from the third block in each phase, the upper left panel of Figure 2 shows for each participant the 10%, 25%, 50% (median), 75%, and 90% quantiles for the sum of the ratings given to the inference and converse as well as the correlations between the two ratings, with that for the converse inverted. Correlations are shown as characters "x" with values multiplied by 100. Quantiles close to 100 and high correlations indicate high consistency of the ratings. It can be seen that participants' ratings appear to be fairly well calibrated in that the median (third line from bottom) is always close to 100, and the quartiles (second and fourth lines from bottom) are also close to 100. Correlations tend to be reasonably high (M = .86, SD = .12). Summed over Phases 1 to 3, an average of 1.4 problem pairs (SD = 1.92, range 0 to 6) were repeated in one of the third blocks. For the analyses below, ratings from the third block replaced ratings from the first two blocks, increasing consistency even further. Thereafter, there remained one problem pair for one individual with a sum of ratings outside the range of 40 to 160; ratings for this problem pair for this individual were treated as missing values for the model analyses.

Rating data. We averaged the two ratings of each problem pair, with the rating r given to the converse problem mirrored at

50% (i.e., with r replaced by 100 − r), resulting in 48 data points per participant. The first three panels of Figure 3 show the mean ratings obtained as a function of inference and content. As can be seen, the contents differed as expected: Contents with high rather than low sufficiency of p for q (HH and HL vs. LH and LL) received the highest MP and MT ratings, whereas contents with high rather than low necessity of p for q (HH and LH vs. HL and LL) received the highest AC and DA ratings. The effects of adding a rule were to compress ratings toward higher levels, with stronger effects on MP and MT than on AC and DA. Note that within each inference, the order of ratings over contents remained the same in most cases across phases. The effects are thereby qualitatively similar to those observed by Liu (2003), but the effects of rule presence seem more pronounced in our data.

Prediction 1. For the model analyses, the 48 data points in the first three panels of Figure 3 were fit by the dual-source model for each participant separately with 18 parameters per participant. These comprise 4 τ parameters (for inferences MP, MT, AC, and DA), 12 parameters underlying the knowledge parameters ξ (3 parameters, a[C], b[C], and e[C], per content), as well as 2 λ parameters, 1 for Phase 2 with rule and 1 for Phase 3 with rule emphasis. Parameter λ was set to zero for Phase 1 because there was no rule in Phase 1. A weighted least-squares objective function was minimized for each participant's data, with weights reflecting the consistency information that is available for each data point.5 Goodness of fit was acceptable with mean R² = .92 (SD = .048). We also fit Oaksford et al.'s (2000) model. This model requires 20 parameters. These comprise 4 a parameters; 4 b parameters; as well as 4 e parameters for the problems without rule; 4 new exceptions parameters e, 1 per content, to account for the effects of adding a rule in Phase 2; and 4 further exceptions parameters e to account for the effects of emphasizing the rule in Phase 3. The resulting model fit less well than the dual-source model despite the use of more parameters (mean R² = .74, SD = .082). Remember, however, that Oaksford et al. suggested modifying the model for rule LH, which implies a > b, permitting the model to adjust b upward to meet the requirement that a(1 − e) ≤ b. When an additional parameter b was admitted for Phases 2 and 3 for the rule LH, model fit was acceptable (mean R² = .88, SD = .053) with 21 parameters. In both cases, model fit was, however, significantly worse than for the dual-source model: t(14) = 7.70, p < .01, for comparing the unmodified Oaksford et al. model and the dual-source model, and t(14) = 3.00, p < .01, for comparing the modified model and the dual-source model. The dual-source model
5 Each data point r was based on averaging two ratings, one for the original inference, ro, and one for its converse, rc, with the converse rating mirrored at 50%; that is, r = (ro + [100 − rc])/2. The ratings for a pair of an inference and its converse are consistent with each other to the extent to which ro + rc is close to 100. Weights for the weighted least-squares analysis were given by (60 − |100 − [ro + rc]|)/60. Remembering that data points with ro + rc outside the interval between 40 and 160 are treated as missing values, these weights range between 0 and 1. They are large to the extent to which the two ratings ro and rc sum to 100. The goodness-of-fit index R² in weighted least squares is the coefficient of determination that quantifies the amount of variance in the weighted data that is accounted for by the model.
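The following sketch (Python; names are ours) restates the aggregation and weighting scheme from Footnote 5, which may help readers who wish to reproduce the preprocessing.

```python
def aggregate_pair(r_orig, r_conv):
    """Average an inference rating with its converse rating mirrored at 50%."""
    return (r_orig + (100 - r_conv)) / 2

def wls_weight(r_orig, r_conv, lower=40, upper=160):
    """Consistency weight used in the weighted least-squares fit.

    Returns None (missing value) when the two ratings sum to less than 40 or
    more than 160; otherwise returns (60 - |100 - (r_orig + r_conv)|) / 60,
    which equals 1 for perfectly consistent pairs and approaches 0 near the bounds.
    """
    s = r_orig + r_conv
    if not (lower <= s <= upper):
        return None
    return (60 - abs(100 - s)) / 60

print(aggregate_pair(80, 25), wls_weight(80, 25))  # 77.5, ~0.917
```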

Figure 2. Consistency data for participants in Experiments 1, 2, 3, and 4 based on the sums of ratings for an inference and its converse. The lines show, in order from bottom line to top line, the 10%, 25%, 50% (median), 75%, and 90% quantiles of these sums as well as correlations between ratings for an original inference (i.e., one of MP [modus ponens], MT [modus tollens], AC [affirmation of the consequent], and DA [denial of the antecedent]) and its converse inference, with the latter mirrored at 50%. Correlations are shown as "x" marks with values multiplied by 100. Participants are ordered so that correlations increase from left to right.

Figure 3. Mean ratings and parameter estimates for Experiment 1. The top left, top right, and bottom left panels show, in order, mean ratings for problems without rule (Phase 1), for problems with rule (Phase 2), and for problems with emphasis on the rule (Phase 3) as a function of content (HH, HL, LH, and LL) and inference (MP [modus ponens], MT [modus tollens], AC [affirmation of the consequent], and DA [denial of the antecedent]). The bottom right panel shows mean parameter estimates (multiplied by 100) of the knowledge parameters ξ and of the τ parameters for the dual-source model. HH, HL, LH, and LL refer to contents presented to be simultaneously either high (H) or low (L) in perceived sufficiency (first letter in the pair) and perceived necessity (second letter in the pair) of p for q.

fit the data of 14 of the 15 participants better than the modified model by Oaksford et al. Thus, Prediction 1 could be upheld.

Prediction 2. The knowledge parameters ξ are also shown in Figure 3 (see lower right panel). As can be seen, they roughly reflect the pattern seen in the ratings in the first phase without rule. In particular, parameter estimates reflected the factors perceived sufficiency and perceived necessity. Knowledge parameters for MP and MT problems were higher for contents high in perceived sufficiency (M = 90.33; values are given in percentages) than for contents low in perceived sufficiency (M = 50.23), t(14) = 8.12, p < .01. Knowledge parameters for AC and DA problems were higher for contents high in perceived necessity (M = 91.49) than for contents low in perceived necessity (M = 53.90), t(14) = 12.27, p < .01. Prediction 2 could thus be upheld.

Prediction 3. The mean weight parameter λ for Phase 2 (with rule) was .68 (SD = .35) and thus somewhat higher than the weight parameter estimated for the Liu (2003) data. In Phase 3 (rule emphasized), it was significantly increased (mean λ = .98,

SD = .05), t(14) = 3.46, p < .01. Note that parameter λ is the only parameter that varies between Phases 2 and 3; the model thereby provides a very parsimonious description of the differences between these two phases. Prediction 3 could be upheld.

Prediction 4. The mean τ parameters are also shown in Figure 3 (see bottom right panel). As can be seen, they follow the expected order of τ(MP) ≥ τ(MT) ≥ τ(AC) ≥ τ(DA). An analysis of variance of the τ parameters with factor inference (MP, MT, AC, and DA) found a main effect of inference, F(1.52, 21.28) = 9.44, p < .01. Planned contrasts revealed that τ for the MP inference did not significantly exceed τ for MT, F(1, 14) = 1.17, p = .30, whereas τ for the MT inference exceeded the mean τ for the two invalid inferences AC and DA, F(1, 14) = 12.40, p < .01. There were no significant differences between AC and DA, F(1, 14) = 1.43, p = .25. Inspecting the τ parameters for the different participants individually, some of the participants had equally high τ parameters for all inferences, suggesting that they followed a biconditional


interpretation of rule-based evidence. On balance, Prediction 4 could be upheld.

Belief ratings. Mean rule believability for contents HH, HL, LH, and LL was, in order, 89 (SD = 11), 92 (SD = 7.5), 39 (SD = 29), and 56 (SD = 18). The differences between contents were significant, F(1.84, 25.81) = 28.97, p < .01. In line with previous work, the pattern of rule believability closely followed the pattern for MP ratings in Phase 1 as well as the pattern of knowledge parameters for MP, ξ(C, MP) (see Figure 3).

Summary. Taken together, the dual-source model provided an adequate fit to a complex data structure. It fit significantly better than Oaksford et al.'s (2000) model despite using fewer parameters. The manipulated factors content, inference, and phase mapped in a meaningful fashion on the knowledge parameters ξ, the τ parameters, and the λ parameters.
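To make the per-participant fitting procedure concrete, the sketch below (Python) shows how predicted ratings could be generated and scored against the weighted data. The integration rule used here, a weighted mixture in which form-based evidence τ is combined with knowledge-based evidence ξ via the weight λ, is our reconstruction of the model equations given earlier in the article; function and parameter names are illustrative rather than the authors' own.

```python
import numpy as np

def dual_source_prediction(lam, tau, xi):
    """Predicted rating (0-1 scale; multiply by 100 for percent) for one item.

    lam: weight of form-based evidence (0 for problems without rule).
    tau: degree to which the inference is seen as warranted by logical form.
    xi:  knowledge-based probability of the conclusion given the minor premise.
    Assumed integration rule: lam * (tau + (1 - tau) * xi) + (1 - lam) * xi.
    """
    return lam * (tau + (1 - tau) * xi) + (1 - lam) * xi

def weighted_sse(observed, predicted, weights):
    """Weighted least-squares objective minimized for each participant's data."""
    observed, predicted, weights = map(np.asarray, (observed, predicted, weights))
    mask = ~np.isnan(observed)  # highly inconsistent pairs enter as missing values
    resid = observed[mask] - predicted[mask]
    return float(np.sum(weights[mask] * resid ** 2))
```

Note that with λ = 0 (no rule presented) the prediction reduces to the knowledge-based value ξ alone, which is how the baseline phase is handled.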

Experiment 2

Experiment 2 was a control experiment intended to assess the effects of procedural choices made in Experiment 1 and in the subsequent experiments. In particular, (a) the baseline phase without rule always preceded the phases with rule, and (b) participants were given an opportunity to reevaluate problem pairs for which their ratings were highly inconsistent between original and converse inference. Obtaining baseline ratings without rule first, and ratings for problems with rule second, seems a natural order of presenting problems. Furthermore, as already mentioned, consistency is a basic requirement for both models (the dual-source model and Oaksford et al.'s, 2000, model), and inconsistency is attributed to measurement error by both models. Measures to promote consistency therefore do not favor one model over the other. They serve to enhance the reliability of the data and thereby the accuracy of parameter estimates and the test power for discriminating between models. Both procedural choices may, however, have undesired effects that would then limit the generalizability of the results. For example, there may be effects of the order in which problems with rule versus problems without rule are presented on the ratings. Similarly, having an opportunity to reevaluate problem pairs with highly inconsistent ratings may have an effect on ratings.

In Experiment 2, we presented two phases, one without rule and one with rule, but without special emphasis on the rule. We manipulated (a) the presentation order of the two phases without and with rule (factor presentation order) and (b) whether participants had the opportunity to reevaluate problem pairs with highly inconsistent ratings (factor inconsistency correction). In defense of our procedural choices, we hoped that both factors, presentation order and inconsistency correction, would have little impact on ratings and model parameters, and we expected the pattern of results to be the same as in Experiment 1. Note, however, that the dual-source model and Oaksford et al.'s (2000) model use the same equations to predict ratings in problems without rule. There was thus only one phase in Experiment 2, the phase with rule, rather than two phases as in Experiment 1 for which the dual-source model and Oaksford et al.'s model make different predictions. For this reason, the chances of discriminating between the dual-source model and Oaksford et al.'s model were lowered in this experiment relative to Experiment 1.

The major predictions were the following:

Prediction 1: Presentation order and inconsistency correction should have little effect on the ratings.

Prediction 2: The effects of content should again be mapped on the knowledge parameters ξ, and presentation order and inconsistency correction should have little effect on these parameters.

Prediction 3: The τ parameters should again be ordered approximately according to τ(MP) ≥ τ(MT) ≥ τ(AC) ≥ τ(DA), and presentation order and inconsistency correction should have little effect on the τ parameters and on the λ parameter.

Method

Participants. The 28 participants (10 male, 18 female, with age ranging from 18 to 34 years) were 26 University of Freiburg students with different majors, excluding majors that imply formal training in logic such as mathematics, one high school student, and one apprentice. Each participant was requested to go through two phases with different phases separated by at least 1 week. Participants received monetary compensation of €20 for participating.

Procedure. Procedures were the same as in Experiment 1, but there were only two phases, one without rule and one with rule. Presentation order of the two phases was manipulated, as well as whether participants had an opportunity to correct highly inconsistent ratings in a third block in each phase (inconsistency correction). In particular, the third blocks were simply omitted for participants in the groups without inconsistency correction. The two factors were crossed, resulting in a balanced design with four groups of participants. For problems with rule, the standard instructions were those that were employed for Phase 2 in the previous experiment; that is, the rule was not specially emphasized. Participants were told that they would go through two phases, one with rule and one without rule, but that the order in which the two phases were presented was randomly determined. Participants were also told that each problem would be presented twice, once with affirmative conclusion and once with the conclusion negated, in order to permit a reliable assessment of the asked-for probability.
Results and Discussion


Consistency. Consistency information for the uncorrected ratings is shown in the upper right panel of Figure 2. Like in Experiment 1, participants' ratings appear to be fairly well calibrated. Correlations between the two ratings, with that for the converse inverted, again tend to be reasonably high (M = .86, SD = .12). Correlations were Fisher z transformed and submitted to an analysis of variance with factors presentation order and inconsistency correction. This revealed no significant effects or interactions, largest F(1, 24) = 1.43, smallest p = .25. Of the 28 × 32 = 896 problem pairs presented, 34 (or 4%) had sums of ratings outside the interval (40, 160). Half of the participants had an opportunity to reevaluate such problem pairs, and ratings from this third block replaced ratings from the first two


blocks for the model analyses; thereafter, 23 (or 3%) of the problem pairs remained with sums of ratings outside the interval (40, 160). Like in Experiment 1, these were treated as missing data for the model analyses.

Rating data and Prediction 1. Ratings were aggregated across problem pairs of inference and converse inference as in Experiment 1, resulting in 32 data points per participant. We did this twice, once with the uncorrected ratings from the first two blocks and once with the ratings from the corrective third block, where presented, replacing the initial ratings. Both resulting data sets were submitted to analyses of variance with between-participants factors presentation order and inconsistency correction and within-participants factors content (four contents), inference (MP, MT, AC, and DA), and phase (without rule vs. with rule). Despite the many significance tests for effects and interactions involving presentation order and inconsistency correction, none of these reached significance in either analysis (largest F = 2.42, smallest p = .08). The mean ratings (with corrections where applicable) are shown in Figure 4 as a function of content and inference for problems without rule (left panel) and problems with rule (middle panel). As can be seen, the data show a pattern that is similar to that observed

Figure 4. Mean ratings and parameter estimates for Experiment 2. The left and middle panels show mean ratings for problems without rule and with rule, respectively, as a function of content (HH, HL, LH, and LL) and inference (MP [modus ponens], MT [modus tollens], AC [affirmation of the consequent], and DA [denial of the antecedent]). The right panel shows mean parameter estimates (multiplied by 100) of the knowledge parameters ξ and of the τ parameters for the dual-source model. HH, HL, LH, and LL refer to contents presented to be simultaneously either high (H) or low (L) in perceived sufficiency (first letter in the pair) and perceived necessity (second letter in the pair) of p for q.


by Liu (2003) and in Experiment 1 as a function of content and rule presence.

Prediction 2. The dual-source model was fit to the data from each individual separately. It comprises 17 parameters per person: 4 τ parameters (for inferences MP, MT, AC, and DA), 12 parameters underlying the knowledge parameters ξ (3 parameters, a[C], b[C], and e[C], per content), as well as 1 λ parameter for the phase with rule. Model fit was good (mean R² = .95, SD = .057). Mean knowledge parameters are shown in Figure 4 (see right panel). In an analysis of variance with factors presentation order, inconsistency correction, content, and inference, there were no significant effects or interactions involving presentation order and/or inconsistency correction on the knowledge parameters (largest F = 2.92, smallest p = .07). As can be seen in Figure 4, knowledge parameters (in percentages) for MP and MT problems were higher for contents high in perceived sufficiency (M = 93.87) than for contents low in perceived sufficiency (M = 59.78), t(27) = 14.73, p < .01. Knowledge parameters for AC and DA problems were higher for contents high in perceived necessity (M = 94.59) than for contents low in perceived necessity (M = 63.59), t(27) = 11.83, p < .01. In sum, Prediction 2 could be upheld.

Prediction 3. The mean τ parameters are also shown in Figure 4. In an analysis of variance with factors presentation order, inconsistency correction, and inference, there were no significant effects or interactions involving presentation order and/or inconsistency correction (largest F = 2.03, smallest p = .17). Like in Experiment 1, the main effect of inference was, however, significant, F(2.39, 57.45) = 7.04, p < .01. Planned contrasts revealed that τ for the MP inference significantly exceeded τ for MT, F(1, 24) = 3.61, p = .07 (p values reported in this article are two-tailed, but because the difference is in the predicted direction, this p value can be halved for a one-tailed significance test, resulting in p = .035), and τ for the MT inference exceeded the mean τ for the two invalid inferences AC and DA, F(1, 24) = 5.11, p = .03. Like in Experiment 1, τ(AC) and τ(DA) did not differ from each other significantly (F < 1). The mean λ parameter for the relative impact of form-based evidence was .69 (SD = .36). The λ parameters were submitted to an analysis of variance with factors presentation order and inconsistency correction. This revealed no significant effect or interaction, largest F(1, 24) = 1.42, smallest p = .25. Taken together, Prediction 3 could be upheld.

Belief ratings. Mean rule believability for contents HH, HL, LH, and LL was, in order, 92 (SD = 7.2), 93 (SD = 11), 29 (SD = 21), and 59 (SD = 26). The differences between contents were significant, F(2.35, 63.40) = 97.11, p < .01. Like in Experiment 1 and in line with previous work, the pattern of belief ratings again closely follows that for MP in the knowledge parameters and the pattern of Phase 1 ratings for MP without rule (see Figure 4).

Fit of Oaksford et al.'s (2000) model. We again fit Oaksford et al.'s model. It requires 16 parameters: 4 a parameters; 4 b parameters; and 4 e parameters, 1 for each content; as well as 4 additional e parameters for the phase with rule. In addition, we again fit the modified model with a separate parameter b for the content LH in the presence of a rule (see Experiment 1). The modified model requires 17 parameters. The unmodified model fit significantly worse than the dual-source model (R² = .81), t(27) = 5.42, p < .01. The modified model uses an additional parameter

and reaches a mean R² of .93 that approaches that of the dual-source model (R² = .95), t(27) = 1.53, p = .14.

Summary. The purpose of Experiment 2 was to defend certain procedural choices implemented in Experiment 1 and in the experiments that follow: (a) the fixed presentation order, with baseline ratings for problems without rule first, followed by ratings with rule, and (b) having an opportunity to reevaluate problem pairs with highly inconsistent ratings. None of the dependent variables, neither ratings nor model parameters, showed effects of the factors presentation order and inconsistency correction despite the many significance tests conducted, and the results replicated those found for the first two phases of Experiment 1. It seems likely that as the number of participants is increased, small effects of presentation order or inconsistency correction would eventually emerge, but for the sample sizes employed in the present series of experiments, the effects appear to be negligible. On the basis of Experiment 2, we felt justified in maintaining the tested procedural choices (a) and (b) in the subsequent experiments. As already mentioned, the preferred presentation order (problems without rule first, followed by problems with rule) is a natural order, and the reverse order, in which rules are present first and then withdrawn in a second phase, seemed awkward. Furthermore, having an opportunity to reevaluate problem pairs with highly inconsistent ratings does indeed increase consistency for the corrected ratings. When we computed the correlations between original and converse inferences (with the rating for the converse inferences inverted) on the basis of the corrected ratings for the participants with inconsistency correction and compared them with the correlations for the uncorrected ratings, the correlations were trivially unchanged for seven of the 14 participants, who did not produce highly inconsistent ratings in the first place, but increased for the other seven participants. Importantly, the present data suggest that the increase in consistency and the associated increase in reliability and statistical test power do not come at the cost of a systematic bias in ratings or model parameters. As expected, the data of Experiment 2 did not permit us to discriminate between the dual-source model and Oaksford et al.'s (2000) model as clearly as those of Experiment 1.

Experiment 3
In Experiment 3, we focus on a manipulation of rule form. Specifically, we compared problems using if p then q rules with problems using p only if q rules. The manipulation of rule form should map on the τ parameters, given that it affects the logical form of the arguments but not the background knowledge about the particular contents used in the rule statements. A few studies have compared conditional inferences for if-then and only-if, many of them using abstract materials and instructions stressing logical necessity (as summarized by Evans & Over, 2004, Chapter 3). One finding is that only-if tends to lead to higher rates of the backward inferences, AC and MT, whereas if-then tends to lead to higher rates of the forward inferences, MP and DA (with the labels MP, MT, AC, and DA defined by the minor premise and conclusion in problems with if p then q). To the extent to which these directional biases also affect the forward and backward converse inferences (MP, MT, AC, and DA), they should cancel out in the present analyses, because we aggregate across each pair of an inference and its converse with ratings for converse


inferences mirrored at 50%. In addition, the literature suggests that only-if more strongly stresses the necessity of q for p (Thompson & Mann, 1995), facilitating in particular the MT inference (and rejection of the MT inference). For example, Braine (1978) argued that "only X" means "no Y other than X," and in his view, the rule p only if q is thereby frequently paraphrased as "If other than q, not p" or "If not q, then not p." In a similar vein, Johnson-Laird et al. (1992) argued that only-if makes cases with not-p and not-q salient, facilitating MT. We therefore expected effects of rule form on τ(MT), with higher values for only-if than for if-then.

In Experiment 3, we used the same contents as in Experiments 1 and 2, with p and q interchanged for the HL and LH contents, so that these contents now became LH and HL contents, respectively. For exploratory reasons, we implemented a fifth content with a relatively unfamiliar topic. The corresponding if-then rule was "If the level of oxytocin in the blood is increased, lactation is elevated." We expected this content to elicit knowledge parameters of intermediate levels, and we refer to it by the letter U in the figures below.

Although the Oaksford et al. (2000) model has not been formally extended to deal with only-if, we tried to adapt it to the rule p only if q for the sake of comparison. According to the above, the rule p only if q seems to stress that cases without q, but with p, are the exception. Consequently, we chose the exceptions parameter to model the probability P(p|not-q) for only-if (Oaksford & Chater, 2007, p. 155) rather than P(not-q|p) as for if-then, with parameters a and b unchanged. This has the effect of allowing the model to accommodate enhanced acceptance of MT for only-if relative to if-then (see the sketch following the predictions below).

The major predictions were as follows:

Prediction 1: The dual-source model will exhibit an adequate model fit, and it will provide a better description of the data than the (adapted) model by Oaksford et al. (2000).

Prediction 2: The effects of content should again be mapped on the knowledge parameters ξ. We expected intermediate parameter values for the relatively unfamiliar content U.

Prediction 3: The τ parameters should again be ordered approximately according to τ(MP) ≥ τ(MT) ≥ τ(AC) ≥ τ(DA). Furthermore, rule form should have an effect on the τ parameters. In particular, we expected τ(MT) to be larger for only-if than for if-then.

There was little reason to expect an effect of rule form on the weight parameter λ, given our initial working assumption that the weight of form-based evidence is largely determined by the experimental setting, and the instructions in particular. We allowed for different weight parameters for only-if and if-then, however, to assess a potential effect of rule form on the use of form-based evidence.
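As a concrete illustration of this adaptation, the following sketch (Python; our own reconstruction, not code from the original studies) derives the four conditional-inference probabilities of Oaksford et al.'s (2000) model from a bivariate probability table over p and q, with the exceptions parameter read as P(not-q|p) for if-then rules and as P(p|not-q) for only-if rules.

```python
def joint_table(a, b, e, rule_form="if-then"):
    """Joint probabilities over (p, q) given a = P(p), b = P(q), and exceptions e.

    For if-then rules, e is read as P(not-q | p); for only-if rules, it is read
    as P(p | not-q), so that exceptions are cases with p but without q.
    """
    p_and_notq = a * e if rule_form == "if-then" else (1 - b) * e
    p_and_q = a - p_and_notq
    notp_and_q = b - p_and_q
    notp_and_notq = 1 - a - notp_and_q
    return p_and_q, p_and_notq, notp_and_q, notp_and_notq

def inference_probs(a, b, e, rule_form="if-then"):
    """Conditional probabilities of the four inferences under the model."""
    pq, _, _, npnq = joint_table(a, b, e, rule_form)
    return {"MP": pq / a,          # P(q | p)
            "MT": npnq / (1 - b),  # P(not-p | not-q)
            "AC": pq / b,          # P(p | q)
            "DA": npnq / (1 - a)}  # P(not-q | not-p)

# With the same a, b, and e, the only-if reading yields a higher MT probability.
print(inference_probs(.5, .6, .2, "if-then")["MT"])   # ~0.75
print(inference_probs(.5, .6, .2, "only-if")["MT"])   # ~0.80
```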

Method

Participants. Participants were 18 University of Freiburg students (four male, 14 female, with age ranging from 19 to 28 years) with different majors, excluding majors that imply formal training in logic such as mathematics. Each participant was requested to go through three phases of the experiment with different phases separated by at least 1 week. Participants received monetary compensation of €14.

Procedure. Procedures were as in Experiment 1 with the following changes: There were now five contents and hence 40 problems generated by crossing contents and (original and converse) inferences. In Phase 1, these problems were presented without rule. In Phases 2 and 3, the problems were presented with rule, using the standard instructions that did not specially emphasize the rule. In Phases 2 and 3, two of the five contents used one rule form (either if-then or only-if); the remaining three used the other rule form. Across Phases 2 and 3, if-then and only-if were crossed with all five contents. Which contents, and how many (two vs. three), were paired with if-then rather than only-if in Phase 2 was randomized anew for each participant. Contents and rule form were thereby counterbalanced across the two phases with rule. In each phase, the 40 problems were presented in two blocks of 20 problems. Following the two blocks, the program again re-presented problem pairs of an inference and its converse with a sum of ratings less than 40 or larger than 160 in a third block. At the end of Phase 3, participants rated their belief in the validity of all 10 rules that they had seen in the course of the experiment.

Results and Discussion

Consistency. Consistency information for the uncorrected ratings is shown in the lower left panel of Figure 2. Participants' ratings again appear to be fairly well calibrated. Correlations between the two ratings, with that for the converse inverted, tend to be reasonably high (M = .85, SD = .16). Summed over Phases 1 to 3, an average of 2.4 of the 60 problem pairs (SD = 2.66, range 0 to 10) were repeated in one of the third blocks. For the analyses below, ratings from the third block replaced ratings from the first two blocks, increasing consistency even further. Thereafter, there remained four problem pairs in all the individuals' data with a sum of ratings outside the range from 40 to 160. These individual ratings were treated as missing data for the model analyses.

Rating data. Ratings were aggregated across problem pairs of inference and converse inference as in Experiment 1, resulting in 60 data points per participant. The first three panels of Figure 5 show the mean ratings as a function of inference and content. As can be seen, the contents differed as expected, and the unfamiliar content elicited ratings of an intermediate level. The effects of adding a rule were to compress ratings toward higher levels, with stronger effects on MP and MT than on AC and DA. The effects of only-if on compressing MT ratings appear to be a little stronger than the effects of if-then.

Prediction 1. For the model analyses, the 60 data points in the first three panels of Figure 5 were fit by the dual-source model with 25 parameters for each individual separately. These comprise 4 τ parameters for if-then and 4 for only-if; 15 parameters underlying the knowledge parameters ξ (3 parameters, a[C], b[C], and e[C], per content); as well as 2 λ parameters, 1 for problems with if-then rule and 1 for problems with only-if rule. Parameter λ was as always set to zero for problems without rule. Model fit was acceptable (mean R² = .89, SD = .081).

Figure 5. Mean ratings and parameter estimates for Experiment 3. The top left, top right, and bottom left panels show, in order, mean ratings for problems without rule, problems with if-then rule, and problems with only-if rule as a function of content (HH, HL, LH, LL, and U) and inference (MP [modus ponens], MT [modus tollens], AC [affirmation of the consequent], and DA [denial of the antecedent]). The bottom right panel shows mean parameter estimates (multiplied by 100) of the knowledge parameters ξ and of the τ parameters (with values for if-then [IT] and only-if [OI]) for the dual-source model. HH, HL, LH, and LL refer to contents presented to be simultaneously either high (H) or low (L) in perceived sufficiency (first letter in the pair) and perceived necessity (second letter in the pair) of p for q. U refers to a content expected to elicit knowledge parameters of intermediate levels.

The model by Oaksford et al. (2000) also required 25 parameters: 5 a parameters; 5 b parameters, 1 per content; as well as 5 e parameters for the problems without rule; 5 new e parameters for the problems with if-then; and 5 new e parameters for the problems with only-if. The mean R² was .78 (SD = .12), and the difference between the two models was significant, t(17) = 3.16, p < .01. We again modified the Oaksford et al. model by permitting a new b parameter for the LH rule; this model with 26 parameters achieved a mean R² of .83 (SD = .08), which was again significantly smaller than that of the dual-source model in a one-tailed test, t(17) = 1.74, p = .049. The dual-source model received higher R² values than the modified Oaksford et al. model for 11 of the 18 participants. Prediction 1 can be upheld.

Prediction 2. The knowledge parameters ξ are also shown in Figure 5 (see bottom right panel). As can be seen, knowledge parameters for MP and MT problems were higher for contents high in perceived sufficiency (M = 94.33) than for contents low in

perceived sufficiency (M = 61.71), t(17) = 9.54, p < .01. Knowledge parameters for AC and DA problems were higher for contents high in perceived necessity (M = 94.03) than for contents low in perceived necessity (M = 55.13), t(17) = 14.75, p < .01. The relatively unfamiliar content elicited knowledge parameters of an intermediate level. Prediction 2 could thus be upheld.

Prediction 3. The mean τ parameters are also shown in Figure 5 (see bottom right panel, lines with points labeled IT [for if-then] and OI [for only-if]). As can be seen, they follow the expected order with τ(MP) ≥ τ(MT) ≥ τ(AC) ≥ τ(DA). An analysis of variance of the τ parameters with rule form (if-then vs. only-if) and inference as factors revealed a significant interaction, F(2.64, 44.80) = 4.46, p = .01. Individual t tests performed per inference showed, as expected, that τ(MT) was significantly larger for only-if than for if-then, t(17) = 3.07, p < .01, all other ts < 1. Planned contrasts for the if-then rule furthermore revealed that τ for the MP inference significantly exceeded the mean τ for MT,


F(1, 17) = 12.22, p < .01, whereas τ for the MT inference nonsignificantly exceeded the mean τ for the two invalid inferences AC and DA (F < 1). Again, there were no significant differences between AC and DA (F < 1). Inspecting the τ parameters for the different participants individually, some of the participants again had equally high τ parameters for all inferences, suggesting that they adopted a biconditional interpretation of the conditional statements. Prediction 3 can be upheld.

Parameter λ. Separate weights were estimated for the impact of form-based evidence for both rule forms. Mean λ parameters for if-then and only-if were .88 (SD = .19) and .88 (SD = .25), respectively. As expected, the difference between the two was not significant (t < 1). Compared to Experiment 1's Phase 2 and Experiment 2, form-based evidence received somewhat higher weights in Experiment 3.

Belief ratings. Mean rule believability for contents HH, HL, LH, LL, and U was, in order, 88 (SD = 17), 94 (SD = 17), 41 (SD = 34), 19 (SD = 18), and 78 (SD = 18) for the if-then rules, and 79 (SD = 18), 87 (SD = 30), 30 (SD = 31), 12 (SD = 16), and 64 (SD = 25) for the only-if rules. An analysis of variance with factors rule form and content revealed main effects of content, F(2.93, 41.36) = 46.89, p < .01, and rule form, F(1, 17) = 12.10, p < .01. The interaction was not significant (F < 1), so that only-if rules were simply rated as somewhat less believable irrespective of content. Like in Experiments 1 and 2 and in line with previous work, the pattern of belief ratings again follows that for MP in the knowledge parameters and the pattern of baseline ratings for MP (see Figure 5).

Summary. The dual-source model again provided a satisfactory account of the data. The effects of content mapped onto the knowledge parameters ξ as in Experiment 1. Rule form had a very focused effect: Use of only-if selectively increased the τ parameter for MT, as expected. Rule form had no impact on λ, the relative weight for the form-based evidence. The dual-source model thereby provides a very parsimonious account of the effects of rule form.

Experiment 4

A classical paradigm in conditional reasoning is the so-called negations paradigm (Evans & Lynch, 1973). It involves administering equivalent inferences on four conditional statements (AA, AN, NA, and NN) that differ in whether antecedent and/or consequent are affirmed (A) or negated (N):

AA: If p then q.

AN: If p then not-q.

NA: If not-p then q.

NN: If not-p then not-q.

One major finding emerging from this paradigm is that negative conclusions tend to be accepted more frequently than affirmative conclusions, an effect that is referred to as negative conclusion bias or polarity bias (Evans & Over, 2004). In studies using abstract content, the effect is usually more pronounced for MT and DA than for MP and AC.

Oaksford et al. (2000) originally introduced their model to account for polarity bias in the negations paradigm. They argue that negations define categories with higher probability than their affirmative counterparts. In this view, a statement such as "The student did not learn" is interpreted as a contrast set relative to a superordinate category of possible activities suggested by the context in which the statement was uttered and thus as "The student engaged in possible activities other than learning." The contrast set is regularly much larger than the set implied by the affirmative statement and, hence, likely to be seen as more probable. Other things being equal, this leads to the prediction that MT, AC, and DA inferences with negated conclusions should be accepted more frequently than these same inferences with affirmative conclusions under Oaksford et al.'s model. In other words, the negative conclusion bias is reinterpreted as a high-probability conclusion effect.

The negations paradigm provides a challenge for both Oaksford et al.'s (2000) model and our model when the same content C is used in the four rules. It should then be possible to use the same knowledge parameters and the same parameters a = P(p) and b = P(q) in Oaksford et al.'s model to describe the ratings obtained for the four different rules (for reasons analogous to those explained in Footnote 3), taking into account in the model equations that a and b for rules with affirmative antecedent and consequent, respectively, become 1 − a and 1 − b for rules with negated antecedent and consequent (see the sketch below). This situation is challenging for both models because it means that it should be possible to model the data for all four rules using only a few parameters. The ratio of parameters to data points was, in order, 18:48, 17:32, and 25:60 for Experiments 1, 2, and 3 for the dual-source model, and 21:48, 17:32, and 26:60 for the model by Oaksford et al. In the present experiment, these ratios were substantially smaller: 17:80 for the dual-source model and 28:80 for Oaksford et al.'s model, as explained below.
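The following minimal sketch (Python; our own illustration rather than the authors' code) shows how the negation of the antecedent or consequent can be handled by flipping a and b before the model equations are applied; the joint-table construction would then proceed as in the sketch given for Experiment 3.

```python
def negation_params(a, b, rule_kind):
    """Map a = P(p) and b = P(q) onto the rule's antecedent and consequent.

    rule_kind is 'AA', 'AN', 'NA', or 'NN'; a negated antecedent or consequent
    simply replaces the corresponding marginal probability by its complement.
    """
    a_eff = a if rule_kind[0] == "A" else 1 - a
    b_eff = b if rule_kind[1] == "A" else 1 - b
    return a_eff, b_eff

# Example: for the NA rule (if not-p then q) with P(p) = .3 and P(q) = .8,
# the equations are applied with P(antecedent) = .7 and P(consequent) = .8.
print(negation_params(.3, .8, "NA"))  # (0.7, 0.8)
```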
We chose four rule contents such that each of the rules AA, AN, NA, and NN was especially plausible for (at least) one content. The contents and plausible rules were as follows:

Balloon: If a balloon is pricked with a needle, then it will pop (AA rule).

Car: If the battery is empty, then the car will not start (AN rule).

Exam: If a student has not learned, then he will fail the exam (NA rule).

Pregnancy: If a girl has not had sexual intercourse, then she will not be pregnant (NN rule).

Note that each content was used in all four kinds of rules. For example, for the balloon context, we also presented the AN rule, If a balloon is pricked with a needle, then it will not pop; the NA rule, If a balloon is not pricked with a needle, then it will pop; and the NN rule, If a balloon is not pricked with a needle, then it will not pop. We also designated four rules as implausible; these were the above rules with the polarity of the consequent reversed. The purpose of this partial classification of rules as plausible versus implausible was to ensure that participants would never


work through a phase of exclusively plausible or exclusively implausible problems (see procedures below).

The predictions were as follows:

Prediction 1: The dual-source model will exhibit an adequate model fit, and it will provide a better description of the data than the model by Oaksford et al. (2000).

Prediction 2: The ratings predicted by the dual-source model will adequately reproduce any polarity bias present in the observed ratings.

Prediction 3: The τ parameters should again be ordered approximately as τ(MP) ≥ τ(MT) ≥ τ(AC) ≥ τ(DA).

Method
Participants. Participants were 13 University of Freiburg students (four male, nine female, with age ranging from 20 to 42 years) with different majors, excluding majors that imply formal training in logic such as mathematics. Each participant was requested to go through five phases of the experiment with different phases separated by at least 1 week. Participants received monetary compensation of €21.

Procedure. Procedures were as in Experiment 1 with the following changes: Phase 1 was again the baseline phase in which 32 problems were presented without rules. These 32 problems were generated by crossing contents and (original and converse) inferences defined relative to the AA rules with the rule omitted. In Phases 2 to 5, the problems were presented with rules. Rule kind (AA, AN, NA, and NN) was crossed with content, generating 16 rules. Each phase presented problems for four of the 16 rules selected randomly with the following restrictions: In each phase, all four contents and all four rule kinds occurred, as did one of the four rules designated plausible and one of the four rules designated implausible. Contents and rule kind were thereby counterbalanced across the four phases with rule.

Results and Discussion


Consistency. Consistency information is shown in the lower right panel of Figure 2. Participants' ratings again appear to be fairly well calibrated. Correlations between the two ratings, with that of the converse inverted, tend to be reasonably high (M = .87, SD = .09). Summed over Phases 1 to 5, an average of 1.7 of the 80 problem pairs (SD = 2.39, range 0 to 9) were repeated in one of the third blocks. For the analyses below, ratings from the third block replaced ratings from the first two blocks, increasing consistency even further. Thereafter, there remained 22 problem pairs (2% of the data) in all the individuals' data with a sum of ratings outside the interval (40, 160). These individual ratings were treated as missing data in the model analyses.

Rating data. Ratings were aggregated across problem pairs of inference and converse inference as in Experiment 1, resulting in 80 data points per participant. Figure 6 shows the mean ratings as a function of inference and content. The upper left panel shows the ratings in the baseline phase without rule; the panels other than the bottom right panel show the ratings for the different rule kinds. The inferences are defined relative to the AA rules in the baseline

phase and relative to the relevant rule for the other panels. For example, in the panel showing the results with NA rules, if not-p then q, MP presents not-p and q as minor premise and conclusion, respectively. This makes it somewhat more difficult to compare the panels among each other than in the previous experiments.

Plausibility check. For each content, one of the four rules was a priori designated as plausible and one as implausible (there was no a priori classification of the remaining two rules as either plausible or implausible). For the four contents (exam, car, balloon, and pregnancy), mean rule believability for the plausible rules was, in order, 71 (SD = 23), 98 (SD = 5), 91 (SD = 27), and 93 (SD = 13); for the implausible rules, it was, in order, 11 (SD = 21), 5 (SD = 18), 2 (SD = 6), and 1 (SD = 2). The difference between the plausible and the implausible rule was significant for each content, smallest t(12) = 8.58, all ps < .01. Rule believability was again highly predictive of MP ratings across the 16 rules; the correlation between mean believability and mean MP rating was .85 across the 16 rules.

Prediction 1. For the model analyses, the 80 data points in the five panels of Figure 6 (other than the lower right panel) were fit by the dual-source model with 17 parameters for each individual separately. These comprise 4 τ parameters, 12 parameters underlying the knowledge parameters ξ (3 parameters, a[C], b[C], and e[C], per content), as well as 1 λ parameter for the relative weight of the form-based evidence in problems with rule. Model fit was acceptable (mean R² = .79, SD = .08), taking into account the comparatively low ratio of parameters to data points. The model by Oaksford et al. (2000) requires 28 parameters: 4 a and 4 b parameters as well as 4 e parameters for problems without rule and 16 new e parameters, 1 for each of the 16 rules. A separate exceptions parameter is necessary for each content and rule kind because, according to that model, the cases that are exceptions differ as a function of rule kind, and their perceived likelihood differs as a function of content. The mean R² was .78 (SD = .13), and the difference between the two models was not significant (t < 1). Thus, the dual-source model achieves a level of fit that is equivalent to that of the Oaksford et al. model, although it uses only 17 rather than 28 parameters. In comparing models, level of fit and parsimony of the model (in terms of number of parameters) must both be weighed (e.g., Myung, 2000), and hence it is fair to conclude that the dual-source model provides the better description of the data on the basis of its much greater parsimony (see also Footnote 6). Given that the Oaksford et al. (2000) model at this point already requires many more parameters than the dual-source model, we did not fit a modified version admitting additional b parameters for some of the rules as was done in the previous experiments: It is trivial that the goodness of fit of the model improves as more and more additional parameters are added. The lower right panel of Figure 6 also shows the model parameters under the dual-source model. As can be seen, the knowledge parameters (gray lines, coded relative to the AA rules) again roughly reflect the pattern of ratings in the baseline phase. The τ parameters are discussed below. Parameter λ, the relative weight of the form-based evidence, was .81 (SD = .31).

Prediction 2. Figure 7 shows polarity effects in ratings and model predictions.
Ratings and model predictions were collapsed across rules and contents separately for each inference and conclusion polarity (affirmative vs. negated), leaving out problems

Figure 6. Mean ratings and parameter estimates for Experiment 4. The top left, top middle, top right, bottom left, and bottom middle panels show, in order, mean ratings for problems without rule, problems with AA rule, with AN rule, with NA rule, and with NN rule. In each panel with rule, ratings for the rule designated plausible are shown along the bold line, and ratings for the rule designated implausible along the gray line. The bottom right panel shows mean parameter estimates (multiplied by 100) of the knowledge parameters ξ and of the τ parameters for the dual-source model. AA, AN, NA, and NN refer to conditional statements that differ in whether antecedent and/or consequent are affirmed (A) or negated (N). MP, MT, AC, and DA refer to the four conditional inferences (respectively, modus ponens, modus tollens, affirmation of the consequent, and denial of the antecedent).

presented without rule. It can be seen that there is a negative conclusion bias: Inferences with negated conclusions are rated as more probable than inferences with affirmative conclusions, as expected. An analysis of variance of the observed ratings with factors polarity and inference revealed a main effect of polarity, F(1, 12) = 22.32, p < .01, and a main effect of inference, F(1.44, 17.33) = 8.52, p < .01. The polarity effect was individually significant in one-tailed t tests for MP and DA inferences, as marked by asterisks in Figure 7 (left panel). In studies using abstract content, polarity biases are often strongest on MT and DA,

but there are few studies of polarity bias with materials for which prior knowledge is available. This same pattern of polarity bias was also reproduced by the dual-source model (middle panel of Figure 7) as well as by the Oaksford et al. (2000) model. That is, there were significant main effects of polarity for the predictions under both models, both Fs(1, 12) ≥ 11.56, both ps < .01, and individually significant polarity effects for MP and DA. As can be seen in Figure 7, the main effect of inference was correctly reproduced by the dual-source model, and it was significant in the ratings


predicted by that model, F(1.66, 19.99) = 6.10, p < .01. There was also an interaction of polarity and inference, F(1.59, 19.06) = 6.86, p < .01, for the ratings predicted by the dual-source model. However, the Oaksford et al. model does not even approximately reproduce the effect of inference, as is evident from Figure 7, and the main effect of inference was not significant under that model, F(1.76, 21.13) = 1.89, p = .18. It is of course possible and even likely that the latter model will eventually fit the observed pattern as additional parameters are added to it.

Prediction 3. The mean τ parameters are also shown in Figure 6 (see bottom right panel). As can be seen, they again follow the expected pattern with τ(MP) ≥ τ(MT) ≥ τ(AC) ≥ τ(DA). An analysis of variance of the τ parameters with factor inference (MP, MT, AC, and DA) showed a significant main effect of inference, F(1.99, 23.82) = 8.18, p < .01. Planned contrasts revealed that τ for the MP inference exceeded the mean τ for MT significantly, F(1, 12) = 19.32, p < .01, whereas τ for MT did not significantly exceed the mean τ for AC and DA (F < 1). The decrease from AC to DA was significant, F(1, 12) = 5.03, p = .045.6

Summary. The dual-source model again provided a satisfactory account of the data. It did so much more parsimoniously than the model by Oaksford et al. (2000). Despite similar overall levels of goodness of fit, only the dual-source model, but not the model by Oaksford et al., adequately reproduced the pattern of ratings as a function of polarity and inference (see Figure 7). Like in Oaksford et al.'s model, polarity bias is located in the knowledge-based component in the dual-source model: It reflects higher estimates for the probability of negated conclusions than of their affirmative counterparts.

General Discussion
What precisely is the role of the conditional rule in probabilistic conditional reasoning? In one view, asserting a conditional rule acts on the reasoner's knowledge base by depressing the perceived likelihood of exceptions, that is, of cases in which the antecedent is fulfilled, but not the consequent (Oaksford & Chater, 2007, Chapter 5). The purpose of the present article is to develop and test an alternative view. In this view, the logical form of the problem is a decontextualized source of evidence that is integrated with knowledge-based evidence in assessing the probability of proposed conclusions. Without a conditional rule, there is no complete logical form in the first place, and reasoners' responses are then based on the evidence derived from background knowledge exclusively. Asserting a conditional rule in this view does not act on the reasoner's knowledge base; instead, it makes available an additional source of evidence, based on logical form rather than on knowledge about the presented contents. This view was specified as a dual-source account of probabilistic conditional reasoning. According to the model, participants integrate two sources of evidence, logical form and prior knowledge. Form-based evidence comes into play to the extent to which a rule is stated in the first place, to the extent to which its relevance is emphasized, and to the extent to which a rule of the given form is seen as warranting a given inference, irrespective of content. Conversely, prior knowledge, operationalized summarily by means of bivariate probability information relating p and q, influences probability ratings to the extent to which rule relevance is

downplayed and to the extent to which the given logical form is not seen as warranting a given inference. When no rule is stated in the first place, there is no complete logical form, and the ratings are based on background knowledge exclusively. As already mentioned, the model can be framed as a normative model of Bayesian model averaging.

It is difficult to assess the role of the conditional rule in probabilistic conditional reasoning when there is no control condition without conditional rule. For that reason, we always implemented a baseline condition in which problems were presented without rule. Across four experiments and in the reanalysis of Liu's (2003) data, the model provided a relatively parsimonious account of complex data patterns. What is more, the experimental manipulations mapped on the different model parameters in the expected fashion. Thus, emphasizing the validity of the rule (Experiment 1) increased the relative weight given to rule-based evidence. The knowledge parameters ξ reflected the manipulations of perceived sufficiency and necessity (Experiments 1, 2, and 3). The τ parameters for the degree to which a given inference is seen as warranted by logical form roughly followed a conditional pattern, with MP and MT receiving higher values than AC and DA, although there were individual differences, with some participants showing a biconditional pattern (all experiments). Using rules of the form only-if in addition led to a focused effect on the τ parameters in that τ(MT) was increased for only-if relative to if-then (Experiment 3). Finally, the model accounted for observed polarity biases in the negations paradigm parsimoniously (Experiment 4). A control experiment (Experiment 2) defended two procedural choices implemented in the other experiments: (a) to obtain the baseline ratings without rule first and (b) to permit participants to reevaluate problems that they had rated highly inconsistently.

Across these experiments, the dual-source model was successfully evaluated (a) in terms of goodness of fit, (b) in terms of whether the effects of experimental manipulations targeted at specific model parameters indeed affected these parameters as expected, and (c) in terms of whether the model adequately reproduced critical data patterns (polarity biases). In the following sections, we compare the model parameters across experiments and consider the relationship of the dual-source model to alternative accounts of probabilistic conditional reasoning as well as to broader theories of reasoning.

We also fit a version of the dual-source model with different τ parameters for each kind of rule (AA, AN, NA, and NN). This model requires 16 τ parameters and a total of 29 parameters. Mean R² was .87, which was significantly higher than the mean R² for Oaksford et al.'s (2000) model, t(12) = 2.64, p = .02, which required 28 parameters. This version of our model allowed us to test whether negations in the rule exerted an effect on the τ parameters. In an analysis of variance with rule kind and inference as factors, neither the main effect of rule kind, F(2.59, 31.11) = 1.46, p = .25, nor its interaction with inference, F(4.52, 54.29) = 1.00, p = .42, was significant. Thus, there is little evidence for substantial effects of negations in the rule on the τ parameters. This agrees well with the finding that the dual-source model with τ parameters constant across rule kind adequately accounts for the polarity biases observed in the ratings, as just reported.
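The comparison in this footnote rests on per-participant goodness-of-fit values (a t statistic with 12 degrees of freedom implies 13 paired observations). A minimal sketch of how such a comparison could be computed, with made-up R² values standing in for the per-participant fits (the variable names and values below are ours, not taken from the article):

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant R^2 values for the two models (13 participants).
r2_dual_source = np.array([.91, .88, .85, .90, .87, .84, .89, .86, .88, .90, .83, .87, .85])
r2_oaksford    = np.array([.88, .85, .80, .87, .86, .79, .85, .83, .84, .88, .80, .84, .82])

# Paired t-test on the per-participant fit values (df = n - 1 = 12).
t, p = stats.ttest_rel(r2_dual_source, r2_oaksford)
print(f"mean R^2: {r2_dual_source.mean():.2f} vs. {r2_oaksford.mean():.2f}; t(12) = {t:.2f}, p = {p:.3f}")
```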

[Figure 7 appears here: three panels (Observed Data, Dual-Source Model, Oaksford et al. Model) plotting mean probability ratings, on a scale from 60 to 95, against inference type.]

Figure 7. Mean ratings as a function of inference (MP [modus ponens], MT [modus tollens], AC [affirmation of the consequent], and DA [denial of the antecedent]) and conclusion polarity (affirmative vs. negated). The left panel shows the observed ratings; the middle and right panels show the model predictions under the dual-source model and Oaksford et al.'s (2000) model, respectively. Asterisks mark inferences with significant polarity effects (at the 5% level of significance).

Comparisons Between Experiments


In this section, we compare the dual-source model parameters across experiments. This analysis can be seen as assessing the impact of individual differences between the different samples of participants and the impact of other contextual and procedural variables that differed between experiments.

τ parameters. The τ parameters from all experiments other than those estimated for the only-if problems in Experiment 3 were entered into an analysis of variance with factors inference (MP, MT, AC, and DA) and experiment. None of the effects involving experiment was significant (largest F = 1.67, smallest p = .18). Thus, the profile of τ parameters was consistent across experiments. The τ parameter estimates are shown in Figure 8 as a function of inference and experiment, along with the mean parameters over all participants. As can be seen, the overall means follow the expected pattern, with values ordered approximately as MP > MT > AC > DA.

λ parameters. We also compared the λ parameters for the relative weight of the form-based evidence across experimental conditions. We left out the λ parameters estimated for the phase with emphasis on the rule in Experiment 1, because in this condition an effort had been made to alter the value of λ from that in the standard conditions. An analysis of variance with experiment as factor revealed significant differences between the experiments, F(3, 69) = 3.13, p = .03: λ tended to be smaller in Experiments 1 and 2 (M = .63, SD = .37) than in Experiments 3 and 4 (M = .85, SD = .24).

[Figure 8 appears here: mean τ parameter estimates (in percent, roughly 30 to 100) plotted against inference type, with separate lines for Experiments 1, 2, 3 (if-then), and 4 and for the overall mean.]

Figure 8. Mean τ parameters as a function of inference (MP [modus ponens], MT [modus tollens], AC [affirmation of the consequent], and DA [denial of the antecedent]) and experiment, along with the overall means across all participants.

It is difficult to pinpoint the cause of this difference. One difference between the two groups of experiments is that in Experiments 1 and 2, there was only one rule per content, whereas several rules were used with each content in Experiments 3 and 4. This may have made the rules more salient in the latter experiments, but it is also possible that the differences reflect individual differences between the individuals sampled for the different experiments (see next section).

Knowledge parameters. In Experiments 1, 2, and 3, the same contents were used, with p and q interchanged for the HL and LH rules in Experiment 3. Taking that change into account, we submitted the knowledge parameters to an analysis of variance with the within-participants factors content and inference and the between-participants factor experiment. This revealed a three-way interaction of all three factors, F(9.85, 280.58) = 2.21, p = .02. To explore the interaction, we conducted separate analyses for each content. This revealed no significant effects or interactions involving the factor experiment for the HH, HL, and LH contents (largest F = 2.96, smallest p = .06). The factor experiment had a more pronounced impact on the LL content, where it interacted significantly with inference, F(4.08, 116.25) = 6.24, p < .01, and exerted a main effect, F(2, 57) = 3.19, p = .049. Thus, differences between the experiments in the knowledge parameters are largely confined to the LL content (If a person drinks lots of Coke, then that person will gain weight). This makes intuitive sense: Knowledge about the HH (If a predator is hungry, it will search for prey), HL (If a balloon is pricked with a needle, then it will pop), and LH (If a girl has sexual intercourse, then she will be pregnant) contents is likely to be shared to a greater extent than that for the LL content, which intuitively leaves more room for subjective assessments.

Relationship to Other Accounts


Second-order conditionalization. In the introduction, we discussed the relationship of the dual-source model to the concept of second-order conditionalization by Liu (2003). One way to look at the dual-source model is to say that it extends Liu's concept of second-order conditionalization: It provides an explicit model of the second-order conditional probabilities, and it relaxes the assumption, implicit in that concept, that reliance on the rule is perfect whenever a rule is given and the second-order conditional probabilities can be computed. The dual-source model thus admits that the major premise of a conditional syllogism, the rule, may be uncertain. Like most approaches in the field, it still considers the given minor premise as certain (but see Over & Hadjichristidis, 2009).
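For orientation, Liu's (2003) distinction between first-order and second-order conditionalization can be glossed as follows; the notation is ours, added for illustration, and is shown for MP only (analogous expressions hold for MT, AC, and DA):

```latex
% Our gloss of the first-order / second-order distinction (illustrative)
\[
  P_{\text{first}}  \;=\; P(q \mid p), \qquad
  P_{\text{second}} \;=\; P\bigl(q \mid p,\ \text{if $p$ then $q$}\bigr)
\]
```

In these terms, the dual-source model supplies an explicit parametric account of the second-order probabilities, via the weighting of form-based and knowledge-based evidence sketched earlier, rather than assuming that a stated rule is always fully relied upon.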

Oaksford et al.'s (2000) model. An alternative, probabilistic model of conditional inference was presented by Oaksford et al. The dual-source model provided better descriptions of the data than the Oaksford et al. model, with the exception of Experiment 2, in which both models performed equivalently (as expected). The dual-source model achieved significantly better goodness-of-fit values in Experiments 1 and 3. In Experiment 4, it gave a much more parsimonious description of the data, with only about half as many parameters as the Oaksford et al. model and with an equivalent level of goodness of fit. In addition, the dual-source model provided the more adequate account of the effect pattern of conclusion polarity and inference in Experiment 4 (see Figure 7).

Another advantage of the dual-source model is conceptual: Manipulations of the relevant knowledge via the contents used (Experiments 1 to 3), manipulations of logical form (Experiment 3), and manipulations of rule relevance via instructions (Experiment 1) all mapped on different model parameters, parameters intended to capture processes assumed to be differentially sensitive to these qualitatively different manipulations on theoretical grounds. In contrast, all parameters of Oaksford et al.'s (2000) model summarize the reasoner's relevant knowledge, and thus the different experimental manipulations do not map cleanly on separable parameters. For example, the exceptions parameters in Oaksford et al.'s model were sensitive to (a) the particular content used, (b) the presence or absence of a rule, (c) the kind of rule used, and (d) the emphasis that instructions placed on rule relevance. As a side effect, a new exceptions parameter has to be estimated for each rule-and-content combination in Oaksford et al.'s model, causing the model to be much less parsimonious when there are many such combinations, as in the negations paradigm.

Nevertheless, we believe that it is premature to reject Oaksford et al.'s (2000) model altogether at this point, because in absolute terms the model's disadvantage in goodness of fit was small and because of its appealing conceptual simplicity: It requires only one source of information, namely the bivariate probability distribution of p and q, which is assumed to be altered by the presence of a rule. It is possible that modifications and adaptations of that model can be found that are more successful than the modifications we tried out to remove the above problems. We believe, however, that at this stage the burden of proof for this possibility should reside with the proponents of that model. Although we do not claim to have refuted Oaksford et al.'s (2000) model decisively at this point, we do believe we have established a viable alternative to it in the form of the dual-source model. It provided better descriptions of the data with fewer parameters; like Oaksford et al.'s model, it can be given a normative interpretation; and it is capable of dealing with rule forms and connectives other than if-then without modifications. For example, it can be used with only-if, with or, and with other connectives simply by estimating new values for the τ parameters for the certainty with which the resulting logical forms are seen to warrant the studied inferences.

Verschueren et al.'s (2005a) dual-process model. Verschueren et al. (2005a, 2005b) have proposed a dual-process model of conditional reasoning in knowledge-rich contexts. These authors make a distinction between perceived sufficiency and necessity on the one hand and counterexample information in the form of alternative antecedents and disabling conditions on the other hand (but see Geiger & Oberauer, 2007). They argued that conditional reasoning can recruit both a fast and relatively undemanding heuristic process and an analytical process that takes more time and imposes a larger load on working-memory resources. Importantly, the heuristic process draws on perceived necessity and sufficiency, whereas the analytical process relies on counterexample information.
From the present point of view, both processes are thereby grounded in the knowledge-based mode of reasoning.

What Verschueren et al. (2005a, 2005b) showed is that the knowledge-based mode of reasoning may itself recruit several dissociable processes that differ in speed and working-memory demands (but see Geiger & Oberauer, 2007). As a consequence, we would predict that the effects reported by Verschueren et al. (2005a, 2005b) would be obtained even if the conditional rule were omitted from all problems and questions, so that participants must rely on background knowledge about the presented contents in the absence of complete logical forms.

Suppositional theory of if. The suppositional theory of if (Evans & Over, 2004, Chapter 8) received partial support. According to that theory, if-then leads reasoners to focus on p cases. The probability of the conditional, as well as the confidence in asserting q given p, is then derived in a process that is sensitive to the relative frequency of cases with p and q versus cases with p and not-q. As a result, both the confidence in the conditional rule and the confidence in MP are primarily dependent upon perceived sufficiency, P(q|p). In the present case, an estimate of P(q|p) was given by the ratings for MP problems without rule. In line with the suppositional account, perceived sufficiency predicted the rank orders of rated rule believability as well as of MP and MT ratings for problems with rule in all of our experiments.

On the other hand, the effect of adding a rule is in general inversely related to the ratings of rule believability and to the MP ratings without rule that assess perceived sufficiency. For example, in Experiment 1, MP problems without rule received ratings of, in order, 88, 94, 34, and 61 for the HH, HL, LH, and LL contents; rule believability was rated similarly as, in order, 89, 92, 39, and 56; yet adding a rule increased MP ratings in Phase 2 of Experiment 1 by, in order, 7, 3, 46, and 24 points on the percent scale. Some of this inverse relationship is undoubtedly due to ceiling effects: If rule believability and the MP rating without rule are already close to 100, there is little room for further increases. However, the inverse relationship also holds for the LH and LL contents, which start out at low to intermediate levels of believability (for ratings without rule) and end well below the ceiling (for ratings with rule; see, for example, contents LH and LL in Experiment 1). This somewhat paradoxical inverse relationship between (a) the size of rule effects on MP ratings and (b) MP ratings in the baseline phase without rule follows directly from the dual-source model: It is a consequence of the fact that the knowledge-based component enters the MP ratings with rule with a weight factor smaller than one, leading to a compression of the differences between contents relative to the baseline phase, which is exclusively driven by the knowledge-based component. It is difficult to see at first glance how the suppositional account of conditional reasoning (Evans & Over, 2004) would deal with this dissociation: The MP problem without rule, like the rule itself, should focus participants on cases with p. Thus, for MP ratings it should not make much of a difference whether a rule is stated. This expectation is underlined by the surprisingly close correspondence between ratings for MP problems without rule and ratings of rule believability in our data. Yet MP ratings with rule differed quite strongly from ratings of rule believability and from ratings of MP problems without rule.
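The compression argument can be made concrete with a small numerical sketch. It uses the illustrative integration rule given earlier together with a hypothetical value for the product λ·τ(MP); the value .5 below is invented for illustration, not estimated from the data, and only the baseline ratings are taken from the Experiment 1 values quoted in the preceding paragraph.

```python
# Illustrative only: baseline MP ratings without rule (Experiment 1, HH/HL/LH/LL),
# combined with a hypothetical form-based weight under the sketched integration rule
# rating_with_rule = k + lambda*tau_MP * (1 - k).
baseline = {"HH": 0.88, "HL": 0.94, "LH": 0.34, "LL": 0.61}
lam_tau_mp = 0.5  # hypothetical product of lambda and tau(MP)

for content, k in baseline.items():
    with_rule = k + lam_tau_mp * (1 - k)
    gain = 100 * (with_rule - k)
    print(f"{content}: baseline {100*k:.0f} -> with rule {100*with_rule:.0f} (gain {gain:.0f} points)")

# Output pattern: gains of about 6, 3, 33, and 20 points -- largest where the baseline
# is lowest, mirroring the observed gains of 7, 3, 46, and 24 reported in the text.
```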
One additional aspect of the suppositional account is, however, that it is embedded in a dual-system framework. According to dual-system theories (for a review, see Frankish & Evans, 2009), conditional inference integrates the outcomes of two distinct systems, one system being characterized as unconscious, rapid, automatic, high capacity, and contextualized, and the second as conscious, slow, effortful, deliberative, and decontextualized.

It may be that the presence of the rule triggers the second system, thought to be capable of logical inferences, to a larger extent than when problems are presented without rule. From this perspective, the dual-source model can be seen as a specification of the general idea that two systems may be involved in probabilistic conditional reasoning.

Dual-process theories in general. There is indeed an obvious relationship between the dual-source model and dual-system/dual-process theories in terms of the feature contextualized versus decontextualized. Rule-based evidence is conceived of as decontextualized in the dual-source model, depending only upon rule form irrespective of content. Content-based evidence is by definition contextualized and domain specific. We hesitate, however, to take a firm stance with respect to ascribing the other attributes, such as automaticity, efficiency, speed, and so forth, to one of the two sources of evidence and the processes recruited to process them. As already mentioned, we believe that both modes comprise several dissociable processes that differ on many of the just-mentioned attributes within each mode (see also Evans, 2009). For example, we argued above that Verschueren et al. (2005a, 2005b) have proposed that the knowledge-based mode can draw on at least two processes, one heuristic (fast and relatively efficient) and another more analytical (slow and less efficient; see also Beller & Spada, 2003).

In conclusion, the success of the present dual-source model in accounting for complex patterns of data in a psychologically meaningful way suggests that it may be premature to abandon the idea that an abstract, decontextualized representation of conditional rules operates in probabilistic conditional inference. This runs counter to the dominant knowledge-based view of probabilistic conditional reasoning as reviewed in the introduction, but it creates a link to previous work on reasoning (albeit outside the domain of specifically conditional reasoning) demonstrating that different modes of reasoning can be elicited by different instructions: a form-based mode by deductive instructions and a knowledge-based mode by inductive instructions (Heit & Rotello, 2005, 2008; Rips, 2001; Rotello & Heit, 2009). The present work also extends this latter line of research by showing that even under purely inductive instructions, both modes of reasoning are recruited, determining responses jointly as specified by the dual-source model. A rule, when present, raises one's subjective confidence in certain inferences irrespective of content, whereas probabilistic prior knowledge about the content domain under study comes into play to the extent to which the rule-based evidence is weak.

References
Beller, S. (2008). Deontic norms, deontic reasoning, and deontic conditionals. Thinking & Reasoning, 14, 305–341.
Beller, S., & Kuhnmünch, G. (2007). What causal conditional reasoning tells us about people's understanding of causality. Thinking & Reasoning, 13, 426–460.
Beller, S., & Spada, H. (2003). The logic of content effects in propositional reasoning: The case of conditional reasoning with a point of view. Thinking & Reasoning, 9, 335–379.
Braine, M. D. S. (1978). On the relation between the natural logic of reasoning and standard logic. Psychological Review, 85, 1–21.
Cummins, D. D. (1995). Naive theories and causal deduction. Memory & Cognition, 23, 646–658.
Cummins, D. D., Lubart, T., Alksnis, O., & Rist, R. (1991). Conditional reasoning and causation. Memory & Cognition, 19, 274–282.
De Neys, W., Schaeken, W., & d'Ydewalle, G. (2003). Inference suppression and semantic memory retrieval: Every counterexample counts. Memory & Cognition, 31, 581–595.
Evans, J. St. B. T. (1993). The mental model theory of conditional reasoning: Critical appraisal and revision. Cognition, 48, 1–20.
Evans, J. St. B. T. (2009). How many dual-process theories do we need? One, two, or many? In J. St. B. T. Evans & K. Frankish (Eds.), In two minds: Dual processes and beyond (pp. 33–54). New York, NY: Oxford University Press.
Evans, J. St. B. T., Handley, S. J., & Over, D. E. (2003). Conditionals and conditional probability. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 321–355.
Evans, J. St. B. T., & Lynch, J. S. (1973). Matching bias in the selection task. British Journal of Psychology, 64, 391–397.
Evans, J. St. B. T., Newstead, S. E., & Byrne, R. M. J. (1993). Human reasoning. Hillsdale, NJ: Erlbaum.
Evans, J. St. B. T., & Over, D. E. (2004). If. Oxford, England: Oxford University Press.
Frankish, K., & Evans, J. St. B. T. (2009). Systems and levels: Dual-system theories and the personal–subpersonal distinction. In J. St. B. T. Evans & K. Frankish (Eds.), In two minds: Dual processes and beyond (pp. 89–107). New York, NY: Oxford University Press.
Geiger, S. M., & Oberauer, K. (2007). Reasoning with conditionals: Does every counterexample count? It's frequency that counts. Memory & Cognition, 35, 2060–2074.
George, C. (1995). The endorsement of the premises: Assumption-based or belief-based reasoning. British Journal of Psychology, 86, 93–111.
Handley, S., & Feeney, A. (2003). Representation, pragmatics and process in model-based reasoning. In W. Schaeken, A. Vandierendonck, W. Schroyens, & G. d'Ydewalle (Eds.), The mental models theory of reasoning: Refinements and extensions (pp. 25–49). Hillsdale, NJ: Erlbaum.
Heit, E., & Rotello, C. M. (2005). Are there two kinds of reasoning? In B. G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th annual meeting of the Cognitive Science Society (pp. 923–928). Mahwah, NJ: Erlbaum.
Heit, E., & Rotello, C. M. (2008). Modeling two kinds of reasoning. In B. C. Love, K. McRae, & V. M. Sloutsky (Eds.), Proceedings of the 30th annual meeting of the Cognitive Science Society (pp. 1831–1836). Austin, TX: Cognitive Science Society.
Johnson-Laird, P. N., Byrne, R. M. J., & Schaeken, W. (1992). Propositional reasoning by model. Psychological Review, 99, 418–439.
Liu, I. (2003). Conditional reasoning and conditionalization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 694–709.
Liu, I., Lo, K., & Wu, J. (1996). A probabilistic interpretation of if-then. The Quarterly Journal of Experimental Psychology, 49, 828–844.
Markovits, H., & Handley, S. (2005). Is inferential reasoning just probabilistic reasoning in disguise? Memory & Cognition, 33, 1315–1323.
Markovits, H., & Thompson, V. (2008). Different developmental patterns of simple deductive and probabilistic inferential reasoning. Memory & Cognition, 36, 1066–1078.
Matarazzo, O., & Baldassarre, I. (2008). Probability and instruction effects in syllogistic conditional reasoning. Proceedings of World Academy of Science, Engineering and Technology, 33, 427–435.
Myung, I. J. (2000). The importance of complexity in model selection. Journal of Mathematical Psychology, 44, 190–204.
Oaksford, M., & Chater, N. (2007). Bayesian rationality. Oxford, England: Oxford University Press.
Oaksford, M., Chater, N., & Larkin, J. (2000). Probabilities and polarity biases in conditional inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 883–899.
Oberauer, K., & Wilhelm, O. (2003). The meaning(s) of conditionals: Conditional probabilities, mental models, and personal utilities. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 680–693.
O'Brien, D. P., Dias, M. G., & Roazzi, A. (1998). A case study in the mental models and mental-logic debate: Conditional syllogisms. In M. D. S. Braine & D. P. O'Brien (Eds.), Mental logic (pp. 385–420). London, England: Erlbaum.
O'Hagan, A., & Forster, J. (2004). Kendall's advanced theory of statistics: Vol. 2B. Bayesian inference. London, England: Arnold.
Over, D. E., & Hadjichristidis, C. (2009). Uncertain premises and Jeffrey's rule. Behavioral and Brain Sciences, 32, 97–98.
Rips, L. J. (1994). The psychology of proof. Cambridge, MA: MIT Press.
Rips, L. J. (2001). Two kinds of reasoning. Psychological Science, 12, 129–134.
Rotello, C. M., & Heit, E. (2009). Modeling the effects of argument length and validity on inductive and deductive reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 1317–1330.
Rouder, J. N., Lu, J. D., Morey, R., Sun, D., & Speckman, P. L. (2008). A hierarchical process-dissociation model. Journal of Experimental Psychology: General, 137, 370–398.
Schroyens, W., Schaeken, W., & d'Ydewalle, G. (2001). A meta-analytic review of conditional reasoning by model and/or rule: Mental models theory revised. Unpublished manuscript, University of Leuven, Belgium.
Stevenson, R. J., & Over, D. E. (2001). Reasoning from uncertain premises: Effects of expertise and conversational context. Thinking & Reasoning, 7, 367–390.
Thompson, V. A. (1994). Interpretational factors in conditional reasoning. Memory & Cognition, 22, 742–758.
Thompson, V. A., & Mann, J. M. (1995). Perceived necessity explains the dissociation between logic and meaning: The case of only if. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 1554–1567.
Verschueren, N., Schaeken, W., & d'Ydewalle, G. (2005a). A dual-process specification of causal conditional reasoning. Thinking & Reasoning, 11, 239–278.
Verschueren, N., Schaeken, W., & d'Ydewalle, G. (2005b). Everyday conditional reasoning: A working memory-dependent tradeoff between counterexample and likelihood use. Memory & Cognition, 33, 107–119.


Received February 25, 2009 Revision received December 4, 2009 Accepted December 7, 2009

Correction to Klauer et al. (2010)


In the article "Conditional Reasoning in Context: A Dual-Source Model of Probabilistic Inference," by Karl Christoph Klauer, Sieghard Beller, and Mandy Hütter (Journal of Experimental Psychology: Learning, Memory, and Cognition, 2010, Vol. 36, No. 2, pp. 298–323), the dual-source model is overparameterized. Only the products of the λ and τ parameters are uniquely identified by the data. This has no consequences for the knowledge parameters, for ratios of τ parameters estimated with the same λ, for ratios of λ parameters associated with the same τ parameters, nor for the fit values. The model fit is, however, achieved more parsimoniously than stated in Klauer et al. because one parameter (Experiments 1, 2, and 4) or two parameters (Experiment 3) are redundant. To fix the scale for the λ and τ parameters, one of them has to be set to one. We recommend setting the largest of τ(MP), τ(MT), τ(AC), and τ(DA) equal to one. This yields unique parameter estimates for λ and τ but has consequences for their interpretation: Differences in the overall level of the profile of τ parameters over the four inferences (due to, e.g., differences in cognitive load), if any, would be removed from the τ estimates and would show up in the λ parameters. The above constraint is the one implicitly imposed almost perfectly by the estimation method used in Klauer et al. (2010). In consequence, when the constraint is explicitly enforced, the numerical values of the parameter estimates reported in Klauer et al. change only minimally, and the outcome of all of the significance tests reported remains the same.
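A small numerical sketch of the identifiability point, using the illustrative integration rule given earlier; all parameter values below are invented, and the rescaling and the normalization that sets the largest τ to one follow the recommendation above.

```python
import numpy as np

def predict(lam, tau, k):
    """Illustrative dual-source prediction: k + lambda*tau*(1 - k)."""
    return k + lam * tau * (1.0 - k)

# Invented parameter values for the four inferences (MP, MT, AC, DA) and one content.
tau = np.array([0.90, 0.70, 0.50, 0.40])
lam = 0.60
k   = np.array([0.80, 0.55, 0.65, 0.50])   # knowledge-based estimates

# Rescaling lambda and tau in opposite directions leaves every prediction unchanged,
# because only the products lambda*tau enter the predictions.
c = 1.5
assert np.allclose(predict(lam, tau, k), predict(lam / c, tau * c, k))

# Normalization recommended in the correction: set the largest tau to one
# and absorb the scale into lambda.
scale = tau.max()
tau_id, lam_id = tau / scale, lam * scale
assert np.allclose(predict(lam, tau, k), predict(lam_id, tau_id, k))
print(tau_id, lam_id)   # identified parameters: largest tau equals 1, lambda = 0.54
```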
