

THEORETICAL ISSUES IN CURRICULUM EVALUATION [1]


Janet Grant, Open University Centre for Education in Medicine

[1] In: Page, G. (ed.) Essays on Curriculum Development and Evaluation in Medicine. Report of the 2nd Cambridge Conference. University of British Columbia, 1989.
1. Introduction

This paper will not present or construct a theory of evaluation in medical education. Neither will it analyze the practice of evaluation in this field. Instead, I intend to consider a variety of underlying or implicit issues which inform the practice of curriculum evaluation generally. These issues are not particular to medical education, although their implications might be. My main concern will not be to concentrate on these implications, but rather to address the issues themselves and to give some indication of their importance to the instigators, implementers, providers, receivers and subjects of curriculum evaluations in medical education. This places on the reader the burden of keeping in view the special context of medical education and considering to what extent these general issues are of particular significance in this field.

To state that this paper is about theoretical issues in curriculum evaluation is also to state implicitly that it is not about a variety of other possible things. Thus it is not about the measurement of clinical competence nor about the assessment of students. It is not about the experience of curriculum development in medical education. Instead, it looks at the thinking, values and assumptions which underpin the main current approaches to educational evaluation, the selection and use of such approaches, and the implications of that selection for many aspects of practice. The discussion will focus exclusively on theory and so begins with an examination of what theory actually is.

The paper is presented in the following five parts:

1. The importance of theory
2. Theories and evaluation
3. Why is evaluation problematical?
4. The value component
5. Deriving and using theory for evaluation in medical education

2. Part One: The Importance of Theory

Theory is of central importance simply because it both derives from and informs practice. It provides a framework for analysis and understanding and allows participants in a complex field to communicate effectively: for an appreciation of another's theoretical orientation is fundamental to any understanding of that person's practice. In our particular field of curriculum evaluation in medical education, those participants might be the evaluators or any person within the medical education system: student, teacher, committee member, administrator or patient. Each will be operating with a set of assumptions about the nature of evaluation. To achieve progress, it is often those assumptions which must be examined rather than their practical consequences. For example, disputes often arise about methodology. The medical scientist tends to operate with a model of research which is quite unlike that of the new wave evaluators. Or they might, perhaps, be more interested in the skills of performance of the individual teacher (reflecting the orientation of assessment of clinical competence), rather than in more fundamental and difficult issues about the learning milieu: they might construe evaluation as assessment and expect facts and figures rather than descriptions and qualitative evidence. Parlett and Dearden (1977) describe this problem vividly:

"Difficulties are often encountered in justifying illuminative evaluations to outsiders still wedded to traditional notions of educational research. If one proposes a study that is essentially descriptive and interpretive, the response may well be that while it is useful as a preliminary or pilot study, if it is to be a proper study it should have clear hypotheses, should be objective and capable of public verification. The methods of more holistic, experience-based studies are likely to be considered soft and unreliable: open-ended interviewing, naturalistic description, portrayal of school life, and distillations of participants' points of view may be adequate for a novelistic and impressionistic account but it can only be, they say, a personal interpretation full of the investigator's biases and prejudices."

Such problems cannot be resolved without recourse to underlying theoretical perspectives. But what is meant by theoretical in this context? I intend to adopt a broad, common-sense definition which will encompass some phenomena, such as models and conceptual frameworks, which philosophers in the area might construe as inadmissible as theory (Warr, 1980). In our definition, theory concerns things that are not practical but have implications for and derive from practice. If theory does not derive from practice, then we can say either that it is not good theory or that it is not theory at all but some form of speculation or wishful thinking. Given this proviso, our discussion of theory will assume that the following are admissible for consideration:

- Conceptual frameworks, which provide a technical language or a set of interpretative principles.
- Paradigms, which may be defined as strong networks of commitments: conceptual, theoretical, instrumental and metaphysical (Kuhn, 1962).
- Models, which may be defined in two ways: Model 1, simplified representations of a part of known reality; Model 2, imported analogues to assist thinking about the unknown or unfamiliar (Warr, 1980).

Although I intend to assume that the term theory could include all these, it is necessary to recognise that sometimes they might not conform to our given definition of theory. That definition is clear that theory does not describe practice but is a kind of grounded abstraction of it which is both retrospective and prospective and therefore constantly developing. In philosophical terms, such a definition is clearly antagonistic to the thinking of logical positivism, which construes theories as self-contained and axiomatic, having no direct reference to the observed world. It is opposed to the rationalist school (of which the objectives movement could be seen as a prominent member), which admits the reality only of reason and not of experience. It is opposed also to the empiricist school, which would argue that only the data of perception reflect realities of the objective world. The definition that is adopted here, which seems to reflect the actual state of the art, is firmly based in the philosophical school of dialectical materialism. According to this view, in the relationship between theory and practice, practice is primary. Theory derives from practice and in turn informs future practice. Clearly, the theory of curriculum evaluation we adopt will be powerful in determining the design of the evaluation programme; that is why we adopt a theoretical perspective in the first place if we are practising evaluators. Hamilton (1976) puts this very clearly:

"Evaluation theories provide a framework of assumption and a set of procedural targets. For example, the theory of goal-free evaluation suggests that evaluators should look for actual effects rather than intended outcomes; and, in doing so, should work independently of curriculum developers."
The question remains, of course, whether the theory of goal-free evaluation did derive from practice prior to informing it. This is an important issue, because the practice-theory-practice cycle of dialectical materialism is one in which it is inevitable and necessary that the theory itself should change and develop. Mao Tse-Tung, in his essay On Practice (1968), suggests that errors of practice or implementation often occur because people adopt a theoretical position which is based on values (Stufflebeam and Webster (1980) give six examples of such work), apparently helpful analogies (e.g., Eisner's (1972) art criticism approach) or transpositions from other disciplines (e.g., House (1980) cites professional review and quasi-legal models). Mao suggests that such a situation is almost bound to lead to failure. To be helpful, theory must be derived from relevant practice, even if that practice is vicarious. Perhaps this is an indication to us that medical education requires its own theories of curriculum development and evaluation. But that is an issue to which we shall return and which will only be resolved by analysing where the theories we use come from and how they relate to the experience of everyone concerned. That, however, is not a theoretical issue; it is a practical, historical one.

So what is subsumed under our heading of theoretical issues? First, we should consider whether or not the term evaluation has any agreed or common meaning, either in theory or in practice. This will involve us in considering a range of credos of evaluation, classifications and taxonomies. Next, we shall find ourselves in a position to consider exactly why evaluation is so problematical. This is a theoretical issue of key importance, for it requires us to understand and view the complexity of both the theory and practice of evaluation, especially in the field of medical education. Then, it will be helpful to consider what might appear to be an entirely metaphysical problem but turns out to be a very practical one: the role of values and value judgements in evaluation. And finally, we shall consider where this leaves us in terms of the derivation and use of our own theories of evaluation in medical education.

To end this part of the paper, it will be useful to draw out its main implications for the practice and practitioners of curriculum evaluation in medical education:

- Curriculum evaluation in medical education is a special case of curriculum evaluation in general. It has its own characteristics which should be taken into account when resolving theoretical issues and deriving and using theory.
- Theoretical issues (as opposed to theories) concern the thinking, values and assumptions which eventually contribute towards the selection and use of evaluation approaches, and the implications of that selection for many aspects of practice.
- Theory both derives from and informs practice. It provides a framework for analysis and understanding and allows participants in a complex field to communicate effectively.
- Where communication or agreement about evaluation approach or interpretation of data is problematical, it is useful to consider whether the same theoretical assumptions are held by all. Reconciliation of theoretical differences may well be the most effective way of solving the problem.
- Some apparent theories might not, in reality, match up to the given definition. Statements can only be given the status of theory, for our purposes, if they are clearly an abstraction which derives from practice.
- Acceptance of the dialectical-materialist definition of theory, which seems to be justifiable and appropriate in this circumstance, gives primacy to the role of practice. The practice-theory-practice cycle also makes it inevitable that our theory will evolve and develop and that new theories will arise according to material circumstances and social interaction in the context of medical education.

3. Part Two: Theories of Evaluation

The complexity of evaluation seems to derive equally from its practice and its theory. It might also derive from its lack of theory as we have defined it. There is a vast sea of thinking and writing about the area, but how much of it is based on practice rather than on values or beliefs or speculation alone, we shall have to consider.
For our own purposes, we shall also have to consider to what extent the theories that do exist are appropriate to our own field of practice. Is there anything about medical education which makes vicarious experience irrelevant? Before addressing these questions in a later part of the paper, we can try to gain some insight into the complexity of evaluation by considering whether evaluation has any common, agreed meaning in either theory or practice. We can begin to do this by reviewing some major credos of evaluation. Such a task is daunting when we consider that Stufflebeam and Webster (1980) have identified thirteen types of studies that have been conducted in the name of educational evaluation. House (1980) points out that there are dozens of advocated approaches to evaluation. He finds it possible, however, to classify these into eight major types. Scriven (1967) identified six basic dimensions of evaluation, while Worthen and Sanders (1973) created an even more elaborate taxonomy of evaluation designs. Stake (1976a), for his part, settles for a great oversimplification and identifies nine approaches to educational evaluation and eight dimensions for classifying evaluation designs. Stake's comment on his grid of approaches to educational evaluation seems to summarise neatly (and unintentionally) the problems which face anyone who tries to make sense of this area:

"The approaches overlap. Different proponents and different users have different styles. Each protagonist recognises one approach is not ideal for all purposes. Any one study may include several approaches. The grid is an oversimplification. It is intended to show some typical, gross differences between contemporary evaluation activities."

The bold-face italics have been added to illustrate the variety of variables to be taken into account in the process of thinking about evaluation, before any practical decisions have ever been entertained.

To approach more closely the complexities of educational evaluation, it will be useful to look briefly at some of its major definitions. We should note, however, that not only are there many definitions, but these in themselves might be open to question. We, echoing Jenkins (1976), would invite you to treat all definitions as problematic. Given our definition of theory, perhaps we might speculate that some problems arise because definitions are not always based on practice-theory-practice as a development process. Either empiricism or rationalism or a value system or a quasi-hypothesis might give rise to such definitions. This is a test which we should apply to the major approaches which we shall now review. However, it is rarely the case that writers explain the derivation of their theories or give the practical rationale or evidence for their preferred approach.

The approaches chosen for consideration are representative of contemporary thought. A historical overview, such as that provided by Hamilton (1976), gives a developmental and explanatory account of evaluation and the various research traditions which have contributed, with particular reference to psychometrics, field experimentation and comparative studies. He then traces the introduction and uptake of Tyler's objectives evaluation model. It is at this point that the historical overview takes on a pale contemporary identity. How many evaluations are still charged to ascertain whether the curriculum is effective, whether it produces a better doctor or, indeed, whether it achieves its objectives? It is probably not chance that an evaluator at a London medical school in the 1970s began work by helping teachers and departments to write their objectives. Since the objectives evaluation model is still encountered in the context of medical education, we shall begin there and follow with the theories or models of more recently prominent workers in the field. We should bear in mind, however, that working in a field does not necessarily mean that theory is constructed out of that practice; neither does it mean that the theory actually informs subsequent practice. So we shall consider the following credos, and we shall have to decide for ourselves whether or not we consider that each attains the status of a true theory:

- Tyler's objectives evaluation approach
- Stufflebeam's context, input, process and product model
- Scriven's goal-free evaluation
- Stake's evaluation as portrayal
- Eisner's art criticism approach
- Parlett and Hamilton's illuminative evaluation
- Stenhouse's research model of evaluation

In considering each of these, even briefly, it will become clear that theories differ in terms of the theorist's view of curriculum mechanics, use of evaluation findings, processes of decision making, sources of relevant data and key emphases.

a. Tyler's Objectives Evaluation Approach.

This credo is an integral part of a particular view of curriculum design and development, as are all the credos we shall consider. According to this view, the curriculum is evaluated (or, perhaps more precisely, assessed) against its pre-specified set of objectives. The curriculum development model on which such an approach is predicated involves some variant of the following: a statement of curriculum aims is translated into a set of objectives expressed in terms of student performance, on the basis of which curriculum materials are devised. The evaluation will then consist of measuring the fit between actual student performance and the set objectives. There seem to be no obvious methodological implications attached to this view of curriculum evaluation other than the need to use methods which measure students' attainment of objectives. Perhaps a largely quantitative methodology is implied or perhaps, more recently, not. What it does imply, however, is a role for the evaluator. Jenkins (1976) summarises this: it traps the evaluator securely within the rhetoric of intent of the programme builder. The evaluator merely feeds back evidence of shortfall, allowing the curriculum designer to vary the treatment until the students' behaviour matches the pre-specified objectives.

It must be said, in relation to the status of this credo as a theory, that the objectives model in its entirety has not been without problems of translation into practice. There are many instances of objectives being written. Perhaps most medical schools these days do make some attempt to state objectives. Nonetheless, the systems model as a whole is a rare and exotic animal, perhaps seen by no living being, although enjoying a prime position in pedagogical mythology. The objectives model, despite its dominance, has been the subject of consistent and cogent criticism, which Stenhouse (1976) summarises. Macdonald-Ross's (1973) paper is also still regarded as a classical critique of the model. This being so, we would be well advised to be sceptical when considering whether or not Tyler's credo attains the status of theory as we are using the term.

b. Stufflebeam's Context, Input, Process and Product (CIPP) Model.

Stufflebeam's well-known credo could be seen as one of the first of the new wave. His view is that:

"Educational evaluation is the process of delineating, obtaining and providing useful information for judging decision alternatives." (Stufflebeam et al., 1971)

The new, key emphasis of this credo on decision making makes it different from Tyler's credo in an important way. The objectives model separates data-gathering from decision making, whereas the decision-orientated approach binds together the judgments of the decision maker and the data-gathering process. Thus the evaluation design identifies the level of decision to be served and gives consideration to that decision situation, defining clear criteria for it. The design also defines policies for the evaluator. The requisite information is then collected, organised, analyzed and reported. Each phase is broken down into sub-tasks for the evaluator. We see here, then, an elaboration of the role of the evaluation and of the evaluator. The CIPP model potentially contains the objectives model but, as the name implies, encompasses other domains also. So there are potentially four types of evaluation, which correspond to four decision areas for curriculum designers:

- Context evaluation concerns planning decisions which determine objectives.
- Input evaluation concerns structuring decisions which determine procedural designs.
- Process evaluation concerns the implementation of decisions to control, use and define procedures.
- Product evaluation concerns decision review in the light of project attainments.

The last of these evaluation areas has obvious points in common with Tyler's credo. Like Tyler, Stufflebeam seems to make no particular recommendations about appropriate methodology. But unlike Tyler's, Stufflebeam's credo lacks even an implicit statement of general methodological orientation. This must be so, given the broad range of evaluation areas which Stufflebeam identifies. Can Stufflebeam's credo, then, be regarded as a true theory? It is arguable that it purports to be no such thing. On the other hand, it is based on an analysis of the process of curriculum decision making and so, by implication, would seem to fit in with the practice-theory-practice definition. Whether or not that analysis is correct, of course, is open to question. Some might think not (Jenkins, 1976). But that is not really the issue. A theory can be wrong and still retain the status of theory if it continues to evolve with practice.

c. Scriven's Goal-Free Evaluation.

No mention of Scriven's work would be tenable without recognition that it was he who introduced the terms formative and summative to this field (Scriven, 1967). However, that major contribution is not our main focus and, indeed, the paper in which the sea-change occurred was one from which Scriven himself later departed intellectually. Scriven's work, for us, must represent a further broadening in the perspective of educational evaluation. Scriven (1973) held that an evaluator should remain deliberately uninformed about a programme's goals in order not to be biased by them. He makes much of the independence of the evaluator. This approach to evaluation must be seen in the light of Scriven's more basic concern to reduce the effect of bias in evaluation. So his theory is based on the practice of evaluation rather than on the practice of curriculum development exclusively. Being unhampered by statements of curriculum intent, goal-free evaluation does not decide in advance what data will be collected or deemed relevant, except insofar as they might be relevant to a standard evaluation checklist. The evaluator is free to look at processes and procedures as well as outcomes and is likely to discover many unanticipated side-effects. Methodologically, this opens the field to the hunter (an analogy which Scriven himself used). In the few goal-free evaluations on record, techniques seem to have varied widely. Nonetheless, in Scriven's own mind the terrain is to be mapped by means of a lethal checklist of thirteen criteria points against which any subject of evaluation must be rated on a five-point scale.

Superficially, it might seem that Scriven's goal-free evaluation is closely associated with illuminative evaluation or portrayals (see below for both). But although it has in common with them the lack of objectives orientation, it does not go so far along the road to freedom from content parameters and necessary qualification. One effect of occupying such middle ground is that goal-free evaluation has been appreciated and criticised from both ends of the evaluation spectrum. Stufflebeam (1972) feels that sponsors of evaluations require certain pieces of information about the achievements of the educational programme and of its goals in particular. On the other hand, illuminative evaluators might feel that an evaluation which is predicated on a lethal checklist might not get to the heart of the matter. Whatever the criticism, it is difficult to judge this approach in terms of its status as theory or its responsiveness to the practice of curriculum development.

d. Stake's Evaluation as Portrayal.

Stake (1976b) is quite clear that an evaluator has no need to use a checklist or any other kind of measurement tool. Indeed, the evaluator should not, but needs merely to make a comprehensive statement of what the programme is observed to be, with useful reference to the satisfaction and dissatisfaction that appropriately selected people feel towards it. In doing this, Stake feels it necessary to decide between analysis and portrayal, just as in his previous writing he had distinguished between formal and informal evaluation, and between description and judgement data. Evaluation as portrayal is presented as a developed and slightly altered form of responsive evaluation, which Stake had discussed comprehensively in 1972 in terms of orientation towards programme activities rather than programme intents, response to audience requirements for information, and the different value perspectives present within the programme. Stake's ideas developed quite rapidly during the 1970s and it is sometimes difficult to trace the evolution of the various strands. It is also difficult to establish exactly why his ideas changed: whether it was because of a change in his values, his experience or new influences on him. It is difficult to know whether we are considering a true theory or not. Nonetheless, let us consider Stake's (1975) obviously heartfelt credo:

"It's a tough choice: analysis or portrayal. The evaluator has to help figure out which will be more useful. How does the record look now? Are programme evaluation studies connected to programmes by more than a tenuous shoestring? Are they little more than the exploiting of a reviewer's pet idea, an instructional researcher's hunch or a psychometrician's fascination? We owe people more than that."

Stake obviously developed his view partly, at least, on the basis of his analysis of the practice of evaluation. The influence on him of developing curriculum practice is not so clearly defined. Neither is it very clear who the audience for such evaluations might be. The question of defined or implied methodology is rather easier to deal with. Portrayal as a technique seems to have a qualitative rather than a quantitative aura. Portrayal is likely to tell a story in a narrative manner, and Stake supposes that some might be very short, featuring a script, logbook or scrapbook. A longer portrayal might require a broader range of media, including even role-playing, photographs, tapes and various exhibits.

The question which seems even more important than whether or not such an approach constitutes a true theory (it seems doubtful that it does) is whether or not it is viable even as hopeful speculation, given that the audience of the portrayal has to be appropriately receptive to it and see it in terms that will enable action to be taken. Medical teachers, and the sources of funding for such projects as evaluation of medical curricula, are firmly set within a hard science tradition rather than a (perhaps more appropriate) clinical case study tradition when it comes to anything that resembles or is research. Demands for P values have to be seen as reasonable against such a background. What is also reasonable is recognition on the part of the evaluator that audience education is probably a prior necessity if findings are to have any acceptability or use. The pragmatics of the situation require careful scrutiny. It is reasonable to hypothesise that some conceptual leaps will be too demanding, especially when education is often not the main priority of the audience anyway. Despite this specific qualification, it is Stake's intention to make evaluation reports accessible to diverse audiences. Perhaps it is simply a question of what accessible means in any given context.

e. Eisner's Art Criticism Approach.

Eisner's disclosed connoisseurship has some affinity with Stake's portrayal, although perhaps Eisner is less concerned with the issue of accessibility. The overall concept of evaluation as art criticism arose out of Eisner's (1969) championing of the idea of expressive objectives. Curricula using such objectives would be best advised to use the forms of evaluation that any other field of expression would use (Eisner, 1972). And so the idea developed in its own right. Art criticism is a way of judging the curriculum, just as an art critic judges a painting or piece of pottery. Criticism is essentially qualitative and is an empirical undertaking. Criticism is not, as the everyday meaning of the word conveys, negative appraisal, but rather the illumination of qualities so that a judgement of value can be made. Effective criticism should have some instrumental effect on the perception of an audience. Connoisseurship, for Eisner, is the art of appreciation, whereas criticism concerns disclosing the qualities of an item. For him, the necessity is to disclose the connoisseur's appreciation for all to share and benefit from.

It is almost certain that art criticism does not attain the status of a theory of educational evaluation, although it might appear as an idea. It is a transposition based on the practice of another, really quite separate, discipline. Despite this, Eisner does address issues of validity and reliability of data, but this seems somehow out of place within an approach which is designed not to reach a definitive judgment but to expand perception. What is most interesting about this approach seems to be the fact of its constant appearance in the literature on educational evaluation despite its apparent lack of use in practice. Perhaps there is a fundamental division to be made between philosophical theories and practical theories in the world of curriculum evaluation. Our dialectical-materialist definition of theory is clear that only theory which derives from practice is likely to succeed and that all else will fail the test of implementation. This certainly is borne out in this brief review of credos of evaluation. Thinking about evaluation is one thing, but actually doing it is another, hedged about with a different set of rules.

f. Parlett and Hamilton's Illuminative Evaluation.

Illuminative evaluation might seem to have something in common with both art criticism and with portrayal. It has the distinction, however, of being a true theory grounded in practice and altering as it informs practice. The primary concern of illuminative evaluation, in common with all new wave evaluations, is to contribute to decision-making by providing description and interpretation rather than any form of quantification or narrow prediction. It belongs to the anthropological research paradigm and uses naturalistic research methods at its centre, with other forms of data gathering (which might include quantified approaches) as necessary. Methods are not specific but are very clearly implied by the general orientation of the approach. We can hardly do better than quote Parlett and Hamilton (1972) at length:

"The aims of illuminative evaluation are to study the programme: how it operates; how it is influenced; what those directly concerned regard as its advantages and disadvantages; and how students' intellectual tasks and academic experiences are most affected. It aims to discover and document what it is like to be participating in the scheme, whether as a teacher or pupil; and, in addition, to discern and discuss the innovation's most significant features, recurring concomitants and critical processes. In short, it seeks to address and to illuminate a complex array of questions."

In the credos described so far there have been underlying theories of curriculum design and development to varying degrees. Central to Parlett and Hamilton's theory are the concepts of the instructional system and the learning milieu. The instructional system corresponds roughly to the curriculum plan and all its elements in implementation, while the learning milieu is the social-psychological and material environment in which students and teachers work together. These two concepts provide a framework for data gathering. The theory of illuminative evaluation provides no prescriptions about methodology; instead it is a general research strategy and, as such, is adaptable and eclectic about methods. The evaluation problem defines the methods to be used, and these must be selected both at the initiation of a project and at interim stages, as necessary. The general methodology involves the investigators in observing, enquiring further and seeking explanations for findings. Data would most usually be collected in four ways: observation; interviews; questionnaires and tests; documentary and background sources. Parlett and Hamilton are aware of the issue of data validity and recommend a variety of methods for establishing this. At the same time, they also point out that no form of research is immune from bias. Unusually, they are also aware that there is a methodological question associated with the position of the investigator. They point out that research workers in this area need not only technical and intellectual capability, but also interpersonal skills.

Such breadth of perception is also apparent in relation to the audiences for illuminative evaluation reports. At least three groups of decision makers are, realistically, identified as the audience for such reports. These are the programme's participants, its sponsors or supervisors, and interested outsiders such as other curriculum researchers and planners. Each group has its own concerns and will reach different decisions on the basis of the same information. For that reason, the illuminative evaluator concentrates on information gathering and interpretation rather than on decision-making. This is not to say, however, that decisions are not shaped by the judgments presented in the reports.

g. Stenhouse's Research Model of Evaluation.

Stenhouse's model of evaluation is firmly based in a set of ideas about how curriculum development might best occur rather than how it actually does occur. That alone does not preclude the model from being classed as a theory. The model assumes that curriculum decisions rest with the individual school, that the school is the focus of curriculum development, and that a process of continuous organic development becomes possible. Thus formative evaluations and continuous adjustment and improvement become the order of things. In other words, curriculum evaluation is integrated into the wider curriculum process. But how is this to be achieved? It is in answering this question that the singularity of Stenhouse's approach and the power of his values (shared, one would suspect, by many teachers) become clear. Stenhouse's central concerns are with the relationship between curriculum developer and evaluator, and with the relationship between developer/evaluator and teacher. For Stenhouse (1976), it is generally the case that:

"The curriculum developer is seen as one who offers solutions rather than as one who explores problems. The evaluator is the critic or the practical man who tempers enthusiasm with judgment."

Stenhouse's concern is focused on this relationship. He is sure that:

"there must be an aspiration to grow beyond it to a more scientific procedure which builds action and criticism into a more integrated whole."

This requires a research model in which the developer is the investigator and the curriculum is always in a developmental stage. Although Stenhouse regards this suggestion (unfortunately, in my view) as Popperian, it could equally (and, I would say, more usefully) be regarded as a dialectical-materialist approach. Practice-theory-practice, with the emphasis on practice and the practitioner, seems to be exactly what he is advocating as a model of continuous curriculum development. For Stenhouse, however, it is a research model of continually making hypotheses and testing them.

So far, then, we can see that Stenhouse has taken others' concerns about the relationship between evaluation and decision-making to their logical conclusion. Having done this, he must now face the question which others have managed to deal with only peripherally or pragmatically (or in self-defence), if at all. That question concerns who will actually do the research. For Stenhouse, it is only the teachers themselves who are in a position to understand the uniqueness of their classroom setting. He is sure that "it is not enough that teachers' work should be studied: they need to study it themselves". This would require of each teacher an extended professionalism which would have the following critical characteristics:

- The commitment to systematic questioning of one's own teaching as a basis for development.
- The commitment and the skills to study one's own teaching.
- The concern to question and to test theory in practice by the use of those skills.

This implies for Stenhouse a set of methodologies to be mastered and used, including either direct or indirect observation, peer review, some form of content analysis of teaching content, and case study work. Stenhouse did not work out in detail the methodology to be used and had only begun to address questions of reliability and data collection. He did suggest that a teacher could work collaboratively with a researcher or could collect data by recording methods or by eliciting student opinions. On a more pragmatic level, Stenhouse recognises that the research model of evaluation would require vast changes within teachers, the system and funding arrangements. But he regards (however realistically) the main barriers to such a model as mainly social and psychological.

Unexpectedly enough, such an apparently revolutionary approach may well find acceptance and recognition within some branches of medical education. His description of the extended professional seems strangely familiar to anyone acquainted with general practice in the United Kingdom:

"the outstanding characteristic of the extended professional is a capacity for autonomous professional self-development through systematic self-study, through the study of the work of other teachers and through the testing of ideas by classroom research procedures."

Transposed into the practice of medicine, this sounds like a form of audit and peer review. The problem of different underlying theories may well not face the implementation of a research model of evaluation in some areas of medical education. Stenhouse's model seems a potentially useful one, but it is one that largely remains to be tested in practice. It may well be one that would receive criticism from professional evaluators themselves, who may see themselves as relegated to the position of either teachers or research assistants.
But the model also has problems in its intentions. It may well be able to deal with questions of the development of classroom teaching, but it leaves entirely open the question of wider curriculum issues which might extend beyond the immediate teaching setting. It also requires a fundamental change in the nature and organisation of teaching, teachers, education and development. Perhaps Stenhouse's credo should be valued not as an exclusive approach but as one of many possible methods which could be applied as appropriate.

What issues of importance for the evaluation of medical education have been raised by our brief examination of some major theories of curriculum evaluation? In considering seven major credos of evaluation, we have seen that some can be counted as true theory, in that they are grounded in practice and, in turn, influence practice, while others can best be seen as statements of value, belief or hope. At least one seems to be an intellectual foray into the construction of analogy. And yet these credos seem, in general, to be dealt with and judged in similar terms. There seems to be no available critical discussion of the state of theory in the field of evaluation. So the first issue for any would-be evaluator concerns how best to make judgements about the literature. It seems clear that some writings should be treated as philosophy (or even sophistry) while others should be treated as providing practical guidance. The judgement about which credo seems best is made on the basis of a comparison with the dialectical-materialist view of useful theory as something which has grown out of practice and in turn informs future practice in a cyclical and developing manner. Our tentative distinction between philosophical and practical theory might reflect a concentration on either the first or last dyad of elements of the practice-theory-practice cycle, respectively.

The next issue concerns the criteria to be used when selecting an evaluation approach. The new wave evaluations seem to have in common (however implicitly or explicitly) an assumption that each evaluation is an individual thing. Therefore criteria will also be individual, but might concern such things as the underlying or explicit assumptions of those concerned with the evaluation. What, in their eyes, would constitute an acceptable approach? To what extent is re-education or a constructive challenge to their beliefs possible? What role will the curriculum evaluation play in curriculum development? What are the identity and role of the evaluator? What implications are there for establishment of lines of accountability? What would be the implications of these for the evaluation?

The next issue concerns the public or private nature of theories. These criteria assume that those planning the evaluation are concerned to sort out their theoretical stance in a conscious way. Now, of course, they might be concerned to do no such thing. And, indeed, it is quite possible to treat the question of evaluation merely as a methodological one. However, this would not despatch the question of theory very effectively, because we have seen in the first part of this paper that, as Mao argued, every kind of thinking is stamped with a brand, and so is every kind of action. So the planner who cares to treat evaluation planning and implementation merely as a methodological question leaves his or her theoretical position open to accurate or inaccurate inference. He or she might as well make it explicit in the first place and strike the first blow for effective communication and mutual understanding.

The final issue concerns the relationship between theory and practice. There seems to be a plethora of theories and pseudo-theories in the field of evaluation, and yet seeking examples of any one of these in practice is a difficult and often fruitless task. Evaluation practice seems not to have been influenced specifically by any one theory. However, the practice of evaluation is changing, and perhaps the theories have arisen not out of practice but out of the new climate.
In turn, these theories affect, reinforce and sharpen that climate without influencing specific practice in an easily attributable way. If this is the case, it remains for each evaluator to construct, within a general framework of values, opinions and practices, an appropriate specific theory. It is clear from the discussion of theories of evaluation, their selection and implications, that there is a myriad of issues wrapped up in this field of activity. It is complex and difficult. In the next part of this paper we shall consider exactly why this is so.

4. Part Three: Why is evaluation so problematical?

The discussion of the previous section might seem to answer this question adequately. Evaluation is a field fraught (or enriched, depending upon your point of view) with myriad theories, ideas, values, methods and issues; so much so that more than one writer has been moved to devise grids and matrices to try to make sense of the thoughts and experiences of the field. This, in itself, need not constitute a problem. What makes it difficult, however, is the concomitant lack of specific practice against which to judge any theory, the lack of any apparent means of choosing between these things, their often obscure status, and the ease with which evaluation can be discussed and implemented without reference to all the underlying issues.

So the first reason why evaluation is so problematical concerns the rather parlous state of its theory. That parlous state is not least attributable to the somewhat tenuous relationship between practice and theory in the field. There seems to be a dearth of true theory in the dialectical-materialist sense, that is, of theory which has arisen out of practice and in turn informs practice and is altered by it. There is much which masquerades as theory, and there is often, it seems, an unwillingness to attempt to refer to theory at all. To compound this difficulty, there is also a persistent need for an open debate about the role and influence of theory, and for a definition of what would constitute acceptable or useful theory in the field. Until the theories are consciously tested in practice, this will continue to be the case.

This situation has its consequences. There being no unifying theory, there is also no unifying method (although a unifying method would not imply a unitary methodology). There is certainly no scientific method of evaluation, although there is always the possibility of paying due regard to the establishment of reliability and validity of data, as we have seen, even with such an apparently unscientific approach as art criticism. The lack of scientific method is recognised by Rowntree (1982), who talks about developing an evaluation procedure rather than setting one up. He recognises that the methodology of evaluation is a responsive one and that the eventual evaluation procedures are incapable of final definition until after the evaluation has begun. There are no standard methods, but lots to choose from at any one point in the evaluation process.

The second reason why evaluation is so problematical concerns its practice. Each evaluation is an individual thing, unreplicable and not quite like any evaluation that occurred before. Its individuality stems from the content, circumstances, participants, possibilities and preclusions of each new situation. MacDonald (1973) points to a variety of ways in which the circumstances of an evaluation are individual:

1. Human action in educational institutions differs widely because of the number of variables that influence it.
2. The impact of an innovation is not a set of discrete effects, but an organically related pattern of acts and consequences. To understand fully a single act one must locate it functionally within that pattern.
3. No two schools are sufficiently alike in their circumstances that prescriptions of curricular action can adequately supplant the judgement of people in them.
4. The goals and purposes of the programme developers are not necessarily shared by its users.

The individuality of any educational programme stems from all these factors, and equally as much from the special set of interpersonal processes that occur within any particular evaluation setting. It is the nature of an applied subject (which evaluation certainly is, although exactly what discipline is being applied is not always clear!) to be individual in its application. However, in other fields of endeavour it is often the case that the theory being applied is clear; we have seen that with evaluation this is not necessarily so.

A third reason why evaluation is so problematical concerns the very nature of the endeavour. It is one which is heterogeneous in its targets, its settings, its findings, its methods, its style of reporting, its uses and its means of being planned and implemented. Such a situation has developed out of what Hamilton (1976) refers to as a recognition of the cultural pluralism of the curriculum, which took curriculum evaluation out of the factory and back into the classroom. Evaluation has thus become problematical (or has been recognised as such) when previously, using a more traditional approach deriving from the Tyler model, it was merely a standard package:

"By adopting a stance of cultural pluralism and recognising the validity of different groupings and viewpoints, evaluation has moved into new territory. It has relinquished the security of objective, universally agreed criteria and struck out into poorly charted waters that are infested with shoals of conflicting values and beliefs. In effect, the technological question, 'Which criteria?', becomes a social question, 'Whose criteria?'."

Where does this lead the evaluator? It leads the evaluator, in the first instance, to a developed and developing role. It leads the evaluator into areas which before were either unnecessary or irrelevant. This new stance has been accompanied, of necessity, by new theories and approaches. This being so, it requires of the evaluator a range of skill and knowledge that covers apparently dissimilar domains. The evaluator must be negotiator, counsellor, scribe, commentator, statistician, critic, researcher in the classical and new senses, communicator, manager, developer, change agent, organiser, committee person, an isolate and a socialite. In addition, evaluators must have the ability to enter into, and be acceptable and trusted in, a world that is not their own. Stenhouse (1976), of course, would disagree with this, asserting that the evaluator should be the person attempting to implement and develop the curriculum: the teacher should also be the researcher. Harlen (1975) took quite the opposite view: "It is no doubt important for the evaluation of material to be in the hands of someone who is not committed to it, emotionally or intellectually." Where the Harlen position is adopted and the evaluator is an outsider, that person must have special skills. So in our case, we must be able to move in the world of medical education, be acceptable to it, demonstrate understanding of it and interpret it correctly, while at the same time maintaining our own identity as being separate from it and having a professional home elsewhere. The evaluator must be seen as different but not alien, separate but acceptable, speaking a different language but one that is comprehensible, credible, aids understanding and gives insight. Evaluators are constantly working in fields that have different frames of reference, so the reconciliation of sometimes opposing theoretical and practical frameworks is a skill to be acquired. Finally, the evaluator must be a pragmatist and a person who is grounded in practice and practical development. That must be part of the evaluator's theoretical view and practical skill.

A final reason why evaluation is so problematical concerns the nature of the variables to be addressed by any evaluation study. Educational events seem to have an essentially unknowable and indescribable quality. Any educational event is:

- complex: it involves a multiplicity of variables (people, places and occasions);
- dynamic: it may change, possibly unpredictably, as it proceeds;
- variable: it may differ from previous occasions;
- dependent on its context: it may differ from a similar event elsewhere. (Coles and Gale-Grant, 1985)

Whether that is because of their complex nature as interpersonal and intrapersonal processes, or because we really do not yet have tools sophisticated enough to measure them, or because the tools we do have are not suited to the environments we would wish to study, or because application of such tools would alter those environments and so make the study invalid, is open to speculation and analysis. Whatever the reason, it remains the case that educational events are difficult to study.

What conclusions can we draw, then, in relation to the question of why evaluation is so problematical? We can attribute this to a series of specific factors. First, there is an apparent lack of true theory which has arisen out of the practice of curriculum evaluation and in turn informs its practice and is altered by it. Second, there are few obvious criteria to apply when trying to select an appropriate theory or model for practice. Third, there is no scientific method associated with the field of evaluation and, possibly, there never can be. Fourth, there is a vast armamentarium of methods available to the evaluator but no criteria for choosing amongst them. Fifth, the targets of any evaluation study in education are themselves problematical, being complex, multifaceted, numerous and essentially unknowable in any conventional way. Sixth, the roles of, and interrelationships between, the participants in an evaluation study always require negotiation and definition. Seventh, the field of evaluation is not unitary or easily identifiable within one discipline. It is essentially hybrid in nature, drawing on a range of disciplines and skills which the evaluator must have acquired. Eighth, the frames of reference (i.e., theories, models, values, approaches, ideas) of those involved in an evaluation study will often be different.


What this amounts to is that for any one evaluation study, there is very little that is given and probably less that is predictable. In addition to the factors summarised here, there is a further central issue which is vexed and of both theoretical and practical importance. It too is problematical and concerns the role of values and value judgement in evaluation.

5. Part Four: The Value Component.

Evaluation, by definition, involves the placing of a value on something or making judgements of merit or worth. All new wave evaluation approaches involve, either explicitly or implicitly, some judgment of this kind. That judgment is usually made by the evaluator and offered to the relevant audiences. This presents evaluation theorists and practitioners with a range of issues which require either explanation or resolution. According to whose criteria shall the judgement be made? And what status will it attain? When it comes to judgement, who can be said to be right? Even the judiciary has its Court of Appeal. Stenhouse (1976) is particularly concerned about this issue:

"The new wave of evaluators still seem to me to be concerned with merit or worth in a curriculum or educational practice, but their criteria are not clear and their concern with audiences and presentation of results appears to me to mask their problem. They aspire to tell it as it is, and they often write as if that is possible if they allow for some distortion due to their own values. But there is no telling it as it is. There is only creation of meaning through the use of criteria and conceptual frameworks."

Many workers in the field are concerned about this issue; debates about reliability and validity of data are not, in essence, about scientific methodology. Discussions about accountability are also discussions about whose criteria will be used. Evaluators usually cannot ignore the power of their funding bodies. So the issue is central to the work of all evaluators; even those who might still be evaluating by objectives cannot ignore the fact that the objectives themselves are value-laden. House (1972) made an initial analysis of the involvement of values in evaluation, and he made the useful distinction between the context of valuation and the context of justification:

"The context of valuation involves the basic value slant derived from the genesis of the evaluation, and includes all those motivations, biases, values, attitudes and pressures from which the evaluation arose. The context of justification involves our attempt to justify our findings using the logic and methodology of the social sciences."

In this way, House establishes the idea that values are already present before an evaluation begins and that the evaluation itself adds another layer of value judgement which must be shown to be rational and defensible. It is neither necessary nor possible for evaluation to be value-free; but it is important that evaluators reveal the values on which their judgements are based. Sometimes, of course, those values are obvious, whether deliberately revealed or not. They are obvious in the conclusions researchers reach on the basis of the data collected. Shipman (1972) demonstrates clearly that the same data can be used equally effectively by both supporters and opponents of new schemes to enhance their case. Parlett and Dearden (1977) would agree with House's analysis of two contexts relevant to the question of values in evaluation. But they are rather more optimistic on the question of the evaluator's ability to be impartial: concerted effort must be made to represent different value positions, ideologies and opinions encountered in the course of the investigation; and, moreover, to represent them in ways considered fair by those holding these positions.
Commitment to judicious, non-partisan, and impartial inquiry is open to criticism from some who favour a committed political stance, and from others who believe that it is impossible to be impartial. Most illuminative researchers believe it is difficult but not impossible to be fair to differing points of view. It would be unwise to underestimate that difficulty. Recognition of value pluralism is relatively straightforward. Impartial representation of that pluralism to audiences who are themselves heterogeneous in their values presents a problem of an entirely different order. Explanation of one's own value system as a preface to an evaluation report could well alienate the potential audiences and cast doubt upon the status of the report. The problem is difficult and unavoidable, for even the selection of an evaluation approach is a statement of value.

The question of values in evaluation is vexed and difficult, and at times seems almost to throw into doubt whether evaluations are ever worth doing at all if we cannot be sure what they are saying or why they are saying it. If it is really a process of evaluators judging value-laden situations and then audiences judging the evaluators' value-laden representations of those situations from their own value-laden perspective, is it really worth doing in the first place? The answer is, of course, that it certainly is worth doing. Rowntree (1982) and Stufflebeam and Webster (1980) provide two quite different reasons for undertaking evaluation despite such problems. Rowntree argues that:

"there is a large subjective element in evaluation, from deciding on appropriate criteria to weighing up the relative importance of conflicting pieces of feedback. Nevertheless, many teachers have found that evaluation provides a distancing stimulus whereby they can re-examine their teaching as if through someone else's eyes, and thereby see the solecisms and violations of pedagogy that had somehow escaped all their preliminary scrutinies."

Stufflebeam and Webster's (1980) argument is quite different but equally powerful:

"...it is virtually impossible to assess the true worth of any object. Such an achievement would require omniscience, infallibility, and a singularly unquestioned value base. Nevertheless, the continuing attempt to consider questions of worth certainly is essential for the advancement of education."

What would be our conclusions, then, about the problem of values in evaluation? First, evaluations take place within the context of at least three sets of value systems, each of which is likely to span a range of values. These value systems are located within the evaluators, the audiences and the subjects of the evaluation. Second, the involvement and influence of values are unavoidable at all stages of the evaluation process. The relevant decision to be made concerns whether or not a public discussion of the problems this raises would be a help or a hindrance to understanding and acceptability of findings and interpretations. Third, there are influences which can be brought to bear upon the selection of criteria or criterial value systems in any evaluation. These influences stem from the lines of accountability which an evaluation must follow, from the power hierarchy of the relevant institution, and from the funders or sponsors of the evaluation. Prior negotiation of the accountability and independence of the evaluation might resolve this problem.

6. Part Five: Deriving and Using Theory for Evaluation of Medical Education

In this final part of our discussion of theory in evaluation, I intend to return to the central issue of whether or not medical education requires its own theory of evaluation. It is a temptation in any field, succumbed to in medicine perhaps more than in others, to believe that one's own discipline or field of practice is quite different from all others, much more complex and much more challenging. Whether or not this is the case, we still have to consider to what extent the theories, models and approaches that do exist are appropriate to our field of practice. Or is medicine really so different that applying our discipline to medical education makes all vicarious experience irrelevant? If it is, then it is incumbent upon us to construct our own theory, deriving from our own practice and informing that practice in turn.

True and useful theory, as we have defined it in a dialectical-materialist way, is that which derives from reflection upon practical experience and is used to influence practice in turn. Theory can be built quite justifiably from both direct and indirect practice, as long as that practice is relevant. It is quite possible, therefore, that much of what we have reviewed will be useful to us either as relevant theory in its own right or as a stimulus to our own building of relevant theory. If the latter is the case, then we shall have to consider which elements of others' thinking are useful.

Part One of this paper made out the case for the necessity of theory, but it will be useful to consider what we should actually expect our theory to do. It should, at a minimum, do the following:

- Set out the issues associated with each stage of the planning and implementation of evaluation in medical education, including the stage of reporting.
- Clarify the relationship between the sponsors, subjects and implementers of evaluation studies.
- Identify the extent to which a curriculum evaluator should also be a curriculum content expert.
- Consider the potential roles and lines of accountability associated with formal participants in an evaluation study, and the implications of these for practice.
- Analyse the relationship between curriculum evaluation and curriculum development.
- Identify any special qualities of medical education which might impinge upon the evaluation process. These might include the instructional system, the learning milieu, and the medical school's academic, administrative and organizational environment.
- Address questions of contradictory value systems which might impinge upon an evaluation study, and consider whether these are antagonistic or non-antagonistic in nature.
- Review the implications of the above factors for appropriate methodology in the evaluation of medical education, including the education of audiences about evaluation.
- Provide a critical analysis of available methodologies and suggest criteria for their selection and use.

In this paper, we have begun to indicate that there appears to be a series of problems to take into account when building a theory of evaluation in medical education. The central problem seems to turn, however, on theory itself. The new approaches to evaluation move in a world of values, theories and perceptions which is alien to any medical model of practice or any traditional biomedical model of research. Where the teachers who are likely to be the sponsors or subjects of evaluation studies adhere to the models, values and theories of their own profession rather than to those of the educational disciplines, the evaluator is faced with a problem which requires a clear theoretical viewpoint for its successful resolution.

One might speculate that medical education does need its own theories of evaluation for this very reason: that the practice of medicine, and the values and theories of that particular profession, are still the key factor in medical education which gives it its general climate and overall orientation. But equally important, it needs theory so that those who evaluate medical education can also communicate effectively with one another. Evaluators themselves need theory. The activity of curriculum evaluation in medical education, by implication, seems to be claiming a special status. Why else would a paper such as this be written? If this is so (and it seems to me to be a defensible contention), then our prime duty is to avoid the error of empiricism by addressing issues of theory at the outset.

7. References

Coles, C.R. & Gale-Grant, J. (1985). Curriculum Evaluation in Medical and Health-Care Education. Medical Education Research Booklet No. 1. Dundee: ASME.


Eisner, E.W. (1969). Instructional and expressive educational objectives: Their formulation and use in curriculum. In Popham, W.J. (Ed.) Instructional Objectives. AERA Monograph Series on Curriculum Evaluation No. 3. Chicago: Rand-McNally.
Eisner, E.W. (1972). Emerging models for educational evaluation. School Review, 80, 573-590.
Hamilton, D. (1976). Curriculum Evaluation. London: Open Books.
Harlen, W. (1975). Science 5-13: A Formative Evaluation. London: Macmillan Education.
House, E.R. (1972). The conscience of educational evaluation. Teachers College Record, 73, 405-414.
House, E.R. (1980). Evaluating with Validity. New York: Sage Publications.
Jenkins, D. (1976). Toward evaluation. E203 Curriculum Design and Development, Unit 19. Milton Keynes: Open University Press.
Kuhn, T.S. (1962). The Structure of Scientific Revolutions. Chicago: University of Chicago Press.
MacDonald, B. (1973). Briefing decision-makers. In Hamingson, D. Towards Judgement: The Publications of the Evaluation Unit of the Humanities Curriculum Project 1970-1972. Occasional Publication No. 1. Norwich: Centre for Applied Research in Education.
MacDonald-Ross, M. (1973). Behavioural objectives: a critical review. Instructional Science, 2, 1-52.
Mao Tse-Tung (1968). Four Essays on Philosophy. 2nd printing. Peking: Foreign Languages Press.
Parlett, M. & Dearden, G. (Eds.) (1977). Introduction to Illuminative Evaluation: Studies in Higher Education. Cardiff-by-the-Sea: Pacific Soundings Press.
Rowntree, D. (1982). Educational Technology in Curriculum Development. 2nd edition. London: Harper and Row.

Scriven, M. (1967). The methodology of evaluation. In Tyler, R., Gagne, R. & Scriven, M. (Eds.) Perspectives of Curriculum Evaluation. AERA Monograph Series on Curriculum Evaluation No. 1. Chicago: Rand-McNally.
Scriven, M. (1973). Goal-free evaluation. In House, E.R. (Ed.) School Evaluation: The Politics and Process. Berkeley: McCutchan Publishing Corporation.
Shipman, M.D. (1972). The Limitations of Social Research. London: Longman.
Stake, R.E. (1972). Responsive Evaluation. Mimeo. Centre for Instructional Research and Curriculum Evaluation, University of Illinois, Urbana-Champaign.
Stake, R.E. (1975). Analysis and Portrayal. Paper written for the AERA Annual Meeting 1972; revised at the Institute of Education, University of Goteborg.
Stake, R.E. (1976a). The methods of evaluating. Evaluating Education Programmes, 18-28. OECD.
Stake, R.E. (1976b). Programme evaluation, particularly responsive evaluation. In Dockrell, W.B. & Hamilton, D. (Eds.) Rethinking Education Research. London: Hodder and Stoughton.
Stenhouse, L. (1976). An Introduction to Curriculum Design and Development. London: Heinemann.

Stufflebeam, D.L., Foley, W.J., Gephart, W.J., Guba, E.G., Hammond, R.I., Merriman, H.O. & Provus, M.M. (1971). Educational Evaluation and Decision Making. Itasca: F.E. Peacock.
Stufflebeam, D.L. (1972). Should or can evaluation be goal-free? Evaluation Comment, 3, 4.


Stufflebeam, D.L. & Webster, W.J. (1980). An analysis of alternative approaches to evaluation. Educational Evaluation and Policy Analysis, 2(3), 5-20.
Warr, P.B. (1980). An introduction to models. In Chapman, A.J. and Jones, D.M. (Eds.) Models of Man. Leicester: British Psychological Society.
Worthen, B.R. & Sanders, J.R. (1973). Educational Evaluation: Theory and Practice. Worthington: Charles A. Jones Publishing Company.

