Journal of Moral Education, Vol. 31, No. 4, 2002

A Test for Ethical Sensitivity in Science

University of Glasgow, UK

The Test for Ethical Sensitivity in Science (TESS) described in this article is a
pen-and-paper measure for studying ethical sensitivity development in young adults. It was
developed to evaluate the impact of a short ethics discussion course for university science
students. TESS requires students to respond to an unstructured story and their responses are
scored according to the level of recognition of the ethical issues in the scenario provided. When
TESS was used in conjunction with ethics teaching it showed that university science education
seems to provide no inherent benefits in ethical sensitivity development but that a short course
in ethics can have a significant impact on students’ ability to recognise ethical problems.

Why Measure Ethical Sensitivity?

Ethical sensitivity is an independent element within moral development. Rest (1986)
placed ethical sensitivity as the first of four distinct elements in moral behaviour and
decision-making where the other three are: moral reasoning, priority to moral
concerns, and moral courage. Whatever the aim of an ethics course—to develop
moral character, promote ethical decision-making skills, or encourage the develop-
ment of ethically sensitive professionals—ethical sensitivity is an element we cannot
ignore. Ethical sensitivity is the first step in real-life moral decision-making. Without
recognising the ethical aspects of a situation, it is impossible to solve any moral/
ethical problem, for without the initial recognition no problem exists. Also, without
an analysis of the ethical and moral aspects, it is impossible to move on to making
decisions, at least if the decisions are to be made with awareness of the magnitude
of the problem and the effects of the decision.
Research has shown that scores in moral reasoning (based on the Defining
Issues Test, DIT) and moral sensitivity correlate only modestly (in the 0.2–0.5
range) (Bebeau & Brabeck, 1987). It seems possible, therefore, for a person to be
skilled in interpreting the ethical issues in a situation, but unskilled at working out
a balanced view of a moral solution, and vice versa to be unable to recognise the
issues personally while being skilled in solving these problems when they have been
identified. If moral development is understood as development in both moral
sensitivity and moral reasoning, it is necessary to have separate test tools for both
ISSN 0305-7240 print; ISSN 1465-3877 online/02/040439-15
Ó 2002 Journal of Moral Education Ltd
DOI: 10.1080/0305724022000029662
440 H. Clarkeburn

aspects, as development in one area cannot be taken to indicate development in the
other. Thus, if our educational efforts aim at improving students’ ethical sensitivity,
we need a special measure to evaluate the success of our educational interventions.
This article will describe the process used to develop a new ethical sensitivity
measure. The measure is not profession-specific, unlike the DEST (Dental Ethical
Sensitivity Test) (Bebeau et al., 1985), but it is associated closely with scientific
issues. It is not possible from the results reported here, or from previously published
work, to demonstrate that development in ethical sensitivity in one area (e.g.
professional) leads to increased sensitivity in another (e.g. interpersonal). Thus the
measure described in this article is a measure of ethical sensitivity in relation to
scientific practice. It is not strictly professional, as it does not measure understanding
or recognition of a professional code of ethics. Thus its use is not restricted to
scientific professionals; it is suitable for all who have an interest in the ethical issues of
contemporary scientific progress. I have called it a “Test for Ethical Sensitivity in
Science” (TESS).

What is Ethical Sensitivity?

Ethical sensitivity is a combination of two different abilities: moral imagination and
recognition of ethical issues (Callahan, 1980).

1. Moral imagination is an ability to perceive a “moral point of view”—to
understand that (a) human beings live in a network of moral relationships,
(b) consequences of applying moral rules in practice can be either happiness
or suffering, (c) moral dimensions of life can be hidden or visible, and (d)
moral choices are in most cases inevitable and difficult. Moral imagination
is thus an ability to see the moral side of the story and an ability to foresee
moral consequences of actions. It is like imagination, because it requires one
to “see” something that is not real in a sense that we could touch or feel it,
but something that is real in our minds and within our social existence. This
level of understanding can be considered as a prerequisite level for any moral
discourse. It is necessary, but not sufficient. Without moral imagination we
are not able to engage in discussions on ethical problems; but to have only
imagination is like being able to see, but not to act, to be only a passive
perceiver of things, but not an active player within the moral network one
can see. Therefore, simply to have an ability to perceive moral problems is
insufficient for making moral decisions, while it is an essential part of that
process. In order to solve moral problems one also needs an ability for
conceptual and logical analysis.
2. Recognition of ethical issues is linked closely with moral imagination—it is
moral imagination put into action. If moral imagination is an ability,
recognition of ethical issues is the application of that ability. It is an attempt
to analyse what has been seen, to recognise the value of moral aspects
in a particular situation. Recognition of ethical issues is to distinguish
between emotional responses to situations and appraisal of realities, moral
A Test for Ethical Sensitivity in Science 441

or scientific. This type of recognition requires an examination of moral/
ethical concepts and statements. To be able to recognise ethical issues in this
way, one needs to be aware of the moral categories, of the aspects that can
be classified as moral and to be able to evaluate their importance to a
particular situation.

When ethical sensitivity is described in terms of a skill as above, it is logical to
consider it also as a capacity capable of development—a skill to be acquired and
improved, not an inborn talent. People may differ in their natural sensitivity to moral
problems, but in general, ethical sensitivity is most likely acquired by exposure to
and experience with moral problems. Once ethical sensitivity is perceived as a skill, it
becomes possible to cultivate it and improve a person’s ability to understand the
moral aspects of a problem with increasing adequacy and precision. Ethical sensitivity
is an important skill and the difficulties in interpreting a situation as moral and
in understanding the implications of moral actions should not be underestimated.

While ethical sensitivity has a strong cognitive component, it is not only an
intellectual faculty. Recognition and interpretation of moral aspects are also depen-
dent on situational clues, personal attributes, and affective responses. Rest (1986)
provides the following list of the factors which interact with a rational and clear
perception of moral elements in context:

1. People may block from their consciousness certain aspects because the cues
in the situation are ambiguous and it becomes difficult to interpret them.
2. Research shows that there are distinct individual differences in sensitivity to
needs and welfare.
3. Research has shown that there can be a strong affective response before
extensive cognitive encoding.

Taking into consideration these psychological aspects of recognising and
analysing moral situations, an ethics curriculum can proceed to support student
development in learning to confront these situations with more reflective thought
and understanding of their initial emotive responses.

Measuring Moral Sensitivity—Currently Available Methods

DEST was created by Bebeau et al. (1985) to measure dental students’ ability to
identify and interpret typical ethical problems arising in dental practice. DEST
comprises four recorded dramatised dialogues that might occur in a dental office.
The subjects are first asked to listen to each dialogue and then to take part in it,
assuming the role of the dentist and carrying on as if they were actually
in that position. The responses are recorded and later the students are interviewed
about their assumptions and perspectives underlying their responses. These
interviews are taped, transcribed and scored to measure the degree of sensitivity to
the responsibilities of dentists. Seven sensitivity criteria are described for each
dilemma and students are scored on a scale from one to three, indicating their
degree of recognition. The scoring criteria were developed in collaboration with
practising dentists and moral philosophers. DEST has proved to be reliable, with
inter-scorer agreement averaging 0.87 and test–retest correlation averaging 0.68
(Bebeau & Brabeck, 1987). The correlation between DEST scores and DIT was
found to be between 0.2 and 0.5.
DEST is very specific for measuring moral sensitivity in a professional context.
The research literature does not entertain considerations of whether professional
moral sensitivity can be understood as general moral sensitivity or whether moral
sensitivity can develop in relative isolation in different areas of life, and thus one
should not extrapolate these results to measures of general moral sensitivity. Fur-
ther, this approach is best applicable to professions where moral considerations are
situated in personal interactions and which have an agreed code of professional
ethics, as in medicine, teaching and law and, to a certain extent, in science (fraud,
whistle-blowing); but this is a less suitable approach for measuring moral sensitivity
in a situation where personal interaction is limited and where no agreed guidelines
exist (ethics of genetic research, for example).
A similar moral sensitivity test was designed by McNeel (1994). In this
research, college students were played four recorded drama situations containing
moral problems frequently confronted by students: (1) cheating, learning problems
and racism; (2) pressure for sex, date-rape, depression and co-dependency; (3)
grieving for parent’s death, autonomy, career decisions and parental pressure; and
(4) alcohol abuse and its consequences, irresponsibility and broken trust. Before
hearing the drama, students were informed that the researchers were interested in
what the students noticed and what they paid attention to. After hearing the drama,
students took the role of the central character’s best friend and spoke into a tape
recorder as though they were speaking directly to their friend. Non-directive follow-
up probes were used to help the students to express themselves on all the relevant
issues they had noticed in the situation. Coding manuals were devised to allow
reliable and valid scoring of transcriptions. McNeel found gender differences in the
results, but only in some issues. He also found that perception of some moral
problems was significantly low—in particular in the date-rape and pressure for sex
drama. No comparison between DIT scores and moral sensitivity was made in this
The test approach of McNeel is less tied to professional moral sensitivity, while
the approach is similar to Bebeau et al.’s DEST in providing scenarios for individual
involvement and direct contact with the problem. The results of McNeel also
indicate that moral sensitivity is case-dependent, which supports the possibility that
moral sensitivity in professional issues may not indicate moral sensitivity in other
areas of life.
Both measures described are based on one-to-one interaction between the
researcher and the subject, and are thus labour-intensive. To study changes in
ethical sensitivity in a larger student population participating in ethics courses, these
methods are less suitable due to time and resource restrictions. Further, the
approach of situating the subject personally within an ethically complex situation is
not the most suitable approach to measure ethical sensitivity in science. Only a
limited number of the ethically complicated problems in science allow one-to-one
interaction with the problem. We can construct morally demanding and complex
situations where students would need to face issues in animal welfare, whistle-
blowing, or use of human subjects, but many ethically demanding problems are not
easily captured in this manner: what are the limits of genetic research, what type of
research should we do, who makes the decisions in the direction of modern
bioscience, etc. Therefore, to study ethical sensitivity towards general scientific
problems in a large population of science students, it became necessary to develop
a new test.

Methodology for TESS

A suitable starting-point in measuring ethical sensitivity is to develop unstructured
problems. A moral problem is unstructured when it does not directly indicate the
moral issues involved, either by describing them in the problem narrative or by
giving moral statements to choose as possible solutions or considerations for arriving
at a solution. The problems used in DIT are “structured” moral problems, because
the narrative structure describes a particular moral dilemma (e.g. should Heinz steal
and save his wife, or should he not steal and not save his wife) and the consider-
ations for the decision are all part of the moral deliberation process. An
“unstructured” moral problem is thus a problem scenario which has moral compo-
nents, but where these components are not self-evident, and a solution to the
problem can be arrived at without ethical considerations (although that solution
would indicate low ethical sensitivity).
It is therefore impossible to measure ethical sensitivity with a “tick-a-box”
method. Any such method would have to include some level of pre-established
moral analysis, which would have taken place before any statements to choose from
could have been produced. For example, a test protocol which gives students an
unstructured moral problem and then offers several ethical and non-ethical elements
to choose to include in their deliberation, would not test the recognition of ethical
issues, but the importance students place on these issues. It has been found that
people can recognise and discriminate and thus prefer an idea before they can
paraphrase it or before they can spontaneously produce the idea in a response to a
story dilemma (Rest, 1976). An ethical sensitivity test needs to measure the spon-
taneous recognition of moral issues, the interpretation of a situation in moral terms,
if we wish it to represent the ethical sensitivity skills needed in real-life situations.
Therefore, the nature of moral sensitivity requires the test of moral sensitivity
to be qualitative, to allow subjects to respond to an unstructured problem with only
minimal guidelines or pre-established thought-patterns. This type of qualitative data
can be collected either verbally in an interview or in a written form. DEST used both
methods, which provided equally valid and reliable data (Bebeau et al., 1985), while
the interview scores yielded higher estimates of moral sensitivity, as judges felt they
had a better opportunity to confirm their judgement from verbal responses. Interviews
may produce more data, but they are also more laborious to administer. When
the need is to test large numbers, the appropriate choice is a written test-format.

TABLE I. Pilot stories

Story 1. Laboratory take-over offer

A small research laboratory has made a breakthrough in discovering a gene defect that triggers acute
childhood asthma together with environmental exposure. The research team has been funded through
governmental research councils. The grant is due to run out in a few months’ time and there are no
guarantees of future funding. A large pharmaceutical company has made a bid for the laboratory, promising
to employ the scientists as long as they will sell the patent rights to the company. This would mean a move
to a new location and not being able to continue with the current support staff. There is an alternative
opportunity to gain further funding from the research council which would allow for the laboratory to stay
independent and possibly expand its facilities, but for now there are no guarantees whether such funding
will be made available.
Should the research laboratory accept the offer?

Story 2. Pharmaceutical milk and a GM cow

A research group is planning a project on creating a cow that would produce milk containing a protein that
could be used to treat patients with cystic fibrosis. Other pharmaceutical methods to produce this protein
have not been successful or they have been very expensive. The plan is to introduce a new gene from another
animal into the genetic sequence of the cow that directs the production of the mammary gland to change
it from producing normal milk into producing a pharmaceutical milk containing the desired proteins. The
new gene will be introduced by nuclear transfer, a technique also used in cloning. The group hopes to
develop its research findings into a commercial product.
Do you think the research should go ahead?

Story 3. GM crops for nutritious enhancement

A research group is considering a project on developing more nutritious plants by using plant viruses. The
aim is to genetically modify these viruses so that when they act on the plants, the plant tissue will produce
high levels of novel proteins which will increase the essential dietary value of the plant. Over 900 natural
plant viruses have been described by scientists. The viruses studied so far are pathogens in the plant only
and humans digest and handle them continuously with no ill effect. The genetic material of natural viruses
has not been found to interact with the genes of the host plant. The researchers hope that the new plant
varieties could be used in developing countries.
Do you think the research should go ahead?

Pilot Studies
TESS was developed as part of a research programme aimed at designing and
evaluating an ethics programme for a large number of life sciences undergraduates
at the University of Glasgow (Clarkeburn et al., 2001).
Three different unstructured scenarios were piloted for TESS. Two of the
stories were based on realistic research proposals found in Bruce and Bruce (1999)
(genetic modification of a cow to produce pharmaceutical milk for cystic fibrosis
(Story 2) and genetically enhancing nutritious qualities of a plant (Story 3)). The
third story in the pilot study described a take-over offer made to a successful
research laboratory (Story 1). Each story finished with a question asking whether or
not some action should be taken (see Table I for details).
For each story students were asked to write down no more than five issues or
questions they believed should be considered before a decision on the topic could be
made. Students had 15 minutes to complete the task. Each story was piloted with
approximately 20 bioscience students from the first year (Level One/L1) and 20
from the third year of a four-year Honours course (Level Three/L3).

TABLE II. Mean numbers of responses to the pilot study stories by L1 and L3 students

                                 L1     L3

Story 1. Laboratory take-over    2.1    2.5
Story 2. Pharmaceutical milk     2.1    4.0
Story 3. GM crops                3.1    3.4

At the first stage the numbers of responses per student were collected and
themes of responses identified. Table II details the mean numbers of responses
made by each student group to each of the piloted stories.
Stories 1 and 3 did not generate significantly different numbers of responses by
L1 and L3 students, while L3 students made significantly more responses than L1
students to Story 2. There are three possible interpretations for the different
answering patterns: (1) either there is no spontaneous developmental advantage in 2
years of science study measurable by simple response frequencies and Story 2
provides a false impression of such advantage; or (2) the spontaneous advantage
occurs and Stories 1 and 3 fail to capture it; or (3) due to small sample sizes, there
is a possibility of pseudo-difference between levels which is coincidental rather than
The number of themes identified by students for each pilot story is shown in
Table III. At this stage, Story 1 was removed from further analysis and development
for two reasons: (1) it generated the fewest responses from both student
groups and thus provided the least material for further analysis; and (2) it generated
the highest number of themes, which complicates the design of a scoring guide.
At the next stage the responses to Stories 2 and 3 were categorised as either
ethical or non-ethical considerations. This was achieved by asking whether the
question (only 9% of the responses were not questions) could be answered sufficiently
by reference to scientific/technical/financial data alone. If the answer was yes, the
response was classified as non-ethical. An example of a non-ethical response is: how
much milk do the CF sufferers need to drink? In contrast, an example of an ethical
response is: will the benefits to patients be worthwhile enough to justify altering the
genetic composition of a cow? Table IV details the results.

TABLE III. Number of themes in pilot study stories

           Number of themes

Story 1    13
Story 2     8
Story 3    11

TABLE IV. Percentage of ethical responses made by students in the pilot study stories

           % of ethical responses

           L1     L3

Story 2    70%    64%
Story 3    43%    56%

Story 2 generated more ethical responses than Story 3 in both student popula-
tions. This can be interpreted as either (1) that the ethical issues in Story 2 are more
accessible; or (2) that Story 2 contains more ethical issues per se. It is also worth
noting that the lower number of L1 responses to Story 2 were more concentrated on
ethical issues than the larger number of L3 responses. It was assumed that it is most
likely that the ethical issues in Story 2 are more accessible to students than in Story
3 rather than there being an inherent difference of ethical concerns to be recognised.
The next stage of ethical sensitivity measure development was to look at the
ethical responses in more detail. A three-tier structure, similar to that developed for
DEST, was adopted (see Table VI for details of the approach). The lowest tier
represents a very general recognition of the issue, the second tier shows more
detailed understanding of the issue, and the third and last tier provides evidence of
a more extensive and mature understanding of the problems and stakeholders
involved. A three-tier scoring guide was developed for each theme in both stories,
and the responses were then analysed.
Story 3 proved harder to analyse as many responses covered several themes and
in some thematic categories there were either no lowest or highest tier responses,
casting doubt on the accuracy of the scoring guide. Due to these problems which
were absent from Story 2 analysis, further efforts were concentrated on improving
the scoring method for Story 2.

Scoring TESS
The scoring guide for Story 2 was developed from a total of 44 completed
questionnaires from L1 and L3 students. First, all the themes were submitted to
pre-established tests of logic, as suggested by Bebeau et al. (1985): is a criterion
logically independent of every other (i.e. could an individual score high in one, but
not the other)? Using this method the response themes for this story were reduced
from eight to four. In the remaining four main themes, there were altogether nine
different subthemes. See Table V for details.
For each theme/subtheme a four-tier scoring guide (tiers 0 (non-ethical
response)–3 (highest level ethical response)) with sample entries was developed. To
ensure its validity in representing ethical sensitivity development, it was indepen-
dently evaluated by four academics at the University of Glasgow, representing

TABLE V. Themes and sub-themes in TESS scoring guide

Main theme Risks Cost and benefit Basic values Public opinion

Subthemes · Human health · Medical benefits · Genetic research

· Animals · Opportunity cost · Animal rights
and research
· Supervision · Commercial
involvement and
· Testing and
different disciplines: philosophy, education and science. A special effort was made to
describe each tier so that the length of a student’s answer was not a decisive element
in its allocation into a tier. As an example of the final scoring guide, for risks/
animals, the tiers can be seen in Table VI.
Once the scoring guide was complete, a further three independent raters were
asked to use it to score 10 questionnaires consisting of altogether 36 responses.
If there was an inconsistency between ratings, the response was brought to a
meeting. There were eight responses where agreement needed to be sought and in
five cases the disagreement was about which subgroup the response belonged to and

TABLE VI. Sample from TESS scoring guide

Tier 0          Questions of risks for which an answer can be given on purely factual basis—i.e. no moral
                consideration required
Sample entries  How will the gene affect cow’s original genes?
                Where do the genes come from?

Tier 1          First level recognition of risk, which might serve as a stepping stone for higher level
                considerations, but that is not apparent in the response
Sample entries  What are the side-effects on the cow?
                Is the nuclear transfer technique safe?

Tier 2          Better understanding of risk, the considerations are still often factual, but the moral
                elements are now definitely present. Considerations of animal welfare and suffering are
                typical. Responses also sometimes include strong, but unqualified, value-statements
Sample entries  Will the cow suffer from producing the milk?
                Animals should not be subjected to any pain or distress

Tier 3          The responses include mature considerations about the role of decision-makers and
                what should influence the acceptance of different levels of risk. Justification of using
                animals is explicitly sought
Sample entries  How much animal suffering can be justified for commercial profit?

TABLE VII. Study population

Student levels L1 L3

Filled questionnaires 253 267

Male 81 (32%) 78 (29%)
Female 172 (68%) 189 (71%)

Scored (TESS) 50 267

Male 14 (28%) 78 (29%)
Female 36 (72%) 189 (71%)

in three cases which tier was most appropriate. The guidelines in the scoring guide
that led to these inconsistencies were altered after consultation with the independent
raters. The raters also reported that the guide took some time to learn, but was
logical and simple to use thereafter.
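The rater trial described above (36 responses, of which eight needed discussion) can be summarised as a raw agreement rate. The sketch below is only an illustration: the article does not report an agreement statistic, the function name `percent_agreement` is hypothetical, and a raw percentage is not corrected for chance agreement.

```python
def percent_agreement(total_responses: int, disputed: int) -> float:
    """Share of responses on which the raters agreed without discussion."""
    if total_responses <= 0 or not 0 <= disputed <= total_responses:
        raise ValueError("disputed must lie between 0 and total_responses")
    return (total_responses - disputed) / total_responses


# Figures taken from the text: 36 responses, 8 brought to a meeting
# (5 disputes about the subgroup, 3 about the tier).
subgroup_disputes, tier_disputes = 5, 3
agreement = percent_agreement(36, subgroup_disputes + tier_disputes)
print(f"{agreement:.1%}")  # 77.8%
```

Most of the disagreement concerned the choice of subgroup rather than the tier, which is consistent with the decision to revise the guidelines rather than the tier descriptions alone.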
To generate a TESS score, it was decided that each response would earn a
score equal to the tier it belonged to (tier 0 = 0 points, tier 1 = 1 point, tier
2 = 2 points and tier 3 = 3 points). If there was more than one response belonging
to the same subcategory, only the highest scoring response was included in the final
score to avoid high scores being generated by rephrasing essentially one item several
times. Also, if in doubt, the response was scored on a lower tier to remove the
possibility of the rater filling in gaps in the responses and thus scoring more
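The aggregation rule in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's scoring procedure: the function name `tess_score`, the representation of a rated response as a `(subcategory, tier)` pair, and the subcategory labels are all hypothetical. Assigning the tier itself remains a human judgement made with the scoring guide.

```python
from typing import Dict, Iterable, Tuple


def tess_score(rated_responses: Iterable[Tuple[str, int]]) -> int:
    """Sum tier points, counting only the best response per subcategory."""
    best_per_subcategory: Dict[str, int] = {}
    for subcategory, tier in rated_responses:
        if tier not in (0, 1, 2, 3):
            raise ValueError(f"tier must be 0-3, got {tier}")
        previous = best_per_subcategory.get(subcategory, 0)
        best_per_subcategory[subcategory] = max(previous, tier)
    return sum(best_per_subcategory.values())


# Hypothetical questionnaire: two responses fall in the same subcategory,
# so only the higher tier (3) counts towards the total.
example = [
    ("risks/animals", 1),
    ("risks/animals", 3),
    ("cost and benefit/medical benefits", 2),
    ("risks/human health", 0),
]
print(tess_score(example))  # 5
```

Keeping only the maximum per subcategory is what prevents a student from raising the score by rephrasing one concern several times.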

Using the TESS

At the start of the academic year 1999–2000 TESS was administered to undergraduate
bioscience students at the University of Glasgow. TESS was the first part of a
larger Moral Development Questionnaire (MDQ), which also included a three-story
DIT and a Perry Test (Clarkeburn, 2000). Students had 45 minutes to fill in
the MDQ and they were advised to complete it in the sequence it appeared, i.e.
TESS first, then DIT and the Perry Test last. All students completed the MDQ in
the time given. Table VII details the study population.
During the academic year the L3 students took part in a trial of new ethics
educational material (Clarkeburn et al., 2001), while the L1 students did not. First,
the differences between L1 and L3 students shown in the start of session TESS
scores are presented. Then the differences between the pre- and post-ethics teaching
TESS scores of L3 students are discussed.
There was no significant difference (P > 0.05, unpaired t-test) between the
mean number of responses by L1 and L3 students (L1 = 3.18 and L3 = 2.87). Thus
the difference between the mean number of responses in the pilot stage of this study
between L1 and L3 students seems to have been coincidental due to the small pilot
study sample. The high number of responses indicates that the students have
responded genuinely to the questionnaire. The mean TESS for L1 students was

TABLE VIII. Mean numbers of responses for L3, pre- and post-TESS

                                Number of responses to TESS (mean)    Significance
Group                           Pre-          Post-                   (paired t-test)

Test (n = 133)                  2.87          2.99                    P < 0.001*
Control (n = 134)               2.83          3.08                    P < 0.001*
Significance (unpaired t-test)  P > 0.05      P > 0.05

*Highly significant.

4.275 and for L3 4.780. These scores were not statistically different (P = 0.097,
unpaired t-test). This suggests that 2 years of university studies in science provides
no advantage in ethical sensitivity development as measured by TESS.
In L3, students were randomly divided into test (n = 133) and control (n = 134)
groups. The test group participated in an ethics intervention which aimed at
increasing students’ awareness of ethical issues. The intervention consisted of three
structured group discussions on ethical themes introduced to the students by
preliminary reading, which was either a scientific paper or a short philosophical
extract. Discussion groups were never larger than 15 students, usually 12, and each
discussion lasted approximately 2 hours. The discussion themes were chosen in
collaboration with the course co-ordinators and students. The aim was to design the
discussions so that they would be interlinked with the existing science curriculum
and touch on topics relevant to student experience. All groups started with a
discussion on the use of animals in bioscience research, followed by a subject-specific
topic (drug trials for pharmacology, DDT use in malaria control for zoology,
etc.); the last discussion was on scientific integrity and misconduct. All groups were
facilitated by an ethicist (Clarkeburn et al., 2001).
TESS was administered to both test and control groups a second time
(post-test) a minimum of 3 weeks after the intervention at the end of term 2. The
control group students participated in the ethics discussion groups in term 3.
The number of responses made in the pre-questionnaire was not significantly
different between test and control groups (P > 0.05, unpaired t-test), while the mean
number of responses was significantly different in the pre- and post-administration
of TESS for both test and control groups (P < 0.001, paired t-test) with both groups
showing a significant increase in the response number in the post-test. However,
there was no significant difference in the number of responses between control and
test groups in the post-test (see Table VIII for details).
The pre-TESS scores were not significantly different between test and control
groups. However, the post-TESS scores, after the intervention, were significantly
different between test and control groups (P < 0.05, Wilcoxon’s t-test). Table IX
details the mean pre- and post-TESS scores.
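As a rough plausibility check, the between-group comparison can be recomputed from the summary statistics in Table IX (test group: 4.68 ± 2.27 pre, 5.30 ± 2.25 post, n = 133; control group: 4.89 ± 2.18 pre, 4.67 ± 1.95 post, n = 134). The sketch below assumes an unequal-variance (Welch) t statistic, which is not necessarily the test the author ran; the function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> float:
    """Welch's t statistic computed from group means, SDs and sizes."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Test group versus control group, values from Table IX.
t_pre = welch_t(4.68, 2.27, 133, 4.89, 2.18, 134)
t_post = welch_t(5.30, 2.25, 133, 4.67, 1.95, 134)
print(f"pre: t = {t_pre:.2f}, post: t = {t_post:.2f}")
```

With about 265 degrees of freedom the two-tailed 5% critical value is roughly 1.97, so a pre-test statistic well inside that bound and a post-test statistic outside it would agree with the significance levels reported in the table.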
When we look at the direction of change within both groups (Table X) and the
paired t-test results in Table VIII, we find that the ethical sensitivity score was not
450 H. Clarkeburn

TABLE IX. L3 TESS scores pre- and post-test

Ethical sensitivity score (mean 1 SD)

Group Pre- Post-

Test (n 5 133) 4.68 6 2.27 5.30 6 2.25

Control (n 5 134) 4.89 6 2.18 4.67 6 1.95
SigniŽ cance (unpaired t-test) P . 0.05 P , 0.05*

*SigniŽ cant.

static, but subject to  uctuation independent of educational interventions. However,

in general, more students progressed in the test group than in the control group.
This is understandable in a qualitative measure. At the same time, the large
number of students both regressing and progressing in the control group suggests
that TESS is sensitive to other elements than just development of ethical sensitivity.
This level of noise can make interpretation of small sample sizes difŽ cult, but should
cast less doubt on analysis of larger sample sizes.
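
The direction-of-change figures discussed above can be tabulated from paired scores along the following lines. This is a minimal sketch: the scores are hypothetical rather than taken from the study, and the function name `direction_of_change` is illustrative.

```python
from collections import Counter


def direction_of_change(pre_scores, post_scores):
    """Percentage of students progressing, regressing or showing no change."""
    if len(pre_scores) != len(post_scores) or not pre_scores:
        raise ValueError("need one pre and one post score per student")
    labels = []
    for pre, post in zip(pre_scores, post_scores):
        if post > pre:
            labels.append("progressing")
        elif post < pre:
            labels.append("regressing")
        else:
            labels.append("no change")
    counts = Counter(labels)
    total = len(labels)
    return {
        label: round(100 * counts[label] / total, 1)
        for label in ("progressing", "regressing", "no change")
    }


# Hypothetical pre- and post-TESS scores for eight students.
pre = [4, 5, 3, 6, 4, 7, 5, 2]
post = [6, 5, 2, 7, 5, 6, 8, 2]

summary = direction_of_change(pre, post)
print(summary)
```

Because each student is classified only by the sign of the change, a one-point fluctuation counts the same as a large shift, which is one reason the percentages are best read alongside the mean scores.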
The data on the control group ethical sensitivity scores and direction of change
indicate that there is no advantage in completing TESS twice, as there was no
significant increase in their TESS scores.
Finally, there were no significant gender differences in the pre- or post-TESS
scores (P > 0.05 for pre- and post-, unpaired t-test). However, there was an
indicative tendency of male students in the control group to regress more often than
female students, and similarly, the male students in the test group were more likely
to progress than female students. Table XI details the results.

TABLE X. Direction of TESS score change in L3

                     Direction of change (% of students)

Group                Progressing    Regressing    No change

Test (n = 133)       51.8%          33.3%         14.9%
Control (n = 134)    31.5%          44.9%         23.6%

TABLE XI. Direction of change in TESS by gender

                       Direction of change (%)

Group                  Progressing    Regressing    No change

Test all (n = 133)     51.8           33.3          14.9
  Female (n = 91)      48.3           37.4          14.3
  Male (n = 42)        57.1           28.6          14.3

Control all (n = 134)  31.5           44.9          23.6
  Female (n = 98)      34.0           44.0          22.0
  Male (n = 36)        25.0           47.2          27.8

Discussion and Summary

When measuring ethical sensitivity, we need to choose an area within which we
operate. TESS, as described in this paper, measures ethical sensitivity in relation to
ethical issues within science, but it is not strictly a professional measure as it is not
associated with a professional code of ethics.
In measuring ethical sensitivity we need to ensure that we measure the
identification of ethical issues, not the ability to recognise or prefer ethical facts
among other facts, etc. Thus the measure needs to use unstructured scenarios where
a decision or an appreciation of the scenario is not restricted to certain options, and
where a decision can be made without reference to ethical considerations. We need
to collect qualitative data, either from verbal accounts or from written answers.
Writing provides an opportunity for larger study samples.
Three different stories were piloted during the developmental stages of TESS.
Criteria for choosing one story over the others were based on the story’s ability to
generate different levels of responses in different subjects (ability to differentiate)
and the limited spread of potential issues in the scoring. It was also important that
the story reflected some basic ethical concerns related to most scientific enterprises.
The TESS scoring guide consisted of four main categories, which were logically
independent of each other. Three of these were subdivided further into two to four
subcategories. The guide was relatively lengthy, so scorers needed time to
familiarise themselves with it before a high level of inter-rater agreement could be achieved.
When TESS was used at the University of Glasgow, it provided two interesting
results:

1. University science education as such seems to provide no particular stimulus
   for ethical sensitivity development.
2. A small-scale ethical intervention in raising students’ ethical awareness
   through structured discussions generated a significant increase in TESS
   scores.

Administering TESS takes less than 15 minutes and an experienced scorer can
score around 30 protocols an hour. This makes TESS relatively low in both labour
requirements and demands on student time. Students in this study had no difficulty
filling in TESS and, judging from the number of responses they wrote down, they also
took the test seriously. The results can therefore be considered a fair approximation of
their ethical sensitivity developmental stage in relation to scientific problems.
However, it is possible that a more accurate picture of students’ ethical sensitivity could
be gained if they were to respond to more than one story at the same time, possibly
relating to different areas of science or representing ethical scenarios outside the
scientific realm. This would provide an opportunity to compare ethical sensitivity
across topics and reduce the risk of the chosen story being attractive/non-attractive to
the student. This approach was used in both DEST and McNeel’s measure, but was not
employed here because of constraints on the overall test time.
It is also worth noting that the scores were relatively low: the maximum
score is 15, yet the pre-ethics education average was less than five. Students recognised
relatively few ethical issues and in most cases recognised them only at the most
superficial level. This can naturally be partly an effect of the structure of TESS,
where students might wish to write answers that are as short as possible in order to complete
the test as quickly as possible. However, when the lengths of the answers from
students scoring high and low were compared, there were no significant differences.
This seems to be an indication of relatively low levels of ethical sensitivity in science
students. Their ignorance of even very obvious ethical issues in the discussion
groups supports this interpretation (Clarkeburn et al., 2001). Education designed
specifically to raise students’ awareness and support the development of their ethical
sensitivity seems to be both possible and needed.
The evidence presented in this paper suggests that TESS provides a good
methodology for measuring ethical sensitivity development in large student and
young adult populations.

Correspondence: Dr Henriikka Clarkeburn, Florentine House, University of Glasgow,

Glasgow G12 8QQ, UK; Tel: 0141 330; Fax: 0141 330; E-mail:

I would like to thank Dr Roger Downie for his support throughout the development
of TESS and The European Commission for the funding which allowed the work to
be carried out.

