To cite this article: David Kember , Doris Y. P. Leung & K. P. Kwan (2002) Does
the Use of Student Feedback Questionnaires Improve the Overall Quality of
Teaching?, Assessment & Evaluation in Higher Education, 27:5, 411-425, DOI:
10.1080/0260293022000009294
Downloaded by [Swinburne University of Technology] at 08:59 26 August 2014
Assessment & Evaluation in Higher Education, Vol. 27, No. 5, 2002
improving the quality of teaching. Instructors take note of any weaknesses or areas for
potential improvement revealed by the questionnaire data. In their subsequent teaching
they make efforts to remediate these weaknesses. The logical
outcome of this process would be an overall increase in the quality of teaching over time.
Ratings from student feedback questionnaires are also commonly used in
appraisal exercises. Decisions about tenure, contract renewal and promotion now
commonly require evidence of teaching ability as well as research output. In recent years
a number of university systems have also instituted schemes for regular staff appraisal,
which incorporate monitoring of teaching performance. Such exercises should also result
in an enhancement of teaching quality, as those with poorer ratings have an inducement
to improve their teaching and the worst teachers could be weeded out.
The final reason for having student feedback questionnaire schemes is that it is an
explicit requirement, or is felt by university administrations to be an implicit obligation. In
Australia all universities are required to use the Course Experience Questionnaire to
evaluate their programs (Ramsden, 1992). It has become common to subject universities
to quality reviews in which they are required to demonstrate that they have in place
adequate procedures for ensuring teaching quality. Having a system for regularly
administering student feedback questionnaires would probably be the number one
requirement of most review panels.
These three reasons for making use of student feedback questionnaires can obviously
be interrelated, particularly if the final reason is evident. The requirements of a quality
review process or implicit pressure can include some form of staff appraisal, linked to
a requirement to utilise student feedback questionnaires. Even if such systems were
introduced entirely because of external pressure, the respective university management
would no doubt publicly cite teaching quality improvement as a rationale for their
introduction.
system as in general correlation does not necessarily imply causality. In this specific case
it is likely that other factors affect the quality of teaching over time. Nevertheless, the
presence of a significant rise in ratings would provide evidence that the quality of
teaching was improving, which is the object of the total quality assurance exercise. If it
were found that scores remained static or even fell over time it would certainly raise
questions over whether the resources devoted to the exercise of gathering feedback
through questionnaires could be justified. Alternatively, it might pose questions about the
implementation of the system in the university in question, as it could be possible that
appropriate conditions or associated processes are needed for quality improvement to
occur.
Related Research
There have been some previous investigations of questionnaire ratings over time, though
most differ from the present study by concentrating on individuals, conducting short-
term experiments or examining the effectiveness of forms of counselling related to
questionnaire feedback. Marsh and Hocevar (1991) found evidence of stability over a
13-year period when looking at individual instructors. Hativa (1996) found stability in
both levels of ratings and the shape of strength/weakness profiles over four sets of
evaluation data. However, there was improvement from teachers who undertook special
improvement activities.
Investigations of changes in ratings after feedback have mostly been short-term
studies, though they do indicate that change can and does occur. Cohen (1980)
conducted a meta-analysis of studies that gave mid-term feedback and then examined
end-of-term ratings. Those who received the mid-term feedback averaged end-of-term
ratings one-third of a standard deviation higher than controls. Longer-term studies
have been rare but studies that coupled feedback with consultation have shown
longer-term effects (e.g. Marsh & Roche, 1993; Piccinin et al., 1999; Stevens &
Aleamoni, 1985).
It would appear that individuals' relative strengths and weaknesses tend to be
reasonably consistent, but this does not imply that overall improvement is not possible.
There seems to be tentative evidence that the level of ratings will tend to be fairly stable
unless the feedback is accompanied by counselling or improvement activities. There are
sufficient studies with evidence of change by individual teachers to suggest that teaching
performance can improve over time. It seems safe to reject the notion that teaching
performance is inherently stable and improvement not possible.
There is a surprising lack of studies similar to the one reported here in which data for
a whole university have been investigated. One possible explanation is that most research
into student feedback questionnaires has been conducted within a positivist framework,
so the researchers prefer to have experimental designs so that effects can be attributed.
Marsh (1987, p. 342) illustrates this concern:
It is true that, without some form of control, effect cannot be unequivocally attributed
to a cause. Controlling feedback is, though, neither realistically feasible nor ethical at the
whole-university level. Yet investigation is still important from a naturalistic perspective to see
whether, in real situations, improvement in teaching quality does accompany the use of
student feedback questionnaires.
introduced in 1995, use was made compulsory and an instrument known as the Student
Feedback Questionnaire (SFQ) was introduced. This instrument was developed from the
one used previously in the voluntary scheme. The original instrument contained six
scales derived initially from the extensive literature on the topic (e.g. Feldman, 1976;
Marsh, 1987). The items and dimensions were subsequently modified in the light of
feedback from teachers using the voluntary scheme about the type of student feedback
that was most valuable.
On the introduction of staff appraisal the voluntary instrument was modified. The
wording of items was changed so that the focus was upon the instructor, to reflect the
appraisal orientation of the instrument. A review subsequently recommended some
changes to the instrument, retaining the six dimensions but cutting the number of items
per dimension from three to two. The six dimensions or subscales of the SFQ are:
learning outcomes; interaction; individual help; organisation and presentation; motiv-
ation; and feedback. A copy of the SFQ instrument used to gather the data analysed in
this study is included as Appendix 1.
A previous study examined the reliabilities of the six scales and reported high values
ranging from 0.93 to 0.97 (Kwan, 1999). Each item required respondents to indicate the
extent of their agreement with a particular statement on a 5-point Likert scale ranging
from strongly agree to strongly disagree. The two items for each subscale were then
summed to produce a measure of the dimension. Hence, the scores for the subscales
ranged from 2 to 10. A high rating indicates a high level of student satisfaction with the
particular aspect of teaching being evaluated.
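As a concrete illustration of the scoring just described, the following sketch sums the two item ratings for each of the six subscales to give a subscale score in the 2 to 10 range. The item identifiers, dictionary layout and sample responses are invented for illustration and are not taken from the SFQ data.

```python
# Sketch of the SFQ subscale scoring described above: each of the six
# subscales has two items rated on a 5-point Likert scale (1-5), and the
# two item ratings are summed, giving a subscale score in the range 2-10.
# Subscale names follow the six dimensions listed in the text; the item
# keys and sample responses are hypothetical.

SUBSCALES = {
    "learning_outcomes": ("item1", "item2"),
    "interaction": ("item3", "item4"),
    "individual_help": ("item5", "item6"),
    "organisation_presentation": ("item7", "item8"),
    "motivation": ("item9", "item10"),
    "feedback": ("item11", "item12"),
}

def subscale_scores(responses):
    """Sum the two item ratings (each 1-5) for every subscale."""
    scores = {}
    for name, (a, b) in SUBSCALES.items():
        ra, rb = responses[a], responses[b]
        if not (1 <= ra <= 5 and 1 <= rb <= 5):
            raise ValueError(f"ratings for {name} must be on a 1-5 scale")
        scores[name] = ra + rb  # possible range: 2-10
    return scores

# One hypothetical respondent:
resp = {f"item{i}": r for i, r in enumerate(
    [5, 4, 4, 4, 3, 4, 5, 5, 4, 3, 2, 3], start=1)}
print(subscale_scores(resp))
```

Summing rather than averaging matches the text's stated 2 to 10 range for each dimension.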
The SFQ also had two standard open-ended questions. Additional items, either closed
or open-ended, could be added to the questionnaire depending on the needs of the
individual department or staff.
It was university policy that a minimum of two classes per year were to be selected
to fill in the SFQ for each member of the teaching staff. Some departments permitted or
required extra classes to be evaluated. The administration of the questionnaires to the
selected classes was handled by departmental administrative staff and the optical mark
reader forms were processed by a central unit.
The teaching staff received a report on the means, standard deviations and percentage
distributions of the ratings of the six composite measures and the individual items for
each of the selected classes. The departmental averages of the ratings of the six subscales
and the individual items were reported to the department head, who also received a copy
of the data for each member of the department. Table 1 shows means and standard
deviations for the six scales of the questionnaire computed from the overall database
used for the study with department as the unit for analysis.
TABLE 1. Means and standard deviations of the six scales across departments over 3 years
Method
Permission was obtained to make use of the SFQ database for purposes of evaluating the
instrument itself and the quality assurance system of which it is a part. For this purpose
the file was stripped of both individual and departmental identifiers.
Data from 25 departments of the university which made use of the SFQ were collected
for consecutive years. Due to a change of questionnaire, 4-year data were available for
19 departments and 3-year data for six departments. For each department, the average
scores for the six dimensions of the SFQ were calculated across the classes for each year.
The mean SFQ scores were compared across years by multivariate analysis of variance
(MANOVA) for each of the 25 departments.
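As a rough illustration of the year-on-year comparison, the sketch below computes a one-way ANOVA F statistic for a single dimension across years. This is a deliberate univariate simplification of the MANOVA actually used in the study, which tests all six dimensions jointly; the class-average scores used here are invented.

```python
# A one-way ANOVA F statistic computed from scratch: do the class-average
# scores on one SFQ dimension differ across years within a department?
# This is a univariate stand-in for the multivariate test the study used;
# the data below are hypothetical.

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA; `groups` is a list of samples."""
    k = len(groups)                       # number of groups (years)
    n = sum(len(g) for g in groups)       # total observations (classes)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, means))
    ms_between = ss_between / (k - 1)     # between-years mean square
    ms_within = ss_within / (n - k)       # within-year mean square
    return ms_between / ms_within

# Hypothetical class-average scores on one dimension over three years:
years = [[7.1, 7.4, 6.9, 7.2], [7.0, 7.3, 7.1, 7.4], [7.2, 7.0, 7.3, 7.1]]
print(one_way_anova_f(years))  # small F: no evidence of change across years
```

In the study itself the per-year samples are the selected classes within a department, and the multivariate test combines the six dimensions through Wilks' lambda rather than testing each separately.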
The department was chosen as an appropriate unit for analysis in that it caters for two
mechanisms for improving the quality of teaching. Firstly, enough individuals could
improve their ratings by a sufficient margin that the department overall registered a
significant increase. Alternatively, individuals with low ratings might not have contracts
renewed and be replaced by others who subsequently have higher ratings. A combination
of these two mechanisms is also possible. Had there been significant changes in
departmental scores, the intention was to have looked at individual scores to determine
the mechanism.
In our study the sample was limited to one university by the practical constraint of
gaining access to such a wide body of sensitive data in a university other than one's own.
Generalisability is clearly a relevant issue. It can be questioned whether it is possible to
generalise from the finding in one university to suggest that the use of feedback
questionnaires in other universities will lead to (or not lead to) an indication of
improvement in teaching quality.
A study of one university clearly cannot lead to formal inference, as the sample was
insufficient and not random. Eisner (1991, ch. 9) argues, though, that inferences from
small samples and even single cases can be made through attribute analysis and image
matching. In this particular case the process requires the reader to make a judgement as
to whether the university, the questionnaire, the administration system and the use made
of the feedback are sufficiently similar to those in other universities for there to be the
possibility of similar findings.
For this reason we have tried, in various parts of the paper, to make transparent the
situation in the university in question to provide the reader with the evidence to make
a valid judgement. We argue that the questionnaire was closely related to those used by
many universities in that it incorporated dimensions commonly accepted in the literature.
The procedures for its use are described in the following sections so the reader can judge
how similar they are to practices in other universities. If the level of attribute matching
indicates that the context and procedures are related to those described in this study then
there is the possibility that similar outcomes would be found if the same type of study
were conducted. The study does seem to provide a justi cation for other universities to
examine their own data.
Results
Results of the MANOVA are shown in Table 2, which reports the Wilks' lambda, the
corresponding F value and the associated p-value. Besides statistical significance, we
also checked the practical difference among the mean scores for each department. We
considered a department to have a practically significant change if its mean scores across years
changed by more than ±0.2, which is 5% of the feasible scale range of 2
to 10. The use of practical significance levels is appropriate because it is well known that
the very large sample size would mean that even tiny differences could be statistically
significant (e.g. Harris, 1998). The results for practically significant change are also
reported in Table 2. For most departments the changes were too small to
be statistically significant: 14 out of 25 departments
had no statistically significant change in their mean scores for any of the six dimensions
of the SFQ at the 5% level of significance in the 3- or 4-year period. Of the 11
departments that did show a statistically significant change on one or more of the six dimensions, only
five showed practically significant changes. Three of those five departments had
a sudden drop in the last year of observation and the other two had rises and falls during
the period. The overall conclusion is that the SFQ evaluation process produces no
evidence of an improvement in the quality of teaching during the 4-year period.
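The practical-significance screen described above can be expressed directly in code. The sketch below reads the criterion as: flag a dimension if the spread of its yearly means exceeds 0.4 (a movement beyond ±0.2), which is 5% of the feasible 2 to 10 range. Both this reading of the threshold and the yearly means used are illustrative assumptions.

```python
# Practical-significance check from the text: flag a dimension if a
# department's yearly mean score on it moves by more than +/-0.2, read
# here as the spread (max - min) of yearly means exceeding 0.4, i.e.
# 5% of the feasible 2-10 scale range. The yearly means are invented.

THRESHOLD = 0.05 * (10 - 2)  # 0.4: 5% of the feasible scale range

def practically_significant(yearly_means):
    """yearly_means: dict mapping dimension -> list of yearly mean scores."""
    return {dim: (max(ms) - min(ms)) > THRESHOLD
            for dim, ms in yearly_means.items()}

dept = {
    "learning_outcomes": [7.2, 7.1, 7.3, 7.2],   # spread 0.2: not flagged
    "feedback": [6.8, 6.9, 6.7, 6.2],            # spread 0.7: flagged
}
print(practically_significant(dept))
```

Screening on an absolute threshold like this complements the MANOVA, since with very large samples even trivially small differences reach statistical significance.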
To give some feel for the data, average scores for the six subscales by year are plotted
for two typical departments in Figure 1.
TABLE 2. Results for MANOVA and practical significance of the mean SFQ scores across
years for the 25 departments
Note: Practical significance: mean score differences are greater than 5% of the feasible
range.
they should write clearly, speak at a normal and attainable speed and lend the
transparencies to the students. Although I have spoken up in the course
evaluation, no improvement has been made.
Both studies were qualitative and aimed for interpretation and understanding. It is not
therefore possible to give a precise measure of the extent of such sentiments, particularly
since these views emerged from indirect questioning. The students quoted were certainly
not isolated cases though, so this does seem to be quite a common belief. A search of
the transcripts of both studies produced no statements from students with evidence that
the student feedback questionnaires had made a positive impact on teaching.
Possible Explanations
As there had not been any significant changes in the student intake or the evaluation
policies and procedures of the university over the period under investigation, it was
highly unlikely that these factors would have negated any increase in ratings had there been any
improvement in the overall teaching quality. If anything, given the importance of the
ratings, departments and instructors tended to adapt to the evaluation system by choosing
classes that would raise rather than lower their ratings.
The following discussion aims to explore the reasons why the SFQ may not have
contributed to an improvement in the quality of teaching and learning. The feasibility of
FIG. 1. Average mean scores for the six scales by year for two departments.
Key: Learn = Learning outcomes; Interact = Interaction; Ind. Help = Individual help; Organ = Organisation and
presentation; Motiv = Motivation; Feedback = Feedback.
each potential reason is discussed in the light of evidence in the literature and other
available contextual information.
relations between the two. Similarly the effect of quality assurance measures could
have an early impact which would wane over time. This possibility of university-
wide ratings reaching a stable plateau does not appear to have been investigated, but
if there were such an effect it would clearly be most prevalent in the most stable
situations.
However, there are a number of indicators that suggest that the university in which
this study was conducted would be less likely than most to have reached a stable plateau.
Until the recent Asian economic downturn there had been a higher staff turnover than
in most comparable western universities and the university had a younger staff profile
than many. Recent years have also seen many innovations in teaching funded by
teaching development grants (Kember, 2000). Overall, there is insufficient evidence to
conclude that teaching quality cannot be improved and no compelling reason to suggest
that the university in which the study was conducted might have reached a mature stable
plateau.
FIG. 2. Percentages of responses to the statement "Good teaching is properly rewarded in the University"
(N = 201).
Note: The 5-point Likert scale for response: 1 = Strongly agree, 2 = Agree, 3 = Neutral, 4 = Disagree and
5 = Strongly disagree.
academics, 68% agreed that their institutions needed better ways, besides publications, to
evaluate scholarly performance (p. 34).
Like most others, the university in the study had an official policy that teaching quality
was taken into account in staff appraisal, contract renewal and promotion decisions.
There was also an annual scheme for honouring excellent teachers. Clearly the survey
results indicate that the academics perceived a mismatch between policy and practice, or
felt that the measures did not go far enough in rewarding good teaching. Again in this
respect the university was certainly not unusual. Many, if not most, universities now
have policy statements stressing the importance of teaching and schemes that are meant
to put the policy into practice. However, the results of the international surveys cited
above indicate high levels of cynicism among academics as to whether their universities
fewer than 11 of the 19 begin with the word "Teachers", which adds credence to
Centra's claim (1993, p. 47) that the typical student rating form is devised to reflect
effectiveness in "lecture, lecture and discussion and other teacher-centred methods".
D'Apollonia and Abrami (1997) argued that typical feedback questionnaires are based
upon models of instruction focusing upon traditional didactic teaching. McKeachie
(1997) pointed out that student rating forms gather information about conventional
classroom teaching. Almost all ignore the learning that takes place outside the classroom,
which is probably the majority for many students. Kolitch and Dean (1999) examined a
typical US evaluation instrument against two models of teaching. They found it
compatible with a transmission model but not with an engaged-critical one. The article
went on to question the neutrality of instruments that did not acknowledge forms of
Conclusion
Employing a teaching evaluation system that does not appear to demonstrate any overall
improvement in teaching quality cannot be considered satisfactory. Several potential
reasons have been given, all of which may have played some part, but it was not clear
which, if any, predominated. If it is not possible to discover systemic factors that are
discouraging improvements, there has to be a question over the continued use of the
student feedback questionnaires. Their regular use is expensive in terms of both funds
and time. If a quality assurance system is not effective then it is hard to justify its
continuation.
The study has been conducted in one university on one student feedback system. No
formal generalisation is possible but it is of interest to speculate whether similar results
might be found in other universities. The questionnaire was similar to those used to rate
instructors in many universities, and the results of Tagamori and Bishop (1995) and
Kwan (1999) suggest that it was better designed than many. The questionnaire was used
as part of a staff appraisal system, which again is a common situation.
Of the reasons suggested for the questionnaire not contributing to an improvement in
teaching quality, it would appear that many would be widely applicable. The procedures
for making use of feedback data were probably fairly common, as few institutions appear
to offer widely available specialised counselling when feedback data are returned. The
perception that teaching was insufficiently rewarded is certainly widespread, so there
may well be other universities where there is a perceived lack of incentive to make use
of the feedback.
Overall there is no obvious reason why the university in which this study was
conducted could be seen as differing markedly in evaluation practice from a wide range
of others. This does suggest that there is good reason for others to examine data from
their own universities to see whether their student feedback questionnaire systems are
contributing to an improvement in the quality of teaching and learning.
Studies that have shown improvement have either coupled the use of questionnaires
with specialised counselling (Marsh & Roche, 1993; Piccinin et al., 1999; Stevens &
Aleamoni, 1985) or encouraged teachers to devise their own ways of evaluating their
teaching innovations (Kember, 2000; Kember et al., 1997). In both cases there is an
implication that there is concern for improvement. In universities in which it is perceived
that good teaching is not valued or adequately rewarded there would appear to be a
possibility of also nding a lack of improvement over time as instructors lack incentive
to make use of the feedback from the compulsory standard questionnaires.
Acknowledgement
In the period since the data for this article were gathered, the university in which the
study was conducted has changed the policies and procedures associated with the
evaluation of teaching, so the observations in the article may no longer be applicable.
The first-named author was in the Educational Development Centre of the Hong Kong
Polytechnic University at the time the study was conducted.
Note on Contributors
DAVID KEMBER is the Chief Educational Development Officer, DORIS Y. P. LEUNG
is a Research Fellow, and K. P. KWAN is a Senior Educational Development Officer
in the Educational Development Centre, The Hong Kong Polytechnic University.
Correspondence: Dr K. P. Kwan, EDC, Hong Kong Polytechnic University, Hung
Hom, Hong Kong.
REFERENCES
BOYER, E. L. (1990) Scholarship reconsidered: priorities of the professoriate (San Francisco, CA, The
Carnegie Foundation for the Advancement of Teaching).
BRINKO, K. T. (1993) The practice of giving feedback to improve teaching: what is effective?, Journal
of Higher Education, 64 (5), pp. 575–593.
CENTRA, J. (1993) Reflective faculty evaluation (San Francisco, CA, Jossey Bass).
COHEN, P. A. (1980) Effectiveness of student-rating feedback for improving college instruction: a
meta-analysis, Research in Higher Education, 13, pp. 321–341.
D'APOLLONIA, S. & ABRAMI, P. C. (1997) Navigating student ratings of instruction, American Psychologist,
52 (11), pp. 1198–1208.
EISNER, E. W. (1991) The enlightened eye: qualitative inquiry and the enhancement of educational
practice (New York, Macmillan Publishing).
FELDMAN, K. A. (1976) The superior college teacher from the students' view, Research in Higher
Education.
Appendix 1
The Student Feedback Questionnaire
Please fill in the appropriate circle to indicate your attitude to the following statements.
Learning Outcomes
1. I have understood the subject matter taught by the staff member.
2. The staff member's method of teaching has helped my understanding.
Interaction
3. The staff member gave students opportunities to ask questions and discuss ideas.
4. The staff member encouraged active participation in class.
Individual Help
5. The staff member provided appropriate help for students with learning problems.
6. Assistance was available from the staff member when necessary.
Motivation
9. The staff member explained the significance of what was taught.
10. The staff member's teaching stimulated my interest in the subject.
Feedback
11. The staff member gave me regular feedback on my progress.
12. The feedback from the staff member was helpful and constructive.