
0363-5465/103/3131-0083$02.00/0
THE AMERICAN JOURNAL OF SPORTS MEDICINE, Vol. 31, No. 1
© 2003 American Orthopaedic Society for Sports Medicine

Reproducibility and Reliability of the Outerbridge Classification for Grading Chondral Lesions of the Knee Arthroscopically
Michelle L. Cameron, MD, Karen K. Briggs, MBA, and J. Richard Steadman,* MD
From the Steadman-Hawkins Sports Medicine Foundation, Vail, Colorado

Background: Few studies have investigated the accuracy and reproducibility of the Outerbridge classification system for
classification of chondral damage in the knee.
Hypothesis: Arthroscopically assigned Outerbridge grades are accurate, reliable, and reproducible.
Study Design: Cadaver study.
Methods: Six cadaveric knees underwent diagnostic arthroscopy, which was videotaped. An arthrotomy was then performed
and the arthroscopically identified lesions were measured with calipers. Nine orthopaedic surgeons reviewed each video and
graded each chondral lesion two separate times. Accuracy of observations was calculated based on the percentage of
agreement between the grades determined during arthroscopy and arthrotomy.
Results: The overall accuracy was 68% but varied by location. The kappa coefficient between the two scores was 0.602; the
arthroscopy grade was higher than the arthrotomy grade 63% of the time. The intraobserver and interobserver kappa coefficients
were 0.80 and 0.52, respectively. The mean interobserver kappa between the two physicians in practice 5 years or more was
0.72, compared with 0.50 for physicians in practice less than 5 years.
Conclusions: The Outerbridge classification was moderately accurate when used to grade chondral lesions arthroscopically.
Clinical Relevance: Orthopaedic surgeons can accurately grade chondral lesions of the knee with the Outerbridge classification, regardless of their level of experience.

Many classification systems for assessing chondral damage of the knee have been described.4,6,9–11 The Outerbridge classification system was originally designed to classify chondromalacia patellae.10,11 Over the years, it has been extrapolated to classify chondral lesions throughout the body, but few studies have investigated its accuracy and reproducibility.1 The purpose of this study was to determine the intraobserver reliability, interobserver reproducibility, and accuracy of the Outerbridge classification system for grading chondral lesions in knees viewed arthroscopically compared with observations at arthrotomy.

MATERIALS AND METHODS


Six cadaveric knees from five female donors and one male donor (average age, 67 years; range, 50 to 79) underwent diagnostic arthroscopy performed through standard medial and lateral joint line portals, with a medial suprapatellar portal for inflow. All arthroscopic procedures were videotaped for later review. All compartments were visualized and the chondral surfaces were evaluated. Areas suspected of having a chondral lesion were assessed with a calibrated arthroscopic probe to determine the depth and diameter of the lesion. After the arthroscopic examination, an arthrotomy was performed on each knee. Lesions that had been identified arthroscopically were then measured with calipers to assign the grade of chondrosis according to the Outerbridge classification.10,11
* Address correspondence and reprint requests to J. Richard Steadman, MD, Department of Clinical Research, 181 West Meadow Drive, Suite 1000, Vail, CO 81657.
No author or related institution has received any financial benefit from research in this study.

TABLE 1
Accuracy of Observers Using the Outerbridge Classification^a

                                 First observation   Second observation   All observations
All observers                    67%                 68%                  68%
Observers ≥5 years in practice   70%                 70%                  70%
Observers <5 years in practice   66%                 68%                  67%

^a Accuracy based on agreement between grade at arthroscopy and grade at arthrotomy.

TABLE 3
Location of Identified Lesions

Location                  Number of lesions   Accuracy^a
Patella                   3                   94%
Trochlear groove          3                   80%
Medial femoral condyle    3                   65%
Lateral femoral condyle   4                   60%
Medial tibial plateau     3                   80%
Lateral tibial plateau    4                   56%

^a Accuracy based on agreement between grade at arthroscopy and grade at arthrotomy.

Nine orthopaedic surgeons (four attending surgeons and five sports medicine fellows) independently reviewed each video. Before they viewed the video, the Outerbridge classification was reviewed in detail with all participants. For this study, the Outerbridge classification was defined as follows: grade 0, normal cartilage; grade I, cartilage with softening and swelling; grade II, a partial-thickness defect with fissures on the surface that do not reach subchondral bone or exceed 1.5 cm in diameter; grade III, fissuring to the level of subchondral bone in an area with a diameter of more than 1.5 cm; and grade IV, exposed subchondral bone.10,11 In the six cadaveric knees, there were three grade 0, five grade I, four grade II, four grade III, and four grade IV lesions, as determined during arthrotomy. The orthopaedic surgeons participating in the study graded each chondral lesion two separate times, but at the same session. Each lesion was edited to a single video clip and was shown in a different random order each time to decrease bias.
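The grade definitions above can be expressed as a small decision procedure. The sketch below is illustrative only: the field names are hypothetical, and the published definitions leave some combinations unassigned (e.g., a fissure reaching bone with a diameter of 1.5 cm or less), so the sketch simply falls back to the lower grades for them.

```python
from dataclasses import dataclass

@dataclass
class Lesion:
    softening: bool      # softening or swelling of the cartilage
    fissured: bool       # surface fissuring present
    reaches_bone: bool   # fissures extend to subchondral bone
    bone_exposed: bool   # subchondral bone exposed
    diameter_cm: float   # lesion diameter measured with a probe

def outerbridge_grade(l: Lesion) -> int:
    """Map probe findings to an Outerbridge grade per the study's definitions."""
    if l.bone_exposed:                                        # grade IV
        return 4
    if l.fissured and l.reaches_bone and l.diameter_cm > 1.5:  # grade III
        return 3
    if l.fissured and not l.reaches_bone and l.diameter_cm <= 1.5:  # grade II
        return 2
    if l.softening:                                           # grade I
        return 1
    return 0                                                  # grade 0, normal

print(outerbridge_grade(Lesion(True, True, True, False, 2.0)))  # grade III -> 3
```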
Statistical Analysis
The overall accuracy was determined as the percentage of lesions whose grade, assigned by the nine orthopaedic surgeons from the arthroscopy videotape, matched the classification assigned during arthrotomy. The orthopaedic surgeons were then stratified by the number of years they had been in practice, and accuracy was determined separately for those in practice less than 5 years and those in practice 5 years or more.

TABLE 2
Observer Agreement with Arthrotomy Grade

                 Number (%) of observers who graded correctly
Lesion   Grade   Session A   Session B
1        4       9 (100)     9 (100)
2        1       2 (22)      2 (22)
3        3       9 (100)     8 (89)
4        4       2 (22)      9 (100)
5        2       6 (67)      7 (78)
6        0       6 (67)      5 (56)
7        1       8 (89)      8 (89)
8        2       7 (78)      4 (44)
9        0       7 (78)      7 (78)
10       2       4 (44)      5 (56)
11       0       6 (67)      5 (56)
12       2       5 (56)      7 (78)
13       1       6 (67)      7 (78)
14       1       4 (44)      7 (78)
15       1       7 (78)      7 (78)
16       3       7 (78)      8 (89)
17       3       3 (33)      3 (33)
18       3       7 (78)      7 (78)
19       4       8 (89)      9 (100)
20       4       9 (100)     9 (100)
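The percent-agreement accuracy described above can be sketched as follows. The grade lists are hypothetical, not the study data.

```python
def accuracy(arthroscopy_grades, arthrotomy_grades):
    """Percentage of arthroscopic grades matching the arthrotomy (reference) grade."""
    assert len(arthroscopy_grades) == len(arthrotomy_grades)
    matches = sum(a == b for a, b in zip(arthroscopy_grades, arthrotomy_grades))
    return 100.0 * matches / len(arthrotomy_grades)

scope = [4, 1, 3, 4, 2, 0, 1, 2, 0, 2]   # grades assigned from the video
open_ = [4, 2, 3, 4, 2, 0, 1, 3, 0, 2]   # reference grades at arthrotomy
print(f"accuracy: {accuracy(scope, open_):.0f}%")  # 8/10 matches -> 80%
```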
The kappa score between the arthrotomy grade and the grade assigned by the surgeons from the videotape of the arthroscopy was calculated to quantify the agreement between the two grades. Reliability and reproducibility were also assessed with kappa statistics. The kappa coefficient represents the proportion of agreement after the agreement expected by chance alone is taken into account. A kappa coefficient of 1.00 indicates perfect agreement; a value of 0.00 to 0.40 indicates fair agreement, 0.41 to 0.75 good agreement, and 0.76 to 1.00 excellent agreement.7 Interobserver reproducibility was assessed by comparisons among the nine observers. Intraobserver reliability was assessed by comparing the grades assigned to each lesion at each surgeon's two viewings.
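As a concrete illustration of the statistic described above, here is a minimal sketch of an unweighted Cohen's kappa, with the interpretation bands the paper takes from Landis and Koch. The two grade lists are hypothetical, not study data.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n   # observed agreement
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[g] * c2[g] for g in c1) / n**2             # chance agreement
    return (p_o - p_e) / (1 - p_e)

def landis_koch(kappa):
    """Interpretation bands used in this paper (Landis & Koch)."""
    if kappa > 0.75:
        return "excellent"
    if kappa > 0.40:
        return "good"
    return "fair"

# Hypothetical regrading of ten lesions by one observer at two sittings
first  = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
second = [0, 1, 2, 3, 4, 0, 2, 2, 3, 4]
k = cohens_kappa(first, second)
print(f"kappa = {k:.2f} ({landis_koch(k)} agreement)")  # kappa = 0.88 (excellent agreement)
```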

RESULTS
The overall accuracy rate was 68% for all observers. When stratified by years in practice, there was no difference between the observers in practice more than 5 years (N = 2) and those in practice less than 5 years, including fellows (N = 7) (Table 1). The accuracy rate by lesion ranged from 22% to 100%, with lower-grade lesions diagnosed less accurately than higher-grade lesions (Table 2). For those observations that did not agree with the grade assigned during arthrotomy, the observers graded the lesion higher 63% of the time and lower 37% of the time. The accuracy rate varied according to the location of the lesions (Table 3). The kappa score between the arthrotomy grade and the surgeons' grade was 0.602, indicating fair-to-good agreement.
The average intraobserver kappa coefficient was 0.80,
indicating excellent agreement. The highest intraobserver
kappa coefficient for an observer was 1.0 and the lowest
was 0.55 (Table 4). The mean intraobserver kappa for
physicians in practice 5 years or more was 0.91, compared
with a kappa of 0.76 for physicians in practice less than 5
years and fellows.
The average interobserver kappa coefficient was 0.52,
indicating good agreement (Table 5). The mean interobserver kappa between the two physicians in practice 5
years or more was 0.72, compared with a kappa of 0.50 for
the interobserver reliability among the physicians in practice less than 5 years and the fellows.


TABLE 4
Intraobserver Reliability of Use of the Outerbridge Classification to Grade Lesions of the Knee

Observer                 Kappa coefficient^a   Percentage agreement^b   Level of agreement   P value
1 (≥5 years practice)    0.86                  90                       Excellent            <0.001
2 (≥5 years practice)    0.95                  95                       Excellent            <0.001
3 (fellow)               0.55                  65                       Good                 <0.001
4 (fellow)               0.80                  85                       Excellent            <0.001
5 (fellow)               1.00                  100                      Excellent            <0.001
6 (fellow)               0.62                  70                       Good                 <0.001
7 (fellow)               0.63                  75                       Good                 <0.001
8 (<5 years practice)    0.75                  80                       Good                 <0.001
9 (<5 years practice)    1.00                  100                      Excellent            <0.001
Average                  0.80                  84                       Excellent            <0.001

^a Kappa values: 0.00 to 0.40, fair agreement; 0.41 to 0.75, good agreement; 0.76 to 1.00, excellent agreement.
^b Percentage agreement equals the agreement between the first and second observation of each physician.
TABLE 5
Interobserver Reliability of Use of the Outerbridge Classification to Grade Lesions of the Knee^a

First set of ratings

Observer   1      2      3      4      5      6      7      8
2          0.75
3          0.47   0.70
4          0.38   0.45   0.42
5          0.41   0.52   0.64   0.24
6          0.40   0.63   0.64   0.35   0.65
7          0.30   0.52   0.39   0.28   0.52   0.35
8          0.59   0.70   0.75   0.37   0.51   0.58   0.39
9          0.63   0.64   0.59   0.40   0.52   0.64   0.47   0.58

Second set of ratings

Observer   1      2      3      4      5      6      7      8
2          0.69
3          0.58   0.64
4          0.50   0.58   0.41
5          0.41   0.58   0.58   0.24
6          0.44   0.46   0.43   0.45   0.48
7          0.52   0.70   0.76   0.41   0.64   0.41
8          0.52   0.58   0.75   0.47   0.33   0.32   0.64
9          0.63   0.70   0.70   0.40   0.52   0.51   0.70   0.60

^a Reliability reported by kappa coefficient.

DISCUSSION
The Outerbridge classification began as a simple grading system for chondromalacia patellae, but it has since been extrapolated to grade articular surfaces throughout the body. In this study, we attempted to determine the accuracy and reproducibility of this system for classifying chondral lesions in the knee. We found that arthroscopic grading, when compared with the standard of grading during arthrotomy, was 68% accurate, with good agreement on kappa testing. When arthroscopically graded lesions were misgraded, they tended to be graded more severely than at arthrotomy.
The intraobserver correlation between the first and second grading of each lesion was excellent, and the interobserver correlation was good. There was no difference in the accuracy of this classification system based on the observer's number of years in practice; however, intraobserver reproducibility was higher for the more experienced observers. It is not possible to determine whether this is a significant difference because of the limited number of observers. In a similar study, when fellows' diagnoses of meniscal tears were compared with those of the treating surgeons, poor-to-fair agreement was seen.2
Several other studies have looked at the assessment of cartilage damage and agreement with the findings of imaging studies.3,5,8,12 In a study by Disler et al.,3 interobserver agreement with ultrasonography was high, with a kappa of 0.80. Another study compared use of the SFA (Société Française d'Arthroscopie) score4 and MRI in grading cartilage damage in osteoarthritic knees.5 The intraobserver reliability was high, as was the interobserver

reliability. A similar study by Potter et al.12 used the Outerbridge classification and reported the accuracy of grading from MR images. The interobserver reliability was high; however, there were only two observers. Agreement between grading with use of MR images and arthroscopic findings ranged from 65% to 85%.12
The results of our study were similar to those of the
other studies. There are, however, several limitations of
this study, including the limited number of cadaveric
knees studied. Another limitation was that the observer
was unable to have tactile feedback from triangulation
during the actual arthroscopic procedure. Tactile feedback
might have increased the accuracy of the Outerbridge
classification over simple video review and might also
have resulted in greater differences between surgeons.
Another limitation was that all lesions were graded two
separate times but during the same session. However, all
video clips were shown in random order to decrease bias.
The study could have been improved by having reviewers
evaluate the arthroscopic videos at two sessions, several
weeks apart. However, the possibility of bias would still
exist. Even with these limitations, our findings are similar
to those recently published by Brismar et al.,1 who reported kappa statistics of 0.55 to 0.75, suggesting good
reproducibility for the Outerbridge classification.
In summary, we found that the Outerbridge classification, when used to grade chondral lesions arthroscopically, was moderately accurate. It had excellent intraobserver reliability, good interobserver reproducibility, and
was used accurately by orthopaedic surgeons regardless of
the level of their experience.


ACKNOWLEDGMENT
The authors thank Dr. Mininder Kocher for his statistical
assistance with this study.

REFERENCES
1. Brismar BH, Wredmark T, Movin T, et al: Observer reliability in the arthroscopic classification of osteoarthritis of the knee. J Bone Joint Surg 84B: 42–47, 2002
2. Dervin GF, Stiell IG, Wells GA, et al: Physicians' accuracy and interrater reliability for the diagnosis of unstable meniscal tears in patients having osteoarthritis of the knee. Can J Surg 44: 267–274, 2001
3. Disler DG, Raymond E, May DA, et al: Articular cartilage defects: In vitro evaluation of accuracy and interobserver reliability for detection and grading with US. Radiology 215: 846–851, 2000
4. Dougados M, Ayral X, Listrat V, et al: The SFA system for assessing articular cartilage lesions at arthroscopy of the knee. Arthroscopy 10: 69–77, 1994
5. Drape JL, Pessis E, Auleley GR, et al: Quantitative MR imaging evaluation of chondropathy in osteoarthritic knees. Radiology 208: 49–55, 1998
6. Hunt N, Sanchez-Ballester J, Pandit R, et al: Chondral lesions of the knee: A localization method and correlation with associated pathology. Arthroscopy 17: 481–490, 2001
7. Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics 33: 159–174, 1977
8. Lysholm J, Hamberg P, Gillquist J: The correlation between osteoarthrosis as seen on radiographs and on arthroscopy. Arthroscopy 3: 161–165, 1987
9. Noyes FR, Stabler CL: A system for grading articular cartilage lesions at arthroscopy. Am J Sports Med 17: 505–513, 1989
10. Outerbridge RE: The etiology of chondromalacia patellae. J Bone Joint Surg 43B: 752–757, 1961
11. Outerbridge RE, Dunlop JA: The problem of chondromalacia patellae. Clin Orthop 110: 177–196, 1975
12. Potter HG, Linklater JM, Allen AA, et al: Magnetic resonance imaging of articular cartilage in the knee: An evaluation with use of fast-spin-echo imaging. J Bone Joint Surg 80A: 1276–1284, 1998
