Comparison of sensitivity and specificity between two diagnostic tests, each measured on the
same patient, when the same reference standard is used
For this situation, we want to test whether the two diagnostic tests perform equally against a
common reference standard. For Test A and Test B, the hypothesis test for a comparison of
sensitivity can be stated as

$$H_0: \text{Se}_A = \text{Se}_B \quad \text{versus} \quad H_1: \text{Se}_A \neq \text{Se}_B,$$

where
$$\text{Se}_A = \frac{\text{true positives}}{\text{all patients with disease}} = \frac{n_{11A}}{r_{1A}}$$

$$\text{Se}_B = \frac{\text{true positives}}{\text{all patients with disease}} = \frac{n_{11B}}{r_{1B}}$$
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript]. University of Utah School of Medicine, 2010.
The cell counts of the paired table, the m's, are simply filled in by the crosstabulation procedure.
Since the data are not independent, being repeated measures on the same patient (both tests are done
on the same patient), we must apply a paired comparison of proportions. To compare sensitivity, we
simply apply the McNemar test, which is the standard way to compare two paired binary
variables expressed in this paired data layout (Lachenbruch and Lynch, 1998; Zhou et al, 2002,
pp. 166-169).
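As a sketch of what the crosstabulation step does, the following Python function builds the paired 2 x 2 table for one stratum of the reference standard from patient-level results. It is not part of the Stata workflow in this chapter. The record layout (one (disease, test_a, test_b) tuple per patient, coded 1/0) and the convention that the first subscript of m refers to Test A and the second to Test B are assumptions made for illustration, and the function name is hypothetical.

```python
from collections import Counter


def paired_table(records, disease_status):
    """Paired 2x2 table of Test A vs Test B within one reference-standard stratum.

    records: iterable of (disease, test_a, test_b) tuples coded 1/0 (hypothetical layout).
    Returns (m11, m10, m01, m00); first subscript = Test A, second = Test B (assumed).
    """
    counts = Counter((a, b) for d, a, b in records if d == disease_status)
    return counts[(1, 1)], counts[(1, 0)], counts[(0, 1)], counts[(0, 0)]


# Hypothetical data: four patients with disease, two without.
records = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 0)]
m11, m10, m01, m00 = paired_table(records, disease_status=1)
print(m11, m10, m01, m00)  # 1 2 1 0

# Both sensitivities come from the margins of the same paired table.
r1 = m11 + m10 + m01 + m00
print((m11 + m10) / r1, (m11 + m01) / r1)  # 0.75 0.5
```

Calling the function with disease_status=0 gives the corresponding paired table for patients without disease.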
The McNemar test is often called the McNemar change test. It uses only the information from
the discordant pairs (the cells where the two diagnostic tests disagree).
It is simply a chi-square test (Siegel and Castellan, 1988, p.76) expressed as
$$\chi^2_{df=1} = \frac{(m_{10} - m_{01})^2}{m_{10} + m_{01}}$$
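The statistic can be computed directly from the two discordant counts. This minimal Python sketch uses the formula exactly as written (no continuity correction) and obtains the p value from the chi-square distribution with 1 df through the identity P(chi-square(1) > x) = erfc(sqrt(x / 2)); the function name is hypothetical.

```python
import math


def mcnemar_chi2(m10, m01):
    """McNemar chi-square with 1 df, no continuity correction.

    Returns (statistic, p value). Undefined when there are no discordant pairs.
    """
    n_discordant = m10 + m01
    if n_discordant == 0:
        raise ValueError("no discordant pairs")
    stat = (m10 - m01) ** 2 / n_discordant
    # Upper tail of chi-square(1): P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


stat, p = mcnemar_chi2(12, 4)  # hypothetical discordant counts
print(round(stat, 2), round(p, 4))  # 4.0 0.0455
```

Why only the discordant cells matter: with the first subscript referring to Test A, Se_A = (m11 + m10)/r1 and Se_B = (m11 + m01)/r1, so Se_A - Se_B = (m10 - m01)/r1 and the concordant cells cancel.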
The chi-square test requires a sufficiently large sample size to provide an accurate p value. The
rule of thumb for the McNemar version of the chi-square test is that when (m10 + m01) < 10,
the exact form of the test should be used (Siegel and Castellan, 1988, p.79). Since the data are
paired, Fisher's exact test is not appropriate, so the binomial test is used instead: m10 is
compared with a binomial distribution on m10 + m01 discordant pairs with probability 1/2. In Stata,
this binomial test is labeled Exact McNemar.
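The exact version and the rule of thumb can be sketched as follows. This assumes the two-sided exact p value is taken as twice the smaller binomial tail, capped at 1, which is one common convention; it may not match Stata's Exact McNemar output to every digit. Function names are hypothetical.

```python
import math


def exact_mcnemar(m10, m01):
    """Exact McNemar p value: binomial test of m10 out of m10 + m01 trials, p = 1/2.

    Two-sided p taken as twice the smaller tail, capped at 1 (assumed convention).
    """
    n = m10 + m01
    if n == 0:
        raise ValueError("no discordant pairs")
    k = min(m10, m01)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_p(m10, m01, min_discordant=10):
    """Rule of thumb: exact test when m10 + m01 < 10, otherwise the chi-square version."""
    n = m10 + m01
    if n < min_discordant:
        return "exact", exact_mcnemar(m10, m01)
    stat = (m10 - m01) ** 2 / n
    return "chi-square", math.erfc(math.sqrt(stat / 2))


print(mcnemar_p(1, 8))   # ('exact', 0.0390625)
print(mcnemar_p(12, 4))  # 16 discordant pairs, so the chi-square version is used
```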
$$\text{Sp}_A = \frac{\text{true negatives}}{\text{all patients without disease}} = \frac{n_{00A}}{r_{0A}}$$

$$\text{Sp}_B = \frac{\text{true negatives}}{\text{all patients without disease}} = \frac{n_{00B}}{r_{0B}}$$
We see that all information for specificity for each test is contained in the second row, where the
second row of each table is the true absence of disease as identified by the common reference
standard. For a paired comparison of specificity, then, all we need are the cell counts in these
rows, combined into a paired crosstabulation table.
As before, the cell counts of the paired table, the m's, are simply filled in by the crosstabulation procedure.
For comparison of sensitivity and specificity between two diagnostic tests, you could describe
the statistical method as:
Within the same patients, both Test A and Test B will be compared to a common Test C
gold standard, and test characteristics will be calculated. The sensitivities of Test A
and Test B will be compared using a McNemar test, or an exact McNemar test, as
appropriate [Lachenbruch and Lynch, 1998]. The specificities will be compared similarly.
Example
We will use the CASS dataset (see Appendix 1 for references). These data come from the
Coronary Artery Surgery Study (CASS). In a cohort study of N = 1,465 men undergoing coronary
arteriography (the gold standard) for suspected or probable coronary heart disease, both an
exercise stress test (EST) and chest pain history (CPH) were recorded. The data are coded as
To open the dataset: File > Open, find the directory where you copied the course CD,
go to the subdirectory datasets & do-files, single-click on cass.dta, and click Open.
To obtain the sensitivity and specificity for est, we use the diagt command, which is not available
from the Stata menu bar and must first be installed:
findit diagt
Click on the sbe36_2 link, or a later version if one appears, to install the diagt command.
  Coronary |
    artery | Exercise Stress test
   disease |      Pos.       Neg. |     Total
-----------+----------------------+----------
  Abnormal |       815        208 |     1,023
    Normal |       115        327 |       442
-----------+----------------------+----------
     Total |       930        535 |     1,465
                                             [95% Confidence Interval]
---------------------------------------------------------------------------
Prevalence                         Pr(A)      70%       67%      72.2%
---------------------------------------------------------------------------
Sensitivity                      Pr(+|A)    79.7%     77.1%      82.1%
Specificity                      Pr(-|N)      74%     69.6%        78%
---------------------------------------------------------------------------
  Coronary |
    artery | Chest pain history
   disease |      Pos.       Neg. |     Total
-----------+----------------------+----------
  Abnormal |       969         54 |     1,023
    Normal |       245        197 |       442
-----------+----------------------+----------
     Total |     1,214        251 |     1,465
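As a hand check on the diagt output, the point estimates can be recomputed from the cell counts. This Python sketch (the function name is hypothetical) reproduces the EST values shown above (sensitivity 79.7%, specificity 74.0%, prevalence 69.8%, displayed by diagt as 74% and 70%) and gives the corresponding values for chest pain history. It computes point estimates only, not the confidence intervals that diagt reports.

```python
def diag_summary(tp, fn, fp, tn):
    """Prevalence, sensitivity and specificity from a 2x2 table.

    Layout as in the tables above: rows Abnormal/Normal (reference standard),
    columns Pos./Neg. (index test). Point estimates only.
    """
    diseased, healthy = tp + fn, fp + tn
    return {
        "prevalence": diseased / (diseased + healthy),
        "sensitivity": tp / diseased,  # n11 / r1
        "specificity": tn / healthy,   # n00 / r0
    }


est = diag_summary(tp=815, fn=208, fp=115, tn=327)  # exercise stress test
cph = diag_summary(tp=969, fn=54, fp=245, tn=197)   # chest pain history
for name, s in (("EST", est), ("CPH", cph)):
    print(name, {k: round(100 * v, 1) for k, v in s.items()})
# EST {'prevalence': 69.8, 'sensitivity': 79.7, 'specificity': 74.0}
# CPH {'prevalence': 69.8, 'sensitivity': 94.7, 'specificity': 44.6}
```

Chest pain history is therefore more sensitive but much less specific than the exercise stress test in these data; the paired tests below ask whether those differences are statistically significant.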
                 | Controls               |
Cases            |   Exposed   Unexposed  |       Total
-----------------+------------------------+------------
         Exposed |       786         183  |         969
       Unexposed |        29          25  |          54
-----------------+------------------------+------------
           Total |       815         208  |        1023
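The totals show that this table is restricted to the 1,023 patients with disease: the rows are chest pain history (969 positive) and the columns are the exercise stress test (815 positive), so it compares the two sensitivities, 94.7% versus 79.7%. The chi-square arithmetic on its discordant cells can be checked by hand with a short Python sketch; this is a verification, not Stata output.

```python
import math

# Discordant cells of the paired table (patients with disease, N = 1,023).
# Rows: chest pain history (CPH); columns: exercise stress test (EST).
cph_pos_est_neg = 183
cph_neg_est_pos = 29
n_diseased = 1023

n_disc = cph_pos_est_neg + cph_neg_est_pos  # discordant pairs, well above 10
chi2 = (cph_pos_est_neg - cph_neg_est_pos) ** 2 / n_disc
p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
# Difference in sensitivities, Se(CPH) - Se(EST), uses only the discordant cells.
se_diff = (cph_pos_est_neg - cph_neg_est_pos) / n_diseased

print(n_disc, round(chi2, 2), p < 0.001, round(se_diff, 3))  # 212 111.87 True 0.151
```

The difference of 154/1,023 = 15.1 percentage points equals 969/1,023 - 815/1,023, confirming that the concordant cells (786 and 25) do not enter the comparison.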
We see that the sum of the discordant pairs, 183 + 29 = 212, is greater than 10, so the sample size
is large enough to provide an accurate chi-square test p value. Therefore, we report the chi-square
version of McNemar's test (p < 0.001). If, however, the discordant pairs had summed to fewer than 10,
we would report the Exact McNemar test (p < 0.001).
Unfortunately, the variables are labeled Cases and Controls, which is rather confusing. They are
labeled this way because the McNemar test is part of the epitab suite of commands (Stata's
epidemiology procedures). To verify which variable represents cases, and which
represents controls, we can use
This output arranges the row and column variables in the same way as the mcc command, but displays
the categories in ascending sort order.
                 | Controls               |
Cases            |   Exposed   Unexposed  |       Total
-----------------+------------------------+------------
         Exposed |        69         176  |         245
       Unexposed |        46         151  |         197
-----------------+------------------------+------------
           Total |       115         327  |         442
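The totals (442) show that this is the paired table for patients without disease: the rows are chest pain history and the columns are the exercise stress test, so the specificities are 197/442 (44.6%) for CPH and 327/442 (74.0%) for EST. The McNemar statistic again uses only the two discordant cells, 176 and 46. The Python sketch below computes it; the p value comes from the sketch, not from Stata output.

```python
import math

# Discordant cells of the paired table (patients without disease, N = 442).
# Rows: chest pain history (CPH); columns: exercise stress test (EST).
cph_pos_est_neg = 176
cph_neg_est_pos = 46
n_healthy = 442

n_disc = cph_pos_est_neg + cph_neg_est_pos
chi2 = (cph_pos_est_neg - cph_neg_est_pos) ** 2 / n_disc
p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
# CPH has 130 more false positives than EST, so Sp(EST) - Sp(CPH) = 130 / 442.
sp_diff = (cph_pos_est_neg - cph_neg_est_pos) / n_healthy

print(n_disc, round(chi2, 2), p < 0.001, round(sp_diff, 3))  # 222 76.13 True 0.294
```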
Comparing ROCs
In Stata, the method for comparing two ROC curves, as programmed in the roccomp command, is
described by DeLong et al (1988). You could describe this in your protocol as:
The areas under the receiver operating characteristic (ROC) curves were computed. For
comparisons of the ROC curves from different prediction rules, or prognostic models, using a
common reference standard, the method of DeLong et al (1988) was used.
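For readers who want to see what the DeLong et al (1988) comparison involves, the following pure-Python sketch implements the nonparametric test for two correlated AUCs. Each AUC is the Mann-Whitney statistic, the variance of the difference is estimated from per-patient placement values (DeLong's structural components), and z is referred to the standard normal. It assumes both tests' scores are listed for the same patients in the same order. The function and argument names are hypothetical, and the sketch has not been checked against roccomp, which reports the comparison as a chi-square with 1 df (for two curves, this should equal z squared).

```python
import math


def _psi(x, y):
    """Kernel: 1 if the diseased score x exceeds the non-diseased score y, 0.5 if tied."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def _cov(u, w):
    """Sample covariance with divisor len - 1."""
    mu, mw = sum(u) / len(u), sum(w) / len(w)
    return sum((a - mu) * (b - mw) for a, b in zip(u, w)) / (len(u) - 1)


def delong_compare(pos_a, neg_a, pos_b, neg_b):
    """Nonparametric comparison of two correlated AUCs (DeLong et al, 1988).

    pos_a, pos_b: scores of tests A and B for the same diseased patients, same order.
    neg_a, neg_b: scores of tests A and B for the same non-diseased patients, same order.
    Returns (auc_a, auc_b, z, two-sided p).
    """
    m, n = len(pos_a), len(neg_a)
    if m < 2 or n < 2:
        raise ValueError("need at least two diseased and two non-diseased patients")
    v10, v01 = [], []
    for pos, neg in ((pos_a, neg_a), (pos_b, neg_b)):
        # Placement values (structural components) for each patient.
        v10.append([sum(_psi(x, y) for y in neg) / n for x in pos])
        v01.append([sum(_psi(x, y) for x in pos) / m for y in neg])
    auc_a, auc_b = sum(v10[0]) / m, sum(v10[1]) / m
    # Var(AUC_A - AUC_B) = (S10_AA + S10_BB - 2 S10_AB)/m + (S01_AA + S01_BB - 2 S01_AB)/n
    var_diff = (
        (_cov(v10[0], v10[0]) + _cov(v10[1], v10[1]) - 2 * _cov(v10[0], v10[1])) / m
        + (_cov(v01[0], v01[0]) + _cov(v01[1], v01[1]) - 2 * _cov(v01[0], v01[1])) / n
    )
    if var_diff <= 0:
        raise ValueError("variance of the difference is zero; the test is undefined")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return auc_a, auc_b, z, p


# Hypothetical scores: 4 diseased and 3 non-diseased patients, both tests on each.
auc_a, auc_b, z, p = delong_compare(
    pos_a=[0.9, 0.8, 0.7, 0.4], neg_a=[0.3, 0.5, 0.2],
    pos_b=[0.6, 0.4, 0.9, 0.3], neg_b=[0.5, 0.2, 0.7],
)
print(round(auc_a, 4), round(auc_b, 4), round(z, 3), round(p, 3))
```

The double loop is O(mn) per test, which is fine for illustration; faster rank-based formulations of the same estimator exist for large samples.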
DeLong ER, DeLong DM, Clarke-Pearson DL. (1988). Comparing the areas under two or more
correlated receiver operating characteristic curves: a nonparametric approach. Biometrics
44(3):837-845.
Lachenbruch PA, Lynch C. (1998). Assessing screening tests: extensions of McNemar's test.
Statist Med 17:2207-2217.
Siegel S, Castellan NJ Jr. (1988). Nonparametric Statistics for the Behavioral Sciences. 2nd ed.
New York, McGraw-Hill.
Zhou X-H, Obuchowski NA, McClish DK. (2002). Statistical Methods in Diagnostic Medicine.
New York, John Wiley & Sons.