Learning objectives
When you have finished this chapter, you should be able to:
• Explain the difference between nominal, ordinal, and metric discrete and metric continuous variables.
• Identify the type of a variable.
• Explain the non-numeric nature of ordinal data.
I am using 'measure' in the broadest sense here. We wouldn't measure the sex or the ethnicity of someone, for example. We would instead usually observe it, or ask the person, or get the value from a questionnaire. But we would measure their height or their blood pressure. More on this shortly.
Medical Statistics from Scratch, Second Edition. David Bowers. © 2008 John Wiley & Sons, Ltd
Mr Patel 24 Male O
Ms Manda 20 Female A
Categorical variables
Metric variables
Categorical variables
Nominal categorical variables
Consider the variable blood type. Let's assume for simplicity that there are only four different blood types: O, A, B, and A/B. Suppose we have a group of 100 patients. We can first determine the blood type of each and then allocate the result to one of the four blood type categories. We might end up with a table like Table 1.2.
You will also see metric data referred to as interval/ratio data. The computer package SPSS uses the term scale data.
By the way, a table like Table 1.2 is called a frequency table, or a contingency table. It shows how the number, or frequency, of the different blood types is distributed across the four categories. So 65 patients have blood type O, 15 blood type A, and so on. We'll look at frequency tables in more detail in the next chapter. The variable blood type is a nominal categorical variable. Notice two things about this variable, which are typical of all nominal variables:
• The data do not have any units of measurement.3
• The ordering of the categories is completely arbitrary. In other words, the categories cannot be ordered in any meaningful way.4
In other words, we could just as easily write the blood type categories as A/B, B, O, A or B, O, A, A/B, or B, A, A/B, O, or whatever. We can't say that being in any particular category is better, or shorter, or quicker, or longer, than being in any other category.
3 For example, cm, or seconds, or ccs, or kg, etc.
4 We are excluding trivial arrangements such as alphabetic.
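As a minimal sketch of how a frequency table for a nominal variable could be produced, the following Python snippet counts a small, made-up list of blood types (hypothetical values, not the 100 patients behind Table 1.2) and prints the counts in two different category orders. The variable names are illustrative only.

```python
from collections import Counter

# Hypothetical blood types for ten patients (illustrative only,
# not the data behind Table 1.2).
blood_types = ["O", "A", "O", "B", "O", "A/B", "O", "A", "O", "O"]

# Allocate each patient to a category and count how many fall in each.
frequency = Counter(blood_types)

# Two arbitrary orderings of the same four categories.
order_1 = ["O", "A", "B", "A/B"]
order_2 = ["A/B", "B", "O", "A"]

for order in (order_1, order_2):
    for category in order:
        print(f"{category:<5}{frequency[category]:>3}")
    print()
```

Both print-outs contain exactly the same information, which is the sense in which the ordering of nominal categories is arbitrary.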
Table 1.3 A frequency table showing the (hypothetical) distribution of 90 Glasgow Coma Scale scores
Glasgow Coma Scale score    Number of patients
3                           8
4                           1
5                           6
6                           5
7                           5
8                           7
9                           6
10                          8
11                          8
12                          10
13                          12
14                          9
15                          5
The Glasgow Coma Scale is an ordinal categorical variable. Notice two things about this variable, which are typical of all ordinal variables:
• The data do not have any units of measurement (so the same as for nominal variables).
• The ordering of the categories is not arbitrary as it was with nominal variables. It is now possible to order the categories in a meaningful way. In other words, we can say that a patient in the category 15 has less brain injury than a patient in category 14. Similarly, a patient in the category 14 has less brain injury than a patient in category 13, and so on.
However, there is one additional and very important feature of these scores (or any other set of ordinal scores). Namely, the difference between any pair of adjacent scores is not necessarily the same as the difference between any other pair of adjacent scores. For example, the difference in the degree of brain injury between Glasgow Coma Scale scores of 5 and 6, and scores of 6 and 7, is not necessarily the same. Nor can we say that a patient with a score of, say, 6 has exactly twice the degree of brain injury as a patient with a score of 12.
The direct consequence of this is that ordinal data are not real numbers. They cannot be placed on the number line.5 The reason is, of course, that the Glasgow Coma Scale data, and the data of most other clinical scales, are not properly measured but assessed in some way by the clinician working with the patient.6 This is a characteristic of all ordinal data.
Because ordinal data are not real numbers, it is not appropriate to apply any of the rules of basic arithmetic to this sort of data. You should not add, subtract, multiply or divide ordinal values. This limitation has marked implications for the sorts of analyses we can do with such data, as you will see later in this book.
5 The number line can be visualised as a horizontal line stretching from minus infinity on the left to plus infinity on the right. Any real number, whether negative or positive, decimal or integer (whole number), can be placed somewhere on this line.
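One way to see why arithmetic on ordinal scores is unsafe is to relabel the categories without changing their order. The Python sketch below is illustrative only: both groups of scores and the alternative labels are hypothetical, chosen to show that a comparison of averages can reverse even though the ordering of the categories is untouched.

```python
from statistics import mean

# Two hypothetical groups of Glasgow Coma Scale scores (illustrative only).
group_a = [3, 15]
group_b = [8, 8]

# A hypothetical relabelling of the 13 categories (scores 3 to 15).
# Each new label is larger than the one before, so the order is kept,
# but the spacing between adjacent categories is different.
scores = list(range(3, 16))
new_labels = [0, 10, 20, 30, 40, 50, 51, 52, 53, 54, 55, 56, 57]
relabel = dict(zip(scores, new_labels))

# Averages using the original labels.
mean_a = mean(group_a)
mean_b = mean(group_b)

# Averages using the order-preserving relabelling.
relabelled_mean_a = mean(relabel[s] for s in group_a)
relabelled_mean_b = mean(relabel[s] for s in group_b)

print(mean_a > mean_b)                        # True
print(relabelled_mean_a > relabelled_mean_b)  # False
```

Because only the order of the categories carries information, a conclusion that depends on the spacing between the labels, as the average does here, is not one the data can support.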
Exercise 1.2 Suggest a few more scales with which you may be familiar from your clinical work.
Exercise 1.3 Explain why it wouldn't really make sense to calculate an average Glasgow Coma Scale for a group of head injury patients.
Metric variables
Continuous metric variables
Look at Table 1.4, which shows the weight in kg (rounded to two decimal places) of six individuals.
6 There are some scales that may involve some degree of proper measurement, but these will still produce ordinal values if even one part of the score is determined by a non-measured element.
The variable weight is a metric continuous variable. With metric variables, proper measurement is possible. For example, if we want to know someone's weight, we can use a weighing machine; we don't have to look at the patient and make a guess (which would be approximate), or ask them how heavy they are (very unreliable). Similarly, if we want to know their diastolic blood pressure we can use a sphygmomanometer.7 Guessing, or asking, is not necessary. Because they can be properly measured, these variables produce data that are real numbers, and so can be placed on the number line.
Some common examples of metric continuous variables include: birthweight (g), blood pressure (mmHg), blood cholesterol (μg/ml), waiting time (minutes), body mass index (kg/m²), peak expiratory flow (l per min), and so on. Notice that all of these variables have units of measurement attached to them. This is a characteristic of all metric continuous variables.
In contrast to ordinal values, the difference between any pair of adjacent values is exactly the same. The difference between birthweights of 4000 g and 4001 g is the same as the difference between 4001 g and 4002 g, and so on. This property of real numbers is known as the interval property (and as we have seen, it's not a property possessed by ordinal values). Moreover, a blood cholesterol score, for example, of 8.4 μg/ml is exactly twice a blood cholesterol of 4.2 μg/ml. This property is known as the ratio property (again not shared by ordinal values).8
In summary:
• Metric continuous variables can be properly measured and have units of measurement.
• They produce data that are real numbers (located on the number line).
These properties are in marked contrast to the characteristics of nominal and ordinal variables. Because metric data values are real numbers, you can apply all of the usual mathematical operations to them. This opens up a much wider range of analytical possibilities than is possible with either nominal or ordinal data as you will see.
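The interval and ratio properties can be checked with a few lines of Python, using the birthweight and blood cholesterol figures quoted in the text. This is a sketch for illustration; `math.isclose` is used for the ratio because decimal values are stored only approximately in floating point.

```python
import math

# Birthweights in grams, taken from the example in the text.
birthweights = [4000, 4001, 4002]

# Interval property: the difference between adjacent values is the same.
differences = [b - a for a, b in zip(birthweights, birthweights[1:])]
print(differences)  # [1, 1]

# Ratio property: a blood cholesterol of 8.4 is twice one of 4.2
# (same units for both values).
ratio = 8.4 / 4.2
print(math.isclose(ratio, 2.0))  # True
```

Neither calculation would be meaningful for ordinal scores, where the spacing between adjacent categories is unknown.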
Exercise 1.4 Suggest a few continuous metric variables with which you are familiar. What is the difference between, and consequences of, assessing the value of something and measuring it?
7 We call the device we use to obtain the measured value (e.g. a weighing scale, a sphygmomanometer, or a tape measure) a measuring instrument.
8 It is for these two reasons that metric data is also known as 'interval/ratio' data, but 'metric data' is shorter!
Table 1.5 The number of times that a group of children with asthma used their inhalers in the past 24 hours
Number of times inhaler used in past 24 hours 1 2 6 6 7 8
• Metric discrete variables can be properly counted and have units of measurement: 'numbers of things'.
• They produce data which are real numbers located on the number line.
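As a small illustration, assuming the six values shown in Table 1.5 are the counts for six children, the Python sketch below confirms that the counts are whole numbers and that, being real numbers, they can be summed and averaged.

```python
# Inhaler uses in the past 24 hours, as listed in Table 1.5
# (assumed here to be one count per child).
inhaler_uses = [1, 2, 6, 6, 7, 8]

# Counting produces whole numbers of things...
all_whole = all(isinstance(n, int) for n in inhaler_uses)

# ...but these are still real numbers, so arithmetic is meaningful.
total_uses = sum(inhaler_uses)
mean_uses = total_uses / len(inhaler_uses)

print(all_whole, total_uses, mean_uses)  # True 30 5.0
```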
Exercise 1.5 Suggest a few discrete metric variables with which you are familiar.
Exercise 1.6 What is the difference between a continuous and a discrete metric variable? Somebody shows you a six-pack egg carton. List (a) the possible number of eggs that the carton could contain; (b) the number of possible values for the weight of the empty carton. What do you conclude?
Has the variable got units? (This includes 'numbers of things'.)
  No: Can the data be put in meaningful order?
    No: Categorical nominal
    Yes: Categorical ordinal
  Yes: Do the data come from measuring or counting?
    Counting: Discrete metric
    Measuring: Continuous metric
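The decision steps above can be sketched as a short Python function. The function name and its three yes/no arguments are hypothetical, chosen only to mirror the three questions; the example calls use variables discussed in this chapter.

```python
def variable_type(has_units: bool, can_be_ordered: bool = False,
                  counted: bool = False) -> str:
    """Return the variable type by following the three questions in turn.

    has_units      -- does the variable have units (including 'numbers of things')?
    can_be_ordered -- only used when has_units is False
    counted        -- only used when has_units is True (False means measured)
    """
    if not has_units:
        return "categorical ordinal" if can_be_ordered else "categorical nominal"
    return "discrete metric" if counted else "continuous metric"


print(variable_type(has_units=False, can_be_ordered=False))  # blood type
print(variable_type(has_units=False, can_be_ordered=True))   # Glasgow Coma Scale score
print(variable_type(has_units=True, counted=True))           # number of times inhaler used
print(variable_type(has_units=True, counted=False))          # weight (kg)
```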
Exercise 1.7 Four migraine patients are asked to assess the severity of their migraine pain one hour after the first symptoms of an attack, by marking a point on a horizontal line, 100 mm long. The line is marked 'No pain' at the left-hand end, and 'Worst possible pain' at the right-hand end. The distance of each patient's mark from the left-hand end is subsequently measured with a mm rule, and their scores are 25 mm, 44 mm, 68 mm and 85 mm. What sort of data is this? Can you calculate the average pain of these four patients? Note that this form of measurement (using a line and getting subjects to mark it) is known as a visual analogue scale (VAS).
Exercise 1.8 Table 1.6 contains the characteristics of cases and controls from a case-control study9 into stressful life events and breast cancer in women (Protheroe et al. 1999). Identify the type of each variable in the table.
Exercise 1.9 Table 1.7 is from a cross-section study to determine the incidence of pregnancy-related venous thromboembolic events and their relationship to selected risk factors, such as maternal age, parity, smoking, and so on (Lindqvist et al. 1999). Identify the type of each variable in the table.
Exercise 1.10 Table 1.8 is from a study to compare two lotions, Malathion and d-phenothrin, in the treatment of head lice (Chosidow et al. 1994). In 193 schoolchildren, 95 children were given Malathion and 98 d-phenothrin. Identify the type of each variable in the table.
At the end of each chapter you should look again at the learning objectives and satisfy yourself that you have achieved them.
9 Don't worry about the different types of study; I will discuss them in detail in Chapter 6.
Table 1.6 Characteristics of cases and controls from a case-control study into stressful life events and breast cancer in women. Values are mean (SD) unless stated otherwise. Reproduced from BMJ, 319, 1027–30, courtesy of BMJ Publishing Group
Variable                                          Breast cancer group (n = 106)   Control group (n = 226)   P value
Age                                               61.6 (10.9)                     51.0 (8.5)                0.000
Social class (%):                                                                                           0.094
  I                                               10 (10)                         20 (9)
  II                                              38 (36)                         82 (36)
  III non-manual                                  28 (26)                         72 (32)
  III manual                                      13 (12)                         24 (11)
  IV                                              11 (10)                         21 (9)
  V                                               3 (3)                           2 (1)
  VI                                              3 (3)                           4 (2)
No of children (%):                                                                                         0.97
  0                                               15 (14)                         31 (14)
  1                                               16 (15)                         31 (13.7)
  2                                               42 (40)                         84 (37)
  ≥3                                              32 (31)                         80 (35)
Age at birth of first child                       21.3 (5.6)                      20.5 (4.3)                0.500
Age at menarche                                   12.8 (1.4)                      13.0 (1.6)                0.200
Menopausal state (%):                                                                                       0.000
  Premenopausal                                   14 (13)                         66 (29)
  Perimenopausal                                  9 (9)                           43 (19)
  Postmenopausal                                  83 (78)                         117 (52)
Age at menopause                                  47.7 (4.5)                      45.6 (5.2)                0.001
Lifetime use of oral contraceptives (%)           38                              61                        0.000
No of years taking oral contraceptives            3.0 (5.4)                       4.2 (5.0)                 0.065
No of months breastfeeding                        (n = 90) 7.4 (9.9)              (n = 195) 7.4 (12.1)      0.990
Lifetime use of hormone replacement therapy (%)   29 (27)                         78 (35)                   0.193
Mean years of hormone replacement therapy         1.6 (3.7)                       1.9 (4.0)                 0.460
Family history of ovarian cancer (%)              8 (8)                           10 (4)                    0.241
History of benign breast disease (%)              15 (15)                         105 (47)                  0.000
Family history of breast cancer (%)               16 (15)                         35 (16)                   0.997
Units of alcohol/week (%):                                                                                  0.927
  0                                               38 (36)                         59 (26)
  0–4                                             26 (25)                         71 (31)
  5–9                                             20 (19)                         52 (23)
  ≥10                                             22 (21)                         44 (20)
No of cigarettes/day:                                                                                       0.383
  0                                               83 (78.3)                       170 (75.2)
  1–9                                             8 (7.6)                         14 (6.2)
  ≥10                                             15 (14.2)                       42 (18.6)
Body mass index (kg/m²)                           26.8 (5.5)                      24.8 (4.2)                0.001
Two-sample t test. Data for one case missing. χ² test for trend.
Table 1.7 Patient characteristics from a cross-section study of thrombotic risk during pregnancy. Reproduced with permission from Elsevier (Obstetrics and Gynaecology, 1999, Vol. 94, pages 595–599).
Characteristic                        Thrombosis cases (n = 608)  Controls (n = 114,940)  OR    95% CI
Maternal age (y) (classification 1)
  ≤19                                 26 (4.3)                    2817 (2.5)              1.9   1.3, 2.9
  20–24                               125 (20.6)                  23,006 (20.0)           1.1   0.9, 1.4
  25–29                               216 (35.5)                  44,763 (38.9)           1.0   Reference
  30–34                               151 (24.8)                  30,135 (26.2)           1.0   0.8, 1.3
  ≥35                                 90 (14.8)                   14,219 (12.4)           1.3   1.0, 1.7
Maternal age (y) (classification 2)
  ≤19                                 26 (4.3)                    2817 (2.5)              1.8   1.2, 2.7
  20–34                               492 (80.9)                  97,904 (85.2)           1.0   Reference
  ≥35                                 90 (14.8)                   14,219 (12.4)           1.3   1.0, 1.6
Parity
  Para 0                              304 (50.0)                  47,425 (41.3)           1.8   1.5, 2.2
  Para 1                              142 (23.4)                  40,734 (35.4)           1.0   Reference
  Para 2                              93 (15.3)                   18,113 (15.8)           1.5   1.1, 1.9
  Para ≥3                             69 (11.3)                   8429 (7.3)              2.4   1.8, 3.1
  Missing data                        0 (0)                       239 (0.2)
No. of cigarettes daily
  0                                   423 (69.6)                  87,408 (76.0)
  1–9                                 80 (13.2)                   14,295 (12.4)
  ≥10                                 57 (9.4)                    8177 (7.1)
  Missing data                        48 (7.9)                    5060 (4.4)
Multiple pregnancy
  No                                  593 (97.5)                  113,330 (98.6)
  Yes                                 15 (2.5)                    1610 (1.4)
Preeclampsia
  No                                  562 (92.4)                  111,788 (97.3)
  Yes                                 46 (7.6)                    3152 (2.7)
Cesarean delivery
  No                                  420 (69.1)                  102,181 (88.9)
  Yes                                 188 (30.9)                  12,759 (11.1)
OR = odds ratio; CI = confidence interval. Data presented as n (%).