I. Sampling
1. General Info
a. Impossible to obtain data from every member of a population
i. Need to take representative sample of underlying population
1. then draw inferences about the population from the sample.
b. Sample always involves an element of random variation/error.
c. Sampling statistics are essentially about characterizing the nature and magnitude of this random error.
2. Random Error
a. Def. the variation that is due to chance
i. Inherent feature of sampling, statistical inference, and measurement of biological phenomena
1. Ex. blood pressure measurement
b. Statistics as a field concerned w/ random error
i. Thus, a significant p-value or a precise confidence interval CANNOT tell you if the underlying data is accurate/unbiased
1. P-value/CI: purely statistical; they give no info on biases
2. The further the point estimate is from the null, the smaller the p-value (for a given sample size and variability)
c. Not as important as systematic error
3. Systematic Error
a. Def. any process that acts to distort data or findings from their true value.
b. More important than random error.
i. It can be removed by better processes
c. Can be seen as selection bias, measurement bias, or confounding bias
4. Statistical Inference
a. Def. the process whereby one draws conclusions regarding a population from the results observed in a
sample taken from that population
b. Types
i. Estimation: estimating the specific value of a parameter
1. Used with confidence intervals
ii. Hypothesis Testing: making a decision about a hypothesized value of a parameter
1. General Info
a. There is inherent variability in data
b. All variation is additive
i. The net observed variation is the result of all the individual sources of variation
c. Two main categories (biological and measurement)
2. Biological Variation
a. Def. variation in the actual entity being measured
b. Outside of the science stuff, can be subdivided
i. Variation within a person (intra-person)
1. Your BP changes as a result of stimuli (time of day, posture, emotions)
ii. Variation between people (inter-person)
c. Without it, there would be nothing for epidemiologists to measure
i. The presence of biological variation is sine qua non
d. Net effect: it adds to the level of random error in any measurement process
i. Can be reduced with repeated measurements
3. Measurement Variation
a. Def. variation due to the measurement process
b. Causes
i. Instrument error (inaccuracy in the instrument)
Husaini 1 of 40
ii. Operator error (inaccuracy in the person operating the test)
c. Can introduce BOTH random and systematic error
i. Systematic differences are why different laboratories establish their own reference ranges
d. Types
i. Inter-observer variability: different observers reading the same test
ii. Intra-observer variability: the same person observing the test at different times
e. Net effect: the use of specific operational standards can reduce the impact of measurement bias
1. Validity
a. Def. the degree to which a measurement process tends to measure what it is intended to measure
i. It is the accuracy
ii. A valid instrument/test is free of any systematic error/bias
1. Will be close to the underlying true value
b. Can be determined by comparing to an accepted gold standard
c. When no gold standards exist, we measure some specific phenomena or construct
i. Constructs are then used to develop a clinical scale which can then be used to measure the
phenomenon in practice.
d. Types of validity
i. Content validity: the scale includes all the dimensions to be measured
1. If measuring pain, you would include questions on aching, throbbing, burning (but not itching, nausea, tingling)
ii. Construct validity: the scale correlates with other known measures
1. A scale for depression includes questions related to it, such as those about fatigue and headache
iii. Criterion validity: the scale predicts a directly observable phenomenon
1. Ex. checking whether responses on a pain scale bear a predictable relationship to pain of known severity
e. For dichotomous data, validity is usually expressed in terms of sensitivity and specificity
f. For continuous data, can use mean, SD, correlation, and regression analysis
2. Reliability
a. Def. the extent that repeated measures of a phenomenon tend to yield the same results regardless of the
correctness
i. It is the reproducibility
ii. No comparison to a reference or gold standard
iii. Refers to the lack of random error
b. Classified as intra-observer variability or inter-observer variability
c. If there is no direct observation, reliability can be assessed with the test-retest method
i. Respondents answer the same question at two different times.
ii. Measures a form of intra-person reliability
d. Type of data measured dictates the exact statistical approach
i. Categorical data: Kappa
ii. Interval data: intra-class correlation
Validity = get what you're supposed to (ask family/neighbors/gold standard)
Reliability = always get the same result (test-retest)
IV. Statistical aspects of variability
1. General Info
a. Measures of variation
i. Basically measures of dispersion
1. variance (σ²), SD (σ), and range
b. Measures of agreement
i. Correlation (r) and Kappa
2. Standard Deviation (σ)
a. Def. a measure of the typical distance of individual values from the mean. It is calculated by taking the square root of the variance (the average squared difference from the mean)
1. Mean ± 1 SD covers about 68% of total observations (normal distribution)
2. Mean ± 2 SD covers about 95% of total observations
a. Values more than 2 SD away from the mean are often considered abnormal
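The variance-to-SD relationship above can be checked numerically. This is a minimal sketch, assuming hypothetical blood pressure readings and the population form of the variance (dividing by n; the sample form divides by n - 1).

```python
import math
import statistics

def variance_and_sd(values):
    """Return the population variance and its square root, the SD."""
    mean = sum(values) / len(values)
    # Variance: average squared difference of each value from the mean.
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return variance, math.sqrt(variance)

# Hypothetical systolic blood pressure readings (mmHg), for illustration only.
readings = [118, 122, 125, 130, 135]
variance, sd = variance_and_sd(readings)
print(f"variance = {variance:.1f}, SD = {sd:.2f}")
# Cross-check against the standard library's population SD.
print(f"statistics.pstdev = {statistics.pstdev(readings):.2f}")
```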
3. Correlation (r)
a. Def. the correlation coefficient expresses the reliability of a continuous measurement (interval data)
i. It measures the strength of the linear relationship between two continuous variables
b. Ranges from -1 to 1 (zero is no correlation)
c. Takeaway
i. If one set of values comes from the actual/true values, then correlation is a test of validity
ii. In most cases, correlation assesses reliability
d. It is possible to have high r values, yet have little direct agreement between observers
i. A perfect r (1.0) can be obtained if Lab A results are always exactly 10 mg/dl higher than those of Lab B.
e. It is also often used in test-retest studies
i. Used for intra-rater or intra-person variability
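The Lab A / Lab B point can be illustrated with a short calculation. This is a sketch with hypothetical lab values; `pearson_r` is an illustrative helper, not something taken from the notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical results (mg/dl); Lab A always reads exactly 10 higher than Lab B.
lab_b = [80, 95, 110, 150, 200]
lab_a = [value + 10 for value in lab_b]

r = pearson_r(lab_a, lab_b)
exact_matches = sum(a == b for a, b in zip(lab_a, lab_b))
print(f"r = {r:.3f}, exact agreements = {exact_matches} of {len(lab_b)}")
```

The correlation is perfect even though the two labs never report the same number, which is why r alone does not demonstrate agreement.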
4. Kappa
a. Def. a measure of reliability for categorical/qualitative data
i. It corrects for the degree of chance in the overall level of agreement
ii. It tells us how much of the possible agreement over and above chance the reviewers have achieved
b. The ability of kappa to adjust for chance agreement is important clinically
i. The prevalence of a particular condition being evaluated affects the likelihood that observers will
agree purely due to chance
1. Even if two people have no idea what they are doing, there will be substantial agreement
by chance alone
2. The magnitude of the agreement by chance increases as the proportion of positive (or
negative) assessments increases.
3. Ex. two people each repeatedly toss a coin
a. Four possible outcomes (HH, TT, TH, HT)
i. Agreement 50% of the time due to chance
ii. Thus, only the agreement above the 50% is what we care about.
ii. If the prevalence of the attribute is either high or low, then the overall percent agreement will also be high. In other words, if something is obviously right or obviously wrong, people are more likely to agree as such.
1. ↑ prevalence: ↑ overall agreement
2. ↓ prevalence: ↑ overall agreement
3. Intermediate prevalence: ↓ overall agreement
c. Kappa ranges from -1 to 1
i. Negative: agreement is worse than chance
ii. Zero: agreement is no better than chance
iii. Positive: the amount of agreement above chance
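The chance correction can be made concrete with Cohen's kappa for two raters and a yes/no rating. This is a minimal sketch under that assumption; the counts are hypothetical and the function name is illustrative.

```python
def cohen_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Cohen's kappa for two raters classifying the same subjects as yes/no."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / n
    rater_a_yes = (both_yes + a_yes_b_no) / n
    rater_b_yes = (both_yes + a_no_b_yes) / n
    # Agreement expected by chance if the raters were independent.
    expected = rater_a_yes * rater_b_yes + (1 - rater_a_yes) * (1 - rater_b_yes)
    # Undefined when expected agreement is exactly 1 (everyone rated the same way).
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of 100 subjects: 85% observed agreement, 50% expected by chance.
print(cohen_kappa(40, 10, 5, 45))
# Coin-toss raters: 50% agreement, all of it due to chance.
print(cohen_kappa(25, 25, 25, 25))
```

In the first case the raters achieve 0.35 of the 0.50 agreement available beyond chance, so kappa is about 0.7; in the coin-toss case kappa is 0.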
V. Types of Data
1. Categorical
a. Nominal: no order
i. Alive vs. dead, male vs. female, blood type (A, B, AB, O)
b. Ordinal: in a natural order, but not equally spaced
i. 1st/2nd/3rd degree burn, pain scale for migraines (none, mild, moderate, severe), Glasgow Coma Scale
1. General Info
a. The normal (Gaussian) distribution is the bell-shaped curve
i. The mean, median, and mode are all equal
b. Two ways to summarize a distribution
i. Central tendency: mean, median, mode
ii. Dispersion: standard deviation
2. Central Tendency
a. Mean: pulled by outliers
i. The center of gravity of the distribution
b. Median: best when values are skewed
i. Will be between the mode and the mean when the data is skewed
c. Mode: least sensitive to skewed data
i. The most frequent value (the peak of the distribution)
3. Dispersion
a. Standard deviation: used for normal (or near normal) distributions
i. Mean ± 1 SD = about 2/3 of the observations
ii. Mean ± 2 SD = about 95% of the observations
VII. Misc.
1. Abnormality
a. Abnormality depends on the population and their respective distribution
i. The cut-off will differ b/w populations
b. Best definition of abnormality
i. Being unusual: greater than 2 SD from the mean
ii. Sick: observation regularly associated with disease
1. Most common definition
iii. Treatable: only considered abnormal if treatment leads to improved outcome
2. Sub-group sampling
a. May need to obtain a larger sample from important subgroups and select subjects at random within subgroup
3. Regression to the mean: some outliers may be due to random error; retesting them will tend to move them closer to the mean.
1. General Info
a. Uncertainty can be characterized as
i. Qualitatively: unlikely, possible, suspected, etc.
ii. Quantitatively: probability and odds
1. Can be converted back and forth
b. Often used to quantify a physician's opinion
c. Quantitative odds can force one to be more exact than is justified
2. Probability
a. Expresses uncertainty explicitly
b. Numerical value between 0 and 1
c. Calculated as a proportion: P = a/b
i. a = number of events
ii. b = the total number at risk
3. Odds
a. The ratio of the probability of the event occurring over the probability of the event not occurring
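Probability and odds can be converted back and forth with two one-line formulas. This is a small sketch; the function names are illustrative.

```python
def probability_to_odds(p):
    """Odds = probability of the event / probability of no event."""
    return p / (1 - p)

def odds_to_probability(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)

# A probability of 0.20 corresponds to odds of 0.25 (1 to 4).
print(probability_to_odds(0.20))
# Odds of 3 (3 to 1) correspond to a probability of 0.75.
print(odds_to_probability(3))
```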
1. Ratios
a. Expressed as (A/B) where A is NOT a part of B
b. In other words, A & B are mutually exclusive frequencies
c. Ex. the ratio of Black to white students in a school was 15/300, or 1:20
2. Proportion
a. Expressed as (A/B) where A is INCLUDED in B
b. Based on a fraction in which the numerator (frequency of disease or condition) is included in the denominator
(the population)
c. Ex. the proportion of Black students in the school was 15/315, or 4.8%
3. Rates
a. Special types of proportions that are evaluated over a specified time period
i. Express the relationship b/w an event (e.g. disease) during a given time period and a defined
population at risk over the same time period
b. Must have a population-at-risk and a specific time period
1. Prevalence
a. The proportion of the number of cases observed compared to the population at risk at a given point in time.
i. no time dimension
ii. Is the pretest probability
b. Refers to all cases of disease observed at a given moment
c. Is a function of both the incidence rate and the mean duration of the disease in the population
i. Ex. Arthritis: no cure, so there is a long duration. Thus, the prevalence is high (for a given incidence rate)
ii. Ex. Rabies: lethal disease, so the duration is very short. Thus, the prevalence is very low (for a given incidence rate)
d. Conveys the disease burden; it is the preferred measure in epidemiological studies of disease burden
Prevalence = burden
Incidence = risk
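The dependence of prevalence on incidence and duration can be sketched with the common steady-state approximation, prevalence ≈ incidence rate × mean duration. That approximation is an assumption added here (it works best when prevalence is low), and all numbers are hypothetical.

```python
def approximate_prevalence(incidence_per_person_year, mean_duration_years):
    """Steady-state approximation: prevalence ~ incidence rate x mean duration."""
    return incidence_per_person_year * mean_duration_years

incidence = 0.002  # hypothetical: 2 new cases per 1,000 person-years

# Same incidence, very different durations (a chronic vs. a rapidly fatal disease).
chronic = approximate_prevalence(incidence, mean_duration_years=20)
short_lived = approximate_prevalence(incidence, mean_duration_years=0.05)
print(f"chronic: {chronic:.4f}, short-lived: {short_lived:.6f}")
```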
ii. If the 5-year CIR of a disease is 10%, then you have a 10% chance of developing the disease over the next five years.
d. Range of 0 to 1 and must have a reference to time
i. Thus, it must increase with time
1. ↑ time: ↑ CIR
e. It is the event rate in the context of randomized trials
i. Control event rate (CER) for the baseline/control group
ii. Experimental event rate (EER) for the treatment group
f. Case-Fatality Rate (CFR): proportion of affected individuals that die from the disease
i. CFR = die/affected; thus, we need the number of affected in the denominator
1. This contrasts with the mortality rate, in which the denominator is the entire population
ii. Associated with the seriousness and/or virulence of the disease
1. ↑ CFR: more virulent the disease
iii. Best measure of the lethality of the condition
g. Attack Rate: number of people affected divided by the number at risk
i. Used as a measure of morbidity (illness) in outbreak investigations
CFR = deaths/cases
Mortality = deaths/population
Incidence rate = cases/population
1. Why CFR is the best measure for the lethality of a condition
2. CFRs can be similar between two populations when intuition tells you otherwise (aka, once you have it, the outlook is the same; it's just a matter of time)
a. In other words, CFR doesn't tell you true mortality or incidence, as you don't know who is going to get the disease or how often. What CFR does tell you is how lethal the disease is once you get it: those who died divided by those who have it.
b. For example, acute myocardial infarction in males and females: males (10% CFR) and females (12% CFR), while mortality was 110/100,000 person-years for males and 35/100,000 person-years for females.
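The myocardial infarction example can be unpacked with a rough calculation. This sketch assumes mortality rate ≈ incidence rate × CFR (a simplification not stated in the notes) and reads the female mortality figure as 35 per 100,000 person-years.

```python
def implied_incidence_per_100k(mortality_per_100k, cfr):
    """Back out the incidence rate from mortality ~ incidence x CFR (assumed)."""
    return mortality_per_100k / cfr

# Figures from the example above; the female rate is read as 35 per 100,000.
male_incidence = implied_incidence_per_100k(110, 0.10)
female_incidence = implied_incidence_per_100k(35, 0.12)
print(f"males: {male_incidence:.0f}, females: {female_incidence:.0f} per 100,000 person-years")
```

Under these assumptions the similar CFRs hide a large difference in how often the disease occurs, which is what drives the difference in mortality.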
1. General Idea
a. In clinical studies, it is common to calculate the risk/CIR of an event in different populations
i. By taking the ratio or difference between these two measures, we can calculate two fundamental
measures of effect
1. The relative risk
2. The absolute risk
b. The risk in the control group = baseline risk
2. 10-30% = moderate treatment effect
3. >30% = large treatment effect
ii. A RRR of 38% would be interpreted as the death rate being 38% lower after the new treatment
compared to the old treatment
iii. It represents the proportion of the original baseline (control) risk that is removed by the
treatment.
b. It is nothing more than a re-expression of the RR (hence they add to 1)
c. Commonly used in the context of RCTs
d. More clinically important; more direct meaning
i. It indicates by how much, in relative terms, the event rate is decreased by the treatment
ii. RRR × Rc = (Rc - RT), i.e. RRR = (Rc - RT) / Rc
RRR = how much risk was removed
Measure: example interpretation
RR: "The risk of outcome is X times lower (or higher) in T compared to C"
RRR: "The outcome is X% lower in T compared to C"
ARR: "The outcome is (C - T) lower in T compared to C"
e. Use NNT and NNH in concert with each other to make decisions
i. Will describe in absolute terms the trade off in both benefits and harm
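The relative and absolute measures discussed in this section can all be derived from the control and experimental event rates. This is a sketch using the standard definitions (RR = EER/CER, ARR = CER - EER, RRR = ARR/CER, NNT = 1/ARR); the event rates are hypothetical, chosen to give the 38% RRR mentioned earlier.

```python
def effect_measures(cer, eer):
    """Effect measures from the control (CER) and experimental (EER) event rates."""
    arr = cer - eer            # absolute risk reduction
    return {
        "RR": eer / cer,       # relative risk
        "RRR": arr / cer,      # relative risk reduction (equals 1 - RR)
        "ARR": arr,
        "NNT": 1 / arr,        # number needed to treat (requires ARR > 0)
    }

# Hypothetical trial: 20% event rate on control, 12.4% on treatment.
measures = effect_measures(cer=0.20, eer=0.124)
for name, value in measures.items():
    print(f"{name} = {value:.3f}")
```

The NNT of about 13.2 would conventionally be rounded up, to 14 patients here.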
1. General Info
a. In order to understand the impact of a risk factor on the incidence of disease in a population, we need to
know
i. The relative effect of the risk factor
ii. The prevalence of the risk factor in the population
b. In order to quantify the impact of the risk factors, we have the implicit assumption that the risk factor is a
cause of the disease.
c. PAR and PARF indicate the potential public health significance of a risk factor
i. Risk factor with big effect (RR =10) but is rare (P = 0.01%) has a PARF of 1%
ii. Risk factor with small effect (RR=2) but is common (P=40%) has a PARF of 44%.
i. The odds of death due to lung cancer were 15.6 times higher in smokers compared to non-smokers
ii. Odds ratio is symmetrical: if you calculated the odds of disease among the exposed (a/b) and divided it by the odds of disease among the non-exposed (c/d), you would get the same odds ratio.
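The symmetry can be verified on a 2x2 table. This sketch uses hypothetical counts and assumes the layout a = exposed with disease, b = exposed without disease, c = non-exposed with disease, d = non-exposed without disease.

```python
def disease_odds_ratio(a, b, c, d):
    """Odds of disease in the exposed (a/b) over odds in the non-exposed (c/d)."""
    return (a / b) / (c / d)

def exposure_odds_ratio(a, b, c, d):
    """Odds of exposure in the diseased (a/c) over odds in the non-diseased (b/d)."""
    return (a / c) / (b / d)

# Hypothetical counts, for illustration only.
a, b, c, d = 30, 70, 10, 90
print(disease_odds_ratio(a, b, c, d))
print(exposure_odds_ratio(a, b, c, d))
print((a * d) / (b * c))  # cross-product form, the same value
```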
1. General Overview
a. Concerned with making a decision about the value of an unknown parameter
b. Views experimentation as a decision making process
c. Null Hypothesis (Ho): no difference in the groups being compared with respect to the measured quantity of interest
d. Alternative Hypothesis (HA): the groups being compared are different
i. can be specified for direction (one-sided alternative) instead of any difference (two-sided alternative)
ii. if a difference is found, it is called the treatment effect
iii. we can never prove HA is true, we can only reject Ho
e. Process of testing null hypothesis consists of calculating the probability of obtaining the results observed
assuming the null hypothesis is true
i. This probability is known as the p-value
1. It is the probability of observing a test statistic at least as large as the one observed, under the assumption that the null hypothesis is true
2. P = a result this extreme would be seen P% of the time assuming the null hypothesis is true
3. The p-value is NOT the probability that Ho is true
f. Alpha (α): the significance level
i. By convention, set to 5%; can be altered to suit the researcher's needs.
g. If the p-value is less than alpha, the null hypothesis is rejected (results this extreme would be too unlikely if Ho were true).
3. The T-test
a. Tests the difference in means between two groups using continuous data, assuming the data is normally distributed
b. Larger values of t result in smaller p-values, which are more consistent with Ho being false
i. Numerator: the difference in means; larger differences result in larger t values
1. ↑ difference: ↑ t: ↓ P (more consistent with Ho being false)
ii. Denominator: a measure of the standard error of the difference
1. As sample size increases, the denominator decreases, and t increases
a. ↑ sample size: ↑ t: ↓ P (more consistent with Ho being false)
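The roles of the numerator and denominator can be seen in a simplified t statistic. This sketch assumes two groups of equal size with a common SD, so the standard error of the difference is SD × sqrt(2/n); the inputs are hypothetical.

```python
import math

def t_statistic(mean_difference, sd, n_per_group):
    """t for two equal-sized groups with a common SD (simplifying assumption)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return mean_difference / standard_error

print(t_statistic(5, 10, 8))    # baseline
print(t_statistic(10, 10, 8))   # larger difference in means, larger t
print(t_statistic(5, 10, 32))   # larger sample, smaller standard error, larger t
```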
3. This makes sense b/c every p-value under this 5% will be deemed significant and the null hypothesis will be rejected, even though (under the FP premise) no real difference exists. In other words, when Ho is true, the p-value falls below alpha (0.05) about 5 times out of 100, and each of those is a Type I error.
Type II error (FN) = beta
b. Type II (FN) error
i. Occurs when we determine a difference does not exist when in fact it does
1. When we fail to reject Ho even though Ho is false
ii. A statistically non-significant p-value is obtained
1. Even though there is a difference between the groups being compared
iii. The rate at which false negatives occur is beta (β)
1. Also known as the Type II error rate
2. Sample size estimates are based on setting beta at either 20% or as low as 10%
3. With beta at 20%, a real difference would be missed 20% of the time
iv. For smaller studies, the probability of a Type II error is a lot higher
c. α & β have an inverse relationship
i. As one increases, the other decreases (for a given sample size)
5. Power and Sample Size
Power = (1 - β) = sensitivity = probability of correctly rejecting Ho when Ho is false
a. Power
i. The complement of the Type II error rate: power = (1 - β)
ii. The probability of correctly rejecting Ho when Ho is false
1. The probability of the study finding a difference when a difference truly exists.
iii. Most studies are designed to have power = 0.8 or greater
iv. Power is analogous to sensitivity
v. Easiest way to increase power = ↑ sample size
b. 4 parameters
i. α (FP) error rate
1. a smaller alpha increases beta, which would lower the power of the study, making it harder to identify a real difference
a. a low α: more stringent test: harder to prove a difference exists (harder to reject Ho)
i. More likely to get a Type II error (β)
2. as ↓ α: ↑ β: ↓ power
ii. β (FN) error rate
1. the smaller the beta, the easier it will be to identify a difference
a. this can be accomplished by increasing the sample size or increasing α
2. ↓ β: easier to find a difference (↑ power)
iii. Effect Size
1. The magnitude of the treatment difference you are trying to detect
a. Bigger differences are easier to detect than smaller differences
b. Size does matter
2. The study will be powered to find the minimal clinically important difference (the smallest difference b/w 2 treatments that would be clinically beneficial)
iv. The variability of the data
1. The greater the variability in the data, the harder it will be to detect a difference
a. ↑ variability: ↓ power
2. It is harder to detect the true signal when there is a lot of noise to contend with
3. Also true with rare events (death, relapse in follow-up study)
a. rare outcome: ↓ power
c. Problem with low power studies
i. It is difficult to interpret negative results
1. No effect? Or was there a failure to detect a true effect b/c of too few subjects or outcomes?
2. Low power studies also give imprecise estimates (wide CIs)
ii. Low power studies: ↑ Type II errors (β)
iii. Low power studies: no effect on Type I errors (α)
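The four parameters can be tied together with an approximate power calculation. This is a sketch for a two-sided comparison of two means using the normal approximation, assuming equal group sizes and a common SD; it ignores the negligible chance of rejecting in the wrong direction, and all inputs are hypothetical.

```python
from statistics import NormalDist

def approximate_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power for a two-sided, two-group comparison of means."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    standard_error = sd * (2 / n_per_group) ** 0.5
    return std_normal.cdf(abs(effect) / standard_error - z_critical)

baseline = approximate_power(effect=5, sd=10, n_per_group=63)
print(f"baseline power          = {baseline:.2f}")
print(f"smaller sample (n=20)   = {approximate_power(5, 10, 20):.2f}")
print(f"stricter alpha (0.01)   = {approximate_power(5, 10, 63, alpha=0.01):.2f}")
print(f"smaller effect (2)      = {approximate_power(2, 10, 63):.2f}")
print(f"more variability (SD 20) = {approximate_power(5, 20, 63):.2f}")
```

Each change from the baseline (smaller sample, stricter α, smaller effect, noisier data) lowers the power, matching the directions listed above.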
1. In a nutshell
a. Approaches statistical inference as a measurement exercise
i. Estimating the specific value and the precision with which it is measured
b. Same info as p-value; however also gives
i. Size of treatment difference
ii. The precision of the estimated difference
iii. Information that aids in interpreting a negative result
3. Clinical Relevance
a. Clinicians should only adjust their practices if there is a treatment difference and that difference is large enough to be clinically important.
b. With wide confidence intervals, clinicians can determine what they think is clinically important and then reach
conclusions appropriate for their practice.
1. When two identical groups of patients are compared, there is a chance (α) that a statistically significant p-value will be obtained (type I error)
a. When multiple comparisons are performed, the risk of one or more false-positive p-values increases
i. If you choose enough outcomes, you will eventually get a result that is statistically significant
2. Bonferroni Correction
a. Method for reducing the overall Type I error risk when making multiple comparisons
b. Divide the overall type I error risk desired (0.05) by the number of comparisons; the new value is the α for each individual test
c. Controls the type I error risk, but reduces power (i.e., ↑ type II error risk)
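The inflation of false positives and the effect of the correction can be computed directly. This is a sketch assuming the comparisons are independent, in which case the chance of at least one false positive across k tests is 1 - (1 - α)^k; the function names are illustrative.

```python
def familywise_error(alpha_per_test, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha_per_test) ** n_tests

def bonferroni_alpha(overall_alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return overall_alpha / n_tests

n_tests = 10
uncorrected = familywise_error(0.05, n_tests)
per_test = bonferroni_alpha(0.05, n_tests)
corrected = familywise_error(per_test, n_tests)
print(f"uncorrected risk = {uncorrected:.3f}")
print(f"per-test alpha   = {per_test:.4f}")
print(f"corrected risk   = {corrected:.3f}")
```

With 10 comparisons at α = 0.05 each, the overall false-positive risk is about 40%; testing each at 0.005 brings it back to just under 5%.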
1. Hypothetico-deductive reasoning
a. Diagnostic strategy that nearly all clinicians use most of the time
b. Steps
i. Formulate hypotheses for the patient's primary problem
ii. First consider explanations that are most likely and/or those that are particularly harmful to miss
1. Simultaneously rule out those that would be particularly harmful or catastrophic and try to rule in those that are considered to be most likely.
iii. Continue until the list is shortened and/or a candidate disease has very high likelihood (>90%)
c. The list of possibilities is reduced by considering the evidence for and against each, discarding those which are very unlikely and conducting further tests to increase the likelihood of the most plausible candidates
1. Se & Sp Overview
a. Due to inherent variability in biological systems
i. FPs and FNs will always occur
b. Interpretation of diagnostic results is essentially concerned with comparing the relative frequencies of the
incorrect results (FN/FP) to the correct results (TP/TN)
c. As tests normally have a continuum of values, positive and negative results are separated by a cut-off point that differentiates between normal and abnormal
i. To the extent that the two populations (see figure) have similar measurements, the test will not be able to discriminate between them
1. Degree of overlap = measure of the test's effectiveness
a. Sp and Se quantify this
ii. ↑ overlap: ↓ discriminatory power of the test
d. The presence or absence of a disease must be determined by a gold standard
2. Sensitivity
a. Defined as, the proportion of individuals with disease that have a positive test
result or the ability of a test to detect a disease when it is present
i. The true-positive rate
1. Test positives divided by total disease positives
ii. Calculated from diseased individuals
iii. The conditional probability of being test positive given the disease is present
1. Se = P (T+|D+)
2. When the disease is present, the test will be positive
b. The more sensitive a test, the better the NPV
c. A perfectly sensitive test
i. Test recognizes all actual positives; it rarely misses
ii. ↓ Type II error (FN): we won't miss the disease
1. Negative results rule out disease; should be reassuring
2. All negatives must be TNs
3. No FNs
iii. Does not tell you if disease is present
1. Test gives no information on false positives
iv. Three scenarios when high sensitivity tests should be selected
1. Early stages of work-up when a large # of potential diseases are being considered
a. (-) result rules out that disease; helps narrow down choices
2. When there is an important penalty for missing the disease
a. TB, syphilis, etc.
b. ↓ FNs: since they are treatable, we want to make sure we don't miss them
3. Probability of disease is relatively low (low prevalence)
a. Purpose is to discover asymptomatic disease
Summary: all diseased patients test (+); no FN results; all test (-) patients are disease free; a sizable portion of D- may test positive
Patients with this | will have this test result | X% of the time (indicates Se)
Duodenal ulcer | History of ulcer, 50+ years, pain relieved by eating or pain after eating | 95%
Favorable prognosis following non-traumatic coma | Positive corneal reflex | 92%
↑ intracranial pressure | Absence of spont. pulsation of retinal veins | 100%
DVT | (+) D-dimer | 89%
Pancreatic cancer | (+) ERCP | 95%
c. The more specific a test, the better the PPV
d. A perfectly specific test
i. Test recognizes all actual negatives; confirms health
ii. ↓ Type I error (FP): if the test is positive, you have it
1. Positive results rule in disease; truly bad news
2. All positives must be TPs
iii. Does not tell you if disease is absent
1. Test gives no information on false negatives
iv. Two scenarios when high specificity tests should be selected
1. To rule in a diagnosis that has been suggested by other tests
2. When FPs can harm the patient physically or emotionally
a. Confirmation of HIV or cancer
b. When we want to be absolutely sure a condition is present
Summary: all non-diseased patients test (-); no FP results; all test (+) patients have disease; a sizable portion of D+ may test negative
Patients without this | will have this test result | X% of the time (indicates Sp)
Alcohol dependency | No to 3 or more of the 4 CAGE questions | 99.7%
Fe-deficiency anemia | (-) serum ferritin | 90%
Breast cancer | (-) fine needle aspirate | 98%
Strep throat | (-) pharyngeal gram stain | 96%
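Sensitivity and specificity are each computed within one column of the 2x2 table (the diseased or the non-diseased, as judged by the gold standard). This is a sketch with hypothetical counts.

```python
def sensitivity(true_pos, false_neg):
    """Se = P(T+ | D+): test positives among all diseased."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Sp = P(T- | D-): test negatives among all non-diseased."""
    return true_neg / (true_neg + false_pos)

# Hypothetical study: 100 diseased and 900 non-diseased by the gold standard.
se = sensitivity(true_pos=90, false_neg=10)
sp = specificity(true_neg=860, false_pos=40)
print(f"Se = {se:.2f}, Sp = {sp:.3f}")
```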
1. General Idea
a. Se & Sp can only be calculated if the true disease status is known
i. They are conditional on the disease status being either positive (Se) or negative (Sp)
b. Clinician is using test precisely b/c they do not know the disease state
i. We want to know the conditional probability of disease given a test result
4. Prevalence
a. Represents the proportion of the total population tested that has the
disease
b. It is the third force that often goes unnoticed before revealing
its influence in dramatic fashion
i. Has dramatic influence on PVN & PVP
c. AKA likelihood of disease, prior probability, prior belief, prior odds,
pre-test probability & pre-test odds.
d. As prevalence falls
i. PPV ↓
ii. NPV ↑
e. As prevalence increases
i. PPV ↑
ii. NPV ↓
1. General Idea
a. Calculating the predictive values for any combination of Se, Sp, & prevalence (using 2x2 tables)
b. The process by which disease probabilities are revised in face of new test information
c. Can be used to calculate PPV, NPV, Se, Sp, and Prev.
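The revision of disease probability by a test result follows from Bayes' theorem. This sketch computes predictive values from Se, Sp, and prevalence; the test characteristics and prevalences are hypothetical.

```python
def predictive_values(se, sp, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    true_pos = se * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    true_neg = sp * (1 - prevalence)
    false_neg = (1 - se) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Same hypothetical test (Se = Sp = 0.90) applied at different prevalences.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

At 1% prevalence fewer than 1 in 10 positives is a true positive, even though the test itself has not changed.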
1. General Idea
a. Tests with sufficiently high Se & Sp that can simultaneously
rule out and rule in are very rare
i. We only have access to an array of imperfect tests
b. Get more information by combining tests
2. Parallel Testing
a. Situation where several tests are run simultaneously
i. Any one positive test leads to further evaluation
b. Used in early phases of work-up when trying to rule stuff out
i. Lots of negative tests = condition ruled out
c. Positive test only means that more testing is needed
i. b/c of ↑ in FP
d. Very costly, highly inefficient, dangerous to the patient, and is bad medicine
e. Best when need highly sensitive test (yet only have a couple insensitive tests)
i. Combining the relatively insensitive tests in parallel maximizes your chance of identifying
diseased subjects.
f. Net effect
i. ↑ likelihood of detecting disease
1. ↑ Se (there are multiple opportunities to find a positive test result)
2. ↑ PVN
ii. ↑ risk of FP
1. ↓ Sp
2. ↓ PVP
3. Serial Testing
a. Situation where several tests are run in order
i. Each subsequent test is only run if the prior test was positive
b. Any negative test: work-up stopped
c. Best used when
i. We want to be sure a disease is ruled in w/ certainty
1. Used when we have no time constraints
ii. If the definitive test is expensive, difficult, or invasive
1. To avoid overusing, we make sure the patient is positive to other tests before advancing
a. Example colonoscopy after (+) fecal occult blood test
d. Great example of the logic of Bayes' theorem to revise probabilities
i. The results of the first test are used to provide pre-test probabilities for the second test
e. Net effect
i. ↑ Sp & PPV b/c each case has to test positive to multiple tests
1. ↓ FPs
ii. ↓ Se & NPV
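The net effects of parallel and serial testing can be quantified for two tests. This sketch assumes the tests' errors are independent given disease status (an assumption added here; correlated tests gain less), with hypothetical Se and Sp values.

```python
def parallel_testing(se1, sp1, se2, sp2):
    """Positive if EITHER test is positive (independence assumed)."""
    se = 1 - (1 - se1) * (1 - se2)  # missed only if both tests miss
    sp = sp1 * sp2                  # negative only if both are negative
    return se, sp

def serial_testing(se1, sp1, se2, sp2):
    """Positive only if BOTH tests are positive (independence assumed)."""
    se = se1 * se2                  # must be caught by both tests
    sp = 1 - (1 - sp1) * (1 - sp2)  # false positive only if both are wrong
    return se, sp

# Two hypothetical, fairly insensitive tests.
print(parallel_testing(0.60, 0.90, 0.80, 0.90))  # Se rises, Sp falls
print(serial_testing(0.60, 0.90, 0.80, 0.90))    # Sp rises, Se falls
```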
I. Introduction
1. General Idea
a. Goal of screening: ↓ mortality & morbidity (and/or expensive or toxic treatment)
i. Is a form of secondary prevention
ii. Designed to detect disease early in asymptomatic phase
1. Early treatment either slows disease progression or provides a cure
iii. Premise is based on concept that early treatment will stop/retard disease progression
iv. Screening has both diagnostic and therapeutic components
2. Results of screening
a. Unlikely to have the disease (both FN & TN)
b. Likely to have disease therefore requires further diagnostic evaluation
Testing | Screening
Sick patients are tested | Healthy, non-patients are screened
Diagnostic intent | No diagnostic intent
Higher disease prevalence | Lower disease prevalence
1. General Idea
a. Defined as, the interval from detection by screening to the time at which diagnosis would have been
made without screening
i. It is the central rationale of screening, as it equals the amount of time by which treatment is advanced or made early
ii. Results in longer awareness of disease
b. Does not necessarily imply any improved outcome
i. After lead time has occurred, early treatment must then be effective for screening to be beneficial
2. Lead time is not a theory or a statistical artifact
a. It is what is expected w/ an early diagnosis
b. It must occur for screening to be worthwhile
i. It is therefore a necessary condition for screening to be effective in reducing mortality
3. Distribution of lead time is important
a. It indicates the length of time by which detection and treatment must be advanced in order to achieve a
level of improved mortality
b. Suggests how often screens should be done
1. Sensitivity
a. Defined as, the proportion of cases with a positive screening test among all cases of pre-clinical disease
b. In order to be accurate, all pre-clinical disease individuals must be identified w/an acceptable gold standard
diagnostic test
i. The true disease status of negative screening individuals is impossible to verify
1. No justification for a full diagnostic work-up
a. Excellent example of verification bias
c. ↑ sensitive a test = better the NPV
d. Imperfect sensitivity affects a few (the cases)
i. An imperfect (↓ sensitive) test will have a lot of FNs, so a lot of diseased people will be classified as negative; thus, it is affecting the cases
e. Can only be estimated in screening studies by counting the # of interval cases that occur over a specified period in persons who tested negative to the screening test
i. In other words, count the people who got the disease but tested negative
1. ↑ false negatives (FNs) = ↓ screening Se
2. Specificity
a. Defined as, the ability of screening test to designate as negative people who do not have pre-clinical disease
b. Imperfect specificity affects many (the healthy!)
i. An imperfect (↓ specific) test will have lots of FPs, so a lot of healthy people will be classified as positive; thus, it is affecting healthy people
c. ↑ specific a test = better the PPV
d. For screening to be feasible, the FP rate (1-Sp) needs to be sufficiently low
i. Since the prevalence of pre-clinical disease is always low, the positive predictive value (PPV) will be low in most screening programs
1. ↓ pre-clinical prevalence: ↓ PPV; thus, we need ↓ FP
ii. PPV can be improved by
1. screening only high risk populations
2. using a lower frequency of screening (which ↑ pre-clinical prevalence)
a. repeated screening will catch everyone
i. ↓ pre-clinical prevalence
1. No one else to catch
ii. PVP will ↓ in a successful screening program
1. Fewer people to catch
3. Yield
a. Defined as, the amount of previously unrecognized disease that is diagnosed and brought to treatment
as a result of screening
b. Affected by
i. the sensitivity of the screening test
1. Lower Se = smaller fraction of diseased individuals are detected at any screening
ii. Pre-clinical disease prevalence in the population
1. Higher pre-clinical prevalence = higher yield
a. Aiming screening programs at high-risk populations will increase efficiency
1. Methods
a. Experimental
i. Conduct a RCT of the screening modality
1. compare the disease specific cumulative mortality rate
a. groups randomized to screening
b. control
2. allows one to study effects of early treatment
3. estimate distribution of lead times
4. identify prognostic factors
ii. randomized design critical
1. eliminating confounding (known & unknown)
2. allowing a valid comparison between groups
a. unaffected by time bias
iii. Problems
1. Expensive, time-consuming, ethical concerns
b. Non-experimental
i. Cohort: comparison of advanced illness or death rate b/w people who choose to be screened and
those who do not
ii. CCS: comparison of screening history b/w people w/ advanced disease/death and those unaffected
(healthy)
iii. Ecological: correlation of screening patterns and disease experience of several populations
c. Problems with non-experimental
i. Confounding due to health awareness
1. Those that choose to get screened are more health conscious and have lower mortality
ii. Poor quality, often retrospective data
iii. Difficult to distinguish screening from diagnostic examinations
2. Measures of effect
a. Comparison of survival experience/duration
i. the efficacy of a screening program cannot be assessed by comparing the duration of survival of
screen detected cases versus cases diagnosed clinically
1. Although common, they over-estimate the effect of screening
a. Selection bias: patients who choose to get screened are more health
conscious, better educated, and have an inherently better prognosis
i. may also occur when subjects decide to get screened b/c they have
symptoms
b. Lead time: screen-detected cases will survive longer even without benefit of
early treatment
i. Simply b/c they were detected earlier!
ii. Survival is increased due to lead time
c. Length-based sampling: screen-detected cases represent a sample of
cases prevalent in the asymptomatic pre-clinical phase
i. It is not simply a sample of all cases in a population
ii. screening preferentially identifies slow growing, indolent cases that
have a long pre-clinical phase
1. Slow growing tumors will obviously have a better prognosis
as they have a long pre-clinical phase and a long clinical
phase
b. Disease-specific mortality rate (DSMR)
i. The only truly valid measure of the efficacy of a screening program is to conduct a randomized
screening trial where the DSMRs are compared b/w groups assigned screening or no screening
ii. Unlike survival time, the DSMR will not be changed by early diagnosis/lead time
1. DSMR accurately reflects benefit of screening
iii. One major problem with DSMR
1. Within the confines of a randomized screening trial, the specific cause of death is usually
assigned by an adjudication committee
a. Since they get all the information they need to properly figure out the cause of
death, they can pretty much figure out what screening group they were in
i. Study becomes un-blinded
ii. Tendencies in a breast cancer screening trial
1. Deaths in the mammography group tend to be judged not breast cancer related
2. Deaths in the control group tend to be over-attributed to breast
cancer as the cause of death
2. Debate is now if the ideal measure should be all-cause mortality
a. Not subject to these biases
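A small worked example shows why survival from diagnosis overstates the benefit of screening while mortality does not. It assumes one hypothetical patient whose age at death is the same whether or not the disease is found early (i.e., early treatment has no effect); all ages are illustrative.

```python
def survival_from_diagnosis(age_at_diagnosis: float, age_at_death: float) -> float:
    """Years survived after diagnosis."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient: screening would detect disease at 60,
# symptoms would lead to clinical diagnosis at 64, death occurs at 68.
AGE_SCREEN_DX, AGE_CLINICAL_DX, AGE_DEATH = 60, 64, 68

survival_screened = survival_from_diagnosis(AGE_SCREEN_DX, AGE_DEATH)
survival_clinical = survival_from_diagnosis(AGE_CLINICAL_DX, AGE_DEATH)
lead_time = AGE_CLINICAL_DX - AGE_SCREEN_DX

print(f"Survival if screen-detected:      {survival_screened} years")
print(f"Survival if clinically diagnosed: {survival_clinical} years")
print(f"Lead time: {lead_time} years; age at death is {AGE_DEATH} either way")
```

The extra survival equals the lead time exactly, and the death still happens at the same age, so a mortality rate comparison is unchanged by early detection alone.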
1. General Idea
a. A potential negative side-effect of screening is pseudo-disease or over-diagnosis
i. Identifying disease that wouldn't become clinically apparent if not for screening
b. Involves three forms
i. Over-diagnosis
1. Cases detected that would have never progressed to a clinical state
a. Ex. Cancer cases w/limited malignant potential; PSA testing and low-grade
prostate cancer; mammography and ductal carcinoma in situ
b. Is an extreme form of length-based sampling
2. Pap testing
a. Lower incidence of invasive cervical cancer
b. Increase in overall incidence of cervical cancer b/c of over-diagnosis of carcinoma in
situ
ii. Competing risks
1. Cases are identified that would have been interrupted by an unrelated death
a. Identification of prostate cancer in an 85 year old man who would have died of
stroke
iii. Serendipity
1. The identification of disease due to non-related diagnostic test
a. Chest X-ray for TB identifies lung cancer
1. Prevention paradox
a. Occurs when a majority of the cases come from a low to moderate risk pool (low to moderate
hypertension) while only a few come from a high risk pool (extreme hypertension). Also seen with
Down Syndrome births (a majority of Down Syndrome babies are born to younger, lower risk
mothers rather than to older, higher risk mothers)
i. Paradox = It is common and logical to equate high risk populations with a majority of the
cases, yet most cases arise in the much larger low to moderate risk pool
b. A preventative measure may provide a large benefit to the community at-large, but very little to the individual. It
explains how the absolute benefit provided by a preventive action to the individual can be small, yet, collectively
the benefit may be significant. Example, if everyone in a community always used a seatbelt, over the lifetime
one subject out of 400 would be saved from dying in a motor vehicle accident. The net benefit on an individual
level is small, but it is large when judged from the community level.
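The seatbelt arithmetic can be made concrete. The 1-in-400 lifetime figure is from the notes; the community size of 1,000,000 is an assumption chosen only for illustration.

```python
# Lifetime absolute benefit to one individual (from the notes: 1 in 400)
individual_benefit = 1 / 400

# Hypothetical community size (assumption for illustration)
community_size = 1_000_000

deaths_prevented = community_size * individual_benefit

print(f"Individual lifetime benefit: {individual_benefit:.2%}")
print(f"Deaths prevented in a community of {community_size:,}: {deaths_prevented:,.0f}")
```

A 0.25% individual benefit looks negligible, but summed over the assumed community it amounts to thousands of deaths prevented.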
2. Other important Issues
a. Assessability
i. Program should be convenient, free of discomfort, efficient, and economical
b. Efficiency
i. Low PPV = wasteful program
1. Most of the test positive individuals will not have the disease
ii. High PPV = normally good
1. Can still be associated w/only a few cases detected and thus, only a small reduction in
overall mortality
iii. No reduction in mortality if
1. Mortality from the disease is normally low
2. Risk of death from other causes is high (in the aged)
c. Cost-effectiveness
i. Should these health dollars be spent on this program?
1. Most population based screening programs are about 30-50K/year of life saved
1. General Idea
a. Experimental study conducted on clinical patients
b. Investigator controls everything
i. The exposure type, amount, and duration
ii. Randomization: who receives what
c. Most scientifically rigorous study
i. Groups are equivalent w/respect to baseline prognosis
1. The unpredictable random assignment eliminates/reduces confounding from known and
unknown prognostic factors.
ii. No biased measurements
1. Blinding ensures that outcomes are measured with the same degree of accuracy and
completeness in every patient
iii. Main potential biases are selection and measurement and they are small compared to cohort,
CCS, XS
d. Can confidently attribute cause and effect
i. As a result of the conditions, the presence or absence of treatment is the only thing that differed
between the two groups
1. thus, any effect is a result of the respective group
ii. Has a high internal validity
1. The experimental design ensures that strong cause-and-effect conclusions can be drawn
from the results
e. Gold standard to determine the efficacy of treatment
3. Levels of RCTs
a. Individual level: highly select group, tightly controlled conditions
b. Community level: large groups, less rigidly controlled conditions
i. Test interventions for primary prevention purposes
1. Inclusion criteria
a. Done in order to optimize
i. The rate of the primary outcome
ii. The expected efficacy of the treatment
iii. The generalizability of the results
iv. The recruitment, follow-up, and compliance of patients
b. Goal is to identify sub-population in whom the intervention is feasible and will produce the desired effect
i. Choice of inclusion criteria represents a balance between
1. Picking the people who are most likely to benefit
2. Not sacrificing the generalizability of the study
ii. If too restrictive: study population is so unique, it can't be applied to other populations
2. Exclusion criteria
a. Valid reasons for excluding patients that would mess-up the study
i. The risk of treatment/placebo is unacceptable
ii. Treatment is unlikely to be effective for the respective patient
1. Disease is too severe, too mild, or treatment has already failed in the patient
iii. Co-morbidities: interfere w/ intervention, measure of outcome, expected length of follow up (terminal
cancer)
iv. Compliance: patient unlikely to complete follow-up or adhere to protocol
v. Other practical reasons
1. Language, cognitive barriers, no phone
b. Goal is still to identify sub-population in whom the intervention is feasible and will produce the desired
effect
i. Avoid excessive exclusions
1. Will add to complexity of screening process (every patient needs to be assessed, so
more exclusions = more time) and ultimately decrease recruitment
a. More exclusions = more complexity = less recruitment
ii. Trade off between
1. Patients more likely to make the study a success
2. Sacrificing generalizability
a. If too restrictive, won't apply to real world
b. Higher internal validity; lower external validity
3. Baseline Measurements
a. Necessary to collect sufficient (but not excessive) demographic data to show that the randomization process
resulted in comparable groups
b. Baseline info to be collected
i. Demographics of participants
1. Important to demonstrate that randomization process worked
ii. Contact & identifying info from patient and contact info from friends, family, etc
1. Important to track subjects during the study and prevent loss to follow-up
iii. Major clinical and prognostic factors for the primary outcome that can be evaluated in pre-specified
subgroup analyses
1. If we thought the treatment effect would depend on gender, we would collect info on gender
5. Intervention
a. Important to balance potential benefits vs. risks of intervention
i. Everyone is exposed to potential side effects of interventions
ii. Yet, not everyone will benefit from the intervention
1. Not everyone will develop the outcome; no intervention is ever 100% effective
iii. Thus, caution dictates using the lowest effective dose
b. RCTs designed on premise that serious side effects will occur much less frequently than the outcome
i. Thus, RCTs are very under-powered to detect side effects (lower frequency : fewer events : lower power)
1. Phase IV post-marketing surveillance studies are around to check serious side effects once
drugs make it to market b/c exposing many more people will raise the power, so rare, but serious, side
effects can be uncovered.
c. Control group: measures the cumulative effects of all other
factors except for the treatment
i. Spontaneous improvements due to natural history
ii. Hawthorne effect: subjects improve simply b/c they
are being studied
iii. Placebo effect: it's in your head
6. Blinding (masking)
a. Cardinal feature of RCT
i. It preserves the benefits of randomization by
preventing biased assessment of outcomes
b. Blinding = prevents measurement bias
c. Helps reduce non-compliance, contamination, & cross-overs
i. Especially true in control group (they are unaware
that they are not getting the active treatment)
d. Types
i. Single blind: either patient or physician is blinded
ii. Double blind: both patient and physician are blinded
iii. Ideally, blind everyone: patients, caregivers, collectors of outcome data, adjudicators of the outcome data, & the
data analyst
e. Placebo
i. Defined as, any agent or process that attempts to mask, or blind, the identity of the true active
treatment
ii. Common feature of drug trials
iii. Especially important when primary outcome measure is non-specific (soft)
1. soft = patients self-reporting pain, nausea, depression, etc
2. Placebo effect: the tendency for such soft outcomes to improve in study
participants (regardless of control vs. treatment)
a. The effect is regarded as the baseline against which to measure the effect of
active treatment
iv. Placebos are not justified when a known standard of care already exists
1. Must give patients the minimum standard of care
a. Ex. In stroke prevention trial w/ anti-platelet drugs, aspirin would be the minimum
standard of care that would be used as the control group
f. Many times blinding/placebo are not feasible (surgical interventions)
i. Difficult to mask who got surgery and who didn't
ii. Study referred to as an open trial
iii. Blinding may be hard to maintain
1. When treatment has clear and obvious benefit or side effect
a. Turns urine orange, etc
iv. In such cases, use hard outcomes to standardize treatment/data collection
1. Blinding not feasible: use hard outcomes (death from any cause)
c. In order to mitigate problems
i. Trials will impute/extrapolate an outcome based on a missing-data
protocol
1. Using the last or worst observation
2. Attempting to predict the unobserved outcome based on the
characteristics of the subject
ii. Results should always be viewed with caution
ii. 5 & 20 rule
1. Technique to assess the likely impact of poor compliance & LTFU; The percentages are
those of the study participants affected by LTFU or non-compliance
a. If affects <5% = bias is minimal
b. If it affects >20% = bias is likely to be considerable
iii. best case/worst case sensitivity analysis
1. Assess potential impact of LTFU
2. Best case
a. LTFU subjects assumed to have best outcome (no adverse outcomes)
b. Event rates calculated counting all LTFU in denominator but not in the numerator
3. Worst case
a. LTFU subjects assumed to have worst outcome
b. Event rates calculated counting all LTFU in both numerator and denominator
4. Overall potential impact is then gauged by comparing the actual results with the range of
findings from the best case/worst case sensitivity analysis
a. A wide range of estimates implies the study's findings are questionable
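The 5 & 20 rule and the best case/worst case analysis above can be sketched together. This is a minimal illustration for one study arm; the function names, the "intermediate" label for the 5-20% band, and the counts are assumptions, not part of the notes.

```python
def ltfu_bias_flag(n_ltfu: int, n_randomized: int) -> str:
    """Apply the 5 & 20 rule to the percentage of subjects lost to follow-up."""
    pct = 100 * n_ltfu / n_randomized
    if pct < 5:
        return "minimal"
    if pct > 20:
        return "considerable"
    return "intermediate"  # label assumed here; the rule names only the extremes


def best_worst_case(events: int, n_completed: int, n_ltfu: int):
    """Return (best, observed, worst) event rates for one arm.

    best:  LTFU counted in the denominator only (no adverse outcomes)
    worst: LTFU counted in both numerator and denominator
    """
    n_total = n_completed + n_ltfu
    observed = events / n_completed
    best = events / n_total
    worst = (events + n_ltfu) / n_total
    return best, observed, worst


# Hypothetical arm: 200 randomized, 20 lost, 30 events among the 180 completers
best, observed, worst = best_worst_case(events=30, n_completed=180, n_ltfu=20)
print(ltfu_bias_flag(20, 200))
print(f"best {best:.3f}, observed {observed:.3f}, worst {worst:.3f}")
```

The wider the gap between the best and worst rates, the less the observed rate can be trusted.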
9. Statistical Analyses
a. Should be straightforward as the design should have created balance b/w all the factors except for the
intervention
i. Simple matter of comparing primary outcomes b/w groups
1. Continuous data: t-test
2. Categorical outcomes: chi-square
3. Small data or not Gaussian: non-parametric methods
4. Survival type studies: Kaplan-Meier survival curves or Cox regression modeling (will
measure the fraction of patients living for a certain amount of time after treatment)
b. Intention-to-Treat Analysis (ITT)
i. Most important concept for RCT
ii. Compares outcomes based on the original treatment arm that each individual participant was
randomized to regardless of protocol violations
1. Violations include ineligibility, non-compliance, contamination, or LTFU
iii. Results are the most valid, but conservative estimate of the true treatment effect
1. Approach is the truest to the principles of randomization (which sticks to the perfectly
comparable groups at study outset)
2. However, ITT cannot fix the problem of LTFU unless the missing outcomes are imputed
using a valid method (which can never be fully verified)
a. Thus, no amount of statistical analysis can fix the problem that the final outcome is unknown for a
sub-set of subjects
c. Per Protocol (PP)
i. Fundamentally Flawed
ii. Persons dropped
1. Those in treatment arm who did not comply
2. Control subjects who got treated (cross-overs)
iii. Analyzed
1. Only those who complied w/ the original randomization
iv. Answers the question as to whether the treatment works among those who complied
1. It can never provide an unbiased assessment of the true treatment effect
a. The decision to comply w/ treatment is unlikely to occur at random
2. Basically the same thing when, during an ITT analysis, subjects are dropped b/c of
unknown outcome
v. Aka. Efficacy, exploratory, or effectiveness analyses
d. As Treated (AT)
i. Fundamentally Flawed
ii. Analyzed
1. Everyone, grouped by whether they actually got the treatment or not (regardless of which group they
were originally assigned to)
iii. Basically the same as analyzing the trial as if a cohort study had been done; completely destroys
any of the advantages afforded by randomization
1. everyone decided themselves whether to get treated or not
iv. Published studies sometimes resort to AT when the ITT analysis is not positive
1. Have to ask, what was the point of doing the trial in the first place if you ended up doing an
AT analysis? The AT approach is w/o merit
e. Example
i. Note the very high death rate in the 26 subjects that were slated for surgery but received medical
treatment
1. At baseline, were probably a very sick group of patients who died before surgery or were
too sick to undergo it
ii. Note the 50 subjects who should have gotten medical treatment but got surgery instead and their
much lower death rate
1. At baseline, these men were probably healthier so impossible to judge the relative merits of
surgery based on this info
iii. Analysis
1. ITT: surgery has a small, non-significant benefit
2. PP or AT: surgery has a larger and statistically significant benefit
a. This estimate is biased!
i. The 26 high risk subjects were either dropped from the surgery
group (PP) or moved to the medical group (AT which makes
medical look much worse)
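The three analyses can be compared on a single table of counts. The sketch below uses hypothetical numbers: only the 26 surgery-to-medical and 50 medical-to-surgery cross-overs come from the notes; every other count and every death total is an assumption made up for illustration.

```python
# (assigned arm, treatment received): (subjects, deaths) -- hypothetical counts
cells = {
    ("surgery", "surgery"): (374, 15),
    ("surgery", "medical"): (26, 6),   # too sick for surgery: high death rate
    ("medical", "medical"): (350, 28),
    ("medical", "surgery"): (50, 2),   # healthier cross-overs: low death rate
}


def death_rate(keys):
    """Pooled death rate across the listed (assigned, received) cells."""
    subjects = sum(cells[k][0] for k in keys)
    deaths = sum(cells[k][1] for k in keys)
    return deaths / subjects


def analyze(arm):
    """Return (ITT, PP, AT) death rates for one treatment."""
    other = "medical" if arm == "surgery" else "surgery"
    itt = death_rate([(arm, arm), (arm, other)])  # grouped by assignment
    pp = death_rate([(arm, arm)])                 # compliers only
    at = death_rate([(arm, arm), (other, arm)])   # grouped by treatment received
    return itt, pp, at


for label, s, m in zip(("ITT", "PP", "AT"), analyze("surgery"), analyze("medical")):
    print(f"{label}: surgery {s:.3f} vs medical {m:.3f} (difference {m - s:.3f})")
```

In these made-up counts the apparent benefit of surgery grows from ITT to PP to AT purely because the sick cross-overs are dropped or moved, not because surgery works better.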
10. Meta-analyses
a. assessing trial quality, trial reporting, and trial registration; improve effect size estimates (narrow the CI)
b. Meta-analyses fast becoming the undisputed king of the evidence based tree
c. Three important implications for RCTs
i. Assessment of study quality
1. b/c of the variability in the quality of published RCTs, meta-analysts will attempt to assess
their quality to determine whether the quality of a trial has an impact on the overall results
2. all approaches focus on
a. a description of randomization process
b. the use of concealment & blinding
c. a description of the LTFU and non-compliance rates
ii. Trial reporting
1. Reports on quality assessment of trials (using Jadad scale or similar tool)
2. If trial is of marginal or poor quality
a. Probably did not report info on key quality criteria
i. Randomization, concealment, blinding, LTFU
b. Not sure if author simply failed to mention or if they simply did not follow these
steps
3. Led to development of specific guidelines for the reporting of clinical trials
a. CONSORT Statement aims to make sure trials are reported in a consistent
fashion and that specific descriptions are included so the validity can be
independently assessed
iii. Trial registration
1. Big problem for meta-analyses is potential for publication bias
2. Results from meta-analyses can be seriously biased if there is a tendency to not
publish trials w/ negative or null results
a. Thus, when we collect data, we are collecting relatively much more positive data
than what is truly representative
i. The negative data is hidden
b. Unpublished negative trials
i. Either small (low power)
ii. Large, drug company sponsored trials
1. Company doesn't want to release info
3. International Committee of Medical Journal Editors
a. Requires that for a trial to be published in any of its member journals, it must
have been registered prior to starting it
i. Thus, the scientific community will then have a registry of all trials undertaken on
a respective subject
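How pooling narrows the CI can be illustrated with inverse-variance weighting. The notes do not say which pooling method is meant, so the fixed-effect model below is an assumption, and the three log relative risks and standard errors are hypothetical.

```python
import math


def pooled_fixed_effect(estimates, std_errors):
    """Fixed-effect inverse-variance pooling (method assumed for illustration).

    Returns the pooled estimate and its standard error.
    """
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical trials: log relative risks and their standard errors
log_rr = [-0.2, -0.1, -0.3]
se = [0.2, 0.2, 0.2]

pooled, pooled_se = pooled_fixed_effect(log_rr, se)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled RR {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
print(f"Pooled SE {pooled_se:.3f} vs single-trial SE {se[0]:.3f}")
```

The pooled standard error is smaller than that of any single trial, so the CI narrows; if negative trials are missing from the input, the pooled estimate is biased no matter how narrow the interval.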
a. Better safety profile (fewer side effects, less monitoring)
b. Easier dosing schedule
i. Better compliance
c. Cheaper
3. May involve the evaluation of the same drug given using a different strategy, dose, or
duration
ii. Methodological challenges
1. Null hypothesis is opposite that of typical superiority trial
a. Superiority trial
i. Ho: new drug = active control
ii. Ha: new drug ≠ active control
b. Non-inferiority trial
i. Ho: new drug + equivalence margin < active control
1. the active control is substantially better than the new drug
2. Rejecting Ho means the new drug is not inferior to the active
control within the bounds of the equivalence margin
ii. Ha: new drug + equivalence margin ≥ active control
2. Equivalence margin = how much we are willing to accept that the new drug can have worse
efficacy
a. Set by clinically deciding how big a difference there would have to be between
the two drugs before we would decide that the drugs are clinically not equivalent
b. It is the critical determinant of the success of the trial & its sample size
c. Smaller #s = more conservative
d. Larger #s = more liberal
iii. Other problems/limitations of non-inferiority trials
1. Assay sensitivity
a. A poorly conducted trial may falsely show that the 2 drugs are equivalent
i. Poor trial conduct (compliance, follow-up, blinding, etc) will favor a finding of
non-inferiority
2. Blinding
a. Vital step to reduce measurement bias in superiority trials
b. It cannot protect against the investigators giving the same outcomes/ratings to all
subjects
i. Thereby showing non-inferiority
3. ITT analysis
a. ITT is gold standard in superiority trials
b. ITT in non-inferiority trials tends to bias towards finding non-inferiority
i. Including non-compliance in treatment/control tends to minimize the
differences b/w groups
1. Thus, this can show an inferior drug to be non-inferior
c. PP analysis can introduce bias in either direction
i. Not recommended as it can compound the problem
d. Best bet = do both ITT & PP and hope findings are consistent
i. Even so, accepting HA doesn't rule out possibility of bias
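The role of the equivalence margin can be sketched with a confidence-interval check. This is one common way to operationalize the hypotheses above, not necessarily the method used in any particular trial: it assumes the outcome is a success proportion (higher is better), a Wald 95% interval, and hypothetical counts.

```python
import math


def noninferiority(success_new, n_new, success_ctrl, n_ctrl, margin, z=1.96):
    """Return (difference, lower CI bound, non-inferior?).

    Non-inferiority is concluded when the lower bound of the CI for
    (new - control) lies above -margin.
    """
    p_new = success_new / n_new
    p_ctrl = success_ctrl / n_ctrl
    diff = p_new - p_ctrl
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ctrl * (1 - p_ctrl) / n_ctrl)
    lower = diff - z * se
    return diff, lower, lower > -margin


# Hypothetical trial: 400/500 successes on new drug, 410/500 on active control
for margin in (0.10, 0.05):
    diff, lower, ok = noninferiority(400, 500, 410, 500, margin)
    print(f"margin {margin:.2f}: diff {diff:+.3f}, lower bound {lower:+.3f}, "
          f"non-inferior = {ok}")
```

The same data pass with the larger margin and fail with the smaller one, which is why the margin is the critical determinant of the trial's success and sample size.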
1. Advantages
a. High internal validity
b. Control of exposure (amount, timing, frequency, duration)
c. Randomization
i. Ensures balance of factors that could influence outcome
1. controls the effect of known and unknown confounders
d. A true measure of efficacy
2. Disadvantages
a. Limited external validity
b. Artificial environment
i. Strict eligibility criteria and conducted in specialized tertiary care medical centers
1. Greatly limits generalizability
c. Difficult/complex to conduct
i. Takes time and is expensive
d. Limited scope due to ethical concerns
i. Mostly therapeutic/preventive only
1. Overview
a. Observational study = investigator has no control over exposure
b. Descriptive
i. Case reports & case series (Clinical)
1. Profile of a clinical case or case series which should
a. Illustrate a new finding
b. Emphasize a clinical principle
c. Generate a new hypothesis
2. It is not a measure of disease occurrence
3. As there is no control or comparison group, we usually cannot identify risk factors or
the cause
a. Exception: 12 cases w/ salmonella infection, 10 had eaten cantaloupe
ii. Cross-sectional (Epidemiological): prevalence, or collecting data
c. Analytical
i. Cohort
ii. Case-control
iii. Ecological
1. we don't know exposures, but the people who are affected are related, so we study the
relationship (e.g., workers and asbestos)
2. Risk Factor
d. Heard daily with cholesterol (heart disease), HPV (cervical cancer), cell phones (brain cancer), TV watching
(childhood obesity), etc
i. However, association does not mean causation
ii. Ex. almost perfect overlap b/w the serum cholesterol
distributions of men who developed coronary heart disease (CHD)
and men who did not
1. Even though cholesterol is a proven risk factor
2. If you are just given one # for an individual, it is
difficult to predict outcome b/c of the near-perfect
overlap
e. How are risk factors used
i. Identifying individuals/groups at risk
1. Even though ability to predict future disease in
individual patients is very limited (even for well
established risk factors like cholesterol/CHD), it
still helps identify populations
ii. Causation: causative agent vs. a marker
iii. Establish pretest probability (Bayes' theorem)
iv. Risk stratification
1. Helps to identify target populations (age >40 for mammography screening)
v. Prevention
1. Remove causative agent to prevent disease
4. Prevention
c. Removing a true cause = lower disease incidence
i. Decrease aspirin use = less Reye's Syndrome
ii. Discourage prone position while infants are sleeping
1. Back to Sleep = less SIDS
5. Observational Studies
d. XS, Cohort, and CCS are all analytical observational studies
1. General Idea
a. Exposure & Outcome at the same time
b. Also called prevalence study
i. Prevalence measured by conducting a survey of the population
of interest
c. Mainstay of descriptive epidemiology
i. Patterns of occurrence by time, place, and person
ii. Estimate disease frequency (prevalence) and time trends
iii. Used when trying to get a handle on an idea: trying to get clues on the origin of disease by looking at
subgroups
d. Useful for
i. Program planning
ii. Resource allocation
iii. Generate hypotheses
2. How
a. Select sample of individual subjects and report disease prevalence (%)
b. Can also simultaneously classify subjects according to exposure and disease status to draw inferences
i. Describe association using the Odds Ratio (OR)
3. Examples
a. Prevalence of asthma in school-aged children in MI
b. Trends and changing epidemiology of hepatitis in Italy
c. Characteristics of teenage smokers in MI
d. Prevalence of stroke in Olmsted County, MN
3. Relative Risk
a. The standard measure of association in cohort studies
b. Describes the magnitude and direction of the association
c. Incidence can be measured as IDR or CIR
d. Interpretation
i. RR = 1.0: null
ii. RR = 2.0: risk is twice as high in exposed vs. non-exposed
iii. RR = 0.5: risk in exposed is half that in non-exposed
[Figure: RR scale running 0, 0.2, 0.5, 1, 2, 3, 4, 5, 6; strength of association labeled Big, Moderate, Small (around 1), Moderate, Big]
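The RR calculation from cohort data can be sketched as a ratio of cumulative incidences. The 2x2 cell names (a, b, c, d) and the counts are hypothetical and chosen for illustration.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = incidence in exposed / incidence in non-exposed.

    a = exposed who developed disease,     b = exposed who did not
    c = non-exposed who developed disease, d = non-exposed who did not
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return incidence_exposed / incidence_unexposed


# Hypothetical cohort: 30/100 exposed and 15/100 non-exposed develop disease
rr = relative_risk(a=30, b=70, c=15, d=85)
print(f"RR = {rr:.1f}")
```

An RR above 1.0 means the risk is higher in the exposed; swapping the two rows of the table gives the reciprocal, an RR below 1.0.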
4. Sources of Cohorts
a. Geographically defined groups
i. Framingham, MA heart study
b. Special resource groups
i. Medical plans (Kaiser Permanente), Medical professionals (Physicians' Health Study, Nurses' Health
Study), Veterans, College Grads
c. Special Exposure Groups
i. Occupational exposures
1. Lead workers, uranium miners
a. If everyone was exposed, then you need an external (non-exposed) cohort for
comparison purposes
i. Lead workers to car assembly workers
5. Cohort Design Options
a. Variation in timing of exposure and disease measurement
b. Types
i. Prospective
ii. Historical: look back at the entire cohort (exposure) and see who gets the outcome
iii. Retrospective
1. Go back in time to figure out exposure
2. Comparing exposure and non-exposure
3. Doesn't happen often, but sweet way to do it.
4. Examples
a. Aware of cases of fibromyalgia in women within a large HMO. Go back and
determine who had silicone breast implants (past exposure). Compare incidence
of disease in exposed and non-exposed.
i. Go back and look
1. Who had fibromyalgia
2. Who had silicone breast implants
ii. Case control study would look like this
1. Ask women w/fibromyalgia if they had silicone implants
2. Determine a control group & then only compare the 2 groups
no population comparison
b. Framingham Study: used frozen blood bank to determine baseline levels of hs-
CRP and then measure incidence of CHD by risk groups (quartiles)
i. They measured the CRP levels in the blood from the 60s and then
figured out who currently had CHD
5. We know that the population is composed of cases and non-cases. In case control
studies we find the cases/controls and then track backward to determine exposure.
In retrospective cohort studies, we start by figuring out exposure retroactively and
then track them forward to figure out if they developed into cases
6. Need complete population data in order to do this
a. need to classify everyone
7. Bias
a. Selection Bias
i. Can occur when the cohort is first assembled
1. Patients assembled for the study differ in many ways other than the exposure under
study and these factors may determine the outcome
a. Ex. Only the uranium miners at the highest risk for lung cancer (smokers, prior
family history) agree to participate.
ii. Can occur during the study
1. Differential LTFU in exposed vs. non-exposed
a. LTFU doesn't occur at random
iii. It's basically inevitable
b. Confounding Bias
i. As the exposure of interest is not assigned at random & other risk factors may be associated
w/ both the exposure and the disease, confounding bias can occur in these cohort studies
ii. Confounding bias: the big one for cohort studies
8. Advantages/Disadvantages
a. Advantages
i. Can measure disease incidence, can study the natural history of disease, provides strong
evidence of a causal association between E/D (b/c time order is known), provides info on time lag,
multiple diseases can be examined, good choice if exposure is rare (assemble special exposure
cohort), generally less susceptible to bias vs. CCS
b. Disadvantages
i. Takes time, large samples, is expensive, complicated to implement and conduct, not useful for
rare diseases/outcomes, problems of selection bias (assembling at start and LTFU during) &
prolonged time period compounds LTFU, and confounding
IV. Case Control Studies (CCS)
Effect → cause
Begin with the OUTCOME (case) and then GOING BACK (Recall Bias) looking for ODDS OF EXPOSURE
Good for RARE OUTCOMES; cohort for rare exposures
Cannot calculate incidence (no RR or AR)
High bias for everything (RECALL, selection, confounding, & measurement)
1. General Idea
a. An alternative observational design to identify risk factors for a disease/outcome
i. Two samples are selected
1. Patients who had developed the disease in question
2. Otherwise similar people who did not develop the disease in question
ii. Match a case (45-year-old female) with a control (45-year-old female)
1. Distribution of age and gender are the same b/w the groups, so they can no longer
confound
iii. They already have the outcome
b. Question: how do diseased cases differ from non-diseased (controls) w/ respect to prior exposure
history?
c. In the population we find those that are diseased, and then match controls that are not diseased. In
other words, we figure out cases and controls first and then figure out if exposures occurred.
i. We know that the population is composed of cases and non-cases. In case control studies we find
the cases/controls and then track backward to determine exposure. In retrospective cohort
studies, we start by figuring out exposure retroactively and then track them forward to figure
out if they developed into cases
d. Compare the frequency of exposure among cases and control
i. Effect → cause
e. Cannot calculate disease incidence rates b/c the CCS does not
follow a disease free population over time
f. For CCS, all the cases had the outcome
g. Basically, we identify cases and then look backward to find
causes of disease (& non-disease)
i. Look for common exposure
ii. Still set up a control group & then look back at that group as
well
2. Nested CCS
a. Study of MHG in infants
i. Not only did they look forward to see how the infants were
affected, they set up a control group in both those w/MHG &
those w/o MHG & looked back for exposures
3. Examples of CCS
a. Outbreak investigations (what is causing young women to die of
toxic shock)
b. Birth defects (drug exposures and heart teratology)
c. New (unrecognized) disease (DES and vaginal cancer in adolescents)
i. Selection of cases
1. Requires case definition
a. Need for standard diagnostic criteria, consider severity of disease, and consider
duration of disease (prevalent or incident case?)
2. Requires eligibility criteria
a. Area of residence, age, gender, etc
ii. Sources of cases
1. Population based
a. Identify and enroll all incident cases from a defined population
i. Ex. Disease registry, defined geographic area, vital records
2. Hospital based
a. Popular in USA b/c we don't have good national/regional databases
b. Identify cases where you can find them
i. Hospitals, clinics
ii. Issues of representativeness, prevalent or incident cases?
iii. Selection of controls
1. Controls reveal the normal/expected level of exposure in the population that gave
rise to the cases
2. Should have the same eligibility criteria as the cases
3. Issue
a. Control comparability to cases concept of the study base
i. Controls should be from same underlying population
ii. Need to determine: if the control had developed the disease,
would s/he be included as a case in the study?
1. If no, then don't include as a control
iv. Sources of controls
1. Population based
a. Ideal as it represents the exposure distribution in the general population
b. However, if there is a low participation rate: response bias likely (selection
bias)
2. Hospital based
a. Used when population based controls are not feasible
b. Much more susceptible to bias
c. Advantages
i. Similar to cases? (it is a hospital after all), more likely to participate,
and efficient (they're in a hospital)
d. Disadvantages
i. Are they representative of the study base?
ii. They already have some kind of disease/co-morbidity
1. Don't select if the risk factors for their disease are similar to those of the
disease under study (COPD & lung cancer)
3. Other sources
a. Relatives, neighbors, friends (of cases)
i. Advantages
1. Similar to cases and more willing to cooperate
ii. Disadvantages
1. More time consuming, may not be willing to give info, may
have similar risk factors
5. Analysis of CCS
a. The only valid measure of association for the CCS is the Odds Ratio (OR)
b. Under reasonable assumptions (the rare disease assumption), the OR approximates the RR
c. Odds Ratio
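i. Odds of exposure among cases = a/c (a = exposed cases, c = unexposed cases)
ii. Odds of exposure among controls = b/d (b = exposed controls, d = unexposed controls), so OR = (a/c)/(b/d) = ad/bc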
iii. Similar interpretation as RR
iv. Provides the same information as RR if
1. Controls represent the target population
2. Cases represent all cases
3. Rare disease assumption holds
a. Or if CCS is undertaken w/ population-based sampling
v. OR can be calculated for any design
1. OR is the only valid measure for the CCS
2. RR can only be calculated for RCT & cohort studies
3. Publications will occasionally mislabel OR & RR
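A minimal Python sketch of the OR calculation above, assuming the usual 2X2 layout (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls). The function name and the counts are hypothetical and only illustrate the arithmetic.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for an unmatched case-control 2X2 table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls (assumed layout).
    """
    odds_cases = a / c        # odds of exposure among cases
    odds_controls = b / d     # odds of exposure among controls
    return odds_cases / odds_controls  # algebraically equal to (a*d)/(b*c)


# Hypothetical counts: 100 cases (40 exposed), 100 controls (20 exposed)
example_or = odds_ratio(a=40, b=20, c=60, d=80)
print(round(example_or, 2))
```

An OR above 1 here would be read the same way as an RR above 1: exposure is more common among cases than among controls.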
6. Confounding
a. Exposure of interest may be confounded by a factor that is associated with the exposure and is also
an independent risk factor for the disease
b. Can be controlled
i. At the design phase
1. Randomization, restriction, matching
ii. At the analysis phase
1. Age-adjustment, stratification, multivariable adjustment
c. Matching
i. Used to control an extraneous variable by matching controls to cases on a factor you know is
an important risk factor or marker for disease
1. Age (w/in 5 years), sex, neighborhood
ii. If the factor is fixed to be the same in both the cases and controls, then it can't be a confounder
iii. Analysis of matched CCS needs to account for the matched case-control pairs
1. Only pairs that are discordant with respect to exposure provide useful information
2. McNemar's OR = b/c
a. Case (+/-) vs. Control (+/-), and then enter the matched pairs in a 2X2 table
i. Each box is entered as a pair of one case and one control
ii. Concordant pair = both smokers or both non-smokers
iii. Discordant cells = contribute to Odds Ratio
1. b = case is a smoker, control is not
2. c = case is a non-smoker, control is a smoker
b. Only discordant pairs give any information
iv. Can increase power by matching more than one control per case
1. 4 controls to 1 case = increased power
2. Useful if few cases are available
v. Over-matching
1. Matching can result in controls being so similar to cases that all exposures are the same
a. Ex. 8 cases of GRID (LA county 1981) in which all cases were gay men, so they
were matched using a 4:1 matching ratio to other gay men who did not have
signs of GRID (32 controls)
i. No differences found in sexual or other lifestyle habits
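The matched-pair OR described under Matching can be sketched in Python as follows. This is only an illustration: the function name, the pair encoding (case exposed?, control exposed?), and the counts are hypothetical, and b and c denote the two discordant cells (case exposed only, control exposed only).

```python
def matched_pair_or(pairs):
    """Matched-pair odds ratio = b / c, using discordant pairs only.

    pairs: iterable of (case_exposed, control_exposed) booleans,
    one tuple per matched case-control pair.
    """
    b = sum(1 for case, ctrl in pairs if case and not ctrl)  # case exposed only
    c = sum(1 for case, ctrl in pairs if ctrl and not case)  # control exposed only
    if c == 0:
        raise ValueError("No pairs with only the control exposed; OR is undefined")
    return b / c


# Hypothetical data: 100 matched pairs
pairs = (
    [(True, True)] * 10      # concordant: both smokers (ignored)
    + [(True, False)] * 30   # discordant: case smokes, control does not
    + [(False, True)] * 10   # discordant: control smokes, case does not
    + [(False, False)] * 50  # concordant: neither smokes (ignored)
)
print(matched_pair_or(pairs))
```

The 60 concordant pairs do not enter the calculation at all, which is why over-matching (nearly all pairs concordant) leaves very little information.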
d. Recall Bias
i. Presence of disease may affect ability to recall or report the exposure
1. Is a form of measurement bias
2. Ex. Recall of OTC drug use during pregnancy by moms of normal vs. congenitally
abnormal babies
a. It's pretty hard to remember if/when you may have taken a Tylenol
ii. To lessen potential
1. Blind participants and study personnel to study hypothesis
2. Use explicit definitions for exposure
3. Use controls w/ an unrelated but similar disease
a. Ex. Heart tetralogy (cases), hypospadias (controls)
e. Reverse Causation
i. The disease or sub-clinical manifestations of it results in a change in behavior (exposure)
ii. Ex. Obese children found to be less physically active than non-obese children
EPI 547
I. Random Stuff
1. Bias
a. It is a deviation from the truth (Grimes, Bias and Causal Associations, Lancet 2002)
b. Systematic Error (Bias) - Error in study design which may skew the results, leading to a deviation from the
truth.
i. This is when all measurements are consistently too high or too low.
1. Ex: A spectroscopy machine consistently gives high readings because it wasn't calibrated.
c. Three broad classes of Bias
i. Confounding
1. Factor that distorts the true relationship between the study variable of interest and the outcome by being related to both
a. The outcome of interest
b. The study variable
c. Confounding = bias that we can control
2. Two mechanisms
a. Confounding by Indication
i. Intractable problem where prognostic factors influence treatment
decision
1. Problematic when evaluating treatment effects from
observational data
ii. Ex. Asthma studies of the 1980s showed an association between a β-
agonist (Fenoterol) and death from asthma
1. However, it was argued that patients who had more severe
asthma were at a higher risk of mortality from
asthma and thus were more likely to be prescribed Fenoterol in
the first place.
2. The severity of the disease confounds the association
between the drug and the adverse outcome
b. Channeling effect (bias)
i. Tendency for clinicians to prescribe certain treatments based on a
patient's underlying prognosis or comorbidity profile
1. Results in differences in baseline risk
ii. Solution
1. Adapt the design of the study
2. Statistically adjust the baseline risk differences
Reducing Bias
Confounding - use RANDOMIZATION
Selection - use CONCEALMENT
Measurement - use BLINDING
ii. Selection
1. Internal validity questions for Dx & Px
2. Cohort studies
3. Selection bias = bias that we cannot control (compared to confounding)
iii. Measurement
1. Internal validity questions for harm
2. Case Control Study
c. Interviewer bias - error introduced by an interviewer's conscious or subconscious gathering of selective data
(the interviewer might think that people are sicker than they really are).
d. Recall bias - error due to differences in accuracy or completeness of recall of past events or
experiences. Particularly relevant to case control studies (CCS).
e. Selection bias - an error in patient assignment between groups that permits a confounding variable to arise
from the study design rather than by chance alone.
i. Occurs when the groups of exposed and non-exposed assembled for the study differ in some way
other than the prognostic factors under study.
ii. When extraneous variables affect the outcome of the study
iii. This stems from an absence of comparability between groups being studied
iv. Spectrum Bias
1. Defined as: the difference in both the spectrum and severity of disease between
a. The population among whom the test was first developed (the study population)
i. Phase I evaluations to see if (+) in sick people
1. The sickest of the sick
a. Easier time picking out the obvious
b. Se will be overestimated
ii. Phase I evaluations to see if (-) in normal people
1. The wellest of the well
a. Healthier and younger than clinically relevant
population
b. Less likely to have other Dx or co-morbidities
c. Fewer FP results (and many more TNs)
d. Overestimates Sp
iii. NET EFFECT: new diagnostic tests are overly optimistic
a. Overestimate Se and Sp
b. The population that the test will be used in practice (clinically relevant population)
i. Phase II evaluations clinical population that has a whole array of
conditions that are a part of the DDX
1. Patients WITHOUT the DISEASE of INTEREST
a. Conditions that cause FPs
b. This underestimates the Sp
c. Opposite of the wellest of the well
v. Assembly or Susceptibility bias - An example of selection bias, since the bias occurs when the
subjects are first selected.
1. Survival Cohort - This is a special type of assembly bias where only the patients that
survived the outcome are taken into account
2. A survival cohort describes the past history of prevalent cases and NOT that of a true
inception cohort
3. Individuals who would have been included in a true inception cohort are not accounted for
because they died soon after the onset of treatment
vi. Migration Bias / Loss-to-Follow-Up - This is another form of selection bias which occurs when
patients drop out of the study prematurely
vii. Referral / Sampling Bias - This is the selective referral of patients to tertiary (academic) medical
centers, where many studies concerning prognostic aspects of disease are conducted; this
selection bias alters the clinical spectrum of disease
1. The proportion of more severe or unusual cases tends to be artificially higher at tertiary
care centers.
2. People who are treated at primary care centers are often TOO SICK to be referred to or
even make the trip to the tertiary care center.
3. The people that survived the referral to the academic centers are the ones getting studied.
f. Volunteer bias - people who choose to enroll in clinical research may be systematically different (e.g.
healthier, or more motivated) from your patients.
g. Verification Bias / Workup bias - when the decision to conduct the confirmatory or reference standard test is
influenced by the result of the diagnostic test under study.
i. Ask: Were all the patients subjected to a gold standard?
ii. Fecal occult blood test and colonoscopy
1. FOBT (-) are not referred to colonoscopy, but they could be FNs
a. Overestimates Se (b/c the # of FNs is underestimated)
b. Underestimates Sp (b/c the # of TNs is underestimated)
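A small worked example in Python may make the direction of verification bias concrete. All numbers are hypothetical: 1000 patients, a test with true Se = 0.70 and Sp = 0.90, every test-positive patient verified by the gold standard, and (an assumption for illustration) only 10% of test-negative patients verified.

```python
def se_sp(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the four 2X2 cell counts."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical full cohort (everyone verified): 100 diseased, 900 non-diseased
tp, fn, fp, tn = 70, 30, 90, 810
true_se, true_sp = se_sp(tp, fn, fp, tn)

# Partial verification: all test positives verified, but only a fraction
# of test negatives (assumed 10%) go on to the gold standard.
verified_fraction = 0.10
verified_fn = round(fn * verified_fraction)  # 3 of the 30 FNs are found
verified_tn = round(tn * verified_fraction)  # 81 of the 810 TNs are confirmed
obs_se, obs_sp = se_sp(tp, verified_fn, fp, verified_tn)

print(f"true Se={true_se:.2f}, observed Se={obs_se:.2f}")
print(f"true Sp={true_sp:.2f}, observed Sp={obs_sp:.2f}")
```

Because FNs and TNs both come from the test-negative group, under-verifying that group shrinks both counts, which pushes the observed Se up and the observed Sp down.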
3. Cochrane Collaboration
a. This international group, named for Archie Cochrane, is a unique initiative in the evaluation of healthcare
interventions that prepares, disseminates, and continuously updates systematic reviews of controlled trials for
specific patient problems.
b. This team will gather all of the studies on a subject, disregard the poor studies, and come up with a consensus
on the final outcomes of the good studies.
c. Key points for systematic reviews
i. Grade concealment of allocation
ii. Describe key quality parameters relevant to topic
iii. Report risk of bias table
6. Hazard Function
a. The probability of an event (such as death or relapse) at a given moment in time (t) (EPI 547 CP).
b. It is a direct measure of prognosis: given that the patient has survived to a certain point in time,
what is the probability of the patient failing during the next time period?
i. An increase in the hazard function indicates that the prognosis worsens with time
ii. A decrease in the hazard function indicates that the prognosis improves for those patients that survive longer
7. Censoring
a. When the event of interest does not occur in all individuals because
i. Study stopped/ended before outcome occurred
ii. LTFU
iii. Death from other (competing) causes, e.g. road accident
i. Advantages
1. can establish clear temporal relationships between exposure and disease onset
2. able to generate incidence rates.
ii. Disadvantages
1. control/unexposed groups may be difficult to identify, exposure to a variable may be linked
to a hidden confounding variable, blinding is often not possible, randomization is not
present.
2. For relatively rare diseases of interest, cohort studies require huge sample sizes and long
f/u (hence they are slow and expensive).
f. N-of-1 Trial - When an individual patient undergoes pairs of treatment periods organized so that one period
involves use of the experimental treatment and the other involves use of a placebo or alternate therapy. Ideally
the patient and physician are both blinded, and outcomes are measured. Treatment periods are replicated until
patient and clinician are convinced that the treatments are definitely different or definitely not different.
g. Randomized Controlled Trial (RCT) - A group of patients is randomized into an experimental group and into a
control group. These groups are then followed up and various outcomes of interest are documented. RCTs
are the ultimate standard by which new therapeutic maneuvers are judged. Randomization should result in the
equal distribution of both known and unknown confounding variables into each group. An unbiased RCT also
requires concealment and where feasible blinding.
i. It is the gold standard of clinical research designs because it REDUCES CONFOUNDING from
known and unknown confounders
ii. Randomization should ensure that there are NO differences between the groups at baseline.
1. This can be described as the groups having an equal chance at prognosis at baseline
2. It can also be described as controlling for the known and unknown confounding variables
3. It CANNOT be applied to ALL clinical questions of interest
4. RCTs do NOT have to always have a placebo group, but they must have a group to
compare to (different drug)
iii. Disadvantages: often impractical, limited generalizability, volunteer bias, significant expense, and
sometimes ethical difficulties.
h. Systematic Review - A formal review of a focused clinical question based on a comprehensive search
strategy and structured critical appraisal designed to reduce the likelihood of bias.
i. No quantitative summary is generated however.
ii. Any summary of research that attempts to address a focused clinical question in a systematic,
reproducible manner.
iii. These reviews provide a summary of studies which have been searched out comprehensively with
explicit and reproducible search strategy intended to answer a specific clinical question
iv. These reviews incorporate some sort of inclusion criteria, valid quality assessment methods, a
rigorous appraisal of the evidence offered, and some summary conclusions
v. High quality SRs
1. Assess quality of individual studies
2. Report results
i. Narrative Review - This provides a general overview which may not follow rigorous, reproducible scientific
methods
i. This may result in biased or erroneous conclusions
ii. It may be that these reviews provide practical information about managing common clinical conditions
iii. One is unlikely to find a detailed and discriminating appraisal of evidence
j. Meta-Analysis - A systematic review which uses quantitative methods to combine the results of several
studies into a pooled summary estimate.
i. The quantitative synthesis that yields a single best estimate of, for instance, treatment effect.
ii. This is a sub-set of systematic reviews where the investigators report a combined summary statistic
with a variable of interest.
ii. Fundamental Fact #1
1. The interpretation of test results depends on the probability of disease before the test was
run
d. Odds-LR form of Bayes' Theorem
i. The environment in which the test is applied (indicated by pre-test odds) is as important as the
information provided by the test (indicated by the LR); in other words, each aspect is only half of the
story
ii. The LR can be obtained online; the hard part is for the clinician to provide an accurate estimate of
the pre-test odds.
iii. (Pre-test ODDS)(LR) = (Post-test ODDS)
1. Pre-test ODDS = Prev/(1-Prev)
2. Post-test PROBABILITY = (post-test odds)/(1 + post-test odds)
a. Post-test probability = PVP (using LR+) or 1 - PVN (using LR-)
iv. Thus, can calculate PVP or PVN from
1. LR+
2. LR-
3. Pre-test odds
v. Positive post-test probability - the likelihood of disease given that the test is (+) = P(D+|T+)
vi. Negative post-test probability - the likelihood of disease given that the test is (-) = P(D+|T-)
1. This one isn't much use to us as it is the complement of PVN, or 1 - P(D-|T-)
a. Therefore, we calculate PVN = 1 - post-test probability
e. Can be calculated for a range of test results (ordinal or continuous test), thereby preserving clinical information
f. Test results provide the maximum amount of information (the largest change b/w prior and post-test probability) when
the prevalence of disease is between 40-60%.
g. Theoretical advantage of LR over Se and Sp
i. Se and Sp remain constant regardless of the prior probability of disease
ii. LRs are less susceptible to changes in the underlying prevalence of disease because they are
calculated from smaller slices of the data.
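The odds-LR form of Bayes' Theorem above can be sketched in a few lines of Python. The function names and the example values (prevalence 10%, Se 0.90, Sp 0.80) are hypothetical; the sketch also assumes the standard definitions LR+ = Se/(1-Sp) and LR- = (1-Se)/Sp, which are not restated in this part of the notes.

```python
def likelihood_ratios(se, sp):
    """Standard definitions (assumed here): LR+ = Se/(1-Sp), LR- = (1-Se)/Sp."""
    return se / (1 - sp), (1 - se) / sp


def post_test_probability(pre_test_prob, lr):
    """(Pre-test odds)(LR) = post-test odds, then convert odds back to probability."""
    pre_test_odds = pre_test_prob / (1 - pre_test_prob)
    post_test_odds = pre_test_odds * lr
    return post_test_odds / (1 + post_test_odds)


# Hypothetical test and setting
prevalence, se, sp = 0.10, 0.90, 0.80
lr_pos, lr_neg = likelihood_ratios(se, sp)

pvp = post_test_probability(prevalence, lr_pos)      # P(D+|T+)
pvn = 1 - post_test_probability(prevalence, lr_neg)  # 1 - P(D+|T-)

print(f"LR+={lr_pos:.2f}, LR-={lr_neg:.3f}, PVP={pvp:.3f}, PVN={pvn:.3f}")
```

Rerunning the same test with a different prevalence changes PVP and PVN but not the LRs, which is the point of Fundamental Fact #1.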
15. Heterogeneity
a. Test done in systematic reviews and meta analysis to determine how similar individual study RESULTS are
b. Q statistic
i. Based on the chi-square test (the test has low power to detect heterogeneity)
ii. H0: the results are homogeneous; p > 0.05 = no evidence against homogeneity
iii. Ha: heterogeneity is present; p < 0.05 = reject H0
1. Unlikely that chance explains the difference in the studies
iv. NOTE: the H0 is OPPOSITE of what we would normally think (here we usually hope NOT to reject it)
c. Inconsistency Index or I2
i. Estimate of the variability in results due to true differences in treatment effect vs. chance
ii. I2<25% = low heterogeneity
iii. I2 25-75% = moderate heterogeneity
iv. I2 >75% = high heterogeneity
d. Sources of heterogeneity
i. Clinical heterogeneity
1. Population
2. Intervention
3. Outcome
ii. Methodological heterogeneity
1. Design
iii. Chance
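A minimal Python sketch of the Q statistic and I2 described above, assuming each study supplies an effect estimate and its variance and that studies are weighted by inverse variance. The function name and the three example studies are hypothetical; the p-value for Q (chi-square with k-1 degrees of freedom) is left out to keep the sketch dependency-free.

```python
def q_and_i2(effects, variances):
    """Cochran's Q and the inconsistency index I2 (as a percentage).

    effects:   per-study effect estimates (e.g. log odds ratios)
    variances: per-study variances of those estimates
    """
    weights = [1 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 = share of total variation beyond what chance alone (df) would explain
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical data: three equally precise studies with different effects
q, df, i2 = q_and_i2(effects=[0.1, 0.3, 0.5], variances=[0.01, 0.01, 0.01])
print(f"Q={q:.2f} on {df} df, I2={i2:.0f}%")
```

With identical study results, Q is 0 and I2 is reported as 0%, i.e. no variation beyond chance.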
i. Larger trials are given more weight, as a simple mean may provide an unbalanced estimate of the effect
size
ii. Types
1. Fixed effect model = used with LOW heterogeneity
a. Inference based on studies at hand
b. So the assumption is that identical studies should produce identical results
i. Thus, any difference is only from within-study random variation
c. Combines all the studies according to weight
2. Random effect model = used with HIGH heterogeneity
a. Studies are treated as a random sample from all the possible studies in the universe
(hypothetical)
b. Accounts for between and within study random variation
c. More weight given to smaller studies
d. Wider 95% confidence interval
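The contrast between the two models can be sketched in Python as below. This is an illustration under stated assumptions, not a definitive implementation: it uses inverse-variance weighting, and the between-study variance (tau2) is estimated with the DerSimonian-Laird method, which is one common choice that these notes do not name. Function names and the three example studies are hypothetical.

```python
import math


def dersimonian_laird_tau2(effects, variances):
    """Between-study variance estimate (DerSimonian-Laird; an assumed choice)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - df) / c)


def pool(effects, variances, tau2=0.0):
    """Inverse-variance pooled estimate, 95% CI, and relative study weights.

    tau2 = 0 gives the fixed effect model; tau2 > 0 adds between-study
    variance to every study, giving the random effect model.
    """
    w = [1 / (v + tau2) for v in variances]
    sw = sum(w)
    estimate = sum(wi * y for wi, y in zip(w, effects)) / sw
    se = math.sqrt(1 / sw)
    ci = (estimate - 1.96 * se, estimate + 1.96 * se)
    return estimate, ci, [wi / sw for wi in w]


# Hypothetical studies: large, medium, small (variance grows as size shrinks)
effects, variances = [0.10, 0.30, 0.60], [0.01, 0.04, 0.09]
tau2 = dersimonian_laird_tau2(effects, variances)
fixed_est, fixed_ci, fixed_w = pool(effects, variances)
random_est, random_ci, random_w = pool(effects, variances, tau2)

print("fixed :", round(fixed_est, 3), [round(x, 3) for x in fixed_ci], [round(x, 2) for x in fixed_w])
print("random:", round(random_est, 3), [round(x, 3) for x in random_ci], [round(x, 2) for x in random_w])
```

In this example the smallest study's share of the weight is larger under the random effect model and the confidence interval is wider, matching points c and d above.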