What the mind can conceive and believe, it can achieve. You can make a fortune, build an empire, change the world. But don't try to lose weight because that never works.
Objectives:
Target audience: Clinical research associates and other research support staff.
Time expected: ~2 hours.
fg 2005 cartoon: a doctor hands a woman (accompanied by her dog) goggles like the ones he is wearing; a box of truth goggles sits on the cabinet. Caption: "Just pop these on before we start, Mrs Foobar."
Basic Outline:
Clinical trials history and ethics
Statistics worldview
Sample size and confidence intervals
Endpoint and design hierarchies
Effects researchers worry about
Random assignment
Multiplicity
Typical clinical trial practices and why
Randy Glasbergen 2001 cartoon: a man with a pie and a flip chart under his arm talks to a woman in a business suit with a briefcase. Caption: "I make my pie charts with real pie! Everyone who pays attention gets a slice at the end of the meeting."
August 2006 bill.heavlin@comcast.net
Clinical Trials History
1901 Walter Reed, yellow fever; Clara Maass
1925 RA Fisher, agriculture
1931 Amberson, tuberculosis
1947 Nuremberg Code
1948 BMRC
1961 FO Kelsey
1993 HHS, WHO
Nuremberg Code
1. voluntary consent
2. potential benefit
3. based on animal experiments and literature
4. avoids suffering, injury
5. no a priori reason to expect death
6. risk proportionate to benefit
7. proper facilities for care
8. research only by qualified scientists
9. subjects may withdraw consent at any time
10. scientist may terminate at any time
Clinical Trial Phases
cartoon: two men at a table whose tablecloth is scribbled full; one is yanking the tablecloth from the adjacent, occupied table. Caption: "And for phase two..."
phase I: safety, tolerability, pharmacokinetics, pharmacodynamics. Usually involves ramping doses.
phase II: expanded version of phase I; IIa for dosing, IIb for efficacy. When drugs fail, they usually fail in phase II.
phase III: definitive, hence expensive, assessment of efficacy; typically double-blind. Necessary for a regulatory submission.
phase IV: post-launch surveillance. Rarer adverse effects are detected here.
[Figure: two candidate probability distributions, (a) and (b), for a single die roll.]
After one observation, can you tell whether it is (a) or (b)? After 1000 observations? How much data, then?
Statistical Approach
Pick a statistic, e.g. the average die roll.
Pick a sample size, e.g. n = 20 rolls.
Pick a rule, e.g. average roll > 4.
If the die is fair, the chance that the statistic is larger than the threshold should be low. If the die is not fair, the chance that the statistic is larger than the threshold should be high.
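The three-step recipe above can be sketched as a small simulation. The loaded die below is a hypothetical example chosen only to illustrate the idea; the fair die, sample size n = 20, and the "average > 4" rule are the slide's own:

```python
import random

FAIR = [1, 2, 3, 4, 5, 6]       # fair die, mean 3.5
LOADED = [2, 3, 4, 5, 6, 6]     # hypothetical unfair die, mean about 4.3

def average_roll(faces, n=20, rng=random):
    """Statistic: the average of n rolls of the given die."""
    return sum(rng.choice(faces) for _ in range(n)) / n

def rejection_rate(faces, threshold=4.0, trials=10_000):
    """How often the rule 'average roll > threshold' fires for this die."""
    hits = sum(average_roll(faces) > threshold for _ in range(trials))
    return hits / trials

random.seed(0)
print(rejection_rate(FAIR))     # low: the rule rarely fires on a fair die
print(rejection_rate(LOADED))   # high: the rule usually fires on a loaded die
```

Re-running with different seeds shows the same picture: the fair die rarely trips the rule, the loaded die almost always does.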
[Figure: histograms of the averages of 20 rolls, 100,000 samples each, for two dice (D0n and D1n).]
averages of 34
D0n sample: 100000 1 fair die 1.5 1.0 0.5 0.0 2.0 3.0 4.0 1 = 97.6% % > 4.0 = 5% % > 4.0
D1n sample: 100000 larger of 3.0 2 rolls 2.0 1.0 0.0 3.0 4.0
5.0
[Figure: two power-style curves labeled "Design A (better)" and "Design B", plotted from the null hypothesis across values .25 to .45; vertical scale 0.0 to 0.8.]
Confidence Intervals
Sampling distribution: A die roll has a probability distribution of its values, and a statistic, e.g. the average of 20, has a distribution of its values. We observe only one observation from the sampling distribution.
[Figure: relative probability of the average of 20 rolls (D0n, 100,000 samples).]
We observe only one observation from the sampling distribution, and so there are various true distributions that are consistent with this one value. A Confidence Interval is the range of true values that are (for a given probability) consistent with the observed statistic.
With n=20, observing a mean between 2.75 and 4.25 would be consistent with a true mean of 3.5.
[Figure: relative probability of the average of 20 rolls (D0n, 100,000 samples); 95% of the distribution lies between 2.75 and 4.25.]
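Under a normal approximation (an assumption; the slide makes the same point by simulation), the 2.75 to 4.25 interval can be reproduced from the fair die's standard deviation:

```python
import math

FACES = (1, 2, 3, 4, 5, 6)

def die_sd(faces=FACES):
    """Standard deviation of a single roll of the given die."""
    mu = sum(faces) / len(faces)
    return math.sqrt(sum((f - mu) ** 2 for f in faces) / len(faces))

def interval(center, n, z=1.96):
    """Normal-approximation 95% interval for the average of n rolls."""
    half = z * die_sd() / math.sqrt(n)
    return center - half, center + half

lo, hi = interval(3.5, n=20)
print(f"{lo:.2f} to {hi:.2f}")   # roughly 2.75 to 4.25, as in the figure
```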
[Figure: error bars on a 10 to 14 scale, comparing confidence intervals for averages of 10 with prediction intervals for single observations.]
Confidence Intervals
For a given probability, the range of true population parameters (e.g. population averages) that are consistent with the observed data.
They are the error bars of summary statistics (e.g. averages, and so on).
They give the sizes of the treatment effects, hence are the focus of clinical trials.
They grow ever smaller as the sample size increases.
Prediction Intervals
For a given probability, the range of single values one can expect to observe from a population.
They are the error bars of single observations (i.e. averages of n = 1).
They give the range of outcomes for individual subjects.
They converge to a constant size as the sample size increases.
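The contrast between the two interval types (CIs shrink, PIs level off) can be checked numerically. The formulas below are the standard normal-approximation ones, applied to the fair-die example used throughout:

```python
import math

# standard deviation of one roll of a fair die, about 1.71
DIE_SD = math.sqrt(sum((f - 3.5) ** 2 for f in range(1, 7)) / 6)

def ci_half_width(n, z=1.96):
    """Half-width of a 95% confidence interval for the mean of n rolls."""
    return z * DIE_SD / math.sqrt(n)

def pi_half_width(n, z=1.96):
    """Half-width of an approximate 95% prediction interval for one new roll."""
    return z * DIE_SD * math.sqrt(1 + 1 / n)

for n in (10, 100, 1000):
    print(n, round(ci_half_width(n), 2), round(pi_half_width(n), 2))
# the CI column shrinks toward 0; the PI column settles near 1.96 * DIE_SD
```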
Confidence Intervals
Rules of Thumb:
Square root rule: to make the width of a confidence interval smaller by a factor of 2, the sample size must be increased by 2^2 = 4; smaller by a factor of 3, increased by 3^2 = 9.
50 percent power rule: if the sample size n is just large enough to exclude a particular alternative, the probability that the alternative is in the confidence interval is 0.50.
90 percent power rule: n90 ≈ 2.75 × n50, where n50 is the sample size for 50% power.
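As a quick arithmetic check of these rules of thumb (the specific sample sizes below are illustrative):

```python
import math

def n_for_narrower_ci(n0, factor):
    """Square-root rule: shrinking the CI width by `factor` multiplies n by factor**2."""
    return n0 * factor ** 2

def n_90_power(n50):
    """The slide's 90%-power rule of thumb: n90 is about 2.75 times n50."""
    return math.ceil(2.75 * n50)

print(n_for_narrower_ci(20, 2))  # 80: halving the width takes 4x the sample
print(n_for_narrower_ci(20, 3))  # 180: a third of the width takes 9x
print(n_90_power(40))            # 110: 90% power from a 50%-power n of 40
```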
Endpoint Hierarchies
1. Quantitative change from baseline
2. Quantitative outcome
3. Time to event (if any)
4. Count of events
5. Any event (yes/no)
6. Assessment scale
7. Clinician assessment
8. Self- or family report
(Slide annotations along the hierarchy: behavioral, nomothetic, ipsative.)
Design Hierarchies
1. Compare subject to self, i.e. to baseline (ab), or to alternating on-off treatments (abab)
2. Compare to similar subjects, i.e. a stratum based on clinical similarity
3. Compare to those in similar care, i.e. a stratum based on care environment, e.g. hospital, referring clinic, primary care physician
4. Compare one group to another
(Slide annotations along the hierarchy: local control, stratified, unstratified.)
S Harris cartoon: several surgeons around a patient on an operating table; two stand aside, one with his surgical mask down to speak: "We'll just mill around till he's asleep, and then send him back up. This operation is actually for a placebo effect."
Hawthorne effect
Named for the Chicago Hawthorne Works once owned by Western Electric.
Participating in an experimental protocol and measuring clinical outcome can itself improve outcomes: patients may take more care than they would otherwise. If a group knows they are being studied, their outcomes may be biased. We must be able to answer what would happen if the treatment had no effect, in spite of being part of an experiment; hence control groups.
cartoon caption: "I can't explain it... it's just a funny feeling that I'm being Googled."
Wiley cartoon: a couple at the maitre d' station; the maitre d' states, "No, this isn't a men's club. It's just that no woman has ever been able to pass the dress code." A sign reads: "Coat and tie required for male patrons. Women required to withstand the scrutiny of other women."
Selection bias
The tendency of subjects with better prognosis to be assigned to a particular treatment: clinicians are more likely to refer patients for whom they are more hopeful, and work harder on patients when they have more hope. Self-selection bias: patients who actively seek out new or experimental therapies may be different; they may manage their care better and comply better (a form of the Hawthorne effect). To overcome these issues: randomized control groups.
Random Assignment
In an experiment, a method of assigning subjects to treatment and control groups that gives each subject a prescribed (usually equal) chance of being assigned to each group. Randomization is accomplished by tossing a coin, rolling a die, a random number table, etc. On average, it ensures the treatment groups are comparable before treatments begin, and reduces treatment selection bias.
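In practice the coin or random-number table is usually a seeded random number generator. Here is a minimal sketch of equal-allocation randomization; the group names and seed are illustrative, not part of any particular protocol:

```python
import random

def randomize(subject_ids, groups=("treatment", "control"), seed=2006):
    """Shuffle subjects, then deal them out to groups in equal blocks.
    Assumes the number of subjects divides evenly among the groups."""
    rng = random.Random(seed)          # seeded so the allocation is reproducible
    ids = list(subject_ids)
    rng.shuffle(ids)
    per_group = len(ids) // len(groups)
    return {g: sorted(ids[i * per_group:(i + 1) * per_group])
            for i, g in enumerate(groups)}

assignment = randomize(range(1, 21))
print(assignment)   # 10 subjects per arm, assigned without clinician discretion
```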
Placebo effect
The action of a drug or psychological treatment that is not attributable to any specific operations of the agent. When the effect makes things worse, it is sometimes called the nocebo effect. Also called the subject-expectancy effect. For example, a tranquilizer can reduce anxiety both because of its special biochemical action and because the recipient expects relief. Good experimental practice blinds the patients as to which treatment group they are in: single-blind.
Veyant cartoon: a single man at a table with a plate of food, wine glass, and flowers, speaking to a waiter: "No, there's nothing wrong with the food. I just needed a little attention."
Rosenthal effect
The tendency for results to conform to experimenters' expectations unless stringent safeguards are instituted to minimize human bias. Also called the Pygmalion effect and the observer-expectancy effect. Robert Rosenthal, professor of psychology at UC Riverside, performed many of the first experiments revealing the problem. Good experimental practice blinds the clinicians, researchers, and follow-up interviewers to which group any patient belongs: double-blind.
Stratum effect
Certain treatment locations have generally better outcome than others, perhaps because the treatment is better, or because the associated patient population is healthier.
cartoon: Guggenheim Museum in New York, looked upon by two men in convertible car.
Good experimental practice has, at each treatment location, both treatment and control groups. Good analysis practice compares the treatment outcomes of a stratum only to its local control group, within the same stratum: stratified.
cartoon: two bald men in a prison cell, on upper and lower bunks; sign: "Many years later." Caption: "My third felony was a smart move. Folks on the outside are still waiting for health care."
Treatment compliance
Once a patient is assigned to a treatment group, all subsequent behavior is in some sense an outcome also. Comparing only the patients who adhered closely to their assigned treatments creates a selection bias. Good data-analysis practice analyzes outcomes based on which treatment group the patient was intended for: intention to treat (ITT). Compliance can sometimes be corrected for during data analysis.
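A toy simulation makes the ITT argument concrete. The model below is entirely hypothetical: the treatment has no effect at all, but sicker patients comply less, so a compliers-only ("per-protocol") comparison manufactures an effect that the ITT comparison does not:

```python
import random
from statistics import mean

def simulate_null_trial(n=50_000, seed=1):
    """Null trial: outcome depends only on underlying health, never on treatment.
    Healthier treated patients are more likely to comply (a toy assumption)."""
    rng = random.Random(seed)
    itt_treat, itt_control, compliers = [], [], []
    for _ in range(n):
        health = rng.gauss(0, 1)
        outcome = health + rng.gauss(0, 1)      # treatment truly does nothing
        if rng.random() < 0.5:                  # randomized to treatment
            itt_treat.append(outcome)
            if rng.random() < (0.8 if health > 0 else 0.5):
                compliers.append(outcome)       # per-protocol keeps only these
        else:
            itt_control.append(outcome)
    return (mean(itt_treat) - mean(itt_control),   # ITT estimate
            mean(compliers) - mean(itt_control))   # per-protocol estimate

itt, per_protocol = simulate_null_trial()
print(round(itt, 2))           # near 0: ITT recovers the true (null) effect
print(round(per_protocol, 2))  # clearly positive: compliance bias at work
```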
Mark Parisi cartoon: an "Acupuncture" sign on the wall; one porcupine lies on its stomach, another is poised beside the table. Caption: "Oh, jeepers... I've completely lost track here... Whadaya say I just randomly pull out some needles and we'll call it even?"
Multiplicity Issues
Heads-tails, "2 out of 3" game:
First toss: H, T
Second toss: HH, HT, TH, TT
Third toss: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
One toss is fair, and best of 3 is fair; but picking the number of tosses after seeing the first toss is not fair: the probability of winning becomes 5/8.
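The 5/8 figure can be verified by enumerating all eight equally likely three-toss sequences: the "cheater" calls heads, stops after one toss whenever that toss wins, and otherwise insists on playing out best-of-3.

```python
from itertools import product

def optional_stopping_wins():
    """Count wins for a player who stops after one toss if it is heads,
    and otherwise insists on playing out best-of-3."""
    wins = 0
    for tosses in product("HT", repeat=3):
        if tosses[0] == "H":
            wins += 1                 # quit while ahead (4 of 8 sequences)
        elif tosses.count("H") >= 2:
            wins += 1                 # first toss lost, best-of-3 won (THH only)
        # all remaining sequences are losses
    return wins

print(optional_stopping_wins(), "out of 8 sequences")   # 5 out of 8
```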
Typical Clinical Trial Practices
Interim analyses: schedule interim looks into the formal protocol, and raise the threshold of significance to keep the type I error at the stated level.
Endpoints: pick a single primary endpoint, so that the others become only secondary; or form one composite endpoint by combining several.
Groups: one treatment vs. one control condition, avoiding analyses within strata. Isolate dose-finding to phase II.
cartoon caption: "You've got one foot in the grave. Further testing will determine if it's your left or your right."
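The "raise the threshold" point can be illustrated with the simplest correction, Bonferroni. Real protocols use group-sequential boundaries (e.g. O'Brien-Fleming), and the independence assumption below overstates the inflation somewhat, but the direction is right:

```python
def familywise_error(alpha, looks):
    """Chance of at least one false positive across `looks` independent tests."""
    return 1 - (1 - alpha) ** looks

def bonferroni(alpha, looks):
    """Stricter per-look threshold that keeps the overall type I error near alpha."""
    return alpha / looks

looks = 5
print(round(familywise_error(0.05, looks), 3))                     # about 0.226 uncorrected
print(round(familywise_error(bonferroni(0.05, looks), looks), 3))  # about 0.049 corrected
```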