
"The method that proceeds without analysis is like the

groping of a blind man. Socrates

Questions Addressed
What are the issues involved in comparing two or more system configurations?
What is hypothesis testing?
How do common random numbers work?


Purpose
Purpose: comparison of alternative system designs.
Approach: discuss a few of the many statistical methods that can be used to compare two or more system designs.
Statistical analysis is needed to discover whether observed differences are due to:
differences in design, or
the random fluctuation inherent in the models.

Outline
For two-system comparisons:
Independent sampling.
Correlated sampling (common random numbers).
For multiple-system comparisons:
Bonferroni approach: confidence-interval estimation, screening, and selecting the best.

Comparing Two Systems
The challenge is to determine if differences are attributable to actual differences in performance and not to statistical variation.
Running multiple replications, or batches, is required.

Hypothesis Testing
Used when outcomes are close or high precision is required.
A hypothesis is first formulated (e.g., that methods A and B both result in the same throughput), and then a test is made to see whether the results of the simulation lead us to reject the hypothesis.

What does it mean if you fail to reject a hypothesis?
1. Our hypothesis is correct, OR
2. The variance in the observed outcomes is too high, given the number of replications, to be conclusive (run more replications or use variance-reduction techniques).

Types of Errors
Type I error: rejecting a true hypothesis.
Type II error: accepting a false hypothesis.

            Hypothesis true     Hypothesis false
Accept      correct decision    Type II error
Reject      Type I error        correct decision

Possible Confidence Intervals
[Figure: three possible confidence intervals, (A), (B), and (C), for U1 - U2, drawn against the reference line U1 - U2 = 0; [------|------] denotes a confidence interval. An interval that contains 0 leads to "fail to reject H0"; an interval that lies entirely to one side of 0 leads to "reject H0".]

Comparison of Two Buffer Strategies
Assume we have two candidate strategies believed to maximize the throughput of the system.
We offer two methods based on the confidence-interval approach.
We seek to discover whether the mean throughputs of the system under Strategy 1 and Strategy 2 are significantly different.
We begin by estimating the mean performance of the two proposed strategies by simulating each strategy for 16 days past the warm-up period.
The simulation experiment was replicated 10 times for each strategy.
The throughput achieved by each strategy is shown next.

Comparison of Two Buffer Strategies

(A) Replication       (B) Strategy 1 Throughput    (C) Strategy 2 Throughput
1                     54.48                        56.01
2                     57.36                        54.08
3                     54.81                        52.14
4                     56.20                        53.49
5                     54.83                        55.49
6                     57.69                        55.00
7                     58.33                        54.88
8                     57.19                        54.47
9                     56.84                        54.93
10                    55.29                        55.84

Mean                  56.30                        54.63
Standard deviation    1.37                         1.17
Variance              1.89                         1.36

Welch Confidence Interval
Requires that the observations drawn from each simulated system be normally distributed and independent, both within a population and between populations.

[Slides 9-13 through 9-16 apply the Welch confidence interval to the buffer-strategy data above: the estimated degrees of freedom are approximately 17.5, giving the critical value t = T.INV.2T(0.05, 17.5) = 2.10.]
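As a cross-check on those numbers, here is a short Python sketch (NumPy and SciPy are assumed to be available) that reproduces the Welch calculation from the raw throughput data in the table; the degrees of freedom come out near 17.5 and the t quantile near 2.10.

```python
# Welch (unequal-variance) confidence interval for the buffer-strategy data.
import numpy as np
from scipy.stats import t

x1 = np.array([54.48, 57.36, 54.81, 56.20, 54.83, 57.69, 58.33, 57.19, 56.84, 55.29])
x2 = np.array([56.01, 54.08, 52.14, 53.49, 55.49, 55.00, 54.88, 54.47, 54.93, 55.84])
n1, n2 = len(x1), len(x2)

m1, m2 = x1.mean(), x2.mean()              # about 56.30 and 54.63
v1, v2 = x1.var(ddof=1), x2.var(ddof=1)    # about 1.89 and 1.36

# Welch degrees of freedom and standard error
df = (v1 / n1 + v2 / n2) ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
se = np.sqrt(v1 / n1 + v2 / n2)
t_crit = t.ppf(0.975, df)                  # roughly 2.10 for df near 17.5

half_width = t_crit * se
print(f"df = {df:.1f}, t = {t_crit:.2f}")
print(f"95% CI for the difference in means: {m1 - m2:.2f} +/- {half_width:.2f}")
```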

Paired-t CI for Comparing 2 Systems
Like the Welch CI method, the paired-t CI method requires that the observations drawn from each population be normally distributed and independent within a population.
However, the paired-t CI method does not require that the observations between populations be independent.

Paired-t CI for Comparing 2 Systems
This allows us to use common random numbers to force a positive correlation between the two populations in order to reduce the half-width.
Finally, like the Welch method, the paired-t CI method does not require that the populations have equal variances.
Given n observations (n1 = n2 = n), we pair the observations from each population (x1 and x2) to define a new random variable: x(1-2)j = x1j - x2j.


Paired-t Comparison

(A) Replication (j)    (B) Strategy 1 Throughput x1j    (C) Strategy 2 Throughput x2j    (D) Difference (B - C), x(1-2)j = x1j - x2j
1                      54.48                            56.01                            -1.53
2                      57.36                            54.08                            3.28
3                      54.81                            52.14                            2.67
4                      56.20                            53.49                            2.71
5                      54.83                            55.49                            -0.66
6                      57.69                            55.00                            2.69
7                      58.33                            54.88                            3.45
8                      57.19                            54.47                            2.72
9                      56.84                            54.93                            1.91
10                     55.29                            55.84                            -0.55

Mean of differences: 1.67    Standard deviation: 1.85    Variance: 3.42

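For illustration, a minimal Python sketch of the paired-t interval built from the difference column (D) above; SciPy is assumed to be available for the t quantile.

```python
# Paired-t confidence interval from the replication differences.
import math
from scipy.stats import t

d = [-1.53, 3.28, 2.67, 2.71, -0.66, 2.69, 3.45, 2.72, 1.91, -0.55]
n = len(d)
mean_d = sum(d) / n                                       # about 1.67
var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)       # about 3.42
half_width = t.ppf(0.975, n - 1) * math.sqrt(var_d / n)   # t with n - 1 = 9 d.f.

print(f"95% CI for the mean difference: {mean_d:.2f} +/- {half_width:.2f}")
```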

Comparing More than 2 Systems
Bonferroni approach
ANOVA
Factorial design and optimization experiments


Factorial Design
Tests system response(s) when multiple factors are being manipulated.
Input variables are the factors.
Output measures are the responses.

Two-level Full-factorial Design
Define a high and low level setting for each factor.
Try every combination of factor settings (a short enumeration sketch follows).
Run detailed studies for factors that have the greatest impact.
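As an illustration of the enumeration step, here is a minimal Python sketch; the factor names, levels, and the run_simulation placeholder are hypothetical and not taken from the slides.

```python
# Enumerate every low/high combination of the factors for a 2^k full factorial.
from itertools import product

factors = {
    "buffer_size": (5, 20),        # (low, high) - illustrative levels
    "num_operators": (2, 4),
    "conveyor_speed": (1.0, 1.5),
}

def run_simulation(settings):
    """Placeholder for one simulation run at the given factor settings."""
    return 0.0                      # e.g., observed throughput

names = list(factors)
for levels in product(*(factors[n] for n in names)):
    settings = dict(zip(names, levels))
    response = run_simulation(settings)
    print(settings, "->", response)
```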

Fractional-factorial Design
Strategically "screen out" factors that have little or no impact on system performance.
On the remaining factors, run a full-factorial experiment (a small half-fraction sketch follows).
Run detailed studies for factors that have the greatest impact.
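For a concrete, purely illustrative example of a fraction, the sketch below builds a 2^(3-1) half fraction by aliasing a third factor C with the AB interaction; the factor names and coding are placeholders.

```python
# Half fraction of a 2^3 design using the defining relation C = A*B
# with -1/+1 coded factor levels.
from itertools import product

runs = []
for a, b in product((-1, 1), repeat=2):   # full factorial in A and B
    c = a * b                             # C is aliased with the AB interaction
    runs.append({"A": a, "B": b, "C": c})

for run in runs:
    print(run)                            # 4 runs instead of 8
```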

Variance Reduction
Common Random Numbers (CRN)
Evaluates each system under the exact same circumstances.
Helps ensure that observed differences between system designs are due to differences in the designs and not to differences in experimental conditions.
Antithetic Random Numbers (ARN)

CRN Continued

Table 7.10. Comparison of two buffer allocation strategies using common random numbers.

(A) Replication    (B) Strategy 1 Throughput    (C) Strategy 2 Throughput    (D) Difference (B - C)
1                  79.05                        75.09                        3.96
2                  54.96                        51.09                        3.87
3                  51.23                        49.09                        2.14
4                  88.74                        88.01                        0.73
5                  56.43                        53.34                        3.09
6                  70.42                        67.54                        2.88
7                  35.71                        34.87                        0.84
8                  58.12                        54.24                        3.88
9                  57.77                        55.03                        2.74
10                 45.08                        42.55                        2.53

Mean difference = 2.67
Standard deviation of differences = 1.16
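As an illustration (not shown on the slide), combining these summary statistics with the critical value $t_{0.025,\,9} = 2.262$ gives a 95% paired-t confidence interval for the mean difference under CRN of

$$\bar{D} \pm t_{0.025,\,9}\,\frac{S_D}{\sqrt{R}} = 2.67 \pm 2.262 \cdot \frac{1.16}{\sqrt{10}} \approx 2.67 \pm 0.83,$$

which lies entirely above zero.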

More Detail on Comparison of Two System Designs
Goal: compare two possible system configurations, e.g.:
two possible ordering policies in a supply-chain system,
two possible scheduling rules in a job shop.
Approach: the method of replications is used to analyze the output data.
The mean performance measure for system i is denoted by θi (i = 1, 2); we obtain point and interval estimates for the difference in mean performance, namely θ1 - θ2.

Comparison of Two System Designs
Vehicle-safety inspection example:
The station performs 3 jobs: (1) brake check, (2) headlight check, and (3) steering check.
Vehicle arrivals: Poisson with a rate of 9.5 per hour.
Present system:
Three stalls in parallel (one attendant makes all 3 inspections at each stall).
Service times for the 3 jobs: normally distributed with means 6.5, 6.0, and 5.5 minutes, respectively.
Alternative system:
Each attendant specializes in a single task; each vehicle passes through three work stations in series.
Mean service times for each job decrease by 10% (5.85, 5.4, and 4.95 minutes).
Performance measure: mean response time per vehicle (total time from vehicle arrival to its departure).
(A rough simulation sketch of both configurations follows.)
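The sketch below uses the SimPy library and is intended only to show the structure of the comparison; the normal standard deviation of 0.5 minutes for each service time and the 8-hour run length are assumptions, since the slide does not state them.

```python
# Minimal sketch: present (3 parallel stalls) vs. alternative (3 stations in series).
import random
import simpy

RATE = 9.5 / 60.0   # vehicle arrivals per minute (9.5 per hour)
SD = 0.5            # assumed std. dev. of each normal service time (minutes)

def service(mean):
    return max(0.0, random.normalvariate(mean, SD))   # truncate at zero

def parallel_model(env, results):
    stalls = simpy.Resource(env, capacity=3)
    def vehicle(env):
        arrive = env.now
        with stalls.request() as req:
            yield req
            for mean in (6.5, 6.0, 5.5):               # one attendant does all 3 jobs
                yield env.timeout(service(mean))
        results.append(env.now - arrive)
    def source(env):
        while True:
            yield env.timeout(random.expovariate(RATE))
            env.process(vehicle(env))
    env.process(source(env))

def series_model(env, results):
    stations = [simpy.Resource(env, capacity=1) for _ in range(3)]
    means = (5.85, 5.4, 4.95)                          # specialists are 10% faster
    def vehicle(env):
        arrive = env.now
        for st, mean in zip(stations, means):
            with st.request() as req:
                yield req
                yield env.timeout(service(mean))
        results.append(env.now - arrive)
    def source(env):
        while True:
            yield env.timeout(random.expovariate(RATE))
            env.process(vehicle(env))
    env.process(source(env))

def replicate(model, seed, run_minutes=8 * 60):
    random.seed(seed)
    env = simpy.Environment()
    results = []
    model(env, results)
    env.run(until=run_minutes)
    return sum(results) / len(results)                 # mean response time (minutes)

if __name__ == "__main__":
    for r in range(5):
        y1 = replicate(parallel_model, seed=r)         # present system
        y2 = replicate(series_model, seed=1000 + r)    # alternative system
        print(f"replication {r + 1}: present {y1:.2f} min, alternative {y2:.2f} min")
```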

Comparison of Two System Designs
From replication r of system i, the simulation analyst obtains an estimate Yir of the mean performance measure θi.
Assuming that the estimators Yir are (at least approximately) unbiased:
θ1 = E(Y1r), r = 1, ..., R1;   θ2 = E(Y2r), r = 1, ..., R2.
Goal: compute a confidence interval for θ1 - θ2 to compare the two system designs.
Confidence interval for θ1 - θ2 (a small helper implementing this decision rule follows):
If the c.i. is totally to the left of 0, there is strong evidence for the hypothesis that θ1 - θ2 < 0 (θ1 < θ2).
If the c.i. is totally to the right of 0, there is strong evidence for the hypothesis that θ1 - θ2 > 0 (θ1 > θ2).
If the c.i. contains 0, there is no strong statistical evidence that one system is better than the other.
If enough additional data were collected (i.e., Ri increased), the c.i. would most likely shift, and would definitely shrink in length, until the conclusion θ1 < θ2 or θ1 > θ2 could be drawn.
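A tiny Python helper (illustrative only) that encodes this decision rule:

```python
# Interpret a confidence interval for theta1 - theta2.
def interpret_ci(lower, upper):
    if upper < 0:
        return "strong evidence that theta1 < theta2"
    if lower > 0:
        return "strong evidence that theta1 > theta2"
    return "no strong evidence that either system is better"

print(interpret_ci(-1.9, -0.4))   # interval entirely to the left of 0
print(interpret_ci(-0.3, 0.8))    # interval contains 0
```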

Comparison of Two System Designs
A two-sided 100(1 - α)% c.i. for θ1 - θ2 always takes the form

$$\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2} \pm t_{\alpha/2,\nu}\,\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}),$$

where Ȳ.i is the sample mean performance measure for system i over all replications, and ν is the degrees of freedom.
To calculate the standard error, the analyst uses one of two statistical techniques.
Both techniques assume that the basic data Yir are approximately normally distributed.
We will discuss these two methods next.

Comparison of Two System Designs
Statistically significant versus practically significant:
Statistical significance: is the observed difference Ȳ.1 - Ȳ.2 larger than the variability in Ȳ.1 - Ȳ.2?
Practical significance: is the true difference θ1 - θ2 large enough to matter for the decision we need to make?
Confidence intervals do not answer the question of practical significance directly; instead, they bound the true difference within a range.

Independent Sampling with Equal Variances
[Comparison of 2 systems]
Different and independent random-number streams are used to simulate the two systems.
All observations of simulated system 1 are statistically independent of all observations of simulated system 2.
The variance of the sample mean Ȳ.i is

$$V(\bar{Y}_{\cdot i}) = \frac{\sigma_i^2}{R_i}, \qquad i = 1, 2.$$

For independent samples:

$$V(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = V(\bar{Y}_{\cdot 1}) + V(\bar{Y}_{\cdot 2}) = \frac{\sigma_1^2}{R_1} + \frac{\sigma_2^2}{R_2}.$$

Independent Sampling with Equal Variances
[Comparison of 2 systems]
If it is reasonable to assume that σ1² = σ2² (approximately), or if R1 = R2, a two-sample-t confidence-interval approach can be used.
The point estimate of the mean performance difference is Ȳ.1 - Ȳ.2.
The sample variance for system i is

$$S_i^2 = \frac{1}{R_i - 1}\sum_{r=1}^{R_i}\left(Y_{ri} - \bar{Y}_{\cdot i}\right)^2 = \frac{1}{R_i - 1}\left(\sum_{r=1}^{R_i} Y_{ri}^2 - R_i\,\bar{Y}_{\cdot i}^2\right).$$

The pooled estimate of σ² is

$$S_p^2 = \frac{(R_1 - 1)S_1^2 + (R_2 - 1)S_2^2}{R_1 + R_2 - 2},$$

with ν = R1 + R2 - 2 degrees of freedom.
The c.i. is given by

$$\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2} \pm t_{\alpha/2,\nu}\,\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}),$$

with standard error

$$\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = S_p\sqrt{\frac{1}{R_1} + \frac{1}{R_2}}.$$
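A short Python sketch of this pooled-variance interval, applied for illustration to the summary statistics of the buffer-strategy data above (SciPy is assumed for the t quantile):

```python
# Two-sample-t confidence interval with a pooled variance estimate.
import math
from scipy.stats import t

def pooled_ci(mean1, mean2, s1_sq, s2_sq, r1, r2, alpha=0.05):
    """Two-sided 100(1-alpha)% CI for theta1 - theta2, assuming equal variances."""
    sp_sq = ((r1 - 1) * s1_sq + (r2 - 1) * s2_sq) / (r1 + r2 - 2)  # pooled variance
    se = math.sqrt(sp_sq) * math.sqrt(1.0 / r1 + 1.0 / r2)         # standard error
    nu = r1 + r2 - 2                                               # degrees of freedom
    half = t.ppf(1 - alpha / 2, nu) * se
    point = mean1 - mean2
    return point - half, point + half

# Summary statistics from the buffer-strategy table (R1 = R2 = 10).
print(pooled_ci(mean1=56.30, mean2=54.63, s1_sq=1.89, s2_sq=1.36, r1=10, r2=10))
```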

Independent Sampling with Unequal Variances
[Comparison of 2 systems]
If the assumption of equal variances cannot safely be made, an approximate 100(1 - α)% c.i. for θ1 - θ2 can be computed with standard error

$$\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = \sqrt{\frac{S_1^2}{R_1} + \frac{S_2^2}{R_2}}$$

and degrees of freedom

$$\nu = \frac{\left(S_1^2/R_1 + S_2^2/R_2\right)^2}{\left(S_1^2/R_1\right)^2/(R_1 - 1) + \left(S_2^2/R_2\right)^2/(R_2 - 1)}, \quad \text{rounded to an integer.}$$

A minimum number of replications R1 > 7 and R2 > 7 is recommended.

Common Random Numbers (CRN)
[Comparison of 2 systems]
For each replication, the same random numbers are used to simulate both systems.
For each replication r, the two estimates, Yr1 and Yr2, are correlated.
However, independent streams of random numbers are used on different replications, so the pairs (Yr1, Yr2) are mutually independent across replications.
Purpose: induce positive correlation between Yr1 and Yr2 (for each r) to reduce the variance of the point estimator Ȳ.1 - Ȳ.2:

$$V(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = V(\bar{Y}_{\cdot 1}) + V(\bar{Y}_{\cdot 2}) - 2\,\mathrm{cov}(\bar{Y}_{\cdot 1}, \bar{Y}_{\cdot 2}) = \frac{\sigma_1^2}{R} + \frac{\sigma_2^2}{R} - \frac{2\rho_{12}\sigma_1\sigma_2}{R},$$

where the correlation ρ12 is positive.
The variance of Ȳ.1 - Ȳ.2 arising from CRN is therefore less than that of independent sampling (with R1 = R2 = R).

Common Random Numbers (CRN)
[Comparison of 2 systems]
The estimator based on CRN is more precise, leading to a shorter confidence interval for the difference.
Sample variance of the differences Dr = Yr1 - Yr2, where D̄ = Ȳ.1 - Ȳ.2:

$$S_D^2 = \frac{1}{R-1}\sum_{r=1}^{R}\left(D_r - \bar{D}\right)^2 = \frac{1}{R-1}\left(\sum_{r=1}^{R} D_r^2 - R\,\bar{D}^2\right), \qquad \bar{D} = \frac{1}{R}\sum_{r=1}^{R} D_r,$$

with R - 1 degrees of freedom.
Standard error:

$$\mathrm{s.e.}(\bar{D}) = \mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = \frac{S_D}{\sqrt{R}}.$$

Common Random Numbers (CRN)
[Comparison of 2 systems]
It is never enough to simply use the same seed for the random-number generator(s):
The random numbers must be synchronized: each random number used in one model for some purpose should be used for the same purpose in the other model.
E.g., if the i-th random number is used to generate a service time at work station 2 for the 5th arrival in model 1, the i-th random number should be used for the very same purpose in model 2 (a small sketch using dedicated streams follows).
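To make the synchronization idea concrete, here is a small self-contained Python sketch; the single-server queueing model and all of its parameters are invented for illustration. Each purpose (arrivals, service times) gets its own dedicated stream, both designs reuse the same streams on each replication, and the sample variance of the CRN differences typically comes out much smaller than under independent sampling.

```python
# Common random numbers with synchronized, dedicated streams (illustrative model).
import math
import random
import statistics

def run_model(mean_service, arrival_seed, service_seed, n_jobs=500):
    """Single-server queue; returns the average time in system."""
    arrivals = random.Random(arrival_seed)   # dedicated stream: interarrival gaps
    services = random.Random(service_seed)   # dedicated stream: service-time uniforms
    clock = finish = total = 0.0
    for _ in range(n_jobs):
        clock += arrivals.expovariate(1.0)                       # next arrival time
        svc = -mean_service * math.log(1.0 - services.random())  # inverse transform
        start = max(clock, finish)
        finish = start + svc
        total += finish - clock
    return total / n_jobs

R = 10
crn, indep = [], []
for r in range(R):
    # CRN: both designs reuse the same arrival and service streams on replication r.
    y1 = run_model(0.8, arrival_seed=r, service_seed=1000 + r)
    y2 = run_model(0.7, arrival_seed=r, service_seed=1000 + r)
    crn.append(y1 - y2)
    # Independent sampling: fresh streams for the second design.
    y2i = run_model(0.7, arrival_seed=5000 + r, service_seed=6000 + r)
    indep.append(y1 - y2i)

print("CRN         mean diff %.3f, sample variance %.3f"
      % (statistics.mean(crn), statistics.variance(crn)))
print("Independent mean diff %.3f, sample variance %.3f"
      % (statistics.mean(indep), statistics.variance(indep)))
```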

Comparison of Several System Designs
To compare K alternative system designs based on some specific performance measure θi of system i, for i = 1, 2, ..., K.
Procedures are classified as:
Fixed-sample-size procedures: a predetermined sample size is used to draw inferences via hypothesis tests or confidence intervals.
Sequential sampling (multistage): more and more data are collected until an estimator with a pre-specified precision is achieved or until one of several alternative hypotheses is selected.
Some goals/approaches of system comparison:
Estimation of each parameter θi.
Comparison of each performance measure θi to a control θ1.
All pairwise comparisons θi - θj, for all i not equal to j.
Selection of the best θi.

Bonferroni Approach
[Multiple Comparisons]
Used to make statements about several parameters simultaneously, where all statements hold simultaneously with a given overall confidence.
Bonferroni inequality:

$$P(\text{all statements } S_j \text{ are true},\; j = 1, \ldots, C) \;\ge\; 1 - \sum_{j=1}^{C} \alpha_j = 1 - \alpha_E.$$

The overall error probability αE provides an upper bound on the probability of a false conclusion.
The smaller αj is, the wider the j-th confidence interval will be.
Major advantage: the inequality holds whether the models are run with independent sampling or with CRN.
Major disadvantage: the width of each individual interval increases as the number of comparisons increases.

Bonferroni Approach
[Multiple Comparisons]
Should be used only when a small number of comparisons is made; a practical upper limit is about 10 comparisons.
Three possible applications (a small sketch follows the list):
Individual c.i.s: construct a 100(1 - αj)% c.i. for parameter θi, where the number of comparisons = K.
Comparison to an existing system: construct a 100(1 - αj)% c.i. for parameter θi - θ1 (i = 2, 3, ..., K), where the number of comparisons = K - 1.
All pairwise comparisons: for any two different system designs, construct a 100(1 - αj)% c.i. for parameter θi - θj; hence, the total number of comparisons = K(K - 1)/2.
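A small Python sketch of the bookkeeping behind these three applications; the values of K, R, and the overall error probability αE are illustrative, and SciPy supplies the t quantile.

```python
# Bonferroni adjustment: split the overall error probability alpha_E equally
# across the C statements so the family holds with probability >= 1 - alpha_E.
from scipy.stats import t

alpha_E = 0.05
K = 4                               # number of system designs (assumed)
R = 10                              # replications per design (assumed)

scenarios = {
    "individual c.i.s": K,
    "comparison to an existing system": K - 1,
    "all pairwise comparisons": K * (K - 1) // 2,
}
for name, C in scenarios.items():
    alpha_j = alpha_E / C           # equal split across the C statements
    t_crit = t.ppf(1 - alpha_j / 2, df=R - 1)
    print(f"{name}: C={C}, alpha_j={alpha_j:.4f}, t={t_crit:.3f}")
```

Note how the critical value, and hence each interval's half-width, grows as the number of comparisons C increases, which is the disadvantage mentioned above.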
