
Contents

CHAPTER 1: Contents i

CHAPTER 2: Introduction 1
Using this Manual .................................................................................... 1
What’s New in SigmaStat 3.1 ................................................................... 2
System Requirements................................................................................ 4
SigmaStat Procedures ............................................................................... 4
Major SigmaStat Features......................................................................... 6
Using SigmaStat Toolbars ........................................................................ 8
Saving Your Work ................................................................................ 10
Setting Options .................................................................................... 10
Exiting SigmaStat ................................................................................... 11
Getting Help........................................................................................... 11
Systat on the World Wide Web .............................................................. 12
References............................................................................................... 12

CHAPTER 3: Notebook Manager Basics 15


Notebook Manager Overview................................................................. 15
Opening and Closing Notebooks in the Notebook Manager.................. 18
Protecting Notebooks............................................................................. 21

CHAPTER 4: Using the Worksheet 27


Using SigmaStat Worksheets .................................................................. 28
Using Excel Worksheets in SigmaStat ................................................... 29


Entering Data in a SigmaStat Worksheet ............................................... 30


Moving Around the SigmaStat Worksheet.............................................. 31
Selecting a Block of Data ........................................................................ 32
Cutting, Copying, Pasting, Moving, and Deleting Data......................... 33
Stacking Columns ................................................................................... 34
Inserting and Deleting Blocks, Columns, and Rows............................... 35
Using Special Worksheet Shortcuts ...................................................... 36
Insertion and Overwrite Modes .............................................................. 37
Entering and Promoting Column and Row Titles .................................. 37
Setting Worksheet Display Options........................................................ 41
Customizing Individual Worksheets....................................................... 50
Sorting Data............................................................................................ 52
Viewing the Column Statistics Worksheet .............................................. 54
Saving Worksheets to SigmaStat Notebooks .......................................... 57
Exporting Worksheets............................................................................. 58
Opening Worksheets .............................................................................. 59
Importing Data....................................................................................... 60
Switching Rows to Columns ................................................................... 63
Arranging Data for t-Tests and ANOVAs................................................ 64
Arranging Data for Contingency Tables ............................................. 69
Arranging Data for Regressions .............................................................. 71
Indexing Data ......................................................................................... 73
Using Transforms to Modify Data.......................................................... 76
Printing Worksheet Data........................................................................ 80

CHAPTER 5: Using the Advisor Wizard 83


Select what you need to do...................................................................... 83
How are the data measured? ................................................................... 85
Did you apply more than one treatment per subject? ............................. 86
How many groups or treatments are there?............................................. 87
What kind of data do you have?.............................................................. 91
What kind of prediction do you want to make? ..................................... 92


What kind of curve do you want to use?................................................. 93


How do you want to specify the independent variables? ........................ 94
How do you want SigmaStat to select the
independent variable? ............................................................................. 95

CHAPTER 6: Using SigmaStat Procedures 97


Running SigmaStat Procedures .............................................................. 97
Choosing the Procedure to Use ............................................................ 103
Describing Your Data with Basic Statistics.......................................... 104
Choosing the Group Comparison Test to Use...................................... 113
Choosing the Repeated Measures Test to Use ...................................... 117
Choosing the Rate and Proportion Comparison to Use ....................... 122
Choosing the Prediction or Correlation Method.................................. 123
Choosing the Survival Analysis to Use.................................................. 126
Testing Normality ................................................................................ 127
Determining Experimental Power and Sample Size ............................. 133

CHAPTER 7: Working with Reports 135


Setting Report Options......................................................................... 135
Generating Reports............................................................................... 137
Editing Reports .................................................................................... 137
Moving Around Reports....................................................................... 141
Saving Reports to Notebooks ............................................................... 142
Exporting Reports ................................................................................ 143
Opening Reports ................................................................................ 144
Closing and Deleting Reports............................................................... 144
Printing Reports ................................................................................... 145

CHAPTER 8: Creating and Modifying Graphs 147


Generating Report Graphs ................................................................... 148
Creating Exploratory Graphs ............................................................... 162
Creating Exploratory Graphs ............................................................... 175


Setting Graph Page Options ................................................................. 178


Zooming In and Out on Graphs ........................................................... 181
Selecting Graphs and Labels ................................................................. 182
Resizing and Moving Graphs on the Page ............................................ 182
Modifying Graph Attributes ................................................................. 184
Creating and Editing Labels on the Graph Page................................... 186
Using SigmaPlot to Modify Graphs ...................................................... 189
Cutting and Copying Graphs and Other Page Objects......................... 190
Pasting Graphs and Other Objects onto a SigmaStat
Graph Page ........................................................................................... 190
Pasting SigmaStat Graphs into other Applications ............................... 198
Saving Graphs in Notebook Files.......................................................... 200
Opening Graph Pages ........................................................................... 200
Closing and Deleting Graph Pages ....................................................... 201
Printing Graph Pages............................................................................ 202

CHAPTER 9: Comparing Two or More Groups 203


About Group Comparison Tests ........................................................... 203
Data Format for Group Comparison Tests ........................................... 204
Unpaired t-Test .................................................................................. 206
Mann-Whitney Rank Sum Test ............................................................ 220
One Way Analysis of Variance .............................................................. 230
Two Way Analysis of Variance (ANOVA) ............................................. 253
Three Way Analysis of Variance (ANOVA)........................................... 283
Kruskal-Wallis Analysis of Variance on Ranks ..................................... 310

CHAPTER 10: Comparing Repeated


Measurements of the Same Individuals 329
About Repeated Measures Tests............................................................ 329
Data Format for Repeated Measures Tests............................................ 330
Paired t-Test ......................................................................................... 332
Wilcoxon Signed Rank Test.................................................................. 345


One Way Repeated Measures Analysis of Variance (ANOVA).............. 355


Two Way Repeated Measures Analysis of Variance (ANOVA) ............. 379
Friedman Repeated Measures Analysis of Variance on Ranks .............. 408

CHAPTER 11: Comparing Frequencies,


Rates, and Proportions 429
About Rate and Proportion Tests ......................................................... 429
Data Format for Rate and Proportion Tests......................................... 431
Comparing Proportions Using the z-Test............................................. 434
Chi-Square Analysis of Contingency Tables ......................................... 442
The Fisher Exact Test ........................................................................... 451
McNemar’s Test.................................................................................... 457

CHAPTER 12: Prediction and Correlation 465


About Regression.................................................................................. 465
Correlation ........................................................................................... 467
Data Format for Regression and Correlation ....................................... 468
Simple Linear Regression ..................................................................... 469
Multiple Logistic Regression ................................................................ 527
Polynomial Regression ......................................................................... 553
Stepwise Linear Regression................................................................... 577
Best Subsets Regression ........................................................................ 611
Pearson Product Moment Correlation ................................................. 623
Spearman Rank Order Correlation ...................................................... 631
Nonlinear Regression ........................................................................... 636

CHAPTER 13: Survival Analysis 667


About Survival Analysis ........................................................................ 667
Data Format for Survival Analysis........................................................ 668
Single Group Survival Analysis............................................................. 670
LogRank Survival Analysis ................................................................... 679
Gehan-Breslow Survival Analysis.......................................................... 693


Failures, Censored Values and Ties....................................................... 708


Survival Curve Graph Examples ........................................................... 709

CHAPTER 14: Computing Power and Sample Size 713


About Power ......................................................................................... 713
About Sample Size ................................................................................ 714
Determining the Power of a t-Test........................................................ 714
Determining the Power of a Paired t-Test ........................................... 717
Determining the Power of a z-Test Proportions Comparison............... 719
Determining the Power of a One Way ANOVA.................................... 721
Determining the Power of a Chi-Square Test ....................................... 723
Determining the Power to Detect a Specified Correlation.................... 726
Determining the Minimum Sample Size for a t-Test ............................ 728
Determining the Minimum Sample Size for a Paired t-Test ................. 730
Determining the Minimum Sample Size for a
Proportions Comparison ...................................................................... 733
Determining the Minimum Sample Size for a One Way Anova ........... 735
Determining the Minimum Sample Size for a Chi-Square Test............ 738
Determining the Minimum Sample Size to Detect a
Specified Correlation ............................................................................ 741

CHAPTER 15: Using Transforms 745


Types of Transforms ............................................................................. 745
Quick Mathematical Transforms .......................................................... 747
Using Quick Transforms to Linearize and Normalize Data.................. 749
Centering Data ..................................................................................... 755
Standardizing Data ............................................................................... 758
Ranking Data........................................................................................ 760
Creating Interaction Variables .............................................................. 762

CHAPTER 16: 765


Creating Dummy (Indicator) Variables ................................................ 765


Creating Lagged Variables .................................................................... 775


Filtering Strings and Numbers ............................................................. 777
Generating Random Numbers.............................................................. 781
Translating Missing Value Codes ......................................................... 785
User-Defined Transforms ..................................................................... 787

CHAPTER 17: Glossary 789


1 Introduction

SigmaStat® 3.1 provides a wide range of powerful yet easy-to-use


statistical analyses specifically designed to meet the needs of research
scientists, engineers, and statisticians, without requiring in-depth
knowledge of the math behind the procedures performed.


This chapter describes the organization of this manual and introduces


you to most SigmaStat features. It also covers some basics about using
SigmaStat, including:

➤ Using this manual (page 1)


➤ The system requirements to run SigmaStat (page 4)
➤ SigmaStat procedures (page 4)
➤ Major SigmaStat features (page 6)
➤ Using SigmaStat toolbars (page 8)
➤ Saving your work (page 10)
➤ Setting options (page 10)
➤ Exiting SigmaStat (page 11)
➤ Getting help (page 11)
➤ The Systat world wide web home page (page 12)
➤ Additional references (page 12)

Using this Manual

The SigmaStat User's Guide is designed to provide you with complete


instructions on how to use SigmaStat's statistical procedures. It is
arranged in an order that parallels the steps you would probably follow


in manually performing procedures and tests, including basic


interpretations of results.

Cross-references are hypertext linked for quick navigation.

For an introduction to using SigmaStat, see the printed SigmaStat 3.1


Getting Started Guide. This is also available in PDF format. To view this
PDF, on the Help menu, click View SigmaStat PDFs, and then click
Getting Started.

For an introduction to the Advisor Wizard, see Chapter 4, Using the


Advisor Wizard.

For an overview of all SigmaStat tests, and directions on selecting the


appropriate statistical test or procedure, see Chapter 5, Using SigmaStat
Procedures.

To perform individual tests and interpret test and analysis results, see:
Chapter 8, Comparing Two or More Groups; Chapter 9, Comparing
Repeated Measurements of the Same Individuals; Chapter 10,
Comparing Frequencies, Rates, and Proportions; Chapter 12, Survival
Analysis; and Chapter 13, Computing Power and Sample Size.

For a reference of all menu commands and associated dialog box option
functions, see Chapter 15, Reference.

What’s New in SigmaStat 3.1

SigmaPlot 9.0 Integration
Prepare your SigmaStat graphs for publication using SigmaPlot's advanced graph editing. SigmaPlot 9.0 must be installed. For more information, see Using SigmaPlot to Modify Graphs on page 189.

Survival Analysis
Use Survival Analysis to compute the probability of time to an event, such as surviving lung cancer, using Kaplan-Meier (product limit) survival curve estimation. Run Single Group, Log Rank, or Gehan-Breslow tests, and SigmaStat automatically generates a graph page and report. Survival curve options include error bars and confidence intervals, and fraction or percent scales. Survival Analysis supports both raw and indexed data formats, and it allows for multiple definitions of event and censored data. For more information, see Chapter 12, Survival Analysis.

Improved Worksheet
SigmaStat 3.1's new worksheet can now handle larger data sets of 32 million rows by 32,000 columns. Adjust row height and column width, enter longer text strings and variable names, add row titles, perform multiple undo, and format cells and empty columns. For more information, see Chapter 3, Using the Worksheet.

Improved Importing
Import MS Access, SPSS, and previous SigmaPlot and SigmaStat files.

Multiple Undo
Experiment with different annotations on your graph page, worksheet, or report, then quickly undo the last several changes and start again.

Improved Reports
SigmaStat reports now support .pdf and .html export, the ability to insert a date/time field, decimal-aligned tabs, more keyboard control options for improved navigation, a new Formatting toolbar that includes text justification, Print Preview, and an improved Find/Replace dialog box with a Go To option. For more information, see Chapter 6, Working with Reports.

Graph Improvements
Graph improvements include graph legends, multiple undo, step plots, and more symbols. Create and re-create graphs with fewer clicks using the new Graph Wizard. Select a range of data in a column, rather than the whole column. Graph single X or Y data, or when creating multiple plots, choose X many Y or many X and Y data formats. When creating histograms, indicate the number of bins. For more information, see Chapter 7, Creating and Modifying Graphs.

Transform Language Additions
Added to the transform language are parameter determination functions: ape, dsinp, fwhm, inv, lowess, lowpass, sinp, x25, x50, x75, xatymax, and xwtr.

Performance Improvements
SigmaStat 3.1's statistical tests, curve fitting, and transforms are now faster than ever.

New Technology
SigmaStat 3.1 is compatible with Windows 2000 and Windows XP.


System Requirements

SigmaStat version 3.1 requires Windows 2000, Windows 98, Windows


NT, or Windows XP.

Client

Minimum Hardware Requirements:
➤ Pentium 200 or better
➤ 64MB RAM
➤ 48MB available hard disk space
➤ CD-ROM drive
➤ SVGA/256 color graphics adapter (800 x 600, High Color recommended)

Minimum Software Requirements:
➤ Windows 98, Windows NT 4.0, Windows 2000, or Windows XP

Server

Minimum Hardware Requirements:
➤ Pentium 450 recommended
➤ 64MB RAM
➤ 48MB available hard disk space
➤ CD-ROM drive
➤ SVGA/256 color graphics adapter (800 x 600, High Color recommended)

Minimum Software Requirements:
➤ Windows 98, Windows NT 4.0, Windows 2000, or Windows XP
➤ Windows NT networking or Novell 3.2 networking

SigmaStat Procedures

SigmaStat performs specified statistical tests and other analyses on data


you enter or import into the worksheet. The following tests and
procedures are available in SigmaStat:

➤ Basic descriptive statistics for data.


➤ Independent and paired t-tests.
➤ Mann-Whitney Rank Sum and Wilcoxon Signed Rank tests
(nonparametric tests).
➤ One Way ANOVA, Two Way ANOVA, Three Way ANOVA, and
Kruskal-Wallis One Way ANOVA on Ranks (nonparametric
ANOVA).


➤ One Way Repeated Measures ANOVA and Friedman Repeated


Measures ANOVA on Ranks (nonparametric Repeated Measures
ANOVA).
➤ Two Way Repeated Measures ANOVA, with either one or two
repeated measurements.
➤ Automatically performed multiple comparison procedures (post-hoc
testing) using the Tukey test, Student-Newman-Keuls test, Duncan’s
test, Fisher’s LSD test, Bonferroni t-test, Dunn’s test, and Dunnett’s
test with all pairwise comparisons, comparisons versus a control
group, and multiple contrasts comparisons available.
➤ Automatic normality assumption testing with the Kolmogorov-
Smirnov test, and equal variance assumption testing with the Levene
median test.
➤ z-test comparison of proportions.
➤ Chi-Square Analysis of Contingency Tables, McNemar’s test, and
the Fisher Exact test.
➤ Bivariate analysis using Simple Linear Regression.
➤ Multivariate analysis using Multiple Linear and Logistic Regression,
including Forward and Backward Stepwise selection of independent
variables.
➤ Best subsets selection of Multiple Regression models.
➤ Polynomial Regression, either incremental orders or a specific order.
➤ Nonlinear Regression with user-defined models.
➤ Single group, LogRank, and Gehan-Breslow survival analysis.
➤ Normality, constant variance, multicollinearity, and influential and
outlying point testing of regression data.
➤ Pearson Product Moment and Spearman Rank Order Correlation
Coefficients.
➤ Power and sample size determination for an experiment you want
to perform.

Viewing Test Reports
Test reports automatically appear after the test is complete. Reports can be edited, exported, printed, and saved to the notebook.

Generating Report Graphs
You can also generate graphs from test reports. Report graphs can be edited, printed, and saved to notebook files.

Creating Exploratory Graphs
You can create a variety of graphs using your worksheet data. These graphs can be edited, printed, and saved to the notebook.

SigmaStat Transforms
You can also use the complete set of data transforms to center, standardize, and otherwise modify your data prior to tests.

Major SigmaStat Features

Several special SigmaStat features provide you with new standards of ease of use, power, and flexibility.

The Advisor Wizard
The Help menu Advisor command starts the Advisor Wizard. The Advisor asks you a series of questions about what you want to do with your data and the kind of data you have. Answering these questions enables SigmaStat to select the appropriate test for your analysis.

Assumption Checking
Statistical tests and analyses assume that your data possesses certain characteristics that underlie the methods used to test the data. For the appropriate tests, SigmaStat can automatically check whether your data is compatible with assumptions of:

➤ Normality
➤ Equal variance
➤ Constant variance
➤ Multicollinearity
➤ Outlying and influential points
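
SigmaStat performs these checks automatically when you run a test; there is nothing you need to compute yourself. Purely as an illustration of what a normality check and an equal-variance check calculate, the following Python sketch uses NumPy and SciPy (neither is part of SigmaStat), on two hypothetical groups of measurements. SigmaStat's own Kolmogorov-Smirnov and Levene median implementations may differ in their details.

```python
# Illustration only; SigmaStat runs its assumption checks automatically.
# A rough sketch of a normality check and an equal-variance check using
# SciPy on two hypothetical groups of measurements.
import numpy as np
from scipy import stats

group1 = np.array([4.2, 5.1, 4.8, 5.5, 4.9, 5.2])
group2 = np.array([6.0, 5.7, 6.3, 5.9, 6.1, 6.4])

# Normality: Kolmogorov-Smirnov test of group1 against a normal
# distribution fitted with the sample mean and standard deviation.
ks_stat, ks_p = stats.kstest(group1, "norm",
                             args=(group1.mean(), group1.std(ddof=1)))

# Equal variance: Levene's test centered on the group medians
# (the "median" variant referred to above).
lev_stat, lev_p = stats.levene(group1, group2, center="median")

print(f"normality p = {ks_p:.3f}, equal-variance p = {lev_p:.3f}")
```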

Missing and Unbalanced Data Handling
SigmaStat can perform One and Two Way (two factor) ANOVA and Repeated Measures ANOVA on data that contains missing values or with unbalanced designs, without requiring you to supply your own estimates of the missing data. A sophisticated general linear model is automatically used to provide least squares estimates of the means for the cells.

If you have empty ANOVA table cells, SigmaStat automatically provides choices for the appropriate procedures to follow.

Data Transforms
SigmaStat provides a complete array of data transformations. These can be used to modify data to better fit the assumptions of tests, or otherwise modify it before performing a statistical procedure.

Use data transforms to:


➤ Add, subtract, divide, and find absolute values.


➤ Compute squares and square roots, logs and natural logs, and
reciprocals and exponentials.
➤ Generate random numbers.
➤ Center, standardize, rank, and filter data.
➤ Define dummy and lagged variables, and variable interactions.
➤ Compute arcsin square roots to normalize percentage data.
➤ Convert missing value codes to the double “--” dash symbol used by
SigmaStat.

SigmaStat also includes a transform language that can be used to generate complex custom transforms, which can be saved to and loaded from transform files.
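
As a concrete illustration of two of the transforms listed above, the arcsine square root for percentage data and centering, the sketch below shows the underlying arithmetic in Python with NumPy. This is not SigmaStat transform-language syntax, and the percentage values are hypothetical.

```python
# Illustration only, not SigmaStat transform syntax: the arithmetic behind
# the arcsine square-root transform for percentage data, and behind
# centering, shown with NumPy on hypothetical values.
import numpy as np

percentages = np.array([2.0, 15.0, 50.0, 85.0, 98.0])

# Arcsine square root: convert percent to a proportion, take the square
# root, then the arcsine (result is in radians).
arcsin_sqrt = np.arcsin(np.sqrt(percentages / 100.0))

# Centering: subtract the column mean from every value in the column.
centered = percentages - percentages.mean()

print(arcsin_sqrt)
print(centered)
```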

User-Defined Nonlinear Regression
Both Linear and Nonlinear Regression are provided. The functions used for Nonlinear Regression can be defined as any function of up to ten independent variables and up to 25 parameters. Full use of the transform language is supported, and many other nonlinear regression procedure settings are customizable. You can save these nonlinear procedures to and load them from nonlinear regression fit files.

Importing Data
SigmaStat can import data from most common spreadsheet formats, dBASE files, and ASCII text files.

➤ Microsoft Excel (.xls)


➤ Plain Text (.asc, .txt, .prn, .dat)
➤ Comma Delimited (.csv)
➤ Microsoft Access (.mdb)
➤ SPSS (.sav)
➤ SigmaPlot 1.0, 2.0 Worksheet (.spw)
➤ SigmaPlot Macintosh 4 Worksheet (.sp5)
➤ SigmaPlot Macintosh 5 Worksheet (.spw)
➤ SigmaStat 1.0 Worksheet (.spw)
➤ SigmaPlot DOS Worksheet (.sp5, .SPG)
➤ SigmaStat DOS Worksheet (.sp5)
➤ SigmaScan, SigmaScan Pro Worksheets (.spw)
➤ Mocha, SigmaScan Image Worksheets (.moc)
➤ Axon Binary (.abf, .dat)


➤ Axon Text (.atf, etc.)


➤ Lotus 1-2-3 (.wks, .wk1, .wk2, .wk3, .wk4)
➤ Dbase (.db2, .db3, .dbf)
➤ Quattro Pro (.wq1, .wkg)
➤ Paradox (.db)
➤ Symphony (.wk1, .wr1, .wrk, .wks)
➤ SYSTAT (.sys, .syd)
➤ TableCurve 2D & 3D (.tvc, .txt, .prn)
➤ DIF

Exporting Data
SigmaStat can export data in the following file formats:

➤ SigmaStat 2.0 Notebook (.snb)


➤ SigmaStat 1.0 (.spw)
➤ Microsoft Excel (.xls)
➤ Plain Text (.txt)
➤ Comma Delimited (.csv)
➤ Tab Delimited (.tab)
➤ SigmaScan, SigmaScan Pro (.spw)
➤ Mocha, SigmaScan Image (.moc)
➤ DIF (.dif)

Using SigmaStat Toolbars

SigmaStat can display two toolbars at the top of the SigmaStat window.
The standard toolbar provides quick and easy access to statistical tests
and the most commonly used commands. The formatting toolbar
provides easy access to text attributes commands for report graphs.

FIGURE 1–1
The SigmaStat Toolbar. The standard toolbar buttons include New Notebook, Open, Save, Print, Cut, Copy, Paste, Undo, Redo, New Worksheet, New Excel Worksheet, the Select Test drop-down list, Run Test, Rerun Test, Test Options, Create Graph, Zoom, Advisor, and Help.


FIGURE 1–2
The Formatting Toolbar. The formatting toolbar buttons include Boldface, Italics, Underline, Left Align, Center, Right Align, Justified, and Single, 1½, and Double spacing.
For more information on how to use the formatting toolbar, see page
137 in the REPORTS chapter.

Viewing and Hiding Toolbars
Choose the View menu Toolbars command to open the Toolbars dialog box, then select or clear options to view or hide selected toolbars.

FIGURE 1–3
The Toolbars Dialog Box

Selected toolbars are displayed; cleared toolbars are hidden. Select or


clear a toolbar by clicking the option.

Check the Large Buttons check box to increase the size of standard and formatting toolbar buttons. Clear the Color Buttons check box to make color toolbar buttons monochrome. Clear the Show ToolTips check box to hide the toolbar help tags.

Positioning Toolbars
The toolbars can be moved from their default position to anywhere on the screen, including changing from a horizontal to a vertical position.

To move a toolbar from its fixed position, click anywhere in the toolbar
that is not a button, and drag the toolbar to the desired place on the
screen. The toolbar appears as a floating palette when it is not attached
to the SigmaStat window.

To reattach the floating palette toolbar to a fixed position in the


SigmaStat window, click anywhere in the toolbar that is not a button or
the title bar, drag it on to the menu bar, bottom, or left or right edge of
the SigmaStat window, then release the mouse button. If you are


dragging the toolbar to the edge of the window, release the mouse
button after the toolbar flips to a vertical position.

Once the toolbar is attached to the edge of the SigmaStat window it


remains in a vertical position until it is reattached to the top or bottom
of the window. If you are attaching the vertical toolbar to anywhere at
the top or bottom of the SigmaStat window, release the mouse button
after the toolbar flips to a horizontal position.

Saving Your Work

To save the notebook and its associated data, report, and graphs, choose
the File menu Save command. The first time you save your work, you
must select or enter a file name and/or directory. The default file
extension is .SNB for Notebook files. Select OK to save the notebook.

! Note that each worksheet, its associated reports and graphs, and the
notebook are saved together as a single file.

After initially saving your work to a file, you can continue to save to the
same file name with the Save command, or choose a different file name
and/or destination with the Save As command. Use the Save All
command to save all open notebooks.

Setting Options

Click Options on the Tools menu to set worksheet and graph page
preferences.

Worksheet Options
The worksheet options include numeric display, default column width, number of decimal places, and use of engineering notation. Worksheet preferences are discussed in Setting Worksheet Display Options on page 41.

Page Options
The page options control units of measurement on the page, graph and object resizing options, and page Undo disabling. Graph page preferences are discussed in Setting Graph Page Options on page 178.

Report Options
The report options include units of measurement on the report, setting the number of decimals displayed in the report, enabling or disabling scientific notation, enabling or disabling explanatory text for report results, setting whether or not you want to report only flagged values, and hiding or displaying the report ruler. Report options are discussed in Setting Report Options on page 135.

Exiting SigmaStat

Select either the File menu Exit command, or press Alt+F, then X to leave SigmaStat. You can press Alt+F, X from any location in the program to quit.

All current toolbar and text settings are saved as defaults between
sessions.

Getting Help

SigmaStat's on-line help uses the standard Windows help system. To get
help, select the Help menu, then choose Contents and Index to open the
help search dialog box and search for a specific topic within help.

View the Windows Help on Help or refer to your Windows User's


Manual for more information on using Windows help.

Getting Technical Support
The services of Systat Software Technical Support are available to registered customers. Customers may call Technical Support for assistance in using Systat Software products or for installation help for one of the supported hardware environments. To reach Technical Support, see the Systat Software home page on the World Wide Web at http://www.systat.com, or contact us:

In North America:

Telephone: (866) 797-8288 (8:00 A.M. to 5:00 P.M. Central Time)
Fax: (510) 412-2909
E-mail: techsupport@systat.com

In Europe:

Telephone: 49 2104 / 95480
Fax: 49 2104 / 95410
E-mail: eurotechsupport@systat.com

Systat on the World Wide Web

Explore the Systat Software home page at:

http://www.systat.com

References

Belsley DA, Kuh E, Welsch RE. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley and Sons, 1980.

Dixon WJ, Massey FJ Jr. Introduction to Statistical Analysis (4 ed). New York: McGraw-Hill, 1983.

Fox J. Linear Statistical Models and Related Methods: With Applications to Social Research. New York: John Wiley and Sons, 1984.

Glantz SA. Primer of Biostatistics (3 ed). New York: McGraw-Hill, 1992.

Glantz SA, Slinker BK. Primer of Applied Regression and Analysis of Variance. New York: McGraw-Hill, 1990.

Hosmer DW, Lemeshow S. Applied Survival Analysis: Regression Modeling of Time to Event Data. New York: Wiley, 1999.

Kleinbaum D. Survival Analysis: A Self-Learning Text. New York: Springer-Verlag, 1996.

Kennedy WJ, Gentle JE. Statistical Computing. New York: Marcel Dekker, 1980.

Madansky A. Prescriptions for Working Statisticians. New York: Springer-Verlag, 1988.

Maindonald JH. Statistical Computation. New York: John Wiley and Sons, 1984.

Milliken GA, Johnson DE. Analysis of Messy Data. Volume I: Designed Experiments. New York: Van Nostrand Reinhold, 1984.

Mosteller F, Rourke REK. Sturdy Statistics: Nonparametrics and Order Statistics. Reading MA: Addison-Wesley, 1973.

Myers RH. Classical and Modern Regression with Applications. Boston: Duxbury Press, 1986.

Nash JC. Compact Numerical Methods for Computers: Linear Algebra and Function Minimization. New York: John Wiley & Sons, Inc., 1979.

Ott L. An Introduction to Statistical Methods and Data Analysis (3 ed). Boston: PWS-Kent, 1984.

Press WH, Flannery BP, Teukolsky SA, Vetterling WT. Numerical Recipes. Cambridge: Cambridge University Press, 1986.

Searle SR. Linear Models for Unbalanced Data. New York: John Wiley and Sons, 1987.

Siegel S. Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill, 1956.

Thisted RA. Elements of Statistical Computing. New York: Chapman and Hall, 1988.

Weisberg S. Applied Linear Regression (2 ed). New York: John Wiley and Sons, 1985.

Winer BJ, Brown DR, Michels KM. Statistical Principles in Experimental Design (3 ed). New York: McGraw-Hill, 1991.

Zar JH. Biostatistical Analysis (2 ed). Englewood Cliffs NJ: Prentice-Hall, 1984.


2 Notebook Manager Basics

SigmaPlot notebook files contain all of your SigmaPlot data and graphs, and are organized within the SigmaPlot Notebook Manager. This chapter covers:

➤ Notebook Manager organization (see page 15).


➤ Saving your work (see page 19).
➤ Creating notebooks and adding notebook items (see page 22).
➤ Opening notebooks and notebook items (see page 23).
➤ Copying, pasting, and deleting notebook items (see page 25).

Notebook Manager Overview

When you first start SigmaPlot, an empty worksheet appears along with the Notebook Manager. The Notebook Manager is a dockable or floating window that displays all open notebooks.

The first time you see the Notebook Manager, it appears with one open
notebook, which contains one section. That section contains one empty
worksheet. Contents of the Notebook Manager appear as a tree
structure, similar to Windows Explorer.


FIGURE 2–1
The Notebook Manager Window

Each open notebook appears as the top level, with one or more sections
at the second level, and one or more items at the third level. Within each
section you can create one worksheet and an unlimited number of graph
pages, reports, equations, and macros. The most recently opened
notebook file appears at the top of the Notebook Manager.


FIGURE 2–2
The Notebook
Manager in a
Docked Position

Modified Notebook Names
An asterisk next to an item in the Notebook Manager indicates that the item has been modified since the last time you saved the notebook.

Notebook Item Names
The default startup notebook is named Notebook1. It contains one notebook section, Section 1, and one worksheet, Data 1. When you save your notebook file, the name of the file appears at the top of the Notebook Manager window. Notebook files use a (.jnb) extension. The default names given to notebook sections and items are Section (number), Data (number) or Excel (number), and Report (number). Regression equations are named when they are created. New items are numbered sequentially.


Opening and Closing Notebooks in the Notebook Manager

You can open as many notebooks as you like. All opened notebooks
appear in the Notebook Manager. You can navigate through the
different open notebooks by selecting them in the Notebook Manager.
You can hide them by clicking the Close button on the upper right-hand
corner of the Notebook window; however, this does not close the item.
It only hides it from view. To close a notebook, use the File menu.

To open a notebook:
1 From the File menu, click Open. The Open dialog box appears.

2 Select a notebook (.jnb) file from the list, and click Open. The
notebook appears in the Notebook Manager.

To close a notebook:
1 Select the notebook to close in the Notebook Manager.

2 Right-click, and from the shortcut menu, click Close Notebook.

You can also choose Close Notebook from the File menu.

Sizing and Docking the Notebook Manager
The Notebook Manager can appear in six states:
➤ Docked with summary information in view.
➤ Docked with summary information hidden.
➤ Floating with summary information in view.
➤ Floating with summary information hidden.
➤ Docked and collapsed.
➤ Hidden.

1 To undock the Notebook Manager, double-click the title bar and


drag it to the desired location.

2 To dock the Notebook Manager and move it back to its original


position, double-click the title bar again.


3 To view summary information, click Show summary information.


To hide it, click Hide summary information.

FIGURE 2–3
The Notebook
Manager Displaying
Summary Information

4 To collapse the Notebook Manager, click the arrow button on the


top right-hand corner of the Notebook Manager when docked. To
view again, click the graph icon.

5 To drag and drop the Notebook manager, click the title bar and drag
the Notebook Manager anywhere on the SigmaPlot desktop.

Saving Your Work
Be sure to save your work at regular intervals.

To save a notebook file for the first time:


1 Click the Save button. The Save As dialog box appears.

2 Navigate to the directory where you want to save your notebook.

3 Type a name for the notebook in the File Name text box.


4 Click Save to save the notebook file and close the Save As dialog
box.

To save changes with the same name and path:


1 Click the Save button on the Standard toolbar. Your file is saved.

To save to a new name and path:


1 On the File menu, click Save As. The Save As dialog box
appears.

2 Navigate to the directory where you want to save your notebook.

3 Type a name for the notebook in the File Name text box.

4 Click Save to save the notebook file and close the Save As dialog
box.

Printing Selected Notebook Items
You can print active worksheets, graph pages, reports, and selected notebook items by clicking the Print button on the Standard toolbar. You can print individual or multiple items from the notebook, including entire sections.

To print one or more items or sections from the notebook:


1 Select one or more items or sections from the notebook.

2 Click the Print button on the Standard toolbar to print the


worksheet using all the default settings.

To set printing options before printing a report, graph page, or worksheet:

1 Open each item.

2 Press Ctrl+P. The Print dialog box appears.

3 Click Properties.


Protecting Notebooks

To ensure security of notebook contents, you can lock notebooks using a


password. This is particularly useful if two or more users are using the
same version of SigmaPlot. You can also use a password to send
confidential data to other SigmaPlot users.

Setting a Password
To set a password:

1 Select the notebook in the Notebook Manager.

2 On the Tools menu, click Password. The Set Password dialog


box appears.

3 In the New Password box, type a password.

4 In the Reconfirm box, type the password again.

5 Click OK.

Changing or Removing a Password
To change or remove a password:
1 Select the notebook in the Notebook Manager.

2 On the Tools menu, click Password. The Set Password dialog


box appears.

3 In the Old Password box, type the old password.

4 In the New Password box, type a new password.

5 If you want to remove this password, leave this box and the Reconfirm box empty.

6 In the Reconfirm box, type the password again.


7 Click OK.

Working with Sections in the Notebook Manager
Notebook sections are place-holders in the notebook. They contain notebook items, but no data. However, you can name, open, and close notebook sections.

You can create as many new sections as you want in a notebook. You
may also create reports within each section to document the items in
each section.

To expand or collapse a section, double-click the section icon or click the


(+) or (-) symbol.

Creating New Items in the Notebook Manager
Using the right-click shortcut menu, you can create new sections and items in the Notebook Manager, such as:

➤ Worksheets
➤ Excel Worksheets
➤ Graph pages
➤ Reports
➤ Equations
➤ Sections
➤ Macros

To create a new section or item:


1 Right-click anywhere in the Notebook Manager that you want the
new section or item to appear.

2 On the shortcut menu click New, and then select the item to
create. The new section or item appears in the Notebook
Manager.

Copying and Pasting to Create New Sections
Another method to create a new notebook section is to copy and paste a section in the notebook window. Whenever you copy and paste a
section, its contents appear at the bottom of the notebook window.
SigmaPlot names and numbers the section automatically. For example, if
you copy notebook Section 3, the new section is named Copy of Section
3.

Copied sections create copies of all items within that section as well.


Renaming Notebook Files and Items
You can change summary information for all notebook files and items.
To change summary information:
1 If the summary information is hidden on the Notebook Manager,
click View summary information.

2 Select the notebook item and edit as appropriate.

In-place Editing Section and Item Names
You can change the name of a notebook section or item in the notebook itself without opening the Summary Information dialog box.

To in-place edit:
1 In the Notebook Manager, click the section or item you want to
rename.

2 Click it a second time.

3 Type the new name.

4 Press Enter. The new section or item name appears.

! To change the name of the notebook, use the Save As dialog box. For more information, see "Saving Your Work" above.

Copying a Page to a Section with No Worksheet
If you copy a graph page into an empty section or a section that has no worksheet, you create an independent page. The independent page retains all its plotted data without the worksheet. You can store the pages
from several different sections that have different data together this way.
However, if you ever create or paste a worksheet into a section, all
independent pages will revert to plotting the data from the new
worksheet.

Use independent pages as templates, or to draw or store objects. You


cannot create graphs for an independent page until it is associated with a
worksheet (and no longer independent).

Opening Files in the Notebook Manager
You can open SigmaPlot files and other types of files as SigmaPlot notebooks.

To open a notebook file that is stored on a disk:


1 Click the Open button on the Standard toolbar. The Open dialog
box appears.

FIGURE 2–4
Open Dialog Box

2 Choose the appropriate drive and directory of the notebook file to


open.

3 Double-click the desired notebook file.

4 If you want to open another type of file, choose the type of file from
the Files of type list.

5 Click Open. The opened notebook appears in the Notebook


Manager.

Opening Worksheets, Reports, and Pages
You can open a worksheet, report, or page by double-clicking its icon in the Notebook Manager. You can also right-click the item, and on the shortcut menu, click Open. Open worksheets, pages, and reports appear in their own windows, and in the notebook as colored icons.

Double-clicking an item that is already open brings the item’s window to


the front.

Opening Multiple Items
You can open as many items as your system's memory allows. You can
open multiple items from multiple notebooks. The selected item appears
highlighted in the Notebook Manager.


Copying and Pasting Items in the Notebook Manager
You can copy and paste items from one open notebook file to another in the Notebook Manager; however, you cannot copy a worksheet into a notebook section that already contains a worksheet.

Copying and pasting pages and worksheets between sections results in


using graph pages as templates. For more information, see “Using Graph
Pages as Templates” in Chapter 5.

To copy and paste a notebook item:


1 Right-click the item in the Notebook Manager that you want to
copy, and on the shortcut menu, click Copy.

2 Right-click the section where you want to paste the item, and on
the shortcut menu, click Paste. The selected item is pasted to the
current notebook and section.

Deleting Items in the Notebook Manager
To delete an item from the Notebook Manager:

1 Select the item and press Delete. The item is deleted.

Items removed from a notebook file using the Delete button are removed permanently.


3 Using the Worksheet

Using the worksheet involves manipulating the worksheet and


worksheet data. This chapter describes how to:

➤ Begin new and open multiple worksheets (page 28)


➤ Begin new and open multiple Excel worksheets (page 29)
➤ Enter data in the worksheet (page 30)
➤ Move around the worksheet (page 31)
➤ Select data (page 32)
➤ Cut, copy, and paste data (page 33)
➤ Stack data columns (page 34)
➤ Insert and delete columns and rows (page 35)
➤ Use worksheet shortcut commands (page 36)
➤ Turn insertion and overwrite modes on and off (page 37)
➤ Enter column and row titles (page 37)
➤ Change how data is displayed in the worksheet and set column
widths (page 41)
➤ Sort data (page 52)
➤ View column statistics (page 54)
➤ Save worksheets to the Notebook Manager (page 57)
➤ Export worksheets (page 58)
➤ Open worksheets (page 59)
➤ Import data (page 60)
➤ Switch worksheet rows to columns (page 63)
➤ Arrange data for t-tests and ANOVAs (page 64)
➤ Arrange data for contingency tables (page 69)
➤ Arrange data for regressions (page 71)
➤ Index data (page 73)


➤ Use transforms to modify data (page 76)


➤ Print worksheet data (page 80)

You will also be introduced to the SigmaStat data transforms, which


enable you to apply mathematical functions to your worksheet data as
well as generate new data. Using transforms is fully covered in the
Transforms and Nonlinear Regression reference .pdf.

Using SigmaStat Worksheets

When you begin SigmaStat, the worksheet automatically appears in its


own section in the current notebook. You can move, resize, minimize,
and close the worksheet window like any other Windows window.

FIGURE 3–1
Example of a Worksheet. Column numbers and titles appear across the top, and row numbers are listed along the left side of the worksheet. Select a cell by clicking it or using the arrow keys to scroll to it; selected cells are highlighted.

Opening New and Multiple Worksheets
To begin a new worksheet, choose the File menu New... command, or click the toolbar button. The New dialog box appears. Select Worksheet from the New drop-down list, select SigmaStat 3.1 from the Type option, then click OK.

FIGURE 3–2
Selecting a Worksheet from
the New Dialog Box


A new SigmaStat Worksheet appears. You can open as many worksheets


as desired. Each worksheet you open is assigned to its own notebook
section.

FIGURE 3–3
Example of Multiple
Worksheets

Using Excel Worksheets in SigmaStat

SigmaStat also supports Microsoft Excel worksheets that you can use to
run tests and create graphs on your data. To open an Excel worksheet in
SigmaStat, choose the File menu New... command, or click the toolbar
button. The New dialog box appears. Select Worksheet from the


New drop-down list, select Excel from the Type option, then click OK. You can also click the toolbar button to start an Excel worksheet.

FIGURE 3–4
Selecting an Excel
Worksheet from
the New Dialog Box

An Excel worksheet appears in its own section of the notebook, and the
Microsoft Excel menus, menu commands, and toolbars appear in
SigmaStat.

Excel worksheets work identically to worksheets when running tests and


making SigmaStat graphs (see Chapter 6), except that text in the first
row of a worksheet column is considered a column title. Many of Excel’s
native features, however, work differently from worksheet features. For
information on how to use the Excel worksheet, refer to your Excel User’s
Manual.

Opening New and Multiple Excel Worksheets
To begin a new Excel worksheet, choose the File menu New... command or click the toolbar button. The New dialog box appears. Select Worksheet from the New drop-down list, select Excel Worksheet from the Type list, then click OK. A new Excel Worksheet appears in the SigmaStat window. You can open as many Excel worksheets as desired. Each worksheet you open is assigned to its own notebook section.

Entering Data in a SigmaStat Worksheet

The worksheet is organized by columns and rows of cells. Since


SigmaStat is designed to organize and analyze data according to column
numbers, enter data by columns.

To enter data into a SigmaStat worksheet:


1 Move the pointer to the cell where you want to begin and click, or
move the cursor to the desired location.

2 Type a number or label. As you type, the characters appear in the


edit window (located in the upper left corner of the window).

To enter numbers using scientific notation, type the number followed by an e and the power (e.g., for 1.26 × 10^-6, you would type 1.26e-6). You can also enter text labels up to 10 characters in length, with both upper and lower case letters. Use the Backspace and Delete keys to correct mistakes.

3 Press Enter. The worksheet highlight automatically moves down one row (the ↓ key operates identically). You can also press the Tab key to move one column to the right or the ↑ key to move one row up. For a description of using insert and overwrite modes, see page 37.

FIGURE 3–5
Entering Data in the Worksheet. The file name for the current worksheet is displayed in the window.

Moving Around the SigmaStat Worksheet

You can move around the worksheet using the scroll bars or by moving
the cell highlight using the keyboard.

Function                              Keystroke
Move one column right/left            → or ←
Move one row up/down                  ↑ or ↓
Move one window view up/down          Page Up, Page Down
Move to end of column                 End
Move to end of worksheet              End+End
Move to top of column                 Home
Move to column one, row one           Home+Home

Going to a Specific Cell
To move the cell highlight to any cell in the worksheet by specifying the column and row number in the Go To Cell dialog box, double-right-click the worksheet icon in the upper left corner of the worksheet, or choose the Edit menu Go To... command. The Go To Cell dialog box appears.

FIGURE 3–6
Using the Go To Dialog Box
to Move to a Specific
Cell in the Worksheet

Enter the desired column and row number. To select the block of cells
between the current highlight location and the new cell, click the Extend
Selection to Cell check box. Click OK to move to the new cell.

Selecting a Block of Data

Use any of the following methods to select a block of worksheet cells.

➤ Drag the mouse over the desired worksheet cells while pressing and
holding down the left mouse button.
➤ Hold down the Shift key and press the arrow, PgUp, PgDn, Home,
or End keys.
➤ Use the Go To... command (see Going to a Specific Cell on page 32)

Selecting Columns and Rows
To select an entire column, move the pointer over the column. When the pointer changes to a downward pointing arrow, click or drag to highlight the desired columns.


FIGURE 3–7
Selecting a Block
of Data in the Worksheet

To select entire rows, move the pointer to the left of the rows. When the
pointer changes to a right-pointing arrow, click or drag to select the
desired rows.

Selecting the Entire Worksheet
To select all data in the worksheet, double-click the worksheet icon in the upper left corner of the worksheet.

Cutting, Copying, Pasting, Moving, and Deleting Data

Use the appropriate Edit menu commands to cut, copy, paste, or clear a selected cell or block. You can also press Ctrl+X, Ctrl+C, and Ctrl+V, or use the corresponding toolbar buttons, to cut, copy, and paste data.

You can also access the Edit menu Cut, Copy, and Paste using the right-
click popup menu. Right-click the column with the data you want to
cut, copy, or paste, then choose Cut, Copy, Paste, or Delete.

Cutting and Copying Data
To remove a selected cell or block of data from the worksheet and copy it to the Clipboard, choose the Edit menu Cut command, click the toolbar Cut button, or press Ctrl+X. To copy a selected cell or block of data from the worksheet to the Clipboard without removing it from the worksheet, choose the Edit menu Copy command, click the toolbar Copy button, or press Ctrl+C.

! The Clipboard is a data buffer which retains the last cut or copied data
block. Subsequent cuts or copies overwrite the current Clipboard contents.

Pasting Data
To paste cut or copied data from the Clipboard, click or move the worksheet cursor to the cell you want to paste the data to; then choose the Paste command, click the toolbar Paste button, or press Ctrl+V. The Clipboard contents appear in the specified cells of the worksheet.

Moving Data
To move a block of data, cut it, select the upper-left cell of the new location, then paste the block.

Deleting Data
Use the Clear command or press the Delete key to permanently erase selected data. The data is not copied to the Clipboard.

Stacking Columns

You can merge the contents of two or more columns by stacking the
column contents on top of each other.

1 Choose the Transforms menu Stack command. The Pick Columns


for Stacked Columns dialog box appears.

2 Select the output column to place the stacked data by clicking the
worksheet column.

3 Select the columns to stack, either by clicking the worksheet


columns, or selecting the column from the Data for Input drop-
down list. Click Finish to stack the contents of the selected input
columns in the selected output column.

FIGURE 3–8
Stacking Data in
the Worksheet

Column 5 contains
the results of stacking
columns 1 through 4.

Note that you cannot stack blocks of data, only entire columns.
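The following is a minimal sketch, in Python rather than SigmaStat, of what stacking amounts to: the output column simply holds the contents of the input columns appended end to end (the column numbers and values here are made up for illustration).

    # Columns 1 through 3 (made-up values); column 4 receives the stacked result.
    column_1 = [1.0, 2.0, 3.0]
    column_2 = [4.0, 5.0]
    column_3 = [6.0, 7.0, 8.0, 9.0]

    column_4 = column_1 + column_2 + column_3
    print(column_4)   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]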


Inserting and Deleting Blocks, Columns, and Rows

You can insert and delete blocks of cells as well as multiple columns or
rows using the Edit menu Insert Cells and Delete Cells commands.

You can cut or delete data and titles in columns by highlighting the
column, then cutting or deleting, but this does not shift or affect other
columns.

1 Select the block, column, or rows to insert or delete by dragging the mouse over the region where you want the empty block to appear or over the block you want to delete.

2 To insert a block, columns, or rows, choose the Edit menu Insert Cells... command. To delete a block, columns, or rows, choose the Delete Cells... command.

3 Select the direction to shift the existing data when the empty block
is inserted or deleted. Select Columns or Rows to insert or delete
the columns or rows specified in the selected block.

FIGURE 3–9
Inserting an Empty Block
of Data into the Worksheet

4 Click OK to insert or delete the block, columns, or rows. The existing data shifts in the specified direction. All data shifts to the right when a new column is inserted or moves down when a new row is inserted.

FIGURE 3–10
The Result of Inserting
an Empty Block with
Cells Shifted Down

! Inserting adds empty (blank) cells to the worksheet. To overwrite empty cells, make sure that you are in Overwrite mode (see below).

Using Special Worksheet Shortcuts

In addition to the menu commands and toolbar buttons referred to in the body of this manual, right-clicking the worksheet displays a right-click popup menu. The commands on the right-click popup menu are
the same as the Edit menu Cut, Copy, Paste, Delete, Transpose Paste,
Insert Cells..., and Delete Cells... commands.

FIGURE 3–11
Viewing a Worksheet
Right-Click Popup Menu


Insertion and Overwrite Modes

Press the Ins key or on the Edit menu click Insertion Mode to switch
between overwrite and insert data entry modes.

If in Insertion mode, “Ins” appears in the status bar. A check mark next
to the Insertion Mode command on the Edit menu also indicates that the
worksheet is in insertion mode. New data entered in a cell does not erase
the previous contents. Any existing data in the column is moved down
one row. If you paste a block of cells, existing data is pushed down and/
or to the right to make room for the pasted cells. If you cut or clear data,
data below the deleted block moves up and/or to the left.

If not in Insertion mode, the worksheet is in overwrite mode. Data entered into a cell replaces any existing data. If you paste a block of data,
the block overwrites existing data.

Entering and Promoting Column and Row Titles

Column and row titles label and identify data. Reports reference column
titles when building tables of results. The Indexing transform also uses
column titles to build index columns. Some column titles are generated
automatically when residuals or other results are placed in the worksheet.
Column titles also appear in the Graph Wizard when picking columns
to plot and can be used in transforms instead of column numbers.

To enter or edit a worksheet column or row title, double-click the title, and enter or edit the title. Press Enter to accept the new title. Labeling a worksheet column keeps the previous column number along with the newly added name.

! You must use at least one text character in every column title. If you need to
use a number as a column title, type a space character (by pressing the space
bar) before the number.

Using the Column and Row Titles Dialog Box   You can enter and edit column and row titles using the Column and Row Titles dialog box.
To enter or edit a column or row title:


1 On the Format menu, click Column and Row Titles.

The Column and Row Titles dialog box appears.

Figure 3–12
Entering a Column Title
Using the Column and Row
Titles Dialog Box

2 Click the Column tab to enter or edit a column title, or the Row
tab to enter or edit a row title.

3 Enter the column or row title in the Title box.

4 To edit an existing title, move to that column by clicking the Next


or Prev buttons, then edit the title.

5 Click OK to close the Column and Row Titles dialog box when
you are finished editing the titles.


Using a Worksheet Row for Column Titles   Enter labels into a row, then use that row for worksheet column titles. This is useful for data imported or copied from spreadsheets.

All the cells of the selected row are promoted, not just those cells which contain column titles. This may affect other data sets in the worksheet.

To use a row for column titles:


1 If necessary, enter the column titles you want to use in a single
worksheet row.

2 Select the cells in the row you want to use as column titles.

3 On the Format menu, click Column and Row Titles.

The Column and Row Titles dialog box appears.

4 Click the Column tab.

The row you wish to promote appears in the Promote row to titles
box.

5 To delete the original row once it has been promoted, select Delete
Promoted Row.

6 Click Promote.

The selected row contents appear as column titles.

7 Click OK to close the Column and Row Titles dialog box.

Using a Worksheet Column for Row Titles   Enter labels into a column, then use that column for worksheet row titles. This is particularly useful for data imported or copied from spreadsheets.

All the cells of the selected column are promoted, not just those cells which contain row titles. This may affect other data sets in the worksheet.

To use a column for row titles:


1 If necessary, enter the row titles you want to use in a single worksheet column.

2 Select the cells in the column you want to use as row titles.

3 On the Format menu, click Column and Row Titles.

The Column and Row Titles dialog box appears.

4 Click the Row tab.

The title of the column you wish to promote appears in the


Promote column to titles box.

5 To delete the original column once it has been promoted, select Delete Promoted Column.

6 Click Promote.

The selected column contents appear as row titles.

7 Click OK to close the Column and Row Titles dialog box.

Using a Cell as a Column or Row Title   Use the Column and Row Titles dialog box to promote individual cells to column and row titles.

To promote individual cells:


1 Click the cell on the worksheet that you want to promote to a
column or row title. Do not select the entire column.

2 On the Format menu, click Column and Row Titles.

The Column and Row Titles dialog box appears.

3 Click the Row tab to promote a row cell to title; click the Column
tab to promote a column cell to a title.


4 Click Promote.

The content of the cell appears as the column or row title.

5 Select Delete Promoted Column or Delete Promoted Row to


delete the original cell once it has been promoted.

6 Click Next or Prev to move to the next desired column or row,


then follow steps 2 through 4.

Setting Worksheet Display Options

Use the Options dialog box to set the default for how data is displayed in the worksheet. You can also set the default column widths and row height. Options set here appear in all subsequently opened worksheets.

The Options dialog box Worksheet tab sets the display for:

➤ Numeric
➤ Date and Time
➤ Statistics
➤ Appearance

To learn more about column statistics, see Viewing the Column


Statistics Worksheet on page 54.

Setting Worksheet Numeric Display   To set the way numbers are displayed in the worksheet, select one of the numeric formats available on the Options dialog box.

To set the numeric display:


1 View the worksheet.

2 On the Tools menu, click Options.


The Options dialog box appears.

Figure 3–13
Selecting a Numeric Display
on the Options Dialog Box

3 In the Settings For list, click Numeric.

4 Select a Numeric format setting from the Display As drop-down


list.

You can choose from one of the following:

Numeric Display           Description                                          Example

E Notation When Needed    Displays worksheet data as scientific notation       12.00
                          only when the length of the value exceeds the
                          width of the cell. The default column width is
                          twelve.

E Notation Always         Always displays data as scientific notation.         12.00e+0
                          The number of decimal places is set in the
                          Decimal Places edit box.

Fixed Decimal             Displays data with a fixed number of decimal         12.00
                          places. Set the number of decimal places in the
                          Decimal Places edit box. The number of decimal
                          places allowed is limited by the column width;
                          the maximum number of decimal places cannot
                          exceed the column width. The default setting
                          for decimal places is two.

General                   Displays data exactly as you enter it in the         12
                          worksheet.

5 Click OK to accept the settings and close the dialog box.

Setting Decimal Places in the Worksheet   The column width limits the number of decimal places allowed; the maximum number of decimal places cannot exceed the column width.

! SigmaStat is only accurate to fifteen significant digits, so data precision is limited to fifteen-place floating point numbers, no matter how many decimal places you specify.

To set decimal places:


1 On the Tools menu, click Options.


The Options dialog box appears.

Figure 3–14
Setting Decimal Places on
the Options Dialog Box

2 In the Settings For list, click Numeric.

3 Select the number of decimal places from the Decimal Places drop-
down list.

4 Click OK to accept the changes and close the dialog box.

Changing Date and Time Display in a Worksheet   SigmaStat has a variety of date/time displays. When you enter a value into a date/time formatted cell, SigmaStat assumes internal date/time information about that value from the year to the millisecond.

For example, if you enter a day and month, you can display the month
and year.

1 On the Tools menu, click Options.


The Options dialog box appears.

Figure 3–15
Using the Options Dialog
Box to Change the Date and
Time Display on the
Worksheet

2 In the Show Settings drop-down list, click Date and Time.

3 To change the Date format, you can type a format listed below, or
select a format from the drop-down list.

Typing         Displays
M/d/yy         No leading 0 for single digit month, day, or year
MM/dd/yy       Leading 0 for single digit month, day, or year
MMMM           Complete month
dddd           Complete day
yyy or yyyy    Complete year
MMM            Three-letter month
ddd            Three-letter day
gg             Era (AD or BC)


4 To change the display Time format, type one of the following examples into the Time box, or select a format from the drop-down list:

Typing                   Displays
hh or h                  12 hour clock
HH or H                  Military hours
mm or m                  Minutes
ss or s                  Seconds
uu or u                  Milliseconds
H, h, m, s, or u         No leading zeroes for single digits
HH, hh, mm, ss, or uu    Leading zero for single digits
tt                       Double letter AM or PM
t                        Single letter AM or PM

5 Click OK to accept the settings and close the dialog box.

Day Zero   Setting a Start Date is only necessary if you are importing numbers to be converted to dates, or converting dates to numbers for export. The starting date must match the date used by the other application.

To set the start date:


1 On the Tools menu, click Options.

The Options dialog box appears.

2 Under Settings For, select Date and Time.

3 Select a date from the Day Zero drop-down list, or type your own
start date. SigmaStat provides three start dates:

➤ 1900
➤ 1904
➤ -4713

Figure 3–16
Using the Options Dialog Box to Set the Start Date for Date and Time Data

The default start date is 1/1/1900.

Day Zero becomes the number 1.00 when you change from Date and
Time to Numbers format. The basic unit of conversion is the day; that
is, whole integers correspond to days. Fractions of numbers convert to
times. Zero and negative numbers entered into the worksheet convert to
days previous to the Day Zero start date.

Conversion between date/time values and numbers can occur for the
calendar range of 4713 BC to beyond the year 4,000 AD. The internal
calendar calculates dates using the Julian calendar until September,
1752. After that, dates are calculated using the Gregorian calendar.
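As a rough illustration only (not SigmaStat's own routine), the sketch below converts serial numbers to dates under the convention described above, taking the default Day Zero of 1/1/1900 and letting the serial number 1.00 correspond to Day Zero itself; it ignores the Julian/Gregorian switch mentioned above.

    from datetime import datetime, timedelta

    DAY_ZERO = datetime(1900, 1, 1)   # default Day Zero; serial 1.00 maps to this date

    def serial_to_datetime(serial, day_zero=DAY_ZERO):
        # Whole days are integers; the fractional part becomes the time of day.
        return day_zero + timedelta(days=serial - 1.0)

    print(serial_to_datetime(2.5))   # 1900-01-02 12:00:00
    print(serial_to_datetime(0.0))   # 1899-12-31 00:00:00 (before Day Zero)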

! If you convert numbers to dates, a start date is applied. If you convert the
dates back to numbers, be sure you use the same start date as when you
converted them, or they will have a different value.

Regional Settings   Drop-down lists in the Options dialog box Worksheet tab use the current date/time settings in your operating system. The Windows Regional Settings control date/time delimiters, 12 or 24 hour clock, and AM/PM display.


Date and time display formats may be affected by your operating system's Regional Settings. For example, if your Time Zones are specified as British (English), your date values appear as dd/mm/yy. If the setting is US (English), your date values appear as mm/dd/yy. If you want to view or modify the current settings, or view alternative settings available on your system, click the Regional Settings button, or modify them directly from the Windows Control Panel.

! Date and time values appear on the worksheet using the date and time
delimiters, generally a forward slash (/) or colon (:).

Changing Worksheet Appearance   Use the Options dialog box to adjust column width and row height, set grid line color and thickness, set the data feedback display, and to set the worksheet font and size.

! To learn how to set data feedback colors, see Setting Data Feedback Colors
below.

To change the worksheet appearance:


1 On the Tools menu, click Options.

The Options dialog box appears.

2 In the Settings For list, click Appearance.

3 To adjust column width and row height, select from the Column Width and Row Height drop-down lists. The maximum number of columns displayed depends on the resolution of your display and the size of the worksheet window.

Note that you can drag the boundaries of worksheet columns and rows to resize them. For more information, see Sizing Individual Columns and Rows on page 51.

Cell entries whose length exceeds the column width are displayed as pound symbols (####).

4 To set color and thickness, select from the Color and Thickness
drop-down lists.

5 To set the font style and size, select from the Font and size drop-
down lists.


Figure 3–17
Using the Options Dialog
Box to Set Worksheet
Line Thickness

6 Click OK to apply the changes and close the dialog box.

Setting Data Feedback Colors   Data Feedback highlights the cells and columns on the worksheet that correspond to the selected curve or datapoint's X and Y values.

1 On the Tools menu, click Options.

The Options dialog box appears.

2 In the Settings For list, click Appearance.

Set data feedback colors and thickness from the X and Y drop-
down lists.

3 Click OK to apply the changes and close the dialog box.


Figure 3–18
Using the Options Dialog
Box to Set Worksheet
Data Feedback Colors

Customizing Individual Worksheets

You can customize worksheet data to appear as Numeric, Text, or Date and Time data without changing the default options set in the Options dialog box. You can also set column width and row height. Changes set here will not affect subsequently created worksheets.

Cells must contain data for the formatting to take effect.

1 Select the row, column or block of cells.

2 On the Format menu, click Cells.

The Format Cells dialog box appears.

3 To set the data display, click the Data tab. Under Type, select
Numeric, Text, or Date and Time data.


FIGURE 3–19
Using the Format Cells
Dialog Box to Set the
Numeric Display on a
Worksheet

The options that appear under Settings reflect your selection. To


learn more, see Setting Worksheet Display Options on page 41.

4 To set the column width and row height, click the Rows and
Columns tab.

You can also drag the boundaries of column and row headings to
resize. To learn more, see Sizing Individual Columns and Rows
below.

5 Select heights and widths from the Height and Width drop-down
lists.

6 To apply these formats to the entire worksheet, select Apply to


entire data region.

Sizing Individual Columns and Rows   If the contents of your column exceed the column width, cell contents display as pound symbols (####). Label entries are truncated.

To change a column width, drag the boundary on the right side of the
column heading until the column is the size you want.

To change a row height, drag the boundary below the row heading until
the row is the size you want.


FIGURE 3–20
Dragging a Column
Heading to increase the
Column Width

Sorting Data

You can sort selected blocks of data in ascending or descending order


according to the order in a key column.

! Because the sort command sorts data in place, if you want the original data to
remain intact, copy the data to a new location and sort the copied data.

To sort selected data:


1 Select the data you want to sort using the mouse or keyboard.
Only the selected columns and rows will be sorted; unselected
values within a column are ignored.


2 Choose the Transforms menu Sort Selection... command. The Sort Selection dialog box appears.

FIGURE 3–21
The Sort
Selection Dialog Box

3 Select the key column to use. If you sort more than one column of
data, the key column is used as the sorting index for all other
selected data. Only the key column is sorted. The rows in the other
selected columns are “attached” to the original rows in the key
column, and follow the rows in the key column as they are sorted.

Note that this will not necessarily sort the other columns in ascending or descending order; instead, their order is determined by how the key column was sorted (a short sketch after these steps illustrates this).

4 Select either Ascending or Descending to sort your data in order of


increasing or decreasing values.

5 Click OK to sort the data in place.
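The sketch below (Python, with made-up values) illustrates the key-column behavior described in step 3: only the key column is ordered, and every other selected column simply follows it row by row.

    key   = [3, 1, 2]          # key column
    other = ['c', 'a', 'b']    # another selected column, "attached" row by row

    order = sorted(range(len(key)), key=lambda i: key[i])   # ascending sort
    key_sorted   = [key[i] for i in order]      # [1, 2, 3]
    other_sorted = [other[i] for i in order]    # ['a', 'b', 'c']

Here the second column happens to end up ordered too, but in general its final order is whatever the key column dictates.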


Viewing the Column Statistics Worksheet

SigmaStat automatically calculates a number of basic statistical values for


all worksheet columns. To view a worksheet of these statistics for the
currently selected worksheet, on the View menu click Column Statistics,
or press F6. The running calculations performed for each column appear
in a Column Statistics worksheet for the data in the current worksheet.

FIGURE 3–22
The Column
Statistics Worksheet

You can close the Column Statistics window by clicking the button
in the upper right corner of the worksheet window, by choosing the
View menu Statistics command again, or by choosing the File menu
Close command.

Column Statistics Options   To display only a portion of the available statistics, use the Options dialog box Worksheet tab, then select the column statistics to show or hide.

A Column Statistics worksheet must be in view.

To specify which statistics are shown or hidden:


1 On the Tools menu, click Options, and then click the
Worksheet tab.

2 Select the statistic(s) you want shown or hidden, then use the
Show and Hide buttons to move the statistics between the Shown
and Not Shown lists.

3 To change the column widths and data display, select the appropriate options. For more information on changing column width, see page 41. For more information on changing other data display settings, see page 41.

Available Statistics The statistics shown in the Column Statistics window are determined by
your settings in the Column Statistics Options dialog box (see the
preceding section, Column Statistics Options). The following statistics
can be displayed in the Column Statistics window. Empty cells, missing
values, and text are ignored in most calculations.

Mean   The arithmetic mean, or average, of all the cells in the column, excluding the missing values. This is defined by:

    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

Std Dev   The sample standard deviation is defined as the square root of the sum of the squared differences of the data samples x_i in the column from their mean, divided by n - 1. Missing values are ignored.

    s = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 }

Std Err   The standard error is the standard deviation of the mean. It is the sample standard deviation divided by the square root of the number of samples. For sample standard deviation s:

    \mathrm{Std\ Err} = \frac{s}{\sqrt{n}}


95% Conf   The value for a 95% confidence interval. The end points of the interval are given by:

    \bar{x} \pm t(\nu, z) \, \frac{s}{\sqrt{n}}

where \bar{x} is the mean, s is the sample standard deviation, and t(\nu, z) is the t statistic for \nu = n - 1 degrees of freedom and z = 1.96, the standard normal percentile equivalent.

99% Conf   The value for a 99% confidence interval. The end points for this interval are computed from the equation for the 95% confidence interval using z = 2.576.
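To make the relationships between these quantities concrete, here is a small worked sketch in Python for a made-up column of five values; note that SigmaStat uses the t statistic t(v, z) for the confidence limits, whereas the half-widths below use the normal percentiles 1.96 and 2.576 directly, so they are only an approximation.

    import math

    column = [4.0, 8.0, 6.0, 5.0, 7.0]

    n       = len(column)
    mean    = sum(column) / n                                            # 6.0
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))  # ~1.58
    std_err = std_dev / math.sqrt(n)                                     # ~0.71

    half_95 = 1.96  * std_err   # approximate 95% half-width
    half_99 = 2.576 * std_err   # approximate 99% half-width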

Size The number of occupied cells in the column, whether they are
occupied by data, text, or missing values.

Sum The arithmetic sum of the data values in the column.

Min The value of the numerically smallest data value in the column,
ignoring missing values.

Max The value of the numerically largest data value in the column.

Min Pos The smallest positive value.

Missing The number of cells in the column occupied by text or missing


values.

Other Text or an empty cell.

Skewness These values represent the asymmetry of a distribution. The


higher the value the more asymmetric the distribution.


Printing Column Statistics   To print or export column statistics, choose the File menu Print... command and choose Statistics from the Print drop-down list. Note that in order to print the name of each statistic, you must print the row titles: click Options, then select Row Titles from the Headers options.

For more information on printing worksheets, see Printing Worksheet Data on page 80.

Saving Worksheets to SigmaStat Notebooks

SigmaStat Worksheets are saved to Notebook files. To save data for the
current worksheet to a notebook file, choose the File menu Save
command, press Ctrl+S, or click the toolbar button.

If you are saving the notebook for the first time, the Save As dialog box
appears prompting you for a file name and path for the notebook file. If
you are saving the worksheet to an existing notebook file, the notebook
is updated to include the new worksheet or the changes to the existing
worksheet.

! To save worksheets as non-notebook files, you must export them using the
File menu Export... command. For more information on exporting
worksheets, see Exporting Worksheets on page 58.

For more information on saving worksheets and other items to notebook


files, see Saving Notebook Files and Items on page 30.



Exporting Worksheets

SigmaStat worksheets can be saved as non-notebook files using the File menu Export... command. Saving worksheets as non-notebook files is useful if you want to edit your data in other spreadsheet applications.

To export a worksheet to a non-notebook file, drag the mouse over the


text you want to save to a file. If no text is selected, the entire worksheet
is exported. Choose the File menu Export... command, then select the
file type to export the worksheet. Worksheets can be exported as the
following file types:

➤ SigmaStat 2.0 (*.snb)


➤ SigmaStat 1.0 (*.spw)
➤ Microsoft Excel® (*.xls)
➤ Plain Text (*.txt)
➤ Comma Delimited (*.csv)
➤ Tab Delimited (*.tab)
➤ SigmaScan, SigmaScan Pro (*.spw)
➤ Mocha, SigmaScan Image (*.moc)
➤ DIF (*.dif )

! Exporting worksheets does not export associated graphs.

For more information on exporting worksheets, see Exporting Notebook


Items to Other File Formats on page 32.

Opening Worksheets

To open a worksheet, choose the File menu Open... command, click the
toolbar button, or press Ctrl+O. When the Open dialog box appears,
select the type of worksheet you want to open by selecting a file type
from the List Files of Type drop-down list, then click OK.

Worksheets in Notebook Files   If you open a SigmaStat Notebook file type (.SNB), a notebook file appears displaying its sections and items. To view the desired worksheet, double-click the worksheet icon in the appropriate notebook section.

For more information on opening notebook files, see Opening and


Viewing Notebook Files and Items on page 25.

Non-Notebook Worksheet Files   Non-notebook files are individual files which are separate from the notebook. They are automatically converted to notebook file format when opened in SigmaStat. You can open the following non-notebook worksheet file types in SigmaStat:

➤ SigmaStat 1.0 (*.spw)
➤ SigmaStat DOS (*.sp5)
➤ SigmaPlot Notebook (*.jnb)
➤ SigmaPlot 1.0, 2.0 (*.spw)
➤ SigmaPlot DOS (*.sp5, *.spg)
➤ SigmaPlot Macintosh 4 (*.sp5)
➤ SigmaPlot Macintosh 5 (*.spw)
➤ Microsoft Excel (*.xls)
➤ Plain Text (*.asc, *.txt, *.prn, *.dat)
➤ Comma Delimited (*.csv)
➤ TableCurve 2D & 3D (*.tvc, *.txt, *.prn)
➤ Lotus 1-2-3 (*.wks, *.wk1, *.wk3, *.wk4)
➤ Quattro Pro (*.wk1, *.wkq)
➤ dBase (*.db2, *.db3, *.dbf)
➤ SigmaScan, SigmaScan Pro Worksheets (*.spw)
➤ Mocha, SigmaScan Image Worksheets (*.spw)
➤ SigmaStat Report (*.rtf)
➤ Mocha and SigmaScan files (*.moc)
➤ DIF (*.dif)

For more information on opening non-notebook files in SigmaStat, see


Opening Non-Notebook Files on page 26.

Importing Data

SigmaStat can import data from the following file types:

➤ MS Excel (*.xls)
➤ Plain Text (*.txt)
➤ Comma Delimited (*.csv)
➤ MS Access (*.mdb)
➤ SPSS (*.sav)
➤ SigmaPlot 1.0, 2.0 Worksheet (*.spw)
➤ SigmaPlot Macintosh 4.0 Worksheet (*.sp5)
➤ SigmaPlot Macintosh 5.0 Worksheet (*.spw)
➤ SigmaStat 1.0 Worksheet (*.spw)
➤ SigmaPlot DOS Worksheet (*.sp5, *.spg)
➤ SigmaStat DOS Worksheet (*.sp5)
➤ SigmaScan, SigmaScan Pro Worksheets (*.spw)
➤ Mocha, SigmaScan Image Worksheets (*.spw)
➤ Axon Binary (*.abf, *.dat)
➤ Axon Text (*.atf, etc.)
➤ Lotus 1-2-3 (*.wks, *.wk1, *.wk3, *.wk4)
➤ DBase (*.db2, *.db3, *.dbf )
➤ Quattro Pro (*.wq1, *.wkq)
➤ Paradox (*.db)
➤ Symphony (*.wk1, *.wr1, *.wrk, *.wks)
➤ SYSTAT (*.sys, *.syd)
➤ TableCurve 2D & 3D (*.tvc, *.txt, *.prn)
➤ DIF (*.dif )

To import data from a selected file:

1 Move the worksheet cursor to the worksheet cell where you want
the imported data to start.

2 Choose the File menu Import Data... command. A file dialog box
displaying the current drive, directory, and files appears.

3 Use the List Files of Type box to select the type of file you want to
import.

4 Change the drive and directory as desired, select the file you want
to read, then click Import, or double-click the file name.
Depending on the type of file, the data is either imported
immediately, or another dialog box appears.

To learn about importing SigmaPlot, SigmaStat, SigmaScan,


Mocha and DIF files, see Importing Worksheets and DIF Files on
page 61. To learn about importing Lotus, Quattro, and dBASE
files, see Importing Excel, Lotus, Quattro, and dBASE Files on
page 62. To learn about importing ASCII files, see Importing Text
Files on page 62.

Imported data appears in the worksheet at the position of the worksheet


cursor. To move your data, cut the data, move to the cell where you want
your data to begin, then paste the data back to the worksheet. To learn
how to cut and paste worksheet data, see Cutting, Copying, Pasting,
Moving, and Deleting Data on page 33.

To learn how to reformat imported data, see the preceding sections.

Importing Worksheets and DIF Files   If importing a SigmaStat, SigmaPlot, SigmaScan, Mocha, or DIF file, the dialog box which appears after selecting the appropriate file from the Import File dialog box enables you to select a range of data to import.

Select the start and end of the range; the default is the entire range. The
dialog box lists the insertion point in the SigmaStat worksheet.

FIGURE 4–1
Import SPW Dialog Box

After selecting the range, click Import to place the data in the SigmaStat
worksheet.

Importing Excel, Lotus, Quattro, and dBASE Files   If you are importing a spreadsheet or dBASE file, the Import Spreadsheet dialog box appears.
1 Select to import either the entire spreadsheet or a specified range of cells. Cells are specified using the standard 1-2-3 notation (e.g., A1:C50 for a range from cell A1 to cell C50). For dBASE files, cell letters correspond to fields.

FIGURE 4–2
Import Spreadsheet
Dialog Box

2 To import spreadsheet data from non-compatible programs, save


the spreadsheet as either a Lotus or text file, then import that file.

3 When you are finished specifying the range to import, click


Import. The selected data is imported.

! Importing data from an Excel file places the data into a worksheet.
To open an Excel worksheet in SigmaStat, see Using Excel
Worksheets in SigmaStat on page 29.

Importing Text Files If you are importing a text file, the Import Text dialog box appears. Use
this dialog box to view the text file and to specify other delimiter types
or to build a model of the data file according to custom column widths.

1 To specify a different column separator, select Delimited By to


activate the delimiter options; then select the appropriate type. You
can select commas, hyphens, or any other characters. For example,
many databases use semicolons (;) as delimiters.

The drop-down lists display all delimiters used by all saved formats
(see Saving Text Import Formats below.).

2 To specify a model of the data, use dashes (-) to specify column widths, and [ and ] bracket characters to define the column edges. Use a vertical bar | character to indicate a single-character width column. Click Reanalyze to display the appearance of the file using the new model.

FIGURE 4–3
The Import Text Dialog Box

3 To specify a different range, enter the rows and columns to read,


then select the Analyze option. You can use this feature to
eliminate file headers and other undesired text.

Saving Text Import Formats You can save the specifications used to
import a text file for future use. Enter a name into the Format scheme
box, then click Add. Delete unwanted import formats using the
Remove button.

When you are finished specifying the file parameters, click Import. The
specified data from the file is imported.

Switching Rows to Columns

Occasionally you may need to rearrange data from a row oriented format
to a column-wise organization or vice versa. In this case, you can use the
Edit menu Transpose Paste command to paste Clipboard contents with
the row and column coordinates transposed.

To swap data column and row positions:

1 Drag the mouse or use the Shift+arrow keys to select the block of
data whose rows and columns you want to transpose.



2 Click the toolbar button, choose the Edit menu Cut command,
or press Ctrl+X. The data is removed from the worksheet and
placed in the Clipboard.

3 Select the cell to paste the beginning of the data, then choose the
Edit menu Transpose Paste command. The data is pasted to the
worksheet with the column and row coordinates reversed.

FIGURE 4–4
Results of Switching Rows
to Columns Using the
Transpose Paste Command
Columns 1 and 2
were copied and then
transposed pasted,
beginning in
column 3, row 1.
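In case it helps to see the operation outside the worksheet, this short Python sketch (with made-up values) shows the same coordinate swap that Transpose Paste performs on the Clipboard contents.

    block = [[1, 'a'],
             [2, 'b'],
             [3, 'c']]    # 3 rows x 2 columns

    transposed = [list(row) for row in zip(*block)]
    # [[1, 2, 3], ['a', 'b', 'c']]  -> 2 rows x 3 columns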

Arranging Data for t-Tests and ANOVAs

There are several forms of data that can be analyzed by SigmaStat t-tests,
analysis of variances (ANOVAs), repeated measures ANOVAs, and their
nonparametric analogs, such as:

➤ Raw data, which places the data for each group in separate columns;
this is the format used by SigmaStat.
➤ Indexed data, which places the group names in one column, and the
corresponding data for each group in another column.
➤ Statistical summary data, which can be used by unpaired t-tests and
One Way ANOVAs.

The data format is set in the Pick Columns dialog box that appears after
choosing the Statistics menu Run Current Test... command or clicking
the toolbar Run icon.

! Messy and Unbalanced Data SigmaStat automatically handles missing


data points (indicated with an “--”) for all situations. If a two factor ANOVA
is missing entire cells, the appropriate steps are suggested, and the desired
procedure is performed.



Raw Data Raw data format places the data for each group to be compared or
analyzed in separate columns. Use column titles to identify the groups,
as the titles will also be used in the analysis report (see Entering and
Promoting Column and Row Titles on page 37).

t-tests and Rank Tests The groups to be compared are always placed in
two columns.

Paired t-tests and signed rank tests (both repeated measures tests)
assume that the data for each subject is in the same row.

FIGURE 4–5
Raw Data for an
Unpaired t-test

For more information on arranging data for t-tests and rank tests, see
pages 8-207 and 8-221.

One Way ANOVA and One Way ANOVA on Ranks Data for each
group is placed in separate columns, with as many columns as there are
groups. One way repeated measures ANOVA and one way repeated
measures ANOVA on ranks assume that the data for each subject is in
the same row.

For more information on arranging data for one way ANOVAs, see page
8-231.

Raw Data for Two and Three Way ANOVAs   The Two Way ANOVA, Two Way Repeated Measures ANOVA, and Three Way ANOVA cannot analyze raw data and require indexed data; for a description of indexed data, see Indexed Data below. For more information on using the Index command, see INDEXING DATA on page 73.

For more information on arranging data for Two Way ANOVAs, see
Arranging Two Way ANOVA Data on page 254.



Indexed Data Indexed data consists of a factor column, which contains the names of
the groups or levels, and a data column containing the data points in
corresponding rows.

Two way ANOVAs require two factor columns and one data column.
Three Way ANOVAs require three factor columns and one data column,
and Repeated measures ANOVAs require an additional subject column
to identify the subject of the measurement.

The order of the rows containing the index and data does not matter;
i.e., they do not have to be grouped or sorted by factor level or subject.

! If you are analyzing entire columns of data, the location in the worksheet of
the factor, subject, and data columns does not matter.

If you plan to compare only a portion of the data, put the index in the left column(s) and the data in the right column.

FIGURE 4–6
Indexed Data for a One Way ANOVA
Column 1 (Species) is the factor column, with levels A, B, and C, and column 2 (Density) is the corresponding data.

You can index data using the Transforms menu Index command. For information on indexing data, see INDEXING DATA below.
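The sketch below, written in Python with made-up group names, shows the relationship between raw format (one column per group) and indexed format (one factor column plus one data column); it only illustrates the layout the Index command produces, not the command itself.

    raw = {'A': [1.0, 2.0], 'B': [3.0, 4.0], 'C': [5.0]}   # raw: one column per group

    factor, data = [], []
    for group, values in raw.items():
        for value in values:
            factor.append(group)   # factor column: group name
            data.append(value)     # data column: corresponding value

    # factor -> ['A', 'A', 'B', 'B', 'C']
    # data   -> [1.0, 2.0, 3.0, 4.0, 5.0]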

Independent t-test and Mann-Whitney Rank Sum Test The group


index is in a factor column, and the corresponding data points to be
compared are in a second column.

For more information on arranging data for the t-test and the Rank Sum
Test, see pages 8-207 and 8-221 in Chapter 9.



Paired t-test and Wilcoxon Signed Rank Test Repeated measures
comparisons require an additional subject index column, which
indicates the subject for each level and data point.

For more information on arranging data for the Paired t-test and the Signed Rank Sum Test, see pages 333 and 346 in Chapter 10.

One Way ANOVA and Kruskal-Wallis ANOVA on Ranks   The factor column contains the group index, and the data column contains the corresponding data points. Indexed data for one way ANOVA contains only two columns.

For more information on arranging data for the One Way ANOVA and
the ANOVA on Ranks, see pages 8-231 and 9-311 in Chapter 9.

Two Way ANOVA Two factor columns are required for Two Way
ANOVAs, one for each level of the observation. Each data point should
be represented by different combinations of the factors; see Table 4-3 on
page 74 and Figure 4–13 on page 75 for an example. The factors are
Gender and Drug, and the levels are Male/Female and Drug A/Drug B.

! If you do not want to bother entering indexed data for a Two Way ANOVA,
you can enter the data for each cell of the Two Way ANOVA table into
separate columns, then use the Edit menu Index command to create the
indexed columns. See page 4-73 for this procedure.

For more information on arranging data for the Two Way ANOVA, see
9-254 in Chapter 9.

Three Way ANOVA Three factors are required for Three Way
ANOVAs, one for each level of observation. Each data point should be
represented by different combinations of the factors.

For more information on arranging data for the Three Way ANOVA, see
Three Way Analysis of Variance (ANOVA) on page 283.



Repeated Measures ANOVA These tests require an additional subject
column, which identifies the data points for each subject.

A Two Way Repeated Measures ANOVA requires both a subject column and two factor columns, as well as a data column.

TABLE 3-1
Data for Two Way Repeated Factor ANOVA, with One Factor (Salinity) Repeated

Species          Subject    Salinity 10    Salinity 15
Artemia sp. 1    A          10.0           12.5
                 B          8.5            13.0
                 C          9.0            10.5
Artemia sp. 2    D          5.5            5.5
                 E          7.5            8.0
                 F          7.0            6.5

FIGURE 4–7
Indexed Data Format
for a Two Way Repeated
Measures ANOVA of the
Data from Table 3-1
Column 1 is the subject
column, columns 2 and 3
are the factor columns, and
column 4 is the data column.

Statistical Summary Data   Unpaired t-tests and one way ANOVAs can be performed on summary statistics of the data. These statistics can be in the form of:

➤ The sample size, mean, and standard deviation for each group, or
➤ The sample size, mean, and standard error of the mean (SEM) for each group.



The sample sizes (N) must be in one worksheet column, the means in another column, and the standard deviations (or standard errors of the mean) in a third column, with the data for each group in the same row.

FIGURE 4–8
Statistical Summary Data for a One Way ANOVA

If you plan to compare only a portion of the data by selecting a block,


put the sample sizes in the left column, the means in the middle column,
and the standard deviations or SEMs in the right column.

Arranging Data for Contingency Tables

Data for χ² (Chi-Square) tests, the Fisher Exact Test, and McNemar's Test can be arranged in the worksheet as either the contingency table data, or as indexed raw data.

Tabulated Data Tabulated data is arranged in a contingency table showing the number of
observations for each cell. The worksheet rows and columns correspond
to the groups and categories. The number of observations must always
be an integer.

Note that the order and location of the rows or columns corresponding
to the groups and categories is unimportant. You can use the rows for
category and the columns for group, or vice versa.



Chi Square Test   A chi-square analysis of contingency tables can have any number of rows and columns in the table, and any number of observations.

TABLE 4-1
A Contingency Table Describing the Number of Lowland and Alpine Species Found at Different Locations

            Location
Species     Tundra    Foothills    Treeline
Lowland     125       16           6
Alpine      7         19           117

Fisher Exact Test   The data for a Fisher exact test must form a 2 x 2 (two rows by two columns) contingency table, with 5 or fewer expected observations in one or more cells of the table.

McNemar’s Test The data for McNemar's test must form a


contingency table that has exactly the same number of rows and
columns.

Raw Data You can report the group and category of each individual observation by
placing the group in one worksheet column and the corresponding
category in another column. Each row corresponds to a single
observation, so there should be as many rows of data as there are total
numbers of observations.

FIGURE 4–9
Worksheet Data
Arrangement for
Contingency Table
Data from Table 3-1
Columns 1 through 3 are in
tabular format, and columns
4 and 5 are raw data.

SigmaStat automatically cross tabulates these data and performs the analysis on the resulting contingency table.
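A minimal sketch of that cross tabulation, in Python with made-up observations (this only illustrates the counting; SigmaStat does it for you):

    from collections import Counter

    group    = ['Lowland', 'Lowland', 'Alpine', 'Alpine', 'Alpine']
    category = ['Tundra', 'Foothills', 'Treeline', 'Treeline', 'Tundra']

    table = Counter(zip(group, category))   # one count per (group, category) cell
    print(table[('Alpine', 'Treeline')])    # 2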



Chi-Square Test A chi-square analysis of contingency tables can have
any combination of categories and observations.

Fisher Exact Test There can be no more than two categories for each
group, so that exactly four possible combinations result. There should be
no more than 5 observations in at least one combination of categories.

McNemar’s Test There must be the same set of categories used in each
column.

Arranging Data for Regressions

Data for all regression and correlation procedures consists of the


dependent variables (usually the “y” data) in one column, and the
independent variables (usually the “x” data) in one or more additional
columns, one column for each independent variable. Dependent
variable data for the logistic regression can be dichotomous or
continuous. Logistic regression data can also be entered in both raw and
grouped data format.

Regression ignores rows containing missing data points within columns


of data (indicated with an “--”). All the columns must be of equal
length, including missing values, or you will receive an error message.

If you plan to test blocks of data instead of picking columns, the


columns must be adjacent, and the leftmost column is assumed to be the
dependent variable.

Raw Data All regressions use data arranged in raw data format. To enter data in raw
format, place the data for the observed dependent variable in one column and the data for the corresponding independent variables in one
or more columns.

FIGURE 4–10
Data for a Multiple
Linear Regression
Temperature and pH are
the independent
variables, and Growth
Rate is the dependent
variable.

Grouped Data Only the Logistic Regression uses the grouped data format. Use grouped
data to specify the number of instances a combination of dependent and
independent variables appear in a logistic regression data set. This data
format is useful if you have several instances of the same variable
combination, and you don’t want to enter every instance in the
worksheet.

To enter data in grouped format, place the data for the observed
dependent variable in one column and the data for the corresponding
independent variables in one or more columns. Only enter one instance
of each different combination of dependent and independent variables,
then specify the number of times the combination appears in the data set
in the corresponding row of another worksheet column.

For example, if there are three instances of the dependent variable 0 with corresponding independent variables of 26 and 142, place 0 in the dependent variable column, 26 and 142 in the corresponding rows of the independent variable columns, and 3 in the corresponding row of the count worksheet column.
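The same idea in a short Python sketch (values taken from the example above, plus one extra made-up row): repeated raw rows collapse into one row per unique combination together with a count.

    from collections import Counter

    # Each raw row is (dependent, independent 1, independent 2).
    raw_rows = [(0, 26, 142), (0, 26, 142), (0, 26, 142), (1, 30, 150)]

    counts = Counter(raw_rows)
    print(counts[(0, 26, 142)])   # 3 -> the count column entry for this combination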

For more information, see Multiple Logistic Regression on page 527.



Indexing Data

You can convert raw data to indexed data and vice versa, using the
Transforms menu Index and Unindex commands. You can index and
unindex data with one and two factors.

Creating Indexed Data   Before indexing data, add titles to the columns. The column title strings are used as the index codes.

! If you are indexing two ways, you must use column titles consisting of the levels of the two factors for that table cell, separated by a hyphen (–), forward slash (/), or colon (:). These level names will be used for the index codes.

For more information on entering unindexed cell data for a Two Way ANOVA, see Indexing Data on page 73.

To index data:

1 Choose the Transforms menu Index command, then choose One


Way to index by one factor, or Two Way to index for two factors.

FIGURE 4–11
The Results of a One Way
Index of Columns 1, 2, and 3
The results appear in
columns 4 and 5.

2 Select the output column to place the indexed data by clicking the
worksheet column. This should be an empty column with at least
one empty column to the right for a One Way ANOVA, or two
empty columns for Two Way ANOVA.

3 Select the columns to index, either by clicking the worksheet
columns, or selecting the column from the Data for Input drop-
down list. Click Finish to index the contents of the selected input
columns in the selected output column.

4 The indexed data is tabulated, with the indexes appearing in the


left column(s), and the data in the right column.

Indexing Raw Data for a Two Way ANOVA In order to index data for
a Two Way ANOVA, you must have entered the data for each cell of the
Two Way ANOVA table into separate columns before indexing.

TABLE 3-2
Data Table for a Two Way ANOVA
The factors are Gender and Drug, and the levels are Male/Female and Drug A/Drug B.

Gender     Drug A    Drug B
Male       3.8       1.5
           3.2       1.8
           3.5       2.2
Female     5.1       5.9
           4.9       6.1
           5.2       6.0

To enter two factor cell data into columns:

1 Decide on the strings to use as the indexes for the factor levels.
These should be no longer than six characters in length.

2 Enter the factor level combination for each cell, using the index
names for the levels, as the titles for the columns.

Move to the column, press F9 or choose the Format menu


Column Titles... command, type the first factor level index, then a
hyphen (–), forward slash (/), or colon (:), then the second factor
level.

Repeat this for every cell, being sure that you enter the name for
the level identically each time, with the first factor level entered
first, followed by the second factor level.

3 Enter the cell data into the column with the corresponding
column title.

FIGURE 4–12
Raw Data Format for a Two
Way ANOVA, Arranged by
Cell (see Table 3-4)

This data must be indexed


before you can perform a
Two Way ANOVA.

TABLE 4-3
A Two Way ANOVA table
The factors are Gender and Drug, and the levels are Male/Female and Drug A/Drug B.

4 To create the indexed columns, choose the Transforms menu Index command, choose Two Way, and select the columns as directed. The levels used in the column titles are used as the level indexes.

5 If you are indexing data for a two way repeated measures test, you still need to enter the subject index. Use the Edit menu Copy, Paste, and Stack commands to quickly create a subject index column.

FIGURE 4–13
Columns 5 through 7
contain indexed data for a
Two Way ANOVA, generated
using the Index Two Ways
command.

Unindexing Data Indexed data can be unindexed for graphing purposes using the unindex
command.

1 Choose the Transforms menu Unindex command, then choose


One Way to unindex by one factor, or Two Way to unindex for
two factors.

2 Select the columns to unindex as prompted.

FIGURE 4–14
Results of a Two Way
Unindex of Columns
5, 6, and 7
Columns 5 and 6 are the
factor columns, and column
7 is the data column. The
unindexed data was placed in
column 8 and appears in
columns 8 through 11.

3 Select the first result column; this should be an empty column


with enough room to the right to accept the unindexed data.

4 The data is unindexed into raw data. The level indexes are used as
the column titles. If you unindexed one way, each column contains
the data for each level of the factor.

5 If you unindexed two ways, each column contains the data for one
cell in the Two Way ANOVA table, and the two factor levels
appear as the column title, separated by a hyphen (-).

Using Transforms to Modify Data

Transforms are math functions that can be applied to existing worksheet


data, or generate data which can be placed in worksheet columns. All
transforms are run from the Transforms menu.

Quick Mathematical Transforms   SigmaStat provides quick transform functions that can be executed from a menu command. These functions are:

➤ Add
➤ Subtract
➤ Divide
➤ Square
➤ Absolute value
➤ Natural logarithm

➤ Base 10 logarithm
➤ Reciprocal
➤ Exponential
➤ Square root
➤ Arcsin x

To apply these transform functions to your data:

1 Choose the Transforms menu Quick Transforms command, then


select the appropriate function.

FIGURE 4–15
The Quick Transforms
and Transforms Available
from the Transform Menu

2 Select the column with the data you want to manipulate as your
input column.

3 Select the column where you want to place the transform results as
your output column, then click Run. The results appear in the
specified column.

Other Transforms   SigmaStat provides a complete array of general data transformations. Use these to transform data to better fit assumptions of tests, or otherwise modify it before performing a statistical procedure. You use data transforms to:

➤ Generate random numbers.


➤ Center, standardize, rank, and filter data.

➤ Define dummy variables, lagged variables, and variable interactions.
➤ Convert other missing values codes to the “--” double dash symbol
for missing values used by SigmaStat.

These commands all appear in the Transforms menu. Choosing these


commands prompts you to select the columns to transform, followed by
a result column. See Chapter 14, USING TRANSFORMS, for a complete
description of each command.

User-Defined Transforms   You can specify transforms other than those provided as commands in the Transforms menu using the User-Defined... command. User-defined transforms use the SigmaStat transform language. These transforms are defined by typing equations using variables you define, the transform language functions, and standard math arithmetic and logic operators.

User-defined transforms can use data from the worksheet as well as save
equation results to the worksheet.

User-defined transforms can be saved to files on disk that can be opened


later and applied or modified. Because these transforms are saved as
plain text files, they can be created and edited using any word processor
that can edit and save text files.

Chapter 11, USING TRANSFORMS, describes the use and structure of


transforms, including a brief tutorial, and a reference section on
transform operators and functions.

To define your own transform:

1 Choose the Transforms menu User-Defined... command. The


User-Defined Transforms dialog box appears.

2 Select the edit box to begin entering the desired equations. As you enter your equation, the window scrolls up to accommodate all of the lines. You can enter up to 32K worth of text. For directions on entering transform functions and equations, see Chapter 11.

FIGURE 4–16
The User-Defined
Transform Dialog Box
The transform entered
into the Edit Window
recodes the numeric data
in column 1 to the values
“SMALL,” “MEDIUM,” and
“LARGE” in column 2.

3 Click Execute to run the transform.

FIGURE 4–17
The Results of the
User-defined Recoding
Transform from Figure 4–16
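For readers who want to see the logic of Figure 4–16 spelled out, here is an equivalent recoding written in Python rather than the SigmaStat transform language; the cutoffs 10 and 20 are invented for illustration and are not necessarily the values used in the figure.

    def recode(value):
        # Hypothetical cutoffs; the transform in Figure 4-16 defines its own.
        if value < 10:
            return 'SMALL'
        elif value < 20:
            return 'MEDIUM'
        return 'LARGE'

    column_1 = [3, 12, 25, 8]
    column_2 = [recode(x) for x in column_1]   # ['SMALL', 'MEDIUM', 'LARGE', 'SMALL']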



Printing Worksheet Data

You can send the contents of the worksheet to a printer using the toolbar
button or the File menu Print... command. To print the worksheet:

1 Make sure that the worksheet is the active window. If you want to
print only a portion of the columns in the worksheet, select a block
from the worksheet.

2 Click the Print button in the toolbar, or choose the File menu Print... command. The Print dialog box appears.

FIGURE 4–18
The Print Dialog Box for the
HP LaserJet 4/4M Postscript
Printer Driver

3 Set the appropriate options, then click OK. The Print Data
Worksheet dialog box appears.

4 Specify whether you want to print the entire worksheet, only the
selected cells in the worksheet, or a specified range of columns by
selecting one of the options under the Area to Print heading.

5 To include page, column, and row titles, and column numbers for
the current worksheet, select the appropriate check boxes.

6 To print the data at the full twenty-one place precision, select the
Full Precision option. Otherwise, the data is printed as displayed in
the worksheet. Worksheet data display is controlled with the File
menu Preferences... command (see page 41).



7 To print grid lines separating each cell, select the Grid Lines
option. Grid lines are solid lines.

FIGURE 4–19
The Print Data
Worksheet Dialog Box

8 Click OK to print the worksheet.

Printing Column Statistics   To print column statistics, select the column statistics worksheet, click the Print toolbar button, choose the File menu Print... command, or press Ctrl+P, then follow the procedures for printing the worksheet (see PRINTING WORKSHEET DATA on page 80).

! Note that in order to print the names of the statistics that appear in the row
region of the worksheet, you must select to print row titles.


4 Using the Advisor Wizard

The SigmaStat Advisor Wizard is designed to help you to determine the


appropriate SigmaStat test to use to analyze your data.

To start the Advisor Wizard:

1 Click the toolbar button, or choose the Help menu Advisor...


command. The Advisor Wizard appears.

2 Answer the questions about what you want to do and the format of
your data. Click Next to go to the following dialog box, Back to go
to the preceding dialog box, Finish to view the suggested test, or
Cancel to close the Advisor Wizard.

3 When a test is suggested, click Run to perform the test. The Pick
Columns dialog box for the suggested test appears prompting you
to select the worksheet columns with the data you want to test. For
information on how to use this dialog box, see page 99.

The remainder of this chapter describes the answers for each dialog box.

For discussions of how the nature of data and experimental design


determine the statistical test to use, you can reference any appropriate
statistics reference. For a list of suggested references, see page 12.

Select what you need to do

The first step in assigning a test appropriate to your data is defining what
you want to accomplish. SigmaStat’s Advisor begins by prompting you
to select what you need to do. After selecting the desired general goal,
you are either prompted for additional information or a dialog box
appears suggesting the test to use.


Describe your Data with Basic Statistics   Select this option if you want to view a list of descriptive statistics for one or more columns of data.

After you select this option, click Finish. SigmaStat suggests performing the Describe Statistics test. Click Run to perform the test. The Pick Columns dialog box appears prompting you to pick the columns you want to use for the test. For directions on performing this test, see page 107. For information on the results of this procedure, see page 108.

FIGURE 4–1
The Test Suggestion
Dialog Box Prompting You
to Run the Suggested Test

Comparing Groups or Treatments for Significant Differences   Select this option if you want to compare data for significant differences, for example, if you want to compare the mean blood pressure of people who are receiving different drug treatments. The data to be compared can be the data collected from different groups, the data for different treatments on the same subjects, or the distributions or proportions of different groups.

If you select this option, you are asked to describe how your data is measured; see How are the data measured? on page 85.

Predict a Trend, Select this option if you want to use regression to predict a dependent
Find a Correlation, variable from one or more independent variables, or describe the
or Fit a Curve strength of association between two variables with a correlation
coefficient. For example, select this option if you want to see if you can
predict the average caloric intake of an animal from its weight.

If you select this option, you are asked to describe how your data is
measured; see the following section, HOW ARE THE DATA MEASURED?.

Determine Select this option if you want to determine the desired sample size for an
Sample Size for an experiment you intend to perform.
Experimental Design
If you select this option, you are asked to describe how the data is
measured; see the following section, HOW ARE THE DATA MEASURED?.

Determine the Sensitivity Select this answer to determine the power or ability of a test to detect an
of an Experimental effect for an experiment you want to perform.
Design
If you select this option, then click Next, you are asked to describe how
the data is measured; see the following section, HOW ARE THE DATA
MEASURED?.

How are the data measured?

You need to define how your data are measured to determine which
SigmaStat test to perform for most procedures.

There are three ways data can be measured:

➤ By numeric values.
➤ By order or rank.
➤ By proportion or number of observations.

By Numeric Select this option if your data are measured on a continuous scale using
Values (e.g., numbers. Examples of numeric values include height, weight,
meters or degrees) concentrations, ages, or any measurement where there is an arithmetic
relationship between values.

➤ If you are comparing groups or treatments for differences, you are


asked if you have repeated observations on the same individuals. See
Did you apply more than one treatment per subject? on page 86.
➤ If you are predicting a trend, you are prompted to select the type of
prediction you want to perform. See What kind of prediction do
you want to make? on page 92.
➤ If you are determining the sample size or the sensitivity of an experimental design, you are asked how many groups or treatments you have. See How many groups or treatments are there? on page 87.

By Order or Rank (e.g., Select this option if your data are measured on a rank scale that has an
poor, fair, ordering relationship, but no arithmetic relationship, between values.
good, excellent)
For example, clinical status is often measured on an ordinal scale, such as: Healthy = 1, Feeling ill = 2, Sick = 3, Hospitalized = 4, and Dead = 5. These ratings show that being dead is worse than being healthy, but they do not indicate that being dead is five times worse than being healthy.

➤ If you are comparing groups or treatments for differences, you are


asked if you have repeated observations on the same individuals. See
Did you apply more than one treatment per subject? on page 86.
➤ If you are predicting a trend, click Finish. SigmaStat suggests
computing the Spearman Rank Correlation. Click Run to perform
the procedure, Cancel to exit the Advisor and return to the
worksheet, or Help for information on the test. For directions on
performing this procedure, see page 635. For descriptions of the
results for this procedure, see page 635.

By Proportion Select this option if your data is measured on a nominal scale, which
or Number of counts the number or proportions that fall into categories, and where
Observations in there is no relationship between the categories (such as Democrat versus
Categories (e.g., male Republican).
vs. female)
➤ If you are comparing groups or treatments for differences, you are
asked if you have repeated observations on the same individuals. See
Did you apply more than one treatment per subject? on page 86.
➤ If you are predicting a trend, click Finish. SigmaStat suggests
running a Multiple Logistic Regression. Click Run to perform the
test, Cancel to exit the Advisor and return to the worksheet, or Help
for information on the test. For information on how to perform a
Multiple Logistic Regression, see page 527. For information on
Logistic Regression results, see page 544.
➤ If you are determining a sample size or the sensitivity of an experimental design, you are asked how your data is formatted. See What kind of data do you have? on page 91.

Did you apply more than one treatment per subject?

If you are comparing groups or treatments, or determining sample size


or power and your data is measured on a continuous numeric scale, you
must specify whether the observations were, or are to be made, on the
same or different subjects. Select Yes or No, then click Next to continue,
click Back to return to the previous dialog box, or click Cancel to return
to the worksheet.

No Answer No if each observation was obtained from a different subject. If you are testing whether there is a difference between different groups, such as comparing the weights of three different populations of elephants, you are not repeating observations. The only time you should select Yes is if you are comparing the same individuals before and after one or more treatments.

➤ If you are comparing groups on an arithmetic or rank scale, you are


asked to specify the number of groups or treatments; see the
following section, How many groups or treatments are there? on
page 87.
➤ If you are comparing group proportions or distribution in
categories, you are asked what kind of data you have. See What kind
of data do you have? on page 91.

Yes Answer Yes if the observations are different treatments made on the same
subjects. Select Yes when you are comparing the same individuals before
and after one or more different treatments or changes in condition. For
example, you would select Yes if you were testing the effect of changing
diet on the cholesterol level of experimental subjects, or if you were
taking an opinion poll of the same voters before and after a political
debate.

➤ If you are comparing groups on an arithmetic or rank scale, you are


asked to specify the number of groups or treatments; see the
following section, How many groups or treatments are there? on
page 87.
➤ If you are comparing group proportions or distribution in
categories, click Finish. SigmaStat suggests performing McNemar's
Test. Click Run to perform the test, Cancel to exit the Advisor and
return to the worksheet, or Help to get help on the suggested test.
For directions on performing this procedure, see page 457. For
descriptions of the results for this procedure, see page 463.

How many groups or treatments are there?

When comparing groups or treatments or determining sample size or


power and your data is measured on a continuous numeric or rank scale,
SigmaStat asks you how many treatments or conditions are involved.
After specifying the number of groups, you are asked more questions, or a test is suggested. Click Back to return to the previous dialog box, Cancel to return to the worksheet, Help for information on using the Advisor, or Finish to view the suggested test, then Run to perform it.

Two Select this option if you have two different experimental groups or if
your subjects underwent two different treatments.

For example, if you are comparing differences in hormone levels


between men and women, or if you are measuring the change in
individuals before and after a drug treatment, there are two groups.

After you select this option, SigmaStat suggests the appropriate test.
Click Finish to view the suggested test, then Run to perform the test.
Click Cancel to exit the Advisor and return to the worksheet, or Help
for information on the test.

➤ If you are comparing two different groups on an arithmetic scale,


SigmaStat suggests the independent t-test. For directions on
performing this procedure, see page 212. For descriptions of the
results for this procedure, see page 214.
➤ If you are determining sample size or power for a comparison of two
groups on an arithmetic scale, SigmaStat suggests that you perform
t-test sample size or power computations. For directions on
determining sample size see page 728, and for directions on
determining power see page 714.
➤ If you are comparing the same subjects undergoing two different
treatments on an arithmetic scale, SigmaStat suggests performing
the Paired t-test. For directions on performing this procedure, see
page 337. For descriptions of the results for this procedure, see page
339.
➤ If you are determining sample size or power for a comparison of the
same subjects undergoing two treatments on an arithmetic scale,
SigmaStat suggests performing Paired t-test sample size or power
computations. For directions on determining sample size see page
730, and for directions on determining power see page 717.
➤ If you are comparing two different groups on a rank scale, SigmaStat
suggests performing the Mann-Whitney Rank Sum Test. For
directions on performing this procedure, see page 224. For
descriptions of the results for this procedure, see page 226.
➤ If you are comparing the same subjects undergoing two different
treatments on a rank scale, SigmaStat suggests performing the

Wilcoxon Signed Rank Test. For directions on performing this procedure, see page 349. For descriptions of the results for this procedure, see page 351.

Three or More Select this option if you have three or more different groups to compare, or are comparing the response of the same subjects to three or more different treatments.

For example, if you collected ethnic diversity data from five different
cities, or subjected individuals to a series of four dietary changes and
measured change in serum cholesterol, you are analyzing three or more
groups.

After you select this option, click Finish. SigmaStat suggests the
appropriate test. Click Run to perform the test, Cancel to exit the
Advisor and return to the worksheet, or Help for information on the test.

➤ If you are comparing three or more different groups on an


arithmetic scale, SigmaStat suggests performing One Way ANOVA.
For directions on performing this procedure, see page 230. For
descriptions of the results for this procedure, see page 243.
➤ If you are determining sample size or power for a comparison of
three or more different groups on an arithmetic scale, SigmaStat
suggests performing One Way ANOVA sample size or power
computations. For directions on performing these procedures, see
page 735 and page 721.
➤ If you are comparing the same subjects undergoing three or more different treatments on an arithmetic scale, SigmaStat suggests performing One Way Repeated Measures ANOVA. For directions on performing this procedure, see page 362. For descriptions of the results for this procedure, see page 369.
➤ If you are comparing three or more different groups on a rank scale, SigmaStat suggests the Kruskal-Wallis ANOVA on Ranks. For directions on performing this procedure, see page 316. For descriptions of the results for this procedure, see page 322.
➤ If you are comparing the same subjects undergoing three or more different treatments on a rank scale, SigmaStat suggests the Friedman Repeated Measures ANOVA on Ranks. For directions on performing this procedure, see page 362. For descriptions of the results, see page 369.

There are two Select this option if each experimental subject is affected by two
combinations of groups different experimental factors or underwent two different treatments
or treatments to simultaneously. Note that different levels of a factor, such as male and
consider (e.g., males female for gender, are not considered to be different factors.
and females from
different cities) For example, if you were comparing only males and females, you would
have only one factor. However, if you compared males and females from
different countries, there would be two factors, gender and nationality.

After you select this option, click Finish. SigmaStat suggests the
appropriate test. Click Run to perform the test, Cancel to exit the
Advisor and return to the worksheet, or Help for information on the
test.

➤ If you are comparing three or more different groups on an


arithmetic scale, SigmaStat suggests performing Two Way ANOVA.
For directions on performing this procedure, see page 264. For
descriptions of the results for this procedure, see page 271.
➤ If you are comparing the same subjects undergoing three or more
repeated treatments on an arithmetic scale, SigmaStat suggests Two
Way Repeated Measures ANOVA. Note that either one or both
factors can be repeated treatments. For directions on performing this
procedure, see page 390. For descriptions of the results for this
procedure, see page 397.

There are three Select this option if each experimental subject is affected by three
combinations of groups different experimental factors or underwent three different treatments
to consider. simultaneously. Note that different levels of a factor, such as male and
female for gender, and Italian and German for nationalities are not
considered to be different factors.

For example, if you are comparing only males and females, from Italy
and Germany, you have only two factors. However, if you are comparing
males and females from different countries with different diets, there are
three factors: gender, nationality, and diet.

After you select this option, click Finish to view the suggested test.
SigmaStat suggests you run a Three Way ANOVA. Click Run to
perform the test, Back to return to the previous Advisor panel, or
Cancel to return to the worksheet without running the test. For
directions on performing the Three Way ANOVA, see page 293. For
descriptions of the results for the Three Way ANOVA, see page 300.

This is a measure of the If you are determining power or sample size, this option also appears. If
association between two you select this answer, click Finish. SigmaStat suggests performing power
variables or sample size computations for a correlation coefficient.

Click Run to perform the test, Cancel to exit the Advisor and return to
the worksheet, or Help for information on the test.

What kind of data do you have?

You can have two kinds of data that are arranged by proportions in
categories. After specifying the kind of data you have, click Finish to
view the suggested test, Back to return to the previous panel, or Cancel
to quit the Advisor and return to the worksheet. Click Run to perform
the test, Cancel to return to the worksheet, or Help for information on
the test.

A Contingency Table Select this option if you have data in the form of a contingency table. A
contingency table is a method of displaying the observed numbers of
different groups that fall into different categories; for example, the
number of men and women that voted for a Republican or Democratic
candidate. These tables are used to see if there is a difference between the
expected and observed distributions of the groups in the categories.

A contingency table uses the groups and categories as the rows and
columns, and places the number of observations for each combination
in the cells. For more information on how to create a contingency table,
see page 443.

Select this option, then click Finish; SigmaStat suggests performing a


Chi-Square Analysis of Contingency Tables. Click Run to perform the
procedure, Cancel to quit the Advisor, or Help for information on the
test. For directions on performing this procedure, see page 446. For
descriptions on the results for this procedure, see page 449.

Observed Proportions Select this option when you have data for the sample sizes of two groups
and the proportion of each group that falls into a single category. This
data is used to see if there is a difference between the proportion of two
different groups that fall into the category. For information on how to
enter this data, see page 435.

If you select this option, click Finish to view the suggested test;
SigmaStat suggests that you compare proportions. Click Run to perform
the procedure, Cancel to quit the Advisor, or Help for information on
the test. For directions on performing this procedure, see page 438. For
descriptions on the results for this procedure, see page 439.
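
As an outside illustration of comparing two observed proportions, the sketch below uses the proportions_ztest function from the statsmodels package (assumed to be installed); the group sizes and counts are hypothetical.

    # Minimal sketch of a two-proportion comparison (z-test).
    # The counts and sample sizes are hypothetical.
    from statsmodels.stats.proportion import proportions_ztest

    in_category = [45, 30]      # number in the category, per group
    sample_sizes = [100, 90]    # total observations, per group

    z_stat, p_value = proportions_ztest(in_category, sample_sizes)
    print("z =", round(z_stat, 3), " P =", round(p_value, 4))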

What kind of prediction do you want to make?

If you are predicting a trend, finding a correlation, or fitting a curve and


your data is measured on a continuous numeric scale, you are asked what
kind of prediction you want to make. There are three different goals
available when you are trying to predict one dependent variable from
one or more independent variables. After specifying the kind of
prediction you want to make, SigmaStat asks more questions or suggests
the kind of test to use. Answer the questions to continue, or click Back
to return to the previous dialog box, Cancel to return to the worksheet,
or Finish to view the suggested test.

Fit a Straight Line Select this answer to find the slope and the intercept of the line
Through the Data
y = p0 + p1 x

that most closely describes the relationship of your data, where y is the
dependent variable and x is the independent variable.

If you select this option, click Finish to view the suggested test.
SigmaStat suggests performing a Linear Regression. Click Run to
perform the procedure, Cancel to quit the Advisor, or Help for
information on the test. For directions on performing this procedure, see
page 482. For descriptions on the results for this procedure, see page
483.
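
As an outside illustration of what this suggestion computes, the sketch below fits a straight line with Python's scipy.stats.linregress; the animal weight and caloric intake values are invented for the example and are not SigmaStat output.

    # Minimal sketch of fitting y = b0 + b1*x by least squares.
    # The weight (x) and intake (y) values are hypothetical.
    from scipy.stats import linregress

    weight = [2.0, 3.1, 4.5, 5.2, 6.8, 7.5]    # independent variable x
    intake = [110, 152, 210, 239, 305, 330]    # dependent variable y

    fit = linregress(weight, intake)
    print("intercept =", round(fit.intercept, 2))
    print("slope     =", round(fit.slope, 2))
    print("r squared =", round(fit.rvalue ** 2, 4))
    print("P value   =", round(fit.pvalue, 6))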

Fit a Curved Line Select this answer to find an equation that predicts the dependent
Through the Data variable from an independent variable without assuming a straight line
relationship. If you select to fit a curved line through your data,
SigmaStat asks you what kind of curve you want to use; see WHAT KIND
OF CURVE DO YOU WANT TO USE? on page 93.

Predict a Select this option if you want to predict a dependent variable from more
Dependent than one independent variable using the linear relationship
Variable from Several
Independent Variables y = b0 + b1 x1 + b2 x2 + b3 x3 + + bk xk

where y is the dependent variable, x1, x2, x3, ..., xk are the k independent
variables, and b0, b1, b2,...,bk are the regression coefficients. As the values
for xi vary, the corresponding value for y either increases or decreases
proportionately.

If you select this option, SigmaStat asks how you want to specify the
independent variables. See HOW DO YOU WANT TO SPECIFY THE
INDEPENDENT VARIABLES? on page 94.
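
The sketch below shows, outside SigmaStat, how such an equation can be estimated by ordinary least squares with numpy; the two independent variables and the dependent values are hypothetical.

    # Minimal sketch of a multiple linear regression
    # y = b0 + b1*x1 + b2*x2; all data values are hypothetical.
    import numpy as np

    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
    y  = np.array([4.1, 5.9, 9.2, 10.8, 14.9, 16.1])

    # Design matrix with a leading column of ones for the intercept b0
    design = np.column_stack([np.ones_like(x1), x1, x2])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    b0, b1, b2 = coeffs
    print("b0 =", round(b0, 3), " b1 =", round(b1, 3), " b2 =", round(b2, 3))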

Measure Variable Select this option to find how closely the value of one variable predicts
Association Strength the value of another (i.e., the likelihood that a variable increases or
decreases when the other variable increases or decreases), without
specifying which is the dependent and independent variable.

If you select this option, click Finish. SigmaStat suggests computing the
Pearson Product Moment Correlation. Click Run to perform the
procedure or Cancel to quit the Advisor, or Help for information on the
test.
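
The correlation coefficient itself is easy to reproduce outside SigmaStat; the sketch below uses scipy.stats.pearsonr on two hypothetical columns of values.

    # Minimal sketch of a Pearson product moment correlation.
    # The two variables are hypothetical.
    from scipy.stats import pearsonr

    a = [1.2, 2.4, 3.1, 4.8, 5.0, 6.3]
    b = [2.0, 4.1, 6.5, 9.0, 9.8, 12.7]

    r, p = pearsonr(a, b)
    print("correlation coefficient r =", round(r, 4))
    print("P value =", round(p, 5))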

What kind of curve do you want to use?

If you are trying to predict one variable from one or more other variables
using a curved line, you are asked what kind of curve you want to use.

A Polynomial Select this option if you want to use a kth order polynomial curve of the
Curve with One form
Independent Variable
y = b0 + b1 x + b2 x^2 + ... + bk x^k

to predict the dependent variable y from the independent variable x, where b0, b1, b2, ..., bk are the regression coefficients.

If you select this option, click Finish. SigmaStat suggests performing


Polynomial Regression. Click Run to perform the procedure, Cancel to
quit the Advisor, or Help for information on the selected test. For
directions on performing this procedure, see page 564. For descriptions
on the results for this procedure, see page 566.
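
As an outside illustration, the sketch below fits a second order polynomial with numpy.polyfit; the x and y values are hypothetical.

    # Minimal sketch of fitting y = b0 + b1*x + b2*x^2 by least squares.
    # All data values are hypothetical.
    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.1, 2.9, 7.2, 13.1, 21.0, 30.8])

    # polyfit returns coefficients from the highest power down
    b2, b1, b0 = np.polyfit(x, y, deg=2)
    print("b0 =", round(b0, 3), " b1 =", round(b1, 3), " b2 =", round(b2, 3))

    predicted = b0 + b1 * x + b2 * x ** 2
    print("predicted values:", np.round(predicted, 2))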

A General Select this option if you want to describe your data with a nonlinear
Nonlinear Equation function. Common nonlinear functions include rising and falling
exponential and log curves, logistic sigmoid curves, and hyperbolic
curves that approach a maximum or minimum.

If you select this option, click Finish. SigmaStat suggests using


Nonlinear Regression. Click Run to perform the procedure, Cancel to
quit the Advisor, or Help for information on the selected test.

Nonlinear Regression uses a dialog box to specify any general nonlinear


equation with up to ten independent variables, then uses an iterative
least squares algorithm to estimate the parameters in the regression
model.

For directions on performing a Nonlinear Regression, see Chapter 12.


For descriptions on the results for this procedure, see page 652.
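
As a rough outside illustration of iterative least squares, the sketch below fits a rising exponential with scipy.optimize.curve_fit; the model, the data, and the starting guesses are all hypothetical, and this is not SigmaStat's own fitting engine.

    # Minimal sketch of nonlinear least squares for y = a*(1 - exp(-b*x)).
    # Data and initial parameter guesses are hypothetical.
    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b):
        return a * (1.0 - np.exp(-b * x))

    x = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
    y = np.array([0.9, 1.7, 2.8, 4.0, 4.7, 5.0])

    params, covariance = curve_fit(model, x, y, p0=[5.0, 0.5])
    a_hat, b_hat = params
    print("a =", round(a_hat, 3), " b =", round(b_hat, 3))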

How do you want to specify the independent variables?

If you chose to predict a dependent variable from several independent


variables, you can select the independent variables using two methods.
The dependent variable and independent variables are selected as
columns from the worksheet when the regression procedure is
performed.
Include All Selected Select this option if you want to compute a single equation using all
Independent Variables in independent variables you select for the equation, regardless of whether
the Equation they contribute significantly to predicting the dependent variable.

If you select this option, click Finish. SigmaStat suggests performing a


Multiple Linear Regression. Click Run to perform the procedure,
Cancel to exit the Advisor and return to the worksheet, or Help for
information on the test. For directions on performing this procedure, see
page 511. For descriptions on the results for this procedure, see page
512.

Let SigmaStat Select the Select this option if you want SigmaStat to screen the potential
“Best” Variables to independent variables you select and only include ones that significantly
Include in the Equation contribute to predicting the dependent variable. You are then asked how
you want to select the independent variables; see HOW DO YOU WANT
SIGMASTAT TO SELECT THE INDEPENDENT VARIABLE? below.

How do you want SigmaStat to select the independent variable?

If you are predicting the value of one variable from other variables, and
you want SigmaStat to screen potential variables for their contribution
to the predictive value of the regression equation, you can select three
different methods.

Sequentially Add New Select this option to select the independent variables for the equation by
Independent Variables to starting with no independent variables, then adding variables until the
the Equation ability to predict the dependent variable is no longer improved. The
variables are added in order of the amount of predictive ability they add
to the model.

The predictive ability of models produced with forward stepwise


regression is measured by their ability to reduce the residual sum of
squares in the regression equation.

If you select this option, click Finish. SigmaStat suggests Forward


Stepwise Regression. Click Run to perform the procedure, Cancel to exit
the Advisor and return to the worksheet, or Help for information on the
test. For directions on performing this procedure, see page 594. For
descriptions on the results for this procedure, see page 596.
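
The following is only a rough sketch of the forward-selection idea: at each step it adds the candidate variable that most reduces the residual sum of squares and stops when the improvement is small. The data, the 5% improvement threshold, and the helper residual_ss are invented for illustration; they are not SigmaStat's actual entry criteria.

    # Rough sketch of forward stepwise selection by residual sum of squares.
    import numpy as np

    def residual_ss(columns, y):
        # Least squares fit with an intercept; returns the residual sum of squares.
        design = np.column_stack([np.ones(len(y))] + list(columns))
        coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coeffs
        return float(resid @ resid)

    rng = np.random.default_rng(0)
    n = 30
    candidates = {
        "x1": rng.normal(size=n),
        "x2": rng.normal(size=n),
        "x3": rng.normal(size=n),   # pure noise variable
    }
    y = (2.0 + 1.5 * candidates["x1"] - 0.8 * candidates["x2"]
         + rng.normal(scale=0.3, size=n))

    selected = []
    current_ss = residual_ss([], y)
    improved = True
    while improved and len(selected) < len(candidates):
        improved = False
        trials = {name: residual_ss([candidates[s] for s in selected] + [col], y)
                  for name, col in candidates.items() if name not in selected}
        best = min(trials, key=trials.get)
        if trials[best] < 0.95 * current_ss:   # crude improvement threshold
            selected.append(best)
            current_ss = trials[best]
            improved = True

    print("variables entered, in order:", selected)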

Sequentially Remove Select this option to select the independent variables for the equation by
Independent Variables starting with all independent variables in the equation, then deleting
from the Equation variables one at a time. The variable that contributes the least to the
prediction of the dependent variable is deleted from the equation first.
This elimination process continues until the ability of the model to
predict the dependent variable is reduced below a specified level.

The predictive ability of models produced with backwards stepwise


regression is measured by their ability to reduce the residual sum of
squares in the regression equation.

If you select this option, click Finish. SigmaStat suggests the Backward
Stepwise Regression. Click Run to perform the test, Cancel to exit the
Advisor and return to the worksheet, or Help for information on the
test. For directions on performing this procedure, see page 594. For
descriptions on the results for this procedure, see page 596.

Consider All Possible Select this option if you want SigmaStat to evaluate all possible
Combinations of the regression models, and isolate the models that “best” predict the
Independent Variable dependent variable.
and Select the Best
Subset If you select this option, click Finish. SigmaStat suggests the Best Subset
Regression. Click Run to perform the procedure, Cancel to exit the
Advisor and return to the worksheet, or Help for information on the
test. For directions on performing this procedure, see page 618. For
descriptions on the results for this procedure, see page 619.

SigmaStat selects the sets of independent variables that “best” predict the
dependent variable using criteria specified in the Best Subsets Regression
Options dialog box.

5 Using SigmaStat Procedures

The statistical procedure used to analyze a given data set depends on the
goals of your analysis and the nature of your data. The Advisor Wizard
asks you questions about your goals and your data, then selects the
appropriate test. For information on how to use the Advisor Wizard, see
Chapter 4. Alternately, you can perform SigmaStat's statistical
procedures directly by choosing the appropriate Statistics menu
command.

Running SigmaStat Procedures

In general, the steps to run a SigmaStat test or procedure are:

1 Entering or importing and arranging your data appropriately in


the worksheet.

2 Determining and choosing the test you want to perform. For information on choosing the appropriate procedures, see Choosing the Procedure to Use on page 103.

3 If desired, setting the test options using the selected test’s Options
dialog box.

4 Running the test by picking the worksheet columns with the data you want to test using the Pick Columns dialog box.

5 Viewing, generating, and interpreting the test reports and graphs.

Arranging Worksheet The method used to enter or arrange data in the worksheet depends on
Data the type of test you are running. For information on how to arrange data
for the different tests, see Data Format for Group Comparison Tests on page 204, Data Format for Repeated Measures Tests on page 330, Data
Format for Rate and Proportion Tests on page 431, and Data Format for
Regression and Correlation on page 468.

Selecting a Test You can select a test by selecting the test from the drop-down list in the
toolbar or by choosing the appropriate Statistics menu command.

To change test options before you run a test, you must select the test in
the toolbar drop-down list, then choose the Statistics menu Current Test
Options... command, or click the toolbar button.

Setting Test Options Almost all SigmaStat procedures can be configured with a set of options.
These settings enable you to perform additional tests and procedures.
You may wish to enable or disable some of these options or change
assumption checking parameters; all changes are saved between
SigmaStat sessions.

To change option settings before you run a test:

1 Select the test you will be running from the drop-down list in the
toolbar, then click the button or choose the Statistics menu
Current Test Options... command.

2 Select the tab of the options you want to view. Click a selected
check box if you do not want to use that test option. Click an
unselected check box to include an option in the test.

FIGURE 5–1
Example of
an Options Dialog Box
Each test has
its own settings.

3 Click the Select All button to select all the options in the panel.
Click Clear to clear all the selected options in the panel.

4 Once you have changed the desired options, click Run Test to
continue the test. The Pick Columns dialog box appears (see the
next topic for more information). To accept the current settings
without continuing the test, click Apply. To close the dialog box
without changing any settings or running the test, click Cancel.
Select Help at any time to access SigmaStat’s on-line help system.

Picking Data to Test The Pick Columns dialog box is used to select the worksheet columns
with the data you want to test and to specify how your data is arranged
in the worksheet.

To select the data you want to use in the test:

1 Start the test you want to run; this opens the Pick Columns dialog
box for that test. You can either:

➤ Select the test from the drop-down list in the toolbar, then click
the toolbar button.
➤ Click Run Test from the Options dialog box.
➤ Choose the test from the Statistics menu.

2 If your data can be arranged in more than one format, the Pick
Columns dialog box appears prompting you to specify a data
format. Select the appropriate format from the Data Format drop-
down list, then click Next.

The available formats depend on the test you are running. For
information on how data can be arranged for the different
SigmaStat tests, see Data Format for Repeated Measures Tests on
page 330, Data Format for Rate and Proportion Tests on page 431,
and Data Format for Regression and Correlation on page 468.

FIGURE 5–2
The Pick Columns
Dialog Box for a One Way
ANOVA Prompting
For a Data Format
The data formats available
depend on the type of test
you are running.

If the test you are running uses only one type of data format, the
Pick Columns dialog box appears prompting you to select the
columns with the data you want to test (see the following step).

3 If you selected columns before you chose the test, the selected
columns automatically appear in the Selected Columns list. To
assign the desired worksheet columns to the Selected Columns list,
select the columns in the worksheet, or select the columns from
the Data drop-down list.

The dialog box indicates the type of data you are selecting.

FIGURE 5–3
The Pick Columns Dialog
Box
for a One Way ANOVA
Using Raw Data
If you select your data
columns before you
run the test, the columns
appear in the dialog box.

The first selected column is assigned to the first entry in the Selected Columns list, and all successively selected columns are assigned to successive entries in the list. The number or title of selected columns appears in each entry. The number of columns you can select depends on the test you are running and the format of your data.

4 To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

5 If you are running a Forward or Backward Stepwise Regression, click Next. The Pick Columns dialog box appears prompting you to specify which variables you want to force into the regression equation. For more information on stepwise regression, see page 577.

FIGURE 5–4
The Pick Columns
Dialog Box
for the Forward Stepwise
Regression Prompting
You to Specify the
Variables to Force into
the Regression Equation

6 Click Finish to perform the test on the data in the selected


columns. After the computations are completed, the report
appears.

! If you attempt to run a test on worksheet columns that contain empty cells, a dialog box appears asking if you want to convert the empty cells to missing values. Click OK to convert the cells and continue with the test, or Cancel to cancel the test.

FIGURE 5–5
The Convert Empty Cells
to Missing Values Dialog
Box

Reports and Test reports automatically appear after a test has been performed. To
Report Graphs generate a report graph, make sure the report is the active window, then
click the toolbar button, or choose the Graph menu Create Graph...
command.

Graphs are not created for rate and proportion tests, best subset regression, or incremental polynomial regression reports. The toolbar button and the Graph menu Create Graph... command are dimmed for these tests.

! If you close a report without generating or saving a graph, the graph is not
recoverable. See Saving Graphs in Notebook Files on page 200 for more
information on saving graphs.

Editing, Saving, and Opening Reports and Graphs Reports and


graphs can be edited using the Format menu commands and the Graph
Properties dialog box. Reports can also be exported as non-notebook
files and edited in other applications. For more information on editing
graphs and exporting reports, see Modifying Graph Attributes on page
184 and Exporting Reports on page 143.

Repeating Tests Repeating a test involves running the last test you performed, using the
same worksheet columns. To repeat a test using new data columns, use
the button or the Statistics menu Run Current Test... command (see
Running SigmaStat Procedures on page 97 for more information).

To repeat a test using the same worksheet columns:

1 Make sure the last test you performed is displayed in the toolbar
drop-down list.

If you haven’t performed the test displayed in the drop-down list,


the button and the Statistics menu Rerun Current Test...
command are inactive. To find the last performed test, you can
scroll through the drop-down list until the button and command
are active.

2 If desired, edit the data in the columns used by the test. You can
add data and change values and column titles.

3 To change the option settings before you rerun the test, click the
toolbar button, change the desired options, then click OK to
accept the changes and close the dialog box.

4 Click the toolbar button, or choose the Statistics menu Rerun


Current Test... command. The Pick Columns dialog box appears
with the columns used in the last procedure selected.

5 Click Finish to repeat the procedure using these columns. After the
computations are complete, a new report appears.

Choosing the Procedure to Use

You can use SigmaStat to perform a wide range of statistical procedures.


The SigmaStat Advisor (see Chapter 4) can suggest which test to use. You
can also determine the appropriate test yourself. The type of procedure
to choose depends on the kind of analysis you want to perform:

➤ Use descriptive statistics to compute a number of commonly used statistical values for the selected data; see page 104.
➤ Use group comparison tests to analyze two or more different sample
groups for statistically significant differences; see Chapter 8.
➤ Use repeated measures comparisons to test the differences in the
same individuals before and after one or more treatments or changes
in condition; see Chapter 9.
➤ Use rate and proportion analysis to compare the distribution of groups that are divided or fall into different categories or classes (for example, male versus female, or reaction versus no reaction); see Chapter 10.

FIGURE 5–6
SigmaStat Procedures to Use for Statistical Tests

Scale of Measurement: Numeric, normally distributed with equal variances
  Two groups of different individuals: Unpaired t-test
  Three or more groups of different individuals: One Way or Two Way ANOVA
  Same individuals before and after a single treatment: Paired t-test
  Same individuals after multiple treatments: One Way or Two Way Repeated Measures ANOVA
  Predict a variable or find an association between variables: Regression or Pearson Product Moment Correlation

Scale of Measurement: By rank or order, or numeric but non-normally distributed and/or with unequal variances
  Two groups of different individuals: Mann-Whitney Rank Sum Test
  Three or more groups of different individuals: Kruskal-Wallis ANOVA on Ranks
  Same individuals before and after a single treatment: Wilcoxon Signed Rank Test
  Same individuals after multiple treatments: Friedman Repeated Measures ANOVA on Ranks
  Predict a variable or find an association between variables: Spearman Rank Order Correlation

Scale of Measurement: By distribution in different categories
  Two groups of different individuals: Chi-Square Analysis of Contingency Tables
  Three or more groups of different individuals: Chi-Square Analysis of Contingency Tables
  Same individuals before and after a single treatment: McNemar's Test
  Same individuals after multiple treatments: Not Available
  Predict a variable or find an association between variables: Not Available

➤ Use regression and correlation to predict values of a variable from


other variables or describe a variable's effect on, or strength of
association, with another variable; see Chapter 11.
➤ Use survival analysis to generate the probability of the time to an
event and the associated statistics such as the median survival time;
see Chapter 12.
➤ Use power and sample size determination to calculate the sensitivity,
or power, of an experimental test, or to compute the experimental
sample size required to achieve a desired sensitivity; see Chapter 13.

All statistical procedure commands are found under the Statistics menu.

Describing Your Data with Basic Statistics

You can use SigmaStat to describe your data by computing basic


statistics, such as the mean, median, standard deviation, percentiles, and
so forth, that summarize the observed data.

Describing your data involves:

➤ Arranging your data in the appropriate format.


➤ Setting descriptive statistic options.
➤ Selecting the columns you want to compute the statistics for.
➤ Viewing the descriptive statistics results.

Arranging Descriptive Statistics Data

Descriptive Statistics are performed on columns of data, so you should


arrange the data for each group or variable you want to analyze in
separate columns.

FIGURE 5–7
Data Arrangement
with Treatments or
Groups in Columns

Selecting Data Columns You can calculate statistics for entire columns or only a portion of
columns. When running the descriptive statistics procedure, you can:

➤ Select the columns or block of data before you run the test, or
➤ Select the columns while running the test (page 105)

! To calculate statistics for only a range of data, select the data before you run
the test. You can select a minimum of one column and a maximum of 32
columns when describing data.

Setting Descriptive Statistics Options

The statistics to be calculated are selected using the Descriptive Statistics


Options dialog box.

To change descriptive statistics test options:

1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.

2 To open the Options for Descriptive Statistics dialog box, select


Descriptive Statistics from the toolbar drop-down list, then:

➤ Click the button, or
➤ Choose the Statistics menu Current Test Options... command

3 Click any of the selected statistics settings you do not want to include in the report. Each statistic is described in Descriptive Statistics Results on page 108.

FIGURE 5–8
The Options for
Descriptive
Statistics Dialog Box

The specific summary statistics that are appropriate for a given


data set depend on the nature of the data. If the observations are
normally distributed, then the mean and standard deviation
provide a good description of the data. If not, then the median and
percentiles often provide a better description of the data.

4 To change the percentile or confidence intervals computed,


edit the values in the appropriate boxes. To change the confidence
interval, enter any number from 1 to 99 (95 and 99 are the most
commonly used intervals).

5 To change the P value for the normality test, edit the value in the P Value to Reject edit box (a sketch of this decision rule appears after these steps). The P value determines the probability of being incorrect in concluding that the data is not normally distributed. If the P computed by the test is greater than the P set here, the data passes the normality test.

To require a stricter adherence to normality, increase the P value.


Because the parametric statistical methods are relatively robust in
terms of detecting violations of the assumptions, the suggested value in SigmaStat is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude the data is not normal.

To relax the requirement of normality, decrease P. Requiring


smaller values of P to reject the normality assumption means that
you are willing to accept greater deviations from the theoretical
normal distribution before you flag the data as non-normal.

6 To select all statistics options, click the All button. To clear all
selections, click the Clear button.

7 Click Run Test to perform the test with the selected options
settings. Click Apply to accept the selected settings without
continuing with the test. Cancel closes the Options dialog box and
returns to the previous option settings, and Help accesses
SigmaStat’s on-line help system.
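
As a rough illustration of the decision rule described in step 5, the sketch below compares a sample against a normal distribution with scipy and checks the resulting P value against a P Value to Reject of 0.050. The data are hypothetical, and testing against a normal curve whose mean and standard deviation are estimated from the sample is, strictly speaking, a variant of the Kolmogorov-Smirnov test, so treat this only as an approximation of SigmaStat's behavior.

    # Rough sketch of the normality decision rule: if the computed P value
    # exceeds the "P Value to Reject" setting, the data pass the test.
    import numpy as np
    from scipy.stats import kstest

    rng = np.random.default_rng(1)
    data = rng.normal(loc=10.0, scale=2.0, size=40)   # hypothetical sample

    p_to_reject = 0.050
    statistic, p_value = kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))

    print("K-S distance =", round(statistic, 4), " P =", round(p_value, 4))
    if p_value > p_to_reject:
        print("Data pass the normality test")
    else:
        print("Data fail the normality test")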

Running the Descriptive Statistics Test

To describe your data:

1 If you want to select your data before you run the procedure, drag
the pointer over your data.

2 Open the Pick Columns for Descriptive Statistics dialog box by:

➤ Selecting Descriptive Statistics from the toolbar drop-down list, then clicking the button.
➤ Choosing the Statistics menu Describe Data... command.
➤ Clicking the Run Test button from the Options for Descriptive
Statistics dialog box (page 105).

3 If you selected columns before you chose the test, the selected
columns automatically appear in the Selected Columns list. To assign the desired worksheet columns to the Selected Columns list, select
the columns in the worksheet, or select the columns from the Data
for Data drop-down list.

FIGURE 5–9
The Pick Columns
for Descriptive
Statistics Dialog Box

The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns
appear in each row. You can select up to 64 columns of data for the
Descriptive Statistics Test.

4 To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

5 Click Finish to describe the data in the selected columns. After the
computations are completed, the report appears. To edit the
report, use the Format menu commands; for information on
editing reports, see page 137.

! If you attempt to run a test on worksheet columns with empty cells,


a dialog box appears asking you if you want to convert the empty
cells to missing values. Choose Convert to convert the cells and
continue with the test, or Cancel to cancel the test.

Descriptive Statistics Results

The following statistics can be calculated and displayed in the results


report. These values are calculated for each column selected. The specific
statistics computed are selected in the Options for Descriptive Statistics
dialog box.

! The number of decimal places displayed is controlled by the File menu


Preferences... command.

Size This is the number of non-missing observations in a worksheet column.

Missing This is the number of missing observations in a worksheet column.

Mean The mean is the average value for a column. If the observations are
normally distributed, the mean is the center of the distribution.

Standard Deviation Standard deviation is a measure of data variability about the mean.

Standard Error The standard error of the mean is a measure of how closely the sample
of the Mean mean approximates the true population mean.

FIGURE 5–10
Descriptive
Statistics Results
Report

Range The range is the minimum value subtracted from the maximum value.

Maximum Maximum is the largest observation.

Minimum Minimum is the smallest observation.

Median The median is the “middle” observation, computed by ordering all


observations from smallest to largest, then selecting the largest value of
the smaller half of the observations.

Percentiles The two percentile points which define the upper and lower ends (tails)
of the data, as specified by the Descriptive Statistics options.

Sum The sum is the sum of all observations. The mean equals the sum
divided by the sample size.

Sum of Squares The sum of squares is the sum of the squared deviations of the observations from the mean.

Confidence Interval for The confidence interval for the mean is the range in which the true
the Mean population mean will fall for a percentage of all possible samples drawn
from the population.

Skewness Skewness is a measure of how symmetrically the observed values are


distributed about the mean. A normal distribution has skewness equal to
zero.

Kurtosis Kurtosis is a measure of how peaked or flat the distribution of observed


values is, compared to a normal distribution. A normal distribution has
Kurtosis equal to zero.

K-S Distance The Kolmogorov-Smirnov distance is the maximum cumulative distance


between the histogram of your data and the gaussian distribution curve
of your data.

Normality Normality tests the observations for normality using the Kolmogorov-
Smirnov test.
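
Most of the statistics defined above can be reproduced outside SigmaStat for checking purposes; the sketch below computes several of them with numpy and scipy for a single hypothetical column of data.

    # Minimal sketch of common descriptive statistics for one column.
    # The data values are hypothetical.
    import numpy as np
    from scipy import stats

    column = np.array([4.2, 5.1, 5.8, 6.0, 6.3, 7.1, 7.4, 8.0, 9.2, 10.5])

    n = column.size
    mean = column.mean()
    std_dev = column.std(ddof=1)              # sample standard deviation
    sem = std_dev / np.sqrt(n)                # standard error of the mean

    print("size =", n)
    print("mean =", round(mean, 3))
    print("standard deviation =", round(std_dev, 3))
    print("standard error of the mean =", round(sem, 3))
    print("range =", round(column.max() - column.min(), 3))
    print("median =", np.median(column))
    print("25th/75th percentiles =", np.percentile(column, [25, 75]))
    print("skewness =", round(stats.skew(column), 3))
    print("kurtosis =", round(stats.kurtosis(column), 3))  # 0 for a normal distribution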

Descriptive Statistics Report Graphs

You can generate up to five graphs using the results from a descriptive statistics report. They include a:

➤ Bar chart of the column means.


➤ Scatter plot with error bars of the column means.
➤ Point plot of the column data.
➤ Point plot of the column data with error bars plotting the column
means.
➤ Box plot of the percentiles and median of column data.

Bar Chart The Descriptive Statistics bar chart plots the group means as vertical bars
with error bars indicating the standard deviation. The column titles are
used as the tick marks for the bar chart bars and default X Data and Y
Data axis titles are assigned to the graph. For an example of a bar chart,
see page 149.

Scatter Plot The Descriptive Statistics scatter plot graphs the column means as single
points with error bars indicating the standard deviation. The column
titles are used as the tick marks for the scatter plot points and default X
Data and Y Data axis titles are assigned to the graph. For an example of
a scatter plot, see page 150.

Point Plot The Descriptive Statistics point plot graphs all values in each column as
a point on the graph. The column titles are used as the tick marks for the
plot points and default X Data and Y Data axis titles are assigned to the
graph. For an example of a point plot, see page 150.

Point and The Descriptive Statistics point and column means plot graphs all values
Column Means Plot in each column as a point on the graph with error bars indicating the
column means and standard deviations of each column. The column
titles are used as the tick marks for the plot points and default X Data
and Y Data axis titles are assigned to the graph. For an example of a
point and column means plot, see page 151.

Box Plot The Descriptive Statistics test box plot graphs the percentiles and the
median of column data. The ends of the boxes define the 25th and 75th
percentiles, with a line at the median and error bars defining the 10th
and 90th percentiles. For an example of a box plot, see Figure 5–12 on
page 112.

The column titles are used as the tick marks for the box plot boxes, and
no axis titles are assigned to the graph.

Creating a Report Graph To generate a graph of Descriptive Statistics report data:

1 Click the toolbar button, or choose the Graph menu Create Graph... command when the Descriptive Statistics report is selected.
The Create Graph dialog box appears displaying the types of
graphs available for the Descriptive Statistics report.

2 Select the type of graph you want to create from the Graph Type
list, then click OK, or double-click the desired graph in the list.

FIGURE 5–11
The Create Graph Dialog
Box
for the Descriptive Statistics
Report

For more information on each of the graph types, see pages 149 through 176. The specified graph appears in a graph window or in the report.

FIGURE 5–12
A Box Plot of the Result
Data for a Descriptive
Statistics Test

For information on closing, editing, saving, opening, and printing


graphs, see the GRAPHS chapter.

Choosing the Group Comparison Test to Use

Use the various group comparison procedures to test sample means or


medians for differences.

SigmaStat's Advisor Wizard prompts you to answer questions about your


data and goals, then selects the appropriate test. However, if you are
already familiar with the comparison requirements, you can go directly
to the appropriate test. The criteria used to select the appropriate
procedure include:

➤ The number of groups to compare. Are you comparing two


different groups or many different groups?
➤ The distribution of the sample data. Is the source population for
your sample distributed along a normal “bell” (Gaussian) curve, or
not? Comparisons of samples from normal populations use
parametric tests, which are based on the mean and standard
deviation parameters of a normally distributed population. If the
populations are not normal, a non-parametric, or distribution-free
test must be used, which ranks the values along a new ordinal scale
before performing the test.

! Note that SigmaStat can automatically test for assumptions of normality and
equal variance.

SigmaStat lists the specific tests in the Statistics menu and the toolbar
drop-down list. The complete procedures for each group comparison
test are outlined in Chapter 6.

For more information on how to use the Advisor Wizard, see Chapter 4.

When to Compare Two If data was collected from two different groups of subjects (for example,
Groups two different species of fish or voters from two different parts of the
country), use a two group comparison to test for a significant difference
beyond what can be attributed to random sampling variation.

When to Use a t-test versus a Mann-Whitney Rank Sum Test You


can perform two kinds of two group comparison tests: an unpaired t-test
and the Mann-Whitney rank sum test.

➤ Choose the unpaired t-test (page 206) if your samples were taken
from normally distributed populations and the variances of the two populations are equal. The unpaired t-test is a parametric test which directly compares the sample data.

FIGURE 5–13
The Compare Two Groups Commands

➤ If your samples were taken from populations with non-normal


distribution and/or unequal variances, choose the Mann-Whitney
rank sum test (page 220). The Mann-Whitney rank sum test
arranges the data into sets of rankings, then performs an unpaired t-
test on the sum of these ranks, rather than directly on the data.
➤ If your samples are already ordered according to qualitative ranks,
such as poor, fair, good, and very good, use the Mann-Whitney rank
sum test.

The advantage of the t-test is that, assuming normality and equal


variance, it is slightly more sensitive (i.e., it has greater power) than the
Mann-Whitney rank sum test. When these assumptions are not met, the
Mann-Whitney rank sum test is more reliable.
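
As an outside check on these two choices, the sketch below runs both an unpaired t-test and a Mann-Whitney rank sum test on the same two hypothetical groups using scipy; it is meant only to illustrate the pair of tests, not SigmaStat's assumption checking.

    # Minimal sketch: unpaired t-test and Mann-Whitney rank sum test.
    # The two groups of values are hypothetical.
    from scipy.stats import ttest_ind, mannwhitneyu

    group_a = [23.1, 25.4, 24.8, 26.0, 23.9, 25.2]
    group_b = [27.3, 28.1, 26.9, 29.0, 27.8, 28.4]

    t_stat, t_p = ttest_ind(group_a, group_b)
    u_stat, u_p = mannwhitneyu(group_a, group_b)

    print("unpaired t-test:       t =", round(t_stat, 3), " P =", round(t_p, 5))
    print("Mann-Whitney rank sum: U =", u_stat, " P =", round(u_p, 5))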

! Note that you can tell SigmaStat to analyze your data and test for normal
distribution and equal variance. If assumptions of normality and equal
variance are violated, the alternative parametric or nonparametric test is
suggested. Assumption tests are activated and configured in the t-test and
Mann-Whitney Options dialog boxes.

SigmaStat tests for normality using the Kolmogorov-Smirnov test, and


for equal variance using the Levene Median test.

When to Compare Many If you collected data from three or more different groups of subjects,
Groups use one of the ANOVA (analysis of variance) procedures to test if there

Choosing the Group Comparison Test to Use 114


Using SigmaStat Procedures

is difference among the groups beyond what can be attributed to


random sampling variation.

There are four procedures available: the single factor or One Way
ANOVA, the Two Way ANOVA, the Three Way ANOVA, and the
Kruskal-Wallis ANOVA on ranks.

➤ Choose One, Two, or Three Way ANOVA (page 230, page 253,
and page 283) if the samples were taken from normally distributed
populations and the variances of the populations are equal. The
One, Two, and Three Way ANOVAs are parametric tests which
directly compare the samples arithmetically.
➤ If your samples were taken from populations with non-normal
distribution and/or unequal variance, choose the Kruskal-Wallis ANOVA on ranks (page 310), which is the nonparametric analog of the one way ANOVA. The Kruskal-Wallis ANOVA on ranks
arranges the data into sets of rankings, then performs an analysis of
variance based on these ranks, rather than directly on the data, so it
does not require assuming normality and equal variance.

FIGURE 5–14
The Compare Many
Groups Commands

The advantage of parametric ANOVAs is that, when the normality and equal variance assumptions are met, they are slightly more sensitive (i.e., they have greater power) than the analysis based on ranks. When the assumptions are not met, the Kruskal-Wallis ANOVA on ranks is more
reliable.
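
The sketch below illustrates the same parametric/nonparametric pairing outside SigmaStat, running a one way ANOVA and a Kruskal-Wallis ANOVA on ranks over three hypothetical groups with scipy.

    # Minimal sketch: one way ANOVA and Kruskal-Wallis ANOVA on ranks.
    # The three groups of values are hypothetical.
    from scipy.stats import f_oneway, kruskal

    group1 = [5.1, 5.5, 4.9, 5.3, 5.0]
    group2 = [6.2, 6.0, 6.4, 5.9, 6.1]
    group3 = [7.0, 6.8, 7.3, 7.1, 6.9]

    f_stat, f_p = f_oneway(group1, group2, group3)
    h_stat, h_p = kruskal(group1, group2, group3)

    print("One Way ANOVA:           F =", round(f_stat, 3), " P =", round(f_p, 6))
    print("Kruskal-Wallis on ranks: H =", round(h_stat, 3), " P =", round(h_p, 6))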

! Note that SigmaStat does not have a two factor analysis of variance based
on ranks.

! Note also that you can tell SigmaStat to analyze your data and test for
normal distribution and equal variance. If assumptions of normality and
equal variance are violated, the alternative parametric or nonparametric test
is suggested. These tests are specified in the Options dialog boxes. To open
the dialog box for the current test, click the button, or choose the
Statistics menu Current Test Options... command.

SigmaStat tests for normality using the Kolmogorov-Smirnov test, and


for equal variance using the Levene Median test.

When to Use One, Two, The difference between a One, Two, and Three Way ANOVA lies in the
and Three Way ANOVAs design of the experiment that produced the data.

➤ Use a One Way ANOVA (page 230) if there are several different
experimental groups that received a set of related but different
treatments (i.e., one factor). This design is essentially the same as an
unpaired t-test (a one way ANOVA of two groups obtains exactly
the same P value as an unpaired t-test).
➤ Use a Two Way ANOVA (page 253) if there were two experimental
factors that are varied for each experimental group.
➤ Use a Three Way ANOVA (page 283) if there are three experimental
factors which are varied for each experimental group.

An example of when to use a One Way ANOVA would be when


comparing biology teachers from three different states for their
knowledge of evolution. The factor varied is state.

An example of when to use Two Way ANOVA would be when comparing teachers from the three states with different education levels for their knowledge of evolution—the two different factors are state and years of education. The two factor design can test three hypotheses about the state and education levels: (1) there is no difference in knowledge among states; (2) there is no difference in knowledge among education levels; and (3) there is no interaction between state and education in terms of knowledge; any differences between differing levels of education are the same in all states.

An example of when to use a Three Way ANOVA would be when


comparing teachers male and female teachers from three different states,
with different levels of education for their knowledge of evolution—the
three different factors are gender, state, and years of education. The three
factor design can test for (1) there is no difference in opinion of the
teachers among genders; (2) there is no difference in opinion of the
teachers among states; (3) there is no difference in knowledge among
education levels; and (4) there is no interaction between gender, state,
and education in terms of knowledge; any differences between differing
levels of education are the same for all genders in all states.

How to Determine Which Groups are Different
Analysis of variance techniques (both parametric and nonparametric)
test the hypothesis of no differences between the groups, but do not
indicate what the differences are. You can use the multiple comparison
procedures (post-hoc tests) provided by SigmaStat to isolate these
differences.

To always test for differences among the groups select the Always
Perform option under the Post Hoc Tests tab in the ANOVA options
dialog boxes. You can also specify to use multiple comparisons to test for
a difference only when the ANOVA P value is significant by selecting the
Only When ANOVA P Value is Significant option, then select the
desired P value.

The specific multiple comparisons procedures to use for each ANOVA
are selected in the Multiple Comparison Options dialog box.
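! As an outside illustration of what a multiple comparison procedure does, the
sketch below runs a Tukey HSD test with the third-party statsmodels library on
three made-up groups. SigmaStat offers several such procedures; this is only
one hypothetical example, not SigmaStat's own routine.

# Minimal sketch (assumes statsmodels and NumPy; data are made up).
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

values = np.array([4.1, 5.0, 4.8, 5.2, 4.6,   # group A
                   5.9, 6.1, 5.7, 6.4, 6.0,   # group B
                   5.1, 4.9, 5.3, 5.5, 5.0])  # group C
groups = ["A"] * 5 + ["B"] * 5 + ["C"] * 5

# Pairwise comparisons that isolate which groups differ after a significant ANOVA.
result = pairwise_tukeyhsd(values, groups, alpha=0.05)
print(result.summary())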

Choosing the Repeated Measures Test to Use

Use repeated measures tests to determine the effect a treatment or
condition has on the same individuals by observing the individuals
before and after the treatments or conditions.

By concentrating on the changes produced by the treatment instead of
the values observed before and after the treatment, repeated measures
tests eliminate the differences due to individual reactions, which gives a
more sensitive (or more powerful) test for finding an effect.

The Advisor Wizard (see Chapter 5) prompts you to answer questions
about your data and goals, then selects the appropriate test. However, if
you are already familiar with the comparison requirements, you can go
directly to the appropriate test. The criteria used to select the
appropriate procedure include:


➤ The number of treatments to compare. Are you comparing the
effect before and after a single treatment, or after two or more
different treatments?
➤ The distribution of the treatment effects. Are the individual effects
distributed along a normal “bell” (Gaussian) curve, or not?
Comparisons of treatment effects with normal distributions use
parametric tests, which are based on the mean and standard
deviation parameters of a normally distributed population. If the
effect distributions are not normal, a nonparametric, or
distribution-free test must be used which ranks the values along a
new ordinal scale before performing the test.

! Note that SigmaStat can automatically test for the assumptions of normality and
equal variance.

SigmaStat lists the specific tests in the Statistics menu and the toolbar
drop-down list. The complete procedures for each repeated measures
comparison test are outlined in Chapter 10.

When to Compare Effects on Individuals Before and After a Single Treatment
If data was collected from the same group of individuals (for example,
patients before and after a surgical treatment, or rats before and after
training), use a Before and After comparison to test for a significant
difference beyond what can be attributed to random individual
variation.

When to use a Paired t-test versus a Wilcoxon Signed Rank Test


You can use two different tests to compare observations before and after
an intervention in the same individuals: the Paired t-test and the
Wilcoxon Signed Rank Test.

➤ Choose the Paired t-test (page 332) if your samples were taken from
a population in which the changes to each subject are normally
distributed. The Paired t-test is a parametric test which directly
compares the sample data.
➤ If your sample effects are not normally distributed, choose the
Wilcoxon Signed Rank Test (page 345). The Wilcoxon Signed Rank
Test arranges the data into sets of rankings, then performs a Paired t-
test on the sum of these ranks, rather than directly on the data.
➤ If your samples are already ordered according to qualitative ranks,
such as poor, fair, good, and very good, use the Wilcoxon Signed
Rank Test.


FIGURE 5–15
The Before and After
Comparison Commands

The advantage of the paired t-test is that, assuming normality and equal
variance, it is slightly more sensitive (i.e., it has greater power) than the
Wilcoxon Signed Rank Test. When these assumptions are not met, the
Wilcoxon Signed Rank Test is more reliable.
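! For readers who also work outside SigmaStat, the following sketch (assuming
SciPy and made-up before/after measurements) shows the same choice in code:
a Paired t-test when the changes are normally distributed, and the Wilcoxon
Signed Rank Test when they are not. It is an illustration only, not SigmaStat's
implementation.

# Minimal sketch (assumes SciPy; before/after values are made up).
from scipy import stats

before = [142, 150, 138, 160, 155, 147]
after  = [135, 146, 137, 151, 149, 140]

# Parametric: Paired t-test on the individual changes.
t_stat, p_paired = stats.ttest_rel(before, after)

# Nonparametric: Wilcoxon Signed Rank Test on the ranked changes.
w_stat, p_wilcoxon = stats.wilcoxon(before, after)

print(f"Paired t-test: t = {t_stat:.3f}, P = {p_paired:.4f}")
print(f"Wilcoxon:      W = {w_stat:.3f}, P = {p_wilcoxon:.4f}")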

! Note that you can tell SigmaStat to analyze your data and test for normality.
If the assumption of normality is violated, the alternative parametric or
nonparametric test is suggested. Assumption tests are activated and
configured in the Paired t-test and Wilcoxon Options dialog boxes.

SigmaStat tests for normality using the Kolmogorov-Smirnov test.

When to Compare Effects on Individuals After Multiple Treatments
If you collected data on the same individuals undergoing three or more
different treatments or conditions, use one of the Repeated Measures
ANOVA (analysis of variance) procedures to test if there is a difference
among the effects of the treatments beyond what can be attributed to
random individual variation.

There are three procedures available: the single factor or One Way
Repeated Measures ANOVA (analysis of variance), the Two Way
Repeated Measures ANOVA, and the Friedman Repeated Measures
ANOVA on Ranks.

➤ Choose One or Two Way ANOVA (page 230 and page 379) if
the treatment effects are normally distributed with equal variances.
The one and two way ANOVAs are parametric tests which directly
compare the samples arithmetically.
➤ If the treatment effects are not normally distributed and/or have
unequal variances, choose the Friedman Repeated Measures


FIGURE 5–16
The Repeated Measures
Comparison Commands

ANOVA on ranks (page 408), which is the nonparametric analog of
the One Way ANOVA. The Friedman Repeated Measures ANOVA
on ranks arranges the data into sets of rankings, then performs an
analysis of variance based on these ranks, rather than directly on the
data, so it does not require assuming normality and equal variances.

The advantage of parametric Repeated Measures ANOVAs is that,
when the normality and equal variance assumptions are met, they are
slightly more sensitive (i.e., they have greater power) than the analysis
based on ranks. When the assumptions are not met, the Friedman
Repeated Measures ANOVA on ranks is more reliable.

! Note that SigmaStat does not have a two factor analysis of variance based
on ranks.

! Note that you can tell SigmaStat to analyze your data and test for normal
distribution and equal variance. If the assumptions of normality and equal
variance are violated, the alternative parametric or nonparametric test is
suggested. These tests are specified in the repeated measures one and two way
and Friedman options dialog boxes. See page 230 and page 408 for
more information.

SigmaStat tests for normality using the Kolmogorov-Smirnov test, and
for equal variance using the Levene Median test.

When to Use One and Two Way RM ANOVA
The difference between a one factor and two factor repeated measures
ANOVA lies in the design of the experiment that produced the data.


➤ Use a One Way RM ANOVA (page 355) if the individuals received
a set of related but different treatments (i.e., one factor). This
design is essentially the same as a paired t-test (a one way repeated
measures ANOVA of two groups obtains exactly the same P value as
a paired t-test).
➤ Use a Two Way RM ANOVA (page 379) if there were two
experimental factors that are varied for the individuals. One or both
of the factors can be repeated on the individuals.

An example of when to use a One Way Repeated Measures ANOVA
would be when comparing the reading skills of the same students after
grade school, high school, and college. The repeated factor is education.

An example of when to use a Two Way Repeated Measures ANOVA
would be when comparing reading skills at different education levels,
but the students attended different schools. This example has repeated
measures on education level only, with school as the unrepeated second
factor. If you changed the schools so that all students attended all schools
as well, then the school factor is also repeated.

The two factor design can test three hypotheses about the education
levels and schools: (1) there is no difference in reading skill at different
education levels; (2) there is no difference in reading skill at different
schools or after changing schools; and (3) there is no interaction between
education level and school in terms of reading skill; any effect of levels of
education are the same in all schools.

! SigmaStat automatically determines if one or both factors have repeated
observations in a two way repeated measures ANOVA.

How to Determine Which Treatments Have an Effect
Repeated measures analysis of variance techniques (both parametric and
nonparametric) test the hypothesis of no effect among treatments, but
do not indicate which treatments have an effect. You can use the
multiple comparison procedures provided by SigmaStat to isolate the
differences in effect.

To always test for differences among the groups, select the Always
Perform option under the Post Hoc Tests tab in the ANOVA options
dialog boxes. You can also specify to use multiple comparisons to test for
a difference only when the ANOVA P value is significant by selecting the
Only When ANOVA P Value is Significant option, then select the
desired P value.


The specific multiple comparisons procedures to use for each ANOVA
are selected in the Multiple Comparison Options dialog box.

Choosing the Rate and Proportion Comparison to Use

Frequency, rate, and proportion tests compare percentages and


occurrences of observations, such as the proportion of males and females
found in different countries. Use rate and proportion comparisons to
determine if there is a significant difference in the distribution of a
group among different categories or classes beyond what can be
attributed to random sampling variation. The data can be random
observations of a population, or a group before and after a treatment or
change in condition.

You can compare distribution in categories using a z-test to Compare


Proportions, Chi-Square analysis of contingency tables, Fisher Exact
Test, and McNemar's Test.

➤ Use z-test (page 429) to determine if proportions of a single group


divided into two categories are significantly different. Compare
Proportions compares two groups according to the percentage of
each group in the two categories.
➤ Use Chi-Square (χ²) analysis of contingency tables (page 442) to
compare the numbers of individuals of two or more groups that fall
into two or more different categories.
➤ Use the Fisher Exact Test (page 451) if you have two groups falling
into two categories (a 2 x 2 contingency table) with a small number
of expected observations in any category.
➤ Use McNemar's Test (page 457) to compare the number of
individuals that fall into different categories before and after a single
treatment or change in condition.

! Note that SigmaStat automatically analyzes your data for its suitability for
Chi-Square or Fisher Exact Test, and suggests the appropriate test.
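! For comparison with tools outside SigmaStat, the sketch below (assuming the
SciPy and statsmodels libraries, with a made-up 2 x 2 contingency table) runs a
chi-square test, the Fisher Exact Test, and McNemar's Test on the same counts.
It is only an illustration of the three tests, not SigmaStat's implementation.

# Minimal sketch (assumes SciPy and statsmodels; counts are made up).
from scipy import stats
from statsmodels.stats.contingency_tables import mcnemar

# Rows are groups, columns are categories (a 2 x 2 contingency table).
table = [[12, 28],
         [22, 18]]

chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
odds_ratio, p_fisher = stats.fisher_exact(table)

# McNemar's Test treats the table as paired before/after counts instead.
p_mcnemar = mcnemar(table, exact=True).pvalue

print(f"Chi-square:   P = {p_chi2:.4f}  (use Fisher when expected counts are small)")
print(f"Fisher exact: P = {p_fisher:.4f}")
print(f"McNemar:      P = {p_mcnemar:.4f}")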


FIGURE 5–17
The Rates and Proportions
Comparison Methods

Choosing the Prediction or Correlation Method

When you want to predict the value of one variable from one or more
other variables, you can use regression methods to estimate the
predictive equation, and/or compute a correlation coefficient to
describe how strongly the value of one variable is associated with
another.

When to Use Regression to Predict a Variable
Regression methods are used to predict the value of one variable (the
dependent variable) from one or more independent variables by
estimating the coefficients in a mathematical model. Regression assumes
that the value of the dependent variable is always determined by the
value of independent variables. Regression is also known as fitting a line
or curve to the data.

Regression is a parametric statistical method that assumes that the


residuals (differences between the predicted and observed values of the
dependent variables) are normally distributed with constant variance.

The type of regression procedure to use depends on the number of


independent variables and the shape of the relationship between the
dependent and independent variables. You can perform regression using
Simple Linear Regression, Multiple Linear Regression, Multiple Logistic
Regression, Polynomial Regression, and Nonlinear Regression.


➤ Use a Simple Linear Regression procedure if there is a single
independent variable, and the dependent variable changes in
proportion to changes in the independent variable (i.e., linearly). For
more information on the Simple Linear Regression, see page 469.
➤ Use Multiple Linear Regression when there are several independent
variables, and the dependent variable changes in proportion to
changes in each independent variable (i.e., linearly). For more
information on the Multiple Linear Regression, see page 495.
➤ Use Multiple Logistic Regression when you want to predict a
qualitative dependent variable, such as the presence or absence of a
disease, from observations of one or more independent variables, by
fitting a logistic function to the data. For more information on the
Multiple Logistic Regression, see page 527.
➤ Use Polynomial Regression for curved relationships that include
powers of the independent variable in the regression equation. For
more information on the Polynomial Regression, see page 553.
➤ Use Nonlinear Regression to fit any general equation to the
observations. For more information on the Nonlinear Regression,
see page 636.
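! To make the distinction concrete for readers who also script their analyses, the
sketch below (assuming SciPy and NumPy, with made-up x/y data) fits a simple
linear regression and a second-order polynomial to the same data. It is an
outside illustration of the two model types, not SigmaStat's fitting engine.

# Minimal sketch (assumes SciPy and NumPy; x/y data are made up).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 6.2, 8.8, 10.9, 13.2])

# Simple linear regression: one independent variable, straight-line relationship.
fit = stats.linregress(x, y)
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}, r = {fit.rvalue:.3f}")

# Polynomial regression: include powers of the independent variable (here, degree 2).
coeffs = np.polyfit(x, y, deg=2)
print("polynomial coefficients (highest power first):", coeffs)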

You can determine whether or not a possible independent variable
contributes to a multiple linear regression model using Forward and
Backward Stepwise Regression or Best Subset Regression. Use these
procedures if you are unsure of the contribution of a variable to the value
of the dependent variable in a Multiple Linear Regression.

➤ Use Backwards Stepwise Regression (page 577) to begin with all


selected independent variables, and delete the variables that least
contribute to predicting the dependent variable, until only variables
with real predictive value remain in the model.
➤ Use Forward Stepwise Regression (page 577) to start with zero
independent variables, then add variables that contribute to the
prediction of the dependent variable, until (ideally) all variables that
contribute have been added to the model.
➤ Use Best Subset Regression (page 611) to evaluate all possible
models of the regression equation, and identify those with the best
predictive ability (according to a specified criterion).

Note that these procedures can be used to find Multiple Linear


Regression models. Choose Polynomial or Nonlinear Regression for
curved data sets.


FIGURE 5–18
The Regression Commands

When to Use Correlation
Compute the correlation coefficient if you want to quantify the
relationship between two variables without specifying which variable is
the dependent variable and which is the independent variable.
Correlation does not predict the value of one variable from another; it
only quantifies the strength of association between the value of one
variable with another.

FIGURE 5–19
The Correlation Coefficient
Computation Commands

You can compute two kinds of correlation coefficients: the Pearson


Product Moment Correlation coefficient, and the Spearman Rank
Order Correlation coefficient.

➤ Choose Pearson Product Moment Correlation (page 623) if the


residuals are normally distributed and the variances are constant.


The Pearson Product Moment Correlation is a parametric test


which assumes that data were drawn from a normal population.
➤ If the residuals are not normally distributed and/or have non-
constant variances, choose Spearman Rank Order Correlation (page
631). The Spearman rank order correlation is a nonparametric test
that constructs a measure of association based on ranks rather than
on arithmetic values.
➤ If your samples are already ordered according to qualitative ranks,
such as poor, fair, good, and very good, choose Spearman rank order
correlation.

The advantage of the Pearson Product Moment Correlation is that,


assuming normality and constant variance, it is slightly more sensitive
(i.e., it has greater power) than the Spearman rank order correlation.
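! The same choice can be sketched outside SigmaStat. The example below,
assuming SciPy and made-up paired measurements, computes both coefficients
on the same data; it is only an illustration, not SigmaStat's computation.

# Minimal sketch (assumes SciPy; paired measurements are made up).
from scipy import stats

x = [1.2, 2.4, 3.1, 4.8, 5.5, 6.9]
y = [2.0, 2.9, 3.8, 5.1, 5.4, 7.2]

# Parametric: Pearson Product Moment Correlation.
r, p_pearson = stats.pearsonr(x, y)

# Nonparametric: Spearman Rank Order Correlation (based on ranks).
rho, p_spearman = stats.spearmanr(x, y)

print(f"Pearson r = {r:.3f} (P = {p_pearson:.4f})")
print(f"Spearman rho = {rho:.3f} (P = {p_spearman:.4f})")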

Choosing the Survival Analysis to Use

Use survival analysis to generate the probability of the time to an event


and the associated statistics such as the median survival time. Three
procedures are provided: single group, multiple groups using the
LogRank statistic and multiple groups using the Gehan-Breslow
statistic.

➤ Use Survival, Single Group to determine the survival time statistics
and graph for a single data set (group). This may also be used to
generate a single survival curve graph and statistics for all data sets
combined in a multi-group data set provided that the data is in
Indexed format. This is done by selecting the survival time and status
columns and ignoring the group column.
➤ Use Survival, LogRank to determine the survival time statistics and
graph for multi-group data sets. The LogRank statistic and one of
two multiple comparison procedures will be used to determine
which groups are significantly different. The LogRank statistic
assumes that all survival times are equally accurate.
➤ Use Survival, Gehan-Breslow for exactly the same situation as the
LogRank case except that the later survival times are assumed to be
less accurate and are given less weight. Many censored values with
large survival times provide an example of this situation.
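! For readers who also do survival work in code, the sketch below (assuming the
third-party lifelines package and made-up survival times with censoring flags)
fits a Kaplan-Meier curve and runs a LogRank comparison of two groups. It
illustrates the concepts only and is not SigmaStat's procedure.

# Minimal sketch (assumes the lifelines package; times and censoring flags are made up).
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

times_a  = [6, 7, 10, 15, 19, 25]   # survival times, group A
events_a = [1, 0, 1, 1, 0, 1]       # 1 = event observed, 0 = censored
times_b  = [5, 8, 9, 12, 14, 17]
events_b = [1, 1, 0, 1, 1, 1]

kmf = KaplanMeierFitter()
kmf.fit(times_a, event_observed=events_a)
print("Group A median survival time:", kmf.median_survival_time_)

# LogRank test for a difference between the two survival curves.
result = logrank_test(times_a, times_b,
                      event_observed_A=events_a, event_observed_B=events_b)
print("LogRank P value:", result.p_value)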


Testing Normality

A normal population follows a standard, “bell” shaped Gaussian


distribution. Parametric tests assume normality of the underlying
population or residuals of the dependent variable, and can become
unreliable if this assumption is violated. SigmaStat uses the Kolmogorov-
Smirnov test (with Lilliefors' correction) to test data for normality of the
estimated underlying population.
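! Outside SigmaStat, an equivalent check can be scripted. The sketch below
(assuming the statsmodels package, which provides a Kolmogorov-Smirnov test
with the Lilliefors correction, and made-up sample data) reports the test
statistic and P value; it is only an illustration, not SigmaStat's implementation.

# Minimal sketch (assumes statsmodels and NumPy; sample data are made up).
import numpy as np
from statsmodels.stats.diagnostic import lilliefors

sample = np.array([4.2, 5.1, 4.8, 5.6, 4.9, 5.3, 4.7, 5.0, 5.2, 4.6])

# Kolmogorov-Smirnov test with Lilliefors' correction for the estimated mean and SD.
ks_stat, p_value = lilliefors(sample, dist="norm")

alpha = 0.05  # P value used to pass or fail the data
print(f"K-S distance = {ks_stat:.3f}, P = {p_value:.3f}")
print("Passes normality test" if p_value > alpha else "Fails normality test")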

FIGURE 5–20
Example of Normally
Distributed Data Plotted
Using a Line Plot
Normally distributed data
has a characteristic “bell”
shaped distribution,
as shown on the left.

Non-normal data can have a


“skewed” distribution, as
shown on the right.

When to Test for Normality
Normality is assumed for all parametric tests and regression procedures.
SigmaStat can automatically perform a normality test when running
a statistical procedure that makes assumptions about the population
parameters. This assumption testing is enabled in the Options dialog
box for each test. If the data fails the assumptions required for a
particular test, SigmaStat will suggest the appropriate test that can be
used instead.

However, if you want to perform a parametric test and your data fails the
normality test, you can transform your data using Transforms menu
commands so that it meets the normality requirements. To make sure
transformed data now follows a normal distribution pattern, you can
run a normality test on the data before performing the parametric
procedure again.

Performing a Normality Test
To run a normality test:
1 Enter, transform or import the data to be tested for normality into
data worksheet columns.

2 If desired, set the P value used to pass or fail the data in the Report
Options dialog box (see the following section).


3 Select Normality from the toolbar drop-down list, or choose the


Statistics menu Normality... command.

4 Run the Normality test by selecting the worksheet columns with


the data you want to test using the Pick Columns dialog box.

5 View and interpret the Normality test report, and generate the
report graphs.

Setting the P Value for the Normality Test

The P value used by the Kolmogorov-Smirnov test to say whether the


data passes or fails the Normality test is set in the Report Options dialog
box.

To set the P value for the Normality test:

1 Choose the Statistics menu Report Options... command. The


Report Options dialog box appears.

2 If you want to change the P value for the normality test, select the
P value box. The P value determines the probability of being
incorrect in concluding that the data is not normally distributed. If
the P computed by the test is greater than the P set here, the test
passes.

FIGURE 5–21
The Report Options Dialog
Box

To require a stricter adherence to normality, increase the P value.


Because the parametric statistical methods are relatively robust in
terms of detecting violations of the assumptions, the suggested
value in SigmaStat is 0.050. Larger values of P (for example,
0.100) require less evidence to conclude the data is not normal.

To relax the requirement of normality, decrease P. Requiring
smaller values of P to reject the normality assumption means that
you are willing to accept greater deviations from the theoretical
normal distribution before you flag the data as non-normal.

3 Click OK when finished.

For more information on setting the P value in the Report Options


dialog box, see Setting Report Options on page 135.

Arranging Normality Test Data

Normality test data must be in raw data format, with the individual
observations for each group, treatment or level in separate columns. You
can test up to 64 columns of data for normality.

FIGURE 5–22
Valid Data Format
for Normality Testing

Running a Normality Test

To run a Normality test, you need to select the data to test. The Pick
Columns dialog box is used to select the worksheet columns with the
data you want to test.

To run a Normality test:

1 If you want to select your data before you run the test, drag the
pointer over your data.


2 Open the Pick Columns for Normality dialog box by:

➤ selecting Normality from the toolbar drop-down list, then


selecting the button
➤ choosing the Statistics menu Normality... command

3 If you selected columns before you chose the test, the selected
columns automatically appear in the Selected Columns list. To
assign the desired worksheet columns to the Selected Columns list,
select the columns in the worksheet, or select the columns from
the Data for Data drop-down list.

FIGURE 5–23
The Pick Columns
for Normality Dialog Box

The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns
appear in each row. You can select up to 64 columns of data for the
Normality test.

4 To change your selections, select the assignment in the list, then
select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

5 Click Finish to run the Normality test on the data in the selected columns. After the
computations are completed, the report appears. To edit the
report, use the Format menu commands; for information on
editing reports, see page 137.

! If you attempt to run a test on worksheet columns with empty cells,


a dialog box appears asking you if you want to convert the empty
cells to missing values. Choose Convert to convert the cells and
continue with the test, or Cancel to cancel the test.


Interpreting Normality Test Results

The results of a Normality test display the K-S distances and P values
computed for each column, and whether or not each column selected
passed or failed the test.

FIGURE 5–24
The Normality
Test Report

K-S Distance
The Kolmogorov-Smirnov distance is the maximum distance between
the cumulative distribution of your data and the cumulative Gaussian
(normal) distribution fitted to your data.

P Values
The P values report the results of the Kolmogorov-Smirnov test for
normality. If the P computed by the test is greater than
the P set in the Report Options dialog box (see page 135), your data can
be considered normal.

In addition to the numerical results, expanded explanations of the results


may also appear. You can turn off this explanatory text in the Report
Options dialog box. For more information, see Setting Report Options
on page 135.

! The number of decimal places displayed is also controlled in the Report


Options dialog box.

Normality Report Graphs

You can generate two graphs using the results from a Normality report.
They include a:


➤ Histogram of the residuals.


➤ Normal probability plot of the residuals.

Histogram of Residuals
The Normality histogram plots the raw residuals in a specified range,
using a defined interval set. The residuals are divided into a number of
evenly incremented histogram intervals and plotted as histogram bars
indicating the number of residuals in each interval. The X axis represents
the histogram intervals, and the Y axis represents the number of residuals
in each group. For an example of a histogram of residuals, see page 153.

Normal Probability Plot
The Normality probability plot graphs the frequency of the raw
residuals. The residuals are sorted and then plotted as points around a
curve representing the area of the gaussian plotted on a probability axis.
Plots with residuals that fall along the Gaussian curve indicate that your data
was taken from a normally distributed population. The X axis is a linear
scale representing the residual values. The Y axis is a probability scale
representing the cumulative frequency of the residuals. For an example
of a normal probability plot, see page 155.
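! A comparable pair of diagnostic plots can be produced outside SigmaStat. The
sketch below (assuming NumPy, SciPy, and Matplotlib, with made-up residuals)
draws a histogram of the residuals and a normal probability plot; it illustrates
the graph types only and does not reproduce SigmaStat's report graphs.

# Minimal sketch (assumes NumPy, SciPy, and Matplotlib; residuals are made up).
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
residuals = rng.normal(loc=0.0, scale=1.0, size=100)  # stand-in for report residuals

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3.5))

# Histogram of residuals: counts per evenly spaced interval.
ax1.hist(residuals, bins=10, edgecolor="black")
ax1.set_xlabel("Residual")
ax1.set_ylabel("Count")

# Normal probability plot: points near the line suggest normally distributed residuals.
stats.probplot(residuals, dist="norm", plot=ax2)

plt.tight_layout()
plt.show()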

Creating a Report Graph
To generate a graph of Normality report data:
1 Click the toolbar button, or choose the Graph menu Create
Graph command when the Normality report is selected. The
Create Graph dialog box appears displaying the types of graphs
available for the Normality report.

FIGURE 5–25
The Create Graph Dialog
Box
for the Normality Report

2 Select the type of graph you want to create from the Graph Type
list, then click OK, or double-click the desired graph in the list.
For more information on each of the graph types, see page 149
through 8-176. The specified graph appears in a graph window or
in the report.

For information on closing, editing, saving, opening, and printing


graphs, see Chapter 7.

Determining Experimental Power and Sample Size

The power, or sensitivity, of a statistical hypothesis test depends on the
alpha (α) level, or risk of a false positive conclusion, the size of the effect
or difference you wish to detect, the underlying population variability,
and the sample size.

The sample size for an intended experiment is determined by the power,
alpha (α), the size of the difference, and the population variability.

Both of these procedures are discussed in detail in Chapter 13.
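! For orientation, the sketch below (assuming the statsmodels package and a
made-up standardized effect size) solves for both the power of a fixed-size
unpaired t-test and the per-group sample size needed to reach a target power.
It is only an outside illustration of these calculations, not SigmaStat's procedure.

# Minimal sketch (assumes statsmodels; the effect size and targets are made up).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of an unpaired t-test with 20 subjects per group, effect size 0.8, alpha 0.05.
power = analysis.solve_power(effect_size=0.8, nobs1=20, alpha=0.05)
print(f"Power with n = 20 per group: {power:.3f}")

# Sample size per group needed to reach a power of 0.80.
n_per_group = analysis.solve_power(effect_size=0.8, alpha=0.05, power=0.80)
print(f"Required n per group for power 0.80: {n_per_group:.1f}")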

When to Compute Power and Sample Size
Power and sample size computations are used to determine the
parameters for an intended experiment, before the experiment is carried
out. Use these procedures to help improve the ability of your
experiments to test the desired hypotheses.

You can determine power or sample size for:

➤ Paired and Unpaired t-tests


➤ One Way ANOVA
➤ z-test comparison of proportions
➤ Chi-Square analysis of contingency tables
➤ Correlation Coefficients

To determine the power of an intended test, choose the Statistics menu


Power command, then choose the test.

When the dialog box appears, specify the remaining parameters of the
data. For more information on determining the power of a test, see
Chapter 13.


FIGURE 5–26
The Power
Computation Commands

To estimate the sample size necessary to achieve a desired power, choose


the Statistics menu Sample Size command, then choose the test.

FIGURE 5–27
The Sample Size
Computation Commands

When the dialog box appears specify the power and the remaining
parameters of the data. For more information on determining the
sample size of a test, see Chapter 13.


6 Working with Reports

This chapter discusses test reports. It explains how to:

➤ Set report options (page 135)


➤ Generate reports (page 137)
➤ Edit reports (page 137)
➤ Move around reports (page 141)
➤ Save reports to Notebooks (page 142)
➤ Export reports (page 143)
➤ Open reports (page 144)
➤ Print reports (page 145)

Setting Report Options

Use the Report Options dialog box to:

➤ Set the number of decimals displayed in the report.


➤ Enable or disable scientific notation.
➤ Enable or disable explanatory text for report results.
➤ Set whether or not you want to report only flagged values.
➤ Hide or display the report ruler.

To set report options:

1 Choose the Tools menu Options command and then click the
Report tab.

2 Change the appropriate options, then click OK to accept the
changes and close the dialog box (see below for information on the
individual options). All generated reports use the specified option
settings.

Setting Significant Digits
The Number of Significant Digits option is used to set the number of
significant digits used for the values in the report. The default is three
digits. The maximum number of digits is sixteen.

Using Scientific Notation
The Always Use Scientific Notation option uses scientific notation for
the appropriate values in the report tables. If this option is disabled,
scientific notation is only used when the value is too long to fit in the
table cell. This option is disabled by default.

Explaining Test Results
The Explain Test Results option includes explanatory text for test results
in the report. This option is enabled by default. Disable the option to
keep explanatory text out of the report.

Specifying a Significant P Value
The P Value for Significance determines whether there is a statistically
significant difference in the mean values of the groups being tested. The
value you specify is compared to the P values computed by all tests.

! It is important to note that this P value does not affect the actual test results.
It only affects the text that explains if the difference in the mean values of the
groups is due to chance or due to random sampling variation.

If the P computed by the test is smaller than the P set here, the text
reads, “The difference in the mean values of the two groups is greater
than would be expected by chance; there is a statistically significant
difference between the input groups.”

If the P value computed by the test is greater than the P set here, the text
reads, “The difference in the mean values of the two groups is not great
enough to reject the possibility that the difference is due to random
sampling variability. There is not a statistically significant difference
between the input groups.”

One of the above explanation text strings appears for each P value
computed by the test. ANOVAs and some regressions produce multiple
P values.

! If the Explain Test Results option is turned off, the results of this P value do
not appear in the report.


Hiding and Displaying the Report Ruler
The Show Ruler option displays the ruler at the top margin of the report
page. This option is enabled by default. Disable the option to hide the
report ruler.

Generating Reports

Reports contain the results of a performed test. Each time you run a test,
a new test section containing the report is generated. The report takes
the name of the test it was generated from and the number of the report.
The section takes the name of the test the report was generated from and
is numbered according to the order in which it was generated.

For example, if you generate the first report by running the Descriptive
Statistics test, the title of the report window is Descriptive Statistics 1. If
you generate a second report using data from the Paired t-test worksheet,
the title of the report is Paired t-test 2. The sections are Descriptive
Statistics 1 and Paired t-test 1. For information on renaming and moving
notebook items, see Copying and Moving Notebook Items on page 28
and Naming Sections and Items on page 23.

You can generate as many reports as desired. If you have multiple reports
opened, use the Window menu to select the report you want to view.

! Use the Edit menu commands to combine reports together. See the
following section, EDITING REPORTS for more information.

Editing Reports

Editing reports involves changing text and paragraph attributes. Use the
formatting toolbar or the Format menu commands to modify the font,
margins, text alignment, line spacing, and tabs in the selected
report. You can also use the Edit menu to search for and replace specified
text with new text.

! By default, the report page is set to US Letter size and Portrait orientation.

Formatting Toolbar
The formatting toolbar automatically appears under the standard
toolbar whenever a report window is open; it is not available unless a
report is open, and is not active unless a report is the active window. Use


the formatting toolbar to modify the style, alignment, spacing, and tabs
of selected report text.

For information on hiding, displaying, and positioning the formatting


toolbar, see Using SigmaStat Toolbars on page 8.

Report Ruler Settings
A ruler automatically appears at the top of each report. It can be used to
set tabs and position text in the report. To turn the report ruler on and
off choose the Tools menu Options command, and click the Report tab.
Select or clear Show Ruler. Select the units of measurement you want to
use, then click OK.

FIGURE 6–1
The Ruler Units Dialog Box

Changing Text Attributes
Changing the attributes of report text involves modifying the typeface,
color, style, and size of the selected characters.

To modify report text attributes:

1 Select the characters you want to modify. If you want to modify


the entire report, choose the Edit menu Select All command to
select all the text in the report.


2 Choose the Format menu Font... command to open the Font


dialog box. .

FIGURE 6–2
The Font Dialog Box

3 Select the desired font, style, and size from the Font, Font Style,
and Size lists. Use the Underline and Strikeout check boxes to draw
lines through and under selected text. Select the desired color from
the Color drop-down list. An example of the specified font
attributes appears in the box at the bottom of the dialog box.

! You can also use the buttons in the formatting toolbar to italicize,
underline, or make selected text boldface.

4 Click Apply to assign the settings to the selected text without


closing the dialog box or OK to assign the settings to the text and
close the dialog box. The selected text changes according to the
specified characters attributes.

Spacing and Aligning Report Text
Space and align report text using the Format menu Spacing and
Alignment commands or the formatting toolbar. Line spacing can be set
to 1 Line, 2 Lines, or 3 Lines. Text can be left, right, or center aligned,
or justified. Select the lines you want to space. Choose the Edit menu
Select All command to select all the text in the report. When the desired
text is selected, choose the appropriate command, or click the
appropriate buttons from the formatting toolbar.

Setting Tabs
Set tabs using the ruler that appears above the report. Select the left,
right, center, or decimal aligned tab button from the formatting toolbar
or choose the appropriate Format menu Tab command, then click in the


space in the ruler above the measurement units. Arrows or a decimal


representing the selected tab appears at the selected measurement unit.
You can add as many tabs as desired. To clear tabs from report text, drag
the tab markers off the report ruler.

Searching for and Changing Report Text
To find and change specified text in a report:

1 Make sure the report is the active window, then choose the Edit
menu Find or Replace command, or press Ctrl+F. The Find and
Replace dialog box appears.

2 Type the text you want to search for in the Find What edit box.

3 Under Search Options, select All, Up, or Down to specify the


direction of the search.

4 Select the Match Case option to search only for text that matches
the case you specified in the Find What edit box.

5 To replace the text you are searching for with new text, click the
Replace tab, and type the text you want to replace the old text with
in the Replace With edit box.

6 Click the Find Next button to find the specified text according to
the selected settings. SigmaStat starts searching at the cursor
location. The first instance of the text after the cursor is
highlighted in the report.

7 To replace the highlighted text with the text in the Replace With
edit box, click the Replace button. Click the Replace All button to
replace all instances of the text in the Find What edit box with the
text in the Replace With edit box.

! Selecting Replace with nothing in the Replace With edit box deletes
report text matching the text in the Find What edit box.

8 Click Cancel to close the dialog box. You can also close the dialog
box by clicking the button in the upper right corner of the
dialog box.

Cutting and Copying Report Text
To remove selected text from the report, choose the Edit menu Cut
command, click the toolbar button, or press Ctrl+X. To copy
selected text from the report to the Clipboard without removing it from


the worksheet, choose the Edit menu Copy command, click the toolbar
button, or press Ctrl+C.

Pasting Report Text
Use the Paste command to paste text to other locations in the report,
other reports, or other applications. To paste cut or copied text from the
Clipboard, click or move the cursor to where you want to place the text;
then choose the Paste command, click the toolbar button, or press
Ctrl+V. The Clipboard contents appear in the specified location of the
report or application.

Deleting Report Text
Use the Clear command or press the Delete key to permanently erase
selected text from the report. The text is not copied to the Clipboard.

Editing Reports Using a Word Processor
The SigmaStat editor is a fully functional text editor; however, for
complex or lengthy editing tasks, you can use a more powerful word
processor. To open reports in other applications, you need to export the
report as either a text or RTF file. Reports saved as RTF files keep all of
the formatting code. To leave the formatting code out of the report,
export the report as plain text by choosing Text (.TXT) as the file type.

For more information on exporting reports, see Exporting Reports on


page 143.

Moving Around Reports

Use the scroll bars at the right and bottom edges of the report window to
scroll through the current page of the report. Scroll bars do not move to
the next or previous page. You must use the formatting toolbar and
buttons to move one page up and down in the report.

You can also use the following keyboard commands to move around and
select text in the report.

Function Keystroke
Move to next character →
Move to previous character ←
Move to next word Ctrl+→
Move to previous word Ctrl+←
Move to beginning of line Home
Move to end of line End
Move down one line ↓
Move up one line ↑
Move one window height down PgDn
Move one window height up PgUp
Move to beginning of report Ctrl+Home
Move to end of report Ctrl+End
Backspace or delete previous character Backspace
Delete current character Delete
Select next character Shift+→
Select previous character Shift+←
Select a word Double-click
Select to end of line Shift+End
Select to beginning of line Shift+Home
Select to end of report Ctrl+Shift+End
Select to beginning of report Ctrl+Shift+Home

Saving Reports to Notebooks

Reports generated from SigmaStat tests and non-notebook report files


opened in SigmaStat are always saved to notebook files. To save a report
to an existing notebook file, choose the File menu Save command, press
Ctrl+S, or click the toolbar button.

If you are saving a notebook for the first time, the Save As dialog box
appears prompting you for a file name, and path for the notebook file.


If you are saving the report to an existing notebook file, the notebook is
updated to include the new report or the changes to the existing report.

! To save reports as non-notebook files, you must export them
using the File menu Export... command. For more information on
exporting reports, see Exporting Reports below.

For more information on saving reports to notebook files, see Saving


Notebook Files and Items on page 30.

Exporting Reports

SigmaStat reports can be saved as non-notebook files using the File
menu Export... command. Saving reports as non-notebook files is useful
if you want to edit reports in a word processing application (see Editing
Reports Using a Word Processor on page 141).

To export a report as a non-notebook file, drag the mouse over the text
you want to save to a file. If no text is selected, the entire report is
exported. Choose the File menu Export... command, then select the file
type to export the report to. Reports can be exported as the following file
types:

➤ Text files (.TXT)


➤ Rich Text Format files (.RTF)
➤ PDF Format files (*.PDF)
➤ HTML Format files (*.HTM)

RTF files are saved with formatting attributes. Text files are saved
without the formatting attributes.

! You can also cut and/or copy report text, then paste it into a word processing
application using the Edit menu commands. Text pasted into other
applications is pasted as plain text.

For more information on exporting reports, see Exporting Notebook


Items to Other File Formats on page 32.


Opening Reports

To open a report, choose the File menu Open... command, click the
toolbar button, or press Ctrl+O. When the Open dialog box appears,
select the type of file you want to open by selecting a file type from the
List Files of Type drop-down list, then click OK. If you are opening a
report in a notebook file, choose a notebook file as the file type. If you
are opening a non-notebook report file, choose Text or Rich Text
Format as the file type.

Reports in Notebook Files
If you open a Notebook file (.SNB), a notebook file appears displaying
its sections and items. To view the desired report, double-click the report
icon in the appropriate section.

For more information on opening notebook files, see Opening and


Viewing Notebook Files and Items on page 25.

Non-Notebook Report Files
Non-notebook files are individual files which are separate from the
notebook. They are automatically converted to notebook file format
when opened in SigmaStat. You can open the following non-notebook
report file types in SigmaStat:

➤ Report Text Files (.TXT)


➤ Rich Text Format files (.RTF)

For more information on opening non-notebook files in SigmaStat, see


Opening Non-Notebook Files on page 26.

Closing and Deleting Reports

Close a report by clicking the close button which appears in the upper
right corner of the report window. You can also close reports by choosing
the File menu Close command while the report is the active window.
Closed reports can be opened by double-clicking the report and graph
icon in the notebook section.

Closed reports and graphs are not removed from the notebook. To delete
a report or graph, close it, then select the report icon in the notebook
section and press Delete. For more information on deleting notebook
items, see Removing Items from a Notebook File on page 30.


Printing Reports

To print a report:

1 Make sure the report or graph you want to print is in the active
window, then choose the File menu Print... command, click the
toolbar button, or press Ctrl+P. The Print dialog box appears.

2 Specify the printer to use, the range of pages, and number of copies
to print.

FIGURE 6–3
Example of the Print dialog
box for the HP LaserJet
Postscript Printer
This dialog box differs
depending on the type of
output device you have.

3 Click Properties to set more advanced printing options. Once you


have set the desired option, click OK to return to the Print dialog
box, then click OK again to print the selected report.

! Note that the Print dialog box differs depending on the type of
printer you have. Figure 6–3 is an example of the Print dialog box
with an HP 4/4M Postscript driver selected.



7 Creating and Modifying Graphs

SigmaStat creates graphs generated from reports and exploratory graphs that
you create using the Graph Wizard. This chapter discusses how to create and
modify both types of SigmaStat graphs. It explains how to:

➤ Generate report graphs (page 148)


➤ Create exploratory graphs (page 162)
➤ Set graph page options (page 178)
➤ Zoom in and out on graphs (page 181)
➤ Select graphs and objects (page 182)
➤ Move and size graphs (page 182)
➤ Modify graphs (page 184)
➤ Add and edit labels and legends to the graph (page 186)
➤ Edit graphs in SigmaPlot (page 189)
➤ Save graphs in Notebooks (page 200)
➤ Open graph pages (page 200)
➤ Close and delete reports and graphs (page 201)
➤ Print graphs (page 202)


Generating Report Graphs

Graphs can be generated for all test reports except Two Way Repeated
Measures ANOVA, rates and proportions tests, Best Subset and Incremental
Polynomial Regression, and Multiple Logistic reports.

To generate a report graph, select the appropriate report, then click the
toolbar button, or choose the Graph menu Create Graph... command, or
press F3. The Create Graph dialog box appears displaying the available
graphs for the selected report.

! The button and Create Graph... command are dimmed if no report is


selected or if the selected report does not generate a graph.

Select the report graph you want to create, then click OK, or double-click the
graph in the list.

FIGURE 7–1
The Create Graph
Dialog Box for a Report
Graph

If you are generating a 2D graph or a 3D graph for a Multiple Linear or a


Polynomial Regression with more than two independent variables, a dialog
box appears asking you to specify the independent variables to plot. Select the
desired variables, then click OK. For more information, see Multiple Linear
Regression Report Graphs on page 523 and Polynomial Regression Report
Graphs on page 574.

FIGURE 7–2
The Select Independent
Variable Dialog Box

The selected graph appears in a graph page window with the name of the
page in the window title bar. Graph pages are named according to the type of
graph created and are numbered incrementally. The graph page is assigned to
the test section of its associated report.

Bar Charts of the Column Means
Bar charts of the column means are available for the following tests:
➤ Descriptive Statistics (see page 104)
➤ t-test (see page 206)
➤ One Way ANOVA (see page 230)

This bar chart plots the group means as vertical bars with error bars
indicating the standard deviation.

FIGURE 7–3
A Bar Chart of the
Result Data for a t-test


Scatter Plot
The scatter plot is available for the following tests:

➤ Descriptive Statistics (see page 104)


➤ t-test (see page 206)
➤ One Way ANOVA (see page 230)

The scatter plot graphs the group means as single points with error bars
indicating the standard deviation.

FIGURE 7–4
A Scatter Plot of the
Result Data for a
One Way ANOVA

Point Plot
The point plot is available for the following tests:

➤ Descriptive Statistics (page 104)


➤ t-test (page 206)
➤ Rank Sum Test (page 220)
➤ ANOVA on Ranks (page 310)


The point plot graphs all values in each column as a point on the graph.

FIGURE 7–5
A Point Plot of the
Result Data for a
ANOVA on Ranks

Point Plots and Column Means
The point and column means plot is only available for Descriptive Statistics
(see Describing Your Data with Basic Statistics on page 104).

The point and column means plot graphs all values in each column as a point
on the graph with error bars indicating the column means and standard
deviations of each column.

FIGURE 7–6
A Point and Column Means
Plot of the Result Data for a
Descriptive Statistics Test
The error bars plot the
column means and the
standard deviations
of the column data.


Box Plot
The box plot is available for the following tests:

➤ Descriptive Statistics (see page 104)


➤ Rank Sum Test (see page 220)
➤ ANOVA on Ranks (see page 310)
➤ Repeated Measures ANOVA on Ranks (see page 355)

The Rank Sum Test box plot graphs the percentiles and the median of
column data. The ends of the boxes define the 25th and 75th percentiles,
with a line at the median and error bars defining the 10th and 90th
percentiles.

FIGURE 7–7
A Box Plot of the
Result Data for a
Rank Sum Test

2D Scatter Plot of the Residuals
The 2D scatter plot of the residuals is available for all of the regressions
except the Multiple Logistic and the Incremental Polynomial Regressions.

The scatter plots of the residuals plot the raw residuals of the independent
variables as points relative to the standard deviations. The X axis represents
the independent variable values, the Y axis represents the residuals of the
variables, and the horizontal lines running across the graph represent the


standard deviations of the data. See Chapter 12 for more information on the
graphs for the individual regression reports.

FIGURE 7–8
Scatter Plot of the Simple
Linear Regression
Residuals with Standard
Deviations

Bar Chart of the Standardized Residuals
Bar charts of the standardized residuals are available for all regressions except
the Multiple Logistic and the Incremental Polynomial Regressions. They
plot the standardized residuals of the data in the selected independent
variable column as points relative to the standard deviations.

See Chapter 12 for more information on the graphs for the individual
regression reports.

Histogram of Residuals
The histogram of residuals graph is available for the following tests:
➤ t-test (see page 206)
➤ One Way ANOVA (see page 230)
➤ Two Way ANOVA (see page 253)
➤ Three Way ANOVA (see page 283)
➤ Paired t-test (see page 332)
➤ One Way Repeated Measures ANOVA (see page 355)
➤ Two Way Repeated Measures ANOVA (see page 379)
➤ Linear Regression (see page 469)
➤ Multiple Linear Regression (see page 495)
➤ Polynomial Regression (see page 553)
➤ Stepwise Regression (see page 577)


FIGURE 7–9
Multiple Linear
Regression Bar Chart
of the Standardized
Residuals with Standard
Deviations Using One
Independent Variable

➤ Nonlinear Regression (see page 636)


➤ Normality test (see page 127)

The histogram plots the raw residuals in a specified range, using a defined
interval set. The residuals are divided into a number of evenly incremented
histogram intervals and plotted as histogram bars indicating the number of
residuals in each interval.

FIGURE 7–10
A Histogram of
the Residuals for a t-test


Normal Probability Plot
The normal probability plot is available for the following test reports:
➤ t-test (see page 206)
➤ One Way ANOVA (see page 230)
➤ Two Way ANOVA (see page 253)
➤ Three Way ANOVA (see page 283)
➤ Paired t-test (see page 332)
➤ One Way Repeated Measures ANOVA (see page 355)
➤ Two Way Repeated Measures ANOVA (see page 379)
➤ Linear Regression (see page 469)
➤ Multiple Linear Regression (see page 495)
➤ Polynomial Regression (see page 553)
➤ Stepwise Regression (see page 577)
➤ Nonlinear Regression (see page 636)
➤ Normality test (see page 127)

The probability plot graphs the frequency of the raw residuals. The residuals
are sorted and then plotted as points around a curve representing the area of
the gaussian plotted on a probability axis. Plots with residuals that fall along
the Gaussian curve indicate that your data was taken from a normally distributed
population. The X axis is a linear scale representing the residual values. The Y
axis is a probability scale representing the cumulative frequency of the
residuals.

FIGURE 7–11
Normal Probability
Plot of the Residuals


2D Line/Scatter Plots of the Regressions with Prediction and Confidence Intervals
The 2D line and scatter plots of the regressions are available for all of the
regression reports, except Multiple Logistic and Incremental Polynomial
Regressions. They plot the observations of the regressions as a line/scatter
plot. The points represent the dependent variable data plotted against the
independent variables, the solid line running through the points represents
the regression line, and the dashed lines represent the prediction and
confidence intervals. The X axis represents the independent variables and the
Y axis represents the dependent variables.

See Chapter 11, Prediction and Correlation, for more information on the
graphs for the individual regression reports.

FIGURE 7–12
A Line/Scatter Plot
of the Linear Regression
Observations with
a Regression and
Confidence and
Prediction Interval Lines

3D Residual Scatter Plot
The 3D residual scatter plots are available for the following test reports:
➤ Two Way ANOVA (see page 253)
➤ Two Way Repeated Measures ANOVA (see page 379)
➤ Multiple Linear Regression (see page 495)
➤ Stepwise Regression (see page 577)
➤ Nonlinear Regression (see page 636)


They plot the residuals of the two selected columns of independent variable
data. The X and the Y axes represent the independent variables, and the Z
axis represents the residuals.

FIGURE 7–13
A Multiple Linear
Regression 3D
Residual Scatter
Plot of the Two
Selected Independent
Variable Columns

Grouped Bar Chart with Error Bars
This graph is available for the Two Way ANOVA (see page 253) report. It
plots the data means with error bars indicating the standard deviations for
each level of the factor columns. The levels in the first factor column are used
as the X axis tick marks, and the title of the first factor column and the data
column are used as the X and the Y axis titles. The first bar in the group
represents the first level of the second factor column and the second bar in
the group represents the second level in the second factor column.

FIGURE 7–14
A Two Way ANOVA Grouped
Bar Chart with Error Bars

3D Category Scatter Graph
This graph is available for the Two Way ANOVA (see Two Way Analysis of
Variance (ANOVA) on page 253) and the Two Way Repeated Measures
ANOVA (see Two Way Repeated Measures Analysis of Variance (ANOVA)
on page 379). The 3D Category Scatter plot graphs the two factors from the
independent data columns along the X and Y axes against the data of the
dependent variable column along the Z axis. The tick marks for the X and Y
axes represent the two factors from the independent variable columns, and
the tick marks for the Z axis represent the data from the dependent variable
column.

FIGURE 7–15
A Two Way ANOVA 3D
Category Scatter Plot

Before and The before and after line plot is available for the:
After Line Plots
➤ Paired t-test (see page 332)
➤ Signed Rank Test (see page 345)
➤ One Way Repeated Measures ANOVA (see page 355)
➤ Repeated Measures ANOVA on Ranks (see page 408)

The before and after line plot uses lines to plot a subject's change after each
treatment. If the graph plots raw data, the lines represent the rows in the
column, the column titles are used as the tick marks for the X axis and the
data is used as the tick marks for the Y axis.

If the graph plots indexed data, the lines represent the levels in the subject
column, the levels in the treatment column are used as the tick marks for the
X axis, the data is used as the tick marks for the Y axis, and the treatment and
data column titles are used as the axis titles.

FIGURE 7–16
A Before & After Line Scatter
Plot Displaying Data
for a Paired t-test

Multiple The multiple comparison graphs are available for all ANOVA reports. They
Comparison Graphs plot significant differences between levels of a significant factor. There is one
graph for every significant factor reported by the specified multiple
comparison test. If there is one significant factor reported, one graph appears;
if there are two significant factors, two graphs appear, etc. If a factor is not
reported as significant, a graph for the factor does not appear.

For information on individual graphs for each ANOVA, see Chapters 9
and 10.

Scatter Matrix The matrix of scatter graphs is available for all the Pearson and the Spearman
Correlation reports (see Pearson Product Moment Correlation on page 623
and Spearman Rank Order Correlation on page 631). The matrix is a series
of scatter graphs that plot the associations between all possible combinations
of variables.

The first row of the matrix represents the first set of variables or the first
column of data, the second row of the matrix represents the second set of
variables or the second data column, and the third row of the matrix
represents the third set of variables or third data column. The X and Y data
for the graphs correspond to the column and row of the graph in the matrix.

For example, the X data for the graphs in the first row of the matrix is taken
from the second column of tested data, and the Y data is taken from the first
column of tested data. The X data for the graphs in the second row of the
matrix is taken from the first column of tested data, and the Y data is taken
from the second column of tested data. The X data for the graphs in the third
row of the matrix is taken from the second column of tested data, and the Y
data is taken from the third column of tested data. The number of graph
rows in the matrix is equal to the number of data columns tested.

FIGURE 7–17
A Multiple Comparison
Matrix for a Two
Way ANOVA with One
Significant Factor

FIGURE 7–18
A Scatter Matrix for a
Pearson Correlation

Creating Exploratory Graphs

Exploratory graphs are graphs that you create by picking data from the
worksheet. Use your worksheet data to create a variety of scatter, line, point,
and box plots, bar charts, and pie charts.

To create an exploratory graph using your worksheet data:

1 Make sure a worksheet is the active window. If you want to pick your
data before creating a graph, drag the pointer over the data you want to
plot.

2 On the Graph menu click Create Graph or click the toolbar button.

The Graph Wizard appears displaying all of the exploratory graph types
in the Graph Styles scroll box.

FIGURE 7–19
Use the Graph Wizard to
Create Exploratory Graphs

3 Select the graph you want to create, and click Next to move to the next
panel of the Graph Wizard.

➤ If you select Histogram as the graph style, the Graph Wizard Create
Graph- Histogram panel prompts you to specify the number of bins
for the histogram. Go to Step 4.
➤ Some graph styles, like scatter plots and bar charts, require you to
choose the data format. Some, however, like pie charts or 3D scatter
plots, use a default data format.
➤ If you select a graph style that requires you to select the data
format, the Graph Wizard Create Graph - Data Format panel
appears. Go to Step 5.

➤ If you select a graph style that uses a default data format, the Graph
Wizard Create Graph - Select Data panel appears. Go to Step 6.
4 To specify the number of bins to use for a histogram, type the
desired value in the edit box, then click Next.

FIGURE 7–20
Selecting Histogram Bins

! The number of bins is the number of intervals used for the histogram
bars. For more information, see Histogram on page 172.

FIGURE 7–21
Picking the Columns to Plot

5 Select the data format you want to use from the Data format list, and
click Next.

6 To pick the columns with the data to plot, begin picking data either
by clicking the corresponding column directly in the worksheet, or
choosing the appropriate column from the Data Columns drop-down
list. The selected columns appear in the Selected Columns list in the
order they are picked from the worksheet. You are prompted for X and/
or Y columns depending on the type of graph you are creating.

➤ If you are creating a histogram, select the column with the range of
data to plot as the Y column and the column to place the results of the
data range divided by the number of bins as the Output column. For
more information on the histogram, see Histogram on page 172.
➤ If you are creating a graph of residuals, you are prompted to select the
data column with the residuals you want to plot as the Y column,
and the column to place the residuals of the Y data as the Output
column. If you are creating a Normal Probability Plot select the first
output column for the sorted residuals and the second output
column for the cumulative frequency of the residuals. For more
information on residual and normal probability plots, see 2D Scatter
Plot of the Residuals on page 152, Bar Chart of the Standardized
Residuals on page 153, and Normal Probability Plot on page 155.
➤ If you are plotting error bars and you selected Worksheet Column as
your error bar source, you are prompted for your X and /or Y
columns and the columns with the error bar values. For more
information on graphs with error bars, see Bar Charts of the Column
Means on page 149 and Grouped Bar Chart with Error Bars on page
157.
For more information on how data is plotted for a graph, see the
sections on each of the exploratory graphs below.

If you have already selected the data you want to plot in the worksheet,
the selected columns automatically appear in the Selected Columns
box.

7 To change the column assignment in the Selected Columns list,


highlight the row in the list, then select the desired column from the
worksheet or the Data drop-down list. Double-click a column
assignment to clear it from the Selected Columns list.

8 When you have finished picking your columns, click Finish to create
the graph. The graph appears in a graph window with the name and
number of the graph page and its associated worksheet in the window
title bar. The graph is assigned to the notebook section of its associated
worksheet. For descriptions of each of the exploratory graphs, see the
following pages.

If no graph window is open, the graph appears in a new graph page


window. If a graph page window is already open, the graph appears on
the opened graph page. You can add as many graphs to one page as
desired.

Scatter Plot Select Single Scatter Plot as the graph type from the Graph Wizard to use
symbols to graph two columns of worksheet data as X values versus Y values.
Select one worksheet column for the X data and one worksheet column for
the Y data.

FIGURE 7–22
Example of a Scatter Plot

Scatter Plot Using Select Multiple Scatter Plot as the graph type from the Graph Wizard to use
Many Y Columns symbols to graph multiple columns of worksheet data as X values against Y
values. Select as many XY column pairs as desired. Each XY pair represents a
curve on the graph.

FIGURE 7–23
Example of a Scatter Plot
With Multiple Curves

Line Plot Select Single Line Plot as the graph type from the Graph Wizard to use lines
to graph two columns of worksheet data as X values versus Y values. The
Graph Wizard dialog box prompts you to pick one worksheet column for the
X data and one worksheet column for the Y data. Lines connect the points
plotted on the graph.

FIGURE 7–24
Example of a Line Plot

Line Plot Using Many Select Multiple Line Plot as the graph type from the Graph Wizard to use
Y Columns lines to graph multiple columns of worksheet data as X values against Y
values. You are prompted to select as many XY column pairs as desired. Each
column of data represents a curve on the graph. Lines connect the points
plotted on the graph.

Bar Chart Select Simple Bar Chart as the graph type from the Graph Wizard to plot all
values in a selected column as bars on a bar chart. The column values are
plotted as the Y values against X values representing the row numbers of each
value. The Graph Wizard dialog box prompts you to pick one column of
data.

FIGURE 7–25
Example of a Line Plot
with Many Curves

FIGURE 7–26
Example of a Bar Chart

Bar Chart of Select Bar Chart Col Means as the graph type from the Graph Wizard to plot
Column Means the means of column data as bars with error bars indicating the standard
deviation of each column. The column means are plotted as the Y values
against X values representing the row numbers of each value.

The error bars are calculated using the means from the plotted worksheet
columns or from specified worksheet columns. If you selected to use values
from a worksheet column, you are prompted to pick the columns with the
error bar values. You pick one error bar column for each bar in the bar chart.
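
When the error bars are computed from the plotted columns themselves, each bar's
height is the column mean and its error bar is the column standard deviation, in the
usual sense:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$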

FIGURE 7–27
Example of a Bar Chart
of the Column Means

Scatter Plot of Select Scatter Plot Col Means as the graph type from the Graph Wizard to
Column Means plot the means of column data as points with error bars indicating the
standard deviation of each column. The column means are plotted as the Y
values against X data representing the row numbers of each value.

The error bars are calculated using the means from the plotted worksheet
columns or from specified worksheet columns. If you selected to use values
from a worksheet column, you are prompted to pick the columns with the
error bar values. You pick one error bar column for each point you plot on
the graph.

FIGURE 7–28
Example of a Scatter Plot
of the Column Means

Point Plot Select Point Plot as the graph type from the Graph Wizard to graph each
value in a selected worksheet column as Y data against X values representing
the order the columns are selected from the worksheet. The points are
represented by symbols. The Graph Wizard dialog box prompts you to pick
as many columns as desired.

FIGURE 7–29
Example of a Point Plot

Point Plot and Select Point and Column Means as the graph type from the Graph Wizard to
Column Means graph each value in the selected columns as Y data. The order of the columns
represent the X data. Error bars plot the means of each data column with
their standard deviations.

This graph style uses a default data format. Do not pre-select the data;
instead, select the data on the Graph Wizard Create Graph - Select Data
panel. First, select an empty column for the Output, which is the location for
the symbols of the means.

FIGURE 7–30
Selecting Data for a Point
Plot with Column Means

The error bars are calculated using the means from the plotted worksheet
columns or from specified worksheet columns.

FIGURE 7–31
Example of a Point
Plot with Error Bars
Plotting the Column Means

Box Plot Select Box Plot as the graph type in the Graph Wizard to graph the
percentiles and the median of the data in selected columns as boxes. The
percentiles and the median are plotted as the Y data against X values
representing the order the columns are selected from the worksheet. You can
select as many columns as desired.

FIGURE 7–32
Example of a Box Plot

Histogram Select Histogram as the graph type from the Graph Wizard to plot a column
of data in a specified range, using a defined interval set. The data is divided
into a number of evenly incremented histogram bins and plotted as
histogram bars indicating the number of data points in each bin or interval.
The X axis represents the histogram bins, and the Y axis represents the
number of data points in each bin. The Graph Wizard dialog box prompts
you to pick the worksheet column with the range of data you want to plot as
the Y column and the column to place the results of your data divided by the
specified number of bins.
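
The binning arithmetic is straightforward. As a made-up example, a column whose
values span 0 to 20 plotted with 5 bins gives

$$\text{bin width} = \frac{x_{\max} - x_{\min}}{\text{number of bins}} = \frac{20 - 0}{5} = 4,$$

and each bar's height is simply the count of values that fall in its 4-unit interval.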

FIGURE 7–33
Example of a Histogram
Plotting Residuals of
Each Data Column

Pie Chart Select Pie Chart as the graph type from the Graph Wizard to plot each value
in a selected column as a pie slice equivalent to its percentage of the total pie.
The Graph Wizard dialog box prompts you to select one worksheet column.

FIGURE 8–31
Example of a Pie Chart

3D Scatter Plot Select 3D Scatter Plot as the graph type from the Graph Wizard to use points
to graph three columns of worksheet data as X, Y, and Z values on a three
dimensional plane. The Graph Wizard prompts you to select an X, Y, and Z
column. You can select as many XYZ triplets as desired.

FIGURE 8–32
Example of a 3D
Scatter Plot

Scatter Plot of Select Scatter Plot Residuals as the graph type from the Graph Wizard to plot
the Residuals the residuals of the values of the selected worksheet column as points relative
to the standard deviations. The column residuals are plotted as the Y values
against X values representing the row numbers of each Y value. The Graph
Wizard prompts you to pick one Y column with the data to plot the residuals
for and one Output column to place the residuals for the Y data. The Output
column is plotted.

FIGURE 8–33
Example of a Scatter
Plot Plotting Residuals
of Each Data Column

Bar Chart of Select Standardized Residuals as the graph type from the Graph Wizard to
the Standardized plot the standardized residuals of the values in the selected column as bars
Residuals relative to the standard deviations. The column residuals are plotted as the Y
values against X values representing the row numbers of each Y value. The
Graph Wizard dialog box prompts you to pick a Y column with the data to
plot the residuals for, and an Output column to place the residual results in.
The output column is plotted.
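
In the conventional definition (stated here for reference rather than as a transcription
of SigmaStat's computation), each standardized residual is the raw residual divided by
the standard deviation of the residuals:

$$r_i^{*} = \frac{y_i - \hat{y}_i}{s_{\text{res}}}$$

so bars that reach about ±2 or beyond flag observations the model fits unusually poorly.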

Normal Select Normal Probability Plot as the graph style in the Graph Wizard to plot
Probability Plot the frequency of the raw residuals along a Gaussian curve. The residuals are
sorted and then plotted as points around a curve representing the area of the
Gaussian plotted on a probability axis. Plots with residuals that fall along the
Gaussian curve indicate that your data was taken from a normally distributed
population.

The X axis is a linear scale representing the residual values. The Y axis is a
probability scale representing the cumulative frequency of the residuals. Select
the column with the data you want to plot the residuals for as the Y column,
the column to place the sorted residuals of the data as the first output column,
and the column to place the cumulative frequency of the residuals as the
second output column.

FIGURE 8–34
Example of a Bar Chart
Plotting the Standardized
Residuals of Data Columns

FIGURE 8–35
Example of a Normal
Probability Plot of the
Residuals

Setting Graph Page Options

Setting the graph page options includes setting the page margins, size, and
orientation; the units of measurement used on the page; graph resizing
options; and disabling of page undo.

Setting Page The margins, size, and orientation of the graph page are set in the Page Setup
Margins, Size, dialog box.
and Orientation
1 On the File menu click Page Setup.

The Page Setup dialog box appears.

2 To set the margins of the graph page, click the Margins tab, then type
in or select the desired margins in the appropriate edit boxes. The
measurement unit used for the margins is specified on the Page tab of the
Options dialog box (see the following section).

FIGURE 8–36
The Page Setup Dialog
Box Displaying the
Margins Options

3 Clear or check the Show Margins option by selecting it. If this option
is selected, margins are displayed on the page. To hide page margins,
make sure the Show Margins option is not checked.

4 To set the size of the graph page, click the Page Size tab, then select a
paper size from the Paper Size drop-down list. Use the Width and
Height options to specify the dimensions of the page and the Portrait
and Landscape options to specify the orientation of the page. The
measurement unit used for the page size is specified on the Page tab of the
Options dialog box (see the following section).

FIGURE 8–37
The Page Setup
Dialog Box Displaying
the Page Size Options

5 To set the background color and to show or hide graphs on the graph
page, click the Page Layout tab. Graphs that appear on the page are
listed under Shown. Graphs that are hidden are listed under Hidden.
Double-click a graph to move it between lists.

FIGURE 8–38
The Page Setup
Dialog Box Displaying
the Page Layout Options

6 Click OK to accept the specified settings and close the dialog box, or
Apply to accept the settings without closing the dialog box.

Setting Measurement The graph page measurement units, graph resizing options, and page undo
Units, Graph Resizing disabling option are set in the Options dialog box on the Page tab. To set
Options, and Page these options:
Undo
1 On the Tools menu click Options.

The Options dialog box appears.

FIGURE 8–39
The Page Panel of the
Preferences Dialog Box

2 Click the Page tab.

3 To set the measurement unit used for the graph page, select the
desired unit of measurement from the Units list. You can choose inches,
millimeters, or points. The selected measurement unit is used for the
page margins and size and for the size of the graph.

4 To enable and disable the Undo and Redo commands for the graph
page, select the Page Undo check box. When this check box is cleared,
the Undo and Redo commands are unavailable. Disabling the Undo
and Redo functionality of the graph page can speed page operations
significantly; however, it means page editing cannot be undone. To
disable the page Undo and Redo commands, clear the check box by
clicking it.

5 To set the aspect ratio of resized graphs, select the Stretch Maintains
Aspect Ratio check box. When this command is checked, resized
objects maintain their vertical-to-horizontal ratio. If this command is
not checked, objects can be resized disproportionately. For more
information on sizing graphs, see Resizing and Moving Graphs on the Page
on page 182.

6 To specify whether the graph and axis titles resize with the graph,
select the Graph Object Resize with Graph check box. When this
command is checked, resizing a graph automatically resizes objects
associated with the graph, like axis labels, tick labels, the graph title, and
the automatic legend. If this command is not checked, objects must be
sized individually.

7 To display grids on the graph page, select Show Grid. You can display
grids either as dots or as lines. Select the density of the grid from the
Density drop-down list. Select the color of the grid from the Color
drop-down list.

8 To “snap” objects to the nearest grid point when dragging and


dropping, select Snap-to. When drawing or resizing, the current corner
or edge being dragged is snapped. When moving an object, the upper
left corner is snapped.

9 Click OK to accept the settings and close the dialog box or Apply to
accept the settings without closing the dialog box.

Zooming In and Out on Graphs

Use the View menu Zoom command to reduce and enlarge a graph in the
graph window. There are five different zoom levels to choose from. You can
zoom out on the entire page or you can choose a 50, 100, 200, or 400
percent view of the graph. You can also select the desired zoom level from the
toolbar drop-down list.

➤ Ctrl+W Fit in Window
➤ Ctrl+5 50% View
➤ Ctrl+1 100% View
➤ Ctrl+2 200% View
➤ Ctrl+4 400% View

FIGURE 8–40
Example of Different
Zoom Levels for Graphs

The first graph is zoomed in at 100%, and the second graph is zoomed out at
the Fit in Window view.

Selecting Graphs and Labels

Before you can modify a graph or graph labels, you need to select them. To
select a graph or graph label, you must be in select mode. To make sure you
are in select mode, choose the Tools menu Select Object command. A check
mark next to the Select Object command indicates you are in select mode.
Once you are in select mode, click a graph or graph label to select it. Selected
graphs are surrounded by handles, and selected labels are surrounded by a
dotted box.

Use text mode to enter and edit text on a page. For information on entering
and editing text on a page, see Creating and Editing Labels on the Graph
Page on page 186.

Resizing and Moving Graphs on the Page

Use your mouse to move and resize a graph on the page.

Moving Graphs To move a graph, select it, then hold down your mouse button and drag the
graph to a new position on the page. A dotted outline of the graph
indicates its position while you are dragging it.

FIGURE 8–41
Moving a Graph to a
New Location on the Page

Sizing Graphs To resize and scale a graph, select it. Handles surround a selected graph. Drag
a side handle to stretch or shrink a graph in one direction; drag a corner
handle to stretch or shrink a graph two-dimensionally. A dashed outline of
the resized graph follows the pointer position.

FIGURE 8–42
Sizing a Graph on the Page

Modifying Graph Attributes

Modifying graph attributes in SigmaStat involves changing plot symbols,
lines, bars, boxes, meshes, and pie slices. For more advanced editing, you can
use SigmaPlot (see Using SigmaPlot to Modify Graphs on page 189).

To modify graph attributes in SigmaStat, select the graph, then choose the
Graph menu Graph Properties... command. The Graph Properties dialog box
appears.

FIGURE 8–43
The Graph Properties Dialog
Box

The options available in the dialog box depend on the type of graph you have
selected. Change the desired settings, click Apply to update the graph, and
OK to close the dialog box.

Changing Fills Use the Fill Color option to change the color of graph symbols, bars, boxes,
meshes, and pie slices. Use the Fill Pattern option to change the fill pattern of
graph bars, boxes, and pie slices.

Select a color scheme from the list of options to assign a set of colors to the
curves, bars, or boxes in the plot, or the slices in the pie chart.

Changing Symbols Use the Symbol Type option to change the type of symbols used in 2D and
3D scatter plots. Change symbol sizes by moving the slider with your mouse.
Moving it to the left decreases symbol size and moving it to the right
increases symbol size. The value in the edit box changes to reflect the position
of the slider. You can also edit the value in the box to change symbol size.

Changing Lines Use the color option to change the color of plot and mesh lines and the
outline and fill pattern lines of bars, boxes, and pie chart slices. Use the Type
option to change the type of lines used for Line Plot lines.

Select a line type scheme from the list of options to assign a set of line types
to the curve in the plot.

! You cannot change the line type of bar, box, or pie slice outlines or of mesh
plots.

Changing Axes Select the axis you want to apply the selected scale type to from the Apply to
drop-down list. You can change the scale type of individual axes or multiple
axes in the graph.

Changing Axis Scales Use the Scale option to assign a different scale type to the specified axis or
axes. The default axis scales are linear, but you can also use common log,
natural log, probability, and logit axis scales.

Linear Scale A linear axis scale is a standard base 10 numeric scale.

Common Log Scale A common log axis scale is a base 10 logarithmic scale.

Natural Log Scale A natural log scale is a base e logarithmic scale.

Probability Scale A probability axis scale is the inverse of the Gaussian
cumulative distribution function. The graph of the sigmoidally shaped
Gaussian cumulative distribution function on a probability scale is a straight
line. Probabilities are expressed as percentages with the minimum range value
set at 0.001 and the maximum range value set at 99.999. The default depends
on the range of the actual data.

Probit Scale A probit axis scale is similar to the probability scale; the
Gaussian cumulative distribution function plots as a straight line on a probit
scale. The scale is linear, however, with major tick marks at each Normal
Equivalent Deviation (N.E.D. = (X – μ)/σ) plus 5.0. At the mean (X = μ), the
probit = 5.0; at the mean plus one standard deviation (X = μ + σ), the
probit = 6.0, etc. The default range is 3 to 7. The range limit for a
probit axis scale is 1 to 9.

Logit A logit scale uses the transformation:

logit = ln( y / (a – y) )

where a = 100 and 0 < y < 100. The default range is 7 to 97. Like the
probability and probit scales, the logit scale “straightens” a sigmoidal curve.
Changing Error Bar Use the Error Bar option to change the calculation method of the error bar in
Calculation Methods the selected graph. The default is standard deviation, but you can also
compute error bars using standard error, 95% confidence intervals, or 99%
confidence intervals.

This option is not available for graphs without error bars.
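
The four calculation methods correspond to the usual column statistics. The
confidence intervals below assume the common two-sided t-based form and are shown
only as a reference sketch:

$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}, \qquad \mathrm{SEM} = \frac{s}{\sqrt{n}}, \qquad \bar{x} \pm t_{\alpha/2,\,n-1}\,\mathrm{SEM},$$

with α = 0.05 giving the 95% interval and α = 0.01 the 99% interval.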

Creating and Editing Labels on the Graph Page

Labels, legends, and other kinds of text are added to the graph page using the
Edit Text dialog box. You can also use the Edit Text dialog box to edit graph
and axis titles which automatically appear on the page when a graph is
created.

Creating Labels You can add an unlimited number of text labels and legends to any page.
and Legends SigmaStat for Windows supports:

➤ All TrueType®, PostScript®, and other fonts installed on your system.


➤ Multiple lines of text aligned left, right, or centered, with adjustable line
heights.
➤ Mixed fonts and other attributes within a single label.
➤ Multiple levels of superscripting and subscripting.
➤ Rotation of text in single degree increments.
➤ Color using up to 16.7 million different combinations of red, green,
and blue.

To create text labels or legends on a page:

1 Make the page the active window by selecting it, then choose the Tools
menu Text command to switch from select to text mode. A check mark
next to the command indicates that you are in text mode.

2 Click the page where you want the label to begin. The Edit Text dialog
box appears.

3 Select the font and character size, and normal, bold, italic, or
underlined characters. You can also use the Edit Text options to create
superscript, subscript, and Greek text, and to specify right, center,
or left text alignment, as well as text line spacing, angle of rotation, and
text color.

FIGURE 8–44
The Edit Text Dialog Box

! Note that the Rotation, Alignment, and Line Spacing options affect the
entire label, not just the selected text, and that Line Spacing is an
automatic spacing control, not fixed. If you change the height of
characters by changing font sizes or by adding superscripts or subscripts,
the line height adjusts automatically.

4 Use the keyboard to type your label. To type additional lines, insert a
line break by pressing the Enter key.

5 To change text attributes while entering the label, select the


appropriate options and continue typing the label.

6 To switch back to normal text from superscript, or subscript text, click


the normal button.

7 To change the attributes of text you’ve already typed in the Edit


Text dialog box, drag the pointer over the text you want to change to
highlight it, then select the appropriate options.

You can use all standard cut (Ctrl+X), copy (Ctrl+C), and paste
(Ctrl+V) keystrokes and the Ctrl+Right Arrow and Ctrl+Left Arrow key commands to
move the cursor to the next and previous words in the text.

8 To add legend symbols to your text, click the Symbols... button. The
Symbols dialog box appears.

Click Show Legend to enable manually created legend options. Choose


the Graph to apply the legend to from the Graph drop-down list, then
choose to place the symbol before text or after text using the Placement
drop-down list. Use the Style drop-down list to control the appearance
of the legend you are creating, then choose the symbol to use for the
legend from the Symbol window. Symbols and Style options vary
depending on the graph you’ve created.

Legend symbols added to text using the Edit Text dialog box do not
appear in the Edit Text dialog box; they appear with the text on the
page.

FIGURE 8–45
The Symbol Dialog Box

9 Click OK to place the symbol in the text and to close the Symbol dialog
box.

10 When you are finished entering text, close the Edit Text dialog box by
clicking OK.

Editing Existing Use the Edit Text dialog box to edit existing text and labels that you’ve
Text Labels created, as well as automatically created graph and axis titles. Editing existing
text consists of changing the content of the text, including adding Greek
symbols to the text, adding legend symbols, and changing formatting of the
text.

To edit text on the page:

1 If you are in select mode (Tools menu Select command), double-click


the label to open the Edit Text dialog box. If you are in text mode
(Tools menu Text command) click the label to open the Edit Text
dialog box (see Figure 8–44 on page 187).

2 Select the text to modify, then use your keyboard to type new text, or
use the Edit Text dialog box options to format the text. For more
information on Edit Text options, see step 3 on page 186. You can also
use all standard cut (Ctrl+X), paste (Ctrl+V), and copy (Ctrl+C)
keystrokes as well as the corresponding toolbar buttons.

Deleting Text Labels To delete a text label from the page, make sure you are in select mode by
choosing the Tools menu Select command. A check mark next to the Select
command indicates that you are in select mode. Select the label you want to
delete, then press the Delete key or choose the Edit menu Cut or Clear
commands.

FIGURE 8–46
Example of Report Graph
with Added Legends and
with Modified Axis Titles

Using SigmaPlot to Modify Graphs

If you have SigmaPlot 8.02 installed on your computer, you can use
SigmaPlot's more advanced graph editing capabilities to modify your
SigmaStat graph. To view and edit SigmaStat graphs in SigmaPlot, choose the
Graph menu Edit with SigmaPlot command.

SigmaPlot opens within SigmaStat, which you can use to customize your
graph.

To close SigmaPlot, click another SigmaStat window, and choose the File
menu End SigmaPlot Editing command, or press Esc.

You can use SigmaPlot to edit both the data and the graph attributes of
exploratory graphs but only the graph attributes of report graphs. Because
report graphs do not use worksheets in SigmaStat, worksheets do not appear
with the graph page when you run SigmaPlot.

Cutting and Copying Graphs and Other Page Objects

Cut and copy graphs to the Clipboard using the toolbar, or by using Edit
menu commands.

Clipboard contents can be pasted to any open page, or into any other
Windows application that supports Windows Metafiles or OLE2 (Object
Linking and Embedding). To learn about pasting objects and graphs, see
Pasting Graphs and Other Objects onto a SigmaStat Graph Page on page
190.

! The Clipboard is a Microsoft Windows feature. To learn more about how the
Clipboard works, refer to your Windows User’s Guide.

To cut or copy a graph or page object, select the graph or object to cut or
copy by clicking it. To cut the item, click the toolbar button, choose the
Edit menu Cut command, or press Ctrl+X. To copy the item, click the
toolbar button, choose the Edit menu Copy command, or press Ctrl+C. A
copy of the selected graph is placed in the Clipboard. Since copied items
remain in the Clipboard until replaced, you can paste as many copies as you
want without having to cut or copy the object each time.

For information on retrieving cut and copied items from the Clipboard, see
the following section, PASTING GRAPHS AND OTHER OBJECTS ONTO A
PAGE.

Pasting Graphs and Other Objects onto a SigmaStat Graph Page

Use the Edit menu Paste or Paste Special commands to paste Clipboard
contents to a graph page window. Pasted objects can be SigmaStat graphs,
scanned images, clip art, text from a word processor, or anything else that can
be cut or loaded into the Windows Clipboard.

Use the Paste command, the toolbar button, or press Ctrl+V to paste an
object without linking or embedding it. Use the Paste Special... command to
paste an object as a specified file type, as an embedded object, or as a linked
file object.

Embedding or linking text is especially useful for placing equations on a page,
enabling you to insert equations created with the Microsoft Word Equation
Editor, and edit them at a later date. For more information on using
Microsoft Word and the Equation Editor, refer to the Microsoft Word User’s
Guide. To learn about pasting text on a page, see Pasting Objects on page
193.

To learn about using the Edit menu Cut and Copy commands to place
graphs and other objects on the Clipboard, see Cutting and Copying Graphs
and Other Page Objects on page 190.

Linking Objects vs. Embedding Objects

When using the Edit menu Paste Special... command to paste an object to a
page, you can usually choose between embedding the object as a specified file
type by choosing the Paste Special dialog box Paste option, or linking the
object using the Paste Link option. Embedding the object actually places
a copy of the object on the graph page and enables you to edit the object by
activating the object’s source application when you double-click it, but does
not change the original file from which the object was pasted. Embedding the
object has the advantage of keeping all the associated data in one place, but
can create large files.

Linking the object appears to place a copy of the object on the page, but
actually only places a reference to the original object file, and modifies the
object every time the original file is changed. Linking is useful when a
number of files need to refer to a central graph page, but also need to be
stored separately, either to save disk space, or to keep file elements in their
native applications for easy location and updating. The disadvantage of
linking objects is that a referenced file cannot be accessed if the locations of
the SigmaStat file and/or the source file are changed.

To learn about viewing, updating, and changing object links, see Viewing and
Modifying Object Links on page 196.

! Note that the Link option is unavailable if the Clipboard contents come from
an application that cannot link to SigmaPlot.

If you don’t anticipate needing to edit the object you are pasting, you do not
need to paste it as an embedded or linked file; however, you can still use the
Paste Special... command to paste the object as a specified file type.

! You can place an object on the graph page without pasting it. To learn about
inserting an object on the page, see Inserting Graphic Objects on page 194.

Pasting Graphs To paste a cut or copied graph to a page:

1 Select the graph to cut or copy, then use the Edit menu Cut or Copy
command, the toolbar or button, or press Ctrl+X or Ctrl+C to
cut or copy the graph.

2 View the page to paste the graph to, then choose the Edit menu Paste
command, or click the toolbar button.

Graphs pasted using the Edit menu Paste command take their plotted
data with them; pasted graph data is placed in the worksheet associated
with the current page. Graphs pasted using the Paste command can be
modified just like any other graph. To learn about modifying graphs,
see Modifying Graph Attributes on page 184.

3 To paste a graph without moving its data, choose the Edit menu
Paste Special... command. The Paste Special dialog box appears.

FIGURE 8–47
Using the Paste
Special Dialog Box to
Paste a Graph to
Another Graph Page

4 Select the Paste option and choose Metafile from the As box. The graph
is pasted as a metafile object and does not place data on the worksheet.
Graphs pasted as metafile objects cannot be edited using most
SigmaPlot commands.

! SigmaPlot Clipboard objects can also be pasted into other Windows


applications. For more information, see Pasting Objects, below.

Pasting Objects To paste artwork, text from a word processing application, or other objects
onto a page:

1 Open the application and file containing the desired artwork or text,
and cut or copy the object.

2 Switch to SigmaPlot and view the page, opening the notebook file with
the page, if necessary. (To learn about opening notebook files and items,
see Opening and Viewing Notebook Files and Items on page 25.)

3 To paste the object, press Ctrl+V, click the toolbar button, or


choose the Edit menu Paste command. The graphic is pasted to the
page. You can size and stretch the graphic like any other drawn object.

4 To paste the object as a specified file type, an embedded object, or a


linked object, choose the Edit menu Paste Special... command. The
Paste Special dialog box appears.

! Note that the options available in the Paste Special dialog box depend on
the type of file being pasted.

5 Check the Display As Icon option if you want the object displayed as an
icon. Click the icon to view and edit the object in its source application.

You can also specify a different icon to display the pasted object. Click
the Icon... button to open the Change Icon dialog box. Choose a
different icon from the available options, or click the Browse... button
to search for alternative icons on your system.

6 Select the Paste option to embed the object, or to paste it as a specified


file type. Choose the Paste Link option to paste the object as a linked
file that can be updated in another application.

! The options in the As box change depending on your selection of either
Paste or Paste Link, and the explanation in the Result box changes
depending on your selection in the As option box.

FIGURE 8–48
Using the Paste Special
Dialog Box to Paste an
Object from Microsoft
Word to SigmaPlot

7 Select the file type to paste, embed, or link, from the As box. If you
have selected the Paste option, the text in the Results box explains
whether you are choosing simply to paste the object, or whether you are
embedding the object. Embedded objects enable you to activate a
source application and edit the object.

8 Click OK to paste the object and to close the Paste Special dialog box.
The object is pasted to the page.

! If you pasted the object as an embedded file, you can double-click the
object to edit it. If you pasted the object as a linked file, double-click it to
modify the pasted object and the file from which it was pasted.

Inserting To place an object on the page without using the Clipboard:


Graphic Objects
1 View the page to place the object on by clicking it, or by choosing the
name of the page from the Window menu.

2 Choose the Edit menu Insert New Object... command. The Insert
Object dialog box appears.

3 To display the new object as an icon, check the Display As Icon check
box.

You can also specify a different icon to display the inserted object. Click
the Icon... button to open the Change Icon dialog box. Choose a
different icon from the available options, or click the Browse... button
to search for alternative icons on your system.

4 To create a new object to place on the page, select the Create New
option, then choose the type of object to create from the Object Type
list. The objects available to create depend on the applications installed
on your system.

FIGURE 8–49
The Insert Object Dialog Box
After Selecting Create New

5 Click OK to open the application associated with the selected object


type. Create the desired object, then use the application’s appropriate
Exit command to close the application and return to SigmaPlot. The
created object is displayed on the graph page as an embedded object.

6 To insert an object from an existing file on the graph page, select the
Create From File option, then type the path and file name of the
desired file in the File edit box, or click the Browse button to open the
Browse dialog box from which you can select the appropriate path and
file name.

7 Check the Link option by selecting it to place the object on the page as
a linked object. If the Link option is not selected, the object is pasted to
the page as an embedded object.

FIGURE 8–50
The Insert Object Dialog Box
After Selecting Create From
File, and the Browse Dialog
Box

8 Click OK to place the object on the page and close the dialog box.

Viewing and View and modify links of pasted objects using the Links dialog box. The
Modifying Links dialog box displays all links associated with the current graph page.
Object Links
To view and modify links:

1 View the page by selecting it, or by choosing the name of the page from
the Window menu.

2 Choose the Edit menu Links... command. The Links dialog box
appears displaying the path, file name, type of file, and if it is a
manually updated or automatically updated link.

FIGURE 8–51
The Links Dialog Box

If you do not have any linked objects on this page, the dialog box does
not display any links.

3 To change the updating to either Automatic or Manual, select the


unselected option. If Automatic updating is selected, the object changes
automatically when the source file is changed. If Manual updating is
selected, you must use the Update Now button to update the linked
object with any changes made to the source file.

4 To edit a linked object, highlight the object name in the Links dialog
box by selecting it, then click the Open Source button. The source file
opens in the appropriate application where you can make changes, then
exit the application and return to SigmaPlot.

If Automatic updating is selected, the object reflects the changes; if


Manual updating is selected, you must select the Update Now button
to apply changes to the linked object.

5 To change the source file used for a linked object, click the Change
Source button. The Change Source dialog box opens. Choose the new
path and file name, then click OK. The link appears in the Links dialog
box with the new path and file name. You may need to click the Update
Now button to view this change in your document.

FIGURE 8–52
The Change Source Dialog
Box

6 To end the link between an object and its source file, click the Break
Link button. The object is no longer treated as a linked object.

7 Click Close to close the Links dialog box.

Identifying Page Objects

If an object has been pasted or inserted on a graph page, you can use the Edit
menu Object command to determine the type of object. To identify the
object, select it, then open the Edit menu. The
Object command changes to reflect the file type of the selected object. For
example, if a bitmap object is selected, the Object command may read
Bitmap Image Object.

Pasting SigmaStat Graphs into other Applications

You can paste SigmaStat graphs into other applications exactly the same way
as you paste objects and graphs into SigmaStat. Copy the graph in SigmaStat
using the Edit menu Copy command (see Cutting and Copying Graphs and
Other Page Objects on page 190), then open the application you want to
paste the graph into, and choose the Edit menu Paste or Paste Special
command.

For detailed information on linking and embedding objects using the Paste
commands, see Pasting Graphs and Other Objects onto a SigmaStat Graph
Page on page 190. For information on how to insert graphs into other
applications, see Pasting SigmaStat Graphs into other Applications on page
198.

FIGURE 8–53
Example of a
SigmaStat Graph
Pasted Into
Microsoft Word as
an OLE Object

Saving Graphs in Notebook Files

Report, exploratory, and non-notebook graphs opened in SigmaStat are
always saved to their associated notebook file. Report graphs are assigned to
the test sections of their associated report; exploratory graphs and
non-notebook graph files are assigned to the section of their associated
worksheets.

To save a graph page to a notebook file, choose the File menu Save command,
press Ctrl+S, or click the toolbar button. The Save As dialog box appears
prompting you for a file name, and path for the notebook file.

If you are saving the graph to an existing notebook file, the notebook is
updated to include the new graph page or the changes to the existing
graph page.

For more information on saving notebook items, see Saving Notebook Files
and Items on page 30.

Graphs as Non- SigmaStat graphs cannot be exported directly, but you can use the Edit menu
notebook Files command to cut or copy a graph to the Clipboard, then paste it to another
application.

For more information on cutting, copying, and pasting graphs, see Cutting
and Copying Graphs and Other Page Objects on page 190.

Opening Graph Pages

To open a graph page, choose the File menu Open... command, click the
toolbar button, or press Ctrl+O. When the Open dialog box appears,
select the type of graph you want to open by selecting a file type from the List
Files of Type drop-down list, then click OK.

Graph Pages in If you open a Notebook file type (.SNB), a notebook file appears displaying
Notebook Files its sections and items. To view the desired graph, double-click the graph icon
in the appropriate notebook section.

The graph page and associated worksheet or report appear in the SigmaStat
window.

For more information on opening notebook files, see Opening and Viewing
Notebook Files and Items on page 25.

Non-Notebook Graph Non-notebook files are individual files which are separate from the notebook.
Page Files They are automatically converted to notebook file format when opened in
SigmaStat.

FIGURE 8–54
Opening a Graph File
Using the Open Dialog Box

For more information on opening non-notebook files in SigmaStat, see


Opening Non-Notebook Files on page 26.

Closing and Deleting Graph Pages

Close a graph page by clicking the close button which appears in the upper
right corner of the Windows 95 graph page window. You can also close graph
pages by choosing the File menu Close command while the graph page is the
active window. Closed graph pages can be opened by double-clicking the
graph icon in the notebook section.

Closed graphs are not removed from the notebook. To delete a graph page,
close it, then select the report or graph icon in the notebook section and press
the Delete key.

To delete a graph from a graph page, select it, then press the Delete key or
choose the Edit menu Clear command. Cut the graph to the Clipboard using
the Edit menu Cut command, pressing Ctrl+X, or clicking the toolbar
button.

Printing Graph Pages

To print a graph page:

1 Make sure the graph page you want to print is the active window, then
choose the File menu Print... command, click the toolbar button, or
press Ctrl+P. The Print dialog box appears.

! To set page margins, size, orientation, and paper source before you print
the page, choose the File menu Page Setup... command, set the desired
options, then select Printer... to go to the Print dialog box. See Setting
Graph Page Options on page 178 for more information.

2 Specify the printer to use, the range of pages, and number of copies to
print.

FIGURE 8–55
Example of the Print dialog
box for the HP LaserJet
Postscript Printer
This dialog box differs
depending on the type of
output device you have.

3 Click Properties to set more advanced printing options. Once you have
set the desired option, click OK to return to the Print dialog box, then
click OK again to print the selected page.

! Note that the Print dialog box differs depending on the type of printer you
have. Figure 8–55 is an example of the Print dialog box with an HP 4/4M
Postscript driver selected.


8 Comparing Two or More Groups

Use group comparison tests to compare random samples from two or more
different groups for differences in the mean or median values which cannot
be attributed to random sampling variation.

If you are comparing the effects of different treatments on the same


individuals, use repeated measures procedures. See Choosing the Procedure
to Use on page 103 for more information on when to use the different
SigmaStat tests.

About Group Comparison Tests

Group comparisons test two or more different groups for a significant


difference in the mean or median values beyond what can be attributed to
random sampling variation.

See Choosing the Group Comparison Test to Use on page 113 for more
information on when to use the different SigmaStat group comparison tests.

Parametric and Parametric tests assume samples were drawn from normally distributed
Nonparametric Tests populations with the same variances (or standard deviations). Parametric tests
are based on estimates of the population means and standard deviations, the
parameters of a normal distribution.

Nonparametric tests do not assume that the samples were drawn from a
normal population. Instead, they perform a comparison on ranks of the
observations. Rank Sum Tests automatically rank numeric data, then
compare the ranks rather than the original values.

Comparing You can compare two groups using:


Two Groups
➤ An Unpaired t-test (a parametric test).
➤ A Mann-Whitney Rank Sum Test (a nonparametric test).

Comparing You can compare three or more groups using the:


Many Groups
➤ One Way ANOVA (analysis of variance). A parametric test that compares
the effect of a single factor on the mean of two or more groups.
➤ Two Way ANOVA. A parametric test that compares the effect of two
different factors on the means of two or more groups.
➤ Three Way ANOVA. A parametric test that compares the effect of three
different factors on the means of two or more groups.
➤ Kruskal-Wallis Analysis of Variance on Ranks, which is the
nonparametric analog of One Way ANOVA.

If you are using one of these procedures to compare multiple groups, and you
find a statistically significant difference, you can use several multiple
comparison procedures (also known as post-hoc tests) to determine exactly
which groups are different and the size of the difference. These procedures are
described for each test.

Data Format for Group Comparison Tests

Data can be arranged in the worksheet as:

➤ Columns for each group (raw data).


➤ Data indexed to other column(s).

For t-tests and One Way ANOVAs, you can also use:

➤ The sample size, mean, and standard deviation for each group.
➤ The sample size, mean, and standard error of the mean (SEM) for each
group.

Below is a brief description of each type of data format. Complete


descriptions of data entry and formats can be found in Chapter 4, USING
THE DATA WORKSHEET.

Raw Data The raw data format uses separate worksheet columns for the data in each
group. This is the most common format, where your data have not yet been
analyzed or transformed.

You can use raw data for all tests except Two and Three Way ANOVAs.

! SigmaStat tests accept messy and unbalanced data and do not require equal
sample sizes in the groups being compared. There are no problems associated
with missing data or uneven columns; however, missing values must be
indicated by double dashes (“--”), not empty cells.
FIGURE 8–1
Valid Data Formats
for an Unpaired t-
test
Columns 1 and 2
are arranged as raw
data. Columns 3, 4,
and 5 are arranged
as descriptive
statistics using the
sample size, mean,
and standard
deviation. Columns
6 and 7 are
arranged as group
indexed data, with
column 6 as the
factor column and column 7
as the data column.

Descriptive If your data is in the form of statistical values (sample size, mean, standard
Statistics deviation, or standard error of the mean), the sample sizes (N) must be in one
worksheet column, the means in another column, and the standard
deviations (or standard errors of the mean) in a third column, with the data
for each group in the same row. When comparing two groups, there should
be exactly two rows of data.

Indexed Data Indexed data places the groups or treatments in a factor column, and the
corresponding data points in a second column. Two way ANOVAs require
two factor columns and one data column.

The data does not have to be organized in any particular order.
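
As a minimal illustration with made-up values, the same two groups can be entered
either as raw data (one column per group) or as indexed data (a factor column paired
with a data column):

   Raw data                     Indexed data
   Control    Treated           Group      Value
     4.1        5.3             Control      4.1
     3.8        5.9             Control      3.8
     4.4        6.1             Treated      5.3
                                Treated      5.9
                                Treated      6.1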

! You can index raw data or convert indexed data to raw data using the Edit
menu Index and UnIndex commands (see Indexing Data on page 73).

! Data for a Two Way ANOVA is always assumed to be indexed.

FIGURE 8–2
Data Format for a Two
Way ANOVA with
Two Factor Indexed Data
Column 1 is the first factor
column, column 2 is the
second factor column, and
column 3 contains the data.

Unpaired t-Test

Use an Unpaired t-test when:

➤ You want to see if the means of two different samples are significantly
different.
➤ Your samples are drawn from normally distributed populations with the
same variances.

If you know that your data was drawn from a non-normal population, use
the Mann-Whitney Rank Sum Test. When there are more than two groups
to compare, do a One Way Analysis of Variance.

! Note that, depending on your t-test options settings (see Setting t-test
Options on page 208), if you attempt to perform a t-test on non-normal
populations or populations with unequal variances, SigmaStat will inform
you that the data is unsuitable for a t-test, and suggest the Mann-Whitney
Rank Sum Test instead (see Mann-Whitney Rank Sum Test on page 220).

About the Unpaired t-test The unpaired t-test tests for a difference between two groups that is greater than what can be attributed to random sampling variation. The null hypothesis of an unpaired t-test is that the means of the populations that you drew the samples from are the same. If you can confidently reject this hypothesis, you can conclude that the means are different.


The Unpaired t-test is a parametric test based on estimates of the mean and
standard deviation parameters of the normally distributed populations from
which the samples were drawn.

Performing an Unpaired t-test To perform an Unpaired t-test:

1 Enter or arrange your data appropriately in the worksheet
(see following section).

2 If desired, set the t-test options using the Options for t-test dialog box
(see page 209).

3 Select t-test from the toolbar drop-down list, then click the button,
or choose the Statistics menu Compare Two Groups, t-test command.

4 Run the test by selecting the worksheet columns with the data you want
to test using the Pick Columns dialog box (see page 212).

5 View and interpret the t-test report and generate report graphs (pages
8-214 and 8-217).

Arranging t-test Data

The format of the data to be tested can be raw, indexed, or summary statistics. For raw and indexed data, the data is placed in two worksheet columns. Statistical summary data is placed in three worksheet columns.

FIGURE 8–3
Valid Data Formats for an Unpaired t-test. Columns 1 and 2 are arranged as raw data. Columns 3, 4, and 5 are arranged as descriptive statistics using the sample size, mean, and standard deviation. Columns 6 and 7 are arranged as group indexed data, with column 6 as the factor column and column 7 as the data column.


For more information on arranging data, see Data Format for Group
Comparison Tests on page 204 or Arranging Data for t-Tests and ANOVAs
on page 64. For information on how to select the data format for an unpaired
t-test, see Unpaired t-Test on page 206.

Selecting Data Columns When running a t-test, you can either:

➤ Select the columns to test from the worksheet by dragging your mouse over the columns before choosing the test.
➤ Select the columns while performing the test (see page 105).

Setting t-test Options

Use the t-test options to:

➤ Adjust the parameters of a test to relax or restrict the testing of your data
for normality and equal variance.
➤ Display the statistics summary and the confidence interval for the data in
the report and save residuals to a worksheet column.
➤ Compute the power or sensitivity of the test.

To change the t-test options:

1 If you are going to run the test after changing test options, and want to
select your data before you run the test, drag the pointer over your data.

2 To open the Options for t-test dialog box, select t-test from the toolbar
drop-down list, then click the button, or choose the Statistics menu
Current Test Options... command. The Normality and Equal Variance
options appear (see Figure 8–4 on page 209).

3 Click the Results tab to view the Summary Table, Confidence Intervals, and Residuals options (see Figure 8–5 on page 210), and the Post Hoc Test tab to view the Power option (see Figure 8–6 on page 211). Click the Assumption Checking tab to return to the Normality and Equal Variance options.

4 Click a check box to enable or disable a test option. Options settings are
saved between SigmaStat sessions. For more information on each of the
test options, see pages 8-209 through 8-217.


5 To continue the test, click Run Test. The Pick Columns dialog box
appears (see Selecting Data Columns on page 105 for more
information).

6 To accept the current settings and close the options dialog box, click
OK. To accept the current setting without closing the options dialog
box, click Apply. To close the dialog box without changing any settings
or running the test, click Cancel.

! You can select Help at any time to access SigmaStat’s on-line help system.

Normality and Equal Variance Assumptions Select the Assumption Checking tab from the options dialog box to view the Normality and Equal Variance options. The normality assumption test checks for a normally distributed population. The equal variance assumption test checks the variability about the group means.

Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test for a normally distributed population.

FIGURE 8–4
The Options for t-test Dialog Box Displaying the Assumption Checking Options

Equal Variance Testing SigmaStat tests for equal variance by checking the
variability about the group means.

P Values for Normality and Equal Variance The P value determines the
probability of being incorrect in concluding that the data is not normally
distributed (P value is the risk of falsely rejecting the null hypothesis that the
data is normally distributed). If the P computed by the test is greater than the
P set here, the test passes.

To require a stricter adherence to normality and/or equal variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaStat is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.

To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

! There are extreme conditions of data distribution that these tests cannot take
into account. For example, the Levene Median test fails to detect differences
in variance of several orders of magnitude. However, these conditions should
be easily detected by simply examining the data without resorting to the
automatic assumption tests.
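As a rough illustration of these assumption checks (outside SigmaStat), the sketch below uses SciPy's Kolmogorov-Smirnov and median-centered Levene routines on two hypothetical samples; SigmaStat's own implementation may differ in detail, so the P values are illustrative only.

    import numpy as np
    from scipy import stats

    group_a = np.array([4.2, 3.9, 5.1, 4.7, 4.4, 4.9])
    group_b = np.array([5.6, 6.0, 5.8, 6.3, 5.4, 6.1])

    # Kolmogorov-Smirnov test against a normal distribution fitted to each sample.
    for name, g in (("A", group_a), ("B", group_b)):
        stat, p = stats.kstest(g, "norm", args=(g.mean(), g.std(ddof=1)))
        print(f"group {name}: KS statistic = {stat:.3f}, P = {p:.3f}")

    # Levene test centered on the median (the "Levene Median" idea).
    stat, p = stats.levene(group_a, group_b, center="median")
    print(f"equal variance: statistic = {stat:.3f}, P = {p:.3f}")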

Summary Table Select the Results tab in the options dialog box to view the Summary Table
option. The Summary Table option displays the number of observations for a
column or group, the number of missing values for a column or group, the
average value for the column or group, the standard deviation of the column
or group, and the standard error of the mean for the column or group.

FIGURE 8–5
The Options for t-test Dialog Box Displaying the Summary Table, Confidence Intervals, and Residuals Options

Confidence Interval Select the Results tab in the options dialog box to view the Confidence
Intervals option. The Confidence Intervals option displays the confidence
interval for the difference of the means. To change the interval, enter any
number from 1 to 99 (95 and 99 are the most commonly used intervals).
Click the selected check box if you do not want to include the confidence
interval in the report.


Residuals Select the Results tab in the options dialog box to view the Residuals option.
Use the Residuals option to display residuals in the report and to save the
residuals of the test to the specified worksheet column. To change the column
the residuals are saved to, edit the number in or select a number from the
drop-down list.

Power Select the Post Hoc Tests tab in the options dialog box to view the Power
option. The power or sensitivity of a test is the probability that the test will
detect a difference between the groups if there is really a difference.

Change the alpha value by editing the number in the Alpha Value box. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P ≤ 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

FIGURE 8–6
The Options for t-test Dialog Box Displaying the Power Option


Running a t-test

To run a t-test, you need to select the data to test. The Pick Columns dialog
box is used to select the worksheet columns with the data you want to test
and to specify how your data is arranged in the worksheet.

To run a t-test:

1 If you want to select your data before you run the test, drag the pointer
over your data.

2 Open the Pick Columns dialog box to start the t-test. You can either:

➤ Select t-test from the toolbar drop-down list, then click the button.
➤ Choose the Statistics menu Compare Two Groups, t-test... command.
➤ Click the Run Test button from the Options for t-test dialog box.

The Pick Columns dialog box appears prompting you to specify a data
format.

3 Select the appropriate data format from the Data Format drop-down
list. If your data is grouped in columns, select Raw. If your data is in the
form of a group index column(s) paired with a data column(s), select
Indexed. If your data was entered in the form of summary statistics for
each group, select either Sample Size, Mean, and Standard Deviation,
or Sample Size, Mean, and SEM (Standard Error of the Mean).

FIGURE 8–7
The Pick Columns for t-test Dialog Box Prompting You to Specify a Data Format

For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for t-Tests and
ANOVAs on page 64.


4 Click Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in the
Selected Columns list.

5 To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list.

The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The title of selected columns appears in each
row. For raw and indexed data, you are prompted to select two
worksheet columns. For statistical summary data you are prompted to
select three columns.

FIGURE 8–8
The Pick Columns for t-test Dialog Box Prompting You to Select Data Columns

6 To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

7 Click Finish to run the t-test on the selected columns. After the
computations are completed, the report appears. To edit the report, use
the Format menu commands; for information on editing reports, see
Editing Reports on page 137.

! If you attempt to run a test on worksheet columns with empty cells, a dialog box appears asking you if you want to convert the empty cells to missing values. Choose Convert to convert the cells and continue with the test, or Cancel to cancel the test.


Interpreting t-test Results

The t-test calculates the t statistic, degrees of freedom, and P value of the
specified data. These results are displayed in the t-test report which
automatically appears after the t-test is performed. The other results displayed
in the report are enabled and disabled in the Options for t-test dialog box (see
page 209).

For descriptions of the derivations for t-test results, you can reference any
appropriate statistics reference. For a list of suggested references, see
References on page 12.

! The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results may
also appear. To turn off this explanatory text, choose the Statistics menu
Report Options... command and uncheck the Explain Test Results option.

The number of decimal places displayed is also set in the Report Options
dialog box. For more information on setting report options, see Setting
Report Options on page 135.

Normality Test Normality test results show whether the data passed or failed the test of the
assumption that the samples were drawn from normal populations and the P
value calculated by the test. All parametric tests require normally distributed
source populations.


This result is set in the Options for t-test dialog box (see page 209).

FIGURE 8–9
The t-test Report

Equal Variance Test Equal Variance test results display whether the data passed or failed the test of the assumption that the samples were drawn from populations with the same variance, and the P value calculated by the test. Equal variance of the source population is assumed for all parametric tests. This result is set in the Options for t-test dialog box (see page 209).

Summary Table SigmaStat can generate a summary table listing the sizes N for the two
samples, number of missing values, means, standard deviations, and the
standard error of the means (SEM). This result is displayed unless you disable
the Summary Table option in the Options for t-test dialog box.

N (Size) The number of non-missing observations for that column or group.

Missing The number of missing values for that column or group.

Mean The average value for the column. If the observations are normally
distributed the mean is the center of the distribution.

Standard Deviation A measure of variability. If the observations are normally distributed, about two-thirds will fall within one standard deviation above or below the mean, and about 95% of the observations will fall within two standard deviations above or below the mean.

Standard Error of the Mean A measure of the precision with which the mean computed from the sample approximates the true population mean.
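A minimal sketch of how these summary quantities are computed for a single column, assuming NumPy and hypothetical values, with "--" marking a missing cell:

    import numpy as np

    column = ["4.2", "3.9", "--", "5.1", "4.7"]            # one worksheet column
    values = np.array([float(v) for v in column if v != "--"])

    n = values.size                                        # N (size)
    missing = len(column) - n                              # Missing
    mean = values.mean()                                   # Mean
    std_dev = values.std(ddof=1)                           # Standard deviation
    sem = std_dev / np.sqrt(n)                             # Standard error of the mean

    print(n, missing, round(mean, 3), round(std_dev, 3), round(sem, 3))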

t Statistic The t-test statistic is the ratio:

    t = (difference between the means of the two groups) / (standard error of the difference between the means)

The standard error of the difference is a measure of the precision with which this difference can be estimated.

You can conclude from “large” absolute values of t that the samples were
drawn from different populations. A large t indicates that the difference
between the treatment group means is larger than what would be expected
from sampling variability alone (i.e., that the differences between the two
groups are statistically significant). A small t (near 0) indicates that there is no
significant difference between the samples.
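A brief sketch of the same computation outside SigmaStat, using SciPy's equal-variance t-test on two hypothetical samples:

    import numpy as np
    from scipy import stats

    group_a = np.array([4.2, 3.9, 5.1, 4.7, 4.4, 4.9])
    group_b = np.array([5.6, 6.0, 5.8, 6.3, 5.4, 6.1])

    t, p = stats.ttest_ind(group_a, group_b)      # pooled-variance (unpaired) t-test
    df = group_a.size + group_b.size - 2          # degrees of freedom

    print(f"t = {t:.3f}, df = {df}, P = {p:.4f}")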

Degrees of Freedom Degrees of freedom represents the sample sizes, which affect the ability of the t-test to detect differences in the means. As degrees of freedom (sample sizes) increase, the ability to detect a difference with a smaller t increases.

P Value The P value is the probability of being wrong in concluding that there is a true difference in the two groups (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on t). The smaller the P value, the greater the probability that the samples are drawn from different populations. Traditionally, you can conclude there is a significant difference when P ≤ 0.05.

Confidence Interval for the Difference of the Means If the confidence interval does not include zero, you can conclude that there is a significant difference between the means with the level of confidence specified. This can also be described as P < α (alpha), where α is the acceptable probability of incorrectly concluding that there is a difference.

The level of confidence is adjusted in the Options for t-test dialog box; this is typically 100(1−α), or 95%. Larger values of confidence result in wider intervals and smaller values in smaller intervals. For a further explanation of α, see Power below. This result is set in the Options for t-test dialog box (see page 209).
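As a sketch (not SigmaStat's internal code), the confidence interval for the difference of the means can be reproduced from the pooled-variance formula that underlies the unpaired t-test; the samples and the 95% level are hypothetical choices.

    import numpy as np
    from scipy import stats

    group_a = np.array([4.2, 3.9, 5.1, 4.7, 4.4, 4.9])
    group_b = np.array([5.6, 6.0, 5.8, 6.3, 5.4, 6.1])
    alpha = 0.05                                   # 95% confidence

    n1, n2 = group_a.size, group_b.size
    diff = group_a.mean() - group_b.mean()
    pooled_var = (((n1 - 1) * group_a.var(ddof=1) + (n2 - 1) * group_b.var(ddof=1))
                  / (n1 + n2 - 2))
    se_diff = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)

    print(f"95% CI: {diff - t_crit * se_diff:.3f} to {diff + t_crit * se_diff:.3f}")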

Power The power, or sensitivity, of a t-test is the probability that the test will detect
a difference between the groups if there really is a difference. The closer the
power is to 1, the more sensitive the test.

t-test power is affected by the sample size of both groups, the chance of erroneously reporting a difference, α (alpha), the difference of the means, and the standard deviation.

This result is set in the Options for t-test dialog box (see page 209).

Alpha (α) Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true).

The α value is set in the Options for t-test dialog box; a value of α = 0.05 indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P ≤ 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference but also increase the risk of reporting a false positive (a Type I error).
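A hedged sketch of a comparable power calculation outside SigmaStat, assuming the statsmodels package; the effect size and sample sizes are hypothetical, and SigmaStat's reported power may differ slightly in its computational details.

    from statsmodels.stats.power import TTestIndPower

    # Effect size here is (difference of the means) / standard deviation.
    power = TTestIndPower().power(effect_size=1.0, nobs1=6, ratio=1.0,
                                  alpha=0.05, alternative="two-sided")
    print(f"power = {power:.3f}")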

t-test Report Graphs

You can generate up to five graphs using the results from a t-test. They
include a:

➤ Bar chart of the column means.


➤ Scatter plot with error bars of the column means.
➤ Point plot of the column means.
➤ Histogram of the residuals.
➤ Normal probability plot of the residuals.

Bar Chart The t-test bar chart plots the group means as vertical bars with error bars indicating the standard deviation. If the graph data is indexed, the levels in the factor column are used as the tick marks for the bar chart bars, and the column titles are used as the X and Y axis titles. If the graph data is in raw or statistical format, the column titles are used as the tick marks for the bar chart bars and default X Data and Y Data axis titles are assigned to the graph. For an example of a bar chart, see Bar Charts of the Column Means on page 149.

Scatter Plot The t-test scatter plot graphs the group means as single points with error bars
indicating the standard deviation. If the graph data is indexed, the levels in
the factor column are used as the tick marks for the scatter plot points, and
the column titles are used as the X and Y axis titles. If the graph data is in raw
or statistical format, the column titles are used as the tick marks for the
scatter plot points and default X Data and Y Data axis titles are assigned to
the graph. For an example of a scatter plot, see Scatter Plot on page 150.

Point Plot The t-test point plot graphs all values in each column as a point on the graph.
If the graph data is indexed, the levels in the factor column are used as the
tick marks for the plot points, and the column titles are used as the X and Y
axis titles. If the graph data is in raw or statistical format, the column titles are
used as the tick marks for the plot points and default X Data and Y Data axis
titles are assigned to the graph. For an example of a point plot, see Point Plot
on page 150.

Histogram of Residuals The t-test histogram plots the raw residuals in a specified range, using a defined interval set. The residuals are divided into a number of evenly incremented histogram intervals and plotted as histogram bars indicating the number of residuals in each interval. The X axis represents the histogram intervals, and the Y axis represents the number of residuals in each group. For an example of a histogram, see page 153.

Normal Probability Plot The t-test probability plot graphs the frequency of the raw residuals. The residuals are sorted and then plotted as points around a curve representing the area of the gaussian plotted on a probability axis. Plots with residuals that fall along the gaussian curve indicate that your data was taken from a normally distributed population. The X axis is a linear scale representing the residual values. The Y axis is a probability scale representing the cumulative frequency of the residuals. For an example of a normal probability plot, see page 155.

Creating a Report Graph To generate a graph of t-test data:


1 Click the toolbar button, or choose the Graph menu Create Graph command when the t-test report is selected. The Create Graph dialog box appears displaying the types of graphs available for the t-test results.

FIGURE 8–10
The Create Graph Dialog Box for the t-test Report

2 Select the type of graph you want to create from the Graph Type list,
then click OK, or double-click the desired graph in the list. The
selected graph appears in a graph window. For more information on
each of the graph types, see Chapter 8.

FIGURE 8–11
A Point Plot of the Result Data for a t-test

For information on manipulating graphs, see Chapter 8, CREATING AND MODIFYING GRAPHS.


Mann-Whitney Rank Sum Test

The Rank Sum Test should be used when:

➤ You want to see if the medians of two different samples are significantly
different.
➤ The samples are not drawn from normally distributed populations with
the same variances, or you do not want to assume that they were drawn
from normal populations.

If you know your data was drawn from a normally distributed population,
use the Unpaired t-test (page 206). When there are more than two groups to
compare, run a Kruskal-Wallis ANOVA on Ranks (page 310).

! Note that, depending on your Rank Sum Test options settings (see page
222), if you attempt to perform a rank sum test on normal populations with
equal variances, SigmaStat informs you that the data can be analyzed with
the more powerful Unpaired t-test instead.

About the Mann-Whitney Rank Sum Test The Mann-Whitney Rank Sum Test is used to test for a difference between two groups that is greater than what can be attributed to random sampling variation. The null hypothesis is that the two samples were not drawn from populations with different medians.

The Rank Sum Test is a nonparametric procedure, which does not require
assuming normality or equal variance. It ranks all the observations from
smallest to largest without regard to which group each observation comes
from. The ranks for each group are summed and the rank sums compared.

If there is no difference between the two groups, the mean ranks should be
approximately the same. If they differ by a large amount, you can assume that
the low ranks tend to be in one group and the high ranks are in the other, and
conclude that the samples were drawn from different populations (i.e., that
there is a statistically significant difference).
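A small sketch of the same test outside SigmaStat, using SciPy on hypothetical samples; note that SciPy reports the U statistic rather than SigmaStat's rank-sum T, but the P value answers the same question.

    import numpy as np
    from scipy import stats

    group_a = np.array([4.2, 3.9, 5.1, 4.7, 4.4, 4.9])
    group_b = np.array([5.6, 6.0, 5.8, 6.3, 5.4, 6.1])

    u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    print(f"U = {u}, P = {p:.4f}")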

Performing a Mann-Whitney Rank Sum Test To perform a Mann-Whitney Rank Sum Test:

1 Enter or arrange your data appropriately in the data worksheet (see following section).

2 If desired, set the Rank Sum options using the Options for Rank Sum
Test dialog box (page 232).


3 Select Rank Sum Test from the toolbar drop-down list, then click the
button, or choose the Statistics menu Compare Two Groups, Rank
Sum Test command.

4 Run the test by selecting the worksheet columns with the data you want
to test using the Pick Columns dialog box (page 99).

5 View and interpret the Rank Sum Test report and generate report graphs (pages 8-226 and 8-228).

Arranging Rank Sum Data

The format of the data to be tested can be raw data or indexed data; in either
case, the data is found in two worksheet columns. For more information on
arranging data, see Data Format for Group Comparison Tests on page 204,
or Arranging Data for t-Tests and ANOVAs on page 64. For information on
how to select the data format for a test, see Selecting a Test on page 98.

FIGURE 8–12
Valid Data Formats for a Mann-Whitney Rank Sum Test. Columns 1 and 2 are arranged as raw data. Columns 3 and 4 are arranged as group indexed data, with column 3 as the factor column.

Selecting Data Columns When running a Rank Sum Test you can either:

➤ Select the columns by dragging your mouse over the columns before choosing the test, or
➤ Select the columns while running the test.


Setting Mann-Whitney Rank Sum Test Options

Use the Rank Sum Test options to:

➤ Adjust the parameters of the test to relax or restrict the testing of your
data for normality and equal variance.
➤ Display the summary table.

To change the Rank Sum Test options:

1 If you are going to run the test after changing test options, and want to
select your data before you run the test, drag the pointer over your data.

2 To open the Options for Rank Sum Test dialog box, select Rank Sum
Test from the toolbar drop-down list, then click the button, or
choose the Statistics menu Current Test Options... command. The
Normality and Equal Variance options appear (see Figure 8–20 on page
232).

3 Click the Results tab to view the Summary Table option (see Figure 8–
20 on page 232). Click the Assumption Checking tab to return to the
Normality and Equal Variance options.

4 Click a check box to enable or disable a test option. Options settings are saved between SigmaStat sessions. For more information on each of the test options, see pages 8-209 through 8-210.

5 To continue the test, click Run Test. The Pick Columns dialog box
appears (see Picking Data to Test on page 99 for more information).

6 To accept the current settings and close the options dialog box, click
OK. To accept the current setting without closing the options dialog
box, click Apply. To close the dialog box without changing any settings
or running the test, click Cancel.

! You can select Help at any time to access SigmaStat’s on-line help system.

Normality and Equal Variance Assumptions Select the Assumption Checking tab from the options dialog box to view the Normality and Equal Variance options. The normality assumption test checks for a normally distributed population. The equal variance assumption test checks the variability about the group means.


Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test for a normally distributed population.

FIGURE 8–13
The Options for Rank Sum Test Dialog Box Displaying the Assumption Checking Options

Equal Variance Testing SigmaStat tests for equal variance by checking the
variability about the group means.

P Values for Normality and Equal Variance The P value determines the
probability of being incorrect in concluding that the data is not normally
distributed (the P value is the risk of falsely rejecting the null hypothesis that
the data is normally distributed). If the P value computed by the test is
greater than the P set here, the test passes.

To require a stricter adherence to normality and/or equal variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaStat is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.

To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

! There are extreme conditions of data distribution that these tests cannot take
into account. For example, the Levene Median test fails to detect differences
in variance of several orders of magnitude. However, these conditions should
be easily detected by simply examining the data without resorting to the
automatic assumption tests.


Summary Table Select the Results tab to view the Summary Table option. The summary table for a Rank Sum Test lists the medians, percentiles, and sample sizes N in the Rank Sum Test report. If desired, change the percentile values by editing the boxes. The 25th and the 75th percentiles are the suggested percentiles.

FIGURE 8–14
The Options for Rank Sum Test Dialog Box Displaying the Summary Table Options

Running a Rank Sum Test

To run a test, you need to select the data to test. The Pick Columns dialog
box is used to select the worksheet columns with the data you want to test
and to specify how your data is arranged in the worksheet.

To run the Rank Sum Test:

1 If you want to select your data before you run the test, drag the pointer
over your data.

2 Open the Pick Columns dialog box to start the Rank Sum Test. You can
either:

➤ Select Rank Sum Test from the toolbar drop-down list, then click the
button.
➤ Choose the Statistics menu Compare Two Groups, Rank Sum Test...
command.
➤ Click the Run Test button from the Options for Rank Sum Test
dialog box.

The Pick Columns dialog box appears prompting you to specify a data
format.


3 Select the appropriate data format from the Data Format drop-down
list. If your data is grouped in columns, select Raw. If your data is in the
form of a group index column(s) paired with a data column(s), select
Indexed.

FIGURE 8–15
The Pick Columns for Rank Sum Test Dialog Box Prompting You to Specify a Data Format

For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for t-Tests and
ANOVAs on page 64.

4 Click Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in the
Selected Columns list.

5 To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data for Data drop-down list.

The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The number or title of the selected columns appears in each row. For raw and indexed data, you are prompted to select two worksheet columns.

6 To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

7 Click Finish to run the Rank Sum Test on the selected columns. If you
elected to test for normality and equal variance, SigmaStat performs the
test for normality (Kolmogorov-Smirnov) and the test for equal
variance (Levene Median). If your data pass both tests, SigmaStat


informs you and suggests continuing your analysis using a parametric t-test (see page 332).

FIGURE 8–16
The Pick Columns for Rank Sum Test Dialog Box Prompting You to Select Data Columns

After the computations are completed, the report appears (see Figure 8–17 on page 227). For information on editing reports, see Editing Reports on page 137.

! If you attempt to run a test on worksheet columns with empty cells, a dialog box appears asking you if you want to convert the empty cells to missing values. Choose Convert to convert the cells and continue with the test, or Cancel to cancel the test.

Interpreting Rank Sum Test Results

The Rank Sum Test computes the Mann-Whitney T statistic and the P value
for T. These results are displayed in the rank sum report which appears after
the rank sum test is performed. The other results displayed in the report are


enabled and disabled in the Options for Rank Sum Test dialog box (see
Setting Mann-Whitney Rank Sum Test Options on page 222).

FIGURE 8–17
The Mann-Whitney Rank Sum Test Report

For descriptions of the derivations for Rank Sum Test results, you can reference any appropriate statistics reference. For a list of suggested references, see References on page 12.

! The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results may
also appear. To turn off this explanatory text, choose the Statistics menu
Report Options... command and uncheck the Explain Test Results option.

The number of decimal places displayed is also set in the Report Options dialog box. For more information on setting report options, see Setting Report Options on page 135.

Normality Test Normality test results display whether the data passed or failed the test of the
assumption that they were drawn from a normal population and the P value
calculated by the test. For nonparametric procedures, this test can fail, as
nonparametric tests do not assume normally distributed source populations.
This result is set in the Options for Rank Sum Test dialog box (see page 232).

Equal Variance Test Equal Variance test results display whether the data passed or failed the test of the assumption that the samples were drawn from populations with the same variance, and the P value calculated by the test. Nonparametric tests do not assume equal variance of the source populations. This result is set in the Options for Rank Sum Test dialog box (see page 232).

Summary Table SigmaStat generates a summary table listing the sample sizes N, number of
missing values, medians, and percentiles unless you disable the Display
Summary Table option in the Options for Rank Sum Test dialog box.

N (Size) The number of non-missing observations for that column or group.

Missing The number of missing values for that column or group.

Medians The “middle” observation as computed by listing all the observations from smallest to largest and selecting the largest value of the smallest half of the observations. The median observation has an equal number of observations greater than and less than that observation.

Percentiles The two percentile points that define the upper and lower tails
of the observed values.
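A minimal sketch of these summary quantities for one hypothetical group, assuming NumPy and the suggested 25th and 75th percentiles:

    import numpy as np

    group = np.array([4.2, 3.9, 5.1, 4.7, 4.4, 4.9])
    median = np.median(group)
    p25, p75 = np.percentile(group, [25, 75])
    print(f"median = {median}, 25th = {p25}, 75th = {p75}")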

T Statistic The T statistic is the sum of the ranks in the smaller sample group or from the first selected group, if both groups are the same size. This value is compared to the population of all possible rankings to determine the probability of this T occurring.

P Value The P value is the probability of being wrong in concluding that there is a true difference in the two groups (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on T). The smaller the P value, the greater the probability that the samples are drawn from different populations.

Traditionally, you can conclude there is a significant difference when P < 0.05.

Rank Sum Test Report Graphs

You can generate up to two graphs using the results from a Rank Sum Test.
They include a:

➤ Box plot of the percentiles and median of column data.


➤ Point plot of the column data.


Box Plot The Rank Sum Test box plot graphs the percentiles and the median of
column data. The ends of the boxes define the 25th and 75th percentiles,
with a line at the median and error bars defining the 10th and 90th
percentiles.

If the graph data is indexed, the levels in the factor column are used as the
tick marks for the box plot boxes, and the column titles are used as the axis
titles. If the graph data is in raw format, the column titles are used as the tick
marks for the box plot boxes, and no axis titles are assigned to the graph. For
an example of a box plot, see page 152 in the CREATING AND MODIFYING
GRAPHS chapter.

Point Plot The Rank Sum Test point plot graphs all values in each column as a point on
the graph. If the graph data is indexed, the levels in the factor column are
used as the tick marks for the plot points, and the column titles are used as
the X and Y axis titles. If the graph data is in raw or statistical format, the
column titles are used as the tick marks for the plot points and default X Data
and Y Data axis titles are assigned to the graph. For an example of a point
plot, see Point Plot on page 150.

Creating a Report Graph To generate a graph of Rank Sum Test report data:

1 Click the toolbar button, or choose the Graph menu Create Graph command when the Rank Sum Test report is selected. The Create Graph dialog box appears displaying the types of graphs available for the Rank Sum Test results.

FIGURE 8–18
The Create Graph Dialog Box for the Rank Sum Test Report


2 Select the type of graph you want to create from the Graph Type list,
then click OK, or double-click the desired graph in the list. For more
information on each of the graph types, see Chapter 8.

FIGURE 8–19
A Box Plot of the Result Data for a Rank Sum Test

For information on manipulating graphs, see Chapter 8, CREATING AND MODIFYING GRAPHS.

One Way Analysis of Variance

One Way Analysis of Variance is a parametric test that assumes that all the
samples are drawn from normally distributed populations with the same
standard deviations (variances).

A One Way or One Factor ANOVA should be used when:

➤ You want to see if the means of two or more different experimental groups are affected by a single factor.
➤ Your samples are drawn from normally distributed populations with equal variance.

If you know that your data was drawn from non-normal populations, use the Kruskal-Wallis ANOVA on Ranks. If you want to consider the effects of two factors on your experimental groups, use Two Way ANOVA. When there are only two groups to compare, you can do a t-test (depending on the type of results you want). Performing an ANOVA for two groups yields exactly the same P value as an unpaired t-test.

! Depending on your ANOVA options settings (see page 232), if you attempt
to perform an ANOVA on non-normal populations or populations with
unequal variances, SigmaStat informs you that the data is unsuitable for a
parametric test, and suggests the Kruskal-Wallis ANOVA on Ranks (see
page 310).

About a One Way ANOVA The design for a One Way ANOVA is the same as an unpaired t-test except that there can be more than two experimental groups. The null hypothesis is that there is no difference among the populations from which the samples were drawn.
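A brief sketch of the equivalent computation outside SigmaStat, using SciPy's one-way ANOVA on three hypothetical groups:

    from scipy import stats

    group_1 = [4.2, 3.9, 5.1, 4.7, 4.4]
    group_2 = [5.6, 6.0, 5.8, 6.3, 5.4]
    group_3 = [5.0, 4.8, 5.3, 5.1, 4.9]

    f_stat, p = stats.f_oneway(group_1, group_2, group_3)  # one-way ANOVA F test
    print(f"F = {f_stat:.3f}, P = {p:.4f}")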

Performing a One Way ANOVA To perform a One Way ANOVA:

1 Enter or arrange your data appropriately in the worksheet (see following section).

2 If desired, set the One Way ANOVA options using the Options for
One Way ANOVA dialog box (page 232).

3 Select One Way ANOVA from the toolbar drop-down list, then click
the button, or choose Compare Many Groups, One Way ANOVA...
command from the Statistics menu.

4 Run the test by selecting the worksheet columns with the data you want
to test using the Pick Columns dialog box (page 99).

5 Specify the multiple comparison method you want to perform on your test (page 236).

6 View and interpret the One Way ANOVA report and generate report
graphs (page 243 and page 250).

Arranging One Way ANOVA Data

Data can be arranged as raw data, indexed data, or summary statistics. Raw
data is placed in as many columns as there are groups, up to 32; each column
contains the data for one group. Indexed data is placed in two worksheet
columns. Statistical summary data is placed in three columns. For more information on arranging data, see Data Format for Group Comparison Tests on page 204, or Arranging Data for Contingency Tables on page 69.

FIGURE 8–20
Valid Data Formats for a One Way ANOVA. Columns 1 through 3 are arranged as groups in columns. Columns 4, 5, and 6 are arranged as descriptive statistics using the mean, standard deviation, and size. Columns 7 and 8 are arranged as group indexed data, with column 7 as the factor column.

Selecting Data Columns When running a One Way ANOVA you can either:

➤ Select the columns by dragging your mouse over the columns before choosing the test.
➤ Select the columns while running the test.

Setting One Way ANOVA Options

Use the One Way ANOVA options to:

➤ Adjust the parameters of the test to relax or restrict the testing of your
data for normality and equal variance.
➤ Display the statistics summary table and the confidence interval for the
data, and assign residuals to a worksheet column.
➤ Enable multiple comparisons.
➤ Compute the power, or sensitivity, of the test.

To change the One Way ANOVA options:

1 If you are going to run the test after changing test options, and want to
select your data before you run the test, drag the pointer over your data.

2 To open the Options for One Way ANOVA dialog box, select One Way ANOVA from the toolbar drop-down, then click the button, or choose the Statistics menu Current Test Options... command. The Normality and Equal Variance options appear (see Figure 8–20 on page 232).

3 Click the Results tab to view the Summary Table, Confidence Intervals, and Residuals in Column options (see Figure 8–22 on page 235). Click the Post Hoc Test tab to view the Power and Multiple Comparisons options (see Figure 8–23 on page 236). Click the Assumption Checking tab to return to the Normality and Equal Variance options.

4 Click a check box to enable or disable a test option. Options settings are
saved between SigmaStat sessions. For more information on each of the
test options, see pages 8-233 through 8-236.

5 To continue the test, click Run Test. The Pick Columns dialog box
appears (see page 99 for more information).

6 To accept the current settings and close the options dialog box, click
OK. To accept the current setting without closing the options dialog
box, click Apply. To close the dialog box without changing any settings
or running the test, click Cancel.

! You can select Help at any time to access SigmaStat’s on-line help system.

Normality and Equal Variance Assumptions Select the Assumption Checking tab from the options dialog box to view the Normality and Equal Variance options. The normality assumption test checks for a normally distributed population. The equal variance assumption test checks the variability about the group means.

Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test for a normally distributed population.

FIGURE 8–21
The Options for One Way ANOVA Dialog Box Displaying the Assumption Checking Options


Equal Variance Testing SigmaStat tests for equal variance by checking the
variability about the group means.

P Values for Normality and Equal Variance The P value determines the
probability of being incorrect in concluding that the data is not normally
distributed (the P value is the risk of falsely rejecting the null hypothesis that
the data is normally distributed). If the P value computed by the test is
greater than the P set here, the test passes.

To require a stricter adherence to normality and/or equal variance, increase the P value. Because the parametric statistical methods are relatively robust in terms of detecting violations of the assumptions, the suggested value in SigmaStat is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.

To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

! There are extreme conditions of data distribution that these tests cannot take
into account. For example, the Levene Median test fails to detect differences
in variance of several orders of magnitude. However, these conditions should
be easily detected by simply examining the data without resorting to the
automatic assumption tests.

Summary Table Select the Results tab in the options dialog box to view the Summary Table option. The Summary Table option displays the number of observations for a column or group, the number of missing values for a column or group, the average value for the column or group, the standard deviation of the column or group, and the standard error of the mean for the column or group.

FIGURE 8–22
The Options for One Way ANOVA Dialog Box Displaying the Summary Table, Confidence Intervals, and Residuals Options

Confidence Interval Select the Results tab in the options dialog box to view the Confidence
Interval option. The Confidence Intervals option displays the confidence
interval for the difference of the means. To change the interval, enter any
number from 1 to 99 (95 and 99 are the most commonly used intervals).
Click the selected check box if you do not want to include the confidence
interval in the report.

Residuals Select the Results tab in the options dialog box to view the Residuals option.
Use the Residuals option to display residuals in the report and to save the
residuals of the test to the specified worksheet column. To change the
column the residuals are saved to, edit the number in or select a number
from the drop-down list.


Power Select the Post Hoc Tests tab in the options dialog box to view the Power
options. The power or sensitivity of a test is the probability that the test will
detect a difference between the groups if there is really a difference.

FIGURE 8–23
The Options for One Way ANOVA Dialog Box Displaying the Power and Multiple Comparison Options

Change the alpha value by editing the number in the Alpha Value box. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P ≤ 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

Multiple Comparisons Select the Post Hoc Test tab in the Options dialog box to view the multiple comparisons options (see Figure 8–25 on page 239). One Way ANOVAs test the hypothesis of no differences between the several treatment groups, but do not determine which groups are different, or the sizes of these differences. Multiple comparison procedures isolate these differences.

The P value used to determine if the ANOVA detects a difference is set in the
Report Options dialog box. If the P value produced by the One Way
ANOVA is less than the P value specified in the box, a difference in the
groups is detected and the multiple comparisons are performed. For more
information on specifying a P value for the ANOVA, see Setting Report
Options on page 232.


Performing Multiple Comparisons You can choose to always perform multiple comparisons or to only perform multiple comparisons if a One Way ANOVA detects a difference.

Select the Always Perform option to perform multiple comparisons whether or not the ANOVA detects a difference.

Select the Only When ANOVA P Value is Significant option to perform multiple comparisons only if the ANOVA detects a difference.

Significant Multiple Comparison Value Select either .05 or .01 from the Significance Value for Multiple Comparisons drop-down list. This value determines the likelihood of the multiple comparison being incorrect in concluding that there is a significant difference in the treatments.

A value of .05 indicates that the multiple comparisons will detect a difference only if there is less than a 5% chance that the multiple comparison is incorrect in detecting a difference.

! If multiple comparisons are triggered, the Multiple Comparison Options dialog box appears after you pick your data from the worksheet and run the test, prompting you to choose a multiple comparison method.

Running a One Way ANOVA

To run a test, you need to select the data to test. The Pick Columns dialog
box is used to select the worksheet columns with the data you want to test
and to specify how your data is arranged in the worksheet.

To run a One Way ANOVA:

1 If you want to select your data before you run the test, drag the pointer
over your data.

2 Open the Pick Columns dialog box to start the One Way ANOVA. You
can either:

➤ Select One Way ANOVA from the toolbar drop-down list, then click
the button.
➤ Choose the Statistics menu Compare Many Groups, One Way
ANOVA... command.
➤ Click the Run Test button from the Options for One Way ANOVA
dialog box.


The Pick Columns dialog box appears prompting you to specify a data
format.

3 Select the appropriate data format from the Data Format drop-down
list. If your data is grouped in columns, select Raw. If your data is in the
form of a group index column(s) paired with a data column(s), select
Indexed. If your data was entered in the form of summary statistics for
each group, select either Sample Size, Mean, and Standard Deviation or
Sample Size, Mean, and SEM (Standard Error of the Mean).

For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for Contingency
Tables on page 69.

FIGURE 8–24
The Pick Columns for One Way ANOVA Dialog Box Prompting You to Specify a Data Format

4 Click Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in the
Selected Columns list.

5 To assign the desired worksheet columns to the Selected Columns list, select the columns in the worksheet, or select the columns from the Data drop-down list.

The first selected column is assigned to the first row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The number or title of the selected columns appears in each row. You are prompted to pick a minimum of two and a maximum of 64 columns for raw data, two columns for indexed data, and three columns for statistical summary data.

6 To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.


FIGURE 8–25
The Pick Columns for One Way ANOVA Dialog Box Prompting You to Select Data Columns

7 Click Finish to perform the One Way ANOVA. If you elected to test
for normality and equal variance, and your data fails either test,
SigmaStat warns you and suggests continuing your analysis using the
nonparametric Kruskal-Wallis ANOVA on Ranks (see page 310).

If you selected to run multiple comparisons only when the P value is significant, and the P value is not significant (see page 131), the One Way ANOVA report appears after the test is complete. To edit the report, use the Format menu commands; for information on editing reports, see Editing Reports on page 137.

If the P value for multiple comparisons is significant, or you selected to always perform multiple comparisons, the Multiple Comparison Options dialog box appears prompting you to select a multiple comparison method. For more information on selecting a multiple comparison method, see the following section, Multiple Comparison Options.

Multiple Comparison Options

The One Way ANOVA tests the hypothesis of no differences between the
several treatment groups, but does not determine which groups are different,
or the sizes of these differences. Multiple comparison tests isolate these
differences by running comparisons between the experimental groups.

If you selected to run multiple comparisons only when the P value is significant, and the ANOVA produces a P value equal to or less than the trigger P value, or you selected to always run multiple comparisons in the Options for One Way ANOVA dialog box (see page 242), the Multiple Comparison Options dialog box appears prompting you to specify a multiple comparison test. The P value produced by the ANOVA is displayed in the upper left corner of the dialog box.


There are seven multiple comparison tests to choose from for the One Way
ANOVA. You can choose to perform the:

➤ Holm-Sidak Test
➤ Tukey Test
➤ Student-Newman-Keuls Test
➤ Bonferroni t-test
➤ Fisher’s LSD
➤ Dunnett’s Test
➤ Duncan’s Multiple Range Test

There are two types of multiple comparisons available for the One Way ANOVA. The types of comparison you can make depend on the selected multiple comparison test, including:

➤ All pairwise comparisons compare all possible pairs of treatments.


➤ Multiple comparisons versus a control compare all experimental
treatments to a single control group.

Holm-Sidak Test Use the Holm-Sidak Test for both pairwise comparisons and comparisons
versus a control group. It is more powerful than the Tukey and Bonferroni
tests and, consequently, it is able to detect differences that these other tests do
not. It is recommended as the first-line procedure for pairwise comparison
testing.

When performing the test, the P values of all comparisons are computed and
ordered from smallest to largest. Each P value is then compared to a critical
level that depends upon the significance level of the test (set in the test
options), the rank of the P value, and the total number of comparisons made.
A P value less than the critical level indicates there is a significant difference
between the corresponding two groups.
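A sketch of the same step-down idea outside SigmaStat, assuming statsmodels; the raw pairwise P values are hypothetical.

    from statsmodels.stats.multitest import multipletests

    pairwise_p = [0.004, 0.031, 0.020, 0.250]      # hypothetical pairwise P values
    reject, adjusted_p, _, _ = multipletests(pairwise_p, alpha=0.05,
                                             method="holm-sidak")
    for p, adj, sig in zip(pairwise_p, adjusted_p, reject):
        print(f"raw P = {p:.3f}  adjusted P = {adj:.3f}  significant: {sig}")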

Tukey Test The Tukey Test and the Student-Newman-Keuls test are conducted similarly to the Bonferroni t-test, except that they use a table of critical values that is computed based on a better mathematical model of the probability structure of the multiple comparisons. The Tukey Test is more conservative than the Student-Newman-Keuls test, because it controls the errors of all comparisons simultaneously, while the Student-Newman-Keuls test controls errors among tests of k means. Because it is more conservative, it is less likely to determine that a given difference is statistically significant, and it is the recommended test for all pairwise comparisons.


While Multiple Comparisons vs. a Control is an available comparison type for the Tukey Test, it is not recommended. Use Dunnett’s Test for multiple comparisons vs. a control.
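For comparison outside SigmaStat, a Tukey all-pairwise comparison can be sketched with statsmodels on indexed-style data; the group labels and values are hypothetical.

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    values = np.array([4.2, 3.9, 5.1, 4.7, 5.6, 6.0, 5.8, 6.3, 5.0, 4.8, 5.3, 5.1])
    groups = np.array(["A"] * 4 + ["B"] * 4 + ["C"] * 4)   # factor column

    result = pairwise_tukeyhsd(values, groups, alpha=0.05)
    print(result.summary())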

Student-Newman-Keuls (SNK) Test The Student-Newman-Keuls Test and the Tukey Test are conducted similarly to the Bonferroni t-test, except that they use a table of critical values that is computed based on a better mathematical model of the probability structure of the multiple comparisons. The Student-Newman-Keuls Test is less conservative than the Tukey Test because it controls errors among tests of k means, while the Tukey Test controls the errors of all comparisons simultaneously. Because it is less conservative, it is more likely to determine that a given difference is statistically significant. The Student-Newman-Keuls Test is usually more sensitive than the Bonferroni t-test, and is only available for all pairwise comparisons.

Bonferroni t-test The Bonferroni t-test performs pairwise comparisons with paired t-tests. The
P values are then multiplied by the number of comparisons that were made.
It can perform both all pairwise comparisons and multiple comparisons vs. a
control, and is the most conservative test for both comparison types. For
less conservative all pairwise comparison tests, see the Tukey and the Student-
Newman-Keuls tests; for a less conservative multiple comparison vs. a
control test, see Dunnett’s Test.
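
As an illustration of the adjustment described above (not SigmaStat's code), the raw P value of each comparison is simply multiplied by the number of comparisons and capped at 1:

# Sketch of the Bonferroni adjustment: each comparison's raw P value is
# multiplied by the number of comparisons (and capped at 1), which is
# what makes the test so conservative.
def bonferroni_adjust(p_values):
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]

print(bonferroni_adjust([0.004, 0.030, 0.210]))   # -> [0.012, 0.09, 0.63]
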

Fisher’s Least The Fisher’s LSD Test is the least conservative all pairwise comparison test.
Significance Unlike the Tukey and the Student-Newman-Keuls tests, it makes no effort to
Difference Test control the error rate. Because it makes no attempt to control the error
rate when detecting differences between groups, it is not recommended.

Dunnett’s Test Dunnett's test is the analog of the Student-Newman-Keuls Test for the case of
multiple comparisons against a single control group. It is conducted similarly
to the Bonferroni t-test, but with a more sophisticated mathematical model
of the way the error accumulates in order to derive the associated table of
critical values for hypothesis testing. This test is less conservative than the
Bonferroni Test, and is only available for multiple comparisons vs. a control.

Duncan’s The Duncan’s Test is conducted the same way as the Tukey and the Student-
Multiple Range Newman-Keuls tests, except that it is less conservative in determining whether the
difference between groups is significant by allowing a wider range for error
rates. Although it has greater power to detect differences than the Tukey
and the Student-Newman-Keuls tests, it has less control over the Type I error
rate, and is, therefore, not recommended.


Performing a The multiple comparison test you choose depends on the treatments you are
Multiple Comparison testing. Click Cancel if you do not want to perform a multiple comparison
test.

To perform a multiple comparison test:

8 The All Levels option under the Select Factors to Compare heading
determines whether or not multiple comparisons are performed. This
option is automatically selected if the P value produced by the ANOVA
(displayed in the upper left corner of the dialog box) is less than or
equal to the P value set in the Options dialog box, and multiple
comparisons are performed. If the P value displayed in the dialog box is
greater than the P value set in the Options dialog box, the All Factors
option is not selected and multiple comparisons are not performed.

You can disable multiple comparison testing for the groups by clicking
the selected All Factors check box.

9 Select the desired multiple comparison test from the Suggested Test
drop-down list. The Tukey and Student-Newman-Keuls tests are
recommended for determining the difference among all treatments. If
you have only a few treatments, you may want to select the simpler
Bonferroni t-test.

FIGURE 8–26
The Multiple Comparison
Options Dialog Box

The Dunnett's test is recommended for determining the differences


between the experimental treatments and a control group. If you have
only a few treatments or observations, you can select the simpler
Bonferroni t-test.


! Note that in both cases the Bonferroni t-test is most sensitive with a small
number of groups. Dunnett’s test is not available if you have fewer than
six observations.

For more information on each of the multiple comparison tests, see


pages 8-240 through 8-241.

10 Select a Comparison Type. The types of comparisons available depend


on the selected test. All Pairwise compares all possible pairs of
treatments and is available for the Tukey, Student-Newman-Keuls,
Bonferroni, Fisher LSD, and Duncan’s tests.

Versus Control compares all experimental treatments to a single control


group and is available for the Tukey, Bonferroni, Fisher LSD,
Dunnett’s, and Duncan’s tests. It is not recommended for the Tukey,
Fisher LSD, or Duncan’s test. If you select Versus Control, you must
also select the control group from the list of groups (see step 5).

For more information on multiple comparison tests and the available
comparison types, see pages 8-236 through 8-241.

11 If you selected an all pairwise comparison test, click Finish to


continue with the One Way ANOVA and view the report (see
Figure 8–28 on page 245). For information on editing reports, see
Editing Reports on page 137.

12 If you selected a multiple comparisons versus a control test, click


Next. The Multiple Comparisons Options dialog box prompts you to
select a control group. Select the desired control group from the list,
then click Finish to continue with the One Way ANOVA and view the
report (see Figure 8–28 on page 245). For information on editing
reports, see Editing Reports on page 137.

Interpreting One Way ANOVA Results

The One Way ANOVA report displays an ANOVA table describing the
source of the variation in the groups. This table displays the sum of squares,
degrees of freedom, and mean squares of the groups, as well as the F statistic
and the corresponding P value. The statistical summary table of the data and
other results displayed in the report are enabled and disabled in the Options
for One Way ANOVA dialog box (see Setting One Way ANOVA Options on
page 232).


FIGURE 8–27
The Multiple Comparison
Options Dialog Box
Prompting You to
Select a Control Group

You can also generate tables of multiple comparisons. Multiple Comparison


results are also specified in the Options for One Way ANOVA dialog box (see
page 236). The test used to perform the multiple comparison is selected in
the Multiple Comparison Options dialog box (see page 239).

For descriptions of the derivations for One Way ANOVA results, you can
reference any appropriate statistics reference. For a list of suggested references,
see References on page 12.

! The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the page up and
page down buttons in the formatting toolbar.

Result Explanations In addition to the numerical results, expanded explanations of the results may
also appear. To turn off this explanatory text, choose the Statistics menu
Report Options... command and click the selected Explain Test Results check
box.

The number of decimal places displayed is also set in the Report Options
dialog box. For information on editing reports, see Chapter 6, Working with
Reports.

Normality Test Normality test results display whether the data passed or failed the test of the
assumption that they were drawn from a normal population and the P value
calculated by the test. Normally distributed source populations are required
for all parametric tests. This result is set in the Options for One Way
ANOVA dialog box (see page 233).


Equal Variance Test Equal Variance test results display whether the data passed or failed the
test of the assumption that the samples were drawn from populations with
the same variance, and the P value calculated by the test. Equal variance of
the source populations is assumed for all parametric tests.

FIGURE 8–28
The One Way ANOVA
Results Report

Summary Table If you enabled this option in the Options for One Way ANOVA dialog box,
SigmaStat generates a summary table listing the sample sizes N, number of
missing values, mean, standard deviation, differences of the means and
standard deviations, and standard error of the means.

N (Size) The number of non-missing observations for that column or


group.

Missing The number of missing values for that column or group.

Mean The average value for the column. If the observations are normally
distributed the mean is the center of the distribution.

Standard Deviation A measure of variability. If the observations are


normally distributed, about two-thirds will fall within one standard deviation
above or below the mean, and about 95% of the observations will fall within
two standard deviations above or below the mean.


Standard Error of the Mean A measure of how closely the mean computed
from the sample approximates the true population mean.
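
The quantities in the summary table can be reproduced for a single worksheet column with a short NumPy sketch; the column values below are hypothetical, and NaN stands in for a missing value.

# Sketch of the summary-table quantities listed above for one worksheet
# column that may contain missing values (NaN).
import numpy as np

col = np.array([3.8, 3.2, np.nan, 3.5])
values = col[~np.isnan(col)]
n, missing = len(values), int(np.isnan(col).sum())
mean = values.mean()
sd = values.std(ddof=1)                      # standard deviation
sem = sd / np.sqrt(n)                        # standard error of the mean
print(n, missing, round(mean, 3), round(sd, 3), round(sem, 3))
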

Confidence Interval If the confidence interval does not include zero, you can conclude that there
for the Difference is a significant difference between the means with the level of
of the Means confidence specified. This can also be described as P < α (alpha), where α is
the acceptable probability of incorrectly concluding that there is a difference.

The level of confidence is adjusted in the options dialog box; this is typically
100(1 - α)%, or 95%. Larger values of confidence result in wider intervals and
smaller values result in narrower intervals. For a further explanation of α, see the
following section, Power.
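
The following sketch illustrates the idea of a 100(1 - α)% confidence interval for the difference of two group means using a pooled two-sample t interval. It is an illustration only, not SigmaStat's exact computation, and the data values are hypothetical.

# Illustrative sketch only: a pooled two-sample t confidence interval for
# the difference of two group means. If the interval excludes zero, the
# difference is significant at the chosen confidence level.
import numpy as np
from scipy import stats

def diff_of_means_ci(x, y, alpha=0.05):
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / df  # pooled variance
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    t_crit = stats.t.ppf(1 - alpha / 2, df)          # 100(1 - alpha)% interval
    d = x.mean() - y.mean()
    return d - t_crit * se, d + t_crit * se

print(diff_of_means_ci([3.8, 3.2, 3.5], [5.1, 4.9, 5.5]))
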

Power The power of the performed test is displayed unless you disable this option in
the Options for One Way ANOVA dialog box.

The power, or sensitivity, of a One Way ANOVA is the probability that the
test will detect a difference among the groups if there really is a difference.
The closer the power is to 1, the more sensitive the test.

ANOVA power is affected by the sample sizes, the number of groups being
compared, the chance of erroneously reporting a difference α (alpha), the
observed differences of the group means, and the observed standard
deviations of the samples.

Alpha (α) Alpha (α) is the acceptable probability of incorrectly concluding
that there is a difference. An α error is also called a Type I error. A Type I
error is when you reject the hypothesis of no effect when this hypothesis is
true.

The α value is set in the Options for One Way ANOVA dialog box; the
suggested value is α = 0.05, which indicates that a one in twenty chance of
error is acceptable. Smaller values of α result in stricter requirements before
concluding there is a significant difference, but a greater possibility of
concluding there is no difference when one exists (a Type II error). Larger
values of α make it easier to conclude that there is a difference but also
increase the risk of seeing a false difference (a Type I error).


ANOVA Table The ANOVA table lists the results of the one way ANOVA.

DF (Degrees of Freedom) Degrees of freedom represent the number of
groups and the sample size, which affect the sensitivity of the ANOVA,
including:

➤ The degrees of freedom between groups is a measure of the number of


groups.
➤ The degrees of freedom within groups (sometimes called the error or
residual degrees of freedom) is a measure of the total sample size,
adjusted for the number of groups.
➤ The total degrees of freedom is a measure of the total sample size.

SS (Sum of Squares) The sum of squares is a measure of variability
associated with each element in the ANOVA data table, including:

➤ The sum of squares between the groups measures the variability of the
average differences of the sample groups.
➤ The sum of squares within the groups (also called error or residual sum of
squares) measures the underlying variability of all individual samples.
➤ The total sum of squares measures the total variability of the observations
about the grand mean (mean of all observations).

MS (Mean Squares) The mean squares provide two estimates of the


population variances. Comparing these variance estimates is the basis of
analysis of variance.

The mean square between groups is:

MS between = (sum of squares between groups) / (degrees of freedom between groups) = SS between / DF between

The mean square within groups (also called the residual or error mean square)
is:

MS within = (sum of squares within groups) / (degrees of freedom within groups) = SS within / DF within


F Statistic The F test statistic is the ratio:

F = (estimated population variance between groups) / (estimated population variance within groups) = MS between / MS within

If the F ratio is around 1, you can conclude that there are no significant
differences between groups (i.e., the data groups are consistent with the null
hypothesis that all the samples were drawn from the same population).

If F is a large number, you can conclude that at least one of the samples was
drawn from a different population (i.e., the variability is larger than what is
expected from random variability in the population). To determine exactly
which groups are different, examine the multiple comparison results.
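
The ANOVA table quantities defined above can be computed directly with NumPy and SciPy. The sketch below is illustrative (it is not SigmaStat's code) and uses made-up groups.

# A worked sketch of the one way ANOVA table quantities: SS, DF, MS, F,
# and the P value from the F distribution.
import numpy as np
from scipy import stats

def one_way_anova_table(groups):
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = np.mean(np.concatenate([np.asarray(g, float) for g in groups]))
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g, float) - np.mean(g)) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between       # MS between = SS between / DF between
    ms_within = ss_within / df_within          # MS within  = SS within  / DF within
    f = ms_between / ms_within                 # F = MS between / MS within
    p = stats.f.sf(f, df_between, df_within)   # P value for the observed F
    return ss_between, ss_within, f, p

print(one_way_anova_table([[3.8, 3.2, 3.5], [1.5, 1.8, 2.2], [5.1, 4.9, 5.5]]))
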

P Value The P value is the probability of being wrong in concluding that


there is a true difference between the groups (i.e., the probability of falsely
rejecting the null hypothesis, or committing a Type I error, based on F). The
smaller the P value, the greater the probability that the samples are drawn
from different populations. Traditionally, you can conclude that there are
significant differences when P < 0.05.

Multiple If you selected to perform multiple comparisons (see page 236), a table of the
Comparisons comparisons between group pairs is displayed. The multiple comparison
procedure is activated in the Options for One Way ANOVA dialog box (see
page 236). The test used in the multiple comparison procedure is selected in
the Multiple Comparison Options dialog box (see page 239).

Multiple comparison results are used to determine exactly which treatments


are different, since the ANOVA results only inform you that two or more of
the groups are different. The specific type of multiple comparison results
depends on the comparison test used and whether the comparison was made
pairwise or versus a control.

➤ All pairwise comparison results list comparisons of all possible


combinations of group pairs; the all pairwise tests are the Holm-Sidak,
Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s test and the
Bonferroni t-test.
➤ Comparisons versus a single control group list only comparisons with the
selected control group. The control group is selected during the actual
multiple comparison procedure. The comparison versus a control tests
are the Holm-Sidak, Dunnett’s, Fisher’s LSD, and Duncan’s tests, and the
Bonferroni t-test.

For descriptions of the derivations of parametric multiple comparison


procedure results, you can reference any appropriate statistics reference. For a
list of suggested references, see page 12.

Bonferroni t-test Results The Bonferroni t-test lists the differences of the
means for each pair of groups, computes the t values for each pair, and
displays whether or not P < 0.05 for that comparison. The Bonferroni t-test
can be used to compare all groups or to compare versus a control.

You can conclude from “large” values of t that the difference of the two
groups being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of
erroneously concluding that there is a significant difference is less than 5%. If
it is greater than 0.05, you cannot confidently conclude that there is a
difference.

The difference of the means is a gauge of the size of the difference between
the two groups.

Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Dunnett's


Test Results The Tukey, Student-Newman-Keuls (SNK), Fisher LSD, and
Duncan’s tests are all pairwise comparisons of every combination of group
pairs. While the Tukey, Fisher LSD, and Duncan’s tests can be used to compare a
control group to other groups, they are not recommended for this type of
comparison.

Dunnett's test only compares a control group to all other groups. All tests
compute the q test statistic and display whether or not P < 0.05 for that pair
comparison.

You can conclude from “large” values of q that the difference of the two
groups being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of being
incorrect in concluding that there is a significant difference is less than 5%. If
it is greater than 0.05, you cannot confidently conclude that there is a
difference.


The Difference of the Means is a gauge of the size of the difference between
the two groups.

p is a parameter used when computing q. The larger the p, the larger q needs
to be to indicate a significant difference. p is an indication of the differences
in the ranks of the group means being compared. Group means are ranked
in order from largest to smallest, and p is the number of means spanned in
the comparison. For example, if you are comparing four means, when
comparing the largest to the smallest p = 4, and when comparing the second
smallest to the smallest p = 2.

If a group is found to be not significantly different than another group, all


groups with p ranks in between the p ranks of the two groups that are not
different are also assumed not to be significantly different, and a result of
DNT (Do Not Test) appears for those comparisons.
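
A small sketch of how the p parameter described above can be obtained: rank the group means from largest to smallest and count how many means the comparison spans (both endpoints included). The means below are hypothetical.

# Sketch of the p parameter used with the q statistic.
def p_spanned(means, i, j):
    order = sorted(range(len(means)), key=lambda g: means[g], reverse=True)
    ranks = {g: r + 1 for r, g in enumerate(order)}          # 1 = largest mean
    return abs(ranks[i] - ranks[j]) + 1

means = [4.1, 2.0, 6.3, 3.2]        # four group means
print(p_spanned(means, 2, 1))       # largest vs. smallest -> p = 4
print(p_spanned(means, 3, 1))       # second smallest vs. smallest -> p = 2
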

One Way ANOVA Report Graphs

You can generate up to five graphs using the results from a One Way
ANOVA. They include a:

➤ Bar chart of the column means.


➤ Scatter plot with error bars of the column means.
➤ Histogram of the residuals.
➤ Normal probability plot of the residuals.
➤ Multiple comparison graphs.

Bar Chart The One Way ANOVA bar chart plots the group means as vertical bars with
error bars indicating the standard deviation. If the graph data is indexed, the
levels in the factor column are used as the tick marks for the bar chart bars,
and the column titles are used as the X and Y axis titles. If the graph data is in
raw or statistical format, the column titles are used as the tick marks for the
bar chart bars and default X Data and Y Data axis titles are assigned to the
graph. For an example of a bar chart, see Bar Charts of the Column Means on
page 149.

Scatter Plot The One Way ANOVA scatter plot graphs the group means as single points
with error bars indicating the standard deviation. If the graph data is indexed,
the levels in the factor column are used as the tick marks for the scatter plot
points, and the column titles are used as the X and Y axis titles. If the graph
data is in raw or statistical format, the column titles are used as the tick marks


for the scatter plot points and default X Data and Y Data axis titles are
assigned to the graph. For an example of a scatter plot, see page 150.

Histogram of The One Way ANOVA histogram plots the raw residuals in a specified range,
Residuals using a defined interval set. The residuals are divided into a number of evenly
incremented histogram intervals and plotted as histogram bars indicating the
number of residuals in each interval. The X axis represents the histogram
intervals, and the Y axis represents the number of residuals in each group. For
an example of a histogram, see page 153.

Probability Plot The One Way ANOVA probability plot graphs the frequency of the raw
residuals. The residuals are sorted and then plotted as points around a curve
representing the area of the Gaussian plotted on a probability axis. Plots with
residuals that fall along the Gaussian curve indicate that your data was taken from
a normally distributed population. The X axis is a linear scale representing
the residual values. The Y axis is a probability scale representing the
cumulative frequency of the residuals. For an example of a probability plot, see
page 155.

Multiple The One Way ANOVA multiple comparison graphs plot significant
Comparison Graphs differences between levels of a significant factor. There is one graph for every
significant factor reported by the specified multiple comparison test. If there
is one significant factor reported, one graph appears; if there are two
significant factors, two graphs appear, and so on. If a factor is not reported as
significant, a graph for the factor does not appear. For an example of a
multiple comparison graph, see page 160.

Creating a To generate a One Way ANOVA report graph:


Report Graph
1 Click the toolbar button, or choose the Graph menu Create Graph
command when the One Way ANOVA report is selected. The Create
Graph dialog box appears displaying the types of graphs available for the
One Way ANOVA report.

FIGURE 8–29
The Create Graph Dialog
Box for a One Way ANOVA
Report

2 Select the type of graph you want to create from the Graph Type list,
then click OK, or double-click the desired graph in the list. For more
information on each of the graph types, see Chapter 8. The specified
graph appears in a graph window or in the report.

FIGURE 8–30
A Normal Probability Plot
for a One Way ANOVA

For information on manipulating graphs, see Chapter 8, CREATING AND


MODIFYING GRAPHS.



CHAPTER 9: Two Way Analysis of Variance (ANOVA)

A Two Way or Two Factor ANOVA (analysis of variance) should be used


when:

➤ You want to see if two or more different experimental groups are
affected by two different factors which may or may not interact.
➤ Samples are drawn from normally distributed populations with
equal variances.

If you want to consider the effects of only one factor on your


experimental groups, use the One Way ANOVA. If you are considering
the effects of three factors on your experimental groups, use the Three
Way ANOVA. SigmaStat has no equivalent nonparametric two or three
factor comparison for samples drawn from a non-normal population. If
your data is non-normal, you can transform the data to make them
comply better with the assumptions of analysis of variance using
Transform menu commands. If the sample size is large, and you want to
do a nonparametric test, use the Transforms menu Rank command to
convert the observations to ranks, then run a Two or Three Way
ANOVA on the ranks.

For more information on transforming data, see Chapter 14, USING


TRANSFORMS.
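
As an illustration of the rank-transform idea (a stand-in for the Transforms menu Rank command, not the command itself), the observations can be replaced by their ranks and the ANOVA then run on the ranked values:

# Sketch of the rank transform: replace the observations by their ranks,
# then run the Two or Three Way ANOVA on the ranked values.
from scipy.stats import rankdata

observations = [3.8, 3.2, 3.5, 1.5, 1.8, 2.2, 5.1, 4.9, 5.5]
ranks = rankdata(observations)     # 1 = smallest value; ties get average ranks
print(ranks)
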

About the In a two way or two factor analysis of variance, there are two
Two Way ANOVA experimental factors which are varied for each experimental group. A
two factor design is used to test for differences between samples grouped
according to the levels of each factor and for interactions between the
factors.

A two factor analysis of variance tests three hypotheses: (1) There is no


difference among the levels of the first factor; (2) There is no difference
among the levels of the second factor; and (3) There is no interaction
between the factors, i.e., if there is any difference among groups within
one factor, the differences are the same regardless of the second factor
level.

Two Way ANOVA is a parametric test that assumes that all the samples
were drawn from normally distributed populations with the same
variances.



Performing a Two Way To perform a Two Way ANOVA:
ANOVA
1 Enter or arrange your data appropriately in the worksheet
(see following section).

2 If desired, set the Two Way ANOVA options using the Options for
Two Way ANOVA dialog box (page 258).

3 Select Two Way ANOVA from the toolbar drop-down list, or


choose the Compare Many Groups, Two Way ANOVA...
command from the Statistics menu.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 99).

5 Specify the multiple comparisons you want to perform on your


test (page 277).

6 View and interpret the Two Way ANOVA report and generate the
report graph (pages 9-271 and 9-279).

Arranging Two Way ANOVA Data

The Two Way ANOVA tests for differences between samples grouped
according to the levels of each factor and the interactions between the
factors.

For example, in an analysis of the effect of gender on the action of two


different drugs, gender and drug are the factors, male and female are the
levels of the gender factor, drug types are the levels for the drug factor,
and the different combinations of the levels (gender and drug) are the
groups, or cells.
TABLE 9-1
Data for a Two Way ANOVA
The factors are gender and drug, and the levels are Male/Female and Drug A/Drug B.

                     Drug
Gender      Drug A      Drug B
Male        3.8         1.5
            3.2         1.8
            3.5         2.2
Female      5.1         5.9
            4.9         6.1
            5.5         6.6

If your data is missing data points or even whole cells, SigmaStat detects
this and provides the correct solutions; see the following section, Missing
Data and Empty Cells.

For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for Contingency
Tables on page 69.

Missing Data and Empty Ideally, the data for a Two Way ANOVA should be completely balanced,
Cells i.e., each group or cell in the experiment has the same number of
observations and there are no missing data. However, SigmaStat
properly handles all occurrences of missing and unbalanced data
automatically.

Missing Data Points If there are missing values, SigmaStat


automatically handles the missing data by using a general linear model
approach. This approach constructs hypothesis tests using the marginal
sums of squares (also commonly called the Type III or adjusted sums of
squares).

TABLE 3-2
Data for a Two Way ANOVA with a Missing Value in the Male/Drug A Cell
A general linear model approach is used in these situations.

                     Drug
Gender      Drug A      Drug B
Male        3.8         1.5
            --          1.8
            3.5         2.2
Female      5.1         5.9
            4.9         6.1
            5.5         6.6

Empty Cells When there is an empty cell, i.e., there are no observations
for a combination of two factor levels, SigmaStat stops and suggests
either analysis of the data using a two way design with the added



assumption of no interaction between the factors, or a One Way
ANOVA.
TABLE 3-3
Data for a Two Way ANOVA with a Missing Cell (Male/Drug A)
You can use either one factor analysis or assume no interaction between factors.

                     Drug
Gender      Drug A      Drug B
Male        --          1.5
            --          1.8
            --          2.2
Female      5.1         5.9
            4.9         6.1
            5.5         6.6

Assumption of no interaction analyzes the main effects of each treatment


separately.

! Note that it can be dangerous to assume there is no interaction between the


two factors in a Two Way ANOVA. Under some circumstances, this
assumption can lead to a meaningless analysis, particularly if you are
interested in studying the interaction effect.

If you treat the problem as a One Way ANOVA, each cell in the table is
treated as a different level of a single experimental factor. This approach
is the most conservative analysis because it requires no additional
assumptions about the nature of the data or experimental design.

Connected versus Disconnected Data The no interaction assumption


does not always permit a two factor analysis when there is more than one
empty cell. The non-empty cells must be geometrically connected in
order to do the computation. You cannot perform Two Way ANOVAs
on disconnected data.

Data arranged in a two-dimensional grid, where you can draw a series of


straight vertical and horizontal lines connecting all occupied cells,
without changing direction in an empty cell, is guaranteed to be
connected.

It is important to note that failure to meet the above requirement does


not imply that the data is disconnected. The data in Table 3-5, for
example, is connected.



TABLE 3-4
Example of Drawing Straight Horizontal and Vertical Lines through Connected Data

1.2   4.2   .05   1.4   .54
2.6   3.3   3.1
2.0

TABLE 3-5
Example of Connected Data that You Can’t Draw a Series of Straight Vertical and Horizontal Lines Through

1.2   8.3
2.4   6.2
5.8   1.0
4.8   .98

SigmaStat automatically checks for this condition. If disconnected data


is encountered during a Two Way ANOVA, SigmaStat suggests
treatment of the problem as a One Way ANOVA.

For descriptions of the concept of data connectivity, you can reference


any appropriate statistics reference. For a list of suggested references, see
page 12.
TABLE 3-6
Disconnected Data
Because these data are not geometrically connected (they share no factor levels in common) a two way ANOVA cannot be performed, even assuming no interaction.

                     Drug
Gender      Drug A      Drug B
Male        --          1.5
            --          1.8
            --          2.2
Female      5.1         --
            4.9         --
            5.5         --
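
One common way to formalize the geometric-connectivity check described above is to treat the levels of each factor as nodes of a graph, add an edge for every non-empty cell, and test whether that graph is connected. The sketch below illustrates that idea; it is an assumption about how the check can be implemented, not SigmaStat's algorithm.

# Sketch of a connectivity check for a two factor design.
def is_connected(cells):
    """cells: set of (row_level, col_level) pairs for the non-empty cells."""
    if not cells:
        return True
    nodes = {("r", r) for r, _ in cells} | {("c", c) for _, c in cells}
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for r, c in cells:                      # follow every cell that shares this level
            if node in (("r", r), ("c", c)):
                stack.extend([("r", r), ("c", c)])
    return seen == nodes

# Table 3-6 style data: Males only on Drug B, Females only on Drug A -> disconnected
print(is_connected({("Male", "Drug B"), ("Female", "Drug A")}))   # False
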

Entering A Two Way ANOVA can only be performed on two factor indexed data.
Worksheet Data Two factor indexed data is placed in three columns; a data point indexed

two ways consists of the first factor in one column, the second factor in a
second column, and the data point in a third column.

FIGURE 9–31
Valid Data Formats
for a Two Way ANOVA
Column 1 is the first factor
index, column 2 is the
second factor index, and
column 3 is the data. The
data for this worksheet is
taken from Table 3-2.
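
For illustration, the two factor indexed (three-column) arrangement described above can also be built programmatically; the sketch below uses pandas and the gender/drug values from Table 9-1.

# Three-column, two factor indexed layout: first factor, second factor, data.
import pandas as pd

data = pd.DataFrame({
    "Gender": ["Male"] * 6 + ["Female"] * 6,                                      # first factor index
    "Drug":   ["Drug A", "Drug A", "Drug A", "Drug B", "Drug B", "Drug B"] * 2,   # second factor index
    "Value":  [3.8, 3.2, 3.5, 1.5, 1.8, 2.2,                                      # observed data points
               5.1, 4.9, 5.5, 5.9, 6.1, 6.6],
})
print(data)
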

Selecting When running a Two Way ANOVA you can either:


Data Columns
➤ Select the columns to test by dragging your mouse over the columns
before choosing the test.
➤ Select the columns while running the test.

Setting Two Way ANOVA Options

Use the Two Way ANOVA options to:

➤ Adjust the parameters of the test to relax or restrict the testing of


your data for normality and equal variance.
➤ Display the statistics summary table and confidence interval for the
data.
➤ Compute the power, or sensitivity, of the test.
➤ Enable multiple comparison testing.

To change Two Way ANOVA options:

1 If you are going to run the test after changing test options and
want to select your data before you run the test, drag the pointer
over the data.

2 To open the Options for Two Way ANOVA dialog box, select Two
Way ANOVA from the toolbar drop-down list, then click the
button, or choose the Statistics menu Current Test Options...

command. The Normality and Equal Variance options appear (see
Figure 9–32 on page 260).

3 Click the Results tab to view the Summary Table, Confidence


Intervals, and Residuals in Column options (see Figure 9–33 on
page 262). Click the Post Hoc Test tab to view the Power and
Multiple Comparisons options (see Figure 9–34 on page 262).
Click the Assumption Checking tab to return to the Normality
and Equal Variance options.

4 Click a check box to enable or disable a test option. Options


settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 9-259 through
9-263.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see Picking Data to Test on page 99 for more
information).

6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

! You can click Help at any time to access SigmaStat’s on-line help
system.

Normality and Select the Assumption Checking tab from the options dialog box to view
Equal Variance the Normality and Equal Variance options. The normality assumption
Assumptions test checks for a normally distributed population. The equal variance
assumption test checks the variability about the group means.



Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test
for a normally distributed population.

FIGURE 9–32
The Options for Two
Way ANOVA Dialog Box
Displaying the Assumption
Checking Options

Equal Variance Testing SigmaStat tests for equal variance by checking


the variability about the group means.

P Values for Normality and Equal Variance Enter the corresponding


P value in the P Value to Reject box. The P value determines the
probability of being incorrect in concluding that the data is not
normally distributed (the P value is the risk of falsely rejecting the null
hypothesis that the data is normally distributed). If the P value
computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or equal variance,


increase the P value.

Because the parametric statistical methods are relatively robust in terms


of detecting violations of the assumptions, the suggested value in
SigmaStat is 0.050. Larger values of P (for example, 0.100) require less
evidence to conclude that data is not normal.

To relax the requirement of normality and/or equal variance,


decrease P.

Requiring smaller values of P to reject the normality assumption means


that you are willing to accept greater deviations from the theoretical
normal distribution before you flag the data as non-normal. For
example, a P value of 0.050 requires greater deviations from normality to
flag the data as non-normal than a value of 0.100.



! There are extreme conditions of data distribution that these tests cannot take
into account. For example, the Levene Median test fails to detect differences
in variance of several orders of magnitude. However, these conditions
should be easily detected by simply examining the data without resorting to
the automatic assumption tests.
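
For readers who want to reproduce similar assumption checks outside SigmaStat, the sketch below uses SciPy stand-ins: a Kolmogorov-Smirnov test of the residuals against a fitted normal, and the median-centered Levene test for equal variance. These are illustrative substitutes, not SigmaStat's exact procedures, and the group data are hypothetical.

# Illustrative assumption checks: the test passes when the computed P value
# is greater than the P value set in the options dialog box.
import numpy as np
from scipy import stats

groups = [[3.8, 3.2, 3.5], [1.5, 1.8, 2.2], [5.1, 4.9, 5.5], [5.9, 6.1, 6.6]]
residuals = np.concatenate([np.asarray(g) - np.mean(g) for g in groups])

# Normality: KS test of the residuals against a normal with the sample mean/SD
ks_stat, p_norm = stats.kstest(residuals, "norm",
                               args=(residuals.mean(), residuals.std(ddof=1)))

# Equal variance: Levene test with median centering (variability about group medians)
lev_stat, p_var = stats.levene(*groups, center="median")

alpha = 0.05
print("normality      passed" if p_norm > alpha else "normality      failed", p_norm)
print("equal variance passed" if p_var > alpha else "equal variance failed", p_var)
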

Summary Table Select the Results tab in the options dialog box to view the Summary
Table option. The Summary Table option displays the number of
observations for a column or group, the number of missing values for a
column or group, the average value for the column or group, the
standard deviation of the column or group, and the standard error of the
mean for the column or group.

Confidence Intervals Select the Results tab in the options dialog box to view the Confidence
Intervals option. The Confidence Intervals option displays the
confidence interval for the difference of the means. To change the
interval, enter any number from 1 to 99 (95 and 99 are the most
commonly used intervals). Click the selected check box if you do not
want to include the confidence interval in the report.

Residuals Select the Results tab in the options dialog box to view the Residuals
option. Use the Residuals option to display residuals in the report and to
save the residuals of the test to the specified worksheet column. To
change the column the residuals are saved to, edit the number in or
select a number from the drop-down list.



FIGURE 9–33
The Options for Two Way
ANOVA Dialog Box
Displaying
the Summary Table,
Confidence Intervals,
and Residuals Options

Power Select the Post Hoc Tests tab in the options dialog box to view the Power
option. The power or sensitivity of a test is the probability that the test
will detect a difference between the groups if there is really a difference.

Change the alpha value by editing the number in the Alpha Value box.
Alpha (α) is the acceptable probability of incorrectly concluding that
there is a difference. The suggested value is α = 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing
to conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before concluding
there is a significant difference, but a greater possibility of concluding
there is no difference when one exists. Larger values of α make it easier
to conclude that there is a difference, but also increase the risk of
reporting a false positive.

FIGURE 9–34
The Options for Two
Way ANOVA Dialog Box
Displaying the
Power and Multiple
Comparisons Options



Multiple Select the Post Hoc Test tab in the Options dialog box to view the
Comparisons multiple comparisons options (see Figure 9–34 on page 262). Two Way
ANOVAs test the hypothesis of no differences between the several
treatment groups, but do not determine which groups are different, or
the sizes of these differences. Use multiple comparisons to isolate these
differences whenever a Two Way ANOVA detects a difference.

The P value used to determine if the ANOVA detects a difference is set


in the Report Options dialog box. If the P value produced by the Two
Way ANOVA is less than the P value specified in the box, a difference in
the groups is detected and the multiple comparisons are performed. For
more information on specifying a P value for the ANOVA, see Setting
Two Way ANOVA Options on page 258.

Performing Multiple Comparisons You can choose to always perform


multiple comparisons or to only perform multiple comparisons if the
Two Way ANOVA detects a difference.

Select the Always Perform option to perform multiple comparisons


whether or not the ANOVA detects a difference.

Select the Only When ANOVA P Value is Significant option to perform


multiple comparisons only if the ANOVA detects a difference.

Significant Multiple Comparison Value Select either 0.05 or 0.01


from the Significance Value for Multiple Comparisons drop-down list.
This value determines the likelihood that the multiple comparison is
incorrect in concluding that there is a significant difference in the
treatments.

A value of 0.05 indicates that the multiple comparisons will detect a


difference if there is less than 5% chance that the multiple comparison is
incorrect in detecting a difference.

! If multiple comparisons are triggered, the Multiple Comparison Options


dialog box appears after you pick your data from the worksheet and run the
test, prompting you to choose a multiple comparison method. See Multiple
Comparison Options on page 266 for more information.



Running a Two Way ANOVA

To run a Two Way ANOVA you need to select the data to test. The Pick
Columns dialog box is used to select the worksheet columns with the
data you want to test.

To run a Two Way ANOVA:

1 If you want to select your data before you run the test, drag the
pointer over your data.

2 Open the Pick Columns for Two Way ANOVA dialog box to start
the Two Way ANOVA. You can either:

➤ Select Two Way ANOVA from the toolbar drop-down list, then
click the button.
➤ Choose the Statistics menu Compare Many Groups, Two Way
ANOVA... command.
➤ Click the Run Test button from the Options for Two Way
ANOVA dialog box.

The Pick Columns dialog box appears. If you selected columns


before you chose the test, the selected columns appear in the
column list. If you have not selected columns, the dialog box
prompts you to pick your data.

3 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Data drop-down list.

The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns

appear in each row. You are prompted to pick a minimum of three
worksheet columns.

FIGURE 9–35
The Pick Columns
for Two Way ANOVA Dialog Box
Prompting You to
Select Data Columns

4 To change your selections, select the assignment in the list, then


select new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

5 Click Finish to perform the Two Way ANOVA. The Two Way
ANOVA report appears (see page 272) if you:

➤ Selected to test for normality and equal variance, and your data
passes both tests.
➤ Your data has no missing data points, cells, or is not otherwise
unbalanced.
➤ Selected not to perform multiple comparisons, or if you selected to
run multiple comparisons only when the P value is significant,
and the P value is not significant (see page 260).

To edit the report, use the Format menu commands; for


information on editing reports, see Editing Reports on page 137.

6 If you elected to test for normality and equal variance, and your
data fails either test, either continue or transform your data, then
perform the Two Way ANOVA on the transformed data. For
information on how to transform data, see Chapter 14, Using
Transforms.

7 If your data is missing data points, missing cells, or is otherwise


unbalanced, you are prompted to perform the appropriate
procedure.



➤ If you are missing data points, but still have at least one
observation in each cell, SigmaStat automatically proceeds with
the Two Way ANOVA using a general linear model.
➤ If you are missing a cell, but the data is connected, you can
proceed by either performing a two way analysis assuming no
interaction between the factors, or converting the problem into a
one way design with each non-empty cell a different level of a
single factor.
➤ If your data is not geometrically connected, you cannot perform a
Two Way ANOVA. Either treat the problem as a One Way
ANOVA, or cancel the test.

For more information on missing data point and cell handling, see
If There Were Missing Data Cells on page 272.

8 If the P value for multiple comparisons is significant, or you


selected to always perform multiple comparisons, the Multiple
Comparisons Options dialog box appears prompting you to select
a multiple comparison method. For more information on
selecting a multiple comparison method, see Multiple Comparison
Options below.

Multiple Comparison Options

If you selected to run multiple comparisons only when the P value is


significant, and the ANOVA produces a P value, for either of the two
factors or the interaction between the two factors, equal to or less than
the trigger P value, or you selected to always run multiple comparisons
in the Options for Two Way ANOVA dialog box (see page 260), the
Multiple Comparison Options dialog box appears prompting you to
specify a multiple comparison test.

This dialog box displays the P values for each of the two experimental
factors and of the interaction between the two factors. Only the options
with P values less than or equal to the value set in the Options dialog
box are selected. You can disable multiple comparison testing for a factor
by clicking the selected option. If no factor is selected, multiple
comparison results are not reported.

There are seven multiple comparison tests to choose from for the Two
Way ANOVA. You can choose to perform the:



➤ Holm-Sidak Test
➤ Tukey Test
➤ Student-Newman-Keuls Test
➤ Bonferroni t-test
➤ Fisher’s LSD
➤ Dunnett’s Test
➤ Duncan’s Multiple Range Test

There are two types of multiple comparisons available for the Two Way
ANOVA. The types of comparisons you can make depend on the
selected multiple comparison test.

➤ All pairwise comparisons test the difference between each


treatment or level within the two factors separately (i.e., among the
different rows and columns of the data table).
➤ Multiple comparisons versus a control test the difference between
all the different combinations of the factors (i.e., all the cells in the
data table).

When comparing the two factors separately, the levels within one factor
are compared among themselves without regard to the second factor,
and vice versa. These results should be used when the interaction is not
statistically significant.

When the interaction is statistically significant, interpreting multiple


comparisons among different levels of each experimental factor may not
be meaningful. SigmaStat also suggests performing a multiple
comparison between all the cells.

The result of all comparisons is a listing of the similar and different


group pairs, i.e., those groups that are and are not detectably different
from each other. Because no statistical test eliminates uncertainty,
multiple comparison procedures sometimes produce ambiguous
groupings.

Holm-Sidak Test The Holm-Sidak Test can be used for both pairwise comparisons and
comparisons versus a control group. It is more powerful than the Tukey
and Bonferroni tests and, consequently, it is able to detect differences
that these other tests do not. It is recommended as the first-line
procedure for pairwise comparison testing.



When performing the test, the P values of all comparisons are computed
and ordered from smallest to largest. Each P value is then compared to a
critical level that depends upon the significance level of the test (set in
the test options), the rank of the P value, and the total number of
comparisons made. A P value less than the critical level indicates there is
a significant difference between the corresponding two groups.

Tukey Test The Tukey Test and the Student-Newman-Keuls test are conducted
similarly to the Bonferroni t-test, except that they use a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The Tukey Test is
more conservative than the Student-Newman-Keuls test, because it
controls the errors of all comparisons simultaneously, while the Student-
Newman-Keuls test controls errors among tests of k means. Because it is
more conservative, it is less likely to determine that a given difference is
statistically significant, and it is the recommended test for all pairwise
comparisons.

While Multiple Comparisons vs. a Control is an available comparison


type for the Tukey Test, it is not recommended. Use Dunnett’s Test
for multiple comparisons vs. a control.

Student-Newman-Keuls The Student-Newman-Keuls Test and the Tukey Test are conducted
(SNK) Test similarly to the Bonferroni t-test, except that they use a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The Student-
Newman-Keuls Test is less conservative than the Tukey Test because it
controls errors among tests of k means, while the Tukey Test controls the
errors of all comparisons simultaneously. Because it is less conservative,
it is more likely to determine that a given difference is statistically
significant. The Student-Newman-Keuls Test is usually more sensitive
than the Bonferroni t-test, and is only available for all pairwise
comparisons.

Bonferroni t-test The Bonferroni t-test performs pairwise comparisons with paired t-tests.
The P values are then multiplied by the number of comparisons that
were made. It can perform both all pairwise comparisons and multiple
comparisons vs. a control, and is the most conservative test for both
comparison types. For less conservative all pairwise comparison tests, see
the Tukey and the Student-Newman-Keuls tests; for a less
conservative multiple comparison vs. a control test, see Dunnett’s
Test.



Fisher’s Least The Fisher’s LSD Test is the least conservative all pairwise comparison
Significance Difference test. Unlike the Tukey and the Student-Newman-Keuls tests, it makes no
Test effort to control the error rate. Because it makes no attempt to
control the error rate when detecting differences between groups, it
is not recommended.

Dunnett’s Test Dunnett's test is the analog of the Student-Newman-Keuls Test for the
case of multiple comparisons against a single control group. It is
conducted similarly to the Bonferroni t-test, but with a more
sophisticated mathematical model of the way the error accumulates in
order to derive the associated table of critical values for hypothesis
testing. This test is less conservative than the Bonferroni Test, and is
only available for multiple comparisons vs. a control.

Duncan’s The Duncan’s Test is conducted the same way as the Tukey and the
Multiple Range Student-Newman-Keuls tests, except that it is less conservative in
determining whether the difference between groups is significant by
allowing a wider range for error rates. Although it has greater power to
detect differences than the Tukey and the Student-Newman-Keuls tests,
it has less control over the Type I error rate, and is, therefore, not
recommended.

Performing a Multiple The multiple comparison you choose to perform depends on the
Comparison treatments you are testing. Click Cancel if you do not want to perform
a multiple comparison procedure.

1 Multiple comparisons are performed of the factors selected under


the Select Factors to Compare heading. The factors with P values
less than or equal to the value set in the Options dialog box are
automatically selected, and the P values for the selected factors,
and/or the interactions of the factors are displayed in the upper left
corner of the dialog box. If the P value is greater than the P value
set in the Options dialog box, the factor is not selected, the P value
for the factor is not displayed, and multiple comparisons are not
performed for the factor.

You can disable multiple comparison testing for a factor by


clicking the selected option.

2 Select the desired multiple comparison test from the Suggested


Test drop-down list. The Tukey and Student-Newman-Keuls tests
are recommended for determining the difference among all

treatments. If you have only a few treatments, you may want to
select the simpler Bonferroni t-test.

The Dunnett's test is recommended for determining the


differences between the experimental treatments and a control
group. If you have only a few treatments or observations, you can
select the simpler Bonferroni t-test.

! Note that in both cases the Bonferroni t-test is most sensitive with a
small number of groups. Dunnett’s test is not available if you have
fewer than six observations.

For more information on each of the multiple comparison tests,


see About Group Comparison Tests on page 203.

FIGURE 9–36
The Multiple Comparison
Options Dialog Box for
a Two Way ANOVA

3 Select a Comparison Type. The types of comparisons available


depend on the selected test. All Pairwise compares all possible
pairs of treatments and is available for the Tukey, Student-
Newman-Keuls, Bonferroni, Fisher LSD, and Duncan’s tests.

Versus Control compares all experimental treatments to a single


control group and is available for the Tukey, Bonferroni, Fisher
LSD, Dunnett’s, and Duncan’s tests. It is not recommended for the
Tukey, Fisher LSD, or Duncan’s test. If you select Versus Control,
you must also select the control group from the list of groups.

For more information on multiple comparison test and the


available comparison types, see page 203.

4 If you selected an all pairwise comparison test, click Finish to


continue with the Two Way ANOVA and view the report (see

Figure 9–38 on page 272). For information on editing reports, see
Editing Reports on page 137.

5 If you selected a multiple comparisons versus a control test,


click Next. The Multiple Comparisons Options dialog box
prompts you to select a control group for each factor. Select the
desired control groups from the lists, then click Finish to continue
with the Two Way ANOVA and view the report (see Figure 9–38
on page 272). For information on editing reports, see Editing
Reports on page 137.

FIGURE 9–37
The Multiple Comparison
Options Dialog Box
Prompting You to
Select Control Groups

Interpreting Two Way ANOVA Results

A full Two Way ANOVA report displays an ANOVA table describing


the variation associated with each factor and their interactions. This
table displays the degrees of freedom, sum of squares, and mean squares
for each of the elements in the data table, as well as the F statistics and
the corresponding P values.

Summary tables of least square means for each factor and for both
factors together can also be generated. This result and additional results
are enabled in the Options for Two Way ANOVA dialog box (see Setting
Two Way ANOVA Options on page 258). Click a selected check box to
enable or disable a test option. All options are saved between SigmaStat
sessions.

You can also generate tables of multiple comparisons. Multiple


Comparison results are also specified in the Options for Two Way
ANOVA dialog box. The tests used in the multiple comparisons are

selected in the Multiple Comparisons Options dialog box (see page
270).

For descriptions of the derivations for Two Way ANOVA results, you
can reference any appropriate statistics reference. For a list of suggested
references, see page 12.

! The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and click the selected Explain Test
Results check box.

The number of decimal places displayed is also set in the Report


Options dialog box. For more information on setting report options see
Setting Report Options on page 135.

FIGURE 9–38
Two Way ANOVA
Report

If There Were Missing If your data contained missing values but no empty cells, the report
Data Cells indicates the results were computed using a general linear model.



If your data contained empty cells, you either analyzed the problem
assuming no interaction or treated the problem as a One Way
ANOVA:

➤ If you chose no interactions, no statistics for factor interaction are


calculated.
➤ If you performed a One Way ANOVA, the results shown are
identical to One Way ANOVA results (see page 243).

Dependent Variable This is the data column title of the indexed worksheet data you are
analyzing with the Two Way ANOVA. Determining if the values in this
column are affected by the different factor levels is the objective of the
Two Way ANOVA.

Normality Test Normality test results display whether the data passed or failed the test of
the assumption that they were drawn from a normal population and the
P value calculated by the test. Normally distributed source populations
are required for all parametric tests.

This result appears if you enabled normality testing in the Two Way
ANOVA Options dialog box (see page 260).

Equal Variance Test Equal Variance test results display whether the data passed or
failed the test of the assumption that the samples were drawn from
populations with the same variance and the P value calculated by the
test. Equal variance of the source population is assumed for all
parametric tests.

This result appears if you enabled equal variance testing in the Two Way
ANOVA Options dialog box (see page 260).

ANOVA Table The ANOVA table lists the results of the Two Way ANOVA.

! When there are missing data, the best estimate of these values is
automatically calculated using a general linear model.

DF (Degrees of Freedom) Degrees of freedom represent the number of


groups in each factor and the sample size, which affect the sensitivity of
the ANOVA, including:

➤ The degrees of freedom for each factor is a measure of the number


of levels in each factor.



➤ The interaction degrees of freedom is a measure of the total number
of cells.
➤ The error degrees of freedom (sometimes called the residual or
within groups degrees of freedom) is a measure of the sample size
after accounting for the factors and interaction.
➤ The total degrees of freedom is a measure of the total sample size.

SS (Sum of Squares) The sum of squares is a measure of variability


associated with each element in the ANOVA data table.

➤ The factor sums of squares measure the variability between
the rows or columns of the table considered separately.
➤ The interaction sum of squares measures the variability of the
average differences between the cells in addition to the variation
between the rows and columns considered separately; this is a
gauge of the interaction between the factors.
➤ The error sum of squares (also called residual or within group sum
of squares) is a measure of the underlying random variation in the
data, i.e., the variability not associated with the factors or their
interaction.
➤ The total sum of squares is a measure of the total variability in the
data; if there are no missing data, the total sum of squares equals the
sum of the other table sums of squares.

MS (Mean Squares) The mean squares provide different estimates of


the population variances. Comparing these variance estimates is the
basis of analysis of variance.

The mean square for each factor

    (sum of squares for the factor) / (degrees of freedom for the factor) = SS_factor / DF_factor = MS_factor

is an estimate of the variance of the underlying population computed from the variability between levels of the factor.

The interaction mean square

    (sum of squares for the interaction) / (degrees of freedom for the interaction) = SS_inter / DF_inter = MS_inter

is an estimate of the variance of the underlying population computed from the variability associated with the interactions of the factors.

The error mean square (residual, or within groups)

    (error sum of squares) / (error degrees of freedom) = SS_error / DF_error = MS_error

is an estimate of the variability in the underlying population, computed from the random component of the observations.

F Statistic The F test statistic is provided for comparisons within each


factor and between the factors.

The F ratio to test each factor is

    (mean square for the factor) / (mean square of the error) = MS_factor / MS_error = F_factor

The F ratio to test the interaction is

    (mean square for the interaction) / (mean square of the error) = MS_inter / MS_error = F_inter

If the F ratio is around 1, you can conclude that there are no significant
differences between factor levels or that there is no interaction between
factors (i.e., the data groups are consistent with the null hypothesis that
all the samples were drawn from the same population).

If F is a large number, you can conclude that at least one of the samples
for that factor or combination of factors was drawn from a different
population (i.e., the variability is larger than what is expected from
random variability in the population). To determine exactly which



groups are different, examine the multiple comparison results (see page
271).
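The same ANOVA table quantities (DF, SS, MS, F, and P) can be checked against a general statistics package. The following sketch is only an illustration and is not part of SigmaStat: it builds a small two factor data set with hypothetical column names (factor1, factor2, response) and prints a comparable table using the Python statsmodels library. Type III (marginal) sums of squares correspond to the general linear model approach used when data are missing.

    # Illustrative sketch only: a two factor ANOVA table with interaction using
    # statsmodels. Column names and values are hypothetical, not SigmaStat data.
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "factor1": ["A", "A", "A", "B", "B", "B"] * 2,
        "factor2": ["x", "y", "z"] * 4,
        "response": [4.1, 4.9, 6.0, 5.2, 6.1, 7.3, 4.4, 5.3, 6.2, 5.0, 6.4, 7.0],
    })

    # Fit main effects for each factor plus their interaction, using sum-to-zero
    # coding so that Type III (marginal) sums of squares are meaningful.
    model = smf.ols("response ~ C(factor1, Sum) * C(factor2, Sum)", data=df).fit()

    # Columns include sum_sq (SS), df (DF), F, and PR(>F) (the P value).
    print(sm.stats.anova_lm(model, typ=3))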

P Value The P value is the probability of being wrong in concluding


that there is a true difference between the groups (i.e., the probability of
falsely rejecting the null hypothesis, or committing a Type I error, based
on F). The smaller the P value, the greater the probability that the
samples are drawn from different populations.

Traditionally, you can conclude there are significant differences if P < 0.05.

Power The power, or sensitivity, of a Two Way ANOVA is the probability that
the test will detect the observed difference among the groups if there
really is a difference. The closer the power is to 1, the more sensitive the
test. The power for the comparison of the groups within the two factors
and the power for the comparison of the interactions are all displayed.
These results are set in the Options for Two Way ANOVA dialog box.

ANOVA power is affected by the sample sizes, the number of groups being compared, the chance of erroneously reporting a difference α (alpha), the observed differences of the group means, and the observed standard deviations of the samples.

Alpha (α) Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error also is called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true).

The α value is set in the Options for Two Way ANOVA dialog box; the suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).
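For readers who want a rough idea of how power relates to alpha and the degrees of freedom, the sketch below approximates ANOVA power from the noncentral F distribution in SciPy. It is a textbook-style illustration with a hypothetical noncentrality value, not SigmaStat's internal power routine.

    # Illustrative sketch of ANOVA power from the noncentral F distribution.
    # All numeric values are hypothetical examples.
    from scipy.stats import f, ncf

    alpha = 0.05        # acceptable Type I error rate
    df_factor = 2       # degrees of freedom for the factor being tested
    df_error = 18       # error (residual) degrees of freedom
    noncentrality = 6.0 # hypothetical noncentrality, driven by effect size and sample size

    # Critical F value: the null hypothesis is rejected when F exceeds this value.
    f_crit = f.ppf(1 - alpha, df_factor, df_error)

    # Power: probability that a noncentral F variate exceeds the critical value.
    power = ncf.sf(f_crit, df_factor, df_error, noncentrality)
    print(f"critical F = {f_crit:.3f}, power = {power:.3f}")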

Summary Table The least square means and standard error of the means are displayed for
each factor separately (summary table row and column), and for each
combination of factors (summary table cells). If there are missing values,
the least square means are estimated using a general linear model.



Mean The average value for the column. If the observations are
normally distributed the mean is the center of the distribution.

Standard Error of the Mean A measure of the approximation with


which the mean computed from the sample approximates the true
population mean.

When there are no missing data, the least square means equal the cell and marginal (row and column) means. When there are missing data, the least square means provide the best estimate of these values, using a general linear model. These means and standard errors are used when performing multiple comparisons (see the following section).

Multiple Comparisons If a difference is found among the groups, multiple comparison tables
can be computed. Multiple comparison procedures are activated in the
Options for Two Way ANOVA dialog box (see page 260). The tests
used in the multiple comparisons are set in the Multiple Comparisons
Options dialog box (see page 270).

Multiple comparison results are used to determine exactly which groups


are different, since the ANOVA results only inform you that two or
more of the groups are different. Two factor multiple comparison for a
full Two Way ANOVA also compares:

➤ Groups within each factor without regard to the other factor (this is
a marginal comparison, i.e.,only the columns or rows in the table
are compared).
➤ All combinations of factors (all cells in the table are compared with
each other).

The specific type of multiple comparison results depends on the


comparison test used and whether the comparison was made pairwise or
versus a control.

➤ All pairwise comparison results list comparisons of all possible combinations of group pairs; the all pairwise tests are the Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Bonferroni t-tests.
➤ Comparisons versus a single control group list only comparisons
with the selected control group. The control group is selected
during the actual multiple comparison procedure. The comparison
versus a control tests are a Bonferroni t-test and Dunnett's test.



For descriptions of the derivations of two way multiple comparison
procedure results, you can reference any appropriate statistics reference.
For a list of suggested references, see page 12.

Bonferroni t-test Results The Bonferroni t-test lists the differences of the means for each pair of groups, computes the t values for each pair, and displays whether or not P < 0.05 for that comparison. The Bonferroni t-test can be used to compare all groups or to compare versus a control.

You can conclude from “large” values of t that the difference of the two
groups being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of
erroneously concluding that there is a significant difference is less than
5%. If it is greater than 0.05, you cannot confidently conclude that
there is a difference.

The Difference of Means is a gauge of the size of the difference between


the levels or cells being compared.

The degrees of freedom DF for the marginal comparisons are a measure


of the number of groups (levels) within the factor being compared. The
degrees of freedom when comparing all cells is a measure of the sample
size after accounting for the factors and interaction. This is the same as
the error or residual degrees of freedom.
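As a rough illustration of how a Bonferroni-style comparison can be computed from the quantities above, the sketch below follows the standard textbook procedure using the ANOVA error mean square; it is not SigmaStat's own code, and the group means, sizes, and error terms are hypothetical.

    # Sketch of Bonferroni-adjusted pairwise comparisons using the ANOVA error term.
    # group_means, group_sizes, ms_error, and df_error are hypothetical example values.
    from itertools import combinations
    from math import sqrt
    from scipy.stats import t

    group_means = {"A": 10.2, "B": 12.8, "C": 15.1}
    group_sizes = {"A": 6, "B": 6, "C": 6}
    ms_error, df_error = 4.5, 15          # taken from the ANOVA table
    n_comparisons = 3                     # number of pairwise comparisons made

    for g1, g2 in combinations(group_means, 2):
        diff = group_means[g1] - group_means[g2]
        se = sqrt(ms_error * (1 / group_sizes[g1] + 1 / group_sizes[g2]))
        t_stat = diff / se
        p_raw = 2 * t.sf(abs(t_stat), df_error)
        p_bonf = min(1.0, p_raw * n_comparisons)   # Bonferroni adjustment
        print(f"{g1} vs {g2}: diff={diff:+.2f}, t={t_stat:.2f}, adjusted P={p_bonf:.4f}")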

Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Dunnett's Test Results The Tukey, Student-Newman-Keuls (SNK), Fisher LSD, and Duncan’s tests are all pairwise comparisons of every combination of group pairs. While the Tukey, Fisher LSD, and Duncan’s can be used to compare a control group to other groups, they are not recommended for this type of comparison.

Dunnett's test only compares a control group to all other groups. All tests compute the q test statistic, and display whether or not P < 0.05 for that pair comparison.

You can conclude from “large” values of q that the difference of the two
groups being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of
being incorrect in concluding that there is a significant difference is less



than 5%. If it is greater than 0.05, you cannot confidently conclude
that there is a difference.

p is a parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the group means being compared. Group means are ranked in order from largest to smallest, and p is the number of means spanned in the comparison. For example, when comparing four means, comparing the largest to the smallest p = 4, and when comparing the second smallest to the smallest p = 2. For the Tukey test, p is always equal to the total number of groups.

If a group is found to be not significantly different than another group,


all groups with p ranks in between the p ranks of the two groups that are
not different are also assumed not to be significantly different, and a
result of DNT (Do Not Test) appears for those comparisons.

The Difference of Means is a gauge of the size of the difference between


the groups or cells being compared.

The degrees of freedom DF for the marginal comparisons are a measure


of the number of groups (levels) within the factor being compared. The
degrees of freedom when comparing all cells is a measure of the sample
size after accounting for the factors and interaction (this is the same as
the error or residual degrees of freedom).
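The q statistic and the p parameter can be illustrated with the studentized range distribution, available in SciPy 1.7 or later. The sketch below uses hypothetical means, group size, and error mean square; it shows the general Tukey/SNK-style calculation rather than SigmaStat's implementation.

    # Sketch of a studentized-range (q) comparison as used by Tukey/SNK-style tests.
    # Means, n per group, ms_error, and df_error are hypothetical example values.
    from math import sqrt
    from scipy.stats import studentized_range

    means = sorted([10.2, 12.8, 15.1, 16.0], reverse=True)  # ranked largest to smallest
    n_per_group = 6
    ms_error, df_error = 4.5, 20
    k_total = len(means)        # total number of groups

    # Compare the largest mean to the smallest: p spans all k_total means.
    p_span = k_total
    q = (means[0] - means[-1]) / sqrt(ms_error / n_per_group)

    # Tukey uses p = k_total for every comparison; SNK uses the actual span p.
    p_value = studentized_range.sf(q, p_span, df_error)
    print(f"q = {q:.2f}, p = {p_span}, P = {p_value:.4f}")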

Two Way ANOVA Report Graphs

You can generate up to six graphs using the results from a Two Way
ANOVA. They include a:

➤ Histogram of the residuals.


➤ Normal probability plot of the residuals.
➤ 3D plot of the residuals.
➤ Grouped bar chart of the column means.
➤ 3D category scatter plot.
➤ Multiple comparison graphs.

Histogram of Residuals The Two Way ANOVA histogram plots the raw residuals in a specified
range, using a defined interval set. The residuals are divided into a
number of evenly incremented histogram intervals and plotted as



histogram bars indicating the number of residuals in each interval. The X axis represents the histogram intervals, and the Y axis represents the number of residuals in each group. For an example of a histogram, see page 153.

Probability Plot The Two Way ANOVA probability plot graphs the frequency of the raw residuals. The residuals are sorted and then plotted as points around a curve representing the area of the Gaussian plotted on a probability axis. Plots with residuals that fall along the Gaussian curve indicate that your data was taken from a normally distributed population. The X axis is a linear scale representing the residual values. The Y axis is a probability scale representing the cumulative frequency of the residuals. For an example of a probability plot, see page 155.
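Similar diagnostic graphs can be sketched outside SigmaStat with matplotlib and SciPy; in the illustration below the residuals array is a hypothetical stand-in for residuals saved from the ANOVA to the worksheet.

    # Sketch of the two residual diagnostics: histogram and normal probability plot.
    # 'residuals' is a hypothetical placeholder for residuals saved from the ANOVA.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(0)
    residuals = rng.normal(0.0, 1.0, size=36)   # stand-in for the saved residuals

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3.5))

    # Histogram: evenly incremented intervals, bar height = count of residuals.
    ax1.hist(residuals, bins=8, edgecolor="black")
    ax1.set_xlabel("Residual")
    ax1.set_ylabel("Count")

    # Normal probability plot: points near the line suggest normally distributed data.
    stats.probplot(residuals, dist="norm", plot=ax2)

    plt.tight_layout()
    plt.show()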

Grouped Bar Chart The Two Way ANOVA grouped bar chart plots the means of column
data as bars with error bars indicating the standard deviation of each
group. The levels in the first factor column are used as the tick marks
for the bar chart bars, and the column titles for the first factor column
and the data column are used as the axis titles. Each bar in the group
represents a different level in the second factor column. For an example
of a grouped bar chart, see page 157.

! If there are no interactions between factors, a graph for the Two Way
ANOVA is not generated. In this case, it is more appropriate to perform two
One Way ANOVAs.

3D Residual The Two Way ANOVA 3D residual scatter plot graphs the residuals of
Scatter Plot the two columns of independent variable data. The X and the Y axes
represent the independent variables, and the Z axis represents the
residuals. For an example of a 3D residual scatter plot, see page 156.

3D Category The Two Way ANOVA 3D Category Scatter plot graphs the two factors
Scatter Graph from the independent data columns along the X and Y axes against the
data of the dependent variable column along the Z axis. The tick marks
for the X and Y axes represent the two factors from the independent
variable columns, and the tick marks for the Z axis represent the data
from the dependent variable column. For an example of a 3D category scatter plot, see page 158.



Multiple The Two Way ANOVA multiple comparison graphs plot the significant
Comparison Graphs differences between levels of a significant factor. There is one graph for
every significant factor reported by the specified multiple comparison
test. If there is one significant factor reported, one graph appears; if
there are two significant factors, two graphs appear, etc. If a factor is not
reported as significant, a graph for the factor does not appear. For an
example of a multiple comparison graph, see page 160.

Creating a Report Graph To generate a graph of Two Way ANOVA data:

1 Click the toolbar button, or choose the Graph menu Create


Graph command when the Two Way ANOVA report is selected.
The Create Graph dialog appears displaying the types of graphs
available for the Two Way ANOVA report.

FIGURE 9–39
The Create Graph Dialog Box for Two Way ANOVA Report Graphs

2 Select the type of graph you want to create from the Graph Type
list, then click OK, or double-click the desired graph in the list.



For more information on each of the graph types, see Chapter 8.
The specified graph appears in a graph window or in the report.

FIGURE 9–40
A Multiple Comparison
for the Two Way ANOVA

For information on manipulating graphs, see Chapter 8, CREATING


AND MODIFYING GRAPHS.



Three Way Analysis of Variance (ANOVA)

A Three Way or three factor ANOVA (analysis of variance) should be used


when:

➤ You want to see if two or more different experimental groups are affected by three different factors which may or may not interact.
➤ Samples are drawn from normally distributed populations with equal
variances.

To consider the effects of only one or two factors on your experimental


groups, use a One or Two Way ANOVA (see pages 8-230 and 9-253).
SigmaStat has no equivalent nonparametric three factor comparison for
samples drawn from a non-normal population. If your data is non-normal,
you can transform the data to make them comply better with the
assumptions of analysis of variance using Transforms menu commands. If
the sample size is large, and you want to do a nonparametric test, use the
Transforms menu Rank command to convert the observations to ranks, then
run a Three Way ANOVA on the ranks.

For more information on transforming data, see Chapter 14, USING


TRANSFORMS.

About the In a three way or three factor analysis of variance, there are three experimental
Three Way ANOVA factors which are varied for each experimental group. A three factor design is
used to test for differences between samples grouped according to the levels
of each factor and for interactions between the factors.

A three factor analysis of variance tests four hypotheses: (1) There is no


difference among the levels of the first factor; (2) There is no difference
among the levels of the second factor; (3) There is no difference among the
levels of the third factor; and (4) There is no interaction between the factors,
i.e., if there is any difference among groups within one factor, the differences
are the same regardless of the second and third factor levels.

Three Way ANOVA is a parametric test that assumes that all the samples
were drawn from normally distributed populations with the same variances.

Performing a To perform a Three Way ANOVA:


Three Way ANOVA
1 Enter or arrange your data appropriately in the worksheet
(see the following section).



2 If desired, set the Three Way ANOVA options using the Options for
Three Way ANOVA dialog box (page 290).

3 Select Three Way ANOVA from the toolbar drop-down list, or choose
the Statistics menu Compare Many Groups, Three Way ANOVA...
command.

4 Run the test by selecting the worksheet columns with the data you want
to test using the Pick Columns dialog box page 99.

5 Specify the multiple comparisons you want to perform on your test


(page 324).

6 View and interpret the Three Way ANOVA report and generate report
graphs (pages 9-300 and 9-308).

Arranging Three Way ANOVA Data

The Three Way ANOVA tests for differences between samples grouped
according to the levels of each factor and the interactions between the factors.

For example, in an analysis of the effect of gender on the action of two


different drugs over different periods of time, gender, drugs, and time
period are the factors, male and female are the levels of the gender factor,
drug types are the levels for the drug factor, days are the levels of the time
period factor, and the different combinations of the levels (gender, drug, and
time period) are the groups, or cells.
TABLE 9-7
Data for a Three Way ANOVA
The factors are gender, drug, and time period. The levels are Male/Female, Drug A/Drug B, and Day 1, 2, and 3.

Gender                        Male                                    Female
Drug                 Drug A             Drug B             Drug A             Drug B
Time Period      Day 1 Day 2 Day 3  Day 1 Day 2 Day 3  Day 1 Day 2 Day 3  Day 1 Day 2 Day 3
Reaction           1     2     3      4     5     6      7     8     9     10    11    12
                  13    14    15     16    17    18     19    20    21     22    23    24
                  25    26    27     28    29    30     31    32    33     34    35    36

If your data is missing data points or even whole cells, SigmaStat detects this
and provides the correct solutions; see the following section, Missing Data
and Empty Cells.



For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for Contingency Tables
on page 69.

Missing Data and Ideally, the data for a Three Way ANOVA should be completely balanced,
Empty Cells Data i.e., each group or cell in the experiment has the same number of
observations and there are no missing data. However, SigmaStat properly
handles all occurrences of missing and unbalanced data automatically.

Missing Data Points If there are missing values, SigmaStat automatically


handles the missing data by using a general linear model approach. This
approach constructs hypothesis tests using the marginal sums of squares (also
commonly called the Type III or adjusted sums of squares).
TABLE 3-8
Data for a Three Way ANOVA with a Missing Value in the Male, Drug A, Day 1 Cell
A general linear model approach is used in these situations.

Gender                        Male                                    Female
Drug                 Drug A             Drug B             Drug A             Drug B
Day              Day 1 Day 2 Day 3  Day 1 Day 2 Day 3  Day 1 Day 2 Day 3  Day 1 Day 2 Day 3
Reaction           1     2     3      4     5     6      7     8     9     10    11    12
                  --    14    15     16    17    18     19    20    21     22    23    24
                  25    26    27     28    29    30     31    32    33     34    35    36

Empty Cells When there is an empty cell, i.e., there are no observations for
a combination of three factor levels, a dialog box appears asking you if you
want to analyze the data using a two way or a one way design. If you select a
two way design, SigmaStat attempts to analyze your data using two
interactions. If there are no observations with two interactions, SigmaStat
runs a One Way ANOVA.

If you treat the problem as a Two Way ANOVA, a dialog box appears
prompting you to remove one of the factors. Select the factor you want to
remove, then click OK. The Two Way ANOVA is performed.

If you treat the problem as a One Way ANOVA, each cell in the table is
treated as a single experimental factor. This approach is the most



conservative analysis because it requires no additional assumptions about the
nature of the data or experimental design.
TABLE 3-9
Data for a Three Way ANOVA with a Missing Cell (Male/Drug A, Day 1)
You can use either a two factor analysis or assume no interaction between factors.

Gender                        Male                                    Female
Drug                 Drug A             Drug B             Drug A             Drug B
Day              Day 1 Day 2 Day 3  Day 1 Day 2 Day 3  Day 1 Day 2 Day 3  Day 1 Day 2 Day 3
Reaction          --     2     3      4     5     6      7     8     9     10    11    12
                  --    14    15     16    17    18     19    20    21     22    23    24
                  --    26    27     28    29    30     31    32    33     34    35    36

Assumption of no interaction analyzes the main effects of each treatment


separately.

! Note that it can be dangerous to assume there is no interaction between the


three factors in a Three Way ANOVA. Under some circumstances, this
assumption can lead to a meaningless analysis, particularly if you are
interested in studying the interaction effect.

Connected versus Disconnected Data The no interaction assumption does


not always permit a two factor analysis when there is more than one empty
cell. The non-empty cells must be geometrically connected in order to do
the computation. You cannot perform Three Way ANOVAs on
disconnected data.

Data arranged in a two-dimensional grid, where you can draw a series of


straight vertical and horizontal lines connecting all occupied cells, without
changing direction in an empty cell, is guaranteed to be connected.

TABLE 3-10
Example of Drawing 1.2 4.2 .05 1.4 .54
Straight Horizontal and
Vertical Lines through 2.6 3.3 3.1
Connected Data
2.0

! It is important to note that failure to meet the above requirement does not
imply that the data is disconnected. The data in Table 3-11, for example, is
connected.



TABLE 3-11
Example of Connected 1.2 8.3
Data that You Can’t
Draw a Series of 2.4 6.2
Straight Vertical and
Horizontal Lines Through
5.8 1.0
4.8 .98

SigmaStat automatically checks for this condition. If disconnected data is


encountered during a Three Way ANOVA, SigmaStat suggests treatment of
the problem as a Two Way ANOVA. If the disconnected data is still
encountered during a Two Way ANOVA, a One Way ANOVA is performed.
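Conceptually, the connectivity check can be pictured as a graph problem: treat each non-empty cell as a node, link cells that share a factor level (a row or a column of the grid), and test whether every occupied cell can be reached from every other. The sketch below illustrates this idea for a two-dimensional grid of cells; it is a conceptual illustration with hypothetical cell coordinates, not SigmaStat's algorithm.

    # Conceptual sketch: check whether the non-empty cells of a factor grid are
    # connected, i.e., reachable from one another through shared rows or columns.
    from collections import deque

    # Occupied (non-empty) cells given as (row_level, column_level) pairs.
    occupied = {(0, 0), (0, 1), (1, 1), (1, 2)}   # hypothetical example (connected)

    def is_connected(cells):
        cells = set(cells)
        if not cells:
            return True
        start = next(iter(cells))
        seen, queue = {start}, deque([start])
        while queue:
            r, c = queue.popleft()
            for other in cells - seen:
                # Two occupied cells are linked if they share a factor level.
                if other[0] == r or other[1] == c:
                    seen.add(other)
                    queue.append(other)
        return seen == cells

    print(is_connected(occupied))             # True
    print(is_connected({(0, 0), (1, 1)}))     # False: the cells share no levels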

For descriptions of the concept of data connectivity, you can reference any
appropriate statistics reference. For a list of suggested references, see page 12.
TABLE 3-12
Disconnected Data
Because this data is not geometrically connected (they share no factor levels in common), a Three Way ANOVA cannot be performed, even assuming no interaction.

Gender                        Male                                    Female
Drug                 Drug A             Drug B             Drug A             Drug B
Days             Day 1 Day 2 Day 3  Day 1 Day 2 Day 3  Day 1 Day 2 Day 3  Day 1 Day 2 Day 3
Reaction          --    --    --      4     5     6      7     8     9     --    --    --
                  --    --    --     16    17    18     19    20    21     --    --    --
                  --    --    --     28    29    30     31    32    33     --    --    --

Entering A Three Way ANOVA can only be performed on three factor indexed data.
Worksheet Data Three factor indexed data is placed in four columns; a data point indexed
three ways consists of the first factor in one column, the second factor in a
second column, the third factor in a third column, and the data in a fourth column.
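In other words, the worksheet holds the data in "long" or stacked form: one row per observation, with three factor columns and one data column. A small illustration of this layout, with hypothetical values, shown here as a pandas data frame purely for concreteness:

    # Illustration of three factor indexed (long format) data: one observation per
    # row, three factor columns plus one data column. Values are hypothetical.
    import pandas as pd

    indexed = pd.DataFrame({
        "Gender":   ["Male", "Male", "Male", "Female", "Female", "Female"],
        "Drug":     ["Drug A", "Drug A", "Drug B", "Drug A", "Drug B", "Drug B"],
        "Day":      [1, 2, 1, 3, 2, 3],
        "Reaction": [1.0, 13.0, 4.0, 21.0, 23.0, 36.0],
    })
    print(indexed)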

Selecting Data When running a Three Way ANOVA you can either:
Columns
➤ Select the columns to test by dragging your mouse over the columns
before choosing the test.
➤ Select the columns while running the test (page 99).



FIGURE 9–41
Valid Data Formats for a
Three Way ANOVA
Column 1 is the first factor
index, column 2 is the
second factor index, column
3 is the third factor index,
and column 4 is the data.

Setting Three Way ANOVA Options

Use the Three Way ANOVA options to:

➤ Adjust the parameters of the test to relax or restrict the testing of your
data for normality and equal variance.
➤ Include the statistics summary table and confidence interval for the data
in the report, and save residuals to the worksheet.
➤ Compute the power, or sensitivity, of the test.
➤ Enable multiple comparison testing.

To set Three Way ANOVA options:

1 If you are going to run the test after changing test options and want to
select your data before you run the test, drag the pointer over the data.

2 To open the Options for Three Way ANOVA dialog box, select Three
Way ANOVA from the toolbar drop-down list, then click the
button, or choose the Statistics menu Current Test Options...
command. The Normality and Equal Variance options appear (see
Figure 9–42 on page 290).



3 Click the Results tab to view the Summary Table, Confidence Intervals,
and Residuals in Column options (see Figure 9–43 on page 291). Click
the Post Hoc Test tab to view the Power and Multiple Comparisons
options (see Figure 9–44 on page 292). Click the Assumption
Checking tab to return to the Normality and Equal Variance options.

4 Click a check box to enable or disable a test option. Options settings


are saved between SigmaStat sessions. For more information on each of
the test options, see pages 9-291 through 9-313.

5 To continue the test, click Run Test. The Pick Columns dialog box
appears (see page 99 for more information).

6 To accept the current settings and close the options dialog box, click
OK. To accept the current setting without closing the options dialog
box, click Apply. To close the dialog box without changing any settings
or running the test, click Cancel.

! You can select Help at any time to access SigmaStat’s on-line help system.

Normality and Select the Assumption Checking tab from the options dialog box to view the
Equal Variance Normality and Equal Variance options. The normality assumption test
Assumptions checks for a normally distributed population. The equal variance assumption
test checks the variability about the group means.

Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test for


a normally distributed population.

Equal Variance Testing SigmaStat tests for equal variance by checking the
variability about the group means.
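A rough equivalent of these checks outside SigmaStat (not the program's exact procedures) is a Kolmogorov-Smirnov test of the standardized residuals against a normal distribution and a median-centered Levene test across the groups, both available in SciPy; the group data below are hypothetical.

    # Rough stand-ins for the assumption checks: K-S test of residual normality and
    # a median-centered Levene test of equal variance. Not SigmaStat's procedures.
    import numpy as np
    from scipy import stats

    group_a = np.array([10.1, 11.3, 9.8, 10.6, 11.0])   # hypothetical groups
    group_b = np.array([12.4, 13.1, 12.0, 13.5, 12.8])

    # Normality: compare standardized deviations from the group means to N(0, 1).
    residuals = np.concatenate([group_a - group_a.mean(), group_b - group_b.mean()])
    z = (residuals - residuals.mean()) / residuals.std(ddof=1)
    ks_stat, ks_p = stats.kstest(z, "norm")

    # Equal variance: Levene test centered on the median (Brown-Forsythe variant).
    lev_stat, lev_p = stats.levene(group_a, group_b, center="median")

    print(f"normality P = {ks_p:.3f}, equal variance P = {lev_p:.3f}")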

P Values for Normality and Equal Variance Type the corresponding P


value in the P Value to Reject box. The P value determines the probability of
being incorrect in concluding that the data is not normally distributed (the P
value is the risk of falsely rejecting the null hypothesis that the data is
normally distributed). If the P value computed by the test is greater than the
P set here, the test passes.

To require a stricter adherence to normality and/or equal variance,


increase the P value. Because the parametric statistical methods are relatively
robust in terms of detecting violations of the assumptions, the suggested
value in SigmaStat is 0.050. Larger values of P (for example, 0.100) require
less evidence to conclude that data is not normal.



FIGURE 9–42
The Options for Three
Way ANOVA Dialog Box
Displaying the Assumption
Checking Options

To relax the requirement of normality and/or equal variance, decrease P.


Requiring smaller values of P to reject the normality assumption means that
you are willing to accept greater deviations from the theoretical normal
distribution before you flag the data as non-normal. For example, a P value
of 0.050 requires greater deviations from normality to flag the data as non-
normal than a value of 0.100.

! There are extreme conditions of data distribution that these tests cannot take
into account. For example, the Levene Median test fails to detect differences
in variance of several orders of magnitude. However, these conditions
should be easily detected by simply examining the data without resorting to
the automatic assumption tests.

Summary Table Select the Results tab in the options dialog box to view the Summary Table
option. The Summary Table option displays the number of observations for
a column or group, the number of missing values for a column or group, the
average value for the column or group, the standard deviation of the column
or group, and the standard error of the mean for the column or group.

Confidence Intervals Select the Results tab in the options dialog box to view the Confidence
Intervals option. The Confidence Intervals option displays the confidence
interval for the difference of the means. To change the interval, enter any
number from 1 to 99 (95 and 99 are the most commonly used intervals).
Click the selected check box if you do not want to include the confidence
interval in the report.



Residuals Select the Results tab in the options dialog box to view the Residuals option.
Use the Residuals option to display residuals in the report and to save the
residuals of the test to the specified worksheet column. To change the
column the residuals are saved to, edit the number in or select a number from
the drop-down list.

FIGURE 9–43
The Options for Three Way
ANOVA Dialog Box
Displaying
the Summary Table,
Confidence Intervals,
and Residual Options

Power Select the Post Hoc Tests tab in the options dialog box to view the Power
option. The power or sensitivity of a test is the probability that the test will
detect a difference between the groups if there is really a difference.

Change the alpha value by editing the number in the Alpha Value box. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

Multiple Comparisons Select the Post Hoc Test tab in the Options dialog box to view the multiple comparisons options. Three Way ANOVAs test the hypothesis of no differences between the several treatment groups, but do not determine which groups are different, or the sizes of these differences. Use multiple comparisons to isolate these differences whenever a Three Way ANOVA detects a difference.

The P value used to determine if the ANOVA detects a difference is set in the
Report Options dialog box. If the P value produced by the Three Way



ANOVA is less than the P value specified in the box, a difference in the
groups is detected and the multiple comparisons are performed. For more
information on specifying a P value for the ANOVA, see Setting Test
Options on page 98.

FIGURE 9–44
The Options for Three
Way ANOVA Dialog Box
Displaying the Power and
Multiple Comparisons
Options

Performing Multiple Comparisons You can choose to always perform


multiple comparisons or to only perform multiple comparisons if the Three
Way ANOVA detects a difference.

Select the Always Perform option to perform multiple comparisons whether


or not the ANOVA detects a difference.

Select the Only When ANOVA P Value is Significant option to perform


multiple comparisons only if the ANOVA detects a difference.

Significant Multiple Comparison Value Select either .05 or .01 from the Significance Value for Multiple Comparisons drop-down list. This value determines the likelihood of the multiple comparison being incorrect in concluding that there is a significant difference in the treatments.

A value of .05 indicates that the multiple comparisons will detect a difference
if there is less than 5% chance that the multiple comparison is incorrect in
detecting a difference. A value of .01 indicates that the multiple comparisons
will detect a difference if there is less than 1% chance that the multiple
comparison is incorrect in detecting a difference.

! If multiple comparisons are triggered, the Multiple Comparison Options


dialog box appears after you pick your data from the worksheet and run the
test, prompting you to choose a multiple comparison test. See page 298 for
more information.



Running a Three Way ANOVA

To run a Three Way ANOVA you need to select the data to test. The Pick
Columns dialog box is used to select the worksheet columns with the data
you want to test.

To run a Three Way ANOVA:

1 If you want to select your data before you run the test, drag the pointer
over your data.

2 Open the Pick Columns for Three Way ANOVA dialog box to start the
Three Way ANOVA. You can either:

➤ Select Three Way ANOVA from the drop-down list in the toolbar,
then click the button.
➤ Choose the Statistics menu Compare Many Groups, Three Way
ANOVA... command.
➤ Click the Run Test button from the Options for Three Way ANOVA
dialog box.

The Pick Columns for Three Way ANOVA dialog box appears. If you
selected columns before you chose the test, the selected columns appear
in the Selected Columns list. If you have not selected columns, the
dialog box prompts you to pick your data.

3 To assign the desired worksheet columns to the Selected Columns list,


select the columns in the worksheet, or select the columns from the
Data for Data drop-down list.

The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns
appear in each row. You are prompted to pick three factor columns and
one data column.

4 To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

5 Click Finish to perform the Three Way ANOVA. The Three Way
ANOVA report appears if:



➤ You elected to test for normality and equal variance, and your data
passes both tests.
➤ Your data has no missing data points or cells and is not otherwise unbalanced.
➤ You selected not to perform multiple comparisons, or if you selected
to run multiple comparisons only when the P value is significant, and
the P value is not significant (see page 131).

To edit the report, use the Format menu commands; for information
on editing reports, see Editing Reports on page 137.

FIGURE 9–45
The Pick Columns for
Three Way ANOVA Dialog
Box

6 If you elected to test for normality and equal variance, and your data
fails either test, either continue or transform your data, then perform
the Three Way ANOVA on the transformed data. For information on
how to transform data, see Chapter 14, USING TRANSFORMS.

7 If your data is missing data points, missing cells, or is otherwise unbalanced, you are prompted to perform the appropriate procedure:

➤ If you are missing data points, but still have at least one observation in each cell, SigmaStat automatically proceeds with the Three Way ANOVA using a general linear model.
➤ If you are missing a cell, but the data is connected, you can proceed by either performing a three way analysis assuming no interaction between the factors, or converting the problem into a two way design with each non-empty cell a different level of two factors.
➤ If your data is not geometrically connected, you cannot perform a Three Way ANOVA. Either treat the problem as a Two Way ANOVA, or cancel the test.



For more information on missing data point and cell handling, see
Missing Data and Empty Cells Data on page 285.

8 If the P value for multiple comparisons is significant, or you selected to always perform multiple comparisons, the Multiple Comparisons Options dialog box appears prompting you to select a multiple comparison method. For more information on selecting a multiple comparison method, see Multiple Comparison Options below.

Multiple Comparison Options

If you enabled multiple comparisons in the Three Way ANOVA Options


dialog box (see page 9-292), and the ANOVA produces a P value, for any of
the three factors or the interaction between the three factors, equal to or less
than the trigger P value, the Multiple Comparison Options dialog box
appears.

This dialog box displays the P values for each of the experimental factors and
of the interaction between the three factors. Only the options with P values
less than or equal to the value set in the Options dialog box are selected. You
can disable multiple comparison testing for a factor by clicking the selected
option. If no factor is selected, multiple comparison results are not reported.

There are seven multiple comparison tests to choose from for the Three Way
ANOVA. You can choose to perform the:

➤ Holm-Sidak Test
➤ Tukey Test
➤ Student-Newman-Keuls Test
➤ Bonferroni t-test
➤ Fisher’s LSD
➤ Dunnett’s Test
➤ Duncan’s Multiple Range Test

There are two types of multiple comparison available for the Three Way ANOVA. The types of comparison you can make depend on the selected multiple comparison test:



➤ All pairwise comparisons test the difference between each treatment or level within the three factors separately (i.e., among the different rows and columns of the data table).
➤ Multiple comparisons versus a control test the difference between all the different combinations of each factor (i.e., all the cells in the data table).


When comparing the factors separately, the levels within one factor are compared among themselves without regard to the other factors. These results should be used when the interaction is not statistically significant.

When the interaction is statistically significant, interpreting multiple


comparisons among different levels of each experimental factor may not be
meaningful. SigmaStat also suggests performing a multiple comparison
between all the cells.

The result of both comparisons is a listing of the similar and different group
pairs, i.e., those groups that are and are not detectably different from each
other. Because no statistical test eliminates uncertainty, multiple comparison
procedures sometimes produce ambiguous groupings.

Holm-Sidak Test The Holm-Sidak Test can be used for both pairwise comparisons and
comparisons versus a control group. It is more powerful than the Tukey and
Bonferroni tests and, consequently, it is able to detect differences that these
other tests do not. It is recommended as the first-line procedure for pairwise
comparison testing.

When performing the test, the P values of all comparisons are computed and
ordered from smallest to largest. Each P value is then compared to a critical
level that depends upon the significance level of the test (set in the test
options), the rank of the P value, and the total number of comparisons made.
A P value less than the critical level indicates there is a significant difference
between the corresponding two groups.
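A compact sketch of that step-down logic, using the standard Holm-Sidak critical levels from the literature and hypothetical P values (this is an illustration, not SigmaStat's own code):

    # Sketch of the Holm-Sidak step-down procedure: order the P values, then compare
    # each one to a critical level that depends on its rank and the number of comparisons.
    p_values = {"A vs B": 0.003, "A vs C": 0.020, "B vs C": 0.060}   # hypothetical
    alpha = 0.05
    m = len(p_values)

    for rank, (pair, p) in enumerate(sorted(p_values.items(), key=lambda kv: kv[1]), start=1):
        critical = 1.0 - (1.0 - alpha) ** (1.0 / (m - rank + 1))
        significant = p < critical
        print(f"{pair}: P={p:.3f}, critical={critical:.4f}, significant={significant}")
        if not significant:
            break   # once a comparison fails, all larger P values also fail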

Tukey Test The Tukey Test and the Student-Newman-Keuls test are conducted similarly
to the Bonferroni t-test, except that each uses a table of critical values that is



computed based on a better mathematical model of the probability structure of the multiple comparisons. The Tukey Test is more conservative than the Student-Newman-Keuls test, because it controls the errors of all comparisons simultaneously, while the Student-Newman-Keuls test controls errors among tests of k means. Because it is more conservative, it is less likely to determine that a given difference is statistically significant, and it is the recommended test for all pairwise comparisons.

While Multiple Comparisons vs. a Control is an available comparison type


for the Tukey Test, it is not recommended. Use the Dunnett’s Test for
multiple comparisons vs. a control.

Student-Newman- The Student-Newman-Keuls Test and the Tukey Test are conducted similarly
Keuls (SNK) Test to the Bonferroni t-test, except that each uses a table of critical values that is
computed based on a better mathematical model of the probability structure
of the multiple comparisons. The Student-Newman-Keuls Test is less
conservative than the Tukey Test because it controls errors among tests of k
means, while the Tukey Test controls the errors of all comparisons
simultaneously. Because it is less conservative, it is more likely to determine that a given difference is statistically significant. The Student-Newman-Keuls Test is usually more sensitive than the Bonferroni t-test, and is only available for all pairwise comparisons.

Bonferroni t-test The Bonferroni t-test performs pairwise comparisons with paired t-tests. The
P values are then multiplied by the number of comparisons that were made.
It can perform both all pairwise comparisons and multiple comparisons vs. a
control, and is the most conservative test for both comparison types. For
less conservative all pairwise comparison tests, see the Tukey and the Student-
Newman-Keuls tests, and for the less conservative multiple comparison vs. a
control tests, see the Dunnett’s Test.

Fisher’s Least Significance Difference Test The Fisher’s LSD Test is the least conservative all pairwise comparison test. Unlike the Tukey and the Student-Newman-Keuls, it makes no effort to control the error rate. Because it makes no attempt to control the error rate when detecting differences between groups, it is not recommended.

Dunnett’s Test Dunnett's test is the analog of the Student-Newman-Keuls Test for the case
of multiple comparisons against a single control group. It is conducted
similarly to the Bonferroni t-test, but with a more sophisticated mathematical
model of the way the error accumulates in order to derive the associated table
of critical values for hypothesis testing. This test is less conservative than the
Bonferroni Test, and is only available for multiple comparisons vs. a control.



Duncan’s Multiple Range The Duncan’s Test is conducted the same way as the Tukey and the Student-Newman-Keuls tests, except that it is less conservative in determining whether the difference between groups is significant by allowing a wider range for error rates. Although it has a greater power to detect differences than the Tukey and the Student-Newman-Keuls tests, it has less control over the Type I error rate, and is, therefore, not recommended.

Performing a The multiple comparison you choose to perform depends on the treatments
Multiple Comparison you are testing. Click Cancel if you do not want to perform a multiple
comparison procedure.

1 Multiple comparisons are performed of the factors selected under the


Select Factors to Compare heading. The factors with P values less than
or equal to the value set in the Options dialog box are automatically
selected, and the P values for the selected factors, and/or the
interactions of the factors are displayed in the upper left corner of the
dialog box. If the P value is greater than the P value set in the Options
dialog box, the factor is not selected, the P value for the factor is not
displayed, and multiple comparisons are not performed for the factor.

You can disable multiple comparison testing for a factor by clicking the
selected option.

FIGURE 9–46
The Multiple Comparison
Options Dialog Box for
a Three Way ANOVA

2 Select the desired multiple comparison test from the Suggested Test
drop-down list. The Tukey and Student-Newman-Keuls tests are
recommended for determining the difference among all treatments. If
you have only a few treatments, you may want to select the simpler
Bonferroni t-test.



The Dunnett's test is recommended for determining the differences
between the experimental treatments and a control group. If you have
only a few treatments or observations, you can select the simpler
Bonferroni t-test.

! Note that in both cases the Bonferroni t-test is most sensitive with a small
number of groups. Dunnett’s test is not available if you have fewer than
six observations.

For more information on each of the multiple comparison tests, see


page 324.

3 Select a Comparison Type. The types of comparisons available depend


on the selected test. All Pairwise compares all possible pairs of
treatments and is available for the Tukey, Student-Newman-Keuls,
Bonferroni, Fisher LSD, and Duncan’s tests.

Versus Control compares all experimental treatments to a single control


group and is available for the Tukey, Bonferroni, Fisher LSD,
Dunnett’s, and Duncan’s tests. It is not recommended for the Tukey,
Fisher LSD, or Duncan’s test. If you select Versus Control, you must
also select the control group from the list of groups.

For more information on multiple comparison test and the available


comparison types, see page 324.

4 If you selected an all pairwise comparison test, click Finish to


continue with the Three Way ANOVA and view the report (see Figure
9–48 on page 301). For information on editing reports, see Editing
Reports on page 137.

5 If you selected a multiple comparisons versus a control test, click


Next. The Multiple Comparisons Options dialog box prompts you to
select a control group. Select the desired control group from the list,
then click Finish to continue with the Three Way ANOVA and view



the report (see Figure 9–48 on page 301). For information on editing
reports, see page 137.

FIGURE 9–47
The Multiple Comparison
Options Dialog Box
Prompting You to
Select a Control Group

Interpreting Three Way ANOVA Results

A full Three Way ANOVA report displays an ANOVA table describing the
variation associated with each factor and their interactions. This table
displays the degrees of freedom, sum of squares, and mean squares for each of
the elements in the data table, as well as the F statistics and the corresponding
P values.

Summary tables of least square means for each factor and for all three factors
together can also be generated. This result and additional results are enabled
in the Options for Three Way ANOVA dialog box (see Setting Three Way
ANOVA Options on page 288). Click a check box to enable or disable a test
option. All options are saved between SigmaStat sessions.

You can also generate tables of multiple comparisons. Multiple Comparison


results are also specified in the Options for Three Way ANOVA dialog box.
The tests used in the multiple comparisons are selected in the Multiple
Comparisons Options dialog box (see page 295).

For descriptions of the derivations for Three Way ANOVA results, you can
reference any appropriate statistics reference. For a list of suggested
references, see page 12.

! The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and



buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results may
also appear. To turn off this explanatory text, choose the Statistics menu
Report Options... command and click the selected Explain Test Results check
box.

The number of decimal places displayed is also set in the Report Options
dialog box. For more information on setting report options see Setting Test
Options on page 98.

If There Were Missing Data Cells If your data contained missing values but no empty cells, the report indicates the results were computed using a general linear model.

If your data contained empty cells, you either analyzed the problem assuming no interaction or treated the problem as a Two or One Way ANOVA.

➤ If you choose no interactions, no statistics for factor interaction are


calculated.
➤ If you performed a Two or One Way ANOVA, the results shown are identical to Two and One Way ANOVA results (see pages 8-243 and 271).

FIGURE 9–48
Three Way ANOVA
Report



Dependent Variable This is the data column title of the indexed worksheet data you are analyzing
with the Three Way ANOVA. Determining if the values in this column are
affected by the different factor levels is the objective of the Three Way
ANOVA.

Normality Test Normality test results display whether the data passed or failed the test of the
assumption that they were drawn from a normal population and the P value
calculated by the test. Normally distributed source populations are required
for all parametric tests.

This result appears if you enabled normality testing in the Options for Three
Way ANOVA dialog box (see page 290).

Equal Variance Test Equal Variance test results display whether or not the data passed or failed the
test of the assumption that the samples were drawn from populations with
the same variance and the P value calculated by the test. Equal variance of
the source population is assumed for all parametric tests.

This result appears if you enabled equal variance testing in the Options for
Three Way ANOVA dialog box (see page 290).

ANOVA Table The ANOVA table lists the results of the Three Way ANOVA.

! When there are missing data, the best estimate of these values is
automatically calculated using a general linear model.

DF (Degrees of Freedom) Degrees of freedom represent the number of


groups in each factor and the sample size, which affects the sensitivity of the
ANOVA. Note that:

➤ The degrees of freedom for each factor is a measure of the number of


levels in each factor.
➤ The interaction degrees of freedom is a measure of the total number
of cells.
➤ The error degrees of freedom (sometimes called the residual or within
groups degrees of freedom) is a measure of the sample size after
accounting for the factors and interaction.
➤ The total degrees of freedom is a measure of the total sample size.

SS (Sum of Squares) The sum of squares is a measure of variability


associated with each element in the ANOVA data table. Note that:



➤ The factor sums of squares measure the variability between the rows or columns of the table, considered separately.
➤ The interaction sum of squares measures the variability of the average differences between the cells in addition to the variation between the rows and columns, considered separately; this is a gauge of the interaction between the factors.
➤ The error sum of squares (also called residual or within group sum of
squares) is a measure of the underlying random variation in the data, i.e.,
the variability not associated with the factors or their interaction.
➤ The total sum of squares is a measure of the total variability in the data; if
there are no missing data, the total sum of squares equals the sum of the
other table sums of squares.

MS (Mean Squares) The mean squares provide different estimates of the


population variances. Comparing these variance estimates is the basis of
analysis of variance.

The mean square for each factor:

    (sum of squares for the factor) / (degrees of freedom for the factor) = SS_factor / DF_factor = MS_factor

is an estimate of the variance of the underlying population computed from the variability between levels of the factor.

The interaction mean square:

    (sum of squares for the interaction) / (degrees of freedom for the interaction) = SS_inter / DF_inter = MS_inter

is an estimate of the variance of the underlying population computed from the variability associated with the interactions of the factors.

The error mean square (residual, or within groups):

    (error sum of squares) / (error degrees of freedom) = SS_error / DF_error = MS_error



is an estimate of the variability in the underlying population, computed from
the random component of the observations.

F Statistic The F test statistic is provided for comparisons within each factor
and between the factors.

The F ratio to test each factor is:

    (mean square for the factor) / (mean square of the error) = MS_factor / MS_error = F_factor

The F ratio to test the interaction is:

    (mean square for the interaction) / (mean square of the error) = MS_inter / MS_error = F_inter

If the F ratio is around 1, you can conclude that there are no significant
differences between factor levels or that there is no interaction between
factors (i.e., the data groups are consistent with the null hypothesis that all
the samples were drawn from the same population).

If F is a large number, you can conclude that at least one of the samples for
that factor or combination of factors was drawn from a different population
(i.e., the variability is larger than what is expected from random variability in
the population). To determine exactly which groups are different, examine
the multiple comparison results (see page 305).
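As with the two factor case, these quantities can be checked against a general statistics package. The sketch below is an illustration using the Python statsmodels library (not SigmaStat itself); it arranges the example values from Table 9-7 as indexed data with hypothetical column names and fits the full three factor model with all interactions.

    # Sketch of a full three factor ANOVA table (all main effects and interactions)
    # using statsmodels, with the Table 9-7 example values arranged as indexed data.
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rows = []
    value = 1
    for rep in range(3):                          # three observations per cell
        for gender in ("Male", "Female"):
            for drug in ("Drug A", "Drug B"):
                for day in (1, 2, 3):
                    rows.append((gender, drug, day, value))
                    value += 1
    df = pd.DataFrame(rows, columns=["gender", "drug", "day", "reaction"])

    # Fit all main effects and interactions; the table lists DF, SS, F, and P per term.
    model = smf.ols("reaction ~ C(gender) * C(drug) * C(day)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))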

P Value The P value is the probability of being wrong in concluding that


there is a true difference between the groups (i.e., the probability of falsely
rejecting the null hypothesis, or committing a Type I error, based on F). The
smaller the P value, the greater the probability that the samples are drawn
from different populations.

Traditionally, you can conclude there are significant differences if P < 0.05.

Power The power, or sensitivity, of a Three Way ANOVA is the probability that the
test will detect the observed difference among the groups if there really is a
difference. The closer the power is to 1, the more sensitive the test. The power for the comparison of the groups within the three factors and the power for the comparison of the interactions are all displayed. These results are set
in the Options for Three Way ANOVA dialog box (see page 291).



ANOVA power is affected by the sample sizes, the number of groups being compared, the chance of erroneously reporting a difference α (alpha), the observed differences of the group means, and the observed standard deviations of the samples.

Alpha (α) Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error also is called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true).

The α value is set in the Options for Three Way ANOVA dialog box; the suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).

Summary Table The least square means and standard error of the means are displayed for each
factor separately (summary table row and column), and for each combination
of factors (summary table cells). If there are missing values, the least square
means are estimated using a general linear model.

Mean The average value for the column. If the observations are normally
distributed the mean is the center of the distribution.

Standard Error of the Mean A measure of the approximation with which


the mean computed from the sample approximates the true population
mean.

When there are no missing data, the least square means equal the cell and marginal (row and column) means. When there are missing data, the least square means provide the best estimate of these values, using a general linear model. These means and standard errors are used when performing multiple comparisons (see below).

Multiple If a difference is found among the groups, multiple comparison tables can be
Comparisons computed. Multiple comparison procedures are activated in the Options for
Three Way ANOVA dialog box (see page 292). The tests used in the
multiple comparisons are set in the Multiple Comparisons Options dialog
box (see page 298).

Multiple comparison results are used to determine exactly which groups are different, since the ANOVA results only inform you that two or more of the groups are different. Three factor multiple comparisons for a full Three Way ANOVA also compare:

➤ Groups within each factor without regard to the other factors (this is a marginal comparison, i.e., only the columns or rows in the table are compared).
➤ All combinations of factors (all cells in the table are compared with each
other).

The specific type of multiple comparison results depends on the comparison test used and whether the comparison was made pairwise or versus a control. Note that:

➤ All pairwise comparison results list comparisons of all possible combinations of group pairs; the all pairwise tests are the Tukey, Student-Newman-Keuls, Fisher LSD, Duncan's, and Bonferroni t-tests.
➤ Comparisons versus a single control group list only comparisons with
the selected control group. The control group is selected during the
actual multiple comparison procedure. The comparison versus a control
tests are a Bonferroni t-test and Dunnett's test.

For descriptions of the derivations of three way multiple comparison


procedure results, you can reference any appropriate statistics reference. For a
list of suggested references, see page 10.

Bonferroni t-test Results The Bonferroni t-test lists the differences of the
means for each pair of groups, computes the t values for each pair, and
displays whether or not P < 0.05 for that comparison. The Bonferroni t-test
can be used to compare all groups or to compare versus a control.

You can conclude from “large” values of t that the difference of the two
groups being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of
erroneously concluding that there is a significant difference is less than 5%.
If it is greater than 0.05, you cannot confidently conclude that there is a
difference.

The Difference of Means is a gauge of the size of the difference between the
levels or cells being compared.

The degrees of freedom DF for the marginal comparisons are a measure of
the number of groups (levels) within the factor being compared. The degrees
of freedom when comparing all cells is a measure of the sample size after
accounting for the factors and interaction. This is the same as the error or
residual degrees of freedom.
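
For orientation, the logic of a Bonferroni-style comparison can be sketched outside SigmaStat as unadjusted pairwise t-tests whose P values are multiplied by the number of comparisons. This is a rough illustration only, assuming SciPy is available and using hypothetical data; SigmaStat's own calculation is based on the ANOVA error term and may differ in detail.

    # Illustrative sketch only: pairwise t-tests with a Bonferroni adjustment.
    from itertools import combinations
    from scipy.stats import ttest_ind

    groups = {"A1": [3.1, 2.9, 3.4], "A2": [4.0, 4.2, 3.8], "A3": [5.1, 4.9, 5.3]}
    pairs = list(combinations(groups, 2))
    for g1, g2 in pairs:
        t, p = ttest_ind(groups[g1], groups[g2])
        p_adj = min(p * len(pairs), 1.0)      # Bonferroni correction
        print(g1, "vs", g2, round(t, 3), round(p_adj, 4))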

Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Dunnett's


Test Results The Tukey, Student-Newman-Keuls (SNK), Fisher LSD, and
Duncan’s tests are all pairwise comparisons of every combination of group
pairs. While the Tukey, Fisher LSD, and Duncan's tests can be used to compare a
control group to other groups, they are not recommended for this type of
comparison.

Dunnett's test only compares a control group to all other groups. All tests
compute the q test statistic and display whether or not P < 0.05 for that pair
comparison.

You can conclude from “large” values of q that the difference of the two
groups being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of being
incorrect in concluding that there is a significant difference is less than 5%.
If it is greater than 0.05, you cannot confidently conclude that there is a
difference.

p is a parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the group means being compared. Group means are ranked in order from largest to smallest, and p is the number of means spanned in the comparison. For example, when comparing four means, comparing the largest to the smallest p = 4, and when comparing the second smallest to the smallest p = 2.
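
A minimal sketch of how p is counted for any two groups, using hypothetical group means:

    # Illustrative sketch only: counting the parameter p for a comparison.
    means = {"A": 12.1, "B": 9.8, "C": 11.0, "D": 8.5}    # hypothetical means
    ranked = sorted(means, key=means.get, reverse=True)    # largest to smallest

    def p_span(g1, g2):
        i, j = ranked.index(g1), ranked.index(g2)
        return abs(i - j) + 1        # number of means spanned, inclusive

    print(p_span("A", "D"))          # largest vs. smallest of four means -> 4
    print(p_span("C", "B"))          # two adjacent means -> 2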

If a group is found to be not significantly different than another group, all


groups with p ranks in between the p ranks of the two groups that are not
different are also assumed not to be significantly different, and a result of
DNT (Do Not Test) appears for those comparisons.

The Difference of Means is a gauge of the size of the difference between the
groups or cells being compared.

The degrees of freedom DF for the marginal comparisons are a measure of the number of groups (levels) within the factor being compared. The degrees of freedom when comparing all cells is a measure of the sample size after accounting for the factors and interaction (this is the same as the error or residual degrees of freedom).

Three Way ANOVA Report Graphs 10

You can generate up to three graphs using the results from a Three Way
ANOVA. They include a:

➤ Histogram of the residuals.


➤ Normal probability plot of the residuals.
➤ Multiple comparison graphs.

Histogram of The Three Way ANOVA histogram plots the raw residuals in a specified
Residuals range, using a defined interval set. The residuals are divided into a number of
evenly incremented histogram intervals and plotted as histogram bars
indicating the number of residuals in each interval. The X axis represents the
histogram intervals, and the Y axis represents the number of residuals in each
group. For an example of a histogram, see page 182.

Probability Plot The Three Way ANOVA probability plot graphs the frequency of the raw
residuals. The residuals are sorted and then plotted as points around a curve
representing the area of the gaussian plotted on a probability axis. Plots with
residuals that fall along the Gaussian curve indicate that your data was taken from a normally distributed population. The X axis is a linear scale representing the residual values. The Y axis is a probability scale representing the cumulative frequency of the residuals. For an example of a probability plot, see page 182.
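
Both residual plots can also be recreated outside SigmaStat for a quick check. A minimal sketch, assuming matplotlib and SciPy are available and using hypothetical residual values:

    # Illustrative sketch only: histogram and normal probability plot of residuals.
    import matplotlib.pyplot as plt
    from scipy import stats

    residuals = [-1.2, 0.4, 0.8, -0.3, 1.1, -0.7, 0.2, 0.5]   # hypothetical
    fig, (ax1, ax2) = plt.subplots(1, 2)
    ax1.hist(residuals, bins=5)                   # histogram of residuals
    ax1.set_xlabel("Residual")
    ax1.set_ylabel("Count")
    stats.probplot(residuals, dist="norm", plot=ax2)   # normal probability plot
    plt.show()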

Multiple The Three Way ANOVA multiple comparison graphs plot significant
Comparison Graphs differences between levels of a significant factor. There is one graph for every
significant factor reported by the specified multiple comparison test. If there
is one significant factor reported, one graph appears; if there are two
significant factors, two graphs appear, etc. If a factor is not reported as
significant, a graph for the factor does not appear. For an example of a
multiple comparison graph, see page 182.

Creating a To generate a Three Way ANOVA report graph:


Report Graph
1 Click the toolbar button, or choose the Graph menu Create Graph
command when the Three Way ANOVA report is selected. The Create Graph dialog box appears displaying the types of graphs available for the Three Way ANOVA report.

FIGURE 9–49
The Create Graph Dialog
Box for a Three Way ANOVA
Report

2 Select the type of graph you want to create from the Graph Type list,
then click OK, or double-click the desired graph in the list. For more
information on each of the graph types, see Chapter 8. The specified
graph appears in a graph window or in the report.

FIGURE 9–50
A Normal Probability Plot
for a Three Way ANOVA

For information on manipulating graphs, see pages 8-178 through 8-202.

Kruskal-Wallis Analysis of Variance on Ranks 10

A Kruskal-Wallis ANOVA (analysis of variance) on Ranks should be used when:

➤ You want to see if three or more different experimental groups are affected by a single factor.
➤ Your samples are drawn from non-normal populations or do not have equal variances.

If you know that your data were drawn from normal populations with equal
variances, use One Way ANOVA. When there are only two groups to
compare, do a Mann-Whitney Rank Sum Test. There is no two or three
factor test for non-normal populations; however, you can transform your data
using Transform menu commands so that it fits the assumptions of a
parametric test. For more information on transforming your data, see
Chapter 14, USING TRANSFORMS.

! If you selected normality testing in the Options for ANOVA on Ranks dialog box (see page 313) and you perform an ANOVA on Ranks on a normally distributed population, SigmaStat informs you that the data is suitable for a parametric test and suggests a One Way ANOVA instead.

About the The Kruskal-Wallis Analysis of Variance on Ranks compares several different
Kruskal-Wallis experimental groups that receive different treatments. This design is
ANOVA on Ranks essentially the same as a Mann-Whitney Rank Sum Test, except that there are
more than two experimental groups. If you try to perform an ANOVA on
Ranks on two groups, SigmaStat tells you to perform a Rank Sum Test
instead.

The null hypothesis you test is that there is no difference in the distribution
of values between the different groups.

The Kruskal-Wallis ANOVA on Ranks is a nonparametric test that does not


require assuming all the samples were drawn from normally distributed
populations with equal variances.
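
For readers who want to check a result outside SigmaStat, the same null hypothesis can be tested with a general-purpose statistics library. A minimal sketch, assuming SciPy is available and using hypothetical data values:

    # Illustrative sketch only: Kruskal-Wallis ANOVA on Ranks for three groups.
    from scipy.stats import kruskal

    group1 = [2.1, 2.4, 1.9, 2.6]
    group2 = [3.0, 3.3, 2.8, 3.5]
    group3 = [2.2, 2.0, 2.5, 2.3]
    H, p = kruskal(group1, group2, group3)   # tie-corrected H and its P value
    print(round(H, 3), round(p, 4))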

Performing an To perform an ANOVA on Ranks:


ANOVA on Ranks
1 Enter or arrange your data appropriately in the worksheet
(see following section).

2 If desired, set the ANOVA on Ranks options using the Options for
ANOVA on Ranks dialog box (page 313).

3 Select ANOVA on Ranks from the toolbar drop-down list, or choose


the Compare Many Groups, ANOVA on Ranks... command from the
Statistics menu.

4 Run the test by selecting the worksheet columns with the data you want
to test using the Pick Columns dialog box (page 316).

5 Specify the multiple comparisons you want to perform on your test


(page 324).

6 View and interpret the ANOVA on Ranks report and generate report graphs (pages 322 and 326).

Arranging ANOVA on Ranks Data 10

The format of the data to be tested can be raw data or indexed data. Raw
data is placed in as many columns as there are groups, up to 64; each column
contains the data for one group. Indexed data is placed in two worksheet
columns with at least three treatments. If you have less than three treatments
you should use the Rank Sum Test (see page 220).

For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for Contingency Tables
on page 69.
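
The difference between the two layouts can be illustrated with a small sketch that converts indexed data (a factor column paired with a data column) into raw format, one list per group. This assumes pandas is available and uses hypothetical group names and values; inside SigmaStat you would simply arrange the worksheet columns as shown in Figure 9–51.

    # Illustrative sketch only: indexed (factor + data) layout converted to raw.
    import pandas as pd

    indexed = pd.DataFrame({
        "Group": ["Control", "Control", "Drug", "Drug", "Placebo", "Placebo"],
        "Value": [4.1, 3.8, 5.2, 5.6, 4.0, 4.3],
    })
    raw = {name: sub["Value"].tolist() for name, sub in indexed.groupby("Group")}
    print(raw)   # {'Control': [4.1, 3.8], 'Drug': [5.2, 5.6], 'Placebo': [4.0, 4.3]}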

Selecting When running an ANOVA on Ranks you can either:


Data Columns
➤ Select the columns to test by dragging your mouse over the columns
before choosing the test, or
➤ Select the columns while running the test (page 99).

Setting the ANOVA on Ranks Options 10

Use the ANOVA on Ranks options to:

➤ Adjust the parameters of the test to relax or restrict the testing of your
data for normality and equal variance.
➤ Enable multiple comparison testing.
➤ Display the summary table.

FIGURE 9–51
Valid Data Formats for
an ANOVA on Ranks

Columns 1 through 3 are


arranged as raw data.
Columns 4 and 5 are
arranged as indexed data,
with column 4 as the factor
column and column 5 as the
data column.

To change the ANOVA on Ranks options:

1 If you are going to run the test after changing test options, and want to
select your data before you run the test, drag the pointer over your data.

2 To open the Options for ANOVA on Ranks dialog box, select ANOVA
on Ranks from the toolbar drop-down list, then click the button, or
choose the Statistics menu Current Test Options... command. The
Normality and Equal Variance options appear (see Figure 9–52 on page
313).

3 Click the Results tab to view the Summary Table option (see Figure 9–
53 on page 314) and the Post Hoc Tests tab to view the multiple
comparison option (see Figure 9–54 on page 315). Click the
Assumption Checking tab to return to the Normality and Equal
Variance options.

4 Click a check box to enable or disable a test option. Options settings


are saved between SigmaStat sessions. For more information on each of
the test options, see page 324.

5 To continue the test, click Run Test. The Pick Columns dialog box
appears (see page 99 for more information).

6 To accept the current settings and close the options dialog box, click
OK. To accept the current setting without closing the options dialog

box, click Apply. To close the dialog box without changing any settings
or running the test, click Cancel.

! You can click Help at any time to access SigmaStat’s on-line help system.

Normality and Select the Assumption Checking tab from the options dialog box to view the
Equal Variance Normality and Equal Variance options. The normality assumption test
Assumptions checks for a normally distributed population. The equal variance assumption
test checks the variability about the group means.

Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test for


a normally distributed population.

FIGURE 9–52
The Options for
ANOVA on Ranks Dialog Box
Displaying the Assumption
Checking Options

Equal Variance Testing SigmaStat tests for equal variance by checking the
variability about the group means.

P Values for Normality and Equal Variance Enter the corresponding P


value in the P Value to Reject box. The P value determines the probability of
being incorrect in concluding that the data is not normally distributed (the P
value is the risk of falsely rejecting the null hypothesis that the data is
normally distributed). If the P value computed by the test is greater than the
P set here, the test passes.

To require a stricter adherence to normality and/or equal variance,


increase the P value. Because the parametric statistical methods are relatively
robust in terms of detecting violations of the assumptions, the suggested
value in SigmaStat is 0.050. Larger values of P (for example, 0.100) require
less evidence to conclude that data is not normal.

To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.
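
The pass/fail logic described above can be mimicked outside SigmaStat with generic assumption checks. The following is a rough sketch only, assuming SciPy is available and using hypothetical data; it is not SigmaStat's exact implementation of either test.

    # Illustrative sketch only: normality and equal variance checks.
    import statistics
    from scipy.stats import kstest, levene

    group1 = [4.1, 3.8, 5.2, 4.6, 4.0]
    group2 = [6.3, 5.9, 6.8, 6.1, 6.5]

    # Kolmogorov-Smirnov check of one group against a fitted normal distribution
    ks_stat, ks_p = kstest(group1, "norm",
                           args=(statistics.mean(group1), statistics.stdev(group1)))
    # Levene test on the medians for equal variance between the groups
    lev_stat, lev_p = levene(group1, group2, center="median")

    print("normality:", "passes" if ks_p > 0.050 else "fails")
    print("equal variance:", "passes" if lev_p > 0.050 else "fails")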

! There are extreme conditions of data distribution that these tests cannot take
into account. For example, the Levene Median test fails to detect differences
in variance of several orders of magnitude. However, these conditions
should be easily detected by simply examining the data without resorting to
the automatic assumption tests.

Summary Table The summary table for an ANOVA on Ranks lists the medians, percentiles, and sample sizes N in the ANOVA on Ranks report. If desired, change the percentile values by editing the boxes. The 25th and the 75th percentiles are the suggested percentiles.

FIGURE 9–53
The Options for
ANOVA on Ranks Dialog Box
Displaying the Summary
Table Option

Multiple Comparison Select the Post Hoc Test tab in the Options dialog box to view the multiple
Options comparisons options. An ANOVA on Ranks tests the hypothesis of no
differences between the several treatment groups, but does not determine
which groups are different, or the size of these differences. Multiple
comparisons isolate these differences.

The P value used to determine if the ANOVA detects a difference is set in the
Report Options dialog box. If the P value produced by the ANOVA on
Ranks is less than the P value specified in the box, a difference in the groups
is detected and the multiple comparisons are performed. For more
information on specifying a P value for the ANOVA, see Setting Report
Options on page 135.

Performing Multiple Comparisons You can choose to always perform
multiple comparisons or to only perform multiple comparisons if the
ANOVA on Ranks detects a difference.

Select the Always Perform option to perform multiple comparisons whether


or not the ANOVA detects a difference.

FIGURE 9–54
The Options for ANOVA on
Ranks Dialog Box
Displaying the Multiple
Comparison Options

Select the Only When ANOVA P Value is Significant option to perform


multiple comparisons only if the ANOVA detects a difference.

Select a value from the Significance Value for Multiple Comparisons drop-down list. This value determines the likelihood of the multiple comparison being incorrect in concluding that there is a significant difference in the treatments.

A value of .05 indicates that the multiple comparisons will detect a difference
if there is less than 5% chance that the multiple comparison is incorrect in
detecting a difference.

! If multiple comparisons are triggered, the Multiple Comparison Options


dialog box appears after you pick your data from the worksheet and run the
test, prompting you to choose a multiple comparison method. See page 314
for more information.

! Note that because no statistical test eliminates uncertainty, multiple


comparison tests sometimes produce ambiguous groupings.

Running an ANOVA on Ranks 10

To run a test, you need to select the data to test. The Pick Columns dialog
box is used to select the worksheet columns with the data you want to test
and to specify the format of the data you are testing.

To run an ANOVA on Ranks:

1 If you want to select your data before you run the test, drag the pointer
over your data.

2 Open the Pick Columns for ANOVA on Ranks dialog box to start the
ANOVA on Ranks. You can either:

➤ Select ANOVA on Ranks from the toolbar drop-down list, then click
the button.
➤ Choose the Statistics menu Compare Many Groups, ANOVA on
Ranks... command.
➤ Click the Run Test button from the Options for ANOVA on Ranks dialog box (see page 313).

The Pick Columns dialog box appears prompting you to specify a data
format.

3 Select the appropriate data format from the Data Format drop-down
list. If your data is grouped in columns, select Raw. If your data is in
the form of a group index column(s) paired with a data column(s),
select Indexed.

For more information on arranging data, see Data Format for Group
Comparison Tests on page 204, or Arranging Data for Contingency
Tables on page 69.

FIGURE 9–55
The Pick Columns
for ANOVA on
Ranks Dialog Box
Prompting You to
Specify A Data Format

4 Click Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in the
Selected Columns list.

5 To assign the desired worksheet columns to the Selected Columns list,


select the columns in the worksheet, or select the columns from the
Data for Data drop-down list.

The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list.

The number or title of selected columns appears in each row. You are prompted to pick a minimum of two and a maximum of 64 columns for raw data, or two columns with at least three treatments for indexed data. If you have fewer than three treatments, a message appears telling you to use the Rank Sum Test (see page 220).

FIGURE 9–56
The Pick Columns
for ANOVA on Ranks Dialog
Box
Prompting You to
Select Data Columns

6 To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.

7 If you elected to test for normality and equal variance, and your data fails either test, either continue or transform your data, then perform the ANOVA on Ranks on the transformed data. For information on how to transform data, see Chapter 14, USING TRANSFORMS.

8 Click Finish to perform the ANOVA on Ranks. The ANOVA on Ranks


report appears (see page 322) if you:

➤ Elected to test for normality and equal variance, and your data passes
both tests.

➤ Selected not to perform multiple comparisons, or if you selected to run multiple comparisons only when the P value is significant, and the P value is not significant (see page 304).

To edit the report, use the Format menu commands; for information
on editing reports, see page 137 in the WORKING WITH REPORTS
chapter.

9 If the P value for multiple comparisons is significant, or you selected to always perform multiple comparisons, the Multiple Comparisons Options dialog box appears prompting you to select a multiple comparison method. For more information on selecting a multiple comparison method, see Multiple Comparison Options below.

Multiple Comparison Options 10

If you selected to run multiple comparisons only when the P value is significant, and the ANOVA on Ranks produces a P value equal to or less than the trigger P value, or you selected to always run multiple comparisons in the Options for ANOVA on Ranks dialog box (see page 315), the Multiple Comparison Options dialog box appears prompting you to specify a multiple comparison test.

This dialog box displays the P value produced by the ANOVA on Ranks. The comparison option is selected only if this P value is less than or equal to the value set in the Options dialog box. You can disable multiple comparison testing by clicking the selected option. If the option is not selected, multiple comparison results are not reported.

There are four multiple comparison tests to choose from for the ANOVA on
Ranks. You can choose to perform the:

➤ Dunn’s Test
➤ Dunnett’s Test
➤ Tukey Test
➤ Student-Newman-Keuls Test

There are two types of multiple comparison available for the ANOVA on Ranks. The types of comparison you can make depend on the selected multiple comparison test:

➤ All pairwise comparisons test the difference between every possible pair of treatment groups.
➤ Multiple comparisons versus a control test the difference between each treatment group and a single control group.

Tukey Test The Tukey Test is the suggested all pairwise comparison test unless you have missing values (see the Dunn's test for data with empty cells). It uses a table of critical values that is computed based on a mathematical model of the probability structure of the multiple comparisons. The Tukey Test is more conservative than the Student-Newman-Keuls test, because it controls the errors of all comparisons simultaneously, while the Student-Newman-Keuls test controls errors among tests of k means. Because it is more conservative, it is less likely to determine that a given difference is statistically significant, and it is the recommended test for all pairwise comparisons.

While Multiple Comparisons vs. a Control is an available comparison type for the Tukey Test, it is not recommended. Use Dunnett's Test for multiple comparisons vs. a control.

Student-Newman- Like the Tukey Test, the Student-Newman-Keuls test is an all pairwise
Keuls (SNK) Test comparison test that uses a table of critical values that is computed based on a mathematical model of the probability structure of the multiple comparisons. The Student-Newman-Keuls Test is less conservative than the Tukey Test, because the Tukey test controls the errors of all comparisons simultaneously, while the Student-Newman-Keuls test controls errors among tests of k means. Because it is less conservative, it is more likely to determine that a given difference is statistically significant.

The nonparametric SNK test requires equal sample sizes. Use Dunn's test if
the sample sizes are not equal; SigmaStat automatically selects this test when
sample sizes are unequal.

Dunnett's Test Dunnett's test is the analog of the SNK test for the case of multiple
comparisons against a single control group. The nonparametric Dunnett's
test requires equal sample sizes. Use Dunn's test if the sample sizes are not
equal; SigmaStat automatically selects this test when sample sizes are unequal.

Dunn's Test Dunn's test must be used for ANOVA on Ranks when the sample sizes in the
different treatment groups are different. You can perform both all pairwise
comparisons and multiple comparisons versus a control with the Dunn’s test.
The all pairwise Dunn’s test is the default for data with missing values.

Performing a The multiple comparison you choose to perform depends on the treatments
Multiple Comparison you are testing. Click Cancel if you do not want to perform a multiple comparison procedure.

1 Multiple comparisons are performed of the factors selected under the


Select Factors to Compare heading. The factors with P values less than
or equal to the value set in the Options dialog box are automatically
selected, and the P values for the selected factors, and/or the
interactions of the factors are displayed in the upper left corner of the
dialog box. If the P value is greater than the P value set in the Options
dialog box, the factor is not selected, the P value for the factor is not
displayed, and multiple comparisons are not performed for the factor.

You can disable multiple comparison testing for a factor by clicking the
selected option.

FIGURE 9–57
The Multiple Comparison
Options Dialog Box for
the ANOVA on Ranks

2 Select the desired multiple comparison test from the Suggested Test drop-down list. The Tukey and Student-Newman-Keuls tests are recommended for determining the differences among all treatments when your sample sizes are equal. To perform an all pairwise comparison with unequal sample sizes, select Dunn's test.

3 Select Dunnett's to determine the differences between the experimental groups and a control group when your sample sizes are equal. To perform a comparison vs. a control group with unequal sample sizes, select Dunn's test.

! Note that in both cases SigmaStat defaults to Dunn’s test when your
sample sizes are unequal. You must use Dunn’s test for unequal sample
sizes.

For more information on each of the multiple comparison tests, see


page 295.

4 Select a Comparison Type. The types of comparisons available depend


on the selected test. All Pairwise compares all possible pairs of
treatments and is available for the Tukey, Student-Newman-Keuls, and
Dunn’s test.

Versus Control compares all experimental treatments to a single control


group and is available for the Dunn’s and Dunnett’s tests. It is not
recommended for the Tukey test. If you select Versus Control, you
must also select the control group from the list of groups.

For more information on each of the multiple comparison tests, see


page 295.

5 If you selected an all pairwise comparison test, click Finish to continue with the ANOVA on Ranks and view the report (see Figure 9–59 on page 323). For information on editing reports, see page 137 in the WORKING WITH REPORTS chapter.

6 If you selected a multiple comparisons versus a control test, click


Next. The Multiple Comparisons Options dialog box prompts you to
select a control group. Select the desired control group from the list,
then click Finish to continue with the ANOVA on Ranks and view the

report (see Figure 9–59 on page 323). For information on editing
reports, see page 137 in the WORKING WITH REPORTS chapter.

FIGURE 9–58
The Multiple Comparison
Options Dialog Box
Prompting You to
Select a Control Group

Interpreting ANOVA on Ranks Results 10

The ANOVA on Ranks report displays the H statistic (corrected for ties) and
the corresponding P value for H. The other results displayed in the report are
enabled and disabled in the Options for ANOVA on Ranks dialog box (see
page 313).

For descriptions of the derivations for ANOVA on Ranks results, you can
reference any appropriate statistics reference. For a list of suggested
references, see page 12.

The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the page up and page down buttons in the formatting toolbar to move one page up or down in the report.

Result Explanations In addition to the numerical results, expanded explanations of the results may
also appear. To turn off this explanatory text, choose the Statistics menu
Report Options... command and click the selected Explain Test Options
check box.

The number of decimal places displayed is also set in the Report Options
dialog box. For more information on setting report options, see page 135 in
Chapter 7.

Normality Test Normality test results display whether the data passed or failed the test of the
assumption that it was drawn from a normal population and the P value

calculated by the test. For nonparametric procedures, this test can fail, since
nonparametric tests do not assume normally distributed source populations.

These results appear unless you disabled normality testing in the Options for
ANOVA on Ranks dialog box (see page 313).

Equal Variance Test Equal Variance test results display whether or not the data passed or failed the
test of the assumption that the samples were drawn from populations with
the same variance and the P value calculated by the test. Nonparametric tests
do not assume equal variances of the source populations.

These results appear unless you disabled equal variance testing in the Options
for ANOVA on Ranks dialog box (see page 313).

FIGURE 9–59
The ANOVA on Ranks
Results Report

Summary Table If you selected this option in the Options for ANOVA on Ranks dialog box,
SigmaStat generates a summary table listing the medians, the percentiles
defined in the Options dialog box, and sample sizes N.

N (Size) The number of non-missing observations for that column or


group.

Missing The number of missing values for that column or group.

Medians The “middle” observation as computed by listing all the observations from smallest to largest and selecting the largest value of the smallest half of the observations. The median observation has an equal number of observations greater than and less than that observation.

Percentiles The two percentile points that define the upper and lower tails
of the observed values.

H Statistic The ANOVA on Ranks test statistic H is computed by ranking all observations from smallest to largest without regard for treatment group. The average values of the ranks for each treatment group are then computed and compared.

For large sample sizes, this value is compared to the chi-square distribution (the estimate of all possible distributions of H) to determine the probability of this H occurring. For small sample sizes, the actual distribution of H is used.
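
For reference, standard statistics texts give the statistic (before the tie correction) in the form below, where N is the total number of observations, k is the number of groups, n_i is the size of group i, and R_i is the sum of the ranks in group i:

    H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N+1)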

If H is small, the average ranks observed in each treatment group are


approximately the same. You can conclude that the data is consistent with
the null hypothesis that all the samples were drawn from the same population
(i.e., no treatment effect).

If H is a large number, the variability among the average ranks is larger than
expected from random variability in the population, and you can conclude
that the samples were drawn from different populations (i.e., the differences
between the groups are statistically significant).

P Value The P value is the probability of being wrong in concluding that there is a true difference in the groups (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on H). The smaller the P value, the greater the probability that the samples are significantly different. Traditionally, you can conclude there are significant differences when P < 0.05.

Multiple If a difference is found among the groups, and you elected to perform
Comparisons multiple comparisons, a table of the comparisons between group pairs is displayed.
Options for ANOVA on Ranks dialog box (see page 292). The test used in
the multiple comparison procedure is selected in the Multiple Comparison
Options dialog box (see page 298).

Multiple comparison results are used to determine exactly which groups are
different, since the ANOVA results only inform you that two or more of the
groups are different. The specific type of multiple comparison results

depends on the comparison test used and whether the comparison was made
pairwise or versus a control. Note that:

➤ All pairwise comparison results list comparisons of all possible


combinations of group pairs: the all pairwise tests are the Tukey, Student-
Newman-Keuls test and Dunn's test.
➤ Comparisons versus a single control list only comparisons with the
selected control group. The control group is selected during the actual
multiple comparison procedure. The comparison versus a control tests
are Dunnett's test and Dunn's test.

For descriptions of the derivations of nonparametric multiple comparison


results, you can reference any appropriate statistics reference. For a list of
suggested references, see page 12.

Tukey, Student-Newman-Keuls, and Dunnett's Test Results The Tukey and Student-Newman-Keuls (SNK) tests are all pairwise comparisons of every combination of group pairs. Dunnett's test only compares a control group to all other groups. All tests compute the q test statistic. They also display the number of rank sums spanned in the comparison p, and display whether or not P < 0.05 (or P < 0.01) for that pair comparison.

You can conclude from “large” values of q that the difference of the two
groups being compared is statistically significant.

If the P value for the comparison is less than 0.05, the probability of being
incorrect in concluding that there is a significant difference is less than 5%.
If it is greater than 0.05, you cannot confidently conclude that there is a
difference.

The Difference of Ranks is a gauge of the size of the real difference between
the two groups.

p is a parameter used when computing q. The larger the p, the larger q needs
to be to indicate a significant difference. p is an indication of the differences
in the ranks of the group means being compared. Group rank sums are
ranked in order from largest to smallest in an SNK or Dunnett’s test, so p is the number of rank sums spanned in the comparison. For example, when comparing four rank sums, comparing the largest to the smallest p = 4, and when comparing the second smallest to the smallest p = 2.

If a group is found to be not significantly different than another group, all
groups with ranks in between the rank sums of the two groups that are not
different are also assumed not to be significantly different, and a result of
DNT (Do Not Test) appears for those comparisons.

Dunn's Test Results Dunn's test is used to compare all groups or to compare versus a control. Dunn's test lists the difference of rank means, computes the Q test statistic, and displays whether or not P < 0.05, for each group pair.

You can conclude from “large” values of Q that the difference of the two
groups being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of being
incorrect in concluding that there is a significant difference is less than 5%.
If it is greater than 0.05, you cannot confidently conclude that there is a
difference.

The Difference of Rank Means is a gauge of the size of the difference between
the two groups.
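
For reference, standard references give Dunn's statistic (without a correction for ties) as the difference of rank means divided by its standard error, where \bar{R}_A and \bar{R}_B are the rank means, n_A and n_B the group sizes, and N the total number of observations:

    Q = \frac{\bar{R}_A - \bar{R}_B}{\sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}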

ANOVA on Ranks Report Graphs 10

You can generate up to three graphs using the results from an ANOVA on
Ranks. They include a:

➤ Point plot of the column data.


➤ Box plot.
➤ Multiple comparison graphs.

Point Plot The ANOVA on Ranks point plot graphs all values in each column as a point
on the graph. If the graph data is indexed, the levels in the factor column are
used as the tick marks for the plot points, and the column titles are used as
the X and Y axis titles. If the graph data is in raw or statistical format, the
column titles are used as the tick marks for the plot points and default X Data
and Y Data axis titles are assigned to the graph.

Box Plot The ANOVA on Ranks box plot graphs each of the groups being tested. The
ends of the boxes define the 25th and 75th percentiles, with a line at the
median and error bars defining the 10th and 90th percentiles.

If the graph data is indexed, the levels in the factor column are used as the
tick marks for the box plot boxes, and the column titles are used as the axis
titles. If the graph data is in raw format, the column titles are used as the tick
marks for the box plot boxes, and default axis titles, X Axis and Y Axis, are
assigned to the graph. For an example of a box plot, see page 182 in the
CREATING AND MODIFYING GRAPHS chapter.
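
The five values a box is drawn from can be computed directly for a quick check. A minimal sketch, assuming NumPy is available and using hypothetical data for one group:

    # Illustrative sketch only: percentiles used by the box plot for one group.
    import numpy as np

    group = np.array([4.1, 3.8, 5.2, 4.6, 4.0, 4.3, 4.9, 4.4, 5.5, 3.6])
    p10, p25, p50, p75, p90 = np.percentile(group, [10, 25, 50, 75, 90])
    print(p10, p25, p50, p75, p90)   # error bars, box ends, and median line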

Multiple The multiple comparison graphs are available for all ANOVA reports. They
Comparison Graphs plot significant differences between levels of a significant factor. There is one
graph for every significant factor reported by the specified multiple
comparison test. If there is one significant factor reported, one graph
appears; if there are two significant factors, two graphs appear, etc. If a factor
is not reported as significant, a graph for the factor does not appear. For an
example of a multiple comparison graph, see page 182 in the CREATING AND
MODIFYING GRAPHS chapter.

Creating a Graph To generate an ANOVA on Ranks report graph:

1 Click the toolbar button, or choose the Graph menu Create Graph command when the ANOVA on Ranks report is selected. The Create Graph dialog box appears displaying the types of graphs available for the ANOVA on Ranks report.

FIGURE 9–60
The Create Graph Dialog
Box
for the ANOVA on
Ranks Report Graph

2 Select the type of graph you want to create from the Graph Type list,
then click OK, or double-click the desired graph in the list. For more
information on each of the graph types, see Chapter 7, Creating and
Modifying Graphs. The specified graph appears in a graph window or
in the report.

For information on manipulating graphs, see pages 178 through 202.

FIGURE 9–61
A Multiple Comparison
Matrix for a
ANOVA on Ranks


9 Comparing Repeated Measurements of the Same Individuals

Use repeated measures procedures to test for differences in the same individuals before and after one or more different treatments or changes in condition.

When comparing random samples from two or more groups consisting of different individuals, use group comparison tests. See Choosing the Procedure to Use on page 103 for more information on when to use the different SigmaStat tests.

About Repeated Measures Tests 10

Repeated measures tests are used to detect significant differences in the mean or median effect of treatment(s) within individuals beyond what can be attributed to random variation of the repeated treatments. Variation among individuals is taken into account, allowing the tests to concentrate on the effect of the treatments rather than on the differences between individuals.

See Choosing the Repeated Measures Test to Use on page 117 for more
information on when to use the different SigmaStat repeated measures
comparisons tests.

Parametric and Parametric tests assume treatment effects are normally distributed with
Nonparametric Tests the same variances (or standard deviations). Parametric tests are based
on estimates of the population means and standard deviations, the
parameters of a normal distribution.


Nonparametric tests do not assume that the treatment effects are


normally distributed. Instead, they perform a comparison on ranks of
the observed effects.

Comparing Individuals Use before and after comparisons to test the effect of a single
Before and After a Single experimental treatment on the same individuals. There are two tests
Treatment available:

➤ The Paired t-test; this is a parametric test.


➤ The Wilcoxon Signed Rank Test; this is a nonparametric test.

Comparing Individuals Use repeated measures procedures to test the effect of more than one
Before and After Multiple experimental treatment on the same individuals. There are three tests
Treatments available:

➤ One Way Repeated Measures ANOVA, a parametric test comparing


the effect of a single series of treatments or conditions.
➤ Two Way Repeated Measures ANOVA, a parametric test comparing
the effect of two factors, where one or both factors are a series of
treatments or conditions.
➤ The Friedman One Way Repeated Measures ANOVA on Ranks, the
nonparametric analog of One Way Repeated Measures ANOVA.

When using one of these procedures to compare multiple treatments,


and you find a statistically significant difference, you can use several
multiple comparison procedures to determine exactly which treatments
had an effect, and the size of the effect. These procedures are described
for each test.

Data Format for Repeated Measures Tests 10

Data can be arranged in the worksheet as:

➤ Columns for each treatment (raw data).


➤ Data indexed to other column(s).

You cannot use the summary statistics for repeated measures tests.
Complete descriptions of data entry and formats can be found in
Chapter 4, USING THE DATA WORKSHEET.


Raw Data To enter data in raw data format, enter the data for each treatment in
separate worksheet columns. You can use raw data for all tests except
Two Way ANOVAs.

FIGURE 9–1
Valid Data Formats
for a Paired t-test
Columns 1 and 2 are
arranged as raw data.
Columns 3, 4, and 5 are
arranged as indexed data,
with column 3 as the subject
column, column 4 as the
factor column, and column 5
as the data column.

The worksheet columns for raw data must be the same length. If a missing
value is encountered, that individual is either ignored or, for parametric
ANOVAs, a general linear model is used to take advantage of all available
data.

Indexed Data Indexed data contains the treatments in one column and the
corresponding data points in another column. A One Way Repeated
Measures ANOVA requires a subject index in a third column. Two Way
Repeated Measures ANOVA requires an additional factor column, for a
total of four columns.

If you plan to compare only a portion of the data, put the treatment
index in the left column, followed by the second factor index (for Two
Way ANOVA only), then the subject index (for Repeated Measures
ANOVA), and finally the data in the rightmost column.

You can index raw data or convert indexed data to raw data using the Edit
menu Index and UnIndex commands.


Paired t-Test 10

The Paired t-test is a parametric statistical method that assumes the


observed treatment effects are normally distributed. A Paired t-test
should be used when:

➤ You want to see if the effect of a single treatment on the same


individual is significant.
➤ The treatment effects (i.e., the changes in the individuals before and
after the treatment) are normally distributed.

If you know that the distribution of the observed effects is non-normal,
use the Wilcoxon Signed Rank Test. If you are comparing the effect of
multiple treatments on the same individuals, do a Repeated Measures
Analysis of Variance.

About the The Paired t-test examines the changes which occur before and after a
Paired t-test single experimental intervention on the same individuals to determine
whether or not the treatment had a significant effect. Examining the
changes rather than the values observed before and after the intervention
removes the differences due to individual responses, producing a more
sensitive, or powerful, test.
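
For readers who want to check a result outside SigmaStat, the same before-and-after comparison can be run with a general-purpose statistics library. A minimal sketch, assuming SciPy is available and using hypothetical measurements:

    # Illustrative sketch only: a paired t-test on hypothetical before/after data.
    from scipy.stats import ttest_rel

    before = [142, 138, 150, 145, 139, 148]
    after = [135, 136, 141, 140, 134, 142]
    t, p = ttest_rel(before, after)   # tests the per-subject changes against zero
    print(round(t, 3), round(p, 4))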

Performing a To perform a Paired t-test:


Paired t-test
1 Enter or arrange your data appropriately in the worksheet
(see following section).

2 If desired, set the Paired t-test options using the Options for Paired
t-test dialog box (page 336).

3 Select Paired t-test from the toolbar drop-down list, or choose


Before and After, Paired t-test from the Statistics menu.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 338).

5 View and interpret the Paired t-test report and generate report
graphs (page 339 and page 343).


Arranging Paired t-test Data 10

The format of the data to be tested can be raw data or indexed data. The
data is placed in two worksheet columns for raw data and three columns
(a subject, factor, and data column) for indexed data. The columns for
raw data must be the same length. If a missing value is encountered,
that individual is ignored. You cannot use statistical summary data for
repeated measures tests.

FIGURE 9–2
Valid Data Formats
for a Paired t-test
Columns 1 and 2 are
arranged as raw data.
Columns 3, 4, and 5 are
arranged as indexed data,
with column 3 as the subject
column and column 4 as the
factor column.

For more information on arranging data, see Data Format for Repeated
Measures Tests on page 330, or Arranging Paired t-test Data on page
333.

Selecting Data Columns When running a Paired t-test, you can either:

➤ Select the columns to test from the worksheet by dragging your


mouse over the columns before choosing the test.
➤ Select the columns while running the test.

Setting Paired t-test Options 10

Use the Paired t-test options to:

➤ Adjust the parameters of a test to relax or restrict the testing of your


data for normality.
➤ Display the statistics summary and the confidence interval for the
data.
➤ Compute the power, or sensitivity, of the test.


To change the Paired t-test options:

1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.

2 To open the Options for Paired t-test dialog box, select Paired t-
test from the toolbar drop-down list, then click the button, or
choose the Statistics menu Current Test Options... command.
The Normality option appears (see Figure 9–3 on page 335).

3 Click the Results tab to view the Summary Table, Confidence


Interval, and Residual options (see Figure 9–4 on page 336), and
the Post Hoc Test tab to view the Power option (see Figure 9–5 on
page 337). Click the Assumption Checking tab to return to the
Normality option.

4 Click a check box to enable or disable a test option. Options


settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 9-334 through
9-336.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 350 for more information).

6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

You can select Help at any time to access SigmaStat’s on-line help
system.

Normality Assumptions Select the Assumption Checking tab from the options dialog box to view
the Normality option. The normality assumption test checks for a
normally distributed population.


Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test


for a normally distributed population.

FIGURE 9–3
The Options for Paired t-test
Dialog Box Displaying the
Assumption Checking
Options

The Equal Variance option is not available for the Paired t-test because
Paired t-tests are based on changes in each individual rather than on different
individuals in the selected population, making equal variance testing
unnecessary.

P Values for Normality Enter the corresponding P value in the P Value


to Reject box. The P value determines the probability of being incorrect
in concluding that the data is not normally distributed (the P value is the
risk of falsely rejecting the null hypothesis that the data is normally
distributed). If the P value computed by the test is greater than the P set
here, the test passes.

To require a stricter adherence to normality, increase the P value.


Because the parametric statistical methods are relatively robust in terms
of detecting violations of the assumptions, the suggested value in
SigmaStat is 0.050. Larger values of P (for example, 0.100) require less
evidence to conclude that data is not normal.

To relax the requirement of normality, decrease P. Requiring smaller values of P to reject the normality assumption means that you are willing to accept greater deviations from the theoretical normal distribution before you flag the data as non-normal. For example, a P value of 0.050 requires greater deviations from normality to flag the data as non-normal than a value of 0.100.

Although the normality test is robust in detecting data from populations that are non-normal, there are extreme conditions of data distribution that this test cannot take into account. However, these conditions should be easily detected by simply examining the data without resorting to the automatic assumption test.

Summary Table Select the Results tab in the options dialog box to view the Summary
Table option. The Summary Table option displays the number of
observations for a column or group, the number of missing values for a
column or group, the average value for the column or group, the
standard deviation of the column or group, and the standard error of the
mean for the column or group.

Confidence Intervals Select the Results tab in the options dialog box to view the Confidence
Intervals option. The Confidence Intervals option displays the
confidence interval for the difference of the means. To change the
interval, enter any number from 1 to 99 (95 and 99 are the most
commonly used intervals). Click the selected check box if you do not
want to include the confidence interval in the report.

Residuals Select the Results tab in the options dialog box to view the Residuals
option. Use the Residuals option to display residuals in the report and to
save the residuals of the test to the specified worksheet column. To
change the column the residuals are saved to, edit the number in the box or select a number from the drop-down list.

FIGURE 9–4
The Options for Paired
t-test Dialog Box Displaying
the Summary Table,
Confidence Intervals,
and Residuals Options

Power Select the Post Hoc Tests tab in the options dialog box to view the Power
option. The power or sensitivity of a test is the probability that the test
will detect a difference between the groups if there is really a difference.

Change the alpha value by editing the number in the Alpha Value box. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.
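
The power described here can be approximated outside SigmaStat from the noncentral t distribution. The following is a rough sketch only, assuming SciPy is available; the effect size and sample size are hypothetical, and SigmaStat's own power calculation may differ in detail.

    # Illustrative sketch only: two-sided power of a paired t-test.
    from scipy.stats import nct, t

    n, d, alpha = 12, 0.8, 0.05    # pairs, effect size (mean diff / SD), alpha
    df = n - 1
    nc = d * n ** 0.5              # noncentrality parameter
    t_crit = t.ppf(1 - alpha / 2, df)
    power = (1 - nct.cdf(t_crit, df, nc)) + nct.cdf(-t_crit, df, nc)
    print(round(power, 3))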

FIGURE 9–5
The Options for Paired
t-test Dialog Box Displaying
the Power Option

Running a Paired t-test 10

To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.

To run a Paired t-test:

1 If you want to select your data before you run the test, drag the
pointer over your data.

2 Open the Pick Columns dialog box to start the Paired t-test. You
can either:

➤ Select Paired t-test from the toolbar drop-down list, then select
the button.
➤ Choose the Statistics menu Before and After command, then
choose Paired t-test.
➤ Click the Run Test button from the Options for Paired t-test
dialog box (see page 337).


The Pick Columns dialog box appears prompting you to specify a


data format.

3 Select the appropriate data format from the Data Format drop-
down list. If your data is grouped in columns, select Raw. If your
data is in the form of a group index column(s) paired with a data
column(s), select Indexed.

FIGURE 9–6
The Pick Columns
for Paired t-test Dialog Box
Prompting You to
Specify a Data Format

For more information on arranging data, see Data Format for


Repeated Measures Tests on page 330, or Arranging Paired t-test
Data on page 333.

4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.

5 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Data drop-down list.

The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns
appears in each row. For raw data you are prompted for two data
columns and for indexed data, you are prompted to select three
worksheet columns.

6 To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.


FIGURE 9–7
The Pick Columns
for Paired t-test Dialog Box
Prompting You to
Select Data Columns

7 Select Finish to perform the test. The report appears displaying


the results of the Paired t-test (see Figure 9–8 on page 340). If you
selected to place the residuals in the worksheet, they appear in the
column specified in the options dialog box (see page 337).

Interpreting Paired t-test Results 10

The Paired t-test report displays the t statistic, degrees of freedom, and P
value for the test. The other results displayed in the report are selected in
the Options for Paired t-test dialog box (see Setting Paired t-test Options
on page 333).

For descriptions of the derivations for paired t-test results, you can reference an appropriate statistics reference. For a list of suggested references, see page 12.

The report scroll bars only scroll to the top and bottom of the current page. To move to the next or the previous page in the report, use the page up and page down buttons in the formatting toolbar to move one page up or down in the report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.

The number of decimal places displayed is also set in the Report


Options dialog box. For more information on setting report options, see
Setting Report Options on page 135.

Normality Test Normality test results display whether the data passed or failed the test of the assumption that the changes observed in each subject are consistent with a normally distributed population, and the P value calculated by the test. A normally distributed source is required for all parametric tests.

This result appears unless you disabled normality testing in the Paired t-
test Options dialog box (see page 335).

FIGURE 9–8
The Paired t-test
Results Report

Summary Table SigmaStat can generate a summary table listing the sample size N,
number of missing values (if any), mean, standard deviation, and
standard error of the means (SEM). This result is displayed unless you
disabled it in the Paired
t-test Options dialog box (see page 336).

N (Size) The number of non-missing observations for that column or


group.

Missing The number of missing values for that column or group.

Mean The average value for the column. If the observations are
normally distributed the mean is the center of the distribution.

Standard Deviation A measure of variability. If the observations are
normally distributed, about two-thirds will fall within one standard
deviation above or below the mean, and about 95% of the observations
will fall within two standard deviations above or below the mean.

Standard Error of the Mean A measure of the precision with
which the mean computed from the sample estimates the true
population mean.

Difference The difference of the group before and after the treatment is described in
terms of the mean of the differences (changes) in the subjects before and
after the treatment, and the standard deviation and standard error of the
mean difference.

The standard error of the mean difference is a measure of the precision


with which the mean difference estimates the true difference in the
underlying population.

t Statistic The t-test statistic is computed by subtracting the value observed before the
intervention from the value observed after the intervention in each
experimental subject. The remaining analysis is conducted on these
differences.

The t-test statistic is the ratio

t = (mean difference of the subjects before and after) / (standard error of the mean difference)

You can conclude from large (greater than about 2) absolute values of t that
the treatment affected the variable of interest (you reject the null
hypothesis of no difference). A large t indicates that the difference in
observed values after and before the treatment is larger than would be
expected from effect variability alone (i.e., that the effect is statistically
significant). A small t (near 0) indicates that there is no significant
difference between the samples (little difference in the means before and
after the treatment).
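
For a concrete illustration of this ratio, the short sketch below uses Python with NumPy and SciPy (tools outside of SigmaStat; the before and after values are made-up data) to compute the same t statistic both directly and with a library routine:

import numpy as np
from scipy import stats

before = np.array([132.0, 145.0, 128.0, 140.0, 150.0, 138.0])  # hypothetical pre-treatment values
after = np.array([128.0, 139.0, 126.0, 134.0, 147.0, 133.0])   # hypothetical post-treatment values

diff = after - before                              # change within each subject
mean_diff = diff.mean()                            # mean difference
sem_diff = diff.std(ddof=1) / np.sqrt(len(diff))   # standard error of the mean difference

t_manual = mean_diff / sem_diff                    # the ratio described above
t_scipy, p_value = stats.ttest_rel(after, before)  # same t statistic and its two-tailed P value
print(t_manual, t_scipy, p_value)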

Degrees of Freedom The degrees of freedom is a measure of the sample


size, which affects the ability of t to detect differences in the mean
effects. As degrees of freedom increase, the ability to detect a difference
with a smaller t increases.

P Value The P value is the probability of being wrong in concluding
that there is a true effect (i.e., the probability of falsely rejecting the null
hypothesis, or committing a Type I error, based on t). The smaller the P
value, the greater the probability that the treatment effect is significant.
Traditionally, you can conclude there is a significant difference when
P < 0.05.

Confidence Interval for If the confidence interval does not include a value of zero, you can
the Difference conclude that there is a significant difference with that level of
of the Means confidence. Confidence can also be described as P < α, where α is the
acceptable probability of incorrectly concluding that there is an effect.

The level of confidence is adjusted in the Options for Paired t-test dialog
box; this is typically 100(1 − α)%, or 95%. Larger values of confidence
result in wider intervals.

This result is displayed unless you disabled it in the options for Paired t-
test dialog box (see page 337).
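
As an illustration of how such an interval is built, the sketch below (Python with SciPy, outside of SigmaStat; the differences and the 95% level are hypothetical) computes a confidence interval for the mean of the paired differences from the t distribution:

import numpy as np
from scipy import stats

diff = np.array([-4.0, -6.0, -2.0, -6.0, -3.0, -5.0])  # after minus before, one value per subject
alpha = 0.05                                            # 95% confidence

mean_diff = diff.mean()
sem_diff = diff.std(ddof=1) / np.sqrt(len(diff))
t_crit = stats.t.ppf(1 - alpha / 2, df=len(diff) - 1)   # two-sided critical t value

ci_low = mean_diff - t_crit * sem_diff
ci_high = mean_diff + t_crit * sem_diff
print(ci_low, ci_high)  # if zero lies outside this interval, the difference is significant at that level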

Power The power, or sensitivity, of a Paired t-test is the probability that the test
will detect a difference between treatments if there really is a difference.
The closer the power is to 1, the more sensitive the test.

Paired t-test power is affected by the sample sizes, the chance of
erroneously reporting a difference α (alpha), the observed differences of
the subject means, and the observed standard deviations of the samples.

This result is displayed unless you disabled it in the Options for Paired t-
test dialog box (see page 337).
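
If you want a rough idea of how these quantities combine, the sketch below (Python with the statsmodels package, assuming its TTestPower class is available; all numbers are hypothetical, and this is not SigmaStat's own power calculation) estimates paired t-test power from the standardized mean difference:

from statsmodels.stats.power import TTestPower

mean_diff, sd_diff, n, alpha = 4.3, 1.7, 6, 0.05    # hypothetical observed values
effect_size = mean_diff / sd_diff                    # standardized effect size of the differences

power = TTestPower().power(effect_size=effect_size, nobs=n, alpha=alpha, alternative='two-sided')
print(power)                                         # values near 1 indicate a sensitive test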

Alpha (α) Alpha (α) is the acceptable probability of incorrectly
concluding that there is a difference. An α error is also called a Type I
error. A Type I error is when you reject the hypothesis of no effect when
this hypothesis is true.

The α value is set in the Options for Paired t-test dialog box; the
suggested value is α = 0.05, which indicates that a one in twenty chance
of error is acceptable. Smaller values of α result in stricter requirements
before concluding there is a significant difference, but a greater
possibility of concluding there is no difference when one exists (a Type
II error). Larger values of α make it easier to conclude that there is a
difference but also increase the risk of seeing a false difference (a Type I
error).


Paired t-test Report Graphs

You can generate up to three graphs using the results from a paired t-test.
They include a:

➤ Line scatter graph of the changes after treatment.


➤ Normal probability plot of the residuals.
➤ Histogram of the residuals.

Before and After The Paired t-test graph uses lines to plot a subject's change after each
Line Graph treatment. If the graph plots raw data, the lines represent the rows in the
column, the column titles are used as the tick marks for the X axis and
the data is used as the tick marks for the Y axis.

If the graph plots indexed data, the lines represent the levels in the
subject column, the levels in the treatment column are used as the tick
marks for the X axis, the data is used as the tick marks for the Y axis, and
the treatment and data column titles are used as the axis titles. For an
example of a before and after graph, see page 159.

Histogram of Residuals The Paired t-test histogram plots the raw residuals in a specified range,
using a defined interval set. The residuals are divided into a number of
evenly incremented histogram intervals and plotted as histogram bars
indicating the number of residuals in each interval. The X axis
represents the histogram intervals, and the Y axis represents the number
of residuals in each group. For an example of a histogram, see page 172.

Probability Plot The Paired t-test probability plot graphs the frequency of the raw
residuals. The residuals are sorted and then plotted as points around a
curve representing the area of the gaussian plotted on a probability axis.
Plots with residuals that fall along the gaussian curve indicate that your data
was taken from a normally distributed population. The X axis is a
linear scale representing the residual values. The Y axis is a probability
scale representing the cumulative frequency of the residuals. For an
example of a probability plot, see page 176.
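
If you want to reproduce a similar plot outside of SigmaStat, the sketch below uses Python with SciPy and matplotlib (the residuals are simulated stand-ins, and the library routine places the theoretical quantiles on the X axis rather than the probability scale described above):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.5, size=20)          # stand-in for the paired t-test residuals

stats.probplot(residuals, dist="norm", plot=plt)   # points near the line suggest normality
plt.xlabel("Theoretical quantiles")
plt.ylabel("Residual value")
plt.show()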

Creating a To generate a graph of Paired t-test report data:


Report Graph
1 Click the toolbar button, or choose the Graph menu Create
Graph command when the t-test report is selected. The Create
Graph dialog box appears displaying the types of graphs available
for the Paired t-test results.

FIGURE 9–9
The Create Graph Dialog
Box for Paired t-test Report
Graphs

2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
For more information on each of the graph types, see Chapter 8.
FIGURE 9–10
A Normal Probability
Plot of the Report Data

For information on modifying graphs, see pages 178 through 202.


Wilcoxon Signed Rank Test

The Signed Rank Test is a nonparametric procedure, which does not


require assuming normality or equal variance. A Signed Rank Test
should be used when:

➤ You want to see if the effect of a single treatment on the same


individual is significant.
➤ The treatment effects are not normally distributed with the same
variances.

If you know that the effects are normally distributed, use the Paired t-
test. When there are multiple treatments to compare, do a Friedman
Repeated Measures ANOVA on Ranks.

Note that, depending on your Signed Rank Test option settings (see page
347), if you attempt to perform a Signed Rank Test on a normal population,
SigmaStat suggests that the data can be analyzed with the more powerful
Paired t-test instead.

About the A Signed Rank Test ranks all the observed treatment differences from
Signed Rank Test smallest to largest without regard to sign (based on their absolute value),
then attaches the sign of each difference to the ranks. The signed ranks
are summed and compared. This procedure uses the size of the
treatment effects and the sign.

If there is no treatment effect, the positive ranks should be similar to the


negative ranks. If the ranks tend to have the same sign, you can
conclude that there was a treatment effect (i.e., that there is a statistically
significant difference before and after the treatment).

The Wilcoxon Signed Rank Test tests the null hypothesis that a treatment has no
effect on the subject.
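
The sketch below walks through the ranking procedure just described using Python with SciPy (not part of SigmaStat; the before and after values are made-up), and also calls scipy.stats.wilcoxon, which reports a related W statistic and its P value:

import numpy as np
from scipy import stats

before = np.array([10.2, 9.8, 11.5, 10.9, 9.4, 10.0])
after = np.array([9.6, 9.9, 10.8, 10.1, 9.0, 9.5])

diff = after - before
ranks = stats.rankdata(np.abs(diff))   # rank the differences by absolute value
signed_ranks = np.sign(diff) * ranks   # re-attach the sign of each difference
print(signed_ranks.sum())              # a sum near zero suggests no treatment effect

w_stat, p_value = stats.wilcoxon(after, before)
print(w_stat, p_value)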

Performing a To perform a Signed Rank Test:


Signed Rank Test
1 Enter or arrange your data appropriately in the data worksheet
(see following section).

2 If desired, set the Signed Rank Test options using the Options for
Signed Rank Test dialog box (page 349).


3 Select Signed Rank Test from the toolbar, or choose the Statistics
menu Before and After command, then choose Signed Rank Test.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 350).

5 View and interpret the Signed Rank Test report and generate
report graphs (pages 9-351 and 9-353).

Arranging Signed Rank Data

The format of the data to be tested can be raw data or indexed data; in
either case, the data is found in two worksheet columns. For more
information on arranging data, see Data Format for Repeated Measures
Tests on page 330, or Arranging Data for Contingency Tables on page
69. For information on how to select the data format for a test, see
Picking Data to Test on page 99.

FIGURE 9–11
Valid Data Formats for a
Wilcoxon Signed Rank Test
Columns 1 and 2 are
arranged as raw data.
Columns 3 and 4 are
arranged as indexed data,
with column 3 as the factor
column.

Selecting When running a Wilcoxon Signed Rank Test you can either:
Data Columns
➤ Select the columns to test by dragging your mouse over the columns
before choosing the test.
➤ Select the columns while running the test.


Setting the Signed Rank Options

Use the Signed Rank Test options to:

➤ Adjust the parameters of the test to relax or restrict the testing of


your data for normality.
➤ Display the summary table.

To change the Signed Rank Test options:

1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.

2 To open the Options for Signed Rank Test dialog box, select
Signed Rank Test from the drop-down list in the toolbar, then
click the button, or choose the Statistics menu Current Test
Options... command. The Normality option appears (see Figure 9–12 on
page 349).

3 Click the Results tab to view the Summary Table option (see
Figure 9–13 on page 349). Click the Assumption Checking tab to
return to the Normality and Equal Variance options.

4 Click a check box to enable or disable a test option. Options


settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 9-348 through
9-349.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 338 for more information).

6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

You can select Help at any time to access SigmaStat’s on-line help
system.


The Normality Select the Assumption Checking tab from the options dialog box to view
Assumption the Normality and Equal Variance options. The normality assumption
test checks for a normally distributed population.

The Equal Variance option is not available for the Signed Rank Test because
Signed Rank Tests are based on changes in each individual rather than on
different individuals in the selected population, making equal variance
testing unnecessary.

Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test


for a normally distributed population.

P Values for Normality Enter the corresponding P value in the P


Value to Reject box. The P value determines the probability of being
incorrect in concluding that the data is not normally distributed (the P
value is the risk of falsely rejecting the null hypothesis that the data is
normally distributed). If the P value computed by the test is greater
than the P set here, the test passes.

To require a stricter adherence to normality, increase the P value.


Because the parametric statistical methods are relatively robust in terms
of detecting violations of the assumptions, the suggested value in
SigmaStat is 0.050. Larger values of P (for example, 0.100) require less
evidence to conclude that data is not normal.

To relax the requirement of normality, decrease P. Requiring smaller


values of P to reject the normality assumption means that you are
willing to accept greater deviations from the theoretical normal
distribution before you flag the data as non-normal. For example, a P
value of 0.050 requires greater deviations from normality to flag the data
as non-normal than a value of 0.100.
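
To see this pass/fail logic in code form, the sketch below uses Python with the statsmodels lilliefors function (a Kolmogorov-Smirnov type normality test; this assumes statsmodels is available and is only an approximation of SigmaStat's internal test) together with a chosen P value to reject:

import numpy as np
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(1)
diff = rng.normal(0.0, 1.0, size=15)   # hypothetical before/after differences

p_to_reject = 0.05                     # the "P Value to Reject" setting
ks_stat, p_value = lilliefors(diff, dist='norm')

passed = p_value > p_to_reject         # the test passes when the computed P exceeds the threshold
print(ks_stat, p_value, "passed" if passed else "failed")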

Although this assumption test is robust in detecting data from populations
that are non-normal, there are extreme conditions of data distribution that
this test cannot take into account. However, these conditions should be
easily detected by simply examining the data without resorting to the
automatic assumption test.
FIGURE 9–12
The Options for Signed
Rank Test Displaying
the Normality Option

Summary Table The summary table for a Signed Rank Test lists the medians, percentiles,
and sample sizes N in the Signed Rank Test report. If desired, change the
percentile values by editing the boxes. The 25th and the 75th
percentiles are the suggested percentiles.

FIGURE 9–13
The Options for Signed
Rank Test Displaying
the Summary Table Option

Running a Signed Rank Test

To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.

To run a Signed Rank Test:

1 If you want to select your data before you run the test, drag the
pointer over your data.


2 Open the Pick Columns dialog box to start the Signed Rank Test.
You can either:

➤ Select Signed Rank Test from the toolbar drop-down list, then
select the button.
➤ Choose the Statistics menu Before and After, Signed Rank
Test... command.
➤ Click the Run button from the Options for Signed Rank Test
dialog box (see page 349).

The Pick Columns dialog box appears prompting you to specify a


data format.

3 Select the appropriate data format from the Data Format drop-
down list. If your data is grouped in columns, select Raw. If your
data is in the form of a group index column(s) paired with a data
column(s), select Indexed.

For more information on arranging data, see Data Format for


Repeated Measures Tests on page 330, or Arranging Paired t-test
Data on page 333.

FIGURE 9–14
The Pick Columns
for Signed Rank Test
Dialog Box Prompting You
to Specify a Data Format

4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.

5 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Data drop-down list.

The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list.


The number or title of selected columns appears in each row. You
are prompted to pick two columns for raw data and three columns
for indexed data.

6 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

7 Select Finish to perform the test. If you elected to test for


normality, SigmaStat performs the test for normality
(Kolmogorov-Smirnov). If your data pass the test, SigmaStat
informs you and suggests continuing your analysis using a Paired t-
test.

When the test is complete, the report appears displaying the results
of the Signed Rank Test (see Figure 9–15 on page 352).

Interpreting Signed Rank Test Results

The Signed Rank Test computes the Wilcoxon W statistic and the P
value
for W. Additional results to be displayed are selected in the Options for
Signed Rank Test dialog box (see Setting the Signed Rank Options on
page 347).

For descriptions of the derivations for Wilcoxon Signed Rank Test


results, you can reference an appropriate statistics reference. For a list of
suggested references, see page 12.

The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.


The number of decimal places displayed is also set in the Report


Options dialog box. For more information on setting report options, see
Setting Report Options on page 135.

FIGURE 9–15
The Wilcoxon
Signed
Rank Test Results
Report

Normality Test Normality test results display whether the data passed or failed the test of
the assumption that the difference of the treatment originates from a
normal distribution, and the P value calculated by the test. For
nonparametric procedures this test can fail, since nonparametric tests do
not require normally distributed source populations. This result appears
unless you disabled normality testing in the Options for Signed Rank
Test dialog box (see page 349).

Summary Tables SigmaStat generates a summary table listing the sample sizes N, number
of missing values (if any), medians, and percentiles. All of these results
are displayed in the report unless you disable them in the Signed Rank
Test Options dialog box (see page 349).

N (Size) The number of non-missing observations for that column or


group.

Missing The number of missing values for that column or group.

Medians The “middle” observation as computed by listing all the


observations from smallest to largest and selecting the largest value of the
smallest half of the observations. The median observation has an equal
number of observations greater than and less than that observation.


Percentiles The two percentile points that define the upper and lower
tails of the observed values.

W Statistic The Wilcoxon test statistic W is computed by ranking all the differences
before and after the treatment based on their absolute value, then
attaching the signs of the difference to the corresponding ranks. The
signed ranks are summed and compared.

If the absolute value of W is “large”, you can conclude that there was a
treatment effect (i.e., the ranks tend to have the same sign, so there is a
statistically significant difference before and after the treatment).

If W is small, the positive ranks are similar to the negative ranks, and you
can conclude that there is no treatment effect.

P Value The P value is the probability of being wrong in concluding
that there is a true effect (i.e., the probability of falsely rejecting the null
hypothesis, or committing a Type I error, based on W). The smaller the
P value, the greater the probability that there is a treatment effect.

Traditionally, you can conclude there is a significant difference when
P < 0.05.

Signed Rank Test Report Graphs

You can generate a line scatter graph of the changes after treatment for a
Signed Rank Test report.

Before and After The Signed Rank Test graph uses lines to plot a subject's change after
Line Graph each treatment. If the graph plots raw data, the lines represent the rows
in the column, the column titles are used as the tick marks for the X axis
and the data is used as the tick marks for the Y axis.

If the graph plots indexed data, the lines represent the levels in the
subject column, the levels in the treatment column are used as the tick
marks for the X axis, the data is used as the tick marks for the Y axis, and
the treatment and data column titles are used as the axis titles.

Creating To generate a graph of Signed Rank Test data:


Report Graphs
1 Click the toolbar button, or choose the Graph menu Create
Graph command when the Signed Rank Test report is selected.


The Create Graph dialog box appears displaying the types of


graphs available for the Signed Rank Test results.

FIGURE 9–16
The Create Graph
Dialog Box for the Signed
Rank Test Report

2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
For more information on each of the graph types, see Chapter 8.
The specified graph appears in a graph window or in the report.

For information on manipulating graphs, see pages 8-178 through


8-202.

FIGURE 9–17
A Before & After
Scatter Graph

One Way Repeated Measures Analysis of Variance (ANOVA)

Use a one way or one factor repeated measures ANOVA (analysis of
variance) when:

➤ You want to see if a single group of individuals was affected by a


series of experimental treatments or conditions.
➤ Only one factor or one type of intervention is considered in each
treatment or condition.
➤ The treatment effects are normally distributed with the same
variances.

If you know that the treatment effects are not normally distributed, use
the Friedman Repeated Measures ANOVA on Ranks. If you want to
consider the effects of an additional factor on your experimental
treatments, use Two Way Repeated Measures ANOVA. When there is
only a single treatment, you can do a Paired t-test (depending on the
type of results you want).

Note that, depending on your One Way Repeated Measures ANOVA


options settings (see page 357), if you attempt to perform an ANOVA on a
non-normal population, SigmaStat informs you that the data is unsuitable
for a parametric test, and suggests the Friedman ANOVA on Ranks instead.

About the One A One Way or One Factor Repeated Measures ANOVA tests for
Way Repeated Measures differences in the effect of a series of experimental interventions on the
ANOVA same group of subjects by examining the changes in each individual.
Examining the changes rather than the values observed before and after
interventions removes the differences due to individual responses,
producing a more sensitive (or more powerful) test.

The design for a One Way Repeated Measures ANOVA is essentially the
same as a Paired t-test, except that there can be multiple treatments on
the same group. The null hypothesis is that there are no differences
among all the treatments.

One Way Analysis of Variance is a parametric test that assumes that all
treatment effects are normally distributed with the same standard
deviations (variances).



Performing a One Way To perform a One Way Repeated Measures ANOVA:
Repeated Measures
ANOVA 1 Enter or arrange your data appropriately in the data worksheet
(see the following section).

2 If desired, set the One Way Repeated Measures ANOVA options


using the Options for One Way Repeated Measures ANOVA
dialog box (page 358).

3 Select One Way RM ANOVA from the toolbar, or choose the
Statistics menu Repeated Measures command, then choose One
Way Repeated Measures ANOVA.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 363).

5 Specify the multiple comparisons you want to perform on your


test (page 364).

6 View and interpret the One Way Repeated Measures ANOVA


report and generate report graphs (pages 10-369 and 10-376).

Arranging One Way Repeated Measures ANOVA Data

The format of the data to be tested can be raw data or indexed data.
Raw data is placed in as many columns as there are treatments, up to 64;
each column contains the data for one treatment. The columns for raw
data must be the same length.

Indexed data is placed in three worksheet columns: a treatment index
column, a subject index column, and a data column. You cannot use
statistical summary data for repeated measures tests.
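
If your data are currently in raw form, a reshaping step outside of SigmaStat can produce the indexed layout. The sketch below (Python with pandas; the column names and values are hypothetical) converts one column per treatment into subject, treatment, and data columns:

import pandas as pd

raw = pd.DataFrame({
    "Subject": [1, 2, 3, 4],
    "Control": [5.1, 4.8, 5.6, 5.0],
    "Drug A": [4.2, 4.0, 4.9, 4.4],
    "Drug B": [3.9, 3.7, 4.5, 4.1],
})

indexed = raw.melt(id_vars="Subject", var_name="Treatment", value_name="Response")
print(indexed)   # three columns: subject index, treatment index, and the data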

Selecting Data Columns When running a One Way Repeated Measures ANOVA you can either:

➤ Select the columns to test by dragging your mouse over the columns
before choosing the test.
➤ Select the columns while running the test (see page 99).

Missing Data Points If there are missing values, SigmaStat automatically handles the missing
data by using a general linear model. This approach constructs
hypothesis tests using the marginal sums of squares (also commonly
called the Type III or adjusted sums of squares). However, the columns
must still be equal in length.
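
As a rough outside-of-SigmaStat analog of this approach, the sketch below (Python with pandas and statsmodels, assumed to be available; the data and column names are hypothetical) fits subject and treatment as factors and requests Type III (marginal) sums of squares:

import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "treatment": ["A", "B", "C"] * 4,
    "response": [5.1, 4.2, 3.9, 4.8, 4.0, 3.7, 5.6, 4.9, 4.5, 5.0, 4.4, 4.1],
})

# Sum-to-zero contrasts make the Type III (marginal) tests meaningful.
fit = ols("response ~ C(subject, Sum) + C(treatment, Sum)", data=df).fit()
print(anova_lm(fit, typ=3))   # marginal sums of squares for each factor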



FIGURE 10–20
Valid Data Formats for
a One Way Repeated
Measures ANOVA
Columns 1 through 3 are
arranged as raw data.
Columns 4, 5, and 6 are
arranged as indexed data,
with column 4 as the
treatment index column and
column 5 as the subject index
column.

Setting One Way Repeated Measures ANOVA Options

Use the One Way Repeated Measures ANOVA options to:

➤ Adjust the parameters of the test to relax or restrict the testing of


your data for normality and equal variance.
➤ Display the statistics summary table and the confidence interval for
the data, and assign residuals to a worksheet column.
➤ Enable multiple comparisons.
➤ Compute the power, or sensitivity, of the test.

To change the One Way Repeated Measures ANOVA options:

1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.

2 To open the Options for One Way RM ANOVA dialog box,
select One Way RM ANOVA from the toolbar drop-down list,
then click the button, or choose the Statistics menu Current
Test Options... command. The Normality and Equal Variance
options appear (see Figure 10–21 on page 358).

3 Click the Results tab to view the Summary Table, Confidence


Intervals, and Residuals in Column options (see Figure 10–22 on
page 360). Click the Post Hoc Test tab to view the Power and
Multiple Comparisons options (see Figure 10–23 on page 361).
Click the Assumption Checking tab to return to the Normality
and Equal Variance options.



4 Click a check box to enable or disable a test option. Options
settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 10-358 through
10-361.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 363 for more information).

6 To accept the current settings and close the options dialog box,
click OK. To accept the current settings without closing the
options dialog box, click Apply. To close the dialog box
without changing any settings or running the test, click Cancel.

You can select Help at any time to access SigmaStat’s on-line help
system.

Normality and Select the Assumption Checking tab from the options dialog box to
Equal Variance view the Normality and Equal Variance options. The normality
Assumptions assumption test checks for a normally distributed population. The equal
variance assumption test checks the variability about the group means.

Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test


for a normally distributed population.

FIGURE 10–21
The Options for
One Way RM ANOVA
Dialog Box Displaying
the Assumption
Checking Options

Equal Variance Testing SigmaStat tests for equal variance by checking


the variability about the group means.

Enter the corresponding P value in the P Value to Reject box. The P


value determines the probability of being incorrect in concluding that
the data is not normally distributed (the P value is the risk of falsely
rejecting the null hypothesis that the data is normally distributed). If

the P value computed by the test is greater than the P set here, the test
passes.

To require a stricter adherence to normality and/or equal variance,


increase the P value. Because the parametric statistical methods are
relatively robust in terms of detecting violations of the assumptions, the
suggested value in SigmaStat is 0.050. Larger values of P (for example,
0.100) require less evidence to conclude that data is not normal.

To relax the requirement of normality and/or equal variance,


decrease P. Requiring smaller values of P to reject the normality
assumption means that you are willing to accept greater deviations from
the theoretical normal distribution before you flag the data as non-
normal. For example, a P value of 0.050 requires greater deviations from
normality to flag the data as non-normal than a value of 0.100.

Although the assumption tests are robust in detecting data from populations
that are non-normal or with unequal variances, there are extreme conditions
of data distribution that these tests cannot take into account. For example,
the Levene Median test fails to detect differences in variance of several orders
of magnitude. However, these conditions should be easily detected by
simply examining the data without resorting to the automatic assumption
tests.

Summary Table Select the Results tab in the options dialog box to view the
Summary Table option. The Summary Table option displays the
number of observations for a column or group, the number of missing
values for a column or group, the average value for the column or group,
the standard deviation of the column or group, and the standard error of
the mean for the column or group.

Confidence Intervals Select the Results tab in the options dialog box to view the
Confidence Intervals option. The Confidence Intervals option displays
the confidence interval for the difference of the means. To change the
interval, enter any number from 1 to 99 (95 and 99 are the most
commonly used intervals). Click the selected check box if you do not
want to include the confidence interval in the report.

Residuals Select the Results tab in the options dialog box to view the Residuals
option. Use the Residuals option to display residuals in the report and to
save the residuals of the test to the specified worksheet column. To
change the column the residuals are saved to, edit the number or
select a number from the drop-down list.

FIGURE 10–22
The Options for One Way
ANOVA Dialog Box
Displaying the Summary
Table Options

Power Select the Post Hoc Tests tab in the options dialog box to view the
Power option. The power or sensitivity of a test is the probability that
the test will detect a difference between the groups if there is really a
difference.

Change the alpha value by editing the number in the Alpha Value box.

Alpha (α) is the acceptable probability of incorrectly concluding that
there is a difference. The suggested value is α = 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing
to conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before concluding
there is a significant difference, but a greater possibility of concluding
there is no difference when one exists. Larger values of α make it easier
to conclude that there is a difference, but also increase the risk of
reporting a false positive.

FIGURE 10–23
The Options for One Way
ANOVA Dialog Box
Displaying the Power and
Multiple Comparison
Options

Multiple Comparisons Select the Post Hoc Test tab in the Options dialog box to view the
multiple comparisons options (see Figure 10–23 on page 361). A One
Way Repeated Measures ANOVA tests the hypothesis of no differences
between the several treatment groups, but does not determine which
groups are different, or the sizes of these differences. Multiple
comparison procedures isolate these differences.

The P value used to determine if the ANOVA detects a difference is set


in the Report Options dialog box. If the P value produced by the
One Way ANOVA is less than the P value specified in the box, a
difference in the groups is detected and the multiple comparisons are
performed. For more information on specifying a P value for the
ANOVA, see Setting Report Options on page 135.

Performing Multiple Comparisons You can choose to always perform


multiple comparisons or to only perform multiple comparisons if a One
Way Repeated Measures ANOVA detects a difference.

Select the Always Perform option to perform multiple comparisons


whether or not the ANOVA detects a difference.

Select the Only When ANOVA P Value is Significant option to perform


multiple comparisons only if the ANOVA detects a difference.

Significant Multiple Comparison Value Select either .05 or .10 from


the Significance Value for Multiple Comparisons drop-down list. This
value determines the likelihood of the multiple comparison being
incorrect in concluding that there is a significant difference in the
treatments.

A value of .05 indicates that the multiple comparisons will detect a


difference if there is less than 5% chance that the multiple comparison is
incorrect in detecting a difference. A value of .10 indicates that the
multiple comparisons will detect a difference if there is less than 10%
chance that the multiple comparison is incorrect in detecting a
difference.

If multiple comparisons are triggered, the Multiple Comparison Options


dialog box appears after you pick your data from the worksheet and run
the test, prompting you to choose a multiple comparison method.

Running a One Way Repeated Measures ANOVA

To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.

To run a One Way Repeated Measures ANOVA:

1 If you want to select your data before you run the test, drag the
pointer over your data.

2 Open the Pick Columns dialog box to start the One Way
Repeated Measures ANOVA. You can either:

➤ Select One Way RM ANOVA from the toolbar drop-down list,
then click the button.
➤ Choose the Statistics menu Repeated Measures command, then
choose One Way Repeated Measures ANOVA.
➤ Click the Run Test button from the Options for One Way RM
ANOVA dialog box (see page 358).

The Pick Columns dialog box appears prompting you to


specify a data format.

3 Select the appropriate data format from the Data Format drop-
down list. If your data is grouped in columns, select Raw. If your
data is in the form of a group index column(s) paired with a data
column(s), select Indexed.



For more information on arranging data, see Data Format for
Repeated Measures Tests on page 330, or Arranging Data for
Contingency Tables on page 69.

FIGURE 10–24
The Pick Columns
for One Way RM ANOVA
Dialog Box Prompting You to
Specify a Data Format

4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.

5 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data drop-down list.

The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns
appears in each row. You are prompted to pick a minimum of two
and a maximum of 64 columns for raw data, and three columns for
indexed data.

FIGURE 10–25
The Pick Columns
for One Way RM ANOVA
Dialog Box Prompting You
to Select Data Columns

6 To change your selections, select the assignment in the list, then
select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

7 Select Finish to perform the One Way Repeated Measures


ANOVA. If you elected to test for normality and equal variance,
and your data fail either test, SigmaStat warns you and suggests
continuing your analysis using the nonparametric Friedman
Repeated Measures ANOVA on Ranks (see page 408).

If you selected to run multiple comparisons only when the P value


is significant, and the P value is not significant (see page 374), the
One Way ANOVA report appears after the test is complete. To
edit the report, use the Format menu commands; for information
on editing reports, see Editing Reports on page 137.

If the P value for multiple comparisons is significant, or you
selected to always perform multiple comparisons, the Multiple
Comparison Options dialog box appears prompting you to
select a multiple comparison method. For more information on
selecting a multiple comparison method, see the following section,
Multiple Comparisons Options.

Multiple Comparison Options

The One Way Repeated Measures ANOVA tests the hypothesis of no
differences between the several treatment groups, but does not determine
which groups are different, or the sizes of these differences. Multiple
comparison tests isolate these differences by running comparisons
between the experimental groups.

If you selected to run multiple comparisons only when the P value is
significant, and the ANOVA produces a P value equal to or less than the
trigger P value, or you selected to always run multiple comparisons in
the Options for One Way RM ANOVA dialog box (see page 358),
the Multiple Comparison Options dialog box appears prompting
you to specify a multiple comparison test. The P value produced by the
ANOVA is displayed in the upper left corner of the dialog box. For
more information on the P value and how it affects multiple comparison
testing, see page 374.

There are seven kinds of multiple comparison tests available for the One
Way Repeated Measures ANOVA.



You can choose to perform the:

➤ Holm-Sidak Test
➤ Tukey Test
➤ Student-Newman-Keuls Test
➤ Bonferroni t-test
➤ Fisher’s LSD
➤ Dunnett’s Test
➤ Duncan’s Multiple Range Test

There are two types of multiple comparisons available for the One Way
Repeated Measures ANOVA. The types of comparison you can make
depends on the selected multiple comparison test.

➤ All pairwise comparisons compare all possible pairs of treatments.


➤ Multiple comparisons versus a control compare all experimental
treatments to a single control group.

Holm-Sidak Test The Holm-Sidak Test can be used for both pairwise comparisons and
comparisons versus a control group. It is more powerful than the Tukey
and Bonferroni tests and, consequently, it is able to detect differences
that these other tests do not. It is recommended as the first-line
procedure for pairwise comparison testing.

When performing the test, the P values of all comparisons are computed
and ordered from smallest to largest. Each P value is then compared to a
critical level that depends upon the significance level of the test (set in
the test options), the rank of the P value, and the total number of
comparisons made. A P value less than the critical level indicates there is
a significant difference between the corresponding two groups.
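
The sketch below illustrates the step-down comparison just described in Python (not SigmaStat's own implementation; the P values are hypothetical), comparing the ordered P values to Sidak-adjusted critical levels:

import numpy as np

p_values = np.array([0.001, 0.012, 0.030, 0.200])   # one P value per comparison
alpha = 0.05
k = len(p_values)

order = np.argsort(p_values)                         # smallest P value first
significant = np.zeros(k, dtype=bool)
for step, idx in enumerate(order):
    critical = 1.0 - (1.0 - alpha) ** (1.0 / (k - step))   # critical level for this rank
    if p_values[idx] < critical:
        significant[idx] = True
    else:
        break                                        # once one comparison fails, stop testing
print(significant)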

Tukey Test The Tukey Test and the Student-Newman-Keuls test are conducted
similarly to the Bonferroni t-test, except that it uses a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The Tukey Test is
more conservative than the Student-Newman-Keuls test, because it
controls the errors of all comparisons simultaneously, while the Student-
Newman-Keuls test controls errors among tests of k means. Because it is
more conservative, it is less likely to determine that a given difference is
statistically significant, and it is the recommended test for all pairwise
comparisons.



While Multiple Comparisons vs a Control is an available comparison
type for the Tukey Test, it is not recommended. Use the Dunnett’s Test
for multiple comparisons vs a control.

Student-Newman-Keuls The Student-Newman-Keuls Test and the Tukey Test are conducted
(SNK) Test similarly to the Bonferroni t-test, except that it uses a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The Student-
Newman-Keuls Test is less conservative than the Tukey Test because it
controls errors among tests of k means, while the Tukey Test controls the
errors of all comparisons simultaneously. Because it is less conservative,
it is more likely to determine that a given difference is statistically
significant. The Student-Newman-Keuls Test is usually more sensitive
than the Bonferroni t-test, and is only available for all pairwise
comparisons.

Bonferroni t-test The Bonferroni t-test performs pairwise comparisons with paired t-tests.
The P values are then multiplied by the number of comparisons that
were made. It can perform both all pairwise comparisons and multiple
comparisons vs a control, and is the most conservative test for each
comparison type. For less conservative all pairwise comparison tests, see
the Tukey and the Student-Newman-Keuls tests, and for a less
conservative multiple comparison vs a control test, see Dunnett's
Test.
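
The adjustment itself is simple to illustrate; the sketch below (Python, hypothetical P values, not SigmaStat's own routine) multiplies each unadjusted paired t-test P value by the number of comparisons and caps the result at 1:

import numpy as np

raw_p = np.array([0.004, 0.020, 0.300])          # unadjusted paired t-test P values
adjusted = np.minimum(raw_p * len(raw_p), 1.0)   # Bonferroni-adjusted P values
print(adjusted)                                   # compare each adjusted value to 0.05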

Fisher’s Least The Fisher’s LSD Test is the least conservative all pairwise comparison
Significant Difference test. Unlike the Tukey and the Student-Newman-Keuls tests, it makes no
Test effort to control the error rate. Because it makes no attempt to
control the error rate when detecting differences between groups, it
is not recommended.

Dunnett’s Test Dunnett's test is the analog of the Student-Newman-Keuls Test for the
case of multiple comparisons against a single control group. It is
conducted similarly to the Bonferroni t-test, but with a more
sophisticated mathematical model of the way the error accumulates in
order to derive the associated table of critical values for hypothesis
testing. This test is less conservative than the Bonferroni Test, and is
only available for multiple comparisons vs a control.

Duncan’s The Duncan’s Test is conducted the same way as the Tukey and the Student-
Multiple Range Newman-Keuls tests, except that it is less conservative in determining
whether the difference between groups is significant by allowing a wider
range for error rates. Although it has a greater power to detect
differences than the Tukey and the Student-Newman-Keuls tests, it has
less control over the Type I error rate and is, therefore, not
recommended.

Performing a Multiple The multiple comparison test you choose depends on the treatments
Comparison you are testing. Select Cancel if you do not want to perform a multiple
comparison test.

To perform a multiple comparison test:

1 The All Levels option under the Select Factors to Compare
heading determines whether or not multiple comparisons are
performed. This option is automatically selected if the P value
produced by the ANOVA (displayed in the upper left corner of the
dialog box) is less than or equal to the P value set in the
Options dialog box, and multiple comparisons are performed.
If the P value displayed in the dialog box is greater than the P value
set in the Options dialog box, the All Factors option is not
selected and multiple comparisons are not performed.

You can disable multiple comparison testing for the groups by
clicking the selected All Factors check box.

2 Select the desired multiple comparison test from the Suggested


Test drop-down list. The Tukey and Student-Newman-Keuls tests
are recommended for determining the difference among all
treatments. If you have only a few treatments, you may want to
select the simpler Bonferroni t-test.

The Dunnett's test is recommended for determining the


differences between the experimental treatments and a control
group. If you have only a few treatments or observations, you can
select the simpler Bonferroni t-test.

Note that in both cases the Bonferroni t-test is most sensitive with a
small number of groups. Dunnett’s test is not available if you have
fewer than six observations.

For more information on each of the multiple comparison tests,


see page 364.



3 Select a Comparison Type. The types of comparisons available
depend on the selected test. All Pairwise compares all possible
pairs of treatments and is available for the Tukey, Student-
Newman-Keuls, Bonferroni, Fisher LSD, and Duncan’s tests.

FIGURE 10–26
The Multiple Comparison
Options Dialog Box

Versus Control compares all experimental treatments to a single


control group and is available for the Tukey, Bonferroni, Fisher
LSD, Dunnett’s, and Duncan’s tests. It is not recommended for the
Tukey, Fisher LSD, or Duncan’s test. If you select Versus Control,
you must also select the control group from the list of groups.

For more information on multiple comparison test and the


available comparison types, see page 374.

4 If you selected an all pairwise comparison test, select Finish to


continue with the One Way RM ANOVA and view the report (see
Figure 10–28 on page 370). For information on editing reports,
see Editing Reports on page 137.

5 If you selected a multiple comparisons versus a control test,
select Next. The Multiple Comparisons Options dialog box
prompts you to select a control group. Select the desired control
group from the list, then select Finish to continue with the One
Way RM ANOVA and view the report (see Figure 10–28 on page
370). For information on editing reports, see Editing Reports on
page 137.

FIGURE 10–27
The Multiple Comparison
Options Dialog Box
Prompting You to
Select a Control Group

Interpreting One Way Repeated Measures ANOVA Results

The One Way Repeated Measures ANOVA report generates an ANOVA


table describing the source of the variation in the treatments. This table
displays the degrees of freedom, sum of squares, and mean squares of the
treatments, as well as the F statistic and the corresponding P value. The
other results displayed are selected in the Options for One Way RM ANOVA
dialog box (see Setting One Way Repeated Measures ANOVA Options
on page 357).

You can also generate tables of multiple comparisons. Multiple


Comparison results are also specified in the Options for One Way RM
ANOVA dialog box (see page 358). The test used to perform the
multiple comparison is selected in the Multiple Comparison Options
dialog box (see page 368).

For descriptions of the derivations for One Way RM ANOVA results,


you can reference any appropriate statistics reference. For a list of
suggested references, see page 12.

The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.

The number of decimal places displayed is set in the Report Options


dialog box. For more information on setting report options see Setting
Report Options on page 135.

FIGURE 10–28
Example of the
One Way Repeated
Measures ANOVA
Report

If There Were Missing If your data contained missing values, the report indicates the results
Data Cells were computed using a general linear model. The ANOVA table
includes the degrees of freedom used to compute F, the estimated mean
square equations are listed, and the summary table displays the estimated
least square means.

For descriptions of the derivations for One Way Repeated Measures


ANOVA results, you can reference an appropriate statistics reference.
For a list of suggested references, see page 12.



Normality Test Normality test results display whether the data passed or failed the test of
the assumption that the differences of the changes originate from a
normal distribution, and the P value calculated by the test. Normally
distributed source populations are required for all parametric tests.

This result appears unless you disabled normality testing in the
Options for One Way RM ANOVA dialog box (see page 358).

Equal Variance Test Equal Variance test results display whether or not the data passed or
failed the test of the assumption that the differences of the changes
originate from a population with the same variance, and the P value
calculated by the test. Equal variances of the source populations are
assumed for all parametric tests.

This result appears unless you disabled equal variance testing in the
Options for One Way RM ANOVA dialog box (see page 358).

Summary Table If you enabled this option in the Options for One Way RM ANOVA
dialog box, SigmaStat generates a summary table listing the sample sizes
N, number of missing values, mean, standard deviation, differences of
the means and standard deviations, and standard error of the means.

N (Size) The number of non-missing observations for that column or


group.

Missing The number of missing values for that column or group.

Mean The average value for the column. If the observations are
normally distributed the mean is the center of the distribution.

Standard Deviation A measure of variability. If the observations are


normally distributed, about two-thirds will fall within one standard
deviation above or below the mean, and about 95% of the observations
will fall within two standard deviations above or below the mean.

Standard Error of the Mean A measure of the precision with
which the mean computed from the sample estimates the true
population mean.

Power The power of the performed test is displayed unless you disable this
option in the Options for One Way RM ANOVA dialog box.

The power, or sensitivity, of a One Way Repeated Measures ANOVA is
the probability that the test will detect a difference among the
treatments if there really is a difference. The closer the power is to 1, the
more sensitive the test.

Repeated measures ANOVA power is affected by the sample sizes, the
number of treatments being compared, the chance of erroneously
reporting a difference α (alpha), the observed differences of the group
means, and the observed standard deviations of the samples.

Alpha (α) Alpha (α) is the acceptable probability of incorrectly
concluding that there is a difference. An α error is also called a Type I
error. A Type I error is when you reject the hypothesis of no effect when
this hypothesis is true.

The α value is set in the Options for One Way RM ANOVA dialog box;
the suggested value is α = 0.05, which indicates that a one in twenty
chance of error is acceptable. Smaller values of α result in stricter
requirements before concluding there is a significant difference, but a
greater possibility of concluding there is no difference when one exists (a
Type II error). Larger values of α make it easier to conclude that there
is a difference but also increase the risk of seeing a false difference (a
Type I error).

ANOVA Table The ANOVA table lists the results of the one way repeated measures
ANOVA.

DF (Degrees of Freedom) Degrees of freedom represents the number


of samples and sample size, which affects the sensitivity of the ANOVA.

➤ The degrees of freedom between subjects is a measure of the number


of subjects
➤ The degrees of freedom within subjects is a measure of the total
number of observations, adjusted for the number of treatments
➤ The degrees of freedom for the treatments is a measure of the
number of treatments
➤ The residual degrees of freedom is a measure of the difference
between the number of observations, adjusted for the number of
subjects and treatments
➤ The total degrees of freedom is a measure of both number of
subjects and treatments
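
For a complete design with n subjects and k treatments, these quantities follow the simple arithmetic in the sketch below (Python; the sizes are hypothetical, and SigmaStat reports these values for you):

n, k = 8, 3                              # 8 subjects, 3 treatments

df_between_subjects = n - 1              # 7
df_within_subjects = n * (k - 1)         # 16: observations adjusted for the number of treatments
df_treatments = k - 1                    # 2
df_residual = (n - 1) * (k - 1)          # 14
df_total = n * k - 1                     # 23
print(df_between_subjects, df_within_subjects, df_treatments, df_residual, df_total)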

SS (Sum of Squares) The sum of squares is a measure of variability


associated with each element in the ANOVA data table.



➤ The sum of squares between the subjects measures the variability of
the average responses of each subject
➤ The sum of squares within the subjects measures the underlying
total variability within each subject
➤ The sum of squares of the treatments measures the variability of the
mean treatment responses within the subjects
➤ The residual sum of squares measures the underlying variability
among all observations after accounting for differences between
subjects
➤ The total sum of squares measures the total variability

MS (Mean Squares) The mean squares provide estimates of the


variances. Comparing these variance estimates is the basis of analysis of
variance.

The mean square of the treatments is

MS between = SS between / DF between = (sum of squares between groups) / (degrees of freedom between groups)

The residual mean square is

MS within = SS within / DF within = (sum of squares within groups) / (degrees of freedom within groups)

F Statistic The F test statistic is a ratio used to gauge the differences of the effects.
If there are no missing data, F is calculated as

F = MS between / MS within = (estimated population variance between groups) / (estimated population variance within groups)

If the F ratio is around 1, you can conclude that there are no differences
among treatments (the data is consistent with the null hypothesis that
there are no treatment effects).

If F is a large number, the variability among the effect means is larger
than expected from random variability in the treatments, and you can
conclude that the treatments have different effects (the differences
among the treatments are statistically significant).
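
For a complete, balanced layout (no missing data), the quantities above can be reproduced
outside SigmaStat with a few lines of arithmetic. The sketch below is illustrative only;
the data values are hypothetical, rows are subjects, columns are treatments, and
numpy/scipy are assumed.

# A minimal sketch of the one way RM ANOVA table for complete data (no missing
# values): rows are subjects, columns are treatments. Not SigmaStat's own code.
import numpy as np
from scipy import stats

data = np.array([[10.0, 12.5, 14.0],   # each row: one subject across the treatments
                 [ 8.5, 13.0, 13.5],
                 [ 9.0, 10.5, 12.0],
                 [ 9.5, 11.0, 13.0]])
n, k = data.shape                       # n subjects, k treatments
grand = data.mean()

ss_total      = ((data - grand) ** 2).sum()
ss_subjects   = k * ((data.mean(axis=1) - grand) ** 2).sum()  # between subjects
ss_within     = ss_total - ss_subjects                        # within subjects
ss_treatments = n * ((data.mean(axis=0) - grand) ** 2).sum()
ss_residual   = ss_within - ss_treatments

df_treat, df_resid = k - 1, (n - 1) * (k - 1)
ms_treat = ss_treatments / df_treat       # MS between (treatments)
ms_resid = ss_residual / df_resid         # MS within (residual)
f_stat   = ms_treat / ms_resid
p_value  = stats.f.sf(f_stat, df_treat, df_resid)
print(f"F = {f_stat:.3f}, P = {p_value:.4f}")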

P Value The P value is the probability of being wrong in concluding


that there is a true difference among the treatments (i.e., the probability
of falsely rejecting the null hypothesis, or committing a Type I error,
based on F). The smaller the P value, the greater the probability that the
treatment effects are different. Traditionally, you can conclude that
there are significant differences when P < 0.05.

Expected If there were missing data and a general linear model was used, the linear
Mean Squares equations for the expected mean squares computed by the model are
displayed. These equations are displayed only if a general linear model
was used.

Multiple Comparisons If you selected to perform multiple comparisons (see page 364), a table
of the comparisons between group pairs is displayed. The multiple
comparison procedure is activated in the Options for One Way RM
ANOVA dialog box (see page 361). The test used in the multiple
comparison procedure is selected in the Multiple Comparison Options
dialog box (see page 368).

Multiple comparison results are used to determine exactly which


treatments are different, since the ANOVA results only inform you that
two or more of the groups are different. The specific type of multiple
comparison results depends on the comparison test used and whether
the comparison was made pairwise or versus a control.

➤ All pairwise comparison results list comparisons of all possible


combinations of group pairs; the all pairwise tests are the Tukey,
Student-Newman-Keuls, Fisher LSD, Duncan’s test and the
Bonferroni t-test.
➤ Comparisons versus a single control group list only comparisons
with the selected control group. The control group is selected
during the actual multiple comparison procedure. The comparison
versus a control tests are the Bonferroni t-test and the Dunnett’s,
Fisher’s LSD, and Duncan’s tests.

For descriptions of the derivation of parametric multiple comparison


procedure results, you can reference an appropriate statistics reference.
For a list of suggested references, see page 12.

Bonferroni t-test Results The Bonferroni t-test lists the differences of
the means for each pair of treatments, computes the t values for each
pair, and displays whether or not P < 0.05 for that comparison. The
Bonferroni t-test can be used to compare all treatments or to compare
versus a control.

You can conclude from “large” values of t that the difference of the two
treatments being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of
erroneously concluding that there is a significant difference is less than
5%. If it is greater than 0.05, you cannot confidently conclude that
there is a difference.

The difference of the means is a gauge of the size of the difference


between the two treatments.
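
A minimal outside sketch of this procedure, assuming Python with scipy: run a paired
t-test for every pair of treatments and multiply each P value by the number of
comparisons (the Bonferroni adjustment), capping at 1. The treatment names and data
are hypothetical, and this is not SigmaStat's own code.

# Illustrative all-pairwise Bonferroni t-test on repeated measures data:
# paired t-tests for every treatment pair, with each P value multiplied by the
# number of comparisons (capped at 1). Column names and data are hypothetical.
from itertools import combinations
from scipy import stats

treatments = {
    "control": [10.0, 8.5, 9.0, 9.5],
    "drug A":  [12.5, 13.0, 10.5, 11.0],
    "drug B":  [14.0, 13.5, 12.0, 13.0],
}
pairs = list(combinations(treatments, 2))
for a, b in pairs:
    t, p = stats.ttest_rel(treatments[a], treatments[b])   # paired t-test on the same subjects
    p_adj = min(p * len(pairs), 1.0)                       # Bonferroni adjustment
    print(f"{a} vs {b}: t = {t:.3f}, adjusted P = {p_adj:.4f}, "
          f"{'significant' if p_adj < 0.05 else 'not significant'}")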

Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and


Dunnett's Test Results The Tukey, Student-Newman-Keuls (SNK),
Fisher LSD, and Duncan’s tests are all pairwise comparisons of every
combination of group pairs. While the Tukey, Fisher LSD, and
Duncan’s tests can be used to compare a control group to other groups, they
are not recommended for this type of comparison.

Dunnett's test only compares a control group to all other groups. All
tests compute the q test statistic, the number of means spanned in the
comparison p, and display whether or not P < 0.05 for that pair
comparison.

You can conclude from “large” values of q that the difference of the two
treatments being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of
being incorrect in concluding that there is a significant difference is less
than 5%. If it is greater than 0.05, you cannot confidently conclude
that there is a difference.

The difference of the means is a gauge of the size of the difference


between the two treatments.

p is a parameter used when computing q. The larger the p, the larger q
needs to be to indicate a significant difference. p is an indication of the
differences in the ranks of the group means being compared. Group
means are ranked in order from largest to smallest in an SNK test, so p is
the number of means spanned in the comparison. For example, when
comparing four means, comparing the largest to the smallest p = 4, and
when comparing the second smallest to the smallest p = 2.

If a treatment is found to be not significantly different than another


treatment, all treatments with p ranks in between the p ranks of the two
treatments that are not different are also assumed not to be significantly
different, and a result of DNT (Do Not Test) appears for those
comparisons.
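
The sketch below illustrates how q and the span p enter such a comparison, using the
studentized range distribution. It assumes scipy 1.7 or later (for stats.studentized_range),
equal numbers of observations per treatment, and hypothetical means, residual mean
square, and degrees of freedom; SigmaStat's own tables and missing-data handling may differ.

# Sketch of a q-based comparison (Tukey/SNK style). q is the difference of two
# treatment means divided by the standard error; for SNK the span p (number of
# means between the two, inclusive) sets the reference distribution.
import numpy as np
from scipy import stats

means = {"A": 14.0, "B": 12.0, "C": 10.5, "D": 9.0}   # treatment means, hypothetical
ms_resid, df_resid, n = 2.5, 9, 4                     # residual MS and DF from the RM ANOVA, n subjects

ranked = sorted(means, key=means.get, reverse=True)   # rank means from largest to smallest
i, j = 0, 3                                           # compare the largest to the smallest mean
p_span = j - i + 1                                    # number of means spanned (p = 4 here)
q = (means[ranked[i]] - means[ranked[j]]) / np.sqrt(ms_resid / n)
p_value = stats.studentized_range.sf(q, p_span, df_resid)   # SNK uses the span; Tukey uses all k means
print(f"q = {q:.3f}, p span = {p_span}, P = {p_value:.4f}")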

One Way Repeated Measures ANOVA Report Graphs

You can generate up to four graphs using the results from a One Way
RM ANOVA. They include a:

➤ Line scatter graph of the changes after treatment.


➤ Histogram of the residuals.
➤ Normal probability plot of the residuals.
➤ Multiple comparison graphs.

Before and After The One Way Repeated Measures ANOVA uses lines to plot a subject's
Line Graph change after each treatment. If the graph plots raw data, the lines
represent the rows in the column, the column titles are used as the tick
marks for the X axis and the data is used as the tick marks for the Y axis.
For an example of a before and after graph, see page 159.

If the graph plots indexed data, the lines represent the levels in the
subject column, the levels in the treatment column are used as the tick
marks for the X axis, the data is used as the tick marks for the Y axis, and
the treatment and data column titles are used as the axis titles.

Histogram of Residuals The One Way Repeated Measures ANOVA histogram plots the raw
residuals in a specified range, using a defined interval set. The residuals
are divided into a number of evenly incremented histogram intervals and
plotted as histogram bars indicating the number of residuals in each
interval. The X axis represents the histogram intervals, and the Y axis
represents the number of residuals in each group. For an example of a
histogram, see page 153.

Probability Plot The One Way Repeated Measures ANOVA probability plot graphs the
frequency of the raw residuals. The residuals are sorted and then plotted
as points around a curve representing the area of the gaussian plotted on
a probability axis. Plots with residuals that fall along the gaussian curve
indicate that your data was taken from a normally distributed
population. The X axis is a linear scale representing the residual values.
The Y axis is a probability scale representing the cumulative frequency of
the residuals. For an example of a probability plot, see page 155.

Multiple The One Way Repeated Measures ANOVA multiple comparison graphs
Comparison Graphs plot significant differences between levels of a significant factor. There is
one graph for every significant factor reported by the specified multiple
comparison test. If there is one significant factor reported, one graph
appears; if there are two significant factors, two graphs appear, etc. If a
factor is not reported as significant, a graph for the factor does not
appear. For an example of a multiple comparison graph, see page 361.

Creating a Graph To generate a graph of One Way Repeated Measures ANOVA data:

1 Click the toolbar button, or choose the Graph menu Create


Graph command when the One Way Repeated Measures ANOVA
report is selected. The Create Graph dialog box appears displaying
the types of graphs available for the One Way Repeated Measures
ANOVA results.

FIGURE 10–29
The Create Graph Dialog
Box
for a One Way RM
ANOVA Report

2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.

For more information on each of the graph types, see Chapter 8.
The specified graph appears in a graph window or in the report.

FIGURE 10–30
A Normal Probability Plot
for a One Way RM ANOVA

For information on manipulating graphs, see pages 8-178 through


8-202.

Two Way Repeated Measures Analysis of Variance (ANOVA)
A two way or two factor repeated measures ANOVA (analysis of
variance) should be used when:

➤ You want to see if the same group of individuals are affected by a


series of experimental treatments or conditions.
➤ You want to consider the effect of an additional factor which may or
may not interact, and may or may not be another series of
treatments or conditions.
➤ The treatment effects are normally distributed with equal variances.

SigmaStat performs Two Way Repeated Measures ANOVAs for one factor
repeated or both factors repeated. SigmaStat automatically determines if
one or both factors are repeated from the data, and uses the appropriate
procedures.

If you want to consider the effects of only one factor on your


experimental groups, use One Way Repeated Measures ANOVA.

There is no equivalent in SigmaStat for a two factor repeated measures
comparison for samples drawn from non-normal populations. If your
data is non-normal, you can transform the data to make it comply better
with the assumptions of analysis of variance using Transform menu
commands. If the sample size is large, and you want to do a
nonparametric test, use the Transform menu Rank command to convert
the observations to ranks, then do a Two Way ANOVA on the ranks.

For more information on transforming your data, see Chapter 14,


USING TRANSFORMS.
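
As an outside illustration of that rank-transform workaround (not a SigmaStat feature),
the sketch below ranks all observations and then fits an ordinary two way ANOVA on the
ranks using pandas and statsmodels; the column names and values are hypothetical.

# Sketch of the rank-transform workaround described above: rank all observations,
# then run an ordinary two way ANOVA on the ranks. Like the workaround itself, this
# ignores the repeated-measures pairing of the subjects.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "temperature": [25, 25, 25, 30, 30, 30, 25, 25, 25, 30, 30, 30],
    "salinity":    [10, 10, 10, 10, 10, 10, 15, 15, 15, 15, 15, 15],
    "activity":    [8.5, 8.5, 9.5, 9.0, 9.0, 10.0, 11.0, 10.5, 12.0, 12.5, 11.5, 13.0],
})
df["rank_activity"] = df["activity"].rank()           # Transform menu Rank equivalent

model = ols("rank_activity ~ C(temperature) * C(salinity)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))                # two way ANOVA table on the ranks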

About the Two In a two way or two factor repeated measures analysis of variance, there
Way Repeated Measures are two experimental factors which may affect each experimental
ANOVA treatment. Either or both of these factors are repeated treatments on the
same group of individuals. A two factor design tests for differences
between the different levels of each treatment and for interactions
between the treatments. For information on arranging data for the Two
Way Repeated Measures ANOVA, see page 380.

A two factor analysis of variance tests three hypotheses: (1) There is no


difference among the levels or treatments of the first factor; (2) There is
no difference among the levels or treatments of the second factor; and

(3) There is no interaction between the factors, i.e., if there is any
difference among treatments within one factor, the differences are the
same regardless of the second factor.

Two Way Repeated Measures ANOVA is a parametric test that assumes


that all the treatment effects are normally distributed with the same
variance. SigmaStat does not have an automatic nonparametric test if
these assumptions are violated.

Performing a Two Way To perform a Two Way Repeated Measures ANOVA:


Repeated Measures
ANOVA 1 Enter or arrange your data appropriately in the data worksheet
(see following section).

2 If desired, set the Two Way Repeated Measures ANOVA options


using the Options for Two Way RM ANOVA dialog box (page
388).

3 Select Two Way RM ANOVA from the toolbar, then select the
button, or choose the Statistics menu Repeated Measures
command, then choose Two Way Repeated Measures ANOVA.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 391).

5 Specify the multiple comparisons you want to perform on your


data (page 392).

6 View and interpret the Two Way Repeated Measures ANOVA


report and generate report graphs (page 397 and page 406).

Arranging Two Way Repeated Measures ANOVA Data

Either or both of the two factors used in the Two Way Repeated
Measures ANOVA can be repeated on the same group of individuals.
For example, if you analyze the effect of changing salinity on the activity
of two different species of shrimp, you have a two factor experiment
with a single repeated treatment (salinity). The different salinity
treatments and shrimp species are the levels. To see how data for a single repeated
treatment is entered in the worksheet, see Figure 10–32 on page 387.
TABLE 10-1
Data for a Two Way Repeated Factor ANOVA with one Repeated Factor (Salinity)

Species         Subject   Salinity 10   Salinity 15
Artemia sp. 1   A         10.0          12.5
                B          8.5          13.0
                C          9.0          10.5
Artemia sp. 2   D          5.5           5.5
                E          7.5           8.0
                F          7.0           6.5

If you wanted to test the effect of different salinities and temperatures on
the activity of a single species of shrimp, you have a two factor
experiment with two repeated treatments, salinity and temperature. In
both cases, the different combinations of treatments/factors levels are the
cells of the comparison. SigmaStat automatically handles both one and
two repeated treatment factors.

TABLE 3-2
Data for a Two Way Repeated Factor ANOVA with two Repeated Factors (Temperature and Salinity)

Temperature   Subject   Salinity 10   Salinity 15
25°           A          8.5          11.0
              B          8.5          10.5
              C          9.5          12.0
30°           A          9.0          12.5
              B          9.0          11.5
              C         10.0          13.0

Missing Data Ideally, the data for a Two Way ANOVA should be completely balanced,
and Empty Cells i.e., each group or cell in the experiment has the same number of
observations and there are no missing data. However, SigmaStat
properly handles all occurrences of missing and unbalanced data
automatically.

Missing Data Point(s) If there are missing values, SigmaStat


automatically handles the missing data by using a general linear model.
This approach constructs the hypothesis tests using the marginal sums of
squares (also commonly called the Type III or adjusted sums of squares).
TABLE 3-3
Data for a Two Way Repeated Factor ANOVA with one Repeated Factor (Salinity) and a Missing Data Point
A general linear model is used to handle missing data points.

Species         Subject   Salinity 10   Salinity 15
Artemia sp. 1   A         10.0          12.5
                B          8.5          13.0
                C          9.0          10.5
Artemia sp. 2   D          5.5           5.5
                E          7.5           --
                F          7.0           6.5

Empty Cell(s) When there is an empty cell, i.e., there are no


observations for a combination of two factor levels, but there is still at
least one repeated factor for every subject, SigmaStat stops and suggests
either analysis of the data assuming no interaction between the factors,
or using One Way ANOVA.

Assumption of no interaction analyzes the effects of each treatment


separately.

Note that assuming there is no interaction between the two factors in Two
Way ANOVA can be dangerous. Under some circumstances, this
assumption can lead to a meaningless analysis, particularly if you are
interested in studying the interaction effect.
TABLE 3-4
Data for a Two Way Repeated Factor ANOVA with Two Repeated Factors (Temperature and Salinity) and a Missing Cell
Data with missing cells that still have repeated factor data for every subject can be analyzed either by assuming no interaction or as a One Way ANOVA.

Temperature   Subject   Salinity 10   Salinity 15
25°           A          8.5          11.0
              B          8.5          10.5
              C          9.5          12.0
30°           A          9.0          --
              B          9.0          --
              C         10.0          --

If you treat the problem as One Way ANOVA, each cell in the table is
treated as a different level of a single experimental factor. This approach
is the most conservative analysis because it requires no additional
assumptions about the nature of the data or experimental design.

Connected versus Disconnected Data The no interaction assumption


requires that the non-empty cells must be geometrically connected in
order to do the computation of a two factor no interaction model. You
cannot perform Two Way Repeated Measures ANOVA on data
disconnected by empty cells.

TABLE 3-5
Data for a Two Way Repeated Factor ANOVA with Geometrically Disconnected Data
This data cannot be analyzed with a two way repeated measures ANOVA.

Temperature   Subject   Salinity 10   Salinity 15
25°           A          --           11.0
              B          --           10.5
              C          --           12.0
30°           A          9.0          --
              B          9.0          --
              C         10.0          --

When the data is geometrically connected, you can draw a series of


straight vertical and horizontal lines connecting all cells containing data
without changing direction in any empty cells. SigmaStat automatically
checks for this condition. If disconnected data is encountered during
Two Way Repeated Measures ANOVA, SigmaStat suggests treatment of
the problem as a One Way Repeated Measures ANOVA.

For descriptions of the concept of connectivity, you can reference an


appropriate statistics reference. For a list of suggested references, see
page 12.
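
One common way to formalize this check, sketched below under the assumption that it
matches the line-drawing rule (it is not necessarily SigmaStat's implementation), is to
treat every non-empty cell as an edge joining its row level and column level and ask
whether the resulting bipartite graph is connected.

# Connectivity check equivalent to the line-drawing rule above: each non-empty cell
# is an edge joining a row level and a column level; the design is geometrically
# connected when that bipartite graph has a single connected component.
def is_connected(cells):
    """cells: set of (row_level, col_level) pairs with at least one observation."""
    if not cells:
        return True
    nodes = {("r", r) for r, _ in cells} | {("c", c) for _, c in cells}
    adj = {node: set() for node in nodes}
    for r, c in cells:
        adj[("r", r)].add(("c", c))
        adj[("c", c)].add(("r", r))
    seen, stack = set(), [next(iter(nodes))]
    while stack:                       # depth-first search over the levels
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj[node] - seen)
    return seen == nodes

# Table 3-5: 25 deg has only salinity 15, 30 deg has only salinity 10 -> disconnected
print(is_connected({(25, 15), (30, 10)}))                # False
# Adding any cell that bridges the two temperatures reconnects the design
print(is_connected({(25, 15), (30, 10), (30, 15)}))      # True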

Missing Factor Data for One Subject Another case of an empty cell
can occur when both factors are repeated, and there are no data for one
level for one of the subjects. SigmaStat automatically handles this
situation by converting the problem to a One Way Repeated Measures
ANOVA.

TABLE 3-6
Data for a Two Way Repeated Factor ANOVA with Two Factors Repeated and No Data for One Level for a Subject
This data can only be analyzed as a one way repeated measures ANOVA problem.

Temperature   Subject   Salinity 10   Salinity 15
25°           A          --           11.0
              B          --           --
              C          --           12.0
30°           A          9.0          12.5
              B          9.0          --
              C         10.0          13.0

Entering Two Way Repeated Measures ANOVA can only be performed on
Worksheet Data data indexed by both subject and two factors. The data is placed in four
columns; the first factor is in one column, the second factor is in a
second column, the subject index is in a third column, and the actual
data is in a fourth column.

FIGURE 10–31
Valid Data Formats for
a Two Way Repeated
Measures ANOVA with
One Factor Repeated
Column 1 is the subject
index, column 2 is the
non-repeated first factor,
column 3 is the repeated
second factor, and column 4
is the data (see Table 10-1,
on page 10-381).

SigmaStat performs two way repeated measures for one factor repeated or
both factors repeated. SigmaStat automatically determines if one or both
factors are repeated from the data, and uses the appropriate procedures.
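
For readers preparing data outside SigmaStat, the sketch below shows one way to
rearrange the wide layout of Table 10-1 into the four indexed columns described above,
using pandas; the column titles ("Salinity", "Activity") are hypothetical.

# Sketch of rearranging the wide layout of Table 10-1 into the four indexed
# columns described above (first factor, repeated factor, subject, data).
import pandas as pd

wide = pd.DataFrame({
    "Species": ["Artemia sp. 1"] * 3 + ["Artemia sp. 2"] * 3,
    "Subject": ["A", "B", "C", "D", "E", "F"],
    "10":      [10.0, 8.5, 9.0, 5.5, 7.5, 7.0],
    "15":      [12.5, 13.0, 10.5, 5.5, 8.0, 6.5],
})
indexed = wide.melt(id_vars=["Species", "Subject"],
                    var_name="Salinity", value_name="Activity")
print(indexed)   # one row per observation: Species, Subject, Salinity, Activity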

Selecting Two Way Repeated Measures ANOVAs can be performed on entire
Data Columns columns or on only a portion of the columns. When running a test you
can either:

➤ Select the columns to test by dragging your mouse over the columns
or cells before choosing the test.
➤ Select the columns while running the test.

Setting Two Way Repeated Measures ANOVA Options

Use the Two Way Repeated Measures ANOVA options to:

➤ Adjust the parameters of a test to relax or restrict the testing of your


data for normality and equal variance (page 386)
➤ Display the statistics summary table and the confidence interval for
the data and assign residuals to the worksheet (page 387)
➤ Compute the power, or sensitivity, of the test (page 388)
➤ Enable multiple comparison testing (page 389)

To change the Two Way Repeated Measures ANOVA options:

1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.

2 To open the Options for Two Way RM ANOVA dialog box, select
Two Way RM ANOVA from the toolbar drop-down list, then
click the button, or choose the Statistics menu Current Test
Options... command. The Normality and Equal Variance options
appear (see Figure 10–32 on page 387).

3 Click the Results tab to view the Summary Table, Confidence


Intervals, and Residuals in Column options (see Figure 10–33 on
page 388). Click the Post Hoc Test tab to view the Power and
Multiple Comparisons options (see Figure 10–34 on page 388).
Click the Assumption Checking tab to return to the Normality
and Equal Variance options.

4 Click a check box to enable or disable a test option. Options


settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 10-386 through
10-389.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 391 for more information).

6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

You can select Help at any time to access SigmaStat’s on-line help
system.

Normality and Select the Assumption Checking tab from the options dialog box to view
Equal Variance the Normality and Equal Variance options. The normality assumption
Assumptions test checks for a normally distributed population. The equal variance
assumption test checks the variability about the group means.

Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test


for a normally distributed population.

Equal Variance Testing SigmaStat tests for equal variance by checking


the variability about the group means.

P Values for Normality and Equal Variance The P value determines


the probability of being incorrect in concluding that the data is not
normally distributed (the P value is the risk of falsely rejecting the null
hypothesis that the data is normally distributed). If the P value
computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or equal variance,


increase the P value. Because the parametric statistical methods are
relatively robust in terms of detecting violations of the assumptions, the
suggested value in SigmaStat is 0.050. Larger values of P (for example,
0.100) require less evidence to conclude that data is not normal.

FIGURE 10–32
The Options for Two
Way RM ANOVA Dialog Box
Displaying the Assumption
Checking Options

To relax the requirement of normality and/or equal variance,


decrease P. Requiring smaller values of P to reject the normality
assumption means that you are willing to accept greater deviations from
the theoretical normal distribution before you flag the data as non-
normal. For example, a P value of 0.050 requires greater deviations from
normality to flag the data as non-normal than a value of 0.100.

Although the assumption tests are robust in detecting data from populations
that are non-normal or with unequal variances, there are extreme conditions
of data distribution that these tests cannot take into account. For example,
the Levene Median test fails to detect differences in variance of several orders
of magnitude. However, these conditions should be easily detected by
simply examining the data without resorting to the automatic assumption
tests.
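
A rough outside equivalent of these checks can be sketched with scipy: a Kolmogorov-
Smirnov test of the residuals against a normal distribution and a Levene test centered on
the median for equal variance. The residuals and groups below are hypothetical, and
using the sample mean and standard deviation in kstest only approximates a Lilliefors-
style test, so this is not SigmaStat's exact procedure.

# Rough illustration of the assumption checks (not SigmaStat's exact procedures):
# a Kolmogorov-Smirnov test of the residuals against a normal distribution, and a
# Levene test centered on the median for equal variance.
import numpy as np
from scipy import stats

residuals = np.array([-1.2, 0.4, 0.8, -0.3, 1.1, -0.6, 0.2, -0.4])
groups = [[8.5, 8.5, 9.5], [9.0, 9.0, 10.0], [11.0, 10.5, 12.0]]

ks_stat, ks_p = stats.kstest(residuals, "norm",
                             args=(residuals.mean(), residuals.std(ddof=1)))
lev_stat, lev_p = stats.levene(*groups, center="median")   # Levene Median (Brown-Forsythe)

print(f"Normality:      P = {ks_p:.3f} -> {'passed' if ks_p > 0.05 else 'failed'}")
print(f"Equal variance: P = {lev_p:.3f} -> {'passed' if lev_p > 0.05 else 'failed'}")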

Summary Table Select the Results tab in the options dialog box to view the summary
table option. The Summary Table option displays the number of
observations for a column or group, the number of missing values for a
column or group, the average value for the column or group, the
standard deviation of the column or group, and the standard error of the
mean for the column or group.

Confidence Interval Select the Results tab in the options dialog box to view the Confidence
Interval option. The Confidence Intervals option displays the
confidence interval for the difference of the means. To change the
interval, enter any number from 1 to 99 (95 and 99 are the most
commonly used intervals). Click the selected check box if you do not
want to include the confidence interval in the report.

FIGURE 10–33
The Options for
Two Way ANOVA Dialog Box
Displaying the Summary
Table, Confidence Intervals,
and Residuals Options

Residuals Select the Results tab in the options dialog box to view the Residuals
option. Use the Residuals option to display residuals in the report and
to save the residuals of the test to the specified worksheet column. To
change the column the residuals are saved to, edit the number in or
select a number from the drop-down list.

Power Select the Post Hoc Tests tab in the options dialog box to view the Power
options. The power or sensitivity of a test is the probability that the test
will detect a difference between the groups if there is really a difference.

FIGURE 10–34
The Options for Two
Way RM ANOVA Dialog Box
Displaying the
Power and Multiple
Comparison Options

Change the alpha value by editing the number in the Alpha Value box.

Alpha (α) is the acceptable probability of incorrectly concluding that
there is a difference. The suggested value is α = 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing
to conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before concluding
there is a significant difference, but a greater possibility of concluding
there is no difference when one exists. Larger values of α make it easier
to conclude that there is a difference, but also increase the risk of
reporting a false positive.

Multiple Comparisons Select the Post Hoc Test tab in the Options dialog box to view the
multiple comparisons options (see Figure 10–36 on page 396). The
Two Way Repeated Measures ANOVA tests the hypothesis of no
differences between the several treatment groups, but does not determine
which groups are different, or the sizes of these differences. Multiple
comparison procedures isolate these differences.

The P value used to determine if the ANOVA detects a difference is set


in the Report Options dialog box. If the P value produced by the Two
Way RM ANOVA is less than the P value specified in the box, a
difference in the groups is detected and the multiple comparisons are
performed. For more information on specifying a P value for the
ANOVA, see Setting Report Options on page 135.

Performing Multiple Comparisons You can choose to always perform


multiple comparisons or to only perform multiple comparisons if a Two
Way Repeated Measures ANOVA detects a difference.

Select the Always Perform option to perform multiple comparisons


whether or not the ANOVA detects a difference.

Select the Only When ANOVA P Value is Significant option to perform


multiple comparisons only if the ANOVA detects a difference.

Significant Multiple Comparison Value Select either .05 or .10 from


the Significance Value for Multiple Comparisons drop-down list. This
value determines the likelihood that the multiple comparison is
incorrect in concluding that there is a significant difference in the
treatments.

A value of .05 indicates that the multiple comparisons will detect a


difference if there is less than 5% chance that the multiple comparison is
incorrect in detecting a difference. A value of .10 indicates that the
multiple comparisons will detect a difference if there is less than 10%
chance that the multiple comparison is incorrect in detecting a
difference.

If multiple comparisons are triggered, the Multiple Comparison Options


dialog box appears after you pick your data from the worksheet and run the
test, prompting you to choose a multiple comparison method.

Running a Two Way Repeated Measures ANOVA

To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.

To run a Two Way Repeated Measures ANOVA:

1 If you want to select your data before you run the test, drag the
pointer over your data.

2 Open the Pick Columns for Two Way Repeated Measures


ANOVA dialog box to start the Two Way Repeated Measures
ANOVA. You can either:

➤ Select Two Way RM ANOVA from the toolbar drop-down list,


then select the button.
➤ Choose the Statistics menu Repeated Measures command, then
choose Two Way RM ANOVA...
➤ Click the Run button from the Options for Two Way RM
ANOVA dialog box (see page 388).

If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.

To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data drop-down list.

The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns
appear in each row. You are prompted to pick four data columns;
the first factor in one column, the second factor in a second
column, the subject index in a third, and the actual data in the
fourth.

FIGURE 10–35
The Pick Columns
for Two Way ANOVA
Dialog Box Prompting You to
Select Data Columns

3 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

4 Select Finish to perform the Two Way repeated measures ANOVA.


If you elected to test for normality and equal variance, and your
data fail either test, SigmaStat warns you and suggests continuing
your analysis using the nonparametric Friedman Repeated
Measures ANOVA on Ranks.

5 If you elected to test for normality and equal variance, SigmaStat


performs the test for normality (Kolmogorov-Smirnov) and the
test for equal variance (Levene Median). If your data fail either
test, SigmaStat informs you. You can either continue, or transform
your data, then perform a Two Way Repeated Measures ANOVA
on the transformed data (see Chapter 14, USING TRANSFORMS).

6 If your data have empty cells, you are prompted to perform the
appropriate procedure.

➤ If you are missing a cell, but the data is still connected, you may
have to proceed by either assuming no interaction between the
factors, or by performing a one factor analysis on each cell
➤ If your data is not geometrically connected, or if a subject is
missing data for one level, you cannot perform a Two Way
Repeated Measures ANOVA. Continue using a One Way
ANOVA, or cancel the test

➤ If you are missing a few data points, but there is still at least one
observation in each cell, SigmaStat automatically proceeds

For more information on missing data point and cell handling, see
Missing Data and Empty Cells on page 381.

7 If you selected to run multiple comparisons only when the P value


is significant, and the P value is not significant (see page 386), the
Two Way Repeated Measures ANOVA report appears after the test is complete. To
edit the report, use the Format menu commands; for information
on editing reports, see Editing Reports on page 137.

If the P value for multiple comparisons is significant, or you


selected to always perform multiple comparisons, the Multiple
Comparisons Options dialog box appears prompting you to select
a multiple comparison method. For more information on
selecting a multiple comparison method, see the following section,
Multiple Comparisons Options.

Multiple Comparison Options

The Two Way Repeated Measures ANOVA tests the hypothesis of no


differences between the several treatment groups, but does not determine
which groups are different, or the sizes of these differences. Multiple
comparison tests isolate these differences by running comparisons
between the experimental groups.

If you selected to run multiple comparisons only when the P value is


significant, and the ANOVA produces a P value equal to or less than the
trigger P value, or you selected to always run multiple comparisons in
the Options for Two Way RM ANOVA dialog box (see page 388), the
Multiple Comparison Options dialog box appears prompting you to
specify a multiple comparison test. The P value produced by the
ANOVA is displayed in the upper left corner of the dialog box. For
more information on the P value and how it affects multiple comparison
testing, see page 386.

There are seven multiple comparison tests to choose from for the Two
Way Repeated Measures ANOVA. You can choose to perform the

➤ Holm-Sidak Test
➤ Tukey Test

➤ Student-Newman-Keuls Test
➤ Bonferroni t-test
➤ Fisher’s LSD
➤ Dunnett’s Test
➤ Duncan’s Multiple Range Test

There are two types of multiple comparisons available for the Two Way
Repeated Measures ANOVA. The types of comparison you can make
depend on the selected multiple comparison test.

➤ All pairwise comparisons compare all possible pairs of treatments.


➤ Multiple comparisons versus a control compare all experimental
treatments to a single control group.

When comparing the two factors separately, the treatments within one
factor are compared among themselves without regard to the second
factor, and vice versa. These results should be used when the interaction
is not statistically significant.

When the interaction is statistically significant, interpreting multiple


comparisons among different levels of each experimental factor may not
be meaningful. SigmaStat also performs a multiple comparison between
all the cells.

The result of both comparisons is a listing of the similar and different


treatment pairs, i.e., those treatments that are and are not detectably
different from each other. Because no statistical test eliminates
uncertainty, multiple comparison procedures sometimes produce
ambiguous groupings.

Holm-Sidak Test The Holm-Sidak Test can be used for both pairwise comparisons and
comparisons versus a control group. It is more powerful than the Tukey
and Bonferroni tests and, consequently, it is able to detect differences
that these other tests do not. It is recommended as the first-line
procedure for pairwise comparison testing.

When performing the test, the P values of all comparisons are computed
and ordered from smallest to largest. Each P value is then compared to a
critical level that depends upon the significance level of the test (set in
the test options), the rank of the P value, and the total number of
comparisons made. A P value less than the critical level indicates there is
a significant difference between the corresponding two groups.
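
The step-down logic can be sketched in a few lines; the version below is an illustration of
the description above, not SigmaStat's own code, and the P values passed in are hypothetical.

# Holm-Sidak step-down procedure: order the P values from smallest to largest and
# compare each to a critical level that depends on the overall significance level,
# the P value's rank, and the number of comparisons.
def holm_sidak(p_values, alpha=0.05):
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    significant = [False] * k
    for rank, i in enumerate(order):                      # rank 0 = smallest P value
        critical = 1.0 - (1.0 - alpha) ** (1.0 / (k - rank))
        if p_values[i] <= critical:
            significant[i] = True
        else:
            break          # once a comparison fails, all larger P values also fail
    return significant

print(holm_sidak([0.001, 0.010, 0.030, 0.200]))   # [True, True, False, False]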

Tukey Test The Tukey Test and the Student-Newman-Keuls test are conducted
similarly to the Bonferroni t-test, except that they use a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The Tukey Test is
more conservative than the Student-Newman-Keuls test, because it
controls the errors of all comparisons simultaneously, while the Student-
Newman-Keuls test controls errors among tests of k means. Because it is
more conservative, it is less likely to determine that a given difference is
statistically significant and it is the recommended test for all pairwise
comparisons.

While Multiple Comparisons vs a Control is an available comparison


type for the Tukey Test, it is not recommended. Use the Dunnett’s Test
for multiple comparisons vs a control.

Student-Newman-Keuls The Student-Newman-Keuls Test and the Tukey Test are conducted
(SNK) Test similarly to the Bonferroni t-test, except that they use a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The Student-
Newman-Keuls Test is less conservative than the Tukey Test because it
controls errors among tests of k means, while the Tukey Test controls the
errors of all comparisons simultaneously. Because it is less conservative,
it is more likely to determine that a given difference is statistically
significant. The Student-Newman-Keuls Test is usually more sensitive
than the Bonferroni t-test, and is only available for all pairwise
comparisons.

Bonferroni t-test The Bonferroni t-test performs pairwise comparisons with paired t-tests.
The P values are then multiplied by the number of comparisons that
were made. It can perform both all pairwise comparisons and multiple
comparisons vs a control, and is the most conservative test for both
comparison types. For less conservative all pairwise comparison tests, see
the Tukey and the Student-Newman-Keuls tests, and for the less
conservative multiple comparison vs a control tests, see the Dunnett’s
Test.

Fisher’s Least The Fisher’s LSD Test is the least conservative all pairwise comparison
Significant Difference test. Unlike the Tukey and the Student-Newman-Keuls tests, it makes no
Test effort to control the error rate. Because it makes no attempt to control
the error rate when detecting differences between groups, it is not
recommended.

Dunnett’s Test Dunnett's test is the analog of the Student-Newman-Keuls Test for the
case of multiple comparisons against a single control group. It is
conducted similarly to the Bonferroni t-test, but with a more
sophisticated mathematical model of the way the error accumulates in
order to derive the associated table of critical values for hypothesis
testing. This test is less conservative than the Bonferroni Test, and is
only available for multiple comparisons vs a control.

Duncan’s The Duncan’s Test is conducted the same way as the Tukey and the Student-
Multiple Range Newman-Keuls tests, except that it is less conservative in determining
whether the difference between groups is significant by allowing a wider
range for error rates. Although it has a greater power to detect
differences than the Tukey and the Student-Newman-Keuls tests, it has
less control over the Type I error rate, and is, therefore, not
recommended.

Performing a Multiple The multiple comparison you choose to perform depends on the
Comparison treatments you are testing. Select Cancel if you do not want to perform
a multiple comparison procedure.

1 Multiple comparisons are performed on the factors selected under


the Select Factors to Compare heading. The factors with P values
less than or equal to the value set in the Options dialog box are
automatically selected, and the P values for the selected factors,
and/or the interactions of the factors are displayed in the upper left
corner of the dialog box. If the P value is greater than the P value
set in the Options dialog box, the factor is not selected, the P value
for the factor is not displayed, and multiple comparisons are not
performed for the factor.

You can disable multiple comparison testing for a factor by
clicking the selected option.

FIGURE 10–36
The Multiple Comparison
Options Dialog Box for
a Two Way ANOVA

2 Select the desired multiple comparison test from the Suggested


Test drop-down list. The Tukey and Student-Newman-Keuls tests
are recommended for determining the difference among all
treatments. If you have only a few treatments, you may want to
select the simpler Bonferroni t-test.

The Dunnett's test is recommended for determining the


differences between the experimental treatments and a control
group. If you have only a few treatments or observations, you can
select the simpler Bonferroni t-test.

Note that in both cases the Bonferroni t-test is most sensitive with a
small number of groups. Dunnett’s test is not available if you have
fewer than six observations.

For more information on each of the multiple comparison tests,


see page 389.

3 Select a Comparison Type. The types of comparisons available


depend on the selected test. All Pairwise compares all possible
pairs of treatments and is available for the Tukey, Student-
Newman-Keuls, Bonferroni, Fisher LSD, and Duncan’s tests.

Versus Control compares all experimental treatments to a single


control group and is available for the Tukey, Bonferroni, Fisher
LSD, Dunnett’s, and Duncan’s tests. It is not recommended for the
Tukey, Fisher LSD, or Duncan’s test. If you select Versus Control,
you must also select the control group from the list of groups.

For more information on multiple comparison tests and the
available comparison types, see page 389.

4 If you selected an all pairwise comparison test, select Finish to


continue with the Two Way RM ANOVA and view the report (see
Figure 10–38 on page 399). For information on editing reports,
see Editing Reports on page 137.

5 If you selected a multiple comparisons versus a control test,


select Next. The Multiple Comparisons Options dialog box
prompts you to select a control group for each factor. Select the
desired control groups from the lists, then select Finish to continue
with the Two Way RM ANOVA and view the report (see Figure
10–38 on page 399). For information on editing, see Editing
Reports on page 137.

FIGURE 10–37
The Multiple Comparison
Options Dialog Box
Prompting You To Select a
Control Group

Interpreting Two Way Repeated Measures ANOVA Results

A Two Way Repeated Measures ANOVA of one repeated factor


generates an ANOVA table describing the source of the variation among
the treatments. This table displays the sum of squares, degrees of
freedom, and mean squares for the subjects, for each factor, for both
factors together, and for the subject and the repeated factor. The
corresponding F statistics and the corresponding P values are also
displayed.

A Two Way Repeated Measures ANOVA of two repeated factors


includes the sum of squares, degrees of freedom, and mean squares for
the subjects with both factors, since both factors are repeated.
Corresponding F statistics and the corresponding P values are also
displayed.
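
For the case where both factors are repeated on every subject, a comparable table can be
produced outside SigmaStat with statsmodels' AnovaRM, as sketched below with
hypothetical column names and the values of Table 3-2. Designs with only one repeated
factor (mixed designs) or with missing cells need a different approach, such as the
general linear model SigmaStat applies.

# Illustration for the both-factors-repeated case: AnovaRM fits a two way repeated
# measures ANOVA when every subject has data for every Temperature x Salinity cell.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

long = pd.DataFrame({
    "Subject":     ["A", "B", "C"] * 4,
    "Temperature": [25] * 6 + [30] * 6,
    "Salinity":    ([10] * 3 + [15] * 3) * 2,
    "Activity":    [8.5, 8.5, 9.5, 11.0, 10.5, 12.0,
                    9.0, 9.0, 10.0, 12.5, 11.5, 13.0],
})
result = AnovaRM(long, depvar="Activity", subject="Subject",
                 within=["Temperature", "Salinity"]).fit()
print(result)   # F and P for Temperature, Salinity, and their interaction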

Tables of least square means for each of the levels of each factor and for the
levels of both factors together are also generated for both one and two
factor two way repeated measures ANOVA.

Additional results for both forms of Two Way Repeated Measure


ANOVA can be disabled and enabled in the Options for Two Way RM
ANOVA dialog box (page 387). Multiple comparisons are enabled in
the Options for Two Way RM ANOVA dialog box. The test used in the
multiple comparison is selected in the Multiple Comparison Options
dialog box (see page 396).

The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the page up and
page down buttons in the formatting toolbar.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Options option.

The number of decimal places displayed is also set in the Report


Options dialog box. For more information on setting report options, see
page 135.

If There Were Missing If your data contained missing values but no empty cells, the report
Data or Empty Cells indicates the results were computed using a general linear model. The
ANOVA table includes the approximate degrees of freedom used to
compute F, the estimated mean square equations are listed, and the
summary table displays the estimated least square means.

If your data contained empty cells, you either analyzed the problem
assuming no interaction, or treated the problem as a One Way ANOVA.

➤ If you choose no interactions, no statistics for factor interaction are


calculated
➤ If you performed a One Way ANOVA, the results shown are
identical to one way ANOVA results (see page 369)

FIGURE 10–38
The Report of a
Two Way
Repeated
Measures ANOVA
with One
Repeated Factor

For descriptions of the derivations for two way repeated measures


ANOVA results, you can reference an appropriate statistics reference.
For a list of suggested references, see page 12.

Dependent Variable This is the column title of the indexed worksheet data you are analyzing
with the Two Way Repeated Measures ANOVA. Determining if the
values in this column are affected by the different factor levels is the
objective of the Two Way Repeated Measures ANOVA.

Normality Test Normality test results display whether the data passed or failed the test of
the assumption that the differences of the changes originate from a
normal distribution, and the P value calculated by the test. A normally
distributed source is required for all parametric tests.

This result appears if you enabled normality testing in the Options for
Two Way RM ANOVA dialog box (see page 387).

Equal Variance Test Equal Variance test results display whether the data passed or
failed the test of the assumption that the differences of the changes
originate from a population with the same variance, and the P value
calculated by the test. Equal variance of the source is assumed for all
parametric tests.

This result appears if you enabled equal variance testing in the Options
for Two Way RM ANOVA dialog box (see page 387).

ANOVA Table The ANOVA table lists the results of the two way repeated measures
ANOVA. The results are calculated for each factor, and then between
the factors.

DF (Degrees of Freedom) The degrees of freedom are a measure of the


numbers of subjects and treatments, which affects the sensitivity of the
ANOVA.

➤ Factor degrees of freedom are measures of the number of treatments
in each factor (columns in the table).
➤ The factor × factor interaction degrees of freedom is a measure of
the total number of cells.
➤ The subjects degrees of freedom is a measure of the number of
subjects (rows in the table).
➤ The subject × factor degrees of freedom is a measure of the number
of subjects and treatments for the factor.
➤ The residual degrees of freedom is a measure of the difference between
the number of subjects and the number of treatments after
accounting for factor and interaction.

SS (Sum of Squares) The sum of squares is a measure of variability


associated with each element in the ANOVA table.

➤ Factor sum of squares measures variability of treatments in each


factor (between the rows and columns of the table, considered
separately).
➤ The factor × factor interaction sum of squares measures the
variability of the treatments for both factors; this is the variability of
the average differences between the cells in addition to the variation
between the rows and columns, considered separately.
➤ The subjects sum of squares measures the variability of all subjects.
➤ The subject × factor sum of squares is a measure of the variability of
the subjects within each factor.
➤ The residual sum of squares is a measure of the underlying
variability of all observations.

MS (Mean Squares) The mean squares provide estimates of the
population variances. Comparing these variance estimates is the basis of
analysis of variance.

The mean square for each factor

   MS factor = (sum of squares for the factor) / (degrees of freedom for the factor)
             = SS factor / DF factor

is an estimate of the variance of the underlying population computed
from the variability between levels of the factor.

The interaction mean square

   MS inter = (sum of squares for the interaction) / (degrees of freedom for the interaction)
            = SS inter / DF inter

is an estimate of the variance of the underlying population computed
from the variability associated with the interactions of the factors.

The error mean square (residual, or within groups)

   MS error = (error sum of squares) / (error degrees of freedom)
            = SS error / DF error

is an estimate of the variability in the underlying population, computed
from the random component of the observations.

F Test Statistic The F test statistic is provided for comparisons within
each factor and between the factors.

If there are no missing data, the F statistic within the factors is

   F factor = (mean square for the factor) / (error mean square for the factor)
            = MS factor / MS error

and the F ratio between the factors is

   F inter = (mean square for the interaction) / (error mean square for the interaction)
           = MS inter / MS error

If there are missing data or empty cells, SigmaStat automatically adjusts the F
computations to account for the offsets of the expected mean squares.

If the F ratio is around 1, the data is consistent with the null hypothesis
that there is no effect (i.e., no differences among treatments).

If F is a large number, the variability among the means is larger than


expected from random variability in the population, and you can
conclude that the samples were drawn from different populations (i.e.,
the differences between the treatments are statistically significant).
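
As a small worked example of these ratios (with hypothetical mean squares and degrees of
freedom, and scipy assumed), each effect's F is its mean square divided by the error mean
square, and the P value is the upper tail of the corresponding F distribution:

# Hypothetical worked example of the F ratios defined above.
from scipy import stats

ms = {"factor A": 30.0, "factor B": 12.0, "A x B": 3.0, "error": 2.0}
df = {"factor A": 1,    "factor B": 1,    "A x B": 1,   "error": 8}

for effect in ("factor A", "factor B", "A x B"):
    f_ratio = ms[effect] / ms["error"]                       # MS effect / MS error
    p = stats.f.sf(f_ratio, df[effect], df["error"])         # upper tail of the F distribution
    print(f"{effect}: F = {f_ratio:.2f}, P = {p:.4f}")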

P value The P value is the probability of being wrong in concluding


that there is a true difference between the treatments (i.e., the
probability of falsely rejecting the null hypothesis, or committing a Type
I error, based on F). The smaller the P value, the greater the probability
that the samples are drawn from different populations. Traditionally,
you can conclude there are significant differences if P < 0.05.

Approximate DF (Degrees of Freedom) If a general linear model was


used, the ANOVA table also includes the approximate degrees of
freedom that allow for the missing value(s). See DF (Degrees of
Freedom) above for an explanation of the degrees of freedom for each
variable.

Power The power of the performed test is displayed unless you disable this
option in the Options for Two Way RM ANOVA dialog box.

The power, or sensitivity, of a Two Way Repeated Measures ANOVA is


the probability that the test will detect a difference among the
treatments if there really is a difference. The closer the power is to 1, the
more sensitive the test.

Repeated Measures ANOVA power is affected by the sample sizes, the


number of treatments being compared, the chance of erroneously
reporting a difference α (alpha), the observed differences of the group
means, and the observed standard deviations of the samples.

Alpha (α) Alpha (α) is the acceptable probability of incorrectly
concluding that there is a difference. An α error is also called a Type I
error. A Type I error is when you reject the hypothesis of no effect when
this hypothesis is true.

The α value is set in the Options for Two Way RM ANOVA dialog box;
the suggested value is α = 0.05, which indicates that a one in twenty
chance of error is acceptable. Smaller values of α result in stricter
requirements before concluding there is a significant difference, but a
greater possibility of concluding there is no difference when one exists (a
Type II error). Larger values of α make it easier to conclude that there
is a difference but also increase the risk of seeing a false difference (a
Type I error).

Expected If there were missing data and a general linear model was used, the linear
Mean Squares equations for the expected mean squares computed by the model are
displayed. These equations are displayed only if a general linear model
was used.

Summary Table The least square means and standard error of the means are displayed for
each factor separately (summary table row and column), and for each
combination of factors (summary table cells). If there are missing values,
the least square means are estimated using a general linear model.

Mean The average value for the column. If the observations are
normally distributed the mean is the center of the distribution.

Standard Error of the Mean A measure of the approximation with


which the mean computed from the sample approximates the true
population mean.

This table is generated if you select to display summary table in the


Options for Two Way RM ANOVA dialog box (see page 388).

Multiple Comparisons If a difference is found among the treatments, a multiple comparison


table can be computed. Multiple comparisons are enabled in the
Options for Two Way Repeated Measures ANOVA dialog box (see page
388). The multiple comparison test is selected in the Multiple
Comparisons Options dialog box (page 396).

Multiple comparison results are used to determine exactly which


treatments are different, since the ANOVA results only inform you that
two or more of the treatments are different. Two factor multiple
comparison for a full Two Way ANOVA also compares:

➤ Treatments within each factor without regard to the other factor
(this is a marginal comparison, i.e., only the columns or rows in the
table are compared).
➤ All combinations of factors (all cells in the table are compared).

The specific type of multiple comparison results depends on the


comparison test used and whether the comparison was made pairwise or
versus a control.

➤ All pairwise comparison results list comparisons of all possible


combinations of group pairs; the all pairwise tests are the Tukey,
Student-Newman-Keuls, Fisher LSD, Duncan’s, and Bonferroni
t-tests.
➤ Comparisons versus a single control group list only comparisons
with the selected control group. The control group is selected
during the actual multiple comparison procedure. The comparison
versus a control tests are the Bonferroni t-test and Dunnett's test.

Bonferroni t-test Results The Bonferroni t-test lists the differences of


the means for each pair of treatments, computes the t values for each
pair, and displays whether or not P < 0.05 for that comparison. The
Bonferroni t-test can be used to compare all treatments or to compare
versus a control.

You can conclude from “large” values of t that the difference of the two
treatments being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of
erroneously concluding that there is a significant difference is less than
5%. If it is greater than 0.05, you cannot confidently conclude that
there is a difference.

The Difference of Means is a gauge of the size of the difference between


the treatments or cells being compared.

The degrees of freedom DF for the marginal comparisons are a measure


of the number of treatments (levels) within the factor being compared.
The degrees of freedom when comparing all cells is a measure of the
sample size after accounting for the factors and interaction (this is the
same as the error or residual degrees of freedom).

Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and
Dunnett's Test Results The Tukey, Student-Newman-Keuls (SNK),
Fisher LSD, and Duncan’s tests are all pairwise comparisons of every
combination of group pairs. While the Tukey, Fisher LSD, and
Duncan’s tests can be used to compare a control group to other groups, they
are not recommended for this type of comparison.

Dunnett's test only compares a control group to all other groups. All
tests compute the q test statistic, the number of means spanned in the
comparison p, and display whether or not P < 0.05 for that pair
comparison.

You can conclude from “large” values of q that the difference of the two
treatments being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of
being incorrect in concluding that there is a significant difference is less
than 5%. If it is greater than 0.05, you cannot confidently conclude
that there is a difference.

p is a parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the group means being compared. Group means are ranked in order from largest to smallest in an SNK test, so p is the number of means spanned in the comparison. For example, when comparing four means, comparing the largest to the smallest p = 4, and when comparing the second smallest to the smallest p = 2.
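The notion of "means spanned" can be made concrete with a short sketch; the group names and means below are hypothetical:

# Illustrative sketch: computing p, the number of means spanned, for an
# SNK-style comparison. The group means below are hypothetical.
means = {"A": 12.4, "B": 9.7, "C": 11.1, "D": 8.2}

# Rank the group means from largest to smallest.
ordered = sorted(means, key=means.get, reverse=True)   # ['A', 'C', 'B', 'D']

def spanned(g1, g2):
    """Number of means spanned (inclusive) between two groups."""
    i, j = ordered.index(g1), ordered.index(g2)
    return abs(i - j) + 1

print(spanned("A", "D"))  # largest vs smallest of four means -> p = 4
print(spanned("B", "D"))  # second smallest vs smallest       -> p = 2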

If a treatment is found to be not significantly different than another


treatment, all treatments with p ranks in between the p ranks of the two
treatments that are not different are also assumed not to be significantly
different, and a result of DNT (Do Not Test) appears for those
comparisons.

SigmaStat does not apply the DNT logic to all pairwise comparisons because
of differences in the degrees of freedom between different cell pairs.

The Difference of Means is a gauge of the size of the difference between


the treatments or cells being compared.

The degrees of freedom DF for the marginal comparisons are a measure


of the number of treatments (levels) within the factor being compared.
The degrees of freedom when comparing all cells is a measure of the



sample size after accounting for the factors and interaction (this is the
same as the error or residual degrees of freedom).

Two Way Repeated Measures ANOVA Report Graphs

You can generate up to five graphs using the results from a Two Way
Repeated Measures ANOVA. They include a:

➤ Histogram of the residuals


➤ Normal probability plot of the residuals
➤ 3D scatter plot of the residuals
➤ 3D category scatter plot
➤ Multiple comparison graphs

Histogram of Residuals The Two Way Repeated Measures ANOVA histogram plots the raw
residuals in a specified range, using a defined interval set. The residuals
are divided into a number of evenly incremented histogram intervals and
plotted as histogram bars indicating the number of residuals in each
interval. The X axis represents the histogram intervals, and the Y axis represents the number of residuals in each group. For an example of a
histogram, see page 153.

Probability Plot The Two Way Repeated Measures ANOVA probability plot graphs the
frequency of the raw residuals. The residuals are sorted and then plotted
as points around a curve representing the area of the gaussian plotted on
a probability axis. Plots with residuals that fall along the Gaussian curve
indicate that your data was taken from a normally distributed
population. The X axis is a linear scale representing the residual values.
The Y axis is a probability scale representing the cumulative frequency of
the residuals. For an example of a probability plot, see page 155.

3D Residual The Two Way RM ANOVA 3D residual scatter plot graphs the residuals
Scatter Plot of the two columns of independent variable data. The X and the Y axes
represent the independent variables, and the Z axis represents the
residuals. For an example of a 3D residual scatter plot, see page 156.

3D Category The Two Way Repeated Measures ANOVA 3D Category Scatter plot
Scatter Graph graphs the two factors from the independent data columns along the X
and Y axes against the data of the dependent variable column along the
Z axis. The tick marks for the X and Y axes represent the two factors



from the independent variable columns, and the tick marks for the Z
axis represent the data from the dependent variable column. For an
example of a 3D category scatter plot, see page 406.

Multiple The Two Way Repeated Measures ANOVA multiple comparison graphs
Comparison Matrix plot significant differences between levels of a significant factor. There is
one graph for every significant factor reported by the specified multiple
comparison test. If there is one significant factor reported, one graph
appears; if there are two significant factors, two graphs appear, etc. If a
factor is not reported as significant, a graph for the factor does not
appear. For an example of a multiple comparison graph, see page 160.

Creating a Graph To generate a graph of Two Way Repeated Measures ANOVA report
data:

1 Click the toolbar button, or choose the Graph menu Create


Graph command when the Two Way Repeated Measures ANOVA
report is selected. The Create Graph dialog box appears displaying
the types of graphs available for the Two Way Repeated Measures
ANOVA results.

FIGURE 10–39  The Create Graph Dialog Box for Two Way RM ANOVA Report Graphs

2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.



For more information on each of the graph types, see Chapter 8.
The specified graph appears in a graph window or in the report.

FIGURE 10–40
Example of a Multiple
Comparison Graph

For information on manipulating graphs, see page 178 through page


202.

Friedman Repeated Measures Analysis of Variance on Ranks

Use a Repeated Measures ANOVA (analysis of variance) on ranks when:

➤ You want to see if a single group of individuals was affected by a


series of three or more different experimental treatments, where each individual received every treatment.
➤ The treatment effects are not normally distributed.

If you know the treatment effects are normally distributed, use One Way
Repeated Measures ANOVA. If there are only two treatments to
compare, do a Wilcoxon Signed Rank Test. There is no two factor test
for non-normally distributed treatment effects; however, you can
transform your data using Transform menu commands so that it fits the
assumptions of a parametric test.



For more information on transforming your data, see Chapter 14,
USING TRANSFORMS.

Note that, depending on your Repeated Measures ANOVA on Ranks option


settings (see page 410), if you attempt to perform a Repeated Measures
ANOVA on Ranks on a normal population, SigmaStat informs you that the
data is suitable for a parametric test, and suggests One Way Repeated
Measures ANOVA instead.

About the Repeated The Friedman Repeated Measures Analysis of Variance on Ranks
Measures ANOVA compares effects of a series of different experimental treatments on a
on Ranks single group. Each subject's responses are ranked from smallest to
largest without regard to other subjects, then the rank sums for the
treatments are compared.

The Friedman Repeated Measures ANOVA on Ranks is a


nonparametric test that does not require assuming all the differences in
treatments are from a normally distributed source with equal variance.
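Outside SigmaStat, the same test is widely available; for instance, a minimal SciPy sketch on hypothetical data (each list is one treatment column, with subjects in matching row positions) looks like this:

# Minimal sketch of a Friedman test using SciPy; the data are hypothetical.
from scipy import stats

treatment_1 = [7.0, 9.9, 8.5, 5.1, 10.3]
treatment_2 = [5.3, 5.7, 4.7, 3.5, 7.7]
treatment_3 = [4.9, 7.6, 5.5, 2.8, 8.4]

chi2_r, p = stats.friedmanchisquare(treatment_1, treatment_2, treatment_3)
print(f"chi-square_r = {chi2_r:.3f}, P = {p:.4f}")
# A small P (traditionally P < 0.05) suggests the treatment effects differ.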

Performing a Repeated To perform a Repeated Measures ANOVA on Ranks:


Measures ANOVA on
Ranks 1 Enter or arrange your data appropriately in the worksheet (page
410).

2 If desired, set the rank sum options using the Options for RM
ANOVA on Ranks dialog box (see following section).

3 Select RM ANOVA on Ranks from the toolbar, or choose the Statistics menu Repeated Measures command, then choose Repeated Measures ANOVA on Ranks.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 415).

5 Specify the multiple comparisons you want to perform on your


data (page 417).

6 View and interpret the Repeated Measures ANOVA on Ranks


report and generate report graphs (page 421 and page 425).



Arranging Repeated Measures ANOVA on Ranks Data

The format of the data to be tested can be raw data or indexed data.
Raw data is placed in as many columns as there are treatments, up to 64; each column contains the data for one treatment and each row contains the data for one subject. Indexed data is placed in three
worksheet columns: a factor column, a subject index column, and a data
column.

The columns for raw data must be the same length. If a missing value is
encountered, that individual is ignored.

For more information on arranging data, see Data Format for Repeated
Measures Tests on page 330, or Arranging Data for Contingency Tables
on page 69.
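If you prepare data outside SigmaStat, the raw and indexed layouts correspond to "wide" and "long" tables, and the conversion can be sketched with pandas (the column names below are hypothetical):

# Sketch: converting raw (one column per treatment) data to indexed
# (subject, factor, data) format with pandas. Column names are hypothetical.
import pandas as pd

raw = pd.DataFrame({
    "Treatment 1": [7.0, 9.9, 8.5],
    "Treatment 2": [5.3, 5.7, 4.7],
    "Treatment 3": [4.9, 7.6, 5.5],
})
raw.index.name = "Subject"

indexed = (raw.reset_index()
              .melt(id_vars="Subject", var_name="Factor", value_name="Data"))
print(indexed)
# Each row of `indexed` is one observation: a subject, a treatment level,
# and the measured value, matching the three-column indexed layout above.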

FIGURE 10–41  Valid Data Formats for a Repeated Measures ANOVA on Ranks. Columns 1 through 3 are arranged as raw data. Columns 4 through 6 are arranged as indexed data, with column 4 as the subject column, column 5 as the factor column, and column 6 as the data column.

Selecting Data Repeated Measures ANOVA on Ranks can be performed on entire worksheet columns or on only a portion of the columns. When running a test you
can either:

➤ Select the columns to test by dragging your mouse over the columns
before choosing the test.
➤ Select the columns while running the test.

Setting the Repeated Measures ANOVA on Ranks Options

Use the Repeated Measures ANOVA on Ranks options to:

➤ Adjust the parameters of the test to relax or restrict the testing of


your data for normality and equal variance.



➤ Display the summary table.
➤ Enable and disable multiple comparison testing.

To change the Repeated Measures ANOVA on Ranks options:

1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.

2 To open the Options for RM ANOVA on Ranks dialog box, select


RM ANOVA on Ranks from the toolbar drop-down, then click
the button, or choose the Statistics menu Current Test
Options... command. The Normality and Equal Variance options
appear (see Figure 10–42 on page 412).

3 Click the Results tab to view the Summary Table option (see
Figure 10–43 on page 413). Click the Post Hoc Test tab to view
the Power and Multiple Comparisons options (see Figure 10–44
on page 414). Click the Assumption Checking tab to return to the
Normality and Equal Variance options.

4 Click a check box to enable or disable a test option. Options


settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 411 through 423.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 415 for more information).

6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

You can select Help at any time to access SigmaStat’s on-line help
system.

Normality and Select the Assumption Checking tab from the options dialog box to view
Equal Variance the Normality and Equal Variance options. The normality assumption
Assumptions test checks for a normally distributed population. The equal variance
assumption test checks the variability about the group means.



Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test
for a normally distributed population.

Equal Variance Testing SigmaStat tests for equal variance by checking


the variability about the group means.

P Values for Normality and Equal Variance Enter the corresponding


P value in the P Value to Reject box. The P value determines the
probability of being incorrect in concluding that the data is not
normally distributed (the P value is the risk of falsely rejecting the null
hypothesis that the data is normally distributed). If the P value
computed by the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or equal variance,


increase the P value. Because the parametric statistical methods are
relatively robust in terms of detecting violations of the assumptions, the
suggested value in SigmaStat is 0.050. Larger values of P (for example,
0.100) require less evidence to conclude that data is not normal.

To relax the requirement of normality and/or equal variance,


decrease P. Requiring smaller values of P to reject the normality
assumption means that you are willing to accept greater deviations from
the theoretical normal distribution before you flag the data as non-
normal. For example, a P value of 0.010 requires greater deviations from
normality to flag the data as non-normal.
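The pass/fail logic can be sketched in a few lines of Python; this is an illustration using a Kolmogorov-Smirnov test on hypothetical residuals with the suggested 0.050 threshold, not SigmaStat's exact internal procedure:

# Sketch of the normality pass/fail decision: run a Kolmogorov-Smirnov test
# on standardized residuals and compare the resulting P value to the
# "P Value to Reject" threshold. Data are hypothetical.
import numpy as np
from scipy import stats

residuals = np.array([0.3, -0.5, 0.1, 0.8, -0.2, -0.6, 0.4, 0.0, -0.3, 0.2])
p_to_reject = 0.050  # the suggested SigmaStat threshold

z = (residuals - residuals.mean()) / residuals.std(ddof=1)
statistic, p = stats.kstest(z, "norm")

passed = p > p_to_reject  # the test "passes" if the computed P exceeds the threshold
print(f"K-S P = {p:.3f}, normality {'passed' if passed else 'failed'}")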

FIGURE 10–42
The Options for RM
ANOVA on Ranks
Dialog Box Displaying
the Assumption
Checking Options

Although the assumption tests are robust in detecting data from populations
that are non-normal or with unequal variances, there are extreme conditions
of data distribution that these tests cannot take into account. For example,
the Levene Median test fails to detect differences in variance of several orders



of magnitude. However, these conditions should be easily detected by
simply examining the data without resorting to the automatic assumption
tests.

Summary Table Select the Results tab to view the Summary Table option. The summary
table for an ANOVA on Ranks lists the medians, percentiles, and sample
sizes N in the ANOVA on Ranks report. If desired, change the
percentile values by editing the boxes. The 25th and the 75th
percentiles are the suggested percentiles.

FIGURE 10–43
The Options for RM ANOVA
on Ranks Dialog Box
Displaying the Summary
Table Options

Multiple Comparisons Select the Post Hoc Test tab in the Options dialog box to view the
multiple comparisons options (see Figure 10–44 on page 414).
The Repeated Measures ANOVA on Ranks tests the hypothesis of no differences between the several treatment groups, but does not determine which groups are different, or the sizes of these differences. Multiple comparison procedures isolate these differences.

The P value used to determine if the ANOVA detects a difference is set


in the Report Options dialog box. If the P value produced by the ANOVA on Ranks is less than the P value specified in the box, a difference in the groups is detected and the multiple comparisons are performed. For
more information on specifying a P value for the ANOVA, see Setting
Report Options on page 135.

Performing Multiple Comparisons You can choose to always perform


multiple comparisons or to only perform multiple comparisons if the ANOVA on Ranks detects a difference.



Select the Always Perform option to perform multiple comparisons
whether or not the ANOVA detects a difference.

Select the Only When ANOVA P Value is Significant option to perform


multiple comparisons only if the ANOVA detects a difference.

Significant Multiple Comparison Value Select either .05 or .10 from


the Significance Value for Multiple Comparisons drop-down list. This value determines the likelihood of the multiple comparison being incorrect in concluding that there is a significant difference in the treatments.

A value of .05 indicates that the multiple comparisons will detect a


difference if there is less than 5% chance that the multiple comparison is
incorrect in detecting a difference. A value of .10 indicates that the
multiple comparisons will detect a difference if there is less than 10%
chance that the multiple comparison is incorrect in detecting a
difference.

FIGURE 10–44
The Options for RM
ANOVA on Ranks Dialog Box
Displaying the Multiple
Comparison Options

If multiple comparisons are triggered, the Multiple Comparison Options


dialog box appears after you pick your data from the worksheet and run the
test, prompting you to choose a multiple comparison method. See page 392
for more information.

Running a Repeated Measures ANOVA on Ranks

To run a Repeated Measures ANOVA on Ranks, you need to select the


data to test. The Pick Columns dialog box is used to select the
worksheet columns with the data you want to test and to specify how
your data is arranged in the worksheet.



To run a Repeated Measures ANOVA on Ranks:

1 If you want to select your data before you run the test, drag the
pointer over your data.

2 Open the Pick Columns dialog box to start the Repeated Measures
ANOVA on Ranks. You can either:

➤ Select RM ANOVA on Ranks from the toolbar drop-down list,


then select the button.
➤ Choose the Statistics menu Repeated Measures command, then
choose Repeated Measures ANOVA on Ranks.
➤ Click the Run Test button from the Options for RM ANOVA
on Ranks dialog box (see page 413).

The Pick Columns dialog box appears prompting you to specify a


data format.

3 Select the appropriate data format from the Data Format drop-
down list. If your data is grouped in columns, select Raw. If your
data is in the form of a group index column(s) paired with a data
column(s), select Indexed.

FIGURE 10–45
The Pick Columns
for RM ANOVA on Ranks
Dialog Box Prompting You to
Specify a Data Format

For more information on arranging data, see Data Format for


Repeated Measures Tests on page 330, or Arranging Data for
Contingency Tables on page 69.

4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.



5 To assign the desired worksheet columns to the Selected Columns
list, select the columns in the worksheet, or select the columns
from the Data for Data drop-down list.

The first selected column is assigned to the first row in the Selected
Columns list, and all successively selected columns are assigned to
successive rows in the list. The number or title of selected columns
appear in each row. For raw data you are prompted for up to 64
data columns and for indexed data, you are prompted to select
three (Subject, Level, Data) worksheet columns.

FIGURE 10–46
The Pick Columns
for RM ANOVA on Ranks
Dialog Box Prompting You to
Select Data Columns

6 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

7 Select Finish to perform the test. If you elected to test for


normality and equal variance, SigmaStat performs the test for
normality (Kolmogorov-Smirnov) and the test for equal variance
(Levene Median). If your data passes both tests, SigmaStat informs
you and suggests continuing your analysis using One Way
Repeated Measures ANOVA.

If you did not enable multiple comparison testing in the Options


for RM ANOVA on Ranks dialog box, the Repeated Measures
ANOVA on Ranks report appears after the test is complete.

If you did enable the Multiple Comparisons option in the options


dialog box, the Multiple Comparison Options dialog box appears
prompting you to select a multiple comparison method. For more
information on selecting a multiple comparison method, see the
following section, Multiple Comparisons Options.



Multiple Comparison Options

If you selected to run multiple comparisons only when the P value is


significant, and the ANOVA produces a P value, for either of the two
factors or the interaction between the two factors, equal to or less than
the trigger P value, or you selected to always run multiple comparisons
in the Options for RM ANOVA on Ranks dialog box (see page 414), the
Multiple Comparison Options dialog box appears prompting you to
specify a multiple comparison test.

This dialog box displays the P values for each of the two experimental
factors and of the interaction between the two factors. Only the options
with P values less than or equal to the value set in the Options dialog
box are selected. You can disable multiple comparison testing for a factor
by clicking the selected option. If no factor is selected, multiple
comparison results are not reported.

There are four multiple comparison tests to choose from for the
ANOVA on Ranks. You can choose to perform the

➤ Dunn’s Test
➤ Dunnett’s Test
➤ Tukey Test
➤ Student-Newman-Keuls Test

There are two kinds of multiple comparison procedures available for the
Repeated Measures ANOVA on Ranks.

➤ All pairwise comparisons test the difference between every possible pair of treatments.
➤ Multiple comparisons versus a control test the difference between each experimental treatment and a single control treatment.

Tukey Test The Tukey Test and the Student-Newman-Keuls test are conducted similarly to the Bonferroni t-test, except that they use a table of critical values that is computed based on a better mathematical model of the probability structure of the multiple comparisons. The Tukey Test is more conservative than the Student-Newman-Keuls test, because it controls the errors of all comparisons simultaneously, while the Student-Newman-Keuls test controls errors among tests of k means. Because it is



more conservative, it is less likely to determine that a given difference is
statistically significant and it is the recommended test for all pairwise
comparisons.

While Multiple Comparisons vs a Control is an available comparison


type for the Tukey Test, it is not recommended. Use Dunnett’s Test
for multiple comparisons vs a control.

Student-Newman-Keuls The Student-Newman-Keuls Test and the Tukey Test are conducted
(SNK) Test similarly to the Bonferroni t-test, except that it uses a table of critical
values that is computed based on a better mathematical model of the
probability structure of the multiple comparisons. The Student-
Newman-Keuls Test is less conservative than the Tukey Test because it
controls errors among tests of k means, while the Tukey Test controls the
errors of all comparisons simultaneously. Because it is less conservative,
it is more likely to determine that a given difference is statistically
significant. The Student-Newman-Keuls Test is usually more sensitive
than the Bonferroni t-test, and is only available for all pairwise
comparisons.

Dunn's Test Dunn's test must be used for ANOVA on Ranks when the sample sizes
in the different treatment groups are different. You can perform both all
pairwise comparisons and multiple comparisons versus a control with
the Dunn’s test. The all pairwise Dunn’s test is the default for data with
missing values.

Dunnett’s Test Dunnett's test is the analog of the Student-Newman-Keuls Test for the
case of multiple comparisons against a single control group. It is
conducted similarly to the Bonferroni t-test, but with a more
sophisticated mathematical model of the way the error accumulates in
order to derive the associated table of critical values for hypothesis
testing. This test is less conservative than the Bonferroni Test, and is
only available for multiple comparisons vs a control.

Performing a Multiple The multiple comparison you choose to perform depends on the
Comparison treatments you are testing. Select Cancel if you do not want to perform
a multiple comparison procedure.

1 Multiple comparisons are performed of the factors selected under


the Select Factors to Compare heading. The factors with P values
less than or equal to the value set in the Options dialog box are
automatically selected, and the P values for the selected factors,
and/or the interactions of the factors are displayed in the upper left



corner of the dialog box. If the P value is greater than the P value
set in the Options dialog box, the factor is not selected, the P value
for the factor is not displayed, and multiple comparisons are not
performed for the factor.

You can disable multiple comparison testing for a factor by


clicking the selected option.

FIGURE 10–47
The Multiple Comparison
Options Dialog Box for
the Repeated Measures
ANOVA on Ranks

2 Select the desired multiple comparison test from the Suggested Test drop-down list. The Tukey and Student-Newman-Keuls tests are recommended for all pairwise comparisons among the treatments when your sample sizes are equal. To perform an all pairwise comparison on unequal sample sizes, select Dunn’s test.

3 Select Dunnett’s to determine the differences between the


experimental groups and a control group, and if your sample sizes
are equal. To perform a comparison vs a control group on
unequal sample size, select the Dunn’s test.

Note that in both cases SigmaStat defaults to Dunn’s test when your
sample sizes are unequal. You must use Dunn’s test for unequal
sample sizes.

For more information on each of the multiple comparison tests,


see page 396.



4 Select a Comparison Type. The types of comparisons available
depend on the selected test. All Pairwise compares all possible
pairs of treatments and is available for the Tukey, Student-
Newman-Keuls, and Dunn’s test.

Versus Control compares all experimental treatments to a single


control group and is available for the Dunn’s and Dunnett’s tests. It
is not recommended for the Tukey test. If you select Versus
Control, you must also select the control group from the list of
groups.

For more information on each of the multiple comparison tests,


see page 423.

5 If you selected an all pairwise comparison test, select Finish to


continue with the Repeated Measures ANOVA on Ranks and view
the report (see Figure 10–49 on page 422). For information on
editing reports, see Editing Reports on page 137.

6 If you selected a multiple comparisons versus a control test,


select Next. The Multiple Comparisons Options dialog box
prompts you to select a control group. Select the desired control
group from the list, then select Finish to continue with the
Repeated Measures ANOVA on Ranks and view the report (see
Figure 10–49 on page 422). For information on editing reports,
see Editing Reports on page 137.

FIGURE 10–48
The Multiple Comparison
Options Dialog Box
Prompting You to
Select a Control Group



Interpreting Repeated Measures ANOVA on Ranks Results

The Friedman Repeated Measures ANOVA on Ranks report displays the results for χ²r, the degrees of freedom, and P. The other results displayed are selected in the Options for RM ANOVA on Ranks dialog box (see Setting RM ANOVA on Ranks Options on page 410). Multiple comparisons are enabled in the Options for RM ANOVA on Ranks dialog box. The test used to perform the multiple comparison is selected in the Multiple Comparisons Options dialog box (see Multiple Comparison Options on page 392).

For descriptions of the derivations for ANOVA on Ranks results, you


can reference an appropriate statistics reference. For a list of suggested
references, see page 12.

The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.

The number of decimal places displayed is also set in the Report


Options dialog box. For more information on setting report options, see
Setting Report Options on page 135.

Normality Test Normality test results display whether the data passed or failed the test of
the assumption that the differences of the treatments originate from a
normal distribution, and the P value calculated by the test. For
nonparametric procedures this test can fail, as nonparametric tests do
not require normally distributed source populations. This result appears
unless you disabled normality testing in the Options for RM ANOVA
on Ranks dialog box (see page 413).

Equal Variance Test Equal Variance test results display whether or not the data passed or
failed the test of the assumption that the differences of the treatments
originate from a population with the same variance, and the P value
calculated by the test. Nonparametric tests do not assume equal variance
of the source. This result appears unless you disabled equal variance



testing in the Options for RM ANOVA on Ranks dialog box (see page
413).

Summary Table SigmaStat can generate a summary table listing the sample sizes N,
number of missing values, medians, and percentiles defined in the
Options for RM ANOVA on Ranks dialog box.

N (Size) The number of non-missing observations for that column or


group.

Missing The number of missing values for that column or group.

Medians The “middle” observation as computed by listing all the


observations from smallest to largest and selecting the largest value of the
smallest half of the observations. The median observation has an equal
number of observations greater than and less than that observation.

Percentiles The two percentile points that define the upper and lower
tails of the observed values.

These results appear in the report unless you disable them in the
Options for RM ANOVA on Ranks dialog box (see page 413).

FIGURE 10–49  The Friedman Repeated Measures ANOVA on Ranks Results Report



Chi-Square (χ²r) Statistic  The Friedman test statistic χ²r is used to evaluate the null hypothesis that all the rank sums are equal. If the value of χ²r is large, you can conclude that the treatment effects are different (i.e., that the differences in the rank sums are greater than would be expected by chance).

Values of χ²r near zero indicate that there is no significant difference in treatments; the ranks within each subject are random.

χ²r is computed by ranking all observations for each subject from smallest to largest without regard for other subjects. The ranks are summed for each treatment and χ²r is computed from the sum of squares.
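A common textbook form of this computation (without a correction for ties, which SigmaStat and libraries such as SciPy may apply) can be sketched as follows on hypothetical data:

# Sketch of the chi-square_r computation using a common textbook formula
# (no correction for ties). Rows are subjects, columns are treatments.
from scipy import stats

data = [
    [7.0, 5.3, 4.9],
    [9.9, 5.7, 7.6],
    [8.5, 4.7, 5.5],
    [5.1, 3.5, 2.8],
    [10.3, 7.7, 8.4],
]
n = len(data)       # number of subjects
k = len(data[0])    # number of treatments

# Rank each subject's observations from smallest to largest.
ranks = [stats.rankdata(row) for row in data]

# Sum the ranks for each treatment across subjects.
rank_sums = [sum(r[j] for r in ranks) for j in range(k)]

chi2_r = 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3 * n * (k + 1)
p = stats.chi2.sf(chi2_r, k - 1)
print("rank sums:", rank_sums)
print(f"chi-square_r = {chi2_r:.3f} with {k - 1} degrees of freedom, P = {p:.4f}")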

Degrees of Freedom  The degrees of freedom is an indication of the sensitivity of χ²r. It is a measure of the number of treatments.

P value  The P value is the probability of being wrong in concluding that there is a true difference in the treatments (i.e., the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on χ²r). The smaller the P value, the greater the probability that the samples are significantly different.

Traditionally, you can conclude there are significant differences when


P < 0.05.

Multiple Comparisons If a difference is found among the groups, and you elected to perform multiple comparisons, a table of the comparisons between
group pairs is displayed. The multiple comparison procedure is
activated in the Options for ANOVA on Ranks dialog box (see page
414). The test used in the multiple comparison procedure is selected in
the Multiple Comparison Options dialog box (see page 396).

Multiple comparison results are used to determine exactly which groups


are different, since the ANOVA results only inform you that two or
more of the groups are different. The specific type of multiple
comparison results depends on the comparison test used and whether
the comparison was made pairwise or versus a control.

➤ All pairwise comparison results list comparisons of all possible


combinations of group pairs: the all pairwise tests are the Tukey,
Student-Newman-Keuls test and Dunn's test.
➤ Comparisons versus a single control list only comparisons with
the selected control group. The control group is selected during the



actual multiple comparison procedure. The comparison versus a
control tests are Dunnett's test and Dunn's test.

Tukey, Student-Newman-Keuls, and Dunnett's Test Results The


Tukey and Student-Newman-Keuls (SNK) tests are all pairwise
comparisons of every combination of group pairs. Dunnett's test only
compares a control group to all other groups. All tests compute the q
test statistic, the number of rank sums spanned in the comparison p, and
display whether or not P < 0.05 for that pair comparison.

You can conclude from “large” values of q that the difference of the two
treatments being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of
being incorrect in concluding that there is a significant difference is less
than 5%. If it is greater than 0.05, you cannot confidently conclude
that there is a difference.

The difference of rank sums is a gauge of the size of the difference between the two
treatments.

p is a parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the rank sums being compared. Group rank sums are ranked in order from largest to smallest in an SNK test, so p is the number of ranks spanned in the comparison. For example, when comparing four rank sums, comparing the largest to the smallest p = 4, and when comparing the second smallest to the smallest p = 2.

If a treatment is found to be not significantly different than another


treatment, all treatments with p ranks in between the p ranks of the two
treatments that are not different are also assumed not to be significantly
different, and a result of Do Not Test appears for those comparisons.

SigmaStat does not apply the DNT logic to all pairwise comparisons because
of differences in the degrees of freedom between different cell pairs.

Dunn's Test Results Dunn's test is used to compare all treatments or


to compare versus a control when the group sizes are unequal. Dunn's
test lists the difference of ranks, computes the Q test statistic, and
displays whether or not P < 0.05 for each treatment pair.



You can conclude from “large” values of Q that the difference of the two
treatments being compared is statistically significant.

If the P value for the comparison is less than 0.05, the likelihood of
being incorrect in concluding that there is a significant difference is less
than 5%. If it is greater than 0.05, you cannot confidently conclude
that there is a difference.

The difference of rank sums is a gauge of the size of the difference between the two
treatments.

A result of DNT (do not test) appears for those comparison pairs whose difference of rank sums is less than the difference of the first comparison pair found to be not significantly different.

Repeated Measures ANOVA on Ranks Report Graphs

You can generate up to three graphs using the results from a Repeated
Measures ANOVA on Ranks. They include a:

➤ Box plot of the column data.


➤ Line graph of the changes after treatment.
➤ Multiple comparison graphs.

Box Plot The Repeated Measures ANOVA on Ranks box plot graphs each of the
groups being tested as boxes. The ends of the boxes define the 25th and
75th percentiles, with a line at the median and error bars defining the
10th and 90th percentiles.

If the graph data is indexed, the levels in the factor column are used as
the tick marks for the box plot boxes, and the column titles are used as
the axis titles. If the graph data is in raw format, the column titles are
used as the tick marks for the box plot boxes, and default axis titles, X
Axis and Y Axis, are assigned to the graph. For an example of a box plot,
see page 152.
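A comparable display can be sketched outside SigmaStat with matplotlib, whose whis argument accepts a pair of percentiles for the whiskers; the data below are hypothetical:

# Sketch: a box plot with boxes at the 25th/75th percentiles, a line at the
# median, and whiskers at the 10th/90th percentiles, mirroring the plot
# described above. Data are hypothetical.
import matplotlib.pyplot as plt

groups = {
    "Treatment 1": [7.0, 9.9, 8.5, 5.1, 10.3, 8.8, 7.4],
    "Treatment 2": [5.3, 5.7, 4.7, 3.5, 7.7, 6.0, 5.1],
    "Treatment 3": [4.9, 7.6, 5.5, 2.8, 8.4, 6.6, 5.9],
}

fig, ax = plt.subplots()
ax.boxplot(list(groups.values()), labels=list(groups), whis=(10, 90),
           showfliers=False)
ax.set_ylabel("Response")
plt.show()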

Before and After The Repeated Measures ANOVA on Ranks uses lines to plot a subject's
Line Plot change after each treatment. If the graph plots raw data, the lines
represent the rows in the column, the column titles are used as the tick
marks for the X axis and the data is used as the tick marks for the Y axis.

If the graph plots indexed data, the lines represent the levels in the
subject column, the levels in the treatment column are used as the tick



marks for the X axis, the data is used as the tick marks for the Y axis, and
the treatment and data column titles are used as the axis titles. You
cannot view a graph of your data until you have performed an ANOVA
on Ranks. The legends indicate which subject each symbol represents.
For an example of a line/scatter plot, see page 156.

Multiple Comparison The Repeated Measures ANOVA on Ranks multiple comparison graphs
Graphs plot significant differences between levels of a significant factor. There is
one graph for every significant factor reported by the specified multiple
comparison test. If there is one significant factor reported, one graph
appears; if there are two significant factors, two graphs appear, etc. If a
factor is not reported as significant, a graph for the factor does not
appear. For an example of a multiple comparison graph, see page 160.

Creating a Graph To generate a graph of Repeated Measures ANOVA on Ranks data:

1 Click the toolbar button, or choose the Graph menu Create


Graph command when the Repeated Measures ANOVA on Ranks
report is selected. The Create Graph dialog box appears displaying
the types of graphs available for the Repeated Measures ANOVA
on Ranks results.

FIGURE 10–50  The Create Graph Dialog Box for the Repeated Measures ANOVA on Ranks Report Graph

2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.



For more information on each of the graph types, see Chapter 8.
The specified graph appears in a graph window or in the report.

FIGURE 10–51
A Box Plot for a Repeated
Measures ANOVA on Ranks

For information on modifying graphs, see Chapter 8, CREATING AND


MODIFYING GRAPHS.




Comparing Frequencies, Rates, and Proportions

Use rate and proportion tests to compare two or more sets of data for
differences in the number of individuals that fall into different classes or
categories. All these tests are found under the Statistics menu Rates and
Proportions command.

If you are comparing groups where the data is measured on a numeric


scale, use the appropriate group comparison or repeated measures tests.
See Choosing the Procedure to Use on page 103 for more information
on when to use the different SigmaStat tests.

About Rate and Proportion Tests

Rate and proportion tests are used when the data is measured on a
nominal scale. Rate and proportion comparisons test for significant
differences in the categorical distribution of the data beyond what can be
attributed to random variation.

See Choosing the Rate and Proportion Comparison to Use on page 122
for more information on when to use the different SigmaStat frequency,
rate, and proportion tests.

Contingency Tables Many rate and proportion tests utilize a contingency table which lists
the groups and/or categories to be compared as the table column and
row titles, and the number of observations for each combination of
category or group as the table cells. See Figure 11-1 on page 431 for an
example of a simple contingency table. A contingency table is used to
determine whether or not the distribution of a group is contingent on
the categories it falls in.


A 2 x 2 contingency table has two groups and two categories (i.e., two
rows and two columns). A 2 x 3 table has two groups and three
categories or three groups and two categories, etc.

Comparing the Use a z-test to compare the proportions of two groups found within a
Proportions of single category for a significant difference. The z-test is performed using
Two Groups in the Rates and Proportions, z-test command.
One Category

Comparing Proportions You can use analysis of contingency tables to test if the distributions of
of Multiple Groups in two or more groups within two or more categories are significantly
Multiple Categories different.

➤ Use Chi-Square (χ²) analysis of contingency if there are more than two groups or categories, or if the expected number of observations per cell in a 2 x 2 contingency table is greater than five
➤ Use the Fisher Exact Test when the expected number of observations
is less than five in any cell of a 2 x 2 contingency table

SigmaStat automatically checks your data during a Chi-Square analysis


and suggests the Fisher Exact Test when applicable. Note that you can
perform the Fisher Exact Test on any 2 x 2 contingency table.

SigmaStat computes a two-tailed Fisher Exact Test.
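For orientation, both analyses are available in SciPy; the sketch below runs a Yates-corrected chi-square test and a two-sided Fisher Exact Test on a hypothetical 2 x 2 table:

# Sketch: chi-square and Fisher Exact analysis of a 2 x 2 contingency table
# with SciPy. The observed counts are hypothetical.
from scipy import stats

table = [[5, 1],   # group 1 counts in category 1 and category 2
         [2, 7]]   # group 2 counts in category 1 and category 2

# Chi-square analysis (with the Yates continuity correction for a 2 x 2 table).
chi2, p, dof, expected = stats.chi2_contingency(table, correction=True)
print(f"chi-square = {chi2:.3f}, P = {p:.4f}")
print("expected counts:\n", expected)

# If any expected count is below five, the Fisher Exact Test is preferred.
odds_ratio, p_fisher = stats.fisher_exact(table, alternative="two-sided")
print(f"Fisher Exact P = {p_fisher:.4f}")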

Comparing Proportions You can test for differences in the proportions of the responses in the
of same individuals to a series of two different treatments using McNemar's
the Same Group Test for changes.
to Two Treatments

Yates Correction The Yates Correction for continuity can be automatically applied to the
z-test and for all tests using 2 x 2 tables or comparisons with the χ²
distribution with one degree of freedom. It is generally accepted that the
Yates Correction yields a more accurately computed P value in these
cases.

For descriptions of the Yates Correction Factor, you can reference any
appropriate statistics reference. For a list of suggested references, see
page 12.

Application of the Yates Correction Factor is selected in the Options


dialog box for each test (see page 435, page 444, and page 459).


Data Format for Rate and Proportion Tests

The exact format for each rate and proportion test varies from test to
test.

Note that whenever numbers of observations are listed, they must always be
integers.

z-test The data for a z-test is always placed in two worksheet rows by two
columns. The size (total number of observations) of each group is in
one column, and the corresponding proportion p of the observations
within the category is in a second column. The number of observations
must always be an integer, and the proportions p must be between 0 and
1.

χ² Analysis of The data can be arranged in the worksheet as either the contingency
Contingency Tables table data or as indexed raw data.

Tabulated Data Tabulated data is arranged in a contingency table


showing the number of observations for each cell. The worksheet rows
and columns correspond to the groups and categories. The number of
observations must always be an integer.

Note that the order and location of the rows or columns corresponding
to the groups and categories is unimportant. You can use the rows for
category and the columns for group, or vice versa.
TABLE 11-1  A Contingency Table Describing the Number of Lowland and Alpine Species Found at Different Locations

Species      Location
             Tundra    Foothills    Treeline
Lowland      125       16           6
Alpine       7         19           117

Raw Data You can report the group and category of each individual
observation by placing the group in one worksheet column and the
corresponding category in another column. Each row corresponds to a
single observation, so there should be as many rows of data as there are
total numbers of observations.

SigmaStat automatically cross tabulates these data and performs the χ²


analysis on the resulting contingency table.
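The cross-tabulation step itself is easy to reproduce outside SigmaStat; a small pandas sketch on hypothetical raw observations:

# Sketch: cross tabulating raw observations (one row per individual) into a
# contingency table with pandas. The species/location values are hypothetical
# examples in the spirit of the table above.
import pandas as pd

raw = pd.DataFrame({
    "Species":  ["Lowland", "Lowland", "Alpine", "Alpine", "Lowland", "Alpine"],
    "Location": ["Tundra",  "Foothills", "Treeline", "Treeline", "Tundra", "Tundra"],
})

table = pd.crosstab(raw["Species"], raw["Location"])
print(table)
# Each cell holds the number of observations for that species/location
# combination; the chi-square analysis is then run on this table.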


FIGURE 10–1  Worksheet Data Arrangement for Contingency Table Data from Table 8-1. Columns 1 through 3 are in tabular format, and columns 4 and 5 are raw data.

For information on specifying a data format for a Chi-Square Test, see


page 447.

Fisher Exact Test The data must form a 2 x 2 contingency table, with the number of
observations in each cell. You can test tabulated data or raw data
observations.
TABLE 3-2  A 2 x 2 Contingency Table Describing the Number of Harbor Seals and Sea Lions Found on Two Different Islands

Pinniped Species    Island
                    Island 1    Island 2
Sea Lions           5           1
Harbor Seals        2           7

Tabulated Data Tabulated data is arranged in a contingency table


showing the number of observations for each cell. The worksheet rows
and columns correspond to the groups and categories. The number of
observations must always be an integer.

Raw Data A group identifier is placed in one worksheet column and


the corresponding category in another column. There must be exactly
two kinds of groups and two types of categories. Each row corresponds
to a single observation, so there should be as many rows of data as there
are total numbers of observations.

SigmaStat automatically cross-tabulates this data and performs the


Fisher Exact Test on the resulting contingency table.


FIGURE 10–2  Data Formats for a Fisher Exact Test. Columns 1 and 2 are in tabular format and columns 3 and 4 are raw data observations. A Fisher Exact Test requires data for a 2 x 2 table.

For information on specifying a data format for a Fisher Exact Test, see
page 454.

McNemar's Test The data must form a table with the same number of rows and columns,
since both treatments must have the same number of categories. You
can test tabulated data or raw data observations.

Tabulated Data Tabulated data is arranged in a contingency table


showing the number of observations for each cell. The worksheet rows
and columns correspond to the two groups of categories. The number
of category types must be the same for both groups, so that the
contingency table is square. The number of observations must always be
an integer.
TABLE 3-3  A 3 x 3 Contingency Table Describing the Effect of a Report on the Opinion of Surveyed People. The McNemar Test ignores people who didn’t change their opinion.

Before Report    After Report
                 Approve    Disapprove    Don’t Know
Approve          12         24            6
Disapprove       5          32            3
Don’t Know       4          6             4

Raw Data A category identifier is placed in one worksheet column and the corresponding category in another column. There must be the same number of category types in both columns. Each row corresponds to a single observation, so there should be as many rows of data as there are total numbers of observations.


FIGURE 10–3  Data Formats for a McNemar Test. Columns 1 through 3 are in tabular format, and columns 4 through 6 are raw data observations. The McNemar Test requires data for tables with equal numbers of columns and rows; here a 3 x 3 table.

SigmaStat automatically cross tabulates this data and performs


McNemar's Test on the resulting contingency table.

For information on specifying a data format for a McNemar Test, see


page 461.
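For the simplest (2 x 2) case, McNemar's test can be sketched directly from the discordant cells using a common continuity-corrected formula; SigmaStat also handles larger square tables, which this hypothetical example does not cover:

# Sketch of McNemar's test for a 2 x 2 table, using the common
# continuity-corrected formula. The counts are hypothetical.
from scipy import stats

#                 after: yes   after: no
table = [[20,           5],    # before: yes
         [15,          10]]    # before: no

b = table[0][1]   # changed from "yes" to "no"
c = table[1][0]   # changed from "no" to "yes"

# Only the individuals who changed category contribute to the statistic.
chi2 = (abs(b - c) - 1) ** 2 / (b + c)
p = stats.chi2.sf(chi2, df=1)
print(f"chi-square = {chi2:.3f}, P = {p:.4f}")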

Comparing Proportions Using the z-Test

Compare proportions with a z-test when:

➤ You have two groups to compare.


➤ You know the total sample size (number of observations) for each
group.
➤ You have the proportions p for each group that falls within a single
category.

If you have data for the numbers of observations for each group that fall
in two categories, perform χ² analysis of contingency tables instead. This will produce the same P value as the z-test. You can also run the χ²
analysis of contingency tables if you have more than two groups or
categories.

About the z-test The z-test comparison of proportions is used to determine if the
proportions of two groups within one category or class are significantly
different. The
z-test assumes that:


➤ Each observation falls into one of two mutually exclusive categories.


➤ All observations are independent.

Performing a z-test To perform a z-test:

1 Enter or arrange your data appropriately in the data worksheet


(see following section).

2 If desired, set the z-test options using the Options for z-test dialog
box
(page 435).

3 Select z-test from the toolbar, then click the button, or choose
the Statistics menu Rates and Proportions command, then choose
z-test.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 438).

5 View and interpret the z-test report (page 439).

Arranging z-test Data

To compare two proportions, enter the two sample sizes in one column
and the corresponding observed proportions p in a second column.
There must be exactly two rows and two columns. The sample sizes
must be whole numbers and the observed proportions must be between
0 and 1. For more information see Data Format for Rate and
Proportion Tests on page 431.

Selecting When running a z-test, you can either:


Data Columns
➤ Select the columns to test from the worksheet by dragging your
mouse over the columns before choosing the test.
➤ Select the columns while performing the test (page 438).

Setting z-test Options

Use the Compare Proportion options to:

➤ Display the confidence interval for the data in Compare Proportion


test reports.


➤ Display the power of a performed test for Compare Proportion tests


in the reports.
➤ Enable the Yates Correction Factor.

To change z-test options:

1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.

2 To open the Options for z-test dialog box, select z-test from the
toolbar drop-down, then click the button, or choose the
Statistics menu Current Test Options... command. The Power, Yates Correction Factor, and Confidence Intervals options appear.

3 Click a check box to enable or disable a test option. All options are
saved between SigmaStat sessions.

FIGURE 10–4
The Options for z-test Dialog
Box

4 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 438 for more information).

5 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

You can select Help at any time to access SigmaStat’s on-line help
system.

Power Leave the Power option selected to detect the sensitivity of the test. The
power or sensitivity of a test is the probability that the test will detect a


difference between the proportions of two groups if there is really a


difference.

Change the alpha value by editing the number in the Alpha Value box.

Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.

The Yates Correction Factor When a statistical test uses a χ² distribution with one degree of freedom, such as analysis of a 2 x 2 contingency table or McNemar's test, the χ² calculated tends to produce P values which are too small when compared with the actual distribution of the χ² test statistic. The theoretical χ² distribution is continuous, whereas the distribution of the χ² test statistic is discrete.

Use the Yates Correction Factor to adjust the computed χ² value down to compensate for this discrepancy. Using the Yates correction makes a test more conservative, i.e., it increases the P value and reduces the chance of a false positive conclusion. The Yates correction is applied to 2 x 2 tables and other statistics where the P value is computed from a χ² distribution with one degree of freedom.

Click the selected check box to turn the Yates Correction Factor on or
off.

For descriptions of the derivation of the Yates correction, you can


reference any appropriate statistics reference. For a list of suggested
references, see page 12.
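The effect of the correction is easy to see in a short sketch; SciPy's chi2_contingency exposes the correction as a flag, and the counts below are hypothetical:

# Sketch: the effect of the Yates continuity correction on a 2 x 2 table.
# Turning the correction on lowers chi-square and raises P (more conservative).
from scipy import stats

table = [[12, 8],
         [5, 15]]

for correction in (False, True):
    chi2, p, dof, expected = stats.chi2_contingency(table, correction=correction)
    label = "with Yates correction" if correction else "uncorrected"
    print(f"{label}: chi-square = {chi2:.3f}, P = {p:.4f}")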

Confidence Interval This is the confidence interval for the difference of proportions. To
change the specified interval, select the box and type any number from 1
to 99 (95 and 99 are the most commonly used intervals).


Running a z-test

To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.

To run a z-test:

1 If you want to select your data before you run the test, drag the
pointer over your data.

2 Open the Pick Columns dialog box to start the z-test. You can
either:

➤ Select z-test from the toolbar drop-down list, then select the
button.
➤ Choose the Statistics menu Rates and Proportions command,
then choose z-test...
➤ Click the Run Test button from the Options for z-test dialog
box (see step 4 on page 436).

If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.

3 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Size or Proportion drop-down list.

The first selected column is assigned to the Size row in the Selected Columns list, and the second column is assigned to the Proportion row in the list. The titles of the selected columns appear in each row.
You can only select one Size and one Proportion data column.

4 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

5 Select Finish to perform the test. The report appears displaying


the results of the z-test (see Figure 10–6 on page 440).


FIGURE 10–5
The Pick Columns
for z-test Dialog Box
Prompting You to
Select Data Columns

Interpreting Proportion Comparison Results

The z-test report displays a table of the statistical values used, the z
statistic, and the P for the test. You can also display a confidence
interval for the difference of the proportions using the Options for z-test
dialog box (see Setting z-test Options on page 435).

For descriptions of the derivation for z-test results, you can reference any
appropriate statistics reference. For a list of suggested references, see
page 12.

The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.

The number of decimal places displayed is set in the Report Options


dialog box. For more information on setting report options, see Setting
Report Options on page 135.

Statistical Summary The summary table for a z-test lists the sizes of the groups n and the
proportion of each group in the category p. These values are taken
directly from the data.

Difference of Proportions This is the difference between the p


proportions for the two groups.


FIGURE 10–6
The z-test
Comparison of
Proportions
Results Report

Pooled Estimate for P This is the estimate of the population


proportion p based on pooling the two samples to test the hypothesis
that they were drawn from the same population. It depends on both the
nature of the underlying population and the specific samples drawn.

Standard Error of the Difference The standard error of the difference


is a measure of the precision with which this difference can be estimated.

z statistic The z statistic is

    z = (difference of the sample proportions) / (standard error of the sample proportions)

You can conclude from “large” absolute values of z that the proportions
of the populations are different. A large z indicates that the difference
between the proportions is larger than what would be expected from
sampling variability alone (i.e., that the difference between the
proportions of the two groups is statistically significant). A small z (near
0) indicates that there is no significant difference between the
proportions of the two groups.
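As an informal cross-check of this calculation outside SigmaStat, the following short Python sketch computes the z statistic and two-tailed P value from hypothetical counts (the group sizes and event counts are made up for illustration; this is not SigmaStat output):

    # Sketch of a z-test comparing two proportions (hypothetical counts).
    from math import sqrt
    from scipy.stats import norm

    x1, n1 = 42, 120    # events and group size for group 1 (made-up numbers)
    x2, n2 = 25, 115    # events and group size for group 2 (made-up numbers)

    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)                          # pooled estimate for P
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))  # standard error of the difference
    z = (p1 - p2) / se
    p_value = 2 * norm.sf(abs(z))                             # two-tailed P value

    print(f"z = {z:.4f}, P = {p_value:.4f}")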

If you enabled the Yates correction in the Options for z-test dialog box, the computed value of z is slightly smaller, to account for the difference between the theoretical and calculated values of z. For more


information on the Yates correction for continuity, see The Yates


Correction Factor on page 437.

P value The P value is the probability of being wrong in concluding


that there is a difference in the proportions of the two groups (i.e., the
probability of falsely rejecting the null hypothesis, or committing a Type
I error). The smaller the P value, the greater the probability that the
samples are drawn from populations with different proportions.
Traditionally, you conclude that there are significant differences when P
< 0.05.

Confidence Interval for the Difference  If the confidence interval does not include zero, you can conclude that there is a significant difference between the proportions with the level of confidence specified. This can also be described as P < α, where α is the acceptable probability of incorrectly concluding that there is a difference.

The level of confidence is adjusted in the Options dialog box; this is typically 100(1 – α)%, or 95%. Larger confidence levels result in wider intervals, and smaller levels in narrower intervals. For a further explanation of α, see Power below.

This result is displayed unless you disable it in the Options for z-test
dialog box (see page 437).

Power The power, or sensitivity, of a z-test is the probability that the test will
detect a difference among the groups if there really is a difference. The
closer the power is to 1, the more sensitive the test. z-test power is
affected by the sample size and the observed proportions of the samples.

This result is displayed unless you disable it in the Options for z-test
dialog box (see page 435).

Alpha (α)  Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true).

The α value is set in the z-test Power dialog box (see page 719 in Chapter 10, Comparing Frequencies, Rates, and Proportions); the suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a difference in distribution, but a greater


possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).

Chi-Square Analysis of Contingency Tables

Use χ² analysis of contingency tables when:

➤ You want to compare the distributions of two or more groups whose


individuals fall into two or more different classes or categories
➤ There are five or more observations expected in each cell of a 2 x 2
contingency table

If you have fewer than five expected observations in any cell of a 2 x 2 contingency table, use the Fisher Exact Test. The χ² test is computed based on the assumption that the rows and columns are independent: if the rows and columns are dependent, i.e., the same group undergoes two consecutive treatments, use McNemar's Test.

About the Chi-Square Test  The Chi-Square Test analyzes data in a contingency table. A contingency table is a table of the number of individuals in each group that fall in each category. The different characteristics or categories are the columns of the table, and the groups are the rows of the table (or vice versa). Each cell in the table lists the number of individuals for that combination of category and group.

A 2 x 2 contingency table has two groups and two categories, (i.e., two
rows and two columns), a 2 x 3 table has two groups and three categories
or three groups and two categories, etc.

TABLE 3-4
A Contingency Table Describing the Number of Lowland and Alpine Species Found at Different Locations

Species       Location
              Tundra    Foothills    Treeline
Lowland       125       16           6
Alpine        7         19           117

The χ² test uses the percentages of the row and column totals for each
cell to compute the expected number of observations per cell if the


treatment had no effect. The χ² statistic summarizes the difference between the expected and the observed frequencies. For more information on arranging data in contingency tables, see Data Format for Rate and Proportion Tests on page 431.

Performing a Chi-Square Test  To perform a Chi-Square (χ²) Test:
1 Enter or arrange your data appropriately in the data worksheet
(see following section).

2 If desired, set the Chi-Square options using the Options for Chi-
Square dialog box (page 444).

3 Select Chi-Square from the toolbar drop-down list, then select the
button, or choose the Statistics menu Rates and Proportions
command, then choose Chi-Square.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 446).

5 View and interpret the Chi-Square report (page 449).

Arranging Chi-Square Data

Analysis of contingency tables can be done directly from a contingency


table entered in the worksheet or from two columns of raw data
observations.

The data format used in the test is specified in the Pick Columns dialog
box. For more information on selecting a data format in the Pick
Columns dialog box, see page 447.

Tabulated Data  Tabulated data is arranged in a contingency table using the worksheet rows and columns as the groups and categories. The number of observations for each combination of group and category is entered into the appropriate cells.

Raw Data Raw data uses a row for each individual observation, and places the
corresponding groups for the observations in one column and the
categories in a second column. SigmaStat automatically determines the
number of groups and categories used. For more information on


arranging data as indexed data, see Data Format for Rate and Proportion
Tests on page 431.

FIGURE 10–7
Valid Data Formats
for a χ² Test
Columns 1 through 3
are arranged as a
contingency table.
Columns 4 and 5 are raw
data for the observations.
Each row corresponds to
a single observation.

Note that not all the


raw data points are shown,
as the columns are longer
than fifteen rows.

Selecting Data Columns When running a Chi-Square test, you can either:

➤ Select the columns to test from the worksheet by dragging your


mouse over the columns before choosing the test.
➤ Select the columns while performing the test (see page 446).

Setting Chi-Square Options

Use the Chi-Square options to:

➤ Display the power of a performed test for Compare Proportion tests


in the reports.
➤ Enable the Yates Correction Factor.

To change Chi-Square options:

1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.

2 To open the Options for Chi-Square dialog box, select Chi-Square


from the toolbar drop-down list, then click the button, or
choose the Statistics menu Current Test Options... command.
The Power and the Yates Correction Factor options appear.


3 Click a check box to enable or disable a test option. All options are
saved between SigmaStat sessions.

4 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 446 for more information).

5 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

You can select Help at any time to access SigmaStat’s on-line help
system.

Power Leave the Power option selected to detect the sensitivity of the test. The
power or sensitivity of a test is the probability that the test will detect a
difference between the proportions of two groups if there is really a
difference.

FIGURE 10–8
The Options for
Chi-Square Dialog Box

Change the alpha value by editing the value in the Alpha Value box.

Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.


The Yates Correction Factor  When a statistical test uses a χ² distribution with one degree of freedom, such as analysis of a 2 x 2 contingency table or McNemar's test, the calculated χ² tends to produce P values which are too small, when compared with the actual distribution of the χ² test statistic. The theoretical χ² distribution is continuous, whereas the χ² produced with real data is discrete.

You can use the Yates Continuity Correction to adjust the computed χ² value down to compensate for this discrepancy. Using the Yates correction makes a test more conservative, i.e., it increases the P value and reduces the chance of a false positive conclusion. The Yates correction is applied to 2 x 2 tables and other statistics where the P value is computed from a χ² distribution with one degree of freedom.

Click the check box to turn the Yates Correction Factor on or off.

For a description of how the Yates correction is derived, consult any appropriate statistics reference. For a list of suggested references, see page 12.
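To see the effect of the correction numerically, the following Python sketch (not part of SigmaStat; the 2 x 2 counts are hypothetical) compares the uncorrected and Yates-corrected χ² and P values using scipy's chi2_contingency:

    # Comparing uncorrected and Yates-corrected chi-square for a 2 x 2 table.
    import numpy as np
    from scipy.stats import chi2_contingency

    table = np.array([[12, 30],
                      [18, 20]])       # hypothetical 2 x 2 contingency table

    chi2_raw, p_raw, _, _ = chi2_contingency(table, correction=False)
    chi2_yates, p_yates, _, _ = chi2_contingency(table, correction=True)

    print(f"uncorrected:     chi-square = {chi2_raw:.3f}, P = {p_raw:.4f}")
    print(f"Yates-corrected: chi-square = {chi2_yates:.3f}, P = {p_yates:.4f}")

As described above, the corrected statistic is smaller and the corresponding P value larger, making the test more conservative.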

Running a Chi-Square Test

To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.

To run a Chi-Square Test:

1 If you want to select your data before you run the test, drag the
pointer over your data.

2 Open the Pick Columns dialog box to start the Chi-Square test.
You can either:

➤ Select Chi-Square from the toolbar drop-down, then select the


button.
➤ Choose the Statistics menu Rates and Proportions command,
then choose Chi-Square...
➤ Click the Run Test button from the Options for Chi-Square
dialog box (see step 4 on page 445).

The Pick Columns dialog box appears prompting you to specify a


data format.


3 Select the appropriate data format from the Data Format drop-
down list. If you are testing contingency table data, select
Tabulated. If your data is arranged in raw format, select Raw (see
page 443).

FIGURE 10–9
The Pick Columns
for Chi-Square Test Dialog
Box
Prompting You to
Specify a Data Format

For more information on arranging data, see Data Format for Rate
and Proportion Tests on page 431, or Arranging Data for
Contingency Tables on page 69.

4 Select Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list. If you have not selected columns, the dialog box prompts you to pick your data.

5 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Observations or Category drop-down list.

The first selected column is assigned to the first Observation or Category row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The titles of the selected columns appear in each row. For raw data, you are prompted to select two worksheet columns. For tabulated data you are prompted to select up to 64 columns.

6 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.


FIGURE 10–10
The Pick Columns
for Chi-Square Dialog Box
Prompting You to
Select Data Columns

7 Select Finish to run the test. If there are too many cells in a
contingency table with expected values below 5, SigmaStat either:

➤ Suggests that you redefine the groups or categories in the


contingency table to reduce the number of cells and increase the
number of observations per cell.
➤ Suggests the Fisher Exact Test if the table is a 2 x 2 contingency
table.

When there are many cells with expected observations of 5 or fewer, the theoretical χ² distribution does not accurately describe the actual distribution of the χ² test statistic, and the resulting P values may not be accurate.

The Fisher Exact Test computes the exact two-tailed probability of observing a specific 2 x 2 contingency table, and does not require that the expected frequencies in all cells exceed 5. When the test is complete, the χ² test report appears (see Figure 10–11 on page 449).


Interpreting Results of a χ² Analysis of Contingency Tables

The report for a χ² test lists a summary of the contingency table data, the χ² statistic calculated from the distributions, and the P value for χ².

For a description of how χ² test results are derived, consult any appropriate statistics reference. For a list of suggested references, see page 12.

The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Results Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and click the selected Explain Test
Results check box.

The number of decimal places displayed is also set in the Report


Options dialog box. For more information on setting report options, see
Setting Report Options on page 135.

FIGURE 10–11
A Chi-Square
Test Results Report


Contingency Table Summary  Each cell in the table is described with a set of statistics.

Observed Counts  These are the number of observations per cell, obtained from the contingency table data.

Expected Frequencies  The expected frequencies for each cell in the contingency table, as predicted using the row and column percentages.

Row Percentage The percentage of observations in each row of the


contingency table, obtained by dividing the observed frequency counts
in the cells by the total number of observations in that row.

Column Percentage The percentage of observations in each column of


the contingency table, obtained by dividing the observed frequency
counts in the cells by the total number of observations in that column.

Total Cell Percentage  The percentage of the total number of observations in the contingency table, obtained by dividing the observed frequency in the cells by the total number of observations in the table.

Chi-Square (χ²)  χ² is the summed squared differences between the observed frequencies in each cell of the table and the expected frequencies, or

χ² = Σ [ (observed – expected numbers per cell)² / (expected numbers per cell) ]

This computation assumes that the rows and columns are independent.

If the value of χ² is large, you can conclude that the distributions are different (i.e., that there are large differences between the expected and observed frequencies, indicating that the rows and columns are not independent).

Values of χ² near zero indicate that the pattern in the contingency table is no different from what one would expect if the counts were distributed at random.
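As an illustration of this computation, the following Python sketch reproduces the expected frequencies and χ² statistic for the species/location counts in Table 3-4 (a cross-check outside SigmaStat, not SigmaStat functionality):

    # Expected frequencies and chi-square for the Table 3-4 counts.
    import numpy as np
    from scipy.stats import chi2

    observed = np.array([[125, 16,   6],    # Lowland: Tundra, Foothills, Treeline
                         [  7, 19, 117]])   # Alpine

    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    grand_total = observed.sum()

    expected = row_totals * col_totals / grand_total      # expected counts per cell
    chi_square = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    p_value = chi2.sf(chi_square, dof)

    print(f"chi-square = {chi_square:.2f}, df = {dof}, P = {p_value:.3g}")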

Yates Correction  The Yates correction is used to adjust the χ² and therefore the P value for 2 x 2 tables to more accurately reflect the true distribution of χ². The Yates correction is enabled in the Options for Chi-Square dialog box, and is only applied to 2 x 2 tables.

P Value The P value is the probability of being wrong in concluding


that there is a true difference in the distribution of the numbers of


observations (i.e., the probability of falsely rejecting the null hypothesis,


or committing a Type I error, based on χ²). The smaller the P value, the
greater the probability that the samples are drawn from populations with
different distributions among the categories. Traditionally, you
conclude that there are significant differences when P < 0.05.

Power The power, or sensitivity, of a Chi-Square test is the probability that the
test will detect a difference among the groups if there really is a
difference. The closer the power is to 1, the more sensitive the test.
Chi-Square power is affected by the sample size and the observed
proportions of the samples. This result is displayed if you selected this
option in the Options for Chi-Square dialog box.

Alpha (α)  Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true).

The α value is set in the Power Option dialog box (see Determining the Power of a Chi-Square Test on page 723). The suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a difference in distribution, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).

The Fisher Exact Test

Use the Fisher Exact Test to compare the distributions in a 2 x 2 contingency table that has 5 or fewer expected observations in one or more cells.

If no cells have fewer than five expected observations, you can use a χ² test.

About the Fisher Exact Test  The Fisher Exact Test determines the exact probability of observing a specific 2 x 2 contingency table (or a more extreme pattern). Use the Fisher Exact Test instead of χ² analysis of a 2 x 2 contingency table when the expected frequencies of one or more cells are less than 5.


SigmaStat automatically suggests the Fisher Exact Test when a χ² analysis of a 2 x 2 contingency table is performed and fewer than 5 expected observations are encountered in any cell.
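As a rough cross-check outside SigmaStat, the exact two-tailed probability for a 2 x 2 table can be computed with scipy's fisher_exact; the counts below are hypothetical:

    # Exact two-tailed P for a hypothetical 2 x 2 table with small expected counts.
    from scipy.stats import fisher_exact

    table = [[3, 9],
             [8, 2]]

    odds_ratio, p_two_tailed = fisher_exact(table, alternative="two-sided")
    print(f"two-tailed P = {p_two_tailed:.4f}")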

Performing a Fisher Exact Test  To perform a Fisher Exact Test:
1 Enter or arrange your data appropriately in the data worksheet
(see following section).

2 Select Fisher Exact Test from the toolbar, then select the
button, or choose the Statistics menu Rates and Proportions
command, then choose Fisher Exact Test.

3 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 453).

4 View and interpret the Fisher Exact Test report (page 455).

Arranging Fisher Exact Test Data

The data for a Fisher Exact Test must form a 2 x 2 contingency table, that is, exactly two rows by two columns. The data can be tabulated data in a 2 x 2 table entered in the worksheet or two columns of raw data.

Tabulated Data Tabulated or contingency table data uses the rows to represent the two
groups, and the columns to represent the two categories, or vice versa.
The number of individuals that fall into each combination of groups
and categories is entered into each cell. There should be no more than
two rows and two columns.

Raw Data Raw data uses a row for each individual observation, and places the
corresponding groups for the observations in one column and the


categories in a second column. There should be no more than two


different groups and two types of categories.

FIGURE 10–12
Valid Data Formats for a
Fisher Exact Test
Columns 1 and 2 are
arranged as a 2 x 2
contingency table, and
columns 3 and 4 are the
raw observation data.

Selecting Data Columns  When running a Fisher Exact Test, you can either:
➤ Select the columns to test from the worksheet by dragging your
mouse over the columns before choosing the test.
➤ Select the columns while performing the test (see page 453).

Running a Fisher Exact Test

To run a test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you
want to test and to specify how your data is arranged in the worksheet.

To run a Fisher Exact Test:

1 If you want to select your data before you run the test, drag the
pointer over your data.

2 Open the Pick Columns dialog box to start the Fisher Exact Test.
You can either:

➤ Select Fisher Exact Test from the toolbar drop-down list, then
select the button.
➤ Choose the Statistics menu Rates and Proportions command,
then choose Fisher Exact Test...

The Pick Columns dialog box appears prompting you to specify a


data format.


3 Select the appropriate data format from the Data Format drop-
down list. If you are testing contingency table data, select
Tabulated. If your data is arranged in raw format, select Raw (see
page 452).

FIGURE 10–13
The Pick Columns
for Fisher Exact Test
Dialog Box Prompting You to
Specify a Data Format

For more information on arranging data, see Data Format for Rate
and Proportion Tests on page 431, or Arranging Data for
Contingency Tables on page 69.

4 Select Next to pick the data columns for the test. If you selected columns before you chose the test, the selected columns appear in the Selected Columns list. If you have not selected columns, the dialog box prompts you to pick your data.

5 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Observations or Category drop-down list.

The first selected column is assigned to the first Observation or Category row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The titles of the selected columns appear in each row. For raw data, you are prompted to select two worksheet columns. For tabulated data you are prompted to select up to 64 columns.

6 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.


FIGURE 10–14
The Pick Columns
for Fisher Exact Test
Dialog Box Prompting You to
Select Data Columns

7 Select Finish to run the test. If there are no cells in the table with expected values below 5, SigmaStat suggests the χ² test instead (the Fisher Exact Test can be used, but takes longer to compute).

The Fisher Exact Test computes the exact two-tailed probability of observing a specific 2 x 2 contingency table, and does not require that the expected frequencies in all cells exceed 5.

The Fisher Exact Test is performed. When the test is complete,


the Fisher Exact Test report appears (see Figure 10–15 on page
456).

Interpreting Results of a Fisher Exact Test

The Fisher Exact Test computes the two-tailed P value corresponding to the exact probability distribution of the table.

For a description of how Fisher Exact Test results are derived, consult any appropriate statistics reference. For a list of suggested references, see page 12.

The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics


menu Report Options... command and uncheck the Explain Test


Results option.

The number of decimal places displayed is also controlled in the Report


Options dialog box. For more information on setting report options, see
page 135.

FIGURE 10–15
A Fisher Exact Test
Results Report

P Value The P value is the two-tailed probability of being wrong in concluding


that there is a true difference in the distribution of the numbers of
observations (i.e., the probability of falsely rejecting the null hypothesis,
or committing a Type I error). The smaller the P value, the greater the
probability that the samples are drawn from populations with different
distributions among the two categories.

Traditionally, you conclude that there are significant differences when


P < 0.05.

The Fisher Exact Test computes P directly using a two-tailed probability.

Contingency Table Summary  Each cell in the table is described with a set of statistics.

Observed Counts  These are the number of observations per cell, obtained from the contingency table data.


Total Cell Percentage  The percentage of the total number of observations in the contingency table, obtained by dividing the observed frequency in the cells by the total number of observations in the table.

Row Percentage The percentage of observations in each row of the


contingency table, obtained by dividing the observed frequency counts
in the cells by the total number of observations in that row.

Column Percentage The percentage of observations in each column of


the contingency table, obtained by dividing the observed frequency
counts in the cells by the total number of observations in that column.

McNemar’s Test

Use McNemar's Test when you are:

➤ Making observations on the same individuals.


➤ Counting the distributions in the same categories after two different
treatments or changes in condition.

About the McNemar Test  McNemar's Test is an analysis of contingency tables that have repeated observations of the same individuals. These table designs are used when

➤ Determining whether or not an individual responded to a treatment


or change in condition, which uses observations before and after the
treatment.
➤ Comparing the results of two different treatments or conditions that
result in the same type of responses; for example, surveying the
opinion (approve, disapprove, or don't know) of the same people
before and after a report.

McNemar's Test is similar to a regular analysis of a contingency table.


However, it ignores individuals who responded the same way to the
same treatments, and calculates the expected frequencies using the
remaining cells as the average number of individuals who responded
differently to the treatments.
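For the common 2 x 2 case, the calculation can be sketched outside SigmaStat as follows (a minimal Python sketch with made-up counts, using the continuity-corrected form of the statistic; SigmaStat's exact implementation may differ):

    # McNemar's test on a 2 x 2 before/after table (continuity-corrected).
    from scipy.stats import chi2

    #                  after: approve   after: disapprove
    table = [[30, 12],   # before: approve
             [ 5, 25]]   # before: disapprove

    b = table[0][1]      # changed approve -> disapprove
    c = table[1][0]      # changed disapprove -> approve (diagonal cells are ignored)

    chi_square = (abs(b - c) - 1) ** 2 / (b + c)   # continuity-corrected statistic
    p_value = chi2.sf(chi_square, df=1)
    print(f"chi-square = {chi_square:.3f}, P = {p_value:.4f}")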

Performing a McNemar Test  To perform a McNemar Test:
1 Enter or arrange your data appropriately in the data worksheet
(see following section).


2 If desired, set the McNemar Test options using the Options for
McNemar’s dialog box (page 459).

3 Select McNemar Test from the toolbar, then select the button, or choose the Statistics menu Rates and Proportions command, then choose McNemar Test.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 461).

5 View and interpret the McNemar Test report (page 463).

Arranging McNemar Test Data

The data for McNemar's Test must form a contingency table that has
exactly the same number of rows and columns. The data can be
tabulated in a table entered in the worksheet or from two columns of
raw data.

TABLE 3-5
A 3 x 3 Contingency Table Describing the Effect of a Report on the Opinion of Surveyed People
The McNemar Test ignores people who didn’t change their opinion.

Before Report     After Report
                  Approve    Disapprove    Don’t Know
Approve           12         24            6
Disapprove        5          32            3
Don’t Know        4          6             4

Tabulated Data  For tabulated or contingency table data, the worksheet rows correspond to one set of treatment categories and the columns to the other set of treatment categories. The categories assigned to the rows are assumed to be in the same order of occurrence as the columns. The number of individuals that fall into each combination of the categories is entered into each cell. Because the same set of categories is used for the two different treatments, the number of rows and columns in the table is always the same.

Raw Data Raw data uses a row for each individual observation, and places the
corresponding groups for the first treatment category in one column and


the second treatment category in a second column. There should be the


same number of categories in each column.

The data format used when running a test is specified in the Pick
Columns dialog box. See page 461 for more information.

FIGURE 10–16
Valid Data Formats for
McNemar Test
Columns 1 through 3
are arranged as a 3 x 3
contingency table, and
columns 4 and 5 are raw
observation data.

Selecting Data Columns  When running a McNemar Test, you can either:
➤ Select the columns to test from the worksheet by dragging your
mouse over the columns before choosing the test.
➤ Select the columns while performing the test (see page 461).

Setting McNemar’s Options

Use the McNemar Test options to enable the Yates Correction Factor.

To change McNemar Test options:

1 If you are going to run the test after changing test options and
want to select your data before you run the test, drag the pointer
over your data.

2 To open the Options for McNemar Test dialog box, select


McNemar Test from the toolbar drop-down, then click the
button, or choose the Statistics menu Current Test Options...
command.

3 Leave the Yates Correction check box selected to include the Yates
Correction Factor in the test report. Click the selected check box


to disable the Yates Correction Factor. For more information on


the Yates Correction Factor, see page 460.

4 To continue the test, click Run Test. The Pick Columns dialog box appears (see page 461 for more information). To close the options dialog box and accept the current settings without continuing the test, click OK. Click Apply to accept the current settings without closing the dialog box, and click Cancel to close the dialog box without changing any settings or running the test.

You can select Help at any time to access SigmaStat’s on-line help
system.

The Yates Correction Factor  When a statistical test uses a χ² distribution with one degree of freedom, such as analysis of a 2 x 2 contingency table or McNemar's test, the calculated χ² tends to produce P values which are too small when compared with the actual distribution of the χ² test statistic. The theoretical χ² distribution is continuous, whereas the χ² produced with real data is discrete.

FIGURE 10–17
The Options for
McNemar Test Dialog Box

You can use the Yates Continuity Correction to adjust the computed χ² value down to compensate for this discrepancy. Using the Yates correction makes a test more conservative, i.e., it increases the P value and reduces the chance of a false positive conclusion. The Yates correction is applied to 2 x 2 tables and other statistics where the P value is computed from a χ² distribution with one degree of freedom.

Click the check box to enable or disable the Yates Correction Factor.


For a description of how the Yates correction is derived, consult any appropriate statistics reference. For a list of suggested references, see page 12.

Running a McNemar Test

To run the McNemar Test, you need to select the data to test. The Pick
Columns dialog box is used to select the worksheet columns with the
data you want to test and to specify how your data is arranged in the
worksheet.

To run a McNemar Test:

1 If you want to select your data before you run the test, drag the
pointer over your data.

2 Open the Pick Columns dialog box to start the McNemar Test.
You can either:

➤ Select McNemar Test from the toolbar drop-down list, then


select the button.
➤ Choose the Statistics menu Rates and Proportions, McNemar
Test... command.
➤ Click the Run Test button from the Options for McNemar Test
dialog box (see step 4 on page 460).

The Pick Columns dialog box appears prompting you to specify a


data format.

3 Select the appropriate data format from the Data Format drop-
down list. If you are testing contingency table data, select
Tabulated. If your data is arranged in raw format, select Raw (see
page 458).

For more information on arranging data, see Data Format for Rate
and Proportion Tests on page 431, or Arranging Data for
Contingency Tables on page 69.

4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.


FIGURE 10–18
The Pick Columns
for McNemar’s Test
Dialog Box Prompting You to
Specify a Data Format

If you have not selected columns, the dialog box prompts you to pick your data.

5 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Observations or Category drop-down list.

The first selected column is assigned to the first Observation or Category row in the Selected Columns list, and all successively selected columns are assigned to successive rows in the list. The titles of the selected columns appear in each row. For raw data, you are prompted to select two worksheet columns. For tabulated data you are prompted to select up to 64 worksheet columns.

FIGURE 10–19
The Pick Columns
for McNemar’s Test
Dialog Box Prompting You to
Select Data Columns

6 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

7 Select Finish to run the test. The McNemar’s test report appears.


Interpreting Results of a McNemar Test

The report for a McNemar Test lists a summary of the contingency table data, the χ² statistic calculated from the distributions, and the P value.

For a description of how McNemar Test results are derived, consult any appropriate statistics reference. For a list of suggested references, see page 12.

The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.

The number of decimal places displayed is also set in the Report


Options dialog box. For more information on setting report options, see
page 135.

FIGURE 10–20
A McNemar Test
Results Report


Chi-Square (χ²)  χ² is the summed squared differences between the observed frequencies in each cell of the table and the expected frequencies, ignoring observations on the diagonal cells of the table where the individuals responded identically to the treatments.

χ² = Σ [ (observed – expected numbers per cell)² / (expected numbers per cell) ]

Large values of the χ² test statistic indicate that individuals responded differently to the different treatments (i.e., that there are differences between the expected and observed frequencies).

Values of χ² near zero indicate that the pattern in the contingency table
is no different from what one would expect if the counts were
distributed at random.

P Value The P value is the probability of being wrong in concluding


that there is a true difference in the distribution of the numbers of
observations (i.e., the probability of falsely rejecting the null hypothesis,
or committing a Type I error, based on χ²). The smaller the P value, the
greater the probability that the samples are drawn from populations with
different distributions among the categories. Traditionally, you
conclude that there are significant differences when P < 0.05.

Contingency Table Summary  Each cell in the table is described with a set of statistics for that cell.
Observed Counts These are the number of observations per cell,
obtained from the contingency table data.

Expected Frequencies  The expected frequencies for each cell in the contingency table, as predicted using the row and column percentages.


11 Prediction and Correlation

Prediction uses regression and correlation techniques to describe the


relationship between two or more variables. For information on when
to use the different regression and correlation procedures, see Choosing
the Procedure to Use on page 103.

About Regression

Regression procedures use the values of one or more independent


variables to predict the value of a dependent variable. The
independent variables are the known, or predictor, variables. When the
independent variables are varied, they result in a corresponding value for
the dependent, or response, variable.

You can perform regressions using seven different methods.

➤ Simple Linear Regression (page 469)


➤ Multiple Linear Regression (page 495)
➤ Multiple Linear Logistic Regression (page 527)
➤ Polynomial Regression (page 553)
➤ Stepwise Regression, both forwards and backwards (page 577)
➤ Best Subset Regression (page 611)
➤ Nonlinear Regression (page 636)

Regression assumes an association between the independent and


dependent variables that, when graphed on a Cartesian coordinate
system, produces a straight line, plane, or curve. Regression finds the
equation that most closely describes the actual data.


For example, Simple Linear Regression uses the equation for a straight
line

y = b0 + b1 x

where y is the dependent variable, x is the independent variable, b0 is the


intercept, or constant term (the value of the dependent variable when
x"0, the point where the regression line intersects the y axis), and b1 is
the slope, or regression coefficient (increase in the value of y per unit
increase in x). As the values for x increase by 1, the corresponding values
for y either increase or decrease by b1, depending on the sign of b1.
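For readers who want to reproduce such a fit outside SigmaStat, the intercept b0 and slope b1 can be estimated by least squares; a minimal Python sketch with made-up data:

    # Least-squares fit of y = b0 + b1*x (made-up data).
    from scipy.stats import linregress

    x = [100, 140, 180, 220, 260, 300]
    y = [2.1, 3.8, 5.2, 6.9, 8.4, 10.1]

    fit = linregress(x, y)
    print(f"b0 (intercept) = {fit.intercept:.3f}")
    print(f"b1 (slope)     = {fit.slope:.3f}")
    print(f"r              = {fit.rvalue:.3f}")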

Multiple Linear Regression is similar to simple linear regression, but uses


multiple independent variables to fit the general equation for a
multidimensional plane

y = b0 + b1 x1 + b2 x2 + b3 x3 + … + bk xk

where y is the dependent variable, x1, x2, x3, ..., xk are the k independent variables, and b0, b1, b2, ..., bk are the regression coefficients. As the values for xi increase by 1, the corresponding value for y either increases or decreases by bi, depending on the sign of bi.
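A corresponding sketch for the multiple linear case, again with made-up data and assuming two independent variables, builds the design matrix with a constant term and solves the least-squares problem directly:

    # Least-squares fit of y = b0 + b1*x1 + b2*x2 (made-up data).
    import numpy as np

    x1 = np.array([10, 12, 14, 16, 18, 20], dtype=float)   # first independent variable
    x2 = np.array([6.5, 7.0, 7.2, 7.4, 7.8, 8.0])          # second independent variable
    y  = np.array([1.1, 1.6, 2.0, 2.3, 2.9, 3.4])          # dependent variable

    X = np.column_stack([np.ones_like(x1), x1, x2])        # design matrix with constant term
    (b0, b1, b2), _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    print(f"y = {b0:.3f} + {b1:.3f}*x1 + {b2:.3f}*x2")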

FIGURE 11–1
Graph of a Linear Regression, with Data Points, Regression Line, and Residuals Labeled (y axis: Dependent Variable, y; x axis: Independent Variable, x)


Regression is a parametric statistical method that assumes that the


residuals (differences between the predicted and observed values of the
dependent variables) are normally distributed with constant variance.

Because the regression coefficients are computed by minimizing the sum


of squared residuals, this technique is often called least squares
regression.

Correlation

Correlation procedures measure the strength of association between two


variables, which can be used as a gauge of the certainty of prediction.
Unlike regression, it is not necessary to define one variable as the
independent variable and one as the dependent variable.

The correlation coefficient r is a number that varies between –1 and +1.


A correlation of –1 indicates there is a perfect negative relationship
between the two variables, with one always decreasing as the other
increases. A correlation of +1 indicates there is a perfect positive
relationship between the two variables, with both always increasing
together. A correlation of 0 indicates no relationship between the two
variables.

FIGURE 11–2
Graphs of Data with Varying Correlation Coefficients r (panels show r = 1.00, r = –1.00, r = 0.75, and r = 0.05)


There are two types of correlation coefficients.

➤ The Pearson Product Moment Correlation, a parametric statistic


which assumes a normal distribution and constant variance of the
residuals.
➤ The Spearman Rank Order Correlation, a nonparametric
association test that does not require assuming normality or
constant variance of the residuals.
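Both coefficients can be computed outside SigmaStat for comparison; a minimal Python sketch with made-up data:

    # Pearson and Spearman correlation coefficients for the same made-up data.
    from scipy.stats import pearsonr, spearmanr

    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2.0, 2.9, 4.2, 4.8, 6.1, 7.0, 8.3, 8.9]

    r, p_r = pearsonr(x, y)
    rho, p_rho = spearmanr(x, y)
    print(f"Pearson  r   = {r:.3f} (P = {p_r:.4g})")
    print(f"Spearman rho = {rho:.3f} (P = {p_rho:.4g})")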

Data Format for Regression and Correlation

Data for all regression and correlation procedures consists of the


dependent variables (usually the “y” data) in one column, and the
independent variables (usually the “x” data) in one or more additional
columns, one column for each independent variable.

Regression ignores rows containing missing data points within columns


of data (indicated with a double dash “--”). All the columns must be of
equal length, including missing values, or you will receive an error
message.

If you plan to test blocks of data instead of picking columns, the


columns must be adjacent, and the leftmost column is assumed to be the
dependent variable.


See the Selecting Data Columns sections under each test for information
on selecting blocks of data instead of entire columns.

FIGURE 11–3
Data for a Multiple
Linear Regression
Temperature and pH are
the independent variables,
and Growth Rate is the
dependent variable.

Simple Linear Regression

Use Linear Regression when:

➤ You want to predict a trend in data, or predict the value of a variable


from the value of another variable, by fitting a straight line through
the data.
➤ You know there is exactly one independent variable.

The independent variable is the known, or predictor, variable, such as time or temperature. When the independent variable is varied, it produces a corresponding value for the dependent, or response, variable. If you know there is more than one independent variable, use multiple linear regression.

About the Simple Linear Regression  Linear Regression assumes an association between the independent and dependent variable that, when graphed on a Cartesian coordinate system, produces a straight line. Linear Regression finds the straight line that most closely describes, or predicts, the value of the dependent variable, given the observed value of the independent variable.


The equation used for a Simple Linear Regression is the equation for a
straight line, or

y = b0 + b1 x

where y is the dependent variable, x is the independent variable, b0 is the


intercept, or constant term (value of the dependent variable when x = 0,
the point where the regression line intersects the y axis), and b1 is the
slope, or regression coefficient (increase in the value of y per unit
increase in x). As the values for x increase, the corresponding value for y
either increases or decreases by b1, depending on the sign of b1.

Linear Regression is a parametric test, that is, for a given independent


variable value, the possible values for the dependent variable are assumed
to be normally distributed with constant variance around the regression
line.

Performing a Linear Regression  To perform a Simple Linear Regression:
1 Enter or arrange your data appropriately in the worksheet
(see following section).

2 If desired, set the Linear Regression options using the Options for
Linear Regression dialog box (page 471).

3 Select Linear Regression from the toolbar, then select the


button, or choose the Statistics menu Regression command, then
choose Linear.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 482).

5 View and interpret the Linear Regression report and generate


report graphs (pages 11-483 and 11-493).


Arranging Linear Regression Data

Place the data for the observed dependent variable in one column and
the data for the corresponding independent variable in a second column.
Observations containing missing values are ignored, and both columns
must be equal in length.

FIGURE 11–4
Data Format for a
Simple Linear Regression

Selecting Data Columns  When running a Linear Regression, you can either:
➤ Select the columns to test from the worksheet before choosing the
test, or
➤ Select the columns while performing the test.

Setting Linear Regression Options

Use the Linear Regression options to:

➤ Set assumption checking options.


➤ Specify the residuals to display and save them to the worksheet.
➤ Display confidence intervals and save them to the worksheet.
➤ Display the PRESS Prediction Error and standardized regression
coefficients.
➤ Specify tests to identify outlying or influential data points.
➤ Display power.

To change Linear Regression options:


1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.

2 To open the Options for Linear Regression dialog box, select


Linear Regression from the drop-down list in the toolbar, then
click the button, or choose the Statistics menu Current Test
Options... command. The assumption checking options appear
(see Figure 11–5 on page 473).

3 Click the Residuals tab to view the residual options (see Figure 11–6 on page 475), the More Statistics tab to view the confidence intervals, PRESS Prediction Error, and Standardized Coefficients options (see Figure 11–7 on page 477), and the Other Diagnostics tab to view the Influence and Power options (see Figure 11–9 on page 480). Click the Assumption Checking tab to return to the Normality, Constant Variance, and Durbin-Watson options.

4 Click a check box to enable or disable a test option. Options


settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 11-471 through
11-481.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 482 for more information).

6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

You can select Help at any time to access SigmaStat’s on-line help system.

Assumption Checking Options  Select the Assumption Checking tab from the options dialog box to view the Normality, Constant Variance, and Durbin-Watson options. These
options test your data for its suitability for regression analysis by
checking three assumptions that a linear regression makes about the
data. A linear regression assumes:

➤ That the source population is normally distributed about the


regression.


➤ The variance of the dependent variable in the source population is


constant regardless of the value of the independent variable(s).
➤ That the residuals are independent of each other.

All assumption checking options are selected by default. Only disable


these options if you are certain that the data was sampled from normal
populations with constant variance and that the residuals are
independent of each other.

Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test


for a normally distributed population.

FIGURE 11–5
The Options for
Linear Regression Dialog
Box Displaying the
Assumption Checking
Options

Constant Variance Testing SigmaStat tests for constant variance by


computing the Spearman rank correlation between the absolute values of
the residuals and the observed value of the dependent variable. When
this correlation is significant, the constant variance assumption may be
violated, and you should consider trying a different model (i.e., one that
more closely follows the pattern of the data), or transforming one or
more of the independent variables to stabilize the variance; see Chapter
14, Using Transforms, for more information on the appropriate
transform to use.
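One way to mimic these two checks outside SigmaStat is sketched below (hypothetical data; SigmaStat's exact implementation of the tests may differ):

    # Normality (Kolmogorov-Smirnov) and constant variance (Spearman correlation of
    # |residuals| vs. observed y) checks on the residuals of a simple linear fit.
    import numpy as np
    from scipy.stats import kstest, linregress, spearmanr

    x = np.arange(1, 11, dtype=float)
    y = np.array([2.3, 2.8, 4.1, 4.4, 5.9, 6.2, 7.8, 8.1, 9.4, 10.2])  # made-up data

    fit = linregress(x, y)
    residuals = y - (fit.intercept + fit.slope * x)

    z = (residuals - residuals.mean()) / residuals.std(ddof=1)
    ks_stat, p_normality = kstest(z, "norm")            # normality of residuals

    rho, p_const_var = spearmanr(np.abs(residuals), y)  # constant variance check

    print(f"normality P = {p_normality:.3f} (passes if greater than 0.05)")
    print(f"constant variance P = {p_const_var:.3f} (passes if greater than 0.05)")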

P Values for Normality and Constant Variance The P value


determines the probability of being incorrect in concluding that the data
is not normally distributed (P value is the risk of falsely rejecting the null
hypothesis that the data is normally distributed). If the P computed by
the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or constant variance,


increase the P value. Because the parametric statistical methods are


relatively robust in terms of detecting violations of the assumptions, the


suggested value in SigmaStat is 0.05. Larger values of P (for example,
0.10) require less evidence to conclude that the residuals are not
normally distributed or the constant variance assumption is violated.

To relax the requirement of normality and/or constant variance,


decrease P. Requiring smaller values of P to reject the normality
assumption means that you are willing to accept greater deviations from
the theoretical normal distribution before you flag the data as non-
normal. For example, a P value of 0.01 for the normality test requires
greater deviations from normality to flag the data as non-normal than a
value of 0.05.

Although the assumption tests are robust in detecting data from populations
that are non-normal or with non-constant variances, there are extreme
conditions of data distribution that these tests cannot detect. However, these
conditions should be easily detected by visually examining the data without
resorting to the automatic assumption tests.

Durbin-Watson Statistic SigmaStat uses the Durbin-Watson statistic


to test residuals for their independence of each other. The Durbin-
Watson statistic is a measure of serial correlation between the residuals.
The residuals are often correlated when the independent variable is time,
and the deviation between the observation and the regression line at one time is related to the deviation at the previous time. If the residuals are
not correlated, the Durbin-Watson statistic will be 2.

Difference from 2 Value  Enter the acceptable deviation from 2.0 that you consider as evidence of a serial correlation in the Difference from 2.0 box. If the computed Durbin-Watson statistic deviates from 2.0 by more than the entered value, SigmaStat warns you that the residuals may not be independent. The suggested deviation value is 0.50, i.e., Durbin-Watson Statistic values greater than 2.5 or less than 1.5 flag the residuals as correlated.

To require a stricter adherence to independence, decrease the acceptable


difference from 2.0.

To relax the requirement of independence, increase the acceptable


difference from 2.0.
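The statistic itself is simple to compute from a column of residuals; a minimal Python sketch with made-up residuals:

    # Durbin-Watson statistic; a value near 2 suggests independent residuals.
    import numpy as np

    residuals = np.array([0.3, -0.1, 0.4, -0.2, 0.1, -0.3, 0.2, -0.4, 0.1, 0.0])

    dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
    flagged = abs(dw - 2.0) > 0.50       # suggested acceptable deviation of 0.50
    print(f"Durbin-Watson = {dw:.3f}, flagged as correlated: {flagged}")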


Residuals Select the Residuals tab in the options dialog box to view the Predicted
Values, Raw, Standardized, Studentized, Studentized Deleted, and
Report Flagged Values Only options.

FIGURE 11–6
The Options for
Linear Regression
Dialog Box Displaying the
Residuals Options

Predicted Values  Use this option to calculate the predicted value of the dependent variable for each observed value of the independent variable(s), then save the results to the worksheet. Click the selected check box if you do not want to include predicted values in the worksheet.

To assign predicted values to a worksheet column, select the worksheet


column you want to save the predicted values to from the corresponding
drop-down list. If you select none and the Predicted Values check box is
selected, the values appear in the report but are not assigned to the
worksheet.

Raw Residuals The raw residuals are the differences between the
predicted and observed values of the dependent variables. To include
raw residuals in the report, make sure this check box is selected. Click
the selected check box if you do not want to include raw residuals in the
worksheet.

To assign the raw residuals to a worksheet column, select the number of


the desired column from the corresponding drop-down list. If you select
none from the drop-down list and the Raw check box is selected, the
values appear in the report but are not assigned to the worksheet.

Standardized Residuals The standardized residual is the residual


divided by the standard error of the estimate. The standard error of the
residuals is essentially the standard deviation of the residuals, and is a
measure of variability around the regression line. To include


standardized residuals in the report, make sure this check box is selected. Click the selected check box if you do not want to include standardized residuals in the worksheet.

SigmaStat automatically flags data points lying outside of the confidence


interval specified in the corresponding box. These data points are
considered to have “large” standardized residuals, i.e., outlying data
points. You can change which data points are flagged by editing the
value in the Flag Values > edit box. The suggested residual value is
2.500.
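The flagging rule can be sketched outside SigmaStat as follows (made-up data with one deliberate outlier; the 2.5 threshold matches the suggested value above):

    # Flag points whose standardized residual magnitude exceeds 2.5.
    import numpy as np
    from scipy.stats import linregress

    x = np.arange(1, 13, dtype=float)
    y = np.array([1.1, 2.0, 2.8, 4.2, 5.1, 5.9,
                  7.2, 7.8, 18.0, 10.1, 11.2, 11.9])      # note the outlier at row 9

    fit = linregress(x, y)
    residuals = y - (fit.intercept + fit.slope * x)
    s = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))    # standard error of the estimate
    standardized = residuals / s

    for row, value in enumerate(standardized, start=1):
        if abs(value) > 2.5:
            print(f"row {row}: standardized residual {value:.2f} flagged")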

Studentized Residuals Studentized residuals scale the standardized


residuals by taking into account the greater precision of the regression
line near the middle of the data versus the extremes. The Studentized
residuals tend to be distributed according to the Student t distribution,
so the t distribution can be used to define “large” values of the
Studentized residuals. SigmaStat automatically flags data points with
“large” values of the Studentized residuals, i.e., outlying data points; the
suggested data points flagged lie outside the 95% confidence interval for
the regression population.

To include studentized residuals in the report, make sure this check box
is selected. Click the selected check box if you do not want to include
studentized residuals in the worksheet.

Studentized Deleted Residuals Studentized deleted residuals are


similar to the Studentized residual, except that the residual values are
obtained by computing the regression equation without using the data
point in question.

To include studentized deleted residuals in the report, make sure this


check box is selected. Click the selected check box if you do not want to
include studentized deleted residuals in the worksheet.

SigmaStat can automatically flag data points with “large” values of the
studentized deleted residual, i.e., outlying data points; the suggested data
points flagged lie outside the 95% confidence interval for the regression
population.

Note that both Studentized and Studentized deleted residuals use the same
confidence interval setting to determine outlying points.


Report Flagged Values Only To include only the flagged


standardized and studentized deleted residuals in the report, make sure
the Report Flagged Values Only check box is selected. Uncheck this
option to include all standardized and studentized residuals in the
report.

Confidence Intervals Select the More Statistics tab in the options dialog box to view the
confidence interval options. You can set the confidence interval for the
population, regression, or both and then save them to the worksheet.

FIGURE 11–7
The Options for Linear
Regression Dialog Box
Displaying the Confidence
Intervals Options

Confidence Interval for the Population The confidence interval for


the population gives the range of values that define the region that
contains the population from which the observations were drawn.

To include confidence intervals for the population in the report, make


sure the Population check box is selected. Click the selected check box if
you do not want to include the confidence intervals for the population
in the report.

Confidence Interval for the Regression The confidence interval for


the regression line gives the range of values that defines the region
containing the true mean relationship between the dependent and
independent variables, with the specified level of confidence.

To include confidence intervals for the regression in the report, make


sure the Regression check box is selected, then specify a confidence level
by entering a value in the percentage box. The confidence level can be
any value from 1 to 99. The suggested confidence level for all intervals
is 95%. Click the selected check box if you do not want to include the
confidence intervals for the regression in the report.



Saving Confidence Intervals to the Worksheet To save the


confidence intervals to the worksheet, select the column number of the
first column you want to save the intervals to from the Starting in
Column drop-down list. The selected intervals are saved to the
worksheet starting with the specified column and continuing with
successive columns in the worksheet.

PRESS Select the More Statistics tab in the options dialog box to view the
Prediction Error PRESS Prediction Error option (see Figure 11–7 on page 477). The
PRESS Prediction Error is a measure of how well the regression equation
fits the data. Leave this check box selected to evaluate the fit of the
equation using the PRESS statistic. Click the selected check box if you
do not want to include the PRESS statistic in the report.

Standardized Click the More Statistics tab in the options dialog box to view the
Coefficients (βi) Standardized Coefficients option (see Figure 11–7 on page 477). These
are the coefficients of the regression equation standardized to
dimensionless values,

    βi = bi (sxi / sy)

where bi = regression coefficient, sxi = standard deviation of the
independent variable xi, and sy = standard deviation of the dependent
variable y.

To include the standardized coefficients in the report, make sure the


Standardized Coefficients check box is selected. Click the selected
check box if you do not want to include the standardized coefficients in
the worksheet.
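
The following Python sketch, using hypothetical data, shows the calculation
behind this option for a simple regression: the slope is rescaled by the ratio
of the standard deviations of the independent and dependent variables.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.3])

b1, b0 = np.polyfit(x, y, 1)

# Standardized (dimensionless) coefficient: beta = b1 * (s_x / s_y)
beta = b1 * (np.std(x, ddof=1) / np.std(y, ddof=1))
print(f"b1 = {b1:.4f}, beta = {beta:.4f}")
```

For a simple linear regression this standardized slope is equal to the
correlation coefficient R.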

Influence Options Select the Other Diagnostics tab in the options dialog box to view the
Influence options. Influence options automatically detect instances of
influential data points. Most influential points are data points which are
outliers, that is, they do not “line up” with the rest of the data
points. These points can have a potentially disproportionately strong
influence on the calculation of the regression line. You can use several
influence tests to identify and quantify influential points.


FIGURE 11–8
A Graph with an Influential Outlying Point. The solid line shows the
regression for the data including the outlier, and the dotted line is the
regression computed without the outlying point.

DFFITS DFFITSi is the number of estimated standard errors that the


predicted value changes for the ith data point when it is removed from
the data set. It is another measure of the influence of a data point on the
prediction used to compute the regression coefficients.

Predicted values that change by more than two standard errors when the
data point is removed are considered to be influential.

Check the DFFITS check box to compute this value for all points and
flag influential points, i.e., those with DFFITS greater than the value
specified in the Flag Values > edit box. The suggested value is 2.0
standard errors, which indicates that the point has a strong influence on


the data. To avoid flagging more influential points, increase this value;
to flag less influential points, decrease this value.

FIGURE 11–9
The Options for Linear
Regression Dialog Box
Displaying the Influence
and Power Options

Leverage Leverage is used to identify the potential influence of a point


on the results of the regression equation. Leverage depends only on the
value of the independent variable(s). Observations with high leverage
tend to be at the extremes of the independent variables, where small
changes in the independent variables can have large effects on the
predicted values of the dependent variable.
The expected leverage of a data point is (k + 1)/n, where there are k
independent variables and n data points. Observations with leverages
much higher than the expected leverages are potentially influential
points.

Check the Leverage check box to compute the leverage for each point
and automatically flag potentially influential points, i.e., those points
that could have leverages greater than the specified value times the
expected leverage. The suggested value is 2.0 times the expected leverage
for the regression (i.e., 2(k + 1)/n). To avoid flagging more potentially
influential points, increase this value; to flag points with less potential
influence, lower this value.
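
A minimal Python sketch of the leverage calculation for a simple linear
regression (one independent variable) and the 2.0 times expected-leverage flag
described above; the x values are hypothetical, with one deliberately extreme
point.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])   # last value is extreme
n, k = len(x), 1                                # one independent variable

# Leverage for simple linear regression: h_i = 1/n + (x_i - mean)^2 / Sxx
sxx = np.sum((x - x.mean()) ** 2)
leverage = 1.0 / n + (x - x.mean()) ** 2 / sxx

expected = (k + 1) / n                   # expected leverage (k + 1)/n
flagged = leverage > 2.0 * expected      # suggested factor of 2.0
for xi, h, f in zip(x, leverage, flagged):
    note = "  <- potentially influential" if f else ""
    print(f"x = {xi:5.1f}  leverage = {h:.3f}{note}")
```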

Cook's Distance Cook's distance is a measure of how great an effect


each point has on the estimates of the parameters in the regression
equation. Cook's distance assesses how much the values of the
regression coefficients change if a point is deleted from the analysis.
Cook's distance depends on both the values of the independent and
dependent variables.


Check the Cook's Distance check box to compute this value for all
points and flag influential points, i.e., those with a Cook's distance
greater than the specified value. The suggested value is 4.0. Cook's
distances above 1 indicate that a point is possibly influential. Cook's
distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. To avoid flagging more influential
points, increase this value; to flag less influential points, lower this value.
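
The following Python sketch illustrates the leave-one-out idea behind Cook's
distance using hypothetical data with one outlying y value: each distance
measures how much all fitted values change when that point is removed, scaled
by the number of coefficients and the residual mean square. It is a sketch of
the standard definition, not SigmaStat's code.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 30.0])   # last y value is an outlier
n, p = len(x), 2                                # p = number of coefficients (b0, b1)

b1, b0 = np.polyfit(x, y, 1)
fitted = b0 + b1 * x
mse = np.sum((y - fitted) ** 2) / (n - p)       # residual mean square

cooks = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    c1, c0 = np.polyfit(x[keep], y[keep], 1)    # refit without point i
    fitted_i = c0 + c1 * x                      # predictions from the reduced fit
    cooks[i] = np.sum((fitted - fitted_i) ** 2) / (p * mse)

print(np.round(cooks, 3))   # values above the cutoff (suggested 4.0) are flagged
```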

Report Flagged Values Only To include only the influential


points flagged by the influential point tests in the report, make sure the
Report Flagged Values Only check box is selected. Uncheck this option
to include all influential points in the report.

What to Do About Influential Points Influential points have two


possible causes:

➤ There is something wrong with the data point, caused by an error in


observation or data entry.
➤ The model is incorrect.

If a mistake was made in data collection or entry, correct the value. If


you do not know the correct value, you may be able to justify deleting
the data point. If the model appears to be incorrect, try regression with
different independent variables, or a Nonlinear Regression.

For descriptions of how to handle influential points, you can reference


an appropriate statistics reference. For a list of suggested references, see
page 12.

Power Select the Other Diagnostics tab in the options dialog box to view the
Power options (see Figure 11–9 on page 480). The power of a regression
is the power to detect the observed relationship in the data. The alpha
(α) is the acceptable probability of incorrectly concluding there is a
relationship.

Check the Power check box to compute the power for the linear
regression data. Change the alpha value by editing the number in the
Alpha Value edit box. The suggested value is α = 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing
to conclude there is a significant relationship when P < 0.05.

Smaller values of α result in stricter requirements before concluding
there is a significant relationship, but a greater possibility of concluding
there is no relationship when one exists. Larger values of α make it
easier to conclude that there is a relationship, but also increase the risk of
reporting a false positive.

Running a Linear Regression

To run a Simple Linear Regression, you need to select the data to test.
The Pick Columns dialog box is used to select the worksheet columns
with the data you want to test.

To run a Linear Regression:

1 If you want to select your data before you run the test, drag the
pointer over your data.

2 Open the Pick Columns dialog box to start the Linear Regression.
You can either:

➤ Select Linear Regression from the toolbar drop-down list , then


select the button.
➤ Choose the Statistics menu Regression command, then choose
Linear...
➤ Click the Run Test button from the Options for Linear
Regression dialog box (see step 5 on page 472).

If you selected columns before you chose the test, the columns
appear in the Selected Columns list. If you have not selected
columns, the dialog box prompts you to pick your data.

3 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Dependent or Independent drop-down list.

The first selected column is assigned to the dependent row in the
Selected Columns list, and the second column is assigned to the
independent row in the list. The titles of the selected columns appear
in each row. You can only select one dependent and one
independent data column.

4 To change your selections, select the assignment in the list, then
select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.


FIGURE 11–10
The Pick Columns
for Linear Regression
Dialog Box Prompting You to
Select Data Columns

5 Select Finish to run the regression. If you elected to test for
normality, constant variance, and/or independent residuals,
SigmaStat performs the tests for normality (Kolmogorov-
Smirnov), constant variance, and independent residuals. If your
data fail any of these tests, SigmaStat warns you. When the test
is complete, the Simple Linear Regression report appears (see
Figure 11–11 on page 485).

If you selected to place predicted values and residuals in the
worksheet (see page 475 and page 477), they are placed in the
specified columns and are labeled by content and source column.

Interpreting Simple Linear Regression Results

The report for a Linear Regression displays the equation with the
computed coefficients for the line, R, R2, and adjusted R2, a table of
statistical values for the estimate of the dependent variable, and the P
values for the regression equation and for the individual coefficients.

The other results displayed in the report are enabled and disabled in the
Options for Linear Regression dialog box (see Setting Linear Regression
Options on page 471).

For descriptions of the computations of these results, you can reference


an appropriate statistics reference. For a list of suggested references, see
page 12 in the INTRODUCTION chapter.

The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.


Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and uncheck the Explain Test
Results option.

The number of decimal places displayed is also set in the Report


Options dialog box. For more information on setting report options, see
page 135 in the WORKING WITH REPORTS chapter.

Regression Equation This is the equation for a line with the values of the coefficients—the
intercept (constant) and the slope—in place.

This equation takes the form:

y = b0 + b1 x

where y is the dependent variable, x is the independent variable, b0 is the
constant, or intercept (the value of the dependent variable when x = 0, the
point where the regression line intersects the y axis), and b1 is the slope
(the increase in the value of y per unit increase in x).
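
As a worked illustration of how the two coefficients are obtained, the
following Python sketch computes the least-squares slope and intercept from
hypothetical x and y columns.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.3])

# Least-squares estimates for y = b0 + b1*x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(f"Regression equation: y = {b0:.3f} + {b1:.3f} x")
```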


The number of observations N, and the number of observations


containing missing values (if any) that were omitted from the regression,
are also displayed.

FIGURE 11–11
An Example of the
Simple Linear
Regression Report

R, R², and R, the correlation coefficient, and R², the coefficient of
Adj R² determination, are both measures of how well the regression model
describes the data. R values near 1 indicate that the straight line is a
good description of the relation between the independent and
dependent variable.

R equals 0 when the values of the independent variable do not allow any
prediction of the dependent variables, and equals 1 when you can
perfectly predict the dependent variable from the independent variable.
Adjusted R² The adjusted R², R²adj, is also a measure of how well the
regression model describes the data, but takes into account the number
of independent variables, which reflects the degrees of freedom. Larger
R²adj values (nearer to 1) indicate that the equation is a good description
of the relation between the independent and dependent variables.

Standard Error of The standard error of the estimate s_y·x is a measure of the actual
the Estimate (s_y·x) variability about the regression line of the underlying population. The
underlying population generally falls within about two standard errors of
the observed sample.
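
The following Python sketch, using hypothetical data, shows how R², the
adjusted R², and the standard error of the estimate are obtained from the
residual and total sums of squares.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.3])
n, k = len(y), 1                        # k = number of independent variables

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

ss_res = np.sum(residuals ** 2)          # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares

r_squared = 1.0 - ss_res / ss_tot
adj_r_squared = 1.0 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))
std_err_estimate = np.sqrt(ss_res / (n - k - 1))    # s_y·x

print(r_squared, adj_r_squared, std_err_estimate)
```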


Statistical Coefficients The value for the constant (intercept) and coefficient of
Summary Table the independent variable (slope) for the regression model are listed.

Standard Error The standard errors of the intercept and slope are
measures of the precision of the estimates of the regression coefficients
(analogous to the standard error of the mean). The true regression
coefficients of the underlying population generally fall within about two
standard errors of the observed sample coefficients. These values are
used to compute t and confidence intervals for the regression.

t Statistic The t statistic tests the null hypothesis that the coefficient of
the independent variable is zero, that is, the independent variable does
not contribute to predicting the dependent variable. t is the ratio of the
regression coefficient to its standard error, or

t = (regression coefficient) / (standard error of regression coefficient)

You can conclude from “large” t values that the independent variable can
be used to predict the dependent variable (i.e., that the coefficient is not
zero).
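
A short Python sketch of this ratio and its two-sided P value with n - 2
degrees of freedom; the data are hypothetical and scipy is assumed to be
available for the t distribution.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.3])
n = len(y)

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

s_yx = np.sqrt(np.sum(residuals ** 2) / (n - 2))       # standard error of the estimate
se_b1 = s_yx / np.sqrt(np.sum((x - x.mean()) ** 2))    # standard error of the slope

t_stat = b1 / se_b1
p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - 2)      # two-sided P value

print(f"t = {t_stat:.3f}, P = {p_value:.4g}")
```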

P value P is the P value calculated for t. The P value is the probability


of being wrong in concluding that there is a true association between the
variables (i.e., the probability of falsely rejecting the null hypothesis, or
committing a Type I error, based on t). The smaller the P value, the
greater the probability that the independent variable can be used to
predict the dependent variable.

Traditionally, you can conclude that the independent variable can be
used to predict the dependent variable when P < 0.05.

Beta (Standardized Coefficient β) This is the coefficient of the
independent variable standardized to dimensionless values,

    β1 = b1 (sx / sy)

where b1 = regression coefficient, sx = standard deviation of the
independent variable x, and sy = standard deviation of the dependent
variable y.


This result is displayed unless the Standardized Coefficients option is


disabled in the Options for Linear Regression dialog box (see page 478).

Analysis of Variance The ANOVA (analysis of variance) table lists the ANOVA statistics for
(ANOVA) Table the regression and the corresponding F value.

DF (Degrees of Freedom) Degrees of freedom represent the number of


observations and variables in the regression equation.

➤ The regression degrees of freedom is a measure of the number of


independent variables in the regression equation (always 1 for
simple linear regression)
➤ The residual degrees of freedom is a measure of the number of
observations less the number of terms in the equation
➤ The total degrees of freedom is a measure of total observations

SS (Sum of Squares) The sum of squares are measures of variability of


the dependent variable.

➤ The sum of squares due to regression (SSreg ) measures the


difference of the regression line from the mean of the dependent
variable.
➤ The residual sum of squares (SSres ) is a measure of the size of the
residuals, which are the differences between the observed values of
the dependent variable and the values predicted by regression
model.
➤ The total sum of squares (SStot ) is a measure of the overall
variability of the dependent variable about its mean value.

MS (Mean Square) The mean square provides two estimates of the


population variances. Comparing these variance estimates is the basis of
analysis of variance.

The mean square regression is a measure of the variation of the


regression from the mean of the dependent variable, or

(sum of squares due to regression) / (regression degrees of freedom) = SSreg / DFreg = MSreg


The residual mean square is a measure of the variation of the residuals


about the regression line, or

(residual sum of squares) / (residual degrees of freedom) = SSres / DFres = MSres

The residual mean square is also equal to s_y·x², the square of the standard error of the estimate.

F Statistic The F test statistic gauges the contribution of the


independent variable in predicting the dependent variable. It is the ratio

(regression variation from the dependent variable mean) / (residual variation about the regression line) = MSreg / MSres

If F is a large number, you can conclude that the independent variable


contributes to the prediction of the dependent variable (i.e., the slope of
the line is different from zero, and the “unexplained variability” is
smaller than what is expected from random sampling variability). If the
F ratio is around 1, you can conclude that there is no association
between the variables (i.e., the data is consistent with the null hypothesis
that all the samples are just randomly distributed about the population
mean, regardless of the value of the independent variable).

P Value The P value is the probability of being wrong in concluding


that there is an association between the dependent and independent
variables (i.e., the probability of falsely rejecting the null hypothesis, or
committing a Type I error, based on F). The smaller the P value, the
greater the probability that there is an association.

Traditionally, you can conclude that the independent variable can be
used to predict the dependent variable when P < 0.05.

In simple linear regression, the P value for the ANOVA is identical to the P
value associated with the t of the slope coefficient, and F = t², where t is the t
value associated with the slope.
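
The following Python sketch, using hypothetical data, assembles the pieces of
this table for a simple regression and shows that F is the ratio of the two
mean squares; with these data F also equals the square of the slope's t
statistic, as noted above.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.3])
n = len(y)

b1, b0 = np.polyfit(x, y, 1)
fitted = b0 + b1 * x

ss_reg = np.sum((fitted - y.mean()) ** 2)   # regression sum of squares
ss_res = np.sum((y - fitted) ** 2)          # residual sum of squares
df_reg, df_res = 1, n - 2                   # degrees of freedom

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res                    # F ratio

print(f"SSreg={ss_reg:.3f}  SSres={ss_res:.3f}  "
      f"MSreg={ms_reg:.3f}  MSres={ms_res:.3f}  F={f_stat:.2f}")
```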

PRESS Statistic PRESS, the Predicted Residual Error Sum of Squares, is a gauge of
how well a regression model predicts new data. The smaller the PRESS
statistic, the better the predictive ability of the model.


The PRESS statistic is computed by summing the squares of the


prediction errors (the differences between predicted and observed values)
for each observation, with that point deleted from the computation of
the regression equation.
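
A minimal Python sketch of this leave-one-out computation, using hypothetical
data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.3])
n = len(y)

press = 0.0
for i in range(n):
    keep = np.arange(n) != i
    b1, b0 = np.polyfit(x[keep], y[keep], 1)   # refit with point i deleted
    press += (y[i] - (b0 + b1 * x[i])) ** 2    # squared prediction error for point i

print(f"PRESS = {press:.3f}")
```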

Durbin-Watson Statistic The Durbin-Watson statistic is a measure of correlation between the


residuals. If the residuals are not correlated, the Durbin-Watson statistic
will be 2; the more this value differs from 2, the greater the likelihood
that the residuals are correlated. This result appears if it was selected in
the Regression Options dialog box.

Regression assumes that the residuals are independent of each other; the
Durbin-Watson test is used to check this assumption. If the Durbin-
Watson value deviates from 2 by more than the value set in the Options
for Linear Regression dialog box, a warning appears in the report. The
suggested trigger value is a difference of more than 0.50 (i.e., if the
Durbin-Watson statistic is below 1.5 or over 2.5).

Normality Test Normality test result displays whether the data passed or failed the test of
the assumption that the source population is normally distributed
around the regression line, and the P value calculated by the test. All
regressions assume a source population to be normally distributed about
the regression line. When this assumption may be violated, a warning
appears in the report. This result appears unless you disabled normality
testing in the Options for Linear Regression dialog box (see page 471).

Failure of the normality test can indicate the presence of outlying


influential points or an incorrect regression model.

Constant The constant variance test result displays whether or not the data passed
Variance Test or failed the test of the assumption that the variance of the dependent
variable in the source population is constant regardless of the value of
the independent variable, and the P value calculated by the test. When
the constant variance assumption may be violated, a warning appears in
the report.

If you receive this warning, you should consider trying a different model
(i.e., one that more closely follows the pattern of the data), or
transforming the independent variable to stabilize the variance and
obtain more accurate estimates of the parameters in the regression
equation. See Chapter 14, Using Transforms for more information on
the appropriate transform to use.


Power This result is displayed if you selected this option in the options dialog
box. The power, or sensitivity, of a performed regression is the
probability that the model correctly describes the relationship of the
variables, if there is a relationship.

Regression power is affected by the number of observations, the chance
of erroneously reporting a difference α (alpha), and the correlation
coefficient r associated with the regression.

Alpha (α) Alpha (α) is the acceptable probability of incorrectly
concluding that the model is correct. An α error is also called a Type I
error (a Type I error is when you reject the hypothesis of no association
when this hypothesis is true).

The α value is set in the Power Options dialog box; the suggested value
is α = 0.05, which indicates that a one in twenty chance of error is
acceptable. Smaller values of α result in stricter requirements before
concluding the model is correct, but a greater possibility of concluding
the model is bad when it is really correct (a Type II error). Larger values
of α make it easier to conclude that the model is correct, but also
increase the risk of accepting a bad model (a Type I error).

Regression Diagnostics The regression diagnostic results display only the values for the predicted
values, residual results, and other diagnostics selected in the Options for
Regression dialog box (see page 471). All results that qualify as outlying
values are flagged with a ! symbol. The trigger values to flag residuals as
outliers are set in the Options for Linear Regression dialog box.

If you selected Report Cases with Outliers Only, only those observations
that have one or more residuals flagged as outliers are reported; however,
all other results for that observation are also displayed.

Row This is the row number of the observation.

Predicted Values This is the value for the dependent variable predicted
by the regression model for each observation.

Residuals These are the raw residuals, the difference between the
predicted and observed values for the dependent variables.

Standardized Residuals The standardized residual is the raw residual
divided by the standard error of the estimate s_y·x.


If the residuals are normally distributed about the regression line, about
66% of the standardized residuals have values between −1 and 1, and
about 95% of the standardized residuals have values between −2 and 2.
A larger standardized residual indicates that the point is far from the
regression line; the suggested value flagged as an outlier is 2.5.

Studentized Residuals The Studentized residual is a standardized


residual that also takes into account the greater confidence of the
predicted values of the dependent variable in the “middle” of the data
set. By weighting the values of the residuals of the extreme data points
(those with the lowest and highest independent variable values), the
Studentized residual is more sensitive than the standardized residual in
detecting outliers.

Both Studentized and Studentized deleted residuals that lie outside a


specified confidence interval for the regression are flagged as outlying
points; the suggested confidence value is 95%.

This residual is also known as the internally Studentized residual because


the standard error of the estimate is computed using all data.

Studentized Deleted Residuals The Studentized deleted residual, or
externally Studentized residual, is a Studentized residual which uses the
standard error of the estimate s_y·x(−i), computed after deleting the data
point associated with the residual. This reflects the greater effect of
outlying points by deleting the data point from the variance
computation.

Both Studentized and Studentized deleted residuals that lie outside a


specified confidence interval for the regression are flagged as outlying
points; the suggested confidence value is 95%.

The Studentized deleted residual is more sensitive than the Studentized


residual in detecting outliers, since the Studentized deleted residual
results in much larger values for outliers than the Studentized residual.

Influence Diagnostics The influence diagnostic results display only the values for the results
selected in the Options dialog box under the Other Diagnostics tab (see
page 478). All results that qualify as outlying values are flagged with a
special symbol. The trigger values to flag data points as outliers are also set in
the Options dialog box under the Other Diagnostics tab.


If you selected Report Cases with Outliers Only, only observations that
have one or more observations flagged as outliers are reported; however,
all other results for that observation are also displayed.

Row This is the row number of the observation.

Cook's Distance Cook's distance is a measure of how great an effect


each point has on the estimates of the parameters in the regression
equation. It is a measure of how much the values of the regression
equation would change if that point is deleted from the analysis.

Values above 1 indicate that a point is possibly influential. Cook's


distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. Points with Cook's distances greater
than the specified value are flagged as influential; the suggested value is
4.

Leverage Leverage values identify potentially influential points.


Observations with leverages a specified factor greater than the expected
leverages are flagged as potentially influential points; the suggested value
is 2.0 times the expected leverage.

The expected leverage of a data point is (k + 1)/n, where there are k
independent variables and n data points.

Because leverage is calculated using only the independent variables, high
leverage points tend to be at the extremes of the independent variables
(large and small values), where small changes in the independent
variables can have large effects on the predicted values of the dependent
variable.

DFFITS The DFFITS statistic is a measure of the influence of a data


point on regression prediction. It is the number of estimated standard
errors the predicted value for a data point changes when the observed
value is removed from the data set before computing the regression
coefficients.

Predicted values that change by more than the specified number of


standard errors when the data point is removed are flagged as influential;
the suggested value is 2.0 standard errors.

Confidence Intervals These results are displayed if you selected them in the Regression
Options dialog box. If the confidence interval does not include zero,


you can conclude that the coefficient is different from zero with the level
of confidence specified. This can also be described as P < α (alpha),
where α is the acceptable probability of incorrectly concluding that the
coefficient is different from zero, and the confidence interval is 100(1 − α)%.

The specified confidence level can be any value from 1 to 99; the
suggested confidence level for both intervals is 95%.

Row This is the row number of the observation.

Predicted This is the value for the dependent variable predicted by the
regression model for each observation.

Regression The confidence interval for the regression line gives the
range of variable values computed for the region containing the true
relationship between the dependent and independent variables, for the
specified level of confidence.

Population The confidence interval for the population gives the range
of variable values computed for the region containing the population
from which the observations were drawn, for the specified level of
confidence.

Simple Linear Regression Report Graphs

You can generate up to five graphs using the results from a Simple Linear
Regression. They include a:

➤ Histogram of the residuals.


➤ Scatter plot of the residuals.
➤ Bar chart of the standardized residuals.
➤ Normal probability plot of residuals.
➤ Line/scatter plot of the regression with confidence
and prediction intervals.

Histogram of Residuals The linear regression histogram plots the raw residuals in a specified
range, using a defined interval set. The residuals are divided into a
number of evenly incremented histogram intervals and plotted as
histogram bars indicating the number of residuals in each interval. The
X axis represents the histogram intervals, and the Y axis represents the
number of residuals in each group. For an example of a histogram, see


page 153 in the CREATING AND MODIFYING GRAPHS chapter.

Scatter Plot of The Linear Regression scatter plot of the residuals plots the residuals of
the Residuals the independent variables as points relative to the standard deviations.
The X axis represents the independent variable values, the Y axis
represents the residuals of the variables, and the horizontal lines running
across the graph represent the standard deviations of the data. For an
example of a scatter plot, see page 152 in the CREATING AND
MODIFYING GRAPHS chapter.

Bar Chart of The Linear Regression bar chart of the standardized residuals plots the
the Standardized standardized residuals of the independent variables as points relative to
Residuals the standard deviations. The X axis represents the independent variable
values, the Y axis represents the residuals of the variables, and the
horizontal lines running across the graph represent the standard
deviations of the data. For an example of a bar chart of the residuals, see
page 153 in the CREATING AND MODIFYING GRAPHS chapter.

Normal The Linear Regression probability plot graphs standardized residuals


Probability Plot versus their cumulative frequencies along a probability scale. The
residuals are sorted and then plotted as points around a curve
representing the area of the gaussian. Plots with residuals that fall along
the gaussian curve indicate that your data was taken from a normally
distributed population. The X axis is a linear scale representing the
residual values. The Y axis is a probability scale representing the
cumulative frequency of the residuals. For an example of a normal
probability plot, see page 155 in the CREATING AND MODIFYING
GRAPHS chapter.

Line/Scatter Plot The Linear Regression graph plots the observations of the linear
of the Regression with regression as a line/scatter plot. The points represent the data dependent
Prediction and variables plotted against the independent variables, the solid line
Confidence Intervals running through the points represents the regression line, and the
dashed lines represent the prediction and confidence intervals. The X
axis represents the independent variables and the Y axis represents the
dependent variables. For an example of a line/scatter plot of the
regression, see page 156 in the CREATING AND MODIFYING GRAPHS
chapter.


Creating a Linear To generate a graph of Linear Regression report data:


Regression
Report Graph 1 Click the toolbar button, or choose the Graph menu Create
Graph command when the Linear Regression report is selected.
The Create Graph dialog box appears displaying the types of
graphs available for the Linear Regression results.

FIGURE 11–12
The Create Graph Dialog
Box
for the Linear Regression
Report Graphs

2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
For more information on each of the graph types, see pages 11-493
through 11-494. The specified graph appears in a graph window
or in the report.

For information on manipulating graphs, see pages 8-178 through


8-202 in the CREATING AND MODIFYING GRAPHS chapter.

Multiple Linear Use a Multiple Linear Regression when you want to:
Regression
➤ Predict the value of one variable from the values of two or more
other variables, by fitting a plane (or hyperplane) through the data,
and
➤ You know there are two or more independent variables and want to
find a model with these independent variables.

The independent variables are the known, or predictor, variables.


When the independent variables are varied, they produce a
corresponding value for the dependent, or response, variable.

If you know there is only one independent variable, use Simple Linear
Regression. If you are not sure if all independent variables should be


FIGURE 11–13
An Example of a Line/
Scatter Plot of the Linear
Regression Observations
with a Regression and
Confidence and Prediction
Interval Lines

used in the model, use Stepwise or Best Subsets Regression to identify


the important independent variables from the selected possible
independent variables.

If the relationship is not a straight line or plane, use Polynomial or


Nonlinear Regression, or use a variable transformation.

About the Multiple Multiple Linear Regression assumes an association between the
Linear Regression dependent and k independent variables that fits the general equation for
a multidimensional plane:

y = b0 + b1x1 + b2x2 + b3x3 + … + bkxk

where y is the dependent variable, x1, x2, x3, ..., xk are the k independent
variables, and b0, b1, b2, ..., bk are the regression coefficients.

As the values xi vary, the corresponding value for y either increases or


decreases, depending on the sign of the associated regression coefficient
bi.

Multiple Linear Regression finds the k + 1 dimensional plane that most
closely describes the actual data, using all the independent variables
selected.


Multiple Linear Regression is a parametric test, that is, for a given set of
independent variable values, the possible values for the dependent
variable are assumed to be normally distributed and have constant
variance about the regression plane.
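
As an illustration of fitting this equation, the following Python sketch solves
the least-squares problem for two hypothetical independent variables using a
design matrix with a constant column. The data and variable names are invented
for the example, and this is not SigmaStat's implementation.

```python
import numpy as np

# Hypothetical columns: dependent y, independent x1 and x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([5.2, 6.1, 10.3, 11.2, 16.0, 16.8])

# Design matrix with a column of ones for the constant b0
X = np.column_stack((np.ones_like(x1), x1, x2))

# Least-squares solution of y = b0 + b1*x1 + b2*x2
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef

print(f"y = {b0:.3f} + {b1:.3f} x1 + {b2:.3f} x2")
```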

Performing To perform a Multiple Linear Regression:


a Multiple
Linear Regression 1 Enter or arrange your data appropriately in the worksheet
(see following section).

2 If desired, set the Linear Regression options using the Options for
Multiple Linear Regression dialog box (page 498).

3 Select Multiple Linear Regression from the toolbar, then select the
button, or choose the Statistics menu command, then choose
Regression.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 511).

5 View and interpret the Multiple Linear Regression report and


generate report graphs (pages 11-512 and 11-523).


Arranging Multiple Linear Regression Data

Place the data for the observed dependent variable in one column and
the data for the corresponding independent variables in two or more
columns.

FIGURE 11–14
Data Format for a
Multiple Linear Regression

Selecting When running a Multiple Linear Regression, you can either:


Data Columns
➤ Select the columns to test from the worksheet before choosing the
test, or
➤ Select the columns while performing the test.

Observations containing missing values are ignored, and all columns


must be equal in length.

Setting Multiple Linear Regression Options

Use the Multiple Linear Regression options to:

➤ Set assumption checking options


➤ Specify the residuals to display and save them to the worksheet
➤ Display confidence intervals and save them to the worksheet
➤ Display the PRESS Prediction Error and standardized regression
coefficients
➤ Specify tests to identify outlying or influential data points


➤ Set the variance inflation factor


➤ Display power

To change Multiple Linear Regression options:

1 If you are going to run the test after changing test options and
want to select your data before you run the test, drag the pointer
over the data.

2 To open the Options for Multiple Linear Regression dialog box,


select Multiple Linear Regression from the drop-down list in the
toolbar, then click the button, or choose the Statistics menu
Current Test Options... command. The assumption checking
options appear (see Figure 11–15 on page 500).

3 Click the Residuals tab to view the residual options (see Figure 11–
16 on page 502), More Statistics tab to view the confidence
intervals, PRESS Prediction Error, Standardized Coefficients
options (see Figure 11–17 on page 504), and Other Diagnostics to
view the Influence, Variance Inflation Factor, and Power options
(see Figure 11–19 on page 507). Click the Assumption Checking
tab to return to the Normality, Constant Variance, and Durbin-
Watson options.

4 Click a check box to enable or disable a test option. Options


settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 11-499 through
11-508.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 498 for more information).

6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

You can click Help at any time to access SigmaStat’s on-line help system.

Assumption Checking Select the Assumption Checking tab from the options dialog box to view
Options the Normality, Constant Variance, and Durbin-Watson options. These
options test your data for its suitability for regression analysis by


checking three assumptions that a multiple linear regression makes


about the data. A Multiple Linear Regression assumes:

➤ That the source population is normally distributed about the


regression.
➤ The variance of the dependent variable in the source population is
constant regardless of the value of the independent variable(s).
➤ That the residuals are independent of each other.

All assumption checking options are selected by default. Only disable


these options if you are certain that the data was sampled from normal
populations with constant variance and that the residuals are
independent of each other.

Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test


for a normally distributed population.

FIGURE 11–15
The Options for Multiple
Linear Regression
Dialog Box Displaying
the Assumption
Checking Options

Constant Variance Testing SigmaStat tests for constant variance by


computing the Spearman rank correlation between the absolute values of
the residuals and the observed value of the dependent variable. When
this correlation is significant, the constant variance assumption may be
violated, and you should consider trying a different model (i.e., one that
more closely follows the pattern of the data), or transforming one or
more of the independent variables to stabilize the variance; see Chapter
14, Using Transforms for more information on the appropriate
transform to use.

P Values for Normality and Constant Variance The P value


determines the probability of being incorrect in concluding that the data
is not normally distributed (P value is the risk of falsely rejecting the null


hypothesis that the data is normally distributed). If the P computed by


the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or constant variance,


increase the P value. Because the parametric statistical methods are
relatively robust in terms of detecting violations of the assumptions, the
suggested value in SigmaStat is 0.05. Larger values of P (for example,
0.10) require less evidence to conclude that the residuals are not
normally distributed or the constant variance assumption is violated.

To relax the requirement of normality and/or constant variance,


decrease P. Requiring smaller values of P to reject the normality
assumption means that you are willing to accept greater deviations from
the theoretical normal distribution before you flag the data as non-
normal. For example, a P value of 0.01 for the normality test requires
greater deviations from normality to flag the data as non-normal than a
value of 0.05.

Although the assumption tests are robust in detecting data from populations
that are non-normal or with non-constant variances, there are extreme
conditions of data distribution that these tests cannot detect. However, these
conditions should be easily detected by visually examining the data without
resorting to the automatic assumption tests.

Durbin-Watson Statistic SigmaStat uses the Durbin-Watson statistic


to test residuals for their independence of each other. The Durbin-
Watson statistic is a measure of serial correlation between the residuals.
The residuals are often correlated when the independent variable is time,
and the deviation between the observation and the regression line at one
time are related to the deviation at the previous time. If the residuals are
not correlated, the Durbin-Watson statistic will be 2.

Difference from 2 Value Enter the acceptable deviation from 2.0 that
you consider as evidence of a serial correlation in the Difference for 2.0
box. If the computed Durbin-Watson statistic deviates from 2.0 more
than the entered value, SigmaStat warns you that the residuals may not
be independent. The suggested deviation value is 0.50, i.e., Durbin-
Watson Statistic values greater than 2.5 or less than 1.5 flag the residuals
as correlated.

To require a stricter adherence to independence, decrease the acceptable


difference from 2.0.


To relax the requirement of independence, increase the acceptable


difference from 2.0.

Residuals Select the Residuals tab in the options dialog box to view the Predicted
Values, Raw, Standardized, Studentized, Studentized Deleted, and
Report Flagged Values Only options.

FIGURE 11–16
The Options for Multiple
Linear Regression
Dialog Box Displaying
the Residual Options

Predicted Values Use this option to calculate the predicted value of the
dependent variable for each observed value of the independent
variable(s), then save the results to the data worksheet.

To assign predicted values to a worksheet column, select the worksheet


column you want to save the predicted values to from the corresponding
drop-down list. If you select none and the Predicted Values check box is
selected, the values appear in the report but are not assigned to the
worksheet.

Raw Residuals The raw residuals are the differences between the
predicted and observed values of the dependent variables. To include
raw residuals in the report, make sure this check box is selected.

To assign the raw residuals to a worksheet column, select the number of


the desired column from the corresponding drop-down list. If you select
none from the drop-down list and the Raw check box is selected, the
values appear in the report but are not assigned to the worksheet.

Standardized Residuals The standardized residual is the residual


divided by the standard error of the estimate. The standard error of the
residuals is essentially the standard deviation of the residuals, and is a


measure of variability around the regression line. To include


standardized residuals in the report, make sure this check box is selected.

SigmaStat automatically flags data points lying outside of the confidence


interval specified in the corresponding box. These data points are
considered to have “large” standardized residuals, i.e., outlying data
points. You can change which data points are flagged by editing the
value in the Flag Values > edit box.

Studentized Residuals Studentized residuals scale the standardized


residuals by taking into account the greater precision of the regression
line near the middle of the data versus the extremes. The Studentized
residuals tend to be distributed according to the Student t distribution,
so the t distribution can be used to define “large” values of the
Studentized residuals. SigmaStat automatically flags data points with
“large” values of the Studentized residuals, i.e., outlying data points; the
suggested data points flagged lie outside the 95% confidence interval for
the regression population.

To include studentized residuals in the report, make sure this check box
is selected. Click the selected check box if you do not want to include
studentized residuals in the worksheet.

Studentized Deleted Residuals Studentized deleted residuals are


similar to the Studentized residual, except that the residual values are
obtained by computing the regression equation without using the data
point in question.

To include studentized deleted residuals in the report, make sure this


check box is selected. Click the selected check box if you do not want to
include studentized deleted residuals in the worksheet.

SigmaStat can automatically flag data points with “large” values of the
studentized deleted residual, i.e., outlying data points; the suggested data
points flagged lie outside the 95% confidence interval for the regression
population.

Note that both Studentized and Studentized deleted residuals use the same
confidence interval setting to determine outlying points.

Report Flagged Values Only To include only the flagged
standardized and studentized deleted residuals in the report, make sure
that the Report Flagged Values Only check box is selected. Uncheck this option
to include all standardized and studentized residuals in the report.

Confidence Intervals Select the More Statistics tab in the options dialog box to view the
confidence interval options. You can set the confidence interval for the
population, regression, or both and then save them to the data
worksheet.

FIGURE 11–17
The Options for Multiple
Linear Regression
Dialog Box Displaying
the Confidence
Interval Options

Confidence Interval for the Population The confidence interval for


the population gives the range of values that define the region that
contains the population from which the observations were drawn.

To include confidence intervals for the population in the report, make


sure the Population check box is selected. Click the selected check box if
you do not want to include the confidence intervals for the population
in the report.

Confidence Interval for the Regression The confidence interval for


the regression line gives the range of values that defines the region
containing the true mean relationship between the dependent and
independent variables, with the specified level of confidence.

To include confidence intervals for the regression in the report, make


sure the Regression check box is selected, then specify a confidence level
by entering a value in the percentage box. The confidence level can be
any value from 1 to 99. The suggested confidence level for all intervals
is 95%. Click the selected check box if you do not want to include the
confidence intervals for the regression in the report.



Saving Confidence Intervals to the Worksheet To save the


confidence intervals to the worksheet, select the column number of the
first column you want to save the intervals to from the Starting in
Column drop-down list. The selected intervals are saved to the
worksheet starting with the specified column and continuing with
successive columns in the worksheet.

PRESS Select the More Statistics tab in the options dialog box to view the
Prediction Error PRESS Prediction Error option (see Figure 11–17 on page 504). The
PRESS Prediction Error is a measure of how well the regression equation
fits the data. Leave this check box selected to evaluate the fit of the
equation using the PRESS statistic. Click the selected check box if you
do not want to include the PRESS statistic in the report.

Standardized Click the More Statistics tab in the options dialog box to view the
Coefficients (βi) Standardized Coefficients option (see Figure 11–17 on page 504).
These are the coefficients of the regression equation standardized to
dimensionless values,

    βi = bi (sxi / sy)

where bi = regression coefficient, sxi = standard deviation of the
independent variable xi, and sy = standard deviation of the dependent
variable y.

To include the standardized coefficients in the report, make sure the


Standardized Coefficients check box is selected. Click the selected
check box if you do not want to include the standardized coefficients in
the worksheet.

Influence Options Select the Other Diagnostics tab in the options dialog box to view the
Influence options. Influence options automatically detect instances of
influential data points. Most influential points are data points which are
outliers, that is, they do not “line up” with the rest of the data
points. These points can have a potentially disproportionately strong
influence on the calculation of the regression line. You can use several
influence tests to identify and quantify influential points.


FIGURE 11–18
A Graph with an Influential Outlying Point. The solid line shows the
regression for the data including the outlier, and the dotted line is the
regression computed without the outlying point.

DFFITS DFFITSi is the number of estimated standard errors that the


predicted value changes for the ith data point when it is removed from
the data set. It is another measure of the influence of a data point on the
prediction used to compute the regression coefficients.

Predicted values that change by more than two standard errors when the
data point is removed are considered to be influential.

Check the DFFITS check box to compute this value for all points and
flag influential points, i.e., those with DFFITS greater than the value
specified in the Flag Values > edit box. The suggested value is 2.0
standard errors, which indicates that the point has a strong influence on
the data. To avoid flagging more influential points, increase this value;
to flag less influential points, decrease this value.

Leverage Leverage is used to identify the potential influence of a point


on the results of the regression equation. Leverage depends only on the
value of the independent variable(s). Observations with high leverage
tend to be at the extremes of the independent variables, where small
changes in the independent variables can have large effects on the
predicted values of the dependent variable.

The expected leverage of a data point is (k + 1)/n, where there are k
independent variables and n data points. Observations with leverages
much higher than the expected leverages are potentially influential
points.


Check the Leverage check box to compute the leverage for each point
and automatically flag potentially influential points, i.e., those points
that could have leverages greater than the specified value times the
expected leverage. The suggested value is 2.0 times the expected leverage
for the regression (i.e., 2(k + 1)/n). To avoid flagging more potentially
influential points, increase this value; to flag points with less potential
influence, lower this value.

FIGURE 11–19
The Options for
Multiple Linear
Regression Dialog Box
Displaying the Influence
Variance Inflation Factor,
and Power Options

Cook's Distance Cook's distance is a measure of how great an effect


each point has on the estimates of the parameters in the regression
equation. Cook's distance assesses how much the values of the
regression coefficients change if a point is deleted from the analysis.
Cook's distance depends on both the values of the independent and
dependent variables.
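
A common textbook formula expresses Cook's distance in terms of the raw residual, the leverage, the number of fitted parameters, and the residual mean square; the sketch below uses that formula with hypothetical data and is not necessarily the exact computation SigmaStat performs.

import numpy as np

# hypothetical simple regression data with one suspect point
x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
y = np.array([1.8, 3.1, 3.9, 5.2, 6.1, 6.8, 8.2, 13.5])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                                     # raw residuals
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)        # leverages
mse = e @ e / (n - p)                                # residual mean square

cooks_d = (e**2 / (p * mse)) * h / (1 - h)**2        # Cook's distance for each point
print(np.round(cooks_d, 3))                          # values above 1 deserve a closer look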

Check the Cook's Distance check box to compute this value for all points and flag influential points, i.e., those with a Cook's distance greater than the specified value. The suggested value is 4.0. Cook's distances above 1 indicate that a point is possibly influential; Cook's distances exceeding 4 indicate that the point has a major effect on the values of the parameter estimates. To flag only points with a greater effect, increase this value; to flag points with less effect, lower this value.

Report Flagged Values Only To include only the influential points flagged by the influential point tests in the report, make sure the Report Flagged Values Only check box is selected. Uncheck this option to include all points in the report.

What to Do About Influential Points Influential points have two


possible causes:


➤ There is something wrong with the data point, caused by an error in


observation or data entry.
➤ The model is incorrect.

If a mistake was made in data collection or entry, correct the value. If


you do not know the correct value, you may be able to justify deleting
the data point. If the model appears to be incorrect, try regression with
different independent variables, or a Nonlinear Regression.

For descriptions of how to handle influential points, consult an appropriate statistics reference. For a list of suggested references, see page 12 in the INTRODUCTION chapter.

Power Select the Other Diagnostics tab in the options dialog box to view the Power options (see Figure 11–19 on page 507). The power of a regression is the power to detect the observed relationship in the data. The alpha (α) is the acceptable probability of incorrectly concluding there is a relationship.

Check the Power check box to compute the power for the multiple linear regression data. Change the alpha value by editing the number in the Alpha Value edit box. The suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude there is a significant relationship when P < 0.05.

Smaller values of α result in stricter requirements before concluding there is a significant relationship, but a greater possibility of concluding there is no relationship when one exists. Larger values of α make it easier to conclude that there is a relationship, but also increase the risk of reporting a false positive.

Variance Inflation Factor Select the Other Diagnostics tab in the options dialog box to view the Variance Inflation Factor option (see Figure 11–19 on page 507). Use this option to measure the multicollinearity of the independent variables, or the linear combination of the independent variables in the fit.

Regression procedures assume that the independent variables are


statistically independent of each other, i.e., that the value of one
independent variable does not affect the value of another. However, this
ideal situation rarely occurs in the real world. When the independent


variables are correlated, or contain redundant information, the estimates


of the parameters in the regression model can become unreliable.

FIGURE 11–20
A Graph with Multicollinear Data Points
Note that knowing the value of one of the independent variables allows you to predict the other, so the independent variables are not statistically independent. (Axes: Independent x1, Independent x2, Dependent y.)

The parameters in regression models quantify the theoretically unique


contribution of each independent variable to predicting the dependent
variable. When the independent variables are correlated, they contain
some common information and “contaminate” the estimates of the
parameters. If the multicollinearity is severe, the parameter estimates
can become unreliable.

There are two types of multicollinearity.

Structural Multicollinearity Structural multicollinearity occurs when


the regression equation contains several independent variables which are
functions of each other. The most common form of structural
multicollinearity occurs when a polynomial regression equation contains
several powers of the independent variable. Because these powers (e.g.,
x, x2, etc.) are correlated with each other, structural multicollinearity
occurs. Including interaction terms in a regression equation can also
result in structural multicollinearity.

Sample-Based Multicollinearity Sample-based multicollinearity


occurs when the sample observations are collected in such a way that the
independent variables are correlated (for example, if age, height, and
weight are collected on children of varying ages, each variable has a
correlation with the others).


SigmaStat can automatically detect multicollinear independent variables


using the variance inflation factor. Click the Other Diagnostics tab in
the Options dialog box to view the Variance Inflation Factor option.

Flagging Multicollinear Data Use the value in the Flag Values > edit
box as a threshold for multicollinear variables. The default threshold
value is 4.0, meaning that any value greater than 4.0 will be flagged as
multicollinear. To make this test more sensitive to possible
multicollinearity, decrease this value. To allow greater correlation of the
independent variables before flagging the data as multicollinear, increase
this value.

When the variance inflation factor is large, there are redundant variables
in the regression model, and the parameter estimates may not be reliable.
Variance inflation factor values above 4 suggest possible
multicollinearity; values above 10 indicate serious multicollinearity.
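
The sketch below shows the usual way a variance inflation factor is obtained: regress each independent variable on the remaining ones and compute 1/(1 − R2). The data are hypothetical and the code is only an illustration of the idea, not SigmaStat's implementation.

import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (independent variables only)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ b
        r2 = 1.0 - resid @ resid / np.sum((xj - xj.mean())**2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=50)     # nearly redundant with x1
x3 = rng.normal(size=50)
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))   # x1 and x2 show inflated values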

What to Do About Multicollinearity Sample-based multicollinearity


can sometimes be resolved by collecting more data under other
conditions to break up the correlation among the independent variables.
If this is not possible, the regression equation is overparameterized and
one or more of the independent variables must be dropped to eliminate
the multicollinearity.

Structural multicollinearities can be resolved by centering the


independent variable before forming the power or interaction terms.
Use the Transform menu Center command to center the data; see
Chapter 14, Using Transforms for more information on using
transforms to modify data.
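
The short sketch below (hypothetical data) illustrates why centering helps: the correlation between x and x2 is high when x is far from zero, and drops sharply once x is centered, which is the same subtraction the Center command performs before you form the power term.

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(10, 20, size=40)                 # raw independent variable, far from zero

corr_raw = np.corrcoef(x, x**2)[0, 1]            # x and x^2 are almost perfectly correlated
xc = x - x.mean()                                # centered variable
corr_centered = np.corrcoef(xc, xc**2)[0, 1]     # the structural correlation largely disappears

print(round(corr_raw, 3), round(corr_centered, 3))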

For descriptions of how to handle multicollinearity, consult an appropriate statistics reference. For a list of suggested references, see page 12 in the INTRODUCTION chapter.

Report Flagged Values Only To include only the points flagged by the influential point tests and values exceeding the variance inflation threshold in the report, make sure the Report Flagged Values Only check box is selected. Uncheck this option to include all points in the report.


Running a Multiple Linear Regression

To run a Multiple Linear Regression, you need to select the data to test.
Use the Pick Columns dialog box to select the worksheet columns with
the data you want to test.

To run a Multiple Linear Regression:

1 If you want to select your data before you run the regression, drag
the pointer over your data.

2 Open the Pick Columns dialog box to start the Multiple Linear
Regression. You can either:

➤ Select Multiple Linear Regression from the toolbar drop-down


list, then select the button
➤ Choose the Statistics menu Regression command, then choose
Multiple Linear...
➤ Click the Run Test button from the Options for Multiple Linear
Regression dialog box (see step 5 on page 499)

If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.

3 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet or from the Data for
Dependent or Independent drop-down list.

The first selected column is assigned to the Dependent row in the


Selected Columns list, and all successively selected columns are
assigned to the Independent rows in the list. The titles of the selected


columns appear in each row. You can select up to 64 independent


columns.

FIGURE 11–21
The Pick Columns
for Multiple Linear
Regression Dialog Box

4 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

5 Click Finish to perform the regression. If you elected to test for normality, constant variance, and/or independent residuals, SigmaStat performs the tests for normality (Kolmogorov-Smirnov), constant variance, and independent residuals. If your data fails any of these tests, SigmaStat warns you. When the test
is complete, the report appears displaying the results of the
Multiple Linear Regression (see Figure 11–22 on page 515).

If you selected to place residuals and other test results in the


worksheet, they are placed in the specified column and are labeled
by content and source column.

Interpreting Multiple Linear Regression Results

The report for a Multiple Linear Regression displays the equation with
the computed coefficients, R, R2, and the adjusted R2, a table of
statistical values for the estimate of the dependent variable, and the P
value for the regression equation and for the individual coefficients.

The other results displayed in the report are enabled or disabled in the
Options for Multiple Linear Regression dialog box (see page 498).


Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options command and clear the Explain Results option.

The number of decimal places displayed is also set in the Report


Options dialog box. For more information on setting report options, see
Working with Reports on page 135.

Regression Equation This is the equation with the values of the coefficients in place. This
equation takes the form:

y = b0 + b1x1 + b2x2 + b3x3 + … + bkxk

where y is the dependent variable, x1, x2, x3, ..., xk are the independent
variables, and b0, b1, b2, b3,...,bk are the regression coefficients.

The number of observations N, and the number of observations


containing missing values (if any) that were omitted from the regression,
are also displayed.

R, R2, and Adj R2 R and R2 R, the correlation coefficient, and R2, the coefficient of
determination for multiple regression, are both measures of how well
the regression model describes the data. R values near 1 indicate that the
equation is a good description of the relation between the independent
and dependent variables.

R equals 0 when the values of the independent variables do not allow any prediction of the dependent variable, and equals 1 when you can perfectly predict the dependent variable from the independent variables.

Adjusted R2 The adjusted R2, R2adj, is also a measure of how well the
regression model describes the data, but takes into account the number
of independent variables, which reflects the degrees of freedom. Larger
R2adj values (nearer to 1) indicate that the equation is a good description
of the relation between the independent and dependent variables.
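
For reference, the usual adjustment with n observations and k independent variables is 1 − (1 − R2)(n − 1)/(n − k − 1); the sketch below simply evaluates that textbook formula for hypothetical values.

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k independent variables and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# hypothetical example: R^2 = 0.90 is adjusted downward once 6 predictors and 20 points are accounted for
print(round(adjusted_r2(0.90, n=20, k=6), 3))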

Standard Error of the Estimate (sy|x) The standard error of the estimate sy|x is a measure of the actual variability about the regression plane of the underlying population. The underlying population generally falls within about two standard errors of the estimate of the observed sample.


Statistical Summary Table Coefficients The values for the constant and the coefficients of the independent variables for the regression model are listed.

Standard Error The standard errors of the regression coefficients


(analogous to the standard error of the mean). The true regression
coefficients of the underlying population generally fall within about two
standard errors of the observed sample coefficients. Large standard
errors may indicate multicollinearity.

These values are used to compute t and confidence intervals for the
regression.

Beta (Standardized Coefficient βi) These are the coefficients of the regression equation standardized to dimensionless values:

βi = bi(sxi/sy)

where bi = regression coefficient, sxi = standard deviation of the independent variable xi, and sy = standard deviation of the dependent variable y.

These results are displayed if the Standardized Coefficients option was


selected in the Regression Options dialog box.

t Statistic The t statistic tests the null hypothesis that the coefficient of
the independent variable is zero, that is, the independent variable does
not contribute to predicting the dependent variable. t is the ratio of the
regression coefficient to its standard error, or:

t = regression coefficient / standard error of regression coefficient

You can conclude from “large” t values that the independent variable can
be used to predict the dependent variable (i.e., that the coefficient is not
zero).

P value P is the P value calculated for t. The P value is the probability


of being wrong in concluding that there is a true association between the
variables (i.e., the probability of falsely rejecting the null hypothesis, or


committing a Type I error, based on t). The smaller the P value, the
greater the probability that the variables are correlated.

Traditionally, you can conclude that the independent variable contributes to predicting the dependent variable when P < 0.05.
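
The sketch below evaluates the two quantities just described for a hypothetical coefficient and standard error; the two-sided P value comes from the t distribution with the residual degrees of freedom.

from scipy import stats

b = 1.8           # hypothetical regression coefficient
se = 0.6          # its standard error
df_resid = 25     # residual degrees of freedom

t = b / se
p = 2.0 * stats.t.sf(abs(t), df_resid)      # two-sided P value for the null hypothesis b = 0
print(round(t, 2), round(p, 4))             # P < 0.05 suggests the coefficient is not zero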

VIF (Variance Inflation Factor) The variance inflation factor is a


measure of multicollinearity. It measures the “inflation” of the standard
error of each regression parameter (coefficient) for an independent
variable due to redundant information in other independent variables.

If the variance inflation factor is 1.0, there is no redundant information


in the other independent variables. If the variance inflation factor is
much larger, there are redundant variables in the regression model, and
the parameter estimates may not be reliable.

Variance inflation factor values for independent variables above the


specified value are flagged with a + symbol, indicating multicollinearity
with other independent variables. The suggested value is 4.0.

FIGURE 11–22
An Example
of a Multiple Linear
Regression Report

Analysis of Variance The ANOVA (analysis of variance) table lists the ANOVA statistics for
(ANOVA) Table the regression and the corresponding F value.

SS (Sum of Squares) The sums of squares are measures of the variability of the dependent variable.


➤ The sum of squares due to regression measures the difference of the


regression plane from the mean of the dependent variable
➤ The residual sum of squares is a measure of the size of the residuals,
which are the differences between the observed values of the
dependent variable and the values predicted by the regression model
➤ The total sum of squares is a measure of the overall variability of the
dependent variable about its mean value

DF (Degrees of Freedom) Degrees of freedom represent the number of observations and variables in the regression equation.

➤ The regression degrees of freedom is a measure of the number of


independent variables
➤ The residual degrees of freedom is a measure of the number of
observations less the number of terms in the equation
➤ The total degrees of freedom is a measure of total observations

MS (Mean Square) The mean square provides two estimates of the


population variances. Comparing these variance estimates is the basis of
analysis of variance.

The mean square regression is a measure of the variation of the regression from the mean of the dependent variable, or:

MSreg = (sum of squares due to regression) / (regression degrees of freedom) = SSreg / DFreg

The residual mean square is a measure of the variation of the residuals about the regression plane, or:

MSres = (residual sum of squares) / (residual degrees of freedom) = SSres / DFres

The residual mean square is also equal to the square of the standard error of the estimate, sy|x.


F Statistic The F test statistic gauges the ability of the regression equation, containing all independent variables, to predict the dependent variable. It is the ratio:

F = (regression variation from the dependent variable mean) / (residual variation about the regression) = MSreg / MSres

If F is a large number, you can conclude that the independent variables


contribute to the prediction of the dependent variable (i.e., at least one
of the coefficients is different from zero, and the “unexplained
variability” is smaller than what is expected from random sampling
variability about the mean value of the dependent variable). If the F
ratio is around 1, you can conclude that there is no association between
the variables (i.e., the data is consistent with the null hypothesis that all
the samples are just randomly distributed).
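
Pulling the ANOVA pieces together, the following sketch computes SSreg, SSres, the mean squares, and F from an ordinary least squares fit of hypothetical data; it illustrates the definitions above rather than SigmaStat's internal code.

import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=30)
x2 = rng.normal(size=30)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=30)   # hypothetical data

X = np.column_stack([np.ones(30), x1, x2])
n, k = 30, 2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta

ss_reg = np.sum((yhat - y.mean())**2)        # regression sum of squares
ss_res = np.sum((y - yhat)**2)               # residual sum of squares
ms_reg = ss_reg / k                          # regression mean square (DFreg = k)
ms_res = ss_res / (n - k - 1)                # residual mean square (DFres = n - k - 1)
F = ms_reg / ms_res

print(round(ss_reg, 2), round(ss_res, 2), round(F, 1))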

P Value The P value is the probability of being wrong in concluding


that there is an association between the dependent and independent
variables (i.e., the probability of falsely rejecting the null hypothesis, or
committing a Type I error, based on F). The smaller the P value, the
greater the probability that there is an association.

Traditionally, you can conclude that the independent variable can be used to predict the dependent variable when P < 0.05.

Incremental Sum of Squares SSincr SSincr, the incremental or Type I sum of squares, is a measure of the new predictive information contained in an independent variable, as it is added to the equation.

The incremental sum of squares measures the increase in the regression


sum of squares (and reduction in the sum of squared residuals) obtained
when that independent variable is added to the regression equation, after
all independent variables above it have been entered.

You can gauge the additional contribution of each independent variable


by comparing these values.

The incremental sum of squares is affected by the order of the


independent variables in the regression equation, unless the independent
variables are unrelated (no multicollinearity).


SSmarg SSmarg, the marginal or Type III sum of squares, is a measure


of the unique predictive information contained in an independent
variable, after taking into account all other independent variables. You
can gauge the independent contribution of each independent variable
by comparing these values.

The marginal sum of squares measures the reduction in the sum of


squared residuals obtained by entering the independent variable last,
after all other variables in the equation have been entered.

PRESS Statistic PRESS, the Predicted Residual Error Sum of Squares, is a gauge of
how well a regression model predicts new data. The smaller the PRESS
statistic, the better the predictive ability of the model.

The PRESS statistic is computed by summing the squares of the


prediction errors (the differences between predicted and observed values)
for each observation, with that point deleted from the computation of
the regression equation.
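
A direct, if slow, way to obtain PRESS follows that definition exactly: delete each observation in turn, refit, and sum the squared prediction errors. The sketch below uses hypothetical data; the familiar shortcut PRESS = sum of (ei/(1 − hi))^2 gives the same answer.

import numpy as np

x = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
y = np.array([1.2, 2.1, 2.8, 4.3, 4.9, 6.2, 6.8, 8.1, 9.2, 9.8])   # hypothetical data
X = np.column_stack([np.ones_like(x), x])
n = len(x)

press = 0.0
for i in range(n):
    keep = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)   # refit without observation i
    press += (y[i] - X[i] @ b)**2                            # squared prediction error for point i

print(round(press, 3))    # smaller values indicate better predictive ability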

Durbin-Watson Statistic The Durbin-Watson statistic is a measure of correlation between the


residuals. If the residuals are not correlated, the Durbin-Watson statistic
will be 2; the more this value differs from 2, the greater the likelihood
that the residuals are correlated. This result appears if it was selected in the Regression Options dialog box.

Regression assumes that the residuals are independent of each other; the
Durbin-Watson test is used to check this assumption. If the Durbin-
Watson value deviates from 2 by more than the value set in the
Regression Options dialog box, a warning appears in the report. The
suggested trigger value is a difference of more than 0.50, i.e., the
Durbin-Watson statistic is below 1.50 or above 2.50.
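
The statistic itself is easy to compute from the residuals taken in worksheet order; the sketch below uses hypothetical residuals that drift slowly, which is exactly the pattern that pulls the value away from 2.

import numpy as np

def durbin_watson(residuals):
    """DW = (sum of squared successive differences) / (sum of squared residuals)."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e)**2) / np.sum(e**2)

e = np.array([0.5, 0.4, 0.6, 0.3, -0.2, -0.4, -0.5, -0.3, 0.1, 0.4])   # hypothetical residuals
print(round(durbin_watson(e), 2))    # values below 1.50 or above 2.50 would trigger the warning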

Normality Test Normality test result displays whether the data passed or failed the test of
the assumption that the source population is normally distributed
around the regression, and the P value calculated by the test. All
regressions require a source population to be normally distributed about
the regression line. When this assumption may be violated, a warning
appears in the report. This result appears unless you disabled normality
testing in the Regression Options dialog box (see page 499).

Failure of the normality test can indicate the presence of outlying


influential points or an incorrect regression model.


Constant Variance Test The constant variance test result displays whether or not the data passed or failed the test of the assumption that the variance of the dependent variable in the source population is constant regardless of the value of the independent variable, and the P value calculated by the test. When the constant variance assumption may be violated, a warning appears in the report.

If you receive this warning, you should consider trying a different model
(i.e., one that more closely follows the pattern of the data), or
transforming the independent variable to stabilize the variance and
obtain more accurate estimates of the parameters in the regression
equation. See Chapter 14, Using Transforms, for more information on the appropriate transform to use.

Power This result is displayed if you selected this option in the Options for
Multiple Linear Regression dialog box.

The power, or sensitivity, of a regression is the probability that the


regression model can detect the observed relationship among the
variables, if there is a relationship in the underlying population.

Regression power is affected by the number of observations, the chance of erroneously reporting a difference α (alpha), and the slope of the regression.

Alpha (α) Alpha (α) is the acceptable probability of incorrectly concluding that the model is correct. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no association when this hypothesis is true).

The α value is set in the Power Options dialog box; the suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding the model is correct, but a greater possibility of concluding the model is bad when it is really correct (a Type II error). Larger values of α make it easier to conclude that the model is correct, but also increase the risk of accepting a bad model (a Type I error).

Regression Diagnostics The regression diagnostic results display only the values for the predicted
values, residuals, and other diagnostic results selected in the Options for
Multiple Linear Regression dialog box (see page 498). All results that
qualify as outlying values are flagged with a # symbol. The trigger values


to flag residuals as outliers are set in the Options for Multiple Linear
Regression dialog box.

If you selected Report Cases with Outliers Only, only those observations
that have one or more residuals flagged as outliers are reported; however,
all other results for that observation are also displayed.

Row This is the row number of the observation.

Predicted Values This is the value for the dependent variable predicted
by the regression model for each observation.

Residuals These are the raw residuals, the difference between the
predicted and observed values for the dependent variables.

Standardized Residuals The standardized residual is the raw residual divided by the standard error of the estimate sy|x.

If the residuals are normally distributed about the regression, about 66% of the standardized residuals have values between −1 and 1, and about 95% of the standardized residuals have values between −2 and 2. A larger standardized residual indicates that the point is far from the regression; the suggested value flagged as an outlier is 2.5.

Studentized Residuals The Studentized residual is a standardized


residual that also takes into account the greater confidence of the
predicted values of the dependent variable in the “middle” of the data
set. By weighting the values of the residuals of the extreme data points
(those with the lowest and highest independent variable values), the
Studentized residual is more sensitive than the standardized residual in
detecting outliers.

Both Studentized and Studentized deleted residuals that lie outside a


specified confidence interval for the regression are flagged as outlying
points; the suggested confidence value is 95%.

This residual is also known as the internally Studentized residual,


because the standard error of the estimate is computed using all data.

Studentized Deleted Residual The Studentized deleted residual, or externally Studentized residual, is a Studentized residual which uses the standard error of the estimate sy(−i), computed after deleting the data point associated with the residual. This reflects the greater effect of


outlying points by deleting the data point from the variance


computation.

Both Studentized and Studentized deleted residuals that lie outside a


specified confidence interval for the regression are flagged as outlying
points; the suggested confidence value is 95%.

The Studentized deleted residual is more sensitive than the Studentized


residual in detecting outliers, since the Studentized deleted residual
results in much larger values for outliers than the Studentized residual.

Influence Diagnostics The influence diagnostic results display only the values for the results
selected in the Options dialog box under the Other Diagnostics tab (see
page 505). All results that qualify as outlying values are flagged with a #
symbol. The trigger values to flag data points as outliers are also set in
Options dialog box under the Other Diagnostics tab.

If you selected Report Cases with Outliers Only, only observations that have one or more values flagged as outliers are reported; however, all other results for that observation are also displayed.

Row This is the row number of the observation.

Cook's Distance Cook's distance is a measure of how great an effect


each point has on the estimates of the parameters in the regression
equation. It is a measure of how much the values of the regression
coefficients would change if that point is deleted from the analysis.

Values above 1 indicate that a point is possibly influential. Cook's


distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. Points with Cook's distances greater
than the specified value are flagged as influential; the suggested value is
4.

Leverage Leverage values identify potentially influential points.


Observations with leverages a specified factor greater than the expected
leverages are flagged as potentially influential points; the suggested value
is 2.0 times the expected leverage.

The expected leverage of a data point is (k + 1)/n, where there are k independent variables and n data points.

Because leverage is calculated using only the independent variables, high leverage points tend to be at the extremes of the independent variables

(large and small values), where small changes in the independent


variables can have large effects on the predicted values of the dependent
variable.

DFFITS The DFFITS statistic is a measure of the influence of a data


point on regression prediction. It is the number of estimated standard
errors the predicted value for a data point changes when the observed
value is removed from the data set before computing the regression
coefficients.

Predicted values that change by more than the specified number of


standard errors when the data point is removed are flagged as influential;
the suggested value is 2.0 standard errors.

Confidence Intervals These results are displayed if you selected them in the Options for Multiple Linear Regression dialog box. If the confidence interval does not include zero, you can conclude that the coefficient is different from zero with the level of confidence specified. This can also be described as P < α (alpha), where α is the acceptable probability of incorrectly concluding that the coefficient is different from zero, and the confidence interval is 100(1 − α)%.

The specified confidence level can be any value from 1 to 99; the
suggested confidence level for both intervals is 95%.

Row This is the row number of the observation.

Predicted This is the value for the dependent variable predicted by the
regression model for each observation.

Regression The confidence interval for the regression gives the range of
variable values computed for the region containing the true relationship
between the dependent and independent variables, for the specified level
of confidence.

Population The confidence interval for the population gives the range
of variable values computed for the region containing the population
from which the observations were drawn, for the specified level of
confidence.


Multiple Linear Regression Report Graphs

You can generate up to six graphs using the results from a Multiple
Linear Regression. They include a:

➤ Histogram of the residuals.


➤ Scatter plot of the residuals.
➤ Bar chart of the standardized residuals.
➤ Normal probability plot of the residuals.
➤ Line/scatter plot of the regression with one independent variable
and confidence and prediction intervals.
➤ 3D scatter plot of the residuals.

Histogram of Residuals The Multiple Linear Regression histogram plots the raw residuals in a specified range, using a defined interval set. The residuals are divided into a number of evenly incremented histogram intervals and plotted as histogram bars indicating the number of residuals in each interval. The X axis represents the histogram intervals, and the Y axis represents the number of residuals in each group. For an example of a histogram, see page 153 in the CREATING AND MODIFYING GRAPHS chapter.

Scatter Plot of the Residuals The Multiple Linear Regression scatter plot of the residuals plots the residuals of the data in the selected independent variable column as points relative to the standard deviations. The X axis represents the independent variable values, the Y axis represents the residuals of the variables, and the horizontal lines running across the graph represent the standard deviations of the data. For an example of a scatter plot of the residuals, see page 152 in the CREATING AND MODIFYING GRAPHS chapter.

Bar Chart of the Standardized Residuals The Multiple Linear Regression bar chart of the standardized residuals plots the standardized residuals of the data in the selected independent variable column relative to the standard deviations. The X axis represents the selected independent variable values, the Y axis represents the residuals of the variables, and the horizontal lines running across the graph represent the standard deviations of the data. For an example of a bar chart of the residuals, see page 153 in the CREATING AND MODIFYING GRAPHS chapter.

Normal Probability Plot The Multiple Linear Regression probability plot graphs standardized residuals versus their cumulative frequencies along a probability scale.


The residuals are sorted and then plotted as points around a curve representing the area of the Gaussian. Plots with residuals that fall along the Gaussian curve indicate that your data was taken from a normally distributed population. The X axis is a linear scale representing the residual values. The Y axis is a probability scale representing the cumulative frequency of the residuals. For an example of a normal probability plot, see page 155 in the CREATING AND MODIFYING GRAPHS chapter.

Line/Scatter Plot of the Regression with Prediction and Confidence Intervals The Multiple Linear Regression line/scatter graph plots the observations of the linear regression for the data of the selected independent variable column as a line/scatter plot. The points represent the dependent variable data plotted against the selected independent variable data, the solid line running through the points represents the regression line, and the dashed lines represent the prediction and confidence intervals. The X axis represents the independent variables and the Y axis represents the dependent variables. For an example of a line/scatter plot of the regression, see page 156 in the CREATING AND MODIFYING GRAPHS chapter.

3D Residual Scatter Plot The Multiple Linear Regression 3D residual scatter plot graphs the residuals of the two selected columns of independent variable data. The X and Y axes represent the independent variables, and the Z axis represents the residuals. For an example of a 3D scatter plot of the residuals, see page 156 in the CREATING AND MODIFYING GRAPHS chapter.


Creating Multiple Linear Regression Report Graphs To generate a report graph of Multiple Linear Regression data:

1 Click the toolbar button, or choose the Graph menu Create Graph command when the Multiple Linear Regression report is selected. The Create Graph dialog box appears displaying the types of graphs available for the Multiple Linear Regression results.

2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
For more information on each of the graph types, see pages 11-493
through 11-494.

FIGURE 11–23
The Create Graph Dialog
Box
for the Multiple Linear
Regression Report

If you select Scatter Plot Residuals, Bar Chart Std Residuals, or Regression, Conf. & Pred., a dialog box appears prompting you to select the column with the independent variables you want to use in the graph.

FIGURE 11–24
The Dialog Box Prompting
You
to Select the Independent
Variable Column to Plot

If you select 3D Scatter & Mesh, or 3D Residual Scatter, and you


have more than two columns of independent variables, a dialog
box appears prompting you to select the two columns with the
independent variables you want to plot.


3 Select the columns with the independent variables you want to use
in the graph, then select OK. The graph appears using the
specified independent variables.

FIGURE 11–25
A 3D Scatter Plot of Multiple Linear Regression Residuals

For information on manipulating graphs, see pages 8-181 through


8-201 in the CREATING AND MODIFYING GRAPHS chapter.



12
Multiple Logistic Regression

Use a Multiple Logistic Regression when you want to predict a


qualitative dependent variable, such as the presence or absence of a
disease, from observations of one or more independent variables, by
fitting a logistic function to the data.

The independent variables are the known, or predictor, variables.


When the independent variables are varied, they produce a
corresponding value for the dependent, or response, variable.
SigmaStat’s Logistic Regression requires that the dependent variable be
dichotomous or take two possible responses (dead or alive, black or
white) represented by values of 0 and 1.

If your dependent variable data does not use dichotomous values, use a
Simple Linear Regression if you have one independent variable and a
Multiple Linear Regression if you have more than one independent
variable.

About the Multiple Logistic Regression Multiple Logistic Regression assumes an association between the dependent and k independent variables that fits the general equation for a multidimensional plane:
P(y = 1) = 1 / (1 + e^-(b0 + b1x1 + b2x2 + … + bkxk))

where y is the dependent variable, P(y = 1) is the predicted probability that the dependent variable is a positive response (has a value of 1), b0 through bk are the regression coefficients, and x1 through xk are the independent variables.

As the values xi vary, the corresponding estimated probability that y = 1 increases or decreases, depending on the sign of the associated regression coefficient bi.

Multiple Logistic Regression finds the set of values of the regression


coefficients most likely to predict the observed values of the dependent
variable, given the observed values of the independent variables.
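
The sketch below shows how a fitted logistic equation converts a set of independent variable values into a predicted probability. The coefficients and variable values are hypothetical, not SigmaStat output, and the exponent is written with the conventional minus sign.

import numpy as np

def logistic_probability(b, x):
    """P(y = 1) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    linear = b[0] + np.dot(b[1:], x)
    return 1.0 / (1.0 + np.exp(-linear))

b = np.array([-4.0, 0.08, 1.2])                         # hypothetical b0, b1, b2
print(round(logistic_probability(b, [35.0, 1.0]), 3))   # e.g., x1 = 35, x2 = 1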



Performing a Multiple Logistic Regression To perform a Multiple Logistic Regression:

1 Enter or arrange your data appropriately in the worksheet (see the following section).

2 If desired, set the Logistic Regression options using the Options


for Multiple Logistic Regression dialog box (page 530).

3 Select Multiple Logistic Regression from the toolbar drop-down list, then select the button, or choose the Statistics menu Regression command, then choose Multiple Logistic.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 541).

5 View and interpret the Multiple Logistic Regression report


(page 544).

Arranging Multiple Logistic Regression Data

Logistic Regression data can be entered into the worksheet in raw or


grouped data format. For both formats you must have one column of
dependent variable data and one or more columns of independent
variable data. You must enter dependent variable data as dichotomous
data and independent variable data must be entered in numerical
format.

If your dependent variable data is continuous numerical or text data, or if you are using categorical independent variables, you must convert the data into an equivalent set of dummy variables using reference coding.

For more information on using reference coding to convert independent


variables to dummy variables see Creating Dummy (Indicator) Variables
on page 765.

Observations containing missing values are ignored, and all columns


must be equal in length.



Raw Data To enter data in raw format, place the data for the observed dependent
variable in one column and the data for the corresponding independent
variables in one or more columns.

FIGURE 12–28
Valid Raw Data Format
for a Multiple Logistic
Regression
Column 1 is the dependent
variable column and
columns 2 through 6
are the independent
variable columns.

Grouped Data The grouped data format enables you to specify the number of instances
a combination of dependent and independent variables appear in a data
set. This data format is useful if you have several instances of the same
variable combination, and you don’t want to enter every instance in the
worksheet.

To enter data in grouped format, place the data for the observed
dependent variable in one column and the data for the corresponding
independent variables in one or more columns. Only enter one instance
of each different combination of dependent and independent variables,
then specify the number of times the combination appears in the data set
in the corresponding row of another worksheet column.

For example, if there are three instances of the dependent variable 0 with corresponding independent variables of 26 and 142, place 0 in the dependent variable column, 26 and 142 in the corresponding rows of the independent variable columns, and 3 in the corresponding row of the count worksheet column.



FIGURE 12–29
Valid Grouped Data Format for a Multiple Logistic Regression
Column 1 is the dependent variable; column 2 is the independent variable; column 3 is the number of times the dependent and independent variable combination appears.

Selecting Data Columns When running a Multiple Logistic Regression, you can either:

➤ Select the columns to test from the worksheet before choosing the test, or
➤ Select the columns while performing the test

Setting Multiple Logistic Regression Options

Use the Multiple Logistic Regression options to:

➤ Set options used to determine how well the logistic regression equation fits the data.
➤ Estimate the variance inflation factors for the regression coefficients.
➤ Specify the residuals to display and save them to the worksheet.
➤ Calculate the standard error coefficient, Wald statistic, odds ratio,
odds ratio confidence, and coefficients P value.
➤ Specify tests to identify outlying or influential data points.

To change Multiple Logistic Regression options:

1 If you are going to run the test after changing test options and
want to select your data before you run the test, drag the pointer
over the data.

2 To open the Options for Multiple Logistic Regression dialog box,


select Multiple Logistic Regression from the toolbar drop-down



list, then click the button, or choose the Statistics menu
Current Test Options... command.

3 Click the More Statistics tab to view the Standard Error


Coefficients, Wald Statistic, Odds Ratio, Odds Ratio Confidence,
and Coefficients P Values, Predicted Values, and Variance Inflation
Factor options (see Figure 12–31 on page 533); click the Residuals
tab to view the residual and influence options (see Figure 12–32
on page 538). Click the Criterion tab to return to the criterion options.

4 Click a check box to enable or disable a test option. Option


settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 12-531 through
12-539.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 541 for more information).

6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

You can select Help at any time to access SigmaStat’s on-line help system.

Criterion Options Select the Criterion tab in the Options dialog box to set the criterion
options. Use these options to specify the criterion you want to use to
test how well your data fits the logistic regression equation.

Hosmer-Lemeshow Statistic The Hosmer-Lemeshow statistic tests the null hypothesis that the logistic equation fits the data by comparing the number of individuals with each outcome with the number expected based on the logistic equation. Small P values indicate that you can reject the null hypothesis that the logistic equation fits the data and should try an equation with different independent variables. Large P values indicate a good fit between the logistic equation and the data. The default value is 0.2. Setting the P value to larger values requires smaller deviations between the values predicted by the logistic equation and the observed values of the dependent variable to accept the equation



as a good fit to the data. To change the P value, type a new value in the
edit box.

FIGURE 12–30
The Options for
Multiple Logistic
Regression Dialog Box
Displaying the
Criterion Options

Pearson Chi-Square Statistic The Pearson Chi-Square statistic tests


how well the logistic regression equation fits your data by summing the
squares of the Pearson residuals. Small values of the Pearson Chi-Square
statistic indicate a good agreement between the logistic regression
equation and the data. Large values of the Pearson Chi-Square indicate
a poor agreement.

Likelihood Ratio Test Statistic The Likelihood Ratio Test statistic


tests how well the logistic regression equation fits your data by summing
the squares of the deviance residuals. It compares your full model
against a model that uses nothing but the mean of the dependent
variable. Small P values indicate a good fit between the logistic
regression equation and your data.

Classification Table The classification table tests the null hypothesis


that the data follow the logistic equation by comparing the number of
individuals with each outcome with the number expected based on the
logistic equation. It summarizes the results of whether the data fits the
logistic equation by cross-classifying the actual dependent response
variables with predicted responses and identifying the number of
different combinations of the independent variables. The predicted
responses are assigned dichotomous variables derived by comparing
estimated logistic probabilities to the probability value specified in the
Threshold Probability for Positive Classification edit box.

If the estimated probability exceeds the specified probability value, the


predicted variable is assigned a positive response (value of 1);



probabilities less than or equal to the specified value are assigned a value
of 0 or a reference value. The default threshold is 0.5. The resulting
contingency table can be analyzed with a Chi-Square test. As with the
Hosmer-Lemeshow statistic, a large P value indicates a good fit between the logistic regression equation and the data. For more information on the classification results, see page 548.
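
The cross-classification itself is simple to reproduce; the sketch below builds the two-by-two table from hypothetical observed responses and fitted probabilities, using the default threshold of 0.5.

import numpy as np

observed = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])                    # hypothetical 0/1 responses
prob = np.array([.82, .30, .65, .40, .20, .55, .90, .10, .75, .45])    # fitted P(y = 1)
threshold = 0.5

predicted = (prob > threshold).astype(int)
table = np.zeros((2, 2), dtype=int)
for obs, pred in zip(observed, predicted):
    table[obs, pred] += 1          # rows = observed outcome, columns = predicted outcome

print(table)                       # the diagonal counts are the correct classifications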

Number of Independent Variable Combinations If the number of


unique combinations of the independent variables is not large compared
to the number of independent variables, your logistic regression results
may be unreliable. To calculate the number of independent variable
combinations and warn if there are not enough combinations as
compared to the independent variables, select the Number of
Independent Variable Combinations check box. If the calculated
independent combination is less than the value in the corresponding edit
box, a dialog box appears warning you that the number of independent
variable combinations are too small, and asks if you want to continue. If
you select Yes, the warning message appears in the report.

Statistics Options Select the More Statistics tab in the Options dialog box to view the
statistics options. These options help determine how well your data fits
the logistic regression equation using maximum likelihood as the
estimation criterion.

Standard Error Coefficients The Standard Error Coefficients are


measures of the precision of the estimates of the regression coefficients.
The true regression coefficients of the underlying population generally
fall within two standard errors of the observed sample coefficients.

FIGURE 12–31
The Options for
Multiple Logistic
Regression Dialog Box
Displaying the
Statistics Options



Wald Statistic The Wald statistic compares the observed value of the estimated coefficient with its associated standard error. It is computed as the ratio:

z = bi^2 / sbi^2

where z is the Wald statistic, bi is the observed value of the estimated coefficient, and sbi is the standard error of the coefficient.

Select the Wald statistic option to include the ratio of the observed
coefficient with the associated standard error in the report. The Wald
statistic can also be used to determine how significant the independent
variables are in predicting the dependent variable. For information on
using the Wald statistic to test whether your data fits the logistic
regression equation, see Wald Statistic on page 548.

Odds Ratio The odds of any event occurring can be defined by

Odds = P / (1 − P)

where P is the probability of the event happening. The odds ratio for an independent variable is computed as

OR = e^βi

where βi is the regression coefficient. The odds ratio is an estimate of the increase (or decrease) in the odds for an outcome if the independent variable value is increased by 1.

Odds Ratio Confidence The odds ratio confidence intervals are defined as

e^(bi ± z(1−α/2) sbi)

where bi is the coefficient, sbi is the standard error of the coefficient, and z(1−α/2) is the point on the axis of the standard normal distribution that corresponds to the desired confidence interval.

The default confidence used is 95%. To change the confidence used, change the percentage in the corresponding edit box.
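
Evaluating the two formulas for a hypothetical coefficient and standard error (for 95% confidence, z(1−α/2) is about 1.96) looks like this; the numbers are illustrative only.

import numpy as np
from scipy import stats

b = 0.45          # hypothetical logistic regression coefficient
se = 0.18         # its standard error
alpha = 0.05      # 95% confidence

odds_ratio = np.exp(b)
z = stats.norm.ppf(1.0 - alpha / 2.0)
lower, upper = np.exp(b - z * se), np.exp(b + z * se)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))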



Coefficients P Value The Coefficients P value determines the probability of being incorrect in concluding that each independent variable has a significant effect on determining the dependent variable. The smaller the P value, the more likely the independent variable actually predicts the dependent variable.

The Wald statistic is used to test whether the coefficients associated


with the independent variables are significantly different from zero. The
significance of independent variables is tested by comparing the
observed value of the coefficients with the associated standard error of
the coefficient. If the observed value of the coefficient is large compared
to the standard error, you can conclude that the coefficients are
significantly different from zero and that the independent variables
contribute significantly to predicting the dependent variables. For more
information on computing the Wald statistic and on including it in your
report, see page 534.

Predicted Values Use this option to calculate the predicted value of the
dependent variable for each observed value of the independent
variable(s), then save the results to the data worksheet.

For logistic regression the predicted values indicate the probability of a


positive response. The predicted responses are assigned dichotomous
variables derived by comparing estimated logistic probabilities to the
probability value specified in the Threshold Probability for Positive
Classification edit box. For more information on the Threshold
Probability for Positive Classification value see Classification Table on
page 548.

To assign predicted values to a worksheet column, select the worksheet


column you want to save the predicted values to from the corresponding
drop-down list. If you select none and the Predicted Values check box is
selected, the values appear in the report but are not assigned to the
worksheet.

Variance Inflation Factor Use this option to measure the


multicollinearity of the independent variables, or the linear
combination of the independent variables in the fit.

Regression procedures assume that the independent variables are


statistically independent of each other, i.e., that the value of one
independent variable does not affect the value of another. However, this
ideal situation rarely occurs in the real world. When the independent



variables are correlated, or contain redundant information, the estimates
of the parameters in the regression model can become unreliable.

The parameters in regression models quantify the theoretically unique


contribution of each independent variable to predicting the dependent
variable. When the independent variables are correlated, they contain
some common information and “contaminate” the estimates of the
parameters. If the multicollinearity is severe, the parameter estimates
can become unreliable.

There are two types of multicollinearity.

➤ Sample-Based Multicollinearity Sample-based multicollinearity


occurs when the sample observations are collected in such a way that
the independent variables are correlated (for example, if age, height,
and weight are collected on children of varying ages, each variable
has a correlation with the others). This is the most common form
of multicollinearity.
➤ Structural Multicollinearity Structural multicollinearity occurs
when the regression equation contains several independent variables
which are functions of each other. An example of this is when a
regression equation contains several powers of the independent
variable. Because these powers (e.g., x, x2, etc.) are correlated with
each other, structural multicollinearity occurs. Including
interaction terms in a regression equation can also result in
structural multicollinearity.

Flagging Multicollinear Data Use the value in the Flag Values > edit
box as a threshold for multicollinear variables. The default threshold
value is 4.0, meaning that any value greater than 4.0 will be flagged as
multicollinear. To make this test more sensitive to possible
multicollinearity, decrease this value. To allow greater correlation of the
independent variables before flagging the data as multicollinear, increase
this value.

When the variance inflation factor is large, there are redundant variables
in the regression model, and the parameter estimates may not be reliable.
Variance inflation factor values above 4 suggest possible
multicollinearity; values above 10 indicate serious multicollinearity.

What to Do About Multicollinearity Sample-based multicollinearity


can sometimes be resolved by collecting more data under other



conditions to break up the correlation among the independent variables.
If this is not possible, the regression equation is overparameterized and
one or more of the independent variables must be dropped to eliminate
the multicollinearity.

Structural multicollinearities can be resolved by centering the


independent variable before forming the power or interaction terms.
Use the Transform menu Center command to center the data; see
Chapter 14, Using Transforms, for more information on using
transforms to modify data.

For descriptions of how to handle multicollinearity, consult an appropriate statistics reference. For a list of suggested references, see page 12.

Report Flagged Values Only To include only the points flagged by the influential point tests and values exceeding the variance inflation threshold in the report, make sure the Report Flagged Values Only check box is selected. Uncheck this option to include all points in the report.

Residuals Select the Residuals tab in the options dialog box to view the Residual
Type, Raw, Standardized, Studentized, Studentized Deleted, and Report
Flagged Values Only options.

Residual Type Residuals are not reported by default. To include


residuals in the report select either Pearson or Deviance from the
Residual Type drop-down list. Select None from the drop-down list if
you don’t want to include residuals in the report.

Deviance residuals are used to calculate the likelihood ratio test statistic
to assess the overall goodness of fit of the logistic regression equation to
the data. The likelihood ratio test statistic is the sum of squared
deviance residuals. The deviance residual for each point is a measure of
how much that point contributes to the likelihood ratio test statistic.
Larger values of the deviance residual indicate a larger difference
between the observed and predicted values of the dependent variable.

Pearson residuals are calculated by dividing the raw residual by the


standard error. The standard error is computed from the probability of a positive response (i.e., y = 1) estimated from the Logistic Regression equation. Pearson residuals are the default residual type used



to calculate the goodness of fit for the logistic regression equation
because the Chi-Square goodness of fit statistic is the sum of squared
Pearson residuals.
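
For reference, the textbook forms of the two residual types are sketched below with hypothetical observed responses and fitted probabilities; this is an illustration of the definitions, not necessarily the exact computation SigmaStat performs.

import numpy as np

y = np.array([1, 0, 1, 0, 1], dtype=float)      # hypothetical observed responses
p = np.array([.8, .3, .6, .1, .4])              # fitted P(y = 1) for each observation

pearson = (y - p) / np.sqrt(p * (1.0 - p))
deviance = np.sign(y - p) * np.sqrt(-2.0 * (y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

print(np.round(pearson, 2))
print(np.round(deviance, 2))    # the likelihood ratio statistic is the sum of these values squared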

FIGURE 12–32
The Options for
Multiple Logistic
Regression Dialog Box
Displaying the
Residuals Options

Raw Residuals The raw residuals are the differences between the
predicted and observed values of the dependent variables. To include
raw residuals in the report, make sure this check box is selected. Click
the selected check box if you do not want to include raw residuals in the
worksheet.

To assign the raw residuals to a worksheet column, select the number of


the desired column from the corresponding drop-down list. If you select
none from the drop-down list and the Raw check box is selected, the
values appear in the report but are not assigned to the worksheet.

Studentized Residuals Studentized residuals take into account the


greater precision of the regression estimates near the middle of the data
versus the extremes. The Studentized residuals tend to be distributed
according to the Student t distribution, so the t distribution can be used
to define “large” values of the Studentized residuals. SigmaStat
automatically flags data points with “large” values of the Studentized
residuals, i.e., outlying data points; the suggested data points flagged lie
outside the 95% confidence interval for the regression population.

To include studentized residuals in the report, make sure this check box
is selected. Click the selected check box if you do not want to include
studentized residuals in the worksheet.

Studentized Deleted Residuals Studentized deleted residuals are


similar to the Studentized residual, except that the residual values are

obtained by computing the regression equation without using the data
point in question.

To include studentized deleted residuals in the report, make sure this


check box is selected. Click the selected check box if you do not want to
include studentized deleted residuals in the worksheet.

SigmaStat can automatically flag data points with “large” values of the
studentized deleted residual, i.e., outlying data points; the suggested data
points flagged lie outside the 95% confidence interval for the regression
population.

( Note that both Studentized and Studentized deleted residuals use the same
confidence interval setting to determine outlying points.

Report Flagged Values Only To include only the flagged


standardized and studentized deleted residuals in the report, make sure
the Report Flagged Values Only check box is selected. Uncheck this
option to include all standardized and studentized residuals in the
report.

Influence Options Select the Residuals tab in the options dialog box to view the Influence
options (see Figure 12–32 on page 538). Influence options
automatically detect instances of influential data points. Most
influential points are data points which are outliers, that is, they do not
“line up” with the rest of the data points. These points can have a
potentially disproportionately strong influence on the calculation of the
regression line. You can use several influence tests to identify and
quantify influential points.

Leverage Leverage is used to identify the potential influence of a point


on the results of the regression equation. Leverage depends only on the
value of the independent variable(s). Observations with high leverage
tend to be at the extremes of the independent variables, where small
changes in the independent variables can have large effects on the
predicted values of the dependent variable.

The expected leverage of a data point is $(k + 1)/n$, where there are k
independent variables and n data points. Observations with leverages
much higher than the expected leverage are potentially influential
points.

FIGURE 12–33
A Graph with an
Influential Outlying Point
The solid line shows the
regression for the data
including the outlier, and
the dotted line is the
regression computed
without the outlying point.

Check the Leverage check box to compute the leverage for each point
and automatically flag potentially influential points, i.e., those points
that could have leverages greater than the specified value times the
expected leverage. The suggested value is 2.0 times the expected leverage
for the regression (i.e., $2(k + 1)/n$). To avoid flagging more potentially
influential points, increase this value; to flag points with less potential
influence, lower this value.
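
The flagging rule itself is easy to reproduce. The sketch below computes ordinary hat-matrix leverages for a hypothetical design matrix and flags points above 2.0 times the expected leverage $(k + 1)/n$; note that SigmaStat's logistic regression leverages come from the fitted model, so this unweighted version only approximates the idea.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))                      # hypothetical independent variables
X[0] = [6.0, 6.0]                                 # one extreme observation

n, k = X.shape
A = np.column_stack([np.ones(n), X])              # design matrix with intercept
H = A @ np.linalg.inv(A.T @ A) @ A.T              # hat matrix
leverage = np.diag(H)

expected = (k + 1) / n                            # expected leverage
flagged = np.where(leverage > 2.0 * expected)[0]  # suggested threshold: 2 x expected
print(expected, flagged)
```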

Cook's Distance Cook's distance is a measure of how great an effect


each point has on the estimates of the parameters in the regression
equation. Cook's distance assesses how much the values of the
regression coefficients change if a point is deleted from the analysis.
Cook's distance depends on both the values of the independent and
dependent variables.

Check the Cook's Distance check box to compute this value for all
points and flag influential points, i.e., those with a Cook's distance
greater than the specified value. The suggested value is 4.0. Cook's
distances above 1 indicate that a point is possibly influential. Cook's
distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. To avoid flagging more influential
points, increase this value; to flag less influential points, lower this value.

Report Flagged Values Only To include only the influential


points flagged by the influential point tests in the report, make sure the
Report Flagged Values Only check box is selected. Uncheck this option
to include all influential points in the report.

Running a Multiple Logistic Regression

To run a Multiple Logistic Regression, you need to select the data to test.
The Pick Columns dialog box is used to select the worksheet columns
with the data you want to test.

To run a Multiple Logistic Regression:

1 If you want to select your data before you run the regression, drag
the pointer over your data.

2 Open the Pick Columns dialog box to start the Multiple Logistic
Regression. You can either:

➤ Select Multiple Logistic Regression from the drop-down list in


the toolbar, then select the button
➤ Choose the Statistics menu Regression command, then choose
Multiple Logistic...
➤ Click the Run Test button from the Options for Multiple
Logistic Regression dialog box (see step 5 on page 531).

The Pick Columns dialog box appears prompting you to specify a


data format.

3 Select the appropriate data format from the Data Format drop-
down list. If every instance of your dependent and independent
variable combination, including repeated combinations, is entered
in the worksheet, select Raw. If the number of repeated dependent
and independent variable combinations is indicated by a value in
a separate column, select Grouped.

FIGURE 12–34
The Pick Columns
for Multiple Logistic
Regression Dialog Box
Prompting You to
Specify a Data Format

For more information on arranging data, see Data Format for
Regression and Correlation on page 468, or Arranging Data for
Regressions on page 71.

4 Select Next to pick the data columns for the test. If you selected
columns before you chose the test, the selected columns appear in
the Selected Columns list.

5 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Dependent, Independent, or Count drop-down
list.

If you selected Raw as your data format, you are prompted for one
dependent column and up to 64 independent columns. If you
selected Grouped as your data format, you are prompted for one
Count column. Select the column with the values indicating the
number of times a dependent and independent variable combination
is repeated as the Count column. The titles of the selected columns
appear in each row.

FIGURE 12–35
The Pick Columns
for Multiple Logistic
Regression Dialog Box
Prompting You to
Select Data Columns

6 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

7 Select Finish to run the regression. If you elected to test for


normality, constant variance, and/or independent residuals,
SigmaStat performs the tests for normality (Kolmogorov-
Smirnov), constant variance, and independent residuals. If your
data fails any of these tests, SigmaStat warns you. When the test

is complete, the report appears displaying the results of the
Multiple Logistic Regression (see Figure 12–36 on page 544).

If you selected to place residuals and other test results in the


worksheet, they are placed in the specified column and are labeled
by content and source column.

Interpreting Multiple Logistic Regression Results

The report for a Multiple Logistic Regression displays the logistic


equation with the computed coefficients, their standard errors, the
number of observations in the test, estimation criterion used to fit the
logistic equation to your data, the worksheet column with the
dependent variable data, the values representing the positive and
reference responses, and the Hosmer-Lemeshow and Chi Square
goodness of fit statistics.

The other results displayed in the report are enabled or disabled in the
Options for Multiple Logistic Regression dialog box (see page 530).

( The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options command and uncheck the Explain Results
option.

The number of decimal places displayed is also set in the Report


Options dialog box. For more information on setting report options, see
Setting Report Options on page 135.
FIGURE 12–36
An Example
of the Multiple
Logistic
Regression Report

Regression Equation The logistic regression equation is:

$P = \dfrac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k)}}$

where P is the probability of a “positive” response (i.e., value of the


dependent variable equal to 1) and x1, x2, x3, ..., xk are the independent
variables and b0, b1, b2, b3, ... bk are the regression coefficients. The
equation can be rewritten by applying the logit transformation to both
sides of this equation.

$\mathrm{logit}\,P = \ln\!\left(\dfrac{P}{1 - P}\right)$
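
A minimal numeric illustration of these two forms of the equation, using hypothetical coefficients and a hypothetical observation (plain numpy, not SigmaStat output):

```python
import numpy as np

b = np.array([-4.0, 0.05, 1.2])      # hypothetical coefficients b0, b1, b2
x = np.array([1.0, 60.0, 1.0])       # constant term plus two independent variables

eta = b @ x                          # linear predictor b0 + b1*x1 + b2*x2
P = 1.0 / (1.0 + np.exp(-eta))       # probability of a positive response

logit_P = np.log(P / (1.0 - P))      # the logit transformation recovers eta
print(P, logit_P, eta)
```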

Number of Observations The number of observations N, and the number of observations


containing missing values (if any) that were omitted from the regression,
are also displayed.

Estimation Criterion Logistic regression uses the maximum likelihood approach to find the
values of the coefficients (bi) in the Logistic Regression Equation that
were most likely to fit the observed data.

( The regression coefficients computed by minimizing the sum of squared
deviance residuals in Multiple Logistic Regression are also the maximum
likelihood estimates.

Dependent Variable This section of the report indicates which values in the dependent
variable column represent the positive response (1) and which value
represents the reference response (0).

Number of Unique This value represents the number of unique combinations of the
Independent Variable independent variables and appears if you have the Number of
Combinations Independent Variable Combinations option in the Options for Logistic
Regression dialog box (see page 530) selected. The number of unique
independent variable combinations is compared to the actual number of
independent variables. If this value is less than the value specified for the
Number of Independent Variable Combinations option (page 533), a
warning message appears in the report that your results may be
unreliable.

Hosmer-Lemeshow The Hosmer-Lemeshow P value indicates how well the logistic regression
P Value equation fits your data by comparing the number of individuals with

each outcome with the number expected based on the logistic equation.
It tests the null hypothesis that the logistic equation describes the data.
Thus, small P values indicate a poor fit of the equation to your data (i.e.,
you reject the null hypothesis of agreement). Large P values indicate a
good fit between the logistic equation and the data. The critical
Hosmer-Lemeshow P value option is set in the Options for Multiple
Logistic Regression dialog box (page 531).

When the dataset is small, goodness of fit measures for the logistic
regression should be interpreted with great caution. All of the P values
are based on a chi-square probability distribution, which is not
recommended for use with small numbers of observations.

Pearson The Pearson Chi-Square statistic is the sum of the squared Pearson
Chi-Square Statistic residuals. It is a measure of the agreement between the observed and
predicted values of the dependent variable using a Chi-Square test
statistic. The Chi-Square test statistic is analogous to the residual sum of
squares in ordinary linear regression. Small values of the Chi-Square
(and corresponding large values of the associated P value) indicate a
good agreement between the logistic regression equation and the data
and large values of Chi-Square (and small values of P) indicate a poor
agreement. The Pearson Chi-Square option is set in the Options for
Multiple Logistic Regression dialog box (page 530).

Likelihood Ratio The Likelihood Ratio Test statistic is derived from the sum of the
Test Statistic squared deviance residuals. It indicates how well the logistic regression
equation fits your data by comparing the likelihood of obtaining
observations if the independent variables had no effect on the dependent
variable with the likelihood of obtaining the observations if the
independent variables had an effect on the dependent variables.

This comparison is computed by running the logistic regression with


and without the independent variables and comparing the results. If the
pattern of observed outcomes is more likely to have occurred when
independent variables affect the outcome than when they do not, a small
P value is reported, indicating a good fit between the
logistic regression equation and your data.

Log Likelihood Statistic The -2 log likelihood statistic is a measure of the goodness of fit between
the actual observations and the predicted probabilities. It is the
summation:
$-2\sum_{i=1}^{n}\left[\,y_i \ln(\hat{p}_i) + (1 - y_i)\ln(1 - \hat{p}_i)\,\right]$

where the $y_i$ and $\hat{p}_i$ are respectively the observed and predicted values of
the dependent variable, and n is the number of observations. Note that
ln(1) is zero and the observed values must be 0 or 1. Thus the closer the
predicted values are to the observed, the closer this sum will be to zero.

The -2 log likelihood is also equal to the sum of the squared deviance
residuals.

The -2 log likelihood (LL) statistic is related to the likelihood ratio (LR):

$LR = LL_0 - LL$

where LL0 is the -2 log likelihood of a regression model having none of


the independent variables, just a constant term. In viewing this
relationship note that both LL0 and LL are positive, and LL must be
closer to zero reflecting a better fit. (At the extremes, LL will be zero
when there is a perfect fit, and LL will equal LL0 when there is no fit
whatsoever). Thus the larger the LR the larger the implied explanatory
power of the independent variables for the given dependent variable.
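
The following sketch illustrates this relationship with hypothetical observed responses and fitted probabilities; the constant-only model is approximated by predicting the overall positive-response rate for every case, and the code is an illustration rather than SigmaStat's procedure.

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # observed responses
p = np.array([0.8, 0.3, 0.6, 0.9, 0.2, 0.4, 0.7, 0.1])   # fitted probabilities (hypothetical)

# -2 log likelihood of the full model
LL = -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Constant-only model: predict the overall positive-response rate for every case
p0 = np.full_like(p, y.mean())
LL0 = -2.0 * np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))

LR = LL0 - LL                                             # likelihood ratio test statistic
print(LL, LL0, LR)
```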

Threshold Probability for The threshold probability value determines whether the response
Positive Classification predicted by the logistic model in the classification and probability
tables (see following sections) is a positive or a reference response. If the
estimated probability in the probability table (see page 532) exceeds the
specified threshold probability value, the predicted variable is assigned a
positive response (value of 1); probabilities less than or equal to the
specified value are assigned a value of 0 or a reference value. The
threshold probability value is set in the options dialog box (see page
532).

Classification Table The classification table summarizes the results by cross-classifying the
observed dependent response variables with predicted responses and
identifying the number of correctly and incorrectly classified cases.

The responses classified by the logistic model are derived by comparing


estimated logistic probabilities in the Probability Table to the specified
threshold probability value (see preceding section).

This table appears in the report if the Classification Table option is


selected in the options dialog box (see page 532).
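
A sketch of how such a cross-classification can be built from hypothetical observed responses and estimated probabilities; it mirrors the idea of the table, not its exact layout in the report.

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # observed responses
p = np.array([0.8, 0.3, 0.6, 0.9, 0.2, 0.4, 0.7, 0.55])   # estimated probabilities (hypothetical)

threshold = 0.5                                            # threshold probability for positive classification
pred = (p > threshold).astype(int)                         # predicted response (1 or 0)

table = np.zeros((2, 2), dtype=int)
for obs, pr in zip(y, pred):
    table[obs, pr] += 1                                    # rows: observed, columns: predicted

print(table, f"{np.trace(table)}/{len(y)} correctly classified")
```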

Probability Table The Probability Table lists the actual responses of the dependent
variable, the estimated logistic probability of a positive response (a value
of 1), and the predicted response of the dependent variables. The
predicted responses are assigned values of 1 (positive response) or 0
(reference response) derived by comparing estimated logistic
probabilities to the specified threshold probability value (see preceding
section).

This table appears in the report if the Predicted Values option is selected
in the options dialog box (see page 535).

Statistical The summary table lists the coefficient, standard error, Wald Statistic,
Summary Table Odds Ratio, Odds Ratio Confidence, P value, and VIF for the
independent variables.

Coefficients The value for the constant and coefficients of the


independent variables for the regression model are listed.

Standard Error The standard errors of the regression coefficients


(analogous to the standard error of the mean). The true regression
coefficients of the underlying population generally fall within about two
standard errors of the observed sample coefficients. Large standard
errors may indicate multicollinearity.

These values are used to compute the Wald statistic and confidence
intervals for the regression coefficients.

Wald Statistic The Wald statistic is the regression coefficient divided by


the standard error. It is computed as the ratio:

$z = \dfrac{b_i}{s_{b_i}}$

where z is the Wald Statistic, $b_i$ is the observed value of the estimated
coefficient, and $s_{b_i}$ is the standard error of the coefficient.

P value P is the P value calculated for the Wald statistic. The P value is
the probability of being wrong in concluding that there is a true
association between the variables. The P value is based on the chi-square
distribution with one degree of freedom. The smaller the P value, the
greater the probability that the independent variables affect the
dependent variable.

Traditionally, you can conclude that the independent variable


contributes to predicting the dependent variable when P < 0.05.
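
As a worked example, the sketch below computes the Wald statistic and its chi-square P value (one degree of freedom) for a hypothetical coefficient and standard error; scipy is used only for the chi-square tail probability.

```python
from scipy.stats import chi2

b_i = 0.42        # hypothetical coefficient
se_b_i = 0.15     # hypothetical standard error

z = b_i / se_b_i                    # Wald statistic
p_value = chi2.sf(z ** 2, df=1)     # P value from the chi-square distribution, 1 degree of freedom
print(z, p_value)
```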

Odds Ratio The odds ratio for an independent variable is computed as

$OR = e^{\beta_i}$

where $\beta_i$ is the regression coefficient. The odds ratio is an estimate of


the increase (or decrease) in the odds for an outcome if the independent
variable value is increased by 1.

Odds Ratio Confidence These two values represent the lower and
upper ends of the confidence interval in which the true odds ratio lies.
The level of confidence (95%) is specified in the options dialog box (see
page 534).
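
A small sketch of these calculations with a hypothetical coefficient and standard error; it assumes the usual Wald-type interval exp(b ± z·SE), which may differ in detail from how SigmaStat computes the reported interval.

```python
import numpy as np
from scipy.stats import norm

b_i, se_b_i = 0.42, 0.15             # hypothetical coefficient and standard error

odds_ratio = np.exp(b_i)             # change in odds for a one-unit increase in x_i
z_crit = norm.ppf(0.975)             # two-sided 95% confidence level
lower = np.exp(b_i - z_crit * se_b_i)
upper = np.exp(b_i + z_crit * se_b_i)
print(odds_ratio, (lower, upper))
```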

VIF (Variance Inflation Factor) The variance inflation factor is a


measure of multicollinearity. It measures the “inflation” of the standard
error of each regression parameter (coefficient) for an independent
variable due to redundant information in other independent variables.

If the variance inflation factor is 1.0, there is no redundant information


in the other independent variables. If the variance inflation factor is
much larger, there are redundant variables in the regression model, and
the parameter estimates may not be reliable.

Variance inflation factor values for independent variables above the


specified value are flagged with a + symbol, indicating multicollinearity
with other independent variables.

The presence of serious multicollinearity indicates that you have too


many redundant independent variables in your regression equation. To
improve the quality of the regression equation, you should delete the

redundant variables. The cutoff value for flagging multicollinearity is set
in the options dialog box (see page 536). The suggested value is 4.0.
For information on what to do about multicollinearity, see page 536.

Residual The residual calculation method indicates how the residuals for the
Calculation Method logistic regression are calculated. You can choose Pearson or Deviance
residuals from the Options for Logistic Regression dialog box. This
choice does not affect the logistic regression itself, which minimizes the
deviance residuals squared, but does affect how the Studentized residuals
are calculated.

The Pearson residual is defined as:


$\dfrac{y_i - \hat{p}_i}{\sqrt{\hat{p}_i\,(1 - \hat{p}_i)}}$

where $y_i$ and $\hat{p}_i$ are respectively the observed and predicted values for the
ith case.

The deviance residual is defined as:


$-\sqrt{2\ln\dfrac{1}{1 - \hat{p}_i}}$  for $y_i = 0$

$+\sqrt{2\ln\dfrac{1}{\hat{p}_i}}$  for $y_i = 1$
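
The sketch below implements the deviance residual definition just given for hypothetical data and checks that the squared deviance residuals sum to the -2 log likelihood described earlier; the signed square-root form shown above is assumed.

```python
import numpy as np

y = np.array([1, 0, 1, 0, 1])
p = np.array([0.9, 0.2, 0.6, 0.4, 0.7])    # predicted probabilities (hypothetical)

# Deviance residuals: negative for y = 0, positive for y = 1
deviance = np.where(y == 1,
                    np.sqrt(2 * np.log(1 / p)),
                    -np.sqrt(2 * np.log(1 / (1 - p))))

# Their squares sum to the -2 log likelihood statistic
neg2LL = -2 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(np.allclose(np.sum(deviance ** 2), neg2LL))   # True
```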

Residuals Table The residuals table displays the raw, Pearson or Deviance, studentized,
and studentized deleted residuals if the associated options are selected in
the options dialog box (see page 537). All residuals that qualify as
outlying values are flagged with a # symbol. The trigger values to flag
residuals as outliers are also set in the Options for Multiple Logistic
Regression dialog box.

If you selected Report Cases with Outliers Only, only those observations
that have one or more residuals flagged as outliers are reported; however,
all other results for that observation are also displayed. The way the
residuals are calculated depend on whether Pearson or Deviance is
selected as the residual type in the options dialog box (see page 537).

Row This is the row number of the observation. Note that if your data
has a case with a value missing, the corresponding row is entirely
omitted from the table of residuals.

Pearson/Deviance Residuals The Residual table displays either
Pearson or Deviance residuals, depending on the Residual Type option
setting in the Options for Logistic Regression dialog box (see page 537).

Both Pearson and Deviance residuals indicate goodness of fit between


the logistic equation and the data, with smaller values indicating a better
fit. These two residual types are calculated differently and affect the way
the studentized residuals in the table are calculated.

Pearson residuals, also known as standardized residuals, are the raw


residuals divided by the standard error. Deviance residuals are a measure
of how much each point contributes to the likelihood function being
minimized as part of the maximum likelihood procedure. For more
information on Pearson and Deviance residuals, see page 551.

Raw Residuals Raw residuals are the difference between the predicted
and observed values for each of the subjects or cases.

Studentized Residuals The Studentized residual is a standardized


residual that also takes into account the greater confidence of the
predicted values of the dependent variable in the “middle” of the data
set.

This residual is also known as the internally Studentized residual,


because the standard error of the estimate is computed using all data.

Studentized Deleted Residual The Studentized deleted residual, or


externally Studentized residual, is a Studentized residual which uses the
standard error, computed after deleting the data point associated with
the residual.

Both Studentized and Studentized deleted residuals that lie outside a


specified confidence interval for the regression are flagged as outlying
points; the suggested confidence value is 95%.

The Studentized deleted residual is more sensitive than the Studentized


residual in detecting outliers, since the Studentized deleted residual
results in much larger values for outliers than the Studentized residual.

Influence Diagnostics The influence diagnostic results display only the values for the results
selected in the Options dialog box under the More Statistics tab (see
page 539). All results that qualify as outlying values are flagged with a #

symbol. The trigger values to flag data points as outliers are also set in
the Options dialog box under the More Statistics tab.

If you selected Report Flagged Values Only, only observations that have
one or more values flagged as outliers are reported; however, all
other results for that observation are also displayed.

Row This is the row number of the observation. Note that if your data
has a case with a value missing, the corresponding row is entirely
omitted from the table of residuals.

Cook's Distance Cook's distance is a measure of how great an effect


each point has on the estimates of the parameters in the regression
equation. It is a measure of how much the values of the regression
coefficients would change if that point is deleted from the analysis.

Values above 1 indicate that a point is possibly influential. Cook's


distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. Points with Cook's distances greater
than the specified value are flagged as influential; the suggested value is
4. The Cook’s Distance value used to flag “large” values is set in the
options dialog box (see page 540).

Leverage Leverage values identify potentially influential points.


Observations with leverages a specified factor greater than the expected
leverages are flagged as potentially influential points; the suggested value
is 2.0 times the expected leverage.
The expected leverage of a data point is $(k + 1)/n$, where there are k
independent variables and n data points.

Because leverage is calculated using only the independent variables, high


leverage points tend to be at the extremes of the independent variables
(large and small values), where small changes in the independent
variables can have large effects on the predicted values of the dependent
variable.

Polynomial Regression

Use Polynomial Regression when you:

➤ Want to predict a trend in the data, or predict the value of one


variable from the value of another variable, by fitting a curve
through the data that does not follow a straight line, and
➤ Know there is only one independent variable

The independent variable is the known, or predictor, variable. When


the independent variable is varied, a corresponding value for the
dependent, or response, variable is produced.

If the relationship between the independent and dependent variables is


first order (a straight line), use Multiple Linear
Regression. If the relationship is not a polynomial (e.g., a log or
exponential function), use Nonlinear Regression.

About the Polynomial Polynomial Regression assumes an association between the independent
Regression and dependent variables that fits the general equation for a polynomial
of order k

$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \cdots + b_k x^k$

where y is the dependent variable, x is the independent variable, and b0,


b1, b2, b3,...,bk are the regression coefficients. As the value for x varies,
the corresponding value varies according to a polynomial function.

The order of the polynomial k is the highest exponent of the


independent variable; a first order polynomial is a straight line, a second
order (quadratic) polynomial is a parabola, etc.
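
As an outside-the-program illustration, the following sketch fits a second order polynomial to hypothetical data with numpy.polyfit; the coefficient array is reversed so it reads b0, b1, ..., bk in the order used by the equation above.

```python
import numpy as np

x = np.linspace(0, 10, 25)                           # independent variable (hypothetical)
rng = np.random.default_rng(3)
y = 2.0 + 1.5 * x - 0.2 * x ** 2 + rng.normal(scale=0.5, size=x.size)

k = 2                                                # order of the polynomial
coeffs = np.polyfit(x, y, deg=k)                     # highest-order coefficient first
b = coeffs[::-1]                                     # reorder to b0, b1, ..., bk
y_hat = np.polyval(coeffs, x)                        # predicted values from the fitted curve
print(b)
```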

Polynomial Regression is a parametric test, that is, for a given


independent variable value, the possible values for the dependent
variable are assumed to be normally distributed and have equal variance.

( If you are fitting a polynomial to data, the polynomial regression procedure


yields more reliable results than simply performing a Multiple Linear
Regression using x, $x^2$, etc. as the independent variables.

Performing a Polynomial To perform a Polynomial Regression:
Regression
1 Enter or arrange your data appropriately in the worksheet
(see following section).

2 If desired, set the polynomial regression options using the Options


for Polynomial Regression dialog box (page 555).

3 Select Polynomial Regression from the toolbar, then select the


button, or choose the Statistics menu Regression command, then
choose Polynomial.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 564).

5 View and interpret the polynomial regression and generate report


graphs (pages 12-566 and 12-574).

Arranging Polynomial Regression Data

Place the data for the dependent variable in one column and the
corresponding data for the observed independent variable in another
column.

Observations containing missing values are ignored, and all columns


must be equal in length.

FIGURE 12–37
Data Format for a
Polynomial Regression

Selecting Data Columns When running a Polynomial Regression, you can either:

➤ Select the columns to test from the worksheet before choosing the
test, or
➤ Select the columns while performing the test

Setting Polynomial Regression Options

Use the Polynomial Regression options to:

➤ Set the polynomial order.


➤ Specify the type of polynomial regression you want to perform
(incremental evaluation or order only).
➤ Set the assumption checking options.
➤ Specify the residuals to display and save them to the worksheet.
➤ Display confidence intervals and save them to the worksheet.
➤ Display the PRESS prediction error and the standardized
coefficients.
➤ Display the power.

To change Polynomial Regression options:

1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.

2 To open the Options for Polynomial Regressions dialog box, select


Polynomial Regression from the drop-down list in the toolbar,
then click the toolbar button, or choose the Statistics menu
Current Test Options... command. The criterion options appear
(see Figure 12–38 on page 556). Select the type of regression to
run from the Regression Type drop-down list.

If you select Incremental Order as the regression type, only the


criterion options are available (see page 556). If you select Order
Only, Criterion, Assumption Checking, Residuals, and More
Statistics options are available.

3 Click the Assumption Checking tab to view the Normality,


Constant Variance, and Durbin-Watson options (see Figure 12–39
on page 558), the Residuals tab to view the residual options (see
Figure 12–40 on page 560), and More Statistics tab to view the
confidence intervals, PRESS Prediction Error, Standardized

Coefficients options (see Figure 12–41 on page 562). Select the
Post Hoc Tests tab to view the Power option (see Figure 12–42 on
page 564). Click the Criterion tab to return to the Polynomial
Order and Regression options.

4 Click a check box to enable or disable a test option. Options


settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 12-531 through
12-539.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 564 for more information).

6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

( You can select Help at any time to access SigmaStat’s on-line help system.

Criterion Options Select the Criterion tab from the options dialog box to view the
Polynomial Order and Regression options. Use these options to specify
the polynomial order to use and the type of polynomial to use to
evaluate your data.

Polynomial Order Select the desired polynomial order from the


Polynomial Order drop-down list. You can also type the desired value
in the drop-down box. This value is used either as the maximum order
to evaluate or the specific order to compute.

FIGURE 12–38
The Options for Polynomial
Regression Dialog Box
Displaying the
Criterion Options

Order Only Select Order Only from the Regression drop-down list to
fit only the order specified in the Polynomial Order edit box to the data.

Incremental Evaluation Select Incremental Evaluation if you need to


find the order of polynomial to use. This option evaluates each
polynomial order equation starting at zero and increasing to the value
specified in the Polynomial Order box.

Note this option does not display all regression results; instead, it is used
to evaluate the order for the best model to use. Once the order is
determined, run an order only polynomial regression to obtain complete
regression results.
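
The logic of incremental evaluation can be sketched as follows: fit each order from zero up to the maximum and compute the incremental F for the term that was just added. The data are hypothetical, and the sketch mirrors the idea of the incremental report rather than its exact layout.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 5, 40)
y = 1.0 + 0.8 * x + 0.6 * x ** 2 + rng.normal(scale=0.4, size=x.size)   # true order is 2

max_order = 4
ss_res_prev = np.sum((y - y.mean()) ** 2)          # order 0: constant-only model
print("order  SSres      F_incr")
print(f"  0    {ss_res_prev:9.3f}      -")
for m in range(1, max_order + 1):
    resid = y - np.polyval(np.polyfit(x, y, deg=m), x)
    ss_res = np.sum(resid ** 2)
    ms_res = ss_res / (x.size - m - 1)             # residual mean square for this order
    f_incr = (ss_res_prev - ss_res) / ms_res       # one term added per step, so DF_incr = 1
    print(f"  {m}    {ss_res:9.3f}  {f_incr:9.2f}")
    ss_res_prev = ss_res
```

With data like these, the largest jump in the incremental F typically appears at order 2, the highest order that markedly improves the fit.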

Assumption Checking Select the Assumption Checking tab from the options dialog box to view
Options the Normality, Constant Variance, and Durbin-Watson options. These
options test your data for its suitability for regression analysis by
checking three assumptions that a polynomial regression makes about
the data. A polynomial regression assumes:

➤ That the source population is normally distributed about the


regression
➤ The variance of the dependent variable in the source population is
constant regardless of the value of the independent variable(s).
➤ That the residuals are independent of each other.

All assumption checking options are selected by default. Only disable


these options if you are certain that the data was sampled from normal
populations with constant variance and that the residuals are
independent of each other.

Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test


for a normally distributed population.

Constant Variance Testing SigmaStat tests for constant variance by


computing the Spearman rank correlation between the absolute values of
the residuals and the observed value of the dependent variable. When
this correlation is significant, the constant variance assumption may be
violated, and you should consider trying a different model (i.e., one that
more closely follows the pattern of the data), or transforming one or
more of the independent variables to stabilize the variance; see Chapter
14, Using Transforms for more information on the appropriate
transform to use.
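
The sketch below illustrates this check with hypothetical residuals whose spread grows with the dependent variable, using scipy's Spearman rank correlation; it is a sketch of the idea, not SigmaStat's exact procedure.

```python
import numpy as np
from scipy.stats import spearmanr

y = np.array([2.1, 3.0, 4.2, 5.1, 6.3, 7.0, 8.4, 9.1])          # observed dependent variable (hypothetical)
resid = np.array([0.1, -0.2, 0.3, -0.4, 0.6, -0.7, 0.9, -1.1])  # residuals that grow with y

rho, p = spearmanr(np.abs(resid), y)      # rank correlation of |residuals| vs. observed values
if p < 0.05:
    print(f"constant variance assumption may be violated (rho={rho:.2f}, P={p:.3f})")
else:
    print(f"no evidence against constant variance (rho={rho:.2f}, P={p:.3f})")
```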

P Values for Normality and Constant Variance The P value
determines the probability of being incorrect in concluding that the data
is not normally distributed (P value is the risk of falsely rejecting the null
hypothesis that the data is normally distributed). If the P computed by
the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or constant variance,


increase the P value. Because the parametric statistical methods are
relatively robust in terms of detecting violations of the assumptions, the
suggested value in SigmaStat is 0.05. Larger values of P (for example,
0.10) require less evidence to conclude that the residuals are not
normally distributed or the constant variance assumption is violated.

To relax the requirement of normality and/or constant variance,


decrease P. Requiring smaller values of P to reject the normality
assumption means that you are willing to accept greater deviations from
the theoretical normal distribution before you flag the data as non-
normal. For example, a P value of 0.01 for the normality test requires
greater deviations from normality to flag the data as non-normal than a
value of 0.05.

( Although the assumption tests are robust in detecting data from populations
that are non-normal or with non-constant variances, there are extreme
conditions of data distribution that these tests cannot detect. However, these
conditions should be easily detected by visually examining the data without
resorting to the automatic assumption tests.
FIGURE 12–39
The Options for Polynomial
Regression Dialog Box
Displaying the Assumption
Checking Options

Durbin-Watson Statistic SigmaStat uses the Durbin-Watson statistic


to test residuals for their independence of each other. The Durbin-
Watson statistic is a measure of serial correlation between the residuals.
The residuals are often correlated when the independent variable is time,

and the deviation between the observation and the regression line at one
time is related to the deviation at the previous time. If the residuals are
not correlated, the Durbin-Watson statistic will be 2.

Difference from 2 Value Enter the acceptable deviation from 2.0 that
you consider as evidence of a serial correlation in the Difference from 2.0
box. If the computed Durbin-Watson statistic deviates from 2.0 more
than the entered value, SigmaStat warns you that the residuals may not
be independent. The suggested deviation value is 0.50, i.e., Durbin-
Watson Statistic values greater than 2.5 or less than 1.5 flag the residuals
as correlated.

To require a stricter adherence to independence, decrease the acceptable


difference from 2.0.

To relax the requirement of independence, increase the acceptable


difference from 2.0.
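
For reference, the Durbin-Watson statistic can be computed directly from the residuals as the ratio of the sum of squared successive differences to the sum of squared residuals; the sketch below uses hypothetical residuals and the suggested 0.50 deviation rule.

```python
import numpy as np

resid = np.array([0.3, -0.1, 0.4, 0.2, -0.5, -0.3, 0.1, 0.6, -0.2, 0.0])  # hypothetical residuals

dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)   # Durbin-Watson statistic
if abs(dw - 2.0) > 0.5:                                  # suggested deviation threshold
    print(f"DW = {dw:.2f}: residuals may not be independent")
else:
    print(f"DW = {dw:.2f}: no evidence of serial correlation")
```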

Residuals Select the Residuals tab in the options dialog box to view the Predicted
Values, Raw, Standardized, Studentized, Studentized Deleted, and
Report Flagged Values Only options.

Predicted Values Use this option to calculate the predicted value of the
dependent variable for each observed value of the independent
variable(s), then save the results to the worksheet. Click the selected
check box if you do not want to include the predicted values in the worksheet.

To assign predicted values to a worksheet column, select the worksheet


column you want to save the predicted values to from the corresponding
drop-down list. If you select none and the Predicted Values check box is

selected, the values appear in the report but are not assigned to the
worksheet.

FIGURE 12–40
The Options for Polynomial
Regression Dialog Box
Displaying the
Residual Options

Raw Residuals The raw residuals are the differences between the
predicted and observed values of the dependent variables. To include
raw residuals in the report, make sure this check box is selected. Click
the selected check box if you do not want to include raw residuals in the
worksheet.

To assign the raw residuals to a worksheet column, select the number of


the desired column from the corresponding drop-down list. If you select
none from the drop-down list and the Raw check box is selected, the
values appear in the report but are not assigned to the worksheet.

Standardized Residuals The standardized residual is the residual


divided by the standard error of the estimate. The standard error of the
residuals is essentially the standard deviation of the residuals, and is a
measure of variability around the regression line. To include
standardized residuals in the report, make sure this check box is selected.
Click the selected check box if you do not want to include standardized
residuals in the worksheet.

To assign the standardized residuals to a worksheet column, select the


number of the desired column from the corresponding drop-down list.
If you select none from the drop-down list and the Standardized check
box is selected, the values appear in the report but are not assigned to the
worksheet.

SigmaStat automatically flags data points lying outside of the confidence


interval specified in the corresponding box. These data points are

considered to have “large” standardized residuals, i.e., outlying data
points. You can change which data points are flagged by editing the
value in the Flag Values > edit box. The suggested interval is 95%.

Studentized Residuals Studentized residuals scale the standardized


residuals by taking into account the greater precision of the regression
line near the middle of the data versus the extremes. The Studentized
residuals tend to be distributed according to the Student t distribution,
so the t distribution can be used to define “large” values of the
Studentized residuals. SigmaStat automatically flags data points with
“large” values of the Studentized residuals, i.e., outlying data points; the
suggested data points flagged lie outside the 95% confidence interval for
the regression population.

To include studentized residuals in the report, make sure this check box
is selected. Click the selected check box if you do not want to include
studentized residuals in the worksheet.

Studentized Deleted Residuals Studentized deleted residuals are


similar to the Studentized residual, except that the residual values are
obtained by computing the regression equation without using the data
point in question.

To include studentized deleted residuals in the report, make sure this


check box is selected. Click the selected check box if you do not want to
include studentized deleted residuals in the worksheet.

SigmaStat can automatically flag data points with “large” values of the
studentized deleted residual, i.e., outlying data points; the suggested data
points flagged lie outside the 95% confidence interval for the regression
population.

( Note that both Studentized and Studentized deleted residuals use the same
confidence interval setting to determine outlying points.

Report Flagged Values Only To include only the flagged


standardized and studentized deleted residuals in the report, make sure
the Report Flagged Values Only check box is selected. Uncheck this
option to include all standardized and studentized residuals in the
report.

Confidence Intervals Select the More Statistics tab in the options dialog box to view the
confidence interval options. You can set the confidence interval for the
population, regression, or both and then save them to the worksheet.

Confidence Interval for the Population The confidence interval for


the population gives the range of values that define the region that
contains the population from which the observations were drawn.

FIGURE 12–41
The Options for Polynomial
Regression Dialog Box
Displaying the
Confidence intervals,
PRESS Prediction Error,
and Standardized
Coefficients Options

To include confidence intervals for the population in the report, make


sure the Population check box is selected. Click the selected check box if
you do not want to include the confidence intervals for the population
in the report.

Confidence Interval for the Regression The confidence interval for


the regression line gives the range of values that defines the region
containing the true mean relationship between the dependent and
independent variables, with the specified level of confidence.

To include confidence intervals for the regression in the report, make


sure the Regression check box is selected, then specify a confidence level
by entering a value in the percentage box. The confidence level can be
any value from 1 to 99. The suggested confidence level for all intervals
is 95%. Click the selected check box if you do not want to include the
confidence intervals for the regression in the report.

Saving Confidence Intervals to the Worksheet To save the


confidence intervals to the worksheet, select the column number of the
first column you want to save the intervals to from the Starting in

Column drop-down list. The selected intervals are saved to the
worksheet starting with the specified column and continuing with
successive columns in the worksheet.

PRESS Select the More Statistics tab in the options dialog box to view the
Prediction Error PRESS Prediction Error option (see Figure 12–42 on page 564). The
PRESS Prediction Error is a measure of how well the regression equation
fits the data. Leave this check box selected to evaluate the fit of the
equation using the PRESS statistic. Click the selected check box if you
do not want to include the PRESS statistic in the report.

Standardized Click the More Statistics tab in the options dialog box to view the
Coefficients ($\beta_i$) Standardized Coefficients option (see Figure 12–41 on page 562).
These are the coefficients of the regression equation standardized to
dimensionless values,
$\beta_i = b_i\,\dfrac{s_{x_i}}{s_y}$

where $b_i$ = regression coefficient, $s_{x_i}$ = standard deviation of the
independent variable $x_i$, and $s_y$ = standard deviation of the dependent
variable y.

To include the standardized coefficients in the report, make sure the


Standardized Coefficients check box is selected. Click the selected
check box if you do not want to include the standardized coefficients in
the worksheet.
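
A quick numeric illustration of the formula with hypothetical coefficient and standard deviation values (not taken from any SigmaStat report):

```python
import numpy as np

b = np.array([1.5, -0.2])     # hypothetical regression coefficients b1, b2
sx = np.array([3.1, 12.4])    # standard deviations of the independent variables
sy = 5.8                      # standard deviation of the dependent variable

beta = b * sx / sy            # dimensionless standardized coefficients
print(beta)
```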

Power Select the Post Hoc Tests tab in the options dialog box to view the Power
option. If you can’t see the Post Hoc tab in the Options dialog box, click
the right-pointing arrow at the right of the tabs to move the tab into view.
Use the left pointing arrow to move the other tabs back into view.

The power of a regression is the power to detect the observed


relationship in the data. The alpha (α) is the acceptable probability of
incorrectly concluding there is a relationship.

Check the Power check box to compute the power for the polynomial
regression data. Change the alpha value by editing the number in the
Alpha Value edit box. The suggested value is α = 0.05. This indicates

that a one in twenty chance of error is acceptable, or that you are willing
to conclude there is a significant relationship when P < 0.05.

FIGURE 12–42
The Options for Polynomial
Regression Dialog Box
Displaying the
Power Option

Smaller values of α result in stricter requirements before concluding


there is a significant relationship, but a greater possibility of concluding
there is no relationship when one exists. Larger values of α make it
easier to conclude that there is a relationship, but also increase the risk of
reporting a false positive.

Running a Polynomial Regression

To run a Polynomial Regression you need to select the data to test. The
Pick Columns dialog box is used to select the worksheet columns with
the data you want to test.

To run a Polynomial Regression:

1 If you want to select your data before you run the regression, drag
the pointer over your data.

2 Open the Pick Columns dialog box to start the Polynomial


Regression. You can either:

➤ Select Polynomial Regression from the drop-down list in the


toolbar, then select the button.
➤ Choose the Statistics menu Regression command, then choose
Polynomial...
➤ Click the Run Test button from the Options for Polynomial
Regression dialog box (see step 5 on page 556).

If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.

3 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Dependent and Independent drop-down list.

The first selected column is assigned to the Dependent Variable


row in the Selected Columns list, and the second column is
assigned to the Independent Variable row. The title of selected
columns appears in each row. You are only prompted for one
dependent and one independent variable column.

FIGURE 12–43
The Pick Columns
for Polynomial Regression
Dialog Box Prompting You to
Select Data Columns

4 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

5 Select Finish to run the regression. If you elected to test for


normality, constant variance, and/or independent residuals,
SigmaStat performs the tests for normality (Kolmogorov-
Smirnov), constant variance, and independent residuals. If your
data fails any of these tests, SigmaStat warns you. When the test
is complete, the report appears displaying the results of the
Polynomial Regression (see Figure 12–44 on page 567).

If you are performing a regression using one order only, and


selected to place predicted values, residuals, and/or other test
results in the worksheet, they are placed in the specified data
columns and are labeled by content and source column.

( Worksheet results can only be obtained using order only polynomial
regression.

Interpreting Incremental Polynomial Regression Results

Incremental Order Polynomial Regression results display the regression


equations for each order polynomial, starting with zero order and
increasing to the specified order. The residual and incremental mean
square, and incremental and overall R2, F value, and P value for each
order equation are listed.

( The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Results Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Options command, choose Report, and unselect the Explain
Results option. The number of decimal places displayed is also set in the
Report Options dialog box. For more information on setting report
options, see page 135.

Regression Equation These are the regression equations for each order, with the values of the
coefficients in place. The equations take the form:

$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \cdots + b_k x^k$

where y is the dependent variable, x is the independent variable, and b0,


b1, b2, b3,...,bk are the regression coefficients.

The order k of the polynomial is the largest exponent of the independent


variable.

For incremental polynomial regression, all equations from zero order up
to the maximum order specified in the Options for Polynomial
Regressions dialog box are listed.

FIGURE 12–44
An Example of the
Incremental
Polynomial
Regression Report

Incremental Results MSres (Residual Mean Square) The residual mean square is a measure
of the variation of the residuals about the regression line.
$MS_{res} = \dfrac{\text{residual sum of squares}}{\text{residual degrees of freedom}} = \dfrac{SS_{res}}{DF_{res}}$

MSincr (Incremental Mean Square) The incremental mean square is a


measure of the reduction in variation of the residuals about the
regression equation gained with this order polynomial

$MS_{incr} = \dfrac{\text{incremental sum of squares}}{\text{incremental degrees of freedom}} = \dfrac{SS_{incr}}{DF_{incr}}$

The sums of squares are measures of variability of the dependent


variable.

The residual sum of squares is a measure of the size of the residuals,


which are the differences between the observed values of the dependent
variable and the values predicted by regression model.

The incremental, or Type I, sum of squares is a measure of the new
predictive information contained in the added power of the independent
variable, as it is added to the equation.

It is a measure of the increase in the regression sum of squares (and


reduction in the sum of squared residuals) obtained when the highest
order term of the independent variable is added to the regression
equation, after all lower order terms have been entered. Since one order
is added in each step, DFincr =1.

Rsq (R2) R2, the coefficient of determination, is a measure of how well


the regression model describes the data.

➤ The incremental R2 is the gain in R2 obtained with this order


polynomial over the previous order polynomial.
➤ The overall R2 is the actual R2 of this order polynomial.

Overall R2 values nearer to 1 indicate that the curve is a good description


of the relation between the independent and dependent variables. R2 is
near 0 when the values of the independent variable poorly predict the
dependent variables.

F value The F test statistic gauges the ability of the independent


variable in predicting the dependent variable.

➤ The incremental F value gauges the increase in contribution of each


added order of the independent variable in predicting the dependent
variable. It is the ratio

$F_{incr} = \dfrac{\text{incremental variation from the dependent variable mean}}{\text{residual variation about the regression curve}} = \dfrac{MS_{incr}}{MS_{res}}$
If the incremental F is large and the overall F jumps to a large
number, you can conclude that adding that order of the
independent variables predicts the dependent variable significantly
better than the previous model. The “best” order polynomial to
use is generally the highest order polynomial that produces a
marked improvement in predictive ability.

➤ Overall F value gauges the contribution of all orders of the
independent variable in predicting the dependent variable. It is the
ratio

$F = \dfrac{\text{regression variation from the dependent variable mean}}{\text{residual variation about the regression curve}} = \dfrac{MS_{reg}}{MS_{res}}$
When the overall F ratio is around 1, you can conclude that there
is no association between the independent and dependent variables (i.e., the data
is consistent with the null hypothesis that all the samples are just
randomly distributed).

P Value P is the P value calculated for F. The P value is the probability


of being wrong in concluding that there is a true association between the
dependent and independent variables (i.e., the probability of falsely
rejecting the null hypothesis, or committing a Type I error, based on F).
The smaller the P value, the greater the probability that there is an
association.

➤ The incremental P value is the change in probability of being wrong


that the added independent variable order improves the prediction
of the dependent variable.
➤ The overall P value is the probability of being wrong that order of
polynomial correctly predicts the dependent variable.

Traditionally, you can conclude that the independent variable can be


used to predict the dependent variable when P < 0.05.

Assumption Testing Normality Normality test result displays whether or not the polynomial
model passed or failed the test of the assumption that the source
population is normally distributed around the regression curve, and the
P value calculated by the test. All regression requires a source population
to be normally distributed about the regression curve.

When this assumption may be violated, a warning appears in the report.


Failure of the normality test can indicate the presence of outlying
influential points or an incorrect regression model.

Constant Variance The constant variance test results list whether or


not that polynomial model passed the test for constant variance of the
residuals about the regression, and the P value computed for that order

polynomial. All regression techniques require a normal distribution of
the residuals about the regression curve.

Choosing the The smaller the residual sum of squares and mean square, the closer the
Best Model curve matches the data at those values of the independent variable. The
first model that has a significant increase in the incremental F value is
generally the best model to use. Because the R2 value increases as the
order increases, you also want to use the simplest model that adequately
describes the data.

Interpreting Order Only Polynomial Regression Results

The report for an order only Polynomial Regression displays the


equation with the computed coefficients for the curve, R and R2, mean
squares, F, and the P value for the regression equation.

The other results displayed in the report are selected in the Options for
Polynomial Regression dialog box (see page 555).

( The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options command and unselect the Explain Results
option.

The number of decimal places displayed is also set in the Report


Options dialog box. For more information on setting report options, see
page 135.

Regression Equation This is the equation with the values of the coefficients in place. This
equation takes the form:
y = b0 + b1x + b2x^2 + b3x^3 + ... + bkx^k

where y is the dependent variable, x is the independent variable, and b0,


b1, b2, b3,...,bk are the regression coefficients.



The order of the polynomial is the exponent of the independent variable.
The number of observations N is also displayed, with the missing values,
if any.
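As a rough illustration of the equation above (a sketch using hypothetical
x and y arrays, not SigmaStat's internal computation), the following Python
code fits a polynomial of a chosen order with NumPy and prints the fitted
coefficients b0, b1, ..., bk:

import numpy as np

# Hypothetical example data; in practice these would be worksheet columns.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 9.2, 15.8, 26.1, 38.0])

order = 2                              # order of the polynomial (highest exponent of x)
coeffs = np.polyfit(x, y, order)       # returns [bk, ..., b1, b0]
b = coeffs[::-1]                       # reorder so that b[0] = b0, b[1] = b1, ...

terms = [f"{b[i]:.4g}*x^{i}" if i else f"{b[i]:.4g}" for i in range(order + 1)]
print("y = " + " + ".join(terms))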

Analysis of MSres (Residual Mean Square) The mean square provides an estimate
Variance (ANOVA) of the population variance. The residual mean square is a measure of the
variation of the residuals about the regression curve, or

MS_res = residual sum of squares / residual degrees of freedom = SS_res / DF_res

Rsq (R2) The coefficient of determination R2 is a measure of how well


the regression model describes the data.

R2 values near 1 indicate that the curve is a good description of the


relation between the independent and dependent variables. R2 values
near 0 indicate that the values of the independent variable do not predict
the dependent variables.

F Statistic The F test statistic gauges the contribution of the regression


equation to predict the dependent variable. It is the ratio

F = (regression variation from the dependent variable mean) / (residual variation about the regression curve) = MS_reg / MS_res

If F is a large number, you can conclude that the independent variable


contributes to the prediction of the dependent variable (i.e., the
“unexplained variability” is smaller than what is expected from random
sampling variability of the dependent variable about its mean). If the F
ratio is around 1, you can conclude that there is no association between
the variables (i.e., the data is consistent with the null hypothesis that all
the samples are just randomly distributed).

P Value P is the P value calculated for F. The P value is the probability


of being wrong in concluding that there is a true association between the
variables (i.e., the probability of falsely rejecting the null hypothesis, or
committing a Type I error, based on F). The smaller the P value, the
greater the probability that the variables are correlated.

Standard Error of The standard error of the estimate (s_y|x) is a measure of the actual
the Estimate (s_y|x) variability about the regression line of the underlying population. The
underlying population generally falls within about two standard errors of
the observed sample.

PRESS Statistic PRESS, the Predicted Residual Error Sum of Squares, is


a gauge of how well a regression model predicts new data. The smaller
the PRESS statistic, the better the predictive ability of the model.

The PRESS statistic is computed by summing the squares of the


prediction errors (the differences between predicted and observed values)
for each observation, with that point deleted from the computation of
the regression equation.
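As an illustration of this leave-one-out computation (a minimal Python
sketch assuming a polynomial model and hypothetical x and y arrays, not
SigmaStat's internal routine):

import numpy as np

def press_statistic(x, y, order):
    """Leave-one-out Predicted Residual Error Sum of Squares for a polynomial fit."""
    press = 0.0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i                 # drop the i-th observation
        coeffs = np.polyfit(x[keep], y[keep], order)  # refit without that point
        press += (y[i] - np.polyval(coeffs, x[i])) ** 2
    return press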

Durbin-Watson Statistic The Durbin-Watson statistic is a measure of correlation between the


residuals. If the residuals are not correlated, the Durbin-Watson statistic
will be 2; the more this value differs from 2, the greater the likelihood
that the residuals are correlated. This result appears if it was selected in
the Options for Polynomial Regression dialog box.

Regression assumes that the residuals are independent of each other; the
Durbin-Watson test is used to check this assumption. If the Durbin-
Watson value deviates from 2 by more than the value set in the Options
for Polynomial Regression dialog box, a warning appears in the report.
The suggested trigger value is a difference of more than 0.50 (i.e., if the
Durbin-Watson statistic is below 1.5 or over 2.5).

Normality Test The normality test results display whether or not the polynomial model
passed or failed the test of the assumption that the source population is
normally distributed around the regression curve, and the P value
calculated by the test. All regression requires a source population to be
normally distributed about the regression curve.

When this assumption may be violated, a warning appears in the report.


Failure of the normality test can indicate the presence of outlying
influential points or an incorrect regression model.

This result appears unless you disabled normality testing in the Options
for Polynomial Regression dialog box (see page 555).

Constant The constant variance test result displays whether or not the polynomial
Variance Test model passed or failed the test of the assumption that the variance of the
dependent variable in the source population is constant regardless of the
value of the independent variable, and the P value calculated by the test.



When the constant variance assumption may be violated, a warning
appears in the report.

If you receive this warning, you should consider trying a different model
(i.e., one that more closely follows the pattern of the data), or
transforming the independent variable to stabilize the variance and
obtain more accurate estimates of the parameters in the regression
equation. For more information on the appropriate transform to use,
see Using Quick Transforms to Linearize and Normalize Data on page
749.

This result appears unless you disabled constant variance testing in the
Options for Polynomial Regression dialog box (see page 557).

Regression Diagnostics The regression diagnostic results display only the values for the predicted
values, residual results, and other diagnostics selected in the Options for
Polynomial Regression dialog box (see page 559). All results that qualify
as outlying values are flagged with a # symbol. The trigger values to flag
residuals as outliers are set in the Options for Polynomial Regression
dialog box.

If you selected Report Cases with Outliers Only, only those observations
that have one or more residuals flagged as outliers are reported; however,
all other results for that observation are also displayed.

Row This is the row number of the observation.

Residuals These are the raw residuals, the difference between the
predicted and observed values for the dependent variables.

Standardized Residuals The standardized residual is the raw residual
divided by the standard error of the estimate s_y|x.

If the residuals are normally distributed about the regression line, about
66% of the standardized residuals have values between −1 and 1, and
about 95% of the standardized residuals have values between −2 and 2.
A larger standardized residual indicates that the point is far from the
regression line; the suggested value flagged as an outlier is 2.5.
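As a small illustration (not SigmaStat's implementation), the following
Python sketch standardizes raw residuals with a supplied standard error of
the estimate and flags values beyond the suggested 2.5 cutoff:

import numpy as np

def flag_standardized_residuals(raw_residuals, s_yx, cutoff=2.5):
    """Standardized residuals and an outlier flag using the suggested 2.5 cutoff."""
    standardized = np.asarray(raw_residuals, dtype=float) / s_yx
    return standardized, np.abs(standardized) > cutoff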

Confidence Intervals These results are displayed if you selected them in the Options for
Polynomial Regression dialog box. If the confidence interval does not
include zero, you can conclude that the coefficient is different than zero
with the level of confidence specified. This can also be described as
P ≤ α (alpha), where α is the acceptable probability of incorrectly
concluding that the coefficient is different than zero, and the confidence
interval is 100(1 − α).

The specified confidence level can be any value from 1 to 99; the
suggested confidence level for both intervals is 95%.

Row This is the row number of the observation.

Predicted This is the value for the dependent variable predicted by the
regression model for each observation.

Regression These are the values that define the region containing the
true relationship between the dependent and independent variables, for
the specified level of confidence, centered at the predicted value.

This result is displayed if you selected it in the Options for Polynomial


Regression dialog box. The specified confidence level can be any value
from 1 to 99; the suggested confidence level is 95%.

Population Confidence Interval These are the values that define the
region containing the population from which the observations were
drawn, for the specified level of confidence, centered at the predicted
value.

This result is displayed if you selected it in the Options for Polynomial


Regression dialog box. The specified confidence level can be any value
from 1 to 99; the suggested confidence level is 95%.

Polynomial Regression Report Graphs

You can generate up to five graphs using the results from a Polynomial
Regression. They include a:

➤ Histogram of the residuals.


➤ Scatter plot of the residuals.
➤ Bar chart of the standardized residuals.
➤ Normal probability plot of the residuals.
➤ Line/scatter plot of the regression with one independent variable
and confidence and prediction intervals.



Histogram of Residuals The polynomial regression histogram plots the raw residuals in a
specified range, using a defined interval set. The residuals are divided
into a number of evenly incremented histogram intervals and plotted as
histogram bars indicating the number of residuals in each interval. The
X axis represents the histogram intervals, and the Y axis represents the
number of residuals in each group. For an example of a histogram, see
page 153.

Scatter Plot of The polynomial regression scatter plot of the residuals plots the residuals
the Residuals of the independent variables data as points relative to the standard
deviations. The X axis represents the independent variable values, the Y
axis represents the residuals of the variables, and the horizontal lines
running across the graph represent the standard deviations of the data.
For an example of a scatter plot, see page 152.

Bar Chart of The Polynomial Regression bar chart of the standardized residuals plots
the Standardized the standardized residuals of the independent variable data as points
Residuals relative to the standard deviations. The X axis represents the
independent variable values, the Y axis represents the residuals of the
variable data, and the horizontal lines running across the graph represent
the standard deviations of the data. For an example of a bar chart, see
page 153.

Normal The Polynomial Regression probability plot graphs standardized


Probability Plot residuals versus their cumulative frequencies along a probability scale.
The residuals are sorted and then plotted as points around a curve
representing the area of the gaussian. Plots with residuals that fall along
the gaussian curve indicate that your data was taken from a normally
distributed population. The X axis is a linear scale representing the
residual values. The Y axis is a probability scale representing the
cumulative frequency of the residuals. For an example of a normal
probability plot, see page 155.

Line/Scatter Plot The Polynomial Regression graph plots the observations of the
of the Regression with polynomial regression for the independent variables as a line/scatter plot.
Prediction and The points represent the data dependent variables plotted against the
Confidence Intervals selected independent variables, the solid line running through the points
represents the regression line, and the dashed lines represent the
prediction and confidence intervals. The X axis represents the
independent variables and the Y axis represents the dependent variables.
For an example of a line/scatter plot of the regression, see page 156.



Creating Polynomial To generate a report graph of Polynomial Regression report data:
Regression
Report Graphs 1 Click the toolbar button, or choose the Graph menu Create
Graph command when the polynomial regression report is
selected. The Create Graph dialog box appears displaying the
types of graphs available for the Polynomial Regression report.
FIGURE 12–45
The Create Graph Dialog
Box
for the Polynomial
Regression Report

2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
The selected graph appears in a graph window. For more
information on each of the graph types, see pages 12-575 through
12-575.

FIGURE 12–46
A Line and Scatter Plot of
the Regression and
Confidence and Prediction
Intervals for a Polynomial
Regression Report

For information on manipulating graphs, see pages 8-181 through


8-202.

Stepwise Linear Regression

Use Stepwise Linear Regression when you:

➤ Want to predict a trend in the data, or predict the value of one


variable from the values of one or more other variables, by fitting a
line or plane (or hyperplane) through the data.
➤ Do not know which independent variables contribute to predicting
the dependent variable, and you want to find the model with
suitable independent variables by adding or removing independent
variables from the equation.

If you already know the independent variables you want to include, use
Multiple Linear Regression. If you want to find the few best equations
from all possible models, use Best Subsets Regression. If the
relationship is not a straight line or plane, use Polynomial or Nonlinear
Regression.

About Stepwise Linear Stepwise Regression is a technique for selecting independent variables
Regression for a Multiple Linear Regression equation from a list of candidate
variables. Using Stepwise Regression instead of regular Multiple Linear
Regression avoids using extraneous variables, or under specifying or over
specifying the model.

Stepwise Regression assumes an association between one or more
independent variables and a dependent variable that fits the general
equation for a multidimensional plane:

y = b0 + b1x1 + b2x2 + b3x3 + ... + bkxk

where y is the dependent variable, x1, x2, x3, ..., xk are the independent
variables, and b0, b1, b2,...,bk are the regression coefficients. The
independent variable is the known, or predictor, variable. As the values
for xi vary, the corresponding value for y either increases or decreases,
depending on the sign of bi. Stepwise Regression determines which
independent variables to use by adding or removing selected
independent variables from the equation.

There are two approaches to Stepwise Regression.



Forward Stepwise Regression In Forward Stepwise Regression, the
independent variable that produces the best prediction of the dependent
variable (and has an F value higher than a specified F-to-Enter) is
entered into the equation first, the independent variable that adds the
next largest amount of information is entered second, and so on. After
each variable is entered, the F value of each variable already entered into
the equation is checked, and any variables with small F values (below a
specified F-to-Remove value) are removed.

This process is repeated until adding or removing variables does not
significantly improve the prediction of the dependent variable.

Backward Stepwise Regression In Backward Stepwise Regression, all


variables are entered into the equation. The independent variable that
contributes the least to the prediction (and has an F value lower than a
specified F-to-Remove) is removed from the equation, the next least
important independent variable is removed, and so on. After each
variable is removed, the F value of each variable removed from the
equation is checked, and any variables with large F values (above a
specified F-to-Enter value) are re-entered into the equation.

This process is repeated until removing or adding variables does not
significantly improve the prediction of the dependent variable.

( Forward and Backward Stepwise Regression using the same potential


variables do not necessarily yield the same final regression model when there
is multicollinearity among the possible independent variables.
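The following Python sketch illustrates the forward selection loop
described above in simplified form; it uses ordinary least squares from
NumPy, omits the F-to-Remove re-check after each entry, and is an
illustration of the idea rather than SigmaStat's algorithm:

import numpy as np

def ss_res(X, y, cols):
    """Residual sum of squares for a fit of y on an intercept plus the given columns of X."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ coef) ** 2))

def forward_stepwise(X, y, f_to_enter=4.0, max_steps=20):
    """Add, one at a time, the candidate with the largest incremental F above F-to-Enter."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(max_steps):
        best_f, best_j = 0.0, None
        for j in remaining:
            trial = selected + [j]
            df_res = len(y) - len(trial) - 1          # n minus parameters (incl. intercept)
            f = (ss_res(X, y, selected) - ss_res(X, y, trial)) / (ss_res(X, y, trial) / df_res)
            if f > best_f:
                best_f, best_j = f, j
        if best_j is None or best_f < f_to_enter:
            break                                     # no candidate exceeds F-to-Enter
        selected.append(best_j)
        remaining.remove(best_j)
    return selected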

Performing To perform a Stepwise Regression:


a Stepwise
Linear Regression 1 Enter or arrange your data appropriately in the worksheet
(see following section).

2 If desired, set the Stepwise Regression options using the Options


for Forward or Backward Stepwise Regression dialog box (page
579).

3 Select Forward Stepwise Regression or Backward Stepwise


Regression from the toolbar, then select the button, or choose
the Statistics menu Regression command, choose Stepwise, then
choose Forward or Backward.



4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 594).

5 View and interpret the Stepwise Regression report and generate


report graphs (pages 12-596 and 12-607).

Arranging Stepwise Regression Data

The data format for a Stepwise Linear Regression consists of the data for
the independent variables in one or more columns and the
corresponding data for the observed dependent variable in a single
column. Any observations containing missing values are ignored, and
the columns must be equal in length.

FIGURE 12–47
Data Format for a
Stepwise Linear Regression

Selecting When running a Stepwise Regression, you can either:


Data Columns
➤ Select the columns to test from the worksheet before choosing the
test, or
➤ Select the columns while performing the test

Setting Stepwise Regression Options

Use the Stepwise Regression options to:

➤ Specify which independent variables are entered into, replaced in,
or removed from the regression equation during forward or
backward stepwise regression.



➤ Set the number of steps permitted before the stepwise algorithm
stops.
➤ Set assumption checking options.
➤ Specify the residuals to display and save them to the worksheet.
➤ Set confidence interval options.
➤ Display the PRESS prediction error.
➤ Display standardized regression coefficients.
➤ Specify tests to identify outlying or influential data points.
➤ Display the power of the regression.

To change the Stepwise Regression options:

1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over the data.

2 Open the Options for Forward or Backward Stepwise Regression


dialog box by selecting Forward or Backward Stepwise Regression
from the toolbar drop-down, then click the button, or choose
the Statistics menu Current Test Options... command. The
criterion options appear (see Figure 12–48 on page 582).

3 Click the Assumption Checking tab to view the Normality,


Constant Variance, and Durbin-Watson options (see Figure 12–48
on page 582), the Residuals tab to view the residual options (see
Figure 12–48 on page 582), More Statistics tab to view the
confidence intervals, PRESS Prediction Error, Standardized
Coefficients options (see Figure 12–48 on page 582), and Other
Diagnostics tab to view the Influence, Variance Inflation Factor,
and Power options (see Figure 12–48 on page 582). Click the
Criterion tab to return to the F-to-Enter, F-to-Remove, and Number of
Steps options.

4 Click a check box to enable or disable a test option. Options


settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 12-581 through
12-593.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 594 for more information).



6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

( You can select Help at any time to access SigmaStat’s on-line help system.

Criterion Options Select the Criterion tab from the options dialog box to view the F-to-
Enter, F-to-Remove, and Number of Steps options. Use these options to
specify the independent variables that are entered into, replaced, or
removed from the regression equation during the stepwise regression,
and to specify when the stepwise algorithm stops.

F-to-Enter Value The F-to-Enter value controls which independent


variables are entered into the regression equation during forward
stepwise regression or replaced after each step during backwards stepwise
regression.

The F-to-Enter value is the minimum incremental F value associated


with an independent variable before it can be entered into the regression
equation. All independent variables producing incremental F values
above the F-to-Enter value are added to the model.

The suggested F-to-Enter value is 4.0. Increasing F-to-Enter requires a


potential independent variable to have a greater effect on the ability of
the regression equation to predict the dependent variable before it is
accepted, but may stop too soon and exclude important variables.

( The F-to-Enter value should always be greater than or equal to the F-to-
Remove value, to avoid cycling variables in and out of the regression model.

Reducing the F-to-Enter value makes it easier to add a variable, because


it relaxes the importance of a variable required before it is accepted, but
may produce redundant variables and result in multicollinearity.

( If you are performing backwards stepwise regression and you want any
variable that has been removed to remain deleted, increase the F-to-Enter
value to a large number, e.g., 100000.

F-to-Remove Value The F-to-Remove value controls which


independent variables are deleted from the regression equation during
backwards stepwise regression, or removed after each step in forward
stepwise regression.



FIGURE 12–48
The Options for Stepwise
Regression Dialog Box
Displaying the
Criterion Options

The F-to-Remove is the maximum incremental F value associated with


an independent variable before it can be removed from the regression
equation. All independent variables producing incremental F values
below the F-to-Remove value are deleted from the model.

The suggested F-to-Remove value is 3.9. Reducing the F-to-Remove


value makes it easier to retain a variable in the regression equation
because variables that have smaller effects on the ability of the regression
equation to predict the dependent variable are still accepted. However,
the regression may still contain redundant variables, resulting in
multicollinearity.

( The F-to-Remove value should always be less than or equal to the F-to-Enter
value, to avoid cycling variables in and out of the regression model.

Increasing the F-to-Remove value makes it easier to delete variables from


the equation, as variables that contain more predictive value can be
removed. Important variables may also be deleted, however.

( If you are performing forwards stepwise regression and you want any variable
that has been entered to remain in the equation, set the F-to-Remove
value to zero.

Number of Steps Use this option to set the maximum number of steps
permitted before the stepwise algorithm stops. Note that if the
algorithm stops because it ran out of steps, the results are probably not
reliable. The suggested number of steps is 20 added or deleted
independent variables.

Assumption Checking Select the Assumption Checking tab from the options dialog box to view
Options the Normality, Constant Variance, and Durbin-Watson options. These
options test your data for its suitability for regression analysis by
checking three assumptions that a Stepwise Linear Regression makes
about the data. A Stepwise Linear Regression assumes:

➤ That the source population is normally distributed about the


regression.
➤ The variance of the dependent variable in the source population is
constant regardless of the value of the independent variable(s).
➤ That the residuals are independent of each other.

All assumption checking options are selected by default. Only disable


these options if you are certain that the data was sampled from normal
populations with constant variance and that the residuals are
independent of each other.

Normality Testing SigmaStat uses the Kolmogorov-Smirnov test to test


for a normally distributed population.

Constant Variance Testing SigmaStat tests for constant variance by


computing the Spearman rank correlation between the absolute values of
the residuals and the observed value of the dependent variable. When
this correlation is significant, the constant variance assumption may be
violated, and you should consider trying a different model (i.e., one that
more closely follows the pattern of the data), or transforming one or
more of the independent variables to stabilize the variance; see Chapter
14, Using Transforms for more information on the appropriate
transform to use.
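A minimal sketch of this check (illustrative Python using SciPy's
spearmanr; the residuals and observed values are assumed to be NumPy
arrays, and this is not SigmaStat's exact procedure):

import numpy as np
from scipy.stats import spearmanr

def constant_variance_check(residuals, observed_y, p_cutoff=0.05):
    """Spearman rank correlation between |residuals| and the observed dependent variable."""
    rho, p = spearmanr(np.abs(residuals), observed_y)
    return p > p_cutoff    # True means the constant variance assumption is not rejected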

FIGURE 12–49
The Options for Stepwise
Regression Dialog Box
Displaying the Assumption
Checking Options

P Values for Normality and Constant Variance The P value


determines the probability of being incorrect in concluding that the data
is not normally distributed (P value is the risk of falsely rejecting the null
hypothesis that the data is normally distributed). If the P computed by
the test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or constant variance,


increase the P value. Because the parametric statistical methods are
relatively robust in terms of detecting violations of the assumptions, the
suggested value in SigmaStat is 0.05. Larger values of P (for example,
0.10) require less evidence to conclude that the residuals are not
normally distributed or the constant variance assumption is violated.

To relax the requirement of normality and/or constant variance,


decrease P. Requiring smaller values of P to reject the normality
assumption means that you are willing to accept greater deviations from
the theoretical normal distribution before you flag the data as non-
normal. For example, a P value of 0.01 for the normality test requires
greater deviations from normality to flag the data as non-normal than a
value of 0.05.

( Although the assumption tests are robust in detecting data from populations
that are non-normal or with non-constant variances, there are extreme
conditions of data distribution that these tests cannot detect. However, these
conditions should be easily detected by visually examining the data without
resorting to the automatic assumption tests.

Durbin-Watson Statistic SigmaStat uses the Durbin-Watson statistic


to test residuals for their independence of each other. The Durbin-
Watson statistic is a measure of serial correlation between the residuals.
The residuals are often correlated when the independent variable is time,
and the deviation between the observation and the regression line at one
time are related to the deviation at the previous time. If the residuals are
not correlated, the Durbin-Watson statistic will be 2.

Difference from 2 Value Enter the acceptable deviation from 2.0 that
you consider as evidence of a serial correlation in the Difference from 2.0
box. If the computed Durbin-Watson statistic deviates from 2.0 more
than the entered value, SigmaStat warns you that the residuals may not
be independent. The suggested deviation value is 0.50, i.e., Durbin-
Watson Statistic values greater than 2.5 or less than 1.5 flag the residuals
as correlated.

To require a stricter adherence to independence, decrease the acceptable


difference from 2.0.



To relax the requirement of independence, increase the acceptable
difference from 2.0.

Residuals Select the Residuals tab in the options dialog box to view the Predicted
Values, Raw, Standardized, Studentized, Studentized Deleted, and
Report Flagged Values Only options.

Predicted Values Use this option to calculate the predicted value of the
dependent variable for each observed value of the independent
variable(s), then save the results to the data worksheet. Click the selected
check box if you do not want to include the predicted values in the worksheet.

To assign predicted values to a worksheet column, select the worksheet


column you want to save the predicted values to from the corresponding
drop-down list. If you select none and the Predicted Values check box is
selected, the values appear in the report but are not assigned to the
worksheet.

Raw Residuals The raw residuals are the differences between the
predicted and observed values of the dependent variables. To include
raw residuals in the report, make sure this check box is selected. Click
the selected check box if you do not want to include raw residuals in the
worksheet.

To assign the raw residuals to a worksheet column, select the number of


the desired column from the corresponding drop-down list. If you select
none from the drop-down list and the Raw check box is selected, the
values appear in the report but are not assigned to the worksheet.

FIGURE 12–50
The Options for Stepwise
Regression Dialog Box
Displaying the Residuals
Options

Standardized Residuals The standardized residual is the residual


divided by the standard error of the estimate. The standard error of the
residuals is essentially the standard deviation of the residuals, and is a
measure of variability around the regression line. To include
standardized residuals in the report, make sure this check box is selected.

SigmaStat automatically flags data points lying outside of the confidence


interval specified in the corresponding box. These data points are
considered to have “large” standardized residuals, i.e., outlying data
points. You can change which data points are flagged by editing the
value in the Flag Values > edit box.

Studentized Residuals Studentized residuals scale the standardized


residuals by taking into account the greater precision of the regression
line near the middle of the data versus the extremes. The Studentized
residuals tend to be distributed according to the Student t distribution,
so the t distribution can be used to define “large” values of the
Studentized residuals. SigmaStat automatically flags data points with
“large” values of the Studentized residuals, i.e., outlying data points; the
suggested data points flagged lie outside the 95% confidence interval for
the regression population.

To include studentized residuals in the report, make sure this check box
is selected. Click the selected check box if you do not want to include
studentized residuals in the worksheet.

Studentized Deleted Residuals Studentized deleted residuals are


similar to the Studentized residual, except that the residual values are
obtained by computing the regression equation without using the data
point in question.

To include studentized deleted residuals in the report, make sure this


check box is selected. Click the selected check box if you do not want to
include studentized deleted residuals in the worksheet.

SigmaStat can automatically flag data points with “large” values of the
studentized deleted residual, i.e., outlying data points; the suggested data
points flagged lie outside the 95% confidence interval for the regression
population.

( Note that both Studentized and Studentized deleted residuals use the same
confidence interval setting to determine outlying points.

Report Flagged Values Only To include only the flagged
standardized and studentized deleted residuals in the report, make sure
the Report Flagged Values Only check box is selected. Uncheck this
option to include all standardized and studentized residuals in the
report.

Confidence Intervals Select the More Statistics tab in the options dialog box to view the
confidence interval options. You can set the confidence interval for the
population, regression, or both, and then save them to the worksheet.

Confidence Interval for the Population The confidence interval for


the population gives the range of values that define the region that
contains the population from which the observations were drawn.

To include confidence intervals for the population in the report, make


sure the Population check box is selected. Click the selected check box if
you do not want to include the confidence intervals for the population
in the report.

Confidence Interval for the Regression The confidence interval for


the regression line gives the range of values that defines the region
containing the true mean relationship between the dependent and
independent variables, with the specified level of confidence.

To include confidence intervals for the regression in the report, make


sure the Regression check box is selected, then specify a confidence level
by entering a value in the percentage box. The confidence level can be
any value from 1 to 99. The suggested confidence level is 95%. Click
the selected check box if you do not want to include the confidence
intervals for the regression in the report.

Saving Confidence Intervals to the Worksheet To save the


confidence intervals to the worksheet, select the column number of the
first column you want to save the intervals to from the Starting in
Column drop-down list. The selected intervals are saved to the
worksheet starting with the specified column and continuing with
successive columns in the worksheet.

PRESS Select the More Statistics tab in the options dialog box to view the
Prediction Error PRESS Prediction Error option (see Figure 12–48). The PRESS
Prediction Error is a measure of how well the regression equation fits the
data. Leave this check box selected to evaluate the fit of the equation



FIGURE 12–51
The Options for Stepwise
Regression Dialog Box
Displaying the Confidence
Interval, PRESS Prediction
Error, and Standardized
Coefficients Options

using the PRESS statistic. Click the selected check box if you do not
want to include the PRESS statistic in the report.

Standardized Click the More Statistics tab in the options dialog box to view the
Coefficients (βi) Standardized Coefficients option (see Figure 12–48 on page 582).
These are the coefficients of the regression equation standardized to
dimensionless values,

βi = bi (sxi / sy)

where bi = regression coefficient, sxi = standard deviation of the
independent variable xi, and sy = standard deviation of the dependent
variable y.
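As a brief illustration (not SigmaStat's code), the standardized
coefficients can be computed from the fitted coefficients and the sample
standard deviations, assuming b excludes the constant term and the columns
of X match the order of b:

import numpy as np

def standardized_coefficients(b, X, y):
    """beta_i = b_i * s_xi / s_y, where b excludes the constant term and X holds the x_i columns."""
    return np.asarray(b) * X.std(axis=0, ddof=1) / np.std(y, ddof=1)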

To include the standardized coefficients in the report, make sure the


Standardized Coefficients check box is selected. Click the selected
check box if you do not want to include the standardized coefficients in
the worksheet.

Influence Options Select the Other Diagnostics tab in the options dialog box to view the
Influence options (see Figure 12–48 on page 582). If Other Diagnostic
is hidden, click the right pointing arrow to the right of the tabs to move
it into view. Use the left pointing arrow to move the other tabs back
into view.

Influence options automatically detect instances of influential data
points. Most influential points are data points which are outliers, that
is, they do not “line up” with the rest of the data points. These
points can have a potentially disproportionately strong influence on the
calculation of the regression line. You can use several influence tests to
identify and quantify influential points.

FIGURE 12–52
A Graph with an Influential Outlying Point
The solid line shows the regression for the data including the outlier,
and the dotted line is the regression computed without the outlying point.

DFFITS DFFITSi is the number of estimated standard errors that the


predicted value changes for the ith data point when it is removed from
the data set. It is another measure of the influence of a data point on the
prediction used to compute the regression coefficients.

Predicted values that change by more than two standard errors when the
data point is removed are considered to be influential.

Check the DFFITS check box to compute this value for all points and
flag influential points, i.e., those with DFFITS greater than the value
specified in the Flag Values > edit box. The suggested value is 2.0
standard errors, which indicates that the point has a strong influence on
the data. To avoid flagging more influential points, increase this value;
to flag less influential points, decrease this value.

Leverage Leverage is used to identify the potential influence of a point


on the results of the regression equation. Leverage depends only on the
value of the independent variable(s). Observations with high leverage
tend to be at the extremes of the independent variables, where small
changes in the independent variables can have large effects on the
predicted values of the dependent variable.

The expected leverage of a data point is (k + 1)/n, where there are k
independent variables and n data points. Observations with leverages
much higher than the expected leverages are potentially influential
points.

FIGURE 12–53
The Options for Stepwise
Regression Dialog Box
Displaying the
Influence Options

Check the Leverage check box to compute the leverage for each point
and automatically flag potentially influential points, i.e., those points
that could have leverages greater than the specified value times the
expected leverage. The suggested value is 2.0 times the expected leverage
for the regression (i.e., 2(k + 1)/n). To avoid flagging more potentially
influential points, increase this value; to flag points with less potential
influence, lower this value.
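As a rough illustration of how leverages can be obtained outside SigmaStat
(a sketch assuming a NumPy design matrix X whose columns are the
independent variables):

import numpy as np

def leverages(X):
    """Diagonal of the hat matrix for a design matrix with a constant column prepended."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    H = A @ np.linalg.pinv(A.T @ A) @ A.T
    return np.diag(H)

# Expected leverage is (k + 1)/n; with the suggested setting, points with
# leverage above 2*(k + 1)/n would be flagged as potentially influential.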

Cook's Distance Cook's distance is a measure of how great an effect


each point has on the estimates of the parameters in the regression
equation. Cook's distance assesses how much the values of the
regression coefficients change if a point is deleted from the analysis.
Cook's distance depends on both the values of the independent and
dependent variables.

Check the Cook's Distance check box to compute this value for all
points and flag influential points, i.e., those with a Cook's distance
greater than the specified value. The suggested value is 4.0. Cook's
distances above 1 indicate that a point is possibly influential. Cook's
distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. To avoid flagging more influential
points, increase this value: to flag less influential points, lower this value.

Report Flagged Values Only To include only the influential


points flagged by the influential point tests in the report, make sure the
Report Flagged Values Only check box is selected. Uncheck this option
to include all influential points in the report.



What to Do About Influential Points Influential points have two
possible causes:

➤ There is something wrong with the data point, caused by an error in


observation or data entry.
➤ The model is incorrect.

If a mistake was made in data collection or entry, correct the value. If


you do not know the correct value, you may be able to justify deleting
the data point. If the model appears to be incorrect, try regression with
different independent variables, or a Nonlinear Regression.

For descriptions of how to handle influential points, you can reference


an appropriate statistics reference. For a list of suggested references, see
page 12.

Variance Select the Other Diagnostics tab in the options dialog box to view the
Inflation Factor Variance Inflation Factor option (see Figure 12–48 on page 582). If
Other Diagnostic is hidden, click the right pointing arrow to the right of
the tabs to move it into view. Use the left pointing arrow to move the
other tabs back into view.

The Variance Inflation Factor option measures the multicollinearity of


the independent variables, or the linear combination of the independent
variables in the fit.

Regression procedures assume that the independent variables are


statistically independent of each other, i.e., that the value of one
independent variable does not affect the value of another. However, this
ideal situation rarely occurs in the real world. When the independent
variables are correlated, or contain redundant information, the estimates
of the parameters in the regression model can become unreliable.

FIGURE 12–54
A Graph with Multicollinear Data Points
Note that knowing the value of one of the independent variables allows
you to predict the other, so that the independent variables are not
statistically independent. (Axes: Independent x1, Independent x2, Dependent y.)

The parameters in regression models quantify the theoretically unique


contribution of each independent variable to predicting the dependent
variable. When the independent variables are correlated, they contain
some common information and “contaminate” the estimates of the
parameters. If the multicollinearity is severe, the parameter estimates
can become unreliable.

There are two types of multicollinearity.

Structural Multicollinearity Structural multicollinearity occurs when


the regression equation contains several independent variables which are
functions of each other. The most common form of structural
multicollinearity occurs when a polynomial regression equation contains
several powers of the independent variable. Because these powers (e.g.,
x, x2, etc.) are correlated with each other, structural multicollinearity
occurs. Including interaction terms in a regression equation can also
result in structural multicollinearity.

Sample-Based Multicollinearity Sample-based multicollinearity


occurs when the sample observations are collected in such a way that the
independent variables are correlated (for example, if age, height, and
weight are collected on children of varying ages, each variable has a
correlation with the others).



SigmaStat can automatically detect multicollinear independent variables
using the variance inflation factor. Click the Other Diagnostics tab in
the Options dialog box to view the Variance Inflation Factor option.

Flagging Multicollinear Data Use the value in the Flag Values > edit
box as a threshold for multicollinear variables. The default threshold
value is 4.0, meaning that any value greater than 4.0 will be flagged as
multicollinear. To make this test more sensitive to possible
multicollinearity, decrease this value. To allow greater correlation of the
independent variables before flagging the data as multicollinear, increase
this value.

When the variance inflation factor is large, there are redundant variables
in the regression model, and the parameter estimates may not be reliable.
Variance inflation factor values above 4 suggest possible
multicollinearity; values above 10 indicate serious multicollinearity.
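A minimal sketch of the variance inflation factor computation
(illustrative Python assuming a NumPy matrix X of independent variables
with no exact linear dependencies; this is not SigmaStat's internal
routine):

import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1/(1 - R2_j), where R2_j comes from regressing x_j on the remaining variables."""
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1.0 - resid.var() / X[:, j].var()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs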

What to Do About Multicollinearity Sample-based multicollinearity


can sometimes be resolved by collecting more data under other
conditions to break up the correlation among the independent variables.
If this is not possible, the regression equation is over parameterized and
one or more of the independent variables must be dropped to eliminate
the multicollinearity.

Structural multicollinearities can be resolved by centering the


independent variable before forming the power or interaction terms.
Use the Transform menu Center command to center the data; see
Chapter 11 for more information on using transforms to modify data.
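As a small illustration of why centering helps (hypothetical data; the
exact effect depends on your values):

import numpy as np

x = np.linspace(50.0, 150.0, 25)           # hypothetical independent variable
print(np.corrcoef(x, x**2)[0, 1])          # close to 1: x and x^2 are strongly correlated

xc = x - x.mean()                          # centered variable, as the Center command produces
print(np.corrcoef(xc, xc**2)[0, 1])        # near 0: the structural correlation is removed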

For descriptions of how to handle multicollinearity, you can reference an


appropriate statistics reference. For a list of suggested references, see
page 12.

Report Flagged Values Only To include only the points flagged
by the influential point tests and values exceeding the variance inflation
threshold in the report, make sure the Report Flagged Values Only check
box is selected. Uncheck this option to include all influential points in
the report.

Power Select the Other Diagnostics tab in the options dialog box to view the
Power options (see Figure 12–48 on page 582). If Other Diagnostic is
hidden, click the right pointing arrow to the right of the tabs to move it
into view. Use the left pointing arrow to move the other tabs back into
view.

The power of a regression is the power to detect the observed


relationship in the data. The alpha (α) is the acceptable probability of
incorrectly concluding there is a relationship.

Check the Power check box to compute the power for the stepwise linear
regression data. Change the alpha value by editing the number in the
Alpha Value edit box. The suggested value is α = 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing
to conclude there is a significant relationship when P ≤ 0.05.

Smaller values of α result in stricter requirements before concluding


there is a significant relationship, but a greater possibility of concluding
there is no relationship when one exists. Larger values of α make it
easier to conclude that there is a relationship, but also increase the risk of
reporting a false positive.

Running a Stepwise Regression

To run a Stepwise Regression, you need to select the data to test. The
Pick Columns dialog box is used to select the worksheet columns with
the data you want to test and to specify which independent variables to
include in and omit from the regression equation.

To run a Stepwise Regression:

1 If you want to select your data before you run the regression, drag
the pointer over your data.

2 Open the Pick Columns dialog box to start the Stepwise


Regression. You can either:

➤ Select Forward or Backward Stepwise Regression from the drop-


down list in the toolbar, then select the button.
➤ Choose the Statistics menu Regression command, choose
Stepwise, then choose Forward... or Backward...
➤ Click the Run Test button from the Options for Forward or
Backward Stepwise Regression dialog box (see step 5 on page
580).



If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.

3 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Dependent or Independent drop-down list.

You are prompted to select one dependent variable column and up


to 64 independent variable columns. The title of selected columns
appears in each row.

FIGURE 12–55
The Pick Columns
for Stepwise Regression
Dialog Box Prompting You to
Select Data Columns

4 To change your selections, select the assignment in the list, then


select new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

5 Select Next. A dialog box appears prompting you to specify the


variables to force into the regression. Select the columns with the
variables to force into the regression equation from the worksheet
or the Variable drop-down list.

FIGURE 12–56
The Pick Columns
for Forward Stepwise
Regression Dialog Box
Prompting You to
Select Columns With
the Variables to
Force into the Equation



6 Select Finish. If you chose to test for normality, constant variance,
and/or independent residuals, SigmaStat performs the tests for
normality (Kolmogorov-Smirnov), constant variance, and
independent residuals. If your data fail any of these tests,
SigmaStat warns you. When the test is complete, the regression
report is displayed.

If you selected to place predicted values, residuals and other test


results in the worksheet, only the values for the final model
selected by the stepwise regression are computed. These results are
placed in the specified column and are labeled by content and
source column.

( To view these results for other models, note which independent


variables were used for that model, then perform a Multiple Linear
Regression using only those independent variables.

Interpreting Stepwise Regression Results

The report for both Forward and Backward Stepwise Regression displays
the variables that were entered or removed for that step, the regression
coefficients, an ANOVA table, and information about the variables in
and not in the model. Regression diagnostics, confidence intervals, and
predicted values are listed for the final regression model if these options
were selected in the Options for Forward or Backward Regression dialog
box. For more information on selecting regression options, see Setting
Stepwise Regression Options on page 579.

For descriptions of the computations of these results, you can reference


an appropriate statistics reference. For a list of suggested references, see
the page 12.

( The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options command and uncheck the Explain Test Result
option.



The number of decimal places displayed is also set in the Report
Options dialog box. For more information on setting report options, see
page 135.

FIGURE 12–57
Example of a Forward Stepwise
Regression Report

This is the worksheet column used as the dependent variable in the
regression computation.

F-to-Enter These are the F values specified in the Options for Stepwise
F-to-Remove Regression dialog box.

F-to-Enter The F-to-Enter value controls which independent variables


are entered into the regression equation during forward stepwise
regression, or replaced after each step during backwards stepwise
regression. It is the minimum incremental F value associated with an
independent variable before it can be entered into the regression
equation.

All independent variables with incremental F values above the F-to-


Enter value are added to the model. The suggested F-to-Enter value is
4.0.

F-to-Remove The F-to-Remove value controls which independent


variables are deleted from the regression equation during Backwards
Stepwise Regression, or removed after each step in Forward Stepwise
Regression. It is the maximum incremental F value associated with an
independent variable before it can be removed from the regression
equation.

All independent variables with incremental F values below the F-to-


Remove value are deleted from the model. The suggested F-to-Remove
value is 3.9.

Step The step number, variable added or removed, R, R2, and the adjusted R2
for the equation, and standard error of the estimate are all listed under
this heading.

R and R2 R, the multiple correlation coefficient, and R2, the


coefficient of determination for Stepwise Regression, are both measures
of how well the regression model describes the data. R values near 1
indicate that the equation is a good description of the relation between
the independent and dependent variables.

R equals 0 when the values of the independent variables do not allow
any prediction of the dependent variables, and equals 1 when you can
perfectly predict the dependent variables from the independent
variables.

Adjusted R2 The adjusted R2, R2adj, is also a measure of how well the
regression model describes the data, but takes into account the number
of independent variables, which reflects the degrees of freedom. Larger
R2adj values (nearer to 1) indicate that the equation is a good description
of the relation between the independent and dependent variables.
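As a brief illustration of the adjustment (a sketch of the usual formula,
not a statement of SigmaStat's exact computation):

def adjusted_r_squared(r2, n, k):
    """R-squared adjusted for k independent variables and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Example: adjusted_r_squared(0.90, n=30, k=4) is about 0.884.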

Standard Error of the Estimate (s_y|x) The standard error of the
estimate s_y|x is a measure of the actual variability about the regression
plane of the underlying population. The underlying population
generally falls within about two standard errors of the observed sample.
This statistic is displayed for the results of each step.

Analysis of Variance The ANOVA (analysis of variance) table lists the ANOVA statistics for
(ANOVA) Table the regression and the corresponding F value for each step.

SS (Sum of Squares) The sum of squares are measures of variability of


the dependent variable.



➤ The sum of squares due to regression measures the difference of the
regression plane from the mean of the dependent variable.
➤ The residual sum of squares is a measure of the size of the residuals,
which are the differences between the observed values of the
dependent variable and the values predicted by regression model.

DF (Degrees of Freedom) Degrees of freedom represent the number of
observations and variables in the regression equation.

➤ The regression degrees of freedom is a measure of the number of


independent variables.
➤ The residual degrees of freedom is a measure of the number of
observations less the number of terms in the equation.

MS (Mean Square) The mean square provides two estimates of the


population variances. Comparing these variance estimates is the basis of
analysis of variance.

The mean square regression is a measure of the variation of the


regression from the mean of the dependent variable, or

MS_reg = sum of squares due to regression / regression degrees of freedom = SS_reg / DF_reg

The residual mean square is a measure of the variation of the residuals


about the regression plane, or

MS_res = residual sum of squares / residual degrees of freedom = SS_res / DF_res

The residual mean square is also equal to (s_y|x)^2.

F Statistic The F test statistic gauges the contribution of the


independent variables in predicting the dependent variable. It is the
ratio

F = (regression variation from the dependent variable mean) / (residual variation about the regression) = MS_reg / MS_res



If F is a large number, you can conclude that the independent variables
contribute to the prediction of the dependent variable (i.e., at least one
of the coefficients is different from zero, and the “unexplained
variability” is smaller than what is expected from random sampling
variability of the dependent variable about its mean). If the F ratio is
around 1, you can conclude that there is no association between the
variables (i.e., the data is consistent with the null hypothesis that all the
samples are just randomly distributed).

P Value The P value is the probability of being wrong in concluding


that there is an association between the dependent and independent
variables (i.e., the probability of falsely rejecting the null hypothesis, or
committing a Type I error, based on F). The smaller the P value, the
greater the probability that there is an association.

Traditionally, you can conclude that the independent variable can be


used to predict the dependent variable when P ≤ 0.05.

Variables in Model Information about the independent variables used in the regression
equation for the current step is listed under this heading. The values of
the variable coefficients, standard errors, the F-to-Remove, and the
corresponding P value for the F-to-Remove are listed. These statistics
are displayed for each step. An asterisk (*) indicates variables that were
forced into the model.

Coefficients The value for the constant and coefficients of the


independent variables for the regression model are listed.

Standard Error The standard errors are estimates of the uncertainties in the regression
coefficients (analogous to the standard error of the mean). The true
regression coefficients of the underlying population generally fall within
about two standard errors of the observed sample coefficients. Large
standard errors may indicate multicollinearity.



F-to-Enter The F-to-Enter gauges the increase in predicting the
dependent variable gained by adding the independent variable to the
regression equation. It is the ratio

F-to-Enter = regression variation from the dependent variable mean associated with
adding x_j when x_1, ..., x_(j-1) are already in the equation / residual variation
about the regression containing x_1, ..., x_j = MS(x_j | x_1, ..., x_(j-1)) / MS_res(x_1, ..., x_j)

If the F-to-Enter for a variable is larger than the F-to-Enter cutoff


specified with the Stepwise Regression options, the variable remains in
or is added back to the equation.

( Note that the F-to-Remove value is the cutoff that determines if a variable is
removed from or stays out of the equation.

P Value P is the P value calculated for the F-to-Enter value. The P


value is the probability of being wrong in concluding that adding the
independent variable contributes to predicting the dependent variable
(i.e., the probability of falsely rejecting the null hypothesis, or
committing a Type I error, based on F-to-Enter). The smaller the P
value, the greater the probability that adding the variable contributes to
the model.

Traditionally, you can conclude that the independent variable can be


used to predict the dependent variable when P < 0.05.

Variables The variables not entered or removed from the model are listed under
not in Model this heading, along with their corresponding F-to-Remove and P values.

F-to-Remove The F-to-Remove gauges the increase in predicting the


dependent variable gained by removing the independent variable from
the regression equation.

If the F-to-Remove for a variable is larger than the F-to-Remove cutoff


specified with the stepwise regression options, the variable is removed
from or stays out of the equation.

( Note that it is the F-to-Enter value that determines which variable is re-
entered into or remains in the equation.



P Value P is the P value calculated for the F-to-Remove value. The P
value is the probability of being wrong in concluding that removing the
independent variable contributes to predicting the dependent variable
(i.e., the probability of falsely rejecting the null hypothesis, or
committing a Type I error, based on F-to-Remove). The smaller the P
value, the greater the probability that removing the variable contributes
to the model.

Traditionally, you can conclude that the independent variable can be


used to predict the dependent variable when P < 0.05.

PRESS Statistic PRESS, the Predicted Residual Error Sum of Squares, is a gauge of how
well a regression model predicts new data. The smaller the PRESS
statistic, the better the predictive ability of the model.

The PRESS statistic is computed by summing the squares of the


prediction errors (the differences between predicted and observed values)
for each observation, with that point deleted from the computation of
the regression equation.
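
As an illustration of this definition (not SigmaStat's own code), the sketch below
computes PRESS by refitting the regression once per observation with that
observation deleted; it assumes NumPy and hypothetical column data:

    import numpy as np

    def press_statistic(X, y):
        # Leave-one-out PRESS: refit the model with each point deleted and
        # sum the squared prediction errors for the deleted points.
        n = len(y)
        A = np.column_stack([np.ones(n), X])
        press = 0.0
        for i in range(n):
            keep = np.arange(n) != i
            b = np.linalg.lstsq(A[keep], y[keep], rcond=None)[0]
            press += (y[i] - A[i] @ b) ** 2
        return press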

Durbin-Watson Statistic The Durbin-Watson statistic is a measure of correlation between the


residuals. If the residuals are not correlated, the Durbin-Watson statistic
will be 2; the more this value differs from 2, the greater the likelihood
that the residuals are correlated. This result appears if it was selected in
the Options for Stepwise Regression dialog box.

Regression assumes that the residuals are independent of each other; the
Durbin-Watson test is used to check this assumption. If the Durbin-
Watson value deviates from 2 by more than the value set in the Options
for Stepwise Regression dialog box, a warning appears in the report. The
suggested trigger value is a difference of more than 0.50, i.e., when the
Durbin-Watson statistic is less than 1.5 or greater than 2.5.
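
The statistic itself is simple to reproduce; the following sketch (assuming NumPy,
with the residuals taken in observation order) shows the usual definition:

    import numpy as np

    def durbin_watson(residuals):
        # Sum of squared successive differences divided by the sum of squared
        # residuals; values near 2 indicate uncorrelated residuals.
        e = np.asarray(residuals, dtype=float)
        return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)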

Normality Test The Normality test result displays whether the data passed or failed the
test of the assumption that the source population is normally distributed
around the regression, and the P value calculated by the test. Regression
requires the source population to be normally distributed about the
regression. When this assumption may be violated, a warning appears in
the report. This result appears unless you disabled normality testing in
the Options for Stepwise Regression dialog box (see page 583).



Failure of the normality test can indicate the presence of outlying
influential points or an incorrect regression model.

Constant The constant variance test result displays whether or not the data passed
Variance Test or failed the test of the assumption that the variance of the dependent
variable in the source population is constant regardless of the value of
the independent variable, and the P value calculated by the test. When
the constant variance assumption may be violated, a warning appears in
the report.

If you receive this warning, you should consider trying a different model
(i.e., one that more closely follows the pattern of the data), or
transforming the independent variable to stabilize the variance and
obtain more accurate estimates of the parameters in the regression
equation. See Chapter 14, Using Transforms for more information on
the appropriate transform to use.

Power This result is displayed if you selected this option in the Options for
Stepwise Regression dialog box.

The power, or sensitivity, of a regression is the probability that the model


correctly describes the relationship of the variables, if there is a
relationship.

Regression power is affected by the number of observations, the chance


of erroneously reporting a difference α (alpha), and the slope of the
regression.

Alpha (α) Alpha (α) is the acceptable probability of incorrectly
concluding that the model is correct. An α error is also called a Type I
error (a Type I error is when you reject the hypothesis of no association
when this hypothesis is true).

The α value is set in the Power Options dialog box; the suggested value is
α = 0.05, which indicates that a one in twenty chance of error is
acceptable. Smaller values of α result in stricter requirements before
concluding the model is correct, but a greater possibility of concluding
the model is bad when it is really correct (a Type II error). Larger values
of α make it easier to conclude that the model is correct, but also
increase the risk of accepting a bad model (a Type I error).



Regression Diagnostics The regression diagnostic results display only the values for the predicted
and residual results selected in the Options for Stepwise Regression
dialog box (see page 585). All results that qualify as outlying values are
flagged with a # symbol. The trigger values to flag residuals as outliers
are set in the Options for Stepwise Regression dialog box.

If you selected Report Cases with Outliers Only, only those observations
that have one or more residuals flagged as outliers are reported; however,
all other results for that observation are also displayed.

Predicted Values This is the value for the dependent variable predicted
by the regression model for each observation. If these values were saved
to the worksheet, they may be used to plot the regression using
SigmaPlot.

Residuals These are the raw residuals, the difference between the
predicted and observed values for the dependent variables.

Standardized Residuals The standardized residual is the raw residual


divided by the standard error of the estimate.

If the residuals are normally distributed about the regression, about 66%
of the standardized residuals have values between -1 and 1, and about
95% of the standardized residuals have values between -2 and 2. A
larger standardized residual indicates that the point is far from the
regression; the suggested value flagged as an outlier is 2.5.

Studentized Residuals The Studentized residual is a standardized


residual that also takes into account the greater confidence of the data
points in the “middle” of the data set. By weighting the values of the
residuals of the extreme data points (those with the lowest and highest
independent variable values), the Studentized residual is more sensitive
than the standardized residual in detecting outliers.

Both Studentized and Studentized deleted residuals that lie outside a


specified confidence interval for the regression are flagged as outlying
points: the suggested confidence value is 95%.

This residual is also known as the internally Studentized residual,


because the standard error of the estimate is computed using all data.

Studentized Deleted Residual The Studentized deleted residual, or


externally Studentized residual, is a Studentized residual which uses the



standard error of the estimate S_y|x(-i), computed by deleting the data
point associated with the residual. This reflects the greater effect of
outlying points by deleting the data point from the variance
computation.

Both Studentized and Studentized deleted residuals that lie outside a


specified confidence interval for the regression are flagged as outlying
points; the suggested confidence value is 95%.

The Studentized deleted residual is more sensitive than the Studentized


residual in detecting outliers, since the Studentized deleted residual
results in much larger values for outliers than the Studentized residual.
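
The three residual forms differ only in how they are scaled. The sketch below (a
NumPy illustration of the standard formulas, not SigmaStat's implementation)
computes all of them from the hat matrix of an ordinary least-squares fit:

    import numpy as np

    def residual_diagnostics(X, y):
        # Raw, standardized, internally Studentized, and Studentized deleted
        # residuals for an ordinary least-squares fit.
        n = len(y)
        A = np.column_stack([np.ones(n), X])
        p = A.shape[1]                                   # number of parameters (k + 1)
        H = A @ np.linalg.inv(A.T @ A) @ A.T             # hat matrix
        h = np.diag(H)                                   # leverages
        e = y - H @ y                                    # raw residuals
        s = np.sqrt(e @ e / (n - p))                     # standard error of the estimate
        standardized = e / s
        studentized = e / (s * np.sqrt(1 - h))           # internally Studentized
        deleted = studentized * np.sqrt((n - p - 1) / (n - p - studentized ** 2))
        return e, standardized, studentized, deleted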

Influence Diagnostics The influence diagnostic results display only the values for the results
selected in the Options dialog box under the Other Diagnostics tab (see
page 588). All results that qualify as outlying values are flagged with a #
symbol. The trigger values to flag data points as outliers are also set in
the Options dialog box under the Other Diagnostics tab.

If you selected Report Cases with Outliers Only, only observations that
have one or more observations flagged as outliers are reported; however,
all other results for that observation are also displayed.

Cook's Distance Cook's distance is a measure of how great an effect


each point has on the estimates of the parameters in the regression
equation. It is a measure of how much the values of the regression
equation would change if that point is deleted from the analysis.

Values above 1 indicate that a point is possibly influential. Cook's


distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. Points with Cook's distances greater
than the specified value are flagged as influential; the suggested value is
4.

Leverage Leverage values identify potentially influential points.


Observations with leverages a specified factor greater than the expected
leverages are flagged as potentially influential points; the suggested value
is 2.0 times the expected leverage.

The expected leverage of a data point is (k + 1)/n, where there are k
independent variables and n data points.



Because leverage is calculated using only the independent variables, high
leverage points tend to be at the extremes of the independent variables
(large and small values), where small changes in the independent
variables can have large effects on the predicted values of the dependent
variable.

DFFITS The DFFITS statistic is a measure of the influence of a data


point on regression prediction. It is the number of estimated standard
errors the predicted value for a data point changes when the observed
value is removed from the data set before computing the regression
coefficients.

Predicted values that change by more than the specified number of


standard errors when the data point is removed are flagged as influential:
the suggested value is 2.0 standard errors.
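
For readers who want to reproduce these diagnostics outside SigmaStat, the
following NumPy sketch computes leverage, Cook's distance, and DFFITS from the
textbook formulas and flags points using the suggested trigger values (4 for
Cook's distance, 2.0 times the expected leverage, and 2.0 standard errors for
DFFITS); the data and function name are hypothetical:

    import numpy as np

    def influence_diagnostics(X, y):
        # Leverage, Cook's distance, and DFFITS for each observation.
        n = len(y)
        A = np.column_stack([np.ones(n), X])
        p = A.shape[1]                                   # parameters: k variables + constant
        h = np.diag(A @ np.linalg.inv(A.T @ A) @ A.T)    # leverages; expected value is p / n
        e = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
        s2 = e @ e / (n - p)
        r = e / np.sqrt(s2 * (1 - h))                    # internally Studentized residuals
        t = r * np.sqrt((n - p - 1) / (n - p - r ** 2))  # externally Studentized residuals
        cooks_d = (r ** 2 / p) * h / (1 - h)
        dffits = t * np.sqrt(h / (1 - h))
        flagged = (cooks_d > 4) | (h > 2 * p / n) | (np.abs(dffits) > 2)
        return h, cooks_d, dffits, flagged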

What to Do About Influential Points Influential points have two


possible causes:

➤ There is something wrong with the data point, caused by an error in


observation or data entry.
➤ The model is incorrect.

If a mistake was made in data collection or entry, correct the value. If


you do not know the correct value, you may be able to justify deleting
the data point. If the model appears to be incorrect, try regression with
different independent variables, or a Nonlinear Regression.

For descriptions of how to handle influential points, you can reference


an appropriate statistics reference. For a list of suggested references, see
page 12.

Confidence Intervals These results are displayed if you selected them in the Options for
Stepwise Regression dialog box. If the confidence interval does not
include zero, you can conclude that the coefficient is different than zero
with the level of confidence specified. This can also be described as P <
α (alpha), where α is the acceptable probability of incorrectly
concluding that the coefficient is different than zero, and the confidence
interval is 100(1 - α)%.

The specified confidence level can be any value from 1 to 99; the
suggested confidence level for both intervals is 95%.



Pred (Predicted Values) This is the value for the dependent variable
predicted by the regression model for each observation.

Mean The confidence interval for the regression gives the range of
variable values computed for the region containing the true relationship
between the dependent and independent variables, for the specified level
of confidence.

Obs (Observations) The confidence interval for the population gives


the range of variable values computed for the region containing the
population from which the observations were drawn, for the specified
level of confidence.

Stepwise Regression Report Graphs 10

You can generate up to six graphs using the results from a Stepwise
Regression. They include a:

➤ Histogram of the residuals.


➤ Scatter plot of the residuals.
➤ Bar chart of the standardized residuals.
➤ Normal probability plot of the residuals.
➤ Line/scatter plot of the regression with one independent variable
and confidence and prediction intervals.
➤ 3D scatter plot of the residuals.

Histogram of Residuals The Stepwise Regression histogram plots the raw residuals in a specified
range, using a defined interval set. The residuals are divided into a
number of evenly incremented histogram intervals and plotted as
histogram bars indicating the number of residuals in each interval. The
X axis represents the histogram intervals, and the Y axis represents the
number of residuals in each group. For an example of a histogram, see
page 153.

Scatter Plot of The Stepwise Regression scatter plot of the residuals plots the residuals
the Residuals of the data in the selected independent variable column as points relative
to the standard deviations. The X axis represents the independent
variable values, the Y axis represents the residuals of the variables, and
the horizontal lines running across the graph represent the standard
deviations of the data. For an example of a scatter plot, see page 152.



Bar Chart of The Stepwise Regression bar chart of the standardized residuals plots the
the Standardized standardized residuals of the data in the selected independent variable
Residuals column as points relative to the standard deviations. The X axis
represents the selected independent variable values, the Y axis represents
the residuals of the variables, and the horizontal lines running across the
graph represent the standard deviations of the data. For an example of a
bar chart, see page 153.

Normal The Stepwise Regression probability plot graphs standardized residuals


Probability Plot versus their cumulative frequencies along a probability scale. The
residuals are sorted and then plotted as points around a curve
representing the area of the Gaussian distribution. Plots with residuals that
fall along the Gaussian curve indicate that your data was taken from a normally
distributed population. The X axis is a linear scale representing the
residual values. The Y axis is a probability scale representing the
cumulative frequency of the residuals. For an example of a normal
probability plot, see page 155.

Line/Scatter Plot The Stepwise Regression line/scatter graph plots the observations of the
of the Regression with stepwise regression for the data of the selected independent variable
Prediction and column as a line/scatter plot. The points represent the dependent
Confidence Intervals variable data plotted against the selected independent variable data, the
solid line running through the points represents the regression line, and
the dashed lines represent the prediction and confidence intervals. The
X axis represents the independent variables and the Y axis represents the
dependent variables. For an example of a line/scatter plot, see page 156.

3D Residual The stepwise regression 3D residual scatter plot graphs the residuals of
Scatter Plot the two selected columns of independent variable data. The X and the Y
axes represent the independent variables, and the Z axis represents the
residuals. For an example of a 3D residual scatter plot, see page 156.

Creating Stepwise To generate a graph of Stepwise Regression report data:


Regression
Report Graphs 1 Click the toolbar button, or choose the Graph menu Create
Graph command when the stepwise linear regression report is



selected. The Create Graph dialog box appears displaying the
types of graphs available for the Stepwise Regression results.

FIGURE 12–58
The Create Graph Dialog
Box
for the Stepwise
Regression Report

2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
For more information on each of the graph types, see pages 12-607
through 12-608.

If you select Scatter Plot Residuals, Bar Chart Std Residuals,


Regression, Conf. & Pred, a dialog box appears prompting you to
select the column with independent variables you want to use in
the graph. If you select 3D Scatter & Mesh, or 3D Residual
Scatter, and you have more than two columns of independent
variables, a dialog box appears prompting you to select the two
columns with the independent variables you want to plot.

FIGURE 12–59
Select X Independent
Variable Prompting you to
Select The Independent
Variable You Want to Plot



3 Select the columns with the independent variables you want to use
in the graph, then select OK. The graph appears using the
specified independent variables.

FIGURE 12–60
Example of a 3D Scatter
Plot of the Residuals for a
Stepwise Regression Report

For information on manipulating graphs, see page 182 through page


202.



Best Subsets Regression 10

Use Linear Best Subsets Regression when you:

➤ Need to predict a trend in the data, or predict the value of one


variable from the values of one or more other variables, by fitting a
line or plane (or hyperplane) through the data.
➤ Do not know which independent variables contribute to the
prediction of the dependent variable, and you want to find the
subsets of independent variables that best contribute to predicting
the dependent variable.

The independent variable is the known, or predictor, variable. When


the independent variable is varied, a corresponding value for the
dependent, or response, variable is produced.

If you already know which independent variables to use, use Multiple


Linear Regression. If you want to select the equation model by
incrementally adding or deleting variables from the model, use Stepwise
Regression. If the relationship is not a straight line or plane, use
Polynomial or Nonlinear Regression.

About Best Best Subsets Regression is a technique for selecting variables in a


Subset Regression multiple linear regression by systematically searching through the
different combinations of the independent variables and selecting the
subsets of variables that best contribute to predicting the dependent
variable.

Best Subset Regression assumes an association between the independent


and dependent variables that fits the general equation for a
multidimensional plane:

y = b0 + b1 x1 + b2 x2 + b3 x3 + ... + bk xk

where y is the dependent variable, x1, x2, x3, ..., xk are the independent
variables, and b0, b1, b2,...,bk are the regression coefficients. As the
values for xi vary, the corresponding value for y either increases or
decreases. Best subsets regression searches for those combinations of the
independent variables that give the “best” prediction of the dependent
variable. There are several criteria for “best,” and the results depend on



which criterion you select. These criteria are specified in the Options for
Best Subset Regression dialog box.

No predicted values, residuals, graphs, or other results are produced with


a best subsets regression. To view results, note which independent
variables were used for the desired model, then perform a multiple linear
regression using only those independent variables.

“Best” Subsets Criteria There are three statistics that can be used to evaluate which subsets of
variables best contribute to predicting the dependent variable. For a
further discussion of these statistics, you can reference an appropriate
statistics reference. For a list of suggested references, see page 12.

R2 R2, the coefficient of determination for multiple regression, is a


measure of how well the regression model describes the data. The larger
the value of R2, the better the model predicts the dependent variable.

However, the number of variables used in the equation is not taken into
account. Consequently, equations with more variables will always have
higher R2 values, whether or not the additional variables really
contribute to the prediction.
Adjusted R2 The adjusted R2, R2adj, is a measure of how well the
regression model describes the data based on R2, but takes into account
the number of independent variables.

Mallows Cp Cp is a gauge of the size of the bias introduced into the


estimate of the dependent variable when independent variables are
omitted from the regression equation, as computed from the number of
parameters plus a measure of the difference between the predicted and
true population means of the dependent variable.

The optimal value of Cp is equal to the number of parameters (the


independent variables used in the subset plus the constant), or:

Cp = p = k + 1

where p is the number of parameters and k is the number of independent


variables.



The closer the value of Cp is to the number of parameters, the less likely
a relevant variable was omitted. Note that the fully specified model will
always have a Cp = p.
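
The following sketch (assuming NumPy; the exhaustive search is only an
illustration of the criteria, not necessarily the algorithm SigmaStat itself uses)
shows how R2, adjusted R2, and Cp can be computed for every subset of candidate
variables; the data and function name are hypothetical:

    import numpy as np
    from itertools import combinations

    def best_subsets(X, y, n_best=3):
        # Score every subset of independent variables with R2, adjusted R2,
        # and Mallows Cp, then return the n_best subsets by adjusted R2.
        n, k = X.shape

        def sse(cols):
            A = np.column_stack([np.ones(n), X[:, cols]])
            b = np.linalg.lstsq(A, y, rcond=None)[0]
            return np.sum((y - A @ b) ** 2)              # residual sum of squares

        ss_total = np.sum((y - y.mean()) ** 2)
        mse_full = sse(tuple(range(k))) / (n - k - 1)    # error mean square, full model
        results = []
        for size in range(1, k + 1):
            for cols in combinations(range(k), size):
                ss_res = sse(cols)
                p = size + 1                             # parameters = variables + constant
                r2 = 1 - ss_res / ss_total
                r2_adj = 1 - (ss_res / (n - p)) / (ss_total / (n - 1))
                cp = ss_res / mse_full - n + 2 * p
                results.append((cols, r2, r2_adj, cp))
        return sorted(results, key=lambda item: item[2], reverse=True)[:n_best]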

Performing a Best To perform a Best Subset Regression:


Subset Regression
1 Enter or arrange your data appropriately in the worksheet
(see following section).

2 If desired, set the Best Subset Regression options using the


Options for Best Subset Regression dialog box (page 614).

3 Select Best Subset Regression from the toolbar drop-down list,


then select the button, or choose the Statistics menu Regression
command, then choose Best Subset.

4 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 618).

5 View and interpret the Best Subset Regression report (page 619).

Arranging Best Subset Regression Data 10

Place the data for the observed dependent variable in a single column
and the corresponding data for the independent variables in one or more
columns. Rows containing missing values are ignored, and the columns
must be of equal length.

FIGURE 12–61
Data Format for a
Best Subset Regression



Selecting When running a Best Subset Regression, you can either:
Data Columns
➤ Select the columns to test from the worksheet before choosing the
test, or
➤ Select the columns while performing the test

Setting Best Subset Regression Options 10

Use the Best Subset Regression options to:

➤ Specify the criterion to use to predict the dependent variable and the
number of subsets used in the equation.
➤ Enable the variance inflation factor to identify potential difficulties
with the regression parameter estimates (multicollinearity).

To change Best Subset Regression options:

1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over the data.

2 Open the Options for Best Subset Regression dialog box by


selecting Best Subset Regression from the toolbar drop-down list,
then click the button, or choose the Statistics menu Current
Test Options... command. The criterion options appear (see
Figure 12–62 on page 615).

3 Click the Other Diagnostics tab to view the Variance Inflation


Factor option. Click the Criterion tab to return to the Best
Criterion and Number of Subset options.

4 Select a check box to enable or disable a test option. Options


settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 12-612 through
12-618.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 618 for more information).

6 To accept the current settings and close the options dialog box,
click OK. To accept the current setting without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.



( You can select Help at any time to access SigmaStat’s on-line help system.

Criterion Options Use the Best Criterion option to select the criterion used to determine
the best subsets and the Number of Subsets option to specify the
number of subsets to list.

Mallows Cp Select Mallows C(p) from the Best Criterion drop-down


list to use a gauge of the bias introduced when variables are omitted to
quickly screen large numbers of potential variables and produce a few
subsets that include only the relevant variables. The number of subsets
listed is equal to the number set with the Number of Subsets option.

R Squared Select R Squared (R2) from the Best Criterion drop-down


list to use the largest coefficient of determination to find the best fitting
subset. R2 contains no information on the number of variables used, so
subsets are listed for each number of possible variables (i.e., one
independent variable, two variables, etc., up to all variables selected).
The maximum number of subsets listed for each number of possible
variables is equal to the Number of Subsets option (see page 615).

Adjusted R Squared Select Adjusted R Squared (Adjusted R2) from the
Best Criterion drop-down list to use the largest R2adj values to select the
best regressions. R2adj takes into account the loss of degrees of freedom
when additional independent variables are added to the regression
equation. The number of subsets listed is equal to the number set with
the Number of Subsets option.

FIGURE 12–62
The Options for Best Subset
Regression Dialog Box

Number of Subsets Use this option to specify the number of most


contributing variable groups to list by entering the desired value in the
Number of Subsets edit box. For Cp and R2adj, this is the total number



of subsets. For R2, this is the number of variable subsets listed for each
number of independent variables in the equation.

Variance Use the Variance Inflation Factor option (see Figure 12–62 on page 615) to
Inflation Factor measure the multicollinearity of the independent variables, or the linear
combination of the independent variables in the fit.

Regression procedures assume that the independent variables are


statistically independent of each other, i.e., that the value of one
independent variable does not affect the value of another. However, this
ideal situation rarely occurs in the real world. When the independent
variables are correlated, or contain redundant information, the estimates
of the parameters in the regression model can become unreliable.

FIGURE 12–63
A Graph with Multicollinear Data Points
Note that knowing the value of one of the independent variables allows you to
predict the other, so the independent variables are not statistically independent.
[3D scatter plot: Independent x1 and Independent x2 on the horizontal axes, Dependent y on the vertical axis]

The parameters in regression models quantify the theoretically unique


contribution of each independent variable to predicting the dependent
variable. When the independent variables are correlated, they contain
some common information and “contaminate” the estimates of the
parameters. If the multicollinearity is severe, the parameter estimates
can become unreliable.

There are two types of multicollinearity.

Structural Multicollinearity Structural multicollinearity occurs when


the regression equation contains several independent variables which are
functions of each other. The most common form of structural
multicollinearity occurs when a polynomial regression equation contains



several powers of the independent variable. Because these powers (e.g.,
x, x2, etc.) are correlated with each other, structural multicollinearity
occurs. Including interaction terms in a regression equation can also
result in structural multicollinearity.

Sample-Based Multicollinearity Sample-based multicollinearity


occurs when the sample observations are collected in such a way that the
independent variables are correlated (for example, if age, height, and
weight are collected on children of varying ages, each variable has a
correlation with the others).

SigmaStat can automatically detect multicollinear independent variables


using the variance inflation factor. Click the Other Diagnostics tab in
the Options dialog box to view the Variance Inflation Factor option.

Flagging Multicollinear Data Use the value in the Flag Values > edit
box as a threshold for multicollinear variables. The default threshold
value is 4.0, meaning that any value greater than 4.0 will be flagged as
multicollinear. To make this test more sensitive to possible
multicollinearity, decrease this value. To allow greater correlation of the
independent variables before flagging the data as multicollinear, increase
this value.

When the variance inflation factor is large, there are redundant variables
in the regression model, and the parameter estimates may not be reliable.
Variance inflation factor values above 4 suggest possible
multicollinearity; values above 10 indicate serious multicollinearity.
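
The variance inflation factor for each variable is 1/(1 - R2), where R2 comes
from regressing that variable on all the other independent variables. A minimal
NumPy sketch of this definition (not SigmaStat's own code; the function name is
illustrative):

    import numpy as np

    def variance_inflation_factors(X):
        # For each independent variable: regress it on the others and compute
        # 1 / (1 - R2).  Values above about 4 suggest multicollinearity.
        n, k = X.shape
        vifs = []
        for j in range(k):
            others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
            b = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
            resid = X[:, j] - others @ b
            r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
            vifs.append(1.0 / (1.0 - r2))
        return np.array(vifs)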

What to Do About Multicollinearity Sample-based multicollinearity


can sometimes be resolved by collecting more data under other
conditions to break up the correlation among the independent variables.
If this is not possible, the regression equation is overparameterized and
one or more of the independent variables must be dropped to eliminate
the multicollinearity.

Structural multicollinearities can be resolved by centering the


independent variable before forming the power or interaction terms.
Use the Transform menu Center command to center the data; see
Chapter 14, Using Transforms for more information on using
transforms to modify data.
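
As a quick illustration of why centering helps (a NumPy sketch with hypothetical
data, independent of SigmaStat's Center command):

    import numpy as np

    # x and x**2 are strongly correlated, but after centering the
    # correlation between the linear and quadratic terms essentially vanishes.
    x = np.linspace(1, 10, 50)
    centered_x = x - x.mean()
    print(np.corrcoef(x, x ** 2)[0, 1])                    # close to 1
    print(np.corrcoef(centered_x, centered_x ** 2)[0, 1])  # close to 0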



For descriptions of how to handle multicollinearity, you can reference an
appropriate statistics reference. For a list of suggested references, see
page 12.

Report Flagged Values Only To include only the points flagged
by the influential point tests and values exceeding the variance inflation
threshold in the report, make sure the Report Flagged Values Only check
box is selected. Uncheck this option to include all influential points in
the report.

Running a Best Subset Regression 10

To run a Best Subset Regression, you need to select the data to test. The
Pick Columns dialog box is used to select the worksheet columns with
the data you want to test.

To run a Best Subset Regression:

1 If you want to select your data before you run the regression, drag
the pointer over your data.

2 Open the Pick Columns dialog box to start the best subset
regression. You can either:

➤ Select Best Subset Regression from the toolbar drop-down list,


then select the button.
➤ Choose the Statistics menu Regression command, then choose
Best Subset...
➤ Select the Run Test button in the Options for Best Subset
Regression dialog box (see step 5 on page 614).

If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.

3 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Dependent and Independent drop-down list.

The first selected column is assigned to the Dependent Variable


row in the Selected Columns list, and the following columns are
assigned to the Independent Variable rows. The title of selected



columns appears in each row. You are prompted for one dependent
variable column and up to 64 independent variable columns.

FIGURE 12–64
The Pick Columns
for Best Subset Regression
Dialog Box Prompting You to
Select Data Columns

4 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

5 Select Finish to run the regression. The Best Subset Regression is


performed. When the test is complete, the Best Subset regression
report appears.

( No predicted values, residuals and other test results are computed or placed
in the worksheet. To view results for models, note which independent
variables were used for that model, then perform a Multiple Linear
Regression using only those independent variables.

Interpreting Best Subsets Regression Results 10

A Best Subsets Regression report lists a summary table of the “best”


criteria statistics for all variable subsets, along with the error mean square
and the specific member variables of the subset. Detailed results for each
subset regression equation are then listed individually.

Note that the number of subsets listed is determined by the number of


subsets selected in the Options for Best Subsets Regression dialog box,
and the criterion used to select the best subsets.



➤ If you used R2, the maximum number of subsets reported for each
number of variables included is the number set in the Best Subsets
Regression Options dialog box.
➤ If you used R2adj or Cp, the number of subset results reported is the
number set in the Options for Best Subsets Regression dialog box.

( You cannot generate report graphs for Best Subsets Regression. To view a
graph, perform a Multiple Linear Regression using the variables in the
subset(s) of interest, and graph those results. For information on performing
Multiple Linear Regression, see page 495.

( The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options command and unselect the Explain Results
option.

The number of decimal places displayed is controlled in the Report


Options dialog box. For more information on setting report options, see
page 135.

Summary Table Regression Model This is the subset model number that corresponds
to the numbers of the more detailed regression equation statistics.

Variables The variables included in the subset are noted by asterisks (*)
which appear below the variable symbols on the right side of the table.

Mallows Cp Cp is a gauge of the bias introduced into the estimate of the


dependent variable when independent variables are omitted from the
regression equation. The optimal value of Cp is equal to the number of
parameters (the independent variables used in the subset plus the
constant), or

Cp = p = k + 1

where p is the number of parameters and k is the number of independent


variables. The closer the value of Cp is to the number of parameters, the
less likely a relevant variable was omitted. Subsets with low orders that



FIGURE 12–65
An Example of
a Best Subset
Regression Report
Using the
Coefficient of
Determination R2 as
the “Best” Criterion

also have Cp values close to k + 1 are good candidates for the best subset
of variables.

R2 R2, the coefficient of determination for multiple regression, is a


measure of how well the regression model describes the data. The closer
the value of R2 to 1, the better the model predicts the dependent
variable. However, because the number of variables used is not taken
into account, higher order subsets will always have higher R2 values,
whether or not the additional variables really contribute to the
prediction.
Adjusted R2 The adjusted R2, R2adj, is a measure of how well the
regression model describes the data based on R2, but takes into account
the number of independent variables.

Larger R2adj values (nearer to 1) indicate that the equation is a good
description of the relation between the independent and dependent
variables. Note that the subset that includes all variables always has a
Cp = p.



MSerr (Error Mean Square) The error mean square (residual, or
within groups):

error sum of squares / error degrees of freedom = SS_error / DF_error = MS_error

is an estimate of the variability in the underlying population, computed


from the random component of the observations.

Residual Sum of Squares The residual sum of squares is a measure of


the size of the residuals, which are the differences between the observed
values of the dependent variable and the values predicted by the
regression model.

Subsets Results Tables of statistical results are listed for each regression equation
identified in the summary table.

Coefficient The value for the constant and coefficients of the


independent variables for the regression model are listed.

Std Err (Standard Error) The standard errors are estimates of the
uncertainties in the regression coefficients (analogous to the standard
error of the mean).
The true regression coefficients of the underlying population generally
fall within about two standard errors of the observed sample coefficients.
Large standard errors may indicate multicollinearity. These values are
used to compute t for the regression coefficients.

t Statistic The t statistic tests the null hypothesis that the coefficient of
each independent variable is zero, that is, the independent variable does
not contribute to predicting the dependent variable. t is the ratio of the
regression coefficient to its standard error, or:

t = regression coefficient / standard error of regression coefficient

You can conclude from “large” t values that the independent variable(s)
can be used to predict the dependent variable (i.e., that the coefficient is
not zero).
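
Given a coefficient and its standard error from the report, the t statistic and
its two-sided P value can be checked with a few lines of SciPy (the numbers
below are hypothetical):

    from scipy import stats

    # Hypothetical values read from a regression report
    coefficient, std_err, DF_res = 1.82, 0.61, 12
    t = coefficient / std_err
    P = 2 * stats.t.sf(abs(t), DF_res)   # two-sided P value for the t statistic
    print(t, P)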

P Value P is the P value calculated for t. The P value is the probability


of being wrong in concluding that there is a true association between the



variables (i.e., the probability of falsely rejecting the null hypothesis, or
committing a Type I error, based on t). The smaller the P value, the
greater the probability that the independent variable helps predict the
dependent variable.

Traditionally, you can conclude that the independent variable can be


used to predict the dependent variable when P < 0.05.

VIF (Variance Inflation Factor) The variance inflation factor is a


measure of multicollinearity. It measures the “inflation” of a regression
parameter (coefficient) for an independent variable due to redundant
information in other independent variables.

If the variance inflation factor is at or near 1.0, there is no redundant


information in the other independent variables. If the variance inflation
factor is much larger, there are redundant variables in the regression
model, and the parameter estimates may not be reliable.

This result appears unless it was disabled in the Options for Best Subset
Regression dialog box (see page 623).

Pearson Product Moment Correlation 10

Use Pearson Product Moment Correlation when:

➤ You want to measure the strength of the association between pairs of


variables without regard to which variable is dependent or
independent.
➤ You want to determine if the relationship, if any, between the
variables is a straight line.
➤ The residuals (distances of the data points from the regression line)
are normally distributed with constant variance.

The Pearson Product Moment Correlation coefficient is the most


commonly used correlation coefficient.

If you want to predict the value of one variable from another, use Simple
or Multiple Linear Regression. If you need to find the correlation of
data measured by rank or order, use the nonparametric Spearman Rank
Order Correlation.



About the When an assumption is made about the dependency of one variable on
Pearson Product another, it affects the computation of the regression line. Reversing the
Moment Correlation assumption of the variable dependencies results in a different regression
Coefficient line.

The Pearson Product Moment Correlation coefficient does not require


the variables to be assigned as independent and dependent. Instead,
only the strength of association is measured.

Pearson Product Moment Correlation is a parametric test that assumes


the residuals (distances of the data points from the regression line) are
normally distributed with constant variance.

Computing the Pearson To compute the Pearson Product Moment Correlation coefficient:
Product Moment
Correlation Coefficient 1 Enter or arrange your data appropriately in the data worksheet
(see following section).

2 Select Pearson Correlation from the toolbar, then select the


button, or choose the Statistics menu Correlation command, then
choose Pearson Product Moment.

3 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (page 625).

4 View and interpret the Pearson Product Moment Report and


generate the report graph (pages 12-627 and 12-629).

Arranging Pearson Product Moment Correlation Data 10

Place the data for each variable in a column. You must have at least two
columns of variables, with a maximum of 64 columns. Observations



containing missing values are ignored, including missing values created
by columns of unequal length.

FIGURE 12–66
Data for Computing a
Pearson Product Moment
Correlation Coefficient

Selecting When computing a coefficient, you can either:


Data Columns
➤ Select the columns to test from the worksheet before choosing the
test, or
➤ Select the columns while computing the coefficient

Running a Pearson Product Moment Correlation 10

To run a Pearson Product Moment test, you need to select the data to
test. The Pick Columns dialog box is used to select the worksheet
columns with the data you want to test.

To run a Pearson Product Moment Correlation:

1 If you want to select your data before you run the correlation, drag
the pointer over your data.

2 Open the Pick Columns dialog box to start the Pearson Product
Moment Correlation. You can either:

➤ Select Pearson Correlation from the toolbar drop-down list,
then select the button.
➤ Choose the Statistics menu Correlation command, then choose
Pearson Product Moment...



If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.

3 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Variable drop-down list.

The selected columns are assigned to the Variables row in the


Selected Columns list in the order they are selected from the
worksheet. The title of selected columns appears in each row. You
can select up to 64 variable columns. SigmaStat computes the
correlation coefficient for every possible pair.

FIGURE 12–67
The Pick Columns
for Pearson Correlation
Dialog Box Prompting You to
Select Data Columns

4 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

5 Select Finish. The correlation coefficient is computed. When the


test is complete, the Pearson Product Moment Correlation
Coefficient report appears.



Interpreting Pearson Product Moment Correlation Results 10

The report for a Pearson Product Moment Correlation displays the


correlation coefficient r, the P value for the correlation coefficient, and
the number of data points used in the computation, for each pair of
variables.

( The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options command and unselect the Explain Results
option.

( The number of decimal places displayed is also set in the Report Options
dialog box. For more information on setting report options see page 135.
FIGURE 12–68
The Pearson
Product Moment
Correlation
Results Report



Correlation Coefficient The correlation coefficient r quantifies the strength of the association
between the variables. r varies between -1 and 1. A correlation
coefficient near 1 indicates there is a strong positive relationship
between the two variables, with both always increasing together. A
correlation coefficient near -1 indicates there is a strong negative
relationship between the two variables, with one always decreasing as the
other increases. A correlation coefficient of 0 indicates no relationship
between the two variables.

P Value The P value is the probability of being wrong in concluding that there is
a true association between the variables (i.e., the probability of falsely
rejecting the null hypothesis, or committing a Type I error). The smaller
the P value, the greater the probability that the variables are correlated.

Traditionally, you can conclude that the variables are significantly
correlated when P < 0.05.

Number of Samples This is the number of data points used to compute the correlation
coefficient. This number reflects samples omitted because of missing
values in one of the two variables used to compute each correlation
coefficient.
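
The same quantities can be reproduced outside SigmaStat with SciPy; the sketch
below uses hypothetical paired measurements:

    import numpy as np
    from scipy import stats

    a = np.array([1.2, 2.4, 3.1, 4.8, 5.0, 6.7])   # hypothetical variable 1
    b = np.array([2.0, 2.9, 3.8, 5.2, 5.1, 7.0])   # hypothetical variable 2

    r, p_value = stats.pearsonr(a, b)   # correlation coefficient and its P value
    n = len(a)                          # number of samples used
    print(r, p_value, n)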



Pearson Product Moment Correlation Report Graph 10

The Pearson Moment Correlation matrix is a series of scatter graphs that


plot the associations between all possible combinations of variables.

The first row of the matrix represents the first set of variables or the first
column of data, the second row of the matrix represents the second set of
variables or the second data column, and the third row of the matrix
represents the third set of variables or third data column. The X and Y
data for the graphs correspond to the column and row of the graph in
the matrix.

For example, the X data for the graphs in the first row of the matrix is
taken from the second column of tested data, and the Y data is taken
from the first column of tested data. The X data for the graphs in the
second row of the matrix is taken from the first column of tested data,
and the Y data is taken from the second column of tested data. The X
data for the graphs in the third row of the matrix is taken from the
second column of tested data, and the Y data is taken from the third
column of tested data, etc. The number of graph rows in the matrix is
equal to the number of data columns being tested.

Creating the Pearson To generate the graph of Pearson Product Moment report data:
Product Moment
Report Graph 1 Click the toolbar button, or choose the Graph menu Create
Graph command when the Pearson product moment report is
selected. The Create Graph dialog box appears displaying the
Scatter Matrix graph.



2 Select Scatter Plot Matrix from the Graph Type list, then select
OK, or double-click the desired graph in the list.

FIGURE 12–69
The Create Graph Dialog
Box
for the Pearson Product
Moment Correlation Report

FIGURE 12–70
A Scatter Matrix Graph
of the Pearson Product
Moment Report Data

For information on manipulating graphs, see pages 8-181 through


8-202.



Spearman Rank Order Correlation 10

Use Spearman Rank Order Correlation when:

➤ You want to measure the strength of association between pairs of


variables without specifying which variable is dependent or
independent.
➤ The residuals (distances of the data points from the regression line)
are not normally distributed with constant variance.

If you want to assume that the value of one variable affects the other, use
some form of regression. If you need to find the correlation of normally
distributed data, use the parametric Pearson Product Moment
Correlation.

About the When an assumption is made about the dependency of one variable on
Spearman Rank Order another, it affects the computation of the regression line. Reversing the
Correlation Coefficient assumption of the variable dependencies results in a different regression
line.

The Spearman Rank Order Correlation coefficient does not require the
variables to be assigned as independent and dependent. Instead, only
the strength of association is measured.

The Spearman Rank Order Correlation coefficient is computed by


ranking all values of each variable, then computing the Pearson Product
Moment Correlation coefficient of the ranks.

Spearman Rank Order Correlation is a nonparametric test that does not


require the data points to be linearly related with a normal distribution
about the regression line with constant variance.
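
Because rs is simply the Pearson coefficient of the ranks, the computation is easy
to verify with SciPy (hypothetical data; not SigmaStat's own code):

    import numpy as np
    from scipy import stats

    a = np.array([3.0, 1.5, 4.2, 2.8, 5.9, 4.4])
    b = np.array([30.0, 12.0, 41.0, 29.0, 55.0, 38.0])

    rs, p_value = stats.spearmanr(a, b)
    # Equivalent by definition: Pearson correlation of the ranked data
    rs_check = stats.pearsonr(stats.rankdata(a), stats.rankdata(b))[0]
    print(rs, rs_check, p_value)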

Computing the To compute the Spearman Rank Order Correlation coefficient:


Spearman Rank Order
Correlation Coefficient 1 Enter or arrange your data appropriately in the worksheet
(see following section).

2 Select Spearman Correlation from the toolbar, then select the


button, or choose the Statistics menu Correlation command, then
choose Spearman Rank Order.

3 Run the test by selecting the worksheet columns with the data you
want to test using the Pick Columns dialog box (12-632).



4 View and interpret the Spearman rank order correlation report and
generate the report graph (pages 12-634 and 12-635).

Arranging Spearman Rank Order Correlation Coefficient Data 10

Place the data for each variable in a column. You must have at least two
columns of variables, with a maximum of 64 columns. Observations
containing missing values are ignored. However, rank order correlations
require columns of equal length.

FIGURE 12–71
Data for Computing a
Spearman Rank Order
Correlation Coefficient

Selecting Data Columns When computing a coefficient, you can either:

➤ Select the columns to test from the worksheet before choosing the
test, or
➤ Select the columns while computing the coefficient

Running a Spearman Rank Order Correlation 10

To run a Spearman Rank Order Correlation test, you need to select the
data to test. The Pick Columns dialog box is used to select the
worksheet columns with the data you want to test and to specify how
your data is arranged in the worksheet.

To run a Spearman Rank Order Correlation:

1 If you want to select your data before you run the correlation, drag
the pointer over your data.



2 Open the Pick Columns dialog box to start the Spearman
Correlation. You can either:

➤ Select Spearman Correlation from the toolbar drop-down list,


then select the button.
➤ Choose the Statistics menu Correlation command, then choose
Spearman Rank Order...

If you selected columns before you chose the test, the selected
columns appear in the column list. If you have not selected
columns, the dialog box prompts you to pick your data.

3 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for Variable drop-down list.

The selected columns are assigned to the Variable rows in the


Selected Columns list in the order they are selected from the
worksheet. The title of selected columns appears in each row. You
can select up to 64 variable columns. SigmaStat computes the
correlation coefficient for every possible pair.

FIGURE 12–72
The Pick Columns
for Spearman Correlation
Dialog Box Prompting You to
Select Data Columns

4 To change your selections, select the assignment in the list, then


select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

5 Select Finish. The correlation coefficient is computed. When the


test is complete, the Spearman Rank Order Correlation
Coefficient report appears.



Interpreting Spearman Rank Correlation Results 10

The report for a Spearman Rank Order Correlation displays the


correlation coefficient rs, the P value for the correlation coefficient, and
the number of data points used in the computation, for each pair of
variables.

( The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Options command, choose Report, and unselect the Explain
Results option. The number of decimal places is also set in the Report
Options dialog box.

FIGURE 12–73
The Spearman Rank
Order Correlation
Report

Spearman Correlation The Spearman correlation coefficient rs quantifies the strength of the
Coefficient rs association between the variables. rs varies between -1 and 1. A
correlation coefficient near 1 indicates there is a strong positive
relationship between the two variables, with both always increasing
together. A correlation coefficient near -1 indicates there is a strong
negative relationship between the two variables, with one always



decreasing as the other increases. A correlation coefficient of 0 indicates
no relationship between the two variables.

P Value The P value is the probability of being wrong in concluding that there is
a true association between the variables (i.e., the probability of falsely
rejecting the null hypothesis, or committing a Type I error). The smaller
the P value, the greater the probability that the variables are correlated.

Traditionally, you can conclude that the variables are significantly
correlated when P < 0.05.

Number of Samples This is the number of data points used to compute the correlation
coefficient. This number reflects samples omitted because of missing
values in one of the two variables used to compute each correlation
coefficient.

Spearman Rank Order Correlation Report Graph 10

The Spearman Rank Order Correlation matrix of scatter graphs is a


series of scatter graphs that plot the associations between all possible
combinations of variables.

The first row of the matrix represents the first set of variables or the first
column of data, the second row of the matrix represents the second set of
variables or the second data column, and the third row of the matrix
represents the third set of variables or third data column. The X and Y
data for the graphs correspond to the column and row of the graph in
the matrix.

For example, the X data for the graphs in the first row of the matrix is
taken from the second column of tested data, and the Y data is taken
from the first column of tested data. The X data for the graphs in the
second row of the matrix is taken from the first column of tested data,
and the Y data is taken from the second column of tested data. The X
data for the graphs in the third row of the matrix is taken from the
second column of tested data, and the Y data is taken from the third
column of tested data, etc. The number of graph rows in the matrix is
equal to the number of data columns being tested.

For an example of a scatter matrix, see page 160.
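A comparable matrix of pairwise scatter plots can also be produced outside SigmaStat; the sketch below is an illustration using pandas and matplotlib (the column names are made up), not a reproduction of SigmaStat's graph.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from pandas.plotting import scatter_matrix

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "Var 1": rng.normal(size=50),
        "Var 2": rng.normal(size=50),
        "Var 3": rng.normal(size=50),
    })

    # One row and one column of scatter plots per data column
    scatter_matrix(df, diagonal="hist")
    plt.show()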

Creating the Spearman Rank Order Report Graph To generate the graph of Spearman Rank Order report data:

1 Click the toolbar button, or choose the Graph menu Create
Graph command when the Spearman Rank Order Correlation report is
selected. The Create Graph dialog box appears displaying the
Scatter Matrix graph.

2 Select Scatter Plot Matrix from the Graph Type list, then select
OK, or double-click the desired graph in the list.

For information on manipulating graphs, see page 181 through page


202.

FIGURE 12–74
The Create Graph Dialog
Box
for the Spearman
Correlation Report

Nonlinear Regression

Use Nonlinear Regression when your data follow a curve that is a


nonlinear function. Nonlinear Regression solves the regression problem
directly without transforming the data and performing linear regression
techniques.

For additional information, a tutorial, examples, and tips on
Nonlinear Regression, refer to the Transforms and Regression reference.

About Nonlinear Regression Nonlinear Regression uses the Marquardt-Levenberg algorithm to find
the coefficients (parameters) of the independent variable(s) that give the
“best fit” between the equation and the data.

The Nonlinear Regression algorithm seeks the values of the parameters
that minimize the sum of the squared differences between the observed
and predicted values of the dependent variable,

$$SS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $y_i$ is the observed and $\hat{y}_i$ is the predicted value of the dependent
variable.

This process is iterative: SigmaStat begins with a “guess” at the
parameters, checks to see how well the equation fits, then continues to
make better guesses until the residual sum of squares no longer
decreases significantly between iterations. This condition is known as
convergence.
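The same iterative least-squares idea can be illustrated outside SigmaStat. The sketch below is a minimal example, not SigmaStat's implementation; it assumes Python with NumPy and SciPy and uses SciPy's Levenberg-Marquardt fitter (method="lm") on a made-up exponential decay data set.

    import numpy as np
    from scipy.optimize import curve_fit

    # Hypothetical data: y decays exponentially with x, plus noise
    x = np.linspace(0, 10, 50)
    rng = np.random.default_rng(0)
    y = 3.0 * np.exp(-0.4 * x) + rng.normal(0, 0.05, x.size)

    # Model with unknown parameters a and b; p0 holds the initial "guesses"
    def model(x, a, b):
        return a * np.exp(-b * x)

    params, cov = curve_fit(model, x, y, p0=[1.0, 0.1], method="lm")
    residuals = y - model(x, *params)
    ss = np.sum(residuals ** 2)       # residual sum of squares at convergence
    print(params, ss)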

Nonlinear Regression can fit almost any equation. Equations are


defined using the full SigmaStat transform language—this means that
you can type in an equation (or set of equations) made up of any of the
transform functions. For example, you can use if statements in the fit
equation to fit different equations over different ranges of the data.
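As an illustration of the piecewise idea in the same Python sketch style (this is not SigmaStat's transform syntax), a model that switches equations at a fitted breakpoint x0 can be written and fit with the same routine; the breakpoint and coefficients here are hypothetical.

    import numpy as np
    from scipy.optimize import curve_fit

    def piecewise(x, a, b, c, x0):
        # Straight line with slope b below x0 and slope c above it
        return np.where(x < x0, a + b * x, a + b * x0 + c * (x - x0))

    x = np.linspace(0, 10, 60)
    y = piecewise(x, 1.0, 2.0, -0.5, 4.0) + np.random.default_rng(1).normal(0, 0.1, x.size)
    params, _ = curve_fit(piecewise, x, y, p0=[0.0, 1.0, 1.0, 5.0])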

For descriptions of Marquardt’s method of Nonlinear Regression, you can


reference any appropriate statistics reference. For a list of suggested
references, see page 12.

Performing a Nonlinear To perform a Nonlinear Regression:


Regression
1 Enter or arrange your data appropriately in the worksheet
(see following section).

2 If desired, set the options for Nonlinear Regression using the


Options for Nonlinear Regression dialog box (page 638).

3 Run the regression by entering the equation, initial parameter


values, variables, and other regression settings in the Nonlinear
Regression dialog box (page 648).

4 View the Nonlinear Regression results and generate the report
graphs (page 652 and page 662).

Arranging Nonlinear Regression Data

Data for Nonlinear Regressions consists of the data for the observed
dependent variable in one column, and the corresponding data for the
independent variables in one or more columns.

FIGURE 12–75
Data Format for a
Nonlinear Regression

Note that Nonlinear Regression does not necessarily require data in the
worksheet; variable values can be specified using ranges within the
Nonlinear Regression dialog box under the [Variables] heading.

Setting Nonlinear Regression Options

Use the Nonlinear Regression options to:

➤ Set assumption checking using residuals.


➤ Display predicted values for the dependent variable and save them
to the worksheet.
➤ Specify the residuals to display and save them to the worksheet.
➤ Display confidence intervals and save them to the worksheet.
➤ Display the PRESS prediction error.
➤ Display standardized and studentized regression coefficients.
➤ Specify tests to identify outlying or influential data points.
➤ Specify tests to identify potential difficulties with the regression
parameter estimates (multicollinearity).
➤ Display the power.

To change the Nonlinear Regression options:

1 If you are going to run the test after changing test options, and
want to select your data before you run the test, drag the pointer
over your data.

2 To open the Options for Nonlinear Regression dialog box, select


Nonlinear Regression from the toolbar drop-down list, then click
the button, or choose the Statistics menu Current Test
Options... command. The Assumption Checking options appear
(see Figure 12–76 on page 640).

3 Click the Residuals tab to view the Residual options (see Figure
12–77 on page 642), the More Statistics tab to view the
Confidence Interval, PRESS, and Standardized Coefficients
options (see Figure 12–78 on page 644), or the Other Diagnostics tab to
view Influence, VIF, and Power options (see Figure 12–80 on page
646). Click the Assumption Checking tab to return to the
Normality and Equal Variance options.

4 Click a check box to enable or disable a test option. Options


settings are saved between SigmaStat sessions. For more
information on each of the test options, see pages 12-639 through
12-648.

5 To continue the test, click Run Test. The Pick Columns dialog
box appears (see page 648 for more information).

6 To accept the current settings and close the options dialog box,
click OK. To apply the current settings without closing the
options dialog box, click Apply. To close the dialog box without
changing any settings or running the test, click Cancel.

( You can select Help at any time to access SigmaStat’s online help
system.

Assumption Checking Select the Assumption Checking tab from the options dialog box to view
the Normality, Constant Variance, and Durbin-Watson options. These
options test your data for its suitability for regression analysis by
checking three assumptions that a nonlinear regression makes about the
data. A nonlinear regression assumes:

➤ That the source population is normally distributed about the
regression.
➤ That the variance of the dependent variable in the source population is
constant regardless of the value of the independent variable(s).
➤ That the residuals are independent of each other.

Only disable the assumption checking options if you are certain that the
data was sampled from normal populations with constant variance and
that the residuals are independent of each other.

Normality Testing The normality assumption test checks for a


normally distributed population. SigmaStat uses the Kolmogorov-
Smirnov test to test for normality.

FIGURE 12–76
The Options for
Nonlinear Regression
Dialog Box Displaying
the Assumption
Checking Options

Constant Variance Testing SigmaStat tests for constant variance by


computing the Spearman rank correlation between the absolute values of
the residuals and the observed value of the dependent variable. When
this correlation is significant, the constant variance assumption may be
violated and you should consider trying a different model (i.e., one that
more closely follows the pattern of the data), or transforming one or
more of the independent variables to stabilize the variance; see Chapter
14, Using Transforms for more information on the appropriate
transform to use.

P Values for Normality and Constant Variance The P value


determines the probability of being incorrect in concluding that the data
is not normally distributed (the P value is the risk of falsely rejecting the null
hypothesis that the data is normally distributed). If the P computed by the
test is greater than the P set here, the test passes.

To require a stricter adherence to normality and/or constant variance,
increase the P value. Because the parametric statistical methods are
relatively robust in terms of detecting violations of the assumptions, the
suggested value in SigmaStat is 0.05. Larger values of P (for example,
0.10) require less evidence to conclude that the residuals are not
normally distributed or the constant variance assumption is violated.

To relax the requirement of normality and/or constant variance,


decrease P. Requiring smaller values of P to reject the normality
assumption means that you are willing to accept greater deviations from
the theoretical normal distribution before you flag the data as non-
normal. For example, a P value of 0.01 for the normality test requires
greater deviations from normality to flag the data as non-normal than a
value of 0.05.

( Although the assumption tests are robust in detecting data from populations
that are non-normal or have nonconstant variances, there are extreme
conditions of data distribution that these tests cannot detect. However, these
conditions should be easily detected by visually examining the data without
resorting to the automatic assumption tests.

Durbin-Watson Statistic SigmaStat uses the Durbin-Watson statistic


to test residuals for their independence of each other. The Durbin-
Watson statistic is a measure of serial correlation between the residuals.
The residuals are often correlated when the independent variable is time,
and the deviation between the observation and the regression line at one
time is related to the deviation at the previous time. If the residuals are
not correlated, the Durbin-Watson statistic will be 2.
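For readers who want to check this statistic outside SigmaStat, the following minimal Python sketch (an illustration only, not SigmaStat's code) computes the Durbin-Watson statistic directly from a vector of residuals.

    import numpy as np

    def durbin_watson(residuals):
        # d = sum of squared successive differences / sum of squared residuals
        e = np.asarray(residuals, dtype=float)
        return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

    # Uncorrelated residuals give a value near 2
    print(durbin_watson(np.random.default_rng(0).normal(size=100)))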

Difference from 2 Value Enter the acceptable deviation from 2.0 that
you consider as evidence of a serial correlation in the Difference from 2.0
box. If the computed Durbin-Watson statistic deviates from 2.0 more
than the entered value, SigmaStat warns you that the residuals may not
be independent. The suggested deviation value is 0.50, i.e., Durbin-
Watson Statistic values greater than 2.5 or less than 1.5 flag the residuals
as correlated.

To require a stricter adherence to independence, decrease the acceptable


difference from 2.0.

To relax the requirement of independence, increase the acceptable


difference from 2.0.

Residuals Select the Residuals tab in the options dialog box to view the Predicted
Values, Raw, Standardized, Studentized, Studentized Deleted, and
Report Flagged Values Only options.

Predicted Values Use this option to calculate the predicted value of the
dependent variable for each observed value of the independent
variable(s), then save the results to the worksheet. Click the selected
check box if you do not want to include the predicted values in the worksheet.

To assign predicted values to a worksheet column, select the worksheet


column you want to save the predicted values to from the corresponding
drop-down list. If you select none and the Predicted Values check box is
selected, the values appear in the report but are not assigned to the
worksheet.

FIGURE 12–77
The Options for
Nonlinear Regression
Dialog Box Displaying
the Residuals Options

Raw Residuals The raw residuals are the differences between the
predicted and observed values of the dependent variables. To include
raw residuals in the report, make sure this check box is selected. Click
the selected check box if you do not want to include raw residuals in the
worksheet.

To assign the raw residuals to a worksheet column, select the number of


the desired column from the corresponding drop-down list. If you select
none from the drop-down list and the Raw check box is selected, the
values appear in the report but are not assigned to the worksheet.

Standardized Residuals The standardized residual is the residual
divided by the standard error of the estimate. The standard error of the
residuals is essentially the standard deviation of the residuals, and is a
measure of variability around the regression line. To include
standardized residuals in the report, make sure this check box is selected.
Click the selected check box if you do not want to include standardized
residuals in the worksheet.

SigmaStat automatically flags data points lying outside of the confidence


interval specified in the corresponding box. These data points are
considered to have “large” standardized residuals, i.e., outlying data
points. You can change which data points are flagged by editing the
value in the Flag Values > edit box.

Studentized Residuals Studentized residuals scale the standardized


residuals by taking into account the greater precision of the regression
line near the middle of the data versus the extremes. The Studentized
residuals tend to be distributed according to the Student t distribution,
so the t distribution can be used to define “large” values of the
Studentized residuals. SigmaStat automatically flags data points with
“large” values of the Studentized residuals, i.e., outlying data points; the
suggested data points flagged lie outside the 95% confidence interval for
the regression population.

To include studentized residuals in the report, make sure this check box
is selected. Click the selected check box if you do not want to include
studentized residuals in the worksheet.

Studentized Deleted Residuals Studentized deleted residuals are


similar to the Studentized residual, except that the residual values are
obtained by computing the regression equation without using the data
point in question.

To include studentized deleted residuals in the report, make sure this


check box is selected. Click the selected check box if you do not want to
include studentized deleted residuals in the worksheet.

SigmaStat can automatically flag data points with “large” values of the
studentized deleted residual, i.e., outlying data points; the suggested data
points flagged lie outside the 95% confidence interval for the regression
population.

( Note that both Studentized and Studentized deleted residuals use the same
confidence interval setting to determine outlying points.

Report Flagged Values Only To include only the flagged
standardized and studentized deleted residuals in the report, make sure
the Report Flagged Values Only check box is selected. Uncheck this
option to include all standardized and studentized residuals in the
report.

Confidence Intervals Select the More Statistics tab in the options dialog box to view the
confidence interval options. You can set the confidence interval for the
population, regression, or both and then save them to the worksheet.

FIGURE 12–78
The Options for Nonlinear
Regression Dialog Box
Displaying the Confidence
Interval, PRESS Prediction
Error, and Standardized
Coefficient Options

Confidence Interval for the Population The confidence interval for


the population gives the range of values that define the region that
contains the population from which the observations were drawn.

To include confidence intervals for the population in the report, make


sure the Population check box is selected. Click the selected check box if
you do not want to include the confidence intervals for the population
in the report.

Confidence Interval for the Regression The confidence interval for


the regression line gives the range of values that defines the region
containing the true mean relationship between the dependent and
independent variables, with the specified level of confidence.

To include confidence intervals for the regression in the report, make
sure the Regression check box is selected, then specify a confidence level
by entering a value in the percentage box. The confidence level can be
any value from 1 to 99. The suggested confidence level for all intervals
is 95%. Click the selected check box if you do not want to include the
confidence intervals for the regression in the report.

Saving Confidence Intervals to the Worksheet To save the


confidence intervals to the worksheet, select the column number of the
first column you want to save the intervals to from the Starting in
Column drop-down list. The selected intervals are saved to the
worksheet starting with the specified column and continuing with
successive columns in the worksheet.

PRESS Select the More Statistics tab in the options dialog box to view the
Prediction Error PRESS Prediction Error option (see Figure 12–78 on page 644). The
PRESS Prediction Error is a measure of how well the regression equation
fits the data. Leave this check box selected to evaluate the fit of the
equation using the PRESS statistic. Click the selected check box if you
do not want to include the PRESS statistic in the report.

Standardized Coefficients (βi) Click the More Statistics tab in the options dialog box to view the
Standardized Coefficients option (see Figure 12–78 on page 644).
These are the coefficients of the regression equation standardized to
dimensionless values,

$$\beta_i = b_i \frac{s_{x_i}}{s_y}$$

where $b_i$ = regression coefficient, $s_{x_i}$ = standard deviation of the
independent variable $x_i$, and $s_y$ = standard deviation of the dependent
variable y.

To include the standardized coefficients in the report, make sure the
Standardized Coefficients check box is selected. Click the selected
check box if you do not want to include the standardized coefficients in
the report.

Influence Options Select the Other Diagnostics tab in the options dialog box to view the
Influence options. Influence options automatically detect instances of
influential data points. Most influential points are data points which are
outliers, that is, they do not “line up” with the rest of the data
points. These points can have a potentially disproportionately strong
influence on the calculation of the regression line. You can use several
influence tests to identify and quantify influential points.

FIGURE 12–79
A Graph with an Influential Outlying Point
The solid line shows the regression for the data including the outlier,
and the dotted line is the regression computed without the outlying point.

DFFITS DFFITSi is the number of estimated standard errors that the


predicted value changes for the ith data point when it is removed from
the data set. It is another measure of the influence of a data point on the
prediction used to compute the regression coefficients.

Predicted values that change by more than two standard errors when the
data point is removed are considered to be influential.

Check the DFFITS check box to compute this value for all points and
flag influential points, i.e., those with DFFITS greater than the value
specified in the Flag Values > edit box. The suggested value is 2.0
standard errors, which indicates that the point has a strong influence on
the data. To avoid flagging more influential points, increase this value;
to flag less influential points, decrease this value.
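For reference, a common textbook definition of DFFITS (not quoted from SigmaStat) is

$$\mathrm{DFFITS}_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{S_{y|x(i)} \sqrt{h_{ii}}}$$

where $\hat{y}_{i(i)}$ is the value predicted for point i when that point is omitted from the fit, $S_{y|x(i)}$ is the corresponding standard error of the estimate, and $h_{ii}$ is the leverage of point i.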

FIGURE 12–80
The Options for Nonlinear
Regression Dialog Box
Displaying the Influence
and Power Options

Leverage Leverage is used to identify the potential influence of a point
on the results of the regression equation. Leverage depends only on the
value of the independent variable(s). Observations with high leverage
tend to be at the extremes of the independent variables, where small
changes in the independent variables can have large effects on the
predicted values of the dependent variable.

The expected leverage of a data point is $\frac{k+1}{n}$, where there are k
independent variables and n data points. Observations with leverages
much higher than the expected leverage are potentially influential
points.

Check the Leverage check box to compute the leverage for each point
and automatically flag potentially influential points, i.e., those points
that have leverages greater than the specified value times the
expected leverage. The suggested value is 2.0 times the expected leverage
for the regression (i.e., $\frac{2(k+1)}{n}$). To avoid flagging more potentially
influential points, increase this value; to flag points with less potential
influence, lower this value.
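As a rough illustration (not SigmaStat's algorithm), leverages are the diagonal elements of the hat matrix; for a nonlinear fit, the Jacobian of the model with respect to the parameters, evaluated at the fitted values, plays the role of the design matrix. The sketch below assumes a design or Jacobian matrix X is already available as a NumPy array and uses a simple straight-line design for concreteness.

    import numpy as np

    def leverages(X):
        # Diagonal of the hat matrix H = X (X'X)^-1 X'
        H = X @ np.linalg.inv(X.T @ X) @ X.T
        return np.diag(H)

    # Example: straight-line design matrix (intercept column plus one predictor)
    x = np.linspace(0, 10, 20)
    X = np.column_stack([np.ones_like(x), x])
    h = leverages(X)
    expected = X.shape[1] / X.shape[0]   # (k + 1)/n with k predictors plus an intercept
    flagged = h > 2.0 * expected         # suggested rule: 2.0 times the expected leverage
    print(flagged)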

Cook's Distance Cook's distance is a measure of how great an effect


each point has on the estimates of the parameters in the regression
equation. Cook's distance assesses how much the values of the
regression coefficients change if a point is deleted from the analysis.
Cook's distance depends on both the values of the independent and
dependent variables.

Check the Cook's Distance check box to compute this value for all
points and flag influential points, i.e., those with a Cook's distance
greater than the specified value. The suggested value is 4.0. Cook's
distances above 1 indicate that a point is possibly influential. Cook's
distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. To avoid flagging more influential
points, increase this value; to flag less influential points, lower this value.
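For reference, a common textbook form of Cook's distance (not quoted from SigmaStat) is

$$D_i = \frac{e_i^2}{p\,S_{y|x}^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}$$

where $e_i$ is the raw residual, $h_{ii}$ the leverage of point i, p the number of parameters, and $S_{y|x}^2$ the residual mean square.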

Report Flagged Values Only To include only the influential
points flagged by the influential point tests in the report, make sure the
Report Flagged Values Only check box is selected. Uncheck this option
to include all influential points in the report.

What to Do About Influential Points Influential points have two


possible causes:

➤ There is something wrong with the data point, caused by an error in
observation or data entry.
➤ The model is incorrect.

For descriptions of how to handle influential points, you can reference


an appropriate statistics reference. For a list of suggested references, see
page 12. To learn about the Variance Inflation Factor option, see
Variance Inflation Factor on page 591.

Power Select the Other Diagnostics tab in the options dialog box to view the
Power options (see Figure 12–80 on page 646). The power of a
regression is the power to detect the observed relationship in the data.
The alpha (α) is the acceptable probability of incorrectly concluding
there is a relationship.

Check the Power check box to compute the power for the nonlinear
regression data. Change the alpha value by editing the number in the
Alpha Value edit box. The suggested value is α = 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing
to conclude there is a significant relationship when P < 0.05.

Smaller values of α result in stricter requirements before concluding
there is a significant relationship, but a greater possibility of concluding
there is no relationship when one exists. Larger values of α make it
easier to conclude that there is a relationship, but also increase the risk of
reporting a false positive.

Running a Nonlinear Regression

Running a Nonlinear Regression involves opening the Nonlinear


Regression dialog box and entering the appropriate Nonlinear
Regression settings.

To run a Nonlinear Regression:

1 Open the Nonlinear Regression dialog box to start the Nonlinear


Regression. You can either:

➤ Select Nonlinear Regression from the toolbar drop-down list


and click the button.
➤ Choose the Statistics menu Regression, Nonlinear... command.

➤ Select Run Test in the Nonlinear Regression Options dialog
box. See step 5 on page 639 for more information
on the Options for Nonlinear Regression dialog box.

The Nonlinear Regression dialog box displays the last equation


entered into the edit window. If an equation appears, click New to
clear the edit window. You can either enter a new regression
equation or open an existing regression equation that has been saved
to a regression file. For information on saving and opening
regression equation files, see the Transforms and Regression
reference.

2 Enter the parameter and starting values under the [Parameters]


heading at the top of the edit window. Parameters are the
unknown coefficients of the equation you want SigmaStat to
estimate.

For more information on defining parameter values, see


ENTERING PARAMETERS in the Transforms and Regression reference.

3 Enter the variable definitions under the [Variables] heading in the


edit window. You can define independent, dependent, and
weighting variables. You need to define at least two variables—the
observed dependent variable, and at least one independent
variable. You can define up to ten independent variables.

For more information on defining variables, see the TRANSFORM


COMPONENTS section in the Transforms and Regression reference.

4 Enter the nonlinear model equation under the [Equations]


heading, and type the fit equation(s) using the transform language
operators and functions.

The equation should contain all of the variables you want to use as
independent variables, as well as the dependent variable to be fit to
the data.

For detailed information on entering equations, see the


TRANSFORM COMPONENTS section in the Transforms and
Regression reference.

5 Optionally, enter parameter constraints. Constraints are used to
set limits and conditions for parameter values and to improve
Nonlinear Regression speed and accuracy. For example, if you
know that a parameter should always be positive, you can enter a
constraint defining the parameter to be always ≥ 0. A maximum
of 25 constraints can be entered.

For more information on parameter constraints, see the


TRANSFORM COMPONENTS section in the Transforms and
Regression reference.

6 Optionally, enter special options to influence Nonlinear


Regression. These options can be used to speed up or improve the
Nonlinear Regression process. To enter these options, move
beneath the [Options] heading and type in the desired settings.

For more information on using these options, see the TRANSFORM


COMPONENTS section in the Transforms and Regression reference.

FIGURE 12–81
The Edit Nonlinear
Regression Dialog Box with
Regression Settings Entered
Under the Appropriate
Headings

7 Once you have entered the parameters, variables, equations, and


the optional constraints and options, select Run to begin fit
iterations. The Iteration dialog box appears.

During the regression process the Iteration dialog box is displayed.


This dialog box displays the iteration number, the parameter
values for each iteration, and the norm of the residuals (the norm is
$\sqrt{SS}$, the square root of the residual sum of squares).

You can stop the fitting by selecting Cancel. If the fitting process
is canceled, the Nonlinear Regression Results window is displayed
with the most recent parameter values and norm. The fitting
process can be continued by selecting the More Iterations option.

8 When the Nonlinear Regression is complete, the Nonlinear


Regression Results dialog box appears (see Figure 12–82 on page
651).

Interpreting the Nonlinear Regression Results Dialog Box

After the Nonlinear Regression is complete, the Nonlinear Regression


Results dialog box appears displaying the initial regression parameter
results. These results can be used to evaluate the success of the nonlinear
fit.

You can use this dialog box to edit the Nonlinear Regression, then run
the Nonlinear Regression again, check your parameter constraints, save
the results to the worksheet, and view the Nonlinear Regression report.

FIGURE 12–82
The Nonlinear Regression
Results Dialog Box

For detailed descriptions on the Nonlinear Regression results,


completion status messages, checking parameter constraints, and saving
results to the worksheet, see INTERPRETING NONLINEAR REGRESSION
RESULTS and NONLINEAR REGRESSION RESULT MESSAGES in the
Transforms and Regression reference.

Interpreting the Nonlinear Regression Report

Select the Report button in the Nonlinear Regression Results dialog box
to view the detailed Nonlinear Regression report. The report for a
Nonlinear Regression lists all the settings entered into the Nonlinear
Regression dialog box, and a table of the values and statistics for the
regression parameters.

Additional results to be displayed are selected in the Nonlinear


Regression Options dialog box. See Setting Nonlinear Regression
Options on page 638 for more information.

For descriptions of the derivations of these results, you can reference any
appropriate statistics reference. For a list of suggested references, see
page 12.

( The report scroll bars only scroll to the top and bottom of the current page.
To move to the next or the previous page in the report, use the and
buttons in the formatting toolbar to move one page up and down in the
report.

Result Explanations In addition to the numerical results, expanded explanations of the results
may also appear. To turn off this explanatory text, choose the Statistics
menu Report Options... command and click the selected Explain Test
Results check box.

The number of decimal places displayed is also set in the Report


Options dialog box. For more information on setting report options, see
page 135.

For information on modifying reports, see the WORKING WITH


REPORTS chapter.

Initial Settings These are the settings as entered into the Nonlinear Regression dialog
box.

Parameters and Initial Estimates These are the parameters and
starting values specified under the [Parameters] heading in the
Nonlinear Regression dialog box.

Variables These are the variables specified under the [Variables]
heading in the Nonlinear Regression dialog box.

FIGURE 12–83
The Nonlinear
Regression Report

Fit Equations These are the regression model, fit statement, and
optional weighting statement, as specified under the [Equations] heading
in the Nonlinear Regression dialog box.

Parameter Constraints These are the optional parameter constraints as
specified under the [Constraints] heading in the Nonlinear Regression
dialog box.

Regression Options These are the optional stepsize, tolerance, and
iterations settings as specified under the [Options] heading in the Nonlinear
Regression dialog box.

R and R Squared R, the multiple correlation coefficient, and R², the coefficient of
determination for Nonlinear Regression, are both measures of how well
the regression model describes the data. R values near 1 indicate that the
equation is a good description of the relation between the independent
and dependent variables.

R equals 0 when the values of the independent variable do not allow
any prediction of the dependent variables, and equals 1 when you can
perfectly predict the dependent variables from the independent
variables.

Adjusted R Squared The adjusted R2, R2adj, is also a measure of how well the regression
model describes the data, but takes into account the number of
independent variables, which reflects the degrees of freedom. Larger
R2adj values (nearer to 1) indicate that the equation is a good description
of the relation between the independent and dependent variables.
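A common definition (a standard textbook form, not quoted from SigmaStat) is

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p}$$

where n is the number of observations and p the number of fitted parameters.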

Standard Error of the Estimate ($S_{y|x}$) The standard error of the estimate $S_{y|x}$ is a measure of the actual
variability about the regression plane of the underlying population. The
underlying population generally falls within about two standard errors of
the observed sample.
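In terms of the ANOVA quantities reported below, the standard error of the estimate is the square root of the residual mean square, $S_{y|x} = \sqrt{MS_{res}} = \sqrt{SS_{res}/DF_{res}}$.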

Statistical The standard error, t and P values are approximations based on the final
Summary Table iteration of the Nonlinear Regression.

Estimate The values for the constant and the coefficients of the
independent variables in the regression model are listed.

Standard Error The standard errors are estimates of the uncertainties


in the estimates of the regression coefficients (analogous to the standard
error of the mean). The true regression coefficients of the underlying
population generally fall within about two standard errors of the
observed sample coefficients. Large standard errors may indicate
multicollinearity.

t Statistic The t statistic tests the null hypothesis that the coefficient of
the independent variable is zero, that is, the independent variable does
not contribute to predicting the dependent variable. t is the ratio of the
regression coefficient to its standard error, or

$$t = \frac{\text{regression coefficient}}{\text{standard error of regression coefficient}}$$

You can conclude from “large” t values that the independent variable can
be used to predict the dependent variable (i.e., that the coefficient is not
zero).

P value P is the P value calculated for t. The P value is the probability


of being wrong in concluding that the coefficient is not zero (i.e., the
probability of falsely rejecting the null hypothesis, or committing a Type
I error, based on t). The smaller the P value, the greater the probability
that the coefficient is not zero.

Traditionally, you can conclude that the independent variable can be
used to predict the dependent variable when P < 0.05.

VIF The variance inflation factor (VIF) for each parameter is a measure
of the uncertainty with which the parameter can be estimated.
Parameters with large VIFs (much greater than 1.0) indicate that the
equation(s) used are “overparameterized.” There are too many
parameters to allow unique identification of the parameter values from
the available data, and a model with fewer parameters may be better.

Analysis of Variance (ANOVA) Table The ANOVA (analysis of variance) table lists the ANOVA statistics for
the regression and the corresponding F value.

SS (Sum of Squares) The sums of squares are measures of variability of
the dependent variable.

➤ The sum of squares due to regression measures the difference of the


regression plane from the mean of the dependent variable.
➤ The residual sum of squares is a measure of the size of the residuals,
which are the differences between the observed values of the
dependent variable and the values predicted by the regression model.

DF (Degrees of Freedom) Degrees of freedom represent the number of
observations and variables in the regression equation.

➤ The regression degrees of freedom is a measure of the number of


independent variables.
➤ The residual degrees of freedom is a measure of the number of
observations less the number of terms in the equation.

MS (Mean Square) The mean square provides two estimates of the


population variances. Comparing these variance estimates is the basis of
analysis of variance.

The mean square regression is a measure of the variation of the
regression from the mean of the dependent variable, or

$$MS_{reg} = \frac{\text{sum of squares due to regression}}{\text{regression degrees of freedom}} = \frac{SS_{reg}}{DF_{reg}}$$

The residual mean square is a measure of the variation of the residuals
about the regression plane, or

$$MS_{res} = \frac{\text{residual sum of squares}}{\text{residual degrees of freedom}} = \frac{SS_{res}}{DF_{res}}$$

The residual mean square is also equal to $S_{y|x}^2$.

F Statistic The F test statistic gauges the contribution of the
independent variables in predicting the dependent variable. It is the
ratio

$$F = \frac{\text{regression variation from the dependent variable mean}}{\text{residual variation about the regression}} = \frac{MS_{reg}}{MS_{res}}$$

If F is a large number, you can conclude that the independent variables


contribute to the prediction of the dependent variable (i.e., at least one
of the coefficients is different from zero, and the “unexplained
variability” is smaller than what is expected from random sampling
variability of the dependent variable about its mean). If the F ratio is
around 1, you can conclude that there is no association between the
variables (i.e., the data is consistent with the null hypothesis that all the
samples are just randomly distributed).

P Value The P value is the probability of being wrong in concluding


that there is an association between the dependent and independent
variables (i.e., the probability of falsely rejecting the null hypothesis, or
committing a Type I error, based on F). The smaller the P value, the
greater the probability that there is an association.

Traditionally, you can conclude that the independent variable can be
used to predict the dependent variable when P < 0.05.

PRESS Statistic PRESS, the Predicted Residual Error Sum of Squares, is a gauge of
how well a regression model predicts new data. The smaller the PRESS
statistic, the better the predictive ability of the model.

The PRESS statistic is computed by summing the squares of the


prediction errors (the differences between predicted and observed values)
for each observation, with that point deleted from the computation of
the regression equation.
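The leave-one-out computation can be illustrated with a small Python sketch (an illustration only, not SigmaStat's code): each point is dropped in turn, the model is refit, and the squared error of predicting the dropped point is accumulated.

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b):
        return a * np.exp(-b * x)

    x = np.linspace(0, 10, 30)
    y = model(x, 3.0, 0.4) + np.random.default_rng(2).normal(0, 0.05, x.size)

    press = 0.0
    for i in range(x.size):
        keep = np.arange(x.size) != i              # drop point i from the fit
        p, _ = curve_fit(model, x[keep], y[keep], p0=[1.0, 0.1])
        press += (y[i] - model(x[i], *p)) ** 2     # squared prediction error for point i
    print(press)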

Durbin-Watson Statistic The Durbin-Watson statistic is a measure of correlation between the


residuals. If the residuals are not correlated, the Durbin-Watson statistic
will be 2; the more this value differs from 2, the greater the likelihood
that the residuals are correlated. This result appears if it was selected in
the Options for Nonlinear Regression dialog box.

Regression assumes that the residuals are independent of each other; the
Durbin-Watson test is used to check this assumption. If the Durbin-
Watson value deviates from 2 by more than the value set in the Options
for Nonlinear Regression dialog box, a warning appears in the report.
The suggested trigger value is a difference of more than 0.50, i.e., the
Durbin-Watson statistic is below 1.50 or above 2.50.

Normality Test The normality test results display whether the data passed or failed the
test of the assumption that the source population is normally distributed
around the regression, and the P value calculated by the test. All
regressions require a source population to be normally distributed about
the regression line. When this assumption may be violated, a warning
appears in the report. This result appears unless you disabled normality
testing in the Options for Nonlinear Regression dialog box (see page
640).

Failure of the normality test can indicate the presence of outlying


influential points or an incorrect regression model.

Constant Variance Test The constant variance test result displays whether the data passed
or failed the test of the assumption that the variance of the dependent
variable in the source population is constant regardless of the value of
the independent variable, and the P value calculated by the test. When
the constant variance assumption may be violated, a warning appears in
the report.

If you receive this warning, you should consider trying a different model
(i.e., one that more closely follows the pattern of the data) using a
weighted regression, or transforming the independent variable to
stabilize the variance and obtain more accurate estimates of the
parameters in the regression equation.

If you perform a weighted regression, the normality and equal variance
tests use the weighted residuals $w_i (y_i - \hat{y}_i)$ instead of the raw
residuals $y_i - \hat{y}_i$.

See Chapter 14, Using Transforms for more information on the


appropriate transform to use.

Power This result is displayed if you selected this option in the Options for
Nonlinear Regression dialog box.

The power, or sensitivity, of a regression is the probability that the model


correctly describes the relationship of the variables, if there is a
relationship.

Regression power is affected by the number of observations, the chance


of erroneously reporting a difference & (alpha), and the slope of the
regression.

Alpha (α) Alpha (α) is the acceptable probability of incorrectly
concluding that the model is correct. An α error is also called a Type I
error (a Type I error is when you reject the hypothesis of no association
when this hypothesis is true).

The α value is set in the Options dialog box; the suggested value is
α = 0.05, which indicates that a one in twenty chance of error is
acceptable. Smaller values of α result in stricter requirements before
concluding the model is correct, but a greater possibility of concluding
the model is bad when it is really correct (a Type II error). Larger values
of α make it easier to conclude that the model is correct, but also
increase the risk of accepting a bad model (a Type I error).

Regression Diagnostics The regression diagnostic results display the values for the predicted
values, residuals, and other diagnostic results selected in the Options for
Nonlinear Regression dialog box (see page 644). All results that qualify
as outlying values are flagged with a # symbol. The trigger values to flag
residuals as outliers are set in the Options for Nonlinear Regression
dialog box.

If you selected Report Cases with Outliers Only, only those observations
that have one or more residuals flagged as outliers are reported; however,
all other results for that observation are also displayed.

Row This is the row number of the observation.

Predicted Values This is the value for the dependent variable predicted
by the regression model for each observation.

Residuals These are the unweighted raw residuals, the difference


between the predicted and observed values for the dependent variables.

Standardized Residuals The standardized residual is the raw residual


divided by the standard error of the estimate $S_{y|x}$.

If the residuals are normally distributed about the regression, about 66%
of the standardized residuals have values between −1 and 1, and about
95% of the standardized residuals have values between −2 and 2. A
larger standardized residual indicates that the point is far from the
regression; the suggested value flagged as an outlier is 2.5.

Studentized Residuals The Studentized residual is a standardized


residual that also takes into account the greater confidence of the
predicted values of the dependent variable in the “middle” of the data
set. By weighting the values of the residuals of the extreme data points
(those with the lowest and highest independent variable values), the
Studentized residual is more sensitive than the standardized residual in
detecting outliers.

Both Studentized and Studentized deleted residuals that lie outside a


specified confidence interval for the regression are flagged as outlying
points; the suggested confidence value is 95%.

This residual is also known as the internally Studentized residual,


because the standard error of the estimate is computed using all data.

Studentized Deleted Residuals The Studentized deleted residual, or


externally Studentized residual, is a Studentized residual which uses the
standard error of the estimate $S_{y|x(-i)}$, computed after deleting the data
point associated with the residual. This reflects the greater effect of
outlying points by deleting the data point from the variance
computation.

Both Studentized and Studentized deleted residuals that lie outside a


specified confidence interval for the regression are flagged as outlying
points; the suggested confidence value is 95%.

The Studentized deleted residual is more sensitive than the Studentized
residual in detecting outliers, since the Studentized deleted residual
results in much larger values for outliers than the Studentized residual.

Influence Diagnostics The influence diagnostic results display only the values for the results
selected in the Options dialog box under the Other Diagnostics tab (see
page 645). All results that qualify as outlying values are flagged with a #
symbol. The trigger values to flag data points as outliers are also set in
the Options for Nonlinear Regression dialog box under the Other
Diagnostics tab.

If you selected Report Cases with Outliers Only, only observations that
have one or more observations flagged as outliers are reported; however,
all other results for that observation are also displayed.

Row This is the row number of the observation.

Cook's Distance Cook's distance is a measure of how great an effect


each point has on the estimates of the parameters in the regression
equation. It is a measure of how much the values of the regression
coefficients would change if that point is deleted from the analysis.

Values above 1 indicate that a point is possibly influential. Cook's


distances exceeding 4 indicate that the point has a major effect on the
values of the parameter estimates. Points with Cook's distances greater
than the specified value are flagged as influential; the suggested value is
4.

Leverage Leverage values identify potentially influential points.


Observations with leverages a specified factor greater than the expected
leverages are flagged as potentially influential points; the suggested value
is 2.0 times the expected leverage.
The expected leverage of a data point is $\frac{p}{n}$, where there are p parameters
and n data points.

Because leverage is calculated using only the independent variables, high
leverage points tend to be at the extremes of the independent variables
(large and small values), where small changes in the independent
variables can have large effects on the predicted values of the dependent
variable.

DFFITS The DFFITSi statistic is a measure of the influence of a data
point on regression prediction. It is the number of estimated standard
errors the predicted value for a data point changes when the observed
value is removed from the data set before computing the regression
coefficients.

Predicted values that change by more than the specified number of


standard errors when the data point is removed are flagged as influential;
the suggested value is 2.0 standard errors.

Confidence Intervals These results are displayed if you selected them in the Regression
Options dialog box. If the confidence interval does not include zero,
you can conclude that the coefficient is different from zero with the level
of confidence specified. This can also be described as P < α (alpha),
where α is the acceptable probability of incorrectly concluding that the
coefficient is different from zero, and the confidence interval is
100(1 − α)%.

The specified confidence level can be any value from 1 to 99; the
suggested confidence level for both intervals is 95%.

Row This is the row number of the observation.

Predicted Values This is the value for the dependent variable predicted
by the regression model for each observation.

Regression The confidence interval for the regression gives the range of
variable values computed for the region containing the true relationship
between the dependent and independent variables, for the specified level
of confidence.

Population The confidence interval for the population gives the range
of variable values computed for the region containing the population
from which the observations were drawn, for the specified level of
confidence.

Nonlinear Regression Report Graphs

You can generate up to six graphs using the results from a Nonlinear
Regression. They include a:

➤ Histogram of the residuals.


➤ Scatter plot of the residuals.
➤ Bar chart of the standardized residuals.
➤ Normal probability plot of the residuals.
➤ Line/scatter plot of the regression with one independent variable
and confidence and prediction intervals
➤ 3D scatter plot of the residuals.

Histogram of Residuals The Nonlinear Regression histogram plots the raw residuals in a
specified range, using a defined interval set. The residuals are divided
into a number of evenly incremented histogram intervals and plotted as
histogram bars indicating the number of residuals in each interval. The
X axis represents the histogram intervals, and the Y axis represents the
number of residuals in each group. For an example of a histogram, see
page 153.

Scatter Plot of The Nonlinear Regression scatter plot of the residuals plots the residuals
the Residuals of the data in the selected independent variable column as points relative
to the standard deviations. The X axis represents the independent
variable values, the Y axis represents the residuals of the variables, and
the horizontal lines running across the graph represent the standard
deviations of the data. For an example of a scatter plot, see page 152.

Bar Chart of The Nonlinear Regression bar chart of the standardized residuals plots
the Standardized the standardized residuals of the data in the selected independent
Residuals variable column as points relative to the standard deviations. The X axis
represents the selected independent variable values, the Y axis represents
the residuals of the variables, and the horizontal lines running across the
graph represent the standard deviations of the data. For an example of a
bar chart, see page 153.

Normal Probability Plot The Nonlinear Regression probability plot graphs standardized residuals
versus their cumulative frequencies along a probability scale. The
residuals are sorted and then plotted as points around a curve
representing the area of the Gaussian. Plots with residuals that fall along
the Gaussian curve indicate that your data was taken from a normally
distributed population. The X axis is a linear scale representing the
residual values. The Y axis is a probability scale representing the
cumulative frequency of the residuals. For an example of a normal
probability plot, see page 155.

Line/Scatter Plot of the Regression with Prediction and Confidence Intervals The Nonlinear Regression line/scatter graph plots the observations of
the nonlinear regression for the data of the selected independent variable
column as a line/scatter plot. The points represent the dependent
variable data plotted against the selected independent variable data, the
solid line running through the points represents the regression line, and
the dashed lines represent the prediction and confidence intervals. The
X axis represents the independent variables and the Y axis represents the
dependent variables. For an example of a line/scatter plot, see page 156.

3D Residual The Nonlinear Regression 3D residual scatter plot graphs the residuals
Scatter Plot of the two selected columns of independent variable data. The X and
the Y axes represent the independent variables, and the Z axis represents
the residuals. For an example of a 3D residual scatter plot, see page 156.

Creating Nonlinear To generate a graph of Nonlinear Regression report data:


Regression
Report Graphs 1 Click the toolbar button, or choose the Graph menu Create
Graph command when the Nonlinear Regression report is
selected. The Create Graph dialog box appears displaying the
types of graphs available for the Nonlinear Regression results.

FIGURE 12–84
The Create Graph Dialog Box
for the Nonlinear Regression Report

2 Select the type of graph you want to create from the Graph Type
list, then select OK, or double-click the desired graph in the list.
For more information on each of the graph types, see pages 12-662
through 12-663.

If you select Scatter Plot Residuals, Bar Chart Std Residuals,
or Regression, Conf. & Pred, a dialog box appears prompting you to
select the column with the independent variable you want to use in
the graph. If you select 3D Scatter & Mesh, or 3D Residual
Scatter, and you have more than two columns of independent
variables, a dialog box appears prompting you to select the two
columns with the independent variables you want to plot.

FIGURE 12–85
Select X Independent
Variable Prompting you to
Select The Independent
Variable You Want to Plot

3 Select the columns with the independent variables you want to use
in the graph, then select OK. The graph appears using the
specified independent variables.

FIGURE 12–86
Example of a 2D Scatter
Plot of the Residuals for a
Nonlinear Regression
Report

For information on manipulating graphs, see page 178 through page


202.


12 Survival Analysis

Use survival analysis to characterize the time to an event.
For example, a survival curve shows, as a function of time, the probability
of surviving lung cancer.

SigmaStat provides three types of Kaplan-Meier survival analysis and


uses two data formats. The types are the analysis of a single curve, the
comparison of multiple curves using the LogRank test and the
comparison of multiple curves using the Gehan-Breslow test.

About Survival Analysis

Survival analysis studies the variable that is the time to some event. The
term survival originates from the event death. But the event need not be
death; it can be the time to any event. This could be the time to closure
of a vascular graft or the time when a mouse footpad swells from
infection. Of course it need not be medical or biological. It could be the
time a motor runs until it fails. For consistency we will use survival and
death (or failure) here.

Sometimes death doesn't occur during the length of the study, or the
patient dies from some other cause, or the patient relocates to another
part of the country. Though a death did not occur, this information is
useful, since the patient survived up until the time he or she left the
study. When this occurs, the patient is referred to as censored. This comes
from the expression censored from observation: the data has been lost
from view of the study. Examples of censored values are patients who
moved to another geographic location before the study ended and
patients who are alive when the study ended. Kaplan-Meier survival
analysis includes both failures (death) and censored values.
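Outside SigmaStat, the same kind of estimate can be produced with the open-source lifelines package for Python (a recent version is assumed). The sketch below is an illustration with made-up survival times and a status vector (1 = failure, 0 = censored), not SigmaStat output.

    from lifelines import KaplanMeierFitter

    # Hypothetical survival times (months) and status codes (1 = death, 0 = censored)
    durations = [6, 7, 10, 15, 19, 25, 25, 30, 34, 41]
    events    = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]

    kmf = KaplanMeierFitter()
    kmf.fit(durations, event_observed=events)

    print(kmf.median_survival_time_)   # median survival time
    print(kmf.survival_function_)      # the Kaplan-Meier curve as a table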

Three Survival Tests Use the Survival statistic to obtain one of the following three tests.

➤ Single Group Use this to analyze and graph one survival curve.
➤ LogRank Use this to compare two or more survival curves. The
LogRank test assumes that all survival time data is equally accurate
and all data will be equally weighted in the analysis.
➤ Gehan-Breslow Use this to compare two or more survival curves
when you expect the early data to be more accurate than later. Use
this, for example, if there are many more censored values at the end
of the study than at the beginning.

Two Multiple If the LogRank or Gehan-Breslow statistic yields a significant difference


Comparison Tests in survival curves then you have the option to use one of two multiple
comparison procedures to determine exactly which pairs of curves are
different. These are the Bonferroni and Holm-Sidak tests and are
described for each test.

Data Format for Survival Analysis

Your survival data will consist of three variables:

➤ Survival time
➤ Status
➤ Group

The survival times are the times when the event occurred. They must be
positive and all non-positive values will be considered missing values.
The data need not be sorted by survival time or group.

The status variable defines whether the data is a failure or censored


value. You are allowed to use multiple names for both failure and
censored. These can be text or numeric.

The group variable defines each individual survival data set (and curve).

Data can be arranged in the worksheet in two formats:

➤ Raw data format Column pairs of survival time and status value for
each group.
➤ Indexed data format Data indexed to a group column.

Raw Data To enter the data in Raw data format, enter the survival time in one
column and the corresponding status in a second column. Do this for
each group. If you wish, you can identify each group with a column title
in the survival time column. If you do this then these group titles will be
used in the graph and report.

FIGURE 12–1
Raw Data Format for a
Survival Analysis
with Two Groups
Columns 1 and 2 are the
survival time and status
values for the first group -
Affected Node. Columns 3
and 4 are the same for the
second group - Total Node.
The report and the survival
curve graph will use the text
strings (“Affected Node”,
“Total Node”) found in the
survival time column titles.

Note: The worksheet columns for each group must be the same length. If not then
the cells in the longer length column will be considered missing. All non-
positive survival times will also be considered missing. All status variable
values not defined as either a failure or a censored value will be considered
missing.

Indexed Data Indexed data is a three-column format. The survival time and status
variable in two columns are indexed on the group names in a third
column. Informative column titles are not necessary but are useful when
selecting columns in the wizard.

FIGURE 12–2
Indexed Data Format - a
Three-Column Format
Consisting of Group,
Survival Time, and Status
In this example group is in
column 1, survival time is in
column 2 and the status
variable is in column 3.

Note: The Transforms menu Index and Unindex commands are not designed for
converting between survival analysis data formats. To use these features you
must index and unindex the survival time and status variables separately and
then reorganize the resulting columns.
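
As an illustration of the two layouts, the same two hypothetical groups
could be entered either way. The survival times below are invented for
this example only, and the status labels 1 (failure) and 0 (censored)
are just one of the labeling schemes the status variable accepts; the
group titles are those used in Figure 12–1.

   Raw data format (one Time, Status column pair per group):

       Affected Node   Status      Total Node   Status
       23              1           16           1
       47              0           11           1
       69              1           34           0

   Indexed data format (Group, Time and Status in three columns):

       Group           Time        Status
       Affected Node   23          1
       Affected Node   47          0
       Affected Node   69          1
       Total Node      16          1
       Total Node      11          1
       Total Node      34          0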

Single Group Survival Analysis

The Single Group option analyzes the survival data from one group,
creates a report and a graph with a single survival curve. There is no
statistical test performed but statistics associated with the data, such as
the median survival time, are calculated and presented in the report.

Performing a Single Group Analysis To perform a Single Group analysis:

1 Enter or arrange your data appropriately in the worksheet.

2 If desired set the Single Group options using the Options for
Survival Single Group dialog. For more information, see Setting
Single Group Options on page 672.

3 Select Survival Single Group from the toolbar drop-down list, then
click the Run Test button, or choose the Statistics menu Survival command, then
choose Single Group.

4 Select the two worksheet columns with the survival times and
status values using the Pick Columns panel.

5 Click Next and select the Event and Censored labels. You may
select multiple labels for each.

6 Click Finish.

7 View and interpret the Single Group survival analysis report and
curve (see Interpreting Single Group Survival Results on page
677).

Arranging Single Group Survival Analysis Data

Two data columns are required: a column with survival times and a
column with status labels. These can be just two columns in a worksheet
or two columns from a multi-group data set. So you can select a single
pair of columns from the multiple groups in the Raw data format.

Note: Use this option to analyze all groups as a single group from an indexed
format data set. For example, select the last two columns in the worksheet
shown in Figure 12–2 to analyze both groups as one group. You cannot do
this directly with Raw data format since the groups are not concatenated in
two columns. You would need to use the Stack feature in Transforms to
concatenate the columns.

Selecting Data Columns When running a Single Group survival analysis you can either:

➤ Select the two columns from the worksheet to analyze by dragging


your mouse over the columns before choosing the analysis. In this
case the columns must be adjacent and the survival time must be the
first column.
➤ Select the columns while in the Test Wizard.

Setting Single Group Options

Use the Survival Curve Test Options to:

➤ Specify attributes of the generated survival curve graph


➤ Customize the post-test contents of the report and worksheet

To change the Survival Curve options:

1 If you are going to analyze your survival curve after changing test
options, and want to select your data before you create the curve,
drag the pointer over your data.

2 To open the Options for Survival Curve dialog, select Survival


Single Group from the toolbar drop-down list, then click the Test
Options button, or choose the Statistics menu Current Test
Options command. The Options for Survival Single Group dialog
appears.

FIGURE 12–3
The Options for Survival
Curve Dialog Displaying the
Graph Options

3 Click the Graph Options tab to view the graph symbol, line and
scaling options. Additional statistical graph elements may be
selected here.

FIGURE 12–4
The Options for Survival
Curve Dialog Displaying the
Report and Worksheet
Results Options

4 Click the Results tab to specify the survival time units and to
modify the content of the report and worksheet.

Options settings are saved between SigmaStat sessions.

5 To continue the test, click the Run Test button. The Pick Columns
panel appears. For more information, see Running a Single Group
Survival Analysis on page 675.

6 To accept the current settings and close the options dialog, click
OK. To accept the current settings without closing the options
dialog, click Apply. To close the dialog without changing any
settings or running the test, click Cancel.

Note: You can click Help at any time to access SigmaStat's on-line help
system.

All options in these dialogs are “sticky” and will remain in the state that
you have selected until you change them.

Status Symbols All graph options apply to graphs that are created when the analysis is
run. You can use the Graph Properties dialog to modify the attributes of
the survival curves after they have been created.

Censored Symbols Select the Graph Options tab from the Options for
Survival Single Group dialog to view the status symbols options.
Censored symbols are graphed by default. Clear this option to not
display the censored symbols.

Failures Symbols Checking this box will display symbols at the failure
times. These symbols always occupy the inside corners of the steps in the
survival curve. As such they provide redundant information and need
not be displayed.

Group Color The color of the objects in a survival curve group may be changed with
this option. All objects, for example, survival line, symbols, confidence
interval lines, will be changed to the selected color. Use Graph Properties
to modify individual object colors after the graph has been created.

Survival Scale You can display the survival graph either using fractional values
(probabilities) or percents.

Fraction If you select this then the Y axis scaling will be from 0 to 1.

Percent Selecting this will result in a Y axis scaling from 0 to 100.

Note: The results in the report are always expressed in fractional terms no matter
which option is selected for the graph.

Additional Plot Statistics Two different types of graph elements may be added to your survival
curve.

95% Confidence Intervals Selecting this will add the upper and lower
confidence lines in a stepped line format.

Standard Error Bars Selecting this will add error bars for the standard
errors of the survival probability. These are placed at the failure times.
All of these elements will be graphed with the same color as the survival
curve. You may change these colors, and other graph attributes, from
Graph Properties after the graph has been created.

Report Cumulative Probability Table Clear this option to exclude the


cumulative probability table from the report. This will reduce the length
of the report for large data sets.

Worksheet 95% Confidence Intervals Select this to place the survival curve upper
and lower 95% confidence interval values into the worksheet. These will
be placed into the first empty worksheet columns.

Time Units Select a time unit from the drop-down list or enter a unit. These units
will be used in the graph axis titles and the survival report.

Running a Single Group Survival Analysis

To run a single group survival analysis you need to select survival time
and status data columns to analyze. The Pick Columns panel is used to
select these two columns in the worksheet.

To run a Single Group analysis:

1 Specify any options for your graph and report. You can do this by
selecting Survival Single Group in the Select Test drop-down list
and either clicking the Test Options button or selecting Current
Test Options from the Statistics menu.

2 If you want to select your data before you run the test, drag the
pointer over your data. The Survival Time column must precede
and be adjacent to the Status column.

3 Open the Pick Columns panel to start the Single Group analysis.
You can either:

➤ Select Survival Single Group from the toolbar drop-down list,
then click the Run Test button.
➤ Choose the Statistics menu Survival command then choose
Single Group.

4 Click the Run Test button from the Options for Survival Single
Group dialog.

The Pick Columns panel appears prompting you to select your


data columns. If you selected columns before you chose the test,
the selected columns appear in the Selected Columns list.

FIGURE 12–5
The Pick Columns for
Survival Single Group Panel
Prompting You to Select
Time and Status Columns

5 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns

from the Data for Status drop-down list. The first selected column
is assigned to the first row (Time) in the Selected Columns list,
and the next selected column is assigned to the next row (Status) in
the list. The number or title of selected columns appears in each
row.

6 To change your selections, select the assignment in the list and


then select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

7 Select Next to choose the status variables. The status variables


found in the columns you selected are shown in the Status labels in
selected columns: window.

FIGURE 12–6
The Pick Columns for
Survival Single Group Panel
Prompting You to Select the
Status Variables.

Figure 12–6 shows an example with two event variables, “failure”


and “1”, and one censored variable “0”. Select these and click the
right arrow buttons to place the event variables in the Event
window and the censored variable in the Censored window.

FIGURE 12–7
The Pick Columns for
Survival Single Group Panel
Showing the Results of
Selecting the Status
Variables

You can have more than one Event label and more than one
Censored label. You must select one Event label in order to
proceed. You need not select a censored variable, though, and some
data sets will not have any censored values. You need not select all

the variables; any data associated with unselected status variables


will be considered missing.

8 Use the back arrow keys to remove labels from the Event and
Censored windows. This will place them back in the Status labels
in selected columns window.

The Event and Censored labels that you selected are saved for your
next analysis. If the next data set contains exactly the same status
labels, or if you are re-analyzing your present data set, then the
saved selections will appear in the Event and Censored windows.

9 Click Finish to create the survival graph and report. The results
you obtain will depend on the Test Options that you selected.

Interpreting Single Group Survival Results

The Single Group survival analysis report displays information about the
origin of your data, a table containing the cumulative survival
probabilities and summary statistics of the survival curve.

For descriptions of the derivations for survival curve statistics see


Hosmer & Lemeshow or Kleinbaum (page 1-12).

Results Explanations The number of significant digits displayed in the report may be set in
the Report Options dialog. Use Tools, Options to display this dialog.
For more information on setting report options, see Setting Report
Options on page 135.

Report Header Information The report header includes the date and time that
the analysis was performed. The data source is identified by the worksheet title
containing the data being analyzed and the notebook name. In Figure
12–8 the Data source shows the worksheet title to be “standard,
squamous” and the notebook name to be “Survival Analysis Data”. The
event and censor labels used in this analysis are listed. Also, the time
units used are displayed.

Survival Cumulative Probability Table The survival probability table lists
all event times and, for each event time, the number of events that occurred, the number of subjects
remaining at risk, the cumulative survival probability and its standard
error. The upper and lower 95% confidence limits are not displayed but
these may be placed into the worksheet (see Figure 12–4). Failure times
are not shown but you can infer their existence from jumps in the

Number at Risk data and the summary table immediately below this table.

You can turn the display of this table off by clearing this option in the
Results tab of Test Options. This is useful for large data sets.
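
The cumulative survival probability and its standard error in this table
follow the Kaplan-Meier (product-limit) calculation described in the
references cited above. The notation below is ours rather than SigmaStat's,
and Greenwood's formula is assumed for the standard error, since it is the
usual companion to this estimator:

   % Kaplan-Meier product-limit estimate at time t, where d_i events occur
   % among the n_i subjects at risk at event time t_i <= t (censored values
   % reduce the number at risk but contribute no factor of their own):
   \hat{S}(t) = \prod_{t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)

   % Greenwood's formula for the standard error of the estimate:
   \mathrm{SE}\left[ \hat{S}(t) \right]
     = \hat{S}(t) \sqrt{ \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)} }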

Data Summary Table The data summary table shows the total number of cases. The sum of
the number of events, censored and missing values, shown below this,
will equal the total number of cases.

Statistical Summary Table The mean and percentile survival times and their
statistics are listed in this table. The median survival time is commonly used in publications.

FIGURE 12–8
The Single Group
Survival Analysis
Results Report

Single Group Survival Graph

Visual interpretation of the survival curve is an important component of


survival analysis. For this reason SigmaStat always generates a survival
curve graph (see Figure 12–9). This is different from the other statistical
tests where you select a report graph a posteriori.

You can control the graph in three ways. You can set the graph options
shown in Figure 12–3 and these options will become the default values
until they are changed. After the graph is created you can modify it using
SigmaStat's Graph Options. Each object in the graph is a separate plot

(for example, survival curve, failure symbols, censored symbols, upper


confidence limit, and so on) so you have considerable control over the
appearance of your graph. If you also have SigmaPlot then you can use
Edit with SigmaPlot from the Graph menu and obtain additional
control over your graph.

FIGURE 12–9
A single group
survival curve

LogRank Survival Analysis

The LogRank option analyzes survival data from multiple groups and
creates a report and a graph showing multiple survival curves. Statistics
associated with each group, such as the median survival time, are
calculated and presented in the report.

The LogRank test is also performed to determine whether survival


curves are significantly different. It is a nonparametric test that uses a
LogRank statistic to reject the null hypothesis that the survival curves
came from the same population.

The LogRank statistic for two groups is formed from the square of the
sum, across all event times, of the difference of the observed and
estimated number of events (censored values removed) divided by the

variance of this sum. The P value is then obtained from a chi-square


distribution using this statistic.
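
Written out in the usual two-group notation (ours, not SigmaStat's), with
d_1j observed events in group 1 at event time j, d_j total events, and
n_1j, n_2j, n_j subjects at risk in group 1, group 2 and overall, the
description above corresponds to the standard form:

   % Expected group-1 events and the variance of (observed - expected)
   % at event time j:
   e_{1j} = \frac{n_{1j} d_j}{n_j}, \qquad
   v_j = \frac{n_{1j} n_{2j} d_j (n_j - d_j)}{n_j^2 (n_j - 1)}

   % LogRank statistic: the squared sum of the differences divided by the
   % variance of that sum; for two groups it is referred to a chi-square
   % distribution with one degree of freedom:
   \chi^2_{LR} = \frac{\left[ \sum_j ( d_{1j} - e_{1j} ) \right]^2}{\sum_j v_j}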

The LogRank test assumes that there is no difference in the accuracy of


the data at any given time. This is different from the Gehan-Breslow test
that weights the early data more since it assumes that this data is more
accurate.

Performing a LogRank Analysis To perform a LogRank analysis:
1 Enter or arrange your data in either Indexed or Raw data format in
the worksheet (see the following sections).

2 If desired set the LogRank options using the Options for Survival
LogRank dialog. For more information, see Setting LogRank
Survival Options on page 681.

3 Select Survival LogRank from the toolbar Select Test drop-down


list and click the Run Test button, or choose Survival, LogRank
from the Statistics menu.

4 Select the appropriate data format - Indexed or Raw - and click


Next.

5 Pick the worksheet columns - multiple Time, Status column pairs


for Raw data format or Group, Time, Status for Indexed data
format - and click Next.

6 Select the groups from the Group panel if you selected Indexed
data format and click Next.

7 Select the Event and Censored labels. You may select multiple
labels for each.

8 Click Finish.

9 View and interpret the LogRank survival analysis report and


graph. For more information, see Interpreting LogRank Survival
Results on page 690.

Arranging LogRank Survival Analysis Data

Multiple Time, Status column pairs (two or more) are required for Raw
data format. Indexed data format requires three columns for Group,
Time and Status. You can preselect the data to have the column selection
panel automatically select the Time, Status column pairs if you organize
your worksheet with the Time column preceding the Status column and
have all columns be adjacent. For Indexed data format, placing the
Group, Time and Status variables in adjacent columns and in that order
also allows automatic column selection.

Setting LogRank Survival Options

Use the Survival LogRank Test Options to:

➤ Specify attributes of the generated survival curve graph


➤ Customize the post-test contents of the report and worksheet
➤ Select the multiple comparison test and its options

To change the Survival Curve options:

1 If you are going to analyze your survival curve after changing test
options, and want to select your data before you create the curve,
then drag the pointer over your data.

2 To open the Test Options for Survival LogRank dialog, select


Survival LogRank from the toolbar Select Test drop-down list,
then click the Test Options button, or choose the Statistics menu
Current Test Options command. The options dialog appears.

FIGURE 12–10
The Options for Survival
LogRank Dialog Displaying
the Graph Options

3 Click the Graph Options tab to view the graph symbol, line and
scaling options. Additional statistical graph elements may also be
selected here.

4 Click the Results tab to specify the survival time units and to
modify the content of the report and worksheet.

FIGURE 12–11
The Options for Survival
LogRank Dialog Displaying
the Report and Worksheet
Results Options

Click the Post Hoc Tests tab to modify the multiple comparison
options.

FIGURE 12–12
The Options for Survival
LogRank Dialog Displaying
the Post Hoc Test Options

Options settings are saved between SigmaStat sessions.

5 To continue the test, click the Run Test button.

6 To accept the current settings and close the options dialog, click
OK. To accept the current settings without closing the options
dialog, click Apply. To close the dialog without changing any
settings or running the test, click Cancel.

Click Help at any time to access SigmaStat's on-line help system.

Note: All options in these dialogs are “sticky” and will remain in the state that you
have selected until you change them.

Status Symbols All graph options apply to graphs that are created when the analysis is
run. You can use the Graph Properties dialog to modify the attributes of
the survival curves after they have been created.

Censored Symbols Select the Graph Options tab from the Options
dialog to view the status symbols options. Censored symbols are graphed
by default. Clear this option to not display the censored symbols.

Failures Symbols Checking this box will display symbols at the failure
times. These symbols always occupy the inside corners of the steps in the
survival curve. As such they provide redundant information and need
not be displayed.

Group Color The color of the objects in a survival curve group may be changed with
this option. All objects, e.g., survival line, symbols, confidence interval
lines, will be changed to the selected color or color scheme. A four
density gray scale color scheme is used as the default. You may change
this to black, where all survival curves and their attributes will be black,
or incrementing, which is a multi-color scheme. Use Graph Properties to
modify individual object colors after the graph has been created.

Survival Scale You can display the survival graph either using fractional values
(probabilities) or percents.

Fraction If you select this then the Y axis scaling will be from 0 to 1.

Percent Selecting this will result in a Y axis scaling from 0 to 100.

Note: The results in the report are always expressed in fractional terms no matter which option is selected for the graph.

Additional Plot Statistics Two different types of graph elements may be added to your survival
curves.

95% Confidence Intervals Selecting this will add the upper and lower
confidence lines in a stepped line format.

Standard Error Bars Selecting this will add error bars for the standard
errors of the survival probability. These are placed at the failure times.

All of these elements will be graphed with the same color as the survival
curve. You may change these colors, and other graph attributes, from
Graph Properties after the graph has been created.

Report Cumulative Probability Table Clear this option to exclude the


cumulative probability table from the report. This will reduce the length
of the report for large data sets.

P values for multiple comparisons Select this to show both the P


values from the pairwise multiple comparison tests and the critical values
against which the pairwise P values are tested. The critical values for the
Holm-Sidak test will vary for each pairwise test (see Figure 12–19). If
this is selected for the Bonferroni test, the critical values will be identical
for all pairwise tests.

Note: The critical P value for the LogRank test may also be changed.
Entering a different value for P Value for Significance at the Report
tab of Tools, Options does this. This is a global setting for the
critical P value and will affect all tests in SigmaStat.

Worksheet 95% Confidence Intervals Select this to place the survival curve upper
and lower 95% confidence intervals into the worksheet. These will be
placed into the first empty worksheet columns.

Time Units Select a time unit from the drop-down list or enter a unit. These units
will be used in the graph axis titles and the survival report.

Multiple Comparisons You can select when multiple comparisons are to be computed and
displayed in the report. LogRank tests the hypothesis of no differences
between survival groups but does not determine which groups are
different, or the sizes of these differences. Multiple comparison
procedures isolate these differences.

Always Perform Select this option to always display multiple


comparison results in the report. If the original comparison test is not
significant then the multiple comparison results will also be not
significant and will just clutter the report. The multiple comparison test
is a separate computation from the original comparison test so it is
possible to obtain significant results from the multiple comparison test
when the original test was insignificant.

Only when Survival P Value is Significant Select this to place


multiple comparison results in the report only when the original

comparison test is significant. The significance level can be set to either


0.05 or 0.01 using the Significance Value for Multiple Comparisons
drop-down list.

Note: If multiple comparisons are triggered, the report will show the results of the comparison. You may elect to always show them by de-selecting the Only when Survival P Value is Significant option.

Running a LogRank Survival Analysis

To run a LogRank survival analysis you need to select data in the


worksheet and specify the status variables.

To run a LogRank Survival analysis:

1 Specify any options for your graph, report and post-hoc tests (see
Figure 12–10, Figure 12–11, and Figure 12–12). You can do this
by selecting Survival LogRank in the Select Test drop-down list
and either clicking the Test Options button or selecting Current
Test Options from the Statistics menu.

2 To select your data before you run the test drag the pointer over
your data. The columns must be adjacent and in the correct order
(Time, Status for Raw data and Group, Time, Status for Indexed
data).

3 Open the Data Format panel to start the LogRank analysis. To do


this you can either:

➤ Select Survival LogRank from the toolbar Select Test drop-down


list and then click the Run Test button.
➤ Choose the Statistics menu Survival command then choose
LogRank.
➤ Click the Run Test button from the Options for Survival
LogRank dialog (see Figure 12–10).

4 From the Data Format drop-down list select either:

➤ Raw data format when you have groups of data in multiple


Time, Status column pairs (see Figure 12–1).

➤ Indexed data format when you have the groups specified by a


column (see Figure 12–2).

FIGURE 12–13
The Data Format Panel With
Raw Data Format Selected

5 Click Next to display the Pick Columns panel that prompts you to
select your data columns. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.

FIGURE 12–14
The Pick Columns Panel for
Survival LogRank Raw Data
Format Prompting You to
Select Multiple Time and
Status Columns

6 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for drop-down list.

7 The first selected column is assigned to the first row (Time 1) in


the Selected Columns list, and the next selected column is assigned
to the next row (Status 1) in the list. The number or title of
selected columns appears in each row. Continue selecting Time,
Status columns for all groups that you wish to analyze. The results
of this selection are shown in Figure 12–14.

8 To change your selections, select the assignment in the list and


then select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

9 Select Next to choose the status variables. The status variables


found in the columns you selected are shown in the Status labels in
selected columns: window.

FIGURE 12–15
The Pick Columns for
Survival LogRank Panel
Prompting You to Select the
Status Variables

Figure 12–15 shows an example with one event variable “1”, two
censored variables “0” and “censored” and one variable
“lobectomy” to be ignored. Select these and click the right arrow
buttons to place the event variables in the Event window and the
censored variable in the Censored window. The result of this
selection is shown in Figure 12–16. All data associated with
“lobectomy” will be considered missing.

FIGURE 12–16
The Pick Columns for
Survival LogRank Dialog
Showing the Results of
Selecting the Status
Variables

You can have more than one Event label and more than one
Censored label. You must select one Event label in order to
proceed. You need not select a censored variable, though, and some
data sets will not have any censored values. You need not select all
the variables; any data associated with unselected status variables
will be considered missing.

10 Use the back arrow keys to remove labels from the Event and
Censored windows. This will place them back in the Status labels
in selected columns window.

The Event and Censored labels that you selected are saved for your
next analysis. If the next data set contains exactly the same status
labels, or if you are re-analyzing your present data set, then the
saved selections will appear in the Event and Censored windows.

11 Click Finish to create the survival graph and report. The results
you obtain will depend on the Test Options that you selected.

If you selected Indexed data format then the Pick Columns panel
asks you to select the three columns in the worksheet for your
Group, Time and Status (Figure 12–17).

FIGURE 12–17
The Pick Columns Panel for
Survival LogRank Indexed
Data Format Prompting You
to Select Group, Time and
Status Columns

12 Click Next to select the groups you want to include in the analysis.
If you want to analyze all groups found in the Group column then
select the Select all groups checkbox. Otherwise select groups from
the Data for Group drop-down list. You can select subsets of all
groups and, as shown in Figure 12–18, select them in the order
that you wish to see them in the report.

FIGURE 12–18
The Group Selection Panel
for Survival LogRank
Indexed Data Format
Prompting You to Select
Groups to Analyze

13 Click Next to select the status variables as described in steps 9


through 12 above and then continue to complete the analysis to
create the report and graph.

Multiple Comparison Options LogRank tests the hypothesis of no differences
between the several survival groups, but does not determine which groups are different, or
the sizes of the differences. Multiple comparison tests isolate these
differences by running comparisons between the experimental groups.

If you selected to run multiple comparisons only when the P value is


significant, and LogRank produces a P value equal to or less than the
trigger P value, or you selected to always run multiple comparisons in the
Options for LogRank dialog (see Figure 12–12), the multiple
comparison results are displayed in the Report.

There are two multiple comparison tests to choose from for the
LogRank survival analysis:

➤ Holm-Sidak
➤ Bonferroni

Holm-Sidak Test The Holm-Sidak test is an improvement on the Bonferroni test that
avoids the low power and overconservatism that the Bonferroni test
yields. The Holm-Sidak test is a sequentially rejective procedure because
it applies an accept/reject criterion to a set of ordered null hypotheses
(Glantz, see page 1-12). The Bonferroni test is not sequential. The
Holm-Sidak test can be described by example using the VA Lung Cancer
data in Samples.jnb. The multiple comparison results are shown in
Figure 12–19.

FIGURE 12–19
Holm-Sidak Multiple
Comparison Results for VA
Lung Cancer Study

There are six comparisons of the four survival groups small, large, adeno
and squamous. The LogRank statistic is computed for all data pairs and
the corresponding P value (Unadjusted P Value) determined from the
chi-square distribution. The comparisons are ranked by ascending P
value and the critical P level computed (the critical P level depends only
on the rank, total number of comparisons and the family P value set in
Options). The unadjusted P value is compared to the critical level to

determine significance. Compare Figure 12–19 and Figure 12–20 to see


that one difference between the two tests is the computation of the
critical level. The Bonferroni critical level is constant since it is not a
sequential method.
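
The critical levels in Figure 12–19 follow the standard Holm-Sidak form:
with k comparisons at a family P value of alpha, and the comparisons
ranked from the smallest to the largest unadjusted P value, the i-th
comparison is tested against its own critical level. This is background
on the standard procedure rather than a formula quoted from SigmaStat:

   % Holm-Sidak critical level for the i-th smallest unadjusted P value
   % out of k comparisons at family error rate alpha:
   P_{crit}(i) = 1 - (1 - \alpha)^{1/(k - i + 1)}

For k = 6 and alpha = 0.05 this gives, in order, approximately 0.00851,
0.01021, 0.01274, 0.01695, 0.02532 and 0.05. Testing proceeds from the
smallest P value upward and stops at the first comparison that is not
significant; all comparisons after it are also declared not significant.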

Bonferroni Test The Bonferroni test performs pairwise comparisons with paired chi-
square tests. It is computationally similar to the Holm-Sidak test except
that it is not sequential (the critical level used is fixed for all
comparisons). The critical level is the ratio of the family P value (set in
Post Hoc Test Options - Figure 12–12) to the number of comparisons.
It is a more conservative test than the Holm-Sidak test in that the chi-
square value required to conclude that a difference exists becomes much
larger than it really needs to be. Bonferroni multiple comparison results
for the VA Lung Cancer data from Samples.jnb are shown in Figure 12–20.

FIGURE 12–20
Bonferroni Multiple
Comparison Results for VA
Lung Cancer Study

The critical level is constant at 0.05/6 = 0.00833. Since the critical level
does not increase, as it does for the Holm-Sidak test, there will tend to
be fewer comparisons with significant differences (they are the same for
this particular example but not for the Gehan-Breslow test, Figure 12–33).

Interpreting LogRank Survival Results

The LogRank survival analysis report displays information about the


origin of your data, tables containing the cumulative survival
probabilities for each group, summary statistics for each survival curve
and the LogRank test of significance. Multiple comparison test results
will also be displayed provided significant differences were found or the
Post Hoc Tests Options were selected to display them.

For descriptions of the derivations for survival curve statistics see


Hosmer & Lemeshow or Kleinbaum (page 1-12).

Results Explanations The number of significant digits displayed in the report may be set in
the Report Options dialog. For more information on setting report
options, see Setting Report Options on page 135.

Report Header Information The report header includes the date and time that
the analysis was performed. The data source is identified by the worksheet title
containing the data being analyzed and the notebook name. In Figure
12–21 the Data source shows the worksheet title to be “VA Lung Cancer
Trial” and the notebook title to be “Survival Analysis Data”. The event
and censor labels used in this analysis are listed. Also, the time units used
are displayed.

Survival Cumulative Probability Table The survival probability table lists
all event times and, for each event time, the number of events that occurred, the number of subjects
remaining at risk, the cumulative survival probability and its standard
error. The upper and lower 95% confidence limits are not displayed but
these may be placed into the worksheet (see Figure 12–11). Failure times
are not shown but you can infer their existence from jumps in the
Number at Risk data and the summary table immediately below this
table.

You can turn the display of this table off by clearing this option in the
Results tab of Test Options. This is useful to keep the report a reasonable
length when you have large data sets.

Data Summary Table The data summary table shows the total number of cases. The sum of
the number of events, censored and missing values, shown below this,
will equal the total number of cases.

Statistical Summary Table The mean and percentile survival times and their
statistics are listed in this table. The median survival time is commonly used in publications.

FIGURE 12–21
The LogRank Survival
Analysis Results Report

LogRank Survival Graph

Visual interpretation of the survival curve is an important component of


survival analysis. For this reason SigmaStat always generates a survival
curve graph (see Figure 12–22). This is different from the other
statistical tests where you select a report graph a posteriori.

You can control the graph in three ways. You can set the graph options
shown in Figure 12–10 and these options will become the default values
until they are changed. After the graph is created you can modify it using
SigmaStat's Graph Options. Each object in the graph is a separate plot
(for example, survival curve, failure symbols, censored symbols, upper
confidence limit, etc.) so you have considerable control over the
appearance of your graph. If you also have SigmaPlot then you can use

Edit with SigmaPlot from the Graph menu and obtain additional
control over your graph.

FIGURE 12–22
LogRank Survival Curves
The default Test Options (gray scale colors, solid circle symbols) were used.
Squamous and large cell
carcinomas do not appear to
be significantly different (as
well as small cell and
adenocarcinoma). This is
confirmed by the Holm-Sidak
test (see Figure 12–19).

Gehan-Breslow Survival Analysis

The Gehan-Breslow option analyzes survival data from multiple groups,


creates a report and a graph showing multiple survival curves. Statistics
associated with each group, such as the median survival time, are
calculated and presented in the report.

The Gehan-Breslow test is also performed to determine whether survival


curves are significantly different. It is a nonparametric test that uses a
Gehan-Breslow statistic to reject the null hypothesis that the survival
curves came from the same population. The Gehan-Breslow statistic for
two groups is formed from the square of the sum, across all event times,
of the weighted difference of the observed and estimated number of
events (censored values removed) divided by the variance of this
weighted sum. The P value is then obtained from a chi-square
distribution using this statistic.
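
Using the same notation as for the LogRank statistic, and assuming the
common choice of weights for the Gehan-Breslow (generalized Wilcoxon)
test, namely w_j = n_j, the total number of subjects at risk at event
time j, the description corresponds to:

   % Gehan-Breslow statistic for two groups, with weight w_j = n_j
   % (e_{1j} and v_j are defined as in the LogRank sketch above):
   \chi^2_{GB} = \frac{\left[ \sum_j w_j ( d_{1j} - e_{1j} ) \right]^2}
                      {\sum_j w_j^2 v_j}

Because early event times usually have the most subjects at risk, these
weights give the early portion of the curves the greatest influence,
which is the weighting behavior described in the next paragraph.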

The Gehan-Breslow test assumes that the early survival times are known
more accurately than later times and weights the data accordingly. As an
example, you would want to use Gehan-Breslow if there were many late-
survival-time censored values. This is different from the LogRank test
that assumes there is no difference in the accuracy of the survival times.

Performing a Gehan-Breslow Analysis To perform a Gehan-Breslow analysis:
1 Enter or arrange your data in either Indexed or Raw data format in
the worksheet (see following sections).

2 If desired set the Gehan-Breslow options using the Options for


Survival Gehan-Breslow dialog (see Setting Gehan-Breslow
Survival Options on page 695).

3 Select Survival Gehan-Breslow from the toolbar Select Test drop-


down list and click the Run Test button, or choose Survival,
Gehan-Breslow from the Statistics menu.

4 Select the appropriate data format - Indexed or Raw - and click


Next.

5 Pick the worksheet columns - multiple Time, Status column pairs


for Raw data format or Group, Time, Status for Indexed data
format - and click Next.

6 Select the groups from the Group panel if you selected Indexed
data format and click Next.

7 Select the Event and Censored labels. You may select multiple
labels for each.

8 Click Finish.

9 View and interpret the Gehan-Breslow survival analysis report and


curve (see Interpreting Gehan-Breslow Survival Results on page
705).

Arranging Gehan-Breslow Survival Analysis Data

Multiple Time, Status column pairs (two or more) are required for Raw
data format. Indexed data format requires three columns for Group,
Time and Status. You can preselect the data to have the column selection
panel automatically select the Time, Status column pairs if you organize
your worksheet with the Time column preceding the Status column and
have all columns be adjacent.

Setting Gehan-Breslow Survival Options

Use the Survival Gehan-Breslow Test Options to:

➤ Specify attributes of the generated survival curve graph


➤ Customize the post-test contents of the report and worksheet
➤ Select the multiple comparison test and its options

To change the Survival Curve options:

1 If you are going to analyze your survival curve after changing test
options, and want to select your data before you create the curve,
then drag the pointer over your data.

2 To open the Test Options for Survival Gehan-Breslow dialog,


select Survival Gehan-Breslow from the toolbar drop-down list,
then click the Test Options button, or choose the Statistics menu
Current Test Options command. The options dialog appears.

FIGURE 12–23
The Options for
Survival Gehan-Breslow
Dialog Displaying the
Graph Options

3 Click the Graph Options tab to view the graph symbol, line and
scaling options. Additional statistical graph elements may be
selected here.

4 Click the Results tab to specify the survival time units and to
modify the content of the report and worksheet.

FIGURE 12–24
The Options for Survival
Gehan-Breslow Dialog
Displaying the Report and
Worksheet Results Options

Click the Post Hoc Tests tab to modify the multiple comparison
options.

FIGURE 12–25
The Options for Survival
Gehan-Breslow Dialog
Displaying the Post Hoc Test
Options

Options settings are saved between SigmaStat sessions.

5 To continue the test, click the Run Test button. The Pick Columns
panel appears (see Running a Gehan-Breslow Survival Analysis on
page 699 for more information).

6 To accept the current settings and close the options dialog, click
OK. To accept the current settings without closing the options

dialog, click Apply. To close the dialog without changing any


settings or running the test, click Cancel.

You can select Help at any time to access SigmaStat's on-line help
system.

Note: All options in these dialogs are “sticky” and will remain in the state that you
have selected until you change them.

Status Symbols All graph options apply to graphs that are created when the analysis is
run. You can use the Graph Properties dialog to modify the attributes of
the survival curves after they have been created.

Censored Symbols Select the Graph Options tab from the Options
dialog to view the status symbols options. Censored symbols are graphed
by default. Clear this option to not display the censored symbols.

Failures Symbols Checking this box will display symbols at the failure
times. These symbols always occupy the inside corners of the steps in the
survival curve. As such they provide redundant information and need
not be displayed.

Group Color The color of the objects in a survival curve group may be changed with
this option. All objects, e.g., survival line, symbols, confidence interval
lines, will be changed to the selected color. A four density gray scale
color scheme is used as the default. You may change this to black, where
all survival curves and their attributes will be black, or incrementing, which
is a multi-color scheme. Use Graph Properties to modify individual
object colors after the graph has been created.

Survival Scale You can display the survival graph either using fractional values
(probabilities) or percents.

Fraction If you select this then the Y axis scaling will be from 0 to 1.

Percent Selecting this will result in a Y axis scaling from 0 to 100.

Note: The results in the report are always expressed in fractional terms no matter which option is selected for the graph.

Additional Plot Statistics Two different types of graph elements may be added to your survival
curves.

95% Confidence Intervals Selecting this will add the upper and lower
confidence lines in a stepped line format.

Standard Error Bars Selecting this will add error bars for the standard
errors of the survival probability. These are placed at the failure times.
All of these elements will be graphed with the same color as the survival
curve. You may change these colors, and other graph attributes, from
Graph Properties after the graph has been created.

Report Cumulative Probability Table Clear this option to exclude the


cumulative probability table from the report. This will reduce the length
of the report for large data sets.

P values for multiple comparisons Select this to show both the P


values from the pairwise multiple comparison tests and the critical values
against which the pairwise P values are tested. The critical values for the
Holm-Sidak test will vary for each pairwise test (Figure 12–32). If this is
selected for the Bonferroni test, the critical values will be identical for all
pairwise tests.

Note: The critical P value for the Gehan-Breslow test may also be changed.
Entering a different value for P Value for Significance at the Report tab of
Tools, Options does this. This is a global setting for the critical P value and
will affect all tests in SigmaStat.

Worksheet 95% Confidence Intervals Select this to place the survival curve upper
and lower 95% confidence intervals into the worksheet. These will be
placed into the first empty worksheet columns.

Time Units Select a time unit from the drop-down list or enter a unit.
These units will be used in the graph axis titles and the survival report.

Multiple Comparisons You can select when multiple comparisons are to be computed and
displayed in the report. Gehan-Breslow tests the hypothesis of no
differences between survival groups but does not determine which groups
are different, or the sizes of these differences. Multiple comparison
procedures isolate these differences.

Always Perform Select this option to always display multiple


comparison results in the report. If the original comparison test is not

significant then the multiple comparison results will also be not


significant and will just clutter the report. The multiple comparison test
is a separate computation from the original comparison test so it is
possible to obtain significant results from the multiple comparison test
when the original test was insignificant.

Only when Survival P value is significant Select this to place


multiple comparison results in the report only when the original
comparison test is significant. The significance level can be set to either
0.05 or 0.01 using the Significance Value for Multiple Comparisons
drop-down list.

Note: If multiple comparisons are triggered, the report will show the results of the
comparison. You may elect to always show them by clearing Only when
Survival P Value is Significant.

Running a Gehan-Breslow Survival Analysis

To run a Gehan-Breslow survival analysis you need to select data in the


worksheet and specify the status variables.

To run a Gehan-Breslow Survival analysis:

1 Specify any options for your graph, report and post-hoc tests (see
Figure 12–23, Figure 12–24, and Figure 12–25). You can do this
by selecting Survival Gehan-Breslow in the Select Test drop-down
list and either clicking the Test Options button or selecting
Current Test Options from the Statistics menu.

2 If you want to select your data before you run the test then drag
the pointer over your data. The columns must be adjacent and in
the correct order (Time, Status for Raw data and Group, Time,
Status for Indexed data).

3 Open the Data Format panel to start the Gehan-Breslow analysis.


To do this you can either:

➤ Select Survival Gehan-Breslow from the toolbar drop-down list
and then click the Run Test button.
➤ Choose the Statistics menu Survival command then choose
Gehan-Breslow.

➤ Click the Run Test button from the Options for Survival


Gehan-Breslow dialog (see Figure 12–23).

FIGURE 12–26
The Data Format Panel With
Raw Data Format Selected

4 From the Data Format drop-down list select either:

➤ Raw data format when you have groups of data in multiple


Time, Status column pairs (see Figure 12–26).
➤ Indexed data format when you have the groups specified by a
column.

5 Click Next to display the Pick Columns panel that prompts you to
select your data columns. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.

6 To assign the desired worksheet columns to the Selected Columns


list, select the columns in the worksheet, or select the columns
from the Data for drop-down list.

The first selected column is assigned to the first row (Time 1) in


the Selected Columns list, and the next selected column is assigned
to the next row (Status 1) in the list. The number or title of

selected columns appears in each row. Continue selecting Time,


Status columns for all groups that you wish to analyze.

FIGURE 12–27
The Pick Columns for
Survival LogRank Panel
Prompting You to Select
Multiple Time and Status
Columns

7 To change your selections, select the assignment in the list and


then select a new column from the worksheet. You can also clear a
column assignment by double-clicking it in the Selected Columns
list.

FIGURE 12–28
The Pick Columns for
Survival Gehan-Breslow
Panel Prompting You to
Select the Status Variables

8 Select Next to choose the status variables. The status variables


found in the columns you selected are shown in the Status labels in
selected columns: window. Figure 12–28 shows an example with
two event variables, “failure” and “1”, and one censored variable
“0”. Select these and click the right arrow buttons to place the
event variables in the Event window and the censored variable in

the Censored window. The result of this selection is shown in


Figure 12–29.

FIGURE 12–29
The Pick Columns for
Survival Gehan-Breslow
Dialog Showing the Results
of Selecting the Status
Variables

You can have more than one Event label and more than one
Censored label. You must select one Event label in order to
proceed. You need not select a censored variable, though, and some
data sets will not have any censored values. You need not select all
the variables; any data associated with unselected status variables
will be considered missing.

9 Use the back arrow keys to remove labels from the Event and
Censored windows. This will place them back in the Status labels
in selected columns window.

The Event and Censored labels that you selected are saved for your
next analysis. If the next data set contains exactly the same status
labels, or if you are re-analyzing your present data set, then the
saved selections will appear in the Event and Censored windows.

10 Click Finish to create the survival graph and report. The results
you obtain will depend on the Test Options that you selected.

11 If you selected Indexed data format then the Pick Columns panel
asks you to select the three columns in the worksheet for your
Group, Time and Status.

FIGURE 12–30
The Pick Columns Panel for
Survival Gehan-Breslow
Indexed Data Format
Prompting You to Select
Group, Time and Status
Columns

12 Click Next to select the groups you want to include in the analysis.
If you want to analyze all groups found in the Group column then
select the Select all groups checkbox. Otherwise select groups from
the Data for Group drop-down list. You can select subsets of all
groups and select them in the order that you wish to see them in
the report.

FIGURE 12–31
The Group Selection Panel
for Survival Gehan-Breslow
Indexed Data Format
Prompting You to Select
Groups to Analyze

13 Click Next to select the status variables as described in steps 8
through 10 above and then complete the analysis to create the report
and graph.

Multiple Comparison Options Gehan-Breslow tests the hypothesis of no differences
between the several survival groups, but does not determine which groups are different, or
the sizes of the differences. Multiple comparison tests isolate these
differences by running comparisons between the experimental groups.

If you selected to run multiple comparisons only when the P value is


significant, and Gehan-Breslow produces a P value equal to or less than
the trigger P value, or you selected to always run multiple comparisons in

the Options for Gehan-Breslow dialog (see Figure 12–25), the
multiple comparison results are displayed in the Report.

There are two multiple comparison tests to choose from for the
Gehan-Breslow survival analysis:

➤ Holm-Sidak
➤ Bonferroni

Holm-Sidak Test The Holm-Sidak test is an improvement on the Bonferroni test that
avoids the low power and overconservatism that the Bonferroni test
yields. The Holm-Sidak test is a sequentially rejective procedure because
it applies an accept/reject criterion to a set of ordered null hypotheses
(Glantz, page 1-12). The Bonferroni test is not sequential. The Holm-
Sidak test can be described by example using the VA Lung Cancer data
in Samples.jnb.

FIGURE 12–32
Holm-Sidak Multiple
Comparison Results for VA
Lung Cancer Study

There are six comparisons of the four survival groups small, large, adeno
and squamous. The Gehan-Breslow statistic is computed for all data
pairs and the corresponding P value (Unadjusted P Value) determined
from the chi-square distribution. The comparisons are ranked by
ascending P value and the critical P level computed (the critical P level
depends only on the rank, total number of comparisons and the family P
value set in Test Options). The unadjusted P value is compared to the
critical level to determine significance. Compare Figure 12–32 and
Figure 12–33 to see that one difference between the two tests is the
computation of the critical level. The Bonferroni critical level is constant
since it is not a sequential method.

Bonferroni Test The Bonferroni test performs pairwise comparisons with paired chi-
square tests. It is computationally similar to the Holm-Sidak test except
that it is not sequential (the critical level used is fixed for all
comparisons). The critical level for the Bonferroni test is the ratio of the

family P value (set in Test Options, Figure 12–25) to the number of


comparisons. It is a more conservative test than the Holm-Sidak test in
that the chi-square value required to conclude that a difference exists
becomes much larger than it really needs to be. Bonferroni multiple
comparison results for the VA Lung Cancer data from Samples.jnb are
shown in Figure 12–33.

FIGURE 12–33
Bonferroni Multiple
Comparison Results for VA
Lung Cancer Study

The critical level is constant at 0.05/6 = 0.00833. Since the critical level
does not increase, as it does for the Holm-Sidak test, there will tend to
be fewer comparisons with significant differences. This occurs here with
three significant comparisons as compared to four for the Holm-Sidak
case.

Interpreting Gehan-Breslow Survival Results

The Gehan-Breslow survival analysis report displays information about


the origin of your data, tables containing the cumulative survival
probabilities for each group, summary statistics for each survival curve
and the Gehan-Breslow test of significance. Multiple comparison test
results will also be displayed provided significant differences were found
or the Post Hoc Tests Options were selected to display them.

For descriptions of the derivations for survival curve statistics see


Hosmer & Lemeshow or Kleinbaum (page 1-12).

Results Explanations The number of significant digits displayed in the report may be set in
the Report Options dialog. For more information on setting report
options, see Setting Report Options on page 135.

Report Header Information The report header includes the date and
time that the analysis was performed. The data source is identified by
the worksheet title containing the data being analyzed and the notebook
name. In Figure 12–34 the Data source shows the worksheet title to be

“VA Lung Cancer Trial” and the notebook name to be “Survival Analysis
Data”. The event and censor labels used in this analysis are listed. Also,
the time units used are displayed.

Survival Cumulative Probability Table The survival probability table


lists all event times and, for each event time, the number of events that
occurred, the number of subjects remaining at risk, the cumulative
survival probability and its standard error. The upper and lower 95%
confidence limits are not displayed but these may be placed into the
worksheet (see Figure 12–24). Failure times are not shown but you can
infer their existence from jumps in the Number at Risk data and the
summary table immediately below this table.

You can turn the display of this table off by clearing this option in the
Results tab of Test Options. This is useful to keep the report a reasonable
length when you have large data sets.

Data Summary Table The data summary table shows the total number
of cases. The sum of the number of events, censored and missing values,
shown below this, will equal the total number of cases.

Statistical Summary Table The mean and percentile survival times and
their statistics are listed in this table. The median survival time is
commonly used in publications.

FIGURE 12–34
The Gehan-Breslow
Survival Analysis
Results Report

Gehan-Breslow Survival Graph

Visual interpretation of the survival curve is an important component of


survival analysis. For this reason SigmaStat always generates a survival
curve graph (Figure 12–35). This is different from the other statistical
tests where you select a report graph a posteriori.

You can control the graph in three ways. You can set the graph options
shown in Figure 12–23 and these options will become the default values
until they are changed. After the graph is created you can modify it using
SigmaStat's Graph Options. Each object in the graph is a separate plot
(e.g., survival curve, failure symbols, censored symbols, upper
confidence limit, etc.) so you have considerable control over the
appearance of your graph. If you also have SigmaPlot then you can use
Run SigmaPlot from the Graph menu and obtain additional control
over your graph.

FIGURE 12–35
Gehan-Breslow
Survival Curves
Incrementing colors, percent
survival and 95% confidence
interval options were
selected from Test Options.
The Holm-Sidak test showed
these two curves to be
significantly different at the
0.001 level.


Failures, Censored Values and Ties

It is useful to understand the relationship between failures, censored


values and ties and also the effect that these have on the shape of the
survival curve. Some rules that characterize survival curves are:

➤ A step decrease occurs at every failure.


➤ Larger step decreases result from multiple failures occurring at the
same time (ties).
➤ The curve does not decrease at a censored value.
➤ Tied failure (and failure and censored) values superimpose at the
appropriate inside corner of the step survival curve.
➤ It is useful to display symbols for censored values.
➤ It is not necessary to display symbols for failures.
➤ The survival curve decreases to zero if the largest survival time is a
failure.
➤ Censored values cause the survival curve to decrease more slowly.

FIGURE 12–36
A contrived survival curve
with various combinations
of failures, censored values
and tied data that
graphically shows the
effects of these rules

Failures and censored values are shown in Figure 12–36 as open and
filled circles, respectively. A single failure is shown at time = 1.0. It is
located at the inner corner of the step curve. All failures will occur at the
inner corners so it is not necessary to display failure symbols. Failure
symbols may be displayed in SigmaStat but by default are not. Two tied


failures are shown at time = 2.0. They superimpose at the inner corner of
the step that has decreased roughly twice as much as the step for a single
failure. Four censored values, two of which are tied, are shown in the
time interval between 2.0 and 8.0. Censored values do not cause a
decrease in the survival curve and nothing unusual occurs at tied censor
values. Four tied values, two failures and two censored, are shown at
time = 8.0 (the censored values are slightly displaced for clarity). They
occur at the inside corner of the step since that is where failures are
located. The censored value at time = 19.0 prevents the survival curve
from touching the X-axis.

Survival Curve Graph Examples

The survival curve attributes may be modified using Test Options,


Graph Properties and SigmaPlot. To learn more about using the Graph
Properties dialog box, see Modifying Graph Attributes on page 184. To
learn more about editing graphs using SigmaPlot, see Using SigmaPlot
to Modify Graphs on page 189.

Test Options Figure 12–37 shows four variations that can be achieved by modifying
survival curve Test Options.

➤ A. Survival curve with censored symbols.


➤ B. Survival curve with censored and failure symbols.
➤ C. Survival curve with both symbol types and 95% confidence
intervals.
➤ D. Survival curve with standard error bars.


FIGURE 12–37
Four Variations of Survival Graphs that Can Be Achieved by Modifying Survival Curve Test Options
(Panels A–D plot Survival against Time in days.)


Graph Properties Figure 12–38 shows modifications made from Graph Properties to the
graph in Figure 12–37, C. The confidence interval lines were changed
from small gray dashed to solid blue. The censored symbol type was also
changed from a solid circle to a square.

FIGURE 12–38
Modifications made from Graph Properties to the graph in Figure 12–37, C.
(The graph plots Survival against Time in days.)


SigmaPlot If you have access to SigmaPlot you have complete control over your
graph. Figure 12–39 shows a graph generated in SigmaStat that has been
modified by using Run SigmaPlot from the Graph menu. The
background color and grid lines were added using custom colors. One of
the custom colors was used for the axis lines. Custom colors were used
for the survival curve lines and symbols. Tick marks were removed and
legend modifications made.

FIGURE 12–39
Modification of the Irradiation Effectiveness Survival Curve Graph using SigmaPlot
(The graph plots Relapse Free Percentage against Time to Relapse in days for the Affected Node and Total Node groups.)


13 Computing Power and Sample Size

SigmaStat provides two experimental design aids: experimental power


and sample size computations. Use these procedures to determine the
power of an intended test or to determine the minimum sample size
required to achieve a desired level of power.

Power and sample size computations are available for:

➤ Unpaired and Paired t-tests


➤ z-test comparison of proportions
➤ ANOVAs
➤ Chi-square Analysis of Contingency Tables
➤ Correlation Coefficient

About Power

The power, or sensitivity, of a test is the probability that the test will
detect a difference or effect if there really is a difference or effect. The
closer the power is to 1, the more sensitive the test. Traditionally, you
want to achieve a power of 0.80, which means that there is an 80%
chance of detecting a specified effect with 1 − α confidence (i.e., a 95%
confidence when α = 0.05). Power less than 0.001 is noted as "< 0.001."

The power of a statistical test depends on:

➤ The specific test.


➤ The alpha (α), or acceptable risk of a false positive (see below).
➤ The sample size.


➤ The minimum difference or treatment effect to detect.


➤ The underlying variability of the data.

FIGURE 13–1
The Power Computation
Commands Menu

About Sample Size

You can estimate how big the sample size has to be in order to detect a
treatment effect or difference with a specified level of statistical
significance and power. All else being equal, the larger the sample size,
the greater the power of the test.

Determining the Power of a t-Test

You can determine the power of an intended t-test. Unpaired t-tests are
used to compare two different samples from populations that are
normally distributed with equal variances among the individuals. For
more information on running t-tests, see Running a t-test on page 212.

To determine the power for a t-test, you need to set the:

➤ Expected difference of the means of the groups you want to detect.


➤ Expected standard deviation of the groups.


➤ Expected sizes of the two groups.
➤ Alpha (α) used for power computations.

To find the power of a t-test:

1 Choose the Statistics menu Power command and choose t-test.


The
t-test Power dialog box appears.

FIGURE 13–2
The t-test Power Dialog Box

2 Enter the size of the difference between the means of the two
groups you want to be able to detect in the Expected Difference of
Means box. This can be the size you expect to see, as determined
from previous samples or experiments, or just an estimate.

3 Enter the estimated size of the standard deviation for the


population your data will be drawn from in the Expected Standard
Deviation box. This can be the size you expect to see, as
determined from previous samples or experiments, or just an
estimate.

Note that t-tests assume that the standard deviations of the


underlying normally distributed populations are equal.

4 Enter the expected sizes of each group in the Group 1 Size and
Group 2 Size boxes.

5 If desired, change the alpha level. Alpha (α) is the acceptable
probability of incorrectly concluding that there is a difference. An
α error is also called a Type I error (a Type I error is when you
reject the hypothesis of no effect when this hypothesis is true).
The traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before
concluding there is a significant difference, but a greater possibility
of concluding there is no difference when one exists (a Type II
error). Larger values of α make it easier to conclude that there is a
difference, but also increase the risk of reporting a false positive (a
Type I error).

6 Select the Calculate button to see the power of a t-test at the specified
conditions. The power calculation appears at the top of the dialog
box. If desired, you can change any of the settings and select the
Calculate button again to view the new power as many times as desired.

7 Select the Save to Report option to save the power computation


settings and resulting power to the current report and select Close
to exit from t-test power computation.

FIGURE 13–3
The t-test Power
Computation
Results
Viewed in the
Report

For descriptions of computing the power of a t-test, you can reference an


appropriate statistics reference. For a list of suggested references, see
page 12.
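
If you want to cross-check a power value outside SigmaStat, the sketch below shows a comparable calculation in Python using the statsmodels package (an outside tool, not part of SigmaStat). The difference, standard deviation, group sizes, and alpha are placeholder values, and the result may not match SigmaStat's own algorithm exactly.

    from statsmodels.stats.power import TTestIndPower

    diff = 4.0                       # expected difference of the means (placeholder)
    sd = 5.0                         # expected (common) standard deviation (placeholder)
    n1, n2 = 20, 20                  # expected group sizes (placeholder)
    alpha = 0.05

    effect_size = diff / sd          # standardized difference (Cohen's d)
    power = TTestIndPower().power(effect_size=effect_size, nobs1=n1,
                                  ratio=n2 / n1, alpha=alpha,
                                  alternative="two-sided")
    print(f"power = {power:.3f}")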


Determining the Power of a Paired t-Test

You can determine the power of a Paired t-test. Paired t-tests are used to
see if there is a change in the same individuals before and after a single
treatment or change in condition. The sizes of the treatment effects are
assumed to be normally distributed. For more information on
performing Paired t-tests, see page 337.

To determine the power for a Paired t-test, you need to set the

➤ expected change before and after treatment you want to detect


➤ expected standard deviation of the changes
➤ number of subjects
➤ alpha (α) used for power computations

To find the power of a Paired t-test:

1 Choose the Statistics menu Power command, and choose Paired t-


test. The Paired t-test Power dialog box appears.

FIGURE 13–4
The Paired t-test
Power Dialog Box

2 Enter the size of the change before and after the treatment in the
Change to be Detected box. The size of the change is determined
by the difference of the means. This can be the size of the treatment
effect you expect to see, as determined from previous experiments,
or just an estimate.

3 Enter the size of the standard deviation of the change in the Expected


Standard Deviation of Change box. This can be the size you


expect to see, as determined from previous experiments, or just an


estimate.

4 Enter the expected (or estimated) number of subjects in the


Desired Sample Size box.

5 Enter the desired alpha level. Alpha (α) is the acceptable
probability of incorrectly concluding that there is an effect. The
traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is a significant treatment difference when P < 0.05.

Smaller values of α result in stricter requirements before
concluding there is a significant effect, but a greater possibility of
concluding there is no effect when one exists (a Type II error).
Larger values of α make it easier to conclude that there is an effect,
but also increase the risk of reporting a false positive (a Type I
error).

6 Select the Calculate button to see the power of a Paired t-test at the
specified conditions. If desired, you can change any of the settings
and select the Calculate button again to view the new power as many times
as desired.

7 Select the Save to Report option to save the power computation


settings and resulting power to the current report and select Close
to exit from Paired t-test power computation.

FIGURE 13–5
The Paired t-test
Power
Computation
Results Viewed
in the Report

For descriptions of computing the power of a Paired t-test, you can


reference an appropriate statistics reference. For a list of suggested
references, see page 12.
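
The same kind of cross-check is possible for a Paired t-test; the Python sketch below uses the one-sample/paired power routine from statsmodels with placeholder values for the change, its standard deviation, the number of subjects, and alpha. It is an outside illustration and may differ slightly from SigmaStat's result.

    from statsmodels.stats.power import TTestPower

    change = 3.0          # expected change before and after treatment (placeholder)
    sd_change = 4.0       # expected standard deviation of the changes (placeholder)
    n_subjects = 15       # number of subjects (placeholder)
    alpha = 0.05

    power = TTestPower().power(effect_size=change / sd_change,
                               nobs=n_subjects, alpha=alpha,
                               alternative="two-sided")
    print(f"power = {power:.3f}")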


Determining the Power of a z-Test Proportions Comparison


You can determine the power of a z-test comparison of proportions. A
comparison of proportions compares the difference in the proportion of
two different groups that fall within a single category. For more
information on running z-tests, see page 438.

To determine the power for a proportion comparison, you need to set


the:

➤ Expected proportion of each group that falls within the category.


➤ Size of each sample.
➤ Alpha (α) used for power computations.

To find the power of a z-test proportion comparison:

1 Choose the Statistics menu Power command and choose


Proportions. The Proportions Power dialog box appears.

2 Enter the expected proportions that fall into the category for each
group. This can be the distribution you expect to see, as
determined from previous experiments, or just an estimate.

3 Enter the sizes of each group. This can be sample sizes you expect
to obtain, or just an estimate.

FIGURE 13–6
The Proportions
Power Dialog Box


4 Enter the desired alpha level. Alpha (α) is the acceptable
probability of incorrectly concluding that there is an effect. The
traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is a significant distribution difference when P <
0.05.

Smaller values of α result in stricter requirements before
concluding there is a significant difference, but a greater possibility
of concluding there is no difference in distribution when one exists
(a Type II error). Larger values of α make it easier to conclude
that there is a difference, but also increase the risk of reporting a
false positive (a Type I error).

5 Select the Calculate button to see the power of a proportion comparison at
the specified conditions. If desired, you can change any of the
settings and select the Calculate button again to view the new power as
many times as desired.

Note: The Yates correction factor is used if this option is selected in the
Options for z-test dialog box. See page 435 for information on
setting z-test options.

6 Select the Save to Report button to save the power computation


settings and resulting power to the current report.

7 Select Close to exit from proportion comparison power


computation.

FIGURE 13–7
The Proportion Power
Computation Results
Viewed in the Report


For descriptions of computing the power of a z-test, you can reference an


appropriate statistics reference. For a list of suggested references, see
page 12.
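
For an outside cross-check of a proportions comparison, the sketch below uses the arcsine-based effect size and normal-approximation power from statsmodels. It does not apply the Yates correction, so it may not match SigmaStat when that option is selected; the proportions, group sizes, and alpha are placeholders.

    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    p1, p2 = 0.30, 0.55    # expected proportion in the category for each group (placeholders)
    n1, n2 = 50, 50        # group sizes (placeholders)
    alpha = 0.05

    es = proportion_effectsize(p1, p2)   # arcsine-transformed difference
    power = NormalIndPower().power(effect_size=es, nobs1=n1,
                                   ratio=n2 / n1, alpha=alpha,
                                   alternative="two-sided")
    print(f"power = {power:.3f}")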

Determining the Power of a One Way ANOVA

You can determine the power of a One Way ANOVA (analysis of


variance). One Way ANOVAs are used to see if there is a difference
among two or more samples taken from populations that are normally
distributed with equal variances among the individuals. For more
information on performing a One Way ANOVA, see page 237.

To determine the power for a One Way ANOVA, you need to specify
the:

➤ Minimum difference between group means you want to detect.


➤ Standard deviation of the population from which the samples were
drawn.
➤ Estimated number of groups.
➤ Estimated size of a group.
➤ Alpha (α) used for power computations.

To find the power of a One Way ANOVA:


1 Choose the Statistics menu Power command and choose ANOVA.


The ANOVA Power dialog box appears.

FIGURE 13–8
The ANOVA Power Dialog
Box

2 Enter the size of the expected difference of group means in the


Difference in Group Means to be Detected box. This can be the size
of a difference you expect to see, as determined from previous
experiments, or just an estimate.

3 Enter the estimated standard deviation of the population from


which the samples will be drawn. This can be the size you expect to
see, as determined from previous experiments, or just an estimate.

4 Enter the expected number of groups and the expected size of each
group.

5 Enter the desired alpha level. Alpha (α) is the acceptable
probability of incorrectly concluding that there is an effect. The
traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, i.e., you are willing to
conclude there is a significant difference when P < 0.05.

6 Smaller values of α result in stricter requirements before
concluding there is a significant difference, but a greater possibility
of concluding there is no difference when one exists (a Type II
error). Larger values of α make it easier to conclude that there is a
difference, but also increase the risk of reporting a false positive (a
Type I error).


7 Select the Calculate button to see the power of a One Way ANOVA at the
specified conditions. The power calculation appears at the top of
the dialog box. If desired, you can change any of the settings and
select the Calculate button again to view the new power as many times as
desired.

8 Select the Save to Report button to save the power computation


settings and resulting power to the current report.

9 Select Close to exit from ANOVA power computation.

FIGURE 13–9
The ANOVA Power
Computation Results
Viewed in the Report

For descriptions of computing the power of a One Way ANOVA, you


can reference an appropriate statistics reference. For a list of suggested
references, see page 12.
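
The sketch below is one way to approximate a One Way ANOVA power calculation in Python with statsmodels. It converts the minimum detectable difference into Cohen's f using the common "two extreme means" convention, which is an assumption on my part and may not be the convention SigmaStat uses; all numeric values are placeholders.

    from math import sqrt
    from statsmodels.stats.power import FTestAnovaPower

    diff = 5.0          # minimum difference between group means to detect (placeholder)
    sd = 4.0            # expected standard deviation within groups (placeholder)
    k = 4               # expected number of groups (placeholder)
    n_per_group = 12    # expected size of each group (placeholder)
    alpha = 0.05

    f = diff / (sd * sqrt(2 * k))   # two means differ by diff, the rest sit at the grand mean
    power = FTestAnovaPower().power(effect_size=f, nobs=k * n_per_group,
                                    alpha=alpha, k_groups=k)
    print(f"power = {power:.3f}")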

Determining the Power of a Chi-Square Test

You can determine the power of a chi-square (χ²) analysis of a
contingency table. A χ² test compares the difference between the
expected and observed number of individuals of two or more different
groups that fall within two or more categories. For more information on
performing χ² tests, see page 446.

The power of a χ² analysis of contingency tables is determined by the
estimated relative proportions in each category for each group. Because
SigmaStat uses numbers of observations to compute the estimated
proportions, you need to enter a contingency table in the worksheet


containing the estimated pattern in the observations before you can
compute the estimated proportions.

TABLE 13-1
The Contingency Table with Expected Numbers of Observations of Two Groups in Three Categories

Group        Category 1   Category 2   Category 3
Group 1          15           15           35
Group 2          15           30           10

Note: You only need to specify the pattern (distribution) of the number of
observations. The absolute numbers in the cells do not matter, only their
relative values.

To find the power of a χ² test:

1 Enter a contingency table into the worksheet by placing the


estimated number of observations for each table cell in a
corresponding worksheet cell. These observations are used to
compute the estimated proportions.

FIGURE 13–10
Contingency Table Data
Entered into the Worksheet

The worksheet rows and columns correspond to the groups and


categories. The number of observations must always be an integer.

Note that the order and location of the rows or columns


corresponding to the groups and categories is unimportant.


Choose the Statistics menu Power command and choose Chi-


square. The Pick Columns for Chi-Square Power dialog box
appears.

2 Select the columns of the contingency table from the worksheet as


prompted. Click Finish when you’ve selected the desired columns.
The Chi-square Power dialog box appears.

FIGURE 13–11
The Chi-square
Power Dialog Box

3 Enter the total number of observations in the Sample Size box.
This can be the number of observations you expect to see, as
determined from previous experiments, or just an estimate.

4 Enter the desired alpha level. Alpha (α) is the acceptable
probability of incorrectly concluding that there is a difference.
The traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before
concluding there is a significant difference, but a greater possibility
of concluding there is no effect when one exists (a Type II error).
Larger values of α make it easier to conclude that there is a
difference, but also increase the risk of reporting a false positive (a
Type I error).

5 Select the Calculate button to see the power of a chi-square test at the
specified conditions. If desired, you can change any of the settings
and select the Calculate button again to view the new power as many times
as desired. However, if you want to change the number of
observations per category, you need to select Cancel, edit the table,
then repeat the power computation.


6 Select the Save to Report option to save the power computation


settings and resulting power to the current report file, and then
select Cancel to exit from chi-square test power computation.

FIGURE 13–12
The Chi-square Power
Computation Results
Viewed in the Report

For descriptions of computing the power of a chi-square analysis of


contingency tables, you can reference an appropriate statistics reference.
For a list of suggested references, see page 12.
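
The noncentral chi-square approach sketched below reproduces this kind of power calculation in Python with scipy (an outside illustration using the Table 13-1 pattern and a placeholder total sample size; SigmaStat's exact numbers may differ).

    import numpy as np
    from scipy.stats import chi2, ncx2

    table = np.array([[15.0, 15.0, 35.0],     # estimated pattern of observations
                      [15.0, 30.0, 10.0]])    # (only the relative values matter)
    n_total = 120                             # planned total number of observations (placeholder)
    alpha = 0.05

    probs = table / table.sum()                                # cell proportions under the alternative
    indep = np.outer(probs.sum(axis=1), probs.sum(axis=0))     # proportions under independence
    w_squared = ((probs - indep) ** 2 / indep).sum()           # squared effect size (Cohen's w**2)
    df = (table.shape[0] - 1) * (table.shape[1] - 1)

    crit = chi2.ppf(1 - alpha, df)            # critical chi-square value
    power = ncx2.sf(crit, df, n_total * w_squared)
    print(f"power = {power:.3f}")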

Determining the Power to Detect a Specified Correlation

You can determine the power to detect a given Pearson Product Moment
Correlation Coefficient r. A correlation coefficient quantifies the
strength of association between the values of two variables. A correlation
coefficient of 1 means that as one variable increases, the other increases
exactly linearly. A correlation coefficient of −1 means that as one
variable increases, the other decreases exactly linearly. For more
information on computing the correlation coefficient, see page 467.

To determine the power of a correlation coefficient, you need to specify


the:

➤ Correlation coefficient you want to detect.


➤ Desired sample size.
➤ Alpha (α) used for power computations.

To find the power to detect a correlation coefficient:


1 Choose the Statistics menu Power command and choose


Correlation. The Correlation Power dialog box appears.

FIGURE 13–13
The Correlation
Power Dialog Box

2 Enter the expected correlation coefficient. This can be the


correlation coefficient you expect to see, as determined from
previous experiments, or just an estimate.

3 Enter the desired number of data points. This can be the sample
size you expect to obtain, or just an estimate.

4 Enter the desired alpha level. Alpha (α) is the acceptable
probability of incorrectly concluding that there is an association.
The traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is an association when P < 0.05.

Smaller values of α result in stricter requirements before
concluding there is a true association, but a greater possibility of
concluding there is no relationship when one exists (a Type II
error). Larger values of α make it easier to conclude that there is
an association, but also increase the risk of reporting a false
positive (a Type I error).

5 Select the Calculate button to see the power of a correlation coefficient at
the specified conditions. The power calculation appears at the top
of the dialog box. If desired, you can change any of the settings
and select the Calculate button again to view the new power as many times
as desired.

6 Select the Save to Report button to save the power computation


settings and resulting power to the current report, and then select
Close to exit from correlation coefficient power computation.


FIGURE 13–14
The Correlation
Coefficient Power
Computation
Results Viewed in
the Report

For descriptions of computing the power to detect a correlation


coefficient, you can reference an appropriate statistics reference. For a
list of suggested references, see page 12.
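
A common approximation for this calculation uses the Fisher z transform of the correlation coefficient; the Python sketch below (an outside illustration with placeholder values) follows that route and may differ slightly from SigmaStat's result.

    import numpy as np
    from scipy.stats import norm

    r = 0.50        # correlation coefficient you want to detect (placeholder)
    n = 30          # planned number of data points (placeholder)
    alpha = 0.05

    z_r = np.arctanh(r)                 # Fisher z transform of r
    se = 1.0 / np.sqrt(n - 3)           # standard error of z under the null hypothesis
    z_crit = norm.ppf(1 - alpha / 2)    # two-sided test of rho = 0
    power = norm.sf(z_crit - abs(z_r) / se) + norm.cdf(-z_crit - abs(z_r) / se)
    print(f"power = {power:.3f}")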

Determining the Minimum Sample Size for a t-Test

You can determine the minimum sample size for an intended t-test.
Unpaired t-tests are used to compare two different samples from
populations that are normally distributed with equal variances among
the individuals. For more information on running t-tests, see page 212.

To determine the sample size for a t-test, you need to specify the:

➤ Expected difference of the means of the groups you want to detect.


➤ Expected standard deviation of the underlying populations.
➤ Desired power of the t-test.
➤ Alpha level (α) used for determining the sample size.

To determine the sample size of a t-test:

1 Choose the Statistics menu Sample Size command and choose t-


test. The t-test Sample Size dialog box appears.

2 Enter the size of the difference between the means of the two
groups to be detected in the Expected Difference in Means box.
This can be the size you expect to see, as determined from previous
samples or experiments, or just an estimate.


FIGURE 13–15
The t-test Sample
Size Dialog Box

3 Enter the estimated standard deviation of the underlying


population in the Expected Standard Deviation box. This can be
the size you expect to see, as determined from previous samples or
experiments, or just an estimate.

Note that t-tests assume that the standard deviations of the


underlying normally distributed populations are equal.

4 Enter the desired power, or test sensitivity. Power is the probability


that the t-test will detect a difference if there really is a difference.
The closer the power is to 1, the more sensitive the test.

Traditionally, you want to achieve a power of 0.80, which means


that there is an 80% chance of detecting a difference with 1 − α
confidence (i.e., a 95% confidence when α = 0.05).

5 Enter the desired alpha level. Alpha (α) is the acceptable
probability of incorrectly concluding that there is a difference.

The traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before
concluding there is a significant difference, but a greater possibility
of concluding there is no difference when one exists (a Type II
error). Larger values of α make it easier to conclude that there is a
difference, but also increase the risk of reporting a false positive (a
Type I error).


6 Select the Calculate button to see the required sample size for a t-test at the
specified conditions. The sample size calculation appears at the
top of the dialog box. The sample size is the size of each of the
groups. If desired, you can change any of the settings and select
the Calculate button again to view the new sample size as many times as
desired.

7 Select the Save to Report button to save the sample size


computation settings and resulting sample size to the current
report.

8 Select Close to exit from t-test sample size computation.

FIGURE 13–16
The t-test
Sample Size
Results Viewed
in the Report

For descriptions of computing the sample size for a t-test, you can
reference an appropriate statistics reference. For a list of suggested
references, see page 12.
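
To cross-check a minimum sample size outside SigmaStat, the sketch below solves the same kind of problem with statsmodels (placeholder values; the answer is the size of each group, rounded up, and may differ slightly from SigmaStat's algorithm).

    import math
    from statsmodels.stats.power import TTestIndPower

    diff = 4.0        # expected difference in means (placeholder)
    sd = 5.0          # expected standard deviation (placeholder)
    power = 0.80
    alpha = 0.05

    n_per_group = TTestIndPower().solve_power(effect_size=diff / sd,
                                              power=power, alpha=alpha,
                                              ratio=1.0, alternative="two-sided")
    print(math.ceil(n_per_group))     # subjects required in each group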

Determining the Minimum Sample Size for a Paired t-Test

You can determine the sample size for a Paired t-test. Paired t-tests are
used to see if there is a change in the same individuals before and after a
single treatment or change in condition. The sizes of the treatment
effects are assumed to be normally distributed. For more information on
running Paired t-tests, see page 337.

To determine the sample size for a Paired t-test, you need to estimate
the:


➤ Difference of the means you wish to detect.


➤ Estimated standard deviation of the changes in the underlying
population.
➤ Desired power or sensitivity of the test.
➤ Alpha (α) used to determine the sample size.

To find the sample size for a Paired t-test:

1 Choose the Statistics menu Sample Size command and choose


Paired
t-test. The Paired t-test Sample Size dialog box appears.

2 Enter the size of the change before and after the treatment in the
Change to be Detected box. This can be the size of the treatment
effect you expect to see, as determined from previous experiments,
or just an estimate.

3 Enter the size of the standard deviation of the change in the Expected
Standard Deviation of Change box. This can be the size you expect to see,
as determined from previous experiments, or just an estimate.

FIGURE 13–17
The Paired t-test
Sample Size Dialog Box

4 Enter the desired power, or test sensitivity. Power is the probability


that the paired t-test will detect an effect if there really is an effect.
The closer the power is to 1, the more sensitive the test.
Traditionally, you want to achieve a power of 0.80, which means
that there is an 80% chance of detecting an effect with 1 − α
confidence (i.e., a 95% confidence when α = 0.05).

5 Enter the desired alpha level. Alpha (α) is the acceptable


probability of incorrectly concluding that there is an effect. The


traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is a significant treatment difference when P < 0.05.

Smaller values of α result in stricter requirements before
concluding there is a significant effect, but a greater possibility of
concluding there is no effect when one exists (a Type II error).
Larger values of α make it easier to conclude that there is an effect,
but also increase the risk of reporting a false positive (a Type I
error).

6 Select the Calculate button to see the required sample size for a paired t-
test at the specified conditions. The sample size calculation
appears at the top of the dialog box. If desired, you can change any
of the settings and select the Calculate button again to view the new
sample size as many times as desired.

7 Select the Save to Report button to save the sample size


computation settings and resulting sample size to the current
report.

8 Select Close to exit from paired t-test sample size computation.

FIGURE 13–18
The Paired t-test
Sample Size Results
Viewed in the Report

For descriptions of computing the sample size for a paired t-test, you can
reference an appropriate statistics reference. For a list of suggested
references, see page 12.
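
The corresponding paired-design calculation can be sketched the same way with statsmodels (placeholder values; an outside illustration rather than SigmaStat's own routine).

    import math
    from statsmodels.stats.power import TTestPower

    change = 3.0        # change to be detected (placeholder)
    sd_change = 4.0     # expected standard deviation of the change (placeholder)
    power = 0.80
    alpha = 0.05

    n_subjects = TTestPower().solve_power(effect_size=change / sd_change,
                                          power=power, alpha=alpha,
                                          alternative="two-sided")
    print(math.ceil(n_subjects))      # number of subjects, rounded up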


Determining the Minimum Sample Size for a Proportions Comparison

You can determine the sample size for a z-test comparison of


proportions. A comparison of proportions compares the difference in
the proportion of two different groups that falls within a single category.
For more information on running z-tests, see page 438.

To determine the sample size for a proportion comparison, you need to


specify the:

➤ Proportion of each group that falls within the category.


➤ Desired power or sensitivity of the test.
➤ Alpha (α) used to determine the sample size.

To find the sample size for a z-test proportion comparison:

1 Enter the expected proportions that fall into the category for each
group in the Group 1 and 2 Proportion boxes. This can be the
distribution you expect to see, as determined from previous
experiments, or just an estimate.

2 Enter the desired power, or test sensitivity. Power is the probability


that the proportion comparison will detect a difference if there
really is a difference in proportion. The closer the power is to 1,
the more sensitive the test. Traditionally, you want to achieve a
power of 0.80, which means that there is an 80% chance of


detecting a difference with 1 − α confidence (i.e., a 95%
confidence when α = 0.05).

FIGURE 13–19
The Proportions
Sample Size Dialog Box

3 Enter the desired alpha level. Alpha (α) is the acceptable
probability of incorrectly concluding that there is an effect. The
traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is a significant distribution difference when P <
0.05.

Smaller values of α result in stricter requirements before
concluding there is a significant difference, but a greater possibility
of concluding there is no difference in distribution when one exists
(a Type II error). Larger values of α make it easier to conclude
that there is a difference, but also increase the risk of reporting a
false positive (a Type I error).

4 Select the Calculate button to see the required sample size for a proportion
comparison at the specified conditions. The calculated sample size
appears at the top of the dialog box. If desired, you can change any
of the settings and select the Calculate button again to view the new
sample size as many times as desired.

Note: The Yates correction factor is used if this option was selected in the
Options for z-test dialog box. See page 435 for information on z-
test options.

5 Select the Save to Report button to save the sample size


computation settings and resulting sample size to the current


report. The estimated sample size is the sample size for each
group.

6 Select Close to exit from proportion comparison sample size


computation.

FIGURE 13–20
The Proportions
Sample Size Results
Viewed in the Report

For descriptions of computing the sample size for a z-test, you can
reference an appropriate statistics reference. For a list of suggested
references, see page 12.
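
An outside cross-check for the proportions case is sketched below with statsmodels; it uses the arcsine effect size without the Yates correction, so expect small differences from SigmaStat when that option is on (all values are placeholders).

    import math
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    p1, p2 = 0.30, 0.55    # expected proportion in the category for each group (placeholders)
    power = 0.80
    alpha = 0.05

    es = proportion_effectsize(p1, p2)
    n_per_group = NormalIndPower().solve_power(effect_size=es, power=power,
                                               alpha=alpha, ratio=1.0,
                                               alternative="two-sided")
    print(math.ceil(n_per_group))      # sample size for each group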

Determining the Minimum Sample Size for a One Way ANOVA

You can determine the group sample size for a One Way ANOVA
(analysis of variance). One Way ANOVAs are used to see if there is a
difference among two or more samples taken from populations that are
normally distributed with equal variances among the individuals. For
more information on running a One Way ANOVA, see page 237.

To determine the sample size for a One Way ANOVA, you need to
specify the:

➤ Minimum difference between group means to be detected.


➤ Estimated standard deviation of the underlying populations.
➤ Number of groups.


➤ Desired power or sensitivity of the ANOVA.


➤ Alpha (α) used to determine the sample size.

To find the sample size for a One Way ANOVA:

1 Choose the Statistics menu Sample Size command and choose


ANOVA. The ANOVA Sample Size dialog box appears.

FIGURE 13–21
The ANOVA
Sample Size Dialog Box

2 Enter the size of the minimum expected difference of group means


in the Minimum Detectable Difference box. This can be the size of a
difference you expect to see, as determined from previous
experiments, or just an estimate.

3 Enter the size of the standard deviation of the residuals. This can be the


size you expect to see, as determined from previous experiments, or
just an estimate. Note that one way ANOVA assumes that the
standard deviations of the underlying normally distributed
populations are equal. Then enter the expected number of groups.

4 Enter the desired power, or test sensitivity. Power is the probability


that the ANOVA will detect a difference if there really is a
difference among the groups. The closer the power is to 1, the
more sensitive the test. Traditionally, you want to achieve a power
of 0.80, which means that there is an 80% chance of detecting a
difference with 1 − α confidence (i.e., a 95% confidence when α =
0.05).

5 Enter the desired alpha level. Alpha (α) is the acceptable


probability of incorrectly concluding that there is an effect. The


traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before
concluding there is a significant difference, but a greater possibility
of concluding there is no difference when one exists (a Type II
error). Larger values of α make it easier to conclude that there is a
difference, but also increase the risk of reporting a false positive (a
Type I error).

6 Select the Calculate button to see the required sample size for a One Way
ANOVA at the specified conditions. The sample size calculation
appears at the top of the dialog box. The sample size is the size of
each group. If desired, you can change any of the settings and
select the Calculate button again to view the new sample size as many
times as desired.

7 Select the Save to Report option to save the sample size


computation settings and resulting sample size to the current
report, and then select Close to exit from ANOVA sample size
computation.

FIGURE 13–22
The ANOVA Sample
Size Results Viewed
in the Report

For descriptions of computing the sample size for a One Way ANOVA,
you can reference an appropriate statistics reference. For a list of
suggested references, see page 12.
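
The sketch below solves the analogous problem with statsmodels, again converting the minimum detectable difference to Cohen's f with the "two extreme means" convention (my assumption, not necessarily SigmaStat's); values are placeholders and the result is rounded up per group.

    import math
    from statsmodels.stats.power import FTestAnovaPower

    diff = 5.0      # minimum detectable difference between group means (placeholder)
    sd = 4.0        # expected standard deviation of the residuals (placeholder)
    k = 4           # number of groups (placeholder)
    power = 0.80
    alpha = 0.05

    f = diff / (sd * math.sqrt(2 * k))
    n_total = FTestAnovaPower().solve_power(effect_size=f, alpha=alpha,
                                            power=power, k_groups=k)
    print(math.ceil(n_total / k))     # approximate subjects required in each group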


Determining the Minimum Sample Size for a Chi-Square Test

You can determine the sample size for a chi-square (χ²) analysis of a
contingency table. A chi-square test compares the difference between
the expected and observed number of individuals of two or more
different groups that fall within two or more categories. For more
information on running chi-square tests, see page 446.
TABLE 13-2
The Contingency Table with Expected Numbers of Observations of Two Groups in Three Categories

Group        Category 1   Category 2   Category 3
Group 1          15           15           35
Group 2          15           30           10

The sample size for a chi-square analysis of a contingency table is determined


by the estimated relative proportions in each category for each group.
Because SigmaStat uses numbers of observations to compute these
estimated proportions, you need to enter a contingency table in the
worksheet containing the estimated number of observations before you
can compute the estimated proportions.

To find the sample size for a Chi-square test:

1 Enter a contingency table into the worksheet by placing the


estimated number of observations for each table cell in a
corresponding worksheet cell.

The worksheet rows and columns correspond to the groups and


categories. The number of observations must always be an integer.

Note that the order and location of the rows or columns


corresponding to the groups and categories is unimportant. You
can use the rows for category and the columns for group, or vice
versa.

2 Choose the Statistics menu Sample Size command and choose


Chi-square. The Pick Columns for Chi-square dialog box
appears.


FIGURE 13–23
Contingency Table Data
Entered into the Worksheet

FIGURE 13–24
The Pick Columns for
Chi-square Dialog Box

3 Select the columns of the contingency table from the worksheet as


prompted. Select Run when you have selected all three columns.
The Chi-square Sample Size dialog box appears.

FIGURE 13–25
The Chi-square Sample Size
Dialog Box

4 Enter the desired power, or test sensitivity. Power is the probability


that the chi-square test will detect a difference in observed
distribution if there really is a difference. The closer the power is
to 1, the more sensitive the test. Traditionally, you want to achieve
a power of 0.80, which means that there is an 80% chance of
detecting a difference with 1 − α confidence (i.e., a 95%
confidence when α = 0.05).


5 Enter the desired alpha level. Alpha (α) is the acceptable
probability of incorrectly concluding that there is a difference.
The traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is a significant difference when P < 0.05.

Smaller values of α result in stricter requirements before
concluding there is a significant difference, but increase the
possibility of concluding there is no effect when one exists (a Type
II error). Larger values of α make it easier to conclude that there
is a difference, but also increase the possibility of concluding there
is an effect when none exists.

6 Select the Calculate button to see the required sample size for a Chi-square
test at the specified conditions. The sample size calculation
appears at the top of the dialog box. If desired, you can change any
of the settings and select the Calculate button again to view the new
sample size as many times as desired. However, if you want to
change the number of observations per category, you need to select
Close, edit the table, then repeat the sample size computation.

7 Select the Save to Report button to save the sample size


computation settings and resulting sample size to the current
report, and then select Close to exit from chi-square test sample
size computation.

FIGURE 13–26
The Chi-square Sample Size
Computation Results
Viewed in the Report

For descriptions of computing the sample size required for a chi-square


analysis of contingency tables, you can reference an appropriate statistics
reference. For a list of suggested references, see page 12.
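
The same noncentral chi-square machinery used for power can be inverted numerically to find a required total sample size; the Python/scipy sketch below does this with a simple root search (an outside illustration using the Table 13-2 pattern and placeholder targets).

    import numpy as np
    from scipy.optimize import brentq
    from scipy.stats import chi2, ncx2

    table = np.array([[15.0, 15.0, 35.0],    # estimated pattern of observations
                      [15.0, 30.0, 10.0]])
    target_power = 0.80
    alpha = 0.05

    probs = table / table.sum()
    indep = np.outer(probs.sum(axis=1), probs.sum(axis=0))
    w_squared = ((probs - indep) ** 2 / indep).sum()
    df = (table.shape[0] - 1) * (table.shape[1] - 1)
    crit = chi2.ppf(1 - alpha, df)

    def shortfall(n):
        # positive once the power at total sample size n reaches the target
        return ncx2.sf(crit, df, n * w_squared) - target_power

    n_required = brentq(shortfall, 2, 1_000_000)
    print(int(np.ceil(n_required)))          # total observations, rounded up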


Determining the Minimum Sample Size to Detect a Specified Correlation

You can determine the sample size necessary to detect a specified Pearson
Product Moment Correlation Coefficient r. A correlation coefficient
quantifies the strength of association between the values of two variables.
A correlation coefficient of 1 means that as one variable increases, the
other increases exactly linearly. A correlation coefficient of −1 means
that as one variable increases, the other decreases exactly linearly. For
more information on computing the correlation coefficient, see page
467.

To determine the sample size necessary to detect a specified correlation


coefficient, you need to specify the:

➤ Expected value of the correlation coefficient.


➤ Desired power or sensitivity of the test.
➤ Alpha (α) used to determine the sample size.

To find the sample size required for a specific correlation coefficient:

1 Choose the Statistics menu Sample Size command and choose


Correlation. The Correlation Sample Size dialog box appears.

2 Enter the expected correlation coefficient in the Correlation


Coefficient box. This can be the correlation coefficient you expect
to see, as determined from previous experiments, or just an
estimate.

FIGURE 13–27
The Correlation
Sample Size Dialog Box

3 Enter the desired power, or test sensitivity. Power is the probability


that the test will detect an association if there really is one.


The closer the power is to 1, the more sensitive the test.


Traditionally, you want to achieve a power of 0.80, which means
that there is an 80% chance of detecting an association with 1 − α
confidence (i.e., a 95% confidence when α = 0.05).

4 Enter the desired alpha level. Alpha (α) is the acceptable
probability of incorrectly concluding that there is an association.
The traditional α value used is 0.05. This indicates that a one in
twenty chance of error is acceptable, or that you are willing to
conclude there is an association when P < 0.05.

Smaller values of α result in stricter requirements before
concluding there is a true association, but a greater possibility of
concluding there is no relationship when one exists (a Type II
error). Larger values of α make it easier to conclude that there is
an association, but also increase the risk of reporting a false
positive (a Type I error).

5 Select the Calculate button to see the required sample size of a correlation
coefficient at the specified conditions. The sample size calculation
appears at the top of the dialog box. If desired, you can change any
of the settings and select the Calculate button again to view the new
sample size as many times as desired.

FIGURE 13–28
The Correlation
Coefficient Sample
Size Results Viewed
in the Report

6 Select the Save to Report button to save the sample size


computation settings and resulting sample size to the current
report.

7 Select Close to exit from correlation coefficient sample size


computation.


For descriptions of computing the sample size required to detect a


correlation coefficient, you can reference an appropriate statistics
reference. For a list of suggested references, see page 12.
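
A closed-form approximation based on the Fisher z transform gives essentially the same answer and is easy to check by hand; the sketch below uses placeholder values and is an outside illustration, not SigmaStat's algorithm.

    import math
    from scipy.stats import norm

    r = 0.50        # correlation coefficient to detect (placeholder)
    power = 0.80
    alpha = 0.05

    z_r = math.atanh(r)                              # Fisher z transform of r
    n = ((norm.ppf(1 - alpha / 2) + norm.ppf(power)) / abs(z_r)) ** 2 + 3
    print(math.ceil(n))                              # required number of data points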


14 Using Transforms

SigmaStat transforms are math functions and equations which are


applied to worksheet data. Use transforms to perform flexible and
powerful mathematical manipulations on your data.

Types of Transforms

SigmaStat provides three types of transforms. They include:

➤ Quick transforms
➤ A number of miscellaneous data transforms
➤ User-defined transforms

Quick Transforms Use quick transforms to perform fast mathematical transforms on your
data and for linearizing and normalizing data.

Mathematical Transforms The quick mathematical transforms enable
you to add, subtract, divide, and find the absolute value of column data.

Linearizing and Normalizing Transforms When your data does not


meet the assumptions required for meaningful statistical tests, you can
often transform it using a mathematical function so that the resulting
variables meet the requirements of the statistical methods. These
assumptions include the linearity, normality and equal variance
assumptions of linear regression, and analysis of variance.

You can often use transforms in order to use linear regression techniques
for data which does not fall along a straight line. SigmaStat provides a
number of Quick Transforms for linearizing data and stabilizing
(equalizing) non-constant variances.


The other option for handling nonlinear data is to use the appropriate
curved data regression technique. Both polynomial and general
nonlinear regression methods are provided. For a description of
polynomial regression, see Polynomial Regression on page 553. For
information on how to use nonlinear regression, see Chapter 11,
Prediction and Correlation.

For descriptions of linearizing transforms, you can reference any


appropriate statistics reference. For a list of suggested references, see
page 12.

Other Transforms SigmaStat provides a number of miscellaneous data transformations for:

➤ Sorting data (page 52)


➤ Indexing and unindexing data (page 73)
➤ Stacking data (page 34)
➤ Centering data (page 755)
➤ Standardizing data (page 758)
➤ Assigning ranks (page 760)
➤ Computing interaction (page 762)
➤ Creating dummy variables (page 765)
➤ Creating lagged variables (page 775)
➤ Filtering data (page 777)
➤ Generating random numbers (page 781)
➤ Translating missing value symbols (page 785)

These transforms appear as commands on the transform menu.

User-Defined Transforms SigmaStat contains a powerful transform language that can be used to
create customized data transformations that can be both lengthy and
complex. The transform language contains many built-in functions, a
complete set of arithmetic and logical operators, and can even use
custom functions described in the transform library of the STAT.INI
file.

User-defined transforms are entered as equations in the User-Defined


Transform dialog box. They can be saved, opened, and modified as files.
For complete information on user-defined transforms, see User-Defined
Transforms on page 787 and in the Transforms and Regression reference
.pdf.


Quick Mathematical Transforms

The quick mathematical transforms include the following transforms:

➤ Add
➤ Subtract
➤ Divide
➤ Absolute Value

Use these quick transforms to add, subtract, or divide the corresponding


row values of two columns of data and to find the absolute values of data
in a column. The results for each of these transforms are placed in a
specified output column.

Adding, Subtracting, and Dividing Column Data To add, subtract, or divide corresponding row values of two worksheet
columns:

Choose the Transform menu Quick Transforms, Add...,
Subtract..., or Divide... command. The Pick Columns dialog box
appears and prompts you to select an input column.

Note: If you select columns in the worksheet before you choose the sum
transform, the first two selected columns are automatically assigned
as the input columns, and the third column is assigned as the output
column.

1 Pick the first input column with the data you want to add by
clicking it in the worksheet or selecting it from the Data for Input
drop-down list.

The number or title of the selected column appears in the


highlighted input row and you are prompted for another input
column.

2 Pick the column with the data you want to add the first column
values to, subtract the first column values from, or divide the first
columns by as the second input column.


The number or title of the selected column appears in the


highlighted input row and you are prompted for an output
column.

FIGURE 14–1
The Pick Columns for
Add Transform Dialog Box

3 Select the column where you want to place the results of the
addition, subtraction, or division of the input columns as the
output column.

4 To change your selections, select the column assignment in the


Selected Columns list, then pick the desired column from the
worksheet or the drop-down list. You can also double-click a
column assignment to clear it.

5 Select Finish to run the transform. The results of the transform


appear in the specified output column.

Note: If you specify an output column that contains data, a dialog box
appears asking you if you want to overwrite the column contents,
push the contents down, or cancel the transform.

FIGURE 14–2
The Output Columns
Are Not Empty Dialog Box

6 Select Overwrite to replace the existing column contents with the


transform results. Select Insert to place transform results above the
existing cell contents.

The results appear in the specified output column.


Finding the Absolute Values of Column Data Use the Absolute Value quick transform to find the absolute values of
data in a worksheet column. Choose the Transforms menu Quick
Transforms, Absolute Value command. When the Pick Columns dialog
box appears prompting you for an input column, select the column with
the data you want to find the absolute values for.

Select the column you want to put the absolute value results in as the
output column.

For detailed instructions on how to pick input and output columns for
the Absolute Value transform, see steps 1 through 7 on page 749.
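
For readers who also work on their data outside SigmaStat, the Python/pandas sketch below performs the same column arithmetic on a small hypothetical worksheet (the column names and values are made up for illustration).

    import pandas as pd

    ws = pd.DataFrame({"col1": [1.5, -2.0, 3.0],
                       "col2": [0.5,  4.0, 2.0]})

    ws["sum"] = ws["col1"] + ws["col2"]      # Add
    ws["diff"] = ws["col1"] - ws["col2"]     # Subtract
    ws["ratio"] = ws["col1"] / ws["col2"]    # Divide
    ws["abs1"] = ws["col1"].abs()            # Absolute Value
    print(ws)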

Using Quick Transforms to Linearize and Normalize Data

SigmaStat provides several commonly used transformations that are used


to linearize or normalize observations or stabilize the variance,
particularly in regression and analysis of variance problems.

➤ Square x²
➤ Natural log ln(x)
➤ Log log(x)
➤ Reciprocal 1/x
➤ Exponential e^x
➤ Square root √x
➤ Arcsin square root transform arcsin(√x)

To use these quick transforms:


1 Choose the Transforms menu Quick Transforms command, then


choose the desired transform.

FIGURE 14–3
The Transforms menu Quick
Transforms Commands

The Pick Columns dialog box for the specified transform appears
and prompts you for an input column.

Note: If you select a column in the worksheet before you choose the
transform, the selected column is automatically assigned as the
input column, and you are prompted for the output column (see
step 3).

2 Pick the data column you want to apply the transform to as the
input column by clicking it in the worksheet or selecting it from
the Data for Input drop-down list.

The number or title of the selected column appears in the


highlighted input row in the Selected Columns list, and you are
prompted for an output column.

3 Pick the column where you want the transform results to appear as
the output column by clicking it in the worksheet or selecting it
from the drop-down list. The number or title of the selected
column appears in the highlighted output row.

4 To change your selections, select the column assignment in the


Selected Columns list, then pick the desired worksheet column
from the worksheet or drop-down list. You can also double-click a
column assignment to clear it.


FIGURE 14–4
Picking Input and
Output Columns Using
the Pick Columns Dialog Box

5 Select Finish to run the transform on the specified input column.

Note: If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.

FIGURE 14–5
The Output Columns
Are Not Empty Dialog Box

6 Select Overwrite to replace the existing column contents with the


transform results. Select Insert to place transform results above the
existing cell contents.

Common Linearizing and Normalization Transforms Here are some common linearization and normalizing transformations
for nonlinear data. You can use these when you want to use linear
regression or ANOVA on nonlinear data.

For descriptions of linearizing transforms, you can reference any
appropriate statistics reference. For a list of suggested references, see
page 12.


2 Quadratic Functions (Second Order Polynomials) You can use


y = b0 + b1 x + b1 x
transforms to linearize and stabilize variances of curved data that appear
to be described by a quadratic equation.

FIGURE 14–6
Sample Quadratic
Curve Shapes

Note that when squaring the independent variable, you can also
introduce multicollinearity. To avoid this, apply the Center transform
on the data before squaring the independent variable. For information
on centering variables, see page 755.

1 Apply the Square quick transform on the independent variable x


only; the dependent variable y is left alone.

2 Perform Multiple Linear Regression, selecting both the original


independent and the squared independent variables as the
independent variables for the regression equation.

& This procedure can be generalized to any order polynomial; use user-defined
transforms to create the higher-order polynomials. However, if you are
trying to fit data to higher-order curves than a quadratic or cubic, the
Polynomial Regression produces more reliable results. For information on
using Polynomial Regression, see page 553.
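
As a point of reference outside SigmaStat, the same quadratic fit can be sketched in Python with NumPy; the data values and names are hypothetical, and the centering step mirrors the note above:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])            # hypothetical independent variable
    y = np.array([2.1, 4.9, 9.8, 17.2, 26.5, 38.1])         # hypothetical dependent variable

    xc = x - x.mean()                                       # Center transform (reduces multicollinearity)
    X  = np.column_stack([np.ones_like(xc), xc, xc ** 2])   # columns for b0, b1*x, b2*x^2
    b, *_ = np.linalg.lstsq(X, y, rcond=None)               # multiple linear regression coefficients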

Power Functions   y = b0 x^b1

You can use transforms to linearize equations in which the independent variable is raised to a constant power.

To use a power function for Linear Regression:

1 Apply the Ln (natural log) quick transform to both the


independent variable x and the dependent variable y.

2 Perform a Simple Linear Regression on the transformed variables


(see page 469).
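
A minimal sketch of this ln-ln fit in Python with NumPy (hypothetical data, outside SigmaStat) shows why the transform linearizes the model:

    import numpy as np

    x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])          # hypothetical data
    y = np.array([3.0, 5.9, 12.2, 23.8, 48.5])

    b1, ln_b0 = np.polyfit(np.log(x), np.log(y), 1)   # simple linear regression of ln(y) on ln(x)
    b0 = np.exp(ln_b0)                                 # back-transform the intercept: y = b0 * x**b1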
Exponential Functions   y = b0 e^(b1 x)

You can use transforms to linearize exponential functions. All dependent variables y must be positive.


FIGURE 14–7
Sample Power
Function Shapes

To use an exponential function for linear regression:

1 Apply the Ln (natural log) quick transform to the dependent
variable y only; do not transform the independent variable x.

2 Perform a Simple Linear Regression on the transformed dependent


variable y and the original independent variable x.

FIGURE 14–8
Sample Exponential
Function Shapes

Inverse Exponential Functions   y = b0 e^(b1/x)

You can use transforms to linearize inverse exponential functions.

To use an inverse exponential function for Linear Regression:

1 Apply the Ln (natural log) quick transform to the dependent


variable y.

2 Apply the Reciprocal (1/x) quick transform to the independent
variable x.

3 Perform a Simple Linear Regression on the transformed variables


(see page 469).
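
Sketched in Python with NumPy (hypothetical data, outside SigmaStat), the fit regresses ln(y) on 1/x:

    import numpy as np

    x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])        # hypothetical data; x must be nonzero
    y = np.array([0.7, 1.6, 2.4, 3.0, 3.3])

    b1, ln_b0 = np.polyfit(1.0 / x, np.log(y), 1)   # simple linear regression of ln(y) on 1/x
    b0 = np.exp(ln_b0)                               # recovers b0 in y = b0 * exp(b1 / x)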

Hyperbola   y = x / (b0 + b1 x)

You can transform hyperbolas to make them suitable for Simple Linear Regression.

1 Apply the Reciprocal (1/x) quick transform to both the dependent
variable y and the independent variable x.


FIGURE 14–9
Sample Inverse
Exponential Function
Shapes

2 Perform a Simple Linear Regression on the transformed variables


(see page 469).

FIGURE 14–10
Sample Hyperbolic Function
Shapes

y = 1 / (b0 + b1 x)   You can linearize this function with transforms.

1 Apply the Reciprocal (1/y) quick transform to the dependent
variable y; do not transform the independent variable x.

2 Perform a Simple Linear Regression on the transformed dependent


variable y and the original independent variable x.

y = b0 + b1/x   You can linearize this function with transforms.

1 Apply the Reciprocal (1/x) quick transform to the independent
variable x; do not transform the dependent variable y.

2 Perform a Simple Linear Regression on the original dependent


variable y and the transformed independent variable x.
y = (b0 + b1 x)²   You can linearize this function with transforms, particularly if the data is counts of observations.

1 Apply the Square Root (√y) quick transform to the dependent
variable y; do not transform the independent variable x.

2 Perform a simple linear regression on the transformed dependent


variable y and the original independent variable x.


If you want to normalize percentage data that are linearly distributed before performing a statistical procedure, you can use the Arcsin quick transform, arcsin(√x). By definition, most percentage data is linearly distributed. The Arcsin transform modifies percentage values so that the values more closely follow a normal distribution. The data must be values between 0 and 1, where 0 equals 0% and 1 equals 100%.

To normalize percentage data:

1 Apply the Arcsin Square Root quick transform to the data you
want to normalize.

The data from the source column(s) is transformed to a more


normal distribution.

2 Perform the desired parametric statistical procedure on the


transformed data.
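
For reference, the transform and a subsequent parametric procedure can be sketched in Python with NumPy and SciPy; the proportions and the choice of a t-test are hypothetical:

    import numpy as np
    from scipy import stats

    p1 = np.array([0.10, 0.15, 0.20, 0.12, 0.18])   # hypothetical proportions between 0 and 1
    p2 = np.array([0.30, 0.25, 0.35, 0.28, 0.33])

    t1 = np.arcsin(np.sqrt(p1))                     # Arcsin square root transform
    t2 = np.arcsin(np.sqrt(p2))

    t_stat, p_value = stats.ttest_ind(t1, t2)       # parametric test on the transformed data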

FIGURE 14–11
Histograms of Data Before
and After the Arcsin Quick
Transform was Applied

Centering Data

The center transform subtracts the mean of a column from all values in
that column and places the result in a specified output column.

You can often use the center transform on data to eliminate or reduce multicollinearity. For more information on centering data, consult any appropriate statistics text. For a list of suggested references, see page 12.
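
The underlying arithmetic is a one-line operation; here is a minimal Python/NumPy sketch with hypothetical values, outside SigmaStat:

    import numpy as np

    col = np.array([4.0, 7.0, 9.0, 12.0, 18.0])   # hypothetical input column
    centered = col - col.mean()                   # centered column; its mean is 0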

To center a variable:


1 Choose the Transforms menu Center command. The Pick


Columns for Center Transform dialog box appears and prompts
you to select an input column.

& If you select a column in the worksheet before you choose the
transform, the selected column is automatically assigned as the
input column, and you are prompted for the output column (see
step 3).

2 Pick the worksheet column with the data you want to center as the
input column by clicking it in the worksheet or selecting it from
the Data for Input drop-down list. The number or title of the
selected column appears in the highlighted input row, and you are
prompted for an output column.

FIGURE 14–12
The Pick Columns for
Center Transform Dialog Box

3 Pick the column where you want the centered variables to appear
as the output column by clicking it in the worksheet or selecting it
from the Data for Output drop-down list. The number or title of
the selected column appears in the highlighted output row.

4 To change your selections, select the column assignment in the


Selected Columns list, then select the desired column from the
worksheet or the drop-down list. You can also double-click a
column assignment to clear it.

5 Select Finish to run the transform on the specified input column.


& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.

FIGURE 14–13
The Output Columns
Are Not Empty Dialog Box

6 Select Overwrite to replace the existing column contents with the


transform results. Select Insert to place transform results above the
existing cell contents.

The data from the source column is centered around its mean
value and placed in the specified output column.

FIGURE 14–14
Example of Center
Transform
The data in column 1
is the input column and
the data in column 2 is
the result data from
running the Center
transform on the data
in column 1.


Standardizing Data

Use this transform if you want to standardize variables before


performing a statistical procedure. By definition, standardized data has a
mean of zero and a standard deviation of one.

The standardize transform subtracts the mean of a column from all values in that column, then divides the centered values by the column's standard deviation.
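
A minimal Python/NumPy sketch of the computation (the values are hypothetical, and the use of the sample standard deviation, ddof=1, is an assumption):

    import numpy as np

    col = np.array([4.0, 7.0, 9.0, 12.0, 18.0])            # hypothetical input column
    standardized = (col - col.mean()) / col.std(ddof=1)    # result has mean 0 and standard deviation 1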

To standardize a variable:

1 Choose the Transform menu Standardize command. The Pick


Columns for Standardize Transform dialog box appears prompting
you to select an input column.

& If you select a column in the worksheet before you choose the
transform, the selected column is automatically assigned as the
input column in the Selected Columns list, and you are prompted
for the output column (see step 3).

2 Pick the worksheet column with the data you want to standardize
as the input column by clicking it in the worksheet or selecting it
from the Data for Input drop-down list. The number or title of
the selected column appears in the highlighted input row, and you
are prompted for an output column.

FIGURE 14–15
The Pick Columns
for Standardize
Transform Dialog Box

3 Pick the column where you want the standardized variables to


appear as the output column by clicking it in the worksheet or
selecting it from the Data for Output drop-down list. The
number or title of the selected column appears in the highlighted
output row.


4 To change your selections, select the column assignment in the


Selected Columns list, then select the desired column from the
worksheet or drop-down list. You can clear a column assignment
by double-clicking it.

5 Click Finish to run the transform on the specified input column


and place the results in the specified output column.

& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.

FIGURE 14–16
The Output Columns
Are Not Empty Dialog Box

6 Select Overwrite to replace the existing column contents with the


transform results. Select Insert to place transform results above the
existing cell contents.

The data from the source column is standardized and placed in the
specified column.

FIGURE 14–17
Example of the
Standardized Transform
The data in column 1
is the input column and
the data in column 2 is
the result data from running
the Standardized
transform on the data
in column 1.


Ranking Data

Use the rank transform to assign integer rank values to data. Ranking
data is useful if you want to know how the values are ranked, or to perform
a two way ANOVA on the ranks of data that fail the normality or equal
variance tests.

The rank transform assigns rank values to all observations in a column


from smallest to largest. Equal values are tied in rank, and an averaged
rank is assigned to all tied values. This rank is the average of the ranks
that would have been assigned to all the tied values if they were not tied.
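
For reference, SciPy's rankdata function performs the same smallest-to-largest ranking with averaged ties; the values below are hypothetical:

    import numpy as np
    from scipy import stats

    col = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0])   # hypothetical data
    ranks = stats.rankdata(col)   # rank 1 is the smallest value; tied values share the average rank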

To rank a variable:

1 Choose the Transform menu Rank command. The Pick Data for
Rank Transform dialog box appears and prompts you to select an
input column.

& If you select a column in the worksheet before you choose the
transform, the selected column is automatically assigned as the
input column, and you are prompted for the output column (see
step 3).

2 Pick the column with the data you want to rank as the input
column by clicking it in the worksheet or selecting it from the
Data for Input drop-down list. The number or title of the selected
column appears in the highlighted input row, and you are
prompted for an output column.

FIGURE 14–18
The Pick Columns for
Rank Transform Dialog Box

3 Pick the column where you want the ranked variables to appear as
the output column by clicking it in the worksheet or selecting it from


the Data for Output drop-down list. The number or title of the
selected column appears in the highlighted output row.

4 To change your selections, select the column assignment in the


Selected Columns list, then select the desired column from the
worksheet or drop-down list. You can clear a column assignment
by double-clicking it.

5 Click Finish to run the transform on the specified input column


and place the results in the specified output column.

& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.

FIGURE 14–19
The Output Columns
Are Not Empty Dialog Box

6 Select Overwrite to replace the existing column contents with the


transform results. Select Insert to place transform results above the
existing cell contents.


The data from the input column is ranked and the corresponding
rank values are placed in the specified column.

FIGURE 14–20
Example of the
Ranked Transform
The data in column 1
is the input column and
the data in column 2 is
the result data from
running the Ranked
transform on the
data in column 1.

Creating Interaction Variables

Use the interaction transform when you want to introduce an


interaction variable into a multiple linear regression model, i.e., a
variable that takes into account the interaction between two
independent variables. The interaction transform computes the product
of the values in two data columns and places the results in an output
column.

For example, to introduce an interaction factor into the general multiple


linear regression model:

y = b0 + b1 x1 + b2 x2

you could add another variable to the equation equal to x1x2, e.g.,

y = b0 + b1 x1 + b2 x2 + b3 x1 x2

& Note that adding an interaction variable to a multiple linear regression can
induce multicollinearity. To avoid or reduce this problem, use the Center


transform on the original variables, then use the centered variables to


generate the interaction variable.

For descriptions of independent variable interactions in multiple linear regression, consult any appropriate statistics text. For a list of suggested references, see page 12.
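
A minimal Python/NumPy sketch of centering two variables and forming their interaction (hypothetical data, outside SigmaStat):

    import numpy as np

    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical independent variables
    x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

    x1c = x1 - x1.mean()        # center first to reduce multicollinearity
    x2c = x2 - x2.mean()
    interaction = x1c * x2c     # element-wise product, the interaction variable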

To generate an interaction variable:

1 Choose the Transform menu Interactions command. The Pick


Columns for Interactions Transform dialog box appears and
prompts you to select an input column.

& If you selected columns before you ran the transform, the selected
columns are assigned as the input and output columns in the order
they were selected in the worksheet.

2 Pick the first variable column with the data you want to factor into
the interaction by clicking it in the worksheet or selecting it from
the Data for Input drop-down list, then pick the second input
column. The number or title of the selected column appears in the
input row of the Selected Columns list.

FIGURE 14–21
The Pick Columns for
Interaction Transform Dialog
Box

3 Select the column where you want to place the interaction variable
as the output column by clicking it in the worksheet or selecting it
from the Data for Output drop-down list. The number or title of
the selected column appears in the highlighted output row.

4 To change your selections, select the column assignment in the


Selected Columns list, then select the desired column from the
worksheet or drop-down list. You can clear a column assignment
by double-clicking it.


5 Select Finish to run the transform on the specified input columns


and place the results in the specified output column.

& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.

FIGURE 14–22
The Output Columns
Are Not Empty Dialog Box

6 Select Overwrite to replace the existing column contents with the


transform results. Select Insert to place transform results above the
existing cell contents.

The data from the input columns are factored together and placed
in the specified output column.

FIGURE 14–23
Example of the
Interaction Transform
The data in columns 1 and 2
are the input columns and
the data in column 3 is
the result data from
running the Interaction
transform on the
data in columns 1 and 2.



Creating Dummy (Indicator) Variables

Dummy, or indicator, variables can be used to determine if sets of data


share the same constant (intercept) value, by determining if the constant
is affected by conditional changes specified by the dummy variables.
This can be used to determine if all the data in a simple linear regression
lie on the same line, or if there are conditional dependencies on the
independent variable.

Dummy variables are generally computed from indexed data columns.


They are assigned to index variable data similarly to the way index values
are assigned to raw data, but always correspond to specific numeric
values, as determined by the index variable values and by the kind of
dummy variable coding used. There are two ways to define dummy
variables: reference coding and effects coding.

If the index column contains numeric values, the dummy variable


transform uses the nearest whole number as the code value, and
evaluates the data for the corresponding dummy variable by rounding
up to the nearest whole number.

For k number of different index variable values (conditions of possible


dependencies), the dummy variables transform creates k – 1 dummy
variables. If an index variable contains two different index values, one
dummy variable column is produced, and the other is used as the
reference or effect index value; if the index variable contains three
different index values, two dummy variable columns are produced, and
the third is used as the reference or effect value. To create a dummy

variable column, the index column must contain at least two different
index values.

FIGURE 14–25
A SigmaPlot Graph Showing the Effect of Different Intercepts for Different Groups
The groups are quantified using dummy variables to fit all the data with a single multiple regression equation. The regression lines are shown for both data sets together—for each data set considered.

For descriptions of how to use dummy variables to detect different slopes, consult any appropriate statistics text. For a list of suggested references, see the bibliography in Appendix A.

Reference Coding Reference coding sets the value of all dummy variables to zero when the
index variable corresponds to the indexed condition used, and codes all
other values of the index variable with a 1. The referenced condition is
always assigned
a 0.

& Use reference coding when you want the constant to be the mean of the
dependent variable under a selected referenced condition, and the
coefficients computed for the dummy variable(s) to reflect the changes of the
constant value from reference condition dependent variable mean.

To create reference coded dummy variables:

1 If necessary, create an index column for your data. These data can
consist of any numbers or strings. Each dependent variable value
that falls under a different condition is indexed with a different
label. For more information on indexing data, see page 73. Two

factor and repeated measures data require additional index
columns.

FIGURE 14–26
Example of Indexed Data
Column Created For
Dependent Variable Data
Column 3 is the indexed data.

2 Choose the Transforms menu Dummy Variables command, and


choose Reference Coded. The Pick Columns for Reference
Transform dialog box appears and prompts you to select input and
output columns.

& If you selected columns before you ran the transform, the selected
columns are assigned as the input and output columns in the order
they were selected in the worksheet.

3 Pick the column with the indexed data you want to create dummy
variables for as the input column by clicking it in the worksheet or
selecting it from the Data for Input drop-down list. The number

or title of the selected column appears in the highlighted input row,
and you are prompted for the output column.

FIGURE 14–27
The Pick Columns for
Reference Transform Dialog
Box

4 Select the destination column for the dummy variables as the


output column by clicking it in the worksheet or selecting it from
the Data for Output drop-down list. The number or title of the
selected column appears in the highlighted output row.

There should be enough empty columns to the right of the


destination column to accommodate all the dummy variable
columns; the number of dummy variable columns produced is one
less than the number of index values (different groups).

5 To change your selections, select the column assignment in the


Selected Columns list, then select the desired column from the
worksheet or drop-down list. You can clear a column assignment
by double-clicking it.

6 Select Finish to run the transform. The Select Reference Index


dialog box appears.

FIGURE 14–28
The Select Reference
Index Dialog Box

7 Select the reference index value from the list to use as the reference
condition; no dummy variable is created using this value (this is
the condition that determines the constant value; the
corresponding dummy variable values for this condition are always

zero). All other index values are evaluated for the corresponding
dummy variable values.

8 Click OK. The reference coded dummy variables are placed in as


many columns as there are index values, less one. Index column
values that match the condition used to evaluate the column are
assigned a zero; all other values are assigned a 1. One dummy
variable column is produced for each index value, except for the
index value selected as the reference condition.

& Ifappears
you specify an output column that contains data, a dialog box
asking you if you want to erase the column contents, push
the contents down, or cancel the transform.

FIGURE 14–29
The Output Columns
Are Not Empty Dialog Box

9 Click Overwrite to replace the existing column contents with the


transform results. Click Insert to place transform results above the
existing cell contents.

If you are creating dummy variables for a two factor or repeated
measures problem, create dummy variables for all remaining index
columns.

FIGURE 14–30
Example of Reference
Coded Dummy
Variable Values
The data in column 2 is the
input data and the data in
column 4 is the output data.

Effects Coding In effects coding, the dummy variables are coded with -1, 0, and 1. The
reference condition is always coded with a -1. The value of other
dummy variables is set to zero when the index variable corresponds to
the indexed condition used, and set to 1 for all other values of the index
variable.

& Use effects coding when you want the constant term to be computed using
the value of the dependent variable under all indexed conditions, and you
want the coefficients of the dummy variables to quantify the size of changes
from this overall mean.

To create effects coded dummy variables:

1 If necessary, create an index column for your data. This data can
consist of any numbers or strings. Each dependent variable value
that falls under a different condition is indexed with a different
label. For more information on indexing data, see page 73. Two

factor and repeated measures data require additional index
columns.

FIGURE 14–31
Example of Indexed Data
Columns Created For
Dependent Variable Data
Column 3 is the indexed data.

2 Choose the Transforms menu Dummy Variables command, and


choose Effects Coded. The Pick Columns for Effects Transform
dialog box appears and prompts you to select an input column.

& If you selected columns before you ran the transform, the selected
columns are assigned as the input and output columns in the order
they were selected in the worksheet.

3 Pick the column with the indexed data you want to create dummy
variables for as the input column by clicking it in the worksheet or
by selecting it from the Data for Input drop-down list. The

number or title of the selected column appears in the highlighted
input row, and you are prompted for the output column.

FIGURE 14–32
The Pick Columns for
Effects Transform Dialog
Box

4 To change your selections, select the column assignment in the


Selected Columns list, then select the desired column from the
worksheet or drop-down list. You can clear a column assignment
by double-clicking it.

5 Pick the destination column for the dummy variable column(s) as


the output column by clicking it in the worksheet or selecting it
from the Data for Output drop-down list. The number or title of
the selected column appears in the highlighted output row of the
Selected Columns list.

There should be enough empty columns to the right of the


destination column to accommodate all the dummy variable
columns; the number of dummy variable columns produced is one
less than the number of index values (different groups).

6 Select Finish to run the transform and open the Select Reference
Index dialog box.

FIGURE 14–33
The Select Reference
Index Dialog Box

7 Select the reference index value from the list to use as the reference;
no dummy variable is created for this value, and the corresponding
dummy variable values for this condition are always -1. All other

dummy variable values are set to 1 for the corresponding index
variable values.

8 Select OK. Values in the index column that match the index value
used to evaluate the column are assigned a zero. Index values that
match the reference condition are assigned -1. All other values are
set to 1. One dummy variable column is produced for each index
value, except the index selected as the reference condition.

& Ifappears
you specify an output column that contains data, a dialog box
asking you if you want to erase the column contents, push
the contents down, or cancel the transform.

FIGURE 14–34
The Output Columns
Are Not Empty Dialog Box

9 Select Overwrite to replace the existing column contents with the


transform results. Select Insert to place transform results above the
existing cell contents.

If you are creating dummy variables for a two factor or repeated
measures problem, create dummy variables for all remaining index
columns.

FIGURE 14–35
Example of Effects Coded
Dummy Variable Values
The data in column 3 is the
input data and the data in
columns 4 and 5 are the
output data columns.

Performing a Regression Using Dummy Variables   The equation used to evaluate the effect of a condition on the regression model constant is:

y = b0 + b1x + b2d1 + b3d2 + … + bk dk-1

where y is the dependent variable, x is the independent variable, k is the number of conditions that may affect the constant, d1, d2, …, dk-1 are the dummy variables, and b0, b1, b2, …, bk are the coefficients.
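
As a sketch of this model outside SigmaStat, a least-squares fit with one dummy variable can be written in Python with NumPy; the data and the single dummy column are hypothetical:

    import numpy as np

    y  = np.array([2.0, 3.1, 4.2, 6.8, 8.1, 9.0])   # hypothetical dependent variable
    x  = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])   # hypothetical independent variable
    d1 = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # hypothetical dummy variable for the second group

    X = np.column_stack([np.ones_like(x), x, d1])   # columns for b0, b1, b2
    b, *_ = np.linalg.lstsq(X, y, rcond=None)       # b[2] estimates the shift in intercept for the d1 group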

To perform a Multiple Linear Regression using dummy variables:

1 Select Multiple Linear Regression from the toolbar drop-down list,


then click the button.

2 Select the dependent variable column, then select the original


independent variable and all the dummy variables as the
independent variables. Click Finish to run the regression on the
selected columns.

3 Compare the results to the original Simple Linear Regression. If
the prediction is significantly better, you should consider
performing a simple linear regression on the different conditions
separately.

4 You can use dummy variables to convert analysis of variance


problems into regression problems. For more information on how
to do this, you can reference any appropriate statistics reference.
For a list of suggested references, see the bibliography in Appendix
A.

Creating Lagged Variables

The lagged variables transformation lags the observations in one column by one row, by inserting a missing value in the first row of the data and removing the last value; the overall column size remains constant.

Lagged variables are commonly used to create time series models, when
the effect of an independent variable on the dependent variable
corresponds more appropriately to the value of the dependent variable at
a later time.
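
The operation itself is a one-row shift; here is a minimal Python/NumPy sketch with hypothetical values, outside SigmaStat:

    import numpy as np

    col = np.array([10.0, 12.0, 15.0, 11.0, 14.0])   # hypothetical input column
    lagged = np.empty_like(col)
    lagged[0]  = np.nan      # missing value inserted in the first row
    lagged[1:] = col[:-1]    # remaining values shifted down one row; the last value is dropped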

To lag a variable:

1 Choose the Transform menu Lagged Variables command. The


Pick Columns for Lagged Variable Transform dialog box appears
and prompts you to pick an input column.

2 Pick the column with the data you want to lag as the input column
by clicking it in the worksheet or selecting it from the Data for
Input drop-down list. The number or title of the selected column
appears in the highlighted input row of the Selected Columns list,
and you are prompted for an output column.

3 Pick the column where you want the lagged variables to appear as
the output column by clicking it in the worksheet or selecting it
from the Data for Output drop-down list. The number or title of

the selected column appears in the highlighted output row of the
Selected Columns list.

FIGURE 14–36
The Pick Columns for
Lagged Transform Dialog
Box

4 To change your selections, select the column assignment in the


Selected Columns list, then select the desired column from the
worksheet or drop-down list. You can clear a column assignment
by double-clicking it.

5 Select Finish to run the transform on the specified input column


and place the results in the specified output column.

& If you specify an output column that contains data, a dialog box
appears asking you if you want to erase the column contents, push
the contents down, or cancel the transform.

FIGURE 14–37
The Output Columns
Are Not Empty Dialog Box

6 Select Overwrite to replace the existing column contents with the


transform results. Select Insert to place transform results above the
existing cell contents.

The data from the source column is lagged by one row and placed
in the specified column.

FIGURE 14–38
Example of Lagged
Variable Values
The data in column 2 is the
input data and the data in
column 3 is the output data.

7 Repeat this transform if you need to lag the data by additional


rows.

Filtering Strings and Numbers

You can isolate specified groups of data using both numeric and text
filters. The filter transform operates by selecting only the rows that
correspond to specified numbers or labels in a key column, then placing
these rows and the corresponding data in new columns.

You can sort data according to a numeric range of a key column or


according to the text label in a key column. These columns are usually
factor or subject index columns for indexed data.
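
For reference, the same row selection can be sketched in Python with NumPy using boolean masks; the key column, data column, bounds, and label are hypothetical:

    import numpy as np

    key  = np.array([1, 2, 3, 4, 2, 5])                 # hypothetical key column
    data = np.array([5.1, 6.2, 7.3, 8.4, 9.5, 10.6])    # corresponding data column

    keep = (key >= 1) & (key <= 3)    # numeric filter: lower bound 1, upper bound 3
    filtered_key  = key[keep]
    filtered_data = data[keep]

    labels = np.array(["Site 1", "Site 3", "Site 2", "Site 3", "Site 1", "Site 2"])
    text_keep = labels == "Site 3"    # text filter: rows whose key label matches exactly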

To sort numerically or using text:

1 Choose the Transform menu Filter command. The Filter Data


Transform dialog box appears and prompts you to pick a key
column.

& If you select columns before you choose the missing values
transform, the selected columns are assigned as the input and output
column is the order they were selected in the worksheet.

2 Pick the key column to filter by clicking it in the worksheet or


selecting it from the Data for Key drop-down list. This is the
column you want to apply the sorting filter to. The number or
title of the selected column appears in the highlighted key row, and
you are prompted for an output column.

3 Pick the column where you want the results of the key column to
appear as the output column by clicking it in the worksheet or
selecting it from the Data for Output drop-down list. The number
or title of the selected column appears in the highlighted output
row, and you are prompted for an input column.

4 Pick the columns that contain the corresponding data to be


filtered along with the key column as the input columns, then pick
their corresponding output columns by clicking them in the
worksheet or selecting them from the drop-down lists. You can
pick as many input columns as desired, and you must pick an
output column for every input column.

FIGURE 14–39
The Pick Columns for
Filter Transform Dialog Box

5 To change your selections, select the column assignment in the


Selected Columns list, then select the desired column from the
worksheet or drop-down list. You can clear a column assignment
by double-clicking it.

6 Select Finish to run the Filter transform. The Set Filter dialog box
appears.

7 Select Numeric Filter to sort the key column data according to a
numeric range. Specify the upper and lower bounds of the values
to filter in the Upper Bound and Lower Bound boxes.

FIGURE 14–40
The Set Filter Dialog Box

8 Select Text Filter to sort the key column data according to a text
label in the key column. Enter the string exactly as it appears in
the worksheet in the Key Label box and select OK when you have
specified the appropriate filter.

& Ifappears
you specify an output column that contains data, a dialog box
asking you if you want to erase the column contents, push
the contents down, or cancel the transform.

FIGURE 14–41
The Output Columns
Are Not Empty Dialog Box

9 Select Overwrite to replace the existing column contents with the
transform results. Select Insert to place transform results above the
existing cell contents.

FIGURE 14–42
Example of the Filter
Transform Using
a Numeric Filter
The data in columns 1
through 3 were filtered to
include a range of 1 through
3, using column 1 as the key
column, and placed in
columns 4 through 6.

Columns are filtered according to the corresponding rows in the


key column.

FIGURE 14–43
Example of the
Filter Transform
Using the Text Filter
The data in columns 1
through 3 was filtered for
the label “Site 3,” using
column 3 as the key column,
and placed in columns 4
through 6.

Generating Random Numbers

You can generate either uniform or normally distributed random


numbers. These commands perform identically to the random and
Gaussian user-defined transform functions.
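
The equivalent draws outside SigmaStat can be sketched with NumPy's random generator; the quantity, bounds, mean, standard deviation, and seed below are hypothetical:

    import numpy as np

    rng = np.random.default_rng(seed=12345)                  # fixed seed; omit the seed for a random one
    uniform = rng.uniform(low=0.0, high=1.0, size=100)       # uniformly distributed random numbers
    normal  = rng.normal(loc=50.0, scale=5.0, size=100)      # normally distributed, mean 50, SD 5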

Uniformly Distributed Random Numbers   To generate uniformly distributed random numbers:

1 Choose the Transform menu Random Numbers command and
choose Uniform... The Pick Columns for Uniform Random
Transform dialog box appears and prompts you for an output
column.

& Input columns are not selected for the random number transform.

2 Pick the column where you want the random numbers to appear as
the output column by clicking it in the worksheet or selecting it
from the Data for Output drop-down list. The number or title of
the selected column appears in the highlighted output row of the
Selected Columns list.

FIGURE 14–44
The Pick Columns
for Uniform Random
Transform Dialog Box

3 To change your selections, select the column assignment in the


Selected Columns list, then select the desired column from the
worksheet or drop-down list. You can clear a column assignment
by double-clicking it.

4 Select Finish to open the Random Number Generation dialog box,


and enter the number of random numbers you want to generate in
the Quantity box.

5 Enter the lowest and highest numbers in the range of numbers in


the Low and High boxes.

FIGURE 14–45
The Uniform Random
Number Generation Dialog
Box

6 Enter the seed for the random generator. This is the number used
to generate the random numbers. Select Random from the drop-
down list to use a random seed number.

7 Click OK when finished. The random numbers are generated


according to your specifications and appear in the selected output
column.

& Ifappears
you specify an output column that contains data, a dialog box
asking you if you want to erase the column contents, push
the contents down, or cancel the transform.

FIGURE 14–46
The Output Columns
Are Not Empty Dialog Box

8 Select Overwrite to replace the existing column contents with the


transform results. Select Insert to place transform results above the
existing cell contents.

Normally Distributed Random Numbers   To generate random data that follows a normal “bell” shaped distribution curve:

1 Choose the Transform menu Random Numbers command and


choose Normal... The Pick Columns for Normal Random
Transform dialog box appears and prompts you to select an output
column.

& Input numbers are not selected for the random number transform.

2 Pick the column where you want the random numbers to appear as
the output column by clicking it in the worksheet or selecting it
from the Data for Output drop-down list. The number or title of
the selected column appears in the highlighted output row of the
Selected Columns list.

FIGURE 14–47
The Pick Columns
for Normal Random
Transform Dialog Box

3 To change your selections, select the column assignment in the


Selected Columns list, then select the desired column from the
worksheet or drop-down list. You can clear a column assignment
by double-clicking it.

4 Click Finish to open the Normal Random Number Generator


dialog box, and enter the number of random numbers you want to
generate in the Quantity box.

FIGURE 14–48
The Normal Random
Number Generator Dialog
Box

5 Enter the mean used for the numbers. This is the “middle” or
“top” of the bell curve.

6 Enter the standard deviation for the data. The size of this value
determines the amount of variation about the mean of the data. A
relatively large standard deviation distributes data as a low, flat bell.
A relatively small standard deviation creates a tall, skinny bell.

7 Enter the seed for the random number generator. This is the
number used to generate the random numbers. Select Random
from the drop-down list to use a random seed number.

8 Click OK when finished. The random numbers are generated


according to your specifications and appear in the selected output column.

& Ifappears
you specify an output column that contains data, a dialog box
asking you if you want to erase the column contents, push
the contents down, or cancel the transform.

FIGURE 14–49
The Output Columns
Are Not Empty Dialog Box

9 Select Overwrite to replace the existing column contents with the


transform results. Select Insert to place transform results above the
existing cell contents.

FIGURE 14–50
Example of Values
Generated by The Random
Numbers Transform
Column 1 contains uniformly
distributed random numbers.
Column 2 contains normally
distributed random numbers.

Translating Missing Value Codes

The missing data transformation converts specified values in selected


columns to the SigmaStat missing value double-dash indicator (“--”).
Use this transform to translate bad or missing value codes from other
data formats to the SigmaStat format. You can also use this transform to
convert all incidences of a bad observation to missing values.
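
A minimal Python sketch of the same clean-up, treating NaN as the missing-value marker (the code "N/A" and the column values are hypothetical):

    import numpy as np

    col = ["1.2", "N/A", "3.4", "N/A", "5.6"]                                # hypothetical imported column
    cleaned = np.array([np.nan if v == "N/A" else float(v) for v in col])    # "N/A" becomes a missing value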

To convert all occurrences of a string to “--”:

1 Choose the Transform menu Missing Values command. The Pick


Columns for Missing Values Transform dialog box appears and
prompts you to select an input column.

& If you select columns before you choose the missing values
transform, the selected columns are assigned as the input and output
columns in the order they were selected in the worksheet.

2 Pick the columns with the strings you want to convert to missing
values as the input columns by clicking them in the worksheet or
selecting them from the Data for Input drop-down list; then pick the
corresponding output columns. You must pick an output column
for every input column you select. You can pick as many input
columns as desired.

The number or title of the selected columns appear in the


highlighted input and output rows in the Selected Columns list.

FIGURE 14–51
The Pick Columns
for Missing Value
Transform Dialog Box

3 To change your selections, select the column assignment in the


Selected Columns list, then select the desired column from the
worksheet or drop-down list. You can clear a column assignment
by double-clicking it.

4 Select Finish to run the transform and open the Missing Value
Transform dialog box.

5 Specify the string to replace with missing value symbols. Enter the
string exactly as it appears in the worksheet, or select the string
from the drop-down list.

FIGURE 14–52
The Missing Value
Transform Dialog Box

6 Click OK when finished. The specified symbols are converted to


missing values.

& Ifappears
you specify an output column that contains data, a dialog box
asking you if you want to erase the column contents, push
the contents down, or cancel the transform.

FIGURE 14–53
The Output Columns
Are Not Empty Dialog Box

7 Click Overwrite to replace the existing column contents with the
transform results. Select Insert to place transform results above the
existing cell contents.

FIGURE 14–54
Example of the
Missing Values
Transform Converting
Text Strings to
Missing Values
The string “N/A” in
columns 1 through 3
was converted to “--”
symbols in columns
4 through 6.

User-Defined Transforms

For an in-depth discussion of transforms, refer to the Transforms and Regression reference .pdf. This .pdf file contains descriptions and examples of all transform functions, as well as a tutorial and transform examples with results.

One of SigmaStat’s more powerful and flexible features is its extensive


mathematical transformation language, which enables you to define
transforms to manipulate and modify worksheet data. You can use user-
defined transforms to create new data by performing functions on
existing data, or generate calculated or “random” data, which can then
be placed in worksheet columns.

To create user-defined transforms:

1 Choose the Transforms menu User Defined... command. The


User-Defined Transform dialog box appears. This window can be
resized for a better view of the edit box.

2 Type the transforms instructions into the edit box. You can enter
up to 32K worth of text.

3 Select Execute to perform the transform.

The contents of the transform edit window can be saved to a file by


pressing the Save button. Since this is a text file, you can view or print
these files using any word processor. Previously saved transforms can be
opened and brought into the transform window for execution and
modification using the Open button.

A library of sample transforms has been provided with SigmaStat; these


can be found in the XFMS subdirectory of the Stat3 directory. To view
these files, select Open... in the User-Defined Transform window, then
use the dialog box to open a transform file. Most transform files also
include graph files displaying the results of the transform.

FIGURE 14–55
The User-Defined
Transform Dialog Box

Glossary

95% or 99% Confidence See Confidence Interval.

Advisor The SigmaStat Advisor is designed to help you determine the appropriate SigmaStat test to
use to analyze your data. For more information, see the USING THE ADVISOR WIZARD chapter.

Alpha Value Alpha (α) is the acceptable probability of incorrectly rejecting the null hypothesis.

ANOVA on Ranks Also known as the Kruskal-Wallis analysis of variance on ranks. This
nonparametric test compares several different experimental groups that receive different treatments.

Arcsin Square Root Transform This transform is used to normalize percentage data that is linearly
distributed before performing a statistical procedure, by computing arcsin(√x).

ASCII File See text file. (ASCII stands for American Standard Code for Information Interchange.)

AUTOEXEC.BAT A DOS file that automatically executes a series of commands when DOS is
booted.

Axis In a Cartesian graph, an axis indicates the direction and range of X, Y, or Z values. In SigmaStat,
axes define the origin and scaling of a plot, and include tick and label definitions.

Backward Stepwise Regression One of two stepwise regression methods for selecting independent
variables. In backward stepwise regression, all variables are entered into the equation. The
independent variable that contributes the least to the prediction is removed, followed by the next least,
and so on.

Bar Chart A plot which graphs data as vertical or horizontal bars with bar lengths equal to the data
values.

Base (of an exponent) The number that is raised to the exponential power (for example, 10 or e).

Block A selected, rectangular region of worksheet cells. Blocks can be copied, deleted, pasted,
transposed, sorted, printed, and exported.


Box Plot A plot type that displays the 10th, 25th, 50th, 75th, and 90th percentiles as lines on a bar
centered about the mean, and the 5th and 95th percentiles as error bars. The mean line and data
points beyond the 5th and 95th percentiles can also be displayed.

Cell (worksheet) A location on the worksheet that holds a single data value or label, described by its
column and row number.

Center This transform is used to subtract the mean of a column from each of the values in that
column and place the results in a specified output column.

Chi-Square The Chi-Square statistic summarizes the difference between the expected and the
observed frequencies by summing the squared differences and dividing by the expected frequencies

Σ (O – E)² / E

It can be calculated wherever you have a set of observed values and a set of corresponding expected
values.

Click To press and release a mouse button, usually to select a menu or dialog box option, an item on
a list, a block of text, etc.

Clipboard The Windows data buffer where cut or copied data and text are stored. Press Ctrl+V or
use the Edit menu Paste command to place Clipboard contents in the worksheet or on the page. Note
that data and text are stored in the same Clipboard, so cutting additional data, text, or objects
overwrites current Clipboard contents. Cleared (deleted) data or text bypasses the Clipboard and
leaves the current contents intact.

Coefficient A real number that multiplies a variable in an algebraic expression. See also, Correlation
Coefficient and R.

Column The SigmaStat worksheet consists of columns and rows of cells. A column is a vertical
collection of cells which generally holds a range of numbers to be analyzed as a set.

Column Titles These are used to identify groups of data. The column title is displayed above the
data column in the worksheet.

Column Statistics A collection of statistics computed for each column. These are displayed on the
bottom half of the worksheet by choosing the View menu Column Statistics command.

Common Log Scale An axis type that plots data along a logarithmic scale with base 10. See also,
Natural Log Scale.


Confidence Interval Also known as confidence level, a specified confidence interval can be any
value from 1 to 99 percent; the suggested confidence level for both intervals is 95%. This can also be
described as P < α (alpha), where α is the acceptable probability of incorrectly concluding that the
coefficient is different than zero, and the confidence interval is 100(1 - α).

Confidence Interval for the Mean The range in which the true population mean will fall for a
percentage of all possible samples of a certain size drawn from the population.

CONFIG.SYS A DOS file that installs device drivers and sets system parameters when you turn on or
restart your computer.

Contingency Table A method of displaying the observed numbers of different groups that fall into
different categories. These tables are used to see if there is a difference between the expected and
observed distributions of the groups in the categories.

A contingency table associates the groups and categories with the rows and columns, and places the
number of observations for each combination in the cells. For more information about how to create
a contingency table, see page 69.

Constant Variance Also known as homoscedasticity, this is the assumption that the variance of the
dependent variable in the source population is constant regardless of the value of the independent
variables.

Copy To place selected worksheet data or graphic objects in the Windows Clipboard without
removing the data or objects, press Ctrl+C or use the Edit menu Copy command. The Clipboard
contents can be placed elsewhere on the worksheet or page by pressing Ctrl+V or selecting the Edit
menu Paste command.

Correlation Coefficient (R) R represents the measure of the relationship between two variables.
Specifically, it is the covariance divided by the product of the sample standard deviations. This
number varies between -1 and +1.

Cursor See Pointer.

Cut To remove selected data, graphs, or text and place them in the Windows Clipboard. Press
Ctrl+X or use the Edit menu Cut command to cut data, graphs, or text. Cut displaces any current
Clipboard contents. Only the last cut item can be pasted. Note that data, graphs, and text use the
same Clipboard.

The Clipboard contents can be placed at any selected worksheet or page location by pressing Ctrl+V or
choosing the Edit menu Paste command. Clipboard contents can also be pasted into other Windows
applications. See also, Paste.

Data Set A column, or set of worksheet columns, that have been chosen for analysis.


Degrees of Freedom Represents a measure of the sample size, which affects the power of a test.

Delimiter A symbol or character used to separate data fields within a data file format; for example,
white space, commas, semicolons, or colons.

Descriptive Statistics SigmaStat can describe your data by computing basic statistics, such as the
mean, median, standard deviation, percentiles, etc. that summarize the observed data.

Dialog Boxes Boxes of commands and options that appear on the screen. Use dialog box options to
view and change test and report settings.

DIF files A text-based data file format, recognized by SigmaStat, developed by Software Arts™ which
is used widely to exchange data between programs.

Drag Move the mouse while holding down the left mouse button.

Dummy Variables Also known as indicator variables, they can be used to determine if sets of data
share the same constant (intercept) value by determining if the constant is affected by conditional
changes specified by the dummy variables. Dummy variables can be defined with either effects coding
or reference coding.

Durbin-Watson Statistic This is a measure of serial correlation between the residuals. If the residuals
are not correlated, the Durbin-Watson statistic will be 2. Small values indicate positive serial
correlation among the residuals; large values indicate negative serial correlation.

Equal Variance Test This test is used to determine if the variances of the samples are equal. The test
is computed with a Levene Median test.

Exponent The power to which a base is raised. See Base.

Exponential Transform This transform is used to calculate the values of the number e raised to the
values in a specified column.

Export Data Save worksheet data from SigmaStat to a file, for use with other programs. Choose the
File menu Export Data command to export files in Text, DIF, Excel, or other file formats. See also,
Text files and DIF.

Fills Fills include pattern of lines and colors that fill bar chart bars, pie chart slices, 3D graph mesh
grids, 3D bar fills, and drawn objects. Fill patterns affect the color of bar, box, and slice edges and
mesh grid lines. Fills and edges are specified using the Fills settings of the Graph Properties dialog box.

Filter Transform This transform is used to isolate specified groups of data using both numeric and
text filters. It operates by selecting only rows that correspond to specified numbers or labels in a key
column, then placing these rows and the corresponding data in new columns.


Fisher Exact Test This test determines the exact two-tailed probability of observing a specific 2 × 2
contingency table (or a more extreme pattern).

Font A style or type of character. TrueType fonts are available with the Windows system. Other
fonts, such as PostScript and Hewlett Packard fonts, are only available if the printer drivers are
installed.

Forward Stepwise Regression One of two stepwise regression methods for selecting independent
variables. In forward stepwise regression, the independent variable that produces the best prediction of
the dependent variable is entered into the equation first, the independent variable that adds the next
largest amount of information is entered second, and so on.

Gaussian Distribution A continuous probability distribution defined by two parameters, mean, and
variance. Also called the normal distribution. You can use the gaussian transform function to generate
normally distributed data.

Hardcopy A printed copy of a graph, report, or worksheet page.

Help System A context-sensitive system of indexed screens providing on-line information about
SigmaStat commands and operations. Press F1 to view the Help Contents, or choose one of the Help
menu commands to get additional help information.

Related topics are linked through highlighted words on the screen; selecting these brings up the entry
for that topic.

Histogram A representation of a frequency distribution showing the number of occurrences within


specified intervals, usually displayed as a bar or step chart.

Holm-Sidak Test This can be used for both pairwise comparisons and comparisons versus a control
group. It is more powerful than the Tukey and Bonferroni tests and, consequently, it is able to detect
differences that these other tests do not. It is recommended as the first-line procedure for pairwise
comparison testing.

Hotkey A quick method of selecting menu commands and dialog box options. A letter in the
command or option appears underlined; pressing that letter selects the command or option.

Import Data Transfer data from a file to the SigmaStat worksheet for testing or other operations.
SigmaStat recognizes text, DIF, Lotus 1-2-3, Excel, SigmaPlot, and other file formats.

Choose the File menu Import Data... command to select files to import. See also, Text Files, DIF, and
Lotus .WKS.

Indexed Data This data format places the group names in one column (a factor column), and the
corresponding data for each group in another column.


Interactions Transform This transform is used when you want to introduce an interaction variable
into a multiple linear regression model, i.e., a variable that takes into account the interaction between
two independent variables. The interaction transform computes the product of the values in two data
columns and places the results in an output column.

Insert A data entry mode where existing data is moved aside to make room for entered data. When
typing text labels on the page, you are always in insert mode.

When entering text, press the Insert key to toggle between insert and overwrite modes. In insert
mode, characters are moved to the right to make room for the new characters. See also Overwrite.

Kruskal-Wallis ANOVA on Ranks This is a nonparametric test that compares several different
experimental groups that receive different treatments. All values are ranked without regard to which
group they are in, then the sum of the ranks for each group are compared.

Kurtosis A measure of how peaked or flat the distribution of observed values is, compared to a
normal distribution. A normal distribution has kurtosis equal to zero.

Label Any text string, including graph and axis titles, and text entered using the Tools menu Text
command. Any text labels on the page can be modified by double-clicking them or using the Edit
Text dialog box.

Lagged Variables The lagged variables transform lags the observations in one column by one row.
These variables are commonly used to create time series models, when the effect of an independent
variable on the dependent variable corresponds more appropriately to the value of the dependent
variable at a later time.

Legend An explanation of the symbols on a graph. Legends are edited by double-clicking them, or
choosing the Tools menu Text command, clicking the legend on the page, then selecting the
Symbols... button to specify the placement of symbols, and legend style to use for the legend.

Levene Median Test This test is a Levene Mean test using the median instead of the mean. In
SigmaStat, it is used to test the equivalency of variances.

Line Graph A plot type in which data points are connected by lines. Line graphs and trajectory
graphs are 2D Cartesian graphs or 3D Cartesian graphs using a line plot type.

Linear Axis Scale An axis scale in which values along the axis increment arithmetically.

Linear Regression A linear regression finds the straight line that most closely describes, or predicts,
the value of the dependent variable, given the observed value of the independent variable. See also,
Regression.


Link Use the Edit menu Paste Special command to place a linked object on the graph page. Linking
the object appears to place a copy of the object on the page, but actually only places a reference to the
original object file, and modifies the object every time the original file is changed.

Ln Transform This function returns a value or range of values consisting of the natural logarithm of
each number in the specified range.

Log Transform This function returns a value or range of values consisting of the base 10 logarithm of
each number in the specified range.

Logarithmic Scale A scale that represents numbers as a power of the base. See also, Common Log
and Natural Log.

Logit Scale An axis scale based on the logit equation

Logit = ln( y / (100 – y) )
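
The same calculation, sketched in plain Python for hypothetical percentage values y (0 < y < 100):

    import math

    y_values = [10.0, 50.0, 90.0]
    logit = [math.log(y / (100.0 - y)) for y in y_values]
    print(logit)     # approximately [-2.197, 0.0, 2.197]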

Lotus .WKS, .WK? Files Files created in Lotus 1-2-3 are recognized by SigmaStat and can be
imported. Lotus files have the file name extension .WKS or .WK?. Note that Lotus functions are not
imported.

Mann-Whitney Rank Sum Test This nonparametric test is used to test the null hypothesis that two
samples were drawn from populations with the same median. A rank sum test ranks all the
observations from smallest to largest without regard to which group each observation comes from.
The ranks for each group are summed and the rank sums compared.

Maximum The largest observation.

McNemar’s Test McNemar’s test is an analysis of contingency tables that have repeated observations
of the same individuals. Unlike a regular analysis of a contingency table, it ignores individuals who
responded the same way to the same treatments, and calculates the expected frequencies using the
remaining cells as the average number of individuals who responded differently to the treatments.
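
A common textbook form of the statistic uses only the two discordant cells of the table. The sketch
below is plain Python with hypothetical counts; SigmaStat's implementation may differ in detail (for
example, in its use of a continuity correction).

    b = 15    # changed from "yes" to "no" between treatments
    c = 5     # changed from "no" to "yes" between treatments
    chi_square = (b - c) ** 2 / (b + c)
    print(chi_square)     # 5.0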

Mean The average value for a column. If the observations are normally distributed, the mean is the
center of the distribution.

Median The “middle” observation, computed by ordering all observations from smallest to largest,
then selecting the largest value of the smaller half of the observations.

Menu Bar A list of menus appearing at the top of the SigmaStat screen. These menus can be selected
with a mouse, or by pressing Alt and the first letter of the menu name. When one menu appears, the
adjacent menu can be pulled down by pressing the Left or Right Arrow key.

Minimum The smallest observation.

Missing Values The number of missing observations in a worksheet column. A missing value is
different from a blank because it represents an attempt to record a value.

Multicollinearity Multicollinearity occurs when changing the parameters for two independent
variables has a similar effect on the fit; when this is serious, the estimates of the regression coefficients
become unreliable. This only applies to regressions involving multiple independent variables (not for
simple linear regression or polynomial regression).

Multiple Comparisons You can use multiple comparison procedures to isolate the differences
between groups, when running an ANOVA. There are two classes of multiple comparison procedures:
all pairwise comparisons, where every pair of groups is compared; and multiple comparisons versus a
control, where all treatment groups are compared with a single control group. For more information,
see Multiple Comparison Options on page 295.

Multiple Linear Regression Multiple linear regression is used when you want to predict the value of
one variable from the values of two or more other variables, by fitting a plane (or hyperplane) to the
data, and when you know there are two or more independent variables and want to find a model that
uses these independent variables.

Multiple Logistic Regression Multiple logistic regression is used when you want to predict a
qualitative dependent variable, such as the presence or absence of disease, from observations of one or
more independent variables, by fitting a logistic function to the data. For more information, see page
527.

N The number of non-missing observations in a worksheet column.

Nonlinear Regression A nonlinear regression is used when the data follow a curve that is a nonlinear
function. Nonlinear regression solves the regression problem directly without transforming the data
and performing linear regression techniques. Nonlinear regression uses the Marquardt-Levenberg
algorithm to find the coefficients (parameters) of the independent variable(s) that give the “best fit”
between the equation and the data.

Nonparametric Tests These tests do not require that the data be normally distributed. They perform
the comparison on the ranks of the observations.

Normality This refers to the assumption (contained within parametric tests) that a population follows
a standard, “bell” shaped Gaussian distribution, also known as a “normal” distribution.

Notebook File Notebook files are compound files that contain worksheets and graph pages.
Notebook files are provided as a means for automatic file organization, enabling you to keep separate
notebooks for separate groups of data.

Novice Prompting Messages that alert you to certain situations or double-check some choices (for
example, telling you that data contains missing values, or asking for confirmation before clearing
data).

Observed Proportions This data format consists of the sample sizes of two groups and the
proportion of each group that falls into a single category. This data is used to see if there is a difference
between the proportions of the two groups that fall into the category.

OLE2 Objects pasted from the Clipboard to a graph page can be linked, embedded, or placed on the
page as a generic object without any kind of file reference. Linked and embedded objects use OLE2,
Object Linking and Embedding version 2. To learn about the differences between linking and
embedding, see PASTING GRAPHS AND OTHER OBJECTS on page 190.

One Way ANOVA A one way ANOVA is used when you want to see if two or more different
experimental groups are affected differently by different treatments of a single factor, and your
samples are drawn from normally distributed populations with equal variances.

One Way Repeated Measures ANOVA A one way RM ANOVA tests for differences in the effect of a
series of experimental interventions on the same group of subjects by examining the changes in each
individual. Examining the differences between the values rather than the absolute values removes any
differences due to individual responses, producing a more sensitive (or more powerful) test.

Open Load a file into SigmaStat, either a notebook file, worksheet, graph, report, or another
program’s file.

Overwrite A data or text entry mode in which newly typed characters replace characters already on
the screen. See also, Insert.

P Value This value is the probability of incorrectly rejecting the null hypothesis, i.e., of concluding
there is an effect when there is none. The smaller the P value, the lower this risk; the larger the P
value, the greater the risk of a false positive conclusion.

Page Where reports and graphs are displayed and printed. The page displays the current report(s),
graph(s), or other objects as they appear when printed.

Paired t-test The paired t-test examines the changes that occur before and after a single experimental
intervention on the same individuals to determine whether or not the treatment had a significant
effect. Examining the changes rather than the values observed before and after the intervention
removes the differences due to individual responses, producing a more sensitive, or powerful, test.

Parametric Tests These tests are used to compare samples from normal populations, and are based on
estimates of the mean and standard deviation parameters of a normally distributed population.

Paste Place the contents of the Clipboard at the selected location. On the worksheet, the upper left
corner of the Clipboard data block appears at the highlighted cell. On the Page, the Clipboard
contents are offset from the original object’s position.

Press Ctrl+V or choose the Edit menu Paste command to paste data or graphics.

Paste Special Place the contents of the Clipboard as an object of specified file type, as an embedded
object, or as a linked file object.

Embedding or linking text is especially useful for placing equations on a page, enabling you to insert
equations created with the Microsoft Word Equation Editor, and edit them at a later date. For more
information on using Microsoft Word and the Equation Editor, refer to the Microsoft Word User’s
Guide. To learn about pasting text on a page, see PASTING GRAPHS AND OTHER OBJECTS on page 190.

Pearson Product Moment Correlation The Pearson product moment correlation is used when you
want to measure the strength of association between pairs of variables without regard to which variable
is dependent or independent, when the relationship, if any, between the variables is a straight line, and
when the residuals (distances of the data points from the regression line) are normally distributed with
constant variance.

Percentiles The two percentile points which define the upper and lower ends (tails) of the data, as
specified by the Descriptive Statistics options.

Point (pt) A unit of measure used in typesetting. Seventy-two points equal one inch.

Pointer The tool controlled by the mouse used to choose commands, select dialog box options, select
data on the worksheet, and select and modify page objects. Sometimes called the cursor.

The pointer is usually arrow-shaped. On the page, the shape of the pointer changes according to its
current function.

Polynomial Regression A polynomial regression is used when you want to predict a trend in the
data, or predict the value of one variable from the value of another variable, by fitting a curve through
the data that does not follow a straight line, and when you know there is only one independent variable.

Power The power, or sensitivity, of a test is the probability that the test will detect a difference or
effect if there really is a difference or effect. The closer the power is to 1, the more sensitive the test.

Predicted Values The predicted values for the regression are the values computed for the dependent
variable by the regression equation for each observed value of the independent variables.

Preferences A set of options used to customize the appearance of SigmaStat worksheets and graph
pages and to set some defaults. Use the File menu Preferences... command to access the preferences
options.

PRESS Prediction Error The PRESS statistic (Predicted Residual Error Sum of Squares) is a measure
of how well the regression equation fits the data. The PRESS statistic is computed by removing the ith
data point from the data set, computing the regression equation without this data point, predicting
that point from the refitted equation, and computing the residual; this is repeated for each data point,
and the squared prediction errors are summed.
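
A minimal leave-one-out sketch for a simple linear regression, written in plain Python with
hypothetical data (an illustration of the idea, not SigmaStat's internal code):

    import statistics

    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

    def fit_line(xs, ys):
        # Ordinary least-squares slope and intercept.
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
                 / sum((a - mx) ** 2 for a in xs))
        return slope, my - slope * mx

    press = 0.0
    for i in range(len(x)):
        xs = x[:i] + x[i + 1:]                 # drop the ith point
        ys = y[:i] + y[i + 1:]
        slope, intercept = fit_line(xs, ys)
        press += (y[i] - (slope * x[i] + intercept)) ** 2   # squared prediction error

    print(press)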

Probability Scale An axis scale in which a sigmoidally shaped curve identical to the Gaussian
cumulative distribution function appears as a straight line.

Probit Scale An axis scale identical to the probability scale, except that it is expressed in terms of
standard normal deviates increased by five. A probability of 0.5 (50%) corresponds to 0 standard
normal deviates, or five probits. One standard normal deviate on either side of zero encompasses
68.2% of the area under the normal curve. A probit of 6 (1+5) corresponds to the 84.1% probability
and a probit of 4 (-1+5) corresponds to the 15.9% probability (68.2% = 84.1% - 15.9%).
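
A short illustration in plain Python (hypothetical probabilities), using the inverse of the standard
normal cumulative distribution:

    from statistics import NormalDist

    for p in (0.159, 0.5, 0.841):
        probit = NormalDist().inv_cdf(p) + 5
        print(p, round(probit, 2))    # roughly 4.0, 5.0, 6.0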

Quick Transforms Seven commonly-used functions used to linearize observations or stabilize the
variance, so that the resulting variables meet the requirements of statistical methods.

R Value The correlation coefficient, or square root of R2. R2 is sometimes called the coefficient of
determination and is a measure of the closeness of fit of a scatter graph to its regression line where R2 =
1 is a perfect fit. See also, Correlation Coefficient and Regression.

Random Numbers A series of normally or uniformly distributed numbers created by the two random
number generating functions, or transforms, within SigmaStat.

Range The maximum value minus the minimum value.

Rank Sum Test See Mann-Whitney rank sum test.

Rank Transform This function is used to assign rank values to all observations in a column from
smallest to largest. Ties are assigned the average of the ranks that would be assigned if there were no
tied values. The rank transform assigns integer rank values to data.
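
A sketch of tie-averaged ranking in plain Python (hypothetical data; not SigmaStat transform syntax).
Note that tied values receive the average rank, which need not be an integer.

    data = [3.0, 1.0, 4.0, 1.0, 5.0]
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    i = 0
    while i < len(order):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(order) and data[order[j + 1]] == data[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + 1 + j + 1) / 2.0
        i = j + 1
    print(ranks)    # [3.0, 1.5, 4.0, 1.5, 5.0]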

Raw Data This data format places the data for each group to be compared or analyzed in separate
columns.

Raw Residuals The raw residuals are the differences between the predicted and observed values of the
dependent variables.

Reciprocal Transform This function calculates the reciprocal, 1/x, of the values in a specified column.

Regression These procedures use the values of one or more independent variables to predict the value
of a dependent variable. Regression assumes an association between the independent and dependent
variables that, when graphed on a Cartesian coordinate system, produces a straight line, plane, or
curve. Regression finds the equation that most closely describes the actual data.

Use the Statistics menu Regressions... command to perform regressions. See also Confidence Interval
and Nonlinear Regression.

Regression Coefficients These are the values of the constant and coefficients of the independent
variables for the regression model, as computed by the regression procedure. See also, Regression.

Repeated Measures ANOVA on Ranks The Friedman repeated measures analysis of variance on
ranks is a nonparametric test that compares the effects of a series of different experimental treatments
on a single group. Each subject’s responses are ranked from smallest to largest without regard to other
subjects, then the rank sums for the treatments are compared.

Residuals These are the differences between the predicted and observed values of the dependent
variables. There are 4 types of residuals: Raw Residuals, Standardized Residuals, Studentized
Residuals, and Studentized Deleted Residuals.

Sample Size The sample size is the number of observations, both in each column or group, and taken
as a whole (all groups or treatments). All else being equal, the larger the sample size, the greater the
power of the test.

Save To write all data and graph settings to a file. Use the File menu Save command to save your
work.

Your data, report, and graph are saved to the same notebook file that was previously opened; if you
began a new session, you are prompted for a path and file name. Transform (.XFM) and nonlinear
regression (.FIT) files are saved using buttons in the Transform or Nonlinear Regression dialog boxes.

Save As Write all data, report, and graph settings to a new file. Use the File menu Save As...
command to specify a new file name and directory. Transform (.XFM) and nonlinear regression (.FIT)
files are saved using buttons in the Math Transform or Regression dialog boxes.

Scatter Graphs A graph type where a symbol represents each data point. Scatter plots are 2D or 3D
Cartesian graphs using a scatter plot type.

Scientific Notation A form for expressing numbers using the letter e to represent the power of 10.
For example, the scientific notation for 10.0 is 1.0e+001.

Scroll Box A dialog box option containing a list of items. You can scroll up or down to reveal more
selections. Selected scroll boxes have a scroll bar appearing along the right side. You can use the
mouse to drag the scroll bar up or down, or click the up and down arrow buttons.

Section Sections are a subdivision of the notebook file which is a compound file used to save all data
and graphs in SigmaStat. Notebook sections are individual “folders” that contain notebook items.
Notebook items are worksheets and graph pages you have created using SigmaStat. Each notebook
section may contain only one worksheet, but can contain up to ten graph pages. Within sections,
notebook items are indicated as worksheets or graph pages by icons that appear next to item names.

In addition, reports and their associated graphs are saved to test sections “nested” within the section
containing the corresponding worksheet data. For more information, see NOTEBOOK FILE
STRUCTURE on page 17.

Select (Object) To choose an object on the page in order to perform an operation (such as move or
delete) on it. Graphs and text labels can be selected. Items can only be selected when the Tools menu
Select Object command is checked.

To select an object, click while the pointer is over the object. Selected objects are surrounded by square
handles or a dotted line. You can select multiple objects by dragging a dotted-line box completely
around the objects, or by holding down the Shift key while selecting individual objects.

Signed Rank Test The Wilcoxon signed rank test tests the null hypothesis that two samples were
drawn from populations with the same medians. A signed rank test is a nonparametric procedure that
ranks all the observed treatment differences from smallest to largest without regard to sign (based on
their absolute value), then attaches the sign of each difference to the ranks.

Simple Linear Regression See linear regression.

Skewness A measure of how symmetrically the observed values are distributed about the mean. A
normal distribution has a skewness equal to zero.

Sort To arrange items in an ascending or descending order. Selected blocks of worksheet data can be
sorted using the Edit menu Sort Selection... command. If you sort more than one column, all
columns are sorted according to the selected key column.

Spearman Rank Order Correlation This correlation is used when you want to measure the strength
of association between pairs of variables without specifying which variable is dependent or
independent, and when the residuals (distances of the data points from the regression line) or the
population is not normally distributed with constant variance.

Square Root Transform This function, √x, is used to calculate the square root of the values in a
specified worksheet column.

Square Transform This function computes the squares of the values in a specified worksheet column,
x².

Standard Deviation A measure of the spread of the data about the mean. The sample standard
deviation is the square root of the sum of the squared deviations from the mean divided by the
number of data points minus one:

s = sqrt( Σ (xi – mean)² / (n – 1) )

where the sum runs over all n observations xi.

Standard Error (of the Mean) The standard deviation of the mean, computed by dividing the
sample standard deviation by the square root of the sample size.

Std Err = s / √n
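
Both quantities are straightforward to check by hand; a minimal sketch in plain Python with
hypothetical data:

    import math

    x = [4.2, 5.1, 3.8, 4.9, 5.5]
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))   # sample standard deviation
    sem = s / math.sqrt(n)                                     # standard error of the mean
    print(mean, s, sem)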

Standardize Transform The standardize transform, used before performing a statistical procedure,
subtracts the mean of a column from each value in that column, then divides each result by the
standard deviation of the column, placing the results in a specified output column. By definition,
standardized data has a mean of zero and a standard deviation of one.
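
A minimal sketch of the same calculation in plain Python (hypothetical data; not SigmaStat transform
syntax):

    import statistics

    col = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    mean = statistics.fmean(col)
    sd = statistics.stdev(col)                      # sample standard deviation
    standardized = [(v - mean) / sd for v in col]
    # The standardized column has mean 0 and standard deviation 1.
    print(statistics.fmean(standardized), statistics.stdev(standardized))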

Standardized Residuals The standardized residual is the residual divided by the standard error of the
estimate. The standard error of the residuals is essentially the standard deviation of the residuals, and
is a measure of variability around the regression line. See also, Residuals.

Statistical Summary Data This data format can be used to perform a t-test or one way ANOVA.
These statistics are in the form of the sample size, mean, and the standard deviation (or the standard
error of the mean) for each group.

Stepwise Regression A stepwise linear regression is used when you want to predict a trend in the data,
or predict the value of one variable from the values of one or more other variables, by fitting a line or
plane (or hyperplane) through the data, and when you do not know which independent variables
contribute to predicting the dependent variable. This procedure finds the model with suitable
independent variables by adding or removing independent variables from the equation.

Studentized Deleted Residuals Studentized deleted residuals are similar to the Studentized Residuals
except that the residual values are obtained by computing the regression equation without using the
data point in question. See also, Studentized Residuals.

Studentized Residuals Studentized residuals scale the standardized residuals by taking into account
the greater precision of the regression line near the middle of the data versus the extremes. The
Studentized residuals tend to be distributed according to the Student t distribution, so the t
distribution can be used to define “large” values of the Studentized residuals.

Sum Refers to the sum of all observations. The mean equals the sum divided by the sample size.

Sum of Squares The sum of the squared observation values. In ANOVA and regression reports, the
sums of squares are sums of squared deviations (from the mean or from the predicted values).

Summary Table A summary table of basic statistics can be produced for all group comparison and
repeated measures tests. The summary table is displayed in the report if it was selected in the options
dialog box for the test.

Survival Analysis This is a statistical analysis of a variable that measures the time until some event occurs.

Symbol The figure (such as a circle or triangle) used to represent a data point in a line or scatter plot.
Plot symbols are modified using the Symbols settings of the Graph Properties dialog box. For more
information on symbol settings, see MODIFYING GRAPH ATTRIBUTES on page 184.

Tabulated Data Raw observation counts organized in a contingency table. This data format can be
used for the Chi-Square, McNemar’s, and Fisher exact tests. See also Raw Data.

t-test A parametric statistical test used to determine if there is a difference between two groups that is
greater than what can be attributed to random sampling variation. A t-test is based on estimates of the
mean and standard deviation parameters of the normally distributed populations from which the
samples were drawn. Also called Student's t-test. See also Paired t-test.

Text File A “plain text” file format widely used by word processing, desktop publishing, and
spreadsheet programs. SigmaStat can import and export text files.

Toolbar Toolbars are floating palettes containing buttons to execute many common File, Edit, View,
Format, Graph, and Statistics menu commands. These include running tests, the SigmaStat Advisor,
creating and editing graphs, and formatting and editing reports.

For more information on modifying the display and positioning of toolbars, and using the button
commands, see USING TOOLBARS on page 8.

Transform A mathematical equation that generates data, either by performing calculations on
columns of data in the worksheet, or by producing series of random or automatically incremented
numbers.

See the Transforms and Nonlinear Regression reference for a complete description of transform
functions.

Transpose Switches the orientation of worksheet data so that columns become rows and rows become
columns. Use the Edit menu Transpose Paste command to paste Clipboard data with rows and
columns transposed.

Three Way ANOVA In a three way or three factor analysis of variance, there are three experimental
factors which are varied for each experimental group. A three factor design is used to test for
differences between samples grouped according to the levels of each factor, and for interactions
between the factors.

Transpose Paste Switches the orientation of worksheet data so that columns become rows and rows
become columns.

Use the Edit menu Transpose Paste command to paste Clipboard data with rows and columns
transposed.

Two Way ANOVA In a two way or two factor analysis of variance, there are two experimental factors
which are varied for each experimental group. A two factor design is used to test for differences
between samples grouped according to the levels of each factor, and for interactions between the
factors.

Two Way Repeated Measures ANOVA In a two way or two factor repeated measures analysis of
variance, there are two experimental factors which may affect each experimental treatment. Either or
both of these factors are repeated treatments of the same group of individuals. A two factor design
tests for differences between the different levels of each treatment, and for interactions between the
treatments.

Uniform Random Numbers See Random Numbers.

User-Defined Transforms SigmaStat user-defined transforms are math functions and equations
which are applied to worksheet data. User-defined transforms provide extremely flexible data
manipulation, allowing powerful mathematical calculations to be performed on specific sets of data.

Yates Correction Factor The Yates correction is applied to 2 x 2 tables and other statistics where the
P value is computed from a chi-square distribution with one degree of freedom. Using the Yates
correction makes a test more conservative, i.e., it increases the P value and reduces the chance of a false
positive conclusion.
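
A sketch of a Yates-corrected chi-square for a 2 x 2 table in plain Python, with hypothetical counts
(an illustration only, not SigmaStat's internal code):

    table = [[12, 8],
             [5, 15]]
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)

    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            # Subtract 0.5 from each |observed - expected| before squaring.
            chi_square += (abs(table[i][j] - expected) - 0.5) ** 2 / expected
    print(chi_square)    # about 3.68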

z-test The z-test comparison of proportions is used to determine if the proportions of two groups
within one category or class are significantly different. It is used when there are two groups to
compare, the total sample size (number of observations) for each group is known, and the
proportion p of each group that falls within a single category is known.
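
A sketch of the underlying statistic in plain Python with hypothetical counts; SigmaStat's
implementation may differ in detail (for example, it may apply a continuity correction):

    import math

    x1, n1 = 45, 100     # events and sample size, group 1
    x2, n2 = 30, 100     # events and sample size, group 2
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)         # pooled proportion under the null hypothesis
    z = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    print(z)    # about 2.19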

Zoom Enlarge or shrink the view of the current graph. Choose the View menu Zoom command to
change the zoom level. You can view a graph at 50%, 100%, 200%, 400%, or fit the page in the
current window.

Index

Index A
Adjusted R2
best subset regression 612
best subset regression results 621
Symbols linear regression results 485
.ASC files multiple linear regression 513
opening 23, 24 nonlinear regression results 654
.CVS files stepwise regression results 598
opening 23, 24 Advisor
.DBF files calculating power 88, 89, 91
opening 23, 24 calculating sample size 84, 88, 91
.DIF files data format 91
importing 61 defining your goals 83
opening 23, 24 determining sensitivity 85
.HTM files determining which test to use 83
exporting 143 independent variables 94
.JNB files measuring data 85
saving 10, 19 number of treatments 87
.MOC files repeated observations 86
opening 23, 24 using 6
.PDF files see also type of prediction
exporting 143 Algorithm
.PRN files Marquardt-Levenberg 636
opening 23, 24 Aligning
.SP5 files report text 139
importing 61 Alignment
opening 23, 24 text 186
.SPG files All pairwise comparisons
importing 61 ANOVA on ranks 319
opening 23, 24 ANOVA on ranks results 325
.SPW files one way ANOVA 240
opening 23, 24 one way ANOVA results 248
.TXT files one way RM ANOVA 365, 374
opening 23, 24 RM ANOVA on ranks 417, 423
.WK* files three way ANOVA 296
opening 23, 24 three way ANOVA results 306
.WKS files two way ANOVA 267
defined 795 two way ANOVA results 277
two way RM ANOVA 393, 404
Alpha
Numerics defined 716
2D graphs in power 441, 451
line/scatter 800 Alpha value
3D category scatter plot Chi-Square test 445
two way ANOVA 280 defined 789
two way RM ANOVA 406 editing 211, 236, 291, 481, 508, 563, 594, 648
3D graphs in power 133, 217
scatter 800 linear regression 481, 594, 648
3D scatter plot linear regression results 490
two way ANOVA 280 multiple linear regression 508
3D scatter residual plot one way ANOVA options 236
two way RM ANOVA 406 one way RM ANOVA 360
95% confidence interval 56 paired t-test 336
99% confidence interval 56


polynomial regression 563 ANOVA table


power 217, 246, 276, 305, 342, 372, 403, 519, degrees of freedom 247
603, 658 linear regression results 487
sample size 133 mean squares 247
stepwise regression 594 multiple linear regression results 515
three way ANOVA options 291 nonlinear regression results 655
two way ANOVA options 262 one way ANOVA results 247
two way RM ANOVA options 388 one way RM ANOVA results 372
z-test 437 stepwise regression results 598
see t-test options two way RM ANOVA results 400
Analysis of contingency tables Appearance
when to use 442 setting in worksheet 48
ANOVA ARCSIN function
see one way, two way, and RM ANOVAs applying 76
ANOVA on Ranks normalizing percentage data 755
selecting data format 316 ARCSIN square root function
Tukey test 319 defined 789
One way ANOVA Arithmetic mean
see group comparison tests see mean
ANOVA on ranks Arranging data
about 310 ANOVA on ranks 311
all pairwise comparisons 319 as indexed data 66
arranging data 311 as raw data 65
changing test options 312 best subset regression 613
creating a graph 327 Chi-Square test 443
data format 67 contingency tables 69
defined 789 descriptive statistics 105
Dunn’s test 320, 326, 418 descriptive statistics values 205
Dunnett’s test 319 Fisher Exact test 452
enabling multiple comparisons 314 Gehan-Breslow survival analysis 695
interpreting results 322 group comparison tests 64, 204
multiple comparison options 318 indexed data 205
multiple comparison vs. a control 319 linear regression 471
performing a multiple comparison 320 LogRank survival analysis 681
results 322 McNemar’s test 458
running 316 multiple linear regression 498
setting options 311 multiple logistic regression 528
Student-Newman-Keuls test 319 nonlinear regression 638
viewing graph 326 normality test 129
when to use 89, 115, 204, 310 one way ANOVA 231
ANOVA on ranks options one way RM ANOVA 356
when to use 311 paired t-test 333
ANOVA on ranks results Pearson Product Moment Correlation 624
all pairwise comparison 325 polynomial regression 554
box plot 327 rank sum test 221, 225, 338, 415
difference of ranks 325 raw data 205
equal variance 323 repeated measure tests 330
H statistic 322, 324 RM ANOVA on ranks 410
multiple comparison graphs 327 signed rank test 346
multiple comparisons 324 single group survival test 671
multiple comparisons vs. a control 325 Spearman Rank Order Correlation 632
normality 322 stepwise regression 579
point plot 326 three way ANOVA 284
table 323 two way ANOVA 254


two way RM ANOVA 380 when to use 124


unpaired t-test 207 Bar charts
z-test 435 color 184
see also data format creating 166
Artwork descriptive statistics results 110
pasting 190 one way ANOVA results 250
ASCII Files plotting column means 168
defined 789 plotting standardized residuals 176
ASCII files unpaired t-test results 217
see text files Bars
Aspect ratio fills 184
maintaining 180 BBS number 11
preferences 180 Before & after procedures
Assumption checking paired t-test 118
homoscedasticity 473, 500, 557, 583 signed rank test 118
linear regression 472 Before and after procedures
multiple linear regression 499 comparing repeated measurements 329
nonlinear regression 639 data format 330
normality 348 paired t-test 330, 332
normality and equal variance 209, 222, 233, 259, signed rank test 330, 345
289, 313, 386 Best subset regression
polynomial regression 557 adjusted R2 612
stepwise regression 582 arranging data 613
tests and analyses 6 best subset criteria 612
Assumptions coefficient of determination 612
normality 348 defined 611
normality and equal variance 358, 411 interpreting results 619
Asymmetry Mallows Cp 612
skewness 56 performing 611
Averaging results 619
see mean running test 618
Axes selecting data columns 614
lines 185 setting options 614
modifying 185 when to use 96, 124, 611
scale type 185 Best subset regression options 614
Axis adjusted R2 615
defined 789 flagged values 618
linear scale 794 influence/multicollinearity 616
log scale 795 Mallows Cp 615
logit scale 795 number of subsets 615
probability scale 799 R2 615
probit scale 799 VIF 616
Axis, scale Best subset regression results
common log 185 coefficients 622
linear 185 P value 623
logit 185 standard error 622
natural log 185 summary table 620
probability 185 t statistic 622
probit 185 VIF 623
Bibliography 12
B Bitmaps
pasting 190
Backward stepwise regression Blocks, data
defined 578, 789 inserting and deleting 35


sorting 52 z-test 733


see also data Categories
Bolding comparing 122, 430
report text 139 Cells
Bonferroni t-test empty 256, 286, 381, 382
one way ANOVA 241, 268, 297, 394 missing 370
one way ANOVA results 249 missing, three way ANOVA 286, 301
one way RM ANOVA 366, 375 missing, two way ANOVA 256
three way ANOVA 297 moving to 32
three way ANOVA results 306 using as column or row titles 40
two way ANOVA 268 Center transform
two way ANOVA results 278 applying 755
two way RM ANOVA 394 defined 790
two way RM ANOVA results 404 reducing multicollinearity 752
Box plots using 746
ANOVA on ranks 327 Centering
color 184 data 755
creating 171 see also center transform
defined 790 Change to be detected box
descriptive statistics results 111 entering value 731
fills 184 Changing
rank sum test 229 alpha value 211, 236, 291, 336, 360
Breaking ANOVA on ranks options 312
links 197 best subset regression options 614
inserted object icons 195
C linear regression options 471
multiple logistic regression options 530
Calculating nonlinear regression options 639
chi-square statistic 423 one way ANOVA options 232
dummy variables 765 one way RM ANOVA options 357
error bars 186 paired t-test options 334
interaction 746 pasted object icons 193
N statistic 109 polynomial regression options 555
Pearson Product Moment Correlation 624 rank sum test options 222
power 133 signed rank test options 347
Spearman Rank Order Correlation 631 source files for links 197
Calculating power 85, 91 stepwise regression options 580
advisor 91 symbols 184
chi-square 723 text 140
correlation coefficient 726 three way ANOVA options 288
determining test to use 88, 89 t-test options 208
one way ANOVA 721 two way ANOVA options 258
paired t-test 717 two way RM ANOVA options 385
proportions 719 Characters
t-test 714 non-keyboard 187
Calculating sample size 84 Chi-Square results
advisor 91 Chi-Square statistic 450
correlation coefficient 741 P value 451
determining test to use 88, 89 power 451
one way ANOVA 735 Yates correction 450
paired t-test 730 Chi-Square statistic
proportions comparison 733 calculation 423
t-test 728 Chi-Square results 450
unpaired t-test 728 defined 790


McNemar’s test results 464 graphs 201


multiple logistic regression 532 reports 144
P value 451 Coding
RM ANOVA on ranks 423 effects 770
RM ANOVA on ranks results 421 reference 766
Yates correction results 450 Coefficient
Chi-Square test defined 790
alpha value 445 Coefficient of determination
arranging data 69, 443 best subset regression 612
calculating power/sample size 133 best subset regression results 621
contingency tables 443 defined 568
data format 443 linear regression results 485
interpreting results 449 multiple linear regression 513
performing 442 multiple linear regression results 513
power 445 polynomial regression results 568, 571
raw data 443 stepwise regression results 598, 653
results 449 Coefficients
running 446 best subset regression results 622
setting options 444 correlation 125, 467
tabulated data 443 linear regression results 486
when to use 122, 430, 442 multiple linear regression results 514
Yates correction factor 446 multiple logistic regression results 548
Chi-square test nonlinear regression results 654
arranging data as 447 regression 800
calculating power 723 standardized 478, 505, 514, 563, 588, 645
contingency table 724, 738 stepwise regression results 600
Choosing Coefficients P value
appropriate procedure 103 multiple logistic regression 535
Choosing column data Color
descriptive statistics 107 fills 184
Choosing columns to test lines 184
one way ANOVA 232 mesh lines 184
one way RM ANOVA 356 outlines 184
paired t-test 337 pattern lines 184
rank sum test 221 report text 138
RM ANOVA on ranks 410 Colors
signed rank test 346 changing plot 184
three way ANOVA 293 Column means
t-test 208 plotting 168, 170
two way ANOVA 264 Column percentage
two way RM ANOVA 390 in contingency tables 450
Classification table test Column statistics
multiple logistic regression 532 99% confidence interval 56
multiple logistic regression results 548 defined 790
threshold probability 532 maximum value 56
Clearing minimum positive value 56
data 34 minimum value 56
see deleting missing values 56
Clipart other values 56
pasting 190 printing 57, 81
Clipboard setting Options 10
cutting & copying data 33 setting preferences 54
defined 790 showing/hiding 54
Closing size of sample 56


skewness 56 data format 431, 435


sum of sample 56 interpreting results 439
viewing 54 performing 434
Column titles results 439
defined 790 setting options 435
Column width and row height when to use 430
setting in worksheet 48 Compare proportions options
Columns when to use 435
averaging 55 Compare proportions procedures
column and row titles dialog box 37 arranging data 435
data 205 running z-test 438
deleting 35 same group to two treatments 430
editing titles 37 Compare two groups procedure
factor 66, 205 rank sum test 113
inserting empty 35 t-tests 113
key 52 when to use 113
selecting 32 Comparing
selecting entire 32 categories 122
sorting data 52 Comparing groups
stacking contents 34 choosing group comparison 113
statistics 54 many 114
subject 66 same group before and after multiple treatments 119
switching from rows to columns 63 same group before and after one treatment 118
titles 37 two groups 113
using as row titles 39 Comparisons
see also data multiple 364, 392, 417
Columns statistics Computing
95% confidence interval 56 see calculating
Common log axis scale 185 Conditions
Common log scale number of 84, 87
defined 790 Confidence interval
Compare groups procedures descriptive statistics 106
before and after single treatment 345 descriptive statistics results 110
determining test to use 87 difference of means 216, 246, 342
for differences in rates and proportions 429 for difference of proportions 437, 441
one way ANOVA 236 for the mean 110, 791
one way RM ANOVA 361 linear regression 477
parametric/nonparametric tests 203 linear regression results 493
RM ANOVA on ranks 413 multiple linear regression 504
three way ANOVA 291 multiple linear regression results 522
two way ANOVA 263 nonlinear regression 644
two way RM ANOVA 389 nonlinear regression results 661
Compare many groups procedure one way ANOVA 235
ANOVA on ranks 115 one way RM ANOVA 359
one way ANOVA 115, 116 polynomial regression options 562
two way ANOVA 115, 116 polynomial regression results 573, 574
when to use 114 population 477, 493, 504, 522, 562, 574, 587, 607,
Compare many groups procedures 204 644
ANOVA on ranks 204, 310 regression 477, 493, 504, 522, 562, 587, 607, 644,
Kruskal-Wallis ANOVA on ranks 204 661
one way ANOVA 204, 230 saving to the worksheet 478, 505, 562, 587, 645
three way ANOVA 204 stepwise regression 587
two way ANOVA 204, 253 stepwise regression results 606
Compare proportions three way ANOVA 290


two way RM ANOVA 387 linear regression results 492


unpaired t-test 210 multiple linear regression results 521
z-test 437 multiple logistic regression results 552
Confidence intervals nonlinear regression results 660
paired t-test 336 stepwise regression results 605
two way ANOVA 261 Copying
Confidence lines data 33
95% 56 defined 791
99% 56 moving data 34
CONFIG.SYS report text 140
defined 791 copying
Connected data notebook items between notebooks 23, 25
three way ANOVA 286 Correction
two way ANOVA 256 Yates 430
two way RM ANOVA 383 Correlation coefficient
Constant variance calculating power 91, 726
defined 791 calculating sample size 741
Constant variance test defined 467, 791
enabling/disabling 473, 500, 557, 583 expected value 727, 741
linear regression results 489 finding 84
Multiple linear regression results 519 linear regression results and Adj r2 485
multiple linear regression results 519 multiple linear regression results 513
nonlinear regression 640 nonlinear regression results 653
nonlinear regression results 657 number of data points used to compute 628, 635
polynomial regression results 570, 572 Pearson Product Moment 468, 623
stepwise regression results 603 Pearson Product Moment Correlation results 628
Constraints, parameter Spearman Rank Order 468, 631
entering 650 stepwise regression results 598
nonlinear regression results 653 see also correlation procedure
Contacting Correlation coefficients
Systat Software via home page 12 calculating power/sample size 133
technical support 11 Correlation procedures 465
Contingency Table data format 71, 468
defined 791 Pearson Product Moment 125, 126, 623
Contingency table Spearman Rank Order 126, 631
data format 91 Creating
Contingency table summary ANOVA on ranks graph 327
Fisher Exact test results 456 descriptive statistics report graph 111
Contingency tables exploratory graphs 162
arranging data 69, 431, 458 indexed data 73
Chi-Square 443 new object to insert 195
entering into worksheet 724, 738 normality test report graph 132
Fisher Exact test 432 one way ANOVA graph 251
McNemar’s test 433 rank sum test graph 229
rate and proportion tests 429 report graphs 148
raw data 70, 431 three way ANOVA graph 308
summary table 450, 464 t-test graph 218
Yates correction 430 two way ANOVA, graph 281
Continuous scale creating
measuring data 85 equation 22
Converting Excel worksheets 22
indexed data to raw data 205 graph pages 22
Cook’s Distance test macros 22
defined 480, 507, 540, 590, 647 new notebook files and items 22


reports 22 indexed, three way ANOVA 287


sections 22 indexed, t-test 208
worksheets 22 inserting 37
Criterion options linearizing 745, 749
multiple logistic regression 531 measuring 85
Curve messy 64, 205
best model to fit data 570 missing 6, 256, 356, 370, 381, 382
fitting 84 missing, three way ANOVA 286, 301
fitting through data 92, 553 missing, two way ANOVA 272
polynomial 93 missing, two way RM ANOVA 398
Curve fitting moving 34
linear curve fit, defined 794 observing 86
Cutting opening worksheets 59
data 33 pasting 33
defined 791 plotting many Y columns 165, 166
moving data 34 plotting means 168, 170
report text 140 plotting residuals 132
cutting predicting trend in 469, 553, 577, 611
notebook items between notebooks 23, 25 predicting variables 495
printing 80
D random 781
ranking 760
Data raw 65, 205, 331
arranging 105, 207, 221, 231, 254, 284, 311, 333, raw, ANOVA on ranks 311
380, 410, 498, 528, 554, 579, 613, 624, 632 raw, multiple logistic regression 72, 529
arranging in contingency tables 69, 431 raw, one way ANOVA 232
best model fit 570 raw, rank sum test 221
brushing 49 raw, RM ANOVA on ranks 410
centering 755 raw, t-test 208
clearing 34 rearranging 63
column statistics 54 saving 10, 57
connected/disconnected 256, 286, 383 selecting 32
copying 33 setting feedback colors in worksheet 49
cutting 33 sorting 52
deleting 33, 34 specifying text file import model 63
describing 84, 94 stacking columns 34
descriptive statistics values 205 standardizing 746
displaying as a fixed decimal 43 statistical summary data 68
displaying as engineering notation 42, 43 tabulated 803
editing 33 tabulated data 69
entering 27 transposing 63
entering using scientific notation 31 unbalanced 6, 64, 205, 256
exporting 58, 792 un-indexing 75
filtering 746 see also data format
fitting curve through 84, 92, 553 data
fitting line through 469 saving 19
fitting plane through 611 Data brushing 49
grouped, multiple logistic regression 72, 529 Data feedback 49
importing 7, 60, 793 Data format
indexed 66, 73, 205, 258, 331 ANOVA on Ranks 316
indexed, ANOVA on ranks 311 ANOVA on ranks 67, 311
indexed, one way ANOVA 232 ANOVAs 64
indexed, rank sum test 221 Chi-Square 443
indexed, RM ANOVA on ranks 410 Chi-Square test 69, 443, 447


compare proportions 435 importing 62


contingency table 91 Decimal places
contingency tables 69, 431 setting in reports 136
correlation procedures 71 setting in worksheet 43
correlation tests 468 Defining
descriptive statistics 205, 238 dummy variables 766, 770
determining 91 transforms 78
Fisher Exact test 69, 432, 452, 454 Degrees of freedom
group comparison tests 64, 204 chi-square statistic 423
indexed data 66, 205, 331 defined 792
linear regression 471 linear regression results 487
logistic regression 541 multiple linear regression results 516
McNemar’s test 69, 433, 458, 461 nonlinear regression results 655
missing data points 356 one way ANOVA results 247
multiple logistic regression 72, 529, 541 one way RM ANOVA results 372
nonlinear regression 638 paired t-test results 341
normality test 129 stepwise regression results 599
observed proportions 91 three way ANOVA results 302
one way ANOVA 65, 67, 232, 238 two way ANOVA results 273
one way ANOVA on ranks 65 two way RM ANOVA results 400, 402
one way RM ANOVA 65, 356, 362 unpaired t-test results 216
paired t-test 67 Deleting
Pearson Product Moment Correlation 625 columns and rows 35
polynomial regression 554 data 33, 34
rank sum test 66, 221 graphs 201
rate and proportion tests 431 report text 141
raw data 65, 70, 71, 129, 205, 331, 431 reports 144
regression procedures 71, 468 Delimiter
repeated measure tests 330 defined 792
RM ANOVA 68 Dependent variables
RM ANOVA on ranks 410, 415 data format 468
signed rank test 65, 67, 350 dichotomous 527
single group survival test 671 multiple logistic regression results 545
Spearman Rank Order Correlation 632 predicting 84, 92, 93, 465
statistical summary 68 three way ANOVA results 302
survival analysis tests 668 two way ANOVA results 273, 399
tabulated 69 Descriptive Statistics
three way ANOVA 67, 287 defined 792
t-tests 64, 208 Descriptive statistics
two way ANOVA 65, 67, 258 arranging data as 205, 238
two way RM ANOVA 65 arranging data for 105
unpaired t-test 212, 542 confidence interval 106
z-test 431, 435 graphing data 110
Data points interpreting results 108
missing 255 P value 106
Data set picking column data 107
defined 791 results 108
Date and time displays selecting data columns 105
setting in worksheet 44 setting options 105, 108
Date and Time format two way RM ANOVA options 387
regional settings 47 viewing 84
Day Zero Descriptive statistics results
setting in worksheets 46 bar chart 110
dBASE files box plot 111


point and column means plot 111 report ruler 137, 138
point plot 111 docking
scatter plot 110 Notebook Manager 18
Deviance residuals Dragging
regression results 537 defined 792
DFFITS test dragging
linear regression results 492 Notebook Manager 18
multiple linear regression results 522 Dummy variables
nonlinear regression results 661 applying transform 765
stepwise regression results 606 defined 792
when to use 479, 506, 589, 646 defining 765
Diagnostics multiple logistic regression 528
influence 491, 605, 660 using 746
regression 520, 550, 573, 604 Duncan’s Multiple Range test
Dialogs three way ANOVA 298
defined 792 two way ANOVA 269
DIF files two way RM ANOVA 395
defined 792 Duncan’s test
importing 61 one way RM ANOVA 366
Difference from 2 value two way RM ANOVA results 405
linear regression 474 Dunn’s test
multiple linear regression 501 ANOVA on ranks 320, 326, 418
nonlinear regression 641 RM ANOVA on ranks results 424
polynomial regression 559 Dunnett’s test
stepwise regression 584 ANOVA on ranks 319, 325
Difference of groups one way ANOVA 241
paired t-test results 341 one way RM ANOVA 366, 375
Difference of means repeated measures ANOVA on ranks 418
Bonferroni t-test results 278, 307 RM ANOVA on ranks results 424
Dunnett’s test results 279, 307 three way ANOVA 297, 307
one way ANOVA results 250 two way ANOVA 269, 278
one way RM ANOVA results 249 two way RM ANOVA 395
Student-Newman-Keuls test results 279, 307 two way RM ANOVA results 405
Difference of proportions Durbin-Watson statistic
confidence interval 437 defined 792
z-test results 439 linear regression 474
Difference of ranks multiple linear regression 501
ANOVA on ranks results 325 nonlinear regression 641
Disconnected data polynomial regression 559
three way ANOVA 286 stepwise regression 584
two way ANOVA 256 Durbin-Watson test
two way RM ANOVA 383 linear regression results 489
Display summary table multiple linear regression results 518
one way RM ANOVA 359 nonlinear regression results 657
rank sum test 224, 349, 413 polynomial regression results 572
RM ANOVA on ranks 413 stepwise regression results 602
two way ANOVA 261
two way RM ANOVA 387 E
unpaired t-test 210
Displaying Edit menu
data using a fixed decimal 43 clear 141
data using scientific notation 42, 43 copy 33, 140
formatting toolbar 9 cut 33, 140
page margins 178 delete 34


find/replace 140 one way ANOVA 233, 245


insert new object 194 one way RM ANOVA 358, 371
object 198 one way RM ANOVA results 371
paste 33, 141 rank sum test 222, 227, 313, 411
transpose paste 63 RM ANOVA on ranks results 421
Editing three way ANOVA 289, 302
column titles 37 two way ANOVA 259, 273
data 33 two way RM ANOVA 386
graphs with SigmaPlot 189 two way RM ANOVA results 399
object links 197 unpaired t-test 215
reports 137 Equations
row titles 37 adding independent variables 95
symbols 184 entering transform 787
text on the page 188 fit 653
word processor 141 linear regression 466
editing linearizing 752
notebook items 23 model 649
notebook sections 23 multiple linear regression 466, 496, 513, 545
Editor multiple logistic regression 527, 545
modifying reports 137 nonlinear 94
viewing reports 137 polynomial regression 553, 566, 570
Effects coding quadratic 752
defining dummy variables 765, 770 removing independent variables 95
when to use 770 simple linear regression 470
Embedding Wald statistic 534
defined 191 equations
objects 191 creating within Notebook Manager 22
Empty cells Erasing
three way ANOVA 285, 286 data 34
two way ANOVA 255, 256 Error bars
two way RM ANOVA 381, 382 calculating 186
Enabling multiple comparisons Error mean square
ANOVA on ranks 314 best subset regression results 622
RM ANOVA on ranks 413 Error sum of squares
Engineering notation three way ANOVA results 303
displaying data as 42, 43 two way ANOVA results 274
Entering Example
constraints, parameter 650 Gehan-Breslow survival analysis graph 707
data 27 LogRank survival analysis graph 692
data from keyboard 30 single group survival analysis graph 678
Greek symbols 187 Examples
options 650 survival curve graph examples 709
symbols in legends 188 Excel
text on the page 186 worksheets 29
user-defined transforms 78 Excel files
Equal variance importing 62
defined 792 Excel worksheets
Equal variance test creating within Notebook Manager 22
adjusting P value 209, 473, 500, 558, 584, 640 Expected
ANOVA on ranks results 323 correlation coefficient 727, 741
enabling/disabling 209, 222, 233, 259, 289, 313, difference in means 715, 722, 728
358, 386, 411 frequencies 450, 464
entering the P value 223, 234, 260, 289, 313, 359, group size 719
386, 412 number of subjects 718


proportion 719 changing source for linked objects 197


standard deviation 718, 722, 729, 731, 736 dBASE 62
Expected mean squares embedding objects 191
one way RM ANOVA results 374 Excel 62
two way RM ANOVA 403 importing 7
two way RM ANOVA results 403 importing data from 60
Explain test results linking objects 191
reports 136 Lotus 62
Exploratory graphs object types 198
bar charts 166 Quattro 62
box plots 171 saving 10
creating 162 saving worksheets as non-notebook files 58
histograms 172 SP? 61
line plot 166 text 62
pie charts 175 updating links 197
plotting column means 168, 170 files
plotting residuals 172, 176 opening non-notebook 23, 24
point plots 169 saving 19
scatter plots 165 saving notebook 19
Exponent transforms Fill patterns
applying 76 changing color 184
Exponential functions changing type 184
applying to linear regression 753 Fills
Exponents changing 184
defined 792 defined 792
Exporting Filter transform
.HTM files 143 applying 777
.PDF files 143 using 746
data 792 Filtering
reports 143 see filter transform
worksheets 58 Find/replace
report text 140
F Finding
text 140
F statistic Fisher Exact test
linear regression results 488 arranging data 69, 452
multiple linear regression results 517 arranging data as 454
nonlinear regression results 656 contingency tables 432
one way ANOVA results 248 data format 432, 452
one way RM ANOVA results 373 interpreting results 455
P value 374, 402 performing 451
polynomial regression results 571 raw data 452
stepwise regression results 599 raw data format 432
three way ANOVA results 304 results 455
two way ANOVA results 275 running 453
two way RM ANOVA results 401 when to use 122, 430
F value Fisher Exact test results
polynomial regression results 568 P value 456
Factor columns 66, 205 statistics 456
FAX number 11 Fisher LSD Test
File import model 63 two way RM ANOVA results 405
Files Fisher’s LSD test
as objects to insert 195 three way ANOVA 297
breaking links between source and object 197 two way ANOVA 269


two way RM ANOVA 394 G


Fit equations Gaussian transforms
nonlinear regression 649 defined 793
nonlinear regression results 653 Gehan-Breslow survival analysis
Fitting arranging data 695
curve through data 84, 92 example graph 707
Fixed decimal interpreting results 705
displaying data as 43 performing 693
Flagged values running 699
best subset regression 618 setting options 695
linear regression 477, 481, 643 when to use 693
Multiple linear regression 503 Generating
multiple linear regression 507, 510 exploratory graphs 162
multiple logistic regression 537, 539, 540 lagged variables 775
nonlinear regression 647 random numbers 781
polynomial regression options 561 report graphs 148
stepwise regression 586, 590, 593 reports 137
Font Geometrically connected data
report text 138 three way ANOVA 286
Fonts two way ANOVA 256
defined 793 Glossary 789–804
Greek 187 Goals
PostScript 186 defining 83
symbols 187 predicting 92
TrueType 186 see also test goals
Formatting data Going to
data format and arranging data 65 worksheet cell 32
Formatting toolbar graph pages
hiding/displaying 9 creating within Notebook Manager 22
positioning 9 naming 23
Forward stepwise regression Graphics
defined 578 pasting 190
when to use 95, 124 Graphs 147
Friedman ANOVA on ranks 326
RM ANOVA on ranks 408 bar charts 166
Friedman RM ANOVA on ranks box plots 171
see RM ANOVA on ranks 409 closing 201
F-to-enter value deleting 201
P value 601 descriptive statistics 110
setting 581, 597 editing with SigmaPlot 189
stepwise regression results 601 exploratory 162
F-to-remove value histograms 172
P value 602 line plots 166
setting 581, 598 line/scatter 794, 800
stepwise regression results 601 linear regression 493
Functions maintaining aspect ratio 180
exponential 753 multiple linear regression 523
hyperbolas 753 nonlinear regression 662
nonlinear 94 one way ANOVA 250
polynomial regression 570 one way RM ANOVA 376
power 752 paired t-test 343
quadratic 752 pasting as metafiles 192
pasting without data 192

817
Index

Pearson Product Moment Correlation 629 H


pie charts 175 H statistic
plotting many Y columns 165, 166 ANOVA on ranks 322, 324
point plots 169 P value 324
polynomial regression 574 Handling unbalanced data
rank sum 228 three way ANOVA 285
report 148 two way ANOVA 255
resizing labels/legends automatically 181 Hardcopy
RM ANOVA on ranks 425 defined 793
saving 10 Help system
scatter plots 165 defined 793
scatter, 3D 800 using 11
signed rank 353 Hiding
Spearman Rank Order Correlation 635 column statistics 54
stepwise regression 607 formatting toolbar 9
three way ANOVA 308 page margins 178
two way ANOVA 279 report ruler 137, 138
two way RM ANOVA 406 hiding
unpaired t-test 217 notebooks in Notebook Manager 18
using paste special 190 Histogram of residuals
see also exploratory graphs normality test results 132
graphs one way ANOVA 251
saving 19 three way ANOVA 308
Greek symbols two way ANOVA 279
entering 187 unpaired t-test results 218
Group comparison test Histogram of residuals report graph 153
which to use 87 Histograms
Group comparison tests creating 172
ANOVA on ranks 310 defined 793
arranging data 64 plotting residuals 172
arranging data for 204 Holm-Sidak Test
choosing appropriate 113 defined 793
comparing many groups 204 Holm-Sidak test
comparing two groups 204 one way ANOVA 240
defined 203 one way RM ANOVA 365
one way ANOVA 230 three way ANOVA 296
parametric/nonparametric 203 two way RM ANOVA 393
two way ANOVA 253 Home page 12
when to use 113 Homoscedasticity
Group indexed data assumption checking 473, 500, 557, 583, 640
arranging 205 linear regression 489
Grouped bar charts linear regression results 489
two way ANOVA 280 see also constant variance test 657
Grouped data Hosmer-Lemshow P value
multiple logistic regression 72, 529 multiple logistic regression results 545
Groups Hosmer-Lemshow statistic
comparing many 114 P value 531
comparing two 113 Hot keys
expected number 722 defined 793
expected size 719, 722 HTML files
number of 87 exporting 143
Hyperbolas 753, 754
Hypothesis, null 220, 231

818
Index

I nonlinear regression results 660


Icons stepwise regression results 605
changing display for inserted objects 195 Influence/multicollinearity
changing display for pasted objects 193 multiple linear regression results 521
displaying inserted objects as 194 multiple logistic regression results 551
displaying pasted objects as 193 see also influential point tests and multicollinearity
Importing Influential point tests 539, 588, 660
data 7, 793 Cook’s Distance 480, 492, 507, 521, 540, 552, 590,
data files 60 605, 647
dBASE files 62 DFFITS 479, 492, 506, 522, 589, 606, 646
DIF files 61 leverage 480, 492, 506, 507, 521, 539, 540, 552,
Excel files 62 589, 590, 647
Lotus files 62 leverage results 605
Quattro files 62 linear regression 478, 481
SigmaPlot files 61 multiple linear regression 505, 507
spreadsheets 62 multiple logistic regression 606
text file model 63 nonlinear regression 645, 647
text files 62 stepwise regression 591
Incremental evaluation Insert New Object... command 194
polynomial regression 557 Inserting
Incremental mean square columns and rows 35
polynomial regression results 567 data 37
Incremental sum of squares defined 794
defined 568 displaying inserted objects as icons 194
independent graph pages 23 linked objects 196
Independent variables modifying inserted object icons 195
adding to equations 95 new object 195
combinations 533 objects 194
combinations in multiple logistic regression results 545 objects from file 195
data format 468 see also pasting
predicting dependent variables 84, 92, 93, 123, 465 Insertion mode
regression equation 600 turning on/off 37
removing from equations 95 Interaction transform
selecting 95 applying 762
specifying 94 using 746
Indexed data Intercept
ANOVA on ranks 311 finding for line 92
arranging for group comparison tests 66 Internet
creating 73 home page 12
group comparison tests 205 Interpreting results
one way ANOVA 232 ANOVA on ranks 322
rank sum test 221 best subset regression 619
repeated measure tests 331 Chi-Square 449
RM ANOVA on ranks 410 compare proportions (z-test) 439
survival analysis test 670 descriptive statistics 108
three way ANOVA 287 Fisher Exact test 455
t-test 208 Gehan-Breslow survival analysis 705
two way ANOVA 74, 258 linear regression 483
Indexing data 73 LogRank survival analysis 690
Indicators McNemar’s test 463
dummy 765 multiple linear regression 512
Influence diagnostics multiple logistic regression 544
linear regression results 491 nonlinear regression 651
one way ANOVA 243
one way RM ANOVA 369 column and row titles 37
paired t-test 339 defined 794
Pearson Product Moment Correlation 627 editing 188
polynomial regression 566 entering non-keyboard characters 187
polynomial regression, order only 570 entering numeric in worksheet 31
rank sum test 226 entering text in worksheet 31
RM ANOVA on ranks 421 rotating 187
signed rank test 351 using column and row title dialog box 37
single group survival analysis 677 using for column titles 39
Spearman Rank Order Correlation 634 using for row titles 39
stepwise regression 596 Lagged variables
three way ANOVA 300 generating 775
two way ANOVA 271 using 746
two way RM ANOVA 397 Least squares mean
unpaired t-test 214 two way RM ANOVA results 403
Intervals Least squares regression
confidence 56 defined 467
Introduction 1 Legends
Inverse exponential functions adding symbols 188
stabilizing variances 753 adding to graph page 186
Italicizing automatic scaling with graphs 181
report text 139 defined 794
Iterations editing 188
nonlinear regressions 637 Levene Median test 209, 223, 233, 260, 289, 313, 335,
348, 358, 386, 412
J defined 794
Leverage test
JNB files linear regression results 492
saving 10 multiple linear regression results 521
multiple logistic regression results 552
K nonlinear regression results 660
Key column 52 stepwise regression results 605
Keyboard when to use 480, 506, 507, 539, 540, 589, 590, 647
entering data with 30 Library
Keystrokes transform 746
moving around reports 141 Likelihood ratio test statistic
Kolmogorov-Smirnov test 209, 223, 233, 260, 289, 313, multiple logistic regression 532
335, 348, 358, 386, 412, 500, 557, 583, 640 multiple logistic regression results 546
Kruskal-Wallis Line
see ANOVA on ranks fitting through data 469
Kruskal-Wallis ANOVA on ranks slope & intercept 92
when to use 204 slope and intercept 470
see ANOVA on ranks Line plots
K-S distance creating 166
descriptive statistics results 110 modifying lines 184
normality test results 131 plotting many Y columns 166
Kurtosis Line/scatter graphs
descriptive statistics results 110 defined 794, 800
Linear axis scale 185
defined 794
L Linear regression
Labels alpha value 481, 594, 648
adding to graph page 186 arranging data 471
automatic scaling with graphs 181 bar chart of standardized residuals 494
data format 471 P value 486, 488
difference from 2 value 474 power 490
Durbin Watson statistic 474 PRESS statistic 478, 488
equation 466, 470 regression diagnostics 490
exponential function 753 standard error 486
graphs 493 standard error of the estimate 485
histogram of residuals 493 standardized coefficient (beta) 486
hyperbolas 753 sum of squares 487
interpreting results 483 t statistic 486
line/scatter plot with confidence and prediction Linearizing
intervals 156, 494 data 749
ln function 753 equations 752
multiple 495 exponential functions 753
parametric test 470 Lines
performing 469 axis 185
power function 752 modifying plot 184
predicting variables 123 Linking
probability plot 494 defined 191, 795
reciprocal transform 754 objects 191
results 483 Links
running test 482 breaking 197
scatter plot of residuals 494 changing source files 197
selecting data columns 471 editing 197
setting options 471 manual/automatic updating 197
using power function 752 viewing object links 196
when to use 92, 124 Ln transform
Linear regression options applying 752, 753
assumption checking 472 defined 795
confidence interval 477 Log 10 function
Cook’s Distance test 480 applying 76
DFFITS test 479 LOG function
flagged values 477, 481 defined 795
influence/multicollinearity 478 Log likelihood statistic
influential points 481 multiple logistic regression results 547
Kolmogorov-Smirnov test 473 Logarithmic axis scale
leverage 480 defined 795
power 481 Logit axis scale 185
standardized coefficients 478 defined 795
Linear regression results Logit transform 545
adjusted R2 485 LogRank survival analysis
ANOVA table 487 arranging data 681
coefficients 485, 486 example graph 692
confidence interval 493 interpreting results 690
constant variance test 489 performing 679
Cook’s Distance test 492 running 685
creating a graph 495 setting options 681
degrees of freedom 487 when to use 679
DFFITS 492 Lotus files
Durbin-Watson statistic 489 defined 795
F statistic 488 importing 62
influence diagnostics 491
leverage 492 M
mean square 487
normality test 489 macros
creating within Notebook Manager 22 two way ANOVA results 274
Mallows Cp two way RM ANOVA results 401
best subset regression 612 Measurement units
best subset regression results 620 setting for page 180
Mann-Whitney rank sum test setting for report ruler 138
see rank sum test Measuring data 85
Many groups continuous scale 85
comparing 204 nominal/ordinal scale 85, 86
Margins Median
displaying/hiding 178 defined 795
page 178 descriptive statistics results 109
Marquardt-Levenberg algorithm 636 Menu bar 795
Maximum Menus
defined 795 popup using right-click 36
Maximum value transforms 745
descriptive statistics results 109 Merging
Maximum value (max) column contents 34
column statistics 56 Mesh
McNemar’s test changing color 184
arranging data 69, 458 fills 184
arranging data as 461 Messy data 64
contingency table 433 group comparison tests 205
data format 433, 458 Metafiles
defined 795 pasting 190
interpreting results 463 pasting graphs as 192
performing 457 Microsoft Word files
raw data 433, 458 opening 144
results 463 Minimum
running 461 defined 796
setting options 459 Minimum positive value (min pos)
tabulated data 458 column statistics 56
when to use 122, 430, 457 Minimum value
Yates correction factor 460 descriptive statistics results 109
McNemar’s test results Minimum value (min)
P value 464 column statistics 56
Mean Missing cells 370
column statistics 55 three way ANOVA 301
confidence interval for difference 216, 246, 342 two way ANOVA 272
defined 795 two way RM ANOVA 398
descriptive statistic results 109 Missing data
difference of 249, 250, 278, 279, 307 one way RM ANOVA 356
expected difference 715, 722, 728, 736 one way RM ANOVA results 370
of difference 341 regression 468
Mean squares three way ANOVA 285, 286, 294
error 622 two way ANOVA 255, 265
incremental 567 two way RM ANOVA 381, 382, 391
linear regression results 487 Missing factor data
multiple linear regression results 516 one subject 384
nonlinear regression results 655 Missing values
one way ANOVA results 247 column statistics 56
one way RM ANOVA results 373, 374 converting 78
residual 567, 571 converting values to 785
stepwise regression results 599 defined 796
three way ANOVA results 303 descriptive statistic results 109
handling 6 RM ANOVA on ranks 426
lagging observations 775 three way ANOVA 308
transform 746 two way RM ANOVA 407
Model Multiple comparison options
fitting curve to data 570 ANOVA on ranks 318
Modifying one way ANOVA 239
axes 185 one way RM ANOVA 364
inserted object icons 195 RM ANOVA on ranks 417
object links 196 setting 117, 122
pasted object icon 193 three way ANOVA 295
source files for links 197 two way ANOVA 266
symbols 184 two way RM ANOVA 392
worksheet appearance 80 Multiple comparison results 277, 306
see also editing all pairwise 277, 306, 325
Moving ANOVA on ranks 324
around reports 141 Bonferroni t-test 249, 278, 306
around the worksheet 31 Dunn’s test 326
formatting toolbar 9 Dunnett’s test 278, 307, 325
to worksheet cell 32 multiple comparison vs. a control 249, 325
toolbars 9 one way ANOVA 248
moving one way RM ANOVA 374
notebook items between notebooks 23, 25 RM ANOVA on ranks 423
MSerr Student-Newman-Keuls test 249, 278, 307, 325, 375
best subset regression results 622 three way ANOVA 305
MSincr two way ANOVA 277
polynomial regression results 567 two way RM ANOVA 403
MSres Multiple comparison vs. a control
polynomial regression results 567, 571 ANOVA on ranks 319, 325
Multicollinearity one way ANOVA 240
defined 796 one way RM ANOVA 365, 374
flagging data 593, 617 one way RM ANOVA results 249
reducing 752, 755 RM ANOVA on ranks 417, 424
sample-based 509, 536, 592, 617 three way ANOVA 296
setting options 508, 535, 616 three way ANOVA results 306
structural 509, 536, 592, 617 two way ANOVA 267
Multiple two way ANOVA results 277
Excel worksheets 30 two way RM ANOVA 393, 404
worksheets 28 Multiple comparisons
Multiple categories defined 796
comparing 430 Multiple linear regression 513
Multiple comparison about 496, 527
ANOVA on Ranks 319 arranging data 498
enabling 314, 413 center transform 752
one way ANOVA 236 defined 796
one way RM ANOVA 361 difference from 2 value 501
pairwise comparisons 240, 365 Durbin-Watson statistic 501
performing 242, 367 equation 466, 496
three way ANOVA 291 interpreting results 512
two way ANOVA 263, 267 parametric test 497
two way RM ANOVA 389, 393 performing 495
Multiple comparison graphs results 512
ANOVA on ranks 327 running test 511
one way ANOVA 251 selecting data columns 498, 530
one way RM ANOVA 377 setting options 498
using dummy variables 774 interpreting results 544
viewing graph 523 results 544
when to use 124, 495 running test 541
Multiple linear regression options setting options 530
alpha value 508 when to use 124, 527
assumption checking 499 Multiple logistic regression options
confidence interval 504 Chi-Square statistic 532
Cook’s Distance test 507 classification table test 532
DFFITS test 506 coefficients P value 535
flagged values 503, 507, 510 Cook’s Distance test 540
influence/multicollinearity 505, 508 Hosmer-Lemeshow statistic 531
influential points 507 Hosmer-Lemshow statistic 531
leverage 506 independent variable combinations 533
power 508 influence 539
PRESS statistic 505 influence/multicollinearity 535
standardized coefficients 505 leverage 539
variance inflation factor 508 likelihood ratio test statistic 532
Multiple linear regression results multicollinearity 535
3D residual scatter plot 524 odds ratio 534
adjusted R2 513 odds ratio confidence 534
ANOVA table 515 predicted values 535
bar chart of standardized residuals 523 standard error coefficients 533
confidence interval 522 Wald statistic 534
creating a graph 525 Multiple logistic regression results
degrees of freedom 516 classification table 548
Durbin-Watson statistic 518 dependent variable 545
histogram of residuals 523 estimation criterion 545
influence/multicollinearity 521 Hosmer-Lemshow P value 545
line/scatter plot with prediction and confidence influence/multicollinearity 551
intervals 524 likelihood ratio test statistic 546
mean square 516 log likelihood statistic 547
normality test 518 logit transform 545
P value 515, 517 odds ratio confidence value 549
power 519 P value 549
PRESS statistic 518 Pearson Chi-Square statistic 546
probability plot 523 Pearson/deviance residuals 551
regression diagnostics 520, 550 probability table 548
regression equation 513, 545 residual calculation method 550
scatter plot of residuals 523 residuals table 550
SSincr 517 threshold probability 547
SSmarg 518 unique independent variable combinations 545
standard error of the estimate 513 variance inflation factor 549
standardized coefficient (beta) 514 Wald statistic 548
sum of squares 515 Multiple logistic regression options
summary table 514 influential points 606
Multiple logistic regression
about 527 N
arranging data 528
arranging data as 541 N
criterion options 531 defined 796
defined 796 N statistic
dichotomous variable 527 descriptive statistic results 109
dummy variables 528 naming
equation 527, 545 graph pages 23
notebook files 23 difference from 2 value 641
notebook items 23 Durbin-Watson statistic 641
sections 23 flagged values 643, 647
worksheets 23 influence/multicollinearity 645
Natural log (ln) transform influential points 647
applying 753 leverage 647
Natural log axis scale 185 normality 640
Natural log scale power 648
defined 795 residuals 642
Natural log transform standardized coefficients 645
applying 76, 752, 753 Nonlinear regression results
New ANOVA table 655
Excel worksheets 29 coefficients 654
new confidence interval 661
equation 22 confidence interval for the regression 661
Excel worksheets 22 constant variance test 657
graph pages 22 constants 654
macros 22 Cook’s Distance test 660
notebook files and items 22 DFFITS 661
reports 22 Durbin-Watson statistic 657
sections 22 F statistic 656
worksheets 22 fit equations 653
Nominal (category) scale influence diagnostics 660
measuring data 86 leverage 660
Non-keyboard characters see text 187
see text 187 P value 654, 656
Nonlinear equation parameter constraints 653
describing data 94 parameters 652
fitting curve through data 94 power 658
Nonlinear regression predicted values 661
constraints, parameter 650 PRESS statistic 645, 656
data format 638 regression diagnostics 658
defined 636, 796 report 652
edit window 650 standard error 654
equation parser 637 standard error of the estimate 654
influencing operation 650 statistics 654
interpreting results 651 sum of squares 655
iterations 637 t statistic 654
model equation 649 variables 652
options 650 variance inflation factor 655
overview 637 Non-normal populations
performing 637 testing 114, 118, 126, 206, 332
results 651 Non-notebook files
running 648 opening worksheets 59
setting options 638 saving worksheets as 58
user defined 7 non-notebook files
viewing graph 662 opening 23, 24
when to use 94, 124 Nonparametric option
Nonlinear regression options data format 205
assumption checking 639 Nonparametric tests
confidence interval 644 ANOVA on ranks 310
constant variance 640 data format 205
Cook’s Distance test 647 defined 796
DFFITS test 646 group comparison 203
rank sum test 204 normal probability plot of residuals 132
repeated measure 330 P value 131
RM ANOVA on ranks 409 report graphs 131
signed rank test 118 Normalizing
Spearman Rank Order Correlation 631 observations 749
Normality percentage data 755
defined 796 Normally distributed populations
nonlinear regression 640 testing 114, 115, 118, 119, 126, 206
Normality procedure Notebook files
when to use 127 defined 796
see also normality test opening worksheets 59
Normality test saving reports to 142
adjusting P value 209, 473, 500, 558, 584, 640 saving worksheet data 57
ANOVA on ranks results 322 notebook files
data format 129 creating 22
descriptive statistics results 110 naming 23
enabling/disabling 209, 222, 233, 259, 260, 289, opening 23, 24
313, 358, 386, 411, 473, 500, 557, 583 saving 19
entering the P value 223, 234, 260, 289, 313, 359, viewing 23, 24
386, 412 notebook items 23
explanation of unpaired t-test results 214 creating within Notebook Manager 22
interpreting results 131 cutting/copying between notebooks 23, 25
linear regression results 489 naming 23
multiple linear regression results 518 opening 23, 24
nonlinear regression 657 printing selected notebook items 20
one way ANOVA 233 saving 19
one way ANOVA results 244 viewing 23, 24
one way RM ANOVA 358 Notebook Manager
one way RM ANOVA results 371 cutting/copying between notebooks 25
paired t-test 334 docking 18
paired t-test results 339 dragging 18
performing 127 opening and closing notebooks 18
picking data columns 129 overview 15
polynomial regression results 569, 572 sizing 18
rank sum test 222, 313, 411 notebooks
rank sum test results 227 closing 18
RM ANOVA on ranks results 421 opening 18
running 127 password protecting 21
see also normality procedure using the Notebook Manager 15
setting P value 128 viewing 18
signed rank test 348 Novice prompting
signed rank test results 352 defined 797
stepwise regression results 602 Null hypothesis
three way ANOVA 289 testing 220, 231
two way ANOVA 259 Numbers
two way RM ANOVA 386 filtering 777
two way RM ANOVA results 399 random 781
unpaired t-test 209 Numeric values
unpaired t-test results 214 measuring data 85
when to use 127
Normality test results 131 O
creating graphs 132
histogram of residuals 132 Object command 198
K-S distance 131 Objects
breaking links 197 multiple comparisons vs. a control 240
changing source files for links 197 normality and equal variance assumptions 233
displaying as icons 193, 194 P value results 248
editing linked 191 pairwise comparisons 240, 365
editing links 197 power option 236
embedding 191 residuals options 235
identifying types 198 results 243
inserting 194 running test 237
inserting from file 195 selecting data columns 232
inserting linked objects 196 selecting data format 238
inserting new 195 setting options 232
linking 191 Student-Newman-Keuls test 241, 268, 297, 418
linking vs embedding 191 summary table 234
modifying object links 196 Tukey test 240
pasting as linked/embedded 193 viewing graph 250
pasting as specified file type 193 when to use 89, 115, 116, 121, 204, 206, 230
pasting to a page 190 see group comparison tests
updating links 197 One way ANOVA on ranks
using paste special 190 data format 65
viewing object links 196 One way ANOVA options
Observations alpha value 236
data 86 multiple comparison 239
lagging 775 when to use 232
normalizing 749 One way ANOVA results
repeated 86 bar chart 250
Observed counts Bonferroni t-test 249
in contingency tables 450, 464 confidence interval for the difference of the mean 246
Observed proportions creating a graph 251
defined 797 degrees of freedom 247
Odds ratio difference of means 250
multiple logistic regression 534 F statistic 248
Odds ratio confidence histogram of residuals 251
multiple logistic regression 534 mean squares 247
multiple logistic regression results 549 multiple comparison graphs 251
OLE2 multiple comparisons vs. a control 249
defined 797 P value 248
One way ANOVA 230 pairwise comparisons 248
about 231 probability plot 251
arranging data 231 Student-Newman-Keuls test 249
bar chart 250 sum of squares 247
Bonferroni t-test 241, 268, 297, 394 One Way RM ANOVA
calculating power 721 summary table 359
calculating power/sample size 133 One way RM ANOVA
calculating sample size 735 arranging data 356
changing test options 232 Bonferroni t-test 366
confidence interval 235 data format 65, 356, 362
data format 65, 67, 232 defined 797
defined 797 Duncan’s test 366
Dunnett’s test 241 Dunnett’s test 366
enabling multiple comparisons 236 enabling multiple comparisons 361
Holm-Sidak test 240 Holm-Sidak test 365
interpreting results 243 interpreting results 369
multiple comparison options 239 missing data points 356
multiple comparisons results 248 multiple comparisons results 374
multiple comparisons vs. a control 365 entering 650
normality and equal variance assumptions 358 Gehan-Breslow survival analysis 695
performing 355 influence 478, 539, 588, 645
results 369 linear regression 471
running 362 LogRank survival analysis 681
selecting data columns 356 McNemar’s test 459
setting options 357 multicollinearity 508, 535, 616
Student-Newman-Keuls test 366 multiple comparison 117, 122
Tukey test 365 multiple comparisons for a three way ANOVA 295
viewing graph of data 376 multiple comparisons for a two way ANOVA 266
when to use 89, 119, 121, 330, 355 multiple comparisons for ANOVA on ranks 318
One way RM ANOVA options multiple linear regression 498
multiple comparisons 364 multiple logistic regression 530
One way RM ANOVA results nonlinear regression 638, 639
all pairwise comparison 374 one way ANOVA 232
ANOVA table 372 one way RM ANOVA 357
Bonferroni t-test 375 page 10
creating a graph 377 page units 10
degrees of freedom 372 paired t-test 333
equal variance test 371 polynomial regression 555
expected mean squares 374 rank sum test 222
F statistic 373 report 135
histogram of residuals 376 RM ANOVA on ranks 410
line/scatter graph 376 signed rank test 347
mean squares 373 single group survival analysis 672
missing data 370 stepwise regression 579
multiple comparison graphs 377 three way ANOVA 288
multiple comparison results 374 t-test 208
multiple comparison vs. a control 374 two way ANOVA 258
normality test 371 two way RM ANOVA 385
probability plot 377 worksheet 10
statistics 371 z-test 435
Student-Newman-Keuls test 375 Order
sum of squares 372 polynomial regression 553, 556
Opening Order only
defined 797 polynomial regression 557
Excel worksheets 29 Ordinal (rank) scale
multiple Excel worksheets 30 measuring data 85
multiple worksheets 28 Other values
popup menu 36 column statistics 56
reports saved as non-notebook files 144 Outlines
worksheets 59 changing color 184
opening Overwrite mode 37
non-notebook files 23, 24 defined 797
notebook files 23, 24
notebook items 23, 24 P
notebooks 18
worksheets 24 P value
Options defined 797
ANOVA on ranks 311 P value
best subset regression 614 best subset regression results 623
Chi-Square test 444 Bonferroni t-test 278, 306
column statistics 10 Chi-Square statistic 451
descriptive statistics 105 chi-square statistic 423
descriptive statistics 106 setting units of measurement 10, 180
Dunnett’s test 279, 307 sizing 178
F statistic 374 page
Fisher Exact test results 456 moving between notebooks 23, 25
F-to-enter value 601 naming 23
F-to-remove value 602 Page undo
H statistic 324 preferences 180
Hosmer-Lemshow 531, 545 Paired t-test
linear regression results 486, 488 calculating power 717
McNemar’s test results 464 calculating sample size 730
multiple linear regression results 515, 517 Paired t-test 118
multiple logistic regression results 549 about 332
nonlinear regression results 654, 656 arranging data 333
normality test 128 data format 67
normality test results 131 defined 797
normality/constant variance tests 473, 500, 558, 584, graph of data 343
640 interpreting results 339
normality/equal variance 223, 234, 260, 289, 313, P value 335
359, 386, 412 performing 332
normality/equal variance tests 209 picking columns to test 337
one way ANOVA results 248 power 342
paired t-test 335 results 339
Pearson Product Moment Correlation results 628 running 337
polynomial regression results 569, 571 selecting data columns 333
regression 473, 500, 558, 640 setting options 333
report options 136 when to use 88, 118, 330, 345, 355
signed rank test 348 Paired t-test options
Spearman Rank Order Correlation 635 confidence intervals 336
stepwise regression 584 residual options 336
stepwise regression results 600 summary table 336
Student-Newman-Keuls test 279, 307 Paired t-test options
t statistic 216, 342 normality 334
T statistic results 228 Paired t-test results
three way ANOVA 304 confidence interval for difference of mean 342
t-test 209 creating a graph 343
two way ANOVA 276 degrees of freedom 341
two way RM ANOVA 402 histogram of residuals 343
unpaired t-test results 216 line/scatter graph 343
W statistic 353 normality 339
z statistic results 441 P value 342
Page probability plot 343
clipart 190 statistics 340
creating new objects to insert 195 Pairwise comparisons
defined 797 one way ANOVA 240
embedding objects 191 one way RM ANOVA 365
identifying object types 198 Parameters
inserting objects from files 195 constraints 650, 653
labels 186 nonlinear regression results 652
linking objects 191 Parametric tests
measurement units 180 defined 797
pasting artwork 190 group comparison 203
pasting graphs/objects 190 linear regression 470
setting margins 178 multiple linear regression 497
setting Options 10 one way ANOVA 230
paired t-test 118 P value 628
Pearson Product Moment Correlation 624 Pearson product-moment correlation
polynomial regression 553 calculating power 726
t-test 206 calculating sample size 741
unpaired t-test 204, 206 Pearson/deviance residuals 538
passwords Percentage data
protecting notebooks 21 normalizing 755
Paste link Percentiles
defined 191 defined 798
Paste Special descriptive statistics results 109
defined 798 Performing
Paste special a multiple comparison 242, 367
displaying pasted objects as icons 193 ANOVA on ranks 310
embedding objects 191 best subset regression 611
linking objects 191 Chi-Square test 442
modifying pasted object icons 193 Fisher Exact test 451
pasting graphs and objects 190 Gehan-Breslow survival analysis 699
pasting graphs as metafiles 192 linear regression 469
pasting graphs without data 192 LogRank survival analysis 685
Paste Special... command 190 McNemar’s test 457
Pasting multiple linear regression 495
bitmaps 190 nonlinear regression 637
clipart 190 normality test 127
data 33 one way ANOVA 230
defined 798 one way RM ANOVA 355
graphics 190 paired t-test 332, 337
graphs as metafiles 192 Pearson Product Moment Correlation 623
graphs/objects to a page 190 polynomial regression 553
graphs/objects using paste special 190 power/sample size procedures 713
metafiles 190 procedures 97
objects 190 rate and proportion tests 429
report text 141 RM ANOVA on ranks 408
transpose 63 signed rank test 345
see also inserting single group survival analysis 675
PDF files Spearman Rank Order Correlation 631
exporting 143 survival analysis tests 667
Pearson Chi-Square statistic three way ANOVA 293
multiple logistic regression results 546 two way ANOVA 264
Pearson Product Moment Correlation z-test 438
about 624 Performing a multiple comparison
arranging data 624 RM ANOVA on ranks 418
calculating 624 Picking columns to test
data format 625 z-test 438
defined 798 Pie charts
interpreting results 627 color 184
performing 623 creating 175
running test 625 Pie slices
selecting data columns 625 fills 184
viewing graph 629 Plane
when to use 93, 125, 126, 623 fitting through data 611
Pearson Product Moment Correlation results Plotting
correlation coefficient 628 column means 168, 170
creating a graph 629 residuals 172, 176
number of data points used to compute 628 standardized residuals as bar chart 176
Point studentized residuals 561
defined 798 Polynomial regression results
Point and column means plots bar chart of standardized residuals 575
descriptive statistics results 111 best model 570
Point plots coefficient of determination 568, 571
ANOVA on ranks 326 confidence interval 573, 574
creating 169 constant variance test 570, 572
descriptive statistics results 111 creating a graph 576
plotting column means 170 Durbin-Watson statistic 572
rank sum test 229 F statistic 571
report graphs 150 F value 568
unpaired t-test results 218 histogram of residuals 575
Pointer incremental mean square 567
defined 798 incremental results 567
Polynomial line/scatter plot with prediction and confidence
second order 752 intervals 575
Polynomial curve normality test 569, 572
fitting through data 93 P value 569, 571
Polynomial order predicted values 574
regression 556 PRESS statistic 572
Polynomial regression probability plot 575
about 553 regression diagnostics 573
arranging data 554 regression equation 566
data format 554 residual mean square 567, 571
defined 798 residuals 573
difference from 2 value 559 scatter plot of residuals 575
Durbin-Watson statistic 559 standard error of the estimate 572
equation 553 standardized residuals 573
interpreting order only results 570 Population
interpreting results 566 confidence interval 477, 504, 562, 574, 587, 607,
order 553 644
parametric test 553 confidence interval results 661
performing 553 non-normal 332
results 566 non-normal distribution 206
running test 564 normally distributed 206
selecting data columns 554 see confidence interval
setting options 555 Positioning
viewing graph 574 formatting toolbar 9
when to use 94, 124, 553 toolbars 9
Polynomial regression options Post-hoc power
alpha value 563 one way ANOVA results 236, 246, 371
assumption checking 557 one way RM ANOVA 360
confidence intervals 562 paired t-test 342
criterion options 556 three way ANOVA results 304
flagged values 561 two way ANOVA results 276
incremental evaluation/order only 557 two way RM ANOVA results 402
order 556 unpaired t-test 211
power 563 unpaired t-test results 217
predicted values 559 Power
PRESS statistic 563 alpha 441, 451
raw residuals 560 alpha value 133, 217, 236, 246, 276, 305, 342, 372,
standardized coefficients 563 403, 490, 519, 603, 658
standardized residuals 561 calculating 85, 89, 91, 133
studentized deleted residuals 561 Chi-Square results 451
Chi-Square test 445 variables/trends 84, 92, 123
chi-square test 723 Prediction test
contingency table 724 see regression procedures
correlation coefficient 726 Preferences
defined 798 automatic 181
linear regression 481 column statistics 54
linear regression results 490 defined 798
multiple linear regression 508 page margins 178
multiple linear regression results 519 page undo 180
nonlinear regression 648 page units 180
nonlinear regression results 658 stretch maintains aspect ratio 180
one way ANOVA 236, 721 PRESS statistic
one way ANOVA results 246 defined 799
one way RM ANOVA 360 linear regression results 488
one way RM ANOVA results 371 multiple linear regression results 518
paired t-test 717 nonlinear regression results 656
paired t-test 342 polynomial regression results 572
Pearson product-moment correlation 726 stepwise regression results 602
performing procedure 713 Printing
polynomial regression 563 column statistics 57, 81
post-hoc 217, 246, 276, 304, 371, 402 modified worksheet 80
sample size 714 reports 145
saving settings and results 716, 718, 720, 723, 726, worksheet 80
727 worksheet data 80
stepwise regression 593 printing
stepwise regression results 603 selected notebook items 20
three way ANOVA 291 Probability
three way ANOVA results 304 threshold value for multiple logistic regression results
t-test 714 547
two way ANOVA 262 Probability axis scale 185
two way ANOVA results 276 defined 799
two way RM ANOVA 388 Probability plot
unpaired t-test 211, 336 two way RM ANOVA 406
unpaired t-test results 217 Probability plots
when to use 133 normality test results 132
z-test 719 one way ANOVA 251
z-test 441 three way ANOVA 308
see also post hoc power 246 two way ANOVA 280
Power function unpaired t-test results 155, 218
linearizing equations 752 Probability table
Predicted values multiple logistic regression results 548
defined 798 Probit axis scale 185
multiple linear regression results 502 defined 799
multiple logistic regression 535 Procedures
nonlinear regression results 661 applying 97
polynomial regression options 559 before and after 329
polynomial regression results 574 choosing appropriate 103
regression diagnostic results 490, 520, 604, 659 compare many groups 114, 204
regression results 475, 535, 642 compare two groups 113, 204
stepwise regression results 585 correlation 465
Predicting multiple comparison 117
goals 92 multiple comparisons 267, 319, 393
trend in data 469, 553, 577, 611 normality 127
variables 495 performing 97
power 133, 713 Ranges
rates and proportions 429 descriptive statistics results 109
regression 465 Rank sum test
repeat 102 about 220
repeated measures 329 arranging data 221, 225, 338, 415
sample size 133, 713 box plot 229
survival analysis 667 changing test options 222
using 97 data format 66
Proportions defined 795
calculating power 719 interpreting results 226
calculating sample size 733 normality and equal variance 222, 313, 411
comparing 430 picking data columns to test 224
comparing in multiple categories 430 point plot 229
measuring data by 86 results 226
protecting running 224
notebooks with passwords 21 setting options 222
viewing graph 228
Q when to use 88, 206, 220
Rank sum test options
Quadratic functions when to use 222
linearizing/stabilizing data 752 Rank sum test results
Quattro files equal variance test 227
importing 62 normality test 227
Quick transforms P value 228
applying 76 summary table 228
defined 799 T statistic 228
using 745, 749 Rank transform
applying 760
R defined 799
R (correlation coefficient) Rank, ordinal scale
linear regression results 485 measuring data 85
multiple linear regression 513 Ranking
nonlinear regression 653 see rank transform
stepwise regression 598 Rates and proportions procedures
R value Chi-Square 430, 442
defined 799 compare proportions 430
R2 (coefficient of determination) 513 contingency tables 429
R2 (coefficient of determination) data format 431
linear regression results 485 defined 429
multiple linear regression 513 Fisher Exact test 430, 451
nonlinear regression 653 McNemar’s test 430, 457
polynomial regression results 568, 571 performing 429
stepwise regression 598 z-test 430
R2, adjusted Raw data
linear regression results 485 arranging for group comparison tests 65
multiple linear regression 513 Chi-Square test 443
Random number generation contingency tables 70, 431
generating numbers 746, 781 converting to/from indexed 73
Random numbers defined 799
defined 799 Fisher Exact test 432, 452
uniform 781 group comparison tests 205
Range in normality tests 129
defined 799 in repeated measure tests 331
McNemar’s test 433, 458
regression procedures 71 best subsets regression 611
survival analysis test 669 data format 71, 468
Raw residuals defined 465
defined 799 linear 469
multiple linear regression results 502 multiple linear 495
polynomial regression options 560 performing 465
regression results 475, 538, 642 polynomial 553
stepwise regression results 585 raw data 71
Rearranging data 63 Regression results
Reciprocal transform leverage 539
applying 76, 753, 754 Regressions
defined 799 coefficients 800
Reference coding defined 794, 799
defining dummy variables 765, 766 Removing
when to use 766 report text 141
References 12 Repeat
Regional settings procedures 102
Date and Time format 47 Repeated measures ANOVA on ranks
Regression Dunnett’s test 418
best subset 96 Student-Newman-Keuls test 418
confidence interval 477, 504, 522, 562, 587, 607, Tukey test 417
644 Repeated measures procedures
correlation coefficient 467 about 329
defined 123 ANOVA tests 329
forward stepwise 95 data format 330
least squares 467 defined 329
linear 92, 469 nonparametric 330
missing data 468 one way RM ANOVA 330, 355
multiple linear 495 paired t-test 330, 332
nonlinear 94 RM ANOVA on ranks 330, 408
polynomial 94, 553 signed rank test 330, 345
procedures 465 subject column 66
residuals 467 two way RM ANOVA 330
stepwise 96 Repeated observations 86
user defined 7 Replacing
using dummy variables 774 text 140
Regression diagnostics Report graphs
multiple linear regression results 520, 550 generating 148
nonlinear regression results 658 normality test results 131
polynomial regression results 573 plotting residuals 132
predicted values 490 probability plots 132
residuals 490, 491, 573 Reporting flagged values
stepwise regression results 604 best subset regression 618
Regression equation linear regression 477, 481, 643
independent variables 600 multiple linear regression 503, 507, 510
linear regression results 484 multiple logistic regression 537, 539, 540
multiple linear results 513, 545 nonlinear regression 647
polynomial 566 polynomial regression 561
polynomial regression results 570 stepwise regression 586, 590, 593
Regression model Reports 135
best subset regression results 620 closing 144
Regression options cutting/copying text 140
confidence interval 477, 644 decimal places 136
Regression procedures deleting 144, 201
deleting text 141 586, 604, 643, 659
editing 137 studentized deleted 476, 491, 503, 521, 539, 551,
explain test results 136 561, 586, 605, 643, 659
exporting 143 sum of squares 622
find/replace 140 table, multiple logistic regression results 550
formatting toolbar 137 two way ANOVA 261
generating 137 Result explanations
generating graphs 148 turning on/off 214, 227, 244, 272, 301, 322, 484,
modifying 137 513, 544, 570, 596, 620, 627, 634
modifying text 138 Results
moving around 141 ANOVA on ranks 322
nonlinear regression 652 best subset regression 619
opening non-notebook files 144 Chi-Square test 449
pasting text 141 compare proportions 439
printing 145 descriptive statistics 108
ruler 138 editing 137
saving 10, 142 equal variance for one way ANOVA 245
scrolling 214, 227, 244, 339, 439, 449, 463, 483, equal variance for one way RM ANOVA 371
544, 566, 570, 597, 620, 627, 634, 652 Fisher Exact test 455
searching for and replacing text 140 linear regression 483
setting options 135 McNemar’s test 463
setting tabs 139 missing data in a three way ANOVA 301
significant P value 136 missing data in a two way ANOVA 272
text attributes 138 missing data in a two way RM ANOVA 398
turning on/off ruler 137 multiple comparison 423
using scientific notation 136 multiple comparisons in a one way ANOVA 248
reports multiple comparisons in a one way RM ANOVA 374
creating within Notebook Manager 22 multiple linear regression 512
Residual mean square multiple logistic regression 544
polynomial regression results 567, 571 nonlinear regression 651
Residual options normality for one way ANOVA 244
one way ANOVA 234 normality for one way RM ANOVA 371
unpaired t-test 210 normality test 131
Residual sum of squares one way ANOVA 243
best subset regression results 622 one way ANOVA confidence interval 246
Residual tests one way ANOVA power 246
Durbin-Watson statistic 657 one way ANOVA table 247
PRESS statistic 478, 505, 563, 588, 645, 656 one way RM ANOVA 369
Residuals one way RM ANOVA power 371
calculation of, multiple logistic regression results 550 one way RM ANOVA table 372
defined 123, 467, 800 P value in T statistic 228
deviance 537 paired t-test 339
mean square 567 Pearson Product Moment Correlation 627
nonlinear regression 642 polynomial regression 566, 570
paired t-test 336 rank sum test 226
Pearson/deviance 538, 551 rank sum test equal variance 227
plotting 132, 172, 176 rank sum test normality 227
probability plots 132 rank sum test summary table 228
raw 475, 502, 538, 560, 585, 604, 642 rank sum test T statistic 228
regression diagnostic results 490, 520, 551, 573, 604, RM ANOVA on ranks 421
659 signed rank test 351
standardized 176, 476, 490, 503, 520, 561, 573, single group survival analysis 677
586, 604, 643, 659 Spearman Rank Order Correlation 634
studentized 476, 491, 503, 520, 538, 551, 561, stepwise regression 596
summary table for ANOVA on ranks 323 Dunnett’s test 424
summary table for best subset regression 620 equal variance test 421
summary table for linear regression 486 line/scatter plot 425
summary table for multiple linear regression 514 multiple comparison graphs 426
summary table for multiple logistic regression 548 multiple comparisons 423
summary table for one way ANOVA 245 normality test 421
summary table for one way RM ANOVA 371 statistics 422
summary table for paired t-test 340 Student-Newman-Keuls test 424
summary table for RM ANOVA on ranks 422 table 422
summary table for signed rank test 352 Rotating
summary table for three way ANOVA 305 labels 187
summary table for two way ANOVA 273 text 187
summary table for two way RM ANOVA 400 Row percentage
three way ANOVA 300 in contingency tables 450
two way ANOVA 271 Rows
two way RM ANOVA 397 deleting 35
unpaired t-test 214 editing titles 37
viewing 5 inserting empty 35
Results, incremental selecting 32
polynomial regression results 567 selecting entire 32
Rich Text Format files titles, see Entering
opening 144 column and row titles 37
Right-clicking transposing 63
worksheet to open popup menu 36 using as column titles 39, 40
RM ANOVA see also data
data format 68 RTF files
RM ANOVA on ranks opening 144
about 409 Ruler
all pairwise comparisons 417 report 138
arranging data 410 turning on/off 137
data format 410, 415 Running
defined 800 ANOVA on ranks 316
enabling multiple comparisons 413 best subset regression 618
interpreting results 421 Chi-Square test 446
multiple comparison options 417 descriptive test 107
multiple comparison results 423 Fisher Exact test 453
multiple comparison vs. a control 417 Gehan-Breslow survival analysis 699
performing 408 linear regression 482
performing a multiple comparison 418 LogRank survival analysis 685
results 421 McNemar’s test 461
running 414 multiple linear regression 511
selecting data columns 410 multiple logistic regression 541
setting options 410 nonlinear regression 648
viewing graph 425 normality test 127
when to use 89, 119, 330, 345, 355, 408 one way ANOVA 237
RM ANOVA on ranks options one way RM ANOVA 362
when to use 410 paired t-test 337
RM ANOVA on ranks results Pearson Product Moment Correlation 625
all pairwise comparisons 423 polynomial regression 564
box plot 425 procedures 97
chi-square statistic 421, 423 rank sum test 224
comparisons vs. a single control 424 RM ANOVA on ranks 414
creating a graph 426 signed rank test 349
Dunn’s test 424
single group survival analysis 675 continuous 85
Spearman Rank Order Correlation 632 nominal (category) 86
stepwise linear regression 594 ordinal (rank) 85
three way ANOVA 293 Scale, axis
two way ANOVA 264 linear 794
two way RM ANOVA 390 logit 795
unpaired t-test 212 modifying type 185
z-test 438 natural logarithmic 795
probability 799
S probit 799
Scaling
Sample size resizing labels/legends automatically with graphs 181
alpha value 133 setting aspect ratio preference 180
calculating 84, 89, 91, 133 Scatter graphs, 3D
calculating for Chi-Square test 133 defined 800
calculating for correlation coefficients 133 Scatter plots
calculating for one way ANOVA 133 creating 165
calculating for unpaired t-tests 133 descriptive statistics results 110
calculating for z-tests 133 plotting column means 168
correlation coefficient 741 plotting many Y columns 165
defined 714, 800 report graphs 150
one way ANOVA 735 unpaired t-test results 218
paired t-test 730 Scientific notation
performing procedure 713 defined 800
proportions comparison 733 entering data using 31
saving settings and results 730, 732, 735, 737, 740, using in reports 136
742 Scroll boxes
t-test 728 defined 800
when to use 133 Scrolling
z-test 733 reports 214, 227, 244, 339, 439, 449, 463, 483,
Sample-based multicollinearity 544, 566, 570, 597, 620, 627, 634, 652
defined 509, 536, 592, 617 Searching for
Save as text 140
defined 800 Second order polynomials
Saving linearizing/stabilizing data 752
confidence interval 478, 505, 562, 587, 645 Section
data 10 defined 800
defined 800 section
graphs 10 creating within Notebook Manager 22
notebook files 10 editing 23
power settings and results 716, 718, 720, 723, 726, naming 23
727 security
reports 10, 142 password protecting notebooks 21
reports as non-notebook files 143 Selecting
sample size settings and results 730, 732, 735, 737, all data in worksheet 33
740, 742 columns 32
worksheet data 57 data 32
saving defined 801
data 19 entire worksheet 33
graphs 19 rows 32
notebook files 19 Selecting columns to test
pages 19 z-test 438
worksheets 19 Selecting data columns
Scale ANOVA on ranks 311
best subset regression 614 ANOVA on ranks options 312
descriptive statistics 105, 107 aspect ratio 180
linear regression 471 column statistics 10, 54
multiple linear regression 498, 530 date and time display in worksheet 44
normality test 129 decimal places in worksheet 43
one way ANOVA 232 descriptive statistics options 105
one way RM ANOVA 356 multiple comparison options 117, 122
paired t-test 333 three way ANOVA options 288
Pearson Product Moment Correlation 625 worksheet 10
polynomial regression 554 worksheet appearance 48
rank sum test 221, 224 worksheet column width and row height 48
RM ANOVA on ranks 410 Shortcuts
signed rank test 346 opening popup menu 36
Spearman Rank Order Correlation 632 Showing
stepwise regression 579 column statistics 54
three way ANOVA 293 SigmaPlot
two way ANOVA 258, 264 editing graphs with 189
unpaired t-test 208 importing SP? files 61
Sensitivity SigmaStat
alpha value 133 introduction 1
one way RM ANOVA 360 Signed rank Test
paired t-test 342 selecting data format 350
three way ANOVA 291 Signed rank test
two way ANOVA 262 about 345
two way RM ANOVA 388 arranging data 346
unpaired t-test 211, 336 data format 65, 67, 350
Setting options defined 801
ANOVA on ranks 311 interpreting results 351
best subset regression 614 P value 348
Chi-Square test 444 performing 345
compare proportions 435 picking columns to test 349
Gehan-Breslow survival analysis 695 results 351
linear regression 471 running 349
LogRank survival analysis 681 selecting data columns 346
McNemar’s test 459 setting options 347
multiple linear regression 498 viewing graph 353
multiple logistic regression 530 when to use 89, 118, 330, 345
nonlinear regression 638 Signed rank test results
one way ANOVA 232 creating a graph 353
one way RM ANOVA 357 line/scatter graph 353
paired t-test 333 normality test 352
polynomial regression 555 P value 353
rank sum test 222 statistics 352
reports 135 Significant P value
RM ANOVA on ranks 410 reports 136
signed rank test 347 Simple linear regression
single group survival analysis 672 about 469
stepwise regression 579 see linear regression
three way ANOVA 288 Single group survival analysis
two way ANOVA 258 example graph 678
two way RM ANOVA 385 interpreting results 677
unpaired t-test 208 results 677
z-test 435 running 675
Settings setting options 672
Single group survival test applying 76, 752
arranging data 671 defined 801
data format 671 SSincr
performing 670 multiple linear regression results 517
when to use 670 SSmarg
Size multiple linear regression results 518
column statistics 56 Stabilizing variances 749
Sizing Stacking columns 34
graph page 178 Standard deviation
report text 138 column statistics 55
resizing labels/legends automatically with graphs 181 defined 802
setting aspect ratio preference 180 descriptive statistics results 109
sizing expected change 718
Notebook Manager 18 expected size 722, 729, 731, 736
Skewness Standard error
column statistics 56 best subset regression results 622
defined 801 column statistics 55
descriptive statistics results 110 defined 802
Slope descriptive statistic results 109
finding for line 92 linear regression results 486
SNK test multiple linear regression results 514
see Student-Newman-Keuls test multiple logistic regression results 548
Sorting nonlinear regression results 654
data 52 stepwise regression results 600
defined 801 Standard error coefficients
SP? files multiple logistic regression 533
importing 61 Standard error of difference
SP5 files z-test results 440
importing 61 Standard error of the estimate
Spacing linear regression results 485
report text 139 multiple linear regression results 513
Spearman correlation coefficient 634 nonlinear regression results 654
Spearman Rank Order Correlation polynomial regression results 572
about 631 stepwise regression results 598
arranging data 632 Standard error of the means
calculating 631 two way RM ANOVA results 403
creating a graph 636 Standardize transform
defined 631 applying 758
interpreting results 634 defined 802
number of data points used to compute 635 using 746
performing 631 Standardized coefficient (beta)
results 634 linear regression 486
running test 632 multiple linear regression 514
selecting data columns 632 Standardized coefficients
viewing graph 635 beta 514
when to use 126, 631 linear regression options 478
Spearman rank order correlation multiple linear regression 505
defined 801 nonlinear regression options 645
Spearman Rank Order Correlation results polynomial regression options 563
P value 635 stepwise regression 588
Square root transform Standardized residuals
applying 76, 754 defined 802
defined 801 multiple linear regression results 503
Square transform plotting as bar chart 176
polynomial regression options 561 two way ANOVA options 258
regression diagnostic results 490, 520, 573, 604, 659 two way ANOVA results 273
regression results 476, 643 two way RM ANOVA options 385
stepwise regression results 586 two way RM ANOVA results 400
Statistic unpaired t-test results 215
Durbin-Watson 584 Wald 534, 548
Pearson Product Moment Correlation 468 Statistics menu
Spearman Rank Order Correlation 468 compare many groups 114, 204
Statistical summary data compare two groups 204
defined 802 power 713
in group comparison tests 68 rates and proportions 429
Statistical summary table sample size 133, 713
nonlinear regression results 654 survival analysis 667
Statistical test options Statistics tests
one way RM ANOVA 359 see tests
signed rank test 349 Step number
Statistics 441 stepwise regression results 598
ANOVA on ranks options 312 Steps
ANOVA on ranks results 323 setting maximum 582
best subset regression 612 Stepwise linear regression
best subset regression results 620 about 577
Chi-Square 532 Stepwise regression
chi-square statistic 421, 423 arranging data 579
classification table test 532 backward 124, 578
coefficient of determination 612 defined 802
contingency table summary 456 difference from 2 value 584
descriptive 84, 205, 238 Durbin-Watson statistic 584
Durbin-Watson 489, 518, 559, 572, 602, 657 forward 124, 578
Durbin-Watson statistic 657 influential points 591
F statistic 248, 275, 304, 373, 401, 488, 517, 571, interpreting results 596
599, 656 results 596
F value 568 running test 594
H statistic 322, 324 selecting data columns 579
Hosmer-Lemshow 531 setting options 579
likelihood ratio test 532, 546 viewing graph 607
linear regression results 486 when to use 96, 124, 577
log likelihood 547 Stepwise regression options 579
multiple linear regression results 514 alpha value 594
multiple logistic regression results 548 assumption checking 582
one way ANOVA options 232 confidence interval 587
one way ANOVA results 245, 371 Cook’s Distance test 590
one way RM ANOVA results 371 criterion options 581
paired t-test results 340 DFFITS test 589
Pearson Chi-Square 546 flagged values 586, 590, 593
power 217, 246, 276, 304, 371 F-to-enter value 581
PRESS 478, 505, 518, 563, 572, 588, 602, 645, F-to-remove value 581
656 influence/multicollinearity 588
rank sum test options 222 leverage 589
rank sum test results 228 number of steps 582
RM ANOVA on ranks results 422 power 593
signed rank test results 352 PRESS statistic 588
T statistic 228 standardized coefficients 588
t statistic 216, 341, 486, 514, 622, 654 VIF 591
three way ANOVA options 288 Stepwise regression results
3D residual scatter plot 608, 663 regression results 476, 538, 643
bar chart of standardized residuals 608, 662 ANOVA on ranks 319, 325
confidence intervals 606 one way ANOVA 241, 249, 268, 297, 418
constant variance test 603 one way RM ANOVA 366, 375
Cook’s Distance test 605 repeated measures ANOVA on ranks 418
creating a graph 608, 663 RM ANOVA on ranks results 424
DFFITS test 606 three way ANOVA 297, 307
Durbin-Watson statistic 602 two way ANOVA 268, 278
F statistic 599 two way RM ANOVA 394
F-to-enter value 597 two way RM ANOVA results 405
F-to-remove value 598 Subject column 66
histogram of residuals 607, 662 Subscript 186
influence diagnostics 605 Sum
leverage test 605 column statistics 56
line/scatter plot with prediction and confidence descriptive statistics results 109
intervals 608, 663 Sum of squares
normality test 602 defined 567, 803
P value 600 descriptive statistics results 110
power 603 incremental 568
predicted values 585 linear regression results 487
PRESS statistic 602 multiple linear regression results 515
probability plot 608, 662 nonlinear regression results 655
raw residuals 585 one way ANOVA results 247
regression diagnostics 604 one way RM ANOVA results 372
scatter plot of residuals 607, 662 residual 622
standard error of the estimate 598 stepwise regression results 598
standardized residuals 586 three way ANOVA results 302, 303
step number 598 two way ANOVA results 274
studentized deleted residuals 586 two way RM ANOVA results 400
studentized residuals 586 Summary table
sum of squares 598 ANOVA on ranks results 323
variables 600 best subset regression results 620
Stop after steps defined 803
entering value 582 linear regression results 486
Strings multiple linear regression results 514
filtering 777 multiple logistic regression results 548
Structural multicollinearity one way ANOVA 234
defined 509, 536, 592, 617 one way ANOVA results 245
Studentized deleted residuals one way RM ANOVA 359
defined 802 one way RM ANOVA results 371
multiple linear regression results 503 Paired t-test 336
multiple logistic regression results 551 paired t-test results 340
polynomial regression options 561 rank sum results 228
regression diagnostic results 491 rank sum test 224, 413
regression results 476, 521, 539, 605, 643, 659 RM ANOVA on ranks 413
stepwise regression results 586 RM ANOVA on ranks results 422
Studentized residuals signed rank test results 352
defined 802 three way ANOVA 290
multiple linear regression results 503 three way ANOVA results 305
multiple logistic regression diagnostic results 551 two way ANOVA 261
polynomial regression options 561 two way ANOVA results 273
regression diagnostic results 491, 520, 604, 659 two way RM ANOVA 387
two way RM ANOVA results 400, 403 Table
unpaired t-test 210 ANOVA on ranks results 323
unpaired t-test results 215 best subset regression results 620
z-test results 439 classification for multiple logistic regression results 548
Superscript 186 linear regression results 486
Survival Analysis multiple linear regression results 514
defined 803 multiple logistic regression results 548
when to use 126 one way ANOVA results 245
Survival analysis one way RM ANOVA results 371
example graphs 709 paired t-test result 340
example of Gehan-Breslow survival graph 707 probability for multiple logistic regression results 548
example of LogRank survival graph 692 residuals, multiple logistic regression results 550
example of single group survival graph 678 RM ANOVA on ranks results 422
failures, censored values, and ties 708 signed rank test results 352
interpreting Gehan-Breslow results 705 three way ANOVA results 302
interpreting LogRank results 690 two way ANOVA results 273
Survival analysis procedures two way RM ANOVA results 400
arranging data 671, 681, 695 Tables
data format 668 contingency 429
defined 667 Tabulated data
for probability of time to event 667 Chi-Square test 443
performing 667 contingency tables 69, 431
running Gehan-Breslow survival analysis 699 defined 803
running LogRank survival analysis 685 in contingency tables 69
running single group survival analysis 675 McNemar’s test 458
types of tests 668 Technical support 11
Survival analysis test via home page 12
indexed data format 670 Telephone numbers 11
raw data format 669 Test
Switching rows to columns 63 repeating 102
Symbols Test goals
color 184 defining 83
defined 803 predicting 92
Greek 187 Test results
inserting in legends 188 see results
modifying 184 Testing
Systat Software constant variance 640
world wide web home page 12 non-normal populations 114, 118, 126, 206, 332
normality 348, 640
T normality and equal variance 209
normally distributed populations 114, 115, 118, 119,
T statistic 126, 206
rank sum test results 228 Tests
t statistic advising user on which test to use 83
best subsets regression results 622 choosing appropriate 103
calculation 341 correlation 465
degrees of freedom 341 defining goals 83
linear regression results 486 Fisher Exact 451
multiple linear regression results 514 Gehan-Breslow survival analysis 693
nonlinear regression results 654 generating report graphs 148
P value 342 group comparison 113
unpaired t-test results 216 LogRank survival analysis 679
Tab McNemar’s 457
setting in reports 139 measuring effect 85

  measuring sensitivity 713
  nonparametric 203, 330, 409
  normality 127
  paired t-test 88, 332
  parametric 203
  rank sum 88
  rate and proportion 429
  reports 135
  setting tabs in reports 139
  signed rank 89, 345
  Single group survival test 670
  survival analysis 667
  t-test 88
  unpaired t-test 88
  z-test 434
  see also procedures
Text
  adding to graph page 186
  aligning 139
  alignment 186
  color 138
  cutting/copying 140
  deleting 141
  editing 188
  entering in worksheet 31
  find/replace 140
  font 138
  pasting 141
  replacing 140
  rotating 187
  searching for 140
  sizing 138
  spacing 139
  style 138
  subscript 186
  superscript 186
Text files
  defined 803
  importing 62
  opening 144
Text mode
  entering non-keyboard characters 187
Three way ANOVA
  about 283
  arranging data 284
  Bonferroni t-test 297
  changing test options 288
  connected data 286
  data format 67, 287
  defined 804
  dependent variables 302
  disconnected data 286
  Duncan’s Multiple Range test 298
  Dunnett’s test 297
  empty cells 285
  enabling multiple comparisons 291
  equal variance test results 302
  Fisher’s LSD test 297
  Holm-Sidak test 296
  interpreting results 300
  missing data 285, 286
  missing data procedure 294
  multiple comparison results 305
  multiple comparisons options 295
  multiple comparisons vs. a control 296
  normality and equal variance assumptions 289
  options 288
  performing a multiple comparison 298
  picking columns to test 293
  post-hoc power 291
  results 300
  results for missing data 301
  running test 293
  setting options 288
  Student-Newman-Keuls test 297
  Tukey test 296
  viewing graph 308
  when to use 90, 116, 204, 283
Three way ANOVA options
  confidence interval 290
  residuals options 291
  summary table 290
Three way ANOVA results
  all pairwise comparisons 296, 306
  creating a graph 308
  degrees of freedom 302
  error sum of squares 303
  F statistic 304
  histogram of residuals 308
  interpreting 300
  mean squares 303
  missing data 301
  multiple comparison graphs 308
  multiple comparison vs. a control 306
  P value 304
  post-hoc power 304
  power 304
  probability plot 308
  sum of squares 302
  summary table 305
  table 302
  type I error 305
Threshold probability
  multiple logistic regression results 547
  see classification table test
Titles
  column and row titles 37
  column and row titles dialog box 37
  editing column 37
  editing row 37
  using cells as column or row titles 40
  using worksheet columns as row titles 39
  using worksheet rows as column titles 39
Toolbars
  defined 803
  formatting toolbar 9, 137
  positioning 9
Total cell percentage
  in contingency tables 450
Transform
  square root 754
Transform dialog 787
Transform menu
  standardize 746
  user defined 746
Transforms
  applying 6
  arcsin 755
  center 746, 752, 755
  centering data 77
  converting missing values 78
  defined 803
  defining own 746
  dialog 787
  dummy variables 78, 746
  entering 787
  filter 746, 777
  filtering data 77
  generating random numbers 77
  interaction 746, 762
  lagged variables 78, 746, 775
  library 746
  linearizing/variance stabilizing 751
  ln 752, 753
  logit 545
  missing values 746, 785
  quick 745, 749
  quick mathematical 76
  random numbers 746, 781
  rank 760
  ranking data 77
  reciprocal 753, 754
  square 752
  standardize 746, 758
  standardizing data 77
  user defined 746, 787
  user-defined 78
  using 745
  variable interactions 78
Transforms menu
  center 746, 755
  dummy variables 746, 765
  filter 746, 777
  interactions 746, 762
  lagged variables 746, 775
  missing values 746, 785
  quick trans 745, 749
  random numbers 746, 781
  rank 760
  standardize 758
  user defined 787
Transposing
  defined 803, 804
  rows & columns 63
Treatments
  number of 84, 87
Trends
  predicting 84
  predicting in data 469, 553, 577, 611
t-test
  defined 803
  unpaired 206
  unpaired, when to use 206
t-test
  selecting data format 212
t-test
  arranging data as 212
t-test options
  Kolmogorov-Smirnov test 209
t-tests
  power 714
  sample size 728
t-tests
  arranging data 64
  data format 65, 66
  paired 88, 118, 330, 332
  unpaired 113
  when to use 113
  when to use 88
  see unpaired t-test
Tukey test
  ANOVA on Ranks 319
  one way ANOVA 240
  one way RM ANOVA 365
  repeated measures ANOVA on ranks 417
  three way ANOVA 296
  two way ANOVA 267, 268
  two way RM ANOVA 394
  two way RM ANOVA results 405
Turning on/off
  formatting toolbar 9
  insertion mode 37
  report ruler 137, 138
  result explanations 214, 227, 244, 272, 301, 322, 484, 513, 544, 570, 596, 620, 627, 634
Two groups
  comparing 204
Two way ANOVA
  3D category scatter 280
  3D scatter of residuals 280
  about 253
  alpha value 262
  arranging data 205, 254
  Bonferroni t-test 268
  changing test options 258
  connected data 256
  creating a graph 281
  data format 65, 67, 258
  defined 804
  disconnected data 256
  Duncan’s Multiple Range test 269
  Dunnett’s test 269
  empty cells 255
  enabling multiple comparisons 263
  equal variance test results 273
  Fisher’s LSD test 269
  grouped bar chart 280
  histogram of residuals 279
  indexing raw data 74
  interpreting results 271
  methods of multiple comparison 267
  missing data 255, 256
  missing data procedure 265
  multiple comparison results 277
  multiple comparisons options 266
  normality and equal variance assumptions 259
  options 258
  performing 253
  performing a multiple comparison 269
  picking columns to test 264
  picking data columns to test 258
  power 262
  probability plot 280
  results 271
  results for missing data 272
  running test 264
  setting options 258
  Student-Newman-Keuls test 268
  Tukey test 267, 268
  viewing graph 279
  when to use 90, 115, 116, 121, 204, 253
Two way ANOVA options
  confidence intervals 261
  residual options 261
  summary table 261
Two way ANOVA results
  all pairwise comparisons 277
  Bonferroni t-test 278, 306
  degrees of freedom 273
  F statistic 275
  mean squares 274
  multiple comparison vs. a control 277
  P value 276
  power 276
  sum of squares 274
  table 273
Two way RM ANOVA
  3D category scatter plot 406
  3D scatter of residuals 406
  alpha value 388
  arranging data 380
  Bonferroni t-test 394
  changing test options 385
  creating a graph 407
  data format 65
  defined 804
  Duncan’s Multiple Range test 395
  Dunnett’s test 395
  empty cells 381, 382
  enabling multiple comparisons 389
  Fisher’s LSD test 394
  histogram of residuals 406
  Holm-Sidak test 393
  interpreting results 397
  methods of multiple comparison 393
  missing data 381, 382
  missing data procedure 391
  missing factor data 384
  multiple comparison graphs 407
  multiple comparison results 403
  normality and equal variance assumptions 386
  performing a multiple comparison 395
  picking columns to test 390
  power 388
  probability plot 406
  results 397
  results for missing data 398
  running 390
  setting options 385
  statistics test results 403
  Student-Newman-Keuls test 394
  Tukey test 394
  viewing graph 406
  when to use 90, 119, 121, 330, 355, 379
Two way RM ANOVA options
  confidence interval 387
  multiple comparisons 392
  residual options 388
  summary table 387
Two way RM ANOVA results
  all pairwise comparison 404
  ANOVA table 400
  approximate degrees of freedom 402
  Bonferroni t-test 404
  comparison vs. a single control 404
  degrees of freedom 400
  dependent variable 399
  descriptive statistics options 387
  Duncan’s test 405
  Dunnett’s test 405
  equal variance test 399
  expected mean squares 403
  F statistic 401
  Fisher LSD test 405
  mean squares 401
  normality test 399
  P value 402
  statistics 403
  Student-Newman-Keuls test 405
  sum of squares 400
  Tukey test 405
Type I error
  alpha value 217
  three way ANOVA results 305
  two way ANOVA results 276
Type II error
  alpha value 217
Types
  axis line 185
  pasted/inserted objects 198

U

Unbalanced data 64
  in group comparison tests 205
  procedures for three way ANOVA 294
  procedures for two way ANOVA 265
  procedures for two way RM ANOVA 391
  three way ANOVA 285
  two way ANOVA 255
Underlining
  report text 139
Uniform random numbers
  defined 804
  generating 781
Un-indexing
  data 75
Units
  setting for report ruler 138
Units, measurement
  graph page 180
Unpaired t-test
  calculating sample size 728
  power 714
Unpaired t-test
  about 206
  calculating power/sample size 133
  data format 212, 542
  equal variance test 215
  interpreting results 214
  results 214
  running 212
  setting options 208
  summary table 210
  viewing graph 217
  when to use 88, 113, 114, 206
Unpaired t-test options
  when to use 208
Unpaired t-test results
  bar chart 217
  confidence interval 216
  degrees of freedom 216
  difference of the mean 216
  explanations 214
  histogram of residuals 218
  normality test 214
  P value 216
  point plot 218
  probability plot 155, 218
  scatter plot 218
  summary table 215
  t statistic 216
Updating
  object links 197
User-defined transforms
  defined 804
  entering 78, 787
  using 746
Using
  Yates correction 437

V

Values
  alpha 133
  difference from 2 641
  F-to-enter 581, 597
  F-to-remove 581, 598
  maximum 56
  minimum 56
  minimum positive 56
  missing 56, 785
  predicted 475, 502, 535, 559, 574, 585, 604, 642
  skewness 56
Variables
  centering 755, 758
  combinations 533
  data format 468
  dependent 273, 302, 399
  determining relationship between 623
  dichotomous 527
  dummy 528, 746, 765
  independent 600
  interaction 762
  lagged 746, 775
  measuring strength 93
  measuring strength of association 623, 631
  nonlinear regression results 652, 655
  not in regression equation 601
  predicting 84, 92, 123
  quantifying strength of association 125
  ranking 760
  selecting independent 95
  specifying independent 94
  standardizing 758
Variance inflation factor
  best subsets regression options 616
  best subsets regression results 623
  defined 510, 536
  multiple linear regression 508
  multiple linear regression results 515
  multiple logistic regression results 535, 549
  nonlinear regression results 655
  stepwise regression options 591
  see also multicollinearity
Variances
  stabilizing 749
View menu
  toolbars 9
Viewing
  ANOVA on ranks graph 326
  column statistics 54
  descriptive statistics 84
  formatting toolbar 9
  inserted objects as icons 194
  nonlinear regression graph 662
  object links 196
  one way ANOVA graph 250
  one way RM ANOVA graph 376
  paired t-test graph 343
  pasted objects as icons 193
  rank sum graph 228
  report ruler 137, 138
  reports 137
  results 5
  RM ANOVA on ranks graph 425
  signed rank graph 353
  three way ANOVA graph 308
  two way ANOVA graph 279
  two way RM ANOVA graph 406
  unpaired t-test graph 217
viewing
  notebook files 23, 24
  notebook items 23, 24
Viewing graph
  linear regression 493
  multiple linear regression 523
  Pearson Product Moment Correlation 629
  polynomial regression 574
  Spearman Rank Order Correlation 635
  stepwise regression 607
VIF
  see variance inflation factor

W

W statistic
  P value 353
Wald statistic
  multiple logistic regression 534
  multiple logistic regression results 548
  when to use 535
  z value 534
Wilcoxon signed rank test
  see signed rank test
Windows
  worksheet 28
WKS files
  defined 795
Word files
  opening 144
Word processor
  editing reports 141
Worksheet
  arranging data in contingency tables 69, 431
  changing appearance 48
  column and row titles 37
  column width and row height 48
  displaying data as engineering notation 42, 43
  displaying data as fixed decimal 43
  entering data 27
  importing data 60
  inserting columns and rows 35
  insertion mode 37
  missing cells 370
  modifying appearance 80
  opening from notebook files 59
  opening non-notebook files 59
  overwrite mode 37
  plotting many Y columns 165, 166
  printing 80
  selecting all data 33
  selecting data 32
  selecting entire 33
  setting date and time display 44
  setting decimal places 43
  setting Options 10
  shortcut popup menu 36
  sorting data 52
  transposing rows and columns 63
worksheet
  moving between notebooks 23, 25
  naming 23
  opening 24
Worksheet data
  saving to notebook files 57
Worksheets
  creating graphs from 162
  data brushing 49
  Excel 29
  exporting 58
  moving around 31
  multiple 28
  multiple Excel 30
  opening 59
  saving as non-notebook files 58
  setting Day Zero 46
  worksheet windows 28
worksheets
  creating within Notebook Manager 22
WPF files
  opening 144

Y

Yates correction
  chi-square results 450
  chi-square test 446
  defined 430
  McNemar’s test 460
  setting 437, 446, 460
  when to use 437, 446, 460
Yates correction factor
  defined 804

Z

z statistic
  P value 441
  z-test results 441
Zooming in/out
  defined 804
z-test
  power 719
  sample size 733
  when to use 430
z-test
  alpha value 437
  arranging data 435
  calculating power/sample size 133
  confidence interval 437
  data format 431, 435
  defined 804
  interpreting results 439
  picking columns to test 438
  running 438
  setting options 435
  when to use 122, 434
  Yates correction 430, 437
  see also compare proportions
z-test results
  alpha 441
  confidence interval for difference of proportions 441
  difference of proportions 439
  power 441
  standard error of the difference 440
  statistics 439