SPSS Basics for Probability Distributions
Built-in Statistical Functions in SPSS

Begin by defining some variables in the Variable View of a data file. Save this file as
“Probability_Distributions.sav” and save the corresponding output file as
“Probability_Distributions.spo”.
Accessing built-in statistical functions (and others) in SPSS is fairly straightforward
when selecting Transform and then Compute Variable; see below.
The Compute Variable window then opens.
Prepared by: Chris Hay-Jahans, UAS Mathematics Program Page 1 of 28

For the present, let’s identify where the various distribution-related functions are located.
The menu of all available function groups in SPSS is in the Function group box.
Selecting one of the function groups provides a sub-menu in the Functions and Special
Variables box.
By selecting CDF & Noncentral CDF, access can be gained to a list of cumulative
distribution functions.
By selecting PDF & Noncentral PDF, access can be gained to a list of probability
density functions.
By selecting Inverse DF, access can be gained to inverse functions for cumulative
probabilities.
Selecting Random Numbers provides access to a list of random number generators
for specific probability distributions.
Selecting Significance provides access to functions which may be used in computing a
significance (commonly referred to as a p-value) corresponding to the F- and Chi-Square
distributions.
Selecting Statistical provides access to some of the more routine functions used in
statistical computations.
Finally, selecting Arithmetic provides access to a list of commonly used arithmetic,
algebraic and transcendental functions which might be needed.
When using any of the above functions (or any others, for that matter), a Target Variable
has to be defined or listed, and then a Numeric Expression (the desired computational
formula), which may or may not involve built-in functions, needs to be entered. This
formula may also involve other variables listed in the data file. Once the formula is
entered and the OK button is clicked, SPSS computes values for every row that has at
least one data value in it.
Computing Probabilities and their Inverses

Consider the following simple examples.
Working with built-in functions for the Binomial Distribution
Go to the Data View of the data file, enter the data values 5, 10, 15 and 20 for the
variable x. The following discussion addresses computations of cumulative probabilities
for binomial distributions.
Open the Compute Variable window, identify “Binomial” as the Target Variable, then
select CDF & Noncentral CDF in the Function group box and highlight Cdf.Binom.
To place this function in the Numeric Expression box, click on the upward-arrow button
located right next to the Function group box.
Whenever a function or special variable is highlighted in the Functions and Special
Variables box, a description appears in the central space reserved for instructions
and descriptions, right below the “calculator” keypad.
In this case, note that the Numeric Expression “CDF.BINOM(?,?,?)” requires a “quant”,
an “n”, and a “prob”. This function computes a probability of the form
P(X ≤ x) = CDF.BINOM(x, n, p)
for a binomial distribution with probability of success p and number of trials n.
Enter x for the first “?” (SPSS takes the values of x from the data file), then set n = 23 and
p = 0.37, so that the Numeric Expression reads CDF.BINOM(x, 23, 0.37).

Now click on OK; a small window will pop up asking whether the existing values of the
variable should be changed. Select OK.
The output file will be updated with a “log” of the computation performed, and the data
file will now contain the cumulative probabilities
P(X ≤ 5), P(X ≤ 10), P(X ≤ 15) and P(X ≤ 20)
for the binomial distribution with parameters n = 23 and p = 0.37.
Adjust the decimal places for the variable “Binomial” to four places, then copy the values
from “Binomial” to the variable “p”, and reset the decimal places for “p” as well.
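As an outside check on these numbers, the quantity CDF.BINOM(x, n, p) returns is simply the sum of the binomial probabilities from 0 up to x. The following is a minimal Python sketch using only the standard library; the function name binom_cdf is our own and is not part of SPSS, and the sketch is meant for checking values, not as a description of how SPSS computes them.

```python
from math import comb

def binom_cdf(x, n, p):
    """Return P(X <= x) for X ~ Binomial(n, p), the quantity CDF.BINOM(x, n, p) gives."""
    if x < 0:
        return 0.0
    x = min(int(x), n)  # non-integer x is truncated; x above n gives probability 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

# The four data values with n = 23 and p = 0.37, as in the Compute Variable example
for x in (5, 10, 15, 20):
    print(x, round(binom_cdf(x, 23, 0.37), 4))
```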
Now suppose the reverse computation is desired, i.e., find the x for which the cumulative
probability is 0.0938, and so on. The answers are already given in the variable x, but
consider a process for finding these values.
It appears that SPSS does not have inverse functions for cumulative probabilities of
discrete probability distributions. This provides an opportunity to illustrate the use of
SPSS’s Command Syntax.
A Command Syntax Illustration: Begin by opening a “Syntax” file (as opposed to a data
or output file).

In this new file, type in the command syntax shown below and save the file as
“InvBinom.sps”. Note that all text following a “/*” is a comment for user reference;
comments tell users of the “program” what a particular piece of code is for.
Now, to run the program, select Run and then All.
Open the data file and notice the new values placed under “InvBinom”.
The values entered are exactly those under the variable “x”.
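The InvBinom.sps syntax itself is not reproduced in this copy. As a hedged illustration of the same idea, the Python sketch below searches x = 0, 1, …, n for the smallest value whose cumulative probability reaches a given probability. The helper names binom_cdf and inv_binom, and the tolerance tol, are hypothetical and are not taken from the original program.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), with x an integer between 0 and n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def inv_binom(prob, n, p, tol=1e-9):
    """Hypothetical helper: smallest x in 0..n with P(X <= x) >= prob - tol."""
    for x in range(n + 1):
        if binom_cdf(x, n, p) >= prob - tol:
            return x
    return n

# Round trip: feed exact cumulative probabilities back in
for x in (5, 10, 15, 20):
    print(x, inv_binom(binom_cdf(x, 23, 0.37), 23, 0.37))
```

Because the search compares against the target probability, values rounded to four decimal places (such as 0.0938) can land on the neighbouring x unless tol is widened, which is one way to see why such a program works only with (theoretically) exact probabilities.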
Comments about the above “Program” and Command Syntax: In general, when
dealing with a binomial distribution, it is the probability of success that is of most
interest, this being the quantity which might need estimating. The above “program” will
work only if you are dealing with a theoretically exact set of probabilities; its introduction
here is for illustrative purposes only. Advanced users of SPSS are the ones who will get
the most use out of the Command Syntax language; most users will never need it. Note
that making effective use of the Command Syntax feature requires familiarity with
computer programming and a reasonably high level of comfort with mathematics.
In computing probabilities of the form P(X = x) for a binomial distribution using SPSS’s
built-in function, open the Function group PDF & Noncentral PDF and, in
Functions and Special Variables, call up the function Pdf.Binom.
Thus, for a binomial distribution with parameters n and p (n trials and probability of
success p),
P(X = x) = Pdf.Binom(x, n, p).
Use this function in the same manner as the cumulative probabilities were computed
earlier. Be sure to assign a target variable.
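Likewise, the quantity Pdf.Binom(x, n, p) returns is the single binomial probability C(n, x) p^x (1 − p)^(n − x). A short Python sketch (binom_pdf is a hypothetical name, not an SPSS function) confirms that these probabilities sum to 1 over x = 0, 1, …, n for the handout’s n = 23 and p = 0.37.

```python
from math import comb

def binom_pdf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p); zero outside 0..n and for non-integer x."""
    if x < 0 or x > n or x != int(x):
        return 0.0
    x = int(x)
    return comb(n, x) * p**x * (1 - p) ** (n - x)

probs = [binom_pdf(x, 23, 0.37) for x in range(24)]
print(round(sum(probs), 10))  # prints 1.0
```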
Working with built-in functions for the Normal Distribution
As with binomial distributions, the main characteristics of a normal distribution are
determined by two parameters. For the normal distribution (for the built-in functions in
SPSS at least), the two parameters are the mean μ and the standard deviation σ
(occasionally the variance is used instead of the standard deviation).

Thus, to compute cumulative probabilities corresponding to the given values of x in the
data file, the population mean and standard deviation of the random variable X are needed.
Remember that the normal distribution is a theoretical distribution; thus, at best, one may
estimate the true mean and standard deviation.
Suppose the data given earlier come from a normal distribution with mean μ = 6.25 and
standard deviation σ = 3.5.
As with the binomial distribution, the cumulative distribution function for a
normal distribution with mean μ and standard deviation σ is accessed by opening the
Compute Variable window and then selecting Cdf.Normal. The Numeric Expression
entered below, Cdf.Normal(x, 6.25, 3.5), computes the probability P(−∞ < X < x), where
x represents the data value being used in the function expression.
Being sure to identify a target variable, click on OK to get the results shown below.

Note that to compute a cumulative probability of the form P(a < X < b) using SPSS, one
would use the numeric expression
Cdf.Normal(b, μ, σ) - Cdf.Normal(a, μ, σ).
Increase the number of decimal places, if desired, for the variable “Normal” and then
copy and paste the newly computed values into the variable “p”.
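To check the Cdf.Normal values outside SPSS, Python’s standard-library statistics.NormalDist provides a cdf method for a normal distribution with a given mean and standard deviation. The sketch below assumes the handout’s μ = 6.25 and σ = 3.5 and the x-values 5, 10, 15 and 20; interval_prob is our own helper implementing Cdf.Normal(b, μ, σ) − Cdf.Normal(a, μ, σ).

```python
from statistics import NormalDist

X = NormalDist(mu=6.25, sigma=3.5)  # mean and standard deviation from the example

def interval_prob(a, b, dist):
    """P(a < X < b) = CDF(b) - CDF(a), mirroring Cdf.Normal(b, mu, sigma) - Cdf.Normal(a, mu, sigma)."""
    return dist.cdf(b) - dist.cdf(a)

for x in (5, 10, 15, 20):
    print(x, round(X.cdf(x), 4))  # plays the role of Cdf.Normal(x, 6.25, 3.5)

# Probability of falling within one standard deviation of the mean (about 0.6827)
print(round(interval_prob(6.25 - 3.5, 6.25 + 3.5, X), 4))
```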
Now consider finding values of x for which P(−∞ < X < x) = p. Think of this task as
solving this equation for x. SPSS does this through the function Idf.Normal. Keeping
the same values μ = 6.25 and σ = 3.5, set up the Compute Variable window with the
Numeric Expression Idf.Normal(p, 6.25, 3.5), as shown below.
The computed values then appear in the data file.
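The same NormalDist object offers inv_cdf, which solves P(X < x) = p for x and so plays the role of Idf.Normal. In this sketch the probabilities are computed from the x-values rather than copied from the handout’s variable “p”, so feeding them back through inv_cdf should recover 5, 10, 15 and 20 up to rounding.

```python
from statistics import NormalDist

X = NormalDist(mu=6.25, sigma=3.5)

# Cumulative probabilities for the data values (computed here, not copied from the handout)
p_values = [X.cdf(x) for x in (5, 10, 15, 20)]

# inv_cdf solves P(X < x) = p for x, the role Idf.Normal(p, 6.25, 3.5) plays in SPSS
recovered = [X.inv_cdf(p) for p in p_values]
print([round(x, 4) for x in recovered])
```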

Unlike the case of the binomial distribution, the function Pdf.Normal(x, μ, σ) does
not return the value of P(X = x), which is zero for every x because the normal
distribution is continuous. This function computes the probability density of the
normal distribution, with specified mean μ and standard deviation σ, at x; see the
general discussion on normal distributions in the text.
This function can be used to obtain the graph of the normal distribution curve (with
specified mean μ and standard deviation σ); however, it will not play much of
a direct role in this course.
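The distinction between a density and a probability can be seen numerically. In the sketch below, NormalDist.pdf plays the role of Pdf.Normal(x, 6.25, 3.5): its value at the mean equals 1/(σ√(2π)), while an actual probability, such as that of a narrow interval around the mean, is approximately the density times the interval width.

```python
from math import pi, sqrt
from statistics import NormalDist

X = NormalDist(mu=6.25, sigma=3.5)

# Density height at the mean, analogous to Pdf.Normal(6.25, 6.25, 3.5); not a probability
peak = X.pdf(6.25)
print(round(peak, 4), round(1 / (3.5 * sqrt(2 * pi)), 4))

# A probability is an area: for a narrow interval it is roughly density times width
h = 0.01
area = X.cdf(6.25 + h) - X.cdf(6.25 - h)
print(round(area, 6), round(peak * 2 * h, 6))
```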
Working with built-in functions for the t-Distribution
The parameter needed to determine a t-distribution is the degrees of freedom, df = n – 1.

Computing probabilities using a t-distribution follows the same steps as for the binomial
and normal distributions. The cumulative distribution function for a t-distribution
with degrees of freedom df = n − 1 is accessed by opening the Compute Variable
window and then selecting Cdf.T. The Numeric Expression entered returns the
probability P(−∞ < T < x).
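Python’s standard library has no t-distribution, so the sketch below is a stand-in for Cdf.T under stated assumptions: it integrates the t density numerically with Simpson’s rule and uses the symmetry of the distribution about 0. The name t_cdf and the number of integration steps are our own choices; the results are accurate enough for checking table values, but the code is not meant to mirror SPSS’s internal algorithm.

```python
from math import exp, lgamma, pi, sqrt

def t_cdf(x, df, steps=2000):
    """P(T < x) for Student's t: Simpson's rule on [0, |x|] plus symmetry about 0."""
    # Normalizing constant of the t density, computed with lgamma to avoid overflow
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    b = abs(x)
    h = b / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Example with df = 9 (a sample of n = 10), values in the role of Cdf.T(x, 9)
for x in (-2.0, 0.0, 1.0, 2.262):
    print(x, round(t_cdf(x, 9), 4))
```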

The computed values show up in the data file, as shown below.
As in previous cases, inverses of probabilities can be computed, here using Idf.T,
to get the values shown below.
Note: Some poor notation has crept in, my apologies – in the above data file the letter t
represents a probability. In practice the letter t is reserved for the “t-value” of a t-
distribution.

The probability density function of t-distributions will not play much of a direct role in
this course.
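An Idf.T-style inverse can be sketched by bisection: widen an interval until it brackets the desired probability, then halve it repeatedly. The block includes its own numerical t_cdf (Simpson’s rule plus symmetry) so that it runs on its own; t_inv is a hypothetical helper name, and df = 9 in the example is an arbitrary choice.

```python
from math import exp, lgamma, pi, sqrt

def t_cdf(x, df, steps=2000):
    """P(T < x) for Student's t: Simpson's rule on [0, |x|] plus symmetry about 0."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    b = abs(x)
    h = b / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(b)
    area += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_inv(prob, df, iters=60):
    """Solve P(T < x) = prob for x by bisection (hypothetical stand-in for Idf.T)."""
    if not 0 < prob < 1:
        raise ValueError("prob must lie strictly between 0 and 1")
    lo, hi = -1.0, 1.0
    while t_cdf(lo, df) > prob:  # widen the bracket to the left
        lo *= 2
    while t_cdf(hi, df) < prob:  # widen the bracket to the right
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# df = 9 is an arbitrary example; compare with the tabled value 2.262
print(round(t_inv(0.975, 9), 3))
```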
Working with built-in functions for the χ²-Distribution
The parameter needed to determine a (Chi-square) χ²-distribution is again the degrees of
freedom, df = n − 1. Computing probabilities follows the same steps as before. The
cumulative distribution function for a χ²-distribution with degrees of freedom
df = n − 1 is accessed by opening the Compute Variable window and then selecting
Cdf.Chisq. The Numeric Expression entered returns the probability P(−∞ < χ² < x).
The computed values appear in the data file, as shown below.
Note: Once again, be aware of the poor notation. Here, the variable “Chi” represents
probabilities and the variable “InvChi” represents “χ²-values” of the χ²-distribution.
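Outside SPSS, the chi-square cumulative probability can be computed from the regularized lower incomplete gamma function, since P(χ² < x) = P(df/2, x/2). The Python sketch below sums the standard power series for that function; chisq_cdf is our own name, and the approach is intended for moderate values of x such as those met in this course, not as a replica of Cdf.Chisq.

```python
from math import exp, lgamma, log

def chisq_cdf(x, df):
    """P(chi-square < x) with df degrees of freedom, via the regularized lower
    incomplete gamma function P(df/2, x/2) and its power series."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:  # stop once new terms no longer change the sum
        n += 1
        term *= z / (a + n)
        total += term
    return exp(a * log(z) - z - lgamma(a)) * total

# Cumulative probabilities for a few x-values with df = 1 and df = 5
for x in (1.0, 3.84, 11.07):
    print(x, round(chisq_cdf(x, 1), 4), round(chisq_cdf(x, 5), 4))
```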

To obtain inverses of probabilities (i.e., to find the “x-values”) involving a χ²-distribution,
the process is the same as before, this time using Idf.Chisq.
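The inverse can again be sketched by bisection. This self-contained Python block includes a series-based chisq_cdf and then searches for the x with P(χ² < x) = prob, which is the job Idf.Chisq does in SPSS; chisq_cdf and chisq_inv are hypothetical helper names.

```python
from math import exp, lgamma, log

def chisq_cdf(x, df):
    """P(chi-square < x): regularized lower incomplete gamma P(df/2, x/2) by power series."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return exp(a * log(z) - z - lgamma(a)) * total

def chisq_inv(prob, df, iters=100):
    """Solve P(chi-square < x) = prob for x by bisection."""
    if not 0 < prob < 1:
        raise ValueError("prob must lie strictly between 0 and 1")
    lo, hi = 0.0, max(1.0, float(df))
    while chisq_cdf(hi, df) < prob:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if chisq_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Compare with the tabled values 3.841 (df = 1) and 11.070 (df = 5)
print(round(chisq_inv(0.95, 1), 3), round(chisq_inv(0.95, 5), 3))
```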
Again, the probability density function of χ²-distributions will not play much of a direct
role in this course.
Working with built-in functions for the F-Distribution
First, rename the variable “F” as “FProb”.

When working with F-distributions, the parameters needed to determine a distribution
include two degrees of freedom, dfN (of the numerator) and dfD (of the denominator).
Computing probabilities follows the same steps as before; here suppose, for the sake of
example, that the two degrees of freedom are dfN = 3 and dfD = 5.
The cumulative distribution function for the desired F-distribution is accessed
by opening the Compute Variable window and then selecting Cdf.F. The Numeric
Expression entered returns the probability P(−∞ < F < x).
Similarly, the inverse of a probability involving an F-distribution is accessed by opening
the Compute Variable window and then selecting Idf.F.
The Compute Variable windows for each of these are shown on the next page, the first
being for computing probabilities.

Then for inverses of probabilities
The results of the above computations are

Yet again, the probability density function of F-distributions will not play much of a
direct role in this course.
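For F-distributions, P(F < x) equals the regularized incomplete beta function I_z(dfN/2, dfD/2) evaluated at z = dfN·x / (dfN·x + dfD). The sketch below evaluates that function with a standard continued-fraction expansion (computed by the modified Lentz method) and inverts it by bisection, standing in for Cdf.F and Idf.F. All function names are our own, and the example uses the handout’s dfN = 3 and dfD = 5.

```python
from math import exp, lgamma, log

def _betacf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # the odd-step factor decides convergence
            break
    return h

def beta_inc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1) / (a + b + 2):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(x, df_num, df_den):
    """P(F < x) for an F-distribution with (df_num, df_den) degrees of freedom."""
    if x <= 0:
        return 0.0
    return beta_inc(df_num / 2, df_den / 2, df_num * x / (df_num * x + df_den))

def f_inv(prob, df_num, df_den, iters=100):
    """Solve P(F < x) = prob for x by bisection."""
    if not 0 < prob < 1:
        raise ValueError("prob must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df_num, df_den) < prob:  # widen the bracket
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f_cdf(mid, df_num, df_den) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# dfN = 3 and dfD = 5 as in the example; compare with an F table
print(round(f_cdf(2.0, 3, 5), 4), round(f_inv(0.95, 3, 5), 3))
```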
Assessing the Normality of a Random Variable Graphically

A common method for determining whether the underlying population of a random
variable, say X, for a set of data is (approximately) normally distributed is to use what is
generically called a normal probability plot. Here, a particular type of normal probability
plot, called a Q-Q normal probability plot, is obtained for a given set of data, first from
scratch and then using a built-in routine available in SPSS.
The data used are shown below
Computational Procedures for Obtaining a Q-Q Normal Probability Plot

Begin by sorting the data in ascending order. To do this, select Data on the toolbar
and then click on Sort Cases, as shown below.

In the Sort Cases window, identify the variable to be sorted and the Sort Order
The result will be

Now insert a new variable before the variable “x”.
Name this new variable “i” and, for this variable, enter the values 1, 2, …, 12.
Now obtain the plotting positions using Blom’s approximation, which uses the formula
$$p = \frac{i - 0.375}{n + 0.25}$$
where i is the rank of the data value and n = 12 is the number of data values.

This is done using the Compute Variable feature.
Adjust the decimal places for the computed variable values to 4 places.
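Outside SPSS, the same plotting positions can be generated directly from the ranks. A minimal Python sketch, with blom_positions as our own function name:

```python
def blom_positions(n):
    """Blom plotting positions p_i = (i - 0.375) / (n + 0.25) for ranks i = 1..n."""
    return [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]

p = blom_positions(12)  # n = 12 as in the data set used here
print([round(v, 4) for v in p])
```

Because (i − 0.375) + (n + 1 − i − 0.375) = n + 0.25, the positions are symmetric about 0.5: the first and last sum to 1, as do the second and second-to-last, and so on.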
Though not necessary, now standardize the x-values using the mean and standard
deviation of the sample.

You will have to compute the mean and standard deviation for the data first. One way to
do this is to use the SPSS routine
Then
It is useful to note that SPSS can also produce standardized values directly; see
Save standardized values as variables in the Descriptives window above. The
standardized values are computed using the formula
$$z = \frac{x - \bar{x}}{s}$$
where $\bar{x}$ is the mean of the data values and s is the standard deviation. You can limit
the amount of output by opening the Options window and selecting only what is
desired; see below.

The standardized values are saved under the variable name “Zx”; these values will be
referred to as “Observed Values”.
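The standardization step can be sketched in Python with statistics.mean and statistics.stdev; stdev uses the n − 1 divisor, which matches the sample standard deviation that SPSS’s Descriptives reports. The twelve data values below are hypothetical placeholders, since the handout’s data are not reproduced in this copy.

```python
from statistics import mean, stdev

# Hypothetical sample of n = 12 values (placeholders, not the handout's data)
x = [4.1, 5.6, 3.8, 6.2, 5.0, 4.7, 5.9, 4.4, 5.3, 6.8, 3.5, 5.1]

x_bar = mean(x)
s = stdev(x)  # sample standard deviation with divisor n - 1
z = [(xi - x_bar) / s for xi in x]  # standardized "Observed Values"
print(round(x_bar, 4), round(s, 4))
print([round(v, 4) for v in z])
```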
The next step in the process is to compute the “Expected Values”. Since it is the
normality of the data that is being examined, the distribution used to obtain these
expected values is a normal distribution. Furthermore, since the observed data values
have been standardized it makes sense to use the Standard Normal distribution. Each
plotting position “p” provides the cumulative probability associated with the rank of the
corresponding observed data value. The expected value is then obtained by computing
Ze = Idf.Normal(p, 0, 1).

Use the Compute Variable feature to obtain this.
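The expected values can be sketched the same way with statistics.NormalDist().inv_cdf, the standard-library counterpart of Idf.Normal(p, 0, 1). The Blom plotting positions for n = 12 are recomputed inside the block so that it runs on its own.

```python
from statistics import NormalDist

n = 12
p = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]  # Blom plotting positions
std_normal = NormalDist(0, 1)
ze = [std_normal.inv_cdf(prob) for prob in p]  # same role as Idf.Normal(p, 0, 1)
print([round(v, 4) for v in ze])
```

Since the plotting positions are symmetric about 0.5, the expected values come out symmetric about 0.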
Open the Variable View of the data file and label “Zx” as “Observed Values” and “Ze”
as “Expected Values”.
All that is needed to obtain the Q-Q normal probability plot has now been obtained. Any
one of the (three) graphing features may now be used to obtain a scatter plot of the
observed values against the expected values. The “closeness” to normality is then
indicated by how “closely” the scatter plot approximates the line y = x.
The two output columns are shown below.

Now, to obtain the plot, select Graphs, then Legacy Dialogs, and then Scatter/Dot. Choose
Simple Scatter and assign “Ze” to the x-axis and “Zx” to the y-axis. You can add a title
and then select OK. The initial appearance of the Q-Q plot is shown below.
Earlier editing methods can be used to obtain

For ease of comparison, the reference line y = x can be included as follows. In the Chart
Editor, click on the button whose pop-up text reads “Add a reference line from Equation”.
A reference line will appear in the chart area, and the Properties window will show the
equation of the line in the Custom Equation box in the form
Y = a*x + b
For this Q-Q normal probability plot (using standardized values), make sure a = 1 and
b = 0.

The end result is
Observe that the scatter plot follows the line y = x very closely, and the points are
clustered randomly (with no systematic patterns) about the line. This suggests that it is
reasonable to assume that the underlying population of the random variable X is (at least
approximately) normally distributed. See the class handout for a detailed analysis and
interpretation of Q-Q normal probability plots.
Obtaining a Q-Q Normal Probability Plot using the Built-in SPSS Feature
SPSS has a built-in feature for constructing Q-Q normal probability plots, which shortens
the process considerably.

Select the toolbar commands shown below
In the Q-Q Plots window, assign the original variable to Variables, check the indicated
boxes, and so on, then select OK.
The resulting Q-Q normal probability plot is similar to the one obtained “from scratch”,
except that the “Expected Values” are placed on the vertical axis rather than the
horizontal axis; see below.

Open the Chart Editor window and begin by transposing the graph. This is done by
clicking on the button whose pop-up text reads “Transpose chart coordinate system”.
Further edits can then be made to obtain a Q-Q normal probability plot that is very close
in appearance to the earlier obtained plot.
