For the present, let’s identify where the various distribution related functions are located.
The menu of all available functions in SPSS is in the box labeled “Function group:”;
selecting one of the function groups provides a sub-menu in the Functions and Special
Variables box.
Selecting CDF & Noncentral CDF gives access to a list of Cumulative
Distribution Functions.
Selecting PDF & Noncentral PDF gives access to a list of Probability
Density Functions.
Selecting Statistical provides access to some of the more routine functions used in
statistical computations.
When using any of the above identified functions (or any others, for that matter), a
Target Variable has to be defined or listed, and then a Numeric Expression (a desired
computational formula), which may or may not involve built-in functions, will need to be
entered. This formula also may or may not involve other variables listed in the data file.
Once the computational formula is entered and the OK button is clicked, SPSS will
compute values for every row that has at least one data value in it.
Go to the Data View of the data file, enter the data values 5, 10, 15 and 20 for the
variable x. The following discussion addresses computations of cumulative probabilities
for binomial distributions.
Open the Compute Variable window, identify “Binomial” as the Target Variable, then
select CDF & Noncentral CDF in the Function group box and highlight Cdf.Binom.
To place this function in the Numeric Expression box, click on the upward-arrow button
located right next to the Function group box.
Whenever a function or special variable is highlighted in the Functions and Special
Variables box, a description will appear in the central space reserved for instructions
and/or descriptions right below the “calculator” key-pad.
Prepared by: Chris Hay-Jahans, UAS Mathematics Program Page 5 of 28
SPSS Basics for Probability Distributions
In this case, note that the Numeric Expression “CDF.BINOM(?,?,?)” requires a “quant”,
an “n”, and a “prob”. This function computes a probability of the form
P(X ≤ x) = CDF.BINOM(x, n, p)
for a binomial distribution with probability of success p and number of trials n.
Put x in place of the first “?” (SPSS takes the values of x from the data file), and set
n = 23 and p = 0.37.
Now click on OK. A small window will pop up asking whether the existing values of the
variable should be changed; select OK.
The output file will be updated with a “log” of the computation performed, and the data
file will now contain the cumulative probabilities
P(X ≤ 5), P(X ≤ 10), P(X ≤ 15) and P(X ≤ 20)
for the binomial distribution with parameters n = 23 and p = 0.37.
Adjust the decimal places for the variable “Binomial” to four places, then copy the values
from “Binomial” to the variable “p” and reset the decimal places for “p” as well.
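As a cross-check outside SPSS, the same cumulative probabilities can be sketched in Python. This is only an illustration, not part of the SPSS procedure: the function name binom_cdf is a hypothetical helper, and the sum is written out directly from the binomial formula rather than taken from any SPSS routine.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n, p = 23, 0.37        # parameters used in the text
xs = [5, 10, 15, 20]   # data values entered for the variable x

# Rounded to four decimal places, as in the "Binomial" column
binomial = [round(binom_cdf(x, n, p), 4) for x in xs]

for x, prob in zip(xs, binomial):
    print(f"P(X <= {x}) = {prob:.4f}")
```

The first value should agree with the 0.0938 quoted for x = 5 in the discussion that follows.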
Now, suppose the reverse computation is desired, i.e., find the x for which the cumulative
probability is 0.0938, and so on. The answers are already given in the variable x,
but consider a process for finding these values.
It appears that SPSS does not have inverse functions associated with cumulative
probabilities for discrete probability distributions. This provides an opportunity to
illustrate the use of SPSS’ Command Syntax.
A Command Syntax Illustration: Begin by opening a “Syntax” file (as opposed to a data
or output file).
In this new file, type in the command syntax shown below and save the file as
“InvBinom.sps”. Note that all text that follows a “/*” is a comment for user reference;
comments tell users of the “program” what a particular piece of code is for.
Open the data file and notice the new values placed under “InvBinom”.
The values entered are exactly those under the variable “x”.
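The idea behind such an inverse computation can be sketched in Python. This is not the SPSS command syntax from the handout (that listing is not reproduced here); it is a minimal illustration that assumes a simple search: step through x = 0, 1, …, n and return the first x whose cumulative probability reaches the target. The names binom_cdf and inv_binom and the tolerance tol are hypothetical.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def inv_binom(prob, n, p, tol=1e-12):
    """Smallest x in 0..n with P(X <= x) >= prob (up to a small tolerance)."""
    for x in range(n + 1):
        if binom_cdf(x, n, p) >= prob - tol:
            return x
    return n

n, p = 23, 0.37
exact_probs = [binom_cdf(x, n, p) for x in (5, 10, 15, 20)]  # unrounded values
recovered = [inv_binom(q, n, p) for q in exact_probs]
print(recovered)
```

The search is fed unrounded probabilities on purpose. For these parameters the cumulative probabilities for x = 18, 19 and 20 all round to 1.0000 at four decimal places, so a search on rounded values could stop before x = 20.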
Comments about the above “Program” and Command Syntax: In general, when
dealing with a binomial distribution it is the probability of success that is of most interest,
this being the quantity which might need estimating. The above “program” will work
only if you are dealing with a theoretically exact set of probabilities. Its introduction here
is for illustrative purposes only. Advanced users of SPSS are the ones who will get the
most use out of the Command Syntax language; most users will never need it. Note that
making effective use of the Command Syntax feature requires familiarity with computer
programming and a reasonably high level of comfort with mathematics.
To compute probabilities of the form P(X = x) for a binomial distribution using SPSS’
built-in function, open the Function group: PDF & Noncentral PDF and, in
Functions & Special Variables:, call up the function Pdf.Binom.
Thus, for a binomial distribution with parameters n and p (n trials and probability of
success p),
P(X = x) = Pdf.Binom(x, n, p).
Use this function in the same manner in which cumulative probabilities were computed
earlier. Be sure to assign a target variable.
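For comparison, the point probabilities P(X = x) can be sketched in Python directly from the binomial formula. The helper name binom_pmf is hypothetical; n = 23 and p = 0.37 are the parameters used earlier in the text.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 23, 0.37
pmf_values = {x: binom_pmf(x, n, p) for x in (5, 10, 15, 20)}

# Sanity check: the probabilities over x = 0, 1, ..., n should add up to 1
total = sum(binom_pmf(k, n, p) for k in range(n + 1))

for x, prob in pmf_values.items():
    print(f"P(X = {x}) = {prob:.4f}")
```

Summing these point probabilities from 0 up to x gives the cumulative probability P(X ≤ x) computed earlier.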
Note that to compute a cumulative probability of the form P(a < X < b) using SPSS, one
would use the numeric expression
Cdf.Normal(b, μ, σ) - Cdf.Normal(a, μ, σ).
Increase the number of decimal places, if desired, for the variable “Normal” and then
copy and paste the newly computed values into the variable “p”.
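The difference-of-CDFs expression can be illustrated with Python's standard library. This is a sketch under assumptions: the mean 6.25 and standard deviation 3.5 are the values the text uses for its normal example, while the endpoints a and b are hypothetical and chosen here to lie one standard deviation either side of the mean.

```python
from statistics import NormalDist

mu, sigma = 6.25, 3.5            # mean and standard deviation from the text
X = NormalDist(mu=mu, sigma=sigma)

a, b = mu - sigma, mu + sigma    # hypothetical endpoints (2.75 and 9.75)

# P(a < X < b) = F(b) - F(a), where F is the cumulative distribution function
prob = X.cdf(b) - X.cdf(a)
print(f"P({a} < X < {b}) = {prob:.4f}")
```

Because the normal distribution is continuous, P(a < X < b) and P(a ≤ X ≤ b) are equal, so the same expression serves for both.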
Now, consider finding values of x for which P(-∞ < X < x) = p. Think of this task as
solving this equation for x. SPSS does this through the function Idf.Normal. Keep
the same mean μ = 6.25 and standard deviation σ = 3.5, and set up the Compute
Variable window with Idf.Normal in the Numeric Expression box to get the required
values of x.
Unlike in the binomial case, the function Pdf.Normal(x, μ, σ) does
not return the value of P(X = x). This function computes the probability density of the
normal distribution, with specified mean μ and standard deviation σ, at x (see the
general discussion on normal distributions in the text).
This function can be used to obtain the graph of the normal distribution curve (with
specified mean μ and standard deviation σ); however, it will not play much of
a direct role in this course.
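The inverse computation and the density can both be illustrated in Python. This is a sketch only: the probabilities in probs are hypothetical, and NormalDist.inv_cdf is used as a stand-in for what Idf.Normal does in SPSS.

```python
from statistics import NormalDist

X = NormalDist(mu=6.25, sigma=3.5)   # mean and standard deviation from the text

probs = [0.05, 0.25, 0.50, 0.95]     # hypothetical cumulative probabilities

# Solve P(X < x) = p for x, one probability at a time
xs = [X.inv_cdf(p) for p in probs]
for p, x in zip(probs, xs):
    print(f"p = {p:.2f}  ->  x = {x:.4f}")

# The density at a point is a curve height, not a probability
density_at_mean = X.pdf(6.25)
print(f"density at the mean = {density_at_mean:.4f}")
```

Applying the cumulative distribution function to each computed x should return the probability that was started with, which is a convenient check on the inverse.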
Note: Some poor notation has crept in, my apologies. In the above data file the letter t
represents a probability. In practice the letter t is reserved for the “t-value” of a
t-distribution.
The probability density function of t-distributions will not play much of a direct role in
this course.
Note: Once again, be aware of the poor notation. Here, the variable “Chi” represents
probabilities and the variable “InvChi” represents “χ²-values” of the χ²-distribution.
Again, the probability density function of χ2-distributions will not play much of a direct
role in this course.
Yet again, the probability density function of F-distributions will not play much of a
direct role in this course.
In the Sort Cases window, identify the variable to be sorted and the Sort Order.
Name this new variable “i” and, for this variable, enter the values 1, 2, …, 12.
Now obtain the plotting position using Blom’s approximation. This uses the formula
p = (i − 0.375) / (n + 0.25),
where i is the rank of the data value and n is the number of data values (here n = 12).
Adjust the decimal places for the computed variable values to 4 places.
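Blom's formula is easy to tabulate by hand or in a few lines of Python. The sketch below assumes n = 12, matching the ranks 1, 2, …, 12 entered for the variable “i”; the list name positions is hypothetical.

```python
n = 12                      # number of data values (ranks 1..12 in the text)
ranks = range(1, n + 1)

# Blom's approximation: p = (i - 0.375) / (n + 0.25), shown to 4 decimal places
positions = [round((i - 0.375) / (n + 0.25), 4) for i in ranks]

for i, p in zip(ranks, positions):
    print(f"i = {i:2d}   p = {p:.4f}")
```

The positions are symmetric about 0.5: the values for ranks i and n + 1 − i add up to 1, apart from rounding.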
Though not necessary, now standardize the x-values using the mean and standard
deviation of the sample.
You will have to compute the mean and standard deviation for the data first. One way to
do this is to use the SPSS Descriptives routine.
It is useful to note that SPSS provides a means of obtaining standardized values – see
Save standardized values as variables in the Descriptives window above. The
standardized values are computed using the formula
z = (x − x̄) / s,
where x̄ is the mean of the data values and s is the standard deviation. You can limit
the amount of output by opening the Options window and selecting only that which is
desired – see below.
The standardized values are saved under the variable name “Zx”; these values will be
referred to as “Observed Values”.
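The standardization step can be sketched in Python as follows. The twelve data values are hypothetical (the handout's data set is not reproduced here), and the sample standard deviation with divisor n − 1 is assumed.

```python
from statistics import mean, stdev

# Hypothetical sample of 12 sorted data values, for illustration only
x = [2.1, 3.4, 4.0, 4.8, 5.5, 6.0, 6.3, 7.1, 7.9, 8.6, 9.4, 11.2]

x_bar = mean(x)    # sample mean
s = stdev(x)       # sample standard deviation (divisor n - 1)

# z = (x - x_bar) / s for every data value: the "Observed Values"
zx = [(value - x_bar) / s for value in x]

for value, z in zip(x, zx):
    print(f"x = {value:5.1f}   z = {z:7.4f}")
```

By construction, the standardized values have mean 0 and sample standard deviation 1.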
The next step in the process is to compute the “Expected Values”. Since it is the
normality of the data that is being examined, the distribution used to obtain these
expected values is a normal distribution. Furthermore, since the observed data values
have been standardized it makes sense to use the Standard Normal distribution. Each
plotting position “p” provides the cumulative probability associated with the rank of the
corresponding observed data value. The expected value is then obtained by computing
Ze = Idf.Normal(p, 0, 1).
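A minimal Python sketch of this step, assuming n = 12 and Blom plotting positions as above. NormalDist(0, 1).inv_cdf plays the role of Idf.Normal(p, 0, 1), and the list names are hypothetical.

```python
from statistics import NormalDist

n = 12
positions = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]

# Expected values: standard normal quantiles at the plotting positions
std_normal = NormalDist(mu=0, sigma=1)
ze = [std_normal.inv_cdf(p) for p in positions]

for p, z in zip(positions, ze):
    print(f"p = {p:.4f}   Ze = {z:7.4f}")
```

Because the plotting positions are symmetric about 0.5, the expected values come out symmetric about 0.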
Open the Variable View of the data file and Label “Zx” as “Observed Values” and “Ze”
as “Expected Values”.
All that is needed to obtain the Q-Q normal probability plot has now been obtained. Any
one of the (three) graphing features may now be used to obtain a scatter plot of the
observed values against the expected values. The “closeness” to normality is then
indicated by how “closely” the scatter plot approximates the line y = x.
Now, to obtain the plot, open the Legacy Dialogs and select Scatter/Dot. Choose Simple
Scatter and then assign “Ze” to the x-axis and “Zx” to the y-axis. You can add a Title
and then select OK. This produces the initial version of the Q-Q plot.
For ease of comparison, the reference line y = x can be included as follows. Click on the
button that produces the “Add a reference line from Equation” pop-up text box.
A reference line will appear on the chart area and the Properties window will show the
equation of the line in the Custom Equation box in the form
Y = a*x +b
For this Q-Q normal probability plot (using standardized values) make sure a = 1 and
b = 0.
Observe that the scatter plot follows the line y = x very closely, and all points are closely
clustered randomly (no systematic patterns) about the line. This suggests that it is
reasonable to assume that the underlying population of the random variable X is (at least
approximately) normally distributed. See class handout for a detailed analysis and
interpretation of Q-Q normal probability plots.
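The whole “from scratch” construction can be pulled together in one short Python sketch. It is illustrative only: the sample is hypothetical, the function name qq_points is made up for this example, and the largest vertical distance from the line y = x is just one rough numerical indicator of closeness. That measure is an assumption of this sketch, not something the handout prescribes.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Return (expected, observed) pairs for a normal Q-Q plot of data."""
    ordered = sorted(data)
    n = len(ordered)
    x_bar, s = mean(ordered), stdev(ordered)
    std_normal = NormalDist(mu=0, sigma=1)
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.375) / (n + 0.25)   # Blom plotting position
        ze = std_normal.inv_cdf(p)     # expected value
        zx = (value - x_bar) / s       # observed (standardized) value
        points.append((ze, zx))
    return points

# Hypothetical sample, for illustration only
sample = [2.1, 3.4, 4.0, 4.8, 5.5, 6.0, 6.3, 7.1, 7.9, 8.6, 9.4, 11.2]
points = qq_points(sample)

# Largest vertical distance of any point from the reference line y = x
max_gap = max(abs(zx - ze) for ze, zx in points)
print(f"largest distance from y = x: {max_gap:.4f}")
```

Plotting the pairs with “Ze” on the horizontal axis and “Zx” on the vertical axis gives the same kind of scatter plot as the one built step by step above.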
Obtaining a Q-Q Normal Probability Plot using the Built-in SPSS Feature
SPSS has a built-in feature for constructing Q-Q normal probability plots, which shortens
the process considerably.
In the Q-Q Plots window assign the original variable to Variables, check the indicated
boxes etc. and select OK.
The resulting Q-Q normal probability plot is similar to the one obtained “from scratch”,
with the exception that the “Expected Values” are placed on the vertical axis rather
than the horizontal axis, see below.
Open the Chart Editor window and begin by transposing the graph. This is done by
clicking on the button that yields a pop-up box containing the text “Transpose chart
coordinate system”.
Further edits can then be made to obtain a Q-Q normal probability plot that is very close
in appearance to the earlier obtained plot.