The spreadsheet below shows how COUNTIF can be used to calculate how many times each
country appears in the list in Column A.
Note: Cells D2:F7 show the formulas used. Cells D11:F16 show the results.
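For readers who want to check the counts outside Excel, the following is a minimal Python sketch of the same tally. The country names are hypothetical stand-ins for the list in Column A, not the values from the spreadsheet.

```python
from collections import Counter

# Hypothetical stand-in for the country names in Column A.
countries = ["France", "Spain", "France", "Italy", "Spain", "France"]

# Counter tallies every distinct value in one pass, which gives the same
# numbers as writing one COUNTIF formula per country.
counts = Counter(countries)

for country in sorted(counts):
    print(country, counts[country])
```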
Basic Statistics with Microsoft Excel
The FREQUENCY function involves the use of array formulas that provide multiple values (in this
case the class frequencies) as output.
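The idea behind FREQUENCY can be sketched in Python as follows. The data values and class upper limits are hypothetical, and the helper name frequency is chosen for illustration only. The sketch assumes the usual FREQUENCY convention: each bin counts the values up to and including its upper limit, and one extra count is returned for values above the last limit.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values per class; bins holds sorted class upper limits."""
    counts = [0] * (len(bins) + 1)  # last slot: values above the final limit
    for value in data:
        # bisect_left finds the first limit that is >= value
        counts[bisect_left(bins, value)] += 1
    return counts

data = [12, 15, 20, 22, 27, 30, 31, 38]  # hypothetical observations
bins = [19, 29, 39]                      # hypothetical class upper limits
class_counts = frequency(data, bins)
print(class_counts)
```

Like the array formula, the function returns all class frequencies at once rather than one value per call.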
A PivotTable Report is a general tool for summarising the data for two or more variables
simultaneously. To create one:
1. Select the Data menu and choose PivotTable and PivotChart Report.
2. Choose Microsoft Excel list or database.
3. Choose PivotTable and select Next.
4. Enter the data range in the Range box and select Next.
5. Select New Worksheet (if required).
6. Click on the Layout button.
7. Drag the field buttons to the ROW, COLUMN and DATA sections of the diagram as
appropriate.
8. Double click the Sum of … field button in the data section.
9. Choose Count under Summarise by: and click OK.
10. Click OK and then Finish.
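The result of the steps above is a table of counts for each combination of the ROW and COLUMN fields. A minimal Python sketch of that cross-tabulation is shown here; the two fields (region and response) and their values are hypothetical.

```python
from collections import Counter

# Hypothetical records: (row field, column field) for each observation.
records = [
    ("North", "Yes"), ("North", "No"), ("South", "Yes"),
    ("North", "Yes"), ("South", "No"), ("South", "No"),
]

# Count how often each (row, column) combination occurs,
# which is what "Summarise by: Count" produces.
table = Counter(records)

rows = sorted({row for row, _ in records})
cols = sorted({col for _, col in records})

print("Region", *cols)
for row in rows:
    print(row, *(table[(row, col)] for col in cols))
```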
The following spreadsheet shows the functions used to calculate the mean, median, mode,
percentiles and quartiles for a cell range named hours.
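The same measures can be cross-checked with Python's statistics module. This is a sketch under stated assumptions: the values in hours are hypothetical, and method="inclusive" is used because it should correspond to the interpolation used by Excel's PERCENTILE and QUARTILE functions.

```python
from statistics import mean, median, mode, quantiles

# Hypothetical values standing in for the cell range named hours.
hours = [2, 4, 4, 5, 7, 9, 11]

average = mean(hours)
middle = median(hours)
most_common = mode(hours)

# Three cut points: first quartile, median, third quartile.
quartiles = quantiles(hours, n=4, method="inclusive")

# With n=100 the list holds the 1st to 99th percentiles; index 89 is the 90th.
p90 = quantiles(hours, n=100, method="inclusive")[89]

print(average, middle, most_common, quartiles, p90)
```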
The sum of the deviations about the mean will always equal 0.
To calculate the square of a value enter =A1^2.
The sample variance is the sum of the squared deviations (the squared differences between each
observation and the mean) divided by n-1. In the example above this will be 41320/4 = 10330.
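The calculation can be followed step by step in a short Python sketch. The five values below are hypothetical and are not the data from the example above; only the final line reuses the figures quoted in the text.

```python
from statistics import variance

values = [10, 20, 30, 40, 50]  # hypothetical sample
n = len(values)
sample_mean = sum(values) / n

# Deviations about the mean always sum to zero.
deviations = [v - sample_mean for v in values]
deviation_sum = sum(deviations)

# Square each deviation (the =A1^2 step), sum, and divide by n-1.
squared_deviations = [d ** 2 for d in deviations]
sample_variance = sum(squared_deviations) / (n - 1)
sample_std_dev = sample_variance ** 0.5

# The library function gives the same sample variance.
library_variance = variance(values)

# Figures quoted in the text: sum of squared deviations 41320, n-1 = 4.
text_example = 41320 / 4
```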
Sample Variance and Sample Standard Deviation
3. Choose Descriptive Statistics from the list of Analysis Tools. The Descriptive Statistics box
will open.
9. Click OK.
Covariance is a measure of linear association between two variables. Positive values indicate a
positive relationship; negative values indicate a negative relationship. The correlation coefficient
is another measure of linear association between two variables that takes on values between -1 and
+1. Values near +1 indicate a strong positive linear relationship, values near -1 indicate a strong
negative linear relationship, and values near 0 indicate the lack of a linear relationship.
The covariance function =COVAR() treats the data as a population, so its result must be adjusted
to provide the sample covariance. The correlation function =CORREL() needs no such adjustment,
because the correlation coefficient is the same whether the data are treated as a sample or as a
population. The formula for the population covariance requires dividing by the total number of
observations in the data set, but the formula for the sample covariance requires dividing by the
total number of observations minus 1. Therefore, to compute the sample covariance, multiply the
population covariance by n/(n-1).
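The adjustment can be checked numerically with a small Python sketch. The paired values in x and y are hypothetical; the population covariance here plays the role of the COVAR result.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sum of the products of the paired deviations about the means.
cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

population_cov = cross / n                     # divides by n, as COVAR does
sample_cov = population_cov * n / (n - 1)      # the n/(n-1) adjustment
direct_sample_cov = cross / (n - 1)            # same value computed directly

# Correlation coefficient: the divisor cancels, so no adjustment is needed.
ss_x = sum((a - mean_x) ** 2 for a in x)
ss_y = sum((b - mean_y) ** 2 for b in y)
correlation = cross / sqrt(ss_x * ss_y)
```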
PROBABILITY
The spreadsheet below shows the prior probabilities for two mutually exclusive events A1 and A2.
The SUMPRODUCT function multiplies each value in one range by the corresponding value in
another range and sums the products.
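What SUMPRODUCT computes can be sketched in Python as below. The helper name sumproduct and both sets of numbers are hypothetical. Applying it to prior and conditional probabilities to obtain a total probability is an assumption about how the function is used in this section.

```python
def sumproduct(first, second):
    """Multiply corresponding values and add up the products."""
    if len(first) != len(second):
        raise ValueError("ranges must have the same length")
    return sum(a * b for a, b in zip(first, second))

# Hypothetical prior probabilities for events A1 and A2, and the
# hypothetical conditional probabilities of an event B given each.
priors = [0.65, 0.35]
conditionals = [0.02, 0.05]

# Total probability of B: P(A1)P(B|A1) + P(A2)P(B|A2).
prob_b = sumproduct(priors, conditionals)
print(prob_b)
```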
Binomial Probabilities
Excel’s BINOMDIST function can be used to compute binomial probabilities and cumulative
binomial probabilities. The spreadsheet below shows how to calculate the probability of 0, 1, 2 and
3 successful outcomes given 3 trials if each trial has a 0.3 probability of success.
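The same probabilities can be reproduced from the binomial formula with a short Python sketch. The helper name binom_pmf is chosen for illustration; the values n = 3 and p = 0.3 are those given in the text.

```python
from itertools import accumulate
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 3, 0.3

# Probability of 0, 1, 2 and 3 successes (cumulative argument FALSE).
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

# Running totals give the cumulative probabilities (cumulative argument TRUE).
cdf = list(accumulate(pmf))

for k in range(n + 1):
    print(k, round(pmf[k], 3), round(cdf[k], 3))
```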
A Poisson probability distribution is a probability distribution showing the probability of x
occurrences of an event over a specified interval of time or space. The POISSON function requires
three arguments and has the following syntax: =POISSON(x, mean, cumulative).
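A minimal Python sketch of the calculation follows. The function mirrors the argument order of POISSON for readability but is only an illustration, and the mean of 2 in the usage lines is hypothetical.

```python
from math import exp, factorial

def poisson(x, mean, cumulative):
    """Poisson probability of exactly x (or at most x) occurrences."""
    def pmf(k):
        return exp(-mean) * mean ** k / factorial(k)

    if cumulative:
        # Probability of 0, 1, ..., x occurrences added together.
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

exactly_two = poisson(2, 2, False)   # P(x = 2) with a hypothetical mean of 2
at_most_two = poisson(2, 2, True)    # P(x <= 2)
print(exactly_two, at_most_two)
```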
Using HYPGEOMDIST to Compute Hypergeometric Probabilities
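As an illustration of the calculation behind HYPGEOMDIST, here is a minimal Python sketch. The parameter names follow the function's arguments (sample_s, number_sample, population_s, number_pop); the population of 10 items with 4 successes and the sample of 3 are hypothetical.

```python
from math import comb

def hypgeomdist(sample_s, number_sample, population_s, number_pop):
    """Probability of sample_s successes in a sample drawn without replacement."""
    ways_successes = comb(population_s, sample_s)
    ways_failures = comb(number_pop - population_s, number_sample - sample_s)
    return ways_successes * ways_failures / comb(number_pop, number_sample)

# Hypothetical case: 1 success in a sample of 3 drawn from
# a population of 10 that contains 4 successes.
prob_one = hypgeomdist(1, 3, 4, 10)
print(prob_one)
```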
Normal Probabilities
In a normal probability distribution the probability density function is bell shaped and
determined by its mean µ and standard deviation σ. A standard normal probability distribution
is a normal distribution with a mean of zero and a standard deviation of one.
Excel has two functions for computing probabilities and z values for a standard normal probability
distribution: NORMSDIST and NORMSINV. The NORMSDIST function is used to compute the
cumulative probability given a z value and its syntax is =NORMSDIST(z) where z is the value for
which you want the distribution. The NORMSINV function is used to compute the z value given a
cumulative probability and has the syntax =NORMSINV(probability) where probability is a value
between 0 and 1. The letter S reminds us that the functions relate to the standard normal probability
distribution.
The NORMSDIST function provides the area under the standard normal curve to the left of a given
z value. For nonnegative z values, the NORMSDIST function provides the same cumulative
probability we would obtain if we used a cumulative normal probabilities table. However, unlike a
table, the NORMSDIST function provides cumulative probabilities for negative z values as well.
To calculate the probability of z being in an interval, calculate the value of NORMSDIST at the
upper endpoint and subtract the value of NORMSDIST at the lower endpoint of the interval.
To calculate the area under the standard normal curve to the right of a given z value, subtract
the cumulative probability from 1.
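These three calculations can be sketched in Python with the standard library's NormalDist class. The wrapper name normsdist and the z values used are chosen for illustration only.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def normsdist(z):
    """Area under the standard normal curve to the left of z."""
    return standard_normal.cdf(z)

# Cumulative probability to the left of z = 1.
lower_tail = normsdist(1.0)

# Probability of z being in the interval from -1 to 1:
# value at the upper endpoint minus value at the lower endpoint.
interval = normsdist(1.0) - normsdist(-1.0)

# Area to the right of z = 1.96: one minus the cumulative probability.
upper_tail = 1 - normsdist(1.96)

print(lower_tail, interval, upper_tail)
```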
The NORMSINV function is the inverse of the NORMSDIST function; it takes a cumulative
probability (lower tail area) as input and provides the z value corresponding to that cumulative
probability. To work out the z value for an upper tail probability, subtract the probability from 1
and use the result as the input to NORMSINV.
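The inverse calculation can be sketched in Python as follows. The wrapper name normsinv and the tail probability of 0.025 are chosen for illustration only.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def normsinv(probability):
    """z value whose lower tail area equals the given probability."""
    return standard_normal.inv_cdf(probability)

# z value with a cumulative (lower tail) probability of 0.5.
z_median = normsinv(0.5)

# z value with an upper tail probability of 0.025:
# convert to a lower tail area first by subtracting from 1.
z_upper = normsinv(1 - 0.025)

print(z_median, z_upper)
```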
Two similar functions, NORMDIST and NORMINV, are available for computing the cumulative
probability and the x value for any normal distribution. The NORMDIST function provides the area
under the normal curve to the left of a given value of the random variable x. Its syntax is
=NORMDIST(x, mean, standard_dev, cumulative). If cumulative is TRUE it returns the
cumulative distribution function; if FALSE it returns the probability density function (height of
the curve).
The NORMINV function is the inverse of NORMDIST and takes a cumulative probability as input
and provides the value of x corresponding to that cumulative probability. Its syntax is
=NORMINV(probability, mean, standard_dev).
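A minimal Python sketch of both functions for a general normal distribution is given here. The mean of 100 and standard deviation of 15 are hypothetical, and the variable names are for illustration only.

```python
from statistics import NormalDist

# Hypothetical distribution: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Counterpart of NORMDIST with cumulative TRUE: area to the left of x = 115.
cumulative_prob = dist.cdf(115)

# Counterpart of NORMDIST with cumulative FALSE: height of the curve at x = 100.
curve_height = dist.pdf(100)

# Counterpart of NORMINV: x value for a given cumulative probability.
x_value = dist.inv_cdf(cumulative_prob)

print(cumulative_prob, curve_height, x_value)
```

Feeding the cumulative probability back into the inverse returns the original x value, which is a quick way to confirm that the two functions are inverses of each other.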
The EXPONDIST function can be used to compute exponential probabilities. Its syntax is
=EXPONDIST(x, lambda, cumulative) where x is the value of the random variable, lambda is 1/µ
(the reciprocal of the mean) and cumulative is set to TRUE to return the cumulative probability.
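The cumulative exponential probability can be sketched in Python as follows. The function mirrors the argument order of EXPONDIST for readability but is only an illustration; the mean of 15 and the value x = 6 are hypothetical.

```python
from math import exp

def expondist(x, lam, cumulative):
    """Exponential distribution with rate lam = 1 / mean."""
    if cumulative:
        # Probability that the random variable is at most x.
        return 1 - exp(-lam * x)
    # Height of the density curve at x.
    return lam * exp(-lam * x)

mean = 15          # hypothetical mean
lam = 1 / mean     # lambda is the reciprocal of the mean

prob_at_most_6 = expondist(6, lam, True)
print(prob_at_most_6)
```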