Basic Statistics with Microsoft Excel

Excel Functions for Statistics


DESCRIPTIVE STATISTICS: FREQUENCIES

Using COUNTIF to Construct a Frequency Distribution

The spreadsheet below shows how COUNTIF can be used to calculate how many times each
country appears in the list in Column A.

To calculate Frequency, select D2 and enter the function:

=COUNTIF(A$2:A$17, C2)

A$2:A$17 is the range to be evaluated. The absolute row reference ($) keeps the range fixed so
that the function fills down correctly.
C2 is the country to be counted.

To calculate Relative Frequency, divide Frequency (D2) by Total (D7): =D2/D$7.

To calculate Percent Frequency, multiply Relative Frequency (E2) by 100: =E2*100.

Note: Cells D2:F7 show the formulas used. Cells D11:F16 show the results.
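
As a cross-check outside Excel, the same frequency distribution can be sketched in Python. The
country list is made up for illustration and is not the data in the spreadsheet; Counter plays the
role of COUNTIF.

```python
from collections import Counter

# Hypothetical list of countries standing in for column A (not the handout's data).
countries = ["France", "Spain", "Italy", "France", "Spain", "France", "Italy", "France"]

counts = Counter(countries)      # frequency of each country, like COUNTIF per row
total = sum(counts.values())     # like the Total cell

for country, freq in sorted(counts.items()):
    relative = freq / total      # relative frequency
    percent = relative * 100     # percent frequency
    print(f"{country:<8} {freq:>3} {relative:.3f} {percent:.1f}%")
```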

Using FREQUENCY to Construct a Frequency Distribution

The FREQUENCY function involves the use of array formulas that provide multiple values (in this
case the class frequencies) as output.

1. Select cells D2:D5 where the frequencies are to appear.
2. Type the formula: =FREQUENCY(A2:A16,{9,14,19,24})
3. Press CTRL + SHIFT + ENTER and the array formula will be entered into each of the cells D2:D5.

Because we entered an array formula, the formula that Excel displays in each of the cells is the
same, but the values are not: they are the frequencies for each class. The class upper limits in the
second argument of the FREQUENCY function tell Excel which frequency to put in each cell within
the range of the array formula. FREQUENCY returns one more value than there are upper limits (the
count of values above the highest limit); extending the selection by one cell, to D6, displays it.
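
The way the class upper limits sort values into cells can be sketched in Python. This is a minimal
illustration, not Excel's implementation: the function name frequency and the data are hypothetical,
and each value is counted in the first class whose upper limit is greater than or equal to it.

```python
from bisect import bisect_left

def frequency(data, upper_limits):
    """Count values per class; the last slot counts values above the highest limit."""
    counts = [0] * (len(upper_limits) + 1)
    for value in data:
        # bisect_left returns the index of the first upper limit that is >= value
        counts[bisect_left(upper_limits, value)] += 1
    return counts

# Hypothetical data standing in for A2:A16 (15 values).
data = [3, 9, 10, 12, 14, 15, 17, 19, 20, 22, 24, 8, 13, 18, 21]
print(frequency(data, [9, 14, 19, 24]))  # prints [3, 4, 4, 4, 0]
```

The final 0 is the extra element: no value in this made-up data exceeds 24.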

Using PivotTable Report to Construct a Crosstabulation

PivotTable Report provides a general tool for summarising the data for two or more variables
simultaneously.

1. Select the Data menu and choose PivotTable and PivotChart Report.
2. Choose Microsoft Excel list or database.
3. Choose PivotTable and select Next.
4. Enter the data range in the Range box and select Next.
5. Select New Worksheet (if required).
6. Click on the Layout button.
7. Drag the field buttons to the ROW, COLUMN and DATA sections of the diagram as
appropriate.
8. Double click the Sum of … field button in the data section.
9. Choose Count under Summarise by: and click OK.
10. Click OK and then Finish.
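
The count that the PivotTable places in each cell of the crosstabulation can be sketched in Python.
The two variables and their values below are hypothetical; the point is only that each cell holds
the number of records with that row and column combination.

```python
from collections import Counter

# Hypothetical records: (row variable, column variable) for each observation.
records = [("Good", "Low"), ("Good", "High"), ("Excellent", "High"),
           ("Good", "Low"), ("Excellent", "Low"), ("Excellent", "High")]

cells = Counter(records)                 # count of each (row, column) pair
rows = sorted({r for r, _ in records})   # row labels
cols = sorted({c for _, c in records})   # column labels

print("".ljust(10) + "".join(c.rjust(6) for c in cols))
for r in rows:
    print(r.ljust(10) + "".join(str(cells[(r, c)]).rjust(6) for c in cols))
```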

DESCRIPTIVE STATISTICS: NUMERICAL METHODS

The following spreadsheet shows the functions used to calculate the mean, median, mode,
percentiles and quartiles for a cell range named hours.

Displaying the Mean, Median, and Mode

=AVERAGE(array), =MEDIAN(array), =MODE(array)

Percentiles and Quartiles

=PERCENTILE(array, percentile) where percentile is between 0 and 1
=QUARTILE(array, quart) where quart is 0 (minimum), 1, 2, 3 or 4 (maximum)

1st Quartile = 25th Percentile, 2nd Quartile = 50th Percentile (the median), 3rd Quartile = 75th Percentile
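
These measures can be cross-checked with Python's statistics module. The data below is a made-up
sample standing in for the range named hours. The "inclusive" method is chosen on the assumption
that it matches the interpolation PERCENTILE and QUARTILE use (between the minimum and maximum).

```python
import statistics

# Hypothetical sample standing in for the named range "hours".
hours = [2, 4, 4, 5, 7, 9, 10, 12]

mean = statistics.mean(hours)       # like =AVERAGE(hours)
median = statistics.median(hours)   # like =MEDIAN(hours)
mode = statistics.mode(hours)       # like =MODE(hours)

# Quartiles 1 to 3, interpolated between the smallest and largest values.
quartiles = statistics.quantiles(hours, n=4, method="inclusive")

print(mean, median, mode, quartiles)
```

The second quartile equals the median, as stated above.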

Deviation and Squared Deviation About the Mean

The sum of the deviations about the mean will always equal 0.
To calculate the square of a value enter =A1^2.
The sample variance will be the sum of the squared deviations (each deviation being the difference
between the value of an observation and the mean) divided by n-1. In the example above this will be
41320/4 = 10330.

Sample Variance and Sample Standard Deviation

To calculate the Variance use =VAR(range).
To calculate the Standard Deviation use =STDEV(range).
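
A short Python sketch shows how the deviations, squared deviations, sample variance and sample
standard deviation fit together. The five values are hypothetical and are not the data behind the
41320/4 example.

```python
import math
import statistics

values = [46, 54, 42, 46, 32]   # hypothetical observations, n = 5

mean = statistics.mean(values)
deviations = [x - mean for x in values]    # deviations about the mean
squared = [d ** 2 for d in deviations]     # squared deviations

sample_variance = sum(squared) / (len(values) - 1)   # divide by n-1, like =VAR(range)
sample_std_dev = math.sqrt(sample_variance)          # like =STDEV(range)

print(sum(deviations), sample_variance, sample_std_dev)
```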

Using the Descriptive Statistics Tool

The Descriptive Statistics Tool is one of Excel’s Data Analysis Tools (which are available from the
Analysis ToolPak Add-In) and allows the user to compute a variety of statistics at once.

1. Make sure that the Analysis ToolPak has already been installed. (If not, go to Tools/Add-Ins
and select Analysis ToolPak.)
2. Select Data Analysis from the Tools menu.
3. Choose Descriptive Statistics from the list of Analysis Tools. The Descriptive Statistics box will
open.
4. In the Input Range: box, enter the range for your data (B3:B18).
5. Select Columns in the Grouped By: section.
6. Select Labels in first row.
7. Enter where you want the statistics to appear in the Output Range: box (D3).
8. Select Summary statistics.
9. Click OK.

Covariance and the Correlation Coefficient

Covariance is a measure of linear association between two variables. Positive values indicate a
positive relationship; negative values indicate a negative relationship. The correlation coefficient
is another measure of linear association between two variables that takes on values between -1 and
+1. Values near +1 indicate a strong positive linear relationship, values near -1 indicate a strong
negative linear relationship, and values near 0 indicate the lack of a linear relationship.

The covariance function =COVAR() treats the data as a population and the correlation function
=CORREL() treats the data as a sample. The result obtained using the covariance function must be
adjusted to provide the sample covariance. The formula for the population covariance requires
dividing by the total number of observations in the data set (n), but the formula for the sample
covariance requires dividing by the total number of observations minus 1 (n-1). Therefore, to
compute the sample covariance, multiply the population covariance by n/(n-1). The correlation
coefficient needs no such adjustment, because the divisor cancels between the covariance and the
two standard deviations.
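
The n/(n-1) adjustment can be checked numerically. The following Python sketch uses two short,
made-up data series; the function name population_covariance is hypothetical and mimics what
COVAR is described as doing above.

```python
def population_covariance(xs, ys):
    """Average product of deviations, dividing by n (population formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
pop_cov = population_covariance(xs, ys)    # divides by n
sample_cov = pop_cov * n / (n - 1)         # adjusted to divide by n-1

print(pop_cov, sample_cov)
```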

PROBABILITY

Computing Posterior Probabilities

The spreadsheet below shows the prior probabilities for two mutually exclusive events A1 and A2.

Prior probability is the initial estimate of the probability of an event.
Conditional probability is the probability of an event given that another event has occurred.
Joint probability is the probability of two events both occurring (the intersection of two events);
in the case above, it is the prior probability multiplied by the conditional probability.
Posterior probability is the revised probability of an event based on additional information. It is
the joint probability divided by the sum of the joint probabilities.
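
The chain from prior to posterior probabilities can be sketched in Python. The prior and conditional
probabilities below are illustrative numbers only, not the values in the spreadsheet, and B is a
hypothetical name for the event that supplies the additional information.

```python
# Hypothetical inputs for two mutually exclusive events A1 and A2.
priors = {"A1": 0.65, "A2": 0.35}          # P(A1), P(A2)
conditionals = {"A1": 0.02, "A2": 0.05}    # P(B | A1), P(B | A2)

# Joint probability = prior x conditional.
joints = {event: priors[event] * conditionals[event] for event in priors}

# P(B) is the sum of the joint probabilities.
p_b = sum(joints.values())

# Posterior probability = joint / P(B).
posteriors = {event: joints[event] / p_b for event in joints}

print(joints, p_b, posteriors)
```

The posterior probabilities sum to 1, which is a useful check on the spreadsheet formulas.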

DISCRETE PROBABILITY DISTRIBUTIONS

Expected Value, Variance, and Standard Deviation

Expected value is a measure of the mean or central location of a random variable.
Variance is a measure of the variability or dispersion of a random variable.
Standard deviation is the positive square root of the variance.

The SUMPRODUCT function multiplies each value in one range by the corresponding value in
another range and sums the products.
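
A minimal Python sketch of the same calculation follows. The helper sumproduct is a hypothetical
stand-in for Excel's SUMPRODUCT, and the values and probabilities are made up for illustration.

```python
import math

def sumproduct(a, b):
    """Multiply corresponding entries of two lists and add up the products."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical discrete distribution: values of x and their probabilities f(x).
xs = [0, 1, 2, 3]
ps = [0.1, 0.3, 0.4, 0.2]

expected = sumproduct(xs, ps)                                # sum of x * f(x)
variance = sumproduct([(x - expected) ** 2 for x in xs], ps) # sum of (x - mean)^2 * f(x)
std_dev = math.sqrt(variance)

print(expected, variance, std_dev)
```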

Binomial Probabilities

A binomial experiment has the following four properties:

1. The experiment consists of a sequence of n identical trials.
2. Two outcomes are possible on each trial: a success and a failure.
3. The probability of a success, denoted by p, does not change from trial to trial. Consequently
the probability of a failure, denoted by 1-p, does not change from trial to trial.
4. The trials are independent.

Excel’s BINOMDIST function can be used to compute binomial probabilities and cumulative
binomial probabilities. The spreadsheet below shows how to calculate the probability of 0, 1, 2 and
3 successful outcomes given 3 trials if each trial has a 0.3 probability of success.
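
The probabilities for this example (3 trials, 0.3 probability of success) can be reproduced from
the binomial formula. The Python function names below are hypothetical; they correspond to
BINOMDIST with the cumulative argument set to FALSE and TRUE respectively.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

for k in range(4):
    print(k, round(binom_pmf(k, 3, 0.3), 3), round(binom_cdf(k, 3, 0.3), 3))
# prints:
# 0 0.343 0.343
# 1 0.441 0.784
# 2 0.189 0.973
# 3 0.027 1.0
```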

If you prefer to use the Insert Function command, select the Statistical category and click on
BINOMDIST. Enter the arguments for the first calculation as shown, using absolute referencing
where necessary.

Using POISSON to Compute Poisson Probabilities

A Poisson probability distribution is a probability distribution showing the probability of x
occurrences of an event over a specified interval of time or space. The POISSON function requires
three arguments and has the following syntax:
=POISSON(x, mean, cumulative)

If you prefer to use the Insert Function command, select the Statistical category and click on
POISSON. Enter the arguments for the first calculation as shown, using absolute referencing where
necessary.

To calculate the cumulative probabilities, the third argument will be TRUE.
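
The two modes of the POISSON function can be sketched in Python from the Poisson formula. The
function names are hypothetical, and the mean of 10 is an assumed value, not the one used in the
spreadsheet.

```python
from math import exp, factorial

def poisson_pmf(x, mean):
    """Probability of exactly x occurrences (cumulative = FALSE)."""
    return exp(-mean) * mean ** x / factorial(x)

def poisson_cdf(x, mean):
    """Probability of x or fewer occurrences (cumulative = TRUE)."""
    return sum(poisson_pmf(i, mean) for i in range(x + 1))

mean = 10   # assumed mean number of occurrences per interval
for x in range(6):
    print(x, round(poisson_pmf(x, mean), 4), round(poisson_cdf(x, mean), 4))
```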

Using HYPGEOMDIST to Compute Hypergeometric Probabilities

The HYPGEOMDIST function is used to compute the probability of x successes in n trials when the
trials are dependent. HYPGEOMDIST will only compute probabilities, not cumulative probabilities,
and has four arguments: x (successes in the sample), n (sample size), r (successes in the
population) and N (population size). Its syntax is:
=HYPGEOMDIST(sample_s, number_sample, population_s, number_pop)

If a group of 5 children contains 3 girls and 2 boys, the probability of selecting 2 children who
are both girls will be =HYPGEOMDIST(2, 2, 3, 5) = 0.3.
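
The 0.3 in this example can be verified from the hypergeometric formula. In the Python sketch
below, the function name hypergeom_pmf is hypothetical; its arguments follow the order of
HYPGEOMDIST.

```python
from math import comb

def hypergeom_pmf(x, n, r, N):
    """Probability of x successes in a sample of n drawn without replacement
    from a population of N that contains r successes."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# 5 children (3 girls, 2 boys); select 2; both are girls.
print(hypergeom_pmf(2, 2, 3, 5))  # prints 0.3
```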

CONTINUOUS PROBABILITY DISTRIBUTIONS

Normal Probabilities

In a normal probability distribution the probability density function is bell shaped and
determined by its mean µ and standard deviation σ. A standard normal probability distribution
is a normal distribution with a mean of zero and a standard deviation of one.

Excel has two functions for computing probabilities and z values for a standard normal probability
distribution: NORMSDIST and NORMSINV. The NORMSDIST function is used to compute the
cumulative probability given a z value and its syntax is =NORMSDIST(z) where z is the value for
which you want the distribution. The NORMSINV function is used to compute the z value given a
cumulative probability and has the syntax =NORMSINV(probability) where probability is a value
between 0 and 1. The letter S reminds us that the functions relate to the standard normal probability
distribution.

The NORMSDIST function provides the area under the standard normal curve to the left of a given
z value. For nonnegative z values, the NORMSDIST function provides the same cumulative
probability we would obtain if we used a cumulative normal probabilities table. However, unlike a
table, the NORMSDIST function provides cumulative probabilities for negative z values as well.

To calculate the probability of z being in an interval you must calculate the value of NORMSDIST
at the upper end point and subtract the value of NORMSDIST at the lower endpoint of the interval.

To calculate the area under the standard normal curve to the right of a given z value you must take
the cumulative probability away from 1.
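
The interval and upper tail calculations can be sketched with Python's statistics.NormalDist, whose
cdf method plays the role of NORMSDIST here. The z values 1, -1 and 1.96 are chosen only for
illustration.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

left_area = z.cdf(1.0)                 # area to the left of z = 1, like =NORMSDIST(1)
interval = z.cdf(1.0) - z.cdf(-1.0)    # P(-1 <= z <= 1): upper value minus lower value
right_area = 1 - z.cdf(1.96)           # area to the right of z = 1.96

print(round(left_area, 4), round(interval, 4), round(right_area, 4))
```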

The NORMSINV function is the inverse of the NORMSDIST function; it takes a cumulative
probability (lower tail area) as input and provides the z value corresponding to that cumulative
probability. To work out the z value for an upper tail probability, subtract the probability from 1
and use the result as the input, e.g. =NORMSINV(1-0.025).

Two similar functions, NORMDIST and NORMINV, are available for computing the cumulative
probability and the x value for any normal distribution. The NORMDIST function provides the area
under the normal curve to the left of a given value of the random variable x. Its syntax is
=NORMDIST(x, mean, standard_dev, cumulative). If cumulative is TRUE it will return the
cumulative distribution function; if FALSE it returns the probability density function (height of the
curve).

The NORMINV function is the inverse of NORMDIST and takes a cumulative probability as input
and provides the value of x corresponding to that cumulative probability. Its syntax is
=NORMINV(probability, mean, standard_dev).
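
The inverse functions can be sketched in the same way with statistics.NormalDist. The mean of 100
and standard deviation of 15 are assumed values for illustration; inv_cdf plays the role of
NORMSINV and NORMINV, and pdf gives the height of the curve.

```python
from statistics import NormalDist

z = NormalDist()                      # standard normal
z_upper = z.inv_cdf(1 - 0.025)        # z value with 0.025 in the upper tail

scores = NormalDist(mu=100, sigma=15) # assumed mean and standard deviation
p_below = scores.cdf(115)             # like =NORMDIST(115, 100, 15, TRUE)
height = scores.pdf(100)              # like =NORMDIST(100, 100, 15, FALSE)
x_value = scores.inv_cdf(p_below)     # like =NORMINV(p_below, 100, 15)

print(round(z_upper, 2), round(p_below, 4), round(height, 4), round(x_value, 2))
```

Feeding the cumulative probability back into the inverse function returns the original x value,
which is what "inverse" means here.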

Exponential Probability Distribution

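The EXPONDIST function can be used to compute exponential probabilities. Its syntax is
=EXPONDIST(x, lambda, cumulative) where x is the value of the random variable, lambda is 1/µ
(µ being the mean) and cumulative is TRUE to obtain the cumulative probability P(X ≤ x). With
FALSE the function returns the height of the density curve instead.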
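
The cumulative exponential probability has a simple closed form, 1 - e^(-lambda x), which the
Python sketch below evaluates. The function name expon_cdf is hypothetical, and the mean of 15 and
x value of 6 are assumed for illustration.

```python
from math import exp

def expon_cdf(x, lam):
    """P(X <= x) for an exponential distribution with rate lam = 1/mean."""
    return 1 - exp(-lam * x)

mean = 15            # assumed mean of the distribution
lam = 1 / mean       # lambda is the reciprocal of the mean

print(round(expon_cdf(6, lam), 4))  # prints 0.3297
```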
