
Descriptive Statistics

What are 'Descriptive Statistics'

Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can represent either an entire
population or a sample of it. Descriptive statistics are broken down into measures of central tendency and measures
of variability, or spread. Measures of central tendency include the mean, median and mode, while measures of variability include
the standard deviation or variance, the minimum and maximum values, and the kurtosis and skewness.

BREAKING DOWN 'Descriptive Statistics'

Descriptive statistics, in short, help describe and understand the features of a specific data set, by giving short summaries about
the sample and measures of the data. The most recognized types of descriptive statistics are the mean, median and mode, which
are used at almost all levels of math and statistics. However, there are less-common types of descriptive statistics that are still
very important.

People use descriptive statistics to condense hard-to-understand quantitative information from a large data set into bite-sized
descriptions. A student's grade point average (GPA), for example, is a familiar descriptive statistic. The
idea of a GPA is that it takes data points from a wide range of exams, classes and grades, and averages them together to provide a
general understanding of a student's overall academic abilities. A student's GPA reflects their mean academic performance.

Measures of Descriptive Statistics

All descriptive statistics, whether they be the mean, median, mode, standard deviation, kurtosis or skewness, are either measures
of central tendency or measures of variability. These two measures use graphs, tables and general discussions to help people
understand the meaning of the data being analyzed.

Measures of central tendency describe the center position of a distribution for a data set. A person analyzes the frequency of each
data point in the distribution and describes it using the mean, median or mode, which measure the most common patterns of the
data set being analyzed.

Measures of variability, or measures of spread, aid in analyzing how spread out the distribution is for a set of data. For
example, while the measures of central tendency may give a person the average of a data set, they don't describe how the data are
distributed within the set. So, while the average of the data may be 65 out of 100, there can still be data points at both 1 and 100.
Measures of variability help communicate this by describing the shape and spread of the data set. Range, quartiles, absolute
deviation and variance are all examples of measures of variability.

What Are Descriptive Statistics?


Imagine that you are interested in measuring the level of anxiety of college students during finals
week in one of your courses. You have 11 study participants rate their level of anxiety on a scale
from 1 to 10, with 1 being 'no anxiety' and 10 being 'extremely anxious.' You collect the ratings and
review them. The ratings are 8, 4, 9, 3, 5, 8, 6, 6, 7, 8, and 10. Your teacher asks you for a summary
of your findings. How do you summarize this data? One way we could do this is by using descriptive
statistics.

Descriptive statistics are used to describe or summarize data in ways that are meaningful and
useful. For example, it would not be useful to know that all of the participants in our example wore
blue shoes. However, it would be useful to know how spread out their anxiety ratings were.
Descriptive statistics is at the heart of all quantitative analysis.

So how do we describe data? There are two ways: measures of central tendency and measures of
variability, or dispersion.

Central tendency describes the central point in a data set. Variability describes the spread of the
data.

Measures of Central Tendency


You are probably somewhat familiar with the mean, but did you know that it is a measure of central
tendency? Measures of central tendency use a single value to describe the center of a data set.
The mean, median, and mode are the three measures of central tendency.

The mean, or average, is calculated by finding the sum of the study data and dividing it by the total
number of data values. The mode is the number that appears most frequently in the set of data.

The median is the middle value in a set of data. It is calculated by first listing the data in numerical
order, then locating the value in the middle of the list. When working with an odd number of values, the
median is the middle number. For example, the median in a set of 9 values is the number in the fifth
place. When working with an even number of values, you find the average of the two middle numbers. For
example, in a data set of 10, you would find the average of the numbers in the fifth and sixth places.

The mean and median can only be used with numerical data. The mode can be used with both
numerical and nominal data, or data in the form of names or labels. Eye color, gender, and hair
color are all examples of nominal data. The mean is the preferred measure of central tendency since
it considers all of the numbers in a data set; however, the mean is extremely sensitive to outliers, or
extreme values that are much higher or lower than the rest of the values in a data set. The median is
preferred in cases where there are outliers, since the median only considers the middle values.
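
The effect of an outlier on the mean and median can be seen in a small sketch. The values below are hypothetical and chosen only for illustration; they are not part of the anxiety study.

```python
from statistics import mean, median

# Hypothetical values, chosen only to illustrate the effect of an outlier.
typical = [3, 4, 5, 6, 7]
with_outlier = typical + [100]   # add one extreme value

mean_before, median_before = mean(typical), median(typical)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean jumps from 5 to roughly 20.83, while the median only moves from 5 to 5.5.
print(mean_before, median_before)
print(round(mean_after, 2), median_after)
```

A single extreme value pulls the mean far from most of the data, while the median barely moves.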

Knowing what we know, let's calculate the mean, median, and mode using the example from before.
Again, the anxiety ratings of your classmates are 8, 4, 9, 3, 5, 8, 6, 6, 7, 8, and 10.

Mean: (8 + 4 + 9 + 3 + 5 + 8 + 6 + 6 + 7 + 8 + 10) / 11 = 74 / 11 ≈ 6.73. The mean is 6.73.

Median: In a data set of 11, the median is the number in the sixth place. In order, the ratings are 3, 4, 5, 6, 6, 7, 8, 8, 8, 9,
10. The median is 7.

Mode: The number 8 appears more than any other number. The mode is 8.
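
These three results can be cross-checked with a short sketch using Python's standard `statistics` module. The variable names are illustrative only.

```python
from statistics import mean, median, mode

# Anxiety ratings from the 11 study participants.
ratings = [8, 4, 9, 3, 5, 8, 6, 6, 7, 8, 10]

ratings_mean = mean(ratings)      # sum of the ratings divided by 11
ratings_median = median(ratings)  # sixth value once the ratings are sorted
ratings_mode = mode(ratings)      # most frequent rating

print(round(ratings_mean, 2), ratings_median, ratings_mode)  # 6.73 7 8
```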

Measures of Dispersion
We've got some pretty solid numbers on our data now, but let's say that you wanted to look at how
spread out the study data are from a central value, i.e. the mean. In this case, you would look
at measures of dispersion, which include the range, variance, and standard deviation.

The simplest measure of dispersion is the range. This tells us how spread out our data is. In order to
calculate the range, you subtract the smallest number from the largest number. Just like the mean,
the range is very sensitive to outliers.
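
As a quick illustration, the range of the anxiety ratings from the earlier example could be computed like this (a minimal sketch; the variable names are illustrative):

```python
# Anxiety ratings from the earlier example.
ratings = [8, 4, 9, 3, 5, 8, 6, 6, 7, 8, 10]

# Range = largest value minus smallest value.
data_range = max(ratings) - min(ratings)
print(data_range)  # 10 - 3 = 7
```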

The variance is a measure of the average squared distance that a set of data lies from its mean. The
variance is not a stand-alone statistic. It is typically used in order to calculate other statistics, such as
the standard deviation. The higher the variance, the more spread out your data are.

There are four steps to calculate the variance:

1. Calculate the mean.

2. Subtract the mean from each data value. This tells you how far each value lies from the
mean.

3. Square each of the differences so that you now have all positive values, then find the sum of the
squares.

4. Divide the sum of the squares by the total number of data in the set.
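
The four steps above can be sketched in Python as follows. The function name `population_variance` is an illustrative choice, and the data are the anxiety ratings from the earlier example. Because step 4 divides by the total number of values, this is the population form of the variance.

```python
def population_variance(values):
    """Variance computed by the four steps listed above (divides by n)."""
    n = len(values)
    mean = sum(values) / n                            # Step 1: calculate the mean
    deviations = [x - mean for x in values]           # Step 2: subtract the mean from each value
    sum_of_squares = sum(d ** 2 for d in deviations)  # Step 3: square and sum
    return sum_of_squares / n                         # Step 4: divide by the number of values

ratings = [8, 4, 9, 3, 5, 8, 6, 6, 7, 8, 10]
variance = population_variance(ratings)
print(round(variance, 2))  # about 4.2
```
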
Introduction

In this article, we'll learn how to calculate standard deviation "by hand".

Interestingly, in the real world no statistician would ever calculate standard
deviation by hand. The calculations involved are somewhat complex, and the risk of
making a mistake is high. Also, calculating by hand is slow. Very slow. This is why
statisticians rely on spreadsheets and computer programs to crunch their numbers.

So what's the point of this article? Why are we taking time to learn a process
statisticians don't actually use? The answer is that learning to do the calculations by
hand will give us insight into how standard deviation really works. This insight is
valuable. Instead of viewing standard deviation as some magical number our
spreadsheet or computer program gives us, we'll be able to explain where that
number comes from.

Overview of how to calculate standard deviation

The formula for standard deviation (SD) is

\[ \text{SD} = \sqrt{\dfrac{\sum \lvert x - \bar{x} \rvert^2}{n}} \]

where \sum means "sum of", x is a value in the data set, \bar{x} is the
mean of the data set, and n is the number of data points.

The formula may look confusing, but it will make sense after we break it down. In
the coming sections, we'll walk through a step-by-step interactive example. Here's a
quick preview of the steps we're about to follow:

Step 1: Find the mean.

Step 2: For each data point, find the square of its distance to the mean.

Step 3: Sum the values from Step 2.

Step 4: Divide by the number of data points.

Step 5: Take the square root.

Step-by-step interactive example for calculating standard deviation

First, we need a data set to work with. Let's pick something small so we don't get
overwhelmed by the number of data points. Here's a good one:

6, 2, 3, 1
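
Reading the data set as 6, 2, 3, 1, the five steps previewed earlier can be sketched in Python. The function name `population_sd` is an illustrative choice, not something the article defines.

```python
import math

def population_sd(values):
    """Standard deviation following the five steps (divides by n)."""
    n = len(values)
    mean = sum(values) / n                                  # Step 1: find the mean
    squared_distances = [(x - mean) ** 2 for x in values]   # Step 2: squared distance to the mean
    total = sum(squared_distances)                          # Step 3: sum the values from Step 2
    variance = total / n                                    # Step 4: divide by the number of data points
    return math.sqrt(variance)                              # Step 5: take the square root

data = [6, 2, 3, 1]
sd = population_sd(data)
print(round(sd, 2))  # mean is 3, squared distances sum to 14, so SD = sqrt(3.5), about 1.87
```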


And here's a data set:

1, 4, 7, 2, 6

Find the standard deviation of the data set. Round your answer to the nearest hundredth.

SD = ?
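
Once you have worked the practice problem by hand, one way to check the answer is with Python's standard library. This is a sketch that reads the practice data set as 1, 4, 7, 2, 6; `pstdev` divides by n, which matches the SD formula given earlier.

```python
from statistics import pstdev

practice = [1, 4, 7, 2, 6]

# pstdev is the population standard deviation (divides by n).
practice_sd = round(pstdev(practice), 2)
print(practice_sd)  # mean is 4, squared distances sum to 26, so SD = sqrt(5.2), about 2.28
```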

STANDARD DEVIATION

The formula for the sample standard deviation of a data set (s) is

\[ s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} \]

where xi is each value in the data set, x-bar is the mean, and n is the number of values in the
data set. To calculate s, do the following steps:

1. Find the average of the data set, x-bar.

2. Take each value in the data set (xi) and subtract the mean from it to get (xi − x-bar).

3. Square each of the differences to get (xi − x-bar)².

4. Add up all of the results from Step 3 to get the sum of squares, Σ(xi − x-bar)².

5. Divide the sum of squares (found in Step 4) by the number of numbers in the data set minus
one; that is, (n − 1). Now you have Σ(xi − x-bar)² / (n − 1).

6. Take the square root to get the value given by the formula above,
which is the sample standard deviation, s. Whew!

At the end of Step 5 you have found a statistic called the sample variance, denoted by s². The
variance is another way to measure variation in a data set; its downside is that it's in square
units. If your data are in dollars, for example, the variance would be in square dollars, which
makes no sense. That's why you proceed to Step 6. Standard deviation has the same units as
the original data.

Look at the following small example: Suppose you have four quiz scores: 1, 3, 5, and 7. The
mean is 16 ÷ 4 = 4 points. Subtracting the mean from each number, you get (1 − 4) = −3, (3 − 4)
= −1, (5 − 4) = +1, and (7 − 4) = +3. Squaring each of these results, you get 9, 1, 1, and 9.
Adding these up, the sum is 20. In this example, n = 4, and therefore n − 1 = 3, so you divide 20
by 3 to get 6.67, which is the variance. The units here are points squared, which obviously
makes no sense. Finally, you take the square root of 6.67, to get 2.58. The standard deviation
for these four quiz scores is 2.58 points.
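
The quiz-score example can be reproduced with a short sketch that follows the six steps and then compares the result with `statistics.stdev` from Python's standard library, which also divides by n − 1. The function name `sample_sd` is an illustrative choice.

```python
import math
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation following the six steps (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n                       # Step 1: find the average
    differences = [x - mean for x in values]     # Step 2: subtract the mean from each value
    squares = [d ** 2 for d in differences]      # Step 3: square each difference
    sum_of_squares = sum(squares)                # Step 4: add up the squares
    sample_variance = sum_of_squares / (n - 1)   # Step 5: divide by n - 1
    return math.sqrt(sample_variance)            # Step 6: take the square root

quiz_scores = [1, 3, 5, 7]
s = sample_sd(quiz_scores)
print(round(s, 2))                 # 2.58
print(round(stdev(quiz_scores), 2))  # library result for comparison: 2.58
```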

Because calculating the standard deviation involves many steps, in most cases you have a
computer calculate it for you. However, knowing how to calculate the standard deviation helps
you better interpret this statistic and can help you figure out when the statistic may be wrong.

CORRELATION

For example, suppose you have the data set (3, 2), (3, 3), and (6, 4). You calculate the correlation
coefficient r via the following steps. (Note that for this data the x-values are 3, 3, 6, and the y-values are 2,
3, 4.)

1. Calculating the mean of the x and y values, you get x-bar = 4 and y-bar = 3.

2. The standard deviations are sx = 1.73 and sy = 1.00.

3. The n = 3 pairs of differences from the means, multiplied together, are: (3 − 4)(2 − 3) = (−1)(−1) = +1; (3 −
4)(3 − 3) = (−1)(0) = 0; (6 − 4)(4 − 3) = (2)(1) = +2.

4. Adding the n = 3 Step 3 results, you get 1 + 0 + 2 = 3.

5. Dividing by sx × sy gives you 3 / (1.73 × 1.00) = 3 / 1.73 = 1.73. (It's just a coincidence that the
result from Step 5 is also 1.73.)

6. Now divide the Step 5 result by 3 − 1 (which is 2), and you get the correlation r = 0.87.
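
The same six steps can be sketched in Python. The function name `correlation_by_steps` is an illustrative choice; `statistics.stdev` is the sample standard deviation (it divides by n − 1), which matches the values sx = 1.73 and sy = 1.00 used above.

```python
from statistics import mean, stdev

def correlation_by_steps(pairs):
    """Correlation coefficient r computed by the six steps in the example."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)
    x_bar, y_bar = mean(xs), mean(ys)                           # Step 1: means of x and y
    s_x, s_y = stdev(xs), stdev(ys)                             # Step 2: sample standard deviations
    products = [(x - x_bar) * (y - y_bar) for x, y in pairs]    # Step 3: products of differences
    total = sum(products)                                       # Step 4: add the products
    scaled = total / (s_x * s_y)                                # Step 5: divide by sx * sy
    return scaled / (n - 1)                                     # Step 6: divide by n - 1

points = [(3, 2), (3, 3), (6, 4)]
r = correlation_by_steps(points)
print(round(r, 2))  # 0.87
```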

Item reliability is simply the product of the standard deviation of item scores and a
correlational discrimination index (Item-Total Correlation Discrimination in
the Item Analysis Report). So item reliability reflects how much the item is contributing
to total score variance.
