
Algebra 1

Section 9.2

Representing Data for 1 Variable


Quantitative and Qualitative Variables and Frequency Tables

Data sets can involve variables that take the form of a number, or quantity, in which case they
are quantitative. An example of a quantitative variable is time (1 second, 3 minutes, etcetera). The
items in a data set, however, may fit into categories that cannot be represented by numbers, in which
case they are qualitative. An example of a qualitative variable is favorite color.

As discussed in 9.1, data sets can be represented as lists of values. To express these data sets
visually, however, it may be more convenient to transform the list into a frequency table. A frequency
table lists the values that occur in a set and how many times each one occurs. This is a convenient way
to list the data in a set because it makes it simple to create graphs of the data.

As an example, a frequency table for the data set {2, 3, 2, 5, 2, 4, 3, 6, 5, 3, 4} is given below.

Value Frequency
2 3
3 3
4 2
5 2
6 1

To make a frequency table, first make a row for each value that appears in the data set. Then,
go back for each row and find the number of times that value appears in the data set.
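
The two steps above can also be sketched in Python. This is only an illustration and not part of the original section: the function name `frequency_table` is made up for this example, and the counting is done by `Counter` from Python's standard library.

```python
from collections import Counter

def frequency_table(data):
    """Return (value, frequency) rows, one row per distinct value."""
    counts = Counter(data)         # how many times each value appears
    return sorted(counts.items())  # rows in increasing order of value

data = [2, 3, 2, 5, 2, 4, 3, 6, 5, 3, 4]
for value, frequency in frequency_table(data):
    print(value, frequency)
```

The printed rows match the frequency table above. The sorting step assumes the values can be put in order, which holds for numbers but is only alphabetical for categories such as colors.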

Frequency tables can be made for qualitative data as well. For example, the data set {Green,
Blue, Green, Red, Blue, Green} has the frequency table below.

Color Frequency
Green 3
Blue 2
Red 1

Note that the data set could not be ordered as there is no order to color. Thus, frequency tables
provide a useful way to group qualitative data.

Dot Plots, Bar Graphs, and Histograms

Data can be represented visually on axes using one axis (usually the horizontal axis) for the
possible values or categories of a variable, and the other axis (usually the vertical axis) for the
frequency of each value or category.

For discrete quantitative variables, meaning number values that are separate rather than spread
over a continuous range, dot plots can be a convenient way to represent data visually. The values are
given on the horizontal axis, while the frequency of each value is represented by the height of a stack
of dots above that value, running parallel to the vertical axis. For example, consider the frequency
table and dot plot below for the average hours worked per day by a number of people who were surveyed.

Average Hours Worked Frequency


0 2
1 2
2 2
3 1
4 4
5 7
6 8
7 12
8 18
9 15
10 9
11 4
12 3
13 0
14 2

When making a dot plot, it is important to keep the dots level. That is, a stack of n dots should
be the same height for every value on the horizontal axis. This allows one to determine which values
occur more frequently in the data by simply looking at the relative heights of the stacks of dots.
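
As a rough illustration of why level dots matter, the sketch below draws a text dot plot in Python. It is not part of the original section: the function name `dot_plot` is made up for this example, and the frequencies are those of the data set {2, 3, 2, 5, 2, 4, 3, 6, 5, 3, 4} from the first frequency table. Each dot takes up exactly one row, so stacks with equal frequencies come out equally tall.

```python
def dot_plot(frequencies):
    """Build a text dot plot from a {value: frequency} mapping."""
    values = sorted(frequencies)
    width = max(len(str(v)) for v in values)   # column width for each value
    tallest = max(frequencies.values())
    lines = []
    for level in range(tallest, 0, -1):        # draw from the top row down
        row = [("*" if frequencies[v] >= level else " ").rjust(width)
               for v in values]
        lines.append(" ".join(row).rstrip())
    # The last line is the horizontal axis with the values.
    lines.append(" ".join(str(v).rjust(width) for v in values))
    return "\n".join(lines)

print(dot_plot({2: 3, 3: 3, 4: 2, 5: 2, 6: 1}))
```

In the printed plot, the stacks above 2 and 3 are both three dots tall, and the stacks above 4 and 5 are both two dots tall.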

For data with qualitative variables, the frequency is often represented with a bar graph. The
horizontal axis is similar to that of a dot plot, except that the numbers on a dot plot are replaced
with categories on a bar graph. Additionally, columns are used to represent the frequency of each
category rather than stacks of dots. Taller columns, or bars, mean the category occurs more frequently.
For example, consider the frequency table and associated bar graph below for a local diner's patrons'
favorite style of egg.

Favorite Style of Egg Frequency


Scrambled 11
Fried 22
Poached 9
Boiled 14
Omelet 11

Again, equal frequencies must translate to bars of equal height. For example, just as many people
preferred scrambled eggs as preferred omelets, so the bars for scrambled and omelet are
equally high.

For quantitative data with many distinct values, it can be difficult to make a dot plot because
there would have to be so many stacks of dots. In fact, if the range of values for quantitative data is
continuous, it includes an infinite number of possible values. For example, the number of audience
members at a concert can vary a lot, but audience sizes can be similar. Having a stack of dots for
each value wouldn't be effective because the size of an audience is so specific that each value would
have a low frequency. In such a case, it is often convenient to divide the values into ranges. Then,
the frequency of values occurring in each range can be represented by vertical columns, similar to a
bar graph. This is called a histogram. The columns of a histogram, however, touch one another because
the ranges of values are adjacent. Consider the frequency table below for the size of the audience at
concerts in a given year and the associated histogram.

Number of Audience Members Frequency


0-1999 134
2000-3999 251
4000-5999 290
6000-7999 403
8000-9999 511
10000-11999 422
12000-13999 212
14000-16000 93

Because of the way the ranges are defined, a value on the border of two ranges falls into the
higher range and thus into the column to the right of the "dividing line" in the histogram.
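
Sorting values into ranges can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used to build the table above: it assumes every range has the same width and that the first range starts at 0, the function name `bin_counts` is made up, and the audience sizes in the example are invented. Note that the last range in the table above (14000 to 16000) is slightly wider than the others, which this sketch does not handle.

```python
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall in each range [start, start + width)."""
    # Integer division finds the lower end of the range containing each value.
    counts = Counter((v // width) * width for v in values)
    return sorted(counts.items())

# Hypothetical audience sizes, invented for this illustration.
sizes = [1500, 1999, 2000, 3999, 4000, 4100]
for start, frequency in bin_counts(sizes, 2000):
    print(f"{start}-{start + 2000 - 1}: {frequency}")
```

A value on a border, such as 2000, lands in the range that starts at that value, which is the higher of the two ranges. Ranges that contain no values are simply left out of the result.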

Box Plots

Sometimes it is useful to visualize only the shape, center, and spread of a data set rather than
the frequency of individual values. Such a visualization would be more concise. This is exactly what
a box plot does.

A box plot, or a "box and whisker plot", does not show the frequency of each value in a data
set. Rather, it shows the range, median, and IQR of a data set.

A box plot sits on the number line, and a dot goes above the lowest and highest values of the
data set on the number line. Then, a box is constructed with a dividing line. The left line of the box
is placed above the 1st quartile of the data set on the number line, while the right line of the box is
placed above the 3rd quartile of the data set on the number line. The dividing line is placed above the
median of the data set on the number line. Then, the outer dots are connected to the box by lines.
For example, consider the data set below.

{-4, -3, -2, 0, 2, 2, 3}.

The minimum and maximum of this data set are -4 and 3 respectively. Thus, a dot goes above
these values on the number line. The median of the data set is 0, so a line goes above this value on
the number line. The 1st and 3rd quartiles are -3 and 2 respectively, so lines go above these values on
the number line and form the left and right sides of the box. Then, the sides of the box are connected
and the outer dots are attached to the box by lines. The finished product will look like the box plot below.
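
The five values a box plot needs can be computed with a short Python sketch. This is an illustration, not part of the original section: the function names `median` and `five_number_summary` are made up for this example. It assumes the data set has at least two values, and it takes each quartile to be the median of the values on one side of the overall median, which is the approach used in this section (other quartile conventions exist and can give slightly different answers).

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(data):
    """Return (minimum, 1st quartile, median, 3rd quartile, maximum)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # values to the left of the middle
    upper = ordered[-half:]   # values to the right of the middle
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

# Example data set with minimum -4 and maximum 3.
print(five_number_summary([-4, -3, -2, 0, 2, 2, 3]))
```

For this data set the result is (-4, -3, 0, 2, 3): dots go above -4 and 3, the sides of the box go above -3 and 2, and the dividing line goes above 0.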

[Figure: box plot on a number line from -4 to 4, with the minimum, 1st quartile, median, 3rd quartile, and maximum labeled.]

Box plots show the spread of a data set. It can be useful to recognize when a box plot shows an
outlier. If the distance from the 1st quartile to the minimum, or from the 3rd quartile to the maximum,
is significantly larger than the distance from either of the quartiles to the median, then the minimum
or maximum (whichever is significantly far from its quartile) is likely an outlier.

A box plot shows that a data set is close to symmetric if the distances from the median line to
each side of the box are nearly equal and the distances from the sides of the box to the dots are
nearly equal.

Examples

Here are a few examples to test the concepts provided in this section. Answers can be found on
the following pages.

1. A survey asked 20 students about their favorite class. The results are given in the frequency
table below. Make a bar graph to represent this data. Why is a bar graph the right way to
represent this data rather than a histogram?

Favorite Class Frequency


Science 7
Math 8
English 4
History 1

2. Make a frequency table and a dot plot for the following set of data.

{2, 5, 3, 3, 6, 2, 7, 1, 5, 6, 1, 1, 3, 3, 8}

3. Three events were going on the same night. A survey asked each person which event they would
attend. The results are given in the frequency table below. Choose a way to represent this data
visually based on the nature of the variables, and represent the data in that way.

Event Frequency
Concert 5,000
Football Game 10,000
Movie Premiere 500
None 2,500

4. Make a box plot for the following data set.

{23, 22, 25, 26, 23, 25, 27, 20, 18, 33, 23}

Solutions

These are the solutions to the questions on the previous page.

1. A bar graph is the right way to represent this data because the independent variable (favorite
class) is qualitative. A histogram is used for quantitative data that's continuous or has a large
range of values.

2. The frequency table for the values in the data set is given below and the dot plot is below that.
Notice that the stacks of dots are level, as they should be so that values with equal frequencies
are represented by stacks of dots with equal heights.

Value Frequency


1 3
2 2
3 4
4 0
5 2
6 2
7 1
8 1

3. The independent variable, the event, is qualitative. Thus, a bar graph should be used. That bar
graph is given below.

4. To make a box plot, it is necessary to know the median, the 1st and 3rd quartiles, and the highest
and lowest values of a data set. This means the set must be ordered. The ordered data set is
given below.

{18, 20, 22, 23, 23, 23, 25, 25, 26, 27, 33}

The median of the data set is 23, because there are five values to the left of the third 23 and five
values to the right of the third 23. Thus, the middle line of the box is over 23. The lowest value
is 18 and the highest value is 33. Thus, there are dots over 18 and 33. The 1st quartile of the
data is the median of the values to the left of the third 23. In this case, that's 22 (two values to
the left and right of 22). The 3rd quartile of the data is the median of the values to the right of
the third 23. In this case, that's 26 (two values to the left and right of 26). Thus, the left and
right sides of the box are over 22 and 26 respectively. The dots will be connected to the left and
right sides of the box by lines, and this will make the final box plot that is shown below.

[Figure: box plot on a number line marked from 10 to 45 in steps of 5.]
