
Robert Kirkham

Math 1040-404
April 7, 2015

Skittles Project Report

This project concerns the proportion of each color of Skittles in multiple bags of this
tasty candy. Each student in Math 1040 Introductory Statistics was to purchase a 2.17-ounce
bag of Original Skittles, separate the Skittles by color, and then count the number of candies
of each color. After the counting was finished, the Skittles of all the students in the class
(21 students and 21 bags of candy) were totaled, both by color and as a grand total of all
the candies. Below are graphical representations of all the students' Skittles.

The total of my own single 2.17-ounce bag of Skittles:

Number of red candies: 19
Number of orange candies: 11
Number of yellow candies: 11
Number of green candies: 9
Number of purple candies: 12

The total number of Skittles, separated by color, for the entire Math 1040 class:

Number of red candies: 283
Number of orange candies: 271
Number of yellow candies: 254
Number of green candies: 206
Number of purple candies: 250

Prior to starting this project, I thought that the distribution of each candy color would be
close to the same. I was definitely wrong. When looking at the graphs, I found it much easier
to see the separation in numbers in the Pareto chart than in the pie chart. Looking at the
pie chart, I really couldn't see much, if any, difference between the slices. But looking at the
Pareto chart, with the numbers in descending order, I could clearly see the differences in the
totals of each color of candy.
When looking at the totals from my own bag of Skittles, my numbers held quite close to the
class's numbers when comparing colors. In my bag and in the class totals, the color with the
highest total was red, by a sizable margin. In both my bag and the class total, green was the
lowest total, again by a sizable margin. In my bag and in the class total, yellow and purple
candies were both in the middle, and the difference between their totals was negligible. The
difference between my bag and the class total was with the color orange. In my bag, orange was
right in the middle with yellow and purple. But in the class total, orange was much higher,
closer to the highest total (red) than to the middle (yellow and purple).
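As an illustration only (Python was not part of the original project, and the variable names are hypothetical), the arithmetic behind the two charts can be sketched as follows: sorting the class totals in descending order gives the Pareto chart's order, and dividing each total by the grand total gives the pie chart's percentages.

```python
# Class-wide Skittles counts by color, taken from the class totals above.
class_counts = {
    "red": 283,
    "orange": 271,
    "yellow": 254,
    "green": 206,
    "purple": 250,
}

total = sum(class_counts.values())

# A Pareto chart lists the categories in descending order of frequency.
pareto_order = sorted(class_counts, key=class_counts.get, reverse=True)

# A pie chart's slice sizes are the percentages of the grand total.
percentages = {color: 100 * count / total for color, count in class_counts.items()}

for color in pareto_order:
    print(f"{color:<7} {class_counts[color]:>4} {percentages[color]:6.2f}%")
print(f"total   {total:>4}")
```

The sorted list makes the ordering explicit, which is why the Pareto chart shows the differences more clearly than the pie chart, where slices of 20.09% (yellow) and 19.78% (purple) look nearly identical.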

Organizing and Displaying Quantitative Data: The Number of Candies per Bag

Mean: 60.2
Standard Deviation: 3.01

5 Number Summary:
Minimum: 52
First Quartile Q1: 59
Second Quartile Q2 (Median): 60
Third Quartile Q3: 62
Maximum: 67
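A minimal Python sketch of how these summary values can be computed from the 21 bag counts listed later in this report. The quartiles here use the median-of-halves rule (the median is excluded from both halves when n is odd), which is an assumption; other software may use a different quartile convention and give slightly different Q1 and Q3. The function name `five_number_summary` is hypothetical.

```python
import statistics

# Number of candies in each of the 21 bags (the class data in this report).
counts = [62, 56, 63, 60, 52, 60, 59, 62, 62, 57,
          67, 60, 59, 58, 60, 62, 62, 60, 59, 62, 62]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); Q1 and Q3 are the medians of
    the lower and upper halves, excluding the median when n is odd."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


mean = statistics.mean(counts)
sd = statistics.stdev(counts)  # sample standard deviation (divides by n - 1)
print(f"mean = {mean:.1f}, s = {sd:.2f}")
print("five-number summary:", five_number_summary(counts))
```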

When I look at the frequency histogram for the Skittles, I observe that it's skewed to the left. I
want to say that it has a bell shape, but in trying to be more precise with my histogram by
using a class width of one, it left a gap at 61, right in the middle of my histogram. The boxplot
shows a negative skew, with the minimum at 52, the first quartile at 59, the second quartile
(median) at 60, the third quartile at 62, and the maximum at 67. My bag of Skittles had 62
pieces of candy, and when comparing my bag with the class mean of about 60, I saw what I expected
to see in the class data, especially when I observed that a fair number of the class (7) had the
same number of candies as I did. There was a total of 21 bags in the class sample.
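To see why a class width of one leaves a gap in the middle of the histogram, here is a small text-based frequency histogram sketched in Python (an illustration, not the tool used for the original graph). Every integer from the minimum to the maximum is its own class, so any value no bag had shows up as an empty class.

```python
from collections import Counter

# Number of candies in each of the 21 bags.
counts = [62, 56, 63, 60, 52, 60, 59, 62, 62, 57,
          67, 60, 59, 58, 60, 62, 62, 60, 59, 62, 62]

freq = Counter(counts)  # a Counter returns 0 for values that never occur

# Class width of one: one row per integer value from min to max.
for value in range(min(counts), max(counts) + 1):
    print(f"{value:>3} | {'*' * freq[value]}")

empty_classes = [v for v in range(min(counts), max(counts) + 1) if freq[v] == 0]
print("empty classes:", empty_classes)
```

Besides the gap at 61, the same width also leaves empty classes at 53 to 55 and 64 to 66; a wider class width would merge these gaps into neighboring bars.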
The difference between categorical data and quantitative data is that categorical (or qualitative)
data are names or labels that are not numbers representing counts or measurements, whereas
quantitative data are numbers that represent counts or measurements. Bar graphs, Pareto charts,
and pie charts are all good graphs for categorical data because they depict the information
from the different categories.
Histograms and boxplots work well for quantitative data because you get to see what the actual
numbers are. Time-series graphs and stemplots also work well for quantitative data for the same
reason. A calculation that works well for bar graphs, Pareto charts, and pie charts is working
out percentages, so you can plug the actual numbers into the chart. Calculations that work well
for quantitative data include finding the 5-number summary, so you can construct boxplots, and
so you can see the possible bell curve on a histogram.

Confidence Intervals
A confidence interval is a range of values used to estimate the true value of a population
parameter. Its confidence level (such as 95%) is the proportion of such intervals that would
contain the true parameter if we repeated the sampling process many times.

Construct a 95% confidence interval estimate for the true proportion of purple candies.

Number of purple candies per bag:
16 12 15 21 13 10 14 9 15 9 10 8 12 10 8 11 17 14

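x = 250 purple candies, n = 1264 total candies

$\hat{p} = \dfrac{x}{n} = \dfrac{250}{1264} = 0.1977848101$
$\hat{q} = 1 - \hat{p} = 0.8022151899$
Critical value: $z_{\alpha/2} = 1.96$
$E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}} = 1.96\sqrt{\dfrac{(0.1977848101)(0.8022151899)}{1264}} = 0.0219596009$
$0.1977848101 - 0.0219596009 < p < 0.1977848101 + 0.0219596009$
$0.1758252092 < p < 0.219744411$
95% CI: $0.176 < p < 0.220$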

We are 95% confident that the interval from 0.176 to 0.220 actually does contain the true
value of the population proportion p. This means that if we were to select many different
samples of 1264 Skittles candies and construct the corresponding confidence intervals, 95% of
them would actually contain the value of the population proportion of purple candies.
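The same interval can be checked with a few lines of Python. This is a sketch that simply re-applies the formula for E with the report's numbers (x = 250, n = 1264, and the 95% critical value 1.96); the variable names are illustrative.

```python
import math

x = 250        # purple candies in the class sample
n = 1264       # total candies in the class sample
z_crit = 1.96  # critical value z_{alpha/2} for 95% confidence

p_hat = x / n
q_hat = 1 - p_hat
margin = z_crit * math.sqrt(p_hat * q_hat / n)  # margin of error E
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.4f}, E = {margin:.4f}")
print(f"95% CI: {lower:.3f} < p < {upper:.3f}")
```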

Construct a 99% confidence interval estimate for the true mean number of
candies per bag.
62 56 63 60 52 60 59 62 62 57
67 60 59 58 60 62 62 60 59 62
62

$\bar{x} = 60.19047619$
$s = 3.01$
$t_{\alpha/2} = 2.845$ (df = 20)
$E = t_{\alpha/2}\dfrac{s}{\sqrt{n}} = 2.845 \cdot \dfrac{3.01}{\sqrt{21}} = 1.868697992$
$\bar{x} - E < \mu < \bar{x} + E$
$60.19047619 - 1.868697992 < \mu < 60.19047619 + 1.868697992$
$58.3217782 < \mu < 62.05917418$
$58.3 < \mu < 62.1$

We are 99% confident that the interval from 58.3 to 62.1 actually does contain the true
value of μ. This means that if we were to select many different samples of 21 2.17-ounce bags of
Original Skittles and construct the corresponding confidence intervals, in the long run 99% of
them would actually contain the value of μ.
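A minimal Python check of the 99% interval for the mean, reusing the report's critical value t = 2.845 (df = 20) rather than computing it. One assumption differs from the hand calculation: the sketch uses the unrounded sample standard deviation (3.0103...) instead of 3.01, so E comes out as about 1.8689 instead of 1.8687, which does not change the rounded limits.

```python
import math
import statistics

# Number of candies in each of the 21 bags.
counts = [62, 56, 63, 60, 52, 60, 59, 62, 62, 57,
          67, 60, 59, 58, 60, 62, 62, 60, 59, 62, 62]
t_crit = 2.845  # t_{alpha/2} for 99% confidence with df = 20 (value used in the report)

n = len(counts)
x_bar = statistics.mean(counts)
s = statistics.stdev(counts)
margin = t_crit * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"E = {margin:.4f}")
print(f"99% CI: {lower:.1f} < mu < {upper:.1f}")
```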

Construct a 98% confidence interval estimate for the standard deviation of the number of
candies per bag.
62 56 63 60 52 60 59 62 62 57
67 60 59 58 60 62 62 60 59 62
62

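Number of bags: n = 21
Standard deviation: s = 3.010299779 (rounded to 3.01)
Right-tail critical value (df = 20): $\chi^2_R = 37.566$
Left-tail critical value (df = 20): $\chi^2_L = 8.260$

$\sqrt{\dfrac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\dfrac{(n-1)s^2}{\chi^2_L}}$
$\sqrt{\dfrac{(21-1)(3.01)^2}{37.566}} < \sigma < \sqrt{\dfrac{(21-1)(3.01)^2}{8.260}}$
$2.196261337 < \sigma < 4.683725882$
Rounded to $2.20 < \sigma < 4.68$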

We have 98% confidence that the limits of 2.20 and 4.68 contain the true value of σ.
Looking at the sample of 21 2.17-ounce bags of Original Skittles and the standard deviation of
3.01, the value of 3.01 is contained within the confidence interval, so the variation among the
bags of Skittles does not appear to be unusual.
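The chi-square interval can be sketched the same way in Python, reusing the report's critical values (37.566 and 8.260 for df = 20) rather than computing them. As with the mean, the sketch assumes the unrounded sample standard deviation, which shifts the limits only in the third decimal place.

```python
import math
import statistics

# Number of candies in each of the 21 bags.
counts = [62, 56, 63, 60, 52, 60, 59, 62, 62, 57,
          67, 60, 59, 58, 60, 62, 62, 60, 59, 62, 62]
chi2_right = 37.566  # right-tail critical value, df = 20, 98% confidence
chi2_left = 8.260    # left-tail critical value, df = 20, 98% confidence

n = len(counts)
s = statistics.stdev(counts)
# The larger chi-square value gives the lower limit, the smaller gives the upper limit.
lower = math.sqrt((n - 1) * s**2 / chi2_right)
upper = math.sqrt((n - 1) * s**2 / chi2_left)

print(f"98% CI: {lower:.2f} < sigma < {upper:.2f}")
```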

Hypothesis Tests
Explain in general the purpose and meaning of a hypothesis test.
A hypothesis test is a procedure for testing a claim about a property of a
population. Every hypothesis test requires that we state a null hypothesis
and an alternative hypothesis, and then use sample data to decide whether
there is sufficient evidence to reject the null hypothesis.
Use a 0.01 significance level to test the claim that 20% of all Skittles
candies are green.

$z = \dfrac{\hat{p} - p}{\sqrt{pq/n}} = \dfrac{0.1629746835 - 0.20}{\sqrt{(0.20)(0.80)/1264}} = -3.29$

$x = 206$
$n = 1264$
$\hat{p} = 0.1629746835$
$p = 0.20$
$q = 0.80$
$z = -3.29$
$P\text{-value} \approx 0.0010$ (two-tailed)
$H_0\colon p = 0.20$
$H_1\colon p \neq 0.20$
$\alpha = 0.01$
Reject $H_0$.
There is sufficient evidence to warrant rejection of the claim that 20% of all Skittles candies
are green.

Use a 0.05 significance level to test the claim that the mean number of
candies in a bag of Skittles is 56.
62 56 63 60 52 60 59
62 62 57 67 60 59 58
60 62 62 60 59 62 62

$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{60.19 - 56}{3.01/\sqrt{21}} \approx 6.379$

$\bar{x} = 60.19047619$ (rounded to 60.19)
$\mu = 56$
$n = 21$
$s = 3.010299779$ (rounded to 3.01)
$t = 6.379156811$
$P\text{-value} \approx 3.18 \times 10^{-6}$
$H_0\colon \mu = 56$
$H_1\colon \mu \neq 56$
$\alpha = 0.05$
Reject $H_0$.
There is sufficient evidence to warrant rejection of the claim that the mean number of candies in
a bag of Skittles is 56.

Discuss and interpret the results of each of your two hypothesis tests.
Include neatly written and scanned copies of your work.
With the claim that 20% of the Skittles candies are green, the P-value
(about 0.001) was less than α = 0.01, so we are required to reject the null
hypothesis. With the claim that the mean number of candies in a bag of
Skittles is 56, the P-value was also less than α = 0.05, so we are required
to reject the null hypothesis.
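Both tests can be reproduced with a short Python sketch that uses only the standard library. It assumes two-tailed tests (matching the ≠ alternative hypotheses). The normal P-value uses the identity P(|Z| ≥ |z|) = erfc(|z|/√2). The standard library has no Student's t distribution function, so the t-test P-value is approximated by integrating the t density numerically with Simpson's rule; in practice a library routine such as SciPy's `scipy.stats.t.sf` would normally be used instead. The helper names `t_pdf` and `t_upper_tail` are hypothetical.

```python
import math
import statistics

# --- z test for a proportion: claim p = 0.20 (green), alpha = 0.01 ---
x_green, n_candies = 206, 1264
p0 = 0.20
p_hat = x_green / n_candies
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n_candies)
p_value_z = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal P-value

# --- t test for a mean: claim mu = 56, alpha = 0.05 ---
counts = [62, 56, 63, 60, 52, 60, 59, 62, 62, 57,
          67, 60, 59, 58, 60, 62, 62, 60, 59, 62, 62]
mu0 = 56
n = len(counts)
x_bar = statistics.mean(counts)
s = statistics.stdev(counts)
t = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)


def t_upper_tail(t_stat, df, stop=500.0, steps=100_000):
    """Approximate P(T >= t_stat) with Simpson's rule on [t_stat, stop];
    the probability beyond `stop` is negligible for df = 20."""
    h = (stop - t_stat) / steps  # steps must be even for Simpson's rule
    total = t_pdf(t_stat, df) + t_pdf(stop, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t_stat + i * h, df)
    return total * h / 3


p_value_t = 2 * t_upper_tail(abs(t), df)  # two-tailed t P-value

for name, stat, p_val, alpha in [("green proportion (z)", z, p_value_z, 0.01),
                                 ("mean of 56 (t)", t, p_value_t, 0.05)]:
    decision = "reject H0" if p_val < alpha else "fail to reject H0"
    print(f"{name}: statistic = {stat:.2f}, P-value = {p_val:.2e}, {decision}")
```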

Reflection
State the conditions for doing interval estimates and hypothesis tests for
population proportions and discuss whether or not your samples met
these conditions.
The conditions for doing interval estimates and hypothesis tests for
population proportions require that the sample observations are a simple
random sample, that the conditions for a binomial distribution are satisfied,
and that the conditions np ≥ 5 and nq ≥ 5 are both satisfied. With the
hypothesis test of the population proportion in the Skittles project, the
sample was a simple random sample, the conditions for a binomial
distribution were satisfied, and finally (1264)(0.2) = 252.8 ≥ 5 and
(1264)(0.8) = 1011.2 ≥ 5, so the samples met these conditions.

State the conditions for doing interval estimates and hypothesis tests for
population means and discuss whether or not your samples met these
conditions.
The conditions for doing interval estimates and hypothesis tests for
population means require that the sample is a simple random sample and
that either the population is normally distributed or n > 30. With the
hypothesis test of the population mean in the Skittles project, the sample
was a simple random sample, and although our sample had only 21 bags,
the population was normally distributed.

State the conditions for doing interval estimates for population standard
deviations and discuss whether or not your samples met these conditions.
The conditions for doing interval estimates and hypothesis tests for
population standard deviation require that the sample is a simple random
sample and that the population has a normal distribution (this is a strict
requirement). With the Skittles project the sample was a simple random
sample and the population was normally distributed.

What possible errors could have been made by using this data? How
could the sampling method be improved? State what conclusions you
have drawn from your statistical research.
Errors that could have been made are a Type I error and a Type II error.
A Type I error is the mistake of rejecting the null hypothesis when it is
actually true. A Type II error is the mistake of failing to reject the null
hypothesis when it is actually false. One improvement to the sampling
method would be to collect more samples. In the grand scheme of things,
21 bags of Skittles is a small number to work with when trying to get an
accurate estimate of the proportions of candy colors in a bag of Skittles
and an accurate estimate of the mean number of candies in a bag of
Skittles.
