
MATH 1040 SKITTLES DATA PROJECT

Report introduction

For our project in MATH 1040 Introduction to Statistics, each student in the class had to buy a 2.17-ounce
individual-size bag of Skittles and count the number of candies of each color in the bag. The class data
was then collected, and we used it for several exercises covering different aspects of statistics as we
learned them in class.

In the first part of the project, we determined the proportion of each color of candy and created a
Pareto chart and a pie chart using the total number of candies of each color from the class data
provided by the teacher. We then compared our personal data with the data from the whole class and
noted any similarities or differences between the two.

Next, we used the Skittles data to create statistical summaries: the mean, the standard deviation, and
the 5-number summary. We constructed a frequency histogram of the total number of candies per bag as
well as a box plot. Individually, I also wrote a paragraph about the implications of different
qualitative and quantitative methods of analysis.

At the end of the project, we worked with confidence intervals. We found three different confidence
intervals: for the population proportion, the population mean, and the population standard deviation.
I wrote an analysis of what each confidence interval meant. We also worked with hypothesis tests,
explaining their general purpose and meaning.

To finish the project, we wrote a reflective paper explaining what we had learned from the project and
how it will affect my other classes and my personal and professional life.
ORGANIZING AND DISPLAYING CATEGORICAL DATA: COLORS

MATH 1040 SKITTLES DATA SUMMARY TO USE IN CLASS PROJECT


Data # Name Red Orange Yellow Green Purple Total
1 Anthony Anderson 10 11 8 12 17 58
2 Roberto Benvenutto 12 9 15 16 8 60
3 Tatyana Campos 5 15 15 10 14 59
4 Jared Gilbert 11 14 13 9 10 57
5 Juston Helf 10 10 12 16 11 59
6 Dulce Kim 10 10 15 11 13 59
7 Claire Langue 19 9 13 12 5 58
8 Aubre Mcqueen 13 12 8 13 12 58
9 Marisela Mendoza 12 6 14 12 15 59
10 Alejandra Miranda Gutierrez 14 7 12 13 12 58
11 Geeta Mishra 11 15 11 13 12 62
12 Sterling Pabla 9 15 13 9 13 59
13 Heidi Ragland 10 12 18 7 13 60
14 Wagma Shour 16 8 14 9 10 57
15 Jasmine Topete 14 7 9 14 14 58
16 Madison Vibbert 5 14 17 15 7 58
17 Alexander Vincent 14 10 12 11 11 58
18 Vanessa Aguirre 11 14 12 11 16 64
19 Michael Casper 11 15 13 13 10 62
20 Elizabeth Cheney 6 8 9 12 24 59
21 Annelisa DeJong 13 15 11 9 13 61
22 Orlando Durant 10 16 11 12 13 62
23 Hans Gonzalez-Campos 13 14 8 13 13 61
24 Carson Kropushek 8 13 14 14 10 59
25 Sophie Kynaston 6 9 13 20 14 62
26 Mark Owusu 10 10 10 16 15 61
27 Vanessa Quiles-Gomez 9 12 15 11 11 58
28 Peter Uchala 12 16 10 9 13 60
29 Thomas Whittaker 9 11 8 13 19 60
30 Matelynn Wilson 17 8 11 13 11 60
MATH 1040 STATISTICS CLASS SKITTLES PROJECT PROPORTIONS
COLOR COUNT PROPORTION OF TOTAL
Red skittles 330 0.185
Orange skittles 345 0.193
Green skittles 368 0.206
Purple skittles 379 0.212
Yellow skittles 364 0.204
Total number of skittles 1786 1.000
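As a quick check of the arithmetic in the proportions table, the following minimal Python sketch recomputes each proportion as the color count divided by the class total of 1786. The counts are the class totals from the table; the helper name `proportions` is hypothetical, chosen only for illustration.

```python
# Class color totals from the MATH 1040 proportions table.
class_counts = {
    "Red": 330,
    "Orange": 345,
    "Yellow": 364,
    "Green": 368,
    "Purple": 379,
}

def proportions(counts):
    """Hypothetical helper: return each color's share of the total count."""
    total = sum(counts.values())
    return {color: count / total for color, count in counts.items()}

class_props = proportions(class_counts)
for color, p in class_props.items():
    print(f"{color:<7} {class_counts[color]:>4} {p:.3f}")
print(f"Total   {sum(class_counts.values()):>4} {sum(class_props.values()):.3f}")
```

Each printed proportion matches the table to three decimals, and the proportions add up to 1.000.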
MATH 1040 SKITTLES PROJECT DATA
SKITTLES COLOR CLASS TOTAL CLASS PROPORTION MY TOTAL MY PROPORTION
Red Skittles 330 0.185 12 0.200
Orange skittles 345 0.193 9 0.150
Yellow skittles 364 0.204 15 0.250
Green skittles 368 0.206 16 0.267
Purple skittles 379 0.212 8 0.133
Total skittles 1786 1.000 60 1.000
My bag of Skittles was noticeably different from the class as a whole. Compared with the class
proportions, my bag had more red, yellow, and green Skittles and fewer orange and purple Skittles, as the
table above shows. I had always believed that red Skittles were the most common and assumed there would
be more of them in each bag, but I was wrong: in the class data, red was actually the least common color.
I probably thought so because red is a strong, vibrant color that stands out more than the other colors,
but the class data did not support my belief. In my own bag, green was the most common color and purple
was the least common, and I was surprised to see this result.
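The comparison between my bag and the class can be checked with a short Python sketch that converts both sets of counts to proportions and subtracts the class proportion from mine. The counts come from the CLASS TOTAL and MY TOTAL columns of the table; the helper name `proportions` is hypothetical.

```python
# Class totals and my own bag's counts, from the MATH 1040 Skittles project table.
class_counts = {"Red": 330, "Orange": 345, "Yellow": 364, "Green": 368, "Purple": 379}
my_counts = {"Red": 12, "Orange": 9, "Yellow": 15, "Green": 16, "Purple": 8}

def proportions(counts):
    """Hypothetical helper: each color's share of the total count."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}

class_props = proportions(class_counts)
my_props = proportions(my_counts)

# A positive difference means my bag has a larger share of that color than the class.
differences = {c: my_props[c] - class_props[c] for c in class_counts}
for color, d in differences.items():
    print(f"{color:<7} mine {my_props[color]:.3f}  class {class_props[color]:.3f}  diff {d:+.3f}")

most_common = max(my_counts, key=my_counts.get)
least_common = min(my_counts, key=my_counts.get)
print("Most common in my bag:", most_common)
print("Least common in my bag:", least_common)
```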

Does the class data represent a random sample?


Yes. The class data does represent a random sample. Even though each student bought their own bag of
Skittles and not every bag in the region had an equal chance of being selected, the distribution of
Skittles from the central factory, plant, or warehouse was most likely random. The Skittles factory most
likely does not count colors as it fills the bags and simply fills them by weight. Assuming that students
did not make biased decisions about which bag to grab off the shelf (which, in my view, would be almost
impossible), every bag produced had an equal chance of being shipped to any location in the country and
of being selected at random by any student in the class.

What would the population be?

In this project, the sample is the class data. Since not everyone in the class lives in the same city, and
some students live out of state, the population would be all 2.17-ounce bags of Skittles in the United
States. Because there are also manufacturing plants operating overseas, the population can reasonably be
extended only to bags in the United States distribution network.

ORGANIZING AND DISPLAYING QUANTITATIVE DATA: THE NUMBER OF CANDIES PER BAG
Using the total number of candies in each bag in the class sample, calculate the mean, standard deviation,
and 5-number summary:

(a) Mean number of candies per bag

The mean number of candies per bag is 59.5 (1786 candies / 30 bags ≈ 59.53).

(b) Standard deviation of the number of candies per bag

The sample standard deviation of the number of candies per bag is 1.72.

(c) 5-number summary for the number of candies per bag

The 5-number summary (minimum, Q1, median, Q3, maximum) is 57.0, 58.0, 59.0, 61.0, and 64.0.
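These summary statistics can be reproduced with Python's standard `statistics` module. This is a sketch, assuming the 30 bag totals from the class data table in table order. The quartiles follow the textbook rule of taking the median of the lower and upper halves; other software may use a different quartile rule and report slightly different Q1 and Q3 values. The helper name `five_number_summary` is hypothetical.

```python
import statistics

# Total candies per bag for the 30 bags in the class data table, in table order.
totals = [58, 60, 59, 57, 59, 59, 58, 58, 59, 58,
          62, 59, 60, 57, 58, 58, 58, 64, 62, 59,
          61, 62, 61, 59, 62, 61, 58, 60, 60, 60]

def five_number_summary(data):
    """Hypothetical helper: min, Q1, median, Q3, max, with Q1/Q3 as medians of the halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]  # skip the middle value when n is odd
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

mean = statistics.mean(totals)
sd = statistics.stdev(totals)  # sample standard deviation (divides by n - 1)
print(f"mean = {mean:.2f}, s = {sd:.2f}")
print("5-number summary:", five_number_summary(totals))
```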


CONFIDENCE INTERVAL ESTIMATES

A confidence interval for a population proportion uses the sample proportion to estimate, with a
specified degree of confidence, the proportion of the population that has a given characteristic.

99% CONFIDENCE INTERVAL ESTIMATE FOR THE POPULATION PROPORTION OF YELLOW CANDY
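Sample proportion: $\hat{p} = \dfrac{x}{n} = \dfrac{364}{1786} \approx 0.204$
Significance level: $\alpha = 0.01$
Critical value: $z_{\alpha/2} = z_{0.005} = 2.575$
Margin of error: $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} = 2.575\sqrt{\dfrac{0.204(1 - 0.204)}{1786}} \approx 0.0246$
Lower limit: $\hat{p} - E = 0.204 - 0.0246 \approx 0.179$
Upper limit: $\hat{p} + E = 0.204 + 0.0246 \approx 0.229$
99% Confidence Interval Estimate: (0.179, 0.229)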

In terms of the Skittles, we are 99% confident that the population proportion of yellow Skittles is
between 0.179 and 0.229.
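The same interval can be computed in a few lines of Python. This sketch uses the normal-approximation formula for a proportion and takes the critical value from `statistics.NormalDist().inv_cdf`, which gives 2.576 rather than the rounded table value 2.575. Because it also keeps p̂ unrounded (about 0.2038), the upper limit comes out as about 0.228 instead of 0.229; the difference is only rounding. The helper name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence):
    """Hypothetical helper: normal-approximation confidence interval for a proportion."""
    p_hat = x / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

# 364 yellow candies out of 1786 in the class sample, 99% confidence.
lower, upper, margin = proportion_ci(364, 1786, 0.99)
print(f"E = {margin:.4f}, CI = ({lower:.3f}, {upper:.3f})")
```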

95% CONFIDENCE INTERVAL ESTIMATE FOR THE POPULATION MEAN NUMBER OF SKITTLES PER BAG

Mean number of candies per bag = 59.5

Standard deviation of number of candies per bag = 1.72


Confidence level: $(1 - \alpha) \cdot 100\% = 95\%$, so $\alpha = 0.05$ and $\alpha/2 = 0.025$

Degrees of freedom: $df = n - 1 = 30 - 1 = 29$

Critical value: $t_{\alpha/2} = t_{0.025} = 2.045$ (with $df = 29$)

Margin of error: $E = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} = 2.045 \cdot \dfrac{1.72}{\sqrt{30}} \approx 0.642$

Lower bound: $\bar{x} - E = 59.5 - 0.642 \approx 58.86$

Upper bound: $\bar{x} + E = 59.5 + 0.642 \approx 60.14$

We are 95% confident that the population mean number of candies per bag is between 58.86 and 60.14.
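A minimal Python sketch of the t interval, assuming the rounded summary values used in the report (x̄ = 59.5, s = 1.72, n = 30). Python's standard library has no t distribution, so the critical value 2.045 (α/2 = 0.025 with 29 degrees of freedom) is hard-coded from a t table; if SciPy is installed, `scipy.stats.t.ppf(0.975, 29)` returns the same value to three decimals. The helper name `mean_ci` is hypothetical.

```python
from math import sqrt

def mean_ci(x_bar, s, n, t_crit):
    """Hypothetical helper: t interval x_bar ± t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# Summary values from the report; 2.045 is the t-table value for alpha/2 = 0.025, df = 29.
T_CRIT_95_DF29 = 2.045
lower, upper, margin = mean_ci(59.5, 1.72, 30, T_CRIT_95_DF29)
print(f"E = {margin:.3f}, CI = ({lower:.2f}, {upper:.2f})")
```

Note that the standard error divides by √n = √30, the sample size, while the degrees of freedom (n − 1 = 29) only enter through the choice of critical value.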

HYPOTHESIS TEST

We perform a hypothesis test to evaluate a claim about the value of a population proportion, population
mean, or population standard deviation. The purpose of a hypothesis test is to decide whether the sample
data give enough evidence to reject the null hypothesis, and from that to draw a conclusion about the
claim.

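1. Testing the claim that 20% of all Skittles are red
$n = 1786$, $x = 330$, $\hat{p} = 0.185$, level of significance $\alpha = 0.05$
$H_0: p = 0.20$, $H_1: p \neq 0.20$
Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \dfrac{330/1786 - 0.20}{\sqrt{0.20(0.80)/1786}} \approx -1.609$, p-value $= 0.108$
Since the p-value (0.108) is greater than the level of significance (0.05), we fail to reject the null
hypothesis. There is not enough evidence to warrant rejection of the claim that 20% of all Skittles are red.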
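The one-proportion z test can be sketched in Python as follows, assuming the normal approximation behind the z statistic. The hypothetical helper `one_proportion_z_test` computes z with the standard error based on the hypothesized value p₀ = 0.20 and gets a two-sided p-value from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(x, n, p0):
    """Hypothetical helper: two-sided z test of H0: p = p0 against H1: p != p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0, not p_hat
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

z, p_value = one_proportion_z_test(330, 1786, 0.20)
alpha = 0.05
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.3f}: {decision}")
```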
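2. Testing the claim that the mean number of candies per bag is 55
$n = 30$, $\bar{x} = 59.5$, $s = 1.72$, level of significance $\alpha = 0.01$
$H_0: \mu = 55$, $H_1: \mu \neq 55$
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{59.5 - 55}{1.72/\sqrt{30}} \approx 14.330$ (with $df = 29$)
p-value $\approx 1.085 \times 10^{-14} < 0.01$
Reject the null hypothesis.

There is sufficient evidence to warrant rejection of the claim that the mean number of candies in a
bag of Skittles is 55.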
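A minimal sketch of the one-sample t test in Python. Since the standard library has no t distribution, this version uses the critical-value approach: it compares |t| with the t-table value 2.756 (α/2 = 0.005, 29 degrees of freedom, two-sided test at α = 0.01) instead of computing a p-value. The helper name `one_sample_t` is hypothetical.

```python
from math import sqrt

def one_sample_t(x_bar, s, n, mu0):
    """Hypothetical helper: t = (x_bar - mu0) / (s / sqrt(n)) for H0: mu = mu0."""
    return (x_bar - mu0) / (s / sqrt(n))

# Two-sided test at alpha = 0.01 with df = 29; 2.756 is the t-table value for alpha/2 = 0.005.
T_CRIT_99_DF29 = 2.756
t = one_sample_t(59.5, 1.72, 30, 55)
decision = "reject H0" if abs(t) > T_CRIT_99_DF29 else "fail to reject H0"
print(f"t = {t:.3f}, critical value = ±{T_CRIT_99_DF29}: {decision}")
```

A test statistic this far beyond the critical value corresponds to a p-value far below 0.01; a p-value is a probability, so it can never be larger than 1.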

REFLECTION

The purpose of collecting data from a designated sample and calculating statistics from it is to estimate
parameters of the overall population. One issue with using sample statistics to describe the population is
how well the sample statistic represents the population parameter. A confidence interval helps address
that issue by providing a range of values that is likely to contain the population parameter. Each
interval is constructed with a level of confidence, such as 95%, 98%, or 99%. The higher the level of
confidence, the wider the interval and the more likely it is that the interval contains the true
population parameter.
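To illustrate how the confidence level affects an interval, this Python sketch reuses the yellow-candy sample (364 of 1786) and computes the full width of the normal-approximation interval at 95%, 98%, and 99% confidence. The choice of the yellow-candy data is only for illustration; any sample would show the same pattern.

```python
from math import sqrt
from statistics import NormalDist

# Yellow-candy proportion from the class sample, used only to illustrate.
x, n = 364, 1786
p_hat = x / n
std_error = sqrt(p_hat * (1 - p_hat) / n)

widths = {}
for confidence in (0.95, 0.98, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    widths[confidence] = 2 * z * std_error  # full width = 2 * margin of error
    print(f"{confidence:.0%}: z = {z:.3f}, width = {widths[confidence]:.4f}")
```

Only the critical value changes between the three intervals, so the width grows with the confidence level while the sample stays the same.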

One possible source of error is data entry. Everyone submitted their number of Skittles of each color and
their total, and these numbers could have been miscounted, miscalculated, or mistyped when the data was
being compiled. Another possible error is that a student who did not actually buy a bag of Skittles might
have submitted false information. The sampling method could be improved by requiring all students to
bring their bags of Skittles to class on a specific date and count each color and the total in person, so
that everyone participating could see each person's results.
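One simple check for data-entry mistakes is to confirm that each student's color counts add up to the reported total and that the columns add up to the class totals. This Python sketch assumes the rows are copied exactly from the class data table; `find_mismatches` is a hypothetical helper. It cannot catch a miscount that was entered consistently, or invented data, which is why counting the candies in class would still be the stronger fix.

```python
# Per-bag counts (Red, Orange, Yellow, Green, Purple, Total) from the class data table.
bags = [
    (10, 11, 8, 12, 17, 58), (12, 9, 15, 16, 8, 60), (5, 15, 15, 10, 14, 59),
    (11, 14, 13, 9, 10, 57), (10, 10, 12, 16, 11, 59), (10, 10, 15, 11, 13, 59),
    (19, 9, 13, 12, 5, 58), (13, 12, 8, 13, 12, 58), (12, 6, 14, 12, 15, 59),
    (14, 7, 12, 13, 12, 58), (11, 15, 11, 13, 12, 62), (9, 15, 13, 9, 13, 59),
    (10, 12, 18, 7, 13, 60), (16, 8, 14, 9, 10, 57), (14, 7, 9, 14, 14, 58),
    (5, 14, 17, 15, 7, 58), (14, 10, 12, 11, 11, 58), (11, 14, 12, 11, 16, 64),
    (11, 15, 13, 13, 10, 62), (6, 8, 9, 12, 24, 59), (13, 15, 11, 9, 13, 61),
    (10, 16, 11, 12, 13, 62), (13, 14, 8, 13, 13, 61), (8, 13, 14, 14, 10, 59),
    (6, 9, 13, 20, 14, 62), (10, 10, 10, 16, 15, 61), (9, 12, 15, 11, 11, 58),
    (12, 16, 10, 9, 13, 60), (9, 11, 8, 13, 19, 60), (17, 8, 11, 13, 11, 60),
]

def find_mismatches(rows):
    """Hypothetical helper: 1-based row numbers whose colors do not add up to the total."""
    return [i for i, (*colors, total) in enumerate(rows, start=1) if sum(colors) != total]

mismatches = find_mismatches(bags)
print("Rows with inconsistent totals:", mismatches or "none")

# Column sums should reproduce the class totals (R, O, Y, G, P, all candies).
column_totals = [sum(column) for column in zip(*bags)]
print("Column totals:", column_totals)
```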

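From our statistical research and the hypothesis tests, we drew two conclusions. For the proportion of red
Skittles, the p-value (0.108) is greater than the 0.05 level of significance, so we failed to reject the
null hypothesis; the data do not contradict the claim that 20% of all Skittles are red, although failing to
reject does not prove that the claim is true. For the mean number of candies, the test statistic
t = 14.330 is far beyond the critical value at the 0.01 level, so the claim that the mean number of
candies per bag is 55 should be rejected. This agrees with the 95% confidence interval for the mean,
which lies entirely above 55.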
