
Kyliegh Billings, Melissa Billings

Math 1040-012
Term project-Skittles data

For this project, everyone in the class was asked to buy a 2.17 oz bag of Skittles and count
the number of candies of each color in the bag. The class data were compiled, and we used the
compiled data to complete the different parts of this statistics assignment.
For the first part of the project, we were asked to determine the proportion of each
color within the overall sample gathered by the class. To do this, we created a pie chart and a
Pareto chart representing the number of candies of each color. We compared the class data to
our own personal data and noted any similarities or differences.
For the next portion of the project, we used the Skittles data to calculate the mean,
standard deviation, and 5-number summary. We then used these results to make a frequency
histogram and a box plot.
The last part of the project involved confidence intervals and hypothesis tests. We
found three different confidence intervals, one each for the population proportion, mean, and
standard deviation, and wrote an analysis of what each confidence interval meant.

Color     Total number   Proportion
Red       295            0.207
Orange    291            0.204
Yellow    282            0.198
Green     294            0.206
Purple    265            0.186
TOTAL     1427           1
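
The class proportions follow directly from the class counts. As a minimal sketch of that arithmetic (the variable names are illustrative, not part of the assignment), the following Python also sorts the colors from most to least frequent, which is the order a Pareto chart would show them in:

```python
# Class counts per color, taken from the class data listed above.
class_counts = {"red": 295, "orange": 291, "yellow": 282, "green": 294, "purple": 265}

total = sum(class_counts.values())

# Proportion of each color, rounded to three decimals.
proportions = {color: round(n / total, 3) for color, n in class_counts.items()}

# Pareto order: colors sorted from most to least frequent.
pareto_order = sorted(class_counts, key=class_counts.get, reverse=True)

print(total)
print(proportions)
print(pareto_order)
```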

Data for my bag of Skittles


Color     Number   Proportion     Percentage
Green     19       .3015873016    30.16%
Red       11       .1746031746    17.46%
Orange    11       .1746031746    17.46%
Yellow    12       .1904761905    19.05%
Purple    10       .1587301587    15.87%

These graphs do represent what I expected to see. I thought that each color of candy would be
equally represented in each bag, and the class data seem to suggest that is the case.

In my own bag, all colors were approximately equally represented, with the exception of
green candies. There were noticeably more green candies than any other color: the green
candies made up .301587, or 30.16%, of the bag.

Using the total number of candies in each bag in the class sample, we were asked to calculate
the mean, standard deviation, and 5-number summary. Those results are as follows:
Mean: 59.5
Sample standard deviation: 1.98
Five number summary: Min=55, Q1=58, Median=60, Q3=61, Max=63
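
The individual bag totals are not listed in this report, so the sketch below uses a short made-up list purely for illustration; it does not reproduce the class results. It shows one possible way to compute the mean, sample standard deviation, and five-number summary with Python's standard `statistics` module. Software and textbooks use slightly different quartile rules, so Q1 and Q3 may differ a little from a hand calculation.

```python
import statistics

def summarize(values):
    """Return the mean, sample standard deviation, and five-number summary."""
    # "inclusive" treats the data as containing its own minimum and maximum;
    # other quartile conventions can give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample SD (divides by n - 1)
        "five_number": (min(values), q1, median, q3, max(values)),
    }

# HYPOTHETICAL toy values, for illustration only (not the class data).
example_values = [1, 2, 3, 4, 5]
summary = summarize(example_values)
print(summary)
```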

The difference between categorical and quantitative data is that quantitative data consist of
numbers that represent counts or measurements, so doing math with them is meaningful.
Categorical data consist of names or labels rather than measurements, so you cannot do
meaningful math with them. Categorical data include such things as colors, gender, prenatal
care, etc.

You would use histograms, boxplots, stem and leaf plots, and scatterplots for
quantitative data. For categorical data you should use a pie chart or a Pareto chart.

When it comes to calculations, mean and median only make sense for quantitative data. The
mean is the average of all the values in a sample, and therefore only makes sense
when applied to quantitative data. The median represents the middle value of the ordered data
and likewise only makes sense when applied to quantitative data. The best measure of central
tendency to apply to categorical data is the mode. When looking at the colors of candy in a
Skittles bag, you may not be able to find the average color or the median color, but you can
establish which color occurs the most often.
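
As a small illustration of finding the mode of categorical data, the sketch below picks the most frequent color from the class counts reported earlier (the variable names are illustrative):

```python
# Class counts per color, from the class data.
class_counts = {"red": 295, "orange": 291, "yellow": 282, "green": 294, "purple": 265}

# The mode of a categorical variable is the category with the highest count.
modal_color = max(class_counts, key=class_counts.get)
print(modal_color)
```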

99% Confidence Interval estimate for the population proportion of yellow candies

x = 282
n = 1427
z-value for 99% CI = 2.575
Sample proportion = 282/1427 = 0.198
Margin of error: E = 2.575 × sqrt(0.198(0.802)/1427) = 0.027
99% Confidence Interval Estimate: (0.171, 0.225)
A confidence interval for a population proportion estimates, with the specified degree of
confidence, the proportion of a population that has a given characteristic. In relation to the
Skittles, we are 99% confident that the proportion of yellow candies among all Skittles falls
between 0.171 and 0.225.
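
The interval for the proportion of yellow candies can be checked with a few lines of Python. This is a minimal sketch of the usual normal-approximation formula, using the critical value 2.575 from the text; the function name is illustrative. Without intermediate rounding the lower limit comes out at about 0.170 rather than 0.171, because the hand calculation rounds the sample proportion to 0.198 first.

```python
import math

def proportion_ci(successes, n, z):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Yellow candies in the class sample; z = 2.575 is the 99% critical value from the text.
lower, upper = proportion_ci(282, 1427, 2.575)
print(round(lower, 3), round(upper, 3))
```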
95% Confidence Interval estimate for the population mean number of Skittles per bag
n = 24
s = 1.978
Sample mean = 59.458
t-value for 95% CI with 23 degrees of freedom = 2.069
Margin of error: E = 2.069 × 1.978/sqrt(24) = 2.069(0.4038) = 0.835
59.458 + 0.835 = 60.293
59.458 - 0.835 = 58.623
95% Confidence Interval Estimate: (58.623, 60.293)
A confidence interval for a population mean uses sample data to construct an interval that,
with the specified degree of confidence, contains the mean of the population. In this case,
we are 95% confident that the mean number of Skittles per bag, across all bags, is between
58.623 and 60.293.
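
The same arithmetic for the mean can be sketched in Python. The critical value 2.069 is the t value used in the calculation above (23 degrees of freedom); it is passed in as a number rather than looked up, and the function name is illustrative.

```python
import math

def mean_ci(sample_mean, sample_sd, n, t_crit):
    """Confidence interval for a population mean using a given t critical value."""
    margin = t_crit * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Sample statistics from the class data; t_crit = 2.069 is taken from the text.
lower, upper = mean_ci(59.458, 1.978, 24, 2.069)
print(round(lower, 3), round(upper, 3))
```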

Hypothesis tests allow us to test a given claim by stating a null hypothesis and an alternative
hypothesis. Using the given significance level, we decide whether the sample data provide
sufficient evidence to reject the null hypothesis in favor of the alternative.
Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red.
Claim: p = .20
Null (H0): p = .20
Alternative (H1): p ≠ .20
Test Statistic:

$$z_0 = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{0.2067274 - 0.20}{\sqrt{\dfrac{0.20(0.80)}{1427}}} = 0.6353298386 \approx 0.64$$

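P-value = 2(1 - 0.7389) = 0.5222 (two-tailed; 0.7389 is the area to the left of z = 0.64)
Critical values = ±1.96
Fail to reject H0. The P-value is greater than the 0.05 significance level, and the test
statistic falls between the critical values. There is not sufficient evidence to warrant
rejection of the claim that p = .20.
This hypothesis test tells us that, at the 0.05 significance level, the sample data are
consistent with the claim that 20% of all Skittles are red. Failing to reject the null
hypothesis does not prove that the claim is true.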
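
The test of the claim about red candies can be sketched in Python with the standard library. This is a minimal illustration of a two-tailed one-proportion z test, using the class count of 295 red candies out of 1427; the function name is illustrative. Because the test is two-tailed, the P-value is twice the area beyond |z|, which comes out a little above 0.5 here.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test for a claimed population proportion p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    # Two-tailed P-value: twice the area beyond |z| under the standard normal curve.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_proportion_z_test(295, 1427, 0.20)
alpha = 0.05
reject_h0 = p_value < alpha
print(round(z, 2), round(p_value, 4), reject_h0)
```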

Use a 0.01 significance level to test the claim that the mean number of candies in
a bag of Skittles is 55.
Claim: μ = 55
Null (H0): μ = 55
Alternative (H1): μ ≠ 55
Test Statistic:

$$t_0 = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{59.458333 - 55}{1.977683464 / \sqrt{24}} = \frac{4.458333}{0.4036929466} \approx 11.04$$

P-value < 0.0001
Critical values = ±2.807 (t distribution with 23 degrees of freedom)
Reject H0. There is sufficient evidence to warrant rejection of the claim that the mean
number of candies in each bag equals 55.
This hypothesis test tells us that, at the 0.01 significance level, the sample data
contradict the claim that there is a mean of 55 Skittles per bag.
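
The test statistic for the mean can be checked with a short Python sketch. The sample values come from the calculation above; the critical value 2.807 is an assumption taken from a standard t table (two-tailed, 0.01 significance level, 23 degrees of freedom), since the standard library has no t distribution. Run without intermediate rounding, the statistic comes out at roughly 11.04.

```python
import math

def one_sample_t_statistic(sample_mean, sample_sd, n, mu0):
    """Test statistic for a one-sample t test of the claim that the mean equals mu0."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

t_stat = one_sample_t_statistic(59.458333, 1.977683464, 24, 55)

# ASSUMPTION: two-tailed critical value from a t table (alpha = 0.01, df = 23).
T_CRITICAL = 2.807
reject_h0 = abs(t_stat) > T_CRITICAL
print(round(t_stat, 2), reject_h0)
```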
