
Eleshia Peterson

Professor Maw
Math 1040
6 August 2015

Skittles Data 2015 Summer Math 1040 Online


Each student in our Math 1040 class was given the task of purchasing one 2.17 oz
bag of Original Skittles. We then separated the candies by color and tallied how
many of each color we had. Once we had our totals, we submitted them to
Professor Maw, who compiled them into a single class sample. The purpose of the
project is to tie together all of the concepts we have learned throughout the semester.
Below is the chart listing the candy counts per color for each bag, along with the
class totals.

ID         Red   Orange   Yellow   Green   Purple   Total Candies in Each Bag
1           12       13       10      10       16       61
2           13       11       13      10       13       60
3           17       15       12       9       16       69
4            5       14       14      18       12       63
5           15       12       12      14        6       59
6           13       10       14      11        9       57
7           12       17        6      10       18       63
8            7       10       18      22        5       62
9            7       10       18      22        5       62
10          13       15       11       9       12       60
11          14       20        8       8       11       61
12          12       11       10      10       17       60
13          12        9        6      13       14       54
14          14        9       12      16       11       62
15           9        6       10       8        5       38
16           9       13        8      14       17       61
17          13        7       15      16        8       59
18          15       13       13      12       13       66
19          12       14       12      11       10       59
20          10       14       13      15       10       62
20 bags    234      243      235     258      228     1198

Below is my bag of candies compared with the class totals.

                Green   Orange   Yellow   Red   Purple   Total
My Bag             15       14       13    10       10      62
Class Counts      258      243      235   234      228    1198

I was actually surprised to see the variance in colors in my own bag of candies.
Since I was little, I have assumed that the bags were close to even across all of
the colors, so that is what I expected to see (a 20% share for each color). I noticed
in the class counts, though, that one bag was about 20 candies short of the mean
total (38 candies against a mean of 59.9). I am not sure how or why that happened.
That bag pulls down the overall totals, which in turn affects the calculations for
the mean, standard deviation, etc. However, the distribution of colors in the class
data does mirror my bag of candies in that green is the high extreme and purple is
the low extreme (purple tied with red in my bag). Green seems to be favored by the
Skittles Company.
For the overall sample, I assumed that each color would be close to 20%. Because
there are five colors and the percentages must add up to 100%, an even split would
give each color 20%. However, the actual percentages differ slightly from that
guess. Listed below are the guesses and actual percentages.
Color     Guess    Actual
Purple    20 %     19.03 %
Red       20 %     19.53 %
Yellow    20 %     19.62 %
Orange    20 %     20.28 %
Green     20 %     21.54 %
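As a quick check on the "Actual" column, the percentages can be recomputed from the class color totals. This is only a sketch: the dictionary name `class_counts` is an assumption made for illustration, and the counts are copied from the class totals row of the first chart.

```python
# Class color totals copied from the "20 bags" row of the class chart.
class_counts = {"Red": 234, "Orange": 243, "Yellow": 235, "Green": 258, "Purple": 228}

total = sum(class_counts.values())  # all candies in the 20 bags

# Share of the sample made up by each color, as a percentage rounded to two decimals.
percentages = {
    color: round(100 * count / total, 2) for color, count in class_counts.items()
}

# Print from the smallest share to the largest, matching the order of the table.
for color, pct in sorted(percentages.items(), key=lambda item: item[1]):
    print(f"{color:<8}{pct:>6.2f} %")
```

Each color lands within about 1.5 percentage points of the 20% guess.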

In order to make the correct charts throughout the project, we needed to identify
what type of sample we were working with. This is a cluster sample. It is not a
simple random sample, because the candies were selected a whole bag at a time
rather than each candy being chosen individually with an equal chance of
selection. The population is all Original Skittles made.

Below are a Pareto chart and a pie chart of the sample data to better visualize
the amounts of each color in our sample.
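A Pareto chart orders the categories from most to least frequent and usually tracks the running (cumulative) percentage. The sketch below rebuilds that ordering as text from the class color totals; the names `class_counts` and `pareto_rows` and the text bars are illustrative assumptions, not the charting tool used for the project.

```python
# Class color totals copied from the class chart.
class_counts = {"Red": 234, "Orange": 243, "Yellow": 235, "Green": 258, "Purple": 228}
total = sum(class_counts.values())

# Pareto order: most frequent color first.
ordered = sorted(class_counts.items(), key=lambda item: item[1], reverse=True)

# Build (color, count, cumulative percent) rows.
pareto_rows = []
running = 0
for color, count in ordered:
    running += count
    pareto_rows.append((color, count, round(100 * running / total, 1)))

# Text version of the chart: one '#' per 10 candies.
for color, count, cum_pct in pareto_rows:
    bar = "#" * (count // 10)
    print(f"{color:<8}{count:>4}  {bar:<26} {cum_pct:>5.1f} % cumulative")
```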

To get an even better idea of what is happening in our sample, I created the
next two charts, which required a bit more calculating. In order to create a
histogram and boxplot of the total candies per bag, we had to calculate the
following statistics for our sample:
Mean: 59.9
Standard Deviation: 6
5-Number Summary: 38, 59, 61, 62, 69
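These statistics can be reproduced from the 20 bag totals in the class chart. The sketch below assumes the quartiles are found as the medians of the lower and upper halves of the sorted data, which is one common textbook method (other quartile rules exist); the variable names are illustrative.

```python
import statistics

# Total candies in each of the 20 bags, copied from the class chart.
bag_totals = [61, 60, 69, 63, 59, 57, 63, 62, 62, 60,
              61, 60, 54, 62, 38, 61, 59, 66, 59, 62]

mean = statistics.mean(bag_totals)
stdev = statistics.stdev(bag_totals)  # sample standard deviation (divides by n - 1)

# Five-number summary: min, Q1, median, Q3, max.
ordered = sorted(bag_totals)
half = len(ordered) // 2
lower, upper = ordered[:half], ordered[half:]  # n is even, so the halves split cleanly
five_number = (
    ordered[0],
    statistics.median(lower),
    statistics.median(ordered),
    statistics.median(upper),
    ordered[-1],
)

print(f"mean = {mean:.1f}, standard deviation = {stdev:.1f}")
print("five-number summary:", five_number)
```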

The shape of the distribution is skewed to the left on both graphs above. This is
about what I expected to see after we had analyzed the data in the pie chart and
Pareto chart; however, I do not believe that is normal. I believe something may have
gone awry, such as candies being eaten before being counted, a defective bag, etc.
That being said, I believe that the outlier causing the skew (the bag of 38 candies)
should be thrown out to show a more accurate spread of the data. My one bag of 62
candies is in agreement with the overall data from the sample of 20 bags: it falls
within 1 standard deviation of the mean (59.9 ± 6).
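One way to put numbers on "within 1 standard deviation" and on how unusual the smallest bag is would be a z-score, the distance from the sample mean measured in standard deviations. This is a sketch under that assumption; the helper `z_score` is a hypothetical name, and whether a distant bag should actually be discarded remains a judgment call.

```python
import statistics

# Total candies in each of the 20 bags, copied from the class chart.
bag_totals = [61, 60, 69, 63, 59, 57, 63, 62, 62, 60,
              61, 60, 54, 62, 38, 61, 59, 66, 59, 62]

mean = statistics.mean(bag_totals)
stdev = statistics.stdev(bag_totals)  # sample standard deviation


def z_score(value):
    """Number of sample standard deviations that value lies from the sample mean."""
    return (value - mean) / stdev


my_bag = 62
smallest_bag = min(bag_totals)

print(f"my bag ({my_bag}): z = {z_score(my_bag):.2f}")
print(f"smallest bag ({smallest_bag}): z = {z_score(smallest_bag):.2f}")
```

A z-score between -1 and 1 means the bag is within one standard deviation of the mean, while the smallest bag sits more than three standard deviations below it.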
As you can tell, the data we are working with includes both quantitative and
categorical data. Quantitative data consists of numbers representing measurements
or counts, such as weight in pounds, length in inches, or time in seconds.
Categorical data is separated into different categories, such as the color of a
candy, yes or no, gender, etc. Histograms, stem-and-leaf plots, and dot plots are
types of graphs that make sense for quantitative data. For quantitative data it
doesn't make sense to use bar graphs or pie charts. The reason is that quantitative
data is measured in values or counts and expressed as numbers, so it wouldn't
really make sense to make a pie chart with only numbers; there would need to be
some sort of category name to understand what each slice of the pie chart means.
For categorical data, pie charts and bar graphs make sense. Histograms,
stem-and-leaf plots, and dot plots, on the other hand, do not make sense for
categorical data. A graph of categorical data must be able to show categories,
instead of a spread of numbers, on at least one axis or as the slices of a pie
chart. For instance, it isn't possible to plot categorical data on a stem-and-leaf
plot, because it has no place for categorical information, only numerical
information.
I am now going to construct confidence intervals for the sample. The purpose of a
confidence interval is to give us a range of values for an estimated population
parameter instead of a point estimate, or just one value. A confidence interval
gives us a more honest picture of where the population parameter is located,
because it shows how much uncertainty surrounds the estimate. If the experiment
were done over and over again, the intervals calculated this way would contain the
true population parameter most of the time. The confidence level tells us how often
that happens: with a 95% confidence level, about 95% of the intervals built from
repeated samples would contain the true value. The most common confidence levels
are 99%, 95%, and 90%. So if we calculated a confidence interval using a 95%
confidence level, we could say that we are 95% confident that the population mean
is within the range of (as an example) 120 to 150.
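Construct a 99% confidence interval for the true proportion of yellow candies.

\[ E = z_{\alpha/2} \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}} \]

\[ \hat{p} \pm E = 0.1962 \pm z_{0.005} \sqrt{\frac{(0.1962)(1-0.1962)}{1198}}, \qquad z_{0.005} \approx 2.575 \]

\[ 0.167 < p < 0.226 \]

I am 99% confident that the true proportion of yellow candies in a bag of Skittles
falls between 0.167 and 0.226.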
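The interval for the proportion of yellow candies can be recomputed as a check. This sketch assumes the usual normal-approximation interval for a proportion and gets the 99% critical value from Python's standard normal distribution rather than from a printed table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

yellow = 235      # yellow candies in the class sample
n = 1198          # all candies in the class sample
confidence = 0.99

p_hat = yellow / n  # sample proportion, about 0.1962

# Two-sided critical value z_(alpha/2) for the chosen confidence level.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error E = z * sqrt(p_hat * (1 - p_hat) / n).
margin = z * sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(f"z = {z:.3f}, E = {margin:.4f}")
print(f"{low:.3f} < p < {high:.3f}")
```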
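Construct a 95% confidence interval estimate for the true mean number of candies per bag.

\[ E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \]

\[ E = 2.093 \cdot \frac{6}{\sqrt{20}} \]

\[ E = 2.81 \]

\[ 57.09 < \mu < 62.71 \]

I am 95% confident that the interval 57.09 to 62.71 contains the true value of μ
per Skittles bag.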
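The same interval can be rebuilt from the 20 bag totals. Python's standard library has no t distribution, so this sketch takes the critical value 2.093 (19 degrees of freedom, 95% confidence) from the calculation above as a given; the constant name `T_CRITICAL` is an assumption for illustration. The standard deviation is computed from the data rather than rounded to 6, so the margin differs slightly in the third decimal.

```python
import statistics
from math import sqrt

# Total candies in each of the 20 bags, copied from the class chart.
bag_totals = [61, 60, 69, 63, 59, 57, 63, 62, 62, 60,
              61, 60, 54, 62, 38, 61, 59, 66, 59, 62]

# Assumed from a t table: t_(alpha/2) for 95% confidence with n - 1 = 19 degrees of freedom.
T_CRITICAL = 2.093

n = len(bag_totals)
x_bar = statistics.mean(bag_totals)
s = statistics.stdev(bag_totals)  # sample standard deviation

# Margin of error E = t * s / sqrt(n).
margin = T_CRITICAL * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(f"E = {margin:.2f}")
print(f"{low:.2f} < mu < {high:.2f}")
```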

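Construct a 98% confidence interval estimate for the standard deviation of the number of candies per bag.

\[ \sqrt{\frac{(n-1)\,s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)\,s^2}{\chi^2_L}} \]

\[ \sqrt{\frac{(20-1)\,6^2}{36.191}} < \sigma < \sqrt{\frac{(20-1)\,6^2}{7.633}} \]

\[ 4.3 < \sigma < 9.5 \]

Based on the results, I have 98% confidence that the limits of 4.3 and 9.5 contain
the true value of σ per Skittles bag.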
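The limits for the standard deviation can be checked the same way. This sketch takes the two chi-square critical values (36.191 and 7.633, for 19 degrees of freedom at 98% confidence) and the rounded standard deviation of 6 directly from the calculation above, since the standard library has no chi-square distribution; the constant names are assumptions for illustration.

```python
from math import sqrt

n = 20   # number of bags
s = 6    # sample standard deviation, rounded as in the calculation above

# Assumed from a chi-square table with n - 1 = 19 degrees of freedom, 98% confidence.
CHI2_RIGHT = 36.191
CHI2_LEFT = 7.633

# Limits for sigma: sqrt((n - 1) * s^2 / chi-square critical value).
low = sqrt((n - 1) * s**2 / CHI2_RIGHT)
high = sqrt((n - 1) * s**2 / CHI2_LEFT)

print(f"{low:.1f} < sigma < {high:.1f}")
```

The larger critical value gives the lower limit because it sits in the denominator.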

Reflection:
This project helped me see how to apply these concepts to real-life events. As
important as it is to apply them to Skittles, I have also been able to identify
areas in my current job where I can apply them. I currently work for a company that
staffs medical professionals temporarily to help out facilities in need, so that
patients can receive the treatment they need. We have multiple ways of tracking our
success; however, I am part of one specific tracking team called "fall off." This
refers to the number of coverage days that we had to cancel due to whatever
circumstance arose. We, of course, track the total number of coverage days lost
each day, week, month and year, along with the percentage against coverage days
added and why the coverage was lost (reasons such as provider cancelled, facility
found alternate coverage, credentialing, etc.).
This course has shown me that we can take that a few steps further by finding our
outliers, what the common theme seems to be for cancelled coverage, etc. This will
be especially helpful when coaching our recruiters, because we can show them what
the trends are and how to prevent them from recurring. Although some circumstances
will still be unavoidable, even the smallest outlier that gets adjusted or thrown
out will make a big difference in the grand scheme of things. Once we have these
additional statistics down, we can start hypothesizing to see whether our annual
goal is even attainable given the current trend. If not, we will be able to see
what needs to change in order to hit that goal.
I understand now that there is a reason why some bottles of water hold a little
more or a little less than the ounces listed (standard deviation). I am able to see
how to catch the one-offs and how they can cause issues in the overall picture.
Overall, I see statistics everywhere now. They are unavoidable and so useful,
especially in the business world!
