
Kent Jones

Math 1040 Skittles Term Project

7.16.2015

In this project, I will demonstrate many of the skills and techniques that I have
learned in my Math 1040 introduction to statistics course. Each student in the class
purchased a 2.17-ounce bag of Skittles and recorded the number of candies of each
color in the bag. My bag had:
10 red, 16 orange, 9 yellow, 16 green, and 11 purple, for a personal total of 62
Skittles.
As a class, we had:
237 red, 226 orange, 247 yellow, 263 green, and 224 purple, for a class total of 1197
Skittles.

[Figure: Pie chart, Number of Skittles for each color (Class Data): red 237, orange 226, yellow 247, green 263, purple 224]

To show the proportion of each color within the overall sample gathered by the class,
I have included a pie chart and a Pareto chart of the number of candies of each color.
Keep in mind that the class data includes my own bag of Skittles as well.

[Figure: Pareto chart, Number of Class Skittles by Color of Class Skittles (vertical axis from 200 to 270): green 263, yellow 247, red 237, orange 226, purple 224]

Do you notice the difference between these two graphs? The pie chart makes it
difficult to compare numbers, because it is hard to see which slice is larger than
another, while the Pareto chart lets you easily see which color the class had the
most of. These charts reflect about what I was expecting to see. I didn't expect any
one color of Skittle to stand out, or for one color to make up a much larger portion
than the rest.
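The ordering that a Pareto chart uses can be sketched in a few lines of Python. This is only an illustration and was not part of the original project work; the function name `pareto_order` is made up for this example, and the counts are the class totals reported in this project.

```python
# Class color counts as reported in this project
# (yellow = 247, the value shown on the class charts).
class_counts = {"red": 237, "orange": 226, "yellow": 247, "green": 263, "purple": 224}


def pareto_order(counts):
    """Return (color, count, cumulative share) rows, most common color first."""
    total = sum(counts.values())
    rows = []
    running = 0
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for color, count in ordered:
        running += count
        rows.append((color, count, running / total))
    return rows


# Print one row per color: name, count, cumulative percentage of all Skittles.
for color, count, cumulative in pareto_order(class_counts):
    print(f"{color:<7}{count:>5}{cumulative:>8.1%}")
```

Sorting the bars from largest to smallest is what makes the most common color easy to spot, which a pie chart does not do on its own.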

[Figure: Pie chart, Number of Skittles for each color (My Data): red 10, orange 16, yellow 9, green 16, purple 11]

I expected this because we are looking at a large sample of Skittles. In my statistics
course, I have been taught that the larger the sample, the closer the results tend to
be to the true population values. Once you start looking at smaller samples, you begin
to notice some interesting observations. In my sample, I had only 9 yellow Skittles,
which was my smallest color group, but in the class data we have 247 yellow Skittles,
the second largest color group. This goes to show that larger samples give more
accurate and proportionate numbers.

[Figure: Pareto chart, Number of My Skittles by Color of My Skittles (vertical axis from 0 to 18): green 16, orange 16, purple 11, red 10, yellow 9]

To help show the point that I'm trying to make, I have also included a pie chart and a
Pareto chart of the number of candies of each color from only my personal data. If
you compare my graphs to the class graphs, you'll notice that my numbers seem
really disproportionate and unequal. One could begin to think that the Skittles
company CEO's two favorite colors are orange and green. But once you see the
class graphs, you'll realize that the numbers are much closer together.
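To put a number on how much closer together the class proportions are, the color shares of the two samples can be compared. The sketch below is only an illustration using the counts reported in this project; the helper names `proportions` and `spread` are made up for this example.

```python
# Counts reported in this project (yellow = 247, as shown on the class charts).
class_counts = {"red": 237, "orange": 226, "yellow": 247, "green": 263, "purple": 224}
my_counts = {"red": 10, "orange": 16, "yellow": 9, "green": 16, "purple": 11}


def proportions(counts):
    """Convert raw color counts into shares of the sample total."""
    total = sum(counts.values())
    return {color: count / total for color, count in counts.items()}


def spread(counts):
    """Gap between the largest and the smallest color share."""
    shares = proportions(counts).values()
    return max(shares) - min(shares)


for name, counts in (("my bag", my_counts), ("class", class_counts)):
    shares = proportions(counts)
    line = ", ".join(f"{color} {share:.1%}" for color, share in shares.items())
    print(f"{name}: {line} (spread {spread(counts):.1%})")
```

In a single bag the gap between the most and least common colors is roughly 11 percentage points, while in the class data it is about 3, which matches the impression given by the graphs.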

[Figure: Histogram, Number of Skittles per 2.17-ounce bag; horizontal axis: amount of Skittles per 2.17-ounce bag; vertical axis: frequency (bags of Skittles)]

To help break this data down and make it easier to understand, I will go into depth on
specific aspects of the data. Notice, however, that the color data is categorical data
(also referred to as qualitative data), summarized as frequencies. The Skittles have
been grouped together by color, not by weight or another measurement. If we were
comparing measurable data, such as weight, that data would be quantitative. The
reason it is not unfair or wrong for the Skittles company to sell some bags that have
63 Skittles in them while others have only 55 is that each bag is filled by weight, not
by count. Knowing the difference between these types of data is important for
understanding the validity of the data and the statistics.
Above you will see a histogram of the number of Skittles per 2.17-ounce bag. It
visually indicates how many Skittles were in each bag. You'll notice that more bags
had 59 Skittles than any other count. Two students got lucky and had 63 Skittles, and
luckily I was just below those two students at 62 Skittles. But there were two
unfortunate individuals who had only 55 Skittles. You'll notice that the histogram
doesn't have a bell shape to it, and it is not normally distributed. Remember, some of
these Skittles could weigh more than others, and could be larger than others. So
although it may seem unfair to get fewer Skittles than those who got a lot, every bag
has been weighed to 2.17 ounces, and the bags are therefore equal.
I've also included quantitative summary statistics below. They give us vital
information about the data, such as the mean, median, and mode, as well as the
standard deviation and the minimum and maximum values. Using these statistics, I
was able to construct a box-and-whisker plot, shown below. It shows that the data is
skewed to the left, and is therefore negatively skewed. It also indicates that the
middle half of the class got 59 to 62 Skittles.

|----------|--|----|--|
55         59 60   62 63
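The skew can be read straight off the five numbers behind the box plot. The short sketch below is only an illustration; the helper names are made up for this example, and the 1.5 x IQR outlier fences are a common textbook convention that the original project does not mention.

```python
# Five-number summary taken from the box plot in this project.
five_number = {"min": 55, "q1": 59, "median": 60, "q3": 62, "max": 63}


def whisker_lengths(summary):
    """Return (left whisker, right whisker) lengths of the box plot."""
    return summary["q1"] - summary["min"], summary["max"] - summary["q3"]


def outlier_fences(summary):
    """Return the usual 1.5 * IQR fences (an assumed convention, see lead-in)."""
    iqr = summary["q3"] - summary["q1"]
    return summary["q1"] - 1.5 * iqr, summary["q3"] + 1.5 * iqr


left, right = whisker_lengths(five_number)
low_fence, high_fence = outlier_fences(five_number)
print(f"left whisker = {left}, right whisker = {right}")
print(f"fences = {low_fence} to {high_fence}")
```

The left whisker is four times as long as the right one, which is the visual sign of a left (negative) skew. Under the assumed 1.5 x IQR rule, even the 55-Skittle bags would not count as outliers.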

Statistics also uses a great tool called confidence interval estimates. These are
used to estimate a population parameter with a range of values. The range has a high
probability of including the unknown population parameter. To do this, however, you
must choose, or be given, a confidence level for your estimate.
But specific requirements must be met. First, the sample must be a simple random
sample, and ours is. Second, the conditions for the binomial distribution must be
satisfied:

- There is a fixed number of trials. Our 20 bags of Skittles contain a fixed total
  of 1197 candies.
- Trials are independent. One bag of Skittles will not change the number or color
  of Skittles in the next bag.
- There are two categories of outcomes: success, where a Skittle is orange, or
  failure, where it is some other color.
- The probabilities remain constant for each trial. If you were to get your own bag
  of Skittles, the probability of getting an orange Skittle would be the same as in
  the example shown here.

Third, there must be at least five successes and five failures. Our Skittles sample
meets all of these requirements.
For example, I will construct a 95% confidence interval to estimate the true
proportion of orange Skittles. You'll notice on the drawn-in graph below that 95% of
values fall into the center of the graph. Each tail of the graph holds 2.5%, making
up the remaining 5%. Work is shown below.

[Figure: graph and work for the 95% confidence interval for the proportion of orange Skittles]

This shows that we are 95% confident that the interval from 0.167 to 0.211 contains
the true proportion of orange Skittles. This means that if we were to select many
different samples of 20 bags of Skittles and construct the corresponding confidence
intervals, about 95% of them would actually contain the value of the population
proportion p.
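The interval can be reproduced with a short Python sketch. This is only an illustration of the usual normal-approximation formula for a proportion, not a copy of the original handwritten work; the function name `proportion_ci` is made up for this example, and the inputs are the 226 orange Skittles out of the 1197 class total reported in this project.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Confidence interval for a proportion: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    # Two-tailed critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 226 orange Skittles out of 1197 in the class sample.
low, high = proportion_ci(226, 1197, 0.95)
print(f"95% CI for the proportion of orange Skittles: {low:.3f} to {high:.3f}")
```

Rounded to three decimal places, the result agrees with the interval of 0.167 to 0.211 stated above.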
Also, to construct a 98% confidence interval estimate for the standard deviation of
the number of candies per bag, specific requirements must be met as well. The sample
must be a simple random sample, and the population must be normally distributed,
regardless of the sample size. Looking at our histogram, our sample is roughly
normally distributed, so we can assume the population is as well. The number of bags
is still 20, and the standard deviation is still 2.28, but my alpha changes to 0.02.
Work is shown below.

[Figure: work for the 98% confidence interval for the standard deviation]

This shows that we are 98% confident that the interval from 1.652 to 3.597 contains
the true standard deviation of the number of Skittles per bag.
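The interval for the standard deviation can be checked with the chi-square formula. The sketch below is only an illustration, not the original work. The two critical values are assumptions taken from a standard chi-square table for 19 degrees of freedom (0.01 in each tail), and the function name `sigma_ci` is made up for this example.

```python
from math import sqrt

# Assumed chi-square table values for df = 19 with 0.01 in each tail.
CHI2_RIGHT = 36.191  # area 0.01 to the right
CHI2_LEFT = 7.633    # area 0.99 to the right


def sigma_ci(n, s, chi2_left, chi2_right):
    """Confidence interval for sigma: sqrt((n - 1) * s^2 / chi2) at each critical value."""
    df = n - 1
    lower = sqrt(df * s**2 / chi2_right)
    upper = sqrt(df * s**2 / chi2_left)
    return lower, upper


# n = 20 bags, sample standard deviation s = 2.28.
low, high = sigma_ci(20, 2.28, CHI2_LEFT, CHI2_RIGHT)
print(f"98% CI for the standard deviation: {low:.3f} to {high:.3f}")
```

Note that the larger critical value gives the lower limit, because it sits in the denominator.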
As another example, I will construct a 99% confidence interval estimate for the true
mean number of candies per bag. The requirements for this problem are the same as
for the previous confidence interval estimate. In this example alpha = 0.01, and the
degrees of freedom are one less than the number of bags of Skittles, which in our
case is 19. The sample standard deviation is 2.28, and the sample mean (x-bar) is
59.9. Work is shown below.

This shows that we are 99% confident that the interval from 58.441 to 61.359 contains
the true mean number of Skittles per bag.
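The interval for the mean follows from the t-distribution formula. The sketch below is only an illustration, not the original work. The critical value 2.861 is an assumption taken from a standard t table (19 degrees of freedom, 0.005 in each tail), and the function name `mean_ci` is made up for this example.

```python
from math import sqrt

# Assumed t table value for df = 19 with 0.005 in each tail (99% confidence).
T_CRITICAL = 2.861


def mean_ci(x_bar, s, n, t_critical):
    """Confidence interval for a mean: x_bar +/- t * s / sqrt(n)."""
    margin = t_critical * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Sample mean 59.9, sample standard deviation 2.28, n = 20 bags.
low, high = mean_ci(59.9, 2.28, 20, T_CRITICAL)
print(f"99% CI for the mean number of Skittles per bag: {low:.3f} to {high:.3f}")
```

The margin of error here is about 1.459, so the interval is 59.9 minus and plus that amount.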
I'll show and explain one more tool used in statistics, called a hypothesis test. A
hypothesis is a claim or statement about a property of a population, and a hypothesis
test is a procedure for testing such a claim. When creating a hypothesis test, two
hypotheses are written, and the form of the second one tells us whether our graph has
one tail or two tails. The first is called the null hypothesis, and is usually
represented by H0. It is always written as being equal to a given or claimed value.
The second is the alternative hypothesis, and is usually represented by HA. HA is
written as being greater than, less than, or not equal to the given or claimed value.
However, before I begin this test, specific requirements must be met. First, the
sample must be a simple random sample, and ours meets this requirement. Second, the
sample size must be over 30, or the population must be normally distributed. Since
our sample size is 20 bags of Skittles, we need the population values to be normally
distributed. Our histogram roughly reflects a normally distributed sample, so we can
assume the population is normally distributed. With these requirements satisfied, we
can proceed with a problem.
For example, to use a 0.01 significance level to test the claim that 20% of all
Skittles are purple, I would form these two hypotheses. You'll notice that because my
H0 is p = 0.20 and my HA is p ≠ 0.20, my graph will have two tails. If HA were
greater than or less than, it would have only one tail.

[Figure: graph and work for the hypothesis test of the claim that 20% of all Skittles are purple]
Because the test statistic, z, falls within the central 99% of the graph and not in
either tail, we fail to reject the claim that 20% of the Skittles are purple. In simple
words, this means that there is not sufficient evidence to warrant rejection of the
claim that 20% of all Skittles are purple.
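The decision can be checked with a short Python sketch of the two-tailed z test for a proportion. This is only an illustration, not the original work; the function name `proportion_z_test` is made up for this example, and the inputs are the 224 purple Skittles out of the 1197 class total reported in this project.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha):
    """Two-tailed z test of H0: p = p0 against HA: p != p0."""
    p_hat = successes / n
    # Test statistic uses the claimed proportion p0 in the standard error.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, critical, p_value


# 224 purple Skittles out of 1197, claimed proportion 0.20, alpha = 0.01.
z, critical, p_value = proportion_z_test(224, 1197, 0.20, 0.01)
reject_h0 = abs(z) > critical
print(f"z = {z:.3f}, critical values = +/-{critical:.3f}, p-value = {p_value:.3f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The test statistic is well inside the two critical values, so the sketch reaches the same conclusion: fail to reject H0.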

As another example of hypothesis testing, suppose a 0.05 significance level is used
to test the claim that the mean number of candies in a bag is 58. Work for this
example is shown below.

[Figure: graph and work for the hypothesis test of the claim that the mean number of candies in a bag is 58]
You'll notice that because the test statistic falls within the red shaded region,
which is known as the critical region, we reject H0. Put in simple words, this means
that there is sufficient evidence to warrant rejection of the claim that the mean
number of candies in a bag is 58. You'll notice that although our simple random
sample size is less than 30, our population is normally distributed, allowing us to
test this hypothesis.
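This test can also be sketched in Python. The sketch is only an illustration, not the original work. It assumes a two-tailed test (H0: mean = 58, HA: mean is not 58), and the critical value 2.093 is an assumption taken from a standard t table (19 degrees of freedom, 0.025 in each tail). The function name `one_sample_t` is made up for this example.

```python
from math import sqrt

# Assumed t table value for df = 19 with 0.025 in each tail (alpha = 0.05, two-tailed).
T_CRITICAL = 2.093


def one_sample_t(x_bar, mu0, s, n):
    """Test statistic for a one-sample t test: (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / sqrt(n))


# Sample mean 59.9, claimed mean 58, sample standard deviation 2.28, n = 20 bags.
t_stat = one_sample_t(59.9, 58, 2.28, 20)
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.3f}, critical values = +/-{T_CRITICAL}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The test statistic is larger than the critical value, so it lands in the critical region and H0 is rejected.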
Although hypothesis testing is very useful, you must be very careful to avoid two
types of errors. These errors are commonly referred to as Type I and Type II errors.
A Type I error occurs when the null hypothesis is true but is rejected anyway, and
likewise, a Type II error occurs when the null hypothesis is false but is not rejected.
These two problems create a balancing game in hypothesis testing. We need to make
sure that we don't make the significance level too small or too large. One successful
way to reduce the chance of both of these errors, however, is to increase the sample
size.
In conclusion, to be able to perform interval estimates and hypothesis tests for
population means, the data used must meet specific requirements. There are also three
requirements for setting up the hypotheses. First, identify the specific claim or
hypothesis to be tested, and put it into symbolic form. Second, give the symbolic
form that must be true when the original claim is false. Third, of the two symbolic
expressions obtained so far, let the alternative hypothesis (as noted above, this was
HA) be the one not containing equality, so that HA uses the symbol <, >, or ≠. As
you can tell above, all of the samples that were tested met these conditions.
Using the skills I have learned in this class, I have shown that in small samples,
numbers may seem misleading, but once we take a much larger sample into
consideration, we can see much more of what is expected. I am grateful for my
education in statistics, and for the knowledge to be able to apply these skills to an
everyday application. Although this is just a project about the color of pieces of
candy, it gave me an opportunity to show which graphs make numbers easy to compare,
the distribution of a histogram, the difference between qualitative and quantitative
data, the skewness of a box-and-whisker plot, and even explanations and examples of
confidence interval estimates and hypothesis tests, as well as Type I and Type II
errors.
Total Skittles Per Bag

Mean                 59.9
Median               60
Mode                 59
Standard Deviation   2.28
Sample Variance      5.19
Range                8
Minimum              55
Maximum              63
Sum                  1197
Count                20

5-Number Summary

Min                  55
Q1                   59
Q2                   60
Q3                   62
Max                  63
