7.16.2015
In this project, I will demonstrate many of the skills and techniques that I
have learned in my Math 1040 introduction to statistics course. Each student in
the class purchased a 2.17 ounce bag of Skittles and recorded the number of
candies of each color in the bag. My bag had:
10 red, 16 orange, 9 yellow, 16 green, and 11 purple, putting my personal total of
Skittles at 62.
As a class, we had:
237 red, 226 orange, 247 yellow, 263 green, and 224 purple, putting the class total
of Skittles at 1197.
[Figure: pie chart and Pareto chart of the class Skittles counts by color (Red,
Orange, Yellow, Green, Purple). Pareto bar heights: 263, 247, 237, 226, 224.]
Do you notice the difference between these two graphs? The pie chart makes it
difficult to compare numbers, because it's hard to see which slice is larger than
another, while the Pareto chart allows you to easily see which color the class had
the most of. These charts reflect about what I was expecting to see. I didn't expect
any color of Skittle to stand out, or one color to make up a much larger portion
than the rest.
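As a rough illustration of why the Pareto chart is easier to read, the sketch below puts the class counts in Pareto order (largest first) and prints each color's share. It assumes yellow is 247, the value that makes the five counts add up to the stated class total of 1197; the variable names are only for illustration.

```python
# Class counts by color. Yellow is taken as 247 (assumption: the value that
# makes the five counts sum to the stated class total of 1197).
class_counts = {"Red": 237, "Orange": 226, "Yellow": 247, "Green": 263, "Purple": 224}

total = sum(class_counts.values())

# A Pareto chart orders the categories from most frequent to least frequent.
pareto_order = sorted(class_counts.items(), key=lambda item: item[1], reverse=True)

for color, count in pareto_order:
    # Print the color, its count, and its share of the class total.
    print(f"{color:<7}{count:>5}{count / total:>8.1%}")
```

Sorting first is what makes the largest category obvious at a glance, which is harder to judge from slice angles in a pie chart.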
Kent Jones
[Figure: pie chart and Pareto chart of my own bag. Vertical axis: Number of My
Skittles (0 to 18); horizontal axis: Color of My Skittles (Yellow, Red, Purple,
Orange, Green).]
To help show the point that I'm trying to make, I have also included a pie chart
and a Pareto chart for the number of candies of each color from only my personal
data. If you compare my graphs to the class graphs, you'll notice that my numbers
seem really disproportionate and unequal. One could begin to think that the
Skittles company CEO's two favorite colors are orange and green. But once you see
the class graphs, you'll realize that the numbers are much closer together.
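One way to put a number on "much closer together" is to compare the gap between the most common and least common color in each data set, as a proportion of the total. This is only a sketch: the helper name proportion_spread is made up for illustration, and yellow is again assumed to be 247 in the class data so that the counts sum to 1197.

```python
# Counts from my own bag and from the whole class.
my_counts = {"Red": 10, "Orange": 16, "Yellow": 9, "Green": 16, "Purple": 11}
# Yellow = 247 is an assumption chosen to match the stated class total of 1197.
class_counts = {"Red": 237, "Orange": 226, "Yellow": 247, "Green": 263, "Purple": 224}


def proportion_spread(counts):
    """Return the gap between the largest and smallest color proportions."""
    total = sum(counts.values())
    proportions = [count / total for count in counts.values()]
    return max(proportions) - min(proportions)


my_spread = proportion_spread(my_counts)
class_spread = proportion_spread(class_counts)

print(f"My bag:      {my_spread:.3f}")
print(f"Whole class: {class_spread:.3f}")
```

The single bag shows a much wider gap between colors than the pooled class sample does.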
[Figure: histogram of the number of Skittles per bag (vertical axis: Frequency);
labeled values 59, 60, 62, 63.]
Statistics also uses a great tool called confidence interval estimates. These
are used to estimate an unknown population parameter with a range of values. This
range has a high probability of including the unknown population parameter. To do
this, however, you must choose, or be given, a confidence level for your estimate.
But we must have specific requirements be met. First, the sample must be a simple
random sample, and ours is. Second, the conditions for the binomial distribution
must be satisfied.
Third, there must be at least five successes and at least five failures. Our
Skittles sample meets all of these requirements.
For example, we can construct a 95% confidence interval to estimate the true
proportion of orange Skittles. You'll notice on the drawn-in graph below that 95% of
values fall into the center of the graph. The two tails of the graph hold the
remaining 5%, with 2.5% in each tail. Work is shown below.
_____________________
This shows that with 95% confidence, the interval from 0.167 to 0.211 contains the
true proportion of orange Skittles. This means that if we were to select many
different samples of 20 Skittles bags and construct the corresponding confidence
intervals, about 95% of them would actually contain the value of the population
proportion p.
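The interval above can be reproduced with a short calculation. This is a sketch that assumes the usual normal-approximation interval for a proportion, using the class counts of 226 orange Skittles out of 1197; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

orange = 226       # orange Skittles in the class sample
n = 1197           # total Skittles in the class sample
confidence = 0.95

p_hat = orange / n  # sample proportion of orange Skittles

# Two-tailed critical value: 2.5% in each tail for 95% confidence (about 1.96).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error for a proportion under the normal approximation.
margin = z * sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(f"{lower:.3f} to {upper:.3f}")
```

Rounded to three decimal places, the limits agree with the interval 0.167 to 0.211 stated above.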
Also, to construct a 98% confidence interval estimate for the standard
deviation of the number of candies per bag, specific requirements must be met as
well. The sample must be a simple random sample, and the population must be
normally distributed, regardless of the sample size. Looking at our histogram, our
sample is roughly normally distributed, so we can assume the population is as well.
The number of bags is still 20, and the standard deviation is still 2.28, but my
alpha changes to 0.02. Work is shown below.
________________
This shows with 98% confidence that the interval from 1.652 to 3.597 contains
the true standard deviation of the number of Skittles per bag.
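The same limits can be checked with the chi-square interval for a standard deviation. In this sketch the two critical values are hard-coded from a standard chi-square table for 19 degrees of freedom (an assumption about which table values were used in the original work), and the variable names are illustrative.

```python
from math import sqrt

n = 20        # number of bags
s = 2.28      # sample standard deviation of Skittles per bag
df = n - 1    # degrees of freedom

# Chi-square critical values for df = 19 and alpha = 0.02 (0.01 in each tail),
# taken from a standard chi-square table.
chi2_right = 36.191
chi2_left = 7.633

# The larger critical value gives the lower limit, and the smaller gives the upper.
lower = sqrt(df * s**2 / chi2_right)
upper = sqrt(df * s**2 / chi2_left)
print(f"{lower:.3f} to {upper:.3f}")
```

The interval is not symmetric around 2.28 because the chi-square distribution is skewed.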
As another example, we can construct a 99% confidence interval estimate for the
true mean number of candies per bag. The requirements for this problem are the
same as for the previous confidence interval estimate. In this example alpha = 0.01,
and the degrees of freedom is one less than the number of bags of Skittles, which in
our case is 19. The standard deviation is 2.28, and the sample mean is 59.9. Work is
shown below.
This shows with 99% confidence that the interval from 58.441 to 61.359 contains
the true mean number of Skittles per bag.
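A short calculation reproduces this interval. The sketch assumes a Student t interval with 19 degrees of freedom and hard-codes the critical value 2.861 from a standard t table (0.005 in each tail); the variable names are illustrative.

```python
from math import sqrt

n = 20          # number of bags
x_bar = 59.9    # sample mean of Skittles per bag
s = 2.28        # sample standard deviation

# t critical value for df = 19 and alpha = 0.01 (two tails),
# taken from a standard t table.
t_crit = 2.861

# Margin of error for the mean when the population standard deviation is unknown.
margin = t_crit * s / sqrt(n)

lower, upper = x_bar - margin, x_bar + margin
print(f"{lower:.3f} to {upper:.3f}")
```

The t distribution is used instead of the normal distribution because the standard deviation comes from the sample rather than from the population.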
I'll show and explain one more tool used in statistics, called a hypothesis
test. A hypothesis is a claim or statement about a property of a population, and a
hypothesis test is a procedure for testing such a claim. When creating a hypothesis
test, two hypotheses are formed, and the form of the second one tells us whether our
graph has one tail or two tails. The first is called the null hypothesis and is
usually represented by H0. It is always stated as being equal to a given or claimed
value. The second is the alternative hypothesis and is usually represented by HA.
HA is stated as being greater than, less than, or not equal to the given or claimed
value.
However, before I begin this test, we must meet specific requirements. First,
the sample must be a simple random sample, and ours meets this requirement.
Second, the sample size must be over 30 or the population must be normally
distributed. Since our sample is only 20 bags of Skittles, we need the population
to be normally distributed. Our histogram roughly reflects a normal distribution,
so we can assume the population is normally distributed. With these requirements
satisfied, we can proceed with a problem.
For example, to use a 0.01 significance level to test the claim that 20% of all
Skittles are purple, I would form these two hypotheses. You'll notice that because
my H0 is p = 0.20 and my HA is p ≠ 0.20, my graph will have two tails. If HA were
greater than or less than, it would only have one tail.
____________
Because the test statistic z falls within the central 99% region between the
critical values, we fail to reject the claim that 20% of the Skittles are purple. In
simple words, this means that there is not sufficient evidence to warrant rejection
of the claim that 20% of all Skittles are purple.
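The decision above can be checked with a short calculation. This sketch assumes the usual one-proportion z test with the class counts of 224 purple Skittles out of 1197; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

purple = 224    # purple Skittles in the class sample
n = 1197        # total Skittles in the class sample
p0 = 0.20       # claimed proportion under H0
alpha = 0.01    # significance level

p_hat = purple / n

# Test statistic: the standard error uses the claimed proportion p0.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed critical value (about 2.576 for alpha = 0.01).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Two-tailed P-value for the observed test statistic.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}, critical value = ±{z_crit:.3f}, reject H0: {reject_h0}")
```

Since the test statistic lies between the two critical values, the result is to fail to reject H0, in line with the conclusion above.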
__________________
You'll notice that because the test statistic falls within the red shaded region,
which is known as the critical region, we reject H0. Put in simple words, this means
that there is sufficient evidence to warrant rejection of the claim that the mean
number of candies in a bag is 58. You'll notice that although our simple random
sample size is less than 30, our population is assumed to be normally distributed,
allowing us to test this hypothesis.
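The test statistic for this claim can be sketched as follows. The significance level for this test is not stated in the text, so the sketch compares the statistic against the two-tailed critical values for both 0.05 and 0.01, hard-coded from a standard t table for 19 degrees of freedom. Treating the test as two-tailed is also an assumption, and the variable names are illustrative.

```python
from math import sqrt

n = 20          # number of bags
x_bar = 59.9    # sample mean of Skittles per bag
s = 2.28        # sample standard deviation
mu0 = 58        # claimed mean under H0

# t test statistic for a claim about a mean with unknown population sigma.
t_stat = (x_bar - mu0) / (s / sqrt(n))

# Two-tailed critical values for df = 19 from a standard t table.
# The significance level is not given in the text, so both are checked.
critical_values = {0.05: 2.093, 0.01: 2.861}

for alpha, t_crit in critical_values.items():
    reject_h0 = abs(t_stat) > t_crit
    print(f"alpha = {alpha}: t = {t_stat:.2f}, critical = ±{t_crit}, reject H0: {reject_h0}")
```

The statistic falls in the critical region at either level, which agrees with the decision to reject H0.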
Although hypothesis testing is very useful, you must be very careful to avoid
two types of errors, commonly referred to as Type I and Type II errors. A Type I
error occurs when the null hypothesis is true but is rejected anyway, and a Type II
error occurs when the null hypothesis is false but is not rejected. These two
errors create a balancing game in hypothesis testing. We need to make sure that we
don't make the significance level too small or too large. One way to reduce the
chance of both errors, however, is to increase the sample size.
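The balancing game can be illustrated with a small calculation of the Type II error probability for the purple Skittles test. Everything specific here is an assumption for illustration only: the true proportion of 0.18, the larger sample size of 5000, and the helper name type_ii_error are not from the project data.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(p0, p_true, n, alpha):
    """Chance of failing to reject H0: p = p0 when the true proportion is p_true."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_null = sqrt(p0 * (1 - p0) / n)
    # Sample proportions between these limits lead to "fail to reject H0".
    lower, upper = p0 - z_crit * se_null, p0 + z_crit * se_null
    # Sampling distribution of the sample proportion under the true proportion.
    sampling = NormalDist(mu=p_true, sigma=sqrt(p_true * (1 - p_true) / n))
    return sampling.cdf(upper) - sampling.cdf(lower)


# Hypothetical true proportion of 0.18 against the claimed 0.20.
for n in (1197, 5000):
    for alpha in (0.05, 0.01):
        beta = type_ii_error(0.20, 0.18, n, alpha)
        print(f"n = {n}, alpha = {alpha}: Type II error probability = {beta:.3f}")
```

Under these assumptions, lowering alpha (fewer Type I errors) raises the Type II error probability at a fixed sample size, while a larger sample lowers the Type II error probability at the same alpha.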
In conclusion, to be able to perform interval estimates and hypothesis tests
for population means, the data used must meet specific requirements. Setting up the
hypotheses also takes three steps. The first is to identify the specific claim or
hypothesis to be tested and to put it into symbolic form. The second is to give the
symbolic form that must be true when the original claim is false. The third is, of
the two symbolic expressions obtained so far, to let the alternative hypothesis
(noted above as HA) be the one not containing equality, so that HA uses the symbol
<, >, or ≠. As you can tell above, all of the samples that were tested met these
conditions.
Using the skills I have learned in this class, I have shown that in small
samples, numbers may seem misleading, but once we take a much larger sample into
consideration, we see much more of what is expected. I am grateful for my
education in statistics, and for the knowledge to be able to apply these skills to
an everyday application. Although this is just a project about the color of pieces
of candy, it gave me an opportunity to show which graphs make numbers easy to
compare, the distribution of a histogram, the difference between quantitative and
categorical data, the skewness of a box and whiskers graph, and even explanations
and examples of confidence interval estimates and hypothesis tests, as well as
Type I and Type II errors.
Total Skittles Per Bag

Mean                 59.9
Median               60
Mode                 59
Standard Deviation   2.28
Sample Variance      5.19
Range                8
Minimum              55
Maximum              63
Sum                  1197
Count                20

5-Number Summary

Min                  55
Q1                   59
Q2                   60
Q3                   62
Max                  63