Kassem Hajj: Math 1040 Skittles Term Project

Kassem Hajj
Math 1040 Skittles Term Project

This project applies many of the concepts I am learning in Math 1040, including
organizing and analyzing data and drawing conclusions using confidence intervals and
hypothesis tests. I start by collecting my own data. My data was then combined with the
data collected by all of the other students in the class into one data set that is used
throughout this assignment.
Data Collection
I have purchased one 2.17 ounce bag of Original Skittles and recorded the data below.
The total number of candies in my own bag is 62.
Number of candies by color:
Red | Orange | Yellow | Green | Purple
13 | 19 | 17
Class Data Collection:

The following is the data collected by the whole statistics class. The class sample
contains 21 bags and a total of 1264 Skittles.
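Bags | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Red | 22 | 11 | 7 | 11 | 8 | 12 | 12 | 5 | 19
Orange | 13 | 13 | 18 | 6 | 9 | 13 | 8 | 23 | 11
Yellow | 3 | 10 | 13 | 15 | 10 | 12 | 14 | 15 | 11
Green | 8 | 10 | 10 | 7 | 12 | 13 | 11 | 11 | 9
Purple | 16 | 12 | 15 | 21 | 13 | 10 | 14 | 8 | 12
Total | 62 | 56 | 63 | 60 | 52 | 60 | 59 | 62 | 62

Bags | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | Color Totals | Proportion %
Red | 13 | 15 | 10 | 11 | 21 | 13 | 20 | 17 | 17 | 12 | 12 | 15 | 283 | 22.389
Orange | 15 | 18 | 14 | 9 | 11 | 7 | 21 | 11 | 10 | 9 | 17 | 15 | 271 | 21.440
Yellow | 13 | 16 | 12 | 13 | 9 | 18 | 10 | 14 | 12 | 7 | 16 | 11 | 254 | 20.095
Green | 6 | 9 | 9 | 17 | 8 | 12 | 2 | 12 | 10 | 14 | 9 | 7 | 206 | 16.297
Purple | 10 | 9 | 15 | 9 | 9 | 10 | 9 | 8 | 11 | 17 | 8 | 14 | 250 | 19.778
Total | 57 | 67 | 60 | 59 | 58 | 60 | 62 | 62 | 60 | 59 | 62 | 62 | 1264 | 100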
Note: Only whole candies were counted; partial candies in a bag were disregarded.
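The proportion column can be reproduced from the color totals. The following is a minimal Python sketch, not the method used in the project; the names `color_totals`, `color_percentages`, and `pareto_order` are made up for illustration.

```python
# Color totals for the class sample, copied from the class data table.
color_totals = {"Red": 283, "Orange": 271, "Yellow": 254, "Green": 206, "Purple": 250}


def color_percentages(totals):
    """Return each color's share of all candies, in percent."""
    grand_total = sum(totals.values())
    return {color: 100 * count / grand_total for color, count in totals.items()}


percentages = color_percentages(color_totals)

# Pareto order: colors sorted from the most frequent to the least frequent.
pareto_order = sorted(color_totals, key=color_totals.get, reverse=True)

for color in pareto_order:
    print(f"{color:<7}{color_totals[color]:>5}{percentages[color]:>9.3f}%")
```

Sorting by count gives the bar order for a Pareto chart, and the percentages are the slice sizes for a pie chart.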
Organizing and Displaying Categorical Data:
Pie Chart: Proportion of Each Color
[Pie chart: Red 22%, Orange 21%, Yellow 20%, Purple 20%, Green 16%]
Pareto Chart: Amount of Each Color
[Pareto chart: x-axis "Candies Colors" (Red, Orange, Yellow, Purple, Green); y-axis "Amount" (0 to 300)]
The data from the whole class shows that the five colors have approximately equal
proportions, each close to 20%, although green is somewhat lower at about 16%. I was
expecting this result because it seems very reasonable. In some bags there was almost
double the amount of a specific color of Skittles. However, because we are dealing with
twenty-one bags, the unusual counts in individual bags have little effect on the overall
proportions.
Organizing and Displaying Quantitative Data:
The total number of candies in my own single 2.17-ounce bag of Skittles = 62.
The total number of bags in the sample collected by the entire class = 21.
The total number of candies in the sample collected by the entire class = 1264.
For the entire class:
Summary statistics:
Column | n | Mean | Std. dev. | Median | Min | Max | Q1 | Q3
Total | 21 | 60.2 | 3.01 | 60 | 52 | 67 | 59 | 62
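These summary statistics can be checked against the 21 bag totals in the class data. The sketch below uses only Python's standard `statistics` module; the variable names are illustrative, and quartile conventions differ between software packages, so Q1 and Q3 may not match every tool.

```python
import statistics

# Total candies per bag for bags 1 to 21, copied from the class data table.
bag_totals = [62, 56, 63, 60, 52, 60, 59, 62, 62,
              57, 67, 60, 59, 58, 60, 62, 62, 60, 59, 62, 62]

n = len(bag_totals)
mean = statistics.mean(bag_totals)
std_dev = statistics.stdev(bag_totals)      # sample standard deviation (divides by n - 1)
median = statistics.median(bag_totals)
# Default "exclusive" quartile method; other methods can give slightly different values.
q1, _, q3 = statistics.quantiles(bag_totals, n=4)

print(f"n={n} mean={mean:.1f} s={std_dev:.2f} median={median}")
print(f"min={min(bag_totals)} max={max(bag_totals)} Q1={q1} Q3={q3}")
```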
The distribution is not exactly normal because the histogram does not show a clean bell
shape. It is close to normal, though, except for the bag that contains only 52 candies.
This is roughly what I was expecting, because I thought that the data would resemble a
bell shape. The boxplot makes the pattern clearer: most of the bag totals stay around 60.
My bag had a total of 62 Skittles, so I consider my own data to be in line with the class
data.
Reflection:
The difference between categorical and quantitative data is that categorical data consists
of labels, such as hair color or names, while quantitative data consists of numbers that
can be mathematically manipulated. With quantitative data you would most likely want to
use boxplots, histograms, stem-and-leaf plots, and summary statistics because all of these
show numerical data in a meaningful way. With categorical data you may want to choose
something like a pie chart or a bar graph (such as a Pareto chart) because these explain
the data better and show its meaning by category.
Confidence Interval Estimates
The purpose of taking a random sample from a population and computing a statistic from
the data, such as the mean (or a proportion, standard deviation, or variance), is to
approximate the corresponding parameter of the population. How well the sample statistic
estimates the underlying population value is always an issue. A confidence interval
addresses this issue because it provides a range of values that is likely to contain the
population parameter of interest.
Constructing a 95% confidence interval estimate for the true proportion of purple
candies:
$\hat{p} = 0.198$; $\hat{q} = 1 - \hat{p} = 0.802$; $n = 1264$; $\bar{x} = 11.9$; $\alpha = 0.05$

$0.176 < p < 0.220$
We have 95% confidence that the true proportion $p$ of purple candies is between 0.176
and 0.220.
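As a check on this interval, here is a minimal Python sketch of the usual normal-approximation interval for a proportion, $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$. It assumes the count of 250 purple candies from the class data table; the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value z_{alpha/2} from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 250 purple candies out of 1264 in the class sample.
purple_lower, purple_upper = proportion_ci(250, 1264, 0.95)
print(f"{purple_lower:.3f} < p < {purple_upper:.3f}")
```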
Constructing a 99% confidence interval estimate for the true mean number of
candies per bag:
$n = 21$; $n - 1 = 20$; $\bar{x} = 60.2$; $s = 3.01$; $\alpha = 0.01$

$58.331 < \mu < 62.069$
We have 99% confidence that the mean number of candies per bag is between 58.331 and
62.069.
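The interval for the mean follows from $\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$. The sketch below is one way to reproduce it, assuming the critical value $t_{\alpha/2} = 2.845$ for 20 degrees of freedom is read from a standard t table rather than computed; the variable names are illustrative.

```python
from math import sqrt

# Sample statistics for the number of candies per bag.
n = 21
sample_mean = 60.2
sample_std = 3.01

# Assumption: t critical value for alpha = 0.01 (two-sided) and n - 1 = 20
# degrees of freedom, taken from a standard t table.
T_CRITICAL = 2.845

margin = T_CRITICAL * sample_std / sqrt(n)
mean_lower = sample_mean - margin
mean_upper = sample_mean + margin

print(f"{mean_lower:.3f} < mu < {mean_upper:.3f}")
```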
Constructing a 98% confidence interval estimate for the standard deviation of the
number of candies per bag:
$s = 3.01$; $n = 21$; $\alpha = 0.02$

$\chi^2_L = 8.260$; $\chi^2_R = 37.566$
$2.196 < \sigma < 4.684$
We have 98% confidence that the population standard deviation is between 2.196 and
4.684.
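The interval for the standard deviation uses $\sqrt{(n-1)s^2/\chi^2_R} < \sigma < \sqrt{(n-1)s^2/\chi^2_L}$. The following Python sketch plugs in the two chi-square critical values quoted above (treated here as table look-ups for 20 degrees of freedom); the function name `std_dev_ci` is hypothetical.

```python
from math import sqrt

n = 21
sample_std = 3.01

# Chi-square critical values for alpha = 0.02 and n - 1 = 20 degrees of freedom,
# as quoted in the text (assumed to come from a chi-square table).
CHI2_LEFT = 8.260
CHI2_RIGHT = 37.566


def std_dev_ci(s, n, chi2_left, chi2_right):
    """Confidence interval for a population standard deviation."""
    scaled_variance = (n - 1) * s ** 2
    # The larger critical value gives the lower limit, and vice versa.
    return sqrt(scaled_variance / chi2_right), sqrt(scaled_variance / chi2_left)


sd_lower, sd_upper = std_dev_ci(sample_std, n, CHI2_LEFT, CHI2_RIGHT)
print(f"{sd_lower:.3f} < sigma < {sd_upper:.3f}")
```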
Hypothesis Tests:
In statistics, a hypothesis test is a procedure for testing a claim about a property of a
population. To apply the test, we first identify the null hypothesis (the statement that
the value of a population parameter is equal to some claimed value) and the alternative
hypothesis (the statement that the parameter has a value that somehow differs from the
null hypothesis). These are standard components of a formal hypothesis test and are used
in many different disciplines. Given the claim and the sample data, we then choose the
relevant sampling distribution, calculate the value of the test statistic, and either find
the P-value or identify the critical value. Finally, we state the conclusion about the
claim in simple, nontechnical terms.
Use a 0.01 significance level to test the claim that 20% of all Skittles candies are
green.
$x = 206$; $n = 1264$; $\hat{p} = 0.163$; $\alpha = 0.01$; $p = 0.2$; $q = 0.8$
$H_0: p = 0.2$; $H_1: p \neq 0.2$; $z = -3.29$; P-value $= 0.001$

Since the P-value is less than or equal to the significance level ($0.001 \le 0.01$), we
reject the null hypothesis. There is sufficient evidence to warrant rejection of the
claim that 20% of all Skittles candies are green.
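The test statistic and P-value for this test can be reproduced with a one-proportion z test, $z = (\hat{p} - p)/\sqrt{pq/n}$. The code below is a minimal sketch using only the standard library; `one_proportion_z_test` is a hypothetical helper. The statistic comes out negative because the sample proportion of green candies is below 0.2.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0 against H1: p != p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Two-tailed P-value: twice the area beyond |z| under the standard normal curve.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# 206 green candies out of 1264, claimed proportion 0.2, significance level 0.01.
z_stat, p_value = one_proportion_z_test(206, 1264, 0.2)
reject_null = p_value <= 0.01
print(f"z = {z_stat:.2f}, P-value = {p_value:.4f}, reject H0: {reject_null}")
```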
Use a 0.05 significance level to test the claim that the mean number of candies in
a bag of Skittles is 56.
$\bar{x} = 60.2$; $s = 3.01$; $n = 21$; $\mu_0 = 56$
$H_0: \mu = 56$; $H_1: \mu \neq 56$; $t = 6.39$; P-value $< 0.001$
Since the P-value is less than the significance level of 0.05, we reject the null
hypothesis. There is sufficient evidence to warrant rejection of the claim that the mean
number of candies in a bag of Skittles is 56.
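The test statistic here is $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$. The sketch below recomputes it and applies the critical-value method instead of a P-value, assuming the two-tailed critical value 2.086 for $\alpha = 0.05$ and 20 degrees of freedom is read from a standard t table; the variable names are illustrative.

```python
from math import sqrt

# Sample statistics for the number of candies per bag and the claimed mean.
n = 21
sample_mean = 60.2
sample_std = 3.01
MU_0 = 56

# Assumption: two-tailed t critical value for alpha = 0.05 and n - 1 = 20
# degrees of freedom, taken from a standard t table.
T_CRITICAL = 2.086

t_stat = (sample_mean - MU_0) / (sample_std / sqrt(n))
# Reject H0 when the test statistic falls in either tail beyond the critical value.
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```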
Reflection:
The conditions for doing interval estimates and hypothesis tests for population
proportions are:
1. The sample observations are a simple random sample.
2. The conditions for a binomial distribution are satisfied.
3. The conditions $np \ge 5$ and $nq \ge 5$ are both satisfied.
Since we each bought a random bag of Skittles and each got different numbers and colors
of candies, the sample meets the first condition. For the binomial distribution
conditions, each candy has a fixed set of outcomes (it either is or is not the color of
interest), and for this specific calculation p was 0.2. With $n = 1264$ candies,
$np = 1264 \times 0.2 = 252.8 \ge 5$ and $nq = 1264 \times 0.8 = 1011.2 \ge 5$, so the
third condition is satisfied.
The conditions for doing interval estimates and hypothesis tests for population
means are:
1. The sample is a simple random sample.
2. Either or both of these conditions are satisfied: the population is normally
distributed or $n > 30$.
As I mentioned before, the sample is a simple random sample. But the population is not
normally distributed, as I discussed in the section Organizing and Displaying
Quantitative Data, and the histogram confirms that. We also have $n = 21 < 30$, so the
second condition is not satisfied.
The conditions for doing interval estimates for population standard deviations are:
1. The sample is a simple random sample.
2. The population must have normally distributed values.
Again, condition number one is already satisfied, but the second condition is not, as I
just verified in the section above.
By using this data, we have made some errors. We assumed that the conditions of all the
tests were satisfied, which is not true. This means that some of the tests we did are not
appropriate for this data. To improve the sampling method we need a larger sample (for
example $n > 30$ bags); the larger the n we can get, the more accurate the tests we can
apply.
Conclusion:
My main conclusion is that we needed more bags to improve this research. With more bags
we could get more accurate statistics, and the claims we want to make could be tested
more easily.
Reflective writing:
I have learned many new concepts in statistics from this class, including how to make
proper graphs and how to determine whether a stated statistic misrepresents the true
value. For example, a claim that a product works 200% better is not a true statement.
This class will help me in other classes that I will take. I now have a better
understanding of statistics, which are commonly used in almost all areas of study.
By gaining even a basic understanding of statistics I will have an easier time
comprehending the statistics that are included in my other studies.
Some of the specific skills from this assignment that I will use in other classes are the
graphs that I now know how to make correctly. These graphs can be used in health, since I
am majoring in Health Promotion, where we deal with a lot of graphs and statistics from
the population.
This project has been very challenging throughout this term. I struggled with
some of the concepts in the last half of this class, which I needed to fully
understand to complete parts of this project. I have learned to keep moving
through the course and keep learning the remaining concepts; doing so
sometimes helped me to better understand past concepts that I had misunderstood.
This project has changed some of the ways that I think about real-world math
applications by giving me a better understanding of correct graphs and proper
calculations for populations and samples.
