
Kassem Hajj

Math 1040 Skittles Term Project


This project applies many of the concepts I am learning in Math 1040, including organizing
and analyzing data and drawing conclusions using confidence intervals and hypothesis tests.
I will start by collecting my own data. My data were combined with those of all the other
students in the class into one data set that is used throughout this assignment.
Data Collection
I purchased one 2.17-ounce bag of Original Skittles and recorded the data below.
The total number of candies in my own bag is 62.

Columns: Number of Red candies, Number of Orange candies, Number of Yellow candies,
Number of Green candies, Number of Purple candies
Values recorded: 13, 19, 17

Class Data Collection:


The following is the data collected by the whole statistics class. The total number of
bags in the class sample is 21, and the overall total number of Skittles is 1264.
Bag             Red   Orange   Yellow    Green   Purple    Total
1                22       13        3        8       16       62
2                11       13       10       10       12       56
3                 7       18       13       10       15       63
4                11        6       15        7       21       60
5                 8        9       10       12       13       52
6                12       13       12       13       10       60
7                12        8       14       11       14       59
8                 5       23       15       11        8       62
9                19       11       11        9       12       62
10               13       15       13        6       10       57
11               15       18       16        9        9       67
12               10       14       12        9       15       60
13               11        9       13       17        9       59
14               21       11        9        8        9       58
15               13        7       18       12       10       60
16               20       21       10        2        9       62
17               17       11       14       12        8       62
18               17       10       12       10       11       60
19               12        9        7       14       17       59
20               12       17       16        9        8       62
21               15       15       11        7       14       62
Color Totals    283      271      254      206      250     1264
Proportion %  22.389   21.440   20.095   16.297   19.778      100

Note: Only including whole candies and disregarding any partial candies in the bag.
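The color totals and percentages in the class table can be reproduced with a short Python
sketch. The dictionary and variable names are illustrative and not part of the original
project; the counts are the class color totals reported above.

```python
# Class color totals, taken from the class data table.
color_totals = {"Red": 283, "Orange": 271, "Yellow": 254, "Green": 206, "Purple": 250}

total_candies = sum(color_totals.values())

# Percentage of each color, rounded to three decimals as in the table.
percentages = {
    color: round(100 * count / total_candies, 3)
    for color, count in color_totals.items()
}

# Pareto order: colors sorted from most frequent to least frequent.
pareto_order = sorted(color_totals, key=color_totals.get, reverse=True)

for color in pareto_order:
    print(f"{color:<7}{color_totals[color]:>5}{percentages[color]:>9.3f}%")
```

Sorting by count gives the bar order used in a Pareto chart.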
Organizing and Displaying Categorical Data:
Pie Chart:
[Pie chart "Proportion of Each Color": Red 22%, Orange 21%, Yellow 20%, Green 16%, Purple 20%]

Pareto Chart:
[Pareto chart "Amount of Each Color": vertical axis Amount (0 to 300); horizontal axis
Candies Colors, in the order Red, Orange, Yellow, Purple, Green]


The data from the whole class show that the five colors occur in approximately equal
proportions, each close to 20% (from 16.3% for green to 22.4% for red). I was expecting
this result because it seems very reasonable. In some bags there was almost double the
amount of a specific color of Skittles. However, because the sample combines twenty-one
bags, the unusual counts in individual bags have little effect on our results.
Organizing and Displaying Quantitative Data:
The total number of candies in my own single 2.17-ounce bag of Skittles = 62.
The total number of bags in the sample collected by the entire class = 21.
The total number of candies in the sample collected by the entire class = 1264.
For the entire class:
Summary statistics:
Column   n    Mean   Std. dev.   Median   Min   Max   Q1   Q3
Total    21   60.2   3.01        60       52    67    59   62

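As a check, the summary statistics can be recomputed from the 21 bag totals in the class
table. This is a minimal sketch using Python's standard `statistics` module; the variable
names are illustrative, and the quartiles use the module's default method, which may
differ slightly from other software on other data sets.

```python
import statistics

# Total candies per bag for bags 1 through 21, from the class data table.
bag_totals = [62, 56, 63, 60, 52, 60, 59, 62, 62,
              57, 67, 60, 59, 58, 60, 62, 62, 60, 59, 62, 62]

n = len(bag_totals)
mean = statistics.mean(bag_totals)
std_dev = statistics.stdev(bag_totals)       # sample standard deviation (n - 1)
median = statistics.median(bag_totals)
q1, _, q3 = statistics.quantiles(bag_totals, n=4)

print(f"n={n} mean={mean:.1f} s={std_dev:.2f} median={median}")
print(f"min={min(bag_totals)} max={max(bag_totals)} Q1={q1} Q3={q3}")
```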

The shape of the distribution is not normal because the histogram does not show a bell
shape. It is almost normal, though, except for the bag that contains 52 candies. The data
mostly reflect what I was expecting, because I thought the data would resemble a bell
shape. In the boxplot the data make more sense, and you can see that most of the bag
totals stay around 60. My bag of Skittles had a total of 62 candies, so I would consider
my own data to be in line with the class data.
Reflection:
The difference between categorical and quantitative data is that categorical data consist
of labels, such as hair color or names, while quantitative data are numbers that can be
manipulated mathematically. With quantitative data you would most likely want to use
boxplots, histograms, stem-and-leaf plots, and summary statistics, because all of these
show numerical data in a meaningful way. With categorical data you may want to choose
something like a pie chart or a bar graph (Pareto chart), because these explain the data
better and show its meaning by category.
Confidence Interval Estimates
The purpose of taking a random sample from a population and computing a statistic, such
as the mean, proportion, standard deviation, or variance, is to approximate the
corresponding parameter of the population. How well the sample statistic estimates the
underlying population value is always an issue. A confidence interval addresses this
issue because it provides a range of values that is likely to contain the population
parameter of interest.

Constructing a 95% confidence interval estimate for the true proportion of purple
candies:

$\hat{p} = 0.198;\quad \hat{q} = 1 - \hat{p} = 0.802;\quad n = 1264;\quad \bar{x} = 11.9;\quad \alpha = 0.05$

$0.176 < p < 0.220$

We are 95% confident that the true proportion p of purple candies is between 0.176 and
0.220.
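
The interval for the purple proportion can be reproduced with a minimal Python sketch,
assuming the usual normal-approximation (z) interval for a proportion. The variable names
are illustrative; the counts are the 250 purple candies out of 1264 from the class table.

```python
from math import sqrt
from statistics import NormalDist

x_purple = 250        # purple candies in the class sample
n_candies = 1264      # all candies in the class sample
confidence = 0.95

p_hat = x_purple / n_candies
q_hat = 1 - p_hat

# Two-tailed critical value z_(alpha/2) from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error E = z * sqrt(p_hat * q_hat / n).
margin = z_crit * sqrt(p_hat * q_hat / n_candies)
lower, upper = p_hat - margin, p_hat + margin

print(f"{lower:.3f} < p < {upper:.3f}")
```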

Constructing a 99% confidence interval estimate for the true mean number of
candies per bag.

$n = 21;\quad n - 1 = 20;\quad \bar{x} = 60.2;\quad s = 3.01;\quad \alpha = 0.01$

$58.331 < \mu < 62.069$

We have 99% confidence that the mean number of candies per bag is between 58.331 and
62.069.
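
The interval for the mean can be checked with a minimal Python sketch that uses the
rounded summary values from this section. The critical value 2.845 is an assumption read
from a standard t table (20 degrees of freedom, 0.005 in each tail); it is not stated in
the project itself, and the variable names are illustrative.

```python
from math import sqrt

n = 21
x_bar = 60.2   # sample mean number of candies per bag
s = 3.01       # sample standard deviation

# Assumption: t critical value for 99% confidence with n - 1 = 20 degrees
# of freedom, taken from a standard t table.
t_crit = 2.845

# Margin of error E = t * s / sqrt(n).
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"{lower:.3f} < mu < {upper:.3f}")
```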

Constructing a 98% confidence interval estimate for the standard deviation of the
number of candies per bag:

$s = 3.01;\quad n = 21;\quad \alpha = 0.02$

$\chi^2_L = 8.260;\quad \chi^2_R = 37.566$

$2.196 < \sigma < 4.684$

We are 98% confident that the population standard deviation is between 2.196 and 4.684.
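
The limits for the standard deviation follow from the chi-square critical values given
above. A minimal Python sketch, assuming the standard interval
sqrt((n - 1)s^2 / chi2_R) < sigma < sqrt((n - 1)s^2 / chi2_L); the variable names are
illustrative.

```python
from math import sqrt

n = 21
s = 3.01              # sample standard deviation
chi2_left = 8.260     # left critical value, 20 degrees of freedom
chi2_right = 37.566   # right critical value, 20 degrees of freedom

df = n - 1

# The larger critical value gives the lower limit and vice versa.
lower = sqrt(df * s**2 / chi2_right)
upper = sqrt(df * s**2 / chi2_left)

print(f"{lower:.3f} < sigma < {upper:.3f}")
```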

Hypothesis Tests:
In statistics, a hypothesis test is a procedure for testing a claim about a property of a
population. To apply the test, we first identify the null hypothesis (the statement that
the value of a population parameter is equal to some claimed value) and the alternative
hypothesis (the statement that the parameter has a value that somehow differs from the
null hypothesis). These are standard components of a formal hypothesis test and are used
in many different disciplines. Next, we calculate the value of the test statistic from
the claim and the sample data, choose the relevant sampling distribution, and either find
the P-value or identify the critical value. Finally, we state the conclusion about the
claim in simple, nontechnical terms.

Use a 0.01 significance level to test the claim that 20% of all Skittles candies are
green.

$x = 206;\quad n = 1264;\quad \hat{p} = 0.163;\quad \alpha = 0.01;\quad p = 0.2;\quad q = 0.8$

$H_0: p = 0.2;\quad H_1: p \neq 0.2;\quad z = -3.29;\quad P\text{-value} = 0.001$

Since the P-value is less than or equal to the significance level ($0.001 \le 0.01$), we
reject the null hypothesis. There is sufficient evidence to warrant rejection of the
claim that 20% of all Skittles candies are green.
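
The test statistic and P-value for the green proportion can be recomputed with a minimal
Python sketch, assuming a two-tailed z test for one proportion. The variable names are
illustrative; the counts come from the class table.

```python
from math import sqrt
from statistics import NormalDist

x_green = 206       # green candies in the class sample
n_candies = 1264    # all candies in the class sample
p0 = 0.20           # claimed proportion under the null hypothesis
alpha = 0.01

p_hat = x_green / n_candies

# z = (p_hat - p0) / sqrt(p0 * q0 / n)
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n_candies)

# Two-tailed P-value: twice the area beyond |z| in one tail.
p_value = 2 * NormalDist().cdf(-abs(z))
reject_null = p_value <= alpha

print(f"z = {z:.2f}, P-value = {p_value:.3f}, reject H0: {reject_null}")
```

The statistic is negative because the sample proportion of green candies is below the
claimed 20%.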

Use a 0.05 significance level to test the claim that the mean number of candies in
a bag of Skittles is 56.

$\bar{x} = 60.2;\quad s = 3.01;\quad n = 21;\quad \mu_0 = 56$

$H_0: \mu = 56;\quad H_1: \mu \neq 56;\quad t = 6.39;\quad P\text{-value} < 0.001$

Since the P-value is less than or equal to the significance level (P-value < 0.001, which
is below 0.05), we reject the null hypothesis. There is sufficient evidence to warrant
rejection of the claim that the mean number of candies in a bag of Skittles is 56.
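
The t statistic can be recomputed with a minimal Python sketch. Because the Python
standard library has no t distribution, the sketch uses the critical-value method instead
of a P-value: the value 2.086 is an assumption read from a standard t table (20 degrees
of freedom, 0.025 in each tail), and the variable names are illustrative.

```python
from math import sqrt

x_bar = 60.2   # sample mean number of candies per bag
s = 3.01       # sample standard deviation
n = 21         # number of bags
mu0 = 56       # claimed mean under the null hypothesis

# t = (x_bar - mu0) / (s / sqrt(n))
t_stat = (x_bar - mu0) / (s / sqrt(n))

# Assumption: two-tailed critical value for alpha = 0.05 with 20 degrees
# of freedom, taken from a standard t table.
t_crit = 2.086
reject_null = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, critical value = {t_crit}, reject H0: {reject_null}")
```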

Reflection:

The conditions for doing interval estimates and hypothesis tests for population
proportions are:
1. The sample observations are a simple random sample.
2. The conditions for a binomial distribution are satisfied.
3. The conditions $np \ge 5$ and $nq \ge 5$ are both satisfied.
Since we each bought a random bag of Skittles and we all got different numbers and colors
of candies, this meets the first condition. For the binomial distribution conditions,
each candy has a fixed set of outcomes (it either is or is not the color being tested),
and for this specific calculation p was 0.2. With the n = 1264 candies used in the
proportion calculations, $np = 1264(0.2) = 252.8 \ge 5$ and $nq = 1264(0.8) = 1011.2 \ge 5$,
so the third condition is satisfied.

The conditions for doing interval estimates and hypothesis tests for population
means are:
1. The sample observations are a simple random sample.
2. Either or both of these conditions are satisfied: the population is normally
distributed or n > 30.
As I mentioned before, the sample is a simple random sample. However, the population is
not normally distributed, as I discussed in the section Organizing and Displaying
Quantitative Data and as the histogram confirms, and we have n = 21 < 30. The second
condition is therefore not satisfied.
The conditions for doing interval estimates for population standard deviations are:
1. The sample observations are a simple random sample.
2. The population must have normally distributed values.
Again, condition one is satisfied, but the second condition is not, as I just explained
in the paragraph above.
By using these data, we have made some errors. We assumed that the conditions of all the
tests are satisfied, which is not true. This means that some of the tests we did are not
appropriate for these data. To improve the sampling method, we need a larger sample (for
example, n > 30 bags); the larger n is, the more accurate the tests we can apply.
Conclusion:
My main conclusion is that we needed more bags to improve this research. With more bags
we could get more accurate statistics, and the claims we want to make could be tested
more easily.
Reflective writing:
I have learned many new concepts in statistics from this class, including how to make
proper graphs and how to determine whether a stated statistic misrepresents the true
value; for example, the claim that a product works 200% better is not a true statement.
This class will help me in other classes that I will take. I now have a better
understanding of statistics, which are commonly used in almost all areas of study.
With even a basic understanding of statistics, I will have an easier time
comprehending the statistics included in my other studies.

Some of the specific skills from this assignment that I will use in other classes are
the graphs that I now know how to make correctly. These graphs can be used in health,
since I am majoring in Health Promotion, where we deal with a lot of graphs and
statistics about the population.
This project has been very challenging throughout this term. I struggled with some of
the concepts in the last half of this class, which I needed to fully understand to
complete parts of this project. I learned to keep moving through the course and keep
learning the remaining concepts; doing so sometimes helped me better understand earlier
concepts that I had misunderstood.
This project has changed some of the ways I think about real-world math applications by
giving me a better understanding of correct graphs and proper calculations for
populations and samples.
