Group Data
Color     Frequency   Proportion
Yellow    47          0.193
Purple    47          0.193
Red       48          0.198
Green     55          0.226
Orange    46          0.189
Total     243         1
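As a quick check of the Proportion column, each proportion is the color's frequency divided by the group total of 243. The short Python sketch below recomputes the column from the group frequencies; the variable names are ours and are only illustrative.

```python
# Group frequencies by color, copied from the Group Data table.
group_counts = {"Yellow": 47, "Purple": 47, "Red": 48, "Green": 55, "Orange": 46}

total = sum(group_counts.values())  # all candies in the four group bags

# Proportion = frequency / total, rounded to three decimals as in the table.
group_props = {color: round(count / total, 3) for color, count in group_counts.items()}

for color, prop in group_props.items():
    print(f"{color:<8}{group_counts[color]:>4}{prop:>8.3f}")
print(f"{'Total':<8}{total:>4}")
```

The same calculation applies to the class data and to each individual bag; only the counts change.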
Class Data
Color     Frequency   Proportion
Yellow    276         0.184
Purple    294         0.197
Red       282         0.189
Green     305         0.204
Orange    339         0.227
Total     1496        1
Individual Data
Color     Frequency   Proportion
Yellow    14          0.226
Purple    11          0.177
Red       10          0.161
Green     13          0.210
Orange    14          0.226
Total     62          1
Observation of Individual Data, Group Data and Class Data: Thitirat Pongprajuc
What I saw was that my data were somewhat similar to the whole class sample. Orange was my
most common candy color, and it was also the most common color in the class sample. In my
bag, however, orange and yellow had the same count (14 each), which was not the case for the
class data. Also, my lowest count was red rather than yellow, which was one of the class's
lowest counts. In my group data, by contrast, green was the most common color, followed by
red, and orange was the least common. The data reflected what I expected to see: some of my
results were similar to the whole class data but not exactly the same. I did not expect,
however, that the group data would be so different from both my data and the class data.
Individual Data
Color     Frequency   Proportion
Yellow    6           0.098
Purple    10          0.164
Red       18          0.295
Green     16          0.262
Orange    11          0.180
Total     61          1
Observation of Individual Data, Group Data and Class Data: Shainalyn Howell
The data gathered from the class is represented above by a pie chart and a Pareto chart.
As you can see from these graphs, orange had the most and yellow had the least. This is
slightly different from my individual charts: my bag had more red than any other color and
orange was in the middle, but yellow was also the least common color in my bag (individual
data represented in the graphs below). Having never put much thought into how many candies of
each color are in my Skittles bag before this project, I would have just assumed that the
colors were distributed evenly throughout the bag.
Individual Data
Color     Frequency   Proportion
Yellow    13          0.213
Purple    13          0.213
Red       10          0.164
Green     12          0.197
Orange    13          0.213
Total     61          1
Observation of Individual Data, Group Data and Class Data: Ashlie Hashimoto
My data didn't seem to be that similar to the class or my group data. Compared with the
group, I had a smaller proportion of red and green Skittles and a larger proportion of
yellow, purple, and orange. Compared with the class, I had more yellow and purple and less
red and green, but about the same share of orange Skittles. The data didn't seem to reflect
what I thought I would see. I thought that my data would have more in common with the group
and class data than it did.
Individual Data
Color     Frequency   Proportion
Yellow    14          0.237
Purple    13          0.220
Red       10          0.169
Green     14          0.237
Orange    8           0.136
Total     59          1
Observation of Individual Data, Group Data and Class Data: Veronica Hollestelle
While the group data seemed to have roughly equal amounts of each color, the individual bags
did not. This suggests that the more samples you use, the more consistent and even the color
counts will be. There was even more difference between my data and the class data. Orange had
the highest proportion for the class (0.227), but the lowest for both the group (0.189) and
my bag (0.136). Yellow had the lowest proportion for the class (0.184) and a low one for the
group (0.193), while in my bag it was tied for the highest (0.237). Green was the highest for
the group at 0.226; my proportion of green was 0.237, and the class also showed a high
proportion of green at 0.204. I had my highest count for yellow, and while almost everyone
had more orange candies, I had the fewest orange candies. My bag had one of the lowest counts
of candies per bag, and its colors varied a lot from almost everyone's.
Organizing and Displaying Quantitative Data: the Number of Candies per Bag
Group Data
Name                    Yellow  Purple  Red  Green  Orange  Total
Hashimoto Ashlie        13      13      10   12     13      61
Hollestelle Veronica    14      13      10   14     8       59
Howell Shainalyn        6       10      18   16     11      61
Pongprajuc Thitirat     14      11      10   13     14      62
Total                   47      47      48   55     46      243
Summary statistics:
Column  n  Mean  Std. dev.  Min  Q1  Median  Q3    Max
Total   4  60.8  1.26       59   60  61      61.5  62
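The mean and standard deviation in the group summary can be checked by hand or with a few lines of Python. The sketch below uses only the standard library and the four bag totals from the Group Data table; it is meant as a cross-check, not as a reproduction of how StatCrunch computes its output.

```python
from statistics import mean, stdev

# Candies per bag for the four group members (Total column of the Group Data table).
group_totals = [61, 59, 61, 62]

group_mean = mean(group_totals)  # arithmetic mean of the four totals
group_sd = stdev(group_totals)   # sample standard deviation (divides by n - 1)

print(f"n = {len(group_totals)}, mean = {group_mean:.2f}, std. dev. = {group_sd:.2f}")
```

The mean comes out to 60.75, which the summary table reports rounded to one decimal place.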
IQR = Q3 - Q1
= 61.5 - 60
= 1.5
Lower fence = Q1 - 1.5 x IQR
= 60 - 1.5(1.5)
= 60 - 2.25
= 57.75
Upper fence = Q3 + 1.5 x IQR
= 61.5 + 1.5(1.5)
= 61.5 + 2.25
= 63.75
Every bag total (59 to 62) lies between the two fences, so there is no outlier for the group
data.
Class Data
Name                    Yellow  Purple  Red  Green  Orange  Total
Alaguretnam Nitharshan  6       13      13   14     16      62
Becker Jenna            10      15      8    13     13      59
Bekavac Morena          11      11      15   6      17      60
Dunn Devin              10      11      10   13     9       53
Ebert Diedre            20      12      8    9      6       55
Hashimoto Ashlie        13      13      10   12     13      61
Hills Seung             13      9       12   13     12      59
Hollestelle Veronica    14      13      10   14     8       59
Howell Shainalyn        6       10      18   16     11      61
Jackson Amanda          9       10      15   11     17      62
Jameson Samantha        12      13      13   11     10      59
Juback Haley            13      12      6    12     15      58
Karaiskos Kalliopi      16      12      12   12     16      68
Pongprajuc Thitirat     14      11      10   13     14      62
Rojas Kasy              10      10      10   14     16      60
Schofield Victoria      4       16      13   11     17      61
Schott Kristina         11      7       12   12     18      60
Seike Nai               17      12      6    11     14      60
Shimizu Shelsea         4       13      10   7      22      56
Smith Richard           13      18      8    13     10      62
Sorto Jennifer          8       11      14   10     17      60
Sorto Nicole            13      9       8    15     11      56
Taylor Kelcee           6       11      15   17     11      60
Terrell Elizabeth       11      10      18   13     10      62
Wright Heather          12      12      8    13     16      61
Total                   276     294     282  305    339     1496
Summary statistics:
Column  n   Mean  Std. dev.  Min  Q1  Median  Q3  Max
Total   25  59.8  2.90       53   59  60      61  68
IQR = Q3 - Q1
= 61 - 59
= 2
Lower fence = Q1 - 1.5 x IQR
= 59 - 1.5(2)
= 59 - 3
= 56
Upper fence = Q3 + 1.5 x IQR
= 61 + 1.5(2)
= 61 + 3
= 64
The class data's outliers are 53, 55, and 68, the only bag totals below 56 or above 64.
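The fence calculation for the class data can be sketched in Python as follows. One assumption should be flagged: quartile conventions differ between tools, and we use the "inclusive" method of Python's statistics.quantiles because it happens to give the same quartiles as the summary table for these 25 totals. We do not claim it is the method StatCrunch uses in general.

```python
from statistics import quantiles

# Candies per bag for all 25 students (Total column of the class table).
class_totals = [62, 59, 60, 53, 55, 61, 59, 59, 61, 62, 59, 58, 68,
                62, 60, 61, 60, 60, 56, 62, 60, 56, 60, 62, 61]

# Assumption: the "inclusive" quartile convention, chosen because it matches
# the reported Q1, median, and Q3 for this particular data set.
q1, median, q3 = quantiles(class_totals, n=4, method="inclusive")

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# A bag total is an outlier if it lies strictly outside the fences.
outliers = sorted(x for x in class_totals if x < lower_fence or x > upper_fence)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {lower_fence} to {upper_fence}, outliers: {outliers}")
```

Note that the two bags with 56 candies sit exactly on the lower fence, so they are not counted as outliers.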
Based on the StatCrunch result of our data, we are 95% confident that the interval
between 0.1764 and 0.2166 actually does contain the true value of the population proportion
of purple candies. This means that if we were to select many different samples of size 1496 and
construct the corresponding confidence intervals, 95% of them would actually contain the value
of the population proportion of purple candies.
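The interval for purple candies can be approximated with the usual normal-approximation formula, p-hat plus or minus z times the square root of p-hat(1 - p-hat)/n. The sketch below assumes that this is the formula behind the StatCrunch result and uses the purple count of 294 from the class table; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

n = 1496          # all candies in the class sample
x_purple = 294    # purple candies in the class sample
confidence = 0.95

p_hat = x_purple / n

# Two-sided critical value: area (1 - confidence) / 2 in each tail.
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error for a proportion (normal approximation, an assumption here).
margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
ci_low, ci_high = p_hat - margin, p_hat + margin

print(f"{ci_low:.4f} < p < {ci_high:.4f}")
```

The limits agree with the reported interval to about three decimal places; small differences in the last digit come from rounding.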
Construct a 99% confidence interval estimate for the true mean number of candies per bag.
99% confidence interval results:
μ : Mean of variable
Variable                   Sample Mean  Std. Err.   DF  L. Limit   U. Limit
N of Candies for each bag  59.84        0.57930993  24  58.219705  61.460295
Based on the calculations from our data, we are 99% confident that the interval from 58.22
to 61.46 actually does contain the true value of μ. This means that if we selected many
different samples of the same size and constructed the corresponding confidence intervals, in
the long run 99% of them would actually contain the value of μ.
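To show where the limits come from, the sketch below rebuilds the interval as the sample mean plus or minus t times the standard error, using the 25 bag totals. The critical value 2.797 is the t-table entry for 24 degrees of freedom with 0.005 in each tail; it is typed in by hand as an assumption, since the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Candies per bag for all 25 students (Total column of the class table).
class_totals = [62, 59, 60, 53, 55, 61, 59, 59, 61, 62, 59, 58, 68,
                62, 60, 61, 60, 60, 56, 62, 60, 56, 60, 62, 61]

# Assumption: t-table value for 99% confidence with df = 24.
T_CRIT_99_DF24 = 2.797

n = len(class_totals)
x_bar = mean(class_totals)
std_err = stdev(class_totals) / sqrt(n)  # s / sqrt(n), sigma not known

margin = T_CRIT_99_DF24 * std_err
mean_low, mean_high = x_bar - margin, x_bar + margin

print(f"x-bar = {x_bar:.2f}, std. err. = {std_err:.4f}")
print(f"{mean_low:.2f} < mu < {mean_high:.2f}")
```

Because the table value is rounded to three decimals, the limits can differ from the StatCrunch output in the fourth or fifth decimal place.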
Construct a 98% confidence interval estimate for the standard deviation of the number of
candies per bag.
98% confidence interval results:
σ² : Variance of variable
DF  L. Limit   U. Limit
24  4.6849894  18.547651
Find σ by taking the square root of each limit:
4.6849894 < σ² < 18.547651
2.1645 < σ < 4.3067
Based on this result, we have 98% confidence that the limits of 2.1645 and 4.3067 contain
the true value of σ.
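The variance limits follow from (n - 1)s² divided by the right and left chi-square critical values, and the limits for σ are their square roots. The sketch below uses the chi-square table values for 24 degrees of freedom with 0.01 in each tail (42.980 and 10.856); these are entered by hand as assumptions because the standard library does not provide a chi-square distribution.

```python
from math import sqrt
from statistics import variance

# Candies per bag for all 25 students (Total column of the class table).
class_totals = [62, 59, 60, 53, 55, 61, 59, 59, 61, 62, 59, 58, 68,
                62, 60, 61, 60, 60, 56, 62, 60, 56, 60, 62, 61]

# Assumptions: chi-square table values for df = 24 and 98% confidence.
CHI2_RIGHT = 42.980  # area 0.01 to the right
CHI2_LEFT = 10.856   # area 0.99 to the right

df = len(class_totals) - 1
s2 = variance(class_totals)  # sample variance (divides by n - 1)

# Dividing by the larger critical value gives the lower limit, and vice versa.
var_low = df * s2 / CHI2_RIGHT
var_high = df * s2 / CHI2_LEFT
sd_low, sd_high = sqrt(var_low), sqrt(var_high)

print(f"{var_low:.3f} < variance < {var_high:.3f}")
print(f"{sd_low:.3f} < sigma < {sd_high:.3f}")
```

The result is an interval for σ of roughly 2.16 to 4.31; the exact digits depend on how much the sample standard deviation and the table values are rounded along the way.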
Hypothesis Tests
A hypothesis test is a procedure for testing a claim about the value of a population
proportion, a population mean, or a population standard deviation. The purpose of a
hypothesis test is to reach a conclusion about the claim.
Use a 0.01 significance level to test the claim that 20% of all Skittles candies are green.
n = 1496
p = 0.2
p̂ = 0.2039
x = 305
q = 0.8
α = 0.01
Step 1: The original claim is that 20% of all Skittles candies are green: p = 0.20
Step 2: The opposite of the original claim is p ≠ 0.20
Step 3: The null hypothesis is p = 0.20 and the alternative hypothesis is p ≠ 0.20
H0: p = 0.20
H1: p ≠ 0.20
Step 4: The significance level is α = 0.01
Step 5: Because the claim is about a population proportion p, the relevant sample statistic is
the sample proportion p̂, whose sampling distribution is approximately normal.
Step 6: The test statistic is z = 0.37, and the p-value is 0.7077, which is greater than 0.01.
Hypothesis test results:
p : Proportion of successes
H0 : p = 0.2
HA : p ≠ 0.2
Proportion  Count  Total  Sample Prop.  Std. Err.    Z-Stat      P-value
p           305    1496   0.20387701    0.010341754  0.37488858  0.7077
Step 7: Because the p-value is greater than the significance level of α = 0.01, we fail to
reject the null hypothesis.
p-value > α, or 0.7077 > 0.01, so we fail to reject H0
Step 8: Because we fail to reject the null hypothesis, there is not sufficient evidence to
warrant rejection of the claim that 20% of all Skittles candies are green.
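The test statistic and p-value for the green-candy claim can be recomputed directly. The sketch below applies the one-proportion z test with a two-tailed p-value, which matches the form of the StatCrunch output; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

n = 1496        # all candies in the class sample
x_green = 305   # green candies in the class sample
p0 = 0.20       # claimed proportion under H0
alpha = 0.01

p_hat = x_green / n

# The standard error uses the claimed proportion p0, not p_hat.
std_err = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / std_err

# Two-tailed test because the alternative is "p is not equal to 0.20".
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

print(f"z = {z:.4f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

A p-value this far above 0.01 means the sample gives no reason to doubt the 20% claim.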
Use a 0.05 significance level to test the claim that the mean number of candies in a bag of
Skittles is 56.
n = 25
x̄ = 59.8
μ = 56
α = 0.05
α/2 = 0.025
Step 1: The original claim is that the mean number of candies in a bag of Skittles is 56: μ = 56
Step 2: The alternative to the original claim is that μ does not equal 56: μ ≠ 56
Step 3: The null hypothesis is μ = 56 and the alternative hypothesis is μ ≠ 56.
H0: μ = 56
H1: μ ≠ 56
Step 4: The significance level is α = 0.05
Step 5: Because the claim is about a population mean μ and σ is not known, the relevant sample
statistic is the sample mean x̄, and the test uses the Student t distribution.
Step 6: The test statistic t = 6.6286 is calculated.
Hypothesis test results:
μ : Mean of variable
H0 : μ = 56
HA : μ ≠ 56
Variable                   Sample Mean  Std. Err.   DF  T-Stat     P-value
N of Candies for each bag  59.84        0.57930993  24  6.6285761  <0.0001
Step 7: Because the p-value is less than the significance level of α = 0.05, we reject the null
hypothesis.
p-value < α, or (< 0.0001) < 0.05, so we reject H0
Step 8: There is sufficient evidence to warrant rejection of the claim that the mean number of
candies in a bag of Skittles is 56.
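The t statistic can be rebuilt from the 25 bag totals as (x̄ - 56) divided by the standard error. Since the Python standard library has no t distribution, the sketch below does not compute the p-value; instead it compares the statistic with the two-tailed critical value 2.064 (the t-table entry for 24 degrees of freedom and α = 0.05), which is typed in as an assumption.

```python
from math import sqrt
from statistics import mean, stdev

# Candies per bag for all 25 students (Total column of the class table).
class_totals = [62, 59, 60, 53, 55, 61, 59, 59, 61, 62, 59, 58, 68,
                62, 60, 61, 60, 60, 56, 62, 60, 56, 60, 62, 61]

mu0 = 56                 # claimed mean under H0
T_CRIT_95_DF24 = 2.064   # assumption: two-tailed t-table value, df = 24

n = len(class_totals)
std_err = stdev(class_totals) / sqrt(n)
t_stat = (mean(class_totals) - mu0) / std_err

# Critical-value method: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > T_CRIT_95_DF24

print(f"t = {t_stat:.4f}, critical value = {T_CRIT_95_DF24}, reject H0: {reject_h0}")
```

The critical-value method and the p-value method always lead to the same decision for a given significance level, so this is only a different route to the same check.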
There are three conditions for a confidence interval for estimating a population proportion p:
1) The sample is a simple random sample.
2) The conditions for the binomial distribution are satisfied: there is a fixed number of
independent trials, each with two categories of outcome and a constant probability.
3) There are at least 5 successes and at least 5 failures.
We treat our sample as randomly selected. Each of the 1496 candies either is or is not
purple, and the sample contains 294 purple candies and 1202 candies of other colors, so both
counts are far above 5. We can therefore calculate a confidence interval for estimating the
population proportion p.
Conditions for Confidence Interval for Estimating a Population Mean with σ not known
1) The sample is a simple random sample.
2) Either or both of these conditions are satisfied: the population is normally distributed or
n > 30.
We treat our sample as randomly selected. Our sample size of 25 is smaller than 30, so the
second condition holds only if the population is normally distributed, and we assume that it
is. On that basis we can still calculate a confidence interval for estimating a population
mean with σ not known.
Conditions for Confidence Interval for Estimating a Population Standard Deviation or Variance
1) The sample is a simple random sample.
2) The population must have normally distributed values.
Our sample met the condition that the sample is randomly selected. We do not know for certain
whether the population is normally distributed, so we assume that it is. Under that
assumption the two conditions are met, and we can calculate a confidence interval for
estimating a population standard deviation or variance.
Mistakes could have been made in gathering these data. One type of error is recording
incorrect data, which could happen if a person miscounted or wrote down the wrong quantity for
a color. The sampling method could be improved by increasing the sample size. We could also
improve the sampling method by acquiring bags from different parts of the country and/or
world, rather than only from the local area.
We have drawn the conclusion that the true mean number of candies in each bag of
Skittles is close to the sample mean of 59.84 that we found from our data. We have also drawn
the conclusion that the colors of Skittles are somewhat evenly proportioned in each bag.
Reflection
Some of the things that I have learned from this paper are that statistics is not as easy as
algebra, but that I can use it in real-life situations. I am not sure that I will ever have to
know how many candies are in a Skittles bag, but it made learning statistics fun and
different. I was surprised to learn that not all Skittles bags have the same number of candies.
Some bags may have more of one candy color than others, which makes me wonder how likely it is
that a Skittles bag anywhere in the world would have an outlying number of candies.
The math skill from this project that I will apply to other classes is interpreting word
problems. In statistics, understanding the problem and choosing the right formula are
critical, because if I do not read a problem carefully, I could easily solve it with the wrong
method, and the answer would not even be close to the right one. If I apply these interpreting
skills in my future classes, I will get the right information and make decisions correctly.
This Statistics class refreshed my problem-solving skills after I had taken ten years off from
school. I was confident at the beginning of the semester, but facing complex problems made me
less confident. However, I regained a lot of confidence after practicing solving problems.
With a variety of formulas and scenarios throughout the book, my thought process was
challenged, and I had to think problems through. I was able to see the changes and development
in my problem-solving skills as I completed different parts of the project. Each section had
specific challenges that required us to use our resources and judgment. This project changed
the way I think about real-world math applications. I learned that I could use statistics in
many real-world ways, not just with Skittles. When you think about it, you actually use
statistics a lot more than you realize, even if only simple statistics. I had never thought
that the chance of having a baby of a certain gender could be calculated statistically. To
build an airplane, an engineer has to consider the size of the seats in relation to the
average size of a passenger's body. I noticed that statistics is all around us, but people do
not realize how much of it is in the real world.