
Project Part One: The first thing we did was compile data. Each student was told to go to the store and buy a 2.17 oz bag of Skittles candies. Here's my data:
                                 Red      Orange   Yellow   Green    Purple   Total
My Bag                           13       12       12       11       11       59
Class Counts                     791      814      763      748      798      3914
Proportions for my bag           22.03%   20.34%   20.34%   18.64%   18.64%   100%
Proportions for entire data set  20.21%   20.80%   19.49%   19.11%   20.39%   100%
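As a quick sanity check, the proportions in the table can be recomputed from the counts. Here is a minimal Python sketch; the dictionaries simply restate the counts from the table above:

```python
# Counts from the table: my single bag and the whole class's totals.
my_bag = {"Red": 13, "Orange": 12, "Yellow": 12, "Green": 11, "Purple": 11}
class_counts = {"Red": 791, "Orange": 814, "Yellow": 763, "Green": 748, "Purple": 798}

def proportions(counts):
    """Return each color's share of the total, as a percentage rounded to 2 places."""
    total = sum(counts.values())
    return {color: round(100 * n / total, 2) for color, n in counts.items()}

print(proportions(my_bag))        # Red comes out to 22.03, matching the table
print(proportions(class_counts))  # Red comes out to 20.21, matching the table
```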
i)
The graph is not much different from what I expected, but the variation in the totals is definitely worth looking into. We expected to see a proportion of about 20% for each color, and each of the totals is very close to that 20% threshold. Our proportions range only from 19.1% for green to 20.8% for orange.
ii)
We have two values that are outliers. In one instance somebody recorded 88 total Skittles, and in another somebody recorded over 100. These are both very interesting considering that most of our data sits right around sixty Skittles per bag. These outliers will skew our average and make our intervals wider. They will also make our range incredibly large and our totals higher than what we'd predicted.
iii)
The distribution of my data is definitely wider. This is likely due to my sample size of 59, compared to the entire class's sample of 3,914. My percentages ranged from 18.64% to 22.03%, which is much wider than the class variation.

[Pie chart: Total Number of Each Candy (Red 791, Orange 814, Yellow 763, Green 748, Purple 798)]

Project Part Two: Next we compiled the entire class's data, came up with proportions, and made graphs relating the total proportions. Above is a pie chart along with the data. Below is a bar graph representing our total number of candies of each color.

[Bar graph: Total Number of Each Candy (Orange 814, Purple 798, Red 791, Yellow 763, Green 748)]

Project Part Three: Next we provided more summary statistics for the total number of candies in each bag, including the standard deviation among other things. We also created more graphs, such as a histogram, and explained which kinds of statistics are appropriate for different kinds of data.
1. i. Mean number of candies per bag: 61.16
   ii. Standard deviation of the number of candies per bag: 8.01
   iii. 5-number summary:
        a. Sample minimum: 52
        b. First quartile: 58.5
        c. Median: 60
        d. Third quartile: 61
        e. Sample maximum: 110
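These summary statistics can be reproduced with Python's standard statistics module. The list below is a small placeholder, not the real 64-bag class data, so the printed numbers are illustrative only:

```python
import statistics

# Placeholder per-bag totals (NOT the actual class data set of 64 bags).
candies = [52, 58, 59, 60, 60, 61, 62, 110]

mean = statistics.mean(candies)            # the class data gave 61.16
stdev = statistics.stdev(candies)          # sample std. dev. (divides by n - 1); class data gave 8.01
q1, median, q3 = statistics.quantiles(candies, n=4)  # quartiles for the 5-number summary
print(min(candies), q1, median, q3, max(candies))
```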

2. Frequency Histogram

3. Box Plot for Total Candies in Each Bag

Individual Portion Part Three


The candies are pretty evenly distributed when you consider how big our sample size is, but we do have some variation: our lowest count, green at 748, was 66 less than our highest value of 814. Overall the data reflects mostly what I expected to see: small variation and fairly evenly distributed values. My bag had 59 total candies and a larger variation in its distribution compared to the overall sample, which contained 3,914 total candies. This is exactly what we'd expect to see if the null is true: the larger the sample gets, the less variable it will be.
Categorical data is data where you can't quantify your findings; you can't put a number to your answers. For example, you don't ask someone whether they are happy and expect an answer of 3. That just wouldn't make much sense. Instead you want a yes or no answer, and those two different answers would be two different categories. Another example is favorite colors. When you ask someone their favorite color, you don't expect them to give you the wavelength of the color blue. You expect them to tell you a color, and that would be a category. The answers people give aren't tied to a specific number; they're not numerical. Calculations that make sense for this kind of data are things like counts and proportions. Something that wouldn't make a lot of sense is a mean. What's the mean of all the colors? Well, if you mix all the colors you get a gross brown color. (I tried it in the third grade.) It wouldn't make sense to say, "The average color of people's favorite colors is brown." That would make no sense whatsoever and would distort our study; it shows nothing significant and could make the findings useless and false. The graphs that work best for this kind of data are pie and bar graphs, which show the different categories relative to each other in an easy-to-read way. One that would not work well for categorical data is a box plot; again you'd run into the problem of having no real mean or other vital numerical summaries.
Quantitative data is data that deals with numerical values. A couple of great examples are test scores, height, weight, or distance. For instance, if you're going to get a driver's license at the DLD, they ask your height on a form you need to fill out. If you just put "tall," you'd probably be denied a driver's license; they expect a specific height value. Some calculations that don't make a lot of sense for quantitative data are counts of one exact value, such as a particular dollar amount. For example, finding what portion of the population currently has exactly $2.95 in their pocket wouldn't be valuable data. You could ask what portion of people live on less than $2.95 a day, but asking how many have exactly $2.95 in their pocket right now wouldn't make much sense. A graph that doesn't make a ton of sense for quantitative data is a pie chart. If you split your quantitative data into categories a pie chart could make sense, but otherwise it wouldn't: you'd basically need a different slice for each sample, because different people will have slightly different amounts of money in their pocket, or be slightly shorter or heavier.
Part Four: We then calculated confidence intervals from our sample data to determine whether certain claims about the data we compiled could be true. We also explained why confidence intervals are worthwhile and effective ways to assess the significance of our data.
1. 1-Proportion Z-Interval
   x: 763
   n: 3914
   C-Level: .99
   Calculate: (.179, .211)
   p̂: .195

   E = 2.576 × √(.195(1 − .195)/3914) = .01631

   Margin of Error: .195 ± .016

   Using the Equation Editor:
   .211 − .179 = .032
   .032 / 2 = .016
   Critical Value: invNorm(.995) = 2.576
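The z-interval above can be reproduced in a few lines of Python, using the same inputs (763 yellow candies out of 3,914 total, z* = 2.576 for 99% confidence):

```python
from math import sqrt

x, n, z_star = 763, 3914, 2.576
p_hat = x / n                                  # sample proportion, about .195
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)  # margin of error, about .0163
print(round(p_hat - margin, 3), round(p_hat + margin, 3))  # (0.179, 0.211)
```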
2. T-Interval
   x̄: 61.15625
   Sx: 8.008366161
   n: 64
   C-Level: .95
   Calculate: (59.156, 63.157)

   E = 1.998 × 8.008/√64 ≈ 2.000

   Margin of Error:
   63.157 − 59.156 = 4.001
   4.001 / 2 = 2.0005
   (61.156 ± 2.0005)
   Critical Value: invT(.975, 63) ≈ 1.998
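The t-interval's numbers can be checked the same way. The sketch below assumes the 95% critical value t* ≈ 1.998 (what the calculator's invT(.975, 63) returns for 63 degrees of freedom):

```python
from math import sqrt

xbar, s, n = 61.15625, 8.008366161, 64
t_star = 1.998                      # assumed 95% critical value for df = 63
margin = t_star * s / sqrt(n)       # about 2.00
print(round(xbar - margin, 2), round(xbar + margin, 2))  # (59.16, 63.16)
```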
3. We used the chi-square table to pick the critical values.

   (64 × 8.0008²)/88.379 < σ² < (64 × 8.0008²)/37.485

   46.35512 < σ² < 109.29223

   6.8085 < σ < 10.4543
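The chi-square interval for the standard deviation can be recomputed directly from the values above (the two critical values are the ones read off the chi-square table):

```python
from math import sqrt

n, s = 64, 8.0008
chi_left, chi_right = 37.485, 88.379   # table critical values used above

var_low = n * s**2 / chi_right         # lower bound for the variance
var_high = n * s**2 / chi_left         # upper bound for the variance
print(round(sqrt(var_low), 4), round(sqrt(var_high), 4))  # 6.8085 10.4543
```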
4. A. We are 99% confident that between 17.9% and 21.1% of all Skittles are yellow.
   B. We are 95% confident that the mean number of Skittles in a regular-sized bag is between 59 and 64.
   C. We are 98% confident that the standard deviation of the number of Skittles per bag is between 6.8085 and 10.4543.
Confidence intervals are great ways to interpret data. According to McGraw Hill, "The purpose of confidence intervals is to give us a range of values for our estimated population parameter rather than a single value or a point estimate. The estimated confidence interval gives us a range of values within which we believe, with varying degrees of confidence, that the true population value falls." That's the technical way to put it. I think the purpose of a confidence interval is to give us a range of values for what we'd expect of a particular population; that's me trying to make it sound simple. For example, if we got a 95% confidence interval of (.61, .71) for the proportion of all people who love chocolate, we could say we are 95% confident that between 61% and 71% of all people love chocolate. My favorite thing about a confidence interval is the way you can interpret the variability of a data set: if you get a really wide confidence interval, you can assume your population has more variation than one with a narrower interval.
Part Five: In part five we used hypothesis testing to test certain hypothetical claims. We used p-values to test the claims.
1. We use hypothesis testing to test someone's claim. We use the null hypothesis to state the original claim, and the alternative hypothesis to cover anything that isn't the original claim. For example, the null hypothesis might be that exactly 50% of people sitting at a table read a public ad placed on the table, and the alternative would be that less than 50% of people sitting at a table read the ad. We do a hypothesis test to see whether or not there is significant evidence against the null hypothesis.
2. The validity conditions are that there are more than 20 trials with at least 10 successes and 10 failures, the data cannot be heavily skewed, and the distribution must be somewhat normal.
3. z = (p̂ − p) / √(p(1 − p)/n) ≈ 0.33

   Our decision about the null hypothesis is based on our p-value, which is .7431. This is greater than our significance level of .05, so we have determined that there is little to no evidence against the null and therefore cannot reject it.
   In conclusion, there is little or no evidence against the null hypothesis that 20% of all Skittles are red.
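The z-test can be sketched with only the standard library, assuming the test used the class totals (791 red candies out of 3,914) against the null proportion p = .20; the normal CDF is built from math.erf:

```python
from math import sqrt, erf

p0, x, n = 0.20, 791, 3914
p_hat = x / n                                      # observed proportion of red
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)         # test statistic, about 0.33

def norm_cdf(t):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1 + erf(t / sqrt(2)))

p_value = 2 * (1 - norm_cdf(abs(z)))               # two-sided p-value, about .7431
print(round(z, 2), round(p_value, 4))
```

Since the p-value is far above .05, this reproduces the conclusion above: no evidence against the null.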
4. Input the data in L1, then go to Tests, T-Test, Data:
   μ0 = 55
   x̄ = 61.156
   Sx = 8.008
   n = 64
   Calculate:
   t = 6.15
   p-value = 0.00000005
   Because the p-value is far smaller than the .01 significance level, there is very strong evidence against the null: it is not likely that the mean is 55. We can conclude that there is strong evidence that the mean number of Skittles per bag differs from 55.
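The t statistic in that test follows directly from the summary statistics (the tiny p-value itself comes from the t distribution with 63 degrees of freedom, which the calculator handles):

```python
from math import sqrt

mu0, xbar, s, n = 55, 61.156, 8.008, 64
t = (xbar - mu0) / (s / sqrt(n))   # (61.156 - 55) / (8.008 / 8)
print(round(t, 2))                 # 6.15, matching the calculator output
```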
Part Six: This was a reflection on our experience with the project and with stats in general.

I've learned so much doing this project and throughout this class. I've learned things ranging from how easily scientific data can be wrongly projected to how easily data can be entirely false because a couple of people reported false information to the researcher. These simple mistakes can be made by anyone who isn't extremely careful with the data set and the facts. Most of all, throughout this class and this project I've learned to question everything: question who's doing the research, who's paying for the research, who's interpreting the data, and so much more. The questions I have seem endless now.
Some of what I'm referring to is the data set we received in the Skittles project. The directions were as clear as could be: get a regular-sized bag of Skittles. As students, our grades are on the line if we don't follow directions; we have something to lose. Still, even with something at risk, people got the wrong size of bag of Skittles. Now reflect this onto the real world: how many people who answer surveys or take part in statistical studies aren't very honest, or answer wrong? These people have nothing at stake, and some would likely skew the data just for kicks or a quick laugh. This is one of the reasons you have to question everything: who or what is being researched, and how? Were the researchers wise enough to throw out the false data, and when they did, did they ensure only false data was thrown out?

I truly loved this class and this project. It showed me how to look at data and statistical facts in a new way. There's so much more variation in every statistic than what magazines and even researchers project. In the end, what I really learned doing this project and going through this class was to look deeper into what scientists tell the world: to question anything and everything, and to interpret it with a more educated and informed mind.
