Data Analysis Project

Part 1 Data Analysis
We chose to do the Exhale Study Data.

We will collect a data sample, then organize and analyze the data and draw conclusions. We will first create a pie
chart of part of our data, then a histogram and a box plot. These graphs will help show our data visually. Then we
will build confidence intervals and test hypotheses at a chosen level of significance. The goal of the assignment is
not specifically about the data but to show that what we have learned in statistics lets us find this information
from the data set.
Project Worksheet

Population
Categorical Variable: Yes
All Values of Categorical Variable: Female, Male.
Choose one of the above values to use in Part 4 and Part 5 of the project.
P= Female
Sample One: n = 14
x = 7
p(hat) = 0.5

Sample Two: n = 14
x = 8
p(hat)=0.571

Quantitative Variable: Height
Population:
Mean (mu): 61.14357798
Standard Deviation (sigma): 5.699150646
Sample One:
n = 13
x(bar)= 56.23076923
s = 6.022926284
Sample Two:
n=13
x(bar)=61.46153846
s=6.989460931

Part 2 Data Analysis Project

SYSTEMATIC SAMPLING
Sample One: 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1
I got these values by choosing every 40th person, which is known as systematic sampling. Systematic sampling is
when you select some starting point and then select every kth element in the population.
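
The definition above can be sketched in a few lines of Python. This is only an illustration: the list of row numbers is a stand-in for the data set (654 rows, the population size used later in this report), and the starting point is an assumption, since the report does not say where the count began.

```python
def systematic_sample(population, k, start=0):
    """Return every k-th element of `population`, beginning at index `start`."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return population[start::k]


# Hypothetical stand-in for the data set: row numbers 1 through 654.
rows = list(range(1, 655))

# Assumed starting point: the 40th row (index 39), then every 40th row after it.
picked = systematic_sample(rows, k=40, start=39)
print(picked)
```

A different starting point gives a different sample, which is why the starting point is normally chosen at random.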

SIMPLE RANDOM SAMPLING
Sample Two: 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0
I got these values by selecting at random throughout the data: I would scroll down and choose whichever row my cursor
was on. A simple random sample is a sample of n subjects selected in such a way that every possible sample of the same
size n has the same chance of being chosen.
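
One way to make the selection independent of where the cursor lands is to let a random number generator pick the rows. The sketch below is an illustration only: the row numbers stand in for the data set, and the function name and seed value are assumptions made for the example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical stand-in for the data set: row numbers 1 through 654.
rows = list(range(1, 655))

picked = simple_random_sample(rows, n=14, seed=1040)
print(sorted(picked))
```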

Results:      Female    Male
Original:     48.62%    51.38%
Sample One:   50%       50%
Sample Two:   57.14%    42.86%

The original and sample one (systematic sampling) are pretty close: sample one has 1.38 percentage points more
Females and 1.38 percentage points fewer Males. There is quite a difference between the original and sample two
(simple random sample), which has 8.52 percentage points more Females and 8.52 percentage points fewer Males.
Sample one and sample two are quite different too: sample two has 7.14 percentage points more Females and 7.14
percentage points fewer Males than sample one.
In both samples the share of Females came out higher than in the original, so the share of Males came out lower.
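
The percentages in the comparison can be recomputed from the coded samples. In this sketch the coding 0 = Female is an assumption inferred from the counts: sample two contains eight 0s, which matches x = 8 on the worksheet.

```python
sample_one = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1]
sample_two = [0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

FEMALE = 0  # assumed coding, inferred from the worksheet counts
POPULATION_PCT_FEMALE = 48.62  # from the results table


def percent_female(codes):
    """Percentage of entries in `codes` that are coded as Female."""
    return 100 * codes.count(FEMALE) / len(codes)


for name, codes in [("Sample One", sample_one), ("Sample Two", sample_two)]:
    pct = percent_female(codes)
    diff = pct - POPULATION_PCT_FEMALE
    print(f"{name}: {pct:.2f}% Female ({diff:+.2f} percentage points vs. original)")
```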

Project Worksheet
Population
Categorical Variable: Yes
All Values of Categorical Variable: Female, Male.
Choose one of the above values to use in Part 4 and Part 5 of the project.
P= Female
Sample One: n = 14
x = 7
p(hat) = 0.5

Sample Two: n = 14
x = 8
p(hat)=0.571

Population Mean of Heights = 61.14357798
Standard Deviation = 5.699150646
1. Sample minimum = 46
2. First Quartile = 57
3. Median = 61.5
4. Upper Quartile = 65.5
5. Sample Maximum = 74

CLUSTER SAMPLING
Sample One: 48, 49.5, 50, 51, 54, 55.5, 56, 57, 57.5, 59.5, 60, 64, 69
I got these values by randomly choosing one group of rows, rows 300-312, and taking every person in it. A cluster
sample is obtained by selecting all individuals within a randomly selected collection or group of individuals.
Sample Statistics: Mean: 56.231, Standard Deviation: 6.023, Five-number summary: 48, 50.5, 56, 59.75, 69
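
As a rough sketch of cluster sampling, the rows can be split into consecutive groups and one group drawn at random. The group size of 13, the row numbers, and the function names are assumptions made for illustration; with this particular split, rows 300-312 happen to form one of the groups.

```python
import random


def make_clusters(population, size):
    """Split `population` into consecutive groups of at most `size` elements."""
    return [population[i:i + size] for i in range(0, len(population), size)]


def cluster_sample(population, size, seed=None):
    """Pick one whole cluster at random and return every element in it."""
    rng = random.Random(seed)
    return rng.choice(make_clusters(population, size))


# Hypothetical stand-in for the data set: row numbers 1 through 654.
rows = list(range(1, 655))

chosen = cluster_sample(rows, size=13, seed=7)
print("Rows", chosen[0], "to", chosen[-1])
```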

SYSTEMATIC SAMPLING
Sample Two: 48, 55, 55, 56.5, 57.5, 58, 63, 64, 66, 67.5, 68, 68.5, 72
I got these values by choosing every 50th person, which is known as systematic sampling. Systematic sampling is
when you select some starting point and then select every kth element in the population.
Sample Statistics: Mean: 61.462, Standard Deviation: 6.989, Five-number summary: 48, 55.75, 63, 67.75, 72
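
The sample statistics for both height samples can be checked with Python's standard `statistics` module. The quartile convention is an assumption here: `method="exclusive"` is used because it reproduces the quartiles reported for these two samples, while other conventions can give slightly different values.

```python
import statistics

sample_one = [48, 49.5, 50, 51, 54, 55.5, 56, 57, 57.5, 59.5, 60, 64, 69]
sample_two = [48, 55, 55, 56.5, 57.5, 58, 63, 64, 66, 67.5, 68, 68.5, 72]


def summarize(values):
    """Return the mean, sample standard deviation, and five-number summary."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # divides by n - 1
        "five_number": (min(values), q1, median, q3, max(values)),
    }


for name, values in [("Sample One", sample_one), ("Sample Two", sample_two)]:
    stats = summarize(values)
    print(name, round(stats["mean"], 3), round(stats["stdev"], 3), stats["five_number"])
```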

Results:
The original frequency histogram is approximately normal, and so is the histogram for sample one (cluster sample),
whereas the histogram for sample two (systematic sampling) is not.
Sample two has more variance than sample one (s = 6.989 versus s = 6.023), so its heights are more spread out and
its histogram does not look normal. Sample two also has more high values than sample one, which shifts its boxplot
up and gives it a higher five-number summary than sample one.

Project Worksheet
Quantitative Variable: Height
Population:
Mean (mu): 61.14357798
Standard Deviation (sigma): 5.699150646
Sample One:
n = 13
x(bar)= 56.23076923
s = 6.022926284
Sample Two:
n=13
x(bar)=61.46153846
s=6.989460931

Confidence Intervals Samples
Sample One:
95% confidence level
Significance level: alpha = 0.05
n = 14
x = 7
p(hat) = 0.5

z(alpha/2) = z(0.025) = 1.96
Margin of Error:
E = 1.96 * sqrt((0.5)(0.5)/14) = 0.261916017

Confidence Interval:
0.5 - 0.261916017 = 0.238
0.5 + 0.261916017 = 0.762
(0.238 < p < 0.762)

Sample Two:
95% confidence level
n = 14
x = 8
p(hat) = 0.57
z(alpha/2) = z(0.025) = 1.96
Margin of Error:
E = 1.96 * sqrt((0.57)(0.43)/14) = 0.259336538

Confidence Interval:
0.57 - 0.259336538 = 0.311
0.57 + 0.259336538 = 0.829
(0.311 < p < 0.829)
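
The margin-of-error calculation for a proportion can be written as a small function. This is a minimal sketch using sample one's values; the function name is made up for the example, and the critical value 1.96 is taken from the z table as in the work above.

```python
import math


def proportion_interval(p_hat, n, z=1.96):
    """Return (lower, upper, margin) for a confidence interval for a proportion."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# Sample One: p(hat) = 0.5, n = 14, 95% confidence level
lower, upper, margin = proportion_interval(0.5, 14)
print(f"E = {margin:.6f}, {lower:.3f} < p < {upper:.3f}")
```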

Confidence Intervals for the Mean Quantitative Samples
Sample One:
90% Confidence level


n = 13
x(bar)= 56.23076923
s = 6.022926284
degrees of freedom = 12
t(alpha/2) = t(0.05) = 1.782
Margin of Error:
E = 1.782 * (6.022926284 / sqrt(13)) = 2.976758287
Confidence Interval:
56.23076923 - 2.976758287 = 53.254
56.23076923 + 2.976758287 = 59.208
53.254 < Mu < 59.208

Sample Two:

n=13
x(bar)= 61.46153846
s = 6.989460931
degrees of freedom = 12
t(alpha/2) = t(0.05) = 1.782
Margin of Error:
E = 1.782 * (6.989460931 / sqrt(13)) = 3.454456
Confidence Interval:
61.46153846 - 3.454456 = 58.007
61.46153846 + 3.454456 = 64.916
58.007 < Mu < 64.916
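
The two intervals for the mean follow the same pattern, so they can be checked with one function. In this sketch the critical value 1.782 (12 degrees of freedom, 90% confidence) is taken from the t table rather than computed, and the function name is an assumption made for the example.

```python
import math


def mean_interval(x_bar, s, n, t_crit):
    """Return (lower, upper, margin) for a t-based confidence interval for the mean."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin


T_CRIT = 1.782  # t table value for 12 degrees of freedom, 90% confidence

samples = {
    "Sample One": (56.23076923, 6.022926284, 13),
    "Sample Two": (61.46153846, 6.989460931, 13),
}

for name, (x_bar, s, n) in samples.items():
    lower, upper, margin = mean_interval(x_bar, s, n, T_CRIT)
    print(f"{name}: {lower:.3f} < Mu < {upper:.3f}")
```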

Confidence Intervals for Standard Deviation Quantitative Samples
Sample One:
n= 13
s=6.022926284
99% confidence level: alpha/2 = 0.005
Chi-square (right) = 28.299
Chi-square (left) = 3.074

(13-1)(6.022926284)^2 / 28.299 = 15.3824
(13-1)(6.022926284)^2 / 3.074 = 141.6095
Take square roots:
sqrt(15.3824) = 3.922
sqrt(141.6095) = 11.900
3.922 < Sigma < 11.900

Sample Two:
n= 13
s = 6.989460931
alpha/2 = 0.005
Chi-square (right) = 28.299
Chi-square (left) = 3.074
(13-1)(6.989460931)^2 / 28.299 = 20.7156
(13-1)(6.989460931)^2 / 3.074 = 190.7062
Take square roots:
sqrt(20.7156) = 4.551
sqrt(190.7062) = 13.810
4.551 < Sigma < 13.810
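
The interval for the standard deviation divides (n-1)s^2 by each chi-square critical value and then takes square roots. Below is a minimal sketch using sample one's values; the critical values are copied from the chi-square table as in the work above, and the function name is an assumption made for the example.

```python
import math


def sigma_interval(s, n, chi2_right, chi2_left):
    """Return (lower, upper) for a confidence interval for the standard deviation."""
    sum_of_squares = (n - 1) * s ** 2
    lower = math.sqrt(sum_of_squares / chi2_right)  # larger divisor gives the lower bound
    upper = math.sqrt(sum_of_squares / chi2_left)   # smaller divisor gives the upper bound
    return lower, upper


# Sample One: s = 6.022926284, n = 13, table values for 12 degrees of freedom
lower, upper = sigma_interval(6.022926284, 13, chi2_right=28.299, chi2_left=3.074)
print(f"{lower:.3f} < Sigma < {upper:.3f}")
```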

A confidence interval is a range of values so defined that there is a specified probability that the value of a parameter
lies within it.
The confidence level is the probability 1 - alpha, often expressed as the equivalent percentage value, that the
confidence interval actually does contain the population parameter, assuming that the estimation process is
repeated a large number of times.
So each problem has a given confidence level, which says how confident we are that the parameter lies within the
range. To build the range we calculate a margin of error, which is the maximum likely difference between the sample
statistic and the population parameter.
I think most of the intervals did capture the population parameter. Both intervals for the proportion contain
p = 0.4862, both intervals for the standard deviation contain Sigma = 5.699, and the sample two interval for the mean
contains Mu = 61.14. The sample one interval for the mean (53.254 < Mu < 59.208) does not contain 61.14.

Population Proportion
H0: p=0.4862
H1: p (does not equal) 0.4862
0.05 significance level
n= 654
x = 318
p(hat) = 0.486238532
z = (0.486238532 - 0.4862) / sqrt((0.4862)(0.5138)/654) = 0.001971541
Area to the right of z = 0.00: 0.5000
Two Tailed
P-Value = 2(0.5000) = 1
Fail to reject the null hypothesis: the P-value is greater than the significance level.
There is not sufficient evidence to warrant rejection of the claim that 48.62% of the participants in the Exhale
study were female.
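
The test statistic and P-value can be checked without a table. This sketch uses `NormalDist` from Python's standard library in place of the z table, so the P-value comes out slightly below 1 because z is not rounded to 0.00; the function name is an assumption made for the example.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x, n, p0):
    """Return (z, two-tailed P-value) for H0: p = p0."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_proportion_z_test(x=318, n=654, p0=0.4862)
alpha = 0.05
print(f"z = {z:.6f}, P-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```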
Population Mean
H0: Mu=61.14357798
H1: Mu (does not equal) 61.14357798
0.01 significance level

n=13
x(bar)=56.23076923
s= 6.022926284
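t = (56.23076923 - 61.14357798) / (6.022926284 / sqrt(13)) = -2.940992969
degrees of freedom = 12
Critical values = +/-3.055
Fail to Reject: -3.055 < -2.941 < 3.055, so the test statistic is not in the critical region.
There is not sufficient evidence to warrant rejection of the claim that the mean is 61.14.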
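
The t statistic for this test can be recomputed as follows. In this sketch the critical value 3.055 (12 degrees of freedom, two tails, 0.01 significance level) is copied from the t table rather than computed, and the function name is an assumption made for the example.

```python
import math


def one_sample_t(x_bar, mu0, s, n):
    """Return the t test statistic for H0: mu = mu0."""
    return (x_bar - mu0) / (s / math.sqrt(n))


t = one_sample_t(x_bar=56.23076923, mu0=61.14357798, s=6.022926284, n=13)
T_CRIT = 3.055  # t table value, 12 degrees of freedom, two-tailed, alpha = 0.01

reject = abs(t) > T_CRIT
print(f"t = {t:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```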

Ciara Long
Math 1040
Prof. Schweitzer
August 10, 2014
What I Learned
I feel I have learned a lot in statistics this semester. I have already encountered an article that I was
reading for my research paper for my English class and at the bottom of the article there was the margin of
error. I felt awesome that I knew the meaning! It was really cool to see it used in something I am interested in
and it helped me better understand the article, let alone write a good sentence or two about the article. The specific
parts of the project that helped me were selecting my samples and then making the charts. I really like
visually seeing the differences and how using different sample methods can have different outcomes. I am
hoping that I did the math right during the last two parts. I had a really hard time understanding part 5. I really
enjoy when the problem is set up for me and I am able to pick out the parts that I need to test the hypothesis. It
is definitely more difficult when you have to set it up yourself. It helped me with problem solving skills
especially during the intervals on part 4. I really enjoy doing those and I really understand them. I most
definitely think that everything I have learned during the semester can be applied to the real-world. I never once
thought that math was not able to be applied to the real-world. I am impressed with all that I have learned in this
class and appreciate all that I have learned. I know that it can be applied to my career in the health field, but for
right now I know that it really won't apply. I am considering becoming a physician assistant so most definitely
all that I have learned can be applied once I get further down that road.
