
Introduction to Statistics

(modified for Science)

Statistics
The collection, evaluation, and interpretation of data

Statistical analysis of measurements can help verify the quality of a set of measurements.

Summary Statistics
Central Tendency
Center of a distribution
Mean, median, mode

Variation
Spread of values around the center
Range, standard deviation, interquartile range

Distribution
Summary of the frequency of values
Frequency tables, histograms, probability distributions (normal distribution)

Standard Deviation

Variation

Measure of data variation
The standard deviation is a measure of the spread of data values.
A larger standard deviation indicates a wider spread in data values.

Standard Deviation

Variation

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

$\sigma$ = standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\mu$ = mean
$N$ = size of population

Standard Deviation

Variation

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

Procedure
1. Calculate the mean, $\mu$.
2. Subtract the mean from each value and then square each difference.
3. Sum all squared differences.
4. Divide the summation by the size of the population (number of data values), $N$.
5. Calculate the square root of the result.

Standard Deviation
Calculate the standard deviation for the data array

2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

1. Calculate the mean: $\mu = \frac{\sum x_i}{N} = \frac{524}{11} = 47.64$ (47.6364 before rounding)
2. Subtract the mean from each data value and square each difference, $(x_i - \mu)^2$. The squares below are computed with the unrounded mean.
(2 - 47.64)^2 = 2082.6777
(5 - 47.64)^2 = 1817.8595
(48 - 47.64)^2 = 0.1322
(49 - 47.64)^2 = 1.8595
(55 - 47.64)^2 = 54.2231
(58 - 47.64)^2 = 107.4050
(59 - 47.64)^2 = 129.1322
(60 - 47.64)^2 = 152.8595
(62 - 47.64)^2 = 206.3140
(63 - 47.64)^2 = 236.0413
(63 - 47.64)^2 = 236.0413

Standard Deviation

Variation

3. Sum all squared differences:
$\sum (x_i - \mu)^2$ = 2082.6777 + 1817.8595 + 0.1322 + 1.8595 + 54.2231 + 107.4050 + 129.1322 + 152.8595 + 206.3140 + 236.0413 + 236.0413 = 5,024.5455

Note that this is the sum of the unrounded squared differences.

4. Divide the summation by the number of data values:
$\frac{\sum (x_i - \mu)^2}{N} = \frac{5024.5455}{11} = 456.7769$
5. Calculate the square root of the result:
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{456.7769} = 21.4$
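The five steps can be checked with a short Python sketch. The function name `population_std` is an illustrative choice, not something from the slides; the standard library's `statistics.pstdev` is used only as a cross-check.

```python
import math
import statistics

def population_std(values):
    """Population standard deviation, following the five-step procedure."""
    n = len(values)
    mean = sum(values) / n                              # step 1: mean
    squared_diffs = [(x - mean) ** 2 for x in values]   # step 2: squared differences
    total = sum(squared_diffs)                          # step 3: sum them
    variance = total / n                                # step 4: divide by N
    return math.sqrt(variance)                          # step 5: square root

data = [2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63]
sigma = population_std(data)
print(round(sigma, 1))  # 21.4

# Cross-check against the standard library's population standard deviation.
assert abs(sigma - statistics.pstdev(data)) < 1e-9
```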

Histogram

Distribution

Frequency

A histogram is a common data distribution chart that is used to show the frequency with which specific values, or values within ranges, occur in a set of data. A scientist might use a histogram to show the variation of a measurement that exists when an experiment is repeated.

[Figure: histogram; x-axis: Length (in.), 0.745 to 0.760 in steps of 0.001; y-axis: Frequency, 0 to 5]

Histogram

Distribution

Large sets of data are often divided into a limited number of groups. These groups are called class intervals.

[Figure: histogram x-axis divided into the class intervals -16 to -6, -5 to 5, and 6 to 16]

Histogram

Distribution

The number of data elements in each class interval is shown by the frequency, which is indicated along the Y-axis of the graph.

[Figure: histogram; x-axis: class intervals -16 to -6, -5 to 5, and 6 to 16; y-axis: Frequency, 1 to 7]

Histogram
Example

Distribution

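Data: 1, 7, 15, 4, 8, 8, 5, 12, 10
Sorted: 1, 4, 5, 7, 8, 8, 10, 12, 15

Class intervals:
1 to 5: 0.5 < x ≤ 5.5
6 to 10: 5.5 < x ≤ 10.5
11 to 15: 10.5 < x ≤ 15.5

[Figure: histogram; x-axis: class intervals 1 to 5, 6 to 10, and 11 to 15, with boundaries at 0.5, 5.5, 10.5, and 15.5; y-axis: Frequency, 1 to 4]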

Histogram

Distribution

The height of each bar in the chart indicates the number of data elements, or frequency of occurrence, within each range.

1, 4, 5, 7, 8, 8, 10, 12, 15

[Figure: histogram; x-axis: class intervals 1 to 5, 6 to 10, and 11 to 15; y-axis: Frequency; bar heights 3, 4, and 2]
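The bar heights in this example can be reproduced by counting how many data values fall in each class interval. The sketch below assumes each interval includes its upper boundary (lower < x ≤ upper); the helper name `class_frequencies` is hypothetical.

```python
def class_frequencies(values, boundaries):
    """Count the values falling in each interval lower < x <= upper."""
    counts = [0] * (len(boundaries) - 1)
    for x in values:
        for i in range(len(counts)):
            if boundaries[i] < x <= boundaries[i + 1]:
                counts[i] += 1
                break  # each value belongs to exactly one interval
    return counts

data = [1, 7, 15, 4, 8, 8, 5, 12, 10]
boundaries = [0.5, 5.5, 10.5, 15.5]  # class intervals 1 to 5, 6 to 10, 11 to 15
freqs = class_frequencies(data, boundaries)
print(freqs)  # [3, 4, 2]
```

Placing the boundaries at half-integers means no whole-number data value can land exactly on a boundary.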

Histogram

Distribution

[Figure: histogram; x-axis: Length (in.); y-axis: Frequency, 0 to 5; each bar covers one class interval, for example 0.7495 < x ≤ 0.7505]

MINIMUM = 0.745 in.

MAXIMUM = 0.760 in.

Research and Statistics
Often we do not have information on the entire population of interest.

Population versus sample
Population = all members of a group
Sample = part of a population

Inferential statistics involves estimating, forecasting, or predicting the odds of an outcome based on an incomplete set of data.
Uses sample statistics

Population versus Sample Standard Deviation


Population Standard Deviation
The measure of the spread of data within a population. Used when you have a data value for every member of the entire population of interest.

Sample Standard Deviation


An estimate of the spread of data within a larger population. Used when you do not have a data value for every member of the entire population of interest. This includes predicting the values of measurements which have not yet occurred. Uses a subset (sample) of the data to generalize the results to the larger population.

A Note about Standard Deviation

Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

$\sigma$ = population standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\mu$ = population mean
$N$ = size of population

Sample Standard Deviation

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

$s$ = sample standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\bar{x}$ = sample mean
$n$ = size of sample

Sample Standard Deviation

Variation

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Procedure:
1. Calculate the sample mean, $\bar{x}$.
2. Subtract the mean from each value and then square each difference.
3. Sum all squared differences.
4. Divide the summation by the number of data values minus one, $n - 1$.
5. Calculate the square root of the result.

Sample Mean

Central Tendency

$$\bar{x} = \frac{\sum x_i}{n}$$

$\bar{x}$ = sample mean
$x_i$ = individual data value
$\sum x_i$ = summation of all data values
$n$ = number of data values in the sample

Sample Standard Deviation

Estimate the standard deviation for a population for which the following data is a sample.

2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

1. Calculate the sample mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{524}{11} = 47.64$ (47.6364 before rounding)
2. Subtract the sample mean from each data value and square the difference, $(x_i - \bar{x})^2$. The squares below are computed with the unrounded mean.
(2 - 47.64)^2 = 2082.6777
(5 - 47.64)^2 = 1817.8595
(48 - 47.64)^2 = 0.1322
(49 - 47.64)^2 = 1.8595
(55 - 47.64)^2 = 54.2231
(58 - 47.64)^2 = 107.4050
(59 - 47.64)^2 = 129.1322
(60 - 47.64)^2 = 152.8595
(62 - 47.64)^2 = 206.3140
(63 - 47.64)^2 = 236.0413
(63 - 47.64)^2 = 236.0413

Sample Standard Deviation

Variation

3. Sum all squared differences:
$\sum (x_i - \bar{x})^2$ = 2082.6777 + 1817.8595 + 0.1322 + 1.8595 + 54.2231 + 107.4050 + 129.1322 + 152.8595 + 206.3140 + 236.0413 + 236.0413 = 5,024.5455
4. Divide the summation by the number of sample data values minus one:
$\frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{5024.5455}{10} = 502.4545$
5. Calculate the square root of the result:
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{502.4545} = 22.4$
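The sample procedure differs from the population one only in the divisor. A minimal sketch, with `sample_std` as an illustrative name and `statistics.stdev` from the Python standard library as a cross-check:

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation: divide by n - 1 instead of n."""
    n = len(values)
    mean = sum(values) / n                            # sample mean
    total = sum((x - mean) ** 2 for x in values)      # sum of squared differences
    return math.sqrt(total / (n - 1))                 # divide by n - 1, then root

data = [2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63]
s = sample_std(data)
print(round(s, 1))  # 22.4

# statistics.stdev also uses the n - 1 divisor.
assert abs(s - statistics.stdev(data)) < 1e-9
```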

A Note about Standard Deviation

As the sample size $n$ approaches the population size $N$, the sample standard deviation $s$ approaches the population standard deviation $\sigma$:

As $n \to N$, $s \to \sigma$
So for very large numbers of measurements, $s \approx \sigma$
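One way to see the effect of the divisor, as a sketch rather than a derivation from the slides: if both formulas are applied to the same $n$ data values (so that $N = n$ and $\mu = \bar{x}$), the sums of squared differences are identical and only the divisors differ.

```latex
\frac{s}{\sigma}
  = \sqrt{\frac{\sum (x_i - \bar{x})^2 / (n - 1)}{\sum (x_i - \bar{x})^2 / n}}
  = \sqrt{\frac{n}{n - 1}}
  \;\longrightarrow\; 1 \quad \text{as } n \to \infty
```

For the 11 values used in the worked examples, $\sqrt{11/10} \approx 1.049$, which is consistent with the results 21.4 and 22.4.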

A Note about Standard Deviation

Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

Given the SAT score of every student in your class, use the population standard deviation formula to find the standard deviation of the SAT scores in the class.

A Note about Standard Deviation

Sample Standard Deviation

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Given the SAT scores of every student in your class, use the sample standard deviation formula to estimate the standard deviation of the SAT scores of all students at your school, OR predict what the odds of a particular score are.

Probability Distribution

Distribution

A distribution of all possible values of a variable with an indication of the likelihood that each will occur
A probability distribution can be represented by a probability density function
Normal Distribution: most commonly used probability distribution

[Figure: normal distribution probability density function]
http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg

Normal Distribution
Is the data distribution normal?

Distribution

Translation: Is the histogram/dot plot bell-shaped?

Does the greatest frequency of the data values occur at about the mean value?
Does the curve decrease on both sides away from the mean?
Is the curve symmetric about the mean?

Normal Distribution

Distribution

Bell-shaped curve

[Figure: bell-shaped histogram with the mean value marked at the peak; x-axis: Data Elements; y-axis: Frequency]

Does the greatest frequency of the data values occur at about the mean value?
Does the curve decrease on both sides away from the mean?
Is the curve symmetric about the mean?

What if the data is not symmetric?

Histogram Interpretation: Skewed (Non-Normal) Right

What if the data is not symmetric?

A normal distribution is a reasonable assumption.

Empirical Rule (MAKING PREDICTIONS)


Applies to normal distributions
Almost all data will fall within three standard deviations of the mean

Empirical Rule
If the data are normally distributed:
68% of the observations fall within 1 standard deviation of the mean.
95% of the observations fall within 2 standard deviations of the mean.
99.7% of the observations fall within 3 standard deviations of the mean.
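The three percentages are rounded values for an ideal normal distribution. As a rough check, the fraction of a normal distribution within k standard deviations of the mean equals erf(k / √2); the function name `fraction_within` is an illustrative choice.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

Real data sets only approximate these fractions, and only when the histogram is close to bell-shaped.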

Empirical Rule Example


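Data from a sample of a larger population

Mean = $\bar{x}$ = 0.08
Standard Deviation = s = 1.77 (sample)

Normal Distribution
68% of the data fall within $\bar{x} \pm s$:
0.08 - 1.77 = -1.69
0.08 + 1.77 = 1.85

[Figure: bell curve centered at $\bar{x}$ = 0.08 with the region from -1.69 to 1.85 marked 68%; x-axis: Data Elements]

Normal Distribution
95% of the data fall within $\bar{x} \pm 2s$:
0.08 - 3.54 = -3.46
0.08 + 3.54 = 3.62

[Figure: bell curve centered at $\bar{x}$ = 0.08 with the region from -3.46 to 3.62 marked 95%; x-axis: Data Elements]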

Your Turn
Revisit the data you collected during the Fling Machine Instant Challenge.
Assume that you repeatedly launch cotton balls with your device. Using the mean and sample standard deviation of your data:
Predict the range of travel distances within which 68% of cotton balls would fall.
Predict the range of travel distances within which 95% of cotton balls would fall.

Example
Assume that a statistical analysis resulted in the following:
Mean = $\bar{x}$ = 2.35 ft
Sample standard deviation = s = 0.76 ft
Predict the range of travel distances within which 68% of cotton balls would fall.
$\bar{x} \pm s$:
2.35 - 0.76 = 1.59 ft
2.35 + 0.76 = 3.11 ft
Prediction: Approximately 68% of the launches will result in a travel distance between 1.59 ft and 3.11 ft.

Example
Assume that a statistical analysis resulted in the following:
Mean = $\bar{x}$ = 2.35 ft
Sample standard deviation = s = 0.76 ft
Predict the range of travel distances within which 95% of cotton balls would fall.
$\bar{x} \pm 2s$:
2.35 - 2(0.76) = 0.83 ft
2.35 + 2(0.76) = 3.87 ft
Prediction: Approximately 95% of the launches will result in a travel distance between 0.83 ft and 3.87 ft.
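Both predictions come from the same calculation, mean ± k standard deviations. A small sketch, assuming the data are roughly normal; `empirical_range` is a hypothetical helper name.

```python
def empirical_range(mean, s, k):
    """Return (low, high) = mean -/+ k standard deviations."""
    return mean - k * s, mean + k * s

mean_ft, s_ft = 2.35, 0.76

low68, high68 = empirical_range(mean_ft, s_ft, 1)   # about 68% of launches
low95, high95 = empirical_range(mean_ft, s_ft, 2)   # about 95% of launches

print(f"68%: {low68:.2f} ft to {high68:.2f} ft")  # 68%: 1.59 ft to 3.11 ft
print(f"95%: {low95:.2f} ft to {high95:.2f} ft")  # 95%: 0.83 ft to 3.87 ft
```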

Uncertainty in Measurements
Scientists and engineers often use significant digits to indicate the uncertainty of a measurement
A measurement is recorded such that all certain digits are reported and one uncertain (estimated) digit is reported

Uncertainty in Measurements
Another (more definitive) method to indicate uncertainty is to use plus/minus notation.
THIS IS THE FORMAT YOU WILL USE IN COLLEGE
IF YOU WANT TO ADOPT IT SOONER BE MY GUEST

Example: 3.84 ± 0.05 cm
3.79 ≤ true value ≤ 3.89
This means that we are certain the true measurement lies between 3.79 cm and 3.89 cm.

Uncertainty in Measurement
In some cases the uncertainty from a digital or analog instrument is greater than indicated by the scale or reading display
Resolution of the instrument is better than the accuracy
Example: Speedometers

How can we determine, with confidence, how close a measurement is to the true value?

Uncertainty in Measurement
Uncertainty of single measurement
How close is this measurement to the true value? Uncertainty dependent on instrument and scale

Uncertainty in repeated measurements


Random error Best estimate is the mean of the values

Accuracy and Precision


Accuracy = the degree of closeness of measurements of a quantity to the actual (or accepted) value
Precision (repeatability) = the degree to which repeated measurements show the same result

[Figure: three targets labeled High Accuracy / Low Precision, Low Accuracy / High Precision, and High Accuracy / High Precision]

Accuracy and Precision


Ideally, a measurement device is both accurate and precise

Accuracy is dependent on calibration to a standard
Correctness
Poor accuracy results from procedural or equipment flaws
Poor accuracy is associated with systematic errors

Precision is dependent on the capabilities of the measuring device and its use
Reproducibility
Poor precision is associated with random error

Your Turn
Two students each measure the length of a credit card four times. Student A measures with a plastic ruler, and student B measures with a precision measuring instrument called a micrometer.
Student A: 85.1 mm, 85.0 mm, 85.2 mm, 85.1 mm
Student B: 85.301 mm, 85.298 mm, 85.299 mm, 85.301 mm

Your Turn
Plot Student A's data on a number line.
Plot Student B's data on a number line.
Student A: 85.1 mm, 85.0 mm, 85.2 mm, 85.1 mm
Student B: 85.301 mm, 85.298 mm, 85.299 mm, 85.301 mm

Your Turn
Student A's data ranges from 85.0 mm to 85.2 mm.
Student B's data ranges from 85.298 mm to 85.301 mm.
The accepted length of the credit card is 85.105 mm.

[Figure: number line showing both students' data and the accepted value, 85.105 mm]

Your Turn
Which student's data is more accurate?
Student A

Which student's data is more precise?
Student B

Quantifying Accuracy
The accuracy of a measurement is related to the error between the measurement value and the accepted value.
Error = measured value - accepted value

Student A: 85.1 mm, 85.0 mm, 85.2 mm, 85.1 mm
Student B: 85.301 mm, 85.298 mm, 85.299 mm, 85.301 mm

Student A: $\bar{x}_A$ = 85.10 mm
Student B: $\bar{x}_B$ = 85.2998 mm

Quantifying Accuracy
Calculate the error of Student A's measurements
Error A = mean of measured values - accepted value
Error A = 85.10 mm - 85.105 mm = -0.005 mm

[Figure: number line showing $\bar{x}_A$ = 85.10, the accepted value 85.105, and the error of -0.005]

Quantifying Accuracy
Calculate the error of Student B's measurements
Error B = mean of measured values - accepted value
Error B = 85.2998 mm - 85.105 mm = 0.1948 mm

[Figure: number line showing $\bar{x}_A$ = 85.10, the accepted value 85.105, and $\bar{x}_B$ = 85.2998, with the errors -0.005 and 0.1948 marked]

Compare the magnitudes of the errors:
|Error A| = |-0.005| = 0.005 mm
|Error B| = |0.1948| = 0.1948 mm

Student A MORE ACCURATE
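The accuracy comparison can be sketched in Python as follows. The helper name `mean_error` is hypothetical, and the values are printed to three decimal places because the code works from the unrounded means.

```python
import statistics

ACCEPTED_MM = 85.105  # accepted length of the credit card

student_a = [85.1, 85.0, 85.2, 85.1]
student_b = [85.301, 85.298, 85.299, 85.301]

def mean_error(measurements, accepted):
    """Error = mean of measured values - accepted value."""
    return statistics.mean(measurements) - accepted

error_a = mean_error(student_a, ACCEPTED_MM)
error_b = mean_error(student_b, ACCEPTED_MM)

print(f"Error A = {error_a:.3f} mm")  # Error A = -0.005 mm
print(f"Error B = {error_b:.3f} mm")  # Error B = 0.195 mm

# The smaller absolute error indicates the more accurate data set.
more_accurate = "A" if abs(error_a) < abs(error_b) else "B"
print(more_accurate)  # A
```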

Quantifying Precision
Precision is related to the variation in measurement data due to random errors that produce differing values when a measurement is repeated

Quantifying Precision
The precision of a measurement device can be related to the standard deviation of repeated measurement data

Student A: 85.1 mm, 85.0 mm, 85.2 mm, 85.1 mm
Student B: 85.301 mm, 85.298 mm, 85.299 mm, 85.301 mm

Student A: $s_A$ = 0.08 mm
Student B: $s_B$ = 0.0015 mm

Quantifying Precision
Use the empirical rule to express precision
True value is within one standard deviation of the mean with 68% confidence
True value is within two standard deviations of the mean with 95% confidence

Quantifying Precision
Express the precision indicated by Student A's data at the 68% confidence level
True value is 85.10 ± 0.08 mm with 68% confidence
85.10 - 0.08 mm ≤ true value ≤ 85.10 + 0.08 mm
85.02 mm ≤ true value ≤ 85.18 mm with 68% confidence

Student A: $\bar{x}_A$ = 85.10 mm, $s_A$ = 0.08 mm

Quantifying Precision
Express the precision indicated by Student A's data at the 95% confidence level
True value is 85.10 ± 2(0.08) mm with 95% confidence
85.10 - 0.16 mm ≤ true value ≤ 85.10 + 0.16 mm
84.94 mm ≤ true value ≤ 85.26 mm with 95% confidence

Student A: $\bar{x}_A$ = 85.10 mm, $s_A$ = 0.08 mm
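The standard deviations and the two ranges for Student A can be reproduced with the sample standard deviation (n - 1 divisor). This is a sketch using Python's `statistics.stdev`; the helper name `precision_interval` is hypothetical.

```python
import statistics

student_a = [85.1, 85.0, 85.2, 85.1]
student_b = [85.301, 85.298, 85.299, 85.301]

def precision_interval(measurements, k):
    """Return (low, high) = mean -/+ k sample standard deviations."""
    mean = statistics.mean(measurements)
    s = statistics.stdev(measurements)  # sample standard deviation
    return mean - k * s, mean + k * s

s_a = statistics.stdev(student_a)
s_b = statistics.stdev(student_b)
print(f"s_A = {s_a:.2f} mm, s_B = {s_b:.4f} mm")  # s_A = 0.08 mm, s_B = 0.0015 mm

low68, high68 = precision_interval(student_a, 1)
low95, high95 = precision_interval(student_a, 2)
print(f"68%: {low68:.2f} mm to {high68:.2f} mm")  # 68%: 85.02 mm to 85.18 mm
print(f"95%: {low95:.2f} mm to {high95:.2f} mm")  # 95%: 84.94 mm to 85.26 mm
```

The smaller standard deviation for Student B is what marks the micrometer data as more precise, even though it is less accurate.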

The Statistics of Accuracy and Precision


[Figure: four distributions labeled A to D]
A: Low Accuracy, High Precision
B: High Accuracy, High Precision
C: Low Accuracy, Low Precision
D: High Accuracy, Low Precision
