Professional Documents
Culture Documents
Numerical Measures
Numerical measure describing some characteristic of a population is called a Parameter Numerical measure describing some characteristic of a sample is called a Statistic
. A
1)
Arithmetic Mean
Notation
Notation
pronounced x-bar denotes the mean of a set of sample values
x x= x1 + x2 + ...... + xn = n n
i
Advantages
Is relatively reliable. Takes every data value into account.
Disadvantage
Is sensitive to every data value, one extreme value can affect it dramatically; is not a resistant measure of center
2)
Median
Median
the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude
raw: 5.40
0.48 1.10
(in order - even number of values no exact middle shared by two numbers)
0.73 + 1.10
2
raw:
MEDIAN is 0.915
0.48 1.10 0.66 1.10 1.10 5.40
exact middle
MEDIAN is 0.73
3)
Mode
Bimodal
two data values occur with the same greatest frequency Multimodal more than two data values occur with the same greatest frequency No Mode no data value is repeated Mode is the only measure of central tendency that can be used with nominal data
Mode - Examples
b. 27 27 27 55 55 55 88 88 99
c. 1 2 3 6 7 8 9 10
Symmetric
distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half
Skewed
distribution of data is skewed if it is not symmetric and extends more to one side than the other
Skewness
Range
The range of a set of data values is the difference between the maximum data value and the minimum data value.
Range = (maximum value) (minimum value)
It is very sensitive to extreme values; therefore not as useful as other measures of variation.
Standard deviation
The standard deviation of a set of sample values, denoted by s, is a measure of variation of values about the mean.
The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
The units of the standard deviation s are the same as the units of the original data values.
Variance
The variance of a set of values is a measure of variation equal to the square of the standard deviation.
Notation- summary
x = sample mean s = sample standard deviation
s2 = sample variance
= population mean
= population standard deviation
= population variance
Definition
Frequency Distribution (or Frequency Table)
shows how a data set is partitioned among all of several categories (or classes) by listing all of the categories along with the number of data values in each of the categories.
The frequency for a particular class is the number of original values that fall into that class.
Class Boundaries
are the numbers used to separate classes, but without the gaps created by class limits
Class Boundaries
119.5
129.5
Class Width
is the difference between two consecutive lower class limits or two consecutive lower class boundaries
10
Class Width
10 10 10 10 10
3. 4. 5. 6.
Starting point: Choose the minimum data value or a convenient value below it as the first lower class limit. Using the first lower class limit and class width, proceed to list the other lower class limits. List the lower class limits in a vertical column and proceed to enter the upper class limits. Take each individual data value and put a tally mark in the appropriate class. Add the tally marks to get the frequency.
class frequency
sum of all frequencies class frequency 100%
percentage frequency
Total Frequency = 40
Cumulative Frequencies
Histogram
A graph consisting of bars of equal width drawn adjacent to each other (without gaps). The horizontal scale represents the classes of quantitative data values and the vertical scale represents the frequencies. The heights of the bars correspond to the frequency values.
Histogram
Basically a graphic version of a frequency distribution.
Histogram
Horizontal Scale for Histogram: Use class boundaries or class midpoints.
Ogive
A line graph that depicts cumulative frequencies
Percentiles
are measures of location. There are 99 percentiles denoted P1, P2, . . . P99, which divide a set of data into 100 groups with about 1% of the values in each group.
Percentile of value x = number of values less than x total number of values 100
Quartiles
Are measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.
Q1
(First Quartile) separates the bottom 25% of sorted values from the top 75%.
(Second Quartile) same as the median; separates the bottom 50% of sorted values from the top 50%. (Third Quartile) separates the bottom 75% of sorted values from the top 25%.
Q2
Q3
Quartiles
Q 1 , Q2 , Q 3
divide ranked scores into four equal parts
25%
25%
25%
25% (maximum)
(minimum)
Q1 Q 2 Q3
(median)
P90 P 10
5-Number Summary
For a set of data, the 5-number summary consists of: the minimum value; the first quartile Q1; the median (or second quartile Q2); the third quartile, Q3; and the maximum value.
Boxplot
A boxplot (or box-and-whiskerdiagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.
Boxplots
Outliers
An outlier is a value that lies very far away from the vast majority of the other values in a data set. It can have a dramatic effect on the mean and standard deviation. Also can have a dramatic effect on the scale of the histogram so that the true nature of the distribution is totally obscured.
Modified Boxplots
Boxplots described earlier are called skeletal (or regular) boxplots. Some statistical packages provide modified boxplots which represent outliers as special points.