MATH283/STAT291 2013
Categorical Measurements
Each observation belongs to one of a set of categories. Categories may be labelled by text (e.g. male, female) or by numbers (e.g. 0, 1). Nominal measurements involve unordered categories, e.g. gender. Ordinal measurements involve ordered categories, e.g. low, medium, high.
Pam Davy ©, Week 1, Friday Lecture (Statistics), MATH283/STAT291 2013
Quantitative Measurements
Observations take numerical values which measure a physical quantity. For interval measurements, differences have meaning but ratios don't; e.g. temperature in degrees Celsius. For ratio measurements, differences and ratios have meaning; e.g. weight. Don't do inappropriate things to data!
Discrete or Continuous
If the possible values are separate points on the number line, a measurement is said to be discrete, e.g. number of emails.
The possible values of a continuous measurement form one or more intervals on the number line, e.g. length.
Frequency Table
Method of organizing categorical or discrete data.
Lists all possible values along with the number of observations (frequency or count) for each value.
Relative frequency = frequency / total is often included (possibly as a %).
Example
Number of accidents reported per week: 0, 2, 1, 0, 0, 1, 1, 0, 0, 0

Accidents   Frequency   Relative frequency
0           6           0.6
1           3           0.3
2           1           0.1
total       10          1
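The table above can be rebuilt from the raw counts. The sketch below uses Python rather than the MATLAB used elsewhere in this lecture; the function name frequency_table is a hypothetical helper, and it lists only values that were actually observed.

```python
from collections import Counter

# Weekly accident counts from the example.
accidents = [0, 2, 1, 0, 0, 1, 1, 0, 0, 0]


def frequency_table(data):
    """Return rows of (value, frequency, relative frequency), sorted by value."""
    counts = Counter(data)
    total = len(data)
    return [(value, counts[value], counts[value] / total) for value in sorted(counts)]


table = frequency_table(accidents)
for value, freq, rel in table:
    print(f"{value:>9} {freq:>9} {rel:>18.1f}")
print(f"{'total':>9} {len(accidents):>9} {sum(rel for _, _, rel in table):>18.1f}")
```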
Bar Chart
[Figure: bar chart of the accident data; horizontal axis: accidents (0, 1, 2); vertical axis: frequency.]
Histogram
Similar to bar chart, but used for continuous interval or ratio data.
Real number scale on horizontal axis, no gaps between bars.
Observations are grouped into classes, not necessarily of constant width.
Frequency or relative frequency is represented by area of bar.
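Because area, not height, represents relative frequency, bar heights must be divided by class width when classes are unequal. A minimal Python sketch of that calculation follows; the data, the class boundaries, and the function name histogram_heights are all hypothetical, and the convention that each class includes its left edge (with the last class also including its right edge) is an assumption.

```python
def histogram_heights(data, edges):
    """Bar heights chosen so that each bar's area equals its relative frequency.

    edges are class boundaries in increasing order. Each class is
    [left, right), except the last, which also includes its right edge.
    """
    n = len(data)
    heights = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        is_last = i == len(edges) - 2
        count = sum(1 for x in data if left <= x < right or (is_last and x == right))
        heights.append(count / n / (right - left))
    return heights


# Hypothetical measurements and unequal class boundaries, for illustration only.
lengths = [1.2, 1.9, 2.4, 2.6, 3.1, 3.3, 3.8, 4.5, 6.0, 9.5]
edges = [0, 2, 4, 10]

heights = histogram_heights(lengths, edges)
areas = [h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights)]
print(heights)
print(sum(areas))  # the bar areas add up to 1 (up to floating-point error)
```

The wide class from 4 to 10 holds 3 of the 10 values but gets the lowest bar, since its relative frequency is spread over a width of 6.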
MATLAB Histogram
Sulphur emission data (from eLearning).
Reasonably symmetric, single hump.
[Figure: Histogram of Sulphur Oxide Emissions; vertical axis: frequency.]
Stem-and-Leaf Plot
Graphical display of quantitative data which retains numerical values.
Easy to construct with pencil and paper.
Example:
0 | 46    (leaf unit = 0.1)
1 | 1
represents data values 0.4, 0.6, 1.1.
Construction
Left-hand digit(s) used as stem (stem may be negative).
Next single digit used as leaf; truncate right-hand digits if necessary.
Vertical line separates stems from leaves.
Observations with same stem are sorted according to leaf value.
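The construction steps can be sketched in Python (an illustrative choice; the lecture itself uses MATLAB). The function name stem_and_leaf is hypothetical, and the sketch assumes non-negative data to keep it short.

```python
import math
from collections import defaultdict


def stem_and_leaf(data, leaf_unit):
    """Return the rows of a stem-and-leaf plot as strings.

    Each value is truncated to a whole number of leaf units; the last
    digit is the leaf and the remaining digits form the stem.
    Assumes non-negative data.
    """
    rows = defaultdict(list)
    for x in data:
        # Rounding to 9 decimals first guards against floating-point
        # artefacts such as 0.6 / 0.1 evaluating to 5.999999999999999.
        units = math.trunc(round(x / leaf_unit, 9))
        stem, leaf = divmod(units, 10)
        rows[stem].append(leaf)

    lines = []
    for stem in range(min(rows), max(rows) + 1):  # empty stems keep their row
        leaves = "".join(str(leaf) for leaf in sorted(rows[stem]))
        lines.append(f"{stem} | {leaves}")
    return lines


for line in stem_and_leaf([0.4, 0.6, 1.1], leaf_unit=0.1):
    print(line)
```

Stems with no observations still get an empty row, so gaps in the data remain visible in the plot.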
Interpretation
A stem-and-leaf plot is like a histogram rotated by 90°.
Stem corresponds to horizontal histogram axis.
Rows of leaves correspond to bars.
Left or right tails of histogram correspond to top or bottom regions of stem-and-leaf plot.
Truncation
Problem: Stem-and-leaf plots with too many rows don't fit on paper!
Solution: truncate original data values, e.g. 246.8 → 2|4 (leaf unit 10).
Some computer packages round rather than truncate.
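The difference between truncating and rounding shows up on the 246.8 example. This is a small Python sketch with a hypothetical helper stem_leaf_pair; it assumes non-negative values, and Python's built-in round is used only to stand in for "a package that rounds".

```python
import math


def stem_leaf_pair(x, leaf_unit, use_rounding=False):
    """Return (stem, leaf) for a non-negative value x at the given leaf unit."""
    scaled = x / leaf_unit
    units = round(scaled) if use_rounding else math.trunc(scaled)
    return divmod(units, 10)


truncated = stem_leaf_pair(246.8, leaf_unit=10)                    # stem 2, leaf 4
rounded = stem_leaf_pair(246.8, leaf_unit=10, use_rounding=True)   # stem 2, leaf 5
print(truncated, rounded)
```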
Splitting Rows
Problem: Stem-and-leaf plots with too few rows do not reveal shapes/patterns.
Solution: try splitting each row into 2. Put low leaves (0 to 4) in one row, high leaves (5 to 9) in the other row.
Still not enough? Split each original row into 5; group leaves 0 to 1, 2 to 3, 4 to 5, 6 to 7, 8 to 9.
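Row splitting can be sketched as follows, in Python for illustration. The function name split_rows and the sample leaves are hypothetical; the input is assumed to be a mapping from each stem to its list of leaf digits.

```python
def split_rows(stem_leaves, parts=2):
    """Split each stem's leaves into 2 or 5 rows by leaf value.

    stem_leaves maps stem -> list of leaf digits (0 to 9).
    Returns a list of (stem, sorted leaves) rows, including empty rows.
    """
    if parts not in (2, 5):
        raise ValueError("parts must be 2 or 5")
    width = 10 // parts  # 5 leaf values per row, or 2 leaf values per row
    rows = []
    for stem in sorted(stem_leaves):
        for k in range(parts):
            low, high = k * width, (k + 1) * width - 1
            leaves = sorted(leaf for leaf in stem_leaves[stem] if low <= leaf <= high)
            rows.append((stem, leaves))
    return rows


# Hypothetical leaves for two stems.
plot = {1: [0, 2, 3, 5, 7, 9], 2: [1, 4, 6]}
halves = split_rows(plot, parts=2)
fifths = split_rows(plot, parts=5)
for stem, leaves in halves:
    print(f"{stem} | {''.join(map(str, leaves))}")
```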
Sample Mean
Consider n data values $x_1, x_2, \ldots, x_n$. The mean is the average value:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Rescaling Data
If each data value $x_i$ is rescaled by a linear transformation $a + b x_i$, the mean is rescaled in the same way. This is not true for a non-linear transformation such as $x_i^2$.
For {1, 2, 3}, $\bar{x} = 2$.
For {13, 16, 19}, $\bar{x} = 10 + 3 \times 2 = 16$.
For {$1^2, 2^2, 3^2$}, $\bar{x} = 14/3 \neq 2^2$.
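The rescaling claims for the mean can be checked numerically on the three small data sets. The sketch uses Python's standard statistics module as an illustrative stand-in for the course's MATLAB; the variable names are assumptions.

```python
from statistics import mean

x = [1, 2, 3]
a, b = 10, 3

linear = [a + b * xi for xi in x]   # {13, 16, 19}
squared = [xi ** 2 for xi in x]     # {1, 4, 9}

# Linear transformation: the mean is transformed in the same way.
print(mean(linear), a + b * mean(x))

# Non-linear transformation: the mean of the squares is not the squared mean.
print(mean(squared), mean(x) ** 2)
```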
Sample Variance
Variance is a measure of spread, based on squared distances of individual data points from the mean:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Variance is never negative, and is only zero when all data values are identical. MATLAB code: var(x)
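The definition can be written out directly and compared with a library routine. This is a Python sketch; sample_variance is a hypothetical helper, and statistics.variance from the Python standard library (which also divides by n - 1) is used for the comparison.

```python
from statistics import mean, variance


def sample_variance(x):
    """Sample variance with divisor n - 1, straight from the definition."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two data values")
    x_bar = mean(x)
    return sum((xi - x_bar) ** 2 for xi in x) / (n - 1)


print(sample_variance([1, 2, 3]), variance([1, 2, 3]))
print(sample_variance([5, 5, 5]))  # identical values give zero variance
```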
Rescaling Data
The standard deviation s is the square root of the variance.
Adding a constant to all observations has no effect upon standard deviation.
When all observations are multiplied by a positive constant c, new s = c × old s.
For {1, 2, 3}, s = 1.
For {4, 5, 6}, s = 1.
For {102, 104, 106}, s = 2 × 1 = 2.
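The three standard deviations quoted here can be reproduced with a short Python check. statistics.stdev is the standard-library sample standard deviation (divisor n - 1); the variable names are assumptions.

```python
from statistics import stdev

base = [1, 2, 3]
shifted = [xi + 3 for xi in base]        # {4, 5, 6}: constant added
scaled = [100 + 2 * xi for xi in base]   # {102, 104, 106}: multiplied by 2, then shifted

print(stdev(base), stdev(shifted), stdev(scaled))
```

Only the multiplier 2 affects the spread; the added constants 3 and 100 drop out.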
Sample Median
To find the median, first sort the n data values in ascending order.
For odd n, the median is the middle sorted data value.
For even n, the median is the average of the middle 2 data values.
MATLAB code: median(x)
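The odd and even cases can be spelled out in a few lines. This is a Python sketch with a hypothetical helper sample_median, checked against statistics.median from the Python standard library.

```python
from statistics import median


def sample_median(x):
    """Median by sorting: middle value for odd n, mean of the middle two for even n."""
    s = sorted(x)
    n = len(s)
    if n == 0:
        raise ValueError("no data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


print(sample_median([3, 1, 2]))      # odd n: middle sorted value
print(sample_median([4, 1, 3, 2]))   # even n: average of the middle two
```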
Sample Quartiles
Idea: 25% of data lie below the first (lower) quartile Q1, and 25% of data lie above the third (upper) quartile Q3.
In practice, different books/packages compute quartiles in different ways.
MATLAB uses linear interpolation: quantile(x,[0.25 0.75])
The repeated median method is simple for hand calculations.
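One common reading of the repeated median idea is: find the median, then take the median of the lower half as Q1 and the median of the upper half as Q3. The slides do not say whether, for odd n, the overall median belongs to both halves or to neither, so the Python sketch below makes that an explicit option; the function name and the default choice are assumptions, not the lecture's definition.

```python
from statistics import median


def quartiles_by_halves(x, include_median=False):
    """Return (Q1, median, Q3) using medians of the lower and upper halves.

    For odd n, include_median decides whether the middle value is placed
    in both halves (True) or left out of both (False). This convention is
    an assumption; books and packages differ.
    """
    s = sorted(x)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    if n % 2 == 0:
        lower, upper = s[:half], s[half:]
    elif include_median:
        lower, upper = s[:half + 1], s[half:]
    else:
        lower, upper = s[:half], s[half + 1:]
    return median(lower), median(s), median(upper)


print(quartiles_by_halves([1, 2, 3, 4, 5, 6, 7, 8]))
print(quartiles_by_halves([1, 2, 3, 4, 5, 6, 7]))
print(quartiles_by_halves([1, 2, 3, 4, 5, 6, 7], include_median=True))
```

The two conventions give different quartiles for the same seven values, which illustrates why results can differ between books and packages.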
Box Plots
Using an axis with appropriate scale, draw a box from Q1 to Q3, and mark position of median. Draw whiskers from Q1 to the minimum and from Q3 to the maximum. Outliers are sometimes shown separately. Gives a quick, easy comparison of 2 or more samples.
In MATLAB, use boxplot(x,g) to draw parallel box plots, where x contains the data and g is a grouping variable.
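The numbers behind parallel box plots (minimum, Q1, median, Q3, maximum per group) can be computed without any plotting. This Python sketch mirrors the x and g arguments described above; the data and the function name five_number_summaries are hypothetical, and statistics.quantiles with method="inclusive" is just one quartile convention, so its values need not match MATLAB's.

```python
from collections import defaultdict
from statistics import quantiles


def five_number_summaries(x, g):
    """Group x by the labels in g; return {label: (min, Q1, median, Q3, max)}."""
    if len(x) != len(g):
        raise ValueError("x and g must have the same length")
    groups = defaultdict(list)
    for value, label in zip(x, g):
        groups[label].append(value)

    summaries = {}
    for label, values in groups.items():
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        summaries[label] = (min(values), q1, q2, q3, max(values))
    return summaries


# Hypothetical data: two samples of five values each.
x = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
g = ["A"] * 5 + ["B"] * 5

summaries = five_number_summaries(x, g)
for label, summary in summaries.items():
    print(label, summary)
```

Each tuple gives the whisker ends, the box edges, and the median line for one box, assuming whiskers run to the minimum and maximum with no outliers shown separately.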