
Block XI | Research | Lesson 1

DESCRIPTIVE STATISTICS
Dr. Telia Avendano Posecion
September 15, 2015 (10:00 AM-12:00 PM)

SUMMARY/OUTLINE
I. Data Analysis & Collection
A. Getting Ready for Data Analysis
B. 10 Commandments for Data Analysis
C. Data Analysis
D. Levels of Measurements
1. Nominal
2. Ordinal
3. Interval
4. Ratio
II. Statistics
A. Selecting Statistical Analysis
B. Major Uses of Statistical Procedures
1. Inferential Statistics
2. Descriptive Statistics
a. Frequency Distribution
b. Graphical Presentation of Data
c. Summary Statistics
i. Measures of Central Tendency
a) Mean
b) Mode
c) Median
d) Distributions
- Symmetrical
- Positively Skewed
- Negatively Skewed
ii. Measures of Variability
a) Range
b) Standard Deviation
c) Variance
- Coefficient of Variability
iii. Measures of Location
a) Percentile
b) Quartile
c) Decile

DATA ANALYSIS & COLLECTION


A. Getting Ready for Data Analysis
1. Research proposal
2. Data collection form
3. Data collection
4. Data processing - looking for errors, for completeness, arranging, and encoding
5. Data analysis
B. Ten Commandments of Data Collection
1. Think of the data you have to collect to answer your question.
2. Think about where you will be getting your data. There should be a catchment area.
3. Make sure that the data collection form you are using is clear and easy to use. Tools should be
validated and undergo pre-testing.
4. Once you transfer your scores to your data collection form, make a duplicate copy of the data
file and keep it in a separate location. Always have a backup.
5. Do not rely on other people to collect or transfer your data unless you have personally trained
them and are confident that they understand the data collection process as well as you do.
SGD 7B| Acanto, Maquilang, Roldan

6. Plan a detailed schedule of when and where you will be collecting your data. This is
where we make a Gantt chart.
7. As soon as possible, cultivate possible sources for your participant pool.
8. Try to follow up on subjects who missed their testing session or interview. Make
sure that the data collection form is completely filled in before leaving. It is very
difficult to follow up with respondents afterwards.
9. Never discard original data. You may have missed something.
10. Follow the previous nine.
C. Data Analysis
Process of summarizing trends and patterns observed in the data
Determine major differentials or relationships among variables used in the study
Application of appropriate statistical tests on a set of data to answer the objectives
of a study

Type of data analysis to use depends on the:


Objective of the study (to describe groups, determine sensitivity, to know risk
factors, etc.)
Scale of measurement of the data or variables being dealt with
D. Levels of Measurement
1. Nominal
Simplest; no mathematical value; categorical scale; we don't measure but count
the number of observations with or without the attribute of interest
2. Ordinal
Ranked into two or more ordered categories; the distance between ranks is not the same,
ex. small, medium, large; some observations are greater than others; use
nonparametric statistical tools with this scale
3. Interval
Data have numerical values, with equal distances between adjacent values
4. Ratio
Same as interval, with the addition of a meaningful zero point
STATISTICS
Powerful tool for organizing and understanding data
Provides ways to represent and describe groups, summarize results and evaluate
data
All about summarizing
A. Selecting Statistical Analysis: Initial Flowchart


B. Two Major Uses of Statistical Procedures


1. Descriptive Statistics
First step in analysis of data is to describe them
Simplify and organize data
Describe some of the characteristics of the distribution of scores you have
collected
Demographic data usually first
2. Inferential Statistics
Interpret what the data mean
Help you make decisions about how the data you collected relates to your
original hypothesis
Help you make generalizations, but be careful: how far you can generalize depends on
the sampling method, etc.
DESCRIPTIVE STATISTICS
A. Frequency distribution
B. Graphical representation of data
C. Summary statistics
A. Frequency Distribution
Simplest way to organize and summarize data at a glance
How often?
List of the number of participants who fall in a particular category
It is helpful to convert frequencies to percentages
Usually used if the data are nominal.
If there are many possible scores between the highest and lowest scores, a
frequency distribution will be long and almost as difficult to read as the original
data. In that case, we use a grouped frequency distribution, which shortens the table
to a more manageable size.
Sometimes, it is helpful to categorize participants on the basis of more than one
variable at the same time. This is called cross-tabulation. Cross-tabulations can
help us see relationships between nominal measures. The result is usually a 2x2 table,
sometimes called a contingency table.
Table 1. Contingency table of males and females pro- and anti-Reproductive Health
(RH) Bill

Sex       Pro RH Bill   Anti RH Bill   Total
Male      38            12             50
Female    24            26             50
Total     62            38             100
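
The frequency distribution and cross-tabulation ideas above can be sketched in a few lines of Python. This is only an illustration: the blood-type and score lists are hypothetical values made up for the example, the helper names are not from the lecture, and only the four cell counts are copied from Table 1.

```python
from collections import Counter

def frequency_distribution(values):
    """Map each distinct value to (frequency, percentage of all observations)."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, 100 * c / n) for v, c in sorted(counts.items())}

def grouped_frequency(scores, width):
    """Count integer scores per class interval (lower, lower + width - 1)."""
    counts = Counter((s // width) * width for s in scores)
    return {(lo, lo + width - 1): counts[lo] for lo in sorted(counts)}

def marginal_totals(cells):
    """Return (row_totals, column_totals, grand_total) for a cross-tabulation."""
    rows, cols = Counter(), Counter()
    for (row, col), count in cells.items():
        rows[row] += count
        cols[col] += count
    return rows, cols, sum(cells.values())

def row_percentages(cells):
    """Express each cell as a percentage of its row total."""
    rows, _, _ = marginal_totals(cells)
    return {key: 100 * count / rows[key[0]] for key, count in cells.items()}

# Hypothetical nominal data (blood types) and hypothetical test scores.
blood_types = ["A", "B", "A", "O", "A"]
scores = [52, 55, 61, 67, 68, 70, 74, 75, 79, 83]

# Cell counts copied from Table 1: (sex, stance on the RH Bill) -> respondents.
table1 = {
    ("Male", "pro"): 38,
    ("Male", "anti"): 12,
    ("Female", "pro"): 24,
    ("Female", "anti"): 26,
}

freq = frequency_distribution(blood_types)
grouped = grouped_frequency(scores, width=10)
rows, cols, total = marginal_totals(table1)
percent = row_percentages(table1)

print(freq)
print(grouped)
print(dict(rows), dict(cols), total)
print(percent)
```

Row percentages make the relationship in the table easier to read than raw counts: they show what share of each sex is pro or anti, regardless of group size.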

B. Graphical Presentation of Data


"One picture is worth a thousand words"
Clarify a data set
Most people find a graphic representation easier to understand than other
statistical procedures
Helps interpret a summary statistic or statistical set

Examples:
1. Bar Graph (Used if data is discrete/nominal with categories)

2. Histogram (Used if the data is continuous)


Bar graph vs Histogram


In a bar graph, the bars represent different categories. They could be rearranged.
Histograms use continuous data where the bins represent ranges of data rather
than categories.
3. Pie Chart
- Used if there are more than two categories being compared as parts of a whole

4. Component Bar Graph


- Used to show different categories where, within each category, there is another set of categories

5. Frequency polygon
- A graphical device for understanding the shapes of distributions. Frequency polygons serve the same
purpose as histograms, but are especially helpful for comparing sets of data. They are also a good
choice for displaying cumulative frequency distributions.

6. Line Graph (the horizontal axis can be days of the week, years)

7. Scatter-plot (relationships and correlations)

Types of Graphs Commonly Used in Presenting


C. Summary statistics
Measures of Central Tendency
- Mean, Median, Mode
Measures of Variability
- Range, Variance, Standard Deviation,
- Coefficient of Variability

Measures of Location
- Percentile, decile, quartile
MEASURES OF CENTRAL TENDENCY
1. Mean
Most commonly used measure of central tendency unless distribution is skewed
The average

When a distribution is skewed, the mean is the measure most strongly affected


May be influenced by extreme values
Not ordinarily used with ordinal data because of the arbitrary nature of an ordinal
scale
2. Median
Score in a distribution above and below which one half of the scores lie
Order scores/data from lowest to highest then select the middle score as the
median
Used if distribution is skewed
If n is even, find the mean of the 2 middle scores
Often used to compute an average when extreme scores are involved
Used in ordinal data
To determine the position of the median in the ordered scores, the following formula may be used:
Median position = (n + 1) / 2, where n is the number of scores
3. Mode
Most frequently occurring value
An excellent choice if you want a general overview of which class or category
occurs most frequently
A distribution may have more than one mode: bimodal (2) or multimodal (more than 2)
Common mistake: The mode is the value and not the frequency of that value.
Easily computed but unstable; can be affected by a change in only one or two
scores
Choice of Measure of Central Tendency (very important)
Depends on:
a. Nature of the distribution
b. Concept of central tendency which is desired
c. Scale/level of measurement
Guidelines to help you decide which measure of central tendency is best:
Mean is used for numerical data and for symmetric distribution.
Median is used for ordinal data or for numerical data if the distribution is skewed
Mode is used primarily for bimodal distributions
4. Distributions
Symmetric / normal
Bell shaped curve
Most participants are near the middle of the distribution
In a symmetric distribution, the mean, median, and mode are located at the same point

Positively skewed
Direction is indicated by the tail (skewness depends on the tail)
Most of the scores pile up near the bottom
Skewed to the right

Negatively skewed
- Most of the scores pile up near the high or positive side
- skewed to the left (clue is to look at the mean, it is to the left of the median)
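
The behaviour of the three measures of central tendency under skew can be checked with Python's standard `statistics` module. The sketch below uses hypothetical scores invented for illustration, and `skew_direction` is a made-up helper that applies only the mean-versus-median clue mentioned above; it is a rough rule of thumb, not a formal test of skewness.

```python
import statistics

def skew_direction(data):
    """Guess the skew from the mean/median clue (rule of thumb only)."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "positive (right)"
    if mean < median:
        return "negative (left)"
    return "approximately symmetric"

# Hypothetical scores: most values are low and one value is extreme.
scores = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(scores)         # pulled upward by the extreme value 40
median = statistics.median(scores)     # middle score of the ordered data
modes = statistics.multimode(scores)   # a list, since there may be several modes

# Position of the median in the ordered scores: (n + 1) / 2.
median_position = (len(scores) + 1) / 2

print(mean, median, modes, median_position)
print(skew_direction(scores))
print(skew_direction([1, 35, 36, 37, 38, 38, 39]))  # scores piled up on the high side
print(skew_direction([1, 2, 3, 4, 5]))
```

In the first list the single extreme score moves the mean well above the median, which is why the median is preferred for skewed data.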
MEASURES OF VARIABILITY
One of the most important concepts in research
Measures of variability measure the spread or degree of variability present in the
distribution, while measures of central tendency give information only as to the
tendency of the values to clump together
Natural variability among participants or samples often can mask the effects of
variables under study
Most research designs and statistical procedures were developed to control or
minimize the effects of natural variability of scores
Some variables may have large differences between participants, others small
Important points to remember:
Scores do vary
Degree of variability can be quantified
1. Range
Simplest measure of variability
Difference between the highest and the lowest value/score
Too unstable because it depends on only two scores
A single deviant score can dramatically affect the range of scores

2. Variance
A measure of variability that is better than the range since it utilizes all the scores in
quantifying the degree of variability in the data
Has statistical properties that make it useful in inferential statistics
It answers the question: on average, how much do the scores in the sample differ
from the mean of the sample?
Takes the mean as the reference point
Takes into account the deviation of each individual observation from the mean
It is the average of the squared deviations from the mean
Another definition of the mean is the score around which the sum of the deviations
equals zero.
The more variability in a group, the higher the value of the variance; the more
homogeneous the group, the lower the variance
3. Standard Deviation
Square root of the variance
Transforms the variance back into the same units as the original scores
For ungrouped data:

Coefficient of Variation
Use it when units of measurement of variables being compared are different
Ex. Weight in kg VS height in cm
Or when the means differ markedly
A more peaked plot shows less variability
Measure of relative dispersion which expresses the standard deviation as a
percentage of the mean
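
The measures of variability above can be computed for ungrouped data as in the sketch below. The weights and heights are hypothetical values, and the helper names are made up for the example. The notes do not say whether the squared deviations are divided by n or by n - 1, so both are shown: `statistics.pvariance` divides by n, while `statistics.variance` and `statistics.stdev` assume a sample and divide by n - 1.

```python
import math
import statistics

def value_range(data):
    """Highest score minus lowest score."""
    return max(data) - min(data)

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the mean (assumes n - 1)."""
    return 100 * statistics.stdev(data) / statistics.mean(data)

# Hypothetical measurements in different units.
weights_kg = [50, 55, 60, 65, 70]
heights_cm = [150, 155, 160, 165, 170]

mean_weight = statistics.mean(weights_kg)

# Deviations from the mean always sum to zero.
deviations = [w - mean_weight for w in weights_kg]

sample_variance = statistics.variance(weights_kg)       # divides by n - 1
population_variance = statistics.pvariance(weights_kg)  # divides by n
sample_sd = statistics.stdev(weights_kg)                # square root of sample variance

cv_weight = coefficient_of_variation(weights_kg)
cv_height = coefficient_of_variation(heights_cm)

print(value_range(weights_kg), sum(deviations))
print(sample_variance, population_variance, sample_sd)
print(cv_weight, cv_height)
```

Both lists have the same standard deviation, yet the coefficient of variation is larger for the weights because their mean is smaller. This is the situation in which the coefficient of variation is more informative than the standard deviation alone.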

MEASURES OF LOCATION
Percentile is one of the 99 values of a variable which divides the distribution into
100 equal parts
Decile is one of 9 values of a variable which divides the distribution into 10 equal
parts
Quartile is one of the 3 values of a variable which divides the distribution into 4
equal parts
Percentile
Most frequent type of measure used to report the results of standardized tests,
anthropometric measurements
These scores are normed on very large groups in which the scores form an
approximately normal distribution
A person's percentile rank is a very close estimate of the percentage of persons who could be
expected to score lower than that person
Easiest score to understand
Ex. NMAT Scores
Interquartile Range (IQR)
It is a measure of spread.
It is primarily used to build box plots.
The formula can be used to find outliers in a data set.
It is a measure of where the middle 50% of the data items lie in a set
The difference between the third quartile and the first quartile of a set of data (Q3 - Q1),
that is, the difference between the upper quartile and the lower quartile
The IQR formula is used in conjunction with the mean and standard deviation to
test whether or not a population has a normal distribution.
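
The measures of location and the IQR can be sketched with `statistics.quantiles` from the Python standard library. The scores are hypothetical, the helper names are made up for the example, and two choices below are assumptions not stated in the lecture: the "inclusive" interpolation method (textbooks and software differ on how quartiles are interpolated) and the common convention of flagging values more than 1.5 x IQR beyond the quartiles as outliers.

```python
import statistics

def percentile_rank(data, score):
    """Percentage of scores in data that are lower than the given score."""
    return 100 * sum(x < score for x in data) / len(data)

def iqr_outliers(data, k=1.5):
    """Return (q1, q3, iqr, outliers) using the k * IQR fence convention."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return q1, q3, iqr, outliers

# Hypothetical scores with one extreme value.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 50]

quartiles = statistics.quantiles(scores, n=4, method="inclusive")      # 3 cut points
deciles = statistics.quantiles(scores, n=10, method="inclusive")       # 9 cut points
percentiles = statistics.quantiles(scores, n=100, method="inclusive")  # 99 cut points

q1, q3, iqr, outliers = iqr_outliers(scores)

print(quartiles)
print(len(deciles), len(percentiles))
print(q1, q3, iqr, outliers)
print(percentile_rank(scores, 7))
```

Because the IQR depends only on the middle half of the data, the extreme score changes the range dramatically but leaves the IQR untouched, which is what makes it useful for spotting outliers.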
Reference:
Dr. Posecion's lecture
