
CHAPTER ONE

STATISTICS REFRESHER
INTRODUCTION
Statistics is one of the most important and useful subjects taught in business and economics colleges. Governments, businesses, researchers, and scientists in the natural and social sciences need information for their activities. Most of this information is quantitative and requires a scientific approach or technique to gather and use.

At present, no social or natural scientist can function successfully without some knowledge of statistics. Statistics has developed to such an extent that it has become indispensable in all aspects of our activities. We frequently see or hear statements such as the following:
- The unemployment rate has dropped to 11%.
- The GNP of Sub-Saharan African countries is rising at 1.7% per annum.
- The average coffee price in Addis Ababa has reached 100 Birr/kg.
All of these statements are statistics (numerical facts) of the kind we encounter regularly. Let us now turn to defining statistics.

1.1. Meaning and Basic concepts of Statistics

The word statistics derives from two Italian words: stato, which means the state, and statista, which refers to a person involved with the affairs of the state. Statistics therefore originally meant the collection of facts useful to the state.

Nowadays statistics is not restricted to information about the state. It extends to almost every realm
of human endeavor.

Statistics is defined as the science or process of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions. In business and economics in particular, a major reason for studying statistics is to give managers and decision makers a better understanding of the business and economic environment and thus enable them to make more informed and better decisions.

Importance of Statistics: Statistics is useful for:

- Government officials, for making policy decisions on unemployment, inflation, health, education, infrastructure, etc.
- Financial planners, for trend analysis, stock market analysis, future investment, etc.
- Businesses, for product development and customer satisfaction.
- Production supervisors, for quality control and improving product quality.
- Politicians, for legislation and campaign strategy.
- Physicians and hospitals, for assessing the effectiveness of drugs, disease surveillance, etc.

1.1.1. Types of Statistics

The field of statistics has two broad subdivisions: descriptive statistics and inferential (analytical) statistics.

1. Descriptive statistics is concerned with collecting, processing, summarizing, and describing the important features of the data without going beyond them (i.e., without any attempt to infer from the data).
2. Inferential (analytical) statistics is concerned with the process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population. It consists of a host of techniques that help decision makers arrive at rational decisions under uncertainty. A conclusion drawn about a population based on information in a sample from that population is called a statistical inference. Statistics is usually concerned with inference: the population we want to study is usually large or infinite, so we select a sample because studying the whole population is impossible or impractical.
Data Collection
Data collection is the first step in any statistical investigation, and care must be exercised here because it lays the foundation for the statistical analysis. It is important to locate the sources of data and to choose collection techniques that make the data useful.
Sources of data
Generally, there are two sources of data: primary sources and secondary sources. Depending on the type of source, data can therefore be classified into two types:

a. Primary data: data generated originally and directly from the words and actions (behavior) of the respondents. For example, if a manufacturing company collects data from the users of its product on how well the product satisfies their needs compared with competitors' products, then the primary sources are the customers and the primary data are the customers' statements and observed behavior.
b. Secondary data: unlike primary data, secondary data are collected from already processed documents such as journals, articles, government releases, and other relevant documents. Using secondary sources saves the time that would otherwise have been spent planning and executing a collection project. Moreover, secondary data are useful when it is impossible to collect primary data. The limitations typically associated with secondary data are those of fit and accuracy.
Methods of data collection (primary)
The techniques used to gather data should deliver the information that the interpretation phase requires. The choice of data collection method takes some practice and skill. We will consider the techniques used mainly to collect primary data:

- Direct personal interview: used when an in-depth search for the opinions and underlying attitudes of the informants is needed. Face-to-face contact with the respondents is then essential.
- Observation: sometimes data for business applications are collected by observation.
- Telephone interviewing: another popular technique for gathering information from informants.
- Mail questionnaire: written questionnaires are mailed to different respondents. This technique is usually used when a mailing list exists or when respondents are scattered over a wide area.

Organization
This involves three jobs:
- Editing: checking for omissions, inconsistencies, wrong computations, etc.
- Classifying: arranging data according to some attribute of homogeneity.
- Tabulation: arranging data into columns and rows to ensure clarity.
Presentation
This involves representing the statistical data in diagrams and graphs such as pie charts, bar graphs, histograms, etc.

Analysis
Once the first three steps have been carried out, the data are ready to be searched for information useful to decision makers. Notice that analysis and interpretation are not one and the same: the former deals with the data itself, while the latter goes beyond it.
Interpretation
Interpretation is about drawing meaningful conclusions from the data collected and analyzed; on the basis of these conclusions, measures are set out to resolve the managerial problem at hand.

1.2. Frequency Distribution

Once the edited data have been put into an ordered array, they can be organized into a frequency distribution.

Definition: A frequency distribution is a list of data classes or categories along with the number of values that fall into each. This display shows the frequency, or number of occurrences, in each of the several classes. The distribution can be for either grouped or ungrouped data.

Exercise 1. A survey by the St. George Brewery Company of the average daily sales revenue of 20 selected bars in Mekelle town shows the following data: 525, 525, 375, 700, 1200, 375, 525, 1200, 150, 700, 700, 525, 375, 375, 700, 150, 150, 525, 150, and 1200. These values can be listed in tabular form with the frequency of each as shown below:

Sales volume (birr)    Frequency
150                    4
375                    4
525                    5
700                    4
1200                   3
Total                  20
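
The tally for an ungrouped frequency distribution can be sketched in Python. This is only an illustrative sketch: the function name `frequency_table` is an assumption of this example, and the data are the 20 sales figures listed in the exercise above.

```python
from collections import Counter

# Daily sales revenue (birr) of the 20 bars, as listed in the exercise.
sales = [525, 525, 375, 700, 1200, 375, 525, 1200, 150, 700,
         700, 525, 375, 375, 700, 150, 150, 525, 150, 1200]


def frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(values).items())


table = frequency_table(sales)
for value, count in table:
    print(f"{value:>6} {count:>4}")
print(f"{'Total':>6} {sum(count for _, count in table):>4}")
```

Counting by machine is a convenient check on a hand tally, since the frequencies must add up to the number of observations.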

Exercise 1. Students at New Millennium College were asked to rate the college's service program. The results were: Good, Excellent, Very good, Very good, Excellent, Very good, Good, Moderate, Excellent, Very good, Good, Very good, Moderate, Excellent. The students' reactions can be summarized as follows:

Students' reaction    Frequency
Excellent             4
Very good             5
Good                  3
Moderate              2
Poor                  0
Total                 14
Steps to construct a frequency distribution (grouped data)

Generally, the following steps should be followed in constructing a frequency distribution table:

a. Determine the number of classes, usually between 5 and 15.
b. Determine the size of each class. The class size or width is found by taking the difference between the largest and smallest values in the data set and dividing it by the number of classes desired.
c. Determine the starting point for the first class.
d. Prepare a table of the distribution using the actual counts or percentages (relative frequencies).
Exercise 1. As part of a financial policy and pay system reform project, ROSE Consulting Group has been investigating the monthly income of the employees of its client company, CLEAR BLUE. The following results were obtained: 545, 545, 545, 675, 545, 690, 690, 675, 1450, 1200, 545, 870, 870, 375, 454, 400, 600, 900, 955, 640, 1125, 1000, 1040, 755, 790, 850, 775, 1075, 690, 650.

Required: Construct the frequency distribution table with 5 classes.

Solution

Determine the number of classes that you want, i.e., 5 classes.

Determine the size of each class.

First find the range of the data by subtracting the lowest value from the highest value. The highest value is 1450 birr and the lowest value is 375 birr, so the range (R) is 1450 - 375 = 1075. Second, divide the range by the number of classes chosen, i.e.

\[ \text{Class size (width)} = \frac{\text{range}}{\text{total number of classes assumed}} = \frac{1075}{5} = 215 \]

Then start constructing the intervals by determining the lower limit of the first class. Assume a lower class limit of 375; the class intervals will then be as follows:

Monthly income    Tally          Frequency
375 - 590         IIIIIIII       8
590 - 805         IIIIIIIIIII    11
805 - 1020        IIIIII         6
1020 - 1235       IIII           4
1235 - 1450       I              1
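
The construction steps above can be sketched in Python. This is a minimal sketch, not a general-purpose routine: the function name `grouped_frequencies` is an assumption, and so is the boundary convention (a value on a class boundary is counted in the higher class, except that the maximum value is placed in the last class).

```python
# Monthly incomes (birr) of the 30 employees, as listed in the exercise.
incomes = [545, 545, 545, 675, 545, 690, 690, 675, 1450, 1200,
           545, 870, 870, 375, 454, 400, 600, 900, 955, 640,
           1125, 1000, 1040, 755, 790, 850, 775, 1075, 690, 650]


def grouped_frequencies(values, num_classes):
    """Return (class_limits, counts) for equal-width classes."""
    low, high = min(values), max(values)
    width = (high - low) / num_classes          # range / number of classes
    counts = [0] * num_classes
    for v in values:
        # Assumed convention: the maximum value goes in the last class.
        index = min(int((v - low) / width), num_classes - 1)
        counts[index] += 1
    limits = [(low + i * width, low + (i + 1) * width)
              for i in range(num_classes)]
    return limits, counts


limits, counts = grouped_frequencies(incomes, 5)
for (lower, upper), count in zip(limits, counts):
    print(f"{lower:7.0f} - {upper:<7.0f} {count}")
```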

Cumulative Frequency Distribution


A cumulative frequency distribution, unlike a simple frequency distribution, gives the total number of items or observations that fall above or below a certain point. Thus, if you would like to know the total number of observations that fall below or above a given point, you can use the cumulative frequency distribution. For example, the cumulative ("less than") frequency distribution for the above ROSE Consulting Group example is:

Monthly income    Frequency    Cumulative frequency
375 - 590         8            8
590 - 805         11           19
805 - 1020        6            25
1020 - 1235       4            29
1235 - 1450       1            30
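
A running total of the class frequencies gives the cumulative column. The sketch below uses the ROSE Consulting Group class frequencies; the variable names `less_than` and `more_than` are illustrative assumptions.

```python
from itertools import accumulate

# Class frequencies for the five monthly income classes.
frequencies = [8, 11, 6, 4, 1]

# "Less than" cumulative frequencies: running total from the first class.
less_than = list(accumulate(frequencies))

# "More than" cumulative frequencies: running total from the last class.
more_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)
print(more_than)
```

The last "less than" entry and the first "more than" entry must both equal the total number of observations.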

Exercise 1. Given the daily number of visitors to the Mekelle Museum of Martyrs, as reported by the authorities, in the classes 24-45; 45-66; 66-87; 87-108; 108-129; 129-150, with respective frequencies 6, 6, 5, 5, 5, 3, construct the frequency distribution and cumulative frequency distribution tables.

Table of frequency distribution

Number of visitors    Number of days
24-45                 6
45-66                 6
66-87                 5
87-108                5
108-129               5
129-150               3

Table of cumulative frequency distribution

Number of visitors    Number of days    Cumulative frequency
24-45                 6                 6
45-66                 6                 12
66-87                 5                 17
87-108                5                 22
108-129               5                 27
129-150               3                 30

Relative frequencies: relative frequencies are proportions calculated by dividing the actual frequency of each class by the total number of observations being classified. The column of relative frequencies should sum to 1.000, apart from rounding.
Example 1. The following table presents the age distribution of a certain group.

Class interval    Frequency    Cumulative frequency    Relative frequency
0-9               1            1                       1/88 = 0.011
10-19             6            7                       6/88 = 0.068
20-29             27           34                      27/88 = 0.307
30-39             22           56                      22/88 = 0.250
40-49             12           68                      12/88 = 0.136
50-59             16           84                      16/88 = 0.182
60-69             4            88                      4/88 = 0.045
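
The relative frequency column can be computed as in the short Python sketch below. The function name `relative_frequencies` is an assumption; the frequencies are those of the age distribution table above.

```python
# Class frequencies of the age distribution (classes 0-9 through 60-69).
frequencies = [1, 6, 27, 22, 12, 16, 4]


def relative_frequencies(freqs):
    """Divide each class frequency by the total number of observations."""
    total = sum(freqs)
    return [f / total for f in freqs]


relative = relative_frequencies(frequencies)
for f, r in zip(frequencies, relative):
    print(f"{f:>3}/{sum(frequencies)} = {r:.3f}")
```

The unrounded proportions sum to exactly 1; a small discrepancy can appear only after each entry is rounded to three decimals.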

1.3. Measures Of Central Tendency

In calculating summary values for a data collection, the first consideration is to find a central, or typical, value for the data. Three important measures of central tendency are presented in this section: the mean, the median, and the mode. With these measures we can summarize a large volume of data with a single value that characterizes the data. Measures of variation or dispersion are then used to assess how well that central value represents the data.

Arithmetic mean
The arithmetic mean of a collection of numerical values is the sum of these values divided by the number of values. The symbol for the population mean is the Greek letter $\mu$ (mu), and the symbol for a sample mean is $\bar{x}$ (x-bar):

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{(for ungrouped data)} \]

\[ \bar{x} = \frac{\sum f_i x_i}{N} \quad \text{(for grouped data)} \]

where $f_i$ is the frequency of class $i$, $x_i$ is its midpoint, and $N = \sum f_i$ is the total number of observations.
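
Both forms of the mean can be checked with a short Python sketch. It reuses the 20 daily sales figures from the frequency distribution exercise in Section 1.2; the function names are assumptions, and in the grouped form each distinct sales value (rather than a class midpoint) plays the role of $x_i$.

```python
from collections import Counter

sales = [525, 525, 375, 700, 1200, 375, 525, 1200, 150, 700,
         700, 525, 375, 375, 700, 150, 150, 525, 150, 1200]


def mean_ungrouped(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def mean_grouped(points, freqs):
    """Sum of f_i * x_i divided by the total frequency."""
    return sum(f * x for x, f in zip(points, freqs)) / sum(freqs)


counts = Counter(sales)
points = sorted(counts)                 # distinct values x_i
freqs = [counts[x] for x in points]     # their frequencies f_i

print(mean_ungrouped(sales))
print(mean_grouped(points, freqs))
```

When the $x_i$ are the actual values, the two forms agree exactly; with class midpoints the grouped mean is only an approximation of the ungrouped mean.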

The Median (sometimes called the counting average)

The median is a single value from the data set that measures the central item in the data. It is the middlemost item in the set of numbers: half of the items lie above this point, and the other half lie below it.

To find the median, we first array the data in ascending or descending order. Once the data are ordered, the median is the middle value (if the number of observations is odd) or the average of the two middle items (if the number of observations is even).

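Calculation of median

1) Ungrouped data

For an odd number of observations ($n$ odd), the median is the item at position

\[ \left(\frac{n+1}{2}\right)^{\text{th}} \]

in the data array. For an even number of observations ($n$ even), the median is the average of the two middle items:

\[ \text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ item} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ item}}{2} \]

2) Grouped data

For an odd number of observations ($n$ odd):

\[ \tilde{x} = \left(\frac{\frac{n+1}{2} - (F+1)}{f_m}\right) w + L_m \]

For an even number of observations ($n$ even):

\[ \tilde{x} = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) w \]

Where:
$\tilde{x}$ is the sample median
$n$ is the total number of items in the distribution
$F$ is the sum of all class frequencies up to, but not including, the median class
$f_m$ is the frequency of the median class
$w$ is the class interval width
$L_m$ is the lower limit of the median class interval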
The Dashen Bank Mekelle Branch has disclosed the distribution of its customers' monthly balances, as shown in the following table.

Class interval (Birr)    Frequency
0-49.99                  78
50-99.99                 123
100-149.99               187
150-199.99               82
200-249.99               51
250-299.99               47
300-349.99               13
350-399.99               9
400-449.99               6
450-499.99               4
Total                    600
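Solution: Using the first method, the

\[ \left(\frac{n+1}{2}\right)^{\text{th}} = \left(\frac{600+1}{2}\right)^{\text{th}} = 300.5^{\text{th}} \]

item is the centermost. (You can take it as the 300th and the 301st items.)

Add the frequencies to locate the class that contains this centermost element (i.e., 78 + 123 + 187 = 388). This shows that the item is in the 3rd class, 100-149.99.

$L_m$ (the lower limit of the median class) = 100

$n$ (number of observations) = 600

$F$ (sum of all frequencies up to the median class) = 78 + 123 = 201

$w$ (class interval width) = 149.99 - 100 = 49.99 ≈ 50

$f_m$ (frequency of the median class) = 187

\[ \tilde{x} = L_m + \left(\frac{\frac{n+1}{2} - F}{f_m}\right) w \]

Substituting the items in the formula, we have

\[ \tilde{x} = 100 + \left(\frac{\frac{600+1}{2} - 201}{187}\right) 50 = 100 + \left(\frac{\frac{601}{2} - 201}{187}\right) 50 = 100 + \left(\frac{99.5}{187}\right) 50 \]

= 100 + (0.532)(50)

= 100 + 26.6; $\tilde{x}$ = 126.6 is the sample median.

Since $n = 600$ is even, the even-observation form with $\frac{n}{2} - F = 99$ gives nearly the same value: 100 + (99/187)(50) ≈ 126.47.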
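
The interpolation used in this solution can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `grouped_median` and its `position` parameter are illustrative, equal class widths are assumed, and `position` is the rank of the median item, taken here as (n + 1)/2 to match the solution above.

```python
# Dashen Bank example: lower class limits, class frequencies, class width.
lower_limits = [0, 50, 100, 150, 200, 250, 300, 350, 400, 450]
frequencies = [78, 123, 187, 82, 51, 47, 13, 9, 6, 4]
width = 50


def grouped_median(lower_limits, freqs, width, position):
    """Interpolate the value at the given rank within its class."""
    cumulative = 0                      # F: frequencies before the class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= position:  # this is the median class
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("position exceeds the total frequency")


n = sum(frequencies)
median = grouped_median(lower_limits, frequencies, width, (n + 1) / 2)
print(round(median, 1))
```

Passing `n / 2` as the position instead applies the even-observation form of the formula.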

Exercise. The following data represent the weights of fish caught in Lake Hashenge by a local fisherman. Compute the median.

Class        Frequency
0-24.9       5
25-49.9      13
50-74.9      16
75-99.9      8
100-124.9    6
The Mode (observed average), $\hat{x}$

Sometimes you want to know the value that occurs most often. The value with the largest number of occurrences is called the mode or modal value. It can also be defined as the value about which the items are most closely concentrated.

Graphically, the mode is the most typical or fashionable value of a distribution:

[Figure: distribution curve with the mode marked on the x-axis]

Calculation of mode

If the distribution is ungrouped, the item with the greatest frequency is selected as the modal value.

However, if the distribution is grouped, the following formula is used:

\[ \hat{x} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w \]

Where $\hat{x}$ is the mode
$L_{mo}$ is the lower limit of the modal class
$\Delta_1$ is the difference between the frequency of the modal class and the frequency of the pre-modal class
$\Delta_2$ is the difference between the frequency of the modal class and the frequency of the post-modal class
$w$ is the class interval width of the modal class
Example 1: Consider the following table of the income distribution of 300 workers of Messebo Cement Factory.

Income interval    Frequency
150-199.5          14
200-249.5          27
250-299.5          58
300-349.5          72
350-399.5          63
400-449.5          36
450-499.5          18

Solution
Locate the class with the greatest frequency; in this case 300-349.5 is the modal class (frequency 72).
Then $L_{mo}$ = 300
$\Delta_1$ = 72 - 58 = 14
$\Delta_2$ = 72 - 63 = 9
$w$ = 349.5 - 300 = 49.5 ≈ 50

\[ \hat{x} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w = 300 + \left(\frac{14}{14 + 9}\right) 50 = 300 + \left(\frac{14}{23}\right) 50 \]

= 300 + (0.6087)(50)
≈ 300 + 30.43
= 330.43
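
The grouped-mode formula can be sketched in Python as follows. The function name `grouped_mode` is an assumption, as is the treatment of a missing neighbouring class (its frequency is taken as 0 when the modal class is first or last); equal class widths are assumed.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data: L_mo + d1 / (d1 + d2) * w."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])   # modal class index
    before = freqs[m - 1] if m > 0 else 0                # pre-modal frequency
    after = freqs[m + 1] if m < len(freqs) - 1 else 0    # post-modal frequency
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    return lower_limits[m] + d1 / (d1 + d2) * width


# Income distribution example: lower class limits and frequencies.
income_limits = [150, 200, 250, 300, 350, 400, 450]
income_freqs = [14, 27, 58, 72, 63, 36, 18]
print(round(grouped_mode(income_limits, income_freqs, 50), 2))
```

If two classes tie for the highest frequency, this sketch simply uses the first of them; a distribution like that is better described as bimodal.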
Example 2. The following are the score points of 60 students in their Statistics for Finance course.

Score          Frequency
35 - 44.99     10
45 - 54.99     12
55 - 64.99     18
65 - 74.99     13
75 - 84.99     7

Compute the modal score point.

Solution
From the table, the class with the largest frequency is 55 - 64.99 (frequency 18).
Then $L_{mo}$ = 55
$\Delta_1$ = 18 - 12 = 6
$\Delta_2$ = 18 - 13 = 5
$w$ = 64.99 - 55 = 9.99 ≈ 10

\[ \hat{x} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w = 55 + \left(\frac{6}{6 + 5}\right) 10 = 55 + \left(\frac{6}{11}\right) 10 \]

= 55 + (0.5455)(10)
= 55 + 5.45
$\hat{x}$ = 60.45 is the modal value.
1.4. Measures of Dispersion [Averages of Second Orders]
The measures of central location tell only part of what we need to know about the characteristics of a distribution, namely the single central value that represents the entire data set. These measures are sufficient only when all the observations have the same value. In practice, however, the observations differ from the center. Measures of dispersion therefore help us study another important dimension in a statistical investigation: how the data are spread, or in other words the degree of variability of the distribution.
Importance of measuring variation (dispersion)
i. To decide on the reliability of the central value (average)
ii. To serve as a basis for controlling the variability
iii. To compare two or more distributions
Variance and Standard Deviation
Variance and standard deviation are measures of dispersion that are used in many areas of interest, particularly in business. They are powerful measures because they take into account every value in the data and how all the observations are distributed. If the data are reasonably close to the center (the mean), we say that there is little variability or dispersion in the data. On the other hand, if the data are quite dispersed and at a considerable distance from the center, we say that the data are highly variable.
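Their measures are given by:
(Ungrouped data)

\[ \text{Variance: } \sigma^2 = \frac{\sum (x_i - \mu)^2}{N}, \text{ or } \sigma^2 = \frac{N \sum x_i^2 - \left(\sum x_i\right)^2}{N^2} \quad \text{(population)} \]

\[ \text{Variance: } s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}, \text{ or } s^2 = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)} \quad \text{(sample)} \]

\[ \text{Standard deviation: } s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \quad \text{(sample)} \]

\[ \text{Standard deviation: } \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} \quad \text{(population)} \]

Where
$\sigma^2$ = population variance
$s^2$ = sample variance
$N$ = total number of observations (population size)
$n$ = total number of sample observations
$\mu$ = population mean
$x_i$ = data values or class midpoints
$\bar{x}$ = sample mean
$\sigma$ = population standard deviation
$s$ = sample standard deviation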

Example 1. Take the sample ages of 10 college students below. Find their variance and standard deviation.
17, 17, 18, 19, 20, 20, 22, 22, 22, and 23
Solution
First compute the mean of the distribution, i.e.,

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{200}{10} = 20 \]

Then the variance can be computed as follows:

Age ($x$)    $x - \bar{x}$    $(x - \bar{x})^2$
17           -3               9
17           -3               9
18           -2               4
19           -1               1
20           0                0
20           0                0
22           2                4
22           2                4
22           2                4
23           3                9

\[ \sum (x_i - \bar{x})^2 = 44 \]

\[ \text{Variance: } s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{44}{10 - 1} = \frac{44}{9} \approx 4.89 \]

The standard deviation of the distribution is the square root of its variance, i.e.

\[ s = \sqrt{s^2} = \sqrt{4.89} \approx 2.21 \]
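
The same computation can be sketched in Python and cross-checked against the standard library. The function name `sample_variance` is an assumption; `statistics.variance` and `statistics.stdev` are the standard-library sample (n - 1) versions.

```python
import math
import statistics

# Sample ages of the 10 college students.
ages = [17, 17, 18, 19, 20, 20, 22, 22, 22, 23]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


s2 = sample_variance(ages)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))

# Cross-check against the standard library's sample statistics.
assert math.isclose(s2, statistics.variance(ages))
assert math.isclose(s, statistics.stdev(ages))
```

For a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by N instead of n - 1.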

Example. New Millennium College wants to hire graduates in the areas of management, accounting, and economics. The ages of the first 10 applicants to be interviewed are 22, 21, 20, 25, 26, 24, 26, 24, 22, and 24. The college wants candidates whose ages are fairly closely grouped around 23 years and regards a standard deviation of 2 years as acceptable. Does this group of candidates qualify?

Solution

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{234}{10} = 23.4 \]

Age of candidates ($x$)    $x - \bar{x}$    $(x - \bar{x})^2$
22                         -1.4             1.96
21                         -2.4             5.76
20                         -3.4             11.56
25                         1.6              2.56
26                         2.6              6.76
24                         0.6              0.36
26                         2.6              6.76
24                         0.6              0.36
22                         -1.4             1.96
24                         0.6              0.36

\[ \sum (x_i - \bar{x})^2 = 38.4 \]

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{38.4}{10 - 1} = \frac{38.4}{9} = 4.2667 \]

\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{4.2667} = 2.0656 \]

Since the computed standard deviation (about 2.07 years) is greater than the desired 2 years, the candidates may not all qualify.
Example 1. The ages of college students are given below.

Age        f
16 - 17    4
17 - 18    14
18 - 19    18
19 - 20    28
20 - 21    20
21 - 22    12
22 - 23    4

Compute the variance and the standard deviation of the distribution.

Solution: Before computing the values, find the class midpoints, i.e., the upper class limit plus the lower class limit, divided by two. Then,

Age        Midpoint ($x_i$)    $f_i$    $f_i x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$    $f_i (x_i - \bar{x})^2$
16 - 17    16.5                4        66           -2.98              8.8804                 35.52
17 - 18    17.5                14       245          -1.98              3.9204                 54.89
18 - 19    18.5                18       333          -0.98              0.9604                 17.29
19 - 20    19.5                28       546          0.02               0.0004                 0.01
20 - 21    20.5                20       410          1.02               1.0404                 20.81
21 - 22    21.5                12       258          2.02               4.0804                 48.96
22 - 23    22.5                4        90           3.02               9.1204                 36.48
Total                          100      1,948                                                  213.96

(Grouped data)

\[ \text{Variance: } \sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{\sum f_i}, \text{ and } s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} \]

\[ \text{Standard deviation: } \sigma = \sqrt{\frac{\sum f_i (x_i - \mu)^2}{\sum f_i}}, \text{ and } s = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n - 1}} \]

\[ \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{1948}{100} = 19.48 \]

The variance of the distribution is

\[ s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{213.96}{100 - 1} = \frac{213.96}{99} \approx 2.16 \]

and the standard deviation is

\[ s = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n - 1}} = \sqrt{2.16} \approx 1.47 \]
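
The grouped-data computation can be sketched in Python, which is a convenient way to check the column totals. The function name `grouped_sample_stats` is an assumption; the class midpoints and frequencies are those of the age example above.

```python
import math

# Class midpoints and frequencies from the age distribution example.
midpoints = [16.5, 17.5, 18.5, 19.5, 20.5, 21.5, 22.5]
frequencies = [4, 14, 18, 28, 20, 12, 4]


def grouped_sample_stats(points, freqs):
    """Return (mean, sum of squares, sample variance, sample std dev)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(points, freqs)) / n
    # Frequency-weighted sum of squared deviations from the mean.
    sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(points, freqs))
    variance = sum_sq / (n - 1)
    return mean, sum_sq, variance, math.sqrt(variance)


mean, sum_sq, variance, std_dev = grouped_sample_stats(midpoints, frequencies)
print(round(mean, 2), round(sum_sq, 2), round(variance, 2), round(std_dev, 2))
```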
1.5. Measures of Shapes and Distribution
Frequency distributions can assume many shapes. The three most important shapes are positively skewed, symmetric, and negatively skewed.

1. Positively skewed or right-skewed distribution: the majority of the data values fall to the left of the mean and cluster at the lower end of the distribution; the tail is to the right. Also, the mean is to the right of the median, and the mode is to the left of the median. For example, if an instructor gave an examination and most of the students did poorly, their scores would tend to cluster on the left side of the distribution. A few high scores would constitute the tail of the distribution, which would be on the right side.
Another example of a positively skewed distribution is the incomes of the population of the United States. Most of the incomes cluster about the low end of the distribution; those with high incomes are in the minority and are in the tail at the right of the distribution.

2. Symmetric distribution: the data values are evenly distributed on both sides of the mean. In addition, when the distribution is unimodal, the mean, median, and mode are the same and are at the center of the distribution. Examples of symmetric distributions are IQ scores and heights of adult males.
3. Negatively skewed or left-skewed distribution: when the majority of the data values fall to the right of the mean and cluster at the upper end of the distribution, with the tail to the left, the distribution is said to be negatively skewed. Also, the mean is to the left of the median, and the mode is to the right of the median. As an example, a negatively skewed distribution results if the majority of students score very high on an instructor's examination. These scores will tend to cluster to the right of the distribution.
When a distribution is extremely skewed, the value of the mean will be pulled toward the tail, but the majority of the data values will be greater than the mean or less than the mean (depending on which way the data are skewed); hence, the median rather than the mean is a more appropriate measure of central tendency. An extremely skewed distribution can also affect other statistics.
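
The relationship between the mean and the median described above suggests a quick check of the direction of skew. The sketch below is only a rough heuristic, not a formal skewness coefficient: the function name `skew_direction` and the three small data sets are made-up illustrations, not data from this chapter.

```python
import math
import statistics


def skew_direction(values):
    """Classify skew by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if math.isclose(mean, median):
        return "symmetric"
    # The mean is pulled toward the tail of the distribution.
    return "positive" if mean > median else "negative"


low_scores = [1, 2, 2, 3, 3, 3, 4, 10, 20]            # tail to the right
high_scores = [1, 10, 17, 18, 18, 18, 19, 19, 20]     # tail to the left
balanced = [1, 2, 3, 4, 5]

print(skew_direction(low_scores))
print(skew_direction(high_scores))
print(skew_direction(balanced))
```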
