Frequency distribution
Grouped frequency distribution
Cumulative frequency distribution
Collected data are mostly an overwhelming mass of raw
material and detail without any form or structure.
To make the data easily understandable, the first task
of the statistician is to condense and simplify them in
such a manner that irrelevant details are eliminated and
their significant features stand out prominently.
The procedure adopted for this purpose is
known as the method of classification and tabulation.
Classification is the process of arranging things in groups or
classes according to their resemblances and affinities.
The objectives of classification are:
• To eliminate unnecessary details
• To bring out clearly points of similarity and dissimilarity
• To enable one to form mental pictures of objects, and
• To enable one to make comparisons and draw
inferences
Statistical facts are classified according to their
characteristics as:
Attributes or qualitative characteristics
• are those that cannot be described numerically,
e.g., sex, nationality, yarn hairiness, etc.
Variables or quantitative characteristics
• are those that can be described numerically, such as
height, mass, yarn count, etc.
Continuous variable - a variable that can take any
numerical value within a certain range (including
fractional values). For example: height, weight, strength,
length, etc.
Discrete variable - a variable that takes only distinct,
exact values (whole numbers, with nothing after the
decimal point). For example: number of family members, number
of working machines, etc.
Tabulation means constructing frequency tables.
Tables can be constructed in the following manner:
Array – a set of numbers arranged in rows
and columns.
The first step in arranging the collected data, therefore,
is to prepare an array.
The array is prepared by arranging the values of the
variable in ascending or descending order.
This will enable the statistician to know the range over
which the items are spread, and he/she will also get an
idea of their general distribution.
The table below shows the marks obtained by 50
students in statistics, as originally collected:
40 37 61 67 59
46 66 41 60 38
51 57 40 72 39
41 25 42 38 40
33 54 58 14 71
65 55 66 40 62
48 55 38 40 20
43 49 59 73 28
30 38 52 68 38
71 44 52 45 56
The marks arranged as an array in ascending order (read down each column):
14 38 41 52 62
20 38 42 54 65
25 38 43 55 66
28 39 44 55 67
30 40 45 56 68
33 40 46 58 68
37 40 48 59 71
37 40 49 59 71
38 40 51 60 72
38 41 52 61 73
From the array, one can now easily see the range of the data: the marks run from 14 to 73.
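
As a minimal sketch of preparing the array and reading off the range, the snippet below sorts the marks in Python. The values are typed in row by row from the ordered table above, so they are not yet in ascending order when entered; the names `marks`, `ordered` and `data_range` are illustrative choices, not part of the original material.

```python
# The 50 marks, entered row by row from the ordered table above.
marks = [
    14, 38, 41, 52, 62,
    20, 38, 42, 54, 65,
    25, 38, 43, 55, 66,
    28, 39, 44, 55, 67,
    30, 40, 45, 56, 68,
    33, 40, 46, 58, 68,
    37, 40, 48, 59, 71,
    37, 40, 49, 59, 71,
    38, 40, 51, 60, 72,
    38, 41, 52, 61, 73,
]

# Preparing the array: arrange the values in ascending order.
ordered = sorted(marks)

# The range is the spread between the smallest and largest values.
lowest, highest = ordered[0], ordered[-1]
data_range = highest - lowest

print("first five:", ordered[:5])
print("last five: ", ordered[-5:])
print("lowest:", lowest, "highest:", highest, "range:", data_range)
```

For these marks the lowest value is 14, the highest is 73, and the range is 59.
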
Next, the bulk of the data should be reduced further so that
it is easier to visualize and to compute with.
Condensation is achieved by representing the
repetitions of a particular mark by tallies instead of
rewriting the mark itself.
The number of tallies corresponding to any given mark
is the frequency of that mark.
Marks  Tallies  Frequency    Marks  Tallies  Frequency
14     /        1            51     /        1
20     /        1            52     //       2
25     /        1            54     /        1
28     /        1            55     //       2
30     /        1            56     /        1
33     /        1            58     /        1
37     //       2            59     //       2
38     /////    5            60     /        1
39     /        1            61     /        1
40     /////    5            62     /        1
41     //       2            65     /        1
42     /        1            66     /        1
43     /        1            67     /        1
44     /        1            68     //       2
45     /        1            71     //       2
46     /        1            72     /        1
48     /        1            73     /        1
49     /        1
In the table, frequency means the number of times a
certain value of the variable occurs in the given
data.
A table formed in this manner is known as a frequency
distribution table.
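
The tallying step can be sketched in Python with `collections.Counter`, which counts how often each value occurs. This is only an illustration under the assumption that the marks are those of the ordered table; the variable names are hypothetical.

```python
from collections import Counter

# The 50 marks, entered row by row from the ordered table.
marks = [
    14, 38, 41, 52, 62,
    20, 38, 42, 54, 65,
    25, 38, 43, 55, 66,
    28, 39, 44, 55, 67,
    30, 40, 45, 56, 68,
    33, 40, 46, 58, 68,
    37, 40, 48, 59, 71,
    37, 40, 49, 59, 71,
    38, 40, 51, 60, 72,
    38, 41, 52, 61, 73,
]

# Counter maps each distinct mark to the number of times it occurs.
frequency = Counter(marks)

# Print one row per distinct mark: the mark, its tallies, its frequency.
for mark in sorted(frequency):
    count = frequency[mark]
    print(f"{mark:>5}  {'/' * count:<7}  {count}")

print("Total:", sum(frequency.values()))
```

The marks 38 and 40 each receive five tallies, and the frequencies add up to the 50 students.
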
A grouped frequency distribution is one in which the items
whose values fall within the same range (class) of the
variable under study are counted together, and the count is
stated as the frequency of that class.
It is used to make the data more readily comprehensible
and to reduce its bulk further.
In constructing a grouped frequency table, the following
decisions have to be taken:
◦ 1. The number and width of the classes
◦ 2. The class limits
The quality of a frequency distribution is ultimately
determined by a wise choice of the number and the
width of the classes
Important points to consider when deciding the
number and width of classes include:
1. The number of classes should seldom be fewer than 6
or more than 20.
- 15 is generally a good number.
- Dividing the range by 15 gives a quotient that is a
helpful suggestion for the size of the class
interval.
2. Intervals in multiples of 5 are convenient and
preferable where possible.
- As far as possible, class intervals should be uniform in
width.
- Unequal class intervals should be avoided.
3. In general, an interval with an odd number of units is
easier to work with than one with an even number,
- because a class interval with an odd number of units has
an integer as its midpoint (for example, the inclusive
class 10-14 spans five units and has midpoint 12).
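
The arithmetic behind these points can be sketched as follows. The quotient of range by number of classes comes from the guidelines above; the helper `round_up_to_odd` is a hypothetical convenience that combines that quotient with the preference for an odd width, and is an assumption rather than a rule stated in the material. The range of 59 is that of the marks data (14 to 73).

```python
import math

lowest, highest = 14, 73            # smallest and largest marks in the data
data_range = highest - lowest       # 59


def suggested_width(data_range, n_classes):
    """Quotient of the range by the number of classes (a starting point only)."""
    return data_range / n_classes


def round_up_to_odd(width):
    """Hypothetical helper: round up to the next whole, odd number of units."""
    whole = math.ceil(width)
    return whole if whole % 2 == 1 else whole + 1


for n_classes in (15, 7):
    quotient = suggested_width(data_range, n_classes)
    print(n_classes, "classes:",
          "quotient", round(quotient, 2),
          "-> odd width", round_up_to_odd(quotient))
```

With 15 classes the quotient is about 3.93, which rounds up to a width of 5; with 7 classes it is about 8.43, which rounds up to 9.
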
Class limits should be definite and clearly stated.
The starting point, i.e., the lower limit of the first class,
should be chosen in such a manner that the
observations in each class are concentrated near the
middle of the class interval,
because the midpoint of each class is taken to
represent the value of all items included in the
frequency of that class.
Inclusive class interval - items having values equal to
the lower and the upper limits of a class are included in
the frequency of that class.
Exclusive class interval - items having values equal to
either the lower limit or the upper limit (usually the
upper limit) are excluded from the frequency of that class.
Class  Marks (upper limit excluded)  Frequency
1      11-20                         1
2      20-29                         3
3      29-38                         4
4      38-47                         18
5      47-56                         8
6      56-65                         7
7      65-74                         9
Total                                50
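
A minimal sketch of how the grouped table can be reproduced: each mark is counted in the class whose lower limit it reaches and whose upper limit it does not (upper limit excluded). The marks are assumed to be those of the ordered table, and the names `limits`, `classes` and `grouped` are illustrative.

```python
# The 50 marks, entered row by row from the ordered table.
marks = [
    14, 38, 41, 52, 62,
    20, 38, 42, 54, 65,
    25, 38, 43, 55, 66,
    28, 39, 44, 55, 67,
    30, 40, 45, 56, 68,
    33, 40, 46, 58, 68,
    37, 40, 48, 59, 71,
    37, 40, 49, 59, 71,
    38, 40, 51, 60, 72,
    38, 41, 52, 61, 73,
]

# Class limits 11, 20, 29, ..., 74: seven classes of width 9.
limits = list(range(11, 75, 9))
classes = list(zip(limits[:-1], limits[1:]))

# Exclusive interval: a mark equal to the upper limit goes to the next class.
grouped = [
    sum(1 for mark in marks if lower <= mark < upper)
    for lower, upper in classes
]

for number, ((lower, upper), count) in enumerate(zip(classes, grouped), start=1):
    print(f"{number}  {lower}-{upper}  {count}")
print("Total", sum(grouped))
```

Note that the mark 20 is counted in the class 20-29 and not in 11-20, which is why the first class has a frequency of 1.
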
The following terms are explained with reference to the grouped frequency table above.
Symbols such as 11-20 are class intervals.
The end numbers 11 and 20 are called class limits:
the smaller number, 11, is the lower class limit,
and the larger number, 20, is the upper class limit.
A class interval that has either no upper class limit or no
lower class limit indicated is called an open class
interval.
Class boundaries - if the stated upper and/or lower limits
are rounded from decimal numbers, for example if a class
running from 11.5 to 19.5 is written as 11-20, then 11.5 and
19.5 are called the class boundaries, meaning the true class limits.
11.5 is the lower class boundary.
19.5 is the upper class boundary.
The difference between the upper and lower class
boundaries is referred to as the class width, class
size, or class length.
When all class intervals of a frequency distribution have equal
width, this common width is denoted by ‘C’.
The class mark is the midpoint of the class interval and is
obtained by adding the lower and upper class limits and
dividing by 2 for inclusive class intervals.
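
As a small illustration, the midpoint formula and the common width can be computed for the classes of the grouped marks table. This sketch assumes that the stated limits of that upper-limit-excluded table can be treated as its class boundaries; the variable names are hypothetical.

```python
# (lower limit, upper limit) of each class in the grouped marks table.
class_limits = [
    (11, 20), (20, 29), (29, 38), (38, 47),
    (47, 56), (56, 65), (65, 74),
]

# Class mark: add the lower and upper limits and divide by 2.
class_marks = [(lower + upper) / 2 for lower, upper in class_limits]

# Class width: upper minus lower; a single value means a common width C.
widths = {upper - lower for lower, upper in class_limits}

print("class marks:", class_marks)
print("distinct widths:", widths)
```

Every class here has the same width, C = 9, and the first class mark is (11 + 20) / 2 = 15.5.
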
A cumulative frequency distribution identifies the
cumulative number of observations included below the
upper boundary of each class in the distribution
The cumulative frequency for a class can be determined
by adding the observed frequency for that class to the
cumulative frequency for the preceding class
Class  Marks (upper limit excluded)  Frequency  Cumulative frequency
1      11-20                         1          1
2      20-29                         3          1+3=4
3      29-38                         4          4+4=8
4      38-47                         18         8+18=26
5      47-56                         8          26+8=34
6      56-65                         7          34+7=41
7      65-74                         9          41+9=50
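
The running totals in the table can be sketched with `itertools.accumulate`, which adds each class frequency to the cumulative frequency of the preceding class. The list `frequencies` is copied from the table; the names are illustrative.

```python
from itertools import accumulate

# Observed class frequencies, in class order, from the table above.
frequencies = [1, 3, 4, 18, 8, 7, 9]

# Each entry is the frequency of its class plus the preceding running total.
cumulative = list(accumulate(frequencies))

for number, (freq, cum) in enumerate(zip(frequencies, cumulative), start=1):
    print(f"class {number}: frequency {freq}, cumulative frequency {cum}")
```

The last cumulative frequency equals the total number of observations, 50.
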