
Statistical data representation

 Frequency distribution
 Grouped frequency distribution
 Cumulative frequency distribution

2
 Collected data are usually an overwhelming mass of raw
material and detail without any form or structure.
 To make the data easily understandable, the first task
of the statistician is to condense and simplify them in
such a manner that irrelevant details are eliminated and
their significant features stand out prominently.
 The procedure adopted for this purpose is known as
the method of classification and tabulation.

3
 Classification is the process of arranging things in groups
or classes according to their resemblances and affinities
 The objectives of classification are:
 To eliminate unnecessary details
 To bring out clearly points of similarity and dissimilarity
 To enable one to form mental pictures of objects, and
 To enable one to make comparisons and draw
inferences
 4
 Statistical facts are classified according to their
characteristics or attributes as:
 Attributes or qualitative characteristics
• are those that cannot be described numerically,
e.g. sex, nationality, yarn hairiness, etc.
 Variables or quantitative characteristics
• are those that can be described numerically, such as
height, mass, yarn count, etc.

5
 Continuous variable - a variable that can take any
numerical value within a certain range (including
fractional values). For example: height, weight, strength,
length, etc.
 Discrete variable - a variable that takes only distinct,
exact values (whole numbers, with no digits after the
decimal point). For example: number of family members,
number of working machines, etc.

6
 Tabulation means constructing frequency tables
 Tables can be constructed in the following manner
 Array – a set of numbers arranged in rows and
columns
 The first step in arranging the collected data,
therefore, is to prepare an array
 The array is prepared by arranging the values of the
variable in ascending or descending order

7
 This will enable the statistician to know the range over
which the items are spread, and he/she will also get an
idea of their general distribution.
 The table on the next slide shows the marks obtained by
50 students in statistics, as originally collected

8
40 37 61 67 59
46 66 41 60 38
51 57 40 72 39
41 25 42 38 40
33 54 58 14 71
65 55 66 40 62
48 55 38 40 20
43 49 59 73 28
30 38 52 68 38
71 44 52 45 56

9
14 38 41 52 62
20 38 42 54 65
25 38 43 55 66
28 39 44 55 67
30 40 45 56 68
33 40 46 58 68
37 40 48 59 71
37 40 49 59 71
38 40 51 60 72
38 41 52 61 73
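
Going from the raw marks to the array amounts to a sort. The following is a minimal sketch in Python, using only the first row of the raw table on slide 8 as a small sample; the variable names are illustrative and not part of the slides.

```python
# First row of the raw marks on slide 8, used here as a small sample.
raw_marks = [40, 37, 61, 67, 59]

# An array is the same set of values in ascending (or descending) order.
ascending = sorted(raw_marks)
descending = sorted(raw_marks, reverse=True)

# The range is the spread between the smallest and the largest value.
data_range = ascending[-1] - ascending[0]

print(ascending)   # [37, 40, 59, 61, 67]
print(descending)  # [67, 61, 59, 40, 37]
print(data_range)  # 30
```

Once the values are ordered, the smallest and largest items sit at the two ends, which is why the range can be read off directly.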

10
 Now, one can easily read off the range of the data
 Next, the bulk of the data should be reduced further so
that it is easier to visualize and to compute with
 Condensation is achieved by representing the
repetitions of a particular mark by tallies instead of
rewriting the mark itself.
 The number of tallies corresponding to any given mark
is the frequency of that mark

11
Marks  Tallies  Frequency    Marks  Tallies  Frequency

14     /        1            39     /        1
20     /        1            40     /////    5
25     /        1            41     //       2
28     /        1            42     /        1
30     /        1            43     /        1
33     /        1            44     /        1
37     //       2            45     /        1
38     /////    5            46     /        1

48     /        1            61     /        1
49     /        1            62     /        1
51     /        1            65     /        1
52     //       2            66     /        1
54     /        1            67     /        1
55     //       2            68     //       2
56     /        1            71     //       2
58     /        1            72     /        1
59     //       2            73     /        1
60     /        1
12
 In the table, frequency means the number of times a
given value of the variable occurs in the data.
 A table formed in this manner is known as a frequency
distribution table
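
The tally-and-count step can be sketched in Python with `collections.Counter`. The marks below are copied from the ordered array on slide 9; the variable names and the print layout are assumptions made for illustration.

```python
from collections import Counter

# Marks of the 50 students, copied from the ordered array on slide 9.
marks = [
    14, 20, 25, 28, 30, 33, 37, 37, 38, 38,
    38, 38, 38, 39, 40, 40, 40, 40, 40, 41,
    41, 42, 43, 44, 45, 46, 48, 49, 51, 52,
    52, 54, 55, 55, 56, 58, 59, 59, 60, 61,
    62, 65, 66, 67, 68, 68, 71, 71, 72, 73,
]

# Counter maps each distinct mark to the number of times it occurs.
frequency = Counter(marks)

# Print one row per distinct mark: the mark, its tallies, its frequency.
for mark in sorted(frequency):
    count = frequency[mark]
    print(f"{mark:>5}  {'/' * count:<7}{count:>3}")
```

The frequencies always add up to the number of observations, here 50, which is a quick check that no mark was dropped or counted twice.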

13
 A grouped frequency distribution is one in which the
items whose values fall within a certain range of the
variable under study are counted together, and the total
is stated as the frequency of that range (class)
 It is used to make the data more readily comprehensible
and to reduce its bulk further
 In constructing a grouped frequency table, the following
decisions have to be taken:
◦ 1. The number and width of the classes
◦ 2. The class limits

14
 The quality of a frequency distribution is ultimately
determined by a wise choice of the number and the
width of the classes
 Important points to be considered during the decision
of number and width of classes include:
1. The number of classes should seldom be fewer than 6
or more than 20;
- 15 is generally a good number

15
- Dividing the range by 15 gives a quotient that serves as
a helpful suggestion for the size of the class
interval.
2. Intervals in multiples of 5 are convenient and
preferable if possible.
- As far as possible, class intervals should be uniform in
width.
- Unequal class intervals should be avoided.

16
3. In general, an interval with an odd number of units is
easier to work with than one with an even number,
- because a class interval with an odd number of units has
the advantage of having an integer as its midpoint
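
The rules of thumb above can be tried out numerically. This is only a sketch: the smallest and largest marks (14 and 73) come from the array on slide 9, while the intervals 10-14 and 10-13 are hypothetical examples chosen to show the odd/even point, and `midpoint` is an illustrative helper.

```python
# Smallest and largest mark in the ordered array on slide 9.
lowest, highest = 14, 73
data_range = highest - lowest

# Rule of thumb from the slides: divide the range by about 15 classes.
suggested_width = data_range / 15
print(data_range, round(suggested_width, 2))  # 59 3.93


def midpoint(lower, upper):
    """Midpoint of an inclusive interval lower..upper."""
    return (lower + upper) / 2


# Hypothetical inclusive intervals, for illustration only.
print(midpoint(10, 14))  # 5 units (10..14): 12.0, a whole number
print(midpoint(10, 13))  # 4 units (10..13): 11.5, not a whole number
```

The quotient is only a starting suggestion; it would normally be adjusted to a convenient width using the other guidelines.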

17
 Class limits should be definite and clearly stated
 The starting point, i.e. the lower limit of the first class,
should be chosen in such a manner that the
frequencies of each class are concentrated near the
middle of the class interval
 This matters because the midpoint of each class is taken
to represent the value of all items included in the
frequency of that class.

18
 Inclusive class interval - items with values equal to
the lower and the upper limits of a class are included in
the frequency of that class
 Exclusive class interval - items with a value equal to
one of the limits, either the lower or the upper, are
excluded from the frequency of that class
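
The difference between the two conventions shows up only for values that fall exactly on a limit. Below is a minimal sketch, assuming the exclusive convention drops the upper limit (as the table on slide 19 does); `in_class` is a hypothetical helper, not something defined in the slides.

```python
def in_class(value, lower, upper, inclusive):
    """Return True if value is counted in the class lower-upper."""
    if inclusive:
        # Both limits belong to the class.
        return lower <= value <= upper
    # Exclusive convention assumed here: the upper limit is left out.
    return lower <= value < upper


# A mark of exactly 20 and the class 11-20:
print(in_class(20, 11, 20, inclusive=True))   # True
print(in_class(20, 11, 20, inclusive=False))  # False
# Under the exclusive convention it is counted in the next class, 20-29.
print(in_class(20, 20, 29, inclusive=False))  # True
```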

19
Class   Marks (excluding upper limit)   Frequency
1       11-20                           1
2       20-29                           3
3       29-38                           4
4       38-47                           18
5       47-56                           8
6       56-65                           7
7       65-74                           9
Total                                   50
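
The grouped table can be reproduced from the ordered array on slide 9 with a short Python sketch. The class width of 9 and the starting limit of 11 are read off the table; the variable names are illustrative.

```python
# Marks of the 50 students, copied from the ordered array on slide 9.
marks = [
    14, 20, 25, 28, 30, 33, 37, 37, 38, 38,
    38, 38, 38, 39, 40, 40, 40, 40, 40, 41,
    41, 42, 43, 44, 45, 46, 48, 49, 51, 52,
    52, 54, 55, 55, 56, 58, 59, 59, 60, 61,
    62, 65, 66, 67, 68, 68, 71, 71, 72, 73,
]

width = 9                          # class width read off the table
lower_limits = range(11, 74, width)  # 11, 20, 29, 38, 47, 56, 65

grouped = []
for lower in lower_limits:
    upper = lower + width
    # Upper limit excluded: a mark equal to `upper` goes to the next class.
    count = sum(1 for m in marks if lower <= m < upper)
    grouped.append((lower, upper, count))

for number, (lower, upper, count) in enumerate(grouped, start=1):
    print(f"{number}  {lower}-{upper}  {count}")

frequencies = [count for _, _, count in grouped]
print(frequencies)  # [1, 3, 4, 18, 8, 7, 9]
```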

20
 The following terms are explained using the table on slide 19
 Symbols like 11-20 are class intervals
 The end numbers 11 and 20 are called class limits
 The smaller number, 11, is the lower class limit
 and the larger number, 20, is the upper class limit.
 A class interval that has either no upper class limit or
no lower class limit indicated is called an open class
interval

21
 Class boundaries - if the upper and/or the lower limits
are rounded from decimal numbers (say 11-20 is written
instead of 11.5-19.5), then 11.5 and 19.5 are called the
class boundaries, meaning the true class limits
 11.5 is the lower class boundary
 19.5 is the upper class boundary
 The difference between the upper and lower class
boundaries is referred to as the class width, class
size, or class length
22
 When all class intervals of a frequency distribution have
equal width, this common width is denoted by ‘C’
 The class mark is the midpoint of the class interval; for
inclusive class intervals it is obtained by adding the
lower and upper class limits and dividing by 2.
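
As a rough illustration, the width C and the class marks can be computed for the classes on slide 19. Two assumptions are made here: the stated limits of that table are treated as the class boundaries, and the midpoint formula, which the slides state for inclusive intervals, is applied to those limits as well.

```python
# Class limits taken from the table on slide 19.
classes = [(11, 20), (20, 29), (29, 38), (38, 47), (47, 56), (56, 65), (65, 74)]

# Width of each class; assumes the stated limits act as the boundaries.
widths = [upper - lower for lower, upper in classes]

# Class mark (midpoint) of each class.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

print(widths)       # [9, 9, 9, 9, 9, 9, 9]
print(class_marks)  # [15.5, 24.5, 33.5, 42.5, 51.5, 60.5, 69.5]
```

All widths are equal, so under these assumptions the common width is C = 9.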

23
 A cumulative frequency distribution identifies the
cumulative number of observations included below the
upper boundary of each class in the distribution
 The cumulative frequency for a class can be determined
by adding the observed frequency for that class to the
cumulative frequency for the preceding class

24
Class   Marks (upper limit excluded)   Frequency   Cumulative frequency

1       11-20                          1           1
2       20-29                          3           1+3=4
3       29-38                          4           4+4=8
4       38-47                          18          8+18=26
5       47-56                          8           26+8=34
6       56-65                          7           34+7=41
7       65-74                          9           41+9=50
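
The running totals in the last column can be sketched in Python with `itertools.accumulate`, starting from the class frequencies in the table; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies from the grouped table (classes 11-20 up to 65-74).
frequencies = [1, 3, 4, 18, 8, 7, 9]

# Each cumulative frequency is the class frequency plus the cumulative
# frequency of the preceding class.
cumulative = list(accumulate(frequencies))

print(cumulative)  # [1, 4, 8, 26, 34, 41, 50]
```

The last cumulative frequency equals the total number of observations, 50, which serves as a check on the table.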

25
Thank You !
