2.1 Introduction
In the last unit, we studied the role of statistics in various areas of
science and engineering. The current unit focuses on the collection,
analysis, and interpretation of uncertain business data. The compilation and
analysis of data are fundamental to science and engineering. Scientists
discover the principles that govern the physical world, and techno-managers
learn how to design important new products and processes, by analyzing
data collected in scientific experiments. A major difficulty with scientific data
is that they are subject to random variation, or uncertainty. To deal with the
uncertainty of data, knowledge of statistics is essential.
Data collection and analysis for business play an ever-increasing
role in all aspects of modern life. For better or worse, huge amounts of data
are collected about our opinions and our lifestyles, for purposes ranging
from the creation of more effective marketing campaigns to the development
of social policies designed to improve the way of life. On almost any given
day, newspaper articles are published that purport to explain social or
Activity 1:
Distinguish between primary and secondary data. Describe the various
methods of collecting primary data.
Self Assessment Questions
5. Sources of data can be primary or secondary. (True/False)
6. Match the following:
a) Attribute                          a) Quantitative data
b) Variable                           b) Qualitative data
c) Classification of data             c) Four types
d) Reserve Bank of India bulletin     d) Secondary data
e) Monthly coal bulletin              e) Primary data
f) Monthly abstract of statistics     f) Primary data
7. Primary data can be collected by census method or by sample enquiry
method. (True/False)
8. Direct personal interview method is used to collect _____________.
(i.e., persons who are required to answer them) will feel bored and
reluctant to answer all the questions.
2) The individual questions should be simple, unambiguous and precise.
Lengthy questions cause irritation, resulting in careless and inaccurate
replies. Complicated questions should be split up into several smaller
parts which can be easily answered by the respondents. Explanations
and definitions of the terms used in the questionnaire should therefore
accompany each proforma.
3) If possible, questions should be framed so as to elicit only one of two
definite answers: ‘yes’ or ‘no’.
4) The units in which the information is to be collected should be clearly
and precisely mentioned in the questionnaire.
5) The questions in the proforma should be arranged so that the answers
follow an easy and systematic flow. Questions should
not skip back and forth from one topic to another.
6) After the questionnaire has been devised, it is desirable to try it on a few
individuals. This procedure, known as a pilot survey, is useful in
detecting the shortcomings of the questionnaire, so that necessary
modifications may be made before it is used in the actual enquiry.
Hence, the answers to the questions together produce a large database.
The statistician will use this database for further analysis and prediction of
results.
Activity 2:
An automobile manufacturing unit has selling branches in each
metropolitan city in India. It manufactures four different types of vehicles,
which are sold by its branches. The Head Office wishes to plan a sales
campaign based on the past sales and likely future demand. Design a
questionnaire for the collection of necessary data and draft the
instructions for completing the questionnaire.
minutes and 15 seconds. Similarly, assume that the girls’ competition was
won by Nabnita Choudhury in 30 minutes and 10 seconds for the same
race. The names and finishing times of all participants were
recorded in order of finishing. The first three finishers were given
awards.
The above example contains several different kinds of data. The information
that puts each contestant into the category of boy or girl is known as
qualitative data. For example, the information that the winner Nabnita
Choudhury is female is “qualitative data”.
The order in which the contestants finished is known as “ordinal data”. The
positions first, second, third, etc., are examples of ordinal data.
All these voluminous data must be presented in a condensed form to the
management without any loss of the information contained in them. Hence, the
collected data must be organized, carefully summarized and presented
either in the form of tables or graphs that can be easily interpreted. The
tools of classification and presentation of statistical data are listed as
follows:
Frequency distribution
Cumulative frequency distribution
Relative frequency distribution
Charts
2.6.1 Frequency distribution
A frequency distribution is a better way to arrange data because it
condenses them. To construct a frequency table, we divide the data
into groups of similar values (classes) and then record the number of
observations that fall in each group. If the statistical data are of a repeating
nature, they should be presented as the number of
occurrences of each value of the data of a particular type. A frequency
distribution is defined as “the list of all the values obtained in the data and
the frequency with which these values occur in the data”.
Frequency distributions can be classified into discrete frequency
distributions and continuous frequency distributions, which are demonstrated
in Tables 2.1 and 2.2, respectively. In Table 2.1, the variable takes discrete
numerical values, whereas the monthly income in Table 2.2 is a continuous variable.
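As a minimal sketch of how the two kinds of frequency table could be built, the Python snippet below counts a discrete variable value by value and groups a continuous variable into class intervals. The data values, the class limits and the function names are hypothetical and are not taken from Tables 2.1 and 2.2. Classes are assumed to include their lower limit and exclude their upper limit.

```python
from collections import Counter

def discrete_frequency(values):
    """Return rows of (value, frequency, relative frequency, cumulative frequency)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], counts[value] / total, running))
    return rows

def grouped_frequency(values, lower, width, n_classes):
    """Count observations in classes [lower + i*width, lower + (i+1)*width)."""
    freqs = [0] * n_classes
    for v in values:
        idx = int((v - lower) // width)
        if 0 <= idx < n_classes:  # values outside the classes are ignored
            freqs[idx] += 1
    return [(lower + i * width, lower + (i + 1) * width, f)
            for i, f in enumerate(freqs)]

# Hypothetical discrete data: number of children per family
children = [0, 1, 2, 1, 3, 2, 2, 1, 0, 2]
# Hypothetical continuous data: monthly income in rupees
incomes = [1200, 1850, 2400, 2950, 3100, 1500, 2200, 3900, 2750, 1999]

discrete_table = discrete_frequency(children)
grouped_table = grouped_frequency(incomes, lower=1000, width=1000, n_classes=3)

for row in discrete_table:
    print(row)
for row in grouped_table:
    print(row)
```

The same rows also give the relative and cumulative frequency distributions listed among the tools above, since both are derived from the plain frequencies.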
Example 1: Construct a pie chart for the following data: Principal exporting
countries of cotton (1000 bales), 1955-56
Country    U.S.A    India    Egypt    Brazil    Argentina
Exports    6,367    2,999    1,688    650       202
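To draw the pie chart, each value is converted into a sector angle in proportion to its share of the total (share of total multiplied by 360 degrees). The short Python sketch below carries out this arithmetic for the data of Example 1; the function name pie_sectors is only an illustrative choice.

```python
# Exports of cotton in thousand bales, 1955-56 (data of Example 1)
exports = {
    "U.S.A": 6367,
    "India": 2999,
    "Egypt": 1688,
    "Brazil": 650,
    "Argentina": 202,
}

def pie_sectors(data):
    """Return {category: (percentage share, sector angle in degrees)}."""
    total = sum(data.values())
    return {
        name: (100 * value / total, 360 * value / total)
        for name, value in data.items()
    }

sectors = pie_sectors(exports)
for name, (share, angle) in sectors.items():
    print(f"{name:<10} {share:6.2f}%  {angle:7.2f} degrees")
```

The angles add up to 360 degrees, so the sectors fill the whole circle.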
Bar chart
In bar charts, we use rectangles to present the given data. A bar chart
consists of a group of equispaced rectangular bars, one for each category (or
class) of the given statistical data. The bars start from a common base line
and must be of equal width; their lengths represent the values of the statistical
data. A bar chart can be set up in two forms: vertical or horizontal.
Vertical bars are used to represent time series data or data classified by the
values of a variable. Horizontal bars are used to depict data classified by
attributes only.
Example 2: The following table shows the average approximate yield of food
grains in lbs. per acre in various countries of the world in 1988-89.
Fig. 2.4: Average Yields of Food Grains in lbs per Acre in Various Countries
Histogram
In a ‘histogram’, the given data are plotted in the form of a
series of rectangles. Class intervals are marked along the X-axis and the
frequencies along the Y-axis according to a suitable scale. Unlike the bar
chart, which is one-dimensional, meaning that only the length of the bar
matters and not the width, a histogram is two-dimensional: both the
length and the width are important. A histogram is constructed from a
frequency distribution of grouped data, where the height of the rectangle is
proportional to the respective frequency (for classes of equal width) and the
width represents the class interval. Each rectangle adjoins the next, so a blank
space between rectangles means that the category is empty and there
are no values in that class interval.
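The construction can be sketched in Python as follows. Each class interval becomes one rectangle described by its left edge, width and height, and a simple text rendering shows an empty class as a gap. The class intervals and frequencies are hypothetical, and equal class widths are assumed so that height can be taken equal to frequency.

```python
def histogram_rectangles(class_limits, frequencies):
    """Return one (left_edge, width, height) rectangle per class interval."""
    rects = []
    for (lower, upper), freq in zip(class_limits, frequencies):
        rects.append((lower, upper - lower, freq))
    return rects

def render_text(rects, symbol="#"):
    """Render each rectangle as a row of symbols (a sideways histogram)."""
    lines = []
    for left, width, height in rects:
        lines.append(f"{left:>5}-{left + width:<5} | {symbol * height}")
    return "\n".join(lines)

# Hypothetical grouped data; the class 20-30 is empty
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [3, 7, 0, 2]

rects = histogram_rectangles(classes, freqs)
print(render_text(rects))
```

Because the right edge of each rectangle coincides with the left edge of the next, the only blank space in the rendering is the row for the empty class.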
Frequency polygon
A ‘frequency polygon’ is a line chart of a frequency distribution in which either
the values of a discrete variable or the mid-points of the class intervals are
plotted against the frequencies, and the plotted points are joined
by straight lines. Since the frequencies do not start or end at zero, this
diagram as such would not touch the horizontal axis. However, since the
area under the entire curve is the same as that of a histogram, which is
100%, the curve must be ‘enclosed’. The starting mid-point is joined with a
‘fictitious’ preceding mid-point whose frequency is zero. This makes the
beginning of the curve touch the horizontal axis. The last mid-point is joined
with a ‘fictitious’ succeeding mid-point, whose frequency is also zero, so that the
curve ends at the horizontal axis. The enclosed diagram is known as the
‘frequency polygon’.
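The points of a frequency polygon, including the two fictitious mid-points, can be computed as in the following Python sketch. The class intervals and frequencies are hypothetical, and equal class widths are assumed so that the fictitious mid-points lie one class width before the first and after the last real mid-point.

```python
def frequency_polygon_points(class_limits, frequencies):
    """Return the (mid-point, frequency) vertices of an enclosed frequency polygon."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    width = class_limits[0][1] - class_limits[0][0]  # assumes equal class widths
    first = (midpoints[0] - width, 0)   # fictitious preceding mid-point
    last = (midpoints[-1] + width, 0)   # fictitious succeeding mid-point
    return [first] + list(zip(midpoints, frequencies)) + [last]

# Hypothetical grouped data
classes = [(10, 20), (20, 30), (30, 40)]
freqs = [4, 9, 5]

points = frequency_polygon_points(classes, freqs)
print(points)
```

Joining these points in order with straight lines gives a curve that starts and ends on the horizontal axis.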
Ogive curve
An accounts officer of SMU may be interested in finding out the number of
staff members who have paid in less than 30 days or who have taken more
than 50 days to make payments. To answer such questions we draw
cumulative frequency curves, also known as ogives. Ogives are of two types:
the “less than” ogive and the “more than” ogive. In the less-than ogive, the
“less than” cumulative frequencies are plotted against the upper boundaries of
their respective class intervals. In the more-than ogive, the “greater
than” cumulative frequencies are plotted against the lower boundaries of
their respective class intervals.
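The plotting points of both ogives can be obtained by accumulating the class frequencies from either end, as in the Python sketch below. The payment-time classes and frequencies are hypothetical and are not the data behind Fig. 2.7. The "more than" counts here include observations equal to the lower boundary.

```python
from itertools import accumulate

def less_than_ogive(class_limits, frequencies):
    """Pair each upper boundary with the number of observations below it."""
    cumulative = list(accumulate(frequencies))
    return [(upper, c) for (_, upper), c in zip(class_limits, cumulative)]

def more_than_ogive(class_limits, frequencies):
    """Pair each lower boundary with the number of observations at or above it."""
    cumulative = list(accumulate(reversed(frequencies)))[::-1]
    return [(lower, c) for (lower, _), c in zip(class_limits, cumulative)]

# Hypothetical data: days taken by staff members to make payments
classes = [(20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [5, 12, 8, 3]

less_than = less_than_ogive(classes, freqs)
more_than = more_than_ogive(classes, freqs)
print(less_than)
print(more_than)
```

With these figures, the less-than point at 30 days and the more-than point at 50 days answer the accounts officer's two questions directly.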
Fig. 2.7: (A) More-than Ogive Curve (B) Less-than Ogive Curve
2.7 Summary
Let’s recapitulate the important aspects of the unit:
In this unit, we first learnt the process of collecting statistical
data at the minimum cost. The collected data must be verified by asking
some standard questions.
In the second stage, we studied the classification of the data, which is
based on the answers of the first stage questions. In this topic, sources
of primary and secondary data have been discussed. The primary
source generates original data and corresponds to the objective of
investigation. However, the secondary data are often available in
published form, collected originally by some other agency with a
different goal. Primary data are more reliable than secondary data.
There are several methods of collecting primary data. The choice of
a particular method depends, apart from the objective, scope and nature of
the investigation, on the availability of resources, the literacy levels of the
respondents, etc. Secondary data should be collected carefully, only
after verifying that they are suitable, adequate and reliable for the
purpose of the investigation under consideration.
In the third stage, we discussed the process of framing the
questionnaires for the statistical enquiry.
In the fourth stage, we learnt the process of sample selection from the
large population.
Finally, we discussed the process of displaying results in the form of
frequency distribution, cumulative frequency distribution, relative
frequency distribution and charts. Some numerical examples have been
included in this topic. Exploit and explore the full unit!
2.8 Glossary
Data: A collection of any number of related observations on one or more
variables.
Data point: A single observation from a data set.
Data set: A collection of data.
Raw data: Information before it is arranged or analyzed by statistical
methods.
Sample: A collection of some, but not all, of the elements of the population
under study, used to describe the population.
Population: A collection of all the elements we are studying, and about
which we are trying to draw conclusions.
Frequency curve: A frequency polygon smoothed by adding classes and
data points to a data set.
Frequency distribution: An organized display of data that shows the
number of observations from the data set that falls into each of a set of
mutually exclusive and collectively exhaustive classes.
Cumulative frequency: A tabular display of data showing how many
observations lie above or below certain values.
2.10 Answers
Answers to Self Assessment Questions
1. True
2. True
3. Test
4. Uncertainty
5. True