Stages of Statistical Survey
Statistical surveys are categorized into two stages: planning and execution.
c)
Methods for collecting data
Collection of data is the first and most important stage in any statistical survey. The
method for collection of data depends upon various factors such as objective,
scope, nature of investigation and availability of resources. Direct personal
interviews, third party agencies, and questionnaires are some ways through which
data is collected.
Primary data:
Primary data is data collected by the investigator for the purpose of a
specific inquiry or study. Such data is original in character and is generated by a
survey conducted by individuals, a research institution or any organisation. For
example, if a researcher wants to know the impact of a noon-meal scheme for
school children, he/she has to undertake a survey and collect data on the opinions of
parents and children by asking relevant questions. Data collected in this way is called
primary data. Primary data is collected by a suitable method from the
following:
1. Direct personal observation
2. Indirect oral interview
3. Information through agencies
4. Information through mailed questionnaires
5. Information through a schedule filled by investigators.
Secondary data:
Any information, that is used for the current investigation but is obtained from
some data, which has been collected and used by some other agency or person in a
b) Unpublished sources
It is not necessary that all statistical contents have to be published. Unpublished
data such as records maintained by various government and private offices, studies
made by research institutions and scholars can also be used where necessary.
Though the use of secondary data is economical in terms of expense, time and
manpower, the researcher must be careful in choosing such secondary
data. Secondary data must possess the following characteristics:
1) Reliability of data: The reliability of secondary data can be tested by
investigating:
a) Who collected the data?
b) What were the sources of data?
c) Were they collected by a proper method?
d) At what time were they collected?
f) What level of accuracy was desired? Was it achieved?
Q-2)
a)
P(A)= n
b)
In a bolt factory machines A, B, C manufacture 25, 35 and 40 percent
of the total output. Of their total output 5, 4 and 2 percent are defective
respectively. A bolt is drawn at random and is found to be defective. What
are the probabilities that it was manufactured by machines A, B and C?
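Let A1, A2 and A3 be the events that the bolt was manufactured by machines A, B
and C respectively.
P(A1) = P(that machine A manufactured the bolt) = 25% = 0.25
Similarly, P(A2) = 35% = 0.35 and P(A3) = 40% = 0.40.
Let B be the event that the drawn bolt is defective.
P(B/A1) = P(that a bolt from machine A is defective) = 5% = 0.05
Similarly, P(B/A2) = 4% = 0.04 and P(B/A3) = 2% = 0.02.
We have to find P(A1/B), P(A2/B) and P(A3/B).
Hence by Bayes' theorem, we get
P(Ai/B) = P(Ai) P(B/Ai) / [P(A1) P(B/A1) + P(A2) P(B/A2) + P(A3) P(B/A3)]
The denominator is P(B) = (0.25)(0.05) + (0.35)(0.04) + (0.40)(0.02)
= 0.0125 + 0.0140 + 0.0080 = 0.0345.
P(A1/B) = 0.0125 / 0.0345 = 0.3623
P(A2/B) = 0.0140 / 0.0345 = 0.4058
P(A3/B) = 0.0080 / 0.0345 = 0.2319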
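The Bayes' theorem calculation for the bolt factory can be checked with a short Python sketch. The priors and defect rates are the figures given in the question; the names priors, defect_rates and posteriors are illustrative only.

```python
# Prior probabilities: share of total output per machine (from the question).
priors = {"A": 0.25, "B": 0.35, "C": 0.40}
# Likelihoods: probability that a bolt is defective, given the machine.
defect_rates = {"A": 0.05, "B": 0.04, "C": 0.02}


def posteriors(priors, likelihoods):
    """Bayes' theorem: P(Ai/B) = P(Ai) P(B/Ai) / sum over j of P(Aj) P(B/Aj)."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    total = sum(joint.values())  # P(B), the overall probability of a defect
    return {k: v / total for k, v in joint.items()}


result = posteriors(priors, defect_rates)
for machine, p in result.items():
    print(f"P(machine {machine} / defective) = {p:.4f}")
```

The three posterior probabilities must add up to 1, which is a quick sanity check on any hand calculation.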
Q-3)
a)
The procedure of testing hypothesis requires a researcher to adopt
several steps. Describe in brief all such steps.
b)
A sample of 400 items is taken from a normal population whose
mean as well as variance is 4. If the sample mean is 4.5, can the sample
be regarded as a truly random sample?
a)
b)
(i) Mean = 4
Variance = 4, S.D. = √Variance = 2
n = 400
z = (x̄ - μ) / S.E. = (4.5 - 4) / (2/√400) = 0.5 / 0.1 = 5
Null hypothesis. There is no difference between population mean and sample mean.
Alternate hypothesis. There is a significant difference between population mean and
sample mean.
In this question the level of significance is not mentioned; therefore, we can take the
maximum value of z as 3. When z lies within ±3 S.D.s of the mean, the area covered is
99.73%. Since the calculated value of z is more than the table value, we reject the
Null hypothesis. Hence, we can infer that the sample drawn is not a truly random
sample.
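As a rough check of part (i), the z statistic can be computed in Python. The inputs are the values from the question (mean 4, variance 4, sample mean 4.5, n = 400) and the 3 S.D. limit used above; the function name z_statistic is an illustrative choice.

```python
import math


def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """z = (sample mean - population mean) / (S.D. / sqrt(n))."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error


# Values from the question: mean 4, variance 4 (so S.D. 2), n = 400.
z = z_statistic(sample_mean=4.5, pop_mean=4.0, pop_sd=math.sqrt(4.0), n=400)
critical_value = 3.0  # 3 S.D. limit, used because no level is specified
reject_null = abs(z) > critical_value
print(f"z = {z:.2f}, reject null hypothesis: {reject_null}")
```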
(ii) In this part the level of significance is given as 1%. Therefore the table value
of z is 2.58.
z = (4 - 4.45) / (2/30) = -0.45 x 15 = -6.75
Since the calculated value of z is numerically more than the table value of z, we
reject the Null hypothesis and conclude that the sample cannot be regarded as a truly
random sample.
Q-4)
a)
What is a Chi-square test? Point out its applications. Under what
conditions is this test applicable?
b)
What are the components of time series? Enumerate the methods of
determining trend in time series.
a)
The Chi-square test is one of the most commonly used non-parametric tests in
statistical work. The Greek letter χ² is used to denote this test. χ² describes the
magnitude of discrepancy between the observed and the expected frequencies. The
value of χ² is calculated as:
χ² = Σ (O - E)² / E
where O denotes the observed frequencies and E the expected frequencies.
The following are the conditions for using the Chi-Square test:
1. The frequencies used in Chi-Square test must be absolute and not in relative
terms.
2. The total number of observations collected for this test must be large.
3. Each of the observations which make up the sample of this test must be
independent of each other.
4. As the χ² test is based wholly on sample data, no assumption is made concerning
the population distribution. In other words, it is a non-parametric test.
5. The χ² test is wholly dependent on degrees of freedom. As the degrees of freedom
increase, the Chi-Square distribution curve becomes more symmetrical.
6. The expected frequency of any item or cell must not be less than 5. If it is, the
frequencies of adjacent items or cells should be pooled together in order to make it
at least 5.
7. The data should be expressed in original units for convenience of comparison and
the given distribution should not be replaced by relative frequencies or proportions.
8. This test is used only for drawing inferences through test of the hypothesis, so it
cannot be used for estimation of parameter value.
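The statistic and the pooling rule in condition 6 can be sketched in Python as follows. The observed and expected frequencies are hypothetical, chosen only for illustration, and the helper names pool_small_cells and chi_square are not from any library.

```python
def pool_small_cells(observed, expected, minimum=5):
    """Merge adjacent cells until every expected frequency is at least `minimum`."""
    pooled_obs, pooled_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= minimum:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_exp > 0:  # leftover small cell: fold it into the last pooled cell
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical absolute frequencies (not from the source); the last cell has
# an expected frequency below 5, so it is pooled with its neighbour.
observed = [18, 22, 30, 20, 7, 3]
expected = [20, 20, 30, 20, 6, 4]
obs_pooled, exp_pooled = pool_small_cells(observed, expected)
statistic = chi_square(obs_pooled, exp_pooled)
print(obs_pooled, exp_pooled, round(statistic, 3))
```

The computed value would then be compared with the table value of χ² for the appropriate degrees of freedom, counted after pooling.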
Applications of Chi-Square Test
b)
Components of time series and methods of determining trend in
time series
Trend
A trend exists when there is a long-term increase or decrease in the data. It does
not have to be linear. A trend is sometimes said to change direction when
it goes from an increasing trend to a decreasing trend.
Seasonal
A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the
quarter of the year, the month, or day of the week). Seasonality is always of a fixed
and known period.
Cyclic
A cyclic pattern exists when data exhibit rises and falls that are not of fixed period.
The duration of these fluctuations is usually of at least 2 years.
Methods of determining trend
There are no proven "automatic" techniques to identify trend components in
time series data; however, as long as the trend is monotonic (consistently
increasing or decreasing), that part of the data analysis is typically not very difficult. If
the time series data contain considerable error, then the first step in the process of
trend identification is smoothing.
Smoothing. Smoothing always involves some form of local averaging of data such
that the nonsystematic components of individual observations cancel each other
out. The most common technique is moving average smoothing which replaces
each element of the series by either the simple or weighted average of n
surrounding elements, where n is the width of the smoothing "window". Medians can
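Simple moving average smoothing, as described above, can be sketched in Python. The series is hypothetical, and the function moving_average is an illustrative helper that uses an odd window so that each average is centred on an observation.

```python
def moving_average(series, window):
    """Centred simple moving average; `window` must be a positive odd integer.

    Returns one smoothed value per position that has a full window,
    so the result is shorter than the input by window - 1 elements.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical series: an upward trend with irregular noise (not from the source).
series = [10, 14, 11, 15, 13, 18, 16, 20, 19]
trend = moving_average(series, window=3)
print([round(value, 2) for value in trend])
```

A wider window removes more of the irregular variation but also loses more values at each end of the series.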
b)
Methods of constructing cost of living index with an example for
each
A-5)
The cost of living index, also known as the consumer price index or cost of living
price index, is the country's principal measure of price change. The consumer price
index helps us in determining the effect of rise and fall in prices on different classes
of consumers living in different areas.
The cost of living is the amount of money needed to sustain a certain level of living,
including basic expenses such as housing, food, taxes, and healthcare. Cost of living is
often used when comparing how expensive it is to live in one city versus another.
The steps involved in constructing a cost of living index are as follows:
1. Purpose of the Index Number:
Before constructing an index number, the purpose for which it is needed should be
decided. An index number constructed for one category or purpose cannot be
used for others. A cost of living index of working classes cannot be used for farmers
because the items entering into their consumption will be different.
2. Selection of Commodities:
Commodities to be selected depend upon the purpose or objective of the index
number to be constructed. But the number of commodities should neither be too
large nor too small.
Moreover, commodities to be selected must be broadly representative of the group
of commodities. They should also be comparable in the sense that standard or
graded items should be taken.
3. Selection of Prices:
The next step is to select the prices of these commodities. For this purpose, care
should be taken to select prices from representative persons, places, journals or
other sources, and these sources must be reliable. Prices may be quoted in money
terms, e.g. Rs. 100 per quintal, or in quantity terms, e.g. 2 kg per rupee. Care should
be taken not to mix these two kinds of quotation. Then the problem is to select
wholesale or retail prices. This depends on the type of index number. For a wholesale
price index, wholesale prices are required, while for a cost of living index, retail
prices are needed. But different prices should not be mixed up.
4. Selection of an Average:
Since index numbers are averages, the problem is how to select an appropriate
average. The two important averages are the arithmetic mean and geometric mean.
The arithmetic mean is the simpler of the two. But geometric mean is more
accurate. However, the average prices should be reduced to price relatives
(percentages) either on the basis of the fixed base method or the chain base
method.
5. Selection of Weights:
While constructing an index number due weightage or importance should be given
to the various commodities. Commodities which are more important in the
consumption of consumers should be given higher weightage than other
commodities. The weights are determined with reference to the relative amounts of
income spent on commodities by consumers. Weights may be given in terms of
value or quantity.
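Steps 3 to 5 can be illustrated with a small Python sketch of one common approach, a weighted arithmetic average of price relatives (often called the family budget method). The commodities, prices and weights below are hypothetical and are not taken from the question.

```python
def price_relative(current_price, base_price):
    """Price relative: current price as a percentage of the base-year price."""
    return current_price / base_price * 100.0


def cost_of_living_index(base_prices, current_prices, weights):
    """Weighted arithmetic mean of price relatives: sum(R * W) / sum(W)."""
    relatives = [
        price_relative(current, base)
        for current, base in zip(current_prices, base_prices)
    ]
    weighted_sum = sum(r * w for r, w in zip(relatives, weights))
    return weighted_sum / sum(weights)


# Hypothetical retail prices for three commodity groups (food, housing, fuel).
base_prices = [20.0, 50.0, 10.0]
current_prices = [25.0, 60.0, 11.0]
# Hypothetical weights: percentage of income spent on each group.
weights = [50, 30, 20]

index = cost_of_living_index(base_prices, current_prices, weights)
print(f"Cost of living index = {index:.1f}")
```

With these figures the price relatives are 125, 120 and 110, so the index works out to (125 x 50 + 120 x 30 + 110 x 20) / 100 = 120.5, meaning the cost of living has risen by about 20.5 percent over the base year.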
12
10
10
13
14
12
11
14
a)
b)
A-6)
ANOVA is a method of splitting the total variation of data into constituent parts
which measure the different sources of variation.
The total variation is split up into the following two components:
1) Variation within the subgroups of the samples
2) Variation between the subgroups of the samples
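The split of total variation into these two components can be sketched in Python. The three samples are hypothetical, and the function anova_partition is an illustrative helper that returns the total, between-group and within-group sums of squares.

```python
def anova_partition(groups):
    """Return (total, between, within) sums of squares for a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)

    # Total variation: squared deviations of every value from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Between: deviation of each group mean from the grand mean.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Within: deviations of values from their own group mean.
        ss_within += sum((x - group_mean) ** 2 for x in group)
    return ss_total, ss_between, ss_within


# Hypothetical samples (not from the source).
groups = [[4, 6, 5], [8, 9, 10], [12, 11, 13]]
ss_total, ss_between, ss_within = anova_partition(groups)
print(round(ss_total, 2), round(ss_between, 2), round(ss_within, 2))
```

For these samples the total sum of squares (80) equals the between-group part (74) plus the within-group part (6), which is the identity that ANOVA relies on.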
Assumptions for study of ANOVA
The underlying assumptions for the study of ANOVA are:
i)
ii)
iii)
iv)
v)