Q-1)
A statistical survey is a scientific process of collection and analysis of numerical data. Explain the stages of a statistical survey. Describe the various methods for collecting data in a statistical survey.
A-1)
a) Meaning of statistical survey

A statistical survey is a scientific process of collection and analysis of numerical data. Statistical surveys are used to collect information about units in a population and involve asking questions of individuals. Surveys of human populations are common in the government, health, social science and marketing sectors.
b) Stages of statistical survey (Listing and Explanation)

Statistical surveys are categorized into two stages: planning and execution.

1) Planning a Statistical Survey: The relevance and accuracy of data obtained in a survey depend upon the care exercised in planning. A properly planned investigation can lead to the best results with the least cost and time.
A. The nature of the problem to be investigated should be clearly defined in an unambiguous manner.
B. The objective of the investigation should be stated at the outset. Objectives could be to:
> Obtain certain estimates
> Establish a theory
> Verify an existing statement
> Find relationships between characteristics
C. The scope of the investigation has to be made clear. The scope refers to the area to be covered, identification of the units to be studied, the nature of the characteristics to be observed, the accuracy of measurements, the analytical methods, and the time, cost and other resources required.
D. Whether to use data collected from a primary or a secondary source should be determined in advance.
E. The organization of the investigation is the final step in the process. It encompasses determining the number of investigators required, their training, the supervision needed and the funds required.

c) Methods for collecting data

Collection of data is the first and most important stage in any statistical survey. The method of data collection depends upon various factors such as the objective, scope and nature of the investigation and the availability of resources. Direct personal interviews, third-party agencies and questionnaires are some of the ways in which data is collected.
Primary data:
Primary data is data collected by the investigator for the purpose of a specific inquiry or study. Such data is original in character and is generated by a survey conducted by individuals, a research institution or an organisation. For example, if a researcher is interested in knowing the impact of a noon-meal scheme for school children, he/she has to undertake a survey and collect data on the opinions of parents and children by asking relevant questions. Such data is called primary data. Primary data is collected by a suitable method from among the following:
1. Direct personal observation
2. Indirect oral interview
3. Information through agencies
4. Information through mailed questionnaires
5. Information through a schedule filled in by investigators
Secondary data:
Any information that is used for the current investigation but was obtained from data collected and used by some other agency or person in a separate investigation or survey is known as secondary data. It is available in published or unpublished form. In published form, secondary data is available in research papers, newspapers, magazines, government publications, international publications and websites.
a) Published sources
The various sources of published data are:
> Reports and official publications of international and national organisations as well as central and state governments
> Publications of local bodies such as municipal corporations and district boards
> Financial and economic journals
> Annual reports of various companies
> Publications brought out by research agencies and research scholars
Some journals (both academic and non-academic) are published at regular intervals (yearly, monthly, weekly), whereas other publications are more ad hoc. The Internet is a powerful source of secondary data, which can be accessed at any time for further analysis.

b) Unpublished sources
It is not necessary that all statistical content be published. Unpublished data, such as records maintained by various government and private offices and studies made by research institutions and scholars, can also be used where necessary.
Though the use of secondary data is economical in terms of expense, time and manpower requirements, the researcher must be careful in choosing such data. Secondary data must possess the following characteristic:
1) Reliability of data: The reliability of secondary data can be tested by investigating:
a) Who collected the data?
b) What were the sources of data?
c) Were they collected by a proper method?
d) At what time were they collected?
e) What level of accuracy was desired? Was it achieved?

Q-2)
a) Explain the approaches to define probability.

Probability is a numerical measure which indicates the chance of occurrence of an event A. It is denoted by P(A). It is the ratio of the number of favourable outcomes of event A (m) to the total number of outcomes of the experiment (n). In other words:

P(A) = m/n

where m is the number of favourable outcomes of event A and n is the total number of outcomes of the experiment.
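To make the ratio P(A) = m/n concrete, here is a minimal Python sketch using a hypothetical fair-die example (not from the text), counting favourable and total outcomes:

```python
# Classical (favourable/total) definition of probability,
# illustrated with a hypothetical fair six-sided die.
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # n = 6 total outcomes
event_A = {2, 4, 6}             # m = 3 favourable outcomes (an even roll)

p_A = Fraction(len(event_A), len(outcomes))
print(p_A)         # 1/2
print(float(p_A))  # 0.5
```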

b) In a bolt factory machines A, B, C manufacture 25, 35 and 40 percent of the total output. Of their total output 5, 4 and 2 percent are defective respectively. A bolt is drawn at random and is found to be defective. What are the probabilities that it was manufactured by machines A, B and C?

P(A1) = P(the bolt is manufactured by machine A) = 25% = 0.25
Similarly, P(A2) = 35% = 0.35 and P(A3) = 40% = 0.40.
Let B be the event that the drawn bolt is defective.
P(B|A1) = P(a bolt from machine A is defective) = 5% = 0.05
Similarly, P(B|A2) = 4% = 0.04 and P(B|A3) = 2% = 0.02.
We have to find P(A1|B), P(A2|B) and P(A3|B). By Bayes' theorem,

P(A2|B) = P(A2)P(B|A2) / [P(A1)P(B|A1) + P(A2)P(B|A2) + P(A3)P(B|A3)]
        = (0.35 × 0.04) / [(0.25)(0.05) + (0.35)(0.04) + (0.40)(0.02)]
        = 0.014 / 0.0345
        = 28/69
        = 0.4058

Similarly, P(A1|B) = 0.0125/0.0345 = 25/69 ≈ 0.3623 and P(A3|B) = 0.008/0.0345 = 16/69 ≈ 0.2319.
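The calculation above can be verified with a short Python sketch that applies Bayes' theorem to the given percentages:

```python
# Verify the Bayes' theorem calculation for the bolt-factory problem.
priors = {"A": 0.25, "B": 0.35, "C": 0.40}        # P(machine)
defect_rates = {"A": 0.05, "B": 0.04, "C": 0.02}  # P(defective | machine)

# Total probability that a randomly drawn bolt is defective.
p_defective = sum(priors[m] * defect_rates[m] for m in priors)  # 0.0345

# Posterior P(machine | defective) for each machine.
for m in priors:
    posterior = priors[m] * defect_rates[m] / p_defective
    print(m, round(posterior, 4))
# A 0.3623, B 0.4058, C 0.2319
```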

Q-3)
a) The procedure of testing a hypothesis requires a researcher to adopt several steps. Describe in brief all such steps.
b) A sample of 400 items is taken from a normal population whose mean as well as variance is 4. If the sample mean is 4.5, can the sample be regarded as a truly random sample?

a) Hypothesis testing procedure

There are five steps involved in testing a hypothesis, which are as follows (a worked sketch appears after the list):

a) Formulate a hypothesis: The first step is to set up two hypotheses instead of one, in such a way that if one hypothesis is true, the other is false. Alternatively, if one hypothesis is false or rejected, then the other is true or accepted.
b) Set up a suitable significance level: After formulating the hypothesis, the next step is to test its validity at a certain level of significance. The confidence with which a null hypothesis is rejected or accepted depends on the significance level used for the purpose.
c) Select the test criterion: The next step in hypothesis testing is the selection of an appropriate statistical technique as a test criterion. There are many techniques from which one is to be chosen. For example, when the hypothesis pertains to a large sample of more than 30, the Z test, implying a normal distribution, is used for the population mean. If the sample is small (n < 30), the t test is more appropriate. The test criteria that are frequently used in hypothesis testing are Z, t, F and χ².
d) Compute: After selecting the statistical technique to test the hypothesis, the next step involves the various computations necessary for the application of that particular test. These computations include the test statistic as well as its standard error.
e) Make a decision: The final step in hypothesis testing is to draw a statistical decision, involving the acceptance or rejection of the null hypothesis.
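The following minimal Python sketch walks the five steps through the numbers of part (b); the 5% two-tailed critical value is an assumption, since part (i) of the solution below instead falls back on the 3-sigma limit:

```python
# Step a) Formulate: H0: mu = 4 versus H1: mu != 4.
from math import sqrt

mu, sigma, n, x_bar = 4.0, 2.0, 400, 4.5

# Step b) Significance level: 5% two-tailed assumed here.
z_critical = 1.96

# Step c) Test criterion: n > 30, so the Z test applies.
# Step d) Compute the standard error and the test statistic.
standard_error = sigma / sqrt(n)       # 2 / 20 = 0.1
z = (x_bar - mu) / standard_error      # 0.5 / 0.1 = 5.0

# Step e) Decision: reject H0 when |z| exceeds the critical value.
print("z =", z, "reject H0" if abs(z) > z_critical else "accept H0")
```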

b) Calculation and solution to the problem

(i) Mean = 4
Variance = 4, so S.D. = √Variance = 2
n = 400

z = (x̄ − μ) / S.E. = (4.5 − 4) / (2/√400) = 0.5 / 0.1 = 5

Null hypothesis: There is no difference between the population mean and the sample mean.
Alternative hypothesis: There is a significant difference between the population mean and the sample mean.
Since the level of significance is not mentioned in the question, we can take the maximum value of z as 3. When z varies within 3 S.D.s, the area covered is 99.73%. Since the calculated value of z is more than the table value, we reject the null hypothesis. Hence, we can infer that the sample drawn is not a truly random sample.
(ii) In this part the level of significance is mentioned to be 1%; therefore, the table value is z = 2.58.
z = (4 − 4.45) / (2/30) = −0.45 × 15 = −6.75
Since the calculated value of z is numerically greater than the table value, we reject the null hypothesis and conclude that the sample cannot be regarded as a truly random sample.
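As a cross-check of part (i), the two-tailed p-value for z = 5 can be computed directly (this sketch assumes scipy is available):

```python
# Two-tailed p-value for the computed z statistic.
from scipy.stats import norm

z = 5.0
p_value = 2 * (1 - norm.cdf(abs(z)))
print(p_value)  # about 5.7e-07, far below any usual significance level
```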

Q-4)
a) What is a Chi-square test? Point out its applications. Under what conditions is this test applicable?
b) What are the components of time series? Enumerate the methods of determining trend in time series.

a) Meaning, applications and conditions

The Chi-square test is one of the most commonly used non-parametric tests in statistical work. The Greek letter χ² is used to denote this test. χ² describes the magnitude of the discrepancy between the observed and the expected frequencies. The value of χ² is calculated as:

χ² = Σ (O − E)² / E

where O is an observed frequency and E the corresponding expected frequency.

The following are the conditions for using the Chi-square test:
1. The frequencies used in the Chi-square test must be absolute and not in relative terms.
2. The total number of observations collected for this test must be large.
3. Each of the observations which make up the sample for this test must be independent of each other.
4. As the χ² test is based wholly on sample data, no assumption is made concerning the population distribution. In other words, it is a non-parametric test.
5. The χ² test is wholly dependent on degrees of freedom. As the degrees of freedom increase, the Chi-square distribution curve becomes more symmetrical.
6. The expected frequency of any item or cell must not be less than 5; if it is, the frequencies of adjacent items or cells should be pooled together in order to make it 5 or more.
7. The data should be expressed in original units for convenience of comparison, and the given distribution should not be replaced by relative frequencies or proportions.
8. This test is used only for drawing inferences through tests of hypotheses, so it cannot be used for estimating parameter values.
Applications of Chi-Square Test

Test of goodness of fit

The test of goodness of fit of a statistical model measures how accurately the model fits a set of observations. This test measures and summarises the differences, if any, between the observed and expected values of the considered statistical model. The test results help to establish whether the samples are drawn from identical distributions or not. The degrees of freedom are n − 1, and the expected value is equal to the average of the observed values.
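A short goodness-of-fit sketch follows; the die-roll counts are hypothetical (not from the text) and scipy is assumed to be available:

```python
# Chi-square goodness-of-fit test: are 120 die rolls consistent
# with a fair die (expected count 20 per face)?
from scipy.stats import chisquare

observed = [22, 17, 18, 25, 21, 17]  # hypothetical counts, summing to 120
expected = [20] * 6                  # H0: fair die

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_value)  # small chi-square / large p-value -> do not reject H0
```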

b) Components of time series and methods of determining trend in time series

Trend
A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes we refer to a trend as changing direction when it goes from an increasing trend to a decreasing trend.
Seasonal
A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the
quarter of the year, the month, or day of the week). Seasonality is always of a fixed
and known period.
Cyclic
A cyclic pattern exists when data exhibit rises and falls that are not of fixed period.
The duration of these fluctuations is usually of at least 2 years.
Methods of determining trend
There are no proven "automatic" techniques to identify trend components in the
time series data; however, as long as the trend is monotonous (consistently
increasing or decreasing) that part of data analysis is typically not very difficult. If
the time series data contain considerable error, then the first step in the process of
trend identification is smoothing.
Smoothing. Smoothing always involves some form of local averaging of data such that the nonsystematic components of individual observations cancel each other out. The most common technique is moving average smoothing, which replaces each element of the series by either the simple or weighted average of n surrounding elements, where n is the width of the smoothing "window". Medians can be used instead of means. The main advantage of median smoothing, as compared to moving average smoothing, is that its results are less biased by outliers (within the smoothing window). Thus, if there are outliers in the data (e.g., due to measurement errors), median smoothing typically produces smoother or at least more "reliable" curves than a moving average based on the same window width. The main disadvantage of median smoothing is that in the absence of clear outliers it may produce more "jagged" curves than a moving average, and it does not allow for weighting.
In the relatively less common cases (in time series data), when the measurement
error is very large, the distance weighted least squares smoothing or negative
exponentially weighted smoothing techniques can be used. All those methods will
filter out the noise and convert the data into a smooth curve that is relatively
unbiased by outliers (see the respective sections on each of those methods for more
details). Series with relatively few and systematically distributed points can be
smoothed with bicubic splines.
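As a small illustration of the outlier point above, here is a sketch (the short series with one spike is hypothetical; numpy is assumed) comparing the two smoothers over the same window:

```python
# Moving-average versus median smoothing on a series with one outlier.
import numpy as np

series = np.array([3.0, 3.2, 3.1, 9.0, 3.3, 3.2, 3.4])  # 9.0 is an outlier
window = 3

moving_avg = [series[i:i + window].mean() for i in range(len(series) - window + 1)]
moving_med = [np.median(series[i:i + window]) for i in range(len(series) - window + 1)]

print(np.round(moving_avg, 2))  # the outlier drags three averages upward
print(np.round(moving_med, 2))  # the medians stay near 3, as the text notes
```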
Fitting a function. Many monotonous time series data can be adequately
approximated by a linear function; if there is a clear monotonous nonlinear
component, the data first need to be transformed to remove the nonlinearity.
Usually a logarithmic, exponential, or (less often) polynomial function can be used.
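A minimal sketch of this transform-then-fit idea, assuming a hypothetical exponential-growth series (numpy assumed):

```python
# Remove an exponential nonlinearity with a log transform,
# then fit a linear trend to the transformed series.
import numpy as np

t = np.arange(20)
series = 5.0 * np.exp(0.1 * t)   # hypothetical monotonous series

log_series = np.log(series)      # now linear in t
slope, intercept = np.polyfit(t, log_series, 1)
print(round(slope, 3), round(intercept, 3))  # ~0.1 and ~log(5)
```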
Analysis of Seasonality
Seasonal dependency (seasonality) is another general component of the time series
pattern. The concept was illustrated in the example of the airline passengers data
above. It is formally defined as correlational dependency of order k between each
i'th element of the series and the (i-k)'th element and measured by autocorrelation
(i.e., a correlation between the two terms); k is usually called the lag. If the
measurement error is not too large, seasonality can be visually identified in the
series as a pattern that repeats every k elements.
Autocorrelation correlogram. Seasonal patterns of time series can be examined via correlograms. The correlogram (autocorrelogram) displays graphically and numerically the autocorrelation function (ACF), that is, serial correlation coefficients (and their standard errors) for consecutive lags in a specified range of lags (e.g., 1 through 30). Ranges of two standard errors for each lag are usually marked in correlograms, but typically the size of the autocorrelation is of more interest than its reliability (see Elementary Concepts) because we are usually interested only in very strong (and thus highly significant) autocorrelations.
Examining correlograms. While examining correlograms, one should keep in mind that autocorrelations for consecutive lags are formally dependent. Consider the following example: if the first element is closely related to the second, and the second to the third, then the first element must also be somewhat related to the third one, and so on. This implies that the pattern of serial dependencies can change considerably after removing the first-order autocorrelation (i.e., after differencing the series with a lag of 1).
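To make the lag-k definition concrete, this sketch computes autocorrelations of a hypothetical series with a period of 4 directly from the definition (numpy assumed):

```python
# Autocorrelation at lags 1..4 for a series with (assumed) quarterly pattern.
import numpy as np

series = np.array([10, 14, 18, 12, 11, 15, 19, 13, 10, 14, 18, 12], float)

def autocorr(x, k):
    """Serial correlation between x[i] and x[i-k]."""
    x = x - x.mean()
    return np.sum(x[k:] * x[:-k]) / np.sum(x * x)

for k in range(1, 5):
    print(k, round(autocorr(series, k), 2))
# the peak at lag 4 reflects the assumed seasonality of period 4
```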
Q-5)
What do you mean by cost of living index? Discuss the methods of construction of cost of living index with an example for each.

A-5)
a) Meaning of cost of living index

The cost of living index, also known as the consumer price index or cost of living price index, is the country's principal measure of price change. The consumer price index helps us in determining the effect of a rise or fall in prices on different classes of consumers living in different areas.
The cost of living is the amount of money needed to sustain a certain level of living, including basic expenses such as housing, food, taxes and healthcare. Cost of living is often used when comparing how expensive it is to live in one city versus another.

b) Methods of constructing cost of living index
The steps involved in constructing a cost of living index are as follows:
1. Purpose of the Index Number:
Before constructing an index number, the purpose for which it is needed should be decided. An index number constructed for one category or purpose cannot be used for others. A cost of living index for the working classes cannot be used for farmers, because the items entering into their consumption will be different.
2. Selection of Commodities:
Commodities to be selected depend upon the purpose or objective of the index
number to be constructed. But the number of commodities should neither be too
large nor too small.
Moreover, commodities to be selected must be broadly representative of the group
of commodities. They should also be comparable in the sense that standard or
graded items should be taken.
3. Selection of Prices:
The next step is to select the prices of these commodities. For this purpose, care should be taken to select prices from representative persons, places, journals or other sources, and they must be reliable. Prices may be quoted in money terms, i.e. Rs. 100 per quintal, or in quantity terms, i.e. 2 kg per rupee. Care should be taken not to mix these two kinds of quotation. Then there is the problem of choosing between wholesale and retail prices, which depends on the type of index number. For a wholesale price index, wholesale prices are required, while for a cost of living (consumer price) index, retail prices are needed. Different kinds of prices should not be mixed up.
4. Selection of an Average:
Since index numbers are averages, the problem is how to select an appropriate average. The two important averages are the arithmetic mean and the geometric mean. The arithmetic mean is the simpler of the two, but the geometric mean is more accurate. However, the average prices should be reduced to price relatives (percentages) either on the basis of the fixed base method or the chain base method.

5. Selection of Weights:
While constructing an index number due weightage or importance should be given
to the various commodities. Commodities which are more important in the
consumption of consumers should be given higher weightage than other
commodities. The weights are determined with reference to the relative amounts of
income spent on commodities by consumers. Weights may be given in terms of
value or quantity.

6. Selection of the Base Period:
The selection of the base period is the most important step in the construction of an index number. It is the period against which comparisons are made. The base period should be normal and free from any unusual events such as war, famine, earthquake, drought or boom. It should be neither very recent nor too remote.
7. Selection of Formula:
A number of formulas have been devised to construct an index number. But the
selection of an appropriate formula depends upon the availability of data and
purpose of the index number. No single formula may be used for all types of index
numbers.
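As an illustration of steps 5 and 7, the following sketch computes a weighted average of price relatives, one common formula for a cost of living index; all commodity figures are hypothetical, not from the text:

```python
# Cost of living index as a weighted average of price relatives:
# index = sum(P * W) / sum(W), where P = (current/base) * 100.
base_price    = {"food": 100, "rent": 50, "fuel": 20}  # base-year prices
current_price = {"food": 120, "rent": 60, "fuel": 22}  # current-year prices
weight        = {"food": 60,  "rent": 25, "fuel": 15}  # expenditure weights

index = sum((current_price[c] / base_price[c]) * 100 * weight[c]
            for c in weight) / sum(weight.values())
print(round(index, 2))  # 118.5 -> prices ~18.5% above the base year
```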
Q-6)
a) What is analysis of variance? What are the assumptions of this technique?
b) Three samples below have been obtained from normal populations with equal variances. Test the hypothesis at 5% level that the population means are equal.

A    12    10    10    13    14    12    11    14

[The table value of F at 5% level of significance for v1 = 2 and v2 = 12 is 3.88]
a) Meaning and Assumptions
b) Formulas/Calculation/Solution to the problem

A-6)
ANOVA is a method of splitting the total variation of data into constituent parts
which measure the different sources of variations.
The total variation is split up into the following two components:
1) Variation within the subgroups of samples
2) Variation between the subgroups of the samples

Assumptions for the study of ANOVA
The underlying assumptions for the study of ANOVA are:
i) Each of the samples is a simple random sample
ii) The populations from which the samples are selected are normally distributed
iii) Each of the samples is independent of the other samples
iv) Each of the populations has the same variance and identical means
v) The effects of the various components are additive
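Since part (b)'s data table did not survive intact above (only one row of values remains), here is a sketch of the required test with hypothetical stand-in samples, assuming scipy's one-way ANOVA is available:

```python
# One-way ANOVA on three samples (hypothetical stand-ins for A, B, C).
from scipy.stats import f_oneway

sample_a = [12, 10, 10, 13, 14]
sample_b = [11, 14, 12, 10, 13]
sample_c = [13, 12, 11, 14, 10]

f_stat, p_value = f_oneway(sample_a, sample_b, sample_c)
# With k = 3 samples and N = 15 observations, v1 = k - 1 = 2 and
# v2 = N - k = 12; reject H0 (equal means) if F > 3.88 at the 5% level.
print(round(f_stat, 3), round(p_value, 3))
```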
