
CORRELATION ANALYSIS

Concept and Importance of Correlation


We may come across series in which there is more than one variable. A distribution in which two variables are measured on each unit is called a bivariate distribution. If we measure more than two variables on each unit, the distribution is called a multivariate distribution. In a bivariate distribution we may be interested in finding out whether there is any relationship between the two variables under study. Correlation is a statistical tool which studies the relationship between two variables, and correlation analysis involves the various methods and techniques used for studying and measuring the extent of that relationship. Correlation analysis is thus used to ascertain the association between two variables.

"When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation." - Croxton & Cowden

"Correlation is an analysis of the covariation between two or more variables." - A. M. Tuttle

"Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread and suggest to him the paths through which stabilizing forces may become effective." - W. A. Neiswanger

"The effect of correlation is to reduce the range of uncertainty of our prediction." - Tippett

The problem of analyzing the association between two variables can be broken down into three steps.
o We try to find out whether the two variables are related or independent of each other.
o If we find that there is a relationship between the two variables, we try to determine its nature and strength, that is, whether the variables have a positive or a negative relationship and how close that relationship is.
o We may like to know whether there is a causal relationship between them, that is, whether the variation in one variable causes variation in the other.

When data regarding two or more variables are available, we may study the related variation of these variables. For example, in data on the heights (x) and weights (y) of students of a college, we find that students with greater height tend to have greater weight, and students with lesser height tend to have lesser weight. This type of related variation among variables is called correlation. Correlation may be (i) simple correlation, (ii) multiple correlation, or (iii) partial correlation.

Simple correlation is concerned with related variation between two variables. Multiple correlation and partial correlation are concerned with related variation among three or more variables. Two variables are said to be correlated when they vary such that
a. the higher values of one variable correspond to the higher values of the other and the lower values of one correspond to the lower values of the other, or
b. the higher values of one variable correspond to the lower values of the other.
Generally, it can be seen that those who are tall have greater weight and those who are short have lesser weight. Thus the height (x) and weight (y) of persons show related variation, and so they are correlated. On the other hand, the production (x) and price (y) of vegetables show variation in opposite directions: the higher the production, the lower the price. In both examples the variables x and y show related variation, and so they are correlated.

TYPES OF CORRELATION
Correlation is positive (direct) if the variables vary in the same direction, that is, if they increase and decrease together. Height (x) and weight (y) of persons are positively correlated. Correlation is negative (inverse) if the variables vary in opposite directions, that is, if one variable increases while the other decreases. Production (x) and price (y) of vegetables are negatively correlated. If the variables do not show related variation, they are said to be non-correlated. If the variables show an exact linear relationship, they are said to be perfectly correlated. Perfect correlation may be positive or negative.

Correlation and Causation


A high degree of correlation between two variables does not by itself establish a cause-and-effect relationship, for the following reasons:
o The correlation may be due to chance, particularly when the data pertain to a small sample.
o It is possible that both the variables are influenced by one or more other variables.
o Both the variables may be influencing each other, so that we cannot say which is the cause and which is the effect.

Types of Correlation
o Positive and Negative: If the values of the two variables deviate in the same direction, i.e., if an increase in the values of one variable results, on an average, in a corresponding increase in the values of the other variable, or if a decrease in the values of one variable results, on an average, in a corresponding decrease in the values of the other variable, the correlation is said to be positive or direct. For example: price and supply of a commodity. On the other hand, correlation is said to be negative or inverse if the variables deviate in opposite directions, i.e., if an increase (decrease) in the values of one variable results, on the average, in a corresponding decrease (increase) in the values of the other variable. For example: temperature and sale of woollen garments.
o Linear and Non-Linear: The correlation between two variables is said to be linear if, corresponding to a unit change in one variable, there is a constant change in the other variable over the entire range of values, for example y = ax + b. The relationship is said to be non-linear or curvilinear if, corresponding to a unit change in one variable, the other variable changes at a fluctuating rather than a constant rate. When plotted on a graph, such a relationship will not be a straight line.
o Simple, Partial and Multiple: The distinction among these three types of correlation depends upon the number of variables involved in a study. If only two variables are involved, the correlation is said to be simple. When three or more variables are involved, it is a problem of either partial or multiple correlation. In multiple correlation, three or more variables are studied simultaneously, whereas in partial correlation we consider only two variables influencing each other while the effect of the other variables is held constant. For example, suppose we have three variables: number of hours studied (x), IQ (y) and marks obtained (z). In multiple correlation we would study the correlation of z with the two variables x and y taken together. In contrast, when we study the relationship between x and z keeping IQ constant at its average level, the study involves partial correlation.

Methods of Correlation
The methods of studying correlation may be classified as follows:
o Graphic: Scatter Diagram
o Algebraic: Covariance (Karl Pearson's) Method, Rank Correlation Method, Concurrent Deviation Method

Process of Calculating Coefficient of Correlation

o Calculate the means of the two series, X and Y.
o Take deviations in the two series from their respective means, denoted x and y. The deviation is taken in each case as the value of the individual item minus (-) the arithmetic mean.
o Square the deviations in both series and obtain the totals of the squared-deviation columns. This gives Σx² and Σy².

o Take the products of the deviations, that is, xy. Each deviation is multiplied by the corresponding deviation in the other series, and their sum Σxy is obtained.
o The values obtained in the preceding steps, Σxy, Σx² and Σy², are then used in the formula for correlation.

KARL PEARSON'S COEFFICIENT OF CORRELATION (COVARIANCE METHOD; PRODUCT MOMENT)
This is a measure of the linear relationship between two variables. It indicates the degree of correlation between the two variables and is denoted by r.

INTERPRETATION OF COEFFICIENT OF CORRELATION
a. A positive value of r indicates positive correlation.
b. A negative value of r indicates negative correlation.
c. r = +1 means the correlation is perfect positive.
d. r = -1 means the correlation is perfect negative.
e. r = 0 (or low) means the variables are non-correlated.
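As an illustration of the steps listed above, the following minimal sketch computes r from two small series; the data values and variable names are illustrative assumptions, not taken from the text.

# A minimal sketch of Karl Pearson's r following the steps above (illustrative data).
x = [65, 66, 67, 67, 68, 69, 70, 72]
y = [67, 68, 65, 68, 72, 72, 69, 71]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

dx = [xi - mean_x for xi in x]               # deviations from the mean of X
dy = [yi - mean_y for yi in y]               # deviations from the mean of Y

sum_x2 = sum(d * d for d in dx)              # Σx²
sum_y2 = sum(d * d for d in dy)              # Σy²
sum_xy = sum(a * b for a, b in zip(dx, dy))  # Σxy

r = sum_xy / (sum_x2 * sum_y2) ** 0.5
print(round(r, 4))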

Karl Pearson's measure, known as the Pearsonian correlation coefficient between two variables (series) X and Y and usually denoted by r, is a numerical measure of the linear relationship between them. It is defined as the ratio of the covariance between X and Y, written Cov(x, y), to the product of the standard deviations of X and Y.
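In symbols, if x̄ and ȳ are the means and σx and σy the standard deviations of the two series, the definition above can be written as

r = Cov(x, y) / (σx σy) = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² Σ(y - ȳ)²]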

Assumptions of Karl Pearson's Correlation


o The two variables X and Y are linearly related.
o The two variables are affected by several causes, which are independent, so as to form a normal distribution.

Coefficient of Determination
The strength of r is judged by the coefficient of determination, r². For r = 0.9, r² = 0.81. Multiplying by 100 gives 81 per cent. This suggests that when r is 0.9, 81 per cent of the total variation in the Y series can be attributed to the relationship with X.

Rank Correlation
Limitations of Spearman's Method of Correlation
o Spearman's r is a distribution-free or non-parametric measure of correlation.
o As such, the result may not be as dependable as in the case of ordinary correlation, where the distribution is known.
o Another limitation of rank correlation is that it cannot be applied to a grouped frequency distribution.
o When the number of observations is quite large and one has to assign ranks to the observations in the two series, the exercise becomes rather tedious and time-consuming. This is a major limitation of rank correlation.

Some Limitations of Correlation Analysis


o Correlation analysis cannot determine a cause-and-effect relationship.
o Another frequent mistake arises from misinterpretation of the coefficient of correlation and the coefficient of determination.
o Another mistake in the interpretation of the coefficient of correlation occurs when one concludes a positive or negative relationship even though the two variables are actually unrelated.

Properties of Correlation Coefficient


Property 1 - Limits for the Correlation Coefficient: The Pearsonian correlation coefficient cannot exceed 1 numerically; in other words, it lies between -1 and +1. Symbolically: -1 ≤ r ≤ 1. r = +1 implies perfect positive correlation and r = -1 implies perfect negative correlation between the variables.
Property 2 - The correlation coefficient is independent of the change of origin and scale. Mathematically, if X and Y are given and they are transformed to the new variables U and V by a change of origin and scale, viz. u = (x - A)/h and v = (y - B)/k,

where A and B are arbitrary constants and h > 0, k > 0, then the correlation coefficient between x and y is the same as the correlation coefficient between u and v, i.e., r(x, y) = r(u, v), or rxy = ruv.
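A quick numerical check of this property, using illustrative data and arbitrarily chosen values of A, B, h and k (all assumptions for the sketch):

# Sketch: r is unchanged by a change of origin and scale (illustrative data).
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x = [10, 20, 30, 40, 50]
y = [12, 24, 31, 44, 52]

A, B, h, k = 25, 30, 5, 2                # arbitrary origin shifts and positive scale factors
u = [(xi - A) / h for xi in x]
v = [(yi - B) / k for yi in y]

print(pearson_r(x, y), pearson_r(u, v))  # the two values are equal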

Property 3 - Two independent variables are uncorrelated, but the converse is not true. Remark: one should not confuse uncorrelatedness with independence. rxy = 0, i.e., uncorrelatedness between the variables x and y, simply implies the absence of any linear (straight-line) relationship between them. They may, however, be related in some other form, e.g., a quadratic, logarithmic or trigonometric form.
Property 4 - If the variables x and y are connected by a linear equation ax + by + c = 0, the correlation coefficient between them is +1 if the signs of a and b are different and -1 if the signs of a and b are alike.
Interpretation of r - The following general points may be borne in mind

while interpreting an observed value of the correlation coefficient r: If r = +1, there is perfect positive correlation between the variables and the points of the scatter diagram lie on a straight line. If r = -1, there is perfect negative correlation between the variables; in this case too the scatter diagram will be a straight line. If r = 0, the variables are uncorrelated; in other words, there is no linear (straight-line) relationship between them. However, r = 0 does not imply that the variables are independent. For other values of r lying between +1 and -1 there are no set guidelines for interpretation. The most we can conclude is that the nearer the value of r is to ±1, the closer is the relationship between the variables, and the nearer the value of r is to 0, the less close is the relationship between them. One should be very careful in interpreting the value of r, as it is often misinterpreted. The reliability or significance of the value of the correlation coefficient depends on a number of factors. One way of testing the significance of r is to find its probable error, which in addition to the value of r takes into account the size of the sample. Another, more useful, measure for interpreting the value of r is the coefficient of determination. It is observed there that the closeness of the relationship between two variables is not proportional to r.

In total, the properties are:

o Limits for the correlation coefficient.
o Independence of the change of origin and scale.
o Two independent variables are uncorrelated, but the converse is not true.
o If the variables x and y are connected by a linear equation ax + by + c = 0, the correlation coefficient between them is +1 if the signs of a and b are different and -1 if the signs of a and b are alike.

PROBABLE ERROR
After computing the value of the correlation coefficient, the next step is to find the extent to which it is dependable. The probable error of the correlation coefficient, usually denoted by P.E.(r), is an old measure of testing the reliability of an observed value of the correlation coefficient, in so far as it depends upon the conditions of random sampling. If r is the observed correlation coefficient in a sample of n pairs of observations, then its standard error, usually denoted by S.E.(r), is given by

S.E.(r) = (1 - r²) / √n

P.E.(r) = 0.6745 × S.E.(r) = 0.6745 (1 - r²) / √n

The factor 0.6745 is chosen because, in a normal distribution, 50% of the observations lie within the range μ ± 0.6745σ, where σ is the standard deviation.
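A minimal sketch of these calculations, assuming an illustrative observed r and sample size; the significance guideline applied at the end is the conventional rule discussed below.

# Sketch: probable error of r and the conventional significance guideline (illustrative values).
r_obs = 0.8
n = 64

se = (1 - r_obs ** 2) / n ** 0.5       # standard error of r
pe = 0.6745 * se                       # probable error of r

lower, upper = r_obs - pe, r_obs + pe  # likely limits for the population correlation coefficient
print(round(pe, 4), round(lower, 4), round(upper, 4))

if abs(r_obs) < pe:
    print("r is not at all significant")
elif abs(r_obs) > 6 * pe:
    print("r is definitely significant")
else:
    print("nothing can be concluded with certainty")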

According to Secrist, "The probable error of the correlation coefficient is an amount which, if added to and subtracted from the mean correlation coefficient, produces amounts within which the chances are even that a coefficient of correlation from a series selected at random will fall."
Uses of probable error
1. Limits for the population correlation coefficient are r ± P.E.(r). This implies that if we take another random sample of the same size n from the same population from which the first sample was taken, then the observed value of the correlation coefficient, say r1, in the second sample can be expected to lie within these limits.
2. P.E.(r) may be used to test whether an observed value of the sample correlation coefficient is significant of any correlation in the population. The following guidelines may be used:
a. If r < P.E.(r), i.e., if the observed value of r is less than its probable error, the correlation is not at all significant.
b. If r > 6 P.E.(r), i.e., if the observed value of r is greater than six times its probable error, then r is definitely significant.
c. In other situations nothing can be concluded with certainty.
Important Remark 1: Sometimes P.E. may lead to fallacious conclusions, particularly when n, the number of pairs of observations, is small. In order to use P.E. effectively, n should be fairly large. A more rigorous test for the significance of an observed sample correlation coefficient is provided by Student's t test.
Important Remark 2: P.E. can be used only under the following conditions:
a. The data must have been drawn from a normal population.
b. The conditions of random sampling should prevail in selecting the sample observations.
In summary: if r < P.E.(r), r is not at all significant; if r > 6 P.E.(r), r is significant; in other cases nothing can be concluded with certainty.

RANK CORRELATION METHOD
Sometimes we come across statistical series in which the variables under consideration are not capable of quantitative measurement but can be arranged in a serial order. This happens when we are dealing with qualitative characteristics (attributes) such as honesty, beauty, character, morality, etc., which cannot be measured quantitatively but can be arranged serially. In such situations Karl Pearson's coefficient of correlation cannot be used as such. Charles Edward

Spearman, a British psychologist, developed a formula in 1904 which consists in obtaining the correlation coefficient between the ranks of n individuals in the two attributes under study. The Pearson correlation coefficient between the ranks X and Y is called the rank correlation coefficient between the characteristics A and B for that group of individuals. For example, students may be assigned ranks in Statistics according to their marks in Statistics and ranks in Mathematics according to their marks in Mathematics; the correlation between these two sets of ranks is called rank correlation. In bivariate data, if the values of the variables are ranked in decreasing (or increasing) order, the correlation between these ranks is the rank correlation, and the coefficient of correlation computed for the ranks is Spearman's coefficient of rank correlation. It is denoted by ρ (rho). If R1 and R2 are the ranks in the two characteristics, and d = R1 - R2 is the difference between the ranks, the coefficient of rank correlation is

ρ = 1 - 6Σd² / (n³ - n)
Since ρ is the product moment coefficient of correlation between the ranks, it is a value between -1 and +1. Karl Pearson's coefficient of correlation can be calculated only if the characteristics under study are quantitative (numerically measurable), but Spearman's coefficient of rank correlation can be calculated even if the characteristics under study are qualitative. If it is possible to assign ranks to the units with regard to the two characteristics, the coefficient of rank correlation can be calculated.

REPEATED RANKS
In the case of attributes, if there is a tie, i.e., if any two or more individuals are placed together in any classification with respect to an attribute, or if in the case of variable data there is more than one item with the same value in either or both the series, then Spearman's formula for calculating the rank correlation coefficient breaks down, since in this case the variables X (the ranks of individuals in characteristic A, the first series) and Y (the ranks of individuals in characteristic B, the second series) do not take the values from 1 to n, and consequently X̄ ≠ Ȳ, whereas in deriving the formula we had assumed that X̄ = Ȳ. In the computation of the coefficient of rank correlation, two or more values may turn out to be equal while ranking, and so a situation of ties may arise. In such a case, all the equal values are assigned the same average rank, and then the coefficient of rank correlation is found. Here,

corresponding to every such repeated rank (which repeats m times), a factor (m³ - m)/12 is added to Σd². In this case, common ranks are assigned to the repeated items. These common ranks are the arithmetic mean of the ranks which these items would have received had they been different from each other, and the next item gets the rank next to the ranks already used in computing the common rank. For example, suppose an item is repeated at rank 4. Then the common rank to be assigned to each such item is (4 + 5)/2, i.e., 4.5, which is the average of 4 and 5, the ranks these observations would have assumed had they been different. The next item is assigned the rank 6. If an item is repeated thrice at rank 7, the common rank to be assigned to each value is (7 + 8 + 9)/3, i.e., 8, which is the arithmetic mean of 7, 8 and 9, the ranks these observations would have received had they been different from each other. The next rank to be assigned is 10. If only a small proportion of the ranks are tied, this technique may be applied together with the formula. If a large proportion of the ranks are tied, it is advisable to apply an adjustment or correction factor: in the formula, add the factor m(m² - 1)/12 to Σd², where m is the number of times an item is repeated. This correction factor is to be added for each repeated value in both the series.

REMARKS ON SPEARMAN'S RANK CORRELATION COEFFICIENT
1. Since Spearman's rank correlation coefficient is nothing but Pearson's correlation coefficient between the ranks, it can be interpreted in the same way as Karl Pearson's correlation coefficient.
2. Karl Pearson's correlation coefficient assumes that the parent population from which the sample observations are drawn is normal. If this assumption is violated, we need a measure which is distribution-free (non-parametric), that is, one which does not make any assumptions about the form of the population. Spearman's ρ is such a measure, since no strict assumptions are made about the form of the population from which the sample observations are drawn.
3. Spearman's formula is easier to understand and apply than Karl Pearson's formula. The values obtained by the two formulae, viz. the Pearsonian r and Spearman's ρ, are generally different. The difference arises from the fact that when ranking is used instead of the full set of observations, there is always some loss of information. Unless many ties exist, the coefficient of rank correlation should be slightly lower than the Pearsonian coefficient.
4. Spearman's formula is the only formula to be used for finding the correlation coefficient if we are dealing with qualitative characteristics which cannot be measured quantitatively but can be arranged serially. It can also be used where actual data are given. In the case of extreme observations, Spearman's formula is preferred to Pearson's formula.
5. Spearman's formula has its limitations. It is not practicable in the case of a bivariate frequency distribution. For n > 30, this formula should not be used unless the ranks are given, since otherwise the calculations are quite time-consuming.

When ranks are not repeated:

ρ = 1 - 6ΣD² / [n(n² - 1)]

When ranks are repeated:

ρ = 1 - 6[ΣD² + Σ m(m² - 1)/12] / [n(n² - 1)]
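The following minimal sketch applies the repeated-ranks formula above; the two lists of marks and the helper function names are illustrative assumptions, not from the text.

# Sketch: Spearman's rank correlation with the tie (repeated-rank) correction.
from collections import Counter

def average_ranks(values):
    # assign average ranks to tied values (largest value gets rank 1)
    order = sorted(values, reverse=True)
    positions = {}
    for i, v in enumerate(order, start=1):
        positions.setdefault(v, []).append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    # add m(m² - 1)/12 for every group of m tied values
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_rho(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (d2 + cf) / (n * (n * n - 1))

marks_stats = [85, 60, 73, 40, 90, 60, 55]
marks_maths = [78, 65, 70, 54, 88, 60, 59]
print(round(spearman_rho(marks_stats, marks_maths), 4))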

REGRESSION
Literally, the word regression means "return to the origin". In statistics, the word is used in a different sense. If two variables are correlated, the unknown value of one of the variables can be estimated by using the known value of the other variable. The estimated value may not be equal to the actually observed value, but it will be close to it. Regression analysis, in a general sense, means the estimation or prediction of the unknown value of one variable from the known value of the other variable. Regression analysis confined to the study of only two variables at a time is termed simple regression. Quite often, however, the values of a particular phenomenon may be affected by a multiplicity of causes; regression analysis for studying more than two variables at a time is known as multiple regression. In regression analysis there are two types of variables. The variable whose value is influenced, or is to be predicted, is called the dependent variable. The variable which influences the values, or is used for prediction, is called the independent variable. In regression analysis the independent variable is also known as the regressor, predictor or explanatory variable, while the dependent variable is also known as the regressed or explained variable.

LINEAR & NON-LINEAR REGRESSION

If the given bivariate data are plotted on a graph, the points so obtained will more or less concentrate around a curve, called the curve of regression. The mathematical equation of the regression curve is called the regression equation. If the regression curve is a straight line, we say that there is linear regression between the variables under study. If the curve of regression is not a straight line, the regression is termed curved or non-linear regression. The property of the tendency of the actual value to lie close to the estimated value is called regression. In a wider usage, regression is the theory of estimation of the unknown value of a variable with the help of known values of the other variables. The regression theory was first introduced and developed by Sir Francis Galton in the field of genetics. Here, first a mathematical relation between the two variables is framed. This relation, called the regression equation, is obtained by the method of least squares and may be linear or non-linear. For bivariate data on x and y, the regression equation obtained with the assumption that x depends on y is called the regression of x on y:

(x - x̄) = bxy (y - ȳ)

The regression equation obtained with the assumption that y depends on x is called the regression of y on x:

(y - ȳ) = byx (x - x̄)

The following formulas define the regression coefficients:

bxy = r (σx / σy) = Cov(x, y) / σy² = Σ(dx·dy) / Σdy² = [nΣxy - (Σx)(Σy)] / [nΣy² - (Σy)²]

byx = r (σy / σx) = Cov(x, y) / σx² = Σ(dx·dy) / Σdx² = [nΣxy - (Σx)(Σy)] / [nΣx² - (Σx)²]

where dx = x - x̄ and dy = y - ȳ.

The regression of x on y is used for the estimation of x values, and the regression of y on x is used for the estimation of y values. The graphs of the regression equations are the regression lines.
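A minimal sketch of these calculations on illustrative data (the values and the chosen prediction point are assumptions, not from the text):

# Sketch: the two regression coefficients and a prediction from the y-on-x line.
x = [2, 4, 6, 8, 10]
y = [5, 9, 12, 18, 21]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

byx = sxy / sxx        # regression coefficient of y on x
bxy = sxy / syy        # regression coefficient of x on y

# regression of y on x: (y - my) = byx (x - mx); estimate y when x = 7
y_est = my + byx * (7 - mx)

r = (byx * bxy) ** 0.5  # |r| is the geometric mean of the two regression coefficients
print(round(byx, 3), round(bxy, 3), round(y_est, 2), round(r, 3))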

PROPERTIES OF REGRESSION
Regression coefficients are the coefficients of the independent variables in the regression equations.

1. The regression coefficient bxy is the change occurring in x for a unit change in y. The regression coefficient byx is the change occurring in y for a unit change in x.
2. The regression coefficients are independent of the origin of measurement of the variables, but they are dependent on the scale.
3. The geometric mean of the regression coefficients is equal to the coefficient of correlation (numerically).
4. The regression coefficients cannot be of opposite signs. If r is positive, both regression coefficients will be positive; if r is negative, both will be negative; if r is zero, both will be zero.
5. Since the coefficient of correlation cannot numerically be greater than 1, the product of the regression coefficients cannot be greater than 1.

PROPERTIES OF REGRESSION LINES


There are two regression lines.
1. The regression lines intersect at (x̄, ȳ).
2. The regression lines have positive slope if the variables are positively correlated, and negative slope if the variables are negatively correlated.
3. If there is perfect correlation, the regression lines coincide (there is only one regression line).

LINES OF REGRESSION
A line of regression is the line which gives the best estimate of one variable for any given value of the other variable. In the case of two variables, say x and y, we have two regression equations: x on y and y on x. The line of regression of y on x gives the best estimate of the value of y for any specified value of x, and the line of regression of x on y gives the best estimate of the value of x for any specified value of y.

REMEMBER
a. When r = 0, i.e., when x and y are uncorrelated, the lines of regression of y on x and of x on y are y - ȳ = 0 and x - x̄ = 0 respectively. The lines are perpendicular to each other.
b. When r = ±1, the two lines coincide.
c. If the value of r is significant, we can use the lines of regression for estimation and prediction.
d. If r is not significant, the linear model is not a good fit and the line of regression should not be used for prediction.

COEFFICIENTS OF REGRESSION
a. bxy is the coefficient of regression of x on y.
b. byx is the coefficient of regression of y on x.
THEOREMS ON REGRESSION COEFFICIENTS
a. The correlation coefficient is the geometric mean of the regression coefficients, i.e., r² = bxy · byx.
b. The sign to be taken before the square root is the same as that of the regression coefficients.
c. If one of the regression coefficients is greater than one, the other must be less than one.
d. The arithmetic mean of the absolute values of the regression coefficients is greater than or equal to the absolute value of the correlation coefficient.
e. Regression coefficients are independent of change of origin but not of scale.
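As an illustrative check of these theorems (the figures are assumed, not from the text): if byx = 1.6 and bxy = 0.4, both positive, then r = +√(1.6 × 0.4) = +0.8. One regression coefficient exceeds one while the other is less than one, and their arithmetic mean, (1.6 + 0.4)/2 = 1.0, is greater than |r| = 0.8.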


Time Series
Generally, the planning of economic and business activities is based on predictions of production, demand, sales, etc. The future can be predicted by a detailed study of past variations; thus, future demand can be predicted by studying the variations in demand over the last few years. A time series may be defined as a collection of readings, belonging to different time periods, of some economic variable or composite of variables. A series of observations of a phenomenon recorded at successive points of time is called a time series. It is a chronological arrangement of statistical data regarding the phenomenon. Common time series are those of production, demand, sales, prices, imports, exports, bank rate, value of money, etc. Usually, equidistant points of time are considered; the recordings may be weekly, monthly, yearly, etc. A graphical presentation of a time series is called a historigram.

COMPONENTS OF A TIME SERIES
In a time series, the observations vary with time. The variation occurring in any period is the result of many factors, and the effects of these factors may be summed up in four components:
a. Trend (secular trend, long-term movement)
b. Seasonal variation
c. Cyclical variation (business cycle)
d. Irregular variation (random fluctuation, erratic variation)

An analytical study of the different components of a time series, the effects of these components, etc., is called analysis of the time series. The utility of such analysis is:
a. Understanding the past behaviour of the variable
b. Knowing the existing nature of the variation
c. Predicting the future trend
d. Comparison with other similar variables
Trend (Secular Trend)
Trend is the overall change taking place in the time series over a long period of time, that is, the change taking place over a period of many years. Most time series show a general tendency to increase, decrease or remain constant over a long period of time; such an overall change is the trend. Examples:
a. The steady increase in the population of India over the past many years is an upward trend.

b. The steady increase in the price of gold over the last many years is an upward trend.
c. Due to the availability of greater medical facilities, the death rate is decreasing; thus, the death rate shows a downward trend.
d. Atmospheric temperature at a place, though it shows short-term variation, does not show a significant upward or downward trend.
The root causes of trend are technological advancement, growth of population, changes in tastes, etc. Trend is measured mainly by the method of moving averages and by the method of least squares.
Seasonal Variation
The regular and periodic variation in a time series is called seasonal variation. Generally, the period of seasonal variation is within one year. The factors causing seasonal variation are (1) weather conditions and (2) customs, traditions and habits of people. Seasonal variation is predictable. Examples:
a. An increase in the sales of woollen clothes during winter.
b. An increase in the sales of notebooks during the months of June, July and August.
c. An increase in atmospheric temperature during summer.
Cyclical Variation (Business Cycle)
Cyclical variation is an oscillatory variation which occurs in four stages, viz. prosperity, recession, depression and recovery. Generally, such variation occurs in economic and business activities, with recurrences separated by more than one year. One cycle consisting of the four stages occurs over a period of a few years; the period is not definite but is generally 5 to 10 years. Many economists have explained the causes of cyclical variation, and each explanation is significant.
Irregular Variation (Random Fluctuation)
Apart from the regular variations, most time series show variations which are totally unexpected. Irregular variations occur as a result of unexpected happenings such as wars, famines, strikes, floods, etc. They are unpredictable. Generally, the effect of such variation lasts for a short period. Examples:
a. An increase in the price of vegetables due to a strike by railway employees.
b. A decrease in the number of passengers in city buses as a result of a strike by public sector employees.
c. An increase in the number of deaths due to earthquakes.


Measurement of Trend
o Graphic (free-hand curve fitting) method
o Method of semi-averages
o Method of curve fitting by the principle of least squares
o Method of moving averages

METHOD OF MOVING AVERAGES
This is a simple and flexible method of measuring trend. A moving average is an averaging process that smooths out the fluctuations and the ups and downs in the given data. The moving average of period m is a series of successive averages of m overlapping values at a time, starting with the 1st, 2nd, 3rd value and so on (a short computational sketch is given at the end of this section).

SEASONAL VARIATIONS
Seasonal variations are variations due to forces which operate in a regular periodic manner with a period of less than one year. The objectives of studying them are as follows:
o To isolate seasonal variations: to determine the effect of seasonal swings on the values of the given phenomenon.
o To eliminate them: to determine the value of the phenomenon as it would be if there were no seasonal ups and downs.
Methods:
o Method of simple averages
o Ratio-to-trend method
o Ratio-to-moving-averages method
o Link relative method
SIMPLE AVERAGES
This is the simplest method of measuring seasonal variations in a time series and involves the following steps:
o Arrange the data by years and months
o Compute the average for each month
o Compute the overall average
o Obtain seasonal indices for the different months
CYCLICAL VARIATIONS
An approximate or crude method of measuring cyclical variations consists of estimating the trend and seasonal components and then eliminating their effect from the given time series.
RANDOM VARIATIONS
These cannot be estimated accurately; at best an estimate of the variance of the random component can be obtained.
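As referenced above, a minimal sketch of the moving-average calculation on illustrative yearly figures (the data and the period are assumptions):

# Sketch: moving averages of period m as a simple trend estimate (illustrative data).
def moving_averages(values, m):
    # successive averages of m overlapping values at a time
    return [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]

production = [21, 22, 23, 25, 24, 22, 25, 26, 27, 26]
print(moving_averages(production, 3))
# For an even period, the resulting averages are usually centred afterwards.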

INTERPOLATION AND EXTRAPOLATION
According to W. M. Harper, "Interpolation consists of reading a value which lies between two extreme points. Extrapolation means reading a value that lies outside the two extreme points." According to Hirach, "Interpolation is the estimation of a most likely estimate in given conditions." The technique of estimating a past figure is termed interpolation, while that of estimating a probable figure for the future is called extrapolation.
Meaning
INTERPOLATION refers to the insertion of an intermediate value in a series of items. EXTRAPOLATION refers to the projection of a value for the future.
Assumptions
a) There is no violent or disturbing situation in the intermediate period, i.e., there must be continuity and no sudden jump from one period to another.
b) There is uniformity in the changes of the known figures.
c) There is a definite and stable relationship between the two variables.
Methods of Interpolation
The methods can be divided under two heads:
a) Graphic method
b) Algebraic method: under this head there are several formulae, of which the following are some of the important ones:
i) Binomial expansion method
ii) Newton's method
iii) Lagrange's method
iv) Parabolic curve method
Binomial expansion method: this method is applicable when the independent variable X advances with equal intervals. The formula is based on the expansion (y - 1)^m = 0.
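The list above names Newton's method among the algebraic approaches. The following minimal sketch uses Newton's forward-difference formula on illustrative, equally spaced data to interpolate a value; all figures and names are assumptions, not from the text.

# Sketch: Newton's forward-difference interpolation for equally spaced X values (illustrative data).
def newton_forward(xs, ys, x_target):
    n = len(ys)
    h = xs[1] - xs[0]                  # equal interval assumed
    # build the forward-difference table; diffs[k][0] is the k-th difference of the first value
    diffs = [ys[:]]
    for k in range(1, n):
        prev = diffs[-1]
        diffs.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    u = (x_target - xs[0]) / h
    term, result = 1.0, ys[0]
    for k in range(1, n):
        term *= (u - (k - 1)) / k
        result += term * diffs[k][0]
    return result

xs = [1991, 1992, 1993, 1994, 1995]
ys = [12, 15, 20, 27, 39]
print(newton_forward(xs, ys, 1993.5))  # interpolated value between two known points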

