CONTENTS
Characteristics of Multivariate Analysis
Dependency Techniques -- Multiple Regression Analysis -- Discriminant Analysis
Interdependency Techniques -- Factor Analysis -- Cluster Analysis
Example

The sales level of a company's product is influenced not only by demand but also by its competitors' strategies. While analyzing sales, the manager of the company has to take this variable into consideration as well.

Multivariate techniques are broadly classified into two categories: -- Dependency Techniques -- Interdependency Techniques
Dependency Techniques
Dependency techniques aim at explaining or predicting one or more dependent variables based on two or more independent variables. Here, the focus is on defining the relationship between one dependent variable and many independent variables that affect it.
Key Purposes
Multiple regression analysis is used for two key purposes: -- To identify relationships between variables -- To predict outcomes
It helps the researcher evaluate the association between a single dependent variable and two or more independent variables.
Test of Significance
Null hypothesis H0: R² = 0
Alternative hypothesis H1: R² ≠ 0
The test statistic can be determined using the formula

F = (SSR / k) / (SSE / (n − k − 1))
Where,
SSR is the sum of squares due to regression
SSE is the residual (error) sum of squares
n represents the sample size
k represents the number of independent variables in the problem
We reject the null hypothesis if the calculated F-value exceeds the table F-value. We accept the null hypothesis if the calculated F-value is less than the table F-value.
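The F-test above can be sketched numerically. This is a minimal illustration on hypothetical toy data (the variable names and values are invented for the example), fitting a regression with two independent variables and computing SSR, SSE, and F exactly as in the formula:

```python
import numpy as np

# Hypothetical toy data: one dependent variable (y) and k = 2
# independent variables, as in multiple regression analysis.
rng = np.random.default_rng(0)
n, k = 30, 2
X = rng.random((n, k))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.1, n)

# Fit y = b0 + b1*x1 + b2*x2 by ordinary least squares.
Xd = np.column_stack([np.ones(n), X])   # add intercept column
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ beta

SSE = np.sum((y - y_hat) ** 2)          # residual sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)   # sum of squares due to regression
F = (SSR / k) / (SSE / (n - k - 1))     # the test statistic above
print(F)
```

The calculated F is then compared with the table F-value for (k, n − k − 1) degrees of freedom; if it is larger, the null hypothesis R² = 0 is rejected.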
Example.
A financial institution may want to classify various investment options into high-return, medium-return, and low-return investments. Similarly, a market research agency might want to assess the quality of various car models and classify them into high-quality, medium-quality, and low-quality categories.
Where is it used?
By using the discriminant equation, we can classify objects into particular predefined groups and predict the success or failure of the objects. Based on the classification of objects, we can answer questions such as: which investment option will provide higher returns, or who are the potential customers? Discriminant analysis also helps in determining the factors that aid in discriminating between the objects.
Example
This can be used in marketing, where we can apply it to understand how customer preferences for different brands differ.
The number of discriminant equations required to carry out discriminant analysis depends on the number of categories into which the objects are to be classified: for n categories, we need to develop n − 1 discriminant equations.
For example
If the problem consists of categorizing the objects into two groups (such as eligible and ineligible candidates, or buyers and non-buyers), then we need to develop a single discriminant equation. For a problem consisting of three categories (high-return, medium-return, and low-return stocks), we need to develop two discriminant equations.
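The two-group case can be sketched with Fisher's linear discriminant. This is a hedged illustration on invented data: "buyers" and "non-buyers" are hypothetical groups measured on two variables, and a single discriminant equation is derived to classify objects back into the two predefined groups:

```python
import numpy as np

# Hypothetical two-group example (buyers vs. non-buyers) measured
# on two variables; the group means and spreads are invented.
rng = np.random.default_rng(1)
buyers     = rng.normal([3.0, 3.0], 0.5, size=(40, 2))
non_buyers = rng.normal([0.0, 0.0], 0.5, size=(40, 2))

m1, m0 = buyers.mean(axis=0), non_buyers.mean(axis=0)

# Pooled within-group covariance matrix.
S_w = (np.cov(buyers, rowvar=False) + np.cov(non_buyers, rowvar=False)) / 2

# Fisher's discriminant weights and the cutting score (midpoint rule).
w = np.linalg.solve(S_w, m1 - m0)
cutoff = w @ (m1 + m0) / 2

def classify(x):
    """Assign an object to a predefined group via its discriminant score."""
    return "buyer" if w @ x > cutoff else "non-buyer"

# Classify all objects back into the two groups.
correct = sum(classify(x) == "buyer" for x in buyers) + \
          sum(classify(x) == "non-buyer" for x in non_buyers)
print(correct, "of 80 classified correctly")
```

With three categories, two such discriminant functions would be derived instead of one, exactly as the n − 1 rule states.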
INTERDEPENDENCY TECHNIQUES
Interdependency techniques are used in situations where no distinction is made between independent and dependent variables; instead, the interdependent relationships among the variables are examined. Prominent interdependency techniques are factor analysis, cluster analysis, metric multidimensional scaling, and non-metric multidimensional scaling.
FACTOR ANALYSIS
Factor analysis can be defined as a set of methods in which the observable or manifest responses of individuals on a set of variables are represented as functions of a small number of latent variables called factors. It is used when the research problem involves a large number of variables, making analysis and interpretation difficult. Factor analysis helps the researcher reduce the large number of variables into a few dimensions, called factors, that summarize the available data, thereby making the analysis easier.
Factor loadings:
Factor loadings are correlation coefficients between the variables and the factors, and are therefore also called factor-variable correlations. They measure how closely the variables are associated with each factor, and they help in interpreting and labeling the factors.
Eigenvalues:
Eigenvalues measure the variance across all the variables that is attributable to a factor. An eigenvalue is calculated by adding the squares of the factor loadings of all the variables on that factor. Eigenvalues help explain the importance of a factor with respect to the variables.
Communalities: Communalities, denoted by h², measure the percentage of variance in each variable that is explained by the extracted factors. A variable's communality is calculated by adding its squared factor loadings across the factors, and it ranges from 0 to 1. A high communality indicates that most of the variance in the variable is explained by the factors extracted in the factor analysis.

Total variance explained: This is the percentage of the total variance of the variables that is explained. It is calculated by adding the communality values of all the variables and dividing by the number of variables.
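These three quantities all come from the same loading matrix, so they can be sketched in a few lines. The loading matrix below is hypothetical (four invented variables on two factors); the point is only to show how eigenvalues sum down a column, communalities sum across a row, and total variance explained averages the communalities:

```python
import numpy as np

# Hypothetical factor-loading matrix: 4 variables (rows) x 2 factors (columns).
loadings = np.array([
    [0.80, 0.10],
    [0.75, 0.20],
    [0.15, 0.85],
    [0.05, 0.90],
])

# Eigenvalue of a factor: sum of squared loadings down its column.
eigenvalues = (loadings ** 2).sum(axis=0)

# Communality h² of a variable: sum of squared loadings across its row.
communalities = (loadings ** 2).sum(axis=1)

# Total variance explained: sum of communalities over the number of variables.
total_variance_explained = communalities.sum() / loadings.shape[0]

print(eigenvalues.round(3))           # importance of each factor
print(communalities.round(3))         # variance explained per variable
print(round(total_variance_explained, 3))
```

Note that the eigenvalues and the communalities are two ways of totalling the same squared loadings, so their sums agree.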
Variable            Factor 1   Factor 2   Factor 3   Communality (h²)
Age                   .68        .12       -.03           .80
Gender                .75        .06       -.08           .82
Marital status        .83        .05       -.07           .80
Income levels         .21        .78       -.11           .53
Education             .17        .83       -.11           .69
Employment           -.04        .81       -.06           .57
Credit history        .32        .12       -.75           .62
Family background     .14        .32       -.71           .70
% of variance         43%        20%        10%
Cumulative %          43%        63%        73%
The results of the factor analysis can be interpreted in the following ways:
Factor loadings:
In the above table we observe that the variables age, gender, and marital status have high factor loadings on the first factor compared to the other two factors. Thus we can infer that these three variables are highly correlated and represent an underlying common factor. The analysis of factor loadings can therefore help in interpreting and labeling the factors.
Factor scores:
Once the variables are grouped into factors, the factors themselves become new variables, which are used for subsequent analysis. The value of each observation on these new variables is called its factor score.
3) Selection of clustering approach: There are two types of clustering approaches.
-- Hierarchical clustering: consists of either a top-down or a bottom-up approach. Prominent hierarchical clustering methods are single linkage, complete linkage, average linkage, Ward's method, and the centroid method.
-- Non-hierarchical clustering: the three prominent non-hierarchical methods are the sequential threshold method, the parallel threshold method, and the optimizing partitioning method.
4) Deciding on the number of clusters: One way is to decide intuitively. Another is to use the pattern of clusters that a method generates, with the distance between objects as the criterion: the researcher sets a distance value and stops the clustering process at the point where that value is exceeded. 5) Interpreting the clusters: This can be done using the centroids, which help the researcher explain each cluster and give it an appropriate label.
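The partitioning idea behind the non-hierarchical methods above can be sketched with k-means, a widely used partitioning method (k-means itself is not named in the list above; it is used here only as a simple, concrete illustration on invented two-cluster data):

```python
import numpy as np

# Hypothetical 2-D data with two well-separated groups of 25 objects each.
rng = np.random.default_rng(2)
data = np.vstack([rng.normal(0.0, 0.3, (25, 2)),
                  rng.normal(5.0, 0.3, (25, 2))])

def kmeans(X, k, iters=20):
    """Minimal k-means: alternate assignment and centroid update."""
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each object to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its cluster
        # (keep the old centroid if a cluster happens to be empty).
        centroids = np.array([X[labels == j].mean(axis=0)
                              if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return labels, centroids

labels, centroids = kmeans(data, k=2)
```

The resulting centroids are exactly what step 5 uses for interpretation: each centroid summarizes its cluster and suggests a label for it.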
Multidimensional Scaling
It is defined as a technique that represents objects, preferences, and perceptions as points in a multidimensional space. This statistical technique is used to reveal the underlying dimensions on the basis of which consumers perceive two objects as similar. It is commonly used in motivational research.
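The metric variant mentioned earlier can be sketched with classical multidimensional scaling, which recovers point coordinates from a matrix of pairwise distances. The points below are hypothetical; the sketch only demonstrates the double-centering and eigendecomposition steps:

```python
import numpy as np

# Hypothetical objects with known 2-D positions, used to build a
# pairwise distance matrix D (in practice only D would be observed).
points = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [3.0, 4.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

def classical_mds(D, k=2):
    """Classical (metric) MDS: embed n objects in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = -0.5 * J @ (D ** 2) @ J            # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)         # eigendecomposition (ascending order)
    idx = np.argsort(vals)[::-1][:k]       # keep the k largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

coords = classical_mds(D, k=2)
# The recovered configuration reproduces the original pairwise
# distances, up to rotation/reflection of the space.
D2 = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
```

In a consumer study, D would instead hold perceived dissimilarities between brands, and the axes of the recovered space would be interpreted as the underlying perceptual dimensions.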