a) Logistic regression is used to predict relationship between categorical dependent variable
and independent variables can be nominal, ordinal, interval or ratio scale. b) The type of data where Logistic Regression can’t be used is when the dependent variable is not discrete. Logistic regression can’t be used when there is high level of multicollinearity in the independent variables c) Some issues with Logistic Regression are: Absence of a goodness of fit R2 like statistic for logistic regression, require sufficient responses in each category otherwise the errors will be more, using method of maximum likelihood gives worse results. d) One of the many applications of Logistic Regression is to predict employee attrition in an organization. As the dependent variable is categorical i.e. employee left the job or not left the job, and relationship between dependent and independent variables need not be linear, Logistic Regression in the best applicable model in this situation. The different independent variable data that need to be collected can be Gender, marital status, age, education, years of experience, salary, designation, previous years performance ratings etc. This past information and the fact the employee have left the job or not can be used as input data in logistic regression and a logistic equation can be obtained which will predict the future attrition probability of the employee. The employees with high probability can be taken into consideration by the management and they can be interviewed with to prevent their attrition in the future.
Classification Trees
a) Classification trees is used effectively when the dependent variable is categorical.
b) Classification trees can’t be used for continuous variables effectively. They need to be transformed into discrete variables before using the data for classification trees. c) Issues with the use of classification trees are using time sensitive data which requires a lot of data preparation, they throw away some of the data from the data set, overfitting of data is also an issue with the use of classification trees. d) An application of classification trees is to understand customer churn. The independent variables collected in the data set can be demographics, location, call minutes during different period of day, charges for different services, total call duration etc. These attributes can be used to predict if a customer is having a probability to churn in the future and steps can be taken by companies to prevent customer churn by providing different offers to the customer.
Artificial Neural Network
a) Artificial neural network can be used effectively to any type of data set both continuous and categorical. b) Artificial Neural network can be used for all type of data sets c) Issues with Artificial Neural Network are they can be used only for a data set with small number of input variables, also the weights it assigns to the input variables is optimum there is no guarantee for that and there is no mathematical modelling available to explain what the model is ding in order to generate the outcome. There is no set of mathematical rules that ANN provided that it used to generate the results. d) An application of Artificial Neural Network is in detection of fraud in credit cards. The independent variables in the data set used for modelling can be demographics, amount of credit given in past, history of past payment, amount of bill payed, amount of bill pending etc. since the data is time dependent use of ANN for such interpretation will provide the highest accuracy in terms of predicting the probability of fraud by an individual in the future based on the past attributes.