You are on page 1of 8

What is Classification?

Classification is a form of data analysis that extracts


models describing data classes

What is Classifier?
A classifier, or classification model, predicts categorical
labels (classes)

How does classification work?


Data classification is a two-step process, consisting of a
learning step (where a classification model is
constructed) and a classification step (where the model
is used to predict class labels for given data)

Differentiate Supervised Learning and Unsupervised


Leaning.
If the class label of each training tuple is provided, this step is also
known as
supervised learning (i.e., the learning of the classifier is
supervised in that it is told to which class each training tuple
belongs). It contrasts with unsupervised learning (or clustering), in
which the class label of each training tuple is not known

Define Decision Tree Induction


Decision tree induction is the learning of decision trees from
class-labeled training tuples. A decision tree is a flowchart-like
tree structure, where each internal node (nonleaf node)
denotes a test on an attribute, each branch represents an
outcome of the test, and each leaf node (or terminal node)
holds a class label. The topmost node in a tree is the root
node.

Name few Decision Tree Algorithms


ID3, C4.5, and CART(Classification and Regression Trees)

Give an example for splitting criterion in decision


tree

What are the possible scenarios available


for splitting attribute
discrete-valued method
continuous-valued
discrete-valued and a binary tree

Name popular attribute selection measures


information gain, gain ratio, and Gini index.

Define Tree Pruning


Tree pruning methods address the problem of overfitting the data.
Tree pruning algorithms attempt to improve accuracy by removing
tree branches reflecting noise in the data.

How does tree pruning


work?

There are two common approaches to tree pruning:


.
prepruning
and postpruning..

Define - Naive Bayesian classification


Naive Bayesian classification is based on Bayes theorem of posterior
probability. It assumes class-conditional independencethat the effect
of an attribute value on a given class is independent of the values of
the other attributes.

State - Bayes Theorm


Bayes theorem is useful in that it provides a way of calculating
the posterior probability,

What is rule based Classifier


? A rule-based classifier uses a set of IF-THEN rules for classification. Rules
.can be extracted from a decision tree. Rules may also be generated
directly from training data using sequential covering algorithms.

What is Confusion Matrix ?


.

True Positives : These refer to the positive tuples that were correctly
labeled by the classifier
True negatives :These are the negative tuples that were correctly
labeled by the classifier..
False positives: These are the negative tuples that were incorrectly
labeled as positive (e.g., tuples of class buys computer = no for which
the classifier predicted buys computer = yes).
False negatives : These are the positive tuples that were mislabeled
as
. negative (e.g., tuples of class buys computer = yes for which the
classifier predicted buys computer = no).

What if we would like to predict a


continuous
value, rather than a categorical
.
label?
Numeric prediction is the task of predicting continuous (or ordered)
values for given input.

What do you mean by Regression


Analysis?
Regression analysis can be used to model the relationship between
one or more independent or predictor variables and a dependent or
response variable (which is continuous-valued)

Define - Straight-line regression analysis


Straight-line regression analysis involves a response variable, y, and
a single predictor variable, x. It is the simplest form of regression, and
models y as a linear function of x.
That is,
y = b+wx;

What is Multiple linear regression?


.
Multiple
linear regression is an extension of straight-line regression so
as to involve more than one predictor variable.
An example of a multiple linear regression model based on two
predictor attributes or variables, A1 and A2, is

y = w0+w1x1+w2x2

How can we model data that does not


show a linear dependence?
Using Non linear regression model we can solve the problem. we
can convert the nonlinear model into a linear one that can then be
solved by the method of least squares.

Define - Straight-line regression analysis


Straight-line regression analysis involves a response variable, y, and
a single predictor variable, x. It is the simplest form of regression, and
models y as a linear function of x.
That is,
.
y = b+wx;

Give
an
example
for
transforming
polynomial
regression model to a linear
.
regression model

Equation can then be converted to linear formby applying the above


assignments, resulting in the equation y = w0 +w1x1 +w2x2 +w3x3.

Name few Regression-Based Methods


Linear, nonlinear, Log linear model, Multiple and generalized
linear models of regression can be used for prediction..

You might also like