
Group Members

Zarnab Asif 04
Athar Gul 54
Umar Ali 36
Fatima Saqib 39
Zeeshan 27
Introduction
 Naive Bayes is a supervised machine learning
algorithm used mainly for classification.

What is it?
• A statistical method for classification.
• A supervised learning method.
• Assumes an underlying probabilistic model based on
Bayes' theorem.
Introduction

• Can solve problems involving both categorical
and continuous-valued attributes.
• Named after Thomas Bayes, who proposed
Bayes' Theorem.
Why is it called Naive?

 It is called naive because it assumes that
all attributes are independent of each
other, an assumption that does not hold in
many real-world situations. Despite this,
the classifier works extremely well in many
real-world settings and, in certain cases
(though not all), its performance is
comparable to neural networks and SVMs.
Naive Bayes Classifiers

Naive Bayes classifiers and their
implementation:

 We know that Naive Bayes classifiers are a
collection of classification algorithms based
on Bayes' Theorem.
 It is not a single algorithm but a family of
algorithms that all share a common
principle: every pair of features being
classified is independent of each other.
THE BAYES THEOREM
 The Bayes Theorem:
P(H|X) = P(X|H) P(H) / P(X)

 P(H|X): Probability that the customer will buy a computer
given that we know his age, credit rating and income
(posterior probability of H).
 P(H): Probability that the customer will buy a computer
regardless of age, credit rating and income (prior
probability of H).
 P(X|H): Probability that the customer is 35 years old, has a
fair credit rating and earns $40,000, given that he has
bought our computer (posterior probability of X conditioned
on H).
 P(X): Probability that a person from our set of customers is
35 years old, has a fair credit rating and earns $40,000
(prior probability of X).
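
A minimal Python sketch of this computation for the computer-purchase
example; the customer counts below are illustrative assumptions, not
figures from the slides:

from decimal import getcontext  # not needed; plain floats suffice here

# Bayes' theorem with illustrative (assumed) counts, where
# X = "35 yrs old, fair credit rating, earns $40,000".
n_customers = 1000        # total customers (assumed)
n_buyers = 400            # customers who bought a computer (assumed)
n_x = 100                 # customers matching profile X (assumed)
n_x_and_buyer = 60        # buyers matching profile X (assumed)

p_h = n_buyers / n_customers            # P(H), prior of buying
p_x = n_x / n_customers                 # P(X), prior of profile X
p_x_given_h = n_x_and_buyer / n_buyers  # P(X|H)

# Posterior: P(H|X) = P(X|H) * P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x
print(f"P(H|X) = {p_h_given_x:.2f}")    # 0.60 with these counts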
What is the Naive Bayes
algorithm?
 It is a classification technique based on Bayes' Theorem with
an assumption of independence among predictors.
 In simple terms, a Naive Bayes classifier assumes that the
presence of a particular feature in a class is unrelated to the
presence of any other feature.
 For example, a fruit may be considered to be an apple if it is
red, round, and about 3 inches in diameter. Even if these
features depend on each other or on the existence of the
other features, all of these properties independently
contribute to the probability that this fruit is an apple, and
that is why it is known as 'Naive'.
 A Naive Bayes model is easy to build and particularly useful
for very large data sets. Along with its simplicity, Naive Bayes
can outperform even highly sophisticated classification
methods.
How does the Naive Bayes
algorithm work?
 Let's understand it using an example. Below is a
training data set of weather conditions and the
corresponding target variable 'Play' (indicating
whether players will play). We need to classify
whether players will play or not based on the
weather condition. Let's follow the steps below.

 Step 1: Convert the data set into a frequency
table.

 Step 2: Create a likelihood table by finding the
probabilities, e.g. the Overcast probability is 0.29
and the probability of playing is 0.64.
How does the Naive Bayes
algorithm work?

Frequency table (from the training data):
Weather  | No | Yes | Total
Overcast |  0 |  4  |   4
Rainy    |  3 |  2  |   5
Sunny    |  2 |  3  |   5
Total    |  5 |  9  |  14

Likelihood table:
P(Overcast) = 4/14 = 0.29, P(Rainy) = 5/14 = 0.36,
P(Sunny) = 5/14 = 0.36, P(Yes) = 9/14 = 0.64, P(No) = 5/14 = 0.36
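
As a sketch of Steps 1 and 2, the snippet below builds the frequency
and likelihood tables with plain Python; the 14-record list is
reconstructed to match the counts in the table above:

from collections import Counter

# 14 weather records and the 'Play' target, matching the counts above.
weather = ["Overcast"] * 4 + ["Rainy"] * 5 + ["Sunny"] * 5
play = (["Yes"] * 4                    # all 4 Overcast days are Yes
        + ["Yes"] * 2 + ["No"] * 3     # Rainy: 2 Yes, 3 No
        + ["Yes"] * 3 + ["No"] * 2)    # Sunny: 3 Yes, 2 No

n = len(weather)
freq = Counter(zip(weather, play))     # Step 1: frequency table
weather_totals = Counter(weather)
play_totals = Counter(play)

# Step 2: likelihood table entries
print(f"P(Overcast)  = {weather_totals['Overcast'] / n:.2f}")   # 0.29
print(f"P(Yes)       = {play_totals['Yes'] / n:.2f}")           # 0.64
print(f"P(Sunny|Yes) = {freq[('Sunny', 'Yes')] / play_totals['Yes']:.2f}")  # 0.33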
How does the Naive Bayes
algorithm work?

 Step 3: Now, use the Naive Bayes equation to
calculate the posterior probability for each
class. The class with the highest posterior
probability is the outcome of the prediction.
 Problem: Players will play if the weather is sunny.
Is this statement correct?
 We can solve it using the method of posterior
probability discussed above.
 P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
How does the Naive Bayes
algorithm work?

 Here we have P(Sunny|Yes) = 3/9 = 0.33,
P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 =
0.64.
 Now, P(Yes|Sunny) = 0.33 * 0.64 / 0.36 =
0.60, which is higher than the corresponding
P(No|Sunny) = 0.40, so the prediction is Yes.
 Naive Bayes uses a similar method to
predict the probability of different classes
based on various attributes. This algorithm
is mostly used in text classification and in
problems having multiple classes.
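
A small Python sketch of Step 3, using the counts from the likelihood
table above:

# Posterior for each class given Weather = Sunny.
p_sunny_yes = 3 / 9            # P(Sunny|Yes)
p_sunny_no = 2 / 5             # P(Sunny|No)
p_yes, p_no = 9 / 14, 5 / 14   # class priors
p_sunny = 5 / 14               # P(Sunny)

posterior_yes = p_sunny_yes * p_yes / p_sunny   # 0.60
posterior_no = p_sunny_no * p_no / p_sunny      # 0.40
prediction = "Yes" if posterior_yes > posterior_no else "No"
print(prediction)  # Yes: players are predicted to play on a sunny day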
TEXT CLASSIFICATION
ALGORITHM: NAÏVE BAYES

 The class-conditional word probability is
estimated with Laplace smoothing:
P(t|c) = (Tct + 1) / (Tct' + B')
 Tct – number of occurrences of a particular
word in a particular class
 Tct' – total number of words in a particular
class
 B' – number of distinct words across all
classes
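
A minimal sketch of this estimate in Python; the two-class toy corpus
below is an illustrative assumption, not data from the slides:

from collections import Counter

# Toy corpus (assumed for illustration): class -> documents.
docs = {
    "spam": ["win money now", "win win prize"],
    "ham": ["meeting at noon", "lunch at noon now"],
}

vocab = {w for texts in docs.values() for t in texts for w in t.split()}
B = len(vocab)  # B': number of distinct words across all classes

def p_word_given_class(word, c):
    words = [w for t in docs[c] for w in t.split()]
    tct = Counter(words)[word]          # Tct: count of word in class c
    return (tct + 1) / (len(words) + B) # Laplace-smoothed P(t|c)

print(f"P('win'|spam) = {p_word_given_class('win', 'spam'):.3f}")  # 4/14
print(f"P('win'|ham)  = {p_word_given_class('win', 'ham'):.3f}")   # 1/15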
TEXT CLASSIFICATION ALGORITHM: NAÏVE
BAYES

Example

 For understanding a theoretical concept, the best
procedure is to try it on an example. Since I am a
pet lover, I selected animals as our predicted class.
 Let's consider a training dataset with 1500 records
and 3 classes. We presume that there are no
missing values in our data.
We have 3 classes associated with animal types:
 Parrot,
 Dog,
 Fish.
Conti..

The predictor feature set consists of 4
features:

 Swim
 Wings
 Green Color
 Dangerous Teeth
 All the features are categorical variables with
either of the 2 values: T (True) or F (False).
Conti..

The table below shows a frequency table of our
data (500 records per class):

Animal | Swim       | Wings      | Green Color | Dangerous Teeth
Parrot | 50 (10%)   | 500 (100%) | 400 (80%)   | 0 (0%)
Dog    | 450 (90%)  | 0 (0%)     | 0 (0%)      | 500 (100%)
Fish   | 500 (100%) | 0 (0%)     | 100 (20%)   | 50 (10%)

In our training data:
Parrots have a value of 50 (10%) for Swim, i.e., 10% of
parrots can swim according to our data; 500 out of
500 (100%) parrots have wings; 400 out of 500 (80%)
parrots are green; and 0 (0%) parrots have
Dangerous Teeth.
 The Dog class shows that 450 out of
500 (90%) can swim, 0 (0%) dogs have wings, 0 (0%)
dogs are green, and 500 out of 500 (100%)
dogs have Dangerous Teeth.
Conti..

 The Fish class shows that 500 out of 500 (100%)
can swim, 0 (0%) fish have wings, 100 (20%) fish
are green, and 50 out of 500 (10%) fish have
Dangerous Teeth.

 Now, it's time to predict classes using the Naive
Bayes model. We have taken 2 records that have
values for their feature set, but whose target
variable needs to be predicted.
Conti..

 We have to predict the animal type using the feature
values, i.e., whether the animal is a Dog, a Parrot or
a Fish.
 We will use the Naive Bayes approach:
P(H | Multiple Evidences) = P(E1|H) * P(E2|H)
* … * P(En|H) * P(H) / P(Multiple Evidences)
 Let's consider the first record.
The evidence here is Swim & Green. The hypothesis
can be the animal type: Dog, Parrot or Fish.
Conti..

 Hypothesis testing for the animal to be a Dog:
 P(Dog | Swim, Green) = P(Swim|Dog) *
P(Green|Dog) * P(Dog) / P(Swim, Green)
= 0.9 * 0 * 0.333 / P(Swim, Green)
= 0
 Hypothesis testing for the animal to be a Parrot:
 P(Parrot | Swim, Green) = P(Swim|Parrot) *
P(Green|Parrot) * P(Parrot) / P(Swim, Green)
= 0.1 * 0.80 * 0.333 / P(Swim, Green)
= 0.0266 / P(Swim, Green)
Conti..

 Hypothesis testing for the animal to be a Fish:
 P(Fish | Swim, Green) = P(Swim|Fish) * P(Green|Fish)
* P(Fish) / P(Swim, Green)
= 1 * 0.2 * 0.333 / P(Swim, Green)
= 0.0666 / P(Swim, Green)
 The denominator of all the above calculations is the
same, i.e., P(Swim, Green). The value of P(Fish | Swim,
Green) is greater than P(Parrot | Swim, Green).
 Using Naive Bayes, we can predict that the class of
this record is Fish.
 Let's consider the second record.
The evidence here is Swim, Green & Teeth. The
hypothesis can be the animal type: Dog, Parrot or
Fish.
Conti..
 Hypothesis testing for the animal to be a Dog:
 P(Dog | Swim, Green, Teeth) = P(Swim|Dog) *
P(Green|Dog) * P(Teeth|Dog) * P(Dog) / P(Swim,
Green, Teeth)
= 0.9 * 0 * 1 * 0.333 / P(Swim, Green, Teeth)
= 0
 Hypothesis testing for the animal to be a Parrot:
 P(Parrot | Swim, Green, Teeth) = P(Swim|Parrot) *
P(Green|Parrot) * P(Teeth|Parrot) * P(Parrot) /
P(Swim, Green, Teeth)
= 0.1 * 0.80 * 0 * 0.333 / P(Swim, Green, Teeth)
= 0
Conti..
 Hypothesis testing for the animal to be a Fish:
 P(Fish | Swim, Green, Teeth) = P(Swim|Fish) *
P(Green|Fish) * P(Teeth|Fish) * P(Fish) / P(Swim, Green,
Teeth)
= 1 * 0.2 * 0.1 * 0.333 / P(Swim, Green, Teeth)
= 0.00666 / P(Swim, Green, Teeth)
 The denominator of all the above calculations is the
same, i.e., P(Swim, Green, Teeth). P(Fish | Swim, Green,
Teeth) is the only non-zero value, so using Naive Bayes
we can predict that the class of this record is Fish.
 The calculated probabilities are very small; to
normalize them, we divide by the common denominator.
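
A compact Python sketch of both predictions, using the likelihoods
from the frequency table (class priors are 500/1500 = 0.333 each):

# Likelihoods P(feature = True | class) from the frequency table.
likelihood = {
    "Parrot": {"Swim": 0.1, "Wings": 1.0, "Green": 0.8, "Teeth": 0.0},
    "Dog":    {"Swim": 0.9, "Wings": 0.0, "Green": 0.0, "Teeth": 1.0},
    "Fish":   {"Swim": 1.0, "Wings": 0.0, "Green": 0.2, "Teeth": 0.1},
}
prior = 500 / 1500  # each class has 500 of the 1500 records

def score(evidence):
    """Unnormalized posterior per class; the common denominator
    P(evidence) is omitted since it is the same for every class."""
    scores = {}
    for c, probs in likelihood.items():
        s = prior
        for feature in evidence:
            s *= probs[feature]
        scores[c] = s
    return scores

record1 = score(["Swim", "Green"])
print(record1, "->", max(record1, key=record1.get))  # Fish: 0.0666
record2 = score(["Swim", "Green", "Teeth"])
print(record2, "->", max(record2, key=record2.get))  # Fish: 0.00666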
Applications of Naive Bayes
Algorithms

 Real-time prediction: Naive Bayes is an eager
learning classifier and it is certainly fast. Thus, it
can be used for making predictions in real
time.

 Multi-class prediction: This algorithm is also
well known for its multi-class prediction feature.
Here we can predict the probability of
multiple classes of the target variable.
Applications of Naive Bayes
Algorithms
Text classification / Spam Filtering / Sentiment
Analysis:

 Naive Bayes classifiers, mostly used in text
classification (due to better results in multi-class
problems and the independence rule), have a higher
success rate compared to other algorithms.
 As a result, Naive Bayes is widely used in spam filtering
(identifying spam e-mail) and sentiment analysis (in
social media analysis, to identify positive and negative
customer sentiments).
Applications of Naive Bayes
Algorithms
Recommendation System:

 A Naive Bayes classifier and collaborative
filtering together build a
recommendation system that
uses machine learning and data mining
techniques to filter unseen information
and predict whether a user would like a
given resource or not.
Types of Models

 Multinomial:
It is used for discrete counts. For example, in a
text classification problem, it goes one step
further than Bernoulli trials: instead of
"word occurs in the document", we have
"count of how often the word occurs in the
document". You can think of it as the "number
of times outcome x_i is observed over the
n trials".
Types of Models

 Bernoulli:
The binomial model is useful if your feature
vectors are binary (i.e. zeros and ones). One
application would be text classification with
a 'bag of words' model, where the 1s and 0s
mean "word occurs in the document" and
"word does not occur in the document"
respectively.

 Based on your data set, you can choose any
of the models discussed above.
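
As a sketch (assuming scikit-learn is available), these variants map
to ready-made classes; the tiny binary data set below is illustrative:

import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Illustrative data: 4 samples, 3 binary features, 2 classes.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1]])
y = np.array([1, 1, 0, 0])

# BernoulliNB suits binary features; MultinomialNB suits word counts;
# GaussianNB assumes normally distributed numeric features.
model = BernoulliNB()
model.fit(X, y)
print(model.predict([[1, 0, 0]]))        # predicted class
print(model.predict_proba([[1, 0, 0]]))  # class probabilities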
What are the Pros and Cons of
Naive Bayes?
Pros:
 It is easy and fast to predict the class of a test data set.
It also performs well in multi-class prediction.
 When the assumption of independence holds, a Naive
Bayes classifier performs better compared to
other models like logistic regression, and you need
less training data.
What are the Pros and Cons of
Naive Bayes?

 It performs well in the case of categorical
input variables compared to numerical
variable(s).
 For numerical variables, a normal distribution
is assumed (a bell curve, which is a strong
assumption).
What are the Pros and Cons of
Naive Bayes?
Cons:
 If a categorical variable has a category (in the test data
set) which was not observed in the training data set,
then the model will assign a 0 (zero) probability and will
be unable to make a prediction. This is often known
as "Zero Frequency".
 To solve this, we can use a smoothing technique.
One of the simplest smoothing techniques is called
Laplace estimation.
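
A minimal sketch of Laplace (add-one) estimation; the category counts
below are made up for illustration:

# One categorical feature with 3 categories observed in training
# for some class c; "blue" never co-occurred with c.
counts = {"red": 10, "green": 5, "blue": 0}
total = sum(counts.values())
k = len(counts)  # number of categories

unsmoothed = {v: n / total for v, n in counts.items()}
smoothed = {v: (n + 1) / (total + k) for v, n in counts.items()}

print(unsmoothed["blue"])  # 0.0 -> the whole product collapses to zero
print(smoothed["blue"])    # 1/18 ~ 0.056 -> prediction stays possible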
What are the Pros and
Cons of Naive Bayes?

 On the other side, Naive Bayes is also known
to be a bad estimator, so the probability outputs
from predict_proba are not to be taken too
seriously.
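
If calibrated probabilities matter, one common remedy (a sketch
assuming scikit-learn, with a synthetic data set for illustration) is
to wrap the classifier in CalibratedClassifierCV:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Illustrative data set; in practice use your own features and labels.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Wrap Naive Bayes so its probability outputs are recalibrated.
calibrated = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5)
calibrated.fit(X, y)
print(calibrated.predict_proba(X[:3]))  # calibrated class probabilities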

 Another limitation of Naive Bayes is the
assumption of independent predictors. In real
life, it is almost impossible to get a set of
predictors that are completely
independent.