Introduction

The Naïve Bayes classifier is based on Bayes' theorem and is particularly suited to problems where the dimensionality of the input is high. Despite its simplicity, Naïve Bayes can often outperform more sophisticated classification methods.

The Naïve Bayes Probabilistic Model

Abstractly, the probability model for a classifier is a conditional model:

$$p(C \mid F_1, \dots, F_n)$$

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible: with 30 binary features, for example, a full joint table would need 2^30 (about a billion) entries per class. We therefore reformulate the model to make it more tractable.

Using Bayes’ theorem, we write:

$$p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}$$

In plain English the above equation can be written as:

$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$

In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model (see Bayes' theorem):

$$p(C, F_1, \dots, F_n)$$

which can be rewritten as follows, using repeated applications of the definition of conditional probability:

$$p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1)\, p(F_3 \mid C, F_1, F_2) \cdots p(F_n \mid C, F_1, \dots, F_{n-1})$$

Now the “naïve” conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj (j ≠ i) given the class C. This assumption means that:

$$p(F_i \mid C, F_j) = p(F_i \mid C), \qquad p(F_i \mid C, F_j, F_k) = p(F_i \mid C), \qquad \dots \quad \text{for } j, k \neq i$$

So the joint model can be expressed as:

$$p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \mid C)\, p(F_2 \mid C) \cdots p(F_n \mid C) = p(C) \prod_{i=1}^{n} p(F_i \mid C)$$

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

$$p(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C)$$

where Z (the evidence) is a scaling factor dependent only on F1, …, Fn, i.e. a constant when the feature values are known.
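
To make the decision rule concrete, here is a minimal Python sketch of the classifier derived above: pick the class C that maximises p(C) ∏ p(Fi | C). The function name predict, the dictionary layout, and the toy spam/ham data are illustrative assumptions, not part of the original; working in log space simply avoids numerical underflow when n is large.

    import math

    def predict(priors, likelihoods, features):
        """Return the class maximising p(C) * prod_i p(F_i | C).

        priors:      {class: p(C)}
        likelihoods: {class: [{feature_value: p(F_i = value | C)}, ...]}
        features:    observed values (F_1, ..., F_n)
        """
        best_class, best_score = None, float("-inf")
        for c, prior in priors.items():
            score = math.log(prior)  # log space avoids underflow for large n
            for i, value in enumerate(features):
                score += math.log(likelihoods[c][i][value])
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    # Toy usage with two binary features and made-up probabilities:
    priors = {"spam": 0.4, "ham": 0.6}
    likelihoods = {
        "spam": [{True: 0.8, False: 0.2}, {True: 0.3, False: 0.7}],
        "ham":  [{True: 0.1, False: 0.9}, {True: 0.6, False: 0.4}],
    }
    print(predict(priors, likelihoods, (True, False)))  # -> spam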

Naïve Bayes can be modelled in several different ways, including with Normal, Lognormal, Gamma and Poisson density functions. For example, under the Normal (Gaussian) assumption used in the example below, the class-conditional density of a continuous feature F is:

$$p(F = v \mid C) = \frac{1}{\sqrt{2\pi\sigma_C^2}} \exp\!\left(-\frac{(v - \mu_C)^2}{2\sigma_C^2}\right)$$

where μ_C and σ_C² are the mean and variance of the feature within class C.
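
As a quick sketch of the Gaussian case, the density above can be evaluated directly (the helper name gaussian_pdf is ours; it is reused in the worked example at the end of this section):

    import math

    def gaussian_pdf(v, mu, var):
        """Normal density N(v; mu, var). This is a density, not a
        probability, so values greater than 1 are possible."""
        return math.exp(-(v - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)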

Example

Problem: using the following training set, classify whether a given person is male or female, based on the provided measurements. The known criteria are: height, weight, and foot size.

Sex       Height (feet)   Weight (lbs)   Foot size (inches)
male      6.00            180            12
male      5.92            190            11
male      5.58            170            12
male      5.92            165            10
female    5.00            100            6
female    5.50            150            8
female    5.42            130            7
female    5.75            150            9

The classifier created from the above training data, using a Gaussian distribution assumption, would be:

Sex       Mean (height)   Variance (height)   Mean (weight)   Variance (weight)   Mean (foot size)   Variance (foot size)
male      5.855           3.5033e-02          176.25          1.2292e+02          11.25              9.1667e-01
female    5.4175          9.7225e-02          132.5           5.5833e+02          7.5                1.6667e+00
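
These parameters can be reproduced with a few lines of Python (a sketch; the data layout is ours). Note that the variances in the table are unbiased sample variances, which is exactly what statistics.variance computes (denominator n - 1):

    from statistics import mean, variance  # variance() uses the n - 1 denominator

    # Training set: one (height ft, weight lbs, foot size in) tuple per person
    data = {
        "male":   [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
        "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
    }

    for sex, rows in data.items():
        columns = zip(*rows)  # regroup the rows by feature
        print(sex, [(mean(c), variance(c)) for c in columns])
    # male   -> (5.855, 0.03503...), (176.25, 122.91...), (11.25, 0.9166...)
    # female -> (5.4175, 0.097225), (132.5, 558.33...), (7.5, 1.6666...)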

Next, suppose we are given the following sample to classify as “male” or “female”:

Sex       Height (feet)   Weight (lbs)   Foot size (inches)
sample    6               130            8

Our goal is to determine which posterior is greater, “male” or “female”.

For the classification as “male” the posterior is given by:

$$\text{posterior (male)} = \frac{P(\text{male})\, p(\text{height} \mid \text{male})\, p(\text{weight} \mid \text{male})\, p(\text{foot size} \mid \text{male})}{\text{evidence}}$$

For the classification as “female” the posterior is given by:

$$\text{posterior (female)} = \frac{P(\text{female})\, p(\text{height} \mid \text{female})\, p(\text{weight} \mid \text{female})\, p(\text{foot size} \mid \text{female})}{\text{evidence}}$$

Next, the evidence may be calculated:

$$\text{evidence} = P(\text{male})\, p(\text{height} \mid \text{male})\, p(\text{weight} \mid \text{male})\, p(\text{foot size} \mid \text{male}) + P(\text{female})\, p(\text{height} \mid \text{female})\, p(\text{weight} \mid \text{female})\, p(\text{foot size} \mid \text{female})$$

However, in this case the evidence is a constant: it scales both posteriors equally, so it does not affect the classification and can be ignored.
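
In other words, for ranking the classes it suffices to compare the numerators:

$$p(C \mid F_1, \dots, F_n) \propto p(C) \prod_{i=1}^{n} p(F_i \mid C)$$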

Let's now calculate each of the probabilities and densities for the sample data:

$$P(\text{male}) = 0.5$$

$$p(\text{height} \mid \text{male}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(6 - \mu)^2}{2\sigma^2}\right) \approx 1.5789$$

where μ = 5.855 and σ² = 3.5033e-02 are the parameters of the normal distribution which have been previously determined from the training set. Note that a value greater than 1 is OK here: it is a probability density rather than a probability, because height is a continuous variable.

$$p(\text{weight} \mid \text{male}) \approx 5.9881 \times 10^{-6}$$

$$p(\text{foot size} \mid \text{male}) \approx 1.3112 \times 10^{-3}$$

$$\text{posterior numerator (male)} \approx 6.1984 \times 10^{-9}$$

$$P(\text{female}) = 0.5$$

$$p(\text{height} \mid \text{female}) \approx 2.2346 \times 10^{-1}$$

$$p(\text{weight} \mid \text{female}) \approx 1.6789 \times 10^{-2}$$

$$p(\text{foot size} \mid \text{female}) \approx 2.8669 \times 10^{-1}$$

$$\text{posterior numerator (female)} \approx 5.3778 \times 10^{-4}$$

Since the posterior numerator is greater in the female case, we predict that the sample is female.
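
The whole calculation can be verified with a short script (a sketch reusing the gaussian_pdf helper from earlier; the parameters are taken from the classifier table above):

    import math

    def gaussian_pdf(v, mu, var):
        return math.exp(-(v - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    # (mean, variance) per feature, from the classifier table above
    params = {
        "male":   [(5.855, 3.5033e-02), (176.25, 1.2292e+02), (11.25, 9.1667e-01)],
        "female": [(5.4175, 9.7225e-02), (132.5, 5.5833e+02), (7.5, 1.6667e+00)],
    }
    priors = {"male": 0.5, "female": 0.5}
    sample = (6, 130, 8)  # height, weight, foot size

    numerators = {}
    for sex in ("male", "female"):
        numerators[sex] = priors[sex]
        for value, (mu, var) in zip(sample, params[sex]):
            numerators[sex] *= gaussian_pdf(value, mu, var)
        print(sex, numerators[sex])  # male ~ 6.1984e-09, female ~ 5.3778e-04

    # Normalising by the evidence gives the actual posterior:
    evidence = sum(numerators.values())
    print(numerators["female"] / evidence)  # ~ 0.99999: the sample is female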


References & Resources

  • Wikipedia
  • Bindi Chen, First Year Transfer Report: Progressing for the Degree of Doctor of Philosophy.