Introduction

Naïve Bayes classification is based on Bayes' theorem and is particularly suited to problems where the dimensionality of the inputs is high. Despite its simplicity, Naïve Bayes can often outperform more sophisticated classification methods.

The Naïve Bayes Probabilistic Model

Abstractly, the probability model for a classifier is a conditional model:

p(C \mid F_1, \dots, F_n)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F_1 through F_n. The problem is that if the number of features n is large, or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible: with 30 binary features, for example, a full table would need on the order of 2^30 entries per class. We therefore reformulate the model to make it more tractable.

Using Bayes’ theorem, we write:

p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}

In plain English the above equation can be written as:

\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}

In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features F_i are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model (see Bayes' theorem):

p(C, F_1, \dots, F_n)

which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1)\, p(F_3 \mid C, F_1, F_2) \cdots p(F_n \mid C, F_1, \dots, F_{n-1})

Now the “naïve” conditional independence assumption comes into play: assume that each feature F_i is conditionally independent of every other feature F_j (for j ≠ i) given the class C. This means that:

p(F_i \mid C, F_j) = p(F_i \mid C) \quad \text{for } j \neq i

So the joint model can be expressed as:

p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \mid C)\, p(F_2 \mid C) \cdots p(F_n \mid C) = p(C) \prod_{i=1}^{n} p(F_i \mid C)

This means that, under the above independence assumption, the conditional distribution over the class variable C can be expressed as:

p(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C)

where Z (the evidence) is a scaling factor that depends only on F_1, …, F_n, i.e. it is constant once the feature values are known.
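To make the decision rule concrete, here is a minimal sketch (not from the original text) of how the posterior above could be computed: multiply the class prior by the per-feature likelihoods and normalise by Z. The function name and the way priors and likelihoods are passed in are illustrative assumptions.

```python
from math import prod

def naive_bayes_posteriors(priors, likelihoods, sample):
    """Compute p(C | F_1, ..., F_n) for every class C.

    priors      -- dict: class -> p(C)
    likelihoods -- dict: class -> list of functions, one per feature,
                   each returning p(F_i = f_i | C)
    sample      -- observed feature values [f_1, ..., f_n]
    """
    # Numerator of Bayes' theorem: p(C) * prod_i p(F_i | C)
    numerators = {
        c: priors[c] * prod(fn(f) for fn, f in zip(likelihoods[c], sample))
        for c in priors
    }
    # Z (the evidence) is the sum of the numerators over all classes.
    z = sum(numerators.values())
    return {c: num / z for c, num in numerators.items()}
```

Classification then simply picks the class with the largest posterior (or, equivalently, the largest numerator, since Z is common to all classes).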

The per-feature likelihoods p(F_i | C) can be modelled in several different ways, including with Normal, Lognormal, Gamma and Poisson density functions.

[Figure: Normal, Lognormal, Gamma and Poisson probability density functions]
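For instance, a continuous feature can be handled with the Normal density, which is the assumption used in the example below. The sketch assumes the class-conditional mean and variance of that feature have already been estimated from training data.

```python
from math import exp, pi, sqrt

def gaussian_likelihood(x, mean, var):
    """p(F_i = x | C) under a Normal distribution with the given
    class-conditional mean and variance."""
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)
```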

Example

As an example application to wind turbine (WT) fault diagnosis, suppose we have the following training data for a specific design of 2 MW WT.

[Table: training data for the 2 MW WT design]

The classifier created from the above training data, using the Normal distribution assumption, would be:

[Table: per-class mean and variance of each feature, estimated from the training data]
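A minimal sketch of that fitting step is shown below, assuming (hypothetically, since the original table is not reproduced here) that the training data is arranged as a list of feature vectors per class label.

```python
from statistics import mean, variance

def fit_gaussian_nb(data_by_class):
    """data_by_class: dict mapping class label -> list of feature vectors.

    Returns, for each class, its prior and one (mean, variance) pair per
    feature column, which is all a Gaussian Naive Bayes classifier needs."""
    total = sum(len(rows) for rows in data_by_class.values())
    model = {}
    for label, rows in data_by_class.items():
        prior = len(rows) / total
        params = [(mean(col), variance(col)) for col in zip(*rows)]
        model[label] = (prior, params)
    return model
```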

Now suppose there is a new sample to be classified as either “OK” or “Warning”:

[Table: feature values of the sample to be classified]

We wish to determine which posterior is greater, that of “OK” or that of “Warning”. For the classification as “OK”, the posterior is given by:

\text{posterior}(\text{OK}) = \frac{p(\text{OK}) \prod_{i=1}^{n} p(F_i \mid \text{OK})}{\text{evidence}}

For the classification as “Warning” the posterior is given by:

\text{posterior}(\text{Warning}) = \frac{p(\text{Warning}) \prod_{i=1}^{n} p(F_i \mid \text{Warning})}{\text{evidence}}

The evidence may be ignored since it is the same in posterior(OK) and posterior(Warning). Finally, based on the Normal distribution assumption, we get a posterior numerator of 0.062 for “OK” and 0.0057 for “Warning”. Since the posterior numerator is greater in the “OK” case, we classify the sample as “OK”.
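Tying the steps together, here is a minimal sketch of the scoring calculation. The fitted parameters and the sample values are hypothetical placeholders (the original tables are not reproduced here); only the comparison logic mirrors the computation above.

```python
from math import exp, pi, prod, sqrt

def gaussian(x, mean, var):
    # p(F_i = x | C) under the Normal distribution assumption.
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

# Hypothetical fitted parameters: class -> (prior, [(mean, variance) per feature]).
# These numbers are placeholders, not the values behind the 0.062 / 0.0057 result.
model = {
    "OK":      (0.5, [(40.0, 25.0), (0.20, 0.01)]),
    "Warning": (0.5, [(55.0, 36.0), (0.35, 0.02)]),
}
sample = [42.0, 0.22]  # hypothetical new observation

# Posterior numerator p(C) * prod_i p(F_i | C); the evidence is ignored
# because it is identical for both classes.
scores = {
    label: prior * prod(gaussian(x, m, v) for x, (m, v) in zip(sample, params))
    for label, (prior, params) in model.items()
}
predicted = max(scores, key=scores.get)  # class with the larger numerator
```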

 

References & Resources

  • Wikipedia
  • Bindi Chen, First Year Transfer Report, progressing to the degree of Doctor of Philosophy.