### Introduction

The Naïve Bayes classifier is based on Bayes' theorem and is particularly suited to problems where the dimensionality of the inputs is high. Despite its simplicity, Naïve Bayes can often outperform more sophisticated classification methods.

### The Naïve Bayes Probabilistic Model

Abstractly, the probability model for a classifier is a conditional model

$$p(C \mid F_1, \dots, F_n)$$

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.
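To get a feel for why a full probability table becomes infeasible, the sketch below counts table entries. It assumes, purely for illustration, that every feature is discrete and takes the same number of values; the function names `full_table_size` and `naive_bayes_size` are hypothetical. The second count corresponds to the factorised form that the conditional independence assumption leads to.

```python
def full_table_size(n_features: int, n_values: int, n_classes: int) -> int:
    """Entries needed to tabulate p(C | F1..Fn) for every feature combination."""
    return n_classes * n_values ** n_features


def naive_bayes_size(n_features: int, n_values: int, n_classes: int) -> int:
    """Entries needed for p(C) plus one table p(Fi | C) per feature."""
    return n_classes + n_features * n_values * n_classes


if __name__ == "__main__":
    # Assumed setting: 2 classes, 10 values per feature.
    for n in (5, 10, 20):
        print(n, full_table_size(n, 10, 2), naive_bayes_size(n, 10, 2))
```

The full table grows exponentially with the number of features, while the factorised count grows only linearly.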

Using Bayes’ theorem, we write:

$$p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}$$

In plain English, the above equation can be written as:

$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$

In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model (see Bayes' theorem):

$$p(C, F_1, \dots, F_n)$$

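which can be rewritten as follows, using repeated applications of the definition of conditional probability:

$$
\begin{aligned}
p(C, F_1, \dots, F_n) &= p(C)\, p(F_1, \dots, F_n \mid C) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2, \dots, F_n \mid C, F_1) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1)\, p(F_3, \dots, F_n \mid C, F_1, F_2) \\
&= \cdots \\
&= p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1) \cdots p(F_n \mid C, F_1, \dots, F_{n-1})
\end{aligned}
$$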

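Now the “naïve” conditional independence assumption comes into play: assume that each feature Fi is conditionally independent of every other feature Fj (j ≠ i), given the class C. For i ≠ j, k, l, the assumption means that:

$$p(F_i \mid C, F_j) = p(F_i \mid C), \qquad p(F_i \mid C, F_j, F_k) = p(F_i \mid C), \qquad p(F_i \mid C, F_j, F_k, F_l) = p(F_i \mid C)$$

and so on.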

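So the joint model can be expressed as:

$$p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \mid C)\, p(F_2 \mid C) \cdots p(F_n \mid C) = p(C) \prod_{i=1}^{n} p(F_i \mid C)$$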

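This means that, under the above independence assumptions, the conditional distribution over the class variable C can be expressed as:

$$p(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C)$$

where Z (the evidence) is a scaling factor that depends only on F1, …, Fn; that is, it is a constant once the feature values are known.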
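The factorised posterior can be sketched in a few lines of Python. This is a minimal illustration for discrete features, not a reference implementation: the function name `posterior`, the table layout and all the numbers are assumptions made here. It relies on the standard fact that Z equals the sum of the numerators over all classes.

```python
from math import prod


def posterior(prior: dict, likelihood: dict, features: list) -> dict:
    """Return p(C | F1..Fn) for each class under the naive Bayes factorisation.

    prior[c] is p(C=c); likelihood[c][i][f] is p(Fi=f | C=c).
    """
    # Numerator: p(C) * prod_i p(Fi | C), one value per class.
    numer = {
        c: prior[c] * prod(likelihood[c][i][f] for i, f in enumerate(features))
        for c in prior
    }
    # Z (the evidence) is the sum of the numerators over all classes.
    z = sum(numer.values())
    return {c: v / z for c, v in numer.items()}


# Hypothetical numbers, chosen only to exercise the function.
prior = {"a": 0.5, "b": 0.5}
likelihood = {
    "a": [{0: 0.9, 1: 0.1}, {0: 0.2, 1: 0.8}],
    "b": [{0: 0.4, 1: 0.6}, {0: 0.5, 1: 0.5}],
}
result = posterior(prior, likelihood, [0, 1])
```

Because Z is shared by every class, comparing the numerators alone is enough to pick the most probable class; dividing by Z is only needed when calibrated probabilities are wanted.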

The class-conditional feature distributions p(Fi | C) can be modelled in several different ways, including Normal, Lognormal, Gamma and Poisson density functions.
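As an illustration of the Normal case, the sketch below fits a mean and sample variance per feature and class, then evaluates the posterior numerator for a new sample. The helper names (`normal_pdf`, `fit`, `numerator`), the class labels, the equal priors and the data are all made up for this sketch; they are not taken from the example that follows.

```python
import math
from statistics import mean, variance


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit(samples: dict) -> dict:
    """Estimate (mean, sample variance) per feature for each class.

    samples[c] is a list of feature vectors observed for class c.
    """
    return {
        c: [(mean(col), variance(col)) for col in zip(*rows)]
        for c, rows in samples.items()
    }


def numerator(x, prior: float, params) -> float:
    """Posterior numerator p(C) * prod_i p(Fi | C) for one class."""
    p = prior
    for xi, (mu, var) in zip(x, params):
        p *= normal_pdf(xi, mu, var)
    return p


# Made-up two-feature data for two hypothetical classes; NOT the WT training data.
samples = {
    "low": [(1.0, 10.0), (1.2, 11.0), (0.8, 9.0)],
    "high": [(3.0, 20.0), (3.4, 22.0), (2.6, 18.0)],
}
model = fit(samples)
x = (1.1, 10.5)
# Equal priors of 0.5 are assumed here.
scores = {c: numerator(x, 0.5, model[c]) for c in model}
best = max(scores, key=scores.get)
```

Swapping `normal_pdf` for another density (for example a Poisson mass function for count data) changes the model without changing the rest of the procedure.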

### Example

For wind turbine (WT) fault diagnosis, suppose we have the following training data for a specific 2 MW WT design.

The classifier created from the above training data, using a Normal distribution assumption, would be:

Suppose we are then given the following sample to classify as “OK” or “Warning”:

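We wish to determine which posterior is greater, “OK” or “Warning”. For the classification as “OK” the posterior is given by:

$$\text{posterior}(\text{OK}) = \frac{p(\text{OK}) \prod_{i=1}^{n} p(F_i \mid \text{OK})}{\text{evidence}}$$

For the classification as “Warning” the posterior is given by:

$$\text{posterior}(\text{Warning}) = \frac{p(\text{Warning}) \prod_{i=1}^{n} p(F_i \mid \text{Warning})}{\text{evidence}}$$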

The evidence may be ignored because it is the same positive constant in posterior(OK) and posterior(Warning). Under the Normal distribution assumption, the posterior numerator is 0.062 for “OK” and 0.0057 for “Warning”. Since the posterior numerator is greater in the “OK” case, we classify the sample as “OK”.
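If actual posterior probabilities are wanted rather than just the winning class, the two numerators can be normalised. The short sketch below uses only the two values reported above and assumes that “OK” and “Warning” are the only classes, so that the evidence is the sum of the two numerators; the variable names are illustrative.

```python
# Posterior numerators reported in the example above.
numerators = {"OK": 0.062, "Warning": 0.0057}

# With only two classes, the evidence is the sum of both numerators.
evidence = sum(numerators.values())
posteriors = {label: value / evidence for label, value in numerators.items()}

# The predicted class has the largest numerator (equivalently, the largest posterior).
predicted = max(numerators, key=numerators.get)

print(predicted, posteriors)
```

Under that assumption the posterior for “OK” comes out at roughly 0.92, which agrees with the decision reached by comparing the numerators directly.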

### References & Resources

• Wikipedia
• Bindi Chen, the First Year Transfer Report - Progressing for the Degree of Doctor of Philosophy.