## Naïve Bayes

### Introduction

The **Naïve Bayes** classifier is based on Bayes’ theorem and is particularly suited to problems where the dimensionality of the input is high. Despite its simplicity, Naïve Bayes can often outperform more sophisticated classification methods.

### The Naïve Bayes Probabilistic Model

Abstractly, the probability model for a classifier is a conditional model

$$p(C \mid F_1, \dots, F_n)$$

over a dependent class variable $C$ with a small number of outcomes or classes, conditional on several feature variables $F_1$ through $F_n$. The problem is that if the number of features $n$ is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes’ theorem, we write:

$$p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}$$

In plain English the above equation can be written as:

$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$

In practice we are only interested in the numerator of that fraction, since the denominator does not depend on $C$ and the values of the features $F_i$ are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model (see Bayes’ theorem):

$$p(C, F_1, \dots, F_n)$$

which can be rewritten as follows, using repeated applications of the definition of conditional probability:

$$
\begin{aligned}
p(C, F_1, \dots, F_n) &= p(C)\, p(F_1, \dots, F_n \mid C) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2, \dots, F_n \mid C, F_1) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1) \cdots p(F_n \mid C, F_1, \dots, F_{n-1})
\end{aligned}
$$

Now the “naïve” conditional independence assumptions come into play: assume that each feature $F_i$ is *conditionally independent* of every other feature $F_j$ for $j \neq i$, given the class $C$. For example, for $i \neq j, k, l$ the assumption means that:

$$p(F_i \mid C, F_j) = p(F_i \mid C), \qquad p(F_i \mid C, F_j, F_k) = p(F_i \mid C), \qquad p(F_i \mid C, F_j, F_k, F_l) = p(F_i \mid C)$$

So the joint model can be expressed as:

$$p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \mid C)\, p(F_2 \mid C) \cdots p(F_n \mid C) = p(C) \prod_{i=1}^{n} p(F_i \mid C)$$

This means that, under the above **independence assumptions**, the conditional distribution over the class variable $C$ can be expressed as:

$$p(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C)$$

where $Z = p(F_1, \dots, F_n)$ (the evidence) is a scaling factor dependent only on $F_1, \dots, F_n$, i.e. constant once the feature values are known.
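
Since $Z$ is the same for every class, classification reduces to the maximum a posteriori (MAP) decision rule: pick the class whose posterior numerator is largest:

$$\operatorname{classify}(f_1, \dots, f_n) = \underset{c}{\operatorname{argmax}}\; p(C = c) \prod_{i=1}^{n} p(F_i = f_i \mid C = c)$$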

The class-conditional densities $p(F_i \mid C)$ in a Naïve Bayes model can be modelled in several different ways, including the Normal, Lognormal, Gamma and Poisson density functions.
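
As a concrete illustration, here is a minimal Python sketch of a Gaussian Naïve Bayes classifier, i.e. the Normal assumption used in the example below. The class and helper names are my own for illustration; in practice a library implementation such as scikit-learn’s `GaussianNB` would normally be used instead:

```python
import math
from collections import defaultdict

def gaussian_pdf(x, mean, var):
    """Normal density N(x; mean, var), used as the likelihood p(F_i | C)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

class GaussianNaiveBayes:
    def fit(self, X, y):
        """Estimate the prior p(C) and per-feature mean/variance per class.
        Assumes every class has at least two training rows."""
        self.params = {}
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        for label, rows in by_class.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            variances = [sum((v - m) ** 2 for v in col) / (n - 1)
                         for col, m in zip(zip(*rows), means)]
            self.params[label] = (n / len(X), means, variances)
        return self

    def posterior_numerator(self, label, x):
        """p(C) * prod_i p(F_i | C); the evidence Z is omitted."""
        prior, means, variances = self.params[label]
        p = prior
        for xi, m, v in zip(x, means, variances):
            p *= gaussian_pdf(xi, m, v)
        return p

    def predict(self, x):
        """MAP rule: return the class with the largest posterior numerator."""
        return max(self.params, key=lambda c: self.posterior_numerator(c, x))
```

In practice the product of many small densities underflows floating point, so real implementations sum log-probabilities instead of multiplying densities.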

### Example

In the implementation of wind turbine (WT) fault diagnosis, suppose we have the following training data for a specific design of 2 MW WT.

The classifier created from the above training data using a **Normal Distribution Assumption** would be:

Now suppose there is a new sample to be classified as either “OK” or “Warning”:

We wish to determine which posterior is greater, “OK” or “Warning”. For the classification as “OK” the posterior is given by:

$$\text{posterior}(\text{OK}) = \frac{p(\text{OK}) \prod_{i=1}^{n} p(F_i \mid \text{OK})}{\text{evidence}}$$

For the classification as “Warning” the posterior is given by:

$$\text{posterior}(\text{Warning}) = \frac{p(\text{Warning}) \prod_{i=1}^{n} p(F_i \mid \text{Warning})}{\text{evidence}}$$

The evidence may be ignored, as it is the same for posterior(OK) and posterior(Warning). Finally, based on the **Normal Distribution Assumption**, we obtain a posterior numerator of 0.062 for “OK” and 0.0057 for “Warning”. Since the posterior numerator is greater in the “OK” case, we classify the sample as “OK”.
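
Using the `GaussianNaiveBayes` sketch from earlier, the mechanics of this comparison look as follows. The feature values and per-class parameters below are hypothetical placeholders, not the actual 2 MW WT training data:

```python
# Hypothetical per-class parameters (prior, per-feature means, variances);
# real values would come from fitting the model on the WT training data.
model = GaussianNaiveBayes()
model.params = {
    "OK":      (0.5, [65.0, 1500.0], [16.0, 2500.0]),   # e.g. temperature, rpm
    "Warning": (0.5, [80.0, 1400.0], [25.0, 4900.0]),
}

sample = [68.0, 1480.0]  # hypothetical new observation
for label in ("OK", "Warning"):
    print(label, model.posterior_numerator(label, sample))
print("Classified as:", model.predict(sample))
```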

### References & Resources

- Wikipedia: Naive Bayes classifier
- Bindi Chen, First Year Transfer Report (progression towards the degree of Doctor of Philosophy).
