ML Naive Bayes
Naive Bayes is a classifier based on Bayes' theorem and the conditional-independence assumption.
Input & output
input: $x \in \mathbb{R}^n$, where $n$ is the dimension of the feature vector $x$. output: $y \in \{c_1, c_2, \ldots, c_K\}$, where $c_k$ is the class label among $K$ classes.
Main idea
Based on Bayes' equation

$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$$

$P(x \mid y)$ is the likelihood,
$P(y)$ is the prior probability,
$P(x)$ is the evidence,
and $P(y \mid x)$ is the posterior probability.
The likelihood is a conditional probability, the evidence is a constant for a given $x$, and $P(y \mid x)$ is the probability we want to solve for, given the likelihood, the prior, and the evidence.
Here, to refresh your memory, you can think of $x$ as a phenomenon and $y$ as a rule behind it.
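As a quick numeric illustration of Bayes' equation (the numbers below are made up for demonstration, not from any dataset):

```python
# Bayes' equation: P(y|x) = P(x|y) * P(y) / P(x), for a hypothetical
# two-class problem with assumed likelihoods and priors.
p_x_given_y = [0.8, 0.1]   # likelihood P(x | y = c_k) for each class
p_y = [0.3, 0.7]           # prior P(y = c_k)

# evidence: P(x) = sum_k P(x | y = c_k) * P(y = c_k)
p_x = sum(l * p for l, p in zip(p_x_given_y, p_y))

# posterior: P(y = c_k | x)
posterior = [l * p / p_x for l, p in zip(p_x_given_y, p_y)]
print(posterior)
```

Note that the posterior values sum to 1, because the evidence normalizes the products of likelihood and prior.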
Learning method
1. First we have to understand, from the very beginning, that what we ultimately want is the conditional probability $P(X = x \mid Y = c_k)$. This conditional probability means: for every class $c_k$ and every dimension $j$ of the feature vector $x$, the probability of each possible value $x^{(j)}$.
Then we can apply the conditional-independence assumption to calculate the combined probability:

$$P(X = x \mid Y = c_k) = \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k)$$

Without this assumption, if dimension $j$ of $x$ can take $S_j$ values, the total number of parameters for $K$ classes is $K \prod_{j=1}^{n} S_j$. It is really hard to estimate so many parameters.
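To see how fast the parameter count grows, here is a small arithmetic sketch (the class count and per-dimension value counts are assumed for illustration):

```python
from math import prod

# Hypothetical problem: K classes, and S_j possible values per feature dimension.
K = 3            # number of classes (assumed)
S = [4, 5, 2]    # S_j for each of the n = 3 dimensions (assumed)

# Without conditional independence, the full conditional P(X = x | Y = c_k)
# needs K * prod_j S_j parameters; with it, only K * sum_j S_j.
full_params = K * prod(S)
naive_params = K * sum(S)
print(full_params, naive_params)  # 120 vs 33
```

The gap widens multiplicatively as dimensions are added, which is why the naive assumption makes estimation tractable.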
2. Instead, we can figure out $P(Y = c_k \mid X = x)$ directly.
I think the point of this alternative is that it does not need to cover all possible $x$; it only focuses on the instance we want to predict.
3. So we can calculate $P(Y = c_k \mid X = x)$ for every class and find the maximum:

$$y = \arg\max_{c_k} \frac{P(Y = c_k) \prod_{j} P(X^{(j)} = x^{(j)} \mid Y = c_k)}{\sum_{k} P(Y = c_k) \prod_{j} P(X^{(j)} = x^{(j)} \mid Y = c_k)}$$

Ignoring the denominator, which is the same for every class, we get:

$$y = \arg\max_{c_k} P(Y = c_k) \prod_{j} P(X^{(j)} = x^{(j)} \mid Y = c_k)$$

Finally :) we simply apply Bayes' equation to decide the class of a given instance $x$.
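The decision rule above can be sketched as follows (the priors and conditional probabilities here are placeholder numbers, not estimates from data):

```python
# Naive Bayes decision rule: y = argmax_k P(Y=c_k) * prod_j P(X^(j)=x^(j) | Y=c_k)
# Hypothetical priors and per-feature conditionals for one instance x.
priors = {"c1": 0.4, "c2": 0.6}
cond = {                 # cond[c][j] = P(X^(j) = x^(j) | Y = c)
    "c1": [0.5, 0.2],
    "c2": [0.3, 0.7],
}

def classify(priors, cond):
    scores = {}
    for c, prior in priors.items():
        score = prior
        for p in cond[c]:
            score *= p   # conditional independence: multiply per-dimension terms
        scores[c] = score
    # the shared evidence P(x) is dropped: it does not change the argmax
    return max(scores, key=scores.get)

print(classify(priors, cond))  # c2
```

Dropping the denominator is safe because it is identical across classes, so it cannot change which class attains the maximum.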
Parameter estimation
1. Maximum likelihood
To calculate the prior:

$$P(Y = c_k) = \frac{\sum_{i=1}^{N} I(y_i = c_k)}{N}$$

That is to calculate, likewise by counting, the conditional probabilities, where $a_{jl}$ is the $l$-th possible value of dimension $j$:

$$P(X^{(j)} = a_{jl} \mid Y = c_k) = \frac{\sum_{i=1}^{N} I(x_i^{(j)} = a_{jl},\ y_i = c_k)}{\sum_{i=1}^{N} I(y_i = c_k)}$$
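These maximum-likelihood estimates are just ratios of counts; a minimal sketch on a made-up toy dataset:

```python
# Toy training data (assumed): 4 instances, 2 feature dimensions, labels in {0, 1}.
X = [[1, "S"], [1, "M"], [2, "M"], [2, "S"]]
y = [0, 0, 1, 1]
N = len(y)

# Prior: P(Y = c_k) = #{y_i = c_k} / N
prior = {c: y.count(c) / N for c in set(y)}

# Conditional: P(X^(j) = a | Y = c_k) = #{x_i^(j) = a and y_i = c_k} / #{y_i = c_k}
def cond_prob(j, a, c):
    in_class = [x for x, label in zip(X, y) if label == c]
    return sum(1 for x in in_class if x[j] == a) / len(in_class)

print(prior[0], cond_prob(1, "S", 0))
```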
2. Bayes estimation
The only difference is to add Laplace smoothing with some $\lambda > 0$, to make sure every estimated probability stays positive:

$$P_\lambda(X^{(j)} = a_{jl} \mid Y = c_k) = \frac{\sum_{i=1}^{N} I(x_i^{(j)} = a_{jl},\ y_i = c_k) + \lambda}{\sum_{i=1}^{N} I(y_i = c_k) + S_j \lambda}$$
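A quick sketch of why smoothing matters (the counts below are hypothetical): without it, a feature value never seen with a class would get probability zero and wipe out the whole product in the decision rule.

```python
# Laplace (add-lambda) smoothing: keeps every estimate positive, even when
# a value a_jl never co-occurred with class c_k in the training data.
def smoothed_cond_prob(count_aj_ck, count_ck, S_j, lam=1.0):
    # P_lambda(X^(j) = a_jl | Y = c_k) = (count + lambda) / (class count + S_j * lambda)
    return (count_aj_ck + lam) / (count_ck + S_j * lam)

# Unseen value (count 0) with a class seen 9 times, feature with S_j = 3 values:
p = smoothed_cond_prob(count_aj_ck=0, count_ck=9, S_j=3)
print(p)  # (0 + 1) / (9 + 3) = 1/12, nonzero
```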
Example in practice
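Putting the pieces together, here is a minimal end-to-end sketch. The toy dataset and the choice $\lambda = 1$ are my own assumptions for illustration; the training step estimates smoothed priors and conditionals, and prediction applies the argmax rule from above.

```python
from collections import Counter, defaultdict

# Toy training set (assumed): two feature dimensions, labels in {-1, 1}.
X = [[1, "S"], [1, "M"], [1, "M"], [2, "S"], [2, "M"], [2, "L"]]
y = [-1, -1, 1, -1, 1, 1]

def train(X, y, lam=1.0):
    N, n = len(y), len(X[0])
    classes = sorted(set(y))
    class_count = Counter(y)
    # smoothed prior: (count + lam) / (N + K * lam)
    prior = {c: (class_count[c] + lam) / (N + len(classes) * lam) for c in classes}
    # possible values per dimension (defines S_j)
    values = [sorted({x[j] for x in X}, key=str) for j in range(n)]
    # smoothed conditionals: (count + lam) / (class count + S_j * lam)
    cond = defaultdict(dict)
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        for j in range(n):
            for a in values[j]:
                count = sum(1 for x in rows if x[j] == a)
                cond[c][(j, a)] = (count + lam) / (len(rows) + len(values[j]) * lam)
    return prior, cond

def predict(x, prior, cond):
    def score(c):
        s = prior[c]
        for j, a in enumerate(x):
            s *= cond[c][(j, a)]  # conditional independence across dimensions
        return s
    return max(prior, key=score)

prior, cond = train(X, y)
print(predict([2, "S"], prior, cond))
```

For a real project you would normally use an existing implementation (e.g. scikit-learn's naive Bayes classifiers) rather than hand-rolling the counting, but the counts above are the whole algorithm.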
Goodbye!