ML KNN
- Input & output
- Main idea
- Distance measurement
- The choice of K
- Classification decision rule
- Example in practice
KNN stands for k-nearest neighbours, a basic classifier with three key elements: the choice of K, the distance measurement, and the classification decision rule.
Input & output
The input is the feature vector $x \in \mathcal{X} \subseteq \mathbb{R}^d$, where $d$ is the dimension of the feature vector $x$. The output is the class label $y \in \mathcal{Y} = \{c_1, c_2, \dots, c_K\}$, one of $K$ classes.
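To make the notation concrete, here is a minimal toy dataset sketch (the arrays `X` and `y` and their values are illustrative assumptions, not part of the original notes):

```python
import numpy as np

# Toy dataset: n = 6 instances, d = 2 features, 2 classes {0, 1}.
# Each row of X is a feature vector x in R^d; y holds the class labels.
X = np.array([[1.0, 1.1],
              [1.0, 0.9],
              [1.2, 1.0],
              [6.0, 6.1],
              [6.2, 5.9],
              [5.9, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
```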
Main idea
Given an instance $x$, choose a distance measurement, find the $K$ closest neighbours of $x$, and form the neighbour group $N_K(x)$. Then decide the class of $x$ by applying the classification decision rule to the neighbour group.
‘Birds of a feather flock together.’ Just as the old Chinese saying goes, we tend to judge people by their family and friends, because the company they keep reflects, to some extent, who they are.
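Putting the main idea into code, a minimal brute-force sketch could look like this (the helper name `knn_predict` and the use of the $L_p$ distance defined below are my assumptions, reusing the toy `X`/`y` above):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3, p=2):
    """Classify instance x by majority vote among its k nearest
    training neighbours under the L_p distance."""
    # Distance from x to every training instance (p=2 -> Euclidean)
    dists = np.sum(np.abs(X_train - x) ** p, axis=1) ** (1.0 / p)
    # Indices of the k closest instances: the neighbour group N_k(x)
    neighbours = np.argsort(dists)[:k]
    # Majority voting over the neighbours' labels
    return Counter(y_train[neighbours].tolist()).most_common(1)[0][0]

# A point near the first cluster of the toy data gets class 0:
# knn_predict(X, y, np.array([1.1, 1.0]), k=3)  # -> 0
```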
Distance measurement
We denote $x_i, x_j \in \mathcal{X} \subseteq \mathbb{R}^d$, $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(d)})^T$, $x_j = (x_j^{(1)}, x_j^{(2)}, \dots, x_j^{(d)})^T$.
Then the $L_p$ distance is:
$$L_p(x_i, x_j) = \left( \sum_{l=1}^{d} \left| x_i^{(l)} - x_j^{(l)} \right|^p \right)^{\frac{1}{p}}, \quad p \ge 1$$
When $p = 1$, the distance is the Manhattan distance:
$$L_1(x_i, x_j) = \sum_{l=1}^{d} \left| x_i^{(l)} - x_j^{(l)} \right|$$
When $p = 2$, the distance is the Euclidean distance:
$$L_2(x_i, x_j) = \left( \sum_{l=1}^{d} \left| x_i^{(l)} - x_j^{(l)} \right|^2 \right)^{\frac{1}{2}}$$
When $p = \infty$, the distance is the maximum distance among all dimensions:
$$L_\infty(x_i, x_j) = \max_{l} \left| x_i^{(l)} - x_j^{(l)} \right|$$
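A quick numeric check of these three special cases (the vectors `xi` and `xj` are made-up examples):

```python
import numpy as np

xi = np.array([1.0, 2.0, 3.0])
xj = np.array([4.0, 0.0, 3.5])
diff = np.abs(xi - xj)                  # per-dimension gaps: [3.0, 2.0, 0.5]

manhattan = diff.sum()                  # L_1   = 5.5
euclidean = np.sqrt((diff ** 2).sum())  # L_2   = sqrt(13.25) ≈ 3.64
chebyshev = diff.max()                  # L_inf = 3.0
```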
The choice of K
If K is too small, the approximation error gets smaller, but the estimation error gets larger: the prediction is sensitive to the few nearest instances, so the model tends to over-fit.
While if K is too large, the estimation error gets smaller, but the approximation error grows: distant, less relevant instances also get a vote, so the model becomes overly simple and generalizes poorly. In practice, K is usually chosen by cross-validation.
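As a sketch of choosing K by cross-validation using scikit-learn (the synthetic two-blob data and the candidate range are illustrative assumptions):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data: two Gaussian blobs of 100 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(3.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Mean 5-fold accuracy for each candidate K; odd K avoids voting ties.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=5).mean()
          for k in range(1, 30, 2)}
best_k = max(scores, key=scores.get)
```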
Classification decision rule
The decision rule decides the class of $x$ from its neighbour group. The most commonly used rule is the majority voting rule: choose the class that most of the neighbour instances belong to. If the predicted class is $c_j$, the misclassification rate over the neighbour group $N_K(x)$ is:
$$\frac{1}{K} \sum_{x_i \in N_K(x)} I(y_i \neq c_j) = 1 - \frac{1}{K} \sum_{x_i \in N_K(x)} I(y_i = c_j)$$
So minimizing the misclassification rate is equivalent to maximizing $\sum_{x_i \in N_K(x)} I(y_i = c_j)$, which is exactly what majority voting does.
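A toy check of this identity (the neighbour labels are made up for illustration):

```python
import numpy as np
from collections import Counter

# Labels y_i of the K = 5 nearest neighbours of some instance x
neighbour_labels = np.array([1, 1, 0, 1, 0])

# Majority voting: the predicted class c_j is the most frequent label
c_j = Counter(neighbour_labels.tolist()).most_common(1)[0][0]  # -> 1

# Misclassification rate: (1/K) * sum I(y_i != c_j)
rate = np.mean(neighbour_labels != c_j)                        # -> 0.4 = 1 - 3/5
```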
Example in practice
The final part is the kd-tree, a data structure and algorithm for finding the k nearest neighbours efficiently instead of scanning every training instance.
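While the kd-tree itself is left for that part, here is a sketch of the query interface using SciPy's existing cKDTree implementation (the random training points are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((1000, 2))           # 1000 training points in the unit square

tree = cKDTree(X_train)                   # build the kd-tree once
dists, idx = tree.query([0.5, 0.5], k=3)  # 3 nearest neighbours of the query point
```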
Goodbye!