ML KNN


KNN is k-nearest neighbours, a basic classifier with three key elements: the choice of k, the distance measurement, and the classification decision rule.


Input & output

The input is a feature vector $x \in \mathcal{X} \subseteq \mathbb{R}^d$, where $d$ is the dimension of $x$. The output is a class label $y \in \mathcal{Y} = \{c_1, c_2, \dots, c_K\}$ drawn from $K$ classes.

Main idea

Given an instance $x$, choose a distance measurement, find the $k$ closest neighbours of $x$, and form the neighbour group $N_k(x)$. Then decide the class of $x$ by applying the classification decision rule to this neighbour group, as sketched in the code below.

‘Birds of a feather flock together.’ As the old Chinese saying goes, we tend to judge people by their family and friends, because the company they keep reflects, to some extent, who they are.
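To make the procedure concrete, here is a minimal brute-force sketch (the function name knn_predict, the Euclidean distance, and the toy data are my own choices for illustration, not from the original post):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training instances."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance from x to each training point
    neighbours = np.argsort(dists)[:k]            # indices of the k closest: N_k(x)
    votes = Counter(y_train[i] for i in neighbours)
    return votes.most_common(1)[0][0]             # the class held by most neighbours

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # -> 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # -> 1
```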

Distance measurement

We denote $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(d)})^T$, $x_j = (x_j^{(1)}, x_j^{(2)}, \dots, x_j^{(d)})^T$.
Then the $L_p$ distance is:

$$L_p(x_i, x_j) = \left( \sum_{l=1}^{d} \left| x_i^{(l)} - x_j^{(l)} \right|^p \right)^{\frac{1}{p}}$$

When $p = 1$, the distance is the Manhattan distance:

$$L_1(x_i, x_j) = \sum_{l=1}^{d} \left| x_i^{(l)} - x_j^{(l)} \right|$$

When $p = 2$, the distance is the Euclidean distance:

$$L_2(x_i, x_j) = \left( \sum_{l=1}^{d} \left| x_i^{(l)} - x_j^{(l)} \right|^2 \right)^{\frac{1}{2}}$$

When $p = \infty$, the distance is the maximum distance among all dimensions:

$$L_\infty(x_i, x_j) = \max_{l} \left| x_i^{(l)} - x_j^{(l)} \right|$$
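As a minimal sketch of these three special cases (the function name lp_distance and the example vectors are assumptions of mine, not from the post):

```python
import numpy as np

def lp_distance(x_i, x_j, p=2):
    """Minkowski (L_p) distance between two feature vectors."""
    diff = np.abs(np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float))
    if np.isinf(p):
        return float(diff.max())                 # L_inf: max over all dimensions
    return float((diff ** p).sum() ** (1.0 / p))

x_i, x_j = [1.0, 1.0], [4.0, 5.0]
print(lp_distance(x_i, x_j, p=1))        # Manhattan: 7.0
print(lp_distance(x_i, x_j, p=2))        # Euclidean: 5.0
print(lp_distance(x_i, x_j, p=np.inf))   # Chebyshev:  4.0
```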


The choice of K

If $k$ is too small, the approximation error is small, but the estimation error is large: the prediction is sensitive to noise among the few nearest instances, so the model becomes complex and tends to over-fit.
While if $k$ is too large, the estimation error gets smaller, but the approximation error grows: distant, less relevant instances also get a vote, so the model becomes too simple to generalize well. In practice, $k$ is usually chosen by cross-validation.
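As an illustration of picking $k$ by cross-validation (the Iris dataset and the candidate values of $k$ are my choices, not from the post), scikit-learn makes the comparison short:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 11, 51):
    # Mean accuracy over 5 folds for this choice of k
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k:>2}  mean CV accuracy={acc:.3f}")
```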


Classification decision rule

The decision rule decides the class of $x$ from its neighbour group $N_k(x)$. The most commonly used rule is the majority voting rule: choose the class held by the most neighbour instances. For a candidate class $c_j$, the misclassification rate is:

$$\frac{1}{k} \sum_{x_i \in N_k(x)} I(y_i \neq c_j) = 1 - \frac{1}{k} \sum_{x_i \in N_k(x)} I(y_i = c_j)$$

so majority voting is equivalent to minimizing the empirical misclassification rate (0-1 loss) over the neighbour group.
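A small sketch of the rule itself (the neighbour labels below are hypothetical): the class with the most votes is exactly the one that minimizes the misclassification rate over $N_k(x)$.

```python
from collections import Counter

neighbour_labels = ['A', 'A', 'B', 'A', 'C']       # hypothetical labels in N_k(x), k = 5
k = len(neighbour_labels)
votes = Counter(neighbour_labels)
for c, n in sorted(votes.items()):
    print(f"class {c}: misclassification rate = {1 - n / k:.2f}")
print("prediction:", votes.most_common(1)[0][0])   # 'A': most votes, lowest rate
```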


Example in practice

The final part is the kd-tree, a data structure for finding the k nearest neighbours efficiently instead of scanning every training point.
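As a sketch of how a kd-tree is used in practice (the random data, leaf_size, and query point are my own choices, not from the post), scikit-learn's KDTree builds the tree once and then answers nearest-neighbour queries quickly:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X = rng.random((1000, 2))                   # 1000 random 2-D training points
tree = KDTree(X, leaf_size=40)              # build the tree once, query many times

dist, ind = tree.query([[0.5, 0.5]], k=5)   # 5 nearest neighbours of the query point
print(ind[0])                               # indices into X
print(dist[0])                              # corresponding distances
```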

IPython notebook

Goodbye!

