ML KNN
- Input & output
- Main idea
- Distance measurement
- The choice of K
- Classification decision rule
- Example in practice
KNN stands for k-nearest neighbours, a basic classifier with three key elements: the choice of K, the distance measurement, and the classification decision rule.
Input & output
The input is the feature vector $x \in \mathcal{X} \subseteq \mathbb{R}^d$, where $d$ is the dimension of the feature vector $x$. The output is the class label $y \in \mathcal{Y} = \{c_1, c_2, \dots, c_K\}$, one of $K$ classes.
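To make the notation concrete, here is a minimal toy dataset sketch (the arrays `X` and `y` and their values are illustrative assumptions, not part of the original notes):

```python
import numpy as np

# Toy dataset: n = 6 instances, d = 2 features, 2 classes {0, 1}.
# Each row of X is a feature vector x in R^d; y holds the class labels.
X = np.array([[1.0, 1.1],
              [1.0, 0.9],
              [1.2, 1.0],
              [6.0, 6.1],
              [6.2, 5.9],
              [5.9, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
```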
Main idea
Given an instance $x$, choose a distance measurement, find the $K$ closest neighbours of $x$, and form the neighbour group $N_K(x)$. Then decide the class of $x$ by applying the classification decision rule to the neighbour group.
‘Birds of a feather flock together.’ Just as the old Chinese saying goes, we tend to judge people by their family and friends, because the company they keep reflects, to some extent, who they are.
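Putting the main idea into code, a minimal brute-force sketch could look like this (the helper name `knn_predict` and the use of the $L_p$ distance defined below are my assumptions, reusing the toy `X`/`y` above):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3, p=2):
    """Classify instance x by majority vote among its k nearest
    training neighbours under the L_p distance."""
    # Distance from x to every training instance (p=2 -> Euclidean)
    dists = np.sum(np.abs(X_train - x) ** p, axis=1) ** (1.0 / p)
    # Indices of the k closest instances: the neighbour group N_k(x)
    neighbours = np.argsort(dists)[:k]
    # Majority voting over the neighbours' labels
    return Counter(y_train[neighbours].tolist()).most_common(1)[0][0]

# A point near the first cluster of the toy data gets class 0:
# knn_predict(X, y, np.array([1.1, 1.0]), k=3)  # -> 0
```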
Distance measurement
We denote $x_i, x_j \in \mathcal{X} \subseteq \mathbb{R}^d$, $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(d)})^T$, $x_j = (x_j^{(1)}, x_j^{(2)}, \dots, x_j^{(d)})^T$.
Then the $L_p$ distance is:
$$L_p(x_i, x_j) = \left( \sum_{l=1}^{d} \left| x_i^{(l)} - x_j^{(l)} \right|^p \right)^{\frac{1}{p}}, \quad p \ge 1$$
When $p = 1$, the distance is the Manhattan distance:
$$L_1(x_i, x_j) = \sum_{l=1}^{d} \left| x_i^{(l)} - x_j^{(l)} \right|$$
When $p = 2$, the distance is the Euclidean distance:
$$L_2(x_i, x_j) = \left( \sum_{l=1}^{d} \left| x_i^{(l)} - x_j^{(l)} \right|^2 \right)^{\frac{1}{2}}$$
When $p = \infty$, the distance is the maximum distance among all dimensions:
$$L_\infty(x_i, x_j) = \max_{l} \left| x_i^{(l)} - x_j^{(l)} \right|$$
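A quick numeric check of these three special cases (the vectors `xi` and `xj` are made-up examples):

```python
import numpy as np

xi = np.array([1.0, 2.0, 3.0])
xj = np.array([4.0, 0.0, 3.5])
diff = np.abs(xi - xj)                  # per-dimension gaps: [3.0, 2.0, 0.5]

manhattan = diff.sum()                  # L_1   = 5.5
euclidean = np.sqrt((diff ** 2).sum())  # L_2   = sqrt(13.25) ≈ 3.64
chebyshev = diff.max()                  # L_inf = 3.0
```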
The choice of K
If K is too small, the approximation error gets smaller, but the estimation error gets larger: the prediction is sensitive to the few nearest instances, so the model tends to over-fit.
While if K is too large, the estimation error gets smaller, but the approximation error grows: distant, less relevant instances also get a vote, so the model becomes overly simple and generalizes poorly. In practice, K is usually chosen by cross-validation.
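As a sketch of choosing K by cross-validation using scikit-learn (the synthetic two-blob data and the candidate range are illustrative assumptions):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data: two Gaussian blobs of 100 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(3.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Mean 5-fold accuracy for each candidate K; odd K avoids voting ties.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=5).mean()
          for k in range(1, 30, 2)}
best_k = max(scores, key=scores.get)
```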
Classification decision rule
The decision rule decides the class of $x$ from its neighbour group. The most commonly used rule is the majority voting rule: choose the class that most of the neighbour instances belong to. If the predicted class is $c_j$, the misclassification rate over the neighbour group $N_K(x)$ is:
$$\frac{1}{K} \sum_{x_i \in N_K(x)} I(y_i \neq c_j) = 1 - \frac{1}{K} \sum_{x_i \in N_K(x)} I(y_i = c_j)$$
So minimizing the misclassification rate is equivalent to maximizing $\sum_{x_i \in N_K(x)} I(y_i = c_j)$, which is exactly what majority voting does.
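A toy check of this identity (the neighbour labels are made up for illustration):

```python
import numpy as np
from collections import Counter

# Labels y_i of the K = 5 nearest neighbours of some instance x
neighbour_labels = np.array([1, 1, 0, 1, 0])

# Majority voting: the predicted class c_j is the most frequent label
c_j = Counter(neighbour_labels.tolist()).most_common(1)[0][0]  # -> 1

# Misclassification rate: (1/K) * sum I(y_i != c_j)
rate = np.mean(neighbour_labels != c_j)                        # -> 0.4 = 1 - 3/5
```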
Example in practice
The final part is the kd-tree, a data structure and algorithm for finding the k nearest neighbours efficiently instead of scanning every training instance.
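While the kd-tree itself is left for that part, here is a sketch of the query interface using SciPy's existing cKDTree implementation (the random training points are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((1000, 2))           # 1000 training points in the unit square

tree = cKDTree(X_train)                   # build the kd-tree once
dists, idx = tree.query([0.5, 0.5], k=3)  # 3 nearest neighbours of the query point
```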
Goodbye!