Welcome to the Machine Learning Course for Black and Indigenous Students!
This program is offered by the Vector Institute as part of its drive to build research capacity and expand career pathways in AI for under-represented populations.
Instructor: Bonaventure Molokwu | Tutorial Developer: Manmeet Kaur Baxi | Course Tutors: Yinka Oladimeji and Manmeet Kaur Baxi | Course Director: Shingai Manjengwa (@Tjido)
Never stop learning!
K-Nearest Neighbours (KNN)
- A type of supervised ML algorithm that can be used for both classification and regression problems,
- in industry, mainly used for classification problems,
- based on feature similarity: a new point is labelled according to how closely its features resemble those of the training points (see the sketch below).
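Since the same idea serves both task types, here is a minimal sketch, assuming scikit-learn and NumPy are installed; the synthetic data and parameter values are made up for illustration:

```python
# KNN for classification and regression with scikit-learn.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

rng = np.random.default_rng(42)

# Classification: two clusters of 2-D points labelled 0 and 1.
X_cls = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y_cls = np.array([0] * 20 + [1] * 20)
clf = KNeighborsClassifier(n_neighbors=3).fit(X_cls, y_cls)
print(clf.predict([[0.5, 0.5], [4.2, 3.8]]))   # -> [0 1]

# Regression: predict y = 2x + noise from the mean of the nearest neighbours.
X_reg = rng.uniform(0, 10, (50, 1))
y_reg = 2 * X_reg.ravel() + rng.normal(0, 0.5, 50)
reg = KNeighborsRegressor(n_neighbors=3).fit(X_reg, y_reg)
print(reg.predict([[5.0]]))                    # approx. [10.]
```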
Properties of KNN:
- Lazy learning algorithm − KNN is a lazy learning algorithm because it has no specialized training phase: it simply stores the training data and defers all computation to classification time (see the timing sketch below).
- Non-parametric learning algorithm − KNN is also a non-parametric learning algorithm because it makes no assumptions about the underlying data distribution.
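A small sketch of the lazy-learning property, assuming scikit-learn is available; `algorithm='brute'` is set deliberately so that `fit()` only stores the data and all distance work happens in `predict()` (the data sizes are arbitrary):

```python
# Timing fit() vs predict() to show where KNN's work actually happens.
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(50_000, 20))
y = rng.integers(0, 2, size=50_000)

knn = KNeighborsClassifier(n_neighbors=5, algorithm="brute")

t0 = time.perf_counter()
knn.fit(X, y)                      # fast: essentially just stores X and y
t1 = time.perf_counter()
knn.predict(X[:100])               # slower: distances are computed here
t2 = time.perf_counter()

print(f"fit:     {t1 - t0:.4f} s")
print(f"predict: {t2 - t1:.4f} s")
```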
KNN Algorithm:
1. Load the data.
2. Initialize 'k' to your chosen number of neighbours ('k' can be any positive integer).
3. For each test example (implemented in the sketch after this list):
   3.1 Calculate the distance between the test example and each row of the training data, using one of these metrics: Euclidean, Manhattan, or Hamming (Euclidean is the most common).
   3.2 Sort the distances in ascending order.
   3.3 Choose the top 'k' rows from the sorted array.
   3.4 Assign the test point the most frequent class among these rows.
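The steps above fit in a few lines of code. A minimal from-scratch sketch using NumPy, Euclidean distance, and a majority vote; the function name `knn_predict` and the toy data are illustrative, not from any library:

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    # 3.1 Distance between the test point and every training row (Euclidean).
    distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # 3.2 + 3.3 Sort by distance (ascending) and keep the top k indices.
    nearest = np.argsort(distances)[:k]
    # 3.4 Majority vote among the k nearest labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny usage example with two well-separated classes.
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5]), k=3))  # -> 0
print(knn_predict(X_train, y_train, np.array([8.5, 8.5]), k=3))  # -> 1
```

Using `np.argsort` on the full distance array keeps the code short; for large training sets, `np.argpartition` would find the top-k neighbours without a full sort.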

Curse of Dimensionality (the problem of high-dimensional feature spaces):
- KNN performs better with a small number of features than with a large number of features.
- As the number of features increases, more data is required.
- Increase in dimension → overfitting (to avoid overfitting, the amount of training data needed grows with the number of dimensions); see the sketch after this list.
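A quick sketch of why this hurts KNN specifically, using synthetic uniform data (the sizes are arbitrary): as the dimension grows, the nearest and farthest training points end up almost equally far from a query, so "nearest neighbour" carries less and less information:

```python
# Distance concentration: relative contrast shrinks as dimension d grows.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
for d in [2, 10, 100, 1000]:
    X = rng.uniform(size=(n, d))          # n training points in d dimensions
    q = rng.uniform(size=d)               # a query point
    dist = np.sqrt(((X - q) ** 2).sum(axis=1))
    contrast = (dist.max() - dist.min()) / dist.min()
    print(f"d={d:>4}: relative distance contrast = {contrast:.3f}")
```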
How to deal with the Curse of Dimensionality?