Welcome to the Machine Learning Course for Black and Indigenous Students!
This program is offered by the Vector Institute as part of its drive to build research capacity and expand career pathways in AI for under-represented populations.
Instructor: Bonaventure Molokwu | Tutorial Developer: Manmeet Kaur Baxi | Course Tutors: Yinka Oladimeji and Manmeet Kaur Baxi | Course Director: Shingai Manjengwa (@Tjido)
Never stop learning!
K-Nearest Neighbours (KNN)
- A type of supervised ML algorithm that can be used for both classification and regression problems,
- in industry, mainly used for classification problems,
- based on feature similarity: a new point is labelled according to how closely its features resemble those of the training points (see the sketch below).
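Since the same idea serves both task types, here is a minimal sketch, assuming scikit-learn and NumPy are installed; the synthetic data and parameter values are made up for illustration:

```python
# KNN for classification and regression with scikit-learn.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

rng = np.random.default_rng(42)

# Classification: two clusters of 2-D points labelled 0 and 1.
X_cls = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y_cls = np.array([0] * 20 + [1] * 20)
clf = KNeighborsClassifier(n_neighbors=3).fit(X_cls, y_cls)
print(clf.predict([[0.5, 0.5], [4.2, 3.8]]))   # -> [0 1]

# Regression: predict y = 2x + noise from the mean of the nearest neighbours.
X_reg = rng.uniform(0, 10, (50, 1))
y_reg = 2 * X_reg.ravel() + rng.normal(0, 0.5, 50)
reg = KNeighborsRegressor(n_neighbors=3).fit(X_reg, y_reg)
print(reg.predict([[5.0]]))                    # approx. [10.]
```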
Properties of KNN:
- Lazy learning algorithm − KNN is a lazy learning algorithm because it has no specialized training phase: it simply stores the training data and defers all computation to classification time (see the timing sketch below).
- Non-parametric learning algorithm − KNN is also a non-parametric learning algorithm because it makes no assumptions about the underlying data distribution.
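A small sketch of the lazy-learning property, assuming scikit-learn is available; `algorithm='brute'` is set deliberately so that `fit()` only stores the data and all distance work happens in `predict()` (the data sizes are arbitrary):

```python
# Timing fit() vs predict() to show where KNN's work actually happens.
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(50_000, 20))
y = rng.integers(0, 2, size=50_000)

knn = KNeighborsClassifier(n_neighbors=5, algorithm="brute")

t0 = time.perf_counter()
knn.fit(X, y)                      # fast: essentially just stores X and y
t1 = time.perf_counter()
knn.predict(X[:100])               # slower: distances are computed here
t2 = time.perf_counter()

print(f"fit:     {t1 - t0:.4f} s")
print(f"predict: {t2 - t1:.4f} s")
```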
KNN Algorithm:
1. Load the data.
2. Initialize 'k' to your chosen number of neighbours ('k' can be any positive integer).
3. For each test example (implemented in the sketch after this list):
   3.1 Calculate the distance between the test example and each row of the training data, using one of these metrics: Euclidean, Manhattan, or Hamming (Euclidean is the most common).
   3.2 Sort the distances in ascending order.
   3.3 Choose the top 'k' rows from the sorted array.
   3.4 Assign the test point the most frequent class among these rows.
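The steps above fit in a few lines of code. A minimal from-scratch sketch using NumPy, Euclidean distance, and a majority vote; the function name `knn_predict` and the toy data are illustrative, not from any library:

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    # 3.1 Distance between the test point and every training row (Euclidean).
    distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # 3.2 + 3.3 Sort by distance (ascending) and keep the top k indices.
    nearest = np.argsort(distances)[:k]
    # 3.4 Majority vote among the k nearest labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny usage example with two well-separated classes.
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5]), k=3))  # -> 0
print(knn_predict(X_train, y_train, np.array([8.5, 8.5]), k=3))  # -> 1
```

Using `np.argsort` on the full distance array keeps the code short; for large training sets, `np.argpartition` would find the top-k neighbours without a full sort.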

Curse of Dimensionality (the problem of high-dimensional feature spaces):
- KNN performs better with a small number of features than with a large number of features.
- As the number of features increases, more data is required.
- Increase in dimension → overfitting (to avoid overfitting, the amount of training data needed grows with the number of dimensions); see the sketch after this list.
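A quick sketch of why this hurts KNN specifically, using synthetic uniform data (the sizes are arbitrary): as the dimension grows, the nearest and farthest training points end up almost equally far from a query, so "nearest neighbour" carries less and less information:

```python
# Distance concentration: relative contrast shrinks as dimension d grows.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
for d in [2, 10, 100, 1000]:
    X = rng.uniform(size=(n, d))          # n training points in d dimensions
    q = rng.uniform(size=d)               # a query point
    dist = np.sqrt(((X - q) ** 2).sum(axis=1))
    contrast = (dist.max() - dist.min()) / dist.min()
    print(f"d={d:>4}: relative distance contrast = {contrast:.3f}")
```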
How to deal with the Curse of Dimensionality?