Some common metrics are:
- Euclidean distance, which is the straight-line distance between two points in Euclidean space.
- Manhattan distance (city block distance), which is the distance between two points measured along axes at right angles.
- Chebyshev distance, where the distance between two vectors is the greatest of their differences along any coordinate dimension.
- Cosine distance, which measures the angle between two vectors.
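As a minimal sketch, the snippet below computes each of the four metrics above with NumPy; the two sample vectors are purely illustrative.

```python
# A minimal sketch of the four metrics above using NumPy;
# the two sample vectors are hypothetical.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))  # straight-line distance
manhattan = np.sum(np.abs(x - y))          # sum of axis-aligned differences
chebyshev = np.max(np.abs(x - y))          # largest coordinate-wise difference
cosine = 1 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))  # 1 - cos(angle)

print(euclidean, manhattan, chebyshev, cosine)
```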
Distance-based algorithms like KNN are used in many applications: recommender systems that suggest items with similar characteristics, text classification that detects similar messages such as spam, and medical diagnosis that classifies patient data.
- Euclidean distance: It is the most common general-purpose distance in low-dimensional spaces. It is very sensitive to the scale of the data, so features should usually be normalized first. Its formula: $$ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} $$
- Mahalanobis distance: It accounts for the correlation between features when calculating the distance, which makes it a good choice for datasets with correlated features (see the sketch after this list). Its formula, where $S$ is the covariance matrix of the data: $$ d(x, y) = \sqrt{(x - y)^T S^{-1} (x - y)} $$
- Cosine distance: It measures orientation rather than magnitude, so it is used in high-dimensional spaces where the direction of a vector matters more than its length. Its formula: $$ d(x, y) = 1 - \frac{x \cdot y}{\|x\| \, \|y\|} $$
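Below is a small NumPy sketch of the Mahalanobis distance, assuming a hypothetical 2-D dataset with correlated features; $S$ is estimated as the sample covariance matrix.

```python
# A sketch of the Mahalanobis distance on a hypothetical 2-D dataset;
# S is estimated as the sample covariance matrix of the data.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 2))
data[:, 1] += 0.8 * data[:, 0]  # introduce correlation between the features

S_inv = np.linalg.inv(np.cov(data, rowvar=False))  # inverse covariance matrix

x, y = data[0], data[1]
diff = x - y
d = np.sqrt(diff @ S_inv @ diff)  # sqrt((x - y)^T S^{-1} (x - y))
print(d)
```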
Cross-validation is used to assess how well a model generalizes. It is very helpful for tuning hyperparameters such as 'k' in KNN, and it helps guard against overfitting. We partition the data into several folds, train the model on all but one fold, and evaluate it on the held-out fold; since every data point lands in the test set exactly once, this gives a more reliable estimate of how well the model performs.
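As an illustration, the sketch below uses scikit-learn's `cross_val_score` with 5-fold cross-validation to compare a few candidate values of k; the Iris dataset is just a stand-in example.

```python
# A sketch of using k-fold cross-validation to tune k in KNN with
# scikit-learn; the Iris dataset is used only as a stand-in example.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold CV: each point lands in the test fold exactly once
    scores = cross_val_score(knn, X, y, cv=5)
    print(f"k={k}: mean accuracy = {scores.mean():.3f}")
```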
Bias is a model's inability to perform well because of the difference (error) between the model's predicted values and the actual values. A low-bias model fits the training dataset closely, while a high-bias model fails to match it. For KNN, a large k produces a smooth decision boundary and leads to higher bias, meaning the model is oversimplifying.
Variance is the amount by which the performance of a predictive model changes when it is trained on different subsets of the data; it measures how sensitive the model is to the particular training subset it sees. For KNN, a small k means the model is heavily influenced by noise and individual data points, which leads to high variance: the model performs well on training data but poorly on unseen data. The sketch below contrasts the two regimes.
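The following sketch contrasts a very small k (high variance: near-perfect training accuracy but weaker test accuracy) with a very large k (high bias: both scores drop as the boundary oversmooths); the dataset and split are illustrative.

```python
# A sketch contrasting small k (high variance) with large k (high bias)
# by comparing train vs. test accuracy; dataset and split are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for k in [1, 15, 60]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # small k: near-perfect train score, weaker test score (variance);
    # very large k: both scores drop as the boundary oversmooths (bias)
    print(f"k={k}: train={knn.score(X_tr, y_tr):.3f}, test={knn.score(X_te, y_te):.3f}")
```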