Machine Learning 101 -- Lecture Notes
27th April 2022
Introduction
I recently gave a short, informal lecture on the basics of machine learning (ML) to a bunch of friends. I have decided to publish the Jupyter notebook I used for the talk on this page as lecture notes, both for future reference and because I think it might be useful for other people as well. Be aware that some explanations might not be as in-depth as desired, so feel free to contact me if you have any questions!
You can find the notebook here, as well as a PDF version of it.
The Contents of the Notebook
General concepts introduced (a short code sketch follows the list):
- Supervised vs. unsupervised learning
- The importance of data preprocessing and visual inspection
- The train/test split
- Hyperparameter tuning based on 'elbow plots'
- The idea of ensemble models to quantify predictive uncertainty
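To make these concepts a bit more concrete, here is a minimal sketch (not taken from the notebook) of a train/test split and of using an ensemble, in this case a random forest, as a rough gauge of predictive uncertainty; the iris toy dataset and all parameter values are just placeholders. The elbow plot shows up in the second sketch further below.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Keep part of the data unseen during training so that model quality is
# judged on data the model has never encountered.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))

# Ensemble idea: the forest's class probabilities are averaged over its trees,
# so a low maximum probability means the individual trees disagree, i.e. the
# prediction is uncertain.
proba = forest.predict_proba(X_test)
print("least confident test prediction:", proba.max(axis=1).min())
```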
Specific methods used (again with a sketch after the list):
- Classification with a support vector machine
- Classification with a decision tree
- Classification with a random forest
- Dimensionality reduction using PCA
- Clustering using k-means
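And here is an equally minimal sketch of the specific methods, again on the iris toy dataset with illustrative parameters rather than the data and settings from the talk. It also includes the elbow plot mentioned above, used here to pick the number of clusters for k-means.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Supervised part: three classifiers, each evaluated on the held-out test set.
for name, clf in [
    ("support vector machine", SVC(kernel="rbf", C=1.0)),
    ("decision tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("random forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]:
    clf.fit(X_train, y_train)
    print(f"{name}: test accuracy = {clf.score(X_test, y_test):.2f}")

# Unsupervised part: reduce to two principal components, then cluster.
X_2d = PCA(n_components=2).fit_transform(X)

# Elbow plot: within-cluster sum of squares (inertia) vs. number of clusters k;
# the "kink" in the curve suggests a reasonable choice of k.
ks = range(1, 9)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_2d).inertia_ for k in ks]
plt.plot(ks, inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia")
plt.show()

# Cluster with the k suggested by the elbow and look at the result in 2D.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels)
plt.xlabel("first principal component")
plt.ylabel("second principal component")
plt.show()
```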
Afterthoughts
Just a few days ago, I used a regression tree for the very first time myself. Looking back, I think this might be a nice addition to a lecture like the one presented here. Thus, I'll likely write a few words about regression trees in a future post on this blog.
In hindsight, I should have also said a few words about the problem of under- vs. overfitting, i.e. the famous bias-variance tradeoff. I'd definitely include a remark about this in a future lecture of mine, as this is really a fundamental concept throughout ML.
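A hypothetical sketch of what such a remark could look like in code: sweeping the depth of a decision tree from very shallow to very deep typically shows the training accuracy creeping towards 1 while the test accuracy stalls or drops, which is overfitting in a nutshell (toy dataset and parameters chosen purely for illustration).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Sweep the tree depth: very shallow trees underfit (poor train and test
# accuracy), very deep trees overfit (train accuracy near 1, test accuracy worse).
for depth in (1, 2, 4, 8, 16, None):  # None lets the tree grow until the leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train accuracy = {tree.score(X_train, y_train):.2f}, "
          f"test accuracy = {tree.score(X_test, y_test):.2f}")
```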
Finally, my friends asked me whether I'd be willing to also present some kind of "Neural Networks 101" similar to the one shared here. I definitely am, and as soon as I have given that talk, you'll find the lecture notes on this blog as well :D!