Welcome to DSCI 571, an introductory supervised machine learning course! In this course we will focus on basic machine learning concepts such as data splitting, cross-validation, generalization error, overfitting, the fundamental trade-off, the golden rule, and data preprocessing. You will also be exposed to common machine learning algorithms such as decision trees, K-nearest neighbours, SVMs, naive Bayes, and logistic regression using the scikit-learn framework.
2020-21 instructor: Varada Kolhatkar
By the end of the course, students are expected to be able to:
- describe supervised learning and identify what kind of tasks it is suitable for;
- explain common machine learning concepts such as classification and regression, data splitting, overfitting, parameters and hyperparameters, and the golden rule;
- identify when and why to apply data pre-processing techniques such as imputation, scaling, and one-hot encoding;
- describe at a high level how common machine learning algorithms work, including decision trees, K-nearest neighbours, and naive Bayes;
- use Python and the scikit-learn package to responsibly develop end-to-end supervised machine learning pipelines on real-world datasets and to interpret the results carefully (a minimal sketch of such a pipeline appears below).
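To give a concrete sense of what such an end-to-end workflow looks like, here is a minimal scikit-learn sketch. The tiny toy dataset, column names, and hyperparameter values are invented purely for illustration; they are not part of the course materials.

```python
# A minimal, illustrative end-to-end workflow: split the data, preprocess
# inside a pipeline, cross-validate on the training set, and score on the
# test set only at the very end. The toy data below is invented.
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two numeric features (one with a missing value),
# one categorical feature, and a binary target.
df = pd.DataFrame(
    {
        "age":   [25, 32, 47, 51, 23, 38, 44, 29, 36, 41, 58, 19],
        "hours": [40, 35, 50, np.nan, 20, 60, 42, 38, 45, 55, 30, 25],
        "city":  ["Vancouver", "Kelowna", "Vancouver", "Victoria",
                  "Victoria", "Kelowna", "Vancouver", "Victoria",
                  "Kelowna", "Vancouver", "Victoria", "Kelowna"],
        "target": [0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0],
    }
)

# Split first, so no information from the test set leaks into training
# decisions (the "golden rule").
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=123
)

# Numeric columns: impute missing values, then scale.
# Categorical column: one-hot encode.
preprocessor = make_column_transformer(
    (make_pipeline(SimpleImputer(strategy="median"), StandardScaler()),
     ["age", "hours"]),
    (OneHotEncoder(handle_unknown="ignore"), ["city"]),
)
pipe = make_pipeline(preprocessor, DecisionTreeClassifier(max_depth=3))

# Estimate generalization performance with cross-validation on the training set.
scores = cross_val_score(pipe, X_train, y_train, cv=3)
print("Cross-validation scores:", scores)

# Fit on the full training set and evaluate once on the held-out test set.
pipe.fit(X_train, y_train)
print("Test accuracy:", pipe.score(X_test, y_test))
```

Keeping the preprocessing inside the pipeline means imputation and scaling are fit only on the training folds during cross-validation, which is one concrete way the golden rule shows up in practice.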
The following deliverables will determine your course grade:
| Assessment | Weight |
|---|---|
| Lab Assignment 1 | 15% |
| Lab Assignment 2 | 15% |
| Lab Assignment 3 | 15% |
| Lab Assignment 4 | 15% |
| Quiz 1 | 20% |
| Quiz 2 | 20% |
We will be meeting three times every week: twice for lectures and once for the lab.
Lectures in this course will combine pre-recorded videos with in-class discussions and activities. You are expected to watch the videos before the lecture; we'll spend the lecture time on group activities and Q&A sessions.
We are providing you with a conda environment file, which is available here. You can download this file, create a conda environment for the course, and activate it as follows.
conda env create -f env-dsci-571.yaml
conda activate 571
In order to use this environment in Jupyter, you will have to install nb_conda_kernels in the environment where you have installed Jupyter (typically the base environment). You will then be able to select this new environment in Jupyter.
Note that this is not a complete list of the packages we'll be using in the course; you may need to install a few additional packages with conda install later in the course. But this list is good enough to get you started.
- A Course in Machine Learning (CIML) by Hal Daumé III (also relevant for DSCI 572, 573, 575, 563)
- Introduction to Machine Learning with Python: A Guide for Data Scientists by Andreas C. Mueller and Sarah Guido.
- The Elements of Statistical Learning (ESL)
- Data Mining: Practical Machine Learning Tools and Techniques (PMLTT)
- Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.
- Artificial Intelligence 2E: Foundations of Computational Agents (2017) by David Poole and Alan Mackworth (of UBC!).
- Mike's CPSC 330: Mike is currently teaching an undergrad course on applied machine learning. Unlike DSCI 571, CPSC 330 is a semester-long course, but there is a lot of overlap and sharing of notes between these courses. You might find the course useful.
- Mike's CPSC 340
- Machine Learning (Andrew Ng's famous Coursera course)
- Foundations of Machine Learning online course from Bloomberg.
- Machine Learning Exercises In Python, Part 1 (translation of Andrew Ng's course to Python, also relevant for DSCI 561, 572, 563)
- A Visual Introduction to Machine Learning (Part 1)
- A Few Useful Things to Know About Machine Learning (an article by Pedro Domingos)
- Metacademy (sort of like a concept map for machine learning, with suggested resources)
- Machine Learning 101 (slides by Jason Mayes, engineer at Google)
Please see the general MDS policies.