For many of the participants, this was their first step into Machine Learning, Cloud/Systems, or both. This document provides some guidance on how to continue the learning journey. Everyone's learning journey is unique, and we would urge you to craft your own path. A few pointers that we would offer from our own limited experience are:
- Practice, Practice, Practice: Or, to simplify, just practice. Repeat the data science process many times and, as much as possible, write the code yourself, break things and see how to fix them.
- Understand the basics: Always investigate why the algorithms and libraries work the way they do, and build an intuition for what is happening in the process.
- Focus on a Data Science Portfolio: Work on at least 6 - 8 different projects over the next year to build a portfolio across data science topics.
- Build a public profile: It could be a blog, a website or GitHub repos, so that people can see what you are doing and connect if they wish.
- Do projects that interest you: You are more likely to stick with the long, arduous journey of finishing them. They could be in your work domain, or in sports, economics, politics, public policy, etc.
- Understand your learning style: Are you the hacker kind, who learns best by finding out how things work, or do you prefer structured, classroom-style learning? Knowing this will help you identify what works best for you when learning new concepts.
- Share what you do and get feedback: Put yourself out there. It may be a small project, a visualisation, an analysis, an experiment with a library, anything. Nothing is too small to share.
- Find a community of like-minded people: Most of our learning happens through peers. Build or join a community of peers with whom you can engage, teach and learn. It could be virtual, but it is even better if it is face to face.
- Find your own rhythm: Build and do something every day, week or month. Find your own cadence and start creating.
There are many resources on the web for continuing to learn data science. What we list here is a highly opinionated and limited take on the material available.
- Beginner: Python Data Science Handbook (GitHub) (Book) - Jake VanderPlas's book is a thorough introduction to doing data science in Python from a library perspective: numpy, pandas, matplotlib and scikit-learn. The material is very well documented and the notebooks are all available on GitHub.
- Intermediate: Introduction to Statistical Learning in R (Book) (Video) (GitHub) - Hastie and Tibshirani are the authors of many of the algorithms used in Machine Learning. This book is free and, even though all the applied material is in R, the concepts are very well explained. To follow along in Python, use the GitHub repo link, which implements every chapter in Python.
- Intermediate: Python Machine Learning (Book) (GitHub) - Sebastian Raschka writes prolifically on ML, and his blog itself is worth reading. This is an applied book with real-life examples, and it is useful for understanding how to scale to larger datasets.
- Advanced: Learning from Data (Slides) (Video) - Yaser Abu-Mostafa's course is the best course I have seen on understanding how Machine Learning really works. It is both intuitive and a bit mathematical. If you are keen to understand the theory of generalisation and how the algorithms really work, you will like this course.
- Our Teaching Material (GitHub): We have a fair number of teaching repos on different Machine Learning topics, and they are constantly being updated. If you liked this workshop, you may like some of them. We will also be updating The Art of Data Science, HackerMath for ML, and Applied Machine Learning as we do more workshops in the future, so you can come back later and check them out too.
- Introduction to Statistics - Hacker's approach: Think Stats
- Statistics - Theoretical Background: All of Statistics
- Statistics - Theoretical Background: Statistics is Easy
- Linear Algebra: Introduction to Linear Algebra by Gilbert Strang - 18.06, MIT OpenCourseWare