This repository grew out of teaching data science to students of various backgrounds and out of my practice in industry. I aspire to contribute to the understanding of this complex landscape and to teach people how to navigate it, develop valuable skills, and become more effective at problem-solving.
As outlined in the course website, we'll be contemplating in the library and engineering in the trenches, so here are lecture thumbnails, along with suggested practices and readings. I recommend starting your journey with the statistical fundamentals, as I re-contextualize and build on top of them towards more sophisticated, yet interpretable, models that can aid decision-makers. To learn more about the interdisciplinary approach to decision-making, read the course philosophy.
- Module I: Business Decisions and Data Science
- Module II: Probability and Statistics Fundamentals
- Module III: A/B Testing and Experiment Design
- Module IV: Bayesian Hierarchical Models
- Module V: Machine Learning and Special Topics
- Module VI: Full Stack Data Apps
The repository will go through many changes as we go through the journey together, but you can get a sneak peek of what it's about in the /playground directory.
The slides for the first, short module are complete in /slides and will be moved to and published on the course website repo.
- First, I justify why -- what we really want is "decision science"
- What is the course about and why should you care?
- Conversations about industries, domains, and applications
- Teaching approach, learning how to learn, and course philosophy
(Fig. 1) Learn what Pollock and Picasso have to do with statistics and ML.
(Fig. 2) Learn how everything you learned before fits together into a coherent whole.
The second lecture is also conceptual, as we explore and articulate the hard choices businesses face. I then bring some clarity to the big, interdisciplinary picture of AI.
- It is important to understand AI in the context of business decisions and strategy
- Read here for the difference between Analytics, Statistics, and ML.
- The lecture is filled with hard-learned lessons and multiple tools for figuring out a good strategy, both for the business and for AI
- R. Rumelt - The perils of bad strategy (McKinsey, 2011)
- K. Pretz - Stop Calling Everything AI, Machine-Learning Pioneer Says
- M. Jordan - Artificial Intelligence: The Revolution Hasn’t Happened Yet
First, you have to be confident and comfortable with your local development tooling. Invest an hour in understanding conda and typing in the commands yourself -- it will benefit you for a decade ahead!
- Walk through this tutorial: "Introduction to conda for (data) scientists". It will serve you well for exploration and experimentation.
    - For projects more focused on building data-driven applications, we will use `pip` and `poetry`.
    - We can use `conda` just for virtual environments and not for package management and dependency resolution / tracking.
    - Therefore, one has to pick an optimal approach for each project. Not great, but could be worse (as in `npm`).
- Read this old, but still relevant blog post about "Conda: Myths and Misconceptions"
- Read these two introductory articles on modules and packages
- Absolute vs Relative imports by Mbithe Nzomo
- Python Modules and Packages – An Introduction by John Sturtz
- IMPORTANT! For those of you working on Windows 10/11, here's the best set-up I know of, which involves WSL2. Here are the instructions
- Functional programming ideas in the context of numpy and pandas (see the sketch right after this list)
- The great and terrible matplotlib
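To make the functional-programming point a bit more tangible, here is a minimal sketch of what that style can look like with numpy and pandas: small, pure transformations composed via `.pipe()` and method chaining, rather than mutating a DataFrame step by step. The data and column names are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# toy data, invented purely for illustration
grades = pd.DataFrame({
    "student": ["ana", "bob", "chris", "dana"],
    "score": [7.5, 9.0, np.nan, 8.2],
})

def fill_missing(df: pd.DataFrame, value: float) -> pd.DataFrame:
    """Pure function: returns a new DataFrame instead of mutating the input."""
    return df.assign(score=df["score"].fillna(value))

def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Adds a z-score column computed from the filled scores."""
    z = (df["score"] - df["score"].mean()) / df["score"].std()
    return df.assign(z_score=z)

# compose the transformations into a readable, top-to-bottom pipeline
result = (
    grades
    .pipe(fill_missing, value=grades["score"].mean())
    .pipe(standardize)
    .sort_values("z_score", ascending=False)
)
print(result)
```

The payoff is that each step is a named, testable function, and the pipeline reads top to bottom like a description of the analysis.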
(Fig. 10) Practicing the tools for modeling and operationalization of models.
(Fig. 11) Getting comfortable with the idea of literate programming and learning the tools which make this whole zoo of technologies run harmoniously.
The third lecture is also conceptual, but in a more mathematical sense, as I attempt to build the bridge between reality and the language of uncertainty (probability theory).
(Fig. 3) How many people will show up to the safari? Notebook here.
(Fig. 4) We discussed the importance of visual storytelling: relevance, persuasiveness, truthfulness, and aesthetics.
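If you want a feel for the kind of question in Fig. 3 before opening the notebook, here is a minimal, self-contained simulation. I'm assuming a simple binomial model (each of the n people who booked shows up independently with probability p); the actual notebook may frame the problem differently, and the numbers below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical numbers, chosen only for illustration
n_booked = 50      # people who booked the safari
p_show = 0.85      # assumed probability that any one person shows up
n_sims = 100_000   # number of simulated days

# simulate how many people show up on each hypothetical day
attendance = rng.binomial(n=n_booked, p=p_show, size=n_sims)

print("expected attendance:", attendance.mean())
print("P(more than 45 show up):", (attendance > 45).mean())
```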
- Read about a few fundamental ideas and concepts in probability and why we need them here
- To assess if you need a refresher over probability and statistics, look at this study guide
There are three amazing resources which you can use as reference and inspiration for introductory to intermediate probability and mathematical statistics. All three have recorded video lectures; the first two also come with a freely available book and code:
- Probability 110 by Joe Blitzstein (Harvard), with R code. Great stories behind probabilities, numerous examples of applications, and accessible proofs.
- Probability for Data Science by Stanley Chan (Purdue), with python code. Amazing graphics, visualizations, accessible and extensive mathematical treatment.
- Probability by Santosh Venkatesh (University of Pennsylvania), once available on Coursera, now on YouTube. Great real-world examples from numerous domains and a gentle build-up towards more complicated concepts. Unfortunately, no code or book -- but you can combine this playlist with one of the above.
If you have `conda` installed on Linux, macOS, or WSL2 on Windows, the easiest way to play around with the notebook is to recreate the environment from the yml file. Then, you can either create a kernel or connect from VSCode notebooks to the environment and start hacking.
```bash
git clone https://github.com/bizovi/decision-making.git
cd decision-making/playground

conda env create --file conda-env.yml
conda activate gpa-prob

# if using jupyter lab, register the environment as a kernel
python -m ipykernel install --user \
    --name="gpa-kernel" \
    --display-name="Kernel for Simulations"

# run the test suite and see if everything works as expected
python -m pytest
```
Statistics is the art and science of changing your mind and actions in the face of evidence. We're going to declare our assumptions and apply Bayes' theorem to weigh the information from the data against our prior beliefs.
We're still in the land of probability and generative models, but a step closer to making inferences about parameters and latent quantities in order to answer the research questions.
(Fig. 5) Bayes' theorem and rare diseases: inverse probabilities and conditioning. Notebook here.
(Fig. 6) How confident am I that the code has no bugs after x tests pass? Grids and point estimates.
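As a quick companion to Fig. 5, here is a minimal sketch of the inverse-probability calculation behind the rare-disease example. The prevalence, sensitivity, and specificity values are hypothetical and not taken from the notebook.

```python
# P(disease | positive test) via Bayes' theorem, with made-up numbers
prevalence = 0.001   # P(disease) -- a hypothetical rare disease
sensitivity = 0.99   # P(positive | disease)
specificity = 0.95   # P(negative | no disease)

# total probability of testing positive
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' theorem: most positives come from the healthy majority
p_disease_given_positive = sensitivity * prevalence / p_positive
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")  # ~0.019
```

Even with a very accurate test, the posterior probability stays low because the disease is so rare -- the counter-intuitive point about inverse probabilities and conditioning.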
It's time we move away from point estimates towards a full posterior distribution, which captures the uncertainty in our estimates and can be used to make predictions about the observable quantities.
A few important ideas to add to your conceptual understanding:
- Parameter (estimand), estimator, estimation
- De Moivre: "The most dangerous equation": are U.S. schools too big?
- What does a statistician want? Properties of estimators.
- Most practical applications won't have an analytic solution, so we have to use a probabilistic programming language like pymc to draw samples from the posterior (see the sketch right after this list)
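To make that last point concrete, here is a minimal sketch of posterior sampling with pymc for a toy beta-binomial model (7 successes out of 10 trials under a flat prior). The model and numbers are illustrative rather than taken from the course notebooks, and the code assumes a recent pymc (v4+), where `pm.sample` returns an ArviZ `InferenceData` object.

```python
import arviz as az
import pymc as pm

with pm.Model() as model:
    # flat Beta(1, 1) prior over the unknown success probability
    theta = pm.Beta("theta", alpha=1.0, beta=1.0)
    # likelihood: 7 successes observed out of 10 trials
    y = pm.Binomial("y", n=10, p=theta, observed=7)
    # draw samples from the posterior with the default NUTS sampler
    idata = pm.sample(2000, tune=1000, chains=4, random_seed=42)

print(az.summary(idata, var_names=["theta"]))
```

This toy model actually has a conjugate, analytic posterior -- Beta(8, 4) -- which makes it a convenient sanity check for the sampler before moving on to models where no closed form exists.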
(Fig. 7) The greatest theorem never told, adapted and refactored from Cam Davidson-Pilon (upcoming!).
(Fig. 8) Conjugate priors and the idea of Bayesian updating. Full luxury Bayes: automatic sampling, thoughtful modeling.