Trotro MLops: Predicting the duration of a taxi ride

Introduction

In the realm of urban mobility, accurately predicting taxi ride durations is crucial for optimizing logistics and enhancing user experience.

The Trotro MLops project is designed to apply Machine Learning techniques to predicting taxi ride durations. By adhering to MLOps principles, the project aims to make model development, deployment, and monitoring efficient, so that its predictions are both reliable and scalable.

Machine Learning Modeling and Experiment Tracking

The dataset for training, validation, and scoring was downloaded from the New York City Taxi and Limousine Commission (TLC). The January 2024 data was used for training, and the February 2024 data was used for validation.
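As a rough illustration of how the prediction target and the month-based split could be derived, the sketch below computes a ride's duration in minutes from its pickup and drop-off timestamps and maps January 2024 to training and February 2024 to validation. The function names and the use of plain `datetime` objects are assumptions made for illustration; the project's actual preprocessing may differ.

```python
from datetime import datetime
from typing import Optional

# Month-to-split mapping taken from the description above.
SPLIT_BY_MONTH = {(2024, 1): "train", (2024, 2): "validation"}


def duration_minutes(pickup: datetime, dropoff: datetime) -> float:
    """Return the ride duration in minutes (hypothetical helper)."""
    return (dropoff - pickup).total_seconds() / 60.0


def assign_split(pickup: datetime) -> Optional[str]:
    """Return the split a ride belongs to, or None if its month is unused."""
    return SPLIT_BY_MONTH.get((pickup.year, pickup.month))
```

A ride picked up on 10 February 2024 would land in the validation split, while a ride from December 2023 would be ignored by this mapping.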

Hyperparameter tuning with Bayesian optimization was performed, leveraging the hyperopt Python library. The final model was trained with the best hyperparameters and full training data, which included both the training and validation datasets. The March 2024 data was used for scoring the model.

Experiment variables, features, metrics, parameters, and models are tracked with MLflow. The artifacts saved with MLflow include plots of the predictions from an XGBoost regressor on the training and validation sets.

[Figures: Training Dataset Prediction; Validation Dataset Prediction]

Future work:

Feature importance analysis and interpretability with tools such as SHAP and InterpretML
Model fairness evaluation with Fairlearn
Model explanation and interpretability with counterfactuals, using DiCE

See source code here

Orchestration

The Machine Learning pipeline orchestration was built to run on Microsoft Azure Machine Learning. This implementation is similar to the project found here. For easier reproducibility, this project uses GitHub Actions as a build system to orchestrate model hyperparameter tuning, training, and scoring.

Unit tests have been implemented using pytest to validate the core functionality of the orchestration.
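A pytest unit test is a plain function whose name starts with `test_` and which uses bare `assert` statements. The sketch below is illustrative only: `PIPELINE_STEPS` and `next_step` are hypothetical names standing in for whatever orchestration logic the project actually tests.

```python
# Hypothetical ordering of the orchestrated steps: tuning, training, scoring.
PIPELINE_STEPS = ("tune", "train", "score")


def next_step(current):
    """Return the step that follows `current`, or None after the last step."""
    index = PIPELINE_STEPS.index(current)  # raises ValueError if unknown
    if index + 1 < len(PIPELINE_STEPS):
        return PIPELINE_STEPS[index + 1]
    return None


def test_training_follows_tuning():
    assert next_step("tune") == "train"


def test_scoring_is_the_last_step():
    assert next_step("score") is None
```

Saved in a file whose name starts with `test_`, these functions would be collected and run by the `pytest` command.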

See source code here

Deployment

MLOps engineers must evaluate the model against specific quality metrics before deploying it to staging or production. In this project, the model is served via a Flask web service and containerized using Docker.
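A Flask prediction service of this kind might look like the sketch below. The `/predict` route, the `trip_distance` field, and the `predict_duration` placeholder are assumptions for illustration; the real service would load the trained model rather than use a fixed heuristic.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_duration(ride):
    """Placeholder for the trained model: a hypothetical 3 minutes per mile."""
    return float(ride["trip_distance"]) * 3.0


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a missing or bad JSON body.
    ride = request.get_json(silent=True)
    if not ride or "trip_distance" not in ride:
        return jsonify({"error": "trip_distance is required"}), 400
    return jsonify({"duration": predict_duration(ride)})
```

Keeping the model call behind a small function such as `predict_duration` makes it straightforward to swap the placeholder for a real model and to exercise the route with Flask's test client.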

Integration test

GitHub Actions is used to set up an integration test that:

Builds the model
Containerizes the model
Calls the web service endpoint in the Docker container
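The final step, calling the endpoint in the running container, could be scripted roughly as follows using only the standard library. The URL, the payload fields, and the `duration` response key are assumptions for illustration and are not taken from the project.

```python
import json
from urllib import request


def build_request(url, ride):
    """Build a JSON POST request for the prediction endpoint."""
    body = json.dumps(ride).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_prediction_endpoint(url, ride, timeout=10.0):
    """Send the ride to the service and return the decoded JSON response."""
    with request.urlopen(build_request(url, ride), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_prediction(payload):
    """Return True if the response carries a positive numeric duration."""
    duration = payload.get("duration")
    return isinstance(duration, (int, float)) and duration > 0
```

In a workflow, a script like this would run after the container has started, and a failed check would fail the job.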

See source code here
