An MLOps project for predicting when a taxi will arrive at a trip destination

kennedyopokuasare/trotro_mlops

Trotro MLops: Predicting the duration of a taxi ride

Introduction

In the realm of urban mobility, accurately predicting taxi ride durations is crucial for optimizing logistics and enhancing user experience.

The Trotro MLops project is designed to use Machine Learning techniques to predict taxi ride durations. By adhering to MLOps principles, it aims to support efficient model development, deployment, and monitoring, so that predictions are both reliable and scalable.

Machine Learning Modeling and Experiment Tracking

The dataset for training, testing, and validation was downloaded from the New York City Taxi and Limousine Commission. The January 2024 data was used for training, and the February 2024 data was used for validation.

Hyperparameter tuning with Bayesian optimization was performed, leveraging the hyperopt Python library. The final model was trained with the best hyperparameters and full training data, which included both the training and validation datasets. The March 2024 data was used for scoring the model.

Experiment variables, features, metrics, parameters, and models are tracked with MLflow. The artifacts saved with MLflow include plots of the predictions of an XGBoost regressor on the training and validation sets.

[Figures: training dataset prediction plot; validation dataset prediction plot]

Future work:

  • Feature importance analysis and interpretability with tools such as SHAP and InterpretML
  • Model fairness evaluation with Fairlearn
  • Model explanation and interpretability with counterfactuals, using DiCE

See source code here

Orchestration

Machine Learning pipeline orchestration was built to run with Microsoft Azure Machine Learning. This implementation is similar to the project found here. For easier reproducibility, this project uses GitHub Actions as a build system to orchestrate the model hyperparameter tuning, training, and scoring.

Unit tests have been implemented using pytest to validate the core functionality of the orchestration.
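As an illustration of what such a unit test might look like, here is a minimal pytest-style sketch. The helper `trip_duration_minutes` is hypothetical and is not taken from this repository; it simply stands in for a piece of pipeline logic that computes the prediction target from pickup and dropoff timestamps.

```python
from datetime import datetime


def trip_duration_minutes(pickup: datetime, dropoff: datetime) -> float:
    """Hypothetical helper: ride duration in minutes."""
    if dropoff < pickup:
        raise ValueError("dropoff precedes pickup")
    return (dropoff - pickup).total_seconds() / 60.0


def test_trip_duration_minutes():
    pickup = datetime(2024, 1, 1, 8, 0, 0)
    dropoff = datetime(2024, 1, 1, 8, 12, 30)
    # 12 minutes 30 seconds is 750 seconds, i.e. 12.5 minutes.
    assert trip_duration_minutes(pickup, dropoff) == 12.5


def test_rejects_negative_duration():
    pickup = datetime(2024, 1, 1, 8, 0, 0)
    dropoff = datetime(2024, 1, 1, 7, 59, 0)
    try:
        trip_duration_minutes(pickup, dropoff)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

pytest collects any function whose name starts with `test_`, so saving this in a file such as `test_duration.py` (an assumed name) and running `pytest` would execute both checks. With pytest imported, the second test could use `pytest.raises(ValueError)` instead of the manual `try`/`except`.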

See source code here

Deployment

MLOps engineers must evaluate the model against specific quality metrics before deploying it to staging or production. In this project, the model is served via a Flask web service and containerized using Docker.
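A quality gate of the kind mentioned above could be sketched as follows. The README does not state which metrics or thresholds the project uses, so the choice of RMSE and the `max_rmse_minutes` default are assumptions made purely for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted durations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))


def passes_quality_gate(y_true, y_pred, max_rmse_minutes=6.0):
    """Return True if the model is good enough to deploy.

    The 6-minute default is a placeholder, not a value from the project.
    """
    return rmse(y_true, y_pred) <= max_rmse_minutes


if __name__ == "__main__":
    actual = [10.0, 20.0]
    predicted = [13.0, 16.0]
    print(rmse(actual, predicted), passes_quality_gate(actual, predicted))
```

A deployment workflow could call `passes_quality_gate` on the scoring data and stop before the Docker build when it returns False.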

Integration test

GitHub Actions is used to set up an integration test that:

  • Builds the model
  • Containerizes the model
  • Calls the web service endpoint in the Docker container
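The last step of the list above, calling the containerized web service, might look roughly like this. The `/predict` route, the port `9696`, the request fields, and the `duration` key in the response are all assumptions, since the README does not document the service's interface. Only the Python standard library is used.

```python
import json
import urllib.request


def build_predict_request(base_url, ride):
    """Build a JSON POST request for the (assumed) /predict route."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=json.dumps(ride).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_predict(base_url, ride, timeout=10.0):
    """Send the ride to the running container and return the parsed reply."""
    request = build_predict_request(base_url, ride)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_prediction(payload):
    """Fail unless the reply holds a positive duration (assumed response key)."""
    duration = payload["duration"]
    assert isinstance(duration, (int, float)) and duration > 0
    return float(duration)


if __name__ == "__main__":
    # Hypothetical input schema; requires the container to be listening locally.
    ride = {"PULocationID": 10, "DOLocationID": 50, "trip_distance": 4.2}
    print(check_prediction(call_predict("http://localhost:9696", ride)))
```

Separating `build_predict_request` from `call_predict` keeps the request construction checkable without a running container.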

Deployment GitHub Actions workflow

See source code here
