In the realm of urban mobility, accurately predicting taxi ride durations is crucial for optimizing logistics and enhancing user experience.
Trotro MLOps is a project designed to leverage machine learning techniques to predict taxi ride durations. By adhering to MLOps principles, the project aims to facilitate efficient model development, deployment, and monitoring, so that its predictions are both reliable and scalable.
The dataset for training, testing, and validation was downloaded from the New York City Taxi and Limousine Commission (TLC). The January 2024 data was used for training, and the February 2024 data was used for validation.
Hyperparameter tuning with Bayesian optimization was performed using the hyperopt Python library. The final model was then trained with the best hyperparameters on the full training data, which combined the training and validation datasets. The March 2024 data was used for scoring the model.
Experiment variables, features, metrics, parameters, and models are tracked with MLflow. The artifacts saved with MLflow include plots of the XGBoost regressor's predictions on the training and validation sets.
Training Dataset Prediction | Validation Dataset Prediction |
---|---|
![]() | ![]() |
Future work:
- Feature importance analysis and interpretability with tools such as SHAP and InterpretML
- Model fairness evaluation with Fairlearn
- Model explanation and interpretability with counterfactuals, using DiCE
The machine learning pipeline orchestration was built to run on Microsoft Azure Machine Learning. This implementation is similar to the project found here. For easier reproducibility, this project uses GitHub Actions as a build system to orchestrate model hyperparameter tuning, training, and scoring.
Unit tests have been implemented using pytest to validate the core functionality of the orchestration.
MLOps engineers must evaluate the model against specific quality metrics before deploying it to staging or production. In this project, the model is served via a Flask web service and containerized using Docker.
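A minimal sketch of what such a Flask service could look like is shown below. The `/predict` route, the `trip_distance` field, and the `predict_duration` stand-in (a fixed pace of 3 minutes per mile instead of the trained model) are assumptions for illustration, not the project's actual interface.

```python
from flask import Flask, jsonify, request

app = Flask("duration-prediction")


def predict_duration(ride):
    """Hypothetical stand-in for the trained model: 3 minutes per mile."""
    return 3.0 * float(ride["trip_distance"])


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Parse the JSON ride description and return the predicted duration.
    ride = request.get_json()
    return jsonify({"duration": predict_duration(ride)})


# Exercise the endpoint in-process with Flask's test client (no server needed).
with app.test_client() as client:
    response = client.post("/predict", json={"trip_distance": 2.5})
    result = response.get_json()
```

Inside the Docker image, the same `app` object would be started by whichever server the container's entry point runs.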
GitHub Actions is used to set up an integration test that:
- Builds the model
- Containerizes the model
- Calls the web service endpoint in the Docker container
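The last step, calling the endpoint, can be sketched with the standard library alone. Here a small local HTTP server stands in for the Docker container so that the example runs anywhere; the `/predict` path, the payload fields, and the stub's response are assumptions, not the project's actual contract.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def call_prediction_service(url, ride):
    """POST a ride as JSON to the service and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(ride).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any configured proxy, since the service runs on the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


class StubHandler(BaseHTTPRequestHandler):
    """Local stand-in for the containerized service (hypothetical behavior)."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        ride = json.loads(self.rfile.read(length))
        body = json.dumps({"duration": 3.0 * ride["trip_distance"]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the output quiet


server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/predict"
prediction = call_prediction_service(url, {"trip_distance": 2.5})

server.shutdown()
server.server_close()

# The integration check itself: the service answered with a duration.
assert "duration" in prediction
```

In the GitHub Actions workflow, the stub server would be replaced by the running container, and `url` would point at the port the container publishes.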