Merge remote-tracking branch 'refs/remotes/origin/main'
khanhvynguyen committed Dec 9, 2024
2 parents 2d9081b + 257dc95 · commit c50d70f
Showing 12 changed files with 151 additions and 120 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/model.yml
@@ -88,7 +88,7 @@ jobs:
gcloud ai custom-jobs create \
--region=${{ secrets.GOOGLE_CLOUD_REGION }} \
--display-name=model-training \
--args="--data-path=gs://${{ secrets.GCS_BUCKET_NAME }}/data/training_data.csv --epochs=10 --batch-size=32" \
--args="--data-path=gs://${{ secrets.GCS_BUCKET_NAME }}/pipeline/airflow/dags/data/scaled_data_train.csv --epochs=10 --batch-size=32" \
--python-package-uris=gs://${{ secrets.GCS_BUCKET_NAME }}/trainer/trainer-0.1.tar.gz \
--worker-pool-spec="machine-type=e2-standard-4,executor-image-uri=us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-9:latest,python-module=trainer.task"
92 changes: 86 additions & 6 deletions Assignments_Submissions/Model Deployment/README.md
@@ -1,4 +1,4 @@
-# Model Deployment
+# Model Deployment Phase

## Table of Contents

@@ -25,18 +25,34 @@ Steps for Replication
- GitHub Actions automatically triggers the CI/CD pipeline to initiate deployment using Cloud Build.
- The pipeline executes pre-defined steps, ensuring the model is correctly deployed to Vertex AI.
- Confirm that all dependencies are installed locally or in the CI/CD pipeline environment.

#### 3. **Verifying Deployment**
- Access the Vertex AI console to verify the deployment.
- Test the deployed model endpoint to confirm successful deployment and validate model predictions.
- Review monitoring dashboards to ensure no issues with prediction outputs or feature drift.
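
As a quick check, the endpoint can also be exercised from code. Below is a minimal sketch using the Vertex AI Python SDK; the project, region, endpoint ID, and instance values are placeholders, and the feature layout must match whatever the model was trained on:

```python
# Hedged sketch: send a test prediction to the deployed endpoint.
# Project, region, endpoint ID, and feature values are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-east1")

# Look up the endpoint by its numeric ID (shown in the Vertex AI console).
endpoint = aiplatform.Endpoint("1234567890")

# One instance per row, in the feature order the model expects.
response = endpoint.predict(instances=[[0.12, 0.34, 0.56, 0.78]])
print(response.predictions)
```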

![Logging Dashboard](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Logging%20Dashboard.png)

![Drift Detection Logging](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Drift%20Detection%20logging.png)

---

## Model Serving and Deployment
Models are served with Vertex AI by deploying trained models, managing versions, and automating deployments through CI/CD pipelines.

Workflows and setups for managing machine learning pipelines on Vertex AI in Google Cloud are as follows:

1. **Jupyter Notebooks in Vertex AI Workbench**:
- The setup includes instances like `group10-test-vy` and `mlops-group10`, both configured with NumPy/SciPy and scikit-learn environments. These notebooks are GPU-enabled, making them well suited for compute-intensive ML operations.

2. **Training Pipelines**:
- Multiple training pipelines are orchestrated on Vertex AI, such as `mlops-group10` and `group10-model-train`. These are primarily custom training pipelines aimed at tasks like hyperparameter tuning, training, and validation, leveraging the scalability of Google Cloud's infrastructure.

3. **Metadata Management**:
- Metadata tracking is managed through Vertex AI Metadata Store, with records such as `vertex_dataset`. This ensures reproducibility and streamlined monitoring of all artifacts produced during ML workflows.

4. **Model Registry**:
- Deployed models like `mlops-group10-deploy` and `group10-model` are maintained in the model registry. The registry supports versioning and deployment tracking for consistency and monitoring.

5. **Endpoints for Online Prediction**:
- Various endpoints, such as `mlops-group10-deploy` and `testt`, are active and ready for predictions. The setup is optimized for real-time online predictions, and monitoring can be enabled for anomaly detection or drift detection.

### Steps for Deployment of Trained Models
1. **Model Registration**: Once a model is trained, register it in Vertex AI's Model Registry. Specify the model name, version, and any relevant metadata.
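
A minimal sketch of step 1 using the Vertex AI Python SDK; the display name follows the registry entries described above, while the project, artifact path, and serving image are illustrative assumptions:

```python
# Hedged sketch: register a trained model in the Vertex AI Model Registry.
# Project, artifact path, and serving image are illustrative placeholders.
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-east1")

model = aiplatform.Model.upload(
    display_name="group10-model",
    artifact_uri="gs://your-bucket/model/",  # directory holding the saved model
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
)
print(model.resource_name)  # projects/.../models/...
```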
@@ -72,17 +88,78 @@ To serve models using Vertex AI by deploying trained models, managing versions,
- **GitHub Actions**: Configure workflows in `.github/workflows/` directory to automate testing, building, and deploying models.
- **Cloud Build**: Create a `cloudbuild.yaml` file specifying steps to build, test, and deploy models based on changes in the repository.


![GitHub Actions CI/CD](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Github%20Actions%20CICD.png)

---

#### Automated Deployment Scripts
- **Script Functions**:
- **Pull the Latest Model**: Scripts should fetch the latest model version from Vertex AI Model Registry or a specified repository.
- **Deploy or Update Model**: Automate the deployment of the model to the configured Vertex AI endpoint.
- **Monitor and Log**: Set up logging for deployment status to ensure visibility and troubleshooting capabilities.
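
A minimal sketch of such a script, reusing the `group10-model` and `mlops-group10-deploy` display names from the registry and endpoint sections above; project, region, and machine type are assumptions:

```python
# Hedged sketch of an automated deployment script: pull the newest registered
# model, deploy (or update) it on an endpoint, and log the outcome.
import logging

from google.cloud import aiplatform

logging.basicConfig(level=logging.INFO)
aiplatform.init(project="your-project-id", location="us-east1")

# 1. Pull the latest model version from the Model Registry.
models = aiplatform.Model.list(
    filter='display_name="group10-model"', order_by="update_time desc"
)
latest_model = models[0]

# 2. Deploy or update: reuse the existing endpoint if present, else create one.
endpoints = aiplatform.Endpoint.list(filter='display_name="mlops-group10-deploy"')
endpoint = endpoints[0] if endpoints else aiplatform.Endpoint.create(
    display_name="mlops-group10-deploy"
)
endpoint.deploy(model=latest_model, machine_type="e2-standard-4", traffic_percentage=100)

# 3. Log deployment status for visibility and troubleshooting.
logging.info("Deployed %s to %s", latest_model.resource_name, endpoint.resource_name)
```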

---
#### **1. `airflowtrigger.yaml`**
- **Purpose**: Triggers and manages Apache Airflow DAG workflows.
- **Steps**:
- **Set up environment**: Installs Python, dependencies, and Docker Compose.
- **Airflow initialization**: Starts Airflow services and checks their status.
- **DAG management**: Lists, triggers, and monitors DAG execution (success or failure).
- **Cleanup**: Stops Airflow services and removes unnecessary files.
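
For illustration, the DAG-management steps above can be approximated against Airflow's stable REST API; the host, credentials, and DAG id below are assumptions (the credentials shown are the docker-compose defaults):

```python
# Hedged sketch: trigger an Airflow DAG run and poll it to completion.
# Host, credentials, and DAG id are placeholder assumptions.
import time

import requests

BASE = "http://localhost:8080/api/v1"
AUTH = ("airflow", "airflow")        # default docker-compose credentials
DAG_ID = "stock_price_pipeline"      # hypothetical DAG id

# Trigger a new DAG run.
run = requests.post(f"{BASE}/dags/{DAG_ID}/dagRuns", json={}, auth=AUTH).json()

# Poll until the run succeeds or fails.
while True:
    state = requests.get(
        f"{BASE}/dags/{DAG_ID}/dagRuns/{run['dag_run_id']}", auth=AUTH
    ).json()["state"]
    if state in ("success", "failed"):
        break
    time.sleep(30)
print(f"DAG run finished with state: {state}")
```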

---

#### **2. `deploy.yaml`**
- **Purpose**: Deploys and monitors a machine learning model on Vertex AI.
- **Steps**:
- **Environment setup**: Configures Google Cloud SDK using secrets.
- **Model deployment**: Deploys a trained model to Vertex AI endpoints.
- **Monitoring**: Fetches the latest model and endpoint IDs and sets them for further monitoring.

---

#### **3. `model.yml`**
- **Purpose**: Handles training and packaging a machine learning model for deployment.
- **Steps**:
- **Trainer creation**: Builds a Python package (`trainer`) for model training.
- **Package upload**: Uploads the trainer package to Google Cloud Storage.
- **Training job**: Triggers a Vertex AI custom training job using the uploaded package.
- **Notification**: Indicates the completion of the training process.
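
For reference, the same training job can be expressed with the Vertex AI SDK instead of the workflow's `gcloud` call; the bucket and project values are placeholders, while the module name, container image, and arguments mirror the workflow:

```python
# Hedged sketch: SDK equivalent of the gcloud custom-jobs command in model.yml.
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-east1",
                staging_bucket="gs://your-bucket")

job = aiplatform.CustomPythonPackageTrainingJob(
    display_name="model-training",
    python_package_gcs_uri="gs://your-bucket/trainer/trainer-0.1.tar.gz",
    python_module_name="trainer.task",
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-9:latest",
)

job.run(
    args=[
        "--data-path=gs://your-bucket/pipeline/airflow/dags/data/scaled_data_train.csv",
        "--epochs=10",
        "--batch-size=32",
    ],
    machine_type="e2-standard-4",
    replica_count=1,
)
```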

---

#### **4. `PyTest.yaml`**
- **Purpose**: Runs Python unit tests and generates test coverage reports.
- **Steps**:
- **Environment setup**: Installs dependencies and Google Cloud CLI.
- **Testing**: Runs tests with pytest, generates coverage reports, and uploads them as artifacts.
- **Upload results**: Saves coverage reports to a GCP bucket for review.
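
A sketch of the kind of unit test this workflow might run (the workflow would invoke something like `pytest --cov --cov-report=xml`); the import path for `scaler` is an assumption about the repository layout:

```python
# Hedged sketch of a unit test PyTest.yaml could run; the scaler import path
# and its min-max scaling behavior are assumptions.
import pandas as pd

from pipeline.airflow.dags.src.scaler import scaler  # hypothetical path


def test_scaler_produces_bounded_values():
    df = pd.DataFrame({"close": [10.0, 20.0, 30.0], "volume": [1.0, 2.0, 3.0]})
    scaled = pd.DataFrame(scaler(df))
    # Min-max style scaling should keep every feature within [0, 1].
    assert ((scaled >= 0) & (scaled <= 1)).all().all()
```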

---

#### **5. `syncgcp.yaml`**
- **Purpose**: Synchronizes local artifacts and Airflow DAGs with a Google Cloud Storage bucket.
- **Steps**:
- **Environment setup**: Installs the Google Cloud CLI and authenticates with a service account.
- **File uploads**:
- Uploads specific artifacts and files to predefined GCP bucket locations.
- Synchronizes repository content with the bucket directory structure.
- **Verification**: Lists uploaded files to confirm the sync.
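
A minimal sketch of one upload-and-verify round trip, assuming a placeholder bucket name and the repository-mirroring layout described above:

```python
# Hedged sketch: upload a local artifact to the bucket layout syncgcp.yaml
# targets, then list blobs to verify. Bucket name and paths are placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-bucket-name")

# Mirror the repo path inside the bucket so the DAG folder structure matches.
local_path = "pipeline/airflow/dags/data/scaled_data_train.csv"
bucket.blob(local_path).upload_from_filename(local_path)

# Verify the upload, as the workflow's final listing step does.
print([b.name for b in client.list_blobs("your-bucket-name", prefix="pipeline/")])
```
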
---

![GitHub Actions](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Github%20Workflows.png)


#### Summary
These YAML workflows automate various aspects of an ML lifecycle:
1. **`airflowtrigger.yaml`**: Airflow DAG management.
2. **`deploy.yaml`**: Vertex AI deployment and monitoring.
3. **`model.yml`**: Training pipeline and GCS uploads.
4. **`PyTest.yaml`**: Testing and reporting.
5. **`syncgcp.yaml`**: Artifact and DAG synchronization with GCP.

Each workflow is tailored to a specific task in CI/CD for ML operations, leveraging GitHub Actions and Google Cloud services.

---

## Monitoring and Maintenance

@@ -95,18 +172,21 @@ To serve models using Vertex AI by deploying trained models, managing versions,

![Model Monitoring Anomalies](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Model%20monitoring%20Anomolies.png)

The provided images highlight the active setup and management of a Vertex AI model monitoring system. Files like `anomalies.json` and `anomalies.textproto` document issues identified in the input data. The structure also includes folders such as `baseline`, `logs`, and `metrics`, which organize monitoring data for later analysis. A notification email confirms the creation of a model monitoring job for a specific Vertex AI endpoint, providing essential details such as the endpoint name, a link to the monitoring job, and the GCS bucket path where statistics and anomalies will be saved.

2. **Maintenance**:
- Pre-configured thresholds on model performance trigger retraining or redeployment of updated models (see the sketch after this list).
- Logs and alerts from Vertex AI and Cloud Build ensure the system remains reliable and scalable.
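
A minimal sketch of how such a threshold check might gate retraining; the anomaly-file path, its JSON structure, and the drift threshold are illustrative assumptions:

```python
# Hedged sketch: gate retraining on a drift threshold. The anomalies.json
# path, its structure, and the 0.3 threshold are illustrative assumptions.
import json

from google.cloud import storage

DRIFT_THRESHOLD = 0.3  # assumed alerting threshold

client = storage.Client()
blob = client.bucket("your-bucket-name").blob("monitoring/anomalies.json")
anomalies = json.loads(blob.download_as_text())

# Retrain when any monitored feature drifts past the threshold.
drifted = [
    feature
    for feature, score in anomalies.get("drift_scores", {}).items()
    if score > DRIFT_THRESHOLD
]
if drifted:
    print(f"Drift detected in {drifted}; trigger the training workflow here.")
```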

![Monitor Details](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Monitor%20details.png)

![Logging Dashboard](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Logging%20Dashboard.png)

![Monitor Feature Detection](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Monitor%20feature%20detection.png)


![Monitor Drift Detection](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Monitor%20drift%20detection.png)


---


Binary file removed artifacts/models/best_linear_regression_model.joblib
Binary file not shown.
Binary file added assets/Cloud build trigger.png
Binary file added assets/Github Workflows.png
Empty file removed pipeline/airflow/dags/docker.yaml
2 changes: 1 addition & 1 deletion pipeline/airflow/dags/src/upload_blob.py
@@ -74,4 +74,4 @@ def upload_blob(data, gcs_file_path: str = None):
technical_indicators_data = add_technical_indicators(feature_interactions_data)
scaled_data = scaler(technical_indicators_data)
upload_blob(scaled_data)
print("done!!")
# print("done!!")
23 changes: 18 additions & 5 deletions pipeline/airflow/docker-compose.yaml
@@ -117,21 +117,34 @@ services:
       retries: 50
     restart: always
 
+  # airflow-webserver:
+  #   <<: *airflow-common
+  #   command: webserver
+  #   ports:
+  #     - 8080:8080
+  #   healthcheck:
+  #     test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
+  #     interval: 10s
+  #     timeout: 10s
+  #     retries: 5
+  #   restart: always
+  #   depends_on:
+  #     <<: *airflow-common-depends-on
+  #     airflow-init:
+  #       condition: service_completed_successfully
   airflow-webserver:
     <<: *airflow-common
     command: webserver
     ports:
-      - 8080:8080
+      - "8080:8080"
     healthcheck:
-      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
+      test: ["CMD-SHELL", "curl --fail http://localhost:8080/health || exit 1"]
       interval: 10s
       timeout: 10s
       retries: 5
     restart: always
     depends_on:
-      <<: *airflow-common-depends-on
-      airflow-init:
-        condition: service_completed_successfully
+      - airflow-init

airflow-scheduler:
<<: *airflow-common
1 change: 0 additions & 1 deletion pipeline/airflow/logs/scheduler/latest

This file was deleted.

Empty file removed pipeline/current.txt
95 changes: 0 additions & 95 deletions pipeline/pipielinetree.txt

This file was deleted.
