diff --git a/Assignments_Submissions/Model Deployment/README.md b/Assignments_Submissions/Model Deployment/README.md index f4dd808..99b26ae 100644 --- a/Assignments_Submissions/Model Deployment/README.md +++ b/Assignments_Submissions/Model Deployment/README.md @@ -18,7 +18,7 @@ Steps for Replication - A GitHub repository with access to GitHub Actions for automation. - Required IAM roles for deploying models to Vertex AI and managing Cloud Build resources. -![GCP Billing Dashboard](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/GCP%20billing%20dashboard.png) +![GCP Billing Dashboard](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/GCP%20billing%20dashboard.png) #### 2. **Running Deployment Automation** - Push changes to the main branch of the GitHub repository. @@ -31,7 +31,7 @@ Steps for Replication - Test the deployed model endpoint to confirm successful deployment and validate model predictions. - Review monitoring dashboards to ensure no issues with prediction outputs or feature drift. -![Drift Detection Logging](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Drift%20Detection%20logging.png) +![Drift Detection Logging](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Drift%20Detection%20logging.png) --- @@ -57,25 +57,25 @@ Workflows and setups for managing machine learning pipelines on Vertex AI in Goo ### Steps for Deployment of Trained Models 1. **Model Registration**: Once a model is trained, register it in Vertex AI's Model Registry. Specify the model name, version, and any relevant metadata. 
-![Vertex AI Jupyter Notebooks](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Vertex%20Ai%20jupyter%20notebooks.png) +![Vertex AI Jupyter Notebooks](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Vertex%20Ai%20jupyter%20notebooks.png) -![Model Serving](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Model%20serving.png) +![Model Serving](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Model%20serving.png) -![Vertex AI Model Registry](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Vertex%20Ai%20model%20registry.png) +![Vertex AI Model Registry](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Vertex%20Ai%20model%20registry.png) 2. **Create an Endpoint**: - In Vertex AI, create an endpoint. This endpoint will act as the interface for serving predictions. - Navigate to Vertex AI > Online prediction > Endpoints > Create. - Assign a name and select the appropriate region. -![Vertex AI Endpoints](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Vertex%20Ai%20endpoints.png) +![Vertex AI Endpoints](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Vertex%20Ai%20endpoints.png) 3. **Deploy the Model to an Endpoint**: - Select the registered model and choose "Deploy to Endpoint". - Configure the deployment settings such as machine type, traffic splitting among model versions, and whether to enable logging or monitoring. - Confirm deployment which will make the model ready to serve predictions. 
-![Vertex AI Model Development Training](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Vertex%20Ai%20model%20development%20training.png) +![Vertex AI Model Development Training](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Vertex%20Ai%20model%20development%20training.png) ### Model Versioning - **Manage Versions**: In Vertex AI, each model can have multiple versions allowing easy rollback and version comparison. @@ -88,7 +88,7 @@ Workflows and setups for managing machine learning pipelines on Vertex AI in Goo - **GitHub Actions**: Configure workflows in `.github/workflows/` directory to automate testing, building, and deploying models. - **Cloud Build**: Create a `cloudbuild.yaml` file specifying steps to build, test, and deploy models based on changes in the repository. -![GitHub Actions CI/CD](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Github%20Actions%20CICD.png) +![GitHub Actions CI/CD](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Github%20Actions%20CICD.png) --- @@ -147,7 +147,7 @@ Workflows and setups for managing machine learning pipelines on Vertex AI in Goo - **Verification**: Lists uploaded files to confirm the sync. --- -![GitHub Actions](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Github%20Workflows.png) +![GitHub Actions](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Github%20Workflows.png) #### Summary @@ -167,10 +167,10 @@ Each workflow is tailored for a specific task in CI/CD for ML operations, levera - Vertex AI provides dashboards to monitor model performance and data drift. 
- Alerts are configured to notify stakeholders when anomalies, such as feature attribution drift, are detected. -![Model Monitoring Notification](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Model%20Monitoring%20notification.png) +![Model Monitoring Notification](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Model%20Monitoring%20notification.png) -![Model Monitoring Anomalies](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Model%20monitoring%20Anomolies.png) +![Model Monitoring Anomalies](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Model%20monitoring%20Anomolies.png) The provided images highlight the active setup and management of a Vertex AI model monitoring system. Files like `anomalies.json` and `anomalies.textproto` document identified issues in the input data. The structure also includes folders such as `baseline`, `logs`, and `metrics`, which organize monitoring data effectively for future analysis. A notification email confirming the creation of a model monitoring job for a specific Vertex AI endpoint. This email provides essential details, such as the endpoint name, monitoring job link, and the GCS bucket path where statistics and anomalies will be saved. @@ -178,14 +178,13 @@ The provided images highlight the active setup and management of a Vertex AI mod - Pre-configured thresholds for model performance trigger retraining or redeployment of updated models. - Logs and alerts from Vertex AI and Cloud Build ensure the system remains reliable and scalable. 
-![Monitor Details](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Monitor%20details.png) +![Monitor Details](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Monitor%20details.png) -![Logging Dashboard](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Logging%20Dashboard.png) +![Logging Dashboard](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Logging%20Dashboard.png) -![Monitor Feature Detection](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Monitor%20feature%20detection.png) - -![Monitor Drift Detection](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/104a48ddf826520ccc31374002d8df92f2015796/assets/Monitor%20drift%20detection.png) +![Monitor Feature Detection](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Monitor%20feature%20detection.png) +![Monitor Drift Detection](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Monitor%20drift%20detection.png) --- diff --git a/Assignments_Submissions/Model development pipeline phase/README.md b/Assignments_Submissions/Model development pipeline phase/README.md index 965fbc2..9f4e361 100644 --- a/Assignments_Submissions/Model development pipeline phase/README.md +++ b/Assignments_Submissions/Model development pipeline phase/README.md @@ -227,9 +227,46 @@ Model validation is a crucial step to evaluate how well the selected model perfo ### 4. Model Bias Detection (Using Slicing Techniques) Bias detection ensures that the model behaves equitably across different subgroups of data. 
+> Please refer to [Model Bias Notebook](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/c50d70f57592fa1ca139141ce09fe82099e7ea1b/src/FeatureEng_and_ModelBiasDetn.ipynb) + - **Tools Used**: `Fairlearn` for data slicing and detecting model bias. - **Purpose**: To evaluate model fairness across various demographic or sensitive features. + - **Metrics by Slice**: + - High VIX Slice: Mean Squared Error (MSE): 2.04, Mean Absolute Error (MAE): 1.14 + - Low VIX Slice: Mean Squared Error (MSE): 1.34, Mean Absolute Error (MAE): 0.87 + - **Disparity Analysis**: Overall MSE Disparity: 0.70, Overall MAE Disparity: 0.27 + +- **ElasticNet Model Performance Summary**: + + - **ElasticNet Test MSE with Resampling**: 1.37 + - **ElasticNet Test MAE with Resampling**: 0.91 + + - **1. Dataset Slicing and Bias Detection**: + - **Slices Defined**: We defined slices based on SP500 and VIX values (high and low conditions) to evaluate model performance under different market scenarios. + - **Metrics Tracked**: For each slice, we tracked Mean Squared Error (MSE) and Mean Absolute Error (MAE) to assess prediction accuracy. + - **Results**: Initial analysis showed disparities in MSE and MAE between slices, with higher errors in some slices (e.g., high SP500), suggesting potential bias. + + - **2. Bias Mitigation Technique Applied**: + - **Resampling**: To mitigate bias, we applied resampling to balance the training dataset, focusing on slices with less representation or higher error. + - **Model Re-evaluation**: After retraining the ElasticNet model on the resampled data, we saw improvements in MSE and MAE across slices, which indicated reduced disparity. + + - **3. Performance Improvement After Mitigation**: + - **Metrics After Resampling**: + - **Test MSE**: Improved to 1.37. + - **Test MAE**: Improved to 0.91.
+ - **Disparity Reduction**: The disparity between slices decreased, showing that the model's predictions were more consistent across different market conditions. + + - **4. Considerations and Trade-offs**: + - **Trade-offs**: Resampling helped improve fairness across slices, but it may have slightly reduced the model's predictive power for over-represented groups. However, our main goal was to ensure consistent performance across market conditions to minimize bias. + - **Future Improvements**: We could further enhance model fairness by applying additional techniques such as re-weighting or hyperparameter tuning. + +- **Observations**: + - These metrics show how the model performs on the test set after the training data was rebalanced to ensure fairness across slices defined by VIX values. + - The low MSE and MAE values suggest that the model's predictions are reasonably close to the actual values, which indicates that both fairness and overall accuracy likely improved. + - The high VIX slice has a higher error (MSE of 2.04) than the low VIX slice (MSE of 1.34). This suggests that the model is less accurate when market volatility (VIX) is high. + - The disparity in both MSE and MAE points to a noticeable performance gap between these slices, with the model performing better under stable (low VIX) conditions.
+ **Airflow Graph Depicting Bias Detection Task**: ![Airflow Graph](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v1.0/assets/airflow_graph.png) @@ -275,7 +312,7 @@ Sensitivity analysis helps understand how changes in input features and hyperpar - **Hyperparameter Sensitivity Analysis**: - ![Hyperparameter Sensitivity](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v1.0/assets/results_linear_regression .png) + ![Hyperparameter Sensitivity](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Linear%20Regression%20-%20Hyperparameter%20Sensitivity_%20model__alpha.png) ### 8. Experiment Tracking and Results with Weights & Biases diff --git a/README.md b/README.md index 40a9ba3..a82b52e 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ To read all files: > Refer to [Assignments Submissions](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/tree/1e36981df331c0ecb44a13194e940dbe7ba8aa5b/Assignments_Submissions/) To read current phase: - > Refer to [Model Development pipeline](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/tree/1e36981df331c0ecb44a13194e940dbe7ba8aa5b/Assignments_Submissions/Model%20development%20pipeline%20phase) + > Refer to [Model Deployment](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/tree/c50d70f57592fa1ca139141ce09fe82099e7ea1b/Assignments_Submissions/Model%20Deployment) ## Table of Contents - [Directory Structure](#directory-structure) @@ -21,6 +21,8 @@ To read current phase: - [Test Functions](#test-functions) - [Reproducibility and Data Versioning](#reproducibility-and-data-versioning) - [Data Sources](#data-sources) +- [Model Serving and Deployment](#model-serving-and-deployment) +- [Monitoring and Maintenance](#monitoring-and-maintenance) --- @@ -28,76 +30,40 @@ To read current phase: ``` .
-├── artifacts # Stores output files like correlation matrix, trained model artifacts -│ ├── correlation_matrix_after_removing_correlated_features.png -│ ├── ElasticNet.pkl # Trained model files (ElasticNet, Ridge, etc.) -│ ├── Feature Importance for ElasticNet on Test Set.png -│ └── Ridge.pkl -├── assets # Contains images used for visualization and documentation -│ ├── airflow_dags.jpeg -│ ├── gcpbucket.png -│ └── overview_charts_all_runs.png -├── Assignments_Submissions # Stores submission documents for different project phases -│ ├── DataPipeline Phase -│ │ ├── Airflow README.md -│ │ └── Project README.md -│ └── Scoping Phase -│ ├── Data Collection Group 10.pdf -│ └── Group 10 Scoping Document.pdf -├── data # Stores raw and preprocessed data files -│ ├── ADS_Index.csv -│ └── preprocessed -│ ├── final_dataset.csv -│ └── merged_original_dataset.csv -├── GCP # Configuration and scripts for Google Cloud operations -│ ├── application_default_credentials.json -│ ├── deploy.yml -│ └── synclocal.ipynb -├── models # Jupyter notebooks and model checkpoint files -│ ├── KNN.ipynb -│ ├── model_checkpoints_Ridge.pkl -│ └── XGBoost.ipynb -├── pipeline -│ ├── airflow -│ │ ├── artifacts # Stores intermediate artifacts for Airflow processing -│ │ │ ├── pca_components.png -│ │ │ └── yfinance_time_series.png -│ │ ├── dags # DAG scripts for orchestrating data and model pipelines -│ │ │ ├── airflow.py -│ │ │ ├── data # Data used in Airflow DAG processing -│ │ │ │ ├── ADS_index.csv -│ │ │ │ └── merged_original_dataset.csv -│ │ │ └── src # Python scripts for various DAG steps (e.g., data preprocessing) -│ │ │ ├── convert_column_dtype.py -│ │ │ ├── download_data.py -│ │ │ ├── models # Model-specific scripts used in the pipeline -│ │ │ │ ├── LSTM.py -│ │ │ │ └── XGBoost.py -│ │ │ ├── pca.py -│ │ │ ├── scaler.py -│ │ │ └── upload_blob.py -│ │ ├── docker-compose.yaml # Docker setup for running Airflow components -│ │ ├── plugins # Custom plugins for Airflow -│ │ ├── tests # Py tests for 
DAG steps -│ │ │ ├── test_download_data.py -│ │ │ └── test_scaler.py -│ │ ├── wandb # Logs for W&B experiments -│ │ │ └── run-20241115_215708-13bfiift -│ │ │ └── files -│ │ │ ├── config.yaml -│ │ │ └── wandb-summary.json -│ │ └── working_data # Temporary data during Airflow execution -│ └── README.md -├── README.md # Project description and setup information -├── requirements.txt # Dependencies required for the project -├── src # Python scripts and notebooks for model training and preprocessing -│ ├── KNN.ipynb -│ ├── PROJECT_DATA_CLEANING.ipynb -│ └── XGBoost.ipynb -└── tests # Additional test scripts for validation - ├── test_convert_column_dtype.py - ├── test_lagged_features.py - └── test_scaler.py +├── artifacts +│ ├── drift_detection_log.txt +│ └── schema.pbtxt +├── assets # Visual assets for monitoring and documentation +│ ├── airflow*, gcp*, and github* related images +│ ├── Logging Dashboard, Model Monitoring, and Vertex AI images +│ ├── feature engineering, PCA, and deployment visuals +│ └── other analysis-related graphics +├── Assignments_Submissions # Reports and documentation +│ ├── DataPipeline Phase # Includes README and pipeline documentation +│ ├── Model Deployment # Deployment phase documentation +│ ├── Model Development Pipeline Phase +│ └── Scoping Phase # Scoping reports and user needs +├── data # Raw and preprocessed datasets +│ ├── raw # Unprocessed datasets +│ └── preprocessed # Cleaned datasets +├── GCP # Google Cloud-related files and scripts +│ ├── application credentials, deployment configs +│ ├── gcpdeploy # Scripts for training and serving models +│ └── wandb # Weights & Biases logs and metadata +├── pipeline # Pipeline scripts and configurations +│ ├── airflow # DAGs, logs, and DAG-related scripts +│ ├── dags/data # Data source files for pipeline tasks +│ ├── artifacts # Artifacts generated from DAGs +│ ├── tests # Unit test scripts for pipeline tasks +│ └── wandb # Workflow and run logs +├── src # Core source code (Py script,
Notebook, Model scripts) +│ ├── Data +│ ├── best_model.py +│ └── Datadrift_detection_updated.ipynb +├── requirements.txt # Python dependencies +├── dockerfile +├── LICENSE +└── README.md # Main documentation ``` @@ -331,6 +297,161 @@ We used **DVC (Data Version Control)** for files management. 3. **FRED Variables**: Includes various economic indicators, such as AMERIBOR, NIKKEI 225, and VIX. 4. **YFinance**: Pulls historical stock data ('GOOGL') for financial time-series analysis. +--- + +## Model Serving and Deployment + +Workflows and setups for managing machine learning pipelines on Vertex AI in Google Cloud are as follows: + +1. **Jupyter Notebooks in Vertex AI Workbench**: + - The setup includes instances such as `group10-test-vy` and `mlops-group10`, both configured with NumPy/SciPy and scikit-learn environments. These notebooks are GPU-enabled, which suits them to compute-intensive ML workloads. + +2. **Training Pipelines**: + - Multiple training pipelines are orchestrated on Vertex AI, such as `mlops-group10` and `group10-model-train`. These are primarily custom training pipelines for tasks such as hyperparameter tuning, training, and validation, and they leverage the scalability of Google Cloud's infrastructure. + +3. **Metadata Management**: + - Metadata is tracked in the Vertex AI Metadata Store, with records such as `vertex_dataset`. This supports reproducibility and streamlined monitoring of all artifacts produced during ML workflows. + +4. **Model Registry**: + - Deployed models such as `mlops-group10-deploy` and `group10-model` are maintained in the Model Registry. The registry supports versioning and deployment tracking for consistency and monitoring. + +5. **Endpoints for Online Prediction**: + - Several endpoints, such as `mlops-group10-deploy` and `testt`, are active and ready to serve predictions. The setup is optimized for real-time online predictions, and monitoring can be enabled for anomaly or drift detection.
+ +### Steps for Deployment of Trained Models +1. **Model Registration**: Once a model is trained, register it in Vertex AI's Model Registry. Specify the model name, version, and any relevant metadata. + +![Vertex AI Jupyter Notebooks](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Vertex%20Ai%20jupyter%20notebooks.png) + +![Model Serving](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Model%20serving.png) + +![Vertex AI Model Registry](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Vertex%20Ai%20model%20registry.png) + +2. **Create an Endpoint**: + - In Vertex AI, create an endpoint. This endpoint acts as the interface for serving predictions. + - Navigate to Vertex AI > Online prediction > Endpoints > Create. + - Assign a name and select the appropriate region. + +![Vertex AI Endpoints](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Vertex%20Ai%20endpoints.png) + +3. **Deploy the Model to an Endpoint**: + - Select the registered model and choose "Deploy to Endpoint". + - Configure the deployment settings, such as the machine type, the traffic split among model versions, and whether to enable logging or monitoring. + - Confirm the deployment, which makes the model ready to serve predictions. + +![Vertex AI Model Development Training](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Vertex%20Ai%20model%20development%20training.png) + +### Model Versioning +- **Manage Versions**: In Vertex AI, each model can have multiple versions, allowing easy rollback and version comparison. +- **Update Versions**: Upload new versions of the model to the Model Registry and adjust the endpoint configuration to direct traffic to the newer version.
+ +### Deployment Automation +#### Continuous Integration and Deployment Pipeline +- **Automate Deployments**: Use GitHub Actions and Google Cloud Build to automate the deployment of new model versions from a repository. +- **CI/CD Pipeline Configuration**: + - **GitHub Actions**: Configure workflows in the `.github/workflows/` directory to automate testing, building, and deploying models. + - **Cloud Build**: Create a `cloudbuild.yaml` file specifying the steps to build, test, and deploy models when the repository changes. + +![GitHub Actions CI/CD](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Github%20Actions%20CICD.png) + +--- + +#### Automated Deployment Scripts +- **Script Functions**: + - **Pull the Latest Model**: Scripts should fetch the latest model version from the Vertex AI Model Registry or a specified repository. + - **Deploy or Update Model**: Automate the deployment of the model to the configured Vertex AI endpoint. + - **Monitor and Log**: Log the deployment status to provide visibility and support troubleshooting. + +--- +#### **1. `airflowtrigger.yaml`** +- **Purpose**: Triggers and manages Apache Airflow DAG workflows. +- **Steps**: + - **Set up environment**: Installs Python, dependencies, and Docker Compose. + - **Airflow initialization**: Starts Airflow services and checks their status. + - **DAG management**: Lists, triggers, and monitors DAG execution (success or failure). + - **Cleanup**: Stops Airflow services and removes unnecessary files. + +--- + +#### **2. `deploy.yaml`** +- **Purpose**: Deploys and monitors a machine learning model on Vertex AI. +- **Steps**: + - **Environment setup**: Configures the Google Cloud SDK using secrets. + - **Model deployment**: Deploys a trained model to Vertex AI endpoints. + - **Monitoring**: Fetches the latest model and endpoint IDs and sets them for later monitoring steps. + +--- + +#### **3.
`model.yml`** +- **Purpose**: Packages and trains a machine learning model for deployment. +- **Steps**: + - **Trainer creation**: Builds a Python package (`trainer`) for model training. + - **Package upload**: Uploads the trainer package to Google Cloud Storage. + - **Training job**: Triggers a Vertex AI custom training job using the uploaded package. + - **Notification**: Indicates the completion of the training process. + +--- + +#### **4. `PyTest.yaml`** +- **Purpose**: Runs Python unit tests and generates test coverage reports. +- **Steps**: + - **Environment setup**: Installs dependencies and the Google Cloud CLI. + - **Testing**: Runs tests with pytest, generates coverage reports, and uploads them as artifacts. + - **Upload results**: Saves coverage reports to a GCP bucket for review. + +--- + +#### **5. `syncgcp.yaml`** +- **Purpose**: Synchronizes local artifacts and Airflow DAGs with a Google Cloud Storage bucket. +- **Steps**: + - **Environment setup**: Installs the Google Cloud CLI and authenticates with a service account. + - **File uploads**: + - Uploads specific artifacts and files to predefined GCP bucket locations. + - Synchronizes repository content with the bucket directory structure. + - **Verification**: Lists uploaded files to confirm the sync. +--- + +![GitHub Actions](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Github%20Workflows.png) + + +#### Summary +These YAML workflows automate different stages of the ML lifecycle: +1. **`airflowtrigger.yaml`**: Airflow DAG management. +2. **`deploy.yaml`**: Vertex AI deployment and monitoring. +3. **`model.yml`**: Training pipeline and GCS uploads. +4. **`PyTest.yaml`**: Testing and reporting. +5. **`syncgcp.yaml`**: Artifact and DAG synchronization with GCP. + +Each workflow is tailored to a specific CI/CD task for ML operations and uses GitHub Actions and Google Cloud services. + +--- + +## Monitoring and Maintenance + +1.
**Monitoring**: + - Vertex AI provides dashboards to monitor model performance and data drift. + - Alerts are configured to notify stakeholders when anomalies, such as feature attribution drift, are detected. + +![Model Monitoring Notification](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Model%20Monitoring%20notification.png) + + +![Model Monitoring Anomalies](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Model%20monitoring%20Anomolies.png) + +The images above show the active setup and management of a Vertex AI model monitoring system. Files such as `anomalies.json` and `anomalies.textproto` document issues identified in the input data. The structure also includes folders such as `baseline`, `logs`, and `metrics`, which organize monitoring data for future analysis. The notification email confirms the creation of a model monitoring job for a specific Vertex AI endpoint and provides essential details, such as the endpoint name, the monitoring job link, and the GCS bucket path where statistics and anomalies will be saved. + +2. **Maintenance**: + - Pre-configured model performance thresholds trigger retraining or redeployment of updated models. + - Logs and alerts from Vertex AI and Cloud Build help keep the system reliable and scalable.
+ +![Monitor Details](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Monitor%20details.png) + +![Logging Dashboard](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Logging%20Dashboard.png) + +![Monitor Feature Detection](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Monitor%20feature%20detection.png) + +![Monitor Drift Detection](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/v2.1/assets/Monitor%20drift%20detection.png) + +--- ## License This project is licensed under the MIT License. See the [LICENSE](https://github.com/IE7374-MachineLearningOperations/StockPricePrediction/blob/2abdea96ee56b51357cd519a9f5e89126b9c87bb/LICENSE) file. diff --git a/current.txt b/current.txt index ced11f0..d5dc208 100644 --- a/current.txt +++ b/current.txt @@ -1,12 +1,11 @@ . ├── artifacts │   ├── correlation_matrix_after_removing_correlated_features.png -│   ├── ElasticNet.pkl +│   ├── drift_detection_log.txt │   ├── Feature Importance for ElasticNet on Test Set.png │   ├── Feature Importance for Lasso on Test Set.png │   ├── Feature Importance for Ridge on Test Set.png -│   ├── Lasso.pkl -│   └── Ridge.pkl +│   └── schema.pbtxt ├── assets │   ├── airflow_dags.jpeg │   ├── airflow_gantt.png @@ -16,29 +15,48 @@ │   ├── airflow_logging.jpeg │   ├── airflow_pipeline.png │   ├── artifacts_blob.png +│   ├── Cloud build trigger.png │   ├── compare_different_runs.png │   ├── correlation_matrix_after_removing_correlated_features.png │   ├── dags_run.png │   ├── Data_split1.png │   ├── detail_one_run.png │   ├── detect_bias_log.png +│   ├── Drift Detection logging.png │   ├── email_notification.jpeg │   ├── gantt.jpeg │   ├── gcp-artifcats.png +│   ├── GCP billing dashbgoard.png │   ├── gcpbucket.png +│   ├── Github Actions CICD.png │   ├── github_trigger.png +│   ├── Github Workflows.png +│   ├── IAM roles.png │   ├── 
linear_reg_outputs.png +│   ├── Linear Regression - Hyperparameter Sensitivity_ model__alpha.png +│   ├── Linear Regression - Hyperparameter Sensitivity_ model__l1_ratio.png +│   ├── Logging Dashboard.png │   ├── mlops10trigger.png │   ├── MLOps Group10 Diag.png │   ├── model_analysis_elasticNet.png │   ├── model_checkpoints.png +│   ├── Model monitoring Anomolies.png +│   ├── Model Monitoring notification.png +│   ├── Model serving.png +│   ├── Monitor details.png +│   ├── Monitor drift detection.png +│   ├── Monitor feature detection.png │   ├── overview_charts_all_runs.png │   ├── pca_components.png -│   ├── pca_components.png.dvc │   ├── results_linear_regression .png │   ├── save_best_model_to_gcs.png │   ├── test_functions.jpeg │   ├── trigger_cloudfun.png +│   ├── Vertex Ai endpoints.png +│   ├── Vertex AI jupyter notebooks.png +│   ├── Vertex AI metadata.png +│   ├── Vertex AI model development training.png +│   ├── Vertex AI model registry.png │   ├── VM_instance.png │   ├── wandb_main_dashboard_overview_all_runs.png │   └── yfinance_time_series.png @@ -47,6 +65,8 @@ │   │   ├── Airflow README.md │   │   ├── Notebooks README.md │   │   └── Project README.md +│   ├── Model Deployment +│   │   └── README.md │   ├── Model development pipeline phase │   │   └── README.md │   └── Scoping Phase @@ -60,16 +80,19 @@ ├── data │   ├── ADS_Index.csv │   ├── fama_french.csv -│   └── preprocessed -│   ├── final_dataset.csv -│   └── merged_original_dataset.csv +│   ├── preprocessed +│   │   ├── final_dataset.csv +│   │   ├── merged_original_dataset.csv +│   │   └── preprocess_data.ipynb +│   └── raw +│   └── dataset.csv ├── data.dvc ├── dockerfile +├── dvc_files ├── GCP │   ├── application_default_credentials.json │   ├── bucketcsv.sh │   ├── deploy.yml -│   ├── filesbucket.ipynb │   ├── gcloud │   │   ├── access_tokens.db │   │   ├── active_config @@ -110,11 +133,90 @@ │   │   ├── 01.10.10.702675.log │   │   └── 01.10.12.933871.log │   ├── gcpbuckettree.txt -│   ├── 
GCPresorce.py +│   ├── gcpdeploy +│   │   ├── best_model.py +│   │   ├── src +│   │   │   ├── serve +│   │   │   │   ├── Dockerfile +│   │   │   │   ├── model_serving.ipynb +│   │   │   │   └── predict.py +│   │   │   └── trainer +│   │   │   ├── Dockerfile +│   │   │   └── train.py +│   │   └── trainer +│   │   ├── artifacts +│   │   │   ├── bias_detection_resultsV2.txt +│   │   │   ├── bias_detection.txt +│   │   │   ├── feature_importance.png +│   │   │   └── model.joblib +│   │   ├── config.yaml +│   │   ├── Dockerfile +│   │   ├── model_serving.ipynb +│   │   ├── train.py +│   │   └── wandb +│   │   ├── debug-internal.log +│   │   ├── debug.log +│   │   ├── latest-run +│   │   │   ├── files +│   │   │   │   ├── config.yaml +│   │   │   │   ├── output.log +│   │   │   │   ├── requirements.txt +│   │   │   │   ├── wandb-metadata.json +│   │   │   │   └── wandb-summary.json +│   │   │   ├── logs +│   │   │   │   ├── debug-core.log +│   │   │   │   ├── debug-internal.log +│   │   │   │   └── debug.log +│   │   │   ├── run-0v5xt73p.wandb +│   │   │   └── tmp +│   │   │   └── code +│   │   ├── run-20241204_225214-8yu3j8rv +│   │   │   ├── files +│   │   │   │   ├── config.yaml +│   │   │   │   ├── output.log +│   │   │   │   ├── requirements.txt +│   │   │   │   ├── wandb-metadata.json +│   │   │   │   └── wandb-summary.json +│   │   │   ├── logs +│   │   │   │   ├── debug-core.log +│   │   │   │   ├── debug-internal.log +│   │   │   │   └── debug.log +│   │   │   ├── run-8yu3j8rv.wandb +│   │   │   └── tmp +│   │   │   └── code +│   │   ├── run-20241204_225320-york2hyl +│   │   │   ├── files +│   │   │   │   ├── config.yaml +│   │   │   │   ├── output.log +│   │   │   │   ├── requirements.txt +│   │   │   │   ├── wandb-metadata.json +│   │   │   │   └── wandb-summary.json +│   │   │   ├── logs +│   │   │   │   ├── debug-core.log +│   │   │   │   ├── debug-internal.log +│   │   │   │   └── debug.log +│   │   │   ├── run-york2hyl.wandb +│   │   │   └── tmp +│   │   │ 
  └── code +│   │   └── run-20241204_225424-knvuoquq +│   │   ├── files +│   │   │   ├── config.yaml +│   │   │   ├── output.log +│   │   │   ├── requirements.txt +│   │   │   ├── wandb-metadata.json +│   │   │   └── wandb-summary.json +│   │   ├── logs +│   │   │   ├── debug-core.log +│   │   │   ├── debug-internal.log +│   │   │   └── debug.log +│   │   ├── run-knvuoquq.wandb +│   │   └── tmp +│   │   └── code │   ├── README.md -│   ├── striped-graph-440017-d7-79f99f8253bc.json -│   └── synclocal.ipynb +│   └── striped-graph-440017-d7-79f99f8253bc.json +├── gcpdeploy.dvc ├── LICENSE +├── LICENSE.dvc ├── models │   ├── cleaned_data.csv │   ├── FeatureEng_and_ModelBiasDetn.ipynb @@ -122,11 +224,6 @@ │   ├── linear_regression.ipynb │   ├── LSTM.ipynb │   ├── ML Models.ipynb -│   ├── model_checkpoints_ElasticNet.pkl -│   ├── model_checkpoints_Lasso.pkl -│   ├── model_checkpoints_LSTM.pkl -│   ├── model_checkpoints_Ridge.pkl -│   ├── model_checkpoints_XGBoost.pkl │   ├── README.md │   ├── RF_Model.ipynb │   ├── SVM.ipynb @@ -134,6 +231,7 @@ │   ├── X_test_split.csv │   ├── X_train_split.csv │   └── X_validation_split.csv +├── notebooks.dvc ├── pipeline │   ├── airflow │   │   ├── artifacts @@ -144,8 +242,8 @@ │   │   │   ├── Feature Importance for Lasso.png │   │   │   ├── Feature Importance for Ridge on Test Set.png │   │   │   ├── Feature Importance for Ridge.png -│   │   │   ├── Linear Regression - Hyperparameter Sensitivity: model__alpha.png -│   │   │   ├── Linear Regression - Hyperparameter Sensitivity: model__l1_ratio.png +│   │   │   ├── Linear Regression - Hyperparameter Sensitivity_ model__alpha.png +│   │   │   ├── Linear Regression - Hyperparameter Sensitivity_ model__l1_ratio.png │   │   │   ├── pca_components.png │   │   │   └── yfinance_time_series.png │   │   ├── dags @@ -153,6 +251,7 @@ │   │   │   ├── data │   │   │   │   ├── ADS_index.csv │   │   │   │   ├── fama_french.csv +│   │   │   │   ├── final_dataset_for_modeling.csv │   │   │   │   ├── 
FRED_Variables │   │   │   │   │   ├── AMERIBOR.csv │   │   │   │   │   ├── BAMLH0A0HYM2.csv @@ -182,8 +281,7 @@ │   │   │   │   │   ├── USRECDP.csv │   │   │   │   │   └── VIXCLS.csv │   │   │   │   └── merged_original_dataset.csv -│   │   │   ├── __pycache__ -│   │   │   │   └── airflow.cpython-312.pyc +│   │   │   ├── requirements.txt │   │   │   └── src │   │   │   ├── convert_column_dtype.py │   │   │   ├── correlation.py @@ -193,42 +291,29 @@ │   │   │   ├── keep_latest_data.py │   │   │   ├── lagged_features.py │   │   │   ├── models +│   │   │   │   ├── cleaned_data.csv │   │   │   │   ├── linear_regression.py │   │   │   │   ├── LSTM.py │   │   │   │   ├── model_bias_detection.py │   │   │   │   ├── model_sensitivity_analysis.py │   │   │   │   ├── model_utils.py +│   │   │   │   ├── svr.py │   │   │   │   └── XGBoost.py │   │   │   ├── pca.py │   │   │   ├── plot_yfinance_time_series.py -│   │   │   ├── __pycache__ -│   │   │   │   ├── convert_column_dtype.cpython-312.pyc -│   │   │   │   ├── correlation.cpython-312.pyc -│   │   │   │   ├── download_data.cpython-312.pyc -│   │   │   │   ├── feature_interactions.cpython-312.pyc -│   │   │   │   ├── handle_missing.cpython-312.pyc -│   │   │   │   ├── keep_latest_data.cpython-312.pyc -│   │   │   │   ├── lagged_features.cpython-312.pyc -│   │   │   │   ├── pca.cpython-312.pyc -│   │   │   │   ├── plot_yfinance_time_series.cpython-312.pyc -│   │   │   │   ├── remove_weekend_data.cpython-312.pyc -│   │   │   │   ├── Requirement.cpython-312.pyc -│   │   │   │   ├── scaler.cpython-312.pyc -│   │   │   │   └── technical_indicators.cpython-312.pyc │   │   │   ├── remove_weekend_data.py │   │   │   ├── scaler.py │   │   │   ├── technical_indicators.py -│   │   │   ├── upload_blob.py -│   │   │   └── wandb_log.py +│   │   │   └── upload_blob.py │   │   ├── docker-compose.yaml +│   │   ├── Dockerfile │   │   ├── dvc.lock │   │   ├── dvc.yaml │   │   ├── images_push_to_gcp.sh │   │   ├── logs │   │   │   └── scheduler 
-│   │   │   ├── latest -> /opt/airflow/logs/scheduler/2024-11-16 +│   │   │   ├── latest -> /opt/airflow/logs/scheduler/2024-12-02 │   │   │   └── latest~HEAD -│   │   ├── plugins │   │   ├── tests │   │   │   ├── test_convert_column_dtype.py │   │   │   ├── test_correlation.py @@ -242,85 +327,300 @@ │   │   │   ├── test_remove_weekend_data.py │   │   │   ├── test_scaler.py │   │   │   └── test_technical_indicators.py -│   │   ├── wandb -│   │   │   ├── latest-run -> run-20241115_215718-z46bnrst -│   │   │   ├── run-20241115_211624-l57amau7 -│   │   │   │   ├── files -│   │   │   │   │   ├── config.yaml -│   │   │   │   │   ├── wandb-metadata.json -│   │   │   │   │   └── wandb-summary.json -│   │   │   │   └── run-l57amau7.wandb -│   │   │   ├── run-20241115_212249-pkl9a7mt -│   │   │   │   ├── files -│   │   │   │   │   ├── config.yaml -│   │   │   │   │   ├── wandb-metadata.json -│   │   │   │   │   └── wandb-summary.json -│   │   │   │   └── run-pkl9a7mt.wandb -│   │   │   ├── run-20241115_212432-xrpxqqpj -│   │   │   │   ├── files -│   │   │   │   │   ├── config.yaml -│   │   │   │   │   ├── wandb-metadata.json -│   │   │   │   │   └── wandb-summary.json -│   │   │   │   └── run-xrpxqqpj.wandb -│   │   │   ├── run-20241115_212708-qmlg5wns -│   │   │   │   ├── files -│   │   │   │   │   ├── config.yaml -│   │   │   │   │   ├── wandb-metadata.json -│   │   │   │   │   └── wandb-summary.json -│   │   │   │   └── run-qmlg5wns.wandb -│   │   │   ├── run-20241115_212913-vu840z1f -│   │   │   │   ├── files -│   │   │   │   │   ├── config.yaml -│   │   │   │   │   ├── wandb-metadata.json -│   │   │   │   │   └── wandb-summary.json -│   │   │   │   └── run-vu840z1f.wandb -│   │   │   ├── run-20241115_215701-jcdpdy5d -│   │   │   │   ├── files -│   │   │   │   │   ├── config.yaml -│   │   │   │   │   ├── wandb-metadata.json -│   │   │   │   │   └── wandb-summary.json -│   │   │   │   └── run-jcdpdy5d.wandb -│   │   │   ├── run-20241115_215708-13bfiift -│   │   │   │   
├── files -│   │   │   │   │   ├── config.yaml -│   │   │   │   │   ├── wandb-metadata.json -│   │   │   │   │   └── wandb-summary.json -│   │   │   │   └── run-13bfiift.wandb -│   │   │   ├── run-20241115_215712-vbyyrfjs -│   │   │   │   ├── files -│   │   │   │   │   ├── config.yaml -│   │   │   │   │   ├── wandb-metadata.json -│   │   │   │   │   └── wandb-summary.json -│   │   │   │   └── run-vbyyrfjs.wandb -│   │   │   └── run-20241115_215718-z46bnrst -│   │   │   ├── files -│   │   │   │   ├── config.yaml -│   │   │   │   ├── wandb-metadata.json -│   │   │   │   └── wandb-summary.json -│   │   │   └── run-z46bnrst.wandb -│   │   └── working_data +│   │   └── wandb +│   │   ├── latest-run -> run-20241115_215718-z46bnrst +│   │   ├── run-20241115_211624-l57amau7 +│   │   │   ├── files +│   │   │   │   ├── config.yaml +│   │   │   │   ├── wandb-metadata.json +│   │   │   │   └── wandb-summary.json +│   │   │   └── run-l57amau7.wandb +│   │   ├── run-20241115_212249-pkl9a7mt +│   │   │   ├── files +│   │   │   │   ├── config.yaml +│   │   │   │   ├── wandb-metadata.json +│   │   │   │   └── wandb-summary.json +│   │   │   └── run-pkl9a7mt.wandb +│   │   ├── run-20241115_212432-xrpxqqpj +│   │   │   ├── files +│   │   │   │   ├── config.yaml +│   │   │   │   ├── wandb-metadata.json +│   │   │   │   └── wandb-summary.json +│   │   │   └── run-xrpxqqpj.wandb +│   │   ├── run-20241115_212708-qmlg5wns +│   │   │   ├── files +│   │   │   │   ├── config.yaml +│   │   │   │   ├── wandb-metadata.json +│   │   │   │   └── wandb-summary.json +│   │   │   └── run-qmlg5wns.wandb +│   │   ├── run-20241115_212913-vu840z1f +│   │   │   ├── files +│   │   │   │   ├── config.yaml +│   │   │   │   ├── wandb-metadata.json +│   │   │   │   └── wandb-summary.json +│   │   │   └── run-vu840z1f.wandb +│   │   ├── run-20241115_215701-jcdpdy5d +│   │   │   ├── files +│   │   │   │   ├── config.yaml +│   │   │   │   ├── wandb-metadata.json +│   │   │   │   └── wandb-summary.json +│   │   │ 
  └── run-jcdpdy5d.wandb +│   │   ├── run-20241115_215708-13bfiift +│   │   │   ├── files +│   │   │   │   ├── config.yaml +│   │   │   │   ├── wandb-metadata.json +│   │   │   │   └── wandb-summary.json +│   │   │   └── run-13bfiift.wandb +│   │   ├── run-20241115_215712-vbyyrfjs +│   │   │   ├── files +│   │   │   │   ├── config.yaml +│   │   │   │   ├── wandb-metadata.json +│   │   │   │   └── wandb-summary.json +│   │   │   └── run-vbyyrfjs.wandb +│   │   └── run-20241115_215718-z46bnrst +│   │   ├── files +│   │   │   ├── config.yaml +│   │   │   ├── wandb-metadata.json +│   │   │   └── wandb-summary.json +│   │   └── run-z46bnrst.wandb │   ├── pipelinetree.txt -│   └── README.md +│   ├── README.md +│   └── requirements.txt ├── README.md ├── README.md.dvc ├── requirements.txt ├── requirements.txt.dvc ├── src +│   ├── best_model.py +│   ├── check.py +│   ├── Data +│   │   ├── assets +│   │   │   ├── correlation_matrix_after_removing_correlated_features.png +│   │   │   ├── gcpbucket.png +│   │   │   ├── MLOps Group10 Diag.png +│   │   │   ├── pca_components.png +│   │   │   └── yfinance_time_series.png +│   │   ├── data +│   │   │   ├── ADS_Index.csv +│   │   │   ├── fama_french.csv +│   │   │   └── preprocessed +│   │   │   ├── final_dataset.csv +│   │   │   └── merged_original_dataset.csv +│   │   └── pipeline +│   │   └── airflow +│   │   ├── artifacts +│   │   │   ├── correlation_matrix_after_removing_correlated_features.png +│   │   │   ├── pca_components.png +│   │   │   └── yfinance_time_series.png +│   │   └── dags +│   │   └── data +│   │   ├── ADS_index.csv +│   │   ├── fama_french.csv +│   │   ├── final_dataset.csv +│   │   ├── FRED_Variables +│   │   │   ├── AMERIBOR.csv +│   │   │   ├── BAMLH0A0HYM2.csv +│   │   │   ├── BAMLH0A0HYM2EY.csv +│   │   │   ├── CBBTCUSD.csv +│   │   │   ├── CBETHUSD.csv +│   │   │   ├── DAAA.csv +│   │   │   ├── DBAA.csv +│   │   │   ├── DCOILBRENTEU.csv +│   │   │   ├── DCOILWTICO.csv +│   │   │   ├── DCPF1M.csv +│   │  
 │   ├── DCPN3M.csv +│   │   │   ├── DEXJPUS.csv +│   │   │   ├── DEXUSEU.csv +│   │   │   ├── DEXUSUK.csv +│   │   │   ├── DGS10.csv +│   │   │   ├── DGS1.csv +│   │   │   ├── DHHNGSP.csv +│   │   │   ├── NIKKEI225.csv +│   │   │   ├── OBMMIJUMBO30YF.csv +│   │   │   ├── RIFSPPFAAD90NB.csv +│   │   │   ├── T10Y3M.csv +│   │   │   ├── T10YIE.csv +│   │   │   ├── T5YIE.csv +│   │   │   ├── USRECD.csv +│   │   │   ├── USRECDM.csv +│   │   │   ├── USRECDP.csv +│   │   │   └── VIXCLS.csv +│   │   └── merged_original_dataset.csv +│   ├── dataapi.py +│   ├── Datadrift_detection_updated.ipynb │   ├── data_preprocessing.ipynb │   ├── DataSchema_Stats.ipynb +│   ├── DVC +│   │   └── files +│   │   └── md5 +│   │   ├── 00 +│   │   │   └── b2f65a78688d3e02cd1be5ff35b027 +│   │   ├── 01 +│   │   │   └── 0a91294324fa9455890c8664464208 +│   │   ├── 0d +│   │   │   └── 249287c3a977c179ea4d737465e2bd +│   │   ├── 12 +│   │   │   └── 52f4e4135d4bf2e251f51c9d3e1a15.dir +│   │   ├── 18 +│   │   │   └── 3702fa2a04170a102fa672cba98cb3 +│   │   ├── 19 +│   │   │   └── 5212c1c883121cc27ef25b40dd1147 +│   │   ├── 1b +│   │   │   └── 18d13c54000beb2f78fb5350a70088 +│   │   ├── 1c +│   │   │   └── 820732fffe9a07d2b60db4d04901c0 +│   │   ├── 1d +│   │   │   └── cce21c382ceb4196933531da242a5b +│   │   ├── 23 +│   │   │   └── 0e90d16b6d38cbae0019d62c372ad9 +│   │   ├── 26 +│   │   │   └── 86be68201e37304512429dd69b76c1 +│   │   ├── 2e +│   │   │   └── 55c6057ed20579173f49e53c111569 +│   │   ├── 30 +│   │   │   └── 3fb2e7171c34d3cbdbcb937d4b9dff +│   │   ├── 33 +│   │   │   └── 16e0d18a168f00e3298ab34a72ff14 +│   │   ├── 34 +│   │   │   └── 6428a30a15dee21ff2c09c3d62e394 +│   │   ├── 38 +│   │   │   └── f500081cac37497d672f27b7ee62cf +│   │   ├── 42 +│   │   │   └── db0f08cedf931610a5f08ef1f3bf8d.dir +│   │   ├── 43 +│   │   │   └── b6488d7b26b6c361101f56c3b16c8d +│   │   ├── 4c +│   │   │   └── 69076cfca52511b91ade2a24cdc0ca +│   │   ├── 4e +│   │   │   └── dd5057638e102f5bc205968d4593d5.dir 
+│   │   ├── 4f +│   │   │   ├── 9887096051715bce09afa092d44053 +│   │   │   └── d0fc4fe5f72c3b70057e25d736c5b7 +│   │   ├── 51 +│   │   │   └── 8e1b0debd4fae2d25bd525c0e7e31f +│   │   ├── 5a +│   │   │   └── d2208d8fd1a3b09f0274d80b19853b +│   │   ├── 62 +│   │   │   └── a27affb01d76de691ec6362e0db52c +│   │   ├── 64 +│   │   │   └── c8fa78764ef5227db39d07722b2b58 +│   │   ├── 65 +│   │   │   ├── 87bf75a7cc1f6db67460d121a6f997 +│   │   │   └── e57926f4606c68730d243de175b219 +│   │   ├── 68 +│   │   │   └── 319992a64b2f1283bfeb516bb00e78 +│   │   ├── 69 +│   │   │   └── 414b772def6af0373789c0f4d9e97f +│   │   ├── 6c +│   │   │   └── 8da018fe7441094cf5282c499b457f +│   │   ├── 6f +│   │   │   ├── 21deb251577e14acc627a0d4f4bc60 +│   │   │   └── efc1ffdcec6975cc76d44b249c8c20 +│   │   ├── 71 +│   │   │   └── 0b22dac2ef339391b4db1d1156ffb4 +│   │   ├── 72 +│   │   │   └── 72abf9e695de989fc9b86739c1fcbf +│   │   ├── 75 +│   │   │   ├── 0eae632c3c59a86e27e305b348106f +│   │   │   └── 2d41c7ece58d400ef7cdee1ea861bf +│   │   ├── 78 +│   │   │   ├── 03d509a2b02d8f0e3f68be81861920 +│   │   │   └── 9e46c042ca0078de5f5616fa148e2b +│   │   ├── 7c +│   │   │   ├── 605eb5473eeba85716c90631e1370f.dir +│   │   │   └── dc250dfe088372c94e073d99e2fd86.dir +│   │   ├── 7f +│   │   │   └── f0d06e94a3f2a6e75f57428d739767 +│   │   ├── 86 +│   │   │   └── de7bbbe59a8c06e3a023ae95d6b385.dir +│   │   ├── 87 +│   │   │   ├── 0e9954220f0eb7759e86786b2ddd46 +│   │   │   └── 9f94b5120a7bd92e9223fa0e40321d +│   │   ├── 8c +│   │   │   ├── 40a7d04c458d5516662d6ffd504b06 +│   │   │   └── 7335d21a9bf00ac32aa5736adb7bc2 +│   │   ├── 8d +│   │   │   └── a91016c8f253eba682ceca59128026 +│   │   ├── 8e +│   │   │   └── bdc129516a30b60742c355c73565db +│   │   ├── 99 +│   │   │   ├── 083257c4363bbc2e460ac836f82b96.dir +│   │   │   ├── 655d7c0717bb9bbe09b7b50e4e4420 +│   │   │   └── ccec2cde1efb45fedc35230a01b604 +│   │   ├── 9c +│   │   │   └── cc64fec277ad2d7a9f4c1b415daf0b +│   │   ├── a1 +│   │   │   
└── ebc1bb2e0f1578241466b081371add +│   │   ├── a6 +│   │   │   └── cd12134076daad0a2e6d00105367b6.dir +│   │   ├── a9 +│   │   │   ├── 41b5954d549fcfef1f9d8bf4e88879 +│   │   │   ├── 49d38915cc9d7db3bbb1f36590b5ef +│   │   │   └── 6bcade213b0c739167a8e22ca0fe38 +│   │   ├── ab +│   │   │   └── 8aa47a45bb548cbd9c6d8efe5e5da2 +│   │   ├── ae +│   │   │   └── 1a9eba08eaccabdb8b7be11fe405da +│   │   ├── b3 +│   │   │   └── 18d1ddc24c377b097bffca13471b90 +│   │   ├── b4 +│   │   │   └── a1fb85f126fec761cc1dc7425662f1 +│   │   ├── b7 +│   │   │   └── 433ff54aa187dbdbd8bc5ca752c798 +│   │   ├── bc +│   │   │   └── ca7c25f155f310e865b75e39c8cc99 +│   │   ├── bd +│   │   │   └── 04d71b318926695e81915a7ba14726 +│   │   ├── c2 +│   │   │   └── 827c79e1fc7cf3130da44ddc2ebfa4.dir +│   │   ├── ca +│   │   │   ├── a520c73ea398e6a9133c7ebbb63cd8 +│   │   │   └── c6d413eafcd382ed62c4a623a6d89e +│   │   ├── d4 +│   │   │   └── 1d8cd98f00b204e9800998ecf8427e +│   │   ├── dd +│   │   │   └── 9e8687243e940cb60730b3b3950da0 +│   │   ├── e1 +│   │   │   └── 46c5534f58d3c11ab1bd767912a997 +│   │   ├── e3 +│   │   │   └── 30d7fcfc691c7bee4df890e9a3fad7 +│   │   ├── e7 +│   │   │   └── 9b8402e5e91dd3191dc2bd30e2a270 +│   │   ├── eb +│   │   │   └── bf36e2b8bb5e4001f2f1e0029f3fa2 +│   │   ├── ed +│   │   │   └── c7e5bb869a745d327315924a20e7da +│   │   ├── ef +│   │   │   └── 7d94343cbecf17f1894760dd7b4af1 +│   │   ├── f0 +│   │   │   └── 5a2816635d78a2fed94f9f2b76d807.dir +│   │   ├── f2 +│   │   │   └── b67c60a1187b88a0e75497ffde7ac3 +│   │   ├── f4 +│   │   │   └── b6a23e8f6a6cf6e54a604e58642639 +│   │   ├── f9 +│   │   │   └── bcf17402eadb8705837fb2af0c06b0.dir +│   │   └── fc +│   │   └── 87fa6cf06eeec52acb1712d3e2d73f │   ├── FeatureEng_and_ModelBiasDetn.ipynb │   ├── Feature Engineering.ipynb +│   ├── feature_engineering.py +│   ├── filesbucket.ipynb +│   ├── GCPresorce.py │   ├── KNN.ipynb │   ├── linear_regression.ipynb │   ├── LSTM.ipynb │   ├── ML Models.ipynb +│   ├── 
preprocess_data.ipynb │   ├── PROJECT_DATA_CLEANING.ipynb │   ├── README.md │   ├── RF_Model.ipynb │   ├── SVM.ipynb +│   ├── synclocal.ipynb │   └── XGBoost.ipynb ├── src.dvc ├── striped-graph-440017-d7-79f99f8253bc.json +├── striped-graph-440017-d7-c8fdb42bc8ba.json └── tests ├── test_convert_column_dtype.py ├── test_correlation.py @@ -335,4 +635,4 @@ ├── test_scaler.py └── test_technical_indicators.py -58 directories, 278 files +165 directories, 471 files diff --git a/models/X_test_split.csv b/data/models/X_test_split.csv similarity index 100% rename from models/X_test_split.csv rename to data/models/X_test_split.csv diff --git a/models/X_train_split.csv b/data/models/X_train_split.csv similarity index 100% rename from models/X_train_split.csv rename to data/models/X_train_split.csv diff --git a/models/X_validation_split.csv b/data/models/X_validation_split.csv similarity index 100% rename from models/X_validation_split.csv rename to data/models/X_validation_split.csv diff --git a/models/cleaned_data.csv b/data/models/cleaned_data.csv similarity index 100% rename from models/cleaned_data.csv rename to data/models/cleaned_data.csv diff --git a/pipeline/README.md b/pipeline/README.md index d2c225b..e866c88 100644 --- a/pipeline/README.md +++ b/pipeline/README.md @@ -8,6 +8,8 @@ This section explains the DAGs pipeline implemented using Apache Airflow for wor - [Pipeline Components](#pipeline-components) - [Setup and Usage](#setup-and-usage) - [Data Sources](#data-sources) +- [Model Serving and Deployment](#model-serving-and-deployment) +- [Monitoring and Maintenance](#monitoring-and-maintenance) --- @@ -20,8 +22,8 @@ This section explains the DAGs pipeline implemented using Apache Airflow for wor │ │ ├── correlation_matrix_after_removing_correlated_features.png │ │ ├── Feature Importance for ElasticNet on Test Set.png │ │ ├── Feature Importance for Lasso on Test Set.png -│ │ ├── Linear Regression - Hyperparameter Sensitivity: model__alpha.png -│ │ ├── Linear 
Regression - Hyperparameter Sensitivity: model__l1_ratio.png +│ │ ├── Linear Regression - Hyperparameter Sensitivity model__alpha.png +│ │ ├── Linear Regression - Hyperparameter Sensitivity model__l1_ratio.png │ │ ├── pca_components.png │ │ └── yfinance_time_series.png │ ├── dags # Contains Airflow DAGs @@ -157,4 +159,6 @@ To set up and run the pipeline: 1. **ADS Index**: Tracks economic trends and business cycles. 2. **Fama-French Factors**: Provides historical data for financial research. 3. **FRED Variables**: Includes various economic indicators, such as AMERIBOR, NIKKEI 225, and VIX. -4. **YFinance**: Pulls historical stock data ('GOOGL') for financial time-series analysis. \ No newline at end of file +4. **YFinance**: Pulls historical stock data ('GOOGL') for financial time-series analysis. + +--- diff --git a/pipeline/pipelinetree.txt b/pipeline/pipelinetree.txt index ea3e90b..af2f6e6 100644 --- a/pipeline/pipelinetree.txt +++ b/pipeline/pipelinetree.txt @@ -8,8 +8,8 @@ │   │   ├── Feature Importance for Lasso.png │   │   ├── Feature Importance for Ridge on Test Set.png │   │   ├── Feature Importance for Ridge.png -│   │   ├── Linear Regression - Hyperparameter Sensitivity: model__alpha.png -│   │   ├── Linear Regression - Hyperparameter Sensitivity: model__l1_ratio.png +│   │   ├── Linear Regression - Hyperparameter Sensitivity_ model__alpha.png +│   │   ├── Linear Regression - Hyperparameter Sensitivity_ model__l1_ratio.png │   │   ├── pca_components.png │   │   └── yfinance_time_series.png │   ├── dags @@ -17,6 +17,7 @@ │   │   ├── data │   │   │   ├── ADS_index.csv │   │   │   ├── fama_french.csv +│   │   │   ├── final_dataset_for_modeling.csv │   │   │   ├── FRED_Variables │   │   │   │   ├── AMERIBOR.csv │   │   │   │   ├── BAMLH0A0HYM2.csv @@ -46,8 +47,7 @@ │   │   │   │   ├── USRECDP.csv │   │   │   │   └── VIXCLS.csv │   │   │   └── merged_original_dataset.csv -│   │   ├── __pycache__ -│   │   │   └── airflow.cpython-312.pyc +│   │   ├── 
requirements.txt │   │   └── src │   │   ├── convert_column_dtype.py │   │   ├── correlation.py @@ -57,42 +57,29 @@ │   │   ├── keep_latest_data.py │   │   ├── lagged_features.py │   │   ├── models +│   │   │   ├── cleaned_data.csv │   │   │   ├── linear_regression.py │   │   │   ├── LSTM.py │   │   │   ├── model_bias_detection.py │   │   │   ├── model_sensitivity_analysis.py │   │   │   ├── model_utils.py +│   │   │   ├── svr.py │   │   │   └── XGBoost.py │   │   ├── pca.py │   │   ├── plot_yfinance_time_series.py -│   │   ├── __pycache__ -│   │   │   ├── convert_column_dtype.cpython-312.pyc -│   │   │   ├── correlation.cpython-312.pyc -│   │   │   ├── download_data.cpython-312.pyc -│   │   │   ├── feature_interactions.cpython-312.pyc -│   │   │   ├── handle_missing.cpython-312.pyc -│   │   │   ├── keep_latest_data.cpython-312.pyc -│   │   │   ├── lagged_features.cpython-312.pyc -│   │   │   ├── pca.cpython-312.pyc -│   │   │   ├── plot_yfinance_time_series.cpython-312.pyc -│   │   │   ├── remove_weekend_data.cpython-312.pyc -│   │   │   ├── Requirement.cpython-312.pyc -│   │   │   ├── scaler.cpython-312.pyc -│   │   │   └── technical_indicators.cpython-312.pyc │   │   ├── remove_weekend_data.py │   │   ├── scaler.py │   │   ├── technical_indicators.py -│   │   ├── upload_blob.py -│   │   └── wandb_log.py +│   │   └── upload_blob.py │   ├── docker-compose.yaml +│   ├── Dockerfile │   ├── dvc.lock │   ├── dvc.yaml │   ├── images_push_to_gcp.sh │   ├── logs │   │   └── scheduler -│   │   ├── latest -> /opt/airflow/logs/scheduler/2024-11-16 +│   │   ├── latest -> /opt/airflow/logs/scheduler/2024-12-02 │   │   └── latest~HEAD -│   ├── plugins │   ├── tests │   │   ├── test_convert_column_dtype.py │   │   ├── test_correlation.py @@ -106,64 +93,64 @@ │   │   ├── test_remove_weekend_data.py │   │   ├── test_scaler.py │   │   └── test_technical_indicators.py -│   ├── wandb -│   │   ├── latest-run -> run-20241115_215718-z46bnrst -│   │   ├── run-20241115_211624-l57amau7 -│ 
  │   │   ├── files -│   │   │   │   ├── config.yaml -│   │   │   │   ├── wandb-metadata.json -│   │   │   │   └── wandb-summary.json -│   │   │   └── run-l57amau7.wandb -│   │   ├── run-20241115_212249-pkl9a7mt -│   │   │   ├── files -│   │   │   │   ├── config.yaml -│   │   │   │   ├── wandb-metadata.json -│   │   │   │   └── wandb-summary.json -│   │   │   └── run-pkl9a7mt.wandb -│   │   ├── run-20241115_212432-xrpxqqpj -│   │   │   ├── files -│   │   │   │   ├── config.yaml -│   │   │   │   ├── wandb-metadata.json -│   │   │   │   └── wandb-summary.json -│   │   │   └── run-xrpxqqpj.wandb -│   │   ├── run-20241115_212708-qmlg5wns -│   │   │   ├── files -│   │   │   │   ├── config.yaml -│   │   │   │   ├── wandb-metadata.json -│   │   │   │   └── wandb-summary.json -│   │   │   └── run-qmlg5wns.wandb -│   │   ├── run-20241115_212913-vu840z1f -│   │   │   ├── files -│   │   │   │   ├── config.yaml -│   │   │   │   ├── wandb-metadata.json -│   │   │   │   └── wandb-summary.json -│   │   │   └── run-vu840z1f.wandb -│   │   ├── run-20241115_215701-jcdpdy5d -│   │   │   ├── files -│   │   │   │   ├── config.yaml -│   │   │   │   ├── wandb-metadata.json -│   │   │   │   └── wandb-summary.json -│   │   │   └── run-jcdpdy5d.wandb -│   │   ├── run-20241115_215708-13bfiift -│   │   │   ├── files -│   │   │   │   ├── config.yaml -│   │   │   │   ├── wandb-metadata.json -│   │   │   │   └── wandb-summary.json -│   │   │   └── run-13bfiift.wandb -│   │   ├── run-20241115_215712-vbyyrfjs -│   │   │   ├── files -│   │   │   │   ├── config.yaml -│   │   │   │   ├── wandb-metadata.json -│   │   │   │   └── wandb-summary.json -│   │   │   └── run-vbyyrfjs.wandb -│   │   └── run-20241115_215718-z46bnrst -│   │   ├── files -│   │   │   ├── config.yaml -│   │   │   ├── wandb-metadata.json -│   │   │   └── wandb-summary.json -│   │   └── run-z46bnrst.wandb -│   └── working_data +│   └── wandb +│   ├── latest-run -> run-20241115_215718-z46bnrst +│   ├── run-20241115_211624-l57amau7 +│ 
  │   ├── files +│   │   │   ├── config.yaml +│   │   │   ├── wandb-metadata.json +│   │   │   └── wandb-summary.json +│   │   └── run-l57amau7.wandb +│   ├── run-20241115_212249-pkl9a7mt +│   │   ├── files +│   │   │   ├── config.yaml +│   │   │   ├── wandb-metadata.json +│   │   │   └── wandb-summary.json +│   │   └── run-pkl9a7mt.wandb +│   ├── run-20241115_212432-xrpxqqpj +│   │   ├── files +│   │   │   ├── config.yaml +│   │   │   ├── wandb-metadata.json +│   │   │   └── wandb-summary.json +│   │   └── run-xrpxqqpj.wandb +│   ├── run-20241115_212708-qmlg5wns +│   │   ├── files +│   │   │   ├── config.yaml +│   │   │   ├── wandb-metadata.json +│   │   │   └── wandb-summary.json +│   │   └── run-qmlg5wns.wandb +│   ├── run-20241115_212913-vu840z1f +│   │   ├── files +│   │   │   ├── config.yaml +│   │   │   ├── wandb-metadata.json +│   │   │   └── wandb-summary.json +│   │   └── run-vu840z1f.wandb +│   ├── run-20241115_215701-jcdpdy5d +│   │   ├── files +│   │   │   ├── config.yaml +│   │   │   ├── wandb-metadata.json +│   │   │   └── wandb-summary.json +│   │   └── run-jcdpdy5d.wandb +│   ├── run-20241115_215708-13bfiift +│   │   ├── files +│   │   │   ├── config.yaml +│   │   │   ├── wandb-metadata.json +│   │   │   └── wandb-summary.json +│   │   └── run-13bfiift.wandb +│   ├── run-20241115_215712-vbyyrfjs +│   │   ├── files +│   │   │   ├── config.yaml +│   │   │   ├── wandb-metadata.json +│   │   │   └── wandb-summary.json +│   │   └── run-vbyyrfjs.wandb +│   └── run-20241115_215718-z46bnrst +│   ├── files +│   │   ├── config.yaml +│   │   ├── wandb-metadata.json +│   │   └── wandb-summary.json +│   └── run-z46bnrst.wandb ├── pipelinetree.txt -└── README.md +├── README.md +└── requirements.txt -35 directories, 132 files +31 directories, 123 files