Author | Contribution | Version | Date |
---|---|---|---|
Rob van Eijk @rvaneijk | AWS SageMaker CloudFormation template and documentation | 1.0 | 2025-01-25 |
Rob van Eijk @rvaneijk | Multi-GPU support and documentation | 1.1 | 2025-02-08 |

Changes in version 1.1:
- Added multi-GPU support with PyTorch distributed training
- Enhanced CloudFormation templates for larger GPU instances
- Added real-time training progress monitoring
- Split templates into production/development and single/multi-GPU variants

Changes in version 1.0:
- Initial CloudFormation template for Privacy Meter deployment on AWS SageMaker
- Custom Conda environment with PyTorch GPU support
- Automated package verification system
- Documentation and deployment guide
The deployment offers four CloudFormation template variants to match your needs:
- `cloudformation-template.yml` (single GPU)
  - Production environment, single GPU
  - Default instance: ml.g4dn.xlarge (1x T4 GPU)
  - Suitable for: Production deployments with smaller workloads
  - Stable release repository
- `cloudformation-template-multi-gpu.yml` (4 GPUs)
  - Production environment, multiple GPUs
  - Default instance: ml.g4dn.12xlarge (4x T4 GPUs)
  - Suitable for: Production deployments with larger workloads
  - Stable release repository
- `cloudformation-template-dev.yml` (single GPU)
  - Development environment, single GPU
  - Default instance: ml.g4dn.xlarge (1x T4 GPU)
  - Suitable for: Development and testing with smaller workloads
  - Development repository with latest features
- `cloudformation-template-dev-multi-gpu.yml` (4 GPUs)
  - Development environment, multiple GPUs
  - Default instance: ml.g4dn.12xlarge (4x T4 GPUs)
  - Suitable for: Development and testing with larger workloads
  - Development repository with latest features
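
Choosing a variant comes down to two questions: production or development, and one GPU or four. The Python sketch below encodes that lookup using only the file names and default instance types listed above. The `TEMPLATES` table and the `select_template` function are hypothetical helpers for illustration and are not part of the repository.

```python
# Hypothetical helper: maps (environment, multi_gpu) to the template file and
# its default instance type, as listed in the variants above.
TEMPLATES = {
    ("production", False): ("cloudformation-template.yml", "ml.g4dn.xlarge"),
    ("production", True): ("cloudformation-template-multi-gpu.yml", "ml.g4dn.12xlarge"),
    ("development", False): ("cloudformation-template-dev.yml", "ml.g4dn.xlarge"),
    ("development", True): ("cloudformation-template-dev-multi-gpu.yml", "ml.g4dn.12xlarge"),
}


def select_template(environment: str, multi_gpu: bool) -> tuple[str, str]:
    """Return (template file, default instance type) for the requested variant."""
    key = (environment.lower(), multi_gpu)
    if key not in TEMPLATES:
        raise ValueError(f"unknown environment: {environment!r}")
    return TEMPLATES[key]


if __name__ == "__main__":
    template, instance = select_template("development", multi_gpu=True)
    print(f"Deploy {template} (default instance: {instance})")
```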
This CloudFormation template deploys Privacy Meter as an AWS SageMaker notebook instance. Key features include:
[Previous features list remains the same...]
Disclaimer: This template is provided as-is without any warranties. Users are responsible for understanding AWS costs, security implications, and maintaining their deployments. Always review and test the template before deploying in a production environment.
- Select the appropriate template:
  - For production use:
    - Single GPU: `cloudformation-template.yml`
    - Multiple GPUs: `cloudformation-template-multi-gpu.yml`
  - For development use:
    - Single GPU: `cloudformation-template-dev.yml`
    - Multiple GPUs: `cloudformation-template-dev-multi-gpu.yml`
- Deploy using the AWS console:
  - Navigate to the AWS CloudFormation console
  - Click "Create stack" (with new resources)
  - Upload your selected template file
  - Follow the prompts to complete deployment

  Alternatively, deploy using the AWS CLI:

```bash
aws cloudformation create-stack \
  --stack-name privacy-meter-dev \
  --template-body file://[selected-template].yml \
  --capabilities CAPABILITY_IAM
```
- Access your notebook:
  - Navigate to the AWS SageMaker console
  - Click "Notebook instances"
  - Find your instance (default name: PrivacyMeterNotebook)
  - Click "Open JupyterLab"
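
The console and CLI steps above can also be scripted. The sketch below uses boto3, the AWS SDK for Python, and assumes that credentials and a default region are already configured and that the notebook instance keeps its default name `PrivacyMeterNotebook`. The helper name `deploy_and_get_url` is illustrative. The function calls `create_stack`, waits for completion with the `stack_create_complete` waiter, and then requests a presigned URL for the notebook instance from SageMaker.

```python
from pathlib import Path


def deploy_and_get_url(cfn, sagemaker, template_path: str,
                       stack_name: str = "privacy-meter-dev",
                       notebook_name: str = "PrivacyMeterNotebook") -> str:
    """Create the stack, wait for it, and return a presigned notebook URL.

    `cfn` and `sagemaker` are boto3 clients, passed in so they can be stubbed.
    """
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=Path(template_path).read_text(),
        Capabilities=["CAPABILITY_IAM"],  # same capability as the CLI example
    )
    # Block until CloudFormation reports CREATE_COMPLETE (the waiter raises on failure).
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    # Confirm that the notebook instance itself is InService.
    sagemaker.get_waiter("notebook_instance_in_service").wait(
        NotebookInstanceName=notebook_name)
    response = sagemaker.create_presigned_notebook_instance_url(
        NotebookInstanceName=notebook_name)
    return response["AuthorizedUrl"]


if __name__ == "__main__":
    import boto3  # requires AWS credentials and a default region

    url = deploy_and_get_url(boto3.client("cloudformation"),
                             boto3.client("sagemaker"),
                             "cloudformation-template-dev.yml")
    print(url)
```

The clients are injected rather than created inside the function so that the logic can be tested without AWS access. As with `--template-body` on the CLI, `TemplateBody` is limited to 51,200 bytes. A larger template would need to be uploaded to S3 and passed as `TemplateURL`.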
The template supports the following instance types, depending on your workload:
- Single GPU (small workloads)
  - Instance: `ml.g4dn.xlarge` (default for the single-GPU templates)
  - 1x NVIDIA T4 GPU
  - 4 vCPUs
  - 16 GB RAM
- Multi-GPU (medium workloads, 4 GPUs)
  - Instance: `ml.g4dn.12xlarge` (default for the multi-GPU templates)
  - 4x NVIDIA T4 GPUs
  - 48 vCPUs
  - 192 GB RAM
  - Up to 900 GB NVMe SSD
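
The two instance profiles above can be expressed as data, which makes it straightforward to pick the smallest listed instance that offers a requested number of GPUs. This sketch uses only the figures from the list. The `INSTANCE_SPECS` table and the `smallest_instance_for` helper are illustrative and are not part of the template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    gpus: int    # NVIDIA T4 GPUs
    vcpus: int
    ram_gb: int


# Figures taken from the instance list above.
INSTANCE_SPECS = {
    "ml.g4dn.xlarge": InstanceSpec(gpus=1, vcpus=4, ram_gb=16),
    "ml.g4dn.12xlarge": InstanceSpec(gpus=4, vcpus=48, ram_gb=192),
}


def smallest_instance_for(num_gpus: int) -> str:
    """Return the listed instance with the fewest GPUs that still has num_gpus."""
    candidates = [(spec.gpus, name) for name, spec in INSTANCE_SPECS.items()
                  if spec.gpus >= num_gpus]
    if not candidates:
        raise ValueError(f"no listed instance offers {num_gpus} GPUs")
    return min(candidates)[1]


if __name__ == "__main__":
    print(smallest_instance_for(2))  # ml.g4dn.12xlarge
```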
The `demo_aws_multi_gpu.ipynb` notebook provides multi-GPU training capabilities:

```python
# Configure multi-GPU training
models_list = parallel_prepare_models(
    log_dir,
    dataset,
    data_splits,
    memberships,
    configs,
    logger,
    num_gpus=4,  # Adjust based on instance type
)
```
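
Instead of hard-coding `num_gpus=4`, the value can be derived from the hardware the notebook is running on, which matches the "all available" default described for the number of GPUs. This sketch assumes that PyTorch is installed in the kernel. `torch.cuda.device_count()` and `torch.cuda.is_available()` are standard PyTorch calls. The helpers `available_gpus` and `resolve_num_gpus` are hypothetical, and the result would be passed as the `num_gpus` argument shown above.

```python
from typing import Optional


def available_gpus() -> int:
    """Number of CUDA devices visible to PyTorch (0 if torch or CUDA is missing)."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count() if torch.cuda.is_available() else 0


def resolve_num_gpus(requested: Optional[int] = None,
                     available: Optional[int] = None) -> int:
    """Use all available GPUs by default and never request more than exist."""
    if available is None:
        available = available_gpus()
    if requested is None:
        return available
    if requested < 1:
        raise ValueError("requested must be at least 1")
    return min(requested, available)


if __name__ == "__main__":
    # Expected: 4 on ml.g4dn.12xlarge, 1 on ml.g4dn.xlarge
    print(resolve_num_gpus())
```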
Features include:
- Real-time training progress for each GPU
- Memory utilization tracking
- Cost estimation for spot instances
- Comprehensive logging and metadata collection
Configurable parameters include:
- Instance name (default: PrivacyMeterNotebook)
- GPU instance type (chosen according to workload size)
- Repository URL (privacy_meter or privacy_meter_dev)
- Python version (default: 3.10)
- Kernel name (default: conda_privacymeter_p310)
- PyTorch installation URL (default: https://download.pytorch.org/whl/cu118, the CUDA 11.8 build of PyTorch)
- Number of GPUs to use (default: all available)
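
When you deploy from a script, template settings such as those listed above are supplied as CloudFormation stack parameters. The sketch below converts a plain dictionary into the `Parameters` list shape that boto3's `create_stack` accepts (`ParameterKey`/`ParameterValue` pairs). The parameter keys used here (`NotebookInstanceName`, `InstanceType`, `PythonVersion`) are assumptions. Check the `Parameters` section of the template you deploy for the actual names.

```python
def to_cfn_parameters(values: dict) -> list[dict]:
    """Convert {"Key": value} into CloudFormation's ParameterKey/ParameterValue list.

    CloudFormation parameter values are strings, so every value goes through str().
    """
    return [{"ParameterKey": key, "ParameterValue": str(value)}
            for key, value in values.items()]


# Hypothetical parameter keys; the real names are defined in the template.
overrides = {
    "NotebookInstanceName": "PrivacyMeterNotebook",
    "InstanceType": "ml.g4dn.12xlarge",
    "PythonVersion": "3.10",  # keep versions quoted: str(3.10) would give "3.1"
}

if __name__ == "__main__":
    for parameter in to_cfn_parameters(overrides):
        print(parameter)
```

The resulting list would be passed as `Parameters=...` to `create_stack`. Any parameter you omit keeps the default defined in the template.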
- Monitor setup progress and check the log file:
  - Installation complete flag: wait for the `/home/ec2-user/SageMaker/setup-complete` file to appear.
  - Log file: check `/home/ec2-user/SageMaker/setup.log` for installation progress and errors. The tail of the file should look like:

```
All packages match required versions!
Installed kernelspec privacymeter_p310 in /home/ec2-user/.local/share/jupyter/kernels/privacymeter_p310
```
  - Verification notebook: verify that the conda environment is active in the `verification.ipynb` notebook:
    - Open the notebook with the Privacy Meter kernel (preferred kernel: `conda_privacymeter_p310`)
    - If the list of installed kernels has not been updated yet, first reload the notebook with a different kernel
    - Check the last line of the output of Cell 4. It should read:

```
All packages match required versions!
```
  - Conda environment: verify that the conda environment is active from an AWS SageMaker terminal window:
    - Make sure the suffix in `conda activate privacymeter_p310` matches the Python version set in the CloudFormation template (for example, `p310` for Python 3.10)
```bash
export WORKING_DIR=/home/ec2-user/SageMaker/custom-miniconda
"$WORKING_DIR/miniconda/bin/conda" init bash
source ~/.bashrc
conda activate privacymeter_p310
conda info --envs
```
    - The output should look like:

```
# conda environments:
#
base                     /home/ec2-user/SageMaker/custom-miniconda/miniconda
privacymeter_p310     *  /home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/privacymeter_p310
                         /home/ec2-user/anaconda3
                         /home/ec2-user/anaconda3/envs/JupyterSystemEnv
                         /home/ec2-user/anaconda3/envs/R
                         /home/ec2-user/anaconda3/envs/python3
                         /home/ec2-user/anaconda3/envs/pytorch_p310
                         /home/ec2-user/anaconda3/envs/tensorflow2_p310
```
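
The manual checks above can be automated from a SageMaker terminal or notebook. This sketch uses only the Python standard library. It waits for the `setup-complete` flag file, prints the tail of `setup.log`, and parses `conda info --envs` output to find the active environment, which conda marks with `*`. The paths come from this guide. The function names and the timeout and polling values are illustrative assumptions.

```python
import subprocess
import time
from collections import deque
from pathlib import Path
from typing import Optional

SAGEMAKER_DIR = Path("/home/ec2-user/SageMaker")


def wait_for_setup(flag: Path = SAGEMAKER_DIR / "setup-complete",
                   timeout_s: float = 3600, poll_s: float = 30) -> bool:
    """Poll until the flag file exists; return False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while not flag.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True


def tail(path: Path = SAGEMAKER_DIR / "setup.log", lines: int = 5) -> list[str]:
    """Return the last `lines` lines of a text file."""
    with path.open(errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


def active_conda_env(envs_output: str) -> Optional[str]:
    """Return the name of the env marked with '*' in `conda info --envs` output."""
    for line in envs_output.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank and comment lines
        if parts[0] != "*" and "*" in parts[1:]:
            return parts[0]
    return None


if __name__ == "__main__":
    if wait_for_setup():
        print("\n".join(tail()))
        # Assumes `conda` is on PATH, e.g. after the activation commands above.
        envs = subprocess.run(["conda", "info", "--envs"],
                              capture_output=True, text=True, check=True).stdout
        print("Active environment:", active_conda_env(envs))
    else:
        print("Setup has not finished yet")
```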
To remove all resources, open the AWS CloudFormation console, select the stack you want to delete, and choose "Delete".
Alternatively, remove all resources using the AWS CLI:

```bash
aws cloudformation delete-stack --stack-name privacy-meter-dev
```
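
Stack deletion is asynchronous, so `delete-stack` returns before the resources are gone. The boto3 sketch below deletes the stack and then blocks until CloudFormation reports completion, using the `stack_delete_complete` waiter. The helper name is hypothetical, and the stack name matches the CLI example. On the command line, `aws cloudformation wait stack-delete-complete --stack-name privacy-meter-dev` serves the same purpose.

```python
def delete_stack_and_wait(cfn, stack_name: str = "privacy-meter-dev") -> None:
    """Delete the stack and wait until DELETE_COMPLETE (the waiter raises on failure).

    `cfn` is a boto3 CloudFormation client, e.g. boto3.client("cloudformation").
    """
    cfn.delete_stack(StackName=stack_name)
    cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)


if __name__ == "__main__":
    import boto3  # requires AWS credentials and a default region

    delete_stack_and_wait(boto3.client("cloudformation"))
    print("Stack deleted")
```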
- Stop the notebook instance when it is not in use via the AWS SageMaker console (Notebook instances, then Stop)

  Alternatively, stop it using the AWS CLI:

```bash
aws sagemaker stop-notebook-instance --notebook-instance-name PrivacyMeterNotebook
```

- Consider scheduling automatic starts and stops around work hours
- Monitor costs through AWS Cost Explorer and set up billing alarms as needed
- For multi-GPU instances, be especially mindful of usage, as costs scale with GPU count
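
One way to implement work-hours scheduling is a small script that runs periodically, triggered by a scheduler of your choice. The script compares the current time with a working window and starts or stops the notebook instance to match. This is a sketch under several assumptions: the 08:00-18:00 weekday window, the helper names, and the scheduling mechanism are illustrative. The boto3 calls `describe_notebook_instance`, `start_notebook_instance`, and `stop_notebook_instance` are the SageMaker API counterparts of the stop command above.

```python
from datetime import datetime


def within_work_hours(now: datetime, start_hour: int = 8, end_hour: int = 18) -> bool:
    """True on weekdays between start_hour (inclusive) and end_hour (exclusive)."""
    return now.weekday() < 5 and start_hour <= now.hour < end_hour


def enforce_schedule(sagemaker, now: datetime,
                     name: str = "PrivacyMeterNotebook") -> str:
    """Start or stop the notebook instance to match the schedule; return the action."""
    status = sagemaker.describe_notebook_instance(
        NotebookInstanceName=name)["NotebookInstanceStatus"]
    if within_work_hours(now) and status == "Stopped":
        sagemaker.start_notebook_instance(NotebookInstanceName=name)
        return "started"
    if not within_work_hours(now) and status == "InService":
        sagemaker.stop_notebook_instance(NotebookInstanceName=name)
        return "stopped"
    return "unchanged"  # already in the desired state, or Pending/Stopping/etc.


if __name__ == "__main__":
    import boto3  # requires AWS credentials and a default region

    # datetime.now() uses the local time zone of the machine running the script.
    print(enforce_schedule(boto3.client("sagemaker"), datetime.now()))
```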