Make braket_container.py thread safe #326

Open
sterseba opened this issue Feb 13, 2025 · 0 comments

Describe the feature you'd like
The braket_container.py script, which launches the user-provided algorithm script, is currently not thread safe. When a multi-node job is parallelized through MPI (the hyperparameter sagemaker_mpi_enabled makes SageMaker invoke braket_container.py with mpirun), several copies of the script run concurrently. This can create race conditions, in particular in the step that downloads, extracts, and makes available the user-provided code.

The braket_container.py script should be made thread safe to account for jobs running on multiple instances or (GPU) cores with sagemaker_mpi_enabled=True.
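
To illustrate the kind of guard I have in mind, here is a minimal sketch that serializes the extraction step with an exclusive file lock and a completion marker. It is not taken from braket_container.py: the function name `extract_once`, the `.lock` file, and the `.extracted` marker are all hypothetical, and it assumes a Linux container (for `fcntl`) where the competing processes on a host share the target directory. Processes on different instances would each take their own local lock.

```python
import fcntl
import tarfile
from pathlib import Path


def extract_once(archive_path: str, target_dir: str) -> bool:
    """Extract archive_path into target_dir at most once per host.

    Hypothetical helper, not part of braket_container.py.
    Returns True if this process did the extraction, and False if
    another process had already completed it.
    """
    target = Path(target_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    lock_path = target.parent / (target.name + ".lock")
    done_marker = target / ".extracted"

    with open(lock_path, "w") as lock_file:
        # Blocks until no other process on this host holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if done_marker.exists():
                # Another process finished the extraction already.
                return False
            target.mkdir(parents=True, exist_ok=True)
            with tarfile.open(archive_path) as tar:
                tar.extractall(target)
            # Written last, so a marker implies a complete extraction.
            done_marker.touch()
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Every MPI rank could call such a helper unconditionally: the first one to acquire the lock extracts the code, and the others wait and then skip the step. The download step could be guarded in the same way.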

How would this feature be used? Please describe.
The user shouldn't have to worry about this feature and, specifically, shouldn't have to change the braket_container.py script to use MPI support for their jobs.

Additional context
There is an example of a simple workaround for this issue in the amazon-braket-examples repository. I have created an issue there to document that the workaround doesn't ultimately solve the problem, but I think the problem should be addressed here.
