
Commit 6774846

Reduced-size CUDA-Q BYOC image based on pre-built Braket base jobs image (#684)

Authored by sterseba, rmshaffer, and yitchen-tim on Mar 10, 2025
Co-authored-by: Ryan Shaffer <3620100+rmshaffer@users.noreply.github.com>
Co-authored-by: Tim (Yi-Ting) <yitchen@amazon.com>

1 parent 198f2d1 · commit 6774846

11 files changed: +240 −85 lines
 

README.md (+6 −1)
@@ -13,6 +13,7 @@ The examples in this repository are structured as follows:
 - [Pulse Control](#pulse)
 - [Analog Hamiltonian Simulation](#ahs)
 - [Qiskit with Braket](#qiskit)
+- [CUDA-Q](#cudaq)

 ---

@@ -312,7 +313,11 @@ This folder contains examples that illustrate the use of Amazon Braket Hybrid Jo
 - [**Parallel simulations on multiple GPUs**](examples/nvidia_cuda_q/2_parallel_simulations.ipynb)

-  This tutorial shows you how to parallelize the simulations of observables and circuit batches over multiple GPUs using Braket Hybrid Jobs.
+  This tutorial shows you how to parallelize the simulations of observables and circuit batches over multiple GPUs using CUDA-Q with Braket Hybrid Jobs.
+
+- [**Distributed state vector simulations on multiple GPUs (advanced)**](examples/nvidia_cuda_q/3_distributed_statevector_simulations.ipynb)
+
+  This tutorial shows you how to distribute a single state vector simulation across multiple GPUs using CUDA-Q with Braket Hybrid Jobs.

 ---

examples/nvidia_cuda_q/0_hello_cudaq_jobs.ipynb (+7 −3)
@@ -260,7 +260,7 @@
    "metadata": {},
    "source": [
     "## Summary\n",
-    "This notebook shows you how to run your first CUDA-Q program with Amazon Braket Hybrid Jobs. Using the BYOC feature of Amazon Braket and a shell script we provide, you can create a CUDA-Q environment with a few lines of code. Once you have registered your CUDA-Q container image, you can run CUDA-Q programs with Braket Hybrid Jobs and scale your workloads up and out with the range of compute options provided by AWS. In the following tutorials, we will show you how to run CUDA-Q simulations on GPUs ([notebook](1_simulation_with_GPUs.ipynb)) and distribute workloads across multiple instances ([notebook](2_parallel_simulations.ipynb))."
+    "This notebook shows you how to run your first CUDA-Q program with Amazon Braket Hybrid Jobs. Using the BYOC feature of Amazon Braket and a shell script we provide, you can create a CUDA-Q environment with a few lines of code. Once you have registered your CUDA-Q container image, you can run CUDA-Q programs with Braket Hybrid Jobs and scale your workloads up and out with the range of compute options provided by AWS. In the following tutorials, we will show you how to run CUDA-Q simulations on GPUs ([notebook](1_simulation_with_GPUs.ipynb)), distribute workloads across multiple instances ([notebook](2_parallel_simulations.ipynb)), and distribute a single state vector simulation across multiple GPUs ([notebook](3_distributed_statevector_simulations.ipynb))."
    ]
   },
   {
@@ -271,7 +271,8 @@
    "## Appendix: Procedure for building the container\n",
    "\n",
    "When the shell script `container_build_and_push.sh` is called, a Docker container is built with CUDA-Q, and other GPU-related settings are configured. The procedure for BYOC is presented in [this page from the Braket Developer Guide](https://docs.aws.amazon.com/braket/latest/developerguide/braket-jobs-byoc.html). The required files for building a container with CUDA-Q are in the \"container\" folder, including\n",
-    "- `Dockerfile`: Describes how the container is built.\n",
+    "- `Dockerfile`: Describes how the container is built for CUDA-Q scenarios.\n",
+    "- `Dockerfile.mgpu`: (advanced) Describes how the container is built for CUDA-Q scenarios which require multi-GPU support.\n",
    "- `requirements.txt`: Additional Python dependencies to include.\n",
    "- `braket_container.py`: The start-up script of a job container.\n",
    "\n",
@@ -306,6 +307,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "from braket.aws import AwsQuantumJob\n",
+    "from braket.devices import Devices\n",
+    "\n",
    "# create a hybrid job\n",
    "job = AwsQuantumJob.create(\n",
    "    device=Devices.Amazon.SV1,\n",
@@ -315,7 +319,7 @@
    "\n",
    "# view the ARN and the status of the job\n",
    "print(\"ARN of the job: \", job.arn)\n",
-    "print(\"Status of the job: \", job.status())"
+    "print(\"Status of the job: \", job.state())"
    ]
   },
  ],
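Taken together, the hunks above produce a complete, runnable cell. A minimal standalone version is sketched below, assuming a hypothetical `source_module` path and a placeholder `image_uri` (the change from `job.status()` to `job.state()` matches the status method that `AwsQuantumJob` actually provides):

```python
from braket.aws import AwsQuantumJob
from braket.devices import Devices

# create a hybrid job on the SV1 simulator using the BYOC CUDA-Q image
# ("algorithm_script.py" and the image URI are illustrative placeholders)
job = AwsQuantumJob.create(
    device=Devices.Amazon.SV1,
    source_module="algorithm_script.py",
    image_uri="<aws-account-id>.dkr.ecr.<region>.amazonaws.com/cudaq-job:latest",
)

# view the ARN and the status of the job
print("ARN of the job: ", job.arn)
print("Status of the job: ", job.state())
```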

examples/nvidia_cuda_q/1_simulation_with_GPUs.ipynb (+5 −3)
@@ -7,7 +7,7 @@
    "source": [
     "# Simulating quantum programs on GPUs\n",
     "\n",
-    "In this notebook, you will learn how to simulate quantum circuits using GPUs with Amazon Braket. You can learn about BYOC and view how to build a container image that supports CUDA-Q in the notebook \"0_hello_cudaq_jobs.ipynb\"."
+    "In this notebook, you will learn how to simulate quantum circuits using GPUs with NVIDIA CUDA-Q and Braket Hybrid Jobs."
    ]
   },
   {
@@ -39,7 +39,9 @@
    "id": "0bd2cd58-420c-4bf3-b161-09c27a272f8f",
    "metadata": {},
    "source": [
-    "Next, specify the URI of you CUDA-Q container image. If you went through the \"0_hello_cudaq_jobs.ipynb\" notebook, you can use the same image URI."
+    "Next, specify the URI of your container image that supports CUDA-Q.\n",
+    "\n",
+    "If you don't have this URI already, see the notebook \"0_hello_cudaq_jobs.ipynb\", where you can learn about Braket Hybrid Jobs and how to build a container image that supports CUDA-Q. After following the steps in that notebook to upload the container image, you can use the same image URI here."
    ]
   },
   {
@@ -228,7 +230,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.15"
+   "version": "3.10.13"
   }
  },
  "nbformat": 4,

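For context on what this notebook runs: a single-GPU CUDA-Q simulation job uses the same `@hybrid_job` decorator pattern that appears elsewhere in this commit. The sketch below is illustrative rather than the notebook's exact cell; the `local:nvidia/nvidia` device string, the `ml.p3.2xlarge` instance type, and the `image_uri` value are assumptions based on the surrounding examples:

```python
from braket.jobs import hybrid_job
from braket.jobs.config import InstanceConfig

# placeholder URI for the CUDA-Q container image built in notebook 0
image_uri = "<aws-account-id>.dkr.ecr.<region>.amazonaws.com/cudaq-job:latest"

@hybrid_job(
    device="local:nvidia/nvidia",  # CUDA-Q GPU state vector simulator
    instance_config=InstanceConfig(instanceType="ml.p3.2xlarge"),
    image_uri=image_uri,
)
def gpu_job(n_qubits, n_shots):
    import cudaq

    cudaq.set_target("nvidia")

    # GHZ state preparation kernel
    @cudaq.kernel
    def ghz():
        qubits = cudaq.qvector(n_qubits)
        h(qubits[0])
        for q in range(1, n_qubits):
            cx(qubits[0], qubits[q])

    # estimate the expectation value of a random Hamiltonian
    hamiltonian = cudaq.SpinOperator.random(n_qubits, 1)
    result = cudaq.observe(ghz, hamiltonian, shots_count=n_shots)
    return {"expectation": result.expectation()}

job = gpu_job(n_qubits=20, n_shots=1000)
print("Job ARN: ", job.arn)
```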
examples/nvidia_cuda_q/2_parallel_simulations.ipynb (+4 −64)
@@ -40,7 +40,9 @@
    "id": "1331136a-a369-4eef-8bfe-252c79103a3e",
    "metadata": {},
    "source": [
-    "Next, specify the URI of the container image that supports CUDA-Q. If you went through the \"0_hello_cudaq_jobs.ipynb\" notebook, you can use the same image URI."
+    "Next, specify the URI of your container image that supports CUDA-Q.\n",
+    "\n",
+    "If you don't have this URI already, see the notebook \"0_hello_cudaq_jobs.ipynb\", where you can learn about Braket Hybrid Jobs and how to build a container image that supports CUDA-Q. After following the steps in that notebook to upload the container image, you can use the same image URI here."
    ]
   },
   {
@@ -378,68 +380,6 @@
     "```"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "a844344d-0978-4b11-8fe4-66b387e80c72",
-   "metadata": {},
-   "source": [
-    "## Distributed statevector simulations\n",
-    "The `nvidia` target with `mgpu` option supports distributing state vector simulations to multiple GPUs. This enables GPU simulations for circuits with higher qubit count, to up to 34 qubits. The example below shows how to submit a job with the `mgpu` option."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d38b73de-6a7e-45b7-b64b-afec60c0a6c3",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "@hybrid_job(\n",
-    "    device=\"local:nvidia/nvidia-mgpu\",\n",
-    "    instance_config=InstanceConfig(instanceType=\"ml.p3.8xlarge\", instanceCount=1),\n",
-    "    image_uri=image_uri,\n",
-    ")\n",
-    "def distributed_gpu_job(\n",
-    "    n_qubits,\n",
-    "    n_shots,\n",
-    "    sagemaker_mpi_enabled=True,\n",
-    "):\n",
-    "    import cudaq\n",
-    "\n",
-    "    # Define target\n",
-    "    cudaq.set_target(\"nvidia\", option=\"mgpu\")\n",
-    "    print(\"CUDA-Q backend: \", cudaq.get_target())\n",
-    "    print(\"num_available_gpus: \", cudaq.num_available_gpus())\n",
-    "\n",
-    "    # Initialize MPI and view the MPI properties\n",
-    "    cudaq.mpi.initialize()\n",
-    "    rank = cudaq.mpi.rank()\n",
-    "\n",
-    "    # Define circuit and observables\n",
-    "    @cudaq.kernel\n",
-    "    def ghz():\n",
-    "        qubits = cudaq.qvector(n_qubits)\n",
-    "        h(qubits[0])\n",
-    "        for q in range(1, n_qubits):\n",
-    "            cx(qubits[0], qubits[q])\n",
-    "\n",
-    "    hamiltonian = cudaq.SpinOperator.random(n_qubits, 1)\n",
-    "\n",
-    "    # Parallelize circuit simulation\n",
-    "    result = cudaq.observe(ghz, hamiltonian, shots_count=n_shots)\n",
-    "\n",
-    "    # End the MPI interface\n",
-    "    cudaq.mpi.finalize()\n",
-    "\n",
-    "    if rank == 0:\n",
-    "        return {\"expectation\": result.expectation()}\n",
-    "\n",
-    "\n",
-    "n_qubits = 25\n",
-    "n_shots = 1000\n",
-    "distributed_job = distributed_gpu_job(n_qubits, n_shots)"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "18004fe8-24a0-4316-ab0a-a4e0aac6ba1e",
@@ -466,7 +406,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.15"
+   "version": "3.10.13"
   }
  },
  "nbformat": 4,

examples/nvidia_cuda_q/3_distributed_statevector_simulations.ipynb (+183 −0, new file)

@@ -0,0 +1,183 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "021cfd04-df68-414d-a864-48f62fc8ddfb",
+   "metadata": {},
+   "source": [
+    "# Distributed state vector simulations on multiple GPUs (advanced)\n",
+    "\n",
+    "In the notebook \"2_parallel_simulations.ipynb\", you learned how to use CUDA-Q and Braket Hybrid Jobs to parallelize the simulation of a batch of observables and circuits over multiple GPUs, where each GPU simulates a single QPU. For workloads with larger qubit counts, however, it may be necessary to distribute a single state vector simulation across multiple GPUs, so that multiple GPUs together simulate a single QPU.\n",
+    "\n",
+    "In this notebook, you will learn how to use CUDA-Q and Braket Hybrid Jobs to tackle this."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "32b46659-6dcc-4900-a13a-e971f8bf0590",
+   "metadata": {},
+   "source": [
+    "We start with the necessary imports that are used in the examples below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8738f65f-969c-4b58-96f8-69bbc1bad5e1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from braket.jobs import hybrid_job\n",
+    "from braket.jobs.config import InstanceConfig"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1331136a-a369-4eef-8bfe-252c79103a3e",
+   "metadata": {},
+   "source": [
+    "Next, we need to create and upload a container which contains both CUDA-Q and the underlying CUDA support required for distributing our computation across multiple GPUs. Note: this container image will be different from the one used in the previous notebooks illustrating more basic CUDA-Q scenarios.\n",
+    "\n",
+    "To do this, we need to run the commands in the cell below. (For more information about what these commands are doing, please see the detailed documentation in \"0_hello_cudaq_jobs.ipynb\". The difference here is that we specify the Dockerfile `Dockerfile.mgpu` in order to ensure full support for this advanced scenario.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f552c738",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!chmod +x container/container_build_and_push.sh\n",
+    "!container/container_build_and_push.sh cudaq-mgpu-job us-west-2 Dockerfile.mgpu"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc46e446",
+   "metadata": {},
+   "source": [
+    "Now we prepare the URI of the container image. Fill in the proper values of `aws_account_id`, `region_name`, and `container_image_name` in the cell below. For example, with the shell command above, `region_name=\"us-west-2\"` and `container_image_name=\"cudaq-mgpu-job\"`. The cell below prints out the image URI. When you use a container image to run a job, it ensures that your code is run in the same environment every time."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "25fdc720-3143-411a-8bef-f9623369b516",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "aws_account_id = \"<aws-account-id>\"\n",
+    "region_name = \"<region-name>\"\n",
+    "container_image_name = \"<container-image-name>\"\n",
+    "\n",
+    "image_uri = f\"{aws_account_id}.dkr.ecr.{region_name}.amazonaws.com/{container_image_name}:latest\"\n",
+    "print(image_uri)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a844344d-0978-4b11-8fe4-66b387e80c72",
+   "metadata": {},
+   "source": [
+    "## Distributed state vector simulations\n",
+    "Now that we have the container image URI, we are ready to run our workload. The `nvidia` target with the `mgpu` option supports distributing state vector simulations across multiple GPUs. This enables GPU simulations for circuits with higher qubit counts, up to 34 qubits. The example below shows how to submit a job with the `mgpu` option."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d38b73de-6a7e-45b7-b64b-afec60c0a6c3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@hybrid_job(\n",
+    "    device=\"local:nvidia/nvidia-mgpu\",\n",
+    "    instance_config=InstanceConfig(instanceType=\"ml.p3.8xlarge\", instanceCount=1),\n",
+    "    image_uri=image_uri,\n",
+    ")\n",
+    "def distributed_gpu_job(\n",
+    "    n_qubits,\n",
+    "    n_shots,\n",
+    "    sagemaker_mpi_enabled=True,\n",
+    "):\n",
+    "    import cudaq\n",
+    "\n",
+    "    # Define target\n",
+    "    cudaq.set_target(\"nvidia\", option=\"mgpu\")\n",
+    "    print(\"CUDA-Q backend: \", cudaq.get_target())\n",
+    "    print(\"num_available_gpus: \", cudaq.num_available_gpus())\n",
+    "\n",
+    "    # Initialize MPI and view the MPI properties\n",
+    "    cudaq.mpi.initialize()\n",
+    "    rank = cudaq.mpi.rank()\n",
+    "\n",
+    "    # Define circuit and observables\n",
+    "    @cudaq.kernel\n",
+    "    def ghz():\n",
+    "        qubits = cudaq.qvector(n_qubits)\n",
+    "        h(qubits[0])\n",
+    "        for q in range(1, n_qubits):\n",
+    "            cx(qubits[0], qubits[q])\n",
+    "\n",
+    "    hamiltonian = cudaq.SpinOperator.random(n_qubits, 1)\n",
+    "\n",
+    "    # Parallelize circuit simulation\n",
+    "    result = cudaq.observe(ghz, hamiltonian, shots_count=n_shots)\n",
+    "\n",
+    "    # End the MPI interface\n",
+    "    cudaq.mpi.finalize()\n",
+    "\n",
+    "    if rank == 0:\n",
+    "        return {\"expectation\": result.expectation()}\n",
+    "\n",
+    "\n",
+    "n_qubits = 25\n",
+    "n_shots = 1000\n",
+    "distributed_job = distributed_gpu_job(n_qubits, n_shots)\n",
+    "print(\"Job ARN: \", distributed_job.arn)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a054c24e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "distributed_job_result = distributed_job.result()\n",
+    "print(f\"result: {distributed_job_result['expectation']}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "18004fe8-24a0-4316-ab0a-a4e0aac6ba1e",
+   "metadata": {},
+   "source": [
+    "## Summary\n",
+    "This notebook shows you how to distribute a single state vector simulation across multiple GPUs, so that multiple GPUs together simulate a single QPU. If you have workloads with a qubit count that is too large to simulate on a single GPU, you can use this technique to make these large workloads feasible."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
examples/nvidia_cuda_q/container/Dockerfile (+9 −11)
@@ -1,17 +1,15 @@
-FROM 292282985366.dkr.ecr.us-west-2.amazonaws.com/amazon-braket-pytorch-jobs:latest
-RUN python3 -m pip install --upgrade pip
+FROM 292282985366.dkr.ecr.us-west-2.amazonaws.com/amazon-braket-base-jobs:1.0-cpu-py310-ubuntu22.04-2025-02-15-18-01-26

-# install cudaq
 ARG SCRIPT_PATH
-ARG CUDAQ_PATH=/opt/conda/lib/python3.10/site-packages
-ENV MPI_PATH=/opt/amazon/openmpi
+ARG CUDAQ_PATH=/usr/local/lib/python3.10/site-packages

-RUN python3 -m pip install cudaq
-RUN bash "${CUDAQ_PATH}/distributed_interfaces/activate_custom_mpi.sh"
+ENV MPI_PATH=/usr/local \
+    SAGEMAKER_PROGRAM=braket_container.py

-# install additional python dependencies
-RUN python3 -m pip install --no-cache --upgrade -r requirements.txt
+# install Python dependencies including cudaq
+COPY "${SCRIPT_PATH}/requirements.txt" .
+RUN pip install --no-cache --upgrade -r requirements.txt && \
+    bash "${CUDAQ_PATH}/distributed_interfaces/activate_custom_mpi.sh"

-# Setup our entry point
+# setup the entry point
 COPY "${SCRIPT_PATH}/braket_container.py" /opt/ml/code/braket_container.py
-ENV SAGEMAKER_PROGRAM=braket_container.py
