|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "021cfd04-df68-414d-a864-48f62fc8ddfb", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Distributed state vector simulations on multiple GPUs (advanced)\n", |
| 9 | + "\n", |
| 10 | + "In the notebook \"2_parallel_simulations.ipynb\", you learned how to use CUDA-Q and Braket Hybrid Jobs to parallelize the simulation of a batch of observables and circuits over multiple GPUs, where each GPU simulates a single QPU. For workloads with larger qubit counts, however, it may be necessary to distribute a single state vector simulation across multiple GPUs, so that multiple GPUs together simulate a single QPU.\n", |
| 11 | + "\n", |
| 12 | + "In this notebook, you will learn how to use CUDA-Q and Braket Hybrid Jobs to tackle this." |
| 13 | + ] |
| 14 | + }, |
| 15 | + { |
| 16 | + "cell_type": "markdown", |
| 17 | + "id": "32b46659-6dcc-4900-a13a-e971f8bf0590", |
| 18 | + "metadata": {}, |
| 19 | + "source": [ |
| 20 | + "We start with necessary imports that are used in the examples below." |
| 21 | + ] |
| 22 | + }, |
| 23 | + { |
| 24 | + "cell_type": "code", |
| 25 | + "execution_count": null, |
| 26 | + "id": "8738f65f-969c-4b58-96f8-69bbc1bad5e1", |
| 27 | + "metadata": {}, |
| 28 | + "outputs": [], |
| 29 | + "source": [ |
| 30 | + "from braket.jobs import hybrid_job\n", |
| 31 | + "from braket.jobs.config import InstanceConfig" |
| 32 | + ] |
| 33 | + }, |
| 34 | + { |
| 35 | + "cell_type": "markdown", |
| 36 | + "id": "1331136a-a369-4eef-8bfe-252c79103a3e", |
| 37 | + "metadata": {}, |
| 38 | + "source": [ |
| 39 | + "Next, we need to create and upload a container which contains both CUDA-Q and the underlying CUDA support required for distributing our computation across multiple GPUs. Note: this container image will be different than the one used in the previous notebooks illustrating more basic CUDA-Q scenarios.\n", |
| 40 | + "\n", |
| 41 | + "To do this, we need to run the commands in the cell below. (For more information about what these commands are doing, please see the detailed documentation in \"0_hello_cudaq_jobs.ipynb\". The difference here is that we specify the dockerfile `Dockerfile.mgpu` in order to ensure full support for this advanced scenario.)" |
| 42 | + ] |
| 43 | + }, |
| 44 | + { |
| 45 | + "cell_type": "code", |
| 46 | + "execution_count": null, |
| 47 | + "id": "f552c738", |
| 48 | + "metadata": {}, |
| 49 | + "outputs": [], |
| 50 | + "source": [ |
| 51 | + "!chmod +x container/container_build_and_push.sh\n", |
| 52 | + "!container/container_build_and_push.sh cudaq-mgpu-job us-west-2 Dockerfile.mgpu" |
| 53 | + ] |
| 54 | + }, |
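| 55 | + { |
| 56 | + "cell_type": "markdown", |
| 57 | + "id": "3f9d2b1c-7a54-4e08-9c6d-2b8e5f0a1d42", |
| 58 | + "metadata": {}, |
| 59 | + "source": [ |
| 60 | + "Once the script finishes, you can optionally confirm that the image was pushed to Amazon ECR. The check below is a sketch: it assumes the script created a repository named `cudaq-mgpu-job` in `us-west-2`, matching the arguments passed above." |
| 61 | + ] |
| 62 | + }, |
| 63 | + { |
| 64 | + "cell_type": "code", |
| 65 | + "execution_count": null, |
| 66 | + "id": "a81c4e2f-0d3b-4c57-8e9a-6f1b2c3d4e5f", |
| 67 | + "metadata": {}, |
| 68 | + "outputs": [], |
| 69 | + "source": [ |
| 70 | + "# List the tags of the images in the repository; expect to see \"latest\"\n", |
| 71 | + "!aws ecr describe-images --repository-name cudaq-mgpu-job --region us-west-2 --query \"imageDetails[].imageTags\"" |
| 72 | + ] |
| 73 | + }, |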
| 55 | + { |
| 56 | + "cell_type": "markdown", |
| 57 | + "id": "fc46e446", |
| 58 | + "metadata": {}, |
| 59 | + "source": [ |
| 60 | + "Now we prepare the URI of the container image. Fill the proper value of `aws_account_id`, `region_name` and `container_image_name` in the cell below. For example, with the shell command above, `region_name=\"us-west-2\"` and `container_image_name=\"cudaq-mgpu-job\"`. The cell below prints out the image URI. When you use a container image to run a job, it ensures that your code is run in the same environment every time. " |
| 61 | + ] |
| 62 | + }, |
| 63 | + { |
| 64 | + "cell_type": "code", |
| 65 | + "execution_count": null, |
| 66 | + "id": "25fdc720-3143-411a-8bef-f9623369b516", |
| 67 | + "metadata": {}, |
| 68 | + "outputs": [], |
| 69 | + "source": [ |
| 70 | + "aws_account_id = \"<aws-account-id>\"\n", |
| 71 | + "region_name = \"<region-name>\"\n", |
| 72 | + "container_image_name = \"<container-image-name>\"\n", |
| 73 | + "\n", |
| 74 | + "image_uri = f\"{aws_account_id}.dkr.ecr.{region_name}.amazonaws.com/{container_image_name}:latest\"\n", |
| 75 | + "print(image_uri)" |
| 76 | + ] |
| 77 | + }, |
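| 78 | + { |
| 79 | + "cell_type": "markdown", |
| 80 | + "id": "5e7a1c9d-2f4b-4a36-b8d0-9c1e2f3a4b5c", |
| 81 | + "metadata": {}, |
| 82 | + "source": [ |
| 83 | + "If you prefer not to hard-code these values, you can usually look them up at runtime. The sketch below assumes your AWS credentials and a default region are already configured for `boto3`, and that the container image name matches the one passed to the build script above." |
| 84 | + ] |
| 85 | + }, |
| 86 | + { |
| 87 | + "cell_type": "code", |
| 88 | + "execution_count": null, |
| 89 | + "id": "9b0c1d2e-3f4a-4b5c-8d6e-7f8a9b0c1d2e", |
| 90 | + "metadata": {}, |
| 91 | + "outputs": [], |
| 92 | + "source": [ |
| 93 | + "import boto3\n", |
| 94 | + "\n", |
| 95 | + "# Look up the account ID from STS and the region from the default boto3 session\n", |
| 96 | + "aws_account_id = boto3.client(\"sts\").get_caller_identity()[\"Account\"]\n", |
| 97 | + "region_name = boto3.session.Session().region_name\n", |
| 98 | + "container_image_name = \"cudaq-mgpu-job\"  # the name passed to the build script above\n", |
| 99 | + "\n", |
| 100 | + "image_uri = f\"{aws_account_id}.dkr.ecr.{region_name}.amazonaws.com/{container_image_name}:latest\"\n", |
| 101 | + "print(image_uri)" |
| 102 | + ] |
| 103 | + }, |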
| 78 | + { |
| 79 | + "cell_type": "markdown", |
| 80 | + "id": "a844344d-0978-4b11-8fe4-66b387e80c72", |
| 81 | + "metadata": {}, |
| 82 | + "source": [ |
| 83 | + "## Distributed state vector simulations\n", |
| 84 | + "Now that we have the container image URI, we are ready to run our workload. The `nvidia` target with `mgpu` option supports distributing state vector simulations to multiple GPUs. This enables GPU simulations for circuits with higher qubit count, to up to 34 qubits. The example below shows how to submit a job with the `mgpu` option." |
| 85 | + ] |
| 86 | + }, |
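| 87 | + { |
| 88 | + "cell_type": "markdown", |
| 89 | + "id": "6c2d8e4f-1a3b-4c5d-9e7f-0a1b2c3d4e5f", |
| 90 | + "metadata": {}, |
| 91 | + "source": [ |
| 92 | + "A state vector of n qubits holds 2^n complex amplitudes, so memory grows exponentially with the qubit count. The back-of-the-envelope estimate below assumes single-precision complex amplitudes (8 bytes each); double precision doubles these figures." |
| 93 | + ] |
| 94 | + }, |
| 95 | + { |
| 96 | + "cell_type": "code", |
| 97 | + "execution_count": null, |
| 98 | + "id": "d4e5f6a7-b8c9-4d0e-8f1a-2b3c4d5e6f7a", |
| 99 | + "metadata": {}, |
| 100 | + "outputs": [], |
| 101 | + "source": [ |
| 102 | + "# Memory footprint of a full state vector: 2**n amplitudes, 8 bytes per complex64 amplitude\n", |
| 103 | + "GIB = 2**30\n", |
| 104 | + "for n in (25, 30, 34):\n", |
| 105 | + " print(f\"{n} qubits: {(2**n) * 8 / GIB:.2f} GiB\")\n", |
| 106 | + "\n", |
| 107 | + "# At 34 qubits the state vector needs ~128 GiB, more than a single GPU's memory,\n", |
| 108 | + "# which is why it must be sharded across multiple GPUs." |
| 109 | + ] |
| 110 | + }, |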
| 87 | + { |
| 88 | + "cell_type": "code", |
| 89 | + "execution_count": null, |
| 90 | + "id": "d38b73de-6a7e-45b7-b64b-afec60c0a6c3", |
| 91 | + "metadata": {}, |
| 92 | + "outputs": [], |
| 93 | + "source": [ |
| 94 | + "@hybrid_job(\n", |
| 95 | + " device=\"local:nvidia/nvidia-mgpu\",\n", |
| 96 | + " instance_config=InstanceConfig(instanceType=\"ml.p3.8xlarge\", instanceCount=1),\n", |
| 97 | + " image_uri=image_uri,\n", |
| 98 | + ")\n", |
| 99 | + "def distributed_gpu_job(\n", |
| 100 | + " n_qubits,\n", |
| 101 | + " n_shots,\n", |
| 102 | + " sagemaker_mpi_enabled=True,\n", |
| 103 | + "):\n", |
| 104 | + " import cudaq\n", |
| 105 | + "\n", |
| 106 | + " # Define target\n", |
| 107 | + " cudaq.set_target(\"nvidia\", option=\"mgpu\")\n", |
| 108 | + " print(\"CUDA-Q backend: \", cudaq.get_target())\n", |
| 109 | + " print(\"num_available_gpus: \", cudaq.num_available_gpus())\n", |
| 110 | + "\n", |
| 111 | + " # Initialize MPI and view the MPI properties\n", |
| 112 | + " cudaq.mpi.initialize()\n", |
| 113 | + " rank = cudaq.mpi.rank()\n", |
| 114 | + "\n", |
| 115 | + " # Define circuit and observables\n", |
| 116 | + " @cudaq.kernel\n", |
| 117 | + " def ghz():\n", |
| 118 | + " qubits = cudaq.qvector(n_qubits)\n", |
| 119 | + " h(qubits[0])\n", |
| 120 | + " for q in range(1, n_qubits):\n", |
| 121 | + " cx(qubits[0], qubits[q])\n", |
| 122 | + "\n", |
| 123 | + " hamiltonian = cudaq.SpinOperator.random(n_qubits, 1)\n", |
| 124 | + "\n", |
| 125 | + " # Parallelize circuit simulation\n", |
| 126 | + " result = cudaq.observe(ghz, hamiltonian, shots_count=n_shots)\n", |
| 127 | + "\n", |
| 128 | + " # End the MPI interface\n", |
| 129 | + " cudaq.mpi.finalize()\n", |
| 130 | + "\n", |
| 131 | + " if rank == 0:\n", |
| 132 | + " return {\"expectation\": result.expectation()}\n", |
| 133 | + "\n", |
| 134 | + "\n", |
| 135 | + "n_qubits = 25\n", |
| 136 | + "n_shots = 1000\n", |
| 137 | + "distributed_job = distributed_gpu_job(n_qubits, n_shots)\n", |
| 138 | + "print(\"Job ARN: \", distributed_job.arn)" |
| 139 | + ] |
| 140 | + }, |
| 141 | + { |
| 142 | + "cell_type": "code", |
| 143 | + "execution_count": null, |
| 144 | + "id": "a054c24e", |
| 145 | + "metadata": {}, |
| 146 | + "outputs": [], |
| 147 | + "source": [ |
| 148 | + "distributed_job_result = distributed_job.result()\n", |
| 149 | + "print(f\"result: {distributed_job_result['expectation']}\")" |
| 150 | + ] |
| 151 | + }, |
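| 152 | + { |
| 153 | + "cell_type": "markdown", |
| 154 | + "id": "e7f8a9b0-c1d2-4e3f-8a4b-5c6d7e8f9a0b", |
| 155 | + "metadata": {}, |
| 156 | + "source": [ |
| 157 | + "Note that `result()` blocks until the job finishes. While the job is running, you can check on it from another cell; `state()` and `logs()` are available on the job object returned by the decorated function." |
| 158 | + ] |
| 159 | + }, |
| 160 | + { |
| 161 | + "cell_type": "code", |
| 162 | + "execution_count": null, |
| 163 | + "id": "f0a1b2c3-d4e5-4f6a-8b7c-9d0e1f2a3b4c", |
| 164 | + "metadata": {}, |
| 165 | + "outputs": [], |
| 166 | + "source": [ |
| 167 | + "# Check the job status without blocking\n", |
| 168 | + "print(distributed_job.state())\n", |
| 169 | + "\n", |
| 170 | + "# Uncomment to stream the container logs while the job runs\n", |
| 171 | + "# distributed_job.logs()" |
| 172 | + ] |
| 173 | + }, |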
| 152 | + { |
| 153 | + "cell_type": "markdown", |
| 154 | + "id": "18004fe8-24a0-4316-ab0a-a4e0aac6ba1e", |
| 155 | + "metadata": {}, |
| 156 | + "source": [ |
| 157 | + "## Summary\n", |
| 158 | + "This notebook shows you how to distribute a single state vector simulation across multiple GPUs, so that multiple GPUs together simulate a single QPU. If you have workloads with a qubit count that is too large to simulate on a single GPU, you can use this technique to make these large workloads feasible." |
| 159 | + ] |
| 160 | + } |
| 161 | + ], |
| 162 | + "metadata": { |
| 163 | + "kernelspec": { |
| 164 | + "display_name": "Python 3 (ipykernel)", |
| 165 | + "language": "python", |
| 166 | + "name": "python3" |
| 167 | + }, |
| 168 | + "language_info": { |
| 169 | + "codemirror_mode": { |
| 170 | + "name": "ipython", |
| 171 | + "version": 3 |
| 172 | + }, |
| 173 | + "file_extension": ".py", |
| 174 | + "mimetype": "text/x-python", |
| 175 | + "name": "python", |
| 176 | + "nbconvert_exporter": "python", |
| 177 | + "pygments_lexer": "ipython3", |
| 178 | + "version": "3.10.13" |
| 179 | + } |
| 180 | + }, |
| 181 | + "nbformat": 4, |
| 182 | + "nbformat_minor": 5 |
| 183 | +} |