# Accelerated inference on AMD GPUs supported by ROCm

By default, ONNX Runtime runs inference on CPU devices. However, it is possible to place supported operations on an AMD Instinct GPU, while leaving any unsupported ones on CPU. In most cases, this allows costly operations to be placed on GPU and significantly accelerates inference.

Our testing was done on AMD Instinct GPUs; for specific GPU compatibility, please refer to the official list of supported GPUs available [here](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html).

This guide will show you how to run inference with the `ROCMExecutionProvider` execution provider that ONNX Runtime supports for AMD GPUs.

## Installation

The following setup installs ONNX Runtime with the ROCm Execution Provider, using ROCm 5.7.

#### 1. ROCm Installation

To install ROCm 5.7, please follow the [ROCm installation guide](https://rocm.docs.amd.com/en/latest/deploy/linux/index.html).

#### 2. PyTorch Installation with ROCm Support

Optimum ONNX Runtime integration relies on some functionalities of Transformers that require PyTorch. For now, we recommend using PyTorch compiled against ROCm 5.7, which can be installed following the [PyTorch installation guide](https://pytorch.org/get-started/locally/):

```bash
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7
```

<Tip>
For a Docker installation, the following base image is recommended: `rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1`
</Tip>
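
To confirm that the installed build actually targets ROCm, a minimal sanity check like the one below can help. On ROCm builds, PyTorch sets `torch.version.hip` and exposes the AMD GPU through the regular `torch.cuda` namespace:

```python
>>> import torch

>>> print(torch.version.hip)  # set on ROCm builds of PyTorch (e.g. "5.7..."), None otherwise
>>> print(torch.cuda.is_available())  # True when the AMD GPU is visible to PyTorch
```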

#### 3. ONNX Runtime installation with ROCm Execution Provider

```bash
# pre-requisites
pip install -U pip
pip install cmake onnx
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install ONNX Runtime from source
git clone --recursive https://github.com/ROCmSoftwarePlatform/onnxruntime.git
cd onnxruntime
git checkout rocm5.7_internal_testing_eigen-3.4.zip_hash

./build.sh --config Release --build_wheel --update --build --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) --use_rocm --rocm_home=/opt/rocm
pip install build/Linux/Release/dist/*
```

<Tip>
To avoid conflicts between `onnxruntime` and `onnxruntime-rocm`, make sure the package `onnxruntime` is not installed by running `pip uninstall onnxruntime` prior to installing `onnxruntime-rocm`.
</Tip>
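
Once the wheel is installed, you can quickly verify that the ROCm execution provider was compiled into the build by listing the providers ONNX Runtime knows about:

```python
>>> import onnxruntime

>>> print(onnxruntime.get_available_providers())  # should include "ROCMExecutionProvider"
```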

### Checking that the ROCm installation is successful

Before going further, run the following sample code to check whether the install was successful:

```python
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> from transformers import AutoTokenizer

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(
...     "philschmid/tiny-bert-sst2-distilled",
...     export=True,
...     provider="ROCMExecutionProvider",
... )

>>> tokenizer = AutoTokenizer.from_pretrained("philschmid/tiny-bert-sst2-distilled")
>>> inputs = tokenizer("expectations were low, actual enjoyment was high", return_tensors="pt", padding=True)

>>> outputs = ort_model(**inputs)
>>> assert ort_model.providers == ["ROCMExecutionProvider", "CPUExecutionProvider"]
```

If this code runs smoothly, congratulations, the installation is successful! If you encounter the following error or similar,

```
ValueError: Asked to use ROCMExecutionProvider as an ONNX Runtime execution provider, but the available execution providers are ['CPUExecutionProvider'].
```

then something is wrong with the ROCm or ONNX Runtime installation.

### Use ROCm Execution Provider with ORT models

For ORT models, usage is straightforward: simply specify the `provider` argument in the `ORTModel.from_pretrained()` method. Here's an example:

```python
>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(
...     "distilbert-base-uncased-finetuned-sst-2-english",
...     export=True,
...     provider="ROCMExecutionProvider",
... )
```

The model can then be used with the common 🤗 Transformers API for inference and evaluation, such as [pipelines](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/pipelines).
When using a Transformers pipeline, note that the `device` argument should be set to perform pre- and post-processing on GPU, as in the example below:

```python
>>> from optimum.pipelines import pipeline
>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

>>> pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer, device="cuda:0")
>>> result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
>>> print(result)  # doctest: +IGNORE_RESULT
# printing: [{'label': 'POSITIVE', 'score': 0.9997727274894714}]
```

Additionally, you can pass the session option `log_severity_level = 0` (verbose) to check whether all nodes are indeed placed on the ROCm execution provider:

```python
>>> import onnxruntime
>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> session_options = onnxruntime.SessionOptions()
>>> session_options.log_severity_level = 0  # 0 = verbose logging

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(
...     "distilbert-base-uncased-finetuned-sst-2-english",
...     export=True,
...     provider="ROCMExecutionProvider",
...     session_options=session_options,
... )
```

### Observed time gains

Coming soon!