This script benchmarks the offline inference throughput of a specified model. It sets up the environment, defines the paths to the tokenizer, model, and dataset, and uses numactl to bind the process to appropriate CPU resources for optimized performance.
```bash
#!/bin/bash

# Preload libiomp5.so via the command below, or set LD_PRELOAD=libiomp5.so manually
export $(python -c 'import xfastertransformer as xft; print(xft.get_env())')

# Define the paths for the tokenizer, the model, and the dataset
TOKEN_PATH=/data/models/Qwen2-7B-Instruct
MODEL_PATH=/data/models/Qwen2-7B-Instruct-xft
DATASET_PATH=/data/datasets/dataset.json  # placeholder: point this at your dataset

# Run the benchmark bound to CPU/memory node 0 via numactl (adjust the node
# IDs to your machine; the script name below is an assumption)
numactl -N 0 -m 0 python benchmark_throughput.py \
    --tokenizer ${TOKEN_PATH} \
    --model ${MODEL_PATH} \
    --dataset ${DATASET_PATH}
```
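
Before choosing the node IDs passed to numactl, it helps to inspect the machine's NUMA topology. A minimal check using standard numactl options:

```bash
# List NUMA nodes, their CPUs, and attached memory so you can pick
# sensible bindings for the benchmark (e.g. -N 0 -m 0 above)
numactl -H
```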
### Benchmark online serving throughput.
This guide explains how to benchmark the online serving throughput for a model. It includes instructions for setting up the server and running the client benchmark script.
1. On the server side, you can refer to the following code to start the test API server:
```bash
#!/bin/bash

# Preload libiomp5.so via the command below, or set LD_PRELOAD=libiomp5.so manually
export $(python -c 'import xfastertransformer as xft; print(xft.get_env())')

# Define the paths for the tokenizer and the model
TOKEN_PATH=/data/models/Qwen2-7B-Instruct
MODEL_PATH=/data/models/Qwen2-7B-Instruct-xft

# Start the API server using numactl to bind to appropriate CPU resources
# (the server entry point below is an assumption; substitute your own
# serving command if it differs)
numactl -N 0 -m 0 python -m vllm.entrypoints.openai.api_server \
    --model ${MODEL_PATH} \
    --tokenizer ${TOKEN_PATH} \
    --port 8000
```
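
2. On the client side, the benchmark client sends requests to this server. Before running the full client benchmark script, you can verify the server responds; this is a minimal sketch assuming the server above exposes an OpenAI-compatible `/v1/completions` endpoint on port 8000 (an assumption carried over from the server sketch):

```bash
# Smoke test against the (assumed) OpenAI-compatible endpoint; the "model"
# value must match the name the server registered (here, the model path)
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "/data/models/Qwen2-7B-Instruct-xft",
          "prompt": "Hello, my name is",
          "max_tokens": 32
        }'
```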