
[NPU][Llama] NPU is slower than CPU&GPU when running LLM #1882

Open
yang-ahuan opened this issue Mar 11, 2025 · 2 comments
Labels: category: LLM (LLM pipeline: stateful, static), category: NPU, PSE


yang-ahuan commented Mar 11, 2025

Description

I followed the official guidelines (NPU with OpenVINO GenAI) to run an LLM. When testing the model on different hardware, I observed the following:

  • GPU performance is indeed faster than CPU, as expected.
  • However, NPU runs slower than CPU, which seems counterintuitive.

Is this behavior expected, or could there be an issue with my setup? I would appreciate any insights into this.
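
One quick sanity check is to confirm the NPU is actually visible to OpenVINO before benchmarking; a minimal sketch using the standard device query API (the device list in the comment is illustrative):

from openvino import Core

core = Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU']
if "NPU" in core.available_devices:
    # Confirms the NPU driver is installed and the device is usable.
    print(core.get_property("NPU", "FULL_DEVICE_NAME"))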

Experiment

Environment

  • Processor: Intel(R) Core(TM) Ultra 9 288V
  • OS: Win11 Pro 24H2
  • Package version:
    • openvino: 2025.0.0
    • openvino-tokenizers: 2025.0.0.0
    • openvino-genai: 2025.0.0.0
    • optimum-intel: 1.22.0

Testing model

  • Llama-3.2-3B-Instruct
  • Llama-2-7B-Chat-Hf

Testing code

optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf --weight-format int4 --sym --group-size -1 model/llama2_7b_chat_hf_ov_quant_wi4_sym_gs-1

from time import time
from openvino_genai import LLMPipeline

device = "CPU"  # or "NPU" / "GPU"
model_path = "model/llama2_7b_chat_hf_ov_quant_wi4_sym_gs-1"
pipe = LLMPipeline(
    model_path,
    device,
    # CACHE_DIR=".npucache",  # optional: cache compiled models between runs
)

# Measure wall-clock latency over 10 generations of 100 new tokens each;
# the first iteration also pays any remaining warm-up cost.
latency_rec = []
for _ in range(10):
    start_time = time()
    pipe.generate("How are you?", max_new_tokens=100)
    latency_rec.append(time() - start_time)
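
Since end-to-end wall-clock time mixes prefill, decode, and warm-up, openvino-genai's built-in performance metrics can break the latency down further. A minimal sketch, reusing pipe and latency_rec from above (the getters return mean/std pairs; TTFT and TPOT are reported in ms):

# Average wall-clock latency, as summarized in the table below.
print(f"avg latency: {sum(latency_rec) / len(latency_rec):.4f} s")

# Per-request breakdown from openvino-genai's perf metrics.
res = pipe.generate("How are you?", max_new_tokens=100)
pm = res.perf_metrics
print(f"TTFT:       {pm.get_ttft().mean:.1f} ms")          # time to first token (prefill)
print(f"TPOT:       {pm.get_tpot().mean:.1f} ms/token")    # decode latency per token
print(f"throughput: {pm.get_throughput().mean:.1f} tok/s") # generation throughput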

Result

Device | Average (s) | Individual runs (s, rounded to 0.01)
CPU    | 6.1186      | 12.96, 5.26, 1.10, 6.88, 6.78, 2.43, 4.93, 6.97, 7.03, 6.84
NPU    | 9.4189      | 10.49, 9.88, 6.55, 9.95, 9.89, 9.80, 9.91, 7.87, 9.92, 9.93

Try to Solve

I attempted to optimize the execution of the LLM on the NPU by configuring different group sizes, model types, and cache settings, but the results were still slower than on the CPU.
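
A sketch of the kind of NPU-side configuration that can be tried, built from the pipeline options documented in the OpenVINO GenAI NPU guide (MAX_PROMPT_LEN, MIN_RESPONSE_LEN, GENERATE_HINT); the concrete values here are illustrative, not tuned:

from openvino_genai import LLMPipeline

# NPU-specific pipeline options per the OpenVINO GenAI NPU guide;
# the values below are illustrative assumptions, not tuned settings.
npu_config = {
    "MAX_PROMPT_LEN": 1024,        # static prefill shape: max prompt tokens
    "MIN_RESPONSE_LEN": 128,       # minimum reserved response length
    "GENERATE_HINT": "BEST_PERF",  # or "FAST_COMPILE" for faster compilation
    "CACHE_DIR": ".npucache",      # reuse compiled blobs across runs
}
pipe = LLMPipeline("model/llama2_7b_chat_hf_ov_quant_wi4_sym_gs-1", "NPU", **npu_config)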

yang-ahuan changed the title from "[NPU] NPU is slower than CPU&GPU when running LLM" to "[NPU][Llama] NPU is slower than CPU&GPU when running LLM" on Mar 11, 2025
ilya-lavrenov added the category: LLM (LLM pipeline: stateful, static) and category: NPU labels on Mar 11, 2025
Wan-Intel commented

I've replicated the issue on an Intel® Core™ Ultra 7 Processor 258V; the NPU results are slower than the CPU results:

CPU:
[7.669469594955444, 4.948709964752197, 5.008191347122192, 5.01607608795166, 5.037824392318726, 5.108126640319824, 2.7261970043182373, 5.05750036239624, 5.072535514831543, 4.77348256111145]

NPU:
[12.301685571670532, 11.517853736877441, 11.37859845161438, 11.471659660339355, 11.345218658447266, 10.561753749847412, 11.35003662109375, 11.410158395767212, 11.563832759857178, 11.311937808990479]

I'll escalate the case to the relevant team, and we will provide an update as soon as possible.

Wan-Intel added the PSE label on Mar 16, 2025

deveworld commented Mar 20, 2025

I was also able to reproduce the NPU being significantly slower than the CPU and iGPU on a Galaxy Book5 Pro with an Intel® Core™ Ultra 5 Processor 228V.
Here are the test results:

iGPU (3.7 s) < CPU (5.6 s) < NPU (7.5 s)

Device | Avg (s) | 10 samples (s, rounded to 0.01)
iGPU   | 3.7     | 3.80, 3.68, 3.65, 3.67, 3.68, 3.72, 3.70, 3.19, 3.69, 3.72
CPU    | 5.6     | 5.63, 5.63, 5.37, 5.44, 5.82, 5.83, 5.71, 5.53, 5.49, 5.51
NPU    | 7.5     | 8.38, 7.97, 5.38, 7.97, 8.01, 8.05, 8.01, 5.28, 8.02, 8.00

When can we expect a fix for this?
