Issue when using API with Yi-VL-34B #623

Open
deadpipe opened this issue Jan 23, 2025 · 0 comments

Reminder

  • I have searched the GitHub Discussions and issues and have not found anything similar to this.

Environment

- OS: WSL 2 - Ubuntu 22.04
- Python: 3.10.6
- PyTorch: 2.0.1
- CUDA: 12.1

Current Behavior

Hello,

I'm using openai_api.py with Yi-VL-34B, but I've noticed that my GPU (CUDA) remains active even after the chat completion is finished.

Even after the API request completes and the output is printed, CUDA core and CPU utilization remain high, my GPU fans keep running, and the GPU's power draw increases.

Is this expected behavior, or is there something I need to adjust?
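For context: PyTorch's caching allocator holds on to freed GPU memory rather than returning it to the driver, so memory can look "in use" after a request even when nothing is running. Sustained core utilization and rising power draw are a separate symptom, though. A minimal, hedged sketch of forcing a cache release after a request finishes (assuming the server uses PyTorch, which `openai_api.py` does; the function name is mine, not from the repo):

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run even where PyTorch is not installed
    torch = None


def release_cached_gpu_memory():
    """Drop unreachable Python objects, then return cached CUDA blocks to the driver.

    Note: this only releases memory that PyTorch's caching allocator holds in
    reserve. It does NOT stop any computation that is still running.
    """
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


# e.g. call this after each chat-completion request is served:
release_cached_gpu_memory()
```

If core utilization stays high even after a cache release, something is still executing (for example a busy polling loop or a generation thread that never returned), and this won't help.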

Please see the screenshot below:

[screenshot omitted]

Expected Behavior

No response

Steps to Reproduce

  1. Run openai_api.py with Yi-VL-34B.
  2. Send a chat completion request.
  3. Wait for the response to be printed.
  4. Observe that GPU utilization, CUDA core activity, and CPU usage remain high even after completion.
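To make step 4 reproducible numerically rather than by watching fans, one can poll `nvidia-smi` after the response returns. The sketch below is stdlib-only and assumes `nvidia-smi` is on `PATH` (as it is inside a CUDA-enabled WSL 2 distro); it returns per-GPU utilization percentages, or `None` when the tool is unavailable:

```python
import shutil
import subprocess


def gpu_utilization():
    """Return a list of per-GPU utilization percentages, or None if nvidia-smi
    is not available on this machine."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver / tool on PATH
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    # one integer per line, one line per GPU
    return [int(line) for line in out.stdout.split()]


if __name__ == "__main__":
    print(gpu_utilization())
```

Sampling this a few seconds after the completion is printed should show utilization back near 0% if the server is truly idle; a persistently high value confirms the behavior described above.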

Anything Else?

No response
