Current Behavior

Hello,

I'm using openai_api.py with Yi-VL-34B, and I've noticed that the GPU (CUDA) remains active even after a chat completion has finished. Once the API request completes and the output is printed, CUDA utilization and CPU usage stay high, the GPU fans keep running, and the GPU power draw increases. Is this expected behavior, or is there something I need to adjust?

Please see the screenshot below.
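To put numbers on this, GPU utilization and power draw can be polled after the request returns. Below is a rough sketch of such a check (it assumes nvidia-smi is on PATH and that the model sits on GPU 0; adjust as needed):

```python
# Rough sketch: poll GPU utilization, power draw, and memory after a request.
# Assumes nvidia-smi is on PATH and the model sits on GPU 0; adjust as needed.
import subprocess
import time

QUERY_FIELDS = "utilization.gpu,power.draw,memory.used"

def sample_gpu(index: int = 0) -> str:
    """Return one CSV sample: GPU utilization %, power draw W, memory used MiB."""
    result = subprocess.run(
        ["nvidia-smi", f"--id={index}",
         f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    # Sample every 2 seconds; after the completion is returned, utilization
    # and power draw should fall back toward idle if the server is done working.
    for _ in range(30):
        print(sample_gpu())
        time.sleep(2)
```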
Expected Behavior
No response
Steps to Reproduce
1. Run openai_api.py with Yi-VL-34B.
2. Send a chat completion request (a minimal request sketch is shown after this list).
3. Wait for the response to be printed.
4. Observe that GPU utilization, CUDA activity, and CPU usage remain high even after the request completes.
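For step 2, a minimal request sketch follows; the base URL, port, and model id are assumptions and should be adjusted to match how openai_api.py is actually launched:

```python
# Minimal chat completion request against the OpenAI-compatible endpoint
# served by openai_api.py. The base URL, port, and model id are assumptions;
# match them to however the server is actually launched.
import requests

BASE_URL = "http://localhost:8000/v1"  # assumed default, check your launch flags

payload = {
    "model": "Yi-VL-34B",  # assumed model id exposed by the server
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 128,
}

response = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=300)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

After the response prints, GPU utilization and power draw would normally drop back toward idle within a few seconds; the problem reported here is that they do not.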
Anything Else?
No response