Issue when using API with Yi-VL-34B #623

Open
deadpipe opened this issue Jan 23, 2025 · 0 comments

Reminder

  • I have searched the GitHub Discussions and issues and have not found anything similar to this.

Environment

- OS: WSL 2 - Ubuntu 22.04
- Python: 3.10.6
- PyTorch: 2.0.1
- CUDA: 12.1

Current Behavior

Hello,

I'm using openai_api.py with Yi-VL-34B, but I've noticed that my GPU (CUDA) remains active even after the chat completion is finished.

Even after the API request completes and the output is printed, CUDA core and CPU utilization remain high, my GPU fans keep running, and the GPU's power draw increases.

Is this expected behavior, or is there something I need to adjust?
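For context: PyTorch's caching allocator holds on to freed GPU memory rather than returning it to the driver, so memory can look "in use" after a request even when nothing is running. Sustained core utilization and rising power draw are a separate symptom, though. A minimal, hedged sketch of forcing a cache release after a request finishes (assuming the server uses PyTorch, which `openai_api.py` does; the function name is mine, not from the repo):

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run even where PyTorch is not installed
    torch = None


def release_cached_gpu_memory():
    """Drop unreachable Python objects, then return cached CUDA blocks to the driver.

    Note: this only releases memory that PyTorch's caching allocator holds in
    reserve. It does NOT stop any computation that is still running.
    """
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


# e.g. call this after each chat-completion request is served:
release_cached_gpu_memory()
```

If core utilization stays high even after a cache release, something is still executing (for example a busy polling loop or a generation thread that never returned), and this won't help.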

Please see the screenshot below:

[screenshot omitted]

Expected Behavior

No response

Steps to Reproduce

  1. Run openai_api.py with Yi-VL-34B.
  2. Send a chat completion request.
  3. Wait for the response to be printed.
  4. Observe that GPU utilization, CUDA core activity, and CPU usage remain high even after completion.
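To make step 4 reproducible numerically rather than by watching fans, one can poll `nvidia-smi` after the response returns. The sketch below is stdlib-only and assumes `nvidia-smi` is on `PATH` (as it is inside a CUDA-enabled WSL 2 distro); it returns per-GPU utilization percentages, or `None` when the tool is unavailable:

```python
import shutil
import subprocess


def gpu_utilization():
    """Return a list of per-GPU utilization percentages, or None if nvidia-smi
    is not available on this machine."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver / tool on PATH
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    # one integer per line, one line per GPU
    return [int(line) for line in out.stdout.split()]


if __name__ == "__main__":
    print(gpu_utilization())
```

Sampling this a few seconds after the completion is printed should show utilization back near 0% if the server is truly idle; a persistently high value confirms the behavior described above.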

Anything Else?

No response
