
Add deploy from catalog to huggingface client #2880

Open · ErikKaum opened this issue Feb 19, 2025 · 7 comments

@ErikKaum (Member)

Add a "one-click" deploy experience to the Python client, similar to what we have in the UI:

  1. Get the list of available catalog models: https://endpoints.huggingface.co/api/catalog/repo-list
  2. Deploy a model, e.g.:
curl -X POST "https://endpoints.huggingface.co/api/catalog/deploy" \
  -H "Content-Type: application/json" \
  -d '{
    "accessToken": ACCESS_TOKEN,
    "namespace": NAMESPACE,
    "repoId": REPO_ID,
    "endpointName": ENDPOINT_NAME
}'
  • For now we omit more "advanced" checks, such as whether a model was removed from the catalog. Let's first see if the feature gets traction. A rough sketch of a client-side wrapper follows below.
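
A minimal sketch of what the Python wrapper could look like, assuming the requests library and the two catalog endpoints above; the function names (list_catalog_models, deploy_from_catalog) and the response shapes are hypothetical:

import requests

CATALOG_API = "https://endpoints.huggingface.co/api/catalog"


def list_catalog_models() -> list[dict]:
    # Hypothetical wrapper around GET /api/catalog/repo-list
    response = requests.get(f"{CATALOG_API}/repo-list")
    response.raise_for_status()
    return response.json()


def deploy_from_catalog(token: str, namespace: str, repo_id: str, endpoint_name: str) -> dict:
    # Hypothetical wrapper around POST /api/catalog/deploy.
    # The current API expects the token in the payload; see the discussion below
    # about moving it to an Authorization header instead.
    payload = {
        "accessToken": token,
        "namespace": namespace,
        "repoId": repo_id,
        "endpointName": endpoint_name,
    }
    response = requests.post(f"{CATALOG_API}/deploy", json=payload)
    response.raise_for_status()
    return response.json()

Usage would then be a two-liner: pick a repoId from list_catalog_models() and pass it to deploy_from_catalog().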
@julien-c (Member)

  • How is this different from the existing create_inference_endpoint (and shouldn't we make that better instead)?
  • Is there a Slack convo about this?
    Thanks!

@ErikKaum (Member, Author)

  • Here is the Slack context where it started; we also had chats with @Wauplin about this and thought it would be a nice thing to test.
  • The difference is the same as deploying manually from the UI vs. deploying from the catalog in the UI: create_inference_endpoint requires the user to specify all the details (which container to use, which hardware, etc.), whereas this just calls deploy and uses the recommended catalog configuration. The details remain opaque to the user. See the comparison sketch below.

Also good to note that api/catalog/repo-list is experimental; we'd just like to quickly test whether this type of feature would be useful in the Python client.
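
To make the contrast concrete, a rough comparison: the first call uses the existing create_inference_endpoint from huggingface_hub (the argument values are illustrative, not recommended settings), while the second is the hypothetical catalog one-liner proposed here:

from huggingface_hub import create_inference_endpoint

# Existing API: the caller spells out the container, hardware, and scaling details.
endpoint = create_inference_endpoint(
    "llama-3-2-3b-instruct",
    repository="meta-llama/Llama-3.2-3B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x1",
    instance_type="nvidia-l4",
)

# Proposed (hypothetical) catalog deploy: the recommended configuration is applied
# server-side, so only the repo, namespace, and endpoint name are needed.
# endpoint = deploy_from_catalog(
#     token="hf_xxx",
#     namespace="my-org",
#     repo_id="meta-llama/Llama-3.2-3B-Instruct",
#     endpoint_name="llama-3-2-3b-instruct",
# )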

@hanouticelina (Contributor)

👍 I like the idea of being able to programmatically deploy an endpoint without having to worry about the config (device, framework, and all of that) + it seems to be low-hanging fruit.

@Wauplin (Contributor) commented Feb 27, 2025

@ErikKaum a few comments about the API:

  1. Is the "https://endpoints.huggingface.co/api/catalog/" API designed to be stable?
  2. Would it be possible to pass the accessToken as an Authorization header instead of in the payload (as done in every other HF endpoint)? See the sketch after this list.
  3. When trying to pass "llama-3.2-3b-instruct-gth" as endpointName, I'm getting {"message":"Bad Request: Invalid endpoint name. It can only contain lowercase alphanumeric characters or '-' and have a length of 32 characters"}.
    1. Is it possible to put the error message under "error" instead of "message"? (better for error formatting client-side)
    2. Is 32 a hard limit? Seems very low TBH. Repo IDs can go up to 96 chars if I remember correctly.
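
For reference, a minimal sketch of point 2 on the client side, assuming the endpoint accepted a standard Authorization header (all values are placeholders):

import requests

ACCESS_TOKEN = "hf_xxx"                          # placeholder
NAMESPACE = "my-org"                             # placeholder
REPO_ID = "meta-llama/Llama-3.2-3B-Instruct"     # placeholder
ENDPOINT_NAME = "llama-3-2-3b-instruct"          # placeholder

# Hypothetical: token sent as an Authorization header instead of in the JSON payload,
# matching how other HF endpoints authenticate.
response = requests.post(
    "https://endpoints.huggingface.co/api/catalog/deploy",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"namespace": NAMESPACE, "repoId": REPO_ID, "endpointName": ENDPOINT_NAME},
)
response.raise_for_status()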

@ErikKaum (Member, Author)

  1. If it's possible to mark the Python wrapper as "beta", that would be nice. I could also add a /v1 in there just to make life easier. I don't think we'll change much, but it's nice to be able to change it without breaking things for you.

  2. Yes, I'll put it under "error". And yes, the 32-character limit is a bit annoying, I agree; lifting it would require a bigger rework on the backend side, which might open Pandora's box, so ideally we'd avoid it.

@Wauplin (Contributor) commented Feb 28, 2025

In #2892, methods are flagged as experimental.
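
If I remember correctly, huggingface_hub has an experimental decorator in huggingface_hub.utils that warns callers the API may change; a sketch of how a catalog-deploy method could be flagged (the method name is hypothetical):

from huggingface_hub.utils import experimental


@experimental
def deploy_from_catalog(repo_id: str, namespace: str, endpoint_name: str):
    # Hypothetical catalog-deploy method; the decorator emits a warning that the
    # feature is experimental and may change without notice.
    ...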

Regarding request auth: is it possible to update the endpoint to pass the token as a header instead of in the payload?

@ErikKaum (Member, Author)

Awesome 👍

Ah sorry, I missed replying to that one. Yes, I'll make that change as well!
