We currently host a single open model endpoint that can be used for student projects, experiments, and practice by students enrolled in the course. Please use the endpoint responsibly, as it is a shared resource. Keep in mind that we only keep the endpoint running at a given version during the project submission period; it may be updated or removed after the course ends.
Acknowledgement
The endpoint is hosted by the Digital Heritage National Laboratory, which also kindly provides usage quota for our course. We are grateful for their support.
| Field | Description |
|---|---|
| API Endpoint URL | http://mobydick.elte-dh.hu:12321 |
| Endpoint API definition | http://mobydick.elte-dh.hu:12321/docs |
| Authentication | API Key (contact the lecturers) |
| Model version (might be out of date) | ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4 |
| TGI server documentation | https://huggingface.co/docs/text-generation-inference/index |
| Test GUI | http://mobydick.elte-dh.hu:3000 |
| Test GUI login | Ask the lecturers for the login credentials. |
You can use the `openai` package to interact with the endpoint. The endpoint is mostly compatible with the core OpenAI functionality, with slight limitations. To use it, configure the client with the following parameters:
```python
from openai import OpenAI

# Set the API key and the base URL (with the /v1 endpoint)
client = OpenAI(
    base_url="http://mobydick.elte-dh.hu:12321/v1",
    api_key="<API_KEY>",
)

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the purpose of life?"},
    ],
    stream=False,
)

print(chat_completion)
```
For the most up-to-date information, please refer to the official TGI tutorial: https://huggingface.co/docs/text-generation-inference/basic_tutorials/consuming_tgi#openai-client
Visual inputs are not supported, and the context window is limited to ~100k tokens (prompt and response combined). There is no maximum response length.
Please check the TGI documentation for parameter details (there are known issues with top_p, etc.). Also note that the default for the maximum number of generated tokens tends to be a low number (1-200), which you have to override in your requests, as shown below.
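For example (reusing the `client` from the snippet above, and assuming the endpoint honours the standard `max_tokens` parameter of the chat completions API):

```python
# Override the low default generation length by passing max_tokens explicitly.
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the plot of Moby-Dick in three sentences."},
    ],
    max_tokens=512,  # the server-side default is often very low (1-200)
    stream=False,
)

print(chat_completion.choices[0].message.content)
```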
Tool calling currently has limited support (automatic tool selection works fine, but forced selection might not work as expected), and there is no JSON output support yet.
For enforcing an output schema, you can instantiate your own tokenizer and use Hugging Face's text-generation call with an inference client, where you are able to provide a guidance grammar that enforces the output schema (see the sketch below and the TGI documentation for details).
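Here is a minimal sketch of that approach, assuming the `huggingface_hub` and `transformers` packages and the model ID from the table above; the exact grammar support depends on the TGI version running on the server:

```python
from huggingface_hub import InferenceClient
from transformers import AutoTokenizer

# Point the client at the shared endpoint; the API key is sent as a Bearer token.
client = InferenceClient("http://mobydick.elte-dh.hu:12321", token="<API_KEY>")

# Instantiate the model's tokenizer to apply the chat template ourselves.
# Note: gated models require a Hugging Face access token (huggingface-cli login).
tokenizer = AutoTokenizer.from_pretrained("ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Name a classic novel and its author."}],
    tokenize=False,
    add_generation_prompt=True,
)

# JSON schema the response must follow (guidance grammar).
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
    },
    "required": ["title", "author"],
}

response = client.text_generation(
    prompt,
    max_new_tokens=200,
    grammar={"type": "json", "value": schema},
)
print(response)
```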
For tokenization or token counting, you should use the HTTP endpoints, or, after getting the model ID (again via the HTTP endpoints), you can instantiate your own tokenizer from Hugging Face objects. Some models might be gated, so you need a Hugging Face access token to download their tokenizer.
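A minimal sketch of tokenizer-based token counting, assuming the server exposes TGI's standard `/info` route and reports the model ID under the `model_id` key:

```python
import requests
from transformers import AutoTokenizer

headers = {"Authorization": "Bearer <API_KEY>"}

# Ask the server which model it is running.
info = requests.get("http://mobydick.elte-dh.hu:12321/info", headers=headers).json()
model_id = info["model_id"]

# Load the matching tokenizer locally; gated models need a Hugging Face access token.
tokenizer = AutoTokenizer.from_pretrained(model_id)

text = "What is the purpose of life?"
print(len(tokenizer.encode(text)), "tokens")
```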
You can also use HTTP requests to interact with the endpoint. In this case the API key should be passed as a Bearer token:
```python
import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <API_KEY>",
}

data = {
    "inputs": "What is the purpose of life?",
    "parameters": {
        "max_new_tokens": 20,
    },
}

response = requests.post(
    "http://mobydick.elte-dh.hu:12321/generate",
    headers=headers,
    json=data,
)
print(response.json())
```
Some functionalities are not available via the OpenAI package (nor the official Hugging Face Hub client), such as listing the available model information, performing tokenization, health checks, etc. Check the API documentation for more details.
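A short sketch of such raw HTTP calls, assuming the server exposes TGI's standard `/info`, `/health` and `/tokenize` routes (check the API definition linked above for the exact list):

```python
import requests

base_url = "http://mobydick.elte-dh.hu:12321"
headers = {"Authorization": "Bearer <API_KEY>"}

# Model information (model id, token limits, etc.)
print(requests.get(f"{base_url}/info", headers=headers).json())

# Health check: status 200 means the server is up and the model is loaded.
print(requests.get(f"{base_url}/health", headers=headers).status_code)

# Server-side tokenization of a prompt.
tokens = requests.post(
    f"{base_url}/tokenize",
    headers=headers,
    json={"inputs": "What is the purpose of life?"},
).json()
print(len(tokens), "tokens")
```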
The test GUI is currently very limited: you can only set the model, system prompt, temperature, and maximum output tokens. Feel free to create your own test GUI if needed; `gradio` and `streamlit` are both good, swift options for that.
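For example, a minimal `gradio` sketch (a hypothetical starting point that wraps the OpenAI-compatible endpoint shown earlier):

```python
import gradio as gr
from openai import OpenAI

client = OpenAI(base_url="http://mobydick.elte-dh.hu:12321/v1", api_key="<API_KEY>")

def respond(message, history):
    # For simplicity, only the current message is sent; how the chat history is
    # structured depends on the gradio version, so it is ignored here.
    completion = client.chat.completions.create(
        model="tgi",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": message},
        ],
        max_tokens=512,
    )
    return completion.choices[0].message.content

gr.ChatInterface(respond, title="Course LLM endpoint").launch()
```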
The endpoint is for educational / experimental purposes only; it is not meant for production use or for handling sensitive or personal data. Commercial use, or any unauthorized usage outside of class, is unsafe and strictly forbidden. We also occasionally monitor and log requests and responses to prevent misuse.