# Add the option of streaming gen_ai.choice events. #1964
> Any proposal for what to actually change with the current conventions?
I mean that for a streaming or chunked response, we should offer an optional semantic convention. The current aggregated `gen_ai.choice` event looks like this:

```json
{"index":0,"finish_reason":"stop","message":{"content":"Why did the developer bring OpenTelemetry to the party? Because it always knows how to trace the fun!"}}
```

The proposed chunked form would emit one event per chunk:

```json
{"index":0,"sequence_id":0,"message":{"content":"Why did the developer"}}
{"index":0,"sequence_id":1,"message":{"content":" bring OpenTelemetry"}}
{"index":0,"sequence_id":2,"message":{"content":" to the party?"}}
{"index":0,"sequence_id":3,"message":{"content":" Because it always"}}
{"index":0,"sequence_id":4,"message":{"content":" knows how to"}}
{"index":0,"sequence_id":5,"finish_reason":"stop","message":{"content":" trace the fun!"}}
```
Here's some important context from Slack:

> Given that the capture of prompts/completions serves three things: for observability, high-precision prompts/completions are not required, and GenAI spans/logs should pay more attention to the most important metadata. As for audit/compliance, capture could be handled separately and optimized for precision, for example via an eBPF implementation or other out-of-process approaches. I'm not sure if I'm missing or misunderstanding anything. @lmolkova
### Area(s)
area:gen-ai
### What's missing?
There are streaming and non-streaming response modes for LLM calls, which means the implementation of capturing `gen_ai.choice` can be very different. I have noticed two approaches up to now:

1. Aggregate all chunks on the collection side and emit a single `gen_ai.choice` event once the stream completes.
2. Emit each chunk as its own event as it arrives, leaving aggregation to the backend.
Personally, we prefer the latter option, since it means less memory usage on the collection side. As we know, collection-side components normally run together with production applications (mostly in the same process), so if we follow the first implementation we will keep hearing complaints that collection tools occupy too much memory. A sketch of the first approach follows.
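For contrast, here is a minimal sketch of the first approach, reusing the hypothetical `Chunk` and `emit_event` shapes from the earlier sketch. The instrumentation buffers every chunk, so the entire completion is held in the application's process until the stream finishes:

```python
def aggregate_choice_event(
    chunks: Iterable[Chunk],
    emit_event: Callable[[str, dict], None],  # hypothetical event sink
    index: int = 0,
) -> None:
    """Buffer all chunks in-process, then emit one aggregated event."""
    parts: list = []
    finish_reason: Optional[str] = None
    for chunk in chunks:
        # The whole completion accumulates in application memory here.
        parts.append(chunk.content)
        if chunk.finish_reason is not None:
            finish_reason = chunk.finish_reason
    emit_event("gen_ai.choice", {
        "index": index,
        "finish_reason": finish_reason,
        "message": {"content": "".join(parts)},
    })
```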
However, there are no semantic conventions for capturing streaming responses here. This means that observability backends following the OTel semconv can only recognize choice events that have already been aggregated on the collection side. Even if a backend becomes aware of the issue above and allows the ingestion of chunked choice events, such an implementation would be non-standardized, leading to a wide variety of final formats and causing confusion for OTel users.
My proposal is: could we provide an alternative and define a streaming format for the event structure? This would give developers flexibility: they could aggregate the data on the client side, or they could choose to stream the events, with the latter implying that they must rely on a server-side solution that supports aggregation.
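To illustrate what such server-side aggregation might look like, here is a sketch that folds chunked events (assumed already grouped per span, and keyed by the proposed `index`/`sequence_id` fields) back into one aggregated event per choice. This is an assumption about backend behavior, not a defined OTel component:

```python
from collections import defaultdict


def reassemble_choices(events: list) -> list:
    """Fold chunked gen_ai.choice event bodies back into one aggregated
    event per choice index, ordering chunks by sequence_id."""
    by_index: dict = defaultdict(list)
    for event in events:
        by_index[event["index"]].append(event)

    aggregated = []
    for index, chunks in sorted(by_index.items()):
        chunks.sort(key=lambda e: e["sequence_id"])
        aggregated.append({
            "index": index,
            # The finish_reason rides on whichever chunk carried it.
            "finish_reason": next(
                (e["finish_reason"] for e in chunks if "finish_reason" in e),
                None,
            ),
            "message": {
                "content": "".join(e["message"]["content"] for e in chunks)
            },
        })
    return aggregated
```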
### Describe the solution you'd like
P.S. I have to point out that this topic is what I wanted to discuss in today's SIG APAC meeting, but nobody else actually showed up. We really need a notification if the meeting has been cancelled or delayed.