Commit d7b0559

Merge branch 'main' into feat/add-instrumentation-python
2 parents 225d21b + 9185116 commit d7b0559

420 files changed

+5487
-1196
lines changed

.cspell.yml

+5-5
Original file line numberDiff line numberDiff line change
@@ -10,19 +10,19 @@ import:
1010
caseSensitive: true
1111
ignorePaths:
1212
- '*.svg'
13-
- content/ja
14-
- content/zh
13+
- content/{ja,zh}
1514
- data/community/members.yaml
15+
- data/ecosystem/vendors.yaml
16+
- public/_redirects
1617
- static/refcache.json
17-
- vendors.yaml
1818
patterns:
1919
- name: CodeBlock
2020
pattern: |
2121
/
2222
^(\s*[~`]{3,}) # code-block start
2323
.* # all languages and options, e.g. shell {hl_lines=[12]}
2424
[\s\S]*? # content
25-
\1 # code-block end
25+
\1 # code-block end - cSpell:disable-next-line
2626
/igmx
2727
languageSettings:
2828
- languageId: markdown
@@ -58,6 +58,6 @@ dictionaries:
5858
words: # Valid words across all locales
5959
- Docsy
6060
- htmltest
61-
# Hugo
6261
- jsonify
62+
- opentelemetrybot
6363
- warnf
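
The `CodeBlock` pattern in the first hunk is written in extended (`x`) mode, where whitespace and `#` comments inside the regex are ignored. As a rough illustration of what it matches, here is a Python approximation using `re.VERBOSE`. cspell evaluates the pattern with its own JavaScript-based tooling, so treat this as a sketch only; the `strip_code_blocks` helper and the sample text are made up for the example.

```python
import re

# Python approximation of the CodeBlock pattern (flags i, m, x; "g" is implied
# by using sub() below). Comments inside the pattern are ignored by re.VERBOSE.
CODE_BLOCK = re.compile(
    r"""
    ^(\s*[~`]{3,})   # code-block start
    .*               # all languages and options, e.g. shell {hl_lines=[12]}
    [\s\S]*?         # content
    \1               # code-block end
    """,
    re.IGNORECASE | re.MULTILINE | re.VERBOSE,
)


def strip_code_blocks(text):
    """Hypothetical helper: drop every fenced code block the pattern matches."""
    return CODE_BLOCK.sub("", text)


fence = "`" * 3  # built at runtime to avoid a literal fence inside this example
sample = (
    "Intro teh\n\n"
    + fence + "shell {hl_lines=[12]}\n"
    + "npm run chekc\n"
    + fence + "\n\nOutro\n"
)

if __name__ == "__main__":
    print(strip_code_blocks(sample))
```

Because the closing fence is matched with the backreference `\1`, a block opened with three backticks is only closed by the same fence string, which keeps `~~~` and longer backtick fences from being confused with each other.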

.github/workflows/check-i18n.yml

+7-2
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,12 @@ jobs:
1313
with:
1414
fetch-depth: 0 # all
1515
- name: Any files missing hash key?
16-
run: scripts/check-i18n.sh -n -x -v
16+
run: |
17+
scripts/check-i18n.sh -n -x -v
18+
.github/workflows/scripts/check-i18n-helper.sh
1719
- name: Any files with invalid hash keys?
1820
run: scripts/check-i18n.sh -v
19-
- run: .github/workflows/scripts/check-i18n-helper.sh
21+
- name: Drifted status needs updating?
22+
run: |
23+
scripts/check-i18n.sh -D
24+
# npm run _diff:fail

.gitmodules

+1-1
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@
2020
[submodule "content-modules/semantic-conventions"]
2121
path = content-modules/semantic-conventions
2222
url = https://github.com/open-telemetry/semantic-conventions
23-
semconv-pin = v1.30.0
23+
semconv-pin = v1.31.0
2424
[submodule "content-modules/opamp-spec"]
2525
path = content-modules/opamp-spec
2626
url = https://github.com/open-telemetry/opamp-spec

.textlintrc.yml

+1-1
Original file line numberDiff line numberDiff line change
@@ -19,7 +19,7 @@ filters:
1919
# src attribute in figure Hugo template:
2020
- /src=".*?"/
2121
# Other:
22-
- /<https?://.*?>/ # Raw URLs
22+
- /<https?:\/\/.*?>/ # Raw URLs
2323
rules:
2424
terminology:
2525
defaultTerms: false

README.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -61,7 +61,7 @@ Here is a list of community roles with current and previous members:
6161
- [Fabrizio Ferri-Benedetti](https://github.com/theletterf)
6262
- [Patrice Chalin](https://github.com/chalin), CNCF
6363
- [Phillip Carter](https://github.com/cartermp), Honeycomb
64-
- [Severin Neumann](https://github.com/svrnm), Cisco
64+
- [Severin Neumann](https://github.com/svrnm)
6565
- [Tiffany Hrabusa](https://github.com/tiffany76), Grafana Labs
6666

6767
- Emeritus approvers:

assets/js/tracing.js

+6-4
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ import { getWebAutoInstrumentations } from '@opentelemetry/auto-instrumentations
77
import { registerInstrumentations } from '@opentelemetry/instrumentation';
88
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
99
import { Resource } from '@opentelemetry/resources';
10-
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
10+
import { ATTR_SERVICE_NAME } from '@opentelemetry/semantic-conventions';
1111
import { ZoneContextManager } from '@opentelemetry/context-zone-peer-dep';
1212

1313
const collectorOptions = {
@@ -16,21 +16,23 @@ const collectorOptions = {
1616
const exporter = new OTLPTraceExporter(collectorOptions);
1717

1818
const resources = new Resource({
19-
[SemanticResourceAttributes.SERVICE_NAME]: 'opentelemetry.io',
19+
[ATTR_SERVICE_NAME]: 'opentelemetry.io',
2020
'browser.language': navigator.language,
2121
});
2222

2323
const provider = new WebTracerProvider({
2424
resource: resources,
25+
spanProcessors: [
26+
new SimpleSpanProcessor(exporter),
27+
new SimpleSpanProcessor(new ConsoleSpanExporter()),
28+
],
2529
});
2630

2731
registerInstrumentations({
2832
instrumentations: [getWebAutoInstrumentations({})],
2933
tracerProvider: provider,
3034
});
3135

32-
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
33-
provider.addSpanProcessor(new SimpleSpanProcessor(new ConsoleSpanExporter()));
3436
provider.register({
3537
contextManager: new ZoneContextManager(),
3638
});

content-modules/semantic-conventions

content/en/announcements/kubecon-eu.md

+4-2
Original file line numberDiff line numberDiff line change
@@ -8,8 +8,10 @@ weight: -1
88

99
<i class="fas fa-bullhorn"></i> [**{{% param title %}}**][LF],
1010
**<span class="text-nowrap">April 1 - 4,</span> London England**.
11-
<span class="d-none d-md-inline"><br></span> Come collaborate, learn, and
12-
share<span class="d-none d-sm-inline"> with the Cloud Native community</span>!
11+
<span class="d-none d-md-inline"><br></span> Come [collaborate, learn, and
12+
share][blog]<span class="d-none d-sm-inline"> with the Cloud Native
13+
community</span>!
1314

1415
[LF]:
1516
https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/register/?utm_source=opentelemetry&utm_medium=all&utm_campaign=KubeCon-EU-2025&utm_content=slim-banner
17+
[blog]: /blog/2025/kubecon-eu/

content/en/blog/2023/otterize-otel/index.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -47,7 +47,7 @@ with each other. You can then use that information for operational or security
4747
needs, such as determining the blast radius of a downtime or security incident.
4848
You can use the service graph to figure out where to start rolling out
4949
OpenTelemetry tracing, as that deployment tends to be more involved and requires
50-
the integration of the OpenTelemetry SDK into your source code.
50+
the integration of the OpenTelemetry SDK into your source code.
5151

5252
While it was easy to use the OTel SDK for the network mapper, we can see why
5353
there's a bit of a chicken-and-egg problem here when you're looking into
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,259 @@
1+
---
2+
title: AI Agent Observability - Evolving Standards and Best Practices
3+
author: >-
4+
[Guangya Liu](https://github.com/gyliu513) (IBM), [Sujay
5+
Solomon](https://github.com/solsu01) (Google)
6+
linkTitle: AI Agent Observability
7+
issue: https://github.com/open-telemetry/opentelemetry.io/issues/6389
8+
sig: SIG GenAI Observability
9+
date: 2025-03-06
10+
cSpell:ignore: genai Guangya PydanticAI Sujay
11+
---
12+
13+
## 2025: Year of AI agents
14+
15+
AI Agents are becoming the next big leap in artificial intelligence in 2025.
16+
From autonomous workflows to intelligent decision making, AI Agents will power
17+
numerous applications across industries. However, with this evolution comes the
18+
critical need for AI agent observability, especially when scaling these agents
19+
to meet enterprise needs. Without proper monitoring, tracing, and logging
20+
mechanisms, diagnosing issues, improving efficiency, and ensuring reliability in
21+
AI agent-driven applications will be challenging.
22+
23+
### What is an AI agent
24+
25+
An AI agent is an application that uses a combination of LLM capabilities, tools
26+
to connect to the external world, and high-level reasoning to achieve a desired
27+
end goal or state. Alternatively, agents can also be treated as systems where
28+
LLMs dynamically direct their own processes and tool usage, maintaining control
29+
over how they accomplish tasks.
30+
31+
![Sample RAG based application w/ReAct reasoning/planning](ai-agent.png)
32+
<small>_Image credit_:
33+
[Google AI Agent Whitepaper](https://www.kaggle.com/whitepaper-agents).</small>
34+
35+
For more information about AI agents, see:
36+
37+
- [Google: What is an AI agent?](https://cloud.google.com/discover/what-are-ai-agents)
38+
- [IBM: What are AI agents?](https://www.ibm.com/think/topics/ai-agents)
39+
- [Microsoft: AI agents — what they are, and how they’ll change the way we work](https://news.microsoft.com/source/features/ai/ai-agents-what-they-are-and-how-theyll-change-the-way-we-work/)
40+
- [AWS: What are AI Agents?](https://aws.amazon.com/what-is/ai-agents/)
41+
- [Anthropic: Building effective agents](https://www.anthropic.com/research/building-effective-agents)
42+
43+
### Observability and beyond
44+
45+
Typically, telemetry from applications is used to monitor and troubleshoot them.
46+
In the case of an AI agent, given its non-deterministic nature, telemetry is
47+
also used as a feedback loop to continuously learn from and improve the quality
48+
of the agent by using it as input for evaluation tools.
49+
50+
Given that observability and evaluation tools for GenAI come from various
51+
vendors, it is important to establish standards around the shape of the
52+
telemetry generated by agent apps to avoid lock-in caused by vendor or framework
53+
specific formats.
54+
55+
## Current state of AI agent observability
56+
57+
As AI agent ecosystems continue to mature, the need for standardized and robust
58+
observability has become more apparent. While some frameworks offer built-in
59+
instrumentation, others rely on integration with observability tools. This
60+
fragmented landscape underscores the importance of the
61+
[GenAI observability project](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md)
62+
and OpenTelemetry’s emerging semantic conventions, which aim to unify how
63+
telemetry data is collected and reported.
64+
65+
### Understanding AI agent application vs. AI agent framework
66+
67+
It is crucial to distinguish between **AI agent applications** and **AI agent
68+
frameworks**:
69+
70+
- **AI agent applications** refer to individual AI-driven entities that perform
71+
specific tasks autonomously.
72+
- **AI agent frameworks** provide the necessary infrastructure to develop,
73+
manage, and deploy AI agents, often in a more streamlined way than building an
74+
agent from scratch. Examples include the following:
75+
[IBM Bee AI](https://github.com/i-am-bee),
76+
[IBM wxFlow](https://github.com/IBM/wxflows/),
77+
[CrewAI](https://www.crewai.com/),
78+
[AutoGen](https://microsoft.github.io/autogen/dev/),
79+
[Semantic Kernel](https://github.com/microsoft/semantic-kernel),
80+
[LangGraph](https://www.langchain.com/langgraph),
81+
[PydanticAI](https://ai.pydantic.dev/) and more.
82+
83+
![AI agent application vs AI agent framework](agent-agent-framework.png)
84+
85+
### Establishing a standardized semantic convention
86+
87+
Today, the
88+
[GenAI observability project](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md)
89+
within OpenTelemetry is actively working on defining semantic conventions to
90+
standardize AI agent observability. This effort is primarily driven by:
91+
92+
- **Agent application semantic convention** – A draft AI agent application
93+
semantic convention has already been established and finalized as part of the
94+
discussions in the
95+
[OpenTelemetry semantic conventions repository](https://github.com/open-telemetry/semantic-conventions/issues/1732).
96+
The initial AI agent semantic convention is based on
97+
[Google's AI agent white paper](https://www.kaggle.com/whitepaper-agents),
98+
providing a foundational framework for defining observability standards.
99+
Moving forward, we will continue to refine and enhance this initial convention
100+
to make it more robust and comprehensive.
101+
- **Agent framework semantic convention** – Now, the focus has shifted towards
102+
defining a common semantic convention for all AI agent frameworks. This effort
103+
is being discussed in
104+
[this OpenTelemetry issue](https://github.com/open-telemetry/semantic-conventions/issues/1530)
105+
and aims to establish a standardized approach for frameworks such as IBM Bee
106+
Stack, IBM wxFlow, CrewAI, AutoGen, LangGraph, and others. Additionally,
107+
different AI Agent frameworks will be able to define their own Framework
108+
Vendor Specific Semantic Convention while adhering to the common standard.
109+
110+
By establishing these conventions, we ensure that AI agent frameworks can report
111+
standardized metrics, traces, and logs, making it easier to integrate
112+
observability solutions and compare performance across different frameworks.
113+
114+
Note: Experimental conventions already exist in OpenTelemetry for models at
115+
[GenAI semantic convention](/docs/specs/semconv/gen-ai/).
116+
117+
### Instrumentation approaches
118+
119+
In order to make a system observable, it must be instrumented: That is, code
120+
from the system’s components must
121+
[emit traces, metrics, and logs](/docs/concepts/instrumentation/).
122+
123+
Different AI agent frameworks have varying approaches to implementing
124+
observability, mainly categorized into two options:
125+
126+
#### Option 1: Baked-in instrumentation
127+
128+
The first option is to implement built-in instrumentation that emits telemetry
129+
using OpenTelemetry semantic conventions. This means observability is a native
130+
feature, allowing users to seamlessly track agent performance, task execution,
131+
and resource utilization. Some AI agent frameworks, such as CrewAI, follow this
132+
pattern.
133+
134+
As a developer of an agent framework, here are some pros and cons of this
135+
baked-in instrumentation:
136+
137+
- Pros
138+
- You can take on the maintenance overhead of keeping the instrumentation for
139+
telemetry up-to-date.
140+
- Simplifies adoption for users unfamiliar with OpenTelemetry configuration.
141+
- Keep new features secret while providing instrumentation for them on the day
142+
of release.
143+
- Cons
144+
- Adds bloat to the framework for users who do not need observability
145+
features.
146+
- Risk of version lock-in if the framework’s OpenTelemetry dependencies lag
147+
behind upstream updates.
148+
- Less flexibility for advanced users who prefer custom instrumentation.
149+
- You may not get feedback/review from OTel contributors familiar with current
150+
semantic conventions.
151+
- Your instrumentation may lag with respect to best practices/conventions (not
152+
just the version of the OTel library dependencies).
153+
- Some best practices to follow if you consider this approach:
154+
- Provide a configuration setting that lets users easily enable or disable
155+
telemetry collection from your framework's built-in instrumentation.
156+
- Plan ahead for users who want to use other external instrumentation packages
157+
and avoid collisions.
158+
- Consider listing your agent framework in the
159+
[OpenTelemetry registry](/ecosystem/registry/) if you choose this path.
160+
- As a developer of an agent application, you may want to choose an agent
161+
framework with baked-in instrumentation if…
162+
- You want minimal dependencies on external packages in your agent app code.
163+
- You want out-of-the-box observability without manual setup.
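
One way the "easily enable or disable" best practice above could look in a Python framework is sketched below. The environment variable `MYAGENT_TELEMETRY_ENABLED`, the tracer name `myagent`, and the span name are hypothetical and are not defined by any framework or by the OpenTelemetry semantic conventions. The only OpenTelemetry calls used are `trace.get_tracer` and `start_as_current_span` from the `opentelemetry-api` package, which is treated here as an optional dependency.

```python
import os
from contextlib import contextmanager

try:
    # Optional dependency: the framework still imports if the API is missing.
    from opentelemetry import trace
except ImportError:
    trace = None

# Hypothetical setting name chosen for this sketch.
TELEMETRY_ENV_VAR = "MYAGENT_TELEMETRY_ENABLED"


def telemetry_enabled(environ=None):
    """Return True unless the user switched built-in instrumentation off."""
    environ = os.environ if environ is None else environ
    value = environ.get(TELEMETRY_ENV_VAR, "true")
    return value.strip().lower() not in {"0", "false", "no", "off"}


@contextmanager
def agent_span(name, environ=None):
    """Wrap a unit of agent work in a span when telemetry is on and available."""
    if trace is None or not telemetry_enabled(environ):
        yield None  # telemetry disabled: run the work without a span
        return
    tracer = trace.get_tracer("myagent")  # hypothetical instrumentation name
    with tracer.start_as_current_span(name) as span:
        yield span


def run_task(task, environ=None):
    """Stand-in for framework code that carries baked-in instrumentation."""
    with agent_span("myagent.run_task", environ):
        return f"done: {task}"
```

Keeping the check in one place means users who do not need observability pay only for a dictionary lookup, and a separately installed instrumentation package can be preferred by switching the built-in one off.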
164+
165+
#### Option 2: Instrumentation via OpenTelemetry
166+
167+
The second option is to publish OpenTelemetry instrumentation libraries in
168+
GitHub repositories. These instrumentation libraries can be imported into agents and
169+
configured to emit telemetry per OpenTelemetry semantic conventions.
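
To illustrate how an instrumentation library can live outside the framework, the sketch below wraps a method of a stand-in `Agent` class from the outside. All names are hypothetical, and the `on_span` callback is only a placeholder for creating a span with an OpenTelemetry tracer. The `instrument()`/`uninstrument()` shape is loosely modeled on the instrumentor classes in opentelemetry-python-contrib, not copied from them.

```python
import functools


class Agent:
    """Stand-in for a framework class that ships without any telemetry code."""

    def run(self, task):
        return f"done: {task}"


class AgentInstrumentor:
    """Hypothetical external instrumentation package for the Agent class."""

    def __init__(self, on_span):
        # on_span stands in for recording a span through an OpenTelemetry tracer.
        self._on_span = on_span
        self._original = None

    def instrument(self):
        """Patch Agent.run so every call is reported through on_span."""
        if self._original is not None:
            return  # already instrumented
        original = Agent.run
        on_span = self._on_span

        @functools.wraps(original)
        def wrapper(agent, task):
            result = original(agent, task)
            on_span({"name": "agent.run", "task": task})
            return result

        Agent.run = wrapper
        self._original = original

    def uninstrument(self):
        """Restore the framework's original method."""
        if self._original is not None:
            Agent.run = self._original
            self._original = None
```

The framework itself carries no telemetry code in this arrangement; the application decides whether to call `instrument()` and which package to use.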
170+
171+
For publishing instrumentation with OpenTelemetry, there are two options:
172+
173+
- Option 1: External instrumentation in your own repository/package, like
174+
[Traceloop OpenTelemetry Instrumentation](https://github.com/traceloop/openllmetry/tree/main/packages),
175+
[Langtrace OpenTelemetry Instrumentation](https://github.com/Scale3-Labs/langtrace-python-sdk/tree/main/src/langtrace_python_sdk/instrumentation)
176+
etc.
177+
- Option 2: External instrumentation in an OpenTelemetry-owned repository, like
178+
[instrumentation-genai](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation-genai)
179+
etc.
180+
181+
Both options work well, but the long-term goal is to host the code in
182+
OpenTelemetry-owned repositories. For example, Traceloop is working to
183+
[donate the instrumentation code](https://github.com/open-telemetry/community/issues/2571)
184+
to OpenTelemetry.
185+
186+
As a developer of an agent framework, here are some pros and cons of
187+
instrumentation with OpenTelemetry:
188+
189+
- Pros
190+
- Decouples observability from the core framework, reducing bloat.
191+
- Leverages OpenTelemetry’s community-driven maintenance for instrumentation
192+
updates.
193+
- Allows users to mix and match contrib libraries for their specific needs
194+
(e.g., cloud providers, LLM vendors).
195+
- More likely to leverage best practices around semantic conventions and
196+
zero-code instrumentation.
197+
- Cons
198+
- Risk of fragmentation if users rely on incompatible or outdated contrib
199+
packages for both install time and runtime.
200+
- Development velocity slows down when there are too many PRs in the
201+
OpenTelemetry review queue.
202+
- Best practices for this approach:
203+
- Ensure compatibility with popular OpenTelemetry contrib libraries (e.g., LLM
204+
vendors, vector DBs).
205+
- Provide clear documentation on recommended contrib packages and
206+
configuration examples.
207+
- Avoid reinventing the wheel; align with existing OpenTelemetry standards.
208+
- As a developer of an agent application, you may want to choose an agent
209+
framework instrumented via OpenTelemetry if…
210+
- You need fine-grained control over telemetry sources and destinations.
211+
- Your use case requires integrating observability with niche or custom tools.
212+
213+
**NOTE:** Regardless of the approach taken, it is essential that all AI agent
214+
frameworks adopt the AI agent framework semantic convention to ensure
215+
interoperability and consistency in observability data.
216+
217+
## Future of AI agent observability
218+
219+
Looking ahead, AI agent observability will continue to evolve with:
220+
221+
- **More robust semantic conventions** to cover edge cases and emerging AI agent
222+
frameworks.
223+
- **A unified AI agent framework semantic convention** to ensure
224+
interoperability across different frameworks while allowing flexibility for
225+
vendor-specific extensions.
226+
- **Continuous improvements to the AI agent semantic convention** to refine the
227+
initial standard and address new challenges as AI agents evolve.
228+
- **Improved tooling** for monitoring, debugging, and optimizing AI agents.
229+
- **Tighter integration with AI model observability** to provide end-to-end
230+
visibility into AI-powered applications.
231+
232+
## Role of OpenTelemetry's GenAI SIG
233+
234+
The
235+
[GenAI Special Interest Group (SIG) in OpenTelemetry](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md)
236+
is actively defining [GenAI semantic conventions](/docs/specs/semconv/gen-ai/)
237+
that cover key areas such as:
238+
239+
- LLM or model semantic conventions
240+
- VectorDB semantic conventions
241+
- AI agent semantic conventions (a critical component within the broader GenAI
242+
semantic convention)
243+
244+
In addition to conventions, the SIG has also expanded its scope to provide
245+
instrumentation coverage for agents and models in Python and other languages. As
246+
AI Agents become increasingly sophisticated, observability will play a
247+
fundamental role in ensuring their reliability, efficiency, and trustworthiness.
248+
Establishing a standardized approach to AI Agent observability requires
249+
collaboration, and we invite contributions from the broader AI community.
250+
251+
We look forward to partnering with different AI agent framework communities to
252+
establish best practices and refine these standards together. Your insights and
253+
contributions will help shape the future of AI observability, fostering a more
254+
transparent and effective AI ecosystem.
255+
256+
Don’t miss this opportunity to help shape the future of industry standards for
257+
GenAI Observability! Join us on the [CNCF Slack](https://slack.cncf.io)
258+
`#otel-genai-instrumentation` channel, or by attending a
259+
[GenAI SIG meeting](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md#meeting-times).
