
Commit a998322

Authored Mar 1, 2024
Add OTel Anti-Patterns blog post. Ref: open-telemetry#4043 (open-telemetry#4044)
1 parent 6d2576c commit a998322

File tree: 5 files changed, +241 -0 lines changed
@@ -0,0 +1,185 @@
---
title: OpenTelemetry Collector Antipatterns
linkTitle: OTel Collector Antipatterns
date: 2024-03-01
author: >-
  [Adriana Villela](https://github.com/avillela) (Lightstep)
canonical_url: https://open.substack.com/pub/geekingoutpodcast/p/opentelemetry-collector-anti-patterns
cSpell:ignore: antipattern antipatterns
---

![House on stilts against ocean and mountain backdrop](house-on-stilts.jpg)

The [OpenTelemetry Collector](/docs/collector) is one of my favorite
OpenTelemetry (OTel) components. It’s a flexible and powerful data pipeline
that allows you to ingest OTel data from one or more sources, transform it
(including batching, filtering, and masking), and export it to one or more
observability backends for analysis. It’s vendor-neutral. It’s extensible,
meaning that you can create your own custom components for it. What’s not to
like?

Unfortunately, as with many tools out there, it is also very easy to fall into
some bad habits. Today, I will dig into five OpenTelemetry Collector
antipatterns, and how to avoid them. Let’s get started!

## Antipatterns

### 1- Improper use of Collector deployment modes

It’s not enough to just use a Collector. It’s also about _how_ your Collectors
are deployed within your organization. That’s right - Collector*s*, plural.
Because one is often not enough.

There are two deployment modes for Collectors: agent mode and gateway mode, and
both are needed.

In [agent mode](/docs/collector/deployment/agent/), the Collector sits next to
the application or on the same host as the application.

![OTel Collector Agent Mode](otel-collector-agent.png)

In [gateway mode](/docs/collector/deployment/gateway/), telemetry data is sent
to a load balancer, which then determines how to distribute the load amongst a
pool of Collectors. Because you have a pool of Collectors, should one Collector
in that pool fail, one of the other Collectors in the pool can take over. This
keeps data flowing to your destination sans disruptions. Gateway mode is
commonly deployed per cluster, data center, or region.

![OTel Collector Gateway Mode](otel-collector-gateway.png)

So which should you use? Both agent and gateway.

If you’re collecting telemetry data for your application, place a Collector
agent alongside your application. If you’re collecting data for infrastructure,
place a Collector agent alongside your infrastructure. Whatever you do, don’t
collect telemetry for all of your infrastructure and applications using a
single Collector. That way, if one Collector fails, the rest of your telemetry
collection is unaffected.

The telemetry from your Collector agents can then be sent to a Collector
gateway. Because the gateway sits behind a load balancer, you don’t have a
single point of failure for exporting telemetry data, typically to your
observability backend.
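
For illustration, here is a minimal sketch of an agent-mode Collector
configuration, assuming the application sends OTLP to the local agent and that
a gateway tier sits behind a load balancer at `otel-gateway.example.com` (a
hypothetical hostname):

```yaml
# Agent-mode Collector (sketch): receive OTLP locally, batch, and forward to
# the gateway tier. The gateway hostname below is hypothetical.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  otlp:
    endpoint: otel-gateway.example.com:4317
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

The gateway Collectors can run a very similar configuration, with their
exporter pointed at your observability backend instead.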

_Bottom line:_ Having the right Collector deployment configuration to send data
to your observability backend ensures higher availability of your telemetry
collection infrastructure.

### 2- Not monitoring your Collectors

Deploying multiple Collector agents and a Collector gateway is great, but it’s
not enough on its own. Wouldn’t it be nice to know when one of your Collectors
is malfunctioning, or when data is being dropped? That way, you can take action
before things start to escalate. This is where monitoring your Collectors can
be very useful.

But how does one monitor a Collector? The OTel Collector already emits
[metrics for the purposes of its own monitoring](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/monitoring.md).
These can then be sent to your observability backend for monitoring.
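
As a rough sketch (assuming you run a distribution that includes the Prometheus
receiver, such as Contrib), you can scrape the Collector’s own metrics, which
it exposes on port 8888 by default, and forward them to your backend; the
backend endpoint below is made up:

```yaml
# Sketch: scrape the Collector's own metrics and export them for monitoring.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 10s
          static_configs:
            - targets: ['0.0.0.0:8888'] # the Collector's own metrics endpoint
exporters:
  otlp:
    endpoint: backend.example.com:4317 # hypothetical observability backend
service:
  telemetry:
    metrics:
      level: detailed # emit more of the Collector's internal metrics
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]
```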

### 3- Not using the right Collector distribution (or not building your own distribution)

There are two official distributions of the OpenTelemetry Collector:
[Core](https://github.com/open-telemetry/opentelemetry-collector), and
[Contrib](https://github.com/open-telemetry/opentelemetry-collector-contrib).

The Core distribution is a bare-bones distribution of the Collector for OTel
developers to develop and test. It contains a base set of
[extensions](/docs/collector/configuration/#service-extensions),
[connectors](/docs/collector/configuration/#connectors),
[receivers](/docs/collector/configuration/#receivers),
[processors](/docs/collector/configuration/#processors), and
[exporters](/docs/collector/configuration/#exporters).

The Contrib distribution is for non-OTel developers to experiment and learn. It
also extends the Core distribution, and includes components created by third
parties (including vendors and individual community members) that are useful to
the OpenTelemetry community at large.

Neither Core nor Contrib alone is meant to be part of your production workload.
Using just Core by itself is too bare-bones and wouldn’t suit an organization’s
needs. (Though its components are absolutely needed!) And although many
OpenTelemetry practitioners deploy Contrib in their respective organizations,
it has many components, and you likely won’t need every single exporter,
receiver, processor, connector, and extension. That would be overkill, and your
Collector instance would end up needlessly bloated, potentially increasing the
attack surface.

But how do you pick and choose the components that you need? The answer is to
build your own distribution, and you can do that using a tool called the
[OpenTelemetry Collector Builder](/docs/collector/custom-collector/) (OCB). In
addition, at some point, you may need to create your own custom Collector
component, such as a processor or exporter. The OCB allows you to integrate
your custom components AND pick and choose the Contrib components that you
need.
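
For example, a sketch of an OCB manifest that pulls in only the components you
actually need might look like the following; the distribution name and the
module versions are illustrative, not prescriptive:

```yaml
# OCB manifest (sketch): build a custom Collector with a handful of components.
dist:
  name: my-otelcol # hypothetical distribution name
  description: Custom OTel Collector distribution
  output_path: ./my-otelcol
receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.96.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.96.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.96.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter v0.96.0
```

Running the builder against a manifest like this produces a Collector binary
containing only these components, plus any custom ones you list.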

It is also worth mentioning that some vendors build their own
[Collector distributions](/ecosystem/distributions/). These are OTel Collector
distributions curated to include the Collector components that are specific to
that vendor. They may be a combination of custom, vendor-developed components
and curated Collector Contrib components. Using vendor-specific distributions
ensures that you are using just the Collector components that you need, again
reducing overall bloat.

_Bottom line:_ Using the right distribution reduces bloat and allows you to
include only the Collector components that you need.

### 4- Not updating your Collectors

This one’s short and sweet. Keeping software up to date is important, and the
Collector is no different! Regularly updating the Collector keeps you on the
latest version, so that you can take advantage of new features, bug fixes,
performance improvements, and security fixes.

### 5- Not using the OpenTelemetry Collector where appropriate

OpenTelemetry allows you to send telemetry signals from your application to an
observability backend in one of two ways:

- [Directly from the application](/docs/collector/deployment/no-collector/)
- [Via the OpenTelemetry Collector](/docs/collector/)

Sending telemetry data “direct from application” for non-production systems is
all well and good if you’re getting started with OpenTelemetry, but this
approach is neither suitable nor recommended for production systems. Instead,
the
[OpenTelemetry docs recommend using the OpenTelemetry Collector](/docs/collector/#when-to-use-a-collector).
How come?

[Per the OTel Docs](/docs/collector/#when-to-use-a-collector), the Collector
“allows your service to offload data quickly and the collector can take care of
additional handling like retries, batching, encryption or even sensitive data
filtering.”
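
As a minimal sketch of what that handling can look like in practice, a
Collector pipeline might batch data, retry failed exports, and drop a sensitive
attribute before anything leaves your environment. The attribute key and the
backend endpoint below are made up for illustration:

```yaml
# Sketch only: batch, retry on export failure, and drop a sensitive attribute.
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
processors:
  attributes: # ships in the Contrib distribution
    actions:
      - key: user.email # assumed sensitive attribute
        action: delete
  batch: {}
exporters:
  otlphttp:
    endpoint: https://backend.example.com:4318 # hypothetical backend
    retry_on_failure:
      enabled: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, batch]
      exporters: [otlphttp]
```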

Check out some additional Collector benefits:

- **Collectors can enhance the quality of the telemetry emitted by an
  application while also minimizing costs.** For example: sampling spans to
  reduce costs, enriching telemetry with extra metadata, and generating new
  telemetry, such as metrics derived from spans (see the sketch after this
  list).
- **Using a Collector to ingest telemetry data makes it easy to change to a new
  backend or export the data in a different format.** If we want to change how
  telemetry is being processed or exported, that change happens in one place
  (the Collector!), as opposed to making the same change for multiple
  applications in your organization.
- **Collectors allow you to receive data of various formats and translate it to
  the desired format for export.** This can be very handy when transitioning
  from some other telemetry solution to OTel.
- **Collectors allow you to ingest non-application telemetry.** This includes
  logs and non-app metrics from infrastructure like Azure, Prometheus, and
  CloudWatch.
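
To make the first bullet concrete, here is a rough sketch of generating metrics
from spans with the spanmetrics connector from the Contrib distribution; the
backend endpoint is made up:

```yaml
# Sketch: derive request metrics from incoming spans via the spanmetrics
# connector, then export both the spans and the generated metrics.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
connectors:
  spanmetrics: {}
exporters:
  otlp:
    endpoint: backend.example.com:4317 # hypothetical observability backend
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics, otlp]
    metrics:
      receivers: [spanmetrics]
      exporters: [otlp]
```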

That being said, there are some use cases where folks don’t want to or can’t
use a Collector. For instance, when collecting data at the edge from IoT
devices, it might be better to send data directly to your observability backend
instead of to a local Collector, given that resources at the edge might be
limited.

_Bottom line:_ As a general rule, using the OpenTelemetry Collector gives you
additional flexibility for managing your telemetry data.

## Final Thoughts

The OpenTelemetry Collector is a powerful and flexible tool for ingesting,
manipulating, and exporting OpenTelemetry data. By using it to its full
potential and by avoiding these five pitfalls, your organization can be well on
its way towards achieving observability greatness.

static/refcache.json: +56 lines
@@ -39,6 +39,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-18T08:05:55.59597-05:00"
   },
+  "https://adri-v.medium.com/43dca4a857a0": {
+    "StatusCode": 200,
+    "LastSeen": "2024-02-23T23:30:53.006527-05:00"
+  },
   "https://agilecoffee.com/leancoffee/": {
     "StatusCode": 200,
     "LastSeen": "2024-01-18T08:05:43.542109-05:00"
@@ -5283,6 +5287,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-30T15:37:21.465525-05:00"
   },
+  "https://open.substack.com/pub/geekingoutpodcast/p/opentelemetry-collector-anti-patterns": {
+    "StatusCode": 200,
+    "LastSeen": "2024-02-26T15:05:23.506868-05:00"
+  },
   "https://opencensus.io": {
     "StatusCode": 206,
     "LastSeen": "2024-01-18T19:07:33.722102-05:00"
@@ -5395,6 +5403,54 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-18T19:07:12.98586-05:00"
   },
+  "https://opentelemetry.io/docs/collector": {
+    "StatusCode": 206,
+    "LastSeen": "2024-02-23T22:55:03.656226-05:00"
+  },
+  "https://opentelemetry.io/docs/collector/": {
+    "StatusCode": 206,
+    "LastSeen": "2024-02-23T22:55:04.244864-05:00"
+  },
+  "https://opentelemetry.io/docs/collector/#when-to-use-a-collector": {
+    "StatusCode": 206,
+    "LastSeen": "2024-02-23T22:55:04.48411-05:00"
+  },
+  "https://opentelemetry.io/docs/collector/configuration/#connectors": {
+    "StatusCode": 206,
+    "LastSeen": "2024-02-23T22:55:05.306982-05:00"
+  },
+  "https://opentelemetry.io/docs/collector/configuration/#exporters": {
+    "StatusCode": 206,
+    "LastSeen": "2024-02-23T22:55:06.037446-05:00"
+  },
+  "https://opentelemetry.io/docs/collector/configuration/#processors": {
+    "StatusCode": 206,
+    "LastSeen": "2024-02-23T22:55:05.754871-05:00"
+  },
+  "https://opentelemetry.io/docs/collector/configuration/#receivers": {
+    "StatusCode": 206,
+    "LastSeen": "2024-02-23T22:55:05.518086-05:00"
+  },
+  "https://opentelemetry.io/docs/collector/configuration/#service-extensions": {
+    "StatusCode": 206,
+    "LastSeen": "2024-02-23T22:55:05.132379-05:00"
+  },
+  "https://opentelemetry.io/docs/collector/custom-collector/": {
+    "StatusCode": 206,
+    "LastSeen": "2024-02-23T22:55:06.360327-05:00"
+  },
+  "https://opentelemetry.io/docs/collector/deployment/agent/": {
+    "StatusCode": 206,
+    "LastSeen": "2024-02-23T22:55:04.712097-05:00"
+  },
+  "https://opentelemetry.io/docs/collector/deployment/gateway/": {
+    "StatusCode": 206,
+    "LastSeen": "2024-02-23T22:55:04.939057-05:00"
+  },
+  "https://opentelemetry.io/docs/collector/deployment/no-collector/": {
+    "StatusCode": 206,
+    "LastSeen": "2024-02-23T22:55:04.014798-05:00"
+  },
   "https://opentracing.io": {
     "StatusCode": 206,
     "LastSeen": "2024-01-18T19:07:33.813401-05:00"
