---
title: OpenTelemetry Collector Antipatterns
linkTitle: OTel Collector Antipatterns
date: 2024-03-01
author: >-
  [Adriana Villela](https://github.com/avillela) (Lightstep),
canonical_url: https://open.substack.com/pub/geekingoutpodcast/p/opentelemetry-collector-anti-patterns
cSpell:ignore: antipattern antipatterns
---

The [OpenTelemetry Collector](/docs/collector) is one of my favorite
OpenTelemetry (OTel) components. It’s a flexible and powerful data pipeline
that lets you ingest OTel data from one or more sources, transform it
(including batching, filtering, and masking), and export it to one or more
observability backends for analysis. It’s vendor-neutral. It’s extensible,
meaning that you can create your own custom components for it. What’s not to
like?

Unfortunately, as with many tools, it is also very easy to fall into some bad
habits. Today, I will dig into five OpenTelemetry Collector antipatterns and
how to avoid them. Let’s get started!

## Antipatterns

### 1- Improper use of Collector deployment modes

It’s not enough to just use a Collector. It’s also about _how_ your Collectors
are deployed within your organization. That’s right: Collector*s*, plural,
because one is often not enough.

There are two deployment modes for Collectors, agent mode and gateway mode, and
you need both.

In [agent mode](/docs/collector/deployment/agent/), the Collector runs next to
the application or on the same host as the application.

In [gateway mode](/docs/collector/deployment/gateway/), telemetry data is sent
to a load balancer, which distributes the load across a pool of Collectors.
Because you have a pool of Collectors, if one Collector in that pool fails,
another Collector in the pool can take over. This keeps data flowing to your
destination without disruption. A gateway is commonly deployed per cluster,
data center, or region.
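
As a rough illustration, each Collector in the gateway pool might run a
configuration like the minimal sketch below. It assumes the load balancer
forwards OTLP/gRPC traffic to port 4317 on every instance. The backend URL
`https://otlp.backend.example.com` is a hypothetical placeholder for your
observability backend.

```yaml
# Gateway Collector: one instance in a load-balanced pool (illustrative sketch).
receivers:
  otlp:
    protocols:
      grpc:
        # Listen on all interfaces so the load balancer can reach this instance.
        endpoint: 0.0.0.0:4317

processors:
  # Protect the gateway from running out of memory under heavy load.
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  # Group telemetry into batches before export.
  batch: {}

exporters:
  otlphttp:
    # Hypothetical backend endpoint; replace with your backend's OTLP endpoint.
    endpoint: https://otlp.backend.example.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp]
```

Each gateway instance in this sketch holds no per-trace state, so the load
balancer can send any request to any instance. If you later add tail-based
sampling, which needs all spans of a trace on the same instance, Contrib’s
`loadbalancing` exporter can route spans by trace ID instead.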

So which should you use? Both agent and gateway.

If you’re collecting telemetry data for your application, place a Collector
agent alongside your application. If you’re collecting data for infrastructure,
place a Collector agent alongside your infrastructure. Whatever you do, don’t
collect telemetry for all of your infrastructure and applications using a single
Collector. Spreading collection across multiple agents means that if one
Collector fails, the rest of your telemetry collection is unaffected.

The telemetry from your Collector agents can then be sent to a Collector
gateway, which typically exports it to your observability backend. Because the
gateway sits behind a load balancer, exporting telemetry data no longer has a
single point of failure.
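
A minimal agent configuration along these lines might look like the sketch
below. It assumes the application’s SDK exports OTLP to the agent on the same
host. It also assumes that `otel-gateway.example.internal:4317`, a hypothetical
name, is the address of the load balancer in front of the gateway pool. The
`hostmetrics` receiver, a Contrib component, covers the infrastructure side.

```yaml
# Agent Collector: runs on the same host as the application (illustrative sketch).
receivers:
  # Telemetry from the application's OpenTelemetry SDK.
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        endpoint: localhost:4318
  # Metrics about the host itself.
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}
      filesystem: {}

processors:
  batch: {}

exporters:
  otlp:
    # Hypothetical address of the load balancer in front of the gateway pool.
    endpoint: otel-gateway.example.internal:4317
    tls:
      # Assumption: plaintext inside a trusted network; enable TLS otherwise.
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

The agent only knows the load balancer’s address, so you can add or replace
gateway instances without touching any agent configuration.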

_Bottom line:_ Having the right Collector deployment configuration to send data
to your observability backend ensures higher availability of your telemetry
collection infrastructure.

### 2- Not monitoring your Collectors

Deploying multiple Collector agents and a Collector gateway is great, but it’s
not enough. Wouldn’t it be nice to know when one of your Collectors is
malfunctioning, or when data is being dropped? That way, you can take action
before things escalate. This is where monitoring your Collectors comes in.

But how does one monitor a Collector? The OTel Collector already emits
[metrics for the purposes of its own monitoring](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/monitoring.md).
You can send these to your observability backend, just like any other
telemetry.
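
One way to do this is sketched below, under some assumptions. The Collector
exposes its internal metrics on a local Prometheus endpoint and scrapes them
with its own `prometheus` receiver, a Contrib component. It then exports them
like any other metrics. The `address` setting reflects Collector releases from
around the time of writing. Newer releases replace it with a `readers` block,
so check the docs for your version. The exporter endpoint is a hypothetical
gateway address.

```yaml
receivers:
  # Scrape the Collector's own internal metrics endpoint.
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 15s
          static_configs:
            - targets: ["localhost:8888"]

exporters:
  otlp:
    # Hypothetical gateway address.
    endpoint: otel-gateway.example.internal:4317
    tls:
      insecure: true

service:
  telemetry:
    metrics:
      # "detailed" is the most verbose internal metrics level.
      level: detailed
      # Expose internal metrics in Prometheus format on port 8888.
      address: localhost:8888
  pipelines:
    metrics/internal:
      receivers: [prometheus]
      exporters: [otlp]
```

Useful signals to alert on include `otelcol_receiver_refused_spans`,
`otelcol_exporter_send_failed_spans`, and `otelcol_exporter_queue_size`
approaching `otelcol_exporter_queue_capacity`. Exact names can vary by version,
for example with a `_total` suffix on counters.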

### 3- Not using the right Collector distribution (or not building your own distribution)

There are two official distributions of the OpenTelemetry Collector:
[Core](https://github.com/open-telemetry/opentelemetry-collector) and
[Contrib](https://github.com/open-telemetry/opentelemetry-collector-contrib).

The Core distribution is a bare-bones distribution of the Collector for OTel
developers to develop and test. It contains a base set of
[extensions](/docs/collector/configuration/#service-extensions),
[connectors](/docs/collector/configuration/#connectors),
[receivers](/docs/collector/configuration/#receivers),
[processors](/docs/collector/configuration/#processors), and
[exporters](/docs/collector/configuration/#exporters).

The Contrib distribution is for non-OTel developers to experiment and learn. It
extends the Core distribution and includes components created by third parties
(including vendors and individual community members) that are useful to the
OpenTelemetry community at large.

Neither Core nor Contrib alone is meant to be part of your production workload.
Core by itself is too bare-bones to suit most organizations’ needs (though its
components are absolutely needed!). And although many OpenTelemetry
practitioners deploy Contrib in their organizations, it has many components, and
you likely won’t need every single exporter, receiver, processor, connector, and
extension. That would be overkill: your Collector instance ends up needlessly
bloated, potentially increasing the attack surface.

But how do you pick and choose the components that you need? The answer is to
build your own distribution, which you can do using a tool called the
[OpenTelemetry Collector Builder](/docs/collector/custom-collector/) (OCB). In
addition, at some point you may need to create your own custom Collector
component, such as a processor or exporter. The OCB lets you integrate your
custom components AND pick and choose the Contrib components that you need.
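
The OCB reads a YAML manifest that lists the Go module of each component to
include. The sketch below assumes OCB v0.96.0. Component versions should match
the version of the builder you run. `github.com/myorg/myprocessor` is a
hypothetical stand-in for a custom component of your own.

```yaml
# builder-config.yaml: manifest for the OpenTelemetry Collector Builder (sketch).
dist:
  name: otelcol-custom
  description: Custom OpenTelemetry Collector distribution
  output_path: ./otelcol-custom

receivers:
  # From Core.
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.96.0
  # From Contrib: only the components you actually need.
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver v0.96.0

processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.96.0
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.96.0
  # Hypothetical in-house processor module.
  - gomod: github.com/myorg/myprocessor v0.1.0

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.96.0
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.96.0
```

With the builder installed as `ocb`, running
`ocb --config builder-config.yaml` generates and compiles the distribution into
`output_path`.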

It is also worth mentioning that some vendors build their own
[Collector distributions](/ecosystem/distributions/). These are OTel Collector
distributions curated to include the Collector components relevant to that
vendor. They may combine custom, vendor-developed components with curated
Collector Contrib components. Using a vendor-specific distribution helps ensure
that you run only the Collector components you need, again reducing overall
bloat.

_Bottom line:_ Using the right distribution reduces bloat and allows you to
include only the Collector components that you need.

### 4- Not updating your Collectors

This one’s short and sweet. Keeping software up to date is important, and the
Collector is no different! Regularly updating your Collectors lets you take
advantage of new features, bug fixes, performance improvements, and security
fixes.

### 5- Not using the OpenTelemetry Collector where appropriate

OpenTelemetry allows you to send telemetry signals from your application to an
observability backend in one of two ways:

- [Directly from the application](/docs/collector/deployment/no-collector/)
- [Via the OpenTelemetry Collector](/docs/collector/)
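
In practice, the difference often comes down to where the SDK’s OTLP exporter
sends data. The sketch below is a Kubernetes container `env` fragment that uses
the standard `OTEL_EXPORTER_OTLP_*` environment variables. It assumes a
Collector agent runs as a sidecar in the same pod and listens for OTLP/HTTP on
port 4318. The backend URL and the `api-key` header are hypothetical.

```yaml
# Fragment of a Kubernetes container spec (illustrative).
env:
  # Option 1: directly from the application. Every service needs the
  # backend URL and credentials.
  # - name: OTEL_EXPORTER_OTLP_ENDPOINT
  #   value: https://otlp.backend.example.com
  # - name: OTEL_EXPORTER_OTLP_HEADERS
  #   value: api-key=REPLACE_ME

  # Option 2: via a Collector sidecar. The SDK only needs to know where
  # the Collector is; backend details live in the Collector's config.
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: http/protobuf
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://localhost:4318
```

With option 2, switching backends means changing one Collector exporter rather
than every service’s environment.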

Sending telemetry data “direct from application” is all well and good in
non-production systems when you’re getting started with OpenTelemetry. However,
this approach is neither suited nor recommended for production systems. Instead,
the
[OpenTelemetry docs recommend using the OpenTelemetry Collector](/docs/collector/#when-to-use-a-collector).
Why?

[Per the OTel docs](/docs/collector/#when-to-use-a-collector), the Collector
“allows your service to offload data quickly and the collector can take care of
additional handling like retries, batching, encryption or even sensitive data
filtering.”
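
To make retries, batching, and encryption concrete, here is a sketch of the
relevant settings. `retry_on_failure` and `sending_queue` are standard exporter
helper options on Core exporters such as `otlp`. The endpoint and CA file path
are hypothetical. This is a fragment, so the processor and exporter still need
to be wired into a pipeline under `service`.

```yaml
processors:
  # Batch telemetry so the backend receives fewer, larger requests.
  batch:
    send_batch_size: 8192
    timeout: 5s

exporters:
  otlp:
    # Hypothetical backend endpoint.
    endpoint: otlp.backend.example.com:4317
    # TLS is on by default; ca_file is only needed for a private CA.
    tls:
      ca_file: /etc/otelcol/certs/ca.pem
    # Retry failed exports with exponential backoff.
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    # Buffer data in memory while the backend is slow or unavailable.
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 1000
```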

Here are some additional benefits of the Collector:

- **Collectors can enhance the quality of the telemetry emitted by an
  application while also minimizing costs.** Examples include sampling spans to
  reduce costs, enriching telemetry with extra metadata, and generating new
  telemetry, such as metrics derived from spans.
- **Using a Collector to ingest telemetry data makes it easy to switch to a new
  backend or export the data in a different format.** If you want to change how
  telemetry is processed or exported, that change happens in one place (the
  Collector!), rather than in every application in your organization.
- **Collectors can receive data in various formats and translate it to the
  desired format for export.** This is handy when transitioning from another
  telemetry solution to OTel.
- **Collectors allow you to ingest non-application telemetry.** This includes
  logs and infrastructure metrics from sources like Azure, Prometheus, and
  CloudWatch.
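
As an illustration of the first bullet, the pipeline sketch below does four
things:

- adds a resource attribute to every span
- deletes a sensitive span attribute
- derives request metrics from spans with the `spanmetrics` connector
- keeps only a share of traces with the `probabilistic_sampler` processor

These are all Contrib components. The `deployment.environment` value, the
`user.email` attribute, the 10% rate, and the gateway address are assumptions
for the example. Spans fan out to two trace pipelines so that span metrics are
computed before sampling.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317

processors:
  # Enrich: add metadata to every span's resource.
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert
  # Filter sensitive data: drop a (hypothetical) PII attribute.
  attributes/redact:
    actions:
      - key: user.email
        action: delete
  # Reduce costs: keep roughly 10% of traces.
  probabilistic_sampler:
    sampling_percentage: 10
  batch: {}

connectors:
  # Generate new telemetry: request counts and latencies derived from spans.
  spanmetrics: {}

exporters:
  otlp:
    # Hypothetical gateway address.
    endpoint: otel-gateway.example.internal:4317
    tls:
      insecure: true

service:
  pipelines:
    # Every span feeds the span metrics, before any sampling happens.
    traces/metrics:
      receivers: [otlp]
      processors: [resource, attributes/redact]
      exporters: [spanmetrics]
    # Only sampled spans are exported.
    traces/sampled:
      receivers: [otlp]
      processors: [resource, attributes/redact, probabilistic_sampler, batch]
      exporters: [otlp]
    metrics:
      receivers: [spanmetrics]
      processors: [batch]
      exporters: [otlp]
```

If the sampler ran in the same pipeline ahead of the connector, the derived
metrics would only reflect the sampled 10% of spans. Pipeline layout is
therefore part of the design.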

That being said, there are some use cases where folks don’t want to or can’t use
a Collector. For instance, when collecting data at the edge from IoT devices, it
might be better to send data directly to the observability backend instead of
to a local Collector, given that resources at the edge might be limited.

_Bottom line:_ As a general rule, using the OpenTelemetry Collector gives you
additional flexibility for managing your telemetry data.

## Final Thoughts

The OpenTelemetry Collector is a powerful and flexible tool for ingesting,
manipulating, and exporting OpenTelemetry data. By using it to its full
potential and avoiding these five pitfalls, your organization can be well on its
way towards achieving observability greatness.