
OpenTelemetry Collector's New Filter Processor

Dec 6, 2023
4 minute read
Matheus Nogueira
Software Engineer
Tracetest

The OpenTelemetry Collector now supports filtering spans by trace state without needing tail sampling. See how Tracetest uses this so that a second pipeline routes only the relevant spans to Tracetest.

Traditionally, trace data has been used reactively: when things break, traces help you reduce your MTTR. At [Tracetest.io](http://tracetest.io/), we also put trace data to proactive use by enabling E2E tests in a fraction of the time typically required. The secret sauce is the trace data itself, since we rely on it to run trace-based tests.

Because traces are the core input for trace-based testing, Tracetest supports two ways of collecting them:

- Direct integration: Used with trace data stores that expose an API or a direct connection for querying and retrieving a trace. This includes Jaeger, Tempo, OpenSearch, AWS X-Ray, and [the list is growing](https://docs.tracetest.io/configuration/overview).

- Via OpenTelemetry Collector: The user defines a second pipeline in their collector and sends traces directly to Tracetest. This is the approach used with Datadog, New Relic, Honeycomb, Lightstep, and others.

However, when using the OpenTelemetry Collector with a large number of traces, storage can become a problem quickly, so we had to find a way to filter out traces that are not relevant to Tracetest. This mattered all the more because we never wanted to build Tracetest as a competitor to existing trace stores such as Jaeger, Tempo, or Lightstep. The solution we found was to filter spans based on a key in their `Trace State` object.

What is Trace State?

`Trace State` is one of the components of the `TraceContext`. The OpenTelemetry website defines both:

> A **Context** is an object that contains the information for the sending and receiving service to correlate one span with another and associate it with the trace overall.
>
> **Trace State**, a list of key-value pairs that can carry vendor-specific trace information.

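Because context propagation carries the `TraceContext` across service boundaries (with OpenTelemetry's default W3C Trace Context propagator, as the `traceparent` and `tracestate` HTTP headers), anything that is part of it is propagated automatically. This was very important for us: if we set a value in the `Trace State`, it reaches every service involved in that trace, as long as those services propagate context.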

Filtering by Trace State

If you needed to filter spans based on the `Trace State` object a few months ago, you probably noticed there was no way to do it with the `core` distribution of the OpenTelemetry Collector. You had to rely on the `contrib` distribution and its `tail_sampling` processor.
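For comparison, a tail-sampling setup for the same goal might look roughly like the sketch below. It assumes the `contrib` distribution (where `tail_sampling` lives) and uses the processor's `trace_state` policy type; the `decision_wait` and `num_traces` values are illustrative, not recommendations, and the `debug` exporter is only a stand-in destination.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  tail_sampling:
    # Spans are buffered in memory for this long before the
    # keep/drop decision is made for the whole trace.
    decision_wait: 10s
    # Number of traces kept in memory while waiting for a decision.
    num_traces: 50000
    policies:
      # Keep only traces whose trace state has tracetest=true.
      - name: tracetest-only
        type: trace_state
        trace_state:
          key: tracetest
          values: ["true"]

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [debug]
```

Tail sampling decides per trace, after buffering its spans for `decision_wait`, which is where its memory cost comes from; the `filter` processor evaluates each span on its own as it passes through.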

However, a few months ago the OpenTelemetry team rewrote part of the `filter` processor and made it possible to use OpenTelemetry Transformation Language (OTTL) to build your filters. This change made it possible to use the `filter` processor to access attributes in the `Trace State` object.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
    timeout: 100ms
  filter/tracetest:
    error_mode: ignore
    traces:
      span:
        # Drop every span whose trace state does not contain
        # the key "tracetest" with the value "true".
        - 'trace_state["tracetest"] != "true"'

exporters:
  # The debug exporter replaces the deprecated logging exporter.
  debug:
    verbosity: detailed

service:
  pipelines:
    traces/1:
      receivers: [otlp]
      processors: [filter/tracetest, batch]
      exporters: [debug]
```
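The example above writes the filtered spans to the collector's console. In the Tracetest setup described earlier, the filter belongs only on the second pipeline, so your main backend still receives every span. A minimal sketch, assuming a Tracetest instance reachable over OTLP gRPC at `tracetest:4317` and a hypothetical existing backend at `backend.example.com:4317` (substitute your own exporters and endpoints):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
    timeout: 100ms
  filter/tracetest:
    error_mode: ignore
    traces:
      span:
        # Drop spans that were not produced by a Tracetest run.
        - 'trace_state["tracetest"] != "true"'

exporters:
  # Hypothetical main backend; replace with your vendor's exporter.
  otlp/backend:
    endpoint: backend.example.com:4317
  # Assumed Tracetest OTLP endpoint; check your own installation.
  otlp/tracetest:
    endpoint: tracetest:4317
    tls:
      insecure: true

service:
  pipelines:
    # Main pipeline: every span goes to the regular trace backend.
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/backend]
    # Second pipeline: only Tracetest-tagged spans reach Tracetest.
    traces/tracetest:
      receivers: [otlp]
      processors: [filter/tracetest, batch]
      exporters: [otlp/tracetest]
```

The same `otlp` receiver feeds both pipelines, and each pipeline gets its own processor instances, so the filter in `traces/tracetest` does not change what `traces` exports.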

For more complex filters, check the [OTTL documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/ottl/README.md).
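As an illustration of a slightly richer filter, here is a sketch that keeps the Tracetest condition and also drops health-check spans. The service name `api` and the route `/health` are hypothetical, and the processor still has to be added to a pipeline's `processors` list to take effect.

```yaml
processors:
  filter/tracetest-strict:
    error_mode: ignore
    traces:
      span:
        # The filter drops a span when ANY condition in this list is true.
        # Drop spans that are not part of a Tracetest run.
        - 'trace_state["tracetest"] != "true"'
        # Hypothetical: also drop health-check spans from one service,
        # combining two checks with "and" inside a single condition.
        - 'resource.attributes["service.name"] == "api" and attributes["http.route"] == "/health"'
```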

Why is this New Filtering Capability in the OTel Collector Important?

There are three main arguments for migrating from `tail_sampling` to `filter`:

- It’s available in the `core` distribution: Most vendor-specific collectors are based on this distribution. So if you use an up-to-date vendor-specific collector, you probably have access to the `filter` processor, but not to the `tail_sampling` processor.

- It’s easier to write: If you have ever written a tail sampling policy, you know the drill. It’s not hard, but the syntax is more cumbersome than OTTL. The `filter` processor makes it easier to understand what is being filtered out of your pipeline.

- It performs better: Tail sampling requires the collector to keep each trace in memory until it decides whether to sample it. Depending on your configuration and the number of spans your system generates, this processor can be very heavy on memory, which means a higher cost of running your collector. The `filter` processor, by contrast, evaluates each span as it passes through.

Conclusion

The new capabilities of the `filter` processor are very useful, and filters built with it are easy to write and maintain. If you tried it before and found it lacking, it’s worth revisiting it and checking out the new filtering capabilities.

After we saw these new capabilities, we dropped our tail sampling recommendation and now recommend the `filter` processor instead.

I want to send big kudos to the OpenTelemetry Collector team for this change!

Want to enable your trace data to power trace-based tests, allowing you to build powerful E2E tests in minutes rather than days? Give [tracetest.io](https://tracetest.io) a try!