
MDC values for event.type and event.category are not properly serialized as JSON arrays #301

Open
thomastrinn opened this issue Jan 8, 2025 · 3 comments
Labels: agent-java, community, enhancement

Comments

@thomastrinn

Description

When using logback-ecs-encoder with SLF4J's MDC to set array-type fields (like event.type and event.category), the values are serialized as string literals instead of proper JSON arrays. While MDC only supports String values by design, the ECS encoder could detect and properly format string values that represent arrays for fields that are defined as arrays in the ECS specification.

Current Behavior

When setting an array value in MDC (which only accepts strings):

MDC.put("event.type", Arrays.asList("connection", "allowed").toString());

The current output in logs:

{
  "@timestamp": "2025-01-08T13:00:53.318Z",
  "event.type": "[connection, allowed]",
  // other fields...
}

Expected Behavior

The log output should contain a proper JSON array according to the ECS specification:

{
  "@timestamp": "2025-01-08T13:00:53.318Z",
  "event.type": ["connection", "allowed"],
  // other fields...
}

Technical Details

The issue is in EcsJsonSerializer.serializeMDC() where all MDC values are treated as string literals:

builder.append("\":\"");
JsonUtils.quoteAsString(toNullSafeString(String.valueOf(entry.getValue())), builder);
builder.append("\",");

While we understand that MDC only supports string values, the ECS encoder could detect and properly format these string values for fields that are defined as arrays in the ECS specification.

Impact

This limitation affects any field that should be an array according to the ECS specification, particularly:

  • event.type
  • event.category
  • tags
  • labels

This makes it difficult to use the library with standard Java collections for fields that should be arrays according to the ECS specification.

Suggested Solution

The serializer could:

  • Check if the field name matches known array fields from ECS specification
  • Check if the string value represents a list (e.g., starts with '[' and ends with ']')
  • Parse and format such values as proper JSON arrays

Example implementation approach:

private static final Set<String> ARRAY_FIELDS = Set.of(
        "event.type",
        "event.category",
        "tags",
        "labels"
);

public static void serializeMDC(StringBuilder builder, Map<String, ?> properties) {
    if (properties != null && !properties.isEmpty()) {
        for (Map.Entry<String, ?> entry : properties.entrySet()) {
            builder.append('\"');
            String key = entry.getKey();
            JsonUtils.quoteAsString(key, builder);

            String value = toNullSafeString(String.valueOf(entry.getValue()));
            // Only treat the value as an array for known ECS array fields whose
            // string form looks like a list, e.g. "[connection, allowed]"
            if (ARRAY_FIELDS.contains(key) && value.startsWith("[") && value.endsWith("]")) {
                List<String> items = Arrays.stream(
                                value.substring(1, value.length() - 1).split(","))
                        .map(String::trim)
                        .collect(Collectors.toList());

                builder.append("\":");
                builder.append(formatAsJsonArray(items));
                builder.append(',');
            } else {
                // Fall back to the current behavior: serialize as a string literal
                builder.append("\":\"");
                JsonUtils.quoteAsString(value, builder);
                builder.append("\",");
            }
        }
    }
}
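
For illustration, formatAsJsonArray above is assumed to be a small helper along these lines (it is not part of the existing EcsJsonSerializer), reusing JsonUtils.quoteAsString for escaping:

private static String formatAsJsonArray(List<String> items) {
    // Renders ["a","b",...], escaping each item with the existing JsonUtils helper.
    StringBuilder array = new StringBuilder("[");
    for (int i = 0; i < items.size(); i++) {
        if (i > 0) {
            array.append(',');
        }
        array.append('"');
        JsonUtils.quoteAsString(items.get(i), array);
        array.append('"');
    }
    return array.append(']').toString();
}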

Environment

  • logback-ecs-encoder version: 1.6.0
  • slf4j-api version: 2.0.9
  • Java version: 11
  • Logback version: 1.4.12
@github-actions github-actions bot added the agent-java, community, and triage labels on Jan 8, 2025
@SylvainJuge
Member

Hi @thomastrinn , thanks for opening this suggestion.

I think there are a few challenges here:

  • How the collection is serialized as a string depends on its toString implementation, which is out of our control here, so we have strictly no guarantee that all collections are serialized using this [x, y, ...] syntax, or that someone wouldn't use a literal [foo] value that would then unexpectedly be deserialized as an array.
  • What about corner cases where one of the values contains a ',' character? I think there are plenty of such corner cases we would have to deal with here.

As an alternative, I would suggest keeping your serialization logic as-is, with calls to toString in your application, and then splitting the value into an array using an ingest (or processing) pipeline before it is stored in Elasticsearch.
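
For reference, a minimal sketch of such a pipeline, assuming the bracketed toString output shown above; the pipeline name and the exact field path handling are illustrative, and it relies on the standard gsub and split ingest processors:

PUT _ingest/pipeline/mdc-array-fields
{
  "description": "Strip the surrounding brackets produced by List.toString() and split the value into an array",
  "processors": [
    {
      "gsub": {
        "field": "event.type",
        "pattern": "^\\[|\\]$",
        "replacement": "",
        "ignore_missing": true
      }
    },
    {
      "split": {
        "field": "event.type",
        "separator": ",\\s*",
        "ignore_missing": true
      }
    }
  ]
}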

@thomastrinn
Author

Hi @SylvainJuge, thank you for your response.

I completely understand your point of view! Because I’m using SLF4J, and its MDC accepts only string values, my initial thought was simply to produce the desired string representation — like "[item-1, item-2, item-3]" — right at that point. However, as I see from your implementation (Map<String, ?> properties), the value is not necessarily a String; it could be anything. That insight clarifies the toString() issue, especially since you need to support multiple logging frameworks, not just SLF4J.

For fields like event.type and event.category, commas wouldn’t typically cause a problem, because those sets are limited and defined by you. But for fields such as tags and labels, they can indeed contain arbitrary values, which complicates things. My approach might have been a bit naive, but hopefully it helped spark some ideas.

From my perspective, it’s important to generate ECS-compatible logs directly in the application, but there’s no built-in solution for handling arrays. Because of that, I’d have to apply extra processing steps for event.type and event.category, whereas my goal is to have the application itself produce ECS-formatted logs from the start.

That’s why I decided to explore alternative options, and I ended up choosing the logstash-logback-encoder library. It’s flexible, and with the following logback.xml snippet, I was able to generate ECS-formatted logs:

<!-- logback.xml configuration -->
<configuration>
    <property name="SERVICE_NAME" value="${logging.structured.service.name:-application}" />
    <property name="SERVICE_VERSION" value="${logging.structured.service.version:-1.0.0}" />
    <property name="SERVICE_ENVIRONMENT" value="${logging.structured.service.environment:-development}" />
    <property name="SERVICE_NODE_NAME" value="${logging.structured.service.node.name:-${SERVICE_NAME}}" />

    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="net.logstash.logback.encoder.LogstashEncoder">
            <timestampPattern>yyyy-MM-dd'T'HH:mm:ss.SSS'Z'</timestampPattern>
            <timeZone>UTC</timeZone>

            <fieldNames>
                <timestamp>@timestamp</timestamp>
                <level>log.level</level>
                <logger>log.logger</logger>
                <thread>process.thread.name</thread>
                <message>message</message>
                <!-- Disable default logstash fields -->
                <levelValue>[ignore]</levelValue>
                <version>[ignore]</version>
                <tags>[ignore]</tags>
            </fieldNames>

            <customFields>
                {
                "ecs.version": "1.2.0",
                "service.name": "${SERVICE_NAME}",
                "service.version": "${SERVICE_VERSION}",
                "service.environment": "${SERVICE_ENVIRONMENT}",
                "service.node.name": "${SERVICE_NODE_NAME}",
                "event.dataset": "${SERVICE_NAME}"
                }
            </customFields>

            <includeMdc>true</includeMdc>
        </encoder>
    </appender>

    <root level="INFO">
        <appender-ref ref="CONSOLE" />
    </root>
</configuration>

And here’s a straightforward example in code:

package com.example;

import java.util.List;

import net.logstash.logback.marker.Markers;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import org.slf4j.Marker;

public class MyClass {

    private static final Logger logger = LoggerFactory.getLogger(MyClass.class);

    public void someMethod() {
        try {
            MDC.put("event.kind", "event");
    
            Marker marker = Markers.aggregate(
                Markers.append("event.category", List.of("library")),
                Markers.append("event.type", List.of("info"))
            );
    
            logger.info(marker, "some event to log");
            
        } finally {
            MDC.remove("event.kind");
        }
    }
}

Hopefully this example will be useful for anyone else who needs to generate valid ECS logs.

Thank you for considering my suggestion—I hope it might be revisited in the future so we can handle arrays out-of-the-box without additional processing.

@SylvainJuge
Member

Thanks for your exhaustive response. I am happy to learn that you've managed to find an alternative implementation; it will definitely be useful to anyone trying to do the same in their own application.

Also, this issue can be used to gather feedback from anyone facing a similar challenge, which helps prioritize properly (if you are reading this and are interested, a +1 or a comment would help).

@SylvainJuge SylvainJuge added the enhancement label and removed the triage label on Jan 23, 2025