Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DEVEX-222] Add built-in auto-serialization #329

Merged
merged 49 commits into from
Feb 27, 2025

Conversation

oskardudycz
Copy link
Collaborator

@oskardudycz oskardudycz commented Jan 23, 2025

Motivation

KurrentDB users should be able to quickly feel efficient in using basic operations like appending new messages, reading them or subscribing to notifications about them. The current API is not fulfilling it as it doesn't provide safe defaults for the most common scenarios like:

  • building the state from events,
  • appending new events as an outcome of the business logic,
  • subscribing to stream etc.

Currently, they need to consider immediately in which order to read the stream, what the deadline is, and what max count to provide. And most importantly, how to serialize data and which serializer to use, as there's none. All of those customisations are needed in the end but shouldn't be thrown immediately on users if they just want to use the default recommended way. In most development environments, you can find the default, mature choices for serialization.

Such code may not be complex; once you get it right, you don't need to change it too often. But if it's stacked with multiple other things you need to keep in mind, then it's easy to miss something. Most importantly, we shouldn't require our users to build their own wrappers around KurrentDB, but we should provide an accessible API to benefit users from using KurrentDB from the get-go.

Solution

The code adds automatic serialization of message payloads. It uses JSON as a default but allows users to implement the binary format themselves. Thanks to that, user can:

  • append plain messages ,
  • use a simplified Message wrapper that takes data, and optionally metadata and message-id, this hides the unwanted complexity of EventData,
  • read messages deserialised automatically. Thanks to that, the user doesn't need to think about which resolved event is correct (event, original event, they can just use Message or DeserializedData fields).
  • subscribe to messages deserialised automatically.

To introduce those capabilities, it was necessary to introduce new methods not to make breaking changes. Having that, I benefited from it and reshaped their APIs.

Now, they just take mandatory data as parameters (e.g. stream name when appending or reading from stream) and pack the rest of the optional parameters into the options parameter. It also wraps safe defaults. That's aligned with how Java and Node.js clients are doing.

Thanks to that, users can use such simple APIs as:

var streamName = $"users-{Guid.NewGuid()}";

// Appending
var messages = [new UserRegistered(email, name)];
client.AppendToStreamAsync(streamName, messages);

// Appending with options
var messages = [new UserRegistered(email, name)];
await client.AppendToStreamAsync(
    streamName
    messages, 
    new AppendToStreamOptions{ ExpectedStreamState = StreamState.NoStream }
);

// Reading
List<UserRegistered> readMessages = 
    client.ReadStreamAsync(streamName)
    .Select(e => e.DeserializedData).ToListAsync();

// Reading with options
List<UserRegistered> readMessages = 
    client.ReadStreamAsync(streamName, new ReadStreamOptions { MaxCount = 3 })
    .Select(e => e.DeserializedData).ToListAsync();

// Subscribing
await using var subscription = client.SubscribeToStream(streamName);

// Subscribing with options
await using var subscription = client
    .SubscribeToStream(streamName, new SubscribeToStreamOptions{ Start = FromStream.End})

Next steps

After getting approval for changes in the .NET client, that should be applied accordingly to other clients (starting from Java and Node.js).

This change is foundational for enabling scenarios like getting state from events, running business logic, and appending new events as an outcome.

Details

0. No intentional breaking changes were made.

1. This PR introduces the default System.Text.Json serializer and uses it both for JSON and Binary serialization. Users can change the default serialization and also override it through each operation option.

2. Automatic serialization is only enabled for new methods. Thanks to that, people won't get additional overhead upon migration and will be motivated to migrate to the new methods. I didn't make old methods obsolete, but I'd suggest marking them as [Obsolete("Use the new method x. This method may be removed in future releases", false)]. Such registration won't be marked as a warning but as information, so we won't have friction and confusion. This would also give a hint for existing users that they can use new methods.

3. A basic client-side schema registry was introduced. Currently, it's a simple implementation that is responsible for finding the proper serializer based on the content type and mapping between the message type name and CLR type. Default convention is: {categoryName}-{CLR Type Full Name}.

The CLR name will be of course, specific to .NET, but if customers are also using other clients (e.g. Java, Node.js) then they can override the convention by injecting a custom implementation of the IMessageTypeNamingStrategy interface into settings. They can also define different, more universal conventions that are not based on C# or Java types, but then they'll also need to register those types up front.

Users can register message type mapping upfront or benefit from the automated registration. The automated registration will try to load type from assemblies available in AppDomain. It won't load assemblies. No assembly loading code is provided out of the box or registration by attribute. This can be added to the follow-up PR if needed. We could also use Assembly Qualified Name; then, we'd get assembly loading out of the box.

The Mappings are cached in the type registry to improve performance.

The automatic registration was added to allow automatic use of KurrentDB and diverge convention when needed.

4. General serialization settings can be set through client settings, which include:

  • Changing the default content type to octet stream,
  • Changing the default System.Text.Json options. That should help to adjust to settings that users are currently using (if they're different from the recommended ones),
  • Overriding serializer implementations. User can implement the new ISerializer interface for a specific content type. For now, I haven't provided a serializer for Protobuf and Avro, as the PR is already big. This can (and should) be added later.
  • Changing type name mapping convention,
  • Register type mappings manually. This can help register types manually upfront or those that are not following the default conventions.
  • Register CLR message types for the stream category. They will be mapped using the default (or registered) naming conventions.
  • Registering the custom metadata type. For now default serializer either takes TraceMetadata or same metadata type for all messages. Serialization will take data as it is. The more nuanced default handling can be provided by the user in the custom IMessageTypeNamingStrategy implementation. We can also expand it in the future.

All of that can also be customized for each operation.

5. The central piece of serialization is MessageSerializer class that orchestrates the serialization code. It does that by:

  • resolving message type using the registered in schema registry mapping strategy,
  • taking the serializer from the schema registry based on the intended content type,
  • (de)serializing message data using it,
  • (de)serializing message metadata as JSON

6. Old methods were made of wrappers for new ones. Thanks to that, the old test suite also covers new methods, and there was no need to repeat it. It'll also be easier to drop them in the future. The integration tests were added, covering additional scenarios.

TODO:

  • ensuring that the minimum set of types are expected, and the rest are made internally,
  • detailed unit tests after getting an agreement on the design.

…ate method

This should enable easier customisation like changing serializer settings.
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch 3 times, most recently from 748955e to 64d583d Compare January 23, 2025 13:46
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch 2 times, most recently from 59666ee to d12c429 Compare January 23, 2025 15:15
…erialization

It's not fully ready, as it has hardcoded schema serializer. Main question will be how much from schema registry I need to move here.
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch from d12c429 to 001f1a0 Compare January 23, 2025 15:24
…rom ResolvedEvent

Per @RagingKore suggestion it'll be better not to keep the reference to serializer in ResolvedEvent to keep the clear intention behind it.

Now, the serializer is resolved from schema registry based on the auto-serialization settings.

Made possible to register more than one serializer per content type. Currently it doesn't support more schema definition types. To be discussed how to include them later on.
That will make resolution easier and give users ability to inject their own serializers implementation
…rialization Context

Previously we had DeserializationContext, but after consideration, it'll be better to merge those concepts to make easier customizations per operations (e.g. append, subscribe, etc.).
Refactored the code accordingly. It takes the full object instead of the limited number of parameters, as you may be using metadata to get parameters about clrtype.
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch from 6677a46 to 64c79e7 Compare January 29, 2025 10:00
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch from 4e88a61 to 786caf3 Compare January 29, 2025 11:04
…pper to make clearer responsibilities

Actually, it's just wrapping message serializer based on the client settings.
Removed also obsolete SerializationSettings
… IMessageSerializer

There's no need to force all code to know about wrapper existence, it's okay to just create it during the setup.
…ied syntax

Now they don't require all the fancy, but redundant in most cases Stream Positions, Directions, EventData etc.
…ad of parameters

Thanks to that safe defaults are already provided
@oskardudycz oskardudycz added kind/enhancement .NET Pull requests that update .net code labels Feb 13, 2025
@oskardudycz oskardudycz marked this pull request as ready for review February 13, 2025 16:52
readonly JsonSerializerOptions _options = options?.Options ?? SystemTextJsonSerializationSettings.DefaultJsonSerializerOptions;

public ReadOnlyMemory<byte> Serialize(object value) {
return Encoding.UTF8.GetBytes(JsonSerializer.Serialize(value, _options));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use SerializeToUtf8Bytes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e8385b3

public interface IMessageSerializer {
public EventData Serialize(Message value, MessageSerializationContext context);

#if NET48
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pragma is not needed if the project is compiled with latest SDK

);
}

#if NET48
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pragma is not needed if the project is compiled with latest SDK

.TryResolveClrMetadataType(record.EventType, out clrMetadataType);
}

public class NulloMessageSerializer : IMessageSerializer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's Nullo?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automatic serialization is only enabled for new methods. Thanks to that, people won't get additional overhead upon migration and will be motivated to migrate to the new methods.

To have that I need to pass IMessageSerializer to be able to (de)serialize data, but only for new methods. Having that, I could either make the serializer nullable or provide such nullo implementation that does not serialize data, just returns null. I took such a path, as then when we get rid at some point of the old methods, then we can just remove them without changing all code paths.

Also, someone may want to disable deserialization for specific methods, e.g. if they want to just restream bytes.

See how the serializer is created based on the operations settings:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, nonetheless Nullo is not an ideal name. I suggest Null or Nullable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to NullMessageSerializer in f0a86fc

throw new InvalidOperationException("Cannot serialize, automatic deserialization is disabled");
}

#if NET48
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pragma is not needed if the project is compiled with latest SDK

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have the latest version, and its required indeed.
Screenshot 2025-02-21 at 11 25 56

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


public class PersistentSubscriptionListener {
#if NET48
/// <summary>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs comment should not be inside pragma

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e8385b3

try {
await foreach (var message in _channel.Reader.ReadAllAsync(_cts.Token)) {
if (message is PersistentSubscriptionMessage.SubscriptionConfirmation(var subscriptionId
))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge this line to previous one?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e8385b3

_ => null
}
);
public Task Nack(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are formatting changes, can we avoid those? It seems that the formatting is not being applied correctly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I tried to avoid them where possible, but when I applied automated formatting to the files, I changed them, and then those appeared. So, it seems that those files did not match the formatting rules. Should I revert those changes?

/// <param name="cancellationToken">The optional <see cref="System.Threading.CancellationToken"/>.</param>
/// <returns></returns>
public Task<IWriteResult> AppendToStreamAsync(
string streamName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prepping for multi-stream transactions, can we already add a structure that represents an append to a stream? Where stream name, expected version for the whole operation, and new events are combined?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to have that as a follow-up PR, as I'd need to learn more about the multi-stream transactions. My gut feeling tells me that it can be a separate method or overload. Is it fine to move it to a separate discussion? I think that it'll still take some time for the rebranded client to be released.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed


var eventsData = _messageSerializer.Serialize(messages, serializationContext);

return options.ExpectedStreamRevision.HasValue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the expected version belongs to Options tbh.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It already belongs to options in all the other clients.

The intention is to keep optional parameters with safe defaults there and not require to provide them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets keep it in the options, but provide an extension so we dont have to create options just to pass the ExpectedStreamRevision.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added such extension methods in 95f8546

@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch 3 times, most recently from 0cdb254 to 869e503 Compare February 25, 2025 13:11
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch 7 times, most recently from 8569d78 to 32461a5 Compare February 26, 2025 08:30
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch from 32461a5 to e938afe Compare February 26, 2025 08:33
Used also two external assemlies
- one that's loaded,
- one that's never loaded.

To double-check behaviour of loading types from different assemblies.
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch from a490f31 to 1af1def Compare February 26, 2025 11:27
@w1am w1am merged commit 44607ea into DEVEX-185-Rebranding Feb 27, 2025
45 checks passed
w1am added a commit that referenced this pull request Feb 27, 2025
w1am added a commit that referenced this pull request Feb 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants