-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added serde customization support in Fast-Avro #520
Added serde customization support in Fast-Avro #520
Conversation
df7e7fc
to
856fc4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't finish reviewing, but LGTM so far. I'll finish later.
// Test with an empty map | ||
GenericRecord recordWithEmptyMap = new GenericData.Record(recordSchema); | ||
recordWithEmptyMap.put("testInt", new Integer(200)); | ||
recordWithEmptyMap.put("testMap", Collections.emptyMap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me think of something... using the builder introduced in this PR, we could take advantage of datasets with many empty maps to make them take up less space on heap by leveraging the singleton empty map, e.g.:
DatumReaderCustomization customization = new DatumReaderCustomization.Builder()
.setNewMapOverrideFunc( (reuse, size) -> {
if (reuse instanceof ConcurrentHashMap) {
((ConcurrentHashMap)reuse).clear();
return reuse;
} else if (size == 0) {
return Collections.emptyMap();
} else {
return new ConcurrentHashMap<>(size);
}
})
.build();
This could be done built-into fast-avro itself, but it's probably not a good default, since it means the returned map is immutable, which would be a breaking change in some cases. But it's a good use case for customization...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, we can consider doing this in OSS Venice.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #520 +/- ##
============================================
+ Coverage 45.77% 45.78% +0.01%
- Complexity 4440 4444 +4
============================================
Files 403 403
Lines 28070 28070
Branches 4622 4622
============================================
+ Hits 12850 12853 +3
Misses 13664 13664
+ Partials 1556 1553 -3 ☔ View full report in Codecov by Sentry. |
Thanks for adding back the generated classes @gaojieliu, but can we add them before the code change, so we can see the diff? i.e. these steps as separate commits:
On a separate note, @radai-rosenblatt, how strongly do you feel about not checking in the generated code? It makes it quite tedious, needing to do these git acrobatics in order to see the effect of the changes to the meta code... |
290b372
to
6f03f25
Compare
It will be great to checkin one version of generated classes, otherwise the PR effort will be too high.. |
ecb02fb
to
0f1e01e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for restructuring the PR's commits. I left one more comment.
...eserialization/AVRO_1_11/Array_of_TestRecord_SpecificDeserializer_1088113243_1088113243.java
Outdated
Show resolved
Hide resolved
We have the following requirements: For serialization, we would like to validate whether all the map fields are using the desired map type. For deserialization, we would like to deserialize the map type into a special map impelementation for later use. These customized requirements are not supported in the past because of the following reasons: 1. Fast classes generated are shared, so it is possible different users of the same schema may have different requirement. 2. For the same process, for different schema, the requirements can be different too. 3. No way to specify customized logic/data type when generating fast classes. This PR adds a new functionality to specify customized logic and it is expandable and backward compatible. DatumReaderCustomization : customization for read DatumWriterCustomization : customization for write Currently, these classes only support the requirements mentioned at the beginning. How it works internally? 1. Each Fast DatumReader/DatumWriter constructor will take a new param for customization. 2. Each Fast DatumReader/DatumWriter will keep a local vanilla-Avro based implementation with customization support since the shared vanilla-Avro based implementation is still the default implementation. 3. Each generated Fast class will have a new param for customization in serialize/deserialize APIs. 4. Fast DatumReader/DatumWriter will call this new API with customization param of Fast classes. 5. The read/write API in Fast DatumReader/DatumWriter doesn't change, so it is backward compatible.
0f1e01e
to
854aab3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating on this Gaojie. It LGTM overall. I have a few more minor comments.
Also, I think it would be good to make the float list customization part of this new framework (but it doesn't have to be part of this PR).
...fastserde/src/main/java/com/linkedin/avro/fastserde/customized/DatumReaderCustomization.java
Outdated
Show resolved
Hide resolved
...serde/avro-fastserde/src/main/java/org/apache/avro/generic/CustomizedGenericDatumReader.java
Outdated
Show resolved
Hide resolved
...serde/avro-fastserde/src/main/java/org/apache/avro/generic/CustomizedGenericDatumWriter.java
Outdated
Show resolved
Hide resolved
...fastserde/src/main/java/com/linkedin/avro/fastserde/customized/DatumWriterCustomization.java
Show resolved
Hide resolved
854aab3
to
322fba8
Compare
Hi @gaojieliu
Presence of the new class ( We use
but it looks like |
Thanks for reporting... that is weird, since I don't see changes to the |
OK, it's a bit more complex scenario :)
where Now, in version
which ignores invocation of Basically I did an experiment and injecting
to |
Ah, I see, that's very interesting... we should add a test for that (: |
Hi @FelixGV @gaojieliu
We also provided bugfix for another case: |
Hi, |
We have the following requirements:
For serialization, we would like to validate whether all the map fields are using the desired map type.
For deserialization, we would like to deserialize the map type into a special map impelementation for later use.
These customized requirements are not supported in the past because of the following reasons:
This PR adds a new functionality to specify customized logic and it is expandable and backward compatible.
DatumReaderCustomization : customization for read
DatumWriterCustomization : customization for write
Currently, these classes only support the requirements mentioned at the beginning.
How it works internally?