Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Pass CommitProperties object custom metadata in deltalake #3914

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tkauf15k
Copy link

@tkauf15k tkauf15k commented Mar 5, 2025

Since deltalake>=0.20.0, the api for providing custom metadata has changed.

Before, custom metadata could be provided as dict; now it has to be encapsulated by a CommitProperties object.
See here.

Other commit properties not relevant atm.

There are no version constraints used for deltalake at the moment, so i kept it untouched.
Might be the time to add >=0.20.0.

@github-actions github-actions bot added the fix label Mar 5, 2025
@tkauf15k tkauf15k marked this pull request as ready for review March 5, 2025 17:27
@tkauf15k tkauf15k force-pushed the fix/delta-commit-prop branch from e9321ff to 32e86e8 Compare March 5, 2025 18:13
Copy link

codecov bot commented Mar 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.50%. Comparing base (aaa677e) to head (b1f0435).
Report is 18 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3914      +/-   ##
==========================================
+ Coverage   77.54%   78.50%   +0.95%     
==========================================
  Files         767      773       +6     
  Lines       99440    98465     -975     
==========================================
+ Hits        77113    77300     +187     
+ Misses      22327    21165    -1162     
Files with missing lines Coverage Δ
daft/dataframe/dataframe.py 86.01% <100.00%> (+0.51%) ⬆️

... and 49 files with indirect coverage changes

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kevinzwang kevinzwang self-requested a review March 5, 2025 19:04
@kevinzwang
Copy link
Member

Hi @tkauf15k, thank you for contributing this! I'll give it a review soon

@ccmao1130 ccmao1130 added the p1 Important to tackle soon, but preemptable by p0 label Mar 5, 2025
@tkauf15k tkauf15k force-pushed the fix/delta-commit-prop branch 2 times, most recently from 184f3a9 to 9e2a722 Compare March 6, 2025 07:56
Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @tkauf15k, thanks for the PR! The change looks good, but instead of doing a try except, why don't we make the behavior be dependent on the deltalake version? We can do this by conditioning on parse(deltalake.__version__) < parse("0.20.0")

Also, instead of creating the _create_custom_metadata function, could you instead just call table._table.create_write_transaction with different parameters depending on the deltalake version?

@tkauf15k
Copy link
Author

tkauf15k commented Mar 7, 2025

I know the try-except is a bit ugly. I considered the parse approach, but afaik there's no real "safe" parse function in the stdlib. There's one in packaging and also in pkg_resources, but not sure if it's worth adding another dependency, even if it's just in the deltalake extra (and even though packaging is quite lightweight and standard). Also doesn't feel right to parse it ourselves tbh (a regex to maintain etc.).

I added the function only to keep the "compatibility code" close to the imports.

Should we go with the packaging as dependency in the deltalake extra approach?

@kevinzwang
Copy link
Member

Yeah that's a good point. We actually do already assume that packaging is installed for certain deltalake operations (example). We should really just add it as a dependency for the deltalake feature in our pyproject.toml. Would you mind doing that?

@tkauf15k tkauf15k force-pushed the fix/delta-commit-prop branch from 9e2a722 to b1f0435 Compare March 9, 2025 13:42
@tkauf15k
Copy link
Author

tkauf15k commented Mar 9, 2025

Updated it and added few tests. @kevinzwang any idea what's happening in the integration tests?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix p1 Important to tackle soon, but preemptable by p0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants