S3FileSystem.checksum() returns different values if file is moved to a new bucket #941

Open
gdzelinka opened this issue Feb 14, 2025 · 2 comments

Comments

@gdzelinka

The value of S3FileSystem.checksum() changes when a file is moved to a new location even if the file is not modified.

I have a file at s3://path/file
I copy that file to s3://path/a/file
The values of S3FileSystem.checksum('s3://path/file') and S3FileSystem.checksum('s3://path/a/file') are different, when they should be the same.

@martindurant
Member

Although called "checksum", this is in fact the ETag and should be interpreted as a unique identifier of a version. The doc says:

        Unique value for current version of file

        If the checksum is the same from one moment to another, the contents
        are guaranteed to be the same. If the checksum changes, the contents
        *might* have changed.

@martindurant
Member

S3 may also provide content checksums (SHA, MD5, CRC), which it verifies during an upload operation.
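
If the goal is a value that stays the same wherever the file lives, one option is to hash the bytes directly instead of relying on checksum(). The following is only a minimal sketch: content_digest is a hypothetical helper, not part of s3fs, and it assumes fs is any fsspec-style filesystem object exposing open(path, "rb"). It reads the whole object, so it is far more expensive than an ETag lookup.

```python
import hashlib


def content_digest(fs, path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return a hex digest of the bytes stored at `path`.

    Hypothetical helper (not an s3fs API). `fs` is assumed to be an
    fsspec-style filesystem whose open(path, "rb") returns a file-like object.
    """
    digest = hashlib.new(algorithm)
    with fs.open(path, "rb") as f:
        # Read in chunks so large objects are never held fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With an s3fs.S3FileSystem instance named s3, comparing content_digest(s3, 's3://path/file') with content_digest(s3, 's3://path/a/file') should give equal values for an unmodified copy, since the digest depends only on the bytes read and not on the object's location or version.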
