Implement Tiered-Storage #1419
Hi, please let me know how I can help.
Hi, does it mean that we should support different
@Xuanwo yes, that sounds right to me. We will need to allow users to configure the destination details. For example, if "type=ObjectStorage", users should be able to configure the storage location (S3, GCS, etc.), the bucket name, TTLs, and any other headers they want to pass for access control. @spetz @hubcio @numinnex can confirm the details of the "Persister" implementation.
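A minimal sketch of what such destination settings could look like, assuming a Rust config type on the server side. All names here (`Destination`, `ObjectStorageConfig`, `Provider`, `validate`) are hypothetical and not iggy's actual configuration schema. Modeling the destination as an enum means the object-storage fields only exist when "type=ObjectStorage" is selected.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical provider list; OpenDAL itself supports many more services.
#[derive(Debug, Clone, PartialEq)]
pub enum Provider {
    S3,
    Gcs,
}

/// Hypothetical settings used when "type=ObjectStorage" is chosen.
#[derive(Debug, Clone)]
pub struct ObjectStorageConfig {
    pub provider: Provider,
    pub bucket: String,
    /// How long flushed data is kept before it may expire (None = forever).
    pub ttl: Option<Duration>,
    /// Extra headers forwarded with each request, e.g. for access control.
    pub headers: HashMap<String, String>,
}

/// Hypothetical destination for the tiered-storage "Persister".
#[derive(Debug, Clone)]
pub enum Destination {
    LocalDisk { path: String },
    ObjectStorage(ObjectStorageConfig),
}

impl Destination {
    /// Rejects obviously invalid settings before the server starts flushing.
    pub fn validate(&self) -> Result<(), String> {
        match self {
            Destination::LocalDisk { path } if path.is_empty() => {
                Err("local path must not be empty".to_string())
            }
            Destination::ObjectStorage(cfg) if cfg.bucket.is_empty() => {
                Err("bucket name must not be empty".to_string())
            }
            Destination::ObjectStorage(cfg) if cfg.ttl == Some(Duration::ZERO) => {
                Err("ttl must be positive".to_string())
            }
            _ => Ok(()),
        }
    }
}
```

Validation is kept deliberately small here; real access-control headers would depend on the chosen provider.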
@Xuanwo, the idea behind tiered storage is to allow users to store huge amounts of data on much cheaper storage, such as S3 or any other supported backend. Let's consider the following storage kinds, ordered from fastest to slowest writes/reads:
With VSR (Viewstamped Replication) in place, users could even bypass the server's disk entirely, relying on RAM to reach consensus across multiple nodes and on S3 or similar storage as the persistence layer. Of course, it will be up to the user to decide which tiers to use via the appropriate configuration. The mentioned … Currently, the main challenges would be:
These are some of the most important things I can think of now.
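A minimal sketch of user-selected tiers, assuming only the three tiers this comment names (RAM, the server's disk, and S3-like object storage). The `Tier`, `TierPlan`, and `plan` names are hypothetical, as is the rule that the slowest enabled tier must be durable. New writes land in the fastest enabled tier and long-term data ends up in the slowest one, so a RAM + object storage setup (the VSR scenario above) skips the local disk entirely.

```rust
use std::collections::BTreeSet;

/// Hypothetical tiers, declared fastest to slowest so the derived `Ord`
/// ranks them by read/write speed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Memory,
    LocalDisk,
    ObjectStorage,
}

impl Tier {
    /// Assumption: RAM alone does not survive a full cluster restart.
    pub fn is_durable(self) -> bool {
        !matches!(self, Tier::Memory)
    }
}

#[derive(Debug, PartialEq)]
pub struct TierPlan {
    /// Fastest enabled tier: receives new writes.
    pub hot: Tier,
    /// Slowest (typically cheapest) enabled tier: holds long-term data.
    pub cold: Tier,
}

/// Derives a write/persist plan from the tiers the user enabled.
pub fn plan(enabled: &[Tier]) -> Result<TierPlan, String> {
    // The set deduplicates and sorts the tiers by speed.
    let tiers: BTreeSet<Tier> = enabled.iter().copied().collect();
    let hot = *tiers.iter().next().ok_or("no tiers enabled")?;
    let cold = *tiers.iter().next_back().ok_or("no tiers enabled")?;
    if !cold.is_durable() {
        return Err("at least one durable tier is required".to_string());
    }
    Ok(TierPlan { hot, cold })
}
```

For example, `plan(&[Tier::ObjectStorage, Tier::Memory])` yields `Memory` as the hot tier and `ObjectStorage` as the cold tier, while `plan(&[Tier::Memory])` is rejected.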
Hi, @spetz. Thanks so much for sharing your ideas on this! Your input is truly excellent. I'm not an expert in the streaming area or in iggy itself, so I’ll just share some storage-related design thoughts for your inspiration.
My experience is that:
Thank you for the tips, @Xuanwo. The idea of having an index is smart and probably the best way to do it; we'll definitely consider implementing it. We already have indexes for both offsets and timestamps, which work well, but using them just as efficiently on S3 or similar storage might be a different challenge. Speaking of batches, we could even try to write N message batches at once, depending on their total size (e.g., once it approaches the configured limit, such as the mentioned 8-16 MiB range). Ideally, we would like to give users freedom of choice to use any of the available OpenDAL integrations, hidden behind a single abstraction in our server, but we have yet to see whether there are any edge cases that depend on the underlying storage model.
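A minimal sketch of both ideas, assuming batches are grouped greedily into objects up to a size limit and that a sparse index maps each object's base offset to its object key. `MessageBatch`, `group_batches`, and `OffsetIndex` are hypothetical names, not iggy's existing index types. A consumer asking for an offset finds the object whose base offset is the greatest one not above the requested offset, so only that object has to be fetched from S3.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an iggy message batch.
#[derive(Debug, Clone)]
pub struct MessageBatch {
    pub base_offset: u64,
    pub payload: Vec<u8>,
}

/// Greedily groups consecutive batches into objects of at most
/// `max_object_bytes` (e.g. somewhere in the 8-16 MiB range).
/// A single oversized batch still becomes its own object.
pub fn group_batches(batches: &[MessageBatch], max_object_bytes: usize) -> Vec<Vec<&MessageBatch>> {
    let mut objects = Vec::new();
    let mut current: Vec<&MessageBatch> = Vec::new();
    let mut current_size = 0;
    for batch in batches {
        let size = batch.payload.len();
        // Close the current object if adding this batch would exceed the limit.
        if !current.is_empty() && current_size + size > max_object_bytes {
            objects.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current.push(batch);
        current_size += size;
    }
    if !current.is_empty() {
        objects.push(current);
    }
    objects
}

/// Sparse index: base offset of each uploaded object -> object key.
#[derive(Default)]
pub struct OffsetIndex {
    entries: BTreeMap<u64, String>,
}

impl OffsetIndex {
    pub fn insert(&mut self, base_offset: u64, key: String) {
        self.entries.insert(base_offset, key);
    }

    /// Returns the key of the object whose base offset is the greatest one
    /// <= `offset`. Offsets past the last object still map to it, so the
    /// caller must check the object's actual end offset.
    pub fn locate(&self, offset: u64) -> Option<&str> {
        self.entries.range(..=offset).next_back().map(|(_, k)| k.as_str())
    }
}
```

Greedy grouping keeps each upload close to the limit without reordering batches; a smarter policy could also flush on a timer so that low-traffic partitions do not wait indefinitely.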
Iggy should support configurable tiered-storage functionality to flush data to long-term storage such as S3, GCS, or other object storage.
It also needs to support consuming from the long-term storage buckets.
Consider leveraging Apache OpenDAL
https://opendal.apache.org
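A minimal sketch of the "single abstraction" idea, assuming the server talks to long-term storage only through a small trait. `ColdStorage`, `MemoryStorage`, `flush_segment`, and the `stream/topic/partition/offset` key layout are all hypothetical. In a real server, one implementation could wrap OpenDAL's `Operator` so that any OpenDAL-supported service can be plugged in; this sketch is synchronous for brevity, whereas a real implementation would likely be async.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical backend-agnostic interface for flushing and consuming data.
pub trait ColdStorage {
    fn put(&self, key: &str, data: Vec<u8>) -> Result<(), String>;
    fn get(&self, key: &str) -> Result<Vec<u8>, String>;
}

/// In-memory backend, handy for tests.
#[derive(Default)]
pub struct MemoryStorage {
    objects: Mutex<HashMap<String, Vec<u8>>>,
}

impl ColdStorage for MemoryStorage {
    fn put(&self, key: &str, data: Vec<u8>) -> Result<(), String> {
        self.objects
            .lock()
            .map_err(|e| e.to_string())?
            .insert(key.to_string(), data);
        Ok(())
    }

    fn get(&self, key: &str) -> Result<Vec<u8>, String> {
        self.objects
            .lock()
            .map_err(|e| e.to_string())?
            .get(key)
            .cloned()
            .ok_or_else(|| format!("object not found: {}", key))
    }
}

/// Flushes a sealed segment and returns its key. The offset is zero-padded
/// so that listing keys lexicographically also sorts them by offset.
pub fn flush_segment(
    storage: &dyn ColdStorage,
    stream: u32,
    topic: u32,
    partition: u32,
    start_offset: u64,
    data: Vec<u8>,
) -> Result<String, String> {
    let key = format!("{}/{}/{}/{:020}.log", stream, topic, partition, start_offset);
    storage.put(&key, data)?;
    Ok(key)
}
```

Consuming from the bucket is then a `get` on the key returned by the flush (or found via an offset index), independent of which backend sits behind the trait.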