
Implement Tiered-Storage #1419

Open
kparisa opened this issue Jan 4, 2025 · 6 comments

Comments

@kparisa
Contributor

kparisa commented Jan 4, 2025

Iggy should support configurable tiered-storage functionality to flush data to long-term storage such as S3/GCS/object storage.

We also need to support consuming from the long-term storage buckets.

Consider leveraging Apache OpenDAL:
https://opendal.apache.org

@Xuanwo
Member

Xuanwo commented Jan 14, 2025

Hi, please let me know how I can help.

@Xuanwo
Member

Xuanwo commented Jan 20, 2025

Hi, does this mean that we should support Persister implementations other than FilePersister?

https://github.com/iggy-rs/iggy/blob/0dc44618aaa8cc6d0d918065f13d24b05eaba013/server/src/streaming/persistence/persister.rs#L37-L71
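For context, a minimal sketch of what an OpenDAL-backed persister could look like, assuming a trait shaped roughly like the linked one (path-based writes/deletes); the trait shape, names, and error type below are illustrative, not the actual server API:

```rust
// A rough sketch only: this trait mirrors the general shape of the linked
// Persister (path-based writes/deletes), not its exact signatures.
use async_trait::async_trait;
use opendal::{services, Operator};

#[async_trait]
pub trait Persister: Send + Sync {
    async fn overwrite(&self, path: &str, bytes: &[u8]) -> anyhow::Result<()>;
    async fn delete(&self, path: &str) -> anyhow::Result<()>;
}

pub struct ObjectStoragePersister {
    op: Operator, // any OpenDAL service: S3, GCS, azblob, fs, ...
}

impl ObjectStoragePersister {
    pub fn new_s3(bucket: &str, region: &str) -> anyhow::Result<Self> {
        // Note: the exact builder API depends on the opendal version in use.
        let builder = services::S3::default().bucket(bucket).region(region);
        Ok(Self { op: Operator::new(builder)?.finish() })
    }
}

#[async_trait]
impl Persister for ObjectStoragePersister {
    async fn overwrite(&self, path: &str, bytes: &[u8]) -> anyhow::Result<()> {
        // Object stores have no efficient append, so whole segments/batches
        // would be written as single objects.
        self.op.write(path, bytes.to_vec()).await?;
        Ok(())
    }

    async fn delete(&self, path: &str) -> anyhow::Result<()> {
        self.op.delete(path).await?;
        Ok(())
    }
}
```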

@kparisa
Contributor Author

kparisa commented Jan 20, 2025

@Xuanwo yes, that sounds right to me. We will need to allow users to configure the destination details. For example, if the "type=ObjectStorage", then allow configs for the storage provider (S3/GCS, etc.), bucket name, TTLs, and any other headers they want to pass for access control and so on.
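Purely as an illustration of the kind of knobs this could expose (none of these field names are agreed upon), such a config section might deserialize into something like:

```rust
// Hypothetical shape of a tiered-storage config section; every field name
// here is illustrative, not an agreed-upon format.
use std::collections::HashMap;

use serde::Deserialize;

#[derive(Debug, Deserialize)]
pub struct TieredStorageConfig {
    pub enabled: bool,
    /// Which OpenDAL service to use, e.g. "s3" or "gcs".
    pub provider: String,
    pub bucket: String,
    pub region: Option<String>,
    pub endpoint: Option<String>,
    /// Move segments older than this to the object-storage tier.
    pub ttl: Option<String>,
    /// Extra headers to pass for access control, signing, etc.
    #[serde(default)]
    pub headers: HashMap<String, String>,
}
```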

@spetz @hubcio @numinnex can confirm the "Persister" implementation details.

@spetz
Contributor

spetz commented Jan 20, 2025

@Xuanwo the idea behind tiered storage is to allow users to store huge amounts of data on much cheaper storage, such as S3 or anything else that is supported. Let's consider the following storage kinds, ordered from fastest to slowest writes/reads:

  • RAM - ideal for caching and extremely fast reads
  • NVMe SSD - very fast writes & reads; could be sufficient for users who either do not produce e.g. TBs of data, or have a cleanup policy in place (such as removing old segments that are no longer needed)
  • S3 or similar storage - can be quite fast too (for typical use cases), and allows storing massive amounts of data at a relatively small cost

With VSR (Viewstamped Replication) in place, users could even ignore the server's disk and simply use RAM to achieve consensus across multiple nodes, with S3 or similar storage as the persistence layer. Of course, it will be up to the user to decide which tiers to use via the appropriate configuration.

The mentioned Persister trait can probably be ignored for now, as we plan to rewrite it and also update our message batch with new metadata (optimized for writes & reads), but this is something we plan to discuss with the community in the near future.

Currently, the main challenges would be:

  • Designing the components that seamlessly write/read to/from the particular persistence tier in the following order: RAM -> Disk -> S3 (depending on whether Disk/S3 or similar is enabled in the first place). For example, let's say that the following offsets are available:

    • 0...9999 on S3
    • 10000...19999 on Disk
    • 20000...20990 in RAM

    The server needs to know where to look for the data depending on the provided offset range (or even combine multiple sources) - this currently works for disk & RAM. A minimal routing sketch follows this list.

  • Moving the data to the next tier (e.g. moving a segment from disk to S3) when the specified conditions are met - this is also partially implemented in the message archiver process, which simply archives & uploads the historical data to S3 before deleting it from the disk

  • Efficiently writing to & reading from S3 or similar storage - having billions of files representing the messages can be quite costly. Most likely we would only write the message batches as a whole, but this is yet to be discussed.

  • Ensuring that the integration also works with runtimes other than Tokio. We are considering monoio for its io_uring support (we already have an experimental branch for it), and it would be great to have async support there too. Another possible option could be an external plugin, or even a separate process (communicating with the Iggy server via IPC) taking care of S3 or similar storage writes/reads, but ideally we'd like to avoid this.

  • In case of a major version upgrade with breaking changes requiring migration, we would need a way to update the stored messages. Hopefully this won't happen, but theoretically it's possible at some point in the future.
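To make the first challenge concrete, here is a minimal, illustrative sketch of splitting a requested offset range into per-tier reads, assuming the server tracks the first offset held by each tier (the real partition/segment structures will of course look different):

```rust
// A minimal sketch of routing reads across tiers by offset range; all names
// here are illustrative, not the actual server types.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    ObjectStorage, // e.g. offsets 0...9999
    Disk,          // e.g. offsets 10000...19999
    Ram,           // e.g. offsets 20000...20990
}

struct TierBoundaries {
    disk_start: u64, // object storage holds everything below this offset
    ram_start: u64,  // disk holds everything between disk_start and this offset
}

/// Split a requested [start, end] offset range into per-tier sub-ranges,
/// so a single poll can combine multiple sources.
fn route(range: (u64, u64), b: &TierBoundaries) -> Vec<(Tier, u64, u64)> {
    let (start, end) = range;
    let mut plan = Vec::new();
    if start < b.disk_start {
        plan.push((Tier::ObjectStorage, start, end.min(b.disk_start - 1)));
    }
    if end >= b.disk_start && start < b.ram_start {
        plan.push((Tier::Disk, start.max(b.disk_start), end.min(b.ram_start - 1)));
    }
    if end >= b.ram_start {
        plan.push((Tier::Ram, start.max(b.ram_start), end));
    }
    plan
}

fn main() {
    let b = TierBoundaries { disk_start: 10000, ram_start: 20000 };
    // A read spanning all three tiers, as in the example above.
    assert_eq!(
        route((9500, 20100), &b),
        vec![
            (Tier::ObjectStorage, 9500, 9999),
            (Tier::Disk, 10000, 19999),
            (Tier::Ram, 20000, 20100),
        ]
    );
}
```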

These are some of the most important things I can think of now.

@Xuanwo
Member

Xuanwo commented Jan 21, 2025

Hi, @spetz. Thanks so much for sharing your ideas on this! Your input is truly excellent. I'm not an expert in the streaming area or in iggy itself, so I’ll just share some storage-related design thoughts for your inspiration.

> Efficiently writing to & reading from S3 or similar storage - having billions of files representing the messages can be quite costly. Most likely we would only write the message batches as a whole, but this is yet to be discussed.

My experience is that:

  1. Avoid making LIST calls on object storage services, as they can be time-consuming and challenging to scale. A single LIST call takes approximately 50 ms and returns at most 1,000 objects, which means we can only list about 20,000 objects per second. Instead, maintaining an index file would be a more efficient approach (see the sketch after this list).
  2. Writing in batches is a great idea and is recommended by most object storage services. To maximize bandwidth efficiency, I encourage writing in sizes between 8 MiB and 16 MiB per request.
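To illustrate the first point, here is a rough sketch of the index-file approach, where readers fetch a single small manifest instead of ever issuing a LIST; the key layout and JSON format are purely illustrative:

```rust
// A minimal sketch of the index-file approach: instead of LISTing the bucket,
// readers fetch one manifest object that maps offset ranges to batch objects.
use opendal::Operator;
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Default)]
struct SegmentIndex {
    /// (start_offset, end_offset, object key) per uploaded batch object.
    entries: Vec<(u64, u64, String)>,
}

async fn upload_batch(
    op: &Operator,
    index: &mut SegmentIndex,
    start_offset: u64,
    end_offset: u64,
    batch: Vec<u8>, // ideally 8-16 MiB of coalesced message batches
) -> anyhow::Result<()> {
    let key = format!("partition-0/{start_offset:020}.batch");
    op.write(&key, batch).await?;
    index.entries.push((start_offset, end_offset, key));
    // Rewrite the small manifest; a single GET then replaces any LIST call.
    op.write("partition-0/index.json", serde_json::to_vec(index)?).await?;
    Ok(())
}
```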

@spetz
Contributor

spetz commented Jan 21, 2025

Thank you for the tips @Xuanwo, the idea of having an index is smart, and probably the best way to do it - we'll definitely consider implementing this. We already have indexes for both offsets and timestamps, which work well, but using them in the same efficient way on S3 or similar might be a bit of a different challenge.

Speaking of batches, we might even try to write N message batches at once, depending on their total size (when it's close to the configured limit, like the mentioned 8-16 MiB range).
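A minimal sketch of that coalescing idea, with illustrative names and a hypothetical threshold:

```rust
// Coalesce message batches until the accumulated size reaches a configured
// flush threshold; names and the constant are illustrative only.

const FLUSH_THRESHOLD: usize = 8 * 1024 * 1024; // lower bound of the 8-16 MiB range

struct Coalescer {
    pending: Vec<Vec<u8>>, // serialized message batches
    pending_bytes: usize,
}

impl Coalescer {
    /// Buffer a batch; return one object-sized payload once the threshold is hit.
    fn push(&mut self, batch: Vec<u8>) -> Option<Vec<u8>> {
        self.pending_bytes += batch.len();
        self.pending.push(batch);
        if self.pending_bytes >= FLUSH_THRESHOLD {
            self.pending_bytes = 0;
            // Concatenate the pending batches into a single upload payload.
            Some(self.pending.drain(..).flatten().collect())
        } else {
            None
        }
    }
}
```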

In the ideal scenario, we would like to give users the freedom to use any of the available OpenDAL integrations, hidden behind a single abstraction in our server, but we have yet to see whether there are any potential edge cases depending on the underlying storage model.
