Feature request: `write_element` does not materialize computation in in-memory SpatialData object #866
Comments
Thanks for reporting. I also endorse the use case. I wonder if a more explicit approach, where the user loads the object in-memory completely before calling […]
Loading the object in-memory before calling […]
I think writing pipelines using a SpatialData object backed by disk should ideally support idempotent functions, e.g. `operation(sdata, ..., inplace=True)`, where the user does not have to worry about loading back from the Zarr store or about layers not being materialized.
I see this issue is already mentioned here: #521. I think reloading/materializing the computation could be implemented as a small helper, as sketched below.
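A minimal sketch of such a reload-after-write helper, assuming the public `SpatialData.write_element` and `spatialdata.read_zarr` APIs (the helper name `write_element_inplace` is hypothetical, not part of spatialdata):

```python
import spatialdata as sd


def write_element_inplace(sdata: sd.SpatialData, element_name: str) -> None:
    """Write an element to the backing Zarr store, then replace the lazy
    in-memory element with the version read back from disk.

    Hypothetical helper, not part of the spatialdata API.
    """
    sdata.write_element(element_name)
    # Re-read from the store and swap in the materialized element.
    # Ideally only the written element would be reloaded; here the whole
    # object is read back and a single element is picked out.
    sdata_from_disk = sd.read_zarr(sdata.path)
    sdata[element_name] = sdata_from_disk[element_name]
```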
Thanks for sharing!
I'll expand on the above: we would like to allow the data model to also accept non-lazy objects wherever we have a lazy option. This is tracked, for instance, here: #153. For instance, right now it is not possible to do […]. Do you think the approach you described based on […]
I think adding support for in-memory equivalents of elements would be a nice feature, but I would restrict it to the in-memory equivalents of the elements (dask array → numpy, dask dataframe → pandas, etc.). If a […] In short, I don't think spatialdata should call […]
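For illustration, the lazy → in-memory mapping described above, using plain Dask (this is generic Dask usage, not spatialdata API):

```python
import dask.array as da
import dask.dataframe as dd
import pandas as pd

# dask.array -> numpy.ndarray
lazy_image = da.zeros((3, 64, 64))
in_memory_image = lazy_image.compute()

# dask.dataframe -> pandas.DataFrame
lazy_points = dd.from_pandas(pd.DataFrame({"x": [0.0], "y": [1.0]}), npartitions=1)
in_memory_points = lazy_points.compute()
```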
Ok, thanks for the info. Then I think a good way forward would be to have the […]
Indeed, […]
When using `sdata.write_element()`, the Dask computation remains lazy in the in-memory SpatialData object when the SpatialData object is backed by disk. So each time we do `sdata.write_element(...)`, we need to call `spatialdata.read_zarr(sdata.path)`. See the example below.
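The original snippet is not preserved here; a minimal sketch of the behaviour, with illustrative element and store names:

```python
import dask.array as da
import spatialdata as sd
from spatialdata.models import Image2DModel

# A SpatialData object with a lazy, Dask-backed image, backed by a Zarr store.
image = Image2DModel.parse(da.random.random((3, 64, 64)), dims=("c", "y", "x"))
sdata = sd.SpatialData(images={"img": image})
sdata.write("example.zarr")

# Derive a new lazy element and write it to the backing store.
sdata["img_scaled"] = Image2DModel.parse(sdata["img"].data * 2, dims=("c", "y", "x"))
sdata.write_element("img_scaled")

# The in-memory element still holds the lazy computation graph; to get the
# element backed by the on-disk store, the object must be reloaded.
sdata = sd.read_zarr(sdata.path)
```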
Although this is in line with, e.g., `dask.array.to_zarr` and `dask.array.from_zarr`, it would be a nice feature to have an extra parameter `inplace`, e.g. `sdata.write_element(..., inplace=True)`, which takes care of reloading from the Zarr store. Ideally this would only reload the element that was written to disk, not the complete SpatialData Zarr store, which can be slow when there are a lot of tables or shapes, as these are always loaded in-memory. Also see saeyslab/harpy#90 for additional context.
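For concreteness, the requested behaviour would look roughly like this (`inplace` is the proposed parameter and does not exist in spatialdata today):

```python
# Proposed usage (hypothetical): write and reload in one call.
sdata.write_element("img_scaled", inplace=True)
# Afterwards sdata["img_scaled"] would already be the element read back
# from the Zarr store, with no manual read_zarr round-trip.
```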