Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add sync handler base #73

Merged
merged 6 commits into from
Feb 24, 2025
Merged

[WIP] Add sync handler base #73

merged 6 commits into from
Feb 24, 2025

Conversation

RRosio
Copy link
Collaborator

@RRosio RRosio commented Jan 31, 2025

Frontend server call functions and base for backend sync handler and tests

@RRosio RRosio added the enhancement New feature or request label Jan 31, 2025
@RRosio RRosio self-assigned this Jan 31, 2025
@RRosio RRosio changed the title Add sync handler base [WIP] Add sync handler base Jan 31, 2025
Copy link
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thought on this - I wasn't sure if you wanted this already while WIP.

self.fs_manager = fs_manager

async def get(self):
# remote to local (fetching latest changes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's to be explicitly "remote to local", then fs.get() may be simpler, unless you really use features from rsync (e.g., testing the file sizes of files, deleting ones that disappeared on remote).


try:
# rsync
# (remote_source_path, local_destination_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you don't trigger this yet?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably figure out here is this is going to be async; OR, we can use the wrapper to make everything async (run on a thread) to make sure we don't block.

# ====================================================================================
# Handle Syncing Local and Remote filesystems
# ====================================================================================
class FilesystemSyncHandler(BaseFileSystemHandler):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that the FIlesystemManager should have the core functionality, and that the handler only take care of passing on the appropriate parameters and set the output status.

def initialize(self, fs_manager):
self.fs_manager = fs_manager

async def get(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get implies no change of state. I would call both methods "post", but have them on different endpoints or with different operation parameters as with the other filesystem commands.

async def get(self):
# remote to local (fetching latest changes)
request_data = json.loads(self.request.body.decode("utf-8"))
local_destination_path = request_data.get("local_path")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were we ever thinking of using pydantic? Unwrapping all the keys and validating seems a little tedious.
The endpoint doc must document verbosely what the expected input is like (e.g., this might eventually appear in swagger generated docs).

@@ -556,13 +627,15 @@ def setup_handlers(web_app):
route_fs_file_transfer = url_path_join(
base_url, "jupyter_fsspec", "files", "transfer"
)
route_fs_sync = url_path_join(base_url, "jupyter_fsspec", "sync")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see why we need a separate URL here is we don't generally for other operations.

fs_manager = fs_manager_instance
remote_fs_info = fs_manager.get_filesystem_by_protocol("s3")
remote_key = remote_fs_info["key"]
remote_fs = remote_fs_info["info"]["instance"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird to me, that the call to get_filesystem succeeds without an exception but returns nothing.
Also, didn't we key these things by configured name rather than protocol?

@@ -366,6 +366,52 @@ export class FsspecModel {
}
}

async sync_push(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may wish to eventually provide a feedback mechanism (see e.g., https://filesystem-spec.readthedocs.io/en/latest/features.html#callbacks ).

@RRosio RRosio mentioned this pull request Feb 12, 2025
18 tasks
@RRosio
Copy link
Collaborator Author

RRosio commented Feb 14, 2025

Thank you Martin for your feedback! I have worked on some updates:

  • Basis for sync with get and put (as an upload/download from root)
  • Removed BaseFileSystemHandler => FileSystemManager handlers functionality previously here
  • Using single POST handler function for both mechanisms
  • Using Pydantic for request data validation
  • Using get_filesystem(key) ; Removed get_filesystem_by_protocol()

Still TBD:

@RRosio RRosio marked this pull request as ready for review February 23, 2025 23:20
@RRosio
Copy link
Collaborator Author

RRosio commented Feb 24, 2025

I think that this PR is in a good state to merge in and further iterate on. I will go ahead and merge these changes and continue work as a part of the follow-up tickets that have been linked in my comment above!

@RRosio RRosio merged commit 76b3292 into fsspec:main Feb 24, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants