Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable lazy loading support for DataCatalog 2.0 on Kedro-Viz #2272

Merged
merged 33 commits into from
Feb 28, 2025

Conversation

SajidAlamQB
Copy link
Contributor

@SajidAlamQB SajidAlamQB commented Feb 12, 2025

Description

Related to: #2213

This PR integrates the new lazy-loading feature from Kedro’s updated DataCatalog API into Kedro-Viz.

Specifically, it checks if the project’s catalog is an instance of KedroDataCatalog. This allows Kedro-Viz to show dataset details without requiring full dataset installation. It also prevents unintended data loads if a dataset is configured to be lazy.

Development notes

  • Added conditional imports for KedroDataCatalog with fallbacks
  • Updated repository access methods to handle lazy datasets
  • Made dataset parameters optional (Optional[AbstractDataset])
  • Removed assertions that would fail with lazy datasets
  • Added null-safety checks in all validators and properties
  • Modified to accept both catalog implementations
  • Ensured dataset fetching works consistently for both types
  • Updated type annotations for better API clarity

QA notes

  • Added tests that verify behavior with KedroDataCatalog
  • Added tests for handling null cases in node models

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
@SajidAlamQB SajidAlamQB marked this pull request as draft February 12, 2025 15:26
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
@astrojuanlu astrojuanlu linked an issue Feb 12, 2025 that may be closed by this pull request
1 task
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Copy link

@ElenaKhaustova ElenaKhaustova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the new catalog we can significantly simplify overall logic, which will be the same for default and lite modes:

  1. We still need to resolve dataset factory patterns, but we do it via the catalog.config_resolver.resolve_pattern() method—it returns dataset configuration and does not require dataset initialization.
  2. The above step is the only step needed to display the dataset
  3. Once the user clicks on the dataset, we get it from the catalog, then call preview and process failure if something is uninstalled
  4. That way, we don't really need UnavailableDataset and lite mode becomes default.

I see that the difficulty is in supporting different catalogs simultaneously and fitting different concepts together. Maybe we need to discuss how to apply the above concept to further transition to a new catalog while maintaining backward compatibility.

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
@SajidAlamQB SajidAlamQB marked this pull request as ready for review February 24, 2025 14:18
SajidAlamQB and others added 3 commits February 24, 2025 14:20
Signed-off-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Copy link
Contributor

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to handle kedro viz --lite correctly here. I think we are breaking it. Because when I try to uninstall kedro-datasets and do kedro viz --lite it breaks. This is not supposed to happen. kedro viz --lite should work regardless if dependencies are there or not. If kedro-datasets is missing, it runs and just makes all datasets UnavailableDatasets.

SajidAlamQB and others added 3 commits February 27, 2025 11:12
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
…//github.com/kedro-org/kedro-viz into feat/enable-lazy-loading-of-datasets-feature

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
@SajidAlamQB
Copy link
Contributor Author

SajidAlamQB commented Feb 27, 2025

We need to handle kedro viz --lite correctly here. I think we are breaking it.

kedro viz --lite should be working now, we check if lite is required first and do the patching before getting the catalog.

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Copy link
Contributor

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, now it makes sense.

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions about the forward compatibility of the KedroDataCatalog kedro-org/kedro#4510 (comment) not sure they should block this PR, but I think it would be cool to sort them out ahead of the next releases.

@SajidAlamQB
Copy link
Contributor Author

I have some questions about the forward compatibility of the KedroDataCatalog kedro-org/kedro#4510 (comment) not sure they should block this PR, but I think it would be cool to sort them out ahead of the next releases.

The current implementation should work with Kedro 0.18.x through 1.0.x (assuming KedroDataCatalog is kept as a deprecated alias in 1.0). However, it would break in Kedro 2.0 when KedroDataCatalog is completely removed.

We could add version detection logic to handle the transition to Kedro 1.0+ differently. Would you recommend implementing that now, or waiting until Kedro 1.0 is closer to release?

@rashidakanchwala
Copy link
Contributor

rashidakanchwala commented Feb 28, 2025

Good question, i thought Kedro 1.0 will deprecate old DataCatalog but I just saw the notes and it says it will be deprecated in 2.0. So far the plan was for Kedro-viz to remove the old DataCatalog methods when Kedro releases 1.0.

cc- @ElenaKhaustova , @merelcht

update - Spoke to Merel, this hasn't been decided yet. Anyways Kedro-viz plan was to continue to support old DataCatalog until Kedro 1.0 is released. So I suppose that decision doesn't block this PR

@SajidAlamQB SajidAlamQB requested a review from jitu5 February 28, 2025 11:39
Copy link
Contributor

@ravi-kumar-pilla ravi-kumar-pilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works well !! Thank you @SajidAlamQB

@SajidAlamQB SajidAlamQB merged commit a1d69c2 into main Feb 28, 2025
37 checks passed
@SajidAlamQB SajidAlamQB deleted the feat/enable-lazy-loading-of-datasets-feature branch February 28, 2025 21:23
@Huongg Huongg mentioned this pull request Mar 6, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable Lazy loading of datasets feature from latest Kedro
6 participants