Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs/nodes-grouping #4519

Open
wants to merge 39 commits into
base: main
Choose a base branch
from
Open

docs/nodes-grouping #4519

wants to merge 39 commits into from

Conversation

Huongg
Copy link
Contributor

@Huongg Huongg commented Feb 25, 2025

Description

closes #4441
Documentation to guide users on various ways to group their nodes in Kedro, including pipelines, tags, and namespaces.

Rendered version: Link

Development notes

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
@DimedS
Copy link
Member

DimedS commented Feb 26, 2025

Thanks for the PR, @Huongg - great job! Here are a few initial comments:

  • The table is a bit hard to read, possibly due to short line spacing.
  • The images are too small and not clickable, making it difficult to see details.
  • Some table entries contain too much text. We might consider shortening them by keeping only the key points where necessary.
Screenshot 2025-02-26 at 10 49 01

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general content looks good to me, but I'm not sure the table is the right format. It's quite small and the bullet points aren't formatted nicely inside the columns. Perhaps you can just write the content inside paragraphs with clear headings?

I've also left some comments on the phrasing, I find the tone of the text a bit too marketing style, like we're trying to sell this feature instead of explaining it. I'd suggest changing the tone to make it more informative and explain the why more than the what.

Huong Nguyen added 3 commits February 26, 2025 19:00
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
@Huongg Huongg force-pushed the docs/nodes-grouping branch from 7fae928 to 53dae3c Compare February 27, 2025 14:58
Huong Nguyen added 4 commits February 27, 2025 15:13
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
@Huongg
Copy link
Contributor Author

Huongg commented Feb 27, 2025

hey @DmitrySorokinQB and @merelcht thanks for reviewing this PR! I agree that the table format was quite hard to read in this form. I've updated it to use paragraphs instead, but I've also included a summary table at the end to highlight the key points and make it easier for users.

@Huongg Huongg marked this pull request as ready for review February 27, 2025 19:23
@Huongg Huongg changed the title [DRAFT] docs/nodes-grouping docs/nodes-grouping Feb 27, 2025
Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @Huongg and team! Left a few comments.

More broadly, the "what works in Kedro - what doesn't work in Kedro - how to use" section titles feel a bit repetitive, but the "best used when - not to use when" are useful. Maybe when we're settled on the content we can turn some of these bullet lists into prose?

![Switching between different pipelines in Kedro Viz](../meta/images/kedro_viz_switching_pipeline.gif)

### What doesn't work with Kedro
- If you need to create custom node groupings that are different with your existing pipelines, creating new pipelines for them may not be convenient. Instead, you can use alternative grouping methods such as tags or namespaces.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand this sentence 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think what I'm trying to say is "if you want to group nodes to something different than the current pipeline structure, instead of creating a new pipeline, using tags or namespace can help you with that"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make it clearer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If you want to group nodes differently from the current pipeline structure, instead of creating a new pipeline, you can use tags or namespaces to achieve that." ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, now I understand! However, it's not really a disadvantage right? It's just that users have an alternative in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah thats true, it also doesnt fit in any categories. Maybe there's no need to mention this at all.

@astrojuanlu
Copy link
Member

Huongg and others added 7 commits February 28, 2025 16:04
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Huong Nguyen added 3 commits March 5, 2025 08:50
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
@rashidakanchwala
Copy link
Contributor

I had a first read, and overall, it looks great! Initially, I felt that node grouping could technically be done without Kedro-Viz. That said, visualizing how you group nodes is crucial, and Kedro-Viz plays a key role in that. The Viz screenshots also make it much easier to understand each approach we take.

Thanks, @Huongg and @DimedS!

@astrojuanlu
Copy link
Member

Docs are broken because MLflow just migrated theirs to Docusaurus 👀 mlflow/mlflow#13645

@astrojuanlu
Copy link
Member

Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
@astrojuanlu
Copy link
Member

Unclear where this error comes from, the link looks just okay 🤔 maybe add a final slash?

❯ curl -I "https://towardsdatascience.com/the-importance-of-layered-thinking-in-data-engineering-a09f685edc71"
HTTP/2 301 
date: Wed, 05 Mar 2025 21:25:31 GMT
content-type: text/html; charset=UTF-8
location: https://towardsdatascience.com/the-importance-of-layered-thinking-in-data-engineering-a09f685edc71/

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks a lot @Huongg 🙏🏼 two departing comments:

  • The hierarchy still seems broken for some reason, compare
latest this pr
image image
  • I still think it would be good to turn the bullet points of what was "what works - what doesn't work" into prose. The bullet points of "best used when - not to use when" do look good though, as a kind of summary.

Huong Nguyen added 5 commits March 6, 2025 20:00
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
@Huongg
Copy link
Contributor Author

Huongg commented Mar 6, 2025

hey @astrojuanlu, Thanks for the review and for catching the bug in the content navigation! I've updated the structure to reflect the fix.

Screenshot 2025-03-06 at 20 37 57

I also consolidated all the points, and they look much better now. As for the failed test, it’s passing now—sometimes links break randomly but resolve themselves after a new build.

Huong Nguyen added 2 commits March 6, 2025 20:49
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Copy link
Contributor

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks!

Just one question - in the navigation panel, under 'Deployment', does it make sense to keep 'Node Grouping at the beginning' rather than at the end because the grouping comes early in the workflow.

@Huongg
Copy link
Contributor Author

Huongg commented Mar 7, 2025

under 'Deployment', does it make sense to keep 'Node Grouping at the beginning' rather than at the end because the grouping comes early in the workflow.

hey @rashidakanchwala good point, however I think its good practice to have the navigation bar reflects the structure of the content page. So if we change the order of the navigation bar, we will need to consider to change the structure of the content page too.

cc @iamelijahko and @stephkaiser to get their thoughts on this

Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some small comments on wording. The most important point I think is around the "when not to use" section for namespaces.

**Not to use when**
- The tagged nodes have strong dependencies, which might cause execution failures.
- Tags are not hierarchical, so tracking groups of nodes can become difficult.
- Tags do not enforce structure like pipelines or namespaces.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this last point isn't so much a Not to use when but more a note on tags. Maybe it makes sense to mention this before the best used when/not to use when sections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE 👍

- Your pipeline structure is well-defined, and using namespaces improves visualisation in Kedro-Viz.
- Customising deployment groups by adding namespaces at the node level.

**Not to use when**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like in this section these are more phrased like "downsides" of namespaces rather than Not to use when points. Maybe we can say something like: not to use when namespaces overcomplicate an otherwise simple pipeline structure or not to use when editing the catalog is not wanted. Maybe a bit more elaborate, but in any case I think the point we need to bring across here is that namespaces will add complexity and so it's not advised to use them in simple projects where pipeline grouping is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I've updated the copy to highlight the key point you mentioned. Let me know what you think ? @merelcht

**Not to use when**
- In small and simple projects, using namespaces can introduce unnecessary complexity, making pipeline grouping a more suitable choice.
- Namespaces require additional effort, such as updating catalog names, since namespace prefixes are automatically applied to all the elements unless explicitly overridden in the namespaced pipeline parameters.

I also included the example on how users can overrides the parameters directly in the pipeline function if they dont want to rename in the catalog

You will need to update your catalog accordingly. If you don't want to change the names of your inputs, outputs, or parameters with the namespace_name. prefix while using a namespace, you should list these objects inside the corresponding parameters of the pipeline() creation function. For example:

return pipeline(
    base_pipeline,
    namespace = "new_namespaced_pipeline", # With that namespace, "new_namespaced_pipeline" prefix will be added to inputs, outputs, params, and node names
    inputs={"the_original_input_name"}, # Inputs remain the same, without namespace prefix
)

Huongg and others added 9 commits March 7, 2025 13:10
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Huong Nguyen <32060364+Huongg@users.noreply.github.com>
Signed-off-by: Huong Nguyen <huong.nguyen@mckinsey.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Node grouping guide for deployment
5 participants