Update docs add Airflow KubernetesPodOperator #4529
base: main
Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
Thanks a lot @DimedS, this is great! I left a few comments but haven't tested the code itself yet.
### Running Multiple Nodes in a Single Container
By default, this approach runs each node in an isolated Docker container. However, to reduce computational overhead, you can choose to run multiple nodes together within the same container. If you opt for this, you must modify the DAG accordingly to adjust task dependencies and execution order.
Isn't this probably the preferred approach given the user feedback we got? If so, might be helpful to describe how exactly the user can achieve this, or what customisations are needed
That's true! I tried to expand the description with an example - hope it's clearer now
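For illustration, here is a minimal sketch of what grouping could involve. It assumes a hypothetical four-node pipeline and a hypothetical split into two container tasks; none of these names come from the PR. It derives the dependencies between the grouped tasks from the node-level dependencies and builds the `kedro run --nodes=...` command that each container would execute. Handing that command to `KubernetesPodOperator` (for example through its `cmds` and `arguments` parameters) is left out so the sketch runs without Airflow installed.

```python
# Hypothetical node-level dependencies: node name -> set of upstream node names.
NODE_DEPS = {
    "preprocess_companies": set(),
    "preprocess_shuttles": set(),
    "create_model_input": {"preprocess_companies", "preprocess_shuttles"},
    "train_model": {"create_model_input"},
}

# Hypothetical grouping: one Airflow task (one container) per group of nodes.
GROUPS = {
    "preprocessing": [
        "preprocess_companies",
        "preprocess_shuttles",
        "create_model_input",
    ],
    "training": ["train_model"],
}


def kedro_command(nodes):
    """Build the command a container would run to execute several nodes at once."""
    return ["kedro", "run", f"--nodes={','.join(nodes)}"]


def group_dependencies(node_deps, groups):
    """Lift node-level dependencies to dependencies between groups."""
    node_to_group = {node: group for group, nodes in groups.items() for node in nodes}
    deps = {group: set() for group in groups}
    for node, upstream_nodes in node_deps.items():
        for upstream in upstream_nodes:
            src, dst = node_to_group[upstream], node_to_group[node]
            if src != dst:  # edges inside one container need no Airflow dependency
                deps[dst].add(src)
    return deps


group_deps = group_dependencies(NODE_DEPS, GROUPS)
training_cmd = kedro_command(GROUPS["training"])
```

In a DAG, each key of `GROUPS` would become one task, and each entry in `group_deps` would become an `upstream >> downstream` relationship between those tasks.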
Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
Thanks for the review, @astrojuanlu! I tried to address all the comments and requested a re-review.
LGTM, thanks @DimedS!
If you want to execute your DAG in an isolated environment on Airflow using a Kubernetes cluster, you can use a combination of the [`kedro-airflow`](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-airflow) and [`kedro-docker`](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-docker) plugins.

1. **Package your Kedro project as a Docker container**
[Use the `kedro docker init` and `kedro docker build` commands](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-docker) to containerize your Kedro project.
- [Use the `kedro docker init` and `kedro docker build` commands](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-docker) to containerize your Kedro project.
+ [Use the `kedro docker init` and `kedro docker build` commands](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-docker) to containerise your Kedro project.
@@ -244,6 +244,77 @@ On the next page, set the `Public network (Internet accessible)` option in the `

## How to run a Kedro pipeline on Apache Airflow using a Kubernetes cluster

The `kedro-airflow-k8s` plugin from GetInData | Part of Xebia enables you to run a Kedro pipeline on Airflow with a Kubernetes cluster. The plugin can be used together with `kedro-docker` to prepare a docker image for pipeline execution. At present, the plugin is available for versions of Kedro < 0.18 only.
If you want to execute your DAG in an isolated environment on Airflow using a Kubernetes cluster, you can use a combination of the [`kedro-airflow`](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-airflow) and [`kedro-docker`](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-docker) plugins.
Maybe link to the PyPI packages instead (https://pypi.org/project/kedro-airflow/ and https://pypi.org/project/kedro-docker/)
```python
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

KubernetesPodOperator(
```
This might be trivial, but I don't see where I configure the location of the Kubernetes cluster. For example, if I launch a k3d cluster locally, how do I tell `KubernetesPodOperator` to use it?
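One possible answer, sketched as plain keyword arguments so that it runs without Airflow installed: as far as I know, `KubernetesPodOperator` accepts `in_cluster`, `config_file` and `cluster_context` to locate the cluster (or a `kubernetes_conn_id` that points at an Airflow connection). The helper name, the kubeconfig path and the context name `k3d-k3s-default` below are assumptions for a default local k3d cluster; `kubectl config get-contexts` shows the real context name.

```python
import os


def pod_operator_cluster_kwargs(
    in_cluster,
    kubeconfig="~/.kube/config",  # assumed default kubeconfig location
    context="k3d-k3s-default",  # assumed context name of a default k3d cluster
):
    """Return the cluster-related keyword arguments for KubernetesPodOperator."""
    if in_cluster:
        # Airflow itself runs inside the target cluster and uses its service account.
        return {"in_cluster": True}
    # Airflow runs outside the cluster and reads credentials from a kubeconfig file.
    return {
        "in_cluster": False,
        "config_file": os.path.expanduser(kubeconfig),
        "cluster_context": context,
    }


local_kwargs = pod_operator_cluster_kwargs(in_cluster=False)
# Intended use (not executed here):
# KubernetesPodOperator(task_id=..., image=..., **local_kwargs)
```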
Description
Closes #4499 by providing guidance on executing a Kedro project in an isolated environment on Airflow Kubernetes using `KubernetesPodOperator`, along with the `kedro-airflow` and `kedro-docker` plugins.
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a `Signed-off-by` line in the commit message. See our wiki for guidance.
If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
`RELEASE.md` file