
[DataCatalog2.0]: Update KedroDataCatalog CLI logic and make it reusable #3312

Open · noklam opened this issue Nov 15, 2023 · 27 comments
@noklam (Contributor) commented Nov 15, 2023

Description

Parent issue: #4472

Suggested plan: #3312 (comment)

Context

Background: https://linen-slack.kedro.org/t/16064885/when-i-say-catalog-list-in-a-kedro-jupter-lab-instance-it-do#ad3bb4aa-f6f9-44c6-bb84-b25163bfe85c

With dataset factories, the "definition" of a dataset is not known until the pipeline is run. When a user is working in a Jupyter notebook, they expect to see the full list of datasets with catalog.list.

The current workaround to see the datasets for the __default__ pipeline looks like this:

# In a kedro ipython/Jupyter session, `catalog` and `pipelines` are already injected
for dataset in pipelines["__default__"].data_sets():
    catalog.exists(dataset)  # checking existence forces factory pattern resolution

When using the CLI commands, e.g. kedro catalog list, we do matching to figure out which factory patterns mentioned in the catalog match the datasets used in the pipeline, but when going through the interactive flow no such check has been done yet.

Possible Implementation

We could check dataset existence when the session is created; we need to verify whether that has any unexpected side effects.

This ticket's scope is still open and we don't have a specific implementation in mind. Whoever picks it up can evaluate different approaches, weighing side effects and avoiding coupling with other components.

Possible Alternatives

  • catalog.list(pipeline=<name>) - not a good solution because the catalog wouldn't have access to a pipeline
  • Do something similar to what's happening when kedro catalog list is called.
@noklam noklam added the Issue: Feature Request New feature or improvement to existing feature label Nov 15, 2023
@datajoely (Contributor)

Could we have something like catalog.resolve(pipeline: Optional[str]).list()?
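
A hypothetical usage sketch of that chained API (neither method exists today; the names simply mirror the suggestion above):

# Hypothetical chained API - materialise factory patterns against one pipeline,
# then list everything, including the resolved factory entries:
resolved = catalog.resolve(pipeline="__default__")
print(resolved.list())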

@merelcht (Member)

This Viz issue is related: kedro-org/kedro-viz#1480

@MarcelBeining

> Could we have something like catalog.resolve(pipeline: Optional[str]).list()?

That would be perfect! We would need such a thing.

@noklam (Contributor, Author) commented Feb 16, 2024

@MarcelBeining Can you explain a bit more why you need this? I am thinking about this again because I am trying to build a plugin for Kedro, and this would come in handy to compile a static version of the configuration.

@ankatiyar ankatiyar added this to the Dataset Factory Improvements milestone Mar 4, 2024
@MarcelBeining

@noklam We try to find Kedro datasets for which we have not written a data test, hence we iterate over catalog.list(). However, if we use dataset factories, the datasets captured by a factory are not listed in catalog.list().

@noklam (Contributor, Author) commented Mar 12, 2024

@MarcelBeining Did I understand this question correctly as:

Do kedro catalog resolve or kedro catalog list help you? If not, what is missing?

@MarcelBeining

@noklam "Find which datasets is not written in catalog.yml including dataset factory resolves, yet" , yes

kedro catalog resolve shows what I need, but it is a CLI command and I need it within Python (of course one could use os.system etc, but a simple extension of catalog.list() should not be that hard)

@noklam (Contributor, Author) commented Mar 12, 2024

@MarcelBeining Are you integrating this with some extra functionality? How do you consume this information, if that is OK to share?

@ianwhale (Contributor)

@noklam

Adding on from our discussion on Slack:

kedro catalog resolve does what I'd want.

But I'd also like that information easily consumable in a notebook (for example).

So if my catalog stores models like:

"{experiment}.model":
  type: pickle.PickleDataset
  filepath: data/06_models/{experiment}/model.pickle
  versioned: true

I would want to be able to (somehow) do something like:

models = {}

for model_dataset in [d for d in catalog.list(*~*magic*~*) if ".model" in d]:
    models[model_dataset] = catalog.load(model_dataset)

It's a small thing, but I was kind of surprised not to see my {experiment}.model entries listed at all in catalog.list().

@noklam (Contributor, Author) commented May 24, 2024

Another one, bumped to high priority as discussed in Slack:
https://linen-slack.kedro.org/t/18841749/hi-i-have-a-dataset-factory-specified-and-when-i-do-catalog-#eb609fb2-fce6-434d-a652-ffb62eb41e7b

@noklam (Contributor, Author) commented Jun 3, 2024

What if DataCatalog is iterable?

for dataset in data_catalog:
    ...

@datajoely (Contributor)

I think it's neat @noklam, but I don't know if it's discoverable.

To me DataCatalog.list() feels more powerful in the IDE than list(DataCatalog)...

@astrojuanlu (Member)

why_not_both.gif

def list(self):
    ...

def __iter__(self):
    return iter(self.list())  # list() returns a list, so wrap it in an iterator

@Galileo-Galilei (Member) commented Jun 3, 2024

I've also wanted to be able to iterate through the datasets for a while, but it raises some unanswered questions:

But we always face the same issue: we would need to "resolve" the dataset factory first, relative to a pipeline. It would eventually give: [dataset.name for dataset in catalog.resolve(pipeline)], but is it really a better / more intuitive syntax? I personally find it quite neat, but arguably beginners would prefer a "native" method.

The real advantage of doing so is that we do not need to create a search method with all types of supported searches (by extension, by regex... as suggested in the corresponding issue) because it's easily customizable, so it's less of a maintenance burden in the end.
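
For example, if the catalog iterated over dataset names, ad-hoc filtering would need no dedicated search API (a sketch of hypothetical behaviour, not current Kedro):

import re

# Any filter becomes a plain comprehension over the (hypothetically) iterable catalog:
csv_datasets = [name for name in catalog if name.endswith("#csv")]
experiment_models = [name for name in catalog if re.search(r"\.model$", name)]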

@noklam (Contributor, Author) commented Jun 3, 2024

Catalog.list already supports regex - isn't that identical to what you suggest as catalog.search?

@datajoely (Contributor)

@noklam you can only search by name; namespaces aren't really supported and you can't search by attribute.

@noklam (Contributor, Author) commented Jun 4, 2024

A namespace is just a prefix string, so it works pretty well. I do believe there are benefits to improving it, but I think we should at least add an example for the existing feature, since @Galileo-Galilei told me he was not aware of it and most likely very few users are.

#3924
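
For reference, a short sketch of the existing filtering (using the regex_search parameter of DataCatalog.list):

# catalog.list accepts an optional regex, so prefixes such as namespaces work today:
catalog.list(regex_search=r"^data_processing\.")  # datasets under one namespace
catalog.list(regex_search=r"#csv$")               # datasets whose names end in "#csv"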

@noklam noklam changed the title from "How to improve catalog.list or alternative for dataset factory?" to "Improve catalog.list or alternative for dataset factory?" Oct 23, 2024
@ElenaKhaustova ElenaKhaustova self-assigned this Jan 20, 2025
@ElenaKhaustova (Contributor)

The inconsistency between CLI commands and the interactive workflow is a problem related to all kedro catalog commands. Currently, all CLI logic is implemented within the commands and cannot be reused across the codebase. This is confusing for users, as one cannot perform the same actions in the interactive workflow that are available via commands. It also requires additional maintenance, as any catalog-related changes have to be validated against the CLI logic as well.

Thus, we suggest refactoring the CLI logic and moving it to the session level so that it can be reused for both CLI and interactive workflows. We also suggest doing this together with the deprecation of the old catalog, since the CLI logic currently relies on some private fields and methods of the old catalog that will be removed.

We also think that we should not couple the catalog and pipelines, so we do not consider extending the catalog.list() method for this and suggest using the session instead.

@datajoely (Contributor)

So this came up yesterday - my colleague said 'I don't think the catalog is working'. We did catalog.list() and all we saw was parameters.

It took 10 minutes of playing with %kedro reload --env=... to realise that it was down to the fact that factories don't show up.

What may have helped?

catalog.list() could show the patterns it is expecting too, e.g. "{name}_data", so the user is aware of them.

What did we do? We wrote something horrible like this:

{x: catalog.load(x) for x in (pipelines["__default__"].outputs() | pipelines["__default__"].inputs())}

In summary, I think we don't have to over-engineer this: having expected patterns show up in catalog.list() would be a really helpful addition that doesn't add too much complexity.

@datajoely (Contributor)

Which, I realise, is exactly what I pitched 18 months ago 😂
#3312 (comment)

@ElenaKhaustova (Contributor)

@datajoely
I think the conceptual problem here is that we were not clear enough in explaining to users that datasets are not equal to patterns. After we added CatalogConfigResolver, patterns can be listed like this for both the new and old catalogs: catalog.config_resolver.list_patterns().

The idea of listing datasets and patterns together looks interesting, but doing so could lead to confusion, as people may not differentiate datasets from patterns. So providing an interface to access both separately, and communicating this better, seems more reasonable to me.

There is also a high chance that catalog.list() will be replaced with another filtering method or removed, as we now have a dict-like interface to list keys, values, and items: #3917.

I see the pain point about the discrepancy between catalog.list() and the CLI command kedro catalog list, where the latter considers the pipeline but the former does not. But as suggested above, we think this can be solved via the session, not the catalog, to avoid coupling the catalog and pipelines.

Long story short - we suggest:

  1. Keep catalog.config_resolver.list_patterns() to list patterns
  2. Keep the dict-like interface to list datasets
  3. Replace catalog.list() with catalog.filter() or remove it
  4. Move the CLI logic to the session, so it is possible to list datasets with regard to pipelines both from the CLI and in the interactive workflow, something like: session.catalog_list()
  5. Better communicate the difference between datasets and patterns, and the resolution based on pipelines

These five should address the points you mentioned, as well as the others we discussed above.
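
For illustration, points 1 and 2 are already possible today (a sketch assuming the dict-like interface from #3917):

# 1. List declared factory patterns via the config resolver:
patterns = catalog.config_resolver.list_patterns()

# 2. List configured datasets via the dict-like interface:
dataset_names = list(catalog.keys())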

@datajoely (Contributor)

Okay, we're actually designing for different things. Perhaps we could print out some of these recommendations when the ipython extension is loaded, because even experienced Kedro users aren't going to know how to retrieve patterns etc.

@astrojuanlu astrojuanlu moved this to To Do in Kedro 🔶 Jan 27, 2025
@ElenaKhaustova ElenaKhaustova changed the title from "Improve catalog.list or alternative for dataset factory?" to "[DataCatalog2.0]: Update KedroDataCatalog CLI logic and make it reusable" Feb 10, 2025
@ElenaKhaustova ElenaKhaustova added the Component: IO Issue/PR addresses data loading/saving/versioning and validation, the DataCatalog and DataSets label Feb 10, 2025
@datajoely (Contributor)

Now that this is prioritised for the 1.0.0 collection of breaking changes, I want to push the point that the current CLI command is close to useless:

[screenshot of the current kedro catalog list output]

I no longer have access to telemetry, but last time I looked it was barely used.

What it does well:

  • It interpolates the dataset factories, so you don't get the situation where catalog.list() doesn't work until a dataset has been eagerly discovered.
  • It breaks down the datasets by type, but this is now solved in the Python API.

What it doesn't do well:

  • It's not really machine readable: the output is technically YAML, but it's hard to do anything with it, as the logging cannot be disabled by the user from the CLI on their own.
  • I think a detailed JSON output would be amazing; being able to do kedro catalog list | jq would be a productivity driver for users.

@ElenaKhaustova (Contributor)

@datajoely, yep, now is a good time to point to possible improvements. Can you please elaborate on what you mean by detailed? Do you expect any other information to be included?

@datajoely (Contributor)

I think a JSON structure something like this would be more useful from a machine-interpretability point of view:

[
   {"dataset_name": "...", "dataset_type": "...", "pipelines": ["...", "..."]}
]

There is more that is possibly useful, such as the classpath for custom datasets, which YAML file it was found in, etc.

@ElenaKhaustova (Contributor)

1. Summary

The prototype of updated catalog commands is here: #4480

Note 1: We cannot proceed with the aforementioned PR until #4481 and #4475 are resolved. So, for now, the goal is to get feedback and validate some implementation details and the functionality of the updated commands.

Note 2: To demonstrate the changes in the interactive environment, I made the session and context stateful, as if we had implemented #4481. Please do not pay attention to those changes for now.

2. Implementation

2.1 Where should commands live

Context

The bare minimum needed to implement the catalog commands is a loaded catalog and pipelines.

Previously, we instantiated a session inside each command and accessed the catalog via the context; pipelines were also imported there.

from kedro.framework.project import pipelines


def list_datasets(metadata: ProjectMetadata, pipeline: str, env: str) -> None:
    session = _create_session(metadata.package_name, env=env)
    context = session.load_context()
    catalog = context.catalog

Since the commands live in /kedro/framework/cli/catalog.py, we cannot reuse this logic programmatically or in the interactive environment.

Suggestion

Thus, we suggest moving the logic for the catalog commands to the session level, specifically to kedro/framework/session/catalog.py, where we can access both the catalog and pipelines. This way, we can reuse the logic for the commands and also call them programmatically via the session.

def list_catalog_patterns(self) -> list[str]:

For example, if we move the list catalog patterns logic to the session level, we will be able to update the kedro catalog rank command like this:

@catalog.command("rank")
@env_option
@click.pass_obj
def rank_catalog_factories(metadata: ProjectMetadata, env: str) -> None:
    """List all dataset factories in the catalog, ranked by priority by which they are matched."""
    session = _create_session(metadata.package_name, env=env)
    catalog_factories = session.list_catalog_patterns()

    if catalog_factories:
        click.echo(yaml.dump(catalog_factories))
    else:
        click.echo("There are no dataset factories in the catalog.")

And in the interactive environment we'll be able to do the following:

kedro ipython

In [1]: session.list_catalog_patterns()
Out[1]: ['{name}.{folder}#csv', '{name}_data', 'out-{dataset_name}', '{dataset_name}#csv', 'in-{dataset_name}']

Why not move catalog commands to catalog level

We could implement the commands at the catalog level, but that would lead to coupling the catalog and pipelines, which is not desired.

2.2 Implementation at the session level

We suggest implementing the catalog commands logic as a mixin class in kedro/framework/session/catalog.py, placed next to kedro/framework/session/session.py.

class CatalogCommandsMixin:

That way, we will be able to extend the session without modifying it, and any changes on the commands side will not affect the session.

class KedroSession(CatalogCommandsMixin):
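
A minimal sketch of the mixin wiring (the method body is illustrative, built on the config_resolver API mentioned earlier in this thread, not the actual prototype code):

class CatalogCommandsMixin:
    """Reusable catalog-command logic, mixed into the session (sketch)."""

    def list_catalog_patterns(self) -> list[str]:
        # The session can build a context, which owns the catalog.
        context = self.load_context()
        return context.catalog.config_resolver.list_patterns()


class KedroSession(CatalogCommandsMixin):
    ...  # the session gains the commands without its own code changing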

3. Functionality

Each command is implemented as a function that outputs serialisable objects, so the results can be saved in any format. These functions are planned to be used for the CLI commands too, where we can decide on the specific output format. For example, @datajoely suggested printing output in JSON format.
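
For instance, because the return values are plain serialisable objects, a JSON dump is a one-liner (a sketch using the prototype's method names):

import json

# Any command result can be re-serialised in whatever format the CLI decides on:
result = session.list_catalog_datasets(pipelines=["data_processing"])
print(json.dumps(result, indent=2))  # ready for `kedro catalog list | jq`-style use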

Below you can find the changes made per command.

3.1 kedro catalog rank

Current functionality

This command lists all dataset patterns in the catalog, ranked by priority by which they are matched.

kedro catalog rank

- '{name}.{folder}#csv'
- '{name}_data'
- out-{dataset_name}
- '{dataset_name}#csv'
- in-{dataset_name}

Updated functionality

def list_catalog_patterns(self) -> list[str]:

The functionality of this command is left unchanged, but now we can do some things interactively:

kedro ipython

In [1]: session.list_catalog_patterns()
Out[1]: ['{name}.{folder}#csv', '{name}_data', 'out-{dataset_name}', '{dataset_name}#csv', 'in-{dataset_name}']

In [2]: runtime_pattern = {"{default}": {"type": "MemoryDataset"}}

In [3]: catalog.config_resolver.add_runtime_patterns(runtime_pattern)

In [4]: session.list_catalog_patterns()
Out[4]: ['{name}.{folder}#csv', '{name}_data', 'out-{dataset_name}', '{dataset_name}#csv', 'in-{dataset_name}', '{default}']

3.2 kedro catalog resolve

Current functionality

This command resolves catalog patterns against pipeline datasets.

kedro catalog resolve

X_test:
  type: MemoryDataset
companies#csv:
  filepath: data/01_raw/companies.csv
  type: pandas.CSVDataset
shuttle_id_dataset:
  credentials: db_credentials
  execution_options:
    stream_results: true
  load_args:
    chunksize: 1000
  sql: select shuttle, shuttle_id from spaceflights.shuttles;
  type: pandas.SQLQueryDataset

Updated functionality

def resolve_catalog_patterns(self, include_default: bool = False) -> dict[str, Any]:

Changes:

  1. Output the full dataset configuration for catalog datasets
  2. Added an option to include default datasets (by default they're excluded, as before) - session.resolve_catalog_patterns(include_default=True)
  3. Interactive workflow support
kedro ipython

In [1]: session.resolve_catalog_patterns()
Out[1]:
{
    'X_test': {'type': 'kedro.io.memory_dataset.MemoryDataset', 'copy_mode': None},
    'companies#csv': {
            'type': 'pandas.CSVDataset',
            'filepath': '/Projects/kedro-tests/default/data/01_raw/companies.csv'
        },
    'shuttle_id_dataset': {
            'type': 'kedro_datasets.pandas.sql_dataset.SQLQueryDataset',
            'sql': 'select shuttle, shuttle_id from spaceflights.shuttles;',
            'credentials': 'shuttle_id_dataset_credentials',
            'execution_options': {'stream_results': True},
            'load_args': {'chunksize': 1000},
            'fs_args': None,
            'filepath': None
        },
}

In [2]: catalog["new_dataset"] = 123

In [3]: session.resolve_catalog_patterns()
Out[3]:
{
    'X_test': {'type': 'kedro.io.memory_dataset.MemoryDataset', 'copy_mode': None},
    'companies#csv': {
            'type': 'pandas.CSVDataset',
            'filepath': '/Projects/kedro-tests/default/data/01_raw/companies.csv'
        },
    'shuttle_id_dataset': {
            'type': 'kedro_datasets.pandas.sql_dataset.SQLQueryDataset',
            'sql': 'select shuttle, shuttle_id from spaceflights.shuttles;',
            'credentials': 'shuttle_id_dataset_credentials',
            'execution_options': {'stream_results': True},
            'load_args': {'chunksize': 1000},
            'fs_args': None,
            'filepath': None
        },
    'new_dataset': {'type': 'kedro.io.memory_dataset.MemoryDataset', 'copy_mode': None},
}

3.3 kedro catalog list

Current functionality

This command shows datasets per type.

kedro catalog list -p data_processing

Datasets in 'data_processing' pipeline:
  Datasets generated from factories:
    pandas.CSVDataset:
    - reviews.01_raw#csv
    - companies#csv
  Datasets mentioned in pipeline:
    DefaultDataset:
    - preprocessed_companies
    ExcelDataset:
    - shuttles
    ParquetDataset:
    - preprocessed_shuttles
    - model_input_table
  Datasets not mentioned in pipeline:
    MemoryDataset:
    - X_test
    PickleDataset:
    - regressor
    SQLQueryDataset:
    - shuttle_id_dataset

Updated functionality

def list_catalog_datasets(self, pipelines: list[str] | None = None) -> dict:

Changes:

  1. Output full dataset types
  2. For each pipeline, output 3 categories:
     • datasets - pipeline datasets configured in the catalog
     • factories - pipeline datasets resolved using catalog patterns
     • defaults - pipeline datasets matching the catalog default pattern
  3. Remove the Datasets not mentioned in pipeline category
  4. Interactive workflow support
kedro ipython

In [1]: session.list_catalog_datasets(pipelines=["data_processing"])
Out[1]:

{
    'data_processing': {
        'datasets': {
            'kedro_datasets.pandas.parquet_dataset.ParquetDataset': ['model_input_table', 'preprocessed_shuttles'],
            'kedro_datasets.pandas.excel_dataset.ExcelDataset': ['shuttles']
        },
        'factories': {'kedro_datasets.pandas.csv_dataset.CSVDataset': ['reviews.01_raw#csv', 'companies#csv']},
        'defaults': {'kedro.io.memory_dataset.MemoryDataset': ['preprocessed_companies']}
    }
}

In [2]: catalog["companies#csv"]
Out[2]: kedro_datasets.pandas.csv_dataset.CSVDataset(filepath=PurePosixPath('/Projects/kedro-tests/default/data/01_raw/companies.csv'), protocol='file', load_args={}, save_args={'index': False})

In [3]: session.list_catalog_datasets(pipelines=["data_processing"])
Out[3]:

{
    'data_processing': {
        'datasets': {
            'kedro_datasets.pandas.parquet_dataset.ParquetDataset': ['model_input_table', 'preprocessed_shuttles'],
            'kedro_datasets.pandas.excel_dataset.ExcelDataset': ['shuttles'],
            'kedro_datasets.pandas.csv_dataset.CSVDataset': ['companies#csv']
        },
        'factories': {'kedro_datasets.pandas.csv_dataset.CSVDataset': ['reviews.01_raw#csv']},
        'defaults': {'kedro.io.memory_dataset.MemoryDataset': ['preprocessed_companies']}
    }
}

3.4 kedro catalog create

This command creates a YAML catalog configuration with the missing datasets, i.e. it saves the MemoryDatasets not mentioned in the catalog to a new YAML file.

I haven't implemented an updated version of this command, because it would now just save the defaults from the updated list command to a new YAML file. So it's not clear whether it is still useful.

@ElenaKhaustova ElenaKhaustova moved this from In Progress to In Review in Kedro 🔶 Mar 4, 2025
@astrojuanlu (Member)

Thanks for the long writeup @ElenaKhaustova! Left an idea about the interactive functionality in #4481 (comment). I understand it's a thorny issue; hope we can unblock it.

About your "current vs updated functionality", just to clarify: is the idea to keep the kedro catalog rank, kedro catalog resolve and kedro catalog list CLI commands just as they are now, without user-facing changes? If so, the rationale for moving their logic to the session looks good to me, as do the programmatic APIs.
