Factory datasets are not getting validated #80

Closed
kacper-ki opened this issue Jul 16, 2024 · 1 comment · Fixed by #81

Comments

@kacper-ki

kacper-ki commented Jul 16, 2024

Description

A factory dataset with a schema defined isn't validated by kedro-pandera.

Context

Datasets defined through a dataset factory cannot be validated.

Steps to Reproduce

For this catalog entry:

"{foo}_feature":
  type: pandas.ParquetDataset
  filepath: data/04_feature/{foo}_feature.parquet
  metadata:
    pandera:
      schema: ${pa.python:my_kedro_project.pipelines.feature_preprocessing.schemas.GenericFeatureSchema}
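To make the term concrete: "{foo}_feature" is a pattern, and a concrete dataset name such as sales_feature used in a pipeline is matched against it, with the captured value filled into the entry. The sketch below is a hypothetical, stdlib-only model of that matching; resolve_factory and sales_feature are illustrative names, and this is not Kedro's implementation. It captures foo from the name, fills it into the filepath, and leaves other brace syntax such as the ${pa.python:...} resolver string untouched.

```python
import re
from typing import Any, Optional

# Matches "{name}" placeholders; "${pa.python:...}" does not match because
# "pa.python:..." contains characters other than letters, digits and "_"
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve_factory(pattern: str, name: str, config: dict) -> Optional[dict]:
    """Hypothetical helper: fill a factory entry for a concrete name, or return None."""
    # Build a regex from the pattern: "{foo}_feature" -> "(?P<foo>.+)_feature"
    parts, last = [], 0
    for m in PLACEHOLDER.finditer(pattern):
        parts.append(re.escape(pattern[last:m.start()]))
        parts.append(f"(?P<{m.group(1)}>.+)")
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    match = re.fullmatch("".join(parts), name)
    if match is None:
        return None  # this factory does not cover the name
    groups = match.groupdict()

    def fill(value: Any) -> Any:
        # Substitute captured placeholders in strings and recurse into dicts
        if isinstance(value, str):
            return PLACEHOLDER.sub(lambda p: groups.get(p.group(1), p.group(0)), value)
        if isinstance(value, dict):
            return {key: fill(val) for key, val in value.items()}
        return value

    return fill(config)


entry = {
    "type": "pandas.ParquetDataset",
    "filepath": "data/04_feature/{foo}_feature.parquet",
    "metadata": {
        "pandera": {
            "schema": "${pa.python:my_kedro_project.pipelines.feature_preprocessing.schemas.GenericFeatureSchema}"
        }
    },
}
resolved = resolve_factory("{foo}_feature", "sales_feature", entry)
print(resolved["filepath"])  # data/04_feature/sales_feature.parquet
print(resolve_factory("{foo}_feature", "raw_sales", entry))  # None
```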

Expected Result

The factory dataset is validated

Actual Result

The dataset isn't validated (in my case it's the output dataset). Removing the factory pattern fixes the problem.

Investigation / workaround

Looking into the source code, the line dataset = catalog._datasets.get(name) returns None for a factory dataset, so metadata is None as well, and validation never runs.
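The failure mode can be reproduced with a small, hypothetical model of a catalog that materializes factory entries lazily. ToyDataset, ToyCatalog and schema_for are illustrative names, glob patterns via fnmatch stand in for the {foo} placeholders, and the lazy behavior is an assumption consistent with the description above rather than Kedro's code. An explicitly declared dataset is found in _datasets, while a factory dataset is absent until something resolves it, so the metadata lookup yields None.

```python
import fnmatch
from typing import Any, Optional


class ToyDataset:
    """Stand-in for a catalog dataset that carries user metadata."""

    def __init__(self, metadata: Optional[dict] = None):
        self.metadata = metadata


class ToyCatalog:
    """Hypothetical model of lazy factory resolution (not Kedro's code)."""

    def __init__(self, explicit: dict, patterns: dict):
        self._datasets = dict(explicit)  # only explicitly declared entries
        self._patterns = patterns  # glob pattern -> metadata for matching names

    def _get_dataset(self, name: str) -> ToyDataset:
        # A factory entry is materialized and cached on first lookup only
        if name not in self._datasets:
            for pattern, metadata in self._patterns.items():
                if fnmatch.fnmatchcase(name, pattern):
                    self._datasets[name] = ToyDataset(metadata)
                    break
            else:
                raise KeyError(name)
        return self._datasets[name]


def schema_for(catalog: ToyCatalog, name: str) -> Any:
    # Same lookup as quoted in the issue: read _datasets directly, no resolution
    dataset = catalog._datasets.get(name)
    metadata = getattr(dataset, "metadata", None)
    if metadata is None:
        return None  # no metadata, so validation is silently skipped
    return metadata.get("pandera", {}).get("schema")


catalog = ToyCatalog(
    explicit={"raw_data": ToyDataset({"pandera": {"schema": "RawSchema"}})},
    patterns={"*_feature": {"pandera": {"schema": "GenericFeatureSchema"}}},
)
print(schema_for(catalog, "raw_data"))       # RawSchema
print(schema_for(catalog, "sales_feature"))  # None: not materialized yet
catalog._get_dataset("sales_feature")        # resolve the factory entry
print(schema_for(catalog, "sales_feature"))  # GenericFeatureSchema
```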

This points to a broader issue with how the catalog handles dataset factories.

I managed to fix the issue by wrapping the body of the for loop in a catalog.exists(name) check:

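for name, data in datasets.items():
    # catalog.exists() resolves the factory pattern as a side effect,
    # which registers the dataset in catalog._datasets
    if catalog.exists(name):
        dataset = catalog._datasets.get(name)
        metadata = getattr(dataset, "metadata", None)
        ...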

That makes the dataset appear in catalog._datasets, and it is then validated properly.
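The workaround can be modeled in the same way. In this hypothetical sketch (ToyCatalog, ToyDataset and validate are illustrative names, and the toy exists() is an assumption modeled on the behavior described above), exists() resolves a factory entry as a side effect, which is what puts the dataset into _datasets. The toy exists() also reports whether the data has been written; if the real method behaves likewise, gating the loop body on its return value could skip an output that has not been saved yet, for example on a first run, so the sketch calls exists() only for its side effect.

```python
import fnmatch
from typing import Any, Callable, Optional


class ToyDataset:
    def __init__(self, metadata: Optional[dict]):
        self.metadata = metadata
        self.saved = False  # becomes True once the output has been written

    def exists(self) -> bool:
        return self.saved


class ToyCatalog:
    """Hypothetical model: exists() materializes factory entries as a side effect."""

    def __init__(self, patterns: dict):
        self._datasets: dict = {}
        self._patterns = patterns  # glob pattern -> metadata

    def _get_dataset(self, name: str) -> ToyDataset:
        if name not in self._datasets:
            for pattern, metadata in self._patterns.items():
                if fnmatch.fnmatchcase(name, pattern):
                    self._datasets[name] = ToyDataset(metadata)
                    break
            else:
                raise KeyError(name)
        return self._datasets[name]

    def exists(self, name: str) -> bool:
        try:
            dataset = self._get_dataset(name)  # registers factory entries
        except KeyError:
            return False
        return dataset.exists()  # False until the data has been written


def validate(catalog: ToyCatalog, datasets: dict, check: Callable[[Any, Any], None]) -> list:
    """Run check(schema, data) for every dataset that declares a pandera schema."""
    validated = []
    for name, data in datasets.items():
        catalog.exists(name)  # side effect only; the return value is ignored
        dataset = catalog._datasets.get(name)
        metadata = getattr(dataset, "metadata", None)
        if metadata and "pandera" in metadata:
            check(metadata["pandera"]["schema"], data)
            validated.append(name)
    return validated


catalog = ToyCatalog({"*_feature": {"pandera": {"schema": "GenericFeatureSchema"}}})
checked = []
outputs = {"sales_feature": [1, 2, 3], "unrelated": [0]}
print(validate(catalog, outputs, lambda schema, data: checked.append(schema)))
# ['sales_feature']
print(catalog.exists("sales_feature"))  # False: nothing has been saved yet
```

Here the factory output is validated even though its file does not exist yet, while a name that matches no entry is skipped.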

Another workaround I can think of is to move from the before/after_node_run hooks to before/after_dataset_loaded, but I'm not 100% sure that it will work.

Your Environment

  • kedro-pandera=0.2.2
  • Python 3.10.11
  • Apple M2 Pro, Sonoma 14.5

Does the bug also happen with the last version on main?

Yes

@Galileo-Galilei
Owner

Thank you very much for the detailed report. I'll accept a PR if you want to solve it fast ;)
