ColumnNotFoundError when using unnest #19307

Closed
2 tasks done
micouy opened this issue Oct 18, 2024 · 5 comments
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars


@micouy

micouy commented Oct 18, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.LazyFrame({
    "a": [
        [1],
        [2, 2],
        [3, 3, 3],
    ]
})


def field_name(index: int) -> str:
    return f"a_split_{index}"


df = df.with_columns(
    pl.col("a")
        .list
        .to_struct(
            n_field_strategy="max_width",
            fields=field_name,
        )
).unnest("a")

# Comment this step out and the column "a_split_0" will be present in the final output.
df = df.with_columns(
    pl.col("a_split_0") * 2
)

print(df.collect())

Log output

No response

Issue description

The code above raises the following error:

polars.exceptions.ColumnNotFoundError: a_split_0

Resolved plan until failure:

	---> FAILED HERE RESOLVING 'with_columns' <---
UNNEST by:[a]
   WITH_COLUMNS:
   [col("a").list.to_struct()] 
    DF ["a"]; PROJECT */1 COLUMNS; SELECTION: None

I suspect this might be expected behaviour. I know that for a LazyFrame the schema is resolved before the query is executed. Since the number of struct fields cannot be known before execution, and since the naming function might be non-deterministic, the presence of the resulting columns cannot be verified early.

If the error is expected, I think it would be useful to add a note about it to the LazyFrame.unnest docs.

Expected behavior

The query above should complete successfully, or an informative warning should be raised.

Installed versions

--------Version info---------
Polars:              1.9.0
Index type:          UInt32
Platform:            macOS-14.5-arm64-arm-64bit
Python:              3.11.9 (main, Aug  2 2024, 11:33:02) [Clang 15.0.0 (clang-1500.3.9.4)]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         1.6.0
numpy                2.1.2
openpyxl             <not installed>
pandas               <not installed>
pyarrow              <not installed>
pydantic             2.9.2
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@micouy micouy added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Oct 18, 2024
@ritchie46
Member

The dtype of list.to_struct cannot be known statically and therefore requires a user to provide a dtype. We must add that option.

@ritchie46
Member

In the meantime, you can add a cast to inform Polars of the type.

@cmdlineluser cmdlineluser mentioned this issue Oct 22, 2024
@ritchie46 ritchie46 self-assigned this Oct 23, 2024
@nameexhaustion nameexhaustion changed the title ColumnNotFoundError when using unnest ColumnNotFoundError when using unnest inside list.eval Oct 25, 2024
@nameexhaustion nameexhaustion changed the title ColumnNotFoundError when using unnest inside list.eval ColumnNotFoundError when using unnest Oct 25, 2024
@DarkAmoeba

I think I also encountered a variant of this issue, which seemed to be specifically related to aliasing the struct column with a new name. This is a reproducible example of the issue I experienced:

import datetime

import polars as pl

data = [{'timestamp': datetime.datetime(2024, 9, 30, 16, 5, 19),
         'flight': 1,
         'conformances': [True,
                          True,
                          False]},
        {'timestamp': datetime.datetime(2024, 9, 30, 16, 5, 20),
         'flight': 2,
         'conformances': [False,
                          False,
                          False]},
        {'timestamp': datetime.datetime(2024, 9, 30, 16, 5, 21),
         'flight': 3,
         'conformances': [False,
                          False,
                          True]}]

ldf = pl.LazyFrame(data)
ldf2 = ldf.with_columns(pl.col('conformances').list.to_struct(fields=['heading', 'level', 'speed']).alias('new_conformances'))

# this works fine
ldf2.unnest('new_conformances').collect().select('heading', 'level', 'speed')

# this raises polars.exceptions.ColumnNotFoundError: heading
ldf2.unnest('new_conformances').drop('flight').collect().select('heading', 'level', 'speed')

@cmdlineluser
Contributor

@DarkAmoeba I can reproduce that on 1.10.0

I think it was fixed by #19439 in 1.11.0


5 participants