Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] make_empty_column produces a dictionary column in an invalid state #18071

Open
vyasr opened this issue Feb 22, 2025 · 5 comments
Open

[BUG] make_empty_column produces a dictionary column in an invalid state #18071

vyasr opened this issue Feb 22, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@vyasr
Copy link
Contributor

vyasr commented Feb 22, 2025

Describe the bug
Currently running the following two lines will produce a segmentation fault:

  auto col = cudf::make_empty_column(cudf::data_type{cudf::type_id::DICTIONARY32});
  auto x = cudf::dictionary_column_view(col->view()).indices();

The reason for this is that when producing an empty column the make_empty_column function does not produce any children for the column, resulting in a null pointer dereference on the indices() call.

In principle, the function protects against this class of error by explicitly opting out of supporting nested column types. However, dictionary types are not considered nested by the is_nested trait.

Expected behavior
This code should either succeed or fail gracefully, not seg fault.

Additional context
The two obvious options for solutions would be to add dictionary types to is_nested or to manually disable them in the factory function. Naively I would opt for the former, but I suspect that would violate expectations in other parts of the code. I unfortunately don't know the dictionary code very well.

@vyasr vyasr added the bug Something isn't working label Feb 22, 2025
@vyasr
Copy link
Contributor Author

vyasr commented Feb 22, 2025

@davidwendt I'm happy to open a PR with a fix if you can advise on what makes the most sense here.

@davidwendt
Copy link
Contributor

davidwendt commented Feb 22, 2025

Should I be concerned that we are suddenly trying to use dictionary types?

Accessing children of an empty compound column (one with potential children) is considered UB.
https://github.com/rapidsai/cudf/blob/branch-25.04/cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md#empty-columns
I can explicitly add dictionary types this section of the developer's guide if necessary.

The following will also segfault:

  auto col = cudf::make_empty_column(cudf::data_type{cudf::type_id::STRING});
  auto x = cudf::strings_column_view(col->view()).offsets();

@vyasr
Copy link
Contributor Author

vyasr commented Feb 26, 2025

No, nothing to be worried about here. This came up in #18084 because we already support dictionary type in interop. This discussion implies that we have UB here. I can fix that in a follow-up PR.

@davidwendt
Copy link
Contributor

Attempting to fix this here #18121

@vyasr
Copy link
Contributor Author

vyasr commented Feb 27, 2025

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants