File written by polars.DataFrame.write_ipc read incorrectly #540
Comments
Not at a computer, but is _ipc the correct thing to write out? |
Not sure, it's just what I've been using in Python. Should I be using a different I tried
Since the method's name is "write IPC stream", I also tried reading it with Julia's
julia> DataFrame(Arrow.Stream("dates_stream.arrow"))
ERROR: MethodError: Cannot `convert` an object of type Arrow.View{Union{Missing, String}} to an object of type String
The function `convert` exists, but no method is defined for this combination of argument types.
Closest candidates are:
convert(::Type{String}, ::StringManipulation.Decoration)
@ StringManipulation ~/.julia/packages/StringManipulation/bMZ2A/src/decorations.jl:365
convert(::Type{String}, ::Base.JuliaSyntax.Kind)
@ Base /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/base/JuliaSyntax/src/kinds.jl:975
convert(::Type{String}, ::String)
@ Base essentials.jl:461
...
Stacktrace:
[1] convert(::Type{Union{Missing, String}}, x::Arrow.View{Union{Missing, String}})
@ Base ./missing.jl:70
[2] push!(a::Vector{Union{Missing, String}}, item::Arrow.View{Union{Missing, String}})
@ Base ./array.jl:1260
[3] add!
@ ~/.julia/packages/Tables/8p03y/src/fallbacks.jl:140 [inlined]
[4] eachcolumns
@ ~/.julia/packages/Tables/8p03y/src/utils.jl:111 [inlined]
[5] buildcolumns(schema::Tables.Schema{…}, rowitr::Tables.IteratorWrapper{…})
@ Tables ~/.julia/packages/Tables/8p03y/src/fallbacks.jl:147
[6] _columns
@ ~/.julia/packages/Tables/8p03y/src/fallbacks.jl:274 [inlined]
[7] columns
@ ~/.julia/packages/Tables/8p03y/src/fallbacks.jl:258 [inlined]
[8] DataFrame(x::Arrow.Stream; copycols::Nothing)
@ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/tables.jl:57
[9] DataFrame(x::Arrow.Stream)
@ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/tables.jl:48
[10] top-level scope
@ REPL[4]:1
Some type information was truncated. Use `show(err)` to see complete types. |
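The file-versus-stream question above can be checked without any Arrow library. According to the Arrow IPC specification, the file format starts and ends with the magic bytes `ARROW1`, while a stream written by a recent implementation starts with the continuation marker `0xFFFFFFFF`. The sketch below is a rough check under those assumptions; `ipc_container_kind` is a hypothetical helper name, not part of Polars, pyarrow, or Arrow.jl.

```python
import os

ARROW_MAGIC = b"ARROW1"             # opens and closes an Arrow IPC *file*
CONTINUATION = b"\xff\xff\xff\xff"  # prefixes each message in a recent IPC *stream*


def ipc_container_kind(path):
    """Guess whether `path` holds an Arrow IPC file, an IPC stream, or neither."""
    with open(path, "rb") as f:
        head = f.read(8)
        # Jump to the last 6 bytes, where the file format repeats its magic.
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(size - len(ARROW_MAGIC), 0))
        tail = f.read(len(ARROW_MAGIC))
    if head[:6] == ARROW_MAGIC and tail == ARROW_MAGIC:
        return "file"
    if head[:4] == CONTINUATION:
        return "stream"
    return "unknown"
```

If the writer behaves as its method names suggest, `dates.arrow` would be expected to come back as "file" and `dates_stream.arrow` as "stream". This only identifies the container, not the column types inside it, so it cannot by itself explain a read error.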
I guess another check is to see if |
Yes, pyarrow can read files written by
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars==1.21.0", "pyarrow==19.0.0"]
# ///
from datetime import date
import polars as pl, pyarrow as pa
df = pl.DataFrame({
    'text': "this is some text".split(),
    'date': [date(2025,1,i+1) for i in range(4)],
    'float': [i * 0.7 for i in range(4)],
    'int': list(range(4))
})
print("!!!Writing df...")
df.write_ipc("dates.arrow")
df.write_ipc_stream("dates_stream.arrow")
print("\n!!!Reading IPC...")
with pa.OSFile("dates.arrow", 'rb') as src:
    data = pa.ipc.open_file(src).read_all()
print(data)
print("\n!!!Reading IPC stream...")
with pa.OSFile("dates_stream.arrow", 'rb') as src:
    data = pa.ipc.open_stream(src).read_all()
print(data)
Output:
|
More examples where Arrow.jl can't read the file:
A dataframe like
When strings are of different lengths, short ones are messed up:
I tried "weird" non-ASCII scripts like Devanagari, but couldn't trigger the bug. |
Here's a
Also, sometimes data from the first column appears in the second column, but only for dataframes with more than about 30 rows:
Pyarrow reads all of these correctly. |
It seems that arrow-julia doesn't support string view yet. |
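For context on what "string view" means: the Arrow columnar format describes each element of a view column as a 16-byte struct. Strings of up to 12 bytes are stored inline after a 4-byte length; longer strings store a 4-byte prefix, a data-buffer index, and an offset. The toy encoder/decoder below illustrates that layout only. `encode_views` and `decode_view` are hypothetical names, and none of this is Arrow.jl or Polars code. A reader that expects the older offsets-plus-data layout would likely misread such a column, which might be related to the short-versus-long string symptoms reported above, though the thread does not confirm that.

```python
import struct

INLINE_MAX = 12  # strings up to 12 bytes live inside the view itself


def encode_views(strings):
    """Build (views, data_buffers) for a list of str, using one data buffer."""
    data = bytearray()
    views = []
    for s in strings:
        raw = s.encode("utf-8")
        if len(raw) <= INLINE_MAX:
            # Short string: length, then the bytes zero-padded to 12.
            views.append(struct.pack("<i12s", len(raw), raw))
        else:
            # Long string: length, 4-byte prefix, buffer index, offset.
            views.append(struct.pack("<i4sii", len(raw), raw[:4], 0, len(data)))
            data += raw
    return views, [bytes(data)]


def decode_view(view, data_buffers):
    """Decode one 16-byte view element into a Python str."""
    if len(view) != 16:
        raise ValueError("a view element is exactly 16 bytes")
    (length,) = struct.unpack_from("<i", view, 0)
    if length <= INLINE_MAX:
        return view[4:4 + length].decode("utf-8")
    buf_index, offset = struct.unpack_from("<ii", view, 8)
    return data_buffers[buf_index][offset:offset + length].decode("utf-8")
```

Round-tripping a mix of short and long strings through `encode_views` and `decode_view` shows the two branches: short values never touch the data buffer, while long ones are resolved through it.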
Is |
Oh, sorry. They are the same type. |
Python code that writes the file:
Polars can read this file:
Arrow.jl reads garbage:
Issue: this is not at all what Polars wrote to the file
Other data types are read properly: