File written by polars.DataFrame.write_ipc read incorrectly #540

Open
ForceBru opened this issue Feb 9, 2025 · 9 comments

@ForceBru

ForceBru commented Feb 9, 2025

Python code that writes the file:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars<=1.21.0"]
# ///
import polars as pl

pl.DataFrame({'text': "this is some text".split()}).write_ipc("data.arrow")

Polars can read this file:

>>> import polars as pl
>>> pl.read_ipc("data.arrow")
shape: (4, 1)
┌──────┐
│ text │
│ ---  │
│ str  │
╞══════╡
│ this │
│ is   │
│ some │
│ text │
└──────┘
>>>

Arrow.jl reads garbage:

julia> import Pkg; Pkg.status()
Status `~/tmp/Project.toml`
  [69666777] Arrow v2.8.0
  [a93c6f00] DataFrames v1.7.0

julia> using DataFrames; import Arrow

julia> DataFrame(Arrow.Table("./data.arrow"))
4×1 DataFrame
 Row │ text     
     │ String?  
─────┼──────────
   1 │ W1\0\0
   2 │ \xf2\xff
   3 │ \v\0\b\0
   4 │ \b\0\b\0

julia> 

Issue: this is not at all what Polars wrote to the file


Other data types are read properly:

> cat arrow_bug.py
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars<=1.21.0"]
# ///
from datetime import date
import polars as pl

pl.DataFrame({
    'text': "this is some text".split(),
    'date': [date(2025,1,i+1) for i in range(4)],
    'float': [float(i) for i in range(4)],
    'int': list(range(4))
}).write_ipc("dates.arrow")
> ./arrow_bug.py
> julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.3 (2025-01-21)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using DataFrames; import Arrow

julia> DataFrame(Arrow.Table("dates.arrow"))
4×4 DataFrame
 Row │ text      date        float     int    
     │ String?   Date?       Float64?  Int64? 
─────┼────────────────────────────────────────
   1 │ W1\0\0    2025-01-01       0.0       0
   2 │ \xf2\xff  2025-01-02       1.0       1
   3 │ \v\0\b\0  2025-01-03       2.0       2
   4 │ \b\0\b\0  2025-01-04       3.0       3

julia> 
@Moelf
Contributor

Moelf commented Feb 9, 2025

not at a computer but is _ipc the correct thing to write out?

@ForceBru
Author

ForceBru commented Feb 9, 2025

is _ipc the correct thing to write out?

Not sure, it's just what I've been using in Python. Should I be using a different write_ method to write Arrow files from Polars?

I tried write_ipc_stream, but Arrow.jl can't read the String column from that file either:

> cat arrow_bug.py
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars<=1.21.0"]
# ///
from datetime import date
import polars as pl

df = pl.DataFrame({
    'text': "this is some text".split(),
    'date': [date(2025,1,i+1) for i in range(4)],
    'float': [float(i) for i in range(4)],
    'int': list(range(4))
})
df.write_ipc("dates.arrow")
df.write_ipc_stream("dates_stream.arrow")
> ./arrow_bug.py
> julia --project
julia> using DataFrames; import Arrow

julia> DataFrame(Arrow.Table("dates.arrow"))
4×4 DataFrame
 Row │ text      date        float     int    
     │ String?   Date?       Float64?  Int64? 
─────┼────────────────────────────────────────
   1 │ W1\0\0    2025-01-01       0.0       0
   2 │ \xf2\xff  2025-01-02       1.0       1
   3 │ \v\0\b\0  2025-01-03       2.0       2
   4 │ \b\0\b\0  2025-01-04       3.0       3

julia> DataFrame(Arrow.Table("dates_stream.arrow"))
4×4 DataFrame
 Row │ text              date        float     int    
     │ String?           Date?       Float64?  Int64? 
─────┼────────────────────────────────────────────────
   1 │ @\x01\0\0         2025-01-01       0.0       0
   2 │ \x04\0            2025-01-02       1.0       1
   3 │ \xf8\xff\xff\xff  2025-01-03       2.0       2
   4 │ \x04\0\0\0        2025-01-04       3.0       3

julia> 

Since the method's name is "write IPC stream", I also tried reading it with Julia's Arrow.Stream, but got this error:

julia> DataFrame(Arrow.Stream("dates_stream.arrow"))
ERROR: MethodError: Cannot `convert` an object of type Arrow.View{Union{Missing, String}} to an object of type String
The function `convert` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  convert(::Type{String}, ::StringManipulation.Decoration)
   @ StringManipulation ~/.julia/packages/StringManipulation/bMZ2A/src/decorations.jl:365
  convert(::Type{String}, ::Base.JuliaSyntax.Kind)
   @ Base /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/base/JuliaSyntax/src/kinds.jl:975
  convert(::Type{String}, ::String)
   @ Base essentials.jl:461
  ...

Stacktrace:
  [1] convert(::Type{Union{Missing, String}}, x::Arrow.View{Union{Missing, String}})
    @ Base ./missing.jl:70
  [2] push!(a::Vector{Union{Missing, String}}, item::Arrow.View{Union{Missing, String}})
    @ Base ./array.jl:1260
  [3] add!
    @ ~/.julia/packages/Tables/8p03y/src/fallbacks.jl:140 [inlined]
  [4] eachcolumns
    @ ~/.julia/packages/Tables/8p03y/src/utils.jl:111 [inlined]
  [5] buildcolumns(schema::Tables.Schema{…}, rowitr::Tables.IteratorWrapper{…})
    @ Tables ~/.julia/packages/Tables/8p03y/src/fallbacks.jl:147
  [6] _columns
    @ ~/.julia/packages/Tables/8p03y/src/fallbacks.jl:274 [inlined]
  [7] columns
    @ ~/.julia/packages/Tables/8p03y/src/fallbacks.jl:258 [inlined]
  [8] DataFrame(x::Arrow.Stream; copycols::Nothing)
    @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/tables.jl:57
  [9] DataFrame(x::Arrow.Stream)
    @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/tables.jl:48
 [10] top-level scope
    @ REPL[4]:1
Some type information was truncated. Use `show(err)` to see complete types.
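
For reference, write_ipc and write_ipc_stream target two different Arrow IPC containers: the file format is framed by the magic string ARROW1 at both ends, while a modern stream begins directly with an encapsulated message whose first four bytes are the continuation marker 0xFFFFFFFF. A minimal stdlib sketch for telling them apart follows; the helper name `sniff_arrow_ipc` is hypothetical, and the check is a heuristic on the first and last bytes only, not a validating parser.

```python
ARROW_MAGIC = b"ARROW1"             # framing magic of the IPC *file* format
CONTINUATION = b"\xff\xff\xff\xff"  # first bytes of a modern encapsulated message


def sniff_arrow_ipc(data: bytes) -> str:
    """Guess the Arrow IPC container kind from raw bytes (heuristic only)."""
    framed = data.startswith(ARROW_MAGIC) and data.endswith(ARROW_MAGIC)
    if framed and len(data) > 2 * len(ARROW_MAGIC):
        return "file"
    if data.startswith(CONTINUATION):
        return "stream"
    return "unknown"
```

Usage would look like `sniff_arrow_ipc(open("dates.arrow", "rb").read())`; for the two files written above one would expect "file" and "stream" respectively, though that is an expectation and was not run here.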

@Moelf
Contributor

Moelf commented Feb 9, 2025

I guess another check is to see if pyarrow can read it

@ForceBru
Author

ForceBru commented Feb 9, 2025

Yes, pyarrow can read files written by df.write_ipc and df.write_ipc_stream:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars==1.21.0", "pyarrow==19.0.0"]
# ///
from datetime import date
import polars as pl, pyarrow as pa

df = pl.DataFrame({
    'text': "this is some text".split(),
    'date': [date(2025,1,i+1) for i in range(4)],
    'float': [i * 0.7 for i in range(4)],
    'int': list(range(4))
})
print("!!!Writing df...")
df.write_ipc("dates.arrow")
df.write_ipc_stream("dates_stream.arrow")

print("\n!!!Reading IPC...")
with pa.OSFile("dates.arrow", 'rb') as src:
    data = pa.ipc.open_file(src).read_all()
    print(data)

print("\n!!!Reading IPC stream...")
with pa.OSFile("dates_stream.arrow", 'rb') as src:
    data = pa.ipc.open_stream(src).read_all()
    print(data)

Output:

> chmod +x code.py && ./code.py
!!!Writing df...

!!!Reading IPC...
pyarrow.Table
text: string_view
date: date32[day]
float: double
int: int64
----
text: [["this","is","some","text"]]
date: [[2025-01-01,2025-01-02,2025-01-03,2025-01-04]]
float: [[0,0.7,1.4,2.0999999999999996]]
int: [[0,1,2,3]]

!!!Reading IPC stream...
pyarrow.Table
text: string_view
date: date32[day]
float: double
int: int64
----
text: [["this","is","some","text"]]
date: [[2025-01-01,2025-01-02,2025-01-03,2025-01-04]]
float: [[0,0.7,1.4,2.0999999999999996]]
int: [[0,1,2,3]]

@ForceBru
Author

More examples where Arrow.jl can't read the file:

> python
Python 3.12.7 (main, Jan 17 2025, 16:55:27) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import polars as pl
>>> pl.DataFrame({'text': ['this is some text'] * 10, 'more': ['hello']*10}).write_ipc("long.arrow")
>>> 
> julia --project -e "using DataFrames; import Arrow; Arrow.Table(\"long.arrow\") |> DataFrame |> display"
10×2 DataFrame
 Row │ text               more                 
     │ String?            String?              
─────┼─────────────────────────────────────────
   1 │ this is some text  W1\0\0\xff
   2 │ this is some text  \xf2\xff\xff\xff\x14
   3 │ this is some text  \v\0\b\0\n
   4 │ this is some text  \b\0\b\0\0
   5 │ this is some text  \x04\0\0\0\xec
   6 │ this is some text  \x18\0\0\0\x01
   7 │ this is some text  \x11\0\b\0\0
   8 │ this is some text  \x04\0\x04\0\x04
   9 │ this is some text  \xec\xff\xff\xff,
  10 │ this is some text  \x01\x18\0\0\x10
> 

A dataframe like pl.DataFrame({'ints': [0] * 10, 'ye': [5]*10, 'more': ['h'*L]*10}).write_ipc("long.arrow") is read incorrectly for 1<=L<=12 (checked manually), but is suddenly read fine for L==13:

> python
>>> import polars as pl; pl.DataFrame({'ints': [0] * 10, 'ye': [5]*10, 'more': ['h'*12]*10}).write_ipc("long.arrow")
> julia --project -e "using DataFrames; import Arrow; Arrow.Table(\"long.arrow\") |> DataFrame |> display"
10×3 DataFrame
 Row │ ints    ye      more                              
     │ Int64?  Int64?  String?                           
─────┼───────────────────────────────────────────────────
   1 │      0       5  W1\0\0\xff\xff\xff\xff\b\x01\0\0
   2 │      0       5  \xf2\xff\xff\xff\x14\0\0\0\x04\0…
   3 │      0       5  \v\0\b\0\n\0\x04\0\xf8\xff\xff\x…
   4 │      0       5  \b\0\b\0\0\0\x04\0\x03\0\0\0
   5 │      0       5  D\0\0\0\x04\0\0\0\xec\xff\xff\xff
   6 │      0       5   \0\0\0\x18\0\0\0\x01\x18\0\0
   7 │      0       5  \x04\0\x10\0\x11\0\b\0\0\0\f\0
   8 │      0       5  \xfc\xff\xff\xff\x04\0\x04\0\x04…
   9 │      0       5  \0\0\0\0\xec\xff\xff\xff8\0\0\0
  10 │      0       5  \x18\0\0\0\x01\x02\0\0\x10\0\x12…
> python
>>> import polars as pl; pl.DataFrame({'ints': [0] * 10, 'ye': [5]*10, 'more': ['h'*13]*10}).write_ipc("long.arrow")
> julia --project -e "using DataFrames; import Arrow; Arrow.Table(\"long.arrow\") |> DataFrame |> display"
10×3 DataFrame
 Row │ ints    ye      more          
     │ Int64?  Int64?  String?       
─────┼───────────────────────────────
   1 │      0       5  hhhhhhhhhhhhh
   2 │      0       5  hhhhhhhhhhhhh
   3 │      0       5  hhhhhhhhhhhhh
   4 │      0       5  hhhhhhhhhhhhh
   5 │      0       5  hhhhhhhhhhhhh
   6 │      0       5  hhhhhhhhhhhhh
   7 │      0       5  hhhhhhhhhhhhh
   8 │      0       5  hhhhhhhhhhhhh
   9 │      0       5  hhhhhhhhhhhhh
  10 │      0       5  hhhhhhhhhhhhh
> 
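
The 12/13 boundary lines up with the Arrow string-view layout (pyarrow reports the column as string_view above). Each element is a 16-byte struct: a 4-byte length, then either the string itself inline when it is at most 12 bytes, or a 4-byte prefix plus a buffer index and offset into a separate data buffer when it is longer. The sketch below packs and unpacks such elements with the standard library only; `encode_view` and `decode_view` are hypothetical names for illustration and are not taken from Polars, pyarrow, or Arrow.jl.

```python
import struct

INLINE_MAX = 12  # strings of at most 12 bytes are stored inside the view itself


def encode_view(data: bytes, buffer_index: int = 0, offset: int = 0) -> bytes:
    """Pack one 16-byte view element (little-endian)."""
    length = len(data)
    if length <= INLINE_MAX:
        # int32 length + inline bytes, zero-padded to 12 bytes by the 12s format
        return struct.pack("<i12s", length, data)
    # int32 length + 4-byte prefix + int32 buffer index + int32 offset
    return struct.pack("<i4sii", length, data[:4], buffer_index, offset)


def decode_view(view: bytes, data_buffers: list[bytes]) -> bytes:
    """Recover the string bytes from one 16-byte view element."""
    (length,) = struct.unpack_from("<i", view, 0)
    if length <= INLINE_MAX:
        return view[4:4 + length]  # inline case: no data buffer needed
    buffer_index, offset = struct.unpack_from("<ii", view, 8)
    return data_buffers[buffer_index][offset:offset + length]


short, long_ = b"h" * 12, b"h" * 13
buffers = [long_]  # only the 13-byte string lives in a data buffer
assert decode_view(encode_view(short), buffers) == short
assert decode_view(encode_view(long_), buffers) == long_
```

Under this layout, a reader that handles the out-of-line branch correctly but mishandles the inline branch would show exactly the pattern seen here: strings of 13 bytes or more come back intact and shorter ones do not. Whether that is what happens inside Arrow.jl is an assumption, not something this thread confirms.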

When strings are of different lengths, short ones are messed up:

> python
>>> from random import randint; col=[randint(1,50) for _ in range(10)]; print(col); import polars as pl; pl.DataFrame({'ints': [0] * 10, 'ye': [5]*10, 'more': ['h'*i for i in col]}).write_ipc("long.arrow")
[38, 5, 48, 32, 12, 3, 26, 23, 33, 37]
> julia --project -e "using DataFrames; import Arrow; Arrow.Table(\"long.arrow\") |> DataFrame |> display"
10×3 DataFrame
 Row │ ints    ye      more                              
     │ Int64?  Int64?  String?                           
─────┼───────────────────────────────────────────────────
   1 │      0       5  hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh…
   2 │      0       5  \xf2\xff\xff\xff\x14
   3 │      0       5  hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh…
   4 │      0       5  hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
   5 │      0       5  D\0\0\0\x04\0\0\0\xec\xff\xff\xff
   6 │      0       5   \0\0
   7 │      0       5  hhhhhhhhhhhhhhhhhhhhhhhhhh
   8 │      0       5  hhhhhhhhhhhhhhhhhhhhhhh
   9 │      0       5  hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
  10 │      0       5  hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh…

I tried "weird" non-ASCII scripts like Devanagari, but couldn't trigger the bug.
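
A side observation on the garbage itself, offered as a hedged sketch rather than a confirmed diagnosis: row 1 of the 12-character case (`W1\0\0\xff\xff\xff\xff\b\x01\0\0`) is byte-for-byte what sits at offsets 4 to 16 of an Arrow IPC file, and row 1 of the write_ipc_stream case (`@\x01\0\0`) matches offsets 4 to 8 of an IPC stream. The metadata lengths 0x0108 and 0x0140 used below are read off the garbled rows themselves, so only the magic and continuation bytes count as independent evidence.

```python
import struct

CONTINUATION = b"\xff\xff\xff\xff"

# Start of an IPC file: "ARROW1" padded to 8 bytes, then the first message
# (continuation marker + little-endian int32 metadata length).
file_head = b"ARROW1\x00\x00" + CONTINUATION + struct.pack("<i", 0x0108)
# Start of an IPC stream: the first message directly, with no magic.
stream_head = CONTINUATION + struct.pack("<i", 0x0140)

# Row 1 as displayed by Arrow.jl in the transcripts above.
row1_file_len12 = b"W1\x00\x00\xff\xff\xff\xff\x08\x01\x00\x00"  # 'h' * 12 case
row1_file_len4 = b"W1\x00\x00"                                   # "this" case
row1_stream_len4 = b"@\x01\x00\x00"                              # "this", stream

assert file_head[4:4 + 12] == row1_file_len12
assert file_head[4:4 + 4] == row1_file_len4
assert stream_head[4:4 + 4] == row1_stream_len4
```

In each case the misread value has the right length but is taken from 4 bytes into the file, which would be consistent with inline string data being looked up relative to the start of the file rather than the start of the views buffer. That reading is an inference from these three rows only.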

@ForceBru
Author

ForceBru commented Feb 10, 2025

Here's a BoundsError: attempt to access 0-element Vector{Vector{UInt8}} at index [1]:

> python
>>> from random import randint; col=[randint(1,500) for _ in range(100)]; print(col); import polars as pl; pl.DataFrame({'more': ['नमस्ते'*i for i in col],'text':['k'*i for i in col]}).write_ipc("long.arrow")
[232, 143, 235, 324, 105, 114, 47, 455, 111, 132, 125, 327, 249, 355, 317, 156, 312, 481, 107, 404, 493, 343, 41, 430, 1, 13, 107, 125, 114, 172, 443, 307, 328, 331, 318, 292, 327, 175, 41, 483, 147, 340, 309, 346, 414, 333, 103, 147, 143, 335, 132, 88, 409, 473, 45, 108, 112, 282, 150, 334, 261, 428, 316, 385, 157, 458, 348, 207, 444, 140, 425, 69, 500, 222, 472, 35, 170, 431, 11, 125, 484, 346, 187, 441, 108, 237, 18, 466, 128, 467, 466, 391, 310, 318, 171, 331, 450, 90, 194, 465]
> julia --project -e "using DataFrames; import Arrow; Arrow.Table(\"long.arrow\") |> DataFrame |> display"
ERROR: BoundsError: attempt to access 0-element Vector{Vector{UInt8}} at index [1]
Stacktrace:
  [1] throw_boundserror(A::Vector{Vector{UInt8}}, I::Tuple{Int64})
    @ Base ./essentials.jl:14
  [2] getindex
    @ ./essentials.jl:916 [inlined]
  [3] getindex(l::Arrow.View{Union{Missing, String}}, i::Int64)
    @ Arrow ~/.julia/packages/Arrow/3GbnS/src/arraytypes/views.jl:61
  [4] getindex
    @ ~/.julia/packages/DataFrames/kcA9R/src/dataframe/dataframe.jl:517 [inlined]
  [5] _pretty_tables_highlighter_func(data::DataFrame, i::Int64, j::Int64)
    @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/abstractdataframe/prettytables.jl:13
  [6] _text_process_data_cell(ptable::PrettyTables.ProcessedTable, cell_data::PrettyTables.UndefinedCell, cell_str::String, i::Int64, j::Int64, l::Int64, column_width::Int64, crayon::Crayons.Crayon, alignment::Symbol, highlighters::Ref{Any})
    @ PrettyTables ~/.julia/packages/PrettyTables/oVZqx/src/backends/text/print_cell.jl:108
  [7] _text_print_table!(display::PrettyTables.Display, ptable::PrettyTables.ProcessedTable, table_str::Matrix{Vector{String}}, actual_columns_width::Vector{Int64}, continuation_row_line::Int64, num_lines_in_row::Vector{Int64}, num_lines_around_table::Int64, body_hlines::Vector{Int64}, body_hlines_format::NTuple{4, Char}, continuation_row_alignment::Symbol, ellipsis_line_skip::Int64, highlighters::Ref{Any}, hlines::Vector{Int64}, tf::PrettyTables.TextFormat, text_crayons::PrettyTables.TextCrayons{Crayons.Crayon, Crayons.Crayon}, vlines::Vector{Int64})
    @ PrettyTables ~/.julia/packages/PrettyTables/oVZqx/src/backends/text/print_table.jl:237
  [8] _print_table_with_text_back_end(pinfo::PrettyTables.PrintInfo; alignment_anchor_fallback::Symbol, alignment_anchor_fallback_override::Dict{Int64, Symbol}, alignment_anchor_regex::Dict{Int64, Vector{Regex}}, autowrap::Bool, body_hlines::Vector{Int64}, body_hlines_format::Nothing, continuation_row_alignment::Symbol, crop::Symbol, crop_subheader::Bool, columns_width::Int64, display_size::Tuple{Int64, Int64}, equal_columns_width::Bool, ellipsis_line_skip::Int64, highlighters::Tuple{PrettyTables.Highlighter}, hlines::Vector{Symbol}, linebreaks::Bool, maximum_columns_width::Vector{Int64}, minimum_columns_width::Int64, newline_at_end::Bool, overwrite::Bool, reserved_display_lines::Int64, show_omitted_cell_summary::Bool, sortkeys::Bool, tf::PrettyTables.TextFormat, title_autowrap::Bool, title_same_width_as_table::Bool, vcrop_mode::Symbol, vlines::Vector{Int64}, border_crayon::Crayons.Crayon, header_crayon::Crayons.Crayon, omitted_cell_summary_crayon::Crayons.Crayon, row_label_crayon::Crayons.Crayon, row_label_header_crayon::Crayons.Crayon, row_number_header_crayon::Crayons.Crayon, subheader_crayon::Crayons.Crayon, text_crayon::Crayons.Crayon, title_crayon::Crayons.Crayon)
    @ PrettyTables ~/.julia/packages/PrettyTables/oVZqx/src/backends/text/text_backend.jl:371
  [9] _print_table(io::IO, data::Any; alignment::Vector{Symbol}, backend::Val{:auto}, cell_alignment::Nothing, cell_first_line_only::Bool, compact_printing::Bool, formatters::Tuple{typeof(DataFrames._pretty_tables_general_formatter)}, header::Tuple{Vector{String}, Vector{String}}, header_alignment::Symbol, header_cell_alignment::Nothing, limit_printing::Bool, max_num_of_columns::Int64, max_num_of_rows::Int64, renderer::Symbol, row_labels::Nothing, row_label_alignment::Symbol, row_label_column_title::String, row_number_alignment::Symbol, row_number_column_title::String, show_header::Bool, show_row_number::Bool, show_subheader::Bool, title::String, title_alignment::Symbol, kwargs::@Kwargs{alignment_anchor_fallback::Symbol, alignment_anchor_regex::Dict{Int64, Vector{Regex}}, crop::Symbol, ellipsis_line_skip::Int64, hlines::Vector{Symbol}, highlighters::Tuple{PrettyTables.Highlighter}, maximum_columns_width::Vector{Int64}, newline_at_end::Bool, reserved_display_lines::Int64, row_label_crayon::Crayons.Crayon, vcrop_mode::Symbol, vlines::Vector{Int64}})
    @ PrettyTables ~/.julia/packages/PrettyTables/oVZqx/src/print.jl:1059
 [10] _print_table
    @ ~/.julia/packages/PrettyTables/oVZqx/src/print.jl:934 [inlined]
 [11] #pretty_table#62
    @ ~/.julia/packages/PrettyTables/oVZqx/src/print.jl:825 [inlined]
 [12] pretty_table
    @ ~/.julia/packages/PrettyTables/oVZqx/src/print.jl:794 [inlined]
 [13] _show(io::Base.TTY, df::DataFrame; allrows::Bool, allcols::Bool, rowlabel::Symbol, summary::Bool, eltypes::Bool, rowid::Nothing, truncate::Int64, kwargs::@Kwargs{})
    @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/abstractdataframe/show.jl:253
 [14] _show
    @ ~/.julia/packages/DataFrames/kcA9R/src/abstractdataframe/show.jl:147 [inlined]
 [15] #show#871
    @ ~/.julia/packages/DataFrames/kcA9R/src/abstractdataframe/show.jl:352 [inlined]
 [16] show
    @ ~/.julia/packages/DataFrames/kcA9R/src/abstractdataframe/show.jl:339 [inlined]
 [17] show(io::Base.TTY, mime::MIME{Symbol("text/plain")}, df::DataFrame)
    @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/abstractdataframe/io.jl:150
 [18] display(d::TextDisplay, M::MIME{Symbol("text/plain")}, x::Any)
    @ Base.Multimedia ./multimedia.jl:254
 [19] display
    @ ./multimedia.jl:255 [inlined]
 [20] display(x::Any)
    @ Base.Multimedia ./multimedia.jl:340
 [21] |>(x::DataFrame, f::typeof(display))
    @ Base ./operators.jl:926
 [22] top-level scope
    @ none:1

Also, sometimes data from the first column appears in the second column, but only for dataframes with more than about 30 rows:

> python
>>> col=[7 for _ in range(40)]; import polars as pl; pl.DataFrame({'more': ['नमस्ते'*i for i in col],'text':['k'*i for i in col]}).write_ipc("long.arrow")
>>> 
> julia --project -e "using DataFrames; import Arrow; Arrow.Table(\"long.arrow\") |> DataFrame |> display"
40×2 DataFrame
 Row │ more                          text                     
     │ String?                       String?                  
─────┼────────────────────────────────────────────────────────
   1 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  W1\0\0\xff\xff\xff
   2 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \xf2\xff\xff\xff\x14\0\0
   3 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \v\0\b\0\n\0\x04
   4 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \b\0\b\0\0\0\x04
   5 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \x04\0\0\0\xec\xff\xff
   6 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \x18\0\0\0\x01\x18\0
   7 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \x11\0\b\0\0\0\f
   8 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \x04\0\x04\0\x04\0\0
   9 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \xec\xff\xff\xff,\0\0
  10 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \x01\x18\0\0\x10\0\x12
  11 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \0\0\f\0\0\0\0
  12 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \x04\0\0\0mor # trying to spell "more", name of 1st column?
  13 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \xe8\0\0\0\x04\0\0
  14 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \0\0\0\0\x14\0\0
  15 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \x10\0\x12\0\f\0\x04
  16 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \0\0\0\0\x90\0\0
  17 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \0\0\0\0\0\0\x0e
  18 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \0\0\x14\0\x02\0\0
  19 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \0\0\0\0\0\0\0
  20 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \0\0\0\0\0\0\0
  21 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \0\0\0\0\0\0\0
  22 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \x80\x02\0\0\0\0\0
  23 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  @\x16\0\0\0\0\0
  24 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  @\x16\0\0\0\0\0
  25 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \0\0\0\0\x02\0\0
  26 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \0\0\0\0\0\0\0
  27 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  \0\0\0\0\0\0\0
  28 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0 # न shouldn't be here
  29 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0
  30 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0
  31 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0
  32 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0
  33 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0
  34 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0
  35 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0
  36 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0
  37 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0
  38 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0
  39 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0
  40 │ नमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्तेनमस्ते  न\xe0\0\0\0

Pyarrow reads all of these correctly.

@kou
Member

kou commented Feb 10, 2025

text: string_view

It seems that arrow-julia doesn't support string view yet.

@quinnj
Member

quinnj commented Feb 10, 2025

Is string_view different than the new Utf8View that we support (added here: https://github.com/apache/arrow-julia/pull/512/files#diff-bdc4e5cd6aa22fdc5e659e805b70c4763308be9f41128c42db5eeb3c13ed8631)?

@kou
Member

kou commented Feb 10, 2025

Oh, sorry. They are the same type.
