
Quick bump - Snowpark_venv issues with frame_to_hyper #348

Closed
skyth540 opened this issue Sep 30, 2024 · 10 comments

Comments

@skyth540

Bumping this comment, which may have been missed.

Running this code:

import polars as pl
import pantab as pt

schema = {
    'string_col': pl.String,
    'cat_col': pl.String,
    'int_col': pl.Int64,
    'float_col': pl.Float32
}

data = {
    'string_col': ['ID129120', 'ID8923879', 'ID89231987', 'ID126735817'],
    'cat_col': ['Apple', 'Orange', 'Pear', 'Peach'],
    'int_col': [44, 23, 6, 88],
    'float_col': [12.25, 4.56, 12.645, 12.098]
}

df = pl.DataFrame(data, schema=schema)

path = 'test.hyper'

# writing this frame crashes the Jupyter kernel (see log below)
pt.frame_to_hyper(df, path, table='test')

Results in this error:

Temp Storage folder ~\AppData\Roaming\Code\User\globalStorage\ms-toolsai.jupyter\version-2024.8.1
Workspace folder ~\OneDrive\Desktop\Python Projects, Home = c:\Users\nicho
09:49:50.367 [warn] No interpreter with path c:\Python_Virtual_Environments\Snowpark_venv\Snowpark_venv\Scripts\python.exe found in Python API, will convert Uri path to string as Id c:\Python_Virtual_Environments\Snowpark_venv\Snowpark_venv\Scripts\python.exe
09:49:50.829 [info] Starting Kernel (Python Path: ~\anaconda3\python.exe, Conda, 3.12.4) for '~\OneDrive\Desktop\Python Projects\project.ipynb' (disableUI=true)
09:49:52.892 [warn] Kernel Spec for 'Snowpark_venv' (~\AppData\Roaming\jupyter\kernels\snowpark_venv\kernel.json) hidden, as we cannot find a matching interpreter argv = 'C:\Python_Virtual_Environments\Snowpark_venv\Snowpark_venv\Scripts\python.exe'. To resolve this, please change 'C:\Python_Virtual_Environments\Snowpark_venv\Snowpark_venv\Scripts\python.exe' to point to the fully qualified Python executable.
09:49:58.820 [info] Process Execution: ~\anaconda3\python.exe -m pip list
09:49:59.055 [info] Process Execution: ~\anaconda3\python.exe -c "import ipykernel; print(ipykernel.__version__); print("5dc3a68c-e34e-4080-9c3e-2a532b2ccb4d"); print(ipykernel.__file__)"
09:49:59.062 [info] Process Execution: ~\anaconda3\python.exe c:\Users\~\.vscode\extensions\ms-toolsai.jupyter-2024.8.1-win32-x64\pythonFiles\vscode_datascience_helpers\kernel_interrupt_daemon.py --ppid 19224
    > cwd: ~\.vscode\extensions\ms-toolsai.jupyter-2024.8.1-win32-x64\pythonFiles\vscode_datascience_helpers
09:49:59.318 [info] Process Execution: ~\anaconda3\python.exe -m ipykernel_launcher --f=c:\Users\~\AppData\Roaming\jupyter\runtime\kernel-v37c56cd9404949ccc3ff9314442d0987098707aec.json
    > cwd: ~\OneDrive\Desktop\Python Projects
09:50:01.509 [info] Kernel successfully started
09:50:01.526 [info] Process Execution: ~\anaconda3\python.exe c:\Users\~\.vscode\extensions\ms-toolsai.jupyter-2024.8.1-win32-x64\pythonFiles\printJupyterDataDir.py
09:50:02.351 [error] Disposing session as kernel process died ExitCode: 3221225477, Reason: 
@WillAyd (Collaborator) commented Sep 30, 2024

Thank you - I did overlook that message.

Hmm, this is a tough one... my guess is that it is an issue only on Windows. If you have the ability to run from WSL, you might not run into the issue there.

There are a few Windows-specific macros that could be re-evaluated more closely. If you take out all of the string data, does it then work?

The hard part for me as a maintainer of pantab is that I don't have access to a Windows machine and am not very familiar with Windows debugging tools. Additionally, the Hyper API makes it impossible to run ASAN/UBSAN to detect potential code issues in an automated fashion. I've brought that up with the Tableau team in the past, but I don't know that there will be much traction on that request, unfortunately.

@skyth540 (Author)

I am creating process improvements for a BI team, and they ultimately need to be the ones to use my work with minimal additional setup/support... WSL wouldn't be an option for them.

I'm getting the same Jupyter log with this:

import polars as pl
import pantab as pt

schema = {
    'int_col': pl.Int64,
    'float_col': pl.Float32
}

data = {
    'int_col': [44, 23, 6, 88],
    'float_col': [12.25, 4.56, 12.645, 12.098]
}

df = pl.DataFrame(data, schema=schema)

path = 'test.hyper'

# same crash occurs with only numeric columns
pt.frame_to_hyper(df, path, table='test')

I appreciate all your work! Sorry for the challenges; that sounds frustrating.

@WillAyd (Collaborator) commented Sep 30, 2024

The Jupyter log is unfortunately not helpful here; it appears there is a lower-level failure that won't be captured in those logs.

You are getting this when installing the library from GitHub, but not from PyPI, right? It's possible that the CMake setup is missing some corrections to the shared libraries that cibuildwheel and delvewheel take care of when distributing Windows binaries.

Since the code in your most recent example has not been affected by any changes since the last release on PyPI, it may be resolved when I post the new release (assuming the current release is working fine).
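
(For reference, a hypothetical pair of commands for switching between the two installs being compared here, assuming the GitHub source in question is the main branch of the innobi/pantab repository:

python -m pip uninstall pantab
python -m pip install pantab                                    # prebuilt wheel from PyPI
python -m pip install git+https://github.com/innobi/pantab.git  # build from source

The PyPI wheels get the cibuildwheel/delvewheel shared-library fixes mentioned above; a source build may not.)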

@skyth540 (Author) commented Sep 30, 2024

You are probably right... yes, this was off the GitHub branch. I uninstalled and reinstalled from pip, and it isn't crashing anymore, but I am back to my RuntimeError: Could not init schema view from child schema 0: Error parsing schema->format: Unknown format: 'vu' error that is caused by Polars strings.

Sort of a catch-22 until I can install the string fixes from pip
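
(For anyone hitting the same 'vu' error on the released version: 'vu' is Arrow's string-view format, which newer Polars versions can emit for string columns. One possible workaround sketch, assuming pyarrow 16+ and a pantab release that accepts pyarrow tables, is to cast any string-view columns to large_string before writing; this is not an official pantab recipe:

import pyarrow as pa
import pantab as pt

tbl = df.to_arrow()  # df is the Polars frame from the examples above
# Swap any string_view fields for large_string so the Arrow schema is
# parseable by the released pantab; a no-op if none are present.
schema = pa.schema(
    [
        pa.field(f.name, pa.large_string()) if pa.types.is_string_view(f.type) else f
        for f in tbl.schema
    ]
)
pt.frame_to_hyper(tbl.cast(schema), 'test.hyper', table='test')
)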

@WillAyd (Collaborator) commented Sep 30, 2024

What might be nice is if we set up nightly or release candidate builds of pantab to help with this.

That is not something I personally have the bandwidth to take on right now, but @jorwoods has done a lot of great stuff for our CI so maybe has time/interest. Otherwise we would need a community contribution to make this happen!

@WillAyd (Collaborator) commented Oct 2, 2024

@skyth540 Actually, I was able to get release candidates distributed without too much extra trouble. Can you try

python -m pip install --upgrade --pre pantab

to get the 5.1.0 release candidate and see if it fixes your issues?

@skyth540 (Author) commented Oct 2, 2024

Everything was successful up until frame_to_hyper:

params = {"default_database_version": "1"}
pt.frame_to_hyper(step_3.collect(), path, table='test', process_params=params)

RuntimeError: This database does not support 32-bit floating points.
Context: 0xfa6b0e2f

I have strings, categoricals, f32, f64, int8, int64, and date datatypes.

@WillAyd (Collaborator) commented Oct 2, 2024

The issue is that you are asking for a default_database_version of 1, but 32-bit float support was not added until version 4. This isn't a pantab problem so much as a Hyper API limitation tied to the tools that you are using.

If you need database version 1 for your other tools, you will have to cast your 32-bit floats to 64-bit to use that version.

For more info, see the Tableau documentation on that parameter:

https://tableau.github.io/hyper-db/docs/hyper-api/hyper_process/#default_database_version
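
(As a concrete illustration of that cast, a minimal sketch reusing the Polars frame pattern from the earlier examples:

import polars as pl
import pantab as pt

# Upcast every Float32 column to Float64 so the frame can be written
# with default_database_version "1" (Float32 support requires version 4+).
df64 = df.with_columns(pl.col(pl.Float32).cast(pl.Float64))
pt.frame_to_hyper(
    df64,
    'test.hyper',
    table='test',
    process_params={"default_database_version": "1"},
)
)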

@WillAyd (Collaborator) commented Oct 4, 2024

Let me know if you run into any other issues. I'm hoping to cut a release next week if all else is good, especially since Python 3.13 just came out and it would be nice to get a new release with wheels for it.

@WillAyd (Collaborator) commented Oct 8, 2024

Going to go ahead and close this, but if anything else comes up, let me know.

WillAyd closed this as completed Oct 8, 2024