(Chinese) A few illegal characters � show up when the file is large enough #67

Open
duanyukai opened this issue Jan 13, 2020 · 6 comments

Comments

@duanyukai

I found this strange bug: after parsing my Excel file, which contains a large number of Chinese characters, the output contains a small number of invalid UTF-8 characters (shown as �).
I can only reproduce this bug when the file is large enough. The test case file is below; I just copied the same line many times.

测试.xlsx

image

@catamphetamine
Owner

Hmm, no idea.
I guess this issue should stay open so that other Chinese-speaking users could see it.

@duanyukai
Author

I'll try to find some other, smaller test cases. It seems like a fault with a buffer or something else?

@catamphetamine
Owner

It seems like a fault with a buffer or something else?

Absolutely no idea.
Sometimes I think that we should find an alternative simple Excel reading library and place the link in the readme: this library is intended for really simple cases, and people say it won't always work for large files.

@catamphetamine catamphetamine changed the title A few illegal characters show when the excel file is large enough (Chinese) A few illegal characters � show up when the excel file is large enough Feb 15, 2020
@catamphetamine catamphetamine changed the title (Chinese) A few illegal characters � show up when the excel file is large enough (Chinese) A few illegal characters � show up when the file is large enough Feb 15, 2020
@plaa

plaa commented Aug 26, 2021

Encountered this also in Finnish words, where Näytä was converted into N��ytä. The second ä is correct, but the first one becomes two Unicode replacement characters (U+FFFD).

This was triggered by modifying other cell values (the same value was read correctly previously). Adding any text in front of Näytä results in correct conversion, so this seems to require some very specific conditions to manifest.
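
One possible explanation, offered as a guess rather than a diagnosis of this library: the symptom matches what happens when UTF-8 bytes are decoded chunk by chunk and a chunk boundary lands inside a multi-byte character. "ä" is two bytes in UTF-8 (0xC3 0xA4), so splitting it yields exactly two replacement characters, and adding text in front shifts where the boundary falls. The sketch below reproduces that effect with the standard `TextDecoder`. The split position is picked by hand for illustration, and nothing here is taken from the library's code.

```typescript
// "Näytä" with precomposed "ä" (U+00E4), which is 0xC3 0xA4 in UTF-8.
const bytes = new TextEncoder().encode("N\u00e4yt\u00e4");

// Hypothetical chunk boundary falling between the two bytes of the first "ä".
const first = bytes.slice(0, 2); // "N" + 0xC3
const second = bytes.slice(2);   // 0xA4 + "ytä"

// Decoding each chunk on its own: each half of the split "ä" is invalid
// by itself, so each half turns into U+FFFD.
const naive =
  new TextDecoder().decode(first) + new TextDecoder().decode(second);

// Streaming decode: a single decoder holds the incomplete sequence back
// and completes it when the next chunk arrives.
const decoder = new TextDecoder("utf-8");
const streamed =
  decoder.decode(first, { stream: true }) +
  decoder.decode(second, { stream: true }) +
  decoder.decode(); // flush any remaining bytes

console.log(naive);    // N��ytä
console.log(streamed); // Näytä
```

If this is the cause, it would also fit the original report: most common Chinese characters take three bytes in UTF-8, so a file full of them gives a chunk boundary many more chances to cut a character in half.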

@plaa

plaa commented Aug 26, 2021

As for "large enough", our file is 28 kB (185 rows by 5 columns), which I consider pretty small.

@catamphetamine
Owner

catamphetamine commented Aug 26, 2021

@plaa Attach a file illustrating the bug so that someone could potentially look at it.
