"wins" column missing data #1

Open
Ayelet-Iz opened this issue Jan 17, 2025 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@Ayelet-Iz

This is a great dataset that I wanted to use to work on a project.
Unfortunately, the "wins" column is missing data.
All the values are "0".
See, for example, the movie Slumdog Millionaire.
On IMDb: https://www.imdb.com/title/tt1010048/?ref_=nv_sr_srsg_0_tt_8_nm_0_in_0_q_slum
Image

in the dataset (row 27868):
Image
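
A report like this can be double-checked with a few lines of Python. The following is a minimal sketch, assuming the dataset is a CSV file with a header row that includes a `wins` column. The `nominations` column and the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io


def zero_only_columns(rows, columns):
    """Return the columns whose every non-empty value parses to 0."""
    suspicious = []
    for col in columns:
        # Collect non-empty cells; a missing cell is treated as empty.
        values = [(r.get(col) or "").strip() for r in rows]
        values = [v for v in values if v]
        # Assumes the remaining cells are numeric strings.
        if values and all(float(v) == 0 for v in values):
            suspicious.append(col)
    return suspicious


# Toy rows for illustration only; not real values from the dataset.
SAMPLE = """title,wins,nominations
Movie A,0,3
Movie B,0,0
Movie C,0,12
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(zero_only_columns(rows, ["wins", "nominations"]))  # ['wins']
```

A column that comes back from this check was most likely never populated by the scraper, which is different from a column where only some rows are wrong.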

@RaedAddala self-assigned this on Jan 17, 2025
@RaedAddala added the bug label on Jan 17, 2025
@RaedAddala
Owner

Hello @Ayelet-Iz.

Thank you for your interest in the dataset and thank you for noting this issue.

I found another problem with the dataset.

I am working on fixing it, and I will fix this issue too.

Once it is fixed, the data will take some time to scrape again. I will notify you once this issue is solved and the data is ready.

Are there any other features, details, or years that you think are important and should be included?

Have a great day.

@RaedAddala
Owner

Also, the Oscar value in the given example was read incorrectly. I will fix that too.

@Ayelet-Iz
Author

Thanks for your reply. I can't think of anything else that is missing.
Thanks for your work!

@emartins90

The budget also appears not to be in USD. I spot-checked the titles with the top budgets, and they are mostly in South Korean won.
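
One way to catch mixed currencies is to keep the currency marker when parsing the budget instead of discarding it. The following is a rough sketch, assuming the scraped budget is a string such as `$15,000,000 (estimated)` with the currency symbol or code in front of the digits. That format is an assumption, and `split_budget` is a hypothetical helper, not part of the dataset's code.

```python
import re

# Currency marker (any run of non-digit, non-space characters),
# followed by digits with optional thousands separators.
_BUDGET_RE = re.compile(r"^\s*(?P<currency>[^\d\s]+)\s*(?P<amount>\d[\d,]*)")


def split_budget(raw):
    """Split a budget string into (currency_marker, amount), or None."""
    match = _BUDGET_RE.match(raw)
    if match is None:
        return None
    amount = int(match.group("amount").replace(",", ""))
    return match.group("currency"), amount


# Illustrative strings only; not real values from the dataset.
print(split_budget("$15,000,000 (estimated)"))      # ('$', 15000000)
print(split_budget("₩17,000,000,000 (estimated)"))  # ('₩', 17000000000)
print(split_budget("not available"))                # None
```

Storing the marker next to the amount makes it possible to filter or convert non-USD rows later, rather than silently treating every number as dollars.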

@RaedAddala
Owner

@emartins90 @Ayelet-Iz I have worked more on the data and found inconsistencies in the IMDb details pages themselves: the information provided does not follow the same standard across all movies.
This is the reason behind many of the problems. I also assumed that all the budgets would be in dollars, because even foreign (non-American) movies have all their box office info in US dollars, as a universal monetary unit of measure.

My assumptions were wrong. I am changing my code to use an LLM to normalize the information into a single format and enforce a structure.
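
Whatever produces the normalized output, "enforcing a structure" can be sketched as validating each record against a fixed schema before it is written to the dataset. The following is a minimal sketch, assuming the normalizer returns one JSON object per movie. The field names (`title`, `wins`, `budget_amount`, `budget_currency`) and the use of three-letter ISO 4217 currency codes are assumptions for illustration, not the actual format of this project.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MovieRecord:
    # Hypothetical schema; the real dataset may use different fields.
    title: str
    wins: int
    budget_amount: Optional[int]
    budget_currency: Optional[str]  # e.g. "USD", "KRW"


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def parse_record(raw_json):
    """Parse one normalized record, raising ValueError if it is malformed."""
    data = json.loads(raw_json)  # invalid JSON also raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    title = data.get("title")
    wins = data.get("wins")
    amount = data.get("budget_amount")
    currency = data.get("budget_currency")

    if not isinstance(title, str) or not title.strip():
        raise ValueError("title must be a non-empty string")
    if not _is_int(wins) or wins < 0:
        raise ValueError("wins must be a non-negative integer")
    if (amount is None) != (currency is None):
        raise ValueError("budget amount and currency must be given together")
    if amount is not None:
        if not _is_int(amount) or amount < 0:
            raise ValueError("budget_amount must be a non-negative integer")
        if not (isinstance(currency, str) and len(currency) == 3
                and currency.isalpha() and currency.isupper()):
            raise ValueError("budget_currency must be a 3-letter uppercase code")

    return MovieRecord(title, wins, amount, currency)


# Illustrative input only; not a real row from the dataset.
record = parse_record(
    '{"title": "Example Movie", "wins": 2,'
    ' "budget_amount": 1000000, "budget_currency": "KRW"}'
)
print(record.budget_currency)  # KRW
```

Rejecting malformed records at this step means a misread value shows up as an error during scraping instead of as a silent "0" in the published data.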

I am still experimenting, so I may take some time.

I apologize for the delay and I apologize for not noticing such errors early on.

Thank you for your interest and time.

3 participants