
Speed up download of OpenAQ data #29

Open
AnthonyMockler opened this issue Aug 23, 2022 · 0 comments

Describe the problem and proposed solution

Downloading OpenAQ data is painfully, unnecessarily slow. For a one-year time range (2021-05-01 to 2022-05-01) and a single country (Thailand), the mean runtime is 380 min.
The helper utility openaq.py runs single-threaded, with two nested loops: an outer loop over each day in the range and an inner loop over each page of results for the current day.
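
For illustration only, the nested-loop shape described above might look like the sketch below. This is not the actual openaq.py code: `fetch_page` and `download_sequential` are hypothetical names, and the stub pretends every day has three pages of results instead of making real HTTP requests. The point is that every request waits for the previous one to finish, so the total runtime is the sum of all request latencies (380 min over 365 days works out to roughly one minute per day).

```python
from datetime import date, timedelta


def fetch_page(day, page):
    """Hypothetical stand-in for one blocking HTTP request.

    Assumption: every day has 3 pages of results, then an empty page.
    """
    return [f"{day}:{page}"] if page <= 3 else []


def download_sequential(start, end):
    """Walk days and pages one at a time, as the issue describes."""
    rows, requests_made = [], 0
    day = start
    while day < end:              # outer loop: one iteration per day
        page = 1
        while True:               # inner loop: one iteration per page
            batch = fetch_page(day, page)
            requests_made += 1
            if not batch:         # an empty page ends the current day
                break
            rows.extend(batch)
            page += 1
        day += timedelta(days=1)
    return rows, requests_made


rows, requests_made = download_sequential(date(2021, 5, 1), date(2021, 5, 8))
```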

Are there any potential alternatives?
Rewrite openaq.py to allow for an arbitrary number of threads, using Python's built-in multiprocessing library and the ratelimit package (https://github.com/tomasbasham/ratelimit):

  • Create a new function, get_openaq_page(date_from, date_to, country_id, limit, isMobile, parameter, has_geo, page), that fetches a single page of results
  • Decorate it with @limits(calls=300, period=FIVE_MINUTES)
  • Create a new function, get_openaq_for_date(date_from, date_to, country_id, limit, isMobile, parameter, has_geo), that collects every page for one date
  • Use Python's built-in multiprocessing.dummy to create a thread pool that runs the get_openaq_page calls for each date
  • Replace the existing try/except retry loop with backoffs from the backoff package (https://github.com/litl/backoff)
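
A minimal, standard-library-only sketch of how these pieces could fit together is shown below. To keep it runnable without extra installs, a hand-rolled `RateLimiter` class stands in for ratelimit's `@limits` decorator and a `with_retries` wrapper stands in for the backoff package; both are assumptions made for illustration, as are the `n_pages` and `threads` parameters and the simulated request body. Unlike `@limits`, which raises an exception when the limit is exceeded, this limiter simply blocks until a slot is free.

```python
import functools
import threading
import time
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool

FIVE_MINUTES = 300  # seconds


class RateLimiter:
    """Hypothetical stand-in for ratelimit: at most `calls` per `period` seconds."""

    def __init__(self, calls, period):
        self.calls, self.period = calls, period
        self.lock = threading.Lock()
        self.stamps = []

    def acquire(self):
        while True:
            with self.lock:
                now = time.monotonic()
                # Keep only timestamps still inside the sliding window.
                self.stamps = [t for t in self.stamps if now - t < self.period]
                if len(self.stamps) < self.calls:
                    self.stamps.append(now)
                    return
                wait = self.period - (now - self.stamps[0])
            time.sleep(max(wait, 0.01))


def with_retries(func, max_tries=5, base_delay=0.01):
    """Hypothetical stand-in for backoff: retry with exponential delays."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(max_tries):
            try:
                return func(*args, **kwargs)
            except ConnectionError:
                if attempt == max_tries - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt)
    return wrapper


limiter = RateLimiter(calls=300, period=FIVE_MINUTES)
_seen, _seen_lock = set(), threading.Lock()


@with_retries
def get_openaq_page(date_from, date_to, country_id, limit, isMobile,
                    parameter, has_geo, page):
    limiter.acquire()
    # Simulated request (no network): page 2 fails once, then succeeds.
    with _seen_lock:
        first_attempt = (date_from, page) not in _seen
        _seen.add((date_from, page))
    if page == 2 and first_attempt:
        raise ConnectionError("simulated transient failure")
    return [(date_from, page, i) for i in range(limit)]


def get_openaq_for_date(date_from, date_to, country_id, limit, isMobile,
                        parameter, has_geo, n_pages=3, threads=4):
    # Assumption: the page count is known up front (n_pages is hypothetical).
    args = [(date_from, date_to, country_id, limit, isMobile, parameter,
             has_geo, page) for page in range(1, n_pages + 1)]
    with ThreadPool(threads) as pool:
        pages = pool.starmap(get_openaq_page, args)  # results keep page order
    return [row for page in pages for row in page]


rows = get_openaq_for_date("2021-05-01", "2021-05-02", "TH", 2, False,
                           "pm25", True)
```

Threads suit this workload because each request spends most of its time waiting on the network, and sharing one limiter across all threads keeps the combined request rate under the 300-calls-per-five-minutes budget regardless of the pool size.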