Speed up download of OpenAQ data #29

AnthonyMockler · 2022-08-23T21:56:49Z

Describe the problem and proposed solution

Downloading OpenAQ data is painfully and unnecessarily slow. For a one-year time range (2021-05-01 to 2022-05-01) and a single country (Thailand), the mean runtime is 380 min.
The helper utility openaq.py runs single-threaded, with two nested loops: an outer loop over each day in the range and an inner loop over each page of results for the current day.
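Because the outer loop issues one query per day, a one-year range means 365 sequential day windows, and each window may span several pages. Below is a minimal sketch of that day splitting. It assumes each request is bounded by ISO date strings, and the helper name `daily_windows` is hypothetical, not taken from openaq.py.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def daily_windows(start: date, end: date) -> Iterator[Tuple[str, str]]:
    """Yield (date_from, date_to) ISO strings for each day in [start, end).

    Each window covers exactly one day, mirroring the outer per-day loop.
    """
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        yield day.isoformat(), nxt.isoformat()
        day = nxt


windows = list(daily_windows(date(2021, 5, 1), date(2022, 5, 1)))
print(len(windows))  # 365
print(windows[0])    # ('2021-05-01', '2021-05-02')
```

At the reported 380 min mean, that works out to roughly one minute per day window.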

Are there any potential alternatives?
Rewrite openaq.py to support an arbitrary number of threads, using Python's built-in multiprocessing library and the ratelimit Python package (https://github.com/tomasbasham/ratelimit).
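The work is I/O bound because most of the time goes to waiting on HTTP responses, so threads from `multiprocessing.dummy` should be sufficient. The catch is that all threads must share one rate limit. The sketch below uses a small stdlib sliding-window limiter as a stand-in for the ratelimit package's `@sleep_and_retry` and `@limits` decorators, so it runs without extra dependencies. `rate_limited` and `fake_request` are hypothetical names. The budget of 300 calls per five minutes comes from the proposal in this issue, not from OpenAQ's documentation.

```python
import threading
import time
from collections import deque
from functools import wraps
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed, same API as Pool

FIVE_MINUTES = 300  # seconds


def rate_limited(calls, period):
    """Block callers so that at most `calls` invocations start per `period` seconds.

    Stdlib stand-in for ratelimit's @sleep_and_retry + @limits; the lock lets
    every worker thread share one limiter safely.
    """
    def decorator(func):
        lock = threading.Lock()
        starts = deque()  # monotonic timestamps of recent call starts

        @wraps(func)
        def wrapper(*args, **kwargs):
            while True:
                with lock:
                    now = time.monotonic()
                    # Drop timestamps that have left the sliding window.
                    while starts and now - starts[0] >= period:
                        starts.popleft()
                    if len(starts) < calls:
                        starts.append(now)
                        break
                    wait = period - (now - starts[0])
                time.sleep(wait)  # sleep outside the lock so other threads can check
            return func(*args, **kwargs)
        return wrapper
    return decorator


@rate_limited(calls=300, period=FIVE_MINUTES)
def fake_request(page):
    """Hypothetical stand-in for one HTTP request."""
    time.sleep(0.1)  # simulated network latency
    return page * 10


with ThreadPool(8) as pool:
    results = pool.map(fake_request, range(1, 17))  # map preserves input order
print(results[:3])  # [10, 20, 30]
```

With the ratelimit package itself, the equivalent would be stacking `@sleep_and_retry` above `@limits(calls=300, period=FIVE_MINUTES)`. Without `sleep_and_retry`, `limits` raises `RateLimitException` instead of waiting.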

Create a new function, get_openaq_page(date_from, date_to, country_id, limit, isMobile, parameter, has_geo, page), that fetches a single page.
Decorate it with @limits(calls=300, period=FIVE_MINUTES), where FIVE_MINUTES = 300 seconds, to cap the request rate at 300 calls per five minutes.
Create a new function, get_openaq_for_date(date_from, date_to, country_id, limit, isMobile, parameter, has_geo), that fetches every page for one date.
Inside it, use Python's built-in multiprocessing.dummy to create a ThreadPool that runs get_openaq_page for each page of that date.
Replace the existing try/except retry loop with decorators from the Python package backoff (https://github.com/litl/backoff).
