Downloading OpenAQ data is painfully, unnecessarily slow. For a one-year time range (2021-05-01 to 2022-05-01) and a single country (Thailand), the mean runtime is 380 min.
The helper utility openaq.py runs single-threaded, with two nested loops: an outer loop over each day in the range and an inner loop over each page for the current day.
Are there any potential alternatives?
Rewrite openaq.py to allow for an arbitrary number of threads, using Python's built-in multiprocessing library and the ratelimit Python package (https://github.com/tomasbasham/ratelimit)
Create a new function called get_openaq_page(date_from,date_to,country_id,limit,isMobile,parameter,has_geo,page)
Decorate it with @limits(calls=300, period=FIVE_MINUTES), where FIVE_MINUTES is 300 seconds
Create a new function called get_openaq_for_date(date_from,date_to,country_id,limit,isMobile,parameter,has_geo)
Use Python's built-in multiprocessing.dummy to create a thread pool of get_openaq_page calls for each date
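A rough sketch of how the steps above could fit together, not a finished implementation. Several parts are assumptions: the body of get_openaq_page is a placeholder (the real HTTP request is omitted), the `pages` and `threads` keyword arguments are hypothetical additions used to show the fan-out, and `sleep_and_retry` is added so that a thread waits when the limit is hit instead of raising. The import fallback only exists so the sketch runs where ratelimit is not installed.

```python
from multiprocessing.dummy import Pool  # thread-backed Pool with the multiprocessing API

try:
    from ratelimit import limits, sleep_and_retry
except ImportError:
    # Fallback (assumption): no-op decorators so the sketch runs without ratelimit.
    def limits(calls, period):
        return lambda func: func

    def sleep_and_retry(func):
        return func

FIVE_MINUTES = 300  # seconds


@sleep_and_retry  # sleep until the window resets instead of raising
@limits(calls=300, period=FIVE_MINUTES)
def get_openaq_page(date_from, date_to, country_id, limit,
                    isMobile, parameter, has_geo, page):
    # Placeholder body: the real function would request one page from OpenAQ here.
    return {"date_from": date_from, "date_to": date_to, "page": page, "results": []}


def get_openaq_for_date(date_from, date_to, country_id, limit,
                        isMobile, parameter, has_geo, pages=1, threads=4):
    # `pages` and `threads` are hypothetical; the page count is assumed to be known.
    args = [
        (date_from, date_to, country_id, limit, isMobile, parameter, has_geo, page)
        for page in range(1, pages + 1)
    ]
    with Pool(threads) as pool:
        # starmap unpacks each tuple into get_openaq_page's arguments
        # and returns results in the same order as `args`.
        return pool.starmap(get_openaq_page, args)


if __name__ == "__main__":
    result = get_openaq_for_date("2021-05-01", "2021-05-02", "TH", 1000,
                                 False, "pm25", True, pages=3, threads=2)
    print([r["page"] for r in result])
```

Because the rate limit is attached to get_openaq_page itself, all threads share the same 300-calls-per-five-minutes budget regardless of how many workers the pool has.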