The goal of analytics
is to provide an easy way to access and analyze
“dark” data generated before, during, and after the Data Science for
Open Wash Data Course.
To install the package, you can use the following code:
# install.packages("devtools")
devtools::install_github("openwashdata/analytics")
#> ── R CMD build ─────────────────────────────────────────────────────────────────
#> * checking for file ‘/private/var/folders/q2/thf5k95955q8kn6twjrwljbr00jms0/T/RtmpFxjxao/remotesfda540e67cf3/openwashdata-analytics-d6c7304d3c4ef0168740664ab0f63cf2f1f787ea/DESCRIPTION’ ... OK
#> * preparing ‘analytics’:
#> * checking DESCRIPTION meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * building ‘analytics_0.1.0.tar.gz’
#> Warning: invalid uid value replaced by that for user 'nobody'
The Data Science for Open WASH Data Course conducted by Global Health Engineering (GHE) at ETH Zürich generates data on participants’ engagement with the course, previous experiences with programming and take-aways from the course. This package makes it easier to access data stored in a variety of formats and provide a consolidate storage for it. In the future, this data will be used to provide an overview of the impact of the course.
In short, anyone can use this package. However, access to raw data is restricted to members of GHE and those invited by members of GHE. A data package with data from the first iteration of the course will be available soon. This data has been anonymized and can be used by anyone.
-
Google Drive: The GHE shared google drive collects data from survey responses. This data is stored in Google Sheets.
-
Posit Cloud: Data about usage of a shared Posit Cloud space is retrieved from Posit cloud. The shared space was accessible to all course participants.
-
Plausible: The website for the course is hosted on Plausible. Data about visitors, locations of visits and other trends in user interactions with the website are retrieved from Plausible.
-
Github: Data from the course is also present on Github. This is generally course participants’ locations. This is anonymized.
Certain permissions and API Keys are needed to access the data.
- Google Drive:
- A gmail account (username and password) with access to the relevant sheets OR
- A google service account with at least
read
access to the Google Sheet you need to read. - If logging in for the first time, you will be prompted to authenticate
your account and provide access permissions to the
googledrive
library
- Posit Cloud:
- A posit cloud account
- Access to the data science for open wash data course space on Posit Cloud
- An API key with access to read data from the course space
- Note: This code utilises the rscloud library which handles
authentication silently. There is no need to pass API keys to the
function but one needs to be generated. If required, please use
rscloud_whoami()
to re-authenticate
- Plausible:
- Github:
- A github account with access to the repository where the data is stored
- A personal access token with repo scope to access the data
All data pulled by this library is also available in a postgres database. Access to this database is limited. If you have the username and password for this data, you can access it directly using your credentials.
On the other hand, you can also write to the database. This function also cleans and archives old data. Write and Delete permission to the database is required for this.
For detailed usage of and documentation of each function, please refer to the vignettes.
library(analytics)
library(httr)
library(jsonlite)
library(googledrive)
library(googlesheets4)
get_plausible_data(site_url="ds4owd-001.github.io/website", pagewise=TRUE, token="SECRET_PLAUSIBLE_TOKEN")
get_survey_data(sheet_url="docs.google.com/spreadsheets", email="yourname@ethz.ch", n_course_modules=10)
get_pscloud_data(space_id="12345")
R libraries utilized in the code are: - RPostgres
- googledrive
-
googlesheets4
- rscloud
- httr
- jsonlite
- lubridate
-
tidyverse
Other standard libraries should already be installed in your default R environment.