README.Rmd

---
output:
  rmarkdown::github_document:
    df_print: kable
---
<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# special.epd: SPECIAL Research Group's Version of the European Pollen Database (EPD)
<!-- S<sub>PECIAL</sub> <sub>R</sub>e<sub>search group's version of the</sub> E<sub>uropean</sub> Po<sub>llen</sub> D<sub>atabase (EPD)</sub>  -->
<!-- *S*PECIAL R*e*search group's version of the *E*uropean *Po*llen *D*atabase (EPD)  -->
<!-- **S**PECIAL R**e**search group's version of the **E**uropean **Po**llen **D**atabase (EPD)  -->
<!-- <img src="inst/images/logo.png" alt="logo" align="right" height=200px/> -->
<!-- badges: start -->
`r badger::badge_devel("special-uor/special.epd", "yellow")`
`r badger::badge_github_actions("special-uor/special.epd")`
`r badger::badge_cran_release("special.epd", "black")`
<!-- badges: end -->

The goal of `special.epd` is to provide access to the SPECIAL Research group's version of the European Pollen Database (EPD).

## Installation

You **can(not)** install the released version of `special.epd` from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("special.epd")
```

And the development version from 
[GitHub](https://github.com/special-uor/special.epd) with:

``` r
# install.packages("remotes")
remotes::install_github("special-uor/special.epd", "dev")
```
## Example

#### Load tables to working environment
```{r, eval = FALSE}
data("entity", package = "special.epd")
data("date_info", package = "special.epd")
data("sample", package = "special.epd")
data("age_model", package = "special.epd")
data("pollen_count", package = "special.epd")
```

#### Create a snapshot of entities

The function `special.epd::snapshot` takes few different parameters and based on
the first one, `x`, it returns a variety of snapshots. 

This function returns a list with 5 components:

- `entity`: data frame (`tibble` object) with the metadata associated to the entities.
- `date_info`: data frame (`tibble` object) with the dating information. This one can be linked to the `entity` table using the column called `ID_ENTITY`.
- `sample`: data frame (`tibble` object) with the sampling information. This one can be linked to the `entity` table using the column called `ID_ENTITY`.
- `age_model`: : data frame (`tibble` object) with the "new" age models (created with [**ageR**](https://github.com/special-uor/ageR)). This one can be linked to the `sample` table using the column called `ID_SAMPLE`.
- `pollen_count`: list of data frames (`tibble` objects) containing the pollen counts for 3 levels of "amalgamation":
    - `clean`
    - `intermediate`
    - `amalgamated`
    
  All these data frames can be linked to the `sample` table using the column called `ID_SAMPLE`.

:warning: **NOTE:** the output is returned "invisibly", so you should assign the output of the function to a variable.

```{r, eval = FALSE}
output <- special.epd::snapshot(...)
output$entity
output$date_info
output$sample
output$pollen_count$clean
output$pollen_count$intermediate
output$pollen_count$intermediate
```

##### Using the `entity_name`
```{r, eval = TRUE}
special.epd::snapshot("MBA3")
```

##### Using the `site_name`
```{r, eval = TRUE}
special.epd::snapshot("Abant Golu", use_site_name = TRUE)
```

##### Using the `ID_ENTITY`
```{r, eval = TRUE}
special.epd::snapshot(2)
```

##### Using the `ID_SITE`
```{r, eval = TRUE}
special.epd::snapshot(3, use_id_site = TRUE)
```

##### Extracting multiple records at once
```{r, eval = TRUE}
special.epd::snapshot(1:10)
```

##### Extracting all the records at once
This will run very slow, so if only few entities are required, it would be better
to indicate which, based on the previous examples.

```{r, eval = FALSE}
out <- special.epd::snapshot()
```

#### Export data as individual CSV files

The function `special.epd::write_csvs` takes to parameters:

- `.data`: a list of class `snapshot`, this one can be generated using the function `special.epd::snapshot` (see previous section).
- `prefix`: a prefix name to be included in each individual files, this prefix can include a relative or absolute path to a directory in the local machine.

##### Without a path
```{r, eval = TRUE}
`%>%` <- special.epd::`%>%`
special.epd::snapshot("MBA3") %>%
  special.epd::write_csvs(prefix = "MBA3")
```

###### Output
```{r, echo = FALSE}
paths <- list.files(pattern = "MBA3", full.names = TRUE)
tree <- data.tree::as.Node(data.frame(pathString = paths))
data.tree::SetGraphStyle(tree, rankdir = "TB")
data.tree::SetNodeStyle(tree,
                        style = "filled,rounded",
                        shape = "box")
print(tree)
```

##### Including a path
```{r, eval = FALSE}
`%>%` <- special.epd::`%>%`
special.epd::snapshot("MBA3") %>%
  special.epd::write_csvs(prefix = "/special.uor/epd/MBA3")
```

###### Output
```{r, echo = FALSE}
paths <- list.files(pattern = "MBA3", full.names = TRUE) %>%
  stringr::str_replace_all("./", "/special.uor/epd/")
tree <- data.tree::as.Node(data.frame(pathString = paths))
data.tree::SetGraphStyle(tree, rankdir = "TB")
data.tree::SetNodeStyle(tree,
                        style = "filled,rounded",
                        shape = "box")
print(tree)
```

```{r, eval = TRUE, echo = FALSE}
paths <- list.files(pattern = "MBA3", full.names = TRUE)
unlink(paths)
```


## Spatial distribution of the entities
```{r special-epd, echo = TRUE, cache = TRUE, fig.width = 10, fig.height = 6, dpi = 300}
`%>%` <- special.epd::`%>%`
special.epd::entity %>%
  smpds::plot_climate(var = "elevation", units = "m ASL", 
                      ylim = c(25, 85),
                      xlim = c(-30, 170))
```


## Extract Potential Natural Vegetation (PNV)
Using the package `smpds` [https://github.com/special-uor/smpds] we can extract the PNV for each entity and create a plot:

```{r pnv-code, echo = TRUE, eval = FALSE}
`%>%` <- special.epd::`%>%`
special.epd_pnv <- special.epd::entity %>%
  smpds::extract_biome()

# For a quicker execution
special.epd_pnv <- special.epd::entity %>%
  smpds::parallel_extract_biome(cpus = 4)

# Plot the PNV
special.epd_pnv %>%
  smpds::plot_biome(ylim = c(25, 85),
                    xlim = c(-30, 170))
```

```{r pnv, eval = TRUE, echo = FALSE, cache = TRUE, fig.width = 10, fig.height = 8, dpi = 300}
`%>%` <- special.epd::`%>%`
special.epd_pnv <- special.epd::entity %>%
  smpds::parallel_extract_biome(cpus = 12)

# Plot the PNV
special.epd_pnv %>%
  smpds::plot_biome(ylim = c(25, 85),
                    xlim = c(-30, 170))
```

## Summary of the database
```{r, cache = TRUE}
`%>%` <- special.epd::`%>%`
special_epd_summary <- special.epd::db_summary()
tibble::tibble(
    `# Entities` = nrow(special_epd_summary),
    `with Dates` = sum(special_epd_summary$has_DATES, na.rm = TRUE),
    `with Age models (using IntCal20)` = sum(special_epd_summary$has_AM, na.rm = TRUE),
    `with Pollen counts` = sum(special_epd_summary$has_COUNTS, na.rm = TRUE)
  ) %>%
  knitr::kable()
```