-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path20160622_ggplot2_contours.rmd
281 lines (225 loc) · 14.9 KB
/
20160622_ggplot2_contours.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
---
title: "Plotting greyscale contoured data in R with ggplot2"
author: "Philip Massie"
date: 2016-06-22
draft: true
tags: [ "R", "ggplot2", "contours", "visualisation", "greyscale" ]
output:
html_document:
keep_md: yes
toc: yes
---
```{r global_options, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE, dev="png",
dev.args=list(type="cairo"),
dpi=96)
```
### Introduction
Preparing figures for publication can take a long time (well it does for me anyway), and I relied very heavily on numerous online resources to figure out some of the dos and don'ts. Obviously I owe massive thanks to the hundreds of blogs and Stack Exchange questions and answers I relied on. Where I can, I will try to link to them.
Additionally, this tutorial covers a little bit about working with reshaped gridded data, shape files, GeoTIFF data and even a simple homemade polygon. This then is my contribution to folks trying to get their plots to look the way they need them to.
As most people know, publishing in colour is way more expensive than in grey scale. The costs were completely prohibitive for myself and my co-authors, so I made efforts to change my beautiful, spectacularly coloured plots, which looked so nice in slide shows, into grey scale.
This is also my first attempt to use R Markdown in a real-world scenario. Please feel free to provide critical feedback.
### The data
This plot is showing mean values of the derived variable eddy kinetic (EKE) energy to the south west of Marion Island (in the Southern Ocean) from 2008 to 2010. Interactions between the Antarctic Circumpolar Current and a series of undersea faults in the South West Indian Ridge, give rise to an area of instability, and it is the degree and extent of this instability that I set out to investigate via EKE.
EKE is derived from u and v components [like this](http://goo.gl/4c3ccI). The friendly folks at [AVISO](http://www.aviso.altimetry.fr) have kindly let me use and make available this subset of their data for the purpose of this tutorial. Perhaps if someone is interested I'll make another post showing how I went about the calculations and wrangling.
If you are interested in this sort of data, please check out AVISO. They make available unfathomable amounts of absolutely beautiful data for free. All you need is an account and it's yours - Honestly access to data like this blows my mind. I wish it was like this with South African regional data... but that's another story.
### Method
Start by loading all the necessary libraries, setting working directories and loading data. You can download the EKE data set [here](http://www.philmassie.com/data/data.eke.rdata).
```{r libraries}
library(sp)
library(rgdal)
library(raster)
library(ggplot2)
library(plyr)
load(file = "data/data.eke.rdata")
head(data.eke)
# When we derive EKE, grid cells over land produce NaNs, so we must replace them with NAs
data.eke$eke[which(is.nan(data.eke$eke))] <- NA
```
#### Define our map extents
The EKE data spans from -18.875 E to 43.875 E and -38.88 N to -68.62 N.
Those are the extents needed for the full study, but it's too much for the plot, so we must subset it.
```{r map_extents}
latmin <- -55
latmax <- -43
lonmin <- 20
lonmax <- 43
map.extents <- extent(lonmin, lonmax, latmin, latmax)
data.eke.sub <- subset(data.eke,
data.eke$lat >= latmin &
data.eke$lat <= latmax &
data.eke$lon >= lonmin &
data.eke$lon <= lonmax)
```
#### Shape Files
The plot area includes the Prince Edward Islands. Instead of leaving blank, blocky spaces, let's use some shape files to make it look neater. These shape files come from [http://www.naturalearthdata.com](http://www.naturalearthdata.com). They are from the 'fine scale (1:10), Admin 0 -Countries' set available for free from [here](http://www.naturalearthdata.com/downloads/10m-cultural-vectors/).
```{r shape_files, results = "hide"}
# read in the shape file
countries <- readOGR("shapes", layer="ne_10m_admin_0_countries")
# Add an id column to the data slot of countries, based on rownames
countries@data$id <- rownames(countries@data)
# Fortify the spatial polygons for ggplot2 - likes long format data
# The resulting data frame contains all the spatial data and the id but no information labels.
countries.df <- fortify(countries)
# Use the id column to join all the label data in countries@data to the new dataframe.
countries.df <- join(countries.df, countries@data, by="id")
# Now we can subset the new dataframe to include only South Africa and her Islands and tidy up a bit
za <- subset(countries.df, countries.df$ISO_A2=="ZA")
rm(countries, countries.df)
```
#### GeoTIFF
The plot needs some bathymetric contour lines. We will use the ETOPO1 data for this via a GeoTIFF. There are higher resolution data sets around (e.g. ETOTPO2 and GEBCO) but ETOPO1 will suffice for this application. The ETOPO1 data is (freely?) available from [here](http://www.ngdc.noaa.gov/mgg/global/global.html).
```{r geotiff}
# Load up the geotiff
etopo1.full <- raster("etopo/ETOPO1_Ice_c_geotiff.tif")
# Crop out our ROI
etopo.crop <- crop(etopo1.full, map.extents)
# Prepare for ggplot2 (long format) and tidy up a bit
etopo.crop.df <- as.data.frame(etopo.crop, xy=TRUE)
names(etopo.crop.df) <- c("long", "lat", "z")
rm(etopo1.full)
```
#### Polygon
Lastly we'll build a simple polygon to demarcate an area we consider within the region of elevated EKE (eddy field).
```{r eddy_field}
# Define extents
eflatmin <- -53
eflatmax <- -47.33
eflonmin <- 27.33
eflonmax <- 37.66
# bind the data into sequential coordinates, outlining the new polygon and generate a polygon.
ef.coords <- cbind(c(eflonmin, eflonmax, eflonmax, eflonmin, eflonmin),
c(eflatmax, eflatmax, eflatmin, eflatmin, eflatmax))
ef.poly <- Polygon(ef.coords)
# Prepare for ggplot2 and tidy up
ef.df <- fortify(ef.poly)
# set an individual group for plotting
ef.df$group <-1
rm(ef.coords, ef.poly)
```
### Plot time
Finally we have all the bits in place so we can build a plot. We'll start with the colourful version, and then see how to refine it to a grey scale contoured version.
#### Preliminaries
We start by defining the plot label positions and text and then set some custom theme stuff. See [here](http://docs.ggplot2.org/current/theme.html) for more on theming.
```{r plot_prelims}
# define some lables and their coordinates
lbl <- data.frame(x = c(40, 36, 30, 37.1, 22.5),
y = c(-46.5, -53.4, -50, -45, -51.5),
txt = c("Marion Island", "Eddy Field", " ", "SWIR", "SWIR"))
# Define the theme data as I like it for these plots...
thm <- theme_bw() +
theme(axis.text.x = element_text(size=8, face = "plain"),
axis.text.y = element_text(size=8, face = "plain"),
axis.title.x = element_text(size=8, face = "plain"),
axis.title.y = element_text(size=8, face = "plain"),
axis.ticks.x = element_line(size=0.3),
axis.ticks.y = element_line(size=0.3),
legend.key.height = unit(13, units="mm"),
legend.text = element_text(size=8, face = "plain"),
legend.title = element_text(size=8, angle = 90, vjust = 1, face = "plain"),
legend.title.align = 0.5,
panel.border = element_rect(colour = "black", fill=NA, size=.3))
```
#### Coloured tiles plot
Ggplot2 is even more awesome than anything you might have read, but it can be intimidating (I find). The best I can suggest is to keep playing and you'll start to get a handle on. One of the biggest difficulties for me was figuring out what order the layers should be in. This is where I ended up. I'd like to tell you that I had figured out an overriding set of rules, but there was some trial and error involved.
I have tried to comment the coloured plot extensively.
<<<<<<< HEAD
```{r coloured_plot, fig.width=8}
=======
```{r coloured_plot, fig.width=8, fig.align="center"}
>>>>>>> 8e329665c5f98353bccc132327e2167ca598c39f
# initiate the plot
plot.eke.colour <- ggplot() +
# include a layer with the eke data as coloured tile data
geom_tile(data=data.eke.sub, aes(x=lon,y=lat,fill=eke)) +
# include the shape file data
geom_polygon(data = za, aes(x = long, y = lat, group = group),
colour="black", fill="white", alpha=1, size = 0.3) +
# include the home made polygon data
geom_polygon(data=ef.df, aes(x=long, y=lat, group=group),
colour="black", fill="white", alpha=0, linetype="dashed", size = 0.3) +
# include the bathymetry data with geom_contour. Define a single custom break
geom_contour(data=etopo.crop.df, aes(x=long,y=lat,z=z),
breaks=c(-3000), colour="black", size = 0.3) +
# axis lables
labs(x = 'Longitude (ºE)', y = 'Latitude (ºN)') +
# map projection - boring choice but suits the task at hand
coord_map("mercator") +
# this trims unwanted whitespace around the plot
coord_fixed(xlim = c(lonmin, lonmax), ylim = c(latmin, latmax), expand = FALSE) +
# change the default EKE colour to more traditional rainbow colours (oceanography etc)
scale_fill_gradientn(colours = rev(rainbow(7, end=4/6)),
space = "Lab",
guide = guide_colorbar(title="Eddy kinetic energy (cm²/s²)",
title.position="right")) +
# include the lables defined earlier
geom_text(data=lbl, aes(x=x, y=y, label=txt), size=rel(3), colour="white", fontface="bold", alpha = 1) +
geom_text(data=lbl, aes(x=x+0.01, y=y+0.01, label=txt), size=rel(3), colour="black", fontface="bold") +
# include the theme settings
thm
plot.eke.colour
```
That looks pretty good. Unfortunately when we save the file, things don't always look the same as they do on the screen so there is generally some fine tuning involved. This version, as it stands, makes rather a good print. The colour palette is familiar to all my MATLAB/Python oceanographic colleagues too.
***Saving the plot according to the publishers requirements***
For completeness, this is one approach to saving your plots
```{r save_coloured_plot}
ggsave(filename = 'plot.eke.colour.png', plot = plot.eke.colour, width = 174, height = 105, units="mm", dpi = 300, type="cairo-png")
```
#### Greyscale contoured plot
If we just jump in and plot the existing data using the stat_contour() approach we immediately hit a problem. It's a little difficult to explain, but easy enough to understand if you play around a bit. I'll try my best.
Some of the contoured values exist as closed polygons within the plot frame. These plot as we would expect. Unfortunately, other contoured values are (now) polylines. These lines would have closed somewhere outside of the plot extents, but were cropped, resulting in open polygons/polylines. As a result, ggplot2 can't fill them with shading as required. Ggplot2 closes them automatically by joining their ends. This is pretty catastrophic. You can see what happens for yourself by exchanging `data.eke.sub.contour` for `data.eke.sub` in the plot code below.
***Calculate a low arbitrary value***
To avoid this we need to add extra, make-believe EKE data just outside the plotting frame. We set this to a low, arbitrary value which allows ggplot2 to close off the contour polygons which we are interested in. What the arbitrary value is, is not *that* important, as long as it is lower than everything else. I found this approach on [stackoverflow here](http://stackoverflow.com/questions/28469829/how-to-fill-in-the-contour-fully-using-stat-contour). It looks at the data spread, divides it into n bins and sets an arbitrary value according to the lowest value minus the bin width * 1.5.
```{r contour_fix}
bins <- 50
bin.width <- diff(range(na.omit(data.eke.sub$eke))) / bins
arbitary.value <- min(na.omit(data.eke.sub$eke)) - bin.width * 1.5
# Build some data frames representing 1 degree of extra EKE data in each direction, with the arbitrary value as EKE
min <- sapply(data.eke.sub, min, na.rm = TRUE)
max <- sapply(data.eke.sub, max, na.rm = TRUE)
seq.lat <- seq(min['lat'], max['lat'], 0.25)
seq.lon <- seq(min['lon'], max['lon'], 0.25)
west.edge <- data.frame(lon = rep(min['lon'] - 1, length(seq.lat)), lat = seq.lat, eke = arbitary.value)
east.edge <- data.frame(lon = rep(max['lon'] + 1, length(seq.lat)), lat = seq.lat, eke = arbitary.value)
south.edge <- data.frame(lon = seq.lon, lat = rep(min['lat'] - 1, length(seq.lon)), eke = arbitary.value)
north.edge <- data.frame(lon = seq.lon, lat = rep(max['lat'] + 1, length(seq.lon)), eke = arbitary.value)
# rbind the new data to the existing data and tidy up
data.eke.sub.contour <- rbind(data.eke.sub, west.edge, east.edge, north.edge, south.edge)
rm(bins, bin.width, arbitary.value, min, max, seq.lat, seq.lon, west.edge, east.edge, north.edge, south.edge)
```
***Now we plot***
Now, with our newly padded dataset, we can complete the greyscale plot. I've only commented the changed lines in this plot (vs. the colour plot above).
<<<<<<< HEAD
```{r greyscale_plot, fig.width=8}
=======
```{r greyscale_plot, fig.width=8, fig.align="center"}
>>>>>>> 8e329665c5f98353bccc132327e2167ca598c39f
plot.eke.contour.grey <- ggplot() +
# This line denotes the newly padded data, as a contour statistic.
# ..level.. indicates that the fill aesthetic is calculated by the contour statistic.
# breaks manually define which contours we want to keep.
stat_contour(data = data.eke.sub.contour, aes(x = lon, y = lat, z = eke, fill = ..level..),
geom = "polygon", breaks = c(0, 0.02, 0.04, 0.06, 0.08)) +
geom_polygon(data = za, aes(x = long, y = lat, group = group),
colour="black", fill="white", alpha=1, size = 0.3) +
geom_polygon(data=ef.df, aes(x=long, y=lat, group=group),
colour="black", fill="white", alpha=0, linetype="dashed", size = 0.3) +
geom_contour(data=etopo.crop.df, aes(x=long,y=lat,z=z), breaks=c(-3000), colour="black", size = 0.3) +
coord_map("mercator") +
coord_fixed(xlim = c(lonmin, lonmax), ylim = c(latmin, latmax), expand = FALSE) +
labs(x = 'Longitude (ºE)', y = 'Latitude (ºN)') +
# Here the grey scale colours are defined for the plot and the legend
scale_fill_gradient(low = "gray25", high = "gray95",
space = "Lab",
guide = guide_colorbar(title="Eddy kinetic energy (cm²/s²)", title.position="right")) +
geom_text(data=lbl, aes(x=x, y=y, label=txt), size=rel(3), colour="white") +
thm
plot.eke.contour.grey
```
There we go! The contours look great. There aren't too many of them, they don't have distracting borders, the 3000 m bathymetry contour looks nice and isn't overly distracting.
***Saving the plot according to the publishers requirements***
For completeness, this is one approach to saving your plots
```{r save_greyscale_plot}
ggsave(filename = 'plot.eke.contour.grey.png', plot = plot.eke.contour.grey, width = 174, height = 105, units="mm", dpi = 300, type="cairo-png")
```
I hope this helps someone :)