---
title: "Data Management Setup Project"
author: "Lauren Yee"
date: "15/07/2020"
output: word_document
---
```{r setup, include=FALSE}
# Set defaults for every chunk; options in a chunk header apply to that chunk only
knitr::opts_chunk$set(echo = TRUE, fig.path = 'Figs/', dev = "png", dpi = 300)
```
# Setting up an R Project
One of the first steps of every workflow should be to set up a Project within RStudio. A Project is the home for all of the files, images, reports, and code that are used in any given project. Note that when we capitalize the word Project, we’re referring to a specific setup within RStudio, while we refer to general projects that you might work on with the lowercase project.
We use Projects because they create a self-contained folder for a given analysis in R. This means that if you want to share your Project with a colleague, they will not have to reset file paths (or even know anything about file paths!) in order to re-run your analysis.
Furthermore, even if the only person you ever collaborate with is a future version of yourself, using a Project for each of your analyses will mean that you can move the Project folder around on your computer, or even move it to a new computer, and remain confident that the analysis will run in the future (at least in terms of file path structures).
Creating a Project is one of the first steps in working on an R-based data science project in RStudio. To create a Project you will need to first open RStudio.
From within RStudio, follow these steps:

1. Click on File.
2. Select New Project.
3. Choose New Directory.
4. Click on New Project.
5. Enter your Project’s name in the box that says “Directory name.” We recommend choosing a Project name that helps you to remember that this is a project that involves data management and cleaning. Avoid using spaces in your Project name, and instead separate words with hyphens or underscore characters.
6. Choose where to save your Project by clicking on “Browse” next to the box labeled “Create project as a subdirectory of:” If you are just using this to learn and to test out creating a Project, consider placing it in your downloads or another temporary directory so that you remember to remove it later.
7. Click “Create Project”.

At this point, you should have a Project that will serve as a place to store any .R scripts that you create as you work through this text. If you’d like more practice, take a few moments to set up a couple of additional Projects by following the steps listed above. Within each Project, add and save .R scripts. Since this is just for practice, feel free to delete these Projects once you have the hang of the procedure.
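
To illustrate the file-path point made earlier in this section, here is a minimal sketch of building a relative path from inside a Project. The `Figures` folder and `markdown.png` file are the ones this document refers to; the variable names are illustrative only.

```r
# When a Project is opened in RStudio, the working directory is set to the
# Project folder, so paths can be written relative to it.
project_root <- getwd()

# Build a relative path with file.path() rather than pasting strings together;
# it inserts the separator for you.
figure_path <- file.path("Figures", "markdown.png")
print(figure_path)

# Check whether the file is present before trying to use it
file.exists(figure_path)
```

Because the path never mentions where the Project folder sits on disk, the same line should keep working after the folder is moved or shared.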
# Arranging your Markdown document

- Start each program with a description of what it does.
- Then load all required packages.
- Consider what working directory you are in when sourcing a script.
- Use comments to mark off sections of code.
- Put function definitions at the top of your file, or in a separate file if there are many.
- Name and style code consistently.
- Break code into small, discrete pieces.
- Factor out common operations rather than repeating them.
- Keep all of the source files for a project in one directory and use relative paths to access them.
- Keep track of the memory used by your program.
- Always start with a clean environment instead of saving the workspace.
- Keep track of session information in your project folder.
- Have someone else review your code.
- Use version control.
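
Several of the points above can be sketched in one short script skeleton. This is only one possible arrangement: the script name, the `save_session_info()` helper, and the data are hypothetical, and base packages stand in for whatever your analysis actually needs.

```r
# clean_data.R (hypothetical script name)
# Purpose: example skeleton showing one way to arrange a script.

# ---- Packages ----
library(stats)  # stand-ins for the packages your analysis needs
library(utils)

# ---- Functions ----
# Hypothetical helper: write the session information to a text file
save_session_info <- function(path) {
  writeLines(capture.output(sessionInfo()), path)
  invisible(path)
}

# ---- Analysis ----
scores <- c(1, 2, 3, 4, 5)  # placeholder data
mean_score <- mean(scores)

# Memory used by a single object
print(object.size(scores), units = "Kb")

# ---- Session information ----
# Written to a temporary file here; in a real Project, a relative path such
# as "session_info.txt" would keep the record inside the Project folder.
info_file <- save_session_info(tempfile(fileext = ".txt"))
```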
# Style / Naming Conventions
Object names must start with a letter, and can only contain letters, numbers, `_` and `.`
You want your object names to be descriptive, so you’ll need a convention for multiple words. I recommend snake_case, where you separate lowercase words with `_`:

- `i_use_snake_case`
- `otherPeopleUseCamelCase`
- `some.people.use.periods`
- `And_aFew.People_RENOUNCEconvention`
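
As a small illustration of these conventions in practice, the sketch below assigns a couple of snake_case objects, uses base R's `make.names()` to show how an invalid name gets repaired, and defines a helper that converts other styles to snake_case. The `to_snake_case()` helper and the values are hypothetical, written for this example only.

```r
# Descriptive snake_case object names (placeholder values)
mean_body_mass <- 42.5
n_samples <- 10

# make.names() shows how base R repairs a string that is not a valid name
make.names("2nd value")

# Hypothetical helper: convert camelCase or dotted names to snake_case
to_snake_case <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # underscore before capitals
  x <- gsub(".", "_", x, fixed = TRUE)           # periods become underscores
  tolower(x)
}

to_snake_case(c("otherPeopleUseCamelCase", "some.people.use.periods"))
```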
# R Markdown
Open this document in RStudio to see how it is structured in R Markdown, and compare it with the Word document it generates. I recommend viewing RStudio and the Word document side by side.
All R Markdown files end in **.Rmd** rather than **.R**. An R Markdown file starts with a YAML header, where you specify the title and other metadata for your file, as well as the output format to generate, such as a PDF, Word, or HTML document.
Each "chunk" of R code, delimited by three backticks, can be executed independently, and visualizations are generated in-line.
```{r default,echo=FALSE,out.width="90%",out.extra=""}
knitr::include_graphics('./Figures/markdown.png')
```
# Adding chunks
To add a new code chunk, press *Cmd+Option+I* (*Ctrl+Alt+I* on Windows), or click the *Insert* button at the top of this document, then select *R*. R Markdown will add a new, empty chunk at your cursor's location.
Try making a code chunk below:
Examine the chunks below:
```{r}
# Sometimes you might want to run only some of the code
# in a code chunk. To do that, highlight the code to
# run and then press Cmd + Enter (Control + Enter on
# Windows). If you do not highlight any code, R will
# run the line of code that your cursor is on.
# Try it now. Run mean(1:5) but not the line below.
mean(1:5)
warning("You shouldn't run this!")
```
```{r}
# You can click the downward facing arrow to the left of the play button to run
# every chunk above the current code chunk. This is useful if the code in your
# chunk uses objects that you made in previous chunks.
# Sys.Date()
```
Did you notice the green lines in the code chunk above? They are *code comments*, lines of text that R ignores when it runs the code. R will treat everything that appears after `#` on a line as a code comment. As a result, if you run the chunk above, nothing will happen—it is all code comments (and that's fine)!
Remove the `#` on the last line of the chunk above and then rerun the chunk. Can you tell what `Sys.Date()` does?
By the way, you only need to use code comments _inside_ of code chunks. R knows not to try to run the text that you write outside of code chunks.
# Chunk Options
`eval = FALSE` prevents code from being evaluated. This is useful for displaying example code, or for disabling a large block of code without commenting each line.
```{r, eval = FALSE}
mean(1:5)
```
`include = FALSE` runs the code, but doesn’t show the code or results in the final document.
```{r, include = FALSE}
mean(1:5)
```
`echo = FALSE` prevents the code, but not the results, from appearing in the finished file. Use this when writing reports aimed at people who don’t want to see the underlying R code, or to show a figure generated by `ggplot2` without the code that produced it.
```{r, echo=FALSE}
mean(1:5)
```
# Themes in R Markdown
There are built-in themes you can explore in R Markdown that can be changed through the YAML header (more on this in future sessions). Themes apply to HTML output (`html_document`).
Note that embedding images in R Markdown can take many forms; this is just one way. Also note the **relative path** used to reach our Figures directory.
Below is the default R markdown theme.
```{r markdown-output, out.width="90%"}
knitr::include_graphics("./Figures/default_markdown.PNG")
```
# Change Theme
By changing the YAML header theme to 'darkly' we get a different look.
```{r markdown-output2}
knitr::include_graphics("./Figures/darkly_yaml.png")
```
```{r darkly}
knitr::include_graphics("./Figures/darkly.png")
```
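
As a rough sketch of making the same change from code rather than from the YAML header: `theme` is an argument of `rmarkdown::html_document()`, so it only affects HTML output. The snippet assumes the rmarkdown package is installed, and the render call is left commented out because it would rebuild this file.

```r
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  # Build an HTML output format that uses the 'darkly' theme
  darkly_format <- rmarkdown::html_document(theme = "darkly")

  # Pass the format to render() to knit with it:
  # rmarkdown::render("Data-Management-Setup-Project-01.Rmd",
  #                   output_format = darkly_format)
}
```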
# Text formatting
Have you noticed the funny highlighting that appears in this document? R Markdown treats text surrounded by *asterisks*, **double asterisks**, and `backticks` in special ways. It is R Markdown's way of saying that these words are in
- _italics_
- *also italics*
- **bold**, and
- `code font`
`*`, `**`, and \` are signals used by a text editing format known as `markdown`. R Markdown uses `markdown` to turn your plain looking .Rmd documents into polished reports. Let's give that a try.
# Reports
When you click the `knit` button at the top of an R Markdown file (like this one), R Markdown generates a polished copy of your report. R Markdown:
1. Transforms all of your markdown cues into actual formatted text (e.g. bold text, italic text, etc.)
2. Reruns all of your code chunks in a clean R session and appends the results to the finished report.
3. Saves the finished report alongside your .Rmd file
Click the *Knit* button at the top of this document or press *Cmd+Shift+K* (*Ctrl+Shift+K* on Windows) to render the finished report. The RStudio IDE will open the report so you can see its contents. The report format is set by the `output` field in the YAML header; this document specifies `word_document`. Try clicking *Knit* now.
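
Knitting can also be started from the console with `rmarkdown::render()`. The sketch below is a minimal, self-contained illustration rather than a description of what the button does internally: it writes a tiny, hypothetical .Rmd file to a temporary location and renders it only if the rmarkdown package and pandoc are available.

```r
# Write a tiny, hypothetical .Rmd file to a temporary location
demo_rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Demo report\"",
  "output: html_document",
  "---",
  "",
  "Some **bold** and *italic* text."
), demo_rmd)

# Render only if rmarkdown and pandoc are both available
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # render() returns the path of the finished report, saved next to the .Rmd
  report_file <- rmarkdown::render(demo_rmd, quiet = TRUE)
  print(report_file)
}
```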