-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path07a_git.qmd
612 lines (382 loc) · 21.2 KB
/
07a_git.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
---
title: "Version Control with Git & GitHub"
subtitle: "Version Control"
---
> Hands up - *who has heard of version control software?*\
> *What do you think it does?*
<br>
# What is Version control? :thinking:
The **management of changes** to documents, computer programs, large web sites, and other collections of information.
**Examples:**
- Numbering of book editions
- Wikipedia's Page history
<br>
## Where did it come from?
The need for a logical way to organize and control revisions has existed for almost as long as writing has existed, but **revision control became much more important, and complicated when the era of computing began**
## Elements of a Version Control system
- Changes are usually identified by a number or letter code, termed the "revision number"
- Each revision is associated with a timestamp and the person making the change.
- Only changes to a file are recorded rather than saving a whole new copy.
- Revisions can be compared, restored, and with some types of files, merged.
<br>
# What is git? :thinking:
Open source (free to use) Version control software. Usually accessed via the command line, or a client program.
```{r, out.width="40%", echo=FALSE}
knitr::include_graphics("https://git-scm.com/images/logos/downloads/Git-Logo-2Color.png")
```
## Where did it come from?
Git development began in 2006 after many developers of the Linux kernel gave up access to [BitKeeper](http://www.bitkeeper.org/) (at the time the best but proprietary)
## Why use it in research?
::: grid
::: g-col-md-6
#### Exhibit A
[](https://xkcd.com/)
:::
::: g-col-md-6
#### Exhibit B
](http://www.phdcomics.com/comics/archive/phd101212s.gif){height="400"}
:::
:::
# What is GitHub :thinking:
{width="50%"}
A **website** that allows you to **store your Git repositories online** and makes it easy to collaborate with others. They also provide other services like issue (bug) tracking and wikis. Similar services are [GitLab](https://gitlab.com) and [BitBucket](https://bitbucket.org/).
## Why use it in research:
To enable collaboration and track contributions
<img src="assets/workflows.png" width="50%"/><img src="assets/collab.png" width="50%"/>
*images: Mozilla Science Lab* CC-BY 4.0
<br>
- Acts as a remote back-up
- Facilitates transparency
- Facilitates project management
- Facilitates sharing and collaboration
- Super-charges innovation by Open Sourcing Science
- [Mozilla & Working Open](https://www.youtube.com/watch?v=quKdaqlR_9w)
- [Open Source Basics](https://www.youtube.com/watch?v=Tyd0FO0tko8)
- [Reinventing Discovery](https://www.amazon.co.uk/Reinventing-Discovery-New-Networked-Science/dp/0691160198/ref=sr_1_1?ie=UTF8&qid=1544031907&sr=8-1&keywords=reinventing+discovery)
- [Macroecological and macroevolutionary patterns emerge in the universe of GNU/Linux operating systems](https://onlinelibrary.wiley.com/doi/10.1111/ecog.03424)
# Anatomy of GitHub Repo
- **Readme files**. Create a `README.md` file to explain what your project is, and how to install and use it. `README.md` is the file that is automatically displayed when you open a GitHub repo.
- **License**. Without some sort of licence, the contents of the repository are technically closed. Some allow users of the code to do anything they like with their code - these are known as permissive licences. Examples are the [MIT Licence](https://tldrlegal.com/license/mit-license) or [Apache](https://tldrlegal.com/license/apache-license-2.0-(apache-2.0)).
- <https://choosealicense.com/> - does what it says on the tin and helps you choose a licence. Here are some resources to help you choose:
- <https://tldrlegal.com/> - plain english explanations of licences in bullet form.
- **Contributing guide** - make a file called `CONTRIBUTING.md` and guidelines for contributors so they know what they should do if they want to help you out.
- **Code of Conduct** - good projects have codes of conduct to make sure that people are treated well. Github has an [Code of Conduct wizard](https://help.github.com/articles/adding-a-code-of-conduct-to-your-project/) to make it easy to add one.
- **Issues** - use GitHub issues to record and discuss tasks.
<br>
# Git, Github & Rstudio
### Before: git only through the terminal

### Rstudio & `usethis` to the rescue!
- Initialise **Rstudio project** with Git by **just checking a box!**
- Forgot to? use `usethis::use_git()`
- visual panel to easily see the status of all your files
- interactive navigation through file version history
# Version Control Demo
## How does `Git` work?
When a local directory becomes ***initialised with git***, a **hidden `.git` folder is added** to it.
> it's now called a **repository**
- New copies of files you tell git to track will be **added** to that `.git` folder.
- After adding, git will track any modifications to those files
### first commit - whole file added
Any file unknown to git will have a **yellow ?** box next to it.

The first time you commit a file you are **adding** it to `.git`, effectively telling it to start tracking the file

### second commit - only difference highlighted
The first time you commit a file, only the changes are shown and any file that has uncommited modifications is shown with a **blue M**
 When all changes have been committed, the git panel is clear.

Enough theory, how about in practice!
# Configure git & GitHub
## Configure git
First, `git` needs to know who you are so your commits can be attributed to you. **`usethis`** to the rescue again!
**Check your configuration**
```{r}
#| eval: false
usethis::git_sitrep()
```
```
── Git global (user)
• Name: <unset>
• Email: <unset>
✖ Git user's name and email are not set. Configure using `use_git_config(user.name = "<your name>", user.email = "<your email>")`.
• Global (user-level) gitignore file: <unset>
• Vaccinated: FALSE
ℹ See `?git_vaccinate` to learn more
ℹ Defaulting to 'https' Git protocol
• Default Git protocol: 'https'
• Default initial branch name: <unset>
── GitHub user
• Default GitHub host: 'https://github.com'
• Personal access token for 'https://github.com': <unset>
• To create a personal access token, call `create_github_token()`
• To store a token for current and future use, call `credentials::set_github_pat()`
ℹ Read more in the 'Managing Git(Hub) Credentials' article:
https://usethis.r-lib.org/articles/articles/git-credentials.html
── Active usethis project: '/cloud/project' ──
ℹ Active project is not a Git repo
```
**Set your configuration**
Use your github username and and the email you used to sign-up on GitHub
```{r}
#| eval: false
usethis::use_git_config(
user.name = "Jane",
user.email = "jane@example.org")
```
## Create a GITHUB PAT
To authenticate with GitHub, you'll also need a Personal Authorisation Token (PAT). Password-based authentication for Git is deprecated, i.e. you really should *not* be sending your username and password every time you push or pull. Here, I'm referring to the username and password you would use to login to GitHub in the browser.
What should you do instead?
Get a **personal access token** (PAT) and use that as your credential for HTTPS operations. (The PAT will actually be sent as the password and the username is somewhat artificial, consulted only for credential lookup.)
#### How to get a PAT?
GitHub offers instructions for [creating a personal access token](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token).
The usethis package has a helper function that takes you to the web form to create a PAT, with the added benefit that it pre-selects the recommended scopes:
```{r}
#| eval: false
usethis::create_github_token()
```
```
● Call `credentials::set_github_pat()` to register this token in the local Git credential store
It is also a great idea to store this token in any password-management software that you use
✔ Opening URL 'https://github.com/settings/tokens/new?scopes=repo,user,gist,workflow&description=DESCRIBE THE TOKEN\'S USE CASE'
```
Will open up the GitHub panel to generate your PAT.
Give your PAT a descriptive name. I'm calling mine `POSIT_CLOUD` as that's the only place I'm going to use it and, for security reasons, I'm setting an expiration date of 7 days, just long enough to complete the course.

Once you are happy with the selected scopes, scroll to the bottom and click "Generate token".
You will be taken to a page that shows your token. **Make sure you copy it to your clipboard!**.

As the page says, you must **store this token somewhere**, because you'll never be able to see it again, once you leave that page or close the window.
::: callout-warning
Do not ever hard-wire your PAT into your code! A PAT should always be retrieved implicitly, for example, from the Git credential store or from an environment variable.
:::
## Store your credential
Below, we will add your PAT to our the Posit Cloud Git cached credential store as a semi-persistent convenience, sort of like “remember me” on a website.
**YOU MUST ALSO STORE A SECURE COPY SOMEWHERE, Either a password manager or locally on your computer for now, WE WILL NEED IT IN FUTURE SESSIONS**
Add to the file and save.
### Store with `credentials` package
As of March 2024, the recommended R package for accessing the Git credential store is:
- [credentials](https://docs.ropensci.org/credentials/)
If you don't have credentials installed, install via `install.packages("credentials")`.
Then call `credentials::set_github_pat()`:
```{r}
#| eval: false
credentials::set_github_pat()
```
`credentials::set_github_pat()` is a very handy function, since it populates the GITHUB_PAT environment variable using the `git_credential` manager, which `git` itself uses for storing passwords. The credential manager returns stored credentials if available, and securely prompt the user for credentials when needed.
Respond to the prompt with your personal access token (PAT).
You can check that you've stored a credential with `credentials::set_github_pat()`:
```{r}
#| eval: false
credentials::set_github_pat()
```
```
If prompted for GitHub credentials, enter your PAT in the password field
Using GITHUB_PAT from Anna Krystalli (credential helper: cache)
```
### TODO store your PAT with a password manager
**Treat this PAT like a password!** Currently, we've only store a copyable form of the PAT in the Posit CLoud credentials store cache. This cache will be cleared regularly so if we **don't store it somewhere else securely too, it will be lost!!**
If you use a password management app, such as 1Password or LastPass (which you should), it is highly recommended to store this PAT in it as a password. This way, you can easily retrieve it when needed but also manage it securely.
If for some reason you ever generate a PAT but fail to capture it on your system, you’ll have to generate another one.
This is not the end of the world, but you should delete the “lost” PAT on GitHub. If you aren’t disciplined about labelling PATs and deleting lost PATs, you will find yourself in an unsettling situation where you can’t be sure which PAT(s) are in use.
::: callout-caution
For security reasons, Posit Cloud only caches the following configurations temporarily. While more secure, it does mean we'll need to perfom these steps in new projects and every 12 hours in older projects.
**On your local system however, you will only need to do this once.**
To allow your local Rstudio application to communicate seamlessly with GitHub when pusshing and pulling through the GUI, I also recommend you go a step further and setup an ssh key. This will allow you to authenticate with GitHub without having to enter your username and password every time.
See [Set up keys for SSH](https://happygitwithr.com/ssh-keys#ssh-keys) in Happy Git with R as well the [`credentials` package vignette](https://docs.ropensci.org/credentials/articles/intro.html) for more information.
:::
## Version Controlling projects
### Turn our project into a repository
If you didn't initialise git at the beginning of your project, you can do so now with `usethis::use_git()`:
::: callout-warning
This will try to **commit everything in the repo so far in one go!**
**Override that behaviour by SELECTING A NEGATIVE RESPONSE WHEN ASKED.**
:::
```{r}
#| eval: false
usethis::use_git()
```
``` r
✔ Initialising Git repo
✔ Adding '.Rhistory', '.RData' to '.gitignore'
There are 9 uncommitted files:
* '.gitignore'
* '.Rbuildignore'
* 'analysis.R'
* 'data-raw/'
* 'data/'
* 'report.html'
* 'report.Rmd'
* 'R/'
* 'project.Rproj'
Is it ok to commit them?
1: Negative
2: Nope
3: I agree
Selection:
```
Next **allow Rstudio to restart when asked:**
``` r
● A restart of RStudio is required to activate the Git pane
Restart now?
1: Absolutely
2: Absolutely not
3: Negative
```
The **git panel should now be visible** in the top right of your Rstudio window.
Let's now check our git setup again:
```{r}
#| eval: false
usethis::git_sitrep()
```
```
── Git global (user)
• Name: 'Anna Krystalli'
• Email: 'annakrystalli@googlemail.com'
• Global (user-level) gitignore file: <unset>
• Vaccinated: FALSE
ℹ See `?git_vaccinate` to learn more
ℹ Defaulting to 'https' Git protocol
• Default Git protocol: 'https'
• Default initial branch name: <unset>
── GitHub user
• Default GitHub host: 'https://github.com'
• Personal access token for 'https://github.com': <unset>
• To create a personal access token, call `create_github_token()`
• To store a token for current and future use, call `credentials::set_github_pat()`
ℹ Read more in the 'Managing Git(Hub) Credentials' article:
https://usethis.r-lib.org/articles/articles/git-credentials.html
── Active usethis project: '/cloud/project' ──
── Git local (project)
• Name: 'Anna Krystalli'
• Email: 'annakrystalli@googlemail.com'
✖ Default branch cannot be determined.
• Current local branch -> remote tracking branch:
'<unset>' -> '<unset>'
── GitHub project
ℹ Project does not use GitHub
```
# Review git status
In our project, let's have a look at the Rstudio **Git** tab. It shows all the files currently in the root of our project that git is aware of. The **yellow ?** indicates none of the files have been added to git yet, i.e. git is aware of them but none are being formally tracked for changes.

# Ignoring files through `.gitignore`
There may be files that we don't want to commit to git, e.g.
- data files that are too large
- documents with sensitive information (eg authorisation tokens etc)
- intermediate files that we don't need to save copies of.
**We can tell git to ignore them by adding them to the `.gitignore` file. By doing this upfront, we minimise the danger of committing them to our project's history by accident.**
## Open `.gitignore`
When we open `.gitgnore` we see **there are a number of standard files already added**.
```
.Rproj.user
.Rhistory
.Rdata
.httr-oauth
.DS_Store
.quarto
```
These are standard files it's best practice to never commit to version control, especially if we are going to share our repository online.
Let's go ahead and add the rest of the files we want to ignore.
## Add files and folders to `.gitignore`
### `.gitignore` regex
You can use `regex` (regular expressions) in `.gitignore` files to **ignore files according to a pattern.**
- `directoryname/*` will ignore all files in a directory.
- `*.html` will ignore any file ending in `.html`
- prefix "!" which negates the pattern
So let's use regex to: - ignore **all files in `attic/`** - ignore **all files in `data-raw/wood-survey-data-master/` apart from the NEON logo (`neon-logo.png`) and `references.bib` which are required to successfully render our report!**
To do so, add the following to the bottom of `.gitignore`
```
attic/*
data-raw/wood-survey-data-master/*
!data-raw/wood-survey-data-master/neon-logo.png
!data-raw/wood-survey-data-master/references.bib
```
You should notice that once the .gitgnore file is saved, the files and directories we've added to it are **no longer visible in the git panel as options to add to a commit.**
# Commit project files
## Add files
The first step of a commit is to **add the files we want to be part of the commit to the staging area.**.
To do so, just **tick the box next to any files and/or directories in the git pane**.
Given we want to commit everything, as git is ignoring files we are not interested in, it is safe to add all files to the staging area!

::: callout-tip
Note how when the `data-raw` folder is selected, only 3 files are added. That's `.gitignore` in action!
:::
## Commit changes
Write an appropriate commit message (e.g. "Initial commit") and click the **commit** button:

Once the commit os complete, the git panel should be clear:

# Create a README
Our repository also needs a `README`. We only need a simple plain markdown (`.md`) file for our README.
We can create a template using `usethis::use_readme_md()`
```{r}
#| eval: false
usethis::use_readme_md()
```
## Edit README
Adapt the template, adding a short description about your project, including local links to your report and dataspice information page.
I've added the following:
``` md
# wood-survey
<!-- badges: start -->
<!-- badges: end -->
The goal of wood-survey is to analyse the relationship between stem diameter and height in NEON vegetation survey data.
A report of our analysis can be found [here](report.html)
Details about the data used can be found [here](data/index.html)
```
Add and commit your new README


# Create repository on GitHub
So far we've only been working with a local version (i.e. only available to us in Posit Cloud) of our project. **Now we want to share it with the world through GitHub!**
Because have set up a `GITHUP_PAT`, we can use function `usethis::use_github()` to create a GitHub repository for our project:
```{r}
#| eval: false
usethis::use_github(protocol = "https")
```
``` r
✔ Creating GitHub repository 'annakrystalli/project'
✔ Setting remote 'origin' to 'https://github.com/annakrystalli/project.git'
✔ Pushing 'master' branch to GitHub and setting 'origin/master' as upstream branch
✔ Opening URL 'https://github.com/annakrystalli/project'
```
Once the repo is created and any commmited files pushed, the repo is launched in the browser:

# Host HTML content on GitHub
Let's head to the repo and have a look at what we've shared. To host our html content on GitHub, we need to **enable `gh-pages`** in our repository.
## Enable `gh-pages`
Go to repo **Settings** and navigate to the **Pages** panel on the left hand side and change your settings to the following configuration:
\- **Source:** Deploy from a branch.
\- **Branch:** `master`
\- {{< fa folder >}} **Folder:** `/ (root)`
\- Ensure the ***Enforce HTTPS*** option is selected.
and **Save**

## Check Deployed Site
Once the site has been successfully deployed, you will see a message like this (you might need to refresh the page):

Click on the link displayed and go check out your work!
::: callout-note
Note that my URL looks a bit different to yours. That's because I'm redirecting all my GitHub pages to my personal domain. Yours will be something like `https://<username>.github.io/<repo-name>`
:::
### Landing page
The landing page of your site will be the `README.md` file (although you could override this if the root of your project contained a `index.md` or `index.html` file).
 From there you can click through to the rendered report page (or you can add `/report.html` to the end of the URL to go directly to the report) and the data documentation page.
### Report

Scroll through and check out the interactive features like the table of contents, code folding and table, figure and citation previews when hovering over their mentions!
## Add URL to repo About panel
A useful feature of GitHub is the ability to include a link to any hosted content in the **About** panel of your repository. This is a great way to direct visitors to the rendered content of your project.
To add the URL to the **About** panel, click on the {{< fa cog>}} button next to **About** and check the **Use your GitHub Pages website** box:
 Once added, visitors can easily navigate to the hosted content from the repo landing page:

# `Git` tips
- commit early, commit often
- commit logical bits of work together
- write meaninful messages
## Further Resources
- [Git-it](http://jlord.us/git-it/)
- [Happy with Git](http://happygitwithr.com/)
- [Oh Shit Git](http://ohshitgit.com/)
### Never forget
