forked from ocbe-uio/2022_bioinformatics_workshop
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpart2_model.qmd
120 lines (88 loc) · 2.69 KB
/
part2_model.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
---
title: "R Lab - Part II"
format:
html:
code-fold: false
code-tools: true
editor: visual
---
## Association modeling
In this part of the exercise, we model (on the log-scale) the association of miRNA espression on protein expression adjusting for the corresponding mRNA.
Investigate **miR-107** and **B-RAF** (Aure et al, 2015, Figure 2H)
```{r}
#| label: load-data-againn
#| warning: false
#| echo: false
# need to load the data in this page again. DO NOT SHOW
mir <- read.table("lab/data/miRNA-421x282.txt", header=T, sep="\t", dec=".")
rna <- read.table("lab/data/mRNA-100x282.txt", header=T, sep="\t", dec=".")
prt <- read.table("lab/data/prot-100x282.txt", header=T, sep="\t", dec=".")
mir <- as.matrix(mir)
rna <- as.matrix(rna)
prt <- as.matrix(prt)
```
```{r}
#| label: select-braf
#| warning: false
#| echo: true
prt.BRAF = prt[12,]
rna.BRAF = rna[12,]
mir.107 = mir[16,]
```
### (a) Linear regression model
on the log-scale, Aure et al. 2015, equation (3)
```{r}
#| echo: true
#| eval: true
fitA <- lm(prt.BRAF ~ mir.107 + rna.BRAF)
summary(fitA)
```
Add smooth non-linear cures to the scatterplots: use existing `panel.smooth()` function, and add linear regression lines to the scatterplots:
```{r}
#| echo: true
#| eval: true
panel.linear <- function (x, y, col.regres = "blue", ...)
{
points(x, y, pch=19)
ok <- is.finite(x) & is.finite(y)
if (any(ok))
abline(stats::lm(y[ok] ~ x[ok]), col = col.regres, ...)
}
pairs(data.frame(mir.107, prt.BRAF, rna.BRAF),
lower.panel = panel.smooth,
upper.panel = panel.linear)
```
### (b) Lasso-penalised linear model
with **all miRNAs** (Aure et al. 2015, equation (4))
```{r}
#| echo: true
#| eval: false
library(glmnet)
# 10-fold CV to determine the optimal lambda:
# Note: rna.BRAF is penalised together with all the mir variables.
# You can use the penalty.factor option to avoid this.
set.seed(1234)
cvfit <- cv.glmnet(y=prt.BRAF, x=t(rbind(mir, rna.BRAF)),
alpha=1, nfolds=10, standardize=TRUE)
par(mfrow=c(1,1))
plot(cvfit)
lambda.opt <- cvfit$lambda.min
# Coefficient path plot and coefficients for optimal lambda:
fitB <- cvfit$glmnet.fit
plot(fitB, xvar="lambda")
abline(v=log(lambda.opt))
coef(fitB, s=lambda.opt)
predict(fitB, type="nonzero", s=lambda.opt)
```
Compare the regression coefficient of mir.107 from the models in (a) and (b):
```{r}
#| echo: true
#| eval: false
coef(fitA)["mir.107"]
as.matrix(coef(fitB, s=cvfit$lambda.min))["hsa-miR-107",]
```
::: callout-note
## Task 3
Repeat the lasso analysis, but this time do not penalise the `rna.BRAF` variable together with the mir variables.
Check out the information on the `penalty.factor` option in `?glmnet` to understand how.
:::