part2_model.qmd

---
title: "R Lab - Part II"
format: 
  html:
    code-fold: false
    code-tools: true
editor: visual
---

## Association modeling

In this part of the exercise, we model (on the log-scale) the association of miRNA espression on protein expression adjusting for the corresponding mRNA.

Investigate **miR-107** and **B-RAF** (Aure et al, 2015, Figure 2H)

```{r}
#| label: load-data-againn
#| warning: false
#| echo: false

# need to load the data in this page again. DO NOT SHOW
mir <- read.table("lab/data/miRNA-421x282.txt", header=T, sep="\t", dec=".")
rna <- read.table("lab/data/mRNA-100x282.txt", header=T, sep="\t", dec=".")
prt <- read.table("lab/data/prot-100x282.txt", header=T, sep="\t", dec=".")

mir <- as.matrix(mir)
rna <- as.matrix(rna)
prt <- as.matrix(prt)
```

```{r}
#| label: select-braf
#| warning: false
#| echo: true
 
prt.BRAF = prt[12,]
rna.BRAF = rna[12,]
mir.107 = mir[16,] 
```

### (a) Linear regression model

on the log-scale, Aure et al. 2015, equation (3)

```{r}
#| echo: true
#| eval: true

fitA <- lm(prt.BRAF ~ mir.107 + rna.BRAF)
summary(fitA)
```

Add smooth non-linear cures to the scatterplots: use existing `panel.smooth()` function, and add linear regression lines to the scatterplots:

```{r}
#| echo: true
#| eval: true

panel.linear <- function (x, y, col.regres = "blue", ...) 
{ 
  points(x, y, pch=19) 
  ok <- is.finite(x) & is.finite(y) 
  if (any(ok)) 
    abline(stats::lm(y[ok] ~ x[ok]), col = col.regres, ...) 
} 

pairs(data.frame(mir.107, prt.BRAF, rna.BRAF), 
      lower.panel = panel.smooth,
      upper.panel = panel.linear)

```

### (b) Lasso-penalised linear model

with **all miRNAs** (Aure et al. 2015, equation (4))

```{r}
#| echo: true
#| eval: false

library(glmnet)

# 10-fold CV to determine the optimal lambda:
# Note: rna.BRAF is penalised together with all the mir variables. 
# You can use the penalty.factor option to avoid this.
set.seed(1234)
cvfit <- cv.glmnet(y=prt.BRAF, x=t(rbind(mir, rna.BRAF)),
                   alpha=1, nfolds=10, standardize=TRUE)

par(mfrow=c(1,1))
plot(cvfit)
lambda.opt <- cvfit$lambda.min

# Coefficient path plot and coefficients for optimal lambda:
fitB <- cvfit$glmnet.fit

plot(fitB, xvar="lambda")
abline(v=log(lambda.opt))

coef(fitB, s=lambda.opt)
predict(fitB, type="nonzero", s=lambda.opt)
```

Compare the regression coefficient of mir.107 from the models in (a) and (b):

```{r}
#| echo: true
#| eval: false

coef(fitA)["mir.107"]
as.matrix(coef(fitB, s=cvfit$lambda.min))["hsa-miR-107",]
```

::: callout-note
## Task 3

Repeat the lasso analysis, but this time do not penalise the `rna.BRAF` variable together with the mir variables.

Check out the information on the `penalty.factor` option in `?glmnet` to understand how.
:::