Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in Differential Expression testing after spatialdecon and reverseDecon #36

Open
gianfilippo opened this issue May 13, 2022 · 7 comments

Comments

@gianfilippo
Copy link

Hi,

could you please clarify how to run Differential Expression on the cell type specific data generated by reverseDecon ?

Below is what I did so far:

I followed the GeoMxWorkflows tutorial to QC, filter, process the data from DCC files

Next, I used assayData(myData)[["q_norm"]] as final matrix for analysis and tried to follow the SpatialDecon tutorial:
norm = assayData(myData)[["q_norm"]]

infer background using derive_GeoMx_background

run the spatialdecon function as follow:
spatialdecon(norm = norm, bg = bg, X = mouseAllenBrain, align_genes = TRUE)
where mouseAllenBrain = download_profile_matrix(species = "Mouse", age_group = "Adult", matrixname = "Brain_AllenBrainAtlas")

Then I used reverseDecon to get cell type specific gene expression levels
rdecon = reverseDecon(norm = norm, beta = res$beta)

from here I derived the cell type specific gene expression levels (not sure about this).
for instance, for Actrocytes,
for (i in 1:ncol(norm)) normAstro[,i] = rdecon$coefs[,"Astro"] * res$beta["Astro",i] + rdecon$coefs[,1]

the normAstro matrix has some entries equal (or very close) to zero, so, log2(normAstro) not an option. I use log2(normAstro+1)
and assign it to the GeoMXset object
assayDataElement(object = myData, elt = "log_q_norm_Astro") = log2(normCellType[["Astro"]]+1.0)

At this point I do differential expression testing as follow (as in the tutorial) for each segment.
NOTE: the experimental design is such that each slide has both "Ref" and "Pert" in testClass, but I may have two slides each individual, so I used a LMM without random slope

segment = "TypeA"
ind <- pData(target_demoData)$segment == segment
mixedOutmc <- mixedModelDE(target_demoData[, ind], elt = "log_q_norm_Astro", modelFormula = ~ testClass + (1 | slide),
                           groupVar = "testClass", nCores = parallel::detectCores(), multiCore = F)
r_test <- do.call(rbind, mixedOutmc["lsmeans", ])
tests <- rownames(r_test)
r_test <- as.data.frame(r_test)
r_test$Contrast <- tests
r_test$Gene <- unlist(lapply(colnames(mixedOutmc), rep, nrow(mixedOutmc["lsmeans", ][[1]])))
r_test$Subset <- segment
r_test$FDR <- p.adjust(r_test$`Pr(>|t|)`, method = "fdr")
r_test <- r_test[, c("Gene", "Subset", "Contrast", "Estimate", "Pr(>|t|)", "FDR")]
resultsCellTypeSpec <- rbind(resultsCellTypeSpec, r_test)

The code crashes at this point with the following error
"log_q_norm_Astro"
Error in checkForRemoteErrors(val) :
20 nodes produced errors; first error: invalid class “corMatrix” object: 'sd' slot has non-finite entries

Please let me know if you need more details

Thanks
Gianfilippo

@gianfilippo
Copy link
Author

Hi Maddy,

I take the reverseDecon output, "rdecon" in my case, and, for each sample, I generate cell type specific gene expression levels. See blow for an example:

for (i in 1:ncol(norm))
normAstro[,i] = rdecon$coefs[,"Astro"] * res$beta["Astro",i] + rdecon$coefs[,1]

I am not sure this is correct. Anyway, the resulting cell type specific gene expression matrix, normAstro is a gene X sample matrix, with the same gene names and sample IDs as the original matrix.

I then add it to the NanoStringGeoMxSet and git is a 'name', as follow:
assayDataElement(object = myData, elt = "log_q_norm_Astro") = log2(normCellType[["Astro"]]+1.0)

Then I try to test for differential expression as described in my previous post.

Unless there is a problem with the way I add the assayDataElement to the NanoStringGeoMxSet, I cannot see the matrix format to be a problem.
May be the way I estimate cell type specific expression levels by gene is wrong? what do you think ?

Thanks
Gianfilippo

@maddygriz
Copy link
Collaborator

Hi Gianfilippo,

I was reading your issue incorrectly and noticed my mistake right after I sent my previous comment. I am looking into your issue, now that I understand it correctly, and I will get back to you shortly.

Maddy

@maddygriz
Copy link
Collaborator

Hi Gianfilippo,

So you are creating the cell type specific gene expression levels in the expected format so that is not the issue. The issue seems to stem from the significant number of zeros that are created from generating this matrix. Our biostatistician suggests

"limiting this analysis to genes that are mainly coming from the cell type of interest. E.g. if a gene's "astrocyte-specific" expression is >50% of the total background-subtracted expression, then it could be worth analyzing in this way. This kind of rule does remove most genes from consideration."

But if you don't want to reduce the gene list that much, you at least need to remove the genes that have zeros across all of the samples. Having the same value across the row is causing the standard deviation error.

Maddy

@gianfilippo
Copy link
Author

How naive of me!! I did not think about re-filtering the matrix. Thank you (and your biostatistician) for noticing it. I will let you know how it goes
Best
Gianfilippo

@gianfilippo
Copy link
Author

ok, I made some changes but I am still having problems. I get the following error

Error in if (maxmingrad > ccl$tol) { :
missing value where TRUE/FALSE needed

Can you please help ?

Below is what I did. In the first few lines, I create a copy of the original data, then I filter out from the
myData is the original data object.

Copy data obj

myData_test = myData

Add cell type specific matrix to data Obj and store into "q_norm_Astro" assay element

assayDataElement(object=myData_test, elt="q_norm_Astro")) <- normCellType[["Astro"]]

Filter genes. Keep only genes with a min expr of 0.0001 in 9 samples or more

keep = which(rowSums(normCellType[["Astro"]]>=0.0001)>8)
myData_test = myData_test[keep,]

log cell type specific matrix and store into "log_q_norm_Astro" assay element

assayDataElement(object=myData_test, elt="log_q_norm_Astro") <- assayDataApply(myData_test, 2, FUN=log, base=2, elt="q_norm_Astro")

get index for Astrocyte specific samples to be tested

ind <- pData(myData_test)$segment == "AstroMarker"
mixedOutmc <- mixedModelDE(myData_test[, ind], elt = "log_q_norm_Astro", modelFormula = ~ testClass + (1 | slide), groupVar = "testClass", nCores = 20, multiCore = F)
r_test <- do.call(rbind, mixedOutmc["lsmeans", ])
tests <- rownames(r_test)
r_test <- as.data.frame(r_test)
r_test$Contrast <- tests
r_test$Gene <- unlist(lapply(colnames(mixedOutmc), rep, nrow(mixedOutmc["lsmeans", ][[1]])))
r_test$Subset <- segment
r_test$FDR <- p.adjust(r_test$`Pr(>|t|)`, method = "fdr")
r_test <- r_test[, c("Gene", "Subset", "Contrast", "Estimate", "Pr(>|t|)", "FDR")]

@gianfilippo
Copy link
Author

Hi, I think I found the issue.
The way I was getting the cell type specific gene expression estimate is wrong, since I was adding the intercept value to the gene expression estimate, i.e.
for (i in 1:ncol(norm)) normAstro[,i] = rdecon$coefs[,"Astro"] * res$beta["Astro",i] + rdecon$coefs[,1]

In reality this intercept, as I understand it, is part of the model for overall gene expression. So, when I removed it, and estimated cell type gene expression as
for (i in 1:ncol(norm)) normAstro[,i] = rdecon$coefs[,"Astro"] * res$beta["Astro",i]
the code works.
The reason for the crash, as you mentioned also, is constant expression across samples, which happens anywhere the cell-sample coeff is zero, making the only non-zero entry the intercept coefficient.
Perhaps, a question is how good are these estimates for differential expression analysis, since, I believe, you have not assessed that in the paper. i can see that the difference from actual measurement is non-negligible.
Also, what is the meaning of the intercept coefficient ? background noise ?
Thanks
Gianfilippo

@patrickjdanaher
Copy link
Collaborator

Hi Gianfilippo,
Glad you found the fix! Answers to your questions:

  • The reverseDecon residuals are suitable for differential expression.
  • It looks like you're calculating astrocyte-specific correctly.
  • To run "astrocyte-specific DE", you could run DE on the reverse decon residuals for genes whose counts are mainly from astrocytes, calculated as you describe. Whether you include background counts in the astrocyte_derived_counts / total_counts calculation is up to you and your reviewers.
    Good luck,
    Patrick

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants