Adding log link to binary dist

Keith Goldfeld · Keith Goldfeld · commit cf77d160ffe9 · 2024-06-28T13:34:26.000-04:00
diff --git a/NEWS.md b/NEWS.md
@@ -2,6 +2,7 @@
 
 ## New features
 * Added the ability to generate data from an empirical distribution by using new functions `genDataDensity` and `addDataDensity`.
+* The *binary* and *binomial* distributions can now accommodate a "log" link.
 
 ## Minor fix
 * `addCorGen` no longer requires all clusters to have the same size when using the *rho* and *corstr* arguments to define the correlation.
diff --git a/R/define_data.R b/R/define_data.R
@@ -690,13 +690,17 @@ defSurv <- function(dtDefs = NULL,
     switch(newdist,
       binary = {
         .isValidArithmeticFormula(newform, defVars)
+        .isIdLogLogit(link)
+      },
+      beta = {
+        .isValidArithmeticFormula(newform, defVars)
+        .isValidArithmeticFormula(variance, defVars)
         .isIdLogit(link)
       },
-      beta = ,
       binomial = {
         .isValidArithmeticFormula(newform, defVars)
         .isValidArithmeticFormula(variance, defVars)
-        .isIdLogit(link)
+        .isIdLogLogit(link)
       },
       noZeroPoisson = ,
       poisson = ,
@@ -905,6 +909,16 @@ defSurv <- function(dtDefs = NULL,
   invisible(link)
 }
 
+#' Is identity, log, or logit?
+#'
+#' @param link Link as string.
+#' @return Invisible, error if link not valid.
+#' @noRd
+.isIdLogLogit <- function(link) {
+  .isLink(link, c("identity", "log", "logit"))
+  invisible(link)
+}
+
 #' Error template for link check
 #'
 #' @param link Link as string.
diff --git a/R/generate_dist.R b/R/generate_dist.R
@@ -253,6 +253,10 @@
   size <- .evalWith(size, .parseDotVars(size, envir), dtSim, n)
   p <- .evalWith(formula, .parseDotVars(formula, envir), dtSim, n)
 
+  if (link == "log") {
+    p <- exp(p)
+  }
+  
   if (link == "logit") {
     p <- 1 / (1 + exp(-p))
   }
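The link handling added in this hunk can be sketched in isolation. The helper `toProbability` below is hypothetical and not part of simstudy. It maps a value on the link scale to a probability the same way the diff does (identity, `exp()` for "log", inverse logit for "logit"). It also adds a range check that the committed code does not perform. That check is an assumption worth considering, because under the log link any positive formula value yields a "probability" above 1.

```r
# Hypothetical helper (not part of simstudy): map the evaluated formula
# value to a probability, mirroring the link handling in the diff above.
toProbability <- function(eta, link = c("identity", "log", "logit")) {
  link <- match.arg(link)
  p <- switch(link,
    identity = eta,               # formula is already a probability
    log = exp(eta),               # formula is on the log-probability scale
    logit = 1 / (1 + exp(-eta))   # formula is on the log-odds scale
  )
  # Assumed extra safeguard: reject values outside [0, 1]. With the log
  # link this happens whenever eta > 0.
  if (any(p < 0 | p > 1)) {
    stop("formula implies probabilities outside [0, 1] for link = '", link, "'")
  }
  p
}

toProbability(log(0.25), "log")
toProbability(0, "logit")
```

Unlike the logit link, the log link does not keep `p` inside the unit interval automatically. This is why a guard of this kind, or careful choice of coefficients, matters when using it.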
diff --git a/R/simstudy-package.R b/R/simstudy-package.R
@@ -30,8 +30,8 @@ NULL
 #' | **name**        | **formula**            | **format**                               | **variance**     | **link**          |
 #' |-----------------|------------------------|------------------------------------------|------------------|-------------------|
 #' | beta            | mean                   | String or Number                         | dispersion value | identity or logit |
-#' | binary          | probability for 1      | String or Number                         | NA             | identity or logit |
-#' | binomial        | probability of success | String or Number                         | number of trials | identity or logit |
+#' | binary          | probability for 1      | String or Number                         | NA             | identity, log, or logit |
+#' | binomial        | probability of success | String or Number                         | number of trials | identity, log, or logit |
 #' | categorical     | probabilities          | `p_1;p_2;..;p_n`                         | category labels: `a;b;c` , `50;130;20`| identity or logit |
 #' | custom          | name of function       | String                                   | arguments      | identity          |
 #' | exponential     | mean (lambda)          | String or Number                         | NA             | identity or log   |
diff --git a/man/distributions.Rd b/man/distributions.Rd
diff --git a/vignettes/simstudy.Rmd b/vignettes/simstudy.Rmd
@@ -186,8 +186,8 @@ The foundation of generating data is the assumptions we make about the distribut
 ```{r,  echo=FALSE}
 d <- list()
 d[[1]] <- data.table("beta", "mean", "both", "-", "dispersion", "X", "-", "X") 
-d[[2]] <- data.table("binary", "probability", "both", "-", "-", "X", "-", "X") 
-d[[3]] <- data.table("binomial", "probability", "both", "-", "# of trials", "X", "-", "X")
+d[[2]] <- data.table("binary", "probability", "both", "-", "-", "X", "X", "X") 
+d[[3]] <- data.table("binomial", "probability", "both", "-", "# of trials", "X", "X", "X")
 d[[4]] <- data.table("categorical", "probability", "string", "p_1;p_2;...;p_n", "a;b;c", "X", "-", "X")
 d[[5]] <- data.table("clusterSize", "total N", "both", "-", "dispersion", "X", "-", "-")
 d[[6]] <- data.table("custom", "function", "string", "-", "arguments", "X", "-", "-")
@@ -215,11 +215,11 @@ A *beta* distribution is a continuous data distribution that takes on values bet
 
 #### binary
 
-A *binary* distribution is a discrete data distribution that takes values $0$ or $1$. (It is more conventionally called a *Bernoulli* distribution, or is a *binomial* distribution with a single trial $n=1$.) The `formula` represents the probability (with the 'identity' link) or the log odds (with the 'logit' link) that the variable takes the value of 1. The mean of this distribution is $p$, and variance $\sigma^2$ is $p(1-p)$.
+A *binary* distribution is a discrete data distribution that takes values $0$ or $1$. (It is more conventionally called a *Bernoulli* distribution, or equivalently a *binomial* distribution with a single trial $n=1$.) The `formula` represents the probability (with the 'identity' link), the log probability (with the 'log' link, so that exponentiated coefficients are relative risks), or the log odds (with the 'logit' link) that the variable takes the value of 1. The mean of this distribution is $p$, and the variance $\sigma^2$ is $p(1-p)$.
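To illustrate why the 'log' link gives a relative-risk interpretation, the base-R simulation below draws a binary outcome whose log probability is linear in a binary exposure `x`. It is a sketch independent of simstudy. The baseline risk of 0.2 and the relative risk of 1.5 are illustrative assumptions. The ratio of the observed risks should recover the exponentiated coefficient on `x`.

```r
set.seed(123)

n <- 200000
x <- rbinom(n, size = 1, prob = 0.5)   # binary exposure
eta <- log(0.2) + log(1.5) * x         # linear predictor on the log scale
p <- exp(eta)                          # log link: p = exp(eta)
y <- rbinom(n, size = 1, prob = p)     # binary (Bernoulli) outcome

riskUnexposed <- mean(y[x == 0])       # should be close to 0.2
riskExposed <- mean(y[x == 1])         # should be close to 0.2 * 1.5 = 0.3
riskExposed / riskUnexposed            # should be close to 1.5, the relative risk
```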
 
 #### binomial
 
-A *binomial* distribution is a discrete data distribution that represents the count of the number of successes given a number of trials. The formula specifies the probability of success $p$, and the variance field is used to specify the number of trials $n$. Given a value of $p$, the mean $\mu$ of this distribution is $n*p$, and the variance $\sigma^2$ is $np(1-p)$.
+A *binomial* distribution is a discrete data distribution that represents the count of the number of successes given a number of trials. The `formula` specifies the probability of success $p$ (with the 'identity' link), the log of $p$ (with the 'log' link), or the log odds of success (with the 'logit' link), and the variance field is used to specify the number of trials $n$. Given a value of $p$, the mean $\mu$ of this distribution is $np$, and the variance $\sigma^2$ is $np(1-p)$.
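The following base-R sketch (not simstudy code; the values $n = 10$ and $p = 0.3$ are assumptions chosen for illustration) checks the stated moments when the success probability is supplied on the log scale. The formula value `log(0.3)` is exponentiated to obtain $p$, and the simulated mean and variance are compared with $np$ and $np(1-p)$.

```r
set.seed(2024)

nTrials <- 10                  # plays the role of the variance field
eta <- log(0.3)                # formula value under the 'log' link
p <- exp(eta)                  # probability of success
y <- rbinom(100000, size = nTrials, prob = p)

c(mean = mean(y), theoretical = nTrials * p)              # both near 3
c(var = var(y), theoretical = nTrials * p * (1 - p))      # both near 2.1
```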
 
 #### categorical