Commit cf77d16

Adding log link to binary dist
Author: Keith Goldfeld
1 parent e153754 commit cf77d16

6 files changed, +29 -10 lines changed


NEWS.md

+1
@@ -2,6 +2,7 @@
 
 ## New features
 * Added the ability to generate data from an empirical distribution by using new functions `genDataDensity` and `addDataDensity`.
+* The *binary* and *binomial* distributions can now accommodate a "log" link.
 
 ## Minor fix
 * `addCorGen` no longer requires all clusters to have the same size when using the *rho* and *corstr* arguments to define the correlation.

R/define_data.R

+16 -2
@@ -690,13 +690,17 @@ defSurv <- function(dtDefs = NULL,
   switch(newdist,
     binary = {
       .isValidArithmeticFormula(newform, defVars)
+      .isIdLogLogit(link)
+    },
+    beta = {
+      .isValidArithmeticFormula(newform, defVars)
+      .isValidArithmeticFormula(variance, defVars)
       .isIdLogit(link)
     },
-    beta = ,
     binomial = {
       .isValidArithmeticFormula(newform, defVars)
       .isValidArithmeticFormula(variance, defVars)
-      .isIdLogit(link)
+      .isIdLogLogit(link)
     },
     noZeroPoisson = ,
     poisson = ,
@@ -905,6 +909,16 @@ defSurv <- function(dtDefs = NULL,
   invisible(link)
 }
 
+#' Is identity, log, logit?
+#'
+#' @param link link as string.
+#' @return Invisible, error if link not valid.
+#' @noRd
+.isIdLogLogit <- function(link) {
+  .isLink(link, c("identity", "log", "logit"))
+  invisible(link)
+}
+
 #' Error template for link check
 #'
 #' @param link Link as string.
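
Review note: the new `.isIdLogLogit` follows the same pattern as the existing `.isIdLogit` check, delegating to `.isLink` with a longer list of allowed links. The internals of `.isLink` are not shown in this diff, so the snippet below is only a stand-alone sketch of that pattern in base R. The names `check_link` and `binary_links` are hypothetical and are not part of simstudy.

```r
# Hypothetical stand-in for the package's internal link check.
# It only illustrates "validate against an allowed set, return invisibly".
check_link <- function(link, allowed) {
  ok <- is.character(link) && length(link) == 1L && link %in% allowed
  if (!ok) {
    stop(
      "Invalid link: ", paste(format(link), collapse = ", "),
      ". Allowed links: ", paste(allowed, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(link)
}

# Links accepted for binary and binomial after this commit (per the diff).
binary_links <- c("identity", "log", "logit")

check_link("log", binary_links)       # passes silently
# check_link("probit", binary_links)  # would stop with an error
```

Under this sketch, widening the allowed set is the only change needed to accept the new link, which matches the shape of the commit.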

R/generate_dist.R

+4
@@ -253,6 +253,10 @@
   size <- .evalWith(size, .parseDotVars(size, envir), dtSim, n)
   p <- .evalWith(formula, .parseDotVars(formula, envir), dtSim, n)
 
+  if (link == "log") {
+    p <- exp(p)
+  }
+
   if (link == "logit") {
     p <- 1 / (1 + exp(-p))
   }
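
Review note: unlike the logit transform, `exp(p)` is not bounded above by 1, so a formula that evaluates to a positive value under the log link yields an invalid probability. Whether the package guards against this elsewhere is not visible in this hunk. The base-R sketch below shows the three inverse links side by side with an explicit range check. The helper `inv_link` and the example coefficients are assumptions for illustration, not simstudy code.

```r
# Hypothetical helper: map a linear predictor `eta` to a probability
# under the three links this commit allows for binary/binomial data.
inv_link <- function(eta, link = c("identity", "log", "logit")) {
  link <- match.arg(link)
  p <- switch(link,
    identity = eta,
    log = exp(eta),
    logit = 1 / (1 + exp(-eta))
  )
  # The log and identity links can leave [0, 1]; fail loudly if they do.
  if (any(p < 0 | p > 1, na.rm = TRUE)) {
    stop("link '", link, "' produced probabilities outside [0, 1]")
  }
  p
}

set.seed(1)
# Assumed example: baseline risk 0.2 and relative risk 1.5 for x = 1.
x <- c(0, 1)
eta <- log(0.2) + log(1.5) * x
p <- inv_link(eta, "log")   # approximately 0.2 and 0.3
y <- rbinom(length(p), size = 1, prob = p)
```

Keeping the linear predictor at or below zero, for example by choosing a small baseline risk, is one way to stay inside the valid range when using the log link.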

R/simstudy-package.R

+2 -2
@@ -30,8 +30,8 @@ NULL
 #' | **name** | **formula** | **format** | **variance** | **link** |
 #' |-----------------|------------------------|------------------------------------------|------------------|-------------------|
 #' | beta | mean | String or Number | dispersion value | identity or logit |
-#' | binary | probability for 1 | String or Number | NA | identity or logit |
-#' | binomial | probability of success | String or Number | number of trials | identity or logit |
+#' | binary | probability for 1 | String or Number | NA | identity, log, or logit |
+#' | binomial | probability of success | String or Number | number of trials | identity, log, or logit |
 #' | categorical | probabilities | `p_1;p_2;..;p_n` | category labels: `a;b;c` , `50;130;20`| identity or logit |
 #' | custom | name of function | String | arguments | identity |
 #' | exponential | mean (lambda) | String or Number | NA | identity or log |

man/distributions.Rd

+2-2
Some generated files are not rendered by default.

vignettes/simstudy.Rmd

+4 -4
@@ -186,8 +186,8 @@ The foundation of generating data is the assumptions we make about the distribut
 ```{r, echo=FALSE}
 d <- list()
 d[[1]] <- data.table("beta", "mean", "both", "-", "dispersion", "X", "-", "X")
-d[[2]] <- data.table("binary", "probability", "both", "-", "-", "X", "-", "X")
-d[[3]] <- data.table("binomial", "probability", "both", "-", "# of trials", "X", "-", "X")
+d[[2]] <- data.table("binary", "probability", "both", "-", "-", "X", "X", "X")
+d[[3]] <- data.table("binomial", "probability", "both", "-", "# of trials", "X", "X", "X")
 d[[4]] <- data.table("categorical", "probability", "string", "p_1;p_2;...;p_n", "a;b;c", "X", "-", "X")
 d[[5]] <- data.table("clusterSize", "total N", "both", "-", "dispersion", "X", "-", "-")
 d[[6]] <- data.table("custom", "function", "string", "-", "arguments", "X", "-", "-")
@@ -215,11 +215,11 @@ A *beta* distribution is a continuous data distribution that takes on values bet
 
 #### binary
 
-A *binary* distribution is a discrete data distribution that takes values $0$ or $1$. (It is more conventionally called a *Bernoulli* distribution, or is a *binomial* distribution with a single trial $n=1$.) The `formula` represents the probability (with the 'identity' link) or the log odds (with the 'logit' link) that the variable takes the value of 1. The mean of this distribution is $p$, and variance $\sigma^2$ is $p(1-p)$.
+A *binary* distribution is a discrete data distribution that takes values $0$ or $1$. (It is more conventionally called a *Bernoulli* distribution, or is a *binomial* distribution with a single trial $n=1$.) The `formula` represents the probability (with the 'identity' link), the relative risk (with the 'log' link), or the log odds (with the 'logit' link) that the variable takes the value of 1. The mean of this distribution is $p$, and variance $\sigma^2$ is $p(1-p)$.
 
 #### binomial
 
-A *binomial* distribution is a discrete data distribution that represents the count of the number of successes given a number of trials. The formula specifies the probability of success $p$, and the variance field is used to specify the number of trials $n$. Given a value of $p$, the mean $\mu$ of this distribution is $n*p$, and the variance $\sigma^2$ is $np(1-p)$.
+A *binomial* distribution is a discrete data distribution that represents the count of the number of successes given a number of trials. The formula specifies the probability of success (with the 'identity' link), the relative risk (with the 'log' link), or the log odds (with the 'logit' link) of success, and the variance field is used to specify the number of trials $n$. Given a value of $p$, the mean $\mu$ of this distribution is $n*p$, and the variance $\sigma^2$ is $np(1-p)$.
 
 #### categorical
 
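
Review note on the vignette wording: based on the `R/generate_dist.R` hunk in this commit, the value of the formula is transformed to a probability as sketched below. The symbols $\eta$ (the evaluated formula), $\beta_0$, $\beta_1$, and $x$ are notation assumed here for illustration; they do not appear in the vignette.

```latex
\begin{aligned}
\text{identity:}\quad & p = \eta \\
\text{log:}\quad      & p = \exp(\eta) \\
\text{logit:}\quad    & p = \frac{1}{1 + \exp(-\eta)} \\[6pt]
\text{with } \eta = \beta_0 + \beta_1 x \text{ under the log link:}\quad
  & \frac{p(x + 1)}{p(x)}
    = \frac{\exp(\beta_0 + \beta_1 (x + 1))}{\exp(\beta_0 + \beta_1 x)}
    = \exp(\beta_1)
\end{aligned}
```

Read this way, the formula under the 'log' link is on the log-probability scale, and it is the exponentiated coefficient $\exp(\beta_1)$ that has the relative-risk interpretation.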
