216 nonrandom distribution returns a single value when repeated values are expected #225

kgoldfeld · 2024-05-10T20:45:48Z

No description provided.

…repeated-values-are-expected' of https://github.com/kgoldfeld/simstudy into 216-nonrandom-distribution-returns-a-single-value-when-repeated-values-are-expected

kgoldfeld · 2024-05-10T20:57:52Z

@assignUser I tried to create a test for the new custom functionality, and of course, I need to create a function as part of the test. However, the test doesn't run, because it can't see the function for some reason. Here is my test - any thoughts on how I can make it work?

test_that("custom data is generated as expected.", {
  skip_on_cran()

  trunc_norm <- function(n, lower, upper, mu = 0, s = 1.5) {

    F.a <- pnorm(lower, mean = mu, sd = s)
    F.b <- pnorm(upper, mean = mu, sd = s)

    u <- runif(n, min = F.a, max = F.b)
    qnorm(u, mean = mu, sd = s)

  }

  def <-
    defData(varname = "x", formula = 5, dist = "poisson") |>
    defData(varname = "y", formula = ".trunc_norm",
            variance = "s = 100, lower = x - 1, upper = x + 1",
            dist = "custom"
    )

  dd <- genData(10000, def)

  expect_true( dd[, min(y)] > dd[, min(x-1)])
  expect_true( dd[, max(y)] < dd[, max(x+1)])

})

Here is the error message:

dim = c(10000L, 
1L), dimnames = list(NULL, "x+1")))`: could not find function "trunc_norm"
Backtrace:
    ▆
 1. └─simstudy::genData(10000, def) at test-generate_dist.R:195:3
 2.   └─simstudy:::.generate(...) at simstudy/R/generate_data.R:86:7
 3.     └─simstudy:::.gencustom(...) at simstudy/R/generate_dist.R:16:3
 4.       └─base::do.call(fn, arg_list) at simstudy/R/generate_dist.R:371:3
[ FAIL 1 | WARN 1 | SKIP 0 | PASS 28 ]

Test complete

Of course, the code works if I just run it outside the context of a test.

github-actions · 2024-05-10T21:27:26Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if b6919cc is merged into main:

✔️define_data: 17.6ms -> 17.4ms [-3.51%, +1.93%]
✔️dist_beta: 211ms -> 212ms [-0.91%, +1.19%]
✔️dist_binary: 8.73ms -> 8.71ms [-1.31%, +0.93%]
✔️dist_binomial: 13ms -> 13ms [-1.44%, +2.46%]
✔️dist_categorical: 57.7ms -> 58.1ms [-0.18%, +1.45%]
✔️dist_exponential: 8.84ms -> 8.79ms [-2.01%, +0.78%]
✔️dist_gamma: 16.3ms -> 16.2ms [-1.5%, +0.09%]
✔️dist_mixture: 227ms -> 225ms [-1.91%, +0.56%]
✔️dist_normal: 11.2ms -> 10.6ms [-15.87%, +6.34%]
✔️gen_all_dists: 50.3ms -> 49.1ms [-6.26%, +1.57%]
Further explanation regarding interpretation and methodology can be found in the documentation.

kgoldfeld · 2024-05-11T01:40:05Z

I figured out the test issue - needed to change the environment when in the do.call(). It has the added benefit of working if the custom function is defined within another function.

I've created a new vignette - that should give you a better idea of how it is all working.

Let me know if you think I can merge this.

github-actions · 2024-05-11T01:41:25Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 6c10b21 is merged into main:

❗🐌define_data: 17.2ms -> 17.6ms [+0.67%, +3.25%]
✔️dist_beta: 212ms -> 213ms [-0.29%, +1.8%]
✔️dist_binary: 9.01ms -> 9.18ms [-4.33%, +8.28%]
✔️dist_binomial: 13.1ms -> 13ms [-2.07%, +1.1%]
✔️dist_categorical: 58.5ms -> 58.2ms [-1.78%, +0.69%]
✔️dist_exponential: 8.85ms -> 8.91ms [-1.18%, +2.67%]
✔️dist_gamma: 16.4ms -> 16.5ms [-1.81%, +3.67%]
✔️dist_mixture: 226ms -> 226ms [-1.47%, +1.2%]
✔️dist_normal: 10.6ms -> 11.2ms [-6.62%, +17.84%]
✔️gen_all_dists: 48.9ms -> 48.8ms [-1.7%, +1.34%]
Further explanation regarding interpretation and methodology can be found in the documentation.

assignUser

Wow, very cool, nice job! Neat how the systems come together and something is easier than expected 😁

I have just two small question:

this will at the moment only work with arguments fully specified as name = vale||var (just from looking at the code) If I am not mistaken I think that should be mentioned in the documentation to avoid confusion. (or the code changed to accept a mix as well but I think it's totally fine keeping it as is)
double-dot variables should also work right? that might also be worth a mention.

R/generate_dist.R

Co-authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>

github-actions · 2024-05-13T02:10:19Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 4bf87a0 is merged into main:

✔️define_data: 17.8ms -> 17.6ms [-4.01%, +1.5%]
✔️dist_beta: 215ms -> 214ms [-2.25%, +1.53%]
✔️dist_binary: 8.69ms -> 8.97ms [-3.29%, +9.82%]
✔️dist_binomial: 13.2ms -> 13.3ms [-3.49%, +4.89%]
✔️dist_categorical: 58.4ms -> 58.9ms [-2.22%, +3.73%]
✔️dist_exponential: 9.18ms -> 9.17ms [-1.9%, +1.83%]
✔️dist_gamma: 16.5ms -> 16.4ms [-2.06%, +0.58%]
✔️dist_mixture: 226ms -> 225ms [-2.12%, +1.86%]
✔️dist_normal: 10.6ms -> 10.8ms [-1.39%, +3.67%]
✔️gen_all_dists: 49.4ms -> 49.5ms [-0.9%, +1.48%]
Further explanation regarding interpretation and methodology can be found in the documentation.

kgoldfeld · 2024-05-13T02:31:26Z

Wow, very cool, nice job! Neat how the systems come together and something is easier than expected 😁

I have just two small question:
* this will at the moment only work with arguments fully specified as `name = vale||var` (just from looking at the code) If I am not mistaken I think that should be mentioned in the documentation to avoid confusion. (or the code changed to accept a mix as well but I think it's totally fine keeping it as is)

Yes - I will make it clear that we need that format "name = value or formula"

* double-dot variables should also work right? that might also be worth a mention.

Yes - double dot works here as well - that was the beauty of the system you set up - I was able to take advantage here.

kgoldfeld · 2024-05-13T02:52:06Z

I have updated the help documentation in simstudy.Rmd.

One thought I had. I now have the function name in the defData formula field, and the formula string in the variance field, becuase I thought it was logical to have the function name first. However, it might be less confusing to switch them so that the formula string is in the defData formula field, and the function name in the defData variance field. Do you think the change would be useful?

github-actions · 2024-05-13T03:12:45Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f297024 is merged into main:

✔️define_data: 17.2ms -> 17.3ms [-0.96%, +2.47%]
✔️dist_beta: 211ms -> 224ms [-1.31%, +13.24%]
❗🐌dist_binary: 8.65ms -> 8.81ms [+0.2%, +3.63%]
✔️dist_binomial: 13ms -> 13ms [-1.91%, +1.55%]
✔️dist_categorical: 59ms -> 57.8ms [-4.51%, +0.58%]
✔️dist_exponential: 8.82ms -> 8.89ms [-0.22%, +1.75%]
✔️dist_gamma: 16.5ms -> 16.5ms [-1.44%, +1.99%]
✔️dist_mixture: 222ms -> 224ms [-0.66%, +2.39%]
✔️dist_normal: 10.7ms -> 10.7ms [-1.79%, +1.62%]
✔️gen_all_dists: 48.7ms -> 48.9ms [-0.02%, +0.88%]
Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions · 2024-05-13T20:08:35Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 57a5939 is merged into main:

✔️define_data: 17.3ms -> 17.4ms [-0.59%, +1.45%]
✔️dist_beta: 215ms -> 216ms [-1.29%, +2.35%]
✔️dist_binary: 8.86ms -> 8.98ms [-1.96%, +4.75%]
✔️dist_binomial: 13.3ms -> 13.2ms [-2.9%, +2.03%]
✔️dist_categorical: 58.7ms -> 59ms [-0.22%, +1.21%]
✔️dist_exponential: 9.12ms -> 9.16ms [-2.12%, +3.02%]
✔️dist_gamma: 16.8ms -> 16.5ms [-2.96%, +0.22%]
✔️dist_mixture: 231ms -> 230ms [-2.72%, +1.46%]
✔️dist_normal: 10.8ms -> 10.8ms [-3.17%, +1.83%]
✔️gen_all_dists: 49.5ms -> 50ms [-0.97%, +2.75%]
Further explanation regarding interpretation and methodology can be found in the documentation.

assignUser · 2024-05-14T00:18:49Z

Do you think the change would be useful?

I think either way works but I would also tend to put the function name first, it just feels 'natural' even if the field name doesn't fit ^^

kgoldfeld · 2024-05-14T14:11:01Z

I guess I landed in the same place as you. This one is in the books - since it has been merged. I will likely be submitting the current development version as simstudy 0.8.0 in the next few weeks.

Keith Goldfeld and others added 5 commits May 10, 2024 11:27

Adding "custom" distribution

30e3056

Update CITATION.cff

c63ad11

Adding vignette and tests

a723c89

Merge branch '216-nonrandom-distribution-returns-a-single-value-when-…

50964d6

…repeated-values-are-expected' of https://github.com/kgoldfeld/simstudy into 216-nonrandom-distribution-returns-a-single-value-when-repeated-values-are-expected

Updating vignette and removing test

9389e76

kgoldfeld linked an issue May 10, 2024 that may be closed by this pull request

nonrandom distribution returns a single value when repeated values are expected #216

Closed

Fixing environment in .gencustom

197c961

assignUser approved these changes May 12, 2024

View reviewed changes

R/generate_dist.R Outdated Show resolved Hide resolved

R/generate_dist.R Show resolved Hide resolved

R/generate_dist.R Show resolved Hide resolved

Update R/generate_dist.R

5bbda02

Co-authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>

Updating custom help documentation

eb4175b

Fixing documentation and vignettes

aa853cc

kgoldfeld marked this pull request as ready for review May 13, 2024 20:17

kgoldfeld merged commit 1e16802 into main May 13, 2024
9 checks passed

kgoldfeld deleted the 216-nonrandom-distribution-returns-a-single-value-when-repeated-values-are-expected branch May 13, 2024 20:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

216 nonrandom distribution returns a single value when repeated values are expected #225

216 nonrandom distribution returns a single value when repeated values are expected #225

kgoldfeld commented May 10, 2024

kgoldfeld commented May 10, 2024

github-actions bot commented May 10, 2024

kgoldfeld commented May 11, 2024 •

edited

Loading

github-actions bot commented May 11, 2024

assignUser left a comment

github-actions bot commented May 13, 2024

kgoldfeld commented May 13, 2024

kgoldfeld commented May 13, 2024

github-actions bot commented May 13, 2024

github-actions bot commented May 13, 2024

assignUser commented May 14, 2024

kgoldfeld commented May 14, 2024

216 nonrandom distribution returns a single value when repeated values are expected #225

216 nonrandom distribution returns a single value when repeated values are expected #225

Conversation

kgoldfeld commented May 10, 2024

kgoldfeld commented May 10, 2024

github-actions bot commented May 10, 2024

kgoldfeld commented May 11, 2024 • edited Loading

github-actions bot commented May 11, 2024

assignUser left a comment

Choose a reason for hiding this comment

github-actions bot commented May 13, 2024

kgoldfeld commented May 13, 2024

kgoldfeld commented May 13, 2024

github-actions bot commented May 13, 2024

github-actions bot commented May 13, 2024

assignUser commented May 14, 2024

kgoldfeld commented May 14, 2024

kgoldfeld commented May 11, 2024 •

edited

Loading