Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

216 nonrandom distribution returns a single value when repeated values are expected #225

Conversation

kgoldfeld
Copy link
Owner

No description provided.

Keith Goldfeld and others added 5 commits May 10, 2024 11:27
@kgoldfeld
Copy link
Owner Author

@assignUser I tried to create a test for the new custom functionality, and of course, I need to create a function as part of the test. However, the test doesn't run, because it can't see the function for some reason. Here is my test - any thoughts on how I can make it work?

test_that("custom data is generated as expected.", {
  skip_on_cran()

  trunc_norm <- function(n, lower, upper, mu = 0, s = 1.5) {

    F.a <- pnorm(lower, mean = mu, sd = s)
    F.b <- pnorm(upper, mean = mu, sd = s)

    u <- runif(n, min = F.a, max = F.b)
    qnorm(u, mean = mu, sd = s)

  }

  def <-
    defData(varname = "x", formula = 5, dist = "poisson") |>
    defData(varname = "y", formula = ".trunc_norm",
            variance = "s = 100, lower = x - 1, upper = x + 1",
            dist = "custom"
    )

  dd <- genData(10000, def)

  expect_true( dd[, min(y)] > dd[, min(x-1)])
  expect_true( dd[, max(y)] < dd[, max(x+1)])

})

Here is the error message:

dim = c(10000L, 
1L), dimnames = list(NULL, "x+1")))`: could not find function "trunc_norm"
Backtrace:
    ▆
 1. └─simstudy::genData(10000, def) at test-generate_dist.R:195:3
 2.   └─simstudy:::.generate(...) at simstudy/R/generate_data.R:86:7
 3.     └─simstudy:::.gencustom(...) at simstudy/R/generate_dist.R:16:3
 4.       └─base::do.call(fn, arg_list) at simstudy/R/generate_dist.R:371:3
[ FAIL 1 | WARN 1 | SKIP 0 | PASS 28 ]

Test complete

Of course, the code works if I just run it outside the context of a test.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if b6919cc is merged into main:

  • ✔️define_data: 17.6ms -> 17.4ms [-3.51%, +1.93%]
  • ✔️dist_beta: 211ms -> 212ms [-0.91%, +1.19%]
  • ✔️dist_binary: 8.73ms -> 8.71ms [-1.31%, +0.93%]
  • ✔️dist_binomial: 13ms -> 13ms [-1.44%, +2.46%]
  • ✔️dist_categorical: 57.7ms -> 58.1ms [-0.18%, +1.45%]
  • ✔️dist_exponential: 8.84ms -> 8.79ms [-2.01%, +0.78%]
  • ✔️dist_gamma: 16.3ms -> 16.2ms [-1.5%, +0.09%]
  • ✔️dist_mixture: 227ms -> 225ms [-1.91%, +0.56%]
  • ✔️dist_normal: 11.2ms -> 10.6ms [-15.87%, +6.34%]
  • ✔️gen_all_dists: 50.3ms -> 49.1ms [-6.26%, +1.57%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@kgoldfeld
Copy link
Owner Author

kgoldfeld commented May 11, 2024

I figured out the test issue - needed to change the environment when in the do.call(). It has the added benefit of working if the custom function is defined within another function.

I've created a new vignette - that should give you a better idea of how it is all working.

Let me know if you think I can merge this.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 6c10b21 is merged into main:

  • ❗🐌define_data: 17.2ms -> 17.6ms [+0.67%, +3.25%]
  • ✔️dist_beta: 212ms -> 213ms [-0.29%, +1.8%]
  • ✔️dist_binary: 9.01ms -> 9.18ms [-4.33%, +8.28%]
  • ✔️dist_binomial: 13.1ms -> 13ms [-2.07%, +1.1%]
  • ✔️dist_categorical: 58.5ms -> 58.2ms [-1.78%, +0.69%]
  • ✔️dist_exponential: 8.85ms -> 8.91ms [-1.18%, +2.67%]
  • ✔️dist_gamma: 16.4ms -> 16.5ms [-1.81%, +3.67%]
  • ✔️dist_mixture: 226ms -> 226ms [-1.47%, +1.2%]
  • ✔️dist_normal: 10.6ms -> 11.2ms [-6.62%, +17.84%]
  • ✔️gen_all_dists: 48.9ms -> 48.8ms [-1.7%, +1.34%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Copy link
Collaborator

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, very cool, nice job! Neat how the systems come together and something is easier than expected 😁

I have just two small question:

  • this will at the moment only work with arguments fully specified as name = vale||var (just from looking at the code) If I am not mistaken I think that should be mentioned in the documentation to avoid confusion. (or the code changed to accept a mix as well but I think it's totally fine keeping it as is)
  • double-dot variables should also work right? that might also be worth a mention.

Co-authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 4bf87a0 is merged into main:

  • ✔️define_data: 17.8ms -> 17.6ms [-4.01%, +1.5%]
  • ✔️dist_beta: 215ms -> 214ms [-2.25%, +1.53%]
  • ✔️dist_binary: 8.69ms -> 8.97ms [-3.29%, +9.82%]
  • ✔️dist_binomial: 13.2ms -> 13.3ms [-3.49%, +4.89%]
  • ✔️dist_categorical: 58.4ms -> 58.9ms [-2.22%, +3.73%]
  • ✔️dist_exponential: 9.18ms -> 9.17ms [-1.9%, +1.83%]
  • ✔️dist_gamma: 16.5ms -> 16.4ms [-2.06%, +0.58%]
  • ✔️dist_mixture: 226ms -> 225ms [-2.12%, +1.86%]
  • ✔️dist_normal: 10.6ms -> 10.8ms [-1.39%, +3.67%]
  • ✔️gen_all_dists: 49.4ms -> 49.5ms [-0.9%, +1.48%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@kgoldfeld
Copy link
Owner Author

Wow, very cool, nice job! Neat how the systems come together and something is easier than expected 😁

I have just two small question:

* this will at the moment only work with arguments fully specified as `name = vale||var` (just from looking at the code) If I am not mistaken I think that should be mentioned in the documentation to avoid confusion. (or the code changed to accept a mix as well but I think it's totally fine keeping it as is)

Yes - I will make it clear that we need that format "name = value or formula"

* double-dot variables should also work right? that might also be worth a mention.

Yes - double dot works here as well - that was the beauty of the system you set up - I was able to take advantage here.

@kgoldfeld
Copy link
Owner Author

I have updated the help documentation in simstudy.Rmd.

One thought I had. I now have the function name in the defData formula field, and the formula string in the variance field, becuase I thought it was logical to have the function name first. However, it might be less confusing to switch them so that the formula string is in the defData formula field, and the function name in the defData variance field. Do you think the change would be useful?

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f297024 is merged into main:

  • ✔️define_data: 17.2ms -> 17.3ms [-0.96%, +2.47%]
  • ✔️dist_beta: 211ms -> 224ms [-1.31%, +13.24%]
  • ❗🐌dist_binary: 8.65ms -> 8.81ms [+0.2%, +3.63%]
  • ✔️dist_binomial: 13ms -> 13ms [-1.91%, +1.55%]
  • ✔️dist_categorical: 59ms -> 57.8ms [-4.51%, +0.58%]
  • ✔️dist_exponential: 8.82ms -> 8.89ms [-0.22%, +1.75%]
  • ✔️dist_gamma: 16.5ms -> 16.5ms [-1.44%, +1.99%]
  • ✔️dist_mixture: 222ms -> 224ms [-0.66%, +2.39%]
  • ✔️dist_normal: 10.7ms -> 10.7ms [-1.79%, +1.62%]
  • ✔️gen_all_dists: 48.7ms -> 48.9ms [-0.02%, +0.88%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 57a5939 is merged into main:

  • ✔️define_data: 17.3ms -> 17.4ms [-0.59%, +1.45%]
  • ✔️dist_beta: 215ms -> 216ms [-1.29%, +2.35%]
  • ✔️dist_binary: 8.86ms -> 8.98ms [-1.96%, +4.75%]
  • ✔️dist_binomial: 13.3ms -> 13.2ms [-2.9%, +2.03%]
  • ✔️dist_categorical: 58.7ms -> 59ms [-0.22%, +1.21%]
  • ✔️dist_exponential: 9.12ms -> 9.16ms [-2.12%, +3.02%]
  • ✔️dist_gamma: 16.8ms -> 16.5ms [-2.96%, +0.22%]
  • ✔️dist_mixture: 231ms -> 230ms [-2.72%, +1.46%]
  • ✔️dist_normal: 10.8ms -> 10.8ms [-3.17%, +1.83%]
  • ✔️gen_all_dists: 49.5ms -> 50ms [-0.97%, +2.75%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@kgoldfeld kgoldfeld marked this pull request as ready for review May 13, 2024 20:17
@kgoldfeld kgoldfeld merged commit 1e16802 into main May 13, 2024
9 checks passed
@kgoldfeld kgoldfeld deleted the 216-nonrandom-distribution-returns-a-single-value-when-repeated-values-are-expected branch May 13, 2024 20:21
@assignUser
Copy link
Collaborator

Do you think the change would be useful?

I think either way works but I would also tend to put the function name first, it just feels 'natural' even if the field name doesn't fit ^^

@kgoldfeld
Copy link
Owner Author

I guess I landed in the same place as you. This one is in the books - since it has been merged. I will likely be submitting the current development version as simstudy 0.8.0 in the next few weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nonrandom distribution returns a single value when repeated values are expected
2 participants