Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault in CVAdataStore #49

Open
lcontento opened this issue Sep 10, 2020 · 5 comments
Open

Segmentation fault in CVAdataStore #49

lcontento opened this issue Sep 10, 2020 · 5 comments

Comments

@lcontento
Copy link

For some parameter values I get a segmentation fault while doing adjoint sensitivity analysis.
The problematic point is line 2033 in the following snippet:

sundials/src/cvodes/cvodea.c

Lines 2027 to 2037 in 0c83e0b

i = 1;
do {
flag = CVode(cv_mem, ck_mem->ck_t1, ca_mem->ca_ytmp, &t, CV_ONE_STEP);
if (flag < 0) return(CV_FWD_FAIL);
dt_mem[i]->t = t;
ca_mem->ca_IMstore(cv_mem, dt_mem[i]);
i++;
} while ( sign*(ck_mem->ck_t1 - t) > ZERO );

It seems that the index i is not checked for being inside the bounds of the dt_mem array.
If I add the check if (i > ca_mem->ca_nsteps) return(CV_FWD_FAIL); at the beginning of the loop body the segmentation fault is solved.
However, I am not sure if this is the correct thing to do or if index i should always be inside the bounds and the actual problem lies elsewhere.

@lcontento
Copy link
Author

I investigated the matter a bit more. It seems that the software package I use (https://github.com/AMICI-dev/AMICI) is calling functions not in the public API. So maybe when only the public API is used the index i would be guaranteed to be inside the valid bounds (not sure).

@aseyboldt
Copy link

This might be the same issue as #31?

@gardner48 gardner48 self-assigned this Jun 27, 2022
@balos1 balos1 added the triage label Mar 29, 2023
dweindl added a commit to dweindl/AMICI that referenced this issue Feb 16, 2025
amici wants to avoid sundials' checkpointing, that's we set `steps` (the number of integration steps at which a checkpoint is created) for `CVodeAdjInit` to the maximum number of integration steps (`mxsteps`) for the forward problem.

However, the checkpoint is created at step `steps`, not at step `steps + 1`.

Therefore, if the forward integration takes exactly `mxsteps`, a checkpoint is still created.

This is not a problem per se, but it may trigger segfaults under circumstances not fully understood (-> LLNL/sundials#49).
Although unlikely, it still occasionally crashes long running optimizations.

This change should make sure that checkpointing never occurs.
github-merge-queue bot pushed a commit to AMICI-dev/AMICI that referenced this issue Feb 17, 2025
amici wants to avoid sundials' checkpointing, that's we set `steps` (the number of integration steps at which a checkpoint is created) for `CVodeAdjInit` to the maximum number of integration steps (`mxsteps`) for the forward problem.

However, the checkpoint is created at step `steps`, not at step `steps + 1`.

Therefore, if the forward integration takes exactly `mxsteps`, a checkpoint is still created.

This is not a problem per se, but it may trigger segfaults under circumstances not fully understood (-> LLNL/sundials#49).
Although unlikely, it still occasionally crashes long running optimizations.

This change should make sure that checkpointing never occurs.
dweindl added a commit to dweindl/AMICI that referenced this issue Feb 18, 2025
amici wants to avoid sundials' checkpointing, that's we set `steps` (the number of integration steps at which a checkpoint is created) for `CVodeAdjInit` to the maximum number of integration steps (`mxsteps`) for the forward problem.

However, the checkpoint is created at step `steps`, not at step `steps + 1`.

Therefore, if the forward integration takes exactly `mxsteps`, a checkpoint is still created.

This is not a problem per se, but it may trigger segfaults under circumstances not fully understood (-> LLNL/sundials#49).
Although unlikely, it still occasionally crashes long running optimizations.

This change should make sure that checkpointing never occurs.
@dweindl
Copy link
Contributor

dweindl commented Feb 18, 2025

Hi, this issue still persists with SUNDIALS 7.1.1. I can reproduce it, but unfortunately, I cannot provide a self-contained example. The problem occurs if forward integration from a checkpoint under CVodeB takes more steps than during the initial forward integration under CVodeF.

While this shouldn't happen, it does happen occasionally. I don't know what exactly is causing this.
I could imagine 1) just slightly different hyperparameters, or 2) non-deterministic behavior in the user provided functions or in the solver (for example, due to parallelization) resulting in tiny numerical inaccuracies. (Note that a single additional integrator step will cause a segfault here.)

While I don't necessarily expect SUNDIALS to be able to handle this case, it would be great to at least gracefully fail with an error message if i > ca_mem->ca_nsteps instead of segfaulting at dt_mem[i]->t. I'm happy to provide more information if needed.

@dweindl
Copy link
Contributor

dweindl commented Feb 20, 2025

This issue can easily be triggered by providing a non-deterministic Jacobian. For example, by applying the following patch to cvsRoberts_ASAi_dns.c. This is a contrived example, but I still think CVODES shouldn't crash in this case.

diff --git a/examples/cvodes/serial/cvsRoberts_ASAi_dns.c b/examples/cvodes/serial/cvsRoberts_ASAi_dns.c
index c60c74ce2..78abf884a 100644
--- a/examples/cvodes/serial/cvsRoberts_ASAi_dns.c
+++ b/examples/cvodes/serial/cvsRoberts_ASAi_dns.c
@@ -162,6 +162,7 @@ int main(int argc, char* argv[])
   ckpnt     = NULL;
   y = yB = qB = NULL;
 
+  srand(0);
   /* Print problem description */
   printf("\nAdjoint Sensitivity Example for Chemical Kinetics\n");
   printf("-------------------------------------------------\n\n");
@@ -581,7 +582,7 @@ static int Jac(sunrealtype t, N_Vector y, N_Vector fy, SUNMatrix J,
   p2   = data->p[1];
   p3   = data->p[2];
 
-  IJth(J, 1, 1) = -p1;
+  IJth(J, 1, 1) = -p1 + p1 / 1000.0 * ((double)rand() / RAND_MAX);
   IJth(J, 1, 2) = p2 * y3;
   IJth(J, 1, 3) = p2 * y2;
   IJth(J, 2, 1) = p1;

gdb -ex r -ex bt build/examples/cvodes/serial/cvsRoberts_ASAi_dns:

Program received signal SIGSEGV, Segmentation fault.
0x000055555555c0ab in CVAdataStore (cv_mem=0x5555555aac60, ck_mem=0x5555555e03c0) at sundials/src/cvodes/cvodea.c:2363
2363        dt_mem[i]->t = t;
#0  0x000055555555c0ab in CVAdataStore (cv_mem=0x5555555aac60, ck_mem=0x5555555e03c0) at sundials/src/cvodes/cvodea.c:2363
#1  0x0000555555559e67 in CVodeB (cvode_mem=0x5555555aac60, tBout=40, itaskB=1) at sundials/src/cvodes/cvodea.c:1547
#2  0x0000555555556228 in main (argc=1, argv=0x7fffffffd368) at sundials/examples/cvodes/serial/cvsRoberts_ASAi_dns.c:415

@balos1
Copy link
Member

balos1 commented Feb 20, 2025

Thanks for the reproducer @dweindl. I think adding a check to gracefully fail with an error message is reasonable.

@balos1 balos1 added enhancement and removed triage labels Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants