Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Developer Issue]: Why not RTMB? #554

Open
Andrea-Havron-NOAA opened this issue Mar 11, 2024 · 20 comments
Open

[Developer Issue]: Why not RTMB? #554

Andrea-Havron-NOAA opened this issue Mar 11, 2024 · 20 comments
Assignees
Labels
kind: feature New feature or request kind: question Further information is requested kind: refactor Restructure code to improve the implementation of FIMS P2 mid-level priority status: needs more info Information is needed from the issue creator before forward progress can be made. theme: more features
Milestone

Comments

@Andrea-Havron-NOAA
Copy link
Collaborator

Issue details

RTMB is a wrapper interface that allows models written in R to be directly passed to TMB. Users no longer need to write models in C++ in order to access the TMB statistical inference engine, but can write them directly in R. Both RTMB models and original TMB models are translated into the same C++ computational graphs for statistical inferece.

The question, why not write FIMS in R and use RTMB? has come up several times. It is important to document this decision point and get feedback from the implementation team. A final version of the decision points should be made available to the community similar to issue NOAA-FIMS/noaa-fims.github.io#2.

I've come up with the following reasons:

  • FIMS started development before RTMB and would involve a cost to switch over from C++ to R.
  • FIMS aims to be portable and writing in C++ allows greater portability with platforms outside R (python - primary machine learning platform)
  • It is easier to introduce unintended bugs in R given it is a dynamic language and implements implicit conversion (R changes things on you without you realizing it).

There is a tradeoff between portability and community accessibility, both of which are stated goals of the FIMS project.

@Andrea-Havron-NOAA Andrea-Havron-NOAA added the status: triage_needed This is not approved for this milestone, do not work on it yet label Mar 11, 2024
@nathanvaughan-NOAA
Copy link
Contributor

I don't have an answer to the question but this made me think is it possible that we could use RTMB to allow user-specified R functions to be incorporated into a FIMS model such as a selective function? That could be a really useful use case for avoiding adding multitudes of function variations to base FIMS while allowing users flexibility?

@JaneSullivan-NOAA
Copy link
Contributor

Pro RTMB:

  • NO compilation = fast, fun, interactive development
  • NO C++ or templated C++ = a concise code base
  • NO Rcpp
  • Only one line of code to gain TMB's simulation and OSA functionality (though I recognize there is still some development that needs to happen in order for this to be fully operational)
  • never leave the R environment
  • we can still have modularity
  • we can all contribute and use our existing expertise as R programmers and stock assessment scientists

@JaneSullivan-NOAA
Copy link
Contributor

I don't have an answer to the question but this made me think is it possible that we could use RTMB to allow user-specified R functions to be incorporated into a FIMS model such as a selective function? That could be a really useful use case for avoiding adding multitudes of function variations to base FIMS while allowing users flexibility?

@nathanvaughan-NOAA to me this is the only "easy" thing to actually do in FIMS, so I don't see the benefit of creating a way to do this in R (PS it seems Matthew has already done something similar). The hard part of FIMS is the core architecture, the added dimensions, the Rcpp interface, connecting it to TMB, etc.

@JaneSullivan-NOAA
Copy link
Contributor

It is easier to introduce unintended bugs in R given it is a dynamic language and implements implicit conversion (R changes things on you without you realizing it).
@Andrea-Havron-NOAA I think a lot of work has been done to prevent this. Can you provide an example?

@nathanvaughan-NOAA
Copy link
Contributor

Great points @JaneSullivan-NOAA !! and thanks for the link, that is exactly what I was thinking.

@msupernaw
Copy link
Contributor

msupernaw commented Mar 12, 2024 via email

@kellijohnson-NOAA
Copy link
Contributor

Thanks @Andrea-Havron-NOAA for starting this. I agree that bullet (1) is true but I do not know if I see that as a limitation to not changing. I would like to think that if there was any kind of software that suited FIMS better than what we are currently using that we would put in the effort to make the switch. I am not saying that RTMB is better just that if it was I would think that especially at this stage of development that we would put in the effort to switch.

@JaneSullivan-NOAA
Copy link
Contributor

JaneSullivan-NOAA commented Mar 12, 2024

R is a untyped programming language. In a untyped language, a type for a variable can change at any time. For example, a variable defined as a list in R can easily be redefined as any other type, such as a vector, matrix, S4 object, etc and no warning will be given. I believe this is the kind of unintended bug

@msupernaw my point is that this is an R issue and not an RTMB issue, at least not to my knowledge. RTMB does the typing for the user ("typing" - is that how you say it?). Specifically, before we identify this as a "problem" in the "why not RTMB" column, I think we should have at least one example of it actually being a problem in RTMB.

The closest thing I know of is that RTMB does not currently accept sparse matrix classes, so you have to as.vector() them inside a likelihood (example from Sean Anderson). This is a different issue though.

@ChristineStawitz-NOAA
Copy link
Contributor

One other consideration is if using RTMB reduces portability, increasing our dependence on TMB which is externally developed and has no designated successor to maintain the code base. I know the user base is large enough there is hope the community could generate support, but it's a consideration IMO.

@Andrea-Havron-NOAA
Copy link
Collaborator Author

Andrea-Havron-NOAA commented Mar 12, 2024

@msupernaw my point is that this is an R issue and not an RTMB issue, at least not to my knowledge. RTMB does the typing for the user ("typing" - is that how you say it?). Specifically, before we identify this as a "problem" in the "why not RTMB" column, I think we should have at least one example of it actually being a problem in RTMB.

The closest thing I know of is that RTMB does not currently accept sparse matrix classes, so you have to as.vector() them inside a likelihood (example from Sean Anderson). This is a different issue though.

Thanks for raising this point Jane. RTMB, though, is just an R interface for TMB. RTMB specific functions are written in C++ and exposed to R via Rcpp. The objective function gets coded up in R. Any non-RTMB specific function (for example in FIMS, the population loop and code written to calculate expected/derived values) would be written in R.

Here is an example from RTMB's mvrw.R where I introduced an error by dropping the indices on the sd vector when calculating the covariance matrix.

Correct code:

cov <- outer(1:stateDim,
                 1:stateDim,
                 function(i,j) rho^(abs(i-j)) * sds[i] * sds[j])
>  cov 
       [,1]   [,2] [,3]
[1,] 0.2500 0.5625 0.81
[2,] 0.5625 1.5625 2.25
[3,] 0.8100 2.2500 4.00

Code with missing indices:

cov <- outer(1:stateDim,
                 1:stateDim,
                 function(i,j) rho^(abs(i-j)) * sds * sds)
> cov
        [,1]   [,2]    [,3]
[1,] 0.25000 0.2250 0.20250
[2,] 1.40625 1.5625 1.40625
[3,] 3.24000 3.6000 4.00000

I compiled the TMB equivalent example, mvrw.cpp and it failed with 'invalid cast from type' error due to trying to multiply a scalar times a vector,

@nathanvaughan-NOAA
Copy link
Contributor

nathanvaughan-NOAA commented Mar 12, 2024

@Andrea-Havron-NOAA and @JaneSullivan-NOAA. That is a good example of how R can do wild things when it makes incorrect assumptions. Are there any important cases where this could happen with RTMB due to a user error that wouldn't happen if we use C++ base code? I assume that we will just need to ensure this kind of error doesn't occur in developer written FIMS code through tests and case studies. I also understand @ChristineStawitz-NOAA's desire for portability due to unknown future support for TMB but we should also consider the potential for future NOAA support of FIMS if we limit accessibility to only/largely C++ which has already resulted in significant issues within our development team. Coding FIMS in all R would substantially increase current and future participation in development within both our development team and the rest of NOAA. Using SS3 as an example and almost 100% reliance on @Rick-Methot-NOAA for updates/support it seems that future FIMS development support may be much more precarious/limited than TMB support given the more limited user base? I also agree with @kellijohnson-NOAA that we should not consider porting over to RTMB as a negative at this point given the significant refactoring and ongoing future development that is already planned. That may have been a little all over the place so in summary.

  1. Would moving to RTMB add new user-induced bugs that cannot be caught in our code, but would be with C++? If yes then that is a significant concern as silent errors that give plausible but wrong results are a huge liability in production assessments particularly during on the fly revisions that are often requested in the SE and I assume elsewhere.

  2. If no to 1 then this just comes down to a tradeoff between community/developer accessibility vs future portability with the additional issue that FIMS would only remain portable as long as there was developer support to do the porting. We don't want to end up in an SS type situation where only a small group of people are golden handcuffed to a lifetime of development support.

@JaneSullivan-NOAA
Copy link
Contributor

@Andrea-Havron-NOAA great example! Thank you :) I'll just go shut up now.

@Andrea-Havron-NOAA
Copy link
Collaborator Author

Thanks everyone for all the great feedback. I think any platform we develop needs to be somewhat independent of TMB. I see two main potential future use cases that would not depend on TMB:

  • FIMS as an operating model for MSEs/ecological/economic models
  • Machine Learning if/once ML outperforms statistical inference

@JonBrodziak
Copy link
Contributor

In future assessment modeling situations, Brieman’s two cultures of data modeling and algorithmic modeling may both need to be considered to produce the best predictive accuracy (Breiman 2001). This contrasts with the long-term fisheries assessment modeling practice of deploying data models with parameter sets to be estimated in contrast to machine learning algorithms to make probabilistic classifications of stock status for management action. Optimization methods for assessments formulated as data models typically use gradient descent and assume differentiable objective functions. Other applicable but less common optimization approaches include stochastic search algorithms based on Monte Carlo simulation or generic hill-climbing algorithm based on simulated annealing or particle swarm optimizers for frequentist or Markov Chain Monte Carlo or Hamiltonian Markov Chain algorithms for Bayesian sampling from the target posterior distribution. In comparison, there are few instances of algorithmic or machine learning models that have been applied in marine science to date (Malde et al. 2020). However, more frequent future applications of machine learning for assessment purposes can be expected (Beyan and Brouwer 2020, Sartar et al. 2021) based on predictive accuracies being obtained for species identification (e.g., Smolinski et al. 2019, Allken et al. 2021), forecasting environmental conditions (e.g., Ye et al. 2021) forecasting recruitment or spawning biomass (Smolinksi 2018, Lüdtke and Pierce 2023) or setting prior distributions for probable stock size (Froese et al. 2023). While there is no certainty about the future needs for stock assessment modeling, an important future challenge is to ensure that any next-generation assessment platform be adaptive to potential changes from a data model-based to a more algorithmic modeling culture.

@Cole-Monnahan-NOAA
Copy link
Contributor

@JonBrodziak can you give a little more context here? Are you saying that we should build FIMS to do traditional stats and ML equally well? How does this interact with RTMB vs C++?

FYI the key paper cited is here.

@ChristineStawitz-NOAA
Copy link
Contributor

Happy to have this thread continue our discussion, but if you do have a specific pro or con or suggested action I would like to capture them in this doc.

@JonBrodziak
Copy link
Contributor

JonBrodziak commented Mar 14, 2024 via email

@ChristineStawitz-NOAA ChristineStawitz-NOAA added theme: more features and removed status: triage_needed This is not approved for this milestone, do not work on it yet labels Apr 16, 2024
@ChristineStawitz-NOAA
Copy link
Contributor

@JaneSullivan-NOAA is going to start prototyping an age-length structured approach in RTMB which will help inform this decision

@JaneSullivan-NOAA
Copy link
Contributor

JaneSullivan-NOAA commented Apr 16, 2024

Nathan and I have some meeting times set May 13-17 on the FIMS calendar if anyone else wants to join.

@ChristineStawitz-NOAA ChristineStawitz-NOAA added this to the Parking lot milestone May 20, 2024
@kellijohnson-NOAA kellijohnson-NOAA modified the milestones: Parking lot, 2 Jun 14, 2024
@kellijohnson-NOAA kellijohnson-NOAA added kind: feature New feature or request kind: question Further information is requested status: needs more info Information is needed from the issue creator before forward progress can be made. kind: refactor Restructure code to improve the implementation of FIMS P2 mid-level priority labels Jun 14, 2024
@Andrea-Havron-NOAA
Copy link
Collaborator Author

It's possible we may be able to pass a FIMS MakeADFun object to RTMB. FIMS would need to be compiled with TMBad to get this to work, which is scheduled for the quality milestone after M2. Benefits of passing to RTMB would be accessing visualization tools for the tape, although the usefulness of this is questionable given how complicated the tape will likely be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: feature New feature or request kind: question Further information is requested kind: refactor Restructure code to improve the implementation of FIMS P2 mid-level priority status: needs more info Information is needed from the issue creator before forward progress can be made. theme: more features
Projects
None yet
Development

No branches or pull requests

8 participants