Exploding entropy temperature #34
Hi,
When I set automatic_entropy_tuning to true in an environment with an action space of shape 1, my entropy temperature explodes and increases exponentially to a magnitude of 10^8 before PyTorch fails and crashes the run. Any ideas as to why this happens?

Comments
Did you ever manage to solve this problem? I'm encountering a similarly exploding temperature in my environment: no matter what target value I choose, the policy eventually reaches that entropy and the temperature starts increasing...
Hi, not at the moment. I also compared the implementation to OpenAI's Baselines and experimented with theirs, but got similar results. I'm working on multiple things at the moment but will provide an update if I find anything.
I found it can be solved by deleting "self.action_scale" in line 103 of model.py.
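For reference, a typical tanh-squashed Gaussian policy with an action_scale/action_bias rescaling looks roughly like the sketch below. This is a minimal illustrative version, not necessarily this repo's exact model.py; the layer sizes and constants are assumptions. It shows where self.action_scale multiplies the squashed sample and enters the log-probability correction:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

LOG_SIG_MIN, LOG_SIG_MAX = -20, 2
EPSILON = 1e-6

class GaussianPolicy(nn.Module):
    """Minimal tanh-squashed Gaussian policy (illustrative, not the repo's exact code)."""
    def __init__(self, num_inputs, num_actions, hidden_dim, action_scale=1.0, action_bias=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_inputs, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.mean_linear = nn.Linear(hidden_dim, num_actions)
        self.log_std_linear = nn.Linear(hidden_dim, num_actions)
        self.action_scale = torch.tensor(action_scale)
        self.action_bias = torch.tensor(action_bias)

    def sample(self, state):
        h = self.net(state)
        mean = self.mean_linear(h)
        log_std = self.log_std_linear(h).clamp(LOG_SIG_MIN, LOG_SIG_MAX)
        normal = Normal(mean, log_std.exp())
        x_t = normal.rsample()                 # reparameterised pre-squash sample
        y_t = torch.tanh(x_t)                  # squash into (-1, 1)
        action = y_t * self.action_scale + self.action_bias  # rescale to the env's bounds
        # Change-of-variables correction for the tanh squashing.
        log_prob = normal.log_prob(x_t)
        log_prob -= torch.log(self.action_scale * (1 - y_t.pow(2)) + EPSILON)
        log_prob = log_prob.sum(1, keepdim=True)
        mean_action = torch.tanh(mean) * self.action_scale + self.action_bias
        return action, log_prob, mean_action
```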
I am facing the same problem. It is not solved yet, but I can locate the issue: in GaussianPolicy -> sample(), the policy output x_t is transformed by tanh(x_t). If you look at the shape of tanh(), you will see that it is essentially equal to 1 or -1 for all arguments outside a small region between roughly -5 and 5. As a result we get a clipped y_t: the actions get pinned to the action_space constraints. The algorithm stops exploring and therefore keeps increasing the temperature factor alpha, which leads first to an exploding temperature factor and then to an exploding critic loss.

The solution should be to return x_t instead of action in the return statement of the sample() method. However, that leads to the error below. I am trying to solve this and will let you know if there are updates from my side. If you have any thoughts or inputs on this, please let me know.

Traceback (most recent call last):
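To make the failure mode concrete, standard SAC automatic temperature tuning updates alpha from the policy's log-probabilities against a fixed entropy target, roughly as sketched below (log_alpha, target_entropy, and the learning rate are the usual SAC names and assumed values, not necessarily this repo's exact code). Once tanh saturates and the measured entropy stays below the target, every update pushes log_alpha higher:

```python
import torch

# Illustrative sketch of SAC automatic temperature tuning (assumed names/values).
log_alpha = torch.zeros(1, requires_grad=True)   # alpha = exp(log_alpha)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)
target_entropy = -1.0                            # commonly -dim(action_space)

def update_temperature(log_prob):
    # If the policy's entropy (-log_prob) stays below target_entropy, then
    # (log_prob + target_entropy) is positive, so this loss keeps pushing
    # log_alpha upward; with a saturated tanh the entropy never recovers
    # and alpha grows without bound.
    alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp()
```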
Can you explain a little bit more what you mean by that? I am also trying to fix it on my side.
Does the OpenAI implementation also have the exploding temperature?
The issue is solved on my side. Are you using a custom environment? My problem was related to the environment. Make sure to punish your agent when it proposes a value outside the desired interval.
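For anyone with a custom environment, one way to do that is an explicit out-of-bounds penalty inside the environment's step(), as in the rough sketch below; self._inner_step and the penalty weight are hypothetical placeholders:

```python
import numpy as np

def step(self, action):
    """Hypothetical custom-env step() that penalises out-of-range actions."""
    low, high = self.action_space.low, self.action_space.high
    # Distance by which the proposed action leaves the valid interval.
    overshoot = np.maximum(action - high, 0.0) + np.maximum(low - action, 0.0)
    penalty = 10.0 * float(np.sum(overshoot))   # assumed penalty weight
    # Step the underlying dynamics with a clipped action, but subtract the penalty.
    obs, reward, done, info = self._inner_step(np.clip(action, low, high))
    return obs, reward - penalty, done, info
```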
I am confused; I thought the aim of the squashed Gaussian was precisely not to go outside the interval? So the code is working on your side with the learned temperature and without modification? (I am trying to use it on 'LunarLanderContinuous-v2' right now; the scores hover around 0, and to solve it you need >200.)
Yes. It works without modification. If I have time in the evening I'll check LunarLander for you. |
Oh, my bad, it is working 😶 Thank you though :) |