Adds an initial StableBaselines3 RL environment as an example #2667

arjo129 · 2024-11-07T07:12:04Z

This PR provides a very basic example of how to use Gazebo with StableBaselines3 for reinforcement learning (RL). The example is the classic cart-pole, which is commonly used as a "getting started" task in RL. The Python script trains a simple model to balance the pole on the cart. We use the GUI to visualize it.
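For readers unfamiliar with the shape of such an environment, here is a minimal sketch of the reset/step loop a cart-pole task exposes to an RL library. It is not the code in this PR: `StandInCartPoleSim`, `CartPoleEnvSketch` and `run_episode` are hypothetical names, the physics is a small stand-in integrator rather than Gazebo, and the method signatures only mimic the Gymnasium convention without importing it.

```python
import math
import random


class StandInCartPoleSim:
    """Hypothetical stand-in for the simulator. This is NOT the gz-sim API."""

    GRAVITY = 9.8
    CART_MASS = 1.0
    POLE_MASS = 0.1
    POLE_HALF_LENGTH = 0.5

    def __init__(self, dt=0.02, seed=None):
        self.dt = dt
        self._rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Put the cart at the origin with the pole close to upright.
        self.x = 0.0
        self.x_dot = 0.0
        self.theta = self._rng.uniform(-0.05, 0.05)
        self.theta_dot = 0.0

    def step(self, force):
        # Textbook cart-pole equations of motion, integrated with explicit Euler.
        total_mass = self.CART_MASS + self.POLE_MASS
        pole_ml = self.POLE_MASS * self.POLE_HALF_LENGTH
        sin_t, cos_t = math.sin(self.theta), math.cos(self.theta)
        temp = (force + pole_ml * self.theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (self.GRAVITY * sin_t - cos_t * temp) / (
            self.POLE_HALF_LENGTH
            * (4.0 / 3.0 - self.POLE_MASS * cos_t ** 2 / total_mass)
        )
        x_acc = temp - pole_ml * theta_acc * cos_t / total_mass
        self.x += self.dt * self.x_dot
        self.x_dot += self.dt * x_acc
        self.theta += self.dt * self.theta_dot
        self.theta_dot += self.dt * theta_acc


class CartPoleEnvSketch:
    """Environment wrapper with Gymnasium-style reset() and step() signatures."""

    def __init__(self, sim, max_force=10.0, angle_limit=0.21, x_limit=2.4):
        self.sim = sim
        self.max_force = max_force
        self.angle_limit = angle_limit
        self.x_limit = x_limit

    def _observe(self):
        s = self.sim
        return (s.x, s.x_dot, s.theta, s.theta_dot)

    def reset(self):
        # Every episode starts by resetting the simulator state.
        self.sim.reset()
        return self._observe(), {}

    def step(self, action):
        # Action 1 pushes the cart right, anything else pushes it left.
        force = self.max_force if action == 1 else -self.max_force
        self.sim.step(force)
        obs = self._observe()
        terminated = abs(obs[2]) > self.angle_limit or abs(obs[0]) > self.x_limit
        # Reward of 1 per step survived; this sketch never truncates.
        return obs, 1.0, terminated, False, {}


def run_episode(env, policy, max_steps=500):
    """Run one episode and return the number of steps taken."""
    obs, _ = env.reset()
    for step_count in range(1, max_steps + 1):
        obs, _, terminated, _, _ = env.step(policy(obs))
        if terminated:
            return step_count
    return max_steps
```

The point of the sketch is that `reset()` runs at the start of every episode, so a training run calls it many times. That is why a cheap, directly callable simulation reset matters for this kind of example.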

This PR adds support for the Reset API to the test fixture. `TestFixture` is one of the main ways to get access to the ECM from Python when writing scripts for deep reinforcement learning. I realized that without `Reset` support in the `TestFixture` API, end users would have a very hard time using our Python APIs (which are actually quite nice). For reference, I'm hacking a demo template here: https://github.com/arjo129/gz_deep_rl_experiments/tree/ionic

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

This allows us to reset simulations without having to call into gz-transport, making the code more readable from an external API.

Depends on #2647

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

A lot of things are not working. In particular, when `ResetAll` is called, `EnableVelocityChecks` does not trigger the physics system to populate the velocity components. This is a blocker for the current example.

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

arjo129 · 2024-11-12T07:06:58Z

So the above code should be able to train an RL model even on a potato. I've currently got the algorithm to successfully balance a cart pole. However, there are some open issues that will block this from being merged. My main concern is that I've hacked together an API for running the GUI client.

* Adds support for Reset in test fixture

This PR adds support for the Reset API to the test fixture. As `TestFixture` is one of the main ways one can get access to the ECM in Python when trying to write some scripts for Deep Reinforcement Learning, I realized that without `Reset` supported in the `TestFixture` API, end users would have a very hard time using our Python APIs (which are actually quite nice). For reference I'm hacking a demo template here: #2667

---------

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

arjo129 · 2025-02-05T02:40:30Z

python/src/gz/sim/Gui.cc

+void defineGuiClient(pybind11::module &_module)
+{
+
+  _module.def("run_gui", [](){


This is currently a blocker: I'm not sure what the best way forward is.

The Gazebo GUI does not like running in the same process as the server. It core dumps when I try to run it from within a `std::thread`.

Forking works on Linux, but Windows has no concept of fork(). It seems that to start a new process there, you must launch an executable via CreateProcess instead of simply fork()ing.

An alternative would be to create a Jupyter widget out of gz-web, but that would require us to work on the websocket server and add some Python integration with gz-launch.

> Forking works on Linux, but Windows has no concept of fork(). It seems that to start a new process there, you must launch an executable via CreateProcess instead of simply fork()ing.

We had similar problems on Gazebo Classic. In my experience, the most convenient way to abstract away process creation on Windows is to use a library such as https://github.com/DaanDeMeyer/reproc or https://gitlab.com/eidheim/tiny-process-library.

By the way, I guess we will have a similar problem when implementing a standalone gz-sim executable for gazebosim/gz-tools#7.

So last night I had a discussion with @azeey. I think the proposed method is to use the runGuiMain.cc executable on both platforms. I haven't tried it yet.

Update: Python's os.fork() works via Cygwin on Windows and may be a viable option for us.

> Update: Python's os.fork() works via Cygwin on Windows and may be a viable option for us.

I am not sure about that. The official Python binary for Windows from python.org, the Python installed by uv and the Python installed by conda-forge are all MSVC-based builds. To use the Cygwin-powered Python, you need to install Python via Cygwin, and that also prevents you from installing most existing Windows wheels, which assume the MSVC calling convention or runtime libraries (see pypa/cibuildwheel#329).

Yeah, I'm going ahead and using the binary method for now.
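As a rough illustration of the "binary method", the sketch below starts a command as a separate process through Python's `subprocess` module, which uses the platform's native process creation on both POSIX and Windows, and then shuts it down again. The helper names `launch_child` and `stop_child` are hypothetical, and the command used here is a placeholder Python one-liner, not the GUI executable from this PR.

```python
import os
import subprocess
import sys

# os.fork only exists on POSIX builds of Python; MSVC-based Windows builds lack it.
HAS_FORK = hasattr(os, "fork")


def launch_child(command):
    """Start `command` (a list of arguments) as a separate process."""
    # Popen does not need os.fork to be available to the caller, so the same
    # call works on Linux, macOS and Windows.
    return subprocess.Popen(command)


def stop_child(proc, timeout=5.0):
    """Ask the child to exit, escalate to kill on timeout, return its exit code."""
    if proc.poll() is None:
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode


# Placeholder command standing in for the real GUI executable (an assumption).
PLACEHOLDER_COMMAND = [sys.executable, "-c", "import time; time.sleep(30)"]
```

A training script could call `launch_child` once before the rollout and `stop_child` when it finishes, so the GUI never shares a process with the server.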

arjo129 added 7 commits October 11, 2024 13:20

Add support for simulation reset via a publicly callable API

053152f

This allows us to reset simulations without having to call into gz-transport, making the code more readable from an external API.

Depends on #2647

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

Style

ff468b7

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

Style

e0ee0dd

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

Style

047be5b

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

Typo

76adc26

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

arjo129 mentioned this pull request Nov 7, 2024

Example of doing Reinforcement Learning in Gazebo #2662

Open

19 tasks

arjo129 added 3 commits November 11, 2024 13:12

Fixed readme instructions

c3eea00

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

Style

9a2b742

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

Got the gui working. Time for gradient descent by grad student

da33f11

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

arjo129 added 4 commits January 8, 2025 19:14

Address feedback

3e83828

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

Merge remote-tracking branch 'origin/main' into arjo/feat/server_rese…

35eca3e

…t_public_api

Add support for individual resets

15fa867

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

Merge branch 'main' into arjo/feat/server_reset_public_api

ed502ee

arjo129 commented Feb 5, 2025

arjo129 added 8 commits February 6, 2025 15:48

Merge branch 'arjo/feat/server_reset_public_api' into arjo/examples/r…

120c1ec

…l_example

Style

16d4407

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

Bind installation directories and install

6c8131c

Signed-off-by: Ubuntu <arjoc@intrinsic.ai>

Missed the file in the last commit

5117071

Signed-off-by: Ubuntu <arjoc@intrinsic.ai>

Merge remote-tracking branch 'refs/remotes/origin/arjo/examples/rl_ex…

e602467

…ample' into arjo/examples/rl_example

style

46e3ca8

Signed-off-by: Ubuntu <arjoc@intrinsic.ai>

Style

6799df2

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

Merge branch 'arjo/feat/server_reset_public_api' into arjo/examples/r…

ad1ace4

…l_example

Base automatically changed from arjo/feat/server_reset_public_api to main February 10, 2025 06:44

arjo129 added 2 commits February 10, 2025 14:52

Style

856f316

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

Enable GUI rollout

f8ddbf6

Signed-off-by: Arjo Chakravarty <arjoc@intrinsic.ai>

arjo129 marked this pull request as ready for review March 5, 2025 03:39

arjo129 requested a review from mjcarroll as a code owner March 5, 2025 03:39
