Implementation of DQN, Double DQN, Dueling DDQN, and Actor-Critic Policy Gradient for the CarRacing-v0 game
In this project, we train DQN and A2C agents on the CarRacing-v0 environment, which is part of Gym's Box2D environment suite.
The video below shows sample interactions with the environment.
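As a quick orientation, here is a minimal random-rollout sketch on CarRacing-v0 using the older Gym API (newer Gym/Gymnasium releases rename the environment and change the `reset`/`step` signatures); it is illustrative, not the project's training loop:

```python
import gym

# Minimal random rollout on CarRacing-v0 (older Gym API).
env = gym.make("CarRacing-v0")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    # CarRacing uses a continuous action vector: [steering, gas, brake].
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print(f"Episode return: {total_reward:.1f}")
```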
Q-learning is a value-based reinforcement learning algorithm used to find the optimal policy for a Markov decision process (MDP). It operates by iteratively updating a Q-table, where each entry (Q-value) represents the expected discounted cumulative reward of taking a specific action in a particular state. The agent learns by exploring the environment and updating the table after each transition.
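A minimal sketch of the tabular Q-learning update; the state/action space sizes and the hyperparameters (alpha, gamma, epsilon) below are illustrative assumptions, not values used in this project:

```python
import numpy as np

n_states, n_actions = 16, 4          # illustrative sizes
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next, done):
    # The TD target bootstraps from the greedy action in the next state.
    target = r if done else r + gamma * Q[s_next].max()
    # Move the current Q-value a step of size alpha toward the target.
    Q[s, a] += alpha * (target - Q[s, a])

def epsilon_greedy(s):
    # Explore with probability epsilon, otherwise act greedily.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(Q[s].argmax())
```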
Double Deep Q-Learning (DDQN) is an enhancement of DQN. It addresses the overestimation bias in Q-value targets by decoupling action selection from action evaluation across two networks: the online network picks the greedy next action, and the target network evaluates it. Like DQN, it employs experience replay to improve sample efficiency and stabilize training.
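A hedged sketch of the DDQN target computation in PyTorch; `online_net` and `target_net` are assumed to be modules mapping a batch of states to per-action Q-values, and the tensor names are assumptions rather than this repo's actual code:

```python
import torch

def ddqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    with torch.no_grad():
        # Action selection with the online network...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...but action evaluation with the target network.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # Zero out the bootstrap term for terminal transitions.
        return rewards + gamma * next_q * (1.0 - dones.float())
```

This selection/evaluation split is what distinguishes DDQN from vanilla DQN, which would take `target_net(next_states).max(dim=1)` for both steps.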
Dueling Network Architectures (Wang et al., 2016) proposes a different architecture from the one used in Mnih et al.'s 2015 DQN paper. After the convolutional layers, the network splits into two estimators: one for the state value function V(s) and the other for the state-dependent action advantage function A(s, a), which are then recombined into Q-values.
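A minimal sketch of such a dueling head in PyTorch; the `feature_dim` produced by the convolutional trunk is a hypothetical parameter here, and the mean-subtraction aggregation follows the dueling-networks paper:

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, feature_dim: int, n_actions: int):
        super().__init__()
        self.value = nn.Linear(feature_dim, 1)               # V(s)
        self.advantage = nn.Linear(feature_dim, n_actions)   # A(s, a)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)
        a = self.advantage(features)
        # Subtract the mean advantage so that V and A are identifiable:
        # Q(s, a) = V(s) + (A(s, a) - mean_a' A(s, a')).
        return v + a - a.mean(dim=1, keepdim=True)
```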