Deep RL MCTS with Hex

This project implements a simplified version of the AlphaGo/AlphaZero architecture, which has been successfully applied to board games such as Chess and Go. To keep the state space manageable, the neural network is trained on the game Hex. The approach combines Deep Learning (DL) and Reinforcement Learning (RL) with On-Policy Monte Carlo Tree Search (MCTS): self-play with MCTS produces move-probability distribution targets that the neural network learns from to improve its playing strength. Early episodes use completely random moves; the probability of letting the neural network, which is retrained after each episode, select the moves is then gradually increased.
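The README does not spell out the training loop, so the following is only a minimal sketch of how such an on-policy self-play loop is commonly structured. The names `HexBoard`, `run_mcts`, `anet`, and `train_on`, as well as the epsilon schedule and simulation count, are illustrative assumptions, not the repository's actual API.

```python
import random

def self_play_episode(board, anet, run_mcts, num_simulations, epsilon):
    """One self-play episode. `board`, `anet`, and `run_mcts` are
    hypothetical stand-ins for the repository's game state, network,
    and tree-search routine; only the control flow is the point here.

    Returns (cases, winner), where each case pairs an encoded state
    with the MCTS visit-count distribution used as the actor target."""
    cases = []
    while not board.is_terminal():
        # Tree search from the current state. At leaf nodes, the
        # critic's value estimate can stand in for a full rollout.
        visit_dist = run_mcts(board, anet, num_simulations)
        cases.append((board.encode(), visit_dist))
        if random.random() < epsilon:
            # Early in training: explore with a random move.
            move = random.choice(board.legal_moves())
        else:
            # Later in training: follow the search distribution.
            move = max(visit_dist, key=visit_dist.get)
        board.play(move)
    return cases, board.winner()

# Sketch of the outer loop: train after every episode and decay epsilon,
# shifting the behavior policy from random play toward the network.
epsilon = 1.0
for episode in range(num_episodes):
    cases, winner = self_play_episode(HexBoard(size=7), anet, run_mcts,
                                      num_simulations=500, epsilon=epsilon)
    anet.train_on(cases, winner)  # retrain after each episode
    epsilon *= 0.99
```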

The neural network is a Convolutional Neural Network (CNN) with two outputs: an actor head, whose move distribution is used to select moves, and a critic head, which evaluates the current position. The critic's purpose is to reduce the number of rollouts needed when the tree search reaches a leaf node.
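The README does not specify the framework or layer configuration, so the sketch below shows one common way to build such a two-headed CNN, written in PyTorch purely for illustration (the repository may well use a different framework). The input encoding (two planes, one per player), channel counts, and trunk depth are all assumptions.

```python
import torch
import torch.nn as nn

class ActorCriticNet(nn.Module):
    """Shared conv trunk with two heads: an actor (distribution over
    board cells) and a critic (scalar position value in [-1, 1]).
    All layer sizes are illustrative, not taken from the repository."""

    def __init__(self, board_size=7, channels=64):
        super().__init__()
        # Assumed input: (N, 2, board_size, board_size), one plane
        # of stones per player. Padding preserves the spatial size.
        self.trunk = nn.Sequential(
            nn.Conv2d(2, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        flat = channels * board_size * board_size
        # Actor head: one logit per board cell.
        self.actor = nn.Sequential(nn.Flatten(), nn.Linear(flat, board_size * board_size))
        # Critic head: single evaluation of the current position.
        self.critic = nn.Sequential(nn.Flatten(), nn.Linear(flat, 1), nn.Tanh())

    def forward(self, x):
        h = self.trunk(x)
        # Log-softmax over moves; illegal moves would be masked externally.
        policy = torch.log_softmax(self.actor(h), dim=1)
        value = self.critic(h)
        return policy, value

net = ActorCriticNet(board_size=7)
x = torch.zeros(1, 2, 7, 7)           # batch of one encoded board
log_policy, value = net(x)
print(log_policy.shape, value.shape)  # torch.Size([1, 49]) torch.Size([1, 1])
```

Sharing a convolutional trunk between the two heads mirrors the AlphaZero design the README references and lets the policy and value targets regularize each other.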

Demonstration

(An animated demonstration of gameplay is shown on the repository page.)