Common RL framework and utilities.
- maximize module reuse under the RL framework
- accumulate experiment baselines for algorithms, with hyperparameter setups that are easily reproducible
- accumulate experiment baselines for problems, together with the algorithm/hyperparameter/engineering effort behind them
- easily implement more algorithms
- easily combine different works
- [v] DQN
- [v] DDPG
- [v] Replay Buffer
- [v] Prioritized Exp Replay
- [v] Double DQN
- [v] Dueling DQN
- [v] Actor Critic
- [v] Optimality Tightening
- [v] A3C
- [v] PPO
- [v] Bootstrapped DQN
- [v] ICM
- [v] I2A
- [v] Soft Q Learning
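As an illustration of one of the utilities listed above, a uniform replay buffer can be sketched roughly as follows. This is a generic sketch, not hobotrl's actual implementation; class and method names are hypothetical:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform-sampling replay buffer (illustrative sketch only)."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling without replacement
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Prioritized experience replay extends this idea by sampling transitions in proportion to their TD error instead of uniformly.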
Run

```shell
pip install -e .
```

so you can use the algorithms elsewhere.
Running different experiments may require additional libraries, such as `opencv-python`, `gym[box2d]`, or `roboschool`.
Run

```shell
python test/exp_tabular.py run --name TabularGrid
```

for a starter. Run

```shell
python test/exp_tabular.py list
python test/exp_deeprl.py list
```

to get a list of experiments in each experiment file.
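The `list`/`run` command pattern above can be backed by a simple experiment registry. The sketch below is hypothetical, not hobotrl's actual code; class and method names are assumptions:

```python
class Experiment(object):
    """Base class keeping a registry of experiments by name (illustrative sketch)."""

    registry = {}

    @classmethod
    def register(cls, experiment_cls):
        # experiments register themselves under their class name
        cls.registry[experiment_cls.__name__] = experiment_cls
        return experiment_cls

    @classmethod
    def list_experiments(cls):
        # what a `list` subcommand would print
        return sorted(cls.registry)

    @classmethod
    def run(cls, name):
        # what `run --name <name>` would dispatch to
        return cls.registry[name]().run()


@Experiment.register
class TabularGrid(Experiment):
    def run(self):
        return "TabularGrid finished"
```

A command-line front end would then map the `list` and `run` subcommands onto `list_experiments()` and `run(name)`.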
Run

```shell
. scripts/a3c_pong.sh
```

to start the processes running the A3C algorithm.
Run

```shell
python -m unittest discover -s hobotrl -p "test*.py" -v
```

to run all unit test cases.
Use

```python
>>> import hobotrl as hrl
>>> dir(hrl)
```

to see what's inside.
The most widely used classes, such as `DQN`, `DPG`, and `ActorCritic`, are imported into the top-level `hobotrl` module. Use

```python
>>> help(hrl.DQN)
>>> help(hrl.DPG)
>>> help(hrl.ActorCritic)
```

to consult the help docs. Also remember to check out the experiment files and unit tests as a reference.
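A typical agent in such a framework is driven by an environment loop. The sketch below uses a stubbed environment and a random agent; the `act`/`step`/`reset` names follow the common gym-style convention and are assumptions, not hobotrl's actual API:

```python
import random

class StubEnv(object):
    """Tiny stand-in environment: fixed reward, episode ends after 5 steps."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0
        done = self.t >= 5
        return self.t, reward, done

class RandomAgent(object):
    """Placeholder agent choosing actions uniformly at random."""

    def act(self, state):
        return random.choice([0, 1])

def run_episode(env, agent):
    # standard agent-environment interaction loop
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        state, reward, done = env.step(agent.act(state))
        total_reward += reward
    return total_reward
```

Replacing `RandomAgent` with a learning agent (and adding a learning call after each `step`) yields the usual training loop.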
In hobotrl, distributed training is implemented with TensorFlow's cluster capability. See the bash scripts in the `scripts` folder for starting `worker` and `ps` processes for distributed training.
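A TensorFlow cluster is described by a mapping from job names to the addresses of the `ps` and `worker` processes. A minimal sketch of such a configuration, with placeholder hosts and ports:

```python
# Cluster layout: one parameter server, two workers (addresses are placeholders).
cluster_spec = {
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"],
}

# Each process would then be launched with its own job name and task index,
# e.g. via tf.train.ClusterSpec(cluster_spec) and
# tf.train.Server(cluster, job_name="worker", task_index=0).
```

The bash scripts in `scripts` essentially start one process per entry in such a mapping.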
The steps for starting the driving simulator environment:

- Open up a new shell and execute

  ```shell
  roscore
  ```

  to launch the ROS master.
- Open up another shell; first run

  ```shell
  source [catkin_ws_dir]/devel/setup.bash
  ```

  to register the simulator ROS packages, then run

  ```shell
  python rviz_restart.py
  ```

  to fire up the simulator launcher.
- Use the last shell to run the actual main script, in which a `DrivingSimulatorEnv` is instantiated to communicate with the previously opened nodes as well as the agent.

Note these steps are tentative and subject to change.
See this wiki entry for a recommended way of reusing variables via global variable scope reuse. [Note: setting a global scope reference will break the creation of the target network.]
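The target-network pitfall noted above can be illustrated without TensorFlow: if the "target" network merely holds a reference to the online network's parameters instead of its own copy, freezing the target between syncs becomes impossible. A plain-Python illustration (not hobotrl code):

```python
online_params = {"w": 1.0}

# Broken: the "target" is just a reference to the online parameters.
target_by_reference = online_params

# Correct: the target network keeps its own copy, synced only at update time.
target_by_copy = dict(online_params)

online_params["w"] = 2.0  # one training step updates the online network

# target_by_reference now silently follows the online network,
# while target_by_copy stays frozen until the next explicit sync.
```

Scope reuse in TensorFlow has the same effect as the reference case: the "target" ops read the very same variables the optimizer updates, so no frozen target network is ever created.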