
hugerepo-tianhang/low_dim_update_stable


This repository open-sources two research projects, documenting the work done on them to enable future collaboration.

Project 1: Low-dimensional DRL search paths

Abstract

We found that the search paths of some of the most widely used deep reinforcement learning algorithms (PPO, SAC) mostly fall in a low-dimensional space (fewer than 10 dimensions explain more than 98% of the variance, versus an ambient parameter space of over 1000 dimensions) for all tested tasks. Moreover, we found that the first PCA direction points toward the optimal parameters with small error. (See the stable_baselines/low_dim_analysis directory.)
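As a rough illustration, here is a minimal sketch (not the repository's exact code) of how the explained-variance claim can be measured. It assumes the search path is stored as a matrix of flattened policy parameters, one row per training checkpoint; the names and shapes are assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

def explained_variance_of_path(checkpoints: np.ndarray, n_components: int = 10) -> float:
    """Fraction of the search path's variance captured by the top PCA axes.

    checkpoints: (n_iterations, n_params) array of flattened policy parameters.
    """
    pca = PCA(n_components=n_components)
    pca.fit(checkpoints)
    return pca.explained_variance_ratio_.sum()

# Toy example: a 1000-dimensional path that secretly lives in a 5-d subspace,
# so fewer than 10 components should explain essentially all of the variance.
rng = np.random.default_rng(0)
path = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 1000))
print(explained_variance_of_path(path))  # close to 1.0
```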

Application

The plane spanned by the first two PCA axes always captures about 85% of the variance of the search path. In hard locomotion tasks, this plane is also one of the most informative 2-d Euclidean surfaces for the return structure in parameter space, since points slightly off the surface show almost zero variance in return, with uniformly very low returns. For example, a slightly perturbed parameter vector will cause the agent to lose balance and fail the task entirely (jump, run, etc.). You can use the visualization to gain intuition about the difficulty of the task: whether there is a plateau, how big the plateau is, and, by sliding the surface along its normal direction, what the 3-d "blob" of good parameters looks like. Below is an example of the return landscape of the Hopper task in the parameter space of a PPO agent.

(Figure: return landscape of the Hopper task on the plane of the first two PCA directions.)
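A minimal sketch of how such a landscape can be computed, under stated assumptions: `evaluate_return` is a hypothetical helper that rolls out the policy for a given flattened parameter vector and returns the episode return, and `mean_params`, `pc1`, `pc2` come from PCA on the search path.

```python
import numpy as np

def landscape_grid(mean_params, pc1, pc2, evaluate_return,
                   extent=50.0, resolution=21):
    """Evaluate returns on a grid in the 2-d plane spanned by pc1 and pc2,
    centered at mean_params. Plot the result with e.g. matplotlib contourf."""
    alphas = np.linspace(-extent, extent, resolution)
    betas = np.linspace(-extent, extent, resolution)
    returns = np.empty((resolution, resolution))
    for i, a in enumerate(alphas):
        for j, b in enumerate(betas):
            theta = mean_params + a * pc1 + b * pc2  # point on the PCA plane
            returns[i, j] = evaluate_return(theta)
    return alphas, betas, returns
```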

An example of the first PCA direction pointing roughly toward the optimal parameters: notice that as training goes on, the orange curve (angle error) drops below 20 degrees.

(Figure: angle between the first PCA direction of the search path and the direction to the final parameters, over the course of training.)
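A sketch of the angle diagnostic behind this plot, assuming the final parameters of a successful run stand in for the optimum; the names here are illustrative, not the repository's exact code.

```python
import numpy as np
from sklearn.decomposition import PCA

def angle_to_optimum_deg(path_so_far: np.ndarray, theta_final: np.ndarray) -> float:
    """Angle (degrees) between the first PCA direction of the path so far
    and the direction from the starting parameters to the final parameters."""
    pc1 = PCA(n_components=1).fit(path_so_far).components_[0]
    to_opt = theta_final - path_so_far[0]
    # PCA directions have arbitrary sign, so compare via |cos|.
    cos = abs(pc1 @ to_opt) / (np.linalg.norm(pc1) * np.linalg.norm(to_opt))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```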

The repository also includes various attempts to exploit this knowledge to accelerate these algorithms, including quickly estimating the subspace with small error and running an ES (evolution strategies) algorithm within that subspace, and identifying the first PCA direction and searching only the cone it defines (a sketch of the subspace search follows below). However, in hard locomotion tasks, the estimated subspace is not accurate enough to contain a near-optimal parameter with high probability.
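One such trial can be sketched as a vanilla evolution strategy that searches over low-dimensional coefficients and maps them back to full parameter space through the estimated PCA basis. This is an illustrative sketch, not the repository's implementation; `evaluate_return` is again a hypothetical rollout helper.

```python
import numpy as np

def es_in_subspace(basis, origin, evaluate_return,
                   iterations=100, pop=20, sigma=0.5, lr=0.1, seed=0):
    """basis: (k, n_params) estimated PCA directions; origin: (n_params,).

    Runs a simple ES over the k-dimensional coefficients z, so each candidate
    full parameter vector is origin + z @ basis."""
    rng = np.random.default_rng(seed)
    z = np.zeros(basis.shape[0])  # low-dimensional coordinates
    for _ in range(iterations):
        eps = rng.normal(size=(pop, z.size))
        rewards = np.array([evaluate_return(origin + (z + sigma * e) @ basis)
                            for e in eps])
        # Standardize rewards, then take the vanilla ES gradient estimate.
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        z += lr / (pop * sigma) * eps.T @ rewards
    return origin + z @ basis
```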

Future work

It would be interesting to understand why the above phenomenon happens, even though it is hard to prove theorems about it.

Project 2: Implicit physics models in model-free DRL algorithms

Abstract

We identify linear and non-linear correlations between neurons in trained deep neural networks from DRL algorithms and variables from the Lagrangian equations of motion. (See the new_neuron_analysis directory.) This is an attempt to see whether model-free DRL algorithms encode an implicit model of the environment in their individual neurons (a physics model, in the case of locomotion tasks).

Examples

You can see a clear linear correlation between variables in the mass matrix from the Lagrangian equations and neurons in the trained model; a sketch of the correlation computation follows the figures below. (For a review of the Lagrangian equations, see https://fab.cba.mit.edu/classes/865.18/design/optimization/dynamics_1.pdf)

(Figures: linear correlations between mass-matrix variables and neuron activations in the trained model.)
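A minimal sketch of the correlation test itself, under stated assumptions: `get_activations` is a hypothetical helper returning the trained policy's neuron activations at a state, and `get_mass_matrix` a hypothetical helper returning the simulator's mass matrix M(q).

```python
import numpy as np

def neuron_lagrangian_correlations(states):
    """Pearson correlations between each neuron and each mass-matrix entry,
    computed over a rollout of states. Returns an (n_neurons, n_vars) matrix."""
    acts = np.stack([get_activations(s) for s in states])          # (T, n_neurons)
    lagr = np.stack([get_mass_matrix(s).ravel() for s in states])  # (T, n_vars)
    # Z-score each column, then average products of z-scores over time.
    a = (acts - acts.mean(0)) / (acts.std(0) + 1e-8)
    l = (lagr - lagr.mean(0)) / (lagr.std(0) + 1e-8)
    return a.T @ l / len(states)
```

Entries of the returned matrix near +1 or -1 indicate the strong linear correlations shown in the figures; non-linear correlations would require a different statistic (e.g. mutual information).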

Credits

These projects were done with advisor Prof. Karen Liu of Stanford University.