Reproduction code for Bootstrapping Error Accumulation Reduction (BEAR), implemented with Chainer.
Install chainer, d4rl, and tensorboardX before using the code.
$ python3 main.py --env="Ant-v2" --datafile=<file to buffer path> --gpu=<gpu number>
$ python3 main.py --env="Ant-v2" --datafile=<file to buffer path>
I tested only with Ant-v2 data and found that the Laplacian kernel is much more stable than the Gaussian kernel.
However, both kernels succeeded in learning a similar policy, one that scores about the same as the behavior policy used to gather the training data.
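For readers unfamiliar with the two kernels: BEAR constrains the learned policy by keeping the maximum mean discrepancy (MMD) between dataset actions and policy actions small, and the kernel choice only changes how the distance between two actions is measured. The snippet below is a minimal NumPy sketch of a sample-based MMD estimate with both kernels. It is not taken from main.py; the function name `mmd_squared`, the bandwidth `sigma`, and the exact kernel scaling are assumptions made for illustration.

```python
import numpy as np


def mmd_squared(x, y, kernel="laplacian", sigma=10.0):
    """Biased sample estimate of squared MMD between two action sets.

    x: array of shape (n, d), y: array of shape (m, d).
    `sigma` is a hypothetical bandwidth; the scaling below is an assumption,
    not necessarily what this repository uses.
    """
    def gram(a, b):
        # Pairwise differences between every row of a and every row of b.
        diff = a[:, None, :] - b[None, :, :]
        if kernel == "laplacian":
            # Laplacian kernel: exp(-||a - b||_1 / sigma)
            return np.exp(-np.abs(diff).sum(axis=-1) / sigma)
        if kernel == "gaussian":
            # Gaussian kernel: exp(-||a - b||_2^2 / (2 * sigma^2))
            return np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * sigma ** 2))
        raise ValueError("unknown kernel: " + kernel)

    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data_actions = rng.normal(size=(8, 3))      # stand-in for dataset actions
    policy_actions = rng.normal(size=(8, 3))    # stand-in for policy samples
    for name in ("laplacian", "gaussian"):
        print(name, mmd_squared(data_actions, policy_actions, kernel=name))
```

The estimate is zero when both action sets are identical and grows as they move apart, which is what lets it act as a penalty on out-of-distribution actions.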
The graphs below show the results of a single training run with the Laplacian kernel.
Policy performance suddenly decreases after 200k iterations.