Software & Data Downloads — QNTRPO

A trust region method for policy optimization that employs a Quasi-Newton approximation of the Hessian.

We propose a trust region method for policy optimization that employs a Quasi-Newton approximation of the Hessian, called Quasi-Newton Trust Region Policy Optimization (QNTRPO). Gradient descent has become the de facto algorithm for reinforcement learning tasks with continuous controls: it has achieved state-of-the-art performance on a wide variety of tasks and improved the performance of reinforcement learning algorithms across a wide range of systems. However, it suffers from several drawbacks, including the lack of a stepsize selection criterion, slow convergence, and dependence on problem scaling. We investigate the use of a dogleg method with a Quasi-Newton approximation of the Hessian to compute the trust region step for policy optimization, and we show that this choice addresses the listed drawbacks without sacrificing computational efficiency.
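To make the step computation concrete, the sketch below shows the classical dogleg solve of the trust region subproblem in NumPy. It is a minimal illustration, not the repository's implementation: the function name, the dense Hessian approximation B, and the assumption that B is positive definite are ours for exposition.

    import numpy as np

    def dogleg_step(g, B, delta):
        # Approximately solve  min_p  g^T p + 0.5 p^T B p  s.t. ||p|| <= delta,
        # where B is a positive-definite quasi-Newton Hessian approximation.
        p_b = -np.linalg.solve(B, g)               # full quasi-Newton step
        if np.linalg.norm(p_b) <= delta:
            return p_b                             # unconstrained minimizer fits in the region

        p_u = -((g @ g) / (g @ B @ g)) * g         # Cauchy point (steepest-descent minimizer)
        norm_pu = np.linalg.norm(p_u)
        if norm_pu >= delta:
            # Even the Cauchy point leaves the region: scale the gradient step to the boundary.
            return -(delta / np.linalg.norm(g)) * g

        # Dogleg path p_u + tau * (p_b - p_u): find tau in (0, 1) where it crosses
        # the boundary, i.e. solve ||p_u + tau * d||^2 = delta^2 for tau.
        d = p_b - p_u
        a = d @ d
        b = 2.0 * (p_u @ d)
        c = norm_pu ** 2 - delta ** 2
        tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
        return p_u + tau * d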

We provide an implementation of Quasi-Newton Trust Region Policy Optimization, which uses a dogleg method to compute the step (i.e., its size and direction) during policy optimization. The code has been tested on several difficult continuous control environments in MuJoCo, where it learns faster than the TRPO algorithm. The code is compatible with OpenAI Gym and can therefore be used with any Gym-compatible environment, as illustrated below. The approach was presented in the paper "Quasi-Newton Trust Region Policy Optimization" at the Conference on Robot Learning (CoRL), 2019 in Osaka, Japan.
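As a sketch of what Gym compatibility means here, the snippet below exercises the standard environment interface with random actions. The environment name and the pre-0.26 Gym API are assumptions for illustration, and the random policy is only a stand-in for one trained with QNTRPO.

    import gym

    # Any Gym-compatible continuous-control environment can be plugged in;
    # HalfCheetah-v2 is a standard MuJoCo task (illustrative choice, not prescribed).
    env = gym.make("HalfCheetah-v2")

    obs = env.reset()
    for _ in range(10):
        action = env.action_space.sample()   # stand-in for a QNTRPO-trained policy
        obs, reward, done, info = env.step(action)
        if done:
            obs = env.reset()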

    •  Jha, D.K., Raghunathan, A., Romeres, D., "QNTRPO: Including Curvature in TRPO", Optimization Foundations for Reinforcement Learning Workshop at NeurIPS, December 2019. (MERL TR2019-154)

       @inproceedings{Jha2019dec,
         author    = {Jha, Devesh K. and Raghunathan, Arvind and Romeres, Diego},
         title     = {QNTRPO: Including Curvature in TRPO},
         booktitle = {Optimization Foundations for Reinforcement Learning Workshop at NeurIPS},
         year      = {2019},
         month     = dec,
         url       = {https://www.merl.com/publications/TR2019-154}
       }
    •  Jha, D.K., Raghunathan, A., Romeres, D., "Quasi-Newton Trust Region Policy Optimization", Conference on Robot Learning (CoRL), Leslie Pack Kaelbling, Danica Kragic, and Komei Sugiura, Eds., October 2019, pp. 945-954. (MERL TR2019-120)

       @inproceedings{Jha2019oct,
         author    = {Jha, Devesh K. and Raghunathan, Arvind and Romeres, Diego},
         title     = {Quasi-Newton Trust Region Policy Optimization},
         booktitle = {Conference on Robot Learning (CoRL)},
         year      = {2019},
         editor    = {Leslie Pack Kaelbling and Danica Kragic and Komei Sugiura},
         pages     = {945--954},
         month     = oct,
         publisher = {Proceedings of Machine Learning Research},
         url       = {https://www.merl.com/publications/TR2019-120}
       }

    Access software at https://github.com/merlresearch/QNTRPO.