MODEL-ENSEMBLE TRUST-REGION POLICY OPTIMIZATION

LAAI
Jan 18, 2021

ICLR 2018 paper by Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel

They propose a method to address a key problem in model-based reinforcement learning. In model-based reinforcement learning, the learning process alternates between model learning and policy optimization: the learned model is used to search for an improved policy. However, policy optimization tends to exploit regions where insufficient data was available to train the model, which can lead to catastrophic failures.
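The alternating loop (collect data with the current policy, fit a dynamics model, improve the policy against the model) can be sketched on a toy linear system. Everything here is an illustrative assumption, not the paper's setup: a 1-D environment s' = 0.8s + a, a least-squares dynamics model, and a linear policy a = ks improved by a simple grid search instead of TRPO.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ground-truth dynamics, unknown to the agent: s' = 0.8*s + a.
def true_step(s, a):
    return 0.8 * s + a

# Learned linear model s' ~ w[0]*s + w[1]*a, fit by least squares.
def fit_model(states, actions, next_states):
    X = np.column_stack([states, actions])
    w, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return w

# Policy a = k*s; "improve" it against the learned model by grid search,
# picking the gain that drives the predicted next state toward 0.
def improve_policy(w, candidates=np.linspace(-2, 2, 81)):
    s = 1.0
    preds = [abs(w[0] * s + w[1] * (k * s)) for k in candidates]
    return candidates[int(np.argmin(preds))]

k = 0.0  # initial policy gain
for _ in range(3):  # alternate: collect data -> fit model -> improve policy
    states = rng.normal(size=64)
    actions = k * states + 0.1 * rng.normal(size=64)  # exploration noise
    next_states = true_step(states, actions)
    w = fit_model(states, actions, next_states)
    k = improve_policy(w)
```

After a few iterations the gain converges near k = -0.8, which cancels the dynamics. With a single learned model and no exploration noise, the grid search could just as easily latch onto gains the model has never seen data for; that is the failure mode the paper targets.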

Their idea is to use an ensemble of models {f_1, f_2, …, f_K} to learn the environment dynamics (transition probabilities). The models are trained via standard supervised learning and differ only in their initial weights and the order in which mini-batches are sampled. The model ensemble serves as effective regularization for policy learning, and averaging over the ensemble reduces model bias.
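A minimal sketch of ensemble training on the same idea: K copies of a linear dynamics model, each with its own random initialization and its own mini-batch shuffling, trained by SGD on shared data. The dynamics, model class, and hyperparameters are illustrative assumptions, not the paper's (which uses neural network models and TRPO).

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared training data from the true dynamics s' = 0.8*s + a.
S = rng.uniform(-1, 1, size=(256, 1))
A = rng.uniform(-1, 1, size=(256, 1))
X = np.hstack([S, A])
Y = 0.8 * S + A

def train_model(seed, epochs=50, lr=0.1, batch=32):
    """Train one ensemble member; `seed` controls both the random
    initialization and the per-epoch mini-batch order."""
    r = np.random.default_rng(seed)
    w = r.normal(scale=0.5, size=(2, 1))  # per-model random init
    idx = np.arange(len(X))
    for _ in range(epochs):
        r.shuffle(idx)                    # per-model mini-batch order
        for i in range(0, len(idx), batch):
            b = idx[i:i + batch]
            grad = 2 * X[b].T @ (X[b] @ w - Y[b]) / len(b)
            w -= lr * grad
    return w

ensemble = [train_model(seed) for seed in range(5)]

def ensemble_predict(x):
    """Return the ensemble mean and the per-model disagreement (std)."""
    preds = np.array([x @ w for w in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)
```

The per-model disagreement returned by `ensemble_predict` is what makes the ensemble useful as a regularizer: where the models disagree, the policy is likely in a region with insufficient training data.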
