IJITCS Vol. 11, No. 11, 8 Nov. 2019
Keywords: ARS-Quad, aerial systems, reinforcement learning, policy optimization, episodes, quadcopter, augmented random search
Model-based reinforcement learning strategies are believed to exhibit greater sample complexity than model-free strategies when controlling dynamical systems such as quadcopters. The related belief that model-based strategies, which rely on well-trained neural networks for such high-level decisions, always give better performance can be dispelled using model-free policy search methods. This paper proposes the use of a model-free random search strategy, Augmented Random Search (ARS), as a better and faster approach for training linear policies on continuous control tasks such as quadcopter flight. The method achieves state-of-the-art accuracy while avoiding the data-intensive neural network training required by previous approaches to quadcopter control. The paper also reports the performance of this search strategy in a strategically designed task environment through simulations. Reward collection over 1000 episodes and the agent's in-flight behavior under augmented random search are compared with those under the state-of-the-art reinforcement learning algorithm Deep Deterministic Policy Gradient (DDPG). Our simulations and results show that commonly used strategies exhibit high variability in sample efficiency on such tasks, whereas the trained ARS-Quad policy reacts comparatively accurately to a step response, providing a better-performing alternative to deep reinforcement learning strategies.
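As background, the core update of ARS [18] perturbs the weight matrix of a linear policy in a set of random directions, evaluates each perturbation in both the positive and negative sense, and steps along a reward-weighted average of the best-performing directions, scaled by the standard deviation of the collected rewards. The Python sketch below illustrates this basic (V1-style) update; the environment interface (reset/step returning state, reward, and done flag) is a hypothetical stand-in and the hyperparameter values are placeholders, not the ARS-Quad implementation described in this paper.

import numpy as np

def rollout(env, M, horizon=1000):
    # Run one episode with the linear policy a = M @ s; return total reward.
    state, total_reward = env.reset(), 0.0
    for _ in range(horizon):
        state, reward, done = env.step(M @ state)
        total_reward += reward
        if done:
            break
    return total_reward

def ars_v1(env, state_dim, action_dim, alpha=0.02, nu=0.03,
           n_directions=16, n_top=8, n_iterations=100):
    # Basic ARS: evaluate +/- perturbations of a linear policy and step
    # along a reward-weighted average of the top directions.
    M = np.zeros((action_dim, state_dim))
    for _ in range(n_iterations):
        deltas = [np.random.randn(action_dim, state_dim)
                  for _ in range(n_directions)]
        r_pos = np.array([rollout(env, M + nu * d) for d in deltas])
        r_neg = np.array([rollout(env, M - nu * d) for d in deltas])
        # Keep the n_top directions with the largest max(r_pos, r_neg).
        order = np.argsort(-np.maximum(r_pos, r_neg))[:n_top]
        sigma_r = np.concatenate([r_pos[order], r_neg[order]]).std() + 1e-8
        step = sum((r_pos[k] - r_neg[k]) * deltas[k] for k in order)
        M += alpha / (n_top * sigma_r) * step  # normalize by reward std-dev
    return M

The V2 variants in [18] additionally normalize observed states by a running mean and standard deviation before applying the linear policy.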
Ashutosh Kumar Tiwari, Sandeep Varma Nadimpalli, "Augmented Random Search for Quadcopter Control: An alternative to Reinforcement Learning", International Journal of Information Technology and Computer Science (IJITCS), Vol. 11, No. 11, pp. 24-33, 2019. DOI: 10.5815/ijitcs.2019.11.03
[1] Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, and Sergey Levine. Reinforcement learning with deep energy-based policies. arXiv:1702.08165, 2017.
[2] P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup, and D. Meger. Deep reinforcement learning that matters. arXiv:1709.06560, 2017.
[3] T. Salimans, J. Ho, X. Chen, and I. Sutskever. Evolution strategies as a scalable alternative to reinforcement learning. arXiv:1703.03864, 2017.
[4] M. Vankadari, K. Das, C. Shinde, and S. Kumar. A reinforcement learning approach for autonomous control and landing of a quadrotor. International Conference on Unmanned Aircraft Systems (ICUAS), 2018.
[5] William Koch, Renato Mancuso, Richard West, and Azer Bestavros. Reinforcement learning for UAV attitude control. arXiv:1804.04154, 2018.
[6] A. Rajeswaran, K. Lowrey, S. Kakade, and E. Todorov. Towards generalization and simplicity in continuous control. Advances in Neural Information Processing Systems, 2017.
[7] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. High-dimensional continuous control using generalized advantage estimation. International Conference on Learning Representations, 2016.
[8] R. Islam, P. Henderson, M. Gomrokchi, and D. Precup. Reproducibility of benchmarked deep reinforcement learning tasks for continuous control. arXiv:1708.04133, 2017.
[9] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. ICLR, 2016.
[10] N. Heess, S. Sriram, J. Lemmon, J. Merel, G. Wayne, Y. Tassa, T. Erez, Z. Wang, A. Eslami, M. Riedmiller, et al. Emergence of locomotion behaviours in rich environments. arXiv:1707.02286, 2017.
[11] J. Garcia and F. Fernandez. Safe exploration of state and action spaces in reinforcement learning. arXiv:1402.0560, 2014.
[12] B. Kim, L. P. Kaelbling, and T. Lozano-Pérez. Guiding the search in continuous state-action spaces by learning an action sampling distribution from off-target samples. arXiv:1711.01391, 2017.
[13] Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter. Control of a quadrotor with reinforcement learning. arXiv:1707.05110, 2017.
[14] Shayegan Omidshafiei. Reinforcement learning-based quadcopter control. December 11, 2013.
[15] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz. Trust region policy optimization. International Conference on Machine Learning (ICML), pages 1889–1897, 2015.
[16] S. Levine and V. Koltun. Guided policy search. International Conference on Machine Learning (ICML), 2013.
[17] Marc Peter Deisenroth, Gerhard Neumann, and Jan Peters. A survey on policy search for robotics. Foundations and Trends in Robotics, 2(1-2), 2013.
[18] Horia Mania, Aurelia Guy, and Benjamin Recht. Simple random search provides a competitive approach to reinforcement learning. arXiv:1803.07055, 2018.
[19] I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv:1608.03983, 2016.
[20] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller. Deterministic policy gradient algorithms. Proceedings of the International Conference on Machine Learning, 2014.
[21] Mingyang Geng, Kele Xu, Bo Ding, Huaimin Wang, and Lei Zhang. Learning data augmentation policies using augmented random search. arXiv:1811.04768, 2018.
[22] J. G. Ziegler and N. B. Nichols. Optimum settings for automatic controllers. Trans. ASME, vol. 64, no. 11, 1942.
[23] Khao Tran and Thanh Nam Nguyen. Flight motion controller design using genetic algorithm for a quadcopter. 51(5), 2018.
[24] Tianhao Zhang, Gregory Kahn, Sergey Levine, and Pieter Abbeel. Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search. arXiv:1509.06791, 2015.
[25] S. Tu and B. Recht. Least-squares temporal difference learning for the linear quadratic regulator. arXiv:1712.08642, 2017.
[26] Michael C. Koval, Christopher R. Mansley, and Michael L. Littman. Autonomous quadcopter control with reinforcement learning. IEEE International Conference on Robotics and Automation (ICRA), 2010.
[27] I. Sa and P. Corke. System identification, estimation and control for a cost-effective open-source quadcopter. Proceedings of the IEEE International Conference on Robotics and Automation, 2012.
[28] David Balduzzi and Muhammad Ghifary. Compatible value gradients for reinforcement learning of continuous deep policies. arXiv:1509.03005, 2015.
[29] Pawel Wawrzynski. Real-time reinforcement learning by sequential actor-critics and experience replay. Neural Networks, 22(10):1487-1497, 2009.
[30] Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos, K. Kavukcuoglu, and N. de Freitas. Sample efficient actor-critic with experience replay. arXiv:1611.01224, 2016.
[31] Roland Hafner and Martin Riedmiller. Reinforcement learning in feedback control. Machine Learning, 84(1-2):137-169, 2011.
[32] S. Levine and P. Abbeel. Learning neural network policies with guided policy search under unknown dynamics. Advances in Neural Information Processing Systems (NIPS), 2014.
[33] Sergey Levine and V. Koltun. Learning complex neural network policies with trajectory optimization. International Conference on Machine Learning (ICML), 2014.
[34] S. Lupashin, A. Schöllig, M. Sherback, and R. D'Andrea. A simple learning strategy for high-speed quadrocopter multi-flips. IEEE International Conference on Robotics and Automation (ICRA), 2010.
[35] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
[36] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python. http://www.scipy.org/, 2001.
[37] Rhett Allain. Plotly Technologies Inc. https://plot.ly/, 2015.