



CLC number: TN911-34; TP18 Document code: A Article ID: 1004-373X(2016)10-0014-04

A random skill discovery algorithm in continuous spaces

LUAN Yonghong1,2, LIU Quan2,3, ZHANG Peng2

(1. Suzhou Institute of Industrial Technology, Suzhou 215104, China; 2. Institute of Computer Science and Technology, Soochow University, Suzhou 215006, China; 3. MOE Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Changchun 130012, China)

Abstract: To address the “curse of dimensionality” in large, continuous state spaces, where the problem size grows exponentially with the state dimension, an improved random skill discovery algorithm based on the Option hierarchical reinforcement learning framework is proposed. Random Options are defined to construct a set of random skill trees. The task goal is divided into several sub-goals, and learning lower-level Option policies reduces the exponential growth in learning parameters that arises as the agent scales up. Simulation experiments were carried out on the task of finding the shortest path between any two points in a two-dimensional continuous maze with obstacles. The experimental results show that, because Options are defined randomly, the algorithm may exhibit intermittent instability in its initial performance, but as the random skill tree set grows it converges quickly to a near-optimal solution, effectively overcoming the slow convergence and the difficulty of obtaining an optimal policy caused by the “curse of dimensionality”.

Keywords: reinforcement learning; Option; continuous space; random skill discovery

0 Introduction

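Reinforcement learning (RL) [1-2] is a method in which an agent learns a policy mapping states to actions by interacting directly with its environment. Classical reinforcement learning algorithms try to find a single optimal policy over the whole domain. This works well in small-scale or discrete environments, but in large-scale and continuous state spaces it runs into the “curse of dimensionality”. To address this and related problems, researchers have proposed state clustering, finite policy space search, value function approximation, hierarchical reinforcement learning, and other methods [3]. The hierarchy in hierarchical reinforcement learning is built, in essence, by adding an abstraction mechanism on top of reinforcement learning; that is, it uses both the primitive actions of reinforcement learning and higher-level skill actions [3] (also called Options).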
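The scale of the “curse of dimensionality” can be illustrated with a small sketch. It assumes, purely for illustration, that each continuous state dimension is discretized into a fixed number of bins; the function name `grid_state_count` and the choice of 10 bins are hypothetical and do not come from the paper.

```python
def grid_state_count(bins_per_dim: int, dims: int) -> int:
    """Number of grid cells when each of `dims` dimensions is split into `bins_per_dim` bins."""
    return bins_per_dim ** dims


# A tabular method needs at least one value per cell, so the table size
# grows exponentially with the number of state dimensions.
for dims in (1, 2, 4, 8):
    print(dims, grid_state_count(10, dims))
```

Under this assumption, going from 2 to 8 dimensions multiplies the number of cells by a factor of one million, which is why flat tabular learning becomes impractical and abstraction mechanisms such as Options are attractive.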



1 Hierarchical reinforcement learning and the Option framework

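The core idea of hierarchical reinforcement learning (HRL) is to introduce an abstraction mechanism that decomposes the overall learning task. In HRL methods, the agent can work not only with the given set of primitive actions but also with higher-level skills.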
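As a rough illustration of what a higher-level skill can look like in code, the sketch below uses the common formulation of an Option as a triple: an initiation set, an internal policy, and a termination condition. That triple is the standard Options formulation and is an assumption here, since this excerpt does not spell it out. The names `Option`, `run_option`, `step`, and `move_right`, along with the toy 2-D dynamics, are hypothetical and are not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple

State = Tuple[float, float]   # assumed: 2-D continuous position
Action = Tuple[float, float]  # assumed: displacement applied to the position


@dataclass
class Option:
    initiation: Callable[[State], bool]    # may the option start in this state?
    policy: Callable[[State], Action]      # primitive action chosen inside the option
    termination: Callable[[State], float]  # probability of stopping in this state


def run_option(option: Option, state: State,
               step: Callable[[State, Action], State],
               rng: random.Random, max_steps: int = 100) -> Tuple[State, int]:
    """Follow the option's policy until it terminates; return (final state, steps taken)."""
    if not option.initiation(state):
        raise ValueError("option cannot be initiated in this state")
    steps = 0
    while steps < max_steps:
        state = step(state, option.policy(state))
        steps += 1
        if rng.random() < option.termination(state):
            break
    return state, steps


# Toy skill: move right in increments of 0.25 until x reaches 1.0.
move_right = Option(
    initiation=lambda s: s[0] < 1.0,
    policy=lambda s: (0.25, 0.0),
    termination=lambda s: 1.0 if s[0] >= 1.0 else 0.0,
)


def step(state: State, action: Action) -> State:
    # Assumed obstacle-free dynamics: simply add the displacement.
    return (state[0] + action[0], state[1] + action[1])


final_state, steps = run_option(move_right, (0.0, 0.5), step, random.Random(0))
print(final_state, steps)
```

From the agent's point of view, the whole call to `run_option` behaves like a single temporally extended action, which is what lets a higher level choose among skills instead of among primitive actions at every step.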

4 Conclusion



[1] SUTTON R S, BARTO A G. Reinforcement learning: An introduction [M]. Cambridge, MA: MIT Press, 1998.

[2] KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning: A survey [EB/OL]. [1996-05-01]. http:// www.cs.cmu.edu/afs/cs...vey.html.

[3] BARTO A G, MAHADEVAN S. Recent advances in hierarchical reinforcement learning [J]. Discrete Event Dynamic Systems, 2003, 13(4): 341-379.

[4] SIMSEK O, WOLFE A P, BARTO A G. Identifying useful subgoals in reinforcement learning by local graph partitioning [C]// Proceedings of the 22nd International Conference on Machine Learning. USA: ACM, 2005, 8: 816-823.

[5] OSENTOSKI S, MAHADEVAN S. Learning state-action basis functions for hierarchical MDPs [C]// Proceedings of the 24th International Conference on Machine Learning. USA: ACM, 2007, 7: 705-712.

[6] MCGOVERN A, BARTO A. Autonomous discovery of subgoals in reinforcement learning using diverse density [C]// Proceedings of the 18th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 2001: 361-368.

[7] JONG N K, STONE P. State abstraction discovery from irrelevant state variables [J]. IJCAI, 2005, 8: 752-757.

[8] KONIDARIS G, BARTO A G. Skill discovery in continuous reinforcement learning domains using skill chaining [J]. NIPS, 2009, 8: 1015-1023.

[9] KONIDARIS G, KUINDERSMA S, BARTO A G, et al. Constructing skill trees for reinforcement learning agents from demonstration trajectories [J]. NIPS, 2010, 23: 1162-1170.

[10] LIU Quan, YAN Qicui, FU Yuchen, et al. A hierarchical reinforcement learning method based on heuristic reward function [J]. Journal of Computer Research and Development, 2011, 48(12): 2352-2358.

[11] SHEN Jing, LIU Haibo, ZHANG Rubo, et al. Multi-robot hierarchical reinforcement learning based on semi-Markov games [J]. Journal of Shandong University (Engineering Science), 2010, 40(4): 1-7.

[12] KONIDARIS G, BARTO A. Efficient skill learning using abstraction selection [C]// Proceedings of the 21st International Joint Conference on Artificial Intelligence. Pasadena, CA, USA: [S.l.], 2009: 1107-1113.

[13] XIAO Ding, LI Yitong, SHI Chuan. Autonomic discovery of subgoals in hierarchical reinforcement learning [J]. Journal of China Universities of Posts and Telecommunications, 2014, 21(5): 94-104.

[14] CHEN Chunlin, DONG Daoyi, LI Hanxiong, et al. Hybrid MDP based integrated hierarchical Q-learning [J]. Science China (Information Sciences), 2011, 54(11): 2279-2294.
