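Sarsa and Q-Learning are among the better-known and more widely applied algorithms in reinforcement learning, and both work with the action-value function. Sarsa takes its name from the figure below, while Q-learning is named for its use of the Q function. The two algorithms are very similar, and the differences between them are discussed in detail later. Sarsa Algorithm for On-policy Control Convergence of Sarsa Sarsa converges to...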
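A minimal sketch of how the two update rules are usually written, to make the similarity concrete. This is the standard textbook form rather than anything taken from the excerpt; the function names, the nested-list Q table, and the default `alpha` and `gamma` values are illustrative assumptions.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy (max-valued) action in s_next."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]
```

The only difference is the bootstrap term: Sarsa uses the value of the next action the agent really takes (state, action, reward, state, action), whereas Q-learning uses the maximum over next actions regardless of what the behavior policy does.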
The first, named Fuzzy Q-learning, is an adaptation of Watkins' Q-learning for fuzzy inference systems. The second, named Dynamical Fuzzy Q-learning, eliminates some drawbacks of both Q-learning and Fuzzy Q-learning. These algorithms are used to improve the rule base of a fuzzy controller....
intermittent Q-learning; suboptimal performance; Zeno-free. This paper proposes an intermittent model-free learning algorithm for linear time-invariant systems, where the control policy and transmission decisions are co-designed simultaneously while also being subjected to worst-case disturbances. The control policy...
Giving computers the ability to recognize and express emotions, and training them to act out human emotions, is becoming a hotspot of recent research. This paper designs an emotion-automaton model based on a dynamic Q-learning algorithm, in which an emotional unit is defined. The emotional unit wil...
This paper presents a dynamic fuzzy Q-learning (DFQL) method that is capable of tuning fuzzy inference systems (FIS) online. A novel online self-organizing learning algorithm is developed so that structure and parameter identification are accomplished automatically and simultaneously based only on Q-...
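The robot's environment consists of targets and obstacles, each of which may be static or dynamic. The robot's task is to use the Q-learning algorithm to find a collision-free path through this environment. The first step in applying Q-learning in such an environment is to define the state and action spaces. For a robot navigating a static environment, the state set can be defined entirely by the robot's position or coordinates.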
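The static case described above can be sketched as a small grid world in which the state is simply the robot's (row, column) position. Everything concrete here is an assumption made for illustration: the 4x4 grid, the obstacle cells, the four-move action set, the reward values, and the hyperparameters are not from the excerpt.

```python
import random

# Hypothetical layout: grid size, start, goal, and obstacles are illustrative only.
ROWS, COLS = 4, 4
START, GOAL = (0, 0), (3, 3)
OBSTACLES = {(1, 1), (2, 1), (1, 3)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action_idx):
    """Apply an action and return (next_state, reward, done)."""
    dr, dc = ACTIONS[action_idx]
    nxt = (state[0] + dr, state[1] + dc)
    off_grid = not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS)
    if off_grid or nxt in OBSTACLES:
        return state, -5.0, False  # collision: stay in place and pay a penalty
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, -1.0, False  # small cost per move favors short paths


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
            nxt, reward, done = step(state, a)
            target = reward + (0.0 if done else gamma * max(Q[nxt]))
            Q[state][a] += alpha * (target - Q[state][a])
            state = nxt
            if done:
                break
    return Q


def greedy_path(Q, max_steps=50):
    """Follow the greedy policy from START and return the visited cells."""
    path, state = [START], START
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
        state, _, done = step(state, a)
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the list of cells visited under the learned policy. For dynamic targets or obstacles, position alone would presumably no longer be a sufficient state, so the state definition would need to be extended.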
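Finally, simulation examples verify the soundness of the simulation environment and the effectiveness of the Q-learning algorithm for dynamic high-speed railway scheduling, providing a good basis for high-speed railway dispatchers to make optimized decisions. English abstract: As the backbone of the national comprehensive transportation system, high-speed railway has achieved rapid and vigorous development in the past decade. At the same ...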
In order for the soft limits exceedance cost to be comparable or even greater than the tracking cost, so that solutions within the soft limits are favoured, a reasonable heuristic is to choose the gains in \(\varvec{Q}_s = \text{blkdiag}(q_p \varvec{I}_n, \ q_v \varvec{I}_...
Presents information on a study which described a Q-learning-based dynamic channel assignment technique for mobile communication systems. Performance of th... Nie, Junhong, Haykin, ... - IEEE Transactions on Vehicular Technology. Cited by: 181. Published: 1999. Performance analysis of cellular mobile communi...
In this paper, to realize emotion generation and emotional behavior more appropriate for a living thing, we embed in a robot a model of the neuromodulators that exist in the human brain and propose a learning method for emotional behavior using Q-Learning that has meta-parameter...