<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Reinforcement Learning - Tag - Xiaopeng Xu</title><link>https://xu-xp.com/tags/reinforcement-learning/</link><description>Reinforcement Learning - Tag - Xiaopeng Xu</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>xiaopeng.xu@kaust.edu.sa (Xiaopeng Xu)</managingEditor><webMaster>xiaopeng.xu@kaust.edu.sa (Xiaopeng Xu)</webMaster><lastBuildDate>Fri, 18 Jun 2021 00:00:00 +0000</lastBuildDate><atom:link href="https://xu-xp.com/tags/reinforcement-learning/" rel="self" type="application/rss+xml"/><item><title>RL: Reinforcement Learning Notes</title><link>https://xu-xp.com/posts/rl/</link><pubDate>Fri, 18 Jun 2021 00:00:00 +0000</pubDate><author>xiaopeng.xu@kaust.edu.sa (Xiaopeng Xu)</author><guid>https://xu-xp.com/posts/rl/</guid><description><![CDATA[<h2 id="rl-基础">RL Basics</h2>
<h3 id="2-多臂赌博机-k-arm-bandit">2 Multi-armed bandit (k-armed bandit)</h3>
<ul>
<li>
<p>There are only actions and their corresponding rewards; there are no states.</p>
</li>
<li>
<p>Action-value function</p>
</li>
<li>
<p>Incremental implementation</p>]]></description></item></channel></rss>