Inspired by behaviorist psychology, **reinforcement learning** is an area of machine learning in computer science concerned with how an *agent* ought to take *actions* in an *environment* so as to maximize some notion of cumulative *reward*. Because the problem is so general, it is also studied in many other disciplines, including game theory, control theory, operations research, information theory, simulation-based optimization, statistics, and genetic algorithms. In the operations research and control literature, the field in which reinforcement learning methods are studied is called *approximate dynamic programming*. The problem has also been studied in the theory of optimal control, though most work there concerns the existence and characterization of optimal solutions rather than learning or approximation. In economics and game theory, reinforcement learning may be used to explain how equilibrium can arise under bounded rationality.

In machine learning, the environment is typically formulated as a Markov decision process (MDP), and many reinforcement learning algorithms for this setting are closely related to dynamic programming techniques. The main difference from these classical techniques is that reinforcement learning algorithms do not require knowledge of the MDP, and they target large MDPs where exact methods become infeasible.
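The contrast with dynamic programming can be made concrete with a minimal sketch. The four-state chain environment, the function names, and all parameter values below are assumptions made for illustration and are not taken from the article. `value_iteration` queries the transition and reward model for every state-action pair, whereas `q_learning` (one common tabular reinforcement learning method, used here as an example) only observes the transitions it happens to sample while acting.

```python
import random

N_STATES = 4          # hypothetical chain: states 0..3, state 3 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumed value)

def step(state, action):
    """Deterministic toy dynamics: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def value_iteration(sweeps=50):
    """Dynamic programming: uses step() as a known model of the whole MDP."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):          # skip the terminal state
            backups = []
            for a in ACTIONS:
                s2, r, done = step(s, a)
                backups.append(r + (0.0 if done else GAMMA * v[s2]))
            v[s] = max(backups)
    return v

def q_learning(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Model-free learning: only sees sampled (s, a, r, s') transitions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:         # explore
                a = rng.choice(ACTIONS)
            else:                              # exploit current estimates
                a = max(ACTIONS, key=lambda act: q[s][act])
            s2, r, done = step(s, a)
            target = r + (0.0 if done else GAMMA * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

if __name__ == "__main__":
    print("DP state values:     ", value_iteration())
    print("Q-learning max_a Q:  ", [max(row) for row in q_learning()])
```

On a problem this small both routes arrive at the same values; the point of the sketch is only that the second one never inspects the model directly, which is what makes the approach usable when the MDP is unknown or too large to enumerate.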

Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
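As a rough illustration of the exploration vs. exploitation trade-off, the sketch below runs an epsilon-greedy strategy on a Bernoulli multi-armed bandit. Epsilon-greedy is only one simple strategy among many, and the function name `epsilon_greedy_bandit`, the arm probabilities, and the default parameters are all assumptions chosen for the example rather than anything specified in the article.

```python
import random

def epsilon_greedy_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit; return (pull counts, value estimates, total reward)."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms          # how often each arm was pulled
    estimates = [0.0] * n_arms     # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward

if __name__ == "__main__":
    probs = [0.1, 0.5, 0.9]        # hypothetical success probabilities
    for eps in (0.0, 0.1):
        counts, _, total = epsilon_greedy_bandit(probs, epsilon=eps)
        print(f"epsilon={eps}: pulls per arm={counts}, total reward={total}")
```

With `epsilon=0.0` the agent in this sketch never explores: ties are broken toward the first arm, so it keeps pulling that arm regardless of how poor it is. A small positive `epsilon` gives up some immediate reward in exchange for information about the other arms.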


