Dig In To Machine Learning

In reinforcement learning (RL), programs learn from the consequences of their actions rather than from explicit teaching. The reward feedback is often delayed until several steps have been completed. This type of learning is useful for controlling things like vehicles or devices, because such tasks require a sequence of actions and you don't know whether you have done them right until the job is complete. These processes involve a decision-making agent interacting with its environment so as to maximize the cumulative reward it receives over time. RL methods are intended to address the kind of learning and decision-making problems that people and animals face in their everyday lives.
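
To make the agent, environment, and delayed reward concrete, here is a minimal sketch using tabular Q-learning, which is one common RL method chosen for illustration and not something this page prescribes. The five-cell corridor, the function names (`step`, `train`, `greedy_policy`), and the parameter values are all assumptions made up for the example. The agent starts in cell 0 and receives a reward only when it finally reaches cell 4, so it cannot tell whether an individual move was good until the episode ends.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, goal at cell 4
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
GOAL = N_STATES - 1


def step(state, action):
    """Environment: move one cell; reward is 1.0 only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Agent: learn action values from experience with Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore uniformly at random; Q-learning is off-policy,
            # so it can still learn the values of the greedy policy.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Discounting by gamma passes the delayed reward back
            # to the earlier moves that led to it.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-goal cell."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

After training, the greedy policy should choose "step right" (action 1) in every cell, even though only the final move into the goal was ever rewarded directly. The discount factor `gamma` is what makes cells closer to the goal more valuable than cells farther away.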
