
## Motivated Reinforcement Learning for Improved Head Actuation of Humanoid Robots

Citations: 3 (3 self)

### Citations

5599 | Reinforcement Learning: An Introduction
- Sutton, Barto
- 1998
Citation context: …Reinforcement learning is a form of machine learning used to solve problems involving a series of decisions based on perceptions, with a metric indicating performance after every decision [5]. The problem can be formulated as an interaction between a learning entity called the agent and its environment. The agent is the decision maker and the environment is defined as anything external to…
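The agent–environment interaction described in this context can be sketched as a minimal loop. The toy `Environment` class and its reward scheme below are purely illustrative stand-ins, not the paper's robot setup:

```python
import random

class Environment:
    """Toy environment: the agent moves along positions 0..4 and is
    rewarded (the performance metric) whenever it reaches position 3."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 or +1; the environment is everything external to the agent
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 3 else 0.0
        return self.state, reward

env = Environment()
state = env.state
for _ in range(10):
    action = random.choice([-1, 1])   # the agent's decision
    state, reward = env.step(action)  # new perception + performance metric
```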

1667 | Learning from Delayed Rewards
- Watkins
- 1989
Citation context: …Λ : S → 2^A, where Λ(s) is the subset of actions available in state s ∈ S; a transition function T : S × A × S → [0, 1] describing the probability of state transitions; and a reward function R : S × A → ℝ [6]. Here 2^A = {U : U ⊆ A} is the power set of A, or the set of all subsets of A. We additionally simplify the problem for the purpose of head actuation by including the assumption that Λ(s) is finite an…
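Under the finiteness assumption mentioned in this context, the MDP components (S, A, Λ, T, R) can be written down directly as plain data structures. The states, actions, and probabilities below are hypothetical placeholders, not values from the paper:

```python
# Finite MDP: states S, actions A, available actions Λ(s),
# transition probabilities T(s, a, s'), and rewards R(s, a).
S = ["low", "high"]
A = ["pan", "stay"]
available = {"low": ["pan", "stay"], "high": ["stay"]}  # Λ : S → 2^A
T = {("low", "pan", "high"): 0.9, ("low", "pan", "low"): 0.1,
     ("low", "stay", "low"): 1.0, ("high", "stay", "high"): 1.0}
R = {("low", "pan"): 0.0, ("low", "stay"): 0.0, ("high", "stay"): 1.0}

# Sanity check: each transition distribution T(s, a, ·) must sum to 1.
for s in S:
    for a in available[s]:
        total = sum(T.get((s, a, s2), 0.0) for s2 in S)
        assert abs(total - 1.0) < 1e-9
```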

234 | The unscented Kalman filter for nonlinear estimation
- Wan, Van der Merwe
Citation context: …transforming the information into global coordinates [3]. A robot is said to be localised if its localisation model is sufficiently accurate. The model of the soccer field world is a collection of Unscented Kalman Filters [4]; each robot maintains a filter for its own position and a filter for the ball’s position. Each filter is a Gaussian probability distribution for the possible field locations of an object. The filters…
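At the core of an Unscented Kalman Filter is the unscented transform, which represents a Gaussian by a small set of sigma points and propagates them through a nonlinear function. A minimal sketch of that transform, using a simple symmetric sigma-point scheme with a `kappa` scaling parameter (an illustration of the idea, not the robots' actual filter code):

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear f via sigma points."""
    n = mean.shape[0]
    # Columns of L span the distribution; sigma points sit at mean ± columns.
    L = np.linalg.cholesky((n + kappa) * cov)
    sigma = ([mean]
             + [mean + L[:, i] for i in range(n)]
             + [mean - L[:, i] for i in range(n)])
    # Weights for the central point and the 2n symmetric points.
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    y = np.array([f(s) for s in sigma])
    new_mean = np.sum(w[:, None] * y, axis=0)
    d = y - new_mean
    new_cov = (w[:, None] * d).T @ d
    return new_mean, new_cov
```

For a linear `f` the transform is exact, which makes a handy sanity check before trusting it on nonlinear observation models.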

173 | RoboCup: A challenge problem for AI
- Kitano, Kuniyoshi, et al.
- 1997
Citation context: …nty. Therefore, we utilised motivated reinforcement learning techniques to implement ‘curious’ anthropomorphic behaviours with the goal of optimising localisation in the RoboCup KidSize soccer league [2]. During a game of soccer, the robots must function autonomously and all computations must be performed on the robots’ internal CPUs. Thus, the robot must use a camera built into its head to measure f…

45 | Value Function Approximation in Reinforcement Learning using the Fourier Basis
- Konidaris, Osentoski, et al.
- 2011
Citation context: …learning a set of scalar weights using gradient descent. Based on the Fourier series expansion of periodic functions, a Fourier basis can be used to approximate the value functions in a given domain [13]. The Fourier basis linear approximator is given by the cosine part of a truncated Fourier series and is updated with a sampled-point, gradient-descent update rule. That is, by sampling the unknown fu…
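A minimal sketch of such a cosine Fourier basis approximator, assuming inputs scaled to [0, 1]^d and the standard squared-error gradient-descent update on a sampled point (function names and the learning rate are illustrative):

```python
import itertools
import math

def fourier_features(x, order):
    """Cosine Fourier basis on [0, 1]^d: one feature cos(pi * c . x)
    per integer coefficient vector c with entries in 0..order."""
    d = len(x)
    coeffs = itertools.product(range(order + 1), repeat=d)
    return [math.cos(math.pi * sum(c_i * x_i for c_i, x_i in zip(c, x)))
            for c in coeffs]

def td_update(w, x, target, order, alpha=0.1):
    """One sampled-point gradient-descent step: w += alpha * error * phi."""
    phi = fourier_features(x, order)
    pred = sum(w_i * p_i for w_i, p_i in zip(w, phi))
    return [w_i + alpha * (target - pred) * p_i for w_i, p_i in zip(w, phi)]
```

Repeated updates at a sampled point pull the linear approximation toward the sampled target value, which is the sense in which the weights are learned by gradient descent.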

18 | Designing for interest and novelty: Motivating design agents
- Saunders, Gero
- 2001
Citation context: …It has been established that natural agents will seek out a middle ground in terms of novelty of sensation, resulting in an aversion to experiences too familiar or too unfamiliar. Saunders and Gero [9] implemented motivated reinforcement learning agents to study the progression of architectural designs, with successive designs similar-yet-different to previous designs. Merrick, et al. [10], have us…

13 | Motivated Reinforcement Learning: Curious Characters for Multiuser Games
- Merrick, Maher
- 2009
Citation context: …ture has been used to model motivated behaviour for application in generating complex, exploratory behaviours in unsupervised intelligent agents for non-player characters in online multi-player games [8]. Such motivated reinforcement learning agents differ from standard agents in that they generate their own reward, independent of the environmental reward, based on state perceptions and their own act…

9 | Evaluation of colour models for computer vision using cluster validation techniques
- Budden, Fenn, et al.
- 2013
Citation context: …ured until desirable behaviour is achieved. 3 Approximating Continuous Value Functions Reinforcement learning is often performed in finite, discrete state spaces. In this case, a simple look-up table [12] can be used to store the value function Q and the reinforcement learning problem is often easily solved. The majority of the useful variables available for the robot soccer player to sample are conti…
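For the finite, discrete case mentioned in this context, the look-up table can be as simple as a dictionary keyed by state–action pairs, updated with the standard Q-learning backup (an illustrative sketch; the `alpha` and `gamma` values are arbitrary):

```python
from collections import defaultdict

Q = defaultdict(float)  # look-up table: (state, action) -> value
alpha, gamma = 0.5, 0.9

def q_update(s, a, r, s_next, actions):
    """Standard Q-learning backup against the table:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```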

8 | A novel approach to ball detection for humanoid robot soccer
- Budden, Fenn, et al.
- 2012
Citation context: …coordinate systems. Objects, such as the ball, must be localised in the world model by measuring position relative to a localised robot, and then transforming the information into global coordinates [3]. A robot is said to be localised if its localisation model is sufficiently accurate. The model of the soccer field world is a collection of Unscented Kalman Filters [4]; each robot maintains a filter…

2 | Visual gaze analysis of robotic pedestrians moving in urban space
- Wong, Chalup, et al.
Citation context: …tion of object recognition or tuning of probabilistic filters. Keywords: motivated reinforcement learning, localisation, Fourier basis, head actuation, simulated curiosity 1 Introduction Wong, et al. [1], used humanoid robots to model human gaze behaviour in urban environments. They utilised a gaze direction model based on visually salient objects placed in a model urban environment. If a robot saw a…

2 | A shape grammar approach to computational creativity and procedural content generation in massively multiplayer online role playing games
- Merrick, Isaacs, et al.
- 2013
Citation context: …and Gero [9] implemented motivated reinforcement learning agents to study the progression of architectural designs, with successive designs similar-yet-different to previous designs. Merrick, et al. [10], have used motivated reinforcement learning agents to create game content procedurally, conforming to the similar-but-different concept of motivation, to simulate creativity. Further research by Merr…

1 | Principles of Physiological Psychology
- Wundt
- 1910
Citation context: …the choice of actions becomes ambiguous as the environment no longer discriminates. Typically, an animal given no environmental stimuli will seek out novel experiences, rather than taking no action [7]. Reinforcement learning infrastructure has been used to model motivated behaviour for application in generating complex, exploratory behaviours in unsupervised intelligent agents for non-player chara…

1 | Intrinsic motivation and introspection in reinforcement learning, IEEE Transactions on Autonomous Mental Development
- Merrick
- 2012
Citation context: …ve used motivated reinforcement learning agents to create game content procedurally, conforming to the similar-but-different concept of motivation, to simulate creativity. Further research by Merrick [11] involves agents which generate goals based on motivation. The agent then seeks to learn these goals with standard reinforcement learning techniques. Given the novelty N = N(s, a) of a state-action pa…
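The cited context is truncated before the reward definition, so the following is one common (hypothetical) formulation rather than Merrick's exact one: novelty N(s, a) approximated by inverse visit counts, and interest by a Wundt-curve-like function that peaks at moderate novelty, consistent with the "middle ground" behaviour described above:

```python
import math
from collections import Counter

visits = Counter()

def novelty(s, a):
    """Count-based novelty: rarely tried state-action pairs are more novel.
    (One common approximation; not necessarily the cited paper's definition.)"""
    visits[(s, a)] += 1
    return 1.0 / visits[(s, a)]

def interest(n, peak=0.5, width=0.25):
    """Wundt-curve-style interest: reward peaks at moderate novelty,
    falling off for experiences that are too familiar or too unfamiliar."""
    return math.exp(-((n - peak) ** 2) / (2 * width ** 2))
```

An agent using `interest(novelty(s, a))` as its internally generated reward would, like the natural agents described above, avoid both over-familiar and wholly unfamiliar experiences.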