The first term, $Q_{ij}$, which arises through backpropagation, denotes how strongly state neuron $j$ contributes to the output. Since all weights tend to be non-negative when positive rewards are given, one might interpret this factor as influencing the learning speed but not the final result. The second term is a competitive decay term that has a larger suppressive effect if strong activations $s_k$ in the feature layer are paired with large weights $Q_{ik}$. If exactly one feature unit is active ($s_j = 1$ and $s_k = 0$ for all $k \neq j$), the first and the second term cancel each other out and learning has converged. In contrast to the learning rule for the action weights (Eq. 7), the update of the feature weights (Eq. 8) is a non-local learning rule, because (i) the action layer weights $Q$ are involved and (ii) it sums over all activations of the feature layer. Note that omitting the non-local terms (aggregated in brackets in Eq. 8) yields a purely local learning rule. This biologically more realistic approximation has been successfully applied to the first experimental scenario presented below. For the more difficult task, however, the modulatory effect of the non-local terms on $\delta s_j I_n$ has been included, mainly because it proved necessary when performing vanilla gradient descent.

Docking of a mobile robot is the initial problem that has to be solved before other applications, e.g. grasping, user interaction or recharging, can be performed. Therefore, we modeled a general docking situation in a Webots simulation environment (Fig.). A 3-D geometric shape with several beneficial attributes serves as a landmark that signals the target region.
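The structure of the feature-weight update discussed in this section can be illustrated with a minimal sketch. It is not a copy of Eq. 8: it assumes that the update has the form $\Delta W_{jn} \propto \delta\, s_j I_n \,(Q_{ij} - \sum_k Q_{ik} s_k)$, which is inferred from the verbal description of the two bracketed terms. The function name `feature_weight_update`, the learning rate `lr`, and the flag `local_only` are hypothetical names introduced only for this illustration.

```python
def feature_weight_update(delta, s, inputs, Q, action, lr=0.1, local_only=False):
    """Sketch of a feature-weight update dW[j][n] (assumed form, see lead-in).

    delta      -- scalar error signal (assumed to be the prediction error)
    s          -- feature-layer activations s_k
    inputs     -- input activations I_n
    Q          -- action weights, Q[i][k] links feature unit k to action i
    action     -- index i of the chosen action
    local_only -- if True, the bracketed non-local terms are omitted
    """
    # Second (competitive decay) term: sum over all feature activations.
    q_out = sum(Q[action][k] * s[k] for k in range(len(s)))

    dW = []
    for j, s_j in enumerate(s):
        # Non-local bracket: first term Q_ij minus the decay term.
        bracket = 1.0 if local_only else (Q[action][j] - q_out)
        dW.append([lr * delta * s_j * bracket * x for x in inputs])
    return dW


# One-hot feature activation: the two bracketed terms cancel for the active unit.
s = [0.0, 1.0, 0.0]
Q = [[0.2, 0.5, 0.1]]
full = feature_weight_update(1.0, s, [1.0, 2.0], Q, action=0)
local = feature_weight_update(1.0, s, [1.0, 2.0], Q, action=0, local_only=True)
```

Under these assumptions, `full` contains only zeros for the one-hot activation, which mirrors the convergence argument in the text, whereas `local` keeps updating the active unit in proportion to `delta` and the inputs.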