, 1999). For example, studies have shown that the representation of direction in the caudate preceded in time the representation in PFC early in learning and perhaps served as a teaching signal for the PFC (Antzoulatos and Miller, 2011 and Pasupathy and Miller, 2005).
This is generally consistent with our finding that the caudate had an enriched representation of value derived from the reinforcement learning algorithm in the fixed condition. The learning in Pasupathy and Miller (2005), however, evolved over about 60 trials, whereas the selection in our task evolved over 3–4 trials, making it difficult to examine changes in the relative timing of movement signals with learning and to compare our results directly. Much of the work that suggests a role for the striatum in RL has been motivated by the strong projection of the midbrain dopamine
neurons to the striatum (Haber et al., 2000) and the finding that dopamine neurons signal reward prediction errors (Schultz, 2006). Evidence also suggests, however, that dopamine neurons can be driven by aversive events (Joshua et al., 2008, Matsumoto and Hikosaka, 2009 and Seamans and Robbins, 2010), and therefore a straightforward interpretation of dopamine responses as a reward prediction error is not possible. It is still possible that striatal neurons represent action value. Although this has been shown previously (Samejima et al., 2005), similar value representations have been seen in the cortex (Barraclough et al., 2004, Kennerley and Wallis, 2009, Leon and Shadlen, 1999 and Platt and Glimcher, 1999), and therefore the specific role of the striatal action value signal was unclear. As we recorded from both lPFC and the dSTR simultaneously, we were able to show that there was an enrichment of value representations in the dSTR relative to the lPFC in the same task. Interestingly, this was true in both the random and fixed task conditions. In the fixed task condition we found that activity scaled with a value estimate from a reinforcement learning algorithm, and in the
random and fixed conditions the activity scaled with the color bias, which is related to the animals’ probability of advancing in the sequence and ultimately the number of steps necessary to get the reward. This finding is consistent with a role for the dSTR in reinforcement learning, although it suggests a more general role in value representation, as the neurons represent value in both random and fixed conditions. The representation in the random condition is consistent with findings from previous studies (Ding and Gold, 2010). One interesting question is where the action value information comes from, if not from lPFC. There are three likely candidates. One is the dopamine neurons, which have a strong projection to the striatum (Haber et al., 2000) and respond to rewards and reward prediction errors (Joshua et al.
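To make the quantities discussed above concrete, value estimates of the kind derived from a reinforcement learning algorithm are typically computed with an incremental update in which a reward prediction error drives learning. The following Python sketch illustrates a standard Rescorla-Wagner/Q-learning-style update with an arbitrary learning rate; it is an assumption-laden illustration of the general technique, not the specific model fit in this study.

```python
# Illustrative sketch only: a standard action-value update in which a reward
# prediction error (delta) drives learning. The learning rate and example
# reward sequence are assumptions, not parameters fit in the paper.

def update_action_value(value, reward, learning_rate=0.1):
    """Return the reward prediction error and the updated action value."""
    prediction_error = reward - value               # delta_t = r_t - V_t(a)
    new_value = value + learning_rate * prediction_error
    return prediction_error, new_value

# Example: value of a chosen action across a few rewarded/unrewarded trials.
value = 0.5
for reward in [1, 1, 0, 1]:
    delta, value = update_action_value(value, reward)
    print(f"reward={reward}  prediction error={delta:+.2f}  value={value:.2f}")
```

In this scheme, the prediction error corresponds to the signal attributed to dopamine neurons, while the running value term corresponds to the kind of action value estimate against which striatal activity can be regressed.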