INTEGRATION OF VALUES AND INFORMATION IN DECISION-MAKING
Marion ROUAULT, Jan DRUGOWITSCH and Étienne KOECHLIN
Laboratoire de Neurosciences Cognitives, INSERM U960,
École Normale Supérieure, Paris
Neural bases of action outcome evaluation
• Executive control of behavior relies on the evaluation of action outcomes to adjust subsequent actions
[Figure: fronto-striatal loops linking the ventromedial prefrontal cortex and the striatum (atlas: Yelnik and Bardinet); dopaminergic system: reward processing]
Working hypothesis
Action outcomes may convey two types of value signals:
- Rewarding value: valuation of the action outcome along an axis of subjective preferences
- Informational value: information conveyed by the action outcome about choice reliability (the probability that, in the current situation, the chosen action was the most appropriate one)
Reinforcement learning: simple, rapid, phylogenetically old
Bayesian inference: sophisticated, rapidly saturated
How are the rewarding and informational aspects of action outcomes processed?
What are their neural and functional interactions?
Probabilistic reversal learning task
• The correct state is rewarded 80% of the time, with occasional reversals
• States: two targets representing two underlying states
• Values: 2, 4, 6, 8, 10 € before the decision; range 1–11 € after the decision
• Minimal instructions
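A minimal simulation sketch of the task structure described above (not the authors' code; the reversal rate is an added assumption):

import numpy as np

rng = np.random.default_rng(0)

def simulate_task(n_trials=200, p_reward=0.8, p_reversal=0.05):
    """Simulate which target is 'correct' on each trial and whether choosing it pays off."""
    correct_state = 0                                    # hidden state: which target is correct
    states, correct_choice_rewarded = [], []
    for t in range(n_trials):
        if rng.random() < p_reversal:                    # contingency reversal (rate is assumed)
            correct_state = 1 - correct_state
        states.append(correct_state)
        correct_choice_rewarded.append(rng.random() < p_reward)   # 80% reward for the correct state
    return np.array(states), np.array(correct_choice_rewarded)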
3 conditions
Values and information are manipulated separately:
• CONDITION RANDOM: values provide no information about the most frequently rewarded state
• CONDITION CORRELATED: higher values are correlated with the most frequently rewarded state
• CONDITION ANTI-CORRELATED: higher values are correlated with the less frequently rewarded state
[Figure: reward probability distributions (80% vs. 20%) for each condition]
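A purely illustrative sketch of how the three conditions could couple the displayed values to the hidden state (the generation procedure and the coupling probabilities below are assumptions, not the actual protocol):

import numpy as np

rng = np.random.default_rng(1)

def draw_displayed_values(correct_state, condition):
    """Draw two distinct values from {2, 4, 6, 8, 10} and place the higher one on the
    most frequently rewarded target with probability q (q = 0.5 RANDOM,
    q > 0.5 CORRELATED, q < 0.5 ANTI-CORRELATED)."""
    q = {"RANDOM": 0.5, "CORRELATED": 0.8, "ANTI-CORRELATED": 0.2}[condition]
    low, high = sorted(rng.choice([2, 4, 6, 8, 10], size=2, replace=False))
    values = [0, 0]
    if rng.random() < q:                                 # high value on the frequently rewarded target
        values[correct_state], values[1 - correct_state] = high, low
    else:
        values[correct_state], values[1 - correct_state] = low, high
    return values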
Behavior (22 subjects)
[Figure: choice % of the most frequently rewarded target and choice % of the target with the best expected value, as a function of trial number after contingency reversal, for the ANTI-CORRELATED, CORRELATED and RANDOM conditions]
Subjects favor accuracy (being correct) over simply maximizing reward
Which variables contribute to choice?
Logistic regressions
[Figure: contribution to choice (beta weights) of the regressors p, r1, r2, EV1, EV2 and xt-1, in the CORRELATED, ANTI-CORRELATED and RANDOM conditions]
• Differential processing of rewards depending on the experimental condition: informational value
• No computation of expected value
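A sketch of the kind of trial-by-trial logistic regression summarized above (the variable names and the use of statsmodels are assumptions; the data below are fake and for illustration only):

import numpy as np
import statsmodels.api as sm

def choice_regression(X, y):
    """X: trials x regressors matrix (e.g. p, r1, r2, EV1, EV2, x_{t-1});
    y: 0/1 choices. Returns the fitted beta weights."""
    X = sm.add_constant(X)                      # add an intercept column
    return sm.Logit(y, X).fit(disp=0).params

n = 500
X = np.random.randn(n, 6)                       # fake regressors
y = (np.random.rand(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)   # fake choices
betas = choice_regression(X, y)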
Choice models
• An optimal choice would be a rational combination of probabilities and rewards: Probability × Reward
• However, people's behavior is usually suboptimal
• To explain this suboptimality, subjects are assumed to hold distorted representations of probabilities and rewards
Kahneman and Tversky 1979
Zhang and Maloney 2012
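A minimal sketch of one common distortion parameterization (linear-in-log-odds probability distortion in the spirit of Zhang and Maloney 2012, plus a power-law reward distortion); the functional forms and parameter values are assumptions, not necessarily those fitted here:

import numpy as np

def distort_probability(p, gamma=0.6, p0=0.5):
    """Linear-in-log-odds distortion: the log-odds of the distorted probability
    are a linear function of the log-odds of the true probability."""
    log_odds = gamma * np.log(p / (1 - p)) + (1 - gamma) * np.log(p0 / (1 - p0))
    return 1 / (1 + np.exp(-log_odds))

def distort_reward(r, alpha=0.8):
    """Power-law curvature on rewards (diminishing sensitivity)."""
    return r ** alpha

def distorted_value(p, r):
    """Subjective value of an option under distorted probability and reward."""
    return distort_probability(p) * distort_reward(r)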
Distortions model
[Figure: subjects vs. distortions model (1000 simulations); choice % of the most frequently rewarded target and choice % of the target with the best expected value, as a function of trial number after contingency reversal, for the ANTI-CORRELATED, CORRELATED and RANDOM conditions]
Mixed model: integration of 2 concurrent systems for decision-making
Bayesian inference + reinforcement learning (RL)
Combination of beliefs and reinforcement: choice based on 0.75 · Belief_Bay + 0.25 · Q_RL
Particularity of the protocol: the possible rewards to gain are presented before choice:
- Revision of beliefs before choice, given the reward distributions
- Revision of Qs before choice: (1 − w)·Qt + w·Rt, with w biasing current expected returns
- RL update: Qt+1 = Qt + α·(Rt − Qt)
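A minimal sketch of these computations (the softmax choice rule, the Q normalization, and all parameter values other than the 0.75/0.25 weights are assumptions, not the authors' exact implementation):

import numpy as np

def mixed_choice_prob(belief, Q, displayed_rewards, w=0.3, beta=5.0):
    """belief: P(each target is the correct state), length-2 array summing to 1;
    Q: learned RL values; displayed_rewards: values shown before the choice."""
    Q_revised = (1 - w) * Q + w * displayed_rewards       # revision of Qs before choice
    Q_norm = Q_revised / Q_revised.sum()                  # put Qs on the beliefs' scale (assumption)
    v = 0.75 * belief + 0.25 * Q_norm                     # mixture weights from the slide
    return np.exp(beta * v) / np.exp(beta * v).sum()      # softmax choice rule (assumption)

def rl_update(Q, choice, reward, alpha=0.2):
    """Standard RL update after the outcome: Q <- Q + alpha * (R - Q)."""
    Q = np.asarray(Q, dtype=float).copy()
    Q[choice] += alpha * (reward - Q[choice])
    return Q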
Mixed model
[Figure: subjects vs. mixed model (1000 simulations); choice % of the most frequently rewarded target and choice % of the target with the best expected value, as a function of trial number after contingency reversal, for the ANTI-CORRELATED, CORRELATED and RANDOM conditions]
Model comparison
[Figure: LLH, BIC and AIC, expressed as relative gain over a Bayesian model solely monitoring beliefs, for the DISTORTIONS and MIXED models; p < .05, p < .005, p = .057]
Distortions might be better explained by a mixed model integrating two systems for decision-making
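For reference, a sketch of the generic criteria behind this comparison (standard formulas, stated only to make the figure readable; k is the number of free parameters, n the number of data points, llh the maximized log-likelihood):

import numpy as np

def bic(llh, k, n):
    return k * np.log(n) - 2 * llh

def aic(llh, k):
    return 2 * k - 2 * llh

def relative_gain(score_model, score_reference):
    """Gain relative to the reference (beliefs-only Bayesian) model; positive = better fit."""
    return score_reference - score_model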
Mixed model without informational value
[Figure: subjects vs. mixed model without informational value; choice % of the most frequently rewarded target and choice % of the target with the best expected value, as a function of trial number after contingency reversal, for the CORRELATED, RANDOM and ANTI-CORRELATED conditions]
Informational value processing
(To do: redo beta extraction in GLM36)
p < .005 uncorrected, cluster > 10 voxels, z = 40
Small but significant positive correlation with informational value within dlPFC regions
Neuroimaging results
(p < 0.005 uncorrected, cluster > 10 voxels)
[Figure: linear and quadratic activations; belief system vs. RL system]
Neuroimaging results
[Figure: activations associated with the belief system and the RL system]
Neural activations are consistent with a mixed model involving two systems for decision-making
Summary
• What appear as distortions are actually explained by the integration of two systems for decision-making
• Rewarding value: network involving
• Informational value: network involving dlPFC,
Reinforcement learning: simple, rapid, phylogenetically old
Bayesian inference: sophisticated, rapidly saturated
Acknowledgments
Frontal lobe functions team
Choice given the reward presented
[Figure: choice % of a reward of r euros when presented (r = 2, 4, 6, 8, 10; e.g. 4 € vs. 10 €), for the CORRELATED, RANDOM and ANTI-CORRELATED conditions]
How often is 10 euros chosen, independently of the belief about the current state?
• Choices are mostly related to states
• A remaining effect of rewarding value is visible in the RANDOM condition
Reinforcement learning model
Computations associated with the RL model (see the mixed-model slide): Qt+1 = Qt + α·(Rt − Qt), with a pre-choice revision (1 − w)·Qt + w·Rt, where w biases current expected returns
Generative model of the task
[Diagram: z = state of the world (not observed); rewards are observed before the decision; sequence of feedbacks (observed); action selection]
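A minimal sketch of the Bayesian belief update implied by this generative model (the reversal rate and the exact likelihood are assumptions consistent with the task description, not the fitted model):

def update_belief(belief_correct0, chosen, rewarded, p_reward=0.8, p_reversal=0.05):
    """belief_correct0: P(target 0 is the correct state) before seeing the outcome;
    chosen: 0 or 1; rewarded: bool. Returns the belief carried to the next trial."""
    # Likelihood of the observed outcome under each hypothesis about the hidden state z
    lik_if_0_correct = p_reward if (chosen == 0) == rewarded else 1 - p_reward
    lik_if_1_correct = p_reward if (chosen == 1) == rewarded else 1 - p_reward
    # Bayes' rule
    post = belief_correct0 * lik_if_0_correct
    post /= post + (1 - belief_correct0) * lik_if_1_correct
    # Allow for a possible reversal before the next trial
    return post * (1 - p_reversal) + (1 - post) * p_reversal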
Which variables contribute to choice?
Logistic regressions
[Figure: contribution to choice (beta weights) of the regressors p, r1, r2 and xt-1 in each condition, for the DISTORTIONS model and the MIXED model; 22 subjects / 19 subjects]
Distortions model
[Figure: subjects vs. distortions model (1000 simulations); choice % of the most frequently rewarded target and choice % of the target with the best expected value, as a function of trial number after contingency reversal, for the ANTI-CORRELATED, CORRELATED and RANDOM conditions]
Mixed model: integration of 2 concurrent systems for decision-making
Bayesian inference + reinforcement learning (RL)
Linear combination of beliefs and reinforcement: choice based on 0.75 · Belief_Bay + 0.25 · Q_RL
Particularity of the protocol: the possible rewards to gain are presented before choice:
- Revision of beliefs before choice, given the reward distributions
- Revision of Qs before choice: (1 − w)·Qt + w·Rt, with w biasing current expected returns

Editor's Notes

  • #3: Executive control of behavior relies on the evaluation of action outcomes for immediate and future choices of actions. Two main brain regions are known to be involved in this evaluation process: the ventromedial prefrontal cortex and a basal ganglion, the striatum. These two structures are linked by recurrent fronto-striatal loops. They are also a target of the dopaminergic system, which is involved in processing positive and negative outcomes. However, the respective roles of these two regions are not yet clear and their functional interaction is not understood, since they generally activate together in most protocols.
  • #4: We have been working on the hypothesis that action outcomes may convey two types of value signals: rewarding value, representing the valuation of an action outcome along an axis of subjective preferences, such as sugary or monetary value; and informational value, modulating subjects' beliefs about the appropriate action in a given situation. We think that in previous studies these two aspects were intrinsically linked and that the protocols could not dissociate them; in natural settings these two values probably come together (obtaining a reward informs you that you have chosen an appropriate action). These two values are the product of two distinct adaptive computational models. We developed a paradigm that allows both value signals to be dissociated.
  • #5: Healthy human subjects had to make a decision between two targets representing two underlying states. The potential rewards to gain for each target were displayed before each choice. Subjects were told to maximize rewards and were instructed that one of the two targets led to the reward more frequently than the other shape, and that the more frequently rewarded shape would change from time to time over the experiment. A key point of the protocol was to decorrelate rewarding value from informational value by manipulating the reward distributions underlying each target.
  • #6: First condition: reward is uninformative about the hidden state. Second condition: γ > 0 implies that choosing the high reward is better. Third condition: γ < 0 implies that choosing a low reward is better.
  • #7: Out of a total of 25 subjects so far, 3 subjects were removed for systematically choosing the higher of the two displayed rewards in all conditions. (To do: detail what is plotted here.)
  • #8: Subsample of 19 subjects to remove outlier responses to xt-1
  • #9: In this kind of economic task, the optimal choice would be a rational combination of probabilities and rewards: compute p x r and choose the maximum quantity. However, the observed behavior is usually suboptimal. To explain this suboptimality, it is often assumed that people have distortions in their representations of probabilities and rewards, which can take different shapes depending on each subject's individual deformations.
  • #10: We can see from these learning curves, and from other behavioral measures, that our distortions model provides a good phenomenological description of behavior. The problem with this model is that it does not provide a psychological explanation for the origin of the distortions. As shown by our logistic regressions, subjects combine beliefs about states and rewards in another manner, not in the form of a calculation of expected value.
  • #11: Model used for GLM32 : BAY3_StdRL_combined_normQ_uniqw_sls3SPIVI_noini_bornw_borngamma
  • #13: Paired t-tests
  • #14: Theoretical simulations
  • #15: Regions within dlPFC show small but significant positive correlation with informational value
  • #22: You have a belief about the current state. You observe evidence (xt). You update your belief about the world using Bayesian inference. Bayes' theorem allows prior knowledge to be formally incorporated when computing statistical probabilities. State of the world (not observed); rewards are observed before the decision; sequence of feedbacks (observed).
  • #24: The distortions model provides a good phenomenological description of behavior, but it does not provide a psychological explanation for the origin of the distortions.
  • #25: Model used for GLM32 : BAY3_StdRL_combined_normQ_uniqw_sls3SPIVI_noini_bornw_borngamma