
Messages from the "échos du RISC" list

[thesis] Defense: Gabriel Sulem, "An ordinal generative model of Bayesian inference for Human decision-making in continuous reward environments", Paris, 14 Sept. 2017, 2:30pm
(12/09/2017)

De : "clementine fourrier" (via iec Mailing List) iec [ chez ] lists.ens.fr

THESIS DEFENSE

14 September 2017, 2:30pm, Salle des Actes (45 rue d'Ulm, 1st floor)
Gabriel Sulem (LNC) : "An ordinal generative model of Bayesian inference for Human decision-making in continuous reward environments"

Jury:
Peter Dayan (UCL, London)
Pierre-Yves Oudeyer (INRIA Bordeaux)
Mathias Pessiglione (ICM, Paris)
Mehdi Khamassi (ISIR, Paris)
Etienne Koechlin (ENS, Paris)

Abstract:
This thesis aims to understand how human behavior adapts to environments where rewards are continuous. Many studies have examined environments with binary rewards (win/lose) and have shown that human behavior can be accounted for by Bayesian inference algorithms. Bayesian inference is very efficient when there is a discrete set of possible environmental states to identify and when the events to be classified (here, rewards) are also discrete.

A general Bayesian algorithm can operate in a continuous environment provided that it is based on a “generative” model of the environment: a structural assumption about environmental contingencies that limits the number of possible interpretations of observations and structures the aggregation of data across time. By contrast, reinforcement learning algorithms remain efficient with continuous reward scales by building and adapting value expectations and selecting the best options.
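For illustration only, here is a minimal Python sketch contrasting the two families of algorithms just described on a single action with continuous rewards; the candidate means, noise level, and learning rate are assumptions made for the example, not parameters from the thesis.

import numpy as np

def rl_update(value, reward, lr=0.1):
    # Reinforcement learning: track an expected value with a simple delta rule.
    return value + lr * (reward - value)

def bayes_update(prior, candidate_means, reward, noise_sd=1.0):
    # Bayesian inference: assume a generative model (here, hypothetical Gaussian
    # reward noise around one of several candidate mean values) and update the
    # belief over those candidates from the observed reward.
    likelihood = np.exp(-0.5 * ((reward - candidate_means) / noise_sd) ** 2)
    posterior = prior * likelihood
    return posterior / posterior.sum()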

The issue we address in this thesis is to determine which kind of generative model of continuous rewards characterizes human decision-making within a Bayesian inference framework.
One putative hypothesis is to consider that each action delivers rewards as noisy samples of its true value, typically following a Gaussian distribution. Statistics computed on a few samples then make it possible to infer the relevant information (mean and standard deviation) for subsequent choices. We instead propose a general generative model based on assumptions about the relationship between the values of the available actions and the existence of a reliable ordering of those values. This structural assumption makes it possible to mentally simulate counterfactual rewards and to learn simultaneously the reward distributions associated with all actions. This limits the need for exploratory choices, and changes in environmental contingencies are detected when obtained rewards depart from the learned distributions. To validate our model, we ran three behavioral experiments on healthy subjects in a setting where the reward distributions associated with the actions were continuous and changed over time. Our proposed model correctly described participants’ behavior in all three tasks, whereas competing models, notably Gaussian models, failed.
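As a rough illustration of the Gaussian hypothesis mentioned above (the baseline the thesis argues against, not the proposed ordinal model), each action's value could be summarized from a few samples; the reward numbers below are made up for the example.

import numpy as np

# Hypothetical rewards observed for two actions (illustrative values only).
samples = {"A": [0.62, 0.55, 0.71], "B": [0.40, 0.48]}
# Under the Gaussian hypothesis, a few samples give the mean and standard
# deviation that summarize each action's reward distribution.
estimates = {a: (np.mean(r), np.std(r, ddof=1)) for a, r in samples.items()}
# Subsequent choices can then rely on these summary statistics,
# e.g. picking the action with the highest estimated mean.
best = max(estimates, key=lambda a: estimates[a][0])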
Our results extend the implementation of Bayesian algorithms to continuous rewards, which are frequent in everyday environments. Our proposed model establishes which rewards are “good” and desirable in the current context. Additionally, it selects each action according to the probability that it is better than the others, rather than following the actions’ expected values. Lastly, our model meets evolutionary constraints by adapting quickly while performing well in many different settings, including those in which the assumptions of the generative model are not verified.
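A minimal sketch of what “selecting according to the probability of being better than the others” could look like, assuming (purely for illustration) that each action's learned reward distribution is summarized by a mean and standard deviation:

import numpy as np

rng = np.random.default_rng(0)
# Assumed (mean, sd) summaries of each action's learned reward distribution.
posteriors = {"A": (0.60, 0.20), "B": (0.50, 0.05)}
# Draw hypothetical values for every action and count how often each one comes
# out on top: that frequency approximates P(action is the best).
draws = np.vstack([rng.normal(m, s, 10_000) for m, s in posteriors.values()])
p_best = (draws.argmax(axis=0) == np.arange(len(posteriors))[:, None]).mean(axis=1)
print(dict(zip(posteriors, p_best)))
# Choices then follow these probabilities instead of the raw expected values.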


The 'pot de thèse' will take place in the Espace Curie at around 5:30pm


------
Message forwarded, virus-free, by the cognitive science information relay (RISC)
http://www.risc.cnrs.fr. For security reasons, this list does not transmit attachments.

This list is moderated. To send a message to the list, write to: Pourinfos [ at ] risc.cnrs.fr

You have the right to access, modify, rectify, and remove your details from the mailing list. To exercise it, write to: Pourinfos [ at ] risc.cnrs.fr