Modeling mental disorders with reinforcement learning

Project on linking dopamine-associated mental disorders with expectile-based distributional reinforcement learning (DistRL)

TL;DR

  • A class project from my time at ETH Zürich on linking dopamine-associated mental disorders, such as drug and gambling addiction, to expectile-based distributional reinforcement learning
  • Paper ***

Ever wondered how the brain learns from and interacts with its external environment, and how people with different neurological conditions differ in their responses? With advances in machine learning, especially Reinforcement Learning (RL), the design of computational agents has been inspired by how animals behave through interaction with their environment, in this case reward processing. Since we focus on dopamine, a neurotransmitter that plays a major role in reward-based learning, we used simulations of well-known tasks to point out the similarities between the RL system in the brain and that of a computational agent, or AI.

Beyond the similarities, the models and simulation results were used to explore possible explanations for the mechanisms behind dopamine-associated mental disorders: drug addiction, gambling addiction, and Parkinson's disease.

Before getting into the distributional reinforcement learning (DistRL) paradigm, one must be comfortable with how traditional RL works and how its processes resemble those of the neural reward system. In traditional RL, the temporal difference (TD) learning algorithm is widely used to estimate the value of each stimulus, which can yield positive or negative reward, with the goal of maximizing long-term reward. From there, one can see the resemblance to actor-critic methods, which have two main components. The actor executes actions intended to maximize reward, while the critic tells the actor whether an action was good and whether it should be adjusted. This results in a loop of acting and optimizing: the agent learns its decision policy directly alongside its value function, improving both together.
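
To make the TD error and the actor-critic loop concrete, here is a minimal sketch in Python on a toy two-armed bandit. This is not the project's code: the reward probabilities, learning rates, and variable names are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: two stimuli (actions), each paying a stochastic reward.
# The critic learns a value per action; the actor keeps preferences
# that are nudged by the TD error (the "reward prediction error").
n_actions = 2
reward_means = np.array([0.2, 0.8])   # hypothetical reward probabilities

values = np.zeros(n_actions)          # critic: value estimates
preferences = np.zeros(n_actions)     # actor: action preferences
alpha_critic, alpha_actor = 0.1, 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for trial in range(2000):
    probs = softmax(preferences)
    a = rng.choice(n_actions, p=probs)
    r = float(rng.random() < reward_means[a])   # Bernoulli reward

    # TD error (reward prediction error) for a one-step (bandit) task
    delta = r - values[a]

    # Critic update: move the value estimate toward the observed reward
    values[a] += alpha_critic * delta

    # Actor update: make rewarding actions more likely, unrewarding ones less
    preferences[a] += alpha_actor * delta

print("learned values:", values)             # approach reward_means
print("action probabilities:", softmax(preferences))
```

The key point is that one and the same error signal, delta, drives both the critic's value update and the actor's policy update, which is the loop described above.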

From there, people have attempted to extend traditional RL's components, and thus its potential. Normally one estimates a single expected value for a given stimulus, but with a class of summary statistics that generalizes the expected value, called expectiles, we can instead estimate a set of statistics of the reward distribution. Expectiles can be viewed as asymmetric versions of the expected value, with varying degrees of pessimism (asymmetry parameter below 0.5) and optimism (asymmetry parameter above 0.5). Being able to recover the distribution then opens many doors for exploring how abnormal asymmetries could produce an aberrant reward-processing system in the computational agent; a small sketch of such an expectile update follows below.
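
The sketch below shows how a population of value channels with different asymmetry parameters learns a set of expectiles of one reward distribution. Again, this is an illustrative assumption rather than the project's implementation: the bimodal reward distribution, the asymmetry values in `taus`, and the learning rate are made up for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Expectile-style value learners: each channel i has an asymmetry
# parameter tau_i. Positive prediction errors are scaled by tau,
# negative ones by (1 - tau), so tau > 0.5 behaves "optimistically"
# and tau < 0.5 "pessimistically".
taus = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # illustrative asymmetries
values = np.zeros_like(taus)
alpha = 0.02

# Hypothetical bimodal reward distribution for a single stimulus
def sample_reward():
    return rng.normal(1.0, 0.2) if rng.random() < 0.3 else rng.normal(5.0, 0.5)

for trial in range(20000):
    r = sample_reward()
    delta = r - values                        # per-channel prediction errors
    weight = np.where(delta > 0, taus, 1.0 - taus)
    values += alpha * weight * delta          # asymmetric expectile update

print("taus:      ", taus)
print("expectiles:", np.round(values, 2))     # tau = 0.5 recovers the mean
```

The channel with tau = 0.5 converges to the ordinary expected value, while the others settle above or below it, and together the set of expectiles characterizes the shape of the reward distribution rather than just its mean.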

RL in brain ==> RPE, actor-critic, disorders

DistRL in brain ==> explain imputation and extension from traditional RL

Methods ==> Imputation experiment, IGT

Results ==> Pictures

Discussion and future directions