To the Editor: Action selection is the process by which an agent determines what to do next.1 In goal-directed behavior, the choice of action is biased by the predicted outcome. The difference between the predicted and the actual outcome (the error signal) can be used to optimize behavior.2 Obsessive-compulsive disorder (OCD) is an anxiety disorder that is related to inappropriate behavioral optimization and is characterized by repetitive, interfering thoughts and compulsive behaviors. OCD patients tend to feel that something is wrong even when they perform correctly. This feeling produces severe anxiety, which leads to recurrent behaviors aimed at decreasing the emotional pressure.3 The “hyperactive error-monitoring” hypothesis of OCD suggests that patients receive faulty error signals when they do not reach their goals, urging them to repeat their compulsive behaviors.4
In machine learning, reinforcement learning (RL) theory studies how artificial systems or agents can learn to predict the outcomes of their behaviors and optimize those behaviors in an environment so as to maximize some notion of cumulative reward.5 In other words, RL is learning how to map situations, or states, to actions in order to maximize reward or minimize punishment. Two components of an RL model are the “policy function” and the “model of the environment.” The policy function maps the agent's states to the best actions, and the model is anything that the agent can use to predict the environment's responses to its actions.6 We hypothesize that the heightened error signals produced in the brains of OCD patients contribute to an inappropriate mapping from the state of the environment to an action (inappropriate action selection). In RL terms, we suggest that OCD patients suffer from an improper policy function. Hence, what these patients need from therapy is a suitable way to improve their policy.
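For readers less familiar with RL vocabulary, the following minimal Python sketch may help to fix the terms "error signal" and "policy function." It is a toy illustration only and is not a model of OCD: the state label `s0`, the action labels, the rewards, and the learning rate are all hypothetical values chosen for the example.

```python
# Toy illustration only: states, actions, rewards and the learning rate
# are hypothetical and not taken from any clinical data.
ALPHA = 0.5  # learning rate (assumed)


def prediction_error(predicted, actual):
    """Error signal: difference between the actual and the predicted outcome."""
    return actual - predicted


def update_value(q, state, action, reward, alpha=ALPHA):
    """Move the predicted outcome of (state, action) toward the observed reward."""
    key = (state, action)
    old = q.get(key, 0.0)
    delta = prediction_error(old, reward)
    q[key] = old + alpha * delta
    return delta


def policy(q, state, actions):
    """Policy function: map a state to the action with the highest predicted outcome."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))


q_values = {}
actions = ["act_a", "act_b"]

# Outcomes observed in a single hypothetical state "s0".
for action, reward in [("act_a", 1.0), ("act_b", 0.0), ("act_a", 1.0)]:
    update_value(q_values, "s0", action, reward)

best_action = policy(q_values, "s0", actions)
```

In this sketch the error signal shrinks as the prediction approaches the observed outcome, and the policy simply selects the action with the highest predicted outcome; an error signal that did not shrink in this way would keep changing the stored predictions.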
One way to improve a policy in RL consists of two steps: 1) learning a model of “how the environment works” and 2) choosing “the best action” given the current knowledge of the environment. Traditionally, cognitive-behavioral therapy (CBT) has been used as the most effective type of psychotherapy for OCD. The patient is repeatedly exposed to a situation that triggers the obsessive thoughts and gradually learns to cope with the anxiety and to resist the urge to perform the compulsion.
Here, we propose a therapy that combines CBT and RL. Like the RL method described above, it consists of two steps; unlike CBT, it does not require exposure to real situations. In the first step, the patient, with the help of the therapist, learns a model of the environment; that is, the patient learns how the environment will respond to his or her actions. Such a model predicts the next state of the environment and can therefore be used to simulate the environment and produce simulated experience. In the second step, the patient no longer needs real experience: using the learned model, he or she can mentally enter a simulated situation that triggers the obsessive thoughts, predict the environmental response, and then choose the best action mentally. The method is thus a form of mental practice that is meant to reduce the anxiety the patient encounters in the real situation. We think that behavioral studies in several groups of OCD patients, comparing the results of our method with those of traditional methods, would be a good starting point for testing our hypothesis.
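The two RL steps referred to above (learning a model, then choosing actions on the basis of simulated experience) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a description of the proposed therapy itself: the states `s0`, `s1`, and `end`, the actions `a0` and `a1`, the rewards, and the parameters `alpha` and `gamma` are hypothetical, and the planning step uses a Q-learning-style update applied to transitions replayed from the learned model.

```python
def learn_model(experience):
    """Step 1: record how the environment responded to each (state, action)."""
    model = {}
    for state, action, reward, next_state in experience:
        model[(state, action)] = (reward, next_state)
    return model


def plan(model, actions, sweeps=50, alpha=0.5, gamma=0.9):
    """Step 2: improve action values using only transitions simulated by the model."""
    q = {}
    for _ in range(sweeps):
        for (state, action), (reward, next_state) in model.items():
            # Predicted value of the best action in the simulated next state.
            best_next = max(q.get((next_state, a), 0.0) for a in actions)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q


def policy(q, state, actions):
    """Choose the action with the highest planned value in the given state."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))


actions = ["a0", "a1"]

# Hypothetical experience tuples: (state, action, reward, next_state).
experience = [
    ("s0", "a0", 0.0, "s1"),   # no immediate reward, leads to s1
    ("s0", "a1", 0.2, "end"),  # small immediate reward, then the episode ends
    ("s1", "a0", 1.0, "end"),  # larger reward, available only from s1
    ("s1", "a1", 0.0, "end"),
]

model = learn_model(experience)
q_values = plan(model, actions)
chosen_in_s0 = policy(q_values, "s0", actions)
```

In this toy environment the planned policy prefers `a0` in `s0`, because the simulated consequences of that action lead to the larger later reward, even though `a1` offers a small immediate reward. No further interaction with the environment is needed once the model has been learned, which is the property the proposal relies on.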