Variable Used in Robot Learning Found in Rat's Brain

Mar 2 2015

A key part of the brain involved with decision making, the striatum, appears to operate hierarchically - much like a traditional corporation with executives, middle managers and employees, according to researchers at the Okinawa Institute of Science and Technology (OIST) Graduate University in Japan.

Neurons in the dorsolateral, dorsomedial and ventral striatum were activated during different phases of the task. The vertical axes show numbered neurons, and the activity of each neuron is indicated by the yellow and red colors. Most neurons in the ventral striatum were activated before the task started (phase 1). Credit: OIST

The striatum is part of the basal ganglia, the inner core of the brain that processes decisions and movements. Neuroscientists have thought the three regions of the striatum - ventral, dorsomedial and dorsolateral - have very distinct roles in motivation, adaptive decisions and routine actions, respectively.

However, OIST researchers found these parts do not operate in isolation, but work together in a coordinated hierarchy - like a traditional company in which executives make decisions and delegate to middle managers, and employees carry out specific tasks.

"The three parts have not been investigated simultaneously in the same task before," said Dr. Makoto Ito, a researcher in OIST's Neural Computation Unit and the paper's lead author. "We found the different parts work for the same behaviors, but in different roles."

Their findings were published online on February 24 in The Journal of Neuroscience.

To observe what each part does, the researchers implanted tiny electrodes in rats' brains. The electrodes measured how frequently neurons in each region fired during a task in which rats picked between two holes based on the probability of getting a sugar pellet reward. During fixed trials, the reward probability was held at a different rate for each of the two holes, so the rats' responses would become habitual over several weeks. During free-choice trials, the probability of reward jumped around, requiring the rats to adapt and evaluate their options more carefully.

The researchers found that while the three striatum regions have distinct roles, they work together across the different phases of a trial.

"They do not work for separate behaviors," said Prof. Kenji Doya, who heads OIST's Neural Computation Unit and is a co-author of the paper. "It's probably better to understand these different parts from a hierarchical control viewpoint."

The ventral striatum (VS) was most active early on, when the rat decided whether or not to participate in the activity. The dorsomedial striatum (DMS) changed firing levels as the rat evaluated the expected reward for each option while deciding whether to turn left or right. The dorsolateral striatum (DLS) fired in short bursts at various times throughout the task, suggesting involvement in the control of fine motor movements.

This is akin to a company's president deciding to make a new product, middle managers evaluating different design and sales options, and employees building specific parts.

Neuroscientists have long thought there are separate circuits for routine actions and for actions in continuously changing environments. If that were true, the DLS would be more active when the reward probability is fixed, while the DMS would be more active in free-choice tasks that require the rat to learn and adapt. To the researchers' surprise, there was little difference in DMS and DLS firing between fixed and free-choice tasks in this study.

That was not the only unexpected result. OIST's Neural Computation Unit works on adaptive robots that learn how to behave autonomously based on reward feedback. The core component of the robots' algorithm is the "action value," which keeps track of the probability of a positive outcome for each action.

"The same variable we use for robot learning was also found in the rat's brain," Doya said. "This is quite a striking observation."

This strongly suggests rats analyzed the potential benefit of choosing the left or right hole, and that analysis was constantly updated after each trial - the same way the robot algorithm works.
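To make the idea of an action value concrete, here is a minimal Python sketch of a standard reinforcement-learning update applied to a two-hole choice. It is an illustration only, not the model used in the OIST study or in the unit's robots: the function names, the learning rate, the exploration rate and the 0.8/0.2 reward probabilities are all assumptions chosen for the example.

```python
import random


def update_action_value(q, reward, learning_rate=0.1):
    """Move the estimate q a fraction of the way toward the observed reward."""
    return q + learning_rate * (reward - q)


def choose(q_values, epsilon, rng):
    """Epsilon-greedy choice: usually pick the higher-valued hole, sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


def run_trials(reward_prob, n_trials=2000, learning_rate=0.1, epsilon=0.1, seed=0):
    """Simulate repeated choices and return the learned action values.

    reward_prob maps each hole to a hypothetical probability of a pellet.
    """
    rng = random.Random(seed)
    q_values = {"left": 0.0, "right": 0.0}
    for _ in range(n_trials):
        hole = choose(q_values, epsilon, rng)
        # Reward is 1.0 with the hole's probability, otherwise 0.0.
        reward = 1.0 if rng.random() < reward_prob[hole] else 0.0
        # Only the chosen hole's value is updated after each trial.
        q_values[hole] = update_action_value(q_values[hole], reward, learning_rate)
    return q_values


if __name__ == "__main__":
    # Illustrative probabilities, not taken from the study.
    print(run_trials({"left": 0.8, "right": 0.2}))
```

Because rewards here are 0 or 1, each action value drifts toward the estimated probability of reward for that hole, and it is revised after every trial in which the hole is chosen.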

Source:

http://www.oist.jp/