Jun 25 2019
In a computer game, a car is instructed to maximize its speed as it races down a track. It puts the pedal to the metal, then spins in a tight little circle. Nothing in its instructions told the car to drive straight, so it improvised.
The behavior is funny in a computer game, but far less so in real life. It is one of the examples that inspired researchers at Stanford University to develop a better way to set goals for autonomous systems.
Dorsa Sadigh, assistant professor of computer science and of electrical engineering, and her laboratory have combined two different ways of setting goals for robots into a single process, which performed better than either of its parts alone in both simulations and real-world experiments. The work was presented at the Robotics: Science and Systems conference on June 24, 2019.
In the future, I fully expect there to be more autonomous systems in the world and they are going to need some concept of what is good and what is bad. It’s crucial, if we want to deploy these autonomous systems in the future, that we get that right.
Andy Palan, Study Co-Lead Author and Graduate Student, Department of Computer Science, Stanford University
The new system generates the instructions for robots, known as reward functions, by combining two kinds of human input: demonstrations, in which people show the robot what to do, and user preference surveys, in which people answer questions about how they want the robot to behave.
“Demonstrations are informative but they can be noisy. On the other hand, preferences provide, at most, one bit of information, but are way more accurate,” stated Sadigh. “Our goal is to get the best of both worlds, and combine data coming from both of these sources more intelligently to better learn about humans’ preferred reward function.”
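To make that concrete: in this line of work a reward function typically scores a whole trajectory as a weighted sum of features, and learning the reward means learning those weights from human input. The minimal Python sketch below illustrates that setup; the feature choices, names, and numbers are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def trajectory_features(speeds, heading_errors, wall_margins):
    """Summarize one trajectory as a feature vector.
    Hypothetical features for the racing-car example: how fast the car went,
    how far its heading drifted from the track direction, and how much
    clearance it kept from the walls."""
    return np.array([
        np.mean(speeds),
        -np.mean(heading_errors),
        np.mean(wall_margins),
    ])

def reward(weights, features):
    """Linear reward: the score is weights . features.
    Learning the reward function means learning these weights from people."""
    return float(weights @ features)

# A weight vector that only rewards speed reproduces the spinning-car failure:
# flooring the accelerator in a tight circle scores as well as driving straight.
speed_only = np.array([1.0, 0.0, 0.0])
```

Under this framing, demonstrations and survey answers are simply two different sources of evidence about which weights a person actually wants.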
Demonstrations and surveys
In earlier work, Sadigh's group had focused on preference surveys alone, which ask people to compare two scenarios, such as two trajectories for an autonomous car. That approach works, but it could take up to three minutes to generate the next question, which is too slow for creating instructions for complex systems like a car.
To speed that up, the researchers later developed a way to generate many questions at once, which a single person can answer in quick succession or which can be distributed among several people. This speeds up the process by a factor of 15 to 50 compared with producing the questions one at a time.
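One simple way to pick such a batch, sketched below under assumptions (the ambiguity heuristic and the function names are illustrative, not necessarily the criterion used in the Stanford work), is to keep a set of weight vectors sampled from the current belief about the reward and ask about the trajectory pairs those samples disagree on most.

```python
import numpy as np
from itertools import combinations

def pick_query_batch(weight_samples, candidate_features, batch_size=10):
    """Choose a batch of pairwise questions whose answers are most uncertain
    under the current belief over reward weights.

    weight_samples:     (n_samples, d) weight vectors sampled from the belief.
    candidate_features: (n_trajectories, d) feature vector per candidate trajectory.
    """
    scores = weight_samples @ candidate_features.T        # reward of each trajectory per sample
    queries = []
    for i, j in combinations(range(candidate_features.shape[0]), 2):
        prefers_i = np.mean(scores[:, i] > scores[:, j])
        ambiguity = 1.0 - abs(2.0 * prefers_i - 1.0)       # 1.0 when samples split 50/50
        queries.append((ambiguity, i, j))
    queries.sort(reverse=True)
    return [(i, j) for _, i, j in queries[:batch_size]]
```

Because the batch is computed in one pass rather than waiting for each answer before generating the next question, the questions can be answered back to back or handed out to several people at once.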
The new combined system begins with a person demonstrating a behavior to the robot. A demonstration carries a lot of information, but robots often struggle to work out which parts of it actually matter. And people do not always want a robot to behave exactly like the human who trained it.
We can’t always give demonstrations, and even when we can, we often can’t rely on the information people give. For example, previous studies have shown people want autonomous cars to drive less aggressively than they do themselves.
Erdem Biyik, Graduate Student, Department of Electrical Engineering, Stanford University
Biyik headed the study that developed the multiple-question surveys.
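Returning to the demonstrations: a standard way to handle their noisiness, and a reasonable guess at the kind of model involved here (the exact formulation below is an assumption, not quoted from the paper), is to treat the demonstrator as noisily rational, so that trajectories with higher reward are exponentially more likely to be shown.

```python
import numpy as np

def demo_log_likelihood(weights, demo_features, alternative_features, beta=1.0):
    """Log-likelihood of a demonstration under a noisily-rational (Boltzmann) model:
    the demonstrated trajectory is more likely the higher its reward, relative to a
    set of alternative trajectories. beta controls how much the demonstrator is
    trusted; a low beta says the demonstration is informative but noisy."""
    demo_reward = demo_features @ weights
    all_rewards = np.append(alternative_features @ weights, demo_reward)
    return beta * demo_reward - np.log(np.sum(np.exp(beta * all_rewards)))
```

A likelihood like this lets a demonstration shape the initial belief over reward weights without being taken literally, which is exactly the property needed when people drive differently from how they want their car to drive.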
That is where the surveys come in, giving the robot a way to ask, for example, whether the user wants its arm to move up toward the ceiling or down toward the ground. For this study the researchers used the slower single-question technique, but they plan to incorporate the multiple-question surveys in later work.
In tests, the researchers found that combining demonstrations and surveys was faster than specifying preferences alone, and that, compared with demonstrations alone, about 80 percent of people preferred how the robot behaved when it was trained with the combined system.
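To make the survey step concrete, here is a small sketch of how a single answer can update a sample-based belief over reward weights, using a logistic choice model; the resampling scheme and names are illustrative assumptions rather than the paper's exact inference procedure.

```python
import numpy as np

def update_with_preference(weight_samples, features_a, features_b, chose_a):
    """Re-weight belief samples after the user answers one survey question
    ('do you prefer trajectory A or B?'). Each sampled weight vector is scored
    by how well it explains the answer, then the samples are resampled."""
    diff = (features_a - features_b) @ weight_samples.T      # reward(A) - reward(B) per sample
    p_choose_a = 1.0 / (1.0 + np.exp(-diff))
    likelihood = p_choose_a if chose_a else 1.0 - p_choose_a
    probs = likelihood / likelihood.sum()
    idx = np.random.choice(len(weight_samples), size=len(weight_samples), p=probs)
    return weight_samples[idx]                               # importance resampling
```

Each answer contributes roughly one bit of information, but it is a reliable bit, which is why the surveys complement the richer and noisier demonstrations.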
This is a step in better understanding what people want or expect from a robot. Our work is making it easier and more efficient for humans to interact and teach robots, and I am excited about taking this work further, particularly in studying how robots and humans might learn from each other.
Dorsa Sadigh, Assistant Professor, Departments of Computer Science and Electrical Engineering, Stanford University
Better, faster, smarter
People who used the combined technique sometimes had trouble understanding what the system was getting at with its questions, which occasionally asked them to choose between two scenarios that looked identical or seemed unrelated to the task, a common problem in preference-based learning. The team hopes to address this shortcoming with simpler surveys that also work faster.
Looking to the future, it’s not 100 percent obvious to me what the right way to make reward functions is, but realistically you’re going to have some sort of combination that can address complex situations with human input. Being able to design reward functions for autonomous systems is a big, important problem that hasn’t received quite the attention in academia as it deserves.
Andy Palan, Study Co-Lead Author and Graduate Student, Department of Computer Science, Stanford University
The researchers are also interested in a variation of the system that would let people create reward functions for different situations at the same time. For instance, a person might want their car to drive more aggressively in light traffic and more conservatively in heavy traffic.
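One simple way a variation like that could look is to keep a separate set of reward weights per driving context; the tiny sketch below is purely illustrative (the contexts, feature order, and numbers are assumptions, not part of the published system).

```python
import numpy as np

# Hypothetical per-context weights over the same features
# (say: speed, caution, following distance).
context_weights = {
    "light_traffic": np.array([1.0, 0.4, 0.2]),   # favor getting there quickly
    "heavy_traffic": np.array([0.3, 1.0, 0.8]),   # favor driving conservatively
}

def contextual_reward(context, features):
    """Score a trajectory with the weights learned for the current situation."""
    return float(context_weights[context] @ features)
```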
Graduate student Nicholas C. Landolfi and undergraduate Gleb Shevchuk, both from Stanford University, are co-authors of the RSS 2019 paper.
The Toyota Research Institute and the Future of Life Institute funded the study.