Nov 13 2018
To a robot, instructions from humans can be extremely complex. Even a fairly basic command like “Go to the house, passing the tree on your right” may require numerous trials to learn, and if the command changes to “Go to the house, passing the tree on your left,” the robot must relearn the task from scratch.
Quadcopter Following Navigation Instructions
A new paper by Cornell researchers aims to solve this challenge by dividing the robot’s task into two distinct stages: first interpreting the language in the command and planning a trajectory, and then executing that trajectory. In a simulation, a drone using this method learned to navigate a specific environment faster and more accurately than with existing approaches.
“Planning where to go is a much simpler problem than actually going there, because it avoids the agent having to actually act in the environment,” said Valts Blukis, a doctoral student in computer science at Cornell Tech and first author of the paper, “Mapping Navigation Instructions to Continuous Control Actions With Position-Visitation Prediction,” presented at the Conference on Robot Learning, held Oct. 29-31 in Zurich, Switzerland. “Once it can predict a path, it is then relatively easy to follow it, without having to care about the original instruction.”
The paper was co-written with Cornell computer science doctoral student Dipendra Misra; Ross Knepper, assistant professor of computer science; and senior author Yoav Artzi, assistant professor of computer science at Cornell Tech.
Most current robots follow commands issued through complicated user interfaces or controllers such as joysticks. Operating them requires skill or training, limiting robots’ use to specialized settings such as factories. Robots that could interpret natural human language could be usable by non-experts and possibly capable of a broader range of tasks.
“Language is powerful and lets us express many ideas and constructions,” Blukis said. “With language, we could envision telling our robots exactly what we want them to do.”
But the same intricacy and richness that make language so effective also make it tough for robots to understand. A command such as “Go toward the blue fence, passing the anvil and tree on the right,” for instance, requires the computer to comprehend many concepts and behaviors.
Meanwhile, many recent robots learn by experience: through hundreds of thousands of attempts, they refine their behavior until they have learned to accomplish each task successfully. That method is not feasible when the goal is a robot that responds to natural human language rather than a list of pre-learned commands.
In the researchers’ new model, the robot first interprets the language to predict the positions it is likely to visit while carrying out the task and to identify the correct destination. It then moves through its most likely positions to reach that destination.
Those two stages were trained independently using deep neural networks, a type of machine-learning architecture in which computers learn representations from data. The model accumulates information as it observes, allowing it to refine its predictions over time.
Once it has a prediction for where it should be going—basically by highlighting the areas that are very likely to be visited—it generates actions to go there. This way, we can iterate through the instruction data much faster, and quickly teach the robot to correctly plan where to go from instructions, without the robot having to act out in the environment and make mistakes at all.
Valts Blukis, Doctoral Student in Computer Science, Cornell Tech.
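The two stages described above can be sketched in miniature. The toy code below is a hypothetical illustration, not the paper’s actual neural architecture: stage 1 is stood in for by a hard-coded grid of visitation probabilities (in the real system, a learned model produces this prediction from the instruction and the drone’s observations), and stage 2 is a simple greedy policy that steps toward the neighboring cell with the highest predicted visitation probability.

```python
import numpy as np

def predict_visitation_distribution():
    """Stage 1 (toy stand-in): probability that each map cell will be visited.
    Here we pretend the instruction implies a path along the top row toward
    cell (0, 4); a real system would learn this mapping from data."""
    probs = np.zeros((5, 5))
    probs[0, :] = [0.1, 0.15, 0.2, 0.25, 0.3]
    return probs / probs.sum()  # normalize to a probability distribution

def follow_distribution(probs, start=(0, 0), steps=4):
    """Stage 2: a simple control policy that greedily moves to the adjacent
    cell with the highest predicted visitation probability."""
    pos = start
    path = [pos]
    for _ in range(steps):
        r, c = pos
        neighbors = [(r + dr, c + dc)
                     for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                     if 0 <= r + dr < probs.shape[0]
                     and 0 <= c + dc < probs.shape[1]]
        pos = max(neighbors, key=lambda rc: probs[rc])
        path.append(pos)
    return path

probs = predict_visitation_distribution()
path = follow_distribution(probs)
print(path)  # the greedy policy traces the high-probability top row
```

The point of the decomposition is visible even in this sketch: once the visitation distribution exists, following it requires no further reference to the original instruction, which is why the planning stage can be trained quickly on instruction data without the robot acting in the environment.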
The researchers tested their model using almost 28,000 crowdsourced commands and a quadcopter simulator that approximates drone flight, including a realistic controller that requires rapid decisions in response to changing circumstances. They found their simulated drone was nearly twice as accurate as drones using two other recently proposed techniques, and Blukis said they trained their model within days rather than weeks.
Although the model exists only in simulation so far, similar methods could eventually be applied to delivery robots or even self-driving vehicles. The approach would be most useful in large, unfamiliar or complex environments, where training a robot on more precise, targeted tasks would be impractical, he said.
The study received support from the Schmidt Science Fellows program, the National Science Foundation, the Air Force Office of Scientific Research and Amazon.