Revolutionizing AI Training with Real-Time Human Feedback

Scientists from Duke University and the Army Research Laboratory have developed a platform called GUIDE, designed to enable artificial intelligence (AI) to learn complex tasks in a manner more similar to humans. This AI framework will be presented at the Conference on Neural Information Processing Systems (NeurIPS 2024), scheduled for December 9–15 in Vancouver, Canada.

Researchers have developed a platform that allows AI to learn from constant, nuanced human feedback rather than large datasets. Image Credit: Duke University.

In a first driving lesson, the instructor typically sits beside the student, providing immediate feedback on every turn, stop, and adjustment. If the instructor is a parent, they might even grab the wheel and shout, “Brake!” These real-time corrections gradually build the student’s experience and intuition, ultimately transforming them into a confident, independent driver.

Despite advancements in AI that enable self-driving cars, AI training methods remain vastly different from this hands-on approach. Instead of relying on nuanced, real-time guidance, AI learns primarily through large datasets and extensive simulations, regardless of the application.

“It remains a challenge for AI to handle tasks that require fast decision-making based on limited learning information. Existing training methods are often constrained by their reliance on extensive pre-existing datasets while also struggling with the limited adaptability of traditional feedback approaches. We aimed to bridge this gap by incorporating real-time continuous human feedback.”

Boyuan Chen, Professor, Mechanical Engineering and Materials Science, Duke University

Chen is also a Professor of Electrical and Computer Engineering and of Computer Science at Duke, where he directs the Duke General Robotics Lab.

GUIDE allows humans to monitor an AI's actions in real time and provide continuous, detailed feedback. Like a skilled driving coach who offers precise guidance rather than simple commands, GUIDE promotes incremental improvements and deeper learning.

In an initial study, GUIDE was used to help AI learn to play hide-and-seek. The game involved two computer-controlled players: a red seeker and a green hider. Only the red player’s AI actively improved through training.

The game takes place on a square field with a C-shaped barrier at the center. Much of the field remains hidden until the red seeker explores new areas, uncovering their contents.

As the red player pursued the green player, a human trainer guided its search strategy. Unlike traditional training methods, which rely on discrete inputs like "good," "bad," or "neutral," GUIDE uses a continuous gradient scale. Trainers hovered a mouse cursor over the scale to give real-time feedback, enabling more nuanced adjustments.
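The contrast between three-level and gradient feedback can be sketched in a few lines of Python. This is an illustration only, not GUIDE's actual interface: the function names, the vertical scale, the pixel coordinates, and the [-1, 1] output range are all assumptions made for the example.

```python
def discrete_feedback(label: str) -> float:
    """Three-level feedback of the kind the article contrasts GUIDE with."""
    levels = {"good": 1.0, "neutral": 0.0, "bad": -1.0}
    return levels[label]


def continuous_feedback(cursor_y: float, scale_top: float, scale_bottom: float) -> float:
    """Map a cursor position on a vertical scale to a value in [-1, 1].

    Assumptions (hypothetical, not from the study): screen coordinates grow
    downward, the top of the scale means most positive, and the bottom means
    most negative.
    """
    if scale_bottom <= scale_top:
        raise ValueError("scale_bottom must be greater than scale_top")
    # Keep the cursor inside the scale so the output stays in range.
    clamped = min(max(cursor_y, scale_top), scale_bottom)
    # 1.0 at the top of the scale, 0.0 at the bottom.
    fraction = (scale_bottom - clamped) / (scale_bottom - scale_top)
    return 2.0 * fraction - 1.0


if __name__ == "__main__":
    # A cursor one quarter of the way down a scale spanning y=100 to y=500.
    print(continuous_feedback(200, 100, 500))
```

A discrete scheme can express only three values, while the continuous mapping lets a trainer signal "slightly better" or "much worse" with a small or large cursor movement.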

The study involved 50 adult participants without prior training or expertise, making it one of the largest of its kind. In just 10 minutes of feedback, the AI’s performance significantly improved. GUIDE achieved up to a 30% increase in success rates compared to leading human-guided reinforcement learning methods.

“This strong quantitative and qualitative evidence highlights the effectiveness of our approach. It shows how GUIDE can boost adaptability, helping AI to independently navigate and respond to complex, dynamic environments.”

Lingyu Zhang, Study Lead Author, Duke University

Lingyu Zhang is a first-year Ph.D. student in Chen’s lab.

The researchers also demonstrated that human trainers are needed only for a short period. As participants provided feedback, the team developed a simulated AI trainer based on their insights in specific scenarios and moments. This enables the seeker AI to continue learning after the human trainer stops providing input. While it may seem counterintuitive to train an AI "coach" that is less skilled than the AI it is training, Chen explains that this approach mirrors a common human tendency.
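One way to picture a simulated trainer is as a model that predicts the feedback a human would have given in a similar situation. The article does not say how the team built theirs, so the sketch below swaps in a plain k-nearest-neighbor average as a stand-in. The class name, the numeric state features, and the choice of k are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


class SimulatedTrainer:
    """Predicts feedback by averaging the human feedback recorded for the
    k most similar past states (a stand-in technique, not the authors')."""

    def __init__(self, k: int = 3) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.records: List[Tuple[Tuple[float, ...], float]] = []

    def record(self, state: Sequence[float], feedback: float) -> None:
        """Store one (state, human feedback) pair from a live session."""
        self.records.append((tuple(state), float(feedback)))

    def predict(self, state: Sequence[float]) -> float:
        """Estimate the feedback a human would give for a new state."""
        if not self.records:
            raise ValueError("no human feedback has been recorded yet")
        query = tuple(state)
        # Sort recorded states by Euclidean distance to the query state.
        nearest = sorted(self.records, key=lambda r: math.dist(r[0], query))
        nearest = nearest[: self.k]
        return sum(feedback for _, feedback in nearest) / len(nearest)


if __name__ == "__main__":
    trainer = SimulatedTrainer(k=2)
    # Hypothetical features: (distance to hider, fraction of field explored).
    trainer.record((0.9, 0.1), -0.5)
    trainer.record((0.5, 0.4), 0.25)
    trainer.record((0.1, 0.8), 0.75)
    print(trainer.predict((0.2, 0.7)))
```

Once enough pairs are recorded, `predict` can stand in for the human, so the seeker keeps receiving feedback after the live session ends.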

“While it’s very difficult for someone to master a certain task, it’s not that hard for someone to judge whether or not they’re getting better at it. Lots of coaches can guide players to championships without having been a champion themselves.”

Boyuan Chen, Professor, Mechanical Engineering and Materials Science, Duke University

Another interesting aspect of GUIDE is its potential to explore individual differences among human trainers. Cognitive tests given to all 50 participants revealed that certain abilities, such as spatial reasoning and quick decision-making, influenced how effectively a person could guide the AI. These findings suggest that such abilities might be strengthened through targeted training, and that other factors affecting how well people train AI remain to be identified.

These insights highlight the potential for developing more adaptive training frameworks that focus not only on teaching AI but also on enhancing human capabilities, leading to more effective human-AI collaboration.

By addressing these challenges, researchers aim to create a future where AI learns more intuitively, bridging the gap between human intuition and machine learning. This could enable AI to function more autonomously in environments with limited information.

Chen says, “As AI technologies become more prevalent, it’s crucial to design systems that are intuitive and accessible for everyday users. GUIDE paves the way for smarter, more responsive AI capable of functioning autonomously in dynamic and unpredictable environments.”

The team envisions future research that incorporates diverse communication signals, such as language, facial expressions, hand gestures, and more, to develop a more comprehensive and intuitive framework for AI to learn from human interactions. Their work aligns with the lab’s mission to build next-level intelligent systems that collaborate with humans to address tasks that neither AI nor humans could solve independently.

This research is supported in part by the Army Research Laboratory.

GUIDE: Real-Time Human-Shaped Agents

Video Credit: Duke University
