Reviewed by Lexie Corner, Oct 17, 2024
Researchers from the University of Virginia have developed an AI-driven intelligent video analyzer capable of identifying human actions in video footage with unprecedented accuracy and precision, according to a study published in IEEE Transactions on Pattern Analysis and Machine Intelligence.
What if a security camera could not only record footage but also understand the activity it captures, making a real-time distinction between normal activity and potentially harmful behavior? Researchers at the University of Virginia’s School of Engineering and Applied Science are working towards this future with their recent development of the Semantic and Motion-Aware Spatiotemporal Transformer Network (SMAST). This system could enhance public safety, surveillance, healthcare motion tracking, and autonomous vehicle navigation in complex environments.
This AI technology opens doors for real-time action detection in some of the most demanding environments. It is the kind of advancement that can help prevent accidents, improve diagnostics, and even save lives.
Scott T. Acton, Study Lead Researcher, Professor and Chair, Department of Electrical and Computer Engineering, University of Virginia
AI-Driven Innovation for Complex Video Analysis
How does SMAST work? At its core, it relies on artificial intelligence to identify and interpret complex human behaviors. The system is powered by two key elements.
First, a multi-feature selective attention model helps the AI filter out irrelevant details, focusing on the most important parts of a scene, like specific actions or objects. This enables more accurate event detection, such as recognizing when someone is throwing a ball rather than just moving their arm.
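To give a sense of what such a mechanism looks like in code, below is a minimal, illustrative sketch of selective attention over multiple feature streams (for example, appearance and motion). The module name, tensor shapes, and simple fusion step are assumptions made for illustration only; this is not the authors' published SMAST implementation.

```python
# Illustrative sketch: weight the informative parts of several feature streams
# and fuse them into a single descriptor. Shapes and design are assumptions,
# not the SMAST paper's actual architecture.
import torch
import torch.nn as nn

class SelectiveFeatureAttention(nn.Module):
    def __init__(self, feature_dim: int):
        super().__init__()
        # Learnable scoring head that rates how relevant each token is.
        self.score = nn.Linear(feature_dim, 1)

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (batch, num_streams, tokens, feature_dim)
        scores = self.score(streams)               # (batch, num_streams, tokens, 1)
        weights = torch.softmax(scores, dim=2)     # emphasize the most relevant tokens
        attended = (weights * streams).sum(dim=2)  # (batch, num_streams, feature_dim)
        return attended.mean(dim=1)                # fuse streams into one descriptor

# Example with random features standing in for a short video clip.
x = torch.randn(2, 3, 16, 256)  # 2 clips, 3 feature streams, 16 tokens, 256-dim features
pooled = SelectiveFeatureAttention(256)(x)
print(pooled.shape)             # torch.Size([2, 256])
```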
Second, a motion-aware 2D positional encoding algorithm allows the AI to track movement over time. For instance, in videos where people change positions frequently, this tool helps the AI remember and understand those movements and their relationships.
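The sketch below illustrates the general idea of a motion-aware 2D positional encoding: a standard sinusoidal encoding of each person's (x, y) position, augmented with their frame-to-frame displacement so the representation carries movement as well as location. The function names, shapes, and the displacement term are illustrative assumptions, not the exact formulation in the SMAST paper.

```python
# Illustrative sketch: encode where actors are and how they move between frames.
import torch

def sinusoidal_2d(coords: torch.Tensor, dim: int) -> torch.Tensor:
    # coords: (..., 2) normalized (x, y); returns a (..., dim) sinusoidal encoding.
    half = dim // 4
    freqs = 1.0 / (10000 ** (torch.arange(half, dtype=torch.float32) / half))
    angles = coords.unsqueeze(-1) * freqs                   # (..., 2, half)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)   # (..., 2, 2*half)
    return enc.flatten(-2)                                  # (..., dim)

def motion_aware_encoding(positions: torch.Tensor, dim: int = 64) -> torch.Tensor:
    # positions: (frames, actors, 2) box centers tracked over time.
    displacement = torch.zeros_like(positions)
    displacement[1:] = positions[1:] - positions[:-1]       # per-frame motion vectors
    return sinusoidal_2d(positions, dim) + sinusoidal_2d(displacement, dim)

traj = torch.rand(8, 4, 2)                # 8 frames, 4 tracked people, (x, y) centers
print(motion_aware_encoding(traj).shape)  # torch.Size([8, 4, 64])
```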
By combining these capabilities, SMAST can accurately identify complex movements in real time, making it highly effective in critical areas like autonomous driving, healthcare diagnostics, and surveillance.
SMAST represents a new approach to how machines recognize and understand human behavior. Current systems often struggle to capture the context of events in chaotic, continuous video footage. However, SMAST’s innovative design, powered by AI components that learn and adapt from data, allows it to track the dynamic interactions between people and objects with exceptional precision.
Setting New Standards in Action Detection Technology
This breakthrough enables the AI system to identify actions such as a runner crossing the street, a doctor performing a complex procedure, or a security threat emerging in a crowded area. SMAST has already surpassed leading solutions on key academic benchmarks, including AVA, UCF101-24, and EPIC-Kitchens, setting new standards for accuracy and efficiency.
The societal impact could be huge. We are excited to see how this AI technology might transform industries, making video-based systems more intelligent and capable of real-time understanding.
Matthew Korban, Postdoctoral Research Associate, University of Virginia
Matthew Korban, Peter Youngs, and Scott T. Acton from the University of Virginia are the study authors.
The study was funded by the National Science Foundation (NSF) under Grants 2000487 and 2322993.
Journal Reference:
Korban, M., et al. (2024). A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. doi.org/10.1109/TPAMI.2024.3377192