Researchers from Florida Atlantic University have carried out a first-of-its-kind study on computer vision-based recognition of American Sign Language alphabet gestures.
Sign language serves as a sophisticated means of communication for individuals who are deaf or hard-of-hearing, relying on hand movements, facial expressions, and body language to convey meaning. American Sign Language (ASL) exemplifies this complexity with its own unique grammar and syntax.
Sign language is not universal; instead, there are many distinct sign languages worldwide, each with its own structure, vocabulary, and rules. This diversity highlights the linguistic richness of sign languages across different cultures.
To improve communication accessibility for those who rely on sign language, researchers are exploring methods to convert hand gestures into text or spoken language in real time. A reliable, real-time system capable of accurately detecting and tracking ASL gestures would be pivotal in breaking down communication barriers and promoting more inclusive interactions.
Researchers at the College of Engineering and Computer Science at Florida Atlantic University (FAU) have made significant strides in this area. They developed a first-of-its-kind approach to recognizing ASL alphabet gestures using computer vision. The team created a custom dataset containing 29,820 static images of ASL hand gestures. Each image was annotated using MediaPipe, which provided detailed spatial information by marking 21 key landmarks on each hand.
These annotations were key to improving the precision of YOLOv8, the deep learning model the researchers trained, by enabling it to detect subtle differences between hand gestures.
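The annotation code itself is not included in the article, but extracting 21 hand landmarks from a static image with MediaPipe's Hands solution typically looks like the minimal sketch below; the file name and the output format are illustrative assumptions, not the authors' actual pipeline.

```python
# Sketch: extracting 21 hand landmarks from a static image with MediaPipe.
# The image path and output format are illustrative assumptions.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_landmarks(image_path):
    """Return a list of 21 (x, y, z) normalized landmark tuples, or None."""
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    # static_image_mode=True treats each input as an independent photo.
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1,
                        min_detection_confidence=0.5) as hands:
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        if not results.multi_hand_landmarks:
            return None  # no hand detected in this image
        hand = results.multi_hand_landmarks[0]
        return [(lm.x, lm.y, lm.z) for lm in hand.landmark]

# Example: landmarks for one ASL letter image (hypothetical file name).
print(extract_landmarks("asl_A_0001.jpg"))
```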
The study's results, published in the Elsevier journal Franklin Open, reveal that by leveraging this detailed hand pose information, the model achieved a more refined detection process, accurately capturing the complex structure of American Sign Language gestures. Combining MediaPipe for hand movement tracking with YOLOv8 for gesture detection resulted in a powerful system for recognizing American Sign Language alphabet gestures with high accuracy.
Combining MediaPipe and YOLOv8, along with fine-tuning hyperparameters for the best accuracy, represents a groundbreaking and innovative approach. This method hasn’t been explored in previous research, making it a new and promising direction for future advancements.
Bader Alsharif, Study First Author and PhD Candidate, Department of Electrical Engineering and Computer Science, Florida Atlantic University
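The article does not reproduce the training configuration, but transfer learning with YOLOv8 is normally driven through the Ultralytics Python API along the lines of this sketch; the dataset file and hyperparameter values shown are assumptions for illustration, not the values the team tuned.

```python
# Sketch: fine-tuning a pretrained YOLOv8 detector on a custom ASL dataset.
# Dataset path and hyperparameter values are illustrative, not the study's.
from ultralytics import YOLO

# Start from COCO-pretrained weights (transfer learning).
model = YOLO("yolov8n.pt")

# 'asl_alphabet.yaml' would list the train/val image folders and the class
# names (one per ASL letter); this file name is hypothetical.
model.train(
    data="asl_alphabet.yaml",
    epochs=100,
    imgsz=640,
    batch=16,
    lr0=0.01,   # initial learning rate, one of the tunable hyperparameters
)

# Evaluate on the validation split (reports precision, recall, mAP50, mAP50-95).
metrics = model.val()
print(metrics.box.map)      # mAP50-95
print(metrics.box.map50)    # mAP50
```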
The model achieved an impressive 98% accuracy, 98% recall (correct identification rate), and a 99% F1 score. It also recorded a mean average precision (mAP) of 98% and a stricter mAP50-95 score of 93%, underscoring its reliability and precision in gesture recognition.
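For readers less familiar with these metrics, the standard definitions (general formulas, not taken from the paper) are: precision is the share of detections that are correct, recall is the share of true gestures that are detected, and the F1 score is their harmonic mean,

\[
F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}},
\]

while mAP50-95 averages the mean average precision over intersection-over-union thresholds from 0.50 to 0.95, making it a stricter test of how tightly the predicted bounding boxes fit the ground truth.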
“Results from our research demonstrate our model’s ability to accurately detect and classify American Sign Language gestures with very few errors. Importantly, findings from this study emphasize not only the robustness of the system but also its potential to be used in practical, real-time applications to enable more intuitive human-computer interaction,” Alsharif added.
The integration of MediaPipe’s landmark annotations into the YOLOv8 training process enhanced both bounding box accuracy and gesture classification, allowing the model to capture even the smallest variations in hand poses. This two-step method of landmark tracking and object detection proved essential in achieving high accuracy and efficiency in real-world applications. The model’s robust performance across varying hand positions and gestures highlights its adaptability to diverse settings.
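The paper's exact inference code is not shown here, but a real-time loop that pairs the two steps, MediaPipe hand tracking followed by YOLOv8 detection, could be sketched as follows; the weights file name and the simple way the two outputs are combined are assumptions for illustration.

```python
# Sketch: real-time webcam loop pairing MediaPipe hand tracking with a
# trained YOLOv8 gesture detector. The weights file name is hypothetical.
import cv2
import mediapipe as mp
from ultralytics import YOLO

model = YOLO("asl_yolov8_best.pt")   # fine-tuned weights (assumed name)
hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break

    # Step 1: track the hand and its 21 landmarks in the current frame.
    tracking = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    # Step 2: run the YOLOv8 detector only when a hand is present.
    if tracking.multi_hand_landmarks:
        result = model(frame, verbose=False)[0]
        if len(result.boxes):
            cls_id = int(result.boxes.cls[0])
            conf = float(result.boxes.conf[0])
            label = f"{result.names[cls_id]} {conf:.2f}"
            cv2.putText(frame, label, (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)

    cv2.imshow("ASL recognition (sketch)", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```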
Our research demonstrates the potential of combining advanced object detection algorithms with landmark tracking for real-time gesture recognition, offering a reliable solution for American Sign Language interpretation. The success of this model is largely due to the careful integration of transfer learning, meticulous dataset creation, and precise tuning of hyperparameters. This combination has led to the development of a highly accurate and reliable system for recognizing American Sign Language gestures, representing a major milestone in the field of assistive technology.
Mohammad Ilyas, Study Co-Author and Professor, Department of Electrical Engineering and Computer Science, Florida Atlantic University
Future work will focus on expanding the dataset to include a broader range of hand shapes and gestures, further improving the model’s ability to differentiate between visually similar gestures. Additionally, efforts will be made to optimize the system for deployment on edge devices, ensuring it maintains its real-time performance even in resource-limited environments.
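The article does not specify a target edge runtime, but one common route to resource-limited deployment is exporting the trained model to a lighter format through the Ultralytics API, as in the sketch below; the chosen formats and file name are assumptions.

```python
# Sketch: exporting a trained YOLOv8 model for edge deployment.
# The weights file name and chosen export formats are illustrative.
from ultralytics import YOLO

model = YOLO("asl_yolov8_best.pt")   # fine-tuned weights (assumed name)

# ONNX export for CPU/GPU runtimes such as ONNX Runtime.
model.export(format="onnx", imgsz=640)

# TensorFlow Lite export for mobile and embedded devices.
model.export(format="tflite", imgsz=640)
```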
By improving American Sign Language recognition, this work contributes to creating tools that can enhance communication for the deaf and hard-of-hearing community. The model’s ability to reliably interpret gestures opens the door to more inclusive solutions that support accessibility, making daily interactions – whether in education, health care, or social settings – more seamless and effective for individuals who rely on sign language. This progress holds great promise for fostering a more inclusive society where communication barriers are reduced.
Stella Batalama, PhD, Dean, College of Engineering and Computer Science, Florida Atlantic University
Easa Alalwany, PhD, a recent graduate of FAU and an assistant professor at Taibah University, also contributed to this research.
Journal Reference:
Alsharif, B., et al. (2024) Transfer learning with YOLOV8 for real-time recognition system of American Sign Language Alphabet. Franklin Open. https://doi.org/10.1016/j.fraope.2024.100165