New AI-Powered Silent-Speech Recognition Interface Reads Unvoiced Commands

Ruidong Zhang is a doctoral student in the field of information science at Cornell University. In the picture shown, it might seem like Zhang is talking to himself, but in reality, he is silently mouthing the passcode to unlock his nearby smartphone and play a song from his playlist.

Image Credit: Cornell University 

It is not telepathy: it is the seemingly ordinary, off-the-shelf eyeglasses he is wearing, called EchoSpeech—a silent-speech recognition interface that uses acoustic sensing and artificial intelligence to recognize up to 31 unvocalized commands based on lip and mouth movements.

Developed by Cornell’s Smart Computer Interfaces for Future Interactions (SciFi) Lab, the low-power, wearable interface requires just a few minutes of user training data before it can recognize commands, and it can run on a smartphone, the researchers say.

Zhang is the lead author of “EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing,” which will be presented at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI) in April 2023 in Hamburg, Germany.

For people who cannot vocalize sound, this silent speech technology could be an excellent input for a voice synthesizer. It could give patients their voices back.

Ruidong Zhang, Doctoral Student, Field of Information Science, Cornell University

In its current form, EchoSpeech could be used to communicate with others through a smartphone in places where speech is inconvenient or inappropriate, such as a quiet library or a noisy restaurant. The silent-speech interface could also be paired with a stylus and used with design software such as CAD, all but eliminating the need for a keyboard and a mouse.

Fitted with a pair of microphones and speakers smaller than pencil erasers, the EchoSpeech glasses become a wearable, AI-powered sonar system, sending and receiving soundwaves across the face and sensing mouth movements. A deep learning algorithm developed by SciFi Lab researchers then analyzes these echo profiles in real time with about 95% accuracy.
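To give a feel for the general idea behind this kind of active acoustic sensing, the short Python sketch below cross-correlates a transmitted near-ultrasonic chirp with a synthetic received frame to produce an echo profile. Every parameter, signal shape, and data value here is an assumption chosen for illustration; it is not the SciFi Lab's actual pipeline, which feeds such profiles to a deep learning classifier.

```python
# Illustrative sketch only: NOT the EchoSpeech implementation.
# Shows, under assumed parameters, how an "echo profile" might be derived by
# cross-correlating a transmitted chirp with the signal picked up by a microphone.
import numpy as np

FS = 48_000          # assumed sample rate (Hz)
CHIRP_MS = 12        # assumed chirp duration (ms)

def make_chirp(f0=16_000, f1=20_000, fs=FS, ms=CHIRP_MS):
    """Linear frequency sweep in a near-ultrasonic band (assumed values)."""
    t = np.arange(int(fs * ms / 1000)) / fs
    k = (f1 - f0) / (ms / 1000)
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t**2))

def echo_profile(received, chirp):
    """Cross-correlate a received frame with the transmitted chirp.
    Peaks correspond to reflections at different path lengths, so changes
    in the profile over time reflect skin and mouth movement."""
    return np.abs(np.correlate(received, chirp, mode="valid"))

# Toy usage: two delayed, attenuated copies of the chirp stand in for echoes
# off the face, plus a little sensor noise.
chirp = make_chirp()
frame = np.zeros(4096)
for delay, gain in [(120, 0.8), (300, 0.3)]:      # sample offset, amplitude
    frame[delay:delay + chirp.size] += gain * chirp
frame += 0.01 * np.random.randn(frame.size)
profile = echo_profile(frame, chirp)
print("strongest echo at sample offset:", int(np.argmax(profile)))
```

In a real system, a stream of such profiles, computed frame by frame, would form the input that a trained model maps to commands; the sketch stops at the signal-processing step.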

We’re moving sonar onto the body. We’re very excited about this system because it really pushes the field forward on performance and privacy. It’s small, low-power, and privacy-sensitive, which are all important features for deploying new, wearable technologies in the real world.

Cheng Zhang, Assistant Professor, Information Science, Ann S. Bowers College of Computing and Information Science, Cornell University

Cheng Zhang also directs the SciFi Lab.

The SciFi Lab has developed numerous wearable devices that track hand, body, and facial movements using machine learning and miniature wearable video cameras.


Now, the laboratory has shifted away from cameras and moved toward acoustic sensing to track body and face movements.

EchoSpeech builds on the laboratory’s similar acoustic-sensing device, EarIO, a wearable earbud that tracks facial movements.

Cheng Zhang noted that most silent-speech recognition technology has been limited to a select set of predetermined commands and has required the user to face or wear a camera, which is neither practical nor feasible. Wearable cameras also raise significant privacy concerns, both for the user and for those with whom the user interacts.

Acoustic-sensing technology like EchoSpeech removes the need for wearable video cameras. And because audio data is much smaller than image or video data, it requires less bandwidth to process and can be relayed to a smartphone over Bluetooth in real time, said François Guimbretière, professor of information science at Cornell Bowers CIS and a co-author.
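As a rough back-of-envelope comparison, the sketch below contrasts the raw data rate of a single audio channel with that of even a small uncompressed video stream. The stream parameters are assumptions for illustration, not figures from the paper, but they show why an acoustic stream fits far more comfortably within a real-time Bluetooth link.

```python
# Illustrative, assumed numbers only: raw audio vs. raw video data rates.
def kbps(bytes_per_second):
    return bytes_per_second * 8 / 1000

# Assumed single-channel audio: 48 kHz sample rate, 16-bit samples.
audio_bps = 48_000 * 2
# Assumed small grayscale video: 320x240 pixels, 1 byte/pixel, 30 fps, uncompressed.
video_bps = 320 * 240 * 1 * 30

print(f"audio ~ {kbps(audio_bps):,.0f} kbps")    # about 768 kbps
print(f"video ~ {kbps(video_bps):,.0f} kbps")    # about 18,432 kbps
print(f"video is roughly {video_bps / audio_bps:.0f}x larger")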

And because the data is processed locally on your smartphone instead of uploaded to the cloud, privacy-sensitive information never leaves your control.

Cheng Zhang, Assistant Professor, Information Science, Ann S. Bowers College of Computing and Information Science, Cornell University

Cheng Zhang added that battery life also improves dramatically: about ten hours with acoustic sensing versus 30 minutes with a camera.

The team is also exploring commercialization of the technology behind EchoSpeech, thanks in part to Ignite, Cornell’s Research Laboratory to Market gap funding.

In upcoming work, SciFi Lab researchers are exploring smart-glass applications that track eye, facial, and upper-body movements.

Cheng Zhang stated, “We think glass will be an important personal computing platform to understand human activities in everyday settings.”

The co-authors of the study were information science doctoral student Ke Li, Yihong Hao ‘24, Yufan Wang ‘24, and Zhengnan Lai ‘25. This study was partially funded by the National Science Foundation.
