Anyone who has used noise-canceling headphones knows how important it can be to hear the right sounds at the right moment. Someone working indoors might want to muffle car horns, but not while strolling down a busy street. Yet people still have little control over which sounds their headphones block out.
A team led by researchers at the University of Washington has developed deep-learning algorithms that let users select, in real time, which sounds filter through their headphones. The team calls the system “semantic hearing.” The headphones stream captured audio to a connected smartphone, which cancels all environmental sounds.
Using voice commands or a smartphone app, headphone users can choose from 20 classes of sounds, including sirens, crying babies, speech, vacuum cleaners, and bird chirps. The headphones then play back only the sounds the user has chosen.
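The sketch below is a minimal illustration of that selection flow, not the team's released code: the class names are an illustrative subset of the 20, the extract_target stub stands in for the trained neural network, and the 10 ms binaural chunks at 48 kHz are assumed values.

import numpy as np

# Five of the 20 selectable sound classes mentioned above (illustrative subset).
SOUND_CLASSES = {"siren", "baby_cry", "speech", "vacuum_cleaner", "bird_chirp"}

def extract_target(chunk, selected):
    """Stub for the trained network that isolates the selected sound classes
    in a short binaural chunk and suppresses everything else. A real system
    would run neural source extraction here; this stub passes audio through."""
    return chunk

def process_stream(chunks, selected_classes):
    """Cancel the environment, then mix back in only the user-selected classes."""
    selected = set(selected_classes) & SOUND_CLASSES
    for chunk in chunks:                      # chunk shape: (samples, 2), binaural
        yield extract_target(chunk, selected)

# Example: keep sirens and bird chirps, suppress every other environmental sound.
dummy_chunks = (np.zeros((480, 2), dtype=np.float32) for _ in range(3))  # 10 ms at 48 kHz
for out in process_stream(dummy_chunks, {"siren", "bird_chirp"}):
    pass  # 'out' would go to the headphone playback buffer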
The team presented its findings at UIST ’23 in San Francisco on November 1, 2023. The researchers intend to release a commercial version of the technology in the future.
Understanding what a bird sounds like and extracting it from all other sounds in an environment requires real-time intelligence that today’s noise-canceling headphones haven’t achieved. The challenge is that the sounds headphone wearers hear need to sync with their visual senses. You can’t be hearing someone’s voice two seconds after they talk to you. This means the neural algorithms must process sounds in under a hundredth of a second.
Shyam Gollakota, Study Senior Author and Professor, Paul G. Allen School of Computer Science and Engineering, University of Washington
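To put that latency figure in perspective, here is a back-of-the-envelope calculation; the 44.1 kHz sample rate is an assumed value, not one reported by the team.

SAMPLE_RATE_HZ = 44_100          # assumed CD-quality sample rate
LATENCY_BUDGET_S = 0.01          # "under a hundredth of a second"

max_chunk_samples = int(SAMPLE_RATE_HZ * LATENCY_BUDGET_S)
print(f"At most {max_chunk_samples} samples ({LATENCY_BUDGET_S * 1000:.0f} ms) "
      f"must be captured, processed, and played back per chunk.")
# -> At most 441 samples (10 ms) must be captured, processed, and played back per chunk.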
Because of this latency constraint, the semantic hearing system must process sounds on a device such as the connected smartphone rather than on more powerful cloud servers. In addition, because sounds from different directions reach the two ears at slightly different times, the system must preserve these delays and other spatial cues so that wearers can still perceive sounds in their environment meaningfully.
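The snippet below illustrates the scale of those interaural delays; the ear spacing, speed of sound, and sample rate are textbook approximations rather than values from the paper.

import math

SPEED_OF_SOUND_M_S = 343.0       # approximate speed of sound in air
EAR_SPACING_M = 0.18             # approximate distance between the ears
SAMPLE_RATE_HZ = 44_100          # assumed sample rate

def interaural_delay_samples(azimuth_deg):
    """Extra travel time to the far ear for a source at the given azimuth,
    expressed in audio samples (simple far-field approximation)."""
    delay_s = (EAR_SPACING_M / SPEED_OF_SOUND_M_S) * math.sin(math.radians(azimuth_deg))
    return delay_s * SAMPLE_RATE_HZ

print(f"{interaural_delay_samples(90):.1f} samples")  # ~23 samples for a source directly to one side

Delays on this scale, a few hundred microseconds, are what listeners use to localize sound, so the processing has to keep the left and right channels aligned to roughly sample-level precision.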
The system was tested in a variety of settings, including streets, parks, and offices. It was able to isolate target sounds, such as sirens and bird chirps, while eliminating background noise. Twenty-two participants rated the audio output of the system for the target sound, and they reported that, on average, the quality was better than the original recording.
The system occasionally had trouble distinguishing sounds that share many characteristics, such as vocal music and human speech. According to the researchers, training the models on more real-world data could improve the results.
Additional study co-authors included Takuya Yoshioka, head of research at AssemblyAI, Justin Chan, who performed this research as a doctoral student in the Allen School and is currently at Carnegie Mellon University, and Bandhav Veluri and Malek Itani, both UW PhD students in the Allen School.
Video: “Semantic hearing: Future of intelligent hearables.” Video Credit: University of Washington
Journal Reference
Veluri, B., et al. (2023) Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. doi:10.1145/3586183.3606779.