A study published in Scientific Reports explored the use of advanced deep learning (DL) techniques to assess the intelligibility of speech processed through cochlear implants (CIs). It addressed the limitations of traditional methods that rely on human subjects for evaluating CI simulations, offering a more efficient approach.
The US-based researchers aimed to develop a faster and more cost-effective way to study speech processing in individuals with hearing impairments, particularly in challenging listening environments.
Advancement in Cochlear Implant Technology
CIs represent a significant advancement in auditory rehabilitation, specifically designed to restore hearing in individuals with profound deafness. These devices stimulate the auditory nerve directly through electrodes surgically implanted in the cochlea, bypassing damaged hair cells. Over time, improvements in sound coding and signal processing have aimed to enhance auditory information delivery to the brain.
However, many CI users still struggle to understand speech, especially in noisy environments. Traditional methods for evaluating CI performance rely on human subjects, making the process costly, time-consuming, and subject to individual differences like learning and fatigue. This highlights the ongoing need for research to improve these technologies and enhance user outcomes.
Using Deep Learning for Speech Intelligibility
In this study, the authors used Whisper, a DL-based speech recognition model developed by OpenAI, to evaluate the intelligibility of vocoder-processed speech. Whisper's performance was compared with traditional human studies, which often face logistical challenges and variability due to individual differences in auditory processing.
The study employed acoustic vocoder simulations to mimic the auditory degradations experienced by CI users. These vocoders process sound by extracting amplitude envelopes from spectral channels and discarding temporal fine structure, simulating CI device functionality. The simulations adjusted various parameters to represent different CI settings and listening conditions.
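The envelope-extraction idea described above can be sketched in a few lines. This is a minimal noise-vocoder illustration, not the authors' implementation: the log-spaced band edges, the brick-wall FFT filters, the noise carrier, and all function names are assumptions made for this example.

```python
import numpy as np

def band_edges(f_lo, f_hi, n_bands):
    """Band edges between f_lo and f_hi; log spacing is an assumption."""
    return np.geomspace(f_lo, f_hi, n_bands + 1)

def fft_filter(x, fs, lo, hi):
    """Brick-wall filter: keep only FFT bins with lo <= f < hi."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def noise_vocode(x, fs, n_bands=8, f_lo=300.0, f_hi=7000.0,
                 env_cutoff=20.0, seed=0):
    """Replace each band's fine structure with band-limited noise."""
    rng = np.random.default_rng(seed)
    edges = band_edges(f_lo, f_hi, n_bands)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fft_filter(x, fs, lo, hi)
        # Envelope: rectify, then low-pass below env_cutoff (in Hz).
        env = np.maximum(fft_filter(np.abs(band), fs, 0.0, env_cutoff), 0.0)
        # Carrier: white noise restricted to the same band.
        carrier = fft_filter(rng.standard_normal(len(x)), fs, lo, hi)
        out += env * carrier
    return out

# Usage with a synthetic 1 kHz tone standing in for a speech recording.
fs = 16000
t = np.arange(fs) / fs
vocoded = noise_vocode(np.sin(2 * np.pi * 1000 * t), fs, n_bands=8)
```

The number of bands, the frequency range, and the envelope cutoff appear as parameters because those are the quantities the study reports varying.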
Whisper was tasked with recognizing vocoder-processed speech under different conditions, including varying background noise levels. Its performance was then compared to existing human subject data to validate its effectiveness.
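Varying the background noise level usually means mixing noise into the speech at a chosen signal-to-noise ratio (SNR). The article does not describe the noise type or the mixing procedure used in the study, so the following is only a generic sketch with hypothetical function names.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so speech-to-noise RMS ratio equals snr_db, then add."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]

# Usage with synthetic signals standing in for real recordings.
random.seed(0)
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]
noisy = mix_at_snr(speech, noise, snr_db=5.0)
```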
Methodology and Experimental Design
The researchers conducted simulations using the model and processed approximately 270,000 sentences and 900,000 words across 450 different vocoder settings. They systematically varied vocoder parameters like frequency range, number of vocoder bands, and envelope dynamic range to simulate different sound processing settings in CI devices.
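A sweep like this is typically organized as a grid of parameter combinations. The sketch below shows one way to enumerate such a grid; the parameter names and values are hypothetical, chosen only for illustration, and do not reproduce the study's 450 settings.

```python
from itertools import product

# Hypothetical values for illustration only; not the study's actual grid.
grid = {
    "n_bands": [4, 8, 12, 16, 22],
    "env_cutoff_hz": [5, 10, 20, 50],
    "dynamic_range_db": [20, 40, 60],
    "snr_db": [None, 10],  # None stands for the quiet condition
}

def iter_settings(grid):
    """Yield one dict per combination of parameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

settings = list(iter_settings(grid))
print(len(settings))  # 5 * 4 * 3 * 2 = 120 combinations
```

Each resulting dictionary could then be passed to a vocoder and a recognizer, which is what makes a model-based evaluation practical at a scale that would be difficult with human listeners.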
Additionally, intensity quantization and low-cutoff frequency of vocoder envelopes were adjusted to mimic the temporal and intensity resolutions experienced by CI patients. The results were then analyzed to determine how these factors influenced the model's performance in both quiet and noisy environments.
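Intensity quantization can be pictured as snapping each envelope value to one of a fixed number of levels within a limited dynamic range. The sketch below assumes uniform steps on a dB scale measured from the envelope peak; the article does not specify the study's exact mapping, so treat both the scheme and the function name as assumptions.

```python
import math

def quantize_envelope(env, dynamic_range_db=60.0, n_levels=20):
    """Snap envelope samples to n_levels (>= 2) steps, uniform in dB below the peak."""
    peak = max(env)
    if peak <= 0:
        return [0.0] * len(env)
    out = []
    for v in env:
        # Level in dB relative to the peak (0 dB at the peak).
        db = 20 * math.log10(v / peak) if v > 0 else -math.inf
        # Position in the dynamic range: 0 = floor, 1 = peak; clip outside.
        frac = min(max(1.0 + db / dynamic_range_db, 0.0), 1.0)
        # Snap to the nearest of the n_levels equally spaced positions.
        step = round(frac * (n_levels - 1)) / (n_levels - 1)
        out.append(peak * 10 ** ((step - 1.0) * dynamic_range_db / 20))
    return out

# Usage: values below the floor (60 dB under the peak) are raised to the floor.
quantized = quantize_envelope([1.0, 0.5, 0.001, 0.0], dynamic_range_db=60.0, n_levels=20)
```

Lowering `n_levels` coarsens the intensity resolution, while lowering `dynamic_range_db` raises the floor and removes the quietest envelope detail.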
Impact of Using an Advanced Model
The outcomes showed that the Whisper model exhibited human-like responses to changes in vocoder parameters. Its performance improved with more vocoder channels, plateauing around 8-10 channels in quiet conditions. In noisy environments, sentence recognition plateaued at 90% with 12 channels, while word recognition continued to improve up to 22 channels, though it remained below 60%.
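Recognition percentages like these come from comparing a recognizer's transcripts with reference text. The article does not say how the study scored responses, so the sketch below shows one plausible scheme under stated assumptions: text is lowercased and stripped of punctuation, an item counts as fully correct only on an exact match, and words are matched regardless of order. The function names are hypothetical.

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation, and split into words."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()

def score(references, transcripts):
    """Return (percent of items fully correct, percent of words correct)."""
    items_ok = words_ok = words_total = 0
    for ref, hyp in zip(references, transcripts):
        ref_words, hyp_words = normalize(ref), normalize(hyp)
        items_ok += ref_words == hyp_words
        # Order-insensitive matching: each transcript word is used once.
        remaining = Counter(hyp_words)
        for word in ref_words:
            if remaining[word] > 0:
                words_ok += 1
                remaining[word] -= 1
        words_total += len(ref_words)
    return 100 * items_ok / len(references), 100 * words_ok / words_total

# Usage with made-up transcripts (not data from the study).
refs = ["The cat sat.", "A dog barked loudly"]
hyps = ["the cat sat", "a dog parked"]
item_pct, word_pct = score(refs, hyps)
```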
The authors found that channel frequency boundaries and envelope cutoff frequencies significantly affected performance. The optimal frequency range was between 300 and 7000 Hz, and Whisper's accuracy declined as the starting frequency increased, with a steep drop beyond a certain point. Performance improved as the envelope low-cutoff frequency rose from 5 to 20 Hz, with no further gains beyond that, indicating that envelope modulations above 20 Hz carried no additional speech information.
The envelope dynamic range and the number of quantization levels also significantly affected speech intelligibility. Whisper's performance improved as the dynamic range increased up to 60 dB, particularly for words and with fewer channels, and plateaued at around 20 quantized envelope levels.
Although Whisper was not specifically designed to mimic human auditory processing, it showed strong resilience to vocoder-induced degradations. This suggests that DL models could provide valuable insights into complex auditory scenarios, offering information that may not be easily captured through human studies.
Applications and Future Directions
This research has significant implications for improving CI technology and auditory rehabilitation strategies, particularly in noisy environments. The use of DL models like Whisper could allow scientists and clinicians to explore more signal processing options and their impact on speech intelligibility without relying on human testing, accelerating the development of personalized CI strategies.
In summary, DL models like Whisper proved effective in evaluating speech intelligibility in vocoder simulations. Their human-like responses to changes in vocoder parameters demonstrate their potential for simulating human speech processing. This approach offers significant time and cost benefits, accelerating CI research and clinical applications.
Future research could explore integrating models of auditory nerve responses to CI stimulation, enhancing DL's role in auditory research. Further advancements in DL and training methods may help bridge the gap between machine and human performance in complex auditory tasks.
Journal Reference
Sinha, R., Azadpour, M. Employing deep learning model to evaluate speech information in acoustic simulations of Cochlear implants. Sci Rep 14, 24056 (2024). DOI: 10.1038/s41598-024-73173-6, https://www.nature.com/articles/s41598-024-73173-6
Article Revisions
- Oct 22 2024 - Combined applications and conclusion sections to provide a clear ending that ties future directions with practical implications.