A recent study published in the journal Scientific Reports explored the use of advanced deep learning (DL) techniques to assess the intelligibility of speech processed through cochlear implants (CIs). It addressed the limitations of traditional methods that rely on human subjects to evaluate CI simulations.
The US-based researchers aimed to provide a more efficient and cost-effective approach to understanding speech processing in individuals with hearing impairments, particularly in challenging listening environments.
Advancement in Cochlear Implant Technology
CIs represent a significant advancement in auditory rehabilitation, specifically designed to restore hearing in individuals with profound deafness. These devices stimulate the auditory nerve directly through electrodes surgically implanted in the cochlea, bypassing damaged hair cells. Over time, improvements in sound coding and signal processing have aimed to enhance auditory information delivery to the brain.
However, many CI users still struggle to understand speech, especially in noisy environments. Traditional methods for evaluating CI performance rely on human subjects, making the process costly, time-consuming, and subject to individual differences like learning and fatigue. This highlights the ongoing need for research to improve these technologies and enhance user outcomes.
Using Deep Learning for Speech Intelligibility
In this paper, the authors used Whisper, an advanced DL-based speech recognition model developed by OpenAI, to evaluate the intelligibility of vocoder-processed speech. They aimed to compare Whisper's performance with traditional human subject studies, which often face significant logistical challenges and variability due to individual differences in auditory processing.
To achieve this, the study employed acoustic vocoder simulations that mimic the auditory degradations experienced by CI users. These vocoders process sound by extracting amplitude envelopes from spectral channels and discarding temporal fine structure, simulating how CI devices operate. The simulations adjusted various parameters to represent different CI settings and psychophysical conditions.
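The envelope-extraction step described above can be sketched in a few lines. The following is a minimal, single-channel illustration in plain Python, not the study's implementation: a real channel vocoder splits the signal into many band-pass channels and uses the envelopes to modulate noise or tone carriers, and the one-pole low-pass filter here is chosen purely for brevity.

```python
import math

def extract_envelope(signal, fs, cutoff_hz):
    """Half-wave rectify, then low-pass filter (simple one-pole) to keep
    the slow amplitude envelope; the temporal fine structure is discarded."""
    rectified = [max(s, 0.0) for s in signal]
    # One-pole low-pass coefficient for the given cutoff frequency
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, y = [], 0.0
    for x in rectified:
        y += alpha * (x - y)
        env.append(y)
    return env

# A 1 kHz tone amplitude-modulated at 4 Hz: the extracted envelope tracks
# the slow 4 Hz modulation rather than the 1 kHz fine structure.
fs = 16000
t = [n / fs for n in range(fs)]  # 1 second of samples
signal = [(0.5 + 0.5 * math.sin(2 * math.pi * 4 * ti)) *
          math.sin(2 * math.pi * 1000 * ti) for ti in t]
envelope = extract_envelope(signal, fs, cutoff_hz=20.0)
```

In a full vocoder, each band's envelope would modulate a carrier in that band, and the modulated bands would be summed to produce the degraded speech presented to the listener (or, here, to Whisper).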
Whisper was then tasked with recognizing vocoder-processed speech stimuli under various conditions, including different background noise levels. Furthermore, its performance was compared to existing human subject data to validate its effectiveness.
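Comparing Whisper's output against human data requires scoring each transcription. The exact scoring procedure is not detailed here, but a simple percent-words-correct metric can be sketched as below; the greedy in-order matching is an assumption for illustration, not the study's method.

```python
def percent_words_correct(reference, hypothesis):
    """Score a transcription as the percentage of reference words found
    in the hypothesis, matched greedily in order. Illustrative only;
    the study's actual scoring rules may differ."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    i = correct = 0
    for w in ref:
        j = i
        while j < len(hyp) and hyp[j] != w:
            j += 1
        if j < len(hyp):      # found the word at position j
            correct += 1
            i = j + 1         # advance past the match
        # otherwise leave i unchanged and try the next reference word
    return 100.0 * correct / len(ref) if ref else 0.0

score = percent_words_correct("the boy ran home", "a boy ran home")  # 75.0
```

Aggregating such scores over many sentences and vocoder conditions yields the intelligibility curves that can be compared against published human-subject results.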
Methodology and Experimental Design
The researchers ran the model on approximately 270,000 sentences and 900,000 words across 450 different vocoder settings. They systematically varied vocoder parameters such as frequency range, number of vocoder bands, and envelope dynamic range to simulate different sound-processing configurations in CI devices.
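A condition set like this is typically built as a factorial sweep over parameter values. The grid below is hypothetical, chosen only to show the mechanics; the paper's actual parameter values and its factorization into 450 conditions are not reproduced here.

```python
from itertools import product

# Hypothetical parameter grid -- the values below are illustrative,
# not the study's actual settings.
num_channels = [2, 4, 6, 8, 10, 12, 16, 22]   # number of vocoder bands
low_freq_hz = [100, 200, 300, 500, 1000]      # lower frequency boundary
env_cutoff_hz = [5, 10, 20, 50]               # envelope low-pass cutoff

# Every combination of the three parameters is one vocoder condition.
conditions = list(product(num_channels, low_freq_hz, env_cutoff_hz))
print(len(conditions))  # 8 * 5 * 4 = 160 conditions in this toy grid
```

Each condition would then be applied to the full sentence corpus before transcription, which is why the study's totals run to hundreds of thousands of processed sentences.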
Additionally, intensity quantization and low-cutoff frequency of vocoder envelopes were adjusted to mimic the temporal and intensity resolutions experienced by CI patients. The results were then analyzed to determine how these factors influenced the model's performance in both quiet and noisy environments.
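The intensity-quantization step can be sketched as mapping envelope samples into a fixed dB dynamic range below the peak and snapping them to a discrete number of intensity steps. This is a minimal sketch under assumed conventions (clamping to the range floor, uniform steps in dB); the parameter names and defaults are illustrative, not the study's implementation.

```python
import math

def quantize_envelope(env, dynamic_range_db=60.0, num_levels=20):
    """Clamp envelope samples into a fixed dB range below the peak,
    then quantize to a discrete number of uniformly spaced dB steps.
    Illustrative only; not the study's actual processing chain."""
    peak = max(env)
    floor = peak * 10 ** (-dynamic_range_db / 20.0)  # lowest representable amplitude
    step = dynamic_range_db / (num_levels - 1)       # dB per quantization step
    out = []
    for e in env:
        e = min(max(e, floor), peak)                 # clamp into the dynamic range
        level_db = 20.0 * math.log10(e / floor)      # dB above the range floor
        q = round(level_db / step) * step            # snap to the nearest step
        out.append(floor * 10 ** (q / 20.0))
    return out

env = [0.001, 0.01, 0.1, 0.5, 1.0]
quantized = quantize_envelope(env)
```

Narrowing `dynamic_range_db` or reducing `num_levels` coarsens the intensity resolution, which is the kind of degradation the study used to probe how intensity coding limits affect intelligibility.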
Impacts of Using the Advanced Model
The outcomes showed that the Whisper model exhibited human-like responses to changes in vocoder parameters. Its performance improved with more vocoder channels, plateauing around 8-10 channels in quiet conditions. In noisy environments, sentence recognition plateaued at 90% with 12 channels, while word recognition continued to improve up to 22 channels, though it remained below 60%.
The authors found that changes in channel frequency boundaries, overall frequency range, and envelope cutoff frequencies significantly impacted performance. Whisper's accuracy declined as the starting frequency increased, with a steep drop occurring beyond a certain point. Performance improved as the envelope low-cutoff frequency rose from 5 to 20 Hz, with no further gains beyond that. The optimal frequency range was found to be between 300 and 7000 Hz.
Envelope cutoff frequencies above 20 Hz provided no additional speech information. The envelope dynamic range and quantization levels also significantly affected intelligibility: Whisper's performance improved with dynamic ranges of up to 60 dB, particularly for word recognition and with fewer channels, and plateaued at around 20 quantized envelope levels.
Furthermore, while the Whisper model was not specifically designed to mimic human auditory processing, it showed strong resilience to vocoder-induced degradations. This indicates that DL models could provide valuable insights into complex auditory scenarios, offering information that human subject studies may not easily capture.
Applications
This research has significant implications for improving cochlear implant technology and auditory rehabilitation strategies, particularly in noisy environments, which present a major challenge for CI users. It could allow scientists and clinicians to explore more signal processing options and their impact on speech intelligibility without explicit human testing. Overall, this could accelerate the development of personalized CI processing strategies.
Conclusion and Future Directions
In summary, DL models like Whisper proved effective in auditory research, especially for evaluating speech intelligibility in vocoder simulations. The model's human-like responses to vocoder parameters demonstrated its potential to simulate aspects of human speech processing. This approach offers significant time and cost savings and could accelerate CI research and clinical applications.
Future work could explore integrating models of auditory nerve responses to CI stimulation, further enhancing the role of DL in auditory research. Advancements in DL and training methods may also help bridge the gap between machine and human performance in complex auditory tasks.
Journal Reference
Sinha, R., & Azadpour, M. (2024). Employing deep learning model to evaluate speech information in acoustic simulations of Cochlear implants. Scientific Reports, 14, 24056. DOI: 10.1038/s41598-024-73173-6, https://www.nature.com/articles/s41598-024-73173-6
Disclaimer: The views expressed here are those of the author expressed in their private capacity and do not necessarily represent the views of AZoM.com Limited T/A AZoNetwork the owner and operator of this website. This disclaimer forms part of the Terms and conditions of use of this website.