Generative artificial intelligence (AI) systems will stimulate an explosion of creativity in the music sector and beyond, according to scientists from the University of Surrey, who are inviting the public to try their new text-to-audio model.
AudioLDM is a new AI-based system from Surrey that lets users submit a text prompt, which is then used to generate a corresponding audio clip. The system can process prompts and deliver clips using less computational power than current AI systems, without compromising sound quality or users’ ability to manipulate clips.
Members of the public can try AudioLDM by visiting its Hugging Face space. The code is also open source on GitHub, where it has more than 1,000 stars.
Sound designers could use such a system in applications such as film-making, digital art, game design, the metaverse, virtual reality, and digital assistants for the visually impaired.
Generative AI has the potential to transform every sector, including music and sound creation. With AudioLDM, we show that anyone can create high-quality and unique samples in seconds with very little computing power.
Haohe Liu, Study Project Lead, University of Surrey
Liu stated, “While there are some legitimate concerns about the technology, there is no doubt that AI will open doors for many within these creative industries and inspire an explosion of new ideas.”
Surrey’s open-source model is built using a semi-supervised approach with a method known as Contrastive Language-Audio Pretraining (CLAP). Using CLAP, AudioLDM can be trained on large amounts of audio data without text labels, which considerably improves model capacity.
What makes AudioLDM special is not just that it can create sound clips from text prompts, but that it can create new sounds based on the same text without requiring retraining.
Wenwu Wang, Professor in Signal Processing and Machine Learning, University of Surrey
Wang added, “This saves time and resources since it doesn’t require additional training. As generative AI becomes part and parcel of our daily lives, it’s important that we start thinking about the energy required to power up the computers that run these technologies. AudioLDM is a step in the right direction.”
The user community has used AudioLDM to create music clips in a range of genres.
AudioLDM is a research demonstration project and relies on the current UK copyright exception for data mining for non-commercial research.