Meta Releases AI-Driven Open-Source Acoustics for VR and AR
Sound plays an important role in virtual and mixed reality experiences. Meta has been researching how artificial intelligence can deliver audio that realistically matches the setting a metaverse user is immersed in.
The company has now presented its new research on using artificial intelligence (AI) for realistic metaverse audio. The multimodally trained systems evaluate visual information and automatically adjust sound to suit the setting.
Meta’s Reality Labs and the University of Texas at Austin have released three new AI models, designed to optimize sound in virtual and augmented reality based on visual data, as open source: visual-acoustic matching, visually-informed dereverberation, and VisualVoice. All three use AI to automatically shape sound to match visual information. The accompanying research focuses on the multimodal interaction of video, audio, and text.
Meta’s research team writes that existing AI models do a good job of understanding images and are getting better at understanding video, but creating new and better immersive VR and AR experiences requires new multimodal AI models: models that take in audio, video, and text signals simultaneously and use them to build a much richer understanding of the environment.
The AI may, for instance, detect a sound emanating from a cave and then automatically apply the appropriate reverberation (visual-acoustic matching).
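The core acoustic operation behind that effect is applying a room impulse response (RIR) to a "dry" signal. The sketch below is illustrative only, not Meta's model: a toy RIR with exponentially decaying echoes stands in for one that an AI system would predict from an image of, say, a cave.

```python
import numpy as np
from scipy.signal import fftconvolve

sample_rate = 16_000
dry = np.random.randn(sample_rate)  # 1 s of placeholder "dry" audio

# Toy cave-like impulse response: sparse echoes with exponential decay.
rng = np.random.default_rng(0)
ir = np.zeros(sample_rate // 2)
ir[0] = 1.0  # direct sound
echo_positions = rng.integers(200, len(ir), size=40)
ir[echo_positions] = np.exp(-3.0 * echo_positions / len(ir)) * rng.uniform(0.2, 0.8, 40)

# Convolving dry audio with the RIR yields the reverberant ("wet") signal.
wet = fftconvolve(dry, ir)  # length = len(dry) + len(ir) - 1
```

In a visual-acoustic matching system, the impulse response (or an equivalent learned transformation) would be conditioned on the target image rather than hand-crafted as here.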
Visually-informed dereverberation strips the acoustics of the recording space from existing content so the sound can be matched to the listener's current space rather than the space the content was recorded in.
For instance, the recorded soundscape of a theater performance can be processed so that, in an AR projection, it sounds as if it were being performed in the listener's current space. The AI automatically strips the unwanted reverberation of the original venue from the soundtrack, the researchers say.
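The underlying signal model can be shown with a toy example under a strong assumption: if the room impulse response (IR) were known exactly, the reverb could be undone by regularized frequency-domain deconvolution. Meta's visually-informed dereverberation instead estimates the room acoustics from video; this sketch only illustrates the wet = dry * IR model it inverts.

```python
import numpy as np

rng = np.random.default_rng(1)
dry = rng.standard_normal(4096)          # original "dry" signal
ir = np.zeros(512)
ir[0], ir[100], ir[300] = 1.0, 0.5, 0.25  # simple echo pattern

wet = np.convolve(dry, ir)  # reverberant recording

# Wiener-style inverse filter; eps keeps near-zero frequency bins stable.
n = len(wet)
H = np.fft.rfft(ir, n)
eps = 1e-8
dry_est = np.fft.irfft(np.fft.rfft(wet) * np.conj(H) / (np.abs(H) ** 2 + eps), n)[:len(dry)]
# dry_est closely reconstructs dry because the IR was known exactly.
```

The hard part, which the learned model handles, is that in practice the IR is unknown and must be inferred, here visually, from the scene.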
This can also be applied to a virtual concert. An avatar may initially hear muffled sound in the metaverse that becomes clearer and clearer as the avatar approaches the virtual stage.
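The "muffled at a distance" effect can be sketched as distance-dependent attenuation plus low-pass filtering. The gain and cutoff mappings below are assumptions for illustration, not Meta's actual rendering pipeline.

```python
import numpy as np
from scipy.signal import butter, lfilter

def render_at_distance(audio, distance_m, sample_rate=16_000):
    """Return audio as it might be heard `distance_m` metres from the source."""
    # Inverse-distance attenuation, clamped to unity gain at the source.
    gain = 1.0 / max(distance_m, 1.0)
    # Farther away -> lower cutoff -> more muffled (assumed mapping).
    cutoff_hz = np.clip(8000.0 / max(distance_m, 1.0), 200.0, 7000.0)
    b, a = butter(4, cutoff_hz / (sample_rate / 2), btype="low")
    return gain * lfilter(b, a, audio)

audio = np.random.randn(16_000)
far = render_at_distance(audio, distance_m=50.0)  # quiet and muffled
near = render_at_distance(audio, distance_m=2.0)  # louder and clearer
```

As the avatar walks toward the stage, re-rendering with a shrinking `distance_m` reproduces the sound growing clearer and louder.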
It can also keep dialogue audible as ambient volume increases, making a conversation sound as if the speakers were standing close together, without the loud background music. The AI acoustics for VR and AR can also focus sound around small groups, for instance, to ensure voices do not overlap.
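One simple way to keep dialogue audible over loud music is "ducking": lowering the music's gain whenever speech is active. Models like VisualVoice learn far richer audio-visual separation from faces and voices; the energy-triggered version below is only a minimal stand-in for the effect described above.

```python
import numpy as np

def duck_music(speech, music, frame=1024, reduction=0.2, threshold=0.01):
    """Mix speech and music, lowering music gain in frames where speech is active."""
    mix = np.empty_like(speech)
    for start in range(0, len(speech), frame):
        s = speech[start:start + frame]
        m = music[start:start + frame]
        speech_active = np.mean(s ** 2) > threshold  # crude energy detector
        gain = reduction if speech_active else 1.0
        mix[start:start + frame] = s + gain * m
    return mix
```

A learned system would replace the energy detector with a model that identifies who is speaking, including from visual cues, and separate voices per speaker rather than uniformly ducking the background.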
Working in concert, the audio systems could power “intelligent assistants” capable of better understanding what we are saying to them, even in loud settings.
Meta has released the three AI models as open source. Additional resources, including the papers and the models, are available on Meta's AI blog.