Deep learning could bring the concert experience home

Now that recorded sound has become ubiquitous, we hardly think about it. From our smartphones, smart speakers, TVs, radios, disc players, and car sound systems, it’s an enduring and enjoyable presence in our lives. In 2017, a survey by the polling firm Nielsen suggested that about 90% of the U.S. population listens to music regularly and that, on average, they do so about 32 hours per week.

Behind this free-flowing pleasure are enormous industries applying technology to the long-standing goal of reproducing sound with the greatest possible realism. From Edison’s phonograph and the horn loudspeakers of the 1880s, successive generations of engineers in pursuit of that ideal invented and exploited countless technologies: triode vacuum tubes, dynamic loudspeakers, magnetic phonograph cartridges, solid-state amplifier circuits in scores of different topologies, electrostatic speakers, optical discs, stereo, and surround sound. And in the past five decades, digital technologies, such as audio compression and streaming, have transformed the music industry.

Yet even now, after 150 years of development, the sound we hear from even a high-end audio system falls far short of what we hear when we are physically present at a live music performance. At such an event, we are in a natural sound field and can readily perceive that the sounds of different instruments come from different locations, even when the sound field is crisscrossed with the mixed sounds of multiple instruments. There’s a reason people pay hefty sums to hear live music: it is more enjoyable and exciting, and it can have a bigger emotional impact.

Today, researchers, companies, and entrepreneurs, including ourselves, are at last closing in on recorded audio that truly re-creates a natural sound field. The group includes big companies, such as Apple and Sony, as well as smaller firms, such as Creative. Netflix recently disclosed a partnership with Sennheiser under which the streaming service has begun using a new system, Ambeo 2-Channel Spatial Audio, to heighten the sonic realism of such TV shows as “Stranger Things” and “The Witcher.”

There are now at least half a dozen different approaches to producing highly realistic audio. We use the term “soundstage” to distinguish our work from other audio formats, such as those referred to as spatial audio or immersive audio. These can represent sound with a more spatial effect than normal stereo, but typically do not include the sound source localization details needed to reproduce a truly convincing sound field.

We believe soundstage is the future of music recording and playback. But before such a revolution can take place, a huge obstacle must be overcome: converting the countless hours of existing recordings conveniently and cost-effectively, whether they are mono, stereo, or multichannel surround sound (5.1, 7.1, and so on). No one knows exactly how many songs have been recorded, but according to the entertainment-metadata company Gracenote, more than 200 million recorded songs are now available on planet Earth. Given that the average song lasts about 3 minutes, this is the equivalent of about 1,100 years of music.
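The 1,100-year figure follows from straightforward unit conversion. The short check below uses only the two numbers given above (about 200 million songs, about 3 minutes each) and assumes continuous, nonstop playback with 365-day years; the variable names are illustrative.

```python
# Back-of-envelope check of the catalog size quoted above.
SONGS = 200_000_000            # Gracenote's estimate of recorded songs
AVG_MINUTES_PER_SONG = 3       # approximate average song length

total_minutes = SONGS * AVG_MINUTES_PER_SONG   # 600 million minutes
total_hours = total_minutes / 60               # 10 million hours
total_years = total_hours / (24 * 365)         # nonstop playback, 365-day years

print(f"{total_hours:,.0f} hours = about {total_years:,.0f} years")
```

The exact result is a little over 1,140 years, which the article rounds to about 1,100.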

After separating a recording into its component tracks, the next step is to remix them into a soundstage recording. This is accomplished by a soundstage signal processor, which performs a complex computational function to generate the output signals that drive the speakers and produce the soundstage audio. The inputs to this processor include the isolated tracks, the physical locations of the speakers, and the desired locations of the listener and the sound sources in the re-created sound field. The outputs of the soundstage processor are multitrack signals, one for each channel, to drive the multiple speakers.
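The article does not disclose the processor’s actual function. To show the shape of the problem (isolated tracks plus speaker and source positions in, one signal per speaker out), here is a minimal sketch using a different, much simpler published technique, distance-based amplitude panning (DBAP). It is not the authors’ method: it ignores the listener position, propagation delay, and psychoacoustics, and it assumes 2-D positions in meters. All function names and parameter values are hypothetical.

```python
import math

def dbap_gains(source, speakers, rolloff_db=6.0, blur=0.2):
    """Distance-based amplitude panning gains for one virtual source.

    source, speakers: (x, y) positions in meters (2-D for simplicity).
    rolloff_db: level drop per doubling of distance (6 dB ~ inverse-distance law).
    blur: small offset that keeps gains finite when a source sits on a speaker.
    """
    a = rolloff_db / (20 * math.log10(2))  # distance exponent; 6 dB gives about 1.0
    dists = [math.sqrt((sx - source[0]) ** 2 + (sy - source[1]) ** 2 + blur ** 2)
             for sx, sy in speakers]
    raw = [1.0 / d ** a for d in dists]
    k = 1.0 / math.sqrt(sum(g * g for g in raw))  # normalize to constant total power
    return [k * g for g in raw]

def render_to_speakers(tracks, source_positions, speakers):
    """Mix isolated tracks (equal-length sample lists) into one feed per speaker."""
    n = len(tracks[0])
    feeds = [[0.0] * n for _ in speakers]
    for track, pos in zip(tracks, source_positions):
        for feed, g in zip(feeds, dbap_gains(pos, speakers)):
            for i, s in enumerate(track):
                feed[i] += g * s
    return feeds

# Hypothetical example: a stereo pair, vocals centered, guitar near the right speaker.
speakers = [(-1.0, 0.0), (1.0, 0.0)]
vocals, guitar = [1.0, 0.5, 0.0], [0.0, 0.5, 1.0]
feeds = render_to_speakers([vocals, guitar], [(0.0, 1.0), (0.9, 0.5)], speakers)
```

DBAP was chosen here only because it works with any number of speakers in any layout, which mirrors the article’s point that the number of sources and speakers need not match.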

The sound field can be in a physical space, if it is generated by speakers, or in a virtual space, if it is generated by headphones or earphones. The function performed within the soundstage processor is based on computational acoustics and psychoacoustics. It takes into account sound-wave propagation and interference in the desired sound field, as well as the head-related transfer functions (HRTFs), which describe how a listener’s head and ears alter sound arriving from each direction in that field.
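One basic ingredient of modeling propagation and interference is the free-field path from a source to a listening point: the sound arrives later by distance divided by the speed of sound, and its amplitude falls roughly as 1/distance. The sketch below assumes a point source in free field, 343 m/s, whole-sample delays, and no reflections. Interference then appears simply as the sum of the delayed copies. The function names are hypothetical, and a real processor would do far more.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def propagate(signal, src, dst, fs=48_000, ref_dist=1.0):
    """Delay and attenuate a point-source signal traveling from src to dst.

    Free-field model: delay = distance / c (rounded to whole samples);
    amplitude scales as ref_dist / distance (spherical spreading).
    """
    dist = math.dist(src, dst)
    delay = round(dist / SPEED_OF_SOUND * fs)
    gain = ref_dist / max(dist, ref_dist)   # clamp: no boost closer than ref_dist
    return [0.0] * delay + [gain * s for s in signal]

def superpose(*signals):
    """Sum signals sample by sample; overlapping copies interfere by addition."""
    n = max(len(s) for s in signals)
    return [sum(s[i] for s in signals if i < len(s)) for i in range(n)]

# Hypothetical example: the same click from two speakers reaching one listener.
click = [1.0]
at_ear = superpose(propagate(click, (-1.0, 0.0), (0.0, 2.0)),
                   propagate(click, (1.5, 0.0), (0.0, 2.0)))
```

Because the two paths have different lengths, the clicks arrive at different sample offsets; for continuous tones, such offsets are what produce constructive and destructive interference.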

For example, if the listener is going to use earphones, the processor selects a set of HRTFs based on the configuration of the desired sound-source locations, then uses the selected HRTFs to filter the isolated sound-source tracks. Finally, the soundstage processor combines all the HRTF outputs to generate the left and right tracks for the earphones. If the music is going to be played on speakers, at least two are needed, but the more speakers, the better the sound field. The number of sound sources in the re-created sound field can be greater or less than the number of speakers.
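The earphone path described above (select HRTFs by source location, filter each isolated track, sum into left and right) can be sketched in the time domain. There, each HRTF is represented by its impulse response (HRIR), and filtering is convolution. This is a minimal illustration under stated assumptions: the HRIR table is hypothetical and made up, keyed by azimuth in degrees (negative = left). Selection is nearest-neighbor rather than interpolated, and real systems would use measured, much longer HRIRs and FFT-based convolution.

```python
def convolve(x, h):
    """Direct-form linear convolution (adequate for short illustrative filters)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def nearest_hrir(hrirs, azimuth):
    """Pick the (left, right) HRIR pair measured closest to the requested azimuth."""
    def angular_gap(a):
        d = abs(a - azimuth) % 360
        return min(d, 360 - d)
    return hrirs[min(hrirs, key=angular_gap)]

def binaural_mix(tracks, azimuths, hrirs):
    """Filter each isolated track with its HRIR pair and sum into left/right outputs."""
    left, right = [], []
    for track, az in zip(tracks, azimuths):
        h_l, h_r = nearest_hrir(hrirs, az)
        for out, h in ((left, h_l), (right, h_r)):
            y = convolve(track, h)
            out.extend([0.0] * (len(y) - len(out)))  # grow output buffer if needed
            for i, v in enumerate(y):
                out[i] += v
    return left, right

# Toy, made-up HRIRs: a source on the right reaches the right ear louder and earlier.
TOY_HRIRS = {
    -90: ([1.0, 0.0], [0.0, 0.4]),   # hard left
    0: ([0.7, 0.0], [0.7, 0.0]),     # straight ahead
    90: ([0.0, 0.4], [1.0, 0.0]),    # hard right
}
left, right = binaural_mix([[1.0, 0.0], [0.5, 0.5]], [80, 5], TOY_HRIRS)
```

The first track, placed at 80 degrees, snaps to the hard-right HRIR pair, so it lands mostly in the right channel, delayed and quieter in the left.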

We released our first soundstage app, for the iPhone, in 2020. It lets listeners configure, listen to, and save soundstage music in real time; the processing causes no discernible delay. The app, called 3D Music, converts stereo music from a listener’s personal music library, the cloud, or even streaming music into soundstage audio in real time. (For karaoke, the app can remove the vocals or play any isolated instrument.)

Earlier this year, we opened a web portal, which provides all the features of the 3D Music app in the cloud, plus an application programming interface (API) that makes those features available to streaming music providers and even to users of any web browser. Anyone can now listen to music in soundstage audio on essentially any device.

As sound reaches your ears, the unique characteristics of your head – its physical shape, the shape of your outer and inner ears, even the shape of your nasal passages – change the audio spectrum of the original sound.

We have also developed separate versions of the 3D Soundstage software for vehicle and home audio systems and devices, to re-create a 3D sound field using two, four, or more speakers. Beyond music playback, we have high hopes for this technology in videoconferencing. Many of us have had the fatiguing experience of attending videoconferences in which we had trouble hearing other participants clearly or were confused about who was speaking. With soundstage audio, the sound can be configured so that each person is heard coming from a distinct location in a virtual room. Or the “location” can simply be assigned according to the person’s position in the grid typical of Zoom and other videoconferencing applications. For some, at least, videoconferencing will be less fatiguing and speech will be more intelligible.
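Assigning a voice’s location from its place in a videoconference grid can be as simple as mapping the column to a left-to-right angle. The sketch below is one hypothetical way to do that, not the authors’ implementation. It spreads columns evenly across an assumed ±60-degree arc, ignores rows, and uses plain constant-power stereo panning as the simplest stand-in for a full soundstage rendering. With headphones, the same azimuth could instead drive HRTF selection.

```python
import math

def grid_azimuth(col, n_cols, spread_deg=60.0):
    """Map a participant's grid column to an azimuth in degrees.

    Columns are spread evenly from -spread_deg (far left) to +spread_deg
    (far right); a single column sits straight ahead at 0.
    """
    if n_cols == 1:
        return 0.0
    return -spread_deg + 2 * spread_deg * col / (n_cols - 1)

def constant_power_pan(azimuth_deg, spread_deg=60.0):
    """Stereo (left, right) gains whose squares sum to 1, so loudness stays steady."""
    p = (azimuth_deg / spread_deg + 1) / 2   # 0 = far left, 1 = far right
    p = min(max(p, 0.0), 1.0)                # clamp angles outside the arc
    theta = p * math.pi / 2
    return math.cos(theta), math.sin(theta)

# Hypothetical 3-column grid: each column's speakers get their own direction.
for col in range(3):
    az = grid_azimuth(col, n_cols=3)
    print(col, az, constant_power_pan(az))
```

Constant-power panning keeps a talker equally loud wherever they sit in the grid, so only the perceived direction changes.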

Just as audio moved from mono to stereo, and from stereo to surround and spatial audio, it is now starting to move to soundstage. In those earlier eras, audiophiles evaluated a sound system by its fidelity, judged by such parameters as bandwidth, harmonic distortion, data resolution, response time, lossless or lossy data compression, and other signal-related factors. Now soundstage can be added as another dimension of sound fidelity and, we dare say, the most fundamental one. To human ears, the impact of soundstage, with its spatial cues and gripping immediacy, is much more significant than incremental improvements in fidelity. This extraordinary feature offers capabilities previously beyond the experience of even the most deep-pocketed audiophiles.

Technology has fueled previous revolutions in the audio industry and is now launching another. Artificial intelligence, virtual reality and digital signal processing are leveraging psychoacoustics to offer audio enthusiasts capabilities they never had. At the same time, these technologies are offering record companies and artists new tools that will breathe new life into old recordings and open new avenues for creativity. Finally, the age-old goal of convincingly recreating the sounds of the concert hall has been achieved.

This article appears in the October 2022 print issue as “How Audio Is Catching Up.”
