Spatialization

Panning is the technique of shifting the amplitudes of a sound source between sound channels. Stereo panning is the technique of shifting the amplitudes of a sound source between a left channel and a right channel.
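
As a minimal sketch of stereo panning, the function below splits one sample between a left and a right channel. It assumes a constant-power pan law, which is one common choice and not the only one. The function name and the `pan` range of -1 to +1 are illustrative assumptions.

```python
import math

def constant_power_pan(sample, pan):
    """Split a mono sample into (left, right) amplitudes.

    pan: -1.0 is hard left, 0.0 is center, +1.0 is hard right
    (this range is an assumption made for the example).
    """
    # Map pan from [-1, 1] onto an angle in [0, pi/2].
    angle = (pan + 1.0) * math.pi / 4.0
    # cos/sin keep left^2 + right^2 constant as the sound moves.
    left = sample * math.cos(angle)
    right = sample * math.sin(angle)
    return left, right
```

At center, both channels receive the same amplitude (about 0.707 of the input), so the perceived loudness stays roughly steady as the source moves from side to side.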

Spatialization is the technique of placing a sound in three-dimensional space. Natural sounds exist in space, and our brains are adept at determining their location and direction. Spatializing a sound adds interest and depth.

Cues of Spatialization

We use many cues to determine the location and direction of a sound. Among auditory cues, we rely on amplitude, phase, differences in arrival time between the ears, high-frequency content, and reverberation.

A close sound is louder; a sound farther away is softer.
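
One simple way to model this cue is the inverse distance law, in which amplitude falls in proportion to distance (about 6 dB quieter for each doubling). The sketch below assumes a free field with no reflections; the function names and the `reference_distance` parameter are illustrative assumptions.

```python
import math

def distance_gain(distance, reference_distance=1.0):
    """Amplitude gain for a source at `distance`, using the inverse distance law.

    Sources closer than reference_distance are clamped to a gain of 1.0
    so the gain never grows without bound (an assumption for this sketch).
    """
    d = max(distance, reference_distance)
    return reference_distance / d

def gain_to_db(gain):
    """Convert a linear amplitude gain to decibels."""
    return 20.0 * math.log10(gain)
```

For example, `distance_gain(2.0)` returns 0.5, which `gain_to_db` reports as roughly -6 dB.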

If a sound arrives at the left ear earlier than at the right ear, we perceive the sound as coming from the left. We call this delay between the ears, which informs our sense of where the sound is, the "Interaural Time Delay" (also known as the interaural time difference, or ITD). Phase information in the sound also informs our brain of the location and direction.
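
To get a feel for the size of this delay, we can use Woodworth's spherical-head approximation, ITD = (r / c)(θ + sin θ), where r is the head radius, c is the speed of sound, and θ is the azimuth in radians. This is a rough sketch: the head radius, the speed of sound, and the sign convention (positive azimuth toward the left) are all assumptions chosen for the example.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C

def interaural_time_delay(azimuth_deg):
    """Approximate delay between the ears, in seconds, for a distant source.

    azimuth_deg: 0 is straight ahead, positive is toward the left.
    A positive result means the sound reaches the left ear first.
    """
    # The approximation is only meant for sources in front of the listener,
    # so clamp the angle to the range -90..+90 degrees.
    clamped = max(-90.0, min(90.0, azimuth_deg))
    theta = math.radians(clamped)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))
```

With these assumed values, a source directly to one side gives a delay of roughly two thirds of a millisecond, and a source straight ahead gives no delay at all.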

A close sound retains its full spectrum, while a sound farther away loses some of its high-frequency content.

Finally, reverberation informs our sense of space. A closer sound will appear to have less reverberation, while a sound further away will have more reverberation.

We constantly make small head movements, and these small movements can greatly improve our understanding of where a sound is in space.

If a sound source moves toward us, we perceive the sound as having a higher pitch. If it moves away from us, we perceive the sound as having a lower pitch. We call this the Doppler effect.
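
The pitch change can be sketched with the standard Doppler formula for a moving source and a stationary listener, f' = f · c / (c − v), where v is the source's speed toward the listener. The function name, the parameter names, and the default speed of sound are assumptions made for this example.

```python
def doppler_frequency(source_hz, approach_speed_m_s, speed_of_sound_m_s=343.0):
    """Frequency heard by a stationary listener.

    approach_speed_m_s: positive when the source moves toward the listener,
    negative when it moves away.
    """
    if approach_speed_m_s >= speed_of_sound_m_s:
        # The formula breaks down at or above the speed of sound.
        raise ValueError("source speed must be below the speed of sound")
    return source_hz * speed_of_sound_m_s / (speed_of_sound_m_s - approach_speed_m_s)
```

A 440 Hz source approaching at one tenth of the speed of sound is heard at about 489 Hz; the same source moving away at that speed is heard at 400 Hz.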

Three-Dimensional Sound

There are three dimensions when we set a sound in space: distance, elevation, and azimuth. Distance is how far the sound is from the observer. Elevation is how far above or below the observer the sound is. Azimuth is the horizontal direction of the sound around the observer: to the left, to the right, in front, or behind.
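
To make these three dimensions concrete, the sketch below converts distance, elevation, and azimuth into x, y, z coordinates. The axis convention is an assumption for this example: x points forward, y points to the left, z points up, azimuth 0 is straight ahead, and positive azimuth turns toward the left.

```python
import math

def to_cartesian(distance, azimuth_deg, elevation_deg):
    """Convert (distance, azimuth, elevation) to (x, y, z).

    Assumed convention: x forward, y left, z up; angles in degrees.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Project the distance onto the horizontal plane first.
    horizontal = distance * math.cos(el)
    x = horizontal * math.cos(az)
    y = horizontal * math.sin(az)
    z = distance * math.sin(el)
    return x, y, z
```

Under this convention, a source 1 unit away at azimuth 90 and elevation 0 lands directly to the observer's left, at approximately (0, 1, 0).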

Binaural audio simulates three-dimensional sound over headphones, using two channels of audio. Binaural audio works by encoding the spatial cues of a sound into those two channels. Most binaural recordings use a mannequin (a dummy head) with two specialized microphones inside.

We can also simulate binaural audio using a Head-Related Transfer Function (HRTF). An HRTF is a collection of mathematical filters that apply spatial cues to sound sources.

Another technology that enables spatialization is Ambisonics. In first-order ambisonics, there are four channels of audio: W, X, Y, and Z. These channels can be routed to any number of speakers in any arrangement around the observer. Many toolkits allow the sound designer to set each sound source's distance, elevation, and azimuth. The software then automatically sets each sound in space and creates the four channels. Ambisonics uses mathematical models to route the four channels to the speakers, and these models give each sound source a sense of space. With ambisonics, we can scale and spatialize a single recording for anything from 2 speakers to 512 speakers and beyond.
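
As a rough illustration of how a toolkit might create the four channels, the sketch below encodes one mono sample using the traditional first-order B-format equations. It assumes the classic convention in which W is scaled by 1/√2, azimuth 0 is straight ahead, and positive azimuth turns toward the left; other conventions exist, and the function name is hypothetical. Distance is left out here, since these equations encode direction only.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode a mono sample into first-order B-format (W, X, Y, Z).

    Assumed convention: W carries the omnidirectional signal scaled by
    1/sqrt(2); X points forward, Y points left, Z points up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)   # front-back component
    y = sample * math.sin(az) * math.cos(el)   # left-right component
    z = sample * math.sin(el)                  # up-down component
    return w, x, y, z
```

A source straight ahead puts its full signal in X and none in Y or Z, while W always carries the same scaled copy regardless of direction. Decoding these channels to a particular speaker layout is a separate step that this sketch does not cover.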