3D Audio: How It Works
Posted: April 10, 2000
Written by: Tuan "Solace" Nguyen

Introduction

Once in a while, a technological breakthrough comes along and astounds society, and surround sound is no exception. You've probably experienced surround sound in one form or another, or at the very least stereo sound. If your PC isn't primed for 3D audio, your hearing is missing out on the big leagues. And if you haven't heard of technologies such as Dolby Digital, A3D, EAX, or S3D, then hopefully this article will spark some needed interest. We'll tell you how surround sound and 3D audio work.

Wet Willy

Quite often you'll see a little button on your speakers labeled "3D Audio" or "Surround Sound" that enables a '3D effect' on your sound output, but to your dismay, the sound is either garbled, loses frequency response, or doesn't change at all. The technical name for this effect is "Expanded Stereo", and it is nowhere near true 3D audio. The technique we're discussing today is called "Positional 3D Audio". A system capable of this should be able to place sounds in the space around you: above, below, and behind you.

Having 3D sound isn't as significant in a movie as it is on the PC, because on the PC we want the sound to interact with what we're doing. We want to experience sound the way we hear it in real life. For example, when a Ferrari drives by you in Need For Speed IV, you want to be able to hear its trajectory, and the sound the engine makes as it gets closer to you should change in detail as well as in volume. This is the main difference between true interactive positional 3D audio and surround sound, which just immerses the listener in an audio environment but is less conscious of sound placement.

Surround sound was first introduced for feature films in theaters because the technology was too expensive for home use. Eventually it slid down into home entertainment systems and, in recent years, computers.

Types of Audio Enhancements

There are three methods that companies and developers use to enhance sound in a game or a feature film. Although they are almost completely different from one another, all three will enhance your listening experience. And some might even ruin it.

Expanded Stereo

Stereo sound was invented in the 1950s using two speakers, each one monophonic. Each speaker carries a separate channel, and each channel carries its own discrete signal. Stereo, while greatly superior to mono, has always been limited by its sound stage, or the size of the audio image perceived by the listener. The major problem was that stereo had a relatively small optimum listening point, which is where the listener must sit in order to get the best acoustic image.

Numerous techniques have been used to widen the stereo sound stage, and even the optimum listening point, by using delays or filters. Most of these, however, were poorly designed and implemented, producing muffled sound and out-of-phase signals with a great deal of frequency response loss. The bass would just die out and the high trebles sounded "oily".

The best-known companies specializing in expanded stereo (and usually found on many PC speaker systems) include SRS, Spatializer, and QSound. They use sophisticated algorithms to widen the sound stage more effectively without destroying the original signal. Expanded stereo is the most convenient method when you're tight on desktop space and low on cash. Some expanded stereo products are small devices that you connect between your soundcard and your speakers, adjusting the settings to your liking; some are done in software, such as Power Technology's Widener. A lot of low-to-midrange priced speakers have the technology built in. And even though its sound stage is still limited and it will never do positional effects, it is still a step up from plain stereo.

Surround Sound

Dolby Pro Logic Surround

The first of three types of surround sound is Dolby Pro Logic Surround. It is the oldest surround technology and consists of four channels of audio information: Left, Right, Center, and Surround, all encoded into two stereo channels. The center channel consists of equal signals from the left and right channels, while the surround channel is the left and right channels thrown out of phase with one another. If you play a Pro Logic encoded sound through two speakers you may hear a slight echo or sound shift in the audio output. Some home receivers and sound cards have signal processing capabilities that virtualize the center channel with just the front left and right speakers.



Dolby Digital 5.1

Also known as AC-3, which is Dolby's encoding system, Dolby Digital is a surround system that stores all channels separately. Left, Right, Center, Rear-Left, Rear-Right, and Low Frequency Effects each get their own discrete channel.

Its full name is Dolby Digital 5.1: the 5 states that there are five speakers, and the .1 stands for the Low Frequency Effects channel, or subwoofer. Because its response is limited to frequencies below 120Hz rather than the entire 20Hz to 20KHz range, it is denoted with .1. The AC-3 system can be compared to the MP3 encoding format in that it discards sounds that are supposedly inaudible to the human ear to reduce data size.

DTS

An abbreviation of Digital Theater Systems, DTS is similar to Dolby Digital. DTS is designed to be used with DVD audio and can also be used with multichannel audio CDs. DTS encodes and compresses the audio data like AC-3 but does so without discarding as much information. However, Dolby Digital is more widely used.

To listen to full Dolby Digital or DTS from your PC, you must have a soundcard with an S/PDIF (Sony/Philips Digital Interface Format) output that supports AC-3 or DTS. You can then pipe the signal to a receiver with six discrete speaker outputs that can decode AC-3 or DTS. Some newer soundcards can downmix AC-3 signals for use on quad outputs and virtualize the missing center channel.



Surround Sound Slow Down

Out of all the surround sound formats just discussed, hardly any are used as an interactive medium in games. This is because the decoding algorithms for AC-3 and DTS are so complex that decoding and output would take too long to keep up with what is happening onscreen. Consequently, Dolby Digital and DTS are mainly used for pre-rendered cinematic soundtracks.

Enter the world of Positional 3D Sound. As games became more sophisticated, and gamers became more sophisticated about the games they were playing, developers had to come up with ways to deliver the enveloping effect of surround sound while keeping it interactive at the same time.

The answer was the API, or Application Programming Interface: a software layer that gives games and applications direct access to the PC's sound hardware. Since the most complex functions are done in hardware, complex algorithms can be used to manipulate the digital signals.

The few Positional 3D Sound APIs that exist include: DirectSound3D, A3D, Sensaura, and Q3D. If you know about these technologies, you may be wondering why EAX and I3DL2 are not included as "Positional 3D Sound APIs". We'll touch on the reason a little later.

Positional 3D Audio

3D Sound in 3D Space

The way 3D sound works on a PC requires intensive mathematical computation. The software developer must specify where the sound source and the listener are located in X, Y, and Z coordinates. Other parameters they must consider are the direction the listener is facing relative to the sound source, the velocity of the sound source, and the way it radiates, conically or spherically. But before we explain how these sound APIs work, we're gonna tell you how your ears work.

Head Related Transfer Functions

There are two main cues that help us identify where a sound is coming from: the Interaural Intensity Difference (IID) and the Interaural Time Delay (ITD). This means that if a sound source is on your left, your left ear will hear it louder and sooner than your right ear. It is also interesting to know that the shape of your outer ear helps the brain more accurately define the sound's location in space. Other factors that help localize a sound are head shadow and shoulder bounce, which also alter the sound. We also localize high frequencies better than low ones. This is why it is important to place certain speakers in certain positions: your subwoofer can be placed anywhere, because your ears cannot tell which direction the bass is coming from, but your satellites, which can reproduce almost the entire sound spectrum, must be placed according to where you are.
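To put numbers on the delay cue, here is a sketch of the ITD using the classic Woodworth spherical-head approximation; the head radius and function name are illustrative assumptions, not figures from the article:

```python
import math

HEAD_RADIUS = 0.0875    # metres, a commonly used average
SPEED_OF_SOUND = 343.0  # m/s at room temperature

def itd_seconds(azimuth_deg):
    """Interaural Time Difference for a distant source, per the
    Woodworth spherical-head model: the far ear's extra path is
    the straight chord plus the arc around the head."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (math.sin(theta) + theta)

# A source directly to one side (90 degrees) reaches the near ear
# roughly two thirds of a millisecond before the far ear:
delay = itd_seconds(90)
```

Fractions of a millisecond are all the brain needs, which is why digital 3D audio has to be sample-accurate about these delays.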



How HRTFs were designed is relatively simple. Researchers placed a dummy head in an echo-free room and installed microphones in the ear canals. A sound source was then moved around the head at a constant distance, and the difference between the acoustic spectral responses at both ears was measured and recorded. The set of these measurements is called a Head Related Transfer Function: a mathematical representation of how the human ear perceives sound. It is composed of three parts: the near-ear response, the far-ear response, and the Interaural Time Difference (the time difference between the near and far ear). This process is repeated for different head and ear sizes to arrive at a generalized HRTF model suitable for a wide range of listeners.

Piping a sound signal through an HRTF filter should make the sound appear to originate from the location of that specific filter. For example, a sound passed through an HRTF filter that was measured at 145 degrees behind you will seem as though it came from there.
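"Piping a signal through an HRTF filter" is, mathematically, a convolution of the signal with the measured ear response. A toy, self-contained sketch of direct-form convolution (real-time engines use much faster FFT-based methods):

```python
def convolve(signal, impulse_response):
    """Convolve a signal with a filter's impulse response: every
    input sample is smeared through the measured ear response,
    which is exactly what applying an HRTF filter does."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

# A single click through a short two-tap "response":
filtered = convolve([1.0, 0.0], [0.5, 0.25])  # [0.5, 0.25, 0.0]
```

In a real system there is one such filter pair (left ear, right ear) per measured direction, and the engine picks or interpolates the pair matching the source's position.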

HRTFs alone cannot position sounds accurately if a signal intended for your left ear is also heard by your right ear through the left speaker. Developers must also add transaural crosstalk cancellation signals to keep the sound from reaching the opposite ear. These signals are usually inverse waveforms of the original waveform.
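A heavily simplified sketch of that idea, assuming the leakage around the head can be modeled as a single delayed, attenuated copy (real crosstalk cancellers filter per frequency band and apply the correction recursively):

```python
def crosstalk_cancel(left_ear_signal, head_delay_samples, leak_gain):
    """Build the signal to add to the RIGHT speaker so that the
    left speaker's leakage around the head is cancelled at the
    right ear: an inverted, delayed, attenuated copy."""
    cancel = [0.0] * head_delay_samples
    cancel += [-leak_gain * x for x in left_ear_signal]
    return cancel[:len(left_ear_signal)]

# A click meant for the left ear gets a quieter, inverted echo
# sent from the right speaker one sample later:
correction = crosstalk_cancel([1.0, 0.0, 0.0, 0.0], 1, 0.5)
```

This is why speaker-based 3D audio has a sweet spot: the cancellation only lines up when your head sits where the delay and gain assume it is.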





Now that you know how 3D sound works in conjunction with your ears, let's get down to the 3D Sound APIs that create the magic.

DirectSound3D

Developed as part of the DirectX library, Microsoft's own API was designed to offer basic volume, panning, and Doppler shifts. Unlike with its other APIs, Microsoft decided that, as of DirectX 5.0, DS3D would allow developers to use whatever property sets and additional code they wish to extend it. This gives DS3D access to proprietary hardware with exclusive effects.

If your soundcard is DS3D capable, it can take DS3D calls and render the sound output in any manner it wants. If your soundcard does not support DS3D calls in hardware, DS3D will render the sound output itself in software. Although you may think this is great because you won't need to fork out those mint bills for a new soundcard, DS3D in software produces nothing close to what the sound would be like had it been rendered by hardware, and it requires that you give up a great number of CPU cycles. On the upside, DS3D in software does offer more efficient rendering algorithms than AC-3 and DTS, and two levels of HRTFs are packed into DS3D to help achieve the effect.

DS3D in DirectX 7.x also includes voice management for capable hardware. This lets the application determine the number of 3D sound streams the soundcard can render simultaneously. If a request exceeds the maximum number of streams the soundcard can handle, the extra streams are passed on to the CPU to render. If this is still not possible, the 3D audio stream is downmixed to simple stereo. Plain stereo signals don't require complex filtering algorithms, so a soundcard can render many more stereo streams than 3D audio streams.
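The fallback chain described above can be sketched roughly as follows; the function and labels are our own illustration, not the actual DirectX interface:

```python
def allocate_voices(stream_count, hw_max, cpu_max):
    """Sketch of the DirectX 7 voice-management idea: requested 3D
    streams fill hardware slots first, then software (CPU) slots,
    and anything beyond that falls back to a plain stereo mix."""
    placement = []
    for i in range(stream_count):
        if i < hw_max:
            placement.append("hardware-3d")
        elif i < hw_max + cpu_max:
            placement.append("software-3d")
        else:
            placement.append("stereo-downmix")
    return placement

# Five streams on a card with two hardware voices and a CPU
# budget of two more: the fifth stream drops to plain stereo.
plan = allocate_voices(5, hw_max=2, cpu_max=2)
```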

Aureal3D



A3D is an API developed in-house by Aureal Semiconductor. It is hailed as the most effective 3D sound API, and those who have heard it would agree that it is nothing less than amazing. Aureal's method is purely based on mathematics: it decided that the best way to achieve accurate 3D sound was to model the on-screen environment and manipulate the sound source accordingly. A3D2.0 actually takes the geometry of the game scene into account and uses wavetracing to calculate the sound's reflections and occlusions in real time. Occlusions are what you hear when a sound reaches you with another object, such as a wall, in the way: a muffled, low-volume effect. A3D2.0 is also capable of rendering height (z-axis) relations. A3D1.0, by contrast, was simply based on HRTFs with front and rear effects.

With A3D3.0, Aureal has added geometric reverb effects, support for EAX1.0 and 2.0, Dolby Digital decoding and MP3 decoding, and volumetric sounds such as large crowds that appear on a soundstage too big for point sources.

All this is handled by different algorithms that Aureal has developed for different speaker modes: satellites, quad speakers, monitor speakers, and headphones. As with all surround techniques, there is a sweet spot where one must be situated in order to hear the most convincing effect. Since earphones always put you in the optimum position, Aureal naturally recommends them as the best way to hear the most realistic effect. If you decide to set up four-speaker mayhem, be aware that HRTFs are only used on the front speakers. But the rears still augment the effect better than a two-speaker setup.



You can find A3D technology with soundcards that feature the Vortex 1 or Vortex 2 chips. Vortex 1 handles only A3D1.0 while Vortex 2 is capable of A3D1.0, 2.0 and 3.0 as well as EAX. Some cards that you may have heard of include:

Aureal SQ1500 (Vortex1) and SQ3500 (Vortex2),
Diamond Sonic Impact S90 (Vortex1) and Monster Sound MX300 (Vortex2),
Xitel Storm VX (Vortex1) and Storm Platinum (Vortex2),
Turtle Beach Montego II (Vortex1) and Montego II Quadzilla (Vortex2),
TerraTec XLerate (Vortex1) and XLerate Pro (Vortex2)


Most Aureal based cards have dual stereo outs, meaning that you can connect two sets of speakers for front and rear effects. Of course you’ll want the space to do so. If you’re worried that you don’t have enough room for four speakers, don’t sweat it. A3D technology is designed to work with two speakers. And once you’ve heard A3D, you’ll wonder how you ever got by with stereo. Just for kicks, if you have a capable sound card, try spinning your favorite MP3 around your head! :)

Sensaura3D



Based on similar HRTF algorithms, Sensaura differentiates itself from its competitors with its proprietary MultiDrive system. Unlike Aureal, which only uses HRTFs on its front channels, Sensaura uses HRTFs on all four channels. All four channels are thus based on Interaural Time Difference (ITD), or delay. While Aureal uses panning (which changes volume), Sensaura claims that ITDs are more important, because with panning you must sit in a "sweet spot" equidistant from the left and right speakers for the effect to work.



Sensaura’s technology is still maturing, while competitors such as A3D are already mature; Sensaura has yet to perfect its approach. But the other technologies Sensaura uses are very interesting to know about.

Sensaura’s MacroFX technology simulates sounds that are very close to your ears, such as when you scratch your head or someone whispers into your ear. The system still has some quirks to be worked out, but it already works very well.

ZoomFX simulates volumetric sounds, just like A3D, and EnvironmentFX supplies reverb effect presets. Sensaura also supports EAX1.0, 2.0, and I3DL2.

A few of the companies supporting Sensaura’s technology include:

Diamond Monster Sound MX400 (ESS Canyon 3D),
Guillemot Maxi Sound Fortissimo (ESS Canyon 3D),
Terratec DMX (ESS Canyon 3D),
Yamaha WaveForce 192/192Digital (Proprietary Yamaha YMF924).

Q3D



Q3D is from the creators of QSound. Q3D2.0 is an acceleration layer on top of DS3D. It is efficient and supports EAX1.0 in software. Q3D is also able to virtualize Dolby Digital, emulating the center channel with the front left and right speakers. Currently, Q3D employs HRTFs when you use headphones, and panning similar to A3D when you use speakers.

An interesting technology from QSound is Q123, which expands monophonic signals. The effect attempts to disperse certain frequencies to the left and right speakers, simulating stereo. In practice the effect can make the sound imagery confusing: at times, certain sounds, like a drum beat, suddenly switch speakers. The original mono sound may even sound better unfiltered.

Expanded Stereo, Surround Sound, and Positional 3D Audio are all methods of enhancing sound and making it more realistic. Expanded Stereo is the least expensive of the three, and Surround Sound the most, because of the receivers and speaker setups involved. Neither of the first two offers interactivity with the listener; their output is usually entirely predefined. Some early games that offered options better than stereo used Dolby Pro Logic, but that method died out because it was uneconomical for the player.

So where are EAX and I3DL2?

Reverberation

Reverb is a method of layering echoes onto a sound to make it seem like it is playing in a certain environment. Speaking in a small bedroom sounds much different than speaking in a large hall: the sound travels further and takes longer to bounce back, causing echoes. In a small room with drywall walls, more of the sound is absorbed and less returns to the source than in a hall with concrete walls. Reverb takes into account the size of the enclosure and whether the environment is open or closed. Another factor is material: speaking in a room with foam-covered walls sounds much different from a room with wooden walls.
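The layering of echoes can be illustrated with a toy sketch: progressively quieter copies of the signal added at increasing delays. Real reverb units use dense networks of such delays plus filtering to model wall materials; all names here are our own:

```python
def simple_reverb(dry, delay_samples, decay, repeats=4):
    """Layer progressively quieter, later echoes onto a dry signal.
    `decay` models absorption: concrete would have a high decay
    value (echoes linger), foam a low one (echoes die quickly)."""
    wet = list(dry) + [0.0] * (delay_samples * repeats)
    for n in range(1, repeats + 1):
        gain = decay ** n          # each bounce is quieter
        offset = delay_samples * n  # and arrives later
        for i, x in enumerate(dry):
            wet[i + offset] += gain * x
    return wet

# A single click followed by two half-as-loud echoes:
echoed = simple_reverb([1.0], delay_samples=2, decay=0.5, repeats=2)
```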

These specific echoes help our brains identify the environment we are in. They do not, however, indicate where the sound is actually coming from. Reverb is a method of changing the sound stage and sound environment; it does not use any HRTF or panning techniques.

EAX



Creative introduced Environmental Audio Extensions well over a year ago, and it’s still doing very well and gaining leverage every day. You may have known that already, but what you may not know is that EAX is not a 3D API. It doesn’t take a sound and place it in 3D space. It has no idea where sounds are coming from, and it may never know.

But what it does know, it knows well. Simply put, EAX is a series of reverb techniques. It simulates sound in different environments; with it, you can tell the difference between a sound inside a metal pipe and a sound inside a stadium. You may say, “but everyone says EAX does 3D sound!” And they are partially correct. EAX sits on top of DS3D, modifying the 3D streams and filtering them through its 26 standard presets, from small room to sewer pipe to stadium. The amount of reverb applied then simulates the distance of objects. So, using DS3D as its positional 3D audio backbone, EAX further enhances the sense of the environment.

While Aureal had advanced effects like reflections and occlusions, EAX was just 26 reverb presets. Meet EAX2.0, Creative Labs’ answer to Aureal. EAX2.0 now includes support for reflections and occlusions. Also new to the table are early reflections, the echoes that precede the main reverb in the real world. These give a better simulation of environmental size and of the location of the sound source.

As of April 2000, Creative was working on EAX3.0, which will bring reverb morphing. This allows the reverb effect to change dynamically, gradually or quickly, as the environment changes. For example, the music playing from the stereo in your room changes slowly as you walk away from it, out of the room, into the hall, and out into the kitchen. Another feature in the works is one-shot reflections, used to simulate ricochet effects such as gunshots in a large room.

Currently the most regarded APIs are Aureal’s A3D and Creative’s EAX. Although they are different technologies for different purposes, EAX is more widely used than A3D. This is because DS3D already sits underneath EAX, and while it is not as good as A3D, it is already there and therefore much easier to program for. All the programmers have to do is program for DS3D and choose the appropriate EAX preset, and everything is set. With A3D, they have to program for DS3D and, on top of that, for A3D’s wavetracing technology. But if the effort is put into designing for A3D, the end result, according to the majority, sounds better than EAX-enabled DS3D.

I3DL2

You may notice that we kept I3DL2 for last. This is because it is neither a surround, 3D positioning, nor reverb technology. Developed by the Interactive Audio Special Interest Group (IA-SIG), I3DL2 (Interactive 3D Audio Rendering Guidelines, Level 2) is a set of minimum acceptable 3D audio features that all platforms should have. Aureal, Creative, Sensaura, and other companies contributed to this standard. You can think of I3DL2 as the DirectX of sound. Currently, Level 3 is being developed and may include Creative’s EAX morphing.

Conclusion

Which technology should you choose? Well, you must consider the following factors. Everything lies on top of DS3D, which by itself does not sound very good because it is missing a lot of effects; this is where the other APIs come into play. Whichever one you end up choosing, you should definitely consider the number of speakers you’ll be using, and you’ll also want to check which games support each API. If you are going for accuracy and realistic positional 3D audio, consider soundcards based on Aureal’s Vortex2 or ESS’s Canyon 3D technology. If you’re planning to use more than two satellites, then Sensaura is the best choice. If you’re only using two front speakers, then A3D is the best.

While Creative Labs’ Sound Blaster Live! series of soundcards doesn’t really offer any proprietary 3D acceleration, it does accelerate DS3D with its own EAX API, enhancing the base DS3D streams. And while its approach is not as accurate as A3D or Sensaura, the effect does "immerse" you. However, the majority of gamers say that nothing beats A3D. The Live! cards do bring a lot more to the table as well, such as an abundance of hardware reverb effects, and they are designed for the musician too, while Sensaura and A3D are directed more toward games.

One important thing to note is that Creative is working with Microsoft to incorporate EAX into DirectX. If this does happen, then EAX will be available to any soundcard that is compatible with DirectX, providing one less reason to buy Creative’s card.

Test each technology out and see which one you like best. We can soak you up with all the information there is about these technologies, but the end choice is really up to you and your ears.


All Content Copyright ©Dan Kennedy; 1999