The Magic of Dither by Stephen Dawson

Most manufacturers have moved from 16-bits, to 24-bits, to 32-bits. Here, Stephen Dawson explores what happens to musical signals, and test tones, when you reduce the number of bits, in this case from 16-bits down to just 8-bits, and how dither can magically come to your aid...

Digital audio is not a subject lending itself to intuitive human understanding. But we try to use words best suited to such understandings to describe it. Unfortunately, that too often leads us astray.

So we have arguments between audiophiles and engineers on the value—even the audibility—of different forms of digital audio. Is DSD better-sounding than PCM, or is it merely a different way of representing a digital signal of a certain resolution? Does an increase in the PCM sampling frequency from 44.1kHz to 96kHz, or even to 192kHz, yield audible improvements? If so, how are they manifested? Is the increased resolution of 24-bit audio worth the doubling in file size*?

How do you even judge these things?

A good start is by gaining a strong understanding of what audible effects these things can actually have on music. But that can be difficult. Most people would accept that the difference in quality between 16-bit and 24-bit PCM sound is likely to be subtle, so it can be difficult to put your finger on what the precise difference is, if any.

Being human, our judgements about these things can be influenced by how we conceptualise them. But, as I stated at the outset, digital audio is not easily intuitive. So we might think in analogies. Simple math shows that 16-bit PCM defines sound on a scale with 65,536 levels. With 24-bits that scale goes to 16,777,216 levels. It is therefore tempting to think that, because of the increased number of levels, 24-bit PCM is going to sound smoother than 16-bit PCM, since each sample is more precisely defined. Surely 16-bits will sound 'grainier', by analogy with coarse beach sand vs. the fine talc of 24-bits.

And it should sound even grainier still if you drop down even further to a mere 8-bits. That, after all, encodes each sample to one of just 256 levels! Are we now talking pebbles, rather than grains?

We all know 8-bit sound is awful. Or, at least, those of us old enough to remember the system sounds on Windows 95 (and earlier) computers, many of which were just 8-bits (and often 22.05kHz or lower in sampling frequency). But how bad the sound is, and bad in what way, is determined by more than just the bit depth. The sound can be treated, even 8-bit sound, to make it much better than you might expect.

Let's illustrate this. I am using the final 13 seconds of the track Kangaroo Street, Part 1 from the album 'Walk into the Sun' by the Sydney group B'Jezus (thanks to Jez Ford, editor of sister magazine Sound+Image, and leader of the group, for permission to use the track**). This is of course in standard CD quality (16-bits, 44.1kHz sampling), and the section consists of a final refrain followed by a nicely natural acoustic fade out. I added one second of digital silence at the start to allow time for DACs to lock on.

You can listen to it HERE 

Then I did a straight down-conversion to 8-bits. No changes other than that, and with no dither added.

You can listen to it HERE 

Listening to this is fascinating. For the first second there is still no sound, because digital black in 8-bits is identical to digital black in 16-bits. But the instant the actual signal starts a background noise becomes obvious. For the first six or seven seconds it sounds more or less like white noise in the background of music that otherwise sounds the same as the original. But as the music quietens, particularly during the decay of the gently plucked guitar strings, there is a distinct static-like feel, and the final decay of the guitar sounds like it has been placed in a fast modulation envelope, something which is definitely not there in the original.

This is the sum total of what happens when music is naively reduced in resolution. First, there is white noise. Second, there can be artefacts associated with low levels of signal. With 8-bits of resolution, any parts of the signal below –48dB will be encoded at either one or zero bits of resolution, effectively as square waves. That will both generate odd-order harmonics and, in the case of repeating tones, emphasise the fundamental so that it peaks at –48dB.
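Where the –48dB figure comes from can be checked with a few lines of arithmetic. The sketch below is illustrative only: it is written in Python, the helper name `step_level_db` is made up, and it assumes the common convention of measuring one quantisation step against the full range of 2^bits levels, which works out at roughly 6dB per bit.

```python
import math

def step_level_db(bits):
    """Level of one quantisation step relative to the full range of 2**bits levels."""
    return 20 * math.log10(1 / 2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits}-bit: {2 ** bits:,} levels, one step at {step_level_db(bits):.1f}dB")

# 8-bit: 256 levels, one step at -48.2dB
# 16-bit: 65,536 levels, one step at -96.3dB
# 24-bit: 16,777,216 levels, one step at -144.5dB
```

On this convention every 8 bits added or removed moves the bottom of the scale by about 48dB.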

So what can we do about it? Dither is the answer! Dither is low-level noise that is added to the signal. This has the effect of removing correlations and breaking up artefacts but in so doing it increases the level of the white noise. So the signal itself is just as pure as the original higher-resolution one, but it is now immersed in a fair bit of noise.
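To see how added noise can preserve detail smaller than one quantisation step, here is a minimal Python sketch. It assumes the simplest form of dither, a uniformly random offset of up to one step added before rounding down. The helper names and the test value of 0.3 steps are invented for illustration and are not taken from any particular software.

```python
import math
import random

def quantise(x):
    """Round to the nearest whole step, with no dither."""
    return math.floor(x + 0.5)

def quantise_dithered(x, rng):
    """Add a random offset of up to one step, then round down (assumed dither type)."""
    return math.floor(x + rng.random())

rng = random.Random(0)   # fixed seed so the run is repeatable
level = 0.3              # a steady value of 0.3 steps, too small to represent directly
count = 100_000

plain = [quantise(level) for _ in range(count)]
dithered = [quantise_dithered(level, rng) for _ in range(count)]

print(sum(plain) / count)      # 0.0: without dither the detail is simply lost
print(sum(dithered) / count)   # averages out near 0.3
```

Each dithered sample is still only 0 or 1, but the average tracks the original value. The price is the sample-to-sample randomness, which is heard as noise.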

You can hear the result HERE:

So what can we do about that? Noise-shaping is the answer! Basic dither just randomises the least significant bit of the signal—it randomly adds, or doesn't add, one. Noise-shaped dither is noise which is stronger in some frequency bands than in others. There are plenty of shapes available, but the one that suits our purposes is one that reduces the level of noise in the most audible part of the spectrum at the cost of increasing it in less audible parts. For this purpose I chose an aggressive one called ‘E2’ in my noise-shaping software, with a nominal 1-bit of depth. All the noise below about 11kHz is lower than with plain dithering, and more than 24dB lower down at 1kHz. But above 15kHz the noise is much louder. Listening-wise, it is far less objectionable and leaves the music much clearer, as you’ll be able to hear at tinyurl.com/bjezus-8-bit-shaped-dither. Fine-tuning with different noise-shaped curves and reduced levels of dither can improve the results further.
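The idea of trading noise in one band for noise in another can be sketched with the simplest possible noise shaper, first-order error feedback, in which each sample's quantisation error is subtracted from the next sample before it is quantised. This is an assumption chosen for brevity and is not the 'E2' curve, which is far more elaborate; it merely pushes noise towards high frequencies. The function names, the 1kHz test tone and the 64-sample averaging used as a crude "low band" measure are all illustrative.

```python
import math
import random

def quantise_plain(signal, rng):
    """Dither, then round down: every sample's error is independent (white noise)."""
    return [math.floor(x + rng.random()) for x in signal]

def quantise_shaped(signal, rng):
    """First-order error feedback: remove the previous error before quantising."""
    out, prev_error = [], 0.0
    for x in signal:
        target = x - prev_error
        y = math.floor(target + rng.random())
        prev_error = y - target          # error made on this sample
        out.append(y)
    return out

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def total_noise(signal, quantised):
    """RMS of the error across the whole band."""
    return rms([q - x for q, x in zip(quantised, signal)])

def low_band_noise(signal, quantised, block=64):
    """RMS of the error after averaging blocks of samples (a crude low-pass filter)."""
    errors = [q - x for q, x in zip(quantised, signal)]
    starts = range(0, len(errors) - block + 1, block)
    return rms([sum(errors[i:i + block]) / block for i in starts])

rng = random.Random(1)
# One second of a 1kHz tone, measured in quantisation steps (illustrative values).
signal = [20.3 * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(44100)]
plain = quantise_plain(signal, rng)
shaped = quantise_shaped(signal, rng)

print("low band:", low_band_noise(signal, plain), low_band_noise(signal, shaped))
print("total:   ", total_noise(signal, plain), total_noise(signal, shaped))
```

In this sketch the shaped version has less noise in the low band but more noise overall, which is the same bargain described above: quieter where the ear is sensitive, louder where it is not.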

The graphics illustrate these shapes. Graph 1 shows the spectrum of a 980Hz sine wave at –80dB recorded (digitally created, actually) in 16-bits. The sound is located HERE 

Graph 1 980Hz sine wave at -80dB
 

 Graph 2 ('980Hz - 8 bits - no dither.png') shows a naive down-conversion to 8-bits (i.e. with no dither added). You will notice that not only has the noise floor increased from –136dB to –100dB, but there are now massive amounts of harmonic distortion—you can see distortion components right out to the fifteenth harmonic. Even more disturbingly, instead of being at –80dB, the main tone is now at –48dB and most of the distortion components are higher in level than the fundamental itself is supposed to be! The sound is located HERE

Play this loud and the 980Hz tone sounds way too loud, and rather like a square wave, thanks to the harmonics.

Graph 2 980Hz sine wave at 8 bits, no dither.

Once we add dither, proper order is restored, as shown in Graph 3 ('980Hz, 8-bits, Unshaped dither.png'). The harmonics disappear and the 980Hz tone resumes its proper level. But the noise floor is now at –88dB, an increase of 12dB. This means that although the tone is at the same level as previously, it is now only 8dB higher than the noise, and I was unable to hear it when I turned up the volume. You can listen to it HERE

Graph 3 980Hz 8 bits unshaped dither
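The behaviour in Graphs 1 to 3 can be reproduced approximately with a short Python sketch. It makes several assumptions that are not drawn from the article: the tone is synthesised with no dither in the 16-bit original, the undithered conversion simply discards the low 8 bits, the dither is a uniform random offset of one 8-bit step, and levels are read from a single-bin DFT with 32768 and 128 taken as full scale. The exact figures will differ from the graphs, but the pattern is the same.

```python
import math
import random

RATE, FREQ, SECONDS = 44100, 980, 5

def tone_level_db(samples, full_scale):
    """Level of the FREQ component in dB relative to full_scale (single-bin DFT)."""
    re = im = 0.0
    for n, s in enumerate(samples):
        angle = 2 * math.pi * FREQ * n / RATE
        re += s * math.cos(angle)
        im += s * math.sin(angle)
    amplitude = 2 * math.hypot(re, im) / len(samples)
    return 20 * math.log10(amplitude / full_scale)

rng = random.Random(0)
peak = 32768 * 10 ** (-80 / 20)   # -80dB on a 16-bit scale, about 3.3 steps
sixteen = [round(peak * math.sin(2 * math.pi * FREQ * n / RATE))
           for n in range(RATE * SECONDS)]

truncated = [s >> 8 for s in sixteen]                         # drop low 8 bits, no dither
dithered = [(s + rng.randrange(256)) >> 8 for s in sixteen]   # random offset first

level_16 = tone_level_db(sixteen, 32768)         # roughly -80dB
level_truncated = tone_level_db(truncated, 128)  # roughly -46dB: far too loud
level_dithered = tone_level_db(dithered, 128)    # roughly -80dB again
print(round(level_16, 1), round(level_truncated, 1), round(level_dithered, 1))
```

Without dither the tone comes out tens of dB louder than it should, in the same region as the –48dB seen in Graph 2. With dither it returns to about –80dB.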

Finally, Graph 4 ('1kHz - 8 bits - shaped dither E2.png') shows what happens when the original tone is down-converted to 8-bits with the addition of noise-shaped dither. With the E2 noise-shaped dither, the 980Hz tone is still at the proper level, and still undistorted (no harmonic distortion components visible), but it’s now around 30dB above the noise floor in its frequency band. In the band of frequencies to which the ear is most sensitive, the noise is greatly lowered. When I listened to this, the tone was audible and clean with the volume advanced. You can listen/download it HERE

Graph 4. 980Hz - 8 bits - shaped dither

Let's pause for a moment. This sine wave peaks at –80dB, which is 32dB below what you'd expect to be the resolving ability of 8-bit PCM. Yet with the addition of some shaped dither—the addition of some artificial noise—it is clear and clean.

So when assessing 16-bit versus 24-bit sound, you know what to listen for. Just remember the noise will be 48dB quieter, and any artefacts in undithered material will not only be 48dB quieter, they will also be much rarer since much less content is encoded to less than –96dB than there is to –48dB. # Stephen Dawson

* In the real world, lossless compression is much less efficient with the additional 8-bits than with the most significant 16-bits, because the additional bits mostly encode noise.

(** You can hear the complete track, along with the rest of the album, at https://soundcloud.com/bjezus)

ADDENDUM:

In the article above, I made mention of the fact that the 980Hz signal I used was made LOUDER by a naive 16-bit to 8-bit undithered conversion, so in this on-line addendum I explore the weirdness that my experiments revealed, and explain why it happened.

Let’s create a sine wave of 980 hertz with a sampling rate of 44.1kHz and 16 bits of resolution. Let this sine wave be of an extremely low level, with a peak of -80dB. Here’s its spectrum:

[980 hertz, -80dB, 16 bits]

(This was generated with a small amount of dither, which is why it has a noise floor at -138dB and is unaccompanied by the nasty harmonics that would normally be present with a sine wave that is sketched out in values of +3 to -3.) Note that the spike reaches -80dB.

Now let’s down-convert it to 8 bits, accompanied by some dither noise:

[980Hz - 8 bits - unshaped dither]

Note the very high level of the noise, to the point that the 980 hertz signal barely peeks out over the top. This signal’s spike reaches -80dB, just as it did before it was downconverted.

So let’s go back to the 16 bit original and down-convert it again to 8 bits, this time with some aggressive noise shaping:

[980Hz - 8 bits - shaped dither E2]

That’s better. The 980 hertz signal stands well out from the noise, and if you turn the whole thing up it is clearly audible. It, once again, reaches -80dB.

Now what I’ve been leading up to: again we return to the 16 bit original, and again we down-convert to 8 bits. But this time there’s no dither, no added low level noise. Instead we simply divide the 16 bit value of each sample by 256, and map it onto the nearest 8 bit integer value. Here’s the spectrum:

[980Hz - 8 bits - no dither]

Lots of harmonics, quite a bit of noise (although at a lower level than with the plain dithering) and something that’s distinctly odd. Instead of the 980 hertz spike reaching -80dB, it makes it up to -48dB. How could reducing its precision from 16 bits to 8 bits have caused that?

Well, here’s the thing: -80dB is lower than the minimum value definable in an 8 bit signal. The lowest quantisation level is -48dB. So the only way this wave form can be represented is by toggling between values for zero and one. Let’s zoom in on the wave form:

[980Hz - 8 bits - no dither wave form]

As you can see, our sine wave has become a kind of square wave, which explains all those harmonics. But it also explains why the fundamental is way too powerful. A sample value of ‘1’ is -48dB on an 8 bit scale, so that’s what our signal becomes.
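The two-level wave form is easy to reproduce. This Python sketch synthesises the tone without dither and uses floor division by 256 as the conversion, which is an assumption: it is a variant that yields the toggling described above (between 0 and -1 here, rather than 0 and 1), whereas rounding values of ±3 to the nearest 8 bit step would give all zeros.

```python
import math

RATE, FREQ = 44100, 980
peak = 32768 * 10 ** (-80 / 20)   # -80dB on a 16 bit scale, about 3.3 steps

# One second of the tone as 16 bit integers, then an undithered conversion.
sixteen = [round(peak * math.sin(2 * math.pi * FREQ * n / RATE)) for n in range(RATE)]
eight = [s // 256 for s in sixteen]   # assumed conversion: floor division, no dither

print(sorted(set(sixteen)))   # [-3, -2, -1, 0, 1, 2, 3]
print(sorted(set(eight)))     # [-1, 0]

# Draw one 45-sample cycle: "-" for a sample of 0, "_" for a sample of -1.
cycle = eight[:RATE // FREQ]
print("".join("-" if s == 0 else "_" for s in cycle))
```

Whatever the shape of the original sine wave, every sample ends up at one of two values a full 8 bit step apart, so the result is a rectangular wave whose size is set by the step rather than by the original -80dB signal.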

Which is why dither is so vitally important... at least with 8 bit audio. # Stephen Dawson