Dark Side of the Disc Article, Pink Floyd remastered DSotM Review
May 19 2003, 17:08
Joined: 27-March 02
From: California, USA
Member No.: 1631
Interesting review of SACD vs. CD audio mastering of Pink Floyd's "Dark Side of the Moon" in the Stereophile web site. Seems that mastering quality plays a significant role in the sound difference between the two.
Stereophile Article: Dark Side of the Disc
Was that a 1 or a 0?
May 20 2003, 19:31
Joined: 12-January 03
Member No.: 4542
This topic, which partly concerns how good the CD format can be, prompted me to cover a few points about dither, the difference between dynamic range and signal-to-noise ratio, and how to visualise audio more as the ear hears it, rather than relying on rules of thumb drawn from the waveform view that don't quite match what you can hear. So it's a bit long, with pictures, but I will return to SACD at the end.
I saw the Dark Side Of The Moon SACD with red book CD layer on sale recently and suspected I'd soon read about it on HA.org for falling foul of this trend towards overloud remastering of the CD version, and here it is!
By the way, to visualise the effects of clipping, a pure full-scale sine tone at 1760Hz (as used below) amplified by just +1.0 dB (12% amplitude) and peak-clipped to full scale introduces numerous harmonics (overtones) at multiples of the original pure 1760 Hz frequency:
1760 Hz sinewave, +1.0 dB FS amplitude, clipped, frequency analysis.
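For anyone who wants to reproduce the clipping plot without a WAV editor, here is a rough sketch in plain Python. It is not how the figure was made: the block length of 2205 samples (chosen so that exactly 88 cycles fit, which avoids needing a window function) and the helper names are my own assumptions.

```python
import math

FS = 44100      # sample rate in S/s
F0 = 1760       # tone frequency in Hz
N = 2205        # assumed block length: holds exactly 88 cycles of 1760 Hz
GAIN_DB = 1.0   # overdrive above full scale

def clipped_tone():
    """Sine amplified by GAIN_DB, then hard-clipped to +/-1.0 (full scale)."""
    amp = 10 ** (GAIN_DB / 20)  # +1.0 dB is roughly 1.12x amplitude
    out = []
    for n in range(N):
        s = amp * math.sin(2 * math.pi * F0 * n / FS)
        out.append(max(-1.0, min(1.0, s)))
    return out

def harmonic_db(x, k):
    """Level in dB FS of the k-th harmonic of F0, from a single-bin DFT."""
    cycles = k * F0 * N // FS  # whole number of cycles of this harmonic in the block
    re = sum(v * math.cos(2 * math.pi * cycles * n / N) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * cycles * n / N) for n, v in enumerate(x))
    amplitude = 2 * math.hypot(re, im) / N
    return 20 * math.log10(max(amplitude, 1e-20))  # guard against log(0)

tone = clipped_tone()
h1 = harmonic_db(tone, 1)
h2 = harmonic_db(tone, 2)
h3 = harmonic_db(tone, 3)
for k in (1, 2, 3, 5, 7):
    print(f"harmonic {k} ({k * F0} Hz): {harmonic_db(tone, k):.1f} dB FS")
```

Because the clipping is symmetric, the energy lands on the odd multiples (5280 Hz, 8800 Hz and so on), while the even multiples stay far lower.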
Another point, as I understand it, is that SACD's DSD encoding is completely unlike PCM (which is used for CDs and DVD-Audio) and doesn't have a simple maximum sample value as a hard limit. For that reason the SACD spec includes a loudness measurement on the released music to ensure it doesn't fall outside the spec. This is more like the analogue limits imposed on vinyl mastering to prevent excessive demands on stylus movement (except you don't have to turn down the bass to fit within the limits, then boost it back in a turntable pre-amp). Having a vague limit (and specific measurements) can stop the record companies from pressuring their mastering engineers to make a release sound "hot" and louder than everything else in the CD changer, so SACD, rather like vinyl, cannot so easily become subject to this loudness war.
PCM (CD, WAV, etc) samples the waveform a fixed number of times per second and represents the analogue value with a fixed number of bits, like 16 or 24, giving 2^16 or 2^24 instantaneous levels.
16-bit has about 6dB x 16 bits = 96 dB maximum Signal To Noise Ratio (SNR)
24-bit has about 6dB x 24 bits = 144 dB maximum SNR (although the analogue electronics won't be this good)
Dither is essential to minimise harmonic distortion and make the digital audio suitably analogue, and flat dither tends to add about 3 dB more noise, reducing the SNR by about 3 dB accordingly.
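The 6 dB per bit rule of thumb above can be checked with a couple of lines of Python. This is only the idealised arithmetic (full scale relative to one quantisation step); the 3 dB flat-dither penalty is simply the approximate figure quoted above, and the function name is my own.

```python
import math

def max_snr_db(bits):
    """Full scale relative to one quantisation step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    ideal = max_snr_db(bits)
    with_flat_dither = ideal - 3.0  # approximate flat-dither penalty quoted above
    print(f"{bits}-bit: about {ideal:.0f} dB ideal, about {with_flat_dither:.0f} dB with flat dither")
```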
Noise shaped dither will reduce the perceived noise by about 15 dB for 44.1 kHz sampling and about 36 dB for 96 kHz sampling (according to Frank Klemm's information), but the measured noise in waveform view will be larger, reducing the measured SNR. At the same time the perceived SNR has improved, as has the perceived dynamic range.
You can visualise this effect by looking at the spectral plot (e.g. Frequency Spectrum in Cool Edit or Exact Audio Copy), which is relatively close to the way the ear's cochlea works (different parts detect different frequencies). With about 1024 sample FFT size and a Blackman window function, you get a pretty good picture. A logarithmic frequency scale from about 20 to 20000 Hz is also more ear-like, but doesn't show the energy-density of noise shaped dither clearly, so I'm using linear frequency here.
Here's a 0 dB (full-scale) 1760 Hz sine wave (two octaves above concert middle A) with flat dither, in stereo:
1760 Hz sinewave, 0 dB FS amplitude (just before clipping), flat dither, frequency analysis.
The tone was created in Foobar2000 v0.62a as Add Location... tone://1760,2 with ReplayGain values manually set to 0dB for gain, 1.0 for peak, with only Mono to Stereo DSP used, converting (Diskwriter) to 16-bit PCM dithered, flat dither (no noise shaping), tone generator set to 44100 S/s, oversample 32x. It was analyzed in EAC's WAV editor (which only accepts 44100 S/s stereo, 16 bit WAVs)
Even on flat-dithered 16 bit audio, as above, you'll note that the noise in each spectral bin is below -96 dB. For an FFT size of 1024 with a Blackman window function (giving similar time resolution/averaging time to the ear) the noise is about -120 dB in each frequency bin for flat dither. This means the normalised power (not amplitude) in each frequency bin is 1e-12 (10 to the power of -12, or 10^(-12)), and over the whole power spectrum of 512 bins this adds up to 5.12e-10 of total dither noise power. Converting to dB (10 x log10(5.12e-10)), this comes to -93 dB, which is the expected signal to noise ratio after adding dither. (Note that a 1024-point FFT gives 512 bins in the power spectrum because of negative frequencies, which make sense in complex mathematics but are ignored for the power spectrum.)
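The per-bin to total-power sum in the previous paragraph is easy to verify. A minimal sketch, using only the numbers already given (512 bins at about -120 dB each); the variable names are mine:

```python
import math

bins = 512            # power-spectrum bins from a 1024-point FFT
per_bin_db = -120.0   # approximate flat-dither noise level per bin

per_bin_power = 10 ** (per_bin_db / 10)   # dB to normalised power: 1e-12
total_power = bins * per_bin_power        # powers add, dB values do not
total_db = 10 * math.log10(total_power)   # back to dB

print(f"total dither noise: {total_power:.3g} ({total_db:.1f} dB)")
```

The point of the exercise is that the noise floor you read off a spectrum plot depends on the FFT size: each doubling of the number of bins pushes the per-bin level down by about 3 dB while the total stays the same.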
Now, with Garf's strong ATH noise shaping (recommended) dither applied to 16-bit audio in Foobar2000 v0.62a, sample values can reach a peak sample value of +32 and have an average RMS power of -70 dB, but although this is about 26 dB more power, it's perceived to be considerably quieter (perhaps 15-20 dB quieter) than flat dither because most of that extra power is concentrated in high frequencies to which the ear is insensitive, while there's less power in the frequencies where the ear is most sensitive:
1760 Hz sinewave, 0 dB FS amplitude, strong ATH noise shaping dither, frequency analysis.
That example includes the same 1760 Hz tone at 0 dB FS. This one is silence plus strong ATH noise shaping dither, just for illustration of the dither spectrum alone:
Silence, strong ATH noise shaping dither, frequency analysis.
You could imagine how much more noise we could fit in up to 24 kHz if we had a 48 kHz sampling frequency. This might buy a further few dB less noise in the 1-4 kHz region while still preventing truncation distortion as dither ought to.
Add up the power of all the components and the total dither power comes to -70dB over the full spectrum, but in the important regions it's well below the -120 dB per bin we had with flat dither. It's more like -138 dB per bin, at about 1-4 kHz where the ear is very sensitive (e.g. babies crying!). 18 dB less is about 1/64th of the power spectral density.
Incidentally, the 1760 Hz tone when scanned in FB2K, generates a replaygain of -15.22 dB, so that tells you how piercingly loud the full scale signal would sound if played without RG, partly because it's in the region where the ear is very sensitive.
Now, let's consider how the ear manages to perceive a sine wave supposedly below the noise floor of 16-bit audio, at -102 dB (amplitude = 0.25 bits, peak-to-peak = 0.5 bits). Remember, the simple Signal to Noise Ratio (SNR) is 96 dB, so a signal at -102 dB FS, by that rule of thumb, ought to be simply lost. Thanks to dither and the way the ear works similarly to the frequency spectrum, that's not the case, and the dynamic range for perceiving tonal frequencies is greater than the SNR.
This is where dither is essential. I'll manually set the RG values to -102 dB to create this tone.
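To put numbers on how small that tone is, here is a quick conversion from dB FS to 16-bit steps, plus the tone's period in samples. It assumes full scale corresponds to 32768 steps; the helper name is hypothetical.

```python
FULL_SCALE = 32768   # assumed 16-bit full-scale amplitude in steps (LSBs)
FS = 44100           # sample rate in S/s
F0 = 1760            # tone frequency in Hz

def dbfs_to_lsb(level_db, full_scale=FULL_SCALE):
    """Convert an amplitude in dB FS to a number of quantisation steps."""
    return full_scale * 10 ** (level_db / 20)

amplitude_lsb = dbfs_to_lsb(-102.0)
peak_to_peak_lsb = 2 * amplitude_lsb
period_samples = FS / F0

print(f"amplitude: {amplitude_lsb:.2f} LSB, peak-to-peak: {peak_to_peak_lsb:.2f} LSB")
print(f"period: {period_samples:.2f} samples")
```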
Using flat dither (no noise shaping), you cannot discern the 25.06 sample period of the 0.25-bit high sine wave when you zoom in on the waveform:
1760 Hz sinewave, -102 dB FS amplitude, flat dither (no noise shaping), waveform view (about 6 full periods of the sinusoidal waveform are shown, believe it or not!).
...but if you look at the frequency spectrum of the exact same waveform, you'll see that the -102 dB peak at 1760 Hz is sufficiently above the dither noise (-120 dB/bin) to be easily spotted:
1760 Hz sinewave, -102 dB FS amplitude, flat dither, frequency analysis.
This is a much better representation of how the ear perceives things than the waveform view where your eye can't pick out the correlated timings of those sporadic spikes in the least significant bit to notice the frequency that's actually present there. The ear contains resonant detectors which can pick out the frequency, rather like a spectrum analyzer.
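If you'd like to try this without Foobar2000, the following is a rough simulation rather than a recreation of the files above. My assumptions: plain triangular (TPDF) dither of +/-1 step stands in for "flat dither", the block is 17640 samples so the tone sits exactly on a DFT bin (no window needed), and the noise floor is estimated from a handful of neighbouring bins. All names are made up for the sketch.

```python
import math
import random

FS, F0 = 44100, 1760
N = 2205 * 8                          # 17640 samples = exactly 704 cycles
AMP_LSB = 32768 * 10 ** (-102 / 20)   # about a quarter of one 16-bit step
TONE_BIN = F0 * N // FS               # DFT bin the tone falls on

def quantise(dither, seed=1):
    """Round the tiny sine to whole steps, optionally adding TPDF dither first."""
    rng = random.Random(seed)
    out = []
    for n in range(N):
        s = AMP_LSB * math.sin(2 * math.pi * F0 * n / FS)
        if dither:
            s += rng.random() - rng.random()  # triangular PDF, +/-1 step
        out.append(round(s))
    return out

def bin_power(x, k):
    """Power in DFT bin k (direct sum, no FFT library needed)."""
    re = sum(v * math.cos(2 * math.pi * k * n / N) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * k * n / N) for n, v in enumerate(x))
    return re * re + im * im

undithered = quantise(dither=False)
dithered = quantise(dither=True)

# Average a few bins either side of the tone as a noise-floor estimate.
noise_bins = [TONE_BIN + d for d in range(-12, 13) if abs(d) > 2]
noise = sum(bin_power(dithered, k) for k in noise_bins) / len(noise_bins)
tone_over_noise_db = 10 * math.log10(bin_power(dithered, TONE_BIN) / noise)

print("undithered samples all zero:", not any(undithered))
print(f"dithered: tone bin is {tone_over_noise_db:.1f} dB above neighbouring bins")
```

Without dither every sample rounds to zero and the tone is simply gone; with dither it stands well clear of the surrounding bins, even though no individual sample looks like part of a sine wave.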
So people who worry that a -96 dB sine wave would disappear when applying -6 dB of ReplayGain might be reassured by this demonstration. It does require dither to be guaranteed to work, although other tones in the music often provide partial dither even with undithered reproduction. Simple MP3 decoders in portables, for example, don't suddenly start losing the quiet tones within the music when you apply mp3gain or do deep fadeouts using mp3DirectCut. You can try various things out more audibly with Foobar2000, by playing with the settings in the preferences. You may be surprised how good 8-bit 44.1 kHz audio with strong ATH noise shaped dither can sound! 8-bit audio uses sample values from -128 to +127 and has about 48 dB SNR (whereas 16-bit uses 1/256th the step size, giving -32768 to +32767).
With strong ATH noise shaping dither, the same signal looks even more deeply buried in noise on the waveform view, but the absence of noise spectral components at the ear's most sensitive frequencies actually helps it stand out even more clearly in the spectral view compared to the noise level at frequencies around it:
1760 Hz sinewave, -102 dB FS amplitude, strong ATH noise shaping dither, frequency analysis.
Remember that the original 0 dB FS sinewave had a perceived volume (replaygain calculation) of 89 dB SPL + 15.22 dB = 104.22 dB SPL. The -102 dB FS one, shown above, has a perceived loudness of about 2.22 dB SPL (sound pressure level), and you can see that there's scope for a fair bit more reduction before it sinks into the noise floor on the frequency spectrum (even with flat dither, let alone strong ATH noise shaping).
Now, music isn't all tonal frequencies like sine waves, and it includes transients, percussive noises and vocal sibilants, which are far more noiselike and broad-band in the frequency spectrum.
To try this out, I generated some pink noise at 44100 S/s stereo, 16-bit. Foobar2000 reported the ReplayGain as -2.07 dB (peak = 0.757446), meaning the original was 91.07 dB SPL. (Programs like Cool Edit can generate noise)
pink noise, 91.07 dB SPL perceived loudness, frequency analysis.
I then silenced 0.5 seconds in the middle of the 2 second generated noise using zero-crossing adjustment, then faded in a further 0.2 seconds in the right channel only (from 0% to 100% linear) and saved this as pink_noise_edited_91.07dB_SPL.wav . This modulation of the noisy sound's volume is the sort of thing one needs to perceive (e.g. beating drum, cymbal attack/decay, vocal sibilant sound) and the left-right difference might equate to some impression of stereo image panning.
To bring the noise to a similar perceived loudness as the -102 dB FS tone, i.e. 2.22 dB SPL, I manually entered replaygain values of -88.87 dB (peak amplitude would now be 0.89 of a bit) and wrote it out using the diskwriter (with strong ATH noise shaping dither) then renamed that file pink_noise_edited_02.22dB_SPL.wav
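As a sanity check on the "0.89 of a bit" figure, here is the arithmetic in Python, using the peak and gain values quoted above and assuming 32768 steps at full scale. The variable names are mine.

```python
FULL_SCALE = 32768         # assumed 16-bit full-scale amplitude in steps
original_peak = 0.757446   # peak reported by Foobar2000 (1.0 = full scale)
original_loudness = 91.07  # dB SPL, from the ReplayGain scan
manual_gain_db = -88.87    # gain entered by hand

scaled_peak_lsb = original_peak * FULL_SCALE * 10 ** (manual_gain_db / 20)
resulting_loudness = original_loudness + manual_gain_db

print(f"peak after gain: {scaled_peak_lsb:.2f} of one step")
print(f"perceived loudness after gain: about {resulting_loudness:.1f} dB SPL")
```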
On waveform view, the pink noise signal was completely hidden in the shaped dither, which looked like a flat signal with no variation in amplitude over time. I amplified it by 48.7 dB (by normalising to 25%) to hear it easily, making it equivalent to 8-bits. The sudden cessation of noise then its restoration on the left ear with rapid fade-in on the right was completely obvious despite the extremely low volume and peak amplitude.
Given the broadband, noiselike nature of the signal, it is difficult to see in the frequency spectrum where the noise is present and when it disappears. However, the spectral view (spectrogram) in a WAV editor colour-codes the power of the frequency components as they vary with time, and after 48 dB amplification, it's loud enough to see quite clearly:
pink noise with cut then fade in on right, 2.22 dB SPL perceived loudness, strong ATH noise shaped dither, then amplified 48.7dB, spectrogram.
I redid the amplified noise as an 8-bit file, with the same dither type (now at 50.22 dB SPL perceived loudness) straight from fb2k, and it sounds the same, as you can hear in (this Monkey's Audio .APE compressed file, 59KB). With this sample, it's somewhere around 40 dB SPL that the variation becomes barely audible in an 8-bit file, so for 16-bit turned up loud, it would be around 48 dB lower, i.e. -8 dB SPL.
Given that the noise could reach a peak level of 1.0000 even without distortion and dynamic limiting, which is 2.41 dB louder than the 91.07 dB SPL, i.e. it's at 93.48 dB SPL, even for pink noise signals, the usable dynamic range is about 101.5 dB (and considerably more with dynamic compression).
For a mixture of noise and tonal signals (the latter may be perceived louder than noise-like signals, such as the 1760 Hz full scale sine tone perceived as +104.22 dB SPL), there's a usable range of around 112 dB before dynamic compression. For sounds with reasonable tonality (i.e. less like uncorrelated noise) they can be perceived over a wider dynamic range in 16-bit audio, of perhaps 120 dB (depending where you decide to call the cut-off).
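The dynamic range figures above come from a short chain of additions, which may be easier to follow written out. This sketch only restates the numbers already quoted (89 dB SPL ReplayGain reference, the scanned gains and peak, and my rough -8 dB SPL audibility floor for 16-bit); nothing here is newly measured.

```python
import math

REFERENCE_SPL = 89.0      # ReplayGain reference level used above, dB SPL
AUDIBILITY_FLOOR = -8.0   # rough 16-bit threshold estimated above, dB SPL

noise_replaygain = -2.07  # scanned gain of the pink noise, dB
noise_peak = 0.757446     # scanned peak of the pink noise (1.0 = full scale)
tone_replaygain = -15.22  # scanned gain of the full-scale 1760 Hz tone, dB

noise_loudness = REFERENCE_SPL - noise_replaygain  # as generated
headroom_db = 20 * math.log10(1.0 / noise_peak)    # room left before peak = 1.0
loudest_noise = noise_loudness + headroom_db       # noise normalised to full scale
noise_range = loudest_noise - AUDIBILITY_FLOOR

tone_loudness = REFERENCE_SPL - tone_replaygain    # full-scale tone
mixed_range = tone_loudness - AUDIBILITY_FLOOR

print(f"headroom: {headroom_db:.2f} dB, loudest noise: {loudest_noise:.2f} dB SPL")
print(f"usable range for noise: {noise_range:.1f} dB, with tones: {mixed_range:.1f} dB")
```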
I think this also demonstrates how sounds sink gracefully into the noise floor with dithered digital audio and aren't abruptly cut off as one might expect from looking at the waveform view.
Anyhow, getting back to SACD's DSD scheme, this operates at 2.8224 MHz sampling rate with a single bit of a fixed differential change up or down in voltage at each cycle per channel. Effectively, the resolution of the system is all created by dither (noise shaped) at inaudibly high frequencies (frequencies that may well be filtered out by the electronics and loudspeakers). It isn't necessary to use any brickwall filtering at around 22 kHz to prevent aliasing. This is 5645 kbps of data in stereo, most of which is involved in dithering at frequencies >30 kHz. Large amplitudes could be encoded at low frequencies, but such large amplitudes are not possible at high frequencies, so to preserve bandwidth and audible quality, limitations on the loudness of the signal are imposed by the SACD format. To follow even quite a loud 1 kHz sinewave upwards, one would have a number of up and down transitions every 0.35 µs, with the up transitions slightly outweighing the down transitions by a suitable amount so that the average position of the wobbly curve follows the sine wave as closely as possible given that it has to go up or down at each clock cycle. In essence, it's pulse density modulation.
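To give a feel for pulse density modulation, here is a toy first-order sigma-delta modulator in Python. To be clear, this is my own illustration and not the modulator used for real DSD, which as far as I know is a higher-order design with far better noise shaping. The half-scale amplitude and the 128-bit averaging window (a very crude stand-in for a low-pass filter) are arbitrary choices.

```python
import math

DSD_RATE = 2_822_400   # 1-bit samples per second per channel
TONE_HZ = 1000
AMPLITUDE = 0.5        # half of full scale, an arbitrary choice
WINDOW = 128           # crude low-pass: average this many bits

def sigma_delta(samples):
    """First-order modulator: emit +1/-1 so the running average tracks the input."""
    integrator, feedback, bits = 0.0, 0.0, []
    for x in samples:
        integrator += x - feedback          # accumulate the error so far
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return bits

# One cycle of a 1 kHz sine at the DSD rate (2822 one-bit samples).
signal = [AMPLITUDE * math.sin(2 * math.pi * TONE_HZ * n / DSD_RATE)
          for n in range(DSD_RATE // TONE_HZ)]
bits = sigma_delta(signal)

# Compare the average of the bitstream with the average of the input, window by window.
errors = []
for start in range(0, len(bits) - WINDOW + 1, WINDOW):
    mean_bits = sum(bits[start:start + WINDOW]) / WINDOW
    mean_in = sum(signal[start:start + WINDOW]) / WINDOW
    errors.append(abs(mean_bits - mean_in))
worst_error = max(errors)

print(f"worst window error: {worst_error:.3f} of full scale")
```

Every output value is just +1 or -1, yet the proportion of +1s in each stretch follows the sine wave, which is the "up transitions slightly outweighing the down transitions" idea described above.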
CD has 1411 kbps of data, or about a quarter as much, and can achieve about 115-120 dB of effective dynamic range in the audible band using noise shaped dither etc. on two channels. At about 5645 kbps for two channels, SACD could effectively dither to an adaptable tradeoff between the effective bit depth (resolution) and the effective bandwidth. SACD is designed for 120 dB SNR in the audible bandwidth and up to 100 kHz maximum bandwidth (albeit with worse SNR outside the audible bandwidth than within it). Furthermore, with lossless compression, they can also put a 6 channel audio stream on the same disc.
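The raw data rates being compared here follow directly from the sample rates and bit depths, before any error correction or lossless packing. A quick check in Python, with names of my own choosing:

```python
CHANNELS = 2

CD_RATE = 44_100        # samples per second per channel
CD_BITS = 16            # bits per sample

DSD_RATE = 2_822_400    # samples per second per channel
DSD_BITS = 1            # bits per sample

cd_kbps = CD_RATE * CD_BITS * CHANNELS / 1000
dsd_kbps = DSD_RATE * DSD_BITS * CHANNELS / 1000
ratio = dsd_kbps / cd_kbps
oversampling = DSD_RATE / CD_RATE
bit_period_us = 1e6 / DSD_RATE   # time between DSD bits in microseconds

print(f"CD: {cd_kbps:.1f} kbps, DSD stereo: {dsd_kbps:.1f} kbps ({ratio:.0f}x)")
print(f"DSD runs at {oversampling:.0f} x 44.1 kHz, one bit every {bit_period_us:.3f} us")
```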
Super Audio Compact Disc: A Technical Proposal (PDF format)
The above article also suggests some of the uses of adaptable bandwidth versus resolution tradeoff for archiving various studio media digitally or to allow different mastering techniques to be used in future while remaining compatible with existing home audio equipment.
Personally, I think the inability to set a hard limit and light up those peak meters, combined with the ability to directly convert to noise-shaped dithered 16-bit 44.1 red-book PCM, may perversely be something that restores decent mastering values and dynamics to music if this format gets off the ground, even if it's not necessary to have quite so much dynamic range and frequency response. Also, it's harder to make a bad SACD player (or DVD-A player) than a bad CD player because the steep anti-alias filtering near the Nyquist limit and even the error correction and concealment aren't remotely as critical.
SACD cannot be maxed out to a hard limit the same way as CD, and if the production is done in the DSD domain with direct conversion to red book CD format for the second layer of the disc as described in the link above, the dynamics and mastering of the CD layer will be the same as for the SACD layer. Alternatively, if the mass market are deemed to want their music heavily compressed, they can have the CD layer mastered that way while the SACD layer is mastered properly, so at least decent music is available (and hopefully on the same disc you bought before upgrading to SACD).
Really, what I hope for is a return to analogue thinking and discipline, and keeping an engineering safety margin to the limits of the medium rather than knowing you can go right up to full scale. I fear that DVD-A could in no time become as heavily compressed and boring as recent CDs and that remasters of old albums on DVD-A could be as badly damaged as some remastered CDs of late.