Topic: When and to what should I apply dithering?

Hello folks.

I'm currently developing a thin audio library for audio decoding, resampling and mixing. I've read up on dithering but I'm not sure when exactly to apply it, and to which signal to actually add the dither. My architecture is as follows:

All source formats are decoded to 32-bit float samples where possible (for example, Vorbis using vorbisfile, MP3 using mpg123, and MIDI file rendering using fluidsynth). Some other formats, like MODs (IT, S3M, XM, etc.), are rendered directly to 32-bit integer and then simply converted to 32-bit float as-is (the nominal float range is always [-1.0..1.0)). PCM audio stored in WAV or RIFF files is converted straight to 32-bit float from whatever the source bit depth is. I haven't added FLAC decoding yet, but since it's a lossless format, I assume it decodes to the same bit depth as the PCM data it compresses, so handling will effectively be identical to WAV/RIFF sources.
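For readers following along, the integer-to-float step described above can be sketched like this. It is a minimal illustration assuming the common convention of dividing by 2^(bits-1), so the nominal range is [-1.0, 1.0); 8-bit WAV data is unsigned with 128 as the zero level, and 24-bit samples are assumed to be already unpacked and sign-extended into an int32_t. All function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers: integer PCM -> float in the nominal range [-1.0, 1.0).

// 8-bit WAV PCM is unsigned; 128 is the zero level.
inline float u8ToFloat(std::uint8_t s) {
    return (static_cast<int>(s) - 128) / 128.0f;
}

// 16-bit signed PCM.
inline float s16ToFloat(std::int16_t s) {
    return s / 32768.0f;  // 2^15
}

// 24-bit signed PCM, assumed already sign-extended into an int32_t.
inline float s24ToFloat(std::int32_t s) {
    return s / 8388608.0f;  // 2^23
}

// 32-bit signed PCM. The division is done in double (exact), then rounded
// to the nearest float. A float significand holds only 24 bits, so values
// very close to full scale (e.g. INT32_MAX) end up as exactly 1.0f.
inline float s32ToFloat(std::int32_t s) {
    return static_cast<float>(s / 2147483648.0);  // 2^31
}
```

The last helper is the one relevant to the MOD path: a 32-bit integer render carries more bits than a float can hold, so that conversion already rounds.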

Next comes resampling, where every 32-bit float signal is converted to the target sampling rate if needed (or wanted).

After decoding is done, every signal, which is now 32-bit float, is first attenuated according to its volume setting (with a volume factor of 1.0 meaning "don't touch this at all"; otherwise the samples are multiplied by the volume factor).

Then comes the mixing, which happens by simply adding all the samples together (the joys of 32-bit float mixing.)
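The attenuate-then-sum mixing described above fits in a few lines. This sketch assumes all sources are already at the same sample rate and stored as std::vector<float>; the Source struct and mixSources name are hypothetical. The sum is allowed to leave [-1.0, 1.0), which float handles without trouble; clipping (or limiting) only needs to happen at the final conversion to integer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical source description: decoded, resampled float samples
// plus a volume factor (1.0 = leave untouched).
struct Source {
    std::vector<float> samples;
    float volume = 1.0f;
};

// Mix all sources into one buffer of length `frames`: scale each sample by
// its source's volume and add it in. Sources shorter than `frames` are
// treated as silence past their end. No clipping is done here.
std::vector<float> mixSources(const std::vector<Source>& sources, std::size_t frames) {
    std::vector<float> out(frames, 0.0f);
    for (const Source& src : sources) {
        const std::size_t n = std::min(frames, src.samples.size());
        for (std::size_t i = 0; i < n; ++i) {
            out[i] += src.samples[i] * src.volume;
        }
    }
    return out;
}
```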

So in the end I have a final output signal, consisting of 32-bit float samples, which needs to be sent to the audio device. At this stage, I convert to the appropriate format for the device by multiplying by an appropriate factor. If any sample is equal to or above 1.0, or lower than -1.0, I simply clip it for now (I plan to add a proper limiter in the future).

Now the question is: where in the above chain of events would I want to dither? Should I do it after the final mix is converted to the output format, meaning applying dither to the 8-bit/16-bit/24-bit samples just before they're sent to the audio device? Should it instead happen to the final mix while it's still in 32-bit float format? Or should it happen to each source signal individually? And in that last case, for formats that don't decode directly to 32-bit float, should it happen before or after the conversion to float? Also, does resampling play a role here (if each signal needs dither applied individually, should it happen before or after it has been resampled)?

And on a related matter, if currently only one signal is active (meaning nothing gets mixed), and it's not been attenuated at all (meaning having a volume factor of 1.0), is dithering still needed?

Thanks for any insights you can provide on this.

Reply #1
Very simply, as long as you are working with 24 significant bits, as in 32-bit float that is not scaled by some ridiculously low factor, you do not dither. The only time you should dither is when you reduce the bit depth to 16 bits or less, and even at 16 bits dither is not really necessary, though it does little harm.

Edit: if you are working in 32-bit float but the data are only 16 bits, and you do no processing that would add more bits, then do not dither.
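One way to see why unprocessed 16-bit data needs no dither: every 16-bit value survives the trip through float and back exactly, so converting back to 16 bits never has to discard anything. A minimal check, assuming the common scale factor of 32768; the helper names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helpers using a scale of 2^15 = 32768.
inline float s16ToFloat(std::int16_t s) { return s / 32768.0f; }

// Assumes x is in [-1.0, 1.0). For values that came from 16-bit samples,
// x * 32768 is an exact integer, so the rounding step has nothing to round.
inline std::int16_t floatToS16(float x) {
    return static_cast<std::int16_t>(std::lround(x * 32768.0f));
}

// Returns true if every 16-bit value round-trips through float unchanged.
bool allS16RoundTripExactly() {
    for (int v = -32768; v <= 32767; ++v) {
        const auto s = static_cast<std::int16_t>(v);
        if (floatToS16(s16ToFloat(s)) != s) return false;
    }
    return true;
}
```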

Reply #2
There is no point in dithering at any stage except when reducing precision.

Reducing bit depth (moving to a format with less precision) introduces quantization noise. The overall spectral power of the noise will be similar for any reasonable bit depth reduction method, but if you simply truncate, that noise may be concentrated into harmonic distortion. Dither allows you to control the spectral characteristics of the noise.
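A small numerical illustration of the point about truncation and dither: a constant level of 0.3 LSB rounds to zero every time, so it simply disappears, while adding triangular (TPDF) dither before rounding keeps its average intact at the cost of some added noise. This is a toy sketch working directly in LSB units; the function names, the seed and the choice of TPDF dither are assumptions for illustration.

```cpp
#include <cmath>
#include <random>

// Quantize a value (in LSB units) to the nearest integer, no dither.
inline long quantizePlain(double x) { return std::lround(x); }

// Quantize with TPDF dither: the sum of two independent uniform values
// in [-0.5, 0.5) LSB, added before rounding.
inline long quantizeTpdf(double x, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(-0.5, 0.5);
    return std::lround(x + u(rng) + u(rng));
}

// Average of `n` quantized copies of a constant input `x`.
double meanQuantized(double x, int n, bool dither, unsigned seed = 12345) {
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += dither ? quantizeTpdf(x, rng) : quantizePlain(x);
    }
    return sum / n;
}
```

Without dither, meanQuantized(0.3, 100000, false) is exactly 0; with dither it comes out close to 0.3. A single uniform (RPDF) value would also remove the average error; TPDF additionally makes the noise power independent of the signal, which is why it is a common default.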

Please watch this Xiph video, especially the chapters on bit depth and dithering.

Don't dither when going from 32-bit floats (which have 24 bits of significand precision) to 24-bit. 24-bit quantization noise is going to be inaudible with simple truncation. If you're giving 8-bit output, dither will probably be vital to avoid annoying distortion. For 16-bit output it's a toss-up.
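To put rough numbers on "inaudible" versus "vital": the textbook signal-to-quantization-noise ratio of an ideal N-bit quantizer driven by a full-scale sine is about 6.02N + 1.76 dB. A quick sketch of that formula (the function name is hypothetical; real converters, and dithered outputs, will have somewhat more noise than this ideal):

```cpp
#include <cmath>

// Ideal SNR (dB) of an N-bit quantizer driven by a full-scale sine,
// assuming uniformly distributed quantization error:
//   20*log10(2^N) + 10*log10(3/2)  ~=  6.02*N + 1.76
double idealQuantizerSnrDb(int bits) {
    return 20.0 * bits * std::log10(2.0) + 10.0 * std::log10(1.5);
}
```

That gives roughly 50 dB at 8 bits, 98 dB at 16 bits and 146 dB at 24 bits, which lines up with the advice above.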

Reply #3
Quote
And on a related matter, if currently only one signal is active (meaning nothing gets mixed), and it's not been attenuated at all (meaning having a volume factor of 1.0), is dithering still needed?

Yes, if you resample, or if the output bit depth is less than the signal's bit depth.

Reply #4
True, always dither when reducing bit depth.
Be aware that DAWs perform their internal processing at a high bit depth, usually 32-bit float or more.
This means that when you "touch" a 16-bit track in a DAW (even for very small corrections), the bit depth is increased to the full internal bit depth, so the result should be dithered before saving the file.
This is true even if the original and the target file have the same bit depth.

skyp

Reply #5
FLAC (command-line) is limited by design to 24 bits; every 32-bit file encoded with flac will be "truncated" (no dither) to 24 bits.
For 32-bit lossless, use WavPack.

Reply #6
Thanks everyone. This clears a few things up.

Just a few details:

Quote
And on a related matter, if currently only one signal is active (meaning nothing gets mixed), and it's not been attenuated at all (meaning having a volume factor of 1.0), is dithering still needed?

Yes, if you resample, or if the output bit depth is less than the signal's bit depth.

Does it matter that resampling happens at 32-bit float? And does the quality of the resampling affect the decision? I support different resamplers as back-ends (SRC, Speex from Opus-tools and SoXR). For example, if I resample a 32-bit float signal using SRC_SINC_BEST_QUALITY, then convert it to 24-bit integer, would I need to dither because of the resampling? (Since according to the other advice, converting to 24-bit doesn't need dither.)

Quote
Please watch this Xiph video, especially the chapters on bit depth and dithering.

Don't dither when going from 32-bit floats (which have 24 bits of significand precision) to 24-bit. 24-bit quantization noise is going to be inaudible with simple truncation. If you're giving 8-bit output, dither will probably be vital to avoid annoying distortion. For 16-bit output it's a toss-up.

I suppose it doesn't matter that I don't actually truncate bits, but actually do a full rescale to the target bit depth? My current implementation is this:

Code:
#include <limits>

/* Convert and clip a float sample to an integer sample. This works for
* all supported integer sample types (8-bit, 16-bit, 32-bit, signed or
* unsigned.)
*/
template <typename T>
static void
floatSampleToInt(T& dst, float src)
{
    if (src >= 1.f) {
        // Overflow. Clip to max.
        dst = std::numeric_limits<T>::max();
    } else if (src < -1.f) {
        // Underflow. Clip to min.
        dst = std::numeric_limits<T>::min();
    } else {
        // Scale by 2^(bits-1), then add an offset of 2^(bits-1) + min:
        // 0 for signed types, 2^(bits-1) for unsigned ones.
        // The conversion to T discards the fractional part (truncation).
        dst = static_cast<T>(src * (float)(1UL << (sizeof(T) * 8 - 1))
                             + ((float)(1UL << (sizeof(T) * 8 - 1))
                                + (float)std::numeric_limits<T>::min()));
    }
}


Would you recommend truncation instead? (I assume this means converting to 32-bit integer first and then dropping the low-order bits that don't fit.)
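For comparison, the "convert to 32-bit integer, then drop the low-order bits" approach described above might look like this. It is a sketch assuming the input is already clipped to [-1.0, 1.0) and that right-shifting a negative int32_t is an arithmetic shift (guaranteed since C++20, and what mainstream compilers do anyway); the function name is hypothetical. Note that an arithmetic shift rounds toward negative infinity, whereas a float-to-int cast rounds toward zero.

```cpp
#include <cstdint>

// Hypothetical: float in [-1.0, 1.0) -> 32-bit int -> keep the top 16 bits.
// Dropping the low 16 bits with >> is truncation toward negative infinity.
inline std::int16_t floatToS16ViaS32(float x) {
    // Scale in double so the product is exact, then convert to int32
    // (this cast itself truncates toward zero).
    const auto s32 = static_cast<std::int32_t>(x * 2147483648.0);
    return static_cast<std::int16_t>(s32 >> 16);
}
```

A tiny negative input such as -0.00001f ends up as -1 here, while scaling straight to 16 bits and casting would give 0; either way, no dither is involved.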

Edit:
Btw, thanks for that video! Seeing the concepts visualized well goes a long way in better understanding them.

Reply #7
Quote
Does it matter that resampling happens at 32-bit float? And does the quality of the resampling affect the decision? I support different resamplers as back-ends (SRC, Speex from Opus-tools and SoXR). For example, if I resample a 32-bit float signal using SRC_SINC_BEST_QUALITY, then convert it to 24-bit integer, would I need to dither because of the resampling? (Since according to the other advice, converting to 24-bit doesn't need dither.)


Resampling (and attenuation) changes the real bit depth of the signal: to 32-bit float in your case. That's all.

Quote
I suppose it doesn't matter that I don't actually truncate bits, but actually do a full rescale to the target bit depth? My current implementation is this:


You do truncation when you assign 32-bit float value to the integer variable (T& dst).
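To make that concrete: converting a float to an integer type in C++ discards the fractional part, i.e. it rounds toward zero. For the signed types in the code above (where the added offset is 0), everything strictly between -1 and +1 LSB lands on 0, so the zero step is twice as wide as all the others. A minimal demonstration; the helper names are hypothetical.

```cpp
#include <cmath>

// Float-to-int conversion in C++ truncates toward zero.
inline int truncateTowardZero(float lsbs) { return static_cast<int>(lsbs); }

// Round-to-nearest for comparison (halfway cases go away from zero).
inline int roundToNearest(float lsbs) { return static_cast<int>(std::lround(lsbs)); }
```

Both 0.9 and -0.9 LSB truncate to 0, while rounding sends them to 1 and -1 respectively.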

Reply #8
Quote
Resampling (and attenuation) changes the real bit depth of the signal: to 32-bit float in your case. That's all.

So you mean that if I convert from a 16-bit source to internal 32-bit float, then resample it, I need to dither regardless of the final output format (e.g. 24-bit) simply because of the resampling step?

Btw, should I dither the float samples first and then convert to integer, or convert first and dither the resulting integer samples?

Quote
I suppose it doesn't matter that I don't actually truncate bits, but actually do a full rescale to the target bit depth? My current implementation is this:

You do truncation when you assign 32-bit float value to the integer variable (T& dst).

Ah, I see what you meant now. The fractional part gets truncated, so that counts as truncation.

Reply #9
Quote
So you mean that if I convert from a 16-bit source to internal 32-bit float, then resample it, I need to dither regardless of the final output format (e.g. 24-bit) simply because of the resampling step?

No. If you convert 16-bit -> 32-bit float -> 16- or 24-bit, then dither is unnecessary, because these 32-bit floats have only 16 non-zero bits and truncation doesn't create distortion. After resampling or attenuation, all bits of the significand may become non-zero.

(And I agree that dither is not necessary for 32-bit float -> 24-bit int conversion.)


Quote
convert first and dither the resulting integer samples?

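That's not possible: the dither has to be added before the value is quantized, because once it is an integer, the information below the least significant bit is already gone.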
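In other words, the order is: scale to the target range while still in float, add the dither there, and only then round, clip and convert. A minimal sketch of that order for 16-bit output, assuming TPDF dither generated with std::mt19937 and clipping after dithering; the function name and the choice of generator are assumptions, and noise shaping is left out entirely.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical: float sample (nominal range [-1.0, 1.0)) -> dithered int16.
// 1. scale to LSB units, 2. add TPDF dither (up to +/-1 LSB),
// 3. round to nearest, 4. clip to the int16 range.
std::int16_t floatToS16Dithered(float x, std::mt19937& rng) {
    std::uniform_real_distribution<float> u(-0.5f, 0.5f);
    const float scaled = x * 32768.0f;
    const float dithered = scaled + u(rng) + u(rng);
    const long q = std::lround(dithered);
    return static_cast<std::int16_t>(std::max(-32768L, std::min(32767L, q)));
}
```

For example, 0.25f (exactly 8192 LSB) comes out as 8191, 8192 or 8193 depending on the dither, and inputs outside the nominal range are clipped.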

Reply #10
Quote
(the floatSampleToInt() code from Reply #6)

I'm an audio newbie as well and have a question about this. If we were converting to 8-bit signed, an input of 1.0 would end up as 127 and an input of -1.0 would be -128. So the two would be uneven. Is that OK?

For dithering, would it be OK to just add (0.5f * (float)rand() / (float)RAND_MAX)? I know there are better dithering algorithms out there.

Can anyone recommend an open source library that does this?

Thanks!


Brendan

Reply #11
In theory, with an integer format, its minimum value should be -1.0 and its maximum value 1.0 (i.e. both at full scale).
But if we analyse what this means with, for example, 4 bits (-8 to 7), we see there are more values on the negative side than on the positive side (8 versus 7 in this case, because zero takes up one of the non-negative codes).

As such, either zero is not really zero or the two sides are not even, and the DAC would need to compensate for that.

So in practice, it is not uncommon that maximum value is not really 1.0.
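The two conventions can be put side by side. Scaling by 128 (as in the code quoted above) uses every code including -128 but has to clip +1.0 to 127; scaling by 127 is symmetric but never produces -128. A sketch for 8-bit signed output; the function names are hypothetical, rounding is used instead of truncation for simplicity, and neither choice is presented as the "correct" one.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Convention A: scale by 2^7 = 128. -1.0 -> -128; +1.0 would be 128,
// so it has to be clipped to 127.
inline std::int8_t toS8Scale128(float x) {
    const long q = std::lround(x * 128.0f);
    return static_cast<std::int8_t>(std::max(-128L, std::min(127L, q)));
}

// Convention B: scale by 127. -1.0 -> -127, +1.0 -> 127; -128 is never used.
inline std::int8_t toS8Scale127(float x) {
    const long q = std::lround(x * 127.0f);
    return static_cast<std::int8_t>(std::max(-127L, std::min(127L, q)));
}
```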


As for dithering, your formula returns a value varying between 0 and 0.5 (rand() starts at zero). That's probably OK, but it has an offset: apart from a constant 0.25, the effect is more like -0.25..0.25.
You might want to do some spectrogram analysis to determine if it's good enough.
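The offset is easy to confirm numerically: averaged over many calls, 0.5f * rand() / RAND_MAX comes out near 0.25 rather than 0, whereas rand() / RAND_MAX - 0.5 is centered on zero. A rough sketch; the helper names and the seed are arbitrary, and the scaling to the target LSB is a separate question.

```cpp
#include <cstdlib>

// The formula from the question: uniform in [0, 0.5], so it carries a +0.25 offset.
inline float ditherOriginal() {
    return 0.5f * static_cast<float>(std::rand()) / static_cast<float>(RAND_MAX);
}

// Centered version: uniform in [-0.5, 0.5] (one LSB wide, no offset).
inline float ditherCentered() {
    return static_cast<float>(std::rand()) / static_cast<float>(RAND_MAX) - 0.5f;
}

// Average of `n` calls to `gen`, after seeding rand() with `seed`.
template <typename Gen>
double averageOf(Gen gen, int n, unsigned seed = 1) {
    std::srand(seed);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += gen();
    return sum / n;
}
```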

Reply #12

Quote
As for dithering, your formula returns a value varying between 0 and 0.5 (rand() starts at zero). That's probably OK, but it has an offset: apart from a constant 0.25, the effect is more like -0.25..0.25.
You might want to do some spectrogram analysis to determine if it's good enough.


I thought rand() would return negative values because it returns a signed int, but upon further review you are correct. So instead I guess I'd add ( ((float)rand() / (float)RAND_MAX) - 0.5).

I'd also add some rounding. I believe the standard is to add 0.5 to positive numbers and -0.5 to negative numbers before the conversion to integer.
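The rounding rule described here (add 0.5 to positive values, subtract 0.5 from negative ones, then truncate) is round-half-away-from-zero, which is also what std::lround does. A sketch comparing the two; roundManual and roundLibrary are hypothetical names. One caveat: with strict single-precision arithmetic, the manual version can misround a value just below 0.5 (0.49999997f + 0.5f rounds up to 1.0f), so the manual form here works in double, where adding 0.5 to a float input is exact.

```cpp
#include <cmath>

// Round half away from zero by hand: +0.5 for positive values, -0.5 for
// negative ones, then truncate toward zero. Done in double so that adding
// 0.5 to a float input is exact.
inline long roundManual(float x) {
    const double d = x;
    return static_cast<long>(d >= 0.0 ? d + 0.5 : d - 0.5);
}

// Library equivalent (also rounds halfway cases away from zero).
inline long roundLibrary(float x) { return std::lround(x); }
```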


Brendan

Reply #13
Quote
I thought rand() would return negative values because it returns a signed int, but upon further review you are correct. So instead I guess I'd add ( ((float)rand() / (float)RAND_MAX) - 0.5).


Don't forget to multiply this value by 1/128.


Quote
I'd also add some rounding. I believe the standard is to add 0.5 to positive numbers and -0.5 to negative numbers before the conversion to integer.


It will only add distortion to the signal, so it's worse than useless.

Reply #14
Quote
Don't forget to multiply this value by 1/128.


Mmm... that depends on the implementation. The dither has to be scaled to the least significant bit of the target format, so one could first convert the range (not the type), then apply dither without any extra division, and then change the type.
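Both orders give the same result, since (x + d/128) * 128 = x * 128 + d; the second simply skips the extra division. A sketch for 8-bit output with a fixed dither value so the results can be compared; the function names are hypothetical, and 128 assumes the scale-by-2^7 convention.

```cpp
#include <cmath>

// Order 1: dither expressed in normalized units (1 LSB = 1/128), then scaled.
inline long ditherThenScale(double x, double ditherLsb) {
    return std::lround((x + ditherLsb / 128.0) * 128.0);
}

// Order 2: scale to LSB units first, then add the dither directly in LSBs.
inline long scaleThenDither(double x, double ditherLsb) {
    return std::lround(x * 128.0 + ditherLsb);
}
```

Because 128 is a power of two, scaling by it is exact in binary floating point, so (barring underflow) the two orders should even agree bit for bit.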

Reply #15

Quote
Mmm... that depends on the implementation. The dither has to be scaled to the least significant bit of the target format, so one could first convert the range (not the type), then apply dither without any extra division, and then change the type.



Yeah, that's what I meant. Convert to the full (-128, 127) range in float and then add noise (-0.5 to 0.5) to the result. That would work regardless of the target bit depth (although I guess people don't think dithering is necessary when going to 16-bit and above).

I think you want to add the rounding factor; otherwise 121.9 will become the 8-bit value 121 when it's closer to 122. That said, not rounding would just have the effect of lowering samples by 0.5 on average.
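That last point can be checked numerically. Averaged over inputs spread evenly across a range, truncating with floor() lowers values by about 0.5 LSB, while rounding to nearest has roughly zero average error. (A C++ cast truncates toward zero instead, so for signed data the shift points toward zero rather than always downward.) A sketch; the function names and the test grid are arbitrary.

```cpp
#include <cmath>

// Mean of (quantized - original) over `n` evenly spaced inputs in [lo, hi).
template <typename Quantizer>
double meanError(Quantizer q, double lo, double hi, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        const double x = lo + (hi - lo) * (i + 0.5) / n;
        sum += q(x) - x;
    }
    return sum / n;
}

inline double quantizeFloor(double x) { return std::floor(x); }
inline double quantizeRound(double x) { return std::round(x); }
```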


Brendan