When and to what should I apply dithering?
Nikaki
post Mar 28 2013, 15:03
Post #1





Group: Members
Posts: 100
Joined: 6-March 07
Member No.: 41226



Hello folks.

I'm currently developing a thin audio library for audio decoding, resampling and mixing. I've read up on dithering but I'm not sure when exactly to apply it, and to which signal to actually add the dither. My architecture is as follows:

All source formats are decoded to 32-bit float samples where possible (for example, Vorbis using vorbisfile, MP3 using mpg123, and MIDI file rendering using fluidsynth). Some other formats, like MODs (IT, S3M, XM, etc.), are rendered directly to 32-bit integer and then simply converted to 32-bit float as-is (the nominal float range is always [-1.0..1.0)). PCM audio stored in WAV or RIFF files is converted straight to 32-bit float from whatever the source bit depth is. I haven't added FLAC decoding yet, but since it's a lossless format, I assume it decodes to the same bit depth as the PCM data it compresses, so handling will effectively be identical to WAV/RIFF sources.
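Here is a minimal sketch of the integer-to-float step. It assumes the same 2^(bits-1) scaling that the `floatSampleToInt` function later in this thread uses in the opposite direction. `intSampleToFloat` is a hypothetical name. Unsigned types (such as 8-bit WAV data, which is stored unsigned) have their mid-scale offset removed first.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from any library named in this thread): map an
// integer PCM sample onto the nominal float range [-1.0, 1.0).
// Works for signed and unsigned 8-, 16- and 32-bit sample types.
template <typename T>
float intSampleToFloat(T src)
{
    static_assert(std::numeric_limits<T>::is_integer,
                  "integer sample type required");
    // 2^(bits-1): one full-scale unit.
    const double scale = static_cast<double>(1ULL << (sizeof(T) * 8 - 1));
    // 0 for signed types; 2^(bits-1) for unsigned ones, whose silence
    // sits at mid-scale (e.g. 128 for 8-bit WAV data).
    const double offset =
        scale + static_cast<double>(std::numeric_limits<T>::min());
    // The arithmetic is exact in double; the final cast to float rounds to
    // 24 significant bits, which only matters for 32-bit sources.
    return static_cast<float>((static_cast<double>(src) - offset) / scale);
}
```

For example, a 16-bit sample of 16384 becomes 0.5f, and an unsigned 8-bit sample of 128 becomes 0.0f.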

Next comes resampling, where every 32-bit float signal is converted to the target sampling rate, if needed (or wanted.)

After decoding and resampling are done, every signal, which is now 32-bit float, is first attenuated according to its volume setting (with a volume factor of 1.0 meaning "don't touch this at all"; otherwise the samples are multiplied by the volume factor).

Then comes the mixing, which happens by simply adding all the samples together (the joys of 32-bit float mixing.)

So in the end I have a final output signal, consisting of 32-bit float samples, which needs to be sent to the audio device. At this stage, I convert it to the appropriate format for the device by multiplying by an appropriate factor. If any sample is equal to or above 1.0, or lower than -1.0, I simply clip it for now (I plan to add a proper limiter in the future.)
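The float stage described above (volume, then summing) could be sketched as follows. This assumes every source has already been decoded and resampled to float at the output rate. `mixSources` and its parameters are hypothetical. Clipping is deliberately left to the final integer conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: scale each float source by its volume factor and sum
// it into the mix buffer. Sources shorter than 'frames' contribute silence
// for the remaining frames.
std::vector<float> mixSources(const std::vector<std::vector<float>>& sources,
                              const std::vector<float>& volumes,
                              std::size_t frames)
{
    std::vector<float> mix(frames, 0.0f);
    for (std::size_t s = 0; s < sources.size(); ++s) {
        const std::vector<float>& src = sources[s];
        const std::size_t n = src.size() < frames ? src.size() : frames;
        for (std::size_t i = 0; i < n; ++i) {
            // A volume of 1.0f leaves the sample unchanged (x * 1.0f == x).
            mix[i] += src[i] * volumes[s];
        }
    }
    // No clipping here: float has plenty of headroom, so values outside
    // [-1.0, 1.0) are only clipped (or limited) when converting to integer.
    return mix;
}
```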

Now the question is: where in the above chain of events would I want to dither? Should I do it after the final mix is converted to the output format, meaning applying dither to the 8-bit/16-bit/24-bit samples just before they're sent to the audio device? Should it happen instead to the final mix while it's still in 32-bit float format? Or should it happen to each source signal individually? And in that last case, for formats that don't decode directly to 32-bit float, should it happen before or after the conversion to float? Also, does resampling play a role here? (If each signal needs dither applied individually, should it happen before or after it has been resampled?)

And on a related matter, if currently only one signal is active (meaning nothing gets mixed), and it's not been attenuated at all (meaning having a volume factor of 1.0), is dithering still needed?

Thanks for any insights you can provide on this. :)

This post has been edited by Nikaki: Mar 28 2013, 15:27
pdq
post Mar 28 2013, 16:58
Post #2





Group: Members
Posts: 3394
Joined: 1-September 05
From: SE Pennsylvania
Member No.: 24233



Very simply, as long as you are working with 24 significant bits, as in 32 bit float that is not scaled to some ridiculously low scale factor, you do not dither. The only time that you should dither is when you reduce the bit depth to 16 bits or less, and even at 16 bits dither is not really necessary, but does little harm.

Edit: if you are working in 32 bit float, but the data are only 16 bits, and you do no processing that would add more bits, then do not dither.


This post has been edited by pdq: Mar 28 2013, 16:59
jensend
post Mar 28 2013, 17:29
Post #3





Group: Members
Posts: 143
Joined: 21-May 05
Member No.: 22191



There is no point in dithering at any stage except when reducing precision.

Reducing bit depth (moving to a format with less precision) introduces quantization noise. The overall spectral power of the noise will be similar for any reasonable bit depth reduction method, but if you simply truncate, that noise may be concentrated into harmonic distortion. Dither allows you to control the spectral characteristics of the noise.
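The following small sketch illustrates what jensend describes, working in LSB units for simplicity. A 1 kHz sine (48 kHz rate) with a peak of only 0.4 LSB rounds to silence without dither. With rectangular dither of ±0.5 LSB added before rounding, the output is noisier but its average still follows the sine. The function name, the fixed seed, and the choice of `std::mt19937` are illustrative assumptions. The return value is the unnormalized correlation between the quantized output and the original sine.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Quantizes a 1 kHz sine (48 kHz rate) whose peak is amplitudeLsb, measured
// in LSB units, and returns sum(q[i] * sin[i]): how much of the sine survives.
// With dither, uniform noise in [-0.5, 0.5) LSB is added before rounding.
double survivingSine(double amplitudeLsb, int samples, bool dither)
{
    const double kPi = 3.14159265358979323846;
    std::mt19937 rng(12345);                  // fixed seed: reproducible noise
    std::uniform_real_distribution<double> noise(-0.5, 0.5);
    double correlation = 0.0;
    for (int i = 0; i < samples; ++i) {
        const double s = std::sin(2.0 * kPi * 1000.0 * i / 48000.0);
        double x = amplitudeLsb * s;
        if (dither)
            x += noise(rng);
        const double q = std::floor(x + 0.5); // round to the nearest level
        correlation += q * s;
    }
    return correlation;
}
```

Without dither every output sample is 0, so the result is exactly 0. With dither, the result is clearly positive, because the quantization error has become noise instead of a signal-dependent distortion (here, complete loss of the tone).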

Please watch this Xiph video esp. the chapters on bit depth and dithering.

Don't dither when going from 32-bit floats (which have 24 bits of significand precision) to 24bit. 24-bit quantization noise is going to be inaudible with simple truncation. If you're giving 8-bit output, dither will probably be vital to avoid annoying distortions. For 16-bit output it's a toss-up.
lvqcl
post Mar 28 2013, 17:33
Post #4





Group: Developer
Posts: 3371
Joined: 2-December 07
Member No.: 49183



QUOTE (Nikaki @ Mar 28 2013, 18:03) *
And on a related matter, if currently only one signal is active (meaning nothing gets mixed), and it's not been attenuated at all (meaning having a volume factor of 1.0), is dithering still needed?

Yes, if you resample, or if the output bit depth is less than the signal bit depth.
skyp
post Mar 29 2013, 02:32
Post #5





Group: Members
Posts: 2
Joined: 29-March 13
Member No.: 107437



True, always dither when reducing bit depth.
Be aware that the internal processing of a DAW is performed at a high bit depth, usually 32-bit float or more.
This means that when you "touch" a 16-bit track in a DAW (even for very small corrections), its bit depth is increased to the full internal bit depth, so the result should be dithered before saving the file.
This is true even if the original and the target file have the same bit depth.

skyp
skyp
post Mar 29 2013, 03:35
Post #6





Group: Members
Posts: 2
Joined: 29-March 13
Member No.: 107437



FLAC (the command-line encoder) is limited by design to 24 bits; every 32-bit file encoded to FLAC will be "truncated" (without dither) to 24 bits.
For 32-bit lossless, use WavPack.

This post has been edited by skyp: Mar 29 2013, 03:43
Nikaki
post Mar 30 2013, 10:39
Post #7





Group: Members
Posts: 100
Joined: 6-March 07
Member No.: 41226



Thanks everyone. This clears a few things up.

Just a few details:

QUOTE (lvqcl @ Mar 28 2013, 18:33) *
QUOTE (Nikaki @ Mar 28 2013, 18:03) *
And on a related matter, if currently only one signal is active (meaning nothing gets mixed), and it's not been attenuated at all (meaning having a volume factor of 1.0), is dithering still needed?

Yes if you resample or if output bitdepth is less than the signal bitdepth.

Does it matter that resampling happens at 32-bit float? And does the quality of the resampling affect the decision? I support different resamplers as back-ends (SRC, Speex from Opus-tools and SoXR). For example, if I resample a 32-bit float signal using SRC_SINC_BEST_QUALITY, then convert it to 24-bit integer, would I need to dither because of the resampling? (Since according to the other advice, converting to 24-bit doesn't need dither.)

QUOTE (jensend @ Mar 28 2013, 18:29) *
Please watch this Xiph video esp. the chapters on bit depth and dithering.

Don't dither when going from 32-bit floats (which have 24 bits of significand precision) to 24bit. 24-bit quantization noise is going to be inaudible with simple truncation. If you're giving 8-bit output, dither will probably be vital to avoid annoying distortions. For 16-bit output it's a toss-up.

I suppose it doesn't matter that I don't actually truncate bits but instead do a full rescale to the target bit depth? My current implementation is this:

CODE
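#include <limits>

/* Convert and clip a float sample to an integer sample. This works for
* all supported integer sample types (8-bit, 16-bit, 32-bit, signed or
* unsigned.)
*/
template <typename T>
static void
floatSampleToInt(T& dst, float src)
{
    // 2^(bits-1) maps the nominal range [-1.0, 1.0) onto T's full range.
    // Computed in double: in float, src * 2^31 + 2^31 can round up to 2^32
    // for values just below 1.0, which overflows a 32-bit unsigned T.
    const double scale = static_cast<double>(1ULL << (sizeof(T) * 8 - 1));
    // 0 for signed types (min == -scale), scale for unsigned types (min == 0).
    const double offset = scale + static_cast<double>(std::numeric_limits<T>::min());

    if (src >= 1.f) {
        // Overflow. Clip to max.
        dst = std::numeric_limits<T>::max();
    } else if (src < -1.f) {
        // Underflow. Clip to min.
        dst = std::numeric_limits<T>::min();
    } else {
        // The conversion to T discards the fractional part (it truncates
        // toward zero); no rounding or dither is applied here.
        dst = static_cast<T>(src * scale + offset);
    }
}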


Would you recommend truncation instead? (I assume that this means converting to 32-bit integer first, and then dropping the low-order bits that don't fit.)
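For comparison, the approach described in the parenthesis (convert to 32-bit integer, then drop the low-order bits) might look like the following for a 16-bit target. `dropLow16Bits` is a hypothetical name. An arithmetic right shift rounds toward negative infinity, whereas the float-to-integer conversion in the function above rounds toward zero. Both count as truncation in the sense used in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: keep only the top 16 bits of a 32-bit sample.
// Right-shifting a negative signed value is implementation-defined before
// C++20 (mainstream compilers shift arithmetically) and arithmetic from
// C++20 on, so the result is rounded toward negative infinity.
std::int16_t dropLow16Bits(std::int32_t sample)
{
    return static_cast<std::int16_t>(sample >> 16);
}
```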

Edit:
Btw, thanks for that video! Seeing the concepts visualized well goes a long way in better understanding them.

This post has been edited by Nikaki: Mar 30 2013, 10:46
lvqcl
post Mar 30 2013, 11:22
Post #8





Group: Developer
Posts: 3371
Joined: 2-December 07
Member No.: 49183



QUOTE (Nikaki @ Mar 30 2013, 13:39) *
Does it matter that resampling happens at 32-bit float? And does the quality of the resampling affect the decision? I support different resamplers as back-ends (SRC, Speex from Opus-tools and SoXR). For example, if I resample a 32-bit float signal using SRC_SINC_BEST_QUALITY, then convert it to 24-bit integer, would I need to dither because of the resampling? (Since according to the other advice, converting to 24-bit doesn't need dither.)


Resampling (and attenuation) changes the real bit depth of the signal: to 32-bit float in your case. That's all.

QUOTE (Nikaki @ Mar 30 2013, 13:39) *
I suppose it doesn't matter that I don't actually truncate bits, but actually do a full rescale to the target bit depth? My current implementation is this:


You do truncation when you assign 32-bit float value to the integer variable (T& dst).

This post has been edited by lvqcl: Jul 20 2013, 09:40
Nikaki
post Mar 30 2013, 11:56
Post #9





Group: Members
Posts: 100
Joined: 6-March 07
Member No.: 41226



QUOTE (lvqcl @ Mar 30 2013, 12:22) *
Resampling (and attenuation) changes the real bit depth of the signal: to 32-bit float in your case. That's all.

So you mean that if I convert from a 16-bit source to internal 32-bit float, then resample it, I need to dither regardless of the final output format (e.g. 24-bit) simply because of the resampling step?

Btw, should I dither the float samples first and then convert to integer, or convert first and dither the resulting integer samples?

QUOTE
QUOTE (Nikaki @ Mar 30 2013, 13:39) *
I suppose it doesn't matter that I don't actually truncate bits, but actually do a full rescale to the target bit depth? My current implementation is this:

You do truncation when you assign 32-bit float value to the integer variable (T& dst).

Ah, I see what you meant now. The fractional part gets truncated, so that counts as truncation.

This post has been edited by Nikaki: Mar 30 2013, 11:57
lvqcl
post Mar 30 2013, 12:36
Post #10





Group: Developer
Posts: 3371
Joined: 2-December 07
Member No.: 49183



QUOTE (Nikaki @ Mar 30 2013, 14:56) *
So you mean that if I convert from a 16-bit source to internal 32-bit float, then resample it, I need to dither regardless of the final output format (e.g. 24-bit) simply because of the resampling step?

No. If you convert 16-bit -> 32-bit float -> 16- or 24-bit, then dither is unnecessary, because these 32-bit floats have only 16 significant (non-zero) bits, and truncation doesn't create distortion. After resampling or attenuation, all bits of the significand may become non-zero.

(And I agree that dither is not necessary for 32bit float -> 24bit int conversion).
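The point about "only 16 non-zero bits" can be checked directly. The sketch below tests whether a float sample is an exact multiple of one LSB at a given integer depth (the name `fitsBitDepth` is hypothetical). If it is, converting to that depth loses nothing and there is nothing for dither to do. A 16-bit sample passes. The same sample attenuated by a gain of 0.75 no longer fits in 16 bits, though it still fits in 24.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: true if float sample x (nominal range [-1.0, 1.0))
// is an exact multiple of one LSB at the given integer bit depth, i.e.
// converting it to that depth is lossless.
bool fitsBitDepth(float x, int bits)
{
    // Scaling by a power of two is exact in double, so any fractional part
    // left over means the sample carries more bits than 'bits'.
    const double scaled = std::ldexp(static_cast<double>(x), bits - 1);
    return scaled == std::floor(scaled);
}
```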


QUOTE (Nikaki @ Mar 30 2013, 14:56) *
convert first and dither the resulting integer samples?

That's not possible.
fnordware
post Jul 19 2013, 21:55
Post #11





Group: Members
Posts: 3
Joined: 19-July 13
Member No.: 109181



QUOTE (Nikaki @ Mar 30 2013, 10:39) *
CODE
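#include <limits>

/* Convert and clip a float sample to an integer sample. This works for
* all supported integer sample types (8-bit, 16-bit, 32-bit, signed or
* unsigned.)
*/
template <typename T>
static void
floatSampleToInt(T& dst, float src)
{
    // 2^(bits-1) maps the nominal range [-1.0, 1.0) onto T's full range.
    // Computed in double: in float, src * 2^31 + 2^31 can round up to 2^32
    // for values just below 1.0, which overflows a 32-bit unsigned T.
    const double scale = static_cast<double>(1ULL << (sizeof(T) * 8 - 1));
    // 0 for signed types (min == -scale), scale for unsigned types (min == 0).
    const double offset = scale + static_cast<double>(std::numeric_limits<T>::min());

    if (src >= 1.f) {
        // Overflow. Clip to max.
        dst = std::numeric_limits<T>::max();
    } else if (src < -1.f) {
        // Underflow. Clip to min.
        dst = std::numeric_limits<T>::min();
    } else {
        // The conversion to T discards the fractional part (it truncates
        // toward zero); no rounding or dither is applied here.
        dst = static_cast<T>(src * scale + offset);
    }
}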



I'm an audio newbie as well and have a question about this. If we were converting to 8-bit signed, an input of 1.0 would end up as 127 and an input of -1.0 would be -128. So the two would be uneven. Is that OK?

For dithering, would it be OK to just add (0.5f * (float)rand() / (float)RAND_MAX) ? I know there are better dithering algorithms out there.

Can anyone recommend an open source library that does this?

Thanks!


Brendan

This post has been edited by fnordware: Jul 19 2013, 21:58
[JAZ]
post Jul 20 2013, 11:11
Post #12





Group: Members
Posts: 1779
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



In theory, an integer format's minimum value should correspond to -1.0 and its maximum value to 1.0 (i.e. both at full scale).
But if we analyse what this means with, for example, 4 bits (-8 to 7), we see there are more values on the negative side than on the positive side (8 versus 7 in this case, because zero takes the center).

As such, either zero is not really zero or the two sides are not even, and the DAC would need to compensate for that.

So in practice, it is not uncommon that maximum value is not really 1.0.
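Here is the asymmetry in numbers. With the 2^(n-1) scale used in the conversion code above, 1.0 maps to one step past the largest positive code. The positive peak therefore tops out at (2^(n-1) - 1) / 2^(n-1) of full scale. A small sketch (hypothetical helper name):

```cpp
#include <cassert>

// Hypothetical helper: the largest positive n-bit two's-complement code,
// expressed as a fraction of the 2^(n-1) scale that maps 1.0 to full scale.
double maxPositiveLevel(int bits)
{
    const long long scale = 1LL << (bits - 1);   // 8 for 4 bits
    const long long maxCode = scale - 1;         // 7 for 4 bits
    return static_cast<double>(maxCode) / static_cast<double>(scale);
}
```

For 4 bits this gives 7/8 = 0.875; for 16 bits, 32767/32768. Scaling by 2^(n-1) - 1 instead makes the two sides symmetric, at the cost of never producing the most negative code. Both conventions are used in practice.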


As for dithering, your formula returns a varying value between 0 and 0.5 (rand() starts at zero). That's probably ok, but it has an offset and the effect is more like -0.25..0.25.
You might want to do some spectrogram analysis to determine if it's good enough.
fnordware
post Jul 20 2013, 17:24
Post #13





Group: Members
Posts: 3
Joined: 19-July 13
Member No.: 109181



QUOTE ([JAZ] @ Jul 20 2013, 11:11) *

As for dithering, your formula returns a varying value between 0 and 0.5 (rand() starts at zero). That's probably ok, but it has an offset and the effect is more like -0.25..0.25.
You might want to do some spectrogram analysis to determine if it's good enough.


I thought rand() would return negative values because it returns a signed int, but upon further review you are correct. So instead I guess I'd add ( ((float)rand() / (float)RAND_MAX) - 0.5).
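Averaging many draws confirms both the offset [JAZ] pointed out and the effect of the corrected formula. This sketch reuses `rand()` as in the posts, with a fixed `srand(1)` seed for reproducibility. The function name is hypothetical. Either formula still has to be scaled to one LSB of the target format before being added.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical sketch: mean of 'count' draws of either dither formula.
double meanOfDither(bool centered, int count)
{
    std::srand(1);
    double sum = 0.0;
    for (int i = 0; i < count; ++i) {
        const float r = static_cast<float>(std::rand()) /
                        static_cast<float>(RAND_MAX);
        // Original:  0.5f * rand() / RAND_MAX   -> range [0, 0.5]
        // Corrected: rand() / RAND_MAX - 0.5    -> range [-0.5, 0.5]
        sum += centered ? (r - 0.5f) : (0.5f * r);
    }
    return sum / count;
}
```

The original formula averages about 0.25 (a DC offset), while the corrected one is centred on zero.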

I'd also add some rounding. I believe the standard is to add 0.5 to positive numbers and -0.5 to negative numbers before the conversion to integer.


Brendan
lvqcl
post Jul 20 2013, 18:06
Post #14





Group: Developer
Posts: 3371
Joined: 2-December 07
Member No.: 49183



QUOTE (fnordware @ Jul 20 2013, 20:24) *
I thought rand() would return negative values because it returns a signed int, but upon further review you are correct. So instead I guess I'd add ( ((float)rand() / (float)RAND_MAX) - 0.5).


Don't forget to multiply this value by 1/128.


QUOTE (fnordware @ Jul 20 2013, 20:24) *
I'd also add some rounding. I believe the standard is to add 0.5 to positive numbers and -0.5 to negative numbers before the conversion to integer.


It will only add distortion to a signal so it's worse than useless.
[JAZ]
post Jul 20 2013, 19:50
Post #15





Group: Members
Posts: 1779
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



QUOTE (lvqcl @ Jul 20 2013, 19:06) *
Don't forget to multiply this value by 1/128.


Mmm.. that depends on the implementation. The dither has to be applied to the least significant bit, so one could first convert the range (not the type), then apply dither without any extra division, and then change the type.
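The two orderings discussed here give the same result. The sketch below uses an 8-bit target, with the dither noise passed in already expressed in LSB units (function names hypothetical). One version scales the noise by 1/128 and adds it in the normalized [-1.0, 1.0) range; the other converts the range first and adds the noise unscaled. Multiplying by a power of two is exact in binary floating point (outside overflow and subnormal ranges), so both produce identical output. Both round with `floor(x + 0.5)`, which is just one possible choice.

```cpp
#include <cassert>
#include <cmath>

// (a) Scale the noise by 1/128 and add it in the normalized float range.
double ditherNormalizedFirst(double x, double noiseLsb)
{
    return std::floor((x + noiseLsb / 128.0) * 128.0 + 0.5);
}

// (b) Convert the range first (x * 128), then add the noise unscaled.
double ditherScaledFirst(double x, double noiseLsb)
{
    return std::floor(x * 128.0 + noiseLsb + 0.5);
}
```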
fnordware
post Jul 20 2013, 20:13
Post #16





Group: Members
Posts: 3
Joined: 19-July 13
Member No.: 109181



QUOTE ([JAZ] @ Jul 20 2013, 19:50) *

Mmm.. that depends on the implementation. The dither has to be applied to the least significant bit, so one could first convert the range (not the type), then apply dither without any extra division, and then change the type.



Yeah, that's what I meant. Convert to the full (-128, 127) range in float and then add noise (-0.5 to 0.5) to the result. That would work regardless of the target bit depth (although I guess people don't think dithering is necessary when going to 16-bit and above).

I think you want to add the rounding factor, otherwise 121.9 will become 8-bit value 121 when it's closer to 122. Although not doing rounding would just have the effect of lowering samples by 0.5 on average.
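Putting the steps from the last few posts together, a dithered variant of `floatSampleToInt` might look like the sketch below. It is an illustration under stated assumptions, not code from anyone in the thread. It uses `std::mt19937`/`std::uniform_real_distribution` from `<random>` instead of `rand()`, rectangular (RPDF) dither of ±0.5 LSB as proposed above, and round-to-nearest via `floor(v + 0.5)`. Many implementations prefer triangular (TPDF) dither, e.g. the sum of two such uniform values, which also makes the noise power independent of the signal. That variant is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <random>

// Hypothetical sketch for any integer sample type T: convert the range while
// still in floating point, add +/-0.5 LSB of rectangular dither, round to the
// nearest level, clip, and only then change the type.
template <typename T, typename Rng>
T ditherFloatSampleToInt(float src, Rng& rng)
{
    const double scale = static_cast<double>(1ULL << (sizeof(T) * 8 - 1));
    const double offset =
        scale + static_cast<double>(std::numeric_limits<T>::min());
    std::uniform_real_distribution<double> noise(-0.5, 0.5);

    // 1. Convert the range, not the type: one unit is now one LSB of T.
    double v = static_cast<double>(src) * scale + offset;
    // 2. Add the dither in LSB units (no extra scaling needed).
    v += noise(rng);
    // 3. Round to the nearest level.
    v = std::floor(v + 0.5);
    // 4. Clip, then change the type.
    const double lo = static_cast<double>(std::numeric_limits<T>::min());
    const double hi = static_cast<double>(std::numeric_limits<T>::max());
    if (v < lo) v = lo;
    if (v > hi) v = hi;
    return static_cast<T>(v);
}
```

An input that already sits exactly on a level (for example 0.5f for a 16-bit target) is never changed by this dither, because the noise plus 0.5 stays within [0, 1). A value between levels, such as 0.3 LSB, comes out as 0 or 1 with an average of about 0.3.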


Brendan

Time is now: 1st September 2014 - 13:17