fdkaac input bitdepth
yetanotherid
post Aug 31 2014, 09:49
Post #1





Group: Members
Posts: 32
Joined: 11-August 08
Member No.: 56966



All the information I've been reading, including the info straight from the horse's mouth, which Wikipedia repeats, indicates fdkaac is based on fixed-point math and only supports 16-bit integer PCM input.

Which confuses the hell out of me when it seems to happily accept a 24-bit input and foobar2000 sets the "highest BPS supported" to 32 for fdkaac.
It confuses me even more when it doesn't appear to be clipping values above 0dBFS and encodes them just as most other lossy encoders do.

Could someone please explain what I'm missing here?

Thank you.

This post has been edited by yetanotherid: Aug 31 2014, 09:53
nu774
post Aug 31 2014, 11:37
Post #2





Group: Developer
Posts: 525
Joined: 22-November 10
From: Japan
Member No.: 85902



The encoder is definitely fixed-point based. The frontend (the fdkaac command) is not: it accepts up to 32-bit int and 64-bit float input, and down-converts to 16-bit int before encoding.
QUOTE
It confuses me even more when it doesn't appear to be clipping values above 0dBFS and encodes them just as most other lossy encoders do.

Well, it is possible for the peak to go beyond 0dBFS when decoded, but the input is definitely clipped.
However, fdkaac applies a smart limiter to floating-point input in order to minimize the audible defects of hard clipping, so the clipping might not be so obvious.
Try feeding it float input with intentionally high gain (for example, the attached test.wv, 6.33MB).
The resulting peak should be much lower than the original (due to clipping), but the audible defects should not be as obvious as with hard clipping (LAME or MPC, for example, will simply hard clip, and the result should be obvious).
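
To make the hard-clipping case concrete, here is a minimal Python sketch of a float to 16-bit integer conversion that simply clamps out-of-range samples. It is not fdkaac's code, and it does not model the smart limiter; the function name and the sample values are only illustrative.

```python
def float_to_int16_hard_clip(samples):
    """Convert float samples (1.0 == 0 dBFS) to 16-bit integers.

    Anything outside the 16-bit range is clamped, which is what
    "hard clipping" means here. Illustrative only, not fdkaac's code.
    """
    out = []
    for s in samples:
        scaled = round(s * 32768.0)
        out.append(max(-32768, min(32767, scaled)))
    return out


# Samples above 1.0 (or below -1.0) all end up at the same limit,
# so the shape of the waveform above 0 dBFS is lost.
loud = [0.5, 1.0, 1.633424, -2.0]
print(float_to_int16_hard_clip(loud))  # [16384, 32767, 32767, -32768]
```

A limiter would instead reduce the gain smoothly around the loud passages before this conversion, which is why the result sounds less distorted even though the peak is still brought down.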
yetanotherid
post Aug 31 2014, 13:20
Post #3





Group: Members
Posts: 32
Joined: 11-August 08
Member No.: 56966



Thanks for the reply.

I thought I'd pretty much tried what you suggested before posting (it turns out, maybe not), but to ensure I'm not going mad......

I ran ReplayGain on your test file and the result was this:
Track Gain: -7.64dB, Track Peak 1.633424

After re-encoding with fdkaac (the LAME result was pretty much the same):
Track Gain: -7.51dB, Track Peak 1.019475

Okay, so that was clipped. Back to the drawing board.....

I dug out a random flac track taken from a CD and ran a ReplayGain scan:
Track Gain: -2.55dB, Track Peak 0.905121

Re-encoded with fdkaac while applying a 10dB volume increase:
Track Gain: -12.00dB, Track Peak 1.418604

Re-encoded with LAME while applying a 10dB volume increase:
Track Gain: -11.60dB, Track Peak 1.547830

Re-encoded with NeroAAC while applying a 10dB volume increase:
Track Gain: -12.52dB, Track Peak 2.928896

Re-encoded with WavPack (32 bit) while applying a 10dB volume increase:
Track Gain: -12.55dB, Track Peak 2.862243

And finally the WavPack (32 bit) file above re-encoded with fdkaac (no volume change):
Track Gain: -12.00dB, Track Peak 1.418604


So a question to help me get my head around all that..... what do the "Track Peak" ReplayGain values represent? Are they a percentage?

I'm assuming from the above results that the only lossy encoder which encoded the audio correctly after the volume was increased by 10dB was NeroAAC. The rest of the time the peaks were clipped and.... I guess..... the ReplayGain Track Peaks of around 1.4 represent the level at which the clipping/distortion is decoded, or something to that effect? If that's the case, I guess seeing the Track Peaks of around 1.4 was fooling me into thinking the audio was still being encoded properly.
Maybe I should have listened instead. I did this time. The distortion in the fdkaac/LAME encoded versions was pretty obvious. foobar2000's advanced limiter (which I leave in the playback chain) did a pretty good job of limiting the WavPack version. Much less distortion.

Sometimes when converting audio while applying a DSP (downmixing multichannel audio to stereo and encoding as MP3, for example) I've run a scan with MP3Gain and it's reported peaks a dB or three over 0dB. I aim to prevent that anyway, but I'd always assumed it's because the encoder could store values greater than 0dBFS, but now I'm thinking that's not possible unless the encoder input is 32 bit (which would be 32 bit float).

Am I on the right track with any of that?

Cheers.
nu774
post Aug 31 2014, 13:50
Post #4





Group: Developer
Posts: 525
Joined: 22-November 10
From: Japan
Member No.: 85902



QUOTE (yetanotherid @ Aug 31 2014, 21:20) *
So a question to help me get my head around all that..... what do the "Track Peak" ReplayGain values represent? Are they a percentage?

If you multiply it by 100, yes. 1.0 means 100% and 0dBFS. You can convert it to dB by the following formula.
CODE
20 * log10(x)

For example, 0.5 will become 20*log10(0.5) = -6.02 in dB.

In your example, the original peak is 0.905121. Applying 10dB gain means multiplying by 10^(0.05*10), which gives 0.905121 * (10^(0.05*10)) = 2.8622439.
The WavPack (floating point) result looks exact.
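
The two conversions above can be written as a short Python sketch. The function names are made up for illustration; only the formulas come from the post.

```python
import math


def peak_to_db(peak):
    """Convert a linear peak value (1.0 == 0 dBFS) to dB."""
    return 20.0 * math.log10(peak)


def db_to_ratio(gain_db):
    """Convert a gain in dB to a linear multiplier."""
    return 10.0 ** (0.05 * gain_db)


print(round(peak_to_db(0.5), 2))      # -6.02
print(0.905121 * db_to_ratio(10.0))   # roughly 2.86224
```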
nu774
post Aug 31 2014, 14:09
Post #5





Group: Developer
Posts: 525
Joined: 22-November 10
From: Japan
Member No.: 85902



The smart limiter of fdkaac was added in 0.6.0 (the latest version in the GitHub repo).
It's not available in older versions, which will simply hard clip like LAME.
yetanotherid
post Aug 31 2014, 15:46
Post #6





Group: Members
Posts: 32
Joined: 11-August 08
Member No.: 56966



Thanks for the info. I think I understand now. So for a Track Peak of 1.547830:

20*log10(1.547830) = +3.79 dB

1.547830 seems like a lot more until you convert it to dB. Then it looks like it's probably just extra distortion.

Why does foobar2000 display the ReplayGain scan result for Track Gain in dB but the Track Peak as a percentage? Why not dB for both so it's a tad more intuitive?
I only ask because if it did, I probably would have looked at the results and thought "a Track Peak of -0.87dB before, +3.79dB after.... that's not the 10dB difference I applied so the audio must be clipped". :)
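
That check can be sketched in Python using the figures from this thread. The function names and the 1 dB tolerance are assumptions chosen for illustration; lossy encoders shift the peak slightly even when nothing is clipped, so some tolerance is needed.

```python
import math


def expected_peak(original_peak, gain_db):
    """Peak you would expect after applying gain_db with no clipping."""
    return original_peak * 10.0 ** (gain_db / 20.0)


def looks_clipped(original_peak, gain_db, measured_peak, tolerance_db=1.0):
    """Return True if the measured peak falls well short of the expected one.

    tolerance_db is an arbitrary allowance for normal lossy-codec deviation.
    """
    shortfall_db = 20.0 * math.log10(
        expected_peak(original_peak, gain_db) / measured_peak
    )
    return shortfall_db > tolerance_db


# Original peak 0.905121 with +10 dB applied before encoding.
print(looks_clipped(0.905121, 10.0, 1.547830))  # LAME result: True
print(looks_clipped(0.905121, 10.0, 2.928896))  # NeroAAC result: False
```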

Thanks again!

PS One other quick question if I may........
If I create a custom encoder preset for the fdkaac encoder and set the maximum BPS to 24 (as opposed to the maximum of 32 foobar2000 specifies), would I be doing the encoder a disservice? I ask, because if in the future I need to remember which encoder supports 32 bit float input and which doesn't (even in the case of fdkaac where it's converted to integer "internally") I can just check the converter presets to remind myself.

Well two more.... although this is more of an observation. When setting up a fdkaac encoder preset with the VBR5 quality setting, foobar2000 lists the average bitrate for CD audio as 180kbps. In my experience so far, that's not even close. It's more like 224kbps - 240kbps. I think the claimed 128kbps for the VBR4 is much closer to accurate. The bitrate jump from VBR4 to VBR5 seems quite large.

This post has been edited by yetanotherid: Aug 31 2014, 16:22
lithopsian
post Aug 31 2014, 16:55
Post #7





Group: Members
Posts: 177
Joined: 27-February 14
Member No.: 114718



Gain values are in dB because they are almost always applied as is, or combined with other dB settings such as a preamp. Internally that may well mean converting it to a simple multiplier, but that's the same for all gains. The values may also be quite large as multipliers and are more readily understood as a number no more than a few dB.

Peak values are in percentages/fractions relative to 0dB because that is more useful. As you've seen the peak numbers are generally quite small, frequently less than 1, and the resulting very small dB values plus or minus from 0 are less easy to understand at a glance. A dB peak value wouldn't be applied immediately to anything. Instead a percentage of peak is used in conjunction with any gains that are applied to calculate whether clipping is going to occur. Then you decide what to do with that information, which might be as simple as applying a gain value to bring the peak down, or may be something more complex.

And because the spec says so :)

P.S. I agree that the FDK settings 4 and 5 are a long way apart right where you'd want one in between. The resulting bitrate is very variable between and even within tracks, which can make the gap seem even bigger.

This post has been edited by lithopsian: Aug 31 2014, 16:57
yetanotherid
post Aug 31 2014, 18:26
Post #8





Group: Members
Posts: 32
Joined: 11-August 08
Member No.: 56966



I don't know...... if I saw a Track Peak expressed as 0.2dB, 1.5dB, 4dB, 10dB or some other dB value it'd have some meaning to me. 1.064587 or 2.354671 etc are meaningless to me unless I convert them to dB.

If I was to see a Track Gain of 82dB and a Track Peak of -4.5dB it seems fairly obvious applying track gain without clipping would result in a 4.5dB increase and a new Track Gain of 86.5dB. Unless I'm missing something..... maybe it's just me.

My logic would be..... I convert an audio file. Maybe it's multi-channel and I downmix it. I run ReplayGain on the output file and it shows a track peak of 1.36498. Okay..... so I'll re-convert it while applying a volume reduction..... which I've got to specify as a reduction of "x" dB.....
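
For that workflow, the "x" can be derived from the reported peak with the 20*log10 formula given earlier in the thread. A minimal Python sketch, where the function name and the optional headroom parameter are made up for illustration:

```python
import math


def reduction_needed_db(track_peak, headroom_db=0.0):
    """dB of attenuation needed to bring track_peak down to 1.0 (0 dBFS).

    headroom_db is an optional extra safety margin (an assumption,
    not something ReplayGain defines). Returns 0.0 if no cut is needed.
    """
    peak_db = 20.0 * math.log10(track_peak)
    return max(0.0, peak_db + headroom_db)


# A track peak of 1.36498 needs a cut of about 2.7 dB.
print(round(reduction_needed_db(1.36498), 2))
```

Note that a lossy encoder will not reproduce the peak exactly, so re-scanning the new file may still show a peak slightly above or below 1.0.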

I haven't played around with it much, and I've no idea what frequency is used by default, but specifying a low pass frequency of 17500 seems to reduce the bitrate for fdkaac VBR5 much of the time (-w 17500). A little more in-line with what you'd expect relative to VBR4 (around 200kbps). I was playing with 17500Hz because from memory that's the frequency the LAME V2 preset uses.
Whether it's a good idea I'm not sure, but specifying the same frequency for VBR4 increases the average bitrate for that preset a tad. A track which might end up 128kbps without -w 17500 might increase to 138kbps with it. Maybe adjusting the frequency is the key.

Edit: I checked and it seems I was remembering wrong. A CBR 128k LAME encode used a low pass of 17k. The V2 preset uses 18.5k. For V4 it's 17.5k and for V0 it's 22.1k. Obviously the LAME developers seem to think adjusting the low pass frequency is a good idea so maybe a couple of presets "in-between" fdkaac's Q4 and Q5 could be created that way?

This post has been edited by yetanotherid: Aug 31 2014, 18:39
foosion
post Aug 31 2014, 23:12
Post #9





Group: FB2K Moderator (Donating)
Posts: 4426
Joined: 24-February 03
Member No.: 5153



QUOTE (yetanotherid @ Aug 31 2014, 18:26) *
I run ReplayGain on the output file and it shows a track peak of 1.36498. Okay..... so I'll re-convert it while applying a volume reduction..... which I've got to specify as a reduction of "x" dB.....
Perhaps I am missing something here, but why do you need or want to specify the volume reduction manually in this case? foobar2000 can apply the ReplayGain settings during conversion just like it does during playback. You only have to enable this function in the Processing section of the converter settings. ;)


--------------------
http://foosion.foobar2000.org/ - my components for foobar2000
kode54
post Yesterday, 10:09
Post #10





Group: Admin
Posts: 4613
Joined: 15-December 02
Member No.: 4082



Needs more testing with my fork of fdkaac. I never updated it, but I did at least set the foundation. Basically, I changed the code that was based on 32 bit integer, so it would instead accept 8.24 fixed point. The only limitation was that it absolutely required +/- 1.0 integer format input for the supplied SBR decoder, which could then produce 8.24 fixed point. I don't know if this can be fixed, or if this also applies to the encoder.

https://github.com/kode54/fdk-aac

Commits:

Modified to output 8.24 fixed point samples [Relevant to encoder and decoder]
Fixed SBR decoding volume scale [Relevant to decoder, possibly needs to be applied to encoder]
yetanotherid
post Yesterday, 22:41
Post #11





Group: Members
Posts: 32
Joined: 11-August 08
Member No.: 56966



QUOTE (foosion @ Sep 1 2014, 08:12) *
QUOTE (yetanotherid @ Aug 31 2014, 18:26) *
I run ReplayGain on the output file and it shows a track peak of 1.36498. Okay..... so I'll re-convert it while applying a volume reduction..... which I've got to specify as a reduction of "x" dB.....
Perhaps I am missing something here but why do you need or want to specify the volume reduction manually in this case? foobar2000 can apply the Replaygain settings during conversion just like it does during playback. You only have to enable this function in Processing section of the converter settings


It doesn't work if the source file doesn't contain ReplayGain info, and at least one format (wave files) can't contain it.

Applying DSPs while converting can change the volume significantly.

I thought I had a third reason but I got distracted for a few minutes and now it's gone.... :)

This post has been edited by yetanotherid: Yesterday, 22:42