HE-AAC/SBR Decoder Delay?

So I've been trying to understand how gapless playback in AAC through the iTunSMPB tag actually works by reading some technical papers, PDFs, and wikis. The following are relevant to this topic:

http://www.hydrogenaud.io/forums/index.php?showtopic=87847
http://www.hydrogenaud.io/forums/index.php?showtopic=98450
http://wiki.multimedia.cx/index.php?title=AAC

In particular, the thread titled "HE-AAC gapless playback" was very useful to that end. Here's a recap of what I came away with:
  • For HE-AAC, FhG's iTunSMPB includes: Encoder-Delay + LC-Decoder-Delay + HE-Decoder-Delay, Padding, Original-Sample-Count
  • For HE-AAC, Apple's iTunSMPB includes: Encoder-Delay + LC-Decoder-Delay, Padding, Original-Sample-Count (i.e. leaving out the fixed value of 481 samples of additional HE decoder delay.)

However, I still wasn't quite clear on what decoder delay actually is or how it translates into silent samples in the encoded audio. To gain more insight, I took a lossless FLAC track from a gapless album (sample rate = 44.1 kHz) and encoded it to FhG HE-AAC and Apple HE-AAC. For better accuracy, I used qaac with smart padding enabled to work around Apple's faulty HE encoding (it cuts the output one frame short at the end).

Next, I took the FhG-encoded HE-AAC file, noted its iTunSMPB tag value ( 00000000 00001084 0000000F 0000000000F6E76D etc.), stripped all tags from it, and decoded it to PCM. I then opened the decoded file in a sound editor, stripped 4228 (#1084) samples from the beginning (i.e. total encoder + decoder delay for both HE+LC), removed the last 15 (#F) samples, and ended up with 16181101 (#F6E76D) audio samples whose offset perfectly matches that of the original lossless FLAC source. This was all expected. So far, so good.
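For reference, the manual steps above can be sketched in a few lines of Python. This is only a rough illustration: `parse_itunsmpb` and `trim` are hypothetical helper names, and the field layout (a leading zero field followed by delay, padding, and original sample count, all in hexadecimal) is assumed from the two tag values quoted in this post rather than taken from any specification.

```python
def parse_itunsmpb(tag):
    """Return (delay, padding, original_sample_count) from an iTunSMPB string.

    Assumed layout: field 0 is zero, fields 1-3 are hexadecimal values.
    """
    fields = tag.split()
    delay = int(fields[1], 16)
    padding = int(fields[2], 16)
    original_count = int(fields[3], 16)
    return delay, padding, original_count


def trim(samples, delay, padding):
    """Drop `delay` samples from the start and `padding` samples from the end."""
    end = len(samples) - padding
    return samples[delay:end]


# The FhG tag quoted above (trailing fields omitted).
fhg_tag = " 00000000 00001084 0000000F 0000000000F6E76D"
delay, padding, original_count = parse_itunsmpb(fhg_tag)
print(delay, padding, original_count)  # 4228 15 16181101
```

With the decoded PCM loaded as a sequence of samples, `trim(pcm, delay, padding)` should leave `original_count` samples, provided the decoder emitted exactly delay + count + padding samples.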

I then repeated the whole thing with the Apple-encoded HE-AAC file. Its iTunSMPB reads ( 00000000 00000840 00000409 00000000007B73B7 etc.). Those values were obviously doubled before noting them down and proceeding. To my surprise, after cutting off delay + padding in the sound editor, I ended up with the original sample count. That's something I wasn't expecting, since I hadn't taken the implicit HE decoder delay into account. Also, the audio's offset was off by 962 samples (exactly the HE decoder delay). Finally, cutting off the padding samples at the end of the stream this time truncated valid audio samples.

So to sum up, here's what I was expecting the Apple-encoded file structure to be like:

EXPLICIT-DELAY-VALUE = #840 x 2 = 4224 samples (i.e. Encoder-Delay + LC-Decoder-Delay)
IMPLICIT-HE-DECODER-DELAY = 481 x 2 = 962 samples
ORIGINAL-SAMPLE-COUNT = #7B73B7 x 2 = 16181102 samples
PADDING = #409 x 2 = 2066 samples

And here is what it actually looked like:

EXPLICIT-DELAY-VALUE = #840 x 2 = 4224 samples (i.e. Encoder-Delay + LC-Decoder-Delay)
IMPLICIT-HE-DECODER-DELAY = 481 x 2 = 962 samples
ORIGINAL-SAMPLE-COUNT = #7B73B7 x 2 = 16181102 samples
!!!PADDING = #409 x 2 minus IMPLICIT-HE-DECODER-DELAY = 2066 - 962 samples!!!
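The arithmetic in the two layouts above can be checked with a short Python sketch. The variable names are mine, and the factor of 2 assumes the tag values are stored at the core (half) sample rate, as described earlier in this post.

```python
SBR_SCALE = 2                        # tag values are doubled for the output rate
HE_DECODER_DELAY = 481 * SBR_SCALE   # implicit HE decoder delay in output samples

# Apple tag fields, converted to output-rate samples.
delay = 0x840 * SBR_SCALE
padding = 0x409 * SBR_SCALE
original_count = 0x7B73B7 * SBR_SCALE

# What the decoded file actually looked like: the audio starts later by the
# HE decoder delay, and the trailing silence is shorter by the same amount.
effective_delay = delay + HE_DECODER_DELAY
observed_trailing = padding - HE_DECODER_DELAY

# Both views account for the same total number of decoded samples.
total_from_tag = delay + original_count + padding
total_observed = effective_delay + original_count + observed_trailing

print(delay, padding, original_count)      # 4224 2066 16181102
print(effective_delay, observed_trailing)  # 5186 1104
print(total_from_tag == total_observed)    # True
```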


So what gives? Why does the Apple-encoded file have a wrong padding sample count? Doesn't this mean that the Apple HE-AAC iTunSMPB tag actually includes the HE decoder delay of 962 samples, albeit lumped in with the padding value?

Reply #1
You just have to slide the window like this, so that the number of valid samples and the total number of samples remain the same.

Before:
|--------------|-------------------------------------------|------------|
After:                                                                       
|--------------------|--------------------------------------------|-----|

You increase the delay by 481 samples and reduce the padding by 481 samples.
This means that padding for HE-AAC has to be greater than 481 samples.

HOWEVER, Apple's encoder often fails to add enough padding, which yields truncated output on decoding. qaac addresses this issue by feeding silence at the end to Apple's encoder.
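A minimal Python sketch of the window slide described above, assuming values in the tag's own units (so the shift is 481). `slide_window` is a hypothetical helper, not part of qaac or any decoder; it just shows why too little padding is a problem.

```python
def slide_window(delay, padding, shift=481):
    """Move the valid-sample window later by `shift` samples.

    The delay grows and the padding shrinks by the same amount, so the
    valid sample count and the total sample count are unchanged.
    """
    if padding < shift:
        # Not enough trailing samples: the end of the audio would be cut off.
        raise ValueError("padding too small to absorb the HE decoder delay")
    return delay + shift, padding - shift


# Apple tag values from the first post (delay #840, padding #409).
new_delay, new_padding = slide_window(0x840, 0x409)
print(new_delay, new_padding)          # 2593 552
print(new_delay * 2, new_padding * 2)  # 5186 1104 at the output sample rate
```

The error branch corresponds to the truncated-output case: if the encoder wrote too little padding, the slid window runs past the end of the decoded stream.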