fb2k's compensation for decoder delay is inconsistent, 529 samples sometimes go missing. Bug or feature?
mjb2006
post Nov 2 2010, 09:49
Post #1





Group: Members
Posts: 755
Joined: 12-May 06
From: Colorado, USA
Member No.: 30694



As I understand it, the first non-VBR-header frame of an MP3 file should decode to 529 (the usual decoder delay) + 1152 samples, and every frame thereafter should decode to exactly 1152 samples. The player should discard those first 529, and then it should discard however many samples the VBR header said was the encoder delay. When the last frames are reached, the player should discard however many samples the VBR header said was padding. It should be as simple as that, right?

foobar2000 1.1 seems to behave inconsistently in this regard. timcupery apparently noticed the problem in 0.9 a while back but didn't get any responses, perhaps because he asked about several things at the same time.

Here's a quick test to demonstrate.

flac.exe -c -d -s Boing_Boom_Tschak.flac | lame.exe -V0 - Boing_Boom_Tschak.mp3
The resulting MP3 has 1155 frames plus the VBR header frame with the LAME tag. The LAME tag specifies an encoder delay of 576 and padding of 991. This checks out: 1155 frames × 1152 samples/frame = 1330560 samples, and 1330560 - 576 - 991 = 1328993, the same length as the original FLAC.

Now load up the MP3 in foobar2000 1.1 and convert it to WAV. Output should be 1328993 samples and should align with the FLAC in a wave editor. No problem so far.

Now in foobar2000, edit the MP3's gapless playback information. Change encoder delay to 0 and length-of-track to 1330560. The playlist item's properties window should now show delay & padding of 0. Convert to WAV again. We would expect the output to be 1330560 samples. However, it's now 1330031. In a wave editor, it's clear that fb2k removed 529 samples from the beginning, seemingly overcompensating for decoder delay by a factor of 2!

It's not just happening in conversion, but in ordinary playback as well. Any MP3 having a delay and padding both set to 0 loses its first 529 samples. Because of this, seams are audible in what should be a gapless series of MP3s created by mp3DirectCut after splitting a CBR, no-bit-reservoir MP3 on frame boundaries.

Back to the Kraftwerk sample: some experimentation shows that after using fb2k to edit the gapless playback information, leaving the encoder delay at 0 and setting the padding to any value between 0 and 529 (length-of-track 1330560 to 1330031) results in the same 1330031-sample output. But setting the padding to 530 (length-of-track 1330030) results in the requested 1330030 samples. In a wave editor, one can observe 576 samples of encoder delay, followed by 1328993 samples of audio, followed by not 530 but 461 samples of padding, as that's what's required to reach 1330030. I notice 1330030 is 530 samples shy of a frame boundary, probably not a coincidence.

I haven't even experimented with encoder delay values yet, and I'm finding it rather painful to wrap my head around this behavior. I don't understand why decoder delay even enters into the equation. I should be able to set encoder delay & padding to whatever I want and get those samples trimmed from the beginning and end, respectively, and never be exposed to decoder delay. What's going on here? Bug? Feature?

This post has been edited by mjb2006: Nov 2 2010, 09:50
kode54
post Nov 2 2010, 19:08
Post #2





Group: Admin
Posts: 4572
Joined: 15-December 02
Member No.: 4082



Where do you arrive at the conclusion that 529 is "overcompensating for decoder delay by a factor of 2?" 529 is not an even number. It is exactly the decoder delay for layer 3 files, in this case.
mjb2006
post Nov 2 2010, 22:48
Post #3





QUOTE (kode54 @ Nov 2 2010, 12:08)
Where do you arrive at the conclusion that 529 is "overcompensating for decoder delay by a factor of 2?" 529 is not an even number. It is exactly the decoder delay for layer 3 files, in this case.


The decoder presumably added 529 samples of silence to the beginning. These got removed, as they should be. But then another 529 samples got removed. Those samples are getting cut from the beginning of the non-delay part of the audio.

This post has been edited by mjb2006: Nov 2 2010, 22:52
kode54
post Nov 3 2010, 01:32
Post #4





If you remove the gapless information or set the delay and padding to zero, then the decoder delay is not removed.
mjb2006
post Nov 3 2010, 03:23
Post #5





QUOTE (kode54 @ Nov 2 2010, 18:32)
If you remove the gapless information or set the delay and padding to zero, then the decoder delay is not removed.


If that's true, then there should be 529 extra samples of silence at the beginning. There's not.
The decoder delay is stripped, and another 529 samples beyond that are also stripped.
kode54
post Nov 3 2010, 09:34
Post #6





If that were the case, then the decoder would output ( number of frames * 1152 ) + 529 samples. It does not. The only way to make it do so would be to feed it a garbage frame after the end of the file, then keep the first 529 samples of that frame.

The decoder always emits exactly 1152 samples per frame decoded.

Perhaps somebody here should better explain how MP3 gapless information works, where the delay samples are emitted, and how they are discarded?

How does LAME itself handle this when decoding files with or without gapless information?
mjb2006
post Nov 4 2010, 08:04
Post #7





QUOTE (kode54 @ Nov 3 2010, 02:34)
The decoder always emits exactly 1152 samples per frame decoded.


This doesn't help me. Decoder delay, as I understand it, precedes the first frame's worth of 1152 samples. Not every frame, just the first one in the stream. I assume fb2k, as the handler of the decoder's I/O, removes that delay so the first frame effectively yields 1152, like the rest. Are you saying that's not true?

The fact remains that:
  • given an 1155-frame file with 576 delay and 991 padding, fb2k emits 1328993 (1155×1152-576-991) samples, which makes sense, and inspection of the samples reveals nothing is missing; the sample stream matches up with the original.
  • given an 1155-frame file with 0 delay and 0 padding, fb2k should emit 1330560 (1155×1152) samples, but instead it emits 1330031 (1155×1152-529) samples, which doesn't make sense. Inspection reveals the missing 529 samples are in fact signal, not delay, as I already stated. Follow the steps above to reproduce it if you don't believe me.


So it seems under some low encoder delay/padding circumstances, 529 samples get cut in error.

This post has been edited by mjb2006: Nov 4 2010, 08:09
Yirkha
post Nov 5 2010, 14:45
Post #8





Group: FB2K Moderator
Posts: 2359
Joined: 30-November 07
Member No.: 49158



QUOTE (mjb2006 @ Nov 2 2010, 10:49)
As I understand it, the first non-VBR-header frame of an MP3 file should decode to 529 (the usual decoder delay) + 1152 samples, and every frame thereafter should decode to exactly 1152 samples.
That's wrong; all frames of your MP3 file decode to exactly 1152 samples.

QUOTE (mjb2006 @ Nov 2 2010, 10:49)
The player should discard those first 529, and then it should discard however many samples the VBR header said was the encoder delay.
True. First, 529 samples are discarded, because that's the delay of the decoder. Additionally, if the file was encoded with a "LAME gapless tag", the specified number of subsequent samples (the encoder delay) are cut off too.

QUOTE (mjb2006 @ Nov 2 2010, 10:49)
When the last frames are reached, the player should discard however many samples the VBR header said was padding.
Almost right. It is necessary to clarify that there is an implicit padding of 529 samples (the decoder delay again), so padding values <= 529 have no effect and are ignored.

QUOTE (kode54 @ Nov 3 2010, 10:34)
How does LAME itself handle this when decoding files with or without gapless information?
lame-3.98.4.tar.gz:/frontend/main.c:201:
CODE
            if (*enc_delay > -1 || *enc_padding > -1) {
                if (*enc_delay > -1)
                    skip_start = *enc_delay + 528 + 1;
                if (*enc_padding > -1)
                    skip_end = *enc_padding - (528 + 1);
            }
            else
                skip_start = lame_get_encoder_delay(gfp) + 528 + 1;


This post has been edited by Yirkha: Nov 8 2010, 00:18


--------------------
Full-quoting makes you scroll past the same junk over and over.
mjb2006
post Nov 6 2010, 21:50
Post #9





QUOTE (Yirkha @ Nov 5 2010, 07:45)
True. First, 529 samples are discarded, because that's the delay of the decoder. Additionally, if the file was encoded with a "LAME gapless tag", the specified number of subsequent samples (the encoder delay) are cut off too.

I guess I'm not understanding what the decoder delay is, then. I need a diagram showing how the 1152 samples in the first frame are split into encoder delay, decoder delay, padding, and samples from the original audio.

QUOTE (Yirkha)
It is necessary to clarify that there is an implicit padding of 529 samples (the decoder delay again), so padding values <= 529 have no effect and are ignored.

What is the relationship between decoder delay and padding? I thought padding was appended to the original audio (not the original audio + 529) to make it be a multiple of 1152 before encoding, and decoder delay is silence added to the beginning upon decoding.

Bonus if this can be explained without using the words 'filterbank' and 'MDCT' :)
Go to the top of the page
+Quote Post
Yirkha
post Nov 7 2010, 20:10
Post #10





Oh well, here you go:


mjb2006
post Nov 8 2010, 13:03
Post #11





Thanks! This is helping. I'm closer to understanding what foobar is doing. The fact that there are 3 related but discrete padding measurements is a revelation. I have more questions, but it'll take me a day or two to articulate them. Thanks again.