Musepack encoder, Modification to write MPEG Audio Layer 1/2
S_O
post Dec 5 2010, 21:54
Post #1





Group: Members
Posts: 296
Joined: 27-July 02
From: Germany
Member No.: 2821



Hello,
several years ago it was said that changing the Musepack encoder to produce MP2 output wouldn't take very long. But AFAIK nobody has released a Musepack-based MP2 encoder so far.
Because I need a high-quality MP2 encoder (for authoring DVDs), I tried it myself:

I was able to modify the encoder so that it outputs a 448 kbit/s MP1 file. If the bitrate Musepack wants to use is higher, the last subbands are simply cut off; otherwise the frame is padded with zeros. The problem now is this: after just one Musepack frame (= three Layer 1 frames) the bitrate drops dramatically. From band 5 upwards only 2 bits are assigned to each subband, and the lower ones get 8 bits at most. It doesn't matter which quality setting I try; it's the same from thumb to insane.
Otherwise the output file plays fine (there are very noticeable artifacts because of the low effective bitrate, about 200 kbps Layer 1, stereo, 44.1 kHz, but apart from that it seems to work).
Is anybody here familiar enough with the Musepack encoder to tell why this bitrate drop happens? I haven't done anything special: I disabled M/S coding, changed the scalefactors, wrote a function that writes an MPEG Layer 1 bitstream, and modified the allocate function so that it does not use unsupported quantizers (the resolution is increased in that case) and so that the allocation is limited by the maximum bitrate (that's not what causes the problem). That's it. Unfortunately the source code is not very structured or readable, so I don't see what I'm missing.
Any ideas?
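
For reference, the fixed frame sizes involved here follow from the standard MPEG-1 frame-length formulas. The sketch below is not taken from the modified encoder; the function names are made up for illustration. It shows how many bytes a 448 kbit/s Layer 1 frame offers, which is the space that either gets padded with zeros or forces subbands to be dropped.

```c
#include <stddef.h>

/* Bytes in one MPEG-1 Layer I frame (384 samples per channel).
 * Layer I counts in 4-byte slots: 12 * bitrate / sample_rate slots,
 * rounded down, plus one extra slot when the padding bit is set. */
size_t layer1_frame_bytes(unsigned bitrate_bps, unsigned sample_rate_hz, int padding)
{
    return (size_t)((12u * bitrate_bps) / sample_rate_hz + (padding ? 1u : 0u)) * 4u;
}

/* Bytes in one MPEG-1 Layer II frame (1152 samples per channel).
 * Layer II slots are single bytes: 144 * bitrate / sample_rate, rounded
 * down, plus one byte when the padding bit is set. */
size_t layer2_frame_bytes(unsigned bitrate_bps, unsigned sample_rate_hz, int padding)
{
    return (size_t)((144u * bitrate_bps) / sample_rate_hz + (padding ? 1u : 0u));
}
```

At 448 kbit/s and 44.1 kHz a Layer 1 frame holds 484 bytes (488 with padding), and three of them cover the 1152 samples of one Musepack frame.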
alexeysp
post Dec 6 2010, 11:21
Post #2





Group: Members
Posts: 142
Joined: 3-April 09
Member No.: 68627



Although I cannot answer your question, may I ask why wouldn't you just use TooLAME instead?

S_O
post Dec 6 2010, 15:52
Post #3








TooLAME hasn't been developed for a long time, and development of its successor TwoLAME also seems to have stopped some years ago. Neither encoder provides the quality of Musepack. Musepack has a more advanced encoder, allowing much higher quality. Of course the MP2 bitstream format does not allow all Musepack features (M/S stereo, Huffman coding, PNS, true VBR, ...), but you should be able to encode MP2 files of about the same quality as Musepack with a major bitrate increase (something like 256 - 320 kbps being comparable to Musepack standard).
I need the encoder to author DVDs containing music. Unfortunately LPCM causes problems with several players, and there is also no free, high-quality AC-3 encoder around. Musepack-based MP2 seems to be a good choice to me.

I found out that the problem is the SMRs returned by "Psychoakustisches_Modell": from the second call of that function onwards they are much too low. Unfortunately I haven't found the reason why. Any ideas?
S_O
post Dec 6 2010, 23:43
Post #4








I was able to trace the problem inside "Psychoakustisches_Modell" to the function "PreechoControl", which modifies a global array. After commenting out the array-modifying code the bitrate drop is gone, but the bit allocation is awful. Most likely this is not the only problem. I can encode a decent-sounding 448 kbps MP1 file using a fixed allocation table (not using psychoacoustics at all).
What am I missing that makes "Psychoakustisches_Modell" return no usable values?
alexeysp
post Dec 7 2010, 15:47
Post #5








No offence, but I think you're wasting your time. I seriously doubt that there could be any audible difference between TooLAME at 384 kbps (maybe even lower) and the thing you're trying to make.

In any case, claims like

QUOTE (S_O @ Dec 6 2010, 16:52)
Both encoders don't provide the quality of MusePack. MusePack has a more advanced encoder, allowing much higher quality.


certainly demand proof (especially considering we're talking about bitrates of around 400 kbps).
S_O
post Dec 7 2010, 18:53
Post #6









I understand what you mean. I remember getting quite bad results with TooLAME. I just checked TwoLAME, and the first sample I tried was easily ABXable (8/8) at 256 kbps (with noticeable artifacts at 192 kbps), and my ears are not "tuned" to hear encoding artifacts. By the way, I'm not talking about ~400 kbps bitrates: 384 kbps is the maximum for MP2, and it would be great if it could already sound transparent at 256 kbps. High bitrate doesn't imply high quality. Try Blade at 320 kbps for MP3; you will find a lot of killer samples that won't sound transparent. You also cannot compare the bitrates of an old subband coder with no entropy coding to those of modern MDCT-based codecs (AAC, Vorbis or even MP3).
Musepack achieves this high quality because of a highly tuned encoder. Because MP2 is basically Musepack with several features missing, it is possible to create an MP2 bitstream of similar quality at higher bitrates using the Musepack encoder.

On the Musepack source:
I take it all back! I was completely wrong. It had nothing to do with "Psychoakustisches_Modell". M/S needed to be disabled in two places, and I noticed I was coding one quantizer wrongly, which made the sound quite awful (and I hadn't used that one in my fixed allocation table). So I basically got it working; now I need to change the code to a more sophisticated CBR allocation, add all the MP2 allocation tables, etc.

At the moment there is just one question left: the combine penalty. It's about how many scalefactors are coded for each band in a frame (1, 2 or 3). For that it uses this magic table:

CODE
static const unsigned char  Penalty [256] = {
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
      0,  2,  5,  9, 15, 23, 36, 54, 79,116,169,246,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
};

#define P(new,old)  Penalty [128 + (old) - (new)]


P is called with the new and old scalefactor index, and the result is compared to the combine penalty value of the profile (6 by default for all profiles). MP2 uses different scalefactors, so this table probably needs to be updated with new values. Unfortunately I have no idea what these values represent and how they were calculated in the first place.
alexeysp
post Dec 8 2010, 00:28
Post #7








Apparently this table provides the penalty for replacing "old" with "new", depending on their difference. The penalty is at its maximum if old < new, or if the difference exceeds 11. Since "old" and "new" are actually indices into the scalefactor table, and, if I get it right, the scalefactors are defined as scf = 10**(-0.1*index/1.26), the difference of indices maps to a certain ratio of scalefactors. If we plot the penalty values Penalty[128] through Penalty[139] against the corresponding scalefactor ratios, we get the following graph:

[Graph: penalty values Penalty[128] through Penalty[139] plotted against scalefactor ratio; image not preserved]

Now, don't quote me on this, but it seems that in these coordinates the penalty values are remarkably well described by the following function:

p(x) = 4.5*((1/x**2)-1)

where x is the scalefactor ratio (or by a simple quadratic function of the inverse ratio). The proportionality constant probably comes from some power-of-two to power-of-ten conversion, but that's just a guess.

For what it's worth, hope this helps.
S_O
post Dec 8 2010, 14:12
Post #8








Thank you very much! That makes sense and helped a lot. I calculated new values for the MP2 scalefactors.

I hope I'll be able to finish the encoder so it can be released at the beginning of next year.
S_O
post Dec 15 2010, 13:28
Post #9








I'm making some progress on the encoder (I'm already able to write MP2 files with dynamic allocation, have removed all global variables and separated the frontend from the encoder), but I found something I don't understand:

CODE
#define MPPENC_DENORMAL_FIX_BASE ( 32. * 1024. /* normalized sample value range */ / ( (float) (1 << 24 /* first bit below 32-bit PCM range */ ) ) )
#define MPPENC_DENORMAL_FIX_LEFT ( MPPENC_DENORMAL_FIX_BASE )
#define MPPENC_DENORMAL_FIX_RIGHT ( MPPENC_DENORMAL_FIX_BASE * 0.5f )

These values are added to the input. Any idea why, and why only half as much on the right channel?
benski
post Dec 15 2010, 20:00
Post #10


Winamp Developer


Group: Developer
Posts: 670
Joined: 17-July 05
From: Brooklyn, NY
Member No.: 23375




No idea why the right channel only gets half, but "denormal" refers to an issue with the FPU on some Intel processors where values very close to zero get calculated using a more precise but much slower mode. There's no perfect way around it, but a common approach is to add a small inaudible value so that numbers close to zero get pushed back over the threshold. It has nothing to do with Musepack or MP2; it's simply an optimization that's commonly done in DSP code that will run on x86.
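
A small experiment makes the effect visible. The sketch below uses a toy one-pole recursive filter, which is an assumption for illustration and not code from Musepack; `run_decay` and `is_denormal` are hypothetical names. It feeds the filter one impulse followed by digital silence and checks whether the state ends up as a denormal, with and without the 1/512 offset from the macros in post #9.

```c
#include <math.h>

/* 32 * 1024 / 2^24 = 1/512, the value of MPPENC_DENORMAL_FIX_BASE. */
#define DENORMAL_FIX_BASE (32.f * 1024.f / (float)(1 << 24))

/* Run y = 0.5 * y + x for `steps` samples. The first input sample is 1.0,
 * every later one is digital silence plus `offset`. Returns the final state. */
float run_decay(float offset, int steps)
{
    volatile float y = 0.0f;   /* volatile forces real single-precision stores */
    for (int n = 0; n < steps; n++) {
        float x = (n == 0 ? 1.0f : 0.0f) + offset;
        y = 0.5f * y + x;
    }
    return y;
}

/* 1 if v is a denormal (subnormal) float, 0 otherwise. */
int is_denormal(float v)
{
    return fpclassify(v) == FP_SUBNORMAL;
}
```

Without an offset the state halves on every sample and drops below the smallest normal float after roughly 127 steps; with the offset it settles near 1/256 instead. This assumes a build without flush-to-zero (for example, no `-ffast-math`).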
S_O
post Dec 16 2010, 22:06
Post #11








Interesting. So you're saying this is done because very small (denormal) numbers slow down the processor?

Given that the Musepack encoder operates on floating-point numbers in the range -32767 ... +32767, the smallest positive number you get with 16-bit input is 1. With 24-bit input the smallest number is 1/256. But what is added is 1/512, which doesn't make much sense, because -1/256 is also possible and results in -1/512, an even smaller number (in absolute value).
I read that the smallest positive number in 32-bit float is about 1.401*10^(-45) (that one is already a denormal; the smallest normal number is about 1.175*10^(-38)). That is a lot smaller than 1/512. In fact, to reach a number as small as 1.401*10^(-45) you would need a 165-bit integer audio source.

Are you sure that's the reason? I haven't implemented this in my encoder yet, and it is very fast (faster than Musepack).

I also read that SSE2 provides a flag in a control register that tells the processor to treat all denormal numbers as 0. Wouldn't that make more sense than adding a constant?
benski
post Dec 17 2010, 02:40
Post #12





I can't remember all the details of Layer 2, but any IIR filters in the signal path will eventually create denormals, regardless of the value range, when there is digital silence at the input.

Yes, if you use the SSE2 extension you can disable special denormal processing. But that only applies to SSE2 instructions (e.g. addpd) and not to x87 instructions (e.g. fadd).

Yes, denormals are super slow.
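
A minimal sketch of the control-register approach, assuming an x86 build whose floating-point math is compiled to SSE instructions. It uses the flush-to-zero macros from `<xmmintrin.h>`; the function name `flush_to_zero_demo` and the fallback return value are made up for the example. On builds that use x87 math the flag has no effect on the multiplication, which is the limitation described above.

```c
#include <float.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE_FTZ 1
#else
#define HAVE_SSE_FTZ 0
#endif

/* Returns 1 if a denormal result was flushed to zero, 0 if it was kept,
 * and -1 if this build has no SSE control register to set. */
int flush_to_zero_demo(void)
{
#if HAVE_SSE_FTZ
    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

    volatile float tiny = FLT_MIN;       /* smallest normal float */
    volatile float half = tiny * 0.5f;   /* denormal unless flushed to zero */

    _MM_SET_FLUSH_ZERO_MODE(saved);      /* restore the caller's setting */
    return half == 0.0f;
#else
    return -1;
#endif
}
```

The flag is per thread, so an encoder would typically set it once at the start of each processing thread rather than around every operation.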
S_O
post Dec 18 2010, 16:13
Post #13








I've tested different files, including digital silence, and the encoder never slows down (since denormals are about 700 times slower than normal numbers, I think that would be noticeable). Of course there may be an audio file that causes denormals somewhere in the processing, but I don't see why adding 1/512 helps (except for preventing 0 at the input, and that only for 16/24-bit integer sources). I also could not find anything like this in the TwoLAME code.

I think I have already finished most of the work. The encoder should support all features of MP2 except intensity stereo (and dual channel is just like stereo; only a flag in the header is different).
Now I have to test whether my modified allocation code causes any problems. That's the biggest difference between Musepack and MP2: Musepack can allocate just as many bits to every subband as the encoder thinks best. In MP2 there are not only fixed frame sizes, but also allocation tables that don't allow all quantizers for all subbands.
The way I did it: the Musepack allocation increases the resolution of each subband until the Mask-To-Noise-Ratio (MNR) is smaller than 1.
For MP2 I added:
-Increase all resolutions until they are codeable by MP2 (new MNRs for changed subbands)
-If VBR: Find the minimum frame size that allows coding these resolutions
-Decrease the resolution of all subbands by one (codeable) step until the audio fits into the frame (for VBR that only happens if the maximum bitrate is reached)
-Calculate new MNRs
-Increase the resolution of the subband with the highest MNR and calculate its new MNR, in a loop, until the frame is completely filled.
Any better ideas?
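
The steps above can be sketched for the CBR case as follows. Everything concrete in it is an assumption made for illustration: the four-band `codeable` table, the toy noise model in which each extra bit lowers the noise power by a factor of four, and the cost of 36 samples per band. "MNR" is used as in the post, as a measure that is high when a band needs more bits. The VBR frame-size search is left out.

```c
#define NBANDS 4
#define MAXRES 8
#define SAMPLES_PER_BAND 36   /* assumed cost: 36 samples per subband and frame */

/* Hypothetical allocation table: codeable[b][r] is 1 if band b may use
 * resolution r. Resolution 0 (band not coded) is always allowed. */
static const int codeable[NBANDS][MAXRES + 1] = {
    {1, 1, 1, 1, 1, 1, 1, 1, 1},
    {1, 1, 1, 1, 1, 1, 1, 1, 1},
    {1, 1, 0, 1, 0, 1, 0, 1, 0},
    {1, 1, 0, 1, 0, 0, 0, 0, 0},
};

/* Toy noise model: every extra bit of resolution divides the value by 4. */
static double noise_to_mask(double smr, int res)
{
    while (res-- > 0) smr /= 4.0;
    return smr;
}

/* Next codeable resolution above res, or -1 if there is none. */
static int step_up(int b, int res)
{
    for (int r = res + 1; r <= MAXRES; r++)
        if (codeable[b][r]) return r;
    return -1;
}

/* Next codeable resolution below res (falls back to 0). */
static int step_down(int b, int res)
{
    for (int r = res - 1; r > 0; r--)
        if (codeable[b][r]) return r;
    return 0;
}

int frame_bits(const int res[NBANDS])
{
    int bits = 0;
    for (int b = 0; b < NBANDS; b++) bits += res[b] * SAMPLES_PER_BAND;
    return bits;
}

/* res[] holds the resolutions (0..MAXRES) wanted by the psychoacoustic
 * model on entry and the final allocation on return. */
void fit_allocation(int res[NBANDS], const double smr[NBANDS], int budget)
{
    /* Raise every band to a resolution the format can code. */
    for (int b = 0; b < NBANDS; b++) {
        if (!codeable[b][res[b]]) {
            int up = step_up(b, res[b]);
            res[b] = (up >= 0) ? up : step_down(b, res[b]);
        }
    }
    /* Lower all bands by one codeable step until the frame fits. */
    while (frame_bits(res) > budget)
        for (int b = 0; b < NBANDS; b++)
            res[b] = step_down(b, res[b]);
    /* Refill: give the next step to the worst band that still fits. */
    for (;;) {
        int used = frame_bits(res), best = -1;
        double worst = 0.0;
        for (int b = 0; b < NBANDS; b++) {
            int up = step_up(b, res[b]);
            if (up < 0 || used + (up - res[b]) * SAMPLES_PER_BAND > budget)
                continue;
            double nmr = noise_to_mask(smr[b], res[b]);
            if (best < 0 || nmr > worst) { worst = nmr; best = b; }
        }
        if (best < 0) break;
        res[best] = step_up(best, res[best]);
    }
}
```

With requested resolutions {6, 4, 2, 1}, SMR values {1000, 100, 10, 1} and a 400-bit budget, the sketch ends at {5, 3, 3, 0}: all bands are lowered once, and the spare bits then go to the third band because it has the highest remaining value.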
benski
post Dec 18 2010, 19:33
Post #14





When lowering resolution, can you lower critical band(s) last or not at all?
S_O
post Dec 19 2010, 01:46
Post #15








The Musepack psy-model decides which bands are critical.
Instead of reducing one single band I reduce all bands. The reasoning: band X might have the lowest MNR of all, so I would decrease it, but after the decrease its MNR might rise dramatically, while decreasing the resolution of band Y, which had a higher MNR, would have raised the MNR of that band only a little.

Therefore I decrease all bands and then always increase the one with the highest MNR until the frame is filled. The idea is that the highest MNR of all bands should be as low as possible, rather than the average MNR being as low as possible. Of course, if the frame is not overfull in the first place, the decreasing step is skipped. To get the highest quality, the psy-model should be set to the value that comes closest to the MP2 bitrate. I need to do some further testing, but Musepack standard (5.0) should go with about 256 kbps. That means most of the frames will only be increased, but about 1/3 still have to be decreased (this depends on the sample, of course, but it is the average for some music I tested). Of course the encoder will offer the same parameters to control the psy-model as Musepack.

The allocation algorithm treats all bands the same, but the psy-model calculates the values used to get the MNR. The MNR is also calculated differently for transient and non-transient bands (the psy-model also finds out which bands are transient).

Something completely non-technical: any idea what I should name the encoder? mp2enc or similar doesn't sound like a completely new MP2 encoder. I asked the Musepack team, and they do not want the encoder to be named in a way that resembles Musepack.
bryant
post Dec 19 2010, 08:08
Post #16


WavPack Developer


Group: Developer (Donating)
Posts: 1297
Joined: 3-January 02
From: San Francisco CA
Member No.: 900



NAME = Name Ain't a Musepack Encoder ;)
pbelkner
post Dec 19 2010, 12:12
Post #17





Group: Members
Posts: 412
Joined: 13-June 10
Member No.: 81467



QUOTE (benski @ Dec 17 2010, 03:40)
Yes denormals are super slow.

By pure chance I just came across this:
Denormal numbers in floating point signal processing applications
Laurent de Soras
2005.04.19
http://ldesoras.free.fr/doc/articles/denormal-en.pdf
