MP3 vs. MPC vs. Ogg: Low volume test
tigre
post Feb 8 2003, 02:30
Post #1


Moderator


Group: Members
Posts: 1434
Joined: 26-November 02
Member No.: 3890



Hi there!

Inspired by the "Using DirectSound SSRC Plugin" thread, I wanted to know how the output of foobar2000 differs from that of Winamp and other players. I came up with the following test:

1. Take a sample (I did it with 30 seconds of music, could be done with test signals too), convert it to 32 bit resolution in Cool Edit Pro.

2. Apply a logarithmic fadeout to the whole sample: 0 to -150 dB (using CEP).

3. Save directly, or dither to 24 bit and then save, depending on what the lossy encoders in step 4 accept.

4. Encode with lame 3.90.2 --alt-preset insane (api), mpc 1.15i braindead, vorbis 1.0 -q 10 (so the encoder settings should not be the weak link).

5. Play back the encoded files in the players under test and capture the output to a wav file, at 24bit/96kHz resolution if possible. The problem here is that I don't know any tool for this; I asked for advice in this thread. Because of this I've only tested foobar2000 0.5 beta so far, as its diskwriter is capable of 24bit/96kHz output. So until now it's not a player output comparison but an encoder comparison, as you'll see soon.

6. Apply a logarithmic fadein 0 -> +150 dB to the whole sample with CEP to bring the volume back to the original level everywhere.

7. Listen to the resulting files.

The reason for this procedure was to "exaggerate" the differences that occur at very low volume and make them audible.
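For readers who want to try the fade-out / fade-in round trip without Cool Edit Pro, here is a minimal sketch in plain Python. It assumes that a "logarithmic" fade means a gain ramp that is linear in dB, which may not be exactly what CEP does. The 440 Hz test tone, the 8 kHz sample rate and all function names are made up for illustration, and the 24-bit step is rounded without dither.

```python
import math

def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def fade_gain_db(i, n, start_db, end_db):
    """Gain in dB at sample i of n for a ramp that is linear in dB."""
    if n < 2:
        return start_db
    return start_db + (end_db - start_db) * i / (n - 1)

def apply_fade(samples, start_db, end_db):
    """Multiply every sample by the ramp gain at its position."""
    n = len(samples)
    return [s * db_to_gain(fade_gain_db(i, n, start_db, end_db))
            for i, s in enumerate(samples)]

def quantize(samples, bits):
    """Round to signed integer PCM steps of the given bit depth (no dither)."""
    scale = 2 ** (bits - 1)
    return [round(s * scale) / scale for s in samples]

# Hypothetical test signal: 1 second of a 440 Hz sine at 8 kHz.
rate = 8000
original = [0.5 * math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]

faded = apply_fade(original, 0.0, -150.0)    # step 2: fadeout 0 -> -150 dB
stored = quantize(faded, 24)                 # step 3: store as 24 bit
restored = apply_fade(stored, 0.0, 150.0)    # step 6: fadein 0 -> +150 dB

# Without the 24-bit step the round trip is practically lossless.
lossless = apply_fade(faded, 0.0, 150.0)
float_error = max(abs(a - b) for a, b in zip(original, lossless))
pcm_error = max(abs(a - b) for a, b in zip(original, restored))
print("largest error, float only:", float_error)
print("largest error, via 24 bit:", pcm_error)
```

In this sketch the tail of the 24-bit version rounds to digital silence, so the error there is as large as the signal itself. A lossy encoder would take the place of the `quantize` call in the real test.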

My observations: All files started to contain additional noise compared to the original after ~1/2 of the playing time. At this point something surprised me: the mp3 sounded like the original plus increasing noise until the end of the sample, while MPC and Vorbis started with slight artifacts and sounded more and more awful: ogg somehow like underwater, mpc like NUMLock's lame audiophile preset ;) and worse.

So is mp3 superior at low volumes? Not really. I gave mp3 an advantage by using CBR. A quick test using lame --alt-preset extreme (ape, = VBR) showed the same awfulness as Vorbis and mpc. Besides, I noticed that with decreasing volume the bitrate of all 3 codecs decreased too.

Conclusion? I'm not sure, but maybe there's something that could be improved (an adaptive ATH model for high resolution signals, or ...). Imagine you have a high resolution source (DVD audio ...), music that uses its huge dynamic range (not completely, of course) and equipment with a really high SNR ... then this could be an issue.

I hope I haven't just wasted my time and space on this forum ;). Thoughts welcome.


--------------------
Let's suppose that rain washes out a picnic. Who is feeling negative? The rain? Or YOU? What's causing the negative feeling? The rain or your reaction? - Anthony De Mello
SometimesWarrior
post Feb 8 2003, 03:10
Post #2





Group: Members
Posts: 671
Joined: 21-November 01
From: California, US
Member No.: 514



I remembered hearing something about MPC SV8 having support for a higher dynamic range than SV7. A forum search turned up an old post by Frank Klemm mentioning this. Here's an excerpt (emphasis mine):

QUOTE (Frank Klemm @ Mar 5 2002 - 09:20 AM)
But the main advantage will be more stable quality than SV7. Especially the little bit too much treble problem should be fixed in SV8. Editing/Fading of files without reencoding is possible. You can convert SV7 and MP2 to SV8. Streaming is possible (but not optimized for weak connections). 32 kHz and 48 kHz is possible. Up to 7 channels are possible. Increases dynamic range (102 dB => 256 dB). Tagging support. CDdb (not like cddb or freedb, but more like imdb).

The details may have changed slightly, though; a newer excerpt from here says SV8 will "only" have 190dB possible dynamics.

I'm not sure if this has any relevance to your observations, it's just something that came to my mind. I'm fairly confident that the "increased dynamic range" goes towards eliminating clipping during the encoding phase, but I'm not sure if it also helps with the situation you're testing.
CiTay
post Feb 8 2003, 03:22
Post #3


Administrator


Group: Admin
Posts: 2378
Joined: 22-September 01
Member No.: 3



QUOTE
Besides, I noticed that with decreasing volume the bitrate of all 3 codecs decreased too.


I think you're mainly testing the ATH curves/masking thresholds of each codec. This is something I thought about before, in conjunction with WaveGain prior to encoding, or with older, relatively quiet recordings. I wonder whether that could have a negative effect, because more of the signal goes down in the noise, or, on the other hand, with overcompressed modern music, whether more and more of the signal is left out due to temporal masking. Remember, we've already seen some side effects of today's loudness race; see internal clipping with MPC. This was a design flaw, since nobody could anticipate today's excessive volume levels. There must be some more in today's codecs...
Garf
post Feb 8 2003, 10:16
Post #4


Server Admin


Group: Admin
Posts: 4886
Joined: 24-September 01
Member No.: 13



QUOTE (CiTay @ Feb 8 2003 - 04:22 AM)
I think you're mainly testing the ATH curves/masking thresholds of each codec. This is something I thought about before, in conjunction with WaveGain prior to encoding, or with older, relatively quiet recordings. I wonder whether that could have a negative effect, because more of the signal goes down in the noise, or, on the other hand, with overcompressed modern music, whether more and more of the signal is left out due to temporal masking.

You can test this theory by scaling down, encoding, decoding, and scaling up. The result should be very different and not have any noticeable effect on quality.

Vorbis (and I assume the others) has a 'floating' ATH curve. If you start out at 0dB then the codec will assume you can't be hearing the -90dB signal well, be it because it's too quiet or because you blew your ears out with the 0dB signal.
tigre
post Feb 8 2003, 10:54
Post #5


Moderator


Group: Members
Posts: 1434
Joined: 26-November 02
Member No.: 3890



QUOTE (Garf @ Feb 8 2003 - 01:16 AM)
Vorbis (and I assume the others) has a 'floating' ATH curve. If you start out at 0dB then the codec will assume you can't be hearing the -90dB signal well, be it because it's too quiet or because you blew your ears out with the 0dB signal.

If I understood you correctly, the results should be different if the "test" is modified like this: apply the fadeout and fadein (steps 2/6) not from 0 -> -150dB but e.g. from -60 -> -150dB. The maximum volume is then much lower, and the ATH curve will "float" so that at points where the sound was awful before (e.g. -90dB amplification in step 2) it should be much better now. I tested this and it's still the same.


robert
post Feb 8 2003, 15:23
Post #6


LAME developer


Group: Developer
Posts: 789
Joined: 22-September 01
Member No.: 5



LAME offers three different approaches to ATH adjustment. The only problem with comparing them using aps is that some of the aps code-level hacks are present in type 3 only.
Gecko
post Feb 8 2003, 17:17
Post #7





Group: Members
Posts: 948
Joined: 15-December 01
From: Germany
Member No.: 662



MPC (at least the Buschmann encoder; I see no reason why it would have been removed) has a strategy to increase fidelity in low volume passages too. See here. It's called "adaptive threshold in quiet".

tigre, I have a question: when you say you compared the encodes to the original after amplifying them back to normal volume, what original did you use? Imho it would be useful to compare against the processed original (WAV -> fadeout -> fadein) instead of the unprocessed source.

edit: link fixed

This post has been edited by Gecko: Feb 8 2003, 18:34
tigre
post Feb 8 2003, 18:15
Post #8


Moderator


Group: Members
Posts: 1434
Joined: 26-November 02
Member No.: 3890



QUOTE (Gecko @ Feb 8 2003 - 08:17 AM)
tigre, I have a question: when you say you compared the encodes to the original after amplifying them back to normal volume, what original did you use? Imho it would be useful to compare against the processed original (WAV -> fadeout -> fadein) instead of the unprocessed source.

I created different originals in different ways: 1. The processed original (as you suggested). 2. A processed original with all steps applied except for lossy compression. 3. The unprocessed original.

Originals #1a (saved as 24bit .wav after fadeout) and #2 sounded like this: increasing noise that becomes audible at 2/3 of the sample (= -100dB amplification), sounding the same as the CBR mp3 (api). This is to be expected, as the resolution is lowered to 24bit during processing. In original #1b (saved as 32bit .wav after fadeout) there was no audible difference to #3 (the unprocessed signal). It seems that the music I tested with, together with the noise of my system/environment, masked the noise introduced by the processing of #1b well enough (32bit = 193dB SNR; -150dB amplification => 43dB SNR left) to hide the difference from me.
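The "32bit = 193dB" and "43dB left" figures above can be reproduced with a rough rule of thumb: about 6.02 dB per bit of integer PCM resolution. The sketch below uses that simplification only. It ignores dither and the usual +1.76 dB term for a full-scale sine, and the function names are hypothetical.

```python
import math

def pcm_dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20.0 * math.log10(2 ** bits)

def snr_left_db(bits, attenuation_db):
    """Dynamic range remaining after the signal has been attenuated."""
    return pcm_dynamic_range_db(bits) - attenuation_db

for bits in (16, 24, 32):
    print(bits, "bit:",
          round(pcm_dynamic_range_db(bits), 1), "dB total,",
          round(snr_left_db(bits, 150.0), 1), "dB left after -150 dB")
```

By the same estimate, a 24bit file attenuated by 100dB keeps roughly 44dB, which is close to the point where the noise became audible in originals #1a and #2.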

[EDIT]BTW: The link in your post above doesn't work here.[/EDIT]

This post has been edited by tigre: Feb 8 2003, 18:16


Gabriel
post Feb 8 2003, 19:13
Post #9


LAME developer


Group: Developer
Posts: 2950
Joined: 1-October 01
From: Nanterre, France
Member No.: 138



Well, Lame should have about 125dB of dynamic range. I guess the situation is the same for MPC and Vorbis.

150dB seems too high. You will never encounter music with 150dB of dynamic range. (Btw, if you do encounter such music, I would seriously advise you NOT TO LISTEN to it, or you would immediately become deaf.)

Testing with -90 or -100dB would be more interesting, as this is likely to happen in extreme cases in music.
tigre
post Feb 8 2003, 20:38
Post #10


Moderator


Group: Members
Posts: 1434
Joined: 26-November 02
Member No.: 3890



I know a 150dB dynamic range is unrealistic. But this is a test, and I wanted to find out where problems start. For this, a 0 -> -150dB fadeout is a decent choice.

Take mpc as an example: My music sample was 42 seconds long, and mpc started to sound bad at 18 seconds. That corresponds to an amplification of -64dB. The peak volume of the original file was -6dB at 18 seconds, so the volume of the source passed to the encoder was -70dB when it started to sound bad. Given the SNR I need (on this piece of music, with my test equipment) of 43dB (see post above), I could say that in this case mpc behaves as if it had an SNR of 113dB. (Somehow I have the feeling that I simplified the math part too much.)

If you're interested in figures like this for mp3/vorbis, I'll repeat the test, including ABX if necessary, to find the exact positions where differences start to be noticeable.
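The arithmetic in the mpc example can be written out as follows. This is only a sketch of the reasoning in the post, assuming the fade is linear in dB over the whole sample. The variable names are made up, and the 43dB figure is the listener-specific value from the earlier post.

```python
def attenuation_at(t_sec, length_sec, total_db=150.0):
    """Attenuation in dB at time t for a fade that is linear in dB."""
    return total_db * t_sec / length_sec

sample_length = 42.0     # seconds, as stated in the post
problem_start = 18.0     # seconds, where mpc started to sound bad
peak_at_problem = -6.0   # dB, peak level of the original at that point
needed_snr = 43.0        # dB, taken from the earlier post

att = attenuation_at(problem_start, sample_length)
encoded_level = peak_at_problem - att        # level the encoder actually saw
equivalent_snr = needed_snr - encoded_level  # "behaves like" figure

print("attenuation: -%.0f dB" % att)
print("level passed to encoder: %.0f dB" % encoded_level)
print("equivalent SNR: %.0f dB" % equivalent_snr)
```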


Gabriel
post Feb 9 2003, 11:00
Post #11


LAME developer


Group: Developer
Posts: 2950
Joined: 1-October 01
From: Nanterre, France
Member No.: 138



I'd be curious to know the corresponding numbers for Lame. If you are willing to do the same test, I'd be interested in 3.90.2 --alt-preset-extreme and 3.94a11 --preset extreme.
tigre
post Feb 9 2003, 17:42
Post #12


Moderator


Group: Members
Posts: 1434
Joined: 26-November 02
Member No.: 3890



@Gabriel: Here's the requested MP3 test:
Original test sample: "real" music (salsa): o.wav
length 42.667 sec.
peak amplitude -4.54dB = 19248 max sample value
max RMS Power -10.48dB
min RMS Power -48.4dB
Average RMS Power -21.85dB
Total RMS Power -21.0dB
(Values taken from CEP Waveform Statistics)

1. Creating amplified "original" sample o_a_4b.wav:
- converting to 32bit
- logarithmic fadeout 0 -> -150dB
- save as 4 byte PCM (type 1, 32bit), as Lame can't handle the default CEP format "32 bit Normalized float (type 3)" properly.

2. For noise comparison, created an empty (silent) file with the same sample rate, bit depth and length: n_a_4b.wav

3. Encoding
o_a_4b.wav:
- lame 3.90.2 --alt-preset extreme
- lame 3.90.2 --alt-preset insane
- lame 3.94a11 --preset extreme
- lame 3.94a11 --preset extreme -F
n_a_4b.wav:
- lame 3.90.2 --alt-preset insane

4. Decoding with foobar2000 0.5beta16 Diskwriter, Output WAV (PCM 24bit dithered), DSP used, Resampling to 96kHz (Fast mode disabled)
- all encoded files
- o_a_4b.wav
- n_a_4b.wav

5. Processing all files made in step 4 like this:
- apply logarithmic fadein 0 -> +150dB
- downsample to 16bit/48kHz (dithered) and save to xxx_fadein.wav

6. Having a look at the files from step 4 with Encspot

7. Listening using WinABX

Results/Conclusions

Step 6: AFAIK --alt-preset extreme (3.94a11: --preset extreme) only uses 32kbps frames when it "notices" silence. So for the mp3 files encoded with "extreme" it could be interesting to know where the first 32kbps frame and the last 128kbps frame occur.

lame 3.90.2 --alt-preset extreme:
- first 32kbps frame: 22.7 sec. -> Amplification at this position: -80dB; RMS Power at this position of o.wav: -25dB -> power of encoded signal: -105dB
- last 128kbps frame: 26.5 sec. -> Amplification: -90dB; RMS Power of o.wav: -17dB -> power of encoded signal: -107dB

lame 3.94a11 --preset extreme:
- first 32kbps frame: 23.1 sec. -> Amplification: -81dB; RMS Power of o.wav: -30dB -> power of encoded signal: -111dB
- last 128kbps frame: 27.2 sec. -> Amplification: -95dB; RMS Power of o.wav: -19dB -> power of encoded signal: -114dB

Maybe the maths I'm doing here is a sign of faulty reasoning, but as I did the same for both versions, the results are comparable. So it seems that lame 3.94 keeps encoding low volume signals where lame 3.90.2 has already stopped, and the difference is somewhere around 6-7dB.

Step 7: ABX tests (The volume of my system (AC'97 onboard sound, amp, HD 530 headphones) is set to a level I use normally for ABXing music samples = "hearing-damage proof", all ABX tests 4/4 or till guessing probability < 5%):

- n_a_4b.wav vs. n_a_4b_fadein.wav to find out where I start noticing noise ("equipment/ears test"). -> 21 sec. = -83dB RMS power

- o.wav vs. o_a_4b_fadein.wav (Where does the noise added to the music start to be noticeable?) -> 24.0-25.0 sec. range; min RMS power of that range: -33dB; RMS power of n_a_4b_fadein.wav (=noise) at this point: -71dB => "SNR"=38dB

- o_a_4b_fadein.wav vs. o_a_4b_3.90.2_insane_fadein.wav: The *music* sounds exactly the same to me. Increasing noise starting at 24 sec., starting to sound distorted (clipping) at 39 sec. If I focus on the noise I can ABX at some points (34-36 sec. = noise RMS -35 to -25dB), but it's questionable whether the difference is caused by lame or by the massive amplification of dither noise.

- o_a_4b_fadein.wav vs. o_a_4b_3.90.2_extreme_fadein.wav: ABXable differences start at 13.0-13.9 sec. (ringing, harshness of "s", later "underwater sound") -> Amplification -47dB, RMS power -21dB => RMS power of signal passed to lame -68dB

- o_a_4b_fadein.wav vs. o_a_4b_3.94a11_extreme_fadein.wav: 13.0-13.9 sec. (Same as 3.90.2; 3.90.2 vs. 3.94 isn't ABXable for me with this sample. Well, I didn't bother looking for differences at 20 ... 30 seconds, where the music is totally replaced by artifacts.)

- 3.94 --preset extreme -F: The same. It seems the culprit is VBR mode, not the minimum bitrate of 32kbps, since 3.90.2 api sounds fine over the whole range.
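As a side note on the "guessing probability" used as the stopping rule for the ABX runs above: it is normally computed as a one-sided binomial probability. The sketch below shows that calculation. It is not taken from WinABX itself, and the function name is made up.

```python
from math import comb

def abx_guess_probability(correct, trials):
    """Chance of getting at least `correct` of `trials` right by pure guessing."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

for correct, trials in ((4, 4), (5, 5), (7, 8)):
    p = abx_guess_probability(correct, trials)
    print("%d/%d -> %.2f%%" % (correct, trials, 100 * p))
```

By this count a 4/4 run corresponds to 6.25%, so getting below 5% takes at least 5/5 (or, for example, 7/8).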


NumLOCK
post Feb 10 2003, 09:43
Post #13


Neutrino G-RSA developer


Group: Developer
Posts: 852
Joined: 8-May 02
From: Geneva
Member No.: 2002



QUOTE (SometimesWarrior @ Feb 8 2003 - 03:10 AM)
I remembered hearing something about MPC SV8 having support for a higher dynamic range than SV7.

The weakness of SV7 is not its dynamic range, but its inability to handle highly clipped samples exactly. This will be fixed in SV8.

If I'm not mistaken, the dynamic range itself is already enough to encode samples down to the ATH (absolute threshold of hearing) level. Below the ATH, you'll have noise. This is the main reason why your song is distorted when you ramp the volume up so much.

It's not a problem of lossy codecs, but a feature. If you listen at normal volumes, then the sounds that are so quiet that you can't hear them are encoded only roughly, to save space.


--------------------
Try Leeloo Chat at http://leeloo.webhop.net
tigre
post Feb 10 2003, 11:08
Post #14


Moderator


Group: Members
Posts: 1434
Joined: 26-November 02
Member No.: 3890



@NumLock:
You're right. I did another quick test, this time a more "realistic" one: I took my test sample (I'm very familiar with it now) and, after changing the resolution to 32bit, "enveloped" it (= amplified it dynamically) in the following way:
0-5sec: 0dB; 5-10sec.: fadeout 0 -> -40dB; 10-30sec: fadeout -40 -> -60dB; 30-35sec.: fadein -60 -> 0dB; 35sec-end: 0dB
I encoded the resulting sample (dithered to 16bit) with --preset extreme and tried to ABX it (at a volume that sounds comfortable at the beginning and at the end of the sample). I couldn't hear/ABX any differences. Focusing on the artifacts I heard in the previous test was not successful; my impression was that if there were any artifacts, they were covered by noise (which is caused mainly by my hardware/environment, as "digital silence noise" sounds almost exactly the same to me at this volume).
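The envelope described above can be expressed as a list of (time, dB) breakpoints with linear interpolation in between. This is a sketch assuming each fade is linear in dB. The end time of 42.667 sec. is the sample length given in post #12, and the function name is made up.

```python
def gain_db_at(t, breakpoints):
    """Piecewise-linear interpolation (in dB) between (time, dB) breakpoints."""
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, g0), (t1, g1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return g1
            return g0 + (g1 - g0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]

# Envelope from the post; the last breakpoint is an assumed sample length.
ENVELOPE = [
    (0.0, 0.0),
    (5.0, 0.0),
    (10.0, -40.0),
    (30.0, -60.0),
    (35.0, 0.0),
    (42.667, 0.0),
]

for t in (2.5, 7.5, 20.0, 32.5, 40.0):
    db = gain_db_at(t, ENVELOPE)
    print("%5.1f sec: %6.1f dB (linear gain %.6f)" % (t, db, 10 ** (db / 20)))
```

Multiplying each sample by the linear gain at its time position gives the enveloped file.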

My conclusion: On music that has a wide dynamic range (50dB and more), with slow volume changes that give the ear/brain enough time to adapt, listened to with low noise equipment in a low noise environment (maybe from a 24bit source), there could still be an audible difference from the lossy encode (especially mp3, as I haven't tested the other formats as thoroughly). Although I've not been able to prove (=ABX) it with my equipment/environment/abilities, I believe this is a reasonable conclusion from my tests. If my reasoning is wrong, please convince me of the opposite. I'm open to this. :)

