Wavegain vs. MP3Gain, Why the former might be better...
HansHeijden
post Jun 22 2003, 11:53
Post #26





Group: Members
Posts: 159
Joined: 30-September 01
Member No.: 75



--scale is applied to float values so the lower volume doesn't throw away bits.

The question is whether it would be better to let lame encode music that is already at a (supposedly) fixed listening volume level, rather than encode at the original level and make the correction afterwards. I would think the former, but perhaps lame (or its presets) needs some ATH 'retuning' to correct for the usually much lower input volumes.
john33
post Jun 22 2003, 12:06
Post #27


xcLame and OggDropXPd Developer


Group: Developer
Posts: 3760
Joined: 30-September 01
From: Bracknell, UK
Member No.: 111



When applying the gain, WaveGain converts the input data to float, performs all gain and hard-limiting adjustments, and then converts back, with or without dithering, to the desired output bit width. You can write out 8-bit, 16-bit, 24-bit or 32-bit integers, or floats, as you wish, regardless of the input bit width.
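The order of operations described above can be sketched roughly as follows. This is only an illustration, not WaveGain's source: the function name is made up, a plain clamp stands in for WaveGain's hard limiter, and the dither shown is simple TPDF dither without any noise shaping.

```python
import random

def apply_gain_16bit(samples, gain_db, dither=False, seed=0):
    """Apply gain_db to 16-bit integer samples, doing the arithmetic in float.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    scale = 10.0 ** (gain_db / 20.0)  # dB -> linear amplitude factor
    out = []
    for s in samples:
        x = s * scale  # gain is applied in floating point
        if dither:
            # TPDF dither: the difference of two uniform values lies in
            # (-1, +1) LSB with a triangular distribution.
            x += rng.random() - rng.random()
        y = int(round(x))  # back to integer
        y = max(-32768, min(32767, y))  # plain clamp to the 16-bit range
        out.append(y)
    return out

print(apply_gain_16bit([1000, -1000, 30000], -6.0))
```

With dithering enabled, each output sample lands within one LSB of the undithered result, which is the small amount of added noise discussed elsewhere in the thread.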


--------------------
John
----------------------------------------------------------------
My compiles and utilities are at http://www.rarewares.org/
john33
post Jun 22 2003, 12:45
Post #28


xcLame and OggDropXPd Developer


Group: Developer
Posts: 3760
Joined: 30-September 01
From: Bracknell, UK
Member No.: 111



QUOTE (Jebus @ Jun 21 2003 - 11:35 PM)
ah but i need to manually calculate scalefactor from that number right?

I just uploaded WaveGain V1.0.1 that does that for you and displays the Scale next to the Album Gain. ;)
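For anyone who wants to check the number by hand, the conversion between a gain in dB and a linear scale factor can be sketched as below. This is just the standard amplitude formula, not code taken from WaveGain, and the function names are invented for the example.

```python
import math

def gain_db_to_scale(gain_db):
    """Linear amplitude factor for a gain given in dB."""
    return 10.0 ** (gain_db / 20.0)

def scale_to_gain_db(scale):
    """Gain in dB for a (positive) linear amplitude factor."""
    return 20.0 * math.log10(scale)

print(round(gain_db_to_scale(-3.21), 3))
print(round(scale_to_gain_db(0.691), 2))
```

The --scale 0.691 value that appears in the encoder logs later in this thread corresponds to a gain of about -3.21 dB.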


M
post Jun 22 2003, 12:58
Post #29





Group: Members
Posts: 964
Joined: 29-December 01
Member No.: 830



QUOTE (john33 @ Jun 22 2003 - 06:45 AM)
I just uploaded WaveGain V1.0.1 that does that for you and displays the Scale next to the Album Gain. ;)

Nice! I had just thought to myself, "I wonder how long it will take John to add a scale-factor calculation to WaveGain," and thirty seconds later you did so. :)

Any chance the author of those CoolEdit plugins shared his algorithms...?

- M.
de Mon
post Jun 22 2003, 14:17
Post #30





Group: Members
Posts: 474
Joined: 1-December 02
Member No.: 3940



Maybe I misunderstood something... If we are going to apply the gain before encoding, what will happen to the signal-to-noise ratio? Gaining the files down will increase the noise level. Am I wrong?

This post has been edited by de Mon: Jun 22 2003, 14:21


--------------------
Ogg Vorbis for music and speech [q-2.0 - q6.0]
FLAC for recordings to be edited
Speex for speech
Xenno
post Jun 22 2003, 14:39
Post #31





Group: Members
Posts: 393
Joined: 23-July 02
From: Blue Grass, IA
Member No.: 2760



Any non-zero value will be raised by the same amount as the gain applied to the file overall. But the difference between the floor and the peak values will stay constant (assuming no clipping). That's why there's no substitute for a strongly recorded track versus a weak one that has been RG'd.

xen-uno

This post has been edited by Xenno: Jun 22 2003, 14:40


--------------------
No one can be told what Ogg Vorbis is...you have to hear it for yourself
- Morpheus
mrosscook
post Jun 22 2003, 14:44
Post #32





Group: Members
Posts: 82
Joined: 14-December 02
From: Amherst MA
Member No.: 4077



Tigre, Hanky, HansHeijden, and John33,

I don't think that going to a greater bit depth, or even to floating-point calculations, alters the problem in principle. If we carried out the entire WaveGain process, for example, from a floating-point input all the way down to floating-point silence, we would still have lost all the information in the original signal; the asymptote is still complete loss.

I agree that in practice the amount of loss is likely to be so small as to be imperceptible, and that higher bitdepths will eliminate noise that would be introduced by dithering otherwise. But the basic problem is still there.
Hanky
post Jun 22 2003, 17:57
Post #33





Group: Members (Donating)
Posts: 531
Joined: 18-November 01
From: The Netherlands
Member No.: 481



After using the search function and finally realizing that this issue was discussed many times before on HA, I came to the conclusion that the whole ReplayGain concept will always be a compromise.
Surprisingly, this was stated very clearly in the Replay Gain FAQ as early as December 2001 by David Robinson himself:
QUOTE
To maintain full dynamic range, the ideal solution is to feed the Replay Level value out of your PC, to your volume control. Obviously this requires dedicated hardware, and few people are going to do this, but it would be possible for those who demand highest quality to put an end to stupid fluctuations in level in this manner. No compromise. No downside.
Jebus
post Jun 22 2003, 19:19
Post #34





Group: Developer
Posts: 1293
Joined: 17-March 03
From: Calgary, AB
Member No.: 5541



QUOTE (john33 @ Jun 22 2003 - 03:45 AM)
QUOTE (Jebus @ Jun 21 2003 - 11:35 PM)
ah but i need to manually calculate scalefactor from that number right?

I just uploaded WaveGain V1.0.1 that does that for you and displays the Scale next to the Album Gain. ;)

Thanks John! I actually made this small change myself, but didn't have access to a windows compiler. This shaves a few seconds off my work.

Whether this process is lossy is, I suppose, still debatable, but as far as I can tell, inputting a --scale value is in all ways superior to MP3Gaining or WaveGaining...

1) you get precisely 89.0 dB (not within ±1.5 dB, as with MP3Gain).
2) you don't have to dither before encoding, as WaveGain requires.
3) newer (louder) albums do not receive an inflated bitrate like they would with MP3Gain.
4) signals below the threshold of hearing AFTER GAINING are simply discarded by the psymodel, instead of being artificially turned down later (by MP3Gain).
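Point 1 can be illustrated with a small sketch. MP3Gain can only change the level in 1.5 dB steps, so the gain it applies is the desired gain rounded to a multiple of 1.5 dB. The helper name is made up, and rounding to the nearest step is an assumption here; under that assumption the leftover error is at most 0.75 dB either way, whereas --scale can take the exact factor.

```python
MP3GAIN_STEP_DB = 1.5  # MP3Gain adjusts the level in whole steps of this size

def nearest_mp3gain_step(desired_gain_db):
    """Return (steps, applied_db, residual_db) for a desired gain in dB.

    Assumes rounding to the nearest step (hypothetical helper).
    """
    steps = round(desired_gain_db / MP3GAIN_STEP_DB)
    applied_db = steps * MP3GAIN_STEP_DB
    residual_db = desired_gain_db - applied_db
    return steps, applied_db, residual_db

print(nearest_mp3gain_step(-3.21))
```

For a desired gain of -3.21 dB this gives two steps down (-3.0 dB), leaving the file about 0.21 dB louder than the target.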
ibm2080
post Jun 22 2003, 20:27
Post #35





Group: Members
Posts: 67
Joined: 27-April 03
From: Paris, FR
Member No.: 6226



QUOTE (mrosscook @ Jun 22 2003 - 05:44 AM)
Tigre, Hanky, HansHeijden, and John33,

I don't think that going to a greater bit depth, or even to floating-point calculations, alters the problem in principle. If we carried out the entire WaveGain process, for example, from a floating-point input all the way down to floating-point silence, we would still have lost all the information in the original signal; the asymptote is still complete loss.

I agree that in practice the amount of loss is likely to be so small as to be imperceptible, and that higher bitdepths will eliminate noise that would be introduced by dithering otherwise. But the basic problem is still there.

I completely agree with you.
You explain it in an earlier post too (same thread, page 1).
I am not sure how the code works at all (I haven't even looked at it, to tell you the truth), but I think mrosscook's point is valid from a logical point of view, assuming that the floor is at a constant value and that values below that floor are dropped. If we apply a negative gain, everything in that file will drop by that much. Keep in mind that before the gain was applied there was some sound info that was just above the floor limit. If all that is true, then some sound info should drop below the floor limit when we apply a negative gain (i.e. it is dropped from the file), meaning there is less sound info left to encode after the gain process. It also means that if you take the same track and apply two different gains to it, one more negative than the other, the resulting files should have two different sizes.


--------------------
Ib
Jebus
post Jun 22 2003, 20:56
Post #36





Group: Developer
Posts: 1293
Joined: 17-March 03
From: Calgary, AB
Member No.: 5541



The thing is, when you MP3Gain a file down a few dB, it ends up clipping sounds from the low end anyhow... so it is still throwing away sound. It is still there in the MP3, just inaudible, in the same sense that a too-loud signal is clipped WITHOUT the normalizing. By doing it my way with --scale you are still losing the same information, but instead of keeping it in the MP3, it is discarded during the encode process.

Essentially what I am saying is that yes, if you --scale 0.0 you will have a 0 kB file with no information in it. But if you MP3Gain a file down to 0, you are also getting a file that will have no audible information - it's still there, but it's all been clipped below the playback threshold. So in this sense, both methods are just as lossy... the MP3Gain method is reversible, however, while the --scale method has lost that data forever in the interest of space savings.

This post has been edited by Jebus: Jun 22 2003, 20:57
mmortal03
post Jun 22 2003, 21:08
Post #37





Group: Members
Posts: 601
Joined: 19-July 02
From: USA
Member No.: 2667



QUOTE (Jebus @ Jun 22 2003 - 12:19 PM)
4) signals below the threshold of hearing AFTER GAINING are simply discarded by the psymodel, instead of being artificially turned down later (by MP3Gain).

This is what I was thinking. If we aren't going to amplify (or RG) these mp3s later, who cares if we are leaving out sounds below the threshold? As long as we were going to listen to them MP3Gained in the end anyway, and probably not ever touch them again, it shouldn't matter to the majority of people, and plus there is a 10% file size savings.

My question is, without amplifying these --scale'd mp3s, is there ANY perceptible difference compared with an mp3 scaled by the same amount with MP3Gain? Perceptibly, there shouldn't be.

This post has been edited by mmortal03: Jun 22 2003, 21:10


--------------------
WARNING: Changing of advanced parameters might degrade sound quality. Modify them only if you are expirienced in audio compression!
Jebus
post Jun 22 2003, 21:58
Post #38





Group: Developer
Posts: 1293
Joined: 17-March 03
From: Calgary, AB
Member No.: 5541



I personally can't ABX a difference, but my ears aren't as well tuned as others'... Does anyone wish to do an ABX test on a highly compressed album done both with MP3Gain and --scale?
_Shorty
post Jun 23 2003, 10:10
Post #39





Group: Banned
Posts: 694
Joined: 19-April 02
Member No.: 1820



Don't certain portions of the mp3 encoding process rely on the volume level of the signal to begin with? I'm sure it must determine the audibility of certain things in relation to full-scale, no? Or is it all relative to other portions of the signal content, with no regard for full-scale at all? Seems to me that if full-scale matters then encoding should be done first, with replaygain/mp3gain only being done afterwards. If full-scale doesn't matter at all though, then it would seem to make sense to --scale the data after all. But this also makes me wonder something else. Wouldn't mp3 encoder quality, or that of any lossy format for that matter, be best at a certain standardized playback volume so it could actually match up with what the psy-model is throwing away? Surely absolute volume matters too, and not just relative volume of differing frequency components.
2Bdecided
post Jun 23 2003, 10:57
Post #40


ReplayGain developer


Group: Developer
Posts: 5058
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



In most of this thread, the real problem is only just hinted at.

Don't worry about what replay gain is doing or what scale is doing - you're not losing anything at all here when using the calculated value to set --scale in lame. Also, don't worry about what happens at the very limit: --scale 0.0 isn't possible - it's silence, and it's infinite attenuation. --scale 0.0 is less than -50dB, less than -100dB, less than -200dB - it's -infinity dB! As such, it's misleading and irrelevant. Take -100dB as the limit case: perfectly possible without loss in 32-bit floating point calculations. So, no need to worry there.
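Both statements can be checked with a short sketch. It assumes nothing about lame's internals: it only converts a scale factor to dB, and round-trips 16-bit sample values through a -100dB attenuation stored in 32-bit float (emulated with the struct module). The function names are invented for the example.

```python
import math
import struct

def scale_to_db(scale):
    """Gain in dB for a linear scale factor; 0.0 maps to -infinity."""
    if scale == 0.0:
        return float("-inf")
    return 20.0 * math.log10(abs(scale))

def to_float32(x):
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def roundtrip_error(sample, gain_db=-100.0):
    """Attenuate in float32, undo the gain, and return the absolute error."""
    scale = 10.0 ** (gain_db / 20.0)
    attenuated = to_float32(to_float32(sample) * scale)
    restored = attenuated / scale
    return abs(restored - sample)

print(scale_to_db(0.0))
print(max(roundtrip_error(s) for s in (1, -12345, 32767)))
```

The round-trip error stays far below one 16-bit step, so a -100dB attenuation held in floating point loses nothing that the original 16-bit samples contained.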

We're left with two worries: what lame does, and what happens at the 16-bit output when decoded. OK, forget the second one of these - this is always an issue with Replay Gain, and was discussed before there was even any RG software available (thanks for the quote Hanky!).


So, we have one, and only one, issue: what does lame do? IIRC it supposedly looks at the "loudness" of the signal, makes some assumptions from this, and enforces a sensible absolute threshold of hearing (ATH). This is most critical for quiet tracks - if it assumed that you will never turn up the volume to hear them, then most of the audio signal would be way below the hearing threshold, and could be discarded. This used to happen, but people often turned up the volume and heard problems, so (again, IIRC) for a few years now lame has taken this into account and shifted the ATH down for quieter tracks. There's a time constant involved, and it may not be a linear process - this is why changing the volume before encoding can change the output file size. If lame were adjusting everything immediately and linearly, scaling a file in the floating point domain would make no difference at all to the resulting file size.

There are probably other factors at work here. What worries me slightly is that lame --aps has been tuned with tracks ripped from CDs. Not with tracks ripped from CDs and then dropped by 10dB. If you get a smaller file, then by definition it has less information in it. Something has been lost. The question is what? and does it matter?


Maybe it's possible to tell lame that the file has been replay gained, and that you will be listening to this track at a particular loudness - that then fixes the ath. It would be interesting to do this, so fixing the ath at the "correct" value, to see what kind of bitrates this yields.

Replay gain originally targeted 83dB. This calibration assumes that a full scale sine wave will give 103dB SPL. Most of the current implementations target 89dB. This calibration assumes that a full scale sine wave will give 97dB SPL.
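The two calibration figures are tied together by the 6dB difference between the targets: raising the target by 6dB makes every track 6dB louder digitally, so the playback chain has to be 6dB quieter for the same listening level. A minimal sketch of that arithmetic, using only the numbers above (the names are made up):

```python
REFERENCE_TARGET_DB = 83.0        # original Replay Gain target
REFERENCE_FULL_SCALE_SPL = 103.0  # full scale sine level assumed at that target

def full_scale_spl(target_db):
    """Implied SPL of a full scale sine wave for a given Replay Gain target."""
    return REFERENCE_FULL_SCALE_SPL - (target_db - REFERENCE_TARGET_DB)

print(full_scale_spl(83.0))
print(full_scale_spl(89.0))
```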

I bet that lame is guessing that a full scale sine wave would give about 85-90dB SPL for highly compressed tracks. What switches can people use to turn off lame's automatic ath adjustment, and to enforce it at a level which makes sense if a full scale sine wave = 97 dB SPL?

(I'm assuming that no one will listen louder than this, and that listening quieter than this will not break the psychoacoustics. Both these assumptions are false, but so are any assumptions that assume how loud people will listen - including the current automatic one in lame - so I'm hoping this doesn't cause any problems).


John - is WaveGain using the 83 or 89dB target?

Lame experts - which switches can fix the ath where we want it?

Jebus - can you try whatever gets suggested with your test track and report back please?

Dibrom/other devs - if you're reading this, can you see any way in which this would break --aps or lame in general?

Cheers,
David.
john33
post Jun 23 2003, 11:10
Post #41


xcLame and OggDropXPd Developer


Group: Developer
Posts: 3760
Joined: 30-September 01
From: Bracknell, UK
Member No.: 111



QUOTE (2Bdecided @ Jun 23 2003 - 09:57 AM)
John - is WaveGain using the 83 or 89dB target?

Cheers,
David.

89dB.


ibm2080
post Jun 23 2003, 13:14
Post #42





Group: Members
Posts: 67
Joined: 27-April 03
From: Paris, FR
Member No.: 6226



QUOTE (2Bdecided @ Jun 23 2003 - 01:57 AM)
There are probably other factors at work here. What worries me slightly is that lame --aps has been tuned with tracks ripped from CDs. Not with tracks ripped from CDs and then dropped by 10dB. If you get a smaller file, then by definition it has less information in it. Something has been lost. The question is what? and does it matter?

Cheers,
David.

That is the heart of the problem as I see it.


dev0
post Jun 23 2003, 14:45
Post #43





Group: Developer
Posts: 1679
Joined: 23-December 01
From: Germany
Member No.: 731



Let's just take a look at the current possibilities to apply ReplayGain to an MP3 (as pointed out in this thread; excluding fb2k's implementation):

Original -> lame -> MP3Gain

Original -> Wavegain (+ Dither/NS?) -> lame

Original -> (Wavegain to calc. scale) -> lame + scale

Another interesting point to consider is, how the Dithering/NoiseShaping of Wavegain influences/confuses the behaviour of lame (ATH?).

dev0


--------------------
"To understand me, you'll have to swallow a world." Or maybe your words.
2Bdecided
post Jun 23 2003, 15:33
Post #44


ReplayGain developer


Group: Developer
Posts: 5058
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (dev0 @ Jun 23 2003 - 01:45 PM)
1. Original -> lame -> MP3Gain

2. Original -> Wavegain (+ Dither/NS?) -> lame

3. Original -> (Wavegain to calc. scale) -> lame + scale

Another interesting point to consider is, how the Dithering/NoiseShaping of Wavegain influences/confuses the behaviour of lame (ATH?).

The thing is, 2 can never be better than 3 - as you suggest, there could be interesting consequences of using it, but because 3 must be better than 2, I'd rather forget 2 and explore 3.


3 may be more useful, because the only quality concerns are within Lame itself.


Cheers,
David.
john33
post Jun 23 2003, 16:59
Post #45


xcLame and OggDropXPd Developer


Group: Developer
Posts: 3760
Joined: 30-September 01
From: Bracknell, UK
Member No.: 111



These are the results from using one track only. It may, or may not, be representative, but the results are fairly interesting. I make no comment, I am posting for others to comment. ;)
CODE
******** WaveGain, NO DITHER

D:\testdir>lame --preset standard 132.wav 132.mp3
LAME version 3.90.3 MMX  (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass  filter, transition band: 18671 Hz - 19205 Hz
Encoding 132.wav to 132.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
   Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
 8206/8208  (100%)|    0:35/    0:35|    0:35/    0:35|   6.0438x|    0:00
32 [ 151] ****
128 [ 790] %%***************
160 [2880] %%%%%********************************************************
192 [3157] %%%%%%%%%%%%%%%%%*************************************************
224 [ 803] %%%%%************
256 [ 247] %%****
320 [ 180] %***
average: 179.5 kbps   LR: 1384 (16.86%)   MS: 6824 (83.14%)

Writing LAME Tag...done

******** WaveGain, DITHER with NO NOISE SHAPING

D:\testdir>lame --preset standard 132.wav 132.mp3
LAME version 3.90.3 MMX  (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass  filter, transition band: 18671 Hz - 19205 Hz
Encoding 132.wav to 132.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
   Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
 8206/8208  (100%)|    0:36/    0:36|    0:36/    0:36|   5.9545x|    0:00
32 [ 150] %%%*
128 [ 801] %%***************
160 [2848] %%%%%******************************************************
192 [3201] %%%%%%%%%%%%%%%%%%************************************************
224 [ 782] %%%%%************
256 [ 240] %%***
320 [ 186] %%**
average: 179.5 kbps   LR: 1520 (18.52%)   MS: 6688 (81.48%)

Writing LAME Tag...done

******** WaveGain, DITHER with LIGHT NOISE SHAPING

D:\testdir>lame --preset standard 132.wav 132.mp3
LAME version 3.90.3 MMX  (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass  filter, transition band: 18671 Hz - 19205 Hz
Encoding 132.wav to 132.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
   Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
 8206/8208  (100%)|    0:36/    0:36|    0:36/    0:36|   5.9339x|    0:00
32 [  14] %
40 [   1] *
128 [ 916] %%%%****************
160 [2877] %%%%%*******************************************************
192 [3172] %%%%%%%%%%%%%%%%%%************************************************
224 [ 797] %%%%%************
256 [ 249] %%****
320 [ 182] %***
average: 181.2 kbps   LR: 1534 (18.69%)   MS: 6674 (81.31%)

Writing LAME Tag...done

******** WaveGain, DITHER with MEDIUM NOISE SHAPING

D:\testdir>lame --preset standard 132.wav 132.mp3
LAME version 3.90.3 MMX  (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass  filter, transition band: 18671 Hz - 19205 Hz
Encoding 132.wav to 132.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
   Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
 8206/8208  (100%)|    0:36/    0:36|    0:36/    0:36|   5.9831x|    0:00
32 [  10] %
128 [ 916] %%%%****************
160 [2895] %%%%%********************************************************
192 [3156] %%%%%%%%%%%%%%%%%*************************************************
224 [ 810] %%%%%************
256 [ 245] %%****
320 [ 176] %%**
average: 181.2 kbps   LR: 1485 (18.09%)   MS: 6723 (81.91%)

Writing LAME Tag...done

******** WaveGain, DITHER with HEAVY NOISE SHAPING

D:\testdir>lame --preset standard 132.wav 132.mp3
LAME version 3.90.3 MMX  (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass  filter, transition band: 18671 Hz - 19205 Hz
Encoding 132.wav to 132.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
   Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
 8206/8208  (100%)|    0:36/    0:36|    0:36/    0:36|   5.9390x|    0:00
40 [   1] *
128 [1115] %%%%%*******************
160 [3176] %%%%%%%%**********************************************************
192 [2890] %%%%%%%%%%%%%%%%*********************************************
224 [ 653] %%%%**********
256 [ 222] %%***
320 [ 151] %***
average: 177.5 kbps   LR: 1583 (19.29%)   MS: 6625 (80.71%)

Writing LAME Tag...done

******** ORIGINAL WAVE FILE + LAME --scale

D:\testdir>lame --preset standard --scale 0.691 13.wav 13.mp3
LAME version 3.90.3 MMX  (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass  filter, transition band: 18671 Hz - 19205 Hz
Encoding 13.wav to 13.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
   Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
 8206/8208  (100%)|    0:36/    0:36|    0:36/    0:36|   6.0251x|    0:00
32 [ 152] %***
128 [ 810] %*****************
160 [2876] %%%%%********************************************************
192 [3130] %%%%%%%%%%%%%%%%%*************************************************
224 [ 815] %%%%%*************
256 [ 238] %%****
320 [ 187] %***
average: 179.5 kbps   LR: 1372 (16.72%)   MS: 6836 (83.28%)

Writing LAME Tag...done

******** ORIGINAL WAVE FILE + LAME AS REFERENCE

D:\testdir>lame --preset standard 13.wav 13.mp3
LAME version 3.90.3 MMX  (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass  filter, transition band: 18671 Hz - 19205 Hz
Encoding 13.wav to 13.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
   Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
 8206/8208  (100%)|    0:35/    0:35|    0:35/    0:35|   6.0544x|    0:00
32 [ 151] ****
128 [ 779] %****************
160 [2771] %%%%%*****************************************************
192 [3188] %%%%%%%%%%%%%%%%%*************************************************
224 [ 872] %%%%%**************
256 [ 251] %%****
320 [ 196] %%***
average: 180.6 kbps   LR: 1373 (16.73%)   MS: 6835 (83.27%)

Writing LAME Tag...done

D:\testdir>


This post has been edited by john33: Jun 23 2003, 17:04


dev0
post Jun 23 2003, 17:07
Post #46





Group: Developer
Posts: 1679
Joined: 23-December 01
From: Germany
Member No.: 731



Could you please decode the test files and verify if any of these are the same? If they are we could try grouping some options.

dev0


tigre
post Jun 23 2003, 17:22
Post #47


Moderator


Group: Members
Posts: 1434
Joined: 26-November 02
Member No.: 3890



QUOTE (dev0 @ Jun 23 2003 - 08:07 AM)
Could you please decode the test files and verify if any of these are the same? If they are we could try grouping some options.

dev0

None of them are exactly the same as you'll see e.g. by comparing the numbers of 160 kbps frames for each file.


--------------------
Let's suppose that rain washes out a picnic. Who is feeling negative? The rain? Or YOU? What's causing the negative feeling? The rain or your reaction? - Anthony De Mello
mrosscook
post Jun 23 2003, 17:32
Post #48





Group: Members
Posts: 82
Joined: 14-December 02
From: Amherst MA
Member No.: 4077



In light of 2Bdecided's comments about the possible role of psymodels and ath in this effect, I thought it might be useful to see what happens if we compress both an original file and a wavegained version using ZIP (which of course has no psychoacoustics in it).

I used the track, "When I Paint My Masterpiece", from Rock of Ages, Disc 2, by Bob Dylan and the Band. The original is a 44.38 MB wav file. This was transformed into a gained file using wavegain with a -18.95 dB setting -- the -6.95 dB recommended by the program itself, plus another -12 dB for emphasis. Using ZIP at max compression,

Original compresses 44.38 MB to 42.27 MB (95%)
Gained compresses 44.38 MB to 35.63 MB (80%)

Certainly, no psymodel has a role here. I also thought it might be interesting to compress the original and gained files using a lossless audio codec (FLAC high) and two lossy codecs (LAME v.3.92 -aps, and MPC -q5); that gives the little table,

CODE
             ZIP          FLAC       LAME        MPC
Original   42.27(95%)   27.70(62%)  4.79(11%)  4.51(10%)
Gained     35.63(80%)   18.98(43%)  4.92(11%)  4.25(10%)


FLAC, like ZIP, gets a big boost in compression by aggressive wavegaining. The lossy codecs don't. LAME actually makes the gained file a little bit larger than the original, but I doubt that is a meaningful difference.
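The ZIP result can be reproduced in miniature with synthetic data. The sketch below is not the test described above: it uses a made-up tone-plus-noise signal rather than real music, and zlib's deflate rather than a ZIP archive, but it shows the same effect. Attenuating 16-bit samples and rounding them back to integers leaves fewer distinct values, so a general-purpose compressor does better.

```python
import math
import random
import struct
import zlib

def make_test_signal(n=20000, seed=1):
    """Synthetic stand-in for music: a 440 Hz tone plus Gaussian noise."""
    rng = random.Random(seed)
    return [
        int(round(8000 * math.sin(2 * math.pi * 440 * i / 44100) + rng.gauss(0, 4000)))
        for i in range(n)
    ]

def apply_gain(samples, gain_db):
    """Scale integer samples by gain_db and round back to integers."""
    scale = 10.0 ** (gain_db / 20.0)
    return [int(round(s * scale)) for s in samples]

def deflated_size(samples):
    """Size in bytes of the samples as 16-bit little-endian PCM after deflate."""
    clipped = [max(-32768, min(32767, s)) for s in samples]
    pcm = struct.pack("<%dh" % len(clipped), *clipped)
    return len(zlib.compress(pcm, 9))

original = make_test_signal()
gained = apply_gain(original, -18.95)  # same gain as in the test above
print(deflated_size(original), deflated_size(gained))
```

The gained version compresses to fewer bytes, which says that the integer file holds less information, but not whether any of what was removed is audible.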

What is the moral of the story? Maybe that wavegaining does produce loss (as suggested by ZIP and FLAC) but that the lossy-mode psychoacoustics are much larger effects and swamp out wavegain? Any comments would be welcome.
Jebus
post Jun 23 2003, 18:54
Post #49





Group: Developer
Posts: 1293
Joined: 17-March 03
From: Calgary, AB
Member No.: 5541



Let's forget the actual wavegaining, as 2Bdecided said. I would like to hear from Gabriel or maybe Dibrom regarding the issues he brought up with --scale and ATH.

Fellas?

In addition, what this eventually comes down to is an ABX test. I can post some highly compressed tracks done with both mp3gain and --scale if you'd like, but I think you may be more successful with tracks you know well. I for one cannot identify a difference, beyond the slight volume difference due to the 89.0dB accuracy of the --scale method.

Seriously though, forget the wavegain/dithering method - there is no point in pursuing that - it will just confuse matters.

This post has been edited by Jebus: Jun 23 2003, 19:03
M
post Jun 24 2003, 02:53
Post #50





Group: Members
Posts: 964
Joined: 29-December 01
Member No.: 830



From a simple numeric comparison (considering the frame breakdown as well as the average bitrate) of john33's examples, it seems the following two are the closest to each other:

CODE
******** WaveGain, NO DITHER

D:\testdir>lame --preset standard 132.wav 132.mp3
LAME version 3.90.3 MMX  (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass  filter, transition band: 18671 Hz - 19205 Hz
Encoding 132.wav to 132.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
  Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
8206/8208  (100%)|    0:35/    0:35|    0:35/    0:35|   6.0438x|    0:00
32 [ 151] ****
128 [ 790] %%***************
160 [2880] %%%%%********************************************************
192 [3157] %%%%%%%%%%%%%%%%%*************************************************
224 [ 803] %%%%%************
256 [ 247] %%****
320 [ 180] %***
average: 179.5 kbps   LR: 1384 (16.86%)   MS: 6824 (83.14%)

Writing LAME Tag...done

******** ORIGINAL WAVE FILE + LAME --scale

D:\testdir>lame --preset standard --scale 0.691 13.wav 13.mp3
LAME version 3.90.3 MMX  (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass  filter, transition band: 18671 Hz - 19205 Hz
Encoding 13.wav to 13.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
  Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
8206/8208  (100%)|    0:36/    0:36|    0:36/    0:36|   6.0251x|    0:00
32 [ 152] %***
128 [ 810] %*****************
160 [2876] %%%%%********************************************************
192 [3130] %%%%%%%%%%%%%%%%%*************************************************
224 [ 815] %%%%%*************
256 [ 238] %%****
320 [ 187] %***
average: 179.5 kbps   LR: 1372 (16.72%)   MS: 6836 (83.28%)

Writing LAME Tag...done


Now, most of the time when I use WaveGain I don't use dither anyway (look, if it were really an essential step it should be the default, no?), but I know a lot of folks here do use it - all the time. However, from the bitrate counts it would appear --scale is not pre-dithering anything... so the question becomes, how essential is dither to producing decent-sounding MP3s? Or does it even affect the sound to a measurable degree, since the process of psychoacoustic modeling is later in the chain?
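For what it's worth, the numeric comparison mentioned at the top of this post can be made explicit. The sketch below takes the frame counts straight from john33's logs and measures how far each WaveGain variant is from the --scale encode, using the sum of absolute differences per bitrate as one possible (and admittedly crude) distance. The names are made up for the example.

```python
# Frames per bitrate (kbps), copied from john33's encoder output.
WAVEGAIN_COUNTS = {
    "no dither": {32: 151, 128: 790, 160: 2880, 192: 3157, 224: 803, 256: 247, 320: 180},
    "dither, no noise shaping": {32: 150, 128: 801, 160: 2848, 192: 3201, 224: 782, 256: 240, 320: 186},
    "dither, light noise shaping": {32: 14, 40: 1, 128: 916, 160: 2877, 192: 3172, 224: 797, 256: 249, 320: 182},
    "dither, medium noise shaping": {32: 10, 128: 916, 160: 2895, 192: 3156, 224: 810, 256: 245, 320: 176},
    "dither, heavy noise shaping": {40: 1, 128: 1115, 160: 3176, 192: 2890, 224: 653, 256: 222, 320: 151},
}
SCALE_COUNTS = {32: 152, 128: 810, 160: 2876, 192: 3130, 224: 815, 256: 238, 320: 187}

def histogram_distance(a, b):
    """Sum of absolute differences in frame counts over all bitrates."""
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in set(a) | set(b))

def closest_to_scale():
    """Name of the WaveGain variant whose histogram is nearest the --scale one."""
    return min(
        WAVEGAIN_COUNTS,
        key=lambda name: histogram_distance(WAVEGAIN_COUNTS[name], SCALE_COUNTS),
    )

for name, counts in WAVEGAIN_COUNTS.items():
    print(name, histogram_distance(counts, SCALE_COUNTS))
print("closest:", closest_to_scale())
```

By this measure the undithered WaveGain file is the nearest to the --scale encode, differing in 80 of the 8208 frames.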

- M.