Flaw in ReplayGain spec
May 12 2002, 11:04
Joined: 24-September 01
Member No.: 13
It occurred to me today that there is a problem with the current ReplayGain spec, or rather, with my proposal for doing it in Vorbis.
The issue is combining replaygain and clipping prevention.
If applying the replaygain would cause the track to clip, clipping prevention kicks in, and reduces the level. This will make the output loudness different from the ideal, 'equal' level.
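A rough sketch of that interaction, assuming the peak is stored as a linear sample value where 1.0 is full scale and that clipping prevention simply caps the scale factor at 1/peak. The function names are made up for illustration and are not taken from any decoder or plugin:

```python
import math


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude scale factor."""
    return 10.0 ** (gain_db / 20.0)


def effective_gain_db(gain_db, peak):
    """Gain that is really applied once clipping prevention has run.

    gain_db: the ReplayGain adjustment read from the file
    peak:    the file's peak sample value, 1.0 = full scale (assumed)
    """
    scale = db_to_linear(gain_db)
    if peak * scale > 1.0:
        # The boosted peak would exceed full scale: back the gain off
        # so the loudest sample lands exactly on full scale.
        scale = 1.0 / peak
    return 20.0 * math.log10(scale)


# A quiet track that asks for +6 dB but already peaks at 0.9 only
# receives roughly +0.92 dB, so it ends up about 5 dB below the
# intended 'equal' level.
print(round(effective_gain_db(6.0, 0.9), 2))
```

A track whose gain is negative, or whose peak is low enough, passes through untouched, which is why only some tracks drift away from the target loudness.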
When running in radio/track mode, there is no way around this, since you don't know in advance what you are going to encounter. The best you can do is set the default level low enough that you can hope it'll never happen. I believe this was the idea (among possibly other things) behind setting the default level to K-20 in the new MPC decoders? (Frank?)
If the implementation in the current Vorbis players is correct, a similar effect can be reached by setting the preamp in the plugin to -6dB or so.
In album gain mode, you could avoid this for the entire album you're listening to, since you already ReplayGain-processed the tracks as a group and thus know what is coming up. However, my current proposal poses problems for doing this: you would need to open all the files that belong to the album, read in their peak values, remember the largest, and use that as the peak value for each individual track.
This is what I originally envisioned. Looking back, however, it is ugly and cumbersome, and it may not even be possible in some player/plugin architectures.
I think the correct solution would probably be to store an album-peak value.
It would be trivial to implement in the ReplayGain tools, and would require only minimal changes in the players, without all the ugliness the current method would require (which isn't done correctly by anyone anyway).
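To illustrate how little work that would be, here is a minimal sketch. It assumes each track is represented as a dict holding its own linear peak (1.0 = full scale), and the `album_peak` key is a hypothetical name, since no tag name has been settled on:

```python
def tag_album(tracks):
    """Tool side: store the largest track peak on every track of the album.

    tracks: list of dicts, each with a 'peak' entry (assumed layout).
    The 'album_peak' key is a hypothetical tag name.
    """
    album_peak = max(track["peak"] for track in tracks)
    for track in tracks:
        track["album_peak"] = album_peak
    return album_peak


def album_scale(album_gain_db, album_peak):
    """Player side: linear scale factor for album mode.

    Only the current file's tags are needed; no other file of the
    album has to be opened.
    """
    scale = 10.0 ** (album_gain_db / 20.0)
    if album_peak > 0.0:
        # Cap the scale so the loudest sample of the whole album
        # stays at or below full scale.
        scale = min(scale, 1.0 / album_peak)
    return scale


album = [{"peak": 0.5}, {"peak": 0.98}, {"peak": 0.7}]
tag_album(album)
# Every track now gets the same scale factor, so relative loudness
# within the album is preserved even when the gain has to be reduced.
scales = [album_scale(6.0, track["album_peak"]) for track in album]
```

Because the limit is identical for every track, any reduction applies to the album as a whole rather than to the one track that happens to clip.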
The disadvantage is that it requires another tag. However, since the Vorbis people seem to have gotten a bit more enthusiastic about ReplayGain lately, perhaps that isn't so much of a problem.
I believe it's valuable to do this, as it may pose a real problem in practice. Moreover, the proposal as it is now is broken by design in this regard, and I'd prefer to fix it while it's still fixable.
Also, the ReplayGain proposal on David's site doesn't mention anything about this? Is there another way to address this problem?
There are two other issues with the current spec that I'd like to discuss while it's still possible.
1) Change RG_* into REPLAYGAIN_*
This was proposed by Segher, with the idea that someone who looks at the tags and doesn't know what they are can at least google the name to find out, whereas you'd be left clueless with the current 'RG'. I think this idea is valuable and good.
2) Source/version tag
I didn't include one originally because I saw no way to keep it consistent if you allow the user to edit the tags (you can't require them to know the spec...), and because I didn't see the RG calculations being improved for quite a while. Unfortunately, Frank Klemm has already proven me wrong on the latter. I don't see a way to make such a tag actually _work_ though.
I'd like feedback from everyone about all of this. Is it worthwhile to change the current proposal and fix some of the above issues?
May 20 2002, 17:50
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409
All other things being equal, assuming a complete understanding of replaygain in the player, both are equivalent.
So we can store either...
"relative": the player does this:
a) read in replay gain value
b) apply replay gain value
"absolute": the player does this:
a) read in replay gain value
b) subtract 83 from this value
c) apply the resulting value
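The two step lists above can be put side by side in a short sketch. It assumes the "absolute" tag is just the relative adjustment with 83 added by the tool; the function names are illustrative only:

```python
REFERENCE_DB = 83.0  # the reference level discussed in this thread


def gain_from_relative(stored_db):
    """'Relative' scheme: the stored value is applied as-is."""
    return stored_db


def gain_from_absolute(stored_db):
    """'Absolute' scheme: subtract the reference before applying."""
    return stored_db - REFERENCE_DB


relative_tag = -3.0
absolute_tag = relative_tag + REFERENCE_DB  # 80.0, added by the tool

# Both schemes hand the player the same gain to apply.
assert gain_from_relative(relative_tag) == gain_from_absolute(absolute_tag)
```

The only difference is one addition in the tool and one subtraction in the player, which cancel out.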
I originally thought about storing the "absolute" value (rather than the "relative" adjustment) but several people pointed out that storing the adjustment is easier - and I agreed. So that's the way it is. You've got to pick one or the other, so I picked the option that made player implementation easier.
My suggestion is simply that the output (when using the reference signal) should be 83 (as an absolute value) rather than 0 (as a relative value).
But they're both relative, aren't they? Decibels (dB) are, by definition, a relative measurement. The 83 isn't just "83", like you could say the length of a piece of string is "83cm" and that's that. It's equivalent to the perceived loudness of a -20dB FS (relative to a full scale sinewave) RMS pink noise signal replayed via an SMPTE RP 200 calibrated audio system.
My way, you don't put an 83 in the replay gain calculation, and you don't put an 83 in the player calculation either. Your way, you could well add it at one end, and then subtract it at the other! Why?!
Besides, the peaks describe a property of the file. Level would do the same.
If you store what you suggest (e.g. 80dB instead of -3dB) then you're NOT storing a property of the file. You're storing the gain required to make the file average 83dB perceived loudness in a calibrated system. Rather than storing a property, you're still storing an adjustment. A loud file will have a lower value, whether you add 83 to it or not! So, you'd have to take the "-3dB", switch the sign (+3dB), add it to 83dB, and store this (86dB) as the perceived loudness of the file. Then, to play it back, you take this value (86dB), and subtract it from the required level (e.g. 83dB-86dB=-3dB), and then apply this gain change to the file. BUT LOOK! We're just where we started - with the number -3dB! So why bother?!
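The round trip described above, written out as arithmetic in a small sketch (the function names are hypothetical, and the 83 dB target is the reference level from this thread):

```python
REFERENCE_DB = 83.0


def adjustment_to_loudness(adjustment_db):
    """Tool side: flip the sign of the adjustment and add it to the reference."""
    return REFERENCE_DB - adjustment_db


def loudness_to_adjustment(loudness_db, target_db=REFERENCE_DB):
    """Player side: subtract the stored loudness from the required level."""
    return target_db - loudness_db


stored = adjustment_to_loudness(-3.0)  # 86.0 would be written to the file
gain = loudness_to_adjustment(stored)  # -3.0 is what the player applies
assert gain == -3.0
```

The number that reaches the player is the same -3 dB the calculation produced in the first place.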
To make the stored value a property of the file (i.e. truly absolute, not relative) you have to remove the calibration step (stage 4, if you refer to the explanation on the website). It's the calibration step that causes different calculations (e.g. mine and Frank's) to fall on the same scale - take this away, and you've got a big disadvantage: no prospect of improving the calculation.
I hope this clarifies the situation. Yes, you could equally well store the value with 83 added to it or not, just so long as everyone understood which you were doing. But since NOT adding it means you then DON'T have to subtract it again, that makes most sense. It's also what has been happening for a year, so there seems no reason to change now. Also, it's still a relative value: an instruction to turn up or turn down the file to match a reference level. If you remove this reference level, then you can state something explicitly about the file, but you lose the calibration, which makes it harder to change the calculation while retaining a compatible scale.
In short: adding 83 makes little difference (but is no use, and a small amount of trouble); changing to a true representation of "what's in the file" will cause many problems for little gain (pardon the pun).