Improving ReplayGain, some ideas for Devs etc
2Bdecided
post Nov 24 2003, 11:51
Post #26


ReplayGain developer


Group: Developer
Posts: 5141
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (dev0 @ Nov 23 2003, 05:36 PM)
3. ReplayGain lossy approximation

Storing this seems pointless to me, since ReplayGain calculations will become inaccurate after transcoding and no tool should be copying ReplayGain values when transcoding.

The ReplayGain values will be close enough. The peak values may not be, but they are much quicker to re-calculate.

Cheers,
David.
2Bdecided
post Nov 24 2003, 11:58
Post #27


ReplayGain developer


Group: Developer
Posts: 5141
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (guruboolez @ Nov 24 2003, 10:18 AM)
Try LAME ABR, for example (the preset includes --scale 0.98). The difference will be higher.

But in the case where an intentional gain change is applied, it's easy to correct the ReplayGain values.

No software currently "transcodes" the ReplayGain values by default, so nothing is copying over incorrect values.

I'd suggest that any software that does "transcode" the ReplayGain values should
a) set the "lossy" (or whatever it gets called) flag, and
b) correct the values for any known gain change applied during the process

Both are much much quicker and easier than re-calculating the ReplayGain values.



I'd better mention something here: All this should make things a lot easier for the user. Any extra complexity introduced by these additions will go into the software, and be hidden from the user. The result should be that the software is able to do the "right thing" by default. Very simple.

Cheers,
David.
guruboolez
post Nov 24 2003, 12:26
Post #28





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



And what about the idea of a personal track/album gain, set by the user? Purpose:
- useful if there are flaws in the RG calculation model
- useful if an audiophile wants to keep better coherency between different albums. For example, RG makes a gamba, a harpsichord or a flute sound as loud as an orchestra or a heavy metal band. That is the purpose of RG. The idea is nice, but in some cases it doesn't make sense. I recently bought a CD, an anthology of the best sound recordings of the year. The booklet is clear: some instrumental tracks have to be played much quieter than others in order to maintain high-fidelity principles. I have another disc, with an instrument called a "clavicorde" (a small harpsichord). The mastering level is very quiet. Why? Because the instrument's sound is covered by a human voice. RG will blow up the volume (and the background noise), and ruin the engineer's and the artist's intent.

I suppose that RG can't determine whether one instrument should be louder than another. Therefore, manual correction (and a software tool for batch correction) is really needed.
2Bdecided
post Nov 24 2003, 12:54
Post #29


ReplayGain developer


Group: Developer
Posts: 5141
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



See my reply in your other thread, and my suggestion for "user" and "real" ReplayGains in my first post in this thread.

Please reply in this thread.

Cheers,
David.
Case
post Nov 24 2003, 18:06
Post #30





Group: Developer (Donating)
Posts: 2230
Joined: 19-October 01
From: Finland
Member No.: 322



QUOTE (2Bdecided @ Nov 24 2003, 12:50 PM)
Sorry Case, but I think you're wrong.

Yup, I realized it seconds after posting. I had reference levels and calibrations in my mind and didn't consider the possibility of skipping all that during scanning.
2Bdecided
post Nov 26 2003, 11:02
Post #31


ReplayGain developer


Group: Developer
Posts: 5141
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



Another suggestion (this isn't fundamental)...

It would be useful to copy over the DialNorm and MixLev values from Dolby Digital (AC-3) data when it's transcoded.

MixLev should go into the new "ReplayGain Real" field, and DialNorm could probably go into the existing Album Gain field (in which case the new field to indicate how the gain was calculated would be useful).

I'll figure out appropriate conversion factors, and maybe seek help from the Doom9 crowd.

Cheers,
David.
andyh
post Jan 2 2004, 17:44
Post #32





Group: Members
Posts: 1
Joined: 2-January 04
Member No.: 10882



I'm confused as to how the RealLife level is different from the artist/producer origin code in the id3 proposal. Would it be more consistent to include a separate track and album setting for this? I also don't understand how storing the calculation method would work. Theoretically the track gain could have been calculated by version 1 of the algorithm and the album gain could have been read from the CD. Which value would be stored in the calculation method field? Is it stored separately for the track and the album field?

I think it would be a good idea to keep the id3 tag spec up to date with these suggestions. Since nobody has implemented it yet, I think that we should not worry about keeping it compatible.

Since David has said that he doesn't like the format of the gain values, I would like to change those as well. I think that the gain values should be stored as signed integers by simply multiplying the value by ten (or one hundred if the extra precision is useful). Information about which values are set could be stored in a bitfield along with the lossy bit.

If the lame header is going to be changed to use an int for the peak value, now would probably be the best time to change the formats of the gain values as well. It might be nice if they would allocate space for the album peak value as well.

Here is my proposal for the contents of the id3 frame:

/* Bitfield flags: each one occupies its own bit */
#define LOSSY 0x1
#define HAS_AUTO_TRACK_GAIN 0x2
#define HAS_AUTO_ALBUM_GAIN 0x4
#define HAS_USER_TRACK_GAIN 0x8
#define HAS_USER_ALBUM_GAIN 0x10
#define HAS_PRODUCER_TRACK_GAIN 0x20
#define HAS_PRODUCER_ALBUM_GAIN 0x40

struct {
    long track_peak;
    long album_peak;
    char calculation_method;
    short reference_gain;
    short bitfield;                   /* combination of the flags above */
    signed short auto_track_gain;     /* gains are dB multiplied by 10 (or 100) */
    signed short auto_album_gain;
    signed short user_track_gain;
    signed short user_album_gain;
    signed short producer_track_gain;
    signed short producer_album_gain;
    short right_undo;
    short left_undo;
};

I have included both left and right undo values because mp3gain is storing both in the APE tags. I don't think anybody really scales the channels separately, but I think that it would be good to store the same data in the different tag formats.

I would also like to know whether anyone intends to request that the new frame be added to the id3 spec. Section 3.3 of the 2.3.0 spec says:

The frame ID made out of the characters capital A-Z and 0-9. Identifiers beginning with "X", "Y" and "Z" are for experimental use and free for everyone to use, without the need to set the experimental bit in the tag header. Have in mind that someone else might have used the same identifier as you. All other identifiers are either used or reserved for future use.

If no one intends to propose adding replaygain to id3, we will need to rename the frame. Would "XRGA" be acceptable?

Any comments or suggestions would be welcome.
Lear
post Jan 2 2004, 19:10
Post #33


VorbisGain developer


Group: Developer
Posts: 140
Joined: 10-January 02
Member No.: 973



QUOTE (2Bdecided @ Nov 18 2003, 05:35 PM)
(It's a pity I didn't stick with the original idea of storing the ReplayGain level in the file e.g. 92dB instead of -3dB, because then the reference level wouldn't matter. Too confusing to change back now I think)

Interesting... I suggested changing it like that a year and a half ago, but you weren't too fond of the idea then (see here)...

QUOTE (2Bdecided @ Nov 24 2003, 11:50 AM)
At the moment, people store the gain change needed to match a standard loudness. Most use 89dB as that standard, but some use 83dB. So, there's confusion.

But they all measure the "perceived" loudness of the track the same way. (They're all taking my "pink_ref.wav" file, or whatever it was called, to be 83dB, after SMPTE RP-200 - after a real, and long existing standard). So if you store the "perceived" loudness, there's no confusion.

And this is the very reason why I suggested the change!

(Btw, I must've missed this thread when it started... I should read through it, in case I have any comments.)

(Edit: Added second quote.)

This post has been edited by Lear: Jan 2 2004, 22:57
Lear
post Jan 2 2004, 22:46
Post #34


VorbisGain developer


Group: Developer
Posts: 140
Joined: 10-January 02
Member No.: 973



QUOTE (2Bdecided @ Nov 20 2003, 12:43 PM)
Field = 32-bit INT.


For 16-bit audio data, use

00000000xxxxxxxxxxxxxxxx00000000

Where xxxxxxxxxxxxxxxx is the peak value.
(1000000000000000 is the largest possible value for linear 16-bit data, e.g. a .wav file)


For 24-bit audio data

00000000xxxxxxxxxxxxxxxxxxxxxxxx

Where xxxxxxxxxxxxxxxxxxxxxxxx is the peak value.

One problem is that you can't differentiate "24 bit where the low 8 bits just happen to be 0" from "16 bit". So why not keep it simple, i.e. fixed point, where 1.0 is full scale. 23 bits fraction is enough, but I think 24 bits would be "cleaner" (e.g., 1.0 would then be 0x01000000). Allowing 256 times full scale ought to be enough...

QUOTE
Should we change peak values to fixed point in all implementations?

Would it be easy for players to use, because I'm thinking about this being a useful convention to employ in all formats, since floating point isn't strictly needed, and is causing rounding confusion.

Or would it be stupid to change to fixed point for the peak value in other formats, because this would break compatibility with old players?

Doing it only for consistency isn't that important, IMO. Both are about as easy, I'd say (not that I've done much fixed-point stuff). It could be good to keep the precision about the same though (VorbisGain does that).

If they are stored in human readable format (i.e. Vorbis or APE tags), I'd say floating point is preferable, as it is easier to understand, even if it would require a bit more code on (embedded) systems without an FPU.
SamK
post Jan 4 2004, 15:48
Post #35





Group: Members
Posts: 57
Joined: 4-January 04
Member No.: 10938



I think it's the right time to switch to an absolute ReplayGain value (90dB instead of +1dB).
Most of the programs that support ReplayGain at the moment are frequently updated, so backward compatibility with players shouldn't be a problem for too long.
If some player takes months to update, its users would just have to stick to relative gain values.

Anyway, changing the representation of the number (fixed / float / ...) would break compatibility all the same, wouldn't it? So it's definitely the right time to make both changes at once.

I don't think it's a problem as long as there is backward compatibility for the files themselves, i.e. an old file with a ReplayGain value should still be supported by players that support the new ReplayGain.

If both the meaning and the encoding of the value are to be changed, it sounds safer to choose between the old and the new meaning based on another piece of data. The presence of the proposed "calculation method version" field would be enough to tell that it's a new gain tag.

I'm for applying all the good changes at once.

If you're really concerned about the risk of someone suing you after playing a new ReplayGained file with an old player and blowing his ears out due to ludicrous pre-amping, let's just use another name for the gain value, RG2 or whatever, and this won't be a risk anymore.

--
SamK
knik
post Jan 5 2004, 12:16
Post #36


FAAC developer


Group: Developer
Posts: 32
Joined: 8-July 03
Member No.: 7654



QUOTE (2Bdecided @ Nov 18 2003, 07:35 PM)
Almost everyone is using a reference level of 89dB, rather than the 83dB in the original ReplayGain proposal. Unless there are any objections, I'll change the official reference level to 89dB.

(It's a pity I didn't stick with the original idea of storing the ReplayGain level in the file e.g. 92dB instead of -3dB, because then the reference level wouldn't matter. Too confusing to change back now I think)

Does the '+92dB' approach use 16-bit min RMS (+-1 samples) as a reference, or am I missing something?
I think the reference level should be bit-depth independent, e.g. max RMS.
If the current ref level is something like (maxrms - 7dB) then I think it's not bad.

Edit:
After closer look:
83dB = 14125.4 and 16-bit maxrms = 32768, hence 83dB = maxrms - 7.3dB

I would suggest redefining the reference level from 83dB to maxrms-7dB. It would be much less confusing.

This post has been edited by knik: Jan 5 2004, 19:18
saratoga
post Jan 6 2004, 06:28
Post #37





Group: Members
Posts: 4971
Joined: 2-September 02
Member No.: 3264



Stupid question: Is 0dB relative also 96 dB in 16 bit? I'm not sure what it means when I set the volume to -89dB.
SamK
post Jan 6 2004, 14:52
Post #38





Group: Members
Posts: 57
Joined: 4-January 04
Member No.: 10938



QUOTE (knik @ Jan 5 2004, 12:16 PM)
After closer look:
83dB = 14125.4 and 16-bit maxrms = 32768, hence 83dB = maxrms - 7.3dB

I would suggest redefining the reference level from 83dB to maxrms-7dB. It would be much less confusing.

Ah OK, I see what you mean.
Consider a signal as a stream of unitless, infinite-precision numbers.
ReplayGain computes a reference level (the 95th percentile of the RMS values of all 0.05 s frames). This unitless number is converted to dB; let's call it absRL.

If a signal in [-1, 1] is multiplied by 2^depth, absRL is shifted:
8 bit : (max/oldmax)^2 = (2^8)^2 ~= 6.5 * 10^4 ~= 10^4.8 => absRL += 48dB
16 bit : (max/oldmax)^2 = (2^16)^2 ~= 4.3 * 10^9 ~= 10^9.6 => absRL += 96dB
24 bit : (max/oldmax)^2 = (2^24)^2 ~= 2.8 * 10^14 ~= 10^14.4 => absRL += 144dB

So let's call those values:
fullScale_dB(bit_depth) = (bit_depth / 8) * 10*log10(2^16)
(this adds 48.165dB for every 8 bits)

If files of varying bit depths were common, someone looking at their absRL values would need to subtract this 48.165 * bit_depth / 8 from each in order to know which one sounds louder when played at full volume.

So you're right, it's better to store :

(absRL(song) - fullScale_dB(bit_depth) )

which is in fact the absRL of the song if its samples are scaled back to [-1, 1].
It would be the 'absolute normalized Reference Level', ANRL.

By the way, I think the ANRL can still be positive, because the filtering done before computing the RMS values boosts the frequencies humans are sensitive to and attenuates the others, so it can produce some samples > 1.0 from a [-1, 1]-normalized signal.
(a song in 16 bit can be at absRL = 100dB or even a bit more)
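
The offset and the normalized level described in this post can be written as two short functions. This is only a sketch of the formulas as stated here; the function names are illustrative, and it follows the post in using 2^bit_depth as the scale factor:

```c
#include <assert.h>
#include <math.h>

/* Level shift in dB caused by multiplying a [-1, 1] signal by 2^bit_depth:
 * 10*log10((2^bit_depth)^2) = 20 * bit_depth * log10(2). */
double full_scale_db(int bit_depth)
{
    assert(bit_depth > 0);
    return 20.0 * bit_depth * log10(2.0);
}

/* "Absolute normalized reference level": absRL with the bit-depth
 * offset removed. */
double anrl_db(double abs_rl_db, int bit_depth)
{
    return abs_rl_db - full_scale_db(bit_depth);
}
```

For the 16-bit song at absRL = 100dB mentioned above, `anrl_db(100.0, 16)` comes out slightly positive, at about 3.7dB.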
2Bdecided
post Jan 6 2004, 15:12
Post #39


ReplayGain developer


Group: Developer
Posts: 5141
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (Lear @ Jan 2 2004, 06:10 PM)
QUOTE (2Bdecided @ Nov 18 2003, 05:35 PM)
(It's a pity I didn't stick with the original idea of storing the ReplayGain level in the file e.g. 92dB instead of -3dB, because then the reference level wouldn't matter. Too confusing to change back now I think)

Interesting... I suggested changing it like that a year and a half ago, but you weren't too fond of the idea then (see here)...

QUOTE (2Bdecided @ Nov 24 2003, 11:50 AM)
At the moment, people store the gain change needed to match a standard loudness. Most use 89dB as that standard, but some use 83dB. So, there's confusion.

But they all measure the "perceived" loudness of the track the same way. (They're all taking my "pink_ref.wav" file, or whatever it was called, to be 83dB, after SMPTE RP-200 - after a real, and long existing standard). So if you store the "perceived" loudness, there's no confusion.

And this is the very reason why I suggested the change!

(Btw, I must've missed this thread when it started... I should read through it, in case I have any comments.)

(Edit: Added second quote.)

Hi Lear!

I remember that thread! There was no way I was going to change it back again and confuse everyone again, since the argument was basically about whether or not to add 83dB at the end. I naively assumed that everyone would follow the suggestion, and there would be no confusion. Ha - some chance!

It's reminded me of something though: I expected people to think that things were too quiet, so suggested the player should default to adding 6dB to the values. What people chose to do instead was to make the calculation add 6dB to the values (if you think about it, the values stored in every file are 6dB greater than I suggested - because they get you to 89dB, not 83dB).

I wonder if I'd stuck with my original thought (what you proposed) if there still would have been confusion because someone would get the calculation to add 6dB to the value to have the same effect. Or else they would see that all the players used 89dB as a reference, but their calculator used 83dB as a reference, and change it. Or they'd just take the ref_pink.wav file and boost it by 6dB.

I do, in retrospect, think adding 83dB (and hence storing 92dB instead of -3dB or -9dB) is a better solution. But I have a feeling that someone would still have managed to mess it up!
knik
post Jan 6 2004, 18:45
Post #40


FAAC developer


Group: Developer
Posts: 32
Joined: 8-July 03
Member No.: 7654



QUOTE (2Bdecided @ Jan 6 2004, 05:12 PM)
I do, in retrospect, think adding 83dB (and hence storing 92dB instead of -3dB or -9dB) is a better solution. But I have a feeling that someone would still have managed to mess it up!

I really think we should forget about 16-bit dynamic range and use maxrms as a reference otherwise we will always have some confusion.
knik
post Jan 6 2004, 20:56
Post #41


FAAC developer


Group: Developer
Posts: 32
Joined: 8-July 03
Member No.: 7654



QUOTE (SamK @ Jan 6 2004, 04:52 PM)
So you're right, it's better to store :

(absRL(song) -  fullScale_dB(bit_depth) )

which is in the fact the absRL of the songs if its samples are scaled back to [-1, 1].
it would be the 'absolute normalized Reference Level', ANRL.

Yes, that's the point. We should use 1.0 as a reference for [-1,1] samples and we don't need any sample bit-depth assumption here.
It can always be rescaled to the actual output sample depth.
2Bdecided
post Jan 7 2004, 14:07
Post #42


ReplayGain developer


Group: Developer
Posts: 5141
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (knik @ Jan 6 2004, 07:56 PM)
QUOTE (SamK @ Jan 6 2004, 04:52 PM)
So you're right, it's better to store :

(absRL(song) -  fullScale_dB(bit_depth) )

which is in the fact the absRL of the songs if its samples are scaled back to [-1, 1].
it would be the 'absolute normalized Reference Level', ANRL.

Yes, that's the point. We should use 1.0 as a reference for [-1,1] samples and we don't need any sample bit-depth assumption here.
It can always be rescaled to the actual output sample depth.

knik,

I didn't get around to replying to your (and other people's) posts because I didn't have the time, but I'd better squash this idea before it goes any further.

ReplayGain is referenced to SMPTE RP 200, a calibration by which a -20dB FS RMS pink noise signal will give a real world SPL of 83dB. All RG figures come from this concept, and all ReplayGain values are the gain adjustments needed to make that track (or album) match the perceived loudness of that test signal. (+6dB in most implementations)

The values are not based on bit depth. The notion of "how loud" a full scale sine wave is flows from SMPTE RP 200, and it is not 90dB, 96dB or 144dB. It's frequency dependent, but will be 103dB SPL for 2kHz (IIRC in the calculations I originally proposed).

The exact values depend on the "psychoacoustic" model used to determine the loudness of a given track or album. Different psychoacoustic models can be calibrated to the SMPTE RP 200 standard and used interchangeably (This means people can improve or change the ReplayGain calculation without messing everything up - compatibility and interchangeability is ensured).

Taking a non psychoacoustic standard (i.e. choosing digital full scale to equal some dB value) would make it very difficult to update the psychoacoustic model and calibrate it with previous versions. There are already several incompatible, uncalibrated, and largely unused methods for “correcting the loudness differences between tracks or albums”. I didn’t want to create yet another one!

The common sense approach to calibrating a system which judges perceived loudness is to define a specific test signal, and how loud this signal should be. As the industry has already done this, it made sense to follow this existing calibration.

Hope this helps. Please read http://www.replaygain.org/ for more information.

Cheers,
David.
2Bdecided
post Jan 7 2004, 14:16
Post #43


ReplayGain developer


Group: Developer
Posts: 5141
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (Lear @ Jan 2 2004, 09:46 PM)
QUOTE (2Bdecided @ Nov 20 2003, 12:43 PM)
Field = 32-bit INT.


For 16-bit audio data, use

00000000xxxxxxxxxxxxxxxx00000000

Where xxxxxxxxxxxxxxxx is the peak value.
(1000000000000000 is the largest possible value for linear 16-bit data, e.g. a .wav file)


For 24-bit audio data

00000000xxxxxxxxxxxxxxxxxxxxxxxx

Where xxxxxxxxxxxxxxxxxxxxxxxx is the peak value.

One problem is that you can't differentiate "24 bit where the low 8 bits just happen to be 0" from "16 bit". So why not keep it simple, i.e. fixed point, where 1.0 is full scale. 23 bits fraction is enough, but I think 24 bits would be "cleaner" (e.g., 1.0 would then be 0x01000000). Allowing 256 times full scale ought to be enough...

But it is fixed point, and I don't see why you'd need to "differentiate" between 24-bits (last 8 bits zero) and 16-bits. Can you explain?


QUOTE
QUOTE

Should we change peak values to fixed point in all implementations?

Would it be easy for players to use, because I'm thinking about this being a useful convention to employ in all formats, since floating point isn't strictly needed, and is causing rounding confusion.

Or would it be stupid to change to fixed point for the peak value in other formats, because this would break compatibility with old players?

Doing it only for consistency isn't that important, IMO. Both are about as easy, I'd say (not that I've done much fixed-point stuff). It could be good to keep the precision about the same though (VorbisGain does that).

If they are stored in human readable format (i.e. Vorbis or APE tags), I'd say floating point is preferable, as it is easier to understand, even if it would require a bit more code on (embedded) systems without an FPU.


You might say that, but Frank Klemm simply said "Floating point is a stupid idea" and coded it fixed point, 16-bit, with 6dB headroom above digital full scale. And he did that on the format "MusePack", which has 24-bit encoders and decoders, and can easily peak more than 6dB above digital full scale. His argument was that he had 16 bits spare, he didn't want to use floating point, and what he stored should be enough to prevent clipping in all but the most severe situations.

When other people are coding it, you have to try to please them as well as yourself!

Cheers,
David.
Gabriel
post Jan 7 2004, 14:23
Post #44


LAME developer


Group: Developer
Posts: 2950
Joined: 1-October 01
From: Nanterre, France
Member No.: 138



LAME has been using David's fixed-point representation since 3.94b.
SamK
post Jan 7 2004, 15:57
Post #45





Group: Members
Posts: 57
Joined: 4-January 04
Member No.: 10938



QUOTE (2Bdecided @ Jan 7 2004, 02:07 PM)
ReplayGain is referenced to SMPTE RP 200, a calibration by which a -20dB FS RMS pink noise signal will give a real world SPL of 83dB. All RG figures come from this concept, and all ReplayGain values are the gain adjustments needed to make that track (or album) match the perceived loudness of that test signal. (+6dB in most implementations)

The values are not based on bit depth. The notion of "how loud" a full scale sine wave is flows from SMPTE RP 200, and it is not 90dB, 96dB or 144dB. It's frequency dependent, but will be 103dB SPL for 2kHz (IIRC in the calculations I originally proposed).

Ah OK, so ReplayGain is already bit-depth independent. I had read most of the ReplayGain documents, but this wasn't clearly stated anywhere.
If I had known that MATLAB's wavread function returns an array of numbers in [-1, 1], I would have gotten the clue from the MATLAB demonstration code.
Maybe you should add a first step to the 4-step "General Concept" at http://replaygain.hydrogenaudio.org/rms_energy.html, like :
0. the signal is converted to floating point numbers, and divided by the full scale of the original format (which is 2^15 for 16-bit integer encoding)

or something, to ensure everyone gets this point.

To sum up what I understood,
ReplayGain computations are bit-depth independent from the start,
and the proposal is to store

Vrms = 83 + (replaygain(filename) - ref_Vrms);

(with ref_Vrms being the gain of the standard digital signal corresponding to 83dB SPL:
ref_Vrms = replaygain("pink_ref.wav"); )

instead of the previous :
Vrms = - (replaygain(filename) - ref_Vrms);

Players would then use the stored value like this :
average_song_Vrms = 89; // user setting
rel_gain = average_song_Vrms - Vrms;
ratio = 10^(rel_gain/20);
// multiply decoded samples by ratio

This post has been edited by SamK: Jan 7 2004, 19:23
SamK
post Jan 7 2004, 20:34
Post #46





Group: Members
Posts: 57
Joined: 4-January 04
Member No.: 10938



Reading http://home.earthlink.net/~bobkatz24bit/integrated.html, and about the K-N VU meters, I realized there is no reason why the magic 83dB number from the SMPTE RP 200 standard should appear in ReplayGain. The computation is all in the digital domain, so no SPL number should arise.

All that SMPTE RP 200 brings us is a standard -20dBFS signal to calibrate measurements against.
The fact that this signal is supposed to actually produce sound at 83dB SPL in a calibrated hi-fi system is of no importance here, as we're only doing things *before* the actual sound system.

In fact, if you have a song with replaygain = +20dB (relative to the original 83dB reference), it really means it is measured to perceptually sound 10 times louder overall than the reference pink noise signal (which is used as the calibration reference for replaygain = +0dB).

That's all.

The real point of ReplayGain is to compute
HR = replaygain - 20
- aka : (AbsReplaygain - 83) - 20
as a good measure of the overall headroom of the song (the ratio between the peak capability of the medium and the "average level").

Indeed, if you take the standard -20dB FS pink noise signal, whose absolute ReplayGain is exactly 83dB (by definition), HR will be exactly -20dB.
Translate that to any signal, and HR becomes indicative of the overall headroom.
It will take slightly negative values for most pop songs (possibly slightly positive for a really loud sound concentrated in the frequencies boosted by the psychoacoustic filter in use),
and could be lower than -10dB for classical music or anything with a bit more dynamic range.

So, if it is decided to switch to storing an absolute value, I suggest storing the value HR
(which is in fact the relative value minus 20).
It gives all the information ReplayGain has to give, and is just as independent of bit depth AND of the psychoacoustic filter used as current ReplayGain is.
Plus it takes from the SMPTE RP 200 standard only what it really uses: the choice of a reference signal that different psychoacoustic implementations can calibrate against.

And its value is much more intuitive and much less confusing than a value expressed in terms of the SPL produced by a calibrated system, which does not belong here.

Is it not?
Lear
post Jan 7 2004, 20:47
Post #47


VorbisGain developer


Group: Developer
Posts: 140
Joined: 10-January 02
Member No.: 973



QUOTE (2Bdecided @ Jan 7 2004, 02:16 PM)
QUOTE (Lear @ Jan 2 2004, 09:46 PM)

One problem is that you can't differentiate "24 bit where the low 8 bits just happen to be 0" from "16 bit". So why not keep it simple, i.e. fixed point, where 1.0 is full scale. 23 bits fraction is enough, but I think 24 bits would be "cleaner" (e.g., 1.0 would then be 0x01000000). Allowing 256 times full scale ought to be enough...

But it is fixed point, and I don't see why you'd need to "differentiate" between 24-bits (last 8 bits zero) and 16-bits. Can you explain?


If you decode the value in the same way, regardless of bit depth, you'll get a kind of rounding error (or whatever it should be called) when dealing with the 16-bit value. E.g., 0x3FFF00 (half scale in 16 bit) is not the same as 0x3FFFFF (half scale in 24 bit). Sure, the error will be small, but it'll be there. (Of course, if the processing is all done in 16 bits it doesn't matter, as the low bits will be thrown away.)

QUOTE
You might say that, but Frank Klemm simply said "Floating point is a stupid idea" and coded it fixed point, 16-bit, with 6dB headroom above digital full scale. And he did that on the format "MusePack", which has 24-bit encoders and decoders, and can easily peak more than 6dB above digital full scale. His argument was that he had 16 bits spare, he didn't want to use floating point, and what he stored should be enough to prevent clipping in all but the most severe situations.


I'd guess he did it that way because there were 16 bits of reserved space in the file format he could use, so he squeezed in what he could. But that doesn't mean other file formats should do it like that. Still, the actual format in the tag isn't very important, IMO, as long as the necessary resolution is there.
knik
post Jan 7 2004, 21:32
Post #48


FAAC developer


Group: Developer
Posts: 32
Joined: 8-July 03
Member No.: 7654



Thanks for the explanation, 2Bdecided. It really helped.
Now I see the RG reference level is well defined.
2Bdecided
post Jan 8 2004, 12:29
Post #49


ReplayGain developer


Group: Developer
Posts: 5141
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (SamK @ Jan 7 2004, 02:57 PM)
QUOTE (2Bdecided @ Jan 7 2004, 02:07 PM)
ReplayGain is referenced to SMPTE RP 200, a calibration by which a -20dB FS RMS pink noise signal will give a real world SPL of 83dB. All RG figures come from this concept, and all ReplayGain values are the gain adjustments needed to make that track (or album) match the perceived loudness of that test signal. (+6dB in most implementations)

The values are not based on bit depth. The notion of "how loud" a full scale sine wave is flows from SMPTE RP 200, and it is not 90dB, 96dB or 144dB. It's frequency dependent, but will be 103dB SPL for 2kHz (IIRC in the calculations I originally proposed).

ah ok, the replaygain is already bitdepth independant. I had read most of replaygian documents, but this wasnt clearly stated anywhere.
If I had known matlab's wavread function returns an array of numbers in [-1, 1], I would have gotten the clue from the matlab demonstration code..
Maybe you should add a first step in the 4-step "General Concept" at http://replaygain.hydrogenaudio.org/rms_energy.html, like :
0. the signal is converted to floating point numbers, and divided by the full scale of the original format. (which is 2^15 for 16 bit integer encoding)

or something, to ensure everyone gets this point.

I think, if you follow it through, it doesn't matter whether wavread returns [1,-1] or [-32768,32767] (you're right saying that it returns the former). As long as the value "ref_Vrms" has been calculated by the same method (which is essential anyway), then calibrating to it (i.e. subtracting it at the end) will cancel out whatever scaling or units or whatever are used at the input. That's because both the file in question, and the ref_pink.wav file will be scaled the same on the way in (to [1,-1] or [-32768,32767] or whatever). Subtracting in the logarithmic domain (which dB is) is the same as dividing in the linear domain. So any scaling is cancelled in this last step.
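The cancellation can be checked numerically. The sketch below is only an illustration under simplifying assumptions: it uses a plain RMS level rather than the full ReplayGain analysis (no equal-loudness filter, no block statistics), and sine waves stand in for the track and for the pink noise reference. The function names are hypothetical.

```python
import math


def rms_db(samples):
    """Plain RMS level of a sample sequence, in dB relative to a value of 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def calibrated_level(track, reference, scale):
    """Level of the track relative to the reference, with both read in at the same scale."""
    scaled_track = [s * scale for s in track]
    scaled_reference = [s * scale for s in reference]
    # Subtracting in dB is dividing in the linear domain, so `scale` drops out here.
    return rms_db(scaled_track) - rms_db(scaled_reference)


# Stand-in signals: ten full periods of a sine wave at two different amplitudes.
track = [0.5 * math.sin(2 * math.pi * k / 64) for k in range(640)]
reference = [0.1 * math.sin(2 * math.pi * k / 64) for k in range(640)]

if __name__ == "__main__":
    print(calibrated_level(track, reference, 1.0))      # samples read as [-1, 1]
    print(calibrated_level(track, reference, 32768.0))  # samples read as 16-bit integers
```

Both calls give the same result to within floating point rounding, because the common scale factor multiplies the track and the reference alike.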


QUOTE
To sum up what I understood,
ReplayGain computations are bit-depth independent from the start,
and the proposal is to store

Vrms = 83+ (replaygain(filename) - ref_Vrms);

(with ref_Vrms being the gain of the standard digital signal corresponding to 83dB SPL
ref_Vrms = replaygain("pink_ref.wav");  )

instead of previous :
Vrms = - (replaygain(filename) - ref_Vrms);

Then players would now use the stored value like that :
average_song_Vrms = 89; // user setting
rel_gain = average_song_Vrms - Vrms;
ratio = 10^(rel_gain/20);
// multiplies decoded samples by ratio.


Yes exactly - though I'm not strongly suggesting we change it. I was saying it's a pity it isn't like this already, but should it be changed now?
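For anyone who wants to try the player side of that proposal, here is a minimal runnable sketch of the quoted pseudocode. It assumes the stored value is the proposed SPL-style figure and that 89 is the user's target, as in the quote; the function names are made up for the example and do not come from any existing player.

```python
def playback_ratio(stored_level_db: float, target_level_db: float = 89.0) -> float:
    """Linear factor that brings a track with the stored level to the target level."""
    rel_gain_db = target_level_db - stored_level_db
    return 10.0 ** (rel_gain_db / 20.0)


def apply_gain(samples, ratio):
    """Multiply decoded samples by the linear ratio."""
    return [s * ratio for s in samples]


if __name__ == "__main__":
    # A track stored as 95 is 6dB above the 89 target, so it is turned down by about half.
    ratio = playback_ratio(95.0)
    print(ratio, apply_gain([0.5, -0.25], ratio))
```

A track stored at exactly the target level gets a ratio of 1 and is left untouched; no clipping protection is included in this sketch.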

I'll answer your other post, and then expand on that point...


EDIT: 1000th post! Should have made it better!

This post has been edited by 2Bdecided: Jan 8 2004, 13:19
2Bdecided
post Jan 8 2004, 13:11
Post #50


ReplayGain developer


Group: Developer
Posts: 5141
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (SamK @ Jan 7 2004, 07:34 PM)
Reading http://home.earthlink.net/~bobkatz24bit/integrated.html, and the K-N VU meters, I realized there is no reason why the magic 83dB number from the SMPTE RP 200 standard should appear in ReplayGain. The computation is all in the digital domain, no SPL number should arise.

[snip]

And its value is much more intuitive, much less confusing than expressing the value in terms of SPL produced by calibrated system, which does not belong here.

Is it not ?

No, because perceived loudness depends on loudness!

This isn't built into the current psychoacoustic model, but could well be implemented in a future improvement....

If you're listening to a bass heavy track at 60dB, you'll hear much less bass (relatively) than you will at 80dB. This means that increasing the gain on a bass heavy track by 20dB will cause its subjective loudness to be increased more than a 20dB boost to a bass light track. What's more, the perceived loudness increase of that 20dB boost will be different if it's a boost from 40dB to 60dB than if it's a boost from 80dB to 100dB.

If the equal loudness curves were parallel lines, then we wouldn't really have to worry about real world sound pressure. They're not, so it's an issue, and it can only be solved if we make some kind of guess (like the floating ATH in the lame encoder), or calibrate the system properly to a real world loudness - which is what I've chosen to do.

Hope this makes sense.

Cheers,
David.

EDIT: plus see my previous response about how many other schemes exist which are unused because no one knows how they are supposed to be calibrated, or re-calibrated.

This post has been edited by 2Bdecided: Jan 8 2004, 13:12