R128GAIN: An EBU R128 compliant loudness scanner
pbelkner
post Jan 7 2011, 19:56
Post #51





Group: Members
Posts: 412
Joined: 13-June 10
Member No.: 81467



QUOTE (jangk @ Jan 7 2011, 20:13) *
Anyhow, great work in progress :) :) :)

Thanks :)

QUOTE (jangk @ Jan 7 2011, 20:13) *
I am excited to see further development and how it could work with RG.

R128GAIN writes the RG tags (currently only for FLAC). I've re-tagged all my FLACs and listen to them (shuffled) all the time with RG enabled. That's my most important test case ;)
googlebot
post Jan 7 2011, 21:34
Post #52





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



QUOTE (pbelkner @ Jan 6 2011, 16:03) *
What R128GAIN does is the following (in principle):
  • Create an empty gating block capable of holding samples up to 400ms using a ring buffer.
  • For each input sample:
    • If the gating block is full remove the first sample from it.
    • Add the current sample to the end of the gating block.
    • If the gating block is full:
      • Pick the sample cached in the middle of the gating block.
      • Depending on the (un-gated) loudness measure of the gating block decide, whether to add the picked sample to the overall statistics.
That's my understanding of Tech doc 3341, Annex 1, at least in principle.


I agree with C.R.H. that this interpretation doesn't seem to follow Tech 3341, Annex 1.

  1. A measurement interval is first divided into blocks of 400ms length (with at least 50% overlap). Basically doable with a ring buffer.
  2. From this set of blocks, eliminate all blocks with loudness below -70 LUFS. (I don't understand why you speak of adding samples (and not blocks) to the "overall statistic").
  3. Calculate the total loudness for the latter, decimated set of blocks.
  4. This loudness minus 8 LU is your final threshold.
  5. Decimate the already decimated set of blocks again: Remove all blocks below the final threshold (which can only be known after a complete pass).
  6. The "ungated" total loudness of the resulting set of blocks is your result.
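
The six steps above can be sketched in a few lines of Python. This is only a rough illustration under simplifying assumptions: the input is mono, the BS.1770 pre-filter and channel weights are left out, and the function names (`block_mean_squares`, `gated_loudness`) are made up for this example rather than taken from R128GAIN.

```python
import math

BLOCK_SECONDS = 0.4      # gating block length (step 1)
ABS_GATE_LUFS = -70.0    # absolute gate (step 2)
REL_GATE_LU = -8.0       # relative gate offset (step 4)


def loudness(mean_square):
    """BS.1770 style loudness of a mean square value."""
    return -0.691 + 10.0 * math.log10(mean_square)


def block_mean_squares(samples, rate, overlap=0.5):
    """Step 1: mean square of every 400 ms block, hopping by (1 - overlap) blocks."""
    size = int(round(BLOCK_SECONDS * rate))
    hop = max(1, int(round(size * (1.0 - overlap))))
    return [
        sum(x * x for x in samples[start:start + size]) / size
        for start in range(0, len(samples) - size + 1, hop)
    ]


def gated_loudness(mean_squares):
    """Steps 2 to 6, applied to a list of per-block mean squares."""
    # Step 2: drop blocks below the absolute gate (silent blocks have no log).
    kept = [m for m in mean_squares if m > 0.0 and loudness(m) > ABS_GATE_LUFS]
    if not kept:
        return float("-inf")
    # Steps 3 and 4: loudness of the survivors minus 8 LU is the final threshold.
    threshold = loudness(sum(kept) / len(kept)) + REL_GATE_LU
    # Step 5: decimate the already decimated set again.
    kept = [m for m in kept if loudness(m) > threshold]
    # Step 6: "ungated" loudness of what is left.
    return loudness(sum(kept) / len(kept))
```

With ten loud blocks (mean square 0.005) and ten quiet ones (mean square 5e-7), both groups pass the absolute gate, but only the loud ones pass the relative gate, so the result equals the loudness of the loud blocks alone.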

I think it would be simpler to work with block indices (a list, array, or bitmap) than a ring buffer.* Read the input stream two times block by block and skip the calculated indices. Usually, with the buffering left to the OS, you should be reading the second pass from memory automatically.

PS The wording in Annex 1 could be better. Especially the "gated loudness" LKG in (6) and (8) should use different symbols. But mathematically it is not ambiguous. It took me over an hour to crunch the whole thing, though.

* Of course, you can still use something as a ring buffer for I/O. But I would put a block-wise abstraction layer on top of it to make the overall design simpler.

This post has been edited by googlebot: Jan 7 2011, 22:14
pbelkner
post Jan 7 2011, 22:17
Post #53





Group: Members
Posts: 412
Joined: 13-June 10
Member No.: 81467



QUOTE (googlebot @ Jan 7 2011, 22:34) *
I agree with C.R.H. that this interpretation doesn't seem to follow Tech 3341, Annex 1.

You are probably both right; I have to think about it.

What I have in mind (probably not correct) is the following:
  • BS.1770 defines a loudness measure to which all samples contribute the same way.
  • R128 adds a gate: some samples have to be removed from the pure BS.1770 measure.
  • In R128GAIN the gate is local to each sample, i.e. for each sample it consists of the samples from -200ms to +200ms relative to the sample under consideration. Depending on the pure (un-gated) BS.1770 loudness measure of this -200ms to +200ms block, the sample under consideration is added to or removed from the (gated) BS.1770 loudness measure of the track/album.
The block relative to each sample is what I call the running gate, because it's easy to update its sum of squares at each step.
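
To illustrate why the sum of squares is cheap to update, here is a minimal Python sketch of such a running gate. It is not the R128GAIN code; the class name `RunningGate` and its methods are invented for this example, and the window length is given in samples (0.4 * sample rate for a 400ms gate).

```python
from collections import deque


class RunningGate:
    """Sliding window over the last `size` samples with an O(1) sum-of-squares update."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.sum_sq = 0.0

    def push(self, x):
        """Add a sample; return the centre sample once the window is full, else None."""
        if len(self.window) == self.size:
            # Window is full: the oldest sample leaves and takes its square with it.
            old = self.window.popleft()
            self.sum_sq -= old * old
        self.window.append(x)
        self.sum_sq += x * x
        if len(self.window) < self.size:
            return None
        return self.window[self.size // 2]

    def mean_square(self):
        """Mean square of the samples currently in the window."""
        if not self.window:
            return 0.0
        return self.sum_sq / len(self.window)
```

Each step costs one subtraction and one addition regardless of the window length. Over very long inputs the repeated add/subtract can accumulate rounding error, so a real implementation might recompute the sum from scratch every now and then.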

QUOTE (googlebot @ Jan 7 2011, 22:34) *
  • The "ungated" total loudness of the resulting set of blocks is your result.

What do you mean by "loudness of a set of blocks"? Doesn't it imply to count samples more than once?

It seems to me that what I've implemented is the limit of what you get if you let the overlap go to 100%. If this is true, then it would be fully compliant, because they require 50% at a minimum.

This post has been edited by pbelkner: Jan 7 2011, 22:21
googlebot
post Jan 7 2011, 22:35
Post #54





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



QUOTE (pbelkner @ Jan 7 2011, 22:17) *
It seems to me that what I've implemented is the limit of what you get if you let the overlap go to 100%.


No. Because, for true overlap, the answer to this

QUOTE (pbelkner @ Jan 7 2011, 22:17) *
Doesn't it imply to count samples more than once?


is "Yes!". Within the overlapping area the same sample can be part of both, zero or more eliminated blocks and zero or more non-eliminated blocks. All non-eliminated blocks are part of the final calculation.

QUOTE (pbelkner @ Jan 7 2011, 22:17) *
What do you mean by "loudness of a set of blocks"?


Conceptually: Concatenate all non-eliminated blocks and calculate the "ungated" loudness for the whole interval. In practice you basically average the pre-calculated loudness values of all non-eliminated blocks, see step (8).

PS My comments are not meant to obscure the fact that you have done a great job so far! :)

This post has been edited by googlebot: Jan 7 2011, 23:05
pbelkner
post Jan 7 2011, 23:21
Post #55





Group: Members
Posts: 412
Joined: 13-June 10
Member No.: 81467



QUOTE (googlebot @ Jan 7 2011, 23:35) *
PS My comments are not meant to obscure the fact that you have done a great job so far! :)

Many thanks to C.R.Helmrich and you for the great comments! Meanwhile I've taken another look at the papers and I think the point is clear now.

Probably the next version will offer the two pass approach (and may leave the current one pass as a very good approximation).
jdoering
post Jan 8 2011, 00:28
Post #56





Group: Members
Posts: 11
Joined: 6-January 11
Member No.: 87101



Please pardon the noob here; hopefully I'm keeping up with the discussion even though most of this is far outside my normal domain. I'm sure you'll all set me straight if I'm on the wrong track!

Is a two-pass approach over the input really required? While I'm sure it's a reasonable approach, from a library perspective a single-pass interface seems convenient (like the common ReplayGainAnalysis C code). In googlebot's steps 1 through 6, the loudness per block is calculated implicitly during step #2 and, if I'm understanding correctly, only that per-block loudness is needed for all of the remaining steps.

Now, in a "maximum overlap" approach as suggested by pbelkner, each input sample results in a block, so the block count per second is of course very high (equal to the sample rate). In this case buffering the per-block loudness in a single-pass approach sounds ridiculous compared to a two-pass algorithm.

But in the minimum 50% overlap standard laid out by Tech 3341, Annex 1, the block count per second is fixed at 5, independent of the sample rate. If I'm understanding this correctly, it means that buffering the per-block loudness would "only" require 18K samples per hour (versus about 172 million at 48 kHz for near 100% overlap). If the loudness samples are stored in 64 bits, that's only a little over 700 KiB an hour of buffering. While it isn't bounded, it sounds reasonable for in-memory buffering in this application on modern hardware (considering typical PC applications at this point, not embedded devices, etc.).
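
For anyone who wants to redo the arithmetic, here is a small Python sketch. The helper names are made up for illustration. One caveat: with a single 64-bit value per block, 18,000 blocks come to roughly 141 KiB per hour; the "little over 700 KiB" figure is what you get with five 64-bit values per block (for instance one per channel of a 5-channel programme), and that reading is only a guess at how the number was derived. Either way the buffer is small.

```python
BLOCK_SECONDS = 0.4
SECONDS_PER_HOUR = 3600


def blocks_per_hour(overlap):
    """Blocks produced per hour when 400 ms blocks overlap by `overlap` (0..1)."""
    hop_seconds = BLOCK_SECONDS * (1.0 - overlap)
    return round(SECONDS_PER_HOUR / hop_seconds)


def per_sample_blocks_per_hour(rate):
    """One block per input sample, i.e. the near-100% overlap limit."""
    return rate * SECONDS_PER_HOUR


def buffer_kib(blocks, values_per_block=1, bytes_per_value=8):
    """Buffer size in KiB for the given number of cached block values."""
    return blocks * values_per_block * bytes_per_value / 1024.0


print(blocks_per_hour(0.5))                    # 18000
print(per_sample_blocks_per_hour(48000))       # 172800000
print(buffer_kib(18000))                       # 140.625
print(buffer_kib(18000, values_per_block=5))   # 703.125
```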

It looks to me like there is a good reason to stay near the 50% minimum overlap.

-Jeff
googlebot
post Jan 8 2011, 01:51
Post #57





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



Completely agree! One doesn't really have to pass two times over the whole input. Only the loudness values of non-eliminated blocks need to be saved during the first pass. The second "pass" can then just further decimate those (in the same loop as the final averaging).

I often do not start to look for speed optimization potential before I have a simple to understand and correctly working first sketch. In my experience this leads to better code in the long run. But you are right: 2 passes over the whole input are overkill, probably even for a first sketch... wink.gif

This post has been edited by googlebot: Jan 8 2011, 01:52
pbelkner
post Jan 8 2011, 09:32
Post #58





Group: Members
Posts: 412
Joined: 13-June 10
Member No.: 81467



QUOTE (jdoering @ Jan 8 2011, 01:28) *
But in the minimum 50% overlap standard laid out by Tech 3341, Annex 1, the block count per second is fixed at 5, independent of the sample rate. If I'm understanding this correctly, it means that buffering the per-block loudness would "only" require 18K samples per hour (versus about 172 million at 48 kHz for near 100% overlap). If the loudness samples are stored in 64 bits, that's only a little over 700 KiB an hour of buffering. While it isn't bounded, it sounds reasonable for in-memory buffering in this application on modern hardware (considering typical PC applications at this point, not embedded devices, etc.).

It looks to me like there is a good reason to stay near the 50% minimum overlap.

Thanks a lot for this estimation. For album gain calculation we have to buffer "loudness samples" in this order of magnitude.
C.R.Helmrich
post Jan 8 2011, 14:29
Post #59





Group: Developer
Posts: 694
Joined: 6-December 08
From: Erlangen Germany
Member No.: 64012



QUOTE (pbelkner @ Jan 8 2011, 00:21) *
Many thanks to C.R.Helmrich and you for the great comments!

Gern geschehen. Thank you for taking the implementation initiative!

I also agree with Jeff and googlebot and suggest doing it exactly as they proposed: compute a new block loudness measure every 9600 samples (at 48 kHz) and store all blocks with loudness > -70 LUFS in a linked list (or an array if you know the track/album length ahead of time, which you do in our scenario, I guess). Then you can apply the relative gate to this list.

Actually, I think to avoid calculating the logarithm and division by T every 200 ms you can simply store the block energies in your list, because the comparison

block loudness > -70 LUFS

is, assuming your block energy = left energy + right energy + center energy + 1.41* ..., equivalent to

block energy > 0.4 * sample rate * 10^((-70+0.691)/10),

with the right-hand term being a constant (0.00225113 for 48 kHz, 0.00206823 for 44.1 kHz). Then you can work analogously for the relative gating: simply sum up all the block energies in your 70-gated list, divide by the number of energies in the list to get the average 70-gated energy, and apply the relative gating threshold by

block energy > 0.1584893 * average 70-gated energy
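
The constants above are easy to reproduce. The following Python sketch assumes "block energy" means the weighted sum of squares over one 400 ms block without dividing by the block length; the function names are hypothetical and only serve the illustration.

```python
def absolute_gate_energy(rate, block_seconds=0.4, gate_lufs=-70.0):
    """Energy threshold equivalent to 'block loudness > gate_lufs'."""
    n = block_seconds * rate                      # samples per block
    return n * 10.0 ** ((gate_lufs + 0.691) / 10.0)


# Linear factor equivalent to "8 LU below the average 70-gated energy".
RELATIVE_GATE_FACTOR = 10.0 ** (-8.0 / 10.0)


def relative_gate(energies):
    """Keep the 70-gated block energies that also pass the relative gate."""
    if not energies:
        return []
    average = sum(energies) / len(energies)
    return [e for e in energies if e > RELATIVE_GATE_FACTOR * average]


print("%.8f" % absolute_gate_energy(48000))
print("%.8f" % absolute_gate_energy(44100))
print("%.7f" % RELATIVE_GATE_FACTOR)
```

No logarithm and no division by the block length is needed per block; both thresholds are computed once up front.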

Chris

This post has been edited by C.R.Helmrich: Jan 8 2011, 15:13


--------------------
If I don't reply to your reply, it means I agree with you.
Notat
post Jan 8 2011, 18:42
Post #60





Group: Members
Posts: 581
Joined: 17-August 09
Member No.: 72373



I have it on good authority that the calculation can be done in a single pass. This was a design requirement as R128 was designed to be workable for live broadcast applications. I will make inquiries and try and scare up the technical details. If anyone happens to be in Switzerland in February all will be revealed.
googlebot
post Jan 8 2011, 19:40
Post #61





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



A fully standard-compliant single-pass outline has been on the table since at least post #54 (bottom). For I-scale measurements, some state has to be accumulated, though, because the loudness of a programme's last block can in principle decide whether its first block gets gated or not. Hardware with limited memory will have to impose a limit on the maximum integrable time span (which can be huge at moderate cost if you look at Jeff's post). The S- and M-scales, on the other hand, are suited for measurements of infinite length.

I'm looking forward, however, to what you can dig up at the workshop and share here!

Great optimization by C.R.Helmrich, btw, this should save several orders of magnitude CPU time!

This post has been edited by googlebot: Jan 8 2011, 19:47
pbelkner
post Jan 9 2011, 18:47
Post #62





Group: Members
Posts: 412
Joined: 13-June 10
Member No.: 81467



v0.3 released

I've just uploaded the new version and it's available at
http://sourceforge.net/projects/r128gain/files/
What's new?
  • Implements the algorithm as discussed with C.R.Helmrich, googlebot, and jdoering (cf. "r128c.c", many thanks again, however the latest optimization as proposed by C.R.Helmrich is still missing). The results of the test cases are now in accordance with the specification:

    CODE
    $ r128gain ../sounds/ebu-loudness-test-setv01/
    args
    ../sounds/ebu-loudness-test-setv01
      analyzing ...
        1kHz Sine -20 LUFS-16bit.wav (1/16): -20.0 LUFS, -3.0 LU (peak: 0.100734: -10.0 dBFS)
        1kHz Sine -26 LUFS-16bit.wav (2/16): -26.0 LUFS, 3.0 LU (peak: 0.050508: -13.0 dBFS)
        1kHz Sine -40 LUFS-16bit.wav (3/16): -40.0 LUFS, 17.0 LU (peak: 0.010260: -19.9 dBFS)
        seq-3341-1-16bit.wav (4/16): -23.0 LUFS, -0.0 LU (peak: 0.071316: -11.5 dBFS)
        seq-3341-2-16bit.wav (5/16): -33.0 LUFS, 10.0 LU (peak: 0.023049: -16.4 dBFS)
        seq-3341-3-16bit.wav (6/16): -23.0 LUFS, -0.0 LU (peak: 0.071468: -11.5 dBFS)
        seq-3341-4-16bit.wav (7/16): -23.0 LUFS, 0.0 LU (peak: 0.070850: -11.5 dBFS)
        seq-3341-5-16bit.wav (8/16): -22.9 LUFS, -0.1 LU (peak: 0.100845: -10.0 dBFS)
        seq-3341-6-5channels-16bit.wav (9/16): -23.0 LUFS, 0.0 LU (peak: 0.063133: -12.0 dBFS)
        seq-3341-6-6channels-WAVEEX-16bit.wav (10/16): -23.7 LUFS, 0.7 LU (peak: 0.063133: -12.0 dBFS)
        seq-3341-7_seq-3342-5-24bit.wav (11/16): -23.0 LUFS, -0.0 LU (peak: 0.358341: -4.5 dBFS)
        seq-3341-8_seq-3342-6-24bit.wav (12/16): -23.0 LUFS, 0.0 LU (peak: 0.718299: -1.4 dBFS)
        seq-3342-1-16bit.wav (13/16): -22.6 LUFS, -0.4 LU (peak: 0.100089: -10.0 dBFS)
        seq-3342-2-16bit.wav (14/16): -16.8 LUFS, -6.2 LU (peak: 0.177974: -7.5 dBFS)
        seq-3342-3-16bit.wav (15/16): -20.0 LUFS, -3.0 LU (peak: 0.100089: -10.0 dBFS)
        seq-3342-4-16bit.wav (16/16): -20.0 LUFS, -3.0 LU (peak: 0.100075: -10.0 dBFS)
        ALBUM: -21.9 LUFS, -1.1 LU (peak: 0.718299: -1.4 dBFS)

  • The command line syntax has slightly changed in order to allow for (hopefully) proper wildcard expansion:

    CODE
    r128gain <input>? [-o <directory> [flac]]

    The new version accepts one or more input files or directories possibly containing wildcards. The optional output directory has to be separated from the list of inputs by the switch "-o".
googlebot
post Jan 10 2011, 09:58
Post #63





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



Works perfectly, great job! Even for multichannel and high resolution files.

I'm wondering why the EBU-provided test samples don't match their own descriptions in Tech 3341. That should be fixed.
pbelkner
post Jan 10 2011, 17:38
Post #64





Group: Members
Posts: 412
Joined: 13-June 10
Member No.: 81467



QUOTE (C.R.Helmrich @ Jan 8 2011, 15:29) *
Actually, I think to avoid calculating the logarithm and division by T every 200 ms you can simply store the block energies in your list, because the comparison

block loudness > -70 LUFS

is, assuming your block energy = left energy + right energy + center energy + 1.41* ..., equivalent to

block energy > 0.4 * sample rate * 10^((-70+0.691)/10),

with the right-hand term being a constant (0.00225113 for 48 kHz, 0.00206823 for 44.1 kHz). Then you can work analogously for the relative gating: simply sum up all the block energies in your 70-gated list, divide by the number of energies in the list to get the average 70-gated energy, and apply the relative gating threshold by

block energy > 0.1584893 * average 70-gated energy

Let me, please, summarize how I understand this:
  • The BS.1770 loudness measure is defined as
    -0.691 + 10*lg(wmsq),
    where
    wmsq = sum_i_j G_i*x_i_j*x_i_j/n,
    i running over all channels,
    G_i the weighting coefficient for the i-th channel,
    j running from 0 to n-1 over all sampling intervals,
    x_i_j the j-1 channel's voltage of the i-1 sample
    is the (per channel) weighted mean square of the interval under consideration.
  • Let wmsq_i be the weighted mean square of the i-th block of an EBU R128 overlapping segmentation.
  • Phase 1 of the EBU R128 algorithm chooses block i of the EBU R128 segmentation if
    -0.691 + 10*lg(wmsq_i) > -70
    <==> 10*lg(wmsq_i) > 0.691-70
    <==> lg(wmsq_i) > (0.691-70)/10
    <==> wmsq_i > 10^((0.691-70)/10)

    The threshold for choosing block i with wmsq_i is 10^((0.691-70)/10).
  • Let wmsq_p1 be the weighted mean square of all blocks chosen in phase 1.
  • Phase 2 of the EBU R128 algorithm chooses block i of the EBU R128 segmentation if
    -0.691 + 10*lg(wmsq_i) > -0.691 + 10*lg(wmsq_p1) - 8
    <==> 10*lg(wmsq_i) > 10*lg(wmsq_p1) - 8
    <==> lg(wmsq_i) > lg(wmsq_p1) - 0.8
    <==> lg(wmsq_i) - lg(wmsq_p1) > -0.8
    <==> lg(wmsq_i/wmsq_p1) > -0.8
    <==> wmsq_i/wmsq_p1 > 10^(-0.8)
    <==> wmsq_i > wmsq_p1*10^(-0.8)

    The threshold for choosing block i with wmsq_i is wmsq_p1*10^(-0.8).
  • Let wmsq_p2 be the weighted mean square of all blocks chosen in phase 2.
  • Then
    -0.691 + 10*lg(wmsq_p2)
    is the EBU R128 loudness measure.
It turns out that it is possible to avoid any logarithm during intermediate calculations. Intermediate results, i.e. weighted mean squares, are obtained simply by add and multiply operations. Only the one-time calculation of the two thresholds for phase 1 and phase 2 needs exponentiation.

Pass 1 of the EBU R128 algorithm only has to cache the weighted mean squares wmsq_i of the EBU R128 segmentation. From that all the rest can easily be derived.
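
As a quick numerical sanity check of the two derivations, the following Python sketch compares the log-domain decisions with their linear-domain counterparts over a sweep of weighted mean squares. All names are hypothetical, and the sweep is chosen so that no value sits right on a threshold, where floating-point rounding could make the two forms disagree.

```python
import math

ABS_THRESHOLD = 10.0 ** ((0.691 - 70.0) / 10.0)   # phase 1 threshold, computed once
REL_FACTOR = 10.0 ** (-0.8)                        # phase 2 factor, computed once


def lufs(wmsq):
    """BS.1770 loudness of a weighted mean square."""
    return -0.691 + 10.0 * math.log10(wmsq)


def phase1_log(w):
    return lufs(w) > -70.0


def phase1_linear(w):
    return w > ABS_THRESHOLD


def phase2_log(w, wmsq_p1):
    return lufs(w) > lufs(wmsq_p1) - 8.0


def phase2_linear(w, wmsq_p1):
    return w > wmsq_p1 * REL_FACTOR


# Weighted mean squares from 1e-12 up to 1, seven steps per decade.
values = [10.0 ** (k / 7.0) for k in range(-84, 1)]

mismatches = [w for w in values if phase1_log(w) != phase1_linear(w)]
mismatches += [
    (w, p1) for w in values for p1 in values
    if phase2_log(w, p1) != phase2_linear(w, p1)
]
print("mismatches:", len(mismatches))
```

The linear forms make the same decisions as the log forms on this sweep, so the per-block work reduces to a comparison against a precomputed number.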

This post has been edited by pbelkner: Jan 10 2011, 17:44
pbelkner
post Jan 10 2011, 18:01
Post #65





Group: Members
Posts: 412
Joined: 13-June 10
Member No.: 81467



QUOTE (pbelkner @ Jan 10 2011, 18:38) *
The BS.1770 loudness measure is defined as
-0.691 + 10*lg(wmsq),
where
wmsq = sum_i_j G_i*x_i_j*x_i_j/n,
i running over all channels,
G_i the weighting coefficient for the i-th channel,
j running from 0 to n-1 over all sampling intervals,
x_i_j the j-1 channel's voltage of the i-1 sample
is the (per channel) weighted mean square of the interval under consideration.

should read

QUOTE
The BS.1770 loudness measure is defined as
-0.691 + 10*lg(wmsq),
where
wmsq = sum_i_j G_i*x_i_j*x_i_j/n,
i running over all channels,
G_i the weighting coefficient for the i-th channel,
j running from 0 to n-1 over all sampling intervals,
x_i_j the j-th channel's voltage of the i-th sample
is the (per channel) weighted mean square of the interval under consideration.
C.R.Helmrich
post Jan 10 2011, 18:20
Post #66





Group: Developer
Posts: 694
Joined: 6-December 08
From: Erlangen Germany
Member No.: 64012



Exactly, and if you pull out the "/n" in wmsq = sum_i_j G_i*x_i_j*x_i_j/n, which you can do since n is the same in all blocks and channels, you get what I wrote because n = 0.4 * sample rate and

wmsq = block energy / n


and save many divisions. Of course you still need the division when computing the final R128 loudness measure.

I think you mean "x_i_j the i-th channel's voltage of the j-th sample" though, right?

Chris



--------------------
If I don't reply to your reply, it means I agree with you.
googlebot
post Jan 10 2011, 18:35
Post #67





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



Just out of curiosity, where does that 0.4 come from?
jdoering
post Jan 10 2011, 19:27
Post #68





Group: Members
Posts: 11
Joined: 6-January 11
Member No.: 87101



QUOTE (googlebot @ Jan 10 2011, 09:35) *
Just out of curiosity, where does that 0.4 come from?


I assume that's due to the 400 ms block size = .4 seconds.

-Jeff
googlebot
post Jan 10 2011, 21:40
Post #69





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



Duh. My bad!
Fandango
post Jan 11 2011, 23:14
Post #70





Group: Members
Posts: 1549
Joined: 13-August 03
Member No.: 8353



I have a proposal.

New standard tag fields:
EBU_R128_REFERENCE_LOUDNESS
EBU_R128_TRACK_GAIN
EBU_R128_TRACK_PEAK
EBU_R128_ALBUM_GAIN
EBU_R128_ALBUM_PEAK
or R128GAIN_*, EBUR128_*, ...

Replay Gain tag fields should become optional, only activated by a command line option. So no loss there if people want to test your implementation without having an EBU R128 DSP plugin.

Without independent tag fields the authors of such plugins cannot start supporting EBU R128 gain control in their Replay Gain plugins.
Fandango
post Jan 12 2011, 02:02
Post #71





Group: Members
Posts: 1549
Joined: 13-August 03
Member No.: 8353



PS: I'd say that using GAIN in your prefix, as I had suggested, is a bad idea (unfortunately I can't edit the post anymore). The GAIN in REPLAYGAIN_* is part of the proper name of that loudness measurement system. Choose wisely: AFAIK you're the first with such an implementation that uses tag field names, or even the first with a PC implementation of EBU R128. The tag fields will probably become the standard (in the PC sound community).

EBUR128_* would be consistent with Replay Gain's omission of the whitespace between the two words, but it is confusing: people might think the standard's name is Ebur 128 or EBUR 128. Hence I would vote for EBU_R128_*.

This post has been edited by Fandango: Jan 12 2011, 02:07
Notat
post Jan 12 2011, 04:11
Post #72





Group: Members
Posts: 581
Joined: 17-August 09
Member No.: 72373



QUOTE (Fandango @ Jan 11 2011, 15:14) *
Without independent tag fields the authors of such plugins cannot start supporting EBU R128 gain control in their Replay Gain plugins.

Why not? There is a simple and reasonably accurate mapping between R128 and Replay Gain metrics.

This post has been edited by Notat: Jan 12 2011, 04:14
jdoering
post Jan 12 2011, 06:40
Post #73





Group: Members
Posts: 11
Joined: 6-January 11
Member No.: 87101



I had been hoping that the written tags were being converted into REPLAYGAIN-compatible units (although I wondered). How are the FLACs being tested: with a modified playback program as well? In that case, is the correction algorithm applied at playback the same, just with different units / base?

New tags seem very unfortunate (given hardware device support, etc.). New tags for the peak data wouldn't mean anything more than sample peak (ReplayGain) versus true signal peak (EBU R128), right? Would a playback program care about the distinction? (It would seem unlikely unless a fancy client had some way of estimating the worst-case error in sample peak based on sampling frequency, etc. ... sounds far-fetched.) In terms of the gain, I had been assuming that it was just a matter of converting units / reference levels. I guess the paper probably answers that. It sounds interesting; too bad it's $20.

Also note that storing REFERENCE_LOUDNESS for ReplayGain is not a standard and probably doesn't make any more sense here than it does for ReplayGain (current non-standard metaflac behavior notwithstanding).

-Jeff
jdoering
post Jan 12 2011, 10:18
Post #74





Group: Members
Posts: 11
Joined: 6-January 11
Member No.: 87101



Just compared r128gain output versus ReplayGain for ref_pink.wav. ReplayGain defines ref_pink.wav as +6.00 dB. This was originally 0 when compared to 83 dB SPL but shifted up when 6 dB was added to make typical music "loud enough" on non-calibrated systems.

CODE
C:\development\replaygain>r128gain.exe ref_pink.wav
args
  analyzing ...
    ref_pink.wav (1/1): -23.4 LUFS, 0.4 LU (peak: 0.292569: -5.3 dBFS)
    ALBUM: -23.4 LUFS, 0.4 LU (peak: 0.292569: -5.3 dBFS)


Since the whole point is to come up with a scaling ratio, and relative LU are scaled in dB, it looks to me like this algorithm will generate ReplayGain-compatible values simply by adding 5.6 to the reported LU to compensate for the different base loudness of ref_pink.wav (-0.4 difference due to fundamental differences in the algorithms and +6 difference due to the ReplayGain reference point shift). However, this whole comparison is based on my assumption that the goal is to use the new algorithm for computing the loudness and adjustment while calibrating to the original reference sound. I don't know if this is a valid comparison or if, for example, the new algorithm would specifically not be expected to behave ideally on ref_pink.wav.
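
Spelled out as code, the proposed conversion would look roughly like this. It is only a sketch of the idea: the 5.6 offset is the single-file calibration against ref_pink.wav described above, not an established mapping, and the function names are made up.

```python
R128_TARGET_LUFS = -23.0   # reference level the reported LU value is relative to
RG_OFFSET_DB = 5.6         # assumption: offset calibrated on ref_pink.wav only


def r128_gain_lu(loudness_lufs):
    """Gain in LU that brings a track to -23 LUFS (the LU figure r128gain prints)."""
    return R128_TARGET_LUFS - loudness_lufs


def replaygain_estimate_db(loudness_lufs):
    """Rough ReplayGain-style gain derived from an R128 loudness value."""
    return r128_gain_lu(loudness_lufs) + RG_OFFSET_DB


print("%.1f LU" % r128_gain_lu(-23.4))             # 0.4 LU
print("%.1f dB" % replaygain_estimate_db(-23.4))   # 6.0 dB
```

For ref_pink.wav at -23.4 LUFS this gives 0.4 LU and, after the offset, the +6.00 dB that ReplayGain defines for that file.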

Anyway, for the few real music files I compared the results were similar enough to the ReplayGain calculated values that this seems plausible but different enough that I don't know if this is a valid conversion method or not.

-Jeff
pbelkner
post Jan 12 2011, 10:21
Post #75





Group: Members
Posts: 412
Joined: 13-June 10
Member No.: 81467



QUOTE (jdoering @ Jan 12 2011, 07:40) *
I had been hoping that the written tags were being converted into REPLAYGAIN-compatible units (although I wondered). How are the FLACs being tested: with a modified playback program as well? In that case, is the correction algorithm applied at playback the same, just with different units / base?

EBU R128 and ReplayGain are two different approaches to reach the same goal: uniform loudness at replay time.

Common to both approaches is to define an algorithm in order to determine at scan time
  • an absolute loudness, and
  • a relative loudness (gain) in order to adjust the loudness to a standardized absolute loudness at replay time.
Even if this is common to both approaches, there are huge differences:
  • The two algorithms are completely different. If you compare the relative loudness between different tracks achieved with ReplayGain and EBU R128 you will observe huge differences.
  • Hence it does not make much sense to have one part of your audio collection processed using ReplayGain and the other part using EBU R128. It's probably best to decide beforehand which approach to use, based on personal preferences (tests).
Metadata is not part of EBU R128. What R128GAIN does is write the same tags as METAFLAC. Any playback software providing ReplayGain and honoring the METAFLAC tags should work with FLACs tagged by R128GAIN, e.g. Winamp. The loudness level will then be -23 LUFS as required by EBU R128 (a completely different measure than RG's 83 dB).

Tests were performed using Winamp in conjunction with my own SoX and FFmpeg based input plugin. Native WA should do as well.

QUOTE (jdoering @ Jan 12 2011, 07:40) *
New tags seem very unfortunate (given hardware device support, etc.). New tags for the peak data wouldn't mean anything more than sample peak (ReplayGain) versus true signal peak (EBU R128), right? Would a playback program care about the distinction? (It would seem unlikely unless a fancy client had some way of estimating the worst-case error in sample peak based on sampling frequency, etc. ... sounds far-fetched.) In terms of the gain, I had been assuming that it was just a matter of converting units / reference levels. I guess the paper probably answers that. It sounds interesting; too bad it's $20.

Playback software (e.g. Winamp) makes use of the peak values (e.g. for providing a clipping prevention mode). Whether it is amplitude peak or true peak becomes interesting if there is some up-sampling in the playback chain, and there probably is, because every contemporary DAC does it. Hence you should always store true peaks.
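
A textbook case shows how far the two peak values can drift apart. The Python sketch below samples a full-scale sine at a quarter of the sample rate with a 45 degree phase shift, so every stored sample misses the crest. Evaluating the same sine on a 16 times finer grid stands in for the interpolation a real true-peak meter would perform; this is an illustration only and not how R128GAIN measures peaks, and the function names are made up.

```python
import math


def tone(count, samples_per_cycle, phase):
    """Sample a unit-amplitude sine `count` times at the given density."""
    return [
        math.sin(2.0 * math.pi * n / samples_per_cycle + phase)
        for n in range(count)
    ]


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(x) for x in samples)


# Four samples per cycle, shifted by 45 degrees: the crest falls between samples.
stored = tone(64, 4, math.pi / 4)
# The same waveform on a 16 times finer grid, standing in for up-sampling.
dense = tone(64 * 16, 4 * 16, math.pi / 4)

print("sample peak: %.4f" % peak(stored))
print("dense peak:  %.4f" % peak(dense))
```

The stored samples never exceed about 0.7071 although the waveform reaches 1.0, i.e. the sample peak underestimates the true peak by about 3 dB in this case.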

QUOTE (jdoering @ Jan 12 2011, 07:40) *
Also note that storing REFERENCE_LOUDNESS for ReplayGain is not a standard and probably doesn't make any more sense here than it does for ReplayGain (current non-standard metaflac behavior notwithstanding).

Maybe it could become part of the RG standard:
  • It would help to resolve the 83 dB vs. 89 dB debate.
  • It would help to integrate playback of ReplayGain tagged and EBU R128 tagged tracks depending on the unit (dB vs. LU), provided someone figures out the mean relative loudness between the two approaches.