"MP3Gain: How can it be possible?", It's indicated that the gain adjustments are lossless
Typhoon859
post Aug 1 2012, 00:35
Post #1





Group: Members
Posts: 19
Joined: 19-December 10
Member No.: 86635



So I've been thinking of trying to write a similar program from scratch, and there's one main thing where I don't even understand how it's possible, let alone how it's done. As is said, the process MP3Gain uses is lossless. Thinking about it, the only way MP3Gain could work where any player would play back the songs with whatever the target volume was would be if the change is present in the waveform. After MP3Gain is applied, if it weren't obvious from the beginning, in any audio editing software, the gain reduction is clearly visible. I could somewhat understand how the process can be reversed with added value, even if the waveform clips, as the information could still somehow be stored (more easily than the other way around). On the other hand, when taken away, don't you permanently lose the dB that you took from the threshold? As an example, if a song starts with some 6 decibel ambient noise and you reduce the song by 6dB, wouldn't that intro just completely disappear? And if the change is undone, wouldn't you not get any of the data back (unless it's stored) and just make the existing data 6dB louder? If that's the case, it isn't really undoing the changes; it's really just adding the difference in value back between the indicated ReplayGain value and what it is now.

Sorry this was kinda long-winded but the last thing though I'd also like to ask about is clipping. If a track's peak values are clipping by default, reducing the loudness now would be too late, wouldn't it? Wouldn't it be clipping no matter what at this point, contrary to what is indicated? The peaks would be chopped off either way since the structure of the waveform is no longer saved after being finalized. And also, the maximized volume indications don't make sense (has to be turned on in the options). For example, I have a file which ReplayGain indicates peaks at about 1.05 (16-bit = 100.8dB) and yet it's marked that only a 1.5dB reduction would be necessary to get it maximized (the loudest point before clipping - 96dB). Is there something I'm missing?

Thanks guys! Answers to these would be extremely helpful.


PS- A lot of the things here indicate to me that the values, whether over or under, remain as part of the data in the container but just doesn't play back, or rather, clips since it's within the 16-bit parameter.
greynol
post Aug 1 2012, 00:50
Post #2





Group: Super Moderator
Posts: 10338
Joined: 1-April 04
From: San Francisco
Member No.: 13167



You appear to assume that mp3 data is 16-bit integer in the time domain. This would be completely incorrect as mp3 data is 32-bit float in the frequency domain.

I recommend a little more research, beginning with the term global gain.


--------------------
Your eyes cannot hear.
saratoga
post Aug 1 2012, 02:25
Post #3





Group: Members
Posts: 5152
Joined: 2-September 02
Member No.: 3264



QUOTE (Typhoon859 @ Jul 31 2012, 19:35) *
If a track's peak values are clipping by default, reducing the loudness now would be too late, wouldn't it? Wouldn't it be clipping no matter what at this point, contrary to what is indicated? The peaks would be chopped off either way since the structure of the waveform is no longer saved after being finalized.


Replaygain prevents clipping that occurs during decoding, not clipping that was already present in the time domain samples.


QUOTE (Typhoon859 @ Jul 31 2012, 19:35) *
For example, I have a file which ReplayGain indicates peaks at about 1.05 (16-bit = 100.8dB)

and yet it's marked that only a 1.5dB reduction would be necessary to get it maximized (the loudest point before clipping - 96dB). Is there something I'm missing?


That's not how dB works. Assuming you meant to say that 1.0 is 96 dB (which isn't really right), 1.05 would be 96.2 dB. That said, you get a -1.5dB reduction because mp3gain only has 1.5dB resolution given how MP3 works. So it's reducing the volume by the smallest possible increment to get it below 1.0 peak.
greynol
post Aug 1 2012, 03:30
Post #4





Group: Super Moderator
Posts: 10338
Joined: 1-April 04
From: San Francisco
Member No.: 13167



Since we're not dealing with power, a ~0.2dB increase would take 1.0 to 1.02. It takes a ~0.4dB increase to take 1.0 to 1.05.
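
These figures follow from the amplitude form of the decibel, 20·log10(ratio). Here is a minimal Python sketch of that arithmetic, assuming the ~1.505 dB step size mentioned elsewhere in this thread corresponds to a factor of 2^(1/4) in amplitude. The function names are illustrative and are not taken from MP3Gain.

```python
import math

# One global-gain step, assumed to scale amplitude by 2**(1/4) (about 1.505 dB).
STEP_DB = 20.0 * math.log10(2.0 ** 0.25)

def amplitude_ratio_to_db(ratio):
    """Convert a linear amplitude ratio to decibels (20*log10, not 10*log10)."""
    return 20.0 * math.log10(ratio)

def db_to_amplitude_ratio(db):
    """Convert a decibel change back to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)

def steps_to_avoid_clipping(peak):
    """Smallest number of whole steps of reduction that brings peak to <= 1.0."""
    if peak <= 1.0:
        return 0
    return math.ceil(amplitude_ratio_to_db(peak) / STEP_DB)

print(round(amplitude_ratio_to_db(1.05), 2))   # 0.42
print(round(db_to_amplitude_ratio(0.2), 3))    # 1.023
print(steps_to_avoid_clipping(1.05))           # 1
```

A peak of 1.05 is only about 0.42 dB over full scale, so a single step of roughly 1.5 dB is already enough to bring it under 1.0.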

This post has been edited by greynol: Aug 1 2012, 03:34


--------------------
Your eyes cannot hear.
Typhoon859
post Aug 1 2012, 13:24
Post #5





Group: Members
Posts: 19
Joined: 19-December 10
Member No.: 86635



Right, so, there evidently seems to be a lot I don't understand, so I was hoping I could be filled in. I know enough to understand what would be explained.

Regarding my knowledge of 16-bit, am I now understanding correctly that this is a decoding limitation and not at all a limit of 96dB dynamic range within the file itself? In other words, if I have a mix where I increase everything past 96dB, up to 140 let's say, and then I save it to the parameters of a 16-bit MP3, the information would all still be there and the clipping would take place during the decoding process?

I understood that the indicated adjustment was to get the peak below 96dB, I guess I'm just not reading this correctly. Why is it incorrect to state that 1.0 represents 96dB for a 16-bit file? Furthermore, assuming that is the case, 96 x 1.05 is how I got the 100.8dB value. If that's not how it's calculated then how and why? XD

Just to also make known, I can only guess what you're talking about when you mention time domain and frequency domain although I'm fairly certain I'd understand a brief description of what that is referencing. I know many essential things when it comes to digital waveforms but not how it relates to digital parameters and related formats.
2Bdecided
post Aug 1 2012, 13:59
Post #6


ReplayGain developer


Group: Developer
Posts: 5362
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (greynol @ Aug 1 2012, 00:50) *
I recommend a little more research, beginning with the term global gain.
...with "mp3" in front...
http://lmgtfy.com/?q=mp3+global+gain

Cheers,
David.
pdq
post Aug 1 2012, 14:16
Post #7





Group: Members
Posts: 3450
Joined: 1-September 05
From: SE Pennsylvania
Member No.: 24233



I seem to recall that the dynamic range of the mp3 format is in excess of 200 dB. While this is not technically the same as float 32, it is way more than needed in any real world situation.

Edit: The convention is to equate 1.0 with zero dB and anything smaller as negative dB. That way it makes no difference if you are talking about 16 bit integer, 24 bit integer, 32 bit float etc. Full scale for integers is 1.0 = 0 dB, while for float 1.0 is still 0 dB, it's just not full scale.

Edit2: dB is a logarithmic scale. Multiplying or dividing the amplitude by a factor means adding or subtracting the properly scaled logarithm of the factor to the dB.
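
The two edits above can be illustrated with a short Python sketch. It assumes samples are normalized by dividing by 2^(bits-1), which is one common convention; `normalize` and `dbfs` are hypothetical helper names.

```python
import math

def normalize(sample, bits):
    """Map a signed integer sample of the given bit depth to the -1.0..1.0 range."""
    return sample / float(2 ** (bits - 1))

def dbfs(amplitude):
    """Level of a normalized amplitude relative to 1.0 (0 dB)."""
    return 20.0 * math.log10(abs(amplitude))

# Half of full scale is the same level whatever the integer bit depth.
half_16 = normalize(16384, 16)      # 0.5
half_24 = normalize(4194304, 24)    # 0.5
print(round(dbfs(half_16), 2), round(dbfs(half_24), 2))   # -6.02 -6.02

# Multiplying the amplitude by a factor adds 20*log10(factor) to the dB value.
a, factor = 0.25, 2.0
print(math.isclose(dbfs(a * factor), dbfs(a) + dbfs(factor)))   # True
```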


This post has been edited by pdq: Aug 1 2012, 14:23
db1989
post Aug 1 2012, 14:24
Post #8





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



QUOTE (Typhoon859 @ Aug 1 2012, 13:24) *
In other words, if I have a mix where I increase everything past 96dB, up to 140 let's say, and then I save it to the parameters of a 16-bit MP3
There is no such thing as a 16-bit MP3, as you have already been told.

QUOTE
Why is it incorrect to state that 1.0 represents 96dB for a 16-bit file?
Because 1.0 is an instantaneous point, specifically the maximum, on a waveform with possible amplitudes between -1 and 1, on a linear scale; whereas a decibel is a measure of loudness based on the aggregation of many samples over a period of time, measured logarithmically. A sample at +1 alone does not equal either 96 dB, 0 dB FS, or any other measure of decibels.

Assuming that your reference to 96 dB means dynamic range – rather than, for example, dB SPL, which is not relevant – to have/demonstrate this, the file would need to contain at least two waves: one oscillating between -1 and 1, and one between (-1/32768) and (1/32767).

QUOTE
Furthermore, assuming that is the case, 96 x 1.05 is how I got the 100.8dB value. If that's not how it's calculated then how and why? XD
Again, linear vs. logarithmic. Without intending offence, considering this and the fact that you don’t know what the time and frequency domains are, it’s probably time to do some background reading before continuing with this thread.
saratoga
post Aug 1 2012, 15:01
Post #9





Group: Members
Posts: 5152
Joined: 2-September 02
Member No.: 3264



QUOTE (greynol @ Jul 31 2012, 22:30) *
Since we're not dealing with power, a ~0.2dB increase would take 1.0 to 1.02. It takes a ~0.4dB increase to take 1.0 to 1.05.


Oops, I always mix up 10 vs 20 log.
saratoga
post Aug 1 2012, 16:06
Post #10





Group: Members
Posts: 5152
Joined: 2-September 02
Member No.: 3264



QUOTE (Typhoon859 @ Aug 1 2012, 08:24) *
Regarding my knowledge of 16-bit, am I now understanding correctly that this is a decoding limitation and not at all a limit of 96dB dynamic range within the file itself? In other words, if I have a mix where I increase everything past 96dB, up to 140 let's say, and then I save it to the parameters of a 16-bit MP3, the information would all still be there and the clipping would take place during the decoding process?


MP3s do not have an associated number of bits, or even any specific precision at all. Number of bits is a property of PCM, which MP3 is definitely not.

QUOTE (Typhoon859 @ Aug 1 2012, 08:24) *
I understood that the indicated adjustment was to get the peak below 96dB, I guess I'm just not reading this correctly. Why is it incorrect to state that 1.0 represents 96dB for a 16-bit file? Furthermore, assuming that is the case, 96 x 1.05 is how I got the 100.8dB value. If that's not how it's calculated then how and why? XD


http://en.wikipedia.org/wiki/Decibel

QUOTE (Typhoon859 @ Aug 1 2012, 08:24) *
Just to also make known, I can only guess what you're talking about when you mention time domain and frequency domain although I'm fairly certain I'd understand a brief description of what that is referencing. I know many essential things when it comes to digital waveforms but not how it relates to digital parameters and related formats.


This isn't something you can learn from an internet forum. You'll need a textbook.
mjb2006
post Aug 1 2012, 19:31
Post #11





Group: Members
Posts: 860
Joined: 12-May 06
From: Colorado, USA
Member No.: 30694



QUOTE (Typhoon859 @ Jul 31 2012, 17:35) *
if a song starts with some 6 decibel ambient noise and you reduce the song by 6dB, wouldn't that intro just completely disappear?

If you're worried that you've lost something on the quiet end by reducing the global gain throughout the file, your decoder could output 24-bit instead of 16-bit. I'm doubtful this would matter in most recordings.

QUOTE (Typhoon859 @ Jul 31 2012, 17:35) *
And if the change is undone, wouldn't you not get any of the data back (unless it's stored) ... A lot of the things here indicate to me that the values, whether over or under, remain as part of the data in the container but just doesn't play back, or rather, clips since it's within the 16-bit parameter.

That's basically it. Each granule in each frame in the MP3 (2 granules per frame) contains frequency & amplitude info for generating brief sine waves. These waves are summed by the decoder to make an equally brief but much more complex composite waveform. Space is saved at encoding time by (among other things) eliminating waves that don't make an audible difference, and by storing the parameters with less-than-perfect precision, and by using standard lossless compression techniques internally. At decoding time, the global gain field of each frame is used to scale each granule's composite wave. So if you modify the global gain fields, only the amplitude changes; the "shape" (frequency content) of the output wave is unaffected, hence the gain adjustment is "lossless".
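
As a rough illustration of why this is reversible, here is a Python sketch that models only the global gain fields as a list of 8-bit integers. The values are hypothetical, no real MP3 parsing is done, and the assumption that one step scales amplitude by 2^(1/4) (about 1.505 dB) is stated here rather than taken from MP3Gain's source.

```python
def apply_gain_steps(global_gains, steps):
    """Return a new list with every 8-bit global gain field shifted by `steps`."""
    adjusted = [g + steps for g in global_gains]
    if any(g < 0 or g > 255 for g in adjusted):
        # This sketch simply refuses adjustments that leave the 8-bit range.
        raise ValueError("adjustment would leave the 0..255 range of the field")
    return adjusted

def amplitude_scale(steps):
    """Linear amplitude factor for a change of `steps` (assumed 2**(1/4) per step)."""
    return 2.0 ** (steps / 4.0)

original = [150, 152, 149, 151]              # hypothetical per-granule values
quieter = apply_gain_steps(original, -4)     # four steps down, about -6.02 dB
restored = apply_gain_steps(quieter, +4)     # undo by adding the same steps back

print(quieter)                # [146, 148, 145, 147]
print(restored == original)   # True
print(amplitude_scale(-4))    # 0.5
```

Only small integers are changed and the encoded frequency data is never touched, so adding the same number of steps back restores the original fields exactly.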

QUOTE (Typhoon859 @ Jul 31 2012, 17:35) *
If a track's peak values are clipping by default, reducing the loudness now would be too late, wouldn't it?

The MP3's combined waveform data consists of samples which use 32-bit float amplitudes, essentially perfect precision for audio purposes. However, typical audio playback APIs expect LPCM samples which use 16- or 24-bit integer amplitudes, so MP3 decoders convert the 32-bit float to 16-bit signed integer, normally. By definition, ±1.0 in the 32-bit float is the maximum range of the integers you're converting to. If 16-bit, that means -1.0 is -32768 and +1.0 is +32767. As pointed out, technically you can't assign a decibel value to a single point, but that's irrelevant for purposes of detecting clipping; if the float exceeds ±1.0, there's no choice but to clip when converting to LPCM.

Hopefully with this explanation you can see how reducing the global gain brings the float32 amplitudes under ±1.0, which in turn prevents clipping, but is essentially "lossless" in the sense that it doesn't change the shape of the waveform, just its amplitude (volume).
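
A small Python sketch of the conversion step described above. The decoded sample values are made up, and the scaling by 32768 followed by clamping is one common convention that reproduces the -32768/+32767 mapping mentioned; real decoders may round or dither differently.

```python
def float_to_int16(x):
    """Scale a float sample to 16-bit and clamp it into the representable range."""
    scaled = int(round(x * 32768.0))
    return max(-32768, min(32767, scaled))

def count_over_full_scale(samples):
    """Number of float samples that cannot be converted without clipping."""
    return sum(1 for s in samples if abs(s) > 1.0)

decoded = [0.2, 1.05, -0.7, -1.02]     # hypothetical decoder output, float
one_step_down = 2.0 ** (-1.0 / 4.0)    # assumed effect of lowering global gain by 1
reduced = [s * one_step_down for s in decoded]

print(count_over_full_scale(decoded))   # 2
print(count_over_full_scale(reduced))   # 0
print(float_to_int16(1.05))             # 32767 (clipped)
print(float_to_int16(-1.0))             # -32768
```

The samples above 1.0 exist intact in the float output and are only flattened by the clamp, which is why lowering the gain before the conversion avoids the clipping.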
Typhoon859
post Aug 2 2012, 06:40
Post #12





Group: Members
Posts: 19
Joined: 19-December 10
Member No.: 86635



First of all, I'd just like to say that many of you are underestimating my knowledge and how easy it would be to explain certain things to me, just with a limited amount of jargon. Just because you may have learned something first doesn't mean that it's the order in which someone else may have learned it. In other words, just because a certain understanding of facts preceded something else for you doesn't mean that's the only logical way it could make sense. That shouldn't be a factor in the first place in my opinion; this is a forum. What's especially annoying is that many of the remarks have a negative connotation but whatever, I'll ignore that as it is I after all who's seeking help. I can put up with it as long as it isn't constantly reiterated. You're either willing to educate on the subject matter or you're not.


QUOTE (2Bdecided @ Aug 1 2012, 08:59) *
...with "mp3" in front...
http://lmgtfy.com/?q=mp3+global+gain

The first thing I did was search this and I actually found a useful link in this very forum: http://www.hydrogenaudio.org/forums/index....ic=24527&hl. It's largely stuff I knew for a short while at some point. I can't believe I completely forgot about it all. That still only answers for a bit of my confusion. Anything there though could've easily been explained with about three sentences, but moving on...


QUOTE (pdq @ Aug 1 2012, 09:16) *
I seem to recall that the dynamic range of the mp3 format is in excess of 200 dB. While this is not technically the same as float 32, it is way more than needed in any real world situation.

Edit: The convention is to equate 1.0 with zero dB and anything smaller as negative dB. That way it makes no difference if you are talking about 16 bit integer, 24 bit integer, 32 bit float etc. Full scale for integers is 1.0 = 0 dB, while for float 1.0 is still 0 dB, it's just not full scale.

Edit2: dB is a logarithmic scale. Multiplying or dividing the amplitude by a factor means adding or subtracting the properly scaled logarithm of the factor to the dB.

Certainly, I mean, the threshold of pain starts as low as around 120dB SPL. The idea should really be so that recordings, when equivocally reproduced at 0dB, sound as loud as the instruments/effects in them would be in real life, or rather, as loud as intended. So anything above 140dB FS shouldn't really even exist and above 130 would really just be unnecessary (though the option for it should exist), so since what you say is actually the case, then yes, MP3 is capable of reproducing more than enough dynamic range. Just for clarification, the global gain field is limited in adjustment to 8-bit integers and each increment relates to ~1.505dB. Wouldn't that mean that it's capable of reproducing a dynamic range of ~383.775dB? Or, is it limited to a 32 bit depth which would be 192dB FS? Lol, I have a feeling that neither is right but I think I'm getting closer. XD

When I saw that the values were represented as 0.xxxxxx-1.xxxxxx, I pretty much instantly realized that the point was to compensate for differences in potential bit depth values. I incorrectly assumed (due to lack of conceptual practice) that it relates to the maximum values of each bit depth so in the case of 16-bit, I was thinking 1.0 = 96dB. As it logically makes sense, it actually relates to 0dB FS in relation to a bit depth value. I don't quite understand "float" beyond possibly an incorrect educated guess that it goes well beyond 192dB if not limitless? <Shrug> That wouldn't make sense though. Doesn't there need to be a cap?

I don't get what 1.05 represents though if it's not relative to 96dB or 144dB or whatever it may be. How is it determined?


QUOTE (db1989 @ Aug 1 2012, 09:24) *
There is no such thing as a 16-bit MP3, as you have already been told.

I understood that the first time. That's why I said within the parameters of 16-bit. An MP3 can be encoded with a 24-bit parameter as well so TO WHATEVER THIS APPLIES TO (as I understand it's not a direct application on the file), when I'm referencing 16-bit, I was already corrected that it's a decoding limitation so that is to what I refer. You can clarify that further for me, as I was asking, or you can patronize me ON A FORUM where people come to inquire for help due to lack of knowledge on a subject. Whichever makes you feel better I guess...

Now, can we keep it cool from here? If you don't feel like answering any of my inquiries because you feel that my knowledge level is way below what is worth your time (however that makes sense), then simply don't respond.

QUOTE
Because 1.0 is an instantaneous point, specifically the maximum, on a waveform with possible amplitudes between -1 and 1, on a linear scale; whereas a decibel is a measure of loudness based on the aggregation of many samples over a period of time, measured logarithmically. A sample at +1 alone does not equal either 96 dB, 0 dB FS, or any other measure of decibels.

Assuming that your reference to 96 dB means dynamic range – rather than, for example, dB SPL, which is not relevant – to have/demonstrate this, the file would need to contain at least two waves: one oscillating between -1 and 1, and one between (-1/32768) and (1/32767).

I know the amplitude of a waveform ranges from between 1 and -1. In relation to a bit depth value, 1 does equal to 0dB FS though. I thought that's what it was in relation to, no? If not in relation to anything, then how does it determine if there is clipping at all? EDIT: MJB2006 mentioned the values which it relates to.

I of course didn't mean 96dB SPL as that's obviously not relevant; not sure why it was necessary to even go there besides in a false attempt to put yourself on higher grounds. I however admit (no reason why I shouldn't or should not have admitted anything else) that I don't understand why the second wave is necessary and how or why this would be true. How could I when things are just randomly thrown out there without explanation? Should I go look that up as well - anything I don't know which is mentioned? That's the premise under which I came here in the first place.

QUOTE
Again, linear vs. logarithmic. Without intending offence, considering this and the fact that you don’t know what the time and frequency domains are, it’s probably time to do some background reading before continuing with this thread.

Linear vs. logarithmic - got it. I just restated my thoughts and was asking what would be the correct way to go about understanding this. You get much more from a simple explanation from interaction and experience than you do from an irrelevant three chapters in a book to get to (if even) one relevant statement, all of which will be forgotten without actual application and/or practice.


QUOTE (saratoga @ Aug 1 2012, 11:06) *
MP3s do not have an associated number of bits, or even any specific precision at all. Number of bits is a property of PCM, which MP3 is definitely not.

Ok, but MP3s can be told to limit the decoding of it to 16-bit PCM; is that not true? If it is, then regarding this specifically, that's what I meant even if I didn't initially understand correctly. If that's in fact not true, then what does the 16-bit parameter in an MP3 file indicate then?..

QUOTE
http://en.wikipedia.org/wiki/Decibel

Sigh... Like I've never been there and not more than once even...

This is the equivalent of you asking somewhere about a question which has something to do with gradients which you conceptually understand just not practically and I link you this: go ahead, click this...

Your first response was genuine and it was appreciated; try not to feel like you're on a battle front and that you need to affiliate with the group mindset.

QUOTE
This isn't something you can learn from an internet forum. You'll need a textbook.

You're assuming too much about me and in general. I'm sure if I understood what those terms were about, I'd easily be able to explain it to somebody with a simple response and expand further on things they might still not understand. Past a certain point I guess it would be fair to say that it's up to them to make any further connections. In this case here, I don't think the line has even been anywhere near reached. There is such a thing as a "bad teacher" you know. Not only that but the better the teacher, the simpler and better he/she could explain more complicated things.

This post has been edited by Typhoon859: Aug 2 2012, 07:29
Typhoon859
post Aug 2 2012, 06:45
Post #13





Group: Members
Posts: 19
Joined: 19-December 10
Member No.: 86635



QUOTE (mjb2006 @ Aug 1 2012, 14:31) *
If you're worried that you've lost something on the quiet end by reducing the global gain throughout the file, your decoder could output 24-bit instead of 16-bit. I'm doubtful this would matter in most recordings.

Ooh! Thanks for that; that's very interesting! Probably having to do with something along the lines of me not understanding the entirety of this process but how would decoding as such help with data taken from the low end? The idea to me behind this seems to be for clipping which can be worked around with these decoders instead of normalizing to the maximum by passing through the info beyond the 16-bit limitation (96dB FS). On that note, would you happen to know if the newest version of AC3Filter accomplishes this when setting the output to 24-bit? Thanks again.

QUOTE
QUOTE (Typhoon859 @ Jul 31 2012, 17:35) *
And if the change is undone, wouldn't you not get any of the data back (unless it's stored) ... A lot of the things here indicate to me that the values, whether over or under, remain as part of the data in the container but just doesn't play back, or rather, clips since it's within the 16-bit parameter.

That's basically it. Each granule in each frame in the MP3 (2 granules per frame) contains frequency & amplitude info for generating brief sine waves. These waves are summed by the decoder to make an equally brief but much more complex composite waveform. Space is saved at encoding time by (among other things) eliminating waves that don't make an audible difference, and by storing the parameters with less-than-perfect precision, and by using standard lossless compression techniques internally. At decoding time, the global gain field of each frame is used to scale each granule's composite wave. So if you modify the global gain fields, only the amplitude changes; the "shape" (frequency content) of the output wave is unaffected, hence the gain adjustment is "lossless".

Well firstly, thank you for being the first person to acknowledge that statement. In response to your comment, that pretty much explains the general premise of MP3 encoding for me after Googling "frame" and "granule" to understand that better. In the time since the beginning of my post, I sort of got it from context and from the mentioned link I visited about the global gain field, but otherwise, I still didn't understand how audio has frames, lol. All I really thought of it was that it was like digital packets.

But umm, also from that thread I earlier linked, I understood that MP3s have these global gain fields which could be altered to make differences only for decoders, leaving all the information intact. This in fact makes it serve the exact same function as ReplayGain but with accuracy limited to 1.5dB. In other words, the amplitude DOESN'T get physically changed, right? So yeah, if that's the case, then the process can be undone. Of course I knew that the actual shape of the waveform isn't changed. Reducing the amplitude at any stage though still removes the quietest levels by the correlated amount, and for music without much compression, it really starts to be noticeable with 4.5dB reduced or more. It could just be the deficiency of ALL my gear but I doubt it, especially when testing out and compensating for those volume changes using my FiiO E17 DAC/Amp. If all I said here is true though, the biggest part of my question is still answered because it's mainly regarding the undoing of these alterations; the mentioned loss would be the case regardless even with ReplayGain. Maybe I'm wrong in saying this; I actually hope so. XD

QUOTE
The MP3's combined waveform data consists of samples which use 32-bit float amplitudes, essentially perfect precision for audio purposes. However, typical audio playback APIs expect LPCM samples which use 16- or 24-bit integer amplitudes, so MP3 decoders convert the 32-bit float to 16-bit signed integer, normally. By definition, ±1.0 in the 32-bit float is the maximum range of the integers you're converting to. If 16-bit, that means -1.0 is -32768 and +1.0 is +32767. As pointed out, technically you can't assign a decibel value to a single point, but that's irrelevant for purposes of detecting clipping; if the float exceeds ±1.0, there's no choice but to clip when converting to LPCM.

Hopefully with this explanation you can see how reducing the global gain brings the float32 amplitudes under ±1.0, which in turn prevents clipping, but is essentially "lossless" in the sense that it doesn't change the shape of the waveform, just its amplitude (volume).

Why are you the only one that gave a normal and direct response, let alone a good one? Anyway...

"-32768" and "+32767": what is the unit of measure for these values or to what does it relate? From what I know, dB is a relative measurement; from what I remember, it can only be measured as the "average" within a period, although I was never quite sure how that works in non-simple waveforms, unless it's calculated by RMS.

In general I understand these things now individually (unless otherwise noted) but I just don't get how they relate, but I have to know that to understand how the calculations for clipping are done. Thanks again for your help!


__________________
What's the reason to want to host a forum if it's not for the reason of spreading knowledge and getting pleasure out of informing people? I have a feeling that even if initially there was good intention here, things just got derailed along the way by the people themselves because they begin using this as their personal platform for their own totalitarian ideals or simply conformist mentality - like in most other places. I wouldn't be at all surprised if this thread were locked, my posts were modified in what they say or edited together into senseless and out of context pieces - sort of like media propaganda, and I was banned with an automated message that I broke the rules. As a matter of fact, I'm so comfortable here that before posting this, I had to save this entire thread so far onto my computer to keep as evidence.

This was stupid on my part to say; I'm sorry... But guys, come on, is it too much to ask for us to learn from each other? Have you ever questioned the reason behind your approach to your responses? Just put yourself in my position. I've been getting more ridicule here from the majority than I have gotten aid. Thanks to anyone that truly contributed though, sincerely thank you!

This post has been edited by Typhoon859: Aug 2 2012, 07:39
halb27
post Aug 2 2012, 10:20
Post #14





Group: Members
Posts: 2446
Joined: 9-October 05
From: Dormagen, Germany
Member No.: 25015



A short explanation of mp3 technology in the entire context as far as your questions are concerned:

a) Input for mp3 encoding is usually a PCM signal (music representation in the time domain). That is, 44100 times a second (for music on CD) the current value of the signal is sampled. This value is stored as a signed integer (with 16 bit resolution for music on CD). When the CD recording was fine there was no clipping, that is, the entire track could be encoded in the range provided by 16 bit integers.
Don't think of loudness, dB, etc. at the moment as it does not help here.

b) Encoding with mp3 (or another transform codec like AAC) means bringing the signal representation from the time domain to the frequency domain. Music representation in the frequency domain means creating time windows (10 msec long as the order of magnitude which transform codecs are using) and representing the music for a time window by giving the frequency-amplitude-distribution of it. This distribution is coded efficiently using various lossy and lossless techniques.
In the case of mp3 these time windows are the frames, or more precisely the granules of a frame. A frame consists of 1152 wave samples, and a frame is separated into two granules. There is more complication because of the usage of long/short/mixed blocks. But those are all specific details you should look into yourself when you're really implementing your own thing. It's not necessary for the fundamental understanding. Just think in terms of time windows the length of which can be adapted to the current musical situation according to actual needs.
What's important for your context: there is a scale factor (called global gain) for each time window which controls the amplitude of the entire frequency-amplitude-distribution of the window. This is the spot where a lossless amplitude variation of an existing mp3 file can be done.

c) Decoding an mp3 file means looking at each time window and transforming the frequency-amplitude-distribution back to the wave samples. It should be pointed out that clipping can occur in this process even when there was no clipping in the original music (due to the approximate nature of the frequency-amplitude-distribution stored in the mp3 file). This is where the ReplayGain info shows a peak value > 1.
The decoding machinery can take this into account and scale the frequency-amplitude-distribution down accordingly, or - if you use replaygain to scale down the global gain values - you can avoid this situation altogether making the mp3 file playable on any player without this special clipping.

All this has nothing to do with dB, perceived loudness, etc. When it comes to these in your context, it's all about normalizing music relative to a standard of perceived loudness. This standard is 89 dB SPL (but you can deviate from this), and a ReplayGain analysis stage analyses the music and suggests by how many dB to change the music to reach this standard perceived loudness. You can use this value to change the global gain factors of the mp3 file accordingly (which can be altered only in 1.5 dB steps, so you can do this only approximately).
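
The "only approximately" part can be sketched in Python as follows. The suggested value of -7.3 dB is a made-up example, and rounding to the nearest step is an assumption made for this sketch; an actual tool may choose the step differently.

```python
import math

# One global-gain step, assumed to be a factor of 2**(1/4) in amplitude.
STEP_DB = 20.0 * math.log10(2.0 ** 0.25)   # about 1.505 dB

def quantize_gain(suggested_db):
    """Round a suggested dB change to a whole number of global-gain steps.

    Returns (steps, applied_db), where applied_db is the change actually
    achievable with that many steps.
    """
    steps = round(suggested_db / STEP_DB)
    return steps, steps * STEP_DB

suggested = -7.3                       # hypothetical ReplayGain suggestion
steps, applied = quantize_gain(suggested)
print(steps)                           # -5
print(round(applied, 3))               # -7.526
print(round(suggested - applied, 3))   # 0.226 (dB left uncorrected)
```

With nearest-step rounding the leftover error is never more than about half a step, roughly 0.75 dB.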

I suggest concentrating on the fundamental things at the moment, and taking care of the irritating details like granules and short blocks or the special considerations for sfb21 when you're really implementing your own method. At that time you should get familiar with the exact mp3 specs, information about which you can find on the net and/or in books (but please don't have HA members explain this to you other than for isolated specific questions). But why bother BTW, as the tools for doing these things are already there?

This post has been edited by halb27: Aug 2 2012, 10:32


--------------------
lame3100m -V1 --insane-factor 0.75
db1989
post Aug 2 2012, 11:05
Post #15





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



QUOTE (Typhoon859 @ Aug 2 2012, 06:40) *
QUOTE (db1989 @ Aug 1 2012, 09:24) *
[…] you can patronize me ON A FORUM where people come to inquire for help due to lack of knowledge on a subject. Whichever makes you feel better I guess...
wahh, I’m projecting my own baseless assumptions about others’ motivations

QUOTE
Now, can we keep it cool from here? If you don't feel like answering any of my inquiries because you feel that my knowledge level is way below what is worth your time (however that makes sense), then simply don't respond.
The thing is, we are answering. Yet I see very little reason why anyone should bother to continue to do so, if this sort of thing is the thanks they get.

QUOTE
not sure why it was necessary to even go there besides in a false attempt to put yourself on higher grounds.
wahh, I’m projecting my own baseless assumptions about others’ motivations

QUOTE (Typhoon859 @ Aug 2 2012, 06:45) *
What's the reason to want to host a forum if it's not for the reason of spreading knowledge and getting pleasure out of informing people? I have a feeling that even if initially there was good intention here, things just got derailed along the way by the people themselves because they begin using this as their personal platform for their own totalitarian ideals or simply conformist mentality - like in most other places.
WAHH, people didn’t cover their replies to me with sugar and therefore the forum is totalitarian, fascist, blah-blah-blah

QUOTE
I wouldn't be at all surprised if this thread were locked
I’d normally say that’s not going to happen, but then if you keep spouting nonsense like this…
QUOTE
[or] my posts were modified in what they say or edited together into senseless and out of context pieces - sort of like media propaganda, and I was banned with an automated message that I broke the rules.
…it’s going to get tempting, albeit futile due to just giving you what you seem to want: more fuel for playing the victim card, in a mixture of ostensible self-deprecation and ultimate egocentrism. Keep wishing, I guess.

In short, please rethink your incorrect assumptions about others’ motivations as they relate to you, tempering them with the new knowledge that not everyone is out to get you on a personal level or any other.

This post has been edited by db1989: Aug 2 2012, 12:19
2Bdecided
post Aug 2 2012, 11:53
Post #16


ReplayGain developer


Group: Developer
Posts: 5362
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



QUOTE (Typhoon859 @ Aug 2 2012, 06:45) *
Why are you the only one that gave a normal and direct response, yet alone a good one? Anyway...
Because it looks like you're very far from understanding this, yet you think you have good knowledge. Experience suggests that means we're in for a heck of a long thread, and not everyone has the time.

Digital audio comes from ADCs and goes to DACs*. Both use integers: 16 bits, 24 bits, whatever. These can be expressed in various ways: binary (unsigned int, two's complement, etc.) or decimal (signed integer, floating point, etc.). As long as the ranges and conversion factors are known and/or understood, these can be entirely equivalent. e.g.
QUOTE
"-32768" and "+32767": what is the unit of measure for these values or to what does it relate?
It's the range of 16-bit values expressed in signed integer decimal.

* no DAC = no way to hear the audio, so this is pretty fundamental. ;)


Lossy codecs don't intend to accurately reproduce audio. A lossy audio file (like an mp3) contains what can be interpreted as a series of instructions to approximate the original waveform, e.g. "generate a sine wave at this frequency and amplitude, for this long, and add it to another one, and another one, etc". How you convert these instructions into 16-bit or 24-bit or whatever audio data is very well defined, as is the amplitude level which matches the full scale of the output (1.0, or 32767/8 or 8388607/8 etc) - but within the mp3 file, there's no concept of 16 bits or 24 bits or any other input or output resolution - there are just the approximated parameters of sine waves which get added together by the decoder to create the output. A decoder can do this to any level of accuracy it wants, generating 64-bit output if it wishes.

The global gain field is just a way in which the amplitude of the sine waves gets expressed in an mp3 file. Instead of specifying all the sine wave amplitudes absolutely, there's the global gain for a given block, and then the amplitude offsets relative to that for each individual sine wave in a given block. It's just a more efficient way of saying exactly the same thing, but it has a side effect that changing the global gain field in an mp3 file is an easy and non-destructive way of changing its volume. You could do it by changing all the amplitudes specified for all the individual sine waves - except that in the amplitude parameter for them, the ranges and steps in those ranges will vary between them (because the different ranges allowed them to be encoded more efficiently), making a simple useful volume control almost impossible to do in that way.

As long as you can put the global gain fields back to what they were, you can't lose data by changing them, because the definitions of all the sine waves remain intact. If you have a (for example) 16-bit decoder, and reduce all the global gain fields dramatically, you could in theory reduce the amplitude of all the sine waves so that the 16-bit output rounded to zero - you'd get absolutely nothing out of a 16-bit decoder, because everything has been pushed more than 96dB below digital full scale. But, either by using a more accurate decoder (e.g. 32-bit) and amplifying the result, or more simply by putting the global gain fields back to what they were, you'd get all the audio data decoded just fine.
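The argument above can be sketched with a toy model in Python. This is not an mp3 parser: `Block`, `shift_gain` and `decode_16bit` are hypothetical names, the `values` tuple merely stands in for the per-block sine wave data, and the scale law 2^((global_gain - 210) / 4) with an 8-bit field is an assumption used for illustration. The point is only that changing the gain field never touches the stored data, so the change can always be undone.

```python
from dataclasses import dataclass, replace

REFERENCE_GAIN = 210  # assumed reference offset for this toy model


@dataclass(frozen=True)
class Block:
    global_gain: int   # assumed 8-bit field, 0..255
    values: tuple      # stand-in for the per-block data, never modified


def shift_gain(blocks, steps):
    """Return new blocks with every global_gain shifted by `steps`."""
    shifted = []
    for b in blocks:
        g = b.global_gain + steps
        if not 0 <= g <= 255:
            raise ValueError("global_gain would leave the 8-bit range")
        shifted.append(replace(b, global_gain=g))
    return shifted


def decode_16bit(blocks):
    """Toy decoder: scale each value, round, and clip to 16-bit integers."""
    out = []
    for b in blocks:
        scale = 2.0 ** ((b.global_gain - REFERENCE_GAIN) / 4)
        for v in b.values:
            out.append(max(-32768, min(32767, round(v * scale))))
    return out


original = [Block(210, (12000.0, -3.0, 0.4)), Block(206, (800.0, -20000.0))]
quiet = shift_gain(original, -80)    # roughly 120 dB down
restored = shift_gain(quiet, +80)    # put the fields back

print(decode_16bit(original))
print(decode_16bit(quiet))
print(decode_16bit(restored))
```

In this toy the 16-bit decode of the quietened blocks is all zeros, yet restoring the gain fields gives back exactly the original decode, because only the integer gain field was ever changed.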

Hope this helps.

Cheers,
David.

P.S. this has all been discussed before.

This post has been edited by 2Bdecided: Aug 2 2012, 11:58
[JAZ]
post Aug 2 2012, 13:03
Post #17





Group: Members
Posts: 1797
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



@Typhoon859: You should reread your posts, and think again about what you asked, what you knew and what you needed to know.


The first question is on the topic's name: "How can mp3gain be lossless?". That question is reasonable, from a user point of view, or someone more interested in the final output itself, or wondering how the word "lossless" can be used next to a lossy codec. But those weren't your reasons.

You then stated that you are interested in writing a program that does a similar thing (you don't say it's about mp3, but then you wouldn't talk about mp3gain, you should have talked about replaygain), and the followup from all your replies is that you have very little knowledge about how mp3 is encoded and how it is stored (its bitstream).
You also know some audio terms, but you also show incorrect usage of some of them.

Just some corrections on your assumptions:

16bit (integer) PCM: Audio range from -32768 to 32767 (0 is the middle). Peaks are -32768 and 32767 because no value can be lower/higher than this. If a value would have to be lower/higher, -32768 or 32767 is used instead, and that is what is called digital clipping.
The difference between the highest peak (-32768) and the lowest value (-1) determines the Signal to Noise Ratio (SNR). Since 0dBFS (decibels relative to full scale) is used as the reference for the peak, the lowest value is -96dBFS. So no, 32768 is not 96dB.

24bit (integer) PCM: Audio range from -8388608 to 8388607 (0 is the middle). Same as above about clipping. Lowest value is -144dBFS.

32bit (float) PCM: Audio range from 0.0 to 1.0 (0.5 is the middle). peaks are +/-3.4028234 × 10^38. The smallest value is +/-1.18 × 10^−38 (or +/-1.4 × 10^−45 using denormalized numbers). Since the audio range is much smaller than the peak, digital clipping is almost impossible. Difference from max value to min value is: -758.56dBFS (if i've calculated it correctly), but the SNR also "floats", and so is much smaller (similar to 24bit integer PCM, due to the way 32bit float is stored).
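The integer ranges, the clipping behaviour and the -96/-144 dBFS figures listed above can be checked with a few lines of Python. This is only a sketch: the function names are made up, and the dB figure uses the conventional estimate of 20 * log10(2^bits), roughly 6.02 dB per bit, which is an assumption about how the quoted numbers were obtained.

```python
import math


def int_pcm_range(bits):
    """Lowest and highest sample value of signed integer PCM."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def clip(sample, bits):
    """Digital clipping: out-of-range values are pinned to the range limits."""
    lo, hi = int_pcm_range(bits)
    return max(lo, min(hi, sample))


def nominal_floor_dbfs(bits):
    """Conventional figure for the lowest level: about 6.02 dB per bit."""
    return -20 * math.log10(2 ** bits)


for bits in (16, 24):
    print(bits, int_pcm_range(bits), round(nominal_floor_dbfs(bits), 2))

print(clip(40000, 16), clip(-40000, 16))
```

For 16 and 24 bits this gives about -96.33 and -144.49 dBFS, and a 16-bit sample of 40000 is clipped to 32767.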



These forums are a place where knowledge is shared and sometimes even born. We have codec developers, and people who attend and speak at international conferences, as members. The rules have always been made in favour of scientific and repeatable tests instead of opinions and beliefs.

What these forums are not is a help place or a support center of any sort. Individual questions and problems are answered, but usually when they are concrete, and answers are well known by members or there is interest in documenting the question in some way.
You should not expect to be taught on a subject here.
saratoga
post Aug 2 2012, 16:05
Post #18





Group: Members
Posts: 5152
Joined: 2-September 02
Member No.: 3264



QUOTE (Typhoon859 @ Aug 2 2012, 01:40) *
QUOTE
This isn't something you can learn from an internet forum. You'll need a textbook.

You're assuming too much about me and in general.


Yes, it seems I was mistaken to assume you were serious. Now I see you're happy with what you already know, and it's pointless to answer your questions.
[JAZ]
post Aug 2 2012, 17:38
Post #19





Group: Members
Posts: 1797
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



QUOTE ([JAZ] @ Aug 2 2012, 14:03) *

32bit (float) PCM: Audio range from 0.0 to 1.0 (0.5 is the middle).


Ouch! what was i thinking about? Audio range is -1.0 to 1.0 and 0.0 is the middle. The rest still applies.
alanofoz
post Aug 3 2012, 02:52
Post #20





Group: Members
Posts: 46
Joined: 10-August 03
Member No.: 8294



QUOTE ([JAZ] @ Aug 3 2012, 03:38) *

QUOTE ([JAZ] @ Aug 2 2012, 14:03) *

32bit (float) PCM: Audio range from 0.0 to 1.0 (0.5 is the middle).


Ouch! what was i thinking about? Audio range is -1.0 to 1.0 and 0.0 is the middle. The rest still applies.

With one possible exception?

AFAIK the difference between 0dBFS and the min value is about 750dB, AND another ~750dB for the max.

Some years ago I artificially created a 32 bit float file with a 1500dB dynamic range. Not much practical use though...


--------------------
Cheers,
Alan
greynol
post Aug 3 2012, 04:51
Post #21





Group: Super Moderator
Posts: 10338
Joined: 1-April 04
From: San Francisco
Member No.: 13167



You're saying full scale is not maximum amplitude?


--------------------
Your eyes cannot hear.
[JAZ]
post Aug 3 2012, 09:54
Post #22





Group: Members
Posts: 1797
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



The signal to noise ratio is the difference between the full scale value (in this case 1.0) and the smallest non-zero value (middle = 0; smallest non-denormal value = 1.18 x 10^-38). dB = 20 * log10(smallest value / 0dBFS value), i.e. 20 * log10( (1.18 * 10^-38) / 1.0 ).

Completing what I said above, the real SNR is -138.47dB,
where for the same exponent:
bit 23 = 1
bit 0 = 0.00000011920928955078125

http://en.wikipedia.org/wiki/Single-precis...ng-point_format
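Both float figures from this thread follow from the same 20 * log10 formula, as this short Python sketch shows. The constant names are made up for illustration; the values are the smallest normal single-precision number (2^-126, about 1.18 x 10^-38) and the weight of mantissa bit 0 relative to bit 23 (2^-23).

```python
import math


def db(ratio):
    """Amplitude ratio expressed in dB."""
    return 20 * math.log10(ratio)


FULL_SCALE = 1.0
SMALLEST_NORMAL = 2.0 ** -126   # about 1.18e-38
MANTISSA_LSB = 2.0 ** -23       # 0.00000011920928955078125

# Range from full scale down to the smallest normal value.
# The -758.56 quoted earlier comes from the rounded value 1.18e-38.
print(round(db(1.18e-38 / FULL_SCALE), 2))
print(round(db(SMALLEST_NORMAL / FULL_SCALE), 2))

# Precision at a fixed exponent: mantissa bit 0 relative to bit 23
print(round(db(MANTISSA_LSB / FULL_SCALE), 2))
```

The first figure depends on how far the exponent can move, the last one on the 24 significant bits available at any single exponent, which is why it lands close to the 24-bit integer figure.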
2Bdecided
post Aug 3 2012, 09:58
Post #23


ReplayGain developer


Group: Developer
Posts: 5362
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



I think we scared him off.

Interesting how, on a generally friendly board, a conversation can go so wrong.
skamp
post Aug 3 2012, 10:11
Post #24





Group: Developer
Posts: 1453
Joined: 4-May 04
From: France
Member No.: 13875



QUOTE (2Bdecided @ Aug 3 2012, 10:58) *
Interesting how, on a generally friendly board, a conversation can go so wrong.


There's actually often friction with newbies here. Get off my lawn type of reactions.


--------------------
See my profile for measurements, tools and recommendations.
Destroid
post Aug 3 2012, 10:50
Post #25





Group: Members
Posts: 555
Joined: 4-June 02
Member No.: 2220



Actually, I hope this person is still lurking and learning...

I remember my (long, long) past inquiry about, "Does ["holes" in MP3 playback] damage speakers?" o_O I wish I hadn't asked before researching, but hey! Sometimes a good wake-up call can kick-start the brain I was born with. :P

I just want to remark to the OP- you really can't compare MP3 to PCM because (basically) the technology is (encoded) so different that it doesn't help to cling to comparisons.

I conclude by thanking the following persons for their patience and guidance: john33, Robert et al. (LAME), Frank Klemm, Dibrom, the developers of digital codecs... and HA ;)



--------------------
"Something bothering you, Mister Spock?"