What's the meaning of "sample rate" in MP3/Vorbis/AAC?, Considering the audio was transformed
sheh
post Oct 7 2012, 03:44
Post #1





Group: Members
Posts: 89
Joined: 3-November 04
Member No.: 17971



What's the meaning of "sample rate" in MP3/Vorbis/AAC? Wouldn't it be meaningless after transform, same as bit depth?

saratoga
post Oct 7 2012, 05:05
Post #2





Group: Members
Posts: 4971
Joined: 2-September 02
Member No.: 3264



No, the transform occurs on sampled data and so the sample rate determines which samples correspond to which frequencies.
pdq
post Oct 7 2012, 14:23
Post #3





Group: Members
Posts: 3407
Joined: 1-September 05
From: SE Pennsylvania
Member No.: 24233



Still, since the data are stored as frequencies and intensities, couldn't it be decoded to any arbitrary sample rate?
[JAZ]
post Oct 7 2012, 14:44
Post #4





Group: Members
Posts: 1783
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



When we say that lossy streams don't have a strictly defined bit depth (as opposed to LPCM streams), it is because the transforms applied to the signal and the way the data is stored make it variable. (It is sometimes said that MP3 is 32-bit float, but that is a simplification as well.)

That said, the same does not quite apply to sample rate. Even though individual frequency bands may or may not contain information, the compressed data is specific for one max frequency (so a specific sample rate).


That is, resampling (whether from the decoded signal or by sample-rate halving, as older MP3 decoders allowed) is not the same thing as the sample rate being undefined for the encoded data.


Two examples:

An MP3 stream can be MPEG-1 Layer 3, MPEG-2 Layer 3, or the so-called "MPEG-2.5 Layer 3", so its sample rate is one of 48, 44.1 or 32 kHz; 24, 22.05 or 16 kHz; or 12, 11.025 or 8 kHz, respectively. The encoded data is relative to this sample rate, and the decoder needs it, and uses it, to generate the decoded stream.

An Opus stream is a 48 kHz stream. The encoded data is always at this sample rate, and the decoder can resample its output to another rate. (If the original audio was at 44.1 kHz, the output can be resampled back to that rate.)
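
To make the MP3 case concrete, here is a minimal sketch of how a decoder might read the sample rate out of a 32-bit frame header. The function name and table layout are illustrative and not taken from any particular decoder; the bit positions and rate tables follow the usual description of the MPEG audio frame header.

```python
# Sample-rate tables, keyed by the 2-bit version field of the frame header
# and indexed by the 2-bit sampling-frequency field.
SAMPLE_RATES = {
    0b11: (44100, 48000, 32000),  # MPEG-1
    0b10: (22050, 24000, 16000),  # MPEG-2
    0b00: (11025, 12000, 8000),   # "MPEG-2.5" (unofficial extension)
}


def sample_rate_from_header(header: int) -> int:
    """Return the sample rate in Hz encoded in a 32-bit MPEG audio frame header."""
    if (header >> 21) & 0x7FF != 0x7FF:
        raise ValueError("missing frame sync")
    version = (header >> 19) & 0b11
    rate_index = (header >> 10) & 0b11
    if version not in SAMPLE_RATES or rate_index == 0b11:
        raise ValueError("reserved version or sample-rate index")
    return SAMPLE_RATES[version][rate_index]
```

The rate is read before any audio data is touched, which fits the point above: the rest of the frame is interpreted relative to it.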

This post has been edited by [JAZ]: Oct 7 2012, 14:46
jensend
post Oct 7 2012, 16:46
Post #5





Group: Members
Posts: 145
Joined: 21-May 05
Member No.: 22191



I'm not intimately acquainted with the details of any of these formats, but from what little I do know, I think many of the answers thus far are misleading.

If you transform a periodic signal with the DCT, and thus really were only storing information in the frequency domain, you could decode that at any sample rate. Yes, like Saratoga said, the original sample rate would normally be used to define a standard list of frequencies so you can just store coefficients rather than both frequencies and coefficients, but you could still natively decode it at any frequency.

But since we're looking at the MDCT, a lapped/windowed transform, the width of the window and the overlap are time-domain information, and combining the windows, and the filtering that allows you to do that without introducing artifacts, surely involves some sample rate inflexibilities. For instance, at most sampling rates the window won't be an integer number of samples.
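
The point about non-integer window lengths can be checked with a little arithmetic. A rough sketch, assuming a window of 1152 samples at 44.1 kHz (figures chosen only for illustration; the helper name is hypothetical):

```python
from fractions import Fraction


def window_length_at_rate(samples: int, native_rate: int, target_rate: int) -> Fraction:
    """Length, in target-rate samples, of a window spanning `samples` at native_rate."""
    return Fraction(samples * target_rate, native_rate)


for rate in (48000, 32000, 22050):
    length = window_length_at_rate(1152, 44100, rate)
    kind = "integer" if length.denominator == 1 else "not an integer"
    print(f"{rate} Hz: {float(length):.3f} samples ({kind})")
```

Under these assumptions only the half-rate case (22050 Hz) gives a whole number of samples, 576; the other two rates do not.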

But it may still be possible to decode at a number of different rates which are integer multiples of each other. In particular, though [JAZ] said "an Opus stream is a 48kHz stream," Opus was designed so it can be decoded at any of 8, 12, 16, 24, or 48 kHz. The native lower-rate decoding isn't normally used: that code has only been targeted at highly resource-constrained environments, so it gives lower quality. Even if you tell opusdec on the PC to give you 16 kHz output, it will use its high-quality 48 kHz decoding and then downsample. But the capability is there, and somebody could implement a higher-quality native 24 kHz decoder if they really felt like it.
benski
post Oct 7 2012, 18:32
Post #6


Winamp Developer


Group: Developer
Posts: 670
Joined: 17-July 05
From: Brooklyn, NY
Member No.: 23375



fs is part of the MDCT equation. If you didn't know the sample rate, you wouldn't be able to decode the audio. Like others have said, you can decode easily to sample rates that are multiples, but you cannot decode to arbitrary sample rates without resampling.

This post has been edited by benski: Oct 7 2012, 18:32
lvqcl
post Oct 7 2012, 18:56
Post #7





Group: Developer
Posts: 3383
Joined: 2-December 07
Member No.: 49183



Winamp's MP3 decoder still has a Full/Half/Quarter quality setting.
saratoga
post Oct 7 2012, 19:00
Post #8





Group: Members
Posts: 4971
Joined: 2-September 02
Member No.: 3264



Using a DCT means you could easily drop the sample rate by a factor of 2^n. This is widely used in JPEG and in some old MP3 decoders. But that's just an optimization based on how inverse transforms work. You can't actually decode to an arbitrary sample rate. Converting to the frequency domain doesn't really make much difference for sampling. You still have a well-defined sampling rate in the frequency domain.
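
The factor-of-2 shortcut can be illustrated with a plain DCT-II block. This is only a sketch under simplifying assumptions: it uses an unnormalised DCT-II on a single block, not the MDCT or the MP3 filterbank, and the function names are made up for the example.

```python
import math


def dct2(x):
    """Unnormalised DCT-II of a list of samples."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
            for k in range(n)]


def idct_half_rate(coeffs):
    """Invert only the lower half of the coefficients with a half-size transform.

    Returns len(coeffs) // 2 samples, i.e. the block at half the sample rate.
    The upper half of the spectrum is simply discarded.
    """
    n = len(coeffs)
    m = n // 2
    return [(coeffs[0] + 2 * sum(coeffs[k] * math.cos(math.pi * k * (2 * j + 1) / (2 * m))
                                 for k in range(1, m))) / n
            for j in range(m)]
```

The half-size inverse does half the work and never computes the discarded samples, but it only works because the new sample grid is a simple fraction of the old one.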
jensend
post Oct 7 2012, 19:57
Post #9





Group: Members
Posts: 145
Joined: 21-May 05
Member No.: 22191



I still think you're wrong here. Once you've taken a DCT and have stuff represented as a sum of sinusoids you can sample those sinusoids at any intervals, even at nonuniform sampling points. You just can't use the efficiency of the IDCT to do so, so it'll be computationally inefficient.
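
A minimal sketch of this idea, assuming a single block transformed with an unnormalised DCT-II (no windowing or overlap, so this is not how an MDCT codec works). The helper names are hypothetical. Each output point costs a full pass over the coefficients, which is the inefficiency mentioned above.

```python
import math


def dct2(x):
    """Unnormalised DCT-II of a list of samples."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
            for k in range(n)]


def evaluate_at(coeffs, t):
    """Evaluate the cosine series defined by DCT-II coefficients at real-valued time t.

    t is measured in original sample periods, so t = 0, 1, 2, ... are the
    original sample positions and fractional t falls between them.
    """
    n = len(coeffs)
    return (coeffs[0] + 2 * sum(coeffs[k] * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
                                for k in range(1, n))) / n
```

At integer t this reproduces the original samples; at fractional t it returns whatever the cosine series happens to give between them.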
saratoga
post Oct 7 2012, 20:16
Post #10





Group: Members
Posts: 4971
Joined: 2-September 02
Member No.: 3264



QUOTE (jensend @ Oct 7 2012, 14:57) *
I still think you're wrong here. Once you've taken a DCT and have stuff represented as a sum of sinusoids you can sample those sinusoids at any intervals, even at nonuniform sampling points. You just can't use the efficiency of the IDCT to do so, so it'll be computationally inefficient.


Could you explain what you mean more precisely? It's difficult to understand what your idea is.
sheh
post Oct 8 2012, 18:33
Post #11





Group: Members
Posts: 89
Joined: 3-November 04
Member No.: 17971



Okay, so the original fs needs to be plugged in somewhere, but isn't it possible (in theory at least) to decode to an arbitrary sample rate < fs? From what's said above it sounds like it's more a question of decoder design. Or is there just no way to get from the frequency-domain data to sample points that aren't aligned with the original sample rate (or fs·2^-n)?
Dynamic
post Oct 8 2012, 23:04
Post #12





Group: Members
Posts: 821
Joined: 17-September 06
Member No.: 35307



I think one problem is that we're never doing a Fourier Transform of the entire file.

Instead we do overlapping windowed transforms, which don't preserve all the frequency content of the original file accurately (and spread single frequencies across multiple bins).

Given that there must be frequency spreading, there will at least be some difficulty in anti-aliasing, and there will also be elements of temporal spreading too.
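
The frequency spreading can be seen even in a toy case. The sketch below uses a naive DFT with a rectangular window on one block, which is an assumption made for brevity and not what any of these codecs do; the function names are illustrative.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of bins 0 .. N/2 of a naive DFT."""
    n = len(x)
    return [abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n // 2 + 1)]


def significant_bins(cycles_per_block, n=32, threshold=1e-6):
    """Bins whose magnitude exceeds `threshold` for a sine with the given cycles per block."""
    x = [math.sin(2 * math.pi * cycles_per_block * i / n) for i in range(n)]
    return [k for k, mag in enumerate(dft_magnitudes(x)) if mag > threshold]
```

A sine that completes a whole number of cycles in the block lands in a single bin, while one that does not (for example 4.5 cycles) shows up in several.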

I admit that I haven't got my head around the whole concept, though it has occurred to me in thinking about things like pitch-shifting or tempo-shifting but I still haven't worked out all the pitfalls.

I know that algorithms like Soundtouch have audible problems especially on multi-instrument/vocalist samples.

This post has been edited by Dynamic: Oct 8 2012, 23:07
saratoga
post Oct 8 2012, 23:41
Post #13





Group: Members
Posts: 4971
Joined: 2-September 02
Member No.: 3264



QUOTE (sheh @ Oct 8 2012, 13:33) *
Okay, so the original fs needs to be plugged in somewhere, but isn't it possible (in theory at least) to decode to an arbitrary sample rate < fs?



QUOTE (benski @ Oct 7 2012, 13:32) *
Like others have said, you can decode easily to sample rates that are multiples, but you cannot decode to arbitrary sample rates without resampling.




QUOTE (saratoga @ Oct 7 2012, 14:00) *
You can't actually decode to an arbitrary sample rate.



Are you reading the replies to this thread?
sheh
post Oct 9 2012, 05:21
Post #14





Group: Members
Posts: 89
Joined: 3-November 04
Member No.: 17971



I guess the way I put it does seem like a retread. The thing is, some statements above make it sound like it could be an implementation problem rather than a complete inherent impossibility.

JAZ says "the compressed data is specific for one max frequency". That makes it sound like he is referring to decoding to higher frequencies. Then there's the mention of Opus using 48 kHz internally for all sample rates (though afterwards I found an old discussion that suggests 44.1 kHz *is* a problem).

Benski says "If you didn't know the sample rate, you wouldn't be able to decode the audio", makes it sound like just needing the rate for the math. And the "easily" in "can decode easily to sample rates that are multiples" like a problem of complexity.

QUOTE (saratoga @ Oct 7 2012, 21:16) *
QUOTE (jensend @ Oct 7 2012, 14:57) *
...sample those sinusoids at any intervals...

Could you explain what you mean more precisely? It's difficult to understand what your idea is.

This is what I was wondering about. Perhaps it's just my oversimplified idea of what the transformed data represents. Taken to an extreme, let's say you have 10 seconds of 1kHz sine wave at a fixed level. If encoding ultimately just means storing the frequency, level, and phase, it seems you should be able to recreate that 1kHz sine wave with as many or as few samples as you'd like. What would actually be stored in such a case?
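
The thought experiment itself is easy to sketch: if a signal really were stored as nothing but frequency, level and phase, it could be rendered at any rate. The function below is purely hypothetical and only illustrates that premise; whether any codec's data looks like this is exactly the question being asked.

```python
import math


def render_sine(freq_hz, amplitude, phase, sample_rate, duration_s):
    """Sample amplitude * sin(2*pi*freq*t + phase) at an arbitrary sample rate."""
    count = round(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate + phase)
            for i in range(count)]
```

For instance, `render_sine(1000, 0.5, 0.0, 44100, 10)` and `render_sine(1000, 0.5, 0.0, 8000, 10)` describe the same tone with different numbers of samples.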

QUOTE (saratoga @ Oct 7 2012, 06:05) *
the sample rate determines which samples correspond to which frequencies.

What does "which samples correspond to which frequencies" mean?

This post has been edited by sheh: Oct 9 2012, 05:22
saratoga
post Oct 9 2012, 19:03
Post #15





Group: Members
Posts: 4971
Joined: 2-September 02
Member No.: 3264



QUOTE (sheh @ Oct 9 2012, 00:21) *
Benski says "If you didn't know the sample rate, you wouldn't be able to decode the audio", makes it sound like just needing the rate for the math. And the "easily" in "can decode easily to sample rates that are multiples" like a problem of complexity.


I'm interpreting your question to actually ask "Is there some way to start with (M)DCT transformed data and then transform to time domain at an arbitrary sample rate that is algorithmically more efficient than a combination of inverse (M)DCT and resampling?"

My answer is no. The choice of domain makes no difference with respect to sampling and starting or ending in one domain or the other is orthogonal to questions about sampling. I can see no way to exploit one to make the other easier.

QUOTE (sheh @ Oct 9 2012, 00:21) *
If encoding ultimately just means storing the frequency, level, and phase, it seems you should be able to recreate that 1kHz sine wave with as many or as few samples as you'd like. What would actually be stored in such a case?


The process of fitting a function to data for purposes of changing the sampling rate is called resampling. So your idea here is probably something like "resampling then Fourier transforming". This will work, but it's no different from the usual process once you actually implement it. You've just expressed the same process in a different way.
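
As a minimal illustration of "fitting a function to data", here is a resampler that fits straight lines between neighbouring samples. It is a sketch only: the function name is made up, the rates are assumed to be positive integers, and there is no anti-alias filtering, so it is not suitable for real audio.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation between neighbouring samples."""
    if not samples:
        return []
    count = (len(samples) - 1) * dst_rate // src_rate + 1
    out = []
    for j in range(count):
        pos = j * src_rate / dst_rate          # position in source-sample units
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out
```

Swapping the straight lines for sinusoids or any other fitted function changes the quality and the cost, but it is still the same operation.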

QUOTE (sheh @ Oct 9 2012, 00:21) *
QUOTE (saratoga @ Oct 7 2012, 06:05) *
the sample rate determines which samples correspond to which frequencies.

What does "which samples correspond to which frequencies" mean?


Are you familiar with the discrete Fourier transform (or fast Fourier transform)? In these (and all) discrete Fourier transforms, the choice of sampling rate determines which samples in the Fourier domain contain which frequencies.
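
A small sketch of that relationship, using a naive DFT and hypothetical helper names. The transform itself only sees sample indices; a bin number turns into a frequency in Hz only once a sample rate is supplied.

```python
import cmath
import math


def peak_bin(x):
    """Index of the strongest bin among bins 0 .. N/2 of a naive DFT."""
    n = len(x)
    mags = [abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n // 2 + 1)]
    return max(range(len(mags)), key=mags.__getitem__)


def bin_to_hz(k, n, sample_rate):
    """Frequency in Hz at the centre of DFT bin k for an n-point transform."""
    return k * sample_rate / n


block = [math.sin(2 * math.pi * 5 * i / 64) for i in range(64)]  # 5 cycles per block
k = peak_bin(block)
print(k, bin_to_hz(k, 64, 32000), bin_to_hz(k, 64, 48000))
```

The same 64 numbers peak in bin 5 either way, but bin 5 means 2500 Hz if the block was sampled at 32 kHz and 3750 Hz if it was sampled at 48 kHz.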

This post has been edited by saratoga: Oct 9 2012, 19:04
[JAZ]
post Oct 9 2012, 19:03
Post #16





Group: Members
Posts: 1783
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



Mmm... the transformation that (most) lossy codecs do from the time domain (samples) to the frequency domain (frequency band intensity and phase) does not compress by itself. It might even need more data, depending on the precision.

What codecs do to reduce the bitrate is allow frequencies to be less precise (quantizing the possible values), coupled with other compression techniques (joint stereo with fewer bits for the side channel, Huffman coding, parametric audio reconstruction, etc.).
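
The quantizing step can be sketched as follows. This is a plain uniform quantizer with a single step size, chosen for simplicity; real codecs use more elaborate schemes, and the function names here are illustrative.

```python
def quantize(coeffs, step):
    """Map each coefficient to the index of the nearest multiple of `step`."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    """Reconstruct approximate coefficients from quantizer indices."""
    return [q * step for q in indices]
```

A larger step means fewer distinct index values to store (and so fewer bits) at the price of a reconstruction error of up to half a step per coefficient.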

But going back, what you usually get from the transformation is not "frequency 1 Hz, intensity x; frequency 2 Hz, intensity y; ...". To get that, you would need an FFT whose size equals the sampling rate, and in that case you would need to say what that size is (effectively defining the sample rate).

Generally, though, a fixed-size transform is used (the size depending on the sample rate, to give enough resolution). In MP3, it is an overlapped window of 1152 samples for long blocks (please correct me if I am wrong!), which generates 576 frequency bands (and their phases). Those 576 values by themselves don't mean a thing, because a 32 kHz wave yields the same number of bands as a 48 kHz wave.

Concretely, band 576 for a 32 kHz stream contains frequency information from 31943 Hz to 31999 Hz.
For a 48 kHz stream, that same band contains frequency information from 47915 Hz to 47999 Hz.
It is only on playback that the frequency gets any meaning. To understand this, think of what happens when playing a 22 kHz file at 44 kHz, or playing a 33 RPM LP at 45 RPM.

This post has been edited by [JAZ]: Oct 9 2012, 19:06
Dynamic
post Oct 9 2012, 22:22
Post #17





Group: Members
Posts: 821
Joined: 17-September 06
Member No.: 35307



QUOTE ([JAZ] @ Oct 9 2012, 19:03) *

Concretely, the band 576, for a 32Khz contains frequency information from the frequency 31943Hz to the frequency 31999Hz.
For a 48Khz, that same band contains frequency information from the frequency 47915Hz to the frequency 47999Hz.


As [JAZ] would realise if reading that back, band 576 of the power spectrum reaches up to the Nyquist limit, so it reaches 15999.999 Hz at a 32 kHz sampling rate and 23999.999 Hz at a 48 kHz sampling rate. However, that's just putting in specific numbers to illustrate a point.
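
The corrected figures can be worked out directly. A sketch, assuming 576 bands spaced evenly from 0 Hz up to the Nyquist frequency (an idealisation that ignores overlap between neighbouring bands); the function name is hypothetical.

```python
def band_edges_hz(band, sample_rate, bands=576):
    """Lower and upper edge in Hz of the 1-based `band`, for evenly spaced bands."""
    if not 1 <= band <= bands:
        raise ValueError("band out of range")
    lower = (band - 1) * sample_rate / (2 * bands)
    upper = band * sample_rate / (2 * bands)
    return lower, upper
```

Under that assumption, band 576 runs from about 15972 Hz to 16000 Hz at a 32 kHz sampling rate, and from about 23958 Hz to 24000 Hz at 48 kHz.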
benski
post Oct 10 2012, 16:16
Post #18


Winamp Developer


Group: Developer
Posts: 670
Joined: 17-July 05
From: Brooklyn, NY
Member No.: 23375



QUOTE (lvqcl @ Oct 7 2012, 13:56) *
Winamp's MP3 decoder still has a Full/Half/Quarter quality setting.


Yes, and it does this by not doing the calculations for all the bands, and only doing a 16- or 8-pt iMDCT at the end. Remember that MP3 is a cascade of 32 18-pt iMDCTs followed by a 32-pt iMDCT. This was done mainly to reuse hardware and software from Layer 2 decoders. From memory, I think there's a step in between that would prevent an implementation from simply using a 576-point MDCT.

This post has been edited by benski: Oct 10 2012, 16:21
pdq
post Oct 10 2012, 17:12
Post #19





Group: Members
Posts: 3407
Joined: 1-September 05
From: SE Pennsylvania
Member No.: 24233



So am I understanding correctly that 99.9% of decoding is independent of the original sample rate, and only at the very end where the wav header is filled in does it even matter?
saratoga
post Oct 10 2012, 17:30
Post #20





Group: Members
Posts: 4971
Joined: 2-September 02
Member No.: 3264



QUOTE (benski @ Oct 10 2012, 11:16) *
QUOTE (lvqcl @ Oct 7 2012, 13:56) *
Winamp's MP3 decoder still has a Full/Half/Quarter quality setting.


Yes, and it does this by not doing the calculations for all the bands, and only doing a 16- or 8-pt iMDCT at the end. Remember that MP3 is a cascade of 32 18-pt iMDCTs followed by a 32-pt iMDCT. This was done mainly to reuse hardware and software from Layer 2 decoders. From memory, I think there's a step in between that would prevent an implementation from simply using a 576-point MDCT.


It's actually a subband decomposition followed by an MDCT on each subband. So basically, when you decode, you do the iMDCT, which gives you a bunch of MP2-style subbands. Then you use an inverse filterbank to put the subbands back into a single signal. The downsample-by-2 trick works because of some symmetry in the filterbank that lets you throw away half of the samples in exchange for half as much work in the filterbank. Since the filterbank is much slower than the iMDCT, this was used on old systems to speed up decoding. It's not really useful though once you break ~50-60MHz CPUs.
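
The bookkeeping behind the half- and quarter-rate modes can be sketched as below. This only counts samples, using the 32 subbands and 18 values per subband mentioned in this thread (32 x 18 = 576); it does not implement the filterbank, and the function name is made up for the example.

```python
SUBBANDS = 32            # subbands fed to the synthesis filterbank
LINES_PER_SUBBAND = 18   # values per subband per granule (long blocks)


def decimated_output(sample_rate, factor):
    """Output rate and samples per granule when only 32/factor subbands are synthesised."""
    if factor not in (1, 2, 4):
        raise ValueError("factor must be 1, 2 or 4")
    kept = SUBBANDS // factor
    return sample_rate // factor, kept * LINES_PER_SUBBAND
```

So a 44.1 kHz stream decoded at half rate would give 288 samples per granule at 22.05 kHz, with the upper 16 subbands never synthesised.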

QUOTE (pdq @ Oct 10 2012, 12:12) *
So am I understanding correctly that 99.9% of decoding is independent of the original sample rate, and only at the very end where the wav header is filled in does it even matter?


No, I would say the opposite. In a transform codec, the entire lossy decoding process (everything after Huffman decoding) depends on the sample rate.

Edit: For example, a lot of codecs don't even use the same MDCT at different sample rates (e.g. Vorbis, WMA).

This post has been edited by saratoga: Oct 10 2012, 17:32
[JAZ]
post Oct 10 2012, 17:52
Post #21





Group: Members
Posts: 1783
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



QUOTE (Dynamic @ Oct 9 2012, 23:22) *
As [JAZ] would realise if reading that back, ...

Ouch!

QUOTE (saratoga @ Oct 10 2012, 18:30) *
It's not really useful though once you break ~50-60MHz CPUs.


That might work for current CPUs, but a 486DX2 at 66 MHz needed to play it at 22 kHz :)
saratoga
post Oct 10 2012, 18:20
Post #22





Group: Members
Posts: 4971
Joined: 2-September 02
Member No.: 3264



QUOTE ([JAZ] @ Oct 10 2012, 12:52) *

That might work for current CPUs, but a 486DX2 at 66 MHz needed to play it at 22 kHz :)


Which decoder? Early 90s ARM processors do full 32 bit output at less than 50 MHz, but you need to do fixed point + quite a bit of optimization. I don't know much about the 486 processor, but I didn't think it was much slower than early 90s ARM chips.
[JAZ]
post Oct 10 2012, 18:41
Post #23





Group: Members
Posts: 1783
Joined: 24-June 02
From: Catalunya(Spain)
Member No.: 2383



Winamp with the in-house decoder (i.e. not the Fraunhofer decoder) that they used at the time. (I'm not sure if it was already Winamp 2, or still Winamp 1.7 or so.) No MMX, probably x87 math...

This post has been edited by [JAZ]: Oct 10 2012, 18:42
saratoga
post Oct 10 2012, 18:55
Post #24





Group: Members
Posts: 4971
Joined: 2-September 02
Member No.: 3264



QUOTE ([JAZ] @ Oct 10 2012, 13:41) *

Winamp with the in-house decoder (i.e. not the Fraunhofer decoder) that they used at the time. (I'm not sure if it was already Winamp 2, or still Winamp 1.7 or so.) No MMX, probably x87 math...


x87 is definitely not going to work, but integer should be a lot faster. I actually don't know if any fixed-point decoders were popular back then. libmad dates back to 2000, so I guess it couldn't have been used in those old DOS decoders.
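
For readers unfamiliar with fixed-point arithmetic, the basic idea is to store fractions as scaled integers so that only integer multiplies and shifts are needed. A minimal sketch, assuming a Q15 format (15 fractional bits); the format and helper names are illustrative and not taken from any particular decoder.

```python
Q = 15           # number of fractional bits (an illustrative choice)
ONE = 1 << Q     # fixed-point representation of 1.0


def to_fixed(value: float) -> int:
    """Convert a float in roughly [-1, 1) to a Q15 integer."""
    return int(round(value * ONE))


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q15 numbers and rescale the product back to Q15."""
    return (a * b) >> Q


def to_float(value: int) -> float:
    """Convert a Q15 integer back to a float."""
    return value / ONE
```

For example, `fixed_mul(to_fixed(0.5), to_fixed(0.5))` gives the same integer as `to_fixed(0.25)`.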
Dynamic
post Oct 11 2012, 16:14
Post #25





Group: Members
Posts: 821
Joined: 17-September 06
Member No.: 35307



My recollection is that Winamp 1.7 running under Windows 95 on a Pentium 60 was just capable of decoding MP3 at 44.1 kHz (typically encoded by l3enc at 192 kbps CBR) and MP2 (from the CoolEdit96 plugin) at 192-224 kbps, but my Mum's PC, around the spec [JAZ] mentioned, needed 22.05 kHz playback.

Encoding used to take forever, but disc space was so scarce, it was worth it. Fortunately, I didn't switch to Xing for the speed up.