Subband and transform coding: the actual difference
niktheblak
post Dec 3 2003, 00:39
Post #1



I see quite a lot of references around describing one codec as a "transform codec" and another as a "subband codec". For instance, Musepack is said to be a subband codec, whereas AAC is said to be a "pure transform codec".

But that's where the confusion starts. One definition says that subband coding is a technique where the signal is divided into frequency bands (32 bands in the MPEG-1 codecs). But I've read that AAC also divides the signal into frequency bands, as does pretty much every psychoacoustic codec around. So doesn't that make them subband codecs, in a way?

So, what is the actual difference between a "subband codec" and a "transform codec"? Does a transform codec store the signal in the frequency domain? Does a subband codec not? Do they have different filter banks? Do subband codecs use the Fourier transform instead of the MDCT to obtain the frequency bands? And what actually makes MP3 a hybrid codec?

I've tried googling and wiki'ing around for information on this one, but page authors seem to believe that everyone automatically knows everything about "subband codecs" and "transform codecs", so they needn't say anything about them beyond using the magic words "transform" and "subband" in one sentence. Any help appreciated!
ErikS
post Dec 3 2003, 04:30
Post #2



QUOTE (niktheblak @ Dec 3 2003, 07:39 AM)
So, what is the actual difference between a "subband codec" and a "transform codec"? Does a transform codec store the signal in the frequency domain? Does a subband codec not? Do they have different filter banks? Do subband codecs use the Fourier transform instead of the MDCT to obtain the frequency bands? And what actually makes MP3 a hybrid codec?

To answer one of your questions: the output from an MDCT is in the frequency domain, while the output from the set of filters used in e.g. MP2 is still in the time domain. Those codecs use a type of filter called a QMF (quadrature mirror filter), which divides the input into two frequency ranges; repeat this a number of times until you reach the desired frequency resolution (in the case of MP2, I think it is divided into 32 subbands).

However, MP3 takes the 32 subbands from the above filter bank and applies an MDCT to each of them. That's why it is called a hybrid codec: it uses both a QMF bank and an MDCT.
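The cascaded two-band split described above can be sketched in a few lines of NumPy. This is a toy illustration only: it uses the trivial Haar filter pair as a stand-in QMF and cascades the split 5 times to get 32 bands, whereas MPEG-1 Layer I/II actually use a single-stage 32-band polyphase (pseudo-QMF) filterbank with much longer filters. The point it demonstrates is that every band output is still a (decimated) time-domain waveform.

```python
import numpy as np

def qmf_split(x):
    """Split x into a low band and a high band, each decimated by 2.
    Haar filters stand in for a real QMF pair here."""
    h0 = np.array([1.0, 1.0]) / np.sqrt(2)   # lowpass
    h1 = np.array([1.0, -1.0]) / np.sqrt(2)  # highpass (sign-alternated mirror)
    low = np.convolve(x, h0)[1::2]           # filter, then keep every 2nd sample
    high = np.convolve(x, h1)[1::2]
    return low, high

def subband_tree(x, levels):
    """Cascade the 2-band split `levels` times -> 2**levels subbands."""
    bands = [x]
    for _ in range(levels):
        bands = [half for b in bands for half in qmf_split(b)]
    return bands

x = np.sin(2 * np.pi * 0.05 * np.arange(1024))
bands = subband_tree(x, 5)          # 2**5 = 32 subbands, as in MPEG-1
print(len(bands), len(bands[0]))    # 32 bands, each a short time-domain signal
```

In MP3's hybrid structure, each of those subband signals is then fed to an MDCT (36 subband samples in, 18 frequency lines out, in the long-block case), which is where the signal finally becomes frequency-domain data.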
Jasper
post Dec 3 2003, 17:53
Post #3



What is the essential difference between splitting a signal into subbands through a filterbank and retrieving frequency information through an MDCT? Both give information about the level of a signal in a certain frequency range (more or less) over a block of data.
Also, what would be the point of applying an MDCT to a signal that is already split into subbands, and to what exactly is the MDCT applied? (each of the subbands, some strange mix or ...?)
ErikS
post Dec 3 2003, 18:13
Post #4



QUOTE (Jasper @ Dec 4 2003, 12:53 AM)
What is the essential difference between splitting a signal into subbands through a filterbank and retrieving frequency information through an MDCT? Both give information about the level of a signal in a certain frequency range (more or less) over a block of data.
Also, what would be the point of applying an MDCT to a signal that is already split into subbands, and to what exactly is the MDCT applied? (each of the subbands, some strange mix or ...?)

I'm probably the wrong person to try to explain this, but those better suited for the job can jump in later and correct me.

The output from a filterbank is still a waveform, which bears some resemblance to the original input waveform. The main difference is that either the upper or the lower frequencies are missing, pretty much as if you had turned down the upper (or lower) half of the EQ on your stereo. It is also resampled to a lower sampling rate, so it takes only half the storage space of the input.

The result of an MDCT is a spectrum with no time information left whatsoever. This gives you very detailed information on the frequency content of the block, but if there is, for example, a step in the input, you cannot tell from the output where it occurred (at least not very easily).

If you continued to divide a signal into more and more subbands until you finally had only one sample left in each band, you could say that it is a transform similar to the FT and (M)DCT.
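The trade-off described above is visible in the MDCT's shape alone: it consumes 2N time samples and emits N frequency coefficients, with no time axis left inside the block. Below is a direct O(N²) transcription of the usual MDCT definition; real codecs window the frame and use fast FFT-based algorithms, and the 36-sample frame here is just MP3's long-block case.

```python
import numpy as np

def mdct(frame):
    """Direct MDCT: 2N time-domain samples in, N frequency coefficients out.
    X[k] = sum_n x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))"""
    N = len(frame) // 2
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return (frame * np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))).sum(axis=1)

frame = np.random.randn(36)   # one MP3 long-block input per subband
coeffs = mdct(frame)
print(coeffs.shape)           # (18,): frequency lines only, no time axis
```

Reconstruction relies on overlapping consecutive frames by 50% and adding them (time-domain alias cancellation), which is why the transform can output only half as many coefficients as input samples and still be invertible.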
2Bdecided
post Dec 3 2003, 18:27
Post #5 (ReplayGain developer)



As Erik has suggested, at the limit, I don't think there has to be any difference:

http://www.hydrogenaudio.org/forums/index....pic=15384&st=37

Cheers,
David.
niktheblak
post Dec 3 2003, 23:43
Post #6



Thanks everyone! ErikS, your initial response finally put the pieces together for me.

So, a QMF filter outputs time-domain data. And I suspect the filtering process itself is done by time-domain convolution or something similar, no? Not FFT convolution, because that would itself convert the signal into the frequency domain for a while.

So MP2 and the like perform an FFT analysis on the signal in order to find acceptable quantization values for the subbands (total masking and other psychoacoustics), but the subband data itself (which is re-quantized and stored) never gets transformed into the frequency domain, right?

And in the case of transform coders, the signal is indeed divided into subbands, but at some point these (MDCT) band-dividing filters output frequency-domain data. If this is correct, then it would make perfect sense.

@Jasper

If you're talking about MP3 specifically, I don't think there is actually any point at all in applying the MDCT to the subband data. Layer 3 had to be backwards-compatible with Layer 2, hence the subband/transform hybrid structure. A bad decision by the MPEG organization, most people say. As was the missing scalefactor for the highest frequency band. Luckily we have AAC now.
wkwai
post Dec 4 2003, 08:22
Post #7 (MPEG4 AAC developer)



QUOTE (niktheblak @ Dec 3 2003, 02:43 PM)
If you're talking about MP3 specifically, I don't think there is actually any point at all in applying the MDCT to the subband data. Layer 3 had to be backwards-compatible with Layer 2, hence the subband/transform hybrid structure. A bad decision by the MPEG organization, most people say. As was the missing scalefactor for the highest frequency band. Luckily we have AAC now.

Not really true. You must remember that at the time MPEG-1 was standardized, computational power was still limited. A subband-based structure has its advantages, such as the possibility of implementing a scalable MP3 decoder even though the bitstream itself is not scalable.

I remember having a Xing software MPEG player. In those days, you could hardly decode VCD data on a Pentium 90 MHz, so it had an audio option for scaling the decoded audio from 44100 Hz down to 22050 or 11025 Hz, and then you could play a VCD in real time on your machine.

The filterbank itself is extremely computationally intensive. (I am not sure the fast MDCT/IMDCT algorithms had even been discovered back in 1990!) So having a scalable option was an advantage in those days. Today, however, for internet/broadcast applications, scalability includes the bitstream as well.
Gabriel
post Dec 4 2003, 10:54
Post #8 (LAME developer)



QUOTE
Not really true. You must remember that at the time MPEG-1 was standardized, computational power was still limited. A subband-based structure has its advantages, such as the possibility of implementing a scalable MP3 decoder even though the bitstream itself is not scalable.


How? Is it just by applying the inverse QMF to only half of the subbands?
Would this really be significant in terms of processing power?

I am surprised, because this is the first time someone has mentioned something that could be a valid, non-political reason for having used a hybrid transform in MP3.
Ivan Dimkovic
post Dec 4 2003, 12:39
Post #9 (Nero MPEG4 developer)



QUOTE
Not really true. You must remember that at the time MPEG-1 was standardized, computational power was still limited. A subband-based structure has its advantages, such as the possibility of implementing a scalable MP3 decoder even though the bitstream itself is not scalable.


But there was no such implementation (a scalable MP3 decoder), and I don't think any computational power would have been sacrificed by using the MDCT alone; even if normal power-of-two frames had been used, the complexity of some other modules would drop.

MP3 was in any case defined as the highest-complexity layer in MPEG-1/2, so that is one more reason not to think the QMF was put there because of complexity.
wkwai
post Dec 5 2003, 09:56
Post #10 (MPEG4 AAC developer)



Of course the complexity of the decoder will be halved if you choose 22050 Hz decoding; the rest of the modules are halved in complexity. If you choose 11025 Hz decoding, the decoding complexity would be about a quarter of that of the original full-rate decoder.

You have not seen a scalable MPEG-1 Layer II decoder? I saw one back in 1996. In those days, silicon was expensive! Today it would be a different story.

As for QMF, PQF, and wavelets: these filterbanks fall into the category of multirate filterbanks. You can implement a scalable MP3 decoder! From the same 44100 Hz bitstream, you can output 22050 Hz or 11025 Hz audio with half or a quarter of the decoder complexity.

As for whether a hybrid MDCT filterbank is more complex than a single MDCT filterbank, I haven't really done any measurements on this subject. But still, in my opinion, a scalable hybrid-MDCT decoder that decodes at 1/4 of the original bandwidth will definitely be less complex than a single IMDCT block decoding the full bandwidth. You also have to take into consideration the fact that the dequantization module does only 1/4 of the original work.

This post has been edited by wkwai: Dec 5 2003, 10:06
Gabriel
post Dec 5 2003, 10:51
Post #11 (LAME developer)



QUOTE
Of course the complexity of the decoder will be halved if you choose 22050 Hz decoding; the rest of the modules are halved in complexity.

Are you implying that it is possible (in the case of MP3) to halve the complexity of steps other than the QMF one?
wkwai
post Dec 6 2003, 09:00
Post #12 (MPEG4 AAC developer)



Of course: the dequantization module involves a power function. If you implement it on a fixed-point DSP, this power function is going to be very costly, unlike on the floating-point-enhanced Intel processors.

If your output is going to be at a 22050 Hz sampling rate, you only need to dequantize spectral lines 0 to 287 (taking the interleaved spectrum into account; long blocks, MP3), and you only use half the number of subbands to reconstruct the final output PCM data. The same applies to the intensity and M/S stereo modules. The Huffman decoding, I'm afraid, must still be done completely, so it is not a fully scalable decoder like the MPEG-4 AAC SSR profile, whose bitstream is scalable as well.
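The power function being discussed is MP3's requantization law, roughly xr = sign(x) · |x|^(4/3) · 2^((global_gain − 210)/4); the sketch below omits the scalefactor and pretab terms, so it is a simplification, not the full ISO formula. It also shows the half-rate idea from the post: a 22050 Hz decode only needs the first 288 of the 576 spectral lines in a granule.

```python
import numpy as np

def dequantize(iq, global_gain):
    """Simplified MP3-style inverse quantization: |x|^(4/3) power law
    scaled by 2^((global_gain - 210)/4). Scalefactors omitted."""
    gain = 2.0 ** ((global_gain - 210) / 4.0)
    return np.sign(iq) * np.abs(iq) ** (4.0 / 3.0) * gain

granule = np.arange(576) % 16           # stand-in for 576 quantized spectral lines
half = dequantize(granule[:288], 210)   # lines 0..287 suffice for 22050 Hz output
print(half.shape)                       # (288,): half the dequantization work
```

On a fixed-point DSP the x^(4/3) is typically realized with a lookup table or polynomial approximation, which is why halving the number of lines to dequantize is a real saving.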
wkwai
post Dec 18 2003, 10:32
Post #13 (MPEG4 AAC developer)



Oops!

I remembered something: the MP3 spec allows block switching to occur only above certain subbands. It is something like mixing short blocks with long blocks, unlike MPEG-4 AAC SSR, where you MUST switch all 4 subbands simultaneously!

Well, this might be useful, considering that you can detect attacks from the time-domain subband output samples and decide which subbands should be switched to short blocks.

As you will notice, most attacks occur in the upper bands.
menno
post Dec 18 2003, 10:51
Post #14 (Nero MPEG4 developer)



QUOTE (Gabriel @ Dec 4 2003, 10:54 AM)
QUOTE
Not really true. You must remember that at the time MPEG-1 was standardized, computational power was still limited. A subband-based structure has its advantages, such as the possibility of implementing a scalable MP3 decoder even though the bitstream itself is not scalable.


How? Is it just by applying the inverse QMF to only half of the subbands?
Would this really be significant in terms of processing power?

Yup; actually, even Winamp still has the Full, Half and Quarter "quality" settings. I'm not sure whether this applies only to the QMF or also to the processing done before the filterbank (dequantisation, scaling, M/S, etc.), but that should be possible. If so, it should indeed give savings of around a half and three quarters of the processing power, respectively.

Menno
Ivan Dimkovic
post Dec 18 2003, 10:51
Post #15 (Nero MPEG4 developer)



Hmm...

QUOTE
You have not seen a scalable MPEG-1 Layer II decoder? I saw one back in 1996. In those days, silicon was expensive! Today it would be a different story.


No, I was talking about scalable MPEG-1 Layer III decoders, and so far I haven't heard of any. Maybe you can point me to some vendors making that stuff?

QUOTE
Of course: the dequantization module involves a power function. If you implement it on a fixed-point DSP, this power function is going to be very costly, unlike on the floating-point-enhanced Intel processors.


Off topic, but there are some newer methods for reducing the complexity of inverse power-law quantization; check out Crystal Semiconductor's AES papers, if I remember correctly.

I know, these probably didn't exist in 1992 when the standard was released.


QUOTE
I remembered something: the MP3 spec allows block switching to occur only above certain subbands. It is something like mixing short blocks with long blocks, unlike MPEG-4 AAC SSR, where you MUST switch all 4 subbands simultaneously!

Well, this might be useful, considering that you can detect attacks from the time-domain subband output samples and decide which subbands should be switched to short blocks.


True, but IIRC mixed blocks in MP3 had some other nasty limitations that prevented them from being used in the default encoding modes of many encoders.
Ivan Dimkovic
post Dec 18 2003, 10:55
Post #16 (Nero MPEG4 developer)



QUOTE
No, I was talking about scalable MPEG-1 Layer III decoders, and so far I haven't heard of any. Maybe you can point me to some vendors making that stuff?


So... I stand corrected: Winamp still has scalable MP3 decoding.
MugFunky
post Dec 29 2003, 19:50
Post #17



QUOTE
True, but IIRC mixed blocks in MP3 had some other nasty limitations that prevented them from being used in the default encoding modes of many encoders.


Yeah, I read somewhere that a mixed block must be followed by a pure long block... and from what I read, the mixing only occurs above a certain scalefactor band (sfb4? I don't know), so it's a long block below that, with 2 short blocks sitting on top of it, packed together.
wkwai
post Dec 31 2003, 20:37
Post #18 (MPEG4 AAC developer)



Is there any problem with this approach? MP3 can also use short blocks throughout, but the characteristics of attacks are such that most sharp edges occur only in the upper subbands of the PQF filterbank, so it may not be necessary to code all subbands as short blocks, which would be wasteful.

From my experience with the MPEG-4 AAC SSR gain control, gain compensation is quite effective in eliminating pre-echo. In fact, Sony's ATRAC3, which uses the same gain control (though with slightly improved filterbanks), can handle the clip castanets.wav entirely in long-block mode.

But I wonder why this option is disabled for AAC?
MugFunky
post Jan 4 2004, 19:43
Post #19



@wkwai: you'd probably have more of a clue about this than me...

How is the output from the MP3 filterbank organised into the critical bands that actually get quantized? There are 32 subbands and 21 scalefactor bands... I don't know the boundaries of the sfbs, but I know (think?) the subbands are just equally spaced along the spectrum.

It's always confused me how that bit fits together...
wkwai
post Jan 5 2004, 08:39
Post #20 (MPEG4 AAC developer)



From the psychoacoustic perspective there are only about 24 critical bands. The ISO specs state that, for a better masking-threshold calculation, each critical band is further split into three partitions; you can split into finer or coarser bands if you wish.

In the case of MP3, the 32 subbands are equally spaced. The problem is that the sensitivity of human hearing isn't equally spaced: it is divided into 24 critical bands, and the 21 scalefactor bands, in my opinion, don't exactly match the 24 critical bands (measured in Barks).

In the case of AAC, a LONG block has 49 scalefactor bands, or almost twice the number of critical bands, so you can assume that each scalefactor band is roughly half a critical band.

However, what do scalefactor bands really represent physically? That I am not certain of. Why wouldn't they map the scalefactor bands as closely as possible to the critical bands?
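The critical-band (Bark) scale mentioned above has a common closed-form approximation due to Zwicker; note that codec psychoacoustic models typically use tabulated band edges rather than this formula. It makes the mismatch easy to quantify: an MPEG-1 subband is about 689 Hz wide everywhere, yet the lowest 689 Hz alone covers roughly a quarter of the 24 critical bands.

```python
import numpy as np

def hz_to_bark(f):
    """Zwicker's approximation of the Bark (critical-band) scale."""
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

print(hz_to_bark(20000.0))    # ~24.6: roughly 24 critical bands in the audible range
sub_bw = 44100.0 / 2 / 32     # one MPEG-1 subband is ~689 Hz wide
print(hz_to_bark(sub_bw))     # ~6.3: the first subband alone spans ~6 Barks
```

This is why uniform 32-band splitting is a poor match for hearing at low frequencies, and why scalefactor bands (which are narrow at low frequencies and wide at high frequencies) exist at all.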
QuantumKnot
post Jan 7 2004, 02:02
Post #21



I'd like some confirmation about whether this is correct:

Fourier-based transforms like the FFT and DCT, or variants of these, use basis functions that extend infinitely in time. Because of Gabor's uncertainty principle, these basis functions, which have no time resolution, have very fine frequency resolution (frequency lines). Because we only take the transform of a finite time frame, the windowing smudges these lines so they become thin frequency bins or bands.

The QMFs used in subband decomposition, or the wavelets in wavelet transforms, have bases that are finite in time and therefore have very wide frequency resolution (wide frequency bands). Hence, in order to get fine frequency resolution with subband decomposition, you have to cascade the filters recursively to subdivide the frequency bands. The advantage of subband decomposition over transform coding is that you can get non-uniform band splitting (so the bands can fall on the critical bands, let's say), while transforms only give you uniform band splitting.

In this respect, transforms are just a special case of subband decomposition, and forward and inverse transforms can be viewed as filterbanks in which the impulse responses of the filters are the time-domain versions of the transform basis functions.

Sounds OK?
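The transform-as-filterbank view can be checked numerically: a block DCT produces exactly the same coefficients as a bank of FIR filters whose impulse responses are the time-reversed basis functions, critically decimated by the block length. A small demonstration (unnormalized DCT-II bases; normalization would not affect the equality):

```python
import numpy as np

N = 8
n = np.arange(N)
# Unnormalized DCT-II basis: row k is one "filter" of the equivalent bank
basis = np.cos(np.pi / N * (n + 0.5)[None, :] * np.arange(N)[:, None])

x = np.random.randn(4 * N)
blocks = x.reshape(-1, N)

# View 1: block transform applied to each N-sample block
coeffs_transform = blocks @ basis.T

# View 2: filterbank -- FIR filter with the time-reversed basis, decimate by N
coeffs_filter = np.empty_like(coeffs_transform)
for k in range(N):
    y = np.convolve(x, basis[k][::-1])   # impulse response = reversed basis fn
    coeffs_filter[:, k] = y[N - 1 :: N]  # critical decimation: every Nth output

print(np.allclose(coeffs_transform, coeffs_filter))   # True
```

The same identity holds for the MDCT, except that its filterbank is decimated by N while the filters are 2N taps long (a 50%-overlapped, "lapped" transform), which is exactly what lets it cancel blocking artifacts.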
