downside of transform coders / enhancement idea, Your opinions wanted ...
SebastianG
post Mar 22 2004, 17:28
Post #1





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



What do you think are the weaknesses of common transform coders for high-quality encoding?
Can it be summarized as "lack of high temporal resolution => pulpy transients"?
I'm just curious about others' opinions on this.

I'm currently experimenting with a new (MDCT based) filterbank.
It looks very promising.

A comparison to a hybrid QMF/Wavelet+MDCT approach like ATRAC:

pros:
- perfect reconstruction
- completely MDCT based (no need to worry about designing QMF filters)
- even more flexible (frame- and frequency-adaptive) selectable time/frequency resolutions

cons:
- slightly more transform work to do
- introduces a higher delay (depends on whether I use temporal alias reduction or not)

Such an approach could result in a highly scalable codec. Any comments?
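For reference, the perfect-reconstruction property of the plain MDCT (the building block mentioned above) can be checked numerically. This is an illustrative sketch, not the filterbank under discussion: a textbook MDCT with the standard sine window and 50% overlap, where overlap-add of the inverse transforms cancels the time-domain alias exactly.

```python
import numpy as np

def mdct(block, window):
    """Forward MDCT of a 2N-sample block -> N coefficients."""
    N = len(block) // 2
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ (window * block)

def imdct(coeffs, window):
    """Inverse MDCT -> windowed 2N-sample block; overlap-add to reconstruct."""
    N = len(coeffs)
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * window * (basis @ coeffs)

N = 64
# Sine window satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1.
window = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))
x = np.random.default_rng(0).standard_normal(4 * N)

# Analyze overlapping blocks (hop = N), then overlap-add the inverses.
recon = np.zeros_like(x)
for start in range(0, len(x) - 2 * N + 1, N):
    recon[start:start + 2 * N] += imdct(mdct(x[start:start + 2 * N], window), window)

# Interior samples (covered by two analysis windows) come back exactly.
print(np.allclose(recon[N:3 * N], x[N:3 * N]))
```

The critically sampled MDCT discards half the information per block; reconstruction only becomes exact once neighboring blocks are overlap-added, which is the time-domain alias cancellation the thread keeps referring to.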

This post has been edited by SebastianG: Mar 22 2004, 18:25
wkwai
post Mar 29 2004, 15:19
Post #2


MPEG4 AAC developer


Group: Developer
Posts: 398
Joined: 1-June 03
Member No.: 6943



Actually, I don't really understand why the MDCT is called a transform. It falls under the class of perfect-reconstruction (PR) cosine-modulated subband filters; it is basically a type of QMF filterbank!

This post has been edited by wkwai: Mar 29 2004, 15:21
petracci
post Mar 29 2004, 15:44
Post #3





Group: Members
Posts: 95
Joined: 18-December 01
Member No.: 678



QUOTE
I'm currently experimenting with a new (MDCT based) filterbank.
It looks very promising.

pros:
- perfect reconstruction
- completely MDCT based (no need to worry about designing QMF filters)
- even more flexible (frame- and frequency-adaptive) selectable time/frequency resolutions

Such an approach could result in a highly scalable codec. Any comments?


Interesting. Could you shed some more light on how to achieve a more flexible time/frequency resolution? And what criteria are you using for choosing a specific resolution?

Greetz,
SebastianG
post Mar 31 2004, 16:44
Post #4





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



QUOTE (petracci @ Mar 29 2004, 06:44 AM)
Interesting. Could you shed some more light on how to achieve a more flexible time/frequency resolution? And what criteria are you using for choosing a specific resolution?

Hard to explain in a few words...
It's basically the same as Vorbis' MDCT for the first stage. Then the temporal resolution of some frequency regions is increased by treating the MDCT coefficients of the first stage as another time signal, which is again MDCT-transformed in smaller chunks. Due to the frequency/time duality, this second "decorrelation" increases the time resolution for those bands. To compensate for the time-domain alias introduced in the first stage, some "butterflies" can be applied in the new "subband transform domain" between frames.

To give an example of what time/frequency resolutions are possible:

1024-sample-block {
0-4134 Hz: 192 bands, each a single sample
4134-11025 Hz: 40 bands of 8 samples each
11025-22050 Hz: 16 bands of 32 samples each
}
// followed by
128-sample-block {
0-11025 Hz: 64 bands, each a single sample
11025-22050 Hz: 16 bands of 4 samples each
}
// and so on

In this example, "butterflies" can be used for the 4134-22050 Hz region between these two blocks to remove the temporal alias introduced in the first MDCT stage (due to windowing).
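As a sanity check on the layouts above, each decomposition tiles its block exactly: the band counts times the samples per band sum to the block length.

```python
# (band count, samples per band) per frequency region, from the example above
layouts = {
    1024: [(192, 1), (40, 8), (16, 32)],  # 0-4134 / 4134-11025 / 11025-22050 Hz
    128:  [(64, 1), (16, 4)],             # 0-11025 / 11025-22050 Hz
}
for block_length, bands in layouts.items():
    total = sum(count * samples for count, samples in bands)
    print(block_length, total == block_length)  # 1024 True, then 128 True
```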

Criteria: (1) high spectral resolution vs. (2) high temporal resolution:
With (1) it's possible to accurately spectrally shape the quantization noise. It usually concentrates energy in a few bins (a compact energy representation) for tonal/stationary parts.
(2) allows accurate control of the quantization noise's temporal shape and is better suited to quantizing noisy parts at low SNRs.

edit: fixed example

bye,
Sebastian

This post has been edited by SebastianG: Mar 31 2004, 16:50
petracci
post Apr 2 2004, 09:05
Post #5





Group: Members
Posts: 95
Joined: 18-December 01
Member No.: 678



QUOTE
It's basically the same as Vorbis' MDCT for the first stage. Then the temporal resolution of some frequency regions is increased by treating the MDCT coefficients of the first stage as another time signal, which is again MDCT-transformed in smaller chunks. Due to the frequency/time duality, this second "decorrelation" increases the time resolution for those bands.


This is similar to the method of Purat and Noll, in "A new orthonormal wavelet packet decomposition for audio coding using frequency-varying modulated lapped transforms", ICASSP '96.

QUOTE
To compensate for the time-domain alias introduced in the first stage, some "butterflies" can be applied in the new "subband transform domain" between frames.

Since at both stages you use an orthonormal transform, I do not see why these antialiasing butterflies are necessary.

QUOTE
Criteria: (1) high spectral resolution vs. (2) high temporal resolution:
With (1) it's possible to accurately spectrally shape the quantization noise. It usually concentrates energy in a few bins (a compact energy representation) for tonal/stationary parts.
(2) allows accurate control of the quantization noise's temporal shape and is better suited to quantizing noisy parts at low SNRs.

I understand that (1) and (2) are properties of the resulting transform, but I was interested in the criteria you use to decide on high spectral resolution for stationary/tonal parts versus high temporal resolution for non-stationary/transient parts. Are you using perceptual entropy, transient detection, or analysis-by-synthesis methods?

I would be very interested in your results. I agree that with these adaptations you could achieve a high level of scalability. However, there is one definite "con" that you did not mention: side information. You have to tell the decoder which transform structure you used. For adaptive framing this side information is negligible, but for adaptive frequency decompositions it is not, even when you entropy-code the side info.

Greetz,

Petracci
SebastianG
post Apr 5 2004, 16:05
Post #6





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



QUOTE (petracci @ Apr 2 2004, 12:05 AM)
This is similar to the method of Purat and Noll, in "A new orthonormal wavelet packet decomposition for audio coding using frequency-varying modulated lapped transforms", ICASSP '96.

Hmm... is this paper publicly available?
I tried CiteSeer but was not able to download it.

QUOTE
QUOTE
To compensate for the time-domain alias introduced in the first stage, some "butterflies" can be applied in the new "subband transform domain" between frames.
Since at both stages you use an orthonormal transform, I do not see why these antialiasing butterflies are necessary.

They are necessary because each of the N MDCT coefficients affects up to 2N time samples. Consider MDCT-transforming a unit impulse: the pulse is covered by 2 windows and therefore affects 2 MDCT spectra. The butterflies can be used to reduce this alias effect after the 2nd stage so that only one pulse appears in the transformed version.
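The impulse-spreading argument can be demonstrated directly with a textbook MDCT (an illustrative sketch, not the butterfly scheme itself): a unit impulse placed in the overlap region shows up in two frames' spectra, and reconstructing either frame alone leaves a time-domain alias that only cancels once both frames are combined.

```python
import numpy as np

def mdct(block, window):
    N = len(block) // 2
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ (window * block)

def imdct(coeffs, window):
    N = len(coeffs)
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * window * (basis @ coeffs)

N = 32
window = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))  # sine window
x = np.zeros(3 * N)
x[N + 5] = 1.0  # unit impulse inside the overlap of two analysis windows

spec1 = mdct(x[0:2 * N], window)  # frame starting at sample 0
spec2 = mdct(x[N:3 * N], window)  # frame starting at sample N
print(np.any(spec1 != 0), np.any(spec2 != 0))  # the pulse affects both spectra

y1, y2 = imdct(spec1, window), imdct(spec2, window)
alone = np.allclose(y1[N:2 * N], x[N:2 * N])               # one frame alone: aliased
combined = np.allclose(y1[N:2 * N] + y2[0:N], x[N:2 * N])  # overlap-add: alias cancels
print(alone, combined)
```

So any second-stage processing that looks at one frame's coefficients in isolation sees the impulse plus its alias, which is why some cross-frame alias-reduction mechanism is needed.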

QUOTE
I understand that (1) and (2) are properties of the resulting transform, but I was interested in the criteria you use to decide on high spectral resolution for stationary/tonal parts versus high temporal resolution for non-stationary/transient parts. Are you using perceptual entropy, transient detection, or analysis-by-synthesis methods?

I'm not doing anything in that direction right now. I've just tinkered a bit with this filterbank idea, checking the impulse responses of the inverse filterbank for different pulses in the transform domain. I posted it here to discuss the approach. Well, I didn't give many details in the first place, but it's hard to explain.

The main difference compared to other hybrid approaches is that the first stage decomposes the signal with a very high spectral resolution and then partly reverts it for some bands, whereas common hybrid filterbanks decompose the signal into broader subbands in the first stage and do further band-splitting in the 2nd stage.
(But we can always reduce the alias effect of the first stage after the 2nd stage by applying alias-reduction butterflies.)

QUOTE
However, there is one definite "con" that you did not mention: side information. You have to tell the decoder which transform structure you used. For adaptive framing this side information is negligible, but for adaptive frequency decompositions it is not, even when you entropy-code the side info.

Yes, therefore I don't think it makes much sense to allow all transform variants, just a few that prove to be a good choice in most situations. The side information will be negligible for just 8 transform variants, for example.
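To put that in numbers (a rough estimate, reusing the 1024-sample block length from the earlier example and a 44.1 kHz sample rate): signaling one of 8 fixed layouts costs 3 bits per frame, i.e. on the order of a hundred bits per second.

```python
import math

variants = 8
bits_per_frame = math.ceil(math.log2(variants))  # 3 bits to signal one of 8 layouts
frame_length, sample_rate = 1024, 44100
frames_per_second = sample_rate / frame_length   # about 43 frames per second
side_info_bps = bits_per_frame * frames_per_second
print(bits_per_frame, round(side_info_bps))      # prints: 3 129
```

At typical high-quality bitrates (say 128 kbit/s and up) this is indeed negligible, whereas a fully free per-band decomposition would need far more signaling.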

Maybe you want to check out my view of the MDCT, which explains to some extent why the butterflies after the 2nd stage can be used to cancel the first set of butterflies.
see thread here

bye,
Sebastian

edit: fixed some typos (probably not all)

This post has been edited by SebastianG: Apr 6 2004, 18:09
