The MP3 Polyphase Filter Bank, About the filter, and a possible GPU speedup
JacobG
post Feb 4 2012, 14:31
Post #1

Hello!

For a completely different application, I designed a 32-subband filter bank in MATLAB and implemented it in CUDA for a GPU.
It currently uses 64 filters, a single 64-point FFT, and decimation/interpolation by 32,
but these details aren't critical, since the CUDA implementation can easily be changed.
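To make the structure concrete, here is a rough NumPy sketch of a generic polyphase analysis filter bank with the same shape as the design described above (64 branches, one 64-point FFT per frame, decimation by 32). This is not the poster's actual code, and conventions for branch ordering and phase vary between designs:

```python
import numpy as np

def polyphase_analysis(x, h, n_branches=64, hop=32):
    """Generic polyphase analysis filter bank.

    The prototype filter h is split into n_branches polyphase components,
    each branch computes one dot product per frame, and a single
    n_branches-point FFT combines the branch outputs. Advancing the input
    by `hop` (< n_branches) gives oversampled subband signals.
    """
    taps_per_branch = len(h) // n_branches
    # Polyphase decomposition of the prototype filter.
    poly = h[:taps_per_branch * n_branches].reshape(taps_per_branch, n_branches)
    n_frames = (len(x) - len(h)) // hop
    out = np.zeros((n_frames, n_branches), dtype=complex)
    for m in range(n_frames):
        # The most recent len(h) input samples, newest first.
        seg = x[m * hop : m * hop + taps_per_branch * n_branches][::-1]
        seg = seg.reshape(taps_per_branch, n_branches)
        branch_sums = (seg * poly).sum(axis=0)  # one dot product per branch
        out[m] = np.fft.fft(branch_sums)        # single 64-point FFT per frame
    return out
```

With a 512-tap prototype (8 taps per branch) and a 2048-sample input, this produces 48 frames of 64 complex subband samples each.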

Since I read that the MP3 filter bank also divides the signal into 32 sub-bands,
I was wondering whether my GPU filter bank could be used for MP3 decoding.

My questions are:
1. Is the total number of taps in the MP3 filter 512, or is it 512 taps per branch (512*32 in total)?
2. How good is the isolation between the different sub-bands? Do they overlap (can one input frequency produce a response in more than one band)? Are "holes", by which I mean dead areas between the sub-bands, allowed?
3. Is this filtering considered an expensive operation in MP3 decoding? On what timescale does it operate for, say, 512 samples: milliseconds? Microseconds?
4. Does the format mandate a specific filter for decoding, or is there freedom to implement the filter bank with a different structure?

Thanks for reading,
Jacob
saratoga
post Feb 4 2012, 21:01
Post #2

The filterbank (including the DFT) is the slowest part of the decode process, using roughly half the total time. However, for decoding at least, the filterbank only uses 10-20 MHz (or perhaps lower with SIMD), so I'm not sure CUDA makes sense.

If you're interested, I've been working little by little on improving filterbank performance for various embedded devices, where performance can make quite a difference due to battery limitations.
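For context, the synthesis filterbank defined in the MPEG-1 audio spec consumes 32 subband samples at a time and produces 32 PCM samples via a 64-point cosine "matrixing" step, a 1024-sample FIFO, and a single shared 512-tap window. A rough NumPy sketch of that data flow (the real `D` window table comes from the spec; any 512-tap array stands in for it here):

```python
import numpy as np

def mp3_synthesis_frame(subbands, V, D):
    """One step of the MPEG-1-style synthesis filterbank:
    32 subband samples in, 32 PCM samples out.

    V is the 1024-sample FIFO state carried between calls.
    D is the 512-tap synthesis window (the spec's D table;
    a placeholder array works for demonstrating the data flow)."""
    # 64-point matrixing: N[i, k] = cos((16 + i) * (2k + 1) * pi / 64)
    i = np.arange(64)[:, None]
    k = np.arange(32)[None, :]
    N = np.cos((16 + i) * (2 * k + 1) * np.pi / 64)
    V = np.roll(V, 64)
    V[:64] = N @ subbands
    # Build the 512-sample U vector from alternating halves of V.
    U = np.zeros(512)
    for n in range(8):
        U[n * 64 : n * 64 + 32] = V[n * 128 : n * 128 + 32]
        U[n * 64 + 32 : n * 64 + 64] = V[n * 128 + 96 : n * 128 + 128]
    # Window with the single shared 512-tap window, then fold to 32 samples.
    W = U * D
    pcm = W.reshape(16, 32).sum(axis=0)
    return pcm, V
```

Note that there is one 512-tap window shared across all 32 bands, not one per branch, which also answers question 1 above.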
JacobG
post Feb 4 2012, 23:09
Post #3

QUOTE (saratoga @ Feb 4 2012, 22:01) *
The filterbank (including the DFT) is the slowest part of the decode process, using roughly half the total time. However, for decoding at least, the filterbank only uses 10-20 MHz (or perhaps lower with SIMD), so I'm not sure CUDA makes sense.

If you're interested, I've been working little by little on improving filterbank performance for various embedded devices, where performance can make quite a difference due to battery limitations.


Hi,
If I understand correctly, you mean there isn't much to gain from speeding up the filter bank, since it operates at a slow rate anyway?

In your work on improving filterbank performance, are you modifying the design itself (changing the prototype filter, structure, etc.),
or are you making the filterbank code smarter?
I am curious whether there is any point in experimenting with MP3 decoding using a different filter bank. For example, could a different prototype filter give better sound quality?
lvqcl
post Feb 4 2012, 23:26
Post #4

http://wiki.hydrogenaudio.org/index.php?ti...ng_of_MP3_audio

QUOTE
Decoding [...] is carefully defined in the standard. Most decoders are "bitstream compliant", meaning that the decompressed output they produce from a given MP3 file will be the same (within a specified degree of rounding tolerance) as the output specified mathematically in the ISO/IEC standard document.
saratoga
post Feb 4 2012, 23:33
Post #5

QUOTE (JacobG @ Feb 4 2012, 17:09) *
If I understand right, you mean that there isn't much of a gain in speeding up the filter bank, since it operates in a slow rate anyway?


Speeding up the filterbank is generally useful, but I don't see much point in using a GPU. It happens fast enough that the CPU is sufficient. Not sure the overhead and synchronization with another processor is worthwhile.

QUOTE (JacobG @ Feb 4 2012, 17:09) *
In your work in improving the filterbank performance, are you modifying the design itself (changing the prototype filter, structure, etc.),
or are you making the filter bank code smarter?


Just better implementations for various CPUs. We've come up with different variations for different CPUs:

http://git.rockbox.org/?p=rockbox.git;a=bl...c920f3b;hb=HEAD
http://git.rockbox.org/?p=rockbox.git;a=bl...7d540b9;hb=HEAD

I haven't updated it much, but here is a work-in-progress ARMv5 version:

http://www.rockbox.org/tracker/task/11759

QUOTE (JacobG @ Feb 4 2012, 17:09) *
I am curious whether there is any point in experimenting with MP3 decoding using a different filter bank. For example, could a different prototype filter give better sound quality?


You can cheat a little to improve speed at the expense of accuracy, but usually it's not a good idea.
knutinh
post Feb 4 2012, 23:59
Post #6

Perhaps GPU-based transcoding could be interesting. Doing thousands of files in a batch means a large potential for threading/vectorization.

-k
Canar
post Feb 5 2012, 01:41
Post #7

QUOTE (knutinh @ Feb 4 2012, 14:59) *
Perhaps GPU-based transcoding could be interesting. Doing 1000s of files in a batch means large potential for threading/vectorization.
You're going to run into I/O slowdown far before you can utilize that degree of parallelism.


saratoga
post Feb 5 2012, 02:32
Post #8

QUOTE (Canar @ Feb 4 2012, 19:41) *
QUOTE (knutinh @ Feb 4 2012, 14:59) *
Perhaps GPU-based transcoding could be interesting. Doing 1000s of files in a batch means large potential for threading/vectorization.
You're going to run into I/O slowdown far before you can utilize that degree of parallelism.


Being I/O limited would be a nice problem to have. Particularly for people with SSDs.
knutinh
post Feb 5 2012, 13:18
Post #9

QUOTE (Canar @ Feb 5 2012, 02:41) *
QUOTE (knutinh @ Feb 4 2012, 14:59) *
Perhaps GPU-based transcoding could be interesting. Doing 1000s of files in a batch means large potential for threading/vectorization.
You're going to run into I/O slowdown far before you can utilize that degree of parallelism.

Perhaps. But if the source and destination formats are both low-bandwidth, incompatible formats with complex transforms, you can fit a great many streams within 50 MB/s or 400 MB/s.

I think that exploiting the theoretical FLOP numbers of current GPUs for non-graphics work is very hard. However, even a 2x or 3x speedup compared to the CPU might be worthwhile for some, especially if it frees up a precious resource (the CPU) while using an otherwise idle one (the GPU).
-k
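As a back-of-the-envelope check of that bandwidth argument (the 128 kbps bitrate and 50 MB/s figure are just illustrative numbers):

```python
def streams_per_bandwidth(disk_mb_per_s, stream_kbps):
    """How many compressed audio streams fit in a given disk bandwidth?
    Assumes decimal units: 1 MB/s = 1,000,000 bytes/s, 1 kbps = 1000 bits/s."""
    stream_bytes_per_s = stream_kbps * 1000 / 8  # kilobits -> bytes per second
    return int(disk_mb_per_s * 1_000_000 // stream_bytes_per_s)

# e.g. 128 kbps MP3 source streams against a 50 MB/s disk
print(streams_per_bandwidth(50, 128))  # → 3125 streams
```

So even a modest spinning disk can feed thousands of concurrent 128 kbps streams before I/O becomes the bottleneck, which supports the point that compute, not I/O, dominates for batch transcoding of low-bitrate formats.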