Audio mixing: channels stored interleaved vs. not, Recommended strategies?
sheh
post Oct 27 2012, 20:26
Post #1





Group: Members
Posts: 89
Joined: 3-November 04
Member No.: 17971



When downmixing non-interleaved multichannel audio, does it make sense to interleave it first, or just process it as it is?

If as is, sample by sample jumping between channels, or a buffer-full of each channel at a time?
Dynamic
post Oct 28 2012, 01:21
Post #2





Group: Members
Posts: 803
Joined: 17-September 06
Member No.: 35307



I'm not sure I understand what you want to achieve. Are you just trying to make it work, or designing a computationally efficient algorithm, e.g. to minimize random reads from disk? Is this a downmix of e.g. 5.1 to stereo, or a multi-track studio recording? Are you using existing software or writing your own? Is there a software environment, such as Audacity, or just a bunch of files, one per channel or track?

sheh
post Oct 28 2012, 05:28
Post #3





Group: Members
Posts: 89
Joined: 3-November 04
Member No.: 17971



I mean efficiency-wise for an algorithm (x87 and later SSE) for data in memory. Downmix an arbitrary number of input channels: multiply/add input channels to output channels according to a mix matrix. Another point, relevant to SSE, is that the data is only 4-byte aligned.
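As a point of reference, the mix-matrix operation described above might look roughly like this in plain C when the channels are kept non-interleaved. This is only a sketch: the name `downmix_planar`, the `float` sample type and the row-major matrix layout are assumptions made for illustration, not anything specified in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mix num_in planar input channels into num_out planar
 * output channels. The matrix is assumed row-major: matrix[o * num_in + i]
 * is the gain applied to input i when it is summed into output o. */
void downmix_planar(const float *const *in, size_t num_in,
                    float *const *out, size_t num_out,
                    const float *matrix, size_t frames)
{
    assert(num_in > 0 && num_out > 0);
    for (size_t o = 0; o < num_out; ++o) {
        float *dst = out[o];
        for (size_t n = 0; n < frames; ++n)
            dst[n] = 0.0f;
        /* One sequential pass over each input channel per output channel
         * ("a buffer-full of each channel at a time"). */
        for (size_t i = 0; i < num_in; ++i) {
            const float gain = matrix[o * num_in + i];
            const float *src = in[i];
            for (size_t n = 0; n < frames; ++n)
                dst[n] += gain * src[n];
        }
    }
}
```

The inner loop touches only two contiguous buffers at a time, which is the access pattern the rest of the thread compares against interleaved processing.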

This post has been edited by sheh: Oct 28 2012, 05:29
Dynamic
post Oct 28 2012, 11:00
Post #4





Group: Members
Posts: 803
Joined: 17-September 06
Member No.: 35307



Especially with it being in fast memory rather than disk I'd have thought there's not a lot of difference between

INTERLEAVE ALL THE AUDIO from 6 channels - THEN DOWNMIX from one interleave into, say 2 channels, one sample at a time

and

DOWNMIX from 6 independent channels into 2 channels, one sample at a time.

To me, the latter seems easier if it's in memory.


Maybe someone who has done this already can make a recommendation or you can hack together some quick code each way and simply test it for speed. You don't need to implement anything complicated like the mixdown and whether or not to dither down the mixed stream.

E.g. populate your memory with 6 streams of random numbers, then just read in the data and do something simple, like copying each sample value to a variable, then discarding it and moving on to the next channel / sample. You could even test it only on a block of data of fixed size and use a for loop to dispense with checking for end of stream, and simply create a few megabytes of fixed-value data on 6 channels in memory at the start.

Put the whole lot in a for... next loop and repeat enough times for decent timing accuracy and you'll know which is faster using your hardware and your compiler.

Then implement the faster method properly for your requirements.
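A quick-and-dirty timing harness along those lines could be sketched as follows. The function names, the 6-channel constant, the trivial "sum all channels to mono" workload and the use of `clock()` are all assumptions chosen to keep the example short; it is not a rigorous benchmark.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define CHANNELS 6

/* Variant A: mix directly from the separate per-channel buffers. */
static void mix_direct(float *const *ch, float *out, size_t frames)
{
    for (size_t n = 0; n < frames; ++n) {
        float acc = 0.0f;
        for (size_t c = 0; c < CHANNELS; ++c)
            acc += ch[c][n];
        out[n] = acc;
    }
}

/* Variant B: copy everything into an interleaved scratch buffer first,
 * then mix from that copy. */
static void mix_interleave_first(float *const *ch, float *scratch,
                                 float *out, size_t frames)
{
    for (size_t c = 0; c < CHANNELS; ++c)
        for (size_t n = 0; n < frames; ++n)
            scratch[n * CHANNELS + c] = ch[c][n];
    for (size_t n = 0; n < frames; ++n) {
        float acc = 0.0f;
        for (size_t c = 0; c < CHANNELS; ++c)
            acc += scratch[n * CHANNELS + c];
        out[n] = acc;
    }
}

/* Times both variants over `repeats` runs on random data.
 * Returns 0 on success, -1 on allocation failure or output mismatch. */
int run_benchmark(size_t frames, int repeats,
                  double *secs_direct, double *secs_interleaved)
{
    float *ch[CHANNELS] = {0};
    float *scratch = malloc(frames * CHANNELS * sizeof *scratch);
    float *out_a = malloc(frames * sizeof *out_a);
    float *out_b = malloc(frames * sizeof *out_b);
    int ok = scratch && out_a && out_b;

    for (size_t c = 0; c < CHANNELS; ++c) {
        ch[c] = malloc(frames * sizeof **ch);
        ok = ok && ch[c];
    }
    if (ok) {
        for (size_t c = 0; c < CHANNELS; ++c)
            for (size_t n = 0; n < frames; ++n)
                ch[c][n] = (float)rand() / (float)RAND_MAX;

        clock_t t0 = clock();
        for (int r = 0; r < repeats; ++r)
            mix_direct(ch, out_a, frames);
        clock_t t1 = clock();
        for (int r = 0; r < repeats; ++r)
            mix_interleave_first(ch, scratch, out_b, frames);
        clock_t t2 = clock();

        *secs_direct = (double)(t1 - t0) / CLOCKS_PER_SEC;
        *secs_interleaved = (double)(t2 - t1) / CLOCKS_PER_SEC;

        /* Sanity check: both variants add the same values in the same order. */
        for (size_t n = 0; n < frames; ++n)
            if (out_a[n] != out_b[n])
                ok = 0;
    }
    for (size_t c = 0; c < CHANNELS; ++c)
        free(ch[c]);
    free(scratch);
    free(out_a);
    free(out_b);
    return ok ? 0 : -1;
}
```

Something like `run_benchmark(1 << 20, 100, &a, &b)` would give a first impression. `clock()` has coarse resolution and an optimizing compiler may restructure the loops, so treat the numbers as a rough guide only.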
sheh
post Oct 28 2012, 13:07
Post #5





Group: Members
Posts: 89
Joined: 3-November 04
Member No.: 17971



I do plan to do some benchmarking eventually, but I was hoping someone had already explored it. Besides differences in results between CPUs and between channel counts, I expect small things I may not think of can tip the scales.
chi
post Oct 28 2012, 15:40
Post #6





Group: Members
Posts: 45
Joined: 27-November 11
Member No.: 95439



Well, in the interleave-first case, you do: 1. read samples from various places, 2. store samples that belong together in a new place, 3. read samples from the new place, 4. multiply/add, 5. store result.
In the process-as-is case, you do: 1. read samples from various places, 4. multiply/add, 5. store result.
Hmm … Will adding additional steps (2 and 3) make a program run faster, especially if they involve memory access where the data does not fit into the cache? I very much doubt it, but perhaps I am missing something.

This post has been edited by chi: Oct 28 2012, 15:40
sheh
post Oct 28 2012, 16:51
Post #7





Group: Members
Posts: 89
Joined: 3-November 04
Member No.: 17971



I don't know. Interleave-first can at least read sequentially (one pass per channel), and the final interleaved data will be 16-byte aligned.
lvqcl
post Oct 28 2012, 17:11
Post #8





Group: Developer
Posts: 3358
Joined: 2-December 07
Member No.: 49183



"arbitrary number of input channels" means that SSE code will not be efficient for interleaved data.
sheh
post Oct 28 2012, 18:38
Post #9





Group: Members
Posts: 89
Joined: 3-November 04
Member No.: 17971



Why not? Just doing 4 at once. The end could have special handling.
lvqcl
post Oct 28 2012, 18:50
Post #10





Group: Developer
Posts: 3358
Joined: 2-December 07
Member No.: 49183



So interleaved audio is "4 samples from channel #1, then 4 samples from channel #2, ..."?
sheh
post Oct 28 2012, 19:25
Post #11





Group: Members
Posts: 89
Joined: 3-November 04
Member No.: 17971



No, it's sample 0 for channels 0..N, sample 1, etc. But why not process 4 channels at a time?
saratoga
post Oct 28 2012, 22:25
Post #12





Group: Members
Posts: 4904
Joined: 2-September 02
Member No.: 3264



QUOTE (sheh @ Oct 28 2012, 14:25) *
No, it's sample 0 for channels 0..N, sample 1, etc. But why not process 4 channels at a time?


For a fixed number of channels, it's probably not too much different either way, at least assuming you're on an SSE flavor that can do some kind of scatter/gather loads (IIRC newer flavors have this). On systems without this (e.g. older ARM without NEON), interleaved is likely to be slower due to load/store throughput and register space.

For a variable number of channels though, that could be tough even on modern CPUs. You might have to special case each number to get efficient processing.
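To illustrate why a variable channel count is awkward, here is a small sketch of which channel ends up in each lane when interleaved data is read four floats at a time. The helper names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Channel held by lane `lane` (0..3) of the 4-float vector that starts at
 * float index 4 * vec in an interleaved stream with `channels` channels. */
size_t lane_channel(size_t vec, size_t lane, size_t channels)
{
    assert(channels > 0 && lane < 4);
    return (4 * vec + lane) % channels;
}

/* Number of 4-float vectors after which the lane-to-channel pattern repeats,
 * i.e. lcm(4, channels) / 4 = channels / gcd(4, channels). */
size_t pattern_period(size_t channels)
{
    assert(channels > 0);
    size_t a = 4, b = channels;
    while (b != 0) {            /* Euclid: a ends up as gcd(4, channels) */
        size_t t = a % b;
        a = b;
        b = t;
    }
    return channels / a;
}
```

With 2 or 4 channels every vector has the same layout, so one shuffle pattern is enough. With 6 channels the layout cycles through three different patterns (channels 0-3, then 4, 5, 0, 1, then 2-5), and with 5 channels through five, which is where the per-channel-count special cases come from.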
sheh
post Oct 31 2012, 17:06
Post #13





Group: Members
Posts: 89
Joined: 3-November 04
Member No.: 17971



QUOTE (saratoga @ Oct 28 2012, 23:25) *
For a fixed number of channels, it's probably not too much different either way, at least assuming you're on an SSE flavor that can do some kind of scatter/gather loads
I don't think any SSE does scatter/gather, but anyway I'm only looking at SSE1.

QUOTE
On systems without this (e.g. older ARM without NEON), interleaved is likely to be slower due to load/store throughput
You mean the initial copy-to-interleaved?

QUOTE
and register space.
What do you mean?
benski
post Oct 31 2012, 17:20
Post #14


Winamp Developer


Group: Developer
Posts: 670
Joined: 17-July 05
From: Brooklyn, NY
Member No.: 23375



You can simulate scatter/gather (for de-interleaving) with either shufps or unpckhps/unpcklps, but it's only really efficient for 2 or 4 channels. It'd be far better to just use separate per-channel (non-interleaved) buffers, perform your processing on them and then interleave only when necessary. Interleaved audio is a relic from WAV and CDDA and is generally an inefficient way of dealing with multichannel audio data.
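For the 4-channel case, a de-interleave along those lines might look like the sketch below, using SSE1 intrinsics. The function name and the requirement that `frames` be a multiple of 4 are assumptions for brevity. `_MM_TRANSPOSE4_PS` is the standard macro from `xmmintrin.h`, which compilers typically expand into shuffle/unpack-style instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE1 intrinsics (x86 only) */

/* Hypothetical helper: split interleaved 4-channel float audio into four
 * planar buffers. Assumes frames is a multiple of 4. */
void deinterleave4(const float *in, float *out0, float *out1,
                   float *out2, float *out3, size_t frames)
{
    assert(frames % 4 == 0);
    for (size_t n = 0; n < frames; n += 4) {
        /* Each row is one frame (c0 c1 c2 c3). Unaligned loads are used
         * because the data may be only 4-byte aligned. */
        __m128 r0 = _mm_loadu_ps(in + 4 * n);
        __m128 r1 = _mm_loadu_ps(in + 4 * n + 4);
        __m128 r2 = _mm_loadu_ps(in + 4 * n + 8);
        __m128 r3 = _mm_loadu_ps(in + 4 * n + 12);
        /* 4x4 transpose: afterwards each row holds four consecutive
         * samples of a single channel. */
        _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
        _mm_storeu_ps(out0 + n, r0);
        _mm_storeu_ps(out1 + n, r1);
        _mm_storeu_ps(out2 + n, r2);
        _mm_storeu_ps(out3 + n, r3);
    }
}
```

This works out neatly only because the channel count matches the vector width. Other channel counts need different shuffle sequences.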
saratoga
post Oct 31 2012, 22:15
Post #15





Group: Members
Posts: 4904
Joined: 2-September 02
Member No.: 3264



QUOTE (sheh @ Oct 31 2012, 12:06) *
QUOTE
and register space.
What do you mean?


If you run low on register space, you will likely find that interleaved data is much harder to process efficiently. For example, when using shufps in SSE to do a gather as in benski's example, you cannot simply load one 128-bit register of 4 consecutive singles, since with two interleaved channels that gives you only 64 bits' worth of each channel. This means that if you want 4 consecutive values from one channel (say, to implement an FIR filter), you'll have to load two 128-bit values at once and thus need two registers. This may (or may not, depending on what you are doing) cause you to run out of registers and have to resort to a less efficient algorithm.
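The two-register situation for interleaved stereo can be sketched with SSE1 intrinsics like this. The function name is hypothetical, and `frames` is assumed to be a multiple of 4 to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE1 intrinsics (x86 only) */

/* Hypothetical helper: split interleaved stereo (L0 R0 L1 R1 ...) into
 * planar left/right buffers. Assumes frames is a multiple of 4. */
void deinterleave2(const float *in, float *left, float *right, size_t frames)
{
    assert(frames % 4 == 0);
    for (size_t n = 0; n < frames; n += 4) {
        /* Two loads are needed to collect four samples of one channel. */
        __m128 a = _mm_loadu_ps(in + 2 * n);      /* L0 R0 L1 R1 */
        __m128 b = _mm_loadu_ps(in + 2 * n + 4);  /* L2 R2 L3 R3 */
        /* shufps takes its two low lanes from a and its two high lanes from b. */
        __m128 l = _mm_shuffle_ps(a, b, _MM_SHUFFLE(2, 0, 2, 0)); /* L0 L1 L2 L3 */
        __m128 r = _mm_shuffle_ps(a, b, _MM_SHUFFLE(3, 1, 3, 1)); /* R0 R1 R2 R3 */
        _mm_storeu_ps(left + n, l);
        _mm_storeu_ps(right + n, r);
    }
}
```

Four registers are live here just to separate the channels, before any actual processing has been done on them.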
sheh
post Nov 4 2012, 15:15
Post #16





Group: Members
Posts: 89
Joined: 3-November 04
Member No.: 17971



Thanks.
Kujibo
post Nov 4 2012, 23:06
Post #17





Group: Members
Posts: 38
Joined: 4-January 08
Member No.: 50127



It probably doesn't matter what you do as long as it's reasonably efficient. Often it's better to work with non-interleaved data, though, to keep things simpler, as handling mixing of 1, 2, 4, 6, or however many channels ends up requiring many permutations of the algorithm, one for each channel count. It would probably be worth trying to get the data SSE-aligned (16-byte) if possible, or to write algorithms that handle the bulk of the data aligned and handle the starts and ends one sample at a time if need be. Later versions of SSE can handle unaligned load/store, but then you get into architecture-specific penalties that can sometimes cost you more than the actual mixing operations.

It's really only worth going to the trouble of operating on interleaved data if you need to run more expensive DSP on your data that can't be vector processed, e.g. IIR filters.

I would say though that mixing audio data is about the cheapest DSP operation you could ever do. Even on SSE1 era CPUs mixing a few data streams together is hardly going to show up as CPU use. So I'm not sure you want to worry about it all that much unless there is some reason to.

This post has been edited by Kujibo: Nov 4 2012, 23:08
sheh
post Nov 7 2012, 16:41
Post #18





Group: Members
Posts: 89
Joined: 3-November 04
Member No.: 17971



Yeah, special-handling the starts/ends then doing the middle aligned is probably the most elegant way around the non-alignment.
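A rough sketch of that head/middle/tail split for a single multiply-accumulate pass, using SSE1 intrinsics. The name `mix_add` is hypothetical, and the buffers are assumed to hold `float` samples that are at least 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE1 intrinsics (x86 only) */

/* Hypothetical helper: dst[n] += gain * src[n] for n in [0, frames). */
void mix_add(float *dst, const float *src, float gain, size_t frames)
{
    size_t n = 0;

    /* Head: scalar until dst reaches a 16-byte boundary (at most 3 samples
     * when dst is 4-byte aligned). */
    while (n < frames && ((uintptr_t)(dst + n) & 15) != 0) {
        dst[n] += gain * src[n];
        ++n;
    }

    /* Middle: aligned load/store on dst. src may sit at a different offset
     * from a 16-byte boundary than dst, so it uses an unaligned load. */
    const __m128 g = _mm_set1_ps(gain);
    for (; n + 4 <= frames; n += 4) {
        __m128 s = _mm_loadu_ps(src + n);
        __m128 d = _mm_load_ps(dst + n);
        _mm_store_ps(dst + n, _mm_add_ps(d, _mm_mul_ps(s, g)));
    }

    /* Tail: remaining 0..3 samples, scalar. */
    for (; n < frames; ++n)
        dst[n] += gain * src[n];
}
```

Only one of the two buffers can be guaranteed aligned this way unless source and destination happen to share the same offset. Whether the remaining unaligned loads matter is something to measure on the target CPU.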

Some mixing might not be a problem by itself, but it wouldn't hurt to optimize a little since I want to be able to decode and mix a few tens of channels on Pentium 3s (plus some other things going on).