Pro Logic's Center Channel Extraction, Implementation questions
wswartzendruber
post Jul 23 2011, 05:31
Post #1





Group: Members
Posts: 85
Joined: 11-December 06
Member No.: 38563



I'm trying to figure out how Pro Logic implements center channel extraction. Looking at how various different types of sine waves combine (a single one into two other ones), I guess I'll need to implement some type of FIR filter. I'm very new to this. It would be nice if this could be done on a sample-by-sample basis.

Any help?
dc2bluelight
post Jul 23 2011, 09:10
Post #2





Group: Members
Posts: 83
Joined: 16-June 11
Member No.: 91562



QUOTE (wswartzendruber @ Jul 22 2011, 23:31) *
I'm trying to figure out how Pro Logic implements center channel extraction. Looking at how various different types of sine waves combine (a single one into two other ones), I guess I'll need to implement some type of FIR filter. I'm very new to this. It would be nice if this could be done on a sample-by-sample basis.

Any help?


In soundtracks encoded for (theatrical) Dolby Stereo, there are only two audio channels, but with Center and Surround "matrix encoded" into them. The two channels become Lt and Rt (Left Total, Right Total). The basic matrix is C = Lt+Rt -3dB, Surround = Lt-Rt (with special noise reduction applied, and delayed). A signal identical in phase and level in both Lt and Rt will be decoded as Center only, a signal identical in level but 180 degrees out of phase will be decoded as Surround only.
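As a rough illustration of the basic matrix just described, here is a minimal sample-by-sample sketch in C of a passive (unsteered) decode. The function name `passive_decode` and the buffer layout are made up for the example. Applying the same -3dB gain to S as to C is an assumption, and the surround low-pass, noise reduction and delay are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MINUS_3DB 0.70710678f /* 1/sqrt(2), roughly -3 dB */

/* Passive matrix decode of n Lt/Rt samples (hypothetical helper).
 * L and R pass straight through, C is the scaled sum and S the scaled
 * difference. Assumption: S gets the same -3 dB gain as C. No steering,
 * low-pass, noise reduction or delay is applied here. */
void passive_decode(const float *lt, const float *rt,
                    float *l, float *c, float *r, float *s, size_t n)
{
    assert(lt && rt && l && c && r && s);
    for (size_t i = 0; i < n; i++) {
        l[i] = lt[i];
        r[i] = rt[i];
        c[i] = MINUS_3DB * (lt[i] + rt[i]); /* in-phase content */
        s[i] = MINUS_3DB * (lt[i] - rt[i]); /* out-of-phase content */
    }
}
```

Feeding a signal into Lt only gives the same level in C and in S, 3dB below L, which is the adjacent-channel crosstalk that steering is meant to hide.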

However, the "Logic" part of ProLogic comes from a method of reducing adjacent channel cross-talk. In the basic matrix, there is only about 3dB of separation between any two adjacent channels (left to center, right to surround, etc.). ProLogic is an advanced implementation of the original analog method of "steering" the matrix output to hype separation. In the basic matrix, an L-only signal would also appear in C and S. With ProLogic, L-only signals are gated out of C and S, and the ProLogic decoder will output only L under those conditions. Same thing for C to L and R. So, the signals that are sent to the left output are signals that appear only in the Lt channel. With Lt driven, you get only L out, C is shut down, as is S. It's done partially by subtracting signals, partially by level-dependent dynamic gating, and is quite complex. The gating time constants were chosen to be syllabic in duration.

The early implementation of the steering logic used Tate Audio chips designed for enhancing "Quad" (early 4 channel stereo), but with the speaker plan modified. Dolby Stereo decoder cards used in the first cinema decoders had rather poor separation at low levels. It wasn't great, but wasn't a big problem in large theaters. When Dolby Surround came to the home market, separation was improved by ProLogic. Though it worked much the same as the Tate chips, it was refined to maintain apparent adjacent channel separation down to -30dB or so.

Part of ProLogic is the ability to properly decode the surround channel, in an identical manner to theater processors. S is mono in this case. Because the system was developed for optical sound on film, it had to be engineered to deal with certain imperfections in the optical pickup. High frequency phase match between Lt and Rt was impossible, so they did two things to prevent front highs from splashing into the surround. First, surround is limited to 8KHz response. Second, they (Dolby) applied a modified Dolby B-type noise reduction system to that channel. Dolby B and all forms of Dolby NR are two-ended. They must be encoded during record, decoded during play. So, any films with Dolby Stereo (matrix) surround have that type of NR already on the S channel. After the NR comes time delay. This is at least 20ms, and is there to take advantage of the precedence effect (you localize sound as coming from the direction you hear first). Delaying the S channel helps improve apparent separation between S and the front channels.
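The surround delay mentioned above is the easiest piece to do sample by sample. Below is a minimal ring-buffer sketch in C. The names `delay_line`, `delay_init` and `delay_process` and the 4096-sample ceiling are made up for the example, and the 20ms figure is simply the minimum quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define DELAY_MAX 4096 /* hypothetical ceiling on the delay, in samples */

typedef struct {
    float  buf[DELAY_MAX];
    size_t len; /* delay length in samples */
    size_t pos; /* next slot to read and then overwrite */
} delay_line;

/* Set up a fixed delay of delay_ms milliseconds at the given sample rate. */
void delay_init(delay_line *d, unsigned sample_rate, unsigned delay_ms)
{
    size_t len = (size_t)sample_rate * delay_ms / 1000u;
    assert(d != NULL && len > 0 && len <= DELAY_MAX);
    for (size_t i = 0; i < DELAY_MAX; i++)
        d->buf[i] = 0.0f;
    d->len = len;
    d->pos = 0;
}

/* Push one sample in and get back the sample from `len` samples ago. */
float delay_process(delay_line *d, float in)
{
    float out = d->buf[d->pos];
    d->buf[d->pos] = in;
    d->pos = (d->pos + 1) % d->len;
    return out;
}
```

At 44.1kHz, 20ms works out to 882 samples, so `delay_init(&d, 44100, 20)` followed by one `delay_process` call per surround sample would give that delay.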

The last bit you should know about the matrix is that there exists a problem in panning sound from C to S. When you try to do this, you pass through a null caused by the matrix. To avoid this, there is a 90 degree phase shift network in the encoder that shifts the phase of the S input signal so that when it appears in Lt and Rt, the signal in Lt is 90 degrees advanced, and the signal in Rt is 90 degrees retarded. The difference is still 180, but signals that now appear in both C and S are now 90 degrees apart and no longer cancel.

You can probably see why trying to figure out ProLogic with test tones won't give you much to go on.

By the way, the developer of Dolby Stereo is Ioan Allen (still with Dolby), and the engineer that worked out the steering logic is Craig Todd (also still with Dolby).
wswartzendruber
post Jul 24 2011, 06:07
Post #3





Group: Members
Posts: 85
Joined: 11-December 06
Member No.: 38563



I am having difficulty extracting the common signal between TL and TR. I am doing this in C. Can I get a description of how to do this? Pro Logic is just an example.

This post has been edited by wswartzendruber: Jul 24 2011, 06:07
DVDdoug
post Jul 25 2011, 18:41
Post #4





Group: Members
Posts: 2534
Joined: 24-August 07
From: Silicon Valley
Member No.: 46454



QUOTE
I am having difficulty extracting the common signal between TL and TR. I am doing this in C. Can I get a description of how to do this? Pro Logic is just an example.
Pro Logic Simplified- When the left & right signals are equal in amplitude and in-phase, the signal is steered to the center.

When the signals are equal & out-of-phase, the signal is steered to the rear.

There is more information on Wikipedia and on the Dolby website.

By comparing L, R, L+R and L-R, you should be able to build (or code) a simple steering system. I think the hard part with Pro Logic is getting the time-constants right. (I don't know if the timing information is published.)
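To make the "compare L, R, L+R and L-R" idea concrete, here is a minimal sketch in C of the control-signal side only. It is not the Pro Logic algorithm: the names are made up, the one-pole smoothing and the `alpha` coefficient are placeholder assumptions (the real time constants are not public), and it only produces two dominance values that a steering stage could act on.

```c
#include <assert.h>
#include <stddef.h>

/* Smoothed magnitudes of L, R, L+R and L-R (hypothetical structure). */
typedef struct {
    float e_l, e_r, e_sum, e_diff;
} steer_state;

static float magnitude(float x) { return x < 0.0f ? -x : x; }

/* Update the four envelopes with one stereo sample pair.
 * alpha is a one-pole smoothing coefficient in (0, 1]; its value here
 * is an arbitrary placeholder, not a published time constant. */
void steer_update(steer_state *st, float l, float r, float alpha)
{
    assert(st != NULL && alpha > 0.0f && alpha <= 1.0f);
    st->e_l    += alpha * (magnitude(l)     - st->e_l);
    st->e_r    += alpha * (magnitude(r)     - st->e_r);
    st->e_sum  += alpha * (magnitude(l + r) - st->e_sum);
    st->e_diff += alpha * (magnitude(l - r) - st->e_diff);
}

/* Normalised difference in [-1, 1]; 0 when both levels are negligible. */
static float dominance(float a, float b)
{
    float total = a + b;
    return total > 1e-9f ? (a - b) / total : 0.0f;
}

/* +1 means left-dominant, -1 right-dominant. */
float steer_lr(const steer_state *st) { return dominance(st->e_l, st->e_r); }

/* +1 means centre-dominant (in phase), -1 surround-dominant (out of phase). */
float steer_cs(const steer_state *st) { return dominance(st->e_sum, st->e_diff); }
```

A decoder could then, for example, reduce the C and S outputs as `steer_lr` approaches +1 or -1. How exactly those gains are derived is the part that would need listening tests.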

The simplest way to "extract" the center channel is to use hardware (a Pro Logic receiver or soundcard). If you need a digital copy, you can record it. Last time I searched, the only "user-accessible" software Pro Logic decoder I could find was the one included with SurCode ($800 USD).

QUOTE
I'll need to implement some type of FIR filter.
There is a plug-in for Winamp called DSP Centercut, and there is a similar tool in the current beta version of GoldWave. (It's not related to, or compatible with, Pro Logic.)

If I understand correctly, it uses FFT to break the signal into many frequency bands. Then L+R & L-R operations are performed on each band. By using that information as a "control signal", you can turn-on or turn-off various frequency bands. That allows you to turn-off the center channel information, or turn-off everything except the center channel information.

(L-R subtraction has been used to "remove vocals" for many years, but this method leaves you with a single mono signal.)

This post has been edited by DVDdoug: Jul 25 2011, 18:47
dc2bluelight
post Jul 27 2011, 09:26
Post #5





Group: Members
Posts: 83
Joined: 16-June 11
Member No.: 91562



QUOTE (DVDdoug @ Jul 25 2011, 12:41) *
By comparing L, R, L+R and L-R, you should be able to build (or code) a simple steering system. I think the hard part with Pro Logic is getting the time-constants right. (I don't know if the timing information is published.)


Nope, time constants were never published...deliberately. But they were chosen to be "syllabic" in nature, meaning, roughly timed to match single syllables in human speech. In fact, though, they were not perfect anyway, and could cause steering errors. For this reason all Lt/Rt soundtracks were mixed while monitoring through a standard encoder and decoder so that steering logic "errors" could be heard and compensated for. Some referred to these steering errors as "Todd-isms", a friendly nod to Craig Todd of Dolby Labs. They could be compensated for by recording a small sound in advance of the desired sound to pop open the steering logic and get the desired sound to decode to the correct channel.

I have to say, though, it's hard to understand why anyone would spend the effort to synthesize what's already been done and standardized. There is no advantage to a self-designed ProLogic approximation, when the exact thing is available. Remember, ProLogic is a "decoder", that means it must exactly undo what's been done in the recording process. An approximation won't be exact. Trying to reverse engineer ProLogic seems somewhat futile. But, if you look over what's been written here and the material Dolby has, you could come close with enough time, patience, listening tests and DSP coding chops. Meanwhile, every AV Receiver on the market since the mid 1990s has an implementation of ProLogic in it, already matching the standard. If you get one with pre amp outs, you'll have a perfectly decoded and extracted line level center signal. A used receiver with pre outs made any time in the last decade would be just fine.
wswartzendruber
post Jul 27 2011, 20:47
Post #6





Group: Members
Posts: 85
Joined: 11-December 06
Member No.: 38563



My goal is to implement a Pro Logic decoder with finer grain control over what to do with cross talk.

Problem: When inputting L and R into Lt and Rt, anything perfectly out of phase between L and R gets erroneously sent to the surround channel upon decoding.

Solution: Decode the "surround channel" (crosstalk) on just L and R, and then apply the crosstalk equally to both sides and in phase.

Summary of Methodology:

1. Phase-shift L by +90 degrees and R by -90 degrees.
2. Find the common signal and extract it.
3. Phase-shift L by -90 degrees and R by +90 degrees (to put them as they were originally but without the crosstalk).
4. Apply the extracted signal equally to both L and R.

Caveat: Anything originally having a hollow sound will be decoded into the center channel.

The final product will likely have a slider letting the user choose how much phase shift to apply to the cross talk. 0 degrees (to each channel) leaves it alone and 90 degrees puts it perfectly in phase.

I am operating under the assumption that step #2 should be done similarly to Pro Logic's center channel decoding.

This post has been edited by wswartzendruber: Jul 27 2011, 20:49
DVDdoug
post Jul 27 2011, 23:06
Post #7





Group: Members
Posts: 2534
Joined: 24-August 07
From: Silicon Valley
Member No.: 46454



There's really no way to "improve" Pro Logic decoding. Like dc2bluelight said, mixing is done while monitoring through a pro logic decoder, so the mixing engineer knows exactly how it's going to come-out... It's not perfect... It's not a discrete multichannel system like 5.1 digital... It mostly just steers (pans) the sound. With a regular Pro Logic decoder you are hearing exactly what you are supposed to hear.


QUOTE
Problem: When inputting L and R into Lt and Rt, anything perfectly out of phase between L and R gets erroneously sent to the surround channel upon decoding.
It's not a problem, and it's not erroneous... That's exactly how Dolby Surround works. If there is no out-of-phase information, there is no rear channel information.

If you want to take the time, you can do your own panning with an audio editor and re-encode to 5.1 digital. (I've done something like this, along with some other "tricks", to make a 5.1 surround track from a mono source.)

QUOTE
1. Phase-shift L by +90 degrees and R by -90 degrees.
2. Find the common signal and extract it.
OK... Now you have L & R 180 degrees out of phase with each other... You can accomplish almost the same thing by inverting one channel... Now what? If you sum the out-of-phase channels you get L-R (or R-L), and if you subtract the out-of-phase channels you get L+R (or -L-R). You still only have 2 "original" channels and you can make various summations (and subtractions) of the left & right signals.


QUOTE
Caveat: Anything originally having a hollow sound will be decoded into the center channel.
What? I don't see how "hollow sound" has any relationship to phase/amplitude of the L & R signals.
wswartzendruber
post Jul 28 2011, 02:26
Post #8





Group: Members
Posts: 85
Joined: 11-December 06
Member No.: 38563



QUOTE (DVDdoug @ Jul 27 2011, 18:06) *
There's really no way to "improve" Pro Logic decoding. Like dc2bluelight said, mixing is done while monitoring through a pro logic decoder, so the mixing engineer knows exactly how it's going to come-out... It's not perfect... It's not a discrete multichannel system like 5.1 digital... It mostly just steers (pans) the sound. With a regular Pro Logic decoder you are hearing exactly what you are supposed to hear.


QUOTE
Problem: When inputting L and R into Lt and Rt, anything perfectly out of phase between L and R gets erroneously sent to the surround channel upon decoding.
It's not a problem, and it's not erroneous... That's exactly how Dolby Surround works. If there is no out-of-phase information, there is no rear channel information.

If you want to take the time, you can do your own panning with an audio editor and re-encode to 5.1 digital. (I've done something like this, along with some other "tricks", to make a 5.1 surround track from a mono source.)

QUOTE
1. Phase-shift L by +90 degrees and R by -90 degrees.
2. Find the common signal and extract it.
OK... Now you have L & R 180 degrees out of phase with each other... You can accomplish almost the same thing by inverting one channel... Now what? If you sum the out-of-phase channels you get L-R (or R-L), and if you subtract the out-of-phase channels you get L+R (or -L-R). You still only have 2 "original" channels and you can make various summations (and subtractions) of the left & right signals.


QUOTE
Caveat: Anything originally having a hollow sound will be decoded into the center channel.
What? I don't see how "hollow sound" has any relationship to phase/amplitude of the L & R signals.

1. I want to make an open source product that lets dummies do the mixing.
2. The user may not want front channel info that's out-of-phase to go to the back channel.
3. It is possible that for decoding, simple inverting may work.
4. By "hollow sound" I mean stuff that's out of phase; put an arbitrary signal through two speakers and sit between them, with one speaker miswired.

EDIT: You do realize I want to make an application that goes from four discrete channels to two matrixed ones, right?

This post has been edited by wswartzendruber: Jul 28 2011, 03:08
wswartzendruber
post Jul 28 2011, 04:26
Post #9





Group: Members
Posts: 85
Joined: 11-December 06
Member No.: 38563



Wow, I've failed miserably in explaining what I want. I believe that being able to implement Pro Logic's center channel decoding is crucial to writing an ENCODER that isolates crosstalk.
dc2bluelight
post Jul 28 2011, 06:10
Post #10





Group: Members
Posts: 83
Joined: 16-June 11
Member No.: 91562



QUOTE (wswartzendruber @ Jul 27 2011, 20:26) *
1. I want to make an open source product that lets dummies do the mixing.

Sorry, could you please let us know why you'd want to mix to Dolby Surround, when every current release format handles 5.1 discrete? Dolby Stereo/Dolby Surround, decoded by ProLogic, was a compromise because the release format available only supported two channels. That's the ONLY reason it ever existed. Concurrently with Dolby Stereo/Surround, there was 70mm magnetic sound on film, which was, essentially, 5.1 analog discrete, and way better than any decoded matrix format. We've moved on past the need to create new matrixed surround. The only real application for ProLogic is decoding legacy material that was never released in 5.1.
QUOTE (wswartzendruber @ Jul 27 2011, 20:26) *
2. The user may not want front channel info that's out-of-phase to go to the back channel.

If the material is to be compatible with Dolby ProLogic decoding, or any other matrix surround decoder designed to decode Dolby Surround, out of phase LtRt material must, by definition, be in the surround channel. Not true of "random" phase material, only 180 degree out of phase material. That's how the matrix is defined.
QUOTE (wswartzendruber @ Jul 27 2011, 20:26) *
3. It is possible that for decoding, simple inverting may work.

It does, sort of, but again if you're decoding legacy Dolby Surround material, simple L-R won't really work well, and you're ignoring the fact that the surround channel has partial Dolby B noise reduction on it.
QUOTE (wswartzendruber @ Jul 27 2011, 20:26) *
4. By "hollow sound" I mean stuff that's out of phase; put an arbitrary signal through two speakers and sit between them, with one speaker miswired.

I understand what you mean, but this doesn't happen in a properly calibrated 5.1 system with ProLogic.

QUOTE (wswartzendruber @ Jul 27 2011, 20:26) *
EDIT: You do realize I want to make an application that goes from four discrete channels to two matrixed ones, right?


Actually, that wasn't obvious. But, it's already been done. The plug-ins that exist in software used to master for DVD, for example, include Dolby Digital 5.1 with down-mixing to LtRt. I've got it in DVD Studio, for example. If you're committed to making an encoder, it must work and match a standard ProLogic decoder, or what you're doing is inventing an entirely new (incompatible) format. There are literally millions of ProLogic decoders in the world, but if you modify what's happening in the encode/decode process, there will be only one decoder that works with it: yours.

Encoding for Dolby Surround is actually fairly simple, I've hinted at what's needed in a previous post.

To start with, each input is passed through a phase shift network. The networks used for L, C and R are all identical, but the S input is phase shifted 90 degrees away from LCR. L and R are then sent to LtRt without modification. C is dropped 3dB then sent equally and in phase to Lt and Rt. S is dropped 3dB, 8KHz low-passed, partial Dolby B NR encoded, then split to two paths, one inverted from the other, then sent to Lt and Rt respectively. The 90 degree networks are there to permit panning from LCR to S without a cancellation null mid pan. You have to phase shift each input because there's no other way to maintain a 90 degree difference consistently over frequency with the S input. That's about it for encoding. But the trick is to monitor through a standard ProLogic decoder so in the final mix you can compensate for the steering anomalies, inherent cross talk, etc. There's actually a lot more to decoding, including steering logic and time delay for the S channel (it's there to hype L/R to S separation via the Haas effect).
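The summing stage of that encoder can be sketched in a few lines of C. This is only the final matrix: the function name `matrix_encode` is made up, and it assumes the caller has already run L, C and R through their phase networks and S through its 90 degree network, 8KHz low-pass and NR encoding. Which of the two S paths is inverted is also an assumption here (Lt gets +S, Rt gets -S).

```c
#include <assert.h>
#include <stddef.h>

#define MINUS_3DB 0.70710678f /* 1/sqrt(2), roughly -3 dB */

/* Sum four pre-processed channels into Lt/Rt (hypothetical helper).
 * Assumptions: l, c and r have already passed through identical phase
 * networks; s has already been phase shifted by 90 degrees relative to
 * them, low-passed and noise-reduction encoded. Only the level drops
 * and the polarity split are done here. */
void matrix_encode(const float *l, const float *c, const float *r,
                   const float *s, float *lt, float *rt, size_t n)
{
    assert(l && c && r && s && lt && rt);
    for (size_t i = 0; i < n; i++) {
        float centre   = MINUS_3DB * c[i]; /* equal and in phase on both */
        float surround = MINUS_3DB * s[i]; /* opposite polarity per side */
        lt[i] = l[i] + centre + surround;
        rt[i] = r[i] + centre - surround;
    }
}
```

A centre-only input lands equally in Lt and Rt, and a surround-only input lands with opposite polarity, which is what a sum/difference decode expects.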

But, seriously, nobody mixes in Dolby Surround anymore. There's just no need when you have discrete 5.1 right there in your hands in every delivery format except .mp3 and audio CD, which don't usually find their ways to ProLogic anyway.
wswartzendruber
post Jul 28 2011, 07:51
Post #11





Group: Members
Posts: 85
Joined: 11-December 06
Member No.: 38563



Since you want an in-depth explanation, you can have it.

This is really about Pro Logic II and the possibility of storing 5.0 music on a conventional CD. The only PL2 encoder I've really used is eac3to's, and it has horrible issues with crosstalk. I know of no good, standalone PL2 encoder that takes crosstalk into account. $800 for SurCode isn't ideal, either. I intend to change that.

I also don't know what I'm doing, exactly. I have theories on how to eliminate crosstalk, but need to be able to dematrix the sound as a preprocessing step. Pro Logic is merely a starting point.

Does this make more sense now?
DVDdoug
post Jul 28 2011, 18:20
Post #12





Group: Members
Posts: 2534
Joined: 24-August 07
From: Silicon Valley
Member No.: 46454



OK... Time to start writing some experimental code!

Once you can read an audio file into an array (or each channel into an array), it's easy to add or subtract the L & R channels and put the result in a new array. I don't actually know how to make a 90 degree phase shift (Hilbert transform?), but I'm sure you can look it up.
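For reference, one common way to get a broadband 90 degree shift in software is a windowed FIR Hilbert transformer. The sketch below, in C, is an assumption-laden starting point rather than anything taken from Pro Logic: the function names are made up, and the Welch (parabolic) window is chosen only because it needs no library calls (Hamming or Blackman would be more usual). The other channel has to be delayed by `half` samples to stay time aligned with the shifted one.

```c
#include <assert.h>
#include <stddef.h>

#define HILBERT_PI 3.14159265358979323846

/* Fill h[0 .. 2*half] with a windowed FIR Hilbert transformer.
 * The ideal impulse response is 2/(pi*k) for odd k and 0 for even k,
 * where k = i - half is the tap offset from the centre tap.
 * Assumption: Welch window, picked only to avoid library calls. */
void hilbert_design(double *h, size_t half)
{
    assert(h != NULL && half > 0);
    for (size_t i = 0; i <= 2 * half; i++) {
        long k = (long)i - (long)half;
        if (k % 2 == 0) {
            h[i] = 0.0; /* even taps, including the centre, are zero */
        } else {
            double t = (double)k / (double)(half + 1);
            h[i] = (2.0 / (HILBERT_PI * (double)k)) * (1.0 - t * t);
        }
    }
}

/* Direct-form convolution: y[n] = sum over i of h[i] * x[n - i].
 * Samples before x[0] are treated as zero. With the filter above the
 * output is the 90-degree-shifted input delayed by `half` samples. */
void fir_apply(const double *h, size_t taps,
               const double *x, double *y, size_t len)
{
    assert(h != NULL && x != NULL && y != NULL);
    for (size_t n = 0; n < len; n++) {
        double acc = 0.0;
        for (size_t i = 0; i < taps && i <= n; i++)
            acc += h[i] * x[n - i];
        y[n] = acc;
    }
}
```

A short filter like this loses accuracy toward DC and toward the Nyquist frequency. Holding the shift down into the bass region takes many more taps, which is the main cost of the FIR approach.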

I think you will find that there are serious limitations with matrix encoding... We tried to warn you!
dc2bluelight
post Jul 29 2011, 04:32
Post #13





Group: Members
Posts: 83
Joined: 16-June 11
Member No.: 91562



QUOTE (DVDdoug @ Jul 28 2011, 12:20) *
OK... Time to start writing some experimental code!

Once you can read an audio file into an array (or each channel into an array), it's easy to add or subtract the L & R channels and put the result in a new array. I don't actually know how to make a 90 degree phase shift (Hilbert transform?), but I'm sure you can look it up.

I think you will find that there are serious limitations with matrix encoding... We tried to warn you!

90 degree phase shift between channels can only be had by applying a huge, multi-stage all-pass network to each channel so that phase is constantly rotated throughout the audio band, then where you want 90 degrees, build the next all-pass network so it's 90 degrees off-set from the first one.

Yes, we tried to warn you. Crosstalk is part of the matrix. ProLogic was the fix. PLII was an improvement, but not really in crosstalk. ProLogic music on CD has been done (not a commercial success). Do your 5.0 on DVD-Audio or Blu-ray, you'll be happier.

BTW, ProLogic was designed for film soundtracks, the steering time constants being optimized for sound effects and speech. Have fun with music.

This post has been edited by dc2bluelight: Jul 29 2011, 04:37
wswartzendruber
post Jul 31 2011, 04:24
Post #14





Group: Members
Posts: 85
Joined: 11-December 06
Member No.: 38563



I've got a working Hilbert transformer. It's really, really slow because I'm using over 2,000 coefficients, but it preserves a lot of the bass. I'll check back in a bit.
Dynamic
post Aug 5 2011, 11:54
Post #15





Group: Members
Posts: 795
Joined: 17-September 06
Member No.: 35307



Sorry to resurrect a slightly old thread, but I'd have thought if you want to encode 5.0 sound in a new end-to-end format onto a CD that sounds like standard 2.0 L+R Red Book audio if played without a decoder you could use some digital tricks to get far superior results with no steering problems, assuming your decoder has access to the digital stream, not just an analogue CD player output.

Just for example, a 16kHz lowpass filter is essentially transparent for real musical signals other than test tones according to many listening tests, virtually regardless of listener's age.
My initial thought was that it's quite feasible to consider encoding the side channels at much reduced volume and somewhat less bandwidth, shifted (possibly reflected) into the 16-22kHz regions of the normal channels (so 0Hz gets reflected up to 22.05 kHz, 6kHz is at 16.05 kHz). Reduced amplitude is necessary to safeguard tweeters and listeners' pets, but the reduced bit depth could be compensated for using mu-law, A-law or ADPCM types of technique.

If 8kHz bandwidth is fine for Pro Logic, I guess 6kHz isn't too bad, or you could possibly encode over 12 kHz using the ultrasonic areas of L and R channels together as single channel in some way, while encoding a steering signal into or instead of the LSB(s) of the CD audio.

Then I had a potentially better thought.

14-bit quantization is enough to get transparent PCM audio, even without spectrally-flat dither (and was Philips' original proposal for the CD standard), so you could definitely replace the 2 LSBs of your music, possibly more, with some other encoding that would get lost below the noise floor of even a super-quiet listening room.

Just the 2 LSBs at 44.1 kSa/s x 2 channels = 176.4kbps, so even a steering signal to re-steer the left and right channels to completely different locations, with or without specified delays, could be arbitrarily accurate, dependent on the time constant you allow. So you could get, say, a 12kHz channel, as above, then steer it as you wish.
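The arithmetic and the bit replacement itself can be sketched in C as follows. The function names are made up for the example, and the sketch simply overwrites the two LSBs. It does not dither the remaining 14 bits, and the payload framing is left open.

```c
#include <assert.h>
#include <stdint.h>

/* Payload capacity in bits per second when `lsbs` low bits of every
 * sample in every channel carry data (hypothetical helper). */
unsigned long payload_bps(unsigned long sample_rate, unsigned channels,
                          unsigned lsbs)
{
    return sample_rate * channels * lsbs;
}

/* Overwrite the 2 least significant bits of a 16-bit PCM sample with a
 * 2-bit payload. The audio is truncated, not dithered, to 14 bits.
 * Note: converting the result back to int16_t relies on the usual
 * two's-complement wrap, which is implementation-defined in C. */
int16_t embed2(int16_t sample, unsigned payload)
{
    assert(payload < 4u);
    uint16_t u = (uint16_t)sample;
    u = (uint16_t)((u & 0xFFFCu) | payload);
    return (int16_t)u;
}

/* Recover the 2-bit payload from a sample. */
unsigned extract2(int16_t sample)
{
    return (unsigned)((uint16_t)sample & 3u);
}
```

For CD audio, `payload_bps(44100, 2, 2)` gives 176400, matching the 176.4kbps figure above, and a single LSB gives 88200.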

Instead of that, though, potentially the best idea is that with such a bitrate you could even encode to a low-latency music-compatible codec like the superb CELT (in 5ms latency mode, the delay is negligible, but could be compensated with a delay to the 14-bit PCM front channels if you could be bothered, and at 20ms+ latency (still low enough) the quality per bitrate is even better). CELT is even remarkably resilient to bit-errors, so even a Red Book CD bit stream's errors in burst mode (as in a standard audio CD transport) probably wouldn't sound bad, especially if C2 error detection is known to your decoder. There's enough bitrate to allow you to encode in whichever way suits you best (e.g. you could derive the Centre from L and R from the PCM, then use CELT for LS and RS) and include some kind of signature for compatible decoders to know there's a valid signal there in the LSBs. For easier integration into industry standard chipsets, you could also use codecs like MP3 at up to 160 kbps CBR, which would also provide very good quality stereo. You could potentially offset the MP3 stream relative to the PCM to account for the higher decoder latency. End-to-end latency isn't too much of a concern as it's not encoded live; it just requires reasonable synchronization on decode, though it happens that CELT is extremely good and still provides low latency (and is intended to be patent-free also).

Whatever you encode there, but especially a lossy codec like CELT (presumably padded out with null data where it doesn't use all 176.4kbps), would present a very noiselike PCM signal, so that even if turned up during fadeouts it would sound much like white noise hiss or dither when played back on a regular Red Book CD player. Better still, you can dither your L&R PCM audio to 14 bits with any form of dither you like to preserve a high (technically infinite) dynamic range and a relatively low perceptual noise floor.

You could also consider using 15-bit PCM for L and R and 88.2 kbps left over for surround encoding (CELT is still remarkably good at 64 kbps in stereo, and thus 88.2kbps also, but probably not quite transparent) or variations of PCM bitdepth versus encoded bitrate as required. Encoder artifacts in LS and RS alone are likely to be harder to spot in the presence of audio from the PCM front channels.

During fadeouts and other quiet passages with no surround channel requirements, you could also consider switching off the LSB encoding and reverting to 16-bit PCM or allowing digital silence, should that be seen as desirable for the Red Book audio for any reason (not that I can think of any).
dc2bluelight
post Aug 5 2011, 12:55
Post #16





Group: Members
Posts: 83
Joined: 16-June 11
Member No.: 91562



Dynamic's ideas are interesting, you solve the problem by creating a one-off solution. However, if you don't care about compatibility of either the two channel stereo or the surround mix, then why bother with a CD at all? In fact, why bother with matrixed surround at all? In fact why limit yourself to 16/44.1? You see where this goes. If you take the trouble to build a one-off solution, you might as well dump 5.1 channels of discrete uncompressed high rate files onto a DVD.

I think the point is a method that takes advantage of the vast installed base of hardware that can both play a CD and apply ProLogic decoding. If not, then abandon the ProLogic idea completely, it's a flawed compromise to begin with.