Topic: Out of phase intensity stereo

Out of phase intensity stereo

How does phase-inverted intensity stereo work in Opus, and how does it save bits and make the stereo better, as opposed to coding that particular audio band with in-phase intensity stereo? Does it produce any audible effects without downmixing to mono?

Out of phase intensity stereo

Reply #1
I assume this is the same as MP3 "Joint Stereo"...

You can convert regular stereo to M-S (Mid-Side) by adding the left & right channels together (L+R = M) and subtracting the left & right channels (L-R = S).  You still have 2 channels.  Mathematically, you haven't lost any information and this data can be losslessly converted back to normal stereo.
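(A quick NumPy sketch of that round trip, with made-up sample values; integer implementations take extra care to keep this exactly lossless:)

```python
import numpy as np

left = np.array([0.5, 0.3, -0.2, 0.1])
right = np.array([0.4, 0.3, -0.1, 0.0])

# Forward matrixing, as described above:
mid = left + right    # M = L + R
side = left - right   # S = L - R

# Lossless reconstruction:
left2 = (mid + side) / 2    # L = (M + S) / 2
right2 = (mid - side) / 2   # R = (M - S) / 2

assert np.allclose(left, left2) and np.allclose(right, right2)
```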

The advantage comes when you compress the data...  The difference channel usually contains less information, so it can be compressed more (at a lower bitrate).  This is true for both lossy and lossless compression.

The sum channel usually contains lots of information that's common to both left & right channels.  That information only has to be compressed once, so again you can get more compression (a lower bitrate) than you'd get compressing left & right separately.  Again that's true for both lossy and lossless compression.

The bottom line is, you can either get a smaller file (more compression), or better quality.  (Of course, with lossless compression you can't get better quality, but you can get a smaller file.)

Out of phase intensity stereo

Reply #2
I assume this is the same as MP3 "Joint Stereo"...
You see what happens when you assume. Opus does have mid-side stereo, but that's not what's being asked about at all.



Out of phase intensity stereo

Reply #5
How does phase-inverted intensity stereo work in Opus, and how does it save bits and make the stereo better, as opposed to coding that particular audio band with in-phase intensity stereo? Does it produce any audible effects without downmixing to mono?


Inverted intensity stereo is just like normal intensity stereo... except that left = -right. It helps slightly when the stereo image is very wide. A rarer case is artificial stereo effects where left and right are actually 180 degrees out of phase. In that case, the inverted intensity stereo can accurately model the effect rather than causing the stereo image to end up in the center.
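(To make that concrete, a toy decoder-side sketch; the names and the gain parameterization are made up for illustration and are not Opus's actual bitstream semantics:)

```python
import numpy as np

def intensity_decode(mid_band, gain, inverted):
    """Reconstruct one band from its mono downmix.

    mid_band: transform coefficients of the downmix for this band
    gain:     coded panning gain in [0, 1] (toy parameterization)
    inverted: the out-of-phase flag discussed above
    """
    left = gain * mid_band
    sign = -1.0 if inverted else 1.0    # inverted case: right = -(scaled mid)
    right = sign * (1.0 - gain) * mid_band
    return left, right
```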

Out of phase intensity stereo

Reply #6
Without much knowledge about what I am asking: Why is it that n>1 channels seem like something that is "bolted on to" inherently mono codecs, instead of codecs being natively multichannel aware? One would think that, deep down in the linear transforms and quantizers and auditory models, there might be some benefit to having complete knowledge, rather than working on a signal that someone chose to matrix as "mid/side"?

-k

Out of phase intensity stereo

Reply #7
In low-bit-rate stereo coding you don't have enough bits to code two entire spectra satisfactorily. The easiest way to increase the average bit-rate per MDCT line is to downmix the two channels to mono above a certain start frequency, and to add "cheap" panning (and maybe phase) information to that downmix.
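(A toy sketch of that encoder-side split; the band layout, names, and panning cue are made up for illustration:)

```python
import numpy as np

def intensity_encode(left_bands, right_bands, start_band):
    """Downmix bands above a start frequency, keeping a cheap panning cue."""
    coded = []
    for b, (l, r) in enumerate(zip(left_bands, right_bands)):
        if b < start_band:
            coded.append(("stereo", l, r))    # spend bits on both channels
        else:
            mono = (l + r) / 2                # the downmix
            el = np.sum(l ** 2)               # band energies drive the
            er = np.sum(r ** 2)               # "cheap" panning information
            pan = el / (el + er + 1e-12)
            coded.append(("intensity", mono, pan))
    return coded
```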

Chris
If I don't reply to your reply, it means I agree with you.

Out of phase intensity stereo

Reply #8
Without much knowledge about what I am asking: Why is it that n>1 channels seem like something that is "bolted on to" inherently mono codecs, instead of codecs being natively multichannel aware? One would think that, deep down in the linear transforms and quantizers and auditory models, there might be some benefit to having complete knowledge, rather than working on a signal that someone chose to matrix as "mid/side"?

-k


The stereo support is not "bolted on". Most encoders use separate stereo or lossless mid/side stereo when the bitrate allows. In fact, low bitrate separate stereo would suck horribly.

Out of phase intensity stereo

Reply #9
In low-bit-rate stereo coding you don't have enough bits to code two entire spectra satisfactorily. The easiest way to increase the average bit-rate per MDCT line is to downmix the two channels to mono above a certain start frequency, and to add "cheap" panning (and maybe phase) information to that downmix.

Chris


Isn't that the parametric stereo of HE-AAC v2, as opposed to intensity stereo?

Out of phase intensity stereo

Reply #10
In low-bit-rate stereo coding you don't have enough bits to code two entire spectra satisfactorily. The easiest way to increase the average bit-rate per MDCT line is to downmix the two channels to mono above a certain start frequency, and to add "cheap" panning (and maybe phase) information to that downmix.

Chris


Isn't that the parametric stereo of HE-AAC v2, as opposed to intensity stereo?


HE-AACv2-style parametric stereo is just one step further than intensity stereo. There are three types of stereo cues that the ear perceives: 1) inter-channel intensity difference 2) inter-channel phase difference 3) inter-channel coherence. Intensity stereo models only 1), while parametric stereo models all 3. In practice 2) is only perceivable at low frequencies, where messing with stereo is a bad idea anyway (this is why I think messing with phase is stupid), so the real advantage of parametric stereo over simple intensity stereo is the inter-channel coherence. It comes with a price though: a significant increase in both delay and complexity. This is why it's not in Opus.
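(As a sketch, one common way those three band-level cues are computed from complex transform coefficients; an illustration of the idea, not HE-AACv2's exact definitions:)

```python
import numpy as np

def stereo_cues(L, R):
    """Cues for one band, given complex spectra L and R of that band."""
    eps = 1e-12
    el = np.sum(np.abs(L) ** 2)        # left-channel energy
    er = np.sum(np.abs(R) ** 2)        # right-channel energy
    cross = np.sum(L * np.conj(R))     # cross-spectrum over the band
    iid = 10 * np.log10((el + eps) / (er + eps))   # 1) intensity difference, dB
    ipd = np.angle(cross)                          # 2) average phase difference
    ic = np.abs(cross) / np.sqrt(el * er + eps)    # 3) coherence, in [0, 1]
    return iid, ipd, ic
```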

Out of phase intensity stereo

Reply #11
There is a low-delay parametric stereo that requires only about 5 ms of extra delay and still yields a significant rate reduction. (Source)
Also, phase coding was improved in recent standards. (Source)
I was impressed by the quality of the last one. There were none of the typical artifacts of HE-AAC v2's parametric stereo, like the "drive effect". Though it's not known how the low-delay version would sound.

Out of phase intensity stereo

Reply #12
HE-AACv2-style parametric stereo is just one step further than intensity stereo. There are three types of stereo cues that the ear perceives:
1) inter-channel intensity difference
2) inter-channel phase difference
3) inter-channel coherence. Intensity stereo models only 1), while parametric stereo models all 3.
In practice 2) is only perceivable at low frequencies, where messing with stereo is a bad idea anyway (this is why I think messing with phase is stupid),
so the real advantage of parametric stereo over simple intensity stereo is the inter-channel coherence.
It comes with a price though: a significant increase in both delay and complexity. This is why it's not in Opus.


Now hold it.

The cues the ear can detect are

1) HRTF cues, which are both TIME DELAY and INTENSITY (both as a function of frequency). This is kind of like 1 and 2 above, but please do not say "phase";
it's time delay, not phase, as far as the ear is concerned.
2) Interaural correlation below 500Hz, interaural envelope correlation (across an ERB, of course) above 2kHz or so, and a mix of the two
between those two frequencies

Interaural delay is audible at any frequency, HOWEVER, above 2000Hz or so, the relevant time delay is that of the signal ENVELOPE in an ERB,
as opposed to the waveform itself.
Between 500 and 2000Hz, you're less sensitive to interaural delays. Two mechanisms conflict.
Below 500Hz, it's interaural waveform delay that matters.

Yes, I am rather well aware, to say the least, that time delay and phase shift are related; I've beaten people over the head enough times about that myself.
But my point here is simple: in terms of what goes down the auditory nerves, it's time delay that matters, not phase, and the delay is of different things
at high and low frequencies.

This is why you can localize a Gaussian pulse centered at 10 kHz, but not a 10 kHz sine wave, for instance. Claims that time delay does not matter at
higher frequencies are flatly contradicted by the fact that you can easily show otherwise with a simple Matlab program.

Note: the kind of time delay that matters is on the order of ±0.45 milliseconds relative to center. No more.
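(That sine-versus-pulse demonstration is easy to reproduce. A sketch of the two test signals, in NumPy rather than Matlab, with parameters chosen inside the range given above; this is an illustration, not jj's program:)

```python
import numpy as np

fs = 48000                                   # sample rate
t = np.arange(int(0.02 * fs)) / fs           # 20 ms of time
itd = 0.0004                                 # 0.4 ms interaural delay
n = int(round(itd * fs))                     # delay in whole samples

# 10 kHz Gaussian pulse: strongly varying envelope, so the delayed copy
# is heard off-center.
env = np.exp(-0.5 * ((t - 0.01) / 0.0005) ** 2)
pulse = env * np.sin(2 * np.pi * 10000 * t)
left_pulse, right_pulse = pulse, np.roll(pulse, n)

# Steady 10 kHz sine: flat envelope, so the same delay is only a phase
# shift of the fine structure, which is not localizable that high up.
sine = np.sin(2 * np.pi * 10000 * t)
left_sine, right_sine = sine, np.roll(sine, n)
```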
-----
J. D. (jj) Johnston

Out of phase intensity stereo

Reply #13
Without much knowledge about what I am asking: Why is it that n>1 channels seem like something that is "bolted on to" inherently mono codecs, instead of codecs being natively multichannel aware? One would think that, deep down in the linear transforms and quantizers and auditory models, there might be some benefit to having complete knowledge, rather than working on a signal that someone chose to matrix as "mid/side"?

-k


That's certainly not the case for MPEG-2 AAC, I think.
-----
J. D. (jj) Johnston

Out of phase intensity stereo

Reply #14
1) HRTF cues, which are both TIME DELAY and INTENSITY (both as a function of frequency). This is kind of like 1 and 2 above, but please do not say "phase";
it's time delay, not phase, as far as the ear is concerned.


You're seeing it from the PoV of the ears, and I'm seeing it from the signal. The signal has a phase (not a time) at each frequency.

2) Interaural correlation below 500Hz, interaural envelope correlation (across an ERB, of course) above 2kHz or so, and a mix of the two
between those two frequencies

Interaural delay is audible at any frequency, HOWEVER, above 2000Hz or so, the relevant time delay is that of the signal ENVELOPE in an ERB,
as opposed to the waveform itself.


My experience is that above 2 kHz IPD/ITD is much less important than IID and IC.

Between 500 and 2000Hz, you're less sensitive to interaural delays. Two mechanisms conflict.


Correct. And my rule of thumb has always been "don't mess with anything below 2 kHz because it's not well understood and the ear is good at picking up anything you mess up".

Out of phase intensity stereo

Reply #15
My experience is that above 2 kHz IPD/ITD is much less important than IID and IC.

Then you're using the wrong signal. If a signal has a flat envelope, yes. If it's a signal from most any real source, not so much.

Even pitchy sounds can trigger this.


Try it. You may find (you will find) that the same problem that manifests as BMLD at under 500Hz happens over 2kHz with signals that have a strongly varying envelope.

Try it.

As to phase vs. time delay, the ear hears time delay, not phase, and if you can analyze phase between channels, you can also analyze time delay. Same variable, one is the integral of the other.
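(For reference, the standard relation being invoked here: a fixed interaural delay Δt shows up as a phase difference that grows linearly with frequency, so either quantity can be computed from the other:)

```latex
\Delta\phi(f) = 2\pi f \,\Delta t
\qquad\Longleftrightarrow\qquad
\Delta t = \frac{1}{2\pi}\,\frac{\mathrm{d}\,\Delta\phi(f)}{\mathrm{d}f}
```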
-----
J. D. (jj) Johnston

Out of phase intensity stereo

Reply #16
My experience is that above 2 kHz IPD/ITD is much less important than IID and IC.

Then you're using the wrong signal. If a signal has a flat envelope, yes. If it's a signal from most any real source, not so much.

Even pitchy sounds can trigger this.


Try it. You may find (you will find) that the same problem that manifests as BMLD at under 500Hz happens over 2kHz with signals that have a strongly varying envelope.

Try it.

As to phase vs. time delay, the ear hears time delay, not phase, and if you can analyze phase between channels, you can also analyze time delay. Same variable, one is the integral of the other.


I have tried it. I've actually *measured* the perceptual impact of messing up the phase at higher frequencies. What I found was that you can pretty much do whatever you like to the phase, as long as you leave the first 1-2 kHz alone. See this paper I published in 2008, more precisely the perceptual impact of the "SCAL" algorithm.

Out of phase intensity stereo

Reply #17
I have tried it. I've actually *measured* the perceptual impact of messing up the phase at higher frequencies. What I found was that you can pretty much do whatever you like to the phase, as long as you leave the first 1-2 kHz alone. See this paper I published in 2008, more precisely the perceptual impact of the "SCAL" algorithm.



Again, what did you use for a test signal?

-----
J. D. (jj) Johnston

Out of phase intensity stereo

Reply #18
I think you've missed the entire point, jv.  Try a properly imaged (including HRTF and proper delay) signal with no content below 2kHz that has fast attacks.

-----
J. D. (jj) Johnston

Out of phase intensity stereo

Reply #19
I have tried it. I've actually *measured* the perceptual impact of messing up the phase at higher frequencies. What I found was that you can pretty much do whatever you like to the phase, as long as you leave the first 1-2 kHz alone. See this paper I published in 2008, more precisely the perceptual impact of the "SCAL" algorithm.



Again, what did you use for a test signal?


I tested with actual speech and music. I don't care about synthetic signals.


Out of phase intensity stereo

Reply #20
I think you've missed the entire point, jv.  Try a properly imaged (including HRTF and proper delay) signal with no content below 2kHz that has fast attacks.

Moving back to CELT, I would think that the relatively short window durations (between 5 and 22.5 ms) would allow quite accurate ITD modeling (by means of short-term IID changes), especially when switching to short transforms upon transients. Or not?

Chris
If I don't reply to your reply, it means I agree with you.

Out of phase intensity stereo

Reply #21
I think you've missed the entire point, jv.  Try a properly imaged (including HRTF and proper delay) signal with no content below 2kHz that has fast attacks.

Moving back to CELT, I would think that the relatively short window durations (between 5 and 22.5 ms) would allow quite accurate ITD modeling (by means of short-term IID changes), especially when switching to short transforms upon transients. Or not?

Chris



You want resolution in the 5 microsecond to 0.9 millisecond range.
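(Back-of-envelope, assuming a 48 kHz sample rate; the fine end of that range is sub-sample, which suggests why per-frame gain updates alone would struggle to express it:)

```python
fs = 48000
print(5e-6 * fs)     # 0.24 samples: fine end of the wanted ITD resolution
print(0.9e-3 * fs)   # 43.2 samples: coarse end
```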
-----
J. D. (jj) Johnston

Out of phase intensity stereo

Reply #22
I have tried it. I've actually *measured* the perceptual impact of messing up the phase at higher frequencies. What I found was that you can pretty much do whatever you like to the phase, as long as you leave the first 1-2 kHz alone. See this paper I published in 2008, more precisely the perceptual impact of the "SCAL" algorithm.



Again, what did you use for a test signal?


I tested with actual speech and music. I don't care about synthetic signals.


Plenty of music IS 100% synthetic!
And plenty of music in other genres has synths and organs too. This is the problem with lossy audio encoding: lack of foresight about the ethnomusicological realities. A lot of synthesizer music sounds like crap when lossy encoded because it pushes the codecs to their (classical? and jazz?-oriented) limits. Check the compression ratios of different genres of music and you start to see that the more advanced the music is in terms of mixdown and special effects, the worse the compression ratio tends to be. Check a big catalogue of many different genres of music sorted in foobar and look at the compression ratios.

Stuff that hardly has any dynamics, and not much bandwidth in terms of layers, compresses well if it's a good non-noisy recording. Stuff that has 50 different layers of widely contrasting sources, each panned and autopanned in creative ways, with lots of differences in frequency content and especially lots of high-energy percussion, the codecs can barely deal with. You can even see this in the ratios of FLAC: the more complex the music, the harder it is for the codec to encode. Thank goodness it's lossless. And if the music has a lot of aesthetic distortion and bitcrushing, then the codec results plummet. And that type of music is still very popular. Musicians who make this kind of music know what I'm talking about when they chorus, bitcrush, flange, phase, and layer several Reese basses, EQ them heavily, and automate the EQ. It creates a distinct sound. There are endless other examples. MP3 and the like were designed for 1990s rock drumkits, not for 2014 synth drums.

Electronic music can do pretty much anything: extremely dynamic low bass 180° out of phase coinciding with mono midrange and compressed treble... anything goes. Combinations of hard and soft sounds, tonal and staticky, flanged, phased, resonant noise, anything goes. So the codec better be able to cope with it.

So what's the answer to the original guy's question?
Be a false negative of yourself!

Out of phase intensity stereo

Reply #23
Synthetic signals and synthetic music aren't in the same context here.
A frequency sweep is one of these synthetic signals. 

Opus and AAC actually handle electronic music very well. Before anything else... test it.

Out of phase intensity stereo

Reply #24
Well, I got bored so I put these two files together.

Give them a listen. It would also be interesting to see what the various codecs do with the two files.

Use both speakers and phones, of course.
-----
J. D. (jj) Johnston