Out of phase intensity stereo
Neuron
post Mar 11 2014, 23:53
Post #1
How does phase-inverted intensity stereo work in Opus, and how does it save bits and make the stereo better, as opposed to coding that particular audio band with in-phase intensity stereo? Does it produce any audible effects without downmixing to mono?
DVDdoug
post Mar 12 2014, 01:10
Post #2
I assume this is the same as MP3 "Joint Stereo"...

You can convert regular stereo to M-S (Mid-Side) by adding the left & right channels together (L+R = M) and subtracting the left & right channels (L-R = S). You still have 2 channels. Mathematically, you haven't lost any information and this data can be losslessly converted back to normal stereo.
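For illustration, here is a minimal numpy sketch of that round trip. It follows the description above literally (M = L + R, S = L - R); real codecs typically scale the mid channel and work on transform coefficients, so treat this as a sketch rather than any encoder's actual code.

```python
import numpy as np

def lr_to_ms(left, right):
    """Forward matrix exactly as described: mid = L + R, side = L - R."""
    return left + right, left - right

def ms_to_lr(mid, side):
    """Inverse matrix: L = (M + S) / 2, R = (M - S) / 2 recovers the input."""
    return (mid + side) / 2.0, (mid - side) / 2.0

# Toy stereo signal: a tone panned mostly left plus a little uncorrelated noise.
t = np.arange(48000) / 48000.0
left = 0.8 * np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.randn(t.size)
right = 0.2 * np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.randn(t.size)

mid, side = lr_to_ms(left, right)
l2, r2 = ms_to_lr(mid, side)

print(np.allclose(left, l2), np.allclose(right, r2))  # True True: lossless round trip
print(np.std(side) < np.std(mid))                     # True here: side carries less energy
```

The second print is the point of the next two paragraphs: for typical material the side channel has much less energy than the mid channel, so it costs fewer bits.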

The advantage comes when you compress the data... The difference channel usually contains less information, so it can be compressed more (at a lower bitrate). This is true for both lossy and lossless compression.

The sum channel usually contains lots of information that's common to both left & right channels. That information only has to be compressed once, so again you can get more compression (a lower bitrate) than you'd get compressing left & right separately. Again that's true for both lossy and lossless compression.

The bottom line is, you can either get a smaller file (more compression), or better quality. (Of course, with lossless compression you can't get better quality, but you can get a smaller file.)

jensend
post Mar 12 2014, 02:46
Post #3
QUOTE (DVDdoug @ Mar 11 2014, 17:10) *
I assume this is the same as MP3 "Joint Stereo"...
You see what happens when you assume. Opus does have mid-side stereo but that's not what's being asked about at all.
saratoga
post Mar 12 2014, 03:11
Post #4
The OP is probably referring to this:

http://www.hydrogenaudio.org/forums/index....st&p=840586

Although you would never know it from his question.
Neuron
post Mar 14 2014, 00:26
Post #5
QUOTE (saratoga @ Mar 12 2014, 03:11) *
The OP is probably referring to this:

http://www.hydrogenaudio.org/forums/index....st&p=840586

Although you would never know it from his question.


Yes, I mean Opus intensity stereo. And sorry for wording the question so awkwardly.
jmvalin
post Mar 14 2014, 01:15
Post #6
Xiph.org Speex developer
QUOTE (Neuron @ Mar 11 2014, 17:53) *
How does phase-inverted intensity stereo work in Opus, and how does it save bits and make the stereo better, as opposed to coding that particular audio band with in-phase intensity stereo? Does it produce any audible effects without downmixing to mono?


Inverted intensity stereo is just like normal intensity stereo... except that left = -right. It helps slightly when the stereo image is very wide. A rarer case is artificial stereo effects where left and right are actually 180 degrees out of phase. In that case, the inverted intensity stereo can accurately model the effect rather than causing the stereo image to end up in the center.
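To make that concrete, here is a hedged sketch of what an intensity-stereo decoder does with one band when an inversion flag is set. The gain handling is deliberately simplified and is not CELT's actual band coding; the point is only that the sign flip lets an anti-phase (left = -right) source survive instead of collapsing to the center.

```python
import numpy as np

def intensity_decode_band(mono_band, panning_gain, inverted=False):
    """Rebuild L/R for one band from the single spectrum that was transmitted.

    mono_band    : the one set of coefficients actually coded for this band
    panning_gain : crude left/right level split in [0, 1] (illustrative only)
    inverted     : if True, reconstruct the right channel 180 degrees out of
                   phase, i.e. left = -right up to the panning gain
    """
    left = panning_gain * mono_band
    right = (1.0 - panning_gain) * mono_band
    if inverted:
        right = -right
    return left, right

band = np.random.randn(32)                   # stand-in for one band of MDCT bins
L, R = intensity_decode_band(band, 0.5, inverted=True)
print(np.allclose(L, -R))                    # True: the anti-phase image is preserved
```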
knutinh
post Mar 14 2014, 09:15
Post #7
Without much knowledge about what I am asking: why is it that n>1 channels seem like something that is "bolted on to" inherently mono codecs, instead of codecs being natively multichannel-aware? One would think that, deep down in the linear transforms and quantizers and auditory models, there might be some benefit to having complete knowledge, rather than working on a signal that someone chose to matrix as "mid/side"?

-k
C.R.Helmrich
post Mar 14 2014, 14:51
Post #8
In low-bit-rate stereo coding you don't have enough bits to code two entire spectra satisfactorily. The easiest way to increase the average bit-rate per MDCT line is to downmix the two channels to mono above a certain start frequency, and to add "cheap" panning (and maybe phase) information to that downmix.

Chris
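A rough sketch of that scheme, with made-up function and parameter names rather than any encoder's internals: below a chosen start band both spectra are coded, above it only the downmix plus one "cheap" panning value per band.

```python
import numpy as np

def encode_bands(L_bands, R_bands, start_band):
    """Toy band-wise stereo coder: full L/R below start_band, downmix + gain above.

    L_bands, R_bands : lists of per-band MDCT coefficient arrays (one array per band)
    start_band       : first band index that is coded as intensity stereo
    """
    coded = []
    for k, (l, r) in enumerate(zip(L_bands, R_bands)):
        if k < start_band:
            coded.append(("lr", l, r))                 # expensive: two full spectra
        else:
            mono = 0.5 * (l + r)                       # the downmix
            el, er = np.sum(l * l), np.sum(r * r)
            pan = el / (el + er + 1e-12)               # cheap panning information
            coded.append(("intensity", mono, pan))     # one spectrum + one scalar
    return coded

# 10 bands of 16 bins each; bands 4 and up cost roughly half the coefficients.
L = [np.random.randn(16) for _ in range(10)]
R = [np.random.randn(16) for _ in range(10)]
print([entry[0] for entry in encode_bands(L, R, start_band=4)])
```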

Neuron
post Mar 14 2014, 15:52
Post #9
QUOTE (knutinh @ Mar 14 2014, 09:15) *
Without much knowledge about what I am asking: why is it that n>1 channels seem like something that is "bolted on to" inherently mono codecs, instead of codecs being natively multichannel-aware? One would think that, deep down in the linear transforms and quantizers and auditory models, there might be some benefit to having complete knowledge, rather than working on a signal that someone chose to matrix as "mid/side"?

-k


The stereo support is not "bolted on". Most encoders use separate stereo or lossless mid/side stereo when the bitrate allows. In fact, low bitrate separate stereo would suck horribly.

Neuron
post Mar 14 2014, 15:55
Post #10
QUOTE (C.R.Helmrich @ Mar 14 2014, 14:51) *
In low-bit-rate stereo coding you don't have enough bits to code two entire spectra satisfactorily. The easiest way to increase the average bit-rate per MDCT line is to downmix the two channels to mono above a certain start frequency, and to add "cheap" panning (and maybe phase) information to that downmix.

Chris


Isn't that the parametric stereo of HE-AAC v2, as opposed to intensity stereo?
jmvalin
post Mar 15 2014, 04:28
Post #11
Xiph.org Speex developer
QUOTE (Neuron @ Mar 14 2014, 09:55) *
QUOTE (C.R.Helmrich @ Mar 14 2014, 14:51) *
In low-bit-rate stereo coding you don't have enough bits to code two entire spectra satisfactorily. The easiest way to increase the average bit-rate per MDCT line is to downmix the two channels to mono above a certain start frequency, and to add "cheap" panning (and maybe phase) information to that downmix.

Chris


Isn't that the parametric stereo of HE-AAC v2, as opposed to intensity stereo?


HE-AACv2-style parametric stereo is just one step further than intensity stereo. There are three types of stereo cues that the ear perceives: 1) inter-channel intensity difference, 2) inter-channel phase difference, 3) inter-channel coherence. Intensity stereo models only 1), while parametric stereo models all 3. In practice 2) is only perceivable at low frequencies, where messing with stereo is a bad idea anyway (this is why I think messing with phase is stupid), so the real advantage of parametric stereo over simple intensity stereo is the inter-channel coherence. It comes with a price though: a significant increase in both delay and complexity. This is why it's not in Opus.
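For reference, the three cues can be estimated per band from the two channels' spectra. This is a sketch using the textbook definitions, not any particular codec's analysis:

```python
import numpy as np

def stereo_cues(L, R):
    """Estimate the three inter-channel cues for one band of complex spectra.

    IID: intensity (level) difference in dB
    IPD: phase difference, taken from the angle of the cross-spectrum
    IC : coherence, ~1 for identical or merely panned signals, ~0 if uncorrelated
    """
    eL, eR = np.sum(np.abs(L) ** 2), np.sum(np.abs(R) ** 2)
    cross = np.sum(L * np.conj(R))
    iid = 10.0 * np.log10((eL + 1e-12) / (eR + 1e-12))
    ipd = np.angle(cross)
    ic = np.abs(cross) / np.sqrt(eL * eR + 1e-12)
    return iid, ipd, ic

rng = np.random.default_rng(0)
a = rng.standard_normal(64) + 1j * rng.standard_normal(64)
b = rng.standard_normal(64) + 1j * rng.standard_normal(64)
print(stereo_cues(a, 0.5 * a))   # panned copy: IID ~ 6 dB, IPD ~ 0, IC ~ 1
print(stereo_cues(a, b))         # uncorrelated channels: IC near 0
```

Intensity stereo transmits only something like the first number per band; parametric stereo transmits all three.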
IgorC
post Mar 15 2014, 07:03
Post #12
There is a low-delay parametric stereo that requires +5 ms of extra delay and still yields a significant rate reduction. (Source)
Phase coding was also improved in recent standards. (Source)
I was impressed by the quality of the latter. There were none of the typical artifacts of HE-AAC v2's parametric stereo, like the "drive effect". Though it's not known how a low-delay version would sound.
Woodinville
post Mar 15 2014, 08:30
Post #13
QUOTE (jmvalin @ Mar 14 2014, 19:28) *
HE-AACv2-style parametric stereo is just one step further than intensity stereo. There are three types of stereo cues that the ear perceives:
1) inter-channel intensity difference
2) inter-channel phase difference
3) inter-channel coherence. Intensity stereo models only 1), while parametric stereo models all 3.
In practice 2) is only perceivable at low frequencies, where messing with stereo is a bad idea anyway (this is why I think messing with phase is stupid),
so the real advantage of parametric stereo over simple intensity stereo is the inter-channel coherence.
It comes with a price though: a significant increase in both delay and complexity. This is why it's not in Opus.


Now hold it.

The cues the ear can detect are:

1) HRTF cues, which are both TIME DELAY and INTENSITY (both as a function of frequency). This is kind of like 1 and 2 above, but please do not say "phase"; it's time delay, not phase, as far as the ear is concerned.
2) Interaural correlation below 500 Hz, interaural envelope correlation (across an ERB, of course) above 2 kHz or so, and a mix of the two between those two frequencies.

Interaural delay is audible at any frequency. HOWEVER, above 2000 Hz or so, the relevant time delay is that of the signal ENVELOPE in an ERB, as opposed to the waveform itself. Between 500 and 2000 Hz, you're less sensitive to interaural delays. Two mechanisms conflict. Below 500 Hz, it's interaural waveform delay that matters.

Yes, I am rather well aware, to say the least, that time delay and phase shift are related; I've beaten people over the head enough times about that myself. But my point here is simple: in terms of what goes down the auditory nerves, it's time delay that matters, not phase, and the delay is of different things at high and low frequencies.

This is why you can localize a Gaussian pulse centered at 10 kHz but not a 10 kHz sine wave, for instance. Claims that time delay does not matter at higher frequencies are contradicted by the fact that you can easily show otherwise with a simple MATLAB program.

Note: the kind of time delay that matters is on the order of ±0.45 milliseconds relative to center. No more.
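Since the argument appeals to "a simple MATLAB program", here is an equivalent sketch in Python that writes out the two kinds of test signal described: a 10 kHz Gaussian-windowed burst and a steady 10 kHz sine, each with the same 0.45 ms interaural delay. File names and exact parameters are this example's own, chosen only for illustration.

```python
import numpy as np
from scipy.io import wavfile

fs = 48000
itd = 0.45e-3                        # interaural delay near the top of the useful range
shift = int(round(itd * fs))         # about 22 samples at 48 kHz

t = np.arange(int(0.5 * fs)) / fs

# 10 kHz carrier with a narrow Gaussian envelope: strong envelope (ERB) cues.
env = np.exp(-0.5 * ((t - 0.25) / 0.002) ** 2)
pulse = env * np.sin(2 * np.pi * 10000 * t)

# Steady 10 kHz sine: flat envelope, so no usable waveform ITD at that frequency.
sine = 0.3 * np.sin(2 * np.pi * 10000 * t)

def with_itd(x, shift):
    """Return a stereo array in which the left channel leads the right by `shift` samples."""
    right = np.concatenate([np.zeros(shift), x[:-shift]])
    return np.stack([x, right], axis=1).astype(np.float32)

wavfile.write("pulse_itd.wav", fs, 0.9 * with_itd(pulse, shift))
wavfile.write("sine_itd.wav", fs, with_itd(sine, shift))
# Expected result, per the argument above: the burst lateralizes clearly, the sine does not.
```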


Woodinville
post Mar 15 2014, 10:44
Post #14
QUOTE (knutinh @ Mar 14 2014, 00:15) *
Without much knowledge about what I am asking: why is it that n>1 channels seem like something that is "bolted on to" inherently mono codecs, instead of codecs being natively multichannel-aware? One would think that, deep down in the linear transforms and quantizers and auditory models, there might be some benefit to having complete knowledge, rather than working on a signal that someone chose to matrix as "mid/side"?

-k


That's certainly not the case for MPEG-2 AAC, I think.


jmvalin
post Mar 15 2014, 16:01
Post #15
Xiph.org Speex developer
QUOTE (Woodinville @ Mar 15 2014, 02:30) *
1) HRTF cues, which are both TIME DELAY and INTENSITY (both as a function of frequency). This is kind of like 1 and 2 above, but please do not say "phase"; it's time delay, not phase, as far as the ear is concerned.


You're seeing it from the PoV of the ears, and I'm seeing it from the signal. The signal has a phase (not a time) at each frequency.

QUOTE (Woodinville @ Mar 15 2014, 02:30) *
2) Interaural correlation below 500 Hz, interaural envelope correlation (across an ERB, of course) above 2 kHz or so, and a mix of the two between those two frequencies.

Interaural delay is audible at any frequency. HOWEVER, above 2000 Hz or so, the relevant time delay is that of the signal ENVELOPE in an ERB, as opposed to the waveform itself.


My experience is that above 2 kHz IPD/ITD is much less important than IID and IC.

QUOTE (Woodinville @ Mar 15 2014, 02:30) *
Between 500 and 2000Hz, you're less sensitive to interaural delays. Two mechanisms conflict.


Correct. And my rule of thumb has always been "don't mess with anything below 2 kHz because it's not well understood and the ear is good at picking up anything you mess up".
Woodinville
post Mar 16 2014, 07:15
Post #16
QUOTE (jmvalin @ Mar 15 2014, 08:01) *
My experience is that above 2 kHz IPD/ITD is much less important than IID and IC.

Then you're using the wrong signal. If a signal has a flat envelope, yes. If it's a signal from most any real source, not so much.

Even pitchy sounds can trigger this.


Try it. You may find (you will find) that the same problem that manifests as BMLD at under 500Hz happens over 2kHz with signals that have a strongly varying envelope.

Try it.

As to phase vs. time delay, the ear hears time delay, not phase, and if you can analyze phase between channels, you can also analyze time delay. Same variable, one is the integral of the other.
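For readers keeping score, the relationship being invoked is the standard one (notation mine, not from the thread): a pure interchannel delay τ shows up as an interchannel phase difference that grows linearly with frequency, and the delay can be read back off the slope of that phase.

```latex
% If the right channel is a delayed copy of the left, r(t) = l(t - \tau), then
\Delta\varphi(f) = 2\pi f \tau
% and the delay is recovered from the phase slope (group delay):
\tau = \frac{1}{2\pi} \, \frac{\mathrm{d}\,\Delta\varphi(f)}{\mathrm{d}f}
```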


jmvalin
post Mar 16 2014, 18:42
Post #17
Xiph.org Speex developer
QUOTE (Woodinville @ Mar 16 2014, 01:15) *
QUOTE (jmvalin @ Mar 15 2014, 08:01) *
My experience is that above 2 kHz IPD/ITD is much less important than IID and IC.

Then you're using the wrong signal. If a signal has a flat envelope, yes. If it's a signal from most any real source, not so much.

Even pitchy sounds can trigger this.


Try it. You may find (you will find) that the same problem that manifests as BMLD at under 500Hz happens over 2kHz with signals that have a strongly varying envelope.

Try it.

As to phase vs. time delay, the ear hears time delay, not phase, and if you can analyze phase between channels, you can also analyze time delay. Same variable, one is the integral of the other.


I have tried it. I've actually *measured* the perceptual impact of messing up the phase at higher frequencies. What I found was that you can pretty much do whatever you like to the phase, as long as you leave the first 1-2 kHz alone. See this paper I published in 2008, more precisely the perceptual impact of the "SCAL" algorithm.
Woodinville
post Mar 17 2014, 08:43
Post #18
QUOTE (jmvalin @ Mar 16 2014, 10:42) *
I have tried it. I've actually *measured* the perceptual impact of messing up the phase at higher frequencies. What I found was that you can pretty much do whatever you like to the phase, as long as you leave the first 1-2 kHz alone. See this paper I published in 2008, more precisely the perceptual impact of the "SCAL" algorithm.



Again, what did you use for a test signal?



Woodinville
post Mar 17 2014, 08:50
Post #19
I think you've missed the entire point, jv. Try a properly imaged (including HRTF and proper delay) signal with no content below 2kHz that has fast attacks.



jmvalin
post Mar 18 2014, 03:16
Post #20
Xiph.org Speex developer
QUOTE (Woodinville @ Mar 17 2014, 02:43) *
QUOTE (jmvalin @ Mar 16 2014, 10:42) *
I have tried it. I've actually *measured* the perceptual impact of messing up the phase at higher frequencies. What I found was that you can pretty much do whatever you like to the phase, as long as you leave the first 1-2 kHz alone. See this paper I published in 2008, more precisely the perceptual impact of the "SCAL" algorithm.



Again, what did you use for a test signal?


I tested with actual speech and music. I don't care about synthetic signals.
C.R.Helmrich
post Mar 18 2014, 09:23
Post #21
QUOTE (Woodinville @ Mar 17 2014, 08:50) *
I think you've missed the entire point, jv. Try a properly imaged (including HRTF and proper delay) signal with no content below 2kHz that has fast attacks.

Moving back to CELT, I would think that the relatively short window durations (between 5 and 22.5 ms) would allow quite accurate ITD modeling (by means of short-term IID changes), especially when switching to short transforms upon transients. Or not?

Chris


Woodinville
post Mar 20 2014, 06:11
Post #22
QUOTE (C.R.Helmrich @ Mar 18 2014, 01:23) *
QUOTE (Woodinville @ Mar 17 2014, 08:50) *
I think you've missed the entire point, jv. Try a properly imaged (including HRTF and proper delay) signal with no content below 2kHz that has fast attacks.

Moving back to CELT, I would think that the relatively short window durations (between 5 and 22.5 ms) would allow quite accurate ITD modeling (by means of short-term IID changes), especially when switching to short transforms upon transients. Or not?

Chris



You want resolution in the 5 microsecond to 0.9 millisecond range.


Nystagmus
post Mar 28 2014, 17:59
Post #23
QUOTE (jmvalin @ Mar 17 2014, 21:16) *
QUOTE (Woodinville @ Mar 17 2014, 02:43) *
QUOTE (jmvalin @ Mar 16 2014, 10:42) *
I have tried it. I've actually *measured* the perceptual impact of messing up the phase at higher frequencies. What I found was that you can pretty much do whatever you like to the phase, as long as you leave the first 1-2 kHz alone. See this paper I published in 2008, more precisely the perceptual impact of the "SCAL" algorithm.



Again, what did you use for a test signal?


I tested with actual speech and music. I don't care about synthetic signals.


Plenty of music IS 100% SYNTHETIC!
And plenty of music in other genres has synths and organs too. This is the problem with lossy audio encoding: lack of foresight about the ethnomusicological realities. A lot of synthesizer music sounds like crap when lossy-encoded because it pushes the codecs to their (classical- and jazz-oriented?) limits. Check the compression ratios of different genres of music and you start to see that the more advanced the music is in terms of mixdown and special effects, the worse the compression ratio tends to be. Check a big catalogue of many different genres of music sorted in foobar and look at the compression ratios.

Stuff that hardly has any dynamics and not much bandwidth in terms of layers compresses well if it's a good, non-noisy recording. Stuff that has 50 different layers of widely contrasting sources, each panned and autopanned in creative ways, with lots of differences in frequency content and especially lots of high-energy percussion, and the codecs can barely deal with it. You can even see this with the ratios of FLAC: the more complex the music, the harder it is for the codec to encode. Thank goodness it's lossless. And if the music has a lot of aesthetic distortion and bitcrushing, then the codec results plummet. And that type of music is still very popular. Musicians who make this kind of music know what I'm talking about when they chorus, bitcrush, flange, phase, and layer several Reese basses and EQ them heavily and automate the EQ. It creates a distinct sound. There are endless other examples. MP3 and similar codecs are designed for 1990s rock drum kits, not for year-2014 synth drums.

Electronic music can do pretty much anything... low bass with extreme amplitude dynamics, 180 degrees out of phase, coinciding with mono midrange and compressed treble... anything goes. Combinations of hard and soft sounds, tonal and staticky, flanged, phased, resonant noise: anything goes. So the codec had better be able to cope with it.

So what's the answer to the original guy's question?

IgorC
post Mar 28 2014, 20:22
Post #24
Synthetic signals and synthetic music aren't the same thing in this context.
A frequency sweep is an example of a synthetic signal.

Opus and AAC actually handle electronic music very well. Before anything else... test it.

Woodinville
post Apr 18 2014, 01:21
Post #25
Well, I got bored so I put these two files together.

Give them a listen. It would also be interesting to see what the various codecs do with the two files.

Use both speakers and phones, of course.
Attached files: sig.wav (1.68 MB, 60 downloads), sig2.wav (1.68 MB, 59 downloads)


