Interesting Histograms.
Axon
post May 30 2011, 02:08
Post #26





Group: Members (Donating)
Posts: 1984
Joined: 4-January 04
From: Austin, TX
Member No.: 10933



QUOTE (googlebot @ May 29 2011, 16:34) *
QUOTE (Axon @ May 29 2011, 20:36) *
Then I posted "GNU Octave fucking blows" on Facebook. Now see jj, if I friended you, you would have seen that comment, and you wouldn't have had to go through all that trouble, eh ;)


I somewhat associate 'Axon' with wooing Woodinville for quite some time now, but I think that's a new level. :)

You have no idea.


QUOTE (Woodinville @ May 29 2011, 17:41) *
QUOTE (Canar @ May 29 2011, 14:34) *
I'll bug the admins. I can't do this. In the interim, zip the .m and post that?


Well, screen-copying the text above into octave works like a champ :)

It's not like there's any special character stuff in it.


Right, but, my hacked audio package is 380 lines of code, my test for said hack was another 50... it starts to add up.

In any case. I have uploaded everything I've got HERE. Have at it.
Canar
post May 30 2011, 06:14
Post #27





Group: Super Moderator
Posts: 3361
Joined: 26-July 02
From: princegeorge.ca
Member No.: 2796



QUOTE (Axon @ May 29 2011, 18:08) *
Right, but, my hacked audio package is 380 lines of code, my test for said hack was another 50... it starts to add up.
[ codebox ] will work.

This post has been edited by Canar: May 30 2011, 06:14


--------------------
You cannot ABX the rustling of jimmies.
No mouse? No problem.
bandpass
post May 30 2011, 06:53
Post #28





Group: Members
Posts: 326
Joined: 3-August 08
From: UK
Member No.: 56644



QUOTE (Woodinville @ May 29 2011, 20:15) *
You can go into the library and change it to the proper 2^(n-1) without much trouble.

2^(n-1) is still problematic (though perhaps not for histograms). E.g. a wav containing a minimum value cannot be inverted.
Canar
post May 30 2011, 07:15
Post #29





Group: Super Moderator
Posts: 3361
Joined: 26-July 02
From: princegeorge.ca
Member No.: 2796



More generally,

CODE
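float scale(int value){
  /* Work in floating point: in C, the integer expression (MAX_INT+MIN_INT)/2
     truncates to 0, and MAX_INT-MIN_INT overflows int. */
  double offset=((double)MAX_INT+MIN_INT)/2;
  double range=((double)MAX_INT-MIN_INT)/2;
  return (float)((value-offset)/range);  /* subtract the midpoint, divide by half the span */
}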


MAX_INT and MIN_INT are the maximum and minimum values for the int type, which is a fixed-point number.

This post has been edited by Canar: May 30 2011, 07:19
Reason for edit: cleaning up


--------------------
You cannot ABX the rustling of jimmies.
No mouse? No problem.
bandpass
post May 30 2011, 07:31
Post #30





Group: Members
Posts: 326
Joined: 3-August 08
From: UK
Member No.: 56644



But not for wav (or au, or aiff) files, where 0 is defined to be the midpoint ("offset" in the code).

Edit: not sure if the code has representational issues: mathematically, (MAX_INT+MIN_INT)/2 = -0.5
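
A minimal sketch of that arithmetic, assuming 16-bit samples (the code above leaves the int width open, so the limits here are my assumption, and scale_symmetric is a hypothetical name). It shows the -0.5 midpoint and how C-style truncating division would lose it:

```python
# Sketch only: 16-bit limits are assumed for illustration.
MAX_INT = 32767
MIN_INT = -32768

midpoint = (MAX_INT + MIN_INT) / 2      # true division keeps the -0.5
half_span = (MAX_INT - MIN_INT) / 2     # 32767.5

# C integer division truncates toward zero, so the offset would vanish there.
truncated_midpoint = int((MAX_INT + MIN_INT) / 2)

def scale_symmetric(value):
    """Map MIN_INT..MAX_INT onto -1.0..+1.0 around the -0.5 midpoint."""
    return (value - midpoint) / half_span

print(midpoint, half_span, truncated_midpoint)
print(scale_symmetric(MIN_INT), scale_symmetric(MAX_INT), scale_symmetric(0))
```

With this mapping, integer 0 lands slightly above float 0.0, which is the clash with the wav convention mentioned above.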

This post has been edited by bandpass: May 30 2011, 07:40
Canar
post May 30 2011, 08:41
Post #31





Group: Super Moderator
Posts: 3361
Joined: 26-July 02
From: princegeorge.ca
Member No.: 2796



That code should map int values to 0..1 inclusive, ideally, ignoring external definition of midpoint. Defining the midpoint to be int 0 implies that the encoding is slightly biased towards negative values. Seems to be a really weird engineering decision. Do you have a citation for defining 0 to be the midpoint?

This post has been edited by Canar: May 30 2011, 08:42


--------------------
You cannot ABX the rustling of jimmies.
No mouse? No problem.
romor
post May 30 2011, 08:59
Post #32





Group: Members
Posts: 673
Joined: 16-January 09
Member No.: 65630



:unsure: To me it seems that Woodinville found that formula as a workaround and, as xnor showed later, Octave's wavread does have an issue, only of a different kind.

The libsndfile data I was referring to was in e-notation, and it seemed that everything was OK with 16-digit Octave floats (not to mention the 4-6 digits of Cool Edit ;) ), but it wasn't: values were way off, sometimes to two decimals.
After changing things as xnor noted, then formatting everything to 16-digit floats and double-checking, it seems fine again.

Apologetic IPy version for those histograms ;)


However, a literal translation of Woodinville's script, using libsndfile, seems even faster:
CODE
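def histo(fn):

    import numpy as np
    import scikits.audiolab as au
    from matplotlib.pyplot import semilogy

    # samples as floats, sample rate, encoding
    (sp, sf, b) = au.wavread(fn)

    # shape (channels, frames), matching x=x' in the Octave script
    sp = sp.conj().transpose()
    his = np.zeros((65536,), dtype=int)

    for i in range(len(sp[1])):
        for j in (0, 1):
            # Octave indexes from 1 (hence +32769 there); Python indexes from 0
            t = int(round(sp[j, i] * 32768 + 32768))
            his[t] += 1

    # normalise, and floor at 1e-12 so empty bins survive the log plot
    his = np.maximum(his / float(his.sum()), .000000000001)
    xax = np.arange(-32768, 32768)

    semilogy(xax, his)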


Compared to histo loop in Octave:

CODE
fname='xerrox.wav';
x=wavread(fname);
x=x';

len=length(x)

his(1:65536)=0;

low=32768;
high=32768;
windd=hann(2048)';

sfmmean=0;
nmeas=0;

for ii=1:len
for jj=1:2

t=x(jj,ii);

if ( t < -1)
t=-1
fflush(stdout);
end

if (t >65535/65536)
t=65535/65536
fflush(stdout);
end

t=round(t*32768+32769);

his(t)=his(t)+1;

if (t < low)
low=t;
end

if (t > high)
high=t;
end

end

end

tot=sum(his);
his=his/tot;
his=max(his, .000000000001);
xax=-32768:32767;


The IPy version is ~5 times faster, timed with "tic; run; toc" in Octave and "%timeit" in IPython.

The whole loop (leaving the simple arithmetic outside the main loop out of the timing) is ~12 times faster in IPy using numpy built against MKL: http://pastebin.com/taKbfCDk
Not sure if the ratio depends on track length.

Also, IPy's parallel and multicore processing features weren't used, and I imagine the code could be simplified further; it is a literal translation to IPy, mainly because I don't understand at first sight what's going on, and translating was already trouble ;)
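
For reference, a rough sketch of what a simplified, loop-free version might look like. This is not the script that was timed above; histogram16 is a hypothetical helper, and it assumes float samples already scaled so that -1.0 corresponds to -32768. The data is synthetic so the block runs without a wav file:

```python
import numpy as np

def histogram16(samples):
    """Relative frequency of each 16-bit code in float samples scaled to [-1, 1)."""
    flat = np.asarray(samples, dtype=float).ravel()
    # Convert back to integer codes, clip strays, shift to 0-based bin indices.
    codes = np.rint(flat * 32768).astype(np.int64)
    codes = np.clip(codes, -32768, 32767) + 32768
    counts = np.bincount(codes, minlength=65536)
    # Normalise and floor at 1e-12 so empty bins survive a log-scale plot.
    return np.maximum(counts / counts.sum(), 1e-12)

# Synthetic stand-in for a stereo track (made-up data, not a real recording).
rng = np.random.default_rng(0)
fake = np.clip(rng.normal(0.0, 0.2, size=(1000, 2)), -1.0, 32767 / 32768)
his = histogram16(fake)
xax = np.arange(-32768, 32768)   # x axis for semilogy(xax, his)
```

np.bincount does the counting in compiled code, which is where I would expect most of any speed-up to come from; I have not timed it against the loop versions.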

I used "whos" and slicing both in Octave and IPy, to check if variables match and everything is OK

Not sure what's the purpose of "kk - for loop": kk is never used if intended to process both channel data, and even if it should, "jj - for loop" seems to me it could take care. Or perhaps I use Octave rarely ;)


--------------------
scripts: http://goo.gl/M1qVLQ
bandpass
post May 30 2011, 09:10
Post #33





Group: Members
Posts: 326
Joined: 3-August 08
From: UK
Member No.: 56644



QUOTE (Canar @ May 30 2011, 08:41) *
That code should map int values to 0..1 inclusive, ideally, ignoring external definition of midpoint. Defining the midpoint to be int 0 implies that the encoding is slightly biased towards negative values. Seems to be a really weird engineering decision. Do you have a citation for defining 0 to be the midpoint?

http://msdn.microsoft.com/en-us/library/ms...audiodataformat

Weird, certainly: lots of special-case code needed and/or DC-offsets creeping in.
googlebot
post May 30 2011, 09:44
Post #34





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



QUOTE (Canar @ May 30 2011, 09:41) *
Defining the midpoint to be int 0 implies that the encoding is slightly biased towards negative values.


It doesn't have to imply that. The negative range can also be interpreted to just have more headroom. When you convert from a balanced to an unbalanced encoding, the max negative symbol stays unused. When you convert from an unbalanced to a balanced encoding, there indeed needs to be special case handling. Best general guidance would be not using the max negative symbol at all and asking for user feedback when you encounter it on the input pipeline.

It is a good thing, that 0 is defined as the midpoint. Else detecting silence would be a PITA. When the first PCM formats were developed, these differences were probably too far below the analog noise floor of the best converters to really cause any concern.

This post has been edited by googlebot: May 30 2011, 09:49
bandpass
post May 30 2011, 10:20
Post #35





Group: Members
Posts: 326
Joined: 3-August 08
From: UK
Member No.: 56644



QUOTE (googlebot @ May 30 2011, 09:44) *
Best general guidance would be not using the max negative symbol at all and asking for user feedback when you encounter it on the input pipeline.

Should an ADC request user intervention whenever its input is < 1/2^n of its range?

QUOTE
It is a good thing, that 0 is defined as the midpoint. Else detecting silence would be a PITA.

I'm not sure that's useful though; in practice, silence is defined in (finite) dB.
xnor
post May 30 2011, 11:04
Post #36





Group: Developer
Posts: 683
Joined: 29-April 11
From: Austria
Member No.: 90198



I don't see the problem. PCM cannot represent exactly +1.0, it's as simple as that. The valid range is -1.0 <= y < +1.0 where 1.0 maps to 2^(nbits-1).

Anything above that range is simply clipped to the highest possible value, so inverting -32768 results in +32767.
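
A small sketch of that clipping behaviour, assuming 16-bit samples held as plain Python ints (invert_saturating is a hypothetical name, not taken from any particular editor or library):

```python
INT16_MIN, INT16_MAX = -32768, 32767

def invert_saturating(sample):
    """Negate a 16-bit sample, clipping the result to the representable range."""
    return max(INT16_MIN, min(INT16_MAX, -sample))

print(invert_saturating(-32768))                     # clipped to the top code
print(invert_saturating(-32767))                     # same output code
print(invert_saturating(invert_saturating(-32768)))  # does not round-trip
```

The price of clipping is that -32768 and -32767 invert to the same code, so inverting twice does not return the original minimum value.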

This post has been edited by xnor: May 30 2011, 11:07
Woodinville
post May 30 2011, 11:10
Post #37





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



QUOTE (bandpass @ May 29 2011, 22:53) *
QUOTE (Woodinville @ May 29 2011, 20:15) *
You can go into the library and change it to the proper 2^(n-1) without much trouble.

2^(n-1) is still problematic (though perhaps not for histograms). E.g. a wav containing a minimum value cannot be inverted.


That is correct, and proper.

That is the nature of 2's complement, there is only one '0' entry, and thus one extra negative entry.

And, yes, if the quantizer is done as a standard PCM quantizer, there must be a zero reconstruction level.

That is, after all, the definition. Yes. Really.


--------------------
-----
J. D. (jj) Johnston
Woodinville
post May 30 2011, 11:17
Post #38





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



QUOTE (romor @ May 30 2011, 00:59) *
Not sure what's the purpose of "kk - for loop": kk is never used if intended to process both channel data, and even if it should, "jj - for loop" seems to me it could take care. Or perhaps I use Octave rarely ;)


It's for doing the spectral flatness measure, but I may have indeed messed up, let me look.

Yep, messed up. the '1' in the array reference should be 'kk'.

Sigh. I'll see if the sfm is much different. It's unlikely.

Oh, and yes, Octave is astonishingly slow, even when you only do the histogram stuff. In fact, the SFM stuff adds surprisingly little to the run time, which is bizarre.

Even without bounds checking it's slow, slow, slow. No idea why.

Matlab is quite a bit faster, but I don't have it here.

This post has been edited by Woodinville: May 30 2011, 11:20


--------------------
-----
J. D. (jj) Johnston
googlebot
post May 30 2011, 13:00
Post #39





Group: Members
Posts: 698
Joined: 6-March 10
Member No.: 78779



QUOTE (bandpass @ May 30 2011, 11:20) *
Should an ADC request user intervention whenever its input is < 1/2^n of its range?


No, just fall back to the usual behavior defined for all out-of-range values.

QUOTE (bandpass @ May 30 2011, 11:20) *
I'm not sure that's useful though; in practice, silence is defined in (finite) dB.


It's not a bug, it's a feature! From an analog perspective, a good place for the midpoint would have been between the two smallest symbols and both ranges would have been in perfect symmetry. Silence would then be encoded as some form of noise alternating between the smallest symbols, which is fine, since PCM encoding doesn't make any promise better than that.

Giving '0' the privileged meaning of 'digital silence', at the cost of one usual symbol in the positive range, enables scenarios, where you signal something like: "don't try to replicate my primitive approximation of silence but replace it with the best silence you have available". The cost (1 symbol) isn't significant in contrast to the gained possibility.

In practice you dither, so a privileged '0' symbol is unnecessary, since silence is encoded as noise anyway. Some dithering tools have something like an "auto-black" feature, though.

QUOTE (xnor @ May 30 2011, 12:04) *
I don't see the problem. PCM cannot represent exactly +1.0, it's as simple as that. The valid range is -1.0 <= y < +1.0 where 1.0 maps to 2^(nbits-1).


[-1.0, 1.0] is a perfectly fine range for PCM encoding in float representation. Why should artificial constraints from a legacy storage format be carried over to a better format, which doesn't benefit from that constraint in any way? Conversion to and from [-1, 1] isn't black magic, after all.


This post has been edited by googlebot: May 30 2011, 13:12
xnor
post May 30 2011, 18:15
Post #40





Group: Developer
Posts: 683
Joined: 29-April 11
From: Austria
Member No.: 90198



QUOTE (googlebot @ May 30 2011, 14:00) *
[-1.0, 1.0] is a perfectly fine range for PCM encoding in float representation.

I (we?) were talking about the PCM format (format tag 1) in RIFF WAVE files, which is integer only, and the normalization issue.
With normalized floats (format tag 3) the range is of course, like you posted, -1.0 <= y <= +1.0 and normalization is not needed.

I don't think those non-floating point formats are legacy at all. I know a couple of recording engineers that do not use floats as storage format.
And I think it's common practice to keep the level at least a fraction of a dB below full scale. Even if you're only 0.01 dB below full scale you're down to something like 32730 with 16-bit integers.
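
A quick check of that figure, as a sketch. It assumes the usual amplitude convention of 20*log10 for dB and takes 32767 as full scale by default; peak_code is a hypothetical helper name:

```python
def peak_code(db_below_full_scale, full_scale=32767):
    """Peak sample value for a level given in dB below full scale."""
    return full_scale * 10 ** (-db_below_full_scale / 20)

print(peak_code(0.01))          # relative to 32767
print(peak_code(0.01, 32768))   # relative to 2^15
```

Either way the peak ends up roughly 38 codes below the top, in line with the "something like 32730" estimate.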

This post has been edited by xnor: May 30 2011, 18:41
Canar
post May 30 2011, 18:40
Post #41





Group: Super Moderator
Posts: 3361
Joined: 26-July 02
From: princegeorge.ca
Member No.: 2796



Once again I put in my amateur two-bits worth and receive sound instruction in return. I <3 you guys. </off-topic>


--------------------
You cannot ABX the rustling of jimmies.
No mouse? No problem.
Axon
post May 30 2011, 20:37
Post #42





Group: Members (Donating)
Posts: 1984
Joined: 4-January 04
From: Austin, TX
Member No.: 10933



Yeah I also simplified jj's Octave code (into something like 10 lines IIRC), and always attempted to use native Octave functions wherever I could, and it still ran like a dog.

mmmm... Python for numeric work. Crunchy.
Woodinville
post May 30 2011, 20:49
Post #43





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



QUOTE (Axon @ May 30 2011, 12:37) *
Yeah I also simplified jj's Octave code (into something like 10 lines IIRC), and always attempted to use native Octave functions wherever I could, and it still ran like a dog.

mmmm... Python for numeric work. Crunchy.


Nah, dogs run fast. It's not that fast :(


--------------------
-----
J. D. (jj) Johnston
bandpass
post May 31 2011, 07:49
Post #44





Group: Members
Posts: 326
Joined: 3-August 08
From: UK
Member No.: 56644



QUOTE (xnor @ May 30 2011, 11:04) *
PCM cannot represent exactly +1.0, it's as simple as that. The valid range is -1.0 <= y < +1.0 where 1.0 maps to 2^(nbits-1).
Anything above that range is simply clipped to the highest possible value, so inverting -32768 results in +32767.

This approach introduces a new mathematics, where inversion is non-linear, which is madness, or at least highly undesirable.

Either the analogue signal is biased with LSB, in which case the digital signal range is -32767 to +32767 (-32768 can never occur and would clip if sent to a corresponding DAC), or the digital signal is biased with LSB, in which case the inverse of -32768 is 32767.

Note that even though Microsoft claims that the midpoint is zero, a WAV file cannot know how your ADC is biased-up.

QUOTE (Woodinville @ May 30 2011, 11:10) *
That is correct, and proper.
That is, after all, the definition. Yes. Really.

Can you provide a source for the definition?

This post has been edited by bandpass: May 31 2011, 07:58
Nick.C
post May 31 2011, 08:03
Post #45


lossyWAV Developer


Group: Developer
Posts: 1791
Joined: 11-April 07
From: Wherever here is
Member No.: 42400



.... but can you hear a 0.5 lsb offset?


--------------------
lossyWAV -q X -a 4 --feedback 4| FLAC -8 ~= 320kbps
Woodinville
post May 31 2011, 08:24
Post #46





Group: Members
Posts: 1402
Joined: 9-January 05
From: JJ's office.
Member No.: 18957



QUOTE (bandpass @ May 30 2011, 23:49) *
Can you provide a source for the definition?


It goes all the way back to about 1960. I don't recall it presently, but in fact zero is zero. It all boils down to that.

There have been a variety of scalings for fix to float conversion, but the most common is that of -1 is the largest negative. Given the reality of integer 2's complement math, that's really how it all works out.
For sign-magnitude integers, you wind up with one zero with two codes for it in the integer.

It is possible to do midrise quantizers instead of midtread quantizers, but then the following bites you:

When you start to do integer math and floating point math and expect something to work out the same way, you need zero to be in fact zero, and nothing else but. Otherwise you have very different domains for your signals.
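
To make the two quantizer types concrete, here is a minimal sketch of uniform mid-tread and mid-rise quantizers. The function names and the 16-bit step size are assumptions chosen for illustration, not something taken from a standard:

```python
import math

def midtread(x, step):
    """Mid-tread: reconstruction levels at k*step, so 0.0 is one of them."""
    return step * math.floor(x / step + 0.5)

def midrise(x, step):
    """Mid-rise: reconstruction levels at (k + 0.5)*step, so 0.0 is not one."""
    return step * (math.floor(x / step) + 0.5)

step = 1.0 / 32768   # assumed 16-bit step size
for x in (0.0, 1e-9, -1e-9):
    print(x, midtread(x, step), midrise(x, step))
```

The mid-tread version reconstructs tiny inputs as exactly zero, while the mid-rise version always lands half a step above or below it, so an integer 0 and a float 0.0 would no longer mean the same thing.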


--------------------
-----
J. D. (jj) Johnston
Axon
post May 31 2011, 08:26
Post #47





Group: Members (Donating)
Posts: 1984
Joined: 4-January 04
From: Austin, TX
Member No.: 10933



0.5lsb is an issue for spectrum analysis, and in principle, might also exacerbate potential stability issues in lowpass filters. But more importantly, IT'S JUST WRONG.

QUOTE (xnor @ May 30 2011, 11:04) *
PCM cannot represent exactly +1.0, it's as simple as that. The valid range is -1.0 <= y < +1.0 where 1.0 maps to 2^(nbits-1).
Anything above that range is simply clipped to the highest possible value, so inverting -32768 results in +32767.

Not quite -- inverting (multiplying by -1) -32768 results in -32768. Invert all the bits, and add 1.
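
A sketch of that bit-level operation, emulating 16-bit two's complement wraparound with Python ints (negate_int16 is a hypothetical helper; Python's own ints never overflow, hence the explicit masking):

```python
def negate_int16(sample):
    """Two's complement negation in 16 bits: invert all bits, add 1, wrap."""
    raw = ((~sample) + 1) & 0xFFFF                 # keep only the low 16 bits
    return raw - 0x10000 if raw & 0x8000 else raw  # reinterpret as signed

for value in (-32768, -32767, -1, 0, 1, 32767):
    print(value, negate_int16(value))
```

Every value except the minimum negates as expected; -32768 maps back onto itself because +32768 has no 16-bit code.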

QUOTE
QUOTE (Woodinville @ May 30 2011, 11:10) *
That is correct, and proper.
That is, after all, the definition. Yes. Really.

Can you provide a source for the definition?

The Wikipedia entry for two's complement arithmetic?

This post has been edited by Axon: May 31 2011, 08:38
Axon
post May 31 2011, 08:30
Post #48





Group: Members (Donating)
Posts: 1984
Joined: 4-January 04
From: Austin, TX
Member No.: 10933



And just to flesh this discussion out some, yes, having a negative value which cannot be inverted to a positive value *is* the cleanest and most efficient solution. Unless anybody here would instead prefer negative zero. Hands? :)
C.R.Helmrich
post May 31 2011, 09:19
Post #49





Group: Developer
Posts: 688
Joined: 6-December 08
From: Erlangen Germany
Member No.: 64012



QUOTE (Axon @ May 31 2011, 09:26) *
QUOTE (xnor @ May 30 2011, 11:04) *
PCM cannot represent exactly +1.0, it's as simple as that. The valid range is -1.0 <= y < +1.0 where 1.0 maps to 2^(nbits-1).
Anything above that range is simply clipped to the highest possible value, so inverting -32768 results in +32767.

Not quite -- inverting (multiplying by -1) -32768 results in -32768. Invert all the bits, and add 1.

Meaning all -1s are inverted to 2, and there will be no 1s? Btw, the way xnor described it is how e.g. Audition inverts.

Chris


--------------------
If I don't reply to your reply, it means I agree with you.
bandpass
post May 31 2011, 10:13
Post #50





Group: Members
Posts: 326
Joined: 3-August 08
From: UK
Member No.: 56644



QUOTE (Axon @ May 31 2011, 08:26) *
0.5lsb is an issue for spectrum analysis, and in principle, might also exacerbate potential stability issues in lowpass filters. But more importantly, IT'S JUST WRONG.

Indeed, hence the discussion—it's a small but annoying issue if it's not handled consistently.

QUOTE
QUOTE (xnor @ May 30 2011, 11:04) *
PCM cannot represent exactly +1.0, it's as simple as that. The valid range is -1.0 <= y < +1.0 where 1.0 maps to 2^(nbits-1).
Anything above that range is simply clipped to the highest possible value, so inverting -32768 results in +32767.

Not quite -- inverting (multiplying by -1) -32768 results in -32768. Invert all the bits, and add 1.

But that's also highly undesirable for DSP.

At the ADC, if there is no bias, inverting an analogue signal that converts to -32768 would produce an analogue signal that converts to 32767; digital inversion should give the same result (at the DAC output that is).

QUOTE
QUOTE
QUOTE (Woodinville @ May 30 2011, 11:10) *
That is correct, and proper.
That is, after all, the definition. Yes. Really.

Can you provide a source for the definition?

The Wikipedia entry for two's complement arithmetic?

It doesn't mention ADC/DAC biasing. A better place to look might be IEC 60908 or somesuch.

If there is no ADC bias (and 16-bit ADC values are stored unmodified or with just the top bit flipped), then a valid DSP solution is:

CODE
float dsp_sample = (adc_sample + 0.5) / 32767.5;

If there is LSB ADC bias then a valid DSP solution is:

CODE
float dsp_sample = adc_sample / 32767.0;

and -32768 is an unused value. The code:

CODE
float dsp_sample = adc_sample / 32768.0;

doesn't seem to map to any real world ADC scenario.
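
To compare the three conversions side by side, here is a small sketch that evaluates each one at the end codes. The function names are mine, chosen only for illustration:

```python
def scale_unbiased(adc_sample):
    """No ADC bias: codes are centred on -0.5, range maps to exactly -1..+1."""
    return (adc_sample + 0.5) / 32767.5

def scale_lsb_biased(adc_sample):
    """LSB-biased ADC: codes are centred on 0, and -32768 should not occur."""
    return adc_sample / 32767.0

def scale_power_of_two(adc_sample):
    """The common 2^(n-1) scaling: -32768 maps to -1.0, 32767 falls short of +1.0."""
    return adc_sample / 32768.0

for code in (-32768, -32767, 0, 32767):
    print(code, scale_unbiased(code), scale_lsb_biased(code), scale_power_of_two(code))
```

Only the first mapping sends both end codes to exactly -1 and +1, only the second and third send code 0 to exactly 0.0, and the second overshoots -1.0 if -32768 ever does turn up.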

In practice, as has been mentioned, recordings are made with headroom and probably have any DC-offset (w.r.t. digital 0) removed with post-processing; this however has the same result as biasing the ADC, which again means that -32768 should be an unused value.

This post has been edited by bandpass: May 31 2011, 10:36