Synchrotron, Universal, cross-correlating WAVE synchronization tool
rpp3po
post Jun 6 2009, 17:35
Post #1





Group: Developer
Posts: 1126
Joined: 11-February 03
From: Germany
Member No.: 4961



Synchrotron 1.2

Both audio encoders and decoders often add over a thousand samples of delay at the beginning of a file. This prevents both gapless playback and proper, sample-synchronized ABX testing.

Several different implementations for gapless meta information have evolved over time for different lossy encoders. In practice this can work out pretty well, if you have full control over encoding and playback.

If you want to compare samples from different encoders, experience has shown that you cannot be sure the delay has really been removed from all files. A decoder may or may not add its own delay, and it may or may not remove the encoder delay (by reading meta information). For example, converting Quicktime encoded AAC files through the Quicktime framework does not add decoding delay and removes encoder delay. Converting the same file to WAV with VLC adds 1088 samples of overall delay instead. I also got different overall delays from LAME encoded files at 48 kb/s and 128 kb/s.

Synchrotron can remove delay introduced by all lossy codecs without having to rely on meta data. It uses a mathematical process called cross-correlation to synchronize two files to the exact sample and then cuts off any leading delay from the second file (or just displays it).
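To illustrate the idea, here is a minimal sketch of a delay search by normalized cross-correlation. It is not Synchrotron's actual code: the class and method names (DelaySketch, correlationAt, findDelay) are made up for this example, and it uses floating point for brevity. For every candidate lag it computes the correlation coefficient between the primary samples and the shifted secondary samples, then keeps the lag with the highest value.

```java
final class DelaySketch {
    /** Correlation coefficient of a[0..n) against b[lag..lag+n), in the range -1..1. */
    static double correlationAt(int[] a, int[] b, int lag, int n) {
        double sumA = 0, sumB = 0;
        for (int i = 0; i < n; i++) {
            sumA += a[i];
            sumB += b[i + lag];
        }
        double meanA = sumA / n, meanB = sumB / n;
        double num = 0, energyA = 0, energyB = 0;
        for (int i = 0; i < n; i++) {
            double x = a[i] - meanA;
            double y = b[i + lag] - meanB;
            num += x * y;
            energyA += x * x;
            energyB += y * y;
        }
        double denom = Math.sqrt(energyA * energyB);
        return denom == 0 ? 0 : num / denom; // silence correlates with nothing
    }

    /** Returns the lag in [0, maxLag] at which b matches a best. */
    static int findDelay(int[] a, int[] b, int maxLag) {
        int n = Math.min(a.length, b.length - maxLag);
        if (n <= 0) {
            throw new IllegalArgumentException("secondary signal too short for maxLag");
        }
        int bestLag = 0;
        double best = Double.NEGATIVE_INFINITY;
        for (int lag = 0; lag <= maxLag; lag++) {
            double c = correlationAt(a, b, lag, n);
            if (c > best) {
                best = c;
                bestLag = lag;
            }
        }
        return bestLag;
    }
}
```

With mono sample arrays a (original) and b (decoded), DelaySketch.findDelay(a, b, 4096) would return the number of leading samples to cut from b. The cost grows with the number of samples times the number of lags, which is why only a limited window and lag range are practical with this direct approach.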

It serves the following purposes:
  • Prepare files prior to ABX testing.
  • Verify your encoder's/decoder's or disk writer's accuracy concerning meta-data based delay handling.
  • Display the cross-correlation of any two files.
  • Provide well structured and easy to read sample code, so that other developers can implement the same mechanism into their programs (e.g. Foobar's ABX component).

This is the cross platform Java binary:

Synchrotron-1.2.zip

There is only a command line interface and no GUI. It is easy to integrate into scripts or other applications. I'm not planning to write a GUI; feel free to write one yourself if you are interested.

CODE
Usage: java -jar Synchrotron.jar [--cut] [--fullscan] primary_wav secondary_wav

  primary_wav: original PCM WAVE file (up to 24 bit integer supported)
secondary_wav: PCM WAVE file with similar content and possible delay
        --cut: remove delay from the beginning of secondary_wav
   --fullscan: scan entire file (very slow)


Sample output:
CODE
java -jar Synchrotron.jar --cut tmp1.wav tmp2.wav

PCM_SIGNED 44100.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian
Skipped 24598 leading sample(s) to improve accuracy.
Delay: 1088 - Cross Correlation: 0.99712723
Delay removal successful!


Make sure that you have a current Java Runtime Environment installed, either from your distribution's package repository (Linux) or from here (Windows). Mac OS X has it pre-installed.

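Put the jar file into the same directory as your WAV files, or pass its full path to java -jar. Note that the java launcher does not search the PATH environment variable for jar files, so copying the jar to a directory such as /usr/local/bin or c:\windows does not by itself make it callable from everywhere.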

Testing and comments welcome!

For developers:

The code is plain OO. All cross-correlation related issues are encapsulated inside the Correlator class. Java specific code, for example anything related to audio file reading, is located in the AudioFileCorrelator class. You are probably only interested in the former.

Testing has shown that it is entirely sufficient to cross-correlate about 40000 samples instead of the whole file. AudioFormatCorrelator forwards the audio streams to a significant position, so that cross-correlation isn't just applied to leading silence/noise.
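As a rough sketch of the skipping step: the post does not say how a "significant position" is chosen, so the amplitude threshold below is purely an assumption, as are the names WindowSketch, firstSignificantIndex and analysisWindow. The start position would be determined on the primary file and then applied to both files, so that the relative delay between them is preserved.

```java
import java.util.Arrays;

final class WindowSketch {
    /** Index of the first sample whose magnitude exceeds threshold, or samples.length if none does. */
    static int firstSignificantIndex(int[] samples, int threshold) {
        for (int i = 0; i < samples.length; i++) {
            if (Math.abs(samples[i]) > threshold) {
                return i;
            }
        }
        return samples.length;
    }

    /** Copies up to length samples starting at start, clamped to the end of the array. */
    static int[] analysisWindow(int[] samples, int start, int length) {
        int from = Math.min(start, samples.length);
        int to = Math.min(samples.length, from + length);
        return Arrays.copyOfRange(samples, from, to);
    }
}
```

A call such as analysisWindow(primary, firstSignificantIndex(primary, 100), 40000) would then yield the roughly 40000 samples that are actually cross-correlated; the threshold of 100 is only a placeholder.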

This is the source including Netbeans project files:

Synchrotron-1.2-Source.zip

This post has been edited by rpp3po: Jun 25 2009, 02:12
Axon
post Jun 6 2009, 17:45
Post #2





Group: Members (Donating)
Posts: 1984
Joined: 4-January 04
From: Austin, TX
Member No.: 10933



OK rpp3po, this is good sh*t, but this reminds me of a feature I desperately want for vinyl craziness and I wanted to bounce it off of you. But you are most likely going to shoot me for asking for it.

How hard would it be to dynamically cross-correlate the signal? That is, do the cross-correlation at the start of the file, do the time shift internally, and then every T seconds, cut out a 2T-sized chunk of the file, window it with a Gaussian pulse, and then redo the cross-correlation.

Canar, if this is too oddball of a request could you split this off to a separate thread?

Also, can this do subsample delays? Is that even a concern with lossy encoders?

This post has been edited by Axon: Jun 6 2009, 17:45
rpp3po
post Jun 6 2009, 17:56
Post #3





Group: Developer
Posts: 1126
Joined: 11-February 03
From: Germany
Member No.: 4961



I'm not sure I have fully understood what you want, but basically I don't see anything that would prevent you from doing this. The Correlator class already encapsulates the time shifting internally. Every T seconds you could create a new Correlator object, feed it two 2T-sized integer (sample) arrays, and ask it for a result (getCrossCorrelation()). The class is even thread safe. There would probably still be about a second of lag until the result is available, depending on the number of samples within 2T. Cross-correlation is quite heavy on processor and memory bandwidth. But in any case you should get a result every T seconds on average.
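The chunked scheme could look roughly like the following sketch. It deliberately does not use the Correlator class, whose exact signatures are not shown in this thread; DriftSketch and its methods are hypothetical names. Every hop samples (T) it takes a chunk of 2 * hop samples (2T), weights it with a Gaussian window, and searches lags in both directions, since drift can go either way. The window width (sigma of a quarter of the chunk) is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

final class DriftSketch {
    /** Gaussian window of length n; sigma is given as a fraction of n. */
    static double[] gaussianWindow(int n, double sigmaFraction) {
        double[] w = new double[n];
        double center = (n - 1) / 2.0;
        double sigma = sigmaFraction * n;
        for (int i = 0; i < n; i++) {
            double z = (i - center) / sigma;
            w[i] = Math.exp(-0.5 * z * z);
        }
        return w;
    }

    /** Normalized correlation of the windowed chunk a[pos..] against b shifted by lag (no mean removal). */
    static double windowedCorrelation(int[] a, int[] b, int pos, int lag, double[] w) {
        double num = 0, energyA = 0, energyB = 0;
        for (int i = 0; i < w.length; i++) {
            double x = w[i] * a[pos + i];
            double y = w[i] * b[pos + i + lag];
            num += x * y;
            energyA += x * x;
            energyB += y * y;
        }
        double denom = Math.sqrt(energyA * energyB);
        return denom == 0 ? 0 : num / denom;
    }

    /** One delay estimate per hop; a positive value means b lags behind a. */
    static List<Integer> trackDelay(int[] a, int[] b, int hop, int maxLag) {
        if (hop <= 0 || maxLag < 0) {
            throw new IllegalArgumentException("hop must be positive and maxLag non-negative");
        }
        double[] w = gaussianWindow(2 * hop, 0.25);
        int usable = Math.min(a.length, b.length);
        List<Integer> delays = new ArrayList<>();
        // Start at maxLag and stop early enough that every shifted index stays in range.
        for (int pos = maxLag; pos + w.length + maxLag <= usable; pos += hop) {
            int bestLag = 0;
            double best = Double.NEGATIVE_INFINITY;
            for (int lag = -maxLag; lag <= maxLag; lag++) {
                double c = windowedCorrelation(a, b, pos, lag, w);
                if (c > best) {
                    best = c;
                    bestLag = lag;
                }
            }
            delays.add(bestLag);
        }
        return delays;
    }
}
```

The returned list then traces how the delay changes over the course of the file. Whether such estimates are stable enough for vinyl transfers would have to be tested on real material.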

QUOTE (Axon @ Jun 6 2009, 18:45) *
Also, can this do subsample delays? Is that even a concern with lossy encoders?


No, Synchrotron does not oversample and works at exactly the same precision (sample rate) as the input. For its intended main purpose (WAV file correction) subsample precision would not make a difference, since you can only apply correction in integer steps. I would guess that it doesn't make a difference for lossy encoding, either: the decoded signal is converted to a series of PCM samples, and you can't apply a delay correction of less than one sample even if a value at subsample precision were available.

This post has been edited by rpp3po: Jun 9 2009, 01:51
C.R.Helmrich
post Jun 6 2009, 19:14
Post #4





Group: Developer
Posts: 686
Joined: 6-December 08
From: Erlangen Germany
Member No.: 64012



Nice work, rpp3po!

QUOTE (rpp3po @ Jun 6 2009, 18:56) *
QUOTE (Axon @ Jun 6 2009, 18:45) *
Also, can this do subsample delays? Is that even a concern with lossy encoders?


No, Synchrotron does not oversample and works at exactly the same precision (sample rate) as the input. For its intended main purpose (WAV file correction) subsample precision would not make a difference, since you can only apply correction in integer steps.

Correct. Plus, at normal sampling rates of 32 kHz or more, sub-sample delays are inaudible. Actually, delays of one or two samples are probably also inaudible, but for blind listening tests, it is always better to restrict inter-stimulus delay to the microsecond range.

This, however, does not mean that lossy encoders do not create sub-sample delays. They in fact do at low bit rates because they downsample before encoding (from 44.1 to 32 kHz, for example). If you then upsample after decoding (or your sound card does so) to obtain the same sampling rate as the original file, this is likely to lead to a non-integer sample delay due to the anti-aliasing filter.

Chris


--------------------
If I don't reply to your reply, it means I agree with you.
krabapple
post Jun 8 2009, 16:04
Post #6





Group: Members
Posts: 2221
Joined: 18-December 03
Member No.: 10538



Looks to be intended for lossy vs lossless, but could Synchrotron be used to 'align' two versions of the same lossless track (e.g., original and remastered version) for subsequent 'nulling' tests?

(I'm going to try it, just thought I'd ask too. ;>)

rpp3po
post Jun 8 2009, 16:54
Post #7





Group: Developer
Posts: 1126
Joined: 11-February 03
From: Germany
Member No.: 4961



As long as it is just a different mastering of the same recording, this should work perfectly. If parts of the track have been replaced with material from other recording sessions, your results may vary. The content's timing and length should also not have been changed. Usual mastering steps such as normalization, compression, stereo processing, and even slight reverberation and equalization should not do much harm. The average correlation values will be lower, but Synchrotron should still be able to identify the point of maximum correlation.

The current version can detect a delay of at most 4096 samples. This is enough for common encoder delays. If about 1/10th of a second of possible delay is not enough for your purpose, let me know. Also, the main program cross-correlates only 40000 samples. If your second mastering is very different, increasing that number could help, too.

Increasing these values is no problem, but it would hurt performance considerably; that's why the presets are moderate.
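For the nulling test itself, once the delay is known, the residual can be measured with something like the sketch below. This is not part of Synchrotron; NullTestSketch and residualDb are names invented for this example. It subtracts the aligned second file from the first and reports the level of the difference relative to the first file in dB.

```java
final class NullTestSketch {
    /** Level of (a - shifted b) relative to a, in dB; more negative means a deeper null. */
    static double residualDb(int[] a, int[] b, int delay) {
        int n = Math.min(a.length, b.length - delay);
        if (delay < 0 || n <= 0) {
            throw new IllegalArgumentException("no overlap for the given delay");
        }
        double diffEnergy = 0, refEnergy = 0;
        for (int i = 0; i < n; i++) {
            double d = a[i] - (double) b[i + delay];
            diffEnergy += d * d;
            refEnergy += (double) a[i] * a[i];
        }
        if (refEnergy == 0) {
            throw new IllegalArgumentException("reference signal is silent");
        }
        // Energy ratio, hence 10 * log10; identical signals yield negative infinity.
        return 10 * Math.log10(diffEnergy / refEnergy);
    }
}
```

Note that a remaster with a different overall gain will not null deeply even when perfectly aligned, so a level match before subtraction may be needed.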

This post has been edited by rpp3po: Jun 8 2009, 17:57
Martel
post Jun 8 2009, 20:23
Post #8





Group: Members
Posts: 553
Joined: 31-May 04
From: Czech Rep.
Member No.: 14430



QUOTE (rpp3po @ Jun 8 2009, 08:54) *
It's no problem to increase these values, but it would considerably hurt performance, that's why they are preset moderately.
This is just a brainstorming attempt based on some knowledge that I once possessed... :)
Isn't it possible to do something like an FFT of the two signals (you may have to reverse one of them, I don't remember exactly), do a dot product in the spectral domain, then an IFFT to obtain the cross-correlation (complexity goes down from N^2 to roughly N log N, but you need power-of-two sample lengths)?
I apologize if I am talking nonsense, but this should be basically the same as applying an FIR filter (convolution in the time domain ~ dot product in the spectral domain), only that one of the signals is reversed in correlation compared to convolution.


--------------------
IE4 Rockbox Clip+ AAC@192; HD 668B/HD 518 Xonar DX FB2k FLAC;
rpp3po
post Jun 8 2009, 22:13
Post #9





Group: Developer
Posts: 1126
Joined: 11-February 03
From: Germany
Member No.: 4961



The cross-correlation calculation itself could indeed run in O(n log n) with your proposal, I guess. I'm currently using an integer-only approach without FFT conversion; a by-product of this is the exact sample offset of the highest correlation between both signals. Would your FFT method also output this offset, or just the two signals' overall cross-correlation value?

This post has been edited by rpp3po: Jun 8 2009, 23:20
Martel
post Jun 9 2009, 18:49
Post #10





Group: Members
Posts: 553
Joined: 31-May 04
From: Czech Rep.
Member No.: 14430



Oops, I mixed up the English terms (I'm not English, sorry). I did not mean a dot product of the spectra but rather the products of the corresponding spectral components, which should yield an N-wide vector (series) on which you can do an IFFT (you can't do that on a scalar, which is the result of a dot product).

I guess the method yields a series of N cross-correlation values corresponding to different shift offsets. However, I'm not sure which part of the full cross-correlation series (2N - 2 samples, IIRC) that is.
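A sketch of the FFT route, to make the question about offsets concrete. This is not Synchrotron's code; FftCorrelationSketch and its method names are invented for this example, and the FFT is a plain textbook radix-2 implementation. Both signals are zero-padded to a power of two of at least the sum of their lengths, so the circular correlation does not wrap around. The spectra are multiplied component-wise with one of them conjugated (the frequency-domain counterpart of reversing a signal), and the inverse FFT yields one correlation value per lag: non-negative lags sit at the start of the output, negative lags are wrapped to the end. The position of the maximum is therefore the sample offset.

```java
import java.util.Arrays;

final class FftCorrelationSketch {
    /** In-place iterative radix-2 FFT; the array length must be a power of two. */
    static void fft(double[] re, double[] im, boolean inverse) {
        int n = re.length;
        for (int i = 1, j = 0; i < n; i++) { // bit-reversal permutation
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        for (int len = 2; len <= n; len <<= 1) {
            double ang = 2 * Math.PI / len * (inverse ? 1 : -1);
            double wRe = Math.cos(ang), wIm = Math.sin(ang);
            for (int i = 0; i < n; i += len) {
                double curRe = 1, curIm = 0;
                for (int k = 0; k < len / 2; k++) {
                    int u = i + k, v = i + k + len / 2;
                    double tRe = re[v] * curRe - im[v] * curIm;
                    double tIm = re[v] * curIm + im[v] * curRe;
                    re[v] = re[u] - tRe; im[v] = im[u] - tIm;
                    re[u] += tRe; im[u] += tIm;
                    double nextRe = curRe * wRe - curIm * wIm;
                    curIm = curRe * wIm + curIm * wRe;
                    curRe = nextRe;
                }
            }
        }
        if (inverse) {
            for (int i = 0; i < n; i++) { re[i] /= n; im[i] /= n; }
        }
    }

    /** Unnormalized cross-correlation c[lag] = sum of a[i] * b[i + lag] for lag in [0, maxLag]. */
    static double[] crossCorrelate(int[] a, int[] b, int maxLag) {
        int size = 1;
        while (size < a.length + b.length) size <<= 1; // zero-padding avoids wrap-around
        double[] aRe = new double[size], aIm = new double[size];
        double[] bRe = new double[size], bIm = new double[size];
        for (int i = 0; i < a.length; i++) aRe[i] = a[i];
        for (int i = 0; i < b.length; i++) bRe[i] = b[i];
        fft(aRe, aIm, false);
        fft(bRe, bIm, false);
        for (int k = 0; k < size; k++) { // conj(A[k]) * B[k]
            double re = aRe[k] * bRe[k] + aIm[k] * bIm[k];
            double im = aRe[k] * bIm[k] - aIm[k] * bRe[k];
            aRe[k] = re;
            aIm[k] = im;
        }
        fft(aRe, aIm, true);
        return Arrays.copyOf(aRe, maxLag + 1);
    }

    /** Index of the largest value, i.e. the lag with the highest correlation. */
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) best = i;
        }
        return best;
    }
}
```

The values returned here are raw sums, not correlation coefficients in the range -1..1; to report a figure like the one Synchrotron prints, the value at the best lag would still have to be normalized by the energies of the two overlapping segments.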

This topic was like 5 minutes during a university class and it was 6 years ago. It wasn't particularly memorable, sorry.


rpp3po
post Jun 22 2009, 00:27
Post #11





Group: Developer
Posts: 1126
Joined: 11-February 03
From: Germany
Member No.: 4961



Updated to 1.1
  • Added a safety check so that files with a cross-correlation lower than 0.88 are not cut.

I had just accidentally interchanged an original and lossy version on the command line and got strange results (like a correlation of 0.44). If this had happened while the --cut option was present, the program could have cut the original instead of the lossy file at an arbitrary position.

This post has been edited by rpp3po: Jun 25 2009, 03:44
Arnold B. Kruege...
post Jun 24 2009, 14:18
Post #12





Group: Members
Posts: 3690
Joined: 29-October 08
From: USA, 48236
Member No.: 61311



QUOTE (C.R.Helmrich @ Jun 6 2009, 14:14) *
Plus, at normal sampling rates of 32 kHz or more, sub-sample delays are inaudible. Actually, delays of one or two samples are probably also inaudible, but for blind listening tests, it is always better to restrict inter-stimulus delay to the microsecond range.


Are you seriously claiming that you can reliably hear a difference between two files that are misaligned by 3 or more samples?

My experience-based rule of thumb says that up to 1 ms difference is innocuous.
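For scale, the two figures under discussion can be put into the same unit. The helper below is only an illustration (DelayUnits is an invented name), and 44.1 kHz is assumed because that is the rate shown in the sample output in the first post: 3 samples then correspond to about 68 microseconds, while 1 ms corresponds to 44.1 samples.

```java
final class DelayUnits {
    /** Duration of the given number of samples in microseconds. */
    static double samplesToMicroseconds(long samples, double sampleRateHz) {
        return samples * 1_000_000.0 / sampleRateHz;
    }

    /** Number of samples (possibly fractional) that fit into the given number of milliseconds. */
    static double millisecondsToSamples(double milliseconds, double sampleRateHz) {
        return milliseconds * sampleRateHz / 1000.0;
    }
}
```

By the same conversion, the 1088-sample delay from the first post is roughly 24.7 ms at 44.1 kHz.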
C.R.Helmrich
post Jun 24 2009, 21:50
Post #13





Group: Developer
Posts: 686
Joined: 6-December 08
From: Erlangen Germany
Member No.: 64012



QUOTE (Arnold B. Krueger @ Jun 24 2009, 15:18) *
Are you seriously claiming that you can reliably hear a difference between two files that are misaligned by 3 or more samples?

My experience-based rule of thumb says that up to 1 ms difference is innocuous.

Yes, I am. Not for stationary passages, of course. But when you cut (for example, when defining a loop in a blind test) right within a sharp attack, say, a castanet or bass drum hit, and then loop that part, you can hear a difference. Example here. Both castanet excerpts are exactly 2 seconds long but offset by 3 samples. Hear for yourself. And this is not even the most obvious example I can come up with. You can always create something which has a clear instationarity between the cut boundaries in one stimulus, but not in the other.

Chris

Update: I just ABXed that successfully using foobar even without looping those two files. The first attack "plops" more in one item, and the "plop" seems to come from a different location in the stereo image.

This post has been edited by C.R.Helmrich: Jun 24 2009, 21:57


C.R.Helmrich
post Jun 25 2009, 00:06
Post #14





Group: Developer
Posts: 686
Joined: 6-December 08
From: Erlangen Germany
Member No.: 64012



Sorry, I meant discontinuity instead of instationarity. I also didn't realize that this thread is in the upload section :P So here are my two demo files from above.

Chris

Attached File  sqam27_3smp_delay.zip ( 564.84K ) Number of downloads: 162


rpp3po
post Jun 25 2009, 00:17
Post #15





Group: Developer
Posts: 1126
Joined: 11-February 03
From: Germany
Member No.: 4961



Updated to 1.2
  • Added --fullscan switch for complete file scans. That's slow and usually not needed for delay computation, but can be employed to get two files' overall cross correlation.
  • Slight refactoring. Increased precision.
  • jUnit test cases and test samples removed from source package.


This post has been edited by rpp3po: Jun 25 2009, 01:50
