Topic: ffmpeg vs. SoX for resampling (Read 34055 times)

ffmpeg vs. SoX for resampling

Reply #25
So, where is TOS#8 when everyone agrees on looking at graphs? Not trying to troll, I just smell some bigotry.
There’s no problem with discussing theoretical degradation of signals. IMHO, this (the degradation, not the discussion!) should be avoided wherever possible, whether we can hear it or not. If anyone had said ‘Yeah, it looks awful, and it sounds even worse!’, then TOS #8 might be relevant. I suspect that FFmpeg is encroaching onto audible territory, but I make no claim either way: rather, my point is that there’s no need for it to degrade the signal that much when other algorithms produce conversions that are many times cleaner.

Reply #26
As always, there is a tradeoff between speed and accuracy, and in this field there are even different techniques in play.


There are two good ways to resample:

- Using SINC ( sin(x)/x ) interpolation.
- Using a decimator/interpolator combination.

In both cases, a filter is needed to reconstruct the signal, and the quality of that filter reflects how fast and how clean it is in removing frequency imaging, without adding other unwanted distortions.


Both are relatively slow. (They might be "fast" for downsampling a single stereo sample from 96 kHz to 44.1 kHz, but now think about a realtime sampler like BASSMIDI, FluidSynth, or any other SoundFont player, where you not only play several streams, but also change their speed while playing.)


So there have always existed other "resamplers" (not in the academic sense, but approximations) like:

- ZO (zero order, or sample-and-hold). The fastest and worst of them all. (See the graph of Secret Rabbit Code, ZOH.)
- Linear interpolation. Quite an improvement over ZO, and still fast. (This was being used in 16- and 32-channel trackers back in 1992-1996 on 386 and 486 PCs! See Secret Rabbit Code, Linear. One can apply a filter with this type of interpolation, as in Wavosaur Linear, but it sort of defeats the purpose.)
- Cubic interpolation. Cubic and other polynomial interpolators are an advancement over linear interpolation, approximating the signal using polynomials. This generally lowers aliasing notably, but they still lack a filter. (See Renoise cubic, or OpenMPT polyphase, which are, again, realtime multichannel trackers with realtime virtual effects.)
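
To make the three approximations concrete, here is a minimal Python sketch of each, evaluated at a fractional read position. The function names are made up for illustration, and the cubic variant shown is a 4-point Catmull-Rom (Hermite) spline, which is only one of several cubics a tracker might use; this is not taken from any of the programs named above.

```python
import math

def zero_order_hold(samples, pos):
    """Return the sample at or just before the fractional position."""
    return samples[int(math.floor(pos))]

def linear(samples, pos):
    """Straight line between the two neighbouring samples."""
    i = int(math.floor(pos))
    frac = pos - i
    return samples[i] + (samples[i + 1] - samples[i]) * frac

def cubic_hermite(samples, pos):
    """4-point, 3rd-order Catmull-Rom interpolation (assumed cubic variant)."""
    i = int(math.floor(pos))
    frac = pos - i
    y0, y1, y2, y3 = samples[i - 1], samples[i], samples[i + 1], samples[i + 2]
    c0 = y1
    c1 = 0.5 * (y2 - y0)
    c2 = y0 - 2.5 * y1 + 2.0 * y2 - 0.5 * y3
    c3 = 0.5 * (y3 - y0) + 1.5 * (y1 - y2)
    # Horner evaluation of c3*frac^3 + c2*frac^2 + c1*frac + c0
    return ((c3 * frac + c2) * frac + c1) * frac + c0

# Compare the worst-case error at half-sample positions on a slow sine.
freq = 0.05  # cycles per sample
samples = [math.sin(2 * math.pi * freq * n) for n in range(40)]
for interp in (zero_order_hold, linear, cubic_hermite):
    err = max(abs(interp(samples, n + 0.5)
                  - math.sin(2 * math.pi * freq * (n + 0.5)))
              for n in range(2, 30))
    print(interp.__name__, err)
```

On a low-frequency sine like this one the error shrinks at each step from hold to linear to cubic, but none of the three does any band-limiting, which is the "missing filter" mentioned in the list.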


Any SINC interpolator should give a clean signal, but it is at least 4 times slower than cubic interpolation. It can filter by itself, but to be of good quality, it gets slower (because it needs more taps).


So, while a product whose specific purpose is to change the sample rate of a signal should have a good resampler, one should not think that it is the only reasonable way to resample.

Reply #27
So, where is TOS#8 when everyone agrees on looking at graphs? Not trying to troll, I just smell some bigotry.

I doubt you'd be happy if a straight PCM file conversion (say, from AIFF to WAV) introduced artefacts; why settle for less when resampling is involved?  Even if you can't hear the artefacts immediately, you run the risk that you might hear them later (e.g. on a better system, or after further processing).

Reply #28
The conversion quality is the same (graphs for the newer version are @ infinitewave under Audacity 2.0.3); the performance is higher cos it's faster than before.


Do I understand correctly that Audacity 2.0.3 is using the new ffmpeg, through which it uses libsoxr for the resampling? I would like to add a new resampling option to the Linux ALSA rate plugin. Until recently I thought the SoX implementation was the best from the quality/speed point of view. Using a library compatible with the libsamplerate API would make the addition simple, as libsamplerate is already supported by the plugin.

Thanks for the info.

Reply #29
Audacity are using libsoxr directly (native API). LameDrop, OTOH, is using libsoxr's libsamplerate-like API.

If you're linking to libsamplerate dynamically, you may be able to try out libsoxr before any recompiling, by simply moving/renaming the libsoxr-lsr.so in place of the libsamplerate one (I tried this successfully with pulseaudio and saw a 4-5 times speed up @ SRC_SINC_MEDIUM_QUALITY).

Note, however, that varying the sample rate in real time is only supported in libsoxr through its native API, so this trick wouldn't work in that case.

HTH.

Reply #30

As always, there is a tradeoff between speed and accuracy, and in this field there are even different techniques in play.


There are two good ways to resample:

- Using SINC ( sin(x)/x ) interpolation.
- Using a decimator/interpolator combination.

In both cases, a filter is needed to reconstruct the signal, and the quality of that filter reflects how fast and how clean it is in removing frequency imaging, without adding other unwanted distortions.


Aren't these two ways of implementing essentially the same process since the decimating/interpolating filter will probably have pretty close to a sinc form?

FWIW, I would add polynomial interpolation as the main alternative to sinc interpolation.  Most software uses one or the other.

Reply #31

... Any SINC interpolator should give a clean signal ...

Can you tell me please which software uses SINC?

Reply #32
@ halb27:  I haven't checked other software, but a sinc method is implemented in Psycle (and at least as far back as 2003 or so, there was a tracker named Aodix that had a 64-point sinc interpolator for non-realtime rendering).  Here you can see the code of Psycle:

http://sourceforge.net/p/psycle/code/10455...rs/dsp.hpp#l544
search for: static float band_limited

The sinc table is calculated in this other file, and a Blackman window is applied to make it finite.
http://sourceforge.net/p/psycle/code/10455...helpers/dsp.cpp

This implementation still lacks a filter (I am trying to find a good one that doesn't require too much lookahead, but I might end up calling soxr's variable-rate mode if I can't find a good way to do it).
I have a file from as far back as 2001 (I had to double-check to be sure of the year) that shows a sinc interpolator and applies a filter by modifying the sinc speed. But this way of filtering decays slowly and, in what I've tested, alters the frequencies too.




@saratoga: I am not an expert in DSP or maths (I did study the Fourier transform at university, but it was applied to signals in general, not specifically to sound).
But with my knowledge of resampling (i.e. what I've tried to learn), sinc interpolation is considered the ideal resampling method (which also means it is not possible with finite time/signals) because the sinc is the impulse response of an ideal brickwall lowpass filter. Real implementations have to window the sinc in order to make it finite, and in this way limit the number of samples needed to calculate the output. (See Psycle's implementation.)

In contrast, decimating and interpolating is a two-step method which first upsamples using the zero-stuffing method, applies a lowpass filter with its cutoff set for the lower of the two sample rates, and then downsamples by taking samples directly from the lowpassed signal. The difficult part is getting the rate to which to upsample, which is the least common multiple of the two rates.
In some way, this is how a DAC works (except that then the result is a continuous signal).
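
The zero-stuff, filter, decimate sequence can be sketched as below. This is a deliberately naive Python illustration with hypothetical function names; the caller supplies the FIR taps, and nothing here is optimized.

```python
from math import gcd

def rational_factors(rate_in, rate_out):
    """Return (up, down) so that rate_in * up / down == rate_out."""
    g = gcd(rate_in, rate_out)
    return rate_out // g, rate_in // g

def resample_rational(samples, up, down, taps):
    """Zero-stuff by `up`, FIR-filter with `taps`, keep every `down`-th sample.

    The taps should sum to roughly `up` so that the zero-stuffing does not
    reduce the signal level.
    """
    stuffed = []
    for s in samples:
        stuffed.append(s)
        stuffed.extend([0.0] * (up - 1))
    filtered = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k, h in enumerate(taps):
            if 0 <= n - k < len(stuffed):
                acc += h * stuffed[n - k]
        filtered.append(acc)
    return filtered[::down]

up, down = rational_factors(96000, 44100)
print(up, down)    # 147 320
print(96000 * up)  # 14112000, the intermediate rate in Hz
```

For 96 kHz to 44.1 kHz the intermediate rate is 14.112 MHz, and 146 out of every 147 intermediate samples are zeros, which is why practical implementations skip the multiplications by zero and only compute the outputs that survive decimation (a polyphase structure).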


I have included the polynomial interpolators in the "other resamplers", since they are, in some way, approximations, or concepts applied to sound when sometimes they originated in graphics (splines, for example, are more about visuals than samples).

Reply #33
The sinc table is calculated in this other file, and a Blackman window is applied to make it finite.

AFAIK LAME also uses a Blackman-windowed sinc.

Reply #34
The conversion quality is the same (graphs for the newer version are @ infinitewave under Audacity 2.0.3); the performance is higher cos it's faster than before.


I would like to add a new resampling option to linux alsa rate plugin.


I already did this. Replicating the libsamplerate code in alsa-plugins with some gluing just works (with some quality modes).

Here is a patch that implements soxr_lsr_{HQ,MQ,LQ} :
http://ompldr.org/vaGc3Ng/Initial-soxr-lsr-support.patch

I shared more info here:
http://www.hydrogenaudio.org/forums/index....st&p=817595

 

Reply #35

@saratoga: I am not an expert in DSP or maths (I did study the Fourier transform at university, but it was applied to signals in general, not specifically to sound).
But with my knowledge of resampling (i.e. what I've tried to learn), sinc interpolation is considered the ideal resampling method (which also means it is not possible with finite time/signals) because the sinc is the impulse response of an ideal brickwall lowpass filter. Real implementations have to window the sinc in order to make it finite, and in this way limit the number of samples needed to calculate the output. (See Psycle's implementation.)

In contrast, decimating and interpolating is a two-step method which first upsamples using the zero-stuffing method, applies a lowpass filter with its cutoff set for the lower of the two sample rates, and then downsamples by taking samples directly from the lowpassed signal. The difficult part is getting the rate to which to upsample, which is the least common multiple of the two rates.
In some way, this is how a DAC works (except that then the result is a continuous signal).

The "ideal" lowpass filter is a sin(x)/x function of infinite extent. Sampling theory tells us that using such a filter allows for perfect sampling and perfect reconstruction of a continuous signal up to (but not including) fs/2.

If you think about it, resampling can be considered a reconstruction/sampling process, so the same theory applies.
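
Written out, the reconstruction/sampling view looks like this. The symbols are chosen here for illustration: f_s and f_s' are the input and output rates, T and T' the corresponding sample periods, and the sin(x)/x above corresponds to x = \pi u.

```latex
x(t) = \sum_{n=-\infty}^{\infty} x[n]\,
       \operatorname{sinc}\!\left(\frac{t - nT}{T}\right),
\qquad \operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u},
\qquad T = \frac{1}{f_s}

y[m] = x(mT'), \qquad T' = \frac{1}{f_s'}
```

The first line is the ideal reconstruction and the second is the re-sampling at the new rate. When f_s' is lower than f_s, the signal also has to be band-limited below f_s'/2 before the second step, which amounts to using a wider (lower-cutoff) sinc.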

There are many ways to think about this, and many ways to optimize the processing (avoiding multiplications that do not affect the output is a significant one). I believe that you can come a long way by only considering lowpass filter design. Linear interpolation, cubic interpolation, (windowed) sinc can all be considered lowpass filters.
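
One way to see this in code: every interpolator can be written as "sum of samples times a kernel", and the kernel is the impulse response of the implied lowpass filter. The sketch below is a minimal Python illustration with made-up names; only the kernel changes between methods, and only taps inside the kernel's support are ever multiplied.

```python
import math

def hold_kernel(d):
    """Zero-order hold: a rectangle covering 0 <= d < 1."""
    return 1.0 if 0.0 <= d < 1.0 else 0.0

def triangle_kernel(d):
    """Linear interpolation: a triangle spanning -1 < d < 1."""
    return max(0.0, 1.0 - abs(d))

def interpolate(samples, pos, kernel, half_width):
    """Generic kernel interpolation over 2*half_width neighbouring samples."""
    i = int(math.floor(pos))
    acc = 0.0
    for n in range(i - half_width + 1, i + half_width + 1):
        if 0 <= n < len(samples):
            acc += samples[n] * kernel(pos - n)
    return acc

samples = [0.0, 1.0, 4.0, 9.0]
print(interpolate(samples, 1.25, triangle_kernel, 1))  # 1.75, same as linear interpolation
print(interpolate(samples, 2.5, hold_kernel, 1))       # 4.0, same as sample-and-hold
```

A windowed sinc drops into the same function as a wider kernel with a larger half_width. The rectangle and triangle are simply very short, very gentle lowpass filters, which is why they let so much imaging through.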

-k

Reply #36

@saratoga: I am not an expert in DSP or maths (I did study the Fourier transform at university, but it was applied to signals in general, not specifically to sound).
But with my knowledge of resampling (i.e. what I've tried to learn), sinc interpolation is considered the ideal resampling method (which also means it is not possible with finite time/signals) because the sinc is the impulse response of an ideal brickwall lowpass filter. Real implementations have to window the sinc in order to make it finite, and in this way limit the number of samples needed to calculate the output. (See Psycle's implementation.)

In contrast, decimating and interpolating is a two-step method which first upsamples using the zero-stuffing method, applies a lowpass filter with its cutoff set for the lower of the two sample rates, and then downsamples by taking samples directly from the lowpassed signal.


A lowpass filter is a windowed sinc.  So you propose two methods, one of which fits a windowed sinc to calculate values, and another which ... fits a windowed sinc to calculate values. 

These are just two ways of implementing the same algorithm; which is preferred is just an implementation detail that depends on the exact needs of the resampler.  Trying to draw some abstract distinction between them is silly.


I have included the polynomial interpolators in the "other resamplers", since they are, in some way, approximations, or concepts applied to sound when sometimes they originated in graphics


They didn't originate in graphics, they originated in 17th century boundary value problems.  They're general numerical techniques, as such both polynomial and windowed sinc interpolation are widely used in audio and graphics. 


( splines, for example, is more about visuals than samples).


What is it you think digital images are made of if not samples?

Reply #37
I already did this. Replicating the libsamplerate code in alsa-plugins with some gluing just works (with some quality modes).


Fantastic, great work. Please would you mind sending the patch to the alsa-devel mailing list to have it included upstream? It would help a number of people. Thanks a lot in advance.

Reply #38
@ saratoga:
As I said, my knowledge of this is average, and I might even use the wrong words sometimes.

That said, I'm not sure I agree completely with what you say.
A -> B  does not necessarily equal B -> A

A sinc filter (which has a sinc function as its impulse response) is a lowpass filter, but a lowpass filter is not necessarily a sinc filter. I don't have the math knowledge to design filters (or fully understand the poles and zeroes), but I understand that the polynomials generated are not just "sinc approximations".
Just like the most basic lowpass filter is not a sinc filter:  o0 + (i1-o0)*FC
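
For reference, that one-line filter can be written out as below. This is a minimal Python sketch that reads o0 as the previous output, i1 as the current input and FC as a coefficient between 0 and 1; that reading of the notation is an assumption.

```python
def one_pole_lowpass(samples, fc):
    """One-pole lowpass: out = prev_out + (in - prev_out) * fc."""
    out = []
    prev = 0.0
    for x in samples:
        prev = prev + (x - prev) * fc
        out.append(prev)
    return out

# The impulse response is a decaying exponential, fc * (1 - fc)**n, not a sinc.
print(one_pole_lowpass([1.0, 0.0, 0.0, 0.0], 0.5))  # [0.5, 0.25, 0.125, 0.0625]
```

Because the output feeds back into itself this is an IIR filter: its impulse response never quite reaches zero and it is not symmetric, unlike the finite, symmetric windowed sinc.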


You simplified the two methods I described down to using a lowpass filter. In essence, this is true (we want to get a lowpassed signal to avoid aliasing), but I wanted to differentiate the theory from the result. Example:
A linear interpolation is an intuitive way to find the value between two points, but it is not based on theory that reconstructs the path that a continuous bandlimited signal would take.

In that way, I made a distinction between the sinc method and the decimate/interpolate method because they do have a theory related to sound behind them, but it is not the same theory. (Or, let's say, one is the theory directly, and the other is a derivative of the theory, in that the second one does not necessarily imply a sinc filter, even though that is the ideal one.)
I can accept that the decimate/interpolate method is akin to doing a fixed-point implementation of a floating-point one, so in essence they do the same. But as implementations, they reach the solution differently.


About polynomials, I admit I might have been too quick. I overlooked the math history, but again I was mentioning concrete methods while you mention the concepts on which they are based.  Polynomials serve many purposes, and not all of them apply to bandlimited signals.
I mentioned graphics, because the word "spline" does describe that, a line (a visual concept).

Images are indeed made of samples, but... what is the equivalent of the Shannon theorem for images? I could accept that images are bandlimited (there's a finite spectrum represented by the sampled image colours, but even then, the RGB points are the representation of the image in the time domain?)

Reply #39
Images are indeed made of samples, but... what is the equivalent of the Shannon theorem for images? I could accept that images are bandlimited (there's a finite spectrum represented by the sampled image colours, but even then, the RGB points are the representation of the image in the time domain?)


You might want to look at the Wikipedia article on the Nyquist–Shannon sampling theorem as it has a section specifically about multivariable sampling (images for example.)

Reply #40

@ saratoga:
As I said, my knowledge of this is average, and I might even use the wrong words sometimes.

That said, I'm not sure I agree completely with what you say.
A -> B  does not necessarily equal B -> A

A sinc filter (which has a sinc function as its impulse response) is a lowpass filter, but a lowpass filter is not necessarily a sinc filter. I don't have the math knowledge to design filters (or fully understand the poles and zeroes), but I understand that the polynomials generated are not just "sinc approximations".


I think you misunderstand.  My point is that the two methods you suggested (sinc interpolation and decimator/interpolator) are basically the same thing. 

I brought up polynomials as an example of a different approach.  My point is that there are basically two families of resamplers in widespread use: sinc-based and polynomial-based.


A linear interpolation is an intuitive way to find the value between two points, but it is not based on theory that reconstructs the path that a continuous bandlimited signal would take.

In that way, I made a distinction between the sinc method and the decimate/interpolate method because they do have a theory related to sound behind them, but it is not the same theory.


To be clear, the decimate/interpolate method always uses a sinc filter (or close approximation thereof) in practice.  Linear interpolation can be combined with decimation/interpolation (using a sinc filter), but in practice never is because that rather defeats the purpose. 


I can accept that the decimate/interpolate method is akin to doing a fixed-point implementation of a floating-point one, so in essence they do the same. But as implementations, they reach the solution differently.


I don't accept that.  What does decimation/interpolation have to do with machine precision?  You can do it integer, fixed point, floating point, decimal, whatever. 

I mentioned graphics, because the word "spline" does describe that, a line (a visual concept).


huh?


Images are indeed made of samples, but... what is the equivalent of the Shannon theorem for images? I could accept that images are bandlimited


Pixels are samples in 2D spaces, just as voxels are samples in 3D spaces.  The sampling theorem is universally applicable to all N-dimensional spaces. 


(there's a finite spectrum represented by the sampled image colours, but even then, the RGB points are the representation of the image in the time domain?)


An RGB image is simply 3 independent grayscale images recorded using a color filter.

Reply #41
Can you tell me please which software uses SINC?

First, go to http://src.infinitewave.ca/
For the top graph, select Test Result = Impulse, then press the > button to look at each resampler in turn:
  • If the impulse looks broadly like Audacity 2.0.3 (perhaps with shorter tails), it's a close approximation to a sinc. Could be obtained by windowing sinc, but could also be other techniques such as Parks-McClellan optimal FIR. Longer tails equate to steeper/deeper filters.
  • If it looks broadly like AbletonLive 8.2, then it's a sinc approximation that's been phase-adjusted to be causal (i.e. non-linear phase).
  • If it looks roughly like AbletonLive 7 or Waveburner 1.2, then it's low-order polynomial.  Even this is a simple approximation to a sinc, but gives correspondingly poor results.
  • Others, like FL Studio 10 (6 Point Hermite), use other ways to approximate a sinc, but again with not so good results.
  • Most are type #1.
  • Soundhack looks like it should be type #1 but has implementation errors.
  • SIR2 and a few others are type #1 but inverted (bug).
  • Wavosaur 1.0.3.0 looks like a design error: polynomial followed by a sinc. (A multi-stage approach is perfectly valid; however, a good multi-stage design will still give a sinc impulse response.)
So the answer is that they all use a sinc approximation, but the closer the approximation, the closer the result to the 'ideal'.

Reply #42
So SSRC high precision is a good approximation to SINC?

Reply #43
SSRC uses a Kaiser-windowed sinc.
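
For anyone unfamiliar with the term, a Kaiser-windowed sinc lowpass can be sketched in a few lines of Python. This is not SSRC's code: the function names and the tap count, cutoff and beta values are assumptions picked for the example. Beta is the knob that trades stopband depth against transition width for a given number of taps.

```python
import math

def bessel_i0(x):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total = 1.0
    term = 1.0
    k = 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total

def kaiser_sinc_lowpass(num_taps, cutoff, beta):
    """FIR lowpass taps; cutoff is a fraction of the sample rate (0 to 0.5)."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        x = 2.0 * cutoff * t
        ideal = 2.0 * cutoff * (1.0 if x == 0.0
                                else math.sin(math.pi * x) / (math.pi * x))
        r = t / mid  # runs from -1 to 1 across the filter
        window = bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / bessel_i0(beta)
        taps.append(ideal * window)
    return taps

taps = kaiser_sinc_lowpass(63, 0.2, 8.0)
print(sum(taps))  # DC gain, close to 1
```

Raising beta deepens the stopband (less aliasing and imaging) at the cost of a wider transition band, and adding taps narrows the transition band again at the cost of speed, which is the same speed/accuracy tradeoff discussed earlier in the thread.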

Reply #44
Sorry, I don't know what this means in terms of quality. Is SSRC high precision a good choice in this respect?

Reply #46
It has some problems upsampling though:



Reply #47
No problem for me, I'm only interested in downsampling.

Reply #48
It has some problems upsampling though:


Are you referring to the background signals?  If so, I wouldn't really call them a 'problem' given that they're spectrally flat and 110 dB below peak.

Reply #49
It has some problems upsampling though:

Heh! Just think about what the main purpose of SSRC was back in 2001 when Naoki wrote this little gem. The need to upsample anything to 192 kHz was most likely as far away as rabbits taking over planet Mars.