foobar2000 1.3.8 beta discussion
Reply #14 – 2015-03-05 07:41:35
I noticed that 1.3.8 b1 decodes FLAC and ALAC slightly slower than 1.3.7 (the difference in decoding speed is several percent). Maybe it's the result of Whole Program Optimization being turned off. This is most interesting:

1.3.7 stable: ALAC: 324.123x realtime, FLAC: 828.854x realtime
1.3.8 beta [SSE]: ALAC: 308.505x realtime, FLAC: 758.120x realtime
1.3.8 alpha [no SSE]: ALAC: 307.728x realtime, FLAC: 758.339x realtime

Whole program optimizations are enabled. The offending optimizer feature (devirtualization of function calls):
* Was not available in VS2010 in the first place.
* Is not relevant to the low-level decoding code (pure C); as far as I know, it affects C++ only.
* The compiler switch that suppresses it, /d2notypeopt, isn't actually enabled for the libFLAC code yet.

libFLAC also turned out suspiciously slow. So, for some reason, the VS2010 build was faster. If we find no further advantages to using the VS2013 compiler, we'll fall back. Note that this doesn't affect formats that we handle through FFmpeg, which is compiled externally with GCC.

I would like to vote against disabling the SSE2 optimizations. Users with CPUs that do not support the SSE2 instruction set (old Athlons) can always use an older version of foobar2000. The same applies to users who refuse to install SP3 on Windows XP. On the other hand, everybody else (>99%) will benefit from the SSE2 optimizations. So please enable the SSE2 optimizations again. Thank you.

I don't think you really understand what happened here. foobar2000 has carried various SSE optimizations (including for newer SSE revisions) for a long time, enabled conditionally at run time after checking the relevant CPU feature bits. The decoder libraries that we use, especially FFmpeg and libFLAC, already contain heavy optimizations for recent instruction sets. ReplayGain at >1000x realtime is only possible thanks to SSE2 code.
What happened is that, while upgrading the toolchain, the whole of foobar2000 was accidentally compiled with SSE instructions generated in place of legacy x87 opcodes for all floating-point math, because the same compiler options turned out to have different meanings in different Visual Studio versions. But as you can see from the numbers I quoted above, this made no difference at all to decoding speeds (at least with these two formats). Once there are actual benefits to compiling everything with SSE enabled, we'll reconsider; for now there are none.

Once the above speed issue is sorted out, I can provide builds with different SSE modes enabled, then you can find out for yourself whether anything actually gets faster. You can also rebuild FFmpeg yourself, using the exact version I used but a different configure line, to get different SSE modes in it. The version I include with fb2k performs runtime CPU detection to enable the optimal mode.
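As a rough sketch of the "different configure line" idea: FFmpeg's configure script accepts per-instruction-set disable switches, so a rebuild with a restricted SSE level might look like the following. The exact set of switches depends on the FFmpeg version; check `./configure --help` for the build you are using.

```shell
# Illustrative only: build FFmpeg with newer SSE revisions disabled,
# leaving the baseline (SSE/SSE2) paths in place.
./configure --disable-sse3 --disable-ssse3 --disable-sse4 --disable-sse42 --disable-avx
make
```

Comparing decode benchmarks between such builds is one way to measure what the extra instruction sets actually buy on a given CPU.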