MMX Optimized WavPack Encoder (for Windows), Binaries (Win32) and Sources (Microsoft Specific)
wisodev
post Apr 18 2006, 11:41
Post #1





Group: Developer
Posts: 123
Joined: 31-January 06
Member No.: 27439



NOTE: These are old binaries and sources! Use the updated binaries and sources from POST #40.

WavPack 4.31 sources (not the 4.32 sources for Linux), patched with the MMX optimizations posted by he-jo, with the GCC-specific MMX intrinsics converted to Microsoft-specific MMX intrinsics. These sources are for Visual C++ .NET 2003 (solution and project for the WavPack encoder included).

Attached File  wavpack_4.31_src_mmx.zip ( 162.19K ) Number of downloads: 773


Windows binaries (Win32) of the original and the MMX-optimized build (Microsoft Visual C++ .NET 2003 build).

Attached File  wavpack_4.31_bin_mmx.zip ( 132.59K ) Number of downloads: 829


This post has been edited by wisodev: Jun 14 2006, 18:24


--------------------
http://code.google.com/p/wavtoac3encoder/
rjamorim
post Apr 18 2006, 12:00
Post #2


Rarewares admin


Group: Members
Posts: 7515
Joined: 30-September 01
From: Brazil
Member No.: 81



Thank you very much :)

Wouldn't it be nicer to just give up intrinsics and go for NASM?

This post has been edited by rjamorim: Apr 18 2006, 12:01


--------------------
Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org
Destroid
post Apr 18 2006, 12:01
Post #3





Group: Members
Posts: 551
Joined: 4-June 02
Member No.: 2220



I'm sorry, what is the difference between Wavpack_mmx.EXE and Wavpack_org.EXE?


--------------------
"Something bothering you, Mister Spock?"
wisodev
post Apr 18 2006, 12:05
Post #4

QUOTE
I'm sorry, what is the difference between Wavpack_mmx.EXE and Wavpack_org.EXE?



Wavpack_mmx.EXE – optimized MMX build (he-jo optimizations converted by me to MS specific)
Wavpack_org.EXE – original 4.31 build

This post has been edited by wisodev: Apr 18 2006, 12:14


The Gaby
post Apr 18 2006, 12:12
Post #5





Group: Members
Posts: 4
Joined: 17-February 06
Member No.: 27830



QUOTE (Destroid @ Apr 18 2006, 08:01 AM) *
I'm sorry, what is the difference between Wavpack_mmx.EXE and Wavpack_org.EXE?


I haven't seen them, but I guess:

Wavpack_org.EXE = WavPack original version
Wavpack_mmx.EXE = WavPack MMX-optimized version
;)
wisodev
post Apr 18 2006, 12:17
Post #6

QUOTE
Thank you very much :)

Wouldn't it be nicer to just give up intrinsics and go for NASM?


I have a plain assembly version, but my code (which works correctly) was slower than the intrinsics version.


rjamorim
post Apr 18 2006, 12:36
Post #7

QUOTE (wisodev @ Apr 18 2006, 08:17 AM) *
I have a plain assembly version, but my code (which works correctly) was slower than the intrinsics version.


Ah, that's curious.

That's a pity; NASM would allow for some compiler independence.


wisodev
post Apr 18 2006, 12:46
Post #8

QUOTE
QUOTE
I have a plain assembly version, but my code (which works correctly) was slower than the intrinsics version.


Ah, that's curious.

That's a pity; NASM would allow for some compiler independence.


My first goal was to create compiler-independent asm code, but I failed to write code faster than the intrinsics version, so I ended up with this one. I will soon try again to write a NASM version. I am not very experienced with asm programming, so do not blame me if something is wrong here.

It would be nice if someone could post speed comparison test results.

This post has been edited by wisodev: Apr 18 2006, 12:48


he-jo
post Apr 18 2006, 20:08
Post #9





Group: Members
Posts: 43
Joined: 1-April 06
Member No.: 29074



I'm also curious how it performs on different processors. Please test the extra modes, e.g. 'wavpack -h -x6'

Btw: You can additionally apply the following patch, which avoids a multiplication in the critical loops that are not MMX optimised yet. This gives a minor speedup in all modes.

CODE
--- wavpack.h
+++ wavpack.h
@@ -415,7 +415,7 @@

#if 1    // PERFCOND
#define update_weight(weight, delta, source, result) \
-    if (source && result) weight -= ((((source ^ result) >> 30) & 2) - 1) * delta;
+    if (source && result) { int32_t s = (int32_t) (source ^ result) >> 31; weight = (weight - s) + (s ^ delta); }
#else
#define update_weight(weight, delta, source, result) \
     if (source && result) (source ^ result) < 0 ? (weight -= delta) : (weight += delta);
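
For anyone who wants to convince themselves that the patched macro gives the same result as the old one, here is a minimal C sketch. It is not part of the WavPack sources: the three macro variants from the patch are rewritten as functions, and all function names are made up for illustration. It assumes that `>>` on a negative `int32_t` is an arithmetic shift, which the C standard leaves implementation-defined but which GCC and MSVC both provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old PERFCOND variant: the sign of (source ^ result) is turned into
   +1 or -1, which is then multiplied by delta. */
int32_t update_weight_mul(int32_t weight, int32_t delta,
                          int32_t source, int32_t result)
{
    if (source && result)
        weight -= ((((source ^ result) >> 30) & 2) - 1) * delta;
    return weight;
}

/* Patched variant: s is 0 when the signs agree and -1 when they differ,
   so (weight - s) + (s ^ delta) is weight + delta or weight - delta. */
int32_t update_weight_xor(int32_t weight, int32_t delta,
                          int32_t source, int32_t result)
{
    if (source && result) {
        int32_t s = (int32_t) (source ^ result) >> 31;
        weight = (weight - s) + (s ^ delta);
    }
    return weight;
}

/* Plain reference variant from the #else branch. */
int32_t update_weight_ref(int32_t weight, int32_t delta,
                          int32_t source, int32_t result)
{
    if (source && result) {
        if ((source ^ result) < 0)
            weight -= delta;
        else
            weight += delta;
    }
    return weight;
}

/* Compares all three variants over a small grid of inputs. */
void check_equivalence(void)
{
    static const int32_t samples[] = { -32768, -1000, -1, 0, 1, 1000, 32767 };
    static const int32_t deltas[] = { 0, 1, 2, 7 };
    const size_t n = sizeof samples / sizeof samples[0];
    const size_t m = sizeof deltas / sizeof deltas[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < m; k++) {
                int32_t expected =
                    update_weight_ref(100, deltas[k], samples[i], samples[j]);
                assert(update_weight_mul(100, deltas[k], samples[i], samples[j]) == expected);
                assert(update_weight_xor(100, deltas[k], samples[i], samples[j]) == expected);
            }
}
```

Calling `check_equivalence()` from a small `main()` is enough to run the comparison. The grid is deliberately tiny, so this is a sanity check rather than a proof.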


--------------------
Joachim Henke
http://j-o.users.sourceforge.net/
Destroid
post Apr 18 2006, 20:58
Post #10

QUOTE (wisodev @ Apr 18 2006, 11:05 AM) *
QUOTE

I'm sorry, what is the difference between Wavpack_mmx.EXE and Wavpack_org.EXE?



Wavpack_mmx.EXE – optimized MMX build (he-jo optimizations converted by me to MS specific)
Wavpack_org.EXE – original 4.31 build


Thanks for clarifying for me.

A quick test using -h -x6:
original = 0.37x
mmx = 0.46x

Original and MMX files were identical using binary comparison. Good news!


wisodev
post Apr 19 2006, 15:14
Post #11

QUOTE
QUOTE

QUOTE

I'm sorry, what is the difference between Wavpack_mmx.EXE and Wavpack_org.EXE?



Wavpack_mmx.EXE – optimized MMX build (he-jo optimizations converted by me to MS specific)
Wavpack_org.EXE – original 4.31 build


Thanks for clarifying for me.

A quick test using -h -x6:
original = 0.37x
mmx = 0.46x

Original and MMX files were identical using binary comparison. Good news!


Well, a 24% speedup!
But more testing is required.
In my tests the output files were always identical to the original for all -f -x1..6 modes, and for the -x1..6 and -h -x modes too.
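
A side note on the 24% figure: the 0.37x and 0.46x values are encoding speeds (higher is faster), so 24% is the gain in throughput, while the encoding time itself drops by a bit less than 20%. A minimal C sketch of both calculations follows; the function names are invented for illustration.

```c
#include <assert.h>

/* Relative throughput gain, in percent, when the encoding speed
   rises from old_speed to new_speed. */
double throughput_gain_percent(double old_speed, double new_speed)
{
    assert(old_speed > 0.0 && new_speed > 0.0);
    return (new_speed / old_speed - 1.0) * 100.0;
}

/* Relative reduction in encoding time, in percent, for the same change.
   Time is inversely proportional to speed, so the ratio is inverted. */
double time_saved_percent(double old_speed, double new_speed)
{
    assert(old_speed > 0.0 && new_speed > 0.0);
    return (1.0 - old_speed / new_speed) * 100.0;
}
```

With 0.37 and 0.46 as inputs, the first function gives roughly 24.3 and the second roughly 19.6, so it matters which of the two a posted percentage refers to.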


QUOTE
I'm also curious how it performs on different processors. Please test the extra modes, e.g. 'wavpack -h -x6'

Btw: You can additionally apply the following patch, which avoids a multiplication in the critical loops that are not MMX optimised yet. This gives a minor speedup in all modes.

CODE
--- wavpack.h
+++ wavpack.h
@@ -415,7 +415,7 @@

#if 1    // PERFCOND
#define update_weight(weight, delta, source, result) \
-    if (source && result) weight -= ((((source ^ result) >> 30) & 2) - 1) * delta;
+    if (source && result) { int32_t s = (int32_t) (source ^ result) >> 31; weight = (weight - s) + (s ^ delta); }
#else
#define update_weight(weight, delta, source, result) \
     if (source && result) (source ^ result) < 0 ? (weight -= delta) : (weight += delta);



I will apply the patch and post binaries here. I can test on a PII 233 MHz, a Celeron 466 MHz and an Athlon XP 2000+.

I have made some changes (cleanups and reformatting, just to make the code look better); the cleaned-up and patched code will be put here for download tomorrow.

I hope this will be useful for someone ;-)


PrakashP
post Apr 19 2006, 15:27
Post #12





Group: Members
Posts: 28
Joined: 5-November 05
Member No.: 25575



QUOTE (rjamorim @ Apr 18 2006, 01:36 PM) *
QUOTE (wisodev @ Apr 18 2006, 08:17 AM) *
I have a plain assembly version, but my code (which works correctly) was slower than the intrinsics version.


Ah, that's curious.

That's a pity; NASM would allow for some compiler independence.


I already pointed out in the original thread that using intrinsics is better and more portable (when done correctly) considering 32bit/64bit arch.
wisodev
post Apr 19 2006, 15:36
Post #13

QUOTE
QUOTE

QUOTE
I have a plain assembly version, but my code (which works correctly) was slower than the intrinsics version.


Ah, that's curious.

That's a pity; NASM would allow for some compiler independence.


I already pointed out in the original thread that using intrinsics is better and more portable (when done correctly) considering 32bit/64bit arch.


And how can it be done?

I am asking because you posted the following:

QUOTE
I did some "magic" in a header file to achieve this.


What precisely is the magic, I really would like to know!

This post has been edited by wisodev: Apr 19 2006, 15:47


wisodev
post Apr 20 2006, 06:48
Post #14

QUOTE
I already pointed out in the original thread that using intrinsics is better and more portable (when done correctly) considering 32bit/64bit arch.


Thanks, I looked at the idea behind the OpenAL MMX optimizations and implemented it in generally the same way in the MMX-optimized WavPack.

This post has been edited by wisodev: Apr 20 2006, 06:49


pepoluan
post Apr 20 2006, 11:32
Post #15





Group: Members
Posts: 1455
Joined: 22-November 05
From: Jakarta
Member No.: 25929



Wow... even in the field of lossless, we now witness improvement... faster FLAC, faster WavPack, a new yet-to-be-named lossless encoder... :)

Gee, my Lossless Performance Test, almost complete, is now obsolete even before being published... :(

Oh well... I will post it anyway, with the version numbers clearly indicated, and also MD5 hashes of the binaries to make sure that my encoders/decoders are truly what they are and not some tweaked-up versions.

Maybe later on I will perform an amendment test with the new compressors with the exact same test corpus.


--------------------
Nobody is Perfect.
I am Nobody.

http://pandu.poluan.info
wisodev
post Apr 20 2006, 12:16
Post #16

NOTE: These are old binaries and sources! Use the updated binaries and sources from POST #40.

Well, as stated in previous posts, I have done some code cleanup and merged the new he-jo patch (more info in wiso_changes.txt in the src package).

Download updated binaries:
Attached File  wavpack_4.31_bin_mmx_wiso.zip ( 141.97K ) Number of downloads: 526

Download updated sources:
Attached File  wavpack_4.31_src_mmx_wiso.zip ( 167.93K ) Number of downloads: 506


Do not use the binaries and sources from post #1 anymore.

Below are my test results (only one wav file tested) using the new binaries!

NOTE: Tested on Athlon XP 2000+, 512MB RAM, 80GB SATA DISK (8 MB Cache), Windows XP SP2.

CODE
Test file: 44kHz/16bit/Stereo, 34 435 676 bytes, 195 seconds

WavPack 4.31 Options   Original 4.31 [s]   MMX Optimized 4.31 [s]   Difference [s]   Speedup [%]
-f                                  2.89                     2.92            -0.03         -1.04
(no options)                        3.41                     3.45            -0.04         -1.17
-h                                  6.22                     6.13             0.09          1.45
-f -x                              32.27                    28.30             3.97         12.30
-f -x1                              6.34                     6.03             0.31          4.89
-f -x2                              9.44                     8.66             0.78          8.26
-f -x3                              9.75                     8.95             0.80          8.21
-f -x4                             20.55                    18.30             2.25         10.95
-f -x5                             26.77                    23.66             3.11         11.62
-f -x6                             32.30                    28.30             4.00         12.38
-x                                 59.19                    50.42             8.77         14.82
-x1                                 9.08                     8.45             0.63          6.94
-x2                                15.23                    13.63             1.60         10.51
-x3                                25.08                    22.03             3.05         12.16
-x4                                59.16                    50.33             8.83         14.93
-x5                                89.44                    75.69            13.75         15.37
-x6                               190.55                   159.42            31.13         16.34
-h -x                             144.02                   127.92            16.10         11.18
-h -x1                             18.95                    17.09             1.86          9.82
-h -x2                             33.95                    29.76             4.19         12.34
-h -x3                            144.03                   127.94            16.09         11.17
-h -x4                            229.61                   199.25            30.36         13.22
-h -x5                            331.49                   289.81            41.68         12.57
-h -x6                            730.89                   645.28            85.61         11.71

NOTE: Original is the Release build from the provided sources, and MMX Optimized is the ReleaseMMX build.
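
For reference, the last two columns of the table can be reproduced from the two timings. The C sketch below is only an illustration of that arithmetic; the struct and function names are made up and are not taken from the test setup.

```c
#include <assert.h>

/* One row of the timing table (field names are hypothetical). */
struct timing_row {
    double original_s;   /* encode time of the original build, seconds */
    double mmx_s;        /* encode time of the MMX build, seconds */
};

/* "Difference" column: seconds saved by the MMX build. */
double difference_s(struct timing_row row)
{
    return row.original_s - row.mmx_s;
}

/* "Speedup" column: seconds saved relative to the original time. */
double speedup_percent(struct timing_row row)
{
    assert(row.original_s > 0.0);
    return difference_s(row) / row.original_s * 100.0;
}
```

For the -f -x6 row (32.30 s and 28.30 s) this gives a difference of 4.00 s and a speedup of about 12.38%. A negative result, as in the -f row, means the MMX build was slower in that run.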

Here is my batch file [test.cmd], which I used for testing (works on WinXP SP2):
CODE
@echo off

rem WavPack MMX OPTIMIZED SPEED TESTS by WISO

rem COMMANDLINE OPTIONS FROM FILE (each line is one test)
set OptionsFile=options.txt

rem INPUT (*.wav) TEST FILE
set InFile=test.wav

rem OUTPUT (*.wv) TEST FILES
set OutFileORG=T_ORG.wv
set OutFileMMX=T_MMX.wv

rem PATH TO ORIGINAL EXE
set ExeFileORG=wavpack_ORG.exe

rem PATH TO MMX OPTIMIZED EXE
set ExeFileMMX=wavpack_MMX.exe

rem RUN SPEED TESTS
FOR /F "tokens=*" %%i in (%OptionsFile%) do (
  @echo TESTING: %%i
  %ExeFileORG% %%i %InFile% %OutFileORG%
  %ExeFileMMX% %%i %InFile% %OutFileMMX%
  fc /B %OutFileORG% %OutFileMMX%
  del %OutFileORG%
  del %OutFileMMX%
  @echo ################################################################################
)

echo ALL TESTS DONE
pause


and the [options.txt] file; note that the seemingly empty line is not empty, it contains one ASCII space character:
CODE
-f

-h
-f -x
-f -x1
-f -x2
-f -x3
-f -x4
-f -x5
-f -x6
-x
-x1
-x2
-x3
-x4
-x5
-x6
-h -x
-h -x1
-h -x2
-h -x3
-h -x4
-h -x5
-h -x6


That's all,
WISO (wisodev @ HA.org).

This post has been edited by wisodev: Jun 14 2006, 18:26


waileongyeo
post Apr 20 2006, 15:43
Post #17





Group: Members
Posts: 101
Joined: 23-November 04
Member No.: 18278



Erm... Thanks for the mmx optimization mod. I've some QUICK test results.

Input file: "8_Channel_Sound.wav" (~ 8 secs)
-------------------------------------------
bitrate = 9216
samplerate = 48000
channels = 8
codec = PCM
encoding = lossless
bitspersample = 24
386383 samples @ 48000Hz
File size: 9,273,236 Bytes (8.84 MB)
-------------------------------------------

Command switch used: -h -x6
CODE
[Build version]   [Time (s)] [Speed Increase %]
org (wisodev)     24.94       0 (all other builds are normalized to this)
mmx (wisodev)     19.41       22.17
mmx_v2 (wisodev)  19.00       23.82
mmx (wly)         23.00       7.78
mmx_v2 (wly)      24.58       1.44

org (wisodev) - Original build (from 'wavpack_4.31_bin_mmx.zip')
mmx (wisodev) - MMX optimized build (from 'wavpack_4.31_bin_mmx.zip')
mmx_v2 (wisodev) - MMX optimized build version 2 (from 'wavpack_4.31_bin_mmx_wiso.zip')
mmx (wly) - My build, MMX optimized build based on 'wavpack_4.31_src_mmx.zip' using VC2005
mmx_v2 (wly) - My build, MMX optimized build based on 'wavpack_4.31_src_mmx_wiso.zip' using VC2005

wisodev's builds are working fine but...
What's wrong with my builds?!?!?!?!?

My computer spec:
Pentium 4 - 2.4GHz
RAM 756MB
WinXP SP2

This post has been edited by waileongyeo: Apr 22 2006, 04:44
wisodev
post Apr 22 2006, 09:32
Post #18

QUOTE (waileongyeo @ Apr 20 2006, 04:43 PM) *
wisodev's builds are working fine but...
What's wrong with my builds?!?!?!?!?

My computer spec:
Pentium 4 - 2.4GHz
RAM 756MB
WinXP SP2


Maybe you selected Release in the build configurations; you need to select ReleaseMMX to build the MMX-optimized version.


waileongyeo
post Apr 24 2006, 12:01
Post #19

QUOTE (wisodev @ Apr 22 2006, 04:32 PM) *
Maybe you have selected Release from Build configurations, you need to select ReleaseMMX to build MMX optimized version.

Oops... My bad. That's my mistake. :)
Thanks for the tip.

However, the binary built with VS8 is only 9~10% faster (which sounds reasonable now). Your build (VC7.1) is the king! :)
wisodev
post May 5 2006, 20:03
Post #20

NOTE: These are old binaries and sources! Use the updated binaries and sources from POST #40.

OK, I have included the latest BSR ASM optimizations by he-jo (wavpack_MMX-BSR.exe) and done a quick test; everything is included in the file below (also the sources):

Attached File  wavpack_4.31_mmx_bsr_wiso.rar ( 180.28K ) Number of downloads: 571

For more information see this thread.

This post has been edited by wisodev: Jun 14 2006, 18:26


wisodev
post May 23 2006, 18:58
Post #21

NOTE: These are old binaries and sources! Use the updated binaries and sources from POST #40.

Latest optimization (JFL2B) from he-jo (posted here), including sources:

Attached File  wavpack_4.31_src_mmx_jfl2b_wiso.rar ( 197.14K ) Number of downloads: 492


Binaries included:
CODE
Release\wavpack.exe          - original 4.31
ReleaseBSR\wavpack.exe       - original 4.31 + he-jo BSR ASM optimizations
ReleaseJFL2B\wavpack.exe     - original 4.31 + he-jo JFL2B ASM optimizations
ReleaseMMX\wavpack.exe       - original 4.31 + he-jo MMX optimizations
ReleaseMMX-BSR\wavpack.exe   - original 4.31 + he-jo MMX optimizations + he-jo BSR ASM optimizations
ReleaseMMX-JFL2B\wavpack.exe - original 4.31 + he-jo MMX optimizations + he-jo JFL2B ASM optimizations

My quick speed test:
CODE
Test file: 44kHz/16bit/Stereo, 34 435 676 bytes, 195 seconds
Tested on Athlon XP 2000+, 512MB RAM, 80GB SATA DISK (8 MB Cache), Windows XP SP2
Results are in seconds; numbers in (...) are in 100-nanosecond units.

OPTIONS: -f -x6

ORG:
Kernel Time = 0.109 (1093750)
User Time = 32.109 (321093750)
Process Time = 32.218 (322187500)
Global Time = 32.484 (324843750)

BSR: 3.30%
Kernel Time = 0.156 (1562500)
User Time = 31.000 (310000000)
Process Time = 31.156 (311562500)
Global Time = 31.359 (313593750)

JFL2B: 0.19%
Kernel Time = 0.328 (3281250)
User Time = 31.828 (318281250)
Process Time = 32.156 (321562500)
Global Time = 32.312 (323125000)

MMX: 12.07%
Kernel Time = 0.296 (2968750)
User Time = 28.031 (280312500)
Process Time = 28.328 (283281250)
Global Time = 28.515 (285156250)

MMX-BSR: 15.86%
Kernel Time = 0.218 (2187500)
User Time = 26.890 (268906250)
Process Time = 27.109 (271093750)
Global Time = 27.328 (273281250)

MMX-JFL2B: 12.75%
Kernel Time = 0.203 (2031250)
User Time = 27.906 (279062500)
Process Time = 28.109 (281093750)
Global Time = 28.312 (283125000)


On my Athlon XP the JFL2B optimizations are slower than the BSR optimizations. Maybe I have done something wrong, but everything seems to be OK with my builds. The output files are identical for all builds (binary comparison).
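
The percentages in the results above appear to be derived from Process Time (kernel plus user time) relative to the ORG build; that reading reproduces the posted values, for example 3.30% for BSR. Here is a minimal C sketch of the conversion from 100-nanosecond units and of the percentage, with helper names that are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* One tick is 100 ns, so there are ten million ticks per second. */
#define TICKS_PER_SECOND 10000000.0

/* Converts a tick count in 100 ns units to seconds. */
double ticks_to_seconds(uint64_t ticks)
{
    return (double) ticks / TICKS_PER_SECOND;
}

/* Process time is the sum of kernel and user time. */
uint64_t process_ticks(uint64_t kernel_ticks, uint64_t user_ticks)
{
    return kernel_ticks + user_ticks;
}

/* Time saved by an optimized build relative to the original, in percent. */
double process_speedup_percent(uint64_t org_ticks, uint64_t opt_ticks)
{
    assert(org_ticks > 0);
    return ((double) org_ticks - (double) opt_ticks)
           / (double) org_ticks * 100.0;
}
```

For ORG, `process_ticks(1093750, 321093750)` gives 322187500 ticks, which is about 32.218 seconds. Comparing that with the BSR process time of 311562500 ticks gives about 3.30%.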

EDIT: Codebox for test results.

This post has been edited by wisodev: Jun 14 2006, 18:27


he-jo
post May 23 2006, 22:18
Post #22

Hm, this is really strange! But it seems that you always used the same file for the tests you posted here. Since the speed difference between my asm routines and the original code depends on the data in the input file, it would be nice if you could do some more testing. Please use tracks with different kinds of music, maybe some louder tracks?! :)

Thanks for your work anyway!


wisodev
post May 24 2006, 05:41
Post #23

QUOTE
Hm, this is really strange! But it seems that you always used the same file for the tests you posted here. Since the speed difference between my asm routines and the original code depends on the data in the input file, it would be nice if you could do some more testing. Please use tracks with different kinds of music, maybe some louder tracks?! :)

Thanks for your work anyway!


Yes, the same file was used in all tests. I was short on time and only used one file. This file was actually loud metal music. But if it really depends on the input data, then I will do more testing on different music genres. Besides, I know that testing one file is not enough ;-)


he-jo
post May 24 2006, 08:24
Post #24

Just to be sure - for the Celeron tests I used this file set from http://www.rarewares.org/test_samples/ :

41_30sec.wav Bartok_strings2.wav BigYellow.wav DaFunk.wav EnolaGay.wav Leahy.wav Mama.wav NewYorkCity.wav OrdinaryWorld.wav Quizas.wav SinceAlways.wav TheSource.wav Twelve.wav Waiting.wav bodyheat.wav rosemary.wav thear1.wav trust.wav

and three 24-bit files from the last chapter of this page: http://www.mytekdigital.com/compare/comparison1.htm

dBTech_122-96_24bit_web.wav mytek_8X96_24bit_web.wav prism_AD124_24bit_web.wav

Could you also do a test on this file set? I ran the test with this command line:

timer wavpack -q -f -x6 "*.wav"


wisodev
post May 24 2006, 09:12
Post #25

OK, using the same test files is a very good suggestion!

