Ogg Vorbis optimized for speed, ca. 1.5x faster than 1.1 original ver.
nyaochi
post Nov 4 2004, 20:11
Post #1





Group: Members
Posts: 169
Joined: 30-September 01
From: Tokyo, Japan
Member No.: 99



Some Japanese developers are working on speeding up libvorbis with SSE. Blacksword (also known as 637) launched an Ogg Vorbis acceleration project (site in Japanese only) and releases an oggenc binary and a libvorbis patch based on libvorbis 1.1. The optimization includes SSE implementations of the FFT, MDCT, windowing, channel coupling, sorting, psymodel, floor/residue encoding, and so on. On my computer (Pentium 4, 2.4 GHz), the ICL 8.1 compiled oggenc binary of the optimized version (Archer Beta03) encodes at 23.4x, while the one without the optimization (ICL 8.1 compiled, but no SSE patches) encodes at 15.5x. Hence, the optimization achieves a speed gain of ca. 1.5x.
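For anyone who wants to compare their own runs the same way, here is a minimal sketch of the arithmetic behind the 1.5x figure. The function name `speedup` is only illustrative; the two rates are the ones quoted above.

```python
def speedup(baseline_rate: float, optimized_rate: float) -> float:
    """Return how many times faster the optimized encoder runs."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return optimized_rate / baseline_rate


# Encoding rates quoted above, as multiples of real time (Pentium 4, 2.4 GHz).
gain = speedup(baseline_rate=15.5, optimized_rate=23.4)
print(f"speed gain: {gain:.2f}x")
```

The same helper works for any pair of "Rate" values that oggenc prints, as long as both runs use the same input file and quality setting.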

Unlike GoGo-no-coder, this is not a fork: he releases a patch for the libvorbis source code that does not change any algorithm or data structure. This is very good for source code maintenance, since it makes it easy to keep up with the official libvorbis, but it limits the room for optimization to some degree. In fact, the author says in readme.txt that there is little room left for optimization. So I think it's time for quality evaluation, even though the optimization is still in development. Several bugs were found and fixed over the last week, and bitrates are now quite similar to the reference encoder for all quality values. If you find any bugs or quality regressions compared with the official 1.1 release, please tell us.

Contributors are:
- Blacksword (or 637)'s SSE optimization (Japanese only): A number of functions in libvorbis are vectorized to take advantage of the SSE instruction set, as well as of OptSort and wuvorbis. For the complete list of optimized functions, see the readme.txt attached to the binary (it is in Japanese, but the list is easy to find).
- Manuke's OptSort: An optimization of the qsort function, which consumes 20% of the compression processing time. It assumes that the _vp_quantize_couple_sort and _vp_noise_normalize_sort functions in psy.c call qsort with 8 or 32 elements. This accelerates the whole compression process by 10%.
- W.Dee's wuvorbisfile (Japanese only?): wuvorbis.dll is a fast Ogg Vorbis decoder using SSE and 3DNow!, and is part of the KiriKiri software (useful for developing multimedia content or adventure games). wuvorbis.dll decodes 1.4x-1.8x faster (SSE) or 1.5x-1.9x faster (3DNow!) than the official libvorbis.
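
The OptSort numbers can be cross-checked with Amdahl's law. The sketch below is not from Manuke's readme; it assumes that "20%" is the share of encoding time spent in qsort and that "10%" means a 1.10x speedup of the whole encode. The function names are illustrative.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when `fraction` of the run time
    is made `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def required_local_speedup(fraction: float, overall: float) -> float:
    """Invert Amdahl's law: how much faster the optimized part must be
    to reach a given whole-program speedup."""
    remaining = 1.0 / overall - (1.0 - fraction)
    if remaining <= 0:
        raise ValueError("target exceeds the limit 1 / (1 - fraction)")
    return fraction / remaining


SORT_FRACTION = 0.20  # share of encoding time spent in qsort (from the post)
TARGET = 1.10         # assumption: "10%" read as a 1.10x overall speedup

needed = required_local_speedup(SORT_FRACTION, TARGET)
ceiling = 1.0 / (1.0 - SORT_FRACTION)  # limit if sorting cost nothing at all

print(f"sort must run about {needed:.2f}x faster")
print(f"best possible overall gain from sorting alone: {ceiling:.2f}x")
```

Under these assumptions the specialized sort only has to be a bit less than twice as fast as the generic qsort, and even a free sort could not push the whole encoder past 1.25x. That fits the author's remark that there is little room left for optimization.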

Happy encoding!
dev0
post Nov 4 2004, 20:37
Post #2





Group: Developer
Posts: 1679
Joined: 23-December 01
From: Germany
Member No.: 731



fefe was working on an (apparently buggy) SSE optimization of libvorbis too.
Do the optimizations only affect encoding, or decoding as well?


--------------------
"To understand me, you'll have to swallow a world." Or maybe your words.
ilikedirtthe2nd
post Nov 4 2004, 23:04
Post #3





Group: Members
Posts: 470
Joined: 26-October 01
From: Germany
Member No.: 352



I achieved an almost 100% (actually more like 85%) speed increase (against ICL 8.1 on an AMD Athlon XP 1800+)

ICL 8.1: 9,8x
Optimized: 18,0x

Pretty good!

This post has been edited by ilikedirtthe2nd: Nov 4 2004, 23:06
TedFromAccountin...
post Nov 5 2004, 01:22
Post #4





Group: Members
Posts: 92
Joined: 11-March 04
From: The Forest
Member No.: 12650



Wow! Now that is FAST. My results were similar to ilikedirtthe2nd's (actually a little better).
nyaochi
post Nov 5 2004, 03:14
Post #5





Group: Members
Posts: 169
Joined: 30-September 01
From: Tokyo, Japan
Member No.: 99



QUOTE (dev0 @ Nov 5 2004, 04:37 AM)
fefe was working on an (apparently buggy) SSE optimization of libvorbis too.
Do the optimizations only affect encoding, or decoding as well?
*

Oh, I didn't know about fefe's optimization. I'll check whether it could benefit Blacksword's work.

IMHO this optimization affects both the encoding and decoding sides, although no optimized oggdec has been tested or released. Several functions used for decoding (e.g., vorbis_synthesis_blockin, mapping0_inverse, mdct_backward) are optimized too.
QuantumKnot
post Nov 6 2004, 02:05
Post #6





Group: Developer
Posts: 1245
Joined: 16-December 02
From: Australia
Member No.: 4097



Whoa, it's really fast!

On my P4 2.4 GHz:

ICL compiled oggenc from rarewares: 13.2x
SSE optimised oggenc: 20.5x

This post has been edited by QuantumKnot: Nov 6 2004, 02:11
Bonzi
post Nov 6 2004, 08:02
Post #7


A/V Moderator


Group: Members
Posts: 278
Joined: 22-February 03
Member No.: 5132



Pretty nice speedup here too:
oggenc from rarewares 10.4x
SSE optimized 15.3x
Music Mixer
post Nov 6 2004, 08:10
Post #8





Group: Members
Posts: 4
Joined: 31-October 04
Member No.: 17931



Hello!

Well, I have an older machine (P3 700) and got a speedup from 4.4x to 9.3x realtime.

Have you guys tested the SSE2 optimized build from http://homepage3.nifty.com/blacksword/ yet?

I wonder how big the speedup with this build is on P4 and AMD64 CPUs.

This post has been edited by Music Mixer: Nov 6 2004, 08:12
Sebastian Mares
post Nov 6 2004, 10:18
Post #9





Group: Members
Posts: 3630
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



According to my tests...

ICL 8.1 Standard:

CODE
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s


ICL 8.1 Pentium 4:

CODE
File length:  4m 58,0s
Elapsed time: 0m 17,0s
Rate:         17,5529
Average bitrate: 236,7 kb/s


SSE:

CODE
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s


SSE2:

CODE
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s


Tested with "Toto - Africa" on a Pentium 4 with 3.2 GHz, 512 MB RAM, running Windows XP Professional Service Pack 1.
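
One thing to keep in mind when reading runs this short: oggenc prints the elapsed time in whole seconds here, so a 17 s versus 18 s result may be within one timer tick. The sketch below assumes a one-second timer resolution, which the output suggests but the thread does not confirm. `rate_bounds` is a hypothetical helper, not part of oggenc.

```python
def rate_bounds(length_s: float, elapsed_s: int, resolution_s: float = 1.0):
    """Range of encoding rates consistent with a coarse elapsed-time reading.

    Assumes the true elapsed time lies within one timer tick of the
    displayed value.
    """
    slowest = length_s / (elapsed_s + resolution_s)
    fastest = length_s / max(elapsed_s - resolution_s, resolution_s)
    return slowest, fastest


LENGTH_S = 4 * 60 + 58.0  # "File length: 4m 58,0s"

runs = [("ICL 8.1 Standard", 18), ("ICL 8.1 Pentium 4", 17), ("SSE", 18)]
for label, elapsed in runs:
    low, high = rate_bounds(LENGTH_S, elapsed)
    print(f"{label}: between {low:.1f}x and {high:.1f}x")
```

The ranges for the 17 s and 18 s runs overlap, so the small difference between the Standard and Pentium 4 compiles is not conclusive. It does not explain why the SSE build shows no gain at all; a longer file or several files in a row would give a more reliable comparison.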


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
esa372
post Nov 6 2004, 16:10
Post #10





Group: Members (Donating)
Posts: 429
Joined: 5-September 04
From: Los Angeles
Member No.: 16796



I got a good increase, too...

SSE2
CODE
       File length:  5m 23.0s
       Elapsed time: 0m 12.0s
       Rate:         26.9556
       Average bitrate: 175.3 kb/s

ICL 8.1
CODE
       File length:  5m 23.0s
       Elapsed time: 0m 19.0s
       Rate:         17.0246
       Average bitrate: 175.3 kb/s



But I can't seem to get it to work on FLAC files...
CODE
ERROR: Input file "01.flac" is not a supported format.

Am I missing something??

Thanks,

~esa

:edit: typo

This post has been edited by esa372: Nov 6 2004, 19:15


--------------------
Clowns love haircuts; so should Lee Marvin's valet.
ilikedirtthe2nd
post Nov 6 2004, 16:24
Post #11





Group: Members
Posts: 470
Joined: 26-October 01
From: Germany
Member No.: 352



QUOTE (esa372 @ Nov 6 2004, 03:10 PM)
But I can't seem to get it to work on FLAC files...
CODE
ERROR: Input file "01.flac" is not a supported format.

Am I missing something??


Standard oggenc doesn't input lossless files directly. Only Oggenc2.3 from rarewares does.

Regards; ilikedirt
dev0
post Nov 6 2004, 16:46
Post #12





Group: Developer
Posts: 1679
Joined: 23-December 01
From: Germany
Member No.: 731



QUOTE (ilikedirtthe2nd @ Nov 6 2004, 04:24 PM)
QUOTE (esa372 @ Nov 6 2004, 03:10 PM)
But I can't seem to get it to work on FLAC files...
CODE
ERROR: Input file "01.flac" is not a supported format.

Am I missing something??


Standard oggenc doesn't input lossless files directly. Only Oggenc2.3 from rarewares does.

Regards; ilikedirt
*


The standard oggenc supports FLAC input perfectly. It's a compile-time option AFAIK.


john33
post Nov 6 2004, 16:52
Post #13


xcLame and OggDropXPd Developer


Group: Developer
Posts: 3760
Joined: 30-September 01
From: Bracknell, UK
Member No.: 111



QUOTE (dev0 @ Nov 6 2004, 03:46 PM)
The standard oggenc supports FLAC input perfectly. It's a compile-time option AFAIK.
*

It sure is. smile.gif


--------------------
John
----------------------------------------------------------------
My compiles and utilities are at http://www.rarewares.org/
esa372
post Nov 6 2004, 17:01
Post #14





Group: Members (Donating)
Posts: 429
Joined: 5-September 04
From: Los Angeles
Member No.: 16796



QUOTE (ilikedirtthe2nd @ Nov 6 2004, 08:24 AM)
Standard oggenc doesn't input lossless files directly.

QUOTE (dev0 @ Nov 6 2004, 08:46 AM)
The standard oggenc supports FLAC input perfectly.



Well, I can't say that the issue is any clearer for me now...


ilikedirtthe2nd
post Nov 6 2004, 17:19
Post #15





Group: Members
Posts: 470
Joined: 26-October 01
From: Germany
Member No.: 352



QUOTE
It's a compile-time option AFAIK.


That means oggenc can read FLAC input if support for it is enabled when compiling. So in general it is able to read FLAC, but this particular build is not.
esa372
post Nov 6 2004, 17:25
Post #16





Group: Members (Donating)
Posts: 429
Joined: 5-September 04
From: Los Angeles
Member No.: 16796



QUOTE (ilikedirtthe2nd @ Nov 6 2004, 09:19 AM)
...oggenc is able to input flac, if this is enabled when compiling. So: generally it is able to read flac, but this version is not.
Ah... thank you for the clarification!


~esa


nyaochi
post Nov 6 2004, 18:20
Post #17





Group: Members
Posts: 169
Joined: 30-September 01
From: Tokyo, Japan
Member No.: 99



QUOTE (Music Mixer @ Nov 6 2004, 04:10 PM)
Have you guys tested the SSE2 optimized build at http://homepage3.nifty.com/blacksword/
?

I wonder how big the speedup with this build is for p 4 and amd 64 cpus.
*

I could not find any speed difference between the SSE and SSE2 versions on my Pentium 4 machine. Does anybody get a speed increase? The author wants to know, so that he can decide whether to continue the SSE2 version.

QUOTE (Sebastian Mares @ Nov 6 2004, 06:18 PM)
According to my tests...

ICL 8.1 Standard:

CODE
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s


SSE:

CODE
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s

*

Are the SSE and SSE2 binaries your own builds? If so, don't forget to define the symbol __SSE__ when compiling to activate the optimization.
Sebastian Mares
post Nov 6 2004, 19:05
Post #18





Group: Members
Posts: 3630
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



QUOTE (esa372 @ Nov 6 2004, 04:10 PM)
I got a good increase, too...

ICL 8.1
CODE
       File length:  5m 23.0s
       Elapsed time: 0m 12.0s
       Rate:         26.9556
       Average bitrate: 175.3 kb/s

SSE2
CODE
       File length:  5m 23.0s
       Elapsed time: 0m 19.0s
       Rate:         17.0246
       Average bitrate: 175.3 kb/s



But I can't seem to get it to work on FLAC files...
CODE
ERROR: Input file "01.flac" is not a supported format.

Am I missing something??

Thanks,

~esa
*


Huh? The ICL 8.1 compile is faster.

QUOTE (nyaochi @ Nov 6 2004, 06:20 PM)
QUOTE (Music Mixer @ Nov 6 2004, 04:10 PM)
Have you guys tested the SSE2 optimized build at http://homepage3.nifty.com/blacksword/
?

I wonder how big the speedup with this build is for p 4 and amd 64 cpus.
*

I could not find any speed difference between the SSE and SSE2 versions on my Pentium 4 machine. Does anybody get a speed increase? The author wants to know, so that he can decide whether to continue the SSE2 version.

QUOTE (Sebastian Mares @ Nov 6 2004, 06:18 PM)
According to my tests...

ICL 8.1 Standard:

CODE
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s


SSE:

CODE
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s

*

Are the SSE and SSE2 binaries your own builds? If so, don't forget to define the symbol __SSE__ when compiling to activate the optimization.
*


Nope, they're not my own compiles.


esa372
post Nov 6 2004, 19:13
Post #19





Group: Members (Donating)
Posts: 429
Joined: 5-September 04
From: Los Angeles
Member No.: 16796



QUOTE (Sebastian Mares @ Nov 6 2004, 11:05 AM)
Huh? The ICL 8.1 compile is faster.
Whoops! No, that's a typo... I'll edit immediately...


kjoonlee
post Nov 6 2004, 19:17
Post #20





Group: Members
Posts: 2526
Joined: 25-July 02
From: South Korea
Member No.: 2782



OK, here are some partial translations:

OggEnc_SSE_20041101ArcherB03.zip
Changes regarding/surrounding comments
Improved low-bitrate quality

Current problems are:
  • When encoding at low bitrates, treble quality suffers, and size bloat occurs.
  • Could hang immediately on running, depending on the environment
  • Bugs due to changes to comment handling?


This post has been edited by kjoonlee: Nov 6 2004, 19:18


--------------------
http://blacksun.ivyro.net/vorbis/vorbisfaq.htm
nyaochi
post Nov 6 2004, 20:31
Post #21





Group: Members
Posts: 169
Joined: 30-September 01
From: Tokyo, Japan
Member No.: 99



QUOTE (kjoonlee @ Nov 7 2004, 03:17 AM)
OK, here are some partial translations:

OggEnc_SSE_20041101ArcherB03.zip
Changes regarding/surrounding comments
Improved low-bitrate quality

Current problems are:
  • When encoding at low bitrates, treble quality suffers, and size bloat occurs.

  • Could hang immediately on running, depending on the environment

  • Bugs due to changes to comment handling?

*

Thanks for the translation. I think all of the current problems listed above are solved in Archer B03. These problems existed in Archer B02.
QuantumKnot
post Nov 7 2004, 02:55
Post #22





Group: Developer
Posts: 1245
Joined: 16-December 02
From: Australia
Member No.: 4097



IIRC, SSE2 is mainly about double-precision floating point, so maybe there isn't much difference from SSE because libvorbis doesn't use many doubles?
Benjamin Lebsanf...
post Nov 7 2004, 09:04
Post #23





Group: Members
Posts: 761
Joined: 29-September 01
Member No.: 40



Tested on my AMD64 3400+, 1GB RAM

ICL 8.1:

File length: 4m 27.0s
Elapsed time: 0m 14.0s
Rate: 19.1190
Average bitrate: 132.9 kb/s

ICL 8.1 (John33):

File length: 4m 27.0s
Elapsed time: 0m 11.0s
Rate: 24.3333
Average bitrate: 132.9 kb/s

SSE/SSE2 Optimized:

File length: 4m 27.0s
Elapsed time: 0m 08.0s
Rate: 33.4583
Average bitrate: 132.9 kb/s

The SSE2 optimization doesn't change the encoding speed.

This post has been edited by Benjamin Lebsanft: Nov 7 2004, 09:21
john33
post Nov 7 2004, 10:41
Post #24


xcLame and OggDropXPd Developer


Group: Developer
Posts: 3760
Joined: 30-September 01
From: Bracknell, UK
Member No.: 111



As QK says, there's very little use of double precision in libvorbis, so the use of SSE2 optimisation is virtually a waste of effort.
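
To illustrate the point, here is a small sketch of how many values fit in one 128-bit SSE register. SSE works on packed single-precision floats; SSE2 adds packed double-precision (and integer) operations on the same registers. The constant and variable names are just for this example.

```python
import struct

XMM_BYTES = 16  # an SSE (XMM) register is 128 bits wide

# "<" selects standard sizes: 4-byte float, 8-byte double.
floats_per_register = XMM_BYTES // struct.calcsize("<f")
doubles_per_register = XMM_BYTES // struct.calcsize("<d")

print(f"single precision: {floats_per_register} values per register")
print(f"double precision: {doubles_per_register} values per register")
```

Code that already works on single-precision floats gets its four-wide operations from plain SSE, so the double-precision side of SSE2 has little to offer it.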


nyaochi
post Nov 7 2004, 10:54
Post #25





Group: Members
Posts: 169
Joined: 30-September 01
From: Tokyo, Japan
Member No.: 99



QUOTE (QuantumKnot @ Nov 7 2004, 10:55 AM)
IIRC, SSE2 is mainly about double-precision floating point, so maybe there isn't much difference from SSE because libvorbis doesn't use many doubles?
*

QUOTE (john33 @ Nov 7 2004, 06:41 PM)
As QK says, there's very little use of double precision in libvorbis, so the use of SSE2 optimisation is virtually a waste of effort.
*

Actually, he hopes SSE2 will give higher quality (or speed) for float-to-integer conversion and vice versa, but at the same time he doubts it will have much effect. I'll pass these results on to him.