Ogg Vorbis optimized for speed

Some Japanese developers are working on speed optimization of libvorbis using SSE. Blacksword (or 637) launched an Ogg Vorbis acceleration project (in Japanese only) and releases an oggenc binary and a libvorbis patch based on libvorbis 1.1. The optimization includes SSE implementations of the FFT, MDCT, windowing, channel coupling, sorting, psymodel, floor/residue encoding, and so on. On my computer (Pentium IV 2.4 GHz), the ICL 8.1-compiled oggenc binary of the optimized version (Archer Beta03) encodes at 23.4x, while the one without the optimization (ICL 8.1-compiled but no SSE patches) encodes at 15.5x. Hence, this optimization achieves a speed gain of about 1.5x.
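As a quick check of the figures above, the gain is just the ratio of the two encoding rates. A minimal sketch in Python; the function name `speedup` is made up for illustration and is not part of oggenc or libvorbis:

```python
def speedup(optimized_rate: float, baseline_rate: float) -> float:
    """Return how many times faster the optimized encoder is than the baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return optimized_rate / baseline_rate


# Rates quoted above for the Pentium IV 2.4 GHz test (multiples of real time).
gain = speedup(23.4, 15.5)
print(f"{gain:.2f}x")  # prints "1.51x"
```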

Unlike GoGo-no-coder, it's not a fork: he releases a patch for the libvorbis source code without changing the algorithm or data structures at all. This is very good for source code maintenance, since the patch can keep up with the latest official libvorbis, but it limits the optimization possibilities to some degree. Actually, the author says in readme.txt that there's little room left for optimization. So I think it's time for quality evaluation, although this optimization is still in the development stage. After several bugs were found and fixed over the last week, bitrates are quite similar to the reference encoder for all quality values. If you find any bugs or quality regressions from the official 1.1 release, please tell us.

Contributors are:
- Blacksword (or 637)'s SSE optimization (Japanese only): A number of functions in libvorbis are vectorized to take advantage of the SSE instruction set, and it also builds on OptSort and wuvorbis. For the complete list of optimized functions, see the readme.txt attached to the binary (it's in Japanese, but you can easily find the list).
- Manuke's OptSort: An optimization of the qsort function, which consumes 20% of the compression processing time. It assumes that the _vp_quantize_couple_sort and _vp_noise_normalize_sort functions in psy.c call qsort with 8 or 32 elements. This accelerates the whole compression process by 10%.
- W.Dee's wuvorbisfile (Japanese only?): wuvorbis.dll is a fast Ogg Vorbis decoder using SSE and 3DNow!, and is part of the KiriKiri software (useful for developing multimedia content or adventure games). wuvorbis.dll decodes 1.4x-1.8x faster (SSE) and 1.5x-1.9x faster (3DNow!) than the official libvorbis.
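
OptSort's code is not shown in this thread, so the following is only a sketch of the general idea behind a fixed-size sort: when the element count is known in advance (8 or 32 here), the order of comparisons can be decided ahead of time instead of going through a general-purpose qsort. The sketch uses a bitonic sorting network, written in Python for readability; all function names are made up for illustration, and Manuke's actual implementation may work quite differently.

```python
import itertools
import random


def bitonic_network(n):
    """Build the compare-exchange pairs of a bitonic sorting network.

    Each pair (lo, hi) means: after the step, a[lo] <= a[hi].
    n must be a power of two.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    pairs = []
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    if i & k == 0:
                        pairs.append((i, partner))  # ascending block
                    else:
                        pairs.append((partner, i))  # descending block
            j //= 2
        k *= 2
    return pairs


def apply_network(values, network):
    """Sort a copy of values by running the fixed comparison sequence."""
    a = list(values)
    for lo, hi in network:
        if a[lo] > a[hi]:
            a[lo], a[hi] = a[hi], a[lo]
    return a


def passes_zero_one_check(network, n):
    """Zero-one principle: a network that sorts all 0/1 inputs sorts everything."""
    return all(
        apply_network(bits, network) == sorted(bits)
        for bits in itertools.product((0, 1), repeat=n)
    )


# The comparison sequences are built once, then reused for every call.
NETWORK_8 = bitonic_network(8)
NETWORK_32 = bitonic_network(32)

sample = [random.random() for _ in range(32)]
print(apply_network(sample, NETWORK_32) == sorted(sample))
```

The point of the fixed sequence is that there are no data-dependent loop bounds and no comparison callback, which is also what makes this kind of sort a candidate for SIMD min/max instructions in a compiled language.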

Happy encoding!

Reply #1
fefe was working on an (apparently buggy) SSE optimization of libvorbis too.
Do the optimizations only affect encoding, or decoding as well?
"To understand me, you'll have to swallow a world." Or maybe your words.

Reply #2
I achieved almost a 100% (more like 85%, actually) speed increase (against ICL 8.1 on an AMD Athlon XP 1800+).

ICL 8.1: 9,8x
Optimized 18,0x.

Pretty good

Reply #3
Wow! Now that is FAST. My results were similar to ilikedirtthe2nd's (actually a little better).

Reply #4
Quote
fefe was working on an (apparently buggy) SSE optimization of libvorbis too.
Do the optimizations only affect encoding, or decoding as well?

Oh, I didn't know about fefe's optimization. I'll check whether it could benefit Blacksword's optimization.

IMHO this optimization affects both the encoding and decoding sides, although an optimized oggdec has not been tested or released. Several functions used for decoding (e.g., vorbis_synthesis_blockin, mapping0_inverse, mdct_backward, etc.) are optimized too.

Reply #5
Whoa, it's really fast 

On my P4 2.4 GHz:

ICL compiled oggenc from rarewares:  13.2x
SSE optimised oggenc:  20.5x

Reply #6
Pretty nice speedup here too:
oggenc from rarewares 10.4x
SSE optimized 15.3x

Reply #7
Hello!

Well, I have an older machine (P3 700) and got a speedup from 4.4x to 9.3x realtime.

Have you guys tested the SSE2 optimized build at http://homepage3.nifty.com/blacksword/ ?

I wonder how big the speedup with this build is for P4 and AMD64 CPUs.


Reply #8
According to my tests...

ICL 8.1 Standard:

Code: [Select]
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s


ICL 8.1 Pentium 4:

Code: [Select]
File length:  4m 58,0s
Elapsed time: 0m 17,0s
Rate:         17,5529
Average bitrate: 236,7 kb/s


SSE:

Code: [Select]
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s


SSE2:

Code: [Select]
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s


Tested with "Toto - Africa" on a Pentium 4 with 3.2 GHz, 512 MB RAM, running Windows XP Professional Service Pack 1.
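
The Rate line in these logs looks like the audio length divided by the elapsed time. That is an assumption based on the numbers, not something checked against the oggenc source. Under that assumption, 16,5778 x 18 s gives about 298.4 s of audio, and because the elapsed time is only reported in whole seconds, a one-second difference on its own moves the rate from about 16.58 to about 17.55. A small sketch (the function name is made up for illustration):

```python
def realtime_rate(audio_seconds: float, elapsed_seconds: float) -> float:
    """Encoding speed as a multiple of real time (assumed meaning of 'Rate')."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return audio_seconds / elapsed_seconds


# Audio length implied by the first log above: rate times elapsed time.
audio_seconds = 16.5778 * 18

rate_18s = realtime_rate(audio_seconds, 18)
rate_17s = realtime_rate(audio_seconds, 17)

# Relative change in the rate caused by a single one-second timer step.
step_percent = (rate_17s / rate_18s - 1) * 100
print(f"{rate_18s:.4f} {rate_17s:.4f} {step_percent:.1f}%")
```

So the gap between the "ICL 8.1 Standard" and "ICL 8.1 Pentium 4" results is within one timer step; a longer file or several runs would be needed to separate builds that are this close.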

Reply #9
I got a good increase, too...

SSE2
Code: [Select]
        File length:  5m 23.0s
       Elapsed time: 0m 12.0s
       Rate:         26.9556
       Average bitrate: 175.3 kb/s

ICL 8.1
Code: [Select]
        File length:  5m 23.0s
       Elapsed time: 0m 19.0s
       Rate:         17.0246
       Average bitrate: 175.3 kb/s



But I can't seem to get it to work on FLAC files...
Code: [Select]
ERROR: Input file "01.flac" is not a supported format.

Am I missing something??

Thanks,

~esa

:edit: typo

Reply #10
Quote
But I can't seem to get it to work on FLAC files...
Code: [Select]
ERROR: Input file "01.flac" is not a supported format.

Am I missing something??


Standard oggenc doesn't input lossless files directly. Only Oggenc2.3 from rarewares does.

Regards; ilikedirt

Reply #11
Quote
Quote
But I can't seem to get it to work on FLAC files...
Code: [Select]
ERROR: Input file "01.flac" is not a supported format.

Am I missing something??


Standard oggenc doesn't input lossless files directly. Only Oggenc2.3 from rarewares does.

Regards; ilikedirt


The standard oggenc supports FLAC input perfectly. It's a compile-time option AFAIK.
"To understand me, you'll have to swallow a world." Or maybe your words.

Reply #12
Quote
The standard oggenc supports FLAC input perfectly. It's a compile-time option AFAIK.

It sure is.

Reply #13
Quote
Standard oggenc doesn't input lossless files directly.

Quote
The standard oggenc supports FLAC input perfectly.




Well, I can't say that the issue is any clearer for me now...

Reply #14
Quote
It's a compile-time option AFAIK.


That means oggenc is able to read FLAC input if this is enabled when compiling. So generally it is able to read FLAC, but this build is not.

Reply #15
Quote
...oggenc is able to read FLAC input if this is enabled when compiling. So generally it is able to read FLAC, but this build is not.

Ah... thank you for the clarification!

~esa

Reply #16
Quote
Have you guys tested the SSE2 optimized build at http://homepage3.nifty.com/blacksword/ ?

I wonder how big the speedup with this build is for P4 and AMD64 CPUs.

I could not find any speed difference between the SSE and SSE2 versions on my Pentium IV machine. Does anybody get a speed increase? The author wants to know the effect in order to decide whether he should continue the SSE2 version or not.

Quote
According to my tests...

ICL 8.1 Standard:

Code: [Select]
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s


SSE:

Code: [Select]
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s


Are the SSE and SSE2 binaries your own builds? If so, don't forget to define the symbol __SSE__ when compiling to activate the optimization.

Reply #17
Quote
I got a good increase, too...

ICL 8.1
Code: [Select]
        File length:  5m 23.0s
       Elapsed time: 0m 12.0s
       Rate:         26.9556
       Average bitrate: 175.3 kb/s

SSE2
Code: [Select]
        File length:  5m 23.0s
       Elapsed time: 0m 19.0s
       Rate:         17.0246
       Average bitrate: 175.3 kb/s



But I can't seem to get it to work on FLAC files...
Code: [Select]
ERROR: Input file "01.flac" is not a supported format.

Am I missing something??

Thanks,

~esa


Huh? The ICL 8.1 compile is faster.

Quote
Quote
Have you guys tested the SSE2 optimized build at http://homepage3.nifty.com/blacksword/ ?

I wonder how big the speedup with this build is for P4 and AMD64 CPUs.

I could not find any speed difference between the SSE and SSE2 versions on my Pentium IV machine. Does anybody get a speed increase? The author wants to know the effect in order to decide whether he should continue the SSE2 version or not.

Quote
According to my tests...

ICL 8.1 Standard:

Code: [Select]
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s


SSE:

Code: [Select]
File length:  4m 58,0s
Elapsed time: 0m 18,0s
Rate:         16,5778
Average bitrate: 236,7 kb/s


Are the SSE and SSE2 binaries your own builds? If so, don't forget to define the symbol __SSE__ when compiling to activate the optimization.


Nope, they're not my own compiles.

Reply #18
Quote
Huh? The ICL 8.1 compile is faster.

Whoops! No, that's a typo... I'll edit immediately...

Reply #19
OK, here are some partial translations:

OggEnc_SSE_20041101ArcherB03.zip
Changes regarding/surrounding comments
Improved low-bitrate quality

Current problems are:
  • When encoding at low bitrates, treble quality suffers, and size bloat occurs.
  • Could hang immediately on running, depending on the environment
  • Bugs due to changes to comment handling?

Reply #20
Quote
OK, here are some partial translations:

OggEnc_SSE_20041101ArcherB03.zip
Changes regarding/surrounding comments
Improved low-bitrate quality

Current problems are:
  • When encoding at low bitrates, treble quality suffers, and size bloat occurs.

  • Could hang immediately on running, depending on the environment

  • Bugs due to changes to comment handling?


Thanks for the translation. I think all of the current problems listed above are solved in Archer B03. These problems existed in Archer B02.

Reply #21
IIRC, SSE2 is optimised for double-precision floating point, so maybe there isn't much difference from SSE since libvorbis doesn't use much of it?

Reply #22
Tested on my AMD64 3400+, 1GB RAM

ICL 8.1:

File length:  4m 27.0s
Elapsed time: 0m 14.0s
Rate:        19.1190
Average bitrate: 132.9 kb/s

ICL 8.1 (John33):

File length:  4m 27.0s
Elapsed time: 0m 11.0s
Rate:        24.3333
Average bitrate: 132.9 kb/s

SSE/SSE2 Optimized:

File length:  4m 27.0s
Elapsed time: 0m 08.0s
Rate:        33.4583
Average bitrate: 132.9 kb/s

SSE2 optimization doesn't change encoding speed

Reply #23
As QK says, there's very little use of double precision in libvorbis, so the use of SSE2 optimisation is virtually a waste of effort.

Reply #24
Quote
IIRC, SSE2 is optimised for double-precision floating point, so maybe there isn't much difference from SSE since libvorbis doesn't use much of it?

Quote
As QK says, there's very little use of double precision in libvorbis, so the use of SSE2 optimisation is virtually a waste of effort.

Actually, he expects higher quality (or speed) in float-to-integer conversion and vice versa but, at the same time, doubts the effect. I'll pass these results on to him.