Nine different codecs 100-pass recompression test

bernhold · Mar 23 2013, 19:12 · Post #1

Hi everyone!

I recently discovered this forum and enjoyed reading the listening tests, so I decided to run a listening test myself. Have you ever wondered how different codecs are affected by re-encoding / recompressing? Of course, recompressing audio is a bad idea, but sometimes it can't be avoided. To clear things up, I ran a test with the following encoders:
  • WMA Professional 10 (wmapro)
  • WMA 9.2 (wma)
  • Musepack (mpc)
  • Fraunhofer Mp3 surround encoder (mp3s)
  • LAME (mp3)
  • Quicktime AAC (qaac)
  • Nero AAC (nero)
  • Vorbis OGG (vorbis)
  • Opus (opus)

Quality settings: Low (~96 kbps) and high (~256 kbps)
Bitrate modes: CBR, ABR and VBR

I encoded the original sample with the respective encoder, decoded it back to WAV, and encoded it again, 100 times in total. Then I listened to the results to determine which encoder produced the best results.

RESULTS

AAC is the clear winner by far; it is virtually unaffected by the number of passes. All other codecs showed sound quality that degraded with the number of encoding passes, especially at low bitrates.

At low bitrates, AAC was the only codec providing satisfactory results. All other encoders fell way behind, producing audible compression artifacts such as crackling noises, muffled sound and hissing. At high bitrates, LAME and Musepack could compete with AAC, but all other encoders fell way behind.

It's interesting to see how much some encoders profit from an increased bitrate when recompressing many times. For AAC, the clear winner, bitrate hardly mattered. Musepack, on the other hand, placed 9th at the low bitrate setting, but at the high bitrate it was almost as good as AAC and placed 4th. LAME behaved similarly: it produced loud crackling noises at low bitrates and placed 8th, but sounded almost perfect at high bitrates and placed 3rd.

Other codecs were mainly unaffected by bitrate, such as WMA, the Fraunhofer MP3s encoder, Opus and OGG Vorbis. These codecs were mainly affected by the number of recompression passes.

In general, WMA and the Fraunhofer MP3s codec were the most disappointing. WMA produced loud hissing and crackling noises, while the Fraunhofer encoder sounded bland and muffled, discarding brilliance and detail. The only reason Fraunhofer placed decently is that it doesn't produce loud crackling or hissing noises, which to my ears are even worse than merely muffled or dull sound. Of course, that's purely subjective.

Some encoders not only degraded sound quality but also had other quirks. For example, the LAME encoder lowers the volume with every encoding pass; the 100th pass was virtually inaudible, and I had to normalize the audio to hear anything at all. Other encoders produced erroneous files and garbage. The Fraunhofer encoder added silence to the beginning and end of each file and repeated parts of the sample at the end; after 100 passes, it created a 12-second file (the original was 7 seconds). Winamp and foobar2000 even reported a length of 1:02 minutes for the Fraunhofer file, although playback ended after 12 seconds. The Vorbis encoder did a similar thing, resulting in a reported length of 2 seconds while playback ended at 7 seconds. I can't really say whether I did something fundamentally wrong or whether it's the encoder's fault, but in the end, the Fraunhofer and Vorbis encoders produced corrupted files. For the listening test, I tried to fix all errors like added silence or corrupted files, since I wanted to judge sound quality only.
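For reference, the kind of peak normalization used to make the quiet later generations audible again takes only a few lines of Python. This is a generic sketch over float samples in [-1, 1], not the tool actually used in the test; note that it restores loudness but cannot restore detail already lost to encoding:

```python
def peak_normalize(samples, target=0.99):
    """Scale the signal so its largest absolute sample hits `target`.
    Restores audibility, not the detail discarded by the encoder."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)          # digital silence: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]

# A signal attenuated by many passes of a gain-reducing encoder:
faint = [0.001 * s for s in (0.5, -1.0, 0.25)]   # peak around -60 dBFS
restored = peak_normalize(faint)                  # peak back near 0.99
```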

You can view the complete test on my homepage, where I have also attached the test audio samples so you can hear them in your browser. I also visualized the waveform of each sample; it's very interesting to see.

http://bernholdtech.blogspot.de/2013/03/Ni...ssion-test.html

For example, this is the original file: [waveform image]

This is after 100 re-encodings with Nero AAC: [waveform image]

And this is after 100 re-encodings with OGG Vorbis: [waveform image]

This is after 100 re-encodings with WMA (Windows Media Audio): [waveform image]

This post has been edited by bernhold: Mar 23 2013, 19:38

zerowalker · Mar 23 2013, 19:35 · Post #2

Interesting. Though I am surprised that Vorbis did so badly.

Have you tried aoTuV b6.03? It should be more resilient than libvorbis.

bernhold · Mar 23 2013, 19:37 · Post #3

I read somewhere (Wikipedia, I think) that the improvements of aoTuV are periodically merged back into the original Vorbis codec. So I assumed it wouldn't make much of a difference. I'm not very familiar with Vorbis, though. If you say so, it may be worth testing that, too.

saratoga · Mar 23 2013, 19:42 · Post #4

You probably need to manually adjust the encoder's gain in LAME so that it does not change the volume when encoding.

If the audio shifted in time for Vorbis, something probably went wrong. Vorbis supports gapless playback by default, so no change in length should occur.

me7 · Mar 23 2013, 19:46 · Post #5

Wow, you uploaded all files playable within the browser. Thank you for sharing this with us.

hlloyge · Mar 23 2013, 20:17 · Post #6

I was always wondering about this but never had enough spare time to do it. Thank you.

bernhold · Mar 23 2013, 20:20 · Post #7

QUOTE (saratoga @ Mar 23 2013, 19:42)
You probably need to manually adjust the encoder's gain in LAME so that it does not change the volume when encoding.

If the audio shifted in time for Vorbis, something probably went wrong. Vorbis supports gapless playback by default, so no change in length should occur.


Thank you, I will try that. Do you think this affected the sound quality of LAME? Regarding Vorbis, the length hasn't actually changed; it's just reported wrongly in the audio players I used. It shows as 0:02 in the playlist, but when I actually play it, it's perfectly normal (7 seconds). When I decode it back to WAV, the length is also correct. So I didn't bother much; it shouldn't make a difference regarding sound quality anyway.

This post has been edited by bernhold: Mar 23 2013, 20:20

alter4 · Mar 23 2013, 20:21 · Post #8

QUOTE (bernhold @ Mar 23 2013, 21:37)
I read somewhere (Wikipedia, I think) that the improvements of aoTuV are periodically merged back into the original Vorbis codec. So I assumed it wouldn't make much of a difference. I'm not very familiar with Vorbis, though. If you say so, it may be worth testing that, too.


Yes, that is true. Vanilla Vorbis has only merged the beta 2 code, but the most recent aoTuV is beta 6, so it could be worth testing beta 6.
Thanks for the interesting test.

zima · Mar 23 2013, 21:42 · Post #9

Hm, re-encoding 100 times seems like a bit of an overkill, and generally not a very realistic test? (In that sense, it doesn't clear things up much.)

I would assume more practical results would come from one or two passes, starting 1. from a lossless source and 2. from a high-bitrate lossy encode (comparable to typical files bought from iTunes or Amazon), both targeting a low bitrate (say, for portable use), and comparing the two resulting files.

This post has been edited by zima: Mar 23 2013, 21:43



bernhold · Mar 23 2013, 21:53 · Post #10

Yes, 100 times is not practical :)

But there's a reason for it. I encoded 100 times because it's much easier to see how a codec performs. You may not be able to hear any difference after 1 or 2 re-encodes. And I assume that a codec which sounds better than another codec after 100 re-encodes will also sound better after 1 or 2 re-encodes. However, for the listening test, I only used the results after 100 re-encodes.

I also added results for 10, 25 and 50 passes in my test; they are available in the "detailed results" section on the web page (scroll down). These results are less extreme than you may expect.

This post has been edited by bernhold: Mar 23 2013, 21:55

romor · Mar 23 2013, 22:04 · Post #11

I ran your files through the Python script (from yesterday's waveform thread), as it also colors the waveform by spectral intensity.

Here is result: http://db.tt/q9gXzysF
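romor's script itself isn't shown in the thread; as a rough stdlib-only sketch of the idea, one can window the signal, take the short-window RMS as the waveform envelope, and use the zero-crossing rate as a crude stand-in for spectral intensity (a real implementation would presumably use a proper spectrogram):

```python
import math

def waveform_features(samples, win=50):
    """Per window: (rms, zcr). RMS gives the envelope height; the
    zero-crossing rate is a cheap proxy for high-frequency content,
    usable to color the waveform."""
    feats = []
    for start in range(0, len(samples) - win + 1, win):
        w = samples[start:start + win]
        rms = math.sqrt(sum(s * s for s in w) / win)
        zcr = sum((a < 0) != (b < 0) for a, b in zip(w, w[1:])) / (win - 1)
        feats.append((rms, zcr))
    return feats

low  = [math.sin(2 * math.pi *  2 * i / 500) for i in range(500)]   #  2 cycles
high = [math.sin(2 * math.pi * 20 * i / 500) for i in range(500)]   # 20 cycles
# the higher-frequency tone yields a clearly higher mean zero-crossing rate
```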



zima · Mar 23 2013, 22:10 · Post #12

QUOTE (bernhold @ Mar 23 2013, 21:53)
I encoded 100 times because it's much easier to see how a codec performs. You may not be able to hear any difference after 1 or 2 re-encodes. And I assume that a codec which sounds better than another codec after 100 re-encodes will also sound better after 1 or 2 re-encodes.

Careful there, your assumption looks like it might go against certain rules here...

Not being able to hear any differences after a few re-encodes is also a perfectly valid (and much more useful, versus a somewhat artificial overkill scenario) result.

All that said, thank you for the effort (particularly for the "detailed results" section, 10 passes ;P ) - I also always wanted to do a similar test, but never got to it.



dgauze · Mar 23 2013, 22:56 · Post #13

It seems as though this test would be more suited to comparing different versions of the same codec, if anything at all.

As it stands, you are using codecs with different encoding techniques on one particular sample for a few of the codecs tested, but not others.

In that sense, this test doesn't tell us much of anything as it stands.

This post has been edited by dgauze: Mar 23 2013, 22:59

saratoga · Mar 24 2013, 02:19 · Post #14

After 100 passes, rounding error probably starts to be a problem. I wonder what effect the intermediate formats used by the decoder/encoder have on quality. Software that can output/read float probably has an advantage here over 16 bit (or even 24 bit) PCM.
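saratoga's point can be illustrated with a toy Python experiment: apply a tiny processing step 100 times (here an arbitrary 0.999 gain, standing in for whatever the codec does internally), once keeping the signal in float and once bouncing it through a 16-bit PCM intermediate after every pass, then compare the two paths. This is a simplification, not a model of any particular codec:

```python
import random

random.seed(42)
signal = [random.uniform(-1.0, 1.0) for _ in range(1000)]

def to_16bit_and_back(samples):
    """One trip through a 16-bit PCM intermediate file."""
    return [round(s * 32767) / 32767 for s in samples]

float_path = list(signal)
pcm_path = list(signal)
for _ in range(100):
    float_path = [0.999 * s for s in float_path]          # stays in float
    pcm_path = to_16bit_and_back([0.999 * s for s in pcm_path])

# both paths share the same 0.999**100 attenuation, so any difference
# between them is purely accumulated 16-bit rounding error
worst = max(abs(a - b) for a, b in zip(float_path, pcm_path))
```

After 100 passes the quantized path typically drifts from the float path by several least-significant bits, supporting the intuition that float intermediates have an edge in a test like this.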

dhromed · Mar 24 2013, 03:16 · Post #15

This test is very comprehensive! Good job.

QUOTE (romor @ Mar 23 2013, 22:04)
Here is result: http://db.tt/q9gXzysF


Excellent, but your results are sorted by pass count within each codec, and I think it's more interesting to see the progress of the decay for each codec and setting. Perhaps the data reaches some kind of plateau after a certain number of transcode cycles, or instead accelerates toward Shannon's oblivion.

I've aligned the Vorbis-low images in Photoshop at the 0, 10, 25, 50 and 100 measuring points, but there was little I could see because of the gap between 50 and 100.

This post has been edited by dhromed: Mar 24 2013, 03:18

kjoonlee · Mar 24 2013, 04:16 · Post #16

Does this belong under "Listening Tests"?

I don't think so...



greynol · Mar 24 2013, 06:16 · Post #17

No this does not belong in listening tests and will be moved shortly.

There have already been complaints that this discussion is not in keeping with TOS #8, and I have a hard time disagreeing.

While I understand that this took time and effort, I do not agree that the results are particularly meaningful, let alone useful. It's a lot easier to push a few buttons and let the computer chug away than it is to actually conduct double blind tests.

This is a far cry from the level of analysis that members of this forum are capable of presenting.




Mach-X · Mar 24 2013, 07:48 · Post #18

While it perhaps doesn't belong in Listening Tests, I don't believe it should be binned; I found the results quite interesting, particularly how some codecs manage to keep some semblance of the source file while others destroy it almost beyond recognition. Of course nobody is going to encode a file 100 times, but it's an interesting test nonetheless.

greynol · Mar 24 2013, 11:16 · Post #19

Read my post again. You will not see any mention of binning the discussion.



[JAZ] · Mar 24 2013, 12:07 · Post #20

@greynol: Could you clarify why this infringes the TOS #8 and from what point of view is this useless?

Concretely, it is a test of codec regression, and I don't even need to listen to the Ogg Vorbis and WMA samples to know that they will sound notably different, just by looking at the waveforms above. (Edit: OK, the final ranking would probably need an ABC/HR result to back it up.)

You will probably also remember some tests made years ago that studied transcoding from one codec to another; in that case, Musepack seemed to be the best source to transcode to MP3. That test required a listening test because it was a single pass, not 100, and because it tested inter-codec transcoding instead of transcoding to self.


Concretely, this test can answer several things:

If a user is going to transcode some files and the origin and destination formats are known, there is an empirical way to know whether quality will degrade quickly (making the decision to transcode less desirable).

Whether there is a codec that, given the interest in transcoding, manages to add the least artifacts and/or is most stable in doing so.



@bernhold: As saratoga said, it would be interesting to change the gain that LAME applies by default (which I thought it no longer did) with --scale 1. That said, which version of LAME is this? (And maybe the versions of the other codecs, and which tool was used?)

This post has been edited by [JAZ]: Mar 24 2013, 12:15

IgorC · Mar 24 2013, 15:53 · Post #21

Don't get me wrong.

It's clear that there is a variety of methods for testing audio codecs, and everybody is free to adopt and defend any of them.

But as one may notice, there are no comments from the people who are usually involved in the listening tests here. Either everything is perfect and there is nothing to say, or everything is plain wrong and there is nothing to say.
Take a guess.

greynol · Mar 24 2013, 15:58 · Post #22

Sound quality of lossy codecs is determined through DBT, full stop.



romor · Mar 24 2013, 16:57 · Post #23

QUOTE ([JAZ] @ Mar 24 2013, 13:07)
@bernhold: As saratoga said, it would be interesting to change the gain that LAME applies by default (which I thought it no longer did) with --scale 1. That said, which version of LAME is this? (And maybe the versions of the other codecs, and which tool was used?)

This would be sensible, and perhaps also a lossless version of your 8-second sample.



db1989 · Mar 24 2013, 18:34 · Post #24

What would be interesting, IMO, and I think much more useful to actual users, would be a test with repeated iterations of re-encoding material from various uncompressed and lossy settings to various other lossy settings, with DBTs after each, aiming to determine when degradation becomes audible and perhaps its extent compared to other workflows. Then again, I have a hunch that the effects would become audible well before 100 passes, which I agree is a number so improbable in reality that it's not useful in any concrete sense and is purely an abstract 'what if'. The workload in the test I suggested would come much less from the number of passes and much more from the need to choose various source and destination encoders/settings and to determine how to assess their effects and the resulting relative quality. Anyway... pure speculation.

romor · Mar 24 2013, 18:48 · Post #25

Maybe also something similar to the transcoding test linked by [JAZ], but perhaps a cross-referenced table for a selected 30-second sample and a selected bitrate: pass the original signal through every encoder, then pass each result through every other encoder. Such a table would look interesting to me, and I might do it out of curiosity, but publicly this test would definitely need an ABX report, which is not needed here (for the test in this thread).

