Sample that kills Garf's 160 mode
Volcano
post Dec 8 2001, 00:36
Post #1





Group: Members (Donating)
Posts: 916
Joined: 30-September 01
From: Berlin, Germany
Member No.: 112



Oh dear. It all began when in one song (Bee Gees - Man In The Middle) I heard a really awful high frequency artifact, but the sound that caused the artifact really was so extreme that I thought it couldn't happen on any other sample with Garf's cool 160 mode.

Anyway, I was proven wrong :( I have come across a few more tracks in the last few weeks that reveal flaws in the high-frequency range. (I promise I'll try to collect them all, but since many of the tracks came from library CDs, I don't know when I'll next get hold of the originals.)

I have uploaded one test sample here (1 MB download):

http://www.volcanocenter.f2s.com/samples/taking_you_home.pac (Use right-click -> Save target as)

It's the intro to Don Henley's "Taking You Home" (from the 2000 album "Inside Job").

Listen to the 3 tambourine-like strokes across the 10 second sample, and you'll see what I mean - they become quite badly distorted. Before anybody asks ;D: I'm not hearing things, I ABXed it easily 16/16 times.

Could you look into this, Garf? I find those problems quite surprising, since on just about any other type of sample (especially impulses), the 160 mode will rival just about any other "low" bitrate setting, although on impulses the bitrate bloats extremely.

Is there any chance you can improve the 160 mode a bit to handle clips like this, or is it still a general Vorbis problem?

CU

Dominic

PS: I'll test other codecs with the sample thoroughly tomorrow. Results from a quick'n'dirty test: --r3mix produced REAL bad ringing, and --dm-preset standard (the old version from the official compile) wasn't very much better. The rev7 standard preset improved a lot on that. I'll post details tomorrow, in the thread where they actually belong :)
Garf
post Dec 8 2001, 01:53
Post #2


Server Admin


Group: Admin
Posts: 4883
Joined: 24-September 01
Member No.: 13



Thanks for the clip.

It'll be a while before I can work on this unfortunately.

Can you check how standard Vorbis RC2 and the tuned 350kbps mode do on that clip?

--
GCP
Volcano
post Dec 8 2001, 10:08
Post #3





Group: Members (Donating)
Posts: 916
Joined: 30-September 01
From: Berlin, Germany
Member No.: 112



OK. I doubt that the tuned 350 mode will produce artifacts, but I'll give it a try.

CU

Dominic
niktheblak
post Dec 9 2001, 02:04
Post #4





Group: Members (Donating)
Posts: 302
Joined: 3-October 01
From: Finland
Member No.: 188



I'm sorry, Garf, but I have to report another high-frequency problem case.

The sample in question is the first 15 seconds of "Beauty Slept In Sodom" from Cradle Of Filth's album "Dusk....And Her Embrace".

http://www.hytti.uku.fi/~tnkorhon/beautysleptinsodom.pac

The sample consists of a single low-volume harpsichord instrument playing a constant tune. Acoustic characteristics for a harpsichord include sharp attacks with higher harmonics before the lower ones and a fast decay rate. Before going into the problem itself, I would like to present some interesting bitrate statistics:

Frank Klemm's mppenc -standard : 231.5 kb/s
Frank Klemm's mppenc -xtreme : 266.5 kb/s
Frank Klemm's mppenc -insane : 320.4 kb/s
LAME 3.90rev7 --alt-preset standard : 183.8 kb/s
oggencgt2 -b 999 : 146.5 kb/s
oggencgt2 -b 350 : 282.0 kb/s

:eek: First time ever I see mppenc bloat like that!
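For anyone who wants to reproduce figures like these: the sketch below assumes an average bitrate is simply file size times 8 divided by duration, counted in kb/s of 1000 bits and ignoring any container overhead. The function names are made up for illustration and are not taken from any of the encoders listed.

```python
def average_bitrate_kbps(size_bytes, duration_seconds):
    """Average bitrate in kb/s (1 kb = 1000 bits) of a file of known length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_seconds / 1000.0


def bloat_factor(bitrate_kbps, reference_kbps):
    """How many times larger one encode is than a reference encode."""
    if reference_kbps <= 0:
        raise ValueError("reference bitrate must be positive")
    return bitrate_kbps / reference_kbps


if __name__ == "__main__":
    # A 15 second clip stored in 300000 bytes averages 160 kb/s.
    print(average_bitrate_kbps(300000, 15))
    # mppenc -standard relative to LAME --alt-preset standard, figures from the list above.
    print(round(bloat_factor(231.5, 183.8), 2))
```

The second helper only divides two of the posted numbers, so it says nothing about quality, just about relative size.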

I listened to the 160 kb/s GTuned2 encode and found that it is not transparent; I ABX'd it 15/16. What I hear: hissing, distortion and noise, where by "hissing" I probably mean high-frequency noise. There are other characteristics too, but I'll let you judge them for yourself.

I will test the 350 kb/s mode tomorrow, as well as the --alt-presets. I definitely want to know how MP3 survives a fast-attack/fast-decay clip like this.

I hope this will be helpful!
Dibrom
post Dec 10 2001, 21:52
Post #5


Founder


Group: Admin
Posts: 2958
Joined: 26-August 02
From: Nottingham, UK
Member No.: 1



QUOTE
Originally posted by niktheblak
The sample in question is the first 15 seconds of "Beauty Slept In Sodom" from Cradle Of Filth's album "Dusk....And Her Embrace".


Good album :)

QUOTE
Frank Klemm's mppenc -standard : 231.5 kb/s
Frank Klemm's mppenc -xtreme : 266.5 kb/s
Frank Klemm's mppenc -insane : 320.4 kb/s
LAME 3.90rev7 --alt-preset standard : 183.8 kb/s
oggencgt2 -b 999 : 146.5 kb/s
oggencgt2 -b 350 : 282.0 kb/s

:eek: First time ever I see mppenc bloat like that!


I just downloaded the file and listened to it (non-encoded) and I've actually seen this MPC behavior before. MPC seems to produce larger bitrates on average on signals like this, also on dulcimers, organs, and certain guitar pieces ("Voice of The Soul" off "Death - The Sound of Perseverance" is a perfect example). I've also seen this type of thing on another clip from "Skinny Puppy - Last Rights" where there is a section that sounds like small clips of human speech combined together and fast forwarded (maybe like someone switching tv channels or radio stations very fast). It seems to be related to highly harmonic signals with rapidly changing base frequencies interspersed with attacks. Unsurprisingly, MPC seems to perform excellently on these signals also. It seems to be a "feature" of the psymodel which somehow detects these situations and increases bitrate appropriately. I believe at least part of it is probably due to the "Clear Voice Detection" (I think PsyTEL uses a similar feature with "Improved Human Speech Coding") feature or some other similar internal behavior.
Volcano
post Dec 10 2001, 23:09
Post #6





Group: Members (Donating)
Posts: 916
Joined: 30-September 01
From: Berlin, Germany
Member No.: 112



Garf,

as expected, the 350kbps mode left nothing to be desired, absolutely transparent to me.

Dominic
Garf
post Dec 11 2001, 00:11
Post #7


Server Admin


Group: Admin
Posts: 4883
Joined: 24-September 01
Member No.: 13



Okay thanks, that indicates there is room for tuning and it's not a fundamental Vorbis problem.

Don't expect a new tuned mode soon though (sorry :(). My best estimate is February.

--
GCP
Volcano
post Dec 11 2001, 07:04
Post #8





Group: Members (Donating)
Posts: 916
Joined: 30-September 01
From: Berlin, Germany
Member No.: 112



Worry not :)
I'm using MPC -xtreme for now, that should keep me comfortable until then ;)

Dominic
tubenut
post Dec 11 2001, 07:41
Post #9





Group: Members
Posts: 12
Joined: 6-November 01
Member No.: 412



Actually, I've got a much better sample that shows a big problem with the 160 mode.

http://www.students.uiuc.edu/~jkolodzi/headcleaner.pac
http://www.students.uiuc.edu/~jkolodzi/001...cleaner_ogg.pac

Yes, despite the fact that this is little more than a couple of sine waves at 440 Hz and something slightly less, this is from actual music. Einstürzende Neubauten's "Headcleaner" from their Tabula Rasa album, to be precise.

The problem isn't the sines so much as some nearly imperceptible clicks that occur in the sample, due to slightly inaccurate tape editing or something. You have to look at a spectrogram to tell they're even there. Run the sample through Vorbis at 160, and blips occur at the places where the clicks were. No need to ABX, they're hard to miss. OK, maybe not THAT hard to miss, but if I can hear them, anyone ought to be able to. It's clearer with headphones. As far as I can tell it doesn't happen at 350. Perhaps this is a clue as to why some of these other samples sound wrong?

Actually, it happens with the normal 160 mode as well... perhaps this should be given its own thread?
Dibrom
post Dec 11 2001, 17:12
Post #10


Founder


Group: Admin
Posts: 2958
Joined: 26-August 02
From: Nottingham, UK
Member No.: 1



QUOTE
Originally posted by tubenut
The problem isn't the sines so much as some nearly imperceptible clicks that occur in the sample, due to slightly inaccurate tape editing or something. You have to look at a spectrogram to tell they're even there. Run the sample through Vorbis at 160, and blips occur at the places where the clicks were.


I just downloaded this sample and took a listen and it sounded like some of the post-echo issues I'd heard in Vorbis before. Pre-echo (which is closely related) was supposed to be addressed more properly by Monty around the time RC3 was originally "only a week away" but as far as I know, at least pre-echo has still not been worked on that much so it may not be particularly surprising to see these kinds of issues still. I think the other samples here may be related to this also but I'm not sure.

Anyway, after listening to the sample, I decided to examine it in the spectral view. This is how I originally discovered that one of the artifacts I kept hearing in drone and some other clips was due to post-echo.

Here are some spectrograms that are somewhat interesting:

[Spectrogram images not preserved: Original .wav, Vorbis, and MPC, each followed by an "Another Example" link to the original screenshot.]

Here you can clearly see the post-echo (and a little pre-echo as well) in the Vorbis sample where the arrows and ellipses point, which is on one of the attacks where the smearing is very audible. To verify that this artifact is caused by post-echo, I also included a spectrogram of mpc -xtreme, which does not have this artifact and shows nothing that looks like post-echo in the spectral view either.

Spectrograms normally shouldn't be used to rate quality by any means but when you can clearly hear an artifact such as this they are useful for pinpointing the cause, which I think is shown above.
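As a rough illustration of how a spectral view can expose a click that is hard to see in the waveform, here is a minimal sketch using a naive short-time Fourier transform in plain Python. The test signal is a synthetic stand-in (a 440 Hz sine with a single small click at a made-up position), not the actual clip, and the frame size, hop and band split are arbitrary assumptions.

```python
import cmath
import math


def stft_magnitudes(samples, frame_size=64, hop=32):
    """Naive STFT: one list of bin magnitudes (0..Nyquist) per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    # DFT twiddle factors depend only on (k * n) mod frame_size, so compute them once.
    twiddle = [cmath.exp(-2j * math.pi * m / frame_size) for m in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * twiddle[(k * n) % frame_size]
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def loudest_high_band_frame(frames, first_bin):
    """Index of the frame with the most energy at or above first_bin."""
    energies = [sum(m * m for m in mags[first_bin:]) for mags in frames]
    return max(range(len(energies)), key=energies.__getitem__)


def sine_with_click(num_samples=2048, sample_rate=8000, click_at=1000):
    """Synthetic stand-in for the clip: a 440 Hz sine plus one small click."""
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
            for n in range(num_samples)]
    tone[click_at] += 0.05  # hypothetical click, a tenth of the sine's amplitude
    return tone


if __name__ == "__main__":
    frames = stft_magnitudes(sine_with_click())
    index = loudest_high_band_frame(frames, first_bin=16)
    print("click is somewhere in samples", index * 32, "to", index * 32 + 63)
```

The sine keeps almost all of its energy in the lowest bins, so a broadband click stands out in the upper half of the spectrum even though it barely changes the waveform.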

Edit: Added clearer examples, kept original shots under "Another Example"
Garf
post Dec 11 2001, 18:18
Post #11


Server Admin


Group: Admin
Posts: 4883
Joined: 24-September 01
Member No.: 13



Hmm, is this normal Vorbis or my tuned mode?

The tuned modes should handle postecho, and produce no more postecho than they produce preecho, which would make the postecho inaudible because the preecho should then be way worse because of the way the hearing works.

If it still happens with my tuned mode, my best guess is that the small block trigger for postecho is not set sensitive enough.

--
GCP
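To make the "small block trigger" idea concrete, here is a minimal sketch of an energy-ratio transient detector. It is not the Vorbis or GT2 code; the block size, the thresholds and all names are assumptions chosen for illustration. A sudden rise in energy (an attack, the pre-echo risk) or a sudden drop (a release, the post-echo risk) asks for short blocks, and raising the release threshold makes the trigger less sensitive.

```python
import math


def block_energies(samples, block=64):
    """Sum of squares for each consecutive, non-overlapping block."""
    return [sum(x * x for x in samples[i:i + block])
            for i in range(0, len(samples) - block + 1, block)]


def wants_short_blocks(samples, block=64, attack_ratio=8.0,
                       release_ratio=8.0, floor=1e-9):
    """True if any neighbouring pair of blocks jumps or drops sharply in energy."""
    energies = block_energies(samples, block)
    for prev, cur in zip(energies, energies[1:]):
        if cur > attack_ratio * (prev + floor):    # attack: pre-echo risk
            return True
        if prev > release_ratio * (cur + floor):   # release: post-echo risk
            return True
    return False


def tone(num_samples, amplitude_at, sample_rate=8000, freq=440):
    """Sine whose amplitude at sample n is given by amplitude_at(n)."""
    return [amplitude_at(n) * math.sin(2 * math.pi * freq * n / sample_rate)
            for n in range(num_samples)]


if __name__ == "__main__":
    steady = tone(1024, lambda n: 0.5)
    release = tone(1024, lambda n: 0.5 if n < 512 else 0.125)
    print(wants_short_blocks(steady))                       # steady tone
    print(wants_short_blocks(release))                      # sudden drop, default threshold
    print(wants_short_blocks(release, release_ratio=32.0))  # same drop, less sensitive trigger
```

In the demo signal the energy falls by a factor of about 16 at the halfway point, so the default release threshold of 8 catches it and a threshold of 32 misses it, which is the kind of mistuning suggested above.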
Dibrom
post Dec 11 2001, 18:27
Post #12


Founder


Group: Admin
Posts: 2958
Joined: 26-August 02
From: Nottingham, UK
Member No.: 1



QUOTE
Originally posted by Garf
Hmm, is this normal Vorbis or my tuned mode?


Yes. GT2 oggdrop at 160kbps is what I used.

QUOTE
The tuned modes should handle postecho, and produce no more postecho than they produce preecho, which would make the postecho inaudible because the preecho should then be way worse because of the way the hearing works.


Hrmm.. well that's interesting but I'm pretty sure that in at least a good portion of the cases where I still hear pre-echo, I'm also hearing post-echo. Of course I haven't examined all these other cases where I still hear temporal smearing in the spectral view, but post-echo seems to have a different sound than pre-echo. For that matter I also happen to hear post-echo very often with MP3 which also has very bad pre-echo.. so maybe the masking effect is not as strong there as believed?

QUOTE
If it still happens with my tuned mode, my best guess is that the small block trigger for postecho is not set sensitive enough.


That could very well be. At least the artifacts in the last clip sound so bad that they are likely due to inaccurate short block triggering of some type like you said.
niktheblak
post Dec 13 2001, 17:07
Post #13





Group: Members (Donating)
Posts: 302
Joined: 3-October 01
From: Finland
Member No.: 188



Alright, my response took a bit longer than expected, but I have some results you might be interested in.

Unsurprisingly, I concur that the GTuned2 350 kbps mode sounds absolutely transparent on my harpsichord clip. But the problems are not limited to GTuned2 160 kbps.

With the regular RC2 oggenc I found that every single setting produced audible distortion. It was painfully easy to detect at bitrates of 160-192 and somewhat easy even at 350 kbps!

My ABX score with rc2 oggenc -b 350 is a not very convincing 13/16, but the test was done hastily in a very noisy environment. I can do more ABXing if anyone doubts my judgement.
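For putting the ABX scores in this thread (16/16, 15/16 and 13/16) on a common scale, a small sketch follows. It assumes the usual null hypothesis that each trial is an independent coin flip, and computes the one-sided probability of scoring at least that well by guessing. The function name is my own.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by pure guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    for score in (16, 15, 13):
        print(f"{score}/16: p = {abx_p_value(score, 16):.6f}")
```

Under that assumption 16/16 comes out at about 0.000015, 15/16 at about 0.00026 and 13/16 at about 0.0106, so even 13/16 is roughly a 1 in 94 chance of being luck.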

Dibrom's --alt-preset standard actually sounded rather good. I don't think I could ABX it. As usual, MPC was annoyingly transparent.

And Dibrom, I think you may be right about that ClearVoice thing. The sound of a harpsichord (and of speech) is based solely on very special harmonic behaviour, and since human hearing is extremely well trained to detect harmonics, an increase in bitrate might be in order.

However, OGG's behaviour saddens me a bit. Although this is only the second time I have detected something wrong with regular OGG (and the first with GTuned), we are talking about regular music here, not an unnatural, artificially generated test sample.

And Garf, do you know whether your tweaks will be included in future versions of OGG? Based on empirical testing you have done wonders for Vorbis quality, and including your work in the regular OGG core should at least be discussed!
Dibrom
post Dec 13 2001, 17:37
Post #14


Founder


Group: Admin
Posts: 2958
Joined: 26-August 02
From: Nottingham, UK
Member No.: 1



QUOTE
Originally posted by niktheblak
Dibrom's --alt-preset standard actually sounded rather good. I don't think I could ABX it. As usual, MPC was annoyingly transparent.


Both are good to hear :)

QUOTE
And Dibrom, I think you may be right about that ClearVoice thing. The sound of a harpsichord (and of speech) is based solely on very special harmonic behaviour, and since human hearing is extremely well trained to detect harmonics, an increase in bitrate might be in order.


Yeah.. according to the webpage it'd make sense:

QUOTE
ClearVoiceDetection CVD(still German): During activity of vocals (especially vowels) or similar harmonic signals higher resolution of quantization is assigned to the belonging spectral bins. This procedure will fix some problems of the psychoacoustic models during changes of the base frequency of harmonic signals.


I really believe the CVD technique can be very helpful in areas most psymodels do not compensate properly for. I've tried to convince a few LAME developers to implement a similar technique (it is beyond me, at least at the moment :)) but I doubt it will ever happen (we still need an adaptive lowpass and a bunch of other things..). I think it could help for quite a few situations LAME doesn't perform so well on, maybe even impulses.
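To picture what "higher resolution of quantization for the belonging spectral bins" could mean, here is a toy sketch. It is emphatically not how MPC implements CVD: it assumes the fundamental's bin is already known, treats exact integer multiples of it as the harmonic bins, and simply halves the quantizer step there. All names and numbers are hypothetical.

```python
def harmonic_bins(num_bins, f0_bin, max_harmonics=8):
    """Bins at integer multiples of the (assumed known) fundamental bin."""
    if f0_bin <= 0:
        raise ValueError("f0_bin must be positive")
    return {h * f0_bin for h in range(1, max_harmonics + 1) if h * f0_bin < num_bins}


def quantizer_steps(num_bins, fine_bins, base_step=1.0, refine=0.5):
    """Per-bin quantizer step: finer (smaller) on the bins listed in fine_bins."""
    return [base_step * refine if k in fine_bins else base_step
            for k in range(num_bins)]


def quantize(values, steps):
    """Uniform quantization of each value with its own step size."""
    return [round(v / s) * s for v, s in zip(values, steps)]


if __name__ == "__main__":
    bins = harmonic_bins(16, f0_bin=4)     # bins 4, 8 and 12
    steps = quantizer_steps(16, bins)
    spectrum = [0.3] * 16                  # made-up coefficient values
    print(quantize(spectrum, steps))
```

With these made-up values the coefficients on the harmonic bins survive quantization while the rest round to zero, which is the effect a finer step is meant to have. A real encoder would also have to track the fundamental as it moves, which is the hard part the quoted description hints at.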

I have talked to Monty some about this type of thing though, and he said he was going to implement an idea which partially does something like this, but that the other side would come later. This was also when we discussed pre/post-echo tuning, which to my knowledge still hasn't happened aside from Garf's tweaks and a small post-echo detection bit (Update: RC3 offers some large improvements here now).