Public Listening Test [2010], Discussion
IgorC
post Apr 22 2010, 16:14
Post #276

Thank you, Alex.

Speaking of the low anchor, Emese is the hardest sample I've ever come across. A 64 kbps low anchor is actually not that awful for the rest of the samples.

Chris has prepared the concatenated reference sample.
Link to download: h*tp://www.mediafire.com/?yhwmwzjgmm3
I've also attached some possible low anchors: iTunes 64 CBR, iTunes 64 CVBR and CT 80. CT is used for the 80 kbps low anchor because Apple's encoder has a bug with LC-AAC at 80-96 kbps.

01 BerlinDrug
02 AngelsFallFirst
03 CantWait
04 CreuzaDeMä
05 Ecstasy
06 FallOfLife_Linchpin
07 Girl
08 Hotel_Trust
09 Hurricane_YouCant
10 Kalifornia
11 Memories
12 RobotsCut
13 SinceAlways
14 Triangle_Glockenspiel
15 Trumpet_Rumba
16 Waiting

Alex B
post Apr 22 2010, 17:48
Post #277

I tried the files.

Personally, I don't think the low anchor is optimal when the first thing you hear is an obvious low-pass that makes the encoding sound entirely different from the others.

I'd like to suggest FAAC (v.1.28 from rarewares) with an adjusted low-pass frequency, for instance:

-q 35 -c 18000

I tried the above and it works pretty well with the concatenated sample.

EDIT

If it turns out to be too good or too bad for a specific sample, the -q value could be adjusted for that sample.
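For example, the full command line would be something like the sketch below (the file names are just placeholders and -o names the output file; the -q value is the knob to adjust per sample):

CODE
faac.exe -q 35 -c 18000 -o low_anchor.aac sample.wav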

IgorC
post Apr 22 2010, 18:20
Post #278

iTunes CVBR 64 is noticeably better than FAAC -q 35 -c 18000.
Alex B
post Apr 22 2010, 18:32
Post #279

iTunes CVBR 64 is resampled to 32 kHz and low-passed at about 12 kHz; otherwise it sounds pretty "clean". It doesn't really help listeners understand what kinds of artifacts (distortion, noise, pre-echo, etc.) a sample may produce.

If -q 35 is too bad, a higher value can be used.

In addition, it would be better to include only 44.1 kHz samples. Sample-rate switching may cause additional problems with the ABC/HR program, some operating systems, and/or some sound devices.


IgorC
post Apr 22 2010, 18:37
Post #280

Hm, good points indeed.

Then we should encode the low anchor with FAAC, using -q > 35 and -c 18000.

C.R.Helmrich
post Apr 22 2010, 19:31
Post #281

If we're looking for an anchor that emphasizes the artifacts to be expected, why not use MP3, e.g. LAME CBR at the lowest setting that doesn't downsample to 32 kHz? I think we could actually use the old "version 1.0" Fraunhofer encoder from 1994(?) with an additional 16-kHz lowpass filter applied before encoding (that should avoid the bug).

Edit: The more I think about it, the more I believe we should use two anchors to stabilize the results: one to define the lower end of the grading scale, the other to define a mid-point of the scale. For the lower end, I just imitated the world's first audio encoder: our test set downsampled to 8 kHz using Audition and saved as an 8-bit µ-Law stereo WAV file. That's a 128 kbps encoding. A nice demonstration of how far we've come in the last 40 years or so smile.gif

µ-Law file: http://www.materialordner.de/wsRJHTtgLzlgF...TouJw5xomU.html

Edit 2: When using the µ-Law file as an anchor, it will of course be upsampled to 44.1 kHz again.

Maybe a 96-kb MP3 would be just fine for an intermediate anchor.

Edit 3: Can someone please upload Fraunhofer's 1994 encoder (l3enc 0.99) here? Roberto's original page expired.

Chris

C.R.Helmrich
post Apr 22 2010, 22:01
Post #282

Regarding the splitting of the concatenated encodes: I think we should use CopyAudio by Kabal et al. from McGill University to simply cut the WAV decode into the appropriate chunks. Reasons:

http://www-mmsp.ece.mcgill.ca/Documents/Downloads/AFsp/

  • We can cut off the first 2.1 seconds of the test set, i.e. the HA introduction (the stabilization part for CBR encoders).
  • We don't have to worry about encoder delay: since the delay is known in advance, we can split accordingly.
  • CopyAudio can be sent to the listeners since it's freeware and available for Linux/Mac and Windows, which allows us to provide Linux/Mac and Windows scripts, run by the listeners, that prepare the entire test from the concatenated .m4a encodes, i.e. decode to WAV and split into separate files.
  • We could even handle resampling for the anchor(s): there's also a ResampAudio tool in the AFsp package.

What do you guys think? If you agree, I'll write "prepare_test.bat" and "prepare_test.sh" Windows and Linux scripts for the ABC/HR package over the weekend.
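Just to illustrate the kind of cut I mean, a rough sketch only: the -l/--limits sample-range syntax is what I remember from the AFsp documentation and should be double-checked, the -F/-D switches mirror the other AFsp calls in this thread, and the sample offsets and file names are made-up placeholders rather than the real test values.

CODE
rem Hypothetical cut: skip the first 92610 samples (2.1 s x 44100) and keep one item's range
CopyAudio -l 92610:1014749 -F WAVE -D integer16 decoded_test_set.wav item01.wav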

Chris


IgorC
post Apr 25 2010, 23:09
Post #283

OK, Chris, your applications are better. smile.gif
I'm also fine with any of the low anchors, so FAAC or LAME would be just fine.
PM or email me about how you want to proceed.

lvqcl
post Apr 25 2010, 23:18
Post #284

QUOTE
Can someone please upload Fraunhofer's 1994 encoder (l3enc 0.99) here? Roberto's original page expired.


This - http://web.archive.org/web/20070927014154/.../rrw/l3enc.html ?
C.R.Helmrich
post Apr 26 2010, 22:25
Post #285

Thanks a lot, lvqcl! I tried a 112 kbps encode with the apparently bug-free l3enc version 2.60 (Linux version). The quality is actually too good for a mid anchor, and 96 kbps unfortunately doesn't work in the unlicensed version. We are currently investigating LAME at 96 kbps and a 44.1 kHz sampling rate as the anchor.
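If we go that route, the LAME call would be something like the sketch below (--cbr, -b and --resample are standard LAME switches, --resample 44.1 simply forces the output sampling rate; the exact settings are still to be decided, and the output file name is a placeholder):

CODE
lame.exe --cbr -b 96 --resample 44.1 ha_aac_test_sample_2010.wav ha_mp3_anchor_96.mp3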

For the record, the lower anchor will be created and decoded with the following commands. This yields a delay-free anchor.

CODE
rem Downsample the reference to 8 kHz and store it as an 8-bit A-law WAV file
ResampAudio.exe -s 8000 -f cutoff=0.087 -D A-law -F WAVE ha_aac_test_sample_2010.wav ha_aac_test_sample_2010_a-law8.wav
rem Resample back up to 44.1 kHz as 16-bit integer PCM, then delete the intermediate file
ResampAudio.exe -s 44100 -D integer16 -F WAVE ha_aac_test_sample_2010_a-law8.wav ha_aac_test_sample_2010_a-law.wav
del ha_aac_test_sample_2010_a-law8.wav

Chris


The Sheep of DEATH
post Apr 27 2010, 06:20
Post #286

What do you think about getting GXLame in as a low anchor (or even a competitor in a non-AAC test)? It's a low-bitrate MP3 encoder, so it just might fit the bill somewhere between V0-V30 (V20 averages 96kbps and defaults to 44kHz).

Alex B
post Apr 27 2010, 11:42
Post #287

I don't understand why two low anchors would be needed. Wouldn't it be better to let the "mid" anchor define where the lower end of the scale is? Then there would possibly be a bit wider scale for the contenders. Ideally the low anchor would then get 0-3 and the contenders 2-5. IMHO, it would be enough that there is one low anchor that can be detected more easily than the actual contenders.

Also, I don't understand why some old/mediocre MP3 encoder/setting would make a better low anchor than FAAC. FAAC would nicely represent the basis of the more developed AAC encoders. FAAC can be adjusted freely to provide the desired quality level. "-q 35 -c 18000" worked for me, but perhaps -q 38, -q 40 or so would work as well.

In general, it would be desirable that all encoders, including the low anchor, are easily available so that anyone can reproduce the test scenario (for verifying the authenticity of the results) or test different samples/encoders using/including the tested encoders and settings in order to get comparable personal results. Also the procedure to decode and split the test sample should be reproducible by anyone.

C.R.Helmrich
post Apr 27 2010, 22:06
Post #288

QUOTE (Alex B @ Apr 27 2010, 12:42) *
I don't understand why two low anchors would be needed. Wouldn't it be better to let the "mid" anchor define where the lower end of the scale is? Then there would possibly be a bit wider scale for the contenders. Ideally the low anchor would then get 0-3 and the contenders 2-5. IMHO, it would be enough that there is one low anchor that can be detected more easily than the actual contenders.

Use of two anchors follows the MUSHRA methodology and is an attempt at making the grading scale of this test more absolute. After all, all encoders in this test sound quite good compared to old/simple encoding techniques or lower bit rates. As the name implies, the lower anchor shall define the lower end of the scale and should give the listeners an idea of what we mean by "bad quality" (range 0-1). The hope then is that this reduces the confidence intervals (grade variance) for the other coders in the test, including the mid anchor (which should end up somewhere in the middle of the grading scale).

QUOTE
Also, I don't understand why some old/mediocre MP3 encoder/setting would make a better low anchor than FAAC. FAAC would nicely represent the basis of the more developed AAC encoders. [...]

Actually, it seems it doesn't. In my first informal evaluation, I noticed that FAAC is tuned very differently from the other AAC encoders in the test (less pre-echo, more warbling), and it seems LAME@96kb emphasizes the artifacts of the codecs under test (pre-echo, warbling on tonal sounds, etc.) better than FAAC@64. Btw, the bandwidth of LAME@96 is close enough to that of the codecs under test (around 15 kHz).

QUOTE
In general, it would be desirable that all encoders, including the low anchor, are easily available so that anyone can reproduce the test scenario (for verifying the authenticity of the results) or test different samples/encoders using/including the tested encoders and settings in order to get comparable personal results. Also the procedure to decode and split the test sample should be reproducible by anyone.

Agreed. Igor and I are working on scripts, run by the listeners, which do all the decoding and splitting of the bit streams and creation of the (decoded) anchors. My commands for the lower anchor above are a first attempt at this.

Chris

muaddib
post Apr 28 2010, 08:40
Post #289

QUOTE (Alex B @ Apr 27 2010, 12:42) *
Ideally the low anchor would then get 0-3 and the contenders 2-5. IMHO, it would be enough that there is one low anchor that can be detected more easily than the actual contenders.

QUOTE (C.R.Helmrich @ Apr 27 2010, 23:06) *
As the name implies, the lower anchor shall define the lower end of the scale and should give the listeners an idea of what we mean by "bad quality" (range 0-1).

The ITU-R five-grade impairment scale that is used runs from 1 (Very Annoying) to 5 (Imperceptible).
Bad quality would be in the range 1-2, probably closer to 1.
Alex B
post Apr 28 2010, 11:06
Post #290

QUOTE (C.R.Helmrich @ Apr 28 2010, 00:06) *
Use of two anchors follows the MUSHRA methodology and is an attempt at making the grading scale of this test more absolute. After all, all encoders in this test sound quite good compared to old/simple encoding techniques or lower bit rates. As the name implies, the lower anchor shall define the lower end of the scale and should give the listeners an idea of what we mean by "bad quality" (range 0-1). The hope then is that this reduces the confidence intervals (grade variance) for the other coders in the test, including the mid anchor (which should end up somewhere in the middle of the grading scale).

In the past 48 and 64 kbps tests most samples were difficult for me because the low anchor was too bad and the remaining scale wasn't wide enough for correctly expressing the differences between the easier and more difficult samples. I.e., the low anchor always sounded like a "telephone" and got "1". The actual contenders were considerably better, but never close to transparency, so the usable scale for the contenders was mostly from 2.0 to 3.5. Actually, even then the grade "2" was a bit too low for correctly describing the difference between the low anchor and the worst contender. At the other end of the quality scale the difference between the reference and the best contender was always significant, and anything above 4 would have been too much for the best contenders.

Of course the situation is different in a 128 kbps AAC test, but there is a danger that the two anchors will occupy the grades 1-4 and the actual contenders will get 4-5 and once again be more or less tied even though the testers actually could hear clear differences between the contenders.

QUOTE
Actually, it seems it doesn't. In my first informal evaluation, I noticed that FAAC is tuned very differently from the other AAC encoders in the test (less pre-echo, more warbling), and it seems LAME@96kb emphasizes the artifacts of the codecs under test (pre-echo, warbling on tonal sounds, etc.) better than FAAC@64. Btw, the bandwidth of LAME@96 is close enough to that of the codecs under test (around 15 kHz).

I see. I didn't actually try to do that kind of complex cross-comparison so you know more about this than I. You could have posted the explanation earlier... smile.gif


Alex B
post Apr 28 2010, 11:31
Post #291

QUOTE (muaddib @ Apr 28 2010, 10:40) *
The ITU-R five-grade impairment scale that is used runs from 1 (Very Annoying) to 5 (Imperceptible).
Bad quality would be in the range 1-2, probably closer to 1.

Oops. That's my mistake and probably Chris just repeated it. I wrote the reply a bit hastily. By default ABC/HR for Java shows five integer grades from 1 to 5 (though that is configurable).


C.R.Helmrich
post Apr 28 2010, 12:52
Post #292

QUOTE (Alex B @ Apr 28 2010, 12:06) *
Of course the situation is different in a 128 kbps AAC test, but there is a danger that the two anchors will occupy the grades 1-4 and the actual contenders will get 4-5 and once again be more or less tied even though the testers actually could hear clear differences between the contenders.

The method of statistical analysis that we will be using this time will take care of this: http://www.aes.org/e-lib/browse.cfm?elib=15021. Getting two MUSHRA-style anchors into our test (one for worst quality, one for intermediate quality, with the hidden reference for best quality) allows us to use MUSHRA-style evaluation, as described in the referenced paper.

QUOTE
I see. I didn't actually try to do that kind of complex cross-comparison so you know more about this than I. You could have posted the explanation earlier... smile.gif

Sorry, I only did these tests a few days ago smile.gif

Chris


C.R.Helmrich
post May 1 2010, 23:50
Post #293

QUOTE (The Sheep of DEATH @ Apr 27 2010, 07:20) *
What do you think about getting GXLame in as a low anchor (or even a competitor in a non-AAC test)? It's a low-bitrate MP3 encoder, so it just might fit the bill somewhere between V0-V30 (V20 averages 96kbps and defaults to 44kHz).

When I have time, I'll certainly blind-test GXLame against LAME (because I'm interested in your work). However, assuming GXLame sounds better than LAME at low bit rates, I still tend towards LAME as anchor for this test. Here's why: unlike the codecs under test, anchors are supposed to produce certain artifacts, not avoid them. smile.gif

Chris

C.R.Helmrich
post May 3 2010, 23:06
Post #294

OK, I changed my mind and will go along with Alex. The mid anchor will be a "compromised" AAC encoding at 96 kbps VBR; more precisely, one without TNS or short blocks and with a bandwidth of 15.8 kHz. It will be created with FAAC v1.28 and the following command:

CODE
faac.exe --shortctl 1 -c 15848 -q 50 -w ha_aac_test_sample_2010.wav


Decoder-wise, I'm not sure yet. Either NeroAacDec 1.5.1.0 or FAAD2 v2.7. Can someone point me to an Intel MacOS X (fat) binary of the latter?
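For reference, the decode step would look roughly like one of the two lines below (a sketch using the decoders' usual command-line switches; the file names are placeholders, and the final choice of decoder is still open):

CODE
rem FAAD2: decode the mid-anchor bit stream to WAV
faad -o ha_aac_test_mid_anchor.wav ha_aac_test_mid_anchor.m4a
rem NeroAacDec: the same decode with Nero's command-line decoder
neroAacDec -if ha_aac_test_mid_anchor.m4a -of ha_aac_test_mid_anchor.wav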

Chris


nao
post May 4 2010, 05:34
Post #295

QUOTE (C.R.Helmrich @ May 4 2010, 07:06) *
Can someone point me to an Intel MacOS X (fat) binary of the latter?

Here it is.
The Sheep of DEATH
post May 9 2010, 21:46
Post #296

QUOTE (C.R.Helmrich @ May 1 2010, 17:50) *
QUOTE (The Sheep of DEATH @ Apr 27 2010, 07:20) *
What do you think about getting GXLame in as a low anchor (or even a competitor in a non-AAC test)? It's a low-bitrate MP3 encoder, so it just might fit the bill somewhere between V0-V30 (V20 averages 96kbps and defaults to 44kHz).

When I have time, I'll certainly blind-test GXLame against LAME (because I'm interested in your work). However, assuming GXLame sounds better than LAME at low bit rates, I still tend towards LAME as anchor for this test. Here's why: unlike the codecs under test, anchors are supposed to produce certain artifacts, not avoid them. smile.gif

Chris


That's perfectly understandable. With its t4 release, I think it's actually quite competitive; I rushed to finish it in time for this test. wink.gif


C.R.Helmrich
post Jun 4 2010, 17:02
Post #297

In response to www.hydrogenaudio.org/forums/index.php?showtopic=77809:

QUOTE (C.R.Helmrich @ Jan 20 2010, 21:23)
QUOTE (muaddib @ Jan 20 2010, 14:24)

Also, it would be beneficial to create a tutorial with every single small step that a proper test must consist of.

Do you mean a tutorial for the listeners on "what the rules are" and how to proceed before and during the test? That sounds good. Will be done.

I finally found some time for this test again. I've managed to write a nearly test-methodology-independent (ABC/HR or MUSHRA) and user-interface-independent instruction sheet to guide the test participants through a test session. It's based on my own experience and adapted to this particular test with regard to anchor and hidden-reference selection and grading. I've put a draft at

www.ecodis.de/audio/guideline_high.html

A description of said "general test terminology", i.e. an explanation of terms such as anchor, item, overall quality, reference, session, stimulus, and transparency, will follow.

Everything related to listener training, i.e. how to use the test software, what kinds of artifacts to expect, and how to spot artifacts, will also be discussed separately. As mentioned, this instruction sheet is the "final one in the chain" and assumes a methodology- and terminology-informed, trained listener.

If you're an experienced listener and feel that your approach to a high-bit-rate blind test is radically different from my recommendation, please let me know about the difference.

Chris


.alexander.
post Jun 7 2010, 09:30
Post #298

QUOTE (C.R.Helmrich @ Jun 4 2010, 20:02) *
If you're an experienced listener and feel that your approach to a high-bit-rate blind test is radically different from my recommendation, please let me know about the difference.


Chris, I'm not an experienced listener at all, and my headphones are really poor. But I would love to know what people think about my method. I actually don't care about ABX probabilities but simply mux encoded and raw audio into L and R channels so that I can hear both signals simultaneously.

Also, since there is some activity related to the test, I'm wondering whether someone could reach Opticom, or already has access to OperaDigitalEar, to get Advanced PEAQ scores for the test samples.
C.R.Helmrich
post Jun 7 2010, 18:07
Post #299

QUOTE (.alexander. @ Jun 7 2010, 10:30) *
I actually don't care about ABX probabilities but simply mux encoded and raw audio into L and R channels so that I can hear both signals simultaneously.

My initial guess is that this is dangerous! You will probably hear artifacts which are inaudible if you just listen to the original and coded versions one after the other, and you might not hear certain artifacts which are clearly audible if you listen to both channels of the codec signal. Example: if the original and coded versions are slightly delayed relative to each other, you'll hear this with your approach because human hearing is very sensitive to inter-aural delay. However, if both coded channels are delayed by the same amount compared to the original two channels, this might be inaudible when you listen to both coded channels (which you should). I've never ABXed this way.

Objective quality measurements will be done, but they might not be published with the results (I don't know if I'm allowed to publish Advanced PEAQ scores; the license is owned by my employer, not by me), and especially not before the test.

Chris

.alexander.
post Jun 11 2010, 09:40
Post #300

QUOTE (C.R.Helmrich @ Jun 7 2010, 21:07) *
and you might not hear certain artifacts which are clearly audible if you listen to both channels of the codec signal.

What kind of artifacts could be missed? Excluding stereo issues, I can only imagine a very far-fetched example. Anyway,
this method can be thought of as a unit test. Here is what I usually do:


%% Load the two files to compare (mono assumed; stereo matrices would need row indexing below)
[a, fs] = wavread('sampleA.wav');
[b, fs] = wavread('sampleB.wav');

% find the relative offset (within +/-4096 samples) via cross-correlation of the downmixes
[c, i] = xcorr(sum(a,2), sum(b,2), 4096); % fftfilt in Octave
i = i(abs( c )==max(abs( c )));

% compensate the measured offset and truncate both signals to the same length
a(1: i) = [];
b(1:-i) = [];
a(length(b)+1:end) = [];
b(length(a)+1:end) = [];

%% put one signal in the left channel and the other in the right, in random order, and play
j = round(rand);
x = circshift([a(:) b(:)], [0 j]);

wavplay(x, fs, 'async')  % wavplay is Windows-only MATLAB; sound(x, fs) works elsewhere

