Document about listening test conduction, Time to share my experience
rjamorim
post Nov 20 2005, 01:23
Post #1


Rarewares admin


Group: Members
Posts: 7515
Joined: 30-September 01
From: Brazil
Member No.: 81



Hello, people.

For the last few months I have been working on a guide to help newcomers find their way around conducting listening tests. Hopefully it'll help spark interest in people who are still wondering whether to conduct their own tests.

http://www.rarewares.org/rja/ListeningTest.pdf

It's still not "officially released", so I'd like to ask you guys for suggestions on improvements and corrections, or just general comments on how you like it.

Thank you, and I hope you enjoy reading it.

Best regards,

Roberto.

This post has been edited by rjamorim: Nov 20 2005, 01:29


--------------------
Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org
rjamorim
post Nov 20 2005, 01:41
Post #2





Oops. I'm sorry, the version that was available there is outdated.

If you downloaded it already, please redownload. The current version is the correct one. Thanks.


ff123
post Nov 20 2005, 04:01
Post #3


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



QUOTE
Here is a list of places you should consider announcing your test at:
Hydrogenaudio, of course
rec.audio.opinion Usenet group
...


I once got reported to my ISP when I announced a new version of abc/hr at rec.audio.opinion, for violating the group's charter.

Announce listening tests there at your peril.

Other random thoughts:

Listener Training
It would be nice to have a section on listener training. In one of my tests, I had prospective listeners download a small training package before the main test started. Perhaps the instructions for obtaining the main test could be embedded in the training package.

sample01
A note on listener psychology: they will tend to download and listen to sample01 first, and then decide whether they want to continue based on their experience with that first sample. I know that's what I do ;-) Ideally, there would be some sort of randomizer which assigns different music to each of the samples dynamically, but that would require some way to sort the results out afterwards. Barring that, I would try to make sample01 as friendly as possible.
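The randomizer idea could be sketched roughly as follows. This is only an illustration, not a feature of ABC/HR: the function names, the listener ID and the clip names are all hypothetical. The point is that a shuffle seeded from the listener ID can be rebuilt by the organizer later, which is one way to "sort things out in the end".

```python
import hashlib
import random


def assign_samples(listener_id, clips):
    """Map slot names (sample01, sample02, ...) to clips for one listener.

    Hypothetical helper: the seed is derived from the listener ID, so the
    same ID always produces the same mapping.
    """
    seed = int(hashlib.sha256(listener_id.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = list(clips)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return {f"sample{i:02d}": clip for i, clip in enumerate(shuffled, start=1)}


def slot_to_clip(listener_id, slot, clips):
    """Rebuild the mapping when collecting results to find the real clip."""
    return assign_samples(listener_id, clips)[slot]


# Made-up clip names and listener ID, for illustration only.
clips = ["clip_a", "clip_b", "clip_c", "clip_d"]
mapping = assign_samples("listener-42", clips)
```

When a result file comes back, calling `slot_to_clip("listener-42", "sample01", clips)` tells the organizer which piece of music that listener actually rated as sample01.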

Keep the ball rolling
Try to keep the discussion thread going during the test to keep interest up.

sample durations
The samples should be about the same duration. The idea is that the average bitrate of the sample set depends on the individual sample durations as well as their difficulty -- the longer the sample, the more it affects the overall bitrate.
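A small sketch of why duration matters, assuming the overall bitrate is computed as total bits divided by total playing time. The durations and bitrates below are made up for illustration and do not come from any real test.

```python
def overall_bitrate_kbps(samples):
    """Duration-weighted average bitrate.

    samples: list of (duration_seconds, bitrate_kbps) tuples.
    """
    total_kbits = sum(duration * bitrate for duration, bitrate in samples)
    total_seconds = sum(duration for duration, _ in samples)
    return total_kbits / total_seconds


# Same three bitrates, but in the second set the 160 kbps sample is 3x longer.
equal_lengths = [(20, 96), (20, 128), (20, 160)]
one_long_sample = [(20, 96), (20, 128), (60, 160)]

print(overall_bitrate_kbps(equal_lengths))
print(overall_bitrate_kbps(one_long_sample))
```

With equal durations the overall figure is the plain mean of the three bitrates (128 kbps); stretching one sample pulls the overall figure toward that sample's bitrate (about 140.8 kbps here).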

sample bitrate distribution
For a VBR codec, I think the distribution of bitrates in the small sample set (e.g., 20 samples), if you draw a histogram of it, should resemble the distribution of a large sample set chosen from a wide variety of music.
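One possible way to check this numerically is to bin both sets of bitrates the same way and measure how far apart the two histograms are. The bin width and the distance measure (total variation distance) are assumptions picked for this sketch, not something prescribed above.

```python
from collections import Counter


def bin_fractions(bitrates_kbps, bin_width=16):
    """Fraction of samples falling into each bitrate bin of the given width."""
    counts = Counter((int(b) // bin_width) * bin_width for b in bitrates_kbps)
    total = len(bitrates_kbps)
    return {start: n / total for start, n in counts.items()}


def histogram_distance(small_set, large_set, bin_width=16):
    """Total variation distance: 0 means identical histograms, 1 means disjoint."""
    p = bin_fractions(small_set, bin_width)
    q = bin_fractions(large_set, bin_width)
    bins = set(p) | set(q)
    return 0.5 * sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in bins)
```

A value close to 0 suggests the small test set has roughly the same bitrate spread as the large reference set; a value close to 1 suggests the test set is not representative.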

ff123
ff123
post Nov 20 2005, 04:06
Post #4





Eliminate 1st second of encoder output
I seem to remember some codecs having problems during the 1st second or so of the output. The config file should start the sample such that the 1st second is not included in the listening test.
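The post suggests doing this through the test's config file. As a different route to the same effect, here is a rough sketch that drops the first second from an already decoded PCM WAV file using Python's standard `wave` module. The function name and file paths are hypothetical.

```python
import wave


def trim_leading_seconds(src_path, dst_path, seconds=1.0):
    """Copy a PCM WAV file, dropping the first `seconds` of audio."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Never skip past the end of a file shorter than `seconds`.
        skip = min(int(seconds * src.getframerate()), src.getnframes())
        src.setpos(skip)
        frames = src.readframes(src.getnframes() - skip)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # writeframes() corrects the frame count stored in the header.
        dst.writeframes(frames)
```

The same trim would have to be applied to the reference and to every decoded codec output, so that all files in a sample stay aligned.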
ErikS
post Nov 20 2005, 04:26
Post #5





Group: Members
Posts: 757
Joined: 8-October 01
Member No.: 247



QUOTE (ff123 @ Nov 20 2005, 05:06 AM)
Eliminate 1st second of encoder output
I seem to remember some codecs having problems during the 1st second or so of the output.  The config file should start the sample such that the 1st second is not included in the listening test.


Well, isn't that a flaw in the encoder which should be allowed to affect the result negatively? A more extreme analogue: Some codecs have problems during sharp attacks. One should take care to eliminate such attacks from all test samples.
rjamorim
post Nov 20 2005, 04:32
Post #6





Very big thanks for the usual awesome help, ff123 :)


ff123
post Nov 20 2005, 04:44
Post #7





QUOTE (ErikS @ Nov 19 2005, 07:26 PM)
QUOTE (ff123 @ Nov 20 2005, 05:06 AM)
Eliminate 1st second of encoder output
I seem to remember some codecs having problems during the 1st second or so of the output.  The config file should start the sample such that the 1st second is not included in the listening test.


Well, isn't that a flaw in the encoder which should be allowed to affect the result negatively? A more extreme analogue: Some codecs have problems during sharp attacks. One should take care to eliminate such attacks from all test samples.



Except that 99.5% of the time, you won't be listening to that 1st second. It's unfair to feature such a fault in every sample.
ff123
post Nov 20 2005, 04:47
Post #8





sample content
The sample should be as homogeneous as reasonably possible; otherwise the listener may have difficulty rating a codec (e.g., in the first part codec A was better, but in the last part codec B was better).
ErikS
post Nov 20 2005, 04:48
Post #9








And now that I read through the whole document, I can only say: well done. I hope it will help to bring forward a successor that will pick up the testing business again.

Just one question: what does the title you write after your name on the front page, PITA, mean? "Pain in the ass" is the first thing that comes to my mind...
rjamorim
post Nov 20 2005, 05:16
Post #10





QUOTE (ErikS @ Nov 20 2005, 01:48 AM)
Just one question: what does the title you write after your name on the front page, PITA, mean? "Pain in the ass" is the first thing that comes to my mind...


That's precisely it!

I looked at the articles you find around the web, and most of them have the name followed by fancy acronyms, like "John Doe, PhD". Since I'm no PhD or anything like that, PITA is what comes closest :P

Obviously, I will remove it from the final version...


dreamliner77
post Nov 20 2005, 07:41
Post #11





Group: Members
Posts: 2150
Joined: 29-June 02
From: Boston
Member No.: 2427



Not PITA for those that know...


--------------------
"You can fight without ever winning, but never win without a fight." Neil Peart 'Resist'
Gabriel
post Nov 20 2005, 11:20
Post #12


LAME developer


Group: Developer
Posts: 2950
Joined: 1-October 01
From: Nanterre, France
Member No.: 138



1st second removal:
This is not because of problems in codecs, but to allow realistic behaviour. In real encoding, encoders might adapt themselves to the content. In real tracks, this adaptation can happen progressively as the track starts, so the beginning of an extract from a track is not representative of how that part would be encoded inside the full-length track.
MaB_fr
post Nov 20 2005, 12:37
Post #13





Group: Members
Posts: 196
Joined: 30-October 05
Member No.: 25458



Completely off topic, but I must do it:

QUOTE
rjamorim: At low bitrates nobody is interested,
but the results are easy to obtain
rjamorim: At high bitrates everyone is interested,
but you practically can't obtain usable results
ff123: s/bitrates/beauty and s/results/fucks


This, indeed, is SYSTEMIC ! ;)

For the rest, it's very very good !

Things like the high-bitrate question always seem like a paradox to me... I always thought you have to shift from "how good this sounds" to "how big the file is", because, as you said, most people won't notice a difference...

But then it's not listening tests....

MaB_fr
Jan S.
post Nov 20 2005, 17:29
Post #14





Group: Admin
Posts: 2551
Joined: 26-September 01
From: Denmark
Member No.: 21



Here are some random thoughts and nitpicking - hopefully some of them are actually useful...
  • Mention bias perhaps - not just placebo.
  • QUOTE
    To counter the claims of the subjectivists, the objectivists created a method to reliably compare two audio signals called ABX.
    Wouldn't this sentence strictly be saying that the audio signals are called ABX and not the method?
  • QUOTE
    Still on the samples subject: avoid the obvious choice of problem
    samples (samples that trip codecs producing very nasty artifacts) like Kalifornia,
    Castanets and IDM stuff because their artifacts are easily detectable, and
    therefore less fatiguing for your listeners.
    Maybe you don't want to mention sample names that only people that have been around for years will know... dunno. Perhaps it just adds to the confusion.
  • QUOTE
    <nostalgia>
    Stuff like this makes it seem unserious IMO. I think the anecdotes should be left out if I understand what you want with this paper...
  • Add an Index perhaps
  • I think a more comprehensive explanation of what you actually do in ABX would be good: what the buttons do, and how you ABX and rate. That part was very unclear to me.
  • More Excel guidance! You can't just use the normal x,y graphs, so if you want to make it easy for people you should add steps for creating the graphs. I think this is a bigger issue than you suggest in the paper.
  • Where to get programs
  • Link to ITU doc
ff123
post Nov 20 2005, 17:39
Post #15





QUOTE
To calculate error margins, you must use ff123's statistical analysis tool
from the command prompt. Run it as:

friedman -tp resultsXX.txt

and it'll print to screen the analysis done on that results table. If you want
Friedman to save the analysis to a file, use output redirection:

friedman -tp resultsXX.txt > analysisXX.txt


Since the time you ran your early tests, I made the parametric Tukey's HSD option available on the web-based tool and made it the default:

http://ff123.net/friedman/stats.html
krabapple
post Nov 20 2005, 23:43
Post #16





Group: Members
Posts: 2454
Joined: 18-December 03
Member No.: 10538



The document could still use a bit of proofreading, e.g., I see 'conduce' used where I believe you mean 'conduct'. I'll help you out with that if you like.

This post has been edited by krabapple: Nov 20 2005, 23:44
NoXFeR
post Nov 22 2005, 02:03
Post #17





Group: Members
Posts: 84
Joined: 3-August 03
From: Trondheim, NO
Member No.: 8142



Compliments on a good document!

Suggestion: Place links and references at the end of the document. To doom9, HA, programs' homepages, etc...
rjamorim
post Dec 6 2005, 03:08
Post #18





Hello.

I would like to apologize for not producing an updated version of this document yet. I am now facing finals and papers, besides working 6 hours per day at Siemens. If that wasn't enough, I am helping Sebastian with his listening test and creating a new site design for LAME.

I guarantee you all your comments are being taken into account, and I hope to be able to release a new Work In Progress soon.

Thank you very much.

Best regards,

Roberto.


pepoluan
post Dec 8 2006, 12:38
Post #19





Group: Members
Posts: 1455
Joined: 22-November 05
From: Jakarta
Member No.: 25929



Hey Roberto, any update?

Care to put some in here:

http://wiki.hydrogenaudio.org/index.php?ti...listening_tests


--------------------
Nobody is Perfect.
I am Nobody.

http://pandu.poluan.info
rjamorim
post Dec 9 2006, 14:43
Post #20





OMG! A new version at last!


[quote name='ff123' post='343316' date='Nov 20 2005, 00:01']
I once got reported to my ISP when I announced a new version of abc/hr at rec.audio.opinion, for violating the group's charter.

Announce listening tests there at your peril.[/quote]

Added a warning there.

[quote]Other random thoughts:

Listener Training
It would be nice to have a section on listener training. In one of my tests, I had prospective listeners download a small training package before the main test started. Perhaps the instructions to acquiring the main test could be embedded in the training package.

sample01
A note on listener psychology: they will tend to download and listen to sample01 first, and then decide whether they want to continue based on their experience on that first sample. I know that's what I do ;-) Ideally, there would be some sort of randomizer which assigns different music to each of the samples dynamically, but that would require some way to sort things out in the end. Barring that, I would try to make sample01 as friendly as possible.

Keep the ball rolling
Try to keep the discussion thread going during the test to keep interest up.

sample durations
The samples should be about the same duration. The idea is that the average bitrate of the sample set depends on the individual sample durations as well as their difficulty -- the longer the sample, the more it affects the overall bitrate.

sample bitrate distribution
For a vbr codec, I think the distribution of bitrates in the small sample set (eg., 20 samples), supposing you draw a histogram of it, should resemble the distribution of a large sample set chosen from a wide variety of music.

ff123[/quote]

Added all of these. Thank you very much!

[quote name='Gabriel' post='343382' date='Nov 20 2005, 07:20']
1st second removal:
This is not because of problems in codecs, but to allow a realistic behaviour. With real encoding, encoders might be adapting themselves to content. In real tracks, this adaptation can be progressively done as the track is starting, thus the beginning of a extract from a track is not representative of the encoding of this part inside the full length sample.
[/quote]

Added it. Thanks!

[quote name='Jan S.' post='343454' date='Nov 20 2005, 13:29']
Here are some random thoughts and nitpicking - hopefully some of them are actually useful...
Mention bias perhaps - not just placebo.[/quote]

Done

[quote]Wouldn't this sentence strictly be saying that the audio signals are called ABX and not the method?[/quote]

Good point! I removed the ambiguity.

[quote]Maybe you don't want to mention sample names that only people that have been around for years will know... dunno. Perhaps it just adds to the confusion.[/quote]
Done

[quote]Stuff like this makes it seem unserious IMO. I think the anecdotes should be left out if I understand what you want with this paper...[/quote]
Bummer :P

OK, removed it :)

[quote]Add an Index perhaps[/quote]
Done

[quote]I think a more comprehensive explanation of what you actually do in the ABX would be good.. what buttons do what. How you ABX and rate. That part was very unclear to me.[/quote]

Well, I think that part belongs more in the listener training part. Remember, this document is for test conductors, not test participants.

[quote]More excel guide! You can't just use the normal x,y-graphs so if you want to make it easy for people you should add steps for the graph creation. I think this is a bigger issue than you suggest in the paper.[/quote]
Augh. Maybe later. Guiding step-by-step in Excel is quite the pain :B

[quote]Where to get programs[/quote]
Done (most of it)

[quote]Link to ITU doc[/quote]
Done

Thank you very much for all your suggestions, Jan!

Moving on...

[quote name='krabapple' post='343571' date='Nov 20 2005, 19:43']The document could still use a bit of proofreading, e.g., I see 'conduce' used where I believe you mean 'conduct'. I'll help you out with that if you like.[/quote]

Yes, please! All feedback related to grammar (and everything else, really) is welcome!

[quote name='NoXFeR' post='343998' date='Nov 21 2005, 22:03']Suggestion: Place links and references at the end of the document. To doom9, HA, programs' homepages, etc...[/quote]

Added them as footnotes.

[quote name='pepoluan' post='455495' date='Dec 8 2006, 08:38']
Hey Roberto, any update?

Care to put some in here:

http://wiki.hydrogenaudio.org/index.php?ti...listening_tests
[/quote]

To be quite honest, I'm not too fond of the idea of wikifying it. I want to keep responsibility for and authorship of this document, so that people can easily come to me if they need help. If it is wikified, both responsibility and authorship get diluted...


Anyway, the new version is already uploaded, at the same location. Please download, read and send in your comments!

I promise I'll try to respond to the comments faster this time :P

This post has been edited by rjamorim: Dec 9 2006, 14:42


rjamorim
post Dec 9 2006, 16:22
Post #21





New version up, fixed several small errors spotted by Sebastian Mares.


Gabriel
post Dec 9 2006, 16:57
Post #22





Perhaps I should document the GnuPlot way to produce graphs (way easier than with Excel or OOorg)...
rjamorim
post Dec 10 2006, 00:01
Post #23





QUOTE (Gabriel @ Dec 9 2006, 12:57)
Perhaps I should document the GnuPlot way to produce graphs (way easier than with Excel or OOorg)...


I would be very grateful :)


ff123
post Dec 10 2006, 06:08
Post #24





About running friedman.exe

"friedman -tp", which selects Tukey's parametric analysis, is statistically more "proper" than "friedman -a", which selects the ANOVA analysis with Fisher's LSD. The former corrects for multiple codec comparisons, while the latter does not.

I've also made the Tukey's HSD the default analysis on my web page.

ff123
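To see why the correction for multiple comparisons matters, here is a back-of-the-envelope sketch. It is not taken from friedman.c. It assumes every pair of codecs is tested separately at the same significance level and treats the comparisons as independent, which is only an approximation for pairwise codec comparisons.

```python
from math import comb


def pairwise_comparisons(num_codecs):
    """Number of distinct codec pairs that get compared."""
    return comb(num_codecs, 2)


def familywise_error_rate(num_codecs, alpha=0.05):
    """Chance of at least one false positive across all uncorrected pairs.

    Assumes independent comparisons, so treat the result as a rough guide.
    """
    m = pairwise_comparisons(num_codecs)
    return 1.0 - (1.0 - alpha) ** m


for k in (2, 3, 5, 8):
    print(k, pairwise_comparisons(k), round(familywise_error_rate(k), 3))
```

Under these assumptions, a test with five codecs has ten pairwise comparisons and roughly a 40% chance of at least one spurious "significant" difference at the 5% level when nothing is corrected. That is the kind of inflation a Tukey-style analysis is meant to keep in check.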
rjamorim
post Dec 10 2006, 12:42
Post #25





QUOTE (ff123 @ Dec 10 2006, 02:08)
About running friedman.exe

"friedman -tp", which selects Tukey's parametric analysis, is statistically more "proper" than "friedman -a", which selects the ANOVA analysis with Fisher's LSD. The former corrects for multiple codec comparisons, while the latter does not.

I've also made the Tukey's HSD the default analysis on my web page.

ff123


Ah, thanks. Fixed that in the document.

I have already read parts of the comments in friedman.c to try to figure out the difference between all those modes, but because of the considerable number of statistical terms (and I don't know much about statistics), I couldn't understand everything.

