PlayerABX Tool - Foobar2000 vs XMPlay, ABX GUI Tool that silently plays songs in two players
Poll: Does XMPlay or Foobar2000 give better quality sound?
Total Votes: 24
robertcollier4
post Feb 10 2013, 09:02
Post #1





Group: Members
Posts: 33
Joined: 25-November 12
Member No.: 104754



This tool allows double-blind playback of the same songs in two different players. Please participate in this double-blind ABX listening test comparing Foobar2000 1.2.2 and XMPlay 3.7. You choose 10 audio files in the interface, listen to each file in the two anonymized players, and then make a radio-button selection as to which sounded better to you.

The tool was written in AutoHotKey Classic, so you can view the code yourself to see what it does. If you have AutoHotKey installed you can run the .AHK file; otherwise you can run the compiled .EXE file.

XMPlay 3.7 and Foobar2000 1.2.2 are included in subfolders. Both programs are launched to the tray and produce no popups when playing songs via this tool, so as to maintain double-blind conditions. From the authors' original files, the following changes have been made to support double-blind operation:
1. XMPlay.ini is modified to add "NoReg=1" (portable mode), "TitleTray=0" (disables tray bubble notifications for blind playback), and "AutoAmp=0" (disables ReplayGain).
2. The XMPlay DirectSound plug-in has been installed to match Foobar2000's default DirectSound output.
3. Foobar2000's Preferences > Shell Integration > "Bring to front when adding new files" has been unticked.

Please post your results. This tool can also easily be modified to perform double-blind tests between other players. My goal was to find the best-sounding audio player for Windows, and according to my tests I believe XMPlay is it. If you believe there is a better-sounding audio player than XMPlay, the tool can be modified (the .AHK source is included) for that player as well. Please post your results and let's verify what the best-sounding audio player for Windows is.

Download: PlayerABX-Foobar-vs-XMPlay.zip (3.25MB)



This post has been edited by robertcollier4: Feb 10 2013, 09:58
Kohlrabi
post Feb 10 2013, 09:24
Post #2





Group: Super Moderator
Posts: 1017
Joined: 12-March 05
From: Kiel, Germany
Member No.: 20561



  1. You are violating the foobar2000 license by distributing foobar2000 in non-installer form.
  2. ABX is not meant to determine a preference, but to show whether you can spot a difference.
  3. The players play at (easily perceivable) different volume, so telling them apart is dead easy. I disabled ReplayGain processing in foobar2000, now they seem to play at the same volume.
  4. The conclusion is wrong, I just ticked 4 vs. 6, and it still said "$foo sounds better in perceptual quality", which cannot be said conclusively.

What's fascinating to me is that you can build a simple GUI just using AutoHotKey. :)

This post has been edited by Kohlrabi: Feb 10 2013, 09:33


--------------------
PRaT is the new jitter.
robertcollier4
post Feb 10 2013, 10:30
Post #3





Group: Members
Posts: 33
Joined: 25-November 12
Member No.: 104754



QUOTE (Kohlrabi @ Feb 10 2013, 01:24) *
  1. You are violating the foobar2000 license by distributing foobar2000 in non-installer form.
  2. ABX is not meant to determine a preference, but to show whether you can spot a difference.
  3. The players play at (easily perceivable) different volume, so telling them apart is dead easy. I disabled ReplayGain processing in foobar2000, now they seem to play at the same volume.
  4. The conclusion is wrong, I just ticked 4 vs. 6, and it still said "$foo sounds better in perceptual quality", which cannot be said conclusively.

What's fascinating to me is that you can build a simple GUI just using AutoHotKey. :)


Okay, thank you for the feedback, @Kohlrabi.

Point 1 and Point 3: I have removed the pre-installed and pre-configured Foobar2000 and replaced it with the original author's installer EXE, along with instructions to disable "Bring to front when adding new files" and to disable ReplayGain.
Point 2: Okay, so if ABX is not the right name, then is "Double-blind comparison test" a better one? Could a moderator change the thread title accordingly, since I can no longer edit the original post after a reply?
Point 4: Okay, I have removed the "conclusions" section; the tool now just reports the number of times each "better" radio button was clicked. It no longer asserts any conclusions, since you are right that the sample size has to be appropriate.
robertcollier4
post Feb 11 2013, 20:06
Post #4





Group: Members
Posts: 33
Joined: 25-November 12
Member No.: 104754



The tool has been renamed as PlayerDBT since DBT (Double Blind Test) is the correct acronym. Foobar2000 is now included in installer form only.

I think this is a useful tool, and it can easily be modified to do a double-blind comparison of any two programs. Source is included in playerdbt.ahk under a free "do whatever you want" license.

Download: PlayerDBT-Foobar-vs-XMPlay.zip


This post has been edited by robertcollier4: Feb 11 2013, 20:25
greynol
post Feb 11 2013, 20:36
Post #5





Group: Super Moderator
Posts: 10000
Joined: 1-April 04
From: San Francisco
Member No.: 13167



X was better in P Tests (P/(P+Q)*100%)
Y was better in Q Tests (Q/(P+Q)*100%)

Let's assume that X and Y are the same player instead of different players. Now each "Test" is the equivalent of a coin toss. Does choosing heads over tails make heads really better?

This is along the lines of Kohlrabi's point which is still valid criticism even after your fix. Your results still assert that one was better than another for any given "Test". If you want to show that one player was better than another for any given test, the test needs to have multiple trials and needs to calculate the probability of guessing, similar to an ABX test.
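The coin-toss objection can be made concrete with a quick calculation. The following is a hypothetical Python sketch (not part of PlayerDBT itself) of the binomial probability that pure guessing produces a given number of "better" picks:

```python
from math import comb

def guess_probability(successes: int, trials: int) -> float:
    """Chance of at least `successes` matching picks out of `trials`
    when each pick is a fair coin toss (probability 1/2)."""
    return sum(comb(trials, k) for k in range(successes, trials + 1)) / 2 ** trials

# Picking one player as "better" in 6 of 10 tests happens by
# pure guessing about 38% of the time -- nowhere near significant.
print(round(guess_probability(6, 10), 3))  # 0.377
```

This is why raw counts like "6 vs. 4" cannot support a conclusion on their own.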


--------------------
Concern trolls: not a myth.
db1989
post Feb 11 2013, 20:42
Post #6





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



QUOTE (robertcollier4 @ Feb 10 2013, 08:02) *
Please participate in this Double-blind ABX listening test to compare Foobar2000 1.2.2 vs XMPlay 3.7. You must choose 10 audio files in the interface - listen to each audio file in two anonymous players - and then make a radio selection as to which sounded better to you.
Please provide a defensible explanation of why any audible difference should be expected, never mind possible.
robertcollier4
post Feb 11 2013, 20:56
Post #7





Group: Members
Posts: 33
Joined: 25-November 12
Member No.: 104754



QUOTE (db1989 @ Feb 11 2013, 12:42) *
Please provide a defensible explanation of why any audible difference should be expected, never mind possible.

Because Foobar2000 uses the FFmpeg decoding library and XMPlay uses the BASS decoding library. The two libraries could differ in how they deal with floating point arithmetic, etc. I wanted to find a way to double-blind test myself in comparing the two after Foobar2000 announced that it is switching from the mpg123 library to the FFmpeg library.

I was curious whether I would be able to find a difference in player quality myself - and I didn't see any existing tool or GUI like this - so I decided to write it. Users can also modify the .AHK file to compare other players or other configs. The tool just randomizes the X and Y launches of the two EXEs with the selected files as parameters.
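The randomization the tool performs can be sketched in Python (the actual tool is written in AutoHotKey; the paths and function names below are hypothetical):

```python
import random
import subprocess

# Hypothetical paths -- the real tool ships the players in subfolders.
PLAYERS = [r"foobar2000\foobar2000.exe", r"xmplay\xmplay.exe"]

def assign_blind_labels(rng: random.Random) -> dict:
    """Randomly map the two player EXEs to the anonymous labels X and Y,
    so the listener does not know which player is behind which button."""
    shuffled = PLAYERS[:]
    rng.shuffle(shuffled)
    return {"X": shuffled[0], "Y": shuffled[1]}

def play(label: str, mapping: dict, audio_file: str) -> None:
    """Launch the player hidden behind `label` with the chosen file."""
    subprocess.Popen([mapping[label], audio_file])
```

Re-randomizing the mapping on every load is what keeps the test blind across trials.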

The tool just reports what the user clicks. I will leave it to people to make their own statistical conclusions, since this is a qualitative, opinion-based comparison.

This post has been edited by robertcollier4: Feb 11 2013, 20:58
greynol
post Feb 11 2013, 20:58
Post #8





Group: Super Moderator
Posts: 10000
Joined: 1-April 04
From: San Francisco
Member No.: 13167



You need to demonstrate that there is a difference between expressing an opinion and simply guessing before you can even begin to suggest this ridiculousness is somehow qualitative.

EDIT: Thank you for taking my suggestion seriously. My apologies for piling on. I do question whether there should be an open poll until you have this issue worked out, however.


This post has been edited by greynol: Feb 11 2013, 21:09


--------------------
Concern trolls: not a myth.
robertcollier4
post Feb 11 2013, 21:05
Post #9





Group: Members
Posts: 33
Joined: 25-November 12
Member No.: 104754



QUOTE (greynol @ Feb 11 2013, 12:36) *
If you want to show that one player was better than another for any given test, the test needs to have multiple trials and needs to calculate the probability of guessing, similar to an ABX test.

Makes sense. A person can run the program multiple times and run the 10-file test multiple times. Each time the tool is loaded - the X,Y selections are randomized anew. The user can see if he/she can achieve the same results in multiple trials of running the tool.
db1989
post Feb 11 2013, 21:30
Post #10





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



QUOTE (robertcollier4 @ Feb 11 2013, 19:56) *
Because Foobar2000 uses the FFmpeg decoding library and XMPlay uses the BASS decoding library. The two libraries could differ in how they deal with floating point arithmetic, etc. I wanted to find a way to double-blind test myself in comparing the two after Foobar2000 announced that it is switching from the mpg123 library to the FFmpeg library.
More reasonable, so thanks for elaborating, even if a difference is still unlikely. ;)

QUOTE (robertcollier4 @ Feb 11 2013, 19:56) *
The tool just reports what the user clicks. I will leave it to people to make their own statistical conclusions, since this is a qualitative, opinion-based comparison.
QUOTE (robertcollier4 @ Feb 11 2013, 20:05) *
A person can run the program multiple times and run the 10-file test multiple times. Each time the tool is loaded - the X,Y selections are randomized anew. The user can see if he/she can achieve the same results in multiple trials of running the tool.
However, I still have reservations about this method. We promote statistical tests for a reason. It’s not hard to imagine someone downloading this, getting some result that may be purely stochastic, and concluding wrongly on the basis of numbers that are unprocessed and not qualified with an associated probability or any other aid to interpretation. Does that make you responsible for whatever they do with that possibly erroneous conclusion? Not quite, but it doesn’t make the program fit very well with our established practices.

Of course, that’s fairly unlikely to happen to regular users, but I have concerns about the program being distributed through Hydrogenaudio and possibly (via the site’s reputation) granting an unwarranted degree of legitimacy to statistically unconvincing results. I grant that I might be being overly fanciful with these hypothetical possibilities, but there they are regardless.

This post has been edited by db1989: Feb 11 2013, 21:33
robertcollier4
post Feb 11 2013, 22:18
Post #11





Group: Members
Posts: 33
Joined: 25-November 12
Member No.: 104754



QUOTE (db1989 @ Feb 11 2013, 13:30) *
We promote statistical tests for a reason. It’s not hard to imagine someone downloading this, getting some result that may be purely stochastic, and concluding wrongly on the basis of numbers that are unprocessed and not qualified with an associated probability or any other aid to interpretation... Of course, that’s fairly unlikely to happen to regular users, but I have concerns about the program being distributed through Hydrogenaudio and possibly (via the site’s reputation) granting an unwarranted degree of legitimacy to statistically unconvincing results.

I appreciate the rigor that this site enforces. I have added instructions in the README.txt file that the user should perform 16 trials.

INSTRUCTIONS FOR USE
If you want to try to achieve statistically valid results, you are asked to reload this tool and run the 10-file test 16 times, to see whether you can achieve repeatable results that separate your choices from pure guessing. For more information on achieving statistically valid results, see: http://wiki.hydrogenaudio.org/index.php?title=ABX
1. Run playerdbt.exe to begin the listening test.
2. If this is the first trial and no songs are loaded, click the "..." button to choose a file for each of your 10 test files.
3. For each song, start and stop the X and Y players. Select the radio button for whichever sounds better and more accurate to you.
4. Click "Show Results" when you are done making your selection for each of the 10 test songs.
5. Copy the results output to a text document, close PlayerDBT.
6. [Repeat Steps 1,3,4,5] 15 more times until you have done a total of 16 trials.
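For reference, one can compute how many of the 16 trials would have to favour the same player before guessing becomes an unlikely explanation. A hypothetical Python sketch (not part of the tool) under the usual 5% significance convention:

```python
from math import comb

def min_consistent_trials(trials: int = 16, alpha: float = 0.05) -> int:
    """Smallest number of trials favouring the same player that a pure
    guesser would reach with probability below `alpha` (one-sided)."""
    for k in range(trials + 1):
        tail = sum(comb(trials, j) for j in range(k, trials + 1)) / 2 ** trials
        if tail < alpha:
            return k
    return trials + 1

# The same player must win 12 or more of the 16 trials before
# the result beats chance at the 5% level.
print(min_consistent_trials())  # 12
```

This is the kind of calculation the tool itself could report, as suggested above in the thread.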

This post has been edited by robertcollier4: Feb 11 2013, 22:26
greynol
post Feb 11 2013, 22:35
Post #12





Group: Super Moderator
Posts: 10000
Joined: 1-April 04
From: San Francisco
Member No.: 13167



I think a far better solution would be to incorporate that within the test and have the software provide the statistical results.

Why force the user to do all the heavy lifting when this is something a computer can do quite easily?

This post has been edited by greynol: Feb 11 2013, 23:14


--------------------
Concern trolls: not a myth.
db1989
post Feb 11 2013, 22:40
Post #13





Group: Super Moderator
Posts: 5275
Joined: 23-June 06
Member No.: 32180



This reveals another problem: users are to choose the files that, to them, sound “better and more accurate”. Not only are those two qualities vague: they also have no obligation to correlate with each other.

Moreover, if a user is to take seriously the invitation to test accuracy, how are they to determine that? A measure of accuracy implies a fixed reference (control) to which the item being tested is compared. What is the control in this case? In an ABX test, it would be A, but no such thing exists here. [edit] lol wut, it must have been much too late/early when I wrote that. What I was actually thinking of was something like ABC/HR, since ABX compares only two sources, not two compressed sources vs. lossless, as would be the relevant type of test here. [/edit]

I think this is why decoders are generally compared using mathematical methods, which tend to reveal no differences close to any threshold of perceptibility anyway. Assuming proper implementation, I see no reason why the two decoders that are relevant here should be any different. In that case, promoting perceptual testing not only is unnecessary but also risks drawing people into false conclusions due to a lack of statistical controls, especially if such people may already have pre-/mis-conceptions about supposed differences.

I’m not meaning this personally or doing it just for the sake of arguing. I just think that the methodology is too unreliable, especially for a test that is most likely not necessary in the first place.

This post has been edited by db1989: Feb 12 2013, 06:38