Talk about ABC/HR Levels

Now that the MP3 listening test has finished (and we all thank Sebastian for the time spent on it), I would like to open a discussion about the meaning of the rankings, and specifically of the values on the ABC/HR quality scale.

I tend to see results that rate the low anchor as 1.0 and then rarely give rankings above 3.
I could understand this for a multiformat 64 kbps test, but for this MP3 test at 128 kbps VBR it is a bit beyond me.

I mean, guruboolez's results average around 3 (OK, this comes from someone well known for his listening tests), Alex_B's are a bit above guruboolez's, and finally Alexxander posted results even a bit below guruboolez's.

Add to this the comments of some participants, who wondered whether the low quality of the low anchor made the rest rate "too high", and the conclusion that my averages are much closer to the median of the whole test, if not *above* it.

So I am interested in your opinion on what is a 4 and what is a 3. (I guess 5 is a no-brainer, and 1 should be, but probably is not.)

Just for reference, the ABC/HR scale (in 0.1 steps) is:

5 -> Imperceptible
4 -> Perceptible, but not annoying
3 -> Slightly annoying
2 -> Annoying
1 -> Very annoying

(that's a total of 41 positions)
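
As a quick check of that count, here is a minimal Python sketch that enumerates the slider positions. The names `ANCHORS` and `scale_positions` are made up for illustration and are not part of any ABC/HR tool.

```python
# Anchor labels as quoted in the post; the dictionary name is illustrative only.
ANCHORS = {
    5: "Imperceptible",
    4: "Perceptible, but not annoying",
    3: "Slightly annoying",
    2: "Annoying",
    1: "Very annoying",
}


def scale_positions():
    """Return every grade from 1.0 to 5.0 in 0.1 steps."""
    # Count in integer tenths so that no floating-point error accumulates.
    return [tenths / 10 for tenths in range(10, 51)]


positions = scale_positions()
print(len(positions), positions[0], positions[-1])  # prints: 41 1.0 5.0
```

Four intervals of ten steps each give 40 steps, plus the starting point, hence 41 positions.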


My personal view on this is the following:

(5) If I have to do an ABX test to be sure of the difference, and if that ABX test is not a clear 8/8 or 16/16 (OK, 15/16 is allowed too), I cannot rate it below 4. At least not if I am being honest.
If the difference is slight, or I have difficulty ABXing it, I rate it between 4.6 and 4.9 (obviously, if I can't ABX it, I rate it 5).
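
To put numbers on the ABX scores mentioned above: assuming that a listener who hears no difference guesses each trial with probability 0.5, the chance of reaching a given score by luck is a one-sided binomial tail. A minimal Python sketch (the function name `abx_p_value` is made up here and is not taken from any ABX tool):

```python
from math import comb


def abx_p_value(correct, trials):
    """Chance of at least `correct` hits in `trials` by pure guessing (p = 0.5)."""
    # Sum the binomial tail from `correct` up to `trials`.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


for correct, trials in [(8, 8), (16, 16), (15, 16)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.6f}")
```

Under that guessing assumption, 8/8 leaves about a 0.4% chance of luck (1 in 256) and 15/16 about 0.03% (17 in 65536), which is why such scores count as clear.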

(4) For me, "perceptible but not annoying" is everything that is ABXable but that I would not recognize without an AB comparison. That, plus some types of artifacts that do not interfere with the overall sound.

For example: the noise difference of the HE-AAC codecs is perceptible but not annoying to me. If I only had the lossy encoding, I wouldn't go around saying "Mmm... I should have taken my lossless copy along instead".


(3) To rate something slightly annoying, several conditions have to be met:
- It does not require an ABX.
- It is an artifact that I could identify without an AB comparison. (Again, if it is not evident without a direct comparison, it is not annoying.)
- The artifact or problem is in itself something that I could accept as a medium defect (like we accept some noise on FM radio transmissions and some random stereo problems).


(2) Annoying is, for me, something that I would listen to for a while but then stop. The quality of MP3 at 96 kbps CBR used to be one such thing.
So the requirements for this are:
- It is identifiable without an AB comparison.
- It is of such a type that your mind can't hide the artifact, and it keeps bugging you.


(1) And so we reach "very annoying". I rated just a few of the low anchor samples at 1.0 in this test.
Very annoying is something you can't accept, straight away.
It is just too different from the real thing. I rate around this value those cases where you are more aware of the artifact than of the audible content. 1.0 is really annoying for me, not just a "you're the worst of this set" mark.


I would appreciate your opinions on this, and I guess that in the end, I am just an average listener.

Reply #1
I understand the grades in a very similar, if not identical, way as you do. This is something I will consider for the website reboot that will take place sometime when I have the time. I saw several people submit invalid results where they failed the ABX but gave the encoder a 3. People, if you cannot even ABX that sample against the reference, why give it a 3 in the first place?
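
The kind of invalid result described here could be flagged automatically. The sketch below is only an illustration under assumptions that are not from this thread: a 0.05 significance threshold for a "passed" ABX, and an ABX log available as a (correct, trials) pair. The names `abx_p_value` and `is_suspect` are hypothetical.

```python
from math import comb

ALPHA = 0.05  # assumed threshold for a "passed" ABX; not specified in the thread


def abx_p_value(correct, trials):
    """Chance of at least `correct` hits in `trials` by pure guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def is_suspect(rating, abx=None):
    """Flag a grade below 5.0 that comes with an attempted but failed ABX.

    `abx` is a hypothetical (correct, trials) pair, or None if no ABX was run.
    Grades given without any ABX attempt are not flagged by this rule.
    """
    if rating >= 5.0 or abx is None:
        return False
    correct, trials = abx
    return abx_p_value(correct, trials) > ALPHA


print(is_suspect(3.0, (5, 8)))  # failed ABX, rated 3
print(is_suspect(3.0, (8, 8)))  # clear ABX, rated 3
```

Whether a flagged grade should be discarded, or reset to 5.0, is a policy decision for the test organiser; the sketch only marks the contradiction.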

Reply #2
Wouldn't it be a good idea to require the user to successfully ABX the samples before allowing them to rate them? You'll still probably end up with a few people cheating and trying to throw the test results off, but it could only help.

Reply #3
.... or default to 5 where a successful ABX has not taken place?
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #4
A detailed description of the grades is indeed good to have.
Great start Jazz!

Reply #5
Excellent Talk [JAZ] !! This is a very complex issue and complexity depends on goals.

Before "finetuning" the rating scale I think the talk should be about the listening environment. I could easily rate lossy samples as transparent (rate 5.0) in my car, but listening to them through my Sennheiser headphones in a quiet home I might rate them between 2.0 and 3.0.

My ratings in the latest ABC/HR test were based on the level and number of distortions I detected in each sample using headphones, but I certainly would not detect most of them listening through speakers.

So there are two ways of determining definitions for the rating scale:
1.- First impose clear conditions on the listening environment and then define a rating scale, or
2.- Combine the level of transparency with listening environment parameters in the definitions.

Maybe an example of 2.- will make it clearer:

5 -> Imperceptible even on headphones
4 -> Perceptible on headphones with quiet environment
3 -> Slightly annoying on headphones but not annoying on speakers with quiet environment
2 -> Annoying on speakers with quiet environment
1 -> Very annoying on speakers even with very noisy environment

This is just an example, and everything can be shifted up and down once the top and bottom lines have been determined. Note also that in this example I consider headphones more useful than speakers for detecting problems, but for some types of artifacts it's the other way around (especially when turning up the volume).

Reply #6
IMO a listener should choose whatever environment he/she wants. And maybe the best one to use is the one that he/she usually uses for listening to music.
It is known that environment and equipment influence the level of noticed distortions, and this should be stated in the instructions for conducting a test, but it shouldn't be a restriction for participants.

Reply #7
Quote from Reply #2:
Wouldn't it be a good idea to require the user to successfully ABX the samples before allowing them to rate them? You'll still probably end up with a few people cheating and trying to throw the test results off, but it could only help.


That adds difficulty to the test and doesn't provide much better rankings, since ranked references are already a way to detect cheating.

What has been suggested in the past was the opposite: that succeeding in an ABX makes the program disable the original's slider for you, so that you can't pick it by mistake.
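
One way to read "ranked references": in ABC/HR each encoded sample sits next to a hidden copy of the reference, so a listener who grades the hidden reference below 5.0 has picked the wrong one of the pair. A minimal sketch of screening on that basis follows. The record layout (`listener`, `sample`, `reference_grade`, `codec_grade`) and the function name are assumptions for illustration, not the format of any real ABC/HR result file.

```python
def screen_results(results):
    """Split results by whether the hidden reference was left at 5.0."""
    kept, discarded = [], []
    for result in results:
        # A reference graded below 5.0 means the listener mistook it for the codec.
        if result["reference_grade"] < 5.0:
            discarded.append(result)
        else:
            kept.append(result)
    return kept, discarded


# Made-up entries for illustration only; these are not results from any real test.
example = [
    {"listener": "A", "sample": 1, "reference_grade": 5.0, "codec_grade": 3.2},
    {"listener": "B", "sample": 1, "reference_grade": 4.0, "codec_grade": 5.0},
]
kept, discarded = screen_results(example)
print(len(kept), len(discarded))
```

Disabling the reference slider after a successful ABX, as suggested above, would prevent the second kind of entry from being submitted at all.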


Quote from Reply #5:
Before "finetuning" the rating scale I think the talk should be about the listening environment. I could easily rate lossy samples as transparent (rate 5.0) in my car, but listening to them through my Sennheiser headphones in a quiet home I might rate them between 2.0 and 3.0.

5 -> Imperceptible even on headphones
4 -> Perceptible on headphones with quiet environment
3 -> Slightly annoying on headphones but not annoying on speakers with quiet environment
2 -> Annoying on speakers with quiet environment
1 -> Very annoying on speakers even with very noisy environment



Mmmm... there are two topics in that passage. One is the usual talk about the equipment that users should use when doing the ABX tests.
It has been agreed that headphones are a lot better for this purpose than speakers, since they reduce the environmental noise that you hear and put more detail directly into the ear. Other than that, the requirements on equipment shouldn't go much further than this, since the user's own equipment usually reflects what that user can expect in practice.

The other is the meaning of ratings, which really is on topic. Your suggestion doesn't look bad to me, but that is possibly more difficult to quantify (would the user need to listen to both, to be sure?).

On the other hand, you've sort of nailed the point. That scale is possibly too subjective, since not only might the meanings differ, but those meanings are also tied to the user's intended listening environment.

Being "slightly annoying" does not mean the same to someone listening to classical music, relaxed on his sofa, as to someone listening to top 40 music on his hi-fi speakers.

Settling this second point could possibly resolve the discrepancies that made me start this topic.



[Edit: a couple of spelling errors]