Public Listening Test [2010]

Reply #200 – 2010-02-15 01:14:22

Quote from: Alex B

In general, the only really correct and fair way to simulate a real life usage situation would be to encode the complete original source tracks, decode the encoded tracks, cut the test samples from the decoded tracks, and store the samples in a lossless format. This has been discussed in the past, but it has never been a viable option for various practical reasons.

If a person is providing a sample, have them follow that procedure before uploading. Or at least make sure they understand that the first 1-2 seconds will be chopped off, so they may decide to provide a different range. A lot of work and many variables, though.

Quote from: Alex B

Regardless of the encoder specific behavior, I think that if you want to test a certain very short passage you should simply encode a sample that starts 2 seconds earlier. If the original source track actually starts with the critical passage that is intended to be tested you can always configure that sample to start from the beginning. It would be fair for all encoders. You would then be testing how the encoders can handle a track that starts with such content.

Gabriel was in favor of cutting 1-2 seconds, but he also warned about a "scene cut": the 1-2 seconds being cut need to be representative of the following material, so that the codec has already adjusted by the time the listening sample begins. But what if the critical passage comes directly after a scene change and is short? I say this is the same situation as a track that starts with a critical passage.

Trimming seconds seems to cause as many problems as it tries to solve. That would be a no-go in an engineering process.
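Alex B's suggestion quoted above (start the encoded excerpt about 2 seconds before the passage to be tested, or at the track start when the passage opens the track) comes down to a small calculation. A minimal sketch; the helper name `preroll_window` is hypothetical, and it assumes the decoder already removes encoder delay and padding, so the cut point in the decoded file lines up with the source timeline.

```python
def preroll_window(passage_start_s: float, preroll_s: float = 2.0) -> tuple[float, float]:
    """Return (encode_start_s, trim_after_decode_s) for one test excerpt.

    encode_start_s: where the excerpt fed to the encoder should begin.
    trim_after_decode_s: how much to cut from the start of the decoded file
    so that the listening sample begins at the intended passage.
    The 2-second default comes from the thread (an assumption, not a spec);
    encoder delay/padding is ignored here.
    """
    if passage_start_s < 0 or preroll_s < 0:
        raise ValueError("times must be non-negative")
    # Never start before the beginning of the track.
    encode_start_s = max(0.0, passage_start_s - preroll_s)
    trim_after_decode_s = passage_start_s - encode_start_s
    return encode_start_s, trim_after_decode_s


# A passage 45 s into a track: encode from 43 s, trim 2 s after decoding.
print(preroll_window(45.0))  # (43.0, 2.0)
# A passage at the very start of a track: no lead-in is available.
print(preroll_window(0.0))   # (0.0, 0.0)
```

When the passage opens the track, nothing is trimmed, which matches the point that such a sample simply tests how encoders handle a track starting with that content.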

Reply #201 – 2010-02-15 08:09:06

You know, it just amazes me how HA people can ignore the actual bitrates of the samples.

THE GREEN LINE IS 25KBPS ABOVE THE BLUE LINE!

And there is no 128 kbps level for the CBR encoders at all. For what particular reason should the listening test include samples produced by the same encoder with a 25 kbps gap? Each frame of the green stream has more bits than the blue one, and both were produced using the same intra-frame routines.

And I can see no difference between the bitrate overhead caused by "constant quality" and the overhead related to the initial bit reservoir state. What is the difference?

Quote from: C.R.Helmrich on 2010-02-13 16:19:10

I think the best thing would be to concatenate all test samples into a single file (it seems this is commonly done while standardizing MPEG coders), and then to prepend a few seconds of noise to the beginning of that file so that the first sample is encoded with a half-empty bit reservoir state.

This should also make it easier to enumerate encoder settings in search of equal bitrates.
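The concatenation step of C.R. Helmrich's proposal can be sketched as follows, assuming mono floating-point PCM already loaded into Python lists. The function name, the 3-second default and the uniform-noise level are illustrative assumptions; whether a given noise burst really leaves a particular encoder's reservoir half empty is not checked here. The returned offsets are what you would later use to cut each sample back out of the decoded file.

```python
import random


def concatenate_with_noise_prefix(samples, noise_seconds=3.0, sample_rate=44100,
                                  noise_amplitude=0.1, seed=0):
    """Join mono test samples (lists of floats in [-1, 1]) into one signal.

    A burst of noise is prepended so the first test sample is not encoded
    from a completely fresh encoder state. Returns the joined signal and the
    (start, end) sample offsets of each test sample within it.
    """
    rng = random.Random(seed)
    n_noise = int(noise_seconds * sample_rate)
    # Uniform white noise at a modest level (level is an assumption).
    signal = [rng.uniform(-noise_amplitude, noise_amplitude) for _ in range(n_noise)]
    offsets = []
    for sample in samples:
        start = len(signal)
        signal.extend(sample)
        offsets.append((start, len(signal)))
    return signal, offsets


sig, offs = concatenate_with_noise_prefix([[0.0] * 10, [0.5] * 5],
                                          noise_seconds=0.5, sample_rate=4)
print(offs)  # [(2, 12), (12, 17)]
```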

Reply #202 – 2010-02-15 09:21:36

.alexander.,

I think many readers may have difficulties in understanding the graphs.

When you posted your first graph, I wasn't quite sure what exactly was looped and measured, and how it was done. After seeing your second graph and the related replies, I think I got it right. Could you confirm whether the following is accurate:

The graph shows the bitrates of 100 instances of the 7-second emese sample, which were put together and encoded as a single AAC audio file. The duration of this file is about 700 seconds. After encoding, the file was split into individual 7-second AAC segments that were muxed into individual MP4 files. Each dot shows the overall bitrate of one of the resulting 100 MP4 files, in the same order as the individual segments appeared inside the single encoded file. The graph provides no information about how the bitrate varies inside any of these 7-second segments.
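The same per-repetition numbers can be approximated straight from the encoded frame sizes, without re-muxing each segment into MP4. A sketch under stated assumptions: the per-frame sizes have already been extracted by some frame-dumping tool (not shown), the stream is AAC-LC with 1024 samples per frame, and each repetition is rounded to a whole number of frames, which real loops generally are not. Because container overhead is excluded, the results will come out slightly below file-size-based MP4 figures.

```python
def segment_bitrates_kbps(frame_bytes, frames_per_segment,
                          sample_rate=44100, samples_per_frame=1024):
    """Average bitrate (kbps) of consecutive fixed-length segments.

    frame_bytes: size in bytes of each encoded AAC frame, in stream order.
    frames_per_segment: number of frames assumed to make up one repetition.
    Each AAC-LC frame lasts samples_per_frame / sample_rate seconds.
    A trailing partial segment is ignored.
    """
    frame_s = samples_per_frame / sample_rate
    rates = []
    last_start = len(frame_bytes) - frames_per_segment
    for i in range(0, last_start + 1, frames_per_segment):
        seg = frame_bytes[i:i + frames_per_segment]
        bits = 8 * sum(seg)
        rates.append(bits / (len(seg) * frame_s) / 1000)
    return rates
```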

Reply #203 – 2010-02-15 09:55:30

Thank you, Alex, your explanation is correct and very clear. Just one minor remark: I trimmed the silence at the beginning and the end of emese, so the output files are about 6 seconds long rather than 7.

Reply #204 – 2010-02-15 11:33:11

I'm using the modified faad version I mentioned earlier to check the bitrate variation on a looped encoding of emese (20 iterations). Using a different tool, I can confirm what .alexander. found.
Here's the average bitrate for each individual loop:

#1    150.04
#2    138.87
#3    138.09
#4    153.44
#5    141.49
#6    139.85
#7    139.43
#8    137.74
#9    147.72
#10   146.37
#11   139.38
#12   138.96
#13   137.36
#14   140.22
#15   144.19
#16   139.70
#17   139.57
#18   137.98
#19   143.13
#20   153.29

minimum = 137.36 kbps (13th loop)
maximum = 153.44 kbps (4th loop)
average = 142.84 kbps
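The summary lines can be cross-checked against the per-loop table with Python's standard `statistics` module. Note that the plain mean of the 20 listed values comes out at about 142.34 kbps rather than the 142.84 kbps quoted above; the post does not say how its overall average was obtained (it may, for instance, have been measured over the whole file), so treat this only as a consistency check.

```python
from statistics import mean

# Per-loop average bitrates (kbps) from the table above, loops #1..#20.
loop_kbps = [150.04, 138.87, 138.09, 153.44, 141.49, 139.85, 139.43, 137.74,
             147.72, 146.37, 139.38, 138.96, 137.36, 140.22, 144.19, 139.70,
             139.57, 137.98, 143.13, 153.29]

lo = min(loop_kbps)
hi = max(loop_kbps)
print(f"minimum = {lo:.2f} kbps (loop #{loop_kbps.index(lo) + 1})")  # minimum = 137.36 kbps (loop #13)
print(f"maximum = {hi:.2f} kbps (loop #{loop_kbps.index(hi) + 1})")  # maximum = 153.44 kbps (loop #4)
print(f"mean of loops = {mean(loop_kbps):.2f} kbps")                 # mean of loops = 142.34 kbps
print(f"spread = {hi - lo:.2f} kbps")                                # spread = 16.08 kbps
```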

And now the full bitrate graph of the 20 iterations of emese (6.00 seconds x 20 = 2 minutes of audio material):

Reply #205 – 2010-02-15 13:17:54

Aren't the bitrate variations being overrated lately?

We are not just testing one sample but many.
What's the probability of hitting one of the extreme rates? On average it is going to be... the average.
The encoder parameters for the test have not been deduced from single track encodings (the variations would matter greatly then) but from >30GB general music collection averages. Since every encoder is going to have about the same reservoir state for each sample, where is the problem?

If we had wanted to force each encoder as close to 128 kbit/s as possible for each sample, the variations would be a problem. But that's not the usual use case with VBR anyway. VBR Q values aim at whole-collection averages, and that's what we are going to use. It's a VBR encoder's job to scale up for problematic content (and compensate with simple content). If it isn't doing that job properly, this test is going to show it.
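The "collection average" used to pick the Q values is total encoded bits divided by total playing time, so long tracks weigh more than short ones, and an individual excerpt can sit well above or below that figure. A sketch with made-up track sizes and durations (hypothetical, for illustration only):

```python
def collection_average_kbps(tracks):
    """Duration-weighted average bitrate of a collection.

    tracks: iterable of (encoded_size_bytes, duration_seconds) pairs.
    This is total bits / total seconds, not the plain mean of the
    per-track bitrates.
    """
    total_bits = 0
    total_seconds = 0.0
    for size_bytes, seconds in tracks:
        total_bits += 8 * size_bytes
        total_seconds += seconds
    return total_bits / total_seconds / 1000


# Hypothetical collection: a long, easy track (96 kbps) and a short,
# demanding one (144 kbps). The plain mean of the two would be 120 kbps.
tracks = [(4_800_000, 400.0), (1_800_000, 100.0)]
print(round(collection_average_kbps(tracks), 1))  # 105.6
```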

Reply #206 – 2010-02-15 14:07:27

I think iTunes CVBR is doing something funny. The "funniness" may not be limited to the emese sample.

It behaves more like an ABR setting that uses a very large "ABR frame", at least several seconds, perhaps tens of seconds. In that case encoding any short samples will produce inconsistent, quite arbitrary results.

EDIT

As I said before, the issue could be avoided completely by encoding the complete original tracks and cutting the test samples from the decoded files. Then the encoded passages would be exactly correct, and there would be no reason to investigate whether the samples' short durations have any effect on the encoded data.
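The last step of that workflow (cutting the test excerpt out of the decoded full track) can be done losslessly with Python's standard `wave` module, assuming the decoder writes PCM WAV. A sketch; `cut_wav` is a hypothetical helper, and it assumes a gapless-aware decoder, since one that leaves encoder delay in place will shift every cut point slightly.

```python
import wave


def cut_wav(src_path, dst_path, start_s, duration_s):
    """Copy [start_s, start_s + duration_s) of a decoded PCM WAV to a new file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        start_frame = int(round(start_s * rate))
        n_frames = int(round(duration_s * rate))
        if start_frame + n_frames > src.getnframes():
            raise ValueError("requested excerpt extends past the end of the file")
        src.setpos(start_frame)          # seek to the first frame of the excerpt
        data = src.readframes(n_frames)  # raw PCM bytes, copied unchanged
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)            # same channels, width and rate
        dst.writeframes(data)            # frame count in the header is fixed up on close
```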

Reply #207 – 2010-02-15 14:19:15

If the complete versions of every track can be organized, that would be fine. Else C. R. Helmrich's proposal should also work out well.

Reply #208 – 2010-02-15 14:42:07

Quote from: rpp3po on 2010-02-15 14:19:15

... Else C. R. Helmrich's proposal should also work out well.

Combining the samples into a stream that is encoded as a single file would not change the possibly inconsistent behavior. For instance, the passage that contains the emese sample might be quite different from a passage that is cut from a separately encoded complete emese track.

At least we would need to know why the encoder behaves like it does, so that the issue could be reliably reproduced with a few different audio samples and its severity considered.

Reply #209 – 2010-02-15 15:53:21

Yes, if CVBR is behaving as if it had a several second memory, that might really screw results in comparison to encoders having a shorter attention span, when that memory is filled with data from a preceding, uncorrelated track.

Reply #210 – 2010-02-15 20:33:35

It seems iTunes CVBR in its latest version does bitrate handling in a much weirder way than I thought. The plot for the previous version which alexander posted is much more in line with what I expected. The problem is that many of the samples I'd like to see in the test are actually the beginning of songs ("Since Always" and Fatboy Slim's "Kalifornia", for example). Although I like Alex B's suggestion of encoding entire tracks and extracting relevant parts (or at least encoding from the beginning of the track up to the relevant passage), I'm afraid there will not always be much leading music to cut off.

I think it's time to focus on the item selection.

Chris

P.S.: It's nice to see, though, that there are multiple ways to monitor a file's bitrate usage over time. We could make use of that when encoding the files for the test. If some strange up-down jumps like in the above plot occur, we can decide on a per-item basis on how to proceed.
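The per-item check from the P.S. could be partly automated: compute per-segment bitrates with any of the monitoring methods discussed, then flag segments that stray too far from the typical value. A sketch; `flag_irregular_segments`, the median reference and the 5 % tolerance are assumptions, not values anyone in the thread agreed on.

```python
from statistics import median


def flag_irregular_segments(segment_kbps, tolerance=0.05):
    """Return indices of segments whose bitrate deviates from the median
    by more than `tolerance` (a fraction, e.g. 0.05 = 5 %)."""
    ref = median(segment_kbps)
    return [i for i, rate in enumerate(segment_kbps)
            if abs(rate - ref) > tolerance * ref]


# Made-up example: the last segment jumps well above the others.
print(flag_irregular_segments([100.0, 101.0, 99.5, 120.0]))  # [3]
```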

Reply #211 – 2010-02-15 20:50:08

I have thought about all of this again. We are using collection-based presets anyway. Because of that, we aren't that interested in actual bitrates, but more in what quality is delivered for a specific track at a specific Q value representing a 128 kbit/s broad collection average. So if the bitrate behaves strangely with an empty reservoir at the beginning of a track, it would also do so in a real-life encoding of that track. And if an encoder messes up when an input starts the action without foreplay, it is the encoder's fault and should not be fixed manually. So encoding full-length tracks and then cutting out the relevant sections, but not more, would be as close to real life as it can get.

Reply #212 – 2010-02-15 21:12:11

Quote from: C.R.Helmrich on 2010-02-15 20:33:35

It seems iTunes CVBR in its latest version does bitrate handling in a much weirder way than I thought. The plot for the previous version which alexander posted is much more in line with what I expected...

I agree that the first plot looks normal. Unless the Apple developers have created a new innovative system that actually works fine in real life situations, the behavior in the second plot may also be caused by a bug in the new version.

Last time, when a listening test was being prepared, we found a serious bug in Apple's MP3 encoder. The bug had existed for years.

The Apple developers should be informed about this issue so that they can check the changed behavior.

Reply #213 – 2010-02-19 23:07:02

Quote from: Alex B on 2010-02-06 21:55:57

Summary:

DivX -v 4 (kbps)
Various   123.72
Classical   105.52
Average   114.62

DivX -v 5 (kbps)
Various   147.56
Classical   125.00
Average   136.28

I think it's not quite correct to calculate the average between Various and Classical.
Some codecs produce a low bitrate on quiet music, while other codecs do so on loud music.
The opposite of classical music would be loud metal.
It would be more correct to calculate the average over loud (metal & hard rock), various (medium loudness) and classical (quiet music).

DivX's setting is CBR.
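The averaging proposed above is an unweighted mean over genre groups, so adding a "loud" group changes the result without any group dominating. A sketch; `group_average` is hypothetical, and since no metal/hard-rock figures are given in the thread, the example only reproduces the two-group averages quoted above.

```python
def group_average(group_kbps):
    """Unweighted mean of per-group average bitrates (e.g. quiet / medium / loud)."""
    if not group_kbps:
        raise ValueError("need at least one group")
    return sum(group_kbps.values()) / len(group_kbps)


# Two-group figures for DivX -v 4 as quoted above; a third "Loud" entry
# would simply be added to the dictionary.
print(round(group_average({"Various": 123.72, "Classical": 105.52}), 2))  # 114.62
```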

Reply #214 – 2010-03-06 01:43:16

Apple's developer has been informed about the possible issue with iterations.

.alexander.,
I have tried a few samples, and only emese shows the iteration anomaly. Confirmation would be good.

Reply #215 – 2010-03-09 18:40:22

Quote from: .alexander. on 2010-02-12 11:30:21

The CVBR looks different since I had to update QuickTime to use qtaacenc.

Note that this is the bitrate distribution for the looped emese sample, and each point corresponds to the bitrate of one repetition of the emese sample.

EDIT: legend

This is expected behavior for constrained_VBR. For offline applications, VBR is always the best option, and it won't show this type of irregular behavior.

Reply #216 – 2010-03-09 21:31:25

At least for me, it's still OK to include both TVBR and CVBR.
Or should we exclude CVBR or use some workaround (iterations)?

An almost definitive list of AAC encoders and settings:
1. Nero -q 0.41 (-q0.415?)
2. Apple --tvbr 65 --highest
3. Apple --cvbr 124 --highest (Discussion of CVBR bitrates)
4. Pre-test:
4a. DivX CBR 128
4b. CT CBR 130

Reply #217 – 2010-03-10 00:23:18

Quote from: skuo on 2010-03-09 18:40:22

This is expected behavior for constrained_VBR. For offline applications, VBR is always the best option, and it won't show this type of irregular behavior.

If that is the position at Apple, I find it very interesting. Currently, 'playback' within the iTunes ecosystem means offline use: every device has its own music storage. If Apple sticks to constrained VBR for its streaming benefits, despite its disadvantages, to me this is a clear hint at an upcoming shift towards a more streaming-centered model.

Reply #218 – 2010-03-13 16:49:51

Quote from: skuo on 2010-03-09 18:40:22

Quote from: .alexander. on 2010-02-12 11:30:21
The CVBR looks different since I had to update QuickTime to use qtaacenc.
Note that this is the bitrate distribution for the looped emese sample, and each point corresponds to the bitrate of one repetition of the emese sample.

EDIT: legend

This is expected behavior for constrained_VBR. For offline applications, VBR is always the best option, and it won't show this type of irregular behavior.

skuo, then why does your previous encoder version (posted earlier) show a very different, much more expected, behavior?

Igor, all, if the emese@CVBR problem persists, I'm afraid we have to use the iteration giving the lowest bitrate of 135.5 kbps for fairness' sake, e.g. loop #2.

Chris

Reply #219 – 2010-03-13 19:31:31

Quote from: C.R.Helmrich on 2010-03-13 16:49:51

Igor, all, if the emese@CVBR problem persists, I'm afraid we have to use the iteration giving the lowest bitrate of 135.5 kbps for fairness' sake, e.g. loop #2.

But CVBR has this issue only with this particular (emese) sample. I will check all test samples for bitrate regularity.

Reply #220 – 2010-03-15 23:06:12

During the selection of samples, the CVBR bitrate regularity issue showed up too often. CVBR should be excluded from the test: too many problems. There is also the odd bitrate shift needed to make it comparable to TVBR (--cvbr 124, which works only on Mac).
Skuo (Apple's developer) suggested going with TVBR. Let's do it.

Reply #221 – 2010-03-15 23:13:35

You're going to alienate massive amounts of people, but it's your test, so it's your choice.

Reply #222 – 2010-03-15 23:20:17

It's not just my test. It's public. All decisions are made with the agreement of the majority (>=50%).

There may be one more solution before excluding CVBR:
loop the sample and choose the chunk with the minimal or middle bitrate. It smells like chemistry to me. Minimal or middle bitrate?

Suggestions are welcome.
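Both fallbacks discussed here (keep the repetition with the lowest bitrate, as Chris suggested, or the one with the middle bitrate) reduce to picking one index from the per-loop bitrate list. A sketch; `pick_loop` and the input values are hypothetical, and `statistics.median_low` is used so the "middle" choice is always an actual repetition rather than an interpolated value.

```python
from statistics import median_low


def pick_loop(loop_kbps, rule="min"):
    """Return the 0-based index of the repetition to use as the test sample.

    rule="min": the repetition with the lowest bitrate.
    rule="median": the repetition at the lower median bitrate.
    """
    if rule == "min":
        target = min(loop_kbps)
    elif rule == "median":
        target = median_low(loop_kbps)
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    return loop_kbps.index(target)


# Made-up per-loop bitrates (kbps) for one looped sample.
loops = [141.5, 138.9, 135.5, 147.2, 139.8]
print(pick_loop(loops, "min"))     # 2
print(pick_loop(loops, "median"))  # 4
```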

Reply #223 – 2010-03-15 23:24:54

Why not figure out what the bitrate of the chunk is by doing what people normally do: encode the entire track.

Reply #224 – 2010-03-15 23:30:52

I wish there were any possibility of that.
Who will do that work?
Someone would have to find full-length tracks for all 18-20 samples.
Track down all the submitters and ask them to upload? Last time I tried to communicate with a couple of dudes... I never got an answer.
