TMN and NMT in psymodels

Hi all,

Most descriptions of psymodels I have seen, for example Johnston's "Transform Coding of Audio Signals Using Perceptual Noise Criteria" or the MPEG standards, use the Tone-Masking-Noise (TMN, roughly 18-29 dB) and Noise-Masking-Tone (NMT, roughly 6 dB) thresholds to calculate the required SMR (signal-to-mask ratio) per band.

However, the signal we are trying to mask is the quantization noise. This raises the question: why do we use a Noise-Masking-Tone measure? We're not trying to mask a tone at all. Especially if the base signal is noise, it seems improbable that the introduced quantization noise would have a tone-like structure.

It seems that the more natural measure would be NMN (Noise-Masking-Noise) thresholds, but I can't find anything related to this in the literature. Painter & Spanias give an NMN of 26 dB but note that the exact amount depends on the phase relationships between the two signals (which I don't understand, since it's supposed to be noise?).

So, what's the justification for using NMT as a metric of how much quantization noise to introduce in noisy signal sections?

Reply #1
The noise and tone in NMT and TMN are referring to the tones or noise signals in the original signal, prior to lossy requantization. They are different from the introduced (re)quantization noise.

For tonality-based models, there is usually no pure NMT or TMN, but rather SomethingMaskingSomething, with values ranging from NMT to TMN. NMN would be an in-between value (but close to TMN).

The TMN and NMT vocabulary might come from Zwicker, who provided experimental data for both TMN and NMT (i.e. both extreme cases).

Reply #2
Quote
The noise and tone in NMT and TMN are referring to the tones or noise signals in the original signal, prior to lossy requantization. They are different from the introduced (re)quantization noise.


I know what they are supposed to refer to, but that is not what they are being used for, is it? All the psymodels use these calculations to determine the effective amount of noise they can introduce per band. This means that the signal we are trying to mask is the introduced noise, and not something in the original, correct? (It's the only reading under which things make sense, anyway.)

After all, at no point do the psymodels try to determine the nature of the signal being masked. They calculate the tonality of a band, but this is really the tonality of the strongest signal, i.e. the masker, and not of the (weaker) maskee.

This is why I think the correct thing to use would be NMN and TMN, and not NMT.

However, I can't reconcile this with the NMN value in Painter & Spanias, or with your statement:

Quote
NMN would be an in-between value (but close to TMN).


If we don't care about the nature of the signal being masked, things still work as long as NMT and NMN are close, but it seems they are not (6 dB vs >20 dB).

Reply #3
TMN and NMT "concepts" predate modern lossy coders. T and N refer to the input signal characteristics, not to the quantization noise we are introducing.

You are right, most models only care about the tonality of the masker and not about that of the maskee. In reality, we are using TMSomething and NMSomething.
This could probably be improved, but the biggest influence comes from the masker characteristics. The maskee characteristics seem to have a lower influence on masking.
Strictly speaking, you are right: the terms TMN and NMT are usually misused.

Reply #4
Quote
Why not "TMNT"?

sorry...


???

Reply #5
Quote
The maskee characteristics seem to have a lower influence on masking.


I think we understand each other, but we're still left with this problem:

1) NMT: 6dB
2) NMN: 26dB

Which contradicts the above. Either the NMN value is wrong, and there's no problem, or the NMN value is right, and we need to change our psymodels.
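To put the gap in numbers, taking both figures at face value and writing N_max for the maximum permitted noise power in a band:

```latex
\Delta = \mathrm{NMN} - \mathrm{NMT} = 26\ \mathrm{dB} - 6\ \mathrm{dB} = 20\ \mathrm{dB},
\qquad
\frac{N_{\max}^{\mathrm{NMT}}}{N_{\max}^{\mathrm{NMN}}} = 10^{\Delta/10} = 10^{2} = 100
```

So if the NMN figure applies, a model that uses NMT for noise-like bands would allow about 100 times too much noise power there.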

Reply #6
6 dB for NMT is unusually low. In LAME (NSPsytune) it is about 17 dB (TMN: 8 dB).

Reply #7
psymodel.h (some recent LAME version)

#if 1
    /* AAC values, results in more masking over MP3 values */
# define TMN 18
# define NMT 6
#else
    /* MP3 values */
# define TMN 29
# define NMT 6
#endif

mppenc.c (Musepack)

static const Profile_Setting_t  Profiles [16] = {
   { 0 },
   { 0 },
   { 0 },
   { 0 },
   { 0 },
/*    Short   MinVal  EarModel  Ltq_                min   Ltq_  Band-  tmpMask  CVD_  varLtq    MS   Comb   NS_        Trans */
/*    Thr     Choice  Flag      offset  TMN   NMT   SMR   max   Width  _used    used         channel Penal used  PNS    Det  */
   { 1.e9f,  1,      300,       30,    3.0, -1.0,    0,  106,   4820,   1,      1,    1.,      3,     24,  6,   1.09f, 200 },  // 0: pre-Telephone
   { 1.e9f,  1,      300,       24,    6.0,  0.5,    0,  100,   7570,   1,      1,    1.,      3,     20,  6,   0.77f, 180 },  // 1: pre-Telephone
   { 1.e9f,  1,      400,       18,    9.0,  2.0,    0,   94,  10300,   1,      1,    1.,      4,     18,  6,   0.55f, 160 },  // 2: Telephone
   { 50.0f,  2,      430,       12,   12.0,  3.5,    0,   88,  13090,   1,      1,    1.,      5,     15,  6,   0.39f, 140 },  // 3: Thumb
   { 15.0f,  2,      440,        6,   15.0,  5.0,    0,   82,  15800,   1,      1,    1.,      6,     10,  6,   0.27f, 120 },  // 4: Radio
   {  5.0f,  2,      550,        0,   18.0,  6.5,    1,   76,  19980,   1,      2,    1.,     11,      9,  6,   0.00f, 100 },  // 5: Standard
   {  4.0f,  2,      560,       -6,   21.0,  8.0,    2,   70,  22000,   1,      2,    1.,     12,      7,  6,   0.00f,  80 },  // 6: Xtreme
   {  3.0f,  2,      570,      -12,   24.0,  9.5,    3,   64,  24000,   1,      2,    2.,     13,      5,  6,   0.00f,  60 },  // 7: Insane
   {  2.8f,  2,      580,      -18,   27.0, 11.0,    4,   58,  26000,   1,      2,    4.,     13,      4,  6,   0.00f,  40 },  // 8: BrainDead
   {  2.6f,  2,      590,      -24,   30.0, 12.5,    5,   52,  28000,   1,      2,    8.,     13,      4,  6,   0.00f,  20 },  // 9: post-BrainDead
   {  2.4f,  2,      599,      -30,   33.0, 14.0,    6,   46,  30000,   1,      2,   16.,     15,      2,  6,   0.00f,  10 },  //10: post-BrainDead
};


I think you got them reversed. Can you see my problem?

Reply #8
Oops! Of course I got them reversed.

Reply #9
Quote
psymodel.h (some recent LAME version)

#if 1
    /* AAC values, results in more masking over MP3 values */
# define TMN 18
# define NMT 6
#else
    /* MP3 values */
# define TMN 29
# define NMT 6
#endif


I think the TMN difference between AAC and MP3 is due to the fact that the AAC psymodel has a much more sophisticated handling of the binaural masking effect than the MP3 psymodel.

At frequencies above 10 kHz, the required TMN value is only about 18 dB, compared with lower frequencies, where it can be as high as 30 dB.

MP3 just assumes that the TMN value is uniform across all frequency bands, taking the worst case (29 dB).

Reply #10
More precisely, kwwong is talking about the ISO demonstration algorithms, not the formats themselves.

Reply #11
Quote
I think the TMN difference between AAC and MP3 is due to the fact that the AAC psymodel has a much more sophisticated handling of the binaural masking effect than the MP3 psymodel.

At frequencies above 10 kHz, the required TMN value is only about 18 dB, compared with lower frequencies, where it can be as high as 30 dB.

MP3 just assumes that the TMN value is uniform across all frequency bands, taking the worst case (29 dB).


I'm not so sure this is the reason. The "recommended" values in the standard could just as well have been picked almost at random, like so many other things in the informative part.

Specifically, the Johnston paper actually has the TMN increase over the Bark range, from 15 dB at Bark 1 to 40 dB at Bark 25. This is exactly the other way around from what you say. At that point I doubt BMLD (binaural masking level difference) was being considered, but adding it still wouldn't produce the shape your explanation implies.

There are other reasons to prefer 18 dB in this situation. It's easier to attain at 96-128 kbps, which means the ISO noise allocation loop works better.

But I don't care about the exact value of TMN or why it differs in the standard; I'm getting the strong impression that the entire tonality + TMN/NMT thing isn't based on starting with known TMN/NMT and working from there, but just a heuristic that was found to work well, and for which an explanation was produced after it turned out to work well.

Reply #12
Quote
But I don't care about the exact value of TMN or why it differs in the standard; I'm getting the strong impression that the entire tonality + TMN/NMT thing isn't based on starting with known TMN/NMT and working from there, but just a heuristic that was found to work well, and for which an explanation was produced after it turned out to work well.

I think that you are partially right on this point.
TMN and NMT measurements were known before the work on modern codecs. We had them from at least Zwicker's work, and this was older.
However, taking the tonality of the maskee into consideration would seriously complicate the spreading function of the ISO demonstration algorithms. It is likely that a simplification was made at this step, but TMN and NMT values were still presented as an "official" explanation, even though a little bit of cooking was introduced there.

Wanting to keep trade secrets is also a possibility, although this is purely hypothetical speculation.

Reply #13
Quote
Specifically, the Johnston paper actually has the TMN increase over the Bark range, from 15dB at Bark 1 to 40dB at Bark 25. This is exactly the other way around from what you say. At that point I doubt BMLD was being considered, but adding it still wouldn't produce the shape your explanation produces.


Well, Johnston's paper is for a slightly different spreading function slope implementation than the ISO model.

In the ISO model, both slopes of the spreading function are almost identical and much steeper, whereas Johnston's model uses asymmetrical spreading function slopes. That explains why the TMN implementation of the Johnston model differs from the ISO model.

I think Johnston would have already accounted for BMLD in his model, only that the modelling isn't as sophisticated as that of the ISO AAC psychomodel.

Reply #14
Quote
Quote
Specifically, the Johnston paper actually has the TMN increase over the Bark range, from 15dB at Bark 1 to 40dB at Bark 25. This is exactly the other way around from what you say. At that point I doubt BMLD was being considered, but adding it still wouldn't produce the shape your explanation produces.


Well, Johnston's paper is for a slightly different spreading function slope implementation than the ISO model.

In the ISO model, both slopes of the spreading function are almost identical and much steeper, whereas Johnston's model uses asymmetrical spreading function slopes. That explains why the TMN implementation of the Johnston model differs from the ISO model.


This seems to be correct, i.e. more spreading means a lower effective SMR is needed in the higher frequency parts (for most typical signals). But it's weird to mix intra- and inter-band masking in this way to get some BMLD protection in such a roundabout manner. (Another heuristic that happens to work?) All in all, I have serious doubts about this explanation.

Quote
I think Johnston would have already accounted for BMLD in his model, only that the modelling isn't as sophisticated as that of the ISO AAC psychomodel.


Thing is, the TMN is 15 dB in the lowest band and 40 dB in the highest, and the spreading function works mostly from low to high frequencies, so the model doesn't produce the desired effect. Given that later models account for BMLD explicitly, I don't believe this.

Another reason why I don't believe it is that PXFM was a mono codec and the results in the paper are for mono signals.

Reply #15
Quote
The TMN and NMT vocabulary might come from Zwicker, who provided experimental data for both TMN and NMT (i.e. both extreme cases).


Well, I suspect that TMN came from Scharf's work using the Bark scale, where tone masking noise rises with critical band number.

I also suspect that NMT comes from a survey paper by Hellman.

I've heard that NMN is, at lowest, a bit smaller than NMT, but that the correlation of the noise sources makes this a very twitchy subject.

Also, is quantization noise in a coder "noise" or not? It's not dithered; we are, after all, trying to get rid of information, aren't we?

(edited to fix confusing Zwicker with Scharf. Oh well.)
-----
J. D. (jj) Johnston

Reply #16
Quote
Wanting to keep trade secrets is also a possibility, although this is purely hypothetical speculation.


Oh, I'm sure that there was none of that in the MPEG-1 Committee. After all, the filterbank description is perfectly transparent! 
-----
J. D. (jj) Johnston

Reply #17
Quote
I think Johnston would have already accounted for BMLD in his model, only that the modelling isn't as sophisticated as that of the ISO AAC psychomodel.


Personally, I think the limit of "everything tonal" at low frequencies might have been a hack to protect from BMLD problems. I also suspect that there was some resistance to various issues around BMLD in the MPEG-1 timeframe, and that might account for differences.

Finally, which MPEG-1 psych model do you mean? The two are substantially different.

Now, in the AAC model, I dare say that the idea of BMLD was addressed, but the idea of BMLD-like behavior for the signal envelope at higher frequencies seems to have been somewhat neglected.
-----
J. D. (jj) Johnston

Reply #18
Quote
Another reason why I don't believe it is that PXFM was a mono codec and the results in the paper are for mono signals


Well, actually, PXFM was an M/S coder, I believe.
-----
J. D. (jj) Johnston

Reply #19
Quote
Specifically, the Johnston paper actually has the TMN increase over the Bark range, from 15dB at Bark 1 to 40dB at Bark 25. This is exactly the other way around from what you say. At that point I doubt BMLD was being considered, but adding it still wouldn't produce the shape your explanation produces.


Those TMN numbers are given by Scharf, actually, in Das Ohr.

I believe that later work moved to using ERBs rather than the Bark scale, and a fixed 30 dB-ish TMN more like Jont Allen's work. Or at least some people moved in that direction.

All of you appear to be leaving out the issue of ERB vs. Bark frequency. Also, if one wishes to test this 17 dB assertion, one should make an AM and an FM signal within one ERB and try 17 dB SNR on this signal, eh?
-----
J. D. (jj) Johnston

Reply #20
Quote
Personally, I think the limit of "everything tonal" at low frequencies might have been a hack to protect from BMLD problems. I also suspect that there was some resistance to various issues around BMLD in the MPEG-1 Timeframe, and that might account for differences.


I remember a paper from Frank Baumgarte that comes just short of calling BMLD "fiction produced due to playing with artificial signals", so I think I can see what you're getting at.

Quote
I believe that later work moved to using ERBs rather than the Bark scale, and a fixed 30 dB-ish TMN more like Jont Allen's work. Or at least some people moved in that direction.


But what about NMT in the ERB scale? Or spreading? Any references for that?

One could recalculate them from the known values in the Bark scale, but that would be working the wrong way around.

Quote
All of you appear to be leaving out the issue of ERB vs. Bark frequency. Also, if one wishes to test this 17 dB assertion, one should make an AM and an FM signal within one ERB and try 17 dB SNR on this signal, eh?


I'm not sure what "17dB assertion" you're referring to, or what you're trying to make clear, generally.

Reply #21
Quote
But what about NMT in the ERB scale? Or spreading? Any references for that?

Well, I've heard that spreading and NMT are both constant in the ERB domain as well. You have to change the spreading values a bit.
Quote
One could recalculate them from the known values in the Bark scale, but that would be working the wrong way around.

For signals where there isn't any co-articulation, stick to the usual values, is my guess.
Quote
I'm not sure what "17dB assertion" you're referring to, or what you're trying to make clear, generally.


17 dB was asserted somewhere upstream to be OK for TMN at high frequencies.

Try that sometime for a sine wave
-----
J. D. (jj) Johnston

Reply #22
Quote
Try that sometime for a sine wave


I did, a long time ago. It's a no-go.