Any recommendations for 24/96 => 16/44 => 24/96 for transparency

2014-11-09 04:37:26

In another thread in this forum, "two hi rez vs redbook train wrecks in progress" (index.php?showtopic=106442), reference is made to a timing discrepancy between a native 24/96 file and a version of it converted to 16/44 and then back to 24/96, provided over at AVS Forum:

Quote from: mzil on 2014-10-20 21:58:39

There's no reason to cheat to pass the ABX test on their test song Mosaic, at least; I know I didn't. Simply pause the music at a particularly telling location and notice the alteration due to the time misalignment. I can clearly hear that the guitar slap at the end of this phrase, in the repeated loop of the riff I posted, ends with a "cha" vs "chip" sound.

I discovered a similar timing discrepancy in another of the AVS Forum test pairs. The song was On the Street Where You Live. The files differed in alignment by 1021 samples at 96kHz. However even after correcting for that, and then inverting one of the files I found I couldn't get a null. In the conversion process there seems to have been a subtle change in level, which I suspect was not evenly spread across the audible spectrum.
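
For anyone wanting to reproduce this kind of check, a minimal Python sketch of the align-then-null idea follows. It works on plain lists of samples rather than on the AVS files themselves, and the helper names (`find_offset`, `null_residual_db`) are mine, purely for illustration; they are not part of Audacity or any other tool mentioned here. For scale, 1021 samples at 96 kHz is about 10.6 ms.

```python
import math

def find_offset(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive result means `other` is delayed relative to `ref`.
    Brute-force cross-correlation: fine for short excerpts only.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def null_residual_db(ref, other, lag):
    """Energy of (ref - aligned other) relative to ref, in dB.

    Subtracting is equivalent to inverting one file and summing.
    Returns -inf for a perfect null.
    """
    diff_sq = ref_sq = 0.0
    for i, r in enumerate(ref):
        j = i + lag
        if 0 <= j < len(other):
            diff_sq += (r - other[j]) ** 2
            ref_sq += r * r
    if diff_sq == 0.0:
        return float("-inf")
    if ref_sq == 0.0:
        raise ValueError("reference is silent over the overlapping region")
    return 10.0 * math.log10(diff_sq / ref_sq)
```

With a synthetic signal that is delayed by a few samples and reduced in level by 2% (about 0.18 dB), `find_offset` recovers the delay, yet the residual after alignment sits only about 34 dB below the signal. That illustrates why a small level change on its own is enough to spoil a null.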

I would like to assist over at AVS Forum by providing a replacement twice-converted file, correctly time aligned. My first attempt to do that miscarried, as the version of Audacity I used for the task was old (1.3.13-alpha-Nov 26 2010) and its noise-shaped dither turned out to be noticeably noisy. I find that the current version of Audacity (2.0.6) has a much quieter sounding shaped dither. However, I am a little hesitant to use Audacity if there is considered to be better Sample Rate Conversion software, hopefully software that is free to download and use! By better in this context I mean SRC software that is likely to be transparent to most people when processing ordinary music such as the native 24/96 version of On the Street Where You Live being provided at AVS Forum. That music is provided by AVS Forum Editor, Scott Wilkinson, at post #1 of AVS/AIX High-Resolution Audio Test: Take 2.

Krabapple has already provided me with a link to test results for a range of SRCs (http://src.infinitewave.ca/). However, I find the information there overwhelming. I recall that many years ago (15 perhaps) I compared different filtering options for a 44.1kHz sample rate, and at that time I preferred a gentle roll-off to a highly extended high frequency passband with a steep roll-off (approaching the Nyquist limit).

But for what I want to do now, creating a 16/44 bottlenecked file in 24/96 format, my personal preferences are not really the point. It seems I should use an SRC methodology that is likely to strike a good balance for the majority of listeners, and in particular one that is likely to be transparent to most listeners for most music, compared with a native 24/96 version. Some people would query me at this point and suggest, "But conversion to 16/44.1 is transparent." If that were true, there ought not to exist such a range of different SRC methods from which to choose.

So I ask: what sample rate conversion method would be recommended for this exercise? For example, would using the current version of Audacity suffice (i.e. saving as 16/44 dithered, and then saving that 16/44 file as 24/96)?

Reply #1 – 2014-11-09 06:36:06

There are some prepared test files (including 2 of the AIX ones) here. The conversion method is explained within.

Reply #2 – 2014-11-09 15:38:21

Quote from: MLXXX on 2014-11-09 04:37:26

So I ask: what sample rate conversion method would be recommended for this exercise? For example, would using the current version of Audacity suffice (i.e. saving as 16/44 dithered, and then saving that 16/44 file as 24/96)?

I'm of the opinion that there are only two kinds of SRCs running between 24/96 and 44/16. The first kind are the ones that are good and effect a transparent conversion or, if you want to believe that 44/16 and 24/96 sound different, the ones that come closest to being transparent. The other kind are defective and either do not effect a transparent conversion or effect a conversion that is unnecessarily lacking in transparency.

Based on the technical tests published at the site referenced, there are a number of SRCs that are unnecessarily lacking in transparency. I attribute this mostly to a lack of knowledge on the part of their authors, or to coding errors. Many of the errors I see don't sound particularly good, so I think that euphonic performance is not much of an issue. I think that some of their authors would do better work today, as this was once an evolving art.

However, the ones that seem to have maximized transparency are not unique. In particular, of the SRCs available to me, SoX and Cool Edit appear to me to meet reasonable criteria for transparency, provided you supply them with appropriate parameters.

The nature of the material being converted can affect how critical some of the more subtle parameters are. For example, almost all commercial recordings are non-critical enough that a relaxed attitude can work. OTOH, it's possible that people are working with musical tracks that have been artificially enhanced to be more challenging for transmission by means of the 16/44 format. For example, gain riding can be used during mixing to create recordings that have more dynamic range than the corresponding live performances. In these cases, ensuring the use of optimal perceptually shaped quantizing and TPDF dither can be details demanding more careful attention.

Reply #3 – 2014-11-09 17:04:24

Thanks, bandpass. Unless someone suggests that use of another conversion method might be better for my purpose, I'll use SoX.

I had an issue with the "-f" parameter in the command lines suggested in the readme.html file contained in the zip file in your link, when tried with a current version of SoX. I was able to omit it, without any apparent ill effects.*

Hi Arnold B. Krueger,
Thanks for your remarks. I see you are inclined to approve of SoX with appropriate parameters.

I hope that the parameters I am using (stated in the footnote below) for SoX are satisfactory. Reading the documentation I see that "dither -S" is described as follows:
[blockquote]The −S option selects a slightly ‘sloped’ TPDF, biased towards higher frequencies. It can be used at any sampling rate but below ≈22k, plain TPDF is probably better, and above ≈ 37k, noise-shaped is probably better.[/blockquote]

It could be I would be [slightly?] better off with the lower case "-s" dithering option. I see it is described in the SoX documentation as follows: "Noise-shaping (only for certain sample rates) can be selected with −s."
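
To make clear to myself what plain TPDF dither does before any sloping or noise shaping is applied, here is a minimal Python sketch. It is not SoX's implementation, and the function name `quantize_tpdf` and its parameters are hypothetical; it only shows the textbook idea of adding triangular noise spanning about ±1 LSB before rounding to the target word length.

```python
import random

def quantize_tpdf(samples, bits=16, seed=0):
    """Quantize floats in [-1.0, 1.0] to signed integers with TPDF dither.

    TPDF dither is the sum of two independent uniform random values,
    giving a triangular distribution spanning roughly +/-1 LSB.
    """
    rng = random.Random(seed)
    full_scale = 2 ** (bits - 1)          # 32768 for 16-bit
    lo, hi = -full_scale, full_scale - 1  # clip limits of the integer format
    out = []
    for x in samples:
        # Two uniform values in [-0.5, 0.5) LSB, summed.
        dither = (rng.random() - 0.5) + (rng.random() - 0.5)
        q = int(round(x * full_scale + dither))
        out.append(max(lo, min(hi, q)))
    return out
```

A constant input of a quarter of an LSB would simply round to zero without dither. With the dither added, the output toggles between neighbouring codes and its long-term average comes out at about a quarter of an LSB, which is the point of the exercise: the low-level information survives as signal plus noise rather than being truncated away.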

________

* The first command line given in the readme file is: sox 1a.flac -b 16 temporary.flac rate -f 44100 dither -S

That didn't work for me with wav files. The old, now deprecated, option "-f", relating to encoding type, was rejected. I tried the new option "-e signed-integer" but that was rejected too. I ended up using: sox 1a.wav -b 16 temporary.wav -S rate 44100 dither -S

The "-S" after the target filename enabled a display of progress.

For the second command line I deleted "-f". I inserted a "-S" to see progress, and to see details of the file being encoded. The command line I ended up with was: sox temporary.wav -b 24 1b.wav -S rate 96k

The detail displayed in the command prompt window for temporary.wav was 2 channel, 44100, 16-bit Signed-Integer PCM, i.e. just as required! A spectrum check revealed the presence of noise shaped dither, reflecting the "dither -S" in the first command line. I think the overall process worked correctly.
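
As a cross-check that does not depend on SoX's own progress display, the header of the intermediate file can also be read back independently. The sketch below uses Python's standard `wave` module; `describe_wav` is just an illustrative helper name of mine. One caveat, as far as I know: older Python versions may reject WAV files that use the WAVE_FORMAT_EXTENSIBLE header, which some tools write for 24-bit files, so this is best suited to the 16-bit intermediate file.

```python
import wave

def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample, frame_count) for a PCM WAV."""
    with wave.open(path, "rb") as wf:
        return (wf.getnchannels(), wf.getframerate(),
                wf.getsampwidth() * 8, wf.getnframes())
```

For a correctly produced temporary.wav the first three values should be 2, 44100 and 16. The frame count divided by the sample rate gives the duration, which should match that of the 24/96 original.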

Reply #4 – 2014-11-09 17:22:31

To a certain extent, it doesn't matter what noise-shaping is used as all it does is delay the inevitable: at some elevated volume, you'll hear the 16-bit noise-floor before the 24-bit one.
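
To put rough numbers on that, here is a small Python sketch of the textbook wideband noise floor for flat (unshaped) TPDF dither. It assumes the usual model of quantization noise power LSB²/12 plus dither power LSB²/6, measured against a full-scale sine; the function name is illustrative only. Noise shaping redistributes this noise across frequency rather than removing it, so these figures say nothing about perceived loudness.

```python
import math

def tpdf_noise_floor_db(bits):
    """RMS noise of TPDF-dithered quantization, in dB relative to a full-scale sine."""
    lsb = 1.0 / 2 ** (bits - 1)       # step size when full scale is +/-1.0
    noise_rms = lsb / 2.0             # sqrt(lsb^2/12 + lsb^2/6)
    sine_rms = 1.0 / math.sqrt(2.0)   # RMS of a full-scale sine
    return 20.0 * math.log10(noise_rms / sine_rms)
```

Under those assumptions the floor works out to roughly -93 dB for 16 bits and -141 dB for 24 bits, a gap of about 48 dB, so with enough gain the 16-bit floor will always be reached first.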

Reply #5 – 2014-11-09 18:05:12

Since in any listening test there will be people cranking up the volume, I suggest a shaped dither based purely on the average human ATH (absolute threshold of hearing), like the Shibata ones. This is for the downsampling step. To avoid accidentally introducing side effects through strong shaping, low Shibata should be safe.

Reply #6 – 2014-11-09 20:03:05

Quote from: bandpass on 2014-11-09 17:22:31

To a certain extent, it doesn't matter what noise-shaping is used as all it does is delay the inevitable: at some elevated volume, you'll hear the 16-bit noise-floor before the 24-bit one.

Unnaturally elevated listening levels are a form of cheating. Anything that forces cheating to be more overt and extreme in order to obtain false positive results seems like a good thing to me.

Reply #7 – 2014-11-09 20:41:41

Thanks for that link. I think I'll try that test and hear for myself.

Reply #8 – 2014-11-09 22:33:47

Quote from: MLXXX on 2014-11-09 04:37:26

So I ask: what sample rate conversion method would be recommended for this exercise? For example, would using the current version of Audacity suffice (i.e. saving as 16/44 dithered, and then saving that 16/44 file as 24/96)?

My favorite is iZotope's SRC, but it's not free. In the latest version the sub-sample delay has been compensated, so the output file will be perfectly in sync with the input file (required for a null test). I've found the input/output difference signal to be lower than -150 dBFS, which is excellent and can be considered inaudible.
I would strongly advise, however, splitting the test into two parts: 1) the SRC (24/96->32/44.1->24/96) and 2) the word length reduction (dithering).
I'll gladly process some audio files for you if it can be helpful.
Since cheating will be possible, what kind of controls could be used to detect or prevent cheats? (And should this be discussed in public in the first place?)

Reply #9 – 2014-11-10 05:43:08

Quote from: Arnold B. Krueger on 2014-11-09 20:03:05

Quote from: bandpass on 2014-11-09 17:22:31
To a certain extent, it doesn't matter what noise-shaping is used as all it does is delay the inevitable: at some elevated volume, you'll hear the 16-bit noise-floor before the 24-bit one.

Unnaturally elevated listening levels are a form of cheating. Anything that forces cheating to be more overt and extreme in order to obtain false positive results seems like a good thing to me.

Yes, there's a 'happy medium' somewhere between plain TPDF and aggressive noise-shaping that necessitates attenuation to avoid clipping and risks pushing tweeters non-linear.

Though as long as one avoids source material that:

contains digital silence,
has fade in/outs,
is un-mastered

the point is probably moot, since a mastered recording's perceptible noise-floor is likely to be higher than that of any proper 16-bit dither.

Reply #10 – 2014-11-10 06:31:58

Why does anyone care if multiple bit depth and sample rate changes eventually produce an audible difference? There surely can't be any rational excuse for converting from 24/96 to 16/44.1 to 24/96, so why do it to begin with? In fact, after all the time and noise on the subject, how does anyone still want to even think about the endlessly boring subject of subtle differences between different samplings?

I don't suggest that anyone or everyone not be free to pursue the topic if they want to, it is just that it is hard to come up with anything that matters less in regard to anything at all, no?

Reply #11 – 2014-11-10 07:50:26

Quote from: Kees de Visser on 2014-11-09 22:33:47

I'll gladly process some audio files for you if it can be helpful.

Thanks, I will think about your kind offer, and then get in touch by PM.

Quote from: AndyH-ha on 2014-11-10 06:31:58

Why does anyone care if multiple bit depth and sample rate changes eventually produce an audible difference? There surely can't be any rational excuse for converting from 24/96 to 16/44.1 to 24/96, so why do it to begin with? In fact, after all the time and noise on the subject, how does anyone still want to even think about the endlessly boring subject of subtle differences between different samplings?

I don't suggest that anyone or everyone not be free to pursue the topic if they want to, it is just that it is hard to come up with anything that matters less in regard to anything at all, no?

AndyH,
to my mind the importance lies in attempting to get some reconciliation between opposing camps, or if that's unrealistic, at least to provide guidance for those not locked into a position.

On one side, we have those who claim a night and day difference between 24/96 and 16/44. That seems to be wrong, as attempts to get people to demonstrate an ability to hear differences in a formal ABX setting have not been particularly successful. (I note the usual practice of including a further conversion step to 24/96 in a formal ABX, so that the playback DAC is always operating at one sample rate and not introducing a difference itself.)

On the other side, we have those who claim there are no perceptible differences at all, at normal recording and playback levels, with music. That may not necessarily be true. It may be there are minuscule differences that some listeners can differentiate under favourable conditions (without turning up the gain). For those marketing hi-rez it would be quite important to be able to establish there is sometimes a discernible difference, even if the difference is extremely small. It is an embarrassment if there is found to be no discernible difference. Conversely for those who are adamant there is never any discernible difference, it would be an embarrassment to discover an ABX result that indicates there was sometimes a discernible difference, even if slight.

I for one am interested to know. I expect that any difference would be so minor as to be unlikely to influence my choice of format when purchasing music. However it is still a matter of at least academic interest to me.

Reply #12 – 2014-11-10 08:56:10

Quote from: MLXXX on 2014-11-10 07:50:26

For those marketing hi-rez it would be quite important to be able to establish there is sometimes a discernible difference

There is no significant discernible difference; the science and maths behind this were done and dusted decades ago. For those marketing hi-rez it's important not to try to establish that there is a difference, but instead to state it as a supposedly unquestionable fact, until enough people believe it to make a viable market. They've been trying to do so with little to no success for the best part of 20 years; read and weep:

Quote

I first listened to the regular CD supplied, focusing on Stan Getz/Joao Gilberto "Corcovado." It sounded quite smooth and focused with very good definition and depth. I then played the Verve 20 Bit CD Reissue from 1997 which sounded a tad brighter, but that’s about it. I then listened to the SACD Verve Reissue from 2002 on my ModWright 9100 with tube power supply …it was improved in every way. It was SACD of course, but it had substantially more definition and air …better depth, too. It was then I played the SHM-CD of this cut. Oh boy, it sounds like the SACD!

Reply #13 – 2014-11-10 10:58:27

I am inclined to believe that there is no significant difference whatsoever, but I can hear the ultrasound in some of the "c" samples that you linked to on my laptop speakers.

Can someone provide 24/96 digital silence so that I can do a blind test?

Reply #14 – 2014-11-10 11:26:06

This may well be due to resampling in your play-back chain. Turn it off wherever you can.
If you still hear artefacts, it may be in the hardware itself.
On my laptop, for example, it seems I have to software resample to 192kHz in order to avoid low-quality on chip resampling.

Reply #15 – 2014-11-10 11:26:49

Quote from: Satellite_6 on 2014-11-10 10:58:27

I am inclined to believe that there is no significant difference whatsoever, but I can hear the ultrasound in some of the "c" samples that you linked to on my laptop speakers.

I really think you need to prove that. It implies you have superhuman hearing and that your laptop speakers are able to produce ultrasound.

Or do you mean you can hear artefacts that are brought back down into normal hearing range?

Reply #16 – 2014-11-10 11:41:22

http://sox.sourceforge.net/

http://sourceforge.net/projects/sox/files/sox/

Generating 5 seconds of 96/24 stereo silence.wav

Code:

sox -D -n -r 96000 -b 24 -c 2 silence.wav trim 0 5

Format confirmation:

Code:

soxi silence.wav 

Input File     : 'silence.wav'
Channels       : 2
Sample Rate    : 96000
Precision      : 24-bit
Duration       : 00:00:05.00 = 480000 samples ~ 375 CDDA sectors
File Size      : 2.88M
Bit Rate       : 4.61M
Sample Encoding: 24-bit Signed Integer PCM

Contents confirmation:

Code:

sox silence.wav -n stats
             Overall     Left      Right
DC offset   0.000000  0.000000  0.000000
Min level   0.000000  0.000000  0.000000
Max level   0.000000  0.000000  0.000000
Pk lev dB       -inf      -inf      -inf
RMS lev dB      -inf      -inf      -inf
RMS Pk dB       -inf      -inf      -inf
RMS Tr dB       -inf      -inf      -inf
Crest factor       -      1.00      1.00
Flat factor   113.62    113.62    113.62
Pk count        960k      960k      960k
Bit-depth       0/0       0/0       0/0 
Num samples     480k
Length s       5.000
Scale max   1.000000
Window s       0.050
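
For anyone who would rather not install SoX, an equivalent silent file can be sketched with Python's standard `wave` module. This is only an illustration, not a replacement for the commands above; the function name `write_silence` and its defaults are my own choices, picked to mirror the 5 second, 96 kHz, 24-bit, stereo file.

```python
import wave

def write_silence(path, seconds=5, rate=96000, channels=2, sample_bytes=3):
    """Write a PCM WAV of digital silence and return the frame count.

    All-zero bytes mean silence only for signed PCM (16-bit and wider);
    8-bit WAV is unsigned, so do not use sample_bytes=1 here.
    """
    frames = seconds * rate
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_bytes)   # 3 bytes per sample = 24-bit
        wf.setframerate(rate)
        wf.writeframes(bytes(frames * channels * sample_bytes))
    return frames
```

Calling `write_silence("silence.wav")` should produce 480000 frames and about 2.88 MB of audio data (96000 × 5 × 2 channels × 3 bytes), in line with the soxi report above.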

Reply #17 – 2014-11-10 12:03:30

I have no idea how to use that thing.

I have the internal sound card set to 24/96 so I do not think that it is resampling.

Would it be better to do this test with the ODAC?

Reply #18 – 2014-11-10 12:26:40

Quote from: Satellite_6 on 2014-11-10 12:03:30

I have the internal sound card set to 24/96 so I do not think that it is resampling.

Internal sound cards claiming to support 24/96 do provide the OS with a 24/96 interface (so the OS doesn't resample a 96k file) but may internally resample to a higher rate, and do it badly.

You either need a soundcard that supports 96k internally (a quick glance suggests the ODAC may well do), or to have all the test files resampled to the internal soundcard's internal rate (e.g. 192kHz on my laptop).

Reply #19 – 2014-11-10 12:39:42

Quote from: bandpass on 2014-11-10 05:43:08

Yes, there's a 'happy medium' somewhere between plain TPDF and aggressive noise-shaping that necessitates attenuation to avoid clipping and risks pushing tweeters non-linear.

I think that audible nonlinear distortion due to, say, E2 dither is a myth. 1 bit dither is way, way below FS and no matter how it is shaped, it is unlikely to cause dynamic range problems.

Are there any documented instances of this?

Reply #20 – 2014-11-10 13:04:12

Quote from: AndyH-ha on 2014-11-10 06:31:58

Why does anyone care if multiple bit depth and sample rate changes eventually produce an audible difference?

Because there is a lot of audio hardware out there, both consumer and professional, that downsamples high sample rate audio, and other mainstream hardware that simply won't go faster than 48 kHz. For example, Yamaha used to make digital consoles for general purpose use that would run at 24/96, but when they came out with sequel models some years back, 48 kHz became the highest sample rate that they supported. A lot of mainstream power amps with built-in DSPs run them no faster than 48 kHz.

As has just been pointed out in this thread, most consumer audio interfaces max out at 48 kHz for critical signal processing such as their DACs, and use hardware or software resampling when they are faced with 24/96 audio data. My test of the Realtek chip in this PC shows a brick wall filter just above 20 kHz when it is processing 24/96 files.

Mainstream portable digital players have similar characteristics. Equipment with high sample rate capabilities commands a significant price premium in many cases. People are offering downloads of music I already have on CD in allegedly high sample rate versions. This all begs the question whether or not high sample rates and/or long data words are worth it.

Quote

There surely can't be any rational excuse for converting from 24/96 to 16/44.1 to 24/96, so why do it to begin with? In fact, after all the time and noise on the subject, how does anyone still want to even think about the endlessly boring subject of subtle differences between different samplings?

There is similarly no rational excuse to convert media from 16/44 or comparable analog formats to 24/96 yet it has already been shown that a large fraction (1/3 to 1/2) of all high resolution recordings that were sold in the heyday of SACD and DVD-A were legacy media, upsampled.

Where were you when all of these facts came to light? ;-)

Reply #21 – 2014-11-10 14:10:40

Quote from: Arnold B. Krueger on 2014-11-10 12:39:42

I think that audible nonlinear distortion due to, say, E2 dither is a myth. 1 bit dither is way, way below FS and no matter how it is shaped, it is unlikely to cause dynamic range problems.

Agreed, this is far less of a problem than e.g. pre-emphasis (15/50µs) from the old days. There is some anecdotal evidence that shaped dither can color sound/music at normal listening levels, but I can't confirm that personally with my old ears.
Ultrasound (>20kHz) content, however, can and will increase the peak level of a signal, and I've found hi-res audio to "always" clip before redbook audio because of this level difference. Comments?

Reply #22 – 2014-11-10 15:18:06

Quote from: Kees de Visser on 2014-11-10 14:10:40

Ultrasound (>20kHz) content, however, can and will increase the peak level of a signal, and I've found hi-res audio to "always" clip before redbook audio because of this level difference. Comments?

That seems like one of those "has to be" type situations. Extreme cases of that are my "Keys Jangling" files.

However if you actually measure so-called high def tracks, the difference is often minimal.

For example, the 16/44 and 24/96 "Street Where You Live" tracks in the 24-96.zip file linked from this site have peak levels that seem to be minimally different, despite the vast difference in bandpass.

Reply #23 – 2014-11-10 16:23:55

Quote from: AndyH-ha on 2014-11-10 06:31:58

Why does anyone care if multiple bit depth and sample rate changes eventually produce an audible difference? There surely can't be any rational excuse for converting from 24/96 to 16/44.1 to 24/96, so why do it to begin with? In fact, after all the time and noise on the subject, how does anyone still want to even think about the endlessly boring subject of subtle differences between different samplings?

I don't suggest that anyone or everyone not be free to pursue the topic if they want to, it is just that it is hard to come up with anything that matters less in regard to anything at all, no?

MLXXX is here from one of the AVS Forum hi rez vs CD train wrecks, because he is an honest seeker of information, and I suggested there that he get advice on appropriate listening test protocols from HA, instead of relying on certain dubious parties who post there.

So there's no need to be cranky. Help the guy out, he's here to learn.

Reply #24 – 2014-11-10 17:46:43

Quote from: MLXXX on 2014-11-09 04:37:26

However even after correcting for that, and then inverting one of the files I found I couldn't get a null. In the conversion process there seems to have been a subtle change in level, which I suspect was not evenly spread across the audible spectrum.

If the provider of these files has, um, let's say "inadvertently/accidentally" inserted a mild dynamic range expansion, compression, or limiting into the signal path of just ONE of the selections, not both, there would be an inability to null, especially apparent at musical peaks [or lulls]. The average level of the music would seem "level matched" and would largely null out nicely, however the peaks [as an example] that exceed a particular threshold level (preselected by the manipulator) will be altered.

Is the inability to null constant or does it vary based on the dynamics/level of the music at that point in time?
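
One way to answer that question would be to look at the null residual window by window and compare it with the music level in each window. The following is a minimal Python sketch under the assumption that the two files are already time aligned and loaded as lists of float samples; `rms_db` and `windowed_null` are hypothetical helper names, not functions of any tool mentioned in this thread.

```python
import math

def rms_db(values):
    """Mean-square level of a block in dB (re 1.0), or -inf for digital silence."""
    mean_sq = sum(v * v for v in values) / len(values)
    return 10.0 * math.log10(mean_sq) if mean_sq > 0.0 else float("-inf")

def windowed_null(ref, other, window):
    """Return [(signal_level_db, residual_relative_db), ...] per window.

    `ref` and `other` must already be time aligned. The residual is
    ref - other, expressed relative to the level of ref in that window.
    """
    rows = []
    n = min(len(ref), len(other))
    for start in range(0, n - window + 1, window):
        r = ref[start:start + window]
        level = rms_db(r)
        if level == float("-inf"):
            continue  # skip digitally silent windows
        d = [a - b for a, b in zip(r, other[start:start + window])]
        rows.append((level, rms_db(d) - level))
    return rows
```

If the relative residual is roughly the same in quiet and loud windows, a plain gain or filter difference is the more likely explanation. If it rises sharply in the loudest windows (or in the quietest), that points toward level-dependent processing of the kind described above.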
