Audible Difference Test: “High-res” vs. CD

Abstract

Informal “null test” experiments aim to determine, for the purpose of my own amusement (and perhaps as a payout on platinum-eared audiophiles whose beliefs of choice are whatever was advertised at them), whether the difference between CD “Red book” standard PCM stereo (44.1kHz/16bit) and so-called “high-res” PCM stereo (such as 192kHz/24bit) is audible when played back through hi-fi speakers in a quiet listening environment. The experiments play the actual difference in isolation from the music program itself: a “null test”, with no music to mask the difference. Part way through my procedure, I learnt from an acquaintance that I had been anticipated by someone [1] using sophisticated comparator software of which I was unaware. I had been using a cumbersome routine of inversions and channel-swapping with a variety of disparate software packages. Rather than abandoning the experiments altogether for fear of redundancy or of being seen as derivative, I pressed on with a view to either confirming or supplementing that work, but dropped my multi-step routine in favour of the (apparently) more sophisticated comparator that he used.

Spoiler (and it should be no surprise!)

The difference is silent to human beings!  “High-res” digital stereo formats provide absolutely nothing audible whatsoever over standard Red book CD.  If there is some audible difference on playback (i.e. after D:A conversion), then it cannot be attributed to the format per se.  In other words whether it’s “authenticated” or otherwise promoted, it’s just more BS to be lapped up by the exotic cable fraternity.

Background

By virtue of the Nyquist criterion, Red book CD at its 44.1kHz sample rate provides a maximum frequency response of 22.05kHz (well beyond what I personally can hear), whereas a 192kHz file theoretically provides for 96kHz. Numerous studies have been undertaken over the years, leading to contradictory conclusions and tiresome arguments. Who to believe? One year the AES publishes a paper [2] concluding that there are no statistically verifiable audible differences between “hi-res” (including SACD) and standard Red book after passing signals through A/D/A conversions. This one naturally took a lot of flak from audiophiles, as it was obviously very confronting to them. Then in 2016 an AES-published “meta-analysis” [3] comes down 60:40 (pretty close to 50:50 guessing) in favour of high-res aural distinguishability, but with certain caveats such as “over extended listening periods” and by “trained listeners” (whoever they might be). I’d want to see something closer to 100% in double-blind or ABX testing before I was convinced of anything. Maybe ABX testing is flawed, but not for any of the reasons that I can see put forward by audiophiles. As soon as the switch is made, the subject is no longer hearing the same section of the music, so are these actually difference tests, mere memory tests, or “let’s compare apples to oranges” tests?
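To put a number on “closer to 100%”, here is a minimal sketch (my own illustration, not taken from any of the studies) that computes the probability of scoring at least k out of n ABX trials by pure guessing, and the smallest score that pushes that probability down to a chosen threshold. The 0.05 threshold is just a conventional assumption.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided probability of scoring at least `correct` out of `trials`
    by pure guessing, i.e. a binomial tail with p = 0.5 per trial."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


def min_correct(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose guessing probability is at or below alpha."""
    for k in range(trials + 1):
        if abx_p_value(k, trials) <= alpha:
            return k
    return trials + 1  # unreachable: even a perfect score is not significant


for n in (10, 16, 20, 100):
    need = min_correct(n)
    print(f"{n} trials: at least {need} correct ({need / n:.0%}) for p <= 0.05")
```

The point of the exercise: a hit rate like 60% means nothing over 10 trials (6 out of 10 is easily got by guessing), so the number of trials matters as much as the percentage.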

A “listening test” is included in recent builds of a certain popular media playing program.  It asks its customers to select a track from their own library, then to “star rate” temporary tracks converted from it into various formats in the background.  The format is supposedly hidden during the test until after the star ratings are applied.  The “findings” can then be shared on an on-line “aren’t we just brilliant” forum.  Again, the audio samples are played back at different times, so the results (assuming no cheating) are subjective – clouded by the subject’s auditory memory and auditory masking.  More importantly, the whole thing is bogus and purely self-validating in nature.  I was able to cheat the test every time and it’s very clear that most of the participants did likewise.  And surely the volume slider ought to extend beyond -95.5dB (1% on a 100 position slider) before muting in order to distinguish high-res PCM from Red book anyway!  Red book dynamic range capability is 96dB.  I do not understand how people are able to hear a difference in such a case.  The main rival software has an ABX plug-in, which is probably less lame, but equally subjective.

It’s exactly the same as the endless claims by middle-aged audiophiles that they can hear a difference between CD and SACD (DSD).

July 2018 Addendum

  • I copped yet another such claim by one of these people just last week.  This time it was a DSD file on a USB stick played back by one of the software players to a DSD-capable DAC. He had no Red book version to compare it with and the DAC shone a light to indicate it was receiving DSD!  Sighted bias at play yet again!

Rather than participating in that kind of nonsense (and BTW “just post your test results” from a site “moderator” won’t get any from me), or accepting any of the so-called “studies”, I’d prefer to undertake my own scientific/objective tests, arrive at my own conclusions, learn something along the way and share my results here.

⊗  A commercial motivation in providing such a test might be to push the placebo train along and thereby maintain an illusion of hi-res superiority because they’re in cahoots with one of the major hi-res download providers.  I really don’t know, but where people can cheat the test to boost their own egos, then share their “results”, the silly cycle just continues.

A difference test removes all of the subjectivity, and I have no commercial motivation in applying one, other than maybe in saving money when making music purchasing decisions.  It also completely eliminates “auditory masking” of the difference by the fundamental (the music program).

Aside

My musical tastes lie predominantly in classical and jazz, which are rarely if ever dynamically compressed.  So the dreary “loudness wars” for the most heavily compressed (“loud”) pop music for playback via ear buds or in noisy motor cars doesn’t particularly faze me.  Yeah, they should probably stop it because it’s stupid when a simple “compression button” could be provided for $2 for people who want that kind of garbage, but that’s another story.

My personal standpoint

Years ago in my “dumb audiophile” phase I purchased SACDs and DVD-Audio discs purely for stereo playback, in the belief that I had something special over CD when in the company of others who seemed to share the same delusion. I never fell for the “super tweeter” nonsense, however, and perhaps that was the seed of my progressive scepticism of all things audiophile. I started to question SACD and DVD-A and gradually came to the realisation that it was a load of BS. It seemed to be used by people to validate their purchase of over-priced, fancy-branded hi-fi equipment or their super-human hearing capabilities (or something). And it often came from people over 45 years of age who ought to have significant hearing impairment. In more recent years I have continued to purchase SACDs and DVD-A discs, but solely for their multi-channel content for playback in surround sound, and in most cases those discs were less expensive than CDs anyway. In some recent exchanges about digital audio, I have gone so far as to say to people “you cannot hear the difference”. Some of them seemed very upset when I said that, and they usually hit back with illogical and defensive silly talk. I do not waste time with LPs (vinyl records), as they clearly have nothing whatsoever to offer over CD apart from enforcing ritualistic patterns of behaviour that some nut cases seem to enjoy. They are completely ridiculous in 2017 on all technical levels; if I had any left from the 70s or 80s, they’d just get thrown on the tip.

Some realities

Human hearing thresholds are well established.  They can be represented by Fletcher-Munson, or Robinson-Dadson equal-loudness graphs like this one:


That is outdated, but more modern graphs [8] are similar.

I have added the red “deaf” area and the coloured vertical lines. The lines are drawn at the deepest part of the graph, at around 3-5 kHz, which is where human hearing is most sensitive.

  • Red line (about 105dB) is an absolute range between inaudibility and pain
  • Blue line (about 75dB) represents the range between pain and the background noise (30dB) of a very quiet listening environment  (my music rooms measure around 33dB – no air conditioning or street noise)
  • Green line (about 60dB) represents the range between a level that is considered dangerous beyond a 2 to 4 hour period, and the background noise of a very quiet listening environment
  • Yellow line (about 93dB) represents the range between a level that is considered dangerous beyond a 2 to 4 hour period, and the lower threshold of audibility which is some 32dB below the background noise of a very quiet listening environment

And as also hyperlinked in Reference 1, the University of New South Wales’ hearing test can be used to create a very rough approximation of one’s personal red “deaf area”.  It’s rough because it does not eliminate the sound system (including the headphones) used by each participant when performing the test – i.e. it’s uncalibrated.

Very obviously (i.e. it should go without saying that), any frequencies above 20kHz are “off the graph”.

It is probably fair to say that at around 3-5kHz a -95dB (let’s just call it -96dB) signal in a system in which the maximum possible playback volume is dangerous, would be considered quite generous as a threshold of audibility in a quiet listening environment.*  It is also probably fair to say that -80dB (somewhat beneath the noise floor of a quiet room for the same system) is a more realistic gauge, and that at any other frequency, the level of inaudibility will obviously be somewhat further down than these figures.

*   The experiment could stop right there come to think of it (the CD Red book standard already provides 96dB of dynamic range and a frequency response up to 22.05kHz), but I’ll press on…
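The two digital levels above only mean something once they are tied to sound pressure. As a minimal sketch, assume a hypothetical calibration in which a 0 dBFS signal plays at about 105 dB SPL (roughly the pain end of the red line) and levels scale dB for dB; the 30dB quiet-room figure comes from the blue line above.

```python
def dbfs_to_spl(level_dbfs: float, full_scale_spl: float) -> float:
    """Acoustic level of a digital signal, assuming the system is calibrated so
    that a 0 dBFS signal plays at full_scale_spl (dB SPL), dB for dB."""
    return full_scale_spl + level_dbfs


FULL_SCALE_SPL = 105.0  # hypothetical: full scale set near the pain threshold
ROOM_NOISE_SPL = 30.0   # quiet-room background figure from the blue line

for level in (-80.0, -96.0):
    spl = dbfs_to_spl(level, FULL_SCALE_SPL)
    print(f"{level:.0f} dBFS -> {spl:.0f} dB SPL "
          f"({spl - ROOM_NOISE_SPL:+.0f} dB relative to room noise)")
```

Under that assumption, -80dBFS lands a few dB under the room noise and -96dBFS lands well under it, which is the relationship described in the paragraph above.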

Auditory masking

The above excludes the presence of auditory masking which is there in the form of the fundamental music waveform when a track is played normally together with all the unwanted harmonic and intermodulation distortion components introduced by the playback system itself (especially the speakers), not to mention the contributions of the room.  My better hi-fi system (including the room) accounts for some 0.139% THD (at 3.28kHz and 80dB playback level) alone.  Here is a measurement of one speaker:

Harmonic distortion figures at the bottom refer to the cursor position only. I put it there because it just catches the 6th harmonic and is in the zone of highest hearing sensitivity. THD is indicated as 57 dB below the fundamental. An Earthworks microphone (EIN of about 20dB A-weighted) and a Focusrite microphone pre-amp (which measures 0.003% THD at the same frequency using a loop-back cable) were used to make the measurement. Due to the incoherent nature of the components, it is fair to say that they contribute SFA (even if the signals were coherent, 57dB + 20dB ≅ 57dB anyway). So THD via one channel of my stereo system is some 57dB down from the fundamental. In order to hear the “difference tracks” produced in the tests here, we must consider the effect of this together with the fundamental. Put another way: trying to hear a difference track would be somewhat like playing the reference track while toggling a mute switch on the difference track being played through another set of speakers.
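The arithmetic in that paragraph is easy to check. A short sketch: convert a THD percentage to dB relative to the fundamental, and add two levels either as uncorrelated components (powers add) or as in-phase components (amplitudes add, the worst case).

```python
import math


def percent_to_db(percent: float) -> float:
    """Express a distortion ratio given in percent as dB relative to the fundamental."""
    return 20 * math.log10(percent / 100)


def sum_incoherent(*levels_db: float) -> float:
    """Uncorrelated components: their powers add."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


def sum_coherent(*levels_db: float) -> float:
    """In-phase components (worst case): their amplitudes add."""
    return 20 * math.log10(sum(10 ** (level / 20) for level in levels_db))


print(f"0.139% THD = {percent_to_db(0.139):.1f} dB")
print(f"0.003% THD = {percent_to_db(0.003):.1f} dB")
print(f"57 dB + 20 dB: {sum_incoherent(57, 20):.3f} dB incoherent, "
      f"{sum_coherent(57, 20):.3f} dB coherent")
```

A component 37dB weaker than another moves the total by roughly a tenth of a dB at most, which is why the microphone and pre-amp contributions can be ignored.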

Auditory memory

This is a whole field of psychology about which I know nothing. I suspect, however, that even ABX testing probably doesn’t eliminate a reliance on it. But I can say that my test procedure eliminates any reliance on it.

Format exclusions

I have no interest in lossy compression file formats such as MP3, and I see only one potentially valid use for DSD: archiving (which is what it was invented for). As a consumer format it’s stupid, as it includes broadband ultrasonic noise that’s completely dissociated from the music anyway. If there were a technical way of performing a difference test with a DSD Reference track, I would, but AFAIK there isn’t, so DSD is excluded. MQA and all the others are of no interest to me either. So my interest for this experiment is purely in comparing so-called high-res PCM files against Red book CD. 88.2, 96 and 176.4kHz formats were also excluded, but only for the reasons set out below under “Why 192kHz?”

Inspiration and motivation

My initial inspirations for these tests came from here [4] and here [5]. These pages discuss analogue methods of inverting and summing to either see or eliminate a difference. The original work was probably done by PJ Baxandall [6] in 1977. Digital differencing was first proposed by Dunn and Hawksford [7] in 1991. A more immediate motivation, however, was the appearance of the “listening test” in recent builds of that playback software.
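The invert-and-sum idea translates directly to the digital domain. Here is a minimal sketch with NumPy, using a synthetic 440Hz tone as a stand-in for a music track (the function name and test signal are my own, for illustration only). It also shows why level matching matters: a mere 0.01dB gain mismatch leaves a residual around 59dB below the signal.

```python
import numpy as np


def null_test(a: np.ndarray, b: np.ndarray) -> tuple[np.ndarray, float]:
    """Invert b, sum it with a, and report the residual level in dB re a."""
    difference = a + (-b)  # polarity inversion followed by summing
    ref_power = float(np.mean(a ** 2))
    diff_power = float(np.mean(difference ** 2))
    if diff_power == 0.0:
        return difference, float("-inf")  # a perfect null
    return difference, float(10 * np.log10(diff_power / ref_power))


fs = 44_100
t = np.arange(fs) / fs
music = 0.5 * np.sin(2 * np.pi * 440 * t)  # stand-in for a music signal

_, identical = null_test(music, music.copy())
_, mismatched = null_test(music, music * 10 ** (-0.01 / 20))  # 0.01 dB gain error
print(identical, round(mismatched, 1))
```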

Software used

Please note – I neither endorse nor recommend the installation of any of the following software. Do so at your own discretion. For example, Spek grabs certain file associations that then need to be reversed, which is very annoying.

  • sacd_extract.exe – to retrieve DSF files from my DVD+R archive (freeware)
  • Korg Audiogate – an old version, no longer available, that allows me to convert a selected DSF file to 192/24 WAV (obsolete freeware)
  • r8brain PRO – to resample the PCM test files (freeware)
  • Audacity – to trim down the test tracks (freeware)
  • Spek – Audio spectrum analyser (freeware)
  • Audio DiffMaker – used as comparator software (freeware)

From the Audio Diffmaker Help: “A difference recording that you get from Audio DiffMaker is an absolute difference – anything whatever that is different between the two tested recordings will show up in the difference recording (or “difference track”).  Only if the original tracks are essentially the same will the difference track be left without any of the original recorded sound.”

Why 192kHz?

Sample rate conversion:

I initially chose 176.4kHz for the sample rate of the reference tracks in my experiment to pre-empt an argument that “distortions were introduced by your conversion filters”. It would be a spurious argument anyway, because any such distortions introduced in the up-conversion step could only increase the audibility of the difference track. Also, when down-converting (from the reference track to produce the intermediate Red book file), starting at 176.4 gives an integer down-sampling ratio of 4, so the resampler would not need to interpolate between samples and it could not be argued that the intermediate Red book file is in some way “corrupt”. And again, such an argument would be spurious in any case, because added distortions could only increase the audibility of the difference track. (An integer ratio does not remove the need for an anti-aliasing low-pass filter, though: any content between 22.05kHz and 88.2kHz would otherwise fold back into the audible band.)
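The integer-versus-fractional point can be made exact with Python’s `fractions` module. A small sketch listing the Nyquist frequency of each common rate and its exact ratio to 44.1kHz:

```python
from fractions import Fraction

CD_RATE = 44_100


def nyquist_hz(fs: int) -> float:
    """Highest frequency a sample rate can represent: half the rate."""
    return fs / 2


def ratio_to_cd(fs: int) -> Fraction:
    """Exact down-sampling ratio from fs to the Red book rate."""
    return Fraction(fs, CD_RATE)


for fs in (44_100, 88_200, 96_000, 176_400, 192_000):
    r = ratio_to_cd(fs)
    kind = "integer" if r.denominator == 1 else "fractional"
    print(f"{fs} Hz: Nyquist {nyquist_hz(fs):.0f} Hz, ratio to CD {r} ({kind})")
```

The 44.1kHz family (88.2, 176.4) divides down by whole numbers, whereas 192kHz to 44.1kHz is 640/147, so a resampler has to compute new sample values between the original ones.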

Be that as it may, the Audio Diffmaker software did not work with any 176.4 material that I tried, so I tried 192 and finally got a result that said “Sample Rate error is 0 ppm, adjustment unnecessary”.

If the experiment confirmed my position that no audible difference exists between Red book CD and 192/24, then there would be no need to experiment with other sample rate Reference tracks.  The point would be made.

Bit depth conversion:

As I understand it, bit depth has nothing to do with “resolution” if that word is interpreted to mean “musical detail”, “spatial information” or any of the usual audiophile terms. Lowering the bit depth simply raises the noise floor [9]. It’s a measure of dynamic range. 16 bits provides 96dB (familiar number?) whereas 24 provides 144dB, perhaps useful for recording engineers (I don’t know or really care), but not very useful for the consumer. Either format can cause very rapid and permanent hearing impairment when exploited to its full potential. 96dB will do it at about 30 minutes’ exposure per day. 144dB will do it instantly! And either way, 30dB of it is buried inside the background noise of what used to be that quiet listening space!
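The 96dB and 144dB figures come straight from the number of quantisation steps. A short sketch showing both the plain ratio of full scale to one step and the often-quoted textbook “6.02 × bits + 1.76 dB” signal-to-noise ratio for a full-scale sine:

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantisation step, in dB."""
    return 20 * math.log10(2 ** bits)


def sine_snr_db(bits: int) -> float:
    """Textbook SNR of a full-scale sine against quantisation noise:
    the familiar 6.02 * bits + 1.76 dB."""
    return 20 * math.log10(2 ** bits * math.sqrt(1.5))


for bits in (16, 24):
    print(f"{bits} bits: {dynamic_range_db(bits):.1f} dB range, "
          f"{sine_snr_db(bits):.1f} dB sine SNR")
```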

I don’t know anything about interpolation errors (or lack thereof) on bit depth conversion.  All I can do is show my test result.

As an aside, the r8brain software used for resampling the files uses flat dithering (no noise shaping) to convert the quantisation noise to white noise on the bit-depth down-conversion step, not that it matters for this experiment.  Any such conversion “noises” whether quantisation (buzzy), white (less offensive), or green with pink polka dots could only add to the audibility of the Difference tracks.
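For anyone curious what flat dithering does on the bit-depth step, here is a minimal sketch. It assumes triangular (TPDF) dither of ±1 LSB, which is spectrally flat; I have not checked that this is exactly what r8brain does. A 1kHz tone smaller than half a 16-bit step vanishes entirely when simply rounded, but survives (buried in white noise) when dithered.

```python
import numpy as np

LSB = 1.0 / 32768  # one 16-bit step for samples scaled to [-1.0, 1.0)


def to_16_bit(x: np.ndarray, rng: np.random.Generator, dither: bool = True) -> np.ndarray:
    """Requantise floating-point samples to 16-bit integers.

    With dither=True, triangular (TPDF) noise spanning +/-1 LSB is added before
    rounding; its spectrum is flat (no noise shaping)."""
    noise = (rng.random(x.shape) - rng.random(x.shape)) * LSB if dither else 0.0
    codes = np.clip(np.round((x + noise) / LSB), -32768, 32767)
    return codes.astype(np.int16)


fs = 44_100
n = np.arange(fs)
probe = np.sin(2 * np.pi * 1000 * n / fs)
quiet = 0.4 * LSB * probe  # a tone smaller than half a 16-bit step

rng = np.random.default_rng(0)
plain = to_16_bit(quiet, rng, dither=False)
dithered = to_16_bit(quiet, rng, dither=True)

# Correlating with the probe tone shows whether the signal survived.
print(np.count_nonzero(plain), np.mean(dithered * probe))
```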

Methodology

  • I selected high-res tracks from my music library: ones that I thought had high-frequency content, such as jazz with cymbals or classical with violins
  • If a track was originally from an SACD, I found it in my DVD+R archive, extracted the stereo DSF files with sacd_extract.exe, and converted the file of interest to 176.4/24 WAV (later 192/24, as explained under “Why 192kHz?”) using Korg Audiogate to produce a Reference track with ultrasonic content (it didn’t matter whether the Reference track faithfully represents the DSD material; it’s my PCM reference)
  • I viewed the Reference track with Spek. This is a very useful program and can tell you almost immediately whether something comes from a PCM recording and/or master. Here is an early test (of a 192kHz Reference track file) prepared from something in my library, indicating that the DSD layer was made from a 96kHz recording (and the liner notes confirmed it). Interestingly, the reason I used the Korg software is that it produces a full ultrasonic spectrum, unlike dbPoweramp, which produces files that look like this despite being told to produce 192kHz files from DSD! →
  • Here is an interesting one from a hybrid SACD. Its SACD stereo section is clearly hard-filtered at 22.1kHz, so it might as well not be there. Needless to say, tracks from such albums were excluded from my experiment →
  • I converted the Reference tracks to CD Red book quality (44.1/16 WAV) using r8brain PRO (linear phase setting) to produce a Red book version. Note that the actual Red book layer of the SACD, or a cheaper 44.1/16 download of the same album, is of no interest to this experiment, which must by definition exclude any mastering differences
  • I again used r8brain PRO to up-sample the Red book version to the original format of the Reference track, producing a Compare track
  • I then opened Audio DiffMaker, left everything at the default settings, loaded the two files to be compared and generated the Difference file
  • The above step mostly failed on the fine alignment of the tracks, but I finally got a result at 192/24 for a short 6 second fragment of a track after trimming it down with Audacity
  • I played the difference file in Audio DiffMaker
  • I loaded the difference file into Spek and generated a spectrum
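The conversion chain in these steps can be sketched in a few lines, with big caveats: SciPy’s polyphase resampler (`scipy.signal.resample_poly`) stands in for r8brain PRO, its default filter is far gentler than a linear-phase r8brain setting, the TPDF dither is my assumption, and the test signals are synthetic tones, not the Telarc track. So this shows the shape of the Reference → Red book → Compare → difference procedure, not my actual numbers.

```python
import numpy as np
from scipy.signal import resample_poly

FS_REF = 192_000          # Reference track rate
UP, DOWN = 147, 640       # 44_100 / 192_000 in lowest terms
LSB = 1.0 / 32768         # one 16-bit step


def to_red_book(reference: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """192 kHz float -> 44.1 kHz, then 16-bit quantisation with TPDF dither."""
    cd = resample_poly(reference, UP, DOWN)
    noise = (rng.random(cd.shape) - rng.random(cd.shape)) * LSB
    return np.clip(np.round((cd + noise) / LSB), -32768, 32767) * LSB


def to_compare(red_book: np.ndarray) -> np.ndarray:
    """Up-sample the Red book version back to the Reference rate."""
    return resample_poly(red_book, DOWN, UP)


def residual_db(reference: np.ndarray, compare: np.ndarray, trim: int = 2000) -> float:
    """Difference level relative to the Reference, ignoring filter edge transients."""
    ref = reference[trim:-trim]
    diff = (reference - compare)[trim:-trim]
    return float(10 * np.log10(np.mean(diff ** 2) / np.mean(ref ** 2)))


t = np.arange(FS_REF) / FS_REF                       # one second
audible = 0.5 * np.sin(2 * np.pi * 1_000 * t)
with_ultrasonic = audible + 0.1 * np.sin(2 * np.pi * 40_000 * t)

rng = np.random.default_rng(1)
for name, ref in (("audible only", audible), ("plus 40 kHz", with_ultrasonic)):
    compare = to_compare(to_red_book(ref, rng))
    print(f"{name}: difference {residual_db(ref, compare):.1f} dB re Reference")
```

With the ultrasonic tone present, the difference is dominated by exactly that tone, which is what the round trip is meant to isolate.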

Test result

(Telarc):

  • Foreword: Quite frankly I was expecting to see differences of around -100dB per Reference 1, but the result here immediately had me say to myself: “you went into far too much analysis of the Robinson-Dadson chart above and waffled unnecessarily about masking.  This just wipes it all away”.  There is no audible difference.  😛

This track has some of the most ridiculously high dynamic content of anything in my library. For that matter, it’s from one of the most ridiculous albums I have, Telarc’s “Scary Music” SACD, and it’s the opening SFX sequence with thunder that shakes the house! Moreover, it was selected for its dynamic range, to see whether jettisoning 8 bits from the bit depth showed up anything audible from the increased noise floor in the down-converted version. This was first trimmed down in Audacity to get it under 1 minute (the track time limit for the free version of r8brain PRO). I had lots of trouble with Audio Diffmaker and had to trim the sample down further, so I tried just a small bit to get the thunder clap, and that’s the one that worked:

The spectrum of the 192/24 Reference track segment:

The thunder clap is there at the 1 second mark with high intensity infrasonic content as well as some ultrasonic content.  Very obviously much of the ultrasonic content is just dithered noise (I used Korg “Aqua” noise shaping dither to shift it up there), but there is clearly ultrasonic program at around the 1 second and 6 second marks.

Apparently “legitimate high-res” (oxymoron?) with a broad spectrum. The back inset with the disc says “This Telarc Super Audio Compact Disc is produced exclusively from Direct Stream Digital™ masters made during the recording sessions”, which seems like very clever wording, but I’m sure it must have seen PCM at some stage, since it also says “BEWARE DIGITAL SOUND EFFECTS”. It doesn’t matter; I’m not testing DSD. It’s a legitimate 192/24 Reference track for the purpose of the experiment.

The downsampled 44.1/16 segment:

Converted back to 192/24:

All ultrasonics gone.

Aligned by Audio Diffmaker:

Notice the short alignment gap at the beginning.  Diffmaker produces this as part of its processing.

Audio Diffmaker reported: “parameters: 0sec, 0.000dB (L),  0.000dB (R)..Corr Depth: 134.1 dB (L), 131.7 dB (R)”.
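DiffMaker reports a time offset and a gain for each channel. The general idea can be sketched with cross-correlation for the offset and a least-squares fit for the gain. This is only an illustration of the principle, not DiffMaker’s algorithm: it finds whole-sample offsets only, uses a circular shift for brevity, and makes no attempt at the sample-rate drift correction that DiffMaker also reports on.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags


def align(reference: np.ndarray, compare: np.ndarray) -> tuple[int, float, np.ndarray]:
    """Estimate a whole-sample time offset and a single gain factor that best
    match `compare` to `reference`, and return the aligned, gain-matched copy."""
    xcorr = correlate(compare, reference, mode="full")
    lags = correlation_lags(len(compare), len(reference), mode="full")
    lag = int(lags[np.argmax(xcorr)])  # positive: compare lags behind reference
    shifted = np.roll(compare, -lag)   # circular shift keeps the sketch short
    gain = float(np.dot(reference, shifted) / np.dot(shifted, shifted))  # least squares
    return lag, float(20 * np.log10(gain)), gain * shifted


rng = np.random.default_rng(2)
reference = rng.standard_normal(10_000)
compare = 0.9 * np.roll(reference, 37)  # 37 samples late and about 0.9 dB quieter

lag, gain_db, aligned = align(reference, compare)
print(lag, round(gain_db, 3), np.max(np.abs(reference - aligned)))
```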

The difference spectrogram:

I’d call that black rectangle at the bottom of the spectrogram the “black hole of silence”. It covers more than the entire range of human hearing for the duration of the segment, and my position is that the black hole, together with the ultrasonic spectra above it, is silent to humans. The only way to find out is to play it, preferably directly to a 192/24 capable DAC. It is silent! Notice that the colour legend bar only reaches down to -120dB, so it corroborates what DiffMaker reports. Play it with “super tweeters” to a cat. Maybe the cat will hear something, but whether it would appreciate it is another matter.

This was the only test that I performed that did not cause Audio Diffmaker either to crash or to report excessive sample misalignment. Quite by coincidence, it aligned the samples perfectly. Perhaps this is why my phenomenally low difference level (more than 130dB down) is even lower than those reported in Reference 1, where some minor misalignment remained.

Notice the very short noise at the start (bottom left). This is a remnant of the alignment gap when the difference file is generated, and it can be heard at the start of the WAV file as a brief chirp. The remainder of the difference track is more than 130dB down, so it cannot be heard, even with my volume control turned to maximum. Even after editing the chirp out, converting to FLAC and playing it on my hi-fi at maximum (where I would never go), there is nothing to be heard. People who think they hear things in headphones that can’t be heard from speakers: try it. It’s still SILENT!

Here is the download link for the difference track: WAV difference file


  • Certain people attempting to fund their retirements with a new “high-res” streaming scam have claimed that the ultrasonics somehow leap into the audible spectrum to produce a “microstructure of sounds” that is “very critical”. By what freak of nature? By what miracle of pseudo science? Play the file! I certainly hear no “microstructure”.
  • Note: In my early experiments I was using a popular media player to show levels and suspected that its meters were giving false readings. And sure enough, the file above, when converted to FLAC and played back with that software, makes it display a reading of around -60dB, although it is completely inaudible. It is wrong! It can be turned up to 100% and still nothing is audible.

I’m not interested in repeating the test on other tracks for verification. This has already been done at Reference 1, and I’ve spent far too much time on this already. The only further test I might attempt is to get a harpsichord track past Diffmaker’s limitations. Harpsichords produce amazing harmonics that extend well past 20kHz. Here is one that somehow jumped off a non-hybrid SACD on the Alba label →

Conclusions

  • doot-doot-doot...“High-res” PCM is a load of audiophile BS. There is no audible difference whatsoever. At more than 130dB down, these extremely low levels of difference are too far below any known minimum audibility threshold to even be considered funny, let alone worthy of any masking analysis! I can therefore embellish what I used to say to the subjectivist audiophiles to something more along the lines of: “You cannot hear the difference unless you’re non-human, or have some kind of bionic fantasy feature implanted in your head, but in which case you’d still need to hear it past this”:

😆

  • There is no reason that the test result found here wouldn’t apply to other “high-res” formats such as MQA, and I see no reason that it wouldn’t also apply to DSD given that Reference 2 included DSD.
  • If listening assessments of level-matched tracks of the formats considered here are made successively, or in an ABX manner, and a difference is perceived, then the perception is either placebo, or caused by something other than the format.
  • To attempt to hear the difference while comparing music tracks, the volume control would need to be cranked further than it is capable of turning, the amplifiers would clip, the speakers would disintegrate and by that time, the listener would be deaf, so wouldn’t be hearing ANYTHING ever again, let alone a difference! 😆
  • The “listening test” provided with “that playback software” might as well compare bit-identical tracks for people to report their placebo susceptibility.  The developer’s resources might be better spent out-sourcing qualified programmers to de-bug, trim down and re-arrange what has become quite tedious and progressively unstable bloatware.  I mean seriously – they can’t even get a volume slider to work sensibly! 😛 😛 😛
  • The CD-DA standard from way back in 1980 is still more than adequate.
  • The results of Reference 2 remain valid.
  • Maybe Reference 3 was “sponsored” by vested interests.
  • After reading this (assuming they got this far), and despite the fact that the difference is well below the noise floor of their “platinum reference mono blocks”, the “true audiophile” will still ask “but what about the nuances present in the ultrasonics?”  Assuming they’re not actually the Bionic Woman, the obvious answer is “Are you an alien in disguise?” because they’d need special sensory apparatus like this guy (and his whole head is platinum) to appreciate them. →
  • These same people will also insist that they can hear a difference between WAV and FLAC and even between uncompressed and compressed FLAC, so they can pretty much be dismissed as irremediable.
  • If you’re over 60 then you can’t hear over 10kHz, so the sample rate might just as well be 22.05kHz!

Additional observations

One unexpected thing that I found playing around with the Spek analyser is that some albums from one of those “high-res” download sites are not what they are held out to be anyway.  For example, this Grammy Award winning album downloaded (not by me!) as a 96kHz premium priced option is chopped hard at 24kHz, but has weird and uncorrelated ultrasonic crap thrown in! →

And another thing: in all of the above, nobody has even bothered to mention that a typical tweeter naturally rolls off very sharply above 20kHz due to the inductance of its voice coil! So just because something is there in a digital file doesn’t mean it emanates from the speakers at all! Aaaagh!!

Others have practically nothing above a measly 10kHz.  People are (deservedly?) getting ripped off!  This one (a 192kHz disgrace) has emblazoned across the album cover “From Original Analog Master” (which means an absolute maximum bit depth equivalent of about 13 and more probably around 10 or 11) and something suggesting it is “double” HD!  Big whoop – here’s what you get – nothing over 10kHz apart from a continuous noise at approx 17kHz, -100dB that nobody ever noticed anyway →
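Spek shows this at a glance, but the same check can be roughed out numerically. A minimal sketch using Welch’s method (`scipy.signal.welch`) to report the highest frequency that rises above a floor relative to the strongest component; the -80dB floor is an arbitrary assumption, and stray noise such as that 17kHz whine or ultrasonic junk will count as “content”, so the number should be read alongside a spectrogram.

```python
import numpy as np
from scipy.signal import welch


def effective_bandwidth_hz(x: np.ndarray, fs: float, floor_db: float = -80.0) -> float:
    """Highest frequency whose averaged power is within floor_db of the
    strongest component, estimated with Welch's method."""
    freqs, psd = welch(x, fs=fs, nperseg=8192)
    level_db = 10 * np.log10(psd / psd.max() + 1e-30)
    above = np.nonzero(level_db > floor_db)[0]
    return float(freqs[above[-1]]) if above.size else 0.0


fs = 96_000
t = np.arange(fs) / fs
# A nominal 96 kHz file whose content stops at 20 kHz.
x = np.sin(2 * np.pi * 1_000 * t) + 0.01 * np.sin(2 * np.pi * 20_000 * t)
print(effective_bandwidth_hz(x, fs))
```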

There’s one commercially motivated school of “thought” – a dodgy record label, PayIX (or something), that professes 96kHz as an ideal compromise. Well, if 192kHz adds “nothing”, then 96 adds half of nothing!

Questions

  1. How does a “trained listener” hear silence over an audio track?
  2. How is a “trained listener’s” hearing any better than that which more than 2 million years of evolution has given the general population?
  3. And if they can’t hear it, then how can they do so “over extended listening periods”?
  4. How do they train people to hear nothing in the first place?
  5. And again, what about MQA? Unfold compressed streamed files into bigger ones of reconstituted “authenticated” nothingness? Compatible licensed decoding hardware for what exactly?

References

  1. Mitchco’s difference test (link was broken)
  2. AES A/D/A study
  3. AES meta-analysis
  4. ESP’s Sound Impairment Monitor
  5. Halfgaar’s interconnect comparison
  6. Baxandall P, “Audible amplifier distortion is not a mystery”, Wireless World, November 1977
  7. Dunn C and Hawksford M, “Toward a Definitive Analysis of Audio System Errors”, presented at the AES 91st Convention, New York, October 1991
  8. Monty Montgomery’s video

For obvious reasons, comments are closed for this page, but by all means platinum eared “believers” are free to bag me anywhere else they like. 😛