Did Sony’s Chief Sound Architect just tell me they can’t hear or measure the difference between upsampled 248k files and true high-res audio?*

Yes he did. It seems an extraordinary statement, given this would imply that genuine high-res files offer little benefit over the compressed low-res files we’ve been complaining about all these years. If an upscaled 256k file from, say, iTunes, can sound indistinguishable from a genuine high-res file, then why pay a premium for genuine high-res?

And it’s doubly bizarre coming from Sony, which has made high-res audio such a key plank in its hi-fi strategy for the last four years, and had (declaration, see end of this article) flown us all the way to Berlin to show us their latest products in this area.

Sony engineers

The comment came in response to a series of questions we pitched to a panel of assembled Sony engineers, a rare and valuable opportunity to talk at this level.

The four engineers on the platform in Berlin were:
- Koji Nageno, Sony’s Chief Sound Architect (focused on headphones);
- Ryo Oba, Electrical Engineering Manager in charge of HW platform design for Walkman;
- Hideaki Shiobara, Electrical Engineering Manager focused on Sony’s recent high-res ES models including the new TA-ZH1ES; and
- Shogo Yashiro, Art Director on Audio Products.
The panel was moderated by Leon Pereira of Sony Electronics Asia Pacific (seated left).

Quite the opportunity, then, to ask those questions that might not be answerable locally, and would never make it through the long chain of referral from PR company to local Sony team to regional to head office and back again — a path through which an answer to a tricky technical question is never, as they say, gonna happen.

That had been exactly my experience a year earlier, when trying to work out Sony’s LDAC, a proprietary codec which… well, let’s not go there now; you can read that story here, and I was delighted to get proper confirmation on LDAC’s abilities direct from the engineers in Berlin.

But it was a second set of questions which brought the most interesting answer, based on a short paragraph I’d seen in a hand-out that day for the MDR-1000X headphones, which read: “These are also the first headphones with the DSEE HX built in to upscale compressed music from any source to near High-Resolution Audio sound quality, even in wireless mode.”

I took this to be simply an overzealous marketing statement. What the heck does “near High-Resolution Audio” mean anyway? How near is “near”?

And as for making low-res files sound like high-res audio, well, please. We’ve seen so many ‘MP3 restorer’ circuits in recent years — they have become one of the great nonsenses of hi-fi.

I did not realise that this was in fact no casual marketing sentence in a press release — there is a dedicated webpage for this claim, explaining Sony’s DSEE HX upsampling (DSEE = Digital Sound Enhancement Engine) in more detail, with a graph and all. So not mere marketing at all; in fact this seems to have come straight from the audio team.

The website is here: http://www.sony.com/electronics/play-music-library-near-hi-res-audio-sound-quality - and here's the graph.

DSEE HX - graph by Sony

And here’s a longer quote from that page: 
“Sony DSEE HX™ software upscales your existing sound source (those lossy MP3s or AACs) to near high-resolution sound quality. This means that the technology injects more life into your music by upscaling compressed files. So it restores the subtleties of the original song recording and simulates the sound of live performance.”

Still sounds like marketing nonsense, doesn’t it? So expecting a sympathetic response from these top-level engineers, I put my hand up and asked about it.

Jez (Sound+Image): You guys as engineers, you're about purity and the purity of high-res audio. So how do you feel when you see a claim that any music file can be upscaled to near high-res audio quality? Do you believe that?

Some discussion followed before an answer. Koji Nageno, Chief Sound Architect, had already explained in his answers to my earlier LDAC questions what they meant by “near high-res audio”. When they use DSEE HX to upsample the 990kbps Bluetooth transmission of LDAC and then compare the waveform and listen, they can’t see or hear a difference, he had said. Now he continued the theme.

Koji Nageno: As I explained, our experience, our test method, the waveform and listening to the image of music, it has no difference. But if we cut off the higher frequency and then compare the sound, we can feel the difference. So we believe it is near high-res audio.

Sound+Image: So if we can upscale any file to high-res audio, we do not need high-res audio.

Koji Nageno: Oh. [Pause.]

Sound+Image: Clearly you can't. I mean you can't upscale an MP3 to high-res audio, and you say you can't hear the difference? Then we don't need high-res audio.

Koji Nageno: It depends on the original sound quality. Of course if the original MP3 rate is too low it is impossible to upscale to very very high quality. But we can improve to some extent.

Sound+Image: So what file can you upscale to near high-res audio? What level — 256, 320, CD quality? What file can you not hear the difference? Really, this is a statement that is damaging to high-res audio.

[Long group discussion.]

Koji Nageno: Sorry. DSEE HX upscales, but original bit-rate should be 248kbps, can be upscaled to near high-res audio result. So if original sound source is lower than that it can't be near high-res.

Sound+Image: Wow. So Sony says that 256k can sound like high-res audio.

Koji Nageno: Near high-res audio, yes.

[Full transcript here.]

So a bit of a shocka, we thought. It’s one thing to read this stuff as a marketing claim. It’s quite another to have this claim repeated face to face from a Sony Chief Sound Architect. So Sony may be championing high-res audio, but is it simultaneously saying that really, there’s no need to bother with actual high-res files?

I resisted the urge to rush out a splashy headline on Twitter, Snapchat, Tinder etc. One of the benefits of working for a magazine with a print heritage is having a bit of time to pause for thought. In the back of my mind I was stuck at the ‘Wow’ moment, and a niggling question: why would they say that?

On reflection
Back home in Australia, a bit of space to think. Why oh why would they say that?

Well on reflection, I could think of only one reason for such a senior and experienced and successful audio engineer to say such a thing. It must be true.

My cynicism, after all, is based on the numbers not adding up, a mouth-open-aghast reaction to the possibility of the whole high-res audio thing being, as we like to say in Australia with a suitable tap of the nose, a furphy.

But these guys have, as Dr Karl would say, done the experiments, the trials and the measurements. And this is the conclusion they have reached. Mmm. Well, if this is so, then Koji Nageno’s comments and Sony’s DSEE HX website actually become the polar opposite of overzealous marketing. Instead they are a remarkably honest reporting of a potentially disconcerting discovery — they really can’t tell the difference.

There is one important caveat to bear in mind here — Nageno-san is not saying they can’t hear the difference between 256k and 24-96, he’s saying that Sony’s DSEE HX upsampler delivers results which they cannot distinguish from 24-96 by waveform or listening.

OK, so let’s accept that.

Moving on, then. If this be true, what follows?

Surely it follows that high-res audio must be massively over-specified. That if upsampling can match a high-res original, then there can’t be much, if any, useful additional information in all those extra bits. So are we wasting a huge amount of space and speed?

This isn’t a new argument, though most others on the subject have set the bar for ‘doesn’t make a difference’ somewhat higher than 248k. My esteemed colleague Stephen Dawson has long argued that anything over 20 bits is just the laborious encoding of nothing. And isn’t this the fundamental idea behind the impressive compression ratio of MQA, which Bob Stuart describes as “protecting the signals above ‘acoustic absolute zero’”? There’s a nice quote in the context of MQA from TAS’s Robert Harley: “High sample rates create a massive container for the music (a 96/24 or 192/24 file) that is largely wasted bits. It’s like shipping a paperback book in a box the size of a filing cabinet.”
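To put rough numbers on the size of that container, here is a minimal sketch of the arithmetic. It assumes plain two-channel PCM before any lossless packing such as FLAC, and takes the 248kbps figure quoted by Koji Nageno as the lossy reference; the function and variable names are illustrative only.

```python
# Raw (uncompressed) stereo PCM bit rates versus the lossy rate quoted in the panel.
# Assumption: two channels, no lossless packing (FLAC etc.) applied.

def pcm_kbps(bit_depth: int, sample_rate_hz: int, channels: int = 2) -> float:
    """Uncompressed PCM bit rate in kilobits per second."""
    return bit_depth * sample_rate_hz * channels / 1000

LOSSY_KBPS = 248  # minimum source rate quoted by Koji Nageno

formats = {
    "CD (16-bit/44.1kHz)": pcm_kbps(16, 44_100),
    "High-res (24-bit/96kHz)": pcm_kbps(24, 96_000),
    "High-res (24-bit/192kHz)": pcm_kbps(24, 192_000),
}

for name, kbps in formats.items():
    ratio = kbps / LOSSY_KBPS
    print(f"{name}: {kbps:.1f} kbps, {ratio:.1f}x the 248k file")
```

On these assumptions a raw 24-bit/96kHz stream carries about 18.6 times the data of the 248k file, and even CD carries about 5.7 times as much. Real-world high-res downloads are usually losslessly compressed, so the gap on disk is smaller than these raw figures suggest.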

So it’s certainly not just Sony questioning the whole validity of high-res audio.

The value of over-specification
I have long liked the phrase ‘Anything’s good enough for music’. I remember years ago sitting on a mountaintop in the middle of nowhere with a portable shortwave radio tuned to John Peel on the BBC World Service, and from that whistly low-res signal played through a two-inch speaker I got every bit as much musical pleasure as ever I have known from the highest of hi-fi. Anything is good enough for music.

But come on. Endless studies and blind listening have confirmed that people can distinguish high-res from CD and certainly from lower formats… Are they all wrong? Surely not.

Yet clearly there is some level at which we begin over-specifying the space in our container.

So how much does over-specification matter?

Some years back Dee (who has asked to no longer be called merely the missus) and I undertook extensive renos on our humble home. We wanted to remove truss supports from an attic space, in order to create a nice loft music room… er, I mean guest room. Our engineer’s solution was to put a whole grid of steel beams through the whole house, which we have often been told were significantly overspecified (most regularly by the builder who had to get all the steel up our steps).

But how close do you want to get to the correct specification? Too close and the next surprisingly-overspecified Act of God might bring the house down, in an unpleasantly literal way. I’d prefer a little overspecification in this instance.

So, OK, we know there may be a cut-off point, where our music files begin to be overspecified for the information within. But we don’t know exactly where it is, and indeed we do know that it varies with the recording, because clearly some songs have more information than others (insert your own Genesis vs Britney joke here). Remember that Sony made a big point of saying that some information-rich recordings could indeed be distinguished from the upsampled lower-res version. That is the very reason why they say “near high-res audio”. It doesn’t always work; there are exceptions.

So what must we do, to be sure our high-res house doesn’t fall down? We over-specify. We use excessively large containers to ensure that we never spill a drop of information. Indeed one might say that high-end hi-fi in general is absolutely driven by enthusiastic overspecification. Isn’t that the point, to catch every drop? Up to the point of affordability, anyway.

So it makes sense
So, there we have our justification for high-res audio — and it was inherent in the very creation of Sony’s phrase “near high-res audio”. By “near”, they mean “nearly, but not always”. Sony’s DSEE HX upsampler can produce results from files of 248k upwards which are usually but not always indistinguishable from high-res, partly no doubt because of the quality of Sony’s upsampler, and partly simply because there’s so much redundant information in high-res files.

But if you want to avoid that “nearly” tag, if you want to make sure you always catch that extra drop, then stick with the actual high-res. It seems a fair and useful conclusion.

It would make for an interesting comparison - say a high-res file, a 320k MP3 compression of it, and the 320k file played back through Sony's DSEE HX system. Can you tell Sony’s “near high-res” from actual high-res?

The answer, I suspect, may often be no.
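For anyone wanting to try that comparison blind, one common way to score it is to count correct identifications and ask how likely that score would be from guessing alone. The sketch below is only an illustration of that idea, not Sony's test method: the 16-trial session and the function name are hypothetical, and it assumes each trial is an independent coin-flip for a listener who hears no difference.

```python
from math import comb

def guess_probability(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by pure guessing (p = 0.5)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Hypothetical session: 16 trials, each asking "actual high-res or DSEE HX?"
TRIALS = 16
for correct in (8, 12, 14):
    p = guess_probability(correct, TRIALS)
    print(f"{correct}/{TRIALS} correct: probability by guessing = {p:.3f}")
```

Scoring 12 out of 16 would happen by luck a little under 4% of the time, so a result like that would suggest a real audible difference; a score near 8 would not.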

--------------------------------------------------------
 

* Note, importantly, that Koji Nageno had said some rare files could be differentiated: “...Some risk maybe for deep bit-depth and high frequency sound — in that kind of case it has some possibility to make a loss…” But “normal music, sound wave check and listening check, there's no difference.” Hence their use of the term “near high-res audio” — nearly, but not always. There are exceptions.

The transcript of our Q&A and a video of the whole Q&A (via the well-informed Ken Ho of PMR Reviews and Head-Fi http://pmrreviews.com/) are available here.

And one final note of declaration — Sound+Image travelled to IFA in Berlin as a guest of Sony Australia. We consider it a rare privilege to be able to ask Sony’s highest engineers such questions, and we made a point of thanking them for the opportunity.