Patrick Griffis is Vice President Technology in the Office of the CTO at Dolby Laboratories, and is also a fellow of SMPTE, in both roles closely involved with the new video standards governing High Dynamic Range and enlarged colour spaces. He recently visited Australia and we enjoyed an interview with him following an LG event which launched the company’s 2016/7 range of televisions, including OLED, HDR 10 and Dolby Vision-compatible models. Sound+Image was represented by both Jez Ford and Stephen Dawson.
JEZ FORD: So the LG roundtable discussion on HDR today was highly illuminating...
PATRICK GRIFFIS: Highly illuminating, very good!
JEZ: So Dolby Vision versus HDR 10 — can you encapsulate the differences for us?
PATRICK: You mean Dolby Vision versus ‘entry-level HDR’? No I’m just joking... Because Dolby Vision is an option in the Ultra HD Blu-ray spec, but the base layer is HDR at 10 bits, with this so-called ‘static metadata’ that just defines something about the entire content and the environment it’s mastered in. What Dolby Vision does is take the 10 bits and increase them to 12-bit precision.
And by the way, HDR 10 uses SMPTE’s standard for this next generation of electro-optic transfer function (EOTF), aka gamma. It’s the new gamma, but it’s called Perceptual Quantizer (PQ) because it’s based on the human visual model. Gamma, of course, is a legacy of CRT physics. It works pretty well, but it wasn’t the way we would have designed it — it’s what the early cathode ray tubes did, and at low light levels it roughly approximates the human eye. But as you get brighter, as we move to High Dynamic Range, it actually isn’t a very good representation. So we created a new standard. All six studios participated — the Hollywood studios, I know there are studios outside Hollywood! — and settled on two things.
First, yes, we use this curve that models the human visual system. And then the next question is, how bright? What should we use as the ideal range?
You know no one had ever talked about it before, because for 50 years we knew what white was — it’s a hundred nits, because that’s what we did for CRTs, right? And for over a hundred years we knew what bright was for film — it’s called 14 foot-lamberts. And how was that picked? Because when they did the first cinema projectors, what was the brightest light bulb they could put in before they melted the film? And they stepped back and that’s the standard for film. Not because there is some magic about a dark room, all that crap about dark rooms and you don’t need it, no — melting film.
And then 24 [frames per second] was not the engineers’ choice, it was the businesspeople’s choice. Because they saw film spooling on the floor, and they said what’s the slowest frame rate we could go at and call it film. Then we live with that for a hundred years.
You know, typically by the time you get a film print you’re lucky to get seven ‘f’ stops of dynamic range. But as human beings, every time you open your eyes you see high dynamic range. That light — fluorescent lights are typically 8000 nits. You can see under that chair in the blacks — that’s probably point zero two or three. We see high dynamic range all the time but we haven’t been able to display it.
So now that film is going away and we move to digital, now that technology is improving, we sat down and said, well, what if technology wasn’t the limiting factor? What would we as human beings pick?
JEZ: And you’ve now chosen figures that the industry can grow into, rather than deliver now with current consumer equipment?
PATRICK: So we worked with the studios to develop the standard, and we said OK we’ve found from research that probably 0.05 to 10,000 nits would be a good useful range for entertainment purposes. And heck if we’re going to go down that low, we might as well go all the way to zero, so zero to 10,000 nits. We developed a new curve based on the human visual model — it’s called SMPTE ST 2084, or PQ [Perceptual Quantizer] for short, that now models our visual system through this useful entertainment dynamic range.
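For readers who want to see the curve itself, here is a minimal Python sketch of the PQ transfer function. The constants are the ones published in SMPTE ST 2084; the function names and the sample luminance values are our own choices for illustration, not part of the standard or of any Dolby software.

```python
# Constants published in SMPTE ST 2084 (PQ).
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0  # top of the PQ container


def pq_eotf(signal: float) -> float:
    """Map a normalised PQ signal (0..1) to display luminance in nits."""
    p = signal ** (1 / M2)
    return PEAK_NITS * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def pq_inverse_eotf(nits: float) -> float:
    """Map luminance in nits (0..10,000) to a normalised PQ signal (0..1)."""
    y = (nits / PEAK_NITS) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


if __name__ == "__main__":
    # Sample levels chosen for illustration only.
    for nits in (0.05, 100, 540, 4000, 10000):
        print(f"{nits:>8} nits -> PQ signal {pq_inverse_eotf(nits):.3f}")
```

Note how 100 nits, the old CRT white, lands at roughly the halfway point of the signal range: about half of the code values are spent on the region that SDR already covered, and the rest on everything brighter.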
And now that’s our new aspirational goal, much like CIE 1931 for colours. If we were able to represent all the colours we as human beings can see, we’d say as engineers we’re done — creatives, you decide what you want to do with it. But now we have one also for colour.
JEZ: The colour space plots you showed today were in three dimensions — colour volume. Neat, never seen that before.
PATRICK: Because we’ve never thought about it! I was a TV engineer for many years and we never talked about it, we just assumed. But we’d just cut the top off the colour volume. Now we’re in this brave new world where it’s going to continue to grow.
JEZ: Our columnist Derek Powell did a story when you did the first Dolby Vision monitor with high dynamic range.
PATRICK: The PRM [Professional Reference Monitor], yes. Which, by the way, was mentioned at the Oscars — it got an honourable mention in the sci-tech awards.
JEZ: And you were pitching for 10,000 nits at that point. Is it right that LED-LCD can go into the early thousands of nits, while we gather Dolby made an exception for OLED to carry Dolby Vision even though it’s something like half that level? Why was that?
[ABOVE: A ladder to the light, showing the relative dynamic range of current LCD and OLED TVs within the new PQ range and compared with Rec.709 (i.e. non-4K) TVs. Remember it’s not only how high the arrow reaches, it’s about length (which indicates the total dynamic range)].
PATRICK: Well, the SMPTE spec, 0 to 10,000, is the container. So at 12 bits, that’s 4096 code words, and the idea is that one step in code value — you know one step up in red, one step up in blue, one step up in green, one step up in red green and blue aka white, or one step down — you should not be able to see any difference in the image. Because you want to have enough precision so that regardless of the content you’re not going to get quantisation errors — and quantisation errors is a techie term to say, you know, you’ve seen banding and skies are banding in red... You want to make sure you have enough precision that you’ll never see that. And you need 12 bits to guarantee that in all cases. So that was what we settled on.
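The precision argument can be checked with a rough Python sketch that compares how much luminance changes for a single code-value step at 10 and 12 bits. It assumes full-range code values (code divided by the maximum code) for simplicity, whereas real video signals usually use a narrower legal range; the helper names are hypothetical and it makes no claim about where the visibility threshold actually sits.

```python
# ST 2084 (PQ) constants.
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32


def pq_to_nits(signal: float) -> float:
    """PQ signal (0..1) to luminance in nits."""
    p = signal ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def nits_to_pq(nits: float) -> float:
    """Luminance in nits to PQ signal (0..1)."""
    y = (nits / 10000.0) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


def code_words(bits: int) -> int:
    """Number of distinct code values at a given bit depth."""
    return 2 ** bits


def relative_step(bits: int, nits: float) -> float:
    """Fractional luminance change for one code-value step near `nits`.

    Assumes full-range coding; keep `nits` well below 10,000 so that
    the next code value still exists.
    """
    top = code_words(bits) - 1
    code = round(nits_to_pq(nits) * top)
    low = pq_to_nits(code / top)
    high = pq_to_nits((code + 1) / top)
    return (high - low) / low


if __name__ == "__main__":
    for bits in (10, 12):
        print(f"{bits} bits = {code_words(bits)} code words")
        for nits in (1, 100, 1000):
            step = relative_step(bits, nits)
            print(f"  one step near {nits} nits changes luminance by {step:.2%}")
```

Going from 10 to 12 bits gives four times as many code words, so each step is roughly a quarter of the size. Smaller steps are what keep banding out of skies and other smooth gradients.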
So then think of it this way. We defined a thermometer that goes from zero to 100 degrees. If the temperature is 32, it doesn’t mean it’s bad. By having all the code values at any peak brightness below the threshold of visibility, it means in the future as new content comes along that takes more advantage of that thermometer, the content that you have today is not, you know, obsoleted. So the OLEDs are state-of-the-art today, right, and in fact that 540-nits number is a minimum — the nominal design is much higher with the LG OLEDs, but when we created the UHD Alliance spec numbers it was based on input from LG on a minimum value they’d always guarantee, and then a black level that there’d always be. And content is not mastered for OLED — at Dolby we advocate you master in the largest colour volume available. And state-of-the-art today on the mastering side is a display that Dolby makes called the Pulsar — 4000 nits peak whites, 0.005 blacks, it has full P3 colour and it’s linear, you know, consistent throughout its dynamic range. So when people master for distribution like Netflix or, you know, VUDU in the US, they’ll master and they’ll make what we call a Dolby Vision master at 4000 nits.
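Dynamic range is often quoted in stops, where each stop is a doubling of light. A small Python sketch, using only figures quoted in this interview (seven stops for a film print, 4000 nits and 0.005 nits for the Pulsar), shows how the two ways of stating range relate. The function names are ours.

```python
import math


def stops_of_range(peak_nits: float, black_nits: float) -> float:
    """Dynamic range in stops: the number of doublings from black to peak."""
    return math.log2(peak_nits / black_nits)


def contrast_from_stops(stops: float) -> float:
    """Contrast ratio implied by a number of stops."""
    return 2.0 ** stops


if __name__ == "__main__":
    # Figures quoted in the interview.
    print(f"Film print, 7 stops -> {contrast_from_stops(7):.0f}:1 contrast")
    print(f"Pulsar, 4000 / 0.005 nits -> {stops_of_range(4000, 0.005):.1f} stops")
```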
JEZ: And the EOTF then scales that to whatever you’ve got available.
PATRICK: Bingo, right. And this is where the dynamic metadata comes in. So you master at 4000 nits in real time. And we actually do a Rec. 709 version at 100 nits so the creative can look at both. So if they don’t like the resulting 709 they can tweak that, and that metadata also goes along with it — which no one else does. And so we then send that content to the OLED TV. The scene-based metadata — because the LG OLED TV set has the Dolby Vision implementation — all that metadata comes in and the TV now knows what to do to map it to its colour volume, the 540-nit/0.005. But the master is 4000-nit.
But LG also makes LCDs, and they’re Dolby Vision enabled too. They have a different colour volume — the same content goes to that TV and makes a good picture as well. And we think that’s one of the good design features — you master once and it will render on any device, you know, even these guys [points at his smartphone] some day.
[ABOVE: LG Australia's graphic showing typical luminance levels for OLED and LCD screen technologies]
JEZ: We’ve got 4K smartphones already...
PATRICK: OK, forget the 4K, it’s a waste, you really need eight million pixels? But these are actually 500 nits — five times brighter than conventional TV today, so they would actually benefit from higher dynamic range well before wasting your money putting in more pixels.
And so to answer your question, it’s not that we’re obsoleting the TVs, the whole concept was designed to be future-proof. So the container’s good up to the practical limits of the human visual system for entertainment purposes. You know in the real world the sun is 1.6 billion nits? We figured we didn’t need to go that bright.
JEZ: So how much are your Pulsar monitors? Perhaps we should all be buying those.
PATRICK: Not consumer price friendly! And they’re professional.
STEPHEN DAWSON: The dynamic metadata — you generate that...
PATRICK: In the colour suite.
STEPHEN: And you said ‘scene’ based — by scenes do you mean several seconds of a scene, or do you mean per frame?
PATRICK: It could be at the per-frame level, but if you do scene changes at a 30th of a second you’re going to drive people crazy… So we tend to think of a scene like, you know, I do a shot of you, then I cut over to Cheryl and I cut over to you. You know typically scenes might be one or two seconds if you’re a fast-paced action film, but it might be several minutes depending on the movie. So yes, the precision is on a frame basis, so we could do frame-by-frame scene-based metadata. But typically what happens is I’ll focus on you, right, and think of that colour volume now, and if it’s HD there’s two million pixels — some bright ones at the top of your head, dark ones in your jeans. But if you think of where all those two million pixels are sitting in the colour volume, it’s like a swarm of mosquitoes — on every scene they’re swimming up and down, and if it’s a bright scene they’re up here, if it’s a dark scene they’re down there.
And knowing where they are allows you to do more intelligent mapping — because the trick is you’ve got to stuff all eight million of them, if it’s 4K, in the proper place in the target display colour volume. And it’s not just squeezing them in there. It’s keeping them all in the proper proportion to preserve the artistic intent so that when the cinematographer looks at it he says, yeah that green is the same green that I had in the master. You preserve the tint, you preserve the hues, you preserve the relative colours. It’s a three-dimensional real-time mapping problem, which is one of the things we’ve done really really well, in real time.
So whether it’s a 4000-nit state-of-the-art display for mastering — and we hope to see a 6000 or 8000-nit mastering display coming out in the next year or two. We keep moving up, and the solution scales. If next year there’s an 8000-nit Dolby Vision master, it’ll still play back on the OLED TV sets.
STEPHEN: So if you’ve mastered to 8000 nits, and you have a pan across the sun, say — on a lesser display that would just crush out, presumably?
PATRICK: No it would clip — in today’s world, clip. So I know where you’re going, let me answer. And I’ll use the analogy of spatial resolution. So if you had a 4K or 8-megapixel master and then you scaled it down to HD at 2MP, you’d get a better-looking HD image because you’ve effectively oversampled. In the 8000-nit colour volume — and I often talk about this using that sun example, if you colour graded it in a colour suite at 100 nits, you just don’t have the dynamic range. If you have just the sun you’d have a white circle with a black surround — that’s no good, so you generally scale it down so you get the surrounding image, and if the sun’s in there it gets clipped and it’s often 100 nits white, right?
But if you go back and you look at the master at 4000 nits, you can see the gradations of yellow around the sun before you clip, because the sun is 1.6 billion. Now if you made that 8000 you get even more gradations, and the interesting thing that happens is when you now map it down, even if it’s to 540 or even if it’s all the way down to SDR at 100 nits, because you know what the original is supposed to look like, you can preserve the relative spacing between those pixels. And even though they’re not nearly as bright — and by the way they have to change saturation to fit down into the 100-nit — it actually looks better than the clipped 100-nit version. So much like oversampling in the spatial domain, we’re oversampling in the colour volume domain. And many people kind of get that, because we all understand spatial resolution, but colour volume is still kind of wacky.
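The difference between clipping and mapping down from a brighter master can be illustrated with a toy Python example. The roll-off curve used here is a generic extended-Reinhard-style curve picked purely for illustration; it is an assumption of ours, not Dolby Vision's actual mapping, and it works on luminance only, ignoring the saturation changes Patrick mentions.

```python
def clip(nits: float, target_peak: float) -> float:
    """Hard clip: everything above the target peak becomes the same white."""
    return min(nits, target_peak)


def roll_off(nits: float, master_peak: float, target_peak: float) -> float:
    """Compress the range 0..master_peak into 0..target_peak.

    Generic extended-Reinhard-style curve (illustrative assumption):
    it maps master_peak exactly onto target_peak and keeps brighter
    inputs brighter than dimmer ones.
    """
    l = nits / target_peak
    l_white = master_peak / target_peak
    return target_peak * l * (1 + l / l_white ** 2) / (1 + l)


if __name__ == "__main__":
    # Hypothetical gradations around the sun in a 4000-nit master.
    highlights = [1000, 2000, 4000]
    print("clipped to 100 nits:", [clip(n, 100) for n in highlights])
    print("rolled off to 100 nits:",
          [round(roll_off(n, 4000, 100), 1) for n in highlights])
```

With the hard clip, all three highlight levels collapse to the same 100-nit white. With the roll-off they stay distinct and in the right order, at the cost of darkening the rest of the picture, which is why a real mapping has to be considerably smarter than this curve.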
STEPHEN: So the quality of the implementation of a particular display device depends on two things — your metadata at the production stage of the content, and then how well they judge the mapping and how it reacts to the metadata at the display end. So they could screw it up completely, and just clip everything.
PATRICK: And that’s the problem with the entry-level HDR 10 — that what they’re going to do is when they get content, they’ll do a static map. And it may be OK in some cases but as you can imagine, the cluster of pixels flying around, they’re in this sort of colour volume and now I want them to fly around with the same relative positioning in this colour volume. It’s not a trivial problem, and a single mapping doesn’t work.
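A toy Python sketch shows why a single static map can fall short. It applies the same simple knee curve twice: once driven by the mastering peak for the whole title (the static case) and once driven by the peak of the current scene (the scene-based case). The curve, the scene values and the knee position are all hypothetical; this is not how Dolby Vision or any HDR 10 television actually performs its mapping.

```python
MASTER_PEAK = 4000.0   # mastering display peak quoted in the interview
DISPLAY_PEAK = 540.0   # OLED minimum peak quoted in the interview


def knee_map(nits: float, source_peak: float, target_peak: float,
             knee_fraction: float = 0.5) -> float:
    """Pass through below the knee, compress linearly above it.

    Illustrative curve only. If the source already fits within the
    target, nothing is changed.
    """
    if source_peak <= target_peak:
        return min(nits, target_peak)
    knee = knee_fraction * target_peak
    if nits <= knee:
        return nits
    scale = (target_peak - knee) / (source_peak - knee)
    return knee + (min(nits, source_peak) - knee) * scale


if __name__ == "__main__":
    # Hypothetical scenes: (name, scene peak in nits, sample pixel levels).
    scenes = [("dim interior", 300.0, [5, 120, 300]),
              ("bright exterior", 3500.0, [50, 600, 3500])]
    for name, scene_peak, pixels in scenes:
        static = [round(knee_map(p, MASTER_PEAK, DISPLAY_PEAK), 1) for p in pixels]
        dynamic = [round(knee_map(p, scene_peak, DISPLAY_PEAK), 1) for p in pixels]
        print(f"{name}: static {static} | scene-based {dynamic}")
```

In the dim interior the static map still compresses the 300-nit highlight, because it only knows the title was mastered at 4000 nits. The scene-based map sees that the whole scene already fits on the display and leaves it alone.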
STEPHEN: Back to the content — the UHD Blu-ray supports Dolby Vision as an option?
PATRICK: As an option, yes. So the first players coming out won’t have it. Like from Samsung and Panasonic, they’ll just have the generic HDR. We’ve been pushed by our studio partners pretty hard to be able to author Dolby Vision. Warner Brothers already has 35 titles in Dolby Vision, so they say, yeah we’d like to release it on Blu-ray.
JEZ: What’s the mood behind 4K Blu-ray? It seems a very cautious launch?
PATRICK: You think it’s cautious?
JEZ: Well it was put back and back and [at the time of interview] there’s still no confirmation of any player coming here...
PATRICK: Yeah it’s a slow roll-out. Then there’s many who say, hmmm, in the onslaught of companies like Netflix that are approaching 100 million households, is there really a market for packaged media?
JEZ: Do you find that disappointing, given discs are more purist in terms of not bit-stripping and changing delivery on the fly? And the extras?
LG representative: Also the infrastructure is a bit of an issue here.
JEZ: Yes, we don’t have very fast broadband.
STEPHEN: I get 6Mbps in the middle of the night.
JEZ: You’re lucky! I get five, and I’m in centralish Sydney.
PATRICK: Yeah, I live in Silicon Valley, even the homeless have internet! But it’s getting better and better?
STEPHEN: No it’s not!
JEZ: It’s not really, here, no, but never mind. Last question — are you worried about mis-use of HDR? I was talking to Paul Gray of IHS DisplaySearch and he’s waiting for the advertisers to get hold of HDR and just as they pump up the volume on their ads, they’ll make everything as bright as they possibly can — whiter whites!
PATRICK: We’ve talked, you know it’s not typically in our DNA at Dolby, we’re focused on entertainment and Hollywood. But I’ve said, you know once Mercedes-Benz sees an HDR commercial, they ain’t going back. Or when I’m pouring my beer and the speculars of the beer make it look more appetising, yeah I think advertisers will latch onto it. And there’ve been jokes: like we have dialnorm, you know, a loudness norm, are we now going to have to have some sort of a video norm for HDR?
I think what’s going to happen is that as we go through the transition, it will be difficult. Imagine a live production where you’ve got SDR cameras and you’ve got HDR cameras, or even if you have all HDR cameras and the event’s in HDR, someone has an advert that’s back in SDR. So there will be conversions and scaling up and down, and we’ve already developed some technology that does real-time inverse tone mapping to take SDR up to HDR. The program stream, if it’s an HDR stream, will be all HDR, and the commercials will be scaled before they come to your TV. And you’re not going to get your eyeballs blown out because governments will go wild.
One of the things we’re working on now is doing Dolby Vision live, so that’s the next big thing: live sports with high dynamic range is going to be a real cash cow.
JEZ: And high frame rates for sports?
PATRICK: Well I will say, because I’m hearing it from a lot of people who do sports, is they’re beginning to rethink 4K, because 4K is 400% more data rate, and if you’re a broadcaster, you know, bits are bucks — one of my other sayings along with ‘more faster better pixels’. But give me sports with HD and high frame rate.
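The trade-off between pixels and frame rate comes down to simple arithmetic, sketched here in Python. It counts raw pixels per second only; real broadcast bitrates depend on the codec, bit depth and chroma subsampling, and the 60 and 120 frames-per-second figures are our assumptions rather than numbers from the interview.

```python
HD = (1920, 1080)
UHD_4K = (3840, 2160)


def pixel_count(resolution: tuple[int, int]) -> int:
    """Pixels in a single frame."""
    width, height = resolution
    return width * height


def pixels_per_second(resolution: tuple[int, int], fps: int) -> int:
    """Raw pixel throughput before any compression."""
    return pixel_count(resolution) * fps


if __name__ == "__main__":
    ratio = pixel_count(UHD_4K) / pixel_count(HD)
    print(f"4K has {ratio:.0f} times the pixels of HD")
    # Assumed frame rates, for illustration only.
    print(f"HD at 120 fps: {pixels_per_second(HD, 120):,} pixels/s")
    print(f"4K at 60 fps:  {pixels_per_second(UHD_4K, 60):,} pixels/s")
```

On these assumptions, HD at a doubled frame rate still moves half as many raw pixels per second as 4K at the standard rate.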
JEZ: Yeah, that would be story telling. Thank you Patrick, a pleasure to have you over here.