
I Wonder When AI Upscaling Will Reach YouTube and Other Video Streaming Services

With Nvidia's DLSS (and whatever AMD are working on) getting better and better, it occurred to me that a website like YouTube could benefit greatly from this technology. To be clear, DLSS is an upscaling technology that replaces "dumb" interpolation algorithms with intelligent, learned upscaling. The amount of compute power required at playback is minimal, as the bulk of the work has already been done up front through deep learning.
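To make that concrete, here's a rough sketch of the difference between "dumb" and learned upscaling. This is nothing like Nvidia's actual network (that's proprietary); it's just a toy ESPCN-style model in PyTorch with a placeholder file name, to show that the expensive part (training) happens offline while playback is a single cheap forward pass:

```python
# Minimal sketch of "dumb" vs. learned upscaling -- NOT Nvidia's actual DLSS
# network (that's proprietary); just an ESPCN-style toy to show that the heavy
# lifting (training) happens offline, while inference is one cheap forward pass.
import torch
import torch.nn as nn
from PIL import Image
import torchvision.transforms.functional as TF

class TinySuperRes(nn.Module):
    """3-layer ESPCN-style upscaler: conv features + pixel shuffle to 2x."""
    def __init__(self, scale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),   # rearranges channels into a 2x larger image
        )

    def forward(self, x):
        return self.body(x)

img = Image.open("frame_1080p.png").convert("RGB")   # hypothetical input frame

# "Dumb" upscale: plain bicubic interpolation.
bicubic_4k = img.resize((img.width * 2, img.height * 2), Image.BICUBIC)

# "Intelligent" upscale: one forward pass through a (pre-trained) network.
model = TinySuperRes(scale=2).eval()                  # weights would come from offline training
with torch.no_grad():
    x = TF.to_tensor(img).unsqueeze(0)                # 1x3xHxW
    sr_4k = TF.to_pil_image(model(x).squeeze(0).clamp(0, 1))
```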

 

YouTubers uploading in 4K could instead upload in 1080p, and an AI upscaler could bring it up to 4K and beyond. From Google's perspective, it would be even better if the upscaling were done on the client side, so that Google only has to send 1080p or even less across the web.

 

I fully understand that, at this point, Nvidia have to train their AI in-house and then ship the learned data in driver updates. But I imagine that at some point the AI could be so well trained that it no longer needs any more training.

 

One could argue that at some point everyone will have 8K TVs and old YouTube videos in 1080p would be harder to upscale, but in reality the AI could upscale a 1080p stream to 4K, then upscale that to 8K for the viewer. It sounds too good to be true, like a perpetual motion machine, and to me 8K seems like absolute overkill for streaming purposes anyway.

 

This thought just came to mind as I was exporting a video. Just think of all the big YouTubers recording with insanely expensive cameras, and how AI upscaling would reduce the need for that. The same goes for on-demand game streaming; I'm sure that has to be a big motivating factor for Nvidia.

 

Food for thought I guess. It'll be a reality soon enough I imagine.

R9 3900XT | Tomahawk B550 | Ventus OC RTX 3090 | Photon 1050W | 32GB DDR4 | TUF GT501 Case | Vizio 4K 50'' HDR

 


That would be epic.

Main Rig :

Ryzen 7 2700X | Powercolor Red Devil RX 580 8 GB | Gigabyte AB350M Gaming 3 | 16 GB TeamGroup Elite 2400MHz | Samsung 750 EVO 240 GB | HGST 7200 RPM 1 TB | Seasonic M12II EVO | CoolerMaster Q300L | Dell U2518D | Dell P2217H | 

 

Laptop :

Thinkpad X230 | i5 3320M | 8 GB DDR3 | V-Gen 128 GB SSD |


This is one thing they won't be able to do right now, for a couple of reasons.

 

1. Bitrate: part of why 4K content looks good on YouTube is the higher bitrate compared to 1080p content. When serving 1080p content, they would need to upscale and also remove the artifacts that come from the lower bitrate (some rough numbers are sketched after this list).

 

2. Compute vs. storage: right now YouTube needs to store a 4K, 1080p, etc. version of every file, which takes up a lot of storage.

What you're implying would lower the amount of storage needed, but raise the amount of compute power needed.

That's a trade-off YouTube would have to make a decision on, but I think at the moment they will stick with paying for storage.
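For a sense of scale on the bitrate/storage side, here's a quick back-of-envelope in Python. The bitrates are placeholder assumptions picked for illustration, not YouTube's real delivery figures:

```python
# Back-of-envelope illustration of the bitrate/storage side of the trade-off.
# All numbers are placeholders chosen for illustration, not YouTube's real figures.
BITRATE_1080P_MBPS = 8      # assumed average delivery bitrate for 1080p
BITRATE_4K_MBPS = 40        # assumed average delivery bitrate for 4K

video_hours = 1
seconds = video_hours * 3600

gb_1080p = BITRATE_1080P_MBPS * seconds / 8 / 1000   # Mbit -> MB -> GB
gb_4k = BITRATE_4K_MBPS * seconds / 8 / 1000

print(f"1 hour @1080p ~ {gb_1080p:.1f} GB, @4K ~ {gb_4k:.1f} GB")
print(f"Bandwidth/storage saved per stream if the client upscales: {gb_4k - gb_1080p:.1f} GB")
# That saving has to be weighed against the GPU time every client (or every
# server-side transcode) would spend running the upscaler instead.
```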

"We're all in this together, might as well be friends" Tom, Toonami.

 

mini eLiXiVy: my open source 65% mechanical PCB, a build log, PCB anatomy and discussing open source licenses: https://linustechtips.com/topic/1366493-elixivy-a-65-mechanical-keyboard-build-log-pcb-anatomy-and-how-i-open-sourced-this-project/

 

mini_cardboard: a 4% keyboard build log and how keyboards work: https://linustechtips.com/topic/1328547-mini_cardboard-a-4-keyboard-build-log-and-how-keyboards-work/


19 minutes ago, Briggsy said:

YouTubers uploading in 4K could instead upload in 1080p, and an AI upscaler could bring it up to 4K and beyond

Have you tried upscaling a video?

Even a single video takes a long time


2 minutes ago, minibois said:

2. Compute vs. storage: right now YouTube needs to store a 4K, 1080p, etc. version of every file, which takes up a lot of storage.

What you're implying would lower the amount of storage needed, but raise the amount of compute power needed.

That's a trade-off YouTube would have to make a decision on, but I think at the moment they will stick with paying for storage.

Hard drives do cost less than GPUs.

Main Rig :

Ryzen 7 2700X | Powercolor Red Devil RX 580 8 GB | Gigabyte AB350M Gaming 3 | 16 GB TeamGroup Elite 2400MHz | Samsung 750 EVO 240 GB | HGST 7200 RPM 1 TB | Seasonic M12II EVO | CoolerMaster Q300L | Dell U2518D | Dell P2217H | 

 

Laptop :

Thinkpad X230 | i5 3320M | 8 GB DDR3 | V-Gen 128 GB SSD |


Probably wouldn't work nearly as well because of video compression. Also I don't think you want to have your gpu pinned while watching a youtube video just to save some space on youtube's servers. Also also, not everyone is going to have an expensive dgpu for that... think of smartphones or smart TVs. Server space is cheaper than high end graphics cards.

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*


As much as NVIDIA is hyping DLSS as "AI-powered upscaling", you simply can't create things that are missing. It's just impossible, no matter how many "AI" buzzwords you throw at it. DLSS isn't creating detail out of nothing; it's selectively processing the rendering to maximize performance while not affecting quality on a perceivable level. Imagine re-saving a lossless PNG photo as a 100%-quality JPEG. Nobody can pinpoint the quality downgrade, but the image will be around 50% smaller than the original. That's DLSS. No one but NVIDIA knows the exact specifics, but you can be sure it's not just magically conjuring all the detail out of nothing. That's impossible, especially in highly dynamic worlds like games, which are fully interactive and don't have a predictable angle or movement for anything in them.
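The JPEG comparison is easy to try yourself with Pillow (the file names below are placeholders; how much smaller the JPEG ends up, and whether it's truly indistinguishable, depends entirely on the image):

```python
# Trying the JPEG analogy from the post: re-save a lossless PNG as a
# 100%-quality JPEG and compare file sizes. "photo.png" is a placeholder path.
import os
from PIL import Image

img = Image.open("photo.png").convert("RGB")
img.save("photo_q100.jpg", "JPEG", quality=100)

png_kb = os.path.getsize("photo.png") / 1024
jpg_kb = os.path.getsize("photo_q100.jpg") / 1024
print(f"PNG: {png_kb:.0f} KiB, JPEG q=100: {jpg_kb:.0f} KiB "
      f"({100 * (1 - jpg_kb / png_kb):.0f}% smaller)")
# The ~50% figure in the post is a ballpark, not a guarantee.
```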

 

And that further applies to video. Unlike games, where each individual frame is a standalone, sharply rendered frame of a rasterized scene, frames in movies are image captures of something that has already happened. If it's smudged, it's smudged at capture. In games, motion blur is faked on purpose as an effect, applied on top of a still-sharp rasterized frame, and the engine has pretty much free hands to manipulate every frame through the rendering pipeline. You can render the same scene at 320x480 or 3840x2160 and it will retain all the detail; you'll just see more of it at 4K. With video, if it was captured at 320x480, that's it. You can make up some detail using whatever AI interpolation you like, but you can't possibly add back detail that was never captured. The reason some old movies could be remastered is that they were shot on actual film, and you can extract far higher-resolution imagery from the physical film than we were ever able to display on TVs or project in cinemas. With digital recording today, what you capture at 8K is 8K and that's it. You'll never be able to bump that up to 16K in the future the way we did with filmed movies from the past. It's just physically impossible.


Didn't Linus say that Nvidia's Shield TV already does a good upscaling job? I'm not familiar with the device or how it does it, though.

 

More on DLSS, I think the current implementation of it requires specific training for the source material. For a particular game, you can do that. For a generic version, it will be more difficult to get good and predictable results.

 

If you follow Taran's personal Twitter/YouTube, he's mentioned waifu2x on many occasions, so a more generic approach is possible, but it seems to be a LOT slower than DLSS. If you want to do it for real-time video, you need it to be both good and fast, which is not going to be easy.
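If anyone wants to gauge the "fast" part for themselves, a crude timing harness like this (Python, with plain bicubic as a stand-in for whatever upscaler you're actually testing) shows how little time there is per frame at 60 fps:

```python
# Rough real-time feasibility check: time how long one frame takes to upscale
# and compare against the frame budget. upscale() is a stand-in -- swap in
# waifu2x, an ESPCN model, or whatever you're testing.
import time
from PIL import Image

def upscale(frame: Image.Image) -> Image.Image:
    # placeholder: plain bicubic 2x; a neural upscaler goes here instead
    return frame.resize((frame.width * 2, frame.height * 2), Image.BICUBIC)

frame = Image.new("RGB", (1920, 1080))   # dummy 1080p frame
budget_ms = 1000 / 60                     # ~16.7 ms per frame at 60 fps

start = time.perf_counter()
upscale(frame)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{elapsed_ms:.1f} ms per frame (budget {budget_ms:.1f} ms at 60 fps)")
```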

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


I don't think this is going to be a thing, for the following reason:

AI upscaling (upscaling in general, but especially the AI variant) is useful in situations where there may not be enough processing power to render a scene at full res in the required time window (like in a game).

But upscaling (with or without AI involved) is always worse than the actual full-res image. Sure, AI may do a lot better than standard upscaling, but the original file at 4K will always look better than some upscaled version.

When plenty of processing time is available (no real-time rendering is required), which is the case on a video platform, it makes more sense to actually render at 4K.

 


5 minutes ago, RejZoR said:

As much as NVIDIA is hyping DLSS as "AI-powered upscaling", you simply can't create things that are missing.

I think this is missing the point somewhat. The goal is not to get back to the original exact data. It is to get something that looks good and realistic.

 

As an example, say there is a low-resolution brick wall. You know that as you "zoom in" you expect to see certain details and textures. It doesn't matter that those details are not the original, as long as they're realistic and fitting.

 

We've already accepted this imperfect reality in music and visual content, where selective detail loss is used to reduce data while retaining the impression of the original. This takes it in the other direction: can we improvise data to convincingly fill in the gaps where we don't have it? I'd say yes, within certain limitations for sure.

 

2 minutes ago, akio123008 said:

But upscaling (with or without AI involved) is always worse than the actual full-res image. Sure, AI may do a lot better than standard upscaling, but the original file at 4K will always look better than some upscaled version.

I'd argue a detail here. I do think an intelligent upscale can look better, but it isn't necessarily accurate compared to the true data that would be there. I suppose you can put this into the general category of processing to make things look better.
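One way to put a number on "accurate to the true data" as opposed to "looks better": take a real high-res frame, downscale it, upscale it back with whatever method you like, and measure the pixel error against the original, e.g. with PSNR. A convincing AI upscale can still score poorly on this. Rough sketch (placeholder file name, plain bicubic standing in for the AI upscaler):

```python
# Separating "accurate" from "looks good": compare an upscaled frame against
# the true high-res original with a pixel-error metric like PSNR.
import numpy as np
from PIL import Image

def psnr(a: Image.Image, b: Image.Image) -> float:
    x = np.asarray(a, dtype=np.float64)
    y = np.asarray(b, dtype=np.float64)
    mse = np.mean((x - y) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

original = Image.open("frame_4k.png").convert("RGB")                      # placeholder path
lowres = original.resize((original.width // 2, original.height // 2), Image.BICUBIC)
upscaled = lowres.resize(original.size, Image.BICUBIC)                    # swap in any AI upscaler here

print(f"PSNR vs. original: {psnr(original, upscaled):.1f} dB")
```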

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


@porina

The thing is, upscaling will always be interpolation from existing data: the set of pixels present in the captured video. There is no algorithm or AI in this world that can say "yes, that's a wire fence" and then proceed to generate it on its own in a sharper, more detailed way. It's just impossible to do that when it's mixed in with other things in the scene at varying angles and might have a particular context in the image.

 

Yes, it's definitely possible to train algorithms to recognize those things as a wire fence. But rebuilding it from scratch in higher detail on that same image? No, that just isn't possible now. I'm guessing that eventually, with enough compute power and training, it might theoretically be possible. I mean, they'd have to train systems on every material and shape known to mankind, and also give them the ability to generate it at a specific angle and placement, then correctly incorporate it into the scene and accurately shade it so it blends with the rest of the scene (which would probably also be processed in a similar way). In a highly theoretical scenario, it's possible. But practically, we are nowhere close to that today.


 

27 minutes ago, akio123008 said:

I don't think this is going to be a thing, for the following reason:

AI upscaling (upscaling in general, but especially the AI variant) is useful in situations where there may not be enough processing power to render a scene at full res in the required time window (like in a game).

But upscaling (with or without AI involved) is always worse than the actual full-res image. Sure, AI may do a lot better than standard upscaling, but the original file at 4K will always look better than some upscaled version.

When plenty of processing time is available (no real-time rendering is required), which is the case on a video platform, it makes more sense to actually render at 4K.

 

As @porina said, AI upscaling is about intelligently filling in missing pieces, as opposed to restoring the original image. It's silly to think the latter is possible.

 

Nvidia have already shown their upscaling technology looking better than native, and they are not 3D-rendering the upscale, just like you wouldn't be 3D-rendering an upscaled video stream. That's the whole point of DLSS.

R9 3900XT | Tomahawk B550 | Ventus OC RTX 3090 | Photon 1050W | 32GB DDR4 | TUF GT501 Case | Vizio 4K 50'' HDR

 


4 minutes ago, RejZoR said:

@porina

The thing is, upscaling will always be interpolation from existing data: the set of pixels present in the captured video. There is no algorithm or AI in this world that can say "yes, that's a wire fence" and then proceed to generate it on its own in a sharper, more detailed way. It's just impossible to do that when it's mixed in with other things in the scene at varying angles and might have a particular context in the image.

Yes, it's definitely possible to train algorithms to recognize those things as a wire fence. But rebuilding it from scratch in higher detail on that same image? No, that just isn't possible now. I'm guessing that eventually, with enough compute power and training, it might theoretically be possible. I mean, they'd have to train systems on every material and shape known to mankind, and also give them the ability to generate it at a specific angle and placement, then correctly incorporate it into the scene and accurately shade it so it blends with the rest of the scene (which would probably also be processed in a similar way). In a highly theoretical scenario, it's possible. But practically, we are nowhere close to that today.

Once the AI starts training the AI, it won't take long lol.

R9 3900XT | Tomahawk B550 | Ventus OC RTX 3090 | Photon 1050W | 32GB DDR4 | TUF GT501 Case | Vizio 4K 50'' HDR

 


50 minutes ago, Sauron said:

Probably wouldn't work nearly as well because of video compression. Also I don't think you want to have your gpu pinned while watching a youtube video just to save some space on youtube's servers. Also also, not everyone is going to have an expensive dgpu for that... think of smartphones or smart TVs. Server space is cheaper than high end graphics cards.

I don't think you'd have your GPU pinned for something like this. I mean, with DLSS the GPU is rendering a scene at a lower resolution, then intelligently upscaling the finished 2D image. If instead the lower resolution 2D image is transferred over the internet and upscaled on your display, most of the work was already done before the transfer. For game streaming this would be a massive game changer, but it would work for regular video streaming as well if the AI was trained well enough. 
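As a rough sketch of what "receive 1080p, upscale locally" could look like on the client, here's OpenCV's dnn_superres module standing in for a DLSS-like network. This assumes opencv-contrib-python and a pre-trained model file such as ESPCN_x2.pb; it's just an illustration of the pipeline, not how GeForce Now or YouTube actually do anything:

```python
# Sketch of client-side "receive low-res, upscale locally" playback, using
# OpenCV's dnn_superres module as a stand-in for a DLSS-like network.
# Requires opencv-contrib-python and a pre-trained model file (e.g. ESPCN_x2.pb).
import cv2

sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("ESPCN_x2.pb")                  # placeholder path to a pre-trained 2x model
sr.setModel("espcn", 2)

cap = cv2.VideoCapture("stream_1080p.mp4")   # stands in for the incoming stream
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_up = sr.upsample(frame)            # the only extra per-frame work on the client
    cv2.imshow("upscaled", frame_up)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```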

R9 3900XT | Tomahawk B550 | Ventus OC RTX 3090 | Photon 1050W | 32GB DDR4 | TUF GT501 Case | Vizio 4K 50'' HDR

 


20 minutes ago, RejZoR said:

The thing is, upscaling will always be interpolation from existing data: the set of pixels present in the captured video. There is no algorithm or AI in this world that can say "yes, that's a wire fence" and then proceed to generate it on its own in a sharper, more detailed way. It's just impossible to do that when it's mixed in with other things in the scene at varying angles and might have a particular context in the image.

Maybe we have an expectation gap between us. I'm not suggesting we will have movie-style magic "enhance" that pulls a LOT more detail out from where there wasn't any. I'm talking about more modest upscaling, perhaps of the order of 2x or 3x per side. Convincing reconstruction on that scale is already possible; the question is at what cost. Again, it is not expected to be pixel-exact to the original. Even with anti-aliasing, we know that if you have a curved or diagonal line represented in pixels, you get a staircase effect. But we can "see" that it was a line and scale it more intelligently than just making bigger blocks.
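The staircase point is easy to visualise: upscale the same diagonal line with nearest-neighbour (just bigger blocks) and with a smarter resampler that "sees" the edge; an AI upscaler takes that a step further. Quick Pillow sketch:

```python
# Quick illustration of the "staircase vs. seeing the line" point: upscale the
# same diagonal line with nearest-neighbour (bigger blocks) and Lanczos
# (which reconstructs a smoother edge).
from PIL import Image, ImageDraw

lowres = Image.new("L", (32, 32), 0)
ImageDraw.Draw(lowres).line((0, 0, 31, 31), fill=255, width=1)  # a diagonal line

blocks = lowres.resize((256, 256), Image.NEAREST)   # just bigger pixels
smooth = lowres.resize((256, 256), Image.LANCZOS)   # edge is reconstructed more gracefully

blocks.save("line_nearest.png")
smooth.save("line_lanczos.png")
```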

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


11 minutes ago, porina said:

Maybe we have an expectation gap between us. I'm not suggesting we will have movie-style magic "enhance" that pulls a LOT more detail out from where there wasn't any. I'm talking about more modest upscaling, perhaps of the order of 2x or 3x per side. Convincing reconstruction on that scale is already possible; the question is at what cost. Again, it is not expected to be pixel-exact to the original. Even with anti-aliasing, we know that if you have a curved or diagonal line represented in pixels, you get a staircase effect. But we can "see" that it was a line and scale it more intelligently than just making bigger blocks.

Actually, I'd love to see NVIDIA pull off a super-advanced SMAA-like anti-aliasing post-process filter that runs on the Tensor cores: one that can tell still GUI elements apart and leave them unprocessed and unblurred, while recognizing the edges of moving in-game objects and filtering them to perfection in real time, to the point where there are no jaggies left anywhere and nothing gets blurred. And since it would run on Tensor cores rather than shader cores, the performance impact should be negligible. Current post-process AA filters like SMAA seem too basic; they miss too many edges and don't filter them all properly. Some edges look like they've been processed at 16x MSAA, but many look like they were barely brushed by 2x MSAA. Unlike making things up out of nothing, this seems like a more realistic expectation, and I sure hope they go down this avenue. I just wish we won't have to wait for the RTX 4000 series or something to get it. The crappy FXAA that's the only option in NVCP just needs to go away already; it's so bad I never use it. SMAA in ReShade gives varying results and isn't as blurry, but it still leaves too many jagged edges...


1 hour ago, porina said:

he's mentioned waifu2x on many occasions

I've played with it a bit, experimenting with enlarging pictures so they could be printed as posters. It's very much GIGO... if you start with a high-quality image, it does a nice job of giving you the image bigger, through playing around with the sharpness and blur settings... but it's not a magic CSI enhance program.

🖥️ Motherboard: MSI A320M PRO-VH PLUS  ** Processor: AMD Ryzen 2600 3.4 GHz ** Video Card: Nvidia GeForce 1070 TI 8GB Zotac 1070ti 🖥️
🖥️ Memory: 32GB DDR4 2400  ** Power Supply: 650 Watts Power Supply Thermaltake +80 Bronze Thermaltake PSU 🖥️

🍎 2012 iMac i7 27";  2007 MBP 2.2 GHZ; Power Mac G5 Dual 2GHZ; B&W G3; Quadra 650; Mac SE 🍎

🍎 iPad Air2; iPhone SE 2020; iPhone 5s; AppleTV 4k 🍎


10 minutes ago, RejZoR said:

Actually, I'd love to see NVIDIA pull off a super-advanced SMAA-like anti-aliasing post-process filter that runs on the Tensor cores: one that can tell still GUI elements apart and leave them unprocessed and unblurred, while recognizing the edges of moving in-game objects and filtering them to perfection in real time, to the point where there are no jaggies left anywhere and nothing gets blurred. And since it would run on Tensor cores rather than shader cores, the performance impact should be negligible.

While it's not exactly what you're asking for, there is an advantage many games can exploit already: UI elements can be treated as a separate layer from the 3D engine content, so you can render the 3D content at a lower scale for performance while keeping a native, higher-resolution UI on top.
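A tiny sketch of that idea, with Pillow compositing standing in for what a game engine does internally (the layer file names are hypothetical): only the 3D scene layer gets rendered low-res and upscaled, while the UI layer stays at native resolution on top.

```python
# Sketch of the "render the 3D scene low-res, keep the UI native" idea:
# upscale the scene layer, then composite a full-resolution UI layer on top.
# scene_720p.png / ui_1440p.png are placeholder layer dumps.
from PIL import Image

scene = Image.open("scene_720p.png").convert("RGBA")   # low-res 3D render
ui = Image.open("ui_1440p.png").convert("RGBA")        # native-res UI with alpha

scene_up = scene.resize(ui.size, Image.BICUBIC)        # only the scene gets upscaled
frame = Image.alpha_composite(scene_up, ui)            # UI text/HUD stays pin-sharp
frame.save("composited_frame.png")
```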

 

Also, while I haven't looked at it in detail myself (figuratively or literally), it does sound like people can get away with less AA as resolutions increase.

 

1 minute ago, Video Beagle said:

I've played with it a bit, experimenting with enlarging pictures so they could be printed as posters. It's very much GIGO... if you start with a high-quality image, it does a nice job of giving you the image bigger, through playing around with the sharpness and blur settings... but it's not a magic CSI enhance program.

Again, maybe it is an expectations thing. Like Taran, I find it works great if you have a typical web JPEG image with compression artefacts. Upscaling 2x works well to substantially get rid of those artefacts, and the edges are preserved or enhanced. It doesn't replace having the original high resolution, but it does a lot better than "dumb" upscaling.

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


I hope that doesn't happen.

“Remember to look up at the stars and not down at your feet. Try to make sense of what you see and wonder about what makes the universe exist. Be curious. And however difficult life may seem, there is always something you can do and succeed at. 
It matters that you don't just give up.”

-Stephen Hawking


1 hour ago, Briggsy said:

For game streaming this would be a massive game changer

I struggle to see why... also it probably would be much worse for actual video of people.

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*


6 hours ago, Briggsy said:

From Google's perspective, it would be even better if the upscaling were done on the client side

6 hours ago, Briggsy said:

The same goes for on-demand game streaming; I'm sure that has to be a big motivating factor for Nvidia.

I think that's counterintuitive. Remember that in order to do that upscaling, you currently need a beefy GPU (it's not feasible to do it on a CPU), so you'd be spending more resources on the client than it takes to simply decode the video at the higher resolution. Remember that we have dedicated hardware decoders; you'd be making those useless in exchange for using the raw GPU cores to do your upscaling.

 

If you dropped the idea of doing it client-side and went for server-side processing instead, it'd be meaningless, because GPUs are way more expensive than HDDs, and you'd still need to recreate the file temporarily, encode it, and stream it to the client. The delay would be awful and there wouldn't be many savings server-side anyway.

 

5 hours ago, RejZoR said:

@porina

The thing is, upscaling will always be interpolation from existing data: the set of pixels present in the captured video. There is no algorithm or AI in this world that can say "yes, that's a wire fence" and then proceed to generate it on its own in a sharper, more detailed way. It's just impossible to do that when it's mixed in with other things in the scene at varying angles and might have a particular context in the image.

Yes, it's definitely possible to train algorithms to recognize those things as a wire fence. But rebuilding it from scratch in higher detail on that same image? No, that just isn't possible now. I'm guessing that eventually, with enough compute power and training, it might theoretically be possible. I mean, they'd have to train systems on every material and shape known to mankind, and also give them the ability to generate it at a specific angle and placement, then correctly incorporate it into the scene and accurately shade it so it blends with the rest of the scene (which would probably also be processed in a similar way). In a highly theoretical scenario, it's possible. But practically, we are nowhere close to that today.

Oh man, you'd be surprised by the state of the art in current ML models; we're pretty close to what you said already. Mostly in academia, but still, it's really nice to see the field advancing so quickly year after year.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


1 hour ago, RejZoR said:

Press X for doubt...

Even Disney has nice papers on GANs with high-quality images. You'd be surprised if you did a quick search on the topic :)

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


25 minutes ago, comander said:

This would only make sense if you as an individual had tons of compute power and very little bandwidth (rural Oklahoma?)

 

Making compressed video look good in real time is a bit tricky. 

 

At the extreme, if it were done on Google's end, it'd be called... compression, assuming you started with full-quality sources.

YouTube already has compression, and some ML is likely already used in it.

 

Auto upscaling might be an interesting use case. That's something for later...

Well yeah, that's all DLSS is: auto-upscaling, but done intelligently with educated guessing.

Like, if you're watching a 1080p video on a 4K display, your display or GPU is already upscaling with a dumb algorithm. DLSS just improves that algorithm with deep learning, so you don't need a ton of compute power.

I think Nvidia has confused a lot of people by calling it Deep Learning Super Sampling. They should have called it deep learning upscaling, because that's what it's actually doing in practice.

R9 3900XT | Tomahawk B550 | Ventus OC RTX 3090 | Photon 1050W | 32GB DDR4 | TUF GT501 Case | Vizio 4K 50'' HDR

 

