
YouTube Embraces AV1... But it Might Kill Your Battery

Nimoy007
1 hour ago, Nimoy007 said:

...for now

History suggests that YouTube generally keeps old codecs around. You can still force H.264 today to get hardware acceleration on devices more than a decade old.
 

Additionally, I suspect YouTube has the capability to transcode to VP9 and H.264 in real time at very little cost, so the storage required to keep supporting the older codecs would be negligible.



1 hour ago, StDragon said:

If so, this is a rational way of migrating to a new codec.

 

At the moment, only the Apple M3 and iPhone 15 Pro support hardware AV1 playback. I'm sure some Android phones support it too, but at the moment AV1 hardware decode is a premium optional feature.

Hardware AV1 decode has been around in the PC space for a bit longer than that. The i7-1165G7 in my laptop does AV1 playback in hardware, as does the GeForce RTX 3000 series and later. These I can confirm myself. Supposedly, RDNA2 GPUs (the RX 6000 series, sans the 6500) also support AV1 decode.
 

So I’m unsure whether you’re referring to general availability of AV1 decode (in which case, it’s been fairly widely available for a couple of years now), or simply ease of access for the general consumer.



2 hours ago, Zodiark1593 said:

History suggests that YouTube generally keeps old codecs around. You can still force H.264 today to get hardware acceleration on devices more than a decade old.
 

YouTube literally uses FFmpeg. Anything FFmpeg can play, YouTube can ingest.

2 hours ago, Zodiark1593 said:

Additionally, I suspect YouTube has the capability to transcode to VP9 and H.264 in real time at very little cost, so the storage required to keep supporting the older codecs would be negligible.

 

YouTube used to transcode content on the fly. I know this because content I uploaded in ZMBV without any keyframes was unseekable, or at least USED TO BE; I'm not sure exactly when that changed. It wastes a lot of space to keep 16 versions of the same video if that video is only ever watched once.

 

The most likely scenario is that Google initially transcodes all content to AV1 and then spins up other codecs as there is demand. When I stream to Twitch and YouTube simultaneously, Twitch gets 6 Mbit H.264 and YouTube gets H.265 at 12 Mbit, but if I check post-stream, after YouTube has reprocessed it, it pops up as format 299 (H.264 60 fps).

 


Video: MPEG4 Video (H264) 1920x1080 60fps [V: ISO Media file produced by Google Inc. (h264 high L4.2, yuv420p, 1920x1080) [default]]
Audio: Opus 48000Hz stereo 3072kbps [A: English [eng] (opus, 48000 Hz, stereo) [default]]

Overall bit rate               : 4 021 kb/s

Writing application            : Lavf61.3.100
Writing library                : Lavf61.3.100

Format                         : AVC
Format/Info                    : Advanced Video Codec
Format profile                 : High@L4.2
Format settings                : CABAC / 3 Ref Frames
Format settings, CABAC         : Yes
Format settings, Reference fra : 3 frames

 

Format                         : Opus
Codec ID                       : A_OPUS
Sampling rate                  : 48.0 kHz
Bit depth                      : 32 bits
Compression mode               : Lossy
 

So YouTube takes a high-quality H.265 input and produces a relatively decent H.264 output. Not VP9. Not AV1. If you look at the 4.2 under AVC (H.264), you'll notice that is the minimum level for 1080p60.

 

So there are multiple possibilities. No matter what YouTube decides, the writing will be on the wall for H.264 playback once all the hardware out there has AV1 decode blocks. Only recent hardware (like 30-series Nvidia GPUs and 11th-gen Intel iGPUs) has AV1 decode, but that doesn't mean the browser *cough*Chrome*cough* is going to tell every website it can do AV1. So we might actually see more business as usual, where streaming sites continue to serve H.264 unless the user actually selects a resolution other than Auto.
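
For anyone who wants to reproduce that kind of check without MediaInfo, here's a minimal Python sketch that shells out to ffprobe (assuming ffprobe is installed and the VOD has already been downloaded; the file name is a placeholder):

import json
import subprocess

VIDEO = "youtube_vod.mp4"  # placeholder path to a downloaded VOD

# Ask ffprobe for the codec, profile and level of the first video stream.
result = subprocess.run(
    [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,profile,level,width,height,avg_frame_rate",
        "-of", "json",
        VIDEO,
    ],
    capture_output=True, text=True, check=True,
)

stream = json.loads(result.stdout)["streams"][0]
# For an itag 299 download this should report h264, High, level 42, 1920x1080, 60/1.
print(stream["codec_name"], stream.get("profile"), stream.get("level"),
      f'{stream["width"]}x{stream["height"]}', stream.get("avg_frame_rate"))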

 


6 minutes ago, Kisai said:

YouTube literally uses FFmpeg. Anything FFmpeg can play, YouTube can ingest.

 

YouTube used to transcode content on the fly. I know this because content I uploaded in ZMBV without any keyframes was unseekable, or at least USED TO BE; I'm not sure exactly when that changed. It wastes a lot of space to keep 16 versions of the same video if that video is only ever watched once.

 

The most likely scenario is that Google initially transcodes all content to AV1 and then spins up other codecs as there is demand. When I stream to Twitch and YouTube simultaneously, Twitch gets 6 Mbit H.264 and YouTube gets H.265 at 12 Mbit, but if I check post-stream, after YouTube has reprocessed it, it pops up as format 299 (H.264 60 fps).

 

[Snip]
 

So YouTube takes a high-quality H.265 input and produces a relatively decent H.264 output. Not VP9. Not AV1. If you look at the 4.2 under AVC (H.264), you'll notice that is the minimum level for 1080p60.

 

So there are multiple possibilities. No matter what YouTube decides, the writing will be on the wall for H.264 playback once all the hardware out there has AV1 decode blocks. Only recent hardware (like 30-series Nvidia GPUs and 11th-gen Intel iGPUs) has AV1 decode, but that doesn't mean the browser *cough*Chrome*cough* is going to tell every website it can do AV1. So we might actually see more business as usual, where streaming sites continue to serve H.264 unless the user actually selects a resolution other than Auto.

 

I’m pretty certain that VP9 won’t be going anywhere for a long time at least, as a great many smart TVs and other TV-connected boxes are not able to play back AV1.

 

(Technically, software decoding can be done, but I rarely see a smart TV that feels decently fast to begin with. Think an octa-core Cortex-A53 SoC could manage it?)

 

 



4 hours ago, leadeater said:

I would guess YouTube doesn't auto-select 4K on phones, but since I don't watch YouTube on my phone I wouldn't know.

It's usually the opposite problem, often defaulting to 480p even on my 10" tablet. 480p is just about OK if you're viewing landscape content on a phone in portrait orientation, but rotate it and go full screen and it could still do with 720p. YouTube has been somewhat aggressive about serving lower resolutions, presumably to save bandwidth. I use an extension on desktop to force it to 1080p.



3 hours ago, Kisai said:

YouTube used to transcode content on the fly. I know this because content I uploaded in ZMBV without any keyframes was unseekable, or at least USED TO BE; I'm not sure exactly when that changed. It wastes a lot of space to keep 16 versions of the same video if that video is only ever watched once.

The only real-time transcoding YouTube does is for live streams; they don't do, and haven't done, any VOD playback transcoding in more than a decade. I would say ever, but I don't know what YouTube did when it first existed.

 

Storing 16 versions of a video is absolutely less costly than transcoding, and it's likely far fewer than 16 versions anyway; more than 10 is unlikely. Compute, whether it's CPU, GPU or ASIC, costs more than storage, and it also doesn't scale anywhere near as well, something like 100 times worse, than file storage space and direct download.


10 minutes ago, leadeater said:

The only real-time transcoding YouTube does is for live streams; they don't do, and haven't done, any VOD playback transcoding in more than a decade. I would say ever, but I don't know what YouTube did when it first existed.

 

Storing 16 versions of a video is absolutely less costly than transcoding, and it's likely far fewer than 16 versions anyway; more than 10 is unlikely. Compute, whether it's CPU, GPU or ASIC, costs more than storage, and it also doesn't scale anywhere near as well, something like 100 times worse, than file storage space and direct download.

Still waiting on that promise of "peelback" codecs, which build a stream from multiple delta streams.

 

e.g. 144-320-480-720-1080-1440-2160. I have a feeling patents are probably involved. The idea was that you transcode once (in real time), but you only subscribe to the streams necessary to build the final resolution you need.

 

How this would even work probably makes more sense if you think of interlaced video: each stream adds the "even" rows (and I'm going to say rows, but it's really rows and columns).

 

So 144->320 adds a 176-row second stream, 320->480 adds a 160-row stream, 480p->720p adds a 240-row stream, 720p->1080p adds 360 rows, 1080p->1440p adds 360 rows, and 1440p->2160p adds 720 rows.
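
A quick check of those deltas, for anyone following along:

ladder = [144, 320, 480, 720, 1080, 1440, 2160]
# Rows each successive "peelback" stream would add on top of the previous one.
for lo, hi in zip(ladder, ladder[1:]):
    print(f"{lo}->{hi}: +{hi - lo} rows")
# 144->320: +176, 320->480: +160, 480->720: +240,
# 720->1080: +360, 1080->1440: +360, 1440->2160: +720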

 

 


8 hours ago, Kisai said:

e.g. 144-320-480-720-1080-1440-2160. I have a feeling patents are probably involved. The idea was that you transcode once (in real time), but you only subscribe to the streams necessary to build the final resolution you need.

 

How this would even work probably makes more sense if you think of interlaced video: each stream adds the "even" rows (and I'm going to say rows, but it's really rows and columns).

How it could work might be better compared to JPEG encoding, as that operates in the frequency domain. You separate out the different spatial frequency content and store each band separately. The coarsest band would be the lowest resolution, and you can layer on the higher-frequency information to build up the higher-resolution images.

 

I'm going to guess a problem with this approach is in the temporal domain: things like motion data and predictive frames. Do they still work well when broken apart in this way? At best I can imagine there will be an encoding overhead.
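
As a toy illustration of that layered idea (a sketch only, not how any real scalable codec is specified), here's a NumPy snippet that stores a frame as a coarse base layer plus residual detail layers, so a client could subscribe to only as many layers as its target resolution needs:

import numpy as np

def downsample(img):
    # Naive 2x downsample by averaging 2x2 blocks.
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # Naive 2x upsample by pixel repetition.
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def encode_layers(frame, levels=3):
    # Split a frame into a coarse base layer plus one residual layer per level.
    base = frame.astype(np.float64)
    details = []
    for _ in range(levels):
        smaller = downsample(base)
        details.append(base - upsample(smaller))  # what the coarser layer can't represent
        base = smaller
    return base, details[::-1]  # coarsest detail first

def decode(base, details, layers):
    # Reconstruct using only the first `layers` detail streams.
    out = base
    for d in details[:layers]:
        out = upsample(out) + d
    return out

frame = np.random.rand(240, 320)          # stand-in for one decoded 320x240 frame
base, details = encode_layers(frame)
low = decode(base, details, layers=1)     # low-resolution client
full = decode(base, details, layers=3)    # full-resolution client
print(base.shape, low.shape, full.shape)  # (30, 40) (60, 80) (240, 320)
print(np.allclose(full, frame))           # True: all layers together reproduce the original

The temporal problem mentioned above is exactly where this gets hard in a real codec: motion vectors and predicted frames don't split across layers this cleanly.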



13 hours ago, Kisai said:

YouTube used to transcode content on the fly. I know this because content I uploaded in ZMBV without any keyframes was unseekable, or at least USED TO BE; I'm not sure exactly when that changed. It wastes a lot of space to keep 16 versions of the same video if that video is only ever watched once.

 

[Snip]

 

So YouTube takes a high-quality H.265 input and produces a relatively decent H.264 output. Not VP9. Not AV1. If you look at the 4.2 under AVC (H.264), you'll notice that is the minimum level for 1080p60.

I missed this post earlier. If you still upload to YouTube you'll see this on the video detail page:

[screenshot: encoding progress indicators on the video details page]

These light up as they finish encoding. SD is the first block to light up, and the others follow later. When SD is done, I'm offered 360p only; when encoding is complete I get all the options from 144p to 4K. It may be possible that, where it is cheap to do so, they real-time transcode the lower resolutions: 360p to 144p is relatively low cost, whereas you're not going to want to do 4K to 1440p in real time. But given that I don't get a 144p option after the initial SD encode is done, that doesn't seem to be the case.

 

You also got me interested in codec choice/offerings. I upload with YouTube's recommended H.264 settings, and on my system I'm always offered VP9 playback for that video all the way from 144p to 4K. I checked in case they alter it for different resolutions. This is with Chrome, an Ampere GPU and Win11; if you have a different browser, GPU or OS, that may differ.

 

We can look at YouTube's recommended upload bitrates for 60 fps SDR content: 4K is 53-68 Mbps, and the sum for 1440p, 1080p, 720p, 480p and 360p is 49 Mbps. They don't list 240p or 144p, but those won't be significant. Let's simplify and say that if you upload 4K content, the storage requirement for all 8 offered resolutions will be less than 2x that of the 4K version alone. I know what you download may differ, but it should scale similarly.
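
Putting rough numbers on that, using the recommended-bitrate figures above (what YouTube actually serves will be lower, but the ratio is the point):

# YouTube's recommended 60 fps SDR upload bitrates (Mbps), per the figures quoted above.
bitrate_4k = 61          # midpoint of the 53-68 Mbps range for 4K
lower_rungs_total = 49   # 1440p + 1080p + 720p + 480p + 360p combined
overhead = (bitrate_4k + lower_rungs_total) / bitrate_4k
print(f"All rungs together are about {overhead:.2f}x the 4K version alone")  # ~1.80x, i.e. under 2x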



On 4/21/2024 at 11:26 AM, leadeater said:

I would guess YouTube doesn't auto-select 4K on phones, but since I don't watch YouTube on my phone I wouldn't know.

 

...

 

Demanding zero people be (negatively) affected is simply unrealistic and also unfair.

Ha! Funny. YouTube avoids serving even HD content if at all possible. I recall Luke demoing it on the WAN Show at one point. I have my YouTube settings on higher picture quality, but YouTube will almost always play at less than 720p unless I force the quality setting manually, which "only applies to the current video." It does this regardless of Wi-Fi/LTE/5G. It's pretty scummy and I can always tell; however, it only really matters when I need a lot of detail, so I guess their BS strategy works.

 

As far as your final statement goes, I agree, but it's always a good idea to keep an eye on what technology companies make obsolete. Sometimes, they really do make unreasonable changes, which is why the tech community has to hold them accountable. It's not like I'm asking for a Windows phone to still be fully functional, but I'd rather a mid-tier phone from 3ish years ago still work for a while (not my device, but they definitely exist). I guess we'll have to see how it all plays out. Not too much point in getting over-speculative about the whims of YouTube 😆



9 hours ago, porina said:

 

 

You also got me interested in codec choice/offerings. I upload with YouTube's recommended H.264 settings, and on my system I'm always offered VP9 playback for that video all the way from 144p to 4K. I checked in case they alter it for different resolutions. This is with Chrome, an Ampere GPU and Win11; if you have a different browser, GPU or OS, that may differ.

 

[screenshot: format list for the streamed (H.265) upload]

VP9 is only offered once on that h.265 stream.

 

Meanwhile, if I UPLOAD a video straight from Davinci Resolve:

[screenshot: format list for the DaVinci Resolve upload]

 

Conclusion:

YouTube presently transcodes to VP9 only for 2160p/1440p, and to both VP9 and AVC for 1080p and 720p, but also for 480p, 360p, 240p and 144p. VP9 is not used at all for videos that came from streams.

 

However, if you look carefully, you'll see each of those codecs has a suffix. My guess is that each suffix is a specific encoding profile.
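
If anyone wants to check what their own uploads get served as, a small sketch using yt-dlp's Python API (assuming yt-dlp is installed; the URL is a placeholder) lists the offered formats much like the screenshots above:

import yt_dlp

VIDEO_URL = "https://www.youtube.com/watch?v=..."  # placeholder, use one of your own videos

# Query the format list without downloading anything.
with yt_dlp.YoutubeDL({"quiet": True}) as ydl:
    info = ydl.extract_info(VIDEO_URL, download=False)

for f in info["formats"]:
    if f.get("vcodec") not in (None, "none"):
        # format_id is YouTube's itag (e.g. 299 for AVC 1080p60); vcodec shows avc1/vp9/av01.
        print(f["format_id"], f.get("height"), f.get("fps"), f.get("vcodec"))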

 

9 hours ago, porina said:

We can look at YouTube's recommended upload bitrates for 60 fps SDR content: 4K is 53-68 Mbps, and the sum for 1440p, 1080p, 720p, 480p and 360p is 49 Mbps. They don't list 240p or 144p, but those won't be significant. Let's simplify and say that if you upload 4K content, the storage requirement for all 8 offered resolutions will be less than 2x that of the 4K version alone. I know what you download may differ, but it should scale similarly.

 

Well, let's find out. 4K is "3.20GB", and 1440p+1080p+720p+480p+360p+144p = 3.4GB. That's just slightly over the estimate. I could download all of them if I wanted an exact number.

 


6 hours ago, Nimoy007 said:

which is why the tech community has to hold them accountable. It's not like I'm asking for a Windows phone to still be fully functional, but I'd rather a mid-tier phone from 3ish years ago still work for a while (not my device, but they definitely exist). I guess we'll have to see how it all plays out. Not too much point in getting over-speculative about the whims of YouTube 😆

Well, that's all fine and good, but we can more accurately go by YouTube's own past history and its technical information on what they actually do.

 

Quote

the YouTube infrastructure team says it has created the "VCU" or "Video (trans)Coding Unit," which helps YouTube transcode a single video into over a dozen versions that it needs to provide a smooth, bandwidth-efficient, profitable video site.

https://arstechnica.com/gadgets/2021/04/youtube-is-now-building-its-own-video-transcoding-chips/

 

So I can very confidently tell you they aren't going to be AV1-only, not for a very long time, if ever.


7 hours ago, leadeater said:

Well, that's all fine and good, but we can more accurately go by YouTube's own past history and its technical information on what they actually do.

 

https://arstechnica.com/gadgets/2021/04/youtube-is-now-building-its-own-video-transcoding-chips/

 

So I can very confidently tell you they aren't going to be AV1-only, not for a very long time, if ever.

I have to wonder why Nvidia didn't build anything like this, considering how much die space the encoders take up:

[AD102 block diagram]

That's AD102. Those 3 NVENCs and 3 NVDECs could all fit in one GPC. Considering there are 12 of those, one AD102 should be able to fit 75 NVENCs + 3 NVDECs. If Google is spitting out 10 resolutions in two codecs, you could probably have 4 input streams and 20 outputs at once in that die space.

 

At any rate, I'd like to see an example of where YouTube is actually using AV1, because I've not run into any that I can think of.

 

 


There is quite a lot of misinformation (or very vague terms) about this news piece floating around. Even the source article itself seems to get some things wrong or at the very least makes misleading remarks.

 

1) Android devices going quite far back already had support for AV1. What is changing is that the decoder is being changed from libgav1 (Google's own AV1 decoder) to dav1d (the AV1 decoder developed by VideoLAN). So nothing is changing in terms of what devices can and can't play. It's just that the new decoder is better than the old one.

 

2) When talking about which formats a device supports or doesn't support it is very important to specify "software support" and "hardware support". Pretty much all devices support AV1 decoding in software. Very few support it in hardware. 

 

3) Just because your device reports support for a certain video format does not mean an app will use it. On Android, when an app fetches the list of supported formats the OS specifies if decoding of the format is supported in software, hardware or both. In other words, just because your phone supports AV1 decoding in software doesn't mean an app will just decide to fetch that format for you. The app itself will have information about which formats are supported in hardware and which aren't, and makes a decision based on that.

 

4) Just because the YouTube app, or any other app for that matter, uses the new dav1d decoder doesn't mean it will automatically fetch an AV1 video. Which video it decides to fetch is separate from which formats are supported. As I said earlier, nothing in this change from libgav1 to dav1d changes what devices report as supported formats. If YouTube now decides to play AV1 videos on devices that don't support hardware-accelerated AV1 decoding, then it is because the YouTube app doesn't care, not because of some OS change that messes with what gets reported as supported video formats.

 

5) Something to keep in mind is that AV1 is very easy to decode in software. Last time I checked, the OnePlus 8 with its quad Cortex-A77 CPU (Snapdragon 865) was able to easily get 250+ FPS when decoding high-bitrate 1080p footage on just its CPU.

Even a single Cortex-A53 is enough to play 720p footage.

Of course, it costs more than hardware-accelerated H.264 or VP9 decoding, but we're still talking about what should be a fairly low impact, especially since this mostly applies to phones that usually get 480p video served to them.

Laptops, where the power efficiency matters the most, have had hardware-accelerated AV1 decoding support for quite a while now. It shouldn't be too big of a deal.
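
If you want to sanity-check software AV1 decode speed on your own hardware, ffmpeg's null-output benchmark is a quick way to do it; a hedged sketch (assuming an ffmpeg build with an AV1 decoder such as dav1d, and any local AV1 clip):

import subprocess

# Decode an AV1 file as fast as possible, discard the output, and print timing stats.
# "av1_test.webm" is a placeholder for any local AV1-encoded clip.
subprocess.run(
    ["ffmpeg", "-hide_banner", "-benchmark", "-i", "av1_test.webm", "-f", "null", "-"],
    check=True,
)
# The "bench:" line ffmpeg prints at the end gives CPU time; frames decoded divided
# by that time is your software decode frame rate.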

 

 

I am sure that Google have run some calculations to see if this is a good idea or not.

 

 

 

Edit:

Not sure why so many people are talking about uploading in this thread either. This has nothing to do with uploading.

The only thing this (potentially) changes has to do with watching/downloading/decoding. Not uploading.


4 hours ago, Kisai said:

I have to wonder why Nvidia didn't build anything like this, considering how much die space the encoders take up:

What's the market? It isn't Nvidia's core business. Google have their own chip. What does Twitch use? 

 

Also, I wouldn't use the illustration to estimate sizes, even if it's somewhat indicative. Annotated die shots like the one found halfway through the link below are better, but even that doesn't break things down in fine enough detail.

https://locuza.substack.com/p/nvidias-ada-lineup-configurations

 

20 minutes ago, LAwLz said:

Not sure why so many people are talking about uploading in this thread either. This has nothing to do with uploading.

I was using their recommended upload bitrates earlier as a proxy to estimate the potential storage impact of multiple formats.



6 minutes ago, porina said:

I was using their recommended upload bitrates earlier as a proxy to estimate the potential storage impact of multiple formats.

Ah I see. Missed that part.


54 minutes ago, LAwLz said:

2) When talking about which formats a device supports or doesn't support it is very important to specify "software support" and "hardware support". Pretty much all devices support AV1 decoding in software. Very few support it in hardware. 

Not all HW decoding is equal; sometimes only a small fraction of the entire stream is HW accelerated and the rest is done on the GPU or CPU/vector units.

On other systems the video decoder is a fully separate co-processor with its own mini OS. It is passed a pointer to the video data in memory and a program (a set of instructions in its own private instruction set) that describes the decode, and it carries this out, writing the raw output to another memory address.

The power draw of these two approaches can be very different. The key part is that if you have a separate video decoder co-processor, the main CPU cores can all go to sleep for most of the time, with the decoder providing raw frame data to the GPU, or even directly to the display controller, meaning the CPU only needs to wake a single core for a tiny fraction of each frame's time. Of course, the downside of this is die area.
 

59 minutes ago, LAwLz said:

Something to keep in mind is that AV1 is very easy to decode in software. Last time I checked, the OnePlus 8 with its quad Cortex-A77 CPU (Snapdragon 865) was able to easily get 250+ FPS when decoding high-bitrate 1080p footage on just its CPU.

The power draw of running a raw CPU decode is huge (even for lower-bitrate 480p) compared to a dedicated decoder.

 


5 hours ago, Kisai said:

I have to wonder why Nvidia didn't build anything like this, considering how much die space the encoders take up:

These architecture diagrams aren't actually to physical scale. They are for illustration only.

 

[annotated die shot]

The encoders and decoders are actually quite big (see the GA102 die shot below for a better view of the relevant area), and they are in that portion of the die so they can be used while power-gating the GPCs, and also so they don't take power budget from the GPCs.

 

Putting even just one pair of encoders/decoders into the GPC would significantly increase the die area, among other drawbacks.

 

[GA102 die shot]

 


56 minutes ago, hishnash said:

Not all HW decoding is equal; sometimes only a small fraction of the entire stream is HW accelerated and the rest is done on the GPU or CPU/vector units.

When I talk about HW acceleration I mean fixed-function hardware. In those cases, you don't run into scenarios where "part of the stream" is decoded in hardware and the rest in software. 

If you're thinking of scenarios like with the old CUVID implementation (no fixed function hardware, but decoding could be done on CUDA cores) then I'd file that under software decoding still. Yes, it offloads some stuff to the GPU but it's not in fixed-function hardware. I am not sure I've ever seen a vendor list a format as hardware accelerated or supported in hardware when they refer to GPGPU stuff. 

There might be exceptions, but pretty much all devices that say they support decoding of something mean they do it fully in discrete video decoding logic.

 

 

 

1 hour ago, hishnash said:

The power draw of running a raw CPU decode is huge (even for lower-bitrate 480p) compared to a dedicated decoder.

Which is why I said:

2 hours ago, LAwLz said:

Of course, it costs more than hardware-accelerated H.264 or VP9 decoding, but we're still talking about what should be a fairly low impact

 


16 minutes ago, LAwLz said:

When I talk about HW acceleration I mean fixed-function hardware. In those cases, you don't run into scenarios where "part of the stream" is decoded in hardware and the rest in software. 

Most video decoders these days are not as fixed function as you might think.  

https://asahilinux.org/2024/01/fedora-asahi-new/#hardware-video-decode
 

Quote

The Apple Video Decoder is a multiformat programmable hardware driven by a custom instruction set specialized for decoding video. AVD, despite its complexity, oddly enough lacks a firmware handling the low-level decode logic

The HW itself, even the firmware, does not know how to decode any video. The instructions on how to decode the video are provided by user-space applications (the system library) when you call the APIs to decode a video. There is a good reason for this: once you look into all the different permutations for a given video codec (including color formats etc.), you very quickly end up with a huge number of possible sequences of tasks, and having dedicated HW pathways for all of these would take up a massive amount of silicon.

Some other SoCs do the same, but they do not have it on a co-processor; they have these units within the GPU or within the CPU (see Intel CPUs), and in those cases some of the compute is offloaded to the CPU or GPU (or the CPU's vector engines) where that is possible.

 


 

 

 


1 hour ago, hishnash said:

Most video decoders these days are not as fixed function as you might think.  

https://asahilinux.org/2024/01/fedora-asahi-new/#hardware-video-decode

I would still call that fixed function. It's just that they are using the same function in a slightly more flexible way to reuse silicon.

The fact of the matter remains that that logic on the SoC is only used for decoding video, and the video formats it supports are locked and can't be changed. The only difference it makes is that the same transistors that handle some parts of H.265 might also handle decoding H.264, for example.

It's very much semantics that doesn't really change anything I said earlier.

 

 

 

Quote

Some other SoCs do the same, but they do not have it on a co-processor; they have these units within the GPU or within the CPU (see Intel CPUs), and in those cases some of the compute is offloaded to the CPU or GPU (or the CPU's vector engines) where that is possible.

Are you saying some of the tasks for decoding the video stream are handled by the CPU or "general purpose" GPU cores on Intel processors?

Because I am fairly sure all of the actual decoding work is done inside the media engine, not on the Xe cores or other execution units/shaders/TMUs/ROPs/etc. I guess you could argue that the GPU is involved because the decoded video gets copied into the video frame buffer and gets sent to the display, but that's very very pedantic.

I guess things like rendering and upscaling could also be done on the CPU or GPU but that's very different from the actual decoding step. I am strictly talking about decoding here, since that's what is relevant to the news piece. 


3 hours ago, leadeater said:

These architecture diagrams aren't actually to physical scale. They are for illustration only.

I know that; good luck getting a physical die marked up. I wasn't saying to put the encoders IN the GPC, I was saying make a device that is nothing but NVENC.

4 hours ago, porina said:

What's the market? It isn't Nvidia's core business. Google have their own chip. What does Twitch use? 

 

Because "twitch/youtube" competitors pop up all the time, and ultimately fail when they realize that the hardware cost is extortionate and no amount of advertising will make up for that. 

 

Releasing a 2 x 20 or a 4 x 40 model, six of which can be popped into a server, gives you 160 potential on-the-fly encoders (e.g. Twitch) per server, and I'm sure it could be made even more dense if it were semi-programmable (e.g. function-level blocks such as iDCT, Huffman, RLE, entropy encoding, and so forth) so that an input allocates only the necessary number of function blocks to be transformed, rather than there always being enough capacity to transform a 200 Mbit 8K stereo/3D video.

 

And if a "peelback" transport layer was ever going to happen, this kind of functionality could effectively create deltas starting from the largest size, downwards. 

 

Anyway, I digress. Google clearly went and made their own hardware because they needed to. Twitch, meanwhile, has been waffling around and making itself less useful and less usable over time because of "costs"; something like this solves the problem of there not being enough encoder capacity to service everyone.


33 minutes ago, Kisai said:

I know that; good luck getting a physical die marked up. I wasn't saying to put the encoders IN the GPC, I was saying make a device that is nothing but NVENC.

Ah, I'm not sure anyone really wants that though. The Pro cards already have no actual limit on the number of encode streams, just a performance/quality limit, and they can do over 30 streams per card on Ampere at 1080p/30 with a high enough bit rate.

 

Quote

The encode performance listed in Table 3 is given per NVENC engine. Thus, if the GPU has 2 NVENCs (e.g. GP104, AD104), multiply the corresponding number in Table 3 by the number of NVENCs per chip to get aggregate maximum performance (applicable only when running multiple simultaneous encode sessions). Note that unless Split Frame Encoding is enabled, performance with single encoding session cannot exceed performance per NVENC, regardless of the number of NVENCs present on the GPU. Multi NVENC Split Frame Encoding is a feature introduced in SDK12.0 on Ada GPUs for HEVC and AV1. Refer to the NVENC Video Encoder API Programming Guide for more details on this feature.

 

NVENC hardware natively supports multiple hardware encoding contexts with negligible context-switching penalty. As a result, subject to the hardware performance limit and available memory, an application can encode multiple videos simultaneously. NVENCODE API exposes several presets, rate control modes and other parameters for programming the hardware. A combination of these parameters enables video encoding at varying quality and performance levels. In general, one can trade performance for quality and vice versa

 

 

[table: NVENC encode performance per generation, from the NVENC application note]

 

Quote

Above measurements are made using the following GPUs: GTX 1060 for Pascal, RTX 8000 for Turing, RTX 3090 for Ampere, and RTX 4090 for Ada. All measurements are done at the highest video clocks as reported by nvidia-smi (i.e. 1708 MHz, 1950 MHz, 1950 MHz, 2415 MHz for GTX 1060, RTX 8000, RTX 3090, and RTX 4090 respectively). The performance should scale according to the video clocks as reported by nvidia-smi for other GPUs of every individual family. Information on nvidia-smi can be found at https://developer.nvidia.com/nvidia-system-management-interface.

https://docs.nvidia.com/video-technologies/video-codec-sdk/12.1/nvenc-application-note/index.html
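
As a concrete illustration of the one-input, many-outputs pattern the docs describe, here's a hedged ffmpeg sketch (assuming an ffmpeg build with h264_nvenc, a supported NVIDIA GPU, and placeholder file names/bitrates):

import subprocess

# One decode, two simultaneous NVENC encode sessions at different resolutions/bitrates.
subprocess.run(
    [
        "ffmpeg", "-hide_banner", "-i", "master.mp4",
        # 1080p output
        "-map", "0:v", "-map", "0:a?", "-vf", "scale=1920:1080",
        "-c:v", "h264_nvenc", "-b:v", "8M", "-c:a", "copy", "out_1080p.mp4",
        # 720p output
        "-map", "0:v", "-map", "0:a?", "-vf", "scale=1280:720",
        "-c:v", "h264_nvenc", "-b:v", "4M", "-c:a", "copy", "out_720p.mp4",
    ],
    check=True,
)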


9 minutes ago, leadeater said:

Ah, I'm not sure anyone really wants that though. The Pro cards already have no actual limit on the number of encode streams, just a performance/quality limit, and they can do over 30 streams per card on Ampere at 1080p/30 with a high enough bit rate.

 

In practice it doesn't work that way. I can't encode a 4K stream and a 1080p stream on an RTX 3090. That number is just an undefined limit based on the underlying function block capacity.

 

Encoding a 4K HEVC video pulls down 60-75% of the NVENC capacity, whereas an HD 1080p video pulls down 24-35%. I found part of this problem is that NVENC capacity is actually bottlenecked by rescaling: OBS spins up two separate encoders and passes the non-resized video to the 4K encode, but has to rescale the 4K down to 1080p.

 

When I started streaming simultaneously to Twitch and YouTube plus recording to disk, I actually had to use the HEVC encoder on the Intel iGPU to send the video stream to YouTube, because otherwise it pushes NVENC over its capacity.

 

If you wanted to min-max a transcoding batch (e.g. DaVinci Resolve, or command-line FFmpeg), then that entire capacity is available to the software and you could reasonably get 360 fps, but I've NEVER seen that kind of performance from NVENC under any workload. I think you would only get that kind of performance from SIMO output (Single Input, Multiple Output) that makes use of all the layers of caching.

 


12 hours ago, Kisai said:

In practice it doesn't work that way. I can't encode a 4K stream and a 1080p stream on an RTX 3090. That number is just an undefined limit based on the underlying function block capacity.

Yes you can: you can do as many sessions as you want on a Pro card, and 5 on a GeForce. People have been doing this for ages; if you can't get it working, either it's something you are doing or you are trying an unsupported scenario, which by the way wouldn't change if you had 100 NVENC encoders on a card, since most already have 2.

 

[diagram: NVENC latency-tolerant encoding, from the Video Codec SDK page]

https://developer.nvidia.com/video-codec-sdk

 

Also, real-time encoding, aka live streaming, has its own requirements; it has nothing to do with whether or not you can do multiple sessions. Your scenario is also only relevant to client-side streaming. Server side wouldn't care about any of that and would multi-session just fine, up to the real-time limit set by performance (for live streaming only). But as mentioned, the requirements for live streaming and for encoding in general are different.

 

If you want to ingest a video file and encode it to 12 different output files of different settings then no problem, just let it run.

 

You might want to have a look at this, possibly helpful to you when available:

https://help.twitch.tv/s/article/multiple-encodes?language=en_US

