
YouTube Using Custom-Designed Video-Transcoding Chips

ThePointblank

YouTube is building and deploying a custom-designed chip to transcode video content uploaded to YouTube. This is being reported by the YouTube blog and Ars Technica:

 

https://blog.youtube/inside-youtube/new-era-video-infrastructure

 

https://arstechnica.com/gadgets/2021/04/youtube-is-now-building-its-own-video-transcoding-chips/

 

Quote

Google has decided that YouTube demands such a huge transcoding workload that it needs to build its own server chips. The company detailed its new "Argos" chips in a YouTube blog post, a CNET interview, and in a paper for ASPLOS, the Architectural Support for Programming Languages and Operating Systems Conference. Just as there are GPUs for graphics workloads and Google's TPU (tensor processing unit) for AI workloads, the YouTube infrastructure team says it has created the "VCU" or "Video (trans)Coding Unit," which helps YouTube transcode a single video into over a dozen versions that it needs to provide a smooth, bandwidth-efficient, profitable video site.

The new video transcoding chip is called "Argos" and is mounted on a full-length PCIe card, with two chips per card. A giant heat sink covers the entire card, and there appears to be an external power connector as well. The cards very much resemble a video card, and this is apparently no accident; Google said it chose the video-card-like format because it fits its existing accelerator trays.

 

Google claims the Argos chip provides "up to 20-33x improvements in compute efficiency compared to our previous optimized system, which was running software on traditional servers."

 

The Google-supplied chip diagram reveals more details about the design. It shows 10 "encoder cores" per chip, with Google's white paper adding that "all other elements are off-the-shelf IP blocks." The white paper further notes that each encoder core can encode 2160p in real time at up to 60 FPS (frames per second) using three reference frames.

 

Google reportedly has thousands of these chips in operation already, and thanks to them, viewers can watch 4K content on YouTube within hours of upload instead of the days it previously took on systems based on Intel Skylake CPUs and Nvidia T4 Tensor Core GPUs. Google apparently saves a great deal of money with these chips, even after factoring in the development and manufacturing costs.

 

Reading over what Google is saying about the chip, I can definitely see why Google elected to develop its own video transcoding chip for this specific workload; it appears to be a far more efficient and cost-effective way to handle the workload than other off-the-shelf options.


I can hear Linus's heavy breathing already.

PC - NZXT H510 Elite, Ryzen 5600, 16GB DDR3200 2x8GB, EVGA 3070 FTW3 Ultra, Asus VG278HQ 165hz,

 

Mac - 1.4ghz i5, 4GB DDR3 1600mhz, Intel HD 5000.  x2

 

Endlessly wishing for a BBQ in space.


Makes sense. ASICs will almost always blow general-purpose silicon out of the water when it comes to very specialized tasks. I'm surprised they didn't do it sooner, considering Google has the resources and personnel in house to do so.

CPU: Intel i7 - 5820k @ 4.5GHz, Cooler: Corsair H80i, Motherboard: MSI X99S Gaming 7, RAM: Corsair Vengeance LPX 32GB DDR4 2666MHz CL16,

GPU: ASUS GTX 980 Strix, Case: Corsair 900D, PSU: Corsair AX860i 860W, Keyboard: Logitech G19, Mouse: Corsair M95, Storage: Intel 730 Series 480GB SSD, WD 1.5TB Black

Display: BenQ XL2730Z 2560x1440 144Hz


What I'm interested in knowing is whether or not Google will sell these cards to other companies or cloud services. I could definitely see a use case for these for Instagram, Snapchat, Netflix, and others.

Arch is better than Ubuntu. Fight me peko.


11 minutes ago, JLO64 said:

What I'm interested in knowing is whether or not Google will sell these cards to other companies or cloud services. I could definitely see a use case for these for Instagram, Snapchat, Netflix, Floatplane, and others.

Fixed...

My eyes see the past…

My camera lens sees the present…


14 hours ago, JLO64 said:

What I'm interested in knowing is whether or not Google will sell these cards to other companies or cloud services. I could definitely see a use case for these for Instagram, Snapchat, Netflix, and others.

They likely don't need to. Transcoding is largely off-the-shelf IP logic.

 

Like I imagine it works like this:

 

H264/H265 hardware decoder + H264 (for mobile) and VP9 encoders at 8 different resolutions. So you just put enough memory on it to handle 15 decoded frames and 5 encoded frames per resolution. Back-of-the-envelope:

 

48bpp at 8K: 7680 × 4320 × 48bpp ≈ 200MB per frame, so you need 3GB of memory on the input side for 15 frames.

Then on the output side, you need 1GB (200MB × 5) for 8K HDR, 500MB for 8K non-HDR, 500MB for 4K HDR, 250MB for 4K non-HDR, 64MB for 1080p HDR, 32MB for 1080p, 5MB for 720p, 1.2MB for 360p; so add all that up: about 5GB of memory. Plus you can recycle the same frame buffer to do H264 and VP9 at the same time.
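The back-of-the-envelope numbers above are easy to sanity-check. Here's a quick sketch, assuming 48 bits per pixel for a decoded 8K HDR frame (my assumption, not a figure from Google's paper):

```python
# Rough check of the frame-buffer arithmetic in the post above.
width, height, bpp = 7680, 4320, 48

frame_bytes = width * height * bpp // 8           # bytes per decoded 8K frame
input_buffer = 15 * frame_bytes                   # 15 decoded frames in flight

print(f"{frame_bytes / 1e6:.0f} MB per frame")    # ~199 MB, i.e. the "200MB"
print(f"{input_buffer / 1e9:.1f} GB input side")  # ~3.0 GB
```

So the "200MB per frame" and "3GB on the input side" figures check out under that pixel-format assumption.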

 

Note YouTube's suggested encoding settings include:

Quote
  • 2 consecutive B frames
  • Closed GOP. GOP of half the frame rate.

 

So if you're encoding 60fps or 120fps HDR, the GOP is going to be 30 or 60 frames. You don't need enough memory for the full 60, 90, 120, or 144 frames, because each GOP begins with an I frame, contains two consecutive B frames, and the rest are P frames. You can only seek to I frames in a video, so typically you need just enough frames between the I frame and the B frames to do any kind of compression without seeking. A B frame, however, requires both forward and backward reference data, so you can't discard its neighbouring frames until it's fully built. Under the assumption of one B-frame run per 15 frames, this is probably the reasonable minimum. Again, back-of-the-envelope. There will be other kinds of videos encoded that are not H264/H265.
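The sizing logic above can be sketched in a few lines (illustrative values, not YouTube's actual encoder configuration):

```python
# Sketch of the GOP sizing rule quoted above: closed GOP of half the
# frame rate, with 2 consecutive B frames.

def gop_length(fps: int) -> int:
    # YouTube's recommended closed GOP is half the frame rate.
    return fps // 2

def frames_in_flight(consecutive_b: int = 2) -> int:
    # A B frame needs both its past and future anchor frames resident,
    # so the encoder holds the B-frame run plus its two anchors --
    # far fewer frames than the whole GOP.
    return consecutive_b + 2

for fps in (30, 60, 120):
    print(fps, "fps -> GOP of", gop_length(fps),
          "frames, but only ~", frames_in_flight(), "frames held at once")
```

The point being that the memory requirement scales with the B-frame run length, not the GOP length.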

https://support.google.com/youtube/troubleshooter/2888402?hl=en

Quote
  • .MOV
  • .MPEG4
  • .MP4
  • .AVI
  • .WMV
  • .MPEGPS
  • .FLV
  • 3GPP
  • WebM
  • DNxHR
  • ProRes
  • CineForm
  • HEVC (h265)

So the VAST majority of these are software codecs found in FFmpeg, and if you rip a stream from YouTube, you'll usually see exactly which version of FFmpeg they forked from. I know from experience I can upload ZMBV video from DOSBox (a codec with no I frames, also used in other emulators) and it will work, but it will not seek very well, because ZMBV is more of an archival format than a streamable one. It's actually kind of interesting: the videos will PLAY on YouTube but will not seek, which tells me they are doing a lot of playback on the fly, right off the source video. In the future they might build video files with variable bitrate and resolution so they can dynamically "build up" from the lowest resolution to whatever resolution the stream is requested at. That would also explain why some videos don't expose both 30 and 60fps streams when the source is 60fps, since that leads to timing issues if you try to synchronize the audio to streams running at different rates (audio is sent as a separate stream).
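As a hypothetical sketch of that kind of stream inspection: ffprobe (part of FFmpeg) can print the encoder tag, which often reveals the x264/FFmpeg build a file was produced with. The filename here is a placeholder, and the tag isn't guaranteed to be present in every file:

```python
# Sketch: querying a downloaded file's "encoder" metadata tag with ffprobe.
# Assumes ffprobe is installed; "video.mp4" is a placeholder filename.
import shutil
import subprocess

def encoder_probe_cmd(path: str) -> list[str]:
    # Ask ffprobe for the encoder tag at both the container and stream level.
    return [
        "ffprobe", "-v", "quiet",
        "-show_entries", "format_tags=encoder:stream_tags=encoder",
        "-of", "default=noprint_wrappers=1",
        path,
    ]

if shutil.which("ffprobe"):
    result = subprocess.run(encoder_probe_cmd("video.mp4"),
                            capture_output=True, text=True)
    print(result.stdout)  # e.g. a Lavf/x264 version string, if tagged
```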

 

That said, YouTube botches RGB source videos in colorspace conversion. It's a known thing among people who upload animation (particularly animation converted from Adobe Flash/Animate) that they have to add noise to the video, otherwise the encoder turns all the gradients into thick color bands and the color will be horrific.


I find it amusing that the VCU is described as GPU-like in form factor, when to me GPUs are just one example of an add-in expansion card. Might be showing my age, but at some point in my life, before they got integrated into the motherboard, systems had an I/O card (IDE, serial, parallel), a video card (before 3D was a thing), a sound card, and maybe even a network card if you were sufficiently advanced. So a VCU card continues that history. There isn't really any other alternative, is there?

 

37 minutes ago, Kisai said:

That said, YouTube botches RGB source videos in colorspace conversion. It's a known thing among people who upload animation (particularly animation converted from Adobe Flash/Animate) that they have to add noise to the video, otherwise the encoder turns all the gradients into thick color bands and the color will be horrific.

To say they botch the conversion implies if you can supply the same video in a format other than RGB encoded so it doesn't have to do the conversion, it is fine? Basically I'm asking to confirm it is the colourspace conversion that is at fault and not the more general codec properties? It has long been an annoyance of mine that most codecs for stills or video are often tuned around photorealistic content. Other content does ok at higher bitrates, but when squeezed cracks start to show. Don't think this is going to go away any time soon.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


43 minutes ago, porina said:

To say they botch the conversion implies if you can supply the same video in a format other than RGB encoded so it doesn't have to do the conversion, it is fine? Basically I'm asking to confirm it is the colourspace conversion that is at fault and not the more general codec properties? It has long been an annoyance of mine that most codecs for stills or video are often tuned around photorealistic content. Other content does ok at higher bitrates, but when squeezed cracks start to show. Don't think this is going to go away any time soon.

 

The thing is, you end up having to "dirty" the animation and pre-encode it to YUV420 to make YouTube not compress it into a mess. I've done a lot of tests throwing things at YouTube to see what it does, and repeatedly, even on 8-bit palette visuals, it turns what should be gradients into a mess, particularly along edges. So my strategy is to always upscale the content to 2x or 3x the original resolution.

 

So if you solve the colorspace problem in advance, the output won't look like Vaseline has been smeared all over the screen. This problem is also seen when people stream games from consoles: consoles HAVE an RGB output, IF they are not connected to a TV. If they're connected to a TV, they are likely outputting limited range (16-235), not full range (0-255).

 

At any rate, with animation the content is basically too clean on the input side, and the compression works too hard converting it from RGB24 to YUV420.
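A hedged sketch of the workaround described above: pre-convert to YUV 4:2:0 yourself and add weak temporal noise before uploading, so the encoder can't flatten gradients into bands. It assumes FFmpeg is installed; the filenames, noise strength, and CRF value are placeholder assumptions, not tested YouTube-optimal settings:

```python
# Sketch: build an FFmpeg command that adds mild temporal noise and
# converts to yuv420p before upload, to reduce banding on clean animation.
import shutil
import subprocess

def deband_upload_cmd(src: str, dst: str, noise: int = 6) -> list[str]:
    return [
        "ffmpeg", "-i", src,
        # noise=alls=N:allf=t adds weak temporal noise to all components;
        # format=yuv420p does the colorspace/subsampling conversion up front.
        "-vf", f"noise=alls={noise}:allf=t,format=yuv420p",
        "-c:v", "libx264", "-crf", "18",
        dst,
    ]

if shutil.which("ffmpeg"):
    print(" ".join(deband_upload_cmd("animation_rgb.mov", "upload.mp4")))
```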

 


iS tHiS tHe eNd oF iNtel?

 

Seriously: it's just logical. Why would you buy hardware that isn't specialized for your use case? This is something we will see more and more. With (almost) "free" architectures like ARM and RISC-V, you have a platform with huge existing software support, ready to be implemented. (I know designing custom SoCs is more difficult than that, but it has definitely become easier to make them.) Independent fab companies like Samsung, TSMC, and GlobalFoundries make it easy to produce chips on high-performance nodes.

Hi

 


Read the blog post, and I'm rather underwhelmed. Firstly, because there is no way they were doing transcoding in "software" previously. They'd have been using accelerators inside a standard server, so this likely just means they've moved to custom server packages. This is a lot more of the result of increases in interconnect bandwidth and faster storage.

 

The other thing is: why wasn't this done by 2015 rather than started in 2015? It's not like ASICs are new.


1 hour ago, wat3rmelon_man2 said:

But can it run Crysis?

Well, it can decode Crysis intro videos with zero CPU load 😄


1 hour ago, Taf the Ghost said:

Read the blog post, and I'm rather underwhelmed. Firstly, because there is no way they were doing transcoding in "software" previously. They'd have been using accelerators inside a standard server, so this likely just means they've moved to custom server packages. This is a lot more of the result of increases in interconnect bandwidth and faster storage.

 

The other thing is: why wasn't this done by 2015 rather than started in 2015? It's not like ASICs are new.

AV1 is why without a doubt


1 minute ago, Kisai said:

AV1 is why without a doubt

VP9 isn't exactly cheap computationally, but my real assumption is that AV1 accelerators were going to be expensive, and they finally finished a massive migration to in-house designs. I'd also guess that some group got pulled off the project in the middle, because it shouldn't have taken 6 years, unless there was some long-term IP tie-up issue they've just skipped mentioning.


Do these chips' motherboards have RGB headers?

PC - NZXT H510 Elite, Ryzen 5600, 16GB DDR3200 2x8GB, EVGA 3070 FTW3 Ultra, Asus VG278HQ 165hz,

 

Mac - 1.4ghz i5, 4GB DDR3 1600mhz, Intel HD 5000.  x2

 

Endlessly wishing for a BBQ in space.


On 4/23/2021 at 9:44 AM, Tieox said:

I can hear Linus's heavy breathing already.

He's going to push it to its limits now. Brace yourselves.

Main PC: the literature club machine

Intel I5 9600k @ 4.2 Ghz | MSI z390-a pro | G.Skill Trident Z RGB 32 GB 3000Mhz | Samsung 970 Evo 500 GB | Seagate barracuda 3.5" 2.5tb  | Thermaltake Floe Riing RGB 240 | Asus GeForce GTX 1660 Ti 6 GB DUAL OC | Thermaltake Core P3 TG Snow Edition

 

Daily drivers

OPPO A52 | Razer Blackwidow Chroma | Razer Deathadder V2 Pro | Beryodynamic DT 990 PRO | Focusrite Scarlett solo gen 2


On 4/23/2021 at 9:44 PM, Kisai said:

AV1 is why without a doubt

I agree. They're one of the contributors to AV1, and if they plan to use it seriously, they need to offer everything in AV1, which normally would require multiplying the infrastructure they use for other codecs.

With AV1 decoders showing up in new hardware and YT making this step, it seems AV1 is very close to mainstream now.


On 4/23/2021 at 7:50 PM, Taf the Ghost said:

Read the blog post, and I'm rather underwhelmed. Firstly, because there is no way they were doing transcoding in "software" previously. They'd have been using accelerators inside a standard server, so this likely just means they've moved to custom server packages. This is a lot more of the result of increases in interconnect bandwidth and faster storage.

 

The other thing is: why wasn't this done by 2015 rather than started in 2015? It's not like ASICs are new.

They were in fact transcoding in software before, at least for some codecs, from what I can tell.

I have some older YouTube videos downloaded, and their AVC versions were encoded with x264 in software. Newer videos seem to be encoded with something else, including the AVC versions.


11 hours ago, LAwLz said:

They were in fact transcoding in software before, at least for some codecs, from what I can tell.

I have some older YouTube videos downloaded, and their AVC versions were encoded with x264 in software. Newer videos seem to be encoded with something else, including the AVC versions.

This actually might point to a different issue, which was really their problem: they had a different work path for every codec. YT also goes back and re-encodes older videos to save space, which might be where most of the software encoder stack gets used.

