
YouTube Using Custom-Designed Video-Transcoding Chips

ThePointblank

YouTube is building and deploying a custom-designed chip to transcode video content uploaded to YouTube. This is being reported by the YouTube blog and Ars Technica:

 

https://blog.youtube/inside-youtube/new-era-video-infrastructure

 

https://arstechnica.com/gadgets/2021/04/youtube-is-now-building-its-own-video-transcoding-chips/

 

Quote

Google has decided that YouTube demands such a huge transcoding workload that it needs to build its own server chips. The company detailed its new "Argos" chips in a YouTube blog post, a CNET interview, and in a paper for ASPLOS, the Architectural Support for Programming Languages and Operating Systems Conference. Just as there are GPUs for graphics workloads and Google's TPU (tensor processing unit) for AI workloads, the YouTube infrastructure team says it has created the "VCU" or "Video (trans)Coding Unit," which helps YouTube transcode a single video into over a dozen versions that it needs to provide a smooth, bandwidth-efficient, profitable video site.

The new video transcoding chip is called "Argos" and is mounted on a full-length PCIe card, with two chips per card. A giant heat sink covers the entire card, and there appears to be an external power connector as well. The cards very much resemble a video card, and this is apparently no accident; Google said it chose the video-card-like format because it fits its existing accelerator trays.

 

Google claims the Argos chip provides "up to 20-33x improvements in compute efficiency compared to our previous optimized system, which was running software on traditional servers."

 

The Google-supplied chip diagram reveals more details about the design. It shows 10 "encoder cores" per chip, with Google's white paper adding that "all other elements are off-the-shelf IP blocks." The white paper further notes that each encoder core can encode 2160p in real time at up to 60 FPS (frames per second) using three reference frames.

 

Google reportedly has thousands of these chips in operation already, and thanks to them, viewers can watch 4K content on YouTube within hours of upload instead of the days it previously took on systems based on Intel Skylake CPUs and Nvidia T4 Tensor Core GPUs. Google apparently saves a great deal of money with these chips, even after factoring in the development and manufacturing costs.

 

Reading over what Google is saying about the chip, I can definitely see why Google elected to develop its own video transcoding chip for this specific workload; it appears to be a far more efficient and cost-effective way to handle the workload than other off-the-shelf options.


I can hear Linus's heavy breathing already.

PC - NZXT H510 Elite, Ryzen 5600, 16GB DDR3200 2x8GB, EVGA 3070 FTW3 Ultra, Asus VG278HQ 165hz,

 

Mac - 1.4ghz i5, 4GB DDR3 1600mhz, Intel HD 5000.  x2

 

Endlessly wishing for a BBQ in space.


Makes sense. ASICs will almost always blow general-purpose silicon out of the water when it comes to very specialized tasks. I'm surprised they didn't do it sooner, considering Google has the resources and personnel in house to do so.

CPU: Intel i7 - 5820k @ 4.5GHz, Cooler: Corsair H80i, Motherboard: MSI X99S Gaming 7, RAM: Corsair Vengeance LPX 32GB DDR4 2666MHz CL16,

GPU: ASUS GTX 980 Strix, Case: Corsair 900D, PSU: Corsair AX860i 860W, Keyboard: Logitech G19, Mouse: Corsair M95, Storage: Intel 730 Series 480GB SSD, WD 1.5TB Black

Display: BenQ XL2730Z 2560x1440 144Hz


What I'm interested in knowing is whether or not Google will sell these cards to other companies or cloud services. I could definitely see a use case for these for Instagram, Snapchat, Netflix, and others.

Arch is better than Ubuntu. Fight me peko.


11 minutes ago, JLO64 said:

What I'm interested in knowing is whether or not Google will sell these cards to other companies or cloud services. I could definitely see a use case for these for Instagram, Snapchat, Netflix, Floatplane, and others.

Fixed...

My eyes see the past…

My camera lens sees the present…


14 hours ago, JLO64 said:

What I'm interested in knowing is whether or not Google will sell these cards to other companies or cloud services. I could definitely see a use case for these for Instagram, Snapchat, Netflix, and others.

They likely don't need to. Transcoding is largely off-the-shelf IP logic.

 

Like I imagine it works like this:

 

H264/H265 hardware decoder + H264 (for mobile) and VP9 encoders at 8 different resolutions. So you just put enough memory on it to handle 15 decoded frames and 5 encoded frames per resolution. Back-of-the-envelope:

 

48bpp at 8K: 7680 × 4320 × 48bpp ≈ 200MB per frame, so you need 3GB of memory on the input side for 15 frames.

Then on the output side, you need 1GB (200MB × 5) for 8K HDR, 500MB for 8K non-HDR, 500MB for 4K HDR, 250MB for 4K non-HDR, 64MB for 1080p HDR, 32MB for 1080p, 5MB for 720p, 1.2MB for 360p; so add all that up: about 5GB of memory. Plus you can recycle the same frame buffer to do H264 and VP9 at the same time.
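The back-of-the-envelope numbers above are easy to sanity-check. Here's a quick sketch, assuming 48 bits per pixel for a decoded 8K HDR frame (my assumption, not a figure from Google's paper):

```python
# Rough check of the frame-buffer arithmetic in the post above.
width, height, bpp = 7680, 4320, 48

frame_bytes = width * height * bpp // 8           # bytes per decoded 8K frame
input_buffer = 15 * frame_bytes                   # 15 decoded frames in flight

print(f"{frame_bytes / 1e6:.0f} MB per frame")    # ~199 MB, i.e. the "200MB"
print(f"{input_buffer / 1e9:.1f} GB input side")  # ~3.0 GB
```

So the "200MB per frame" and "3GB on the input side" figures check out under that pixel-format assumption.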

 

Note YouTube's suggested encoding settings include:

Quote
  • 2 consecutive B frames
  • Closed GOP. GOP of half the frame rate.

 

So if you're encoding 60fps or 120fps HDR, the GOP is going to be 30 or 60 frames. You don't need enough memory for the full 60, 90, 120, or 144 frames, because each GOP begins with an I frame, contains two consecutive B frames, and the rest are P frames. You can only seek to I frames in a video, so typically you need just enough frames between the I frame and the B frames to do any kind of compression without seeking. A B frame, however, requires both forward and backward reference data, so you can't discard its neighbouring frames until it's fully built. Under the assumption of one B-frame run per 15 frames, this is probably the reasonable minimum. Again, back-of-the-envelope. There will be other kinds of videos encoded that are not H264/H265.
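The sizing logic above can be sketched in a few lines (illustrative values, not YouTube's actual encoder configuration):

```python
# Sketch of the GOP sizing rule quoted above: closed GOP of half the
# frame rate, with 2 consecutive B frames.

def gop_length(fps: int) -> int:
    # YouTube's recommended closed GOP is half the frame rate.
    return fps // 2

def frames_in_flight(consecutive_b: int = 2) -> int:
    # A B frame needs both its past and future anchor frames resident,
    # so the encoder holds the B-frame run plus its two anchors --
    # far fewer frames than the whole GOP.
    return consecutive_b + 2

for fps in (30, 60, 120):
    print(fps, "fps -> GOP of", gop_length(fps),
          "frames, but only ~", frames_in_flight(), "frames held at once")
```

The point being that the memory requirement scales with the B-frame run length, not the GOP length.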

https://support.google.com/youtube/troubleshooter/2888402?hl=en

Quote
  • .MOV
  • .MPEG4
  • .MP4
  • .AVI
  • .WMV
  • .MPEGPS
  • .FLV
  • 3GPP
  • WebM
  • DNxHR
  • ProRes
  • CineForm
  • HEVC (h265)

So the VAST majority of these are software codecs found in FFmpeg, and if you rip a stream from YouTube, you'll usually see exactly which version of FFmpeg they forked from. I know from experience I can upload ZMBV video from DOSBox (a codec with no I frames, also used in other emulators) and it will work, but it will not seek very well, because ZMBV is more of an archival format than a streamable one. It's actually kind of interesting: the videos will PLAY on YouTube but will not seek, which tells me they are doing a lot of playback on the fly, right off the source video. In the future they might build video files with variable bitrate and resolution so they can dynamically "build up" from the lowest resolution to whatever resolution the stream is requested at. That would also explain why some videos don't expose both 30 and 60fps streams when the source is 60fps, since that leads to timing issues if you try to synchronize the audio to streams running at different rates (audio is sent as a separate stream).
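As a hypothetical sketch of that kind of stream inspection: ffprobe (part of FFmpeg) can print the encoder tag, which often reveals the x264/FFmpeg build a file was produced with. The filename here is a placeholder, and the tag isn't guaranteed to be present in every file:

```python
# Sketch: querying a downloaded file's "encoder" metadata tag with ffprobe.
# Assumes ffprobe is installed; "video.mp4" is a placeholder filename.
import shutil
import subprocess

def encoder_probe_cmd(path: str) -> list[str]:
    # Ask ffprobe for the encoder tag at both the container and stream level.
    return [
        "ffprobe", "-v", "quiet",
        "-show_entries", "format_tags=encoder:stream_tags=encoder",
        "-of", "default=noprint_wrappers=1",
        path,
    ]

if shutil.which("ffprobe"):
    result = subprocess.run(encoder_probe_cmd("video.mp4"),
                            capture_output=True, text=True)
    print(result.stdout)  # e.g. a Lavf/x264 version string, if tagged
```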

 

That said, YouTube botches RGB source videos in colorspace conversion. It's a known thing among people who upload animation (particularly animation converted from Adobe Flash/Animate) that they have to add noise to the video, otherwise the encoder turns all the gradients into thick color bands and the color will be horrific.


I find it amusing that the VCU is described as GPU-like in form factor, when to me GPUs are just one example of an add-in expansion card. Might be showing my age, but at some point in my life, before they got integrated into the motherboard, systems had an I/O card (IDE, serial, parallel), a video card (before 3D was a thing), a sound card, and maybe even a network card if you were sufficiently advanced. So a VCU card continues that history. There isn't really any other alternative, is there?

 

37 minutes ago, Kisai said:

That said, YouTube botches RGB source videos in colorspace conversion. It's a known thing among people who upload animation (particularly animation converted from Adobe Flash/Animate) that they have to add noise to the video, otherwise the encoder turns all the gradients into thick color bands and the color will be horrific.

To say they botch the conversion implies if you can supply the same video in a format other than RGB encoded so it doesn't have to do the conversion, it is fine? Basically I'm asking to confirm it is the colourspace conversion that is at fault and not the more general codec properties? It has long been an annoyance of mine that most codecs for stills or video are often tuned around photorealistic content. Other content does ok at higher bitrates, but when squeezed cracks start to show. Don't think this is going to go away any time soon.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


43 minutes ago, porina said:

To say they botch the conversion implies if you can supply the same video in a format other than RGB encoded so it doesn't have to do the conversion, it is fine? Basically I'm asking to confirm it is the colourspace conversion that is at fault and not the more general codec properties? It has long been an annoyance of mine that most codecs for stills or video are often tuned around photorealistic content. Other content does ok at higher bitrates, but when squeezed cracks start to show. Don't think this is going to go away any time soon.

 

The thing is, you end up having to "dirty" the animation and pre-encode it to YUV420 to make YouTube not compress it into a mess. I've done a lot of tests throwing things at YouTube to see what it does, and repeatedly, even on 8-bit palette visuals, it turns what should be gradients into a mess, particularly along edges. So my strategy is to always upscale the content to 2x or 3x the original resolution.

 

So if you solve the colorspace problem in advance, the output won't look like Vaseline has been smeared all over the screen. This problem is also seen when people stream games from consoles: consoles HAVE an RGB output, IF they are not connected to a TV. If they're connected to a TV, they are likely outputting limited range (16-235), not full range (0-255).

 

At any rate, with animation the content is basically too clean on the input side, and the compression works too hard converting it from RGB24 to YUV420.
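A hedged sketch of the workaround described above: pre-convert to YUV 4:2:0 yourself and add weak temporal noise before uploading, so the encoder can't flatten gradients into bands. It assumes FFmpeg is installed; the filenames, noise strength, and CRF value are placeholder assumptions, not tested YouTube-optimal settings:

```python
# Sketch: build an FFmpeg command that adds mild temporal noise and
# converts to yuv420p before upload, to reduce banding on clean animation.
import shutil
import subprocess

def deband_upload_cmd(src: str, dst: str, noise: int = 6) -> list[str]:
    return [
        "ffmpeg", "-i", src,
        # noise=alls=N:allf=t adds weak temporal noise to all components;
        # format=yuv420p does the colorspace/subsampling conversion up front.
        "-vf", f"noise=alls={noise}:allf=t,format=yuv420p",
        "-c:v", "libx264", "-crf", "18",
        dst,
    ]

if shutil.which("ffmpeg"):
    print(" ".join(deband_upload_cmd("animation_rgb.mov", "upload.mp4")))
```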

 


iS tHiS tHe eNd oF iNtel?

 

Seriously: it's just logical. Why would you buy hardware that isn't specialized for your use case? This is something we will see more and more. With (almost) "free" architectures like ARM and RISC-V, you have a platform with huge existing software support, ready to be implemented. (I know designing custom SoCs is more difficult than that, but it has definitely become easier to make them.) Independent fab companies like Samsung, TSMC, and GlobalFoundries make it easy to produce chips on high-performance nodes.

Hi

 


Read the blog post, and I'm rather underwhelmed. Firstly, because there is no way they were doing transcoding in "software" previously. They'd have been using accelerators inside a standard server, so this likely just means they've moved to custom server packages. This is a lot more of the result of increases in interconnect bandwidth and faster storage.

 

The other thing is: why wasn't this done by 2015 rather than started in 2015? It's not like ASICs are new.


1 hour ago, wat3rmelon_man2 said:

But can it run Crysis?

Well, it can decode Crysis intro videos with zero CPU load 😄


1 hour ago, Taf the Ghost said:

Read the blog post, and I'm rather underwhelmed. Firstly, because there is no way they were doing transcoding in "software" previously. They'd have been using accelerators inside a standard server, so this likely just means they've moved to custom server packages. This is a lot more of the result of increases in interconnect bandwidth and faster storage.

 

The other thing is: why wasn't this done by 2015 rather than started in 2015? It's not like ASICs are new.

AV1 is why without a doubt


1 minute ago, Kisai said:

AV1 is why without a doubt

VP9 isn't exactly cheap computationally, but my real assumption is that AV1 accelerators were going to be expensive, and they finally finished a massive migration to in-house designs. I'd also guess that some group got pulled off the project in the middle, because it shouldn't have taken 6 years, unless there was some long-term IP tie-up issue they've just skipped mentioning.


Do these chips' motherboards have RGB headers?

PC - NZXT H510 Elite, Ryzen 5600, 16GB DDR3200 2x8GB, EVGA 3070 FTW3 Ultra, Asus VG278HQ 165hz,

 

Mac - 1.4ghz i5, 4GB DDR3 1600mhz, Intel HD 5000.  x2

 

Endlessly wishing for a BBQ in space.


On 4/23/2021 at 9:44 AM, Tieox said:

I can hear Linus's heavy breathing already.

He's going to push it to its limits now. Brace yourselves.

Main PC: the literature club machine

Intel I5 9600k @ 4.2 Ghz | MSI z390-a pro | G.Skill Trident Z RGB 32 GB 3000Mhz | Samsung 970 Evo 500 GB | Seagate barracuda 3.5" 2.5tb  | Thermaltake Floe Riing RGB 240 | Asus GeForce GTX 1660 Ti 6 GB DUAL OC | Thermaltake Core P3 TG Snow Edition

 

Daily drivers

OPPO A52 | Razer Blackwidow Chroma | Razer Deathadder V2 Pro | Beryodynamic DT 990 PRO | Focusrite Scarlett solo gen 2


On 4/23/2021 at 9:44 PM, Kisai said:

AV1 is why without a doubt

I agree. They're one of the contributors to AV1, and if they plan to use it seriously, they need to offer everything in AV1, which normally would require multiplying the infrastructure they use for other codecs.

With AV1 decoders showing up in new hardware and YT making this step, it seems AV1 is very close to mainstream now.


On 4/23/2021 at 7:50 PM, Taf the Ghost said:

Read the blog post, and I'm rather underwhelmed. Firstly, because there is no way they were doing transcoding in "software" previously. They'd have been using accelerators inside a standard server, so this likely just means they've moved to custom server packages. This is a lot more of the result of increases in interconnect bandwidth and faster storage.

 

The other thing is: why wasn't this done by 2015 rather than started in 2015? It's not like ASICs are new.

They were in fact transcoding in software before, at least for some codecs, from what I can tell.

I have some older YouTube videos downloaded, and their AVC versions were encoded with x264 in software. Newer videos seem to be encoded with something else, including the AVC versions.


11 hours ago, LAwLz said:

They were in fact transcoding in software before, at least for some codecs, from what I can tell.

I have some older YouTube videos downloaded, and their AVC versions were encoded with x264 in software. Newer videos seem to be encoded with something else, including the AVC versions.

This actually might point to a different issue, which was really their problem: they had a different work path for every codec. YT also goes back and re-encodes older videos to save space, which might be where most of the software encoder stack gets used.

