
Each video frame can only be split into so many chunks, and then there's the latency involved in uploading the data to each video card for processing and retrieving the results from the card...

Bitcoin mining motherboards also have a lot of slots, but electrically they're usually only PCIe x1 slots (in x16 physical format), and to top that off, they're usually connected to the chipset rather than directly to the processor.

So you may have 8 PCIe x16 slots, but each is just x1 electrically. You basically have 500 MB/s or about 1 GB/s to each video card, depending on whether it's PCIe v2.0 or v3.0, and the chipset has only a 2-4 GB/s link to the processor, shared by all the cards.

For 4K 60 fps content, you're looking at a minimum of 3840 x 2160 x 3 bytes per pixel x 60 fps = 1,492,992,000 bytes per second, or about 1.49 GB/s (1.39 GiB/s), just to push the raw frames into the video cards, and then you still have to take the results back, process them and combine them together.
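To make the arithmetic above easy to re-run for other resolutions, here is a minimal sketch. The link rates are the round per-slot figures from this post, not measured values, and the function and constant names are just illustrative.

```python
# Raw (uncompressed) bandwidth for 4K 60 fps at 3 bytes per pixel,
# compared with the approximate PCIe x1 rates quoted in the post.
WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL = 3
FPS = 60


def raw_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Bytes per second needed to move every uncompressed frame once."""
    return width * height * bytes_per_pixel * fps


required = raw_bytes_per_second(WIDTH, HEIGHT, BYTES_PER_PIXEL, FPS)

# Assumed round figures in bytes/s (real links lose a little to overhead).
LINKS = {
    "PCIe 2.0 x1": 500_000_000,
    "PCIe 3.0 x1": 1_000_000_000,
}

print(f"required: {required:,} bytes/s = {required / 1e9:.2f} GB/s")
for name, rate in LINKS.items():
    # Fraction of the raw stream a single x1 link could carry in one direction.
    print(f"{name}: carries {rate / required:.0%} of the raw stream")
```

Even the faster x1 link carries only about two thirds of the raw stream in one direction, before counting the results coming back.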


It's not just about raw speed.

It's also about uploading the CUDA or OpenCL code to the graphics card, preparing it, and converting each video frame into a specific format that the CUDA or OpenCL code can work with (you can't just "upload" a BMP, PNG or JPG picture into the video card's memory). Then you have to wait for the code to execute on the video card, download the results from the card, and process those results into whatever is needed.

Usually, the video card doesn't output ready-to-use encoded video; the processor still has to take whatever each video card outputs and do some final processing.

So at several points there are latencies and some waiting around. You don't upload to and download from the video card continuously, so the card doesn't get to reach those high transfer speeds anyway (compare it to copying thousands of JPG pictures to your hard drive: you won't get the same speed as copying a Blu-ray movie to your hard drive).
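The effect of those pauses can be sketched with a simple model: every transfer pays a fixed setup cost before any data moves. The 0.5 ms overhead and the 8-way split below are hypothetical numbers picked for illustration, not measurements.

```python
def effective_throughput(transfer_bytes, link_bytes_per_s, overhead_s):
    """Bytes/s actually achieved when each transfer pays a fixed overhead."""
    time_per_transfer = overhead_s + transfer_bytes / link_bytes_per_s
    return transfer_bytes / time_per_transfer


LINK = 1_000_000_000   # roughly PCIe 3.0 x1, in bytes/s
OVERHEAD = 0.0005      # hypothetical fixed cost per transfer, in seconds

frame = 3840 * 2160 * 3   # one raw 4K frame, 3 bytes per pixel
strip = frame // 8        # the same frame cut into 8 strips

whole = effective_throughput(frame, LINK, OVERHEAD)
split = effective_throughput(strip, LINK, OVERHEAD)

print(f"whole frames: {whole / 1e6:.0f} MB/s")
print(f"1/8 strips:   {split / 1e6:.0f} MB/s")
```

The smaller each transfer gets, the larger the share of time spent on the fixed cost, which is the same reason thousands of small files copy slower than one big one.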

The more graphics cards each frame is split across, the more these small latencies and pauses add up, because they don't overlap. Some video cards will finish their part faster than others, and in the end the software has to wait for the last card to be done before it can "centralize" the results and produce the encoded frame.

For this reason, some hardware encoders just prefer to assign one whole frame to each video card used instead of splitting the frame into multiple strips that are passed to separate graphics cards. This way, while the software is busy using the processor to centralize the results of the previous frame and produce the final encoded frame, the cards can be busy processing the next frame, and so on.
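A toy timing model of the two approaches, with made-up numbers: it ignores transfer time and the dependencies between frames that real encoders have to respect, so treat it as a sketch of the overlap argument only.

```python
def split_frame_time(strip_times_ms, cpu_finalize_ms):
    """All cards share one frame: the CPU waits for the slowest strip,
    then finalizes while the cards sit idle (no overlap in this model)."""
    return max(strip_times_ms) + cpu_finalize_ms


def frame_per_card_time(whole_frame_ms, num_cards, cpu_finalize_ms):
    """Each card takes a whole frame and the CPU finalizes earlier frames
    in parallel, so steady-state time per frame is set by the slower stage."""
    return max(whole_frame_ms / num_cards, cpu_finalize_ms)


# Hypothetical figures: 4 cards, 40 ms of GPU work per frame,
# uneven strip times, 5 ms of CPU work to finalize each frame.
strips = [10, 11, 12, 14]
split = split_frame_time(strips, cpu_finalize_ms=5)
pipelined = frame_per_card_time(40, num_cards=4, cpu_finalize_ms=5)

print(f"strips across cards: {split} ms per frame")
print(f"one frame per card:  {pipelined} ms per frame")
```

In this model the strip approach pays for the slowest card plus the CPU step on every frame, while the frame-per-card approach hides the CPU step behind GPU work on later frames.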

IMHO it would be faster to have multiple computers load the project from one central server and start rendering the project from a specific point in the timeline, then you can "glue" together the segments and have the final video.

This is also a bit problematic, especially around the points where two segments are joined together, unless you encode the video with constant bit rate.

In some cases you must encode a video with certain constraints. For example, the video can have an average bitrate of 25 Mbps and go as high as 50 Mbps for brief moments, but it must never exceed 200 Mb worth of data over any 5 seconds (so 40 Mbps for 5 seconds, or 50 Mbps for 2 seconds and less for the rest, etc.).

When you split the timeline into multiple parts, you can end up in situations where, once two segments are done, the last 2 seconds of one segment plus the first 3 seconds of the next give you bitrates that don't respect the conditions you want.

For example, the first segment could have 50 Mbps in its last 2 seconds, and the next segment could have 50 Mbps in its first 3 seconds. Now you have 5 seconds at 50 Mbps, which is 250 Mb, so you have broken your rule about not going over 200 Mb over the course of 5 seconds.
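A quick way to catch this after gluing segments together is a sliding-window check over per-second bitrates. This is a minimal sketch: the segment values are invented to match the example above, and a real check would work from actual packet or frame sizes rather than one number per second.

```python
def window_violations(mbps_per_second, window_s=5, max_megabits=200):
    """Return (start_second, megabits) for every window over the cap."""
    violations = []
    for start in range(len(mbps_per_second) - window_s + 1):
        total = sum(mbps_per_second[start:start + window_s])
        if total > max_megabits:
            violations.append((start, total))
    return violations


# Hypothetical per-second bitrates in Mbps for two separately encoded segments.
segment_a = [25, 25, 25, 50, 50]   # ends with 2 s at 50 Mbps
segment_b = [50, 50, 50, 25, 25]   # starts with 3 s at 50 Mbps

print(window_violations(segment_a))              # each segment alone is fine
print(window_violations(segment_b))
print(window_violations(segment_a + segment_b))  # the join is not
```

Each segment passes on its own, but the joined timeline has windows straddling the cut that exceed 200 Mb, including the 250 Mb window from the example.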


Super awesome explanation! Details explained very well. Thank you!

I just have a pile of graphics cards and saw this video Barnacules did about a server deal he got from Puget Systems. Then I was wondering if I could build a budget version.

My post-production lab is full of aging 12-core Macs. I have 4K cameras but I'm not shooting in 4K yet (FHD 60p is a mountain of data). I'm looking at building some render solution rather than updating all my stations. I am the definition of low budget. I am always looking for cheap parts to make stuff faster.


13 hours ago, mariushm said:

Usually, the video card doesn't output ready-to-use encoded video; the processor still has to take whatever each video card outputs and do some final processing.

In fact, the CPU has to do much more than the GPU while encoding.


1 hour ago, .spider. said:

In fact, the CPU has to do much more than the GPU while encoding.

It depends on how you use the video card.

You can use AMD's AMF (Advanced Media Framework), or NVIDIA's equivalent, to push frames into the video card and get encoded video out of it, but this is basically a fixed-function hardware encoder with minimal configurable settings.

This is basically what tools like OBS do when you use AMF, NVENC, VCE (deprecated these days), Intel Quick Sync, etc.

Or you can do a hybrid thing: upload your own OpenCL or CUDA code to the video card and do motion search, prediction and various other parts of the frame-encoding chain on the card, then use the results along with some CPU-side computations to encode each frame. This gives you the maximum flexibility and the most configurable options. I didn't check the video above, but I assume that's what he's talking about.


27 minutes ago, mariushm said:

Or you can do a hybrid thing: upload your own OpenCL or CUDA code to the video card and do motion search, prediction and various other parts of the frame-encoding chain on the card, then use the results along with some CPU-side computations to encode each frame. This gives you the maximum flexibility and the most configurable options. I didn't check the video above, but I assume that's what he's talking about.

It's not possible to do these things on a GPU efficiently.
