
I think Linus would be better off using FFmpeg for his 6 cam Badminton streaming / recording

Typically there is a topic posted for every video in the “LTT Official” portion of the forum, but for this video specifically there was never a post / discussion thread created… Not sure if this was by design because they didn’t want everyone and their mom telling them how they think they could do it better, or if they just forgot to make a thread for it. Assuming it’s the latter, I’m really not sure where else to put this.
 

Love the video, never seen anyone try to capture more than 3 4k sources on one PC. I understand that you already have a working solution and probably don’t want to change it, but here are a few things I thought of while watching the video:

  1. You can bypass Nvidia’s artificial encode limit via this patch, not sure if that’s against your guys’ partnership with them or something but it’s super easy to apply and works perfectly.
  2. You can avoid high 3D GPU usage by using FFmpeg to stream and record instead of OBS. Additionally, you would only need that first Quadro with this method as the GP100 is perfectly capable of encoding 6 streams of 1080p (along with basically any other Nvidia card released in the last 5 years assuming you're using the patch), which seemed to be your final output resolution despite the 4K cameras. 
     

Using my patched GTX 1080 I am capable of encoding up to 4 streams of 4K60 using FFmpeg, and the GP100 has 3 NVENC encoding chips while the GTX 1080 has 2 (same architecture). Thus, in theory the GP100 could do 6 encodes of 4K60 by itself, which would be cool to see. Though at that point your limitation is more likely to be your CPU, or even software which starts to hiccup as multiple instances of recordings inadvertently hit the same threads and such.
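The scaling argument above is simple arithmetic. A rough sketch, assuming that capacity scales linearly with the number of NVENC chips and with the pixels per frame (both are assumptions, not measured behaviour). The per-chip figure comes from the 4 streams of 4K60 reported for the two-chip GTX 1080, and `estimated_streams` is a made-up helper name:

```python
# Back-of-the-envelope NVENC capacity estimate using the Pascal figures from this post.
# Assumption: capacity scales linearly with chip count and with pixels per frame.

PIXELS_4K = 3840 * 2160
PIXELS_1080P = 1920 * 1080

# Reported above: a patched GTX 1080 (2 NVENC chips) handles 4 streams of 4K60.
STREAMS_4K_PER_CHIP = 4 / 2


def estimated_streams(nvenc_chips: int, pixels_per_frame: int) -> int:
    """Estimate how many concurrent streams of a given resolution fit."""
    pixel_budget = nvenc_chips * STREAMS_4K_PER_CHIP * PIXELS_4K
    return int(pixel_budget // pixels_per_frame)


print(estimated_streams(3, PIXELS_4K))     # GP100 (3 chips), 4K streams
print(estimated_streams(3, PIXELS_1080P))  # GP100 (3 chips), 1080p streams
print(estimated_streams(1, PIXELS_1080P))  # single-chip card, 1080p streams
```

Under these assumptions the three-chip GP100 comes out at 6 streams of 4K, and even a single chip has room for six 1080p encodes. Treat the results as upper bounds, since CPU, capture and software overhead are not modelled.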
 

I would suggest giving FFmpeg a try with a command / parameters like this:

-thread_queue_size 9999
-indexmem 9999 
-f dshow 
-rtbufsize 2147.48M 
-video_size 1920x1080 
-framerate 60
-i video="[video device]":audio="[audio device]"
-map 0
-c:v h264_nvenc
-preset hp
-r 60
-rc-lookahead 120
-pix_fmt nv12
-b:v 6M
-minrate 6M
-maxrate 6M 
-bufsize 6M 
-c:a aac 
-ar 44100 
-b:a 320k 
-vsync 1 
-max_muxing_queue_size 9999 
-f mpegts 
C:\Users\[user]\Videos\FFmpeg\Output.ts
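If the parameters are passed from a script rather than typed into a console, building them as an argument list sidesteps quoting problems with device names that contain spaces. A minimal Python sketch of the same parameters; `build_capture_args` is a hypothetical helper, and the device names and output path are whatever you supply:

```python
from typing import List


def build_capture_args(video_device: str, audio_device: str, output_path: str,
                       bitrate: str = "6M") -> List[str]:
    """Return an ffmpeg argument list for one DirectShow capture encoded with NVENC."""
    return [
        "ffmpeg",
        # Input options: these apply to the next -i, so they come first.
        "-thread_queue_size", "9999",
        "-indexmem", "9999",
        "-f", "dshow",
        "-rtbufsize", "2147.48M",
        "-video_size", "1920x1080",
        "-framerate", "60",
        "-i", f"video={video_device}:audio={audio_device}",
        # Output options: H.264 on the GPU encoder with the bitrate pinned to one value.
        "-map", "0",
        "-c:v", "h264_nvenc",
        "-preset", "hp",
        "-r", "60",
        "-rc-lookahead", "120",
        "-pix_fmt", "nv12",
        "-b:v", bitrate, "-minrate", bitrate, "-maxrate", bitrate, "-bufsize", bitrate,
        "-c:a", "aac", "-ar", "44100", "-b:a", "320k",
        "-vsync", "1",
        "-max_muxing_queue_size", "9999",
        "-f", "mpegts",
        output_path,
    ]


# Placeholder device names; list the real ones with: ffmpeg -list_devices true -f dshow -i dummy
print(" ".join(build_capture_args("Camera 1", "Microphone 1", "Output.ts")))
```

The list can be handed to `subprocess.run` or `subprocess.Popen` as-is, with no extra quoting around the device string.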


You can also use the tee pseudo-muxer in FFmpeg to send the same encode to 2 places, like a stream to Twitch and a local file. What makes this particularly convenient is that you don’t have to do the same encode work twice like you have to in OBS:

-f tee 
"[f=mpegts]C:\Users\[user]\Videos\FFmpeg\Output.ts|[f=mpegts]udp://10.0.1.255:1234/"


Lastly, if desired, FFmpeg also has a segment muxer, which lets you record 24/7 without overfilling your hard drives. Once the number of segments reaches the segment_wrap limit you specified, the recording wraps around and overwrites the oldest segment (48 segments of 1800 seconds each keeps the most recent 24 hours):

-f segment 
-segment_time 1800 
-segment_wrap 48 
-reset_timestamps 1
-segment_format_options max_delay=0 
C:\Users\[user]\Videos\FFmpeg\Output%02d.ts
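To size the drive for these settings, the retention window and worst-case disk usage follow directly from `segment_time`, `segment_wrap` and the bitrates. A small sketch, assuming the 6M video and 320k audio bitrates from the earlier command and ignoring MPEG-TS container overhead; the function names are illustrative only:

```python
def retention_hours(segment_time_s: int, segment_wrap: int) -> float:
    """Hours of footage kept before the oldest segment is overwritten."""
    return segment_time_s * segment_wrap / 3600


def disk_usage_gb(video_bps: float, audio_bps: float,
                  segment_time_s: int, segment_wrap: int) -> float:
    """Approximate space used by a full set of segments, in decimal gigabytes."""
    total_bits = (video_bps + audio_bps) * segment_time_s * segment_wrap
    return total_bits / 8 / 1e9


hours = retention_hours(1800, 48)
per_camera = disk_usage_gb(6_000_000, 320_000, 1800, 48)
print(f"{hours:.0f} h retained, about {per_camera:.1f} GB per camera, "
      f"about {6 * per_camera:.0f} GB for six cameras")
```

With the values above this works out to 24 hours of footage at roughly 68 GB per camera, or a little over 400 GB for all six feeds.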


You can even combine the tee and segment logic to stream / record 24/7 without ever overfilling the drives:

-f tee 
"[f=segment:segment_time=1800:segment_wrap=48:reset_timestamps=1:segment_format_options=max_delay=0]C:\Users\[user]\Videos\Output%02d.ts|[f=mpegts]udp://10.0.1.255:1234/"
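The tee output string gets hard to read once per-output options pile up, so it can help to assemble it from parts. A hedged sketch: `tee_output` is a hypothetical helper, and the recording path is a placeholder. The example path uses forward slashes as a precaution, because the tee muxer treats the backslash as an escape character inside its output string.

```python
from typing import Dict, List, Tuple


def tee_output(outputs: List[Tuple[str, Dict[str, str]]]) -> str:
    """Join (target, options) pairs into a single tee muxer output string."""
    parts = []
    for target, options in outputs:
        # Per-output options go in square brackets, separated by ':'.
        opts = ":".join(f"{key}={value}" for key, value in options.items())
        parts.append(f"[{opts}]{target}" if opts else target)
    # Outputs are separated by '|'.
    return "|".join(parts)


spec = tee_output([
    ("D:/Recordings/Output%02d.ts", {
        "f": "segment",
        "segment_time": "1800",
        "segment_wrap": "48",
        "reset_timestamps": "1",
        "segment_format_options": "max_delay=0",
    }),
    ("udp://10.0.1.255:1234/", {"f": "mpegts"}),
])
print(spec)
```

The helper does no escaping of its own, so targets containing `|` or other special characters would still need to be escaped by hand.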


Since FFmpeg is run in the console you can call these 6 encodes programmatically in something like PowerShell using Start-Process, allowing you to easily launch everything from one place. I’ve attached a zip containing the file structure for doing something like this.
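For illustration, the same "launch everything from one place" idea can be sketched with Python's `subprocess` module standing in for PowerShell's Start-Process. This is not a reconstruction of the attached zip; the device names, output folder and trimmed-down option set are all placeholders:

```python
import subprocess
from typing import List


def camera_commands(devices: List[str], output_dir: str) -> List[List[str]]:
    """Build one ffmpeg command per capture device (options trimmed for brevity)."""
    commands = []
    for index, device in enumerate(devices, start=1):
        commands.append([
            "ffmpeg", "-f", "dshow", "-i", f"video={device}",
            "-c:v", "h264_nvenc", "-b:v", "6M",
            "-f", "mpegts", f"{output_dir}/Cam{index}.ts",
        ])
    return commands


def launch_all(commands: List[List[str]], dry_run: bool = True) -> List[subprocess.Popen]:
    """Start every encode as its own process; with dry_run, only print the commands."""
    processes = []
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            processes.append(subprocess.Popen(command))
    return processes


commands = camera_commands([f"Camera {n}" for n in range(1, 7)], "D:/Recordings")
launch_all(commands, dry_run=True)  # switch to dry_run=False to actually start ffmpeg
```

Each encode runs as an independent process, so one feed failing does not take the other five down with it.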
 

Linus Example.zip

Main PC: Corsair 900D | ProArt Z690-Creator | Intel 13900K | RTX 4090 | Trident Z5 (2x32GB) | 1TB 980 Pro, 2TB Sabrent Rocket 4+, 2TB 980 Pro, 1TB Sabrent Rocket | HX1200i

Capture PC: Meshify XL | Designare TRX40 | AMD 3960X | 2xRTX 4070 TI | Trident Z (4x16GB) | 2TB 970 Evo Plus, 1TB 970 Evo Plus | Dual HDMI 4K Plus LT, 2xElgato 4K 60 Pro, HX850

Media / Render PC: Corsair 900D (shared) | ASRock X399M | AMD 2970WX | RTX 4070 TI | Trident Z (2x16GB) | 2TB Samsung 970 Evo | 2xElgato HD60 Pro | HX750
Full Room Watercooling: EK X3 400 | EK-XTOP Revo Dual D5 | 4xHardware Labs 560GTX | 16xSilentWings 4 Pro  | EVGA 450 B3

Peripherals: Logitech G502 X |  Wooting 60HE | Xbox Elite Controller Series 2 | Logitech G502 Wireless | Logitech MX Keys Mechanical

Displays: Asus XG35VQ | 2xLG 24UD58-B | LG 65UH6030 | Asus VH242H | BenQ GW2480 | HP 22CWA | Kenowa CNC-1080P | Asus VC39H

Audio Interfaces : RME Fireface UFX+, Scarlett 18i20, RME HDSPe RayDAT, RME HDSPe MADI FX, RME ADI-648, RME ADI-192 DD

Audio Playback: 2xYamaha HS5 & Yamaha HS8s | Sennheiser HD820, Sennheiser IE 500 Pro, Ultimate Ears RR CIEMs


18 minutes ago, Ninbura said:

Love the video, never seen anyone try to capture more than 3 4k sources on one PC

I've done over 20 :P

(to be fair, it wasn't a single computer, but a k8s cluster with some V100s)

 

A nice thing to add is that ffmpeg can push rtmp streams directly, so they can be both pushed to twitch while also being recorded to disk.
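A sketch of what that could look like with the tee muxer from the first post, written as a Python argument list. The ingest URL and stream key are placeholders. Two details are worth flagging: tee cannot work out on its own that FLV needs global headers, so the flag is set explicitly, and the MPEG-TS branch then gets a `dump_extra` bitstream filter so the recording still carries the codec headers in-band, the same approach the FFmpeg tee documentation uses in its own example. `onfail=ignore` is meant to keep the recording running if the stream drops.

```python
from typing import List


def stream_and_record_args(ingest_url: str, recording_path: str) -> List[str]:
    """Output-side ffmpeg arguments: one NVENC encode sent to RTMP and to disk."""
    tee_spec = (
        f"[f=flv:onfail=ignore]{ingest_url}"
        f"|[f=mpegts:bsfs/v=dump_extra=freq=keyframe]{recording_path}"
    )
    return [
        "-map", "0",
        "-c:v", "h264_nvenc",
        "-c:a", "aac",
        # tee cannot auto-detect per-format needs, so request global headers explicitly.
        "-flags", "+global_header",
        "-f", "tee", tee_spec,
    ]


# Placeholder ingest URL and key; substitute the real values from the streaming service.
args = stream_and_record_args("rtmp://example.invalid/live/STREAM_KEY", "Output.ts")
print(" ".join(args))
```

These arguments go after the input options and `-i` of whichever capture command is in use.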

 

Another thing that I commented on the video (since there's no dedicated thread here) was the following:

Quote

The nvenc encoder found on nvidia GPUs is the same for every card of the same generation, meaning that a GP100 has the same encoder as a GTX 1050, with the only difference being the driver limit for consumer cards. So a couple 1650 Supers (not the regular ones) would deliver better performance/quality while also being able to handle up to 6 streams (the GTX limit is 3 streams per NVENC chip)

Of course there's also the difference in the amount of actual NVENC chips inside each GPU, but a GPU like the 1070 already has 2 of those (and coupled with patched drivers it's a better solution than using a Quadro IMO).

 

 

From my experience, a single 4k30 h264 stream (with the command shown below) uses like 40~50% of the encoder, so 2 4k30 streams per nvenc module on turing should be doable.

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 -thread_queue_size 5M \
-f x11grab -s 3840x2160 -framerate 30 -i :0.0 -thread_queue_size 5M -f alsa -ac 2 \
-i hw:0,0 -bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-b:v 20M -minrate:v 20M -maxrate:v 20M -bufsize:v 20M -c:v h264_nvenc \
-qp:v 19  -profile:v high -rc:v cbr_ld_hq -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 -f flv /dev/null

 

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


50 minutes ago, igormp said:

Of course there's also the difference in the amount of actual NVENC chips inside each GPU, but a GPU like the 1070 already has 2 of those (and coupled with patched drivers it's a better solution than using a Quadro IMO).

The number of chips makes a big difference though; not saying you weren't accounting for that, just pointing it out. I remember switching from my GTX 1080 to a GTX 1050 because I kept hearing everyone say "there's no difference", only to find that my typical FFmpeg command was overloading the encoder in the GTX 1050 when my GTX 1080 had 50% headroom. Essentially each chip of the same architecture can pull the same weight, so a GTX 1080 has 100% more bandwidth than the GTX 1050, and the GP100 has 200% more bandwidth than the GTX 1050. Additionally, while the Nvidia encode matrix states that the 1070 has 2 NVENC chips, only one of them is activated (at least in my experience), and the patch wasn't able to activate the second one, once again at least in my experience.
 

50 minutes ago, igormp said:

From my experience, a single 4k30 h264 stream (with the command shown below) uses like 40~50% of the encoder, so 2 4k30 streams per nvenc module on turing should be doable.

Not sure why this is the case, but the encoder doesn't care what framerate you're doing the encode at, just the resolution. By "doesn't care" I mean within reason, or within spec: a 4K30 encode uses the same NVENC bandwidth as a 4K60 encode, although a higher framerate will obviously stress the rest of the system more. The Turing GPUs do have the added benefit of a higher quality output, but they also have far less total encode bandwidth than a GTX 1080 (half as much) and especially a GP100 (1/3 the bandwidth). This probably wouldn't matter in Linus's case, though, as I believe the end result was 6 1080p encodes, which the Turing architecture could handle. Unless he kept using OBS, in which case he would be doing 12 1080p encodes, as OBS has to do a separate encode for the recording and the stream.

Bottom line, if he switched to FFmpeg he could get away with using a single 1650 Super (which uses 7th gen NVENC) to handle all 6 1080p encodes, with the added benefit of the increased quality that the Turing NVENC encoder offers.

Edit: Just want to clarify that the Turing NVENC encoding chips don't suffer a loss in bandwidth when compared 1:1 to a Pascal NVENC encoding chip, but rather there are just no GPUs with more than 1 Turing NVENC chip like there are GPUs with multiple Pascal encoding chips (i.e. GTX 1080, GP100).


10 hours ago, Ninbura said:

and the patch wasn't able to activate the second one, once again at least in my experience.

Ouch, didn't know about that. To be fair, my experience has been with either a single consumer GPU (mine) or a handful of teslas in the cloud.

 

10 hours ago, Ninbura said:

Not sure why this is the case, but the encoder doesn't care what framerate you're doing the encode at, just the resolution.

I just mentioned it because it's what I tested. I believe what makes an actual difference is the bitrate used (since that implies more bandwidth required).


While we're on the topic of the badminton video: for the streams he should just use YouTube. It lets you offer multiple switchable cams on a single output, so the viewer can keep 1 player/page open and switch between all the cams. It even works when viewing the VODs afterwards.


3 hours ago, igormp said:

I just mentioned it because it's what I tested. I believe what makes an actual difference is the bitrate used (since that implies more bandwidth required).

I promise my goal isn't to contradict haha, but at least in my testing the bitrate also has no effect on encoder usage; it's almost entirely the resolution. I've also found that how "busy" the feed being encoded is affects encoder usage. For example, if I'm just recording a PS5 Pro on the home screen, my computer sitting on the desktop, and my camera while it's off, all at 4K, the encoder sits around 55%. But when I open Demon's Souls on the PS5 (spinning the camera), Rocket League on the PC (driving in circles), and turn my camera on, the encoder sits around 65%.

I'm not an expert on what's happening at the lowest level, but it seems to me the encoder is rendering the highest quality feed based on the resolution and then you pull a much lower quality stream from it depending on your other parameters. I mean, I guess I do see some variance in terms of encoder usage as I use different presets, like hp (high performance) & hq (high quality), but for the most part it's a wash.

Definitely what seems to pull 90% of the weight when it comes to NVENC encoder usage is the target resolution, would love for some huge brain expert to explain exactly what's happening there.


51 minutes ago, Ninbura said:

I promise my goal isn't to contradict haha, but at least in my testing the bitrate also has no effect on encoder usage; it's almost entirely the resolution. I've also found that how "busy" the feed being encoded is affects encoder usage. For example, if I'm just recording a PS5 Pro on the home screen, my computer sitting on the desktop, and my camera while it's off, all at 4K, the encoder sits around 55%. But when I open Demon's Souls on the PS5 (spinning the camera), Rocket League on the PC (driving in circles), and turn my camera on, the encoder sits around 65%.

I'm not an expert on what's happening at the lowest level, but it seems to me the encoder is rendering the highest quality feed based on the resolution and then you pull a much lower quality stream from it depending on your other parameters. I mean, I guess I do see some variance in terms of encoder usage as I use different presets, like hp (high performance) & hq (high quality), but for the most part it's a wash.

Definitely what seems to pull 90% of the weight when it comes to NVENC encoder usage is the target resolution, would love for some huge brain expert to explain exactly what's happening there.

If you keep an eye on the terminal running ffmpeg, it'll output the current bitrate being used. A static or low motion screen wouldn't make much use of it, but try to open something with heavy movements (like a really fast-paced game) and you'll see both bitrate and encoder usage go up (the latter you already mentioned with rocket league/demon souls).
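If watching the terminal by eye gets tedious, the bitrate can be pulled out of the status line with a small parser. A sketch in Python; the sample line is illustrative rather than captured output, and the exact layout of ffmpeg's status line may differ between versions:

```python
import re
from typing import Optional

# Matches the "bitrate=6081.7kbits/s" field of an ffmpeg status line.
_BITRATE = re.compile(r"bitrate=\s*([\d.]+)kbits/s")


def parse_bitrate_kbps(status_line: str) -> Optional[float]:
    """Return the reported bitrate in kbit/s, or None if the line has no numeric value."""
    match = _BITRATE.search(status_line)
    return float(match.group(1)) if match else None


sample = "frame= 1200 fps= 60 q=21.0 size=   14848kB time=00:00:20.00 bitrate=6081.7kbits/s speed=1x"
print(parse_bitrate_kbps(sample))
print(parse_bitrate_kbps("frame=    0 fps=0.0 q=0.0 size=       0kB time=00:00:00.00 bitrate=N/A speed=0x"))
```

Logging this value next to the encoder usage for a static scene and a fast-moving one makes it easy to see whether the two rise together.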


-> Moved to Programs, Apps and Websites

^^^^ That's my post ^^^^
<-- This is me --- That's your scrollbar -->
vvvv Who's there? vvvv


On 12/17/2020 at 1:16 AM, Ninbura said:

You can avoid high 3D GPU usage by using FFmpeg to stream and record instead of OBS. Additionally, you would only need that first Quadro with this method as the GP100 is perfectly capable of encoding 6 streams of 1080p (along with basically any other Nvidia card released in the last 5 years assuming you're using the patch), which seemed to be your final output resolution despite the 4K cameras. 

 

What makes FFmpeg use less GPU for encoding compared to OBS?


12 minutes ago, Emily123 said:

 

What makes FFmpeg use less GPU for encoding compared to OBS?

It doesn't use much less of the encoder; it just doesn't have any 3D usage, which is really the main cost of OBS.


13 minutes ago, Ninbura said:

It doesn't use much less of the encoder; it just doesn't have any 3D usage, which is really the main cost of OBS.

 

What causes OBS to use so much 3D? The only thing I can think of is the preview but you can turn that off.


On 12/18/2020 at 11:45 PM, Emily123 said:

 

What causes OBS to use so much 3D? The only thing I can think of is the preview but you can turn that off.

I'm not aware of the precise specifics, but even when you disable the preview it uses a ton of 3D bandwidth.

It has something to do with OBS rendering the frame with the GPU and then encoding it with either the CPU or GPU encoder. No matter how you shake it, a 4K OBS canvas eats a lot of 3D GPU bandwidth.
