
Now hear me out on this one. I know it was more often than not a pretty messy technology. And with generational uplifts that were actually worth talking about, it didn't make much sense to buy two last-generation cards instead of one next-generation card. But looking at current GPUs (especially you, Nvidia), the generational uplift is mostly negligible. Considering this, it might make much more sense to combine, let's say, two 4070s (about 1200€) instead of buying a 5080 (also about 1200€).

Especially when some GPUs (again, looking at Nvidia) still come with a mere 8 GB of VRAM. Take a hypothetical dual 5060 8GB setup, for example. Where I live, this would be between 700 and 800€ for the GPUs, which honestly isn't a bad price anymore. Compare that to a single 5080 setup, which would run you about 1200-1500€ for the GPU alone, and the dual setup doesn't really look like a bad deal anymore. Sure, you wouldn't get 5080 performance, but you'd sure as hell get a lot more performance per euro. Let's take the first Google result for the performance of these two cards:

5080 average FPS across tested games in 1440p: ~160 FPS
5060 average FPS across tested games in 1440p: ~40 FPS

 

Divide that by the price and we get a rough estimate of performance per euro for both cards:
5080: 160 FPS / 1350€ (averaged because its price varies so much across different sources) ≈ 0,12 FPS/€
5060: 40 FPS / 400€ (consistent across all sources) = 0,1 FPS/€

So generally the price/performance ratio isn't so far off, with a difference of roughly 15-20% in favour of the 5080, which possibly just comes from some games no longer being OK with 8 GB of VRAM.

According to ChatGPT Deep Research, the average performance uplift of SLI in games that supported it was about 42%. Leaving Crysis 3, with a mere 2%, out of this equation brings this to around 50%, with Hitman supposedly being an outlier with a 96% boost. Applying these estimates to our 5060, we get the following:

(The very unlikely) best-case scenario: 40 FPS + 96% = 78,4 FPS
(The also very unlikely) worst-case scenario: 40 FPS + 2% = 40,8 FPS
(The very likely) average scenario: 40 FPS + 50% = 60 FPS

Now taking the doubled price into account, about 800€, our new price-to-performance ratio looks like this:

60 FPS / 800€ = 0,075 FPS/€
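If you want to play with these numbers yourself, the whole estimate boils down to a few lines. This is just a quick sketch using the figures quoted above (prices and base FPS are rough averages, so adjust for your region):

```python
def sli_estimate(base_fps, price_per_card, scaling):
    """Estimated FPS and FPS/€ for two cards at a given SLI scaling factor."""
    fps = base_fps * (1 + scaling)
    return fps, fps / (2 * price_per_card)

# dual 5060 (8GB) with the best / worst / average scaling figures from above
for label, scaling in [("best", 0.96), ("worst", 0.02), ("average", 0.50)]:
    fps, ratio = sli_estimate(base_fps=40, price_per_card=400, scaling=scaling)
    print(f"{label}: {fps:.1f} FPS, {ratio:.3f} FPS/€")

# single 5080 for comparison: ~160 FPS at ~1350€
print(f"5080: {160 / 1350:.3f} FPS/€")
```

The same function with base_fps=105 and price_per_card=600 reproduces the 4070 figures further down.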


-- Warning, the following stuff is very speculative --
On paper, this now looks worse, because the 15-20% difference has grown to roughly 37%, but we are still talking about well under the price of a single 5080, and some people simply do not need 5080 performance. Also, taking into account that we now have the ability to design much faster linking interfaces that would make sharing GPU memory actually viable, this could eliminate many of the FPS problems that come from insufficient VRAM. And if major game engines (looking at you, UE) implemented efficient GPU offloading, say taking the already fragmented screen from the vertex shader, passing each triangle to a different GPU for processing and reading the result back once it's finished, we could have pretty efficient multi-GPU rendering.
This could also add interesting applications for existing upscaling or frame generation methods, where for example tearing could be combated by letting the GPU that finishes earlier generate a fake-half-frame and display that until the actual frame has finished rendering.

Of course, the 5060 8 GB might not be the best example, because it still is a terrible value GPU, but running the same numbers for a 4070 could look something like this:
Price (average): 600€
Average Performance at 1440p: 105 FPS
Price to Performance: 0,175 FPS/€ (already much better than the 5080)
SLI Performance estimate: 157,5 FPS
SLI Price to Performance = 0,131 FPS/€

 

Which already looks like a much better option, considering you would still be paying anywhere from 50 to 200€ less for dual 4070s than for a single 5080. And again, factoring in the likely scenario that a modern approach to SLI would be much more efficient than it was back in the day, these numbers are more likely to get better than worse.

I am aware that this is a pretty hot take but I would like to hear other opinions on this.

TL;DR
I think SLI was "ahead of its time" and would make much more sense in the modern landscape than the one it died in.

Ryzen 5 5600X | 32GB (2x16) Corsair Vengeance Pro DDR4-3600 | MSI X570-A Pro | RX 9070 XT


19 minutes ago, Zuckerpapa said:

Now hear me out on this one. I know it was more often than not a pretty messy technology. [...]

It would be nice, but with Nvidia (Ngreedia) moving towards the AI market and away from gamers, they probably won't do it unless it benefits the AI market and their wallets.


15 minutes ago, Zuckerpapa said:

Now hear me out on this one. I know it was more often than not a pretty messy technology. [...]

It only makes sense in a workstation PC. Games can only utilize so much. 


One thing I always hoped for when AMD announced Infinity Fabric is that they'd use it to make dual-GPU cards again, because if it's fast and good enough to connect chiplets inside the CPU together, why can't they do that magic to connect two GPUs together and make them work like one?

Asrock 890GX Extreme 3 - AMD Phenom II X4 955 @3.50GHz - Arctic Cooling Freezer XTREME Rev.2 - 4GB Kingston HyperX - AMD Radeon HD7850 - Kingston V300 240GB - Samsung Spinpoint F3 1TB - Chieftec APS-750 - Cooler Master HAF912 PLUS


osu! profile


They can, right now; it's part of DX12. You can run games on multiple GPUs. It exists; it's what they did to demo DX12 with impressive stuff.

 

 

This wasn't SLI; this was using DX12 multi-GPU, but with the SLI link as a fallback. Later iterations could use the PCIe bus fully.

 

1 hour ago, Zuckerpapa said:

I think SLI was "ahead of its time" and would make much more sense in the modern landscape than the one it died in

Not really. SLI came at a time where it was legitimately more sensible to buy two cards over a single strong one. Back in 2007-08, an 8800 GTX was slower than two 8800 GTS cards in SLI, but you could buy the two GTS cards for less.

 

Then SLI took a dip around the 400 series, peaked again with the first Titan, and then slowly fell off forever.

 

Multi-GPU lives and dies with devs, and the niche of users that actually used it was INCREDIBLY small. Like, I'd be surprised if it was even a 10k userbase at its peak. This is NICHE.

 

So when publishers saw this as a cost to cut, the userbase went down even more, single GPUs got REALLY powerful, and GPU makers wanted to save money on product SKUs, and we ended up here, where it vanished.

 

All the tools to make it happen ARE HERE. Multi-GPU IS POSSIBLE RIGHT NOW. It just doesn't make monetary sense to do it.

 

Gaming is a massive business now; if you want passion, that's partly still there with indies, and that's it. Those often don't have the budget to do this for the, like, legit two people that would try this apart from the dev team themselves.

 

HOWEVER, THERE IS A SINGLE GAME THAT HAS IT: Ashes of the Singularity, aka DX12-the-benchmark-the-game. It works, well even, but it cost them a looong time to implement and probably heaps of money.

 

So again, no point. That game never really even had players, just benchmarks, and it's actually a totally fine game! Not good, but totally fine; a nice solid 6/10.


For it to come back, we need to look at why it disappeared in the first place. Fundamentally, getting two GPUs working together well in a gaming workload is difficult. Game devs don't like it because it is work for them for the few people that did use it. So ideally it would need to come at practically zero cost to game devs. That shifts responsibility over to the GPU side. How do you make multiple GPUs look like one to a gaming workload? Rendering or AI is relatively easy, you can literally chop up the work. With gaming, things depend on other things and connectivity is key.

 

1 hour ago, Mo5 said:

One thing I always hoped for when AMD announced Infinity Fabric is that they'd use it to make dual-GPU cards again, because if it's fast and good enough to connect chiplets inside the CPU together, why can't they do that magic to connect two GPUs together and make them work like one?

Infinity Fabric inside socketed CPUs is not great, to put it nicely. AMD went that route to optimise manufacturing and cost. Bandwidth is comparable to dual-channel RAM bandwidth per link. This isn't anywhere near close to being enough to make two GPUs look like one. PCIe? Even 5.0 x16 is only 64GB/s, even less than typical DDR5. NVLink starts to get serious, but probably still nowhere near enough.

 

All of the above assumes the chips are "far" from each other, basically that they're not practically touching. AMD CPU chiplets are "far" because of that. For "close" connectivity, the only consumer AMD example I can think of is RDNA3. On the Intel side, anything since Meteor Lake. Or whatever the Apple thing is where they glue two chips together and do get close to 2x perf; I can't remember if it is Max, Pro, Ultra, or all of the above. But this isn't "SLI" either, since it has to be made that way from the start, not something you can upgrade later on. Maybe in the distant future we get affordable high-performance short-range optical links; that might enable something like this.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, MSI Ventus 3x OC RTX 5070 Ti, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Alienware AW3225QF (32" 240 Hz OLED)
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 4070 FE, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, iiyama ProLite XU2793QSU-B6 (27" 1440p 100 Hz)
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


SLI is dead. 


SLI was basically taking TWO mid-sized GPUs (e.g. 2x 6800 GTX, which were ~250mm² each, so ~500mm² of GPU) and mashing them together.

nVidia changed strategy. They are now making ONE GPU that's as big as two mid-sized GPU combined. 
This has less overhead, more consistent performance and is just WAY less janky. 

There's also benefit for things like frame gen and resolution upscaling - latency with multiple GPUs would kill these concepts. 

The 5090 is 750mm^2 with 32GB RAM. The 5080 is 378mm^2 with 16GB RAM. 
If you want to double up on your GPU, get the GPU that's 2x the size. 

 

Quote

Also, taking into account that we now have the ability to design much faster linking interfaces that would make sharing GPU memory actually viable, this could eliminate many of the FPS problems that come from insufficient VRAM.

Ram isn't "shared" - the exact same data needs to be in both sets of memory. 
If you have 2x 8GB video cards, your total addressable memory is still only 8GB because the data needs to be doubled up so that the GPUs can both work on the same task.

One of the things that was BAD about SLI is that you needed 2x the physical amount of RAM. 

3900x | 64 GB RAM | RTX 2080 

1.5TB Optane P4800X | 2TB Micron 1100 SSD | 16TB NAS w/ 10Gbe
LG C4 + QN90A | Sony AZ7000ES | Polk R200+R100, ELAC OW4.2, SVS PB12-NSD + 3x SB1000 | HD800


I have a feeling that OP has never used SLI. This is not me trying to say you're stupid or ignorant - rather, it's a tech that looks a whole lot better on the surface than it ever actually was in reality. Implementation is spotty across games, many titles run worse with SLI, sometimes you get better framerates but higher input lag or weird visual anomalies, or sometimes both cards are hammered but you get the exact same framerate as using one card... The 42% performance uplift ChatGPT claims SLI has is, I suppose, correct, in that it works well enough to provide any performance uplift less than half of the time.
Now don't get me wrong - when it works, it works. Blur and Crysis are my favorite SLI games; they look gorgeous, play well, and distribute loads well between cards. But unfortunately most titles are in the other camp - poor implementation or just no support at all.

Oh, and to enable/disable SLI on the system level, it requires a graphics driver restart. That's bothersome. 


SLI died because the burden switched from Nvidia/AMD to the devs with Vulkan and DX12.

No dev is going to do it; it's a lot of effort for all of a dozen people, and they don't get anything out of it; they don't get the money from a second GPU purchase.


The Witcher 3 with CrossFire was insane; it practically doubled the framerate and made it stable. I think Far Cry 2 also had official support.

 

Most other games ran awfully, but that's solely because the implementation was either poorly done or forced via a patch or driver configuration; sometimes it worked fine, other times you couldn't get anything to run.

 

SLI had the same fate: some games ran great, but they were a minuscule percentage, and it kinda died out after 2013-2014 as only a few games officially supported the technology. Studios had to spend more time and resources on making games SLI- or CrossFire-compatible, and apparently that wasn't worth it for most of them.

 

The first cards were also different: for ATI you had to get the master and slave card(s) as well as the bridge cable; the master card only had a DVI output if you wanted to use it as a single card; oh, and your northbridge also had to support the technology, and those motherboards were costly.

 

There was also a way to link cards using internal ribbons, but it never caught on, as the performance was worse than using the bridge cable; the ribbons suffered from the scissoring problem.

 

Newer cards didn't require any of that, but they came out pretty much after the technology was obsolete. There were a few games you could play, but it wasn't worth it anymore to get two cards, as power consumption was insane compared to using a single better card like the 1080 Ti.

 

What about dual-GPU cards? Those were treated as a CrossFire config by default, but their performance in most games was dogshit because the games were simply not coded to make use of two GPUs at once. And the power draw was like 600W back when the RTX meme-FPS 6969 cards weren't a thing; 600W was your whole power supply.

Caroline doesn't need to hear all this, she's a highly trained professional.


3 hours ago, Zuckerpapa said:

Especially when some GPUs (again, looking at Nvidia) still come with a mere 8 GB of VRAM.

Two 8GB GPUs in SLI for games would still work as an 8GB GPU; you need to have the data in sync across both GPUs in order to reduce your synchronization issues.

3 hours ago, Zuckerpapa said:

According to ChatGPT Deep Research, the average performance uplift of SLI in games that supported it was about 42%. Leaving Crysis 3, with a mere 2%, out of this equation brings this to around 50%, with Hitman supposedly being an outlier with a 96% boost. Applying these estimates to our 5060, we get the following:

One thing you're leaving out of the table is the amount of stutters and glitches that often happen in-game due to mGPU usage. Even if you get a perf uplift in terms of FPS, the actual gameplay experience was often worse.

3 hours ago, Zuckerpapa said:

Also, taking into account that we now have the ability to design much faster linking interfaces that would make sharing GPU memory actually viable, this could eliminate many of the FPS problems that come from insufficient VRAM.

There's no GPU memory sharing anywhere, nor has there ever been. There's simply no discrete interconnect that's commercially viable for that. GPUs often have 500GB/s+ of memory bandwidth, while PCIe 5.0 x16 does ~64GB/s. Ampere (which was the last generation to have a discrete NVLink bridge) maxed out at ~110GB/s.
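To put those numbers side by side (rough figures as quoted above, not measurements):

```python
# approximate bandwidth figures quoted in this thread, in GB/s
local_vram  = 500   # "GPUs often have 500GB/s+"
pcie5_x16   = 64    # PCIe 5.0 x16, per direction
nvlink_3090 = 110   # last consumer NVLink bridge (Ampere)

for name, bw in [("PCIe 5.0 x16", pcie5_x16), ("NVLink (Ampere)", nvlink_3090)]:
    print(f"{name}: {bw} GB/s, roughly {local_vram / bw:.0f}x slower than local VRAM")
```

So any byte that has to come from the other card's memory arrives several times slower than a local access, which is the whole problem with "sharing" VRAM across a link.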

Modern interconnects are not discrete anymore, but rather built into the devices nowadays (apart from the ones related to NVSwitch, but you won't be seeing that in consumer platforms).

3 hours ago, Zuckerpapa said:

This could also add interesting applications for existing upscaling or frame generation methods, where for example tearing could be combated by letting the GPU that finishes earlier generate a fake-half-frame and display that until the actual frame has finished rendering.

Now that's honestly an interesting idea.

Anyhow...

3 hours ago, Zuckerpapa said:

I think SLI was "ahead of its time" and would make much more sense in the modern landscape than the one it died in

SLI is dead and no one is going to work on mGPU for games for reasons mentioned by others above already.

 

mGPU is still strong for compute workloads such as AI, where you don't need that kind of synchronization on such a short timescale.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


3DFX first developed SLI with the Voodoo 2 card. nVidia acquired the tech when they bought 3DFX.

Although the methodology they use differs from what 3DFX did, they kept the acronym SLI.

Originally SLI stood for Scan Line Interleave. It entailed breaking the output into horizontal lines, and each GPU rendered every other line, so in essence the workload was split.

That method was actually first used in video arcade consoles and also a bunch of interactive video games of the '90s. I mean interactive in that you essentially played a character in a movie: the Gabriel Knight series, Wing Commander (still have ALLLLL the CDs for that game), Phantasmagoria, etc. They played very much like the more recent Black Mirror "Bandersnatch" episode.

In order for the video to play off the CD smoothly, what they did was literally only render every other horizontal line. So a line of video, then a black line, and so on. Noticeable, but it looked OK for the time.

 

Today it stands for Scalable Link Interface. That encompasses a few different methodologies to render the screen, but essentially divides the screen horizontally.
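As a toy illustration of the two schemes (hypothetical code, just showing which GPU would handle which rows of the output):

```python
def scan_line_interleave(height):
    """3dfx-style SLI: even rows go to GPU 0, odd rows to GPU 1."""
    return [row % 2 for row in range(height)]

def split_frame(height):
    """Split-frame rendering: top half to GPU 0, bottom half to GPU 1."""
    return [0 if row < height // 2 else 1 for row in range(height)]

print(scan_line_interleave(8))  # [0, 1, 0, 1, 0, 1, 0, 1]
print(split_frame(8))           # [0, 0, 0, 0, 1, 1, 1, 1]
```

In practice the split point was load-balanced rather than fixed at the middle, but that's the basic idea.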

I had a Voodoo 1 card, then found a deal on refurbished Voodoo 2 cards; got two 8MB cards for $50 each. They were straight reference cards.

You had to have a 2D card, and that passed through via a connector to the "master" Voodoo 2. The two cards were joined with a cable or small board.

It worked great, much better than the ill-fated Matrox Mystique they replaced. I remember FIFA was pretty awesome with that setup at the time.

Later they had ones that were essentially two cards built as one and only used one slot.

nVidia also released some with that config early on after acquiring 3DFX.

 

[Image: old GeForce 7950 GX2 multi-GPU card]

With the more complex abilities that GPUs are capable of today, I think it might be pretty damn hard to get multiple GPUs to play nice.

But maybe the two-cards-in-one idea could work; my god, though, that would be HUGE and would probably need its own case.



I honestly did not expect to get this many replies in that amount of time. Maybe I should start engagement farming on Instagram.

In all honesty though, it was really interesting to read all of these answers and while I am not going to respond to all of them for obvious reasons, be assured I read every one of them with great interest.
 

18 hours ago, danalog said:

I have a feeling that OP has never used SLI

First of all, you are absolutely right, I have in fact never used it myself, but I would consider myself tech savvy enough to have a pretty good understanding of it and what it was supposed to do.

One argument that seems to pop up very often is that VRAM wasn't shared but rather synced. This might be me coming from a more software-heavy perspective, as I work in development, but I imagine that memory sharing would be absolutely possible - but more on that later.
 

16 hours ago, igormp said:

One thing you're leaving out of the table is the amount of stutters and glitches that often happen in-game due to mGPU usage. Even if you get a perf uplift in terms of FPS, the actual gameplay experience was actually worse.

Looking at modern games, this wouldn't be a noticeable issue, to be honest. But jokes aside, isn't this why we have dynamic resolution and frame generation nowadays? So devs can hide a poorly optimized game behind fake frames and AI upscaling? I assume it would be totally viable to use these techniques to hide stutter.

Let's look at this from this very beautiful flow diagram I totally didn't create in 5 minutes:

[flow diagram image]

So basically, the idea is that the game or any application (that doesn't care, unlike 3D software for example) will never know about the second GPU. The driver will communicate with both of the GPUs and will choose one as a master and one as a slave. The master card will then ask the slave how much memory it has available, add that to its own memory size, and report the sum to the driver. Let's say each card goes from addresses 0x01 to 0xFF, for a total capacity of 255 bytes per GPU. Now let's say the game needs to store a texture at 0x0A, so byte 10 in memory. This would all be handled by the master GPU without needing the second GPU. Now onto the real problem: the game would think that the last address in memory is 0x1FE (byte 255 of GPU1). The driver will recognize this and, instead of trying to write that data to GPU0 (which would result in a programmer's worst nightmare), it subtracts the memory size of GPU0 from that address, so 0x1FE - 0xFF, getting the new address to store it at as 0xFF on the second GPU. The driver will then send that data as normal, but instead of going to GPU0, it goes straight to GPU1.

For reading, that would be the same in reverse, i.e.:
- Check whether the address is a valid address on GPU0
- If not, subtract the last address of GPU0 from it to get the address on GPU1
- Read that texture as normal

I am aware that the scope of this operation is MUCH more complex than what I just described, but I still figure it to be a valid starting point.
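A minimal sketch of that address-translation idea (hypothetical names, a flat byte-addressable toy VRAM per GPU, and none of the paging, caching or synchronization a real driver would need):

```python
GPU0_SIZE = 0xFF  # 255 bytes per card, as in the toy example above
GPU1_SIZE = 0xFF

def route(address):
    """Map a 'unified' address to (gpu_index, local_address)."""
    if address <= GPU0_SIZE:        # fits in GPU0's local range
        return 0, address
    local = address - GPU0_SIZE     # e.g. 0x1FE - 0xFF = 0xFF on GPU1
    if local <= GPU1_SIZE:
        return 1, local
    raise ValueError("address outside combined VRAM")

def write(vram, address, data):
    gpu, local = route(address)
    vram[gpu][local] = data         # driver forwards the write to the right card

def read(vram, address):
    gpu, local = route(address)
    return vram[gpu][local]

vram = [{}, {}]                     # toy backing store for both cards
write(vram, 0x0A, "texture A")      # lands on GPU0
write(vram, 0x1FE, "texture B")     # translated to 0xFF on GPU1
print(read(vram, 0x1FE))            # texture B
```

The translation itself is cheap; the painful part (as pointed out elsewhere in this thread) is that every access that lands on the other card has to cross a link that is nowhere near VRAM speed.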

In case this adds too much latency to certain operations, we could look at another possible solution, which I creatively named solution B:
Instead of treating it as unified memory, put the available VRAM in a swap configuration, much like Linux does with traditional RAM. So every texture is initially stored on GPU0 and kept there until GPU0's memory is full or the texture isn't important for the current scene. If that's the case, move that data to GPU1's memory and store a pointer to its new location in some hashmap in GPU0's memory, because everyone loves hashmaps, am I right?
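A similarly rough sketch of that swap idea (again hypothetical, with a plain dict standing in for the hashmap and a very naive eviction choice):

```python
class SwappedVram:
    """GPU0 holds hot data; cold data gets parked on GPU1 and tracked in a map."""
    def __init__(self, gpu0_capacity):
        self.capacity = gpu0_capacity
        self.gpu0 = {}       # resource_id -> data held locally
        self.gpu1 = {}       # resource_id -> data parked on the second card
        self.remote = set()  # the "hashmap": which resources live on GPU1

    def store(self, resource_id, data):
        if len(self.gpu0) >= self.capacity:
            victim, victim_data = self.gpu0.popitem()  # evict something (naively)
            self.gpu1[victim] = victim_data
            self.remote.add(victim)
        self.gpu0[resource_id] = data

    def load(self, resource_id):
        if resource_id in self.remote:
            data = self.gpu1.pop(resource_id)   # pull it back over the link first
            self.remote.discard(resource_id)
            self.store(resource_id, data)
            return data
        return self.gpu0[resource_id]

vram = SwappedVram(gpu0_capacity=2)
for i, tex in enumerate(["grass", "rock", "sky"]):
    vram.store(i, tex)          # "rock" gets pushed out to GPU1 on the third store
print(vram.load(1))             # "rock" comes back, "sky" gets parked instead
```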

But there is a pretty big drawback in this approach, because one GPU is rendering the image on its own while the other one would only exist as backup memory, essentially being a VERY expensive VRAM upgrade unless we put the second GPU to work on upscaling or image generation, for example, as mentioned in the original post.

This way, game developers wouldn't need to actively implement mGPU support, as the driver would pretend there's only one GPU they have to worry about, and managing access to the resources would happen at the driver level.

This obviously comes with some issues; let's take the example of a single ray from a raytracing shader trying to sample a texture. Assuming this particular ray is computed on GPU0, but the texture of the triangle it tries to sample is stored inside GPU1's memory, well, bummer. Unless you calculate a prepass on the CPU to determine which triangle on the screen samples which texture. In that case, we could use the calculated prepass to determine which part of the scene should be rendered by which GPU. Once these calculations are done, we can remap the resulting images onto that prepass and treat it as our final frame to be displayed. This approach is nowhere near perfect, since in a scenario where 90% of the textures are in GPU0's memory, GPU0 would have to do 90% of the work, which would result in instability.
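A toy version of that prepass-based assignment (hypothetical; it just routes each triangle to whichever GPU already holds its texture and shows how lopsided the load can get):

```python
from collections import Counter

def assign_triangles(triangles, texture_home):
    """triangles: list of texture ids; texture_home: texture id -> GPU index."""
    assignment = [texture_home[tex] for tex in triangles]
    return assignment, Counter(assignment)

# 90% of the textures live on GPU0, so GPU0 ends up with ~90% of the triangles
texture_home = {t: (0 if t < 9 else 1) for t in range(10)}
triangles = [t % 10 for t in range(1000)]
_, load = assign_triangles(triangles, texture_home)
print(load)  # Counter({0: 900, 1: 100})
```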

To combat this, we could reintroduce the SLI bridge, but instead of handling the entire communication, it could serve as a sort of data highway, where the master GPU could initiate data exchanges between the two sets of memory to ensure that needed data is always somewhat evenly distributed across both GPUs.

But still, I very much acknowledge that latency would be the biggest issue for this approach, as moving data takes time and doesn't scale very well when the cables needed to connect the devices get longer. Still an interesting thought experiment imo. So interesting, in fact, that I might waste some nights developing a rough simulation of this process to be able to play around with the variables and see how they would affect the outcome.


Ryzen 5 5600X | 32GB (2x16) Corsair Vengeance Pro DDR4-3600 | MSI X570-A Pro | RX 9070 XT


47 minutes ago, Zuckerpapa said:

But jokes aside, isn't this why we have dynamic resolution and frame generation nowadays? So devs can hide a poorly optimized game behind fake frames and AI upscaling? I assume it would be totally viable to use these techniques to hide stutter.

Yup, as I said before, this idea of yours could be interesting.

But given that you're already running upscalers and frame generation models, is it really necessary to even have multiple GPUs?

One issue that lots of folks mention is that, even though those frame gen models provide an increase in FPS, the actual input lag is way worse, so your idea would make this even worse.

52 minutes ago, Zuckerpapa said:

In case this adds too much latency to certain operations

I was going to say that: not only latency, but also lots of overhead from both the driver and the "master" GPU.

53 minutes ago, Zuckerpapa said:

But there is a pretty big drawback in this approach because one GPU is rendering the image on its own while the other one would only exist as backup memory, essentially being a VERY expensive VRAM upgrade

Yeah, that makes the 2nd GPU totally useless.

53 minutes ago, Zuckerpapa said:

unless we put the second GPU to work on upscaling or image generation for example as mentioned in the original post.

Stuff like Lossless Scaling is already able to do so: use the main GPU for rendering the actual game and a 2nd GPU just for upscaling/frame gen.

 

57 minutes ago, Zuckerpapa said:

One argument that seems to pop up very often is that VRAM wasn't shared but rather synced. This might be me coming from a more software-heavy perspective, as I work in development, but I imagine that memory sharing would be absolutely possible - but more on that later.

So far your ideas have introduced lots of complexities and drawbacks, as you already noted. Memory sharing is the easiest path to achieve actual performance uplifts for games.

 

 

Nonetheless, you are forgetting that having two GPUs in a system would be a hassle for most people, especially given that nowadays most consumer motherboards don't even have a 2nd full-width PCIe slot attached to the CPU, but rather to the chipset (if at all).

Then you have issues regarding powering those, having the proper spacing, cooling, etc.

 

The cost and hassle just doesn't really make sense for your average gamer.

 

For compute it still makes tons of sense, and I say that as an owner of a 2x3090 setup.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


3 hours ago, Zuckerpapa said:

One argument that seems to pop up very often is that VRAM wasn't shared but rather synced. This might be me coming from a more software-heavy perspective, as I work in development, but I imagine that memory sharing would be absolutely possible - but more on that later.
 

Looking at modern games, this wouldn't be a noticeable issue, to be honest. But jokes aside, isn't this why we have dynamic resolution and frame generation nowadays? So devs can hide a poorly optimized game behind fake frames and AI upscaling? I assume it would be totally viable to use these techniques to hide stutter.

Let's look at this from this very beautiful flow diagram I totally didn't create in 5 Minutes:


But still, I very much acknowledge that latency would be the biggest issue for this approach, as moving data takes time and doesn't scale very well when the cables needed to connect the devices get longer. Still an interesting thought experiment imo. So interesting, in fact, that I might waste some nights developing a rough simulation of this process to be able to play around with the variables and see how they would affect the outcome.

I wouldn't be surprised if there IS a way to share memory (even if there's still some redundancy) - but it's likely to just make things bad. 
At a very high level the need to split up the workload and to do EVERYTHING in a narrow time window makes it tricky. 
A fast GPU like a 5090 has around 1800GB/s of memory bandwidth. A fast PCIe 5.0 x16 slot "only" has 128GB/s of bandwidth. Add in some overhead (PCIe numbers are optimistic) and you have a bus that can only move data at around 5% of that speed. And it needs to do OTHER things too, beyond just tossing memory around.

The reason latency is a problem over PCIe isn't so much the time it takes to send a little bit of data. It's that when you send A LOT of data, everything backs up. If you need to move 1.8TB worth of data, it doesn't take the roughly 1 second it would at VRAM speed... you need around 20 seconds. PLUS all of the other data that backs up behind it.
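The back-of-the-envelope version of that, using the figures above and the ~5% effective-throughput assumption from the previous paragraph:

```python
data_gb  = 1800            # roughly one second of traffic for a 5090-class card
vram_bw  = 1800            # GB/s of local memory bandwidth
pcie_eff = 1800 * 0.05     # ~5% of that once PCIe overhead is accounted for

print(f"at VRAM speed: {data_gb / vram_bw:.0f} s")   # ~1 s
print(f"over PCIe:     {data_gb / pcie_eff:.0f} s")  # ~20 s, before any backlog
```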

It'd be similar to what you see in this picture, where past some bandwidth threshold latency just skyrockets because there's a backlog: [chart: StorageReview DapuStor X2900P 64K sequential read, latency vs. throughput]


There was a quasi-similar issue in the past where part of the memory on a GPU was around 7x slower. This caused a fair bit of controversy and was generally considered an inappropriate performance cut. Once more than ~3.5GB of RAM was in use, the card's performance just SUCKED. And this is with one GPU. With zero coordination. And zero driver overhead. And the bandwidth gap was smaller. And the latency gap was smaller.

RAM that can only transfer data 20x slower (5% of total speed) is basically worthless. 
https://www.anandtech.com/show/8931/nvidia-publishes-statement-on-geforce-gtx-970-memory-allocation

 

As far as having one card ONLY doing upscaling... why? What's wrong with the upscaling and frame generation that currently exists in modern GPUs?
How does adding the extra step of shuffling data around make things better? Is the part of the GPU that's doing "AI" underpowered? My current understanding is that it's NOT underpowered and other parts of the GPU are the bottleneck. 

From an engineering perspective... there's nothing stopping more RAM from being added. It's relatively cheap. 

If it mattered THAT much people would be buying AMD and Intel GPUs left and right. It doesn't matter that much. One of the awesome things about AI upscaling is it's much more memory efficient. 
 

 

----

 

So yeah... from a "make an awesome experience" perspective... multiple cards is a bad idea. Multiple GPUs with shared memory on the same card might work in the future. 

This will likely ONLY get more extreme in the future. GPUs will have more cache and it's VERY likely that at SOME point HBM or similar will become a thing, further exacerbating the bandwidth/latency issues associated with having multiple GPUs doing time sensitive work. 

3900x | 64 GB RAM | RTX 2080 

1.5TB Optane P4800X | 2TB Micron 1100 SSD | 16TB NAS w/ 10Gbe
LG C4 + QN90A | Sony AZ7000ES | Polk R200+R100, ELAC OW4.2, SVS PB12-NSD + 3x SB1000 | HD800

