Why VRAM will never stack in SLI - regardless of DX12

 

You are both imagining that the initial data is the same as the working data.

 

Based on your ideas you may as well not have VRAM on a card at all and just have everything accessed over PCIe; that will work well...

Reading your first post, it seemed you thought all VRAM was accessed over PCIe...

 

 

You can't render different parts of a scene on different cards without each card either holding the full resources of the scene or communicating with the other card. Sure, for some workloads you could, but shaders don't work like that... HDR, bloom and motion blur, for example, all require the entire frame's data to render. I'm not a programmer, but as far as I know they cannot be done until the card finishing the frame has the entire frame's data ready.

 

Regardless of whether the CPU is issuing instructions to each GPU individually, the only way to reduce VRAM usage is to load resources into each card asymmetrically. If one card's workload suddenly requires resources it has not yet loaded, it will be bottlenecked by the exact same thing that currently bottlenecks a card when it runs out of VRAM.

 

HDR, Bloom and Motion Blur are an interesting point. I am no expert on any of them, but I know a little about motion blur. As far as I know, it is done by comparing the current frame to the previous frame. It is quite possible that DX12 will have a 'primary' card that performs modifications to the entire frame after being sent the data from the second GPU. As I said, I could be (and likely am) quite wrong about how this works, but nonetheless I doubt it's impossible.

 

The load may well be asymmetrical. But how is one of the GPUs suddenly needing to load resources into the card any different from the current situation on a single card?

 

 

All of that aside, didn't AMD already say it can be done with Mantle? I could be wrong but I thought a game already had it implemented.


You're right, that's exactly where the OP's supposition falls down. The 224 GB/s of bandwidth is exclusively between the GPU and its VRAM; the data still streams through PCIe while loading a scene, which means it won't limit rendering times (if a game is coded properly, of course).

Am I right in assuming the OP is talking about all the RAM being accessed at the same time? Which is unlikely to happen anyway?

One thing people seem to forget is the time it takes to load a texture from SSD/HDD to the GPU; that extra RAM would still be useful for precaching.

Then perhaps we will see a replacement for SLI, or a dedicated connection, in the next generation of cards.

All theories of course, but we were having the same sort of discussion about AGP when we were getting close to hitting its limits.

I meant specifically bottlenecks in games with texture loading and offloading, which for now aren't an issue.

I have issues with my SSD being unable to load sound files quickly enough, causing slowdown in certain places.

Coming from a computer science student: the OP is right.

PCIe doesn't pose a problem in single-GPU setups because you won't need to communicate that much data every frame (only stuff that changes, namely transformation matrices).

In a current dual-GPU setup there is no problem either, as each GPU will have its own copy of the data in VRAM.

When you want to "double the memory", both cards would have to access each other's VRAM.

That's where we have a problem: instead of communicating with their own VRAM directly, which they can do at speeds of up to 250 GB/s (depending on the bus width and memory clock), they would have to go over PCIe, and that local bandwidth is much, much more than PCIe can provide.
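A quick back-of-the-envelope comparison may help here (a minimal sketch; the bandwidth figures are assumed round numbers for a ~256-bit GDDR5 card and PCIe 3.0 x16, not measurements of any specific setup):

# Compare reading a frame's working set from local VRAM vs. over PCIe.
# Assumed theoretical peak bandwidths; real-world numbers are lower.
LOCAL_VRAM_BW = 224e9      # bytes/s, typical GDDR5 card
PCIE_3_X16_BW = 15.75e9    # bytes/s, PCIe 3.0 x16 peak

working_set  = 2 * 1024**3   # say 2 GiB of textures/vertices touched for a frame
frame_budget = 1000 / 60     # ms available per frame at 60 fps

print(f"local VRAM read: {working_set / LOCAL_VRAM_BW * 1000:6.1f} ms")   # ~9.6 ms
print(f"over PCIe 3.0  : {working_set / PCIE_3_X16_BW * 1000:6.1f} ms")   # ~136 ms
print(f"frame budget   : {frame_budget:6.1f} ms")

Even with generous assumptions, fetching the other card's half of the scene over PCIe blows the 16.7 ms frame budget several times over.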

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro


Who knows what is possible in the future or what is in development at the moment...

 

Years ago we would have probably thought multiple cores on one CPU was impossible... look how many they fit on it now!

 

 

One of the reasons I love technology: seeing what they can blow us away with next.

 

Let's not be closed-minded about what we know of technology in the present, but look at what is possible in the future.

|CPU| i7 4770K @ 4.4GHz |Motherboard| Asus Z87 MAXIMUS VI HERO |RAM| Corsair Vengeance 16GB 1866MHz |GPU| EVGA GeForce GTX 770 2GB

|HD| 256GB SSD 850 PRO |Sound| Creative Sound Blaster Z High Performance |Case| NZXT Phantom 410 |OS| Windows 10 x64


Who knows what is possible in the future or what is in development at the moment...

 

Years ago we would have probably thought multiple cores on one CPU was impossible... look how many they fit on it now!

 

 

One of the reasons I love technology: seeing what they can blow us away with next.

 

Let's not be closed-minded about what we know of technology in the present, but look at what is possible in the future.

And then people got carried away thinking more cores equaled more performance under all conditions, while not understanding that those cores had to have software to utilize them. See the GHz myth for other notes.

Everything you need to know about AMD cpus in one simple post.  Christian Member 

Wii u, ps3(2 usb fat),ps4

Iphone 6 64gb and surface RT

Hp DL380 G5 with one E5345 and bunch of hot swappable hdds in raid 5 from when i got it. intend to run xen server on it

Apple Power Macintosh G5 2.0 DP (PCI-X) with notebook hdd i had lying around 4GB of ram

TOSHIBA Satellite P850 with Core i7-3610QM,8gb of ram,default 750hdd has dual screens via a external display as main and laptop display as second running windows 10

MacBookPro11,3:I7-4870HQ, 512gb ssd,16gb of memory


You are both imagining that the initial data is the same as the working data.

 

Based on your ideas you may as well not have VRAM on a card at all and just have everything accessed over PCIe; that will work well...

 

Do you really think the GPU CONSTANTLY streams data to and from the hard drive and system memory? Do you know how hard that would bottleneck the card in the first place? When you load a level, most of the data you need is loaded right into VRAM, and additional data is then streamed in if necessary, making sure it's given enough time to copy before it's actually used. Shared VRAM would not require the cards to copy data from each other at all.
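A toy sketch of that loading pattern (purely illustrative Python; the dict-based "disk" and "vram" are stand-ins I made up, not any real engine API):

# Toy model of level loading plus streaming: load the bulk up front,
# then stream the rest into VRAM well before it is needed.
disk = {"rock_albedo": 64, "rock_normal": 64, "boss_texture": 256}   # sizes in MB
vram = {}

def load_level(core_assets):
    for name in core_assets:
        vram[name] = disk[name]        # most of the data goes in once, at load time

def stream_ahead(upcoming_assets):
    for name in upcoming_assets:
        if name not in vram:           # only stream what is not already resident
            vram[name] = disk[name]    # in practice this copy is scheduled early
                                       # enough to finish before first use

load_level(["rock_albedo", "rock_normal"])
stream_ahead(["boss_texture"])         # queued well before the asset is drawn
print(vram)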

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*


I have issues with my SSD being unable to load sound files quickly enough, causing slowdown in certain places.

 

Is that Titanfall? xD

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*


Is that Titanfall? xD

Noooo... it's Battlefield 4, or EA depending on how you want to look at it :)

And then people got carried away thinking more cores equaled more performance under all conditions, while not understanding that those cores had to have software to utilize them. See the GHz myth for other notes.

 

Exactly!

|CPU| i7 4770K @ 4.4GHz |Motherboard| Asus Z87 MAXIMUS VI HERO |RAM| Corsair Vengeance 16GB 1866MHz |GPU| EVGA GeForce GTX 770 2GB

|HD| 256GB SSD 850 PRO |Sound| Creative Sound Blaster Z High Performance |Case| NZXT Phantom 410 |OS| Windows 10 x64


Coming from a computer science student: the OP is right.

PCIe doesn't pose a problem in single-GPU setups because you won't need to communicate that much data every frame (only stuff that changes, namely transformation matrices).

In a current dual-GPU setup there is no problem either, as each GPU will have its own copy of the data in VRAM.

When you want to "double the memory", both cards would have to access each other's VRAM.

That's where we have a problem: instead of communicating with their own VRAM directly, which they can do at speeds of up to 250 GB/s (depending on the bus width and memory clock), they would have to go over PCIe, and that local bandwidth is much, much more than PCIe can provide.

Actually... I'll cut across here. What about XDMA in the R9 290 and R9 290X cards? There isn't even a connector... all communication is done through PCIe. Shouldn't the fact that communication is possible via PCIe automatically remove the limit of the SLI bridge? What if the kind of hardware access that DX12 allows grants the ability to have PCIe communication between cards in a similar fashion? I'm fairly sure it could happen that way.

 

I mean, remember they managed to combine the iGPU of a PC with the dGPU and get a performance boost... theoretically, that should not have allowed for any boost, due to the distinctly smaller vRAM buffer of the iGPU on Intel chips and the slow interaction between system RAM and the iGPU, or virtual RAM and the iGPU.

I have finally moved to a desktop. Also my guides are outdated as hell.

 

THE INFORMATION GUIDES: SLI INFORMATION || vRAM INFORMATION || MOBILE i7 CPU INFORMATION || Maybe more someday


Now how about you offer a solution instead of just annoying people? It's like when people say what's wrong with politics but then don't offer a solution.

The weird kid in the corner eating glue
“People think that I must be a very strange person. This is not correct. I have the heart of a small boy. It is in a glass jar on my desk.” - Stephen King


Actually... I'll cut across here. What about XDMA in the R9 290 and R9 290X cards? There isn't even a connector... all communication is done through PCIe. Shouldn't the fact that communication is possible via PCIe automatically remove the limit of the SLI bridge? What if the kind of hardware access that DX12 allows grants the ability to have PCIe communication between cards in a similar fashion? I'm fairly sure it could happen that way.

 

I mean, remember they managed to combine the iGPU of a PC with the dGPU and get a performance boost... theoretically, that should not have allowed for any boost, due to the distinctly smaller vRAM buffer of the iGPU on Intel chips and the slow interaction between system RAM and the iGPU, or virtual RAM and the iGPU.

It's not the SLI bridge that is the bottleneck preventing "stacking" memory; the bottleneck is PCIe itself.

With XDMA they let the cards talk to each other over PCIe, but that doesn't mean PCIe is fast enough to move gigabytes of data every 60th of a second.

 

In the case of IGPs: they use system RAM as VRAM. Because RAM is directly connected to the CPU (and thus the IGP), PCIe won't be a bottleneck, as it isn't even used.
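For a rough sense of scale (a sketch assuming the theoretical PCIe 3.0 x16 peak of about 15.75 GB/s; the exact figure depends on the platform and generation):

# Sustained bandwidth needed to move "gigabytes of data every 60th of a second".
PCIE_3_X16_GBPS = 15.75               # GB/s, assumed theoretical peak

for gb_per_frame in (0.5, 1.0, 2.0):
    needed = gb_per_frame * 60        # GB/s required at 60 fps
    verdict = "exceeds" if needed > PCIE_3_X16_GBPS else "fits within"
    print(f"{gb_per_frame} GB/frame -> {needed:5.1f} GB/s ({verdict} PCIe 3.0 x16)")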

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro


It's not the SLI bridge that is the bottleneck preventing "stacking" memory; the bottleneck is PCIe itself.

With XDMA they let the cards talk to each other over PCIe, but that doesn't mean PCIe is fast enough to move gigabytes of data every 60th of a second.

 

In the case of IGPs: they use system RAM as VRAM. Because RAM is directly connected to the CPU (and thus the IGP), PCIe won't be a bottleneck, as it isn't even used.

You might be exaggerating a bit with the gigabytes of data every 60th of a second; the amount of data that actually gets shoveled in and out of vRAM should be significantly less than that. I still think that PCIe might have a way of transferring enough data across in the way I described.
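For comparison, a rough estimate of how much a single finished frame weighs (a sketch assuming 1080p at 4 bytes per pixel, not a measurement of any particular game):

# Size of one completed 1080p frame, and the bandwidth needed to ship it at 60 fps.
width, height, bytes_per_pixel = 1920, 1080, 4

frame_bytes = width * height * bytes_per_pixel      # ~8.3 MB per frame
per_second  = frame_bytes * 60 / 1e9                # ~0.5 GB/s at 60 fps

print(f"one frame: {frame_bytes / 1e6:.1f} MB, at 60 fps: {per_second:.2f} GB/s")
# Moving a finished frame is cheap by comparison; the contested question in this
# thread is how big the per-frame working set of textures and geometry really is.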

 

Also, for the iGPU, I was referring to the fact that it could never hope to hold enough data to match the dGPU, and that virtual RAM is FAR slower than a dGPU's vRAM, so if they needed to mirror the data as normal, the iGPU should have slowed the entire operation down rather than sped it up.

I have finally moved to a desktop. Also my guides are outdated as hell.

 

THE INFORMATION GUIDES: SLI INFORMATION || vRAM INFORMATION || MOBILE i7 CPU INFORMATION || Maybe more someday


If the PCIe lanes aren't fast enough to transfer data between one card and another, then how come they're fast enough in a single-card setup? Surely the same data is passing through at the same speed for a single card as it is for dual cards?

 

My understanding of how it will stack is that DX12 will implement split-frame rendering with a restricted frame buffer that will allow each card to render only half of the screen, so each card is doing half of the work it did before in SLI/Crossfire.

 

That way only half the data has to be shared between the cards.
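A minimal sketch of the split being described (conceptual only, not the actual DX12 multi-adapter API; the rectangle layout is just an assumption):

# Split-frame rendering, conceptually: each GPU gets a scissor rectangle covering
# one horizontal slice of the screen and renders only the pixels inside it.
WIDTH, HEIGHT = 1920, 1080

def split_frame(num_gpus):
    rows = HEIGHT // num_gpus
    return [(0, gpu * rows, WIDTH, rows) for gpu in range(num_gpus)]   # (x, y, w, h)

for gpu, rect in enumerate(split_frame(2)):
    print(f"GPU {gpu} renders region {rect}")
# Caveat raised elsewhere in the thread: each GPU still needs whatever textures and
# geometry are visible in its slice, which for most scenes is close to the full set.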

Gaming PC: • AMD Ryzen 7 3900x • 16gb Corsair Vengeance RGB Pro 3200mhz • Founders Edition 2080ti • 2x Crucial 1tb nvme ssd • NZXT H1• Logitech G915TKL • Logitech G Pro • Asus ROG XG32VQ • SteelSeries Arctis Pro Wireless

Laptop: MacBook Pro M1 512gb


If the PCIe lanes aren't fast enough to transfer data between one card and another, then how come they're fast enough in a single-card setup? Surely the same data is passing through at the same speed for a single card as it is for dual cards?

 

My understanding of how it will stack is that DX12 will implement split-frame rendering with a restricted frame buffer that will allow each card to render only half of the screen, so each card is doing half of the work it did before in SLI/Crossfire.

 

That way only half the data has to be shared between the cards.

read the OP again...

Current SLI/Crossfire already use split frame rendering.

To render a frame the GPU has to access all the data for the current frame (mainly vertices and textures).

In a single-GPU setup all this data is stored in VRAM, and the only thing going over the PCIe bus is stuff that has to be changed.

The GPU can access the VRAM on the same graphics card at very high speed.

In a dual-GPU setup each graphics card needs to store its own copy of the scene in its VRAM (so it won't double).

If each card instead held one half of the scene (so you could "double" the amount of effective VRAM), each card would have to look up half of the scene in the other card's memory.

A GPU can access its own VRAM at very high speed, but to access the other graphics card's data it has to transfer that data over PCIe, which is too slow.

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro


read the OP again...

Current SLI/Crossfire already use split frame rendering.

To render a frame the GPU has to access all the data for the current frame (mainly vertices and textures).

In a single-GPU setup all this data is stored in VRAM, and the only thing going over the PCIe bus is stuff that has to be changed.

The GPU can access the VRAM on the same graphics card at very high speed.

In a dual-GPU setup each graphics card needs to store its own copy of the scene in its VRAM (so it won't double).

If each card instead held one half of the scene (so you could "double" the amount of effective VRAM), each card would have to look up half of the scene in the other card's memory.

A GPU can access its own VRAM at very high speed, but to access the other graphics card's data it has to transfer that data over PCIe, which is too slow.

 

 

Yeah, but doesn't the current split-frame rendering mean that each card renders a whole frame, so card 1 renders even frames and card 2 odd frames? Whereas split-screen rendering means each card renders 100% of the frames for half the screen and the other does the other half. That way neither card needs to communicate, as they're both rendering independently.

 

It also means each card is doing 50% less work than with the current method.

Gaming PC: • AMD Ryzen 7 3900x • 16gb Corsair Vengeance RGB Pro 3200mhz • Founders Edition 2080ti • 2x Crucial 1tb nvme ssd • NZXT H1• Logitech G915TKL • Logitech G Pro • Asus ROG XG32VQ • SteelSeries Arctis Pro Wireless

Laptop: MacBook Pro M1 512gb


Yeah, but doesn't the current split-frame rendering mean that each card renders a whole frame, so card 1 renders even frames and card 2 odd frames? Whereas split-screen rendering means each card renders 100% of the frames for half the screen and the other does the other half. That way neither card needs to communicate, as they're both rendering independently.

 

It also means each card is doing 50% less work than with the current method.

That would be called Alternate Frame Rendering.

SLI and Crossfire can already do both.

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro


That would be called Alternate Frame Rendering.

SLI and Crossfire can already do both.

Hold on. Where exactly does SLI do split frame rendering? Every single instance I've ever seen of SLI since I got my machine in 2013 has been some form of AFR, and I've played quite a number of games on it. I know that AMD used to do SFR a long time ago, and it can do it now in Mantle (as someone else pointed out, it was what, tiled rendering?), but I don't see any games these days doing SFR.

I have finally moved to a desktop. Also my guides are outdated as hell.

 

THE INFORMATION GUIDES: SLI INFORMATION || vRAM INFORMATION || MOBILE i7 CPU INFORMATION || Maybe more someday

