Why VRAM will never stack in SLI - regardless of DX12

 

You are both imagining that the initial data is the same as the working data.

 

Based on your ideas you may as well not have VRAM on a card at all and just have everything accessed over PCIe; that will work well...

Reading your first post, it seemed you thought all VRAM was accessed over PCIe...

 

 

You can't render different parts of a scene on different cards without each card either holding the full resources of the scene or communicating with the other card. Sure, for some workloads you could, but shaders don't work like that... HDR, bloom and motion blur, for example, all require the entire frame's data to render. I'm not a programmer, but as far as I know they cannot be done until the card finishing the frame has the entire frame's data ready.

 

Regardless of whether the CPU is issuing instructions to each GPU individually, the only way to reduce VRAM usage is to load resources into each card asymmetrically. If one card's workload suddenly requires resources it has not yet loaded, it will be bottlenecked by the exact same thing that currently bottlenecks a card when it runs out of VRAM.

 

HDR, Bloom and Motion Blur are an interesting point. I am no expert on any of them, but I know a little about motion blur. As far as I know, it is done by comparing the current frame to the previous frame. It is quite possible that DX12 will have a 'primary' card that performs modifications to the entire frame after being sent the data from the second GPU. As I said, I could be (and likely am) quite wrong about how this works, but nonetheless I doubt it's impossible.

 

The load may well be asymmetrical. But how is one of the GPUs suddenly needing to load resources into the card any different from the current situation on a single card?

 

 

All of that aside, didn't AMD already say it can be done with Mantle? I could be wrong but I thought a game already had it implemented.


You're right, that's exactly where the OP's supposition falls down. The 224 GB/s of bandwidth is exclusively between the GPU and its VRAM; the data still streams through PCIe while loading a scene, which means it won't limit rendering times (if a game is coded properly, of course).

Am I right in assuming the OP is talking about all the RAM being accessed at the same time? Which is unlikely to happen anyway?

One thing people seem to forget is the time it takes to load a texture from SSD/HDD to the GPU; that extra RAM would still be useful for precaching.

Then perhaps we will see a replacement for SLI, or a dedicated connection, in the next generation of cards.

All theories of course, but we were having the same sort of discussion about AGP when we were getting close to hitting its limits.

I meant specifically bottlenecks in games with texture loading and offloading, which for now aren't an issue.

I have issues with my SSD being unable to load sound files quickly enough, causing slowdown in certain places.

Coming from a computer science student: the OP is right.

PCIe doesn't pose a problem in single-GPU setups because you won't need to communicate that much data every frame (only stuff that changes, namely transformation matrices).

In a current dual-GPU setup there is no problem either, as each GPU will have its own copy of the data in VRAM.

When you want to "double the memory", both cards would have to access each other's VRAM.

That's where we have a problem: instead of communicating with their own VRAM directly, which they can do at speeds of up to 250 GB/s (depending on the bus width and memory clock), they would have to go over PCIe, and that local bandwidth is much, much more than PCIe can provide.
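A quick back-of-the-envelope comparison may help here (a minimal sketch; the bandwidth figures are assumed round numbers for a ~256-bit GDDR5 card and PCIe 3.0 x16, not measurements of any specific setup):

# Compare reading a frame's working set from local VRAM vs. over PCIe.
# Assumed theoretical peak bandwidths; real-world numbers are lower.
LOCAL_VRAM_BW = 224e9      # bytes/s, typical GDDR5 card
PCIE_3_X16_BW = 15.75e9    # bytes/s, PCIe 3.0 x16 peak

working_set  = 2 * 1024**3   # say 2 GiB of textures/vertices touched for a frame
frame_budget = 1000 / 60     # ms available per frame at 60 fps

print(f"local VRAM read: {working_set / LOCAL_VRAM_BW * 1000:6.1f} ms")   # ~9.6 ms
print(f"over PCIe 3.0  : {working_set / PCIE_3_X16_BW * 1000:6.1f} ms")   # ~136 ms
print(f"frame budget   : {frame_budget:6.1f} ms")

Even with generous assumptions, fetching the other card's half of the scene over PCIe blows the 16.7 ms frame budget several times over.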

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro


Who knows what is possible in the future or what is in development at the moment...

 

Years ago we would have probably thought multiple cores on one CPU was impossible... look how many they fit on it now!

 

 

One of the reasons I love technology: seeing what they can blow us away with next.

 

Let's not be closed-minded about what we know of technology in the present, but look at what is possible in the future.

|CPU| i7 4770K @ 4.4GHz |Motherboard| Asus Z87 MAXIMUS VI HERO |RAM| Corsair Vengeance 16GB 1866MHz |GPU| EVGA GeForce GTX 770 2GB

|HD| 256GB SSD 850 PRO |Sound| Creative Sound Blaster Z High Performance |Case| NZXT Phantom 410 |OS| Windows 10 x64


Who knows what is possible in the future or what is in development at the moment...

 

Years ago we would have probably thought multiple cores on one CPU was impossible... look how many they fit on it now!

 

 

One of the reasons I love technology: seeing what they can blow us away with next.

 

Let's not be closed-minded about what we know of technology in the present, but look at what is possible in the future.

And then people got carried away thinking more cores equaled more performance under all conditions, while not understanding that those cores had to have software to utilize them. See the GHz myth for other notes.

Everything you need to know about AMD cpus in one simple post.  Christian Member 

Wii u, ps3(2 usb fat),ps4

Iphone 6 64gb and surface RT

Hp DL380 G5 with one E5345 and bunch of hot swappable hdds in raid 5 from when i got it. intend to run xen server on it

Apple Power Macintosh G5 2.0 DP (PCI-X) with notebook hdd i had lying around 4GB of ram

TOSHIBA Satellite P850 with Core i7-3610QM,8gb of ram,default 750hdd has dual screens via a external display as main and laptop display as second running windows 10

MacBookPro11,3:I7-4870HQ, 512gb ssd,16gb of memory


You are both imagining that the initial data is the same as the working data.

 

Based on your ideas you may as well not have VRAM on a card at all and just have everything accessed over PCIe; that will work well...

 

Do you really think the GPU CONSTANTLY streams data to and from the hard drive and system memory? Do you know how hard that would bottleneck the card in the first place? When you load a level, most of the data you need is loaded right into VRAM, and additional data is then streamed in if necessary, making sure it's given enough time to copy before it's actually used. Shared VRAM would not require the cards to copy data from each other at all.
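A toy sketch of that loading pattern (purely illustrative Python; the dict-based "disk" and "vram" are stand-ins I made up, not any real engine API):

# Toy model of level loading plus streaming: load the bulk up front,
# then stream the rest into VRAM well before it is needed.
disk = {"rock_albedo": 64, "rock_normal": 64, "boss_texture": 256}   # sizes in MB
vram = {}

def load_level(core_assets):
    for name in core_assets:
        vram[name] = disk[name]        # most of the data goes in once, at load time

def stream_ahead(upcoming_assets):
    for name in upcoming_assets:
        if name not in vram:           # only stream what is not already resident
            vram[name] = disk[name]    # in practice this copy is scheduled early
                                       # enough to finish before first use

load_level(["rock_albedo", "rock_normal"])
stream_ahead(["boss_texture"])         # queued well before the asset is drawn
print(vram)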

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*


I have issues with my SSD being unable to load sound files quickly enough, causing slowdown in certain places.

 

Is that Titanfall? xD

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*


Is that Titanfall? xD

Noooo... it's Battlefield 4, or EA depending on how you want to look at it :)

And then people got carried away thinking more cores equaled more performance under all conditions, while not understanding that those cores had to have software to utilize them. See the GHz myth for other notes.

 

Exactly!

|CPU| i7 4770K @ 4.4GHz |Motherboard| Asus Z87 MAXIMUS VI HERO |RAM| Corsair Vengeance 16GB 1866MHz |GPU| EVGA GeForce GTX 770 2GB

|HD| 256GB SSD 850 PRO |Sound| Creative Sound Blaster Z High Performance |Case| NZXT Phantom 410 |OS| Windows 10 x64


Coming from a computer science student: the OP is right.

PCIe doesn't pose a problem in single-GPU setups because you won't need to communicate that much data every frame (only stuff that changes, namely transformation matrices).

In a current dual-GPU setup there is no problem either, as each GPU will have its own copy of the data in VRAM.

When you want to "double the memory", both cards would have to access each other's VRAM.

That's where we have a problem: instead of communicating with their own VRAM directly, which they can do at speeds of up to 250 GB/s (depending on the bus width and memory clock), they would have to go over PCIe, and that local bandwidth is much, much more than PCIe can provide.

Actually... I'll cut across here. What about XDMA in the R9 290 and R9 290X cards? There isn't even a connector... all communication is done through PCIe. Shouldn't the fact that communication is possible via PCIe automatically remove the limit of the SLI bridge? What if the kind of hardware access that DX12 allows grants the ability to have PCIe communication between cards in a similar fashion? I'm fairly sure it could happen that way.

 

I mean, remember they managed to combine the iGPU of a PC with the dGPU and get a performance boost... theoretically, that should not have allowed for any boost, due to the distinctly smaller vRAM buffer of the iGPU on Intel chips and the slow interaction between system RAM and the iGPU, or virtual RAM and the iGPU.

I have finally moved to a desktop. Also my guides are outdated as hell.

 

THE INFORMATION GUIDES: SLI INFORMATION || vRAM INFORMATION || MOBILE i7 CPU INFORMATION || Maybe more someday


Now how about you offer a solution instead of just annoying people? It's like when people say what's wrong with politics but then don't offer a solution.

The weird kid in the corner eating glue
“People think that I must be a very strange person. This is not correct. I have the heart of a small boy. It is in a glass jar on my desk.” - Stephen King


Actually... I'll cut across here. What about XDMA in the R9 290 and R9 290X cards? There isn't even a connector... all communication is done through PCIe. Shouldn't the fact that communication is possible via PCIe automatically remove the limit of the SLI bridge? What if the kind of hardware access that DX12 allows grants the ability to have PCIe communication between cards in a similar fashion? I'm fairly sure it could happen that way.

 

I mean, remember they managed to combine the iGPU of a PC with the dGPU and get a performance boost... theoretically, that should not have allowed for any boost, due to the distinctly smaller vRAM buffer of the iGPU on Intel chips and the slow interaction between system RAM and the iGPU, or virtual RAM and the iGPU.

It's not the SLI bridge that is the bottleneck preventing "stacking" memory; the bottleneck is PCIe itself.

With XDMA they let the cards talk to each other over PCIe, but that doesn't mean PCIe is fast enough to move gigabytes of data every 60th of a second.

 

In the case of IGPs: they use system RAM as VRAM. Because RAM is directly connected to the CPU (and thus the IGP), PCIe won't be a bottleneck, as it isn't even used.
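For a rough sense of scale (a sketch assuming the theoretical PCIe 3.0 x16 peak of about 15.75 GB/s; the exact figure depends on the platform and generation):

# Sustained bandwidth needed to move "gigabytes of data every 60th of a second".
PCIE_3_X16_GBPS = 15.75               # GB/s, assumed theoretical peak

for gb_per_frame in (0.5, 1.0, 2.0):
    needed = gb_per_frame * 60        # GB/s required at 60 fps
    verdict = "exceeds" if needed > PCIE_3_X16_GBPS else "fits within"
    print(f"{gb_per_frame} GB/frame -> {needed:5.1f} GB/s ({verdict} PCIe 3.0 x16)")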

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro


It's not the SLI bridge that is the bottleneck preventing "stacking" memory; the bottleneck is PCIe itself.

With XDMA they let the cards talk to each other over PCIe, but that doesn't mean PCIe is fast enough to move gigabytes of data every 60th of a second.

 

In the case of IGPs: they use system RAM as VRAM. Because RAM is directly connected to the CPU (and thus the IGP), PCIe won't be a bottleneck, as it isn't even used.

You might be exaggerating a bit with the gigabytes of data every 60th of a second; the amount of data that actually gets shoveled in and out of vRAM should be significantly less than that. I still think that PCIe might have a way of transferring enough data across in the way I described.
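For comparison, a rough estimate of how much a single finished frame weighs (a sketch assuming 1080p at 4 bytes per pixel, not a measurement of any particular game):

# Size of one completed 1080p frame, and the bandwidth needed to ship it at 60 fps.
width, height, bytes_per_pixel = 1920, 1080, 4

frame_bytes = width * height * bytes_per_pixel      # ~8.3 MB per frame
per_second  = frame_bytes * 60 / 1e9                # ~0.5 GB/s at 60 fps

print(f"one frame: {frame_bytes / 1e6:.1f} MB, at 60 fps: {per_second:.2f} GB/s")
# Moving a finished frame is cheap by comparison; the contested question in this
# thread is how big the per-frame working set of textures and geometry really is.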

 

Also, for the iGPU, I was referring to the fact that it could never hope to hold enough data to match the dGPU, and that virtual RAM is FAR slower than a dGPU's vRAM, so if they needed to mirror the data as normal, the iGPU should have slowed the entire operation down rather than sped it up.

I have finally moved to a desktop. Also my guides are outdated as hell.

 

THE INFORMATION GUIDES: SLI INFORMATION || vRAM INFORMATION || MOBILE i7 CPU INFORMATION || Maybe more someday


If the PCIe lanes aren't fast enough to transfer data between one card and another, then how come they're fast enough in a single-card setup? Surely the same data is passing through at the same speed for a single card as it is for dual cards?

 

My understanding of how it will stack is that DX12 will implement split-frame rendering with a restricted frame buffer that will allow each card to render only half of the screen, so each card is doing half of the work it did before in SLI/Crossfire.

 

That way only half the data has to be shared between the cards.
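A minimal sketch of the split being described (conceptual only, not the actual DX12 multi-adapter API; the rectangle layout is just an assumption):

# Split-frame rendering, conceptually: each GPU gets a scissor rectangle covering
# one horizontal slice of the screen and renders only the pixels inside it.
WIDTH, HEIGHT = 1920, 1080

def split_frame(num_gpus):
    rows = HEIGHT // num_gpus
    return [(0, gpu * rows, WIDTH, rows) for gpu in range(num_gpus)]   # (x, y, w, h)

for gpu, rect in enumerate(split_frame(2)):
    print(f"GPU {gpu} renders region {rect}")
# Caveat raised elsewhere in the thread: each GPU still needs whatever textures and
# geometry are visible in its slice, which for most scenes is close to the full set.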

Gaming PC: • AMD Ryzen 7 3900x • 16gb Corsair Vengeance RGB Pro 3200mhz • Founders Edition 2080ti • 2x Crucial 1tb nvme ssd • NZXT H1• Logitech G915TKL • Logitech G Pro • Asus ROG XG32VQ • SteelSeries Arctis Pro Wireless

Laptop: MacBook Pro M1 512gb


If the PCIe lanes aren't fast enough to transfer data between one card and another, then how come they're fast enough in a single-card setup? Surely the same data is passing through at the same speed for a single card as it is for dual cards?

 

My understanding of how it will stack is that DX12 will implement split-frame rendering with a restricted frame buffer that will allow each card to render only half of the screen, so each card is doing half of the work it did before in SLI/Crossfire.

 

That way only half the data has to be shared between the cards.

read the OP again...

Current SLI/Crossfire already use split frame rendering.

To render a frame the GPU has to access all the data for the current frame (mainly vertices and textures).

In a single-GPU setup all this data is stored in VRAM, and the only thing going over the PCIe bus is stuff that has to be changed.

The GPU can access the VRAM on the same graphics card at very high speed.

In a dual-GPU setup each graphics card needs to store its own copy of the scene in its VRAM (so it won't double).

If each card instead held one half of the scene (so you could "double" the amount of effective VRAM), each card would have to look up half of the scene in the other card's memory.

A GPU can access its own VRAM at very high speed, but to access the other graphics card's data it has to transfer that data over PCIe, which is too slow.

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro


read the OP again...

Current SLI/Crossfire already use split frame rendering.

To render a frame the GPU has to access all the data for the current frame (mainly vertices and textures).

In a single-GPU setup all this data is stored in VRAM, and the only thing going over the PCIe bus is stuff that has to be changed.

The GPU can access the VRAM on the same graphics card at very high speed.

In a dual-GPU setup each graphics card needs to store its own copy of the scene in its VRAM (so it won't double).

If each card instead held one half of the scene (so you could "double" the amount of effective VRAM), each card would have to look up half of the scene in the other card's memory.

A GPU can access its own VRAM at very high speed, but to access the other graphics card's data it has to transfer that data over PCIe, which is too slow.

 

 

Yeah, but doesn't the current split-frame rendering mean that each card renders a whole frame, so card 1 renders even frames and card 2 odd frames? Whereas split-screen rendering means each card renders 100% of the frames for half the screen and the other does the other half. That way neither card needs to communicate, as they're both rendering independently.

 

It also means each card is doing 50% less work than with the current method.

Gaming PC: • AMD Ryzen 7 3900x • 16gb Corsair Vengeance RGB Pro 3200mhz • Founders Edition 2080ti • 2x Crucial 1tb nvme ssd • NZXT H1• Logitech G915TKL • Logitech G Pro • Asus ROG XG32VQ • SteelSeries Arctis Pro Wireless

Laptop: MacBook Pro M1 512gb


Yeah, but doesn't the current split-frame rendering mean that each card renders a whole frame, so card 1 renders even frames and card 2 odd frames? Whereas split-screen rendering means each card renders 100% of the frames for half the screen and the other does the other half. That way neither card needs to communicate, as they're both rendering independently.

 

It also means each card is doing 50% less work than with the current method.

That would be called Alternate Frame Rendering.

SLI and Crossfire can already do both.

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro


That would be called Alternate Frame Rendering.

SLI and Crossfire can already do both.

Hold on. Where exactly does SLI do split frame rendering? Every single instance I've ever seen of SLI since I got my machine in 2013 has been some form of AFR, and I've played quite a number of games on it. I know that AMD used to do SFR a long time ago, and it can do it now in Mantle (as someone else pointed out, it was what, tiled rendering?), but I don't see any games these days doing SFR.

I have finally moved to a desktop. Also my guides are outdated as hell.

 

THE INFORMATION GUIDES: SLI INFORMATION || vRAM INFORMATION || MOBILE i7 CPU INFORMATION || Maybe more someday

