Why multi-video card setups can't combine VRAM

Mira Yurizaki


(I need a name for blog posts like these, but all the good ones are taken)

 

While I don't think it comes up often, there's an idea floating around that when you use multiple video cards, such as in SLI or CrossFire, their VRAM combines. So if you have two 8GB cards, you effectively get the same thing as a 16GB card. However, this isn't the case. You might be asking... but why? If these setups combine GPU power, why doesn't the VRAM combine too?

 

At a high level, each video card is given the same data so that they can all work on the same thing. After all, they're generating frames from the same scene in the application. But wouldn't it be cool if you weren't limited to the amount of VRAM on a single card and could expand beyond it? There are just a few problems with that:

  • How is the data going to be transferred? If we look at PCI Express, it's a relatively slow interface compared to VRAM. PCIe 3.0 x16 caps out at about 15.75 GB/s, while the VRAM on NVIDIA's Titan V has a mind-boggling bandwidth of about 652 GB/s (imagine having that for your internet speed). Transferring data to and from other cards would be an incredibly slow affair that introduces stalls. To put it in perspective, this speed gap is larger than the one between a SATA SSD and DDR4-2133 (there's a quick back-of-envelope sketch after this list).
  • VRAM works basically like a huge RAID 0 array. Each memory chip provides only a fraction of the card's bandwidth, and it's all of the chips working in parallel that adds up to the total figure. So in order to move data to other cards as fast as it comes out of VRAM, you would need a huge number of lines. I don't think connecting, say, 200-pin cables would be fun (nor would manufacturing them).
  • Data transfers would have to be over a parallel bus. I've talked in some detail before about why high-speed parallel buses stopped being a thing outside of relatively short distances. But aside from the bulky cabling, there's also the issue of signal timing: it's going to be very hard to ensure that all the bits of a 7 GHz signal reach their destination at the same time, even if the run is only, say, six inches end to end.
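
To put rough numbers on that first point, here's a quick back-of-envelope sketch in Python. The bandwidth figures are the ones quoted above; the 4 GB payload is just a made-up example of assets that would have to be pulled from the other card.

```python
# Back-of-envelope numbers for the PCIe vs. VRAM point above.
# Bandwidth figures are from the post; the payload size is a made-up example.

PCIE3_X16_GBPS = 15.75      # GB/s, PCIe 3.0 x16
TITAN_V_VRAM_GBPS = 652.0   # GB/s, Titan V HBM2

def transfer_time_ms(size_gb: float, bandwidth_gbps: float) -> float:
    """Milliseconds to move size_gb at the given bandwidth."""
    return size_gb / bandwidth_gbps * 1000.0

payload_gb = 4.0  # hypothetical chunk of textures/buffers living on the other card

print(f"Over PCIe 3.0 x16: {transfer_time_ms(payload_gb, PCIE3_X16_GBPS):6.1f} ms")
print(f"From local VRAM:   {transfer_time_ms(payload_gb, TITAN_V_VRAM_GBPS):6.1f} ms")
print(f"Bandwidth ratio:   {TITAN_V_VRAM_GBPS / PCIE3_X16_GBPS:.0f}x")
```

At 60 FPS the frame budget is about 16.7 ms, so a ~250 ms fetch over PCIe is roughly fifteen frames' worth of stalling, while the same fetch from local VRAM fits comfortably inside a single frame.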

A similar issue exists in systems with multiple physical processors. In that case, all of the interconnects are on the motherboard itself, so there's little issue with huge cables or signal propagation. Even so, the system has to be aware of how to schedule tasks: there's still a significant amount of latency when accessing another processor's memory, so some tasks will perform worse if those scheduling considerations aren't taken into account.
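
Here's a tiny toy model of that scheduling problem. The latency numbers are illustrative assumptions, not measurements from any particular system; the point is only that where memory lives relative to where the task runs changes the runtime.

```python
# Toy model of a memory-bound task on a two-socket (NUMA) system.
# Latencies are illustrative assumptions, not measured values.

LOCAL_ACCESS_NS = 80     # assumed latency to the processor's own memory
REMOTE_ACCESS_NS = 140   # assumed latency across the interconnect to the other socket

def runtime_ms(accesses: int, remote_fraction: float) -> float:
    """Runtime of a task given what fraction of its memory accesses are remote
    (ignores caching, prefetching, and overlap; purely for intuition)."""
    local_ns = accesses * (1.0 - remote_fraction) * LOCAL_ACCESS_NS
    remote_ns = accesses * remote_fraction * REMOTE_ACCESS_NS
    return (local_ns + remote_ns) / 1e6

ACCESSES = 50_000_000
for frac in (0.0, 0.5, 1.0):
    print(f"{frac:4.0%} remote accesses -> {runtime_ms(ACCESSES, frac):5.0f} ms")
```

Even in this simplified model, a task whose memory all ends up on the other socket takes noticeably longer than one scheduled next to its data, which is why NUMA-unaware scheduling hurts memory-sensitive workloads.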

3 Comments

41 minutes ago, CarnageTR said:

SLI bottlenecks communication traffic. Linus have a video about it. NVlink might be solution.

NVLink is still slow compared to VRAM bandwidth. And considering that GPUs are sensitive to memory bandwidth, I don't believe even running the links at VRAM speed would solve everything, since memory-sensitive applications already have issues on NUMA-based systems.
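
For a rough sense of the gap: the NVLink figure below is my assumption for a Volta-class card with NVLink 2.0 (roughly 150 GB/s per direction); the VRAM figure is the Titan V number from the post.

```python
# Rough NVLink vs. local VRAM comparison.
# NVLink figure is an assumed ~150 GB/s per direction for a Volta-class card;
# the VRAM figure is the Titan V number quoted in the post.

NVLINK2_GBPS = 150.0
TITAN_V_VRAM_GBPS = 652.0

print(f"Local VRAM has about {TITAN_V_VRAM_GBPS / NVLINK2_GBPS:.1f}x the bandwidth of the link")
```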
