
Why don't we ever have a motherboard that supports VRAM instead of RAM modules?

Just curious. From my understanding, video memory is far superior to system memory in every way, from speed to channel bandwidth, so why don't we ever have a motherboard that supports VRAM instead of RAM modules?

 

If we had even GDDR4 on the board today, DDR5 would be DOA by the time it came out, and obviously this would also let you 'add VRAM' to the GPU, which should be really useful given how insane devs are with VRAM nowadays (since your GPU would use the same VRAM as your system).

 

I dunno, is it cost? (Channels like Hardware Unboxed don't seem to think so, given their 'everyone should get 16 GB of VRAM' rhetoric lately.) Or is there some other advantage to using a combination of system + video memory instead of VRAM for everything, à la console?


2 minutes ago, e22big said:

Just curious. From my understanding, video memory is far superior to system memory in every way, from speed to channel bandwidth, so why don't we ever have a motherboard that supports VRAM instead of RAM modules?

If we had even GDDR4 on the board today, DDR5 would be DOA by the time it came out, and obviously this would also let you 'add VRAM' to the GPU, which should be really useful given how insane devs are with VRAM nowadays (since your GPU would use the same VRAM as your system).

I dunno, is it cost? (Channels like Hardware Unboxed don't seem to think so, given their 'everyone should get 16 GB of VRAM' rhetoric lately.) Or is there some other advantage to using a combination of system + video memory instead of VRAM for everything, à la console?

Cuz then the demand for GPUs would go as low as Luke's confidence in the WAN Show.



5 minutes ago, e22big said:

video memory is far superior to system memory in every way, from speed to channel bandwidth, so why don't we ever have a motherboard that supports VRAM instead of RAM modules?

Because GDDR is terrible when it comes to memory latency, and a lot of desktop workloads are more latency sensitive than they are bandwidth sensitive. Look at the PS5 dev board, for instance: it actually did use VRAM instead of standard desktop RAM and ran Windows, and it was next to unusable as a desktop because the memory performance was so bad. Or look at DDR4 vs. DDR5: there are still games that do better with DDR4 because of its lower latency; now imagine having almost three times the latency instead.

 

I'm sure some workloads don't care, but they're also likely workloads that would be better suited to a GPU anyway, so you might as well leave VRAM on GPUs and regular desktop memory with CPUs.


3 minutes ago, RONOTHAN## said:

Because GDDR is terrible when it comes to memory latency, and a lot of desktop workloads are more latency sensitive than they are bandwidth sensitive. Look at the PS5 dev board, for instance: it actually did use VRAM instead of standard desktop RAM and ran Windows, and it was next to unusable as a desktop because the memory performance was so bad. Or look at DDR4 vs. DDR5: there are still games that do better with DDR4 because of its lower latency; now imagine having almost three times the latency instead.

I'm sure some workloads don't care, but they're also likely workloads that would be better suited to a GPU anyway, so you might as well leave VRAM on GPUs and regular desktop memory with CPUs.

Wasn't latency one of the main selling points of GDDR? My understanding is that one of the main causes of the serious drop in game performance once you run out of VRAM is that system memory has significantly more latency than GDDR (which is also physically located closer to the GPU chip).


2 minutes ago, e22big said:

Wasn't latency one of the main selling points of GDDR? My understanding is that one of the main causes of the serious drop in game performance once you run out of VRAM is that system memory has significantly more latency than GDDR (which is also physically located closer to the GPU chip).

Yes and no. Yes, VRAM has lower latency from the GPU's perspective because it sits right next to the chip, but system memory reached over the PCIe slot also has significantly lower bandwidth, and that lower bandwidth is what actually hurts performance the most.

 

Memory latency matters most when you're dealing with lots of very small pieces of data read sporadically. GPUs don't do that; they read large texture files, so having looser timings doesn't matter quite as much. CPUs do deal with very small pieces of data and access them pretty randomly, so having lower latency is much better.

 

There are definitely specifics I'm forgetting (it's been a while since I've looked at this, and it's 2 AM), but the actual memory latency is the biggest factor separating the two.
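If you want to see that access-pattern difference rather than take my word for it, here's a rough C sketch (just an illustration; the buffer size and the LCG constants are arbitrary choices, not anything from a real GPU or game). One loop streams through a big array sequentially, the other chases pointers through it in a pseudo-random order, so every read depends on the previous one and pays the full memory round trip:

```c
/*
 * Minimal sketch of latency-bound vs bandwidth-bound memory access.
 * Buffer size and LCG constants are arbitrary illustration choices.
 * Build (Linux/macOS): gcc -O2 chase.c -o chase
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1LL << 25)   /* 32M elements, ~256 MB per array: far bigger than any CPU cache */

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    long long *buf  = malloc((size_t)N * sizeof *buf);
    long long *next = malloc((size_t)N * sizeof *next);
    if (!buf || !next) return 1;

    for (long long i = 0; i < N; i++) {
        buf[i]  = i;
        /* Full-period LCG mod 2^25: chasing it visits every index exactly
         * once, in a cache-hostile order. */
        next[i] = (i * 1103515245LL + 12345LL) % N;
    }

    /* Sequential pass: the prefetcher hides latency, so this is limited
     * mostly by memory BANDWIDTH (the thing GDDR is built for). */
    double t0 = now();
    long long sum = 0;
    for (long long i = 0; i < N; i++) sum += buf[i];
    double seq = now() - t0;

    /* Pointer chase: every load depends on the previous one, so each cache
     * miss pays the full round-trip LATENCY (the thing CPUs care about). */
    t0 = now();
    long long idx = 0;
    for (long long i = 0; i < N; i++) idx = next[idx];
    double chase = now() - t0;

    printf("sequential: %.3f s   pointer chase: %.3f s   (sum=%lld idx=%lld)\n",
           seq, chase, sum, idx);
    free(buf);
    free(next);
    return 0;
}
```

On a typical desktop the pointer chase usually comes out dramatically slower even though both loops touch the same amount of data, and that's the latency sensitivity the CPU side cares about.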


21 minutes ago, e22big said:

Wasn't latency one of the main selling points of GDDR? My understanding is that one of the main causes of the serious drop in game performance once you run out of VRAM is that system memory has significantly more latency than GDDR (which is also physically located closer to the GPU chip).

The reason you get more latency when your GPU has to access system RAM instead of the VRAM on its own board is that the GPU has to communicate over the PCIe bus rather than accessing the memory directly. Plus, system RAM isn't dedicated to the GPU; it's shared with the CPU.
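To put rough numbers on that PCIe bottleneck, here's a back-of-the-envelope comparison (the 256-bit, 14 Gbit/s GDDR6 configuration is an assumed example card, not a specific GPU from this thread):

```c
/* Back-of-the-envelope bandwidth comparison: a GPU reaching into system
 * RAM over PCIe vs reading its own GDDR6. The 256-bit / 14 Gbit/s GDDR6
 * setup is an assumed example, not taken from the thread. */
#include <stdio.h>

int main(void) {
    /* PCIe 4.0 x16: 16 GT/s per lane, 128b/130b encoding, 16 lanes, 8 bits/byte */
    double pcie_gbs  = 16.0 * (128.0 / 130.0) * 16.0 / 8.0;   /* ~31.5 GB/s */

    /* Example GDDR6 card: 256-bit bus, 14 Gbit/s per pin */
    double gddr6_gbs = 256.0 / 8.0 * 14.0;                    /* 448 GB/s */

    printf("PCIe 4.0 x16 into system RAM: ~%.1f GB/s\n", pcie_gbs);
    printf("Local GDDR6 (256-bit)       : ~%.1f GB/s (about %.0fx more)\n",
           gddr6_gbs, gddr6_gbs / pcie_gbs);
    return 0;
}
```

So even before you count the extra latency of the round trip, the path into system RAM is roughly an order of magnitude narrower than the GPU's own memory bus.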



It's a different tool for a different job. It would be like trying to daily drive a semi truck: yeah, you could, but if you're commuting to work, a sedan is a better choice in almost every regard. Just the same, a sedan is a terrible choice for hauling freight across the country.

 

There's far more to any memory than just speed or bandwidth.
The biggest difference is latency, and it's a massive change of scale. Desktop memory sits around 15-16 nanoseconds of latency on average, and it's been that way since at least DDR1 (though the lowest-latency memory was early DDR3, followed by later DDR4). GDDR6 is somewhere between 135 and 145 nanoseconds.

Stuff like HBM2 is a little better at this but still awful in comparison to SDRAM.

There are game consoles that do this, as was noted above; some examples would be basically any system that has some kind of Linux port, so any PlayStation.

The Xbox 360 also used its video memory as the whole system memory, as opposed to going the other way around like an APU and using SDRAM as VRAM.

 

Lower latency is far more important for the CPU's use case than raw speed is for the GPU's. The GPU can shit out a frame stupidly fast, and it needs something to shove that frame into and pull assets from super quickly; it can do that, and it can do it on a delay without issue. The latency is a reasonable trade-off to make in the name of straight-up speed, since the GPU is producing an end result that's still faster than our own "human latency".

That "we can only see 60 fps" thing is bullshit, but you do see things on a delay, a bit over 10 milliseconds, which is wayyyyy above the nanosecond scale. Basically, for real-time visual output you'd never notice the latency involved in processing a frame; it's happening on a scale humans can't perceive.

 

BUT to a CPU, that latency would be extremely noticeable. The lower latency of SDRAM is very important for the CPU to do its processing properly and deliver its computational output to you in whatever form. 1 GHz of clock speed is one cycle per nanosecond, so a decently fast CPU can get a lot done in a matter of nanoseconds, and given how clock cycles map to different functions in a CPU, the ratio of clock speed to cycles per nanosecond fits in very nicely with the latency of SDRAM: it's not too much of an inhibitor to most operations that lean heavily on SDRAM.

At the latency of video memory, you start to hold up clock cycles that rely on memory, because the CPU fills and drains its cache much faster than data can be transferred between the cache and RAM. It would have to wait much longer for the delayed information to move into and out of the cache.

"Much longer periods of time" being on the scale of nanoseconds, for a CPU.

 

Pardon my wall of text; this bleeds into the realm of educational documentation on computer science, and it gets very complicated. I cut out all of the math involved because the way memory latency, cache latency, CPU clock speed and cycle structure interact is absolutely batshit insane.
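The one piece of that math that's simple enough to show is the cycle conversion. A rough sketch (the 4 GHz clock is an assumed example; the latency figures are the ones quoted above):

```c
/* Rough "what does this mean for a CPU" math, using the latency figures
 * above; the 4 GHz clock is an assumed example, not from the thread. */
#include <stdio.h>

int main(void) {
    const double clock_ghz = 4.0;    /* 4 cycles per nanosecond */
    const double ddr_ns    = 15.0;   /* ~15-16 ns typical desktop DDR latency */
    const double gddr6_ns  = 140.0;  /* ~135-145 ns quoted for GDDR6 */

    printf("DDR  : a cache miss stalls roughly %.0f cycles\n", ddr_ns   * clock_ghz);
    printf("GDDR6: a cache miss stalls roughly %.0f cycles\n", gddr6_ns * clock_ghz);
    return 0;
}
```

Roughly 60 stalled cycles per miss on desktop DDR versus roughly 560 on GDDR6 is the gap being described here.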


1 hour ago, 8tg said:

There's far more to any memory than just speed or bandwidth. The biggest difference is latency, and it's a massive change of scale. Desktop memory sits around 15-16 nanoseconds of latency on average, and it's been that way since at least DDR1 (though the lowest-latency memory was early DDR3, followed by later DDR4). GDDR6 is somewhere between 135 and 145 nanoseconds.

Thanks, that does indeed explain a lot.

 

I guess it's also not really possible to create some sort of 'general-purpose memory' that's both fast enough for the GPU and low-latency enough for the CPU and everything else, is it?


3 minutes ago, e22big said:

I guess it's also not really possible to create some sort of 'general-purpose memory' that's both fast enough for the GPU and low-latency enough for the CPU and everything else, is it?

And cheap enough for people to actually buy? I dunno...


I'd also like to point out that software has to take advantage of new memory types as well.

For example, typical DDR4 memory sticks are 64 bits wide, and because the memory is double data rate, the memory controller reads two 64-bit chunks of data from a stick per clock cycle. If you have two sticks, the memory controller can work in dual-channel mode, reading from both sticks in parallel, so per clock you get 2 sticks x 2 x 64 bits = 256 bits.

But if you go with HBM, each chip is designed to read and write 1024 bits at a time, and if a design uses multiple such chips it can read a lot more at once. For example, the R9 Fury had 4 HBM chips, so it could read 4096 bits at a time from memory.

But the software has to be cleverly written and optimized; the compiler that converts the source code you write has to be aware of these types of memory and arrange the data so that each read pulls in as much useful data as possible.

You won't get the most out of those memories if you read 4096 bits at a time but only use 2000 of them and throw away the other 2096. Your program needs to be smart enough to group values, numbers and information so that each read gets as close as possible to 4096 bits, or 2048, or 3072 (in case a manufacturer releases a cheaper version with less memory, like releasing a 16 GB model and later a cheaper one with only 12 GB).
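As a rough worked version of that width arithmetic (DDR4-3200 in dual channel versus the R9 Fury's four HBM chips; the 1 GT/s per-pin rate for first-generation HBM is my assumption here, not a figure from the thread):

```c
/* Peak memory bandwidth = bus width (bits) / 8 * transfer rate (GT/s).
 * DDR4-3200 dual channel and the R9 Fury's 4-chip HBM are used as
 * examples; the 1 GT/s per-pin HBM rate is an assumption. */
#include <stdio.h>

static double peak_gbs(double bus_bits, double rate_gtps) {
    return bus_bits / 8.0 * rate_gtps;   /* bytes per transfer x transfers per second */
}

int main(void) {
    printf("DDR4-3200, dual channel (2 x 64-bit @ 3.2 GT/s): %.1f GB/s\n",
           peak_gbs(2 * 64, 3.2));
    printf("R9 Fury HBM (4 x 1024-bit @ 1.0 GT/s)         : %.1f GB/s\n",
           peak_gbs(4 * 1024, 1.0));
    return 0;
}
```

That's the raw headroom; as said above, you only see it if the code and its data layout actually fill those wide reads with useful data.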

 

 
