
Why Is DDR4 RAM Still Slower Than DDR3 Overall (Mainly Due To Higher CAS Latency)?

It's been almost 5 years since the launch of DDR4, and I'm kind of surprised that we still don't have great CAS latency on most kits. Basically, DDR3 2200MHz CL7 is still the best overall RAM, given that it can transfer the first word to the CPU in 6.36 ns, the fourth word in 7.73 ns, and the eighth word in 9.55 ns. The only place DDR4 can claim a victory is with the 4600MHz CL18 kit and the 4800MHz CL19 kit, but even then it only wins on the eighth word transfer. One can argue that the first word transfer is the most important, because the words are sent in critical-word-first order, which lets the CPU begin work immediately using that first word.

The chart below demonstrates what I'm saying. I've also included my own kit at the bottom, 3200MHz CL14. It wasn't a cheap kit, either, and it's handily beaten by both the DDR3 2200MHz CL7 kit and the 2133MHz CL7 kit in first word, fourth word, and eighth word, and both of those kits were similarly priced to mine. It seems the only real advantage of DDR4 is the lower power draw.

So does anyone know WHY the CAS latency is so much higher on DDR4? Why don't we have DDR4 2133MHz CL7 kits when we had DDR3 2133MHz CL7 kits? I can only guess it's because of the lower power?

 

| Generation | Type | Data rate | Transfer time | Command rate | Cycle time | CAS latency | First word | Fourth word | Eighth word |
|---|---|---|---|---|---|---|---|---|---|
| DDR4 SDRAM | DDR4-4600 | 4600 MT/s | 0.217 ns | 2300 MHz | 0.435 ns | 18 | 7.82 ns | 8.48 ns | 9.35 ns |
| DDR4 SDRAM | DDR4-4800 | 4800 MT/s | 0.208 ns | 2400 MHz | 0.417 ns | 19 | 7.92 ns | 8.54 ns | 9.38 ns |
| DDR3 SDRAM | DDR3-2200 | 2200 MT/s | 0.455 ns | 1100 MHz | 0.909 ns | 7 | 6.36 ns | 7.73 ns | 9.55 ns |
| DDR4 SDRAM | DDR4-4600 | 4600 MT/s | 0.217 ns | 2300 MHz | 0.435 ns | 19 | 8.26 ns | 8.91 ns | 9.78 ns |
| DDR3 SDRAM | DDR3-2133 | 2133 MT/s | 0.469 ns | 1066 MHz | 0.938 ns | 7 | 6.56 ns | 7.97 ns | 9.84 ns |
| DDR4 SDRAM | DDR4-3200 | 3200 MT/s | 0.313 ns | 1600 MHz | 0.625 ns | 14 | 8.75 ns | 9.69 ns | 10.94 ns |
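The columns in the chart come from simple arithmetic, which a short Python sketch can reproduce. It assumes the usual DDR model: two transfers per clock, so the cycle time is twice the transfer time, the first word arrives after CL cycles, and each later word in the burst adds one transfer time. The function name `word_latency_ns` is made up for illustration.

```python
def word_latency_ns(data_rate_mts: float, cas_latency: int, word: int = 1) -> float:
    """Time in ns until the given word (1-based) of a burst arrives."""
    transfer_time = 1000.0 / data_rate_mts   # ns per transfer
    cycle_time = 2.0 * transfer_time         # DDR: two transfers per clock
    return cas_latency * cycle_time + (word - 1) * transfer_time


# (label, data rate in MT/s, CAS latency) taken from the chart
kits = [
    ("DDR4-4600 CL18", 4600, 18),
    ("DDR3-2200 CL7", 2200, 7),
    ("DDR4-3200 CL14", 3200, 14),
]

for name, rate, cl in kits:
    first, fourth, eighth = (word_latency_ns(rate, cl, w) for w in (1, 4, 8))
    print(f"{name}: {first:.2f} / {fourth:.2f} / {eighth:.2f} ns")
```

The results should agree with the chart to within rounding in the last digit.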


Since you're comparing <2ns of difference in most cases, does this matter in the grand scheme?  Is there a noticeable use case where downgrading to DDR3 makes a lot of sense?

I don't get why First word is being given importance compared to overall benchmark performance on different memory kits/speeds.


because bandwidth is more important than latency?



48 minutes ago, LogicWeasel said:

Since you're comparing <2ns of difference in most cases, does this matter in the grand scheme?  Is there a noticeable use case where downgrading to DDR3 makes a lot of sense?

I don't get why First word is being given importance compared to overall benchmark performance on different memory kits/speeds.

Well, it's not less than 2 ns in most cases. The chart I gave compared really high end RAM. Your "average" RAM is DDR4 3000 CL 16, which is 10.67 ns for first word transfer compared to similarly priced high end DDR3 which can do that in 6.36ns. That's a difference of over 4 ns. First word is given priority because it is the most true measure of throughput. 

 

As for benchmarks, I don't see any that compare faster DDR3 RAM to slower DDR4 RAM on the same CPU. But even if we could find one, DDR3 is only used on older processors where the CPU might actually be the bottleneck at this level of RAM speed, making the results between the two functionally the same (if I'm right about the CPU being the bottleneck).

 

However, I'm mostly talking about on-paper, not necessarily real world scenarios. On paper the throughput of DDR3 is generally better because of way better latency.

 

And if we are talking real world, then I can assure you there would be a difference between DDR4 2133MHz C7 vs DDR4 2133MHz C14. I really just want to know why we've seemingly fallen backwards on latency.


1 hour ago, Jurrunio said:

because bandwidth is more important than latency?

No it's not, that's what the first word transfer demonstrates (and the fourth and 8th word). That's the measurement of THROUGHPUT, which is the overall metric for RAM. Better throughput = better (faster) RAM.

 

Just to give you an idea on how much timings can affect benchmarks see here. The 3200MHz C14 SHOULD beat the 3000MHz C16 but in several benchmarks it doesn't! It turns out it was the SUB-TIMINGS, not even the primary timings:

 

https://www.techpowerup.com/reviews/GSkill/F4-3200C14Q-32GTZSW/6.html


Oh, and just for reference, it looks like the lowest latency on the market right now is 3200MHz CL13, which puts it at 8.125ns for first word transfer, functionally beating 4600MHz CL19 (but still beaten by 4600MHz CL18).
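That ordering is easy to check with a few lines of Python. This is only a sketch: it assumes first-word latency is CL cycles at a clock of half the data rate, and the helper name `first_word_ns` is invented for the example.

```python
def first_word_ns(data_rate_mts: int, cas_latency: int) -> float:
    """First-word latency in ns: CL cycles at a clock of data_rate / 2."""
    return cas_latency * 2000.0 / data_rate_mts


# label -> (data rate in MT/s, CAS latency), the three kits compared above
kits = {
    "DDR4-4600 CL19": (4600, 19),
    "DDR4-3200 CL13": (3200, 13),
    "DDR4-4600 CL18": (4600, 18),
}

# Sort from lowest (best) to highest first-word latency
ranked = sorted(kits, key=lambda name: first_word_ns(*kits[name]))
for name in ranked:
    print(f"{name}: {first_word_ns(*kits[name]):.3f} ns")
```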


It is important to note there are JEDEC standard timings, and then there are "overclocker" type modules offered by more gaming orientated companies. DDR3 2133 is only defined to CAS 11. Anything faster than that is non-standard. Having said that, DDR4 2133 is only defined down to CAS 14. 

 

The long-standing argument over whether bandwidth or latency matters more still boils down to this: it depends on what you do with it. My personal use cases are primarily bandwidth limited. For a quad core consumer Intel CPU, I consider 3200 RAM to be inadequate, and estimate 4000 would be getting towards "practically unlimited" bandwidth. I don't actually own a 4000 kit yet, although I'm kinda shopping for one when the right price is in stock, as I don't want to overpay for it. In the same testing, I've tried varying latency up and down, and it makes hardly any difference compared to bandwidth. I am aware of different applications that can be more latency sensitive, but since I don't run them, it doesn't matter to me.

 

For the standard timings, the best cycle timing of each speed tends to work out to about the same elapsed time. So while faster kits don't necessarily reduce timing, you still gain from the bandwidth part.
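To put a rough number on "the bandwidth part", here is a Python sketch of theoretical peak bandwidth. It assumes a standard 64-bit (8-byte) memory channel and ignores refresh and other overheads, so real-world figures will be lower. The function name and the dual-channel default are illustrative assumptions, not something taken from the posts above.

```python
def peak_bandwidth_gbs(data_rate_mts: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s (10^9 bytes/s) for the given data rate."""
    # MT/s * bytes per transfer = MB/s per channel; divide by 1000 for GB/s
    return data_rate_mts * bus_bytes * channels / 1000.0


for rate in (2133, 3200, 4000):
    print(f"{rate} MT/s: {peak_bandwidth_gbs(rate):.1f} GB/s dual channel")
```

Peak bandwidth scales linearly with data rate, which is why a faster kit with the same elapsed-time latency still comes out ahead on bandwidth-bound workloads.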

 

If you care about latency, you can do what I did, ONCE. That is to manually optimise ALL the timings available in ram. This is extremely time consuming as one wrong step anywhere will lead to instability. You will reboot a lot. You risk corrupting the Windows install. But if you make it through this, the benchmark latency results will destroy ANY off the shelf kit. It took me about a week of evenings to dial in my B-die kit, and even then I made a mistake somewhere as it isn't 100% stable and my life isn't long enough to go back and find where. Some errors are not easily detectable and only crop up after extensive testing. Because of the pain involved, this is probably only something for competitive overclockers looking for every last bit of performance. For normal users, buy any old fast kit and it'll be good enough.




I agree with this 100%. Many applications prefer bandwidth to latency. My question was more about the on-paper specifications. My main question was really the one below:

 

50 minutes ago, jerubedo said:

So does anyone know WHY the CAS latency is so much higher on DDR4? Why don't we have DDR4 2133MHz CL7 kits when we had DDR3 2133MHz C7 kits? I can only guess it's because of the lower power?

 

Overclocked or not, DDR3 was able to achieve way lower CAS latency at the same speeds (2133 vs 2133). So why can't we have the best of both worlds with DDR4: higher bandwidth WITH lower CAS latency? What is the driving force preventing this? Just the fact that DDR3-2133MHz is defined to CAS 11 whereas DDR4-2133MHz is defined to CAS 14 confuses me.


2 minutes ago, jerubedo said:

Overclocked or not, DDR3 was able to achieve way lower CAS latency at the same speeds (2133 vs 2133). So why can't we have the best of both worlds with DDR4: higher bandwidth WITH lower CAS latency?

I don't have the answer to this, but I have a guess: voltage. One of the drivers of each generation of DDR is a drop in voltage and consequently power. With DDR4 we run at 1.2 V at lower speeds, with 1.35 V being more common for the less extreme higher speeds, around 2800 and up. From memory, DDR3 was 1.5 V standard, and even low-power DDR3 was 1.35 V. A common trick among RAM overclockers is that to get timings down, you have to really jack up the RAM voltage. Maybe, when setting the standard, they simply decided that for most people lower voltage, and therefore lower power consumption, mattered more than slightly lower latency in some cases.




Yep, that was my guess as well :)


Posts like these (from people way more versed than I am on a topic) are why I love this forum.  I just read all of that and I will easily admit I have no idea what's going on here. However, I am more educated than I was 15 minutes ago.

 

Continue please...lol


 


  • 5 months later...

I read once (don't quote me here lol) that it's due to speed. CAS is counted in clock cycles, not real time. My 3770K's RAM ran at around two thirds of the clock speed of my 9900K's, so with faster clock cycles and a higher CAS number, the real-time latency should be similar. I was curious about that when playing with one of my Windows 98 PCs (DDR 400MHz, CAS 3), and that was the most reasonable answer I found. Perhaps with the way RAM is manufactured, the speeds get faster but there's no real improvement in access times?
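The point that CAS is counted in clock cycles rather than nanoseconds can be sketched in Python. The helper names are hypothetical, "DDR 400" is assumed to mean 400 MT/s on a 200 MHz clock, and the 8.75 ns target is just the DDR4-3200 CL14 figure from the first post, used as an example.

```python
def cas_to_ns(data_rate_mts: int, cas_latency: int) -> float:
    """Convert a CAS latency in cycles to ns (clock = data rate / 2)."""
    return cas_latency * 2000.0 / data_rate_mts


def cycles_for_latency(data_rate_mts: int, latency_ps: int) -> int:
    """Smallest whole CL that waits at least latency_ps picoseconds."""
    # Cycle time in ps is 2_000_000 / data_rate; this is integer ceiling division
    return -(-latency_ps * data_rate_mts // 2_000_000)


print(f"DDR-400 CL3: {cas_to_ns(400, 3):.2f} ns")

# CL needed to hold roughly 8.75 ns as the data rate climbs
for rate in (2133, 3200, 4600):
    print(f"{rate} MT/s: CL{cycles_for_latency(rate, 8750)}")
```

Under these assumptions the old DDR-400 CL3 module waits longer in real time than any kit in the chart, despite its small CL number, and holding the same nanoseconds at a faster clock requires a larger CL.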
