slow ram speed

apoyusiken · February 27

So I have 2x DDR4 RAM running at 3000 MT/s (I confirmed it's dual channel in the terminal, my mobo has 2 slots anyway). A few days ago I was getting 14 GiB/s and now I get 7 GiB/s when I run a test. In the meantime I increased zram from 50 percent to 75 and dedicated 2 GB to the iGPU. I ran tests when the PC was busy and when it was idle, but no luck. Also my base clock is 102 MHz and my mobo is on the low end. Gemini says zram isn't the issue but maybe it's the base clock, and that the speed should be 25+ GiB/s. Also my PC feels sluggish now. I'm running MX Linux btw. You guys gotta know how to fix this, thx in advance for any ideas.

Update: I tried turning zram off and I get 8 GiB/s.

Update 2: So it's running at 86 degrees under no load but the fans aren't loud? And I had configured the fan curve to be aggressive.

Update 3: OK, the CPU fan wasn't spinning. I replugged its cable and now temps are fine, but I still get 7.5 GiB/s.

RONOTHAN## · February 27

Can you give full specs of the system (motherboard, CPU, RAM model, etc.), as well as the test you're running?

Eigenvektor · February 27

7 hours ago, apoyusiken said:

A few days ago I was getting 14 GiB/s and now I get 7 GiB/s when I run a test.

Both of these results appear too low.

3000 MT/s x 64 b/T = 192 Gbps, double that for dual channel

So you should be seeing 24 GB/s or 48 GB/s for dual channel (DDR4 3000 MT/s aka PC4-24000)
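In case the arithmetic is easier to follow as code, here is a minimal sketch of the same calculation. The function name `peak_bandwidth_gb_s` and its parameters are made up for illustration, and it assumes a 64-bit data bus per channel (standard for a DDR4 DIMM). The result is a theoretical peak, not what a benchmark will report.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s, channels=1, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in decimal GB/s.

    transfer_rate_mt_s: module speed in megatransfers per second (e.g. 3000)
    channels:           number of populated memory channels
    bus_width_bits:     data width per channel (assumed 64 bits here)
    """
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_second = transfer_rate_mt_s * 1e6 * bytes_per_transfer * channels
    return bytes_per_second / 1e9


if __name__ == "__main__":
    for channels in (1, 2):
        gb_s = peak_bandwidth_gb_s(3000, channels=channels)
        print(f"DDR4-3000, {channels} channel(s): {gb_s:.1f} GB/s")
```

Swapping in another speed grade (3600, say) gives the matching peak for that kit.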

apoyusiken · February 27

11 hours ago, RONOTHAN## said:

Can you give full specs of the system (motherboard, CPU, RAM model, etc.), as well as the test you're running?

CPU is a 2200G, motherboard is an A520M K. dmidecode and mbw output:

# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.

Handle 0x000B, DMI type 16, 23 bytes
Physical Memory Array
   Location: System Board Or Motherboard
   Use: System Memory
   Error Correction Type: None
   Maximum Capacity: 128 GB
   Error Information Handle: 0x000A
   Number Of Devices: 4

Handle 0x0012, DMI type 17, 92 bytes
Memory Device
   Array Handle: 0x000B
   Error Information Handle: 0x0011
   Total Width: Unknown
   Data Width: Unknown
   Size: No Module Installed
   Form Factor: Unknown
   Set: None
   Locator: DIMM 0
   Bank Locator: P0 CHANNEL A
   Type: Unknown
   Type Detail: Unknown

Handle 0x0014, DMI type 17, 92 bytes
Memory Device
   Array Handle: 0x000B
   Error Information Handle: 0x0013
   Total Width: 64 bits
   Data Width: 64 bits
   Size: 8 GB
   Form Factor: DIMM
   Set: None
   Locator: DIMM 1
   Bank Locator: P0 CHANNEL A
   Type: DDR4
   Type Detail: Synchronous Unbuffered (Unregistered)
   Speed: 3000 MT/s
   Manufacturer: Unknown
   Serial Number: 00000000
   Asset Tag: Not Specified
   Part Number: CMK8GX4M1D3000C16
   Rank: 1
   Configured Memory Speed: 3000 MT/s
   Minimum Voltage: 1.2 V
   Maximum Voltage: 1.2 V
   Configured Voltage: 1.2 V
   Memory Technology: DRAM
   Memory Operating Mode Capability: Volatile memory
   Firmware Version: Unknown
   Module Manufacturer ID: Bank 3, Hex 0x9E
   Module Product ID: Unknown
   Memory Subsystem Controller Manufacturer ID: Unknown
   Memory Subsystem Controller Product ID: Unknown
   Non-Volatile Size: None
   Volatile Size: 8 GB
   Cache Size: None
   Logical Size: None

Handle 0x0017, DMI type 17, 92 bytes
Memory Device
   Array Handle: 0x000B
   Error Information Handle: 0x0016
   Total Width: Unknown
   Data Width: Unknown
   Size: No Module Installed
   Form Factor: Unknown
   Set: None
   Locator: DIMM 0
   Bank Locator: P0 CHANNEL B
   Type: Unknown
   Type Detail: Unknown

Handle 0x0019, DMI type 17, 92 bytes
Memory Device
   Array Handle: 0x000B
   Error Information Handle: 0x0018
   Total Width: 64 bits
   Data Width: 64 bits
   Size: 8 GB
   Form Factor: DIMM
   Set: None
   Locator: DIMM 1
   Bank Locator: P0 CHANNEL B
   Type: DDR4
   Type Detail: Synchronous Unbuffered (Unregistered)
   Speed: 3000 MT/s
   Manufacturer: Unknown
   Serial Number: 00000000
   Asset Tag: Not Specified
   Part Number: CMK8GX4M1D3000C16
   Rank: 1
   Configured Memory Speed: 3000 MT/s
   Minimum Voltage: 1.2 V
   Maximum Voltage: 1.2 V
   Configured Voltage: 1.2 V
   Memory Technology: DRAM
   Memory Operating Mode Capability: Volatile memory
   Firmware Version: Unknown
   Module Manufacturer ID: Bank 3, Hex 0x9E
   Module Product ID: Unknown
   Memory Subsystem Controller Manufacturer ID: Unknown
   Memory Subsystem Controller Product ID: Unknown
   Non-Volatile Size: None
   Volatile Size: 8 GB
   Cache Size: None
   Logical Size: None

$ mbw -t0 2024
Long uses 8 bytes. Allocating 2*265289728 elements = 4244635648 bytes of memory.
Getting down to business... Doing 10 runs per test.
0   Method: MEMCPY   Elapsed: 0.29422   MiB: 2024.00000   Copy: 6879.159 MiB/s
1   Method: MEMCPY   Elapsed: 0.29343   MiB: 2024.00000   Copy: 6897.821 MiB/s
2   Method: MEMCPY   Elapsed: 0.29271   MiB: 2024.00000   Copy: 6914.694 MiB/s
3   Method: MEMCPY   Elapsed: 0.29430   MiB: 2024.00000   Copy: 6877.266 MiB/s
4   Method: MEMCPY   Elapsed: 0.29954   MiB: 2024.00000   Copy: 6756.937 MiB/s
5   Method: MEMCPY   Elapsed: 0.29138   MiB: 2024.00000   Copy: 6946.351 MiB/s
6   Method: MEMCPY   Elapsed: 0.30019   MiB: 2024.00000   Copy: 6742.464 MiB/s
7   Method: MEMCPY   Elapsed: 0.29974   MiB: 2024.00000   Copy: 6752.519 MiB/s
8   Method: MEMCPY   Elapsed: 0.29447   MiB: 2024.00000   Copy: 6873.412 MiB/s
9   Method: MEMCPY   Elapsed: 0.29157   MiB: 2024.00000   Copy: 6941.729 MiB/s
AVG   Method: MEMCPY   Elapsed: 0.29515   MiB: 2024.00000   Copy: 6857.423 MiB/s
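If anyone wants to compare several mbw runs without eyeballing them, a rough sketch like the one below can pull the numbers out of the log. The line format is copied from the paste above; the helper names (`parse_mbw`, `mean_copy_rate_mib_s`, `recomputed_rate_mib_s`) are hypothetical, and the regex is only an assumption about how mbw formats its per-run lines.

```python
import re

# Matches the "Elapsed / MiB / Copy" fields of one mbw result line.
LINE_RE = re.compile(
    r"Elapsed:\s*([\d.]+)\s+MiB:\s*([\d.]+)\s+Copy:\s*([\d.]+)\s+MiB/s"
)

# Three per-run lines taken from the log above.
SAMPLE = """\
0   Method: MEMCPY   Elapsed: 0.29422   MiB: 2024.00000   Copy: 6879.159 MiB/s
1   Method: MEMCPY   Elapsed: 0.29343   MiB: 2024.00000   Copy: 6897.821 MiB/s
2   Method: MEMCPY   Elapsed: 0.29271   MiB: 2024.00000   Copy: 6914.694 MiB/s
"""


def parse_mbw(text):
    """Return (elapsed_s, mib, reported_mib_s) per run, skipping the AVG line."""
    runs = []
    for line in text.splitlines():
        if line.lstrip().startswith("AVG"):
            continue
        match = LINE_RE.search(line)
        if match:
            runs.append(tuple(float(value) for value in match.groups()))
    return runs


def mean_copy_rate_mib_s(runs):
    """Average of the copy rates that mbw reported."""
    return sum(run[2] for run in runs) / len(runs)


def recomputed_rate_mib_s(run):
    """Copy rate recomputed from size and elapsed time (sanity check)."""
    elapsed_s, mib, _reported = run
    return mib / elapsed_s


if __name__ == "__main__":
    runs = parse_mbw(SAMPLE)
    print(f"{len(runs)} runs, mean {mean_copy_rate_mib_s(runs):.1f} MiB/s")
```

The recomputed rate differs slightly from the reported one because the elapsed time in the log is rounded.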

apoyusiken · February 27

7 hours ago, Eigenvektor said:

Both of these results appear too low.

3000 MT/s x 64 b/T = 192 Gbps, double that for dual channel

So you should be seeing 24 GB/s or 48 GB/s for dual channel (DDR4 3000 MT/s aka PC4-24000)

Yeah, that's why I'm trying to fix it, ~~but shouldn't it be 12 or 24 GiB/s?~~

apoyusiken · February 27

So yeah, turns out the test is single threaded:

$ sysbench memory --threads=4 --memory-block-size=1M --memory-total-size=20G run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time

Running memory speed test with the following options:
block size: 1024KiB
total size: 20480MiB
operation: write
scope: global

Initializing worker threads...

Threads started!

Total operations: 20480 (34627.44 per second)

20480.00 MiB transferred (34627.44 MiB/sec)

General statistics:
total time: 0.5878s
total number of events: 20480

Latency (ms):
min: 0.04
avg: 0.09
max: 11.68
95th percentile: 0.25
sum: 1843.60

Threads fairness:
events (avg/stddev): 5120.0000/0.00
execution time (avg/stddev): 0.4609/0.08

But still not the promised 48 GiB/s.

Eigenvektor · February 27

1 hour ago, apoyusiken said:

But still not the promised 48 GiB/s.

Theoretical throughput is 48 GB/s, not GiB/s. Converted to GiB/s the expected result would be around 44.7 GiB/s.

You're most likely running into a bottleneck with some other component. The test might simply be CPU limited.

Generating random numbers is not free. Depending on what device is used here (urandom?), that could turn into the limiting factor.
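Since the thread keeps mixing decimal (GB/s) and binary (MiB/s, GiB/s) units, here is a small conversion sketch. The helper names are invented for illustration; the only facts used are that 1 GB = 10^9 bytes, 1 MiB = 2^20 bytes and 1 GiB = 2^30 bytes.

```python
GIB = 2**30  # bytes in one GiB
MIB = 2**20  # bytes in one MiB


def gb_s_to_gib_s(gb_s):
    """Decimal GB/s -> binary GiB/s."""
    return gb_s * 1e9 / GIB


def mib_s_to_gb_s(mib_s):
    """Binary MiB/s (what sysbench and mbw print) -> decimal GB/s."""
    return mib_s * MIB / 1e9


def mib_s_to_gib_s(mib_s):
    """Binary MiB/s -> binary GiB/s."""
    return mib_s / 1024


if __name__ == "__main__":
    # Theoretical dual-channel DDR4-3000 peak, expressed in GiB/s.
    print(f"48 GB/s = {gb_s_to_gib_s(48):.2f} GiB/s")
    # The 4-thread sysbench result from the post above, in both unit systems.
    measured = 34627.44
    print(f"{measured} MiB/s = {mib_s_to_gb_s(measured):.2f} GB/s")
    print(f"{measured} MiB/s = {mib_s_to_gib_s(measured):.2f} GiB/s")
```

Comparing like with like (either both decimal or both binary) avoids a gap of roughly 7% that is purely a units artifact.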

Eigenvektor · February 28

I've got a 5900x paired with DDR4 3600 MT/s (PC4-28800). In dual channel this gives me a theoretical throughput of 57.6 GB/s, or about 53.6 GiB/s.

When I run sysbench with the same settings you did, I actually get 86638.97 MiB/sec. When I increase the block size to 8M, it actually increases to 104571.75 MiB/sec! Way more than expected.

According to ChatGPT the reason I'm seeing such large numbers is the small block size. This has me measuring my CPU's L3 cache, rather than actual memory throughput. Here's what I see when I modify the benchmark as it suggested for my CPU:

sysbench memory --threads=12 --memory-block-size=1G --memory-total-size=20G run
sysbench 1.0.20 (using system LuaJIT 2.1.1720049189)

Running the test with following options:
Number of threads: 12
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 1048576KiB
  total size: 20480MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 12 (   52.09 per second)

12288.00 MiB transferred (53343.40 MiB/sec)


General statistics:
    total time:                          0.2298s
    total number of events:              12

Latency (ms):
         min:                                  152.52
         avg:                                  194.32
         max:                                  229.69
         95th percentile:                      231.53
         sum:                                 2331.85

Threads fairness:
    events (avg/stddev):           1.0000/0.00
    execution time (avg/stddev):   0.1943/0.02

Now that looks much more like what I'd expect.

Here's the LLM's full output, which also includes recommendations for benchmarks better suited to what you're trying to test.

Spoiler

On paper, your calculation is correct:

DDR4-3600
3600 MT/s × 8 bytes per transfer (64-bit bus)
= 28.8 GB/s per channel
Dual channel → 57.6 GB/s theoretical peak (≈53.6 GiB/s in binary units)

Yet you're measuring:

~104,572 MiB/s ≈ 109.6 GB/s

That’s almost 2× theoretical DRAM bandwidth.

This usually means you're not measuring DRAM bandwidth. Here's why.

You’re Mostly Measuring Cache Bandwidth

Your command:

sysbench memory --threads=4 --memory-block-size=8M --memory-total-size=20G run

Important points:

--memory-block-size=8M
4 threads
Each thread repeatedly accesses its working block

On a Ryzen 9 5900X, you have:

12 cores
2 CCDs
32 MB L3 cache per CCD
64 MB total L3

With 4 threads × 8 MB = 32 MB total working set

That fits entirely inside one CCD's 32 MB L3 cache.

So what you're benchmarking is largely:

L3 cache bandwidth, not DRAM bandwidth.

Zen 3 L3 bandwidth is easily >100 GB/s aggregate, which matches your result almost perfectly.

Sysbench Is Measuring Copy/Sequential Operations

Sysbench memory test performs simple operations like:

Sequential read
Sequential write
memcpy-like operations

These:

Are highly prefetch-friendly
Hit cache aggressively
Benefit from write-combining
Are NUMA-local

So even when DRAM is involved, the CPU:

Prefetches ahead
Hides latency
Uses store buffers efficiently

This makes measured throughput higher than raw DRAM limits in small working sets.

You’re Seeing Cache-Line Traffic, Not Memory Bus Saturation

Modern CPUs move data in 64-byte cache lines.

If the benchmark:

Reuses data
Loops over the same block repeatedly
Doesn’t evict cache

Then many operations:

Never leave the CCD
Never hit the memory controller
Never consume DRAM bandwidth

So your 109 GB/s is very consistent with:

Zen 3 L3 cache bandwidth across 4 cores

How To Measure Real DRAM Bandwidth

To force actual memory traffic, you need to:

✔ Use a working set much larger than L3

For example:

--memory-block-size=1G

or at least:

4 threads × 256M each

You want:

Working set ≫ 64 MB

✔ Use More Threads

Memory bandwidth scales with core count until the controller saturates.

Try:

--threads=12

✔ Or Use a Dedicated Bandwidth Tool

Better tools for DRAM bandwidth:

stream benchmark (classic memory bandwidth test)
mbw
lmbench
stress-ng --stream

STREAM on a Ryzen 5900X with DDR4-3600 typically reports:

~45–55 GB/s real sustained bandwidth

Which aligns with theory.

Unit Subtlety (Minor Detail)

You’re seeing:

104571.75 MiB/sec

Convert properly:

104,571.75 MiB/s × 1.048576 ≈ 109,651 MB/s (about 110 GB/s, decimal)

Even then:

109.6 GB/s > 57.6 GB/s

So units are not the main cause — cache effects are.

Final Answer

Your benchmark result is higher than theoretical DRAM bandwidth because:

You are mostly measuring L3 cache bandwidth, not main memory bandwidth.

Your working set (32 MB total) fits entirely in one Zen 3 CCD's L3 cache, which can deliver >100 GB/s aggregate bandwidth across multiple cores.

If you increase the memory block size well beyond 64 MB, your measured bandwidth will drop into the expected ~50–55 GB/s range.

If you'd like, I can also explain how Zen 3’s dual-CCD design affects memory scaling and why cross-CCD tests behave differently — it's quite interesting.
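As a rough way to sanity-check the "does the working set fit in L3" argument from the pasted explanation, here is a small sketch. Everything in it is a heuristic: the function names are hypothetical, the cache sizes are the ones quoted above, and "fits in L3" only suggests a cache-bound result, it doesn't prove one. Also note the sysbench logs show `scope: global`, which I take to mean the threads share one block; that's an assumption on my part, hence the `shared_block` flag.

```python
def working_set_mib(threads, block_size_mib, shared_block=False):
    """Approximate working set in MiB.

    shared_block=False: every thread touches its own block (the assumption
                        made in the pasted explanation).
    shared_block=True:  all threads touch one common block.
    """
    return block_size_mib if shared_block else threads * block_size_mib


def likely_cache_bound(threads, block_size_mib, l3_mib, shared_block=False):
    """True if the working set is no larger than the given L3 size."""
    return working_set_mib(threads, block_size_mib, shared_block) <= l3_mib


if __name__ == "__main__":
    # (threads, block size in MiB) for runs discussed in this thread.
    cases = [(4, 1), (4, 8), (12, 1024)]
    for threads, block in cases:
        for l3 in (32, 64):  # per-CCD and total L3 quoted for the 5900X
            verdict = "cache" if likely_cache_bound(threads, block, l3) else "DRAM"
            print(f"{threads} threads x {block} MiB vs {l3} MiB L3: likely {verdict}")
```

If the block really is shared between threads, the working set is just the block size, which only makes small-block runs more likely to stay in cache.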

apoyusiken · February 28

3 hours ago, Eigenvektor said:
I've got a 5900x paired with DDR4 3600 MT/s (PC4-28800). In dual channel this gives me a theoretical throughput of 57.6 GB/s, or about 53.6 GiB/s.

When I run sysbench with the same settings you did, I actually get 86638.97 MiB/sec. When I increase the block size to 8M, it actually increases to 104571.75 MiB/sec! Way more than expected.

Yeah, the CPU is a hero. Can you try mbw with a single thread too?

$ sysbench memory --threads=4 --memory-block-size=1G --memory-total-size=20G run                                                              
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 1048576KiB
  total size: 20480MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 20 (   12.95 per second)

20480.00 MiB transferred (13261.98 MiB/sec)


General statistics:
    total time:                          1.5409s
    total number of events:              20

Latency (ms):
         min:                                  165.46
         avg:                                  293.82
         max:                                  390.67
         95th percentile:                      383.33
         sum:                                 5876.39

Threads fairness:
    events (avg/stddev):           5.0000/0.00
    execution time (avg/stddev):   1.4691/0.06

So the RAM is too good for the CPU?

Edit:

$ # 1. Download the source
wget https://www.cs.virginia.edu/stream/FTP/Code/stream.c

# 2. Compile with optimizations and multi-threading (OpenMP)
gcc -O3 -fopenmp -DSTREAM_ARRAY_SIZE=100000000 stream.c -o stream

# 3. Run it using all 4 threads of your Ryzen 2200G
export OMP_NUM_THREADS=4
./stream
--2026-02-28 13:42:40-- https://www.cs.virginia.edu/stream/FTP/Code/stream.c
Resolving www.cs.virginia.edu (www.cs.virginia.edu)... 128.143.67.8
Connecting to www.cs.virginia.edu (www.cs.virginia.edu)|128.143.67.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19967 (19K) [text/x-csrc]
Saving to: ‘stream.c’

stream.c 100%[==================================>] 19.50K 104KB/s in 0.2s

2026-02-28 13:42:42 (104 KB/s) - ‘stream.c’ saved [19967/19967]

-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 2 microseconds.
Each test below will take on the order of 59507 microseconds.
(= 29753 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           32260.8     0.051211     0.049596     0.056252
Scale:          20122.2     0.084704     0.079514     0.095905
Add:            23052.3     0.111081     0.104111     0.127775
Triad:          23743.5     0.105728     0.101080     0.117723
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

Gemini suggested this and said 30-36 GiB/s is expected, and that real-world speed is 75-80% of the theoretical max.
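To put the STREAM numbers next to the theoretical peak, a quick sketch like this works. It assumes the 48 GB/s dual-channel DDR4-3000 figure from earlier in the thread and that STREAM's "MB/s" column is decimal (10^6 bytes per second); the names `STREAM_MB_S`, `PEAK_GB_S` and `efficiency` are just for illustration.

```python
# Best rates (MB/s) copied from the STREAM output above.
STREAM_MB_S = {
    "Copy": 32260.8,
    "Scale": 20122.2,
    "Add": 23052.3,
    "Triad": 23743.5,
}

# Theoretical dual-channel DDR4-3000 peak in decimal GB/s (assumed).
PEAK_GB_S = 48.0


def efficiency(measured_mb_s, peak_gb_s):
    """Fraction of the theoretical peak that a measured rate reaches."""
    return measured_mb_s / 1000 / peak_gb_s


if __name__ == "__main__":
    for kernel, rate in STREAM_MB_S.items():
        share = efficiency(rate, PEAK_GB_S)
        print(f"{kernel:<6} {rate / 1000:6.2f} GB/s  {share:6.1%} of peak")
```

Which kernel to treat as "the" bandwidth figure is a judgment call; Triad is the one most often quoted.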

Eigenvektor · March 1

19 hours ago, apoyusiken said:

Yeah, the CPU is a hero. Can you try mbw with a single thread too?

Sure

sysbench memory --threads=1 --memory-block-size=1M --memory-total-size=20G run
sysbench memory --threads=1 --memory-block-size=8M --memory-total-size=20G run
sysbench memory --threads=1 --memory-block-size=1G --memory-total-size=20G run

1M > 20480.00 MiB transferred (41047.70 MiB/sec)
8M > 20480.00 MiB transferred (40541.64 MiB/sec)
1G > 20480.00 MiB transferred (17690.34 MiB/sec)

mbw -t0 1024

AVG Method: MEMCPY Elapsed: 0.05031 MiB: 1024.00000 Copy: 20354.575 MiB/s

So yeah, single threaded seems limited by the CPU's speed
