
I just finished a dual-socket Intel build and haven't been getting the performance I wanted out of my FEA workload. I ran VTune and it looks like memory latency is a factor in my performance issues. Would installing DDR5 help?

My line of thinking is that DDR5 might have lower latency than the Intel CPUs' memory?

 

pics of the build to garner interest:

IMG_20250116_173349s.thumb.jpg.eeb4fa0a3399ad0a6f374f7b33261ed0.jpg

 

IMG_20250116_173548s.thumb.jpg.86d6c77d1d74a5b8e7a1d9650f22dcfa.jpg


9 minutes ago, twin_savage said:

I just finished a dual-socket Intel build and haven't been getting the performance I wanted out of my FEA workload. I ran VTune and it looks like memory latency is a factor in my performance issues. Would installing DDR5 help?

My line of thinking is that DDR5 might have lower latency than the Intel CPUs' memory?

 


DDR5 as opposed to what? What ram is it currently running? Are you willing to swap motherboard and cpu to facilitate the ram change?


8 minutes ago, Blue4130 said:

DDR5 as opposed to what? What ram is it currently running? Are you willing to swap motherboard and cpu to facilitate the ram change?

I was under the impression that Xeon scalables came with some (slow) integrated memory but Google seems to be clueless. Not sure what OP really means, after all.

 

Apparently some Xeon scalables have integrated HBM2e.

 

Edited by thekingofmonks

Ryzen 7 5700X3D (-30 CO all-core) w/ cheap 6-pipe cooler - Gigabyte AX370-Gaming 5 - LPX 2x16GB 3600C18 2R - EVGA RTX 3070 8G XC3 PX1 - Patriot VPN110 1TB -  MSI A750GL - and a dogshit Sharkoon ATX case

Asus ROG G531GT : i7-9750H (-200 Vcore) - GTX 1650M +700mem - Samsung 16+8GB 2666 - 1920x1080@145Hz (172Hz) IPS panel

 

i5-6400 @4.3GHz (160 bclk) w/ Assassin King 120SE - Z170M-Plus - G.Skill 2x8GB 3200 C16 - Biostar RX 570 8G w/ MSI Armor cooler

 

i5-4690K + Z97-AR + Panram Blue Lightsaber 2x4GB PC3-2800

iMac 21.5" (late 2011) : i5-2400S, HD 6750M 512MB - Samsung 4x4GB PC3-1333 - WT200 512G SSD (High Sierra) - 1920x1080@60 LCD

Acer Z5610 "Theatre" C2 Quad Q9550 - 2x2GB PC3-1333 (Samsung) - 1920x1080@60Hz Touch LCD - great internal speakers


2 minutes ago, chomi said:

What cpus are you currently running? If you want performance I'm thinking something like the 9950x3d or a thread ripper would outperform your current setup.

Emerald Rapids go Brrrrrrrrrr
 

3 minutes ago, chomi said:

How in the world are you going to upgrade to DDR5. I don't think there are any motherboards that are cross compatible between ram generations.

They exist for Skylake, but they are uncommon, and also why would you do that?


6 minutes ago, Blue4130 said:

DDR5 as opposed to what? What ram is it currently running? Are you willing to swap motherboard and cpu to facilitate the ram change?

Whoops, I should have mentioned it's currently running off the 64GB of built-in HBM on each processor. These are Xeon Max 9480s:

 

image.thumb.png.a5d1adab8dc1ca55b5c528be995aa28f.png

 


7 minutes ago, chomi said:

What cpus are you currently running? If you want performance I'm thinking something like the 9950x3d or a thread ripper would outperform your current setup.

Probably not for the type of workload OP is doing (FEA)


4 minutes ago, twin_savage said:

Whoops, I should have mentioned it's currently running off the 64GB of built-in HBM on each processor. These are Xeon Max 9480s:

 

Oh, Sapphire Rapids, not Emerald. What a wild build. Two $13K CPUs... I feel like this is the wrong forum. This seems like a Level1Techs forum question.


31 minutes ago, twin_savage said:

My line of thinking is that DDR5 might have lower latency than the Intel CPUs' memory?

I'm pretty sure HBM is better than DDR5, it's integrated memory after all.

 

I would assume that, just like with any other type of workload, more RAM is only needed if you're running out of it; otherwise it could even mildly slow things down.

Found a Reddit post:

 


10 minutes ago, starsmine said:

Oh, Sapphire Rapids, not Emerald. What a wild build. Two 13K CPUs... I feel like this is the wrong forum. 

Not long ago there was a large dump of 9480s, which I gather were spares for the Aurora supercomputer, that hit the market at less than $1k each. I suspect these will be showing up in more builds over time.

Scalpers have bought up a lot of the stock and are charging a 150-250% premium over the "normal" price.

 

 


2 minutes ago, thekingofmonks said:

I'm pretty sure HBM is better than DDR5, it's integrated memory after all.

It certainly has high bandwidth but a lot of the benchmark tools are showing it at super high latency:

image.thumb.png.bdb84f8dbc2cb3226677b2b19b7267a1.png
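
For anyone wondering why latency matters when the bandwidth is this high: by Little's law, the amount of data that has to be in flight to sustain a given bandwidth is bandwidth times latency. A rough Python sketch of that arithmetic follows. The 64-byte cache line and the example numbers are assumptions for illustration only, not measurements from this system.

```python
def lines_in_flight(bandwidth_gb_s: float, latency_ns: float, line_bytes: int = 64) -> float:
    """Cache lines that must be outstanding to sustain a bandwidth (Little's law)."""
    # GB/s * ns = bytes, because the 1e9 and 1e-9 factors cancel.
    bytes_in_flight = bandwidth_gb_s * latency_ns
    return bytes_in_flight / line_bytes


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show how the requirement scales.
    for latency_ns in (100.0, 200.0):
        n = lines_in_flight(bandwidth_gb_s=1000.0, latency_ns=latency_ns)
        print(f"{latency_ns:.0f} ns -> {n:.1f} cache lines in flight")
```

Doubling the latency doubles the number of requests that have to be in flight, so a solver that can't keep enough independent loads going won't get near the HBM's bandwidth.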


1 minute ago, twin_savage said:

It certainly has high bandwidth but a lot of the benchmark tools are showing it at super high latency:

Cool, I didn't know it was this bad. This somewhat explains why it was used on consumer GPUs.

 

How large is this thing you're working on, though?

 

If it really is a latency issue, you can get DDR5 and, in flat mode, have software select whether to use the HBM or the DDR5 pool.

From the video:

image.thumb.png.6949e11f98a7237e3f88e445fecb54a7.png


3 minutes ago, thekingofmonks said:

Cool, I didn't know it was this bad. This somewhat explains why it was used on consumer GPUs.

The memory is basically 128 channels of DDR4-3200 between the two CPUs (HBM2e is closer to DDR4 than to DDR5).
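
As a rough sanity check on that comparison, the peak-bandwidth arithmetic can be sketched in Python. It assumes each of the 128 channels is 64 bits wide like a DDR4 channel, which is an assumption and not something stated in the post, and it gives a theoretical ceiling, not a measured figure.

```python
def peak_bandwidth_gb_s(channels: int, mt_per_s: float, bus_bits: int = 64) -> float:
    """Theoretical peak in GB/s: channels * transfers per second * bytes per transfer."""
    bytes_per_transfer = bus_bits / 8
    return channels * mt_per_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # 128 channels at 3200 MT/s, assumed 64 bits wide each.
    total = peak_bandwidth_gb_s(channels=128, mt_per_s=3200)
    print(f"both sockets: {total:.1f} GB/s, per socket: {total / 2:.1f} GB/s")
```

Under those assumptions a single channel tops out at 25.6 GB/s, so the total only looks huge because there are so many channels in parallel.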

 

6 minutes ago, thekingofmonks said:

How large is this thing you're working on, though?

I've only been running the CFD and CFD-EM benchmarks off of the L1Techs site so far, which have roughly 10GB and 60GB memory footprints respectively. These are probably kind of small to take advantage of the parallelism, but I still assumed it'd beat dual Epycs by several times; instead it barely beats dual 9374Fs in the ranking.

My real workload is going to be ~120GB, so maybe it'll see more of a speedup due to more parallelism in larger workloads.

 

10 minutes ago, thekingofmonks said:

If it really is a latency issue, you can get DDR5 and, in flat mode, have software select whether to use the HBM or the DDR5 pool.

This is what I'm interested in, but I can't bring myself to spend so much on memory and maybe not see a performance improvement.

Also, I'm not exactly sure how my application will handle NUMA nodes without any processors attached to them, which is what flat mode creates.
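
One way to see what flat mode actually exposes is to list the NUMA nodes and check which ones have no CPUs. The sketch below assumes Linux (the thread doesn't say which OS is in use) and reads the standard sysfs `cpulist` files; the function names are made up for this example.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a sysfs cpulist string such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means the node has no CPUs
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def numa_nodes(root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map each NUMA node id to its CPU ids, read from Linux sysfs."""
    nodes: dict[int, list[int]] = {}
    for entry in Path(root).glob("node[0-9]*"):
        cpulist = entry / "cpulist"
        if cpulist.is_file():
            nodes[int(entry.name[4:])] = parse_cpulist(cpulist.read_text())
    return nodes


if __name__ == "__main__":
    for node, cpus in sorted(numa_nodes().items()):
        kind = "memory-only" if not cpus else f"{len(cpus)} CPUs"
        print(f"node {node}: {kind}")
```

Nodes reported as memory-only would be the CPU-less ones. Tools such as `numactl --membind` can then be used to test how an application behaves when its allocations are pinned to a chosen node.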


Were temps, power and clock speed normal during the runs?


3 minutes ago, thekingofmonks said:

Were temps, power and clock speed normal during the runs?

I was super paranoid about cooling because apparently the HBM is very sensitive to temperature, so I went with dual 360 rads (the big ones that take two 180mm fans each); I can keep the CPUs below 60C while running full blast.

I've got the software set up so that I'm only using 64 cores at a time (this gives me the best performance), so the clock speeds fluctuate between 3.2GHz and 3.5GHz depending on how many threads the workload hits in the moment. The power usage is very spiky, ranging between 600 and 1100 watts throughout a run.


Good news: apparently my performance problems were rooted in the math kernel library I was using. Up until recently I had been using Intel's oneAPI 2022.2.1; I switched to oneAPI 2024.0.0 last night and my performance doubled on the small-ish 60GB benchmark, which now puts the system at roughly twice as fast as the fastest dual-socket EPYC system.

I suppose this makes sense, since Sapphire Rapids came out in 2023, which postdates the library I was initially using.

...I'm kind of curious to see how the performance would be if I switched to AMD's math library on this system.

 

So apparently software needs to be rewritten in order to take advantage of HBM, which makes me nervous for AMD's HBM product, since they always seem to be behind in software; Intel's dedicated software developer headcount is not far off from AMD's total employee count.
