
Do Dual CPU Sockets Matter in 2018?

Xeon Scalable is Intel's "true" high-end workstation CPU line, with multi-CPU support baked in. But that comes at a cost...

 

 

Buy a Xeon Scalable CPU:
On Amazon: http://geni.us/DOS2Z
On Newegg: http://geni.us/fXrZ85

 

Buy an ASUS WS C621E SAGE Motherboard: 
On Newegg: http://geni.us/TeADQcm

Emily @ LINUS MEDIA GROUP                                  

congratulations on breaking absolutely zero stereotypes - @cs_deathmatch


No, I don't think so. Also, do you guys have an update on the whole iMac thing? And yes, I did just summon @DrMacintosh

10 minutes ago, GabenJr said:

SNIP-

 


“We’ve just been pinged sir!”


Laptop: 2019 16" MacBook Pro i7, 512GB, 5300M 4GB, 16GB DDR4 | Phone: iPhone 13 Pro Max 128GB | Wearables: Apple Watch SE | Car: 2007 Ford Taurus SE | CPU: R7 5700X | Mobo: ASRock B450M Pro4 | RAM: 32GB 3200 | GPU: ASRock RX 5700 8GB | Case: Apple PowerMac G5 | OS: Win 11 | Storage: 1TB Crucial P3 NVME SSD, 1TB PNY CS900, & 4TB WD Blue HDD | PSU: Be Quiet! Pure Power 11 600W | Display: LG 27GL83A-B 1440p @ 144Hz, Dell S2719DGF 1440p @144Hz | Cooling: Wraith Prism | Keyboard: G610 Orion Cherry MX Brown | Mouse: G305 | Audio: Audio Technica ATH-M50X & Blue Snowball | Server: 2018 Core i3 Mac mini, 128GB SSD, Intel UHD 630, 16GB DDR4 | Storage: OWC Mercury Elite Pro Quad (6TB WD Blue HDD, 12TB Seagate Barracuda, 1TB Crucial SSD, 2TB Seagate Barracuda HDD)

Please........ the clickbait is killing me.

Judge a product on its own merits AND the company that made it.

How to setup MSI Afterburner OSD | How to make your AMD Radeon GPU more efficient with Radeon Chill | (Probably) Why LMG Merch shipping to the EU is expensive

Oneplus 6 (Early 2023 to present) | HP Envy 15" x360 R7 5700U (Mid 2021 to present) | Steam Deck (Late 2022 to present)

 

Mid 2023 AlTech Desktop Refresh - AMD R7 5800X (Mid 2023), XFX Radeon RX 6700XT MBA (Mid 2021), MSI X370 Gaming Pro Carbon (Early 2018), 32GB DDR4-3200 (16GB x2) (Mid 2022)

Noctua NH-D15 (Early 2021), Corsair MP510 1.92TB NVMe SSD (Mid 2020), beQuiet Pure Wings 2 140mm x2 & 120mm x1 (Mid 2023),


You're correct that it matters more in data centers, clusters, and supercomputers, where it pays to maximize core density per node. But presuming virtualization is a bit simplistic. More and more, data centers are moving away from virtualization and toward containers. Docker is the name typically thrown around, but there are entire operating systems built around containers. Apache Mesos comes to mind - though it's not actually an operating system, but a means of running containers across nodes in a Mesos cluster.

 

And in a lot of cases you have, similar to a supercomputer, a master node that spawns a large number of containers across a cluster, which are much easier to deploy en masse compared to an application. The more cores you have, the more containers you can run, and the faster you get your result. Combine containers with MPI to create a dynamic supercomputing cluster.

Containers are supposed to be small in size and limited in scope, allowing you to spawn a lot of them on a dual or quad-socket machine without running hard into NUMA limitations.
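
To make that concrete, here's a minimal sketch of the pattern using the Docker SDK for Python - the image, workload, and per-core pinning are illustrative assumptions, not anything from the video:

```python
import os
import docker  # pip install docker

client = docker.from_env()

# Spawn one small, limited-scope container per core - the same pattern
# scales naturally to a dual- or quad-socket box with lots of cores.
containers = [
    client.containers.run(
        "alpine",                       # placeholder image
        "echo hello from a container",  # placeholder workload
        cpuset_cpus=str(core),          # pin each container to one core
        detach=True,
    )
    for core in range(os.cpu_count())
]
```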

 

So do multi-socket systems still matter in 2018? Maybe not for you, but they absolutely still matter.

Wife's build: Amethyst - Ryzen 9 3900X, 32GB G.Skill Ripjaws V DDR4-3200, ASUS Prime X570-P, EVGA RTX 3080 FTW3 12GB, Corsair Obsidian 750D, Corsair RM1000 (yellow label)

My build: Mira - Ryzen 7 3700X, 32GB EVGA DDR4-3200, ASUS Prime X470-PRO, EVGA RTX 3070 XC3, beQuiet Dark Base 900, EVGA 1000 G6


Author of y-cruncher here. Been a long-time viewer of LTT videos!

 

The reason the scaling from the 18-core to the dual-Platinum is so bad in the multi-threaded test is probably that the computation is too small.

 

A 1 second computation has too little work to be effectively parallelized. On top of that, the overhead of spinning up and synchronizing that many threads is significant.
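
You can see this effect in a toy model. A minimal Python sketch (nothing to do with y-cruncher's actual threading model, just an illustration): split a fixed amount of work across more and more processes, and watch spin-up/synchronization overhead eat the gains once the per-worker chunk gets tiny:

```python
import time
from concurrent.futures import ProcessPoolExecutor

def burn(n):
    # A tiny CPU-bound task standing in for a slice of the computation.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    total_work = 20_000_000  # roughly a ~1 second job on one core
    for workers in (1, 4, 16, 64):
        start = time.perf_counter()
        with ProcessPoolExecutor(max_workers=workers) as pool:
            # Fixed total work, split into one chunk per worker.
            list(pool.map(burn, [total_work // workers] * workers))
        print(f"{workers:3d} workers: {time.perf_counter() - start:.2f}s")
```

Past the physical core count, the spawn and synchronization cost dominates and adding workers makes the run slower, not faster.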

 

Try a computation of 1 billion or 10 billion digits and you should see a larger difference. This applies to most of the other hardware reviews as well.

 

I'd say any computation that takes less than 30 seconds is too small to fully utilize the system - regardless of the # of cores in the system.

 

Also, the y-cruncher benchmark is memory-bound on Skylake X. So bandwidth is a big deal. I'd expect the Platinums to benefit from having 3x the memory channels.
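
For rough numbers (my back-of-envelope arithmetic, assuming DDR4-2666 and 64-bit channels - actual sustained bandwidth will be lower):

```python
# Peak-bandwidth estimate: MT/s x 8 bytes per 64-bit channel.
per_channel = 2666e6 * 8                                    # ~21.3 GB/s
print(f"Skylake X (4 channels):    {4 * per_channel / 1e9:.0f} GB/s")
print(f"2x Platinum (12 channels): {12 * per_channel / 1e9:.0f} GB/s")
```

That's roughly 85 GB/s vs. 256 GB/s of peak bandwidth for a memory-bound workload.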

 

 

 


4 hours ago, savagepain said:

No, I don't think so. Also, do you guys have an update on the whole iMac thing? And yes, I did just summon @DrMacintosh

Check my profile/specs. I did it for a reason ;) More cores can definitely aid with tasks like media encoding/editing, live streaming (while performing one or more other tasks), software development (especially when building large projects), running multiple OSes simultaneously under a hypervisor/running VMs in general, protein folding, and more. I've actually talked about the first topic on YouTube, so I won't reiterate. @FluidParadigms, take it away...


I've been watching LTT videos for years now, and there are two workloads that I don't think I've ever seen you guys test: 1) software development (i.e. compiling large projects) and 2) high-end digital audio workstations (DAWs).

 

This video in particular really hit home, since I've been wondering how software development workloads scale when it comes to cores vs. clock speed. I use multiple systems here at work (Windows and Linux) on several flavors of Xeons, and I've seen some interesting results... like a dual-socket E5-2687W (Sandy Bridge EP) machine outperforming a dual-socket E5-2683 v3 (Haswell) machine.

 

The DAW use case could also have some interesting trade-offs. In theory, higher clock speeds could mean lower processing latencies and fewer audio artifacts, but higher core counts could let you run more plugins and process more tracks simultaneously.
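
For concrete numbers on the latency side (my figures - just the standard buffer-size arithmetic, not anything from the video):

```python
# Buffer deadline = buffer size / sample rate. Every plugin in the chain
# must finish within this window, or you get dropouts/artifacts.
for buffer in (64, 128, 256, 512):
    print(f"{buffer:4d} samples @ 48 kHz -> {buffer / 48000 * 1e3:.1f} ms deadline")
```

Higher clocks help you meet that per-buffer deadline; more cores help when the DAW can process independent tracks in parallel.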

 

I know I can't be the only PC enthusiast out there who writes code and/or does audio engineering/production...


I think the answer is that they could matter if the price was right.

 

Why focus on these high-end systems that seem to be out of reach? I mean, you're talking to an audience that either games or probably wants to do this much cheaper.

 

I think a better idea for a video would be to explore server refurb sites like ServerMonkey and come up with a cost-effective server for students/hobbyists.

 

Hell, it might even be beneficial to take a poll and just outright ask what people are doing / would be interested in doing.

 

I.e., do you need an application server, a web host, a storage server, a VM host, a rendering server, or just a platform to learn how to manage servers?

 

At the very least, if you decide it's not viable, you could always compare those machines to just using an old desktop for server tasks.


I wish there was a write-up or video on how they set up the multi-machine VMs on Unraid.


5 hours ago, FluidParadigms said:

This video in particular really hit home, since I've been wondering how software development workloads scale when it comes to cores vs. clock speed. I use multiple systems here at work (Windows and Linux) on several flavors of Xeons, and I've seen some interesting results... like a dual-socket E5-2687W (Sandy Bridge EP) machine outperforming a dual-socket E5-2683 v3 (Haswell) machine.

 

As a professional software engineer, I can answer this question based on my observations at work. And cores are definitely your friend on this more than clock speed - with certain considerations, of course.

 

When working with an IDE, more cores will help, but RAM is your bigger friend. IDEs today have a LOT of features - such as autocompletion - that I really wish existed 20 years ago when I first started writing code. But those features use a lot of RAM. And then there's the partial compile Visual Studio does in the background to let you see potentially problematic code near-instantly, before you try to compile it.

 

But if someone were to approach me about wanting to build a computer on which they intend to try software development, I'd steer them toward RAM and storage before discussing core count. That said, I'd make sure they got at least a quad core as well (not a dual-core with HT).

 

Now with building projects, the answer will really depend on the project you're attempting to compile and whether it's capable of a multi-processor build, preferably without having to write scripts to make it happen. With Visual Studio, it absolutely is - I believe since 2010. With multiprocessor compile enabled in the project (along with parallel builds allowed in VSTS), the build easily saturates all 4 cores on our build agent, completing in about 20 minutes from clean. But if I were building a machine to be a dedicated build agent, I'd again push for RAM, but this time I'd also push for an SSD before pushing for more cores.
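
If it helps to picture what a multi-processor build is doing, here's a rough Python sketch of the fan-out - a toy model of what `make -j` or MSBuild's parallel build does, with a hypothetical source layout and placeholder compiler invocation:

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SOURCES = sorted(Path("src").rglob("*.cpp"))  # hypothetical project layout

def compile_one(src: Path) -> int:
    # Each translation unit compiles independently, so they parallelize
    # cleanly; swap in cl.exe /c for a Visual Studio toolchain.
    return subprocess.call(
        ["g++", "-c", str(src), "-o", str(src.with_suffix(".o"))]
    )

# One compile job per core; threads are fine here since each one just
# waits on an external compiler process.
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    results = list(pool.map(compile_one, SOURCES))

print(f"{results.count(0)}/{len(SOURCES)} translation units built")
```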

 

Eventually storage becomes the bottleneck as you add cores. Even an NVMe SSD will throttle a build beyond a certain point due to how many processes will be trying to simultaneously read and write files to the drive. It won't throttle you nearly as much as an HDD would, but you'll still reach a ceiling. It'll likely just take a lot of cores to get there.

 

I'd love to see how well both systems featured in the video would chew up and spit out the projects I build at work. I'm sure they wouldn't have any difficulty, though the SSD storage would probably start sweating long before the CPUs came close.

Wife's build: Amethyst - Ryzen 9 3900X, 32GB G.Skill Ripjaws V DDR4-3200, ASUS Prime X570-P, EVGA RTX 3080 FTW3 12GB, Corsair Obsidian 750D, Corsair RM1000 (yellow label)

My build: Mira - Ryzen 7 3700X, 32GB EVGA DDR4-3200, ASUS Prime X470-PRO, EVGA RTX 3070 XC3, beQuiet Dark Base 900, EVGA 1000 G6


More cores per socket, more sockets per rack space = more money saved, in that sense.

Please quote me or tag me if you're trying to talk to me; I might see it through all my other notifications ^_^

The current list of dead cards is as follows: 2 EVGA GTX 980 Ti ACX 2.0, 1 EVGA GTX 980 ACX 2.0 (1600MHz core, 2100MHz RAM golden chip card... failed hardcore), 1 290X that caught fire, 1 HD 7950.

May you all rest in peace in the giant PC in the sky.


At 5:30 of the video, NUMA...

I remember my first desktop was a dual AMD Athlon MP 2000+ setup... Palomino cores. Ran SETI@home on them 24/7 with SETI Driver caching the work units, since I just had dialup. This was in the early 2000s while I was in high school. I still thought my computer was way too darn slow because of hard drives. A Maxtor 300GB 7200RPM upgrade in RAID 0 wasn't too bad.

I think the Pentium D noticeably caused global warming to accelerate. 


15 hours ago, Mysticial said:

Try a computation of 1 billion or 10 billion digits and you should see a larger difference. This applies to most of the other hardware reviews as well.

I take it that's why HWBOT only has submission options for 25m, 1b and 10b right?

Our Grace. The Feathered One. He shows us the way. His bob is majestic and shows us the path. Follow unto his guidance and His example. He knows the one true path. Our Saviour. Our Grace. Our Father Birb has taught us with His humble heart and gentle wing the way of the bob. Let us show Him our reverence and follow in His example. The True Path of the Feathered One. ~ Dimboble-dubabob III


15 hours ago, Mysticial said:

Author of y-cruncher here. Been a long time viewer of LTT videos!

Hi *waves*

3 minutes ago, DildorTheDecent said:

I take it that's why HWBOT only has submission options for 25m, 1b and 10b right?

I can't remember the exact requirements, but 10b needs a ton of RAM to do well. I've only run it on a system with 64GB, and 32GB isn't enough if you want to avoid hitting the disk.

 

I currently hold the HWBOT WRs for 6 cores - not because I'm particularly good, but simply because no one else has bothered running it with AVX-512 before or since... consider that a challenge to join in the fun, if anyone else has one.

 

I have an older dual Xeon with 64GB of RAM that's unstable. If I get it working again, I'd like to try that too...

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


I've worked at a multi-customer data center for two years, and I can't see any decrease in dual-socket systems.

Quad-socket systems are decreasing because dual-socket nodes offer better density, especially if customers can live with 1.5TB of RAM or less.

Single-CPU systems are also getting rarer, because most customers want as much computing power in their racks as possible.

But I can see an increase in single-CPU systems for "non-standard compute workloads" - blackbox firewalls like FortiGates, VPN gateways like Junos Pulse gateways, or other networking- or transfer-intensive loads - mostly because of the need to get more efficient.


3 hours ago, DildorTheDecent said:

I take it that's why HWBOT only has submission options for 25m, 1b and 10b right?

Correct. The 25m is just a "does the benchmark work, and can you submit?" check.

1b and 10b are the "real" benchmarks. But even 1b is becoming too small, as those runs are going under 20 seconds on the bigger Skylake Server machines.

 

3 hours ago, porina said:

Hi *waves*

I can't remember the exact requirements, but 10b needs a ton of RAM to do well. I've only run it on a system with 64GB, and 32GB isn't enough if you want to avoid hitting the disk.

*waves*

 

It's on the order of 45GB, but will vary a bit depending on which binary you're running and how many cores there are.

 

Another reason why large computations scale better is that they use a lot more memory. To oversimplify things a bit: because there is so much more data, the probability that the two sockets "fight" over the same data at any given time is much smaller.

 

So Linus' explanation for why y-cruncher scales poorly on NUMA is correct. But it's not the dominant factor for a computation that takes only 1 second. Recent versions of the program (>= v0.7.3) are NUMA-aware and will (theoretically) scale onto 2 or 4 sockets - if the computation is large enough. (It does on my 4-socket Barcelona Opteron.)

 

It'll be far from perfect linear scaling, but there should still be a noticeable speedup. But once you go beyond that, to systems with multiple physical motherboards, it all goes downhill. I've had someone try this on a 32-socket/8-motherboard Broadwell system with 576 cores/1152 hyperthreads. It was hilariously bad.
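
For anyone who wants to poke at NUMA effects themselves, here's a minimal Linux-only sketch of pinning a process to one socket before a run, using os.sched_setaffinity; the core IDs are assumptions, so check lscpu for your machine's actual layout:

```python
import os

# Restrict this process (pid 0 = self) to one socket's cores. With
# Linux's first-touch policy, its memory then stays on that socket's
# local NUMA node, avoiding cross-socket traffic.
SOCKET0_CORES = set(range(0, 18))  # hypothetical: cores 0-17 = socket 0

os.sched_setaffinity(0, SOCKET0_CORES)
print("running on cores:", sorted(os.sched_getaffinity(0)))
```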

 

3 hours ago, porina said:

I currently hold the HWBOT WRs for 6 cores - not because I'm particularly good, but simply because no one else has bothered running it with AVX-512 before or since... consider that a challenge to join in the fun, if anyone else has one.

 

I have an older dual Xeon with 64GB of RAM that's unstable. If I get it working again, I'd like to try that too...

 

Yeah, AVX-512 complicates things a lot more.

  • It destabilizes everyone's overclocks since nobody stress-tests for it.
  • It's why AMD is getting killed so badly in this benchmark.
  • It makes Skylake X so fast that the bottleneck is memory bandwidth.

This last reason is why a lot of the reviews show little difference between the 7960X and the 7980XE, with little to gain from a CPU overclock. Unless the memory is running at like 4500 MT/s or something, the cores are just gonna be sitting there waiting on memory for much of the computation.

 


19 hours ago, firelighter487 said:

I use a dual-socket machine as a main PC... now yeah, it's old, but it's still quite nice.

Yay, another dual x58 person!

⬇ - PC specs down below - ⬇

 

The Impossibox

CPU: (x2) Xeon X5690 12c/24t (6c/12t per cpu)

Motherboard: EVGA Super Record 2 (SR-2)

RAM: 48GB (12x4GB) server DDR3 ECC

GPU: MSI GTX 1060 Gaming X 6GB

Case: Modded Lian-LI PC-08

Storage: Samsung 850 EVO 500GB and a 2TB HDD

PSU: 1000W something or other I forget

Display(s): 24" Acer G246HL

Cooling: (x2) Corsair H100i v2

Keyboard: Corsair Gaming K70 LUX RGB MX Browns

Mouse: Logitech G600

Headphones: Sennheiser HD558

Operating System: Windows 10 Pro

 

Folding info so I don't lose it: 

WhisperingKnickers

 

Join us on the x58 page it is awesome!

x58 Fan Page

 


1 hour ago, Mysticial said:

Yeah, AVX-512 complicates things a lot more.

  • It destabilizes everyone's overclocks since nobody stress-tests for it.

I managed 25m at 4.5 GHz, but had to drop to 4.3 GHz for 1b and 10b. When the benchmark run takes under a second, you can get away with instability. Not so when runs are longer. I can bench non-AVX code at 4.9, but forget what I got for AVX2.

1 hour ago, Mysticial said:
  • It's why AMD is getting killed so badly in this benchmark.

It's their design choice. Helps in some ways, not in others.

1 hour ago, Mysticial said:
  • It makes Skylake X so fast that the bottleneck is memory bandwidth.

This last reason is why a lot of the reviews show little difference between the 7960X and the 7980XE, with little to gain from a CPU overclock. Unless the memory is running at like 4500 MT/s or something, the cores are just gonna be sitting there waiting on memory for much of the computation.

I frequent other prime number forums under a different fishy name... :) RAM bandwidth is too often a limiting factor... even if it doesn't help at bigger FFT sizes, I hope to see AVX-512 support in gwnum, if only for small tests that fit in cache.

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible

