Cray's new supercomputer Shasta powered by EPYC and Volta-Next

cj09beira

We got quite a lot of info today on this new supercomputer. As we already knew, it will be powered by EPYC, using the Milan architecture, and it will also use Nvidia's Volta-Next for GPU acceleration.

Quote

AMD recently announced its EPYC Rome processors, the first 7nm data center chips on the market, but the company is already moving forward with its next-generation products. Here at Supercomputer 2018, the US Department of Energy (DOE) announced that its Perlmutter supercomputer would come armed with AMD's unreleased EPYC Milan processors. The new supercomputer will also use Nvidia's "Volta-Next" GPUs, with the two combining to make an exascale-class machine that will be one of the fastest supercomputers in the world.

The Perlmutter supercomputer will be built using Cray's Shasta supercomputer platform, which was also on display here at the show. The supercomputer will be built with a mixture of both CPU and GPU nodes, with the CPU node pictured above. This watercooled chassis houses eight AMD Milan CPUs. We see four copper waterblocks that cover the Milan processors, while four more processors are mounted inverted on the PCBs between the DIMM slots. This system is designed for the ultimate in performance density, so all the DIMM sticks are also watercooled.

So it seems the supercomputer will be made of CPU and GPU nodes, all watercooled.

 

The CPU node will be made of 8 sockets, each with 8 DIMMs, which will eventually be populated with Milan CPUs. Just for reference, using Rome CPUs we would get 512 cores and 1,024 threads with up to 16 TB of RAM.

[Photo: Shasta CPU node]
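As a back-of-the-envelope check (the 64-core/128-thread Rome parts and 256 GB DIMMs below are my assumptions, not a published Cray spec), the numbers work out like this:

```python
# Rough per-blade math for the Shasta CPU node, using Rome as the stand-in.
# Assumptions (mine, not Cray's): 64-core SMT2 sockets and 256 GB DIMMs.
SOCKETS_PER_BLADE = 8
DIMMS_PER_SOCKET = 8
CORES_PER_SOCKET = 64      # top-end EPYC Rome part (assumed)
THREADS_PER_CORE = 2       # SMT2
DIMM_CAPACITY_GB = 256     # assumed, to reach the quoted 16 TB figure

cores = SOCKETS_PER_BLADE * CORES_PER_SOCKET
threads = cores * THREADS_PER_CORE
ram_tb = SOCKETS_PER_BLADE * DIMMS_PER_SOCKET * DIMM_CAPACITY_GB / 1024

print(f"{cores} cores, {threads} threads, {ram_tb:.0f} TB of RAM per blade")
# -> 512 cores, 1024 threads, 16 TB of RAM per blade
```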

 

The GPU node will be made of 4 Nvidia GPUs using a post-Volta architecture, all connected to a single AMD Milan CPU (25 GB/s of bandwidth to each GPU).

[Image: GPU node slide]
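A similar quick sanity check on the host links, assuming the quoted 25 GB/s is a per-GPU, per-direction figure (my reading of the slide, not a confirmed spec):

```python
# Aggregate CPU <-> GPU bandwidth for one Shasta GPU node.
# Assumption (mine): the 25 GB/s figure is per GPU link, per direction.
GPUS_PER_NODE = 4
LINK_BW_GB_S = 25   # GB/s from the Milan CPU to each GPU

aggregate_gb_s = GPUS_PER_NODE * LINK_BW_GB_S
print(f"{aggregate_gb_s} GB/s of aggregate host bandwidth across {GPUS_PER_NODE} GPUs")
# -> 100 GB/s of aggregate host bandwidth across 4 GPUs
```

For what it's worth, 25 GB/s per link is roughly in the ballpark of what a PCIe 4.0 x16 connection delivers in one direction, so nothing exotic would necessarily be needed on the host side.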

Here is what the integrated Slingshot switch behind the node looks like:

[Photo: integrated Slingshot switch behind the node]

Not everything is watercooled though; here is the top-of-rack network switch:

[Photo: top-of-rack network switch]

 

This supercomputer will use NAND flash only for its storage.

[Image: storage slide]

 

Thoughts:

I haven't seen many supercomputers before, but this chassis seems to allow for some great density. It will probably have even more cores, as AMD will likely increase core counts again with Milan.

Just a shame that we don't have photos of the GPU node; it would be interesting to see how that was done. Still, I'm very impressed with how much work goes into these machines.

This should help put AMD back in the eyes of customers.

Very interested to see more about this project.

 

Source: https://www.tomshardware.com/news/amd-epyc-milan-shasta-exascale,38067.html


11 minutes ago, cj09beira said:

Just for reference, using Rome CPUs we would get 512 cores and 1,024 threads with up to 16 TB of RAM

Captain Obvious here, informing you..... Holy shit, that's a lot of cores!

 

Also, is that 4 Volta GPUs for every CPU (meaning 32 GPUs total)?

"Put as much effort into your question as you'd expect someone to give in an answer"- @Princess Luna

Make sure to Quote posts or tag the person with @[username] so they know you responded to them!

 RGB Build Post 2019 --- Rainbow 🦆 2020 --- Velka 5 V2.0 Build 2021

Purple Build Post ---  Blue Build Post --- Blue Build Post 2018 --- Project ITNOS

CPU i7-4790k    Motherboard Gigabyte Z97N-WIFI    RAM G.Skill Sniper DDR3 1866mhz    GPU EVGA GTX1080Ti FTW3    Case Corsair 380T   

Storage Samsung EVO 250GB, Samsung EVO 1TB, WD Black 3TB, WD Black 5TB    PSU Corsair CX750M    Cooling Cryorig H7 with NF-A12x25

Link to comment
Share on other sites

Link to post
Share on other sites

4 minutes ago, TVwazhere said:

Captain Obvious here, informing you..... Holy shit, that's a lot of cores!

 

Also, is that 4 Volta GPUs for every CPU (meaning 32 GPUs total)?

I don't think so; in the GPU node slide they say 4 GPUs and one CPU. If the GPU node is the same size as the CPU node, I would say up to 2 CPUs and 8 GPUs, but that is just speculation; it might not have fit.

Edit: it's probably just 4 GPUs per node though.


5 minutes ago, TVwazhere said:

Captain Obvious here, informing you..... Holy shit, that's a lot of cores!

Yep, we'll be able to start talking about kilo cores in single machines soon enough - especially if AMD decides to be absolutely savage and launch Milan with 128 cores.


Why didn't they just call it Ampere? Nvidia, please, oh please.


46 minutes ago, TVwazhere said:

Captain Obvious here, informing you..... Holy shit, that's a lot of cores!

 

Also, is that 4 Volta GPUs for every CPU (meaning 32 GPUs total)?

Anyone feel like bringing back the MOAR CORES meme that was thrown at AMD during the Bulldozer days? :D


1 minute ago, RejZoR said:

Anyone feel like bringing back the MOAR CORES meme that was thrown at AMD during the Bulldozer days? :D

Maybe, but this time it's moar cores (that are actually fast).

The best one IMO is the Star Wars meme.


2 hours ago, Sauron said:

Yep, we'll be able to start talking about kilo cores in single machines soon enough - especially if AMD decides to be absolutely savage and launch Milan with 128 cores.

Single machines? We already had a million cores in 2013.

 

https://engineering.stanford.edu/magazine/article/stanford-researchers-break-million-core-supercomputer-barrier


3 minutes ago, Sauron said:

That's not a single machine, it's a cluster.

You mean a single rack, or a server chassis of x number of units? Because a distributed-memory supercomputer like that is still considered a single machine...

 

Hence the Top500 exists to benchmark large machines like that.


2 hours ago, RejZoR said:

Anyone feel like bringing back the MOAR CORES meme that was thrown at AMD during the Bulldozer days? :D

 


5 minutes ago, S w a t s o n said:

 

That's vaporware? Doesn't count, and doesn't provide more cores than what Intel could already do.


30 minutes ago, cj09beira said:

That's vaporware? Doesn't count, and doesn't provide more cores than what Intel could already do.

Well, the number of sockets does matter for licensing costs, but yeah, no one is really "looking forward" to Cascade Lake-AP at this point. AWS signed on with AMD for a reason.


1 hour ago, Amazonsucks said:

You mean a single rack, or a server chassis of x number of units? Because a distributed-memory supercomputer like that is still considered a single machine...

 

Hence the Top500 exists to benchmark large machines like that.

I think what I meant is pretty obvious, and debating meaningless semantics is pretty low on my list of priorities.


3 hours ago, S w a t s o n said:

Why didn't they just call it Ampere, nvidia please oh please

Ampere is the Turing respin on 7nm. We haven't heard the code name for the 7nm Volta-Next, though it's nice that Cray confirmed most of it for us.


9 minutes ago, Taf the Ghost said:

Ampere is the Turing respin on 7nm. We haven't heard the code name for the 7nm Volta-Next, though it's nice that Cray confirmed most of it for us.

With the sheer amount of misinformation surrounding both Ampere's and Turing's names, I'd wait and see.


6 hours ago, Sauron said:

I think what I meant is pretty obvious, and debating meaningless semantics is pretty low on my list of priorities.

No, it's not really clear what you meant. In the HPC community, Sequoia and other large HPC machines are typically referred to as a single machine...

 

Hence all one million or however many cores, and the rest of the hardware, including petabytes of RAM and disks, megawatts of power supplies, etc., are all called something like Sequoia, Summit, Sierra, Earth Simulator, Titan, K Computer, Oakforest-PACS, or some other name for the entire system. This thread specifically references Shasta systems like HLRS Hawk.

 

@cj09beira

 

I think the most interesting part of Shasta is its Slingshot interconnect. It'll be interesting to see how it compares to Tofu3 and other exascale interconnects.

 

https://www.cray.com/blog/meet-slingshot-an-innovative-interconnect-for-the-next-generation-of-supercomputers/

 


11 minutes ago, Nicnac said:

imagine folding on this^^

We can only dream. Back in 2011 someone folded on the French atomic energy supercomputer for a while; it was like in the top 50 or something and only produced 12 million a day.


14 hours ago, Amazonsucks said:

You mean a single rack, or a server chassis of x number of units? Because a distributed-memory supercomputer like that is still considered a single machine...

 

Hence the Top500 exists to benchmark large machines like that.

They are still treated as nodes; workload managers allocate jobs to nodes based on a lot of rules. CERN, for example, can only run up to 300k of their 500k CPUs due to power and cooling (there are multiple locations where nodes are located, all under OpenStack). When you submit a job you give it workload tags so the system knows where to run it, e.g. GPU nodes or memory-heavy nodes. The workload manager will make sure jobs don't get allocated to nodes in racks that are at max power capability, stuff like that.

 

Have a look into SLURM; it's the most popular workload manager. Most of the Top500 use it.

https://slurm.schedmd.com/

https://en.wikipedia.org/wiki/Slurm_Workload_Manager
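
As a rough illustration of those workload tags (the partition name, GRES string, and resource numbers below are made up for the example, not taken from any real cluster), a job declares what kind of node it needs in its batch directives and the scheduler places it accordingly. Here's a minimal sketch of driving that from Python:

```python
# Minimal sketch: build a SLURM batch script and hand it to sbatch.
# Partition name, GRES string, and resource numbers are invented for the example.
import subprocess
import tempfile

job_script = """#!/bin/bash
#SBATCH --job-name=demo-gpu-job
#SBATCH --partition=gpu        # "workload tag": run this on the GPU nodes
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:4           # 4 GPUs per node, like a 1 CPU + 4 GPU blade
#SBATCH --mem=256G             # or steer it toward memory-heavy nodes instead
#SBATCH --time=01:00:00

srun ./my_simulation           # hypothetical application binary
"""

with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as handle:
    handle.write(job_script)
    script_path = handle.name

# sbatch queues the job; SLURM only starts it once nodes matching the
# requested partition, GRES, and memory tags are free (and within any
# power/rack rules the site has configured).
subprocess.run(["sbatch", script_path], check=True)
```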

 

Intel and GPU-based systems don't combine into single logical entities; IBM, on the other hand, does have the capability for that type of thing. You don't actually want to make massive blocks of logical compute nodes across chassis and cabinets though, as you start to hit barriers like bandwidth and latency. It's more efficient, depending on the task, to move that logic up to your code and make it aware of the hardware boundaries, because you get much better control that way.
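
On that last point, here's a minimal mpi4py sketch of what "making the code aware of hardware boundaries" can look like: MPI can split the job's ranks into per-node groups, so the chatty communication stays inside a node and only reduced results cross the interconnect. (This is a generic illustration, not how any particular Shasta code is written.)

```python
# Minimal sketch: discover which MPI ranks share a physical node and build a
# per-node communicator, so heavy traffic can stay on-node.
from mpi4py import MPI

world = MPI.COMM_WORLD

# Group ranks by shared-memory domain, i.e. by physical node.
node_comm = world.Split_type(MPI.COMM_TYPE_SHARED)
node_rank = node_comm.Get_rank()          # rank within this node
host = MPI.Get_processor_name()

# Pattern: reduce locally first, then let one rank per node speak for its
# node-mates across the interconnect.
local_value = world.Get_rank()            # stand-in for real per-rank data
node_total = node_comm.reduce(local_value, op=MPI.SUM, root=0)

if node_rank == 0:
    print(f"{host}: node-local sum = {node_total}")
```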


9 minutes ago, leadeater said:

They are still treated as nodes; workload managers allocate jobs to nodes based on a lot of rules. CERN, for example, can only run up to 300k of their 500k CPUs due to power and cooling (there are multiple locations where nodes are located, all under OpenStack). When you submit a job you give it workload tags so the system knows where to run it, e.g. GPU nodes or memory-heavy nodes. The workload manager will make sure jobs don't get allocated to nodes in racks that are at max power capability, stuff like that.

 

Have a look into SLURM; it's the most popular workload manager. Most of the Top500 use it.

https://slurm.schedmd.com/

https://en.wikipedia.org/wiki/Slurm_Workload_Manager

 

Intel and GPU-based systems don't combine into single logical entities; IBM, on the other hand, does have the capability for that type of thing. You don't actually want to make massive blocks of logical compute nodes across chassis and cabinets though, as you start to hit barriers like bandwidth and latency. It's more efficient, depending on the task, to move that logic up to your code and make it aware of the hardware boundaries, because you get much better control that way.

I understand the difference between nodes and a whole system. I am merely stating that one section of the K Computer (as one random example) is still part of the same machine. Although HPC systems can be and are partitioned and not always fully utilized, it's accurate to state that the IBM Blue Gene Sequoia was the first million-core single machine.


1 hour ago, Nicnac said:

imagine folding on this^^

Russian scientists have already been arrested for doing that (mining, but same diff).

