
Intel's 10nm only coming to servers in 2020 with Ice Lake

cj09beira
8 minutes ago, Taf the Ghost said:

Good find. I knew I'd seen it somewhere in the reviews/discussions. Yeah, Zen is capable of rerouting resources around via IF's control layer, and Intel's Mesh probably doesn't come into its own until the DDR5 era.

Here's what I was reading, http://www.hpcadvisorycouncil.com/pdf/LS_DYNA_Skylake.pdf.

Really interested to see how EPYC 2 performs at the same core counts with the memory improvements. Hopefully TR2 has some two-die SKUs so we can get a preview of what's to come.


9 minutes ago, leadeater said:

Is that the recent i7-980 video? I only half listened to it and skipped around a lot; I need to actually watch it.

Yeah. My personal favorite:

The 7-Zip score is hilarious.


8 minutes ago, leadeater said:

Here's what I was reading, http://www.hpcadvisorycouncil.com/pdf/LS_DYNA_Skylake.pdf.

Really interested to see how EPYC 2 performs at the same core counts with the memory improvements. Hopefully TR2 has some two-die SKUs so we can get a preview of what's to come.

Interesting presentation. I think they need work in their compiler though. There's not much of a performance difference between SSE, AVX2 and AVX512. Strikes me that they need more tuning of the program and/or the compiler. 


14 minutes ago, Taf the Ghost said:

Interesting presentation. I think they need work in their compiler though. There's not much of a performance difference between SSE, AVX2 and AVX512. Strikes me that they need more tuning of the program and/or the compiler. 

That's more of a workload issue. LS-DYNA is just one example, but I found it interesting that there are workloads where AVX-512 was actually slower.

The compiler is the latest Intel one, and three different MPI implementations were tested.

SSE2 is actually really fast.

Quote

I've rewritten SSE2 code in AVX2, but performance is only 30-40% better. The number of instructions has halved. What could the problem be?

Quote

In my practice ~30% performance increase of AVX2 on Haswell is typical, so your results don't surprise me much. Of course, it depends on the specifics of your code and data, but that is about the expected magnitude of gain in general.

The reason for the less-than-desired-2x speedup is that (a) some instructions have reduced performance on Haswell and (b) not all 128-bit algorithms directly translate to 256 bits and you have to add instructions to arrange data in registers, which wastes cycles.
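Point (b) is easy to see with byte shuffles. A minimal sketch, assuming the task is reversing bytes: with 128-bit registers a single SSSE3 `pshufb` can reverse all 16 bytes, but its AVX2 form `vpshufb` (`_mm256_shuffle_epi8`) only selects bytes from within each 128-bit lane, so a full 32-byte reversal needs an extra cross-lane permute such as `vpermq` (`_mm256_permute4x64_epi64(v, 0x4E)`). The Python below only emulates those semantics on lists; the function names are hypothetical.

```python
# Emulation of x86 byte-shuffle semantics on plain lists of byte values.
# Function names are illustrative; real code would use intrinsics.

def pshufb_128(src, idx):
    """SSSE3 pshufb: any of the 16 source bytes can go to any position."""
    return [0 if k & 0x80 else src[k & 0x0F] for k in idx]

def vpshufb_256(src, idx):
    """AVX2 vpshufb: each 128-bit lane (16 bytes) is shuffled independently."""
    out = []
    for i, k in enumerate(idx):
        lane_base = (i // 16) * 16  # an index can only select inside its own lane
        out.append(0 if k & 0x80 else src[lane_base + (k & 0x0F)])
    return out

def vpermq_swap_halves(v):
    """vpermq with imm 0x4E: swap the two 128-bit halves (the extra step)."""
    return v[16:] + v[:16]

# 128-bit: one shuffle reverses all 16 bytes.
src16 = list(range(16))
rev16 = pshufb_128(src16, list(range(15, -1, -1)))

# 256-bit: the same mask repeated per lane only reverses each half...
src32 = list(range(32))
mask32 = [15 - (i % 16) for i in range(32)]
halves_reversed = vpshufb_256(src32, mask32)
# ...so a second, cross-lane instruction is needed to finish the job.
rev32 = vpermq_swap_halves(halves_reversed)

print(rev16 == src16[::-1])
print(halves_reversed == src32[::-1])
print(rev32 == src32[::-1])
```

That second instruction is exactly the kind of "arrange data in registers" overhead that eats into the theoretical 2x.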

 

Quote

The less-than-2x speedup is (likely) attributable to instruction/pipeline stalls due to memory and/or cache latencies. When code using a narrow vector (1 or 2 lanes) saturates the memory bus, increasing the vector width will not speed up the program. There is a similar issue with the LLC/L3/L2/L1 cache levels.
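The memory-bound argument can be put into numbers with a roofline-style bound: attainable throughput is the lower of the core's peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte moved). A minimal sketch with made-up numbers (the 200/400 GFLOP/s peaks, the 40 GB/s bandwidth and the intensities are assumptions for illustration, not measurements of any CPU):

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

# Hypothetical machine: 40 GB/s of DRAM bandwidth, and a compute peak that
# doubles when going from 128-bit to 256-bit vectors.
BANDWIDTH = 40.0
PEAK_128 = 200.0
PEAK_256 = 400.0

for intensity in (0.25, 7.5, 20.0):  # FLOPs per byte loaded from memory
    narrow = attainable_gflops(PEAK_128, BANDWIDTH, intensity)
    wide = attainable_gflops(PEAK_256, BANDWIDTH, intensity)
    print(f"intensity {intensity:>5}: {narrow:6.1f} -> {wide:6.1f} GFLOP/s "
          f"(x{wide / narrow:.2f})")
```

With these numbers the low-intensity kernel gains nothing from the wider vectors, the middle one gains 1.5x, and only the compute-bound one sees the full 2x.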

 

Quote

As andysem hinted, AVX2 performance depends more on optimized unrolling (at least when comparing SSE2 and AVX2 on a CPU which supports both).

In the "LCD"/vectors benchmark suite, the performance gain from VS2012 AVX-128 to VS2015 AVX2-256 (when both cases vectorize) is typically the 40% you mention. I've seen no difference in performance with VS2012 between SSE2 and AVX when both vectorize. If you switch to a compiler which supports unrolling in addition to vectorization, a 50% gain from SSE to AVX2 might be a reasonable target.

In a few cases, AVX2 instructions may even lose performance compared with earlier AVX, when there is no unrolling.

https://software.intel.com/en-us/forums/intel-isa-extensions/topic/698503
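The unrolling point in the quotes above is mostly about dependency chains: a reduction with one accumulator has to wait for each add to finish before starting the next, while several independent accumulators let the adds overlap (and let a compiler keep each accumulator in its own vector register). A minimal sketch of the transformation, written in Python only to show the shape of the loop; CPython itself won't get faster from this, and the choice of four accumulators is an arbitrary assumption.

```python
def sum_single(xs):
    """One accumulator: every add depends on the previous one."""
    acc = 0
    for x in xs:
        acc += x
    return acc

def sum_unrolled4(xs):
    """Unrolled by 4 with independent accumulators, plus a remainder loop."""
    a0 = a1 = a2 = a3 = 0
    n = len(xs) - len(xs) % 4  # largest multiple of 4 that fits
    for i in range(0, n, 4):
        # These four adds have no dependency on each other.
        a0 += xs[i]
        a1 += xs[i + 1]
        a2 += xs[i + 2]
        a3 += xs[i + 3]
    for i in range(n, len(xs)):  # leftover elements
        a0 += xs[i]
    return (a0 + a1) + (a2 + a3)  # combine the partial sums at the end

data = list(range(1, 103))  # 102 elements, so the remainder loop runs
print(sum_single(data), sum_unrolled4(data))
```

With integers both versions give the same result. With floating-point data the reordered additions can round slightly differently, which is one reason compilers won't do this to FP reductions unless reassociation is allowed (for example GCC's `-ffast-math`).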

 

I'm actually really glad I don't have to deal with that crap, "Here's a server, anything else not my problem" :)


1 minute ago, leadeater said:

That's more of a workload issue. LS-DYNA is just one example, but I found it interesting that there are workloads where AVX-512 was actually slower.

The compiler is the latest Intel one, and three different MPI implementations were tested.

SSE2 is actually really fast.

https://software.intel.com/en-us/forums/intel-isa-extensions/topic/698503

I'm actually really glad I don't have to deal with that crap, "Here's a server, anything else not my problem" :)

Ian Cutress has been working on an AVX512 testing suite and had to contract someone to take normal "scientist code" and tune it properly for AVX512. There's a lot of power there, but it takes a whole lot of work to get it out.
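We don't know what the contractor actually changed, but one transformation that typically shows up when particle code is tuned for wide SIMD is switching from an array of structures (each particle's x, y, z stored together) to a structure of arrays (all x values contiguous, and so on), so that one vector load pulls in the x coordinates of 8 (AVX2) or 16 (AVX-512) consecutive particles for 32-bit floats. A minimal sketch with hypothetical names, in plain Python, showing only the data layout and the unit-stride update loop a compiler could then vectorize:

```python
from dataclasses import dataclass

@dataclass
class ParticleAoS:
    """Array-of-structures element: one particle's fields are adjacent."""
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float

@dataclass
class ParticlesSoA:
    """Structure-of-arrays: each field is its own contiguous array."""
    x: list
    y: list
    z: list
    vx: list
    vy: list
    vz: list

def to_soa(particles):
    """Repack a list of ParticleAoS into one ParticlesSoA."""
    return ParticlesSoA(
        x=[p.x for p in particles], y=[p.y for p in particles],
        z=[p.z for p in particles], vx=[p.vx for p in particles],
        vy=[p.vy for p in particles], vz=[p.vz for p in particles],
    )

def step_soa(s, dt):
    """Move every particle by v*dt. Each loop walks one field array with
    unit stride, which is the access pattern SIMD loads want."""
    n = len(s.x)
    for i in range(n):
        s.x[i] += s.vx[i] * dt
    for i in range(n):
        s.y[i] += s.vy[i] * dt
    for i in range(n):
        s.z[i] += s.vz[i] * dt

ps = [ParticleAoS(float(i), 0.0, 0.0, 1.0, 2.0, -1.0) for i in range(4)]
soa = to_soa(ps)
step_soa(soa, 0.5)
print(soa.x, soa.y, soa.z)
```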


1 minute ago, Taf the Ghost said:

Ian Cutress has been working on an AVX512 testing suite and had to contract someone to take normal "scientist code" and tune it properly for AVX512. There's a lot of power there, but it takes a whole lot of work to get it out.

"Wtf, this shit is way over my head" xD.


11 minutes ago, leadeater said:

"Wtf, this shit is way over my head" xD.

We'll have to see what the final results look like after everyone gets back from the "AMD is giving everyone a Ferrari track day" event.


8 minutes ago, M.Yurizaki said:

[image: 3D Particle Movement benchmark graph]

Oh dear, this should rustle some jimmies.

When a 2-core CPU beats out an 18-core 7980XE, I expect all the jimmies to be rustled. Oh wait, that's not what you meant :).


2 minutes ago, leadeater said:

When a 2-core CPU beats out an 18-core 7980XE, I expect all the jimmies to be rustled. Oh wait, that's not what you meant :).

Though on a side note, I am kind of blindsided that vectorizing code seems to be becoming a thing these days. But I'm liking that. It hopefully should mean less work for developers who want to make the most out of data heavy applications.

Maybe I'm overthinking things, but I'm still miffed at how people think it's easy to just plop another thread down and expect things to increase n-fold.
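That intuition has a name: Amdahl's law. If a fraction p of the runtime can be spread across n threads and the rest stays serial, the best-case speedup is 1 / ((1 - p) + p / n). A quick sketch (the 90% parallel fraction is just an example value):

```python
def amdahl_speedup(p, n):
    """Best-case speedup when fraction p of the work scales across n threads."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("p must be in [0, 1] and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)

for threads in (2, 4, 18, 1000):
    print(threads, round(amdahl_speedup(0.9, threads), 2))
```

With 90% of the work parallel, 18 threads top out around 6.7x, and no thread count gets past 10x, because the serial 10% never shrinks.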


3 minutes ago, leadeater said:

When a 2-core CPU beats out an 18-core 7980XE, I expect all the jimmies to be rustled. Oh wait, that's not what you meant :).

Jeez, AVX-512 packs a ton of compute there. It's little wonder even quad-channel DDR4 memory is bottlenecking.
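Some rough arithmetic shows why. Peak DRAM bandwidth is channels times transfer rate times 8 bytes per 64-bit channel. The sketch below assumes DDR4-2666 and shows both four channels (the X299 platform the i9-7980XE sits on) and six channels (a Skylake-SP Xeon socket). For compute it assumes 18 cores, two 512-bit FMA units per core, FP32 (16 lanes, 2 FLOPs per FMA) and a hypothetical 2.0 GHz all-core AVX-512 clock; real AVX-512 clocks vary by chip and cooling, so treat the result as an order of magnitude only.

```python
def dram_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s (10**9 bytes per second)."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

def peak_fp32_gflops(cores, fma_units, ghz, lanes=16):
    """Peak FP32 FLOP rate: each FMA lane counts as 2 FLOPs per cycle."""
    return cores * fma_units * lanes * 2 * ghz

# Assumed figures (see text): 18 cores, 2 FMA units, 2.0 GHz AVX-512 clock.
compute = peak_fp32_gflops(cores=18, fma_units=2, ghz=2.0)
for channels in (4, 6):
    bw = dram_bandwidth_gbs(channels, 2666)
    # FLOPs a kernel must do per byte fetched from DRAM to keep the FMAs busy.
    balance = compute / bw
    print(f"{channels} channels: {bw:.1f} GB/s, {compute:.0f} GFLOP/s, "
          f"~{balance:.0f} FLOPs per byte")
```

That works out to roughly 18 to 27 FLOPs needed per byte. A kernel such as `x[i] += v[i] * dt` in FP32 does 2 FLOPs per at least 12 bytes of traffic (two 4-byte loads, one 4-byte store), about 0.17 FLOPs per byte, so it runs out of memory bandwidth long before it runs out of AVX-512 throughput.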


13 minutes ago, M.Yurizaki said:

[image: 3D Particle Movement benchmark graph]

Oh dear, this should rustle some jimmies.

6 minutes ago, leadeater said:

When a 2-core CPU beats out an 18-core 7980XE, I expect all the jimmies to be rustled. Oh wait, that's not what you meant :).

He hasn't yet run the optimized code on the other platforms. Right now it's comparing Standard code vs highly optimized 3D Particle Movement code. That Skylake-X part has far more AVX512 units & power. It should run over 10x faster on that code base.  We'll know more when it gets released.


@Taf the Ghost @M.Yurizaki

On this sort of topic I'm really glad Ian is looking into this sort of thing and making it public. It makes it really clear that current and past CPUs can actually perform significantly better than they currently do, but it requires more effort from software developers, even just in knowing it's possible. I hope other outlets start looking at this too; the more it's known, the more game developers will look into it. Just imagine a title that comes out in, say, 2019 or 2020 that incorporates this type of work and doubles the CPU performance: suddenly dual cores and quad cores become relevant again lol (not that I want them to).

Maybe complex physics can be moved back to CPU, would kill off HairWorks and TressFX or completely alter them, I dunno.


1 minute ago, leadeater said:

@Taf the Ghost @M.Yurizaki

On this sort of topic I'm really glad Ian is looking into this sort of thing and making it public. It makes it really clear that current and past CPUs can actually perform significantly better than they currently do, but it requires more effort from software developers, even just in knowing it's possible. I hope other outlets start looking at this too; the more it's known, the more game developers will look into it. Just imagine a title that comes out in, say, 2019 or 2020 that incorporates this type of work and doubles the CPU performance: suddenly dual cores and quad cores become relevant again lol (not that I want them to).

Maybe complex physics can be moved back to CPU, would kill off HairWorks and TressFX or completely alter them, I dunno.

Nvidia's GameWorks mostly seems to exist to keep everything slowed down and running best on Nvidia hardware.

But if there are ways of handling AI movement with some highly optimized AVX code, that could make a big difference, especially in open-world games. Though we come back to game engine issues.


5 minutes ago, Taf the Ghost said:

He hasn't yet run the optimized code on the other platforms. Right now it's comparing Standard code vs highly optimized 3D Particle Movement code. That Skylake-X part has far more AVX512 units & power. It should run over 10x faster on that code base.  We'll know more when it gets released.

It's actually in the original tweet, just not in the graph.

Quote

On 18-core Core i9-7980XE, jump from 4111 to 30750.

I really hope the same level of attention/optimization is given to AMD as well, it's all well and good to show off highly optimized Intel AVX-512 but the same needs to be done for AMD AVX1 and AVX2.


4 minutes ago, leadeater said:

@Taf the Ghost @M.Yurizaki

On this sort of topic I'm really glad Ian is looking into this sort of thing and making it public. It makes it really clear that current and past CPUs can actually perform significantly better than they currently do, but it requires more effort from software developers, even just in knowing it's possible. I hope other outlets start looking at this too; the more it's known, the more game developers will look into it. Just imagine a title that comes out in, say, 2019 or 2020 that incorporates this type of work and doubles the CPU performance: suddenly dual cores and quad cores become relevant again lol (not that I want them to).

Maybe complex physics can be moved back to CPU, would kill off HairWorks and TressFX or completely alter them, I dunno.

I would like desktop PCs to adopt ARM's big.LITTLE approach. Except the LITTLE processors are for high performance I/O bound tasks while the big processors are for heavy data crunching code.


1 minute ago, M.Yurizaki said:

I would like desktop PCs to adopt ARM's big.LITTLE approach. Except the LITTLE processors are for high performance I/O bound tasks while the big processors are for heavy data crunching code.

We've already got a little bit of that, but that's mostly on the security side. We'll see more ASIC-like dedicated approaches than we will see a big.LITTLE approach in x86.


3 minutes ago, leadeater said:

It's actually in the original tweet just not the graph.

Quote

On 18-core Core i9-7980XE, jump from 4111 to 30750.

I really hope the same level of attention/optimization is given to AMD as well, it's all well and good to show off highly optimized Intel AVX-512 but the same needs to be done for AMD AVX1 and AVX2.

Just shy of 7.5x scaling, with a dose of higher clocks as well. Yeah, that seems I/O bound. Still, that's pretty decent scaling out to 18 cores for a first run of the code.
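The arithmetic behind "just shy of 7.5x", using the two scores from the quoted tweet:

```python
baseline = 4111     # score before optimization (from the quoted tweet)
optimized = 30750   # score after AVX-512 optimization (from the quoted tweet)
speedup = optimized / baseline
print(f"{speedup:.2f}x")
```

This prints 7.48x.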


1 minute ago, M.Yurizaki said:

I would like desktop PCs to adopt ARM's big.LITTLE approach. Except the LITTLE processors are for high performance I/O bound tasks while the big processors are for heavy data crunching code.

Hmm, sort of like the coprocessor days before MMX was brought into CPUs. I actually like the IBM Cell design: sort it out at the front, then chuck it to the back. But that was just too hard for most people.


Just now, leadeater said:

Hmm, sort of like the coprocessor days before MMX was brought into CPUs. I actually like the IBM Cell design: sort it out at the front, then chuck it to the back. But that was just too hard for most people.

I think Cell was introduced before people were really ready for it. But hey, you had to start somewhere.


Just now, M.Yurizaki said:

I think Cell was introduced before people were really ready for it. But hey, you had to start somewhere.

I think IBM also forgot that game developers are nothing like their PowerPC/Blue Gene server users, "Good enough" vs "Must be perfect".


14 minutes ago, leadeater said:

I think IBM also forgot that game developers are nothing like their PowerPC/Blue Gene server users, "Good enough" vs "Must be perfect".

IBM never really wanted home users in the first place.


2 minutes ago, Taf the Ghost said:

IBM never really wanted home users in the first place.

We're too low brow for them lol
