HT/SMT die area, longer term future thoughts

Particularly in light of the more recent vulnerabilities reigniting the question of tradeoffs of HT, I started to think, what if HT or SMT were no longer a thing going forwards? I know SMT isn't implicated but I'm looking at the general direction that could be taken by the industry.

 

Assumptions:

There is a cost to implementing HT/SMT in terms of die area and power used

Does the gain from having HT/SMT outweigh the cost of implementing it? This is an implied yes, at least in some cases, otherwise it wouldn't be done.

If cores become plentiful, do we still need the extra threads? Not having it would reduce the scenarios where HT/SMT can slow things down if a critical path thread ends up sharing a core with less important code. Consistent performance may be preferred over peak performance, although peak performance may remain in some CPUs for those use cases still likely to use it.
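To make the "critical thread sharing a core" scenario concrete, here is a minimal sketch of how software can already avoid SMT siblings today. It assumes a Linux system, where each logical CPU exposes a `thread_siblings_list` file under `/sys/devices/system/cpu`. The helper names and the example layout are made up for illustration.

```python
import os

def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,16' or '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def one_cpu_per_core(sibling_lists):
    """Keep only the lowest-numbered logical CPU of every physical core."""
    return sorted({parse_cpu_list(text)[0] for text in sibling_lists})

def read_sibling_lists(base="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every cpuN directory (Linux only)."""
    lists = []
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name, "topology", "thread_siblings_list")
        if name.startswith("cpu") and name[3:].isdigit() and os.path.exists(path):
            with open(path) as f:
                lists.append(f.read())
    return lists

if __name__ == "__main__":
    # Hypothetical 4-core, 8-thread layout where cpu N shares a core with cpu N+4.
    example = ["0,4", "1,5", "2,6", "3,7", "0,4", "1,5", "2,6", "3,7"]
    print(one_cpu_per_core(example))
```

On Linux the result of `one_cpu_per_core(read_sibling_lists())` could be passed to `os.sched_setaffinity(0, cpus)` so the process only runs on one logical CPU per core. That does not stop other processes from landing on the sibling threads, so it only reduces the problem rather than removing it.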

 

The thinking I'm having is that we have started down a "more cores" road, with AMD pushing the agenda. Some software scales well, and that's great. Some software scales badly, not necessarily because of bad code, but simply because the nature of the task imposes other limits. Some software just won't scale at all, but we'll park that to one side for now.

 

Imagine we're in a scenario where lots of cores are the norm. Let's say some years in the future, 8 cores is considered entry level, but higher end desktops might be 16 cores or more. I'm wondering whether, at that point, ditching HT/SMT would be a better strategy. But this argument still depends on the cost of implementing HT/SMT, and that's where I'm struggling. I've found references to HT in the P4 era consuming an extra 5% die space, which is comparatively small for the potential benefit. But CPUs have changed a lot since then, so is that still the case? If CPUs get more complicated with new features, does that imply the extra stuff that goes into HT/SMT also has to get more complicated to keep up? Is it still 5%? More? Less? I've tried looking for written statements or die shots indicating the area attributed to HT/SMT, but have struggled to find anything. Any help with that would be welcome.

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible

How does HT/SMT take up die space? They are virtual, not physical?

Community Standards | Fan Control Software

Please make sure to Quote me or @ me to see your reply!

Just because I am a Moderator does not mean I am always right. Please fact check me and verify my answer. 

 

"Black Out"

Ryzen 9 5900x | Full Custom Water Loop | Asus Crosshair VIII Hero (Wi-Fi) | RTX 3090 Founders | Ballistix 32gb 16-18-18-36 3600mhz 

1tb Samsung 970 Evo | 2x 2tb Crucial MX500 SSD | Fractal Design Meshify S2 | Corsair HX1200 PSU

 

Dedicated Streaming Rig

 Ryzen 7 3700x | Asus B450-F Strix | 16gb Gskill Flare X 3200mhz | Corsair RM550x PSU | Asus Strix GTX1070 | 250gb 860 Evo m.2

Phanteks P300A |  Elgato HD60 Pro | Avermedia Live Gamer Duo | Avermedia 4k GC573 Capture Card

 

1. what does smt stand for in this case?

 

2. Maybe I do not know.

I live in misery USA. my timezone is central daylight time which is either UTC -5 or -4 because the government hates everyone.

into trains? here's the model railroad thread!

6 minutes ago, Skiiwee29 said:

How does HT/SMT take up die space? They are virtual, not physical?

It would need some changes to the FPU and the pipeline before and after the CPU, afaik.

 

Edit: it is something that needs to be baked into the architecture and pipeline.

5 minutes ago, will1432 said:

1. what does smt stand for in this case?

 

2. Maybe I do not know.

SMT = Simultaneous Multithreading, AMD's version of Hyper-Threading.

7 minutes ago, Skiiwee29 said:

How does HT/SMT take up die space? They are virtual, not physical?

It's not free, it has to be implemented in silicon if you want it. Some parts have to be duplicated to provide that extra thread.

6 minutes ago, will1432 said:

1. what does smt stand for in this case?

Simultaneous multi threading. You can read that either as AMD's equivalent to HT, and/or the generic term for what HT is.

18 minutes ago, porina said:

It's not free, it has to be implemented in silicon if you want it. Some parts have to be duplicated to provide that extra thread.

Simultaneous multi threading. You can read that either as AMD's equivalent to HT, and/or the generic term for what HT is.

Good to know, thought it was just something coded into the instructions of the CPU, not something physical. 

Dammit, was hoping to fix a typo before anyone quoted it :) 

16 minutes ago, porina said:

Dammit, was hoping to fix a typo before anyone quoted it :) 

Nvm, I see the typo now :P It wasn't "silicon". I'll correct my quote for you.

3 minutes ago, CUDAcores89 said:

I mean, IBM's POWERPC architecture uses SMT with 4 threads per core, and those are some of the most secure processors out there. Clearly SMT can be done securely if implemented correctly.

That's not the point of my thinking. There will be and remain cases where it is beneficial. As core counts go up, I don't think that necessarily belongs in the consumer space. If we assume the recent trend of more cores continues, we could get to recently unthinkable core counts in the not too distant future unless it slows before then. The quantity of consumer software that could benefit from many threads is limited, and if cores become cheap enough, I'm wondering if SMT would be dropped. For a given number of threads, at least in performance terms, it would be preferable to have it delivered by real cores than through SMT. If you don't implement SMT at all, there will be some die space saving from that. I'm just wondering how much?

2 hours ago, Skiiwee29 said:

SMT = Simultaneous Multithreading, AMD's version of Hyper-Threading.

Nope, it's the other way around.

Hyper-Threading is Intel's version of SMT.

SMT is the technical term, Hyper-Threading the marketing one.

2 hours ago, porina said:

Particularly in light of the more recent vulnerabilities reigniting the question of tradeoffs of HT, I started to think, what if HT or SMT were no longer a thing going forwards? I know SMT isn't implicated but I'm looking at the general direction that could be taken by the industry.

You can fix that by adding access checks, which can be as simple as one bit that shows which part of the core it is, A or B. For SMT4 you need two bits, obviously.

And when accessing something, you check whether the bit matches your path, and if it doesn't, you just ignore it.

It just has to be implemented.
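As a rough illustration of the tagging idea above, here is a toy software model, not a description of how any real CPU does it. Entries in a shared structure carry the ID of the hardware thread that created them, and a lookup from a different thread behaves as if the entry were absent. The class and function names are hypothetical.

```python
def tag_bits(threads_per_core):
    """Bits needed to tell the hardware threads of one core apart."""
    return (threads_per_core - 1).bit_length()

class TaggedBuffer:
    """Toy shared structure whose entries remember which hardware thread filled them."""

    def __init__(self):
        self.entries = {}  # address -> (owner_thread, value)

    def fill(self, thread_id, address, value):
        self.entries[address] = (thread_id, value)

    def lookup(self, thread_id, address):
        entry = self.entries.get(address)
        if entry is None:
            return None
        owner, value = entry
        # Tag mismatch: ignore the entry, as if it were not there.
        return value if owner == thread_id else None

if __name__ == "__main__":
    print(tag_bits(2), tag_bits(4))  # SMT2 and SMT4
    buf = TaggedBuffer()
    buf.fill(thread_id=0, address=0x40, value="data from thread A")
    print(buf.lookup(thread_id=0, address=0x40))
    print(buf.lookup(thread_id=1, address=0x40))
```

The sketch only covers the bookkeeping. In hardware the cost would be the extra tag storage and comparators on every shared structure, plus the lost sharing whenever both threads legitimately want the same entry.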

2 hours ago, porina said:

There is a cost to implementing HT/SMT in terms of die area and power used

Does the gain from having HT/SMT outweight the cost of implementing it?

No, it doesn't.

As the Cost of implementing is small.

The cost of implementing it safely is a bit higher as you need to do checks but that's still far from a Core. And CMT also isn't that great, as we know from previous experiences. So we're left with SMT.

 

2 hours ago, porina said:

The thinking I'm having is we have started down a "more cores" road with AMD pushing the agenda.

Do you have a better idea for improving the performance of a CPU?

More clock doesn't work, we hit a wall.

More IPC is still possible but the differences are minor as we're already at the upper end.

 

The only solution is:

a) a new, modern architecture. Well, need I say more?

b) go wider and add more cores.

 

If you have a better idea of how to improve the Performance, without blowing up the power budget, everyone is eager to hear...

 

"Hell is full of good meanings, but Heaven is full of good works"

17 minutes ago, Stefan Payne said:

You can fix that by adding access checks, which can be as simple as one bit that shows which part of the core it is, A or B. For SMT4 you need two bits, obviously.

My thinking wasn't about securing SMT, it was about if it was needed at all if core counts continue increasing.

 

The short version:

Core counts will increase

Much software will have trouble scaling to more threads

Cores perform better than extra threads from SMT

If core counts are high enough, we don't gain significantly (outside of limited cases) from having those extra threads

Dropping SMT from hardware would lead to some saving, allowing more cores to be a little easier to implement
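The scaling point in the list can be illustrated with Amdahl's law, which gives the best-case speedup when only a fraction of the work can run in parallel. This is a sketch only: the 90% parallel fraction below is an assumed value for illustration, not a measurement of any real program.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Best-case speedup over one thread when only part of the work is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)

if __name__ == "__main__":
    p = 0.9  # assumed share of the work that can run in parallel
    for n in (4, 8, 16, 32, 64):
        print(f"{n:>2} threads: {amdahl_speedup(p, n):.2f}x")
```

With a fixed serial portion, each doubling of the thread count buys less than the one before. That is the sense in which extra threads, whether from SMT or from real cores, stop helping much once counts get high.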

 

If you run some server needing a million threads, fine, have SMT still.

 

17 minutes ago, Stefan Payne said:

Do you have a better idea for improving the performance of a CPU?

I never said I didn't want more cores! It was about optimising the user experience by possibly ditching SMT at higher core counts.

 

17 minutes ago, Stefan Payne said:

More clock doesn't work, we hit a wall.

It would probably require moving away from silicon, and I'm not holding my breath on that being cost effective any time soon.

 

17 minutes ago, Stefan Payne said:

More IPC is still possible but the differences are minor as we're already at the upper end.

Agreed in part. I have long said, for the "common" instructions, we're probably about as optimised as we're going to get. Any major gains will be through the implementation of specific hardware acceleration.

 

For example, Ryzen is much faster at some cryptography related tasks as they have hardware support for it. On the other side, Intel CPUs are pushing FPU and related throughput through AVX extensions, with two unit AVX-512 implementations offering double the FP64 throughput compared to Intel AVX2, which in turn is about double the throughput of Zen(+) AVX2.
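The "double, then double again" relationship can be checked with simple peak-rate arithmetic. This sketch assumes two FMA-capable vector units per core in each case, with Zen(+) executing AVX2 on 128-bit wide units. Those unit counts and widths are assumptions taken from commonly cited figures, and the numbers are theoretical peaks per core per cycle that ignore clock speed differences and memory limits.

```python
def fp64_flops_per_cycle(units, vector_bits, fma=True):
    """Theoretical peak FP64 operations per cycle per core."""
    lanes = vector_bits // 64  # FP64 values per vector register
    ops_per_lane = 2 if fma else 1  # a fused multiply-add counts as two operations
    return units * lanes * ops_per_lane

if __name__ == "__main__":
    configs = {
        "Intel AVX-512, two units": (2, 512),
        "Intel AVX2": (2, 256),
        "Zen(+) AVX2 (assumed two 128-bit FMA units)": (2, 128),
    }
    for name, (units, bits) in configs.items():
        print(f"{name}: {fp64_flops_per_cycle(units, bits)} FP64 FLOPs/cycle")
```

Real throughput depends heavily on sustained clocks under vector load and on whether the data can be fed fast enough, so treat these as upper bounds only.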

  • 1 year later...

I don't think this has been answered yet. I would be really interested to know, because as core counts increase we will be choosing between physical cores and hyper-threading. If the die area cost is something like 10%, which is better: 20c/40t or 22c/22t in the same die area? SMT also affects power consumption, as cores without it are simpler and can usually move to and from low-power states more quickly. It would make an interesting LTT video if we knew, and they could test workloads on a Threadripper that has a lot of threads to spare.
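The 20c/40t versus 22c/22t question can be put into numbers with a back-of-the-envelope sketch. It assumes the saved area converts entirely into extra cores, the workload scales perfectly across all threads, and SMT adds a fixed throughput uplift per core. The 10% area figure is the guess from the post, and the uplift values are hypothetical.

```python
import math

def cores_without_smt(smt_cores, smt_area_overhead):
    """Whole non-SMT cores that fit in the area of smt_cores SMT-capable cores."""
    return math.floor(round(smt_cores * (1 + smt_area_overhead), 9))

def throughput(cores, smt_uplift=0.0):
    """Relative throughput in units of one non-SMT core, for a fully parallel load."""
    return cores * (1 + smt_uplift)

if __name__ == "__main__":
    smt_cores, overhead = 20, 0.10
    plain_cores = cores_without_smt(smt_cores, overhead)
    print(f"{smt_cores}c/{2 * smt_cores}t vs {plain_cores}c/{plain_cores}t")
    for uplift in (0.05, 0.10, 0.25):  # hypothetical SMT gains per core
        with_smt = throughput(smt_cores, uplift)
        without = throughput(plain_cores)
        print(f"SMT uplift {uplift:.0%}: {with_smt:.1f} vs {without:.1f}")
```

Under these assumptions the break-even point is simply an SMT uplift equal to the area overhead. For heavily threaded work SMT wins if it gains more than it costs in area, while for lightly threaded work neither the extra threads nor the extra cores get used.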
