
What exactly does Architecture refer to? What is CPU Performance?

So these questions are a little nebulous, but my understanding of CPU (and I suppose GPU) performance is equally nebulous. I'd really love some help in understanding this!! From what I understand, "CPU Performance" is contingent on three things: Process Node, Frequency and Instructions Per Clock.

 

For example, this Techspot article measures performance in terms of IPC between AMD's Ryzen chips and the Intel 8700K & 8600K by limiting all the CPUs to 4.0GHz. I'd encourage you to look at their testing, but from what I can tell, in terms of raw IPC, the 2000 series AMD products and the Coffee Lake parts are a bit of a wash overall: both architectures provide similar enough performance at 4.0GHz. So in this case there is relative parity in frequency and IPC, but the process node is different between Zen+ and the Intel Coffee Lake chips. So, would it be correct to say that, in general, GlobalFoundries' 12nmLP process and Intel's 14nm++(+) process are roughly equivalent in terms of performance? But the process isn't the CPU, so measuring the "performance" of a process technology seems like a sort of stupid prospect. Like, how the hell do you measure the performance of a transistor?

 

Regarding the process node part of the "performance triangle" as it were, how much effect does a process node really have on performance? For example, AMD's FX line of chips were pretty infamous for being underwhelming in a multitude of workloads, but those chips were produced on the GlobalFoundries 32nm process. Certainly AMD has a strong architecture to rely on with Zen, so there would be no need to do what I'm about to suggest, but I will for the sake of the argument. Would a 14/12nm FX chip see a performance gain of ≈50%? 14nm is over 50% smaller than 32nm.... Process nodes tend to increase frequency; we've seen that with the Ryzen 2000 series and XFR 2's ability to boost some single core scenarios to 4.3GHz, a measurable improvement over Ryzen 1000. So would this theoretical FX14nm chip run faster or perhaps more efficiently? What does this say about "architecture"? (also thermals???)
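For reference, the arithmetic behind the "over 50% smaller" figure can be sketched as below. This is only a naive geometric calculation that assumes the "nm" in a node name is a real linear dimension, which is not guaranteed for modern nodes. The function names are made up for illustration, and nothing here predicts actual performance.

```python
# Naive geometric scaling between two node *names*.
# Assumption: the "nm" label is treated as a true linear feature size.
# Modern node names are largely marketing labels, so treat the results
# as a rough upper bound on the shrink, not as a performance estimate.


def linear_shrink(old_nm: float, new_nm: float) -> float:
    """Fractional reduction in linear feature size (0.5 means 50% smaller)."""
    return 1.0 - new_nm / old_nm


def area_ratio(old_nm: float, new_nm: float) -> float:
    """Ideal area of an unchanged design on the new node, relative to the old."""
    return (new_nm / old_nm) ** 2


if __name__ == "__main__":
    print(f"linear shrink 32nm -> 14nm: {linear_shrink(32, 14):.4f}")
    print(f"ideal area ratio 32nm -> 14nm: {area_ratio(32, 14):.4f}")
```

Note that a smaller area only says how much could fit on the die. How much faster or more efficient the chip gets depends on the clocks and voltages the new node actually allows.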

 

But then the idea of a "nm" gets thrown into question! This Wikipedia entry has a table listing all the different semiconductor manufacturers' process dimensions on 14nm, and they all seem to be somewhat different. Like Samsung/GlobalFoundries' dimensions are 49x8x38 whereas Intel's dimensions for 14nm are 42x8x42. They aren't the same dimensions, so they must not all actually be 14nm! So what the hell does a "nm" really even mean at this point??

 

Okay, but getting back to this whole architecture thing, certainly there are things that I feel like I can point to and say "That's an architectural element of the CPU". For example, AMD's Infinity Fabric or Intel's Ring Bus, or integrating the northbridge into the CPU. But the thing is, I don't have an electrical engineering degree or anything of the sort, so I just have to accept the fact that these are engineering feats or something, which is hard to quantify because these things tend to be intangible.

 

I'M SO LOST BRO WHAT DOES ALL THIS MEAN

 

Just how similar are different architectures, from generation to generation and from brand to brand?

 

I've been under the impression that more transistors ≈ more performance, and again frequency is also important. So if the CPU (& GPU) business is effectively just a race to add as many transistors on a package, what does "architecture" even refer to? 

 

I suppose, like anything, the concept isn't predicated on a single aspect of itself. So in this case, it's hardly as if CPU performance is strictly contingent on frequency or strictly contingent on process tech; it's probably a combination of all these elements. I'm just struggling to understand how exactly this all adds up.

 

I'm sorry for the absolute RAMBLE this turned out to be, but I'm just really lost in terms of the meaning of "architecture" in terms of microprocessor technology. Any thoughts are appreciated!


Node: nm refers to a rough picture of how large the transistor is, but it says nothing about how the node actually performs in terms of clocks or what the density is.

 

Essentially, the smaller the node, the higher the clocks that can be achieved and the lower the power consumption.

 

Architecture: the layout and buildup of memory buses between cores, caches and more. This is what decides how "wide" an architecture is. The wider the architecture, the higher the IPC. Bulldozer and FX were rather narrow architectures with small cores, and they accordingly could reach really high clocks.

 

IPC isn't its own architecture, but is rather tied to how wide the architecture is, how well the caches are laid out and how well they perform.

 

Ring bus is fantastic in the sense that it has very low core-to-core latency and memory latency between caches and so forth. Sadly, it has limited scalability beyond 12 cores as the ring gets longer and longer. Essentially, the more cores you add, the worse off the ring bus will be.
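To put a rough number on "the ring gets longer and longer", here is a toy sketch that counts the average number of hops between two stops on a bidirectional ring. It is only an illustration: real ring buses also carry cache slices, the memory controller and other agents, and contention is ignored entirely. The function name is made up for this example.

```python
def average_ring_hops(stops: int) -> float:
    """Average shortest-path hop count between two distinct stops on a
    bidirectional ring (traffic can travel either way around)."""
    if stops < 2:
        raise ValueError("a ring needs at least two stops")
    # For a stop that is d positions away clockwise, the shorter route is
    # either d hops clockwise or (stops - d) hops counter-clockwise.
    total = sum(min(d, stops - d) for d in range(1, stops))
    return total / (stops - 1)


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        print(f"{n:2d} stops: {average_ring_hops(n):.2f} hops on average")
```

In this simplified model the average distance keeps growing as stops are added, which is the scaling problem described above.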

 

Zen is fantastic in the sense that it can tie multiple separate dies together to work as a single CPU.

 

Pictures: a picture of a Threadripper/EPYC die (see the connection between each separate die)

 

Ring bus die. See how it gets longer with each extra set of cores.

[Images: AMD EPYC interconnect; 9900K mockup]


2 hours ago, GoldenLag said:

Node: nm refers to a rough picture of how large the transistor is, but it says nothing about how the node actually performs in terms of clocks or what the density is.

 

Thank you for clarifying this, but even with that considered, how come all the major semiconductor firms use such varying dimensions? Does this not muddy the waters in regard to how big the transistors actually are?

2 hours ago, GoldenLag said:

Architecture: the layout and buildup of memory buses between cores, caches and more. This is what decides how "wide" an architecture is. The wider the architecture, the higher the IPC. Bulldozer and FX were rather narrow architectures with small cores, and they accordingly could reach really high clocks.

To make an automotive analogy, is it sort of like if you have a smaller bore and stroke in an engine, you can increase the RPM to try and make up for some of the engine performance? Instead in this case, the "bore and stroke" would refer to the "width" of a cpu core and "RPM" would be frequency.

2 hours ago, GoldenLag said:

Architecture: the layout and buildup of memory buses between cores, caches and more. This is what decides how "wide" an architecture is. The wider the architecture, the higher the IPC. Bulldozer and FX were rather narrow architectures with small cores, and they accordingly could reach really high clocks.

About this "wider" thing you're talking about, why exactly do "wider" designs provide higher IPC and vice versa?

2 hours ago, GoldenLag said:

IPC isn't its own architecture, but is rather tied to how wide the architecture is, how well the caches are laid out and how well they perform.

I didn't mean to suggest that IPC is an "architecture". I'm still just having a bit of a hard time putting together what defines IPC and frequency, but I think that's what you were getting at with the "wider" description from earlier.


22 minutes ago, ExodusR said:

To make an automotive analogy, is it sort of like if you have a smaller bore and stroke in an engine, you can increase the RPM to try and make up for some of the engine performance? Instead in this case, the "bore and stroke" would refer to the "width" of a cpu core and "RPM" would be frequency.

Yes, it would roughly be the same.

23 minutes ago, ExodusR said:

About this "wider" thing you're talking about, why exactly do "wider" designs provide higher IPC and vice versa?

Essentially, you have more transistors working each time the CPU does a cycle, hence the higher IPC.

25 minutes ago, ExodusR said:

I didn't mean to suggest that IPC is an "architecture". I'm still just having a bit of a hard time putting together what defines IPC and frequency, but I think that's what you were getting at with the "wider" description from earlier.

I meant to write type/corner/thing.

 

IPC is defined by the width of the core and how well the architecture can feed the cores data (cache system, core-to-core latency, memory latency and more). You will often see that a smaller node also improves these parameters and not only the transistors themselves.
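A toy model may help show why both the width and the "feeding" matter. The sketch below is a deliberate oversimplification: it assumes every instruction is independent and that stalls (for example, waiting on cache or memory) simply add idle cycles. The function names and numbers are invented for illustration only.

```python
import math


def cycles_needed(instructions: int, issue_width: int, stall_cycles: int = 0) -> int:
    """Cycles to run `instructions` on a core that can start up to
    `issue_width` instructions per cycle, plus cycles lost to stalls.
    Assumption: all instructions are independent of each other."""
    if instructions < 1 or issue_width < 1 or stall_cycles < 0:
        raise ValueError("invalid workload or core parameters")
    return math.ceil(instructions / issue_width) + stall_cycles


def effective_ipc(instructions: int, issue_width: int, stall_cycles: int = 0) -> float:
    """Instructions actually completed per cycle, including stall cycles."""
    return instructions / cycles_needed(instructions, issue_width, stall_cycles)


if __name__ == "__main__":
    print("narrow core, well fed:", effective_ipc(1000, issue_width=2))
    print("wide core, well fed:  ", effective_ipc(1000, issue_width=4))
    print("wide core, starved:   ", effective_ipc(1000, issue_width=4, stall_cycles=250))
```

In this model a wide core that spends many cycles waiting on data ends up with the same effective IPC as a narrower core that is never stalled, which is why the cache system and latencies count as part of the picture.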

 

Frequency is also part of the architecture design, not only the width, but you can attribute most of it to the node used. The 14nm node used for Ryzen 1 targeted a 3GHz core clock and is a low-power node aimed at high efficiency. The 12nm Ryzen 2 is just a node-shrunk Ryzen 1 (long story short, the die size and footprint of the cores are the same, but each individual component got smaller). The 12nm node had a different GHz target, and its main feature is that 14nm designs could be used on it. Both nodes are from GlobalFoundries.


5 minutes ago, GoldenLag said:

You will often see that a smaller node also improves these parameters and not only the transistors themselves.

And part of this is because smaller lithographies literally move the components closer together, right?

6 minutes ago, GoldenLag said:

Frequency is also part of the architecture design, not only the width, but you can attribute most of it to the node used. The 14nm node used for Ryzen 1 targeted a 3GHz core clock and is a low-power node aimed at high efficiency. The 12nm Ryzen 2 is just a node-shrunk Ryzen 1 (long story short, the die size and footprint of the cores are the same, but each individual component got smaller). The 12nm node had a different GHz target, and its main feature is that 14nm designs could be used on it. Both nodes are from GlobalFoundries.

So the 14nm designs could be used on 12nm... From my theoretical example from earlier, the "FX14nm" chip, are you suggesting that the 32nm FX design couldn't be moved to a new node due to process limitations? If so what would those limitations be?


7 minutes ago, ExodusR said:

So the 14nm designs could be used on 12nm... From my theoretical example from earlier, the "FX14nm" chip, are you suggesting that the 32nm FX design couldn't be moved to a new node due to process limitations? If so what would those limitations be?

I don't have details on what the limitations are, but to move to a new node, certain considerations have to be taken into account to make the die function at all.

 

8 minutes ago, ExodusR said:

And part of this is because smaller lithographies literally move the components closer together, right?

I don't have any insight into that, but it most likely has to do with the performance of the individual components in the CPU, and not actually the physical distance between them.


4 minutes ago, GoldenLag said:

I don't have details on what the limitations are, but to move to a new node, certain considerations have to be taken into account to make the die function at all.

 

I don't have any insight into that, but it most likely has to do with the performance of the individual components in the CPU, and not actually the physical distance between them.

So at the very least, it's not like you can just copy/paste a design to a new node. Very interesting! 

I've learned a lot from this convo and this has cleared up a lot for me. Thanks a bunch @GoldenLag ❤️


This is what I think of this matter:

Architecture is like programming, for example:

Two programmers can each solve the same problem with a different program, and both arrive at the same result. But one programmer's code can be more efficient than the other's. For example, one of them might use 1000 lines of code and the other only 100.

Another thing: hardware acceleration of some tasks means that there are units in the CPU responsible for carrying out a certain task, which another CPU may not have. That does not mean the other CPU cannot perform the same task, only that it takes more time and more work to do it. For example, a CPU with a dedicated unit for AES is much faster in applications that use this algorithm than a CPU without one, because the CPU without the unit has to perform more mathematical calculations to get the same result (it's more or less like that).

Now, the manufacturing process matters a lot in all this, because the smaller it is, the more units like the ones mentioned above you can include in the CPU, or more cores, or more cache memory. The point here is that there is more space to add things that are useful ...

Another thing: a smaller manufacturing process is more efficient, meaning less heat is generated in the chip and therefore the working frequencies can be raised. But there has to be a trade-off between the frequency of the CPU and the number of functional components within it, because increasing both raises energy consumption and therefore temperature. These two factors sit in a balance, which the manufacturer tips one way or the other as it sees fit.
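One common first-order way to reason about this trade-off is the textbook approximation for dynamic (switching) power in CMOS logic, P ≈ α·C·V²·f. The sketch below uses it to compare relative power only. It ignores leakage, and the scale factors are hypothetical inputs, not measurements of any real chip.

```python
def relative_dynamic_power(freq_scale: float,
                           voltage_scale: float = 1.0,
                           units_scale: float = 1.0) -> float:
    """Dynamic power relative to a baseline, using P ~ alpha * C * V^2 * f.

    freq_scale    -- new clock / old clock
    voltage_scale -- new supply voltage / old supply voltage
    units_scale   -- new switched capacitance / old; used here as a stand-in
                     for "more functional units" (a simplifying assumption)
    Leakage power is ignored.
    """
    return units_scale * voltage_scale ** 2 * freq_scale


if __name__ == "__main__":
    # Hypothetical: 10% higher clock that needs 5% more voltage to stay stable.
    print("clock bump:   ", relative_dynamic_power(1.10, voltage_scale=1.05))
    # Hypothetical: twice the functional units at the same clock and voltage.
    print("double units: ", relative_dynamic_power(1.0, units_scale=2.0))
```

Because voltage enters squared, chasing higher clocks tends to cost more power than the clock increase alone suggests, which is one reason frequency and unit count have to be balanced against each other.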

 


As the manufacturing process shrinks there is more space, so execution units are added to accelerate certain tasks within the CPU.

Look at how Intel has stayed at 14nm since 2015. What they have been doing is optimizing; it is as if you organize your room every year and discover that you have a little more space. But there is nothing really new in Intel processors since 2015. When the 10nm processors arrive, you will see them announce new features. This has always been the case: in one year they introduced the manufacturing process shrink, and in the next year the new architecture arrived (this was known as tick-tock).

AMD's Ryzen 2000 arrived with higher core frequencies, improvements to RAM support and cache memory, and some other optimizations. This was possible because they dropped from 14nm to 12nm; otherwise the energy consumption goes up, and with it the temperatures.

There are many more technical details behind everything I have said, which I did not want to go into here, because I wanted the idea to be understood in a general way without complications.


1 hour ago, ExodusR said:

I'm still just having a bit of a hard time putting together what defines IPC and frequency,

IPC is Instructions per Clock-cycle.

Frequency is clock cycles per second.

Simplistically, you can multiply these to get instructions per second which indicates the raw performance of a CPU core.

An instruction is a basic operation of a CPU. Add two numbers, fetch something from cache, increment a counter.
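As a quick illustration of that multiplication, here is a minimal sketch. The two "cores" and their IPC and clock figures are made up for the example, and real IPC varies a lot from one workload to another, so treat this as the simplistic model it is.

```python
def instructions_per_second(ipc: float, frequency_hz: float) -> float:
    """Simplistic raw throughput of one core: IPC times clock frequency."""
    return ipc * frequency_hz


if __name__ == "__main__":
    # Hypothetical cores: A clocks higher, B does more work per cycle.
    core_a = instructions_per_second(ipc=1.0, frequency_hz=5.0e9)
    core_b = instructions_per_second(ipc=1.5, frequency_hz=4.0e9)
    print(f"core A: {core_a:.2e} instructions/s")
    print(f"core B: {core_b:.2e} instructions/s")
```

In this made-up case the lower-clocked core B comes out ahead, which is why comparing clock speeds alone across different architectures can be misleading.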

