
For my first post here on the LTT Forums, I thought I would write something a bit in-depth. AMD's Ryzen launch finally shed some light on the underdog, and sales of the older FX series have increased along with it. Because the architecture of the FX series is confusing, many people make false claims such as "the FX-8xxx and FX-9xxx series only have four cores" and "AMD's Bulldozer/Piledriver cores are weaker than Intel's cores". I will explain why people believe these claims, and also why they are incorrect.

 

I will first explain the naming scheme of the FX series, using the FX-4xxx lineup as an example. The "4" in FX-4350 stands for the number of cores. The "3" stands for the generation. You may be wondering, "How are there three generations when there were only Zambezi and Vishera?" Well, some Zambezi CPUs were revised halfway through the product cycle. There was the FX-6100, where the "1" shows it is first generation, and later the FX-6200, which is second generation. With generations explained, we can get back to the naming scheme. The "5" in FX-4350 marks a revision, or a step above another product such as the FX-4300. Usually, a higher model number indicates more performance.
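As a rough illustration of the naming scheme described above, here is a minimal Python sketch that splits a model name into its digits. The `FxModel` type and `parse_fx_model` function are hypothetical names made up for this example, and it only handles plain four-digit names such as FX-4350. One caveat: the "first digit equals core count" reading does not hold for the FX-9xxx parts, which are eight-core chips, so the sketch special-cases them.

```python
import re
from typing import NamedTuple


class FxModel(NamedTuple):
    cores: int        # first digit (FX-9xxx is treated as 8 cores)
    generation: int   # second digit, as the post reads it
    revision: int     # last two digits, e.g. 50 in FX-4350


def parse_fx_model(name: str) -> FxModel:
    """Decode a plain four-digit FX model name, following the post's reading."""
    match = re.fullmatch(r"FX-(\d)(\d)(\d{2})", name.strip().upper())
    if match is None:
        raise ValueError(f"not a four-digit FX model name: {name!r}")
    first, generation, revision = (int(group) for group in match.groups())
    # FX-9xxx chips are eight-core parts, so the first digit is not a core count there.
    cores = 8 if first == 9 else first
    return FxModel(cores, generation, revision)


print(parse_fx_model("FX-4350"))
print(parse_fx_model("FX-6100"))
```

Suffixed models (for example the "E" variants) are deliberately rejected here rather than guessed at.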

 

Now that you understand the naming of the FX series SKUs, we can dive into the actual Bulldozer/Piledriver architecture. I will start by explaining what the Bulldozer and Piledriver names mean. Bulldozer is the architecture that Zambezi CPUs were based on, so the first and second generation CPUs. Piledriver is a slightly modified version of Bulldozer, used in the current third generation FX series CPUs.

Now for the technical aspects. Bulldozer/Piledriver uses something called CMT, or Clustered Multithreading, which implements cores in "modules". One AMD module consists of two cores, so an FX-8350 has four modules, for a total of eight cores. CMT allows more cores to be packed into the same or a smaller area than "traditional" cores by forcing the cores to share resources, such as a floating point unit, or FPU. AMD also chose CMT to reduce power consumption, as there are fewer dedicated parts drawing large amounts of power.

With that, I can address the false claims listed above. The FX series' CPUs have 100% real, physical cores. Even though Windows Task Manager may disagree and display only four physical cores and eight logical cores, there are really eight physical cores. When both cores in a module are running integer operations, the module acts as a dual core CPU. When both cores are running 128-bit FP operations, the module once again acts as a dual core. The module can also run a mix of integer and 128-bit FP operations and still perform similarly to a dual core CPU.

There is only one case in which the module acts as a single core CPU. Both cores in the module share two FPUs, each 128 bits wide, but for situations where a program calls for 256-bit code, AMD developed a solution called Flex FP. Flex FP allows the two 128-bit FPUs to combine into one 256-bit FPU. When this happens, one core can no longer run floating point code because both FPUs are being used by the other core for 256-bit operations. The core that cannot use the FPU can still run integer code while the other core runs floating point and integer code. At the time of release, very few programs used 256-bit code, which largely eliminated the need to disable a core's floating point capabilities.

The second point I would like to touch on is that AMD CPUs do not use Hyper-Threading, which is Intel's brand name for its own implementation. AMD's Ryzen CPUs use SMT, or Simultaneous Multithreading, which is very similar, but it is not Hyper-Threading. SMT is the alternative to CMT, with an SMT core being roughly the same size as one CMT module, possibly packing more ALUs or AGUs. SMT has one core, but can split its ALUs across two threads, effectively creating two smaller "cores".
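The module arithmetic in this post (two cores per module, two 128-bit FP units that Flex FP can fuse for 256-bit work) can be put into a small toy model. This is only a sketch of the description above, not of the real hardware's scheduling; the constant and function names are made up for illustration.

```python
CORES_PER_MODULE = 2      # as described in the post
FP_UNITS_PER_MODULE = 2   # the post's "two 128-bit FPUs" shared by the module
FP_UNIT_WIDTH_BITS = 128


def total_cores(modules: int) -> int:
    """Core count for a chip with the given number of modules."""
    return modules * CORES_PER_MODULE


def fp_ops_per_cycle(op_width_bits: int) -> int:
    """FP operations of this width one module can start per cycle (toy model)."""
    if op_width_bits not in (128, 256):
        raise ValueError("this toy model only covers 128-bit and 256-bit operations")
    units_needed = op_width_bits // FP_UNIT_WIDTH_BITS
    return FP_UNITS_PER_MODULE // units_needed


print(total_cores(4))         # four modules, as in the FX-8350 example
print(fp_ops_per_cycle(128))  # both cores can issue a 128-bit FP op
print(fp_ops_per_cycle(256))  # one fused 256-bit op occupies both units
```

In this model a module behaves like a dual core for 128-bit floating point work and like a single core only while a 256-bit operation holds both units.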

 

Another misconception about the Bulldozer/Piledriver architecture is that the cores are "weaker" than Intel's. This is simply not true. On paper, Bulldozer should have crushed Intel's offerings. AMD supported many of the same instructions as Sandy Bridge. In fact, Bulldozer supported more instructions (XOP, FMA4, F16C) than Sandy Bridge, but developers rarely used them. The real reasons AMD's Bulldozer architecture underperformed were branch misprediction due to an underpowered branch predictor, poor instructions per clock (IPC), and a long pipeline. Bulldozer's IPC was pretty bad: the 2010 Phenom II X6 1090T could outperform some of the FX CPUs clock-for-clock. The FX CPUs could have crushed the Phenom CPUs with their high clock speeds, but the long pipeline made every branch misprediction more costly, because the CPU has to flush the pipeline and redo the work it had started speculatively.
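To see why a longer pipeline makes mispredictions hurt more, here is a minimal sketch using the common textbook approximation: effective CPI = base CPI + (fraction of branches) x (misprediction rate) x (flush penalty in cycles). Every number below is invented for illustration and is not a measurement of Bulldozer, Phenom, or Sandy Bridge.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty_cycles: int) -> float:
    """Average cycles per instruction once misprediction stalls are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Hypothetical workload: 20% branches, 5% of them mispredicted.
short_pipe = effective_cpi(1.0, 0.20, 0.05, flush_penalty_cycles=12)
long_pipe = effective_cpi(1.0, 0.20, 0.05, flush_penalty_cycles=20)

print(f"short pipeline CPI: {short_pipe:.2f}")
print(f"long pipeline CPI:  {long_pipe:.2f}")
```

With the same predictor accuracy, the deeper pipeline ends up with the higher CPI, and a weaker predictor (a larger `mispredict_rate`) widens the gap further.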

 

By all means, I am not an AMD fanboy, nor am I an Intel fanboy. I have covered some of the pros and cons of the FX CPUs. I am a proud owner of an FX-4300, and believe me, that thing has held up great. It even runs well despite my motherboard having a broken VRM. I might not be able to play the newest games at a solid 60 FPS, but I still get by on Battlefield 1 and other somewhat demanding games. I am very surprised at how long the FX CPUs have lasted. Even with much lower Hypertransport speeds than Intel's solution, the FX CPUs can still compete. The FX CPUs could have been amazing if developers had used them to their full potential. I believe the FX CPUs were simply underutilized, and with DirectX 12, we're starting to see some major performance improvements in gaming. I am also very surprised at how slowly the FX CPUs have aged. I think to myself, "how can a six year old architecture still compete with the newest technology?" We all know the answer to that question though. Intel simply milks the consumer by releasing a modified architecture every year with minimal performance increases. What I would really like to see is a new AM3+ motherboard lineup with PCIe 3.0 and newer peripherals. If you read this whole post, I appreciate it. I apologize for any confusing parts of this post or any parts that don't make sense as I haven't looked over it for mistakes. :P

https://linustechtips.com/topic/803736-amd-bulldozerpiledriver-explained/

Wow nice first post

 

 

Don't we already know all of this?

Also I think I know why people say 8xxx have 4 cores. They share cache and other supplies so some people call it "a rip off of hyperthreading."

 

I don't really, I don't want to argue with you. 


I like how you first claim that their cores are not weaker than Intel's, and then you go ahead and describe exactly why they are weaker.


The same things could've been said about the Pentium 4. It should've worked well if only developers learned how to work with it. But AMD offered a more "traditional" approach that didn't require specific optimizations. Probably. I guess though the FX has some vindication today now that more applications are starting to use more threads, but the architecture is almost 6 years old now and Ryzen just put it out of its misery.

 

But I do want to nitpick on a few bits:

52 minutes ago, BuckyJunior said:

The FX series' CPUs have 100% real, physical cores. Even though Windows Task Manager may disagree and display only four physical cores and eight logical cores, there are really eight physical cores. When both cores in a module are running integer operations, the module acts as a dual core CPU. When both cores are running 128-bit FP operations, the module once again acts as a dual core. The module can also run a mix of integer and 128-bit FP operations and still perform similarly to a dual core CPU. There is only one case in which the module acts as a single core CPU. Both cores in the module share two FPUs, each 128 bits wide, but for situations where a program calls for 256-bit code, AMD developed a solution called Flex FP. Flex FP allows the two 128-bit FPUs to combine into one 256-bit FPU. When this happens, one core can no longer run floating point code because both FPUs are being used by the other core for 256-bit operations. The core that cannot use the FPU can still run integer code while the other core runs floating point and integer code. At the time of release, very few programs used 256-bit code, which largely eliminated the need to disable a core's floating point capabilities. The second point I would like to touch on is that AMD CPUs do not use Hyper-Threading. AMD's Ryzen CPUs use SMT, or Simultaneous Multithreading, which is very similar, but it is not Hyper-Threading. SMT is the alternative to CMT, with an SMT core being roughly the same size as one CMT module, possibly packing more ALUs or AGUs. SMT has one core, but can split its ALUs across two threads, effectively creating two smaller "cores".

It depends on what your definition of a "CPU core" is. A module has a single front-end with effectively two disjoint back-ends. And the thing with CMT is that it's more rigid than SMT. With SMT, if you had four ALUs, you could run a thread that needed three alongside one that needed one in a single pass. With CMT, since the ALUs are split into fixed clusters, the thread that needs three has to spend another cycle in its integer cluster.
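The four-ALU example above can be worked through in a few lines of Python. This is only a toy model that counts issue cycles and ignores dependencies and everything else a real scheduler does. The function names are made up for illustration, and the split of two ALUs per cluster is the assumption implied by the example, not a hardware specification.

```python
from math import ceil


def cycles_smt(alu_demands: list[int], shared_alus: int) -> int:
    """Cycles needed when all threads draw from one shared pool of ALUs."""
    return ceil(sum(alu_demands) / shared_alus)


def cycles_cmt(alu_demands: list[int], alus_per_cluster: int) -> int:
    """Cycles needed when each thread is confined to its own fixed cluster."""
    return max(ceil(demand / alus_per_cluster) for demand in alu_demands)


demands = [3, 1]  # one thread wants three ALU ops, the other wants one

print("SMT, 4 shared ALUs:       ", cycles_smt(demands, shared_alus=4))
print("CMT, 2 ALUs per cluster:  ", cycles_cmt(demands, alus_per_cluster=2))
```

The shared pool finishes in one pass, while the fixed split needs a second cycle for the thread that wanted three ALUs, even though one ALU in the other cluster sat idle.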

 

Also the FPU is "single core", there's only one scheduler in it. It can do two 128-bit values at the same time yes, but this is more akin to superscalar execution than being a "dual-core" setup. Otherwise we can claim that AMD's Vega FE has 8192 shader units vs 4096 because even though each shader unit is an FP32 core, it can do FP16 at double the rate (it doesn't work like that).

 

Quote

By all means, I am not an AMD fanboy, nor am I an Intel fanboy.

This is inviting people to heckle you that you are one or the other and doesn't add anything to the post.

Quote

 I might not be able to play the newest games at a solid 60 FPS, but I still get by on Battlefield 1 and other somewhat demanding games. I am very surprised at how long the FX CPUs have lasted. Even with much lower Hypertransport speeds than Intel's solution, the FX CPUs can still compete.

People are still very satisfied with their Sandy Bridge setups.

Quote

I believe the FX CPUs were simply underutilized, and with DirectX 12, we're starting to see some major performance improvements in gaming.

Er, not really. Most DX12 implementations so far have been lackluster. And most of the major improvements going from DX11 to DX12 showed up if you were using low-end hardware to begin with. So even a Skylake Core i3 will go toe-to-toe with an i7 in DX12 when the cards are played right.

 

I mean, it's great if you held out for this long, but all DX12 and Vulkan do is lower the CPU requirement bar.

 

Quote

I am also very surprised at how slowly the FX CPUs have aged. I think to myself, "how can a six year old architecture still compete with the newest technology?" We all know the answer to that question though. Intel simply milks the consumer by releasing a modified architecture every year with minimal performance increases.

You act as if every other company releases a next generation by building a completely new architecture and releasing it. ARM milked their ARMv7 architecture for years. NVIDIA is on the third iteration of Maxwell, if you will (Pascal is basically Maxwell Gen 3 with additional hardware features), and before that, they've essentially used the same architecture based on the G80 for five generations. AMD is on the 5th generation of GCN, which hasn't changed much.

 

EDIT: I would also argue that Intel's performance improvements have been minimal (and define "minimal") because efficiency has been of the utmost importance to Intel. Their bread and butter has been laptops and servers, where efficiency trumps overall performance. When you keep reducing something by 20%, sooner or later the absolute values are going to shrink to the point of not being appreciable. Now that AMD's strategy is to provide more cores for cheap, Intel has to respond to that, even though Ryzen's single core performance isn't all that much better than Haswell's. And developers have been following suit, sort of. That or people just found the right combination of software to run that makes use of more cores.
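The point about repeated 20% reductions is just compounding, and a quick Python sketch makes it concrete. The starting value of 100 is an arbitrary unit picked for illustration (think "watts" or "milliseconds"), not a real figure for any CPU, and `successive_savings` is a made-up helper name.

```python
def successive_savings(start: float, reduction: float, generations: int) -> list[float]:
    """Absolute amount saved in each generation when each one cuts the rest by `reduction`."""
    savings = []
    value = start
    for _ in range(generations):
        saved = value * reduction
        savings.append(saved)
        value -= saved
    return savings


for generation, saved in enumerate(successive_savings(100.0, 0.20, 6), start=1):
    print(f"generation {generation}: saved {saved:.1f} units")
```

Each generation still delivers the same 20% relative gain, but the absolute saving keeps shrinking, which is why later steps feel smaller.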

Quote

What I would really like to see is a new AM3+ motherboard lineup with PCIe 3.0 and newer peripherals. If you read this whole post, I appreciate it. I apologize for any confusing parts of this post or any parts that don't make sense as I haven't looked over it for mistakes. :P

Why? It's effectively a dead platform now that Ryzen has been released. Anything AM3+ can do, Ryzen can do faster, stronger, better.


  • 4 months later...
On 7/6/2017 at 8:07 PM, Enderman said:

I like how you first claim that their cores are not weaker than Intel's, and then you go ahead and describe exactly why they are weaker.

The thing is, branch misprediction caused Bulldozer's notoriously low IPC. When branch prediction worked, the issues disappeared, allowing Bulldozer to perform comparably to or better than Sandy Bridge. There are also instructions Sandy Bridge lacked, such as FMA4, XOP, and F16C, which, if used by developers, could have given Bulldozer the ability to outperform Sandy Bridge. My point is, the cores aren't "weaker" because they have the ability to work, just not 100% of the time.
