Well guys, I've been doing my own research and analysis of AMD and Intel CPU performance for some time. I did not run any tests personally, but I tried to gather as much knowledge as possible to answer a question that has been on my mind: how far ahead is Intel, really? That question led me to some very interesting realizations.
The discussion will be highly technical and might seem boring to some. I'll start off with the basics and end with the answer to the original question, plus several others.
OK, let's begin. Our discussion will center on AMD's modular Bulldozer/Piledriver architecture and Intel's Core series from Westmere through Sandy Bridge, Ivy Bridge, and Haswell.
The Architectures : AMD Bulldozer/Piledriver
The building block of AMD's modular architecture is not an x86 core but an x86 module. The module contains everything a core contains, but with a number of components doubled: the integer scheduler, its datapath, a 16KB L1 data cache, and a load/store unit.
The doubling of these components is what gives the AMD module essentially two cores/threads.
The integer cores share the early pipeline stages (e.g. L1i, fetch, decode), the floating point unit, and the L2 cache with the rest of the module.
How this affects performance
Depending on the workload, the quad-module, eight-core AMD processors can perform either as eight processing units or as four. Which one you get depends entirely on whether the workload leans on integer or floating point calculations.
If integer calculations are needed there are eight integer pipelines that can be used, however if floating point calculations are needed there are only four.
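To make the eight-versus-four point concrete, here is a rough sketch in Python. The function name and the all-or-nothing split between "integer" and "fp" workloads are my own simplifications for illustration; real workloads mix both kinds of instructions.

```python
def usable_pipelines(modules: int, workload: str) -> int:
    """Rough count of pipelines a Bulldozer/Piledriver chip can use.

    Assumptions (simplified model, not an AMD specification):
    - each module has 2 integer cores and 1 shared floating point unit
    - the workload is treated as purely 'integer' or purely 'fp'
    """
    if workload == "integer":
        return modules * 2  # two integer cores per module
    if workload == "fp":
        return modules      # one shared floating point unit per module
    raise ValueError("workload must be 'integer' or 'fp'")


fx_8350_modules = 4  # quad-module, eight-core part
print(usable_pipelines(fx_8350_modules, "integer"))  # 8
print(usable_pipelines(fx_8350_modules, "fp"))       # 4
```

Treat this as a mental model only. It just restates the module layout described above.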
Here are examples of the effect this has on performance.
In 7-Zip, the compression workload is very integer intensive, so a quad-module AMD CPU performs well because it can run eight integer threads.
However, in floating-point heavy workloads like the synthetic render scene in Cinebench, the eight integer pipelines are held back by the four floating point pipelines in the 8350, which prevents the CPU from reaching its full performance potential.
If the 8350 had eight floating point units, and assuming perfect scaling across them, the single-threaded score would be multiplied by 8, resulting in a score of 8.8.
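The arithmetic behind the 8.8 figure can be sketched like this. The single-threaded score is not quoted directly above, so the value used here is simply what 8.8 divided by 8 implies. Perfect linear scaling is an assumption of this idealized model, not a measurement.

```python
def ideal_multithreaded_score(single_thread_score: float, fp_units: int) -> float:
    """Idealized score: single-threaded score times the number of FP units.

    Assumption: perfect linear scaling with no shared-resource bottlenecks.
    """
    return single_thread_score * fp_units


# Single-threaded score implied by the 8.8 figure (8.8 / 8), not a measured value.
implied_single_thread = 8.8 / 8
print(implied_single_thread)                                # 1.1
print(ideal_multithreaded_score(implied_single_thread, 8))  # 8.8
```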
The sharing of the early pipeline stages like the decode stage can also present similar symptoms to the sharing of the floating point unit.
Elimination of this resource sharing can improve performance by up to 25%.
Here is the in-depth analysis.
With Steamroller (AMD's upcoming CPU architecture) decode resources will be doubled, which will minimize resource sharing and make it exclusive to the floating point unit.
What's also worth noting is that the majority of games contain a great amount of floating-point operations. In games where the floating-point workload becomes too intensive for the CPU, the developers code it for the GPU. Examples of this are Nvidia's PhysX, AMD's TressFX, and the Havok physics engine.
Why is AMD not paying attention to floating-point performance?
To answer this question we must understand AMD's goals. AMD combined GPGPU (General Purpose Graphics Processing Unit) cores with CPU cores in the APU, in what AMD calls a heterogeneous architecture. The aim is not necessarily to make a product that replaces entry-level discrete graphics cards. The end goal is to let the GPU cores handle floating point operations (and any parallel workload), because floating point workloads can often be massively parallel, so a GPU architecture would be able to crunch floating-point data orders of magnitude faster than a single large floating-point unit.
AMD's building block vs. Intel's.
AMD Bulldozer/Piledriver Module (2 integer cores & 1 floating point unit) on top.
Intel Westmere core (1 integer core and 1 floating point unit) on the bottom.
Both are built on the 32nm process, Sandy Bridge is also built on the 32nm process.
The sizes of Intel's & AMD's building blocks (excluding the L2/L3 2MB caches) are :
Sandy Bridge: 18.4mm^2
Bulldozer Module: 19.42mm^2
So, manufacturing process aside, a modern Intel core is roughly the same size as an AMD module.
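For anyone who wants to check the "roughly the same size" claim, here is a quick calculation using the two areas listed above. The variable names are mine. Splitting the module area evenly between its two integer cores is a simplification, since the front end and floating point unit are shared.

```python
sandy_bridge_core_mm2 = 18.4    # Intel core, figure from the list above
bulldozer_module_mm2 = 19.42    # AMD module, figure from the list above

# How much larger the AMD module is than the Intel core.
area_ratio = bulldozer_module_mm2 / sandy_bridge_core_mm2

# Simplification: split the module evenly between its two integer cores.
area_per_amd_integer_core = bulldozer_module_mm2 / 2

print(f"module / core area ratio: {area_ratio:.3f}")                      # 1.055
print(f"area per AMD integer core: {area_per_amd_integer_core:.2f} mm^2")  # 9.71 mm^2
```

So the module is only about 5.5% larger than the Intel core, which is why I treat the two as comparable building blocks.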
Size comparison :
OK, so now that we've established that two AMD "cores" are roughly the same size as one Intel Sandy/Ivy/Haswell core, let's dive into the benchmarks.
We'll be comparing the FX 8350 (die size = 315mm^2) and the i7 3820 (die size = 294mm^2). The FX 8350 is approximately 21mm^2 larger because of its somewhat larger cache pool.
So we're noticing a pattern here, one we're all familiar with: single-threaded workloads run faster on the larger Intel cores, but anything that utilizes all available threads runs faster on the smaller, more parallel architecture of the AMD module.
This is nothing new. The 4 extra logical threads (hyper-threading) of Intel's i7 processors help them keep up with AMD's eight physical integer cores.
The physical integer cores still maintain a performance lead, though, and the size of that lead depends on how well optimized the workload is for hyper-threading.
But what about Intel CPUs that don't support hyper-threading?
In that case, 3 AMD modules become as fast as 4 Intel cores as long as they are clocked 500MHz higher. A 500MHz delta isn't really large, considering that Haswell i5s usually top out at 4.3GHz while the 6-core AMD processors can reach 4.8GHz on the same cooling.
FX 6300 @ 3.5GHz vs i5 4430 constant turbo on all 4 cores @ 3.0GHz.
FX 6350 @ 3.9GHz vs i5 4570 constant turbo on all 4 cores @ 3.4GHz.
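Here is a back-of-the-envelope way to read those two pairings. It takes the parity claim above at face value (3 modules matching 4 cores at the listed clocks) and works out what that implies per clock. The function name is made up for this sketch, and the result is only as good as the parity assumption.

```python
def module_vs_core_per_clock(amd_ghz: float, intel_ghz: float,
                             modules: int = 3, cores: int = 4) -> float:
    """Per-clock throughput of one AMD module relative to one Intel core.

    Assumption: `modules` AMD modules at amd_ghz deliver the same total
    throughput as `cores` Intel cores at intel_ghz (hyper-threading unused).
    """
    return (cores * intel_ghz) / (modules * amd_ghz)


pairs = [
    ("FX 6300", 3.5, "i5 4430", 3.0),
    ("FX 6350", 3.9, "i5 4570", 3.4),
]

for amd, amd_ghz, intel, intel_ghz in pairs:
    delta_mhz = round((amd_ghz - intel_ghz) * 1000)
    ratio = module_vs_core_per_clock(amd_ghz, intel_ghz)
    print(f"{amd} vs {intel}: +{delta_mhz}MHz, module/core per clock = {ratio:.2f}")
# FX 6300 vs i5 4430: +500MHz, module/core per clock = 1.14
# FX 6350 vs i5 4570: +500MHz, module/core per clock = 1.16
```

In other words, if the parity holds, one module does a bit more work per clock than one Haswell core, while each of its two integer cores does a bit more than half.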
What about games?
Back to my original question, which is: how far ahead is Intel?
A dual-core AMD module which takes roughly the same die area as a single Intel core has significantly more throughput but also higher single-thread latency.
So architecturally speaking, Intel isn't really ahead of AMD, nor is AMD ahead of Intel. The parallel nature of AMD's Bulldozer modules gives them the total performance advantage over the Intel cores. However, what Intel's architecture lacks in total CPU throughput it makes up for in single-threaded performance augmented with hyper-threading.
In an absolute sense, an AMD module can process more data per second than an Intel core of the same size. The sacrifice for this higher throughput is single-threaded performance, as we've discussed above. That said, the majority of CPU-intensive programs are already multi-core/thread reliant, and the ones that aren't are becoming so more rapidly as time passes. That will in turn make this trade of single-threaded performance for more total performance worth it for casual and power users alike.
The reason Intel stuck to architectures focused on single-threaded performance is that it's easier to write code for one thread than for two, for two than for three, and so on. It's also much more complicated to design a multi-threaded module that shares resources, as AMD did. But it's becoming extremely difficult to push more performance out of a single thread in the highly power-constrained computing environments that have become the norm today, which makes the move to a higher level of parallelism inevitable.
Where Intel is really ahead of AMD is not the architecture, it's the manufacturing process. Intel has had 22nm at its disposal for nearly two years, while AMD has only now moved to the 28nm process technology provided by GlobalFoundries. This is because AMD sold off its fabrication plants, which leaves the company no option but to use whatever manufacturing process is offered by third parties like GlobalFoundries and TSMC.
The fabrication process gives Intel two advantages over AMD. The first is power efficiency: even though 3 AMD Piledriver modules match 4 Intel Haswell cores in performance when hyper-threading is unused, the 4 Intel cores will consume less power.
And even though smart technology implementations and fabrication tricks can enhance the power efficiency of a chip, generally speaking the die size of a CPU or GPU reflects power consumption at load extremely well, so with the more advanced manufacturing process Intel can keep die sizes small to benefit from the power and cost savings.
Which leads us to Intel's second advantage, which is higher profit margins, and that's where Intel is really ahead of AMD.
It was a long ride, I hope you've enjoyed it.