
Carrizo To Deliver The Largest Performance per Watt Leap Ever, Coming in The Second Quarter of 2015

TERAFLOP

True. AMD, however, sold off their fabs (stupidity at its finest; they've ensured that their balls will forever be in the vice of the fabs), while Intel goes around blowing billions a year on R&D for the sake of doing more.

I actually get frustrated when people say "Intel does nothing to innovate". Really? Does anyone think that ANY of these companies sit on their asses doing nothing? They have roadmaps that stretch out for decades on some projects and product lines. They are always working, always tinkering. To say that what they do is only "reactionary" is foolish at best.

Both companies innovate; it's just that AMD has been a bit slack in recent years and just recycled a design that was flawed from the beginning. People who say otherwise have no idea what they are talking about.

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL


Both companies innovate; it's just that AMD has been a bit slack in recent years and just recycled a design that was flawed from the beginning. People who say otherwise have no idea what they are talking about.

Because creating an architecture does not take time...

/sarcasm


Seems like they used an old driver for the Kaveri.

A newer driver should boost Kaveri's performance to 543 Mpix/s.


Because creating an architecture does not take time...

/sarcasm

I know it takes time, but when they keep using the same one even though it's known to underperform, that's called being slack.


I know it takes time, but when they keep using the same one even though it's known to underperform, that's called being slack.

They did update it for multiple market segments.

What exactly are you asking for?

FX Steamrollers? (Which would most likely perform worse than the current FX Piledriver processors.)


They did update it for multiple market segments.

What exactly are you asking for?

FX Steamrollers? (Which would most likely perform worse than the current FX Piledriver processors.)

What I was asking for was a CPU that could compete with Intel's offerings at a better price/performance ratio, like their older CPUs. That of course hasn't happened, and as a result I've had to stick with Intel (I still haven't forgiven them for the Slot 1 Celerons, which were an absolute POS; I've still got them, and my AMD K6-2 beat the crap out of them at lower clock speeds while costing less).


What I was asking for was a CPU that could compete with Intel's offerings at a better price/performance ratio, like their older CPUs.

When they designed their older CPUs, their R&D budget was closer to Intel's. Now it is more like 1/10 of Intel's.

Looks like the second half of 2015 is going to be big for AMD! Really getting tired of Intel and Nvidia commanding their respective markets.

 

For those concerned about AMD's desktop CPU lineup, I think it may be a lost cause. Keep in mind that AMD made a net loss of $364 million in Q4 2014 and their year-over-year revenue drop was 22%: both significant figures. With the financial pressure AMD is experiencing, it is difficult to continue investing in product lines that do not present high-growth opportunities or large degrees of profitability. While I also want to see AMD release some highly competitive desktop CPUs, their focus on professional graphics, embedded and low-power clients may signal otherwise. With their financial burdens and reactive cost-cutting strategy over the last couple of years, I just don't think they have the capacity to delve back into desktop CPUs. If they can begin to consistently break even, then we may see their R&D budget open up for the desktop segment. But until then, we will probably see incremental improvements rather than the radical ones we all want to see.


In addition to @lak:

Lisa Su also said that AMD has had an "over-concentration in consumer PCs".

Again, don't expect big cores like Intel's, but instead smaller (but moar) Puma-on-steroids cores.


People need to realize Bulldozer was not a flawed "design"; it was a flawed implementation. I have posted about the emulators.com "articles" on the subject multiple times. The idea behind Bulldozer was sound and could have delivered everything AMD wanted and promised; the implementation was flawed, in that they did not build it the way their design goal needed. Had they done so, Bulldozer would have been at the very least competitive; at best it would have one-upped Intel for a chip or two. AMD cut corners where they shouldn't have. The philosophy behind Bulldozer and the design foundation were sound.


Seems like they used an old driver for the Kaveri.

A newer driver should boost Kaveri's performance to 543 Mpix/s.

No, that's at best a rumor.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


People need to realize Bulldozer was not a flawed "design"; it was a flawed implementation. I have posted about the emulators.com "articles" on the subject multiple times. The idea behind Bulldozer was sound and could have delivered everything AMD wanted and promised; the implementation was flawed, in that they did not build it the way their design goal needed. Had they done so, Bulldozer would have been at the very least competitive; at best it would have one-upped Intel for a chip or two. AMD cut corners where they shouldn't have. The philosophy behind Bulldozer and the design foundation were sound.

Talking about this one: http://www.emulators.com/docs/nx34_2011_avx.htm#Bulldozer ?

"I find their claim to be misleading, as it is based on confusing definitions of what a 'core' is versus a 'thread'."

Not versus a thread, but versus a cluster. AMD switched the original terminologies.

"If AMD simply said Bulldozer was quad-core with SMT"

But it is not SMT. It is MCMT, to be precise.

"But they got busted and their processor simply doesn't live up to the performance of a true 8-core AVX processor."

"true 8-core AVX processor" - WTF. Sure, it cannot run two 256-bit AVX instructions simultaneously, but that is far from the main issue.

"They added ... and added hyper-threading"

Jesus... I hope this is not the one you are referring to.

"They simply cloned a Core i7"

It keeps going. What is the author talking about?


SiSoft is amateur hour and an easily spoofed database that got fooled by the 8890k fiasco. That page isn't worth the bits it's encoded in.


Well, I just made sure you didn't limit the situations where an iGP is more beneficial than a dedicated GPU.

That is a ridiculous perspective. They might as well just have made the die half the size?

Wasn't this about whether or not the iGP would hamper the performance of a processor? (In a sense it can, because of thermals.)

It hampers the theoretical performance the CPU would have if it were the only thing on the die. I did not say it hampers it directly.

MacBook Pro 15' 2018 (Pretty much the only system I use)


It hampers the theoretical performance the CPU would have if it were the only thing on the die. I did not say it hampers it directly.

The theoretical performance isn't hampered at all. It's up to you to provide the volts and keep it cool.


That's not so true anymore. Look at the HD 5500 benchmarks vs. desktop Kaveri even now; once HD 6000 comes along, Intel will have an iGPU as powerful as the one on the 7850K.

Well, gotta bless the 14nm lithography then :P I mean, at this point, even if Intel does nothing to improve their architecture, they are so far ahead of everyone else from a manufacturing perspective that they have the advantage anyway. I mean seriously, it is not even funny at this point...


-snip-

 

I need to find the other piece on the architecture; it may have been on someone else's page he linked to, so it may not be on the emulators.com page itself. But aside from the obvious quibble over core-versus-thread nomenclature, what he is illustrating is that the hardware-multithreading ideal was sound as a design, given a complementary architecture, but AMD made tradeoffs that were far too negative compared to their earlier architectures and fell far too short of the Intel bar they were trying to vault. It also showed a sort of surrender on AMD's part on x86 performance. He winds up halfway vindicated by the push for HSA, but that push doesn't help Bulldozer in general, or FX CPUs in particular, so in hindsight all we can point to are the corners cut in the implementation of the architecture and the performance loss they incurred.

 

His focus has always been on virtualization and the necessities of running that efficiently, but his rundown is far more generalized and goes over the nuts-and-bolts failures in performance:

 

 

But Sandy Bridge has addressed the load port and partial EFLAGS stalls now, and it is on other design details that Bulldozer loses and loses big:

  • Bulldozer now has the same 4-cycle L1 cache latency as all the Core i5/i7 products. No longer the advantage of a 3-cycle L1 latency.
  • Bulldozer is now slower than Sandy Bridge at PUSHFD and LAHF arithmetic flags instructions, so again, no longer an advantage.
  • Bulldozer maxes out at 2 addition operations per cycle, compared to 3 in Sandy Bridge, meaning lower ILP on fundamental ALU operations.
  • Bulldozer still uses an older-style integer divider, needing 44 cycles to perform an integer division instead of 22 cycles. Similarly, Bulldozer needs 4 cycles for integer multiply instead of 3. Therefore integer scaling operations are slower than on Sandy Bridge.
  • Bulldozer, as with previous AMD products, is consistently slower at most MMX and SSE operations. For example, a simple register move between a 64-bit GPR and XMM (MOVD instruction) is 9 cycles instead of 1 on Sandy Bridge. This limits the ability to use SIMD registers as extensions of the integer register file.
  • A very key instruction introduced in SSSE3 - byte permute (PSHUFB) - is 3 cycles instead of 1 cycle. Practically throw darts at any other SSE instructions; they mostly tend to be slower.
  • L2 cache latency is almost twice as slow as on Intel parts, at about 21 or 22 cycles as opposed to about 12 on Sandy Bridge.
  • L3 cache latency appears to be about 44 cycles, comparable to older Intel parts but slower than Sandy Bridge's 35.
  • Executing self-modifying code, and thus dynamic generation of code in Java or .NET, appears to be about twice as slow as on Sandy Bridge.
  • CMPXCHG, a fundamental atomic instruction used for synchronization primitives and locks, appears to need about 50 cycles for an uncontended operation, more than twice as slow as Sandy Bridge.

Bulldozer exhibits lower ILP, slower EFLAGS operations, slower L2 and L3 cache latency, slower multiply and divide, slower MMX, slower SSE, slower dynamic code, and slower locks. When you have so many significantly slower numbers in so many of the x86 micro-benchmarks, it is almost certain you will get much lower IPC than Sandy Bridge on a given piece of x86 code. And that is the data that the benchmark sites reported to you back in October, minus the analysis I just gave you to explain why.

 

 

On the reference to SMT versus CMT, I believe he was speaking solely about marketing: not misleading the public on the tech, but being more honest about the structure and what the architecture actually did. It's not 8 cores; it's 4 cores with robust hardware multithreading. The average user doesn't know CMT from SMT; they just know cores and "Hyper-Threading" as an Intel trademark, and to a lesser extent multithreading as a tech. CMT gave AMD a temptation too sweet to resist: with the obvious misstep in performance, the selling point became "more cores" rather than the performance.

 

On the clone remark, he was referring directly to instruction sets and core/thread counts in that sentence, not the architecture. They mirrored the i7's headline specs and instructions rather than develop a truly competitive product. His contention is that at the most basic level they cloned the 4-core/8-thread setup and added the newer instruction sets. He uses "hyper-threading" as an Intel guy, and at the time that was the go-to term rather than multithreading, and many people balked right off the bat at AMD's "module/core" paradigm.

 

AMD took an idea with merit and married it to less advanced pieces and older architecture to make a cheaper, rather than faster, CPU. It COULD have been all it was touted to be, but the corners cut were too glaring. His was only the first piece I stumbled across referencing those bits of the architecture; after reading it I did some more searching at the time and found more analysis on the failings of the underpinnings. If I still have the bookmarks I will share them, but his site was the first I found that went into the underlying tech in Bulldozer that was letting down the side: not CMT itself, but AMD's implementation.

 

EDIT: It turns out I lost all my old tech links when my Firefox died; even my emulators.com links are gone, but I believe I have a backup I can check. If not, I know one of the extra articles was from AnandTech, and another was from some other ???tech site. I will do some more searching and PM you the links to their articles on the architecture. Many harped mostly on the slower caches, but that was not the primary culprit.


I'm surprised that Intel is letting AMD beat them in the APU market, especially with all the money Intel has.

Shouldn't AMD, with its purchase of ATI, be ahead of Intel by default in this field?


Shouldn't AMD, with its purchase of ATI, be ahead of Intel by default in this field?

You'd think so, but AMD hasn't really innovated in years, though they haven't really had the funding to do as much as Intel.


SiSoft is amateur hour and an easily spoofed database that got fooled by the 8890k fiasco. That page isn't worth the bits it's encoded in.

Couldn't the same statement be made about WCCFTech's article?

Also, did you really expect 2x the GP performance?

10-15% sounds more reasonable.


The theoretical performance isn't hampered at all. It's up to you to provide the volts and keep it cool.

It hampers the theoretical performance the CPU would have if it were the only thing on the die.

I was talking about die size, not voltage. Of course it will not affect it that much if it is well designed, but the performance of the CPU would be a lot better if it had the whole die dedicated to it.


I need to find the other piece on the architecture; it may have been on someone else's page he linked to, so it may not be on the emulators.com page itself. But aside from the obvious quibble over core-versus-thread nomenclature, what he is illustrating is that the hardware-multithreading ideal was sound as a design, given a complementary architecture, but AMD made tradeoffs that were far too negative compared to their earlier architectures and fell far too short of the Intel bar they were trying to vault. It also showed a sort of surrender on AMD's part on x86 performance. He winds up halfway vindicated by the push for HSA, but that push doesn't help Bulldozer in general, or FX CPUs in particular, so in hindsight all we can point to are the corners cut in the implementation of the architecture and the performance loss they incurred.

Yes, a lot of the Bulldozer design was "compiled" by a computer. (Intel does the same, but has more engineers to tighten it up.)

AMD then used the next generations (Piledriver, Steamroller and Excavator) to tighten up the components for lower latency, less leakage and so on.

The problem was that AMD's architecture philosophy moved away from the original LCU philosophy and towards the TCU philosophy.

Becoming less latency-oriented and more throughput-oriented.

MCMT itself is terrible in that it moves away from the LCU philosophy.

The creator did try to sell MCMT to Intel and AMD, but both declined.

However, AMD later suddenly pulled MCMT out again (they were out of ideas, basically).

 

His focus has always been on virtualization and the necessities of running that efficiently, but his rundown is far more generalized and goes over the nuts-and-bolts failures in performance

He never explains why its ILP is lower than Intel's offerings.

 

 

On the reference to SMT versus CMT, I believe he was speaking solely about marketing: not misleading the public on the tech, but being more honest about the structure and what the architecture actually did. It's not 8 cores; it's 4 cores with robust hardware multithreading. The average user doesn't know CMT from SMT; they just know cores and "Hyper-Threading" as an Intel trademark, and to a lesser extent multithreading as a tech. CMT gave AMD a temptation too sweet to resist: with the obvious misstep in performance, the selling point became "more cores" rather than the performance.

I do agree that it was a bad move to change the terminologies, as a 4-"core" AMD processor performs much worse than a 4-core Intel processor.

 

On the clone remark, he was referring directly to instruction sets and core/thread counts in that sentence, not the architecture. They mirrored the i7's headline specs and instructions rather than develop a truly competitive product. His contention is that at the most basic level they cloned the 4-core/8-thread setup and added the newer instruction sets. He uses "hyper-threading" as an Intel guy, and at the time that was the go-to term rather than multithreading, and many people balked right off the bat at AMD's "module/core" paradigm.

But it is not a copy. They both use the x86 ISA.

Should they have developed their own ISA?

 

AMD took an idea with merit and married it to less advanced pieces and older architecture to make a cheaper, rather than faster, CPU. It COULD have been all it was touted to be, but the corners cut were too glaring. His was only the first piece I stumbled across referencing those bits of the architecture; after reading it I did some more searching at the time and found more analysis on the failings of the underpinnings. If I still have the bookmarks I will share them, but his site was the first I found that went into the underlying tech in Bulldozer that was letting down the side: not CMT itself, but AMD's implementation.

The design would not have been much better. It is a 2 ALU/AGU design (because it can only keep 1-2 ALUs busy at a time).

If they were to increase it to a 3 ALU/AGU design (like the K10 architecture, which barely even used the 3rd ALU), they would need to quadruple the logic.

Because it is not just about having the most execution units; it is about keeping them busy.


Couldn't the same statement be made about WCCFTech's article?

Also, did you really expect 2x the GP performance?

10-15% sounds more reasonable.

I tried to outline the same thing from the same article, though we apparently don't have anyone here on LTT with an A10-7850K to validate these findings.

 

So in comes a conundrum. According to this source, the GPGPU gains of Carrizo are roughly only 11% over desktop Kaveri.

 

In their testing they were able to achieve said numbers with Catalyst 14.2.

Catalyst_14-2_Drivers_Sandra_GP.png

 

The tests conducted in my original post were with newer drivers (14.4). The SiSoft database does not reflect this source's numbers. So I think we need someone here on LTT with an A10-7850K to run Sandra 2014 with the 14.2 drivers and again with the latest 14.12. This will help validate whether there was a drastic increase in GPGPU capabilities or whether it's just the 14.2 driver being optimized for GPGPU (which would also be interesting).


I tried to outline the same thing from the same article, though we apparently don't have anyone here on LTT with an A10-7850K to validate these findings.

Yes, I saw that, as the WCCFTech article used your thread as a source.
