
Benchmarks of the Skylake i7 6700K have emerged... Only marginally faster than 4790K...

Overl0rd

This is what happens when you have little to no competition in the market. Be happy there is still AMD v. Nvidia.

My PC: i7 3770k @ 4.4 GHz || Hyper 212 Evo || Intel Extreme Motherboard DZ77GA || EVGA Hybrid 980ti || Corsair Vengeance Blue 16GB || Samsung 840 Evo 120 GB || WD Black 1TB

 

Peripherals: Corsair K70 RGB || Sentey Pro Revolution Gaming Mouse || Beyerdynamic DT 990 Premium 250 Ohm Headphones || BenQ XL2420Z Monitor


This is what happens when you have little to no competition in the market. Be happy there is still AMD v. Nvidia.

I appreciate that you clearly neither took any salt yourself nor read even as far as page 2, where it's noted that (through no fault of the OP on this forum) these 'benchmarks' are total bullshit.

 

The only way they can be true, as shown by post #30 on page 2, is if Skylake is legitimately WORSE than Broadwell.

 

@Overl0rd please add [rumor] to the title and copious salt to your OP. This is exceptionally misleading.

LINK-> Kurald Galain:  The Night Eternal 

Top 5820k, 980ti SLI Build in the World*

CPU: i7-5820k // GPU: SLI MSI 980ti Gaming 6G // Cooling: Full Custom WC //  Mobo: ASUS X99 Sabertooth // Ram: 32GB Crucial Ballistix Sport // Boot SSD: Samsung 850 EVO 500GB

Mass SSD: Crucial M500 960GB  // PSU: EVGA Supernova 850G2 // Case: Fractal Design Define S Windowed // OS: Windows 10 // Mouse: Razer Naga Chroma // Keyboard: Corsair K70 Cherry MX Reds

Headset: Senn RS185 // Monitor: ASUS PG348Q // Devices: Note 10+ - Surface Book 2 15"

LINK-> Ainulindale: Music of the Ainur 

Prosumer DIY FreeNAS

CPU: Xeon E3-1231v3  // Cooling: Noctua L9x65 //  Mobo: ASRock E3C224D2I // Ram: 16GB Kingston ECC DDR3-1333

HDDs: 4x HGST Deskstar NAS 3TB  // PSU: EVGA 650GQ // Case: Fractal Design Node 304 // OS: FreeNAS

 

 

 


I appreciate that you clearly neither took any salt yourself nor read even as far as page 2, where it's noted that (through no fault of the OP on this forum) these 'benchmarks' are total bullshit.

 

The only way they can be true, as shown by post #30 on page 2, is if Skylake is legitimately WORSE than Broadwell.

 

@Overl0rd please add [rumor] to the title and copious salt to your OP. This is exceptionally misleading.

I did read over page 2. The reason I chose to ignore your post until now is that it was even less credible than the OP's (and you are trying to belittle people for some reason). Let me tell you why you are spouting nonsense, but first let me cite the source you failed to link for the benchmarks of Skylake and Broadwell: http://www.maximumpc.com/intel-broadwell-dt-core-i7-5775c-review/

 

You are comparing a CPU at STOCK to one that is overclocked. Do you even know what that word means? You use it in post #30. I don't even know why you are using the i7 5775C as a reference, because if you look at its scores they are considerably lower than the 4790K's at stock.

 

Also, there is little point in comparing results across different benchmark runs, because for the most part the setups are all different, even if the scores should come out similar.

 

http://www.bit-tech.net/hardware/2014/06/19/intel-core-i7-4790k-devil-s-canyon-review/6

 

In this benchmark the 4790K scores 882 in the multi-core test, so by your own method of deduction the OP is on point.

 

 

In the end my point stands: if this is how the Skylake i7 6700K performs, it had better OC like a beast, or it will show how lazy Intel is getting. As always, 'emerging benchmarks' should be taken with a grain of salt, and the title might as well have a rumor tag on it.

My PC: i7 3770k @ 4.4 GHz || Hyper 212 Evo || Intel Extreme Motherboard DZ77GA || EVGA Hybrid 980ti || Corsair Vengeance Blue 16GB || Samsung 840 Evo 120 GB || WD Black 1TB

 

Peripherals: Corsair K70 RGB || Sentey Pro Revolution Gaming Mouse || Beyerdynamic DT 990 Premium 250 Ohm Headphones || BenQ XL2420Z Monitor


I did read over page 2. The reason I chose to ignore your post until now is that it was even less credible than the OP's (and you are trying to belittle people for some reason). Let me tell you why you are spouting nonsense, but first let me cite the source you failed to link for the benchmarks of Skylake and Broadwell: http://www.maximumpc.com/intel-broadwell-dt-core-i7-5775c-review/

 

You are comparing a CPU at STOCK to one that is overclocked. Do you even know what that word means? You use it in post #30. I don't even know why you are using the i7 5775C as a reference, because if you look at its scores they are considerably lower than the 4790K's at stock.

 

Also, there is little point in comparing results across different benchmark runs, because for the most part the setups are all different, even if the scores should come out similar.

 

http://www.bit-tech.net/hardware/2014/06/19/intel-core-i7-4790k-devil-s-canyon-review/6

 

In this benchmark the 4790K scores 882 in the multi-core test, so by your own method of deduction the OP is on point.

 

 

In the end my point stands: if this is how the Skylake i7 6700K performs, it had better OC like a beast, or it will show how lazy Intel is getting. As always, 'emerging benchmarks' should be taken with a grain of salt, and the title might as well have a rumor tag on it.

Actually, two posts later I did cite it (#36), and I mentioned it by name before linking the URL.

 

The Broadwell was clocked to 4.2 GHz, the exact same speed the 6700K was reported (and is known) to be running at. This means that, clock for clock, the difference shown between the 4790K and the 5775C at 4.2 GHz is massively greater than the difference shown between this 4790K and the 6700K at the same speed. (The absolute scores aren't particularly relevant, because many other factors can shift them by a small amount; the relative difference between the two, however, should be basically constant for each person doing their own benchmarking.)

 

5775C @ 4.2 GHz vs 4790K: 6.6% relative difference.

 

6700K @ 4.2 GHz vs 4790K: 3.3% relative difference.

 

Which would require that the 6700K is, clock for clock, worse than the 5775C. I hope you don't honestly believe that...

 

The only other possibility is that the entire platform the Skylake chip was attached to was a piece of shit. This COULD be a factor, because unlike Broadwell and Haswell (which share the same socket), a true apples-to-apples comparison of Skylake and Haswell is very hard to do (you have to control quite a few additional factors). NOTE that the MaximumPC article did actually use all the same hardware for both CPUs.

 

EDIT: There is a third possibility that came to mind looking at the article you posted. The random variation in R15 scores could be high enough that the test simply cannot distinguish the performance of the 4790K, 5775C and 6700K. This is possible, but it would pretty laughably invalidate R15 as a useful benchmark, and if that conclusion were accepted, the specific improvement across three generations of Intel CPUs could not be determined from the test at all. As such, these benchmarks would have to be taken with even more salt than a rumor would otherwise warrant.
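
For anyone who wants to check the arithmetic above, here's a quick sketch. Only the 882 multi-core score comes from the bit-tech review linked in this thread; the other two numbers are placeholders chosen to reproduce the 6.6% and 3.3% figures, so substitute the real R15 scores:

```python
# Relative difference between Cinebench R15 multi-core scores.

def rel_diff(baseline: float, other: float) -> float:
    """Percent difference of `other` relative to `baseline`."""
    return (other - baseline) / baseline * 100.0

r15_4790k = 882.0  # bit-tech, stock 4790K (linked in this thread)
r15_5775c = 940.0  # placeholder for the 5775C @ 4.2 GHz score
r15_6700k = 911.0  # placeholder for the reported 6700K @ 4.2 GHz score

print(f"5775C @ 4.2 vs 4790K: {rel_diff(r15_4790k, r15_5775c):+.1f}%")  # +6.6%
print(f"6700K @ 4.2 vs 4790K: {rel_diff(r15_4790k, r15_6700k):+.1f}%")  # +3.3%
```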

LINK-> Kurald Galain:  The Night Eternal 

Top 5820k, 980ti SLI Build in the World*

CPU: i7-5820k // GPU: SLI MSI 980ti Gaming 6G // Cooling: Full Custom WC //  Mobo: ASUS X99 Sabertooth // Ram: 32GB Crucial Ballistix Sport // Boot SSD: Samsung 850 EVO 500GB

Mass SSD: Crucial M500 960GB  // PSU: EVGA Supernova 850G2 // Case: Fractal Design Define S Windowed // OS: Windows 10 // Mouse: Razer Naga Chroma // Keyboard: Corsair K70 Cherry MX Reds

Headset: Senn RS185 // Monitor: ASUS PG348Q // Devices: Note 10+ - Surface Book 2 15"

LINK-> Ainulindale: Music of the Ainur 

Prosumer DIY FreeNAS

CPU: Xeon E3-1231v3  // Cooling: Noctua L9x65 //  Mobo: ASRock E3C224D2I // Ram: 16GB Kingston ECC DDR3-1333

HDDs: 4x HGST Deskstar NAS 3TB  // PSU: EVGA 650GQ // Case: Fractal Design Node 304 // OS: FreeNAS

 

 

 


Time will resolve all rumours... JUST WAIT!

... Life is a game and the checkpoints are your birthdays. You will face challenges where you may not get rewarded afterwards, but those are the challenges that help you improve yourself. Always live for tomorrow, because you may never know when your game will be over ... I'm totally not going insane in any way, shape or form ... I just have broken English and an open mind ...


That's still an illusion. You only get the appearance of better throughput, but even this has theoretical bounds Intel is already grinding against. No matter how much you widen and deepen the pipeline, you're locked to your clock speed at some point by way of diminishing returns (not to mention control and data hazards flushing your pipeline anyway) and to how fast you can stream instructions and data from system memory. No instruction can ACTUALLY execute in less than 1 cycle. You can only get the illusion of it.

 

Netburst only had 2 ALUs! Haswell has 4, for crying out loud. And no, Netburst's biggest problem was a weaker branch predictor (~80% accurate) combined with (if I remember correctly) a 32- or 48-instruction-deep pipeline, where most software has a branch every 7-10 lines of code even if you're not dealing with AI. The pipeline was flushed way too often due to branch misses, so you couldn't get anywhere near the theoretical performance Intel was expecting. I believe the pipeline today is back up to 32 instructions deep, but that's with a 98% accurate branch predictor, plus hyperthreading on Xeons/i3s/i7s, which lets two threads occupy one core, increasing the hazard length per process and reducing pipeline flushes.

 

And no, Intel's is the deepest pipeline. POWER8 has a 24-deep instruction pipeline. SPARC is 30 right now. Bulldozer and its derivatives are all 28, if I remember correctly. Haswell is up to 32, last I checked. No, accuracy is fixed. In a sequence, the probability it gets 1 branch prediction right is 0.98; 2 is 0.98^2; 3 is 0.98^3, etc... It doesn't matter how many options there are. Accuracy is fixed. If you have a sequence of branches in a row, that's where you really start running the risk of a pipeline flush. It doesn't matter how many options there are in a single branch. This was experimentally proven a long time ago with arbitrarily wide menu selection performed at Stanford.

That is not an illusion. Relative to the core clock, it is 0.5 cycles.

Clearly these theoretical bounds can be stretched with some different thinking.

You would think you could do some funny things with data hazards when some instructions take only 0.5 clock cycles.

A bigger frontend. That is where we are heading: reading further into the executable and predicting more accurately.

But they needed to be fed, as there are 4 of them. I never suggested you needed more?

I never said it was the BIGGEST problem. I also doubt we could pinpoint a single thing as the biggest problem. There was a whole lot wrong with it.

Netburst had 20-31 pipeline stages.

Sandy Bridge and forward is 14-19 stages.

Haswell's pipeline is a solid 14-19 stages.

What kind of implementation is Intel using that can have a fixed accuracy? I know of none.

Intel must have sacrificed a baby or something, as that is clearly magic.
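
To be concrete about what I mean by 0.5 cycles: a reciprocal throughput of 0.5 means a new instance of an instruction can start every half cycle (e.g. across two identical ports), even though each instance still has at least a full cycle of latency. A toy sketch with illustrative numbers, not figures from any Intel datasheet:

```python
# Latency vs. reciprocal throughput (illustrative numbers only).
clock_ghz = 4.0          # hypothetical core clock
latency_cycles = 1       # one op still takes at least a full cycle to finish
recip_throughput = 0.5   # but a new op can start every half cycle (two ports)

steady_state = clock_ghz * 1e9 / recip_throughput  # ops completed per second
single_op_ns = latency_cycles / clock_ghz          # time for one op, in ns

print(f"steady state: {steady_state / 1e9:.0f} G ops/s")  # 8 G ops/s
print(f"one op alone: {single_op_ns:.2f} ns minimum")     # 0.25 ns
```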


That is not an illusion. Relative to the core clock, it is 0.5 cycles.

Clearly these theoretical bounds can be stretched with some different thinking.

You would think you could do some funny things with data hazards when some instructions take only 0.5 clock cycles.

A bigger frontend. That is where we are heading: reading further into the executable and predicting more accurately.

But they needed to be fed, as there are 4 of them. I never suggested you needed more?

I never said it was the BIGGEST problem. I also doubt we could pinpoint a single thing as the biggest problem. There was a whole lot wrong with it.

Netburst had 20-31 pipeline stages.

Sandy Bridge and forward is 14-19 stages.

Haswell's pipeline is a solid 14-19 stages.

What kind of implementation is Intel using that can have a fixed accuracy? I know of none.

Intel must have sacrificed a baby or something, as that is clearly magic.

No, it's an illusion. No instruction executes any faster; it's only that multiple instructions are executed in parallel. You still have the same theoretical limit, and instruction-level parallelism is already at its breaking point. Also no, we can already predict extremely well. AMD's front end may need a huge upgrade; Intel's doesn't, nor would it help unless we wanted to make everything highly threaded to suit something like the IBM POWER8.

 

Source? The Intel x86 manual's section on Haswell seems to disagree with you on the depth of the pipeline.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


I guess nothing in 2015 will be worth an upgrade. Zen had better be damn impressive, otherwise it's gonna take like 5 years between upgrades to gain even a 50% performance increase, or worse.

If you're getting a PCIe 4.0 GPU, you're gonna have to upgrade anyway.


I did read over page 2. The reason I chose to ignore your post until now is that it was even less credible than the OP's (and you are trying to belittle people for some reason). Let me tell you why you are spouting nonsense, but first let me cite the source you failed to link for the benchmarks of Skylake and Broadwell: http://www.maximumpc.com/intel-broadwell-dt-core-i7-5775c-review/

 

You are comparing a CPU at STOCK to one that is overclocked. Do you even know what that word means? You use it in post #30. I don't even know why you are using the i7 5775C as a reference, because if you look at its scores they are considerably lower than the 4790K's at stock.

 

Also, there is little point in comparing results across different benchmark runs, because for the most part the setups are all different, even if the scores should come out similar.

 

http://www.bit-tech.net/hardware/2014/06/19/intel-core-i7-4790k-devil-s-canyon-review/6

 

In this benchmark the 4790K scores 882 in the multi-core test, so by your own method of deduction the OP is on point.

 

 

In the end my point stands: if this is how the Skylake i7 6700K performs, it had better OC like a beast, or it will show how lazy Intel is getting. As always, 'emerging benchmarks' should be taken with a grain of salt, and the title might as well have a rumor tag on it.

It's not the raw scores we're using; it's IPC, which can be derived from those scores and the given clocks for an objective number that would not change regardless of clock speed.
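
A minimal sketch of that derivation, assuming the benchmark score scales linearly with clock (the scores and clocks below are placeholders, not measured results):

```python
# Score divided by clock gives points-per-GHz, a rough stand-in for IPC
# when comparing the same benchmark across chips.

def points_per_ghz(score: float, clock_ghz: float) -> float:
    return score / clock_ghz

haswell = points_per_ghz(882.0, 4.4)  # e.g. a 4790K boosting to 4.4 GHz
skylake = points_per_ghz(911.0, 4.2)  # e.g. a 6700K running at 4.2 GHz

print(f"per-clock difference: {(skylake / haswell - 1) * 100:+.1f}%")  # +8.2%
```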

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


No, it's an illusion. No instruction executes any faster; it's only that multiple instructions are executed in parallel. You still have the same theoretical limit, and instruction-level parallelism is already at its breaking point. Also no, we can already predict extremely well. AMD's front end may need a huge upgrade; Intel's doesn't, nor would it help unless we wanted to make everything highly threaded to suit something like the IBM POWER8.

 

Source? The Intel x86 manual's section on Haswell seems to disagree with you on the depth of the pipeline.

You can call it what you want, but that doesn't change it.

We humans can't fly either, but with the 'illusion' of a plane, we can.

I'm calling it a solution to your problem.

ILP is not near any theoretical limit, whatever you mean by that.

Yes, the frontend is good enough for the backend that is in place today.

If you think Intel's frontend is good enough at this point for the future, you are wrong.

A bigger frontend could also help increase single-threaded performance.

Sources:

http://www.anandtech.com/show/6355/intels-haswell-architecture/6

https://en.wikipedia.org/wiki/Haswell_%28microarchitecture%29#Features_carried_over_from_Ivy_Bridge

http://www.lighterra.com/papers/modernmicroprocessors/

Those are only a few of them. I doubt Intel's x86 manual on Haswell disagrees with me.

This is not really that surprising.


You can call it what you want, but that doesn't change it.

We humans can't fly either, but with the 'illusion' of a plane, we can.

I'm calling it a solution to your problem.

 

 

I am sorry, but this analogy is nonsense. An illusion implies something is deceptive to one's senses or beliefs. Flying in a plane will not give you the same sensation as actual flight. Using a wingsuit might, but even then you are merely descending gracefully, not flying.

 

I think the word everyone was looking for was misconception, not illusion. 

My (incomplete) memory overclocking guide: 

 

Does memory speed impact gaming performance? Click here to find out!

On 1/2/2017 at 9:32 PM, MageTank said:

Sometimes, we all need a little inspiration.

 

 

 


You can call it what you want, but that doesn't change it.

We humans can't fly either, but with the 'illusion' of a plane, we can.

I'm calling it a solution to your problem.

ILP is not near any theoretical limit, whatever you mean by that.

Yes, the frontend is good enough for the backend that is in place today.

If you think Intel's frontend is good enough at this point for the future, you are wrong.

A bigger frontend could also help increase single-threaded performance.

Sources:

http://www.anandtech.com/show/6355/intels-haswell-architecture/6

https://en.wikipedia.org/wiki/Haswell_%28microarchitecture%29#Features_carried_over_from_Ivy_Bridge

http://www.lighterra.com/papers/modernmicroprocessors/

Those are only a few of them. I doubt Intel's x86 manual on Haswell disagrees with me.

This is not really that surprising.

You're conflating the meanings of the analyses in your sources. There is only so much parallelism you can glean in an instantaneous fashion, and we're right on the cusp of that limit, with every 7-9 instructions being a branch on average. If you press beyond 4-wide, you will end up flushing more pipelines, and more of the instructions in those pipelines, due to bad guesswork on the part of the OoO engine. This is not escapable! We could go with Denver's strategy of doing optimization based on CISC-VLIW translation, but that will buy you at best a 25% boost over the current potential. Pipelining does not see gains beyond having 4 instructions layered perfectly over one another across the 4 pipeline stages of fetch, decode, execute, and write-back. You would need to move to a MIMD or MISD instruction set to escape that.
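
To put rough numbers on that (the ~98% accuracy and the branch-every-7-to-9-instructions figures come from earlier in this thread; the flush penalty is an assumed round number, not a measured one):

```python
# Back-of-the-envelope cost of branch mispredicts.
branch_every = 8     # ~1 branch per 7-9 instructions (figure from this thread)
accuracy = 0.98      # ~98% predictor accuracy (figure from this thread)
flush_penalty = 16   # assumed cycles lost per flush, roughly a pipeline depth

instr_per_flush = branch_every / (1 - accuracy)  # expected instructions: ~400
cycles_lost_per_1k = 1000 / branch_every * (1 - accuracy) * flush_penalty

print(f"~{instr_per_flush:.0f} instructions between flushes")
print(f"~{cycles_lost_per_1k:.0f} cycles lost per 1000 instructions")

# The 0.98^n point from earlier: odds of surviving n branches with no flush.
for n in (10, 50, 100):
    print(n, f"{accuracy ** n:.2f}")  # 0.82, 0.36, 0.13
```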

 

vm'N, I'm sorry, but reality is set against you. Just fetching and decoding more means nothing on its own for performance, no matter how wide or how deep you wish to make the pipeline. Consumer programming is not so well behaved as to have all its branches reduced to equations of pointer arithmetic, as in HPC. The only place that would see gains from widening the front end is HPC. If you want to discuss realistic solutions like long-form, context-driven optimization, then let's do it. Widening the front end and deepening the pipeline will do nothing for current consumer workloads. End of story.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


I am sorry, but this analogy is nonsense. An illusion implies something is deceptive to one's senses or beliefs. Flying in a plane will not give you the same sensation as actual flight. Using a wingsuit might, but even then you are merely descending gracefully, not flying.

 

I think the word everyone was looking for was misconception, not illusion.

Nonsense? You might not want to read too far into an analogy, as that will always turn up differences.

It was simply a way of making the point that some problems can be solved with different thinking.

Don't get too literal.

 

 

You're conflating the meanings of the analyses in your sources. There is only so much parallelism you can glean in an instantaneous fashion, and we're right on the cusp of that limit, with every 7-9 instructions being a branch on average. If you press beyond 4-wide, you will end up flushing more pipelines, and more of the instructions in those pipelines, due to bad guesswork on the part of the OoO engine. This is not escapable! We could go with Denver's strategy of doing optimization based on CISC-VLIW translation, but that will buy you at best a 25% boost over the current potential. Pipelining does not see gains beyond having 4 instructions layered perfectly over one another across the 4 pipeline stages of fetch, decode, execute, and write-back. You would need to move to a MIMD or MISD instruction set to escape that.

 

vm'N, I'm sorry, but reality is set against you. Just fetching and decoding more means nothing on its own for performance, no matter how wide or how deep you wish to make the pipeline. Consumer programming is not so well behaved as to have all its branches reduced to equations of pointer arithmetic, as in HPC. The only place that would see gains from widening the front end is HPC. If you want to discuss realistic solutions like long-form, context-driven optimization, then let's do it. Widening the front end and deepening the pipeline will do nothing for current consumer workloads. End of story.

There is no conflict in my sources. They all state the same thing: the pipeline is 14-19 stages! You are the only one so far saying otherwise. You might want to take a look at Intel's x86 manual again...

It is your way of thinking that is nearing the max.

We have been over this once before. The implementation has a theoretical limit.

x86 already translates CISC (more accurately, x86) instructions to optimize for the underlying microarchitecture. So what is so new about it?

That is the 'traditional' RISC pipeline (except for the decode part). CISC and almost all RISC architectures have gone beyond that.

You have to dig deeper into the executable to increase performance, or else you will be stuck with the same limitations you have today. Great!

A deeper pipeline is not necessary. I don't see why you suddenly have an obsession with it.

'Just' and 'just'. There is a whole lot of optimization that can be done for the backend. Don't confuse it with 'just fetching and decoding'.

We have yet to see anyone big implement some form of TLS (thread-level speculation), transactional memory (except Intel's TSX), and so on.

You might want to reread what I'm saying. I never said anything about a longer pipeline, and I think we have different understandings of the functionality of the frontend.

Never said anything about ideal solutions either. Just saying that it is possible.


Nonsense? You might not want to read too far into an analogy, as that will always turn up differences.

It was simply a way of making the point that some problems can be solved with different thinking.

Don't get too literal.

 

 

It is just that it makes no sense, especially the way you word it. The plane itself is not an illusion. Had you said "with the invention of the plane, we can", it would have made perfect sense. Calling the plane an illusion would imply it does not exist, or is deceptive in its existence, which is where I find it hard to follow. Also, telling me not to think too far into an analogy tells me that you do not know what an analogy is used for. An analogy is used to show similarities between two things; telling me not to look into that voids the entire point of using one. For an analogy to work, it has to make sense. Yours did not.

My (incomplete) memory overclocking guide: 

 

Does memory speed impact gaming performance? Click here to find out!

On 1/2/2017 at 9:32 PM, MageTank said:

Sometimes, we all need a little inspiration.

 

 

 


Nonsense? You might not want to read too far into an analogy, as that will always turn up differences.

It was simply a way of making the point that some problems can be solved with different thinking.

Don't get too literal.

 

 

There is no conflict in my sources. They all state the same thing: the pipeline is 14-19 stages! You are the only one so far saying otherwise. You might want to take a look at Intel's x86 manual again...

It is your way of thinking that is nearing the max.

We have been over this once before. The implementation has a theoretical limit.

x86 already translates CISC (more accurately, x86) instructions to optimize for the underlying microarchitecture. So what is so new about it?

That is the 'traditional' RISC pipeline (except for the decode part). CISC and almost all RISC architectures have gone beyond that.

You have to dig deeper into the executable to increase performance, or else you will be stuck with the same limitations you have today. Great!

A deeper pipeline is not necessary. I don't see why you suddenly have an obsession with it.

'Just' and 'just'. There is a whole lot of optimization that can be done for the backend. Don't confuse it with 'just fetching and decoding'.

We have yet to see anyone big implement some form of TLS (thread-level speculation), transactional memory (except Intel's TSX), and so on.

You might want to reread what I'm saying. I never said anything about a longer pipeline, and I think we have different understandings of the functionality of the frontend.

Never said anything about ideal solutions either. Just saying that it is possible.

It's not about conflict. You're drawing invalid conclusions from what your sources say. The implementations have limits, and so do the concepts! The law of diminishing returns comes up again and again! Amdahl's Law is the same thing for adding more cores, and there is another proven theorem about ILP that works the same way. Most programming is not 95% parallelizable! Beyond 4-wide ILP you hit major diminishing returns for software in general. It is pointless to push it!
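
For reference, the formula being invoked: Amdahl's Law bounds the speedup from n parallel units by the serial fraction of the work. A one-line sketch with arbitrary example fractions:

```python
# Amdahl's Law: speedup from n parallel units when a fraction p of the
# work is parallelizable. The serial part (1 - p) caps the gain.

def amdahl_speedup(p: float, n: float) -> float:
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.50, 0.95):
    print(p, [round(amdahl_speedup(p, n), 2) for n in (2, 4, 8, 1e9)])
# p=0.50 can never beat 2x and p=0.95 can never beat 20x,
# no matter how many cores you add.
```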

 

No, it isn't. Microcode is an instantaneous, context-free optimization, highly limited in scope and effect. Denver takes the idea and makes context actually drive the optimization: it gets more and more levels of optimization by looking at huge chunks, taking the time to translate them, interweave them, look farther ahead, and then make more informed decisions. Half of Denver is an artificial intelligence that drives the optimization engine. Intel's x86 is based only on finite rules.

 

And no, no, and no! Pipelining is the same as it has always been, with or without the microcode translations.

 

Uh, yes we have. Please look at Oracle's current SPARC systems. The gains were tiny.

 

And I'm saying you're blowing the possibility out of proportion! The back end and front end are already reaching theoretical thresholds beyond which everything becomes tied to clock speed and memory access time and bandwidth. Software must evolve. End of story. The age of SISD should have died in 2008. Thank you, Microsoft, for getting in the way of that.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd

