Updates to Skylake Discrete Graphics Performance: PCIe Optimizations Incoming

tsk

 

In our initial review of the two 6th Generation Intel Skylake-K processors launched on August 5th, the i7-6700K and the i5-6600K, our comparison against previous generations of Intel processors was, for the most part, positive. Clock-for-clock performance was only a marginal increase over the previous generation, but the cumulative gains of several generations of upgrades, plus the headroom for those who overclock, gave users with CPU-limited workloads a substantial reason to upgrade (along with benefits on the chipset and DRAM side as well). However, one element of the equation was puzzling at the time: average frame rates in games using discrete graphics cards were marginally lower on the new platform than on older platforms.

 

During our testing, it is not uncommon for two platforms that perform similarly to show a reasonable margin of error, often ±1%, due to variations in pre-initialised cache structures, or, in the case of games like GRID, a reliance on a random sequence to produce the final numbers. Despite this, we noticed a consistent drop for Skylake-K in our discrete GPU testing, often around the -1% to -3% mark but sometimes as low as -5% or -7%, when compared to both Intel's 5th Generation (Broadwell) and 4th Generation (Haswell) parts. Other websites such as The Tech Report also noted these results, placing Broadwell's numbers at the top of the stack (if only marginally). Some commentary at the time focused on Broadwell's use of eDRAM in the desktop parts, which can aid performance despite a frequency deficit, although given our analysis of Broadwell's eDRAM as a victim cache rather than a transparent DRAM cache, this seems less likely to be the cause, and we now have new post-launch information about the issue. But even setting Broadwell aside as a special case, it was still concerning that the i7-6700K lagged behind the i7-4770K despite being higher in both frequency and clock-for-clock performance.

Another couple of weeks later, we were contacted by ASUS, who shed a lot more light on the issue. The register in question is called the FCLK (or 'f-clock'), which controls some of the cross-frequency compensation mechanisms between the ring interconnect of the CPU, the System Agent, and the PEG (PCI Express Graphics). In essence, it governs data moving between the processor and the GPUs: when data is handed from one domain to the other, this part of the processor manages the data buffers that allow that boundary crossing to happen losslessly. The FCLK is a ratio setting tied directly to the base frequency of the processor (the BCLK, typically 100 MHz), and can be set at 4x, 8x or 10x for 400 MHz, 800 MHz or 1000 MHz respectively.

TL;DR: Z170 motherboards were running with an FCLK of 800 MHz instead of Intel's recommended 1000 MHz.

AnandTech retested and saw a performance increase of about 1.3% with the 1000 MHz setting.
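For anyone who wants the ratio arithmetic spelled out, here is a minimal sketch of the BCLK-times-ratio relationship described in the quote above (the constant and names are illustrative only, not a real firmware interface):

```python
# Minimal sketch: FCLK is a ratio multiplier applied to the base clock
# (BCLK). Illustrative only; not a real firmware or driver interface.

BCLK_MHZ = 100  # typical Skylake base clock

for ratio in (4, 8, 10):
    fclk_mhz = BCLK_MHZ * ratio
    note = "  <- Intel's recommended setting" if fclk_mhz == 1000 else ""
    print(f"FCLK ratio {ratio}x -> {fclk_mhz} MHz{note}")

# FCLK ratio 4x -> 400 MHz
# FCLK ratio 8x -> 800 MHz
# FCLK ratio 10x -> 1000 MHz  <- Intel's recommended setting
```

Many Z170 boards shipped with the 8x (800 MHz) default, which is the gap the retest closed.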

Source


1.3% could still be within the margin of error; let's see what driver and UEFI/BIOS updates bring.


Thanks for the TL;DR; it was a good read though. Didn't know this was a thing.

Also, woot, 1.3%. Can it be overclocked more?


1.3%!!!!!!!!!!!11!!1!!!!!!!

 



I think 1.3% is around the same as the percentage of people who have Skylake builds but don't have a dedicated GPU.



I think 1.3% is around the same as the percentage of people who have Skylake builds but don't have a dedicated GPU.

 

As it says right in the title, they're talking about discrete (or dedicated if you prefer) GPU performance. These optimizations deal with the speed of CPU-PCIe communications.


As it says right in the title, they're talking about discrete (or dedicated if you prefer) GPU performance. These optimizations deal with the speed of CPU-PCIe communications.

Oh sorry. For some reason I totally misunderstood; I guess it's because I only got about 3 hours of sleep last night.

My bad.



AnandTech retested and saw a performance increase of about 1.3% with the 1000 MHz setting.

 

Moar performance!

 




1.3% could still be within the margin of error; let's see what driver and UEFI/BIOS updates bring.

Normally that'd be the case, but Skylake's numbers at launch were down consistently (across 5 games x 5 GPUs), mostly around -1% to -3% and some as low as -7%, compared to Haswell/Broadwell. If they were up and down, I'd agree with you, but the baseline was lower than expected. Retesting with the FCLK raised to 1000 MHz saw consistent gains across all of the dGPU tests.
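To make that reasoning concrete: if the deltas were just noise, they would scatter on both sides of zero. A quick sketch with made-up numbers (not our measured data) of how you'd sanity-check that:

```python
# Made-up % deltas (Skylake vs. Haswell), one row per game, one column per
# GPU. Illustrative values only, NOT AnandTech's measured results.
deltas = [
    [-1.2, -2.8, -1.5, -0.9, -3.1],
    [-2.0, -1.1, -4.8, -1.7, -2.3],
    [-0.8, -1.9, -2.6, -1.4, -6.9],
    [-1.6, -2.2, -1.0, -3.0, -1.8],
    [-2.4, -1.3, -5.2, -0.7, -2.1],
]

flat = [d for row in deltas for d in row]
mean_delta = sum(flat) / len(flat)
negatives = sum(1 for d in flat if d < 0)

# Pure run-to-run noise would land roughly half the cells on each side of
# zero; 25/25 negative points to a systematic cause, not variance.
print(f"mean: {mean_delta:+.2f}%, negative results: {negatives}/{len(flat)}")
```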

 

Disclosure: I'm the Senior Editor at AnandTech responsible for the article linked above. Twitter at @borandi, @IanCutress etc.

 

Some people will say the game tests we used are a limited set and all GPU-limited (which in this case is fine), or not a large enough sample. But testing 5 games x 5 GPUs x 3-4 tests per game takes the best part of a day, and that's when everything works. Always ready to accept new benchmarks, though, if they can be run consistently.
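Back-of-the-envelope on that time estimate, assuming (my assumption, not a quoted figure) something like five minutes per benchmark run plus time to swap GPUs:

```python
# Rough arithmetic for the test matrix size mentioned above. The per-run
# and per-swap times are assumptions for illustration only.
games, gpus, runs_per_game = 5, 5, 4  # upper end of the 3-4 tests per game
minutes_per_run = 5        # assumed: load level, run benchmark, log results
minutes_per_gpu_swap = 15  # assumed: shut down, swap card, drivers, reboot

total_runs = games * gpus * runs_per_game
total_minutes = total_runs * minutes_per_run + gpus * minutes_per_gpu_swap
print(f"{total_runs} runs -> ~{total_minutes / 60:.1f} hours")  # ~9.6 hours
```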

