
AMD speaks on W10 scheduler and Ryzen

I think the poor gaming performance comes down to the 140ns delay when the two CCXs communicate data with each other, versus a 20-40ns delay between two cores in the same CCX. PC Perspective did a video on this, and the graph is below:

 

[Image: ping-amd.png (PC Perspective core-to-core latency graph)]

 

Logical cores 1-8 (physical cores 1-4) are on one CCX, whereas logical cores 9-16 (physical cores 5-8) are on a different CCX.

 

In an independently threaded program such as Cinebench, where threads don't need to communicate data with each other (since they just receive instructions from the main thread), each core will perform at its maximum potential. In games, however, where cores need to communicate data with each other, the delays between the CCXs could be mounting up, leading to performance degradation.
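The reasoning above can be put as a simple weighted average. A minimal sketch, assuming a 30ns same-CCX latency (the midpoint of the 20-40ns quoted above), the 140ns cross-CCX figure, and, pessimistically, that every transfer sits on the critical path; the function names and the transfer count are hypothetical:

```python
def avg_transfer_latency_ns(cross_ccx_fraction: float,
                            intra_ns: float = 30.0,
                            inter_ns: float = 140.0) -> float:
    """Expected latency of one core-to-core transfer.

    cross_ccx_fraction: share of transfers that cross between CCXs (0..1).
    intra_ns: assumed same-CCX latency (midpoint of the quoted 20-40ns).
    inter_ns: cross-CCX latency from the PC Perspective graph.
    """
    if not 0.0 <= cross_ccx_fraction <= 1.0:
        raise ValueError("cross_ccx_fraction must be between 0 and 1")
    return cross_ccx_fraction * inter_ns + (1.0 - cross_ccx_fraction) * intra_ns


def stall_per_frame_us(transfers_per_frame: int, cross_ccx_fraction: float) -> float:
    """Upper bound on waiting time per frame, assuming every transfer is
    serialized on the critical path (real workloads overlap much of it)."""
    return transfers_per_frame * avg_transfer_latency_ns(cross_ccx_fraction) / 1000.0


if __name__ == "__main__":
    for share in (0.0, 0.5, 1.0):
        # Hypothetical: 10,000 dependent transfers per frame.
        print(f"{share:.0%} cross-CCX: {avg_transfer_latency_ns(share):.0f} ns/transfer, "
              f"{stall_per_frame_us(10_000, share):.0f} us/frame")
```

At 60 FPS a frame lasts about 16.7 ms, so under these (hypothetical, worst-case) assumptions going from 0% to 50% cross-CCX traffic would add roughly 0.55 ms per frame.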

 

If the CCX latency is the issue (which I think it is, seeing how well Ryzen performs vs Intel in thread-independent production software), then game developers should consider putting similar workloads on cores in the same CCX to alleviate the performance issue.

 

Someone should verify this by limiting the core affinity of a game to all the logical cores within the same CCX.
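One way to run that test, sketched in Python under the layout described above (16 logical cores, logical cores 1-8 on the first CCX and 9-16 on the second; the code numbers them from 0). The helper names are hypothetical, and the core numbering is an assumption worth verifying on a real system. `os.sched_setaffinity` only exists on Linux; on Windows the same mask can be given in hex to `start /affinity`.

```python
import os

LOGICAL_PER_CCX = 8  # assumption: 4 physical cores + SMT per CCX
NUM_CCX = 2


def ccx_logical_cores(ccx: int) -> list[int]:
    """0-based logical core IDs belonging to one CCX, under the assumed layout."""
    if not 0 <= ccx < NUM_CCX:
        raise ValueError(f"ccx must be in 0..{NUM_CCX - 1}")
    start = ccx * LOGICAL_PER_CCX
    return list(range(start, start + LOGICAL_PER_CCX))


def affinity_mask(cores: list[int]) -> int:
    """Bitmask with one bit set per logical core (bit 0 = core 0)."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def pin_to_ccx(pid: int, ccx: int) -> None:
    """Restrict a running process to one CCX (Linux only; pid 0 = this process)."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(pid, ccx_logical_cores(ccx))


if __name__ == "__main__":
    for ccx in range(NUM_CCX):
        print(f"CCX{ccx}: cores {ccx_logical_cores(ccx)}, "
              f"mask {affinity_mask(ccx_logical_cores(ccx)):X}")
```

Under this layout the masks come out as `FF` and `FF00`, so something like `start /affinity FF game.exe` (with `game.exe` a placeholder) would confine a game to the first CCX.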

 

Speedtests

WiFi - 7ms, 22Mb down, 10Mb up

Ethernet - 6ms, 47.5Mb down, 9.7Mb up

 

Rigs


 Type            Desktop

 OS              Windows 10 Pro

 CPU             i5-4430S

 RAM             8GB CORSAIR XMS3 (2x4gb)

 Cooler          LC Power LC-CC-97 65W

 Motherboard     ASUS H81M-PLUS

 GPU             GeForce GTX 1060

 Storage         120GB Sandisk SSD (boot), 750GB Seagate 2.5" (storage), 500GB Seagate 2.5" SSHD (cache)

 


Type            Server

OS              Ubuntu 14.04 LTS

CPU             Core 2 Duo E6320

RAM             2GB Non-ECC

Motherboard     ASUS P5VD2-MX SE

Storage         RAID 1: 250GB WD Blue and Seagate Barracuda

Uses            Webserver, NAS, Mediaserver, Database Server

 

Quotes of Fame

On 8/27/2015 at 10:09 AM, Drixen said:

Linus is light years ahead a lot of other YouTubers, he isn't just an average YouTuber.. he's legitimately, legit.

On 10/11/2015 at 11:36 AM, Geralt said:

When something is worth doing, it's worth overdoing.

On 6/22/2016 at 10:05 AM, trag1c said:

It's completely blown out of proportion. Also if you're the least bit worried about data gathering then you should go live in a cave a 1000Km from the nearest establishment simply because every device and every entity gathers information these days. In the current era privacy is just fallacy and nothing more.

 

29 minutes ago, LAwLz said:

Not if their yields are bad, and/or supply and demand doesn't end up being like they expected, or if they have two different types of CCXs and you have to pair them together to get a fully functional chip (which it seems like judging by the die shot posted earlier).

 

That die shot is really screwing with my head because I just can't see how they could use a single CCX by itself if that's what the die looks like. But if you need a pair then why would they go with this design to begin with?

I probably wasn't clear about what I meant by a scalable CCX design. Any SKU that uses a different number of CCXs is a different die design; cutting the current die isn't possible.

 

Basically, the CCXs are paired with a memory controller and I/O silicon logic. That means that while they can put any number of CCXs they wish (maybe in pairs?) on a die, they still need the CCX interconnects plus the memory controller/PCIe that go with them. Cut the die and you cut the memory controller, which means a broken, non-functional die.

 

That's where it starts to get really complex and further into more unknowns:

  • Can a single CCX be properly connected to the memory controller or is a minimum of two required?
  • Can a single CCX die actually be made smaller in physical area? Can the memory controller etc be rearranged?
  • Is the potential sales volume of the 4 core SKUs worth investing in a dedicated die design?

My personal feeling regarding the CCX is that it was always designed primarily for Naples, to let AMD easily make a few different die designs covering a large range of core-count SKUs cheaply and to maximize wafer area usage. Ryzen is the product of making do with what you have rather than a specifically dedicated design. Surely a 100% Ryzen-focused design wouldn't have a CCX interconnect in it at all and would be a unified 8 core??

 

I'd love to have an open and honest discussion with the Zen architecture engineers to find out where the real focus was, Ryzen or Naples.

12 minutes ago, leadeater said:

Basically, the CCXs are paired with a memory controller and I/O silicon logic. That means that while they can put any number of CCXs they wish (maybe in pairs?) on a die, they still need the CCX interconnects plus the memory controller/PCIe that go with them. Cut the die and you cut the memory controller, which means a broken, non-functional die.

Aren't all of those things part of the same die, though? That's what it looks like to me from the die shot above: both CCXs and the other logic (memory controller etc.) are on the same die.

 

18 minutes ago, leadeater said:

 

That's where it starts to get really complex and further into more unknowns:

  • Can a single CCX be properly connected to the memory controller or is a minimum of two required?
  • Can a single CCX die actually be made smaller in physical area? Can the memory controller etc be rearranged?
  • Is the potential sales volume of the 4 core SKUs worth investing in a dedicated die design?

That's where my uncertainty about the 4 core version comes from as well. To me, judging by the die shot, you need two CCXs for a fully functional chip, but that would mean having 4 cores disabled on a 4 core SKU, which seems wasteful (unless yields are quite poor).

5 minutes ago, LAwLz said:

Aren't all of those things part of the same die, though? That's what it looks like to me from the die shot above: both CCXs and the other logic (memory controller etc.) are on the same die.

Correct, any design using a different number of CCXs would be a totally different die.

 

Edit:

The CCX approach just makes it easier to design different dies, simplicity through commonality/duplication. Adding more cores doesn't lead to drastic architecture change like it normally would. It's a bit like playing Tetris but with CCXs, memory controllers and I/O.

7 minutes ago, leadeater said:

Correct, any design using a different number of CCXs would be a totally different die.

 

Edit:

The CCX approach just makes it easier to design different dies, simplicity through commonality/duplication. Adding more cores doesn't lead to drastic architecture change like it normally would. It's a bit like playing Tetris but with CCXs, memory controllers and I/O.

I guess it's kind of like the CU system AMD had with their APUs, then?

On 3/13/2017 at 6:48 PM, tlink said:

screaming bulldozer is as much of a good argument as screaming hitler is.

I consider Bulldozer and invading Russia mistakes of similar scale. 

muh specs 

Gaming and HTPC (reparations)- ASUS 1080, MSI X99A SLI Plus, 5820k- 4.5GHz @ 1.25v, asetek based 360mm AIO, RM 1000x, 16GB memory, 750D with front USB 2.0 replaced with 3.0  ports, 2 250GB 850 EVOs in Raid 0 (why not, only has games on it), some hard drives

Screens- Acer Predator XB241H (1080p, 144Hz G-Sync), LG 1080p ultrawide, (all mounted) directly wired to TV in other room

Stuff- k70 with reds, steel series rival, g13, full desk covering mouse mat

All parts black

Workstation(desk)- 3770k, 970 reference, 16GB of some crucial memory, a motherboard of some kind I don't remember, Micomsoft SC-512N1-L/DVI, CM Storm Trooper (It's got a handle, can you handle that?), 240mm Asetek based AIO, Crucial M550 256GB (upgrade soon), some hard drives, disc drives, and hot swap bays

Screens- 3  ASUS VN248H-P IPS 1080p screens mounted on a stand, some old tv on the wall above it. 

Stuff- Epicgear Defiant (solderless swappable switches), g600, mounted mic and other stuff. 

Laptop docking area- 2 1440p Korean monitors mounted, one AHVA matte, one Samsung PLS gloss (very annoying, yes). Trashy Razer BlackWidow Chroma...I mean, the J key doesn't click anymore. I've got a Model M I use on it too, but it's time for a new keyboard. Some edgy Utechsmart mouse similar to the g600. Hooked to the laptop dock for both of my Dell Precision laptops. (not the only docking area)

Shelf- i7-2600 non-k (has vt-d), 380t, some ASUS sandy itx board, intel quad nic. Currently hosts shared files, setting up as pfsense box in VM. Also acts as spare gaming PC with a 580 or whatever someone brings. Hooked into laptop dock area via usb switch

3 hours ago, leadeater said:

For the above point about TDP, well someone needs to go look up what TDP actually means because it is not power draw of the CPU. The 4 core SKU and 6 core SKU having the same TDP in no way indicates the CCX makeup.

for this type of comparison when the arch is identical and assuming the TDP is expressed on loading the cores in the exact same way, estimating TDP that way is a safe bet

there are other factors to consider, like core clocks

 

for comparison look at i3 6100 vs i5 6500 - 51W vs 65W; the i3 is clocked higher 


@Tomsen @leadeater

BeHardware has benchmarked a 2+2 vs a 4+0 configuration to determine how much of an impact the interconnect has.

 

Conclusion: Next to no impact at all. So if this is any indication, then an update to the Windows scheduler where it avoids moving data between CCXs would not make a difference.

Assuming that Windows' scheduler became 100% flawless in allocating threads to specific CCXs, we're looking at an average (not counting the abnormality that is BF1) of ~2% performance increase. That might as well be within margin of error.

(Funnily enough, the same performance increase the "legendary" Bulldozer optimization patch also brought us).

 

So once again, the idea that Microsoft is to blame and that Windows 10 just isn't "optimized" for Ryzen has been debunked.

 

[Image: Capture.PNG (BeHardware 2+2 vs 4+0 benchmark results)]
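Whether the average gain works out to ~2% depends on which titles are included and whether an outlier like BF1 is excluded. A minimal Python sketch of that arithmetic (4+0 as baseline, 2+2 as test); the function names and the FPS numbers in the usage block are hypothetical, not BeHardware's results:

```python
from statistics import mean


def pct_change(baseline_fps: float, test_fps: float) -> float:
    """Percentage change from baseline to test (negative = slower)."""
    return (test_fps - baseline_fps) / baseline_fps * 100.0


def mean_pct_change(results: dict[str, tuple[float, float]],
                    exclude: frozenset[str] = frozenset()) -> float:
    """Mean per-title % change; `results` maps title -> (baseline, test) FPS."""
    changes = [pct_change(base, test)
               for title, (base, test) in results.items()
               if title not in exclude]
    if not changes:
        raise ValueError("no titles left after exclusions")
    return mean(changes)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    results = {"Game A": (100.0, 98.0), "Game B": (50.0, 49.0), "Game C": (110.0, 88.0)}
    print(f"all titles: {mean_pct_change(results):.1f}%")
    print(f"without Game C: {mean_pct_change(results, frozenset({'Game C'})):.1f}%")
```

With these made-up numbers, dropping one outlier moves the average from -8% to -2%, which is why it matters whether an outlier title is treated as noise or as signal.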

6 minutes ago, zMeul said:

for this type of comparison when the arch is identical and assuming the TDP is expressed on loading the cores in the exact same way, estimating TDP that way is a safe bet

there are other factors to consider, like core clocks

 

for comparison look at i3 6100 vs i5 6500 - 51W vs 65W; the i3 is clocked higher 

I agree that taking away an entire CCX should significantly reduce power draw, which should be reflected in the TDP figure. The only thing to be careful of is that TDP is thermal design power, which is there to advise on the required cooling solution. Having said that, if AMD had a chance to state a lower TDP, they would have done so.

3 minutes ago, LAwLz said:

@Tomsen @leadeater

abnormality that is BF1) of ~2% performance increase. That might as well be within margin of error.

(Funnily enough, the same performance increase the "legendary" Bulldozer optimization patch also brought us).

 

 

It's ~3-7% for gaming if you only look at gaming results, and ignore BF1.

 

That's a nice little increase in some games, one many would gladly accept, as it's the IPC difference between Haswell and Broadwell. :P

5950X | NH D15S | 64GB 3200Mhz | RTX 3090 | ASUS PG348Q+MG278Q

 

4 minutes ago, LAwLz said:

@Tomsen @leadeater

BeHardware has benchmarked a 2+2 vs a 4+0 configuration to determine how much of an impact the interconnect has.

That likely has a lot to do with the L3 cache being a victim cache rather than an inclusive cache like Intel's L3. I suspect there is little to almost no data movement between the CCXs' L3 caches at all.
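To make the victim-cache idea concrete, here is a toy Python model of one core's L2 backed by a victim L3: lines fetched from DRAM go straight into L2, and L3 is only filled with lines evicted from L2. It is a simplified sketch (fully associative LRU, a single core, no coherence or cross-CCX traffic), and the class names are hypothetical, not AMD's actual design:

```python
from collections import OrderedDict
from typing import Optional


class LRU:
    """Fully associative cache of line addresses with LRU replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines: OrderedDict[int, bool] = OrderedDict()

    def contains(self, addr: int) -> bool:
        return addr in self.lines

    def touch(self, addr: int) -> None:
        self.lines.move_to_end(addr)  # mark as most recently used

    def remove(self, addr: int) -> None:
        del self.lines[addr]

    def insert(self, addr: int) -> Optional[int]:
        """Insert addr; return the evicted least-recently-used line, if any."""
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            victim, _ = self.lines.popitem(last=False)
            return victim
        return None


class L2WithVictimL3:
    def __init__(self, l2_lines: int, l3_lines: int):
        self.l2 = LRU(l2_lines)
        self.l3 = LRU(l3_lines)

    def access(self, addr: int) -> str:
        """Return where the line was found: 'L2', 'L3' or 'DRAM'."""
        if self.l2.contains(addr):
            self.l2.touch(addr)
            return "L2"
        if self.l3.contains(addr):
            self.l3.remove(addr)  # line moves back up, so L2 and L3 stay exclusive
            source = "L3"
        else:
            source = "DRAM"  # a memory fill goes straight to L2, bypassing L3
        evicted = self.l2.insert(addr)
        if evicted is not None:
            self.l3.insert(evicted)  # L3 only ever receives L2 evictions
        return source


if __name__ == "__main__":
    cache = L2WithVictimL3(l2_lines=2, l3_lines=4)
    for addr in (1, 2, 3, 1, 3):
        print(addr, cache.access(addr))
```

The access sequence prints DRAM, DRAM, DRAM, L3, L2: line 1 only reaches L3 after L2 evicts it, and is pulled back out of L3 when reused.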

1 minute ago, leadeater said:

I agree that taking away an entire CCX should significantly reduce power draw, which should be reflected in the TDP figure. The only thing to be careful of is that TDP is thermal design power, which is there to advise on the required cooling solution. Having said that, if AMD had a chance to state a lower TDP, they would have done so.

I'm assuming AMD does things in a similar fashion to Intel and doesn't just throw a sticker on it

furthermore, Intel has two thermal specifications:

  • TDP - Thermal Design Power (TDP) represents the average power, in watts, the processor dissipates when operating at Base Frequency with all cores active under an Intel-defined, high-complexity workload
  • TSS - Intel Reference Heat Sink specification for proper operation of this SKU.

the i7 6700K has a TDP of 91W but the TSS is 130W

4 minutes ago, Valentyn said:

It's ~3-7% for gaming if you only look at gaming results, and ignore BF1.

the avg is 2.33% - that's margin of error -_-

1 minute ago, zMeul said:

I'm assuming AMD does things in a similar fashion to Intel and doesn't just throw a sticker on it

furthermore, Intel has two thermal specifications:

  • TDP - Thermal Design Power (TDP) represents the average power, in watts, the processor dissipates when operating at Base Frequency with all cores active under an Intel-defined, high-complexity workload
  • TSS - Intel Reference Heat Sink specification for proper operation of this SKU.

the i7 6700K has a TDP of 91W but the TSS is 130W

Well it was Intel that changed the meaning of TDP and made up TSS, but hey it doesn't really matter what it's called as long as we know what it represents and are comparing the same things between vendors.

 

As far as I'm aware, AMD uses TDP the way Intel uses TSS.

 

Quote

The thermal design power (TDP), sometimes called thermal design point, is the maximum amount of heat generated by a computer chip or component (often the CPU or GPU) that the cooling system in a computer is designed to dissipate in typical operation. Rather than specifying CPU's real power dissipation, TDP serves as the nominal value for designing CPU cooling systems.

https://en.wikipedia.org/wiki/Thermal_design_power

2 minutes ago, leadeater said:

Well it was Intel that changed the meaning of TDP

that's the way Intel defined TDP since I dunnoo ... forever?!!?

10 minutes ago, zMeul said:

that's the way Intel defined TDP since I dunnoo ... forever?!!?

Not back before processors had power states and variable multipliers. It was around the time of the Intel Core series that Intel started marketing TDP differently, no longer using the industry-standard definition, but in the CPU world they are big enough to set their own standard without really causing any big problems.

 

Here's an interesting read on TDP for AMD vs Intel.

http://www.anandtech.com/show/2807/2

 

Edit:

Quote

In particular, until around 2006 AMD used to report the maximum power draw of its processors as TDP, but Intel changed this practice with the introduction of its Conroe family of processors.[4]

https://en.wikipedia.org/wiki/Thermal_design_power

On 3/13/2017 at 7:13 PM, zMeul said:

except it's ~14% xD

Ryzen on DDR4 does less than Haswell IPC .. or you did forgot that

can I just insert the obligatory "man, moore's law is more dead than a hooker in a river"? I mean core count..yea I guess but come on 

On 3/13/2017 at 9:32 PM, zMeul said:

content creation does not include compiling code

and compiling code can be done on a bottom of the barrel celly

Oh yeah, that's why I literally almost shot my PC when I first made the jump to quad core all those years ago and discovered the compiler I used only used 2 threads? Compile times, not gaming, were why I made the jump. 

1 hour ago, leadeater said:

Not back before processors had power states and variable multipliers. It was around the time of the Intel Core series that Intel started marketing TDP differently, no longer using the industry-standard definition, but in the CPU world they are big enough to set their own standard without really causing any big problems.

 

Here's an interesting read on TDP for AMD vs Intel.

http://www.anandtech.com/show/2807/2

 

Edit:

https://en.wikipedia.org/wiki/Thermal_design_power

Let's not forget, Intel pushed for "Scenario Design Power" in an attempt to get away with power-throttling to stay within a specified TDP. This happens on their mobile SKUs and their desktop T SKUs (unless you specifically turn it off via BIOS on the desktop SKUs). 

 

This is why I tell people to only take TDP into consideration if they do not intend to overclock or to run the most stressful programs (Prime95, Linpack, etc.). If you intend to overclock, TDP is no longer accurate, not even in the slightest. When it comes to heat, voltage scales quadratically, and you will likely need a cooler with a TDP rating far more aggressive than what your CPU originally needed. Heat still scales with clock speed, but far more linearly if you are only changing clocks. 
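The quadratic-voltage point follows from the usual first-order model of CMOS dynamic power, P ≈ a·C·V²·f (switching activity a, capacitance C, voltage V, frequency f). A small Python sketch of that ratio; it ignores leakage (static) power, which also rises with voltage and temperature, so treat the numbers as rough, and the overclock figures in the example are hypothetical:

```python
def relative_dynamic_power(v_old: float, v_new: float,
                           f_old: float = 1.0, f_new: float = 1.0) -> float:
    """Ratio P_new / P_old under P ~ a * C * V^2 * f (dynamic power only).

    Switching activity a and capacitance C are assumed unchanged, so they cancel.
    """
    if min(v_old, v_new, f_old, f_new) <= 0:
        raise ValueError("voltages and frequencies must be positive")
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    # Frequency alone scales power linearly: +12.5% clock -> +12.5% power.
    print(f"{relative_dynamic_power(1.20, 1.20, 4.0, 4.5):.3f}")
    # Hypothetical overclock: 4.0 -> 4.5 GHz that also needs 1.20 -> 1.35 V.
    print(f"{relative_dynamic_power(1.20, 1.35, 4.0, 4.5):.3f}")
    # Undervolt at stock clocks, e.g. 1.23 V -> 1.14 V.
    print(f"{relative_dynamic_power(1.23, 1.14):.3f}")
```

Under this model the clock bump alone costs 12.5% more power, the same bump with the extra voltage costs about 42% more, and a 1.23V to 1.14V undervolt saves roughly 14%.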

 

I could go on and on about TDP, like how different cooler manufacturers often test differently, or outright lie (ID Cooling pretends that this 45mm vapor chamber is rated for 130w, I can assure you it's not) when it comes to advertising these numbers, but that rant would go on for days. 

My (incomplete) memory overclocking guide: 

 

Does memory speed impact gaming performance? Click here to find out!

On 1/2/2017 at 9:32 PM, MageTank said:

Sometimes, we all need a little inspiration.

 

 

 

23 minutes ago, MageTank said:

Let's not forget, Intel pushed for "Scenario Design Power" in an attempt to get away with power-throttling to stay within a specified TDP. This happens on their mobile SKUs and their desktop T SKUs (unless you specifically turn it off via BIOS on the desktop SKUs). 

 

This is why I tell people to only take TDP into consideration if they do not intend to overclock or to run the most stressful programs (Prime95, Linpack, etc.). If you intend to overclock, TDP is no longer accurate, not even in the slightest. When it comes to heat, voltage scales quadratically, and you will likely need a cooler with a TDP rating far more aggressive than what your CPU originally needed. Heat still scales with clock speed, but far more linearly if you are only changing clocks. 

 

I could go on and on about TDP, like how different cooler manufacturers often test differently, or outright lie (ID Cooling pretends that this 45mm vapor chamber is rated for 130w, I can assure you it's not) when it comes to advertising these numbers, but that rant would go on for days. 

hah you ever seen the fucking ratings on passive coolers? 

"we idled a 140w cpu on it, it's good for 140w"

"You can place it on a stove burner man, it's all good"

and I'm not talking about server ones that are meant to have air forced through, I mean consumer ones. 

6 minutes ago, Syntaxvgm said:

hah you ever seen the fucking ratings on passive coolers? 

"we idled a 140w cpu on it, it's good for 140w"

"You can place it on a stove burner man, it's all good"

and I'm not talking about server ones that are meant to have air forced through, I mean consumer ones. 

Yes, actually, lol. This vapor chamber of mine is rated for 130W, but the moment I throw even 120W at it, I get near 98C under 48k FFT Prime95. I hit the thermal junction limit before the first pass of Linpack finishes, so I can't even tell you what it does at its max heat. 

 

Now, I am not ignorant, and I know they can't possibly rate their performance for absolutely every case in existence, but it's an ITX cooler, which implies it's designed to be used in an ITX case. My thermals are done in a 10L case with very good airflow from the side-vents for the CPU. In fact, taking the top panel completely off and exposing all of the internals, actually made the performance worse (less centralized air on the CPU itself). This is why I ignore TDP ratings and have taken the time to study the design of the heatsinks themselves. That being said, they were not too far off with their rating. I would say it's rated for about 105-110w. This is still more than the Cryorig C7 (95w, and in my personal tests, it did slightly outperform the C7, though at much louder fan speeds) so it's still one of the best ITX coolers you can buy (ignoring ID Cooling's awful QA and nearly double the price of the C7 for only a few C difference). 

 

Another thing to consider: my CPU's stock TDP is 91W. I delidded it and undervolted it from 1.23V down to 1.14V, and its clocks are still stock as well. I had to do all of this to survive 48k FFT Prime95 with this cooler. It does, however, draw 122W during Prime95, which is where I got my 120W number from. Under gaming load, this cooler is more than enough, so I give it a pass. 

2 hours ago, LAwLz said:

@Tomsen @leadeater

BeHardware has benchmarked a 2+2 vs a 4+0 configuration to determine how much of an impact the interconnect has.

 

Conclusion: Next to no impact at all. So if this is any indication, then an update to the Windows scheduler where it avoids moving data between CCXs would not make a difference.

Assuming that Windows' scheduler became 100% flawless in allocating threads to specific CCXs, we're looking at an average (not counting the abnormality that is BF1) of ~2% performance increase. That might as well be within margin of error.

(Funnily enough, the same performance increase the "legendary" Bulldozer optimization patch also brought us).

 

So once again, the idea that Microsoft is to blame and that Windows 10 just isn't "optimized" for Ryzen has been debunked.

 

[Image: Capture.PNG (BeHardware 2+2 vs 4+0 benchmark results)]

You didn't come up with the same conclusion as the person who you got your data from. Did you even read the report? I know it is in French or whatever, but you could just use google translate.

 

Just to quote some pieces of the report:

Quote

Interestingly enough (we will come back to this), the 8 MB cache is available in each CCX in the 2+2 configuration. In theory, therefore, this configuration has an advantage: it has access to 2 x 8 MB of L3, against only 1 x 8 MB for the 4+0 configuration.

Quote

X264 and x265, which are not sensitive to the memory subsystem, perform virtually identically in both modes.

 

Surprise however, the cases of WinRAR and 7-Zip, two benchs very sensitive to the memory subsystem that show very different results.

 

In the case of 7-Zip, the 2 + 2 configuration is the most interesting. It seems that the software benefits better from the presence of 16 MB of L3. Conversely, WinRAR is more penalized by synchronization and the additional L3 cache does not compensate.

So right off the bat, 4 of the 10 programs that were run don't seem to have much cross-CCX communication, so 40% of the tested results are basically useless to this debate.

 

The remaining 60% of the programs (the games) all show some kind of performance regression, varying from ~3% up to ~20% (you could argue that half of them are within the margin of error, but since EVERY game showed a regression, I would argue that isn't necessarily true). This gives better insight into our debate. To quote the reporter (or whatever he is):

Quote

In all cases, the configuration where a CCX is disabled performs best. The additional L3 cache does not change anything. The losses vary a lot between titles, but for some the difference is massive: Battlefield 1 shows a differential of almost 20%, which translates in practice to a 22 FPS difference! Data synchronization seems extremely penalizing in this title. Project Cars and Civilization VI also incur significant losses.

 

And here is his summarization:

Quote

We are now beginning to see a little more clearly. Yes, communication between CCXs has a cost, and depending on the application it is not necessarily harmless.

 

For less sensitive applications, the effect is almost zero, as is the case with video encoding software, for example.

 

For others it is much more striking, as for example Battlefield 1 where one loses 20% performance.

 

He also notes:

Quote

There may be aggravating factors. Knowing that communication between CCXs is a hassle, moving threads from one CCX to another (Windows 10 loves constantly moving threads!) can make the situation worse, although it is difficult to quantify by how much.

Quote

This does not mean that things will not change for Ryzen in the future. The most obvious solution would be a patch for the Windows scheduler, in order to limit thread movements from one CCX to another.

To sum it up, the test data is extremely limited: 10 programs, of which 4 don't seem to have much cross-CCX communication, and the rest all showed some kind of regression.
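The "every game regressed" argument can be quantified with a simple sign test. If the interconnect had no effect and each game's 2+2 vs 4+0 difference were equally likely to land on either side of zero, the chance of all six games regressing by luck would be (1/2)^6, about 1.6%. A minimal Python sketch; it assumes the six results are independent and ignores effect sizes, and the function name is hypothetical:

```python
from math import comb


def sign_test_p(regressions: int, total: int) -> float:
    """One-sided sign test: probability that at least `regressions` of `total`
    results are regressions if each were a fair coin flip (no real effect)."""
    if not 0 <= regressions <= total:
        raise ValueError("need 0 <= regressions <= total")
    count = sum(comb(total, k) for k in range(regressions, total + 1))
    return count / 2 ** total


if __name__ == "__main__":
    print(sign_test_p(6, 6))  # all six games slower in 2+2
    print(sign_test_p(5, 6))  # if one of them had gone the other way
```

This prints 0.015625 and 0.109375: six regressions out of six would be unlikely from noise alone, while five out of six would be much weaker evidence.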

 

It's funny how you have portrayed my arguments. I never said Microsoft is to blame. I said that scheduler optimizations could potentially yield some performance improvement, which the reporter seemed to agree with. 

 

I would consider this lazy debunking on your part; to be honest, I would have expected better. It really seems like you didn't even bother to read the report.

Please avoid feeding the argumentative narcissistic academic monkey.

"the last 20 percent – going from demo to production-worthy algorithm – is both hard and is time-consuming. The last 20 percent is what separates the men from the boys" - Mobileye CEO


On that cross-CCX benchmarking in 4+0 or 2+2 scenarios, I think they are showing the wrong numbers. In gaming, I think the bigger impact would be felt on the MINIMUMS: forcing more cross-CCX talk would lower the minimums more than it drops the maximums or averages. But since the L3 cache doesn't appear to be affected by disabling cores, this route may be fruitless, as there is less and less cause for cross talk when you disable cores. A more rigorous methodology may be needed: forcing crosstalk, then denying it, and looking at the specific performance changes. I do not think it will be enough to make a big difference in either case, though.
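A minimal Python sketch of why minimums can move while averages barely do: average FPS comes from total frame time, while a "1% low" comes from only the slowest frames. The function name and the sample frame times are hypothetical, and benchmark tools define 1% lows in slightly different ways (this one averages the slowest 1% of frames):

```python
def fps_summary(frame_times_ms: list[float]) -> tuple[float, float]:
    """Return (average FPS, 1% low FPS) from per-frame render times in ms.

    The 1% low here is the FPS equivalent of the mean of the slowest 1% of
    frames (at least one frame); other tools use a 99th-percentile frame time.
    """
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    n = len(frame_times_ms)
    avg_fps = 1000.0 * n / sum(frame_times_ms)
    slowest = sorted(frame_times_ms, reverse=True)[: max(1, n // 100)]
    low_fps = 1000.0 * len(slowest) / sum(slowest)
    return avg_fps, low_fps


if __name__ == "__main__":
    smooth = [10.0] * 100           # steady 100 FPS
    stutter = [10.0] * 99 + [50.0]  # same run with one 50 ms hitch
    for name, run in (("smooth", smooth), ("stutter", stutter)):
        avg, low = fps_summary(run)
        print(f"{name}: avg {avg:.1f} FPS, 1% low {low:.1f} FPS")
```

In this made-up run a single 50 ms hitch costs about 4% of the average FPS but drops the 1% low from 100 to 20 FPS.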

 

EDIT: Hypothesis: keep all 8 cores running, force the affinity of 2 multi-threaded apps into their OWN 2+2 or 4+0 sets and run them concurrently. Then check the results against each other: 2+2 and 2+2 vs 4+0 and 0+4.
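The affinity masks for that experiment could be generated like this. A sketch assuming 8 physical cores with two logical cores (SMT siblings) each, numbered consecutively, with physical cores 0-3 on one CCX and 4-7 on the other; that numbering is an assumption to verify on a real system. The helper names are hypothetical, and the hex strings are the form Windows' `start /affinity` accepts:

```python
LOGICAL_PER_PHYSICAL = 2  # assumption: SMT siblings get consecutive logical IDs


def mask_for_physical_cores(physical_cores: list[int]) -> int:
    """Affinity bitmask covering every logical core of the given physical cores."""
    mask = 0
    for p in physical_cores:
        for t in range(LOGICAL_PER_PHYSICAL):
            mask |= 1 << (p * LOGICAL_PER_PHYSICAL + t)
    return mask


# Two apps, four physical cores each; physical 0-3 = CCX0, 4-7 = CCX1 (assumed).
CONFIGS = {
    "4+0 / 0+4": ([0, 1, 2, 3], [4, 5, 6, 7]),  # each app owns a whole CCX
    "2+2 / 2+2": ([0, 1, 4, 5], [2, 3, 6, 7]),  # each app straddles both CCXs
}

if __name__ == "__main__":
    for name, (app_a, app_b) in CONFIGS.items():
        print(f"{name}: app A {mask_for_physical_cores(app_a):04X}, "
              f"app B {mask_for_physical_cores(app_b):04X}")
```

This prints `00FF`/`FF00` for the 4+0 and 0+4 split and `0F0F`/`F0F0` for the 2+2 split, so both apps always get the same number of cores and only their CCX placement changes.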

9 hours ago, leadeater said:

Yea that's basically where I'm stuck at as to which of those two options pays off better. Seems wasteful and costly to disable that many cores and use up wafer area to deliver 4 core products. Naples while it does show how AMD can scale CCXs is very different in die design regarding PCIe lanes and memory controller and looks to be only offering 8 (2 CCX), 16 (4/6 CCX), 24 (6/8 CCX) and 32 (8 CCX) products and is a poor example to use for gauging if a single CCX die design is going to be used.

 

We also know AMD is favoring a market push to high core products all round so how much they actually want to invest in 4 core products is unknown.

There's something important here as well: cheap lower-end CPUs sell a lot more, which might mean AMD would have to use perfectly good chips for lower-end products, which is bad for business.

I would argue that the sheer potential demand for 4-core CPUs would make them use a die just for that, as it would be expensive to use 2 CCXs for it. Don't forget this node is mature, as it has been producing all the Polaris chips for almost a year now.

8 hours ago, leadeater said:

I probably wasn't clear about what I meant by a scalable CCX design. Any SKU that uses a different number of CCXs is a different die design; cutting the current die isn't possible.

 

Basically, the CCXs are paired with a memory controller and I/O silicon logic. That means that while they can put any number of CCXs they wish (maybe in pairs?) on a die, they still need the CCX interconnects plus the memory controller/PCIe that go with them. Cut the die and you cut the memory controller, which means a broken, non-functional die.

 

That's where it starts to get really complex and further into more unknowns:

  • Can a single CCX be properly connected to the memory controller or is a minimum of two required?
  • Can a single CCX die actually be made smaller in physical area? Can the memory controller etc be rearranged?
  • Is the potential sales volume of the 4 core SKUs worth investing in a dedicated die design?

My personal feeling regarding the CCX is that it was always designed primarily for Naples, to let AMD easily make a few different die designs covering a large range of core-count SKUs cheaply and to maximize wafer area usage. Ryzen is the product of making do with what you have rather than a specifically dedicated design. Surely a 100% Ryzen-focused design wouldn't have a CCX interconnect in it at all and would be a unified 8 core??

 

I'd love to have an open and honest discussion with the Zen architecture engineers to find out where the real focus was, Ryzen or Naples.

Don't forget they also made Ryzen with low-power uses in mind, like consoles, laptops, embedded, etc., and we will have a 4-core APU late this year, so that has to have 1 CCX, as they won't sell any 8/6-core APU.

Ryzen's CCX is a cost-saving measure, so that making various dies is as cheap as possible.

