zMeul

AMD speaks on W10 scheduler and Ryzen


9 minutes ago, LAwLz said:

Well that makes a bit more sense (I thought you meant it should be a hard coded rule) but I don't think it would be a good idea regardless. How would you determine if it would be an exception to the default behavior? Does the application developer do that? Then you run the risk of bad performance being the default for a lot of programs. Does Windows determine that? I don't see how they could do it without real-time analysis of how much intercommunication is going on between the CCXs.

The same way you determine whether you should keep the (software) thread on the same core and context switch with time slices, or use another core. This will be determined from the process priority you assign or get assigned. It should be the scheduler developers who do that, and application developers can do further optimizations if deemed necessary.

 

Often developers will separate a child process from the parent's thread if it has no dependencies on it and is running critical code. This allows you to differentiate by process priority to a greater extent.
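
As a rough illustration of the application-side optimization mentioned above, a program could restrict itself to the logical processors of a single CCX. The sketch below only computes the affinity bitmask. It assumes an 8-core Ryzen with SMT whose logical processors are numbered consecutively per CCX (0-7 on the first CCX, 8-15 on the second); that enumeration order is an assumption, not something stated in this thread, and `ccx_affinity_mask` is a hypothetical helper name.

```python
def ccx_affinity_mask(ccx_index, cores_per_ccx=4, threads_per_core=2):
    """Return a bitmask selecting every logical processor of one CCX.

    Assumes logical processors are numbered consecutively per CCX,
    which may not hold on every system.
    """
    logical_per_ccx = cores_per_ccx * threads_per_core
    first = ccx_index * logical_per_ccx
    mask = 0
    for cpu in range(first, first + logical_per_ccx):
        mask |= 1 << cpu  # set the bit for this logical processor
    return mask


print(hex(ccx_affinity_mask(0)))  # 0xff
print(hex(ccx_affinity_mask(1)))  # 0xff00
```

A mask of this shape is what affinity calls such as the Windows `SetProcessAffinityMask` function expect. Whether pinning actually helps depends on the workload.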

 

9 minutes ago, LAwLz said:

Yes it has been benched with x264 and I wouldn't say it was amazing, but it was good.

The 1700X is about 11% faster than a 7700K.

For comparison, the 6900K is 23% faster than the 1700X.

 

For HEVC, the 1700X, 1800X, 6900K and 7700K are all pretty much the same. It's like a 4% difference between the fastest and slowest.

 

 

I talked to one of the x264 developers and he said that it isn't as parallel as some people might think (although he was talking about CPU vs GPU encoding, so like 8 cores vs 1000 cores).

I think that's because it gains a lot of parallelism from using macro blocks, but each one of those macro blocks has a lot of dependencies tied to them. I have no idea how much core to core communication is going on in video encoding though. It is probably not all that much, but I am really not qualified to comment on it.

I didn't say it was amazing, I just said it fared quite well even compared to Intel's lineup. So it doesn't seem like x264 is affected by CCX-to-CCX communication.
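
For reference, the two x264 figures quoted above can be chained to compare the 6900K with the 7700K directly. This is only arithmetic on the percentages given in the quote (11% and 23%), not a new benchmark, and `compound_speedup` is a hypothetical helper name.

```python
def compound_speedup(*percent_gains):
    """Combine successive 'X% faster' figures into one overall percentage."""
    factor = 1.0
    for gain in percent_gains:
        factor *= 1.0 + gain / 100.0
    return (factor - 1.0) * 100.0


# 1700X vs 7700K: +11%, 6900K vs 1700X: +23% (figures quoted above)
print(round(compound_speedup(11, 23), 1))  # 36.5
```

So by those numbers the 6900K would land roughly 36.5% ahead of the 7700K in x264, since percentages multiply and do not simply add.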

 

9 minutes ago, LAwLz said:

That could work. I was just talking about what I thought Microsoft could do with the existing hardware though. If we're talking about hardware changes then wouldn't it be best to just do a single die (not two CCXs) and then use a ring bus like Intel does? That seems to be not as fast in some situations, but far more consistent.

That is hard to evaluate at this point; we have to see how much scheduler optimization can help to minimize the penalty of cross-CCX communication. It is, however, one area that AMD keeps getting into: trying to start their own trends (see Bulldozer, GCN, Mantle, etc.) and trying non-traditional methods in an attempt to get a competitive edge. They are not really in a position to do so considering their market share and current software architecture, but they keep trying.

 

What I am still interested to see is how well Zen will scale up to, say, 32 cores, and how much of an effect the cluster architecture will have compared to Intel's ring bus.


Please avoid feeding the argumentative narcissistic academic monkey.

"the last 20 percent – going from demo to production-worthy algorithm – is both hard and is time-consuming. The last 20 percent is what separates the men from the boys" - Mobileye CEO

2 hours ago, XenosTech said:

you should read down....

If you mean read the rest of your comment, I did, and how you feel about it is entirely up to you. Just know that your feelings on the subject are very illogical.


Royal Rumble: https://pcpartpicker.com/user/N3v3r3nding_N3wb/saved/#view=NR9ycf

 

"How fortunate for governments that the people they administer don't think." -- Adolf Hitler
 

"I am always ready to learn although I do not always like being taught." -- Winston Churchill

 

"We must learn to live together as brothers or perish together as fools." -- Martin Luther King Jr.

Just now, N3v3r3nding_N3wb said:

If you mean read the rest of your comment, I did, and how you feel about it is entirely up to you. Just know that your feelings on the subject are very illogical.

Illogical because I view an i7 as the norm for gaming?


CPU: Intel i7 7700K | GPU: ROG Strix GTX 1080Ti | PSU: Seasonic X-1250 (faulty) | Memory: Corsair Vengeance RGB 3200Mhz 16GB | OS Drive: Western Digital Black NVMe 250GB | Game Drive(s): Samsung 970 Evo 500GB, Hitachi 7K3000 3TB 3.5" | Motherboard: Gigabyte Z270x Gaming 7 | Case: Fractal Design Define S (No Window and modded front Panel) | Monitor(s): Dell S2716DG G-Sync 144Hz, Acer R240HY 60Hz | Keyboard: G.SKILL RIPJAWS KM780R MX | Mouse: Steelseries Sensei 310 (Striked out parts are sold, awaiting zen2 parts)

Just now, XenosTech said:

Illogical because I view an i7 as the norm for gaming?

Rather that you call it average, when in fact there isn't anything currently available that you would consider above average. It just goes against what average means.


Just now, Tomsen said:

Rather that you call it average, when in fact there isn't anything currently available that you would consider above average. It just goes against what average means.

Based on context a word can mean several things. There isn't anything new about the i7s that we haven't known about them for the last idek how many years.... We know they have excellent single core performance, they have HT, and now with Kaby Lake the only thing new is how easily you can push them to 5 GHz without having to shell out a ton of money for a custom water loop, since we can hit that on air with a decent cooler. So to me they're just an average chip now, and I don't mean average in terms of performance, which is how you are interpreting what I said.


Posted (edited) · Original PosterOP
6 hours ago, cj09beira said:

there's one thing the scheduler could do to improve perf though, which is keep threads with a lot of crosstalk in the same CCX. but this is adding a feature to the scheduler, not exactly a bug

so, let me get this straight: you want an 8-core CPU to be treated as 2 NUMA nodes, yes?

ok, what will happen with the R5s and R3s, when those CCX nodes will only have 3 and 2 cores per node?

 

Edited by wkdpaul
cleaned up
3 minutes ago, zMeul said:

ok, what will happen with the R5s and R3s, when those CCX nodes will only have 3 and 2 cores per node?

My understanding is that the 4-core Ryzen R5 and R3s will consist of a single CCX node...  

Posted · Original PosterOP
1 minute ago, WMGroomAK said:

My understanding is that the 4-core Ryzen R5 and R3s will consist of a single CCX node...  

no, they won't

from what I got, each node will have cores disabled

 

if they have a single CCX, that will be like manna from heaven

28 minutes ago, zMeul said:

no, they won't

from what I got, each node will have cores disabled

 

if they have a single CCX, that will be like manna from heaven

This article from Ars Technica (https://arstechnica.com/gadgets/2017/03/amds-moment-of-zen-finally-an-architecture-that-can-compete/2/) would seem to indicate that the 4-core chips will be a single CCX (which makes more sense than building a dual-CCX chip and then going in and disabling half the chip and its associated L3 cache, either physically or in microcode).

 

Quote

In the second quarter, these will be joined by Ryzen 5. The R5 1600X will be a six-core, 12-thread chip running at 3.6-4.0GHz (two CCXes, with one core from each disabled), and the 1500X will be a four-core, eight-thread chip at 3.5-3.7GHz (just a single CCX).

If you've got any news that would indicate that the 4-core SKUs are all going to be dual CCXes, I would enjoy reading that as well...

31 minutes ago, XenosTech said:

Based on context a word can mean several things. There isn't anything new about the i7s that we haven't known about them for the last idek how many years.... We know they have excellent single core performance, they have HT, and now with Kaby Lake the only thing new is how easily you can push them to 5 GHz without having to shell out a ton of money for a custom water loop, since we can hit that on air with a decent cooler. So to me they're just an average chip now, and I don't mean average in terms of performance, which is how you are interpreting what I said.

Yes, based on context, words find their meaning.  But, to have the desired meaning, the context must be clear.  Now that you've explained what you meant, it's obvious.  Before, going just on unclear context, it was not obvious.


26 minutes ago, WMGroomAK said:

This article from Ars Technica (https://arstechnica.com/gadgets/2017/03/amds-moment-of-zen-finally-an-architecture-that-can-compete/2/) would seem to indicate that the 4-core chips will be a single CCX (which makes more sense than building a dual-CCX chip and then going in and disabling half the chip and its associated L3 cache, either physically or in microcode).

 

If you've got any news that would indicate that the 4-core SKUs are all going to be dual CCXes, I would enjoy reading that as well...

It will be a single CCX; that has quite clearly been indicated, and it is cheaper to manufacture, which is a high priority for AMD.

3 minutes ago, leadeater said:

It will be a single CCX; that has quite clearly been indicated, and it is cheaper to manufacture, which is a high priority for AMD.

That's what I thought as well...  It also makes sense in that it provides at least a bit of framework for the APUs, which should be a single CCX and a GPU SoC.  


The intercommunication between the two CCXs definitely sounds like part of the easy IPC gains that they were talking about in the AMA. Whatever you might be gleaning from this, there's a lot that AMD could improve upon, just like Intel with Nehalem. And honestly, given where Zen is now, having obvious and fixable bottlenecks makes me very optimistic for the future of Zen-based chips.

2 hours ago, zMeul said:

so, let me get this straight: you want an 8-core CPU to be treated as 2 NUMA nodes, yes?

ok, what will happen with the R5s and R3s, when those CCX nodes will only have 3 and 2 cores per node?

 

well no, I don't; that's why I said new feature.
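
The "new feature" discussed in this exchange, keeping threads with a lot of crosstalk on the same CCX, could look roughly like the sketch below. It is not how the Windows scheduler works; it is a simple greedy grouping under stated assumptions: the amount of traffic between each pair of threads is already known (the `traffic` dictionary is hypothetical input, and measuring it is the hard part raised earlier in the thread), and each CCX can hold `capacity` threads.

```python
def group_threads_by_crosstalk(threads, traffic, capacity):
    """Greedily merge the chattiest thread pairs into groups of at most `capacity`.

    `traffic` maps (thread_a, thread_b) pairs to a communication score.
    Each resulting group is a candidate for placement on a single CCX.
    """
    group_of = {t: {t} for t in threads}
    # Visit thread pairs from most to least crosstalk.
    for (a, b), _ in sorted(traffic.items(), key=lambda kv: kv[1], reverse=True):
        ga, gb = group_of[a], group_of[b]
        if ga is not gb and len(ga) + len(gb) <= capacity:
            merged = ga | gb
            for t in merged:
                group_of[t] = merged
    # Collect each distinct group once, in first-seen order.
    groups = []
    for g in group_of.values():
        if not any(g is seen for seen in groups):
            groups.append(g)
    return [sorted(g) for g in groups]


example_traffic = {("a", "b"): 100, ("c", "d"): 90, ("b", "c"): 5}
print(group_threads_by_crosstalk(["a", "b", "c", "d"], example_traffic, 2))
# [['a', 'b'], ['c', 'd']]
```

The sketch can produce more groups than there are CCXs, and it ignores priorities and load balancing, so a real scheduler would need considerably more than this.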

20 hours ago, djdwosk97 said:

How come Ryzen's latency within a CCX is so much lower?

Fewer things to connect means lower latency, depending on how you interconnect cores. In this case, half the cores, half the latency.

Posted · Original PosterOP
2 hours ago, WMGroomAK said:

This article from Ars Technica (https://arstechnica.com/gadgets/2017/03/amds-moment-of-zen-finally-an-architecture-that-can-compete/2/) would seem to indicate that the 4-core chips will be a single CCX (which makes more sense than building a dual-CCX chip and then going in and disabling half the chip and its associated L3 cache, either physically or in microcode).

 

If you've got any news that would indicate that the 4-core SKUs are all going to be dual CCXes, I would enjoy reading that as well...

I believe that's bull, why? simple .. let's look at the Zen die shot:

ryzen-die.jpg

 

 

one of the R5s will be a 6-core / 12-thread CPU, yes? that's basically impossible to do with a single CCX

 

now the R3s with 4 cores / 8 threads - theoretically it's quite possible to do on a single CCX, but practically not possible because there is a shit ton more stuff on the CPU die than just cutting one CCX away

 

what's more plausible?

this:

T2GBbkN.png

or this:

8SXGVG1.png

I bet on no. 2

8 minutes ago, zMeul said:

I believe that's bull, why? simple .. let's look at the Zen die shot:

ryzen-die.jpg

 

 

one of the R5s will be a 6-core / 12-thread CPU, yes? that's basically impossible to do with a single CCX

 

now the R3s with 4 cores / 8 threads - theoretically it's quite possible to do on a single CCX, but practically not possible because there is a shit ton more stuff on the CPU die than just cutting one CCX away

 

what's more plausible?

this:

T2GBbkN.png

or this:

8SXGVG1.png

I bet on no. 2

Don't forget that AMD in the past has created entirely new dies for their lower end CPUs, as it makes them actually cheaper to manufacture than the higher end parts (if the lower end part for example had less cache, the size of the die and therefore the cost reflected that). AMD would get higher margins off a separate smaller die unless the 14nm yields aren't that good (which doesn't seem to be the case).


"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL

Posted · Original PosterOP
7 minutes ago, Dabombinable said:

Don't forget that AMD in the past has created entirely new dies for their lower end CPUs, as it makes them actually cheaper to manufacture than the higher end parts (if the lower end part for example had less cache, the size of the die and therefore the cost reflected that). AMD would get higher margins off a separate smaller die unless the 14nm yields aren't that good (which doesn't seem to be the case).

that's only possible for R3s with 4 cores, but not for R5s with 6 cores

their lithography success rate should be godlike, otherwise they'll throw away a lot of dies - testing the dies is not cheap either

 

here's one other dead giveaway that they would not have new dies - their R5 and R3 TDP is, for the most part, identical

if you cut a CCX away, the R3 should've been ~30W TDP parts, not 65W ;)

Just now, M.Yurizaki said:

Because the CCX's talk to each other through a much lower speed bus than the L3 caches within the same CCX talk to each other

I meant compared to Broadwell-E


PSU Tier List | CoC

Gaming Build | FreeNAS Server

Spoiler

i5-4690k || Seidon 240m || GTX780 ACX || MSI Z97s SLI Plus || 8GB 2400mhz || 250GB 840 Evo || 1TB WD Blue || H440 (Black/Blue) || Windows 10 Pro || Dell P2414H & BenQ XL2411Z || Ducky Shine Mini || Logitech G502 Proteus Core

Spoiler

FreeNAS 9.3 - Stable || Xeon E3 1230v2 || Supermicro X9SCM-F || 32GB Crucial ECC DDR3 || 3x4TB WD Red (JBOD) || SYBA SI-PEX40064 sata controller || Corsair CX500m || NZXT Source 210.

14 minutes ago, djdwosk97 said:

I meant compared to Broadwell-E

Broadwell-E uses a unified L3 cache for all cores, whereas the L3 in the CCX is split up into 4MB chunks totaling 16MB. However, all of the L3 caches talk at the same speed as the L3-to-L2 cache communication. So latency isn't that bad, but I guess it's still latency.

 

You can find an article that talks about it at https://www.techpowerup.com/231268/amds-ryzen-cache-analyzed-improvements-improveable-ccx-compromises

Edited by M.Yurizaki

Whether or not the quad core will be single or dual CCX remains to be seen. I don't think anyone can say one way or another right now. It entirely depends on what AMD's yields are and what the demand for the 8 and 6 core chips looks like.

8 hours ago, LAwLz said:

Whether or not the quad core will be single or dual CCX remains to be seen. I don't think anyone can say one way or another right now. It entirely depends on what AMD's yields are and what the demand for the 8 and 6 core chips looks like.

True, but the CCXs are contained entities designed to be scalable, for the purpose of Naples, which uses the same CCXs, so it is very likely a 4 core SKU will be a single CCX. Yields are neither here nor there, since to get the 8 core SKUs you need two functioning CCXs; asking for one isn't any tougher, so I don't see how that really plays much of a part in 2 CCX vs 1 CCX for a 4 core SKU.

 

Edit:

I think your point was more around oversupply of 2 CCX dies? The demand aspect? Lowering production and holding those back would make much more sense than just turning them into 4 core SKUs, if the intent from the start was to make a single CCX die. If that is the case, the design has already been done long ago and engineering samples have already been made or are being made.

End edit;

 

Design cost of a different die is the biggest factor versus just disabling cores, but then the cost per unit is higher. Which plays out better we can't know as we don't have those costing details and never will.

 

As for the 6 core SKU, that must be 2 CCXs, simple math :).
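
The "simple math" can be spelled out with a small sketch. It assumes 4 cores per CCX and at most 2 CCXs per die, as discussed in this thread, and it only considers layouts where every CCX has the same number of active cores (matching the Ars Technica description of the 1600X, but still an assumption here). `balanced_layouts` is a hypothetical helper name.

```python
def balanced_layouts(total_cores, max_ccx=2, cores_per_ccx=4):
    """List the ways to reach `total_cores` with equally populated CCXs.

    Each layout is a tuple of active cores per CCX.
    """
    layouts = []
    for n in range(1, max_ccx + 1):
        per_ccx, remainder = divmod(total_cores, n)
        if remainder == 0 and 1 <= per_ccx <= cores_per_ccx:
            layouts.append((per_ccx,) * n)
    return layouts


print(balanced_layouts(6))  # [(3, 3)]
print(balanced_layouts(4))  # [(4,), (2, 2)]
```

Six cores only fit as 3+3 across two CCXs, while four cores fit either as one full CCX or as 2+2, which is exactly why the quad-core layout is the part still being debated.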

 

For the above point about TDP: someone needs to go look up what TDP actually means, because it is not the power draw of the CPU. The 4 core SKU and 6 core SKU having the same TDP in no way indicates the CCX makeup.

31 minutes ago, leadeater said:

-snip-

I was thinking more along the lines of: we don't know what the yields or the demand are. After looking at the die shot on the previous page I am not even sure how the CCXs are split up. If you look at the die shot posted a bit earlier, it seems like there is no clean path where they can just split one Ryzen 7 chip into two Ryzen 3 chips, since the two CCXs aren't identical.

But now that I think more about it, that doesn't make any sense. It would be a terrible design decision.

 

Designing a new die just for the quad core version wouldn't really make sense either if their goal was to save money by being able to reuse the CCX design in all SKUs.

 

But then we also have the problem of supply/demand. Let's not forget that AMD have sold quad cores as triple and even dual cores before, like you were alluding to a bit.

 

But what if they have a lot of CCXs where two cores are faulty, or the manufacturing process isn't mature yet and they get a lot of CCXs which don't pass their binning process in terms of power/heat?

 

If the die really looks the way it does in the die shot above (where there seem to be two different types of CCXs), and if their yields are bad, and/or the supply/demand is off, then I think it would make sense to use CCXs with one defective core for the 6 core version, and CCXs with two faulty cores for the quad core version.

 

But who knows... I just think it isn't as set in stone as it might appear.

Just cutting one 8 core into two quad cores seems like the most obvious way of doing things, but it doesn't seem like that's possible (or even economical).

2 minutes ago, LAwLz said:

But who knows... I just think it isn't as set in stone as it might appear.

Just cutting one 8 core into two quad cores seems like the most obvious way of doing things, but it doesn't seem like that's possible (or even economical).

Yea, that's basically where I'm stuck: which of those two options pays off better. It seems wasteful and costly to disable that many cores and use up wafer area to deliver 4 core products. Naples, while it does show how AMD can scale CCXs, is very different in die design regarding PCIe lanes and memory controller, and looks to be offering only 8 (2 CCX), 16 (4/6 CCX), 24 (6/8 CCX) and 32 (8 CCX) core products, so it is a poor example to use for gauging whether a single CCX die design is going to be used.

 

We also know AMD is favoring a market push to high core count products all round, so how much they actually want to invest in 4 core products is unknown.
