Ashes of the Singularity Receives Ryzen Performance Update - Up to 31% Improvement

HKZeroFive
22 hours ago, Bouzoo said:

Please, someone tell me how RAM speed doesn't affect performance.

 


Damn, didn't know 3200 vs 2400 MHz would be that big of a difference. Definitely getting higher speeds when I upgrade to DDR4 (still on an FX-6300 / shitty DDR3 RAM atm lol)

"Ryzen is doing really well in 1440p and 4K gaming when the applications are more graphics bound" - Dr. Lisa Su, 2017


25 minutes ago, Citadelen said:

In the quote in the OP they talk about how the instruction scheduler is also optimised for Intel; this would have to be patched as well. The improvements we've seen took them around 400 hours to implement.

I hope you realize that's not a thing... You can't schedule instructions. They just run. Now, rebalancing the threads and thread pool (and yes, thread management systems are complicated if you're not just going to spin up N and let them fly from the start to the finish), that is work, and that might be what this person meant by instruction scheduler, even though that is a horrible name for it.

 

If you use a thread pool like in this presentation, then yes, you've got some work to do, but you don't schedule instructions. That's not how it works. There are ways to reorder them within critical functions to get better utilization/throughput, but that's not something done on the fly by the software. The Out of Order engine on the CPU itself does that, and it usually will do better than a human for a code space of about 64 instructions.
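To make the distinction concrete, here is a minimal sketch of the task-level scheduling I'm describing - my own illustration, not Oxide's code: subroutines get wrapped in packaged tasks and handed to a pool of worker threads, and that is the only level at which the software "schedules" anything. The instructions inside each task simply run in program order.

// A minimal thread-pool sketch (my own illustration, not Oxide's code).
// Work is scheduled at the task level; the CPU's out-of-order engine
// handles instruction ordering on its own.
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    // Submit a subroutine as a packaged task; whichever worker is free first
    // picks it up (nondeterministic order, as noted above).
    std::future<void> submit(std::function<void()> fn) {
        std::packaged_task<void()> task(std::move(fn));
        auto fut = task.get_future();
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(task)); }
        cv_.notify_one();
        return fut;
    }
private:
    void run() {
        for (;;) {
            std::packaged_task<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !q_.empty(); });
                if (done_ && q_.empty()) return;
                task = std::move(q_.front());
                q_.pop();
            }
            task();  // the instructions inside just run, in program order
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::packaged_task<void()>> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};

int main() {
    ThreadPool pool(4);
    auto a = pool.submit([] { /* e.g. physics tick */ });
    auto b = pool.submit([] { /* e.g. AI update    */ });
    a.wait();
    b.wait();
}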


 

 


18 hours ago, Morgan MLGman said:

Well, while I'd like it to be true, I wouldn't go as far to say that "devs are actively optimizing games for ryzen" yet, because Oxide is in a close partnership with AMD...

What about the devs that were at Capsaicin telling everyone they are working with AMD to optimize games?

Bethesda was one of them, and they aren't exactly your average dev team; same goes for Epic Games :P

There are more of them but I can't remember which; you can always watch the AMD Capsaicin & Cream video if you want to...

Of course I'm not expecting every dev to optimize for AMD, but there are clear signs some of the bigger devs are working with AMD to optimize for it.

If you want my attention, quote meh! D: or just stick an @samcool55 in your post :3

Spying on everyone to fight against terrorism is like shooting a mosquito with a cannon


23 hours ago, Bouzoo said:

Please, someone tell me how RAM speed doesn't affect performance.

 

 

Let me correct that for you: RAM speed didn't really affect performance on CPUs with DDR3 memory controllers. With DDR4 memory controllers and new architectures, things have finally changed; for too long RAM has been only a capacity thing... now they make good use of the bandwidth as well.


4 minutes ago, yian88 said:

for too long RAM has been only a capacity thing...

I suppose you are referring to frequency? Scratch that, misread

 

4 minutes ago, yian88 said:

now they make good use of the bandwidth as well.

DDR4 did improve things more, yes, but there is a difference between DDR3 speeds as well - it's easy to test. Of course, it also depends on the title.

The ability to google properly is a skill of its own. 


21 minutes ago, yian88 said:

Let me correct that for you: RAM speed didn't really affect performance on CPUs with DDR3 memory controllers. With DDR4 memory controllers and new architectures, things have finally changed; for too long RAM has been only a capacity thing... now they make good use of the bandwidth as well.

Frankly by now they should be making RAM scream for mercy. If games kept up with AVX there'd be no way to keep our CPUs fed on dual-channel memory, even at 4266MHz.

 

For 1 Skylake core under AVX/2 and FMA3, vector fadd/fmull/fmadd each take 4 cycles, and you can do 2 mulls or adds at the same time.

 

If you're multiplying a quaternion over a dataset, let's say, then you have to pull in 512 bits (64 bytes) every 4 cycles to keep the pipeline fed.

 

At 4 GHz, you take 4 cycles per op (technically 2 ops, but it doesn't affect the math), so you get 1 Gops. 1 Gops * 64 bytes/op = 64 GB/s in bandwidth just to keep the pipeline fed. However, you also have to be pushing data back out to main memory at that same rate if it's a large dataset that doesn't fit in cache. That means you need 128 GB/s in bandwidth total to actually work at full tilt. Now multiply that by 4 physical cores and suddenly you need 512 GB/s. You need 0.5 TB/s in bandwidth to sustain working optimally on a single operation (called a reduction). Now, games make next to no use of AVX/2 or FMA3. They use SSE, which is half the throughput per op. You'd still need 256 GB/s to sustain this. Obviously, it's untenable, so in the HPC world they cluster both data and operations so they can maximize the work done for every single memory access.
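If anyone wants to sanity-check that arithmetic, here is a trivial back-of-envelope program (my own sketch; the 4-cycle figure and 64 bytes per op are the assumptions stated above, not measurements):

// Back-of-envelope check of the numbers above (a sketch; the 4-cycle latency
// and 64-bytes-per-op figures are taken straight from the post, not measured).
#include <cstdio>

int main() {
    const double hz       = 4.0e9;            // 4 GHz core clock
    const double ops      = hz / 4.0;         // one vector op per 4 cycles -> 1 Gop/s
    const double read_gbs = ops * 64.0 / 1e9; // 64 bytes streamed in per op -> 64 GB/s
    const double rw_gbs   = 2.0 * read_gbs;   // plus the write-back -> 128 GB/s per core
    const double quad_gbs = 4.0 * rw_gbs;     // 4 physical cores -> 512 GB/s
    std::printf("%.0f GB/s read, %.0f GB/s read+write, %.0f GB/s for 4 cores\n",
                read_gbs, rw_gbs, quad_gbs);
    return 0;
}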

 

Games are nowhere near where they should be considering AVX is as old as Bulldozer, Sandy Bridge, and Jaguar (in the PS4).


21 hours ago, SamStrecker said:

Let's not forget it's only certain games, as it varies across many. Another example: GTA V.

To say RAM speeds don't give any FPS gain is wrong nowadays. It might have been true in the DDR2 and DDR3 days when Linus made a video on it. But when you have RAM that is clocked the same as or faster than your CPU, it matters.

It wasn't even true back then. Linus's testing methodology was garbage, period. He used Metro LL with a 660 Ti, and copious amounts of AA. That GPU was dying under that kind of stress, lol.

 

I have been spreading this information for years now, ever since I stumbled upon this old thread on OCN: http://www.overclock.net/t/1487162/an-independent-study-does-the-speed-of-ram-directly-affect-fps-during-high-cpu-overhead-scenarios

 

We are talking about the ancient 3930K getting significant gains across the board. And it's not just in newer titles. Go back to old MMOs like Silkroad Online or Perfect World, which were notoriously CPU bound, and check your framerate scaling across several different memory overclocks. If I could tell my younger self that faster RAM helped, I would have won far more territory wars, lol.

 

Even if you push your GPU to the forefront, and let it become the bottleneck, your minimum framerates (the ones that are almost always dictated by CPU (I/O) bound situations) still benefit from faster memory. I use a GTX 1070 at 1440p, and while this card is plenty fast for 1440p, it's the bottleneck in the more demanding titles. However, if I load JEDEC instead of my 3600 C14, I can seriously feel the difference when entering different towns/zones, as to the degree in which my frames dip. It matters, and I believe it always has. I just don't think we knew to monitor minimum fps first, and to consider the rest of our system when testing the results back then. 

My (incomplete) memory overclocking guide: 

 

Does memory speed impact gaming performance? Click here to find out!

On 1/2/2017 at 9:32 PM, MageTank said:

Sometimes, we all need a little inspiration.

 

 

 


2 hours ago, MandelFrac said:

-snip-

I'm going to trust the people at Oxide over you.

Pixelbook Go i5 Pixel 4 XL

21 minutes ago, Citadelen said:

I'm going to trust the people at Oxide over you.

Sigh, please read these: http://www.intel.com.au/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf

 

How do you schedule actual instructions? You would need instructions for doing that, and they don't exist. You put subroutines in packaged tasks which end up being given to threads in a set order or taken from a thread pool nondeterministically. Those are the two ways you "schedule" getting work done on a CPU, end of story. Otherwise you just rip right down the line of instructions sequentially. You might jump around based on conditions, loop, make calls and returns, but you cannot go down to the ground level and actually schedule instructions in a different order than they are in the binary, not without shooting yourself in the foot on performance with a JIT system.

 

In some cases reorganizing the order of operations can yield some benefit. Integer division takes up to 90 cycles on Skylake, and while that division is running there are plenty of 4-cycle floating point adds/muls you could be doing - but moving the order of the instructions in the final binary is all you can do. You can't actually schedule them on the fly. You would have to modify the code as it's being run, meaning your game would have to contain its own JIT compilation and optimization engine. No one, including Cloud Imperium, does that, because your executable alone would bloat by 3 gigs just to house all of that code, and that's code which isn't actually contributing to how fast your app runs. That's a thread that's not actually doing the work of the game; it just seems like it is, just like the JVM.
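As a small illustration of that kind of compile-time ordering (my own sketch, nothing to do with AotS itself): the long-latency integer divide is issued first and its result is not consumed until the end, so the independent floating-point work can overlap with it in the out-of-order engine - and that ordering is fixed when the binary is compiled, not rescheduled at runtime.

// A minimal sketch of source/compile-time ordering (my own illustration):
// the divide's result is only needed at the end, so independent FP work
// overlaps with its latency. Nothing reorders this at runtime.
float overlap_divide(int a, int b, const float* x, int n) {
    const int q = a / b;              // integer division: tens of cycles of latency
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += x[i] * 1.5f;           // independent FP multiply-adds fill the gap
    }
    return acc + static_cast<float>(q);  // divide result only consumed here
}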

 

This is why Java cannot beat C++ at anything. Garbage collection can be used in C++ for those corner cases where GC is actually a more efficient memory-cleanup model than explicit deletion (hence some analytics workloads moving to Java/Gosu/Scala a few years back, before someone wrote an equally good GC for C++ runtimes), meaning Java can't be faster or more efficient. And you're going to tell me - someone who applies HPC in his career daily and writes most of his code using bleeding-edge C++17 features - that I don't know the fundamentals of how CPUs and coding work?

 

This is why Appeal to Authority is a logical fallacy, not an argument. It doesn't matter who says something. It matters if that something is true/correct. What I have said here is provably and demonstrably correct.


16 minutes ago, Citadelen said:

I'm going to trust the people at Oxide over you.

And read these:

https://www.google.com.au/search?q=instruction+scheduling&rlz=1C5CHFA_enAU735AU737&oq=instruction+scheduling&aqs=chrome..69i57l2j69i60j69i59j69i60l2.3239j0j7&sourceid=chrome&ie=UTF-8

 

Instruction Scheduling as an actual technique happens at compile time, meaning what they did was just reorder instructions in the flat binary. At runtime, all you get is threads and thread pools.

 

You should be more open to learning from others, because we all have some knowledge that others lack.


22 hours ago, Morgan MLGman said:

In general, the more the bottleneck is shifted towards the CPU and the faster the CPU is, the more RAM speed matters. So if you had a 6700K paired with a GTX 1080 at 1080p and used two different kits of RAM, one at 2133MHz and one at 3200MHz, the difference could be around 20 FPS in more CPU-dependent games.

Thank you!

There's so much "RAM speed doesn't matter" (as blanket as that, sometimes not even "for games" is added), when the most accurate version would be "only the GPU matters", because at the end of the day that's what happens. Which is fine, you are playing a game, and games have these fancy 3D graphics which are taxing your system, so you will find this situation often, but then you may as well restrict the discussion to GPUs and that's it (which is why I also tell people not to obsess over "CPU for gaming").

 

If you think about it, there's a whole family of "doesn't matter" that stems from the same GPU-dependent scenario:

- Clocks don't matter

- CPU cache doesn't matter

- RAM speed doesn't matter

- Cores don't matter (this can be true, but depends on the application)

...

 

So yea, sometimes a potato with a 1080ti is all you need. But in those cases there's no point in discussing whether normal or sweet potato is best...


6 minutes ago, SpaceGhostC2C said:

Thank you!

There's so much "RAM speed doesn't matter" (as blanket as that, sometimes not even "for games" is added), when the most accurate version would be "only the GPU matters", because at the end of the day that's what happens. Which is fine, you are playing a game, and games have these fancy 3D graphics which are taxing your system, so you will find this situation often, but then you may as well restrict the discussion to GPUs and that's it (which is why I also tell people not to obsess over "CPU for gaming").

 

If you think about it, there's a whole family of "doesn't matter" that stems from the same GPU-dependent scenario:

- Clocks don't matter

- CPU cache doesn't matter

- RAM speed doesn't matter

- Cores don't matter (this can be true, but depends on the application)

...

 

So yea, sometimes a potato with a 1080ti is all you need. But in those cases there's no point in discussing whether normal or sweet potato is best...

kek.

 

But yeah, it's not like it's even all that difficult to completely swallow our current top bandwidth capabilities, even in quad-channel configs. The 6950X needs a whopping 1.1TB/s in raw RAM bandwidth for that task I pointed out earlier, while quad-channel DDR4-4266 gives you 34.1 * 4 = 136.4GB/s. I know games are generally developed for the lowest common denominator, but it's really not that hard to build the critical paths of the CPU side with AVX intrinsics and make the CPU starve for data even with the very best RAM setup available.


5 hours ago, Speaker1264 said:

 

Dude, you realize that most games were optimized for Intel systems and not Ryzen systems because Ryzen didn't exist when they were optimizing their games? This Ashes of the Singularity optimization just proves that all previous game benchmarks don't mean diddly squat, unless you never plan on playing a new release. All new games coming out should now be getting optimizations for both Intel and Ryzen, and the benchmark margins for upcoming game releases will shrink or disappear completely compared to earlier benchmarks done with games that were optimized for Intel but not Ryzen.

For this reason, when the Ryzen R5s are benchmarked I'm only going to be looking at the Ashes of the Singularity benchmarks to compare performance, since it's the only game currently optimized for both Intel and Ryzen, imo. It's the only game where we will be able to see Ryzen's true performance, and that is the performance we are likely to see in future game releases.

I can find/replace this entire argument back at the Bulldozer launch and it will be indistinguishable: "Look at this fringe case and weird exception! If you get these timings, these specific settings, these specific temps and perform the benchmark at midnight during a full moon, AMD actually beats Intel by 1%! Just wait for more games and software to be optimized, because that will happen; Bulldozer is a great future-proofing CPU!"

Yes, it's not "fair" that almost no games are optimized for AMD processors. That's life. It isn't supposed to be fair, because you're not supposed to lose 5 fucking years with a crap product and expect to easily recover, expect gamers to buy your product to play ONE game better and then wait another 5 years, etc.

-------

Current Rig

-------


4 minutes ago, Misanthrope said:

I can find/replace this entire argument back at the Bulldozer launch and it will be indistinguishable: "Look at this fringe case and weird exception! If you get these timings, these specific settings, these specific temps and perform the benchmark at midnight during a full moon, AMD actually beats Intel by 1%! Just wait for more games and software to be optimized, because that will happen; Bulldozer is a great future-proofing CPU!"

Yes, it's not "fair" that almost no games are optimized for AMD processors. That's life. It isn't supposed to be fair, because you're not supposed to lose 5 fucking years with a crap product and expect to easily recover, expect gamers to buy your product to play ONE game better and then wait another 5 years, etc.

It's kinda sad too. Excavator was apparently most of the original design, plus some tweaks. If Excavator had made it to the table instead of Bulldozer, AMD would look very different right about now. And it wasn't an issue of money: AMD developed Piledriver and Steamroller after LOSING money on the BD escapade.


http://www.pcgameshardware.de/Ryzen-7-1800X-CPU-265804/Specials/AMD-AotS-Patch-Test-Benchmark-1224503/

Quote

Update 31.03.2017: We have since added the DX12 results of the Core i7-6900K to our 1080p "High" benchmarks, as well as its system details to the specifications. For the integrated benchmark the results are as expected: unchanged in the CPU part, with a slight decline in the GPU part - similar to the i7-7700K.

That the Core i7-6900K - despite separate Windows installations for the Intel and AMD systems and a freshly downloaded copy of AotS on each - also falls behind in our savegame, and even more clearly than the R7-1800X, surprised us. Not because Intel does badly, but because a pattern is now visible: our savegame and six-or-more-core processors do not go together well. Whether this is due to a peculiarity of our saves, whose results we measured via OCAT as with the other processors, or whether different algorithms are at work in the real game than in the benchmark, we have not yet been able to clarify. AMD was informed prior to the release of our article, and we are working with the developers of AotS to provide clarification.

what's AMD up to, eh ...

 


1 hour ago, MandelFrac said:

Frankly by now they should be making RAM scream for mercy. If games kept up with AVX there'd be no way to keep our CPUs fed on dual-channel memory, even at 4266MHz.

 

For 1 Skylake core under AVX/2 and FMA3, vector fadd/fmull/fmadd each take 4 cycles, and you can do 2 mulls or adds at the same time.

 

If you're multiplying a quaternion over a dataset, let's say, then you have to pull in 512 bits (64 bytes) every 4 cycles to keep the pipeline fed.

 

At 4 GHz, you take 4 cycles per op (technically 2 ops, but it doesn't affect the math), so you get 1 Gops. 1 Gops * 64 bytes/op = 64 GB/s in bandwidth just to keep the pipeline fed. However, you also have to be pushing data back out to main memory at that same rate if it's a large dataset that doesn't fit in cache. That means you need 128 GB/s in bandwidth total to actually work at full tilt. Now multiply that by 4 physical cores and suddenly you need 512 GB/s. You need 0.5 TB/s in bandwidth to sustain working optimally on a single operation (called a reduction). Now, games make next to no use of AVX/2 or FMA3. They use SSE, which is half the throughput per op. You'd still need 256 GB/s to sustain this. Obviously, it's untenable, so in the HPC world they cluster both data and operations so they can maximize the work done for every single memory access.

 

Games are nowhere near where they should be considering AVX is as old as Bulldozer, Sandy Bridge, and Jaguar (in the PS4).

That's really good info.

Intrinsics support is basically nonexistent in most programming languages.

Two weeks ago I started learning Go and dusted off some old OpenGL/math tutorials, but while Go is fast enough, it has no support for CPU vectorization or intrinsics.

Not that I will be able to make a renderer/game complex enough to need all that horsepower, but most languages lock you out.

C++ is not an option for me. I don't know why C++ developers don't target at least SSE3/4 - as you say, AVX1 should be the norm - but in Go I have no options; I will get several orders of magnitude less FP32 performance. If I could get at least auto-vectorized SSE3 code I'd be happy.

I think the problem fundamentally lies with compilers: they should auto-vectorize the code or provide standard math libraries with vectorized code. I don't know very much about the subject, but manual vectorization is not easy.

The GCC Go compiler had auto-vectorization, but Go no longer uses GCC; it's now a pure Go compiler.


8 minutes ago, yian88 said:

That's really good info.

Intrinsics support is basically nonexistent in most programming languages.

Two weeks ago I started learning Go and dusted off some old OpenGL/math tutorials, but while Go is fast enough, it has no support for CPU vectorization or intrinsics.

Not that I will be able to make a renderer/game complex enough to need all that horsepower, but most languages lock you out.

C++ is not an option for me. I don't know why C++ developers don't target at least SSE3/4 - as you say, AVX1 should be the norm - but in Go I have no options; I will get several orders of magnitude less FP32 performance. If I could get at least auto-vectorized SSE3 code I'd be happy.

I think the problem fundamentally lies with compilers: they should auto-vectorize the code or provide standard math libraries with vectorized code. I don't know very much about the subject, but manual vectorization is not easy.

The GCC Go compiler had auto-vectorization, but Go no longer uses GCC; it's now a pure Go compiler.

There are standardized math libraries, and Havok (thank you Intel) was one of them, but it never went past SSE3/4.

 

You can get auto-vectorized code in C++, but you have to know not just the compiler flags, but how to write code FOR that given compiler. The naive 3-deep for loop structure for matrix multiplication should, to me, be a recognizable pattern for vectorizing. Unfortunately, even with -O3 -ffast-math, -mavx -march=skylake -mtune=avx, GCC 7, Clang 4, and ICC 17 all just spit out scalar SSE instructions for that loop. If you instead explicitly write out the sum of products, you at least get decent horizontal SSE multiplications and sums, which is a 4x speedup. But that's nasty, bloated code.

 

https://godbolt.org/g/v47RKH - naive loops - 1040 cycles

https://godbolt.org/g/CPrBlB - AVX intrinsics - 136 cycles.
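For reference, here is a minimal 4x4 example of the same idea - my own sketch rather than the exact code behind those godbolt links, and it uses 128-bit SSE intrinsics for brevity where the second link uses AVX. The first function is the naive triple loop compilers tend to leave scalar; the second computes each row of C as a sum of broadcast(A[i][k]) * row(B[k]).

#include <immintrin.h>

// Naive 3-deep loop: GCC/Clang/ICC tend to emit scalar code for this.
void mat4_mul_naive(const float A[16], const float B[16], float C[16]) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += A[i * 4 + k] * B[k * 4 + j];
            }
            C[i * 4 + j] = sum;
        }
    }
}

// Hand-vectorized: row_i(C) = sum over k of broadcast(A[i][k]) * row_k(B).
void mat4_mul_sse(const float A[16], const float B[16], float C[16]) {
    const __m128 b0 = _mm_loadu_ps(B + 0);
    const __m128 b1 = _mm_loadu_ps(B + 4);
    const __m128 b2 = _mm_loadu_ps(B + 8);
    const __m128 b3 = _mm_loadu_ps(B + 12);
    for (int i = 0; i < 4; ++i) {
        __m128 r = _mm_mul_ps(_mm_set1_ps(A[i * 4 + 0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(A[i * 4 + 1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(A[i * 4 + 2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(A[i * 4 + 3]), b3));
        _mm_storeu_ps(C + i * 4, r);
    }
}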


Check out this... thread? (Or does Twitter have an alternate name for it?) I'm no programmer, so this doesn't mean anything to me. My general question would be: how applicable is it in a broader sense?

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


9 hours ago, MageTank said:

It wasn't even true back then. Linus's testing methodology was garbage, period. He used Metro LL with a 660 Ti, and copious amounts of AA. That GPU was dying under that kind of stress, lol.

 

I have been spreading this information for years now, ever since I stumbled upon this old thread on OCN: http://www.overclock.net/t/1487162/an-independent-study-does-the-speed-of-ram-directly-affect-fps-during-high-cpu-overhead-scenarios

 

We are talking about the ancient 3930K getting significant gains across the board. And it's not just in newer titles. Go back to old MMOs like Silkroad Online or Perfect World, which were notoriously CPU bound, and check your framerate scaling across several different memory overclocks. If I could tell my younger self that faster RAM helped, I would have won far more territory wars, lol.

 

Even if you push your GPU to the forefront, and let it become the bottleneck, your minimum framerates (the ones that are almost always dictated by CPU (I/O) bound situations) still benefit from faster memory. I use a GTX 1070 at 1440p, and while this card is plenty fast for 1440p, it's the bottleneck in the more demanding titles. However, if I load JEDEC instead of my 3600 C14, I can seriously feel the difference when entering different towns/zones, as to the degree in which my frames dip. It matters, and I believe it always has. I just don't think we knew to monitor minimum fps first, and to consider the rest of our system when testing the results back then. 

It is a bit comical, then, that now that Ryzen is here with its own quirks, minimum frame rates (and frame times) suddenly become important.

Read the community standards; it's like a guide on how to not be a moron.

 

Gerdauf's Law: Each and every human being, without exception, is the direct carbon copy of the types of people that he/she bitterly opposes.

Remember, calling facts opinions does not ever make the facts opinions, no matter what nonsense you pull.


Update from PCgameshardware.de!

http://www.pcgameshardware.de/Ryzen-7-1800X-CPU-265804/Specials/AMD-AotS-Patch-Test-Benchmark-1224503/

 

AotS lowers the picture quality on CPUs with less than six physical cores (SMT) in the regular game - not in the integrated benchmark 

They disabled some cores to run it as a 1500X at the same clocks as the 1800X and found it gave significantly better performance!

It's barely slower than the 7700K, while both the Intel and AMD octa-cores suffer significantly.

 

[Benchmark chart]

 

Image comparison between Quadcores and Hexa/Octa

http://www.pcgameshardware.de/commoncfm/comparison/clickSwitch.cfm?id=138531

 

What they have to say on the matter.

 

Quote


Update #2, 31.03.2017: We have added the results of a simulated R5-1500X OC (at the 1800X's clocks) to the benchmarks. As can be seen, this quad-core also performs much better than its related octa-core. For the explanation, see the following section. In 1080p with the "High" preset, however, further tests of the Intel quad-core showed signs of a graphics limit - lowering the clock to 3.6 GHz (Ryzen's base clock) did not change the FPS values of the i7-7700K either. At a fixed 2.0 GHz it still managed about 50 FPS, with roughly 37 FPS in the 99th percentile.

 

Do you see the difference? OK, we have marked it, but in the dynamic AotS scene - which the savegame and the integrated benchmark have in common - it is not easy to spot, especially when a complete platform change, or at least a cold reboot, lies in between. This difference therefore initially slipped past us and was only revealed on video: AotS lowers the picture quality on CPUs with fewer than six physical cores (SMT) in the regular game - not in the integrated benchmark - without the player's intervention. SMT itself is not the problem here: a six-core configured via the UEFI also has to render the full detail (for the sake of completeness, we did not cross-check with five physical cores). This happens in the normal game mode, but not in the integrated benchmark, and is invisible to the user - i.e., the "High" preset does not indicate this reduction in any option, and even in the ini file we found no trace of it. We had previously reinstalled AotS completely, set the detail levels back to "High" after manually deleting the existing ini files, etc. - none of this had any effect on AotS's unauthorized intervention.

As it appears, on quad-cores - even with SMT, i.e. eight threads - whole particle systems are omitted, namely those of the enemy defensive fire that appear at the upper edge out of the fog of war. They are missing in the quad-core version, but are rendered on six- and eight-core CPUs. Correspondingly, the benchmarks should only be compared among the quad-cores themselves, or among the eight-cores, but not between eight- and four-core CPUs. We have added the values of a simulated quad-core Ryzen (2+2 cores, SMT, 3.6-4.1 GHz), i.e. an overclocked R5 1500X, to the benchmarks, along with a warning.
 

Update 31.3.2017: Ryzen clearly profits from the AotS update; in the case of the eight-cores it moves closer to Intel's Core i7-6900K, and in the savegame it even matches it. The simulated R5-1500X OC also holds its own well against the Core i7-7700K, even if the automatic image-quality reduction pushes the FPS into higher ranges where a possible graphics limit - and thus possibly a larger gap between the quad-cores - cannot be ruled out.

5950X | NH D15S | 64GB 3200Mhz | RTX 3090 | ASUS PG348Q+MG278Q

 


I'm glad they found out what was causing the issue, as when they tossed up those benchmarks, they made absolutely zero sense.  Plus, 720p testing?


I keep hoping reviewers finally see Ashes for the piece of shit benchmark that it is and ditch it for good

problems with Ashes:

  • the integrated benchmark does not mimic actual gameplay performance
  • each benchmark run uses its own seed and can never be replicated; every run is randomized, producing inconsistent results
  • effects go missing depending on the hardware - first spotted in a Pascal vs. Polaris comparison; now it turns out it also cares about the number of cores

18 hours ago, Misanthrope said:

I can find/replace this entire argument back at the Bulldozer launch and it will be indistinguishable: "Look at this fringe case and weird exception! If you get these timings, these specific settings, these specific temps and perform the benchmark at midnight during a full moon, AMD actually beats Intel by 1%! Just wait for more games and software to be optimized, because that will happen; Bulldozer is a great future-proofing CPU!"

Yes, it's not "fair" that almost no games are optimized for AMD processors. That's life. It isn't supposed to be fair, because you're not supposed to lose 5 fucking years with a crap product and expect to easily recover, expect gamers to buy your product to play ONE game better and then wait another 5 years, etc.

I didn't say it wasn't fair. I said wait until the new games come out that are optimized for it, and use those as the benchmark of performance - not old games which have only been optimized for Intel.


21 hours ago, MandelFrac said:

Actually it doesn't mean that. If other games had their threads grouped in such a way that CCX crosstalk was already minimal, then there's no big gain to be had like there was here. There are some gains to get from having a double-sized L2, but they won't be anywhere near as big.

 

And let's not kid ourselves. The games industry is a good 8 years behind on optimization techniques CPU side and is 5 years behind in instruction sets (still little to no AVX usage even though it doubles the throughput over SSE which IS in use today).

DX12 wasn't released 8 years ago, and games are being optimized for it. It doesn't take that long to optimize for a new CPU architecture. Most new games that get an optimization pass will most likely have some optimization for both Intel and Ryzen, compared to past games which were never optimized for the unique Ryzen architecture. And I doubt that anyone just happened to have threads grouped in a way that was somehow optimal for Ryzen when such a thing never existed before Ryzen. Ryzen is unique and different from traditional CPUs, and there's no chance that any developer had planned for such a design.


54 minutes ago, Speaker1264 said:

DX12 wasn't released 8 years ago, and games are being optimized for it. It doesn't take that long to optimize for a new CPU architecture. Most new games that get an optimization pass will most likely have some optimization for both Intel and Ryzen, compared to past games which were never optimized for the unique Ryzen architecture. And I doubt that anyone just happened to have threads grouped in a way that was somehow optimal for Ryzen when such a thing never existed before Ryzen. Ryzen is unique and different from traditional CPUs, and there's no chance that any developer had planned for such a design.

Why would you bet that? If you think about the problem-solving process and task parallelism, it seems pretty obvious to me that you launch co-dependent tasks in pairs, so as long as the chief dependencies are in threads 0-3 and 4-7 with minimal need to send things across that boundary (such as when doing the final data marshalling for the GPU), then it's actually not all that unlikely.

 

You don't have to plan for the design, just plan for having 4/6/8 core machines. If everything is either a power of 2 or at least has 2 as a factor, then launching dependent threads in pairs is fairly logical (and common actually if you were already optimizing for hyperthreading).

 

And if your requirements were for Core 2 Quad, you'd actually already be doing the same optimization pattern there as you would for Ryzen, minus SMT/Hyperthreading. So seriously, don't run your mouth when you're clueless. It's one thing to be skeptical, but to say there's no chance is flat out wrong on its face.
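For what it's worth, here is a hedged sketch of what "keeping a co-dependent pair close together" can look like in practice - a Linux-only illustration of mine, and the assumption that logical cores 0 and 1 sit on the same CCX depends entirely on the actual topology of the chip:

// A sketch (Linux/glibc, compile with -pthread) of pinning a co-dependent
// producer/consumer pair to two cores assumed to share one CCX/L3, so their
// shared data does not have to cross the fabric. The core numbering here is
// an assumption, not something the OS guarantees.
#include <pthread.h>
#include <sched.h>
#include <thread>

static void pin_to_core(std::thread& t, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(t.native_handle(), sizeof(set), &set);
}

int main() {
    std::thread producer([] { /* fill buffers for the consumer */ });
    std::thread consumer([] { /* drain those buffers */ });
    pin_to_core(producer, 0);  // assumed: same CCX as core 1
    pin_to_core(consumer, 1);
    producer.join();
    consumer.join();
}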

