
Cool video from JayzTwoCents on CPUs

Are you saying only a snob can tell the difference between real lighting and dummy lighting (I can't remember the proper term)?

This is the direction things are progressing at this point.

 

More and more games are trying to become more realistic (in the sense of motion, lighting, AI, and so forth).

You can put in a dummy replacement, but people will notice it immediately.

 

Again, as I said: this is a decision game developers have to make.

Note:

Sony and Microsoft are already having issues with CPUs that aren't strong enough:

http://www.pcper.com/news/General-Tech/Sony-PS4-and-Microsoft-Xbox-One-Already-Hitting-Performance-Wall

Consoles don't belong on this planet anymore. They cripple the whole industry. That said, no, most people can't tell the difference between amateur lighting (me) and the professional work that takes 8x the resources for 8% more realism. If you give me the same lights and walls, I can illuminate the scene just as well with far less intense mathematics. It can be made real enough that the human eye won't be able to tell the difference. The only reason to use direct ray tracing is for absolutely precise physics simulation. In games it's a waste.
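To put something behind "far less intense mathematics", here is a minimal sketch of a classic point-light diffuse term in plain C (purely illustrative; the function and names are mine, not from any engine). Per shaded point it is one dot product, a clamp, and an inverse-square falloff, versus tracing rays against the whole scene:

typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

/* Approximate (Lambert) diffuse lighting for one surface point.
   Assumes 'normal' and 'to_light' are already normalized. */
float diffuse_light(Vec3 normal, Vec3 to_light, float light_dist, float intensity)
{
    float ndotl = dot3(normal, to_light);      /* cosine of the incidence angle */
    if (ndotl < 0.0f) ndotl = 0.0f;            /* light is behind the surface   */
    return intensity * ndotl / (light_dist * light_dist);  /* inverse-square falloff */
}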

 

AI is another aspect that needs to be offloaded to the GPU. You can't have advanced AI without a large number of cores. Neural networks are like that, no two ways about it. This is another reason Nvidia and AMD should both be making efforts to have programmers use Intel's/AMD's iGPUs, when available, for that work and leave the image processing to the big iron.
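To show why a large number of cores fits this so well, here is a rough C sketch of a fully connected neural-network layer (the sizes and names are hypothetical). Every output neuron is an independent dot product, so each iteration of the outer loop could run on its own GPU thread:

#include <stddef.h>

/* One fully connected layer: each output neuron is an independent
   dot product over the inputs, so the outer loop parallelizes
   trivially across GPU cores. */
void dense_layer(const float *weights,  /* [n_out][n_in], row-major */
                 const float *input,    /* [n_in]  */
                 const float *bias,     /* [n_out] */
                 float *output,         /* [n_out] */
                 size_t n_in, size_t n_out)
{
    for (size_t o = 0; o < n_out; o++) {            /* parallel across neurons */
        float sum = bias[o];
        for (size_t i = 0; i < n_in; i++)
            sum += weights[o * n_in + i] * input[i];
        output[o] = sum > 0.0f ? sum : 0.0f;        /* ReLU activation */
    }
}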

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Consoles don't belong on this planet anymore. They cripple the whole industry. That said, no, most people can't tell the difference between amateur lighting (me) and the professional work that takes 8x the resources for 8% more realism. If you give me the same lights and walls, I can illuminate the scene just as well with far less intense mathematics. It can be made real enough that the human eye won't be able to tell the difference. The only reason to use direct ray tracing is for absolutely precise physics simulation. In games it's a waste.

 

AI is another aspect that needs to be offloaded to the GPU. You can't have advanced AI without a large number of cores. Neural networks are like that, no two ways about it. This is another reason Nvidia and AMD should both be making efforts to have programmers use Intel's/AMD's iGPUs, when available, for that work and leave the image processing to the big iron.

Consoles have some incredible benefits, especially from the gaming perspective.

Most people can tell the difference.

Again, in a controlled environment, you might be able to create the same output as real lighting. But that is the thing about gaming.

Games strive to avoid being too controlled an environment, as that decreases the gaming experience.

AI benefits from both CPU cores and GPU cores. A GPU cannot keep up with the control flow of a CPU core, so to fully offload AI to the GPU, you have to decrease the intelligence of the AI.


Consoles have some incredible benefits, especially from the gaming perspective.

Most people can tell the difference.

Again, in a controlled environment, you might be able to create the same output as real lighting. But that is the thing about gaming.

Games strive to avoid being too controlled an environment, as that decreases the gaming experience.

AI benefits from both CPU cores and GPU cores. A GPU cannot keep up with the control flow of a CPU core, so to fully offload AI to the GPU, you have to decrease the intelligence of the AI.

Consoles offer no advantages beyond uniformity (ease of programming) and portability.

You say that, but our rendition of BF4 calls "BS."

Not true. AI boils down to comparisons, branching, loops, and statistics. None of those is something a CPU is better at. You can do nearly anything in AI on the GPU and do it faster than you can on a CPU. That's the reason IBM went with a hardware neural network with a ton of cores. No one needs a CPU for it. You're still on training wheels at that point.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Consoles offer no advantages beyond uniformity (ease of programming) and portability.

No. I have actually covered the two greatest benefits a console has that a regular PC (using a regular OS) doesn't have.

It is basically that the console runs an RTOS and has a fixed hardware set.

Portability goes out the window when you try to convert from an RTOS to a non-RTOS.

EDIT: I think we misunderstood each other here.

I do agree with you. (I was more talking about the benefits a console has over a regular PC.)

 

You say that, but our rendition of BF4 calls "BS."

How do I know it is not just you calling "BS"?

Sources.

 

Not true. AI boils down to comparisons, branching, loops, and statistics. None of those is something a CPU is better at. You can do nearly anything in AI on the GPU and do it faster than you can on a CPU. That's the reason IBM went with a hardware neural network with a ton of cores. No one needs a CPU for it. You're still on training wheels at that point.

First of all, branching is terrible on a GPU compared to a CPU.

Loops are the same (though I do believe the CPU's loop detection is better).

The CPU is also better at logic-heavy workloads, which are also important for AI.


No. I have actually covered the two greatest benefits a console has that a regular PC (using a regular OS) doesn't have.

It is basically that the console runs an RTOS and has a fixed hardware set.

Portability goes out the window when you try to convert from an RTOS to a non-RTOS.

How do I know it is not just you calling "BS"?

Sources.

First of all, branching is terrible on a GPU compared to a CPU.

Loops are the same (though I do believe the CPU's loop detection is better).

The CPU is also better at logic-heavy workloads, which are also important for AI.

Eh, a fixed hardware set falls under uniformity of programming, but there's no way to upgrade the experience. It's too rigid a system without enough potential.

Branching on GPUs is no more difficult than on CPUs. Now, there is no branch predictor, so your work speed is guaranteed. That holds true for loops too. A GPU can simultaneously evaluate both sides of a branch and keep the results that turn out to be true, whereas a CPU flushes the whole pipeline in the event of a miss. It's all about how you use the tool. It's more powerful and versatile than you think.
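Roughly what I mean, as a purely illustrative C sketch (the functions are mine, not from any real codebase): a short branch can be lowered to computing both sides and selecting one result, so there is nothing to predict and nothing to flush.

/* Branchy form: a CPU predicts which path is taken and flushes its
   pipeline on a mispredict. */
float branchy(float x, float a, float b)
{
    if (x > 0.0f)
        return a * x;
    return b * x;
}

/* Predicated form: both sides are computed and one result is kept.
   This is effectively how a GPU (or its compiler) can handle short
   branches: fixed cost, no prediction, no pipeline flush. */
float predicated(float x, float a, float b)
{
    float taken     = a * x;
    float not_taken = b * x;
    return (x > 0.0f) ? taken : not_taken;   /* a select, not a jump */
}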

My sources include the entire video game design club at my university trying it out. Effectively no difference, and better performance. Of course, I can't publish the code since it's copyrighted, so I'm sorry, you're just going to have to trust me; given my reputation, that shouldn't be difficult for you.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Eh, a fixed hardware set falls under uniformity of programming, but there's no way to upgrade the experience. It's too rigid a system without enough potential.

I made an edit in my previous comment.

My point was that having to support a wider range of hardware does decrease the potential utilization of that hardware.

Branching on GPUs is no more difficult than on CPUs. Now, there is no branch predictor, so your work speed is guaranteed. That holds true for loops too. A GPU can simultaneously evaluate both sides of a branch and keep the results that turn out to be true, whereas a CPU flushes the whole pipeline in the event of a miss. It's all about how you use the tool. It's more powerful and versatile than you think.

A GPU as a computing unit IS worse than a CPU at branching. You WILL lose performance when running branchy code on a GPU; that is guaranteed.

I do believe newer architectures only flush back to where the miss was discovered, not the whole pipeline (not entirely sure).


I made an edit in my previous comment.

My point was that having to support a wider range of hardware does decrease the potential utilization of that hardware.

A GPU as a computing unit IS worse than a CPU at branching. You WILL lose performance when running branchy code on a GPU; that is guaranteed.

I do believe newer architectures only flush back to where the miss was discovered, not the whole pipeline (not entirely sure).

Granted, you have to flush everything after the branch miss (everything before the branch is fine).

And you don't lose performance. You just don't have prediction. So you do the calculations like normal.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


I have nothing against either of these companies; they are both exceptional in one way or another. I myself have always used Intel just because that's what was there or available, but I have tried a couple of AMD products and was not disappointed. The biased fanboyism does irritate me, though.

My Personal PC 'Apex' https://uk.pcpartpicker.com/user/LiamBetts123/saved/3rTNnQ

Intel Core i9 9900k, ASUS Z390-A, RTX 2080TI, Meshify C, HX 850i, 32GB Gskill Trident Z RGB @ 3200MHZ, 500GB NVME, 500GB SSD & 2 x 4TB Baracudas 


Granted, you have to flush everything after the branch miss (everything before the branch is fine).

And you don't lose performance. You just don't have prediction. So you do the calculations like normal.

You do lose performance. Each time you have to run some conditional code, the GPU may have to diverge the warp, which decreases the parallelism.

There is a reason people recommend not running branchy code on a GPU. GPUs are terrible at it.


I have nothing against either of these companies; they are both exceptional in one way or another. I myself have always used Intel just because that's what was there or available, but I have tried a couple of AMD products and was not disappointed. The biased fanboyism does irritate me, though.

I'm proud of you, brother; at least you aren't becoming like some of the blind fanboys around here.

Although I can say I love Intel's CPUs, hey, I have AMD too :) from Athlons to Phenoms to Bulldozer and Vishera.

The thing here is that people sniff too many numbers from benchmarks and the like, and they've forgotten about the real-world difference while actually playing.

Just turn off the FPS counter on your screen and play the damn game, and voila: you'll be so immersed and so busy fragging people online that you won't even notice the AMD 8320/8350 were doing fine for the money.

Heck, a few days ago I decided to dig up my old 1100T Phenom II X6 rig with a 560 Ti, installed Battlefield 4, and played on low/medium settings to get playable frames. Voila: I even forgot I was playing on a single 1080p monitor, since I was too busy teaching those campers how to fight CQC.

But I did notice one thing in BF4: the lower the settings, the better I seem to see, and fragging gets easier, although I still had 9 deaths, sadly.

Live your life like a dream.

 

You do lose performance. Each time you have to run some conditional code, the GPU may have to diverge the warp, which decreases the parallelism.

There is a reason people recommend not running branchy code on a GPU. GPUs are terrible at it.

Which is why you only use loop constructs for embarrassingly parallel problems. But still, it's one freaking cycle per comparison. The performance hit is not that big.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Which is why you only use loop constructs for embarrassingly parallel problems. But still, it's one freaking cycle per comparison. The performance hit is not that big.

That is still not a solution for everything.

You will run into branches, where the GPU has to diverge the warp, push the paths onto a stack, and then execute them in a serial manner.

This is what effectively decreases the parallelism when you use branches in GPU workloads, and why almost EVERYONE recommends not running branch-heavy code on a GPU.
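To make that concrete, a small illustrative OpenCL-style kernel (hypothetical, not from any real codebase): when some work-items in a warp/wavefront take one side of the condition and the rest take the other, the hardware runs the two sides one after the other with the non-participating lanes masked off, so you pay for both paths.

__kernel void divergent(__global const int* in, __global int* out)
{
    size_t i = get_global_id(0);

    /* Work-items in the same warp that disagree on this condition
       cannot run both paths at once: path A and path B are executed
       serially, each with part of the lanes masked off. */
    if (in[i] % 2 == 0)
        out[i] = in[i] * in[i];   /* path A */
    else
        out[i] = in[i] + 1;       /* path B */
}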


snip

Yep, couldn't agree with you more. As for being able to see better on lower settings, I keep getting told that competitive players play on lower settings because some shadows aren't rendered and that sort of thing, making it easier to spot other players. I don't know how true that is, though.


That is still not a solution for everything.

You will run into branches, where the GPU has to diverge the warp, push the paths onto a stack, and then execute them in a serial manner.

This is what effectively decreases the parallelism when you use branches in GPU workloads, and why almost EVERYONE recommends not running branch-heavy code on a GPU.

It's still sometimes the only solution. If you want to compare 2 long lists, the GPU will do it faster than the CPU. If you need to modify a lot of data based on a small set of conditions, the GPU is better for doing that. I'd still argue the hit to parallelism is nothing. If you build your kernels correctly and have enough of them, you won't notice the hit.
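For the two-lists case the kernel really is trivial; something like this illustrative OpenCL sketch (names made up), one work-item per element and essentially no divergence:

/* Element-wise comparison of two lists: one work-item per index,
   writing 1 where the elements match and 0 where they don't. */
__kernel void compare_lists(__global const int* a,
                            __global const int* b,
                            __global int* match,
                            unsigned long length)
{
    size_t i = get_global_id(0);
    if (i < length)
        match[i] = (a[i] == b[i]) ? 1 : 0;
}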

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Yep, couldn't agree with you more. As for being able to see better on lower settings, I keep getting told that competitive players play on lower settings because some shadows aren't rendered and that sort of thing, making it easier to spot other players. I don't know how true that is, though.

Well, in Counter-Strike, yes, this is true, and it's also true in BF4, as I play it on low settings. I don't see stuff that distracts me, and having fewer effects going on my screen helps me spot people more.

Live your life like a dream.

 

Well, in Counter-Strike, yes, this is true, and it's also true in BF4, as I play it on low settings. I don't see stuff that distracts me, and having fewer effects going on my screen helps me spot people more.

It's true in most games, tbh; people start to "pop out" for me, which is a help, as enemies and allies look the same to me since you can edit soldiers' uniform colours.

I miss the days of red v blue v green v yellow


It's true in most games, tbh; people start to "pop out" for me, which is a help, as enemies and allies look the same to me since you can edit soldiers' uniform colours.

I miss the days of red v blue v green v yellow

Hahaha, true, I agree with this.

BTW, have you tried the very first Delta Force? Mountains would look like clusters of pixels, and you wouldn't know if an enemy was there because they appeared as pixels too, but there was some colour coding to tell whether they were an enemy or not :)

Live your life like a dream.

 

It's still sometimes the only solution. If you want to compare 2 long lists, the GPU will do it faster than the CPU.

No. The SIMD units integrated into the CPU will most likely be better at comparisons. The list really has to be VERY long before sending it over the PCIe bus becomes beneficial.
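Roughly what I mean by the CPU's SIMD units (an SSE2 sketch in plain C; the function is mine and the non-multiple-of-16 tail handling is omitted for brevity): sixteen byte comparisons per instruction, with no PCIe transfer at all.

#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>

/* Count matching bytes in two equally long buffers, 16 bytes at a time.
   Assumes length is a multiple of 16 to keep the sketch short. */
size_t count_matches_sse2(const unsigned char *a, const unsigned char *b, size_t length)
{
    size_t matches = 0;
    for (size_t i = 0; i < length; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i eq = _mm_cmpeq_epi8(va, vb);            /* 0xFF where bytes are equal */
        int mask   = _mm_movemask_epi8(eq);             /* one bit per byte lane      */
        matches   += (size_t)__builtin_popcount(mask);  /* GCC/Clang builtin          */
    }
    return matches;
}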

I'd still argue the hit to parallelism is nothing. If you build your kernels correctly and have enough of them, you won't notice the hit.

Nothing? There is a constant overhead of diverging and reconverging warps.

No matter how you build the kernel, you cannot escape this. It is a consequence of how our current technology works.

Well, I also doubt a human being can notice a pipeline flush, yet flushes are still bad and do affect overall performance.


No. The SIMD units integrated into the CPU will most likely be better at comparisons. The list really has to be VERY long before sending it over the PCIe bus becomes beneficial.

Nothing? There is a constant overhead of diverging and reconverging warps.

No matter how you build the kernel, you cannot escape this. It is a consequence of how our current technology works.

Well, I also doubt a human being can notice a pipeline flush, yet flushes are still bad and do affect overall performance.

Take my example of needing to count the occurrences of a specific set of symbols in a document and toggle/flip their case (or even generate a hash). By some rough calculations, if you have more than 800 characters it's faster to send the work to the GPU even if you have a 4790K.

 

You overestimate the effect of warps. It all depends on the actual task at hand. I wouldn't run a serial program on a GPU, yet that's exactly what you're accusing me of doing. Oftentimes loops can simply be unrolled and distributed across the compute blocks, as long as the result of each iteration doesn't depend on the previous one. That's a perfect case for a loop construct on a GPU. If each part of the loop only makes one choice, including an if costs a negligible amount of performance.

 

__kernel void toggle(__global char* toCheck, unsigned long length,
                     __global const char* specials, unsigned long size)
{
    /* One work-item per character: the loop over the document is
       distributed across the NDRange instead of run serially. */
    size_t i = get_global_id(0);
    if (i >= length)
        return;

    for (unsigned long j = 0; j < size; j++) {
        if (toCheck[i] == specials[j]) {
            toCheck[i] += 32;                          /* to the other case ('a' - 'A' == 32) */
            break;
        } else if (toCheck[i] == specials[j] + 32) {
            toCheck[i] -= 32;                          /* and back again */
            break;
        }
    }
}

 

This is just a really simple example of an embarrassingly parallel problem that works better on the GPU for inputs larger than a certain size. Now, there are conditional breaks, but you can amortize their effects and predict with certainty exactly how many cycles this operation will take at minimum and maximum. The reconvergence takes almost no time, about 1 extra cycle.
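For completeness, a rough host-side sketch of how a kernel like that could be launched (standard OpenCL 1.x API; assumes the context, queue and kernel were already created, error handling omitted, buffer names are mine): one work-item per character, so the loop over the document lives in the NDRange rather than in the kernel.

#include <CL/cl.h>

/* Hypothetical launcher for the toggle kernel above. */
void run_toggle(cl_context ctx, cl_command_queue queue, cl_kernel kernel,
                char *text, cl_ulong length,
                const char *specials, cl_ulong size)
{
    cl_mem text_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                     (size_t)length, text, NULL);
    cl_mem spec_buf = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                     (size_t)size, (void *)specials, NULL);

    clSetKernelArg(kernel, 0, sizeof(cl_mem),   &text_buf);
    clSetKernelArg(kernel, 1, sizeof(cl_ulong), &length);
    clSetKernelArg(kernel, 2, sizeof(cl_mem),   &spec_buf);
    clSetKernelArg(kernel, 3, sizeof(cl_ulong), &size);

    /* One work-item per character: the document-sized loop becomes the NDRange. */
    size_t global = (size_t)length;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);

    /* Read the toggled text back into the host buffer. */
    clEnqueueReadBuffer(queue, text_buf, CL_TRUE, 0, (size_t)length, text, 0, NULL, NULL);

    clReleaseMemObject(text_buf);
    clReleaseMemObject(spec_buf);
}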

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd

