Raytracing API

8 minutes ago, laminutederire said:

Well they are in theory vastly the same...

In practice, game renderers use ray tracers and movies use path tracers. It's exactly these path tracers they want to make an API for.

In this case, open would be better for the sole fact that it is more in line with what researchers would potentially want to have.

You can't blame a consumer for looking out for his best interest. I couldn't care less about them failing; I personally want better tech sooner.

 

Err, not exclusively; there are specific path tracers. Path tracers are ray tracers, and not all ray tracers are path tracers ;)

 

Depends on the renderer being used. Plus, movies don't use GPUs for rendering effects; they produce artifacts, even though the difference can be minimal.

 

Usually the renderer is a combination of both path and ray tracing for movie FX.

 

It all depends on the accuracy you are going for in the end. Speed vs. accuracy. You can't do path tracing in real time right now; way too much computational power is needed.
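To put rough numbers on that speed vs. accuracy trade-off, here's a back-of-the-envelope sketch; every per-pixel figure below is an assumption picked for illustration, not a measurement.

```python
# Toy illustration (not a real renderer): why path tracing costs so much more
# ray work than classic Whitted-style ray tracing at the same resolution.
# All per-pixel numbers below are assumptions chosen for the example.

WIDTH, HEIGHT = 1920, 1080

# Whitted-style ray tracing: roughly a fixed, small number of rays per pixel
# (primary + shadow + a couple of reflection/refraction bounces).
whitted_rays_per_pixel = 4

# Path tracing: many stochastic samples per pixel before the Monte Carlo
# noise averages out; film renderers often use hundreds or more.
path_samples_per_pixel = 512
bounces_per_sample = 4

whitted_total = WIDTH * HEIGHT * whitted_rays_per_pixel
path_total = WIDTH * HEIGHT * path_samples_per_pixel * bounces_per_sample

print(f"Whitted-style rays per frame: {whitted_total:,}")
print(f"Path-traced rays per frame:   {path_total:,}")
print(f"Ratio: ~{path_total / whitted_total:.0f}x more ray work")
```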


4 hours ago, Crunchy Dragon said:

You can add some more info; all it needs to stay in this subforum is a quote from the article and personal input.

Mod: Edit your post or we'll remove it.
User: Eh, remove it, it's cool.
Mod: Wait a second, you're not supposed to be okay with the threat to remove it... D:


When Big Hero 6 came out it was supposed to have the most ray tracing of all the releases to date. I didn't really notice, and I was looking for it; I certainly didn't hear anyone talking about how much better the quality was. I am wondering, from a general user perspective, just how much of an impact this will have.

 

 

42 minutes ago, AshleyAshes said:

Mod: Edit your post or we'll remove it.
User: Eh, remove it, it's cool.
Mod: Wait a second, you're not supposed to be okay with the threat to remove it... D:

Glad I wasn't the only person to notice that.

Grammar and spelling are not indicative of intelligence/knowledge. Not having the same opinion does not always mean a lack of understanding.


31 minutes ago, Razor01 said:

Err, not exclusively; there are specific path tracers. Path tracers are ray tracers, and not all ray tracers are path tracers ;)

 

Depends on the renderer being used. Plus, movies don't use GPUs for rendering effects; they produce artifacts, even though the difference can be minimal.

 

Usually the renderer is a combination of both path and ray tracing for movie FX.

 

It all depends on the accuracy you are going for in the end. Speed vs. accuracy. You can't do path tracing in real time right now; way too much computational power is needed.

I think you inverted those. Ray tracers cast rays, which are lines starting from the light or the camera, plus a second gathering ray. In Heckbert notation: EDL paths. Path tracers are designed for E(D|S)*L paths and so on. That makes ray tracers specialised path tracers.
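As a toy illustration of that notation (the regular expressions and path strings below are just for demonstration, not a renderer API):

```python
import re

# Heckbert path notation: E = eye, L = light, D = diffuse bounce, S = specular bounce.
# Per the point above, a direct-illumination ray tracer covers E D L paths, while a
# path tracer covers E (D|S)* L paths, so the former is a special case of the latter.

ray_tracer_paths  = re.compile(r"^E D L$")         # eye -> one diffuse bounce -> light
path_tracer_paths = re.compile(r"^E( (D|S))* L$")  # eye -> any mix of D/S bounces -> light

examples = ["E D L",        # direct diffuse lighting
            "E S D L",      # mirror reflection of a lit diffuse surface
            "E D S D L"]    # longer multi-bounce path

for path in examples:
    covered_rt = bool(ray_tracer_paths.match(path))
    covered_pt = bool(path_tracer_paths.match(path))
    print(f"{path:10s}  ray tracer: {covered_rt!s:5s}  path tracer: {covered_pt}")
```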

Well, at Disney they use GPUs at least for denoising.

 

The renderer does path tracing and uses acceleration structures and methods whenever it can, so yeah, usually the ray-traced direct illumination is done on its own.

Path tracing is the logical next step for the future, even though, yes, it is not yet in an acceptable state speed-wise: hence the important work on path guiding and denoising to significantly reduce compute time. It is the next step since most of what game engines do is cache indirect lighting to be combined with ray-traced direct lighting.
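A minimal sketch of that split, with placeholder functions and values standing in for what a real engine would do:

```python
# Hybrid shading sketch: direct lighting evaluated per frame (e.g. with
# ray-traced shadows), indirect lighting read from a precomputed cache
# (lightmap, probes, etc.). Everything here is a placeholder, not an engine API.

def ray_traced_direct(albedo, light_intensity, visibility):
    # 'visibility' would come from a shadow ray in a real renderer
    return albedo * light_intensity * visibility

def cached_indirect(albedo, probe_irradiance):
    # 'probe_irradiance' would be looked up from baked probes or a lightmap
    return albedo * probe_irradiance

albedo = 0.8
direct = ray_traced_direct(albedo, light_intensity=2.0, visibility=1.0)
indirect = cached_indirect(albedo, probe_irradiance=0.3)
print(f"shaded value = {direct:.2f} (direct) + {indirect:.2f} (indirect) = {direct + indirect:.2f}")
```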


Ah yes, you are correct; I mixed the two around!

 

Currently we need around 50x the computational power to do path tracing in real time, so it's not going to be any time soon that we see those techniques in real time, at least not for a full application. Maybe demos and such in 2 or 3 gens.
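For a sense of scale, here's a quick helper for that 50x figure; the per-generation speedup is an assumed knob you can change, and as noted above, demos don't need the full factor:

```python
import math

# Back-of-the-envelope: how many GPU generations a given compute shortfall takes
# to close at an assumed average per-generation speedup. Both inputs are assumptions.

def generations_needed(shortfall, speedup_per_gen):
    return math.log(shortfall) / math.log(speedup_per_gen)

for speedup in (1.5, 1.7, 2.0):
    print(f"{speedup}x per gen -> ~{generations_needed(50.0, speedup):.1f} generations for 50x")
```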


27 minutes ago, Razor01 said:

Ah yes, you are correct; I mixed the two around!

 

Currently we need around 50x the computational power to do path tracing in real time, so it's not going to be any time soon that we see those techniques in real time, at least not for a full application. Maybe demos and such in 2 or 3 gens.

I believe intelligent machine learning could cut that computational power requirement by at least half, since graphics are very redundant.


There have been old games that used path tracing already, Quake 2 if I'm not mistaken, in 2011-ish or so. But the best hardware back then could only run the Quake 2 path tracing version at like 30 FPS lol.

 

Actually found a video from last year

 

 

It's not about reducing the computational power with machine learning completely, either.

 

We still haven't reached model poly counts close enough to realism in real time yet.

 

Most games currently use around 30k to 50k polys per character in-game. Cinematic characters tend to use around 200k. This is all before subdivision or tessellation.

 

Texture resolution: we aren't there yet either.

 

This is why fixed-function pipelines aren't going to be removed from GPUs anytime soon.

 

The entire picture: to increase computational power, we need more silicon and/or smaller nodes.

 

 


33 minutes ago, Razor01 said:

There have been old games that used path tracing already, Quake 2 if I'm not mistaken, in 2011-ish or so. But the best hardware back then could only run the Quake 2 path tracing version at like 30 FPS lol.

 

Actually found a video from last year

 

 

It's not about reducing the computational power with machine learning completely, either.

 

We still haven't reached model poly counts close enough to realism in real time yet.

 

Most games currently use around 30k to 50k polys per character in-game. Cinematic characters tend to use around 200k. This is all before subdivision or tessellation.

 

Texture resolution: we aren't there yet either.

 

This is why fixed-function pipelines aren't going to be removed from GPUs anytime soon.

 

The entire picture: to increase computational power, we need more silicon and/or smaller nodes.

 

 

I would take a higher-poly model over better lighting every day of the week.


Just now, cj09beira said:

I would take a higher-poly model over better lighting every day of the week.

I would post up some screens of game characters I'm working on right now in this thread if it were possible. I will later on, in another thread, but it will also be shameless self-promotion of a game I'm working on. We are using 250k polys per character on this upcoming title.


1 minute ago, yian88 said:

Nvidia sucks.

AMD did nothing wrong.

 

 

Wow that was enlightening lol.  Good to see higher level thinking is something we all strive for.


On 3/17/2018 at 10:37 AM, Razor01 said:

Yeah and what the hell does that have to do with an API coming from MS?

Because the API development had direct input from Nvidia and, as stated, is designed with their hardware architectures in mind. I would have the same complaint if the roles were reversed and AMD had market dominance and the API was optimized for GCN.

 

There's a difference between taking a technology, developing an API for it, and then creating specific hardware optimization paths in it to work best with different architectures, versus creating an API from the start with one architecture in mind. This is the core of the issue being pointed out: pre-optimization vs. post-optimization. You're trying to say that these APIs are hardware/vendor agnostic but then also say there are hardware requirements for them; those requirements would be different if the API were designed for a different architecture, so it isn't hardware agnostic then, is it?


3 minutes ago, leadeater said:

Because the API development had direct input from Nvidia and, as stated, is designed with their hardware architectures in mind. I would have the same complaint if the roles were reversed and AMD had market dominance and the API was optimized for GCN.

 

There's a difference between taking a technology, developing an API for it, and then creating specific hardware optimization paths in it to work best with different architectures, versus creating an API from the start with one architecture in mind. This is the core of the issue being pointed out: pre-optimization vs. post-optimization. You're trying to say that these APIs are hardware/vendor agnostic but then also say there are hardware requirements for them; those requirements would be different if the API were designed for a different architecture, so it isn't hardware agnostic then, is it?

 

 

I think there was a reason for that; it's unusual for MS not to try to get AMD's input on this. nV can't strong-arm MS, doesn't matter what it is, it's just not possible. MS won't allow it. They didn't even let Intel strong-arm them, so nV is nothing compared to Intel. I think AMD's architecture is not capable of this in real time because some features are not there. This is why we see all this ray tracing stuff coming out only now, even though the concepts for real-time ray tracing were there years ago. There need to be certain things on GPUs that just weren't there for acceptable frame rates.


16 minutes ago, Razor01 said:

I think there was a reason for that; it's unusual for MS not to try to get AMD's input on this. nV can't strong-arm MS, doesn't matter what it is, it's just not possible. MS won't allow it. They didn't even let Intel strong-arm them, so nV is nothing compared to Intel. I think AMD's architecture is not capable of this in real time because some features are not there. This is why we see all this ray tracing stuff coming out only now, even though the concepts for real-time ray tracing were there years ago. There need to be certain things on GPUs that just weren't there for acceptable frame rates.

It's not really a case of Nvidia strong-arming them though; it's purely a case of being wary, as influence always comes at a price. It's a two-way arrangement and Nvidia isn't going to get nothing out of it. As a hardware company you can't not give advice based on what your hardware can do; that's your self-interest in this arrangement.

 

As for AMD's hardware, I don't know. If it's just down to thread-level independence, then I don't know how much AMD is lacking in that, or the true details on the Nvidia side, without burying myself deep in technical documentation, which I can't be bothered to do right now at 6am lol. I did find this though, so if you have the Nvidia tech details on how truly independent their threads are, let me know.

 

Quote

Each CU has four SIMDs. Each SIMD has their own active waves (up to 10). A wave runs on a single SIMD (from beginning to end). Latency hiding occurs at SIMD level. Occupancy of 8 means that SIMD has 8 active waves. This is similar to 8-way hyperthreading.

 

CU doesn’t move waves from one SIMD to another. CU level active wave-count isn’t relevant to latency hiding. For example: SIMD A and B have 10 waves running and SIMD C and D have only 4. Occupancy is 4 on SIMDs C and D, and these SIMDs have significantly lower latency hiding capability than SIMD A and B.

 

SIMDs do not execute instructions in lockstep. If there’s no memory instructions a SIMD could execute one wave from start to end without switching to any other wave, and then start the next wave, etc. Execution is only synchronized (at thread group level) when a barrier is met. No wave can progress beyond the barrier before all other waves (on all four SIMDs) have also reached that barrier. Barrier is the only way to synchronize work between the four SIMDs. They are otherwise fully independent. CU offers shared 64 KB of LDS memory for communication between the SIMDs (write data there + barrier to ensure other threads see your data).

https://gpuopen.com/optimizing-gpu-occupancy-resource-usage-large-thread-groups/
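As a rough, simplified model of the occupancy limits that quote is describing (it ignores allocation granularity and other real-hardware details, so treat it as an illustration only):

```python
# Simplified GCN-style occupancy estimate: active waves per SIMD are capped by
# the hardware limit, the VGPR budget, and the LDS shared across the CU.

MAX_WAVES_PER_SIMD = 10      # hardware cap per SIMD
VGPR_BUDGET_PER_SIMD = 256   # VGPRs available per lane on one SIMD
LDS_PER_CU_KB = 64           # LDS shared by the whole CU
SIMDS_PER_CU = 4

def waves_per_simd(vgprs_per_wave, lds_kb_per_group, waves_per_group):
    by_vgpr = VGPR_BUDGET_PER_SIMD // vgprs_per_wave
    groups_per_cu = LDS_PER_CU_KB // lds_kb_per_group if lds_kb_per_group else MAX_WAVES_PER_SIMD * SIMDS_PER_CU
    by_lds = groups_per_cu * waves_per_group / SIMDS_PER_CU
    return min(MAX_WAVES_PER_SIMD, by_vgpr, by_lds)

# e.g. a shader using 48 VGPRs and a 256-thread group (4 waves) with 16 KB of LDS
print("occupancy (waves per SIMD):", waves_per_simd(48, lds_kb_per_group=16, waves_per_group=4))
```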


24 minutes ago, leadeater said:

It's not really a case of Nvidia strong-arming them though; it's purely a case of being wary, as influence always comes at a price. It's a two-way arrangement and Nvidia isn't going to get nothing out of it. As a hardware company you can't not give advice based on what your hardware can do; that's your self-interest in this arrangement.

 

As for AMD's hardware, I don't know. If it's just down to thread-level independence, then I don't know how much AMD is lacking in that, or the true details on the Nvidia side, without burying myself deep in technical documentation, which I can't be bothered to do right now at 6am lol. I did find this though, so if you have the Nvidia tech details on how truly independent their threads are, let me know.

 

https://gpuopen.com/optimizing-gpu-occupancy-resource-usage-large-thread-groups/

 

 

We have instructions on both AMD's and nV's hardware; instructions are broken down into wavefronts on AMD hardware, warps on nV hardware. Currently these threads are all synchronous, because the hardware isn't "smart" enough to synchronize them if there are race conditions or critical-path needs. Think of things like Z-buffers, transparencies, and Z depth.

 

If we are working on bunches of pixels, that would cause a major problem depending on how a program is written: because we are working on multiple pixels at once, not all pixels are going to be rendered the same way, based on what is happening in the scene.

 

GCN does have some level of instruction independence, but thread-wise it's all locked down right now. Pascal does too, but not as granular as GCN. This is why async compute (concurrent workloads) works better on AMD hardware. Volta has thread-level independence, which means instruction-level independence has to be there too. It forgoes all need for async compute because it's now transparent to the programmer. It doesn't matter how the program is written; the processor can do the job right any which way it wants to. Now of course it's always better to keep things batched up in warps, but the performance penalty if you can't do that is not going to be great. We won't get huge underutilization on Volta like on current technologies.

 

This is a hypothesis on my part as to why AMD wasn't part of this endeavor.

 

People can point to AMD and its ray-tracing-based renderer which they showed off with Vega, but it's only doing a frame a second or so, which is way too slow for real time.

 

Taking this information and applying it to ray tracing, where each pixel will have some sort of reflection, refraction, and coefficients just for lighting, there are a lot of implications for current hardware that might not work so well.
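As a toy cost model of why that matters for divergent work like ray tracing: in lockstep (SIMT) execution a warp/wavefront pays for every branch side any of its lanes takes, whereas an idealized fully independent scheduler would only pay for the slowest lane. This is a simplification of the point above, not a description of any vendor's actual hardware.

```python
# Toy divergence cost model; all costs are made-up units.

WARP_SIZE = 32

def lockstep_cost(lane_takes_branch, cost_true, cost_false):
    # Lockstep: the whole warp steps through each branch side that ANY lane takes,
    # with non-participating lanes masked off.
    cost = 0
    if any(lane_takes_branch):
        cost += cost_true
    if not all(lane_takes_branch):
        cost += cost_false
    return cost

def idealized_independent_cost(lane_takes_branch, cost_true, cost_false):
    # Idealized upper bound: each lane runs its own path; cost is the slowest lane.
    return max(cost_true if taken else cost_false for taken in lane_takes_branch)

# Half the lanes hit an expensive path (say, a bounced ray), half a cheap one.
lanes = [i % 2 == 0 for i in range(WARP_SIZE)]
print("lockstep cost:             ", lockstep_cost(lanes, 100, 10))
print("idealized independent cost:", idealized_independent_cost(lanes, 100, 10))
```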


19 minutes ago, Razor01 said:

 

 

We have instructions on both AMD's and nV's hardware; instructions are broken down into wavefronts on AMD hardware, warps on nV hardware. Currently these threads are all synchronous, because the hardware isn't "smart" enough to synchronize them if there are race conditions or critical-path needs. Think of things like Z-buffers, transparencies, and Z depth.

 

If we are working on bunches of pixels, that would cause a major problem depending on how a program is written: because we are working on multiple pixels at once, not all pixels are going to be rendered the same way, based on what is happening in the scene.

 

GCN does have some level of instruction independence, but thread-wise it's all locked down right now. Pascal does too, but not as granular as GCN. This is why async compute (concurrent workloads) works better on AMD hardware. Volta has thread-level independence, which means instruction-level independence has to be there too. It forgoes all need for async compute because it's now transparent to the programmer. It doesn't matter how the program is written; the processor can do the job right any which way it wants to. Now of course it's always better to keep things batched up in warps, but the performance penalty if you can't do that is not going to be great. We won't get huge underutilization on Volta like on current technologies.

 

This is a hypothesis on my part as to why AMD wasn't part of this endeavor.

 

People can point to AMD and its ray-tracing-based renderer which they showed off with Vega, but it's only doing a frame a second or so, which is way too slow for real time.

I'd still like to see the technical details before passing much judgement on it. I could make similar statements about thread-level independence on AMD hardware if I summarized it really badly, because that does exist with GCN, but it's the extent of it that actually matters.

 

Quote

Barrier is the only way to synchronize work between the four SIMDs. They are otherwise fully independent. CU offers shared 64 KB of LDS memory for communication between the SIMDs (write data there + barrier to ensure other threads see your data).

So I could say a 64 CU GPU has the capability to execute 256 independent threads, which isn't an untrue statement, but it's not the full story either. Not when you compare it to

Quote

Modern AMD GPUs are able to execute two groups of 1024 threads simultaneously on a single compute unit (CU). 

 

I don't doubt Volta and later Nvidia architectures are much, much better in this respect, but I'd like to know by how much.
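Working through the arithmetic behind those two quotes for a hypothetical 64-CU GCN part (illustrative numbers, not a spec sheet):

```python
# 64 CUs x 4 SIMDs = the "256 independent" schedulers; each SIMD can also keep
# multiple waves of 64 threads in flight to hide latency.

CUS = 64
SIMDS_PER_CU = 4
MAX_WAVES_PER_SIMD = 10
WAVE_SIZE = 64

independent_simds = CUS * SIMDS_PER_CU
max_waves_in_flight = independent_simds * MAX_WAVES_PER_SIMD
max_threads_in_flight = max_waves_in_flight * WAVE_SIZE

print(f"independent SIMD schedulers: {independent_simds}")
print(f"max waves in flight:         {max_waves_in_flight}")
print(f"max threads in flight:       {max_threads_in_flight:,}")
```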


3 minutes ago, leadeater said:

I'd still like to see the technical details before passing much judgement on it. I could make similar statements about thread-level independence on AMD hardware if I summarized it really badly, because that does exist with GCN, but it's the extent of it that actually matters.

 

So I could say a 64 CU GPU has the capability to execute 256 independent threads, which isn't an untrue statement, but it's not the full story either. Not when you compare it to

 

I don't doubt Volta and later Nvidia architectures are much, much better in this respect, but I'd like to know by how much.

 

 

Oh OK, I see where you are coming from. Yeah, SMs are independent from each other. Those threads are separate, but we can't really have two threads doing something different on the same SM unless it's programmed for; the chip must be told to do so. That's where async compute or concurrent workloads step in, and as before, it must be very specific.

 

So let's say we have unused ALUs in one SM: unless we tell the GPU "if you have resources open, do these instructions", it won't do it.

 

This is different in Volta.  It will do it lol.

 

http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf

 

page 31.


On 3/16/2018 at 5:01 PM, mr moose said:

I think that is true.

 

AMD spent all their time working on Mantle and FreeSync. Nvidia spent their time giving game developers resources that work and make their jobs easier. Guess who won that battle? There was nothing stopping AMD from creating APIs and producing resources for game devs that worked on their hardware.

Not many understand this.

They think it's Nvidia being assholes, but the fact remains games cost a shitload of money nowadays,

and whatever reduction in cost matters.

 

But anyways, carry on.


1 hour ago, Razor01 said:

 

 

Oh OK, I see where you are coming from. Yeah, SMs are independent from each other. Those threads are separate, but we can't really have two threads doing something different on the same SM unless it's programmed for; the chip must be told to do so. That's where async compute or concurrent workloads step in, and as before, it must be very specific.

 

So let's say we have unused ALUs in one SM: unless we tell the GPU "if you have resources open, do these instructions", it won't do it.

 

This is different in Volta.  It will do it lol.

 

http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf

 

page 31.

That's not quite the same thing; that's higher up in the stack, on the front end of the GPU, but that looks to be the key to it.

 

AMD has actually had better front-end workload isolation than Nvidia for a while; Volta is better now. AMD utilizes SR-IOV to split up GPU resources into hardware-level virtual devices, up to 16, that can be assigned to VMs or other cases as required, which gives it very good QoS and performance consistency and predictability. Nvidia does theirs a step higher, at the driver level, which requires more software integration but is more versatile.

 

This next iteration in Volta is really cool though, specifically this.

Quote

Typical execution of multiple applications sharing the GPU is implemented with time-slicing, that is, each application gets exclusive access for a period of time before access is granted to another application. Volta MPS improves aggregate GPU utilization by allowing multiple applications to simultaneously share GPU execution resources when these applications individually under-utilize the GPU execution resources.

That's damn cool, though only up to 48.
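A toy illustration of that time-slicing vs. simultaneous-sharing point, with made-up per-job utilization figures:

```python
# If several small jobs each use only a fraction of the GPU, time-slicing leaves
# most of the chip idle during each slice, while letting them share execution
# resources concurrently (the MPS behaviour quoted above) raises aggregate use.

jobs = [0.20, 0.15, 0.30, 0.10]   # fraction of the GPU each job can use on its own

time_sliced_util = sum(jobs) / len(jobs)   # one job at a time, equal slices
shared_util = min(1.0, sum(jobs))          # jobs run together, capped at 100%

print(f"time-sliced average utilization: {time_sliced_util:.0%}")
print(f"shared average utilization:      {shared_util:.0%}")
```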


9 minutes ago, leadeater said:

That's not quite the same thing; that's higher up in the stack, on the front end of the GPU, but that looks to be the key to it.

 

AMD has actually had better front-end workload isolation than Nvidia for a while; Volta is better now. AMD utilizes SR-IOV to split up GPU resources into hardware-level virtual devices, up to 16, that can be assigned to VMs or other cases as required, which gives it very good QoS and performance consistency and predictability. Nvidia does theirs a step higher, at the driver level, which requires more software integration but is more versatile.

 

This next iteration in Volta is really cool though, specifically this.

That's damn cool, though only up to 48.

 

 

Good points!


https://hothardware.com/news/nvidia-rtx-technology-real-time-ray-tracing

 

Quote

the GPU architecture actually has specific hardware features onboard (in addition to the Tensor cores) to help accelerate ray tracing. The company isn't really offering up any more information on this hardware at the moment. And while NVIDIA won't delve into actual performance benchmarks, it did note that Volta is “multiple integers faster” than previous generation architectures when it comes to ray tracing.

