nairolf

Member

Posts: 7
Joined: February 18, 2014
Last visited: February 25, 2015

Reputation Activity

nairolf reacted to GeekySole9354 in Poll: What Content Do you Want to See from Linus Media Group March 16, 2014

I think it would be better if you did both but not on the same channel.
nairolf reacted to deathjester in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 19, 2014

Reason for that: Mac, Wii, PS4, mobile, Linux, SteamOS = OpenGL. So they have to work with OpenGL anyway on ports. The only thing that is not OpenGL is Windows/Xbox. Many of the up-and-coming developers are doing mobile, and that is OpenGL. Games like Angry Birds are OpenGL.
Many triple A games in the past were OpenGL.

http://en.wikipedia.org/wiki/List_of_OpenGL_programs

What happens when you talk about OpenGL, or question MS forcing OS upgrades through DirectX?

This happens.
http://en.wikipedia.org/wiki/Astroturfing

MS is very good at it.

http://antitrust.slated.org/www.iowaconsumercase.org/011607/3000/PX03096.pdf

Here are documents that surfaced in an antitrust case where MS laid out plans to do just that. MS "evangelists", as they call themselves, have sets of rules, which involve getting control of people in media and changing opinion through message boards. They also have plans of attack, such as questioning the mental abilities of people who disagree with them and portraying people who don't use or like a product as mentally deficient.

How many times have you seen "only non tech savvy people hate Win8", "you are holding back technology", and on and on. This is nothing new. This is Microsoft. The funny thing is, their shenanigans have been shown in the court of law, they have been caught astroturfing and impersonating people that are dead when writing in letters to press, and their "evangelists" will tell you that you wear a tin foil hat when you talk about it.
Now as far as Mantle? OpenGL with extensions would be the same thing. AMD showed a slide (Mantle presentation) where they wanted Mantle on Linux, Mac OS. This would not be using DirectX shaders which Mantle does, this would be using OpenGL. AMD is a partner of the group that owns OpenGL along with Nvidia and Intel. http://en.wikipedia.org/wiki/Khronos_Group
nairolf reacted to PillowSmoke in Samsung gets back to mocking Apple in latest commercials February 19, 2014

Samsung used to spend a big part of their marketing budget knocking Apple’s iPhone, but got away from that with recent ads. Those were some of their best commercials, and they are finally getting back their swagger with two more commercials. The first one compares the Note 3 to the iPhone 5S with a little help from LeBron James highlight reels, and the second one brings a new battle for Samsung, the iPad Air. In this one, the TabPRO 10.1 shows its prowess along with a pencil. Both of these commercials will begin airing today. It is expected that Samsung will put more of an emphasis on tablets this year, so expect them to go hard and heavy with more commercials against the iPad. Check out the 60-second spots after the break. http://www.youtube.com/watch?v=sCnB5azFmTs http://www.youtube.com/watch?v=fThtsb-Yj0w Source: http://www.talkandroid.com/196902-samsung-gets-back-to-mocking-apple-in-latest-commercials/
nairolf got a reaction from Scionyde in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 19, 2014

John McDonald (NVIDIA) dedicates the last 15 minutes of the Steam Dev Days talk "Beyond Porting: How Modern OpenGL can Radically Reduce Driver Overhead" to the "need more draw calls per second" problem.
According to him, this can be solved with modern OpenGL on current GPUs from the three major vendors.
No need to use new APIs like Mantle for that.
(He alludes to Mantle at the beginning by saying "[More draw calls] is the motivation for an entirely new API. [...]")
Video (the relevant part starts at 27:42)
The talk is quite technical but I'll try to summarize it in this post...
But first the results:
5-30x increase in number of distinct objects per second
~75% reduced interaction with driver (less CPU load/waste)
GPU can be affected negatively (although not too badly)
[Compared to the original OpenGL implementation and NOT to Direct3D. But as John said, OpenGL without these tricks is already better than Direct3D.]
Summary of the video:
PC developers are frustrated that console developers can get 5 to 20 times as many draw calls per second.
Direct3D is slow if you need to draw many objects. Naive OpenGL is faster, and by using a few of the newer OpenGL extensions you can improve on that drastically.
They used API traces of a real-world application (Unreal Engine 4) and analyzed it to see how they can improve the code.
OpenGL function calls cause "state changes" inside the GPU. Think of the GPU as a factory with lots of machines that are all connected together.
Each of the machines can be configured to do something specific. Once everything is set up, the factory will produce exactly what you specified.
The configuration of all machines is called the factory's state. If you want to change what the factory produces you have to change its state.
Different state changes have different costs because some machines are more difficult to reconfigure than others.
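To make the factory picture concrete, here is a minimal Python sketch that counts what a stream of draws costs in state changes. Everything in it is an assumption for illustration: the three state fields, the relative costs in COST, and the sample draws are made up (the real numbers are in the talk's diagram), and sorting by state is just one common way to cut the changes down, not something the talk prescribes.

```python
from typing import NamedTuple


class Draw(NamedTuple):
    program: str  # shader program in use
    texture: str  # bound texture
    buffer: str   # bound vertex/uniform buffer


# Hypothetical relative costs per kind of state change (illustration only).
COST = {"program": 10, "texture": 5, "buffer": 1}


def state_change_cost(draws):
    """Sum the cost of every state field that differs from the previous draw."""
    total = 0
    current = None
    for draw in draws:
        for field in Draw._fields:
            if current is None or getattr(current, field) != getattr(draw, field):
                total += COST[field]
        current = draw
    return total


draws = [
    Draw("lit", "brick", "vb0"),
    Draw("unlit", "grass", "vb1"),
    Draw("lit", "brick", "vb2"),
    Draw("unlit", "grass", "vb3"),
]

unsorted_cost = state_change_cost(draws)
# Sorting groups draws that share the expensive state (program first).
sorted_cost = state_change_cost(sorted(draws))
print(unsorted_cost, sorted_cost)
```

With these made-up numbers the interleaved order reconfigures everything on every draw, while the grouped order only swaps the cheap buffer between similar draws.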
The following diagram shows how the different state changes compare to each other in terms of execution time:

Let's try to minimize the state changes as much as possible. The factory doesn't need to be reconfigured completely for each part that you produce. Many similar parts have almost the same production steps:
Use Sparse Bindless Textures to eliminate texture changes between draw calls (Place all your needed materials close to the machine that needs them instead of in a warehouse. Prepare everything you need before starting production.)
Pack many objects in a UBO, use persistent mappings, use ARB_shader_storage_buffer_object (Don't feed the machine parts individually. Give it a pile of stuff to work on.)
Use ARB_multi_draw_indirect to pack multiple draw calls together (Don't produce one thing and then have a meeting with all your machine operators to tell them what to do next. Plan ahead what the next N products will be and give a detailed list to all of the operators.)
Results of these optimizations:
=> With modern OpenGL you can choose at run-time how you balance the workload on CPU and GPU
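The "detailed list for the operators" in the multi-draw-indirect item is, roughly, an array of small fixed-size records in a buffer. The sketch below packs such a list on the CPU side in Python. The record layout (count, instanceCount, firstIndex, baseVertex, baseInstance, five 32-bit values) follows the indexed indirect draw command as I understand it from the OpenGL spec; the mesh numbers and the choice to store a per-draw index in baseInstance are my own assumptions, and nothing here talks to a real driver.

```python
import struct

# One indexed indirect draw record: count, instanceCount, firstIndex,
# baseVertex, baseInstance (five 32-bit values, 20 bytes, native byte order).
CMD = struct.Struct("=IIIiI")


def pack_draw_commands(meshes):
    """Pack one record per mesh into a single tightly packed byte string.

    meshes: iterable of (index_count, first_index, base_vertex) tuples.
    baseInstance carries the draw's position in the list (an assumption here),
    so a shader could use it to look up per-object data.
    """
    buf = bytearray()
    for draw_id, (index_count, first_index, base_vertex) in enumerate(meshes):
        buf += CMD.pack(index_count, 1, first_index, base_vertex, draw_id)
    return bytes(buf)


# Hypothetical meshes sharing one big index/vertex buffer.
meshes = [(36, 0, 0), (6, 36, 24), (300, 42, 28)]
commands = pack_draw_commands(meshes)
print(len(commands), "bytes for", len(meshes), "draws")
```

In a real renderer this byte string would be uploaded to a buffer object once and the whole list submitted with a single glMultiDrawElementsIndirect call, instead of one API call per object.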
The following image shows the reduction in state changes (OpenGL function calls). Each square is a state change operation.
Each line is a draw operation and consists of up to 7 state change operations and one draw call.
Because of spacing there are 4 columns that should be seen as one big stream of draw operations (lines).
This is only a part of an even longer stream of draw operations so some state changes are not visible at all in this image (red and orange).

All of these OpenGL extensions are available on NVIDIA Kepler GPUs and some of them on Fermi.
All of them are also implemented by AMD and Intel "for a pretty reasonable fraction of hardware".
Here is a (hopefully complete) list of the OpenGL extensions used:
Sparse Textures (contributed by AMD, NVIDIA)
Bindless Textures (contributed by AMD)
UBOs (contributed by APPLE, NVIDIA)
Shader Storage Buffer Object (contributed by NVIDIA, AMD)
Persistent Mapping (contributed by NVIDIA)
Multi Draw Indirect (contributed by AMD)
My opinion:
If developers decide to move away from Direct3D, why not join the OpenGL club instead of Mantle?
OpenGL is vendor-independent, extensible, well tested and optimized, and works on most platforms NOW.
nairolf got a reaction from Beliskner in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 19, 2014

nairolf reacted to Xorbot in Irrational Games shuts down. Ken Levine want something new February 19, 2014

You might even call it irrational...
nairolf got a reaction from Telbet in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 19, 2014

They also said that if you use modern OpenGL you won't be bottle-necked by the API: https://twitter.com/grahamsellers/status/383242587609395201 The question is if Khronos (or a different org) wants to be responsible for Mantle. I mean why should they want to work on two competing open APIs at the same time?
Also, AMD repeatedly compares Mantle to Direct3D, not OpenGL. Mantle is supposed to kill DirectX.

For AMD's OpenGL drivers that is reasonable. For other companies probably not.

That was an issue when OpenGL had a feature-driven release model. The committee delayed new API versions until there were enough new extensions that could get promoted into the standard. That led to fragmentation because ATI/Nvidia/... didn't want to wait for years to get their new features in. So they did their own extensions so that developers could use them if they wanted. Nowadays OpenGL has a schedule-driven release model with two releases a year. So there is less reason to write their own extensions when they can get their stuff supported by an official OpenGL release in less than 6 months. More info

The things mentioned in the talk are not at all vendor-specific. They also have fallback solutions and don't need to be used together. That's answered in the Q&A part at the end.

You mean the fixed-function pipeline? That has been gone for quite some time now. They have kept removing old and slow stuff from the API since release 3.0 (where the change from feature-driven to schedule-driven happened).

modern? OpenGL (it says so in the title of the talk)
open? OpenGL
suits the graphics hardware available today? OpenGL
and in the future? OpenGL
nairolf got a reaction from Telbet in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 19, 2014

nairolf reacted to Xorbot in Downside of Mantle February 19, 2014

I've arrived at a chilling forecast during my analysis and thought about Mantle. Since this is AMD's proprietary technology, it is not likely that there is a simple way to support the unique features of nVidia's architecture.

Let's establish a few things. OpenGL is an API which exposes the hardware-specific features, if they are implemented. I'll give some very basic examples of these extensions purely for discussion/illustration purposes:
GL_ATI_shader_texture_lod
GL_NV_fragment_program2
These extensions work as follows. During the initialization of an OpenGL program, the programmer can query the driver through OpenGL function calls to see if the extensions exist. If so, the engine can do further initialization related to the extensions in question. For example, if an nVidia-specific extension exists, it can result in faster render times due to a feature being supported by the hardware. In this case, however, the extension would not be available on AMD's hardware, and thus not available as an OpenGL extension when running the game on AMD video cards. Usually there is an equivalent extension, and the engine would handle these features properly and adequately (provided the game engine isn't crappy or devoid of advanced features).
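The query-then-fall-back pattern described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the extension list is passed in as plain strings instead of being read from a live driver, the render path names are hypothetical, and GL_ARB_shader_texture_lod stands in for the "equivalent extension" on other hardware.

```python
def choose_lod_path(extensions):
    """Pick a render path from the extension names the driver reported.

    Uses exact set membership rather than a substring search on one long
    string, since one extension name can be a prefix of another.
    """
    available = set(extensions)
    if "GL_ATI_shader_texture_lod" in available:
        return "ati_texture_lod"      # vendor-specific path (hypothetical name)
    if "GL_ARB_shader_texture_lod" in available:
        return "arb_texture_lod"      # cross-vendor equivalent (hypothetical name)
    return "generic_fallback"         # no extension: plain code path


# Hypothetical extension strings, as a driver might report them.
amd_like = "GL_ATI_shader_texture_lod GL_ARB_shader_texture_lod".split()
other = "GL_ARB_multi_draw_indirect GL_ARB_shader_texture_lod".split()

print(choose_lod_path(amd_like))
print(choose_lod_path(other))
print(choose_lod_path([]))
```

The engine runs a check like this once at startup and then sticks with the chosen path, which is the per-vendor branching that a single-vendor API would not need.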

So, with my admittedly light knowledge of Mantle, it is an API (Application Programming Interface) with direct functions to these extensions. Since Mantle works for GCN and onward, it is assumed that the OpenGL-equivalent extensions exist as though they are a part of the standard API. For example, there is no need to query if "GL_ATI_shader_texture_lod" exists, and then use it with the chance of it being buggy via the current AMD OpenGL driver. Instead, Mantle would expose such a feature as though it were a "fact" or a part of the normal API. It would be as common to use as the "glVertex3f" function, which *MUST* be supported on *ALL* OpenGL driver implementations and hardware.

So, now onto the next part. I could foresee nVidia responding with their own Mantle. Of course, it wouldn't be called Mantle, but would carry another branding. The same thing would exist, whereby standard API function calls are designed for nVidia-specific hardware, likely supporting Kepler and onward. I know that one great feature that is getting a lot of attention lately is nVidia's H.264 processing abilities via Kepler chips. With this feature being a "fact" of Kepler-based devices, there would be standard API function calls for H.264 encoding.

An example would be to call a function to begin encoding the last X frames rendered into an in-memory H.264 video stream, whereby it can be played back through a surface texture. This would lead to a really interesting indie title whereby the past actions are recorded and played-back within the game via television screen meshes. This example is to show that there are also nVidia-specific features that would exist through a Mantle-like API, and thus would be a game that could only be supported (via hardware) on nVidia cards.

That's where the downside lies. That is the main downside of Mantle. In order to support that awesome nVidia H.264 feature on AMD renderers/Mantle, a software-based work-around must be created. Either that, or the game must be released for nVidia-only hardware, which is not a likely scenario for developers and publishers who wish to turn a profit.

As a result, I foresee *ANOTHER* API coming. If we presume that an nVidia Mantle-like API exists, as well as AMD's Mantle, we could see *ANOTHER* API that sits on top of both. Let's call it "Crust", since we are going with a naming scheme that relates to the layers of planet Earth. The Crust API is basically another set of functions that will either execute a hardware-supported version of a feature, or a software workaround. In the H.264 example, the Crust API function would redirect to nVidia's H.264 features, but for AMD, it would be redirected to a software-based solution.
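A layer like the hypothetical "Crust" boils down to a dispatch table: use the vendor's hardware path if one is registered, otherwise fall back to software. Here is a minimal Python sketch of that idea. All names (Crust, encode_h264, the two encoder functions, the vendor strings) are invented for illustration, and the encoders only return a tag instead of doing any real H.264 work.

```python
def hardware_h264(frames):
    """Stand-in for a vendor's hardware encoder (hypothetical)."""
    return ("hardware", len(frames))


def software_h264(frames):
    """Stand-in for a CPU-based workaround (hypothetical)."""
    return ("software", len(frames))


class Crust:
    """Routes a feature call to a vendor path or a software fallback."""

    def __init__(self, hardware_paths, fallback):
        self.hardware_paths = hardware_paths  # vendor name -> encoder function
        self.fallback = fallback              # used when no vendor path exists

    def encode_h264(self, vendor, frames):
        encoder = self.hardware_paths.get(vendor, self.fallback)
        return encoder(frames)


crust = Crust({"nvidia": hardware_h264}, software_h264)
print(crust.encode_h264("nvidia", [b"frame0", b"frame1"]))
print(crust.encode_h264("amd", [b"frame0"]))
```

The game only ever calls encode_h264; which path actually runs is decided inside the layer, which is the same role the post goes on to compare with DirectX.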

Does this sound familiar? It is basically DirectX again. Albeit, that is a very simplified analogy, and writing about the minute differences would result in a 10-page post. So, in simple terms, this whole Mantle and nVidia API thing may result in a situation where games are back to the so-called "bloated" (Linus and Slick's term) DirectX. Sure, things may be improved in very specific situations. Also, console ports may be improved slightly. However with my experience as a game programmer for many years, I feel that in the end we are not going to be better off, at least in large measurable terms. If anything, the performance gains may save us a generation of video cards. What I mean by this is that the 7970 can act as a 9970 (or whatever that new R9 XYZ equivalent name is) through the optimizations.

Once gamers and developers "get used to" Mantle as a norm, there is no tangible difference. A few generations from now, we'll still be stuck in the same boat as we were with DirectX. Developers will have new headaches in the future: having to "fight" with the Crust, Mantle, and nVidia APIs.
nairolf got a reaction from TheNinjaNextDor in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 18, 2014

nairolf got a reaction from Slick in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 18, 2014

John McDonald (NVIDIA) dedicates the last 15 minutes of the Steam Dev Days talk "Beyond Porting: How Modern OpenGL can Radically Reduce Driver Overhead" to address the "need more draw calls per second" problem.
According to him this can be solved with modern OpenGL on current GPUs of the three major vendors.
No need to use new APIs like Mantle for that.
(He implicitly mentions Mantle in the beginning by saying "[More draw calls] is the motivation for an entirely new API. [...]")
Video
(should skip to 27:42 automatically)
The talk is quite technical but I'll try to summarize it in this post...
But first the results:
5-30x increase in number of distinct objects per second ~75% reduced interaction with driver (less CPU load/waste) GPU can be affected negatively (although not too badly) [Compared to the original OpenGL implementation and NOT to Direct3D. But as John said, OpenGL without these tricks is already better than Direct3D.] Summary of the video:
PC developers are frustrated that console developers can get 5 to 20 times as many draw calls per second.
Direct3D is slow if you need to draw many objects, naive OpenGL is faster but by using a few of the newer OpenGL extensions you can improve that drastically.
They used API traces of a real-world application (Unreal Engine 4) and analyzed it to see how they can improve the code.
OpenGL function calls cause "state changes" inside the GPU. Think of the GPU as a factory with lots of machines that are all connected together.
Each of the machines can be configured to do something specific. Once everything is set up the factory will produce exactly that thing you specified it to produce.
The configuration of all machines is called the factory's state. If you want to change what the factory produces you have to change its state.
Different state changes have different costs because some machines are more difficult to reconfigure than others.
The following diagram shows how the different state changes compare to each other in terms of execution time:

Let's try to minimize the state changes as much as possible. The factory doesn't need to be reconfigured completely for each part that you produce. Many similar parts have almost the same production steps:
Use Sparse Bindless Textures to eliminate texture changes between draw calls (Place all your needed materials close to the machine that needs them instead of in a warehouse. Prepare everything you need before starting production.) Pack many objects in an UBO, use persistent mappings, use ARB_shader_storage_buffer_object (Don't feed the machine parts individually. Give it a pile of stuff to work on.) Use ARB_multi_draw_indirect to pack multiple draw calls together (Don't produce one thing and then have a meeting with all your machine operators to tell them what to do next. Plan ahead what the next N products will be and give a detailed list to all of the operators.) Results of these optimizations:
=> With modern OpenGL you can choose at run-time how you balance the workload on CPU and GPU
The following image shows the reduction in state changes (OpenGL function calls). Each square is a state change operation.
Each line is a draw operation and consists of up to 7 state change operations and one draw call.
Because of spacing there are 4 columns that should be seen as one big stream of draw operations (lines).
This is only a part of an even longer stream of draw operations so some state changes are not visible at all in this image (red and orange).

All of these OpenGL extensions are available on NVIDIA Kepler GPUs and some of them on Fermi.
All of them are also implemented by AMD and Intel "for a pretty reasonable fraction of hardware".
Here is a (hopefully complete) list of the used OpenGL extensions:
Sparse Textures (contributed by AMD, NVIDIA)
Bindless Textures (contributed by AMD)
UBOs (contributed by APPLE, NVIDIA)
Shader Storage Buffer Object (contributed by NVIDIA, AMD)
Persistent Mapping (contributed by NVIDIA)
Multi Draw Indirect (contributed by AMD)
My opinion:
If developers decide to move away from Direct3D, why not join the OpenGL club instead of Mantle?
OpenGL is vendor-independent, extensible, well tested and optimized, and works on most platforms NOW.
nairolf got a reaction from deathjester in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 18, 2014

They also said that if you use modern OpenGL you won't be bottlenecked by the API: https://twitter.com/grahamsellers/status/383242587609395201
The question is whether Khronos (or a different org) wants to be responsible for Mantle. I mean, why would they want to work on two competing open APIs at the same time?
Also, AMD repeatedly compares Mantle to Direct3D, not OpenGL. Mantle is supposed to kill DirectX.

For AMD's OpenGL drivers that is reasonable. For other companies probably not.

That was an issue when OpenGL had a feature-driven release model. The committee delayed new API versions until there were enough new extensions that could be promoted into the standard. That led to fragmentation because ATI/Nvidia/... didn't want to wait for years to get their new features in. So they wrote their own extensions so that developers could use them if they wanted. Nowadays OpenGL has a schedule-driven release model with two releases a year. So there is less reason for vendors to write their own extensions when they can get their stuff supported by an official OpenGL release in less than 6 months. More info

The things mentioned in the talk are not at all vendor-specific. Also, they have fallback solutions and don't need to be used together. That's answered in the Q&A part at the end.

You mean the fixed-function pipeline? That's been gone for quite some time now. They have been removing old and slow stuff from the API since release 3.0 (where the change from feature-driven to schedule-driven happened).

modern? OpenGL (it says so in the title of the talk)
open? OpenGL
suits the graphics hardware available today? OpenGL
and in the future? OpenGL
nairolf reacted to Xorbot in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 18, 2014

Time for me to brag a little bit. Here is my post from more than 4 months ago:

http://linustechtips.com/main/topic/61261-downside-of-mantle/

The post goes on to say how Mantle can lead us down a path right back to where we started. But the first part of my post applies more to this thread. It states that OpenGL can already be used to enhance your games with proper usage of OpenGL and of GL extensions.

But anyway, I am just bragging because I know about this stuff, whereas most people in this forum are like "OMFG MANTLE ROXORS! 400% MOAR FPS" <-- 3 months before it even arrived.
nairolf got a reaction from deathjester in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 18, 2014

John McDonald (NVIDIA) dedicates the last 15 minutes of the Steam Dev Days talk "Beyond Porting: How Modern OpenGL can Radically Reduce Driver Overhead" to address the "need more draw calls per second" problem.
According to him this can be solved with modern OpenGL on current GPUs of the three major vendors.
No need to use new APIs like Mantle for that.
(He implicitly mentions Mantle in the beginning by saying "[More draw calls] is the motivation for an entirely new API. [...]")
Video
(should skip to 27:42 automatically)
The talk is quite technical but I'll try to summarize it in this post...
But first the results:
5-30x increase in number of distinct objects per second
~75% reduced interaction with driver (less CPU load/waste)
GPU can be affected negatively (although not too badly)
[Compared to the original OpenGL implementation and NOT to Direct3D. But as John said, OpenGL without these tricks is already better than Direct3D.]
Summary of the video:
PC developers are frustrated that console developers can get 5 to 20 times as many draw calls per second.
Direct3D is slow if you need to draw many objects. Naive OpenGL is faster, but by using a few of the newer OpenGL extensions you can improve on that drastically.
They used API traces of a real-world application (Unreal Engine 4) and analyzed them to see how the code could be improved.
OpenGL function calls cause "state changes" inside the GPU. Think of the GPU as a factory with lots of machines that are all connected together.
Each of the machines can be configured to do something specific. Once everything is set up, the factory will produce exactly what you told it to produce.
The configuration of all machines is called the factory's state. If you want to change what the factory produces you have to change its state.
Different state changes have different costs because some machines are more difficult to reconfigure than others.
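If it helps, the factory picture can be played through in a few lines of Python. This is only a toy model: the three state categories and their relative costs are made-up placeholders (the real numbers are in the talk's diagram, not here), and sorting draws by state is a common general trick that I am using for illustration, not something taken from the talk.

```python
from dataclasses import dataclass

# Hypothetical relative costs per kind of state change (placeholders only).
COST = {"program": 10, "texture": 5, "uniform_buffer": 2}


@dataclass(frozen=True)
class Draw:
    """One draw operation and the state it needs (all fields hypothetical)."""
    program: int
    texture: int
    uniform_buffer: int


def state_change_cost(draws):
    """Add up the cost of every state that differs from the previous draw."""
    total = 0
    previous = None
    for draw in draws:
        for field, cost in COST.items():
            if previous is None or getattr(previous, field) != getattr(draw, field):
                total += cost
        previous = draw
    return total


def sort_by_expensive_state(draws):
    """Order draws so the most expensive state changes least often."""
    return sorted(draws, key=lambda d: (d.program, d.texture, d.uniform_buffer))


draws = [Draw(1, 1, 1), Draw(2, 1, 1), Draw(1, 1, 2), Draw(2, 1, 2)]
unsorted_cost = state_change_cost(draws)
sorted_cost = state_change_cost(sort_by_expensive_state(draws))
```

With these placeholder numbers the reordered stream is cheaper because the "program" machine is reconfigured once instead of on every draw.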
The following diagram shows how the different state changes compare to each other in terms of execution time:

Let's try to minimize the state changes as much as possible. The factory doesn't need to be reconfigured completely for each part that you produce. Many similar parts have almost the same production steps:
Use Sparse Bindless Textures to eliminate texture changes between draw calls. (Place all the materials you need close to the machine that needs them instead of in a warehouse. Prepare everything you need before starting production.)
Pack many objects into a UBO, use persistent mappings, use ARB_shader_storage_buffer_object. (Don't feed the machine parts individually. Give it a pile of stuff to work on.)
Use ARB_multi_draw_indirect to pack multiple draw calls together. (Don't produce one thing and then have a meeting with all your machine operators to tell them what to do next. Plan ahead what the next N products will be and give a detailed list to all of the operators.)
Results of these optimizations:
=> With modern OpenGL you can choose at run-time how you balance the workload on CPU and GPU
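For the ARB_multi_draw_indirect item above, the "detailed list for the operators" is just an array of small fixed-size records in a buffer. Here is a sketch of building that array on the CPU side. The record layout (count, instanceCount, firstIndex, baseVertex, baseInstance) is the DrawElementsIndirectCommand structure from the OpenGL specification; the mesh tuples, the function name and the idea of storing a draw ID in baseInstance are my own assumptions for illustration.

```python
import struct

# DrawElementsIndirectCommand: four unsigned ints plus one signed int
# (baseVertex), 20 bytes in total. Little-endian is an assumption that
# matches typical PC hardware.
COMMAND = struct.Struct("<IIIiI")


def pack_draws(meshes):
    """Build one command record per mesh.

    meshes: list of hypothetical (index_count, first_index, base_vertex) tuples.
    The draw's position in the list is stored in baseInstance so a shader
    could use it to look up per-object data (an assumption, not a requirement).
    """
    buf = bytearray()
    for draw_id, (index_count, first_index, base_vertex) in enumerate(meshes):
        buf += COMMAND.pack(index_count, 1, first_index, base_vertex, draw_id)
    return bytes(buf)


commands = pack_draws([(36, 0, 0), (6, 36, 24)])
```

A buffer like this would then be submitted with a single glMultiDrawElementsIndirect call instead of one draw call per mesh; that part needs a real GL context and is left out here.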
The following image shows the reduction in state changes (OpenGL function calls). Each square is a state change operation.
Each line is a draw operation and consists of up to 7 state change operations and one draw call.
Because of spacing there are 4 columns that should be seen as one big stream of draw operations (lines).
This is only a part of an even longer stream of draw operations so some state changes are not visible at all in this image (red and orange).

All of these OpenGL extensions are available on NVIDIA Kepler GPUs and some of them on Fermi.
All of them are also implemented by AMD and Intel "for a pretty reasonable fraction of hardware".
Here is a (hopefully complete) list of the used OpenGL extensions:
Sparse Textures (contributed by AMD, NVIDIA)
Bindless Textures (contributed by AMD)
UBOs (contributed by APPLE, NVIDIA)
Shader Storage Buffer Object (contributed by NVIDIA, AMD)
Persistent Mapping (contributed by NVIDIA)
Multi Draw Indirect (contributed by AMD)
My opinion:
If developers decide to move away from Direct3D, why not join the OpenGL club instead of Mantle?
OpenGL is vendor-independent, extensible, well tested and optimized, and works on most platforms NOW.
nairolf got a reaction from TOMPPIX in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 18, 2014

nairolf got a reaction from GoodBytes in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 18, 2014

nairolf got a reaction from Khraft in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 18, 2014

nairolf got a reaction from Nineshadow in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 18, 2014

nairolf got a reaction from tedjani in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 18, 2014

nairolf got a reaction from MbV93 in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 18, 2014

John McDonald (NVIDIA) dedicates the last 15 minutes of the Steam Dev Days talk "Beyond Porting: How Modern OpenGL can Radically Reduce Driver Overhead" to address the "need more draw calls per second" problem.
According to him this can be solved with modern OpenGL on current GPUs of the three major vendors.
No need to use new APIs like Mantle for that.
(He implicitly mentions Mantle in the beginning by saying "[More draw calls] is the motivation for an entirely new API. [...]")
Video
(should skip to 27:42 automatically)
The talk is quite technical but I'll try to summarize it in this post...
But first the results:
5-30x increase in number of distinct objects per second ~75% reduced interaction with driver (less CPU load/waste) GPU can be affected negatively (although not too badly) [Compared to the original OpenGL implementation and NOT to Direct3D. But as John said, OpenGL without these tricks is already better than Direct3D.] Summary of the video:
PC developers are frustrated that console developers can get 5 to 20 times as many draw calls per second.
Direct3D is slow if you need to draw many objects, naive OpenGL is faster but by using a few of the newer OpenGL extensions you can improve that drastically.
They used API traces of a real-world application (Unreal Engine 4) and analyzed it to see how they can improve the code.
OpenGL function calls cause "state changes" inside the GPU. Think of the GPU as a factory with lots of machines that are all connected together.
Each of the machines can be configured to do something specific. Once everything is set up the factory will produce exactly that thing you specified it to produce.
The configuration of all machines is called the factory's state. If you want to change what the factory produces you have to change its state.
Different state changes have different costs because some machines are more difficult to reconfigure than others.
The following diagram shows how the different state changes compare to each other in terms of execution time:

Let's try to minimize the state changes as much as possible. The factory doesn't need to be reconfigured completely for each part that you produce. Many similar parts have almost the same production steps:
Use Sparse Bindless Textures to eliminate texture changes between draw calls (Place all your needed materials close to the machine that needs them instead of in a warehouse. Prepare everything you need before starting production.) Pack many objects in an UBO, use persistent mappings, use ARB_shader_storage_buffer_object (Don't feed the machine parts individually. Give it a pile of stuff to work on.) Use ARB_multi_draw_indirect to pack multiple draw calls together (Don't produce one thing and then have a meeting with all your machine operators to tell them what to do next. Plan ahead what the next N products will be and give a detailed list to all of the operators.) Results of these optimizations:
=> With modern OpenGL you can choose at run-time how you balance the workload on CPU and GPU
The following image shows the reduction in state changes (OpenGL function calls). Each square is a state change operation.
Each line is a draw operation and consists of up to 7 state change operations and one draw call.
Because of spacing there are 4 columns that should be seen as one big stream of draw operations (lines).
This is only a part of an even longer stream of draw operations so some state changes are not visible at all in this image (red and orange).
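To make the counting in that image a bit more concrete, here is a small sketch that counts API calls for a stream of draw operations, first naively and then with consecutive identical-state draws merged into one submission. The stream contents are invented for illustration and are not taken from the Unreal Engine 4 trace.

```python
from itertools import groupby


def naive_call_count(stream):
    """Every draw operation issues all of its state calls plus one draw call."""
    return sum(len(states) + 1 for states in stream)


def batched_call_count(stream):
    """Consecutive draw operations with identical state share one state setup
    and are submitted together with a single multi-draw call."""
    total = 0
    for states, _group in groupby(stream):
        total += len(states) + 1
    return total


# Hypothetical state setups: each is the tuple of state-setting calls that
# precede one draw call (the image shows up to 7 per line).
LIT = (("program", "lit"), ("vertex_format", "pos_normal_uv"), ("ubo", "scene"))
UNLIT = (("program", "unlit"),)

stream = [LIT, LIT, LIT, UNLIT, UNLIT]
```

For this stream the naive count is 16 calls and the batched count is 6. Draws only become identical in state once per-object differences (textures, per-object constants) have been moved out of the state and into data, which is what the bindless and buffer-packing tricks are for.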

All of these OpenGL extensions are available on NVIDIA Kepler GPUs and some of them on Fermi.
All of them are also implemented by AMD and Intel "for a pretty reasonable fraction of hardware".
Here is a (hopefully complete) list of the OpenGL extensions used:
Sparse Textures (contributed by AMD, NVIDIA)
Bindless Textures (contributed by AMD)
UBOs (contributed by APPLE, NVIDIA)
Shader Storage Buffer Object (contributed by NVIDIA, AMD)
Persistent Mapping (contributed by NVIDIA)
Multi Draw Indirect (contributed by AMD)
My opinion:
If developers decide to move away from Direct3D, why not join the OpenGL club instead of Mantle?
OpenGL is vendor-independent, extensible, well tested and optimized, and works on most platforms NOW.
nairolf got a reaction from Ciccioo in Modern OpenGL vs Mantle? (from NVIDIA's talk at Steam Dev Days) February 18, 2014

John McDonald (NVIDIA) dedicates the last 15 minutes of the Steam Dev Days talk "Beyond Porting: How Modern OpenGL can Radically Reduce Driver Overhead" to addressing the "need more draw calls per second" problem.
According to him, this can be solved with modern OpenGL on current GPUs of the three major vendors.
No need to use new APIs like Mantle for that.
(He implicitly mentions Mantle in the beginning by saying "[More draw calls] is the motivation for an entirely new API. [...]")
Video
(should skip to 27:42 automatically)
The talk is quite technical but I'll try to summarize it in this post...
But first the results:
5-30x increase in number of distinct objects per second
~75% reduced interaction with driver (less CPU load/waste)
GPU can be affected negatively (although not too badly)
[Compared to the original OpenGL implementation and NOT to Direct3D. But as John said, OpenGL without these tricks is already better than Direct3D.]
Summary of the video:
PC developers are frustrated that console developers can get 5 to 20 times as many draw calls per second.
Direct3D is slow if you need to draw many objects. Naive OpenGL is faster, but by using a few of the newer OpenGL extensions you can improve on that drastically.
