Jump to content

Investigation Shows That NVIDIA GPUs Implement Tile Based Rasterization for Greater Efficiency

Master Disaster

An investigation into Nvidia GPUs has shown that during the transition from Kepler to Maxwell Nvidia made some huge changes in the way the card handles Rasterization which is now closer to low power SoCs. 

Quote

As someone who analyzes GPUs for a living, one of the more vexing things in my life has been NVIDIA’s Maxwell architecture. The company’s 28nm refresh offered a huge performance-per-watt increase for only a modest die size increase, essentially allowing NVIDIA to offer a full generation’s performance improvement without a corresponding manufacturing improvement. We’ve had architectural updates on the same node before, but never anything quite like Maxwell.

 

The vexing aspect to me has been that while NVIDIA shared some details about how they improved Maxwell’s efficiency over Kepler, they have never disclosed all of the major improvements under the hood. We know, for example, that Maxwell implemented a significantly altered SM structure that was easier to reach peak utilization on, and thanks to its partitioning wasted much less power on interconnects. We also know that NVIDIA significantly increased the L2 cache size and did a number of low-level (transistor level) optimizations to the design. But NVIDIA has also held back information – the technical advantages that are their secret sauce – so I’ve never had a complete picture of how Maxwell compares to Kepler.

 

For a while now, a number of people have suspected that one of the ingredients of that secret sauce was that NVIDIA had applied some mobile power efficiency technologies to Maxwell. It was, after all, their original mobile-first GPU architecture, and now we have some data to back that up. Friend of AnandTech and all around tech guru David Kanter of Real World Tech has gone digging through Maxwell/Pascal, and in an article & video published this morning, he outlines how he has uncovered very convincing evidence that NVIDIA implemented a tile based rendering system with Maxwell.

 

In short, by playing around with some DirectX code specifically designed to look at triangle rasterization, he has come up with some solid evidence that NVIDIA’s handling of tringles has significantly changed since Kepler, and that their current method of triangle handling is consistent with a tile based renderer.

Tile based rasterization is more common in PowerVR & ARM based chips

Quote

Tile based rendering is something we’ve seen for some time in the mobile space, with both Imagination PowerVR and ARM Mali implementing it. The significance of tiling is that by splitting a scene up into tiles, tiles can be rasterized piece by piece by the GPU almost entirely on die, as opposed to the more memory (and power) intensive process of rasterizing the entire frame at once via immediate mode rendering. The trade-off with tiling, and why it’s a bit surprising to see it here, is that the PC legacy is immediate mode rendering, and this is still how most applications expect PC GPUs to work. So to implement tile based rasterization on Maxwell means that NVIDIA has found a practical means to overcome the drawbacks of the method and the potential compatibility issues.

 

In any case, Real Word Tech’s article goes into greater detail about what’s going on, so I won’t spoil it further. But with this information in hand, we now have a more complete picture of how Maxwell (and Pascal) work, and consequently how NVIDIA was able to improve over Kepler by so much. Finally, at this point in time Real World Tech believes that NVIDIA is the only PC GPU manufacturer to use tile based rasterization, which also helps to explain some of NVIDIA’s current advantages over Intel’s and AMD’s GPU architectures, and gives us an idea of what we may see them do in the future.

http://www.anandtech.com/show/10536/nvidia-maxwell-tile-rasterization-analysis

http://www.realworldtech.com/tile-based-rasterization-nvidia-gpus/

 

Interesting stuff, I'm not 100% sure what it all means though. 

 

Thoughts? 

Main Rig:-

Ryzen 7 3800X | Asus ROG Strix X570-F Gaming | 16GB Team Group Dark Pro 3600Mhz | Corsair MP600 1TB PCIe Gen 4 | Sapphire 5700 XT Pulse | Corsair H115i Platinum | WD Black 1TB | WD Green 4TB | EVGA SuperNOVA G3 650W | Asus TUF GT501 | Samsung C27HG70 1440p 144hz HDR FreeSync 2 | Ubuntu 20.04.2 LTS |

 

Server:-

Intel NUC running Server 2019 + Synology DSM218+ with 2 x 4TB Toshiba NAS Ready HDDs (RAID0)

Link to comment
Share on other sites

Link to post
Share on other sites

And that might explain why Kepler and Fermi based cards got left in the dust so quickly. 

Main Gaming PC - i9 10850k @ 5GHz - EVGA XC Ultra 2080ti with Heatkiller 4 - Asrock Z490 Taichi - Corsair H115i - 32GB GSkill Ripjaws V 3600 CL16 OC'd to 3733 - HX850i - Samsung NVME 256GB SSD - Samsung 3.2TB PCIe 8x Enterprise NVMe - Toshiba 3TB 7200RPM HD - Lian Li Air

 

Proxmox Server - i7 8700k @ 4.5Ghz - 32GB EVGA 3000 CL15 OC'd to 3200 - Asus Strix Z370-E Gaming - Oracle F80 800GB Enterprise SSD, LSI SAS running 3 4TB and 2 6TB (Both Raid Z0), Samsung 840Pro 120GB - Phanteks Enthoo Pro

 

Super Server - i9 7980Xe @ 4.5GHz - 64GB 3200MHz Cl16 - Asrock X299 Professional - Nvidia Telsa K20 -Sandisk 512GB Enterprise SATA SSD, 128GB Seagate SATA SSD, 1.5TB WD Green (Over 9 years of power on time) - Phanteks Enthoo Pro 2

 

Laptop - 2019 Macbook Pro 16" - i7 - 16GB - 512GB - 5500M 8GB - Thermal Pads and Graphite Tape modded

 

Smart Phones - iPhone X - 64GB, AT&T, iOS 13.3 iPhone 6 : 16gb, AT&T, iOS 12 iPhone 4 : 16gb, AT&T Go Phone, iOS 7.1.1 Jailbroken. iPhone 3G : 8gb, AT&T Go Phone, iOS 4.2.1 Jailbroken.

 

Link to comment
Share on other sites

Link to post
Share on other sites

3 minutes ago, Dan Castellaneta said:

I don't really see this as a negative thing moreso clever implementation of something that already exists.

From what I gather it's to do with how the GPU renders the scene. Instead of rendering the entire frame it splits it into small tiles then renders each tile individually. Right? 

 

If so then I'd agree it's not a bad thing as long as it is more efficient but I'd question that fact, rendering 36 small frames compared to just 1 big one must equal the same amount of overall work done so is the efficiency gains due to multithreading and simultaneous processing? 

Main Rig:-

Ryzen 7 3800X | Asus ROG Strix X570-F Gaming | 16GB Team Group Dark Pro 3600Mhz | Corsair MP600 1TB PCIe Gen 4 | Sapphire 5700 XT Pulse | Corsair H115i Platinum | WD Black 1TB | WD Green 4TB | EVGA SuperNOVA G3 650W | Asus TUF GT501 | Samsung C27HG70 1440p 144hz HDR FreeSync 2 | Ubuntu 20.04.2 LTS |

 

Server:-

Intel NUC running Server 2019 + Synology DSM218+ with 2 x 4TB Toshiba NAS Ready HDDs (RAID0)

Link to comment
Share on other sites

Link to post
Share on other sites

7 minutes ago, Dan Castellaneta said:

I don't really see this as a negative thing moreso clever implementation of something that already exists.

And considering that Nvidia would have had experience with it in the form of their Tegra, it really shouldn't surprise anyone.

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL

Link to comment
Share on other sites

Link to post
Share on other sites

Just now, Master Disaster said:

From what I gather it's to do with how the GPU renders the scene. Instead of rendering the entire frame it splits it into small tiles then renders each tile individually. Right? 

 

If so then I'd agree it's not a bad thing as long as it is more efficient but I'd question that fact, rendering 36 small frames compared to just 1 big one must equal the same amount of overall work done so is the efficiency gains due to multithreading and simultaneous processing? 

It's probably benefiting from multithreading in a way.

Check out my guide on how to scan cover art here!

Local asshole and 6th generation console enthusiast.

Link to comment
Share on other sites

Link to post
Share on other sites

1 minute ago, Master Disaster said:

From what I gather it's to do with how the GPU renders the scene. Instead of rendering the entire frame it splits it into small tiles then renders each tile individually. Right? 

 

If so then I'd agree it's not a bad thing as long as it is more efficient but I'd question that fact, rendering 36 small frames compared to just 1 big one must equal the same amount of overall work done so is the efficiency gains due to multithreading and simultaneous processing? 

Rasterizing is just a fancy way of saying rendering really. It renders all tiles and then displays them in 1 frame. So yeah it really is just multi-threading in a way.

Main Gaming PC - i9 10850k @ 5GHz - EVGA XC Ultra 2080ti with Heatkiller 4 - Asrock Z490 Taichi - Corsair H115i - 32GB GSkill Ripjaws V 3600 CL16 OC'd to 3733 - HX850i - Samsung NVME 256GB SSD - Samsung 3.2TB PCIe 8x Enterprise NVMe - Toshiba 3TB 7200RPM HD - Lian Li Air

 

Proxmox Server - i7 8700k @ 4.5Ghz - 32GB EVGA 3000 CL15 OC'd to 3200 - Asus Strix Z370-E Gaming - Oracle F80 800GB Enterprise SSD, LSI SAS running 3 4TB and 2 6TB (Both Raid Z0), Samsung 840Pro 120GB - Phanteks Enthoo Pro

 

Super Server - i9 7980Xe @ 4.5GHz - 64GB 3200MHz Cl16 - Asrock X299 Professional - Nvidia Telsa K20 -Sandisk 512GB Enterprise SATA SSD, 128GB Seagate SATA SSD, 1.5TB WD Green (Over 9 years of power on time) - Phanteks Enthoo Pro 2

 

Laptop - 2019 Macbook Pro 16" - i7 - 16GB - 512GB - 5500M 8GB - Thermal Pads and Graphite Tape modded

 

Smart Phones - iPhone X - 64GB, AT&T, iOS 13.3 iPhone 6 : 16gb, AT&T, iOS 12 iPhone 4 : 16gb, AT&T Go Phone, iOS 7.1.1 Jailbroken. iPhone 3G : 8gb, AT&T Go Phone, iOS 4.2.1 Jailbroken.

 

Link to comment
Share on other sites

Link to post
Share on other sites

What I want to see is some testing by people with Kepler, gcn 1.0,1.1, no 1.2, and Polaris GPUs do some testing to see how they behave in this test. If anyone here has one of those I'm curious to see your test results. Thanks!

Make sure to quote me or tag me when responding to me, or I might not know you replied! Examples:

 

Do this:

Quote

And make sure you do it by hitting the quote button at the bottom left of my post, and not the one inside the editor!

Or this:

@DocSwag

 

Buy whatever product is best for you, not what product is "best" for the market.

 

Interested in computer architecture? Still in middle or high school? P.M. me!

 

I love computer hardware and feel free to ask me anything about that (or phones). I especially like SSDs. But please do not ask me anything about Networking, programming, command line stuff, or any relatively hard software stuff. I know next to nothing about that.

 

Compooters:

Spoiler

Desktop:

Spoiler

CPU: i7 6700k, CPU Cooler: be quiet! Dark Rock Pro 3, Motherboard: MSI Z170a KRAIT GAMING, RAM: G.Skill Ripjaws 4 Series 4x4gb DDR4-2666 MHz, Storage: SanDisk SSD Plus 240gb + OCZ Vertex 180 480 GB + Western Digital Caviar Blue 1 TB 7200 RPM, Video Card: EVGA GTX 970 SSC, Case: Fractal Design Define S, Power Supply: Seasonic Focus+ Gold 650w Yay, Keyboard: Logitech G710+, Mouse: Logitech G502 Proteus Spectrum, Headphones: B&O H9i, Monitor: LG 29um67 (2560x1080 75hz freesync)

Home Server:

Spoiler

CPU: Pentium G4400, CPU Cooler: Stock, Motherboard: MSI h110l Pro Mini AC, RAM: Hyper X Fury DDR4 1x8gb 2133 MHz, Storage: PNY CS1311 120gb SSD + two Segate 4tb HDDs in RAID 1, Video Card: Does Intel Integrated Graphics count?, Case: Fractal Design Node 304, Power Supply: Seasonic 360w 80+ Gold, Keyboard+Mouse+Monitor: Does it matter?

Laptop (I use it for school):

Spoiler

Surface book 2 13" with an i7 8650u, 8gb RAM, 256 GB storage, and a GTX 1050

And if you're curious (or a stalker) I have a Just Black Pixel 2 XL 64gb

 

Link to comment
Share on other sites

Link to post
Share on other sites

Just now, Hunter259 said:

Rasterizing is just a fancy way of saying rendering really. It renders all tiles and then displays them in 1 frame. So yeah it really is just multi-threading in a way.

So that explains why it increases ROP performance... But then I'm wondering now, he stated it decreases memory utilization.... How does that work?

Make sure to quote me or tag me when responding to me, or I might not know you replied! Examples:

 

Do this:

Quote

And make sure you do it by hitting the quote button at the bottom left of my post, and not the one inside the editor!

Or this:

@DocSwag

 

Buy whatever product is best for you, not what product is "best" for the market.

 

Interested in computer architecture? Still in middle or high school? P.M. me!

 

I love computer hardware and feel free to ask me anything about that (or phones). I especially like SSDs. But please do not ask me anything about Networking, programming, command line stuff, or any relatively hard software stuff. I know next to nothing about that.

 

Compooters:

Spoiler

Desktop:

Spoiler

CPU: i7 6700k, CPU Cooler: be quiet! Dark Rock Pro 3, Motherboard: MSI Z170a KRAIT GAMING, RAM: G.Skill Ripjaws 4 Series 4x4gb DDR4-2666 MHz, Storage: SanDisk SSD Plus 240gb + OCZ Vertex 180 480 GB + Western Digital Caviar Blue 1 TB 7200 RPM, Video Card: EVGA GTX 970 SSC, Case: Fractal Design Define S, Power Supply: Seasonic Focus+ Gold 650w Yay, Keyboard: Logitech G710+, Mouse: Logitech G502 Proteus Spectrum, Headphones: B&O H9i, Monitor: LG 29um67 (2560x1080 75hz freesync)

Home Server:

Spoiler

CPU: Pentium G4400, CPU Cooler: Stock, Motherboard: MSI h110l Pro Mini AC, RAM: Hyper X Fury DDR4 1x8gb 2133 MHz, Storage: PNY CS1311 120gb SSD + two Segate 4tb HDDs in RAID 1, Video Card: Does Intel Integrated Graphics count?, Case: Fractal Design Node 304, Power Supply: Seasonic 360w 80+ Gold, Keyboard+Mouse+Monitor: Does it matter?

Laptop (I use it for school):

Spoiler

Surface book 2 13" with an i7 8650u, 8gb RAM, 256 GB storage, and a GTX 1050

And if you're curious (or a stalker) I have a Just Black Pixel 2 XL 64gb

 

Link to comment
Share on other sites

Link to post
Share on other sites

@patrickjp93Care to chime in and explain how tile rasterization is more efficient than normal? I'm struggling to understand because splitting the frame up still equals the same work done plus then it has to stitch everything together afterwards. 

Main Rig:-

Ryzen 7 3800X | Asus ROG Strix X570-F Gaming | 16GB Team Group Dark Pro 3600Mhz | Corsair MP600 1TB PCIe Gen 4 | Sapphire 5700 XT Pulse | Corsair H115i Platinum | WD Black 1TB | WD Green 4TB | EVGA SuperNOVA G3 650W | Asus TUF GT501 | Samsung C27HG70 1440p 144hz HDR FreeSync 2 | Ubuntu 20.04.2 LTS |

 

Server:-

Intel NUC running Server 2019 + Synology DSM218+ with 2 x 4TB Toshiba NAS Ready HDDs (RAID0)

Link to comment
Share on other sites

Link to post
Share on other sites

I cannot be certain about the specifics of this type of rendering by NVIDIA.  However, tile-based rendering can lead to other optimizations such as hidden surface removal, the broad term to define this.

 

What happens is if a scene is properly rendered with opaque geometry first, then alpha-blended geometry sorted back-to-front, entire objects can be discarded completely.  Fragment processing would not proceed after this as well, as per usual rasterization.

 

So if a tile is detected as completely opaque, then triangle that lies completely behind the Z/depth values in the tile would be discarded and never even processed further.

Link to comment
Share on other sites

Link to post
Share on other sites

errrrrr any one got a dumbed down version:/

 

"if nothing is impossible, try slamming a revolving door....." - unknown

my new rig bob https://uk.pcpartpicker.com/b/sGRG3C#cx710255

Kumaresh - "Judging whether something is alive by it's capability to live is one of the most idiotic arguments I've ever seen." - jan 2017

Link to comment
Share on other sites

Link to post
Share on other sites

Just now, Xorbot said:

I cannot be certain about the specifics of this type of rendering by NVIDIA.  However, tile-based rendering can lead to other optimizations such as hidden surface removal, the broad term to define this.

 

What happens is if a scene is properly rendered with opaque geometry first, then alpha-blended geometry sorted back-to-front, entire objects can be discarded completely.  Fragment processing would not proceed after this as well, as per usual rasterization.

 

So if a tile is detected as completely opaque, then triangle that lies completely behind the Z/depth values in the tile would be discarded and never even processed further.

OK, so it can ignore some stuff that isn't required to reduce overall workload. Interesting and actually makes a tonne of sense once you think about it :)

Main Rig:-

Ryzen 7 3800X | Asus ROG Strix X570-F Gaming | 16GB Team Group Dark Pro 3600Mhz | Corsair MP600 1TB PCIe Gen 4 | Sapphire 5700 XT Pulse | Corsair H115i Platinum | WD Black 1TB | WD Green 4TB | EVGA SuperNOVA G3 650W | Asus TUF GT501 | Samsung C27HG70 1440p 144hz HDR FreeSync 2 | Ubuntu 20.04.2 LTS |

 

Server:-

Intel NUC running Server 2019 + Synology DSM218+ with 2 x 4TB Toshiba NAS Ready HDDs (RAID0)

Link to comment
Share on other sites

Link to post
Share on other sites

Just now, Master Disaster said:

@patrickjp93Care to chime in and explain how tile rasterization is more efficient than normal? I'm struggling to understand because splitting the frame up still equals the same work done plus then it has to stitch everything together afterwards. 

Think of how a single threaded application and a well written multi-threaded application behaves. Multi-threaded is much faster right? This is very similar to this.

Just now, DocSwag said:

So that explains why it increases ROP performance... But then I'm wondering now, he stated it decreases memory utilization.... How does that work?

It only is working on one tiny bit of the single frame at that one moment thus only needing a smaller bit of memory to do so. I imagine by doing this it renders that one tile, drops it from memory, and then immediately reads in new memory into its place. Not an expert by I imagine thats how it works.

Main Gaming PC - i9 10850k @ 5GHz - EVGA XC Ultra 2080ti with Heatkiller 4 - Asrock Z490 Taichi - Corsair H115i - 32GB GSkill Ripjaws V 3600 CL16 OC'd to 3733 - HX850i - Samsung NVME 256GB SSD - Samsung 3.2TB PCIe 8x Enterprise NVMe - Toshiba 3TB 7200RPM HD - Lian Li Air

 

Proxmox Server - i7 8700k @ 4.5Ghz - 32GB EVGA 3000 CL15 OC'd to 3200 - Asus Strix Z370-E Gaming - Oracle F80 800GB Enterprise SSD, LSI SAS running 3 4TB and 2 6TB (Both Raid Z0), Samsung 840Pro 120GB - Phanteks Enthoo Pro

 

Super Server - i9 7980Xe @ 4.5GHz - 64GB 3200MHz Cl16 - Asrock X299 Professional - Nvidia Telsa K20 -Sandisk 512GB Enterprise SATA SSD, 128GB Seagate SATA SSD, 1.5TB WD Green (Over 9 years of power on time) - Phanteks Enthoo Pro 2

 

Laptop - 2019 Macbook Pro 16" - i7 - 16GB - 512GB - 5500M 8GB - Thermal Pads and Graphite Tape modded

 

Smart Phones - iPhone X - 64GB, AT&T, iOS 13.3 iPhone 6 : 16gb, AT&T, iOS 12 iPhone 4 : 16gb, AT&T Go Phone, iOS 7.1.1 Jailbroken. iPhone 3G : 8gb, AT&T Go Phone, iOS 4.2.1 Jailbroken.

 

Link to comment
Share on other sites

Link to post
Share on other sites

2 minutes ago, Xorbot said:

I cannot be certain about the specifics of this type of rendering by NVIDIA.  However, tile-based rendering can lead to other optimizations such as hidden surface removal, the broad term to define this.

 

What happens is if a scene is properly rendered with opaque geometry first, then alpha-blended geometry sorted back-to-front, entire objects can be discarded completely.  Fragment processing would not proceed after this as well, as per usual rasterization.

 

So if a tile is detected as completely opaque, then triangle that lies completely behind the Z/depth values in the tile would be discarded and never even processed further.

So basically it could allow for discarding triangles that can't be seen anyways, thus improving performance... Cool!

 

Looks to me like there's a lot of advantages of using tile based rasterization!

Make sure to quote me or tag me when responding to me, or I might not know you replied! Examples:

 

Do this:

Quote

And make sure you do it by hitting the quote button at the bottom left of my post, and not the one inside the editor!

Or this:

@DocSwag

 

Buy whatever product is best for you, not what product is "best" for the market.

 

Interested in computer architecture? Still in middle or high school? P.M. me!

 

I love computer hardware and feel free to ask me anything about that (or phones). I especially like SSDs. But please do not ask me anything about Networking, programming, command line stuff, or any relatively hard software stuff. I know next to nothing about that.

 

Compooters:

Spoiler

Desktop:

Spoiler

CPU: i7 6700k, CPU Cooler: be quiet! Dark Rock Pro 3, Motherboard: MSI Z170a KRAIT GAMING, RAM: G.Skill Ripjaws 4 Series 4x4gb DDR4-2666 MHz, Storage: SanDisk SSD Plus 240gb + OCZ Vertex 180 480 GB + Western Digital Caviar Blue 1 TB 7200 RPM, Video Card: EVGA GTX 970 SSC, Case: Fractal Design Define S, Power Supply: Seasonic Focus+ Gold 650w Yay, Keyboard: Logitech G710+, Mouse: Logitech G502 Proteus Spectrum, Headphones: B&O H9i, Monitor: LG 29um67 (2560x1080 75hz freesync)

Home Server:

Spoiler

CPU: Pentium G4400, CPU Cooler: Stock, Motherboard: MSI h110l Pro Mini AC, RAM: Hyper X Fury DDR4 1x8gb 2133 MHz, Storage: PNY CS1311 120gb SSD + two Segate 4tb HDDs in RAID 1, Video Card: Does Intel Integrated Graphics count?, Case: Fractal Design Node 304, Power Supply: Seasonic 360w 80+ Gold, Keyboard+Mouse+Monitor: Does it matter?

Laptop (I use it for school):

Spoiler

Surface book 2 13" with an i7 8650u, 8gb RAM, 256 GB storage, and a GTX 1050

And if you're curious (or a stalker) I have a Just Black Pixel 2 XL 64gb

 

Link to comment
Share on other sites

Link to post
Share on other sites

Just now, Master Disaster said:

OK, so it can ignore some stuff that isn't required to reduce overall workload. Interesting and actually makes a tonne of sense once you think about it :)

That's why the tiles are small like 32x32 on PSVita, and in this NVIDIA test.. not sure what the smallest tile components' size are.  But they are indeed small compared to the main framebuffer.  Yes, I am a game programmer.

Link to comment
Share on other sites

Link to post
Share on other sites

Great video from David Kanter.

 

Could this potentially have a small effect on the image quality rendered? (doubt it would be visible to a blind eye).

We haven't seen a real image comparison on GPUs for a long time.

 

14 minutes ago, DocSwag said:

So basically it could allow for discarding triangles that can't be seen anyways, thus improving performance... Cool!

 

Looks to me like there's a lot of advantages of using tile based rasterization!

Color me surprised, but I guess it was those same triangles Nvidia was pushing with gameworks.

Please avoid feeding the argumentative narcissistic academic monkey.

"the last 20 percent – going from demo to production-worthy algorithm – is both hard and is time-consuming. The last 20 percent is what separates the men from the boys" - Mobileye CEO

Link to comment
Share on other sites

Link to post
Share on other sites

28 minutes ago, Hunter259 said:

Think of how a single threaded application and a well written multi-threaded application behaves. Multi-threaded is much faster right? This is very similar to this.

Graphics rendering is already a massively parallel operation, every triangle, every shadow every pixel is a parallel render, thats why GPU performance scales with core count unlike CPU perf

Intel i5-3570K/ Gigabyte GTX 1080/ Asus PA248Q/ Sony MDR-7506/MSI Z77A-G45/ NHD-14/Samsung 840 EVO 256GB+ Seagate Barracuda 3TB/ 16GB HyperX Blue 1600MHZ/  750w PSU/ Corsiar Carbide 500R

 

Link to comment
Share on other sites

Link to post
Share on other sites

Just now, Sharkyx1 said:

Graphics rendering is already a massively parallel operation, every triangle, every shadow every pixel is a parallel render, thats why GPU performance scales with core count unlike CPU perf

Yes I understand that. This is still just making it even more parallel.

Main Gaming PC - i9 10850k @ 5GHz - EVGA XC Ultra 2080ti with Heatkiller 4 - Asrock Z490 Taichi - Corsair H115i - 32GB GSkill Ripjaws V 3600 CL16 OC'd to 3733 - HX850i - Samsung NVME 256GB SSD - Samsung 3.2TB PCIe 8x Enterprise NVMe - Toshiba 3TB 7200RPM HD - Lian Li Air

 

Proxmox Server - i7 8700k @ 4.5Ghz - 32GB EVGA 3000 CL15 OC'd to 3200 - Asus Strix Z370-E Gaming - Oracle F80 800GB Enterprise SSD, LSI SAS running 3 4TB and 2 6TB (Both Raid Z0), Samsung 840Pro 120GB - Phanteks Enthoo Pro

 

Super Server - i9 7980Xe @ 4.5GHz - 64GB 3200MHz Cl16 - Asrock X299 Professional - Nvidia Telsa K20 -Sandisk 512GB Enterprise SATA SSD, 128GB Seagate SATA SSD, 1.5TB WD Green (Over 9 years of power on time) - Phanteks Enthoo Pro 2

 

Laptop - 2019 Macbook Pro 16" - i7 - 16GB - 512GB - 5500M 8GB - Thermal Pads and Graphite Tape modded

 

Smart Phones - iPhone X - 64GB, AT&T, iOS 13.3 iPhone 6 : 16gb, AT&T, iOS 12 iPhone 4 : 16gb, AT&T Go Phone, iOS 7.1.1 Jailbroken. iPhone 3G : 8gb, AT&T Go Phone, iOS 4.2.1 Jailbroken.

 

Link to comment
Share on other sites

Link to post
Share on other sites

16 minutes ago, Tomsen said:

Great video from David Kanter.

 

Could this potentially have a small effect on the image quality rendered? (doubt it would be visible to a blind eye).

We haven't seen a real image comparison on GPUs for a long time.

 

Color me surprised, but I guess it was those same triangles Nvidia was pushing with gameworks.

And probably why AMD doesn't use it....

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL

Link to comment
Share on other sites

Link to post
Share on other sites

43 minutes ago, Master Disaster said:

@patrickjp93Care to chime in and explain how tile rasterization is more efficient than normal? I'm struggling to understand because splitting the frame up still equals the same work done plus then it has to stitch everything together afterwards. 

Deduplication of lighting work. Stamp the same lighting signature across the similar tiles and then tweak as needed. It's the same with tiled texturing.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd

Link to comment
Share on other sites

Link to post
Share on other sites

19 minutes ago, patrickjp93 said:

Deduplication of lighting work. Stamp the same lighting signature across the similar tiles and then tweak as needed. It's the same with tiled texturing.

THat would explain why Nvidia GPUs get different lightning signatures in Ashes of the Singularity then AMD based cards. Their "tweaking" simply doesnt work and or doesnt fit with the way the game is built.

Link to comment
Share on other sites

Link to post
Share on other sites

25 minutes ago, Dabombinable said:

And probably why AMD doesn't use it....

Ashes of the singularity showed some peculiarities. At first people blamed it on Async compute not working properly. But if it is based on tweaking Lightning, as @patrickjp93 claims, then that would explain why Pascal GPUs consistently make a different lightning signature then Radeon GPUs in that game.

Link to comment
Share on other sites

Link to post
Share on other sites

1 hour ago, DocSwag said:

So basically it could allow for discarding triangles that can't be seen anyways, thus improving performance... Cool!

 

Looks to me like there's a lot of advantages of using tile based rasterization!

AMD introduced a similar system with Polaris. It is one of the Rasterization optimizations that came with GCN4

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×