
Nvidia's confusing new lineup may finally have been explained

Master Disaster

So will there be an RTX 2080 Ti months later that's basically a Titan RTX, but at half the price?


16 minutes ago, Okjoek said:

So will there be an RTX 2080 Ti months later that's basically a Titan RTX, but at half the price?

If AdoredTV's leak holds up, in this case no. It'd be the full version of the 2080's die.


Where does that put the 2080 vs the 1080 Ti? Does anyone remember how much faster the 1080 Ti was than the 1080?


20 minutes ago, Taf the Ghost said:

If AdoredTV's leak holds up, in this case no. It'd be the full version of the 2080's die.

Wasn't the leak still saying a GV104 die? If that's the case, then the 2080 Ti will use GV102 like in the past.


@leadeater The Titan chip was "unknown", while the 2080 chip was "TU104". The reasoning behind the assumption is that the 2080 only has 23 Streaming Multiprocessors, which is only 15% more than the 1080, and an odd-numbered count at that. The required number of SMs has allegedly been reduced to let more chips pass inspection and to allow further market segmentation later on. Regardless of how plausible this all seems, these are still rumors.
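As a rough check of that 15% figure (the GTX 1080's GP104 with 20 SMs is the known baseline; the 23 SM count is only the rumored value, nothing more):

# Back-of-the-envelope check of the rumored 2080 SM count vs. the GTX 1080.
gtx_1080_sms = 20          # GP104: 20 SMs x 128 CUDA cores = 2560 cores
rumored_2080_sms = 23      # from the leak discussed above

print(f"SM increase over the 1080: {rumored_2080_sms / gtx_1080_sms - 1:.0%}")   # ~15%
print("Cores if the Pascal-style 128-core SM is kept:", rumored_2080_sms * 128)  # 2944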



26 minutes ago, MMKing said:

@leadeater The Titan chip was "unknown", while the 2080 chip was "TU104". The reasoning behind the assumption is that the 2080 only has 23 Streaming Multiprocessors, which is only 15% more than the 1080, and an odd-numbered count at that. The required number of SMs has allegedly been reduced to let more chips pass inspection and to allow further market segmentation later on. Regardless of how plausible this all seems, these are still rumors.

Yeah, I just used GV since it's not really fully known what the arch naming actually is. The 104 is the important part for that question anyway :). I don't see any reason for Nvidia to change away from what it has been doing: Gx104 at launch for the xx80 GPU and Gx102 for the Titan, then Gx102 later for the xx80 Ti.

 

The big change with Volta over Pascal is double the number of SMs with half the number of CUDA cores per SM. Pascal (GP102) was 30 SMs and Volta (GV100) is 80 SMs; if these new gaming GPUs/archs incorporate that change then the calculations need adjusting. Plus, this change allows Nvidia to create genuinely more diverse SKUs without having to do weird/creative things to achieve some pseudo separation.
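As a back-of-the-envelope illustration of that segmentation point, using the 30 SM x 128 core and 80 SM x 64 core layouts mentioned above (purely illustrative, not leaked Turing specs):

# Smaller SMs give finer-grained cut-down SKUs for a similar total shader count.
layouts = {"Pascal-style (GP102)": (30, 128), "Volta-style (GV100)": (80, 64)}

for name, (sms, cores_per_sm) in layouts.items():
    total = sms * cores_per_sm
    step = cores_per_sm / total          # fraction of shaders lost per disabled SM
    print(f"{name}: {total} cores, ~{step:.1%} of the shaders per disabled SM")
# Pascal-style: 3840 cores, ~3.3% per SM; Volta-style: 5120 cores, ~1.3% per SM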


Titan A at SIGGRAPH tomorrow, must be true

 

https://www.nvidia.com/en-us/events/siggraph/



1 hour ago, MMKing said:

@leadeater The Titan chip was "unknown", while the 2080 chip was "TU104". The reasoning behind the assumption is that the 2080 only has 23 Streaming Multiprocessors, which is only 15% more than the 1080, and an odd-numbered count at that. The required number of SMs has allegedly been reduced to let more chips pass inspection and to allow further market segmentation later on. Regardless of how plausible this all seems, these are still rumors.

The Tesla V100s actually come with disabled SMs for yield reasons, since it's such a huge die. You can get full-die ones now, but those are sold for NVLink setups.

26 minutes ago, leadeater said:

Yeah, I just used GV since it's not really fully known what the arch naming actually is. The 104 is the important part for that question anyway :). I don't see any reason for Nvidia to change away from what it has been doing: Gx104 at launch for the xx80 GPU and Gx102 for the Titan, then Gx102 later for the xx80 Ti.

 

The big change with Volta over Pascal is double the number of SMs with half the number of CUDA cores per SM. Pascal (GP102) was 30 SMs and Volta (GV100) is 80 SMs; if these new gaming GPUs/archs incorporate that change then the calculations need adjusting. Plus, this change allows Nvidia to create genuinely more diverse SKUs without having to do weird/creative things to achieve some pseudo separation.

Looking over Nvidia's actual launch schedule, they've yet to release another Volta design, which means there should be GV102 and GV104 dies out there. The question is which ones are re-spun Pascal and which have the Volta redesign. AdoredTV's leak points to GV104/TU104 being a tweaked Pascal design, which (if true) suggests we're going to see something like TU104-200 at launch, with a potentially much bigger TU104-300/400 later.

 

Nvidia did that with the Kepler/700 series, though that was the last generation before HPC started to kick off. The 12nm node isn't really much of a shrink from the 16nm base node, so the dies have to get bigger anyway. Nvidia is going to slow-roll everything in moving to the 7nm node. AMD just isn't going to compete in the xx80 space, so Nvidia is going to milk the situation.


3 minutes ago, Taf the Ghost said:

The Tesla V100s actually come with disabled SMs for yield reasons, since it's such a huge die. You can get full-die ones now, but those are sold for NVLink setups.

The Tesla V100 uses the GV100 die though, and so does the Titan V for that matter, which is a new thing for Titan cards.

 

Either way, I expect the usual: a Gx102 die being used for the xx80 Ti but not for the xx80. The other option, really only because of the Titan V, is that both the xx80 and the xx80 Ti end up using Gx102.


4 minutes ago, leadeater said:

The Tesla V100 uses the GV100 die though, and so does the Titan V for that matter, which is a new thing for Titan cards.

 

Either way, I expect the usual: a Gx102 die being used for the xx80 Ti but not for the xx80. The other option, really only because of the Titan V, is that both the xx80 and the xx80 Ti end up using Gx102.

I think the Titan will stay on GV100, and I think GV102 is unlikely to show up in the consumer space since it's going to be an HBM2 model. I think those will be the Tensor Core based designs, though maybe Nvidia will have enough GV102s that the next Titan Turing will show up next year for 1500 USD.

 

I think GV104/TU104 will be a fairly large die that Nvidia plans to exploit for a lot of SKUs. Given the node, it'll probably be ~450 mm² with generational improvements and will serve as the main upper design for the Turing generation. It'll still be from Pascal's design branch, with new ray tracing SIMDs. They'll be able to run out a 2070 Ti and a 2080 Ti off the same die. This leaves GV106/TU106 to cover the 2070 down through the 2060 versions. (This would explain the weird numbering that showed up in AdoredTV's leak.)

 

If the numbers play out, it'd be roughly a 30% performance increase per SM over Pascal with GDDR6. (I think the memory accounts for a good chunk of the large generational per-SM improvement.) From Adored's numbers, this would make TU106 a 15 SM design, while TU116 (the 2060 replacement) is a 10 SM design. TU117 is either 7 or 8 SMs. There's the entire product stack out of exactly two designs. The Samsung-based production will just be rebranded into the 2000 series.

 

TU104 is probably a 28 SM design, with 23 SMs enabled to launch the 2080. The 2070 Ti can show up as an 18-20 SM design later, and the 2080 Ti shows up as maybe 1 SM short of the full die. It just depends on yields.
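Just to make that arithmetic explicit (the SM counts and the ~30% per-SM figure are the rumored/guessed numbers above, and scaling linearly with SM count is my own simplification):

# Very rough relative-performance estimate from the rumored SM counts.
PER_SM_UPLIFT = 1.30       # rumored ~30% per-SM gain over Pascal with GDDR6
GTX_1080_SMS = 20          # GP104, used here as the Pascal reference point

rumored_sms = {
    "2080 (TU104, 23 of 28 SMs)": 23,
    "2080 Ti (TU104, ~27 SMs)": 27,
    "2070 Ti (TU104, 18-20 SMs)": 19,
    "TU106": 15,
    "TU116": 10,
    "TU117": 8,
}
for name, sms in rumored_sms.items():
    print(f"{name}: ~{sms * PER_SM_UPLIFT / GTX_1080_SMS:.2f}x a GTX 1080")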


19 hours ago, Master Disaster said:

Interesting you think so, considering WCCF have called the new cards the 2000 series from the beginning (they were right), they called Turing rather than Ampere (they were right again), and they have now called the names too. If they're right this time, they're 3 for 3.

 

Kind of odd for somewhere that apparently makes stuff up and Photoshops things just to get clicks.

Except, you know, they had an article here where they reported it would be called the GTX 1180.

Here they had an article saying it would launch in April.

Here is an article from them saying the next gen would be Ampere and not Turing.

Although in this article they said it would most likely be Turing-based and not Ampere.

 

Here is an article where they call it the "1180/2080", clearly showing that they have NOT just been saying it will be the 2080 from the beginning.

Here is one where they say it's the 1180, and that it has 16 GB of GDDR6, a 256-bit bus and 8+6-pin PCIe power, with new SLI fingers. Let's wait and see how much of that turns out to be true.

Here is another one where they say it will be the GeForce 11 series, not 20 series.

And another one. And yes, I had to double-check that I wasn't posting the same article over and over. They seem like copy/paste jobs, but they are slightly different.

 

And hey, check out this article where they even show graphs of performance for the "1180", with stuff like MSRP and launch date (specified as July 2018).

How much of that do you think will turn out to be accurate?

 

Want me to keep going?

Being able to point out 2-3 times where WCCFTech has been correct is like pointing to a broken clock and saying it was correct twice yesterday.

 

Want me to keep going? WCCFTech has probably published 30 or so articles about the 1180/2080, with probably over 100 claims being made regarding them. I can go on all day.


3 hours ago, Taf the Ghost said:

I think the Titan will stay on GV100, and I think GV102 is unlikely to show up in the consumer space since it's going to be an HBM2 model.

We don't know yet if GV102 is HBM2, or even HBM2-only. Nvidia could actually have GV102 variants that support either, if it wanted to.

 

Personally, based on what we've seen the GV100 die can do, the xx80 Ti is going to have to be Gx102, or else it's not going to be faster than the 1080 Ti; not by enough for anyone to care, anyway.

 

3 hours ago, Taf the Ghost said:

 

If the numbers play out, it'd be roughly a 30% performance increase per SM over Pascal with GDDR6. (I think the memory accounts for a good chunk of the large generational per-SM improvement.) From Adored's numbers, this would make TU106 a 15 SM design, while TU116 (the 2060 replacement) is a 10 SM design. TU117 is either 7 or 8 SMs. There's the entire product stack out of exactly two designs. The Samsung-based production will just be rebranded into the 2000 series.

 

TU104 is probably a 28 SM design, with 23 SMs enabled to launch the 2080. The 2070 Ti can show up as an 18-20 SM design later, and the 2080 Ti shows up as maybe 1 SM short of the full die. It just depends on yields.

Pretty sure we'll see the higher SM count design as seen in GV100, so basically double those SM counts. The higher SM count is very much part of the async compute and workload allocation improvements; I don't see that not being brought in across the board.

 

Pascal is 128 Streaming Processors and 8 TMUs per SM

Volta is 64 Streaming Processors and 4 TMUs per SM
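Multiplying those per-SM configs out to full dies, just for reference (GP102 and full GV100 figures; the Tesla V100 product ships with 80 of GV100's 84 SMs enabled):

# Full-die totals implied by the per-SM configs above.
dies = {
    "GP102 (Pascal)": {"sms": 30, "cores_per_sm": 128, "tmus_per_sm": 8},
    "GV100 (Volta)":  {"sms": 84, "cores_per_sm": 64,  "tmus_per_sm": 4},
}
for name, d in dies.items():
    print(name, d["sms"] * d["cores_per_sm"], "CUDA cores,",
          d["sms"] * d["tmus_per_sm"], "TMUs")
# GP102: 3840 cores, 240 TMUs; GV100: 5376 cores, 336 TMUs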


44 minutes ago, LAwLz said:

snip

Not bad finding nine articles that easily; I couldn't be bothered going to look for the articles, but it appears my estimate based on the number of threads and complaints was fairly accurate:

11 hours ago, mr moose said:

Interesting that many people think so too. It'll take a lot more than just being right 2 or 3 times out of 10 to prove to me that they actually know what they are talking about.

 

 

EDIT: So yeah, it's not 3 for 3, it's not even 2 for 3, it's actually 2 for 9 plus god knows what, making it statistically far worse than guessing. In fact, it's on par with taking a test you don't understand. And that's really sad for any news/rumor/clickbait publication.



42 minutes ago, mr moose said:

Not bad finding nine articles that easily; I couldn't be bothered going to look for the articles, but it appears my estimate based on the number of threads and complaints was fairly accurate:

 

 

EDIT: So yeah, it's not 3 for 3, it's not even 2 for 3, it's actually 2 for 9 plus god knows what, making it statistically far worse than guessing. In fact, it's on par with taking a test you don't understand. And that's really sad for any news/rumor/clickbait publication.

Well, to be fair, the cards are not out yet, so we don't know which articles will be correct and which ones will be incorrect.

Anyway, my point is that WCCFTech just pushes out a bunch of articles, many of which contradict one another, so of course they will get some things right and some things wrong.

 

Again, WCCFTech is a broken watch. You can point to it being correct twice a day, but that does not mean they are reliable. If I made two threads on this forum, one saying that the next gen will be called the 1180 and one saying it will be called the 2080, would people trust me when the next generation was coming up and people wanted predictions? I was "correct" about the previous gen, and if I constantly just linked to the thread that was correct, people might buy into it. In reality, though, I am no more accurate than guessing.


1 minute ago, LAwLz said:

Well, to be fair, the cards are not out yet, so we don't know which articles will be correct and which ones will be incorrect.

But my point was that WCCFTech just pushes out a bunch of articles, many of which contradict one another, so of course they will get some things right and some things wrong.

Again, WCCFTech is a broken watch. You can point to it being correct twice a day, but that does not mean they are reliable.

Hedging bets is worse than posting rumors and hoping for truth.



2 minutes ago, mr moose said:

Hedging bets is worse than posting rumors and hoping for truth.

Oh yes, that's the term I was looking for.

WCCFTech are most certainly hedging their bets.

 

Most rumor sites do it, because it's easy to just brag about the times you were correct and link back to those, boosting your credibility.


1 hour ago, leadeater said:

We don't know yet if GV102 is HBM2, or even HBM2-only. Nvidia could actually have GV102 variants that support either, if it wanted to.

 

Personally, based on what we've seen the GV100 die can do, the xx80 Ti is going to have to be Gx102, or else it's not going to be faster than the 1080 Ti; not by enough for anyone to care, anyway.

 

Pretty sure we'll see the higher SM count design as seen in GV100, so basically double those SM counts. The higher SM count is very much part of the async compute and workload allocation improvements; I don't see that not being brought in across the board.

 

Pascal is 128 Streaming Processors and 8 TMUs per SM

Volta is 64 Streaming Processors and 4 TMUs per SM

My thinking, if AdoredTV's leak is correct, is that the SMs don't need to be broken down if they don't have the Tensor Cores. The much smaller SM for Volta comes from needing to address the Tensor Cores properly. Without the Tensor Cores, the SMs stay the same size, plus uplift via generational improvements. Considering practically everything seems to be bandwidth-starved rather than processing-starved across the ecosystem, I'm going to guess a completely reworked memory system will be the big star.

 

This would likely end up with two Volta-based designs, plus a rework of Volta that's more gaming-focused, which is Turing. Or the gaming branch is Ampere, but they made the whole generation Turing.


27 minutes ago, Taf the Ghost said:

My thinking, if AdoredTV's leak is correct, is that the SMs don't need to be broken down if they don't have the Tensor Cores.

They did that for general compute and other non-Tensor workloads. It allows them to share GPU resources between 48 processes, up from 16 on Pascal.

 

However, looking into this I just discovered that Pascal GP100 also used 64 cores per SM, as Volta does; it seems like that doesn't flow down to the lesser dies after all.

 

Quote

Similar to Pascal GP100, the GV100 SM incorporates 64 FP32 cores and 32 FP64 cores per SM. However, the GV100 SM uses a new partitioning method to improve SM utilization and overall performance. Note that the GP100 SM is partitioned into two processing blocks, each with 32 FP32 Cores, 16 FP64 Cores, an instruction buffer, one warp scheduler, two dispatch units, and a 128 KB Register File. The GV100 SM is partitioned into four processing blocks, each with 16 FP32 Cores, 8 FP64 Cores, 16 INT32 Cores, two of the new mixed-precision Tensor Cores for deep learning matrix arithmetic, a new L0 instruction cache, one warp scheduler, one dispatch unit, and a 64 KB Register File.

 

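Multiplying out the partitioning described in that quote, as a quick sanity check:

# Both SMs carry 64 FP32 cores; GV100 just splits them across more,
# smaller processing blocks and adds INT32 and Tensor Cores per block.
gp100_sm = {"blocks": 2, "fp32_per_block": 32, "fp64_per_block": 16}
gv100_sm = {"blocks": 4, "fp32_per_block": 16, "fp64_per_block": 8, "tensor_per_block": 2}

print(gp100_sm["blocks"] * gp100_sm["fp32_per_block"])    # 64 FP32 cores per GP100 SM
print(gv100_sm["blocks"] * gv100_sm["fp32_per_block"])    # 64 FP32 cores per GV100 SM
print(gv100_sm["blocks"] * gv100_sm["fp64_per_block"])    # 32 FP64 cores per GV100 SM
print(gv100_sm["blocks"] * gv100_sm["tensor_per_block"])  # 8 Tensor Cores per GV100 SM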

 

Quote

Volta MPS provides hardware acceleration of critical components of the MPS server for improved performance and isolation, while increasing the maximum number of MPS clients from 16 on Pascal up to 48 on Volta (see Figure 25). Volta Multi-Process service is designed for sharing the GPU amongst applications from a single user and is not for multi-user or multi-tenant use cases

So it's geared towards HPC clusters that schedule and run workloads on nodes, but I'm sure those same hardware features are utilized in certain parts of game async compute.

 

Volta also gives greater flexibility in workload scheduling. This part I'll cut way short, as it's a huge part of the Volta arch document and worth reading in full.

Quote

Pascal and earlier NVIDIA GPUs execute groups of 32 threads (known as warps) in SIMT (Single Instruction, Multiple Thread) fashion. The Pascal warp uses a single program counter shared amongst all 32 threads, combined with an active mask that specifies which threads of the warp are active at any given time. This means that divergent execution paths leave some threads inactive, serializing execution for different portions of the warp as shown in Figure 20.

 

Quote

Volta transforms this picture by enabling equal concurrency between all threads, regardless of warp. It does this by maintaining execution state per thread, including a program counter and call stack, as shown in Figure 21.

 

Quote

Volta’s independent thread scheduling allows the GPU to yield execution of any thread, either to make better use of execution resources or to allow one thread to wait for data to be produced by another. To maximize parallel efficiency, Volta includes a schedule optimizer which determines how to group active threads from the same warp together into SIMT units. This retains the high throughput of SIMT execution as in prior NVIDIA GPUs, but with much more flexibility: threads can now diverge and reconverge at sub-warp granularity, while the convergence optimizer in Volta will still group together threads which are executing the same code and run them in parallel for maximum efficiency

 

Quote

Starvation-free algorithms are a key pattern enabled by independent thread scheduling. These are concurrent computing algorithms that are guaranteed to execute correctly so long as the system ensures that all threads have adequate access to a contended resource. For example, a mutex (or lock) may be used in a starvation-free algorithm if a thread attempting to acquire the mutex is guaranteed eventually to succeed. In a system that does not support starvation-freedom, one or more threads may repeatedly acquire and release a mutex while starving another thread from ever successfully acquiring the mutex.

http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf

 

 

Edit:

Also, from memory, Nvidia has already said the gaming GPUs will have Tensor Cores, for RTX.


@leadeater

 

Async compute would explain a good bit of the uplift, actually. It would also make sense that in non-HPC situations, breaking things out to an extreme number of threads has a big drop-off in performance gain. Nvidia would have a good handle on how far they can parallelize gaming tasks.

 

Nvidia has said that the Tensor Cores can be used for ray tracing tasks, but I think it's going to be something more than just Tensor Cores, even if it's a Tensor Core v2.

