What if Pascal Quadro cards run RTX?

Jurrunio
Solved by Mira Yurizaki
Spoiler

2019-03-18_23-15-47.png

Nvidia claims that Pascal GeForce is much worse than Turing GeForce at ray tracing because it can only brute-force it with single-precision ALUs, while Turing has extra hardware doing the heavy work. But what if a Pascal Quadro is used? Quadro (and Tesla) cards are significantly faster than GeForce in floating-point ops, so what if a Quadro can put that to use when brute-forcing ray tracing? I'm not expecting it to beat Turing tbh, but at least it shouldn't be as bad as Pascal GeForce cards, right?

CPU: i7-2600K 4751MHz 1.44V (software) --> 1.47V at the back of the socket Motherboard: Asrock Z77 Extreme4 (BCLK: 103.3MHz) CPU Cooler: Noctua NH-D15 RAM: Adata XPG 2x8GB DDR3 (XMP: 2133MHz 10-11-11-30 CR2, custom: 2203MHz 10-11-10-26 CR1 tRFC:230 tREFI:14000) GPU: Asus GTX 1070 Dual (Super Jetstream vbios, +70(2025-2088MHz)/+400(8.8Gbps)) SSD: Samsung 840 Pro 256GB (main boot drive), Transcend SSD370 128GB PSU: Seasonic X-660 80+ Gold Case: Antec P110 Silent, 5 intakes 1 exhaust Monitor: AOC G2460PF 1080p 144Hz (150Hz max w/ DP, 121Hz max w/ HDMI) TN panel Keyboard: Logitech G610 Orion (Cherry MX Blue) with SteelSeries Apex M260 keycaps Mouse: BenQ Zowie FK1

 

Model: HP Omen 17 17-an110ca CPU: i7-8750H (0.125V core & cache, 50mV SA undervolt) GPU: GTX 1060 6GB Mobile (+80/+450, 1650MHz~1750MHz 0.78V~0.85V) RAM: 8+8GB DDR4-2400 18-17-17-39 2T Storage: HP EX920 1TB PCIe x4 M.2 SSD + Crucial MX500 1TB 2.5" SATA SSD, 128GB Toshiba PCIe x2 M.2 SSD (KBG30ZMV128G) gone cooking externally, 1TB Seagate 7200RPM 2.5" HDD (ST1000LM049-2GH172) left outside Monitor: 1080p 126Hz IPS G-sync

 

Desktop benching:

Cinebench R15 Single thread:168 Multi-thread: 833 

SuperPi (v1.5 from Techpowerup, PI value output) 16K: 0.100s 1M: 8.255s 32M: 7m 45.93s


21 minutes ago, Jurrunio said:

Nvidia claims that Pascal GeForce is much worse than Turing GeForce at ray tracing because it can only brute-force it with single-precision ALUs, while Turing has extra hardware doing the heavy work. But what if a Pascal Quadro is used? Quadro (and Tesla) cards are significantly faster than GeForce in floating-point ops, so what if a Quadro can put that to use when brute-forcing ray tracing? I'm not expecting it to beat Turing tbh, but at least it shouldn't be as bad as Pascal GeForce cards, right?

Quadros aren't any different from GeForce GPUs, save for FP64 performance in the higher-end models. That doesn't really matter here anyway, since it's likely FP64 isn't being used.


19 minutes ago, Jurrunio said:

Nvidia claims that Pascal GeForce is much worse than Turing GeForce at ray tracing because it can only brute-force it with single-precision ALUs, while Turing has extra hardware doing the heavy work. But what if a Pascal Quadro is used? Quadro (and Tesla) cards are significantly faster than GeForce in floating-point ops, so what if a Quadro can put that to use when brute-forcing ray tracing? I'm not expecting it to beat Turing tbh, but at least it shouldn't be as bad as Pascal GeForce cards, right?

Pascal cards still lack RT cores and tensor cores, and since the Pascal Quadro cards use the same basic GPUs, just with more VRAM and support for a few more compute workloads, I would expect them to perform much like their mainstream brethren.

In search of the future, new tech, and exploring the universe! All under the cover of anonymity!


@Mira Yurizaki @Wh0_Am_1 I know GeForce and Quadro are the same when they come out of the fab, and it's the software that makes them different in number crunching. What if the same "optimization" could hack more ray tracing performance out of them?


21 minutes ago, Jurrunio said:

@Mira Yurizaki @Wh0_Am_1 I know GeForce and Quadro are the same when they come out of the fab, and it's the software that makes them different in number crunching. What if the same "optimization" could hack more ray tracing performance out of them?

The optimizations that Quadros have over GeForce cards are typically for rasterized graphics. The thing with ray tracing is that it's a very simple algorithm. This code:

#include <stdlib.h>   // card > aek.ppm
#include <stdio.h>
#include <math.h>
typedef int i;
typedef float f;
struct v {
    f x,y,z;
    v operator+(v r) {
        return v(x+r.x
                 ,y+r.y,z+r.z);
    } v operator*(f r) {
        return
            v(x*r,y*r,z*r);
    } f operator%(v r) {
        return
            x*r.x+y*r.y+z*r.z;
    } v() {} v operator^(v r
                        ) {
        return v(y*r.z-z*r.y,z*r.x-x*r.z,x*r.
                 y-y*r.x);
    } v(f a,f b,f c) {
        x=a;
        y=b;
        z=c;
    } v
    operator!() {
        return*this*(1/sqrt(*this%*
                            this));
    }
};
// Bit pattern of the sphere layout: one int per row, one bit per sphere
i G[]= {247570,280596,280600,
        249748,18578,18577,231184,16,16
       };
f R() {
    return(f)rand()/RAND_MAX;
}
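// Intersect the ray (origin o, direction d) with the scene.
// Returns 0 for sky, 1 for the floor, 2 for a sphere; sets distance t and normal n.
i T(v o,v d,f
    &t,v&n) {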
    t=1e9;
    i m=0;
    f p=-o.z/d.z;
    if(.01
            <p)t=p,n=v(0,0,1),m=1;
    for(i k=19; k--;)
        for(i j=9; j--;)if(G[j]&1<<k) {
                v p=o+v(-k
                        ,0,-j-4);
                f b=p%d,c=p%p-1,q=b*b-c;
                if(q>0
                  ) {
                    f s=-b-sqrt(q);
                    if(s<t&&s>.01)t=s,n=!(
                                                p+d*t),m=2;
                }
            }
    return m;
}
// Shade the ray (o,d): returns the colour seen along it, recursing on sphere reflections
v S(v o,v d) {
    f t
    ;
    v n;
    i m=T(o,d,t,n);
    if(!m)return v(.7,
                       .6,1)*pow(1-d.z,4);
    v h=o+d*t,l=!(v(9+R(
                    ),9+R(),16)+h*-1),r=d+n*(n%d*-2);
    f b=l%
        n;
    if(b<0||T(h,l,t,n))b=0;
    f p=pow(l%r*(b
                 >0),99);
    if(m&1) {
        h=h*.2;
        return((i)(ceil(
                       h.x)+ceil(h.y))&1?v(3,1,1):v(3,3,3))*(b
                               *.2+.1);
    }
    return v(p,p,p)+S(h,r)*.5;
} i
main() {
    printf("P6 512 512 255 ");
    v g=!v
        (-6,-16,0),a=!(v(0,0,1)^g)*.002,b=!(g^a
                                           )*.002,c=(a+b)*-256+g;
    for(i y=512; y--;)
        for(i x=512; x--;) {
            v p(13,13,13);
            // 64 randomly jittered rays per pixel, accumulated into p
            for(i r
                    =64; r--;) {
                v t=a*(R()-.5)*99+b*(R()-.5)*
                    99;
                p=S(v(17,16,8)+t,!(t*-1+(a*(R()+x)+b
                                         *(y+R())+c)*16))*3.5+p;
            }
            printf("%c%c%c"
                   ,(i)p.x,(i)p.y,(i)p.z);
        }
}

Produces this image:

minray.png

 

The problem with ray tracing is simply how many data points you have to sample. The only way to "optimize" ray tracing is to dump data points, i.e. trace fewer rays per pixel.

 

EDIT: Code and image came from https://fabiensanglard.net/rayTracing_back_of_business_card/

Edited by Mira Yurizaki
Prettified the C++ code

19 minutes ago, Mira Yurizaki said:

The only way to "optimize" ray tracing is to dump data points.

So that's why even Turing cards need a denoiser for the results, and in the process cause artifacts when it's not optimized enough?


14 minutes ago, Jurrunio said:

So that's why even Turing cards need a denoiser for the results, and in the process cause artifacts when it's not optimized enough?

Yes, because it's not sampling enough rays to make a clean enough image. Let's take an image from Wikipedia as an example:

Path_tracing_sampling_values.png

 

Starting from the top left, only one ray per pixel is sampled, and the count doubles with each image after that. If you take the third-to-last image as the minimum for not seeing an appreciable amount of noise, that's around 8000 samples per pixel. The RTX 2080 Ti is capable of 10 gigarays/sec. So if you want the quality of the 8000-sample image at 60 FPS with the performance of the RTX 2080 Ti, the image resolution is capped at about 160x120 (10 billion rays / 60 frames / 8000 samples is roughly 20,000 pixels per frame). If you went with the 8th image (2nd row, 4th column), you only need 128 samples per pixel, and you could reasonably clean that up to something acceptable.

 

I don't know what really goes on under the hood, but this is the quantity of work you have to deal with.


10 hours ago, Mira Yurizaki said:

I don't know what really goes on under the hood, but this is the quantity of work you have to deal with.

So that's why older graphics techniques are all different ways of faking the physics...
