
AMD's Hawaii Is Officially The Most Efficient GPGPU In The World To Date, Tops Green500 List

he will never do it. That would mean he would have to have some actual knowledge of the matter, which he doesn't. It would also mean he admits he's wrong, which he doesn't do. It would also mean him supporting his argument, which he doesn't ever do.

I have more than you. I have always admitted when I am wrong, but it's rare that I'm wrong. I also always support my argument, but thanks for the slander. It shows how mature you really are.

 

I provided a source on a currently used algorithm that no one on the OpenCL side has been able to reproduce with equal performance to this day without relying on more cores. So, it's perfectly current. Just because OPCode doesn't know how to properly evaluate a source doesn't mean I'm wrong.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


I sense someone's credibility is starting to run on empty. If you don't want to at least compile an example utilizing CUDA, then you're barking bigger words than you're capable of. Like I said, I will compile the OpenCL equivalent; just throw me your source code. Of course, I won't be able to benchmark on my machine as I run an ancient GPU. Though finding someone to run it on modern hardware shouldn't be an issue here (it's LTT; there's plenty of hardware around).

he won't... :P I am really amazed by how much patience you have. Also, I can test CUDA on several systems, and I can compare it to OpenCL as well

"Unofficially Official" Leading Scientific Research and Development Officer of the Official Star Citizen LTT Conglomerate | Reaper Squad, Idris Captain | 1x Aurora LN


Game developer, AI researcher, Developing the UOLTT mobile apps


G SIX [My Mac Pro G5 CaseMod Thread]


I sense someone's credibility is starting to run on empty. If you don't want to at least compile an example utilizing CUDA, then you're barking bigger words than you're capable of. Like I said, I will compile the OpenCL equivalent; just throw me your source code. Of course, I won't be able to benchmark on my machine as I run an ancient GPU. Though finding someone to run it on modern hardware shouldn't be an issue here (it's LTT; there's plenty of hardware around).

Thank you for misquoting me. Your credibility just burst into flames.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Everyone missed the giant caveat in this report. They only used OpenCL. Run equivalent CUDA code and Nvidia wins by a wide margin almost universally.

AND???

 

I mean, I would not care if they used BASIC on those computers, as long as it runs and can crunch massive amounts of data efficiently. Yes, I am 100% sure that we can find a language that runs badly on AMD, but hey, they found one that works well for them and computes well.

 

 

and Volta will be the next gen Nvidia GPGPUs, whereas AMD will have an answer for that as well

 

geez, man, you should take your head out of your own ass


AND???

 

I mean, I would not care if they used BASIC on those computers, as long as it runs and can crunch massive amounts of data efficiently. Yes, I am 100% sure that we can find a language that runs badly on AMD, but hey, they found one that works well for them and computes well.

 

 

and Volta will be the next gen Nvidia GPGPUs, whereas AMD will have an answer for that as well

 

geez, man, you should take your head out of your own ass

My point is the Nvidia chips were crippled. It's well known Nvidia doesn't support OpenCL to the degree AMD does.

 

Also, no, Volta is gone from the Nvidia roadmaps and has been gone for almost a year. Pascal is next.

 

http://www.anandtech.com/show/7900/nvidia-updates-gpu-roadmap-unveils-pascal-architecture-for-2016

 

The first roadmap is the old one. The second is the current one. 

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


I have more than you. I have always admitted when I am wrong, but it's rare that I'm wrong. I also always support my argument, but thanks for the slander. It shows how mature you really are.

 

I provided a source on a currently used algorithm that no one on the OpenCL side has been able to reproduce with equal performance to this day without relying on more cores. So, it's perfectly current. Just because OPCode doesn't know how to properly evaluate a source doesn't mean I'm wrong.

Honestly, you're wrong more often than not. Your past few replies in this thread have been wrong on certain points as well.

 

I don't like relying on other people's sources, not because they're necessarily wrong, but because more often than not they are. The reason is I like to compile my own examples so I can be the direct source of the results. Like I said, let's make a couple of OpenCL and CUDA examples for the community, sound good?

 

Thank you for misquoting me. Your credibility just burst into flames.

You got quote ninja'd before your edit.  ;)


Honestly, you're wrong more often than not. Your past few replies in this thread have been wrong on certain points as well.

 

I don't like relying on other people's sources, not because they're necessarily wrong, but because more often than not they are. The reason is I like to compile my own examples so I can be the direct source of the results. Like I said, let's make a couple of OpenCL and CUDA examples for the community, sound good?

 

You got quote ninja'd before your edit.  ;)

Anything we could make in a day would be so trivial it wouldn't matter. Can we agree on that much? Second, you can produce the algorithm from the pseudo-code provided in the source and improve upon it. It's an industry standard algorithm for quantum simulations, so it's perfectly current and perfectly relevant.

 

Also, I didn't edit.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


My point is the Nvidia chips were crippled. It's well known Nvidia doesn't support OpenCL to the degree AMD does.

 

Also, no, Volta is gone from the Nvidia roadmaps and has been gone for almost a year. Pascal is next.

 

Oh well. These people used OpenCL and, well, it works for them?

I mean, we are talking about a multi-million-dollar project, where the computer will be used for hardcore computing... they might know some stuff about the hardware they chose and the programming language, right?

 

kay, good to know


Oh well. These people used OpenCL and, well, it works for them?

I mean, we are talking about a multi-million-dollar project, where the computer will be used for hardcore computing... they might know some stuff about the hardware they chose and the programming language, right?

 

kay, good to know

They used OpenCL on Nvidia supercomputers. That's like compiling with Intel's compiler and running the code on an AMD chip. It's automatically crippled! Get that through your skull! It's not an objective test when you tie 22% of the computing resources' hands.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Anything we could make in a day would be so trivial it wouldn't matter. Can we agree on that much? Second, you can produce the algorithm from the pseudo-code provided in the source and improve upon it. It's an industry standard algorithm for quantum simulations, so it's perfectly current and perfectly relevant.

 

Also, I didn't edit.

idk, a day is plenty to create quite a complicated algorithm, but ofc, not when you're 14 ;)

"Unofficially Official" Leading Scientific Research and Development Officer of the Official Star Citizen LTT Conglomerate | Reaper Squad, Idris Captain | 1x Aurora LN


Game developer, AI researcher, Developing the UOLTT mobile apps


G SIX [My Mac Pro G5 CaseMod Thread]


They used OpenCL on Nvidia supercomputers. That's like compiling with Intel's compiler and running the code on an AMD chip. It's automatically crippled! Get that through your skull! It's not an objective test when you tie 22% of the computing resources' hands.

okay, let me rephrase my sentence

 

clever guys, many of them said, "We need an OpenCL machine!"

 

they looked around and saw that AMD is awesome. 

 

Then there is this list, which for us is informative only insofar as it lets us make pointless arguments on the internet, whereas if you want to use OpenCL you go to AMD, and if CUDA, then Nvidia.


Honestly, you're wrong more often than not. Your past few replies in this thread have been wrong on certain points as well.

 

I don't like relying on other people's sources, not because they're necessarily wrong, but because more often than not they are. The reason is I like to compile my own examples so I can be the direct source of the results. Like I said, let's make a couple of OpenCL and CUDA examples for the community, sound good?

 

You got quote ninja'd before your edit.  ;)

I would test OpenCL on my R9 290X

if you want to annoy me, then join my teamspeak server ts.benja.cc


idk, a day is plenty to create quite a complicated algorithm, but ofc, not when youre 14 ;)

I'm 21, and one algorithm does not a system make, but you wouldn't know anything about system development since your repertoire extends only as far as game programming. Ever wonder why the top-paid computer scientists don't work for game companies? I'll give you a hint: it's beneath their worth and pay grade.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


My point is the Nvidia chips were crippled. It's well known Nvidia doesn't support OpenCL to the degree AMD does.

 

Also, no, Volta is gone from the Nvidia roadmaps and has been gone for almost a year. Pascal is next.

 

http://www.anandtech.com/show/7900/nvidia-updates-gpu-roadmap-unveils-pascal-architecture-for-2016

 

The first roadmap is the old one. The second is the current one. 

Like AMD cards aren't crippled either? Keep in mind AMD cards rely solely on PCIe bandwidth for intercommunication. Nvidia came up with NVLink to resolve this issue for their Pascal cards.

 

Like I said already a few posts up, AMD will come up with their own implementation as well to get around the PCIe bandwidth limitation.

 

Anything we could make in a day would be so trivial it wouldn't matter. Can we agree on that much? Second, you can produce the algorithm from the pseudo-code provided in the source and improve upon it. It's an industry standard algorithm for quantum simulations, so it's perfectly current and perfectly relevant.

 

Also, I didn't edit.

Being trivial is the point. I don't really want to spend more than an hour or two creating examples. I'll leave it up to you to create whatever you wish (it will be easier for me to replicate it).

 

Like I said I don't care about (old) sources if I can be the primary source (now). And you did edit.  :P

 

They used OpenCL on Nvidia supercomputers. That's like compiling with Intel's compiler and running the code on an AMD chip. It's automatically crippled! Get that through your skull! It's not an objective test when you tie 22% of the computing resources' hands.

That statement has to be said out of plain ignorance. Parallel computing is nothing like serial computing. Your compiler argument is invalid.

 

(both CUDA and OpenCL rely on compute kernels written in C)

 

:D


okay, let me rephrase my sentence

 

clever guys, many of them said, "We need an OpenCL machine!"

 

they looked around and saw that AMD is awesome. 

 

Then there is this list, which for us is informative only insofar as it lets us make pointless arguments on the internet, whereas if you want to use OpenCL you go to AMD, and if CUDA, then Nvidia.

The only people who say they need OpenCL at this point use a Xeon Phi, which is the king of massively parallel accelerators for scientific computing. AMD has better theoretical FLOPS numbers than Nvidia, but the gap between theoretical and actual performance for AMD is huge compared to what it is for Nvidia.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Like AMD cards aren't crippled either? Keep in mind AMD cards rely solely on PCIe bandwidth for intercommunication. Nvidia came up with NVLink to resolve this issue for their Pascal cards.

 

Like I said already a few posts up, AMD will come up with their own implementation as well to get around the PCIe bandwidth limitation.

 

Being trivial is the point. I don't really want to spend more than an hour or two creating examples. I'll leave it up to you to create whatever you wish (it will be easier for me to replicate it).

 

Like I said I don't care about (old) sources if I can be the primary source (now). And you did edit.  :P

 

That statement has to be said out of plain ignorance. Parallel computing is nothing like serial computing. Your compiler argument is invalid.

I thought you people said PCIe 4 was pointless because 3.0 was nowhere near saturated :rolleyes: Also, Nvidia hasn't yet implemented NVLink, so it's pointless to mention it here. If you mean the SLI bridge, you can't really use it in server racks anyway, so your argument is moot.

 

Current source as it's the industry standard, but keep running away asking for everyone else to do the work...

 

You completely missed the point. If you run crippled code, the result is thusly crippled. It's a perfect analogue for running OpenCL on an Nvidia GPU.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


I'm 21, and one algorithm does not a system make, but you wouldn't know anything about system development since your repertoire extends only as far as game programming. Ever wonder why the top-paid computer scientists don't work for game companies? I'll give you a hint: it's beneath their worth and pay grade.

I don't care; you act <15.

 

I never said I know much about scientific compute from a programming point of view, but I know enough to tell you that you're full of shit.

You want something trivial so it's easier to replicate on different arches without problems. I can tell you that a quick Monte Carlo pi algorithm is much easier to implement in exactly the same way across multiple languages than going full-on systems programming. But you should've known that if your claims are true. But oh my, that means you contradict yourself. Just shut up, please. You embarrass me, OPCode, anyone who tries to correct you, and most importantly, yourself.

 

And no, my knowledge goes far beyond game programming; it's just that I like game programming most of all kinds of programming. But I'm not a programmer professionally; I'm a physics advisor and artist. That doesn't matter, though: I have knowledge of the stuff you're saying, and you're wrong and contradictory.

"Unofficially Official" Leading Scientific Research and Development Officer of the Official Star Citizen LTT Conglomerate | Reaper Squad, Idris Captain | 1x Aurora LN


Game developer, AI researcher, Developing the UOLTT mobile apps


G SIX [My Mac Pro G5 CaseMod Thread]


This is totally fake; everyone knows only Nvidia can make good GPUs.

Sorry mate! I only buy Matrox

5950X | NH D15S | 64GB 3200Mhz | RTX 3090 | ASUS PG348Q+MG278Q

 


I thought you people said PCIe 4 was pointless because 3.0 was nowhere near saturated :rolleyes: Also, Nvidia hasn't yet implemented NVLink, so it's pointless to mention it here. If you mean the SLI bridge, you can't really use it in server racks anyway, so your argument is moot.

 

Current source as it's the industry standard, but keep running away asking for everyone else to do the work...

 

You completely missed the point. If you run crippled code, the result is thusly crippled. It's a perfect analogue for running OpenCL on an Nvidia GPU.

In gaming, PCIe bandwidth is perfectly fine. For intercommunication between GPUs in compute-heavy workloads it can be a problem (something that's rarely brought up on LTT, I guess).

 

I'm asking you to create the example to see if you can live up to your own accusations. I can easily replicate anything you make, regardless of whether it's outside of this scope or not.

 

The point is, when it comes to CUDA or OpenCL, they both run C on top of a compute kernel:

__kernel void square(__global float* input,
                     __global float* output,
                     const unsigned int count)
{
    int i = get_global_id(0);
    if (i < count)
        output[i] = input[i] * input[i];
}
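For comparison, a CUDA rendering of that same kernel is nearly identical; only the qualifiers and the global-index computation change. A sketch only (assuming a one-dimensional launch; not benchmarked here):

```cuda
__global__ void square(const float* input, float* output, unsigned int count)
{
    /* Grid-wide thread index replaces OpenCL's get_global_id(0). */
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        output[i] = input[i] * input[i];
}
```

Which is the point being argued over: at this level of triviality the two platforms are indistinguishable, and any performance gap has to come from the host runtime, driver, and harder workloads.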

I don't care; you act <15.

 

I never said I know much about scientific compute from a programming point of view, but I know enough to tell you that you're full of shit.

You want something trivial so it's easier to replicate on different arches without problems. I can tell you that a quick Monte Carlo pi algorithm is much easier to implement in exactly the same way across multiple languages than going full-on systems programming. But you should've known that if your claims are true. But oh my, that means you contradict yourself. Just shut up, please. You embarrass me, OPCode, anyone who tries to correct you, and most importantly, yourself.

 

And no, my knowledge goes far beyond game programming; it's just that I like game programming most of all kinds of programming. But I'm not a programmer professionally; I'm a physics advisor and artist. That doesn't matter, though: I have knowledge of the stuff you're saying, and you're wrong and contradictory.

Nope. The more trivial it is, the easier it is for any processor to handle, the easier it is for a compiler to generate equivalent output regardless of the starting language (the point of this demonstration being to prove language matters for a real application, not the kiddy pool), and the easier it is to find edge-case optimizations for small workloads (cheats which break when using larger ones, for instance).

 

Monte Carlo is the kiddy pool! Jeez, it's harder to run edge detection on an image than it is to run a Monte Carlo pi simulation. Grow up, LukaP. That problem doesn't even leverage half the specialized circuitry on a GPU.

 

I haven't contradicted myself at all. You're just stamping your feet out of bias.

 

You say that, but you prove otherwise regarding your knowledge.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Nope. The more trivial it is, the easier it is for any processor to handle, the easier it is for a compiler to generate equivalent output regardless of the starting language (the point of this demonstration being to prove language matters for a real application, not the kiddy pool), and the easier it is to find edge-case optimizations for small workloads (cheats which break when using larger ones, for instance).

 

Monte Carlo is the kiddy pool! Jeez, it's harder to run edge detection on an image than it is to run a Monte Carlo pi simulation. Grow up, LukaP. That problem doesn't even leverage half the specialized circuitry on a GPU.

 

I haven't contradicted myself at all. You're just stamping your feet out of bias.

 

You say that, but you prove otherwise regarding your knowledge.

I'm out... nothing you have said here has any ground (apart from Monte Carlo being child's play, but that's what I was saying). You spit bullshit, you don't back it with concrete info, and when you're challenged to prove it, you avoid it... you're the definition of a troll...

 


"Unofficially Official" Leading Scientific Research and Development Officer of the Official Star Citizen LTT Conglomerate | Reaper Squad, Idris Captain | 1x Aurora LN


Game developer, AI researcher, Developing the UOLTT mobile apps


G SIX [My Mac Pro G5 CaseMod Thread]


FOR GAMING!!! WE'RE TALKING ABOUT COMPUTE HERE NOW!!! you really are stupid, aren't you?

 

NVLink is coming with Pascal and Volta; in fact, Nvidia's next announced compute cards are to use it....

 

he's not running away; you are avoiding doing something that you know will burst your bullshit

 

you miss the point of a benchmark: it has to be the same over all cases. You don't run Cinebench 2003 on some systems and R15 on others because one may show a better number. You decide on a standard and use it.

Compute doesn't even saturate PCIe 3.0 in most cases. You have to be doing peta-scale distributed computing to do that.

 

Yes, I know that. It's just pointless for him to mention it here.

 

Yes, he's running away and depending on me to make him a trivial example that will run in pretty much exactly the same time in either language because it is trivial. FFTs are the same in OpenCL and CUDA except in 3D, where CUDA pulls ahead, but that's not a hard enough problem to matter!

 

Benchmarks which don't take advantage of all available resources should not be used. PERIOD!

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


 

In gaming, PCIe bandwidth is perfectly fine. For intercommunication between GPUs in compute-heavy workloads it can be a problem (something that's rarely brought up on LTT, I guess).

 

I'm asking you to create the example to see if you can live up to your own accusations. I can easily replicate anything you make, regardless of whether it's outside of this scope or not.

 

The point is, when it comes to CUDA or OpenCL, they both run C on top of a compute kernel.

-snip-

 

The C should only be used for boilerplate code and queuing I/O in this case, so why even bring that up?

 

If you want an example that will demonstrate the differences, it will take more than a day, as I have final exams to study for and student assignments to grade. If you want me to do the legwork, then this needs to be deferred, preferably until the middle of December.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


The C should only be used for boilerplate code and queuing I/O in this case, so why even bring that up?

 

If you want an example that will demonstrate the differences, it will take more than a day, as I have final exams to study for and student assignments to grade. If you want me to do the legwork, then this needs to be deferred, preferably until the middle of December.

It should only take 30 minutes tops to write something up real quick. You spend more time on here than that every day.  ;)


It should only take 30 minutes tops to write something up real quick. You spend more time on here than that every day.  ;)

I type one-handed while grading. Furthermore, anything that can be written in 30 minutes is too trivial. Both companies have circuitry to handle vectors, FFTs, and a number of trivial operations in near-constant time up to a certain size. It's how all of these can be used in concert under OpenCL and CUDA that is the big differentiator, as I've said from the beginning.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd

