Hardware acceleration for compilation?

Go to solution Solved by BobVonBob,


Hardware acceleration is seriously awesome. You have your general-purpose compute engine, but then you build a custom, optimized compute engine for one particular task, and it does that task far faster and more efficiently. If we have built hardware acceleration even for workloads like AI, where is the hardware acceleration for compiling programs?

 

I mean, with the bit of knowledge I have about how compilation works, source code is broken into tokens, the tokens are processed along with their dependencies, and then everything is translated into machine code. If this is also a workload that involves a large data set, and not necessarily a huge variety of logic, where is the hardware acceleration for it?

 

Is there a way I can compile normal C and C++ programs on my GPU rather than the CPU? Gentoo Linux temptation...

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.


It turns out a hardware accelerator for compiling would look almost exactly like a CPU. It would need to perform large, varied, sequential tasks as fast as possible, and that happens to be what CPUs are already good at.

The way we write programs makes them unfriendly to parallelism. Consider the following:

int a = 5;
int b = a + 2;
int c = a + b;

Each line changes the state of the program, and the following lines depend on that change, so how would you break this up to compile in parallel?
 

GPUs are great at performing tons of small, highly repetitive, independent computations. A lot of what we do on computers can be split up that way, but compiling is not one of those things.

¯\_(ツ)_/¯

 

 

Desktop:

Intel Core i7-11700K | Noctua NH-D15S chromax.black | ASUS ROG Strix Z590-E Gaming WiFi  | 32 GB G.SKILL TridentZ 3200 MHz | ASUS TUF Gaming RTX 3080 | 1TB Samsung 980 Pro M.2 PCIe 4.0 SSD | 2TB WD Blue M.2 SATA SSD | Seasonic Focus GX-850 Fractal Design Meshify C Windows 10 Pro

 

Laptop:

HP Omen 15 | AMD Ryzen 7 5800H | 16 GB 3200 MHz | Nvidia RTX 3060 | 1 TB WD Black PCIe 3.0 SSD | 512 GB Micron PCIe 3.0 SSD | Windows 11


1 hour ago, Gat Pelsinger said:

and not necessarily a huge variety of logic, where is the hardware acceleration for it?

I think that variety of logic is one of the limitations. Hardware acceleration leverages doing a specific operation very efficiently. For AI these are things like tensor operations, which you can clearly define as multiplications etc. If you have a large variety of logic, what would you have the hardware do, specifically?

 

GPUs also excel at doing things massively in parallel, but if A depends on B then you inherently must do B before A and can't (effectively) parallelise.

 

Crystal: CPU: i7 7700K | Motherboard: Asus ROG Strix Z270F | RAM: GSkill 16 GB@3200MHz | GPU: Nvidia GTX 1080 Ti FE | Case: Corsair Crystal 570X (black) | PSU: EVGA Supernova G2 1000W | Monitor: Asus VG248QE 24"

Laptop: Dell XPS 13 9370 | CPU: i5 10510U | RAM: 16 GB

Server: CPU: i5 4690k | RAM: 16 GB | Case: Corsair Graphite 760T White | Storage: 19 TB


7 minutes ago, tikker said:

For AI these are things like tensor operations, which you can clearly define like multiplications etc.

Wasn't AI mostly matrix multiplication, or whatever it's called... something math-degree stuff I never studied xD

 

But yes, essentially, compilation on a GPU doesn't make much sense, because the diminishing returns of multithreaded compute really hit hard there. Something like raytracing is essentially infinitely threadable, because you can just assign each thread a single "ray to trace", and the rays don't necessarily interact with each other, so the various threads don't need to communicate their work.

 

But code compilation is very reliant on an order of operations and strict rules. Essentially, any opportunity for a race condition is absolutely not done in code compilation (race condition = a differing result depending on which thread happens to finish first), and each thread you add means more checks to make sure race conditions don't happen.


1 hour ago, manikyath said:

Wasn't AI mostly matrix multiplication, or whatever it's called... something math-degree stuff I never studied xD

I think matrices can be seen as 2D tensors. I'm not sure if AI uses higher-order ones much, as I don't use AI in my coding and my uni math courses were a while ago, but I thought I'd use the term tensor to implicitly draw the parallel with e.g. tensor cores.


@BobVonBob @tikker

 

Yes, that makes sense and I completely missed that part.

 

But still, I think this might be one of the few cases where sequential processing is genuinely required. I won't argue the point further, because I can't find or construct actual code snippets that would justify my statement, so you might be right, but I have a feeling I am missing something and that some types of code might be able to use GPU acceleration.

 

But another way to look at it: what if you have a large project with a lot of files to compile, so you use a highly parallel, hardware-accelerated compute engine, maybe a GPU, and use its parallel compute capabilities to split the files across it and process all of them at once? That would be just like using a CPU, but with way more threads.

 

What is debatable is whether a GPU could even run such a compilation workload. And if not, that is exactly what real hardware acceleration means: creating a custom computing device optimized only for that workload. But it might not be justifiable to create one, or it might end up looking similar to a CPU once you consider the variety of operations needed.

 

I don't know, but I do know there are some cases where compilation is done on the GPU, like some LLVM thing I heard about, which is above my level right now.


2 hours ago, Gat Pelsinger said:

@BobVonBob @tikker

 

Yes, that makes sense and I completely missed that part.

 

But still, I think this might be one of the few cases where sequential processing is genuinely required. I won't argue the point further, because I can't find or construct actual code snippets that would justify my statement, so you might be right, but I have a feeling I am missing something and that some types of code might be able to use GPU acceleration.

 

But another way to look at it: what if you have a large project with a lot of files to compile, so you use a highly parallel, hardware-accelerated compute engine, maybe a GPU, and use its parallel compute capabilities to split the files across it and process all of them at once? That would be just like using a CPU, but with way more threads.

 

What is debatable is whether a GPU could even run such a compilation workload. And if not, that is exactly what real hardware acceleration means: creating a custom computing device optimized only for that workload. But it might not be justifiable to create one, or it might end up looking similar to a CPU once you consider the variety of operations needed.

 

I don't know, but I do know there are some cases where compilation is done on the GPU, like some LLVM thing I heard about, which is above my level right now.

A GPU is not just a bunch of small CPUs. It's optimized for SIMD workloads: single instruction, multiple data. In order to effectively leverage that for compilation, it would need to be doing the same thing at the same time across many files.

 

There is actually one use case where GPU compilation is relatively common, and that's shader pre-compilation. Shaders are often many tiny bits of code, and they're usually similar and simple enough that they don't need much effort to optimize during compile time. That combination of small size and simple compilation (i.e. low memory requirements, because GPUs do not have much memory if you split it across many tasks) makes them perfect to compile on a GPU, but most standard compilation tasks are not like that.

 

With regards to LLVM, I think you've misunderstood it. LLVM can compile code which runs on GPUs, but it does not compile code on a GPU. LLVM is a toolchain to create compilers, and it's often used when making a compiler for new languages. You make a program to turn your language into LLVM's intermediate representation and LLVM handles optimizing that and turning it into machine code.


@BobVonBob

 

I only took GPUs as an example. You can create a better hardware accelerated device according to your needs. Just debating.


18 hours ago, Gat Pelsinger said:

But another way to look at it: what if you have a large project with a lot of files to compile, so you use a highly parallel, hardware-accelerated compute engine, maybe a GPU, and use its parallel compute capabilities to split the files across it and process all of them at once? That would be just like using a CPU, but with way more threads.

That could work, if compiling file A does not need knowledge of compiling file B. As soon as B needs to know about A you lose your ability to parallelise over A and B. It exists to some degree already:

https://www.gnu.org/software/make/manual/html_node/Parallel.html

Quote

GNU make knows how to execute several recipes at once. Normally, make will execute only one recipe at a time, waiting for it to finish before executing the next. However, the ‘-j’ or ‘--jobs’ option tells make to execute many recipes simultaneously. You can inhibit parallelism for some or all targets from within the makefile (see Disabling Parallel Execution).

 

and I found this exploring some aspects of parallelising GCC more: https://gcc.gnu.org/wiki/ParallelGcc
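As a concrete shape of what `make -j` parallelises, a minimal Makefile sketch (file names hypothetical; recipe lines must be indented with tabs): each object file depends only on its own source, so `make -j2` can run both compiles at once, and only the link step has to wait for both.

```make
# Each object file can be built independently, so `make -j2`
# compiles main.c and util.c concurrently; linking waits for both.
app: main.o util.o
	cc -o app main.o util.o

main.o: main.c
	cc -c main.c

util.o: util.c
	cc -c util.c
```

Run as `make -j2`, or `make -j$(nproc)` to use every core.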

 

7 hours ago, Gat Pelsinger said:

I only took GPUs as an example. You can create a better hardware accelerated device according to your needs. Just debating.

The big question is what kind of operation is most common in compiling code, and how would you parallelise and/or optimise that? "Compile a C program" is not an operation. Ideally you want something specific, like "multiply two 3x3 matrices of integers". The former is vague, while for the latter you can design a circuit/component.


Some compilers are multithreaded, which can speed up compilation, but it also eats more RAM as a result. Using a GPU to compile is silly, because GPU cores can only perform a very limited set of highly parallelised operations. It is ill suited for this.

 

Btw, multithreading might not always be the go-to answer for parallelism, simply because of how difficult it is to get right. It is also fragile: if one thread messes up and corrupts the program's data, for example, the whole program might segfault and crash. To achieve higher performance, throughput, and parallel work, it is often better to fork processes instead.

 

The downside is that multiprocessing (aka forking) is more resource-heavy than multithreading. The Chrome browser does this: each tab is its own process, which is why it eats up so much RAM, but this comes with the benefit of greater stability and security, since one child process, i.e. one tab, crashing won't crash the entire browser. Tabs can't directly read each other's process memory either, so it's more secure.

Sudo make me a sandwich 


  • 2 weeks later...
On 5/5/2024 at 6:00 AM, Gat Pelsinger said:

Is there a way I can compile normal C and C++ programs on my GPU rather than the CPU? Gentoo Linux temptation...

There is some research on that going on. As an example:

https://github.com/Snektron/pareas

 

On 5/5/2024 at 3:49 PM, Gat Pelsinger said:

But it might not be justifiable to create one

I guess that's the main point: compilation is not something that's done that much to justify the cost of developing something specific to it, and which may get outdated fast as compiler research advances.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


15 hours ago, igormp said:

compilation is not something that's done that much to justify the cost of developing something specific to it

I didn't mean it like that. And I completely disagree with that statement. Compilation is definitely done a lot, and most people/companies who need to compile a large project need HEDT or server-grade CPUs with high core counts. It would also be very convenient to compile much faster when you are constantly modifying things in your project.

 

You took my statement out of context. There is context given right after that line.

On 5/6/2024 at 12:19 AM, Gat Pelsinger said:

or it might end up looking similar to a CPU

I meant that it might not be worth creating an accelerator for it, because it might end up looking similar to a CPU, in which case we would just use our CPUs themselves. And I don't mean everything I say as fact; I think you really could make a decent accelerator for compilation, but I don't know.


If you were to compile everything multithreaded, you would have one very big negative outcome: without being able to process the whole code in a single thread, the compiler wouldn't be able to make predictions and whole-program optimisations. Just compiling to bytecode is easy; to make it efficient, you need the full context.


5 hours ago, Gat Pelsinger said:

I didn't mean it like that.

I know you didn't; I was giving a reason why such a thing is not researched further.

5 hours ago, Gat Pelsinger said:

And I completely disagree with that statement. Compilation is definitely done a lot, and most people/companies who need to compile a large project need HEDT or server-grade CPUs with high core counts. It would also be very convenient to compile much faster when you are constantly modifying things in your project.

Most companies don't do that; build servers are cheap and hardly the bulk of a company's costs. Caches are also a thing, so you'll hardly ever rebuild large things from scratch.

5 hours ago, Gat Pelsinger said:

I meant that it might not be worth creating an accelerator for it, because it might end up looking similar to a CPU, in which case we would just use our CPUs themselves. And I don't mean everything I say as fact; I think you really could make a decent accelerator for compilation, but I don't know.

As I've shown you, it's possible to compile code taking advantage of a GPU, and it's possible to create a dedicated accelerator that's even faster, but such a thing has no real incentive to exist in practice.

 

5 hours ago, Franck said:

If you were to compile everything multithreaded, you would have one very big negative outcome: without being able to process the whole code in a single thread, the compiler wouldn't be able to make predictions and whole-program optimisations. Just compiling to bytecode is easy; to make it efficient, you need the full context.

Most compile jobs are already multithreaded, that's a non-issue.


I mean, can't you compile multiple source files in parallel? For example, in Java, if you have 5 Java files that need to be compiled to their bytecode .class files, can you not simply run 5 javac commands in parallel, one for each file, to compile each of them to its own .class file all at the same time?

 

Of course there are dependencies that need to be resolved and compiled first, e.g. all the import statements, libraries and such, but if the compiler can compile a single file independently, any good build tool can just run separate compiler invocations at the same time for each of the source files. If a user or build tool can parallelize it this trivially, I don't see how it would be any more difficult to have the compiler do this by itself.

 

The same logic should apply to g++ and gcc.


@wasab

 

Multithreading is different from acceleration. You are still doing the work on your CPU.


Maybe you could create dedicated hardware for some parts of the compilation process (the lexer/parser?). But if that ends up language-specific, it would be of limited use: it might break as the language evolves or as focus switches to another language.

 

Even if the acceleration itself works, if the information needs to travel back from the accelerator to the CPU, you might lose any speed advantage along the way.

 

As others have said, the CPU is probably the best "accelerator" for the task already. The best course of action is finding ways to make better use of available resources, i.e. compile on as many cores in parallel as possible.

Remember to either quote or @mention others, so they are notified of your reply


6 hours ago, Gat Pelsinger said:

@wasab

 

Multithreading is different from acceleration. You are still doing the work on your CPU.

I wasn't talking about the GPU, just parallel work in general. You can run more than one process of any program, as long as the program doesn't disallow it.


1 hour ago, wasab said:

I wasn't talking about gpu.

Neither was I. I mentioned CPU.

