Is it really worth compiling programs on your system?

One thing that I wish I had is enough compute and memory and if I did, I would go compiling each and every piece of software on my system. Besides the other reasons for needing to compile code yourself, one reason would be for performance or better optimization specifically targeted for your machine. If I do sound a bit extreme or impractical, remember that Gentoo users exist, compiling their kernel for 8 hours for the fifth time.

 

I feel like if optimization was my goal anyways, I am already doing it wrong. I just get the code, and straight away compile it. Do I not have to give O3 and march=native flags for C/C++? I just compiled Neovim by adding "CFLAGS= O3" as a parameter for Make, and it compiled and ran correctly. I am not even sure if that is how it works, so did O3 apply or not (Neovim does contain a lot of C code and no C++ as far as I know)?

 

Does that mean I could clone the Linux source, and apply aggressive optimizations in the build process? I don't think the source is compiled with any optimization, but I don't think something like march=native is used right (it wouldn't be portable)? And I heard because it could break code, O2 is used instead of O3.

 

C/C++ allow you to write generic code, and compilers actually do an amazing job when it comes to optimizing that code in a certain way or for certain hardware. So does this actually make a difference, or should I stop feeling sad because of the fact that it is only feasible for me to get pre-compiled generic binaries? Do pre-compiled binaries not come with much optimization? I think choosing instructions at runtime is a thing, for example a program might use AVX2 at runtime if it is supported and fall back to something else otherwise, but I think that's about it, and it isn't the most optimized way.

PLEASE MARK COMMENTS AS SOLUTION IF SATISFIED!!

bigger number better, makes me look cooler.


Compiling everything yourself is a waste of time, assuming your computer is a nothing-special x86-64 machine like billions of others all over the world and binaries exist.

I sold my soul for ProSupport.


I am going to say the short answer is no for the following reasons

  • Machine specific code optimizations tend to be negligible. When there is a performance uplift, whatever optimization is being applied is often beneficial for other platforms as well. I think a great example of this is the Intel Clear Linux project, where Intel's patches often benefit AMD too, in which case they should be investigated for more generic use.
    I'm not saying there isn't a performance improvement here, I am just saying it's probably not going to be noticeable except in some edge cases.
  • Building something directly for your system also only matters if the entire chain is built for your system. If you build, say, Krita for Zen 4 but the rest of your system is built against generic, then the difference is even more negligible.

Now you could entirely optimize for a single platform, but build flags just won't help a lot here. The only time it's beneficial is when the app was specifically designed for that platform, in which case it may not be compatible with other platforms.
Also, just because something can perform better doesn't necessarily mean it will. AVX2 instructions, for instance, do have a performance uplift, but the overhead can actually make it perform worse. Same with 64-bit vs 32-bit: 64-bit has more overhead and most applications realistically don't benefit from it.

 

The only reason imo to compile everything is that you want to modify and build something for a specific target, stripping away or patching in various things, or that you know for sure there is some optimization available that can actually provide additional performance or functionality for that target.


Most folks would not put up with the time commitment for that, and the time savings for a single person are dubious, depending on how long you'll be using that version of the program.

 

Let's say you do want to go and hyper optimize a piece of software for your specific system. How much time will that take you? And how much time will it save you?

 

If you know that you will be running a particular version of software for a decade on the same piece of hardware, the initial time commitment is probably worth it. If you save 1 minute a day thanks to the optimizations, over 10 years, you've saved over 60 hours of time, and this scales well with how strong of an impact your optimizations are. If you can save 30 minutes a day, we're now talking months of computing time saved, although that's very unrealistic for this sort of optimization.

 

However, most software does not have a 10 year lifespan, nor does your specific hardware. Odds are, you will update one or the other in that time. And with most software that people use day-to-day, weekly updates are not uncommon. And each time you update, you'll need to recompile.
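
To put rough numbers on the two paragraphs above, here is a minimal sketch of the arithmetic. The function names are made up for illustration, and the 20-minute build time and weekly rebuild rate in the last line are assumptions, not measurements from anyone in this thread.

```python
DAYS_PER_YEAR = 365


def hours_saved(minutes_per_day, years):
    """Total hours saved by a daily speed-up sustained over some years."""
    return minutes_per_day * DAYS_PER_YEAR * years / 60


def net_hours(minutes_per_day, years, compile_minutes, rebuilds_per_year):
    """Hours saved minus the hours spent recompiling after each update."""
    spent = compile_minutes * rebuilds_per_year * years / 60
    return hours_saved(minutes_per_day, years) - spent


# 1 minute a day for 10 years: about 60.8 hours.
print(round(hours_saved(1, 10), 1))

# 30 minutes a day for 10 years, expressed in days: roughly 76 days.
print(round(hours_saved(30, 10) / 24, 1))

# Assumed: a 20-minute build repeated for weekly updates (52 per year).
# The result is negative, i.e. more time is spent compiling than saved.
print(round(net_hours(1, 10, 20, 52), 1))
```

With those assumed rebuild figures, the 1 minute a day case ends up well in the red, which is the point about updates eating the gain.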

 

There are four kinds of people who compile everything for a specific system: good OEMs, console game developers, researchers, and Gentoo Linux nerds.

 

A good OEM who will be producing 10,000 copies of the same hardware and wants their custom software to run well on it will compile to that specific hardware, because there's no reason not to apart from laziness. Every single router or NAS box or digital thermostat is going to have identical hardware.

 

With the consoles, they are a known quantity, so compiling directly for them makes sense. Console gamers would also be very annoyed if they had to compile shaders on first run like we do on PC.

 

With researchers, they are doing this because they are very limited in the amount of time they get to use a particular system. If you are given 100 hours on a supercomputer for your research, that's a hard limit. And you know what hardware you will be using and how long you'll get on it weeks or months in advance. So it behooves you to optimize the code as much as possible to get as much benefit from your time as you can.

 

And then there are Gentoo Linux nerds, who want to min-max their system through compiling everything by hand. The practical benefits of this are dubious. I'm sure some folks do have a net gain in productivity from the practice, but for most, it's just a hobby. They do it for the enjoyment of optimizing their system, not because it's practical. And that's totally fine, but not something that everyone in the world should be pushed towards.


36 minutes ago, Haswellx86 said:

One thing that I wish I had is enough compute and memory and if I did, I would go compiling each and every piece of software on my system.

just install gent-

36 minutes ago, Haswellx86 said:

remember that Gentoo users exist

oh lol

36 minutes ago, Haswellx86 said:

Do I not have to give O3 and march=native flags for C/C++? I just compiled Neovim by adding "CFLAGS= O3" as a parameter for Make, and it compiled and ran correctly. I am not even sure if that is how it works, so did O3 apply or not (Neovim does contain a lot of C code and no C++ as far as I know)?

Those compile optimizations are often not worth it, unless you're doing some really specific things.

Most libraries also dispatch intrinsics during runtime, so even if you use march=i386 it should be able to use AVX-512 when available.
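
A rough sketch of what that runtime dispatch pattern looks like, written in Python purely to show the shape of it. Real libraries do this in C or assembly (glibc, for example, picks implementations through its IFUNC mechanism). Every function name here is hypothetical, `dot_avx2` is only a stand-in that computes the same result as the fallback, and reading `/proc/cpuinfo` is a Linux-only assumption.

```python
def cpu_flags():
    """Best-effort CPU feature flags from /proc/cpuinfo; empty set elsewhere."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                # x86 kernels report "flags", ARM kernels report "Features".
                if line.startswith(("flags", "Features")):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


def dot_generic(a, b):
    """Portable fallback that works on any CPU."""
    return sum(x * y for x, y in zip(a, b))


def dot_avx2(a, b):
    """Stand-in for a hand-tuned AVX2 kernel (hypothetical, same result)."""
    return sum(x * y for x, y in zip(a, b))


def select_dot(flags):
    """Choose an implementation once, based on what the CPU reports."""
    return dot_avx2 if "avx2" in flags else dot_generic


# The choice happens at startup; callers just use `dot` afterwards.
dot = select_dot(cpu_flags())
print(dot.__name__, dot([1, 2, 3], [4, 5, 6]))
```

The binary stays generic, and the fast path is still taken on machines that support it, which is why the baseline `march` matters less than you'd expect for libraries that do this.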

Anyhow, benchmarks exist for a reason, and you can see by yourself:

https://www.phoronix.com/review/ubuntu-o3-experiment

https://www.phoronix.com/review/linux-kernel-o3

39 minutes ago, Haswellx86 said:

So does this actually make a difference, or should I stop feeling sad because of the fact that it is only feasible for me to get pre-compiled generic binaries?

If you care about performance, look into profiling your applications, with stuff like perf and whatnot, and try to find bottlenecks and maybe algorithm inefficiencies, those are always way more relevant than any compiler flag.
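
As a small illustration of the profiling point, here is a sketch using Python's built-in `cProfile` as a stand-in for perf (perf itself targets native binaries). The two duplicate-check functions are made-up examples of a subpar and a better algorithm; the input size is an arbitrary assumption.

```python
import cProfile
import io
import pstats


def has_duplicates_quadratic(items):
    """Compare every pair of elements: O(n^2)."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Track already-seen values in a set: O(n)."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


def workload():
    data = list(range(2000))  # no duplicates, so both scan everything
    return has_duplicates_quadratic(data), has_duplicates_linear(data)


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Print the five most expensive entries by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report should make it obvious which function dominates, and swapping the algorithm is a far bigger win than anything a compiler flag would give you.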

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


48 minutes ago, Needfuldoer said:

Compiling everything yourself is a waste of time

That sounds too bold. There are many people who kind of support compiling yourself.

 

48 minutes ago, Needfuldoer said:

assuming your computer is a nothing-special x86-64 machine like billions of others all over the world and binaries exist.

Yes, but each x86-64 CPU has its own optimization opportunities. There is one layer of optimization where the compiler generically optimizes your code using better and smarter algorithms, features, and paths of execution, and by cleaning up the code. The other layer is deeper: it optimizes the code for a specific platform and a specific CPU architecture, taking advantage of certain instructions that are not available on a generic basis and tweaking other instructions. Modern compilers are quite good at these jobs.

 

37 minutes ago, Nayr438 said:

Machine specific code optimizations tend to be negligible.

I think it's worth mentioning that it depends on the workload of the program. For your generic app which lets you do something, it would be fine for it to not be that optimized. I would even accept it being coded in interpreted Python or something, given that it is lightweight. But if it comes to something like my browser or my terminal, I need optimization.
Also, I think machine specific optimization helps the most when you really are doing heavy computation that is very repetitive. For example, a chess engine like Stockfish has many patches for certain CPU architectures for those deep optimizations as even a few cycles saved end up accumulating to many. Prime95 has, I think, hand written assembly for many CPU architectures for maximum stress and no overhead or bottleneck. Glibc has hand written assembly for many low level functions. DeepSeek stated that they used Nvidia PTX (assembly) rather than CUDA for peak performance and efficiency (absolute mad lads, but I don't know the full story).

Here is also where manual hand written deep optimization plays an actual role. There is no practical need to put in effort for a single instruction's worth of SIMD acceleration; the compiler handles those. But when you are doing real compute, where your algorithm is run repeatedly, you could manually optimize it with SIMD, probably better than your compiler, as it doesn't know that much about your code.

I was once trying to find comparisons with Intel's compiler, and came across various articles stating that Intel's compiler actually had major performance benefits in a couple of scenarios. In broader scenarios, it is still a little better at optimization.

 

@YoungBlade Time is obviously a major reason. I never said that critical optimization is practical. But if it is as easy as just putting in extra compile flags and you have the hardware to compile it in a reasonable time, I don't see why not, if you care about it.

 

18 minutes ago, igormp said:

Most libraries also dispatch intrinsics during runtime, so even if you use march=i386 it should be able to use AVX-512 when available.

That's good to hear. I think I expected it because intrinsics are such an important topic in optimization.

18 minutes ago, igormp said:

Anyhow, benchmarks exist for a reason, and you can see by yourself:

Well you know, it's faster! Unless the time it isn't. Either runtime variance, or those theories about O3 not only breaking the code but actually making it slower, sometimes due to larger code size and so not being able to fit in cache, or some stuff like that. Maybe O3 isn't the best solution, but something like march or mtune to properly use the power of your CPU. It's not about it being just a little faster, but about the capabilities of my system being correctly utilized. Again, not practical, I agree, stop yelling at me.

 

 


8 minutes ago, Haswellx86 said:

theories about O3 not only breaking the code but actually making it slower

More about it in this Stack Overflow post.

*using non-conversational, sketch-level language to gesture at structure and direction.
The GB8/12 Liberation Front

 

 


12 minutes ago, Haswellx86 said:

I think it's worth mentioning that it depends on the workload of the program. For your generic app which lets you do something, it would be fine for it to not be that optimized. I would even accept it being coded in interpreted Python or something, given that it is lightweight. But if it comes to something like my browser or my terminal, I need optimization.
Also, I think machine specific optimization helps the most when you really are doing heavy computation that is very repetitive. For example, a chess engine like Stockfish has many patches for certain CPU architectures for those deep optimizations as even a few cycles saved end up accumulating to many. Prime95 has, I think, hand written assembly for many CPU architectures for maximum stress and no overhead or bottleneck. Glibc has hand written assembly for many low level functions. DeepSeek stated that they used Nvidia PTX (assembly) rather than CUDA for peak performance and efficiency (absolute mad lads, but I don't know the full story).

Here is also where manual hand written deep optimization plays an actual role. There is no practical need to put in effort for a single instruction's worth of SIMD acceleration; the compiler handles those. But when you are doing real compute, where your algorithm is run repeatedly, you could manually optimize it with SIMD, probably better than your compiler, as it doesn't know that much about your code.

I was once trying to find comparisons with Intel's compiler, and came across various articles stating that Intel's compiler actually had major performance benefits in a couple of scenarios. In broader scenarios, it is still a little better at optimization.

Everything you are stating here, however, depends on the software itself being optimized for that platform, which I did mention.

If the software was never specifically optimized for that platform then none of this matters. Your original question was about taking every piece of software and compiling it for your platform, not making modifications to it. Most software won't have compile-time optimizations specific to certain platforms, and when it does, the gains are often negligible outside of certain edge cases.


39 minutes ago, Haswellx86 said:

But if it comes to something like my browser or my terminal, I need optimization.

Bad examples, honestly.

37 minutes ago, Haswellx86 said:

DeepSeek stated that they used Nvidia PTX (assembly) rather than CUDA for peak performance and efficiency (absolute mad lads, but I don't know the full story).

Their PTX usage was really minimal and was more about cross-GPU comms than actual raw perf on the GPU. You can take a look at their technical paper; the whole "they bypassed CUDA!!!" thing was just media overblowing things.

34 minutes ago, Haswellx86 said:

but something like march or mtune to properly use the power of your CPU

eh, not that much either

https://www.phoronix.com/review/ubuntu-x86-64-v3-benchmark

 

Often that kind of thing is only relevant to HPC stuff, at which point you'd be looking into many other optimizations as well.

36 minutes ago, Haswellx86 said:

It's not about it being just a little faster, but about the capabilities of my system being correctly utilized.

Again, intrinsics. Also, you should be worrying about finding bottlenecks with profiling, not focusing on such small stuff like compiler flags. What good is using a proper instruction set if you're using a subpar algorithm or ended up IO bound?


1 hour ago, Haswellx86 said:

One thing that I wish I had is enough compute and memory and if I did, I would go compiling each and every piece of software on my system.

You can compile whatever you want on any compatible system; it will just take longer if your system sucks.

1 hour ago, Haswellx86 said:

Besides the other reasons for needing to compile code yourself, one reason would be for performance or better optimization specifically targeted for your machine.

You need to know what you're doing for that to apply.

1 hour ago, Haswellx86 said:

I feel like if optimization was my goal anyways, I am already doing it wrong. I just get the code, and straight away compile it. Do I not have to give O3 and march=native flags for C/C++? I just compiled Neovim by adding "CFLAGS= O3" as a parameter for Make, and it compiled and ran correctly. I am not even sure if that is how it works, so did O3 apply or not (Neovim does contain a lot of C code and no C++ as far as I know)?

In 99.9% of cases you will get no benefit from compiling software yourself as the packaged version already includes the obvious optimizations. Even if your executable were a bit faster you'd never notice on something like neovim, which barely taxes a modern system anyway.

1 hour ago, Haswellx86 said:

Does that mean I could clone the Linux source, and apply aggressive optimizations in the build process? I don't think the source is compiled with any optimization, but I don't think something like march=native is used right (it wouldn't be portable)? And I heard because it could break code, O2 is used instead of O3.

The packaged kernel includes support for different architectures but it will detect what architecture you have and execute the correct optimized code automatically. Not including support for other architectures won't make it any faster, it will just make it a little smaller (which is irrelevant on modern systems with terabytes of storage). And yeah, O3 is not safe because it might cause incorrect behavior for little benefit.

1 hour ago, Haswellx86 said:

So does this actually make a difference, or should I stop feeling sad because of the fact that it is only feasible for me to get pre-compiled generic binaries?

No, you should keep being sad about maybe missing out on a 0.0001% performance boost.

53 minutes ago, Haswellx86 said:

But if it comes to something like my browser or my terminal, I need optimization.

Why..? Terminal emulators barely use any resources anyway, and browser performance largely depends on the pages you're rendering and how good the web engine is. Compiler optimizations make no difference.

55 minutes ago, Haswellx86 said:

Also, I think machine specific optimization helps the most when you really are doing heavy computation that is very repetitive. For example, a chess engine like Stockfish has many patches for certain CPU architectures for those deep optimizations as even a few cycles saved end up accumulating to many.

Those don't require you to compile Stockfish yourself. It will detect your architecture and use those instructions as needed, or in some specific cases they might ship a separate binary - but either way you won't need to compile it yourself to access those features.

 

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*


The only time I could see it make sense is if you do something for a living, the software you use is very performance intensive, and compiling the binary yourself nets you an appreciable performance boost.

 

Say if you're encoding videos all day and the self-compiled encoder is 10% faster. However I very much doubt you'd get anywhere near that number.

Remember to either quote or @mention others, so they are notified of your reply


18 hours ago, Eigenvektor said:

The only time I could see it make sense is if you do something for a living, the software you use is very performance intensive, and compiling the binary yourself nets you an appreciable performance boost.

 

Say if you're encoding videos all day and the self-compiled encoder is 10% faster. However I very much doubt you'd get anywhere near that number.

Yeah, and what's more likely is that any performance improvement would be so small that it would be negated by the time lost compiling the updates.


I imagine it wouldn't make much of a difference, but wouldn't it be worth it even if it was a 1 percent improvement? What would you lose? Just do it, I'd say.

I know it might not be secure, yeah, vibecoding is cool but we shouldn't do something unless we understand it, etc. Thanks, but these disclaimers get old quick. Maybe we should be reminded frequently because we are stupid, but I don't work at a nuclear power plant.


If a Gentoo sysadmin stumbled upon this thread, he would recompile his system for the 69th time.

Compiling stuff is usually a waste of time, but if you want to apply flags like AVX-512 or -O3 to optimize your system then I don't see why not. I still think the performance difference for most applications is negligible (unless you're a Gentoo user or have no job).

I use Windows + Arch WSL btw.


20 minutes ago, goatedpenguin said:

nah, windows is still king in terms of consumer software.

An absolutely unfortunate reality.


  • 1 month later...
On 2/21/2025 at 6:59 AM, Haswellx86 said:

Besides the other reasons for needing to compile code yourself, one reason would be for performance or better optimization specifically targeted for your machine.

You are likely not going to benefit (in terms of performance) by simply recompiling a piece of code "for your system".

Where you MAY win is if you can tailor the code to your needs.  E.g., I build kernels for each of my machines based on how I want to use them.

I build all "packages" (NetBSD-speak) as I don't trust the folks maintaining them AND they are usually more customizable than the maintainer may have chosen to support in his/her release.

For a generic user, just sit back and enjoy the ride (buy more RAM if you feel a need to "do something".)
