GCC up to 50% SLOWER vs LLVM (Clang) and Intel compilers

Camofelix · December 19, 2021

TL;DR: Programs compiled with GCC (versions 7-12) take up to 50% longer to complete than those compiled with the open source, LLVM-based Clang (11-13) and Intel ICX, and with the closed source Intel ICC compiler.

Hi Ladies and Gents,

I've been profiling the ins and outs of how Alder Lake behaves across various workloads and environments as a way of previewing how Sapphire Rapids, which uses the same Golden Cove core, will perform in HPC tasks.

To that end, I've already published a few hundred results on Twitter across different scenarios (different kernels, compilers, memory sub-timings, etc.). Those can be found here:
External Link

Of interest today, however, is this Binary Trees test:

GCC, time taken:
GCC 7: 379.943716
GCC 8: 395.665537
GCC 9: 373.488119
GCC 10: 392.596422
GCC 11: 382.825910
GCC 12: 390.466340

Clang, time taken:
Clang 11: 256.381165
Clang 12: 290.616438
Clang 13: 284.877824

Intel, time taken:
ICC: 249.630150
ICX: 250.511041

The above tests were run with a tree size of 26.

The above was the output after running the test 20 times; results were within +/- 0.2% run to run.

The Git repo with the test can be found here: https://github.com/FCLC/Choosing-a-compiler-performance-testing-GCC_ICC_ICPX_NVCC_CLANG_HIP/tree/main/Binary_tree
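For anyone who hasn't opened the repo, here is a minimal sketch of the allocation pattern a binary-trees test typically exercises. It assumes the classic benchmark layout; the repo's actual code may differ, and the names `node`, `bottom_up_tree`, `item_check`, and `delete_tree` are hypothetical. The key point is that every node is its own malloc() call and every node is later free()d individually.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type: each node is a separate heap allocation. */
typedef struct node {
    struct node *left;
    struct node *right;
} node;

/* Build a perfect binary tree of the given depth, one malloc() per node. */
static node *bottom_up_tree(int depth) {
    node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    if (depth > 0) {
        n->left = bottom_up_tree(depth - 1);
        n->right = bottom_up_tree(depth - 1);
    } else {
        n->left = n->right = NULL;
    }
    return n;
}

/* Walk every node and count them, so the tree can't be optimized away. */
static long item_check(const node *n) {
    /* Invariant: a node has either two children or none. */
    assert((n->left == NULL) == (n->right == NULL));
    if (n->left == NULL) return 1;
    return 1 + item_check(n->left) + item_check(n->right);
}

/* Free the tree node by node: one free() per earlier malloc(). */
static void delete_tree(node *n) {
    if (n->left != NULL) {
        delete_tree(n->left);
        delete_tree(n->right);
    }
    free(n);
}
```

A tree of depth d holds 2^(d+1) - 1 nodes, so `item_check(bottom_up_tree(3))` walks 15 nodes, each of which cost one malloc() and will cost one free().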

I'd love to see results and thoughts from anyone else.

Sauron · December 20, 2021

Interesting, I wonder how much of this is due to Alder Lake being new and possibly not fully optimized for in GCC... might try this on Sandy Bridge later.

Camofelix · December 27, 2021

On 12/20/2021 at 2:35 AM, Sauron said:

Interesting, I wonder how much of this is due to Alder Lake being new and possibly not fully optimized for in GCC... might try this on Sandy Bridge later.

Turns out it isn't Alder Lake at all. It's present as far back as Nehalem, on all GCC versions.

There didn't seem to be much interest on the LTT forums, so I stopped updating this thread, but I've been tracking this more on the Level1Techs forums, and the main L1T thread has much more info: https://forum.level1techs.com/t/wip-testing-update-its-not-just-alder-lake-it-goes-back-to-nehalem-gcc-50-performance-regressions-vs-clang-and-intel-compilers-in-specific-workloads-across-all-opt-settings/179712/10

I've dug through a lot of the assembly, but haven't gone *all the way down* the rabbit hole, as it were (if you count 100+ different runs as not going all the way down, I guess).

It seems GCC is trying to pre-cache instructions a lot, almost N64 instruction-cache style, at times using way more registers and wasting cycles.

ahmad13610 · January 1, 2022

How about the Bionic chips' compiler? Can it be competitive?

Camofelix · January 4, 2022

On 12/31/2021 at 10:01 PM, ahmad13610 said:

How about the Bionic chips' compiler? Can it be competitive?

Not quite sure what you mean.

On 12/20/2021 at 2:35 AM, Sauron said:

Interesting, I wonder how much of this is due to Alder Lake being new and possibly not fully optimized for in GCC... might try this on Sandy Bridge later.

I went further down the rabbit hole, and it's a bug in how glibc (the GNU C library) and GCC handle malloc.

Replacing the memory allocation routines with TCMalloc, jemalloc, or Hoard all yielded *massive* performance uplifts, with GCC 12 surpassing ICC and Clang 11 (when those two use glibc malloc).

I haven't had time to integrate TCMalloc etc. with oneAPI yet, but hope to do so soon.
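One way to isolate the allocator from the rest of the benchmark is a small churn test that does nothing but allocate, touch, and free node-sized blocks. This is a minimal sketch, not the method used above: `time_node_churn` and its parameters are hypothetical, and it measures CPU time with the standard clock() rather than wall time.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Same shape as a binary-tree node: two pointers. */
typedef struct node {
    struct node *left;
    struct node *right;
} node;

/* Allocate n node-sized blocks, write to each, then free them all,
   repeated `rounds` times. Returns elapsed CPU seconds per clock(). */
static double time_node_churn(size_t n, int rounds) {
    assert(n > 0 && rounds > 0);
    /* Keep every pointer live in an array so the compiler
       cannot elide the malloc()/free() pairs. */
    node **slots = malloc(n * sizeof *slots);
    if (slots == NULL) abort();

    clock_t start = clock();
    for (int r = 0; r < rounds; r++) {
        for (size_t i = 0; i < n; i++) {
            slots[i] = malloc(sizeof(node));
            if (slots[i] == NULL) abort();
            slots[i]->left = slots[i]->right = NULL; /* touch the block */
        }
        for (size_t i = 0; i < n; i++) {
            free(slots[i]);
        }
    }
    clock_t end = clock();

    free(slots);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Called from a small main(), the same binary can be run once with glibc's malloc and once with an alternative allocator preloaded on Linux, e.g. `LD_PRELOAD=/path/to/libjemalloc.so ./churn`; the library's file name and path vary by distribution.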

Sauron · January 4, 2022

2 minutes ago, Camofelix said:

it's a bug in how glibc (the GNU C library) and GCC handle malloc.

Well, that's not great. Hopefully it's fixed soon.

Camofelix · January 4, 2022

1 minute ago, Sauron said:

Well, that's not great. Hopefully it's fixed soon.

Yeah, time permitting, I'm hoping to look into it after the kernel 5.17 merge window.

It's a *somewhat* niche case, but malloc mixed with bit-shifting for exponentially sized trees isn't completely uncommon in HPC, so this could be sitting there sucking up cycles on supercomputers as I type this.

Thankfully, those environments tend to use Cray, Intel, or custom compilers, which are immune to this.
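To put a rough number on the "malloc mixed with bit-shifting" pattern: assuming the test follows the Benchmarks Game layout of binary-trees (depths stepping by 2 from a min_depth of 4 up to max_depth, with `1 << (max_depth - depth + min_depth)` short-lived trees built at each depth), the sketch below counts the allocations made by that loop alone. The function names are hypothetical, and the stretch tree and long-lived tree are not counted.

```c
#include <assert.h>

/* Nodes in a perfect binary tree of the given depth: 2^(depth+1) - 1. */
static long long nodes_in_tree(int depth) {
    return (1LL << (depth + 1)) - 1;
}

/* Total malloc() calls made by the iteration loop, assuming the
   Benchmarks Game structure described above (one malloc per node). */
static long long loop_allocations(int min_depth, int max_depth) {
    assert(min_depth >= 0 && max_depth >= min_depth);
    long long total = 0;
    for (int depth = min_depth; depth <= max_depth; depth += 2) {
        /* Shallower trees are built exponentially more often. */
        long long iterations = 1LL << (max_depth - depth + min_depth);
        total += iterations * nodes_in_tree(depth);
    }
    return total;
}
```

Under those assumptions, each depth contributes just under 2^31 nodes at max_depth = 26, so `loop_allocations(4, 26)` comes to roughly 2.6 × 10^10 malloc()/free() pairs, which is why the allocator can dominate the run time.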
