Jump to content

for is way faster than goto in C?

Gat Pelsinger
Go to solution Solved by Eigenvektor,
28 minutes ago, Gat Pelsinger said:

Isn't it doing exactly the same thing as the for loop?

Logically? Yes. But the compiler doesn't know that.

 

When you use a for-loop, the compiler knows you're writing a loop, so it can make certain assumptions/optimizations right away. If you use goto, all it knows is that you're jumping somewhere under certain conditions. It can't know that this is intended to be a loop. If you do enable optimization, it'll perform additional analysis and likely arrive at the same result. Without optimization it just performs a literal translation of C to assembler.

So this

for (register int i = 0; i < UINT_MAX; i++);

is way faster than

register int i = 0;
a:
	if (i < UINT_MAX) {
    	i++;
    	goto a;
    }
    else return;

 

Technically I thought this is what a for loop (or even a while) is essentially in the background, but the for loop is quite a bit more faster than the goto if I benchmark. Then how is the for loop made?

 

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.

Link to comment
Share on other sites

Link to post
Share on other sites

We went over this in your other thread - you don't know what this compiles to and, further, this is going to be scheduled by the operating system, meaning it will not always complete in the same time.

4 minutes ago, Gat Pelsinger said:

Technically I thought this is what a for loop (or even a while) is essentially in the background

I'm pretty sure the compiler will just delete your for loop if it doesn't do anything, as is the case here.

 

The explicit goto instruction will probably not be deleted, because the variable could be used elsewhere in the code and therefore it must be incremented; it's possible the compiler will just substitute the useless goto with a simple addition, but I'm not certain.

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*

Link to comment
Share on other sites

Link to post
Share on other sites

My guess is that while for just increment the variable each step until condition met, goto increment once then the code go to another address, then increment, then go again etc, which is slower

System : AMD R9 5900X / Gigabyte X570 AORUS PRO/ 2x16GB Corsair Vengeance 3600CL18 ASUS TUF Gaming AMD Radeon RX 7900 XTX OC Edition GPU/ Phanteks P600S case /  Eisbaer 280mm AIO (with 2xArctic P14 fans) / 2TB Crucial T500  NVme + 2TB WD SN850 NVme + 4TB Toshiba X300 HDD drives/ Corsair RM850x PSU/  Alienware AW3420DW 34" 120Hz 3440x1440p monitor / Logitech G915TKL keyboard (wireless) / Logitech G PRO X Superlight mouse / Audeze Maxwell headphones

Link to comment
Share on other sites

Link to post
Share on other sites

ffs you ask a lot of basic and silly questions, in a forum that's not programming oriented..

 

Would be easy to just compile your program, both versions, and then load the binaries into ghidra or some other disassembler and see how they're different.

 

There's tons of optimizations a compiler could do, that you're not aware of.  In the first version the compiler could even take out the whole thing and you'd have an application that just returns.

Link to comment
Share on other sites

Link to post
Share on other sites

You really need to start looking at the assembly output of your compiler and learn to understand it if you want to go into such detail.

F@H
Desktop: i9-13900K, ASUS Z790-E, 64GB DDR5-6000 CL36, RTX3080, 2TB MP600 Pro XT, 2TB SX8200Pro, 2x16TB Ironwolf RAID0, Corsair HX1200, Antec Vortex 360 AIO, Thermaltake Versa H25 TG, Samsung 4K curved 49" TV, 23" secondary, Mountain Everest Max

Mobile SFF rig: i9-9900K, Noctua NH-L9i, Asrock Z390 Phantom ITX-AC, 32GB, GTX1070, 2x1TB SX8200Pro RAID0, 2x5TB 2.5" HDD RAID0, Athena 500W Flex (Noctua fan), Custom 4.7l 3D printed case

 

Asus Zenbook UM325UA, Ryzen 7 5700u, 16GB, 1TB, OLED

 

GPD Win 2

Link to comment
Share on other sites

Link to post
Share on other sites

Comparing or optimizing performance at that level effectively makes zero sense. Especially if you're not specifying the compiler, compiler version and compiler flags you used.

 

For any real world code that does something useful inside the loop, there's most likely going to be zero performance difference if you enable compiler optimizations. In any case that is not a performance difference in C, but rather a performance difference in the machine code each variant compiles to.

 

I've used "gcc -S" (version 13.2.1 20230801) to get the compiled machine code, without any optimization. I'm only copying the actual loop here, since I figure the rest is irrelevant to the discussion.

 

The for-loop turns into this:

.L3:
	addl	$1, %ebx
.L2:
	cmpl	$-1, %ebx
	jne	.L3

-> Increment number, compare numbers, jump back to L3 if not equal.

 

The GOTO turns into this instead:

.L2:
	cmpl	$-1, %ebx
	je	.L3
	addl	$1, %ebx
	jmp	.L2
.L3:

-> Compare numbers, jump to L3 if equal, else increment number, jump back to L2.

 

In other words, one additional instruction that needs to be evaluated per iteration for goto.

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

24 minutes ago, Eigenvektor said:

Comparing or optimizing performance at that level effectively makes zero sense. Especially if you're not specifying the compiler, compiler version and compiler flags you used.

 

For any real world code that does something useful inside the loop, there's most likely going to be zero performance difference if you enable compiler optimizations. In any case that is not a performance difference in C, but rather a performance difference in the machine code each variant compiles to.

 

I've used "gcc -S" (version 13.2.1 20230801) to get the compiled machine code, without any optimization. I'm only copying the actual loop here, since I figure the rest is irrelevant to the discussion.

 

The for-loop turns into this:

.L3:
	addl	$1, %ebx
.L2:
	cmpl	$-1, %ebx
	jne	.L3

-> Increment number, compare numbers, jump back to L3 if not equal.

 

The GOTO turns into this instead:

.L2:
	cmpl	$-1, %ebx
	je	.L3
	addl	$1, %ebx
	jmp	.L2
.L3:

-> Compare numbers, jump to L3 if equal, increment number, jump back to L2.

 

In other words, one additional instruction that needs to be evaluated per iteration for goto.

This confirms my reasoning 🙂 

Was considering the code as example snippets, obviously taken alone it doesn't have any effect 

System : AMD R9 5900X / Gigabyte X570 AORUS PRO/ 2x16GB Corsair Vengeance 3600CL18 ASUS TUF Gaming AMD Radeon RX 7900 XTX OC Edition GPU/ Phanteks P600S case /  Eisbaer 280mm AIO (with 2xArctic P14 fans) / 2TB Crucial T500  NVme + 2TB WD SN850 NVme + 4TB Toshiba X300 HDD drives/ Corsair RM850x PSU/  Alienware AW3420DW 34" 120Hz 3440x1440p monitor / Logitech G915TKL keyboard (wireless) / Logitech G PRO X Superlight mouse / Audeze Maxwell headphones

Link to comment
Share on other sites

Link to post
Share on other sites

35 minutes ago, PDifolco said:

This confirms my reasoning 🙂 

Was considering the code as example snippets, obviously taken alone it doesn't have any effect 

It does, yes 🙂 I just thought it might make sense to confirm.

 

On the one hand, it is interesting to see what the code compiles into. On the other hand, it makes zero sense to try and draw any definitive conclusion—for is way faster than goto—about performance from a single example.

 

As soon as you add actual content into the loop, the performance cost of a single conditional jump might turn into meaningless noise. Or the compiler output might look completely different flipping the result on its head. Of course, for any real world program you'd also enable compiler optimization, which might eliminate the performance difference altogether.

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

In this particular example the for loop would probably be eliminated while the goto would be turned into machine code and be executed. In a nutshell you probably measured a empty program vs. the goto code.

People never go out of business.

Link to comment
Share on other sites

Link to post
Share on other sites

@Eigenvektor

 

My dumb brain is not able to process from where exactly the new instruction is coming from, so all I want to ask is if there is a way to optimize the goto loop to make it behave the same as the for loop?

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.

Link to comment
Share on other sites

Link to post
Share on other sites

59 minutes ago, Gat Pelsinger said:

My dumb brain is not able to process from where exactly the new instruction is coming from, so all I want to ask is if there is a way to optimize the goto loop to make it behave the same as the for loop?

The instructions come from how the compiler interprets your code. You got multiple options:

 

- Easy mode: Enable compiler optimizations

- Hard mode: Write assembler

- I got too much time on my hand: Fiddle around with your code until the compiler output matches what you want

 

Of course, while the loops don't actually do anything, they'll simply be removed by the optimizer. But for "useful" code that is both the easiest and at the same time fastest option.

 

Without optimizations enabled, the assembler code is more or less a literal translation of your C code.

a:                    .L2
  if (i < UINT_MAX)     cmpl $-1, %ebx
                        je .L3
  {                 
      i++;              addl $1, %ebx
      goto a;           jmp  .L2
  }
  else
  {
      return;         .L3
  }

The "je .L3" is effectively the "else"

 

With -O3 enabled, GCC spits out a warning and compiles to an endless loop, because your variable is an int, but you're comparing against an unsigned int value, which it'll never be able reach. It'll overflow and therefore never stop looping. Changing the variable to an unsigned int fixes the warning and produces the expected outcome: Nothing. The loop is optimized away.

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

@Eigenvektor

 

Obviously this a completely non-practical experiment. My stupidly genius brain got this idea to compare a for loop with a manual goto loop, and turns out my goto loop is slower than the for loop, and I am look for why that is. I did rewrite the code and benchmarked it, but it is still slower. I re-wrote with following the steps you told that the for loop uses -

4 hours ago, Eigenvektor said:

-> Increment number, compare numbers, jump back to L3 if not equal.

The code - 

 

register unsigned int i = 0;
    a:
        i++;
        if (i < UINT_MAX)
            goto a;

 

Isn't it doing exactly the same thing as the for loop?

 

 

Okay, if we look at the assembly output,

for loop-

 

.L5:
	addl	$1, %ebx
.L4:
	cmpl	$-1, %ebx
	jne	.L5

 

goto - 

 

.L6:
	addl	$1, %ebx
	cmpl	$-1, %ebx
	je	.L7
	jmp	.L6
.L7:

 

 

Now I don't quite know how to read assembly, but it looks like the for loop code has 1 jump statement, whereas the goto code has 2. The for loop code jumps to .L5 if the comparing result is false in one go, whereas the goto code after comparing, checks to jump to .L7 if the comparing result is true, and then has to jump to .L6. So the logical difference I can see is that the for loop code uses a standalone if statement to keep the loop running and when the if statement results false, there is no special jump code. The code execution just goes on. But in my goto code, it uses an if/else like statement. It checks to jump to .L7 if comparing result was true, and then jump to .L6. Only if we could convert the "je" instruction to a "jne" instruction (or just flip the result of the comparison?). Is there still a way to optimize my for loop?

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.

Link to comment
Share on other sites

Link to post
Share on other sites

28 minutes ago, Gat Pelsinger said:

Isn't it doing exactly the same thing as the for loop?

Logically? Yes. But the compiler doesn't know that.

 

When you use a for-loop, the compiler knows you're writing a loop, so it can make certain assumptions/optimizations right away. If you use goto, all it knows is that you're jumping somewhere under certain conditions. It can't know that this is intended to be a loop. If you do enable optimization, it'll perform additional analysis and likely arrive at the same result. Without optimization it just performs a literal translation of C to assembler.

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×