
Another C quirk 🥴

Today I have another "why is this faster than this?" question, because I have nothing else to do in my life. Seriously, you might all get tired of replying to my posts, since nothing I say really matters, but once a question gets into my head it keeps bothering me and I can't let it go.

 

You know how "ptr[i]" is the same as "*(ptr + i)"? Upon testing, it looks like it is not. And the difference is actually something worth talking about.

 

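#include <stdio.h>
#include <time.h>

int main(void) {
    const char *str = "some very long string";  /* placeholder: the real string is far too big to post */
    const int iteration = 1;                    /* assumed: the assembly below makes a single pass */
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int n = 0; n < iteration; n++)         /* outer counter renamed so it no longer shadows i */
        for (int i = 0; str[i]; i++);           /* array indexing */
    clock_gettime(CLOCK_MONOTONIC, &end);
    /* Include tv_sec: tv_nsec alone wraps every second, so the difference can go negative. */
    printf("%lld\n", (end.tv_sec - start.tv_sec) * 1000000000LL + (end.tv_nsec - start.tv_nsec));

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int n = 0; n < iteration; n++)
        for (int i = 0; *(str + i); i++);       /* pointer arithmetic */
    clock_gettime(CLOCK_MONOTONIC, &end);
    printf("%lld\n", (end.tv_sec - start.tv_sec) * 1000000000LL + (end.tv_nsec - start.tv_nsec));
    return 0;
}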

 

This is cut down from my main, because the real string is so BIG it almost crashes VSCode. Anyway, the second loop is quite a bit faster than the first. The first loop uses array indexing, whereas the second uses pointer arithmetic.

 

Ah yes, the assembly code. Only if I could read it!

 

main:
	pushq	%rbp
	.seh_pushreg	%rbp
	movq	%rsp, %rbp
	.seh_setframe	%rbp, 0
	subq	$96, %rsp
	.seh_stackalloc	96
	.seh_endprologue
	call	__main
	leaq	.LC0(%rip), %rax
	movq	%rax, -24(%rbp)
	leaq	-48(%rbp), %rax
	movq	%rax, %rdx
	movl	$1, %ecx
	call	clock_gettime
	movl	$0, -4(%rbp)
	jmp	.L4
.L7:
	movl	$0, -8(%rbp)
	jmp	.L5
.L6:
	addl	$1, -8(%rbp)
.L5:
	movl	-8(%rbp), %eax
	cltq
	movq	-24(%rbp), %rdx
	addq	%rdx, %rax
	movzbl	(%rax), %eax
	testb	%al, %al
	jne	.L6
	addl	$1, -4(%rbp)
.L4:
	cmpl	$0, -4(%rbp)
	jle	.L7
	leaq	-64(%rbp), %rax
	movq	%rax, %rdx
	movl	$1, %ecx
	call	clock_gettime
	movl	-56(%rbp), %eax
	movl	-40(%rbp), %edx
	subl	%edx, %eax
	movl	%eax, %edx
	leaq	.LC1(%rip), %rax
	movq	%rax, %rcx
	call	printf
	leaq	-48(%rbp), %rax
	movq	%rax, %rdx
	movl	$1, %ecx
	call	clock_gettime
	movl	$0, -12(%rbp)
	jmp	.L8
.L11:
	movl	$0, -16(%rbp)
	jmp	.L9
.L10:
	addl	$1, -16(%rbp)
.L9:
	movl	-16(%rbp), %eax
	cltq
	movq	-24(%rbp), %rdx
	addq	%rdx, %rax
	movzbl	(%rax), %eax
	testb	%al, %al
	jne	.L10
	addl	$1, -12(%rbp)
.L8:
	cmpl	$0, -12(%rbp)
	jle	.L11
	leaq	-64(%rbp), %rax
	movq	%rax, %rdx
	movl	$1, %ecx
	call	clock_gettime
	movl	-56(%rbp), %eax
	movl	-40(%rbp), %edx
	subl	%edx, %eax
	movl	%eax, %edx
	leaq	.LC1(%rip), %rax
	movq	%rax, %rcx
	call	printf
	movl	$0, %eax
	addq	$96, %rsp
	popq	%rbp
	ret

 

This is the only part which seems to be relevant.

 

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.


Again, I'll go back to what I said before: you can't just compare these kinds of things against each other and conclude that they must not be equal. There are a bunch of things that can essentially overpower your profiler in this case, such as cache state, page faults, CPU frequency scaling and timer resolution.

 

Let's look at the inner loops' assembly:

.L6:
	addl	$1, -8(%rbp)
.L5:
	movl	-8(%rbp), %eax
	cltq
	movq	-24(%rbp), %rdx
	addq	%rdx, %rax
	movzbl	(%rax), %eax
	testb	%al, %al
	jne	.L6
	addl	$1, -4(%rbp)

vs

.L10:
	addl	$1, -16(%rbp)
.L9:
	movl	-16(%rbp), %eax
	cltq
	movq	-24(%rbp), %rdx
	addq	%rdx, %rax
	movzbl	(%rax), %eax
	testb	%al, %al
	jne	.L10
	addl	$1, -12(%rbp)

Notice how the assembly is pretty much the same? The only differences are the stack slots used for the loop counters (-8/-4 vs -16/-12). At that point you're talking about things like how earlier code affects the code that runs after it.

3735928559 - Beware of the dead beef


@wanderingfool2

 

Hmm. I swapped the order of the loops, and whichever loop runs second always seems to be faster. It looks like some caching or prefetching is going on.


3 hours ago, Gat Pelsinger said:

because I have nothing else to do in my life

 

3 hours ago, Gat Pelsinger said:

Ah yes, the assembly code. Only if I could read it!

Well, you know what's waiting to keep you occupied.

F@H
Desktop: i9-13900K, ASUS Z790-E, 64GB DDR5-6000 CL36, RTX3080, 2TB MP600 Pro XT, 2TB SX8200Pro, 2x16TB Ironwolf RAID0, Corsair HX1200, Antec Vortex 360 AIO, Thermaltake Versa H25 TG, Samsung 4K curved 49" TV, 23" secondary, Mountain Everest Max

Mobile SFF rig: i9-9900K, Noctua NH-L9i, Asrock Z390 Phantom ITX-AC, 32GB, GTX1070, 2x1TB SX8200Pro RAID0, 2x5TB 2.5" HDD RAID0, Athena 500W Flex (Noctua fan), Custom 4.7l 3D printed case

 

Asus Zenbook UM325UA, Ryzen 7 5700u, 16GB, 1TB, OLED

 

GPD Win 2
