Weird anomalies in instruction run time

Art Vandelay

So I was benchmarking some different instructions in assembly to see whether certain ones actually take longer than others.

 

I noticed that this code ran in about 976 ms:

        MOV ECX, -1
    timing_loop:
        DEC ECX
        JNZ timing_loop

and this loop ran in about 978 ms:

        MOV ECX, -1
    timing_loop:
        ADD EDX, 1
        DEC ECX
        JNZ timing_loop

and this loop ran in about 1947 ms:

        MOV ECX, -1
    timing_loop:
        ADD EDX, 1
        ADD EDX, 1
        DEC ECX
        JNZ timing_loop

So why exactly is this happening? Is this due to a branch delay slot or something?
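To put the three timings on a common scale, here is a rough sketch that converts them to time per loop iteration. It assumes only what the snippets show: `MOV ECX, -1` loads 0xFFFFFFFF, so the `DEC`/`JNZ` pair runs 2^32 - 1 times. The dictionary labels and the helper name `ns_per_iteration` are made up for illustration.

```python
# MOV ECX, -1 loads 0xFFFFFFFF; DEC/JNZ then loops until ECX reaches zero.
ITERATIONS = 2**32 - 1

# Wall-clock times reported in the post, in milliseconds.
timings_ms = {
    "DEC/JNZ only": 976,
    "one ADD EDX, 1": 978,
    "two ADD EDX, 1": 1947,
}

def ns_per_iteration(total_ms):
    """Convert a total run time in ms to nanoseconds per loop iteration."""
    return total_ms * 1e6 / ITERATIONS

baseline = ns_per_iteration(timings_ms["DEC/JNZ only"])
for name, ms in timings_ms.items():
    per_iter = ns_per_iteration(ms)
    print(f"{name:>15}: {per_iter:.3f} ns/iteration, "
          f"{per_iter / baseline:.2f}x baseline")
```

The first two loops come out at roughly 0.23 ns per iteration and the third at roughly 0.45 ns, so the second ADD costs almost exactly one extra baseline-sized step per iteration rather than a small fraction of one.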

 

The same thing happens with 4 NOPs instead of 2 additions, which would seem to indicate that this has something to do with the amount of memory the instructions take up. It doesn't happen with multiply, however, so I'm not really sure.

 

I also noticed that this:

    timing_loop:
        MOV EAX, 456464646
        MUL EAX
        DEC ECX
        JNZ timing_loop

runs twice as fast as this:

    timing_loop:
        MUL EAX
        DEC ECX
        JNZ timing_loop

Is that because of superscalar execution/pre-execution or something?


I'm not quite sure, but I'm guessing that the problem is that you need to finish some of the pipeline stages to be able to forward the result and execute the second ADD instruction.

But why is a single add basically taking 0 time though?

If I add another ADD instruction, it adds about an extra second.

 

NOPs and additions can take different numbers of cycles to finish, and that's probably why you're getting 4 NOPs = 2 ADDs.

Well, four NOPs took ~1.9 seconds, seven NOPs took ~1.9 seconds and eight NOPs took ~2.9 seconds.

 

One NOP is one byte, one addition instruction is 2 bytes.

 

Because of this, I suspect the additional time is due to x86 fetching instructions in 4-byte chunks.
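The NOP timings can also be checked against a simple slot-counting model. This is only a sketch of one hypothesis, not something anyone in the thread confirmed: it assumes the processor handles a fixed number of instruction slots per cycle (`width`), that the `DEC`/`JNZ` pair costs either one or two slots (`overhead`), and that each observed time is a whole multiple of the ~0.976 s bare loop. All names are hypothetical.

```python
from math import ceil

# NOP count -> observed run time as a multiple of the ~0.976 s bare loop
# (0 NOPs: 976 ms, 4 and 7 NOPs: ~1.9 s, 8 NOPs: ~2.9 s).
observed = {0: 1, 4: 2, 7: 2, 8: 3}

def predicted_cycles(nops, width, overhead):
    """Cycles per iteration if `width` instruction slots fit in one cycle.

    `overhead` is the number of slots assumed for the DEC/JNZ pair.
    """
    return ceil((nops + overhead) / width)

# Try every small width, with DEC/JNZ counted as one slot or as two.
matches = [
    (width, overhead)
    for width in range(1, 9)
    for overhead in (1, 2)
    if all(predicted_cycles(n, width, overhead) == c
           for n, c in observed.items())
]
print(matches)  # [(4, 1)]
```

Under these assumptions the only combination that reproduces all four measurements is four slots per cycle with `DEC`/`JNZ` taking a single slot. That fits a count of instructions per cycle rather than bytes per fetch, but it is a fit to four data points and says nothing certain about the actual hardware.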


Adds are pretty fast, but two of them can't use the same register at the same time. You need to think about the pipelining effects of those operations.

Oh, you're right. The processor probably tries to do the arithmetic operations in parallel and can't because of that.
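The register-dependency idea can be sketched as a toy model. It assumes that `ADD` and `DEC` each take one cycle to produce their result, that instructions touching different registers can overlap freely, and that `JNZ` adds nothing to the chain carried between iterations. The table of loop bodies and the function name are illustrative only.

```python
# Each loop body as (instruction, register it both reads and writes) pairs.
# JNZ is left out: it reads the flags from DEC but feeds nothing forward.
loops = {
    "DEC only": [("DEC", "ECX")],
    "one ADD":  [("ADD", "EDX"), ("DEC", "ECX")],
    "two ADDs": [("ADD", "EDX"), ("ADD", "EDX"), ("DEC", "ECX")],
}

LATENCY = {"ADD": 1, "DEC": 1}  # assumed: one cycle per result

def cycles_per_iteration(body):
    """Length of the longest per-register dependency chain in one iteration."""
    chain = {}
    for op, reg in body:
        # Every instruction on the same register must wait for the previous one.
        chain[reg] = chain.get(reg, 0) + LATENCY[op]
    return max(chain.values())

for name, body in loops.items():
    print(name, cycles_per_iteration(body))
```

With these assumptions the model gives 1, 1 and 2 cycles per iteration, which lines up with the 976 ms, 978 ms and 1947 ms measurements: the single ADD overlaps with DEC ECX, while the second ADD has to wait for the first.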

 

That doesn't really explain the NOPs though.
