Force application to run on single CPU core using C++

ClobberXD · July 8, 2016

I want to force an application to run on only one CPU core (which core doesn't matter) in C++. How do I achieve this?

Thanks!

vanished · July 8, 2016

Unless you explicitly create multiple threads, the application will do all computations on one thread. You can force this onto one particular core by using "Set affinity" in Task Manager.

If you are coding for Windows, you can also call SetProcessAffinityMask() from inside the program, so that it always runs on one particular core.

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686223(v=vs.85).aspx

Related is SetThreadAffinityMask()

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686247(v=vs.85).aspx
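A minimal sketch of pinning the whole process to one core, since a code example is asked for further down. The helper name pin_process_to_core is made up for illustration. The Windows branch uses SetProcessAffinityMask() as linked above; the Linux branch (sched_setaffinity) is an extra assumption that nobody in this thread asked for, added only so the sketch also builds off Windows.

```cpp
#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: restrict the current process to one logical core.
// Returns true on success, false if the core index is out of range or the
// operating system rejects the request.
bool pin_process_to_core(unsigned core)
{
#if defined(_WIN32)
    if (core >= sizeof(DWORD_PTR) * 8)
        return false;  // the mask has only one bit per logical core
    const DWORD_PTR mask = static_cast<DWORD_PTR>(1) << core;
    return SetProcessAffinityMask(GetCurrentProcess(), mask) != 0;
#elif defined(__linux__)
    if (core >= static_cast<unsigned>(CPU_SETSIZE))
        return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    // pid 0 means "the caller"; threads created afterwards inherit the mask.
    return sched_setaffinity(0, sizeof(set), &set) == 0;
#else
    (void)core;
    return false;  // no affinity API assumed on other platforms
#endif
}
```

Calling pin_process_to_core(0) at the top of main(), before any threads are created, should keep the whole program on core 0. If the call returns false, the program simply keeps running with its default affinity.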

ClobberXD · July 8, 2016

1 hour ago, Ryan_Vickers said:

Unless you explicitly create multiple threads, the application will do all computations on one thread. You can force this onto one particular core by using "Set affinity" in Task Manager.

If you are coding for Windows, you can also call SetProcessAffinityMask() from inside the program, so that it always runs on one particular core.

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686223(v=vs.85).aspx

Related is SetThreadAffinityMask()

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686247(v=vs.85).aspx

Can you also give me an example snippet of code, for how to use this function? Also what if I want to set the process' affinity to only one core, but it can be any one - how can it be done?

Thanks a lot!

vanished · July 8, 2016

4 minutes ago, Anand_Geforce said:

Can you also give me an example snippet of code, for how to use this function? Also what if I want to set the process' affinity to only one core, but it can be any one - how can it be done?

Thanks a lot!

If you want it to only use one core worth of power and don't care which core it is, that's basically the default behaviour. You'd have to put in work to accomplish anything else, but that is how it will run by default. I mean, the Windows scheduler will probably spread out the load so it looks like it's loading all cores a little bit, but it's the equivalent of one core being pinned.

ClobberXD · July 8, 2016

3 minutes ago, Ryan_Vickers said:

If you want it to only use one core worth of power and don't care which core it is, that's basically the default behaviour. You'd have to put in work to accomplish anything else, but that is how it will run by default. I mean, the Windows scheduler will probably spread out the load so it looks like it's loading all cores a little bit, but it's the equivalent of one core being pinned.

Is the aggregate load accurately equal to one core being utilized fully? Are you sure? I personally hope that is true, as 2 out of two cores were utilized around 50% each, and task manager specified the total CPU load is almost always 50%!

vanished · July 8, 2016

1 minute ago, Anand_Geforce said:

Is the aggregate load accurately equal to one core being utilized fully? Are you sure? I personally hope that is true, as 2 out of two cores were utilized around 50% each, and task manager specified the total CPU load is almost always 50%!

In truth, it will be an immeasurable fraction higher than an even split, like if you take one thread and spread it out over two cores, each will be used ever so slightly more than 50% of the time, due to context switching wasting performance, but yeah it's basically a perfect split for all intents and purposes.

ClobberXD · July 9, 2016

On 8/7/2016 at 3:51 PM, Ryan_Vickers said:

In truth, it will be an immeasurable fraction higher than an even split, like if you take one thread and spread it out over two cores, each will be used ever so slightly more than 50% of the time, due to context switching wasting performance, but yeah it's basically a perfect split for all intents and purposes.

You say that is the default behaviour, but both my single-core benchmark and multi-core benchmark produce the same results... (CPU: Intel Pentium G2030 2-core 3.00 GHz.) And the multi-core runs on both the CPUs, pushing the CPU to 100% load, while single-core benchmark pushes the CPU to right around 50% load (fluctuates between 49% - 51%). The benchmark score is always 167772, no matter which benchmark I run, and it's the same even if I engage the CPU with other processes as well.

Are you able to figure out what's going on? Do you need any more info, stats?

Thanks!

vanished · July 9, 2016

2 minutes ago, Anand_Geforce said:

You say that is the default behaviour, but both my single-core benchmark and multi-core benchmark produce the same results... (CPU: Intel Pentium G2030 2-core 3.00 GHz.) And the multi-core runs on both the CPUs, pushing the CPU to 100% load, while single-core benchmark pushes the CPU to right around 50% load (fluctuates between 49% - 51%). The benchmark score is always 167772, no matter which benchmark I run, and it's the same even if I engage the CPU with other processes as well.

Are you able to figure out what's going on? Do you need any more info, stats?

Thanks!

Which benchmark is that you are running? That sounds very strange...

ClobberXD · July 9, 2016

30 minutes ago, Ryan_Vickers said:

Which benchmark is that you are running? That sounds very strange...

I've created my own benchmarking utility - it's for the single-core benchmark, that I asked about setting processor affinity in C++... Here's the crux of the code:

#include <iostream>
#include <conio.h>
#include <ctime>
#include <thread>
#include <vector>

using namespace std;

time_t EndTime;
float IterCount = 0; // Number of iterations, achieved in a minute - core of the benchmark!

void RunBench();

int main()
{
	const unsigned int CoreCount = thread::hardware_concurrency();

	cout << "Press any key to begin!";
	getch();

	// End time determined.
	EndTime = time(NULL) + 60; // 1 minute from now...

/************************** THIS BLOCK IS REMOVED FOR SINGLE-CORE VARIANT ****************************/

	// Initialize n-1 threads; n = CPU CoreCount; Benchmarking process.
	vector<thread> t;
	for (unsigned int i = 0; i < (CoreCount - 1); i++)
		t.push_back(thread(RunBench));

/*****************************************************************************************************/

	RunBench(); // In Main thread

/************************** THIS BLOCK IS REMOVED FOR SINGLE-CORE VARIANT ****************************/

	for (unsigned int i = 0; i < (CoreCount - 1); i++)  // Wait till child threads are finished.
		t[i].join();

/*****************************************************************************************************/

	cout << "Score: " << (IterCount/100); // Note the '/100' thing...

	return 0;
}

void RunBench()
{
	int a, i = 0;
	long long fact = 1;

	while (time(NULL) < EndTime)
	{
		IterCount++;

		for (a = 1; a <= i; a++)
			fact = fact*a;
		i++;
	}
}

For both variants, the score is always '167772' (no decimal point or digits after it, although the score is equal to (no. of iterations / 100)) - weird...

ClobberXD · July 9, 2016

42 minutes ago, Ryan_Vickers said:

Which benchmark is that you are running? That sounds very strange...

I'll also try to explicitly set process affinity and verify the results...

vanished · July 9, 2016

3 minutes ago, Anand_Geforce said:

I'll also try to explicitly set process affinity and verify the results...

I've been looking over the code and I haven't spotted an obvious flaw yet, but I'm convinced there is a bug somewhere. The fact that one test pins the CPU and the other does not proves that you are launching the desired number of threads. If the score doesn't reflect this, there's a bug somewhere.

ClobberXD · July 9, 2016

17 minutes ago, Ryan_Vickers said:

I've been looking over the code and I haven't spotted an obvious flaw yet, but I'm convinced there is a bug somewhere. The fact that one test pins the CPU and the other does not proves that you are launching the desired number of threads. If the score doesn't reflect this, there's a bug somewhere.

BTW, how exactly do you set a bitmask for the SetProcessAffinityMask() ? I'm not very well-versed with bitwise manipulation...

vanished · July 9, 2016

1 minute ago, Anand_Geforce said:

BTW, how exactly do you set a bitmask for the SetProcessAffinityMask() ? I'm not very well-versed with bitwise manipulation...

Where core is an int equal to or greater than 0, and hThread is the handle to the thread,

SetThreadAffinityMask(hThread, static_cast<DWORD_PTR>(1) << core);

Should do what you want
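For anyone not used to bitwise manipulation, here is a small sketch of how such a mask is put together. Bit n of the mask stands for logical core n, so 1 << 0 is core 0, 1 << 1 is core 1, and so on. The helper names core_bit and cores_mask are invented for this illustration, and std::uint64_t is used as a stand-in for the DWORD_PTR type that the Windows functions take.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Bit n of an affinity mask stands for logical core n. Shifting 1 left by
// `core` positions sets exactly that one bit.
std::uint64_t core_bit(unsigned core)
{
    assert(core < 64);  // shifting a 64-bit value by 64 or more is undefined
    return std::uint64_t{1} << core;
}

// OR several single-core bits together to allow more than one core.
std::uint64_t cores_mask(std::initializer_list<unsigned> cores)
{
    std::uint64_t mask = 0;
    for (unsigned core : cores)
        mask |= core_bit(core);
    return mask;
}
```

So core_bit(0) is 1 (binary 0001), core_bit(3) is 8 (binary 1000), and cores_mask({0, 2}) is 5 (binary 0101), meaning "core 0 or core 2".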

ClobberXD · July 9, 2016

3 minutes ago, Ryan_Vickers said:
Where core is an int equal to or greater than 0, and hThread is the handle to the thread,
SetThreadAffinityMask(hThread, static_cast<DWORD_PTR>(1) << core);
Should do what you want

And core is the actual core number?

vanished · July 9, 2016

1 minute ago, Anand_Geforce said:

And core is the actual core number?

Yes. This is only useful if you want each thread assigned to exactly one core, but that is how it should be done. There is no reason to assign one thread to multiple cores.

ClobberXD · July 9, 2016

1 hour ago, Ryan_Vickers said:

Yes. This is only useful if you want each thread assigned to exactly one core, but that is how it should be done. There is no reason to assign one thread to multiple cores.

Right! Explicitly assigning process affinity worked, but the score was the same: 1677728 (except for the '8' - what???). Pulled every string possible - even modified

float IterCount = 0; // as

long IterCount = 0; // Worked like a treat!

The 'Score' probably exceeded the limit for float, as the actual score for the single-core variant, is now '2127772584' ("What the f*** ?"). The multi-threaded variant is bound to spew at least 2 times more shit on my PC...

Thanks a ton!

vanished · July 9, 2016

2 hours ago, Anand_Geforce said:

Right! Explicitly assigning process affinity worked, but the score was the same: 1677728 (except for the '8' - what???). Pulled every string possible - even modified

float IterCount = 0; // as

long IterCount = 0; // Worked like a treat!

The 'Score' probably exceeded the limit for float, as the actual score for the single-core variant, is now '2127772584' ("What the f*** ?"). The multi-threaded variant is bound to spew at least 2 times more shit on my PC...

Thanks a ton!

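Interesting. I doubt it exceeded the limit for float, seeing as that can go up to somewhere around 10³⁸. What most likely happened is that the number became so large that adding one was immediately rounded away, so it was no longer incrementing. A float has a 24-bit significand, so this happens at 2²⁴ = 16,777,216, and 16,777,216 / 100 printed with the default six significant digits is exactly the 167772 you kept seeing. Try abandoning the float for a 64-bit integer, or at least a double, and see if that helps.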
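A short sketch of that rounding effect, assuming the usual 32-bit IEEE 754 float. The function names count_in_float and format_score are made up for this example; the second one mirrors the "IterCount/100" output line from the benchmark.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Increment a float counter `iterations` times, the way the benchmark does.
// Once the counter reaches 2^24 = 16777216, adding 1.0f rounds back to the
// same value, so the counter silently stops growing.
float count_in_float(std::uint64_t iterations)
{
    float counter = 0.0f;
    for (std::uint64_t i = 0; i < iterations; ++i)
        counter += 1.0f;
    return counter;
}

// Format the score like the benchmark does: divide by 100 and stream it
// with the default precision of six significant digits.
std::string format_score(float counter)
{
    std::ostringstream out;
    out << (counter / 100);
    return out.str();
}
```

With these, count_in_float(20000000) returns 16777216 rather than 20000000, and format_score(16777216.0f) yields the text "167772". A 64-bit integer counter has no such plateau within any realistic run time.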

Unimportant · July 9, 2016

Race-conditions: https://en.wikipedia.org/wiki/Race_condition#Software

All threads access "IterCount" without proper locking. Either lock the variable with a mutex before accessing it or, probably better in this case, have each thread keep its own IterCount variable and add them all up when the threads are done.

Another, smaller problem with the code is the fact that std::time_t's encoding is unspecified in the C++ standard, it does not have to be seconds as your program assumes. While it will be seconds on most systems and is probably not the cause of the problem here, it's bad practice to assume things, resulting in unportable code.
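The "one counter per thread" idea could look roughly like the sketch below. It is an illustration, not the benchmark itself: the function name sum_private_counters is invented, and each worker runs a fixed number of iterations instead of a timed loop so that the result is predictable.

```cpp
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Each worker counts in a private local variable and writes its result to
// its own slot exactly once. No two threads ever touch the same counter,
// so no mutex or atomic is needed.
std::uint64_t sum_private_counters(unsigned thread_count,
                                   std::uint64_t iterations_each)
{
    std::vector<std::uint64_t> counts(thread_count, 0);
    std::vector<std::thread> threads;

    for (unsigned i = 0; i < thread_count; ++i) {
        threads.emplace_back([&counts, i, iterations_each] {
            std::uint64_t local = 0;
            for (std::uint64_t n = 0; n < iterations_each; ++n)
                ++local;            // stands in for one benchmark iteration
            counts[i] = local;      // single write to this thread's slot
        });
    }

    for (auto& t : threads)
        t.join();                   // results are safe to read after join()

    return std::accumulate(counts.begin(), counts.end(), std::uint64_t{0});
}
```

Counting in a local variable and storing it once at the end also keeps the threads from repeatedly writing to neighbouring vector elements, which could otherwise slow them down through cache-line sharing.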

mathijs727 · July 9, 2016

First of all: use an integer type (preferably 64 bit: uint64_t) for your counter.

Secondly: use atomic variables (a std::atomic<uint64_t> counter, and counter.fetch_add(1) instead of counter++ on a plain variable), because your current code is not thread safe.
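A minimal sketch of the atomic-counter approach. The function name count_with_atomic is hypothetical, and a fixed iteration count per thread replaces the benchmark's timed loop so the total is predictable.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// All threads share one counter. fetch_add makes every increment atomic,
// so no update is lost even when several threads increment simultaneously.
std::uint64_t count_with_atomic(unsigned thread_count,
                                std::uint64_t iterations_each)
{
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> threads;

    for (unsigned i = 0; i < thread_count; ++i) {
        threads.emplace_back([&counter, iterations_each] {
            for (std::uint64_t n = 0; n < iterations_each; ++n)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }

    for (auto& t : threads)
        t.join();

    return counter.load();
}
```

Relaxed ordering is enough here because only the final total matters and it is read after every thread has been joined. The shared counter is still contended by all threads, which costs time on every increment.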

ClobberXD · July 10, 2016

7 hours ago, Unimportant said:

Race-conditions: https://en.wikipedia.org/wiki/Race_condition#Software

All threads access "IterCount" without proper locking. Either lock the variable with a mutex before accessing it or, probably better in this case, have each thread keep its own IterCount variable and add them all up when the threads are done.

Another, smaller problem with the code is the fact that std::time_t's encoding is unspecified in the C++ standard, it does not have to be seconds as your program assumes. While it will be seconds on most systems and is probably not the cause of the problem here, it's bad practice to assume things, resulting in unportable code.

I'll implement separate counters for each thread... Nice idea! How else do I run the code for exactly a minute? I know of only time_t...

mathijs727 · July 10, 2016

4 hours ago, Anand_Geforce said:

I'll implement separate counters for each thread... Nice idea! How else do I run the code for exactly a minute? I know of only time_t...

That is a good solution too, as it removes any locking. Use thread_local, or make a vector of "counters" and use the thread number to index into that vector.

Using time_t is outdated, take a look at std::chrono.
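A small sketch of a timed loop with std::chrono, assuming steady_clock is the right clock for measuring an interval (it never jumps backwards). The function name spin_for is made up for this example.

```cpp
#include <chrono>
#include <cstdint>

// Run a loop until `duration` has elapsed and return how many iterations
// fit into that time. The deadline is computed once, up front.
std::uint64_t spin_for(std::chrono::milliseconds duration)
{
    const auto deadline = std::chrono::steady_clock::now() + duration;
    std::uint64_t iterations = 0;

    while (std::chrono::steady_clock::now() < deadline)
        ++iterations;               // one unit of benchmark work goes here

    return iterations;
}
```

For the benchmark, spin_for(std::chrono::minutes(1)) would run for one minute; the conversion from minutes to milliseconds happens implicitly. Unlike time(NULL), this does not depend on how time_t is encoded.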

ClobberXD · July 10, 2016

14 minutes ago, mathijs727 said:

That is a good solution too, as it removes any locking. Use thread_local, or make a vector of "counters" and use the thread number to index into that vector.

Using time_t is outdated, take a look at std::chrono.

Can you show me a simple and relevant example of both? Thanks!

mathijs727 · July 10, 2016

32 minutes ago, Anand_Geforce said:

Can you show me a simple and relevant example of both? Thanks!

I'm currently on vacation. I'll post an example tomorrow (when I'm home).

This is the best I can currently do for std::chrono:

http://stackoverflow.com/questions/14391327/how-to-get-duration-as-int-millis-and-float-seconds-from-chrono

For the concurrency problem: replace "float IterCount" with "std::atomic<std::uint_fast64_t> IterCount" (and #include <atomic> and <cstdint>), and replace "IterCount++" with "IterCount.fetch_add(1)".

Unimportant · July 10, 2016

@mathijs727 The extra overhead of atomic operations (which still need some form of synchronization between cores in the background anyway) will probably skew the 'benchmark' result he's trying to create, though.

@Anand_Geforce I already gave an example of using std::chrono in this thread of yours: https://linustechtips.com/main/topic/606102-running-a-loop-for-exactly-30-seconds-c/#comment-7851462

ClobberXD · July 10, 2016

15 minutes ago, Unimportant said:

@mathijs727 The extra overhead of atomic operations (which still need some form of synchronization between cores in the background anyway) will probably skew the 'benchmark' result he's trying to create, though.

@Anand_Geforce I already gave an example of using std::chrono in this thread of yours: https://linustechtips.com/main/topic/606102-running-a-loop-for-exactly-30-seconds-c/#comment-7851462

It's in that very thread, that another member suggested to use <ctime> - and it was very easy to use! Sadly, it's deemed unreliable, and hence I'll use std::chrono (it seems very complicated, that's what's making me hesitant...).
