Force application to run on single CPU core using C++

ClobberXD
Solved by vanished

I want to force an application to run on only one CPU core (which core doesn't matter) in C++. How do I achieve this?

 

Thanks!

Nothing to see here ;)

Unless you explicitly create multiple threads, the application will do all computations on one thread.  You can force this onto one particular physical core by using set affinity in task manager.

 

If you are coding for Windows,

You can also use SetProcessAffinityMask() to ensure the program will always launch on one core in particular.

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686223(v=vs.85).aspx

Related is SetThreadAffinityMask() 

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686247(v=vs.85).aspx
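As a rough sketch of how SetProcessAffinityMask() could be used when any one core will do: query the cores the process is currently allowed to use with GetProcessAffinityMask(), keep only the lowest set bit, and pass that back. The helper names lowest_allowed_core and pin_process_to_one_core are made up for illustration, and the non-Windows branch is only there so the snippet compiles elsewhere.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Isolate the lowest set bit of an affinity mask, i.e. pick "any one"
// of the cores the process is currently allowed to run on.
std::uint64_t lowest_allowed_core(std::uint64_t mask) {
    return mask & (~mask + 1);
}

// Hypothetical helper: pin the current process to a single allowed core.
// Returns true on success; always false on non-Windows builds.
bool pin_process_to_one_core() {
#ifdef _WIN32
    DWORD_PTR process_mask = 0, system_mask = 0;
    HANDLE self = GetCurrentProcess();
    if (!GetProcessAffinityMask(self, &process_mask, &system_mask))
        return false;
    DWORD_PTR one_core =
        static_cast<DWORD_PTR>(lowest_allowed_core(process_mask));
    return one_core != 0 && SetProcessAffinityMask(self, one_core) != 0;
#else
    return false; // SetProcessAffinityMask is a Windows-only API
#endif
}
```

Calling pin_process_to_one_core() once at the start of main(), before any worker threads are created, should be enough for a single-core run.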

 

 

Solve your own audio issues  |  First Steps with RPi 3  |  Humidity & Condensation  |  Sleep & Hibernation  |  Overclocking RAM  |  Making Backups  |  Displays  |  4K / 8K / 16K / etc.  |  Do I need 80+ Platinum?

If you can read this you're using the wrong theme.  You can change it at the bottom.

1 hour ago, Ryan_Vickers said:

Unless you explicitly create multiple threads, the application will do all computations on one thread.  You can force this onto one particular physical core by using set affinity in task manager.

 

If you are coding for Windows,

You can also use SetProcessAffinityMask() to ensure the program will always launch on one core in particular.

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686223(v=vs.85).aspx

Related is SetThreadAffinityMask() 

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686247(v=vs.85).aspx

 

 

Can you also give me an example snippet of code, for how to use this function? Also what if I want to set the process' affinity to only one core, but it can be any one - how can it be done?

 

Thanks a lot!

4 minutes ago, Anand_Geforce said:

Can you also give me an example snippet of code, for how to use this function? Also what if I want to set the process' affinity to only one core, but it can be any one - how can it be done?

 

Thanks a lot!

If you want it to only use one core worth of power and don't care which core it is, that's basically the default behaviour.  You'd have to put in work to accomplish anything else, but that is how it will run by default.  I mean, the Windows scheduler will probably spread out the load so it looks like it's loading all cores a little bit, but it's the equivalent of one core being pinned.

3 minutes ago, Ryan_Vickers said:

If you want it to only use one core worth of power and don't care which core it is, that's basically the default behaviour.  You'd have to put in work to accomplish anything else, but that is how it will run by default.  I mean, the Windows scheduler will probably spread out the load so it looks like it's loading all cores a little bit, but it's the equivalent of one core being pinned.

Is the aggregate load accurately equal to one core being utilized fully? Are you sure? I personally hope that is true, as 2 out of two cores were utilized around 50% each, and task manager specified the total CPU load is almost always 50%!

1 minute ago, Anand_Geforce said:

Is the aggregate load accurately equal to one core being utilized fully? Are you sure? I personally hope that is true, as 2 out of two cores were utilized around 50% each, and task manager specified the total CPU load is almost always 50%!

In truth, it will be an immeasurable fraction higher than an even split, like if you take one thread and spread it out over two cores, each will be used ever so slightly more than 50% of the time, due to context switching wasting performance, but yeah it's basically a perfect split for all intents and purposes.

On 8/7/2016 at 3:51 PM, Ryan_Vickers said:

In truth, it will be an immeasurable fraction higher than an even split, like if you take one thread and spread it out over two cores, each will be used ever so slightly more than 50% of the time, due to context switching wasting performance, but yeah it's basically a perfect split for all intents and purposes.

You say that is the default behaviour, but both my single-core benchmark and multi-core benchmark produce the same results... (CPU: Intel Pentium G2030 2-core 3.00 GHz.) And the multi-core runs on both the CPUs, pushing the CPU to 100% load, while single-core benchmark pushes the CPU to right around 50% load (fluctuates between 49% - 51%). The benchmark score is always 167772, no matter which benchmark I run, and it's the same even if I engage the CPU with other processes as well.

 

Are you able to figure out what's going on? Do you need any more info, stats?

 

Thanks!

2 minutes ago, Anand_Geforce said:

You say that is the default behaviour, but both my single-core benchmark and multi-core benchmark produce the same results... (CPU: Intel Pentium G2030 2-core 3.00 GHz.) And the multi-core runs on both the CPUs, pushing the CPU to 100% load, while single-core benchmark pushes the CPU to right around 50% load (fluctuates between 49% - 51%). The benchmark score is always 167772, no matter which benchmark I run, and it's the same even if I engage the CPU with other processes as well.

 

Are you able to figure out what's going on? Do you need any more info, stats?

 

Thanks!

Which benchmark is that you are running?  That sounds very strange...

30 minutes ago, Ryan_Vickers said:

Which benchmark is that you are running?  That sounds very strange...

I've created my own benchmarking utility - it's for the single-core benchmark, that I asked about setting processor affinity in C++... Here's the crux of the code:

#include <iostream>
#include <conio.h>   // getch(), Windows/DOS only
#include <ctime>     // time(), time_t
#include <thread>
#include <vector>
using namespace std;

time_t EndTime;
float IterCount = 0; // Number of iterations, achieved in a minute - core of the benchmark!

void RunBench();

int main()
{
  unsigned const int CoreCount = thread::hardware_concurrency();
  
	cout << "Press any key to begin!";
	getch();

	// End time determined.
	EndTime = time(NULL) + 60; // 1 minute from now...

/************************** THIS BLOCK IS REMOVED FOR SINGLE-CORE VARIANT ****************************/  

	// Initialize n-1 threads; n = CPU CoreCount; Benchmarking process.
	vector<thread> t;
	for (unsigned int i = 0; i < (CoreCount - 1); i++)
		t.push_back(thread(RunBench));

/*****************************************************************************************************/
  
	RunBench(); // In Main thread

/************************** THIS BLOCK IS REMOVED FOR SINGLE-CORE VARIANT ****************************/  

	for (unsigned int i = 0; i < (CoreCount - 1); i++)  // Wait till child threads are finished.
		t[i].join();

/*****************************************************************************************************/
  
	cout << "Score: " << (IterCount/100); // Note the '/100' thing...

	return 0;
}

void RunBench()
{
	int a, i = 0;
	long long fact = 1;

	while (time(NULL) < EndTime)
	{
		IterCount++;

		for (a = 1; a <= i; a++)
			fact = fact*a;
		i++;
	}
}

For both variants, the score is always '167772', with no decimal point or digits after it, even though the score is (number of iterations / 100). Weird...

42 minutes ago, Ryan_Vickers said:

Which benchmark is that you are running?  That sounds very strange...

I'll also try to explicitly set process affinity and verify the results...

3 minutes ago, Anand_Geforce said:

I'll also try to explicitly set process affinity and verify the results...

I've been looking over the code and I haven't spotted an obvious flaw yet but I'm convinced it is a bug somewhere.  The fact that one test pins the CPU and one did not proves that you are launching the desired number of threads.  If the score doesn't reflect this, there's a bug somewhere.

17 minutes ago, Ryan_Vickers said:

I've been looking over the code and I haven't spotted an obvious flaw yet but I'm convinced it is a bug somewhere.  The fact that one test pins the CPU and one did not proves that you are launching the desired number of threads.  If the score doesn't reflect this, there's a bug somewhere.

BTW, how exactly do you set a bitmask for the SetProcessAffinityMask() ? I'm not very well-versed with bitwise manipulation...

1 minute ago, Anand_Geforce said:

BTW, how exactly do you set a bitmask for the SetProcessAffinityMask() ? I'm not very well-versed with bitwise manipulation...

Where core is an int equal to or greater than 0, and hThread is the handle to the thread, 

SetThreadAffinityMask(hThread, 1<<core);

Should do what you want
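To spell out the bitmask part a little: bit n of the mask stands for logical processor n, so the mask for several cores is just those bits OR-ed together. Below is a small sketch; core_mask and pin_current_thread are illustrative names, not Windows API functions. It shifts a 64-bit value instead of a plain int, since an int shift cannot address cores 32 and up.

```cpp
#include <cstdint>
#include <initializer_list>
#ifdef _WIN32
#include <windows.h>
#endif

// Build an affinity mask: bit n set means "logical processor n is allowed".
// Core numbers of 64 or more are ignored in this sketch.
std::uint64_t core_mask(std::initializer_list<unsigned> cores) {
    std::uint64_t mask = 0;
    for (unsigned core : cores)
        if (core < 64) mask |= std::uint64_t{1} << core;
    return mask;
}

// Hypothetical helper: pin the calling thread to one core.
// Returns true on success; always false on non-Windows builds.
bool pin_current_thread(unsigned core) {
#ifdef _WIN32
    DWORD_PTR mask = static_cast<DWORD_PTR>(core_mask({core}));
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return mask != 0 && SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#else
    (void)core;
    return false; // SetThreadAffinityMask is a Windows-only API
#endif
}
```

For example, core_mask({0, 2}) gives binary 101, which allows cores 0 and 2 only.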

3 minutes ago, Ryan_Vickers said:

Where core is an int equal to or greater than 0, and hThread is the handle to the thread, 


SetThreadAffinityMask(hThread, 1<<core);

Should do what you want

And core is the actual core number?

1 minute ago, Anand_Geforce said:

And core is the actual core number?

yes.  This will only be useful if you only want each thread assigned to one core.  But that is how it should be done.  No reason to assign 1 thread to multiple cores.

1 hour ago, Ryan_Vickers said:

yes.  This will only be useful if you only want each thread assigned to one core.  But that is how it should be done.  No reason to assign 1 thread to multiple cores.

Right! Explicitly assigning process affinity worked, but the score was the same: 1677728 (except for the '8' - what???). Pulled every string possible - even modified

float IterCount = 0; // as

long IterCount = 0; // Worked like a treat!

 

The 'Score' probably exceeded the limit for float, as the actual score for the single-core variant, is now '2127772584' ("What the f*** ?"). The multi-threaded variant is bound to spew at least 2 times more shit on my PC... xD

 

Thanks a ton!

2 hours ago, Anand_Geforce said:

Right! Explicitly assigning process affinity worked, but the score was the same: 1677728 (except for the '8' - what???). Pulled every string possible - even modified

float IterCount = 0; // as

long IterCount = 0; // Worked like a treat!

 

The 'Score' probably exceeded the limit for float, as the actual score for the single-core variant, is now '2127772584' ("What the f*** ?"). The multi-threaded variant is bound to spew at least 2 times more shit on my PC... xD

 

Thanks a ton!

Interesting. I doubt it exceeded the limit for float, seeing as that can go up to somewhere around 10^38. What may have happened is that the number became so large that adding one was immediately rounded away, so it stopped incrementing. Try abandoning the float for a long int (like 64 bit) or at least a double and see if that helps.
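Here is a small sketch of that rounding effect, assuming the usual 32-bit IEEE float. A float has 24 bits of precision, so counting up by one stalls at 2^24 = 16777216. Divided by 100 and printed with cout's default 6 significant digits, that would come out as 167772, which matches the score reported earlier in the thread. The function names are made up for the demo.

```cpp
#include <sstream>
#include <string>

// Count upward in a float until adding 1 no longer changes the value.
// volatile forces each intermediate result to be stored as a real 32-bit float.
float float_counter_ceiling() {
    volatile float counter = 0.0f;
    for (;;) {
        volatile float next = counter + 1.0f;
        if (next == counter) return counter; // increment was rounded away
        counter = next;
    }
}

// Format a value the way `cout << value` does by default
// (6 significant digits).
std::string default_format(float value) {
    std::ostringstream out;
    out << value;
    return out.str();
}
```

With an integer counter the increment is always exact, which is why switching the type made the score start moving again.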

Race-conditions: https://en.wikipedia.org/wiki/Race_condition#Software

 

All threads access "IterCount" without proper locking. Either lock the variable with a mutex before accessing it or, probably better in this case, have each thread keep its own IterCount variable and add them all up when the threads are done.
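A minimal sketch of the per-thread counter idea could look like the following. To keep the example deterministic it runs a fixed number of iterations per thread instead of a timed loop (that part is my assumption, not how the benchmark works), and count_with_private_counters is just an illustrative name.

```cpp
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Each worker counts in its own local variable and writes its own slot once,
// so no two threads ever touch the same counter and no lock is needed.
std::uint64_t count_with_private_counters(unsigned thread_count,
                                          std::uint64_t iterations_per_thread) {
    std::vector<std::uint64_t> counters(thread_count, 0);
    std::vector<std::thread> workers;
    for (unsigned id = 0; id < thread_count; ++id) {
        workers.emplace_back([&counters, id, iterations_per_thread] {
            std::uint64_t local = 0; // private to this thread
            for (std::uint64_t n = 0; n < iterations_per_thread; ++n)
                ++local;
            counters[id] = local;    // single write at the end
        });
    }
    for (std::thread& worker : workers)
        worker.join();               // after join, reading counters is safe
    return std::accumulate(counters.begin(), counters.end(), std::uint64_t{0});
}
```

Writing each slot only once at the end also keeps neighbouring counters from bouncing between cores' caches during the hot loop.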

 

Another, smaller problem with the code is that std::time_t's encoding is unspecified in the C++ standard; it does not have to be seconds, as your program assumes. While it will be seconds on most systems and is probably not the cause of the problem here, it's bad practice to assume things, as it results in unportable code.

 

First of all: use an integer type (preferably 64 bit: uint64_t) for your counter.

Secondly: use atomic variables (std::atomic<uint64_t> counter and counter.fetch_add(1) instead of counter++) because your current code is not thread safe.

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro

7 hours ago, Unimportant said:

Race-conditions: https://en.wikipedia.org/wiki/Race_condition#Software

 

All threads access "IterCount" without proper locking. Either lock the variable with a mutex before accessing it or, probably better in this case, have each thread keep its own IterCount variable and add them all up when the threads are done.

 

Another, smaller problem with the code is that std::time_t's encoding is unspecified in the C++ standard; it does not have to be seconds, as your program assumes. While it will be seconds on most systems and is probably not the cause of the problem here, it's bad practice to assume things, as it results in unportable code.

 

I'll implement separate counters for each thread... Nice idea! How else do I run the code for exactly a minute? I know of only time_t...

4 hours ago, Anand_Geforce said:

I'll implement separate counters for each thread... Nice idea! How else do I run the code for exactly a minute? I know of only time_t...

That is a good solution too, as it removes any locking. Use thread_local, or make a vector of "counters" and use the thread number to index into that vector.

Using time_t is outdated; take a look at std::chrono.
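For the timing side, a std::chrono version of "loop until the time is up" might look roughly like this. It uses std::chrono::steady_clock, which is monotonic and has explicit units; iterations_within is a made-up name, and the empty loop body stands in for the real benchmark work.

```cpp
#include <chrono>
#include <cstdint>

// Spin until `duration` has elapsed on the monotonic clock and
// return how many loop iterations fit into that window.
std::uint64_t iterations_within(std::chrono::milliseconds duration) {
    const auto deadline = std::chrono::steady_clock::now() + duration;
    std::uint64_t iterations = 0;
    while (std::chrono::steady_clock::now() < deadline)
        ++iterations; // benchmark work would go here
    return iterations;
}
```

A one-minute run would then be iterations_within(std::chrono::minutes(1)), since minutes convert to milliseconds implicitly.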

14 minutes ago, mathijs727 said:

That is a good solution too, as it removes any locking. Use thread_local, or make a vector of "counters" and use the thread number to index into that vector.

Using time_t is outdated; take a look at std::chrono.

Can you show me a simple and relevant example of both? Thanks!

32 minutes ago, Anand_Geforce said:

Can you show me a simple and relevant example of both? Thanks!

I'm currently on vacation. I'll post an example tomorrow (when I'm home).

This is the best I can currently do for std::chrono:

http://stackoverflow.com/questions/14391327/how-to-get-duration-as-int-millis-and-float-seconds-from-chrono

 

For the concurrency problem: replace "float IterCount" by "std::atomic<std::uint_fast64_t> IterCount" (and #include <atomic>). Replace "IterCount++" by "IterCount.fetch_add(1)"
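Put together, that replacement could be sketched as below. The fixed iteration count and the count_with_atomic wrapper are assumptions added to make the snippet testable, and the relaxed memory order is my choice for a pure counter; plain fetch_add(1) works too.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Shared counter: every increment is an indivisible read-modify-write.
std::atomic<std::uint_fast64_t> IterCount{0};

// Hypothetical wrapper: run `thread_count` workers that each bump the
// shared counter `iterations_per_thread` times, then return the total.
std::uint_fast64_t count_with_atomic(unsigned thread_count,
                                     std::uint_fast64_t iterations_per_thread) {
    IterCount.store(0);
    std::vector<std::thread> workers;
    for (unsigned id = 0; id < thread_count; ++id) {
        workers.emplace_back([iterations_per_thread] {
            for (std::uint_fast64_t n = 0; n < iterations_per_thread; ++n)
                IterCount.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (std::thread& worker : workers)
        worker.join();
    return IterCount.load();
}
```

Whether the atomic is implemented without a lock on a given platform can be checked with IterCount.is_lock_free(); either way, contended increments cost more than a thread-private counter.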

@mathijs727 The extra overhead of atomic operations (which will be using some sort of locking mechanism in the background anyway) will probably skew the 'benchmark' result he's trying to create, though.

 

@Anand_Geforce I already gave an example of using std::chrono in this thread of yours: https://linustechtips.com/main/topic/606102-running-a-loop-for-exactly-30-seconds-c/#comment-7851462

 

15 minutes ago, Unimportant said:

@mathijs727 The extra overhead of atomic operations (which will be using some sort of locking mechanism in the background anyway) will probably skew the 'benchmark' result he's trying to create, though.

 

@Anand_Geforce I already gave an example of using std::chrono in this thread of yours: https://linustechtips.com/main/topic/606102-running-a-loop-for-exactly-30-seconds-c/#comment-7851462

 

It was in that very thread that another member suggested using <ctime>, and it was very easy to use! Sadly, it's deemed unreliable, so I'll use std::chrono (it seems very complicated, which is what's making me hesitant...).
