Parallel Game Code: DX12 and Vulkan-Ready, Usable with DX11 and OpenGL Now
Seriously, game devs, it's not that hard. Get with the times.
This would be part of the main class:
    std::vector<VisualObject> movableObjects;
    //initialize everything, set starting coordinates and directions
    //other prep work

    //FUNCTIONS
    void update(float updateTimeMillis) {
        //move view point, make decisions on color changes, texture loading, etc.
        ...
        //check for collisions in parallel, using in-lined function calls (no stack frame created)
        //and loop-unrolling to make explicit use of branching AVX 256
        //assume there is a mutex or semaphore to lock an object for analysis/deletion and a smart
        //function to skip if locked and come back built into the calls to inline function
        //collisionUpdate
        #pragma omp parallel for
        for(int i = movableObjects.size()-1; i >= 0; i -= 8) {
            movableObjects[i].collisionUpdate(); //will run each type of object's unique function
            movableObjects[i-1].collisionUpdate();
            ...
            movableObjects[i-7].collisionUpdate();
        }

        //collided objects now destroyed or had states updated to change physical effects routines,
        //update all vertices of all objects in parallel,
        //ensure dummies exist if not in multiples of 8 to take advantage of AVX 256 in loop unrolling
        #pragma omp parallel for
        for(int i = movableObjects.size()-1; i >= 0; i -= 8) {
            movableObjects[i].update(updateTimeMillis);
            movableObjects[i-1].update(updateTimeMillis);
            ...
            movableObjects[i-7].update(updateTimeMillis);
        }

        VisualObject.draw(); //draw the scene and all objects in it.
    }
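If padding the list with dummies to a multiple of 8 is not an option, the same parallel update can be written one object per iteration, leaving any unrolling to the compiler. This is only a minimal sketch of that variant, not the code above: `Mover` and `updateAll` are hypothetical names, and `Mover` is a stand-in for `VisualObject` that holds nothing but a position and a velocity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for VisualObject: just a position and a velocity.
struct Mover {
    float x = 0.0f;
    float vx = 0.0f;
    void update(float dtMillis) { x += vx * dtMillis; }
};

// Updates every object in parallel, one loop iteration per object.
// There is no manual unrolling, so the object count does not have to be
// a multiple of 8 and no dummy objects are needed.
void updateAll(std::vector<Mover>& objects, float dtMillis) {
    assert(dtMillis >= 0.0f);  // elapsed time should never be negative
    const long long n = static_cast<long long>(objects.size());
    #pragma omp parallel for
    for (long long i = 0; i < n; ++i) {
        objects[i].update(dtMillis);  // each iteration touches only its own object
    }
}
```

Built with `-fopenmp` on GCC or Clang, the loop is split across threads; without that flag the pragma is ignored and the loop simply runs serially, which makes it easy to compare the two.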
And this would be the chief graphics object, from which nearly every other graphics object should inherit:
    #include <algorithm>
    #include <vector>
    #include <omp.h>

    class VisualObject {
    public:
        VisualObject* parent = nullptr;
        std::vector<VisualObject> children;
        int omp_max_thread_count = omp_get_max_threads(); //if quad I5, will get 4, if quad I7, will get 8

        void addChild(const VisualObject &v) {
            children.push_back(v);           //store a copy of v...
            children.back().setParent(this); //...and point the stored copy at this object
        }

        void draw() {
            //draw children in parallel using dynamic scheduling in case some objects
            //have complex functions that take far longer than others, auto scaling with core count
            #pragma omp parallel for schedule(dynamic)
            for(int i = 0; i < children.size(); i += 8) {
                children[i].draw();
                children[i+1].draw();
                ...
                children[i+7].draw();
            }
        }

        //orders the most expensive objects first; assumes complexity() is a const member function
        struct {
            bool operator()(const VisualObject &v1, const VisualObject &v2) const {
                return v1.complexity() > v2.complexity();
            }
        } VOComparator;

        //Sort Visual Objects by draw complexity into equal-sized buckets for the draw() function
        //to handle, most expensive draws first per thread.
        void loadBalance() {
            //If on GCC/Clang/ICC, compile with -fopenmp and -D_GLIBCXX_PARALLEL to get parallel sort
            std::sort(children.begin(), children.end(), VOComparator);
            int chunkSize = children.size()/omp_max_thread_count;
            std::vector<VisualObject> sortedSet(children.size());
            #pragma omp parallel for
            for(int i = 0; i < omp_max_thread_count; i++){
                //There is room for loop unrolling as long as you check to ensure your unroll length is no larger than
                //chunkSize and you either have dummies to fill the empty space or a cleanup function for the remainder under a given chunkSize
                for(int j = 0; j < chunkSize; j++){
                    sortedSet[i * chunkSize + j] = children[j*omp_max_thread_count + i];
                }
            }
            children = sortedSet;
        } //end loadBalance
    }; //end VisualObject class
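The bucket arithmetic in loadBalance() is easier to check on plain numbers. The sketch below is an illustration under stated assumptions, not the class's actual method: `dealIntoBuckets` is a hypothetical helper, the `int` costs stand in for whatever `complexity()` returns, and the loop runs serially for clarity. Unlike the chunkSize version, it also places the leftover objects when the count does not divide evenly by the number of threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Reorders draw costs so that each bucket (one per thread) is a contiguous run,
// dealt round-robin from the list sorted most expensive first.
// `costs` is a stand-in for the children's complexity() values (an assumption).
std::vector<int> dealIntoBuckets(std::vector<int> costs, std::size_t bucketCount) {
    assert(bucketCount > 0);
    std::sort(costs.begin(), costs.end(), std::greater<int>());

    const std::size_t n = costs.size();
    const std::size_t base = n / bucketCount;   // minimum items per bucket
    const std::size_t extra = n % bucketCount;  // the first `extra` buckets get one more

    std::vector<int> out(n);
    for (std::size_t b = 0; b < bucketCount; ++b) {
        const std::size_t size = base + (b < extra ? 1 : 0);
        const std::size_t start = b * base + std::min(b, extra);
        for (std::size_t j = 0; j < size; ++j) {
            // bucket b takes sorted items b, b + bucketCount, b + 2*bucketCount, ...
            out[start + j] = costs[j * bucketCount + b];
        }
    }
    return out;
}
```

With seven costs 1 through 7 and three buckets, the buckets come out as {7, 4, 1}, {6, 3} and {5, 2}: each starts with its most expensive item and the totals stay close to each other. Each bucket writes a disjoint range of `out`, so the outer loop could carry a `#pragma omp parallel for` as in loadBalance().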
There: 80+% of your CPU-side optimization, done for you. Quit your bellyaching, start over, and do it right this time. There's no excuse when it's this easy.