Parallel Game Code: DX12 and Vulkan-Ready, Usable with DX11 and OpenGL Now

patrickjp93


Seriously, game devs, it's not that hard. Get with the times.

 

This would be part of the main class:

#include <vector>

std::vector<VisualObject> movableObjects;
VisualObject scene; //root of the scene graph (assumed name)

//initialize everything, set starting coordinates and directions
//other prep work

//FUNCTIONS
void update(float updateTimeMillis) {
    //move view point, make decisions on color changes, texture loading, etc.
    ...

    //check for collisions in parallel, using inlined function calls (no stack frame created)
    //and loop unrolling by 8 so the compiler has a chance to vectorize with AVX-256
    //assume there is a mutex or semaphore to lock an object for analysis/deletion, and that
    //the calls to the inline function collisionUpdate know how to skip a locked object
    //and come back to it later
    //ensure dummies exist if the count is not a multiple of 8, otherwise i-7 runs off the front
    #pragma omp parallel for
    for(int i = static_cast<int>(movableObjects.size()) - 1; i >= 0; i -= 8) {
        movableObjects[i].collisionUpdate(); //will run each type of object's unique function
        movableObjects[i-1].collisionUpdate();
        ...
        movableObjects[i-7].collisionUpdate();
    }

    //collided objects are now destroyed or have had their states updated to change
    //physical-effects routines
    //update all vertices of all objects in parallel, again unrolled by 8 (same dummy rule applies)
    #pragma omp parallel for
    for(int i = static_cast<int>(movableObjects.size()) - 1; i >= 0; i -= 8) {
        movableObjects[i].update(updateTimeMillis);
        movableObjects[i-1].update(updateTimeMillis);
        ...
        movableObjects[i-7].update(updateTimeMillis);
    }

    scene.draw(); //draw the scene and all objects in it
}
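If padding the container with dummies feels awkward, a cleanup loop for the leftovers does the same job. Here is a minimal sketch of that idea, not the code above: `Particle` and `updateAll` are made-up names standing in for `VisualObject` and the update pass, and the unroll factor is 4 only to keep the listing short. Build with `-fopenmp` to get threads; without it the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for VisualObject: one position and one velocity.
struct Particle {
    float position = 0.0f;
    float velocity = 1.0f;

    void update(float updateTimeMillis) {
        position += velocity * updateTimeMillis;
    }
};

// Advances every particle by one time step.
// The main loop is unrolled by 4; a serial cleanup loop handles the
// leftover elements, so the container does not need dummy padding.
void updateAll(std::vector<Particle>& particles, float updateTimeMillis) {
    assert(updateTimeMillis >= 0.0f);
    const int count = static_cast<int>(particles.size());
    const int unrolledEnd = count - (count % 4);

    // Each iteration touches four distinct elements, so iterations
    // are independent and safe to run on different threads.
    #pragma omp parallel for
    for (int i = 0; i < unrolledEnd; i += 4) {
        particles[i].update(updateTimeMillis);
        particles[i + 1].update(updateTimeMillis);
        particles[i + 2].update(updateTimeMillis);
        particles[i + 3].update(updateTimeMillis);
    }

    // Cleanup loop for the remainder (at most 3 elements).
    for (int i = unrolledEnd; i < count; ++i) {
        particles[i].update(updateTimeMillis);
    }
}
```

Calling `updateAll(particles, 2.0f)` on ten default particles moves all ten, including the two that do not fit the unrolled loop.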


 

And this would be the chief graphics object, from which nearly every other graphics object should inherit.

 

#include <omp.h>
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

class VisualObject {
public:
    VisualObject* parent = nullptr;
    std::vector<VisualObject> children;
    //a quad-core i5 reports 4; a quad-core i7 with Hyper-Threading reports 8
    int omp_max_thread_count = omp_get_max_threads();

    //taken by value so the stored copy can have its parent set
    void addChild(VisualObject v) {
        v.setParent(this);
        children.push_back(std::move(v));
    }

    void draw() {
        //draw children in parallel using dynamic scheduling in case some objects
        //have complex functions that take far longer than others; scales with core count
        //as in update(), assumes the child count is a multiple of 8 (pad with dummies)
        const int count = static_cast<int>(children.size());
        #pragma omp parallel for schedule(dynamic)
        for(int i = 0; i < count; i += 8) {
            children[i].draw();
            children[i+1].draw();
            ...
            children[i+7].draw();
        }
    }

    struct {
        //descending order, so the most expensive draws come first
        bool operator()(const VisualObject& v1, const VisualObject& v2) const {
            return v1.complexity() > v2.complexity();
        }
    } VOComparator;

    //Sort visual objects by draw complexity into equal-sized buckets for the draw() function
    //to handle, most expensive draws first per thread.
    void loadBalance() {
        //With libstdc++ (GCC, or Clang/ICC built against it), compile with -fopenmp and
        //-D_GLIBCXX_PARALLEL to get a parallel sort
        std::sort(children.begin(), children.end(), VOComparator);
        const int chunkSize = static_cast<int>(children.size()) / omp_max_thread_count;
        std::vector<VisualObject> sortedSet(children.size());
        #pragma omp parallel for
        for(int i = 0; i < omp_max_thread_count; i++) {
            //There is room for loop unrolling as long as you check that your unroll length is
            //no larger than chunkSize and you either have dummies to fill the empty space or
            //a cleanup loop for the remainder under a given chunkSize
            for(int j = 0; j < chunkSize; j++) {
                sortedSet[i * chunkSize + j] = children[j * omp_max_thread_count + i];
            }
        }
        //objects that do not fill a whole round of buckets keep their sorted position at the tail
        for(std::size_t k = static_cast<std::size_t>(chunkSize) * omp_max_thread_count; k < children.size(); k++) {
            sortedSet[k] = children[k];
        }
        children = std::move(sortedSet);
    } //end loadBalance
}; //end VisualObject class
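The bucket-dealing step in loadBalance() is easier to follow on plain numbers. The sketch below is an illustration under assumptions, not the class above: `dealIntoBuckets` is a made-up helper, the `int` costs stand in for whatever `complexity()` returns, and the sort is the ordinary serial `std::sort`. It sorts costs in descending order, deals them round-robin into `bucketCount` contiguous buckets, and leaves any leftovers at the tail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reorders draw costs so that bucket b occupies the
// contiguous range [b * chunkSize, (b + 1) * chunkSize) and every bucket
// gets a similar mix of expensive and cheap items.
std::vector<int> dealIntoBuckets(std::vector<int> costs, std::size_t bucketCount) {
    assert(bucketCount > 0);

    // Most expensive first.
    std::sort(costs.begin(), costs.end(), [](int a, int b) { return a > b; });

    const std::size_t chunkSize = costs.size() / bucketCount;
    const std::size_t dealt = chunkSize * bucketCount;
    std::vector<int> ordered(costs.size());

    // Round-robin deal: the j-th item of bucket b is the (j * bucketCount + b)-th
    // most expensive item overall.
    for (std::size_t b = 0; b < bucketCount; ++b) {
        for (std::size_t j = 0; j < chunkSize; ++j) {
            ordered[b * chunkSize + j] = costs[j * bucketCount + b];
        }
    }

    // Remainder: items that do not fill a whole round stay at the tail.
    for (std::size_t k = dealt; k < costs.size(); ++k) {
        ordered[k] = costs[k];
    }
    return ordered;
}
```

For costs 1 through 7 and two buckets, the result is 7, 5, 3 for the first bucket, 6, 4, 2 for the second, and the leftover 1 at the end. The bucket totals come out at 15 and 12, which is far closer than handing one thread the top half of the sorted list.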


 

There, 80+% of your CPU-side optimization is done for you. Quit your bellyaching, start over, and do it right this time. There's no excuse when it's this easy.
