4 entries, 66 comments, 1,655 views

About this blog

A deep dive, from top to bottom, into optimizing tasks common to game engines for modern hardware, with all its strengths and weaknesses. We'll start from scalar, readable code and dive all the way into the dark pits of vector intrinsics and pthreads, all while never using the words "new" or "delete" in code.

Entries in this blog

Model Matrix and Vector Transforms Optimized By SIMD

I'm sick to death of people telling me, "If it were so easy, the game devs would have done it by now. They know better than you do."

Here is visible, incontrovertible proof that the games industry can get a huge boost from taking advantage of SIMD today, especially when games require Sandy Bridge or later hardware (meaning AVX is available, though AVX2 is not assumed for our purposes).

First Example: Mesh Transform By Translation Using AVX Intrinsics

Example updated and trimm

patrickjp93

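For a feel of what the first entry's technique looks like, here is a minimal sketch of translating a mesh with AVX intrinsics. It is not the entry's own code: the structure-of-arrays layout, the `MeshSoA` type, and the names `translate_scalar`, `translate_avx`, and `translate` are all hypothetical. It assumes an x86-64 target built with GCC or Clang (for `__attribute__((target("avx")))` and `__builtin_cpu_supports`) and uses only AVX instructions, not AVX2, matching the Sandy Bridge baseline mentioned above. In keeping with the blog's rule, memory is managed by `std::vector`, so there is no `new` or `delete`.

```cpp
#include <immintrin.h>  // AVX intrinsics (x86/x86-64 only)
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays (SoA) vertex storage: each coordinate
// lives in its own contiguous array, so 8 consecutive x values fill one
// 256-bit AVX register.
struct MeshSoA {
    std::vector<float> x, y, z;
};

// Scalar reference: translate every vertex by (tx, ty, tz).
inline void translate_scalar(MeshSoA& m, float tx, float ty, float tz) {
    assert(m.y.size() == m.x.size() && m.z.size() == m.x.size());
    for (std::size_t i = 0; i < m.x.size(); ++i) {
        m.x[i] += tx;
        m.y[i] += ty;
        m.z[i] += tz;
    }
}

// AVX version (no AVX2 needed): 8 vertices per iteration, then a scalar
// tail. The target attribute (GCC/Clang) lets this one function use AVX
// without compiling the whole program with -mavx.
__attribute__((target("avx")))
void translate_avx(MeshSoA& m, float tx, float ty, float tz) {
    const std::size_t n = m.x.size();
    assert(m.y.size() == n && m.z.size() == n);
    float* px = m.x.data();
    float* py = m.y.data();
    float* pz = m.z.data();
    const __m256 vx = _mm256_set1_ps(tx);  // broadcast tx to all 8 lanes
    const __m256 vy = _mm256_set1_ps(ty);
    const __m256 vz = _mm256_set1_ps(tz);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        // Unaligned loads/stores: std::vector only guarantees alignof(float).
        _mm256_storeu_ps(px + i, _mm256_add_ps(_mm256_loadu_ps(px + i), vx));
        _mm256_storeu_ps(py + i, _mm256_add_ps(_mm256_loadu_ps(py + i), vy));
        _mm256_storeu_ps(pz + i, _mm256_add_ps(_mm256_loadu_ps(pz + i), vz));
    }
    for (; i < n; ++i) {  // scalar tail: fewer than 8 vertices remain
        px[i] += tx;
        py[i] += ty;
        pz[i] += tz;
    }
}

// Runtime dispatch: take the AVX path only if the CPU reports AVX support.
inline void translate(MeshSoA& m, float tx, float ty, float tz) {
    if (__builtin_cpu_supports("avx")) {
        translate_avx(m, tx, ty, tz);
    } else {
        translate_scalar(m, tx, ty, tz);
    }
}
```

Usage: fill a `MeshSoA` with equal-length `x`, `y`, and `z` arrays and call `translate(mesh, tx, ty, tz)`. The SoA layout is the key design choice in this sketch: each 256-bit load picks up 8 consecutive values of a single coordinate, so a translation needs no shuffles. With an array-of-structs `{x, y, z}` layout, the 8 lanes would mix coordinates and the broadcast offset would no longer line up.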

SIMD in Context: The Bandwidth Problem Part I

In the previous entry, I showed how AVX could produce a whopping 10x performance improvement for one specific workload in a game engine, along with a mathematical proof of the algorithm's correctness. However, I did not compare that solution against accelerating the task by multithreading the scalar code, and I only briefly mentioned why the SIMD code would run into memory bandwidth limitations. Neither issue has been fleshed out yet. This entry seeks to start that for t

patrickjp93

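To make the bandwidth point concrete, here is a back-of-the-envelope bound for a mesh translation kernel. It assumes a structure-of-arrays layout with 32-bit floats (an assumption; the entry's own data layout is not shown here), writes $\beta$ for sustained memory bandwidth, and ignores caching and extra write-allocate traffic on stores.

$$
\text{bytes per vertex} = \underbrace{3 \times 4}_{\text{loads}} + \underbrace{3 \times 4}_{\text{stores}} = 24\ \text{bytes},
\qquad
\text{FLOPs per vertex} = 3
$$

$$
\text{arithmetic intensity} = \frac{3\ \text{FLOP}}{24\ \text{bytes}} = 0.125\ \text{FLOP/byte},
\qquad
\text{vertices per second} \le \frac{\beta}{24\ \text{bytes}}
$$

For an illustrative, assumed $\beta = 20\ \text{GB/s}$, that ceiling is $20 \times 10^9 / 24 \approx 8.3 \times 10^8$ vertices per second. On the compute side, one 256-bit AVX add covers 8 values of one coordinate, so 8 vertices cost just 3 vector adds. Once the mesh no longer fits in cache, more threads or wider vectors can help approach the $\beta / 24$ ceiling but cannot exceed it.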
