About mathijs727

  • Title
    Junior Member
  • Birthday 1995-07-27

Profile Information

  • Gender
  • Location
  • Occupation
    Computer Science Master (2nd year)


  • CPU
    I7 i4790k
  • Motherboard
    MSI Z97 Gaming 5
  • RAM
    16GB Corsair Vengeance 1600MHz DDR3
  • GPU
    MSI GTX1080 Gaming
  • Case
    Corsair 450D
  • Storage
    500GB Samsung 970 EVO + 256GB Samsung 830 + 3TB HDD
  • PSU
  • Display(s)
    Asus MG279Q
  • Cooling
    Corsair H100i with Noctua NF-F12's
  • Keyboard
    Coolermaster Masterkeys Pro S RGB
  • Mouse
    Razer Deathadder Elite
  • Sound
    HyperX Cloud | Audio Technica ATH-M50X
  • Operating System
    Windows 10 Pro


  1. Don’t use a single block; that forces the kernel to run on a single CU (32 cores). You should decide on the number of threads per block (a multiple of 32, with a maximum of 1024 on current GPUs) and scale the number of blocks with the number of work items. Apart from trying all possibilities yourself, CUDA also has a function that tries to guess the optimal block size for your kernel: https://devblogs.nvidia.com/cuda-pro-tip-occupancy-api-simplifies-launch-configuration/ Note that the optimal block size might change from kernel to kernel because of register pressure. Also make sure that you schedule a couple of times more blocks than you have compute units so that the GPU can efficiently hide memory latency (the problem size should be at least a couple of times larger than the number of cores in your GPU).
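To make the sizing advice above concrete, here is a minimal host-side sketch in plain C++ of the arithmetic involved: round the chosen block size up to a multiple of the warp size, then compute how many blocks are needed to cover all work items. The helper names `roundUpToWarp` and `blocksFor` are hypothetical, the warp size of 32 is taken from the post, and the actual kernel launch and occupancy API call are not shown.

```cpp
#include <cstddef>

// Assumption for illustration: warps are 32 threads wide, as stated in the post.
constexpr std::size_t kWarpSize = 32;

// Round a requested block size up to the next multiple of the warp size.
constexpr std::size_t roundUpToWarp(std::size_t threads) {
    return ((threads + kWarpSize - 1) / kWarpSize) * kWarpSize;
}

// Smallest number of blocks such that blocks * threadsPerBlock >= workItems.
constexpr std::size_t blocksFor(std::size_t workItems, std::size_t threadsPerBlock) {
    return (workItems + threadsPerBlock - 1) / threadsPerBlock;
}
```

With 1,000,000 work items and 256 threads per block this gives 3907 blocks. The last block is only partially filled, so the kernel has to check its index against the number of work items.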
  2. mathijs727

    [Noob Question] Setting up Eclipse C++ with MinGW?

    Just checked: my VS2017 installation folder (C:/Program Files (x86)/Microsoft Visual Studio/2017) has a size of a "whopping" 2.34GB. I assume that this does not include the Windows SDK itself, but its size clearly doesn't deviate much from VS2019. I think you are confusing it with VS2015, where you had to install everything at once. Since VS2017 you can use the installer to select which features you want (everything is deselected by default), and for just a C++ installation you only need a couple of GB. And like @straight_stewie said, VS2017 is the easiest C++ environment to set up and use. Also, Visual Studio has some features that are just plain better than those of any of its competitors (IntelliSense).
  3. mathijs727

    [Noob Question] Setting up Eclipse C++ with MinGW?

    Except that for a typical C++ installation it uses under 7GB (6.75GB for VS2019 without Live Share support). That also includes stuff like the compiler, profiler and CMake. So it's actually not bloated at all, and many things can be disabled in the installer if you don't need them. It's perfect for a beginner because it includes everything you need, so you can start working with it out of the box (no hassle with installing MinGW, which might contain an outdated version of GCC, in which case you have to compile GCC from source to get C++17 to work properly).
  4. mathijs727

    [Noob Question] Setting up Eclipse C++ with MinGW?

    Which is totally irrelevant to the OP. @Divergent2000 either get CLion with an educational license or Visual Studio 2017 Community Edition (or the VS2019 preview if you like living on the edge). Visual Studio is not as bloated as some people here make it out to be. The installer lets you select exactly what you need. Also, if you are only interested in compiling single-file projects then you can also just use a regular text editor and call the compiler (i.e. GCC or Clang) from the command line.
  5. mathijs727

    Multi-Threading C++ & OpenGL

    Like others have said, OpenGL is not thread safe. However, for any toy application that you are building I would not expect OpenGL command submission to be the bottleneck. Calls to OpenGL functions are deferred to the driver, so there is no waiting involved. When you submit a draw call, the API driver checks that what you're doing is legal and then forwards the result to a work queue for the kernel-mode part of the driver, which might do some more error checking, schedule requests between different programs, convert the commands to a GPU-compatible format and upload those commands to the GPU's internal command queue. Note that your program does not wait for the kernel-mode driver (and thus also won't wait for triangles to be drawn by the GPU).

    With all due respect, if draw calls are indeed a bottleneck (in your hobby OpenGL project, which does not have a 100 square km game map filled with high quality assets) then you are probably doing something wrong. Make sure that you are not using the legacy fixed-function pipeline (submitting triangles with glVertex calls) and instead use "modern" OpenGL (the fixed-function pipeline was deprecated in OpenGL 3.0 (2008) and removed starting from OpenGL 3.1 (2009!)): https://www.khronos.org/opengl/wiki/Fixed_Function_Pipeline

    Another way to reduce driver overhead is to use the functions added in recent OpenGL versions (>4.3 IIRC). This collection of new features is often referred to as AZDO ("Approaching Zero Driver Overhead"), which was presented at GDC (Game Developers Conference): https://gdcvault.com/play/1020791/Approaching-Zero-Driver-Overhead-in https://gdcvault.com/play/1023516/High-performance-Low-Overhead-Rendering (2016 presentation with some new stuff).
    Also, be sure to check out gdcvault (the video-on-demand service of GDC); it contains a ton of very interesting and useful presentations. Note that some presentations are behind a paywall (mostly the videos; slide decks are usually available), which usually gets removed after a year or two.

    A good way to greatly improve GPU performance is by applying frustum and/or occlusion culling. With frustum culling we check whether an object (a collection of primitives) might possibly be visible with respect to the camera frustum (whether it's inside the field of view). Frustum culling is an easy optimisation that only requires you to know the bounding volumes of the objects (which you can compute ahead of time). You simply check for each object whether its bounding volume overlaps with the camera's view frustum (google "frustum culling" for info on how to implement that test). Note that this type of frustum culling is easily parallelizable, both with multi-threading and SIMD (or even on the GPU with indirect draw commands).

    If you have a very complex scene then you could also experiment with hierarchical culling, where you store the objects in a tree structure (like a bounding volume hierarchy) and traverse the tree, only visiting child nodes when their bounding volume overlaps with the view frustum. Note that this does make multi-threading and SIMD optimizations somewhat harder (an easy way to properly utilise SIMD in this case is to use a wider tree, i.e. 4 or 8 children per node). Although this might result in fewer overlap tests (when most of the objects are not visible), it does not map that well to modern hardware (many cache misses mean a lot of stalls on memory == lower performance).
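The flat (non-hierarchical) frustum test described above can be sketched roughly as follows. This is only an illustration under stated assumptions: bounding volumes are spheres, the frustum is given as six planes with unit-length normals pointing inwards, and all type and function names (`Plane`, `Sphere`, `mightBeVisible`, `cullObjects`) are made up for this example rather than taken from any engine.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Plane dot(normal, p) + d = 0; the normal is assumed to be unit length
// and to point towards the inside of the frustum.
struct Plane { Vec3 normal; float d; };

// Bounding sphere of an object, computed ahead of time.
struct Sphere { Vec3 center; float radius; };

using Frustum = std::array<Plane, 6>;

inline float signedDistance(const Plane& p, const Vec3& v) {
    return p.normal.x * v.x + p.normal.y * v.y + p.normal.z * v.z + p.d;
}

// Conservative test: an object is rejected only when its sphere lies
// completely behind at least one frustum plane.
inline bool mightBeVisible(const Frustum& frustum, const Sphere& s) {
    for (const Plane& p : frustum) {
        if (signedDistance(p, s.center) < -s.radius) return false;
    }
    return true;
}

// Returns the indices of all objects that survive culling. Every iteration
// is independent, which is why this loop is easy to split over threads.
inline std::vector<std::size_t> cullObjects(const Frustum& frustum,
                                            const std::vector<Sphere>& bounds) {
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < bounds.size(); ++i) {
        if (mightBeVisible(frustum, bounds[i])) visible.push_back(i);
    }
    return visible;
}
```

The test is deliberately conservative: a sphere near a frustum corner may pass even though it is outside, which only costs a wasted draw call and never hides a visible object.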
    Frostbite, for example, switched from a fully hierarchical approach to a hybrid one for BF3: https://www.gamedevs.org/uploads/culling-the-battlefield-battlefield3.pdf https://www.gdcvault.com/play/1014491/Culling-the-Battlefield-Data-Oriented

    Occlusion culling is a lot more complicated than frustum culling and there are many different solutions. The most popular solutions right now are based on screen-space techniques (like hierarchical z-buffer, HOM and IOM) because they map well to modern hardware (especially GPUs) and can handle arbitrary, fully dynamic scenes. Like I mentioned, this topic is a lot more complex than frustum culling and requires complex scenes (high depth complexity) to pay off. So I would recommend not looking into this too much until you've built a decently sized engine and the performance is GPU bottlenecked with no other obvious optimisations left (like backface culling). Anyway, here is some reading on occlusion culling in games:
    https://www.google.com/search?q=umbra+master+thesis (first result: the master's thesis by Timo Aila, currently a researcher at Nvidia Research with an impressive list of publications to his name. Umbra is now developed by the company of the same name and the technology is used in games like The Witcher 3)
    https://www.gdcvault.com/play/1014491/Culling-the-Battlefield-Data-Oriented
    https://frostbite-wp-prd.s3.amazonaws.com/wp-content/uploads/2016/03/29204330/GDC_2016_Compute.pdf
    https://gdcvault.com/play/1017837/Why-Render-Hidden-Objects-Cull
    http://advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf

    An interesting note: GPUs already implement hierarchical z-buffer culling to cull individual triangles (but not whole objects).

    With regards to multi-threading, what most game engines do is create their own command lists.
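A minimal sketch of such an engine-side command list, under loud assumptions: `DrawCommand`, `recordRange` and `executeAll` are hypothetical names, a command here is just two integers, and the `submit` callback stands in for the real OpenGL calls. Each call to `recordRange` writes only to its own list, so separate ranges could be handed to separate worker tasks; `executeAll` is meant to run on the single thread that owns the OpenGL context. The sketch itself does not spawn any threads.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical command: a real engine would also store buffers, shader state, etc.
struct DrawCommand { int meshId; int materialId; };

using CommandList = std::vector<DrawCommand>;

// Records draw commands for the objects in [begin, end). The function only
// reads shared data and writes to its own list, so several ranges can be
// recorded at the same time without locking.
CommandList recordRange(const std::vector<int>& meshIds,
                        std::size_t begin, std::size_t end) {
    CommandList list;
    list.reserve(end - begin);
    for (std::size_t i = begin; i < end; ++i) {
        list.push_back(DrawCommand{meshIds[i], /*materialId=*/0});
    }
    return list;
}

// Replays all lists in order. This is the only part that talks to the
// graphics API, so it stays sequential on the context-owning thread.
template <typename Submit>
void executeAll(const std::vector<CommandList>& lists, Submit submit) {
    for (const CommandList& list : lists) {
        for (const DrawCommand& cmd : list) {
            submit(cmd);
        }
    }
}
```

Because the lists are replayed in the order they are stored, the submission order stays deterministic regardless of which worker finished recording first.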
    Recording into these command lists can be multi-threaded; only execution of the command lists (looping over the commands and calling the corresponding OpenGL functions) has to be sequential. Furthermore, you could also apply multi-threading to any other processing (like physics simulations) that you would like to do between the input phase (polling the keyboard/mouse, which does not take any significant amount of time) and the rendering phase. The best way to handle this in terms of throughput is to overlap rendering of frame N with the input+physics of frame N+1. Although this does add a frame of latency, it helps with keeping compute resources busy (e.g. fork/join means waiting until the last task has finished, and maybe not everything can be multi-threaded (Amdahl's law)).

    A good way to get the most parallelism out of the system is to describe your program as a directed acyclic graph (DAG) of tasks. This allows the scheduler to figure out which tasks do not depend on each other so that they can be executed in parallel. If you're keen to work with Vulkan/DX12 then you might also want to apply the same concept to scheduling GPU commands. Some examples of task/frame graphs in practice: https://gdcvault.com/play/1021926/Destiny-s-Multithreaded-Rendering https://www.ea.com/frostbite/news/framegraph-extensible-rendering-architecture-in-frostbite https://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine

    Also, I would recommend ignoring some of the previous advice in this forum thread on using std::thread for multi-threading. Spawning an OS thread is relatively costly and in a game engine you want all the performance you can get. Furthermore, hitting a mutex means that the operating system will allow another thread to run, which might belong to a completely different application.
    Instead I would recommend taking a look at multi-threaded tasking libraries, which spawn a bunch of threads at start-up (usually as many threads as you have cores) and then do the scheduling of tasks themselves (using a (work-stealing) task queue). Examples of these are Intel Threading Building Blocks (TBB), cpp-taskflow, HPX (distributed computing focused), FiberTaskingLib and Boost.Fiber. Note that the last 3 all use fibers (AKA user-land threads, AKA green threads), which are like operating system threads but where the programmer is in control of scheduling them. A well-known example of using fibers for a tasking system in video games is the GDC presentation by Naughty Dog on porting The Last of Us to the PS4 (and running it at 60fps): https://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine

    Finally, if you care about performance, try to read up on modern computer architecture (the memory system) and SIMD. Most game engine developers now try to apply "Data Oriented Design", which is a way of structuring your program so that it is easy for the processor to process the data. This usually comes down to storing your data as a structure of arrays (SOA), which is better for cache locality and makes SIMD optimisations easier (although DOD does cover more than just SOA).

    To learn more about the graphics pipeline, a lot of resources are available online describing how the GPU's programmable cores work (covering terms like warps/wavefronts, register pressure, shared memory vs global memory, etc).
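As a rough illustration of the structure-of-arrays layout mentioned above, compare the two particle layouts below. The particle example, the type names and the `integrate` function are assumptions made up for this sketch; they only show the layout idea and are not taken from any particular engine.

```cpp
#include <cstddef>
#include <vector>

// Array of structures: all fields of one particle sit together, so a loop
// that only needs positions still drags the velocities through the cache.
struct ParticleAoS {
    float x, y, z;
    float vx, vy, vz;
};

// Structure of arrays: each field is stored contiguously, so a loop touches
// only the arrays it needs and the compiler can vectorise it more easily.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
};

// Advances all particles by one time step of length dt.
// Assumes all six arrays have the same length.
inline void integrate(ParticlesSoA& p, float dt) {
    const std::size_t n = p.x.size();
    for (std::size_t i = 0; i < n; ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
        p.z[i] += p.vz[i] * dt;
    }
}
```

Whether SOA actually wins depends on the access pattern: code that always reads every field of a single object at once can be just as fast, or faster, with the AoS layout.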
If you are interested in learning more about the actual graphics pipeline itself (which contains fixed-function parts) then I would definitely recommend this read: https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/ Also, writing a software rasterizer is a great way to get to learn the graphics pipeline and it is also a really good toy project to practice performance optimisations (maybe read up on project Larrabee by Intel). Sorry for the wall of text. Hopefully this will help you and anyone else trying to develop their first game/graphics engine and not knowing where to start (in terms of performance optimizations).
  6. mathijs727

    Github Microsoft? :(

    Private repos do not require a premium account as of this week: https://techcrunch.com/2019/01/07/github-free-users-now-get-unlimited-private-repositories/
  7. mathijs727

    Nvidia RTX 6000

    Adobe Premiere Pro is not really a popular GPU benchmark. Puget Systems has a blog where they regularly post benchmarks of professional applications: https://www.pugetsystems.com/recommended/Recommended-Systems-for-Adobe-Premiere-Pro-CC-143 Looking at their testing, it seems like there is no reason to buy a really high-end GPU for Premiere Pro. Also, a Quadro graphics card won’t run any faster in Premiere Pro than its consumer counterpart. The only big difference is 10-bit color support, but for that they suggest a separate PCIe video monitoring card, which is still way cheaper than a Quadro.
  8. mathijs727

    I'm confused, LTT

    The first video is about 4k GAMING being dumb. There are other reasons to buy a monitor than gaming.
  9. Installing Arch is a bit of a pain though. You can use Antergos, which is basically Arch with a GUI installer, or Manjaro, which is more fully fledged, comes with some helpful applications and uses its own package repository. For developing C++ applications I found both very useful because they're rolling releases with up-to-date compiler versions and a lot of C++ libraries available in the package repository.
  10. mathijs727

    MacOS or Windows 10 for programming?

    I used a MacBook Air during my computer science bachelor and switched to an XPS 15 for my (also computer science) master. Both machines can run all the software you want, and most things (that don't require a GUI) also run in WSL (bash on Windows AKA Windows Subsystem for Linux).

    If you are interested in graphics then that is a big win for any Windows laptop in terms of APIs. Apple is deprecating OpenGL in favor of their own proprietary Metal 2 API. There is a project for running Vulkan on Metal but I don't think it would be the most optimal development environment (fewer debugging options, for example). Also, Apple does not ship any laptops that come with an Nvidia GPU, so no CUDA. Apple does support OpenCL 1.2 for now (which is really old) but you don't even want to bother with OpenCL (it's so shit that I bought a secondary GTX1050 for CUDA, just to get away from OpenCL).

    Now, if you're not interested in doing GPU work on your laptop (GPUs drain the battery and performance will suck when not connected to the charger) then a MacBook is a viable option. If you ask me, the MacBook would even be the better option (if price is not an issue). The trackpad of the MacBook is still miles ahead of any Windows laptop (both my old MacBook Air and my brother's MacBook Pro 15 (2018) have nicer trackpads than the XPS 15). Programming with a trackpad is almost impossible on Windows. Literally every application scrolls differently (or not at all) and they all feel shit in their own way (no smooth scrolling, the way scrolling slows down is weird). I never encountered this on my MacBook (on macOS): scrolling works the same in every application and it feels "just right". Work spaces are also much better in macOS: full-screen apps automatically get their own work space (really nice feature, at least on a laptop), there are good gestures to get a quick overview of all your open apps and work spaces, and you can reorder work spaces (something that Windows 10 cannot do for some weird reason).
    Also, independent of whether you choose a MacBook or a Windows laptop, go for a 13 inch model. The difference in size/weight/battery life makes a real difference when carrying it around, and going from 13 to 15 inch doesn't have any impact on your productivity. Furthermore, most 15 inch laptops have shitty battery life and require you to bring a charger. Ultrabooks with integrated graphics don't have this issue, so not only are they lighter/smaller but you also save on carrying a charger. And in the case of macOS, you really don't need to bring a mouse either (I actually prefer a MacBook trackpad over a mouse because of the gestures), so that saves another 100 grams.

    TLDR: need a dedicated / Nvidia GPU => Windows. Otherwise: MacBook Pro 13" (with touch bar; the one without touch bar is a dual core)
  11. mathijs727

    Should I?

    Good choice. The Windforce card will cool much better and be much, much quieter while doing so. Also, the Asus card doesn't seem to have a backplate. Having a backplate matters more for the looks than a very small RGB light (and cooling/noise should be more important than looks anyway).
  12. mathijs727

    550w for gtx 1070?

    I'm running an overclocked 4790k + GTX1080 without a problem. This is with a Corsair RM550x, which is a high quality PSU. Previous kill-a-watt measurements pointed to < 300W under gaming load (with an RX480 IIRC). If I can find the kill-a-watt I will try to do some new measurements, but a quality 550W PSU can handle an R5 1600 + GTX1070 just fine (if you're buying a new PSU, though, I would recommend a 650W so you have some margin when you upgrade your CPU/GPU in the future).
  13. mathijs727

    Mac and IOS Programming in Linux

    Swift has had official support on Ubuntu for a couple of years now: https://swift.org/download/ I've not used Swift so I'm not sure about the quality of the development environment on Ubuntu (but it's probably better on macOS).
  14. mathijs727

    Visual Studio on Linux

    Why not use CMake? It can generate *.sln files, Makefiles, Ninja files and more. It's the de facto standard build system for cross-platform C++ development. Also, scanf is not really C++, just a leftover from C (proper C++ would be to use an input stream).
  15. mathijs727

    Software to learn arm 32 bit assembly

    You could buy a Raspberry Pi; it's cheap and has a 64-bit (ARMv8) ARM processor. I’m not sure how much 32-bit ARM differs from ARMv8. I actually bought an RPi for the purpose of learning assembly programming (but haven’t really had the time to get started). These are two useful and complete resources that I found on the topic (but there are more): https://github.com/s-matyukevich/raspberry-pi-os https://github.com/bztsrc/raspi3-tutorial