Blog Comments posted by Mira Yurizaki

  1. 5 hours ago, Arika S said:

    Current top 3 video games? 

    I can say the top one right now is Final Fantasy XIV.

     

    The other two seem kind of out there. There are a lot of games I've enjoyed, but it's hard to pick which ones I'd gush over again. But if you were to make me pick two, they'd have to be Chrono Trigger and Secret of Mana.

     

    5 hours ago, Arika S said:

    Do you have a dream you're working towards? 

    Settling down somewhere with a house, maybe a partner. I probably won't move from where I'm at in general though.

  2. I don't know if anyone keeps up with this, but another solution I thought of is keeping two sets of physics routines: one that's purely cosmetic and one that actually affects gameplay. Things like cloth, hair, and cinematic animation would go to the cosmetic simulation, while entity interaction would go to the gameplay simulation. The cosmetic simulation can run alongside the graphics routine and run as fast as possible, with the time delta simply being the last frame time, while the gameplay simulation runs with the game logic at a fixed interval to keep things simple.
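
    As a rough sketch of what I mean (all names here are mine, not from any particular engine), the gameplay simulation can use a fixed-timestep accumulator while the cosmetic one just takes the raw frame delta:

      import time

      # Hypothetical loop: gameplay physics at a fixed interval, cosmetic physics per frame.
      GAMEPLAY_DT = 1.0 / 60.0  # fixed tick for the simulation that affects gameplay

      def run_game_loop(update_gameplay_physics, update_cosmetic_physics, render):
          accumulator = 0.0
          previous = time.perf_counter()
          while True:
              now = time.perf_counter()
              frame_dt = now - previous
              previous = now

              # Gameplay simulation: fixed interval, may run 0..N times per rendered frame.
              accumulator += frame_dt
              while accumulator >= GAMEPLAY_DT:
                  update_gameplay_physics(GAMEPLAY_DT)
                  accumulator -= GAMEPLAY_DT

              # Cosmetic simulation (cloth, hair, cinematics): purely visual, runs as
              # fast as the frame rate allows, with the last frame time as its delta.
              update_cosmetic_physics(frame_dt)
              render()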

  3. The unicorn is getting a fast enough bus, but at the moment it's largely just that: a unicorn.

     

    The other question is how much of a benefit you're truly getting from this from a manufacturing standpoint. If we pull up some numbers on Wikipedia, we can find the following stats:

    • GT 1030
      • 384 SPUs, 24 TMUs, and 16 ROPs
      • 1.8 billion transistors
      • 74 mm^2 die
    • GTX 1050 Ti
      • 768 SPUs, 48 TMUs, and 16 ROPs
      • 3.3 billion transistors
      • 132 mm^2 die

    Even though the GTX 1050 Ti is basically double the GT 1030, the GTX 1050 Ti is the more efficient design, since it uses fewer transistors and less die space than straight doubling would suggest (3.3 billion vs. 3.6 billion transistors, 132 mm^2 vs. 148 mm^2). Also note that nothing else is different between the two: they were both designed with the same external bus and memory type. My math may be rough here, but from the same amount of silicon that gives you 66 GT 1030s, you can make 37 GTX 1050 Tis. Since two GT 1030s glued together stand in for one 1050 Ti, those 66 small dies make 33 pairs, so you can lose 4 of the 1050 Tis and still come out even, which works out to an 89% break-even yield. We also can't assume the GT 1030 has a 100% yield rate, so for the sake of simplicity say it too only yields 89%. That means only about 58 of the small dies (29 pairs) are good, which further increases the number of bad 1050 Tis you can tolerate before dropping below the break-even point. In other words, the GTX 1050 Ti can have a yield as low as roughly 78% before it stops making sense to build it (as much) instead of gluing two GT 1030s together.
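
    If you want to sanity check that arithmetic, here's the rough calculation (numbers from the Wikipedia stats above, treating two GT 1030s as one "glued" 1050 Ti):

      # Break-even sketch: same silicon area, small dies vs. double-size dies.
      gt1030_area = 74       # mm^2 per GT 1030 die
      gtx1050ti_area = 132   # mm^2 per GTX 1050 Ti die

      material = 66 * gt1030_area            # area that yields 66 GT 1030 dies (4,884 mm^2)
      ti_dies = material // gtx1050ti_area   # ~37 GTX 1050 Ti dies from the same area

      pairs = 66 // 2                        # 33 "glued" pairs of GT 1030s
      print(ti_dies - pairs)                 # 4 spare 1050 Ti dies before falling behind
      print(pairs / ti_dies)                 # ~0.89 -> the 89% break-even yield

      # If the GT 1030 itself also only yields 89%, only ~58 small dies (29 pairs) are good,
      # so the 1050 Ti can yield as low as ~78% and still break even.
      good_small = int(66 * 0.89)            # ~58 good GT 1030 dies
      print((good_small // 2) / ti_dies)     # ~0.78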

     

    We also have to consider what I mentioned in the blog post: a GPU for gaming is working on a time-sensitive task. Benchmarking something like POV-Ray or x264 on a CPU is fine because we don't really care about the order in which the final output is assembled (more or less), nor how long it takes (though the faster the better). On a GPU, the order in which the final output is assembled does matter, and we do care how long it takes to get something done. I'm not quite sure how much introducing latency between chips would affect overall graphics performance, and the only thing SLI shares is frame buffer data (I'm not sure what NVLink shares).

     

    But overall, until we solve the two biggest issues plaguing multi-GPU setups, namely that memory pools don't combine and that workload distribution is hard, I don't think chiplets will be anything more than a fancier way of doing multi-GPU setups.

  4. A note about this part of the blog:

    Quote

    So while adding more transistors per CPU core hasn't always been viable...

    What this means is that in GPU land, you can get away with simply duplicating your basic execution units. In AMD terms, this is a stream processor. In NVIDIA terms, a CUDA core.

     

    In CPUs, you can't just duplicate the basic execution units, which are the ALU, AGU, and FPU, and expect a linear improvement in performance. Most of the per-core transistor count increase in CPUs over time is likely due to adding unrelated or semi-related features like SIMD processing.

  5. 2 hours ago, dwang040 said:

    If the game engine is only capable of producing and updating at 60 Hz, sure, we can say that there is a "reason" to cap. But if that were the case, is a cap really necessary?

    No, but I wouldn't see a reason to have it uncapped either other than for bragging rights.

     

    Quote

    Hmm, I'm not particularly saying that a system that is incapable of producing 60+ fps would suffer a penalty because it's incapable of reaching more than 60 fps, and correct me if I misread your comment, but it sounds as if you're implying that systems that are capable of running the game at 100+ fps will have an advantage because they can produce more fps, thus forcing the engine to run faster? If that is the case, what about those systems that can only run the game at 30-45 fps? I can't say I remember seeing a lot of people saying that their game runs slow cause they couldn't reach 60 fps?

    It's about how often the system can run the game logic. I'm under the belief that most game engines run their logic at the same rate regardless of what processor you throw at them. The frame rate you get at the end reflects how much time is left over for the processor to spend sending render commands to the GPU. So if a game runs at 60 Hz, then every 16 or so milliseconds it'll run the logic. If it can complete this within, say, 500 microseconds, that leaves the CPU 15.5 milliseconds to compile and send GPU commands.

     

    But otherwise, yes, I'm implying that it may be advantageous to be able to run the logic more or less often.

     

    Quote

    For me personally, I question if the engine is scaling based on the fps cap. We know that increasing the cap will increase the engine speed, but what about decreasing the cap to 30 fps (Probably someone tried it out, but I couldn't find any info)? Would that slow down the game to half speed?

    No, because again my assumption is that the game will always run its logic at a consistent rate, regardless of the FPS it can spit out. In this case, if you get 30 FPS and the game logic runs at, say, 60 Hz, the game has already processed two logic cycles by the time you receive a frame.
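
    A tiny illustration of that, assuming a fixed 60 Hz logic rate:

      # How many logic ticks fit into one rendered frame at various frame rates?
      LOGIC_HZ = 60
      for fps in (120, 60, 30):
          ticks_per_frame = (1.0 / fps) / (1.0 / LOGIC_HZ)
          print(f"{fps} FPS -> {ticks_per_frame:g} logic tick(s) per rendered frame")
      # 120 FPS -> 0.5 (a tick every other frame), 60 FPS -> 1, 30 FPS -> 2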

     

    Just remember, graphics is a visual representation of the state of the world. As such, it's the last thing that gets done in game. I think a good video that "explains" this is here:

     

    (I say "explains" because it's an intermediate level video, the presenter doesn't explain most of the terms he uses)

  6. While it's easy to claim it's awful design to have things revolve around the frame rate, I would argue that, on the other end of the spectrum, it may not be useful to have a frame rate that exceeds how fast the game world runs. If the game world only updates at 60 Hz, there's no point in exceeding 60 FPS, because the graphics is a visual representation of the current state of the game world; you would just have extra frames rendering the same thing. Maybe if the graphics engine were fancy enough it would render in-between frames of animation, but those wouldn't really count for anything.

     

    Of course, you could also ask why developers won't allow the game world to run faster or slower. Because that would create an inconsistent experience between lower-end and higher-end systems. Imagine being able to cheat by running the game world at a lower rate: you could effectively phase through matter.
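
    To make that concrete, here's a toy sketch (made-up numbers and a deliberately naive collision check) of how a bigger world timestep lets a fast object skip over thin geometry:

      # Naive movement: position += velocity * dt, collision checked only at the end point.
      def phases_through(x, velocity, dt, wall_x=10.0, wall_thickness=0.5):
          new_x = x + velocity * dt
          hit = wall_x <= new_x <= wall_x + wall_thickness
          return new_x > wall_x + wall_thickness and not hit

      print(phases_through(9.8, 30.0, 1 / 60))  # False: a 60 Hz tick lands inside the wall, so the hit is caught
      print(phases_through(9.8, 30.0, 1 / 20))  # True: a 20 Hz tick steps clean over the wall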

  7. 41 minutes ago, CarnageTR said:

    SLI bottlenecks communication traffic. Linus have a video about it. NVlink might be solution.

    NVLink is still slow compared to VRAM bandwidth. Considering that GPUs are memory-bandwidth sensitive, I don't believe even running the links at VRAM bandwidth would solve everything, since memory-sensitive applications have issues in NUMA-based systems.

  8. It can, but developers who think their app deserves all of the CPU time in the world will never send a signal to say the thread is idling.

     

    EDIT: Though some OSes do track process utilization as a means to tell what the process is doing. I believe MINIX can use this to judge if a process is stuck in a forever loop and lower its priority automatically until it just dies.
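
    For what it's worth, the difference is basically this (a minimal sketch, not tied to any particular OS):

      import threading

      ready = threading.Event()

      def greedy_worker():
          # Never signals that it's idle: the scheduler just sees 100% utilization,
          # even though no useful work is being done.
          while not ready.is_set():
              pass  # busy-wait

      def polite_worker():
          # Blocking here yields the CPU, which is effectively the "I'm idling" signal
          # the scheduler can act on.
          ready.wait()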

  9. It was being handled that way on the application side, in that incoming ACK requests would immediately cause the state machine to go back to "Tx waiting" and never really get to "Waiting for ACK."

     

    Basically, I believe the solution was more or less to put sending ACKs out on the same "priority" as retrying the message. It's important to send ACKs out, but it's equally important to retry the message.
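
    In other words, something along these lines (hypothetical names, just to show the idea of one shared queue):

      import collections
      import time

      outgoing = collections.deque()  # ACKs and retries share one FIFO, so neither starves the other

      def queue_ack(seq_no):
          outgoing.append(("ACK", seq_no))

      def queue_retry(message):
          outgoing.append(("RETRY", message))

      def tx_loop(send):
          while True:
              if outgoing:
                  kind, payload = outgoing.popleft()  # sent strictly in arrival order
                  send(kind, payload)
              else:
                  time.sleep(0.001)  # idle briefly instead of spinning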

  10. As a note, the only potential issue I've seen related to moving from HDDs to SSDs is that partitions created on older HDDs with 512-byte or 512e sectors may not be 4K aligned, and misaligned partitions cause performance issues on SSDs, since SSDs have pretty much been using 4K pages since whenever. But practically all HDDs these days use 4K sectors with aligned partitions, so that shouldn't be a problem. But I'll check on that anyway.
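
    Checking alignment is simple enough, at least in principle (made-up offsets below): a partition is fine on an SSD if its starting byte offset is a multiple of 4096.

      def is_4k_aligned(start_offset_bytes: int) -> bool:
          return start_offset_bytes % 4096 == 0

      print(is_4k_aligned(1_048_576))  # True: 1 MiB offset, the modern partitioning default
      print(is_4k_aligned(32_256))     # False: the old 63-sector (31.5 KiB) offset from legacy tools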

  11. As an explanation for the choice of benchmarks...

    • The program must have an in-game benchmark, because I wanted as little influence as possible from outside programs on the results, even if that influence is negligible.
    • 3DMark was chosen because well, it's 3DMark :D
    • Unigine Heaven was chosen because it's still a popular DX11 benchmark
    • FFXIV was chosen mostly because it's what I'm playing right now :3  However it does not test network capabilities.
    • GTAV was chosen mostly because it's still a popular benchmark
    • Deus Ex: Mankind Divided was chosen due to it being rather stressful on cards
    • F1 2016 due to being relatively CPU intensive since the benchmark simulates all ~24 drivers.
  12. 8GB of shared RAM, and the only slides I can find from developers who showed the memory usage (look up the Killzone Shadow Fall post-mortem) posit that the main system takes 3GB for itself, leaving 5GB total for games. 3.5GB of that was used for the GPU and the rest for the game itself.

     

    As far as I know, the PS4 Pro did not increase the memory capacity. However it can render 4K using a tiled approach which requires less bandwidth.

     

    Also, as far as I know, the CPU is still a Jaguar CPU, just with a clock speed bump. Jaguar is a netbook-class architecture, so a desktop-class architecture should be more than enough to make up for any deficiencies that doubling the clock speed but halving the core count might cause. Also, you can't buy dual-socket AM1 boards.

  13. And therein lies the problem with this discussion: there are too many open variables that people will skew to make their side look better. For example, you mentioned the second-hand market. Guess what console buyers can do? Buy second hand if they want to. Which is why I set some strict, specific guidelines on my comparison.

     

    If you're going to start price comparing, you have to tie up as many open-ended variables as possible. Otherwise your biases will start creeping in and your argument will be weak.
