
Blog Comments posted by straight_stewie

  1. Windows works the same way; it's just hidden better.

    Everything is always a file, whether it's presented that way or not.

    Exposing the "everything is a file" interface to the user can have several benefits. It's significantly easier to write an application that pretends to be a mouse: all you have to do is open the file that represents a mouse and write events to it. And that's just one example.
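    For instance, on Linux a program can pretend to be a mouse just by writing to an input device file. A minimal sketch in C, assuming a hypothetical device path and that you have permission to write to it:

    ```c
    /* Sketch: inject a relative mouse movement by writing input_event
     * structs to an evdev device file. The path is hypothetical. */
    #include <fcntl.h>
    #include <linux/input.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        /* Hypothetical node for the mouse; real systems vary. */
        int fd = open("/dev/input/event5", O_WRONLY);
        if (fd < 0)
            return 1;

        struct input_event ev;

        memset(&ev, 0, sizeof ev);
        ev.type = EV_REL;      /* relative-movement event */
        ev.code = REL_X;
        ev.value = 10;         /* move 10 units to the right */
        write(fd, &ev, sizeof ev);

        memset(&ev, 0, sizeof ev);
        ev.type = EV_SYN;      /* sync event: marks the report complete */
        ev.code = SYN_REPORT;
        write(fd, &ev, sizeof ev);

        close(fd);
        return 0;
    }
    ```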

     

    This is derived from how things work on silicon: All you can really do is write to an address and read from an address, sometimes with side effects (operations). And that's it. That applies whether you are writing to a register, memory, or storage. All you can ever do is write to and read from an address.
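    As a hedged illustration of that pattern, here is what memory-mapped I/O looks like in C; the address and register name are made up for the example:

    ```c
    #include <stdint.h>

    /* Made-up address for some hypothetical peripheral's control register. */
    #define DEVICE_CTRL (*(volatile uint32_t *)0x4000A000)

    void poke_device(void) {
        DEVICE_CTRL = 0x1;              /* write to an address: enable the device */
        uint32_t status = DEVICE_CTRL;  /* read from an address: side effects may apply */
        (void)status;
    }
    ```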

    Your argument against hardcoding things like file paths is really a different argument altogether, one that basically everyone agrees with, with one caveat: they agree with it for application development.

  2. I think that you have a fundamental misunderstanding of quantum mechanics.

    A qubit is not "at any number between 0 and 1", to paraphrase your original post. When measured, a qubit exists in exactly one state; that is to say, its probability function has "collapsed" (in quotes for a very advanced reason).

    When we think of a qubit, it sometimes helps to think of a ball. We can constrain the ball such that it cannot move except to rotate around a single axis. We will assume that the ball is in fact rotating at all times and is never actually motionless.

     

    Without looking at the ball or any of its starting conditions, all we can say is that the ball is either rotating left or right. We would call this a "superposition" of the two states of the ball; that is, we think of the ball as rotating both to the left and to the right at the same time.

    However, this is real life, and in reality the ball must be rotating either to the left OR to the right, but never both. So when we actually measure (look at) the ball, its probability function "collapses" (again with those pesky quotes) and we see the ball rotating in one definite direction. Hence, the "superposition" of directions on the ball is simply the probability of finding it spinning in a given direction when we look at it.
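    If it helps, here is a toy simulation of that idea (my own sketch, using ordinary classical probability, not real quantum mechanics): before we look, all we track is the probability of each direction; the act of looking picks exactly one definite outcome.

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Toy model: a ball whose spin direction is unknown until measured.
     * p_left plays the role of |alpha|^2 for a qubit. */
    typedef struct {
        double p_left; /* probability of finding it spinning left */
    } ball;

    /* Measurement: the "superposition" collapses to exactly one outcome. */
    static const char *measure(const ball *b) {
        double r = (double)rand() / RAND_MAX;
        return r < b->p_left ? "left" : "right";
    }

    int main(void) {
        srand((unsigned)time(NULL));
        ball b = { 0.5 }; /* equal chance of left and right */
        for (int i = 0; i < 5; i++)
            printf("measurement %d: spinning %s\n", i, measure(&b));
        return 0;
    }
    ```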

  3. So, while trying to figure out what a circuit that could accelerate ray tracing might actually look like (in order to respond to your points about die area), I came across this article: What you need to know about ray tracing and NVIDIA's Turing architecture.

    That article states:

    Quote

    "The crux of the matter is something called BVH traversal, short for Bounding Volume Hierarchy. This is basically a method for optimizing intersection calculations, where objects are bounded by larger, simpler volumes." 

    "NVIDIA's solution is to have the Turing RT cores handle all the BVH traversal and ray-triangle intersection testing, which saves the SMs from spending thousands of instruction slots per ray.

    The RT cores comprises of two specialized units. The first carries out the bounding box tests, while the second performs ray-triangle intersection tests and reports on whether it's a hit or not back to the SM. This frees up the SM to do other graphics or compute work. "


    So, that narrowed down my parameters. I started a search for "bounding volume hierarchy traversal circuit" and came across an Intel patent filed in 2012: Graphics tiling architecture with bounding volume hierarchies

     

    In part, the patent's abstract states:

    Quote

    " In some embodiments, tile lists may be avoided by storing the geometry of a scene in a bounding volume hierarchy (BVH). For each tile, the bounding volume hierarchy is traversed. The traversals continued only into children nodes that overlap with the frustum on the tile. By relaxing the ordering constraint of rendering primitives, the BVH is traversed such that nodes that are closer to the viewer are traversed first, increasing the occlusion culling efficiency in some embodiments. "

    That sounds awfully similar to some official statements that have been made about how the RT Cores work. Reading this patent may be worthwhile as an introduction to the subject.

    Additionally, this newer Samsung patent is more in-depth and even more similar to how Nvidia has claimed the RT cores work: https://patentimages.storage.googleapis.com/ac/e4/e1/ad9e4d9b32502a/US10049488.pdf

    Quote

    A method of traversing an acceleration structure (AS) in a ray tracing system includes obtaining information about child nodes of a target node included in the AS; determining whether each of the child nodes intersects a ray based on the obtained information; determining a next target node among at least one child node that intersects the ray; and performing an operation corresponding to a type of the determined next target node.

    That sounds even more similar to the claimed "BVH search and report an intersection hit to the SM" behavior.
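    Putting the two descriptions together, the hardware is essentially walking a tree: test a node's bounding box, descend only into the children the ray actually hits, and run the exact ray-triangle test at the leaves. Here is my own rough software sketch of that loop in C; real RT cores do this in pipelined fixed-function logic, and the helper math below is just the standard slab and Möller–Trumbore tests, not anything taken from the patents:

    ```c
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct { float min[3], max[3]; } aabb;
    typedef struct { float origin[3], dir[3]; } ray;
    typedef struct { float v0[3], v1[3], v2[3]; } triangle;

    static float dot(const float a[3], const float b[3]) {
        return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
    }
    static void sub(const float a[3], const float b[3], float o[3]) {
        o[0] = a[0]-b[0]; o[1] = a[1]-b[1]; o[2] = a[2]-b[2];
    }
    static void cross(const float a[3], const float b[3], float o[3]) {
        o[0] = a[1]*b[2] - a[2]*b[1];
        o[1] = a[2]*b[0] - a[0]*b[2];
        o[2] = a[0]*b[1] - a[1]*b[0];
    }

    /* Slab test: the cheap bounding-box check the first RT-core unit does. */
    static bool ray_hits_box(const ray *r, const aabb *b) {
        float tmin = 0.0f, tmax = 1e30f;
        for (int i = 0; i < 3; i++) {
            float inv = 1.0f / r->dir[i];
            float t0 = (b->min[i] - r->origin[i]) * inv;
            float t1 = (b->max[i] - r->origin[i]) * inv;
            if (t0 > t1) { float t = t0; t0 = t1; t1 = t; }
            if (t0 > tmin) tmin = t0;
            if (t1 < tmax) tmax = t1;
            if (tmin > tmax) return false; /* slabs don't overlap: miss */
        }
        return true;
    }

    /* Möller–Trumbore: the exact test the second RT-core unit does. */
    static bool ray_hits_triangle(const ray *r, const triangle *t) {
        float e1[3], e2[3], p[3], q[3], s[3];
        sub(t->v1, t->v0, e1);
        sub(t->v2, t->v0, e2);
        cross(r->dir, e2, p);
        float det = dot(e1, p);
        if (det > -1e-7f && det < 1e-7f) return false; /* ray parallel to plane */
        float inv = 1.0f / det;
        sub(r->origin, t->v0, s);
        float u = dot(s, p) * inv;
        if (u < 0.0f || u > 1.0f) return false;
        cross(s, e1, q);
        float v = dot(r->dir, q) * inv;
        if (v < 0.0f || u + v > 1.0f) return false;
        return dot(e2, q) * inv > 1e-7f; /* hit must be in front of the origin */
    }

    typedef struct bvh_node {
        aabb bounds;
        struct bvh_node *left, *right; /* both NULL for a leaf */
        const triangle *tris;          /* leaf geometry */
        size_t tri_count;
    } bvh_node;

    /* The traversal itself: prune subtrees whose boxes the ray misses,
     * and only pay for per-triangle tests at the leaves that survive. */
    bool traverse(const bvh_node *n, const ray *r) {
        if (n == NULL || !ray_hits_box(r, &n->bounds))
            return false;
        if (n->left == NULL && n->right == NULL) {
            for (size_t i = 0; i < n->tri_count; i++)
                if (ray_hits_triangle(r, &n->tris[i]))
                    return true; /* "report a hit back to the SM" */
            return false;
        }
        return traverse(n->left, r) || traverse(n->right, r);
    }
    ```

    Even in software it's clear why offloading this helps: every ray repeats those box and triangle tests thousands of times, which is exactly the "thousands of instruction slots per ray" the article says the RT cores save the SMs from.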

  4. Quote

    In first world countries we have very little forced oppression; some will argue that having laws that force one another to conform to societal requirements is a forced oppression.

    Well, I would make an even stronger argument than that: taxation.

    If you don't pay your taxes, the government can take everything you own, threaten you with jail time, and even threaten you with firearms (a definite sign of forced oppression). Taxation is oppression because you cannot opt out of it (the cost of renouncing one's citizenship is overwhelmingly high).

    In the trivial case, all useful governments must have some way of exerting force on their constituents; otherwise they could not maintain governance. A willfully entered contract of governance can only last as long as the generation that entered into it. After that, all subsequent generations are born into the governance and so do not have a choice as to whether they are governed. It is, quite literally, forced upon them with the threat of violence, theft, or both.

    However, I agree with your overarching point that it is generally not fruitful, and can even be damaging, to believe that one's position in life is the fault of the generally democratic government one lives under. For if that were one's belief, then one could never better one's position in life.

  5. Well, there is one potential solution to the GPU chiplet problem, but at this time we can't create a bus fast enough to emulate an on-die bus, which would make scaling the solution difficult.

     

    Let's use the Nvidia GP104 as an example. For reference, the first spoiler contains the GP104 layout, and the second spoiler contains the Streaming Multiprocessor layout:

    Spoiler: blockdiagram.jpg (GP104 block diagram)

    Spoiler: NVIDIA-Pascal-GP104-SM.png (GP104 Streaming Multiprocessor layout)


    Looking at the Streaming Multiprocessor, we see that each "big core" has an instruction pipeline, some controller logic, a register file, and many processing cores. The supporting infrastructure around that contains some data and instruction caches, and some shared scratchpad memory, allowing a few "big cores" to be placed together.

    The streaming multiprocessors are then grouped into clusters called GPCs (Graphics Processing Clusters). The GPC surrounds its streaming multiprocessors with some shared instruction pipelining and some task-specific compute resources. The GPCs are then repeated across the chip and glued together with some data caches and an instruction dispatcher. Finally, we have a multiplicity of memory interfaces and a single monolithic external bus controller.

    There are two ways I can see this working; both require the assumption that we can build external buses with performance equivalent to on-die buses.

     

    The first is to separate the GPCs into individual chips. All they carry with them are their two memory controllers and a portion of the L2 cache. This requires building some external "GigaThread Engine" (instruction dispatcher) and piping it to all of the chips. All of the chips still work off of the same instruction stream and the same data in the same memory. Under the assumption that the external buses are as fast as on-die buses, this is exactly equivalent to what already happens on die, but with an allowance for increasing the performance of the chip by adding more GPCs at assembly time instead of at fab time. The tradeoff is the number of traces required on the PCB, as well as an increase in cost for the same performance (each GPC needs its own physical packaging).

     

    The second way trades some of the board complexity for some latency by adding a third level: a new layer of instruction dispatching (a "TeraThread Engine", perhaps?). Ostensibly, the TeraThread Engine would be identical in function to the GigaThread Engine, except that it would forward instructions to the GigaThread Engines instead of to individual GPCs. Doing this gives us the ability to add a multiplicity of the existing designs to a board, with two major tradeoffs: slightly higher latency, and a much more complicated main memory design. A toy model of this hierarchy is sketched below.
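    As a toy software analogy (entirely my own sketch, and "TeraThread Engine" is a name I made up above, not an Nvidia term): the top level forwards work to per-chip dispatchers in exactly the way those dispatchers forward work to their GPCs.

    ```c
    #include <stdio.h>

    #define GPCS_PER_CHIP 4
    #define CHIPS 2

    /* Toy model of the proposed two-level dispatch. A "work item"
     * stands in for a block of shader instructions. */
    typedef struct { int id; } work_item;

    /* Bottom level: a GPC executes the work. */
    static void gpc_execute(int chip, int gpc, work_item w) {
        printf("chip %d, GPC %d runs work item %d\n", chip, gpc, w.id);
    }

    /* Middle level: the per-chip "GigaThread Engine" spreads work
     * across its local GPCs, exactly as on a monolithic die. */
    static void gigathread_dispatch(int chip, work_item w) {
        gpc_execute(chip, w.id % GPCS_PER_CHIP, w);
    }

    /* Top level: the hypothetical "TeraThread Engine" is identical in
     * function, but forwards to whole chips instead of individual GPCs. */
    static void terathread_dispatch(work_item w) {
        gigathread_dispatch(w.id % CHIPS, w);
    }

    int main(void) {
        for (int i = 0; i < 8; i++)
            terathread_dispatch((work_item){ i });
        return 0;
    }
    ```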

    Both of the cases above are logically identical to the way things are currently done and could likely be pulled off without requiring any changes to the programming model. The reality of the situation, however, is that both of these designs rely on external bus speeds approaching those of on-die buses, which is just not realistic at this time.
